Training: 2022-03-31 15:44:06,522-rank_id: 0
Training: 2022-03-31 15:44:30,118-Added key: store_based_barrier_key:2 to store for rank: 0
Training: 2022-03-31 15:44:30,129-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,129-Added key: store_based_barrier_key:3 to store for rank: 0
Training: 2022-03-31 15:44:30,140-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,140-Added key: store_based_barrier_key:4 to store for rank: 0
Training: 2022-03-31 15:44:30,150-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,150-Added key: store_based_barrier_key:5 to store for rank: 0
Training: 2022-03-31 15:44:30,161-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,161-Added key: store_based_barrier_key:6 to store for rank: 0
Training: 2022-03-31 15:44:30,171-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,171-Added key: store_based_barrier_key:7 to store for rank: 0
Training: 2022-03-31 15:44:30,171-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,171-Added key: store_based_barrier_key:8 to store for rank: 0
Training: 2022-03-31 15:44:30,182-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,182-Added key: store_based_barrier_key:9 to store for rank: 0
Training: 2022-03-31 15:44:30,192-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:30,192-Added key: store_based_barrier_key:10 to store for rank: 0
Training: 2022-03-31 15:44:30,203-Rank 0: Completed store-based barrier for 8 nodes.
Training: 2022-03-31 15:44:32,622-: margin_list [1.0, 0.0, 0.4]
Training: 2022-03-31 15:44:32,623-: network vit_t_dp005_mask0
Training: 2022-03-31 15:44:32,623-: resume False
Training: 2022-03-31 15:44:32,623-: output work_dirs/wf42m_pfc02_40epoch_8gpu_vit_t
Training: 2022-03-31 15:44:32,623-: embedding_size 512
Training: 2022-03-31 15:44:32,623-: sample_rate 0.2
Training: 2022-03-31 15:44:32,623-: interclass_filtering_threshold 0
Training: 2022-03-31 15:44:32,623-: fp16 True
Training: 2022-03-31 15:44:32,623-: batch_size 256
Training: 2022-03-31 15:44:32,623-: optimizer adamw
Training: 2022-03-31 15:44:32,623-: lr 0.001
Training: 2022-03-31 15:44:32,623-: momentum 0.9
Training: 2022-03-31 15:44:32,623-: weight_decay 0.1
Training: 2022-03-31 15:44:32,624-: verbose 2000
Training: 2022-03-31 15:44:32,624-: frequent 10
Training: 2022-03-31 15:44:32,624-: dali True
Training: 2022-03-31 15:44:32,624-: rec /train_tmp/WebFace42M
Training: 2022-03-31 15:44:32,624-: num_classes 2059906
Training: 2022-03-31 15:44:32,624-: num_image 42474557
Training: 2022-03-31 15:44:32,624-: num_epoch 40
Training: 2022-03-31 15:44:32,624-: warmup_epoch 4
Training: 2022-03-31 15:44:32,624-: val_targets []
Training: 2022-03-31 15:44:32,624-: total_batch_size 2048
Training: 2022-03-31 15:44:32,624-: warmup_step 82956
Training: 2022-03-31 15:44:32,624-: total_step 829560
Training: 2022-03-31 15:44:36,541-Reducer buckets have been rebuilt in this iteration.
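The derived values in the configuration dump above (total_batch_size, warmup_step, total_step) can be cross-checked from the primary ones. The sketch below is only an illustration: it assumes the world size is the 8 ranks reported by the store-based barrier and that a trailing incomplete batch is dropped each epoch (floor division). Neither assumption is stated in the log, although both reproduce the logged numbers. The helper name `estimate_hours` is hypothetical.

```python
# Primary values copied from the configuration dump in the log.
num_image = 42_474_557
batch_size = 256      # per-rank batch size
world_size = 8        # assumption: the 8 ranks reported by the barrier messages
num_epoch = 40
warmup_epoch = 4

# Derived values.
total_batch_size = batch_size * world_size
# Assumption: an incomplete final batch is dropped, hence floor division.
steps_per_epoch = num_image // total_batch_size
warmup_step = steps_per_epoch * warmup_epoch
total_step = steps_per_epoch * num_epoch


def estimate_hours(samples_per_sec: float) -> float:
    """Rough wall-clock estimate for the whole run at a constant throughput."""
    total_samples = total_step * total_batch_size
    return total_samples / samples_per_sec / 3600.0


print(total_batch_size, warmup_step, total_step)
print(f"{estimate_hours(6300.0):.1f} hours at 6300 samples/sec")
```

At the roughly 6300 samples/sec reported in the speed lines, this estimate comes out close to the "Required: 75 hours" figure that the log settles on.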
Training: 2022-03-31 15:44:42,392-Speed 6302.86 samples/sec Loss 42.5054 LearningRate 0.0000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 107 hours
Training: 2022-03-31 15:44:45,640-Speed 6307.34 samples/sec Loss 42.4883 LearningRate 0.0000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 97 hours
Training: 2022-03-31 15:44:48,881-Speed 6322.39 samples/sec Loss 42.4902 LearningRate 0.0000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-03-31 15:44:52,143-Speed 6280.65 samples/sec Loss 42.4888 LearningRate 0.0000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-03-31 15:44:55,388-Speed 6315.33 samples/sec Loss 42.4722 LearningRate 0.0000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-03-31 15:44:58,636-Speed 6307.65 samples/sec Loss 42.4910 LearningRate 0.0000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-03-31 15:45:01,883-Speed 6309.80 samples/sec Loss 42.4626 LearningRate 0.0000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-03-31 15:45:05,145-Speed 6279.76 samples/sec Loss 42.4459 LearningRate 0.0000 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-03-31 15:45:08,396-Speed 6302.04 samples/sec Loss 42.4438 LearningRate 0.0000 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-03-31 15:45:11,635-Speed 6326.38 samples/sec Loss 42.4678 LearningRate 0.0000 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-03-31 15:45:14,882-Speed 6309.51 samples/sec Loss 42.4814 LearningRate 0.0000 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-03-31 15:45:18,127-Speed 6313.99 samples/sec Loss 42.4532 LearningRate 0.0000 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-03-31 15:45:21,381-Speed 6296.85 samples/sec Loss 42.4487 LearningRate 0.0000 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-03-31 15:45:24,726-Speed 6125.10 samples/sec Loss 42.4309 LearningRate 0.0000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-03-31 15:45:27,996-Speed 6264.87 samples/sec Loss 42.3752 LearningRate 0.0000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-03-31 15:45:31,258-Speed 6280.13 samples/sec Loss 42.3615 LearningRate 0.0000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-03-31 15:45:34,512-Speed 6296.08 samples/sec Loss 42.3755 LearningRate 0.0000 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-03-31 15:45:37,774-Speed 6281.10 samples/sec Loss 42.3637 LearningRate 0.0000 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-03-31 15:45:41,025-Speed 6301.45 samples/sec Loss 42.3273 LearningRate 0.0000 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-03-31 15:45:44,257-Speed 6338.55 samples/sec Loss 42.3098 LearningRate 0.0000 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:45:47,507-Speed 6305.12 samples/sec Loss 42.3088 LearningRate 0.0000 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:45:50,754-Speed 6308.71 samples/sec Loss 42.2882 LearningRate 0.0000 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:45:54,026-Speed 6260.38 samples/sec Loss 42.2838 LearningRate 0.0000 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:45:57,291-Speed 6276.28 samples/sec Loss 42.1959 LearningRate 0.0000 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:46:00,554-Speed 6280.08 samples/sec Loss 42.2318 LearningRate 0.0000 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:46:03,813-Speed 6287.33 samples/sec Loss 42.1473 LearningRate 0.0000 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-03-31 15:46:07,063-Speed 6303.71 samples/sec Loss 42.1248 LearningRate 0.0000 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-03-31 15:46:10,331-Speed 6269.79 samples/sec Loss 42.0656 LearningRate 0.0000 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-03-31 15:46:13,597-Speed 6272.72 samples/sec Loss 42.0593 LearningRate 0.0000 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-03-31 15:46:16,855-Speed 6292.60 samples/sec Loss 42.0087 LearningRate 0.0000 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:20,108-Speed 6298.40 samples/sec Loss 41.9627 LearningRate 0.0000 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:23,353-Speed 6313.41 samples/sec Loss 41.9278 LearningRate 0.0000 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:26,629-Speed 6253.60 samples/sec Loss 41.8475 LearningRate 0.0000 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:29,884-Speed 6295.37 samples/sec Loss 41.8058 LearningRate 0.0000 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:33,144-Speed 6285.83 samples/sec Loss 41.7296 LearningRate 0.0000 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:36,396-Speed 6298.00 samples/sec Loss 41.6821 LearningRate 0.0000 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:39,651-Speed 6295.28 samples/sec Loss 41.5835 LearningRate 0.0000 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:42,899-Speed 6307.05 samples/sec Loss 41.5292 LearningRate 0.0000 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:46,163-Speed 6277.92 samples/sec Loss 41.4279 LearningRate 0.0000 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:49,400-Speed 6327.17 samples/sec Loss 41.3548 LearningRate 0.0000 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:52,648-Speed 6308.78 samples/sec Loss 41.2369 LearningRate 0.0000 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:55,898-Speed 6302.93 samples/sec Loss 41.1577 LearningRate 0.0000 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:46:59,160-Speed 6279.94 samples/sec Loss 41.0353 LearningRate 0.0000 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-03-31 15:47:02,430-Speed 6266.90 samples/sec Loss 40.9382 LearningRate 0.0000 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:05,677-Speed 6307.94 samples/sec Loss 40.8617 LearningRate 0.0000 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:08,940-Speed 6279.12 samples/sec Loss 40.7275 LearningRate 0.0000 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:12,193-Speed 6296.93 samples/sec Loss 40.6200 LearningRate 0.0000 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:15,438-Speed 6313.73 samples/sec Loss 40.5289 LearningRate 0.0000 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:18,691-Speed 6297.69 samples/sec Loss 40.3985 LearningRate 0.0000 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:21,956-Speed 6275.35 samples/sec Loss 40.3094 LearningRate 0.0000 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 524288 Required: 76 hours
Training: 2022-03-31 15:47:25,190-Speed 6334.36 samples/sec Loss 40.1808 LearningRate 0.0000 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:28,438-Speed 6307.41 samples/sec Loss 40.1558 LearningRate 0.0000 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:31,684-Speed 6313.83 samples/sec Loss 40.0242 LearningRate 0.0000 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:34,933-Speed 6305.49 samples/sec Loss 39.9495 LearningRate 0.0000 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:38,174-Speed 6319.80 samples/sec Loss 39.8385 LearningRate 0.0000 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:41,428-Speed 6295.20 samples/sec Loss 39.7928 LearningRate 0.0000 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-03-31 15:47:44,688-Speed 6285.72 samples/sec Loss 39.7139 LearningRate 0.0000 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-03-31 15:47:47,929-Speed 6320.45 samples/sec Loss 39.6391 LearningRate 0.0000 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-03-31 15:47:51,180-Speed 6302.00 samples/sec Loss 39.5896 LearningRate 0.0000 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-03-31 15:47:54,473-Speed 6220.05 samples/sec Loss 39.5285 LearningRate 0.0000 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-03-31 15:47:57,716-Speed 6317.73 samples/sec Loss 39.4703 LearningRate 0.0000 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-03-31 15:48:00,949-Speed 6335.63 samples/sec Loss 39.4251 LearningRate 0.0000 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:04,207-Speed 6287.85 samples/sec Loss 39.3801 LearningRate 0.0000 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:07,462-Speed 6293.08 samples/sec Loss 39.3246 LearningRate 0.0000 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:10,703-Speed 6321.19 samples/sec Loss 39.2813 LearningRate 0.0000 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:13,947-Speed 6314.29 samples/sec Loss 39.2511 LearningRate 0.0000 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:17,195-Speed 6307.74 samples/sec Loss 39.2188 LearningRate 0.0000 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:20,440-Speed 6313.38 samples/sec Loss 39.1738 LearningRate 0.0000 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:23,685-Speed 6312.78 samples/sec Loss 39.1487 LearningRate 0.0000 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:26,932-Speed 6309.67 samples/sec Loss 39.1421 LearningRate 0.0000 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:30,177-Speed 6312.28 samples/sec Loss 39.1016 LearningRate 0.0000 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:33,412-Speed 6334.07 samples/sec Loss 39.0540 LearningRate 0.0000 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:36,673-Speed 6282.17 samples/sec Loss 39.0532 LearningRate 0.0000 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:39,924-Speed 6301.26 samples/sec Loss 39.0220 LearningRate 0.0000 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:43,173-Speed 6304.63 samples/sec Loss 38.9963 LearningRate 0.0000 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:46,420-Speed 6310.96 samples/sec Loss 38.9730 LearningRate 0.0000 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:49,714-Speed 6218.63 samples/sec Loss 38.9592 LearningRate 0.0000 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:52,960-Speed 6309.88 samples/sec Loss 38.9541 LearningRate 0.0000 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:48:56,204-Speed 6315.44 samples/sec Loss 38.9328 LearningRate 0.0000 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:48:59,479-Speed 6254.42 samples/sec Loss 38.9123 LearningRate 0.0000 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:02,766-Speed 6232.97 samples/sec Loss 38.8781 LearningRate 0.0000 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:06,111-Speed 6123.98 samples/sec Loss 38.8903 LearningRate 0.0000 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:09,352-Speed 6321.73 samples/sec Loss 38.8854 LearningRate 0.0000 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:12,623-Speed 6262.61 samples/sec Loss 38.8680 LearningRate 0.0000 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:15,872-Speed 6304.54 samples/sec Loss 38.8527 LearningRate 0.0000 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:19,114-Speed 6317.12 samples/sec Loss 38.8501 LearningRate 0.0000 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:22,359-Speed 6313.00 samples/sec Loss 38.8441 LearningRate 0.0000 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:25,609-Speed 6304.50 samples/sec Loss 38.8290 LearningRate 0.0000 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:28,887-Speed 6249.51 samples/sec Loss 38.8344 LearningRate 0.0000 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-03-31 15:49:32,135-Speed 6307.35 samples/sec Loss 38.8266 LearningRate 0.0000 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:35,384-Speed 6305.31 samples/sec Loss 38.8224 LearningRate 0.0000 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:38,627-Speed 6316.21 samples/sec Loss 38.8122 LearningRate 0.0000 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:49:41,857-Speed 6342.96 samples/sec Loss 38.8110 LearningRate 0.0000 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:49:45,107-Speed 6302.40 samples/sec Loss 38.8143 LearningRate 0.0000 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:49:48,350-Speed 6316.24 samples/sec Loss 38.8016 LearningRate 0.0000 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:49:51,606-Speed 6291.36 samples/sec Loss 38.7960 LearningRate 0.0000 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:49:54,862-Speed 6293.39 samples/sec Loss 38.7944 LearningRate 0.0000 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:49:58,104-Speed 6318.42 samples/sec Loss 38.7919 LearningRate 0.0000 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:01,347-Speed 6315.67 samples/sec Loss 38.7672 LearningRate 0.0000 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:04,601-Speed 6295.55 samples/sec Loss 38.7803 LearningRate 0.0000 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:07,846-Speed 6312.44 samples/sec Loss 38.7580 LearningRate 0.0000 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:11,093-Speed 6309.23 samples/sec Loss 38.7423 LearningRate 0.0000 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:14,349-Speed 6292.02 samples/sec Loss 38.7629 LearningRate 0.0000 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:50:17,594-Speed 6311.70 samples/sec Loss 38.7572 LearningRate 0.0000 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:50:20,850-Speed 6292.65 samples/sec Loss 38.7333 LearningRate 0.0000 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-03-31 15:50:24,081-Speed 6339.06 samples/sec Loss 38.7333 LearningRate 0.0000 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:27,337-Speed 6292.28 samples/sec Loss 38.7826 LearningRate 0.0000 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:30,606-Speed 6266.53 samples/sec Loss 38.7556 LearningRate 0.0000 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:33,858-Speed 6313.05 samples/sec Loss 38.7434 LearningRate 0.0000 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-03-31 15:50:37,104-Speed 6310.60 samples/sec Loss 38.7153 LearningRate 0.0000 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:40,351-Speed 6309.07 samples/sec Loss 38.7555 LearningRate 0.0000 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:43,633-Speed 6308.89 samples/sec Loss 38.7427 LearningRate 0.0000 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:46,882-Speed 6305.54 samples/sec Loss 38.7639 LearningRate 0.0000 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:50,136-Speed 6296.90 samples/sec Loss 38.7861 LearningRate 0.0000 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:53,398-Speed 6279.20 samples/sec Loss 38.7285 LearningRate 0.0000 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:50:56,651-Speed 6297.38 samples/sec Loss 38.7054 LearningRate 0.0000 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-03-31 15:50:59,938-Speed 6233.18 samples/sec Loss 38.6803 LearningRate 0.0000 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:03,191-Speed 6297.20 samples/sec Loss 38.6758 LearningRate 0.0000 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:06,433-Speed 6318.85 samples/sec Loss 38.6681 LearningRate 0.0000 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:09,706-Speed 6259.84 samples/sec Loss 38.6274 LearningRate 0.0000 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:12,959-Speed 6296.84 samples/sec Loss 38.5957 LearningRate 0.0000 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:16,228-Speed 6267.03 samples/sec Loss 38.6011 LearningRate 0.0000 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:19,475-Speed 6310.40 samples/sec Loss 38.6396 LearningRate 0.0000 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:22,727-Speed 6300.66 samples/sec Loss 38.6389 LearningRate 0.0000 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:25,972-Speed 6311.27 samples/sec Loss 38.6532 LearningRate 0.0000 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:29,225-Speed 6297.73 samples/sec Loss 38.6620 LearningRate 0.0000 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:32,469-Speed 6315.68 samples/sec Loss 38.6770 LearningRate 0.0000 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:35,733-Speed 6277.13 samples/sec Loss 38.6737 LearningRate 0.0000 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:38,975-Speed 6317.31 samples/sec Loss 38.6824 LearningRate 0.0000 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:42,233-Speed 6293.10 samples/sec Loss 38.6976 LearningRate 0.0000 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:45,487-Speed 6295.93 samples/sec Loss 38.7194 LearningRate 0.0000 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:48,764-Speed 6253.50 samples/sec Loss 38.6899 LearningRate 0.0000 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:52,009-Speed 6312.90 samples/sec Loss 38.6953 LearningRate 0.0000 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:55,255-Speed 6311.26 samples/sec Loss 38.7030 LearningRate 0.0000 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:51:58,501-Speed 6310.57 samples/sec Loss 38.7114 LearningRate 0.0000 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:01,765-Speed 6277.06 samples/sec Loss 38.7204 LearningRate 0.0000 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:05,010-Speed 6311.43 samples/sec Loss 38.7186 LearningRate 0.0000 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:08,260-Speed 6302.45 samples/sec Loss 38.7027 LearningRate 0.0000 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:11,509-Speed 6305.48 samples/sec Loss 38.6980 LearningRate 0.0000 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:14,759-Speed 6302.76 samples/sec Loss 38.7006 LearningRate 0.0000 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:18,039-Speed 6246.08 samples/sec Loss 38.6882 LearningRate 0.0000 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:21,291-Speed 6299.18 samples/sec Loss 38.6419 LearningRate 0.0000 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:24,537-Speed 6311.74 samples/sec Loss 38.6438 LearningRate 0.0000 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:27,786-Speed 6304.37 samples/sec Loss 38.6122 LearningRate 0.0000 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:31,030-Speed 6315.56 samples/sec Loss 38.6120 LearningRate 0.0000 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:52:34,277-Speed 6308.31 samples/sec Loss 38.6124 LearningRate 0.0000 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:52:37,523-Speed 6310.52 samples/sec Loss 38.6502 LearningRate 0.0000 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:52:40,753-Speed 6342.67 samples/sec Loss 38.6471 LearningRate 0.0000 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:43,999-Speed 6311.85 samples/sec Loss 38.6163 LearningRate 0.0000 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:47,250-Speed 6301.18 samples/sec Loss 38.6311 LearningRate 0.0000 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:50,592-Speed 6128.22 samples/sec Loss 38.6613 LearningRate 0.0000 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:53,948-Speed 6103.97 samples/sec Loss 38.6511 LearningRate 0.0000 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:52:57,198-Speed 6302.48 samples/sec Loss 38.6694 LearningRate 0.0000 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:00,447-Speed 6305.20 samples/sec Loss 38.6428 LearningRate 0.0000 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:03,697-Speed 6304.67 samples/sec Loss 38.6282 LearningRate 0.0000 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:06,941-Speed 6314.27 samples/sec Loss 38.6246 LearningRate 0.0000 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:10,187-Speed 6308.99 samples/sec Loss 38.6603 LearningRate 0.0000 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:13,440-Speed 6297.64 samples/sec Loss 38.6671 LearningRate 0.0000 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:16,721-Speed 6244.29 samples/sec Loss 38.6879 LearningRate 0.0000 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:19,986-Speed 6277.69 samples/sec Loss 38.6918 LearningRate 0.0000 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:23,243-Speed 6290.55 samples/sec Loss 38.6851 LearningRate 0.0000 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:26,524-Speed 6242.42 samples/sec Loss 38.6859 LearningRate 0.0000 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:29,817-Speed 6220.11 samples/sec Loss 38.6906 LearningRate 0.0000 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:53:33,045-Speed 6346.34 samples/sec Loss 38.7181 LearningRate 0.0000 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:36,303-Speed 6286.60 samples/sec Loss 38.7076 LearningRate 0.0000 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:39,568-Speed 6274.81 samples/sec Loss 38.7377 LearningRate 0.0000 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:42,812-Speed 6314.01 samples/sec Loss 38.7624 LearningRate 0.0000 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:46,061-Speed 6306.72 samples/sec Loss 38.7565 LearningRate 0.0000 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:49,307-Speed 6310.60 samples/sec Loss 38.7861 LearningRate 0.0000 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:52,646-Speed 6134.55 samples/sec Loss 38.7539 LearningRate 0.0000 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:55,896-Speed 6302.93 samples/sec Loss 38.7144 LearningRate 0.0000 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:53:59,140-Speed 6315.46 samples/sec Loss 38.6893 LearningRate 0.0000 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:02,387-Speed 6308.67 samples/sec Loss 38.6788 LearningRate 0.0000 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:05,686-Speed 6209.51 samples/sec Loss 38.6468 LearningRate 0.0000 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:54:08,935-Speed 6305.50 samples/sec Loss 38.6565 LearningRate 0.0000 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:12,194-Speed 6285.69 samples/sec Loss 38.6674 LearningRate 0.0000 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:15,445-Speed 6300.70 samples/sec Loss 38.7016 LearningRate 0.0000 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:18,694-Speed 6306.22 samples/sec Loss 38.7482 LearningRate 0.0000 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:21,975-Speed 6242.02 samples/sec Loss 38.7786 LearningRate 0.0000 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:25,219-Speed 6315.47 samples/sec Loss 38.7443 LearningRate 0.0000 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:28,467-Speed 6306.36 samples/sec Loss 38.7803 LearningRate 0.0000 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:31,738-Speed 6262.94 samples/sec Loss 38.7438 LearningRate 0.0000 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:35,024-Speed 6234.94 samples/sec Loss 38.7611 LearningRate 0.0000 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:38,303-Speed 6246.42 samples/sec Loss 38.8115 LearningRate 0.0000 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:41,552-Speed 6305.21 samples/sec Loss 38.7607 LearningRate 0.0000 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:54:44,786-Speed 6334.77 samples/sec Loss 38.7336 LearningRate 0.0000 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:48,041-Speed 6293.60 samples/sec Loss 38.7069 LearningRate 0.0000 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:51,292-Speed 6301.91 samples/sec Loss 38.6871 LearningRate 0.0000 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:54:54,534-Speed 6317.55 samples/sec Loss 38.6562 LearningRate 0.0000 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:54:57,790-Speed 6292.50 samples/sec Loss 38.6636 LearningRate 0.0000 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:01,050-Speed 6283.70 samples/sec Loss 38.6653 LearningRate 0.0000 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:04,417-Speed 6084.32 samples/sec Loss 38.6991 LearningRate 0.0000 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:07,670-Speed 6298.10 samples/sec Loss 38.7206 LearningRate 0.0000 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:10,923-Speed 6297.17 samples/sec Loss 38.7306 LearningRate 0.0000 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:14,237-Speed 6182.82 samples/sec Loss 38.7652 LearningRate 0.0000 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:17,487-Speed 6304.20 samples/sec Loss 38.7880 LearningRate 0.0000 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:20,744-Speed 6289.52 samples/sec Loss 38.8126 LearningRate 0.0000 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:23,989-Speed 6313.89 samples/sec Loss 38.8017 LearningRate 0.0000 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 4096 Required: 75 hours
Training: 2022-03-31 15:55:27,247-Speed 6287.22 samples/sec Loss 38.7764 LearningRate 0.0000 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:30,497-Speed 6302.04 samples/sec Loss 38.7040 LearningRate 0.0000 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:33,754-Speed 6290.80 samples/sec Loss 38.7148 LearningRate 0.0000 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:37,007-Speed 6295.99 samples/sec Loss 38.6720 LearningRate 0.0000 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:40,256-Speed 6305.47 samples/sec Loss 38.6801 LearningRate 0.0000 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:43,511-Speed 6294.65 samples/sec Loss 38.6985 LearningRate 0.0000 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:46,757-Speed 6309.91 samples/sec Loss 38.6972 LearningRate 0.0000 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:50,005-Speed 6308.46 samples/sec Loss 38.7342 LearningRate 0.0000 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:53,245-Speed 6320.78 samples/sec Loss 38.7680 LearningRate 0.0000 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:56,489-Speed 6315.53 samples/sec Loss 38.7908 LearningRate 0.0000 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 8192 Required: 75 hours
Training: 2022-03-31 15:55:59,740-Speed 6301.74 samples/sec Loss 38.8102 LearningRate 0.0000 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 15:56:02,989-Speed 6304.37 samples/sec Loss 38.8040 LearningRate 0.0000 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-03-31 
15:56:06,239-Speed 6302.85 samples/sec Loss 38.8041 LearningRate 0.0000 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:56:09,472-Speed 6336.76 samples/sec Loss 38.8202 LearningRate 0.0000 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:12,720-Speed 6307.88 samples/sec Loss 38.8129 LearningRate 0.0000 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:15,968-Speed 6306.63 samples/sec Loss 38.8561 LearningRate 0.0000 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:19,220-Speed 6299.92 samples/sec Loss 38.8586 LearningRate 0.0000 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:22,469-Speed 6304.15 samples/sec Loss 38.8281 LearningRate 0.0000 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:25,728-Speed 6285.39 samples/sec Loss 38.8011 LearningRate 0.0000 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:28,987-Speed 6286.53 samples/sec Loss 38.7957 LearningRate 0.0000 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:32,234-Speed 6308.67 samples/sec Loss 38.8151 LearningRate 0.0000 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:35,484-Speed 6303.52 samples/sec Loss 38.7744 LearningRate 0.0000 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:38,728-Speed 6313.13 samples/sec Loss 38.7808 LearningRate 0.0000 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:41,983-Speed 6293.97 samples/sec Loss 38.7465 LearningRate 0.0000 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:56:45,214-Speed 6339.59 samples/sec Loss 38.7498 LearningRate 
0.0000 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:48,463-Speed 6305.68 samples/sec Loss 38.7585 LearningRate 0.0000 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:51,714-Speed 6302.10 samples/sec Loss 38.7478 LearningRate 0.0000 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:54,958-Speed 6313.03 samples/sec Loss 38.7461 LearningRate 0.0000 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:56:58,225-Speed 6271.56 samples/sec Loss 38.7635 LearningRate 0.0000 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:01,475-Speed 6302.31 samples/sec Loss 38.7652 LearningRate 0.0000 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:04,721-Speed 6312.29 samples/sec Loss 38.7292 LearningRate 0.0000 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:07,965-Speed 6313.65 samples/sec Loss 38.7647 LearningRate 0.0000 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:11,214-Speed 6305.64 samples/sec Loss 38.7853 LearningRate 0.0000 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:14,502-Speed 6230.51 samples/sec Loss 38.8207 LearningRate 0.0000 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:17,750-Speed 6305.41 samples/sec Loss 38.8755 LearningRate 0.0000 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:21,011-Speed 6283.88 samples/sec Loss 38.8649 LearningRate 0.0000 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:24,268-Speed 6288.33 samples/sec Loss 38.8408 LearningRate 0.0000 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 16384 
Required: 75 hours Training: 2022-03-31 15:57:27,518-Speed 6303.33 samples/sec Loss 38.8421 LearningRate 0.0000 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:30,765-Speed 6308.55 samples/sec Loss 38.8477 LearningRate 0.0000 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:34,026-Speed 6282.91 samples/sec Loss 38.8226 LearningRate 0.0000 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:37,275-Speed 6303.41 samples/sec Loss 38.8045 LearningRate 0.0000 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:57:40,509-Speed 6334.33 samples/sec Loss 38.8393 LearningRate 0.0000 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:43,756-Speed 6312.64 samples/sec Loss 38.9204 LearningRate 0.0000 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:47,010-Speed 6294.59 samples/sec Loss 39.0808 LearningRate 0.0000 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:50,254-Speed 6315.18 samples/sec Loss 39.0426 LearningRate 0.0000 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:53,565-Speed 6187.00 samples/sec Loss 39.0617 LearningRate 0.0000 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:57:56,815-Speed 6303.24 samples/sec Loss 39.0285 LearningRate 0.0000 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:00,111-Speed 6214.49 samples/sec Loss 38.9210 LearningRate 0.0000 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:03,370-Speed 6287.07 samples/sec Loss 38.8862 LearningRate 0.0000 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:06,682-Speed 
6185.08 samples/sec Loss 38.8658 LearningRate 0.0000 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:09,934-Speed 6297.42 samples/sec Loss 38.8677 LearningRate 0.0000 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:13,190-Speed 6292.69 samples/sec Loss 38.8763 LearningRate 0.0000 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:58:16,435-Speed 6313.04 samples/sec Loss 38.8549 LearningRate 0.0000 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:58:19,684-Speed 6305.66 samples/sec Loss 38.8710 LearningRate 0.0000 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:58:22,932-Speed 6306.18 samples/sec Loss 38.8620 LearningRate 0.0000 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:58:26,167-Speed 6333.36 samples/sec Loss 38.8782 LearningRate 0.0000 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:29,411-Speed 6314.04 samples/sec Loss 38.8611 LearningRate 0.0000 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:32,660-Speed 6305.69 samples/sec Loss 38.8455 LearningRate 0.0000 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:35,903-Speed 6317.32 samples/sec Loss 38.8271 LearningRate 0.0000 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:39,151-Speed 6305.44 samples/sec Loss 38.8366 LearningRate 0.0000 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:42,400-Speed 6306.00 samples/sec Loss 38.8187 LearningRate 0.0000 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:45,652-Speed 6299.67 samples/sec Loss 38.8505 LearningRate 0.0000 Epoch: 0 
Global Step: 2610 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:48,929-Speed 6250.70 samples/sec Loss 38.8292 LearningRate 0.0000 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:52,198-Speed 6265.40 samples/sec Loss 38.8646 LearningRate 0.0000 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:55,448-Speed 6303.83 samples/sec Loss 38.8504 LearningRate 0.0000 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:58:58,708-Speed 6283.19 samples/sec Loss 38.8346 LearningRate 0.0000 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:01,957-Speed 6304.85 samples/sec Loss 38.8816 LearningRate 0.0000 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:05,204-Speed 6308.95 samples/sec Loss 38.8783 LearningRate 0.0000 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:08,451-Speed 6309.10 samples/sec Loss 38.8634 LearningRate 0.0000 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:11,703-Speed 6299.98 samples/sec Loss 38.8698 LearningRate 0.0000 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:14,953-Speed 6303.34 samples/sec Loss 38.8563 LearningRate 0.0000 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:18,205-Speed 6298.36 samples/sec Loss 38.8981 LearningRate 0.0000 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:21,456-Speed 6302.13 samples/sec Loss 38.8856 LearningRate 0.0000 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:24,706-Speed 6302.97 samples/sec Loss 38.8671 LearningRate 0.0000 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 16384 Required: 75 hours 
Training: 2022-03-31 15:59:27,954-Speed 6305.93 samples/sec Loss 38.8696 LearningRate 0.0000 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 15:59:31,186-Speed 6338.25 samples/sec Loss 38.8746 LearningRate 0.0000 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:34,448-Speed 6281.21 samples/sec Loss 38.8642 LearningRate 0.0000 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:37,691-Speed 6315.43 samples/sec Loss 38.8994 LearningRate 0.0000 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:40,939-Speed 6306.62 samples/sec Loss 38.8673 LearningRate 0.0000 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:44,189-Speed 6303.17 samples/sec Loss 38.8878 LearningRate 0.0000 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:47,436-Speed 6310.49 samples/sec Loss 38.8518 LearningRate 0.0000 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:50,730-Speed 6217.93 samples/sec Loss 38.8576 LearningRate 0.0000 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:53,999-Speed 6267.28 samples/sec Loss 38.8744 LearningRate 0.0000 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 15:59:57,249-Speed 6302.83 samples/sec Loss 38.8808 LearningRate 0.0000 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:00:00,496-Speed 6306.95 samples/sec Loss 38.8882 LearningRate 0.0000 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:00:03,747-Speed 6302.82 samples/sec Loss 38.8981 LearningRate 0.0000 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:06,992-Speed 6312.39 samples/sec Loss 
38.8664 LearningRate 0.0000 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:10,246-Speed 6295.55 samples/sec Loss 38.8841 LearningRate 0.0000 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:13,492-Speed 6310.68 samples/sec Loss 38.8868 LearningRate 0.0000 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:16,738-Speed 6310.88 samples/sec Loss 38.8860 LearningRate 0.0000 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:19,990-Speed 6301.18 samples/sec Loss 38.8792 LearningRate 0.0000 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:23,234-Speed 6314.30 samples/sec Loss 38.8815 LearningRate 0.0000 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:26,483-Speed 6304.42 samples/sec Loss 38.9051 LearningRate 0.0000 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:29,770-Speed 6232.93 samples/sec Loss 38.8831 LearningRate 0.0000 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:33,020-Speed 6303.44 samples/sec Loss 38.9261 LearningRate 0.0000 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:36,265-Speed 6312.07 samples/sec Loss 38.8990 LearningRate 0.0000 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:39,511-Speed 6312.45 samples/sec Loss 38.9053 LearningRate 0.0000 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:42,754-Speed 6316.36 samples/sec Loss 38.9107 LearningRate 0.0000 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:46,005-Speed 6302.24 samples/sec Loss 38.8774 LearningRate 0.0000 Epoch: 0 Global Step: 2980 
Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:49,265-Speed 6282.09 samples/sec Loss 38.8703 LearningRate 0.0000 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:52,524-Speed 6286.07 samples/sec Loss 38.8905 LearningRate 0.0000 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:00:55,758-Speed 6335.04 samples/sec Loss 38.8884 LearningRate 0.0000 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:00:59,018-Speed 6284.13 samples/sec Loss 38.8688 LearningRate 0.0000 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:02,339-Speed 6167.96 samples/sec Loss 38.8867 LearningRate 0.0000 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:05,583-Speed 6316.19 samples/sec Loss 38.8821 LearningRate 0.0000 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:08,845-Speed 6278.61 samples/sec Loss 38.9149 LearningRate 0.0000 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:12,093-Speed 6306.91 samples/sec Loss 38.9027 LearningRate 0.0000 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:15,354-Speed 6283.85 samples/sec Loss 38.9196 LearningRate 0.0000 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:18,601-Speed 6307.88 samples/sec Loss 38.9203 LearningRate 0.0000 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:21,850-Speed 6306.68 samples/sec Loss 38.9329 LearningRate 0.0000 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:25,098-Speed 6305.06 samples/sec Loss 38.9147 LearningRate 0.0000 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 
16:01:28,392-Speed 6219.84 samples/sec Loss 38.9425 LearningRate 0.0000 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:01:31,664-Speed 6259.73 samples/sec Loss 39.0690 LearningRate 0.0000 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:01:34,927-Speed 6279.54 samples/sec Loss 38.9443 LearningRate 0.0000 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:01:38,173-Speed 6309.96 samples/sec Loss 38.9250 LearningRate 0.0000 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:01:41,424-Speed 6301.82 samples/sec Loss 38.9326 LearningRate 0.0000 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:01:44,670-Speed 6310.19 samples/sec Loss 38.9248 LearningRate 0.0000 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:48,009-Speed 6135.41 samples/sec Loss 38.9538 LearningRate 0.0000 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:51,375-Speed 6085.86 samples/sec Loss 38.9215 LearningRate 0.0000 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:54,667-Speed 6222.63 samples/sec Loss 38.9694 LearningRate 0.0000 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:01:57,912-Speed 6313.13 samples/sec Loss 38.9416 LearningRate 0.0000 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:01,159-Speed 6308.48 samples/sec Loss 38.9119 LearningRate 0.0000 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:04,419-Speed 6283.64 samples/sec Loss 38.9419 LearningRate 0.0000 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:07,663-Speed 6315.93 samples/sec Loss 38.9390 
LearningRate 0.0000 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:10,921-Speed 6286.06 samples/sec Loss 38.9350 LearningRate 0.0000 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:14,276-Speed 6106.56 samples/sec Loss 38.9301 LearningRate 0.0000 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:17,532-Speed 6292.62 samples/sec Loss 38.9382 LearningRate 0.0000 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:20,780-Speed 6308.35 samples/sec Loss 38.9433 LearningRate 0.0000 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:24,032-Speed 6300.32 samples/sec Loss 38.9299 LearningRate 0.0000 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:27,281-Speed 6303.77 samples/sec Loss 38.9368 LearningRate 0.0000 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:30,529-Speed 6307.09 samples/sec Loss 38.9404 LearningRate 0.0000 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:33,781-Speed 6299.30 samples/sec Loss 38.9661 LearningRate 0.0000 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:37,074-Speed 6220.58 samples/sec Loss 38.9436 LearningRate 0.0000 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:02:40,307-Speed 6336.57 samples/sec Loss 38.9270 LearningRate 0.0000 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:43,557-Speed 6302.05 samples/sec Loss 38.9484 LearningRate 0.0000 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:46,805-Speed 6308.50 samples/sec Loss 38.9558 LearningRate 0.0000 Epoch: 0 Global Step: 3350 Fp16 Grad 
Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:50,052-Speed 6308.96 samples/sec Loss 38.9626 LearningRate 0.0000 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:53,298-Speed 6311.04 samples/sec Loss 38.9584 LearningRate 0.0000 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:56,547-Speed 6304.63 samples/sec Loss 38.9832 LearningRate 0.0000 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:02:59,799-Speed 6299.20 samples/sec Loss 39.0719 LearningRate 0.0000 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:03,058-Speed 6288.97 samples/sec Loss 39.1107 LearningRate 0.0000 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:06,322-Speed 6276.01 samples/sec Loss 38.9977 LearningRate 0.0000 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:09,568-Speed 6310.94 samples/sec Loss 39.0404 LearningRate 0.0000 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:12,834-Speed 6272.73 samples/sec Loss 39.0470 LearningRate 0.0000 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:16,082-Speed 6306.72 samples/sec Loss 39.0275 LearningRate 0.0000 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:19,326-Speed 6314.45 samples/sec Loss 39.0192 LearningRate 0.0000 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:22,578-Speed 6300.12 samples/sec Loss 38.9617 LearningRate 0.0000 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:25,830-Speed 6298.69 samples/sec Loss 39.0204 LearningRate 0.0000 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 
16:03:29,078-Speed 6307.07 samples/sec Loss 38.9812 LearningRate 0.0000 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:32,335-Speed 6288.34 samples/sec Loss 38.9817 LearningRate 0.0000 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:35,582-Speed 6309.30 samples/sec Loss 38.9626 LearningRate 0.0000 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:38,831-Speed 6304.92 samples/sec Loss 39.0310 LearningRate 0.0000 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:42,081-Speed 6304.17 samples/sec Loss 39.2008 LearningRate 0.0000 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:03:45,323-Speed 6317.72 samples/sec Loss 39.0864 LearningRate 0.0000 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:48,575-Speed 6299.62 samples/sec Loss 39.1530 LearningRate 0.0000 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:51,830-Speed 6293.21 samples/sec Loss 39.2276 LearningRate 0.0000 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:55,081-Speed 6302.00 samples/sec Loss 39.3060 LearningRate 0.0000 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:03:58,332-Speed 6299.52 samples/sec Loss 39.2202 LearningRate 0.0000 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:01,586-Speed 6297.24 samples/sec Loss 39.2269 LearningRate 0.0000 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:04,836-Speed 6302.13 samples/sec Loss 39.3022 LearningRate 0.0000 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:08,103-Speed 6270.63 samples/sec Loss 39.2738 
LearningRate 0.0000 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:11,351-Speed 6306.68 samples/sec Loss 39.1944 LearningRate 0.0000 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:14,604-Speed 6297.12 samples/sec Loss 39.3098 LearningRate 0.0000 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:04:17,859-Speed 6293.62 samples/sec Loss 39.1294 LearningRate 0.0000 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:21,108-Speed 6304.73 samples/sec Loss 39.1169 LearningRate 0.0000 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:24,360-Speed 6298.54 samples/sec Loss 39.1510 LearningRate 0.0000 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:27,611-Speed 6302.26 samples/sec Loss 39.1990 LearningRate 0.0000 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:30,855-Speed 6315.01 samples/sec Loss 39.1307 LearningRate 0.0000 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:34,104-Speed 6305.26 samples/sec Loss 39.0118 LearningRate 0.0000 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:37,355-Speed 6301.19 samples/sec Loss 39.0474 LearningRate 0.0000 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:40,601-Speed 6310.01 samples/sec Loss 39.0066 LearningRate 0.0000 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:43,852-Speed 6301.80 samples/sec Loss 39.0073 LearningRate 0.0000 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:47,102-Speed 6301.31 samples/sec Loss 39.0552 LearningRate 0.0000 Epoch: 0 Global Step: 3720 Fp16 Grad 
Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:50,356-Speed 6296.37 samples/sec Loss 39.1124 LearningRate 0.0000 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:04:53,595-Speed 6325.26 samples/sec Loss 39.0517 LearningRate 0.0000 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:04:56,841-Speed 6310.13 samples/sec Loss 39.0677 LearningRate 0.0000 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:00,092-Speed 6301.04 samples/sec Loss 39.2792 LearningRate 0.0000 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:03,350-Speed 6288.32 samples/sec Loss 39.2551 LearningRate 0.0000 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:06,622-Speed 6260.66 samples/sec Loss 39.3221 LearningRate 0.0000 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:09,871-Speed 6304.23 samples/sec Loss 39.1762 LearningRate 0.0000 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:13,146-Speed 6256.84 samples/sec Loss 39.1057 LearningRate 0.0000 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:16,402-Speed 6291.05 samples/sec Loss 39.0733 LearningRate 0.0000 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:19,661-Speed 6285.88 samples/sec Loss 39.0245 LearningRate 0.0000 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:22,908-Speed 6310.22 samples/sec Loss 39.0222 LearningRate 0.0000 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:05:26,154-Speed 6310.36 samples/sec Loss 39.0395 LearningRate 0.0000 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 
16:05:29,412-Speed 6288.30 samples/sec Loss 39.0515 LearningRate 0.0000 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:32,691-Speed 6248.15 samples/sec Loss 39.0632 LearningRate 0.0000 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:35,936-Speed 6311.32 samples/sec Loss 39.0597 LearningRate 0.0000 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:39,186-Speed 6303.28 samples/sec Loss 39.0666 LearningRate 0.0000 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:42,430-Speed 6314.41 samples/sec Loss 39.0720 LearningRate 0.0000 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:45,675-Speed 6312.15 samples/sec Loss 39.0630 LearningRate 0.0000 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:48,930-Speed 6293.95 samples/sec Loss 39.0724 LearningRate 0.0000 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:52,182-Speed 6300.94 samples/sec Loss 39.0749 LearningRate 0.0000 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:55,427-Speed 6312.48 samples/sec Loss 39.0509 LearningRate 0.0000 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:05:58,688-Speed 6280.85 samples/sec Loss 39.0452 LearningRate 0.0000 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:06:02,030-Speed 6128.94 samples/sec Loss 39.0786 LearningRate 0.0000 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:05,300-Speed 6265.82 samples/sec Loss 39.0605 LearningRate 0.0000 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:08,546-Speed 6310.04 samples/sec Loss 39.0792 
LearningRate 0.0000 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:11,799-Speed 6297.55 samples/sec Loss 39.0795 LearningRate 0.0000 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:15,061-Speed 6280.91 samples/sec Loss 39.0746 LearningRate 0.0000 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:18,313-Speed 6297.68 samples/sec Loss 39.0785 LearningRate 0.0000 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:21,564-Speed 6303.06 samples/sec Loss 39.0628 LearningRate 0.0000 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:24,812-Speed 6306.41 samples/sec Loss 39.0794 LearningRate 0.0000 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:28,063-Speed 6299.59 samples/sec Loss 39.0630 LearningRate 0.0000 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:31,315-Speed 6299.82 samples/sec Loss 39.0628 LearningRate 0.0000 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:34,560-Speed 6312.15 samples/sec Loss 39.0841 LearningRate 0.0000 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:37,818-Speed 6288.45 samples/sec Loss 39.1041 LearningRate 0.0000 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:41,077-Speed 6285.77 samples/sec Loss 39.0741 LearningRate 0.0000 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:44,323-Speed 6311.62 samples/sec Loss 39.0866 LearningRate 0.0000 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:06:47,575-Speed 6298.54 samples/sec Loss 39.0926 LearningRate 0.0000 Epoch: 0 Global Step: 4090 Fp16 
Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:50,835-Speed 6283.11 samples/sec Loss 39.0743 LearningRate 0.0000 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:54,103-Speed 6267.97 samples/sec Loss 39.0952 LearningRate 0.0000 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:06:57,362-Speed 6288.19 samples/sec Loss 39.0989 LearningRate 0.0000 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:00,612-Speed 6301.33 samples/sec Loss 39.0964 LearningRate 0.0000 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:03,867-Speed 6294.77 samples/sec Loss 39.1068 LearningRate 0.0000 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:07,114-Speed 6308.42 samples/sec Loss 39.1215 LearningRate 0.0001 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:10,374-Speed 6283.11 samples/sec Loss 39.1129 LearningRate 0.0001 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:13,632-Speed 6287.78 samples/sec Loss 39.1048 LearningRate 0.0001 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:16,881-Speed 6305.83 samples/sec Loss 39.1034 LearningRate 0.0001 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:20,124-Speed 6317.09 samples/sec Loss 39.1060 LearningRate 0.0001 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:07:23,373-Speed 6304.16 samples/sec Loss 39.0999 LearningRate 0.0001 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:26,620-Speed 6309.66 samples/sec Loss 39.0993 LearningRate 0.0001 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 32768 Required: 75 hours Training: 
2022-03-31 16:07:29,866-Speed 6310.44 samples/sec Loss 39.1091 LearningRate 0.0001 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:33,115-Speed 6305.82 samples/sec Loss 39.0998 LearningRate 0.0001 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:36,371-Speed 6290.73 samples/sec Loss 39.1168 LearningRate 0.0001 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:39,623-Speed 6298.85 samples/sec Loss 39.1104 LearningRate 0.0001 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:42,878-Speed 6294.08 samples/sec Loss 39.1168 LearningRate 0.0001 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:46,129-Speed 6302.40 samples/sec Loss 39.1511 LearningRate 0.0001 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:49,378-Speed 6304.78 samples/sec Loss 39.1308 LearningRate 0.0001 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:52,673-Speed 6217.22 samples/sec Loss 39.1121 LearningRate 0.0001 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:07:55,931-Speed 6286.31 samples/sec Loss 39.1340 LearningRate 0.0001 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:07:59,177-Speed 6312.94 samples/sec Loss 39.1215 LearningRate 0.0001 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:02,422-Speed 6311.49 samples/sec Loss 39.1423 LearningRate 0.0001 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:05,695-Speed 6259.43 samples/sec Loss 39.1276 LearningRate 0.0001 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:08,942-Speed 6309.14 samples/sec Loss 
39.1255 LearningRate 0.0001 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:12,215-Speed 6258.72 samples/sec Loss 39.1388 LearningRate 0.0001 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:15,459-Speed 6314.46 samples/sec Loss 39.1466 LearningRate 0.0001 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:18,718-Speed 6285.46 samples/sec Loss 39.1333 LearningRate 0.0001 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:21,952-Speed 6332.96 samples/sec Loss 39.1482 LearningRate 0.0001 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:25,214-Speed 6282.14 samples/sec Loss 39.1414 LearningRate 0.0001 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:28,458-Speed 6313.68 samples/sec Loss 39.1327 LearningRate 0.0001 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:31,712-Speed 6295.84 samples/sec Loss 39.1306 LearningRate 0.0001 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:34,960-Speed 6306.58 samples/sec Loss 39.1430 LearningRate 0.0001 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:38,224-Speed 6275.23 samples/sec Loss 39.1447 LearningRate 0.0001 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:41,470-Speed 6310.71 samples/sec Loss 39.1457 LearningRate 0.0001 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:44,716-Speed 6311.07 samples/sec Loss 39.1713 LearningRate 0.0001 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:47,959-Speed 6318.17 samples/sec Loss 39.1458 LearningRate 0.0001 Epoch: 0 Global Step: 4460 
Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:51,203-Speed 6314.71 samples/sec Loss 39.1371 LearningRate 0.0001 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:08:54,446-Speed 6316.04 samples/sec Loss 39.1681 LearningRate 0.0001 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:08:57,691-Speed 6313.41 samples/sec Loss 39.1881 LearningRate 0.0001 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:00,940-Speed 6303.81 samples/sec Loss 39.1598 LearningRate 0.0001 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:04,194-Speed 6295.76 samples/sec Loss 39.1593 LearningRate 0.0001 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:07,447-Speed 6298.27 samples/sec Loss 39.1485 LearningRate 0.0001 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:10,693-Speed 6310.20 samples/sec Loss 39.1566 LearningRate 0.0001 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:13,941-Speed 6307.60 samples/sec Loss 39.2099 LearningRate 0.0001 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 65536 Required: 75 hours Training: 2022-03-31 16:09:17,175-Speed 6334.04 samples/sec Loss 39.1762 LearningRate 0.0001 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:20,419-Speed 6314.32 samples/sec Loss 39.1787 LearningRate 0.0001 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:23,670-Speed 6301.15 samples/sec Loss 39.1534 LearningRate 0.0001 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:26,915-Speed 6312.26 samples/sec Loss 39.1698 LearningRate 0.0001 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 32768 Required: 75 hours Training: 
2022-03-31 16:09:30,166-Speed 6302.23 samples/sec Loss 39.2170 LearningRate 0.0001 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:33,409-Speed 6315.87 samples/sec Loss 39.2383 LearningRate 0.0001 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:36,653-Speed 6314.64 samples/sec Loss 39.2056 LearningRate 0.0001 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:39,896-Speed 6316.63 samples/sec Loss 39.2032 LearningRate 0.0001 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 32768 Required: 75 hours Training: 2022-03-31 16:09:43,124-Speed 6345.39 samples/sec Loss 39.1969 LearningRate 0.0001 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:09:46,374-Speed 6304.88 samples/sec Loss 39.2145 LearningRate 0.0001 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:09:49,718-Speed 6125.55 samples/sec Loss 39.2008 LearningRate 0.0001 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:09:52,978-Speed 6282.65 samples/sec Loss 39.2355 LearningRate 0.0001 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:09:56,226-Speed 6306.76 samples/sec Loss 39.2198 LearningRate 0.0001 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:09:59,472-Speed 6310.44 samples/sec Loss 39.2354 LearningRate 0.0001 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:02,705-Speed 6336.89 samples/sec Loss 39.2608 LearningRate 0.0001 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:05,953-Speed 6307.63 samples/sec Loss 39.2557 LearningRate 0.0001 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:09,195-Speed 6319.24 samples/sec Loss 
39.2751 LearningRate 0.0001 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:12,435-Speed 6320.56 samples/sec Loss 39.2905 LearningRate 0.0001 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:15,676-Speed 6321.45 samples/sec Loss 39.2276 LearningRate 0.0001 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:18,984-Speed 6192.67 samples/sec Loss 39.2544 LearningRate 0.0001 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:22,229-Speed 6312.70 samples/sec Loss 39.2519 LearningRate 0.0001 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:25,470-Speed 6320.89 samples/sec Loss 39.2417 LearningRate 0.0001 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:28,714-Speed 6313.32 samples/sec Loss 39.2349 LearningRate 0.0001 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:31,958-Speed 6315.44 samples/sec Loss 39.2253 LearningRate 0.0001 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:10:35,206-Speed 6306.87 samples/sec Loss 39.2562 LearningRate 0.0001 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:38,448-Speed 6319.04 samples/sec Loss 39.2165 LearningRate 0.0001 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:41,696-Speed 6305.62 samples/sec Loss 39.4145 LearningRate 0.0001 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:44,942-Speed 6310.39 samples/sec Loss 39.5611 LearningRate 0.0001 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:48,203-Speed 6282.27 samples/sec Loss 39.3668 LearningRate 0.0001 Epoch: 0 Global Step: 4830 Fp16 
Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:51,484-Speed 6242.55 samples/sec Loss 39.3495 LearningRate 0.0001 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:54,735-Speed 6302.21 samples/sec Loss 39.2993 LearningRate 0.0001 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:10:57,983-Speed 6306.95 samples/sec Loss 39.2797 LearningRate 0.0001 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:01,244-Speed 6282.73 samples/sec Loss 39.4107 LearningRate 0.0001 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:04,493-Speed 6304.10 samples/sec Loss 39.3893 LearningRate 0.0001 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:07,725-Speed 6338.55 samples/sec Loss 39.3150 LearningRate 0.0001 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:10,975-Speed 6303.12 samples/sec Loss 39.3083 LearningRate 0.0001 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:14,241-Speed 6271.49 samples/sec Loss 39.3167 LearningRate 0.0001 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:17,530-Speed 6228.21 samples/sec Loss 39.3095 LearningRate 0.0001 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:20,780-Speed 6303.13 samples/sec Loss 39.3012 LearningRate 0.0001 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:24,037-Speed 6290.02 samples/sec Loss 39.3123 LearningRate 0.0001 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:11:27,280-Speed 6315.26 samples/sec Loss 39.2696 LearningRate 0.0001 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 16384 Required: 75 hours Training: 
2022-03-31 16:11:30,510-Speed 6342.10 samples/sec Loss 39.4071 LearningRate 0.0001 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:33,754-Speed 6315.62 samples/sec Loss 39.3768 LearningRate 0.0001 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:37,012-Speed 6286.87 samples/sec Loss 39.3652 LearningRate 0.0001 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:40,256-Speed 6313.87 samples/sec Loss 39.2999 LearningRate 0.0001 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:43,511-Speed 6294.48 samples/sec Loss 39.3096 LearningRate 0.0001 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:46,751-Speed 6322.10 samples/sec Loss 39.3445 LearningRate 0.0001 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:49,996-Speed 6312.43 samples/sec Loss 39.7898 LearningRate 0.0001 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:53,259-Speed 6277.54 samples/sec Loss 39.9632 LearningRate 0.0001 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:56,525-Speed 6272.17 samples/sec Loss 39.9568 LearningRate 0.0001 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:11:59,783-Speed 6288.16 samples/sec Loss 40.0102 LearningRate 0.0001 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:03,021-Speed 6326.44 samples/sec Loss 39.9747 LearningRate 0.0001 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:06,270-Speed 6304.77 samples/sec Loss 39.9863 LearningRate 0.0001 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:09,522-Speed 6300.04 samples/sec Loss 39.9905 
LearningRate 0.0001 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:12,766-Speed 6313.19 samples/sec Loss 39.9755 LearningRate 0.0001 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:16,001-Speed 6334.04 samples/sec Loss 39.9315 LearningRate 0.0001 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:19,251-Speed 6302.93 samples/sec Loss 39.9170 LearningRate 0.0001 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:22,510-Speed 6284.87 samples/sec Loss 39.8786 LearningRate 0.0001 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:25,803-Speed 6221.03 samples/sec Loss 39.8748 LearningRate 0.0001 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:29,055-Speed 6299.31 samples/sec Loss 39.8435 LearningRate 0.0001 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:32,303-Speed 6305.69 samples/sec Loss 39.8412 LearningRate 0.0001 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:35,576-Speed 6258.50 samples/sec Loss 39.7988 LearningRate 0.0001 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:38,825-Speed 6305.05 samples/sec Loss 39.7824 LearningRate 0.0001 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:42,074-Speed 6304.37 samples/sec Loss 39.7179 LearningRate 0.0001 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:45,316-Speed 6320.20 samples/sec Loss 39.7314 LearningRate 0.0001 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:12:48,565-Speed 6303.62 samples/sec Loss 39.7209 LearningRate 0.0001 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 
8192 Required: 75 hours Training: 2022-03-31 16:12:51,811-Speed 6310.56 samples/sec Loss 39.7021 LearningRate 0.0001 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:55,056-Speed 6312.77 samples/sec Loss 39.6576 LearningRate 0.0001 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:12:58,308-Speed 6299.84 samples/sec Loss 39.5800 LearningRate 0.0001 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:01,559-Speed 6301.45 samples/sec Loss 39.5743 LearningRate 0.0001 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:04,810-Speed 6301.53 samples/sec Loss 39.4998 LearningRate 0.0001 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:08,058-Speed 6305.91 samples/sec Loss 39.4681 LearningRate 0.0001 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:11,316-Speed 6287.80 samples/sec Loss 39.6017 LearningRate 0.0001 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:14,563-Speed 6309.80 samples/sec Loss 40.0115 LearningRate 0.0001 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:17,812-Speed 6305.06 samples/sec Loss 40.0799 LearningRate 0.0001 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:21,061-Speed 6304.84 samples/sec Loss 39.8160 LearningRate 0.0001 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:24,306-Speed 6311.45 samples/sec Loss 39.7317 LearningRate 0.0001 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:27,557-Speed 6302.46 samples/sec Loss 39.6567 LearningRate 0.0001 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:30,802-Speed 
6312.43 samples/sec Loss 39.5967 LearningRate 0.0001 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:34,053-Speed 6299.76 samples/sec Loss 39.5820 LearningRate 0.0001 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:37,312-Speed 6286.90 samples/sec Loss 39.5259 LearningRate 0.0001 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:40,563-Speed 6301.06 samples/sec Loss 39.5172 LearningRate 0.0001 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:43,809-Speed 6309.60 samples/sec Loss 39.5504 LearningRate 0.0001 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:47,114-Speed 6199.15 samples/sec Loss 39.5092 LearningRate 0.0001 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:13:50,347-Speed 6335.59 samples/sec Loss 39.8298 LearningRate 0.0001 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:53,595-Speed 6306.29 samples/sec Loss 39.4875 LearningRate 0.0001 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:13:56,841-Speed 6310.04 samples/sec Loss 39.5141 LearningRate 0.0001 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:00,092-Speed 6301.26 samples/sec Loss 39.5121 LearningRate 0.0001 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:03,362-Speed 6265.88 samples/sec Loss 39.5031 LearningRate 0.0001 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:06,640-Speed 6249.79 samples/sec Loss 39.4877 LearningRate 0.0001 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:09,888-Speed 6306.89 samples/sec Loss 39.4918 LearningRate 0.0001 Epoch: 0 
Global Step: 5450 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:13,133-Speed 6311.53 samples/sec Loss 39.4639 LearningRate 0.0001 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:16,381-Speed 6306.28 samples/sec Loss 39.4481 LearningRate 0.0001 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:19,628-Speed 6310.27 samples/sec Loss 39.4471 LearningRate 0.0001 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:22,879-Speed 6300.26 samples/sec Loss 39.4467 LearningRate 0.0001 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:14:26,142-Speed 6278.83 samples/sec Loss 39.4701 LearningRate 0.0001 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:14:29,518-Speed 6066.97 samples/sec Loss 39.4686 LearningRate 0.0001 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:14:32,758-Speed 6323.27 samples/sec Loss 39.4912 LearningRate 0.0001 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:36,003-Speed 6312.32 samples/sec Loss 39.5220 LearningRate 0.0001 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:39,248-Speed 6311.72 samples/sec Loss 39.4522 LearningRate 0.0001 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:42,496-Speed 6305.82 samples/sec Loss 39.4723 LearningRate 0.0001 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:45,747-Speed 6302.14 samples/sec Loss 39.4475 LearningRate 0.0001 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:49,000-Speed 6297.76 samples/sec Loss 39.5327 LearningRate 0.0001 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 8192 Required: 75 hours 
Training: 2022-03-31 16:14:52,252-Speed 6298.49 samples/sec Loss 39.9624 LearningRate 0.0001 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:55,504-Speed 6299.91 samples/sec Loss 39.9394 LearningRate 0.0001 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:14:58,758-Speed 6293.51 samples/sec Loss 39.8877 LearningRate 0.0001 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:15:02,016-Speed 6289.32 samples/sec Loss 39.8514 LearningRate 0.0001 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-03-31 16:15:05,267-Speed 6299.25 samples/sec Loss 39.8100 LearningRate 0.0001 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:15:08,520-Speed 6297.29 samples/sec Loss 39.7973 LearningRate 0.0001 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:15:11,773-Speed 6297.21 samples/sec Loss 39.7778 LearningRate 0.0001 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:15:15,031-Speed 6288.06 samples/sec Loss 39.7629 LearningRate 0.0001 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:15:18,306-Speed 6254.86 samples/sec Loss 39.7327 LearningRate 0.0001 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 16384 Required: 75 hours Training: 2022-03-31 16:15:21,530-Speed 6354.83 samples/sec Loss 39.6568 LearningRate 0.0001 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 4096 Required: 75 hours Training: 2022-03-31 16:15:24,772-Speed 6318.49 samples/sec Loss 39.6547 LearningRate 0.0001 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 2048 Required: 75 hours Training: 2022-03-31 16:15:28,010-Speed 6326.28 samples/sec Loss 39.6087 LearningRate 0.0001 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 1024 Required: 75 hours Training: 2022-03-31 16:15:31,263-Speed 6298.47 samples/sec 
Loss 39.5347 LearningRate 0.0001 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 1024 Required: 75 hours Training: 2022-03-31 16:15:34,545-Speed 6241.86 samples/sec Loss 39.5392 LearningRate 0.0001 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 1024 Required: 75 hours Training: 2022-03-31 16:15:37,793-Speed 6307.33 samples/sec Loss 39.5479 LearningRate 0.0001 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 256 Required: 75 hours Training: 2022-03-31 16:15:41,136-Speed 6127.56 samples/sec Loss 39.5455 LearningRate 0.0001 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 256 Required: 75 hours Training: 2022-03-31 16:15:44,413-Speed 6251.15 samples/sec Loss 39.5164 LearningRate 0.0001 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:15:47,652-Speed 6322.57 samples/sec Loss 39.5280 LearningRate 0.0001 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:15:50,916-Speed 6276.98 samples/sec Loss 39.5303 LearningRate 0.0001 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:15:54,204-Speed 6229.13 samples/sec Loss 39.6263 LearningRate 0.0001 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:15:57,447-Speed 6317.78 samples/sec Loss 39.6792 LearningRate 0.0001 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:00,686-Speed 6324.38 samples/sec Loss 39.7319 LearningRate 0.0001 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:03,953-Speed 6269.83 samples/sec Loss 39.5787 LearningRate 0.0001 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:07,207-Speed 6294.67 samples/sec Loss 39.5545 LearningRate 0.0001 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:10,465-Speed 6287.61 samples/sec Loss 39.6355 LearningRate 0.0001 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 16 
Required: 75 hours Training: 2022-03-31 16:16:13,764-Speed 6208.91 samples/sec Loss 39.5806 LearningRate 0.0001 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:17,040-Speed 6253.67 samples/sec Loss 39.4950 LearningRate 0.0001 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:20,296-Speed 6291.87 samples/sec Loss 39.4550 LearningRate 0.0001 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:23,536-Speed 6322.30 samples/sec Loss 39.4529 LearningRate 0.0001 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:26,769-Speed 6336.89 samples/sec Loss 39.4539 LearningRate 0.0001 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:30,003-Speed 6333.32 samples/sec Loss 39.4330 LearningRate 0.0001 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:16:33,243-Speed 6322.28 samples/sec Loss 39.4701 LearningRate 0.0001 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:36,484-Speed 6319.97 samples/sec Loss 39.4816 LearningRate 0.0001 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:39,719-Speed 6332.44 samples/sec Loss 39.5303 LearningRate 0.0001 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:42,956-Speed 6328.74 samples/sec Loss 39.5284 LearningRate 0.0001 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:46,197-Speed 6320.79 samples/sec Loss 39.4668 LearningRate 0.0001 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:49,477-Speed 6245.22 samples/sec Loss 39.5210 LearningRate 0.0001 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:52,719-Speed 6318.66 samples/sec Loss 39.4869 
LearningRate 0.0001 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:55,966-Speed 6306.97 samples/sec Loss 39.4861 LearningRate 0.0001 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:16:59,210-Speed 6316.12 samples/sec Loss 39.4823 LearningRate 0.0001 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:17:02,456-Speed 6309.40 samples/sec Loss 39.4764 LearningRate 0.0001 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:17:05,685-Speed 6345.37 samples/sec Loss 39.4487 LearningRate 0.0001 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:08,926-Speed 6319.74 samples/sec Loss 39.4487 LearningRate 0.0001 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:12,201-Speed 6254.47 samples/sec Loss 39.4700 LearningRate 0.0001 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:15,459-Speed 6286.95 samples/sec Loss 39.4837 LearningRate 0.0001 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:18,709-Speed 6303.38 samples/sec Loss 39.4787 LearningRate 0.0001 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:21,952-Speed 6316.49 samples/sec Loss 39.5065 LearningRate 0.0001 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:17:25,177-Speed 6353.76 samples/sec Loss 39.4542 LearningRate 0.0001 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:28,429-Speed 6298.79 samples/sec Loss 39.5044 LearningRate 0.0001 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:31,666-Speed 6327.31 samples/sec Loss 39.4690 LearningRate 0.0001 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 8 Required: 75 hours 
Training: 2022-03-31 16:17:34,914-Speed 6306.77 samples/sec Loss 39.4656 LearningRate 0.0001 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:38,165-Speed 6300.92 samples/sec Loss 39.4640 LearningRate 0.0001 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:41,413-Speed 6306.87 samples/sec Loss 39.4446 LearningRate 0.0001 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:44,667-Speed 6295.28 samples/sec Loss 39.4474 LearningRate 0.0001 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:47,916-Speed 6305.31 samples/sec Loss 39.4769 LearningRate 0.0001 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:51,162-Speed 6310.14 samples/sec Loss 39.4651 LearningRate 0.0001 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:54,417-Speed 6292.83 samples/sec Loss 39.4673 LearningRate 0.0001 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 8 Required: 75 hours Training: 2022-03-31 16:17:57,660-Speed 6316.58 samples/sec Loss 39.4852 LearningRate 0.0001 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:00,908-Speed 6307.78 samples/sec Loss 39.4759 LearningRate 0.0001 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:04,168-Speed 6283.91 samples/sec Loss 39.4820 LearningRate 0.0001 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:07,421-Speed 6297.29 samples/sec Loss 39.5005 LearningRate 0.0001 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:10,673-Speed 6298.51 samples/sec Loss 39.4789 LearningRate 0.0001 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:13,919-Speed 6310.40 samples/sec Loss 39.4677 LearningRate 0.0001 
Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:17,163-Speed 6314.00 samples/sec Loss 39.4613 LearningRate 0.0001 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:20,404-Speed 6321.59 samples/sec Loss 39.5103 LearningRate 0.0001 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:23,646-Speed 6317.10 samples/sec Loss 39.4783 LearningRate 0.0001 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:26,893-Speed 6311.13 samples/sec Loss 39.4712 LearningRate 0.0001 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 16 Required: 75 hours Training: 2022-03-31 16:18:30,130-Speed 6327.18 samples/sec Loss 39.4828 LearningRate 0.0001 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:33,374-Speed 6314.70 samples/sec Loss 39.4859 LearningRate 0.0001 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:36,610-Speed 6330.14 samples/sec Loss 39.4676 LearningRate 0.0001 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:39,850-Speed 6322.46 samples/sec Loss 39.4826 LearningRate 0.0001 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:43,100-Speed 6304.58 samples/sec Loss 39.5131 LearningRate 0.0001 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:46,338-Speed 6324.46 samples/sec Loss 39.4864 LearningRate 0.0001 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:49,604-Speed 6271.97 samples/sec Loss 39.4710 LearningRate 0.0001 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:52,931-Speed 6157.32 samples/sec Loss 39.5172 LearningRate 0.0001 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 
16:18:56,172-Speed 6321.37 samples/sec Loss 39.4604 LearningRate 0.0001 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:18:59,414-Speed 6316.99 samples/sec Loss 39.4716 LearningRate 0.0001 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 32 Required: 75 hours Training: 2022-03-31 16:19:02,659-Speed 6314.16 samples/sec Loss 39.4570 LearningRate 0.0001 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:05,901-Speed 6318.21 samples/sec Loss 39.4591 LearningRate 0.0001 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:09,139-Speed 6326.07 samples/sec Loss 39.4993 LearningRate 0.0001 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:12,380-Speed 6320.71 samples/sec Loss 39.4654 LearningRate 0.0001 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:15,622-Speed 6317.17 samples/sec Loss 39.4782 LearningRate 0.0001 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:18,861-Speed 6325.65 samples/sec Loss 39.5041 LearningRate 0.0001 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:22,101-Speed 6322.55 samples/sec Loss 39.4880 LearningRate 0.0001 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:25,345-Speed 6313.17 samples/sec Loss 39.4775 LearningRate 0.0001 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:28,595-Speed 6304.70 samples/sec Loss 39.5032 LearningRate 0.0001 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:31,842-Speed 6308.33 samples/sec Loss 39.4876 LearningRate 0.0001 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:35,086-Speed 6315.38 samples/sec Loss 39.5185 LearningRate 0.0001 Epoch: 0 Global 
Step: 6450 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:19:38,371-Speed 6236.13 samples/sec Loss 39.5112 LearningRate 0.0001 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:19:41,618-Speed 6309.49 samples/sec Loss 39.5284 LearningRate 0.0001 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:44,930-Speed 6184.83 samples/sec Loss 39.4982 LearningRate 0.0001 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:48,265-Speed 6141.82 samples/sec Loss 39.5200 LearningRate 0.0001 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:51,519-Speed 6295.46 samples/sec Loss 39.5267 LearningRate 0.0001 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:54,771-Speed 6298.39 samples/sec Loss 39.5415 LearningRate 0.0001 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:19:58,018-Speed 6309.67 samples/sec Loss 39.7692 LearningRate 0.0001 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:20:01,257-Speed 6323.35 samples/sec Loss 39.5882 LearningRate 0.0001 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:20:04,502-Speed 6312.24 samples/sec Loss 39.6110 LearningRate 0.0001 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:20:07,745-Speed 6317.88 samples/sec Loss 39.6172 LearningRate 0.0001 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:20:10,991-Speed 6310.02 samples/sec Loss 39.6137 LearningRate 0.0001 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 64 Required: 75 hours Training: 2022-03-31 16:20:14,233-Speed 6318.21 samples/sec Loss 39.6050 LearningRate 0.0001 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 
16:20:17,479-Speed 6309.95 samples/sec Loss 39.5637 LearningRate 0.0001 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:20:20,722-Speed 6316.58 samples/sec Loss 39.5613 LearningRate 0.0001 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:20:23,977-Speed 6294.56 samples/sec Loss 39.5373 LearningRate 0.0001 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:20:27,217-Speed 6322.22 samples/sec Loss 39.5473 LearningRate 0.0001 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:20:30,460-Speed 6315.06 samples/sec Loss 39.5209 LearningRate 0.0001 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 128 Required: 75 hours Training: 2022-03-31 16:20:33,706-Speed 6312.35 samples/sec Loss 39.5091 LearningRate 0.0001 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 128 Required: 74 hours Training: 2022-03-31 16:20:36,947-Speed 6320.65 samples/sec Loss 39.5315 LearningRate 0.0001 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 128 Required: 74 hours Training: 2022-03-31 16:20:40,191-Speed 6315.07 samples/sec Loss 39.5204 LearningRate 0.0001 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 128 Required: 74 hours Training: 2022-03-31 16:20:43,433-Speed 6318.58 samples/sec Loss 39.4942 LearningRate 0.0001 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 128 Required: 74 hours Training: 2022-03-31 16:20:46,676-Speed 6315.75 samples/sec Loss 39.4674 LearningRate 0.0001 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:20:49,916-Speed 6322.88 samples/sec Loss 39.5111 LearningRate 0.0001 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:20:53,163-Speed 6308.80 samples/sec Loss 39.5419 LearningRate 0.0001 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:20:56,408-Speed 6313.40 samples/sec Loss 39.5711 LearningRate 0.0001 Epoch: 
0 Global Step: 6700 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:20:59,682-Speed 6255.80 samples/sec Loss 39.5485 LearningRate 0.0001 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:02,951-Speed 6266.29 samples/sec Loss 39.5750 LearningRate 0.0001 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:06,197-Speed 6311.70 samples/sec Loss 39.5868 LearningRate 0.0001 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:09,445-Speed 6305.09 samples/sec Loss 39.5574 LearningRate 0.0001 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:12,691-Speed 6312.37 samples/sec Loss 39.5569 LearningRate 0.0001 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:15,934-Speed 6316.20 samples/sec Loss 39.5345 LearningRate 0.0001 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 256 Required: 74 hours Training: 2022-03-31 16:21:19,181-Speed 6307.74 samples/sec Loss 39.4973 LearningRate 0.0001 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:22,428-Speed 6309.24 samples/sec Loss 39.4712 LearningRate 0.0001 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:25,675-Speed 6309.41 samples/sec Loss 39.4534 LearningRate 0.0001 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:28,919-Speed 6314.13 samples/sec Loss 39.4152 LearningRate 0.0001 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:32,164-Speed 6312.52 samples/sec Loss 39.4546 LearningRate 0.0001 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:35,415-Speed 6301.62 samples/sec Loss 39.4286 LearningRate 0.0001 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 512 Required: 74 hours Training: 
2022-03-31 16:21:38,661-Speed 6309.36 samples/sec Loss 39.3933 LearningRate 0.0001 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:41,906-Speed 6313.04 samples/sec Loss 39.4332 LearningRate 0.0001 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:45,152-Speed 6312.44 samples/sec Loss 39.4654 LearningRate 0.0001 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:48,398-Speed 6309.67 samples/sec Loss 39.4010 LearningRate 0.0001 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 512 Required: 74 hours Training: 2022-03-31 16:21:51,651-Speed 6297.62 samples/sec Loss 39.4080 LearningRate 0.0001 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:21:54,898-Speed 6309.23 samples/sec Loss 39.3882 LearningRate 0.0001 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:21:58,146-Speed 6306.93 samples/sec Loss 39.4142 LearningRate 0.0001 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:01,391-Speed 6312.23 samples/sec Loss 39.4086 LearningRate 0.0001 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:04,640-Speed 6305.20 samples/sec Loss 39.3939 LearningRate 0.0001 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:07,886-Speed 6311.16 samples/sec Loss 39.4066 LearningRate 0.0001 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:11,139-Speed 6296.61 samples/sec Loss 39.4303 LearningRate 0.0001 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:14,387-Speed 6306.06 samples/sec Loss 39.4529 LearningRate 0.0001 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:17,635-Speed 6308.48 samples/sec Loss 39.3966 
LearningRate 0.0001 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:20,885-Speed 6302.61 samples/sec Loss 39.4104 LearningRate 0.0001 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 1024 Required: 74 hours Training: 2022-03-31 16:22:24,131-Speed 6309.40 samples/sec Loss 39.3976 LearningRate 0.0001 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:27,382-Speed 6302.30 samples/sec Loss 39.4013 LearningRate 0.0001 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:30,632-Speed 6301.90 samples/sec Loss 39.3822 LearningRate 0.0001 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:33,878-Speed 6310.82 samples/sec Loss 39.3698 LearningRate 0.0001 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:37,131-Speed 6298.09 samples/sec Loss 39.3856 LearningRate 0.0001 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:40,380-Speed 6304.17 samples/sec Loss 39.3965 LearningRate 0.0001 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:43,635-Speed 6292.70 samples/sec Loss 39.3941 LearningRate 0.0001 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:46,896-Speed 6283.26 samples/sec Loss 39.3666 LearningRate 0.0001 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:50,146-Speed 6302.00 samples/sec Loss 39.3454 LearningRate 0.0001 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:53,394-Speed 6308.37 samples/sec Loss 39.3625 LearningRate 0.0001 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 16:22:56,644-Speed 6302.31 samples/sec Loss 39.3982 LearningRate 0.0001 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 
4096 Required: 74 hours Training: 2022-03-31 16:22:59,910-Speed 6271.35 samples/sec Loss 39.3830 LearningRate 0.0001 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:03,159-Speed 6306.41 samples/sec Loss 39.3637 LearningRate 0.0001 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:06,406-Speed 6307.32 samples/sec Loss 39.3321 LearningRate 0.0001 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:09,654-Speed 6307.90 samples/sec Loss 39.3784 LearningRate 0.0001 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:12,903-Speed 6304.68 samples/sec Loss 39.3354 LearningRate 0.0001 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:16,153-Speed 6303.61 samples/sec Loss 39.3218 LearningRate 0.0001 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:19,400-Speed 6308.65 samples/sec Loss 39.3324 LearningRate 0.0001 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:22,646-Speed 6310.53 samples/sec Loss 39.3524 LearningRate 0.0001 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:25,899-Speed 6296.05 samples/sec Loss 39.3500 LearningRate 0.0001 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 16:23:29,153-Speed 6296.27 samples/sec Loss 39.3037 LearningRate 0.0001 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:32,401-Speed 6305.95 samples/sec Loss 39.2886 LearningRate 0.0001 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:35,653-Speed 6299.48 samples/sec Loss 39.2902 LearningRate 0.0001 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:38,904-Speed 
6301.33 samples/sec Loss 39.2866 LearningRate 0.0001 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:42,159-Speed 6292.06 samples/sec Loss 39.2740 LearningRate 0.0001 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:45,405-Speed 6311.30 samples/sec Loss 39.2657 LearningRate 0.0001 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:48,654-Speed 6304.06 samples/sec Loss 39.3040 LearningRate 0.0001 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:51,908-Speed 6297.70 samples/sec Loss 39.2752 LearningRate 0.0001 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:55,156-Speed 6305.87 samples/sec Loss 39.2874 LearningRate 0.0001 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:23:58,404-Speed 6306.10 samples/sec Loss 39.2714 LearningRate 0.0001 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:24:01,655-Speed 6302.63 samples/sec Loss 39.2738 LearningRate 0.0001 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:04,907-Speed 6298.78 samples/sec Loss 39.2939 LearningRate 0.0001 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:08,173-Speed 6272.52 samples/sec Loss 39.2895 LearningRate 0.0001 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:11,423-Speed 6301.63 samples/sec Loss 39.2610 LearningRate 0.0001 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:14,669-Speed 6311.24 samples/sec Loss 39.2592 LearningRate 0.0001 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:17,919-Speed 6303.23 samples/sec Loss 39.1848 LearningRate 0.0001 Epoch: 0 
Global Step: 7320 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:21,167-Speed 6306.73 samples/sec Loss 39.2842 LearningRate 0.0001 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:24,414-Speed 6309.07 samples/sec Loss 39.2623 LearningRate 0.0001 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:27,666-Speed 6298.76 samples/sec Loss 39.2380 LearningRate 0.0001 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:30,919-Speed 6297.21 samples/sec Loss 39.2124 LearningRate 0.0001 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:34,184-Speed 6273.59 samples/sec Loss 39.2486 LearningRate 0.0001 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:24:37,419-Speed 6332.48 samples/sec Loss 39.2599 LearningRate 0.0001 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:40,669-Speed 6302.84 samples/sec Loss 39.2228 LearningRate 0.0001 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:43,916-Speed 6309.89 samples/sec Loss 39.2114 LearningRate 0.0001 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:47,166-Speed 6303.10 samples/sec Loss 39.2061 LearningRate 0.0001 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:50,411-Speed 6311.91 samples/sec Loss 39.2072 LearningRate 0.0001 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:53,661-Speed 6304.32 samples/sec Loss 39.1984 LearningRate 0.0001 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:24:56,896-Speed 6332.22 samples/sec Loss 39.1934 LearningRate 0.0001 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 8192 Required: 74 
hours Training: 2022-03-31 16:25:00,165-Speed 6265.11 samples/sec Loss 39.2147 LearningRate 0.0001 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:03,444-Speed 6248.69 samples/sec Loss 39.2234 LearningRate 0.0001 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:06,703-Speed 6287.23 samples/sec Loss 39.2081 LearningRate 0.0001 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:09,950-Speed 6308.16 samples/sec Loss 39.2152 LearningRate 0.0001 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:13,201-Speed 6300.63 samples/sec Loss 39.2247 LearningRate 0.0001 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:16,446-Speed 6313.62 samples/sec Loss 39.2236 LearningRate 0.0001 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:19,693-Speed 6308.83 samples/sec Loss 39.2013 LearningRate 0.0001 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:22,944-Speed 6301.40 samples/sec Loss 39.2048 LearningRate 0.0001 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:26,192-Speed 6305.14 samples/sec Loss 39.2034 LearningRate 0.0001 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:29,438-Speed 6311.40 samples/sec Loss 39.1847 LearningRate 0.0001 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:25:32,685-Speed 6308.95 samples/sec Loss 39.1766 LearningRate 0.0001 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:25:35,924-Speed 6323.89 samples/sec Loss 39.1984 LearningRate 0.0001 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:39,171-Speed 6309.31 samples/sec 
Loss 39.1471 LearningRate 0.0001 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:42,418-Speed 6308.05 samples/sec Loss 39.2301 LearningRate 0.0001 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:45,672-Speed 6294.88 samples/sec Loss 39.1655 LearningRate 0.0001 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:48,921-Speed 6305.36 samples/sec Loss 39.1923 LearningRate 0.0001 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:52,183-Speed 6281.00 samples/sec Loss 39.1325 LearningRate 0.0001 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:55,438-Speed 6292.39 samples/sec Loss 39.2001 LearningRate 0.0001 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:25:58,698-Speed 6284.14 samples/sec Loss 39.1452 LearningRate 0.0001 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:26:01,952-Speed 6294.43 samples/sec Loss 39.1554 LearningRate 0.0001 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:26:05,203-Speed 6302.94 samples/sec Loss 39.1516 LearningRate 0.0001 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:26:08,453-Speed 6301.75 samples/sec Loss 39.1694 LearningRate 0.0001 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:11,702-Speed 6304.14 samples/sec Loss 39.1794 LearningRate 0.0001 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:14,949-Speed 6309.12 samples/sec Loss 39.1414 LearningRate 0.0001 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:18,212-Speed 6279.35 samples/sec Loss 39.1390 LearningRate 0.0001 Epoch: 0 Global Step: 7690 Fp16 
Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:21,462-Speed 6301.80 samples/sec Loss 39.1529 LearningRate 0.0001 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:24,718-Speed 6291.16 samples/sec Loss 39.1240 LearningRate 0.0001 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:27,968-Speed 6304.12 samples/sec Loss 39.1337 LearningRate 0.0001 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:31,215-Speed 6307.58 samples/sec Loss 39.1210 LearningRate 0.0001 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:34,469-Speed 6295.16 samples/sec Loss 39.1110 LearningRate 0.0001 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:37,720-Speed 6307.64 samples/sec Loss 39.0978 LearningRate 0.0001 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:26:40,969-Speed 6302.99 samples/sec Loss 39.0957 LearningRate 0.0001 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:26:44,219-Speed 6303.32 samples/sec Loss 39.1278 LearningRate 0.0001 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:26:47,473-Speed 6295.31 samples/sec Loss 39.1268 LearningRate 0.0001 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:26:50,724-Speed 6301.15 samples/sec Loss 39.1205 LearningRate 0.0001 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:26:53,975-Speed 6306.93 samples/sec Loss 39.1213 LearningRate 0.0001 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:26:57,220-Speed 6311.56 samples/sec Loss 39.1105 LearningRate 0.0001 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 
2022-03-31 16:27:00,466-Speed 6310.84 samples/sec Loss 39.1651 LearningRate 0.0001 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:03,717-Speed 6301.36 samples/sec Loss 39.1290 LearningRate 0.0001 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:06,954-Speed 6329.63 samples/sec Loss 39.1156 LearningRate 0.0001 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:10,199-Speed 6312.13 samples/sec Loss 39.1055 LearningRate 0.0001 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:13,449-Speed 6301.94 samples/sec Loss 39.1061 LearningRate 0.0001 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:16,697-Speed 6307.67 samples/sec Loss 39.1103 LearningRate 0.0001 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:19,942-Speed 6311.94 samples/sec Loss 39.0811 LearningRate 0.0001 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:23,193-Speed 6301.46 samples/sec Loss 39.0631 LearningRate 0.0001 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:26,442-Speed 6305.07 samples/sec Loss 39.0855 LearningRate 0.0001 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:29,691-Speed 6305.93 samples/sec Loss 39.0892 LearningRate 0.0001 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:32,939-Speed 6306.38 samples/sec Loss 39.0951 LearningRate 0.0001 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:36,187-Speed 6307.42 samples/sec Loss 39.0888 LearningRate 0.0001 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:27:39,458-Speed 6260.70 samples/sec Loss 
39.0882 LearningRate 0.0001 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:42,712-Speed 6296.68 samples/sec Loss 39.0848 LearningRate 0.0001 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:45,960-Speed 6307.08 samples/sec Loss 39.1061 LearningRate 0.0001 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:49,207-Speed 6308.55 samples/sec Loss 39.0970 LearningRate 0.0001 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:52,467-Speed 6290.07 samples/sec Loss 39.0649 LearningRate 0.0001 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:55,723-Speed 6291.62 samples/sec Loss 39.0896 LearningRate 0.0001 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:27:58,997-Speed 6257.63 samples/sec Loss 39.0438 LearningRate 0.0001 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:28:02,253-Speed 6290.45 samples/sec Loss 39.0586 LearningRate 0.0001 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:28:05,510-Speed 6290.48 samples/sec Loss 39.0742 LearningRate 0.0001 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:28:08,760-Speed 6302.58 samples/sec Loss 39.0574 LearningRate 0.0001 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:28:12,010-Speed 6302.25 samples/sec Loss 39.0491 LearningRate 0.0001 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:15,265-Speed 6293.67 samples/sec Loss 39.0755 LearningRate 0.0001 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:18,514-Speed 6305.21 samples/sec Loss 39.0540 LearningRate 0.0001 Epoch: 0 Global Step: 8060 
Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:21,765-Speed 6301.98 samples/sec Loss 39.0641 LearningRate 0.0001 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:25,013-Speed 6306.68 samples/sec Loss 39.0323 LearningRate 0.0001 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:28,261-Speed 6305.32 samples/sec Loss 39.0457 LearningRate 0.0001 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:31,512-Speed 6301.77 samples/sec Loss 39.0231 LearningRate 0.0001 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:34,765-Speed 6296.99 samples/sec Loss 39.0580 LearningRate 0.0001 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:38,013-Speed 6306.54 samples/sec Loss 39.0414 LearningRate 0.0001 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:41,261-Speed 6307.52 samples/sec Loss 39.0244 LearningRate 0.0001 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:28:44,514-Speed 6296.15 samples/sec Loss 39.0425 LearningRate 0.0001 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:28:47,769-Speed 6294.03 samples/sec Loss 39.0145 LearningRate 0.0001 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:28:51,017-Speed 6306.03 samples/sec Loss 39.0238 LearningRate 0.0001 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:28:54,270-Speed 6298.80 samples/sec Loss 39.0185 LearningRate 0.0001 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:28:57,518-Speed 6305.85 samples/sec Loss 38.9753 LearningRate 0.0001 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 74 hours 
Training: 2022-03-31 16:29:00,767-Speed 6304.05 samples/sec Loss 39.0097 LearningRate 0.0001 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:29:03,989-Speed 6357.50 samples/sec Loss 39.0161 LearningRate 0.0001 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:07,255-Speed 6273.03 samples/sec Loss 39.0102 LearningRate 0.0001 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:10,508-Speed 6297.67 samples/sec Loss 39.0204 LearningRate 0.0001 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:13,756-Speed 6307.26 samples/sec Loss 39.0255 LearningRate 0.0001 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:17,005-Speed 6303.96 samples/sec Loss 38.9578 LearningRate 0.0001 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:20,264-Speed 6285.64 samples/sec Loss 39.0573 LearningRate 0.0001 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:23,510-Speed 6311.62 samples/sec Loss 38.9791 LearningRate 0.0001 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:26,761-Speed 6301.68 samples/sec Loss 39.0002 LearningRate 0.0001 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:30,012-Speed 6301.57 samples/sec Loss 38.9730 LearningRate 0.0001 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:33,260-Speed 6305.08 samples/sec Loss 38.9898 LearningRate 0.0001 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:36,513-Speed 6297.71 samples/sec Loss 38.9950 LearningRate 0.0001 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:29:39,764-Speed 6300.46 
samples/sec Loss 39.0279 LearningRate 0.0001 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:29:43,015-Speed 6302.77 samples/sec Loss 39.0098 LearningRate 0.0001 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:29:46,267-Speed 6298.26 samples/sec Loss 38.9765 LearningRate 0.0001 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:29:49,505-Speed 6325.49 samples/sec Loss 38.9858 LearningRate 0.0001 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:52,759-Speed 6296.15 samples/sec Loss 39.0406 LearningRate 0.0001 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:56,011-Speed 6299.04 samples/sec Loss 39.0041 LearningRate 0.0001 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:29:59,267-Speed 6291.22 samples/sec Loss 38.9553 LearningRate 0.0001 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:02,569-Speed 6204.25 samples/sec Loss 38.9438 LearningRate 0.0001 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:05,840-Speed 6262.29 samples/sec Loss 38.9072 LearningRate 0.0001 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:09,091-Speed 6299.52 samples/sec Loss 38.9236 LearningRate 0.0001 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:12,348-Speed 6290.34 samples/sec Loss 38.9509 LearningRate 0.0001 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:15,604-Speed 6292.22 samples/sec Loss 38.9767 LearningRate 0.0001 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:18,842-Speed 6326.83 samples/sec Loss 38.9656 LearningRate 0.0001 Epoch: 0 
Global Step: 8430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:22,099-Speed 6288.78 samples/sec Loss 38.9417 LearningRate 0.0001 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:25,345-Speed 6310.85 samples/sec Loss 38.8896 LearningRate 0.0001 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:28,622-Speed 6249.79 samples/sec Loss 38.9596 LearningRate 0.0001 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:31,876-Speed 6295.02 samples/sec Loss 38.9467 LearningRate 0.0001 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:35,124-Speed 6308.94 samples/sec Loss 38.9347 LearningRate 0.0001 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:38,375-Speed 6300.73 samples/sec Loss 38.9466 LearningRate 0.0001 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:41,690-Speed 6178.93 samples/sec Loss 38.9503 LearningRate 0.0001 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:44,936-Speed 6310.40 samples/sec Loss 38.9374 LearningRate 0.0001 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:48,184-Speed 6306.61 samples/sec Loss 38.9402 LearningRate 0.0001 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:51,438-Speed 6294.69 samples/sec Loss 38.9101 LearningRate 0.0001 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:30:54,671-Speed 6336.08 samples/sec Loss 38.9724 LearningRate 0.0001 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:30:57,915-Speed 6314.94 samples/sec Loss 38.9079 LearningRate 0.0001 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 16384 Required: 74 
hours Training: 2022-03-31 16:31:01,163-Speed 6306.52 samples/sec Loss 38.9204 LearningRate 0.0001 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:04,413-Speed 6304.36 samples/sec Loss 38.9385 LearningRate 0.0001 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:07,662-Speed 6304.22 samples/sec Loss 38.9152 LearningRate 0.0001 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:10,916-Speed 6295.34 samples/sec Loss 38.9226 LearningRate 0.0001 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:14,165-Speed 6304.13 samples/sec Loss 38.9959 LearningRate 0.0001 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:17,417-Speed 6299.80 samples/sec Loss 38.8712 LearningRate 0.0001 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:20,669-Speed 6299.45 samples/sec Loss 38.9345 LearningRate 0.0001 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:23,920-Speed 6300.92 samples/sec Loss 38.8870 LearningRate 0.0001 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:27,173-Speed 6297.39 samples/sec Loss 38.8981 LearningRate 0.0001 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:31:30,438-Speed 6274.57 samples/sec Loss 38.8953 LearningRate 0.0001 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:33,692-Speed 6295.22 samples/sec Loss 38.9293 LearningRate 0.0001 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:36,945-Speed 6297.52 samples/sec Loss 38.9127 LearningRate 0.0001 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:40,263-Speed 6172.30 
samples/sec Loss 38.8856 LearningRate 0.0001 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:43,521-Speed 6287.74 samples/sec Loss 38.8447 LearningRate 0.0001 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:46,768-Speed 6308.46 samples/sec Loss 38.8477 LearningRate 0.0001 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:50,018-Speed 6303.61 samples/sec Loss 38.8987 LearningRate 0.0001 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:53,266-Speed 6307.53 samples/sec Loss 38.8616 LearningRate 0.0001 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:56,511-Speed 6311.40 samples/sec Loss 38.8399 LearningRate 0.0001 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:31:59,760-Speed 6304.64 samples/sec Loss 38.8383 LearningRate 0.0001 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:32:03,010-Speed 6304.38 samples/sec Loss 38.8904 LearningRate 0.0001 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:06,259-Speed 6304.59 samples/sec Loss 38.8387 LearningRate 0.0001 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:09,519-Speed 6282.76 samples/sec Loss 38.8753 LearningRate 0.0001 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:12,775-Speed 6292.01 samples/sec Loss 38.8042 LearningRate 0.0001 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:16,030-Speed 6294.39 samples/sec Loss 38.8862 LearningRate 0.0001 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:19,275-Speed 6310.71 samples/sec Loss 38.9032 LearningRate 0.0001 Epoch: 0 
Global Step: 8800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:22,583-Speed 6192.83 samples/sec Loss 38.8383 LearningRate 0.0001 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:25,840-Speed 6289.04 samples/sec Loss 38.8176 LearningRate 0.0001 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:29,096-Speed 6296.36 samples/sec Loss 38.8578 LearningRate 0.0001 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:32,342-Speed 6311.03 samples/sec Loss 38.8145 LearningRate 0.0001 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:35,581-Speed 6324.22 samples/sec Loss 38.8033 LearningRate 0.0001 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:38,835-Speed 6294.41 samples/sec Loss 38.7963 LearningRate 0.0001 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:42,090-Speed 6293.78 samples/sec Loss 38.9439 LearningRate 0.0001 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:45,337-Speed 6308.16 samples/sec Loss 38.8632 LearningRate 0.0001 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:48,583-Speed 6312.74 samples/sec Loss 38.8722 LearningRate 0.0001 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:51,855-Speed 6260.21 samples/sec Loss 38.8197 LearningRate 0.0001 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:55,103-Speed 6305.82 samples/sec Loss 38.7515 LearningRate 0.0001 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:32:58,352-Speed 6305.48 samples/sec Loss 38.8232 LearningRate 0.0001 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 32768 Required: 74 
hours Training: 2022-03-31 16:33:01,599-Speed 6309.13 samples/sec Loss 38.7727 LearningRate 0.0001 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:04,862-Speed 6277.78 samples/sec Loss 38.8135 LearningRate 0.0001 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:08,124-Speed 6278.53 samples/sec Loss 38.8235 LearningRate 0.0001 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:11,397-Speed 6258.37 samples/sec Loss 38.8002 LearningRate 0.0001 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:14,650-Speed 6297.10 samples/sec Loss 38.7937 LearningRate 0.0001 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:17,898-Speed 6307.87 samples/sec Loss 38.8564 LearningRate 0.0001 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:21,148-Speed 6301.67 samples/sec Loss 38.8107 LearningRate 0.0001 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:24,399-Speed 6301.47 samples/sec Loss 38.7642 LearningRate 0.0001 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:27,645-Speed 6310.99 samples/sec Loss 38.7849 LearningRate 0.0001 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:30,898-Speed 6298.52 samples/sec Loss 38.7720 LearningRate 0.0001 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:34,150-Speed 6299.07 samples/sec Loss 38.7667 LearningRate 0.0001 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:33:37,421-Speed 6262.53 samples/sec Loss 38.8264 LearningRate 0.0001 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:40,675-Speed 6294.61 
samples/sec Loss 38.8077 LearningRate 0.0001 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:43,925-Speed 6302.75 samples/sec Loss 38.7401 LearningRate 0.0001 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:47,184-Speed 6286.41 samples/sec Loss 38.7997 LearningRate 0.0001 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:50,440-Speed 6290.65 samples/sec Loss 38.7978 LearningRate 0.0001 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:53,698-Speed 6286.87 samples/sec Loss 38.7717 LearningRate 0.0001 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:33:56,950-Speed 6299.30 samples/sec Loss 38.7390 LearningRate 0.0001 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:00,203-Speed 6297.44 samples/sec Loss 38.6999 LearningRate 0.0001 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:03,464-Speed 6286.35 samples/sec Loss 38.7235 LearningRate 0.0001 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:06,712-Speed 6306.61 samples/sec Loss 38.8089 LearningRate 0.0001 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:09,963-Speed 6301.16 samples/sec Loss 38.7675 LearningRate 0.0001 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:13,212-Speed 6304.22 samples/sec Loss 38.7416 LearningRate 0.0001 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:16,460-Speed 6306.21 samples/sec Loss 38.7652 LearningRate 0.0001 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:19,703-Speed 6316.13 samples/sec Loss 38.7513 LearningRate 0.0001 Epoch: 0 
Global Step: 9170 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:22,950-Speed 6309.64 samples/sec Loss 38.6897 LearningRate 0.0001 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:26,202-Speed 6299.67 samples/sec Loss 38.7144 LearningRate 0.0001 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:29,449-Speed 6307.42 samples/sec Loss 38.7383 LearningRate 0.0001 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:32,706-Speed 6290.51 samples/sec Loss 38.6958 LearningRate 0.0001 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:34:35,935-Speed 6344.33 samples/sec Loss 38.7130 LearningRate 0.0001 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:39,180-Speed 6313.90 samples/sec Loss 38.7418 LearningRate 0.0001 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:42,427-Speed 6307.06 samples/sec Loss 38.7114 LearningRate 0.0001 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:45,677-Speed 6304.79 samples/sec Loss 38.7399 LearningRate 0.0001 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:48,921-Speed 6313.06 samples/sec Loss 38.7739 LearningRate 0.0001 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:52,168-Speed 6309.96 samples/sec Loss 38.6991 LearningRate 0.0001 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:55,442-Speed 6255.64 samples/sec Loss 38.6916 LearningRate 0.0001 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:34:58,689-Speed 6309.53 samples/sec Loss 38.6993 LearningRate 0.0001 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 32768 Required: 74 
hours Training: 2022-03-31 16:35:01,950-Speed 6281.30 samples/sec Loss 38.7067 LearningRate 0.0001 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:35:05,198-Speed 6306.22 samples/sec Loss 38.6680 LearningRate 0.0001 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:35:08,446-Speed 6306.93 samples/sec Loss 38.6803 LearningRate 0.0001 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:35:11,694-Speed 6308.29 samples/sec Loss 38.7153 LearningRate 0.0001 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:35:14,941-Speed 6308.61 samples/sec Loss 38.6861 LearningRate 0.0001 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:35:18,188-Speed 6308.42 samples/sec Loss 38.6712 LearningRate 0.0001 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:35:21,440-Speed 6298.43 samples/sec Loss 38.6991 LearningRate 0.0001 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:35:24,684-Speed 6314.31 samples/sec Loss 38.7489 LearningRate 0.0001 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:35:27,916-Speed 6339.11 samples/sec Loss 38.7660 LearningRate 0.0001 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:31,169-Speed 6296.87 samples/sec Loss 38.6934 LearningRate 0.0001 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:34,423-Speed 6295.53 samples/sec Loss 38.6678 LearningRate 0.0001 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:37,670-Speed 6308.27 samples/sec Loss 38.6411 LearningRate 0.0001 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:40,939-Speed 6267.79 
samples/sec Loss 38.6613 LearningRate 0.0001 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:44,198-Speed 6285.73 samples/sec Loss 38.6579 LearningRate 0.0001 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:47,446-Speed 6306.19 samples/sec Loss 38.6750 LearningRate 0.0001 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:50,697-Speed 6301.66 samples/sec Loss 38.6487 LearningRate 0.0001 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:53,944-Speed 6308.37 samples/sec Loss 38.6422 LearningRate 0.0001 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:35:57,197-Speed 6296.99 samples/sec Loss 38.6326 LearningRate 0.0001 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:36:00,495-Speed 6210.36 samples/sec Loss 38.6602 LearningRate 0.0001 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:03,748-Speed 6297.48 samples/sec Loss 38.6630 LearningRate 0.0001 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:07,004-Speed 6292.33 samples/sec Loss 38.6242 LearningRate 0.0001 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:10,258-Speed 6295.01 samples/sec Loss 38.6012 LearningRate 0.0001 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:13,508-Speed 6302.09 samples/sec Loss 38.6565 LearningRate 0.0001 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:16,758-Speed 6303.58 samples/sec Loss 38.7648 LearningRate 0.0001 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:20,005-Speed 6309.43 samples/sec Loss 38.6603 LearningRate 0.0001 Epoch: 0 
Global Step: 9540 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:23,254-Speed 6304.21 samples/sec Loss 38.5882 LearningRate 0.0001 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:26,506-Speed 6299.72 samples/sec Loss 38.6513 LearningRate 0.0001 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:29,756-Speed 6302.96 samples/sec Loss 38.6119 LearningRate 0.0001 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:33,003-Speed 6307.09 samples/sec Loss 38.5991 LearningRate 0.0001 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:36,260-Speed 6289.16 samples/sec Loss 38.6412 LearningRate 0.0001 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:39,508-Speed 6306.93 samples/sec Loss 38.6153 LearningRate 0.0001 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:42,755-Speed 6309.88 samples/sec Loss 38.6420 LearningRate 0.0001 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:46,004-Speed 6305.55 samples/sec Loss 38.6608 LearningRate 0.0001 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:49,252-Speed 6306.43 samples/sec Loss 38.6468 LearningRate 0.0001 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:52,496-Speed 6314.80 samples/sec Loss 38.6541 LearningRate 0.0001 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:36:55,730-Speed 6333.65 samples/sec Loss 38.7263 LearningRate 0.0001 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:36:58,979-Speed 6308.49 samples/sec Loss 38.6437 LearningRate 0.0001 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 32768 Required: 74 
hours Training: 2022-03-31 16:37:02,229-Speed 6302.18 samples/sec Loss 38.6328 LearningRate 0.0001 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:05,480-Speed 6301.31 samples/sec Loss 38.6134 LearningRate 0.0001 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:08,731-Speed 6300.85 samples/sec Loss 38.6633 LearningRate 0.0001 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:11,979-Speed 6306.86 samples/sec Loss 38.6747 LearningRate 0.0001 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:15,226-Speed 6308.51 samples/sec Loss 38.6432 LearningRate 0.0001 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:18,473-Speed 6308.87 samples/sec Loss 38.5922 LearningRate 0.0001 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:21,718-Speed 6311.94 samples/sec Loss 38.5596 LearningRate 0.0001 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:24,975-Speed 6289.38 samples/sec Loss 38.5982 LearningRate 0.0001 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:37:28,229-Speed 6295.39 samples/sec Loss 38.6180 LearningRate 0.0001 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:31,476-Speed 6308.86 samples/sec Loss 38.6061 LearningRate 0.0001 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:34,723-Speed 6309.46 samples/sec Loss 38.5799 LearningRate 0.0001 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:37,967-Speed 6315.23 samples/sec Loss 38.5520 LearningRate 0.0001 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:41,216-Speed 6304.74 
samples/sec Loss 38.5586 LearningRate 0.0001 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:44,461-Speed 6311.72 samples/sec Loss 38.5707 LearningRate 0.0001 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:47,710-Speed 6306.11 samples/sec Loss 38.5752 LearningRate 0.0001 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:50,961-Speed 6300.19 samples/sec Loss 38.6248 LearningRate 0.0001 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:54,211-Speed 6304.36 samples/sec Loss 38.5776 LearningRate 0.0001 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:37:57,458-Speed 6309.23 samples/sec Loss 38.6139 LearningRate 0.0001 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:00,706-Speed 6306.33 samples/sec Loss 38.6079 LearningRate 0.0001 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:38:03,956-Speed 6302.56 samples/sec Loss 38.5447 LearningRate 0.0001 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:38:07,204-Speed 6307.42 samples/sec Loss 38.5771 LearningRate 0.0001 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:38:10,438-Speed 6333.74 samples/sec Loss 38.5373 LearningRate 0.0001 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:13,683-Speed 6313.53 samples/sec Loss 38.5642 LearningRate 0.0001 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:16,940-Speed 6288.99 samples/sec Loss 38.5461 LearningRate 0.0001 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:20,173-Speed 6335.52 samples/sec Loss 38.5903 LearningRate 0.0001 Epoch: 
0 Global Step: 9910 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:23,419-Speed 6310.32 samples/sec Loss 38.5526 LearningRate 0.0001 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:26,667-Speed 6307.54 samples/sec Loss 38.5662 LearningRate 0.0001 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:29,919-Speed 6298.51 samples/sec Loss 38.5255 LearningRate 0.0001 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:33,166-Speed 6308.81 samples/sec Loss 38.5184 LearningRate 0.0001 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:36,423-Speed 6288.07 samples/sec Loss 38.5537 LearningRate 0.0001 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:39,673-Speed 6304.42 samples/sec Loss 38.5514 LearningRate 0.0001 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:42,926-Speed 6296.10 samples/sec Loss 38.5254 LearningRate 0.0001 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:46,174-Speed 6307.48 samples/sec Loss 38.5324 LearningRate 0.0001 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:49,420-Speed 6311.39 samples/sec Loss 38.5059 LearningRate 0.0001 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:38:52,670-Speed 6304.18 samples/sec Loss 38.4922 LearningRate 0.0001 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:55,917-Speed 6307.51 samples/sec Loss 38.5447 LearningRate 0.0001 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:38:59,166-Speed 6306.16 samples/sec Loss 38.5443 LearningRate 0.0001 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 65536 
Required: 74 hours Training: 2022-03-31 16:39:02,414-Speed 6306.92 samples/sec Loss 38.4826 LearningRate 0.0001 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:39:05,660-Speed 6310.66 samples/sec Loss 38.5200 LearningRate 0.0001 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:39:08,911-Speed 6299.50 samples/sec Loss 38.4959 LearningRate 0.0001 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:39:12,159-Speed 6307.72 samples/sec Loss 38.5115 LearningRate 0.0001 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:39:15,391-Speed 6338.54 samples/sec Loss 38.5246 LearningRate 0.0001 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:18,634-Speed 6315.74 samples/sec Loss 38.4687 LearningRate 0.0001 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:21,888-Speed 6295.88 samples/sec Loss 38.4767 LearningRate 0.0001 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:25,138-Speed 6303.12 samples/sec Loss 38.4828 LearningRate 0.0001 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:28,401-Speed 6277.34 samples/sec Loss 38.5047 LearningRate 0.0001 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:31,651-Speed 6303.21 samples/sec Loss 38.5106 LearningRate 0.0001 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:34,902-Speed 6301.06 samples/sec Loss 38.4987 LearningRate 0.0001 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:38,200-Speed 6212.95 samples/sec Loss 38.4699 LearningRate 0.0001 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
16:39:41,454-Speed 6296.06 samples/sec Loss 38.4595 LearningRate 0.0001 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:44,739-Speed 6236.56 samples/sec Loss 38.4658 LearningRate 0.0001 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:47,990-Speed 6300.21 samples/sec Loss 38.4898 LearningRate 0.0001 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:39:51,225-Speed 6332.48 samples/sec Loss 38.4607 LearningRate 0.0001 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:54,472-Speed 6309.07 samples/sec Loss 38.5477 LearningRate 0.0001 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:39:57,720-Speed 6307.85 samples/sec Loss 38.4644 LearningRate 0.0001 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:00,994-Speed 6256.72 samples/sec Loss 38.4613 LearningRate 0.0001 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:04,237-Speed 6315.63 samples/sec Loss 38.4969 LearningRate 0.0001 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:07,471-Speed 6334.94 samples/sec Loss 38.5211 LearningRate 0.0001 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:10,716-Speed 6312.29 samples/sec Loss 38.5242 LearningRate 0.0001 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:13,962-Speed 6313.31 samples/sec Loss 38.5413 LearningRate 0.0001 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:17,207-Speed 6313.23 samples/sec Loss 38.4300 LearningRate 0.0001 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:20,453-Speed 6309.47 samples/sec Loss 
38.4490 LearningRate 0.0001 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:23,697-Speed 6313.98 samples/sec Loss 38.4764 LearningRate 0.0001 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:26,948-Speed 6300.91 samples/sec Loss 38.4676 LearningRate 0.0001 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:30,222-Speed 6258.34 samples/sec Loss 38.4430 LearningRate 0.0001 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:33,468-Speed 6308.72 samples/sec Loss 38.4112 LearningRate 0.0001 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:36,719-Speed 6300.99 samples/sec Loss 38.4507 LearningRate 0.0001 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:40:39,966-Speed 6310.54 samples/sec Loss 38.4530 LearningRate 0.0001 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:43,213-Speed 6307.24 samples/sec Loss 38.5028 LearningRate 0.0001 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:46,471-Speed 6288.87 samples/sec Loss 38.4456 LearningRate 0.0001 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:49,785-Speed 6181.21 samples/sec Loss 38.4113 LearningRate 0.0001 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:53,030-Speed 6311.86 samples/sec Loss 38.4461 LearningRate 0.0001 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:56,279-Speed 6304.52 samples/sec Loss 38.4760 LearningRate 0.0001 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:40:59,529-Speed 6305.00 samples/sec Loss 38.4284 LearningRate 0.0001 Epoch: 0 Global 
Step: 10400 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:02,780-Speed 6301.34 samples/sec Loss 38.4264 LearningRate 0.0001 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:06,026-Speed 6309.33 samples/sec Loss 38.4118 LearningRate 0.0001 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:09,277-Speed 6301.14 samples/sec Loss 38.4014 LearningRate 0.0001 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:12,509-Speed 6337.68 samples/sec Loss 38.5403 LearningRate 0.0001 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:15,758-Speed 6306.27 samples/sec Loss 38.4240 LearningRate 0.0001 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:19,004-Speed 6309.74 samples/sec Loss 38.3984 LearningRate 0.0001 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:22,254-Speed 6303.38 samples/sec Loss 38.5346 LearningRate 0.0001 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:25,598-Speed 6125.80 samples/sec Loss 38.4898 LearningRate 0.0001 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:28,886-Speed 6229.35 samples/sec Loss 38.4728 LearningRate 0.0001 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:32,144-Speed 6288.76 samples/sec Loss 38.5077 LearningRate 0.0001 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:35,401-Speed 6288.63 samples/sec Loss 38.4534 LearningRate 0.0001 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:41:38,656-Speed 6294.04 samples/sec Loss 38.4186 LearningRate 0.0001 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 16384 
Required: 74 hours Training: 2022-03-31 16:41:41,903-Speed 6308.97 samples/sec Loss 38.3919 LearningRate 0.0001 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:41:45,150-Speed 6309.09 samples/sec Loss 38.3682 LearningRate 0.0001 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:41:48,393-Speed 6315.06 samples/sec Loss 38.3702 LearningRate 0.0001 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:41:51,642-Speed 6304.93 samples/sec Loss 38.3899 LearningRate 0.0001 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:41:54,893-Speed 6301.39 samples/sec Loss 38.4938 LearningRate 0.0001 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:41:58,140-Speed 6309.08 samples/sec Loss 38.4053 LearningRate 0.0001 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:01,391-Speed 6301.13 samples/sec Loss 38.3771 LearningRate 0.0001 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:04,638-Speed 6309.89 samples/sec Loss 38.3657 LearningRate 0.0001 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:07,884-Speed 6310.17 samples/sec Loss 38.3262 LearningRate 0.0001 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:11,132-Speed 6307.38 samples/sec Loss 38.3494 LearningRate 0.0001 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:42:14,379-Speed 6308.29 samples/sec Loss 38.5120 LearningRate 0.0001 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:42:17,623-Speed 6314.87 samples/sec Loss 38.3538 LearningRate 0.0001 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
16:42:20,858-Speed 6332.39 samples/sec Loss 38.4412 LearningRate 0.0001 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:24,107-Speed 6304.94 samples/sec Loss 38.3604 LearningRate 0.0001 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:27,355-Speed 6306.89 samples/sec Loss 38.3667 LearningRate 0.0001 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:30,605-Speed 6302.65 samples/sec Loss 38.3162 LearningRate 0.0001 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:33,858-Speed 6295.35 samples/sec Loss 38.3007 LearningRate 0.0001 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:37,104-Speed 6311.39 samples/sec Loss 38.3580 LearningRate 0.0001 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:40,356-Speed 6298.79 samples/sec Loss 38.4001 LearningRate 0.0001 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:43,604-Speed 6307.17 samples/sec Loss 38.4522 LearningRate 0.0001 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:46,850-Speed 6311.04 samples/sec Loss 38.4324 LearningRate 0.0001 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:50,099-Speed 6306.05 samples/sec Loss 38.4042 LearningRate 0.0001 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:42:53,348-Speed 6304.33 samples/sec Loss 38.3303 LearningRate 0.0001 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:42:56,597-Speed 6304.46 samples/sec Loss 38.3167 LearningRate 0.0001 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:42:59,842-Speed 6311.86 samples/sec Loss 
38.3468 LearningRate 0.0001 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:03,090-Speed 6307.93 samples/sec Loss 38.3302 LearningRate 0.0001 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:06,341-Speed 6300.74 samples/sec Loss 38.3272 LearningRate 0.0001 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:09,588-Speed 6310.04 samples/sec Loss 38.3308 LearningRate 0.0001 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:12,837-Speed 6305.27 samples/sec Loss 38.2857 LearningRate 0.0001 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:16,084-Speed 6308.39 samples/sec Loss 38.3356 LearningRate 0.0001 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:19,331-Speed 6309.82 samples/sec Loss 38.3179 LearningRate 0.0001 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:22,578-Speed 6307.16 samples/sec Loss 38.3063 LearningRate 0.0001 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:43:25,827-Speed 6305.90 samples/sec Loss 38.3457 LearningRate 0.0001 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:29,080-Speed 6296.10 samples/sec Loss 38.2839 LearningRate 0.0001 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:32,343-Speed 6279.28 samples/sec Loss 38.2614 LearningRate 0.0001 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:35,592-Speed 6303.55 samples/sec Loss 38.3516 LearningRate 0.0001 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:38,840-Speed 6306.76 samples/sec Loss 38.3051 LearningRate 0.0001 Epoch: 0 Global 
Step: 10890 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:42,091-Speed 6301.24 samples/sec Loss 38.3053 LearningRate 0.0001 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:45,340-Speed 6304.42 samples/sec Loss 38.2875 LearningRate 0.0001 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:48,588-Speed 6306.76 samples/sec Loss 38.2695 LearningRate 0.0001 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:51,840-Speed 6299.36 samples/sec Loss 38.2922 LearningRate 0.0001 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:55,089-Speed 6305.52 samples/sec Loss 38.2715 LearningRate 0.0001 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:43:58,341-Speed 6298.67 samples/sec Loss 38.2733 LearningRate 0.0001 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:44:01,576-Speed 6332.79 samples/sec Loss 38.2648 LearningRate 0.0001 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:44:04,824-Speed 6306.35 samples/sec Loss 38.2808 LearningRate 0.0001 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:44:08,053-Speed 6343.76 samples/sec Loss 38.2918 LearningRate 0.0001 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:11,301-Speed 6308.45 samples/sec Loss 38.3623 LearningRate 0.0001 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:14,578-Speed 6250.45 samples/sec Loss 38.3052 LearningRate 0.0001 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:17,827-Speed 6305.74 samples/sec Loss 38.2448 LearningRate 0.0001 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 16:44:21,070-Speed 6315.34 samples/sec Loss 38.2914 LearningRate 0.0001 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:24,319-Speed 6305.71 samples/sec Loss 38.2920 LearningRate 0.0001 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:27,568-Speed 6304.06 samples/sec Loss 38.2506 LearningRate 0.0001 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:30,813-Speed 6313.14 samples/sec Loss 38.2254 LearningRate 0.0001 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:34,059-Speed 6311.49 samples/sec Loss 38.2426 LearningRate 0.0001 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:37,307-Speed 6306.92 samples/sec Loss 38.2658 LearningRate 0.0001 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:40,555-Speed 6306.87 samples/sec Loss 38.2909 LearningRate 0.0001 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:44:43,800-Speed 6311.74 samples/sec Loss 38.2957 LearningRate 0.0001 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:44:47,031-Speed 6340.79 samples/sec Loss 38.2543 LearningRate 0.0001 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:44:50,281-Speed 6302.68 samples/sec Loss 38.2256 LearningRate 0.0001 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:44:53,529-Speed 6306.25 samples/sec Loss 38.2314 LearningRate 0.0001 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:44:56,788-Speed 6285.66 samples/sec Loss 38.1939 LearningRate 0.0001 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 
16:45:00,034-Speed 6310.41 samples/sec Loss 38.2924 LearningRate 0.0001 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:03,286-Speed 6300.30 samples/sec Loss 38.2018 LearningRate 0.0001 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:06,530-Speed 6314.17 samples/sec Loss 38.2782 LearningRate 0.0001 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:09,779-Speed 6304.34 samples/sec Loss 38.2350 LearningRate 0.0001 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:13,024-Speed 6312.96 samples/sec Loss 38.2667 LearningRate 0.0001 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:16,346-Speed 6167.10 samples/sec Loss 38.2928 LearningRate 0.0001 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:19,600-Speed 6296.07 samples/sec Loss 38.2836 LearningRate 0.0001 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:45:22,833-Speed 6335.46 samples/sec Loss 38.3900 LearningRate 0.0001 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:26,080-Speed 6308.83 samples/sec Loss 38.3025 LearningRate 0.0001 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:29,332-Speed 6298.84 samples/sec Loss 38.2856 LearningRate 0.0001 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:32,580-Speed 6307.89 samples/sec Loss 38.3306 LearningRate 0.0001 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:35,822-Speed 6318.25 samples/sec Loss 38.2239 LearningRate 0.0001 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:39,068-Speed 6309.88 samples/sec Loss 
38.2806 LearningRate 0.0001 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:42,316-Speed 6307.64 samples/sec Loss 38.2193 LearningRate 0.0001 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:45,560-Speed 6313.36 samples/sec Loss 38.2697 LearningRate 0.0001 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:48,810-Speed 6303.51 samples/sec Loss 38.2193 LearningRate 0.0001 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:52,064-Speed 6295.93 samples/sec Loss 38.2236 LearningRate 0.0001 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:45:55,317-Speed 6295.73 samples/sec Loss 38.2152 LearningRate 0.0001 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:45:58,566-Speed 6305.28 samples/sec Loss 38.2203 LearningRate 0.0001 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:01,826-Speed 6284.62 samples/sec Loss 38.2208 LearningRate 0.0001 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:05,074-Speed 6305.17 samples/sec Loss 38.2189 LearningRate 0.0001 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:08,318-Speed 6314.90 samples/sec Loss 38.2347 LearningRate 0.0001 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:11,565-Speed 6309.15 samples/sec Loss 38.2980 LearningRate 0.0001 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:14,812-Speed 6308.31 samples/sec Loss 38.1936 LearningRate 0.0001 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:18,058-Speed 6311.24 samples/sec Loss 38.1690 LearningRate 0.0001 Epoch: 0 Global 
Step: 11380 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:21,289-Speed 6341.13 samples/sec Loss 38.2665 LearningRate 0.0001 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:24,542-Speed 6296.11 samples/sec Loss 38.2024 LearningRate 0.0001 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:27,788-Speed 6310.59 samples/sec Loss 38.2506 LearningRate 0.0001 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:31,033-Speed 6312.70 samples/sec Loss 38.1587 LearningRate 0.0001 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:34,284-Speed 6301.85 samples/sec Loss 38.1301 LearningRate 0.0001 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:37,533-Speed 6304.55 samples/sec Loss 38.1340 LearningRate 0.0001 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:40,780-Speed 6309.33 samples/sec Loss 38.1470 LearningRate 0.0001 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:44,028-Speed 6306.58 samples/sec Loss 38.1518 LearningRate 0.0001 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:47,274-Speed 6309.88 samples/sec Loss 38.1506 LearningRate 0.0001 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:50,524-Speed 6302.88 samples/sec Loss 38.0953 LearningRate 0.0001 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:46:53,770-Speed 6310.68 samples/sec Loss 38.1900 LearningRate 0.0001 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:46:57,016-Speed 6311.46 samples/sec Loss 38.0979 LearningRate 0.0001 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 16:47:00,271-Speed 6292.53 samples/sec Loss 38.0800 LearningRate 0.0001 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:03,522-Speed 6300.72 samples/sec Loss 38.1020 LearningRate 0.0001 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:06,773-Speed 6301.20 samples/sec Loss 38.1289 LearningRate 0.0001 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:10,022-Speed 6304.84 samples/sec Loss 38.1032 LearningRate 0.0001 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:13,275-Speed 6297.70 samples/sec Loss 38.1173 LearningRate 0.0001 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:16,529-Speed 6295.37 samples/sec Loss 38.0572 LearningRate 0.0001 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:19,782-Speed 6297.75 samples/sec Loss 38.0599 LearningRate 0.0001 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:23,029-Speed 6309.31 samples/sec Loss 38.0685 LearningRate 0.0001 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:26,260-Speed 6339.42 samples/sec Loss 38.0899 LearningRate 0.0001 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:29,508-Speed 6307.69 samples/sec Loss 38.0967 LearningRate 0.0001 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:32,754-Speed 6309.10 samples/sec Loss 38.1059 LearningRate 0.0001 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:36,000-Speed 6311.25 samples/sec Loss 38.0750 LearningRate 0.0001 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
16:47:39,253-Speed 6297.70 samples/sec Loss 38.1051 LearningRate 0.0001 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:42,501-Speed 6305.75 samples/sec Loss 38.0637 LearningRate 0.0001 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:45,752-Speed 6301.44 samples/sec Loss 38.0586 LearningRate 0.0001 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:48,998-Speed 6309.60 samples/sec Loss 38.0580 LearningRate 0.0001 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:52,244-Speed 6310.98 samples/sec Loss 38.0663 LearningRate 0.0001 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:55,496-Speed 6301.68 samples/sec Loss 38.0507 LearningRate 0.0001 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:47:58,745-Speed 6305.30 samples/sec Loss 38.0857 LearningRate 0.0001 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:01,979-Speed 6333.89 samples/sec Loss 38.0532 LearningRate 0.0001 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:05,230-Speed 6301.06 samples/sec Loss 38.0586 LearningRate 0.0001 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:08,474-Speed 6314.58 samples/sec Loss 38.0781 LearningRate 0.0001 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:11,725-Speed 6300.18 samples/sec Loss 38.0793 LearningRate 0.0001 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:14,977-Speed 6300.25 samples/sec Loss 38.0419 LearningRate 0.0001 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:18,225-Speed 6305.77 samples/sec Loss 
38.0442 LearningRate 0.0001 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:21,477-Speed 6299.76 samples/sec Loss 37.9891 LearningRate 0.0001 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:24,731-Speed 6296.71 samples/sec Loss 37.9842 LearningRate 0.0001 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:27,978-Speed 6308.93 samples/sec Loss 38.0258 LearningRate 0.0001 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:31,229-Speed 6300.95 samples/sec Loss 38.1307 LearningRate 0.0001 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:48:34,473-Speed 6314.70 samples/sec Loss 38.0320 LearningRate 0.0001 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:37,725-Speed 6297.86 samples/sec Loss 38.0139 LearningRate 0.0001 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:40,972-Speed 6309.81 samples/sec Loss 37.9849 LearningRate 0.0001 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:44,216-Speed 6314.28 samples/sec Loss 38.0262 LearningRate 0.0001 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:47,464-Speed 6307.33 samples/sec Loss 37.9960 LearningRate 0.0001 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:50,711-Speed 6307.35 samples/sec Loss 37.9912 LearningRate 0.0001 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:53,963-Speed 6299.89 samples/sec Loss 38.0177 LearningRate 0.0001 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:48:57,218-Speed 6293.21 samples/sec Loss 37.9947 LearningRate 0.0001 Epoch: 0 Global 
Step: 11870 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:49:00,467-Speed 6305.29 samples/sec Loss 37.9499 LearningRate 0.0001 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:49:03,713-Speed 6311.60 samples/sec Loss 37.9453 LearningRate 0.0001 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:49:06,963-Speed 6302.02 samples/sec Loss 37.9828 LearningRate 0.0001 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:49:10,208-Speed 6312.92 samples/sec Loss 37.9467 LearningRate 0.0001 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 74 hours Training: 2022-03-31 16:49:13,446-Speed 6326.12 samples/sec Loss 37.9620 LearningRate 0.0001 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:49:16,674-Speed 6346.31 samples/sec Loss 38.0133 LearningRate 0.0001 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:19,921-Speed 6308.47 samples/sec Loss 37.9831 LearningRate 0.0001 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:23,166-Speed 6311.81 samples/sec Loss 38.0146 LearningRate 0.0001 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:26,414-Speed 6307.63 samples/sec Loss 37.9787 LearningRate 0.0001 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:29,668-Speed 6295.17 samples/sec Loss 37.9367 LearningRate 0.0001 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:32,917-Speed 6305.90 samples/sec Loss 37.9277 LearningRate 0.0001 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:36,166-Speed 6304.37 samples/sec Loss 37.9592 LearningRate 0.0001 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 16384 
Required: 74 hours Training: 2022-03-31 16:49:39,416-Speed 6301.89 samples/sec Loss 38.0106 LearningRate 0.0001 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:42,667-Speed 6302.83 samples/sec Loss 37.9349 LearningRate 0.0001 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:45,948-Speed 6243.28 samples/sec Loss 38.0031 LearningRate 0.0001 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:49:49,294-Speed 6121.12 samples/sec Loss 38.0112 LearningRate 0.0001 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:49:52,560-Speed 6272.79 samples/sec Loss 38.0714 LearningRate 0.0001 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:49:55,809-Speed 6303.95 samples/sec Loss 38.0008 LearningRate 0.0001 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:49:59,059-Speed 6303.47 samples/sec Loss 37.9431 LearningRate 0.0001 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:02,310-Speed 6301.29 samples/sec Loss 37.9728 LearningRate 0.0001 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:05,557-Speed 6308.31 samples/sec Loss 38.0000 LearningRate 0.0001 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:08,790-Speed 6334.78 samples/sec Loss 38.0123 LearningRate 0.0001 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:12,057-Speed 6271.50 samples/sec Loss 37.9433 LearningRate 0.0001 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:15,313-Speed 6290.02 samples/sec Loss 37.9399 LearningRate 0.0001 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 
16:50:18,563-Speed 6302.82 samples/sec Loss 37.9639 LearningRate 0.0001 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:21,811-Speed 6308.27 samples/sec Loss 37.9272 LearningRate 0.0001 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:25,058-Speed 6307.94 samples/sec Loss 37.8949 LearningRate 0.0001 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:28,308-Speed 6304.09 samples/sec Loss 37.8765 LearningRate 0.0001 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:31,557-Speed 6305.03 samples/sec Loss 37.8880 LearningRate 0.0001 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:34,803-Speed 6311.22 samples/sec Loss 37.9686 LearningRate 0.0001 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:38,069-Speed 6270.99 samples/sec Loss 38.0733 LearningRate 0.0001 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:50:41,313-Speed 6314.79 samples/sec Loss 38.0254 LearningRate 0.0001 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:44,561-Speed 6308.31 samples/sec Loss 37.9510 LearningRate 0.0001 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:47,803-Speed 6317.89 samples/sec Loss 37.9123 LearningRate 0.0001 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:51,049-Speed 6309.88 samples/sec Loss 37.8965 LearningRate 0.0001 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:54,306-Speed 6289.07 samples/sec Loss 37.8984 LearningRate 0.0001 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:50:57,538-Speed 6338.50 samples/sec Loss 
37.8992 LearningRate 0.0001 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:00,787-Speed 6305.72 samples/sec Loss 37.8955 LearningRate 0.0001 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:04,037-Speed 6301.41 samples/sec Loss 37.9621 LearningRate 0.0001 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:07,284-Speed 6308.61 samples/sec Loss 37.8809 LearningRate 0.0001 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:10,533-Speed 6306.03 samples/sec Loss 37.8909 LearningRate 0.0001 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:13,779-Speed 6310.90 samples/sec Loss 37.9424 LearningRate 0.0001 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:17,026-Speed 6308.60 samples/sec Loss 37.8786 LearningRate 0.0001 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:20,271-Speed 6311.60 samples/sec Loss 37.8922 LearningRate 0.0001 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:23,531-Speed 6285.18 samples/sec Loss 37.8472 LearningRate 0.0001 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:26,777-Speed 6309.16 samples/sec Loss 37.8397 LearningRate 0.0001 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:51:30,024-Speed 6309.75 samples/sec Loss 37.8314 LearningRate 0.0001 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:33,272-Speed 6307.09 samples/sec Loss 37.8240 LearningRate 0.0001 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:36,520-Speed 6305.79 samples/sec Loss 37.8543 LearningRate 0.0001 Epoch: 0 Global 
Step: 12360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:39,767-Speed 6308.71 samples/sec Loss 37.8327 LearningRate 0.0001 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:43,012-Speed 6313.10 samples/sec Loss 37.8081 LearningRate 0.0001 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:46,255-Speed 6317.52 samples/sec Loss 37.8525 LearningRate 0.0001 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:49,501-Speed 6310.03 samples/sec Loss 37.8830 LearningRate 0.0001 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:52,749-Speed 6306.47 samples/sec Loss 37.8730 LearningRate 0.0001 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:56,002-Speed 6297.48 samples/sec Loss 37.8244 LearningRate 0.0001 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:51:59,251-Speed 6305.69 samples/sec Loss 37.8021 LearningRate 0.0001 Epoch: 0 Global Step: 12430 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:02,501-Speed 6303.18 samples/sec Loss 37.8215 LearningRate 0.0001 Epoch: 0 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:05,748-Speed 6307.48 samples/sec Loss 37.7875 LearningRate 0.0002 Epoch: 0 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:09,001-Speed 6297.51 samples/sec Loss 37.7982 LearningRate 0.0002 Epoch: 0 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:12,254-Speed 6297.91 samples/sec Loss 37.7635 LearningRate 0.0002 Epoch: 0 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:15,485-Speed 6339.90 samples/sec Loss 37.7594 LearningRate 0.0002 Epoch: 0 Global Step: 12480 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 16:52:18,733-Speed 6305.75 samples/sec Loss 37.7847 LearningRate 0.0002 Epoch: 0 Global Step: 12490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:21,981-Speed 6307.95 samples/sec Loss 37.7258 LearningRate 0.0002 Epoch: 0 Global Step: 12500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:25,231-Speed 6303.07 samples/sec Loss 37.8421 LearningRate 0.0002 Epoch: 0 Global Step: 12510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:28,478-Speed 6307.60 samples/sec Loss 37.8129 LearningRate 0.0002 Epoch: 0 Global Step: 12520 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:31,723-Speed 6313.11 samples/sec Loss 37.7689 LearningRate 0.0002 Epoch: 0 Global Step: 12530 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:34,968-Speed 6313.28 samples/sec Loss 37.7576 LearningRate 0.0002 Epoch: 0 Global Step: 12540 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:38,215-Speed 6307.91 samples/sec Loss 37.7758 LearningRate 0.0002 Epoch: 0 Global Step: 12550 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:41,467-Speed 6300.09 samples/sec Loss 37.7559 LearningRate 0.0002 Epoch: 0 Global Step: 12560 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:44,718-Speed 6302.21 samples/sec Loss 37.7387 LearningRate 0.0002 Epoch: 0 Global Step: 12570 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:52:47,961-Speed 6316.17 samples/sec Loss 37.7535 LearningRate 0.0002 Epoch: 0 Global Step: 12580 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:51,209-Speed 6305.65 samples/sec Loss 37.7307 LearningRate 0.0002 Epoch: 0 Global Step: 12590 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:52:54,446-Speed 6329.40 samples/sec Loss 37.7439 LearningRate 0.0002 Epoch: 0 Global Step: 12600 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
16:52:57,697-Speed 6299.37 samples/sec Loss 37.7696 LearningRate 0.0002 Epoch: 0 Global Step: 12610 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:00,947-Speed 6304.73 samples/sec Loss 37.7570 LearningRate 0.0002 Epoch: 0 Global Step: 12620 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:04,194-Speed 6308.16 samples/sec Loss 37.6995 LearningRate 0.0002 Epoch: 0 Global Step: 12630 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:07,442-Speed 6306.82 samples/sec Loss 37.7027 LearningRate 0.0002 Epoch: 0 Global Step: 12640 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:10,689-Speed 6307.86 samples/sec Loss 37.7475 LearningRate 0.0002 Epoch: 0 Global Step: 12650 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:13,938-Speed 6305.22 samples/sec Loss 37.7385 LearningRate 0.0002 Epoch: 0 Global Step: 12660 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:17,188-Speed 6303.21 samples/sec Loss 37.7133 LearningRate 0.0002 Epoch: 0 Global Step: 12670 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:20,435-Speed 6309.03 samples/sec Loss 37.7504 LearningRate 0.0002 Epoch: 0 Global Step: 12680 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:23,685-Speed 6301.82 samples/sec Loss 37.7188 LearningRate 0.0002 Epoch: 0 Global Step: 12690 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:26,937-Speed 6299.20 samples/sec Loss 37.7209 LearningRate 0.0002 Epoch: 0 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:53:30,199-Speed 6279.87 samples/sec Loss 37.6988 LearningRate 0.0002 Epoch: 0 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:53:33,430-Speed 6340.33 samples/sec Loss 37.7049 LearningRate 0.0002 Epoch: 0 Global Step: 12720 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:36,679-Speed 6305.60 samples/sec Loss 
37.7338 LearningRate 0.0002 Epoch: 0 Global Step: 12730 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:39,926-Speed 6308.30 samples/sec Loss 37.7204 LearningRate 0.0002 Epoch: 0 Global Step: 12740 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:43,176-Speed 6303.67 samples/sec Loss 37.7123 LearningRate 0.0002 Epoch: 0 Global Step: 12750 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:46,425-Speed 6305.18 samples/sec Loss 37.7236 LearningRate 0.0002 Epoch: 0 Global Step: 12760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:49,672-Speed 6308.92 samples/sec Loss 37.6909 LearningRate 0.0002 Epoch: 0 Global Step: 12770 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:52,919-Speed 6308.61 samples/sec Loss 37.6986 LearningRate 0.0002 Epoch: 0 Global Step: 12780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:56,170-Speed 6300.45 samples/sec Loss 37.6695 LearningRate 0.0002 Epoch: 0 Global Step: 12790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:53:59,428-Speed 6287.25 samples/sec Loss 37.6805 LearningRate 0.0002 Epoch: 0 Global Step: 12800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:02,684-Speed 6292.22 samples/sec Loss 37.6886 LearningRate 0.0002 Epoch: 0 Global Step: 12810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:05,916-Speed 6338.51 samples/sec Loss 37.6763 LearningRate 0.0002 Epoch: 0 Global Step: 12820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:09,146-Speed 6340.59 samples/sec Loss 37.6841 LearningRate 0.0002 Epoch: 0 Global Step: 12830 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:12,394-Speed 6307.89 samples/sec Loss 37.6432 LearningRate 0.0002 Epoch: 0 Global Step: 12840 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:15,641-Speed 6307.56 samples/sec Loss 37.6380 LearningRate 0.0002 Epoch: 0 Global 
Step: 12850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:18,885-Speed 6314.98 samples/sec Loss 37.6559 LearningRate 0.0002 Epoch: 0 Global Step: 12860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:22,128-Speed 6315.92 samples/sec Loss 37.6734 LearningRate 0.0002 Epoch: 0 Global Step: 12870 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:25,378-Speed 6304.27 samples/sec Loss 37.6735 LearningRate 0.0002 Epoch: 0 Global Step: 12880 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:28,624-Speed 6310.74 samples/sec Loss 37.6443 LearningRate 0.0002 Epoch: 0 Global Step: 12890 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:31,878-Speed 6294.04 samples/sec Loss 37.6260 LearningRate 0.0002 Epoch: 0 Global Step: 12900 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:35,125-Speed 6309.46 samples/sec Loss 37.6104 LearningRate 0.0002 Epoch: 0 Global Step: 12910 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:38,370-Speed 6311.76 samples/sec Loss 37.6325 LearningRate 0.0002 Epoch: 0 Global Step: 12920 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:54:41,618-Speed 6306.47 samples/sec Loss 37.6340 LearningRate 0.0002 Epoch: 0 Global Step: 12930 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:44,871-Speed 6299.82 samples/sec Loss 37.6622 LearningRate 0.0002 Epoch: 0 Global Step: 12940 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:48,121-Speed 6302.17 samples/sec Loss 37.6341 LearningRate 0.0002 Epoch: 0 Global Step: 12950 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:51,368-Speed 6308.48 samples/sec Loss 37.6310 LearningRate 0.0002 Epoch: 0 Global Step: 12960 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:54:54,619-Speed 6301.21 samples/sec Loss 37.6414 LearningRate 0.0002 Epoch: 0 Global Step: 12970 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 16:54:57,875-Speed 6292.14 samples/sec Loss 37.6140 LearningRate 0.0002 Epoch: 0 Global Step: 12980 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:01,129-Speed 6295.40 samples/sec Loss 37.6147 LearningRate 0.0002 Epoch: 0 Global Step: 12990 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:04,379-Speed 6301.81 samples/sec Loss 37.6102 LearningRate 0.0002 Epoch: 0 Global Step: 13000 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:07,627-Speed 6307.02 samples/sec Loss 37.5533 LearningRate 0.0002 Epoch: 0 Global Step: 13010 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:10,876-Speed 6305.88 samples/sec Loss 37.6060 LearningRate 0.0002 Epoch: 0 Global Step: 13020 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:14,121-Speed 6311.49 samples/sec Loss 37.5840 LearningRate 0.0002 Epoch: 0 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:55:17,368-Speed 6308.18 samples/sec Loss 37.5677 LearningRate 0.0002 Epoch: 0 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:55:20,599-Speed 6341.51 samples/sec Loss 37.6050 LearningRate 0.0002 Epoch: 0 Global Step: 13050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:23,843-Speed 6313.70 samples/sec Loss 37.5979 LearningRate 0.0002 Epoch: 0 Global Step: 13060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:27,094-Speed 6301.14 samples/sec Loss 37.5725 LearningRate 0.0002 Epoch: 0 Global Step: 13070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:30,339-Speed 6313.31 samples/sec Loss 37.5606 LearningRate 0.0002 Epoch: 0 Global Step: 13080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:33,586-Speed 6307.35 samples/sec Loss 37.5494 LearningRate 0.0002 Epoch: 0 Global Step: 13090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
16:55:36,835-Speed 6305.83 samples/sec Loss 37.5372 LearningRate 0.0002 Epoch: 0 Global Step: 13100 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:40,086-Speed 6301.07 samples/sec Loss 37.5566 LearningRate 0.0002 Epoch: 0 Global Step: 13110 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:43,334-Speed 6305.88 samples/sec Loss 37.5161 LearningRate 0.0002 Epoch: 0 Global Step: 13120 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:46,582-Speed 6307.12 samples/sec Loss 37.4930 LearningRate 0.0002 Epoch: 0 Global Step: 13130 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:49,836-Speed 6297.01 samples/sec Loss 37.5382 LearningRate 0.0002 Epoch: 0 Global Step: 13140 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:55:53,094-Speed 6286.45 samples/sec Loss 37.5376 LearningRate 0.0002 Epoch: 0 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:55:56,345-Speed 6301.47 samples/sec Loss 37.5555 LearningRate 0.0002 Epoch: 0 Global Step: 13160 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:55:59,595-Speed 6303.58 samples/sec Loss 37.5356 LearningRate 0.0002 Epoch: 0 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 16:56:02,827-Speed 6337.60 samples/sec Loss 37.5408 LearningRate 0.0002 Epoch: 0 Global Step: 13180 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:56:06,057-Speed 6342.83 samples/sec Loss 37.5273 LearningRate 0.0002 Epoch: 0 Global Step: 13190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:09,307-Speed 6301.15 samples/sec Loss 37.4915 LearningRate 0.0002 Epoch: 0 Global Step: 13200 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:12,564-Speed 6290.92 samples/sec Loss 37.5582 LearningRate 0.0002 Epoch: 0 Global Step: 13210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:15,814-Speed 6300.97 samples/sec Loss 
37.4712 LearningRate 0.0002 Epoch: 0 Global Step: 13220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:19,047-Speed 6336.86 samples/sec Loss 37.5140 LearningRate 0.0002 Epoch: 0 Global Step: 13230 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:22,296-Speed 6304.87 samples/sec Loss 37.5425 LearningRate 0.0002 Epoch: 0 Global Step: 13240 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:25,543-Speed 6308.68 samples/sec Loss 37.4779 LearningRate 0.0002 Epoch: 0 Global Step: 13250 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:28,792-Speed 6305.52 samples/sec Loss 37.5075 LearningRate 0.0002 Epoch: 0 Global Step: 13260 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:32,038-Speed 6310.08 samples/sec Loss 37.5090 LearningRate 0.0002 Epoch: 0 Global Step: 13270 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:35,297-Speed 6285.27 samples/sec Loss 37.4725 LearningRate 0.0002 Epoch: 0 Global Step: 13280 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:38,544-Speed 6308.72 samples/sec Loss 37.4758 LearningRate 0.0002 Epoch: 0 Global Step: 13290 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:41,788-Speed 6315.35 samples/sec Loss 37.4395 LearningRate 0.0002 Epoch: 0 Global Step: 13300 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:45,034-Speed 6309.29 samples/sec Loss 37.5937 LearningRate 0.0002 Epoch: 0 Global Step: 13310 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:48,280-Speed 6311.35 samples/sec Loss 37.4984 LearningRate 0.0002 Epoch: 0 Global Step: 13320 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:56:51,529-Speed 6305.36 samples/sec Loss 37.4984 LearningRate 0.0002 Epoch: 0 Global Step: 13330 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:54,779-Speed 6304.66 samples/sec Loss 37.4972 LearningRate 0.0002 Epoch: 0 Global Step: 
13340 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:56:58,023-Speed 6314.16 samples/sec Loss 37.4301 LearningRate 0.0002 Epoch: 0 Global Step: 13350 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:01,272-Speed 6305.13 samples/sec Loss 37.4248 LearningRate 0.0002 Epoch: 0 Global Step: 13360 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:04,519-Speed 6307.98 samples/sec Loss 37.4551 LearningRate 0.0002 Epoch: 0 Global Step: 13370 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:07,769-Speed 6303.80 samples/sec Loss 37.4413 LearningRate 0.0002 Epoch: 0 Global Step: 13380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:11,016-Speed 6309.23 samples/sec Loss 37.4190 LearningRate 0.0002 Epoch: 0 Global Step: 13390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:14,244-Speed 6343.92 samples/sec Loss 37.4491 LearningRate 0.0002 Epoch: 0 Global Step: 13400 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:17,493-Speed 6305.70 samples/sec Loss 37.4000 LearningRate 0.0002 Epoch: 0 Global Step: 13410 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:20,743-Speed 6303.15 samples/sec Loss 37.3740 LearningRate 0.0002 Epoch: 0 Global Step: 13420 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:23,987-Speed 6314.99 samples/sec Loss 37.6302 LearningRate 0.0002 Epoch: 0 Global Step: 13430 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:27,235-Speed 6306.44 samples/sec Loss 37.5977 LearningRate 0.0002 Epoch: 0 Global Step: 13440 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:30,481-Speed 6310.75 samples/sec Loss 37.4914 LearningRate 0.0002 Epoch: 0 Global Step: 13450 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:33,731-Speed 6301.85 samples/sec Loss 37.4854 LearningRate 0.0002 Epoch: 0 Global Step: 13460 Fp16 Grad Scale: 8192 Required: 74 hours 
Training: 2022-03-31 16:57:36,978-Speed 6309.01 samples/sec Loss 37.5521 LearningRate 0.0002 Epoch: 0 Global Step: 13470 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:40,226-Speed 6307.34 samples/sec Loss 37.4936 LearningRate 0.0002 Epoch: 0 Global Step: 13480 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:43,470-Speed 6313.93 samples/sec Loss 37.4528 LearningRate 0.0002 Epoch: 0 Global Step: 13490 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 16:57:46,723-Speed 6297.02 samples/sec Loss 37.4119 LearningRate 0.0002 Epoch: 0 Global Step: 13500 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:49,974-Speed 6302.90 samples/sec Loss 37.4482 LearningRate 0.0002 Epoch: 0 Global Step: 13510 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:53,227-Speed 6296.16 samples/sec Loss 37.4286 LearningRate 0.0002 Epoch: 0 Global Step: 13520 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:56,474-Speed 6308.47 samples/sec Loss 37.3919 LearningRate 0.0002 Epoch: 0 Global Step: 13530 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:57:59,719-Speed 6313.72 samples/sec Loss 37.4058 LearningRate 0.0002 Epoch: 0 Global Step: 13540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:02,968-Speed 6305.14 samples/sec Loss 37.3696 LearningRate 0.0002 Epoch: 0 Global Step: 13550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:06,216-Speed 6307.32 samples/sec Loss 37.3550 LearningRate 0.0002 Epoch: 0 Global Step: 13560 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:09,466-Speed 6302.73 samples/sec Loss 37.3594 LearningRate 0.0002 Epoch: 0 Global Step: 13570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:12,717-Speed 6300.70 samples/sec Loss 37.3684 LearningRate 0.0002 Epoch: 0 Global Step: 13580 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:15,988-Speed 
6263.09 samples/sec Loss 37.4093 LearningRate 0.0002 Epoch: 0 Global Step: 13590 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:19,236-Speed 6305.37 samples/sec Loss 37.3676 LearningRate 0.0002 Epoch: 0 Global Step: 13600 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:22,485-Speed 6304.58 samples/sec Loss 37.3409 LearningRate 0.0002 Epoch: 0 Global Step: 13610 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:25,732-Speed 6308.95 samples/sec Loss 37.3805 LearningRate 0.0002 Epoch: 0 Global Step: 13620 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:28,980-Speed 6308.12 samples/sec Loss 37.3478 LearningRate 0.0002 Epoch: 0 Global Step: 13630 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:32,228-Speed 6305.54 samples/sec Loss 37.3483 LearningRate 0.0002 Epoch: 0 Global Step: 13640 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:35,476-Speed 6307.62 samples/sec Loss 37.3533 LearningRate 0.0002 Epoch: 0 Global Step: 13650 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:38,728-Speed 6299.24 samples/sec Loss 37.3754 LearningRate 0.0002 Epoch: 0 Global Step: 13660 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:41,981-Speed 6297.56 samples/sec Loss 37.3591 LearningRate 0.0002 Epoch: 0 Global Step: 13670 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:58:45,216-Speed 6332.12 samples/sec Loss 37.3015 LearningRate 0.0002 Epoch: 0 Global Step: 13680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:48,462-Speed 6310.90 samples/sec Loss 37.3136 LearningRate 0.0002 Epoch: 0 Global Step: 13690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:51,711-Speed 6304.70 samples/sec Loss 37.2983 LearningRate 0.0002 Epoch: 0 Global Step: 13700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:54,959-Speed 6305.58 samples/sec Loss 37.3107 
LearningRate 0.0002 Epoch: 0 Global Step: 13710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:58:58,207-Speed 6307.32 samples/sec Loss 37.3218 LearningRate 0.0002 Epoch: 0 Global Step: 13720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:01,452-Speed 6313.49 samples/sec Loss 37.3410 LearningRate 0.0002 Epoch: 0 Global Step: 13730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:04,698-Speed 6311.03 samples/sec Loss 37.3636 LearningRate 0.0002 Epoch: 0 Global Step: 13740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:07,949-Speed 6301.28 samples/sec Loss 37.3288 LearningRate 0.0002 Epoch: 0 Global Step: 13750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:11,197-Speed 6306.48 samples/sec Loss 37.2859 LearningRate 0.0002 Epoch: 0 Global Step: 13760 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:14,444-Speed 6307.99 samples/sec Loss 37.3257 LearningRate 0.0002 Epoch: 0 Global Step: 13770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:17,691-Speed 6309.99 samples/sec Loss 37.3207 LearningRate 0.0002 Epoch: 0 Global Step: 13780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:20,939-Speed 6306.41 samples/sec Loss 37.2870 LearningRate 0.0002 Epoch: 0 Global Step: 13790 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:24,187-Speed 6306.60 samples/sec Loss 37.2493 LearningRate 0.0002 Epoch: 0 Global Step: 13800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:27,435-Speed 6307.02 samples/sec Loss 37.2341 LearningRate 0.0002 Epoch: 0 Global Step: 13810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:30,678-Speed 6315.21 samples/sec Loss 37.3006 LearningRate 0.0002 Epoch: 0 Global Step: 13820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:33,930-Speed 6300.67 samples/sec Loss 37.2644 LearningRate 0.0002 Epoch: 0 Global Step: 
13830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:37,181-Speed 6300.50 samples/sec Loss 37.2141 LearningRate 0.0002 Epoch: 0 Global Step: 13840 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 16:59:40,420-Speed 6323.43 samples/sec Loss 37.3162 LearningRate 0.0002 Epoch: 0 Global Step: 13850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:43,669-Speed 6306.28 samples/sec Loss 37.2535 LearningRate 0.0002 Epoch: 0 Global Step: 13860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:46,918-Speed 6303.43 samples/sec Loss 37.2852 LearningRate 0.0002 Epoch: 0 Global Step: 13870 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:50,168-Speed 6302.44 samples/sec Loss 37.2595 LearningRate 0.0002 Epoch: 0 Global Step: 13880 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:53,416-Speed 6308.35 samples/sec Loss 37.2858 LearningRate 0.0002 Epoch: 0 Global Step: 13890 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:56,669-Speed 6296.85 samples/sec Loss 37.2598 LearningRate 0.0002 Epoch: 0 Global Step: 13900 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 16:59:59,915-Speed 6310.55 samples/sec Loss 37.2429 LearningRate 0.0002 Epoch: 0 Global Step: 13910 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:03,191-Speed 6253.59 samples/sec Loss 37.2561 LearningRate 0.0002 Epoch: 0 Global Step: 13920 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:06,437-Speed 6311.29 samples/sec Loss 37.2091 LearningRate 0.0002 Epoch: 0 Global Step: 13930 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:09,682-Speed 6312.80 samples/sec Loss 37.2382 LearningRate 0.0002 Epoch: 0 Global Step: 13940 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:12,930-Speed 6305.62 samples/sec Loss 37.1971 LearningRate 0.0002 Epoch: 0 Global Step: 13950 Fp16 Grad Scale: 32768 Required: 74 
hours Training: 2022-03-31 17:00:16,177-Speed 6309.50 samples/sec Loss 37.1669 LearningRate 0.0002 Epoch: 0 Global Step: 13960 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:00:19,425-Speed 6307.48 samples/sec Loss 37.1938 LearningRate 0.0002 Epoch: 0 Global Step: 13970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:22,674-Speed 6304.12 samples/sec Loss 37.1980 LearningRate 0.0002 Epoch: 0 Global Step: 13980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:25,925-Speed 6300.77 samples/sec Loss 37.1436 LearningRate 0.0002 Epoch: 0 Global Step: 13990 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:29,175-Speed 6302.79 samples/sec Loss 37.1658 LearningRate 0.0002 Epoch: 0 Global Step: 14000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:32,422-Speed 6308.24 samples/sec Loss 37.1601 LearningRate 0.0002 Epoch: 0 Global Step: 14010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:35,675-Speed 6297.06 samples/sec Loss 37.1536 LearningRate 0.0002 Epoch: 0 Global Step: 14020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:38,922-Speed 6309.59 samples/sec Loss 37.1909 LearningRate 0.0002 Epoch: 0 Global Step: 14030 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:42,170-Speed 6307.03 samples/sec Loss 37.1188 LearningRate 0.0002 Epoch: 0 Global Step: 14040 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:45,415-Speed 6311.85 samples/sec Loss 37.1612 LearningRate 0.0002 Epoch: 0 Global Step: 14050 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:48,666-Speed 6302.40 samples/sec Loss 37.2008 LearningRate 0.0002 Epoch: 0 Global Step: 14060 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:00:51,914-Speed 6306.44 samples/sec Loss 37.1600 LearningRate 0.0002 Epoch: 0 Global Step: 14070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
17:00:55,176-Speed 6279.97 samples/sec Loss 37.1126 LearningRate 0.0002 Epoch: 0 Global Step: 14080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:00:58,432-Speed 6290.67 samples/sec Loss 37.0943 LearningRate 0.0002 Epoch: 0 Global Step: 14090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:01,681-Speed 6305.49 samples/sec Loss 37.1437 LearningRate 0.0002 Epoch: 0 Global Step: 14100 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:04,931-Speed 6302.72 samples/sec Loss 37.1759 LearningRate 0.0002 Epoch: 0 Global Step: 14110 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:08,180-Speed 6305.31 samples/sec Loss 37.1037 LearningRate 0.0002 Epoch: 0 Global Step: 14120 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:11,426-Speed 6311.21 samples/sec Loss 37.1043 LearningRate 0.0002 Epoch: 0 Global Step: 14130 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:14,673-Speed 6308.62 samples/sec Loss 37.0869 LearningRate 0.0002 Epoch: 0 Global Step: 14140 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:17,921-Speed 6307.68 samples/sec Loss 37.1301 LearningRate 0.0002 Epoch: 0 Global Step: 14150 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:21,168-Speed 6308.81 samples/sec Loss 37.1119 LearningRate 0.0002 Epoch: 0 Global Step: 14160 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:24,415-Speed 6308.04 samples/sec Loss 37.0699 LearningRate 0.0002 Epoch: 0 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 17:01:27,666-Speed 6300.00 samples/sec Loss 37.1390 LearningRate 0.0002 Epoch: 0 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 17:01:30,917-Speed 6301.75 samples/sec Loss 37.1058 LearningRate 0.0002 Epoch: 0 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 17:01:34,149-Speed 6336.85 samples/sec Loss 
37.0757 LearningRate 0.0002 Epoch: 0 Global Step: 14200 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:37,414-Speed 6274.56 samples/sec Loss 37.0876 LearningRate 0.0002 Epoch: 0 Global Step: 14210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:01:40,650-Speed 6330.50 samples/sec Loss 37.0666 LearningRate 0.0002 Epoch: 0 Global Step: 14220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:01:43,898-Speed 6306.15 samples/sec Loss 37.0705 LearningRate 0.0002 Epoch: 0 Global Step: 14230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:01:47,146-Speed 6306.69 samples/sec Loss 37.0766 LearningRate 0.0002 Epoch: 0 Global Step: 14240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:01:50,450-Speed 6199.60 samples/sec Loss 37.0509 LearningRate 0.0002 Epoch: 0 Global Step: 14250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:01:53,709-Speed 6287.09 samples/sec Loss 37.0385 LearningRate 0.0002 Epoch: 0 Global Step: 14260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:01:56,958-Speed 6304.69 samples/sec Loss 37.0378 LearningRate 0.0002 Epoch: 0 Global Step: 14270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:00,212-Speed 6294.19 samples/sec Loss 37.0064 LearningRate 0.0002 Epoch: 0 Global Step: 14280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:03,462-Speed 6303.55 samples/sec Loss 37.0105 LearningRate 0.0002 Epoch: 0 Global Step: 14290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:06,709-Speed 6309.63 samples/sec Loss 37.0303 LearningRate 0.0002 Epoch: 0 Global Step: 14300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:09,957-Speed 6306.53 samples/sec Loss 37.0340 LearningRate 0.0002 Epoch: 0 Global Step: 14310 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:13,206-Speed 6305.63 samples/sec Loss 37.0624 LearningRate 0.0002 Epoch: 0 Global 
Step: 14320 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:16,455-Speed 6304.53 samples/sec Loss 37.0577 LearningRate 0.0002 Epoch: 0 Global Step: 14330 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:19,721-Speed 6272.82 samples/sec Loss 37.0305 LearningRate 0.0002 Epoch: 0 Global Step: 14340 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:22,969-Speed 6305.47 samples/sec Loss 37.0028 LearningRate 0.0002 Epoch: 0 Global Step: 14350 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:26,216-Speed 6309.68 samples/sec Loss 37.0341 LearningRate 0.0002 Epoch: 0 Global Step: 14360 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:29,459-Speed 6315.24 samples/sec Loss 37.0177 LearningRate 0.0002 Epoch: 0 Global Step: 14370 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:02:32,691-Speed 6338.11 samples/sec Loss 36.9883 LearningRate 0.0002 Epoch: 0 Global Step: 14380 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:35,934-Speed 6316.00 samples/sec Loss 36.9995 LearningRate 0.0002 Epoch: 0 Global Step: 14390 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:39,183-Speed 6305.26 samples/sec Loss 36.9934 LearningRate 0.0002 Epoch: 0 Global Step: 14400 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:42,438-Speed 6293.06 samples/sec Loss 37.0102 LearningRate 0.0002 Epoch: 0 Global Step: 14410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:45,684-Speed 6310.91 samples/sec Loss 37.0685 LearningRate 0.0002 Epoch: 0 Global Step: 14420 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:48,931-Speed 6308.74 samples/sec Loss 37.0036 LearningRate 0.0002 Epoch: 0 Global Step: 14430 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:52,176-Speed 6312.52 samples/sec Loss 36.9900 LearningRate 0.0002 Epoch: 0 Global Step: 14440 Fp16 Grad Scale: 16384 
Required: 74 hours Training: 2022-03-31 17:02:55,422-Speed 6310.38 samples/sec Loss 36.9796 LearningRate 0.0002 Epoch: 0 Global Step: 14450 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:02:58,670-Speed 6307.37 samples/sec Loss 36.9946 LearningRate 0.0002 Epoch: 0 Global Step: 14460 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:01,920-Speed 6302.61 samples/sec Loss 36.9600 LearningRate 0.0002 Epoch: 0 Global Step: 14470 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:05,171-Speed 6301.70 samples/sec Loss 36.9744 LearningRate 0.0002 Epoch: 0 Global Step: 14480 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:08,416-Speed 6312.65 samples/sec Loss 36.9911 LearningRate 0.0002 Epoch: 0 Global Step: 14490 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:11,667-Speed 6302.78 samples/sec Loss 36.9431 LearningRate 0.0002 Epoch: 0 Global Step: 14500 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:14,914-Speed 6308.65 samples/sec Loss 36.9653 LearningRate 0.0002 Epoch: 0 Global Step: 14510 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:18,160-Speed 6309.74 samples/sec Loss 36.9474 LearningRate 0.0002 Epoch: 0 Global Step: 14520 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:21,407-Speed 6309.63 samples/sec Loss 36.9092 LearningRate 0.0002 Epoch: 0 Global Step: 14530 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:24,657-Speed 6302.14 samples/sec Loss 36.9428 LearningRate 0.0002 Epoch: 0 Global Step: 14540 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:27,902-Speed 6312.62 samples/sec Loss 36.9152 LearningRate 0.0002 Epoch: 0 Global Step: 14550 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:31,150-Speed 6307.31 samples/sec Loss 36.9174 LearningRate 0.0002 Epoch: 0 Global Step: 14560 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
17:03:34,392-Speed 6317.22 samples/sec Loss 36.9152 LearningRate 0.0002 Epoch: 0 Global Step: 14570 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:37,624-Speed 6338.46 samples/sec Loss 36.8984 LearningRate 0.0002 Epoch: 0 Global Step: 14580 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:40,873-Speed 6305.04 samples/sec Loss 36.9227 LearningRate 0.0002 Epoch: 0 Global Step: 14590 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:03:44,107-Speed 6333.61 samples/sec Loss 36.8988 LearningRate 0.0002 Epoch: 0 Global Step: 14600 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:47,356-Speed 6304.71 samples/sec Loss 36.8673 LearningRate 0.0002 Epoch: 0 Global Step: 14610 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:50,606-Speed 6303.84 samples/sec Loss 36.8609 LearningRate 0.0002 Epoch: 0 Global Step: 14620 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:53,855-Speed 6303.63 samples/sec Loss 36.9363 LearningRate 0.0002 Epoch: 0 Global Step: 14630 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:03:57,105-Speed 6304.64 samples/sec Loss 36.9054 LearningRate 0.0002 Epoch: 0 Global Step: 14640 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:00,368-Speed 6276.94 samples/sec Loss 36.8986 LearningRate 0.0002 Epoch: 0 Global Step: 14650 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:03,619-Speed 6300.46 samples/sec Loss 36.8994 LearningRate 0.0002 Epoch: 0 Global Step: 14660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:06,872-Speed 6297.14 samples/sec Loss 36.8295 LearningRate 0.0002 Epoch: 0 Global Step: 14670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:10,119-Speed 6310.68 samples/sec Loss 36.8846 LearningRate 0.0002 Epoch: 0 Global Step: 14680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:13,368-Speed 6304.28 samples/sec Loss 
36.8818 LearningRate 0.0002 Epoch: 0 Global Step: 14690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:16,617-Speed 6305.12 samples/sec Loss 36.8789 LearningRate 0.0002 Epoch: 0 Global Step: 14700 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:19,868-Speed 6301.21 samples/sec Loss 36.8561 LearningRate 0.0002 Epoch: 0 Global Step: 14710 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:23,122-Speed 6295.16 samples/sec Loss 36.8345 LearningRate 0.0002 Epoch: 0 Global Step: 14720 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:26,371-Speed 6304.77 samples/sec Loss 36.8171 LearningRate 0.0002 Epoch: 0 Global Step: 14730 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:29,640-Speed 6266.48 samples/sec Loss 36.7872 LearningRate 0.0002 Epoch: 0 Global Step: 14740 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:32,891-Speed 6301.47 samples/sec Loss 36.7995 LearningRate 0.0002 Epoch: 0 Global Step: 14750 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:36,138-Speed 6308.64 samples/sec Loss 36.8454 LearningRate 0.0002 Epoch: 0 Global Step: 14760 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:04:39,371-Speed 6336.58 samples/sec Loss 36.8164 LearningRate 0.0002 Epoch: 0 Global Step: 14770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:42,617-Speed 6309.66 samples/sec Loss 36.8433 LearningRate 0.0002 Epoch: 0 Global Step: 14780 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:45,867-Speed 6304.15 samples/sec Loss 36.8138 LearningRate 0.0002 Epoch: 0 Global Step: 14790 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:49,113-Speed 6310.42 samples/sec Loss 36.7947 LearningRate 0.0002 Epoch: 0 Global Step: 14800 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:52,423-Speed 6190.95 samples/sec Loss 36.8143 LearningRate 0.0002 Epoch: 0 Global 
Step: 14810 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:55,668-Speed 6311.96 samples/sec Loss 36.8592 LearningRate 0.0002 Epoch: 0 Global Step: 14820 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:04:58,922-Speed 6296.41 samples/sec Loss 36.7595 LearningRate 0.0002 Epoch: 0 Global Step: 14830 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:02,166-Speed 6313.01 samples/sec Loss 36.7760 LearningRate 0.0002 Epoch: 0 Global Step: 14840 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:05,414-Speed 6307.53 samples/sec Loss 36.7996 LearningRate 0.0002 Epoch: 0 Global Step: 14850 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:08,657-Speed 6317.84 samples/sec Loss 36.7425 LearningRate 0.0002 Epoch: 0 Global Step: 14860 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:11,907-Speed 6302.15 samples/sec Loss 36.7566 LearningRate 0.0002 Epoch: 0 Global Step: 14870 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:15,155-Speed 6306.58 samples/sec Loss 36.7772 LearningRate 0.0002 Epoch: 0 Global Step: 14880 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:18,402-Speed 6309.33 samples/sec Loss 36.7560 LearningRate 0.0002 Epoch: 0 Global Step: 14890 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:21,649-Speed 6310.57 samples/sec Loss 36.9354 LearningRate 0.0002 Epoch: 0 Global Step: 14900 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:24,897-Speed 6306.47 samples/sec Loss 36.7403 LearningRate 0.0002 Epoch: 0 Global Step: 14910 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:28,143-Speed 6310.83 samples/sec Loss 36.7187 LearningRate 0.0002 Epoch: 0 Global Step: 14920 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:05:31,386-Speed 6314.94 samples/sec Loss 36.7510 LearningRate 0.0002 Epoch: 0 Global Step: 14930 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 17:05:34,620-Speed 6334.69 samples/sec Loss 36.9469 LearningRate 0.0002 Epoch: 0 Global Step: 14940 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:37,870-Speed 6303.57 samples/sec Loss 36.8614 LearningRate 0.0002 Epoch: 0 Global Step: 14950 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:41,115-Speed 6311.44 samples/sec Loss 37.6369 LearningRate 0.0002 Epoch: 0 Global Step: 14960 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:44,364-Speed 6305.86 samples/sec Loss 37.3826 LearningRate 0.0002 Epoch: 0 Global Step: 14970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:47,612-Speed 6306.49 samples/sec Loss 37.0894 LearningRate 0.0002 Epoch: 0 Global Step: 14980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:50,857-Speed 6312.03 samples/sec Loss 36.9382 LearningRate 0.0002 Epoch: 0 Global Step: 14990 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:54,107-Speed 6303.51 samples/sec Loss 36.8887 LearningRate 0.0002 Epoch: 0 Global Step: 15000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:05:57,354-Speed 6308.42 samples/sec Loss 36.8159 LearningRate 0.0002 Epoch: 0 Global Step: 15010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:00,600-Speed 6310.92 samples/sec Loss 36.7922 LearningRate 0.0002 Epoch: 0 Global Step: 15020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:03,853-Speed 6297.60 samples/sec Loss 36.7739 LearningRate 0.0002 Epoch: 0 Global Step: 15030 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:07,105-Speed 6297.85 samples/sec Loss 36.7406 LearningRate 0.0002 Epoch: 0 Global Step: 15040 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:10,358-Speed 6297.39 samples/sec Loss 36.7204 LearningRate 0.0002 Epoch: 0 Global Step: 15050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
17:06:13,608-Speed 6302.86 samples/sec Loss 36.7217 LearningRate 0.0002 Epoch: 0 Global Step: 15060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:16,858-Speed 6304.49 samples/sec Loss 36.7230 LearningRate 0.0002 Epoch: 0 Global Step: 15070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:20,107-Speed 6303.76 samples/sec Loss 36.6973 LearningRate 0.0002 Epoch: 0 Global Step: 15080 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:23,369-Speed 6281.40 samples/sec Loss 36.6532 LearningRate 0.0002 Epoch: 0 Global Step: 15090 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:26,613-Speed 6314.57 samples/sec Loss 36.7200 LearningRate 0.0002 Epoch: 0 Global Step: 15100 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:29,868-Speed 6292.04 samples/sec Loss 36.6978 LearningRate 0.0002 Epoch: 0 Global Step: 15110 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:06:33,101-Speed 6337.92 samples/sec Loss 36.7746 LearningRate 0.0002 Epoch: 0 Global Step: 15120 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:36,349-Speed 6305.38 samples/sec Loss 36.6840 LearningRate 0.0002 Epoch: 0 Global Step: 15130 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:39,597-Speed 6306.75 samples/sec Loss 36.7012 LearningRate 0.0002 Epoch: 0 Global Step: 15140 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:42,846-Speed 6305.78 samples/sec Loss 36.6909 LearningRate 0.0002 Epoch: 0 Global Step: 15150 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:46,094-Speed 6305.85 samples/sec Loss 36.6614 LearningRate 0.0002 Epoch: 0 Global Step: 15160 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:49,339-Speed 6314.01 samples/sec Loss 36.6365 LearningRate 0.0002 Epoch: 0 Global Step: 15170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:52,581-Speed 6318.22 samples/sec Loss 
36.6202 LearningRate 0.0002 Epoch: 0 Global Step: 15180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:55,826-Speed 6311.45 samples/sec Loss 36.6695 LearningRate 0.0002 Epoch: 0 Global Step: 15190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:06:59,072-Speed 6311.49 samples/sec Loss 36.6705 LearningRate 0.0002 Epoch: 0 Global Step: 15200 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:07:02,321-Speed 6304.60 samples/sec Loss 36.6592 LearningRate 0.0002 Epoch: 0 Global Step: 15210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:07:05,554-Speed 6336.80 samples/sec Loss 36.6247 LearningRate 0.0002 Epoch: 0 Global Step: 15220 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:07:08,794-Speed 6322.63 samples/sec Loss 36.6361 LearningRate 0.0002 Epoch: 0 Global Step: 15230 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:07:12,039-Speed 6311.29 samples/sec Loss 36.6448 LearningRate 0.0002 Epoch: 0 Global Step: 15240 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:07:15,273-Speed 6335.08 samples/sec Loss 36.6498 LearningRate 0.0002 Epoch: 0 Global Step: 15250 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:07:18,521-Speed 6305.86 samples/sec Loss 36.6064 LearningRate 0.0002 Epoch: 0 Global Step: 15260 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:07:21,754-Speed 6337.16 samples/sec Loss 36.6218 LearningRate 0.0002 Epoch: 0 Global Step: 15270 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:24,995-Speed 6318.60 samples/sec Loss 36.6080 LearningRate 0.0002 Epoch: 0 Global Step: 15280 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:28,248-Speed 6297.56 samples/sec Loss 36.6641 LearningRate 0.0002 Epoch: 0 Global Step: 15290 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:31,496-Speed 6307.49 samples/sec Loss 36.7496 LearningRate 0.0002 Epoch: 0 Global Step: 
15300 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:34,740-Speed 6314.48 samples/sec Loss 36.6775 LearningRate 0.0002 Epoch: 0 Global Step: 15310 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:37,989-Speed 6304.62 samples/sec Loss 36.6414 LearningRate 0.0002 Epoch: 0 Global Step: 15320 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:41,231-Speed 6319.35 samples/sec Loss 36.6467 LearningRate 0.0002 Epoch: 0 Global Step: 15330 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:44,475-Speed 6315.23 samples/sec Loss 36.6310 LearningRate 0.0002 Epoch: 0 Global Step: 15340 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:47,719-Speed 6314.18 samples/sec Loss 36.5953 LearningRate 0.0002 Epoch: 0 Global Step: 15350 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:50,963-Speed 6315.08 samples/sec Loss 36.6156 LearningRate 0.0002 Epoch: 0 Global Step: 15360 Fp16 Grad Scale: 2048 Required: 74 hours Training: 2022-03-31 17:07:54,209-Speed 6309.59 samples/sec Loss 36.6046 LearningRate 0.0002 Epoch: 0 Global Step: 15370 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:07:57,454-Speed 6312.85 samples/sec Loss 36.5433 LearningRate 0.0002 Epoch: 0 Global Step: 15380 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:00,703-Speed 6304.13 samples/sec Loss 36.5826 LearningRate 0.0002 Epoch: 0 Global Step: 15390 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:03,953-Speed 6303.21 samples/sec Loss 36.5534 LearningRate 0.0002 Epoch: 0 Global Step: 15400 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:07,198-Speed 6312.23 samples/sec Loss 36.6080 LearningRate 0.0002 Epoch: 0 Global Step: 15410 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:10,442-Speed 6316.48 samples/sec Loss 36.5581 LearningRate 0.0002 Epoch: 0 Global Step: 15420 Fp16 Grad Scale: 4096 Required: 74 hours 
Training: 2022-03-31 17:08:13,687-Speed 6310.72 samples/sec Loss 36.5172 LearningRate 0.0002 Epoch: 0 Global Step: 15430 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:16,936-Speed 6306.03 samples/sec Loss 36.4795 LearningRate 0.0002 Epoch: 0 Global Step: 15440 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:20,185-Speed 6304.97 samples/sec Loss 36.5165 LearningRate 0.0002 Epoch: 0 Global Step: 15450 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:23,432-Speed 6309.22 samples/sec Loss 36.5066 LearningRate 0.0002 Epoch: 0 Global Step: 15460 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:08:26,678-Speed 6310.99 samples/sec Loss 36.5352 LearningRate 0.0002 Epoch: 0 Global Step: 15470 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:29,925-Speed 6307.17 samples/sec Loss 36.5064 LearningRate 0.0002 Epoch: 0 Global Step: 15480 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:33,174-Speed 6305.57 samples/sec Loss 36.5118 LearningRate 0.0002 Epoch: 0 Global Step: 15490 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:36,420-Speed 6310.77 samples/sec Loss 36.4819 LearningRate 0.0002 Epoch: 0 Global Step: 15500 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:39,668-Speed 6307.80 samples/sec Loss 36.4793 LearningRate 0.0002 Epoch: 0 Global Step: 15510 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:42,908-Speed 6322.23 samples/sec Loss 36.4918 LearningRate 0.0002 Epoch: 0 Global Step: 15520 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:46,176-Speed 6268.91 samples/sec Loss 36.5302 LearningRate 0.0002 Epoch: 0 Global Step: 15530 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:49,424-Speed 6306.44 samples/sec Loss 36.4788 LearningRate 0.0002 Epoch: 0 Global Step: 15540 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:52,671-Speed 6307.64 
samples/sec Loss 36.5308 LearningRate 0.0002 Epoch: 0 Global Step: 15550 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:55,920-Speed 6304.96 samples/sec Loss 36.4852 LearningRate 0.0002 Epoch: 0 Global Step: 15560 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:08:59,169-Speed 6305.71 samples/sec Loss 36.4736 LearningRate 0.0002 Epoch: 0 Global Step: 15570 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:02,402-Speed 6335.76 samples/sec Loss 36.5134 LearningRate 0.0002 Epoch: 0 Global Step: 15580 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:05,649-Speed 6308.23 samples/sec Loss 36.4979 LearningRate 0.0002 Epoch: 0 Global Step: 15590 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:08,897-Speed 6306.81 samples/sec Loss 36.5516 LearningRate 0.0002 Epoch: 0 Global Step: 15600 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:12,142-Speed 6314.08 samples/sec Loss 36.4706 LearningRate 0.0002 Epoch: 0 Global Step: 15610 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:15,395-Speed 6295.69 samples/sec Loss 36.5214 LearningRate 0.0002 Epoch: 0 Global Step: 15620 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:18,643-Speed 6308.04 samples/sec Loss 36.4425 LearningRate 0.0002 Epoch: 0 Global Step: 15630 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:21,890-Speed 6306.76 samples/sec Loss 36.4167 LearningRate 0.0002 Epoch: 0 Global Step: 15640 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:25,143-Speed 6298.70 samples/sec Loss 36.4333 LearningRate 0.0002 Epoch: 0 Global Step: 15650 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:28,390-Speed 6307.61 samples/sec Loss 36.3773 LearningRate 0.0002 Epoch: 0 Global Step: 15660 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:31,635-Speed 6313.74 samples/sec Loss 36.4537 LearningRate 0.0002 Epoch: 0 
Global Step: 15670 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:09:34,883-Speed 6305.82 samples/sec Loss 36.4684 LearningRate 0.0002 Epoch: 0 Global Step: 15680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:38,131-Speed 6307.86 samples/sec Loss 36.3642 LearningRate 0.0002 Epoch: 0 Global Step: 15690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:41,376-Speed 6312.32 samples/sec Loss 36.3834 LearningRate 0.0002 Epoch: 0 Global Step: 15700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:44,623-Speed 6310.20 samples/sec Loss 36.4005 LearningRate 0.0002 Epoch: 0 Global Step: 15710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:47,872-Speed 6304.71 samples/sec Loss 36.4113 LearningRate 0.0002 Epoch: 0 Global Step: 15720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:51,116-Speed 6314.06 samples/sec Loss 36.4247 LearningRate 0.0002 Epoch: 0 Global Step: 15730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:54,360-Speed 6315.09 samples/sec Loss 36.3708 LearningRate 0.0002 Epoch: 0 Global Step: 15740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:09:57,606-Speed 6310.33 samples/sec Loss 36.4005 LearningRate 0.0002 Epoch: 0 Global Step: 15750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:10:00,852-Speed 6310.84 samples/sec Loss 36.4313 LearningRate 0.0002 Epoch: 0 Global Step: 15760 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:10:04,104-Speed 6299.40 samples/sec Loss 36.3976 LearningRate 0.0002 Epoch: 0 Global Step: 15770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:10:07,356-Speed 6298.18 samples/sec Loss 36.3276 LearningRate 0.0002 Epoch: 0 Global Step: 15780 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:10,604-Speed 6307.62 samples/sec Loss 36.3574 LearningRate 0.0002 Epoch: 0 Global Step: 15790 Fp16 Grad Scale: 32768 
Required: 74 hours Training: 2022-03-31 17:10:13,847-Speed 6316.02 samples/sec Loss 36.3728 LearningRate 0.0002 Epoch: 0 Global Step: 15800 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:17,093-Speed 6310.85 samples/sec Loss 36.3460 LearningRate 0.0002 Epoch: 0 Global Step: 15810 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:20,339-Speed 6311.20 samples/sec Loss 36.3481 LearningRate 0.0002 Epoch: 0 Global Step: 15820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:23,654-Speed 6179.62 samples/sec Loss 36.3470 LearningRate 0.0002 Epoch: 0 Global Step: 15830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:26,900-Speed 6310.39 samples/sec Loss 36.3132 LearningRate 0.0002 Epoch: 0 Global Step: 15840 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:30,147-Speed 6308.64 samples/sec Loss 36.3171 LearningRate 0.0002 Epoch: 0 Global Step: 15850 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:33,395-Speed 6305.65 samples/sec Loss 36.2971 LearningRate 0.0002 Epoch: 0 Global Step: 15860 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:36,649-Speed 6295.31 samples/sec Loss 36.3865 LearningRate 0.0002 Epoch: 0 Global Step: 15870 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:39,878-Speed 6345.15 samples/sec Loss 36.3309 LearningRate 0.0002 Epoch: 0 Global Step: 15880 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:43,124-Speed 6310.54 samples/sec Loss 36.3074 LearningRate 0.0002 Epoch: 0 Global Step: 15890 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:46,378-Speed 6294.99 samples/sec Loss 36.3014 LearningRate 0.0002 Epoch: 0 Global Step: 15900 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:49,623-Speed 6312.68 samples/sec Loss 36.3273 LearningRate 0.0002 Epoch: 0 Global Step: 15910 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 
17:10:52,872-Speed 6305.36 samples/sec Loss 36.2683 LearningRate 0.0002 Epoch: 0 Global Step: 15920 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:56,119-Speed 6309.31 samples/sec Loss 36.2784 LearningRate 0.0002 Epoch: 0 Global Step: 15930 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:10:59,368-Speed 6303.91 samples/sec Loss 36.3549 LearningRate 0.0002 Epoch: 0 Global Step: 15940 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:02,623-Speed 6293.33 samples/sec Loss 36.2938 LearningRate 0.0002 Epoch: 0 Global Step: 15950 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:05,872-Speed 6306.69 samples/sec Loss 36.3051 LearningRate 0.0002 Epoch: 0 Global Step: 15960 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:09,118-Speed 6309.22 samples/sec Loss 36.2745 LearningRate 0.0002 Epoch: 0 Global Step: 15970 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:12,365-Speed 6309.73 samples/sec Loss 36.3041 LearningRate 0.0002 Epoch: 0 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 17:11:15,615-Speed 6301.97 samples/sec Loss 36.2916 LearningRate 0.0002 Epoch: 0 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 74 hours Training: 2022-03-31 17:11:18,847-Speed 6338.56 samples/sec Loss 36.3079 LearningRate 0.0002 Epoch: 0 Global Step: 16000 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:22,093-Speed 6311.56 samples/sec Loss 36.3471 LearningRate 0.0002 Epoch: 0 Global Step: 16010 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:25,340-Speed 6308.34 samples/sec Loss 36.2793 LearningRate 0.0002 Epoch: 0 Global Step: 16020 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:28,601-Speed 6280.91 samples/sec Loss 36.2076 LearningRate 0.0002 Epoch: 0 Global Step: 16030 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:31,848-Speed 6307.99 samples/sec Loss 
36.2692 LearningRate 0.0002 Epoch: 0 Global Step: 16040 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:35,098-Speed 6303.50 samples/sec Loss 36.2566 LearningRate 0.0002 Epoch: 0 Global Step: 16050 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:38,343-Speed 6313.01 samples/sec Loss 36.3530 LearningRate 0.0002 Epoch: 0 Global Step: 16060 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:41,592-Speed 6306.00 samples/sec Loss 36.2732 LearningRate 0.0002 Epoch: 0 Global Step: 16070 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:11:44,826-Speed 6333.91 samples/sec Loss 36.2690 LearningRate 0.0002 Epoch: 0 Global Step: 16080 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:11:48,080-Speed 6295.91 samples/sec Loss 36.2183 LearningRate 0.0002 Epoch: 0 Global Step: 16090 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:11:51,309-Speed 6344.00 samples/sec Loss 36.3395 LearningRate 0.0002 Epoch: 0 Global Step: 16100 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:11:54,542-Speed 6334.72 samples/sec Loss 36.3406 LearningRate 0.0002 Epoch: 0 Global Step: 16110 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:11:57,785-Speed 6318.27 samples/sec Loss 36.3370 LearningRate 0.0002 Epoch: 0 Global Step: 16120 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:01,032-Speed 6308.20 samples/sec Loss 36.2904 LearningRate 0.0002 Epoch: 0 Global Step: 16130 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:04,279-Speed 6308.90 samples/sec Loss 36.2753 LearningRate 0.0002 Epoch: 0 Global Step: 16140 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:07,530-Speed 6301.21 samples/sec Loss 36.2003 LearningRate 0.0002 Epoch: 0 Global Step: 16150 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:10,779-Speed 6303.78 samples/sec Loss 36.2387 LearningRate 0.0002 Epoch: 0 Global Step: 
16160 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:14,027-Speed 6306.90 samples/sec Loss 36.2688 LearningRate 0.0002 Epoch: 0 Global Step: 16170 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:17,269-Speed 6318.82 samples/sec Loss 36.2697 LearningRate 0.0002 Epoch: 0 Global Step: 16180 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:20,515-Speed 6310.76 samples/sec Loss 36.2175 LearningRate 0.0002 Epoch: 0 Global Step: 16190 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:23,761-Speed 6311.07 samples/sec Loss 36.1663 LearningRate 0.0002 Epoch: 0 Global Step: 16200 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:27,006-Speed 6313.37 samples/sec Loss 36.2743 LearningRate 0.0002 Epoch: 0 Global Step: 16210 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:12:30,257-Speed 6300.59 samples/sec Loss 36.1703 LearningRate 0.0002 Epoch: 0 Global Step: 16220 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:12:33,507-Speed 6302.21 samples/sec Loss 36.3066 LearningRate 0.0002 Epoch: 0 Global Step: 16230 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:12:36,753-Speed 6310.26 samples/sec Loss 36.2610 LearningRate 0.0002 Epoch: 0 Global Step: 16240 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:12:39,984-Speed 6340.74 samples/sec Loss 36.1454 LearningRate 0.0002 Epoch: 0 Global Step: 16250 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:43,232-Speed 6306.29 samples/sec Loss 36.1911 LearningRate 0.0002 Epoch: 0 Global Step: 16260 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:46,481-Speed 6306.38 samples/sec Loss 36.1820 LearningRate 0.0002 Epoch: 0 Global Step: 16270 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:49,725-Speed 6314.50 samples/sec Loss 36.1772 LearningRate 0.0002 Epoch: 0 Global Step: 16280 Fp16 Grad Scale: 4096 Required: 74 hours 
Training: 2022-03-31 17:12:52,969-Speed 6314.32 samples/sec Loss 36.1386 LearningRate 0.0002 Epoch: 0 Global Step: 16290 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:12:56,210-Speed 6319.88 samples/sec Loss 36.2054 LearningRate 0.0002 Epoch: 0 Global Step: 16300 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:12:59,455-Speed 6312.91 samples/sec Loss 36.1542 LearningRate 0.0002 Epoch: 0 Global Step: 16310 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:02,706-Speed 6300.47 samples/sec Loss 36.1155 LearningRate 0.0002 Epoch: 0 Global Step: 16320 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:05,940-Speed 6334.86 samples/sec Loss 36.1452 LearningRate 0.0002 Epoch: 0 Global Step: 16330 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:09,184-Speed 6313.97 samples/sec Loss 36.0888 LearningRate 0.0002 Epoch: 0 Global Step: 16340 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:12,429-Speed 6313.63 samples/sec Loss 36.1639 LearningRate 0.0002 Epoch: 0 Global Step: 16350 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:15,677-Speed 6306.63 samples/sec Loss 36.1292 LearningRate 0.0002 Epoch: 0 Global Step: 16360 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:18,924-Speed 6309.41 samples/sec Loss 36.0737 LearningRate 0.0002 Epoch: 0 Global Step: 16370 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:22,168-Speed 6314.04 samples/sec Loss 36.2181 LearningRate 0.0002 Epoch: 0 Global Step: 16380 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:25,410-Speed 6317.89 samples/sec Loss 36.4231 LearningRate 0.0002 Epoch: 0 Global Step: 16390 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:28,659-Speed 6304.88 samples/sec Loss 36.3443 LearningRate 0.0002 Epoch: 0 Global Step: 16400 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:31,905-Speed 6311.56 
samples/sec Loss 36.3158 LearningRate 0.0002 Epoch: 0 Global Step: 16410 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:35,155-Speed 6303.59 samples/sec Loss 36.1362 LearningRate 0.0002 Epoch: 0 Global Step: 16420 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:38,404-Speed 6303.95 samples/sec Loss 36.1872 LearningRate 0.0002 Epoch: 0 Global Step: 16430 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:41,653-Speed 6305.72 samples/sec Loss 36.1337 LearningRate 0.0002 Epoch: 0 Global Step: 16440 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:44,902-Speed 6303.75 samples/sec Loss 36.0634 LearningRate 0.0002 Epoch: 0 Global Step: 16450 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:48,145-Speed 6317.12 samples/sec Loss 36.1577 LearningRate 0.0002 Epoch: 0 Global Step: 16460 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:13:51,379-Speed 6334.14 samples/sec Loss 36.1311 LearningRate 0.0002 Epoch: 0 Global Step: 16470 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:54,622-Speed 6315.92 samples/sec Loss 36.3112 LearningRate 0.0002 Epoch: 0 Global Step: 16480 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:13:57,866-Speed 6314.48 samples/sec Loss 36.2080 LearningRate 0.0002 Epoch: 0 Global Step: 16490 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:01,114-Speed 6307.93 samples/sec Loss 36.1771 LearningRate 0.0002 Epoch: 0 Global Step: 16500 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:04,361-Speed 6307.94 samples/sec Loss 36.1308 LearningRate 0.0002 Epoch: 0 Global Step: 16510 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:07,607-Speed 6311.95 samples/sec Loss 36.0891 LearningRate 0.0002 Epoch: 0 Global Step: 16520 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:10,850-Speed 6315.36 samples/sec Loss 36.0926 LearningRate 0.0002 Epoch: 0 
Global Step: 16530 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:14,091-Speed 6322.11 samples/sec Loss 36.0731 LearningRate 0.0002 Epoch: 0 Global Step: 16540 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:17,340-Speed 6304.34 samples/sec Loss 36.0399 LearningRate 0.0002 Epoch: 0 Global Step: 16550 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:20,584-Speed 6315.01 samples/sec Loss 36.0660 LearningRate 0.0002 Epoch: 0 Global Step: 16560 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:14:23,842-Speed 6287.32 samples/sec Loss 36.1083 LearningRate 0.0002 Epoch: 0 Global Step: 16570 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:27,092-Speed 6302.30 samples/sec Loss 36.0772 LearningRate 0.0002 Epoch: 0 Global Step: 16580 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:30,340-Speed 6307.67 samples/sec Loss 35.9590 LearningRate 0.0002 Epoch: 0 Global Step: 16590 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:33,589-Speed 6303.21 samples/sec Loss 36.0695 LearningRate 0.0002 Epoch: 0 Global Step: 16600 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:36,834-Speed 6313.09 samples/sec Loss 36.0581 LearningRate 0.0002 Epoch: 0 Global Step: 16610 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:40,089-Speed 6294.66 samples/sec Loss 36.0448 LearningRate 0.0002 Epoch: 0 Global Step: 16620 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:43,331-Speed 6317.54 samples/sec Loss 35.9946 LearningRate 0.0002 Epoch: 0 Global Step: 16630 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:46,577-Speed 6310.48 samples/sec Loss 35.9343 LearningRate 0.0002 Epoch: 0 Global Step: 16640 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:49,821-Speed 6315.41 samples/sec Loss 36.0221 LearningRate 0.0002 Epoch: 0 Global Step: 16650 Fp16 Grad Scale: 4096 Required: 73 
hours Training: 2022-03-31 17:14:53,068-Speed 6308.45 samples/sec Loss 35.9571 LearningRate 0.0002 Epoch: 0 Global Step: 16660 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:14:56,313-Speed 6312.30 samples/sec Loss 35.9735 LearningRate 0.0002 Epoch: 0 Global Step: 16670 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:14:59,562-Speed 6304.83 samples/sec Loss 36.0215 LearningRate 0.0002 Epoch: 0 Global Step: 16680 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:15:02,810-Speed 6307.09 samples/sec Loss 36.0254 LearningRate 0.0002 Epoch: 0 Global Step: 16690 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:15:06,056-Speed 6312.49 samples/sec Loss 36.0769 LearningRate 0.0002 Epoch: 0 Global Step: 16700 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:15:09,305-Speed 6304.99 samples/sec Loss 35.9826 LearningRate 0.0002 Epoch: 0 Global Step: 16710 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:15:12,558-Speed 6297.00 samples/sec Loss 36.0842 LearningRate 0.0002 Epoch: 0 Global Step: 16720 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:15:15,786-Speed 6346.67 samples/sec Loss 35.9225 LearningRate 0.0002 Epoch: 0 Global Step: 16730 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:15:19,017-Speed 6339.75 samples/sec Loss 36.1237 LearningRate 0.0002 Epoch: 0 Global Step: 16740 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:22,265-Speed 6308.09 samples/sec Loss 35.9706 LearningRate 0.0002 Epoch: 0 Global Step: 16750 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:25,513-Speed 6306.04 samples/sec Loss 36.1635 LearningRate 0.0002 Epoch: 0 Global Step: 16760 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:28,762-Speed 6305.71 samples/sec Loss 36.0456 LearningRate 0.0002 Epoch: 0 Global Step: 16770 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:32,008-Speed 6310.02 
samples/sec Loss 35.9436 LearningRate 0.0002 Epoch: 0 Global Step: 16780 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:35,257-Speed 6303.98 samples/sec Loss 35.9250 LearningRate 0.0002 Epoch: 0 Global Step: 16790 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:38,502-Speed 6312.94 samples/sec Loss 35.9356 LearningRate 0.0002 Epoch: 0 Global Step: 16800 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:41,751-Speed 6305.66 samples/sec Loss 35.9076 LearningRate 0.0002 Epoch: 0 Global Step: 16810 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:44,997-Speed 6309.47 samples/sec Loss 35.8698 LearningRate 0.0002 Epoch: 0 Global Step: 16820 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:48,242-Speed 6312.70 samples/sec Loss 35.9650 LearningRate 0.0002 Epoch: 0 Global Step: 16830 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:15:51,489-Speed 6309.47 samples/sec Loss 35.9025 LearningRate 0.0002 Epoch: 0 Global Step: 16840 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:15:54,734-Speed 6311.89 samples/sec Loss 35.9238 LearningRate 0.0002 Epoch: 0 Global Step: 16850 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:15:57,994-Speed 6284.20 samples/sec Loss 35.8880 LearningRate 0.0002 Epoch: 0 Global Step: 16860 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:01,253-Speed 6285.65 samples/sec Loss 36.0482 LearningRate 0.0002 Epoch: 0 Global Step: 16870 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:04,588-Speed 6142.52 samples/sec Loss 35.8859 LearningRate 0.0002 Epoch: 0 Global Step: 16880 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:07,835-Speed 6308.40 samples/sec Loss 36.0271 LearningRate 0.0002 Epoch: 0 Global Step: 16890 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:11,078-Speed 6317.28 samples/sec Loss 35.8913 LearningRate 0.0002 Epoch: 0 
Global Step: 16900 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:14,320-Speed 6317.62 samples/sec Loss 35.8862 LearningRate 0.0002 Epoch: 0 Global Step: 16910 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:17,567-Speed 6310.47 samples/sec Loss 35.9127 LearningRate 0.0002 Epoch: 0 Global Step: 16920 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:20,846-Speed 6245.45 samples/sec Loss 35.9828 LearningRate 0.0002 Epoch: 0 Global Step: 16930 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:24,204-Speed 6100.69 samples/sec Loss 35.9064 LearningRate 0.0002 Epoch: 0 Global Step: 16940 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:27,461-Speed 6290.54 samples/sec Loss 35.8898 LearningRate 0.0002 Epoch: 0 Global Step: 16950 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:30,706-Speed 6311.43 samples/sec Loss 35.9047 LearningRate 0.0002 Epoch: 0 Global Step: 16960 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:33,954-Speed 6307.12 samples/sec Loss 35.8658 LearningRate 0.0002 Epoch: 0 Global Step: 16970 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:37,199-Speed 6312.10 samples/sec Loss 35.8727 LearningRate 0.0002 Epoch: 0 Global Step: 16980 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:40,446-Speed 6310.12 samples/sec Loss 35.7810 LearningRate 0.0002 Epoch: 0 Global Step: 16990 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:16:43,675-Speed 6342.17 samples/sec Loss 35.8427 LearningRate 0.0002 Epoch: 0 Global Step: 17000 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:46,923-Speed 6307.52 samples/sec Loss 35.9090 LearningRate 0.0002 Epoch: 0 Global Step: 17010 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:16:50,154-Speed 6339.44 samples/sec Loss 35.9667 LearningRate 0.0002 Epoch: 0 Global Step: 17020 Fp16 Grad Scale: 2048 Required: 73 
hours Training: 2022-03-31 17:16:53,405-Speed 6300.98 samples/sec Loss 35.7813 LearningRate 0.0002 Epoch: 0 Global Step: 17030 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:16:56,652-Speed 6308.57 samples/sec Loss 35.7799 LearningRate 0.0002 Epoch: 0 Global Step: 17040 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:16:59,896-Speed 6314.70 samples/sec Loss 35.7990 LearningRate 0.0002 Epoch: 0 Global Step: 17050 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:03,141-Speed 6313.83 samples/sec Loss 35.8875 LearningRate 0.0002 Epoch: 0 Global Step: 17060 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:06,387-Speed 6310.94 samples/sec Loss 35.8631 LearningRate 0.0002 Epoch: 0 Global Step: 17070 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:09,639-Speed 6300.27 samples/sec Loss 35.8032 LearningRate 0.0002 Epoch: 0 Global Step: 17080 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:12,885-Speed 6310.44 samples/sec Loss 35.8270 LearningRate 0.0002 Epoch: 0 Global Step: 17090 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:16,131-Speed 6310.02 samples/sec Loss 35.7983 LearningRate 0.0002 Epoch: 0 Global Step: 17100 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:19,375-Speed 6314.79 samples/sec Loss 35.7491 LearningRate 0.0002 Epoch: 0 Global Step: 17110 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:17:22,622-Speed 6308.30 samples/sec Loss 35.7886 LearningRate 0.0002 Epoch: 0 Global Step: 17120 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:25,871-Speed 6306.00 samples/sec Loss 35.7712 LearningRate 0.0002 Epoch: 0 Global Step: 17130 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:29,126-Speed 6293.58 samples/sec Loss 35.8220 LearningRate 0.0002 Epoch: 0 Global Step: 17140 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:32,372-Speed 6309.89 
samples/sec Loss 35.7357 LearningRate 0.0002 Epoch: 0 Global Step: 17150 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:35,616-Speed 6315.30 samples/sec Loss 35.7441 LearningRate 0.0002 Epoch: 0 Global Step: 17160 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:38,866-Speed 6302.78 samples/sec Loss 35.7411 LearningRate 0.0002 Epoch: 0 Global Step: 17170 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:42,116-Speed 6301.81 samples/sec Loss 35.7130 LearningRate 0.0002 Epoch: 0 Global Step: 17180 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:45,364-Speed 6307.43 samples/sec Loss 35.7123 LearningRate 0.0002 Epoch: 0 Global Step: 17190 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:48,614-Speed 6303.40 samples/sec Loss 35.7561 LearningRate 0.0002 Epoch: 0 Global Step: 17200 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:51,859-Speed 6311.47 samples/sec Loss 35.7209 LearningRate 0.0002 Epoch: 0 Global Step: 17210 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:17:55,106-Speed 6308.55 samples/sec Loss 35.7198 LearningRate 0.0002 Epoch: 0 Global Step: 17220 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:17:58,355-Speed 6305.44 samples/sec Loss 35.7125 LearningRate 0.0002 Epoch: 0 Global Step: 17230 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:01,615-Speed 6284.51 samples/sec Loss 35.6653 LearningRate 0.0002 Epoch: 0 Global Step: 17240 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:04,875-Speed 6283.82 samples/sec Loss 35.6705 LearningRate 0.0002 Epoch: 0 Global Step: 17250 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:08,119-Speed 6314.13 samples/sec Loss 35.6505 LearningRate 0.0002 Epoch: 0 Global Step: 17260 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:11,369-Speed 6303.07 samples/sec Loss 35.6747 LearningRate 0.0002 Epoch: 0 
Global Step: 17270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:14,617-Speed 6308.25 samples/sec Loss 35.7052 LearningRate 0.0002 Epoch: 0 Global Step: 17280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:17,876-Speed 6285.84 samples/sec Loss 35.7902 LearningRate 0.0002 Epoch: 0 Global Step: 17290 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:21,110-Speed 6333.21 samples/sec Loss 35.6692 LearningRate 0.0002 Epoch: 0 Global Step: 17300 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:24,358-Speed 6306.27 samples/sec Loss 35.6797 LearningRate 0.0002 Epoch: 0 Global Step: 17310 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:27,610-Speed 6298.86 samples/sec Loss 35.6931 LearningRate 0.0002 Epoch: 0 Global Step: 17320 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:30,854-Speed 6315.09 samples/sec Loss 35.6429 LearningRate 0.0002 Epoch: 0 Global Step: 17330 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:34,102-Speed 6307.68 samples/sec Loss 35.6095 LearningRate 0.0002 Epoch: 0 Global Step: 17340 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:37,353-Speed 6300.97 samples/sec Loss 35.6438 LearningRate 0.0002 Epoch: 0 Global Step: 17350 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:40,600-Speed 6307.79 samples/sec Loss 35.5906 LearningRate 0.0002 Epoch: 0 Global Step: 17360 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:43,849-Speed 6305.16 samples/sec Loss 35.6254 LearningRate 0.0002 Epoch: 0 Global Step: 17370 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:47,098-Speed 6305.08 samples/sec Loss 35.6640 LearningRate 0.0002 Epoch: 0 Global Step: 17380 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:18:50,343-Speed 6312.29 samples/sec Loss 35.6709 LearningRate 0.0002 Epoch: 0 Global Step: 17390 Fp16 Grad Scale: 4096 Required: 73 
hours Training: 2022-03-31 17:18:53,593-Speed 6303.01 samples/sec Loss 35.6198 LearningRate 0.0002 Epoch: 0 Global Step: 17400 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:18:56,842-Speed 6306.36 samples/sec Loss 35.6512 LearningRate 0.0002 Epoch: 0 Global Step: 17410 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:00,089-Speed 6307.74 samples/sec Loss 35.6108 LearningRate 0.0002 Epoch: 0 Global Step: 17420 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:03,337-Speed 6307.46 samples/sec Loss 35.7985 LearningRate 0.0002 Epoch: 0 Global Step: 17430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:06,588-Speed 6299.80 samples/sec Loss 35.9050 LearningRate 0.0002 Epoch: 0 Global Step: 17440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:09,836-Speed 6308.24 samples/sec Loss 36.4577 LearningRate 0.0002 Epoch: 0 Global Step: 17450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:13,084-Speed 6305.37 samples/sec Loss 36.3270 LearningRate 0.0002 Epoch: 0 Global Step: 17460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:16,334-Speed 6304.53 samples/sec Loss 36.1301 LearningRate 0.0002 Epoch: 0 Global Step: 17470 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:19,579-Speed 6312.53 samples/sec Loss 35.9264 LearningRate 0.0002 Epoch: 0 Global Step: 17480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:22,826-Speed 6308.93 samples/sec Loss 35.8684 LearningRate 0.0002 Epoch: 0 Global Step: 17490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:26,071-Speed 6312.84 samples/sec Loss 35.7975 LearningRate 0.0002 Epoch: 0 Global Step: 17500 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:19:29,318-Speed 6308.54 samples/sec Loss 35.7693 LearningRate 0.0002 Epoch: 0 Global Step: 17510 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:19:32,589-Speed 6263.27 
samples/sec Loss 35.6917 LearningRate 0.0002 Epoch: 0 Global Step: 17520 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:19:35,838-Speed 6305.08 samples/sec Loss 35.6632 LearningRate 0.0002 Epoch: 0 Global Step: 17530 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:19:39,085-Speed 6308.08 samples/sec Loss 35.6296 LearningRate 0.0002 Epoch: 0 Global Step: 17540 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:19:42,314-Speed 6344.31 samples/sec Loss 35.6775 LearningRate 0.0002 Epoch: 0 Global Step: 17550 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:19:45,546-Speed 6337.83 samples/sec Loss 35.7035 LearningRate 0.0002 Epoch: 0 Global Step: 17560 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:19:48,789-Speed 6315.22 samples/sec Loss 35.7878 LearningRate 0.0002 Epoch: 0 Global Step: 17570 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:19:52,041-Speed 6300.17 samples/sec Loss 35.8665 LearningRate 0.0002 Epoch: 0 Global Step: 17580 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:19:55,290-Speed 6303.96 samples/sec Loss 36.2306 LearningRate 0.0002 Epoch: 0 Global Step: 17590 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:19:58,535-Speed 6312.53 samples/sec Loss 36.0108 LearningRate 0.0002 Epoch: 0 Global Step: 17600 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:01,782-Speed 6309.01 samples/sec Loss 35.9065 LearningRate 0.0002 Epoch: 0 Global Step: 17610 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:05,016-Speed 6335.70 samples/sec Loss 36.0422 LearningRate 0.0002 Epoch: 0 Global Step: 17620 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:08,258-Speed 6316.99 samples/sec Loss 35.7924 LearningRate 0.0002 Epoch: 0 Global Step: 17630 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:11,502-Speed 6315.77 samples/sec Loss 35.7254 LearningRate 0.0002 Epoch: 
0 Global Step: 17640 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:14,744-Speed 6318.07 samples/sec Loss 35.7172 LearningRate 0.0002 Epoch: 0 Global Step: 17650 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:17,991-Speed 6309.32 samples/sec Loss 35.6624 LearningRate 0.0002 Epoch: 0 Global Step: 17660 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:21,237-Speed 6310.18 samples/sec Loss 35.6420 LearningRate 0.0002 Epoch: 0 Global Step: 17670 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:24,483-Speed 6311.00 samples/sec Loss 35.5798 LearningRate 0.0002 Epoch: 0 Global Step: 17680 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:27,725-Speed 6319.39 samples/sec Loss 35.5867 LearningRate 0.0002 Epoch: 0 Global Step: 17690 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:30,970-Speed 6312.64 samples/sec Loss 35.7578 LearningRate 0.0002 Epoch: 0 Global Step: 17700 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:34,285-Speed 6178.29 samples/sec Loss 35.7551 LearningRate 0.0002 Epoch: 0 Global Step: 17710 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:20:37,532-Speed 6309.88 samples/sec Loss 35.6619 LearningRate 0.0002 Epoch: 0 Global Step: 17720 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:40,786-Speed 6294.96 samples/sec Loss 35.5843 LearningRate 0.0002 Epoch: 0 Global Step: 17730 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:44,032-Speed 6310.06 samples/sec Loss 35.5831 LearningRate 0.0002 Epoch: 0 Global Step: 17740 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:47,286-Speed 6295.76 samples/sec Loss 35.5259 LearningRate 0.0002 Epoch: 0 Global Step: 17750 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:50,533-Speed 6309.47 samples/sec Loss 35.5947 LearningRate 0.0002 Epoch: 0 Global Step: 17760 Fp16 Grad Scale: 4096 Required: 73 
hours Training: 2022-03-31 17:20:53,836-Speed 6200.61 samples/sec Loss 35.5296 LearningRate 0.0002 Epoch: 0 Global Step: 17770 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:20:57,086-Speed 6303.96 samples/sec Loss 35.5117 LearningRate 0.0002 Epoch: 0 Global Step: 17780 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:21:00,355-Speed 6265.82 samples/sec Loss 35.5205 LearningRate 0.0002 Epoch: 0 Global Step: 17790 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:21:03,660-Speed 6197.24 samples/sec Loss 35.5358 LearningRate 0.0002 Epoch: 0 Global Step: 17800 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:21:06,903-Speed 6317.80 samples/sec Loss 35.4658 LearningRate 0.0002 Epoch: 0 Global Step: 17810 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:21:10,154-Speed 6299.82 samples/sec Loss 35.5112 LearningRate 0.0002 Epoch: 0 Global Step: 17820 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:13,399-Speed 6314.04 samples/sec Loss 35.5193 LearningRate 0.0002 Epoch: 0 Global Step: 17830 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:16,648-Speed 6303.66 samples/sec Loss 35.4954 LearningRate 0.0002 Epoch: 0 Global Step: 17840 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:19,898-Speed 6304.01 samples/sec Loss 35.4495 LearningRate 0.0002 Epoch: 0 Global Step: 17850 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:23,144-Speed 6310.01 samples/sec Loss 35.4189 LearningRate 0.0002 Epoch: 0 Global Step: 17860 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:26,396-Speed 6299.58 samples/sec Loss 35.4581 LearningRate 0.0002 Epoch: 0 Global Step: 17870 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:29,641-Speed 6312.29 samples/sec Loss 35.5017 LearningRate 0.0002 Epoch: 0 Global Step: 17880 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:32,884-Speed 6317.60 
samples/sec Loss 35.4837 LearningRate 0.0002 Epoch: 0 Global Step: 17890 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:36,141-Speed 6291.36 samples/sec Loss 35.4519 LearningRate 0.0002 Epoch: 0 Global Step: 17900 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:39,386-Speed 6314.33 samples/sec Loss 35.4305 LearningRate 0.0002 Epoch: 0 Global Step: 17910 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:21:42,632-Speed 6309.00 samples/sec Loss 35.4203 LearningRate 0.0002 Epoch: 0 Global Step: 17920 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:21:45,882-Speed 6303.09 samples/sec Loss 35.3748 LearningRate 0.0002 Epoch: 0 Global Step: 17930 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:21:49,128-Speed 6311.16 samples/sec Loss 35.5208 LearningRate 0.0002 Epoch: 0 Global Step: 17940 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:21:52,407-Speed 6247.54 samples/sec Loss 35.4839 LearningRate 0.0002 Epoch: 0 Global Step: 17950 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:21:55,697-Speed 6225.50 samples/sec Loss 35.5578 LearningRate 0.0002 Epoch: 0 Global Step: 17960 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:21:58,930-Speed 6336.98 samples/sec Loss 35.4803 LearningRate 0.0002 Epoch: 0 Global Step: 17970 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:02,177-Speed 6309.09 samples/sec Loss 35.4069 LearningRate 0.0002 Epoch: 0 Global Step: 17980 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:05,427-Speed 6302.28 samples/sec Loss 35.5778 LearningRate 0.0002 Epoch: 0 Global Step: 17990 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:08,673-Speed 6311.71 samples/sec Loss 35.4286 LearningRate 0.0002 Epoch: 0 Global Step: 18000 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:11,924-Speed 6299.61 samples/sec Loss 35.5516 LearningRate 0.0002 
Epoch: 0 Global Step: 18010 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:15,218-Speed 6219.18 samples/sec Loss 35.5268 LearningRate 0.0002 Epoch: 0 Global Step: 18020 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:18,543-Speed 6161.25 samples/sec Loss 35.4694 LearningRate 0.0002 Epoch: 0 Global Step: 18030 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:21,790-Speed 6308.73 samples/sec Loss 35.3687 LearningRate 0.0002 Epoch: 0 Global Step: 18040 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:25,036-Speed 6310.39 samples/sec Loss 35.3465 LearningRate 0.0002 Epoch: 0 Global Step: 18050 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:28,282-Speed 6310.94 samples/sec Loss 35.3879 LearningRate 0.0002 Epoch: 0 Global Step: 18060 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:22:31,529-Speed 6308.46 samples/sec Loss 35.3749 LearningRate 0.0002 Epoch: 0 Global Step: 18070 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:34,778-Speed 6305.45 samples/sec Loss 35.3813 LearningRate 0.0002 Epoch: 0 Global Step: 18080 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:38,024-Speed 6310.32 samples/sec Loss 35.4339 LearningRate 0.0002 Epoch: 0 Global Step: 18090 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:41,272-Speed 6308.45 samples/sec Loss 35.4024 LearningRate 0.0002 Epoch: 0 Global Step: 18100 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:44,522-Speed 6302.20 samples/sec Loss 35.4175 LearningRate 0.0002 Epoch: 0 Global Step: 18110 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:47,771-Speed 6307.42 samples/sec Loss 35.3268 LearningRate 0.0002 Epoch: 0 Global Step: 18120 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:51,016-Speed 6312.25 samples/sec Loss 35.3347 LearningRate 0.0002 Epoch: 0 Global Step: 18130 Fp16 Grad Scale: 
16384 Required: 73 hours Training: 2022-03-31 17:22:54,262-Speed 6311.43 samples/sec Loss 35.3173 LearningRate 0.0002 Epoch: 0 Global Step: 18140 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:22:57,508-Speed 6310.63 samples/sec Loss 35.3428 LearningRate 0.0002 Epoch: 0 Global Step: 18150 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:23:00,752-Speed 6314.86 samples/sec Loss 35.2957 LearningRate 0.0002 Epoch: 0 Global Step: 18160 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:23:03,985-Speed 6335.14 samples/sec Loss 35.3453 LearningRate 0.0002 Epoch: 0 Global Step: 18170 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:07,228-Speed 6315.86 samples/sec Loss 35.2903 LearningRate 0.0002 Epoch: 0 Global Step: 18180 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:10,474-Speed 6310.93 samples/sec Loss 35.2919 LearningRate 0.0002 Epoch: 0 Global Step: 18190 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:13,724-Speed 6303.68 samples/sec Loss 35.3338 LearningRate 0.0002 Epoch: 0 Global Step: 18200 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:16,970-Speed 6309.41 samples/sec Loss 35.6261 LearningRate 0.0002 Epoch: 0 Global Step: 18210 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:20,214-Speed 6314.94 samples/sec Loss 35.5195 LearningRate 0.0002 Epoch: 0 Global Step: 18220 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:23,460-Speed 6310.59 samples/sec Loss 35.5280 LearningRate 0.0002 Epoch: 0 Global Step: 18230 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:26,705-Speed 6313.11 samples/sec Loss 35.4456 LearningRate 0.0002 Epoch: 0 Global Step: 18240 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:29,948-Speed 6317.83 samples/sec Loss 35.3767 LearningRate 0.0002 Epoch: 0 Global Step: 18250 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 
17:23:33,193-Speed 6312.78 samples/sec Loss 35.3897 LearningRate 0.0002 Epoch: 0 Global Step: 18260 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:36,435-Speed 6318.55 samples/sec Loss 35.3359 LearningRate 0.0002 Epoch: 0 Global Step: 18270 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:39,709-Speed 6257.06 samples/sec Loss 35.3253 LearningRate 0.0002 Epoch: 0 Global Step: 18280 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:42,956-Speed 6308.69 samples/sec Loss 35.2896 LearningRate 0.0002 Epoch: 0 Global Step: 18290 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:46,200-Speed 6314.84 samples/sec Loss 35.3085 LearningRate 0.0002 Epoch: 0 Global Step: 18300 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:49,448-Speed 6305.21 samples/sec Loss 35.3126 LearningRate 0.0002 Epoch: 0 Global Step: 18310 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:52,702-Speed 6295.41 samples/sec Loss 35.2226 LearningRate 0.0002 Epoch: 0 Global Step: 18320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:23:55,932-Speed 6343.31 samples/sec Loss 35.2258 LearningRate 0.0002 Epoch: 0 Global Step: 18330 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:23:59,186-Speed 6294.55 samples/sec Loss 35.2838 LearningRate 0.0002 Epoch: 0 Global Step: 18340 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:02,442-Speed 6291.31 samples/sec Loss 35.1676 LearningRate 0.0002 Epoch: 0 Global Step: 18350 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:05,698-Speed 6291.27 samples/sec Loss 35.2280 LearningRate 0.0002 Epoch: 0 Global Step: 18360 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:08,941-Speed 6316.31 samples/sec Loss 35.2922 LearningRate 0.0002 Epoch: 0 Global Step: 18370 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:12,187-Speed 6310.07 samples/sec Loss 35.7098 
LearningRate 0.0002 Epoch: 0 Global Step: 18380 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:15,433-Speed 6310.51 samples/sec Loss 35.5355 LearningRate 0.0002 Epoch: 0 Global Step: 18390 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:18,690-Speed 6289.72 samples/sec Loss 35.4272 LearningRate 0.0002 Epoch: 0 Global Step: 18400 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:21,937-Speed 6309.29 samples/sec Loss 35.3993 LearningRate 0.0002 Epoch: 0 Global Step: 18410 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:25,180-Speed 6315.40 samples/sec Loss 35.3243 LearningRate 0.0002 Epoch: 0 Global Step: 18420 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:24:28,429-Speed 6305.81 samples/sec Loss 35.3250 LearningRate 0.0002 Epoch: 0 Global Step: 18430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:31,672-Speed 6317.40 samples/sec Loss 35.2526 LearningRate 0.0002 Epoch: 0 Global Step: 18440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:34,922-Speed 6303.40 samples/sec Loss 35.2380 LearningRate 0.0002 Epoch: 0 Global Step: 18450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:38,172-Speed 6303.01 samples/sec Loss 35.2617 LearningRate 0.0002 Epoch: 0 Global Step: 18460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:41,420-Speed 6307.41 samples/sec Loss 35.2351 LearningRate 0.0002 Epoch: 0 Global Step: 18470 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:44,668-Speed 6306.71 samples/sec Loss 35.2058 LearningRate 0.0002 Epoch: 0 Global Step: 18480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:47,909-Speed 6320.71 samples/sec Loss 35.2070 LearningRate 0.0002 Epoch: 0 Global Step: 18490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:51,155-Speed 6310.44 samples/sec Loss 35.2282 LearningRate 0.0002 Epoch: 0 Global Step: 18500 Fp16 
Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:54,402-Speed 6308.15 samples/sec Loss 35.1973 LearningRate 0.0002 Epoch: 0 Global Step: 18510 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:24:57,648-Speed 6312.18 samples/sec Loss 35.1875 LearningRate 0.0002 Epoch: 0 Global Step: 18520 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:00,918-Speed 6262.78 samples/sec Loss 35.1990 LearningRate 0.0002 Epoch: 0 Global Step: 18530 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:25:04,201-Speed 6240.73 samples/sec Loss 35.3006 LearningRate 0.0002 Epoch: 0 Global Step: 18540 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:25:07,449-Speed 6306.24 samples/sec Loss 35.1582 LearningRate 0.0002 Epoch: 0 Global Step: 18550 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:25:10,679-Speed 6341.92 samples/sec Loss 35.2490 LearningRate 0.0002 Epoch: 0 Global Step: 18560 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:13,922-Speed 6315.70 samples/sec Loss 35.2789 LearningRate 0.0002 Epoch: 0 Global Step: 18570 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:17,173-Speed 6300.86 samples/sec Loss 35.2310 LearningRate 0.0002 Epoch: 0 Global Step: 18580 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:20,417-Speed 6315.13 samples/sec Loss 35.1645 LearningRate 0.0002 Epoch: 0 Global Step: 18590 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:23,662-Speed 6313.44 samples/sec Loss 35.1231 LearningRate 0.0002 Epoch: 0 Global Step: 18600 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:26,905-Speed 6316.40 samples/sec Loss 35.1301 LearningRate 0.0002 Epoch: 0 Global Step: 18610 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:30,149-Speed 6318.04 samples/sec Loss 35.1536 LearningRate 0.0002 Epoch: 0 Global Step: 18620 Fp16 Grad Scale: 8192 Required: 73 hours Training: 
2022-03-31 17:25:33,398-Speed 6304.65 samples/sec Loss 35.1410 LearningRate 0.0002 Epoch: 0 Global Step: 18630 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:36,650-Speed 6300.32 samples/sec Loss 35.1588 LearningRate 0.0002 Epoch: 0 Global Step: 18640 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:39,892-Speed 6317.96 samples/sec Loss 35.1120 LearningRate 0.0002 Epoch: 0 Global Step: 18650 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:43,118-Speed 6349.90 samples/sec Loss 35.0837 LearningRate 0.0002 Epoch: 0 Global Step: 18660 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:46,374-Speed 6291.64 samples/sec Loss 35.2174 LearningRate 0.0002 Epoch: 0 Global Step: 18670 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:49,616-Speed 6318.65 samples/sec Loss 35.1106 LearningRate 0.0002 Epoch: 0 Global Step: 18680 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:25:52,845-Speed 6343.74 samples/sec Loss 35.1793 LearningRate 0.0002 Epoch: 0 Global Step: 18690 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:25:56,075-Speed 6342.56 samples/sec Loss 35.1386 LearningRate 0.0002 Epoch: 0 Global Step: 18700 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:25:59,318-Speed 6315.90 samples/sec Loss 35.1258 LearningRate 0.0002 Epoch: 0 Global Step: 18710 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:02,562-Speed 6313.82 samples/sec Loss 35.1522 LearningRate 0.0002 Epoch: 0 Global Step: 18720 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:05,806-Speed 6315.07 samples/sec Loss 35.1337 LearningRate 0.0002 Epoch: 0 Global Step: 18730 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:09,052-Speed 6310.55 samples/sec Loss 35.0698 LearningRate 0.0002 Epoch: 0 Global Step: 18740 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:12,305-Speed 6298.40 samples/sec Loss 
35.0949 LearningRate 0.0002 Epoch: 0 Global Step: 18750 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:15,549-Speed 6312.86 samples/sec Loss 35.1067 LearningRate 0.0002 Epoch: 0 Global Step: 18760 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:18,793-Speed 6315.03 samples/sec Loss 35.2169 LearningRate 0.0002 Epoch: 0 Global Step: 18770 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:22,035-Speed 6318.52 samples/sec Loss 35.0477 LearningRate 0.0002 Epoch: 0 Global Step: 18780 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:25,283-Speed 6306.16 samples/sec Loss 35.1040 LearningRate 0.0002 Epoch: 0 Global Step: 18790 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:26:28,527-Speed 6314.95 samples/sec Loss 35.0102 LearningRate 0.0002 Epoch: 0 Global Step: 18800 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:31,773-Speed 6311.55 samples/sec Loss 35.0585 LearningRate 0.0002 Epoch: 0 Global Step: 18810 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:35,020-Speed 6309.29 samples/sec Loss 35.1292 LearningRate 0.0002 Epoch: 0 Global Step: 18820 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:38,269-Speed 6304.97 samples/sec Loss 35.1146 LearningRate 0.0002 Epoch: 0 Global Step: 18830 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:41,519-Speed 6303.64 samples/sec Loss 35.0313 LearningRate 0.0002 Epoch: 0 Global Step: 18840 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:44,764-Speed 6311.26 samples/sec Loss 35.0083 LearningRate 0.0002 Epoch: 0 Global Step: 18850 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:48,016-Speed 6299.56 samples/sec Loss 35.0524 LearningRate 0.0002 Epoch: 0 Global Step: 18860 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:51,264-Speed 6307.71 samples/sec Loss 35.0210 LearningRate 0.0002 Epoch: 0 Global Step: 18870 
Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:54,512-Speed 6305.63 samples/sec Loss 35.0018 LearningRate 0.0002 Epoch: 0 Global Step: 18880 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:26:57,757-Speed 6313.24 samples/sec Loss 34.9390 LearningRate 0.0002 Epoch: 0 Global Step: 18890 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:00,999-Speed 6318.67 samples/sec Loss 35.0509 LearningRate 0.0002 Epoch: 0 Global Step: 18900 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:27:04,246-Speed 6308.15 samples/sec Loss 35.0619 LearningRate 0.0002 Epoch: 0 Global Step: 18910 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:27:07,500-Speed 6295.56 samples/sec Loss 35.1180 LearningRate 0.0002 Epoch: 0 Global Step: 18920 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:10,745-Speed 6313.79 samples/sec Loss 34.9914 LearningRate 0.0002 Epoch: 0 Global Step: 18930 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:13,990-Speed 6311.75 samples/sec Loss 34.9793 LearningRate 0.0002 Epoch: 0 Global Step: 18940 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:17,235-Speed 6313.23 samples/sec Loss 35.1582 LearningRate 0.0002 Epoch: 0 Global Step: 18950 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:20,480-Speed 6312.34 samples/sec Loss 35.0868 LearningRate 0.0002 Epoch: 0 Global Step: 18960 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:23,725-Speed 6312.14 samples/sec Loss 35.0364 LearningRate 0.0002 Epoch: 0 Global Step: 18970 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:26,972-Speed 6308.42 samples/sec Loss 35.0749 LearningRate 0.0002 Epoch: 0 Global Step: 18980 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:30,225-Speed 6297.61 samples/sec Loss 35.3757 LearningRate 0.0002 Epoch: 0 Global Step: 18990 Fp16 Grad Scale: 4096 Required: 73 hours Training: 
2022-03-31 17:27:33,470-Speed 6313.14 samples/sec Loss 35.2481 LearningRate 0.0002 Epoch: 0 Global Step: 19000 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:36,716-Speed 6310.65 samples/sec Loss 35.1783 LearningRate 0.0002 Epoch: 0 Global Step: 19010 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:39,962-Speed 6310.73 samples/sec Loss 35.1223 LearningRate 0.0002 Epoch: 0 Global Step: 19020 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:43,215-Speed 6297.07 samples/sec Loss 35.0344 LearningRate 0.0002 Epoch: 0 Global Step: 19030 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:46,460-Speed 6313.48 samples/sec Loss 35.0563 LearningRate 0.0002 Epoch: 0 Global Step: 19040 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:49,720-Speed 6282.57 samples/sec Loss 35.0366 LearningRate 0.0002 Epoch: 0 Global Step: 19050 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:52,972-Speed 6300.80 samples/sec Loss 34.9833 LearningRate 0.0002 Epoch: 0 Global Step: 19060 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:56,214-Speed 6317.02 samples/sec Loss 34.9682 LearningRate 0.0002 Epoch: 0 Global Step: 19070 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:27:59,460-Speed 6310.80 samples/sec Loss 34.9412 LearningRate 0.0002 Epoch: 0 Global Step: 19080 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:02,705-Speed 6312.64 samples/sec Loss 34.9286 LearningRate 0.0002 Epoch: 0 Global Step: 19090 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:05,935-Speed 6341.95 samples/sec Loss 34.9735 LearningRate 0.0002 Epoch: 0 Global Step: 19100 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:09,179-Speed 6315.32 samples/sec Loss 35.0022 LearningRate 0.0002 Epoch: 0 Global Step: 19110 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:12,418-Speed 6324.12 samples/sec Loss 
34.9433 LearningRate 0.0002 Epoch: 0 Global Step: 19120 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:15,663-Speed 6313.35 samples/sec Loss 34.9941 LearningRate 0.0002 Epoch: 0 Global Step: 19130 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:18,914-Speed 6299.65 samples/sec Loss 34.9276 LearningRate 0.0002 Epoch: 0 Global Step: 19140 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:22,165-Speed 6302.52 samples/sec Loss 34.9682 LearningRate 0.0002 Epoch: 0 Global Step: 19150 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:25,416-Speed 6299.72 samples/sec Loss 35.0402 LearningRate 0.0002 Epoch: 0 Global Step: 19160 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:28,664-Speed 6307.85 samples/sec Loss 35.0344 LearningRate 0.0002 Epoch: 0 Global Step: 19170 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:31,909-Speed 6312.59 samples/sec Loss 34.9708 LearningRate 0.0002 Epoch: 0 Global Step: 19180 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:35,147-Speed 6325.83 samples/sec Loss 35.0889 LearningRate 0.0002 Epoch: 0 Global Step: 19190 Fp16 Grad Scale: 2048 Required: 73 hours Training: 2022-03-31 17:28:38,391-Speed 6313.80 samples/sec Loss 35.0142 LearningRate 0.0002 Epoch: 0 Global Step: 19200 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:41,635-Speed 6315.15 samples/sec Loss 34.9323 LearningRate 0.0002 Epoch: 0 Global Step: 19210 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:44,876-Speed 6320.48 samples/sec Loss 34.8450 LearningRate 0.0002 Epoch: 0 Global Step: 19220 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:48,124-Speed 6308.12 samples/sec Loss 34.9251 LearningRate 0.0002 Epoch: 0 Global Step: 19230 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:51,364-Speed 6321.54 samples/sec Loss 34.8334 LearningRate 0.0002 Epoch: 0 Global Step: 19240 
Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:54,610-Speed 6311.96 samples/sec Loss 34.9224 LearningRate 0.0002 Epoch: 0 Global Step: 19250 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:28:57,853-Speed 6315.74 samples/sec Loss 34.8982 LearningRate 0.0002 Epoch: 0 Global Step: 19260 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:29:01,096-Speed 6317.17 samples/sec Loss 34.8600 LearningRate 0.0002 Epoch: 0 Global Step: 19270 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:29:04,364-Speed 6268.06 samples/sec Loss 34.8333 LearningRate 0.0002 Epoch: 0 Global Step: 19280 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:29:07,604-Speed 6321.39 samples/sec Loss 34.9555 LearningRate 0.0002 Epoch: 0 Global Step: 19290 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:29:10,849-Speed 6313.68 samples/sec Loss 34.9274 LearningRate 0.0002 Epoch: 0 Global Step: 19300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:14,098-Speed 6305.24 samples/sec Loss 34.8938 LearningRate 0.0002 Epoch: 0 Global Step: 19310 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:17,343-Speed 6311.54 samples/sec Loss 34.9305 LearningRate 0.0002 Epoch: 0 Global Step: 19320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:20,587-Speed 6314.97 samples/sec Loss 34.8953 LearningRate 0.0002 Epoch: 0 Global Step: 19330 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:23,833-Speed 6310.46 samples/sec Loss 34.9017 LearningRate 0.0002 Epoch: 0 Global Step: 19340 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:27,077-Speed 6314.79 samples/sec Loss 34.8731 LearningRate 0.0002 Epoch: 0 Global Step: 19350 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:30,319-Speed 6318.43 samples/sec Loss 34.8384 LearningRate 0.0002 Epoch: 0 Global Step: 19360 Fp16 Grad Scale: 8192 Required: 73 hours Training: 
2022-03-31 17:29:33,567-Speed 6306.40 samples/sec Loss 34.8974 LearningRate 0.0002 Epoch: 0 Global Step: 19370 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:36,811-Speed 6314.89 samples/sec Loss 34.8322 LearningRate 0.0002 Epoch: 0 Global Step: 19380 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:40,056-Speed 6312.84 samples/sec Loss 34.7755 LearningRate 0.0002 Epoch: 0 Global Step: 19390 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:29:43,296-Speed 6321.07 samples/sec Loss 34.8306 LearningRate 0.0002 Epoch: 0 Global Step: 19400 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:29:46,546-Speed 6303.61 samples/sec Loss 34.8156 LearningRate 0.0002 Epoch: 0 Global Step: 19410 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:29:49,795-Speed 6304.24 samples/sec Loss 34.8196 LearningRate 0.0002 Epoch: 0 Global Step: 19420 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:29:53,040-Speed 6313.11 samples/sec Loss 34.7908 LearningRate 0.0002 Epoch: 0 Global Step: 19430 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:29:56,289-Speed 6305.59 samples/sec Loss 34.8268 LearningRate 0.0002 Epoch: 0 Global Step: 19440 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:29:59,534-Speed 6312.61 samples/sec Loss 34.8311 LearningRate 0.0002 Epoch: 0 Global Step: 19450 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:02,781-Speed 6308.93 samples/sec Loss 34.8305 LearningRate 0.0002 Epoch: 0 Global Step: 19460 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:06,027-Speed 6309.95 samples/sec Loss 34.9961 LearningRate 0.0002 Epoch: 0 Global Step: 19470 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:09,272-Speed 6314.26 samples/sec Loss 34.9938 LearningRate 0.0002 Epoch: 0 Global Step: 19480 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:12,520-Speed 6306.88 
samples/sec Loss 34.9413 LearningRate 0.0002 Epoch: 0 Global Step: 19490 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:15,761-Speed 6319.38 samples/sec Loss 34.8629 LearningRate 0.0002 Epoch: 0 Global Step: 19500 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:30:19,007-Speed 6311.12 samples/sec Loss 34.8943 LearningRate 0.0002 Epoch: 0 Global Step: 19510 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:30:22,250-Speed 6317.15 samples/sec Loss 34.8430 LearningRate 0.0002 Epoch: 0 Global Step: 19520 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:30:25,492-Speed 6317.77 samples/sec Loss 34.8298 LearningRate 0.0002 Epoch: 0 Global Step: 19530 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:30:28,731-Speed 6324.73 samples/sec Loss 34.7609 LearningRate 0.0002 Epoch: 0 Global Step: 19540 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:31,976-Speed 6312.03 samples/sec Loss 34.8497 LearningRate 0.0002 Epoch: 0 Global Step: 19550 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:35,227-Speed 6302.65 samples/sec Loss 34.7524 LearningRate 0.0002 Epoch: 0 Global Step: 19560 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:38,481-Speed 6293.45 samples/sec Loss 34.7161 LearningRate 0.0002 Epoch: 0 Global Step: 19570 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:30:41,715-Speed 6335.34 samples/sec Loss 34.7331 LearningRate 0.0002 Epoch: 0 Global Step: 19580 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:30:44,959-Speed 6312.98 samples/sec Loss 34.7412 LearningRate 0.0002 Epoch: 0 Global Step: 19590 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:30:48,201-Speed 6319.28 samples/sec Loss 34.8523 LearningRate 0.0002 Epoch: 0 Global Step: 19600 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:30:51,451-Speed 6302.78 samples/sec Loss 34.8099 LearningRate 0.0002 
Epoch: 0 Global Step: 19610 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:30:54,692-Speed 6320.07 samples/sec Loss 34.7430 LearningRate 0.0002 Epoch: 0 Global Step: 19620 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:30:57,925-Speed 6337.38 samples/sec Loss 34.7361 LearningRate 0.0002 Epoch: 0 Global Step: 19630 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:01,173-Speed 6306.64 samples/sec Loss 34.6812 LearningRate 0.0002 Epoch: 0 Global Step: 19640 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:04,422-Speed 6306.42 samples/sec Loss 34.8399 LearningRate 0.0002 Epoch: 0 Global Step: 19650 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:07,664-Speed 6318.50 samples/sec Loss 34.7541 LearningRate 0.0002 Epoch: 0 Global Step: 19660 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:10,908-Speed 6314.60 samples/sec Loss 34.6543 LearningRate 0.0002 Epoch: 0 Global Step: 19670 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:14,153-Speed 6312.65 samples/sec Loss 34.7068 LearningRate 0.0002 Epoch: 0 Global Step: 19680 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:17,402-Speed 6304.80 samples/sec Loss 34.8183 LearningRate 0.0002 Epoch: 0 Global Step: 19690 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:20,656-Speed 6295.31 samples/sec Loss 34.7489 LearningRate 0.0002 Epoch: 0 Global Step: 19700 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:23,925-Speed 6266.02 samples/sec Loss 34.7530 LearningRate 0.0002 Epoch: 0 Global Step: 19710 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:27,180-Speed 6292.77 samples/sec Loss 35.0579 LearningRate 0.0002 Epoch: 0 Global Step: 19720 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:31:30,430-Speed 6303.71 samples/sec Loss 34.8796 LearningRate 0.0002 Epoch: 0 Global Step: 19730 Fp16 Grad Scale: 8192 
Required: 73 hours Training: 2022-03-31 17:31:33,676-Speed 6310.80 samples/sec Loss 34.8680 LearningRate 0.0002 Epoch: 0 Global Step: 19740 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:36,920-Speed 6312.98 samples/sec Loss 34.8443 LearningRate 0.0002 Epoch: 0 Global Step: 19750 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:40,166-Speed 6311.55 samples/sec Loss 34.8264 LearningRate 0.0002 Epoch: 0 Global Step: 19760 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:43,413-Speed 6308.30 samples/sec Loss 34.7866 LearningRate 0.0002 Epoch: 0 Global Step: 19770 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:46,659-Speed 6310.54 samples/sec Loss 34.7245 LearningRate 0.0002 Epoch: 0 Global Step: 19780 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:49,902-Speed 6316.17 samples/sec Loss 34.6314 LearningRate 0.0002 Epoch: 0 Global Step: 19790 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:53,150-Speed 6306.98 samples/sec Loss 34.7205 LearningRate 0.0002 Epoch: 0 Global Step: 19800 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:56,395-Speed 6313.07 samples/sec Loss 34.7754 LearningRate 0.0002 Epoch: 0 Global Step: 19810 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:31:59,638-Speed 6317.11 samples/sec Loss 34.7335 LearningRate 0.0002 Epoch: 0 Global Step: 19820 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:02,887-Speed 6305.52 samples/sec Loss 34.7049 LearningRate 0.0002 Epoch: 0 Global Step: 19830 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:06,132-Speed 6313.07 samples/sec Loss 34.6121 LearningRate 0.0002 Epoch: 0 Global Step: 19840 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:09,376-Speed 6314.15 samples/sec Loss 34.6003 LearningRate 0.0002 Epoch: 0 Global Step: 19850 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 
17:32:12,624-Speed 6306.90 samples/sec Loss 34.6160 LearningRate 0.0002 Epoch: 0 Global Step: 19860 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:15,869-Speed 6312.46 samples/sec Loss 34.5588 LearningRate 0.0002 Epoch: 0 Global Step: 19870 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:19,114-Speed 6313.06 samples/sec Loss 34.6128 LearningRate 0.0002 Epoch: 0 Global Step: 19880 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:22,362-Speed 6306.03 samples/sec Loss 34.6768 LearningRate 0.0002 Epoch: 0 Global Step: 19890 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:25,609-Speed 6309.01 samples/sec Loss 34.6352 LearningRate 0.0002 Epoch: 0 Global Step: 19900 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:28,860-Speed 6300.80 samples/sec Loss 34.5777 LearningRate 0.0002 Epoch: 0 Global Step: 19910 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:32,108-Speed 6307.47 samples/sec Loss 34.6773 LearningRate 0.0002 Epoch: 0 Global Step: 19920 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:35,339-Speed 6340.27 samples/sec Loss 34.6498 LearningRate 0.0002 Epoch: 0 Global Step: 19930 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:32:38,568-Speed 6344.11 samples/sec Loss 34.5238 LearningRate 0.0002 Epoch: 0 Global Step: 19940 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:41,815-Speed 6308.72 samples/sec Loss 34.5982 LearningRate 0.0002 Epoch: 0 Global Step: 19950 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:45,059-Speed 6314.61 samples/sec Loss 34.6185 LearningRate 0.0002 Epoch: 0 Global Step: 19960 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:48,312-Speed 6297.39 samples/sec Loss 34.5521 LearningRate 0.0002 Epoch: 0 Global Step: 19970 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:51,561-Speed 6305.18 samples/sec Loss 
34.5286 LearningRate 0.0002 Epoch: 0 Global Step: 19980 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:32:54,794-Speed 6335.55 samples/sec Loss 34.5539 LearningRate 0.0002 Epoch: 0 Global Step: 19990 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:32:58,035-Speed 6320.17 samples/sec Loss 34.6258 LearningRate 0.0002 Epoch: 0 Global Step: 20000 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:01,280-Speed 6312.58 samples/sec Loss 34.5429 LearningRate 0.0002 Epoch: 0 Global Step: 20010 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:04,532-Speed 6300.96 samples/sec Loss 34.4689 LearningRate 0.0002 Epoch: 0 Global Step: 20020 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:07,776-Speed 6313.00 samples/sec Loss 34.5210 LearningRate 0.0002 Epoch: 0 Global Step: 20030 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:11,026-Speed 6303.42 samples/sec Loss 34.4916 LearningRate 0.0002 Epoch: 0 Global Step: 20040 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:14,270-Speed 6314.11 samples/sec Loss 34.5430 LearningRate 0.0002 Epoch: 0 Global Step: 20050 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:17,513-Speed 6317.54 samples/sec Loss 34.4911 LearningRate 0.0002 Epoch: 0 Global Step: 20060 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:20,764-Speed 6299.75 samples/sec Loss 34.5203 LearningRate 0.0002 Epoch: 0 Global Step: 20070 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:24,008-Speed 6315.64 samples/sec Loss 34.4937 LearningRate 0.0002 Epoch: 0 Global Step: 20080 Fp16 Grad Scale: 4096 Required: 73 hours Training: 2022-03-31 17:33:27,254-Speed 6311.26 samples/sec Loss 34.4737 LearningRate 0.0002 Epoch: 0 Global Step: 20090 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:30,499-Speed 6311.52 samples/sec Loss 34.4358 LearningRate 0.0002 Epoch: 0 Global Step: 20100 
Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:33,748-Speed 6305.96 samples/sec Loss 34.4637 LearningRate 0.0002 Epoch: 0 Global Step: 20110 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:36,997-Speed 6304.47 samples/sec Loss 34.4992 LearningRate 0.0002 Epoch: 0 Global Step: 20120 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:40,245-Speed 6305.75 samples/sec Loss 34.4690 LearningRate 0.0002 Epoch: 0 Global Step: 20130 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:43,490-Speed 6313.69 samples/sec Loss 34.4756 LearningRate 0.0002 Epoch: 0 Global Step: 20140 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:46,733-Speed 6316.42 samples/sec Loss 34.4176 LearningRate 0.0002 Epoch: 0 Global Step: 20150 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:49,976-Speed 6316.75 samples/sec Loss 34.4876 LearningRate 0.0002 Epoch: 0 Global Step: 20160 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:53,220-Speed 6313.85 samples/sec Loss 34.3930 LearningRate 0.0002 Epoch: 0 Global Step: 20170 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:56,459-Speed 6323.92 samples/sec Loss 34.4552 LearningRate 0.0002 Epoch: 0 Global Step: 20180 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:33:59,708-Speed 6305.32 samples/sec Loss 34.4000 LearningRate 0.0002 Epoch: 0 Global Step: 20190 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:02,986-Speed 6250.60 samples/sec Loss 34.4475 LearningRate 0.0002 Epoch: 0 Global Step: 20200 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:06,350-Speed 6088.75 samples/sec Loss 34.3960 LearningRate 0.0002 Epoch: 0 Global Step: 20210 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:09,599-Speed 6305.37 samples/sec Loss 34.4037 LearningRate 0.0002 Epoch: 0 Global Step: 20220 Fp16 Grad Scale: 16384 Required: 73 hours Training: 
2022-03-31 17:34:12,826-Speed 6347.71 samples/sec Loss 34.4567 LearningRate 0.0002 Epoch: 0 Global Step: 20230 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:16,074-Speed 6306.06 samples/sec Loss 34.4497 LearningRate 0.0002 Epoch: 0 Global Step: 20240 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:19,327-Speed 6297.57 samples/sec Loss 34.4395 LearningRate 0.0002 Epoch: 0 Global Step: 20250 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:22,568-Speed 6321.39 samples/sec Loss 34.4015 LearningRate 0.0002 Epoch: 0 Global Step: 20260 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:25,811-Speed 6315.74 samples/sec Loss 34.4261 LearningRate 0.0002 Epoch: 0 Global Step: 20270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:29,058-Speed 6310.37 samples/sec Loss 34.3666 LearningRate 0.0002 Epoch: 0 Global Step: 20280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:32,300-Speed 6317.17 samples/sec Loss 34.3973 LearningRate 0.0002 Epoch: 0 Global Step: 20290 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:35,544-Speed 6315.20 samples/sec Loss 34.3538 LearningRate 0.0002 Epoch: 0 Global Step: 20300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:38,791-Speed 6309.00 samples/sec Loss 34.3683 LearningRate 0.0002 Epoch: 0 Global Step: 20310 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:42,039-Speed 6306.57 samples/sec Loss 34.3839 LearningRate 0.0002 Epoch: 0 Global Step: 20320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:45,289-Speed 6302.39 samples/sec Loss 34.3849 LearningRate 0.0002 Epoch: 0 Global Step: 20330 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:48,536-Speed 6309.06 samples/sec Loss 34.3875 LearningRate 0.0002 Epoch: 0 Global Step: 20340 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:51,781-Speed 6312.92 samples/sec 
Loss 34.3698 LearningRate 0.0002 Epoch: 0 Global Step: 20350 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:34:55,018-Speed 6328.47 samples/sec Loss 34.5152 LearningRate 0.0002 Epoch: 0 Global Step: 20360 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:34:58,260-Speed 6317.34 samples/sec Loss 34.4473 LearningRate 0.0002 Epoch: 0 Global Step: 20370 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:01,504-Speed 6315.67 samples/sec Loss 34.4238 LearningRate 0.0002 Epoch: 0 Global Step: 20380 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:04,748-Speed 6314.92 samples/sec Loss 34.4844 LearningRate 0.0002 Epoch: 0 Global Step: 20390 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:07,996-Speed 6306.88 samples/sec Loss 34.3799 LearningRate 0.0002 Epoch: 0 Global Step: 20400 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:11,244-Speed 6307.46 samples/sec Loss 34.3810 LearningRate 0.0002 Epoch: 0 Global Step: 20410 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:14,489-Speed 6312.93 samples/sec Loss 34.3083 LearningRate 0.0002 Epoch: 0 Global Step: 20420 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:17,740-Speed 6299.79 samples/sec Loss 34.3229 LearningRate 0.0002 Epoch: 0 Global Step: 20430 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:20,985-Speed 6313.98 samples/sec Loss 34.2462 LearningRate 0.0002 Epoch: 0 Global Step: 20440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:24,230-Speed 6312.13 samples/sec Loss 34.2765 LearningRate 0.0002 Epoch: 0 Global Step: 20450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:27,477-Speed 6309.16 samples/sec Loss 34.2422 LearningRate 0.0002 Epoch: 0 Global Step: 20460 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:35:30,726-Speed 6303.80 samples/sec Loss 34.2828 LearningRate 0.0002 Epoch: 0 Global 
Step: 20470 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:35:33,960-Speed 6334.73 samples/sec Loss 34.2649 LearningRate 0.0002 Epoch: 0 Global Step: 20480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:37,206-Speed 6310.45 samples/sec Loss 34.2847 LearningRate 0.0002 Epoch: 0 Global Step: 20490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:40,450-Speed 6314.58 samples/sec Loss 34.2902 LearningRate 0.0002 Epoch: 0 Global Step: 20500 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:43,707-Speed 6288.85 samples/sec Loss 34.2590 LearningRate 0.0002 Epoch: 0 Global Step: 20510 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:46,954-Speed 6309.58 samples/sec Loss 34.3119 LearningRate 0.0002 Epoch: 0 Global Step: 20520 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:50,197-Speed 6315.81 samples/sec Loss 34.2762 LearningRate 0.0002 Epoch: 0 Global Step: 20530 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:53,442-Speed 6312.60 samples/sec Loss 34.2658 LearningRate 0.0002 Epoch: 0 Global Step: 20540 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:56,687-Speed 6311.60 samples/sec Loss 34.3360 LearningRate 0.0002 Epoch: 0 Global Step: 20550 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:35:59,939-Speed 6300.39 samples/sec Loss 34.2646 LearningRate 0.0002 Epoch: 0 Global Step: 20560 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:03,186-Speed 6308.65 samples/sec Loss 34.1960 LearningRate 0.0002 Epoch: 0 Global Step: 20570 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:06,445-Speed 6286.10 samples/sec Loss 34.2743 LearningRate 0.0002 Epoch: 0 Global Step: 20580 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:36:09,701-Speed 6292.30 samples/sec Loss 34.2077 LearningRate 0.0002 Epoch: 0 Global Step: 20590 Fp16 Grad Scale: 16384 Required: 73 hours 
Training: 2022-03-31 17:36:12,957-Speed 6289.65 samples/sec Loss 34.2190 LearningRate 0.0002 Epoch: 0 Global Step: 20600 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:36:16,206-Speed 6305.52 samples/sec Loss 34.1360 LearningRate 0.0002 Epoch: 0 Global Step: 20610 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:36:19,438-Speed 6338.81 samples/sec Loss 34.1554 LearningRate 0.0002 Epoch: 0 Global Step: 20620 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:22,699-Speed 6280.64 samples/sec Loss 34.2325 LearningRate 0.0002 Epoch: 0 Global Step: 20630 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:25,952-Speed 6297.31 samples/sec Loss 34.3918 LearningRate 0.0002 Epoch: 0 Global Step: 20640 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:29,196-Speed 6314.50 samples/sec Loss 34.3163 LearningRate 0.0002 Epoch: 0 Global Step: 20650 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:32,441-Speed 6312.66 samples/sec Loss 34.2867 LearningRate 0.0002 Epoch: 0 Global Step: 20660 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:35,691-Speed 6303.50 samples/sec Loss 34.1791 LearningRate 0.0002 Epoch: 0 Global Step: 20670 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:38,937-Speed 6311.81 samples/sec Loss 34.2234 LearningRate 0.0002 Epoch: 0 Global Step: 20680 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:42,185-Speed 6306.91 samples/sec Loss 34.1571 LearningRate 0.0002 Epoch: 0 Global Step: 20690 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:45,428-Speed 6315.15 samples/sec Loss 34.1394 LearningRate 0.0002 Epoch: 0 Global Step: 20700 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:48,671-Speed 6317.56 samples/sec Loss 34.1687 LearningRate 0.0002 Epoch: 0 Global Step: 20710 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:36:51,915-Speed 6314.71 
samples/sec Loss 34.2030 LearningRate 0.0002 Epoch: 0 Global Step: 20720 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:36:55,167-Speed 6297.33 samples/sec Loss 34.1895 LearningRate 0.0002 Epoch: 0 Global Step: 20730 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:36:58,410-Speed 6317.57 samples/sec Loss 34.3434 LearningRate 0.0003 Epoch: 0 Global Step: 20740 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:37:59,357-Speed 336.04 samples/sec Loss 34.2268 LearningRate 0.0003 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:02,595-Speed 6326.13 samples/sec Loss 34.1491 LearningRate 0.0003 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:05,841-Speed 6311.31 samples/sec Loss 34.1512 LearningRate 0.0003 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:09,075-Speed 6334.98 samples/sec Loss 34.1949 LearningRate 0.0003 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:12,310-Speed 6331.46 samples/sec Loss 34.1304 LearningRate 0.0003 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:15,547-Speed 6327.20 samples/sec Loss 34.1564 LearningRate 0.0003 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:18,783-Speed 6331.75 samples/sec Loss 34.1203 LearningRate 0.0003 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:38:22,036-Speed 6297.05 samples/sec Loss 34.0872 LearningRate 0.0003 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:38:25,349-Speed 6182.41 samples/sec Loss 34.1637 LearningRate 0.0003 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:38:28,587-Speed 6326.58 samples/sec Loss 34.0702 LearningRate 0.0003 
Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:38:31,822-Speed 6332.69 samples/sec Loss 34.0496 LearningRate 0.0003 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:38:35,061-Speed 6324.44 samples/sec Loss 34.0255 LearningRate 0.0003 Epoch: 1 Global Step: 20860 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:38:38,278-Speed 6368.35 samples/sec Loss 34.0855 LearningRate 0.0003 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:41,520-Speed 6318.17 samples/sec Loss 34.0724 LearningRate 0.0003 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:44,760-Speed 6321.70 samples/sec Loss 34.1495 LearningRate 0.0003 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:47,997-Speed 6328.66 samples/sec Loss 34.0526 LearningRate 0.0003 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:51,239-Speed 6317.40 samples/sec Loss 34.0143 LearningRate 0.0003 Epoch: 1 Global Step: 20910 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:54,476-Speed 6328.40 samples/sec Loss 33.9882 LearningRate 0.0003 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:38:57,716-Speed 6322.35 samples/sec Loss 34.3018 LearningRate 0.0003 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:01,008-Speed 6222.61 samples/sec Loss 34.1893 LearningRate 0.0003 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:04,250-Speed 6319.19 samples/sec Loss 34.0664 LearningRate 0.0003 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:07,492-Speed 6318.49 samples/sec Loss 34.0602 LearningRate 0.0003 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 8192 
Required: 74 hours Training: 2022-03-31 17:39:10,732-Speed 6321.91 samples/sec Loss 34.0481 LearningRate 0.0003 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:39:13,972-Speed 6323.15 samples/sec Loss 34.0185 LearningRate 0.0003 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:39:17,234-Speed 6279.02 samples/sec Loss 33.9669 LearningRate 0.0003 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:39:20,486-Speed 6300.16 samples/sec Loss 33.9393 LearningRate 0.0003 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:39:23,713-Speed 6347.74 samples/sec Loss 33.9778 LearningRate 0.0003 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:26,960-Speed 6307.49 samples/sec Loss 34.0466 LearningRate 0.0003 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:30,196-Speed 6331.63 samples/sec Loss 34.0193 LearningRate 0.0003 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:33,435-Speed 6323.51 samples/sec Loss 33.9974 LearningRate 0.0003 Epoch: 1 Global Step: 21040 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:36,684-Speed 6305.81 samples/sec Loss 33.9440 LearningRate 0.0003 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:39,919-Speed 6332.06 samples/sec Loss 33.9750 LearningRate 0.0003 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:43,156-Speed 6328.05 samples/sec Loss 33.9474 LearningRate 0.0003 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:46,395-Speed 6323.09 samples/sec Loss 33.9063 LearningRate 0.0003 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 
17:39:49,631-Speed 6330.00 samples/sec Loss 33.8973 LearningRate 0.0003 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:52,870-Speed 6325.90 samples/sec Loss 33.8932 LearningRate 0.0003 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:39:56,127-Speed 6289.42 samples/sec Loss 33.9150 LearningRate 0.0003 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:39:59,365-Speed 6325.08 samples/sec Loss 33.9404 LearningRate 0.0003 Epoch: 1 Global Step: 21120 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:02,605-Speed 6323.60 samples/sec Loss 33.9473 LearningRate 0.0003 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:05,847-Speed 6318.84 samples/sec Loss 33.9258 LearningRate 0.0003 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:09,096-Speed 6303.74 samples/sec Loss 33.9027 LearningRate 0.0003 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:12,330-Speed 6333.89 samples/sec Loss 33.9104 LearningRate 0.0003 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:15,567-Speed 6328.69 samples/sec Loss 33.8408 LearningRate 0.0003 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:18,801-Speed 6334.28 samples/sec Loss 33.8202 LearningRate 0.0003 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:22,038-Speed 6329.74 samples/sec Loss 33.8868 LearningRate 0.0003 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:25,272-Speed 6333.31 samples/sec Loss 33.8696 LearningRate 0.0003 Epoch: 1 Global Step: 21200 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:28,511-Speed 6325.26 samples/sec Loss 
33.8548 LearningRate 0.0003 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:40:31,747-Speed 6329.15 samples/sec Loss 33.8111 LearningRate 0.0003 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 32768 Required: 74 hours Training: 2022-03-31 17:40:34,970-Speed 6355.68 samples/sec Loss 33.8460 LearningRate 0.0003 Epoch: 1 Global Step: 21230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:38,211-Speed 6322.04 samples/sec Loss 33.8643 LearningRate 0.0003 Epoch: 1 Global Step: 21240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:41,447-Speed 6329.88 samples/sec Loss 33.7980 LearningRate 0.0003 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:44,682-Speed 6330.61 samples/sec Loss 33.8089 LearningRate 0.0003 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:47,919-Speed 6329.83 samples/sec Loss 33.7795 LearningRate 0.0003 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:51,157-Speed 6325.10 samples/sec Loss 33.8704 LearningRate 0.0003 Epoch: 1 Global Step: 21280 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:54,392-Speed 6332.01 samples/sec Loss 33.7606 LearningRate 0.0003 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:40:57,629-Speed 6329.15 samples/sec Loss 33.8333 LearningRate 0.0003 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:41:00,858-Speed 6344.29 samples/sec Loss 33.7327 LearningRate 0.0003 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:04,095-Speed 6328.20 samples/sec Loss 33.7389 LearningRate 0.0003 Epoch: 1 Global Step: 21320 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:07,332-Speed 6327.35 samples/sec Loss 33.7083 LearningRate 0.0003 Epoch: 1 Global 
Step: 21330 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:10,568-Speed 6329.80 samples/sec Loss 33.7100 LearningRate 0.0003 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:13,808-Speed 6323.88 samples/sec Loss 33.7351 LearningRate 0.0003 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:17,049-Speed 6320.20 samples/sec Loss 33.7091 LearningRate 0.0003 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:20,285-Speed 6329.54 samples/sec Loss 33.7248 LearningRate 0.0003 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:23,525-Speed 6322.75 samples/sec Loss 33.6943 LearningRate 0.0003 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:26,766-Speed 6321.03 samples/sec Loss 33.6511 LearningRate 0.0003 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:30,008-Speed 6319.17 samples/sec Loss 33.6847 LearningRate 0.0003 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:33,243-Speed 6331.51 samples/sec Loss 33.6573 LearningRate 0.0003 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:41:36,466-Speed 6355.38 samples/sec Loss 33.7200 LearningRate 0.0003 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:39,706-Speed 6322.34 samples/sec Loss 33.6239 LearningRate 0.0003 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:42,944-Speed 6326.88 samples/sec Loss 33.6438 LearningRate 0.0003 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:46,179-Speed 6331.84 samples/sec Loss 33.6544 LearningRate 0.0003 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 8192 Required: 74 hours 
Training: 2022-03-31 17:41:49,416-Speed 6329.36 samples/sec Loss 33.6335 LearningRate 0.0003 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:52,656-Speed 6322.52 samples/sec Loss 33.6210 LearningRate 0.0003 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:55,894-Speed 6324.71 samples/sec Loss 33.5614 LearningRate 0.0003 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:41:59,132-Speed 6326.56 samples/sec Loss 33.6326 LearningRate 0.0003 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:02,372-Speed 6322.28 samples/sec Loss 33.6278 LearningRate 0.0003 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:05,615-Speed 6316.73 samples/sec Loss 33.5709 LearningRate 0.0003 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:08,852-Speed 6329.33 samples/sec Loss 33.6013 LearningRate 0.0003 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:42:12,093-Speed 6319.43 samples/sec Loss 33.5434 LearningRate 0.0003 Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:42:15,342-Speed 6304.56 samples/sec Loss 33.5966 LearningRate 0.0003 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:42:18,582-Speed 6322.78 samples/sec Loss 33.5774 LearningRate 0.0003 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:42:21,803-Speed 6360.21 samples/sec Loss 33.5507 LearningRate 0.0003 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:25,046-Speed 6316.49 samples/sec Loss 33.6300 LearningRate 0.0003 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:28,280-Speed 6333.60 
samples/sec Loss 33.5177 LearningRate 0.0003 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:31,514-Speed 6333.90 samples/sec Loss 33.5173 LearningRate 0.0003 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:34,754-Speed 6323.04 samples/sec Loss 33.5831 LearningRate 0.0003 Epoch: 1 Global Step: 21600 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:37,990-Speed 6331.25 samples/sec Loss 33.5189 LearningRate 0.0003 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:41,226-Speed 6328.85 samples/sec Loss 33.5538 LearningRate 0.0003 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:44,471-Speed 6313.92 samples/sec Loss 33.5609 LearningRate 0.0003 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:47,710-Speed 6324.73 samples/sec Loss 33.5085 LearningRate 0.0003 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:50,952-Speed 6318.25 samples/sec Loss 33.4976 LearningRate 0.0003 Epoch: 1 Global Step: 21650 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:42:54,187-Speed 6332.35 samples/sec Loss 33.5017 LearningRate 0.0003 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:42:57,426-Speed 6323.90 samples/sec Loss 33.5120 LearningRate 0.0003 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:00,662-Speed 6329.60 samples/sec Loss 33.5017 LearningRate 0.0003 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:03,898-Speed 6329.91 samples/sec Loss 33.5002 LearningRate 0.0003 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:07,133-Speed 6333.67 samples/sec Loss 33.4725 LearningRate 0.0003 Epoch: 
1 Global Step: 21700 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:10,371-Speed 6325.60 samples/sec Loss 33.4705 LearningRate 0.0003 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:13,614-Speed 6316.12 samples/sec Loss 33.4145 LearningRate 0.0003 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:16,852-Speed 6326.09 samples/sec Loss 33.5259 LearningRate 0.0003 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:20,091-Speed 6325.12 samples/sec Loss 33.4570 LearningRate 0.0003 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:43:23,317-Speed 6350.03 samples/sec Loss 33.3438 LearningRate 0.0003 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:26,554-Speed 6328.15 samples/sec Loss 33.3855 LearningRate 0.0003 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:29,790-Speed 6330.48 samples/sec Loss 33.4289 LearningRate 0.0003 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:33,031-Speed 6319.58 samples/sec Loss 33.4363 LearningRate 0.0003 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:36,273-Speed 6320.12 samples/sec Loss 33.3751 LearningRate 0.0003 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:39,511-Speed 6325.46 samples/sec Loss 33.3805 LearningRate 0.0003 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:43:42,737-Speed 6349.09 samples/sec Loss 33.3623 LearningRate 0.0003 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:43:45,971-Speed 6334.49 samples/sec Loss 33.3855 LearningRate 0.0003 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 4096 
Required: 74 hours Training: 2022-03-31 17:43:49,206-Speed 6332.66 samples/sec Loss 33.3213 LearningRate 0.0003 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:43:52,539-Speed 6145.78 samples/sec Loss 33.3226 LearningRate 0.0003 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:43:55,776-Speed 6327.61 samples/sec Loss 33.3418 LearningRate 0.0003 Epoch: 1 Global Step: 21850 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:43:59,016-Speed 6323.79 samples/sec Loss 33.4052 LearningRate 0.0003 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:44:02,267-Speed 6301.10 samples/sec Loss 33.3744 LearningRate 0.0003 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:44:05,507-Speed 6321.47 samples/sec Loss 33.3871 LearningRate 0.0003 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:44:08,744-Speed 6329.21 samples/sec Loss 33.3115 LearningRate 0.0003 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:44:11,998-Speed 6294.27 samples/sec Loss 33.3411 LearningRate 0.0003 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 4096 Required: 74 hours Training: 2022-03-31 17:44:15,239-Speed 6319.77 samples/sec Loss 33.2941 LearningRate 0.0003 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:18,477-Speed 6327.59 samples/sec Loss 33.1958 LearningRate 0.0003 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:21,714-Speed 6327.45 samples/sec Loss 33.2915 LearningRate 0.0003 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:24,951-Speed 6327.47 samples/sec Loss 33.2632 LearningRate 0.0003 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 
17:44:28,193-Speed 6319.72 samples/sec Loss 33.2738 LearningRate 0.0003 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:31,432-Speed 6324.51 samples/sec Loss 33.3004 LearningRate 0.0003 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:34,668-Speed 6330.07 samples/sec Loss 33.2788 LearningRate 0.0003 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:37,904-Speed 6329.89 samples/sec Loss 33.2672 LearningRate 0.0003 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:41,152-Speed 6307.15 samples/sec Loss 33.1869 LearningRate 0.0003 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:44,393-Speed 6321.50 samples/sec Loss 33.2184 LearningRate 0.0003 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:44:47,637-Speed 6313.92 samples/sec Loss 33.1314 LearningRate 0.0003 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:44:50,876-Speed 6325.95 samples/sec Loss 33.2810 LearningRate 0.0003 Epoch: 1 Global Step: 22020 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:44:54,110-Speed 6332.45 samples/sec Loss 33.2530 LearningRate 0.0003 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:44:57,351-Speed 6321.51 samples/sec Loss 33.2124 LearningRate 0.0003 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:00,591-Speed 6321.19 samples/sec Loss 33.0756 LearningRate 0.0003 Epoch: 1 Global Step: 22050 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:03,836-Speed 6313.06 samples/sec Loss 33.1471 LearningRate 0.0003 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:07,076-Speed 6323.01 samples/sec Loss 
33.1769 LearningRate 0.0003 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:10,315-Speed 6323.21 samples/sec Loss 33.1679 LearningRate 0.0003 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:13,556-Speed 6321.05 samples/sec Loss 33.1528 LearningRate 0.0003 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:16,783-Speed 6347.04 samples/sec Loss 33.1151 LearningRate 0.0003 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:20,046-Speed 6278.34 samples/sec Loss 33.0837 LearningRate 0.0003 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:23,287-Speed 6320.69 samples/sec Loss 33.0902 LearningRate 0.0003 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:26,542-Speed 6292.45 samples/sec Loss 33.0226 LearningRate 0.0003 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:29,784-Speed 6319.92 samples/sec Loss 33.1157 LearningRate 0.0003 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:33,021-Speed 6327.80 samples/sec Loss 33.0004 LearningRate 0.0003 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:36,260-Speed 6325.09 samples/sec Loss 33.0845 LearningRate 0.0003 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:39,497-Speed 6326.91 samples/sec Loss 32.9956 LearningRate 0.0003 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:42,739-Speed 6319.31 samples/sec Loss 33.0857 LearningRate 0.0003 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:45,980-Speed 6319.95 samples/sec Loss 33.0898 LearningRate 0.0003 Epoch: 1 Global Step: 
22190 Fp16 Grad Scale: 8192 Required: 74 hours Training: 2022-03-31 17:45:49,221-Speed 6322.27 samples/sec Loss 33.0996 LearningRate 0.0003 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:52,459-Speed 6325.17 samples/sec Loss 33.0625 LearningRate 0.0003 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:55,701-Speed 6319.55 samples/sec Loss 32.9050 LearningRate 0.0003 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:45:58,940-Speed 6324.51 samples/sec Loss 33.0283 LearningRate 0.0003 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:46:02,184-Speed 6314.56 samples/sec Loss 33.0575 LearningRate 0.0003 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 16384 Required: 74 hours Training: 2022-03-31 17:46:05,425-Speed 6319.37 samples/sec Loss 33.0801 LearningRate 0.0003 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:08,651-Speed 6350.51 samples/sec Loss 32.9914 LearningRate 0.0003 Epoch: 1 Global Step: 22260 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:11,892-Speed 6319.28 samples/sec Loss 33.0483 LearningRate 0.0003 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:15,133-Speed 6320.76 samples/sec Loss 32.9499 LearningRate 0.0003 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:18,371-Speed 6326.90 samples/sec Loss 32.8826 LearningRate 0.0003 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:21,613-Speed 6318.95 samples/sec Loss 32.8963 LearningRate 0.0003 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:24,851-Speed 6324.99 samples/sec Loss 32.9471 LearningRate 0.0003 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 8192 Required: 73 hours 
Training: 2022-03-31 17:46:28,091-Speed 6323.62 samples/sec Loss 32.9360 LearningRate 0.0003 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:31,327-Speed 6329.38 samples/sec Loss 32.9179 LearningRate 0.0003 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:34,564-Speed 6329.67 samples/sec Loss 32.9056 LearningRate 0.0003 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:37,807-Speed 6315.41 samples/sec Loss 32.8625 LearningRate 0.0003 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:46:41,046-Speed 6325.40 samples/sec Loss 32.8443 LearningRate 0.0003 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:44,286-Speed 6321.70 samples/sec Loss 32.8631 LearningRate 0.0003 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:47,524-Speed 6325.99 samples/sec Loss 32.8975 LearningRate 0.0003 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:50,766-Speed 6319.54 samples/sec Loss 32.8650 LearningRate 0.0003 Epoch: 1 Global Step: 22390 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:54,007-Speed 6319.70 samples/sec Loss 32.7762 LearningRate 0.0003 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:46:57,248-Speed 6321.79 samples/sec Loss 32.8808 LearningRate 0.0003 Epoch: 1 Global Step: 22410 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:00,492-Speed 6315.05 samples/sec Loss 32.8482 LearningRate 0.0003 Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:03,731-Speed 6323.21 samples/sec Loss 32.7748 LearningRate 0.0003 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:06,958-Speed 6347.87 
samples/sec Loss 32.9052 LearningRate 0.0003 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:10,197-Speed 6325.42 samples/sec Loss 32.8927 LearningRate 0.0003 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:13,433-Speed 6329.24 samples/sec Loss 32.7877 LearningRate 0.0003 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:16,676-Speed 6315.44 samples/sec Loss 32.8011 LearningRate 0.0003 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:19,920-Speed 6315.23 samples/sec Loss 32.7598 LearningRate 0.0003 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:23,159-Speed 6324.96 samples/sec Loss 32.7425 LearningRate 0.0003 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:26,396-Speed 6328.31 samples/sec Loss 32.7408 LearningRate 0.0003 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:29,639-Speed 6316.49 samples/sec Loss 32.7684 LearningRate 0.0003 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:32,879-Speed 6322.84 samples/sec Loss 32.7180 LearningRate 0.0003 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:36,120-Speed 6319.07 samples/sec Loss 32.7296 LearningRate 0.0003 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:47:39,366-Speed 6311.78 samples/sec Loss 32.7210 LearningRate 0.0003 Epoch: 1 Global Step: 22540 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:42,609-Speed 6315.27 samples/sec Loss 32.6356 LearningRate 0.0003 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:45,848-Speed 6323.90 samples/sec Loss 32.7121 LearningRate 0.0003 Epoch: 1 
Global Step: 22560 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:49,087-Speed 6324.77 samples/sec Loss 32.7438 LearningRate 0.0003 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:52,326-Speed 6324.06 samples/sec Loss 32.6393 LearningRate 0.0003 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:55,572-Speed 6311.20 samples/sec Loss 32.6001 LearningRate 0.0003 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:47:58,818-Speed 6311.53 samples/sec Loss 32.6617 LearningRate 0.0003 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:02,061-Speed 6316.93 samples/sec Loss 32.5746 LearningRate 0.0003 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:05,298-Speed 6329.52 samples/sec Loss 32.6288 LearningRate 0.0003 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:08,540-Speed 6318.51 samples/sec Loss 32.6139 LearningRate 0.0003 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:11,778-Speed 6326.38 samples/sec Loss 32.5542 LearningRate 0.0003 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:48:15,024-Speed 6309.37 samples/sec Loss 32.6843 LearningRate 0.0003 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:48:18,272-Speed 6306.45 samples/sec Loss 32.6331 LearningRate 0.0003 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:48:21,526-Speed 6295.00 samples/sec Loss 32.5858 LearningRate 0.0003 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:48:24,756-Speed 6343.28 samples/sec Loss 32.5985 LearningRate 0.0003 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 16384 
Required: 73 hours Training: 2022-03-31 17:48:28,008-Speed 6299.40 samples/sec Loss 32.5496 LearningRate 0.0003 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:31,256-Speed 6305.14 samples/sec Loss 32.5182 LearningRate 0.0003 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:34,499-Speed 6316.81 samples/sec Loss 32.5917 LearningRate 0.0003 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:37,739-Speed 6323.90 samples/sec Loss 32.4735 LearningRate 0.0003 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:40,984-Speed 6312.47 samples/sec Loss 32.5423 LearningRate 0.0003 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:44,230-Speed 6309.23 samples/sec Loss 32.5502 LearningRate 0.0003 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:47,470-Speed 6323.57 samples/sec Loss 32.5679 LearningRate 0.0003 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:50,714-Speed 6314.30 samples/sec Loss 32.4843 LearningRate 0.0003 Epoch: 1 Global Step: 22760 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:53,953-Speed 6323.62 samples/sec Loss 32.4761 LearningRate 0.0003 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:48:57,195-Speed 6318.47 samples/sec Loss 32.5042 LearningRate 0.0003 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:00,424-Speed 6345.36 samples/sec Loss 32.4542 LearningRate 0.0003 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:03,669-Speed 6311.88 samples/sec Loss 32.4040 LearningRate 0.0003 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 
17:49:06,909-Speed 6323.98 samples/sec Loss 32.4592 LearningRate 0.0003 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:10,154-Speed 6312.81 samples/sec Loss 32.4082 LearningRate 0.0003 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:13,394-Speed 6322.31 samples/sec Loss 32.4241 LearningRate 0.0003 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:16,636-Speed 6318.46 samples/sec Loss 32.4193 LearningRate 0.0003 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:19,877-Speed 6319.74 samples/sec Loss 32.3460 LearningRate 0.0003 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:23,140-Speed 6278.82 samples/sec Loss 32.3357 LearningRate 0.0003 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:26,379-Speed 6323.48 samples/sec Loss 32.3151 LearningRate 0.0003 Epoch: 1 Global Step: 22870 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:29,621-Speed 6318.52 samples/sec Loss 32.3258 LearningRate 0.0003 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:32,862-Speed 6319.87 samples/sec Loss 32.3619 LearningRate 0.0003 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:36,105-Speed 6316.57 samples/sec Loss 32.3194 LearningRate 0.0003 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:39,351-Speed 6312.16 samples/sec Loss 32.2772 LearningRate 0.0003 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:42,592-Speed 6320.31 samples/sec Loss 32.2466 LearningRate 0.0003 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:45,830-Speed 6325.33 samples/sec Loss 
32.2753 LearningRate 0.0003 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:49:49,070-Speed 6323.30 samples/sec Loss 32.3153 LearningRate 0.0003 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:52,307-Speed 6326.32 samples/sec Loss 32.2745 LearningRate 0.0003 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:55,545-Speed 6326.79 samples/sec Loss 32.2660 LearningRate 0.0003 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:49:58,789-Speed 6315.36 samples/sec Loss 32.2175 LearningRate 0.0003 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:02,042-Speed 6297.42 samples/sec Loss 32.2162 LearningRate 0.0003 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:05,332-Speed 6225.56 samples/sec Loss 32.2541 LearningRate 0.0003 Epoch: 1 Global Step: 22990 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:08,572-Speed 6323.18 samples/sec Loss 32.1494 LearningRate 0.0003 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:11,817-Speed 6313.37 samples/sec Loss 32.2022 LearningRate 0.0003 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:15,068-Speed 6299.97 samples/sec Loss 32.2394 LearningRate 0.0003 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:18,310-Speed 6319.80 samples/sec Loss 32.2186 LearningRate 0.0003 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:21,553-Speed 6315.39 samples/sec Loss 32.2311 LearningRate 0.0003 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:50:24,796-Speed 6316.90 samples/sec Loss 32.1892 LearningRate 0.0003 Epoch: 1 Global 
Step: 23050 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:50:28,044-Speed 6307.84 samples/sec Loss 32.1777 LearningRate 0.0003 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:50:31,301-Speed 6288.59 samples/sec Loss 32.1201 LearningRate 0.0003 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:50:34,534-Speed 6336.99 samples/sec Loss 32.1272 LearningRate 0.0003 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:37,778-Speed 6314.52 samples/sec Loss 32.0617 LearningRate 0.0003 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:41,037-Speed 6286.90 samples/sec Loss 32.1502 LearningRate 0.0003 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:44,278-Speed 6320.12 samples/sec Loss 32.1687 LearningRate 0.0003 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:47,520-Speed 6318.47 samples/sec Loss 32.1145 LearningRate 0.0003 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:50,767-Speed 6308.44 samples/sec Loss 32.0708 LearningRate 0.0003 Epoch: 1 Global Step: 23130 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:54,008-Speed 6320.48 samples/sec Loss 32.1526 LearningRate 0.0003 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:50:57,246-Speed 6326.43 samples/sec Loss 32.0411 LearningRate 0.0003 Epoch: 1 Global Step: 23150 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:00,491-Speed 6312.24 samples/sec Loss 32.0683 LearningRate 0.0003 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:03,731-Speed 6323.21 samples/sec Loss 32.0933 LearningRate 0.0003 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 16384 
Required: 73 hours Training: 2022-03-31 17:51:06,971-Speed 6321.75 samples/sec Loss 32.0595 LearningRate 0.0003 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:10,216-Speed 6312.01 samples/sec Loss 32.0587 LearningRate 0.0003 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:13,456-Speed 6322.85 samples/sec Loss 32.1072 LearningRate 0.0003 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:16,688-Speed 6338.03 samples/sec Loss 32.0509 LearningRate 0.0003 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:19,928-Speed 6323.49 samples/sec Loss 32.0053 LearningRate 0.0003 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:23,171-Speed 6317.23 samples/sec Loss 32.0229 LearningRate 0.0003 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:26,411-Speed 6321.14 samples/sec Loss 31.9768 LearningRate 0.0003 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:29,674-Speed 6277.57 samples/sec Loss 32.0001 LearningRate 0.0003 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:32,913-Speed 6324.39 samples/sec Loss 31.9745 LearningRate 0.0003 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:36,153-Speed 6323.72 samples/sec Loss 31.9446 LearningRate 0.0003 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:39,393-Speed 6322.16 samples/sec Loss 31.8909 LearningRate 0.0003 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:42,633-Speed 6321.71 samples/sec Loss 31.8653 LearningRate 0.0003 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 
17:51:45,873-Speed 6322.71 samples/sec Loss 31.8780 LearningRate 0.0003 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:51:49,119-Speed 6311.36 samples/sec Loss 31.9331 LearningRate 0.0003 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:52,359-Speed 6321.95 samples/sec Loss 31.8723 LearningRate 0.0003 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:55,605-Speed 6310.81 samples/sec Loss 31.9432 LearningRate 0.0003 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:51:58,869-Speed 6275.73 samples/sec Loss 31.9444 LearningRate 0.0003 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:02,112-Speed 6315.48 samples/sec Loss 31.8247 LearningRate 0.0003 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:05,353-Speed 6320.93 samples/sec Loss 31.8108 LearningRate 0.0003 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:08,598-Speed 6313.79 samples/sec Loss 31.8114 LearningRate 0.0003 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:11,846-Speed 6305.63 samples/sec Loss 31.8326 LearningRate 0.0003 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:15,089-Speed 6316.41 samples/sec Loss 31.8011 LearningRate 0.0003 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:18,333-Speed 6315.70 samples/sec Loss 31.7992 LearningRate 0.0003 Epoch: 1 Global Step: 23400 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:52:21,571-Speed 6326.28 samples/sec Loss 31.8034 LearningRate 0.0003 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:24,818-Speed 6309.82 samples/sec Loss 
31.8387 LearningRate 0.0003 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:28,059-Speed 6320.35 samples/sec Loss 31.7944 LearningRate 0.0003 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:31,302-Speed 6315.62 samples/sec Loss 31.7776 LearningRate 0.0003 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:34,547-Speed 6312.99 samples/sec Loss 31.7807 LearningRate 0.0003 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:37,791-Speed 6314.21 samples/sec Loss 31.7567 LearningRate 0.0003 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:41,035-Speed 6314.11 samples/sec Loss 31.7079 LearningRate 0.0003 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:44,277-Speed 6319.42 samples/sec Loss 31.6478 LearningRate 0.0003 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:47,523-Speed 6309.37 samples/sec Loss 31.7037 LearningRate 0.0003 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:50,767-Speed 6315.51 samples/sec Loss 31.7329 LearningRate 0.0003 Epoch: 1 Global Step: 23500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:52:54,010-Speed 6317.17 samples/sec Loss 31.6578 LearningRate 0.0003 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:52:57,251-Speed 6320.50 samples/sec Loss 31.6931 LearningRate 0.0003 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:53:00,485-Speed 6334.04 samples/sec Loss 31.7531 LearningRate 0.0003 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:03,743-Speed 6287.75 samples/sec Loss 31.6220 LearningRate 0.0003 Epoch: 1 
Global Step: 23540 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:06,998-Speed 6292.35 samples/sec Loss 31.6368 LearningRate 0.0003 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:10,243-Speed 6311.96 samples/sec Loss 31.6256 LearningRate 0.0003 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:13,485-Speed 6318.24 samples/sec Loss 31.5950 LearningRate 0.0003 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:16,730-Speed 6312.54 samples/sec Loss 31.5704 LearningRate 0.0003 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:19,977-Speed 6309.74 samples/sec Loss 31.5382 LearningRate 0.0003 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:23,223-Speed 6310.46 samples/sec Loss 31.4955 LearningRate 0.0003 Epoch: 1 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:26,468-Speed 6313.17 samples/sec Loss 31.6136 LearningRate 0.0003 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:29,713-Speed 6313.30 samples/sec Loss 31.5211 LearningRate 0.0003 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:32,957-Speed 6315.06 samples/sec Loss 31.5481 LearningRate 0.0003 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:53:36,186-Speed 6343.62 samples/sec Loss 31.5219 LearningRate 0.0003 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:39,429-Speed 6315.85 samples/sec Loss 31.5188 LearningRate 0.0003 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:53:42,673-Speed 6315.43 samples/sec Loss 31.4683 LearningRate 0.0003 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 65536 
Required: 73 hours Training: 2022-03-31 17:53:45,902-Speed 6343.93 samples/sec Loss 31.5262 LearningRate 0.0003 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:53:49,140-Speed 6326.39 samples/sec Loss 31.4385 LearningRate 0.0003 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:53:52,387-Speed 6307.69 samples/sec Loss 31.5009 LearningRate 0.0003 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:53:55,625-Speed 6326.39 samples/sec Loss 31.5232 LearningRate 0.0003 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:53:58,868-Speed 6316.77 samples/sec Loss 31.4126 LearningRate 0.0003 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:02,115-Speed 6309.22 samples/sec Loss 31.4651 LearningRate 0.0003 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:05,356-Speed 6320.52 samples/sec Loss 31.4039 LearningRate 0.0003 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:08,600-Speed 6314.88 samples/sec Loss 31.4659 LearningRate 0.0003 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:11,827-Speed 6348.12 samples/sec Loss 31.4239 LearningRate 0.0003 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:15,068-Speed 6318.82 samples/sec Loss 31.2792 LearningRate 0.0003 Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:18,308-Speed 6323.02 samples/sec Loss 31.2884 LearningRate 0.0003 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:21,550-Speed 6318.67 samples/sec Loss 31.2933 LearningRate 0.0003 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 
17:54:24,793-Speed 6316.52 samples/sec Loss 31.3772 LearningRate 0.0003 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:28,036-Speed 6316.88 samples/sec Loss 31.2986 LearningRate 0.0003 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:31,281-Speed 6313.26 samples/sec Loss 31.2454 LearningRate 0.0003 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:34,521-Speed 6321.17 samples/sec Loss 31.3251 LearningRate 0.0003 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:37,766-Speed 6312.67 samples/sec Loss 31.3524 LearningRate 0.0003 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:41,006-Speed 6323.79 samples/sec Loss 31.3113 LearningRate 0.0003 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:54:44,246-Speed 6322.67 samples/sec Loss 31.2342 LearningRate 0.0003 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:47,493-Speed 6307.97 samples/sec Loss 31.2763 LearningRate 0.0003 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:50,740-Speed 6307.50 samples/sec Loss 31.3031 LearningRate 0.0003 Epoch: 1 Global Step: 23870 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:53,985-Speed 6314.84 samples/sec Loss 31.2223 LearningRate 0.0003 Epoch: 1 Global Step: 23880 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:54:57,224-Speed 6323.02 samples/sec Loss 31.2396 LearningRate 0.0003 Epoch: 1 Global Step: 23890 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:00,464-Speed 6322.12 samples/sec Loss 31.2615 LearningRate 0.0003 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:03,709-Speed 6312.67 samples/sec Loss 
31.2691 LearningRate 0.0003 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:06,946-Speed 6328.35 samples/sec Loss 31.1745 LearningRate 0.0003 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:10,187-Speed 6320.06 samples/sec Loss 31.1523 LearningRate 0.0003 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:13,427-Speed 6322.54 samples/sec Loss 31.2257 LearningRate 0.0003 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:16,672-Speed 6313.77 samples/sec Loss 31.1707 LearningRate 0.0003 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:55:19,912-Speed 6322.08 samples/sec Loss 31.1780 LearningRate 0.0003 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:55:23,156-Speed 6313.87 samples/sec Loss 31.1022 LearningRate 0.0003 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:55:26,401-Speed 6315.80 samples/sec Loss 31.1325 LearningRate 0.0003 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:55:29,647-Speed 6310.95 samples/sec Loss 31.0633 LearningRate 0.0003 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:55:32,876-Speed 6344.80 samples/sec Loss 31.1655 LearningRate 0.0003 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:55:36,107-Speed 6339.58 samples/sec Loss 31.0763 LearningRate 0.0003 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:55:39,331-Speed 6355.52 samples/sec Loss 31.1212 LearningRate 0.0003 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:42,570-Speed 6323.11 samples/sec Loss 31.0047 LearningRate 0.0003 Epoch: 1 Global 
Step: 24030 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:45,806-Speed 6331.09 samples/sec Loss 30.9703 LearningRate 0.0003 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:49,051-Speed 6312.78 samples/sec Loss 31.0114 LearningRate 0.0003 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:52,292-Speed 6320.58 samples/sec Loss 31.0079 LearningRate 0.0003 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:55,530-Speed 6324.94 samples/sec Loss 31.0567 LearningRate 0.0003 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:55:58,775-Speed 6313.55 samples/sec Loss 30.9822 LearningRate 0.0003 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:56:02,013-Speed 6325.31 samples/sec Loss 31.0302 LearningRate 0.0003 Epoch: 1 Global Step: 24090 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:56:05,258-Speed 6312.92 samples/sec Loss 30.9933 LearningRate 0.0003 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:56:08,498-Speed 6321.93 samples/sec Loss 30.8351 LearningRate 0.0003 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 8192 Required: 73 hours Training: 2022-03-31 17:56:11,740-Speed 6319.81 samples/sec Loss 30.8777 LearningRate 0.0003 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:14,981-Speed 6319.07 samples/sec Loss 30.8995 LearningRate 0.0003 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:18,220-Speed 6325.02 samples/sec Loss 30.9513 LearningRate 0.0003 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:21,461-Speed 6319.31 samples/sec Loss 30.8180 LearningRate 0.0003 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 16384 Required: 73 
hours Training: 2022-03-31 17:56:24,703-Speed 6319.51 samples/sec Loss 30.8646 LearningRate 0.0003 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:27,949-Speed 6310.37 samples/sec Loss 30.8288 LearningRate 0.0003 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:31,190-Speed 6321.12 samples/sec Loss 30.7947 LearningRate 0.0003 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:34,437-Speed 6308.26 samples/sec Loss 30.8223 LearningRate 0.0003 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:37,682-Speed 6313.74 samples/sec Loss 30.8382 LearningRate 0.0003 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:40,927-Speed 6312.52 samples/sec Loss 30.8151 LearningRate 0.0003 Epoch: 1 Global Step: 24210 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:44,168-Speed 6320.09 samples/sec Loss 30.8946 LearningRate 0.0003 Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:56:47,410-Speed 6318.60 samples/sec Loss 30.8003 LearningRate 0.0003 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:56:50,641-Speed 6339.80 samples/sec Loss 30.8100 LearningRate 0.0003 Epoch: 1 Global Step: 24240 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:53,885-Speed 6314.18 samples/sec Loss 30.8476 LearningRate 0.0003 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:56:57,128-Speed 6318.18 samples/sec Loss 30.7783 LearningRate 0.0003 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:00,370-Speed 6317.99 samples/sec Loss 30.7449 LearningRate 0.0003 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 
17:57:03,614-Speed 6314.34 samples/sec Loss 30.7294 LearningRate 0.0003 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:06,858-Speed 6314.54 samples/sec Loss 30.6976 LearningRate 0.0003 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:10,098-Speed 6322.28 samples/sec Loss 30.7143 LearningRate 0.0003 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:13,339-Speed 6320.69 samples/sec Loss 30.7281 LearningRate 0.0003 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:16,581-Speed 6318.04 samples/sec Loss 30.7367 LearningRate 0.0003 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:19,827-Speed 6311.49 samples/sec Loss 30.7494 LearningRate 0.0003 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 16384 Required: 73 hours Training: 2022-03-31 17:57:23,070-Speed 6315.65 samples/sec Loss 30.7646 LearningRate 0.0003 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:26,313-Speed 6316.99 samples/sec Loss 30.6463 LearningRate 0.0003 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:29,559-Speed 6310.61 samples/sec Loss 30.6535 LearningRate 0.0003 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:32,801-Speed 6318.66 samples/sec Loss 30.6131 LearningRate 0.0003 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:36,046-Speed 6313.50 samples/sec Loss 30.6484 LearningRate 0.0003 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:39,290-Speed 6314.92 samples/sec Loss 30.5215 LearningRate 0.0003 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:42,532-Speed 6318.77 samples/sec Loss 
30.5466 LearningRate 0.0003 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:45,776-Speed 6315.10 samples/sec Loss 30.5775 LearningRate 0.0003 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:49,020-Speed 6314.33 samples/sec Loss 30.5712 LearningRate 0.0003 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:52,266-Speed 6310.08 samples/sec Loss 30.5770 LearningRate 0.0003 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 32768 Required: 73 hours Training: 2022-03-31 17:57:55,513-Speed 6308.24 samples/sec Loss 30.5026 LearningRate 0.0003 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:57:58,755-Speed 6319.89 samples/sec Loss 30.4723 LearningRate 0.0003 Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:02,002-Speed 6307.72 samples/sec Loss 30.4468 LearningRate 0.0003 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:05,251-Speed 6304.35 samples/sec Loss 30.4539 LearningRate 0.0003 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:08,497-Speed 6311.78 samples/sec Loss 30.4244 LearningRate 0.0003 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:11,738-Speed 6319.86 samples/sec Loss 30.4923 LearningRate 0.0003 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:14,978-Speed 6323.52 samples/sec Loss 30.4330 LearningRate 0.0003 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:18,219-Speed 6319.53 samples/sec Loss 30.4554 LearningRate 0.0003 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:21,461-Speed 6318.99 samples/sec Loss 30.4376 LearningRate 0.0003 Epoch: 1 Global 
Step: 24520 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:24,705-Speed 6314.71 samples/sec Loss 30.3458 LearningRate 0.0003 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:27,957-Speed 6297.59 samples/sec Loss 30.2736 LearningRate 0.0003 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:31,204-Speed 6310.15 samples/sec Loss 30.3658 LearningRate 0.0003 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:34,448-Speed 6314.76 samples/sec Loss 30.4234 LearningRate 0.0003 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:37,691-Speed 6315.75 samples/sec Loss 30.3475 LearningRate 0.0003 Epoch: 1 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:40,937-Speed 6310.61 samples/sec Loss 30.3169 LearningRate 0.0003 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:44,181-Speed 6314.34 samples/sec Loss 30.3098 LearningRate 0.0003 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:47,426-Speed 6314.44 samples/sec Loss 30.3575 LearningRate 0.0003 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:58:50,656-Speed 6341.58 samples/sec Loss 30.3625 LearningRate 0.0003 Epoch: 1 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:53,897-Speed 6320.99 samples/sec Loss 30.2295 LearningRate 0.0003 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:58:57,136-Speed 6322.66 samples/sec Loss 30.2894 LearningRate 0.0003 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:00,378-Speed 6319.69 samples/sec Loss 30.2634 LearningRate 0.0003 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 65536 
Required: 73 hours Training: 2022-03-31 17:59:03,621-Speed 6317.11 samples/sec Loss 30.2257 LearningRate 0.0003 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:06,861-Speed 6320.48 samples/sec Loss 30.2054 LearningRate 0.0003 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:10,107-Speed 6312.36 samples/sec Loss 30.2586 LearningRate 0.0003 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:13,354-Speed 6308.81 samples/sec Loss 30.1953 LearningRate 0.0003 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:16,633-Speed 6247.20 samples/sec Loss 30.2434 LearningRate 0.0003 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:19,876-Speed 6315.69 samples/sec Loss 30.1989 LearningRate 0.0003 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:23,117-Speed 6321.51 samples/sec Loss 30.1285 LearningRate 0.0003 Epoch: 1 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:59:26,381-Speed 6275.12 samples/sec Loss 30.1692 LearningRate 0.0003 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:59:29,628-Speed 6309.09 samples/sec Loss 30.1693 LearningRate 0.0003 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:59:32,870-Speed 6317.24 samples/sec Loss 30.1842 LearningRate 0.0003 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 17:59:36,102-Speed 6337.85 samples/sec Loss 30.0918 LearningRate 0.0003 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:39,354-Speed 6302.43 samples/sec Loss 30.0876 LearningRate 0.0003 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 
17:59:42,597-Speed 6315.90 samples/sec Loss 30.1009 LearningRate 0.0003 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:45,838-Speed 6320.30 samples/sec Loss 30.0375 LearningRate 0.0003 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:49,078-Speed 6322.61 samples/sec Loss 30.0195 LearningRate 0.0003 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:52,321-Speed 6316.03 samples/sec Loss 30.0717 LearningRate 0.0003 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:55,561-Speed 6323.47 samples/sec Loss 30.0339 LearningRate 0.0003 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 17:59:58,803-Speed 6317.91 samples/sec Loss 30.0855 LearningRate 0.0003 Epoch: 1 Global Step: 24820 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:02,043-Speed 6323.27 samples/sec Loss 29.9787 LearningRate 0.0003 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:05,285-Speed 6317.79 samples/sec Loss 29.9650 LearningRate 0.0003 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:08,531-Speed 6311.21 samples/sec Loss 29.9529 LearningRate 0.0003 Epoch: 1 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:11,772-Speed 6320.12 samples/sec Loss 30.0434 LearningRate 0.0003 Epoch: 1 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:15,019-Speed 6309.73 samples/sec Loss 30.0136 LearningRate 0.0003 Epoch: 1 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:18,259-Speed 6321.28 samples/sec Loss 29.8781 LearningRate 0.0003 Epoch: 1 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:21,504-Speed 6313.04 samples/sec 
Loss 29.9679 LearningRate 0.0003 Epoch: 1 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:24,745-Speed 6319.80 samples/sec Loss 29.9417 LearningRate 0.0003 Epoch: 1 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:00:27,977-Speed 6338.21 samples/sec Loss 29.9136 LearningRate 0.0003 Epoch: 1 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:31,221-Speed 6315.47 samples/sec Loss 29.9161 LearningRate 0.0003 Epoch: 1 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:34,463-Speed 6317.51 samples/sec Loss 29.8582 LearningRate 0.0003 Epoch: 1 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:37,709-Speed 6310.71 samples/sec Loss 29.8095 LearningRate 0.0003 Epoch: 1 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:40,951-Speed 6318.88 samples/sec Loss 29.8042 LearningRate 0.0003 Epoch: 1 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:44,197-Speed 6310.36 samples/sec Loss 29.8458 LearningRate 0.0003 Epoch: 1 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:47,438-Speed 6321.00 samples/sec Loss 29.8381 LearningRate 0.0003 Epoch: 1 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:50,683-Speed 6313.01 samples/sec Loss 29.7269 LearningRate 0.0003 Epoch: 1 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:53,925-Speed 6317.94 samples/sec Loss 29.7967 LearningRate 0.0003 Epoch: 1 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:00:57,166-Speed 6320.17 samples/sec Loss 29.7118 LearningRate 0.0003 Epoch: 1 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:00,409-Speed 6318.01 samples/sec Loss 29.7302 LearningRate 0.0003 Epoch: 1 
Global Step: 25010 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:03,711-Speed 6203.58 samples/sec Loss 29.7148 LearningRate 0.0003 Epoch: 1 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:06,958-Speed 6307.83 samples/sec Loss 29.7280 LearningRate 0.0003 Epoch: 1 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:10,203-Speed 6313.13 samples/sec Loss 29.6771 LearningRate 0.0003 Epoch: 1 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:13,495-Speed 6223.24 samples/sec Loss 29.6404 LearningRate 0.0003 Epoch: 1 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:16,739-Speed 6315.05 samples/sec Loss 29.6464 LearningRate 0.0003 Epoch: 1 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:19,983-Speed 6313.52 samples/sec Loss 29.6413 LearningRate 0.0003 Epoch: 1 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:23,224-Speed 6321.59 samples/sec Loss 29.5249 LearningRate 0.0003 Epoch: 1 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:01:26,454-Speed 6341.27 samples/sec Loss 29.5645 LearningRate 0.0003 Epoch: 1 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:29,695-Speed 6320.16 samples/sec Loss 29.6268 LearningRate 0.0003 Epoch: 1 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:32,942-Speed 6308.28 samples/sec Loss 29.5593 LearningRate 0.0003 Epoch: 1 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:36,182-Speed 6322.17 samples/sec Loss 29.5326 LearningRate 0.0003 Epoch: 1 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:39,424-Speed 6318.88 samples/sec Loss 29.5764 LearningRate 0.0003 Epoch: 1 Global Step: 25130 Fp16 Grad Scale: 
65536 Required: 73 hours Training: 2022-03-31 18:01:42,670-Speed 6311.24 samples/sec Loss 29.5707 LearningRate 0.0003 Epoch: 1 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:45,913-Speed 6316.64 samples/sec Loss 29.5940 LearningRate 0.0003 Epoch: 1 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:49,160-Speed 6308.68 samples/sec Loss 29.5453 LearningRate 0.0003 Epoch: 1 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:52,411-Speed 6300.96 samples/sec Loss 29.5583 LearningRate 0.0003 Epoch: 1 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:55,654-Speed 6316.11 samples/sec Loss 29.5126 LearningRate 0.0003 Epoch: 1 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:01:58,894-Speed 6322.23 samples/sec Loss 29.4334 LearningRate 0.0003 Epoch: 1 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:02:02,139-Speed 6312.55 samples/sec Loss 29.5354 LearningRate 0.0003 Epoch: 1 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:02:05,372-Speed 6337.68 samples/sec Loss 29.4693 LearningRate 0.0003 Epoch: 1 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:08,615-Speed 6315.83 samples/sec Loss 29.4664 LearningRate 0.0003 Epoch: 1 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:11,861-Speed 6311.51 samples/sec Loss 29.5307 LearningRate 0.0003 Epoch: 1 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:15,102-Speed 6319.75 samples/sec Loss 29.3879 LearningRate 0.0003 Epoch: 1 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:18,344-Speed 6318.90 samples/sec Loss 29.4647 LearningRate 0.0003 Epoch: 1 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 73 hours Training: 
2022-03-31 18:02:21,588-Speed 6314.40 samples/sec Loss 29.4054 LearningRate 0.0003 Epoch: 1 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:24,830-Speed 6319.47 samples/sec Loss 29.4161 LearningRate 0.0003 Epoch: 1 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:28,076-Speed 6310.36 samples/sec Loss 29.3237 LearningRate 0.0003 Epoch: 1 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:31,314-Speed 6325.22 samples/sec Loss 29.3213 LearningRate 0.0003 Epoch: 1 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:34,560-Speed 6311.31 samples/sec Loss 29.4033 LearningRate 0.0003 Epoch: 1 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:37,789-Speed 6342.80 samples/sec Loss 29.3204 LearningRate 0.0003 Epoch: 1 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:41,032-Speed 6316.97 samples/sec Loss 29.2428 LearningRate 0.0003 Epoch: 1 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:44,281-Speed 6305.36 samples/sec Loss 29.2746 LearningRate 0.0003 Epoch: 1 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:47,528-Speed 6310.95 samples/sec Loss 29.2432 LearningRate 0.0003 Epoch: 1 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:50,820-Speed 6223.27 samples/sec Loss 29.3348 LearningRate 0.0003 Epoch: 1 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:54,071-Speed 6299.42 samples/sec Loss 29.2223 LearningRate 0.0003 Epoch: 1 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:02:57,314-Speed 6317.68 samples/sec Loss 29.2567 LearningRate 0.0003 Epoch: 1 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:00,559-Speed 6312.50 
samples/sec Loss 29.1897 LearningRate 0.0003 Epoch: 1 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:03,807-Speed 6307.63 samples/sec Loss 29.2740 LearningRate 0.0003 Epoch: 1 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:07,054-Speed 6307.14 samples/sec Loss 29.2386 LearningRate 0.0003 Epoch: 1 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:10,300-Speed 6312.42 samples/sec Loss 29.1840 LearningRate 0.0003 Epoch: 1 Global Step: 25410 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:03:13,543-Speed 6316.62 samples/sec Loss 29.0795 LearningRate 0.0003 Epoch: 1 Global Step: 25420 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:03:16,801-Speed 6287.78 samples/sec Loss 29.2174 LearningRate 0.0003 Epoch: 1 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:03:20,031-Speed 6341.38 samples/sec Loss 29.1157 LearningRate 0.0003 Epoch: 1 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:23,271-Speed 6323.29 samples/sec Loss 29.1829 LearningRate 0.0003 Epoch: 1 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:26,515-Speed 6315.21 samples/sec Loss 29.0199 LearningRate 0.0003 Epoch: 1 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:29,762-Speed 6308.83 samples/sec Loss 29.0976 LearningRate 0.0003 Epoch: 1 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:33,000-Speed 6325.63 samples/sec Loss 29.0649 LearningRate 0.0003 Epoch: 1 Global Step: 25480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:36,245-Speed 6311.75 samples/sec Loss 29.0664 LearningRate 0.0003 Epoch: 1 Global Step: 25490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:39,488-Speed 6316.73 samples/sec Loss 28.9999 LearningRate 
0.0003 Epoch: 1 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:42,730-Speed 6318.16 samples/sec Loss 29.0416 LearningRate 0.0003 Epoch: 1 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:45,975-Speed 6313.92 samples/sec Loss 29.0175 LearningRate 0.0003 Epoch: 1 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:49,216-Speed 6319.44 samples/sec Loss 28.9810 LearningRate 0.0003 Epoch: 1 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:03:52,462-Speed 6312.17 samples/sec Loss 29.0496 LearningRate 0.0003 Epoch: 1 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:03:55,712-Speed 6301.62 samples/sec Loss 29.0296 LearningRate 0.0003 Epoch: 1 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:03:58,944-Speed 6337.92 samples/sec Loss 28.9172 LearningRate 0.0003 Epoch: 1 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:02,191-Speed 6309.32 samples/sec Loss 28.9920 LearningRate 0.0003 Epoch: 1 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:05,441-Speed 6302.69 samples/sec Loss 28.9249 LearningRate 0.0003 Epoch: 1 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:08,691-Speed 6302.76 samples/sec Loss 28.9575 LearningRate 0.0003 Epoch: 1 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:11,933-Speed 6319.36 samples/sec Loss 28.9303 LearningRate 0.0003 Epoch: 1 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:15,179-Speed 6311.72 samples/sec Loss 28.9081 LearningRate 0.0003 Epoch: 1 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:18,427-Speed 6306.33 samples/sec Loss 28.9072 LearningRate 0.0003 Epoch: 1 Global Step: 25620 Fp16 
Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:21,670-Speed 6316.26 samples/sec Loss 28.8223 LearningRate 0.0003 Epoch: 1 Global Step: 25630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:24,914-Speed 6314.77 samples/sec Loss 28.8515 LearningRate 0.0003 Epoch: 1 Global Step: 25640 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:28,155-Speed 6320.29 samples/sec Loss 28.9399 LearningRate 0.0003 Epoch: 1 Global Step: 25650 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:31,398-Speed 6316.10 samples/sec Loss 28.8155 LearningRate 0.0003 Epoch: 1 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:04:34,647-Speed 6305.74 samples/sec Loss 28.8282 LearningRate 0.0003 Epoch: 1 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:04:37,878-Speed 6339.74 samples/sec Loss 28.8675 LearningRate 0.0003 Epoch: 1 Global Step: 25680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:41,122-Speed 6314.57 samples/sec Loss 28.8190 LearningRate 0.0003 Epoch: 1 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:44,367-Speed 6312.54 samples/sec Loss 28.7728 LearningRate 0.0003 Epoch: 1 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:47,611-Speed 6315.31 samples/sec Loss 28.7586 LearningRate 0.0003 Epoch: 1 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:50,852-Speed 6318.47 samples/sec Loss 28.7422 LearningRate 0.0003 Epoch: 1 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:54,099-Speed 6309.57 samples/sec Loss 28.7281 LearningRate 0.0003 Epoch: 1 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:04:57,339-Speed 6323.46 samples/sec Loss 28.8630 LearningRate 0.0003 Epoch: 1 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 73 hours 
Training: 2022-03-31 18:05:00,581-Speed 6317.53 samples/sec Loss 28.6724 LearningRate 0.0003 Epoch: 1 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:03,829-Speed 6305.96 samples/sec Loss 28.7351 LearningRate 0.0003 Epoch: 1 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:07,075-Speed 6312.17 samples/sec Loss 28.6561 LearningRate 0.0003 Epoch: 1 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:10,307-Speed 6336.44 samples/sec Loss 28.6270 LearningRate 0.0003 Epoch: 1 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:13,553-Speed 6311.43 samples/sec Loss 28.6612 LearningRate 0.0003 Epoch: 1 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:16,796-Speed 6317.99 samples/sec Loss 28.6162 LearningRate 0.0003 Epoch: 1 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:20,041-Speed 6312.21 samples/sec Loss 28.6085 LearningRate 0.0003 Epoch: 1 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:23,285-Speed 6314.92 samples/sec Loss 28.6803 LearningRate 0.0003 Epoch: 1 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:26,530-Speed 6311.87 samples/sec Loss 28.5553 LearningRate 0.0003 Epoch: 1 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:29,770-Speed 6322.47 samples/sec Loss 28.5219 LearningRate 0.0003 Epoch: 1 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:33,017-Speed 6309.63 samples/sec Loss 28.5682 LearningRate 0.0003 Epoch: 1 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:36,268-Speed 6301.21 samples/sec Loss 28.6267 LearningRate 0.0003 Epoch: 1 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:39,514-Speed 
6310.69 samples/sec Loss 28.4808 LearningRate 0.0003 Epoch: 1 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:42,757-Speed 6315.16 samples/sec Loss 28.5105 LearningRate 0.0003 Epoch: 1 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:05:45,987-Speed 6343.73 samples/sec Loss 28.4455 LearningRate 0.0003 Epoch: 1 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:49,244-Speed 6287.88 samples/sec Loss 28.4554 LearningRate 0.0003 Epoch: 1 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:52,493-Speed 6306.69 samples/sec Loss 28.5661 LearningRate 0.0003 Epoch: 1 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:55,740-Speed 6308.45 samples/sec Loss 28.4351 LearningRate 0.0003 Epoch: 1 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:05:58,985-Speed 6312.26 samples/sec Loss 28.4235 LearningRate 0.0003 Epoch: 1 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:02,227-Speed 6317.29 samples/sec Loss 28.3401 LearningRate 0.0003 Epoch: 1 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:05,466-Speed 6325.66 samples/sec Loss 28.4078 LearningRate 0.0003 Epoch: 1 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:08,712-Speed 6309.87 samples/sec Loss 28.3954 LearningRate 0.0003 Epoch: 1 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:11,958-Speed 6311.40 samples/sec Loss 28.4031 LearningRate 0.0003 Epoch: 1 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:15,201-Speed 6316.29 samples/sec Loss 28.3520 LearningRate 0.0003 Epoch: 1 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:18,445-Speed 6314.75 samples/sec Loss 28.3173 
LearningRate 0.0003 Epoch: 1 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:06:21,688-Speed 6316.94 samples/sec Loss 28.2190 LearningRate 0.0003 Epoch: 1 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:24,933-Speed 6313.89 samples/sec Loss 28.3058 LearningRate 0.0003 Epoch: 1 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:28,180-Speed 6308.43 samples/sec Loss 28.3451 LearningRate 0.0003 Epoch: 1 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:31,423-Speed 6316.65 samples/sec Loss 28.2490 LearningRate 0.0003 Epoch: 1 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:34,698-Speed 6254.22 samples/sec Loss 28.3497 LearningRate 0.0003 Epoch: 1 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:37,942-Speed 6314.20 samples/sec Loss 28.2564 LearningRate 0.0003 Epoch: 1 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:41,200-Speed 6286.93 samples/sec Loss 28.2544 LearningRate 0.0003 Epoch: 1 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:44,447-Speed 6308.79 samples/sec Loss 28.1910 LearningRate 0.0003 Epoch: 1 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:47,685-Speed 6326.33 samples/sec Loss 28.1789 LearningRate 0.0003 Epoch: 1 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:50,925-Speed 6323.06 samples/sec Loss 28.1441 LearningRate 0.0003 Epoch: 1 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:06:54,169-Speed 6315.63 samples/sec Loss 28.2909 LearningRate 0.0003 Epoch: 1 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:06:57,411-Speed 6316.94 samples/sec Loss 28.1227 LearningRate 0.0003 Epoch: 1 Global Step: 
26110 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:07:00,643-Speed 6338.35 samples/sec Loss 28.2557 LearningRate 0.0003 Epoch: 1 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:03,889-Speed 6310.25 samples/sec Loss 28.1255 LearningRate 0.0003 Epoch: 1 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:07,131-Speed 6319.02 samples/sec Loss 28.0524 LearningRate 0.0003 Epoch: 1 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:10,380-Speed 6304.54 samples/sec Loss 28.0839 LearningRate 0.0003 Epoch: 1 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:13,621-Speed 6319.67 samples/sec Loss 28.1114 LearningRate 0.0003 Epoch: 1 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:16,862-Speed 6321.05 samples/sec Loss 28.0387 LearningRate 0.0003 Epoch: 1 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:20,103-Speed 6322.16 samples/sec Loss 27.9764 LearningRate 0.0003 Epoch: 1 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:23,349-Speed 6309.76 samples/sec Loss 28.0481 LearningRate 0.0003 Epoch: 1 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:26,590-Speed 6321.77 samples/sec Loss 28.0372 LearningRate 0.0003 Epoch: 1 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:29,832-Speed 6317.59 samples/sec Loss 27.9215 LearningRate 0.0003 Epoch: 1 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:33,065-Speed 6337.08 samples/sec Loss 27.9203 LearningRate 0.0003 Epoch: 1 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:36,313-Speed 6306.21 samples/sec Loss 28.0561 LearningRate 0.0003 Epoch: 1 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 73 
hours Training: 2022-03-31 18:07:39,558-Speed 6313.54 samples/sec Loss 27.9992 LearningRate 0.0003 Epoch: 1 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:42,799-Speed 6319.27 samples/sec Loss 27.9650 LearningRate 0.0003 Epoch: 1 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:46,041-Speed 6319.80 samples/sec Loss 27.9806 LearningRate 0.0003 Epoch: 1 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:49,281-Speed 6321.79 samples/sec Loss 27.9356 LearningRate 0.0003 Epoch: 1 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:52,521-Speed 6323.56 samples/sec Loss 27.9252 LearningRate 0.0003 Epoch: 1 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:55,762-Speed 6318.76 samples/sec Loss 27.9087 LearningRate 0.0003 Epoch: 1 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:07:59,002-Speed 6322.51 samples/sec Loss 27.9273 LearningRate 0.0003 Epoch: 1 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:02,245-Speed 6316.60 samples/sec Loss 27.8175 LearningRate 0.0003 Epoch: 1 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:05,475-Speed 6341.51 samples/sec Loss 27.8012 LearningRate 0.0003 Epoch: 1 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:08,716-Speed 6320.50 samples/sec Loss 27.8139 LearningRate 0.0003 Epoch: 1 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:11,961-Speed 6314.10 samples/sec Loss 27.8875 LearningRate 0.0003 Epoch: 1 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:15,207-Speed 6309.15 samples/sec Loss 27.7660 LearningRate 0.0003 Epoch: 1 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 
18:08:18,453-Speed 6311.48 samples/sec Loss 27.7586 LearningRate 0.0003 Epoch: 1 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:21,693-Speed 6323.59 samples/sec Loss 27.7693 LearningRate 0.0003 Epoch: 1 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:24,936-Speed 6315.19 samples/sec Loss 27.7078 LearningRate 0.0003 Epoch: 1 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:28,180-Speed 6316.21 samples/sec Loss 27.7951 LearningRate 0.0003 Epoch: 1 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:31,420-Speed 6320.75 samples/sec Loss 27.6904 LearningRate 0.0003 Epoch: 1 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:34,660-Speed 6323.98 samples/sec Loss 27.6327 LearningRate 0.0003 Epoch: 1 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:37,892-Speed 6338.61 samples/sec Loss 27.7121 LearningRate 0.0003 Epoch: 1 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:41,135-Speed 6316.38 samples/sec Loss 27.6087 LearningRate 0.0003 Epoch: 1 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:44,380-Speed 6311.78 samples/sec Loss 27.6609 LearningRate 0.0003 Epoch: 1 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:47,627-Speed 6308.56 samples/sec Loss 27.6268 LearningRate 0.0003 Epoch: 1 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:50,868-Speed 6320.39 samples/sec Loss 27.5381 LearningRate 0.0003 Epoch: 1 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:54,112-Speed 6314.08 samples/sec Loss 27.5650 LearningRate 0.0003 Epoch: 1 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:08:57,350-Speed 6326.34 samples/sec Loss 
27.5995 LearningRate 0.0003 Epoch: 1 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:00,590-Speed 6322.29 samples/sec Loss 27.5513 LearningRate 0.0003 Epoch: 1 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:03,834-Speed 6315.81 samples/sec Loss 27.6269 LearningRate 0.0003 Epoch: 1 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:07,078-Speed 6314.53 samples/sec Loss 27.5063 LearningRate 0.0003 Epoch: 1 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:10,323-Speed 6312.65 samples/sec Loss 27.4961 LearningRate 0.0003 Epoch: 1 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:09:13,567-Speed 6313.71 samples/sec Loss 27.4806 LearningRate 0.0003 Epoch: 1 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:09:16,799-Speed 6337.83 samples/sec Loss 27.4133 LearningRate 0.0003 Epoch: 1 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:20,043-Speed 6315.62 samples/sec Loss 27.5251 LearningRate 0.0003 Epoch: 1 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:23,289-Speed 6310.72 samples/sec Loss 27.6435 LearningRate 0.0003 Epoch: 1 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:26,534-Speed 6312.47 samples/sec Loss 27.4390 LearningRate 0.0003 Epoch: 1 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:29,779-Speed 6312.84 samples/sec Loss 27.4484 LearningRate 0.0003 Epoch: 1 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:33,022-Speed 6316.47 samples/sec Loss 27.4010 LearningRate 0.0003 Epoch: 1 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:36,264-Speed 6319.20 samples/sec Loss 27.4453 LearningRate 0.0003 Epoch: 1 
Global Step: 26600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:39,507-Speed 6315.55 samples/sec Loss 27.4013 LearningRate 0.0003 Epoch: 1 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:42,751-Speed 6315.85 samples/sec Loss 27.4677 LearningRate 0.0003 Epoch: 1 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:45,993-Speed 6317.98 samples/sec Loss 27.3996 LearningRate 0.0003 Epoch: 1 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:49,231-Speed 6325.85 samples/sec Loss 27.4109 LearningRate 0.0003 Epoch: 1 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:09:52,480-Speed 6306.76 samples/sec Loss 27.3032 LearningRate 0.0003 Epoch: 1 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:09:55,707-Speed 6347.22 samples/sec Loss 27.2960 LearningRate 0.0003 Epoch: 1 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:09:58,998-Speed 6224.56 samples/sec Loss 27.4231 LearningRate 0.0003 Epoch: 1 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:02,300-Speed 6203.21 samples/sec Loss 27.3170 LearningRate 0.0003 Epoch: 1 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:05,544-Speed 6315.03 samples/sec Loss 27.3016 LearningRate 0.0003 Epoch: 1 Global Step: 26690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:08,787-Speed 6316.10 samples/sec Loss 27.3351 LearningRate 0.0003 Epoch: 1 Global Step: 26700 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:12,029-Speed 6318.81 samples/sec Loss 27.1545 LearningRate 0.0003 Epoch: 1 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:15,272-Speed 6314.90 samples/sec Loss 27.2886 LearningRate 0.0003 Epoch: 1 Global Step: 26720 Fp16 Grad Scale: 65536 
Required: 73 hours Training: 2022-03-31 18:10:18,518-Speed 6311.54 samples/sec Loss 27.2271 LearningRate 0.0003 Epoch: 1 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:21,761-Speed 6316.89 samples/sec Loss 27.1743 LearningRate 0.0003 Epoch: 1 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:25,008-Speed 6307.87 samples/sec Loss 27.1791 LearningRate 0.0003 Epoch: 1 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:28,239-Speed 6341.57 samples/sec Loss 27.2139 LearningRate 0.0003 Epoch: 1 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:31,482-Speed 6315.83 samples/sec Loss 27.0830 LearningRate 0.0003 Epoch: 1 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:34,725-Speed 6316.19 samples/sec Loss 26.9603 LearningRate 0.0003 Epoch: 1 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:37,969-Speed 6316.62 samples/sec Loss 27.1279 LearningRate 0.0003 Epoch: 1 Global Step: 26790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:41,225-Speed 6291.03 samples/sec Loss 27.2001 LearningRate 0.0003 Epoch: 1 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:44,480-Speed 6293.10 samples/sec Loss 27.1081 LearningRate 0.0003 Epoch: 1 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:47,721-Speed 6319.32 samples/sec Loss 27.2041 LearningRate 0.0003 Epoch: 1 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:50,971-Speed 6304.91 samples/sec Loss 27.0781 LearningRate 0.0003 Epoch: 1 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:10:54,215-Speed 6313.19 samples/sec Loss 27.0278 LearningRate 0.0003 Epoch: 1 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 
18:10:57,462-Speed 6309.71 samples/sec Loss 27.0625 LearningRate 0.0003 Epoch: 1 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:00,704-Speed 6318.63 samples/sec Loss 27.0105 LearningRate 0.0003 Epoch: 1 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:11:03,952-Speed 6306.88 samples/sec Loss 27.0992 LearningRate 0.0003 Epoch: 1 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:11:07,181-Speed 6342.73 samples/sec Loss 26.9347 LearningRate 0.0003 Epoch: 1 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:10,427-Speed 6311.98 samples/sec Loss 26.9922 LearningRate 0.0003 Epoch: 1 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:13,672-Speed 6312.19 samples/sec Loss 26.8943 LearningRate 0.0003 Epoch: 1 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:16,917-Speed 6311.99 samples/sec Loss 27.0052 LearningRate 0.0003 Epoch: 1 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:20,163-Speed 6311.30 samples/sec Loss 26.9788 LearningRate 0.0003 Epoch: 1 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:23,406-Speed 6316.68 samples/sec Loss 26.9471 LearningRate 0.0003 Epoch: 1 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:26,652-Speed 6309.76 samples/sec Loss 26.8719 LearningRate 0.0003 Epoch: 1 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:29,893-Speed 6320.41 samples/sec Loss 26.8886 LearningRate 0.0003 Epoch: 1 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:33,137-Speed 6315.12 samples/sec Loss 26.9415 LearningRate 0.0003 Epoch: 1 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:36,383-Speed 6310.15 samples/sec 
Loss 26.8748 LearningRate 0.0003 Epoch: 1 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:39,616-Speed 6336.49 samples/sec Loss 26.7779 LearningRate 0.0003 Epoch: 1 Global Step: 26980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:42,865-Speed 6306.29 samples/sec Loss 26.7233 LearningRate 0.0003 Epoch: 1 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:46,108-Speed 6315.31 samples/sec Loss 26.7984 LearningRate 0.0003 Epoch: 1 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:49,353-Speed 6312.10 samples/sec Loss 26.7512 LearningRate 0.0003 Epoch: 1 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:52,598-Speed 6313.58 samples/sec Loss 26.7977 LearningRate 0.0003 Epoch: 1 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:55,841-Speed 6316.55 samples/sec Loss 26.6833 LearningRate 0.0003 Epoch: 1 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:11:59,081-Speed 6321.97 samples/sec Loss 26.7741 LearningRate 0.0003 Epoch: 1 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:02,333-Speed 6298.95 samples/sec Loss 26.6880 LearningRate 0.0003 Epoch: 1 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:05,577-Speed 6314.57 samples/sec Loss 26.7499 LearningRate 0.0003 Epoch: 1 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:08,819-Speed 6319.47 samples/sec Loss 26.6309 LearningRate 0.0003 Epoch: 1 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:12,059-Speed 6322.76 samples/sec Loss 26.7403 LearningRate 0.0003 Epoch: 1 Global Step: 27080 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:12:15,290-Speed 6339.40 samples/sec Loss 26.6098 LearningRate 0.0003 Epoch: 1 
Global Step: 27090 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:18,534-Speed 6313.88 samples/sec Loss 26.7253 LearningRate 0.0003 Epoch: 1 Global Step: 27100 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:21,776-Speed 6318.54 samples/sec Loss 26.6124 LearningRate 0.0003 Epoch: 1 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:25,016-Speed 6321.95 samples/sec Loss 26.5812 LearningRate 0.0003 Epoch: 1 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:28,261-Speed 6313.74 samples/sec Loss 26.5171 LearningRate 0.0003 Epoch: 1 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:31,504-Speed 6316.85 samples/sec Loss 26.6880 LearningRate 0.0003 Epoch: 1 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:34,749-Speed 6311.73 samples/sec Loss 26.5373 LearningRate 0.0003 Epoch: 1 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:37,997-Speed 6307.31 samples/sec Loss 26.5905 LearningRate 0.0003 Epoch: 1 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:41,241-Speed 6315.25 samples/sec Loss 26.6083 LearningRate 0.0003 Epoch: 1 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:44,484-Speed 6315.44 samples/sec Loss 26.5324 LearningRate 0.0003 Epoch: 1 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:12:47,734-Speed 6304.09 samples/sec Loss 26.5568 LearningRate 0.0003 Epoch: 1 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:12:50,980-Speed 6312.83 samples/sec Loss 26.4426 LearningRate 0.0003 Epoch: 1 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:12:54,226-Speed 6311.37 samples/sec Loss 26.5392 LearningRate 0.0003 Epoch: 1 Global Step: 27210 Fp16 Grad Scale: 
131072 Required: 73 hours Training: 2022-03-31 18:12:57,465-Speed 6324.34 samples/sec Loss 26.3896 LearningRate 0.0003 Epoch: 1 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:13:00,693-Speed 6345.57 samples/sec Loss 26.4330 LearningRate 0.0003 Epoch: 1 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:03,935-Speed 6319.28 samples/sec Loss 26.4422 LearningRate 0.0003 Epoch: 1 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:07,184-Speed 6304.58 samples/sec Loss 26.4386 LearningRate 0.0003 Epoch: 1 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:10,426-Speed 6319.77 samples/sec Loss 26.3523 LearningRate 0.0003 Epoch: 1 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:13,665-Speed 6323.30 samples/sec Loss 26.4487 LearningRate 0.0003 Epoch: 1 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:16,911-Speed 6310.38 samples/sec Loss 26.2539 LearningRate 0.0003 Epoch: 1 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:20,155-Speed 6314.00 samples/sec Loss 26.3620 LearningRate 0.0003 Epoch: 1 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:23,398-Speed 6316.91 samples/sec Loss 26.3779 LearningRate 0.0003 Epoch: 1 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:26,642-Speed 6314.91 samples/sec Loss 26.2657 LearningRate 0.0003 Epoch: 1 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:29,884-Speed 6319.63 samples/sec Loss 26.3084 LearningRate 0.0003 Epoch: 1 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:33,128-Speed 6314.43 samples/sec Loss 26.3219 LearningRate 0.0003 Epoch: 1 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 73 hours Training: 
2022-03-31 18:13:36,372-Speed 6314.54 samples/sec Loss 26.2416 LearningRate 0.0003 Epoch: 1 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:13:39,602-Speed 6340.48 samples/sec Loss 26.2894 LearningRate 0.0003 Epoch: 1 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:42,844-Speed 6319.83 samples/sec Loss 26.2679 LearningRate 0.0003 Epoch: 1 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:46,088-Speed 6313.66 samples/sec Loss 26.2287 LearningRate 0.0003 Epoch: 1 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:49,328-Speed 6322.69 samples/sec Loss 26.3253 LearningRate 0.0003 Epoch: 1 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:52,569-Speed 6321.28 samples/sec Loss 26.2343 LearningRate 0.0003 Epoch: 1 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:55,811-Speed 6319.00 samples/sec Loss 26.2091 LearningRate 0.0003 Epoch: 1 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:13:59,086-Speed 6253.89 samples/sec Loss 26.1733 LearningRate 0.0003 Epoch: 1 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:02,332-Speed 6311.40 samples/sec Loss 26.2121 LearningRate 0.0003 Epoch: 1 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:05,578-Speed 6309.60 samples/sec Loss 26.1483 LearningRate 0.0003 Epoch: 1 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:08,823-Speed 6314.42 samples/sec Loss 26.1348 LearningRate 0.0003 Epoch: 1 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:12,050-Speed 6346.69 samples/sec Loss 26.0642 LearningRate 0.0003 Epoch: 1 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:15,294-Speed 6315.18 
samples/sec Loss 26.0881 LearningRate 0.0003 Epoch: 1 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:18,542-Speed 6306.28 samples/sec Loss 26.0164 LearningRate 0.0003 Epoch: 1 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:21,785-Speed 6316.07 samples/sec Loss 26.0907 LearningRate 0.0003 Epoch: 1 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:25,027-Speed 6319.52 samples/sec Loss 26.0374 LearningRate 0.0003 Epoch: 1 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:28,271-Speed 6314.24 samples/sec Loss 25.9627 LearningRate 0.0003 Epoch: 1 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:31,511-Speed 6322.72 samples/sec Loss 26.0563 LearningRate 0.0003 Epoch: 1 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:34,756-Speed 6312.35 samples/sec Loss 25.8170 LearningRate 0.0003 Epoch: 1 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:38,000-Speed 6314.68 samples/sec Loss 25.9501 LearningRate 0.0003 Epoch: 1 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:41,246-Speed 6310.67 samples/sec Loss 25.9853 LearningRate 0.0003 Epoch: 1 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:44,498-Speed 6298.41 samples/sec Loss 26.0209 LearningRate 0.0003 Epoch: 1 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:14:47,739-Speed 6319.95 samples/sec Loss 25.9573 LearningRate 0.0003 Epoch: 1 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:14:50,975-Speed 6331.97 samples/sec Loss 26.0759 LearningRate 0.0003 Epoch: 1 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:54,218-Speed 6316.17 samples/sec Loss 25.8780 LearningRate 
0.0003 Epoch: 1 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:14:57,466-Speed 6307.29 samples/sec Loss 25.9315 LearningRate 0.0003 Epoch: 1 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:00,713-Speed 6309.24 samples/sec Loss 25.9192 LearningRate 0.0003 Epoch: 1 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:03,963-Speed 6302.38 samples/sec Loss 25.7639 LearningRate 0.0003 Epoch: 1 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:07,203-Speed 6322.58 samples/sec Loss 25.9166 LearningRate 0.0003 Epoch: 1 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:10,454-Speed 6301.81 samples/sec Loss 25.7521 LearningRate 0.0003 Epoch: 1 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:13,692-Speed 6324.96 samples/sec Loss 25.8129 LearningRate 0.0003 Epoch: 1 Global Step: 27640 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:16,940-Speed 6308.03 samples/sec Loss 25.7731 LearningRate 0.0003 Epoch: 1 Global Step: 27650 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:20,199-Speed 6285.15 samples/sec Loss 25.8043 LearningRate 0.0003 Epoch: 1 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:23,451-Speed 6299.49 samples/sec Loss 25.7811 LearningRate 0.0003 Epoch: 1 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:26,693-Speed 6318.99 samples/sec Loss 25.7915 LearningRate 0.0003 Epoch: 1 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:29,940-Speed 6307.61 samples/sec Loss 25.8133 LearningRate 0.0003 Epoch: 1 Global Step: 27690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:33,187-Speed 6309.41 samples/sec Loss 25.6618 LearningRate 0.0003 Epoch: 1 Global Step: 27700 Fp16 
Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:36,430-Speed 6316.17 samples/sec Loss 25.7959 LearningRate 0.0003 Epoch: 1 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:39,675-Speed 6312.09 samples/sec Loss 25.7038 LearningRate 0.0003 Epoch: 1 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:42,914-Speed 6325.98 samples/sec Loss 25.7754 LearningRate 0.0003 Epoch: 1 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:46,158-Speed 6314.04 samples/sec Loss 25.6599 LearningRate 0.0003 Epoch: 1 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:49,398-Speed 6322.57 samples/sec Loss 25.6734 LearningRate 0.0003 Epoch: 1 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:52,645-Speed 6307.92 samples/sec Loss 25.5681 LearningRate 0.0003 Epoch: 1 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:55,884-Speed 6325.00 samples/sec Loss 25.5878 LearningRate 0.0003 Epoch: 1 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:15:59,130-Speed 6310.68 samples/sec Loss 25.6217 LearningRate 0.0003 Epoch: 1 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:02,377-Speed 6309.78 samples/sec Loss 25.6099 LearningRate 0.0003 Epoch: 1 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:05,631-Speed 6295.43 samples/sec Loss 25.5501 LearningRate 0.0003 Epoch: 1 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:08,879-Speed 6306.80 samples/sec Loss 25.4889 LearningRate 0.0003 Epoch: 1 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:12,125-Speed 6309.60 samples/sec Loss 25.5847 LearningRate 0.0003 Epoch: 1 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 73 hours 
Training: 2022-03-31 18:16:15,382-Speed 6289.69 samples/sec Loss 25.6988 LearningRate 0.0003 Epoch: 1 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:18,698-Speed 6178.66 samples/sec Loss 25.6083 LearningRate 0.0003 Epoch: 1 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:21,939-Speed 6319.30 samples/sec Loss 25.4251 LearningRate 0.0003 Epoch: 1 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:25,182-Speed 6316.87 samples/sec Loss 25.4349 LearningRate 0.0003 Epoch: 1 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:28,425-Speed 6316.28 samples/sec Loss 25.3887 LearningRate 0.0003 Epoch: 1 Global Step: 27870 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:16:31,673-Speed 6307.11 samples/sec Loss 25.4256 LearningRate 0.0003 Epoch: 1 Global Step: 27880 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:16:34,917-Speed 6313.97 samples/sec Loss 25.4929 LearningRate 0.0003 Epoch: 1 Global Step: 27890 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:16:38,150-Speed 6336.50 samples/sec Loss 25.4243 LearningRate 0.0003 Epoch: 1 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:41,437-Speed 6231.52 samples/sec Loss 25.3087 LearningRate 0.0003 Epoch: 1 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:44,682-Speed 6314.31 samples/sec Loss 25.4577 LearningRate 0.0003 Epoch: 1 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:47,925-Speed 6315.49 samples/sec Loss 25.3661 LearningRate 0.0003 Epoch: 1 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:51,176-Speed 6304.27 samples/sec Loss 25.3723 LearningRate 0.0003 Epoch: 1 Global Step: 27940 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:54,418-Speed 
6317.08 samples/sec Loss 25.3764 LearningRate 0.0003 Epoch: 1 Global Step: 27950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:16:57,662-Speed 6315.63 samples/sec Loss 25.3701 LearningRate 0.0003 Epoch: 1 Global Step: 27960 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:00,905-Speed 6317.41 samples/sec Loss 25.3091 LearningRate 0.0003 Epoch: 1 Global Step: 27970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:04,147-Speed 6316.49 samples/sec Loss 25.2884 LearningRate 0.0003 Epoch: 1 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:07,396-Speed 6306.33 samples/sec Loss 25.2735 LearningRate 0.0003 Epoch: 1 Global Step: 27990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:10,628-Speed 6336.98 samples/sec Loss 25.3732 LearningRate 0.0003 Epoch: 1 Global Step: 28000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:13,872-Speed 6314.90 samples/sec Loss 25.2860 LearningRate 0.0003 Epoch: 1 Global Step: 28010 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:17,115-Speed 6317.10 samples/sec Loss 25.2668 LearningRate 0.0003 Epoch: 1 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:20,359-Speed 6314.64 samples/sec Loss 25.2841 LearningRate 0.0003 Epoch: 1 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:23,602-Speed 6318.84 samples/sec Loss 25.1771 LearningRate 0.0003 Epoch: 1 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:26,848-Speed 6310.56 samples/sec Loss 25.2338 LearningRate 0.0003 Epoch: 1 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:30,095-Speed 6310.23 samples/sec Loss 25.1473 LearningRate 0.0003 Epoch: 1 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:33,339-Speed 6314.70 samples/sec Loss 25.1163 
LearningRate 0.0003 Epoch: 1 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:36,590-Speed 6299.86 samples/sec Loss 25.1414 LearningRate 0.0003 Epoch: 1 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:39,834-Speed 6315.01 samples/sec Loss 25.2217 LearningRate 0.0003 Epoch: 1 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:43,078-Speed 6315.68 samples/sec Loss 25.1684 LearningRate 0.0003 Epoch: 1 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:17:46,331-Speed 6297.32 samples/sec Loss 25.1536 LearningRate 0.0003 Epoch: 1 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:17:49,564-Speed 6336.40 samples/sec Loss 25.0254 LearningRate 0.0003 Epoch: 1 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:52,811-Speed 6308.62 samples/sec Loss 25.1836 LearningRate 0.0003 Epoch: 1 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:56,051-Speed 6321.05 samples/sec Loss 25.0643 LearningRate 0.0003 Epoch: 1 Global Step: 28140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:17:59,297-Speed 6312.48 samples/sec Loss 25.0184 LearningRate 0.0003 Epoch: 1 Global Step: 28150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:02,540-Speed 6314.72 samples/sec Loss 25.0568 LearningRate 0.0003 Epoch: 1 Global Step: 28160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:05,786-Speed 6311.64 samples/sec Loss 25.0173 LearningRate 0.0003 Epoch: 1 Global Step: 28170 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:09,027-Speed 6320.25 samples/sec Loss 25.0141 LearningRate 0.0003 Epoch: 1 Global Step: 28180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:12,269-Speed 6317.95 samples/sec Loss 24.9828 LearningRate 0.0003 Epoch: 1 Global Step: 
28190 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:15,514-Speed 6314.00 samples/sec Loss 24.9922 LearningRate 0.0003 Epoch: 1 Global Step: 28200 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:18,777-Speed 6279.28 samples/sec Loss 25.0545 LearningRate 0.0003 Epoch: 1 Global Step: 28210 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:22,016-Speed 6324.05 samples/sec Loss 24.9120 LearningRate 0.0003 Epoch: 1 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:18:25,263-Speed 6307.93 samples/sec Loss 25.0814 LearningRate 0.0003 Epoch: 1 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:18:28,508-Speed 6313.96 samples/sec Loss 24.8344 LearningRate 0.0003 Epoch: 1 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:18:31,737-Speed 6343.17 samples/sec Loss 24.9100 LearningRate 0.0003 Epoch: 1 Global Step: 28250 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:35,062-Speed 6160.93 samples/sec Loss 24.8882 LearningRate 0.0003 Epoch: 1 Global Step: 28260 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:38,345-Speed 6239.28 samples/sec Loss 24.9109 LearningRate 0.0003 Epoch: 1 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:41,597-Speed 6299.28 samples/sec Loss 24.8212 LearningRate 0.0003 Epoch: 1 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:44,839-Speed 6318.88 samples/sec Loss 24.8858 LearningRate 0.0003 Epoch: 1 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:48,078-Speed 6323.05 samples/sec Loss 24.8103 LearningRate 0.0003 Epoch: 1 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:51,325-Speed 6312.25 samples/sec Loss 24.8792 LearningRate 0.0003 Epoch: 1 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 
73 hours Training: 2022-03-31 18:18:54,568-Speed 6316.08 samples/sec Loss 24.7653 LearningRate 0.0003 Epoch: 1 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:18:57,811-Speed 6318.30 samples/sec Loss 24.7810 LearningRate 0.0003 Epoch: 1 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:01,056-Speed 6312.38 samples/sec Loss 24.7492 LearningRate 0.0003 Epoch: 1 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:04,308-Speed 6298.92 samples/sec Loss 24.7933 LearningRate 0.0003 Epoch: 1 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:07,548-Speed 6322.25 samples/sec Loss 24.7343 LearningRate 0.0003 Epoch: 1 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:10,792-Speed 6314.11 samples/sec Loss 24.8280 LearningRate 0.0003 Epoch: 1 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:14,015-Speed 6355.58 samples/sec Loss 24.6004 LearningRate 0.0003 Epoch: 1 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:17,260-Speed 6312.33 samples/sec Loss 24.7491 LearningRate 0.0003 Epoch: 1 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:20,500-Speed 6324.64 samples/sec Loss 24.8379 LearningRate 0.0003 Epoch: 1 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:23,749-Speed 6305.37 samples/sec Loss 24.7109 LearningRate 0.0003 Epoch: 1 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:26,987-Speed 6324.97 samples/sec Loss 24.6483 LearningRate 0.0003 Epoch: 1 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:30,231-Speed 6315.56 samples/sec Loss 24.6887 LearningRate 0.0003 Epoch: 1 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 
18:19:33,472-Speed 6319.76 samples/sec Loss 24.6014 LearningRate 0.0003 Epoch: 1 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:36,729-Speed 6290.15 samples/sec Loss 24.6583 LearningRate 0.0003 Epoch: 1 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:40,068-Speed 6134.26 samples/sec Loss 24.6562 LearningRate 0.0003 Epoch: 1 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:43,344-Speed 6253.02 samples/sec Loss 24.5482 LearningRate 0.0003 Epoch: 1 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:19:46,581-Speed 6327.28 samples/sec Loss 24.5534 LearningRate 0.0003 Epoch: 1 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:49,827-Speed 6310.68 samples/sec Loss 24.5907 LearningRate 0.0003 Epoch: 1 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:53,075-Speed 6307.53 samples/sec Loss 24.5293 LearningRate 0.0003 Epoch: 1 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:56,323-Speed 6307.63 samples/sec Loss 24.5817 LearningRate 0.0003 Epoch: 1 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:19:59,547-Speed 6352.52 samples/sec Loss 24.5598 LearningRate 0.0003 Epoch: 1 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:02,792-Speed 6314.17 samples/sec Loss 24.4908 LearningRate 0.0003 Epoch: 1 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:06,036-Speed 6313.43 samples/sec Loss 24.4921 LearningRate 0.0003 Epoch: 1 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:09,277-Speed 6321.54 samples/sec Loss 24.4459 LearningRate 0.0003 Epoch: 1 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:12,520-Speed 6316.70 samples/sec 
Loss 24.3957 LearningRate 0.0003 Epoch: 1 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:15,780-Speed 6283.49 samples/sec Loss 24.4761 LearningRate 0.0003 Epoch: 1 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:19,023-Speed 6316.47 samples/sec Loss 24.2886 LearningRate 0.0003 Epoch: 1 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:22,262-Speed 6324.73 samples/sec Loss 24.3973 LearningRate 0.0003 Epoch: 1 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:25,507-Speed 6312.97 samples/sec Loss 24.3527 LearningRate 0.0003 Epoch: 1 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:28,747-Speed 6323.37 samples/sec Loss 24.3589 LearningRate 0.0003 Epoch: 1 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:31,976-Speed 6343.53 samples/sec Loss 24.4321 LearningRate 0.0003 Epoch: 1 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:35,216-Speed 6321.37 samples/sec Loss 24.4158 LearningRate 0.0003 Epoch: 1 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:38,459-Speed 6317.24 samples/sec Loss 24.4006 LearningRate 0.0003 Epoch: 1 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:41,704-Speed 6312.08 samples/sec Loss 24.3566 LearningRate 0.0003 Epoch: 1 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:44,947-Speed 6316.63 samples/sec Loss 24.3637 LearningRate 0.0003 Epoch: 1 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:48,187-Speed 6323.37 samples/sec Loss 24.2374 LearningRate 0.0003 Epoch: 1 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:51,427-Speed 6321.18 samples/sec Loss 24.2491 LearningRate 0.0003 Epoch: 1 
Global Step: 28680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:54,667-Speed 6322.04 samples/sec Loss 24.2768 LearningRate 0.0003 Epoch: 1 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:20:57,910-Speed 6317.93 samples/sec Loss 24.1835 LearningRate 0.0003 Epoch: 1 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:01,151-Speed 6319.56 samples/sec Loss 24.3091 LearningRate 0.0003 Epoch: 1 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:04,394-Speed 6317.13 samples/sec Loss 24.2436 LearningRate 0.0003 Epoch: 1 Global Step: 28720 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:21:07,637-Speed 6315.45 samples/sec Loss 24.2889 LearningRate 0.0003 Epoch: 1 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:21:10,879-Speed 6322.45 samples/sec Loss 24.2498 LearningRate 0.0003 Epoch: 1 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:21:14,111-Speed 6337.83 samples/sec Loss 24.2712 LearningRate 0.0003 Epoch: 1 Global Step: 28750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:17,353-Speed 6319.20 samples/sec Loss 24.2787 LearningRate 0.0003 Epoch: 1 Global Step: 28760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:20,595-Speed 6317.61 samples/sec Loss 24.2621 LearningRate 0.0003 Epoch: 1 Global Step: 28770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:23,838-Speed 6317.57 samples/sec Loss 24.2718 LearningRate 0.0003 Epoch: 1 Global Step: 28780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:27,090-Speed 6299.23 samples/sec Loss 24.1724 LearningRate 0.0003 Epoch: 1 Global Step: 28790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:30,329-Speed 6324.86 samples/sec Loss 24.1328 LearningRate 0.0003 Epoch: 1 Global Step: 28800 Fp16 Grad Scale: 
65536 Required: 73 hours Training: 2022-03-31 18:21:33,571-Speed 6318.47 samples/sec Loss 24.1015 LearningRate 0.0003 Epoch: 1 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:36,819-Speed 6306.77 samples/sec Loss 23.9902 LearningRate 0.0003 Epoch: 1 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:40,059-Speed 6322.26 samples/sec Loss 24.1168 LearningRate 0.0003 Epoch: 1 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:43,302-Speed 6316.09 samples/sec Loss 24.1017 LearningRate 0.0003 Epoch: 1 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:46,547-Speed 6313.60 samples/sec Loss 24.0619 LearningRate 0.0003 Epoch: 1 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:21:49,776-Speed 6344.09 samples/sec Loss 23.9860 LearningRate 0.0003 Epoch: 1 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:53,021-Speed 6312.89 samples/sec Loss 24.0411 LearningRate 0.0003 Epoch: 1 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:56,263-Speed 6316.90 samples/sec Loss 24.1169 LearningRate 0.0003 Epoch: 1 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:21:59,505-Speed 6318.45 samples/sec Loss 23.9961 LearningRate 0.0003 Epoch: 1 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:02,746-Speed 6320.39 samples/sec Loss 24.0294 LearningRate 0.0003 Epoch: 1 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:06,039-Speed 6221.60 samples/sec Loss 23.9548 LearningRate 0.0003 Epoch: 1 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:09,284-Speed 6312.44 samples/sec Loss 24.0255 LearningRate 0.0003 Epoch: 1 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 73 hours Training: 
2022-03-31 18:22:12,577-Speed 6220.39 samples/sec Loss 23.8893 LearningRate 0.0003 Epoch: 1 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:15,821-Speed 6315.07 samples/sec Loss 23.8481 LearningRate 0.0003 Epoch: 1 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:19,065-Speed 6313.33 samples/sec Loss 23.9138 LearningRate 0.0003 Epoch: 1 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:22,311-Speed 6311.84 samples/sec Loss 23.9233 LearningRate 0.0003 Epoch: 1 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:22:25,555-Speed 6314.82 samples/sec Loss 23.8659 LearningRate 0.0003 Epoch: 1 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:22:28,798-Speed 6317.19 samples/sec Loss 23.8833 LearningRate 0.0003 Epoch: 1 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:22:32,025-Speed 6347.86 samples/sec Loss 23.8910 LearningRate 0.0003 Epoch: 1 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:35,323-Speed 6210.87 samples/sec Loss 23.9201 LearningRate 0.0003 Epoch: 1 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:38,572-Speed 6305.08 samples/sec Loss 23.8419 LearningRate 0.0003 Epoch: 1 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:41,829-Speed 6289.62 samples/sec Loss 23.7455 LearningRate 0.0003 Epoch: 1 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:45,076-Speed 6308.55 samples/sec Loss 23.7289 LearningRate 0.0003 Epoch: 1 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:48,324-Speed 6306.83 samples/sec Loss 23.8016 LearningRate 0.0004 Epoch: 1 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:51,568-Speed 6314.42 
samples/sec Loss 23.7841 LearningRate 0.0004 Epoch: 1 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:54,817-Speed 6304.98 samples/sec Loss 23.8505 LearningRate 0.0004 Epoch: 1 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:22:58,061-Speed 6316.29 samples/sec Loss 23.7891 LearningRate 0.0004 Epoch: 1 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:01,307-Speed 6310.13 samples/sec Loss 23.7478 LearningRate 0.0004 Epoch: 1 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:04,557-Speed 6301.74 samples/sec Loss 23.7077 LearningRate 0.0004 Epoch: 1 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:07,800-Speed 6317.37 samples/sec Loss 23.6601 LearningRate 0.0004 Epoch: 1 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:11,045-Speed 6312.51 samples/sec Loss 23.6169 LearningRate 0.0004 Epoch: 1 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:14,280-Speed 6332.87 samples/sec Loss 23.6282 LearningRate 0.0004 Epoch: 1 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:17,524-Speed 6315.08 samples/sec Loss 23.6121 LearningRate 0.0004 Epoch: 1 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:20,775-Speed 6300.72 samples/sec Loss 23.7852 LearningRate 0.0004 Epoch: 1 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:24,013-Speed 6325.83 samples/sec Loss 23.7643 LearningRate 0.0004 Epoch: 1 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:27,260-Speed 6308.97 samples/sec Loss 23.4849 LearningRate 0.0004 Epoch: 1 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:30,506-Speed 6310.86 samples/sec Loss 23.6434 LearningRate 
0.0004 Epoch: 1 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:33,752-Speed 6310.35 samples/sec Loss 23.5153 LearningRate 0.0004 Epoch: 1 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:36,995-Speed 6316.49 samples/sec Loss 23.6121 LearningRate 0.0004 Epoch: 1 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:40,235-Speed 6322.29 samples/sec Loss 23.6435 LearningRate 0.0004 Epoch: 1 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:43,479-Speed 6314.37 samples/sec Loss 23.5117 LearningRate 0.0004 Epoch: 1 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:23:46,720-Speed 6321.48 samples/sec Loss 23.5079 LearningRate 0.0004 Epoch: 1 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:49,963-Speed 6316.47 samples/sec Loss 23.4818 LearningRate 0.0004 Epoch: 1 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:53,206-Speed 6317.76 samples/sec Loss 23.5665 LearningRate 0.0004 Epoch: 1 Global Step: 29240 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:56,449-Speed 6315.00 samples/sec Loss 23.5181 LearningRate 0.0004 Epoch: 1 Global Step: 29250 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:23:59,702-Speed 6298.44 samples/sec Loss 23.5667 LearningRate 0.0004 Epoch: 1 Global Step: 29260 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:24:02,941-Speed 6323.27 samples/sec Loss 23.4098 LearningRate 0.0004 Epoch: 1 Global Step: 29270 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:24:06,192-Speed 6302.31 samples/sec Loss 23.4050 LearningRate 0.0004 Epoch: 1 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:24:09,421-Speed 6343.28 samples/sec Loss 23.5254 LearningRate 0.0004 Epoch: 1 Global Step: 29290 
Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:12,661-Speed 6322.11 samples/sec Loss 23.4212 LearningRate 0.0004 Epoch: 1 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:15,901-Speed 6321.95 samples/sec Loss 23.5550 LearningRate 0.0004 Epoch: 1 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:19,144-Speed 6316.94 samples/sec Loss 23.4912 LearningRate 0.0004 Epoch: 1 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:22,391-Speed 6308.46 samples/sec Loss 23.4168 LearningRate 0.0004 Epoch: 1 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:25,631-Speed 6323.46 samples/sec Loss 23.4765 LearningRate 0.0004 Epoch: 1 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:28,878-Speed 6308.21 samples/sec Loss 23.3046 LearningRate 0.0004 Epoch: 1 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:32,117-Speed 6323.65 samples/sec Loss 23.3952 LearningRate 0.0004 Epoch: 1 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:35,368-Speed 6301.43 samples/sec Loss 23.3259 LearningRate 0.0004 Epoch: 1 Global Step: 29370 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:38,611-Speed 6316.91 samples/sec Loss 23.3541 LearningRate 0.0004 Epoch: 1 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:41,850-Speed 6324.59 samples/sec Loss 23.2798 LearningRate 0.0004 Epoch: 1 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:24:45,093-Speed 6316.96 samples/sec Loss 23.3931 LearningRate 0.0004 Epoch: 1 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:24:48,334-Speed 6319.99 samples/sec Loss 23.2443 LearningRate 0.0004 Epoch: 1 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 73 
hours Training: 2022-03-31 18:24:51,580-Speed 6310.64 samples/sec Loss 23.2012 LearningRate 0.0004 Epoch: 1 Global Step: 29420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:54,822-Speed 6319.13 samples/sec Loss 23.1898 LearningRate 0.0004 Epoch: 1 Global Step: 29430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:24:58,061-Speed 6325.38 samples/sec Loss 23.2460 LearningRate 0.0004 Epoch: 1 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:01,301-Speed 6322.00 samples/sec Loss 23.1509 LearningRate 0.0004 Epoch: 1 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:04,542-Speed 6320.35 samples/sec Loss 23.1571 LearningRate 0.0004 Epoch: 1 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:07,782-Speed 6321.37 samples/sec Loss 23.2844 LearningRate 0.0004 Epoch: 1 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:11,026-Speed 6314.63 samples/sec Loss 23.2077 LearningRate 0.0004 Epoch: 1 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:14,266-Speed 6322.57 samples/sec Loss 23.2143 LearningRate 0.0004 Epoch: 1 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:17,505-Speed 6324.18 samples/sec Loss 23.2268 LearningRate 0.0004 Epoch: 1 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:20,745-Speed 6322.55 samples/sec Loss 23.2080 LearningRate 0.0004 Epoch: 1 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:23,994-Speed 6306.29 samples/sec Loss 23.1715 LearningRate 0.0004 Epoch: 1 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:25:27,237-Speed 6315.52 samples/sec Loss 23.0974 LearningRate 0.0004 Epoch: 1 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 
18:25:30,487-Speed 6303.74 samples/sec Loss 23.1398 LearningRate 0.0004 Epoch: 1 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:25:33,717-Speed 6341.58 samples/sec Loss 23.0499 LearningRate 0.0004 Epoch: 1 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:36,956-Speed 6324.69 samples/sec Loss 23.0107 LearningRate 0.0004 Epoch: 1 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:40,197-Speed 6319.41 samples/sec Loss 23.1662 LearningRate 0.0004 Epoch: 1 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:43,439-Speed 6319.72 samples/sec Loss 23.0590 LearningRate 0.0004 Epoch: 1 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:46,678-Speed 6322.40 samples/sec Loss 22.9811 LearningRate 0.0004 Epoch: 1 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:49,919-Speed 6321.71 samples/sec Loss 23.0662 LearningRate 0.0004 Epoch: 1 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:53,165-Speed 6310.44 samples/sec Loss 22.9892 LearningRate 0.0004 Epoch: 1 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:56,412-Speed 6309.33 samples/sec Loss 23.0488 LearningRate 0.0004 Epoch: 1 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:25:59,661-Speed 6306.00 samples/sec Loss 23.0889 LearningRate 0.0004 Epoch: 1 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:02,899-Speed 6325.32 samples/sec Loss 22.9980 LearningRate 0.0004 Epoch: 1 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:06,145-Speed 6310.72 samples/sec Loss 22.9221 LearningRate 0.0004 Epoch: 1 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:26:09,387-Speed 6319.45 samples/sec 
Loss 22.9856 LearningRate 0.0004 Epoch: 1 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:26:12,612-Speed 6351.40 samples/sec Loss 22.9962 LearningRate 0.0004 Epoch: 1 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:15,855-Speed 6315.92 samples/sec Loss 22.8655 LearningRate 0.0004 Epoch: 1 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:19,095-Speed 6322.42 samples/sec Loss 22.9358 LearningRate 0.0004 Epoch: 1 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:22,337-Speed 6318.52 samples/sec Loss 23.0034 LearningRate 0.0004 Epoch: 1 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:25,582-Speed 6312.48 samples/sec Loss 22.9800 LearningRate 0.0004 Epoch: 1 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:28,825-Speed 6320.06 samples/sec Loss 22.8893 LearningRate 0.0004 Epoch: 1 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:32,068-Speed 6317.03 samples/sec Loss 22.8585 LearningRate 0.0004 Epoch: 1 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:35,312-Speed 6314.21 samples/sec Loss 22.8107 LearningRate 0.0004 Epoch: 1 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:38,553-Speed 6320.72 samples/sec Loss 22.7895 LearningRate 0.0004 Epoch: 1 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:41,794-Speed 6320.54 samples/sec Loss 22.9330 LearningRate 0.0004 Epoch: 1 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:45,037-Speed 6315.88 samples/sec Loss 22.8711 LearningRate 0.0004 Epoch: 1 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:26:48,293-Speed 6292.14 samples/sec Loss 22.8111 LearningRate 0.0004 Epoch: 1 
Global Step: 29780 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:26:51,537-Speed 6313.58 samples/sec Loss 22.7847 LearningRate 0.0004 Epoch: 1 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:26:54,764-Speed 6348.81 samples/sec Loss 22.8541 LearningRate 0.0004 Epoch: 1 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:26:58,006-Speed 6318.14 samples/sec Loss 22.7021 LearningRate 0.0004 Epoch: 1 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:01,244-Speed 6326.89 samples/sec Loss 22.7068 LearningRate 0.0004 Epoch: 1 Global Step: 29820 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:04,483-Speed 6323.54 samples/sec Loss 22.7435 LearningRate 0.0004 Epoch: 1 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:07,724-Speed 6320.88 samples/sec Loss 22.8219 LearningRate 0.0004 Epoch: 1 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:10,969-Speed 6313.36 samples/sec Loss 22.6432 LearningRate 0.0004 Epoch: 1 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:14,210-Speed 6320.12 samples/sec Loss 22.7834 LearningRate 0.0004 Epoch: 1 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:17,452-Speed 6318.81 samples/sec Loss 22.7091 LearningRate 0.0004 Epoch: 1 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:20,695-Speed 6317.21 samples/sec Loss 22.5498 LearningRate 0.0004 Epoch: 1 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:23,937-Speed 6316.85 samples/sec Loss 22.6925 LearningRate 0.0004 Epoch: 1 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:27,181-Speed 6315.12 samples/sec Loss 22.6138 LearningRate 0.0004 Epoch: 1 Global Step: 29900 Fp16 Grad Scale: 
131072 Required: 73 hours Training: 2022-03-31 18:27:30,510-Speed 6152.94 samples/sec Loss 22.6717 LearningRate 0.0004 Epoch: 1 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:33,750-Speed 6323.07 samples/sec Loss 22.7069 LearningRate 0.0004 Epoch: 1 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:36,993-Speed 6317.86 samples/sec Loss 22.7176 LearningRate 0.0004 Epoch: 1 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:40,235-Speed 6316.64 samples/sec Loss 22.5408 LearningRate 0.0004 Epoch: 1 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:43,477-Speed 6320.07 samples/sec Loss 22.5969 LearningRate 0.0004 Epoch: 1 Global Step: 29950 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:46,716-Speed 6323.84 samples/sec Loss 22.6746 LearningRate 0.0004 Epoch: 1 Global Step: 29960 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:27:49,942-Speed 6349.55 samples/sec Loss 22.4980 LearningRate 0.0004 Epoch: 1 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:53,185-Speed 6316.53 samples/sec Loss 22.5518 LearningRate 0.0004 Epoch: 1 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:56,433-Speed 6305.65 samples/sec Loss 22.6016 LearningRate 0.0004 Epoch: 1 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:27:59,688-Speed 6294.48 samples/sec Loss 22.5645 LearningRate 0.0004 Epoch: 1 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:02,954-Speed 6272.59 samples/sec Loss 22.5298 LearningRate 0.0004 Epoch: 1 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:06,199-Speed 6312.76 samples/sec Loss 22.5444 LearningRate 0.0004 Epoch: 1 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 73 hours Training: 
2022-03-31 18:28:09,445-Speed 6309.24 samples/sec Loss 22.4418 LearningRate 0.0004 Epoch: 1 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:12,688-Speed 6317.62 samples/sec Loss 22.5393 LearningRate 0.0004 Epoch: 1 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:15,929-Speed 6319.80 samples/sec Loss 22.5775 LearningRate 0.0004 Epoch: 1 Global Step: 30050 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:19,171-Speed 6318.78 samples/sec Loss 22.4744 LearningRate 0.0004 Epoch: 1 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:22,403-Speed 6339.45 samples/sec Loss 22.4726 LearningRate 0.0004 Epoch: 1 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:25,646-Speed 6315.17 samples/sec Loss 22.4088 LearningRate 0.0004 Epoch: 1 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:28,891-Speed 6313.27 samples/sec Loss 22.4514 LearningRate 0.0004 Epoch: 1 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:32,135-Speed 6315.27 samples/sec Loss 22.3626 LearningRate 0.0004 Epoch: 1 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:35,378-Speed 6315.35 samples/sec Loss 22.5140 LearningRate 0.0004 Epoch: 1 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:38,621-Speed 6317.34 samples/sec Loss 22.3979 LearningRate 0.0004 Epoch: 1 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:41,869-Speed 6306.76 samples/sec Loss 22.2901 LearningRate 0.0004 Epoch: 1 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:45,112-Speed 6316.57 samples/sec Loss 22.3971 LearningRate 0.0004 Epoch: 1 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:48,355-Speed 6316.71 
samples/sec Loss 22.3852 LearningRate 0.0004 Epoch: 1 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:51,597-Speed 6317.41 samples/sec Loss 22.3157 LearningRate 0.0004 Epoch: 1 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:28:54,842-Speed 6312.58 samples/sec Loss 22.3625 LearningRate 0.0004 Epoch: 1 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:28:58,069-Speed 6348.51 samples/sec Loss 22.3743 LearningRate 0.0004 Epoch: 1 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:01,322-Speed 6296.15 samples/sec Loss 22.3152 LearningRate 0.0004 Epoch: 1 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:04,566-Speed 6315.27 samples/sec Loss 22.2826 LearningRate 0.0004 Epoch: 1 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:07,807-Speed 6320.00 samples/sec Loss 22.3748 LearningRate 0.0004 Epoch: 1 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:11,051-Speed 6315.11 samples/sec Loss 22.4076 LearningRate 0.0004 Epoch: 1 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:14,296-Speed 6313.45 samples/sec Loss 22.4182 LearningRate 0.0004 Epoch: 1 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:17,540-Speed 6315.38 samples/sec Loss 22.3213 LearningRate 0.0004 Epoch: 1 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:20,783-Speed 6316.02 samples/sec Loss 22.3375 LearningRate 0.0004 Epoch: 1 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:24,024-Speed 6320.54 samples/sec Loss 22.2999 LearningRate 0.0004 Epoch: 1 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:27,274-Speed 6302.87 samples/sec Loss 22.2179 LearningRate 
0.0004 Epoch: 1 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:30,518-Speed 6314.51 samples/sec Loss 22.2477 LearningRate 0.0004 Epoch: 1 Global Step: 30280 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:29:33,755-Speed 6329.95 samples/sec Loss 22.1754 LearningRate 0.0004 Epoch: 1 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:36,995-Speed 6321.62 samples/sec Loss 22.1120 LearningRate 0.0004 Epoch: 1 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:40,242-Speed 6308.05 samples/sec Loss 22.1753 LearningRate 0.0004 Epoch: 1 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:43,485-Speed 6316.97 samples/sec Loss 22.2013 LearningRate 0.0004 Epoch: 1 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:46,737-Speed 6299.04 samples/sec Loss 22.1152 LearningRate 0.0004 Epoch: 1 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:50,022-Speed 6235.38 samples/sec Loss 22.0419 LearningRate 0.0004 Epoch: 1 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:53,266-Speed 6314.50 samples/sec Loss 22.0008 LearningRate 0.0004 Epoch: 1 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:56,512-Speed 6310.57 samples/sec Loss 22.0463 LearningRate 0.0004 Epoch: 1 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:29:59,754-Speed 6318.23 samples/sec Loss 22.0136 LearningRate 0.0004 Epoch: 1 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:03,000-Speed 6310.80 samples/sec Loss 22.1803 LearningRate 0.0004 Epoch: 1 Global Step: 30380 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:06,230-Speed 6342.51 samples/sec Loss 22.0465 LearningRate 0.0004 Epoch: 1 Global Step: 30390 Fp16 
Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:09,474-Speed 6314.45 samples/sec Loss 22.0425 LearningRate 0.0004 Epoch: 1 Global Step: 30400 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:12,721-Speed 6309.17 samples/sec Loss 21.9208 LearningRate 0.0004 Epoch: 1 Global Step: 30410 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:15,959-Speed 6326.07 samples/sec Loss 22.1664 LearningRate 0.0004 Epoch: 1 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:19,202-Speed 6317.15 samples/sec Loss 22.0099 LearningRate 0.0004 Epoch: 1 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:22,446-Speed 6314.41 samples/sec Loss 21.8702 LearningRate 0.0004 Epoch: 1 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:25,687-Speed 6320.75 samples/sec Loss 22.0347 LearningRate 0.0004 Epoch: 1 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:28,931-Speed 6314.69 samples/sec Loss 22.0554 LearningRate 0.0004 Epoch: 1 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:32,171-Speed 6322.83 samples/sec Loss 21.9222 LearningRate 0.0004 Epoch: 1 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:35,413-Speed 6319.12 samples/sec Loss 22.0087 LearningRate 0.0004 Epoch: 1 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:30:38,716-Speed 6200.69 samples/sec Loss 21.9053 LearningRate 0.0004 Epoch: 1 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:30:41,958-Speed 6319.11 samples/sec Loss 21.8901 LearningRate 0.0004 Epoch: 1 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:30:45,199-Speed 6319.37 samples/sec Loss 21.8746 LearningRate 0.0004 Epoch: 1 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 73 hours 
Training: 2022-03-31 18:30:48,441-Speed 6319.56 samples/sec Loss 21.9214 LearningRate 0.0004 Epoch: 1 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:30:51,684-Speed 6316.55 samples/sec Loss 21.9408 LearningRate 0.0004 Epoch: 1 Global Step: 30530 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:30:54,927-Speed 6315.20 samples/sec Loss 21.8130 LearningRate 0.0004 Epoch: 1 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:30:58,183-Speed 6291.48 samples/sec Loss 21.8674 LearningRate 0.0004 Epoch: 1 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:01,425-Speed 6319.54 samples/sec Loss 21.9073 LearningRate 0.0004 Epoch: 1 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:04,663-Speed 6326.01 samples/sec Loss 21.8482 LearningRate 0.0004 Epoch: 1 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:07,908-Speed 6313.32 samples/sec Loss 21.8929 LearningRate 0.0004 Epoch: 1 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:11,139-Speed 6338.42 samples/sec Loss 21.7835 LearningRate 0.0004 Epoch: 1 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:14,397-Speed 6288.39 samples/sec Loss 21.8198 LearningRate 0.0004 Epoch: 1 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:17,641-Speed 6314.69 samples/sec Loss 21.8844 LearningRate 0.0004 Epoch: 1 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:20,886-Speed 6312.57 samples/sec Loss 21.8243 LearningRate 0.0004 Epoch: 1 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:24,131-Speed 6312.06 samples/sec Loss 21.8182 LearningRate 0.0004 Epoch: 1 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 
18:31:27,405-Speed 6257.81 samples/sec Loss 21.7347 LearningRate 0.0004 Epoch: 1 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:30,649-Speed 6313.57 samples/sec Loss 21.8852 LearningRate 0.0004 Epoch: 1 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:33,888-Speed 6325.49 samples/sec Loss 21.7108 LearningRate 0.0004 Epoch: 1 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:37,129-Speed 6319.97 samples/sec Loss 21.6877 LearningRate 0.0004 Epoch: 1 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:40,374-Speed 6313.36 samples/sec Loss 21.6650 LearningRate 0.0004 Epoch: 1 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:43,601-Speed 6347.43 samples/sec Loss 21.7684 LearningRate 0.0004 Epoch: 1 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:46,842-Speed 6320.32 samples/sec Loss 21.6320 LearningRate 0.0004 Epoch: 1 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:50,085-Speed 6317.06 samples/sec Loss 21.7507 LearningRate 0.0004 Epoch: 1 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:31:53,313-Speed 6345.41 samples/sec Loss 21.7388 LearningRate 0.0004 Epoch: 1 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:31:56,554-Speed 6320.31 samples/sec Loss 21.6486 LearningRate 0.0004 Epoch: 1 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:31:59,796-Speed 6318.63 samples/sec Loss 21.6004 LearningRate 0.0004 Epoch: 1 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:03,044-Speed 6307.49 samples/sec Loss 21.5441 LearningRate 0.0004 Epoch: 1 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:06,286-Speed 6317.53 
samples/sec Loss 21.6093 LearningRate 0.0004 Epoch: 1 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:09,541-Speed 6293.92 samples/sec Loss 21.6538 LearningRate 0.0004 Epoch: 1 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:12,817-Speed 6253.22 samples/sec Loss 21.6787 LearningRate 0.0004 Epoch: 1 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:16,059-Speed 6317.00 samples/sec Loss 21.6305 LearningRate 0.0004 Epoch: 1 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:19,300-Speed 6321.93 samples/sec Loss 21.5150 LearningRate 0.0004 Epoch: 1 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:22,542-Speed 6318.59 samples/sec Loss 21.5287 LearningRate 0.0004 Epoch: 1 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 73 hours Training: 2022-03-31 18:32:25,780-Speed 6324.53 samples/sec Loss 21.7074 LearningRate 0.0004 Epoch: 1 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:29,025-Speed 6314.36 samples/sec Loss 21.5830 LearningRate 0.0004 Epoch: 1 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:32,267-Speed 6317.61 samples/sec Loss 21.5748 LearningRate 0.0004 Epoch: 1 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:35,508-Speed 6321.22 samples/sec Loss 21.3329 LearningRate 0.0004 Epoch: 1 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:38,746-Speed 6325.58 samples/sec Loss 21.4711 LearningRate 0.0004 Epoch: 1 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:42,001-Speed 6294.31 samples/sec Loss 21.4432 LearningRate 0.0004 Epoch: 1 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:45,245-Speed 6313.44 samples/sec Loss 21.4531 LearningRate 
0.0004 Epoch: 1 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:48,488-Speed 6318.34 samples/sec Loss 21.5386 LearningRate 0.0004 Epoch: 1 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:51,732-Speed 6313.50 samples/sec Loss 21.3649 LearningRate 0.0004 Epoch: 1 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:54,974-Speed 6318.10 samples/sec Loss 21.6098 LearningRate 0.0004 Epoch: 1 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:32:58,205-Speed 6341.10 samples/sec Loss 21.5402 LearningRate 0.0004 Epoch: 1 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:33:01,450-Speed 6311.49 samples/sec Loss 21.4746 LearningRate 0.0004 Epoch: 1 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:33:04,699-Speed 6305.69 samples/sec Loss 21.3531 LearningRate 0.0004 Epoch: 1 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 73 hours Training: 2022-03-31 18:33:07,944-Speed 6313.71 samples/sec Loss 21.3639 LearningRate 0.0004 Epoch: 1 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:11,185-Speed 6318.61 samples/sec Loss 21.3414 LearningRate 0.0004 Epoch: 1 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:14,416-Speed 6341.41 samples/sec Loss 21.3565 LearningRate 0.0004 Epoch: 1 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:17,658-Speed 6318.25 samples/sec Loss 21.4841 LearningRate 0.0004 Epoch: 1 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:20,899-Speed 6320.03 samples/sec Loss 21.3303 LearningRate 0.0004 Epoch: 1 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:24,142-Speed 6316.00 samples/sec Loss 21.3556 LearningRate 0.0004 Epoch: 1 Global Step: 31000 
Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:27,384-Speed 6318.35 samples/sec Loss 21.3355 LearningRate 0.0004 Epoch: 1 Global Step: 31010 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:30,624-Speed 6322.46 samples/sec Loss 21.2830 LearningRate 0.0004 Epoch: 1 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:33,865-Speed 6320.48 samples/sec Loss 21.3533 LearningRate 0.0004 Epoch: 1 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:37,110-Speed 6313.87 samples/sec Loss 21.3177 LearningRate 0.0004 Epoch: 1 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:40,348-Speed 6325.49 samples/sec Loss 21.2438 LearningRate 0.0004 Epoch: 1 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:43,589-Speed 6321.62 samples/sec Loss 21.2362 LearningRate 0.0004 Epoch: 1 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:33:46,832-Speed 6315.89 samples/sec Loss 21.3027 LearningRate 0.0004 Epoch: 1 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:50,076-Speed 6315.65 samples/sec Loss 21.1362 LearningRate 0.0004 Epoch: 1 Global Step: 31080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:53,321-Speed 6312.80 samples/sec Loss 21.2898 LearningRate 0.0004 Epoch: 1 Global Step: 31090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:56,563-Speed 6317.10 samples/sec Loss 21.2526 LearningRate 0.0004 Epoch: 1 Global Step: 31100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:33:59,809-Speed 6311.15 samples/sec Loss 21.1007 LearningRate 0.0004 Epoch: 1 Global Step: 31110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:03,053-Speed 6315.35 samples/sec Loss 21.1777 LearningRate 0.0004 Epoch: 1 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 72 
hours Training: 2022-03-31 18:34:06,300-Speed 6308.87 samples/sec Loss 21.1450 LearningRate 0.0004 Epoch: 1 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:09,545-Speed 6312.70 samples/sec Loss 21.1546 LearningRate 0.0004 Epoch: 1 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:12,790-Speed 6311.23 samples/sec Loss 21.1724 LearningRate 0.0004 Epoch: 1 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:16,018-Speed 6345.72 samples/sec Loss 21.1021 LearningRate 0.0004 Epoch: 1 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:19,264-Speed 6312.26 samples/sec Loss 21.1857 LearningRate 0.0004 Epoch: 1 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:22,525-Speed 6281.55 samples/sec Loss 21.1453 LearningRate 0.0004 Epoch: 1 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:25,779-Speed 6295.53 samples/sec Loss 21.1515 LearningRate 0.0004 Epoch: 1 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:29,027-Speed 6305.88 samples/sec Loss 21.0847 LearningRate 0.0004 Epoch: 1 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:32,270-Speed 6316.82 samples/sec Loss 21.1620 LearningRate 0.0004 Epoch: 1 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:35,510-Speed 6323.26 samples/sec Loss 21.0361 LearningRate 0.0004 Epoch: 1 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:38,754-Speed 6313.98 samples/sec Loss 21.0925 LearningRate 0.0004 Epoch: 1 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:41,995-Speed 6321.13 samples/sec Loss 21.0381 LearningRate 0.0004 Epoch: 1 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 
18:34:45,246-Speed 6300.35 samples/sec Loss 21.1484 LearningRate 0.0004 Epoch: 1 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:34:48,488-Speed 6319.65 samples/sec Loss 21.1022 LearningRate 0.0004 Epoch: 1 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:51,733-Speed 6312.04 samples/sec Loss 21.1131 LearningRate 0.0004 Epoch: 1 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:54,978-Speed 6312.73 samples/sec Loss 21.0634 LearningRate 0.0004 Epoch: 1 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:34:58,220-Speed 6319.13 samples/sec Loss 21.1193 LearningRate 0.0004 Epoch: 1 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:01,464-Speed 6313.92 samples/sec Loss 21.0474 LearningRate 0.0004 Epoch: 1 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:04,715-Speed 6300.61 samples/sec Loss 20.9516 LearningRate 0.0004 Epoch: 1 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:07,970-Speed 6294.55 samples/sec Loss 21.0986 LearningRate 0.0004 Epoch: 1 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:11,233-Speed 6276.38 samples/sec Loss 20.9688 LearningRate 0.0004 Epoch: 1 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:14,479-Speed 6312.19 samples/sec Loss 21.0286 LearningRate 0.0004 Epoch: 1 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:17,719-Speed 6321.08 samples/sec Loss 20.9295 LearningRate 0.0004 Epoch: 1 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:20,965-Speed 6310.27 samples/sec Loss 20.9326 LearningRate 0.0004 Epoch: 1 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 18:35:24,198-Speed 6336.09 
samples/sec Loss 20.9761 LearningRate 0.0004 Epoch: 1 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:27,444-Speed 6312.21 samples/sec Loss 21.0463 LearningRate 0.0004 Epoch: 1 Global Step: 31380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:30,684-Speed 6320.39 samples/sec Loss 20.9699 LearningRate 0.0004 Epoch: 1 Global Step: 31390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:33,926-Speed 6320.34 samples/sec Loss 20.9541 LearningRate 0.0004 Epoch: 1 Global Step: 31400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:37,173-Speed 6307.67 samples/sec Loss 20.8941 LearningRate 0.0004 Epoch: 1 Global Step: 31410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:40,412-Speed 6325.76 samples/sec Loss 20.9175 LearningRate 0.0004 Epoch: 1 Global Step: 31420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:43,654-Speed 6318.11 samples/sec Loss 20.8946 LearningRate 0.0004 Epoch: 1 Global Step: 31430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:46,895-Speed 6318.92 samples/sec Loss 20.9932 LearningRate 0.0004 Epoch: 1 Global Step: 31440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:50,138-Speed 6316.69 samples/sec Loss 20.9252 LearningRate 0.0004 Epoch: 1 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:53,378-Speed 6322.60 samples/sec Loss 20.7828 LearningRate 0.0004 Epoch: 1 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:56,609-Speed 6341.86 samples/sec Loss 20.7858 LearningRate 0.0004 Epoch: 1 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:35:59,849-Speed 6321.40 samples/sec Loss 20.8082 LearningRate 0.0004 Epoch: 1 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:03,090-Speed 6320.42 samples/sec Loss 20.9176 
LearningRate 0.0004 Epoch: 1 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:06,332-Speed 6319.78 samples/sec Loss 20.7697 LearningRate 0.0004 Epoch: 1 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:09,573-Speed 6319.53 samples/sec Loss 20.6931 LearningRate 0.0004 Epoch: 1 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:12,818-Speed 6312.55 samples/sec Loss 20.7908 LearningRate 0.0004 Epoch: 1 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:16,062-Speed 6315.97 samples/sec Loss 20.7460 LearningRate 0.0004 Epoch: 1 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:19,305-Speed 6315.85 samples/sec Loss 20.6975 LearningRate 0.0004 Epoch: 1 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:22,546-Speed 6321.39 samples/sec Loss 20.7463 LearningRate 0.0004 Epoch: 1 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:25,788-Speed 6318.19 samples/sec Loss 20.7284 LearningRate 0.0004 Epoch: 1 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:36:29,014-Speed 6349.02 samples/sec Loss 20.7599 LearningRate 0.0004 Epoch: 1 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:32,256-Speed 6318.82 samples/sec Loss 20.6629 LearningRate 0.0004 Epoch: 1 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:35,498-Speed 6317.58 samples/sec Loss 20.7559 LearningRate 0.0004 Epoch: 1 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:38,754-Speed 6292.55 samples/sec Loss 20.7389 LearningRate 0.0004 Epoch: 1 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:41,995-Speed 6321.08 samples/sec Loss 20.6543 LearningRate 0.0004 Epoch: 1 Global 
Step: 31610 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:45,239-Speed 6313.02 samples/sec Loss 20.6487 LearningRate 0.0004 Epoch: 1 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:48,482-Speed 6316.01 samples/sec Loss 20.6829 LearningRate 0.0004 Epoch: 1 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:51,723-Speed 6321.72 samples/sec Loss 20.7623 LearningRate 0.0004 Epoch: 1 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:54,964-Speed 6319.33 samples/sec Loss 20.7069 LearningRate 0.0004 Epoch: 1 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:36:58,208-Speed 6314.62 samples/sec Loss 20.6950 LearningRate 0.0004 Epoch: 1 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:01,453-Speed 6314.16 samples/sec Loss 20.6464 LearningRate 0.0004 Epoch: 1 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:04,694-Speed 6320.18 samples/sec Loss 20.6935 LearningRate 0.0004 Epoch: 1 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:07,941-Speed 6309.46 samples/sec Loss 20.4968 LearningRate 0.0004 Epoch: 1 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:11,185-Speed 6314.66 samples/sec Loss 20.6413 LearningRate 0.0004 Epoch: 1 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:14,436-Speed 6300.15 samples/sec Loss 20.5862 LearningRate 0.0004 Epoch: 1 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:17,679-Speed 6317.11 samples/sec Loss 20.5599 LearningRate 0.0004 Epoch: 1 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:20,922-Speed 6316.20 samples/sec Loss 20.5105 LearningRate 0.0004 Epoch: 1 Global Step: 31730 Fp16 Grad Scale: 131072 
Required: 72 hours Training: 2022-03-31 18:37:24,152-Speed 6343.62 samples/sec Loss 20.6111 LearningRate 0.0004 Epoch: 1 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:27,391-Speed 6323.08 samples/sec Loss 20.5385 LearningRate 0.0004 Epoch: 1 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:30,633-Speed 6319.07 samples/sec Loss 20.6092 LearningRate 0.0004 Epoch: 1 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:33,874-Speed 6320.14 samples/sec Loss 20.5084 LearningRate 0.0004 Epoch: 1 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:37,121-Speed 6308.80 samples/sec Loss 20.4790 LearningRate 0.0004 Epoch: 1 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:40,361-Speed 6322.61 samples/sec Loss 20.5125 LearningRate 0.0004 Epoch: 1 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:43,601-Speed 6321.70 samples/sec Loss 20.4448 LearningRate 0.0004 Epoch: 1 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:46,847-Speed 6310.38 samples/sec Loss 20.5284 LearningRate 0.0004 Epoch: 1 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:50,091-Speed 6315.23 samples/sec Loss 20.5769 LearningRate 0.0004 Epoch: 1 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:53,333-Speed 6318.71 samples/sec Loss 20.3706 LearningRate 0.0004 Epoch: 1 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:37:56,581-Speed 6307.43 samples/sec Loss 20.4073 LearningRate 0.0004 Epoch: 1 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:37:59,820-Speed 6323.47 samples/sec Loss 20.4408 LearningRate 0.0004 Epoch: 1 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
18:38:03,066-Speed 6310.26 samples/sec Loss 20.4135 LearningRate 0.0004 Epoch: 1 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:38:06,310-Speed 6314.65 samples/sec Loss 20.4039 LearningRate 0.0004 Epoch: 1 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:38:09,537-Speed 6350.20 samples/sec Loss 20.4997 LearningRate 0.0004 Epoch: 1 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:12,778-Speed 6319.14 samples/sec Loss 20.4280 LearningRate 0.0004 Epoch: 1 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:16,021-Speed 6316.39 samples/sec Loss 20.4912 LearningRate 0.0004 Epoch: 1 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:19,267-Speed 6310.72 samples/sec Loss 20.4244 LearningRate 0.0004 Epoch: 1 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:22,511-Speed 6315.36 samples/sec Loss 20.3835 LearningRate 0.0004 Epoch: 1 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:25,752-Speed 6321.16 samples/sec Loss 20.4729 LearningRate 0.0004 Epoch: 1 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:28,996-Speed 6314.22 samples/sec Loss 20.4817 LearningRate 0.0004 Epoch: 1 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:32,241-Speed 6312.82 samples/sec Loss 20.3953 LearningRate 0.0004 Epoch: 1 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:35,486-Speed 6311.14 samples/sec Loss 20.3157 LearningRate 0.0004 Epoch: 1 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:38,727-Speed 6320.41 samples/sec Loss 20.3895 LearningRate 0.0004 Epoch: 1 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:41,969-Speed 6319.06 samples/sec 
Loss 20.2950 LearningRate 0.0004 Epoch: 1 Global Step: 31980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:38:45,210-Speed 6319.93 samples/sec Loss 20.3055 LearningRate 0.0004 Epoch: 1 Global Step: 31990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:38:48,441-Speed 6340.77 samples/sec Loss 20.4257 LearningRate 0.0004 Epoch: 1 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:51,684-Speed 6315.28 samples/sec Loss 20.3216 LearningRate 0.0004 Epoch: 1 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:54,928-Speed 6315.34 samples/sec Loss 20.3038 LearningRate 0.0004 Epoch: 1 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:38:58,202-Speed 6257.21 samples/sec Loss 20.3205 LearningRate 0.0004 Epoch: 1 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:01,441-Speed 6323.39 samples/sec Loss 20.2172 LearningRate 0.0004 Epoch: 1 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:04,680-Speed 6325.72 samples/sec Loss 20.2546 LearningRate 0.0004 Epoch: 1 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:07,920-Speed 6321.86 samples/sec Loss 20.2367 LearningRate 0.0004 Epoch: 1 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:11,164-Speed 6315.06 samples/sec Loss 20.2681 LearningRate 0.0004 Epoch: 1 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:14,406-Speed 6318.43 samples/sec Loss 20.1889 LearningRate 0.0004 Epoch: 1 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:17,644-Speed 6325.37 samples/sec Loss 20.2015 LearningRate 0.0004 Epoch: 1 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:20,886-Speed 6320.02 samples/sec Loss 20.1961 LearningRate 0.0004 Epoch: 1 
Global Step: 32100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:39:24,123-Speed 6327.41 samples/sec Loss 20.2721 LearningRate 0.0004 Epoch: 1 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:39:27,368-Speed 6313.87 samples/sec Loss 20.1435 LearningRate 0.0004 Epoch: 1 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:39:30,611-Speed 6316.22 samples/sec Loss 20.1261 LearningRate 0.0004 Epoch: 1 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:39:33,852-Speed 6320.96 samples/sec Loss 20.2090 LearningRate 0.0004 Epoch: 1 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:39:37,079-Speed 6346.91 samples/sec Loss 20.1838 LearningRate 0.0004 Epoch: 1 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:40,317-Speed 6327.65 samples/sec Loss 20.0439 LearningRate 0.0004 Epoch: 1 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:43,559-Speed 6317.90 samples/sec Loss 20.1446 LearningRate 0.0004 Epoch: 1 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:46,796-Speed 6327.81 samples/sec Loss 20.1223 LearningRate 0.0004 Epoch: 1 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:50,035-Speed 6324.53 samples/sec Loss 20.1547 LearningRate 0.0004 Epoch: 1 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:53,276-Speed 6320.79 samples/sec Loss 20.1059 LearningRate 0.0004 Epoch: 1 Global Step: 32200 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:56,515-Speed 6323.72 samples/sec Loss 20.0434 LearningRate 0.0004 Epoch: 1 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:39:59,760-Speed 6311.80 samples/sec Loss 19.9792 LearningRate 0.0004 Epoch: 1 Global Step: 32220 Fp16 Grad Scale: 
65536 Required: 72 hours Training: 2022-03-31 18:40:03,005-Speed 6313.58 samples/sec Loss 20.0881 LearningRate 0.0004 Epoch: 1 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:06,250-Speed 6312.74 samples/sec Loss 20.0089 LearningRate 0.0004 Epoch: 1 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:09,493-Speed 6315.24 samples/sec Loss 20.0955 LearningRate 0.0004 Epoch: 1 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:40:12,734-Speed 6322.24 samples/sec Loss 20.0394 LearningRate 0.0004 Epoch: 1 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:40:15,964-Speed 6340.30 samples/sec Loss 20.1206 LearningRate 0.0004 Epoch: 1 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:19,203-Speed 6325.21 samples/sec Loss 20.0232 LearningRate 0.0004 Epoch: 1 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:22,447-Speed 6315.33 samples/sec Loss 20.0695 LearningRate 0.0004 Epoch: 1 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:25,689-Speed 6318.37 samples/sec Loss 19.8847 LearningRate 0.0004 Epoch: 1 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:28,937-Speed 6307.60 samples/sec Loss 19.9485 LearningRate 0.0004 Epoch: 1 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:32,177-Speed 6322.07 samples/sec Loss 20.0836 LearningRate 0.0004 Epoch: 1 Global Step: 32320 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:35,417-Speed 6322.35 samples/sec Loss 19.9678 LearningRate 0.0004 Epoch: 1 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:38,661-Speed 6313.99 samples/sec Loss 19.9886 LearningRate 0.0004 Epoch: 1 Global Step: 32340 Fp16 Grad Scale: 65536 Required: 72 hours Training: 
2022-03-31 18:40:41,902-Speed 6320.81 samples/sec Loss 19.9582 LearningRate 0.0004 Epoch: 1 Global Step: 32350 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:45,144-Speed 6318.18 samples/sec Loss 19.8366 LearningRate 0.0004 Epoch: 1 Global Step: 32360 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:40:48,391-Speed 6309.28 samples/sec Loss 19.9452 LearningRate 0.0004 Epoch: 1 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:40:51,635-Speed 6314.41 samples/sec Loss 19.9767 LearningRate 0.0004 Epoch: 1 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:40:54,883-Speed 6307.40 samples/sec Loss 19.8726 LearningRate 0.0004 Epoch: 1 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:40:58,127-Speed 6314.76 samples/sec Loss 19.7991 LearningRate 0.0004 Epoch: 1 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:01,370-Speed 6316.75 samples/sec Loss 19.9495 LearningRate 0.0004 Epoch: 1 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:04,617-Speed 6308.98 samples/sec Loss 19.8140 LearningRate 0.0004 Epoch: 1 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:07,858-Speed 6320.07 samples/sec Loss 19.7988 LearningRate 0.0004 Epoch: 1 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:11,101-Speed 6316.51 samples/sec Loss 19.7827 LearningRate 0.0004 Epoch: 1 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:14,345-Speed 6314.07 samples/sec Loss 19.8192 LearningRate 0.0004 Epoch: 1 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:17,594-Speed 6304.84 samples/sec Loss 19.8236 LearningRate 0.0004 Epoch: 1 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:20,822-Speed 
6345.49 samples/sec Loss 19.9228 LearningRate 0.0004 Epoch: 1 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:24,054-Speed 6338.44 samples/sec Loss 19.8604 LearningRate 0.0004 Epoch: 1 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:27,294-Speed 6322.91 samples/sec Loss 19.9140 LearningRate 0.0004 Epoch: 1 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:30,534-Speed 6322.29 samples/sec Loss 19.7422 LearningRate 0.0004 Epoch: 1 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:33,781-Speed 6308.69 samples/sec Loss 19.7510 LearningRate 0.0004 Epoch: 1 Global Step: 32510 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:37,023-Speed 6318.69 samples/sec Loss 19.8350 LearningRate 0.0004 Epoch: 1 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:40,267-Speed 6316.03 samples/sec Loss 19.7537 LearningRate 0.0004 Epoch: 1 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:43,533-Speed 6270.45 samples/sec Loss 19.8444 LearningRate 0.0004 Epoch: 1 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:46,778-Speed 6314.81 samples/sec Loss 19.7804 LearningRate 0.0004 Epoch: 1 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:50,021-Speed 6315.38 samples/sec Loss 19.8510 LearningRate 0.0004 Epoch: 1 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:53,263-Speed 6318.47 samples/sec Loss 19.6393 LearningRate 0.0004 Epoch: 1 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:41:56,505-Speed 6318.58 samples/sec Loss 19.7821 LearningRate 0.0004 Epoch: 1 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:41:59,749-Speed 6314.51 samples/sec Loss 19.7700 
LearningRate 0.0004 Epoch: 1 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:02,992-Speed 6316.72 samples/sec Loss 19.7439 LearningRate 0.0004 Epoch: 1 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:06,237-Speed 6313.00 samples/sec Loss 19.7188 LearningRate 0.0004 Epoch: 1 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:09,478-Speed 6319.73 samples/sec Loss 19.6724 LearningRate 0.0004 Epoch: 1 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:12,717-Speed 6324.40 samples/sec Loss 19.7940 LearningRate 0.0004 Epoch: 1 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:15,959-Speed 6319.03 samples/sec Loss 19.7342 LearningRate 0.0004 Epoch: 1 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:19,198-Speed 6323.75 samples/sec Loss 19.7159 LearningRate 0.0004 Epoch: 1 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:22,439-Speed 6319.82 samples/sec Loss 19.7068 LearningRate 0.0004 Epoch: 1 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:25,682-Speed 6317.99 samples/sec Loss 19.6807 LearningRate 0.0004 Epoch: 1 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:28,919-Speed 6328.20 samples/sec Loss 19.5713 LearningRate 0.0004 Epoch: 1 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:32,157-Speed 6326.34 samples/sec Loss 19.5091 LearningRate 0.0004 Epoch: 1 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:35,401-Speed 6315.94 samples/sec Loss 19.6216 LearningRate 0.0004 Epoch: 1 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:38,643-Speed 6317.43 samples/sec Loss 19.6473 LearningRate 0.0004 Epoch: 1 
Global Step: 32710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:41,883-Speed 6322.48 samples/sec Loss 19.7312 LearningRate 0.0004 Epoch: 1 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:45,126-Speed 6318.06 samples/sec Loss 19.5673 LearningRate 0.0004 Epoch: 1 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:48,370-Speed 6314.11 samples/sec Loss 19.6944 LearningRate 0.0004 Epoch: 1 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:51,611-Speed 6319.26 samples/sec Loss 19.5181 LearningRate 0.0004 Epoch: 1 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:54,849-Speed 6326.28 samples/sec Loss 19.6501 LearningRate 0.0004 Epoch: 1 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:42:58,090-Speed 6321.06 samples/sec Loss 19.4948 LearningRate 0.0004 Epoch: 1 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:01,316-Speed 6349.89 samples/sec Loss 19.5557 LearningRate 0.0004 Epoch: 1 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:04,574-Speed 6288.14 samples/sec Loss 19.4113 LearningRate 0.0004 Epoch: 1 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:07,815-Speed 6320.07 samples/sec Loss 19.6119 LearningRate 0.0004 Epoch: 1 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:11,056-Speed 6320.45 samples/sec Loss 19.5151 LearningRate 0.0004 Epoch: 1 Global Step: 32810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:14,298-Speed 6318.65 samples/sec Loss 19.6182 LearningRate 0.0004 Epoch: 1 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:17,546-Speed 6307.04 samples/sec Loss 19.4660 LearningRate 0.0004 Epoch: 1 Global Step: 32830 Fp16 Grad 
Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:20,786-Speed 6321.99 samples/sec Loss 19.4376 LearningRate 0.0004 Epoch: 1 Global Step: 32840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:24,031-Speed 6313.18 samples/sec Loss 19.4693 LearningRate 0.0004 Epoch: 1 Global Step: 32850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:27,280-Speed 6304.86 samples/sec Loss 19.4886 LearningRate 0.0004 Epoch: 1 Global Step: 32860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:30,520-Speed 6321.75 samples/sec Loss 19.4788 LearningRate 0.0004 Epoch: 1 Global Step: 32870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:33,750-Speed 6342.56 samples/sec Loss 19.3938 LearningRate 0.0004 Epoch: 1 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:36,990-Speed 6323.07 samples/sec Loss 19.4404 LearningRate 0.0004 Epoch: 1 Global Step: 32890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:40,232-Speed 6318.63 samples/sec Loss 19.4341 LearningRate 0.0004 Epoch: 1 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:43,471-Speed 6323.78 samples/sec Loss 19.3521 LearningRate 0.0004 Epoch: 1 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:46,715-Speed 6315.86 samples/sec Loss 19.4990 LearningRate 0.0004 Epoch: 1 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:49,957-Speed 6317.49 samples/sec Loss 19.3828 LearningRate 0.0004 Epoch: 1 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:53,202-Speed 6312.55 samples/sec Loss 19.3270 LearningRate 0.0004 Epoch: 1 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:43:56,443-Speed 6319.70 samples/sec Loss 19.3680 LearningRate 0.0004 Epoch: 1 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 72 
hours Training: 2022-03-31 18:43:59,697-Speed 6295.43 samples/sec Loss 19.5501 LearningRate 0.0004 Epoch: 1 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:02,940-Speed 6317.84 samples/sec Loss 19.4368 LearningRate 0.0004 Epoch: 1 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:06,164-Speed 6352.45 samples/sec Loss 19.2429 LearningRate 0.0004 Epoch: 1 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:09,407-Speed 6318.05 samples/sec Loss 19.3660 LearningRate 0.0004 Epoch: 1 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:12,649-Speed 6317.67 samples/sec Loss 19.3138 LearningRate 0.0004 Epoch: 1 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:15,892-Speed 6316.08 samples/sec Loss 19.3128 LearningRate 0.0004 Epoch: 1 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:19,138-Speed 6311.65 samples/sec Loss 19.2780 LearningRate 0.0004 Epoch: 1 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:22,382-Speed 6313.31 samples/sec Loss 19.2018 LearningRate 0.0004 Epoch: 1 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:25,629-Speed 6309.51 samples/sec Loss 19.3486 LearningRate 0.0004 Epoch: 1 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:28,875-Speed 6310.73 samples/sec Loss 19.2636 LearningRate 0.0004 Epoch: 1 Global Step: 33050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:32,115-Speed 6322.16 samples/sec Loss 19.3692 LearningRate 0.0004 Epoch: 1 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:35,359-Speed 6314.85 samples/sec Loss 19.2924 LearningRate 0.0004 Epoch: 1 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
18:44:38,589-Speed 6342.39 samples/sec Loss 19.2509 LearningRate 0.0004 Epoch: 1 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:41,830-Speed 6321.12 samples/sec Loss 19.2016 LearningRate 0.0004 Epoch: 1 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:45,072-Speed 6317.61 samples/sec Loss 19.2496 LearningRate 0.0004 Epoch: 1 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:48,317-Speed 6314.55 samples/sec Loss 19.2398 LearningRate 0.0004 Epoch: 1 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:51,558-Speed 6319.30 samples/sec Loss 19.2192 LearningRate 0.0004 Epoch: 1 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:54,805-Speed 6309.09 samples/sec Loss 19.1966 LearningRate 0.0004 Epoch: 1 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:44:58,048-Speed 6315.17 samples/sec Loss 19.0805 LearningRate 0.0004 Epoch: 1 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:01,289-Speed 6320.73 samples/sec Loss 19.2547 LearningRate 0.0004 Epoch: 1 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:04,537-Speed 6307.39 samples/sec Loss 19.2826 LearningRate 0.0004 Epoch: 1 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:07,776-Speed 6324.27 samples/sec Loss 19.1775 LearningRate 0.0004 Epoch: 1 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:11,004-Speed 6346.24 samples/sec Loss 19.1851 LearningRate 0.0004 Epoch: 1 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:14,243-Speed 6325.21 samples/sec Loss 19.1466 LearningRate 0.0004 Epoch: 1 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:17,490-Speed 6309.04 
samples/sec Loss 19.2453 LearningRate 0.0004 Epoch: 1 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:20,730-Speed 6321.78 samples/sec Loss 19.2172 LearningRate 0.0004 Epoch: 1 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:23,977-Speed 6309.04 samples/sec Loss 19.1382 LearningRate 0.0004 Epoch: 1 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:27,220-Speed 6315.08 samples/sec Loss 18.9792 LearningRate 0.0004 Epoch: 1 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:30,460-Speed 6322.81 samples/sec Loss 19.0594 LearningRate 0.0004 Epoch: 1 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:33,704-Speed 6315.24 samples/sec Loss 19.0986 LearningRate 0.0004 Epoch: 1 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:36,947-Speed 6316.91 samples/sec Loss 19.0532 LearningRate 0.0004 Epoch: 1 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:40,190-Speed 6316.89 samples/sec Loss 19.0440 LearningRate 0.0004 Epoch: 1 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:43,419-Speed 6342.72 samples/sec Loss 19.0594 LearningRate 0.0004 Epoch: 1 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:46,670-Speed 6301.36 samples/sec Loss 19.1254 LearningRate 0.0004 Epoch: 1 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:49,914-Speed 6316.15 samples/sec Loss 19.1526 LearningRate 0.0004 Epoch: 1 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:45:53,140-Speed 6348.45 samples/sec Loss 19.0445 LearningRate 0.0004 Epoch: 1 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:45:56,383-Speed 6317.87 samples/sec Loss 18.9645 
LearningRate 0.0004 Epoch: 1 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:45:59,623-Speed 6321.14 samples/sec Loss 19.1256 LearningRate 0.0004 Epoch: 1 Global Step: 33330 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:02,867-Speed 6315.46 samples/sec Loss 19.1211 LearningRate 0.0004 Epoch: 1 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:06,136-Speed 6265.59 samples/sec Loss 19.1349 LearningRate 0.0004 Epoch: 1 Global Step: 33350 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:09,372-Speed 6330.20 samples/sec Loss 18.9940 LearningRate 0.0004 Epoch: 1 Global Step: 33360 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:12,614-Speed 6319.01 samples/sec Loss 19.0369 LearningRate 0.0004 Epoch: 1 Global Step: 33370 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:15,855-Speed 6320.13 samples/sec Loss 19.0719 LearningRate 0.0004 Epoch: 1 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:19,100-Speed 6313.70 samples/sec Loss 19.0317 LearningRate 0.0004 Epoch: 1 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:22,344-Speed 6313.11 samples/sec Loss 18.9874 LearningRate 0.0004 Epoch: 1 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:46:25,591-Speed 6308.12 samples/sec Loss 18.8944 LearningRate 0.0004 Epoch: 1 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:28,836-Speed 6314.44 samples/sec Loss 18.9365 LearningRate 0.0004 Epoch: 1 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:32,107-Speed 6261.17 samples/sec Loss 18.9447 LearningRate 0.0004 Epoch: 1 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:35,349-Speed 6318.81 samples/sec Loss 19.0859 LearningRate 0.0004 Epoch: 1 Global 
Step: 33440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:38,589-Speed 6323.33 samples/sec Loss 19.0274 LearningRate 0.0004 Epoch: 1 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:41,853-Speed 6276.33 samples/sec Loss 18.8996 LearningRate 0.0004 Epoch: 1 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:45,104-Speed 6299.91 samples/sec Loss 19.0167 LearningRate 0.0004 Epoch: 1 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:48,352-Speed 6306.88 samples/sec Loss 18.9783 LearningRate 0.0004 Epoch: 1 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:51,593-Speed 6322.32 samples/sec Loss 18.9440 LearningRate 0.0004 Epoch: 1 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:54,836-Speed 6315.85 samples/sec Loss 18.8837 LearningRate 0.0004 Epoch: 1 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:46:58,063-Speed 6347.53 samples/sec Loss 19.0608 LearningRate 0.0004 Epoch: 1 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:01,307-Speed 6314.41 samples/sec Loss 18.9356 LearningRate 0.0004 Epoch: 1 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:04,555-Speed 6306.47 samples/sec Loss 18.8710 LearningRate 0.0004 Epoch: 1 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:07,801-Speed 6310.75 samples/sec Loss 18.8606 LearningRate 0.0004 Epoch: 1 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:11,046-Speed 6313.06 samples/sec Loss 18.8793 LearningRate 0.0004 Epoch: 1 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:14,297-Speed 6302.14 samples/sec Loss 18.8814 LearningRate 0.0004 Epoch: 1 Global Step: 33560 Fp16 Grad Scale: 
131072 Required: 72 hours Training: 2022-03-31 18:47:17,538-Speed 6319.17 samples/sec Loss 18.8381 LearningRate 0.0004 Epoch: 1 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:20,780-Speed 6319.72 samples/sec Loss 18.7848 LearningRate 0.0004 Epoch: 1 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:24,023-Speed 6315.79 samples/sec Loss 18.8710 LearningRate 0.0004 Epoch: 1 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:27,264-Speed 6320.16 samples/sec Loss 18.8824 LearningRate 0.0004 Epoch: 1 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:30,494-Speed 6341.96 samples/sec Loss 18.8459 LearningRate 0.0004 Epoch: 1 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:33,736-Speed 6317.78 samples/sec Loss 18.8300 LearningRate 0.0004 Epoch: 1 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:36,980-Speed 6316.13 samples/sec Loss 18.6526 LearningRate 0.0004 Epoch: 1 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:40,222-Speed 6318.18 samples/sec Loss 18.8294 LearningRate 0.0004 Epoch: 1 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:43,465-Speed 6316.13 samples/sec Loss 18.7315 LearningRate 0.0004 Epoch: 1 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:46,705-Speed 6321.99 samples/sec Loss 18.8635 LearningRate 0.0004 Epoch: 1 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:49,947-Speed 6319.21 samples/sec Loss 18.6865 LearningRate 0.0004 Epoch: 1 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:53,188-Speed 6320.69 samples/sec Loss 18.8391 LearningRate 0.0004 Epoch: 1 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 72 hours 
Training: 2022-03-31 18:47:56,428-Speed 6321.48 samples/sec Loss 18.7432 LearningRate 0.0004 Epoch: 1 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:47:59,672-Speed 6314.77 samples/sec Loss 18.6024 LearningRate 0.0004 Epoch: 1 Global Step: 33700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:02,911-Speed 6325.46 samples/sec Loss 18.6478 LearningRate 0.0004 Epoch: 1 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:06,156-Speed 6311.90 samples/sec Loss 18.7067 LearningRate 0.0004 Epoch: 1 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:09,409-Speed 6298.73 samples/sec Loss 18.6351 LearningRate 0.0004 Epoch: 1 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:12,657-Speed 6305.49 samples/sec Loss 18.7149 LearningRate 0.0004 Epoch: 1 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:15,900-Speed 6316.93 samples/sec Loss 18.6488 LearningRate 0.0004 Epoch: 1 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:19,143-Speed 6315.89 samples/sec Loss 18.6493 LearningRate 0.0004 Epoch: 1 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:22,394-Speed 6302.46 samples/sec Loss 18.7055 LearningRate 0.0004 Epoch: 1 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:25,636-Speed 6316.98 samples/sec Loss 18.8289 LearningRate 0.0004 Epoch: 1 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:28,889-Speed 6298.72 samples/sec Loss 18.7139 LearningRate 0.0004 Epoch: 1 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:32,128-Speed 6324.25 samples/sec Loss 18.6523 LearningRate 0.0004 Epoch: 1 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
18:48:35,371-Speed 6315.66 samples/sec Loss 18.5657 LearningRate 0.0004 Epoch: 1 Global Step: 33810 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 18:48:38,597-Speed 6349.43 samples/sec Loss 18.6023 LearningRate 0.0004 Epoch: 1 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:41,840-Speed 6315.95 samples/sec Loss 18.7981 LearningRate 0.0004 Epoch: 1 Global Step: 33830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:45,082-Speed 6319.52 samples/sec Loss 18.7091 LearningRate 0.0004 Epoch: 1 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:48,326-Speed 6314.15 samples/sec Loss 18.6747 LearningRate 0.0004 Epoch: 1 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:51,573-Speed 6309.39 samples/sec Loss 18.5936 LearningRate 0.0004 Epoch: 1 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:54,820-Speed 6308.63 samples/sec Loss 18.6207 LearningRate 0.0004 Epoch: 1 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:48:58,062-Speed 6318.22 samples/sec Loss 18.6858 LearningRate 0.0004 Epoch: 1 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:01,304-Speed 6319.73 samples/sec Loss 18.5949 LearningRate 0.0004 Epoch: 1 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:04,554-Speed 6303.06 samples/sec Loss 18.5147 LearningRate 0.0004 Epoch: 1 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:07,806-Speed 6297.89 samples/sec Loss 18.6482 LearningRate 0.0004 Epoch: 1 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:11,037-Speed 6340.97 samples/sec Loss 18.5967 LearningRate 0.0004 Epoch: 1 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:14,267-Speed 6341.65 
samples/sec Loss 18.4729 LearningRate 0.0004 Epoch: 1 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:17,505-Speed 6326.35 samples/sec Loss 18.6610 LearningRate 0.0004 Epoch: 1 Global Step: 33940 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:20,752-Speed 6310.08 samples/sec Loss 18.4745 LearningRate 0.0004 Epoch: 1 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:23,996-Speed 6312.89 samples/sec Loss 18.5303 LearningRate 0.0004 Epoch: 1 Global Step: 33960 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:27,239-Speed 6316.54 samples/sec Loss 18.4706 LearningRate 0.0004 Epoch: 1 Global Step: 33970 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:30,483-Speed 6315.20 samples/sec Loss 18.5367 LearningRate 0.0004 Epoch: 1 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:33,723-Speed 6322.01 samples/sec Loss 18.4608 LearningRate 0.0004 Epoch: 1 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:36,964-Speed 6319.71 samples/sec Loss 18.5055 LearningRate 0.0004 Epoch: 1 Global Step: 34000 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:40,208-Speed 6316.10 samples/sec Loss 18.5707 LearningRate 0.0004 Epoch: 1 Global Step: 34010 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:43,473-Speed 6273.03 samples/sec Loss 18.4950 LearningRate 0.0004 Epoch: 1 Global Step: 34020 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:49:46,719-Speed 6311.73 samples/sec Loss 18.5272 LearningRate 0.0004 Epoch: 1 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:49,961-Speed 6316.60 samples/sec Loss 18.5070 LearningRate 0.0004 Epoch: 1 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:53,204-Speed 6318.55 samples/sec Loss 18.4339 LearningRate 
0.0004 Epoch: 1 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:56,446-Speed 6318.18 samples/sec Loss 18.5980 LearningRate 0.0004 Epoch: 1 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:49:59,694-Speed 6305.63 samples/sec Loss 18.5181 LearningRate 0.0004 Epoch: 1 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:02,933-Speed 6323.72 samples/sec Loss 18.3910 LearningRate 0.0004 Epoch: 1 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:06,174-Speed 6320.91 samples/sec Loss 18.4305 LearningRate 0.0004 Epoch: 1 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:09,421-Speed 6310.23 samples/sec Loss 18.3994 LearningRate 0.0004 Epoch: 1 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:12,663-Speed 6317.93 samples/sec Loss 18.4369 LearningRate 0.0004 Epoch: 1 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:15,908-Speed 6313.46 samples/sec Loss 18.4713 LearningRate 0.0004 Epoch: 1 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:19,142-Speed 6334.66 samples/sec Loss 18.4270 LearningRate 0.0004 Epoch: 1 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:22,393-Speed 6300.51 samples/sec Loss 18.3102 LearningRate 0.0004 Epoch: 1 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:25,638-Speed 6312.46 samples/sec Loss 18.3533 LearningRate 0.0004 Epoch: 1 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:28,878-Speed 6323.20 samples/sec Loss 18.2641 LearningRate 0.0004 Epoch: 1 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:32,123-Speed 6311.47 samples/sec Loss 18.3099 LearningRate 0.0004 Epoch: 1 Global Step: 
34170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:35,366-Speed 6316.82 samples/sec Loss 18.4010 LearningRate 0.0004 Epoch: 1 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:38,611-Speed 6313.37 samples/sec Loss 18.3427 LearningRate 0.0004 Epoch: 1 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:41,851-Speed 6320.83 samples/sec Loss 18.2139 LearningRate 0.0004 Epoch: 1 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:45,091-Speed 6322.48 samples/sec Loss 18.3227 LearningRate 0.0004 Epoch: 1 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:48,335-Speed 6315.12 samples/sec Loss 18.3253 LearningRate 0.0004 Epoch: 1 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:51,568-Speed 6335.65 samples/sec Loss 18.3750 LearningRate 0.0004 Epoch: 1 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:54,813-Speed 6312.78 samples/sec Loss 18.3399 LearningRate 0.0004 Epoch: 1 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:50:58,054-Speed 6321.44 samples/sec Loss 18.2565 LearningRate 0.0004 Epoch: 1 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:01,299-Speed 6312.21 samples/sec Loss 18.2686 LearningRate 0.0004 Epoch: 1 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:04,543-Speed 6314.92 samples/sec Loss 18.3414 LearningRate 0.0004 Epoch: 1 Global Step: 34270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:07,788-Speed 6311.59 samples/sec Loss 18.2398 LearningRate 0.0004 Epoch: 1 Global Step: 34280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:11,031-Speed 6317.30 samples/sec Loss 18.2797 LearningRate 0.0004 Epoch: 1 Global Step: 34290 Fp16 Grad Scale: 131072 
Required: 72 hours Training: 2022-03-31 18:51:14,283-Speed 6300.27 samples/sec Loss 18.3400 LearningRate 0.0004 Epoch: 1 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:17,538-Speed 6293.56 samples/sec Loss 18.2612 LearningRate 0.0004 Epoch: 1 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:20,781-Speed 6316.75 samples/sec Loss 18.2808 LearningRate 0.0004 Epoch: 1 Global Step: 34320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:24,016-Speed 6331.61 samples/sec Loss 18.1810 LearningRate 0.0004 Epoch: 1 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:27,263-Speed 6308.97 samples/sec Loss 18.2738 LearningRate 0.0004 Epoch: 1 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:30,504-Speed 6319.46 samples/sec Loss 18.1361 LearningRate 0.0004 Epoch: 1 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:33,799-Speed 6217.98 samples/sec Loss 18.1623 LearningRate 0.0004 Epoch: 1 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:37,069-Speed 6264.91 samples/sec Loss 18.2954 LearningRate 0.0004 Epoch: 1 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:40,310-Speed 6318.59 samples/sec Loss 18.1763 LearningRate 0.0004 Epoch: 1 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:43,551-Speed 6321.02 samples/sec Loss 18.1405 LearningRate 0.0004 Epoch: 1 Global Step: 34390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:46,806-Speed 6294.37 samples/sec Loss 18.1007 LearningRate 0.0004 Epoch: 1 Global Step: 34400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:50,059-Speed 6296.60 samples/sec Loss 18.2066 LearningRate 0.0004 Epoch: 1 Global Step: 34410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 
2022-03-31 18:51:53,305-Speed 6310.03 samples/sec Loss 18.1955 LearningRate 0.0004 Epoch: 1 Global Step: 34420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:56,536-Speed 6340.39 samples/sec Loss 18.2075 LearningRate 0.0004 Epoch: 1 Global Step: 34430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:51:59,778-Speed 6318.99 samples/sec Loss 18.2044 LearningRate 0.0004 Epoch: 1 Global Step: 34440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:03,010-Speed 6337.44 samples/sec Loss 18.0781 LearningRate 0.0004 Epoch: 1 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:06,254-Speed 6315.00 samples/sec Loss 18.2068 LearningRate 0.0004 Epoch: 1 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:09,495-Speed 6320.86 samples/sec Loss 18.1497 LearningRate 0.0004 Epoch: 1 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:12,736-Speed 6319.28 samples/sec Loss 18.1458 LearningRate 0.0004 Epoch: 1 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:15,976-Speed 6323.00 samples/sec Loss 18.0492 LearningRate 0.0004 Epoch: 1 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:19,220-Speed 6313.50 samples/sec Loss 18.1516 LearningRate 0.0004 Epoch: 1 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:22,463-Speed 6317.96 samples/sec Loss 18.0556 LearningRate 0.0004 Epoch: 1 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:25,704-Speed 6321.72 samples/sec Loss 18.2070 LearningRate 0.0004 Epoch: 1 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:28,958-Speed 6293.95 samples/sec Loss 18.2078 LearningRate 0.0004 Epoch: 1 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:32,222-Speed 6275.43 
samples/sec Loss 18.1168 LearningRate 0.0004 Epoch: 1 Global Step: 34540 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:52:35,463-Speed 6322.41 samples/sec Loss 18.1386 LearningRate 0.0004 Epoch: 1 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:38,706-Speed 6315.02 samples/sec Loss 18.1103 LearningRate 0.0004 Epoch: 1 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:41,945-Speed 6325.26 samples/sec Loss 18.0573 LearningRate 0.0004 Epoch: 1 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:45,188-Speed 6316.12 samples/sec Loss 17.9587 LearningRate 0.0004 Epoch: 1 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:48,434-Speed 6310.89 samples/sec Loss 18.0664 LearningRate 0.0004 Epoch: 1 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:51,689-Speed 6293.45 samples/sec Loss 18.0697 LearningRate 0.0004 Epoch: 1 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:54,931-Speed 6318.04 samples/sec Loss 18.0408 LearningRate 0.0004 Epoch: 1 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:52:58,188-Speed 6289.29 samples/sec Loss 17.9592 LearningRate 0.0004 Epoch: 1 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:01,430-Speed 6317.89 samples/sec Loss 18.1110 LearningRate 0.0004 Epoch: 1 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:04,727-Speed 6215.77 samples/sec Loss 17.9998 LearningRate 0.0004 Epoch: 1 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:07,958-Speed 6341.28 samples/sec Loss 17.9846 LearningRate 0.0004 Epoch: 1 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:11,199-Speed 6319.21 samples/sec Loss 18.0372 
LearningRate 0.0004 Epoch: 1 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:14,440-Speed 6321.91 samples/sec Loss 18.0167 LearningRate 0.0004 Epoch: 1 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:17,683-Speed 6315.81 samples/sec Loss 17.9884 LearningRate 0.0004 Epoch: 1 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:20,923-Speed 6322.43 samples/sec Loss 17.9840 LearningRate 0.0004 Epoch: 1 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:24,171-Speed 6307.38 samples/sec Loss 17.9011 LearningRate 0.0004 Epoch: 1 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:27,413-Speed 6317.26 samples/sec Loss 17.9528 LearningRate 0.0004 Epoch: 1 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:30,661-Speed 6307.07 samples/sec Loss 17.9480 LearningRate 0.0004 Epoch: 1 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:33,954-Speed 6222.26 samples/sec Loss 17.8651 LearningRate 0.0004 Epoch: 1 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:37,194-Speed 6321.98 samples/sec Loss 18.0710 LearningRate 0.0004 Epoch: 1 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:40,423-Speed 6343.56 samples/sec Loss 17.9885 LearningRate 0.0004 Epoch: 1 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:43,666-Speed 6315.98 samples/sec Loss 17.8761 LearningRate 0.0004 Epoch: 1 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:46,912-Speed 6312.49 samples/sec Loss 17.8595 LearningRate 0.0004 Epoch: 1 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:50,154-Speed 6316.84 samples/sec Loss 17.9615 LearningRate 0.0004 Epoch: 1 
Global Step: 34780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:53,423-Speed 6266.48 samples/sec Loss 17.9729 LearningRate 0.0004 Epoch: 1 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:56,676-Speed 6297.89 samples/sec Loss 17.8621 LearningRate 0.0004 Epoch: 1 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:53:59,920-Speed 6313.82 samples/sec Loss 17.9359 LearningRate 0.0004 Epoch: 1 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:03,166-Speed 6311.83 samples/sec Loss 17.8854 LearningRate 0.0004 Epoch: 1 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:06,409-Speed 6315.49 samples/sec Loss 17.9692 LearningRate 0.0004 Epoch: 1 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:09,657-Speed 6307.37 samples/sec Loss 17.7673 LearningRate 0.0004 Epoch: 1 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:12,885-Speed 6346.03 samples/sec Loss 17.9019 LearningRate 0.0004 Epoch: 1 Global Step: 34850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:16,127-Speed 6317.72 samples/sec Loss 17.8409 LearningRate 0.0004 Epoch: 1 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:19,374-Speed 6308.60 samples/sec Loss 17.9548 LearningRate 0.0004 Epoch: 1 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:22,619-Speed 6314.13 samples/sec Loss 17.9720 LearningRate 0.0004 Epoch: 1 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:25,869-Speed 6302.78 samples/sec Loss 17.7426 LearningRate 0.0004 Epoch: 1 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:29,110-Speed 6319.43 samples/sec Loss 17.7417 LearningRate 0.0004 Epoch: 1 Global Step: 34900 Fp16 Grad 
Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:32,356-Speed 6310.50 samples/sec Loss 17.8243 LearningRate 0.0004 Epoch: 1 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:35,600-Speed 6316.10 samples/sec Loss 17.7169 LearningRate 0.0004 Epoch: 1 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:38,841-Speed 6320.38 samples/sec Loss 17.9081 LearningRate 0.0004 Epoch: 1 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:42,087-Speed 6310.89 samples/sec Loss 17.7726 LearningRate 0.0004 Epoch: 1 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:45,318-Speed 6338.94 samples/sec Loss 17.7302 LearningRate 0.0004 Epoch: 1 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:48,565-Speed 6310.08 samples/sec Loss 17.7989 LearningRate 0.0004 Epoch: 1 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:51,808-Speed 6316.10 samples/sec Loss 17.7560 LearningRate 0.0004 Epoch: 1 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:55,056-Speed 6306.86 samples/sec Loss 17.7690 LearningRate 0.0004 Epoch: 1 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:54:58,307-Speed 6301.03 samples/sec Loss 17.8024 LearningRate 0.0004 Epoch: 1 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:01,553-Speed 6310.58 samples/sec Loss 17.7853 LearningRate 0.0004 Epoch: 1 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:04,797-Speed 6315.08 samples/sec Loss 17.7775 LearningRate 0.0004 Epoch: 1 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:08,063-Speed 6271.86 samples/sec Loss 17.6931 LearningRate 0.0004 Epoch: 1 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 72 
hours Training: 2022-03-31 18:55:11,309-Speed 6310.18 samples/sec Loss 17.7063 LearningRate 0.0004 Epoch: 1 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:14,551-Speed 6318.15 samples/sec Loss 17.7351 LearningRate 0.0004 Epoch: 1 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:17,775-Speed 6353.82 samples/sec Loss 17.6156 LearningRate 0.0004 Epoch: 1 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:21,019-Speed 6314.40 samples/sec Loss 17.6810 LearningRate 0.0004 Epoch: 1 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:24,263-Speed 6315.53 samples/sec Loss 17.6295 LearningRate 0.0004 Epoch: 1 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:27,501-Speed 6325.15 samples/sec Loss 17.6392 LearningRate 0.0004 Epoch: 1 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:30,745-Speed 6315.57 samples/sec Loss 17.6057 LearningRate 0.0004 Epoch: 1 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:33,987-Speed 6317.04 samples/sec Loss 17.6216 LearningRate 0.0004 Epoch: 1 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:37,233-Speed 6312.07 samples/sec Loss 17.6023 LearningRate 0.0004 Epoch: 1 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:40,474-Speed 6320.50 samples/sec Loss 17.6785 LearningRate 0.0004 Epoch: 1 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:43,713-Speed 6325.05 samples/sec Loss 17.7297 LearningRate 0.0004 Epoch: 1 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:55:46,940-Speed 6348.81 samples/sec Loss 17.6421 LearningRate 0.0004 Epoch: 1 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 
18:55:50,186-Speed 6310.35 samples/sec Loss 17.6272 LearningRate 0.0004 Epoch: 1 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:55:53,430-Speed 6314.52 samples/sec Loss 17.5549 LearningRate 0.0004 Epoch: 1 Global Step: 35160 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:55:56,669-Speed 6323.32 samples/sec Loss 17.5206 LearningRate 0.0004 Epoch: 1 Global Step: 35170 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:55:59,922-Speed 6298.10 samples/sec Loss 17.6883 LearningRate 0.0004 Epoch: 1 Global Step: 35180 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:03,164-Speed 6317.42 samples/sec Loss 17.5597 LearningRate 0.0004 Epoch: 1 Global Step: 35190 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:06,408-Speed 6316.31 samples/sec Loss 17.5828 LearningRate 0.0004 Epoch: 1 Global Step: 35200 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:09,655-Speed 6307.06 samples/sec Loss 17.6273 LearningRate 0.0004 Epoch: 1 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:12,893-Speed 6327.27 samples/sec Loss 17.6323 LearningRate 0.0004 Epoch: 1 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:16,135-Speed 6318.31 samples/sec Loss 17.6150 LearningRate 0.0004 Epoch: 1 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:56:19,378-Speed 6315.85 samples/sec Loss 17.5930 LearningRate 0.0004 Epoch: 1 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:22,620-Speed 6318.41 samples/sec Loss 17.5682 LearningRate 0.0004 Epoch: 1 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:25,861-Speed 6320.37 samples/sec Loss 17.6096 LearningRate 0.0004 Epoch: 1 Global Step: 35260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:29,103-Speed 6320.50 samples/sec 
Loss 17.5710 LearningRate 0.0004 Epoch: 1 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:32,346-Speed 6316.15 samples/sec Loss 17.5274 LearningRate 0.0004 Epoch: 1 Global Step: 35280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:35,582-Speed 6329.12 samples/sec Loss 17.4784 LearningRate 0.0004 Epoch: 1 Global Step: 35290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:38,827-Speed 6312.46 samples/sec Loss 17.5449 LearningRate 0.0004 Epoch: 1 Global Step: 35300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:42,073-Speed 6311.71 samples/sec Loss 17.5768 LearningRate 0.0004 Epoch: 1 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:45,312-Speed 6325.37 samples/sec Loss 17.5715 LearningRate 0.0004 Epoch: 1 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:48,558-Speed 6309.25 samples/sec Loss 17.3976 LearningRate 0.0004 Epoch: 1 Global Step: 35330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:51,791-Speed 6336.61 samples/sec Loss 17.5245 LearningRate 0.0004 Epoch: 1 Global Step: 35340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:55,046-Speed 6292.89 samples/sec Loss 17.4057 LearningRate 0.0004 Epoch: 1 Global Step: 35350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:56:58,291-Speed 6314.60 samples/sec Loss 17.5623 LearningRate 0.0004 Epoch: 1 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:01,534-Speed 6316.67 samples/sec Loss 17.4608 LearningRate 0.0004 Epoch: 1 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:04,790-Speed 6290.04 samples/sec Loss 17.4793 LearningRate 0.0004 Epoch: 1 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:08,030-Speed 6322.95 samples/sec Loss 17.3181 LearningRate 
0.0004 Epoch: 1 Global Step: 35390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:11,272-Speed 6318.36 samples/sec Loss 17.4027 LearningRate 0.0004 Epoch: 1 Global Step: 35400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:14,515-Speed 6317.04 samples/sec Loss 17.4930 LearningRate 0.0004 Epoch: 1 Global Step: 35410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:17,755-Speed 6321.90 samples/sec Loss 17.5474 LearningRate 0.0004 Epoch: 1 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:21,011-Speed 6291.26 samples/sec Loss 17.4017 LearningRate 0.0004 Epoch: 1 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:24,250-Speed 6325.07 samples/sec Loss 17.4262 LearningRate 0.0004 Epoch: 1 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:27,492-Speed 6318.17 samples/sec Loss 17.3979 LearningRate 0.0004 Epoch: 1 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:30,732-Speed 6322.34 samples/sec Loss 17.4965 LearningRate 0.0004 Epoch: 1 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:33,987-Speed 6292.48 samples/sec Loss 17.4959 LearningRate 0.0004 Epoch: 1 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:37,236-Speed 6305.75 samples/sec Loss 17.5234 LearningRate 0.0004 Epoch: 1 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:40,483-Speed 6308.96 samples/sec Loss 17.3683 LearningRate 0.0004 Epoch: 1 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:43,729-Speed 6310.41 samples/sec Loss 17.3937 LearningRate 0.0004 Epoch: 1 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:46,972-Speed 6316.34 samples/sec Loss 17.3434 LearningRate 0.0004 Epoch: 1 Global Step: 
35510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:50,217-Speed 6313.36 samples/sec Loss 17.4206 LearningRate 0.0004 Epoch: 1 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:53,459-Speed 6318.36 samples/sec Loss 17.3841 LearningRate 0.0004 Epoch: 1 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:56,689-Speed 6341.30 samples/sec Loss 17.4190 LearningRate 0.0004 Epoch: 1 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:57:59,929-Speed 6323.82 samples/sec Loss 17.3118 LearningRate 0.0004 Epoch: 1 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:03,175-Speed 6309.06 samples/sec Loss 17.2816 LearningRate 0.0004 Epoch: 1 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:06,420-Speed 6313.49 samples/sec Loss 17.2902 LearningRate 0.0004 Epoch: 1 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:09,648-Speed 6345.61 samples/sec Loss 17.3927 LearningRate 0.0004 Epoch: 1 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:12,932-Speed 6237.78 samples/sec Loss 17.3411 LearningRate 0.0004 Epoch: 1 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:16,182-Speed 6303.04 samples/sec Loss 17.4139 LearningRate 0.0004 Epoch: 1 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:19,421-Speed 6323.12 samples/sec Loss 17.3484 LearningRate 0.0004 Epoch: 1 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:22,663-Speed 6320.11 samples/sec Loss 17.3929 LearningRate 0.0004 Epoch: 1 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:25,906-Speed 6315.92 samples/sec Loss 17.3863 LearningRate 0.0004 Epoch: 1 Global Step: 35630 Fp16 Grad Scale: 65536 
Required: 72 hours Training: 2022-03-31 18:58:29,148-Speed 6319.15 samples/sec Loss 17.3467 LearningRate 0.0004 Epoch: 1 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:32,394-Speed 6309.30 samples/sec Loss 17.2939 LearningRate 0.0004 Epoch: 1 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:35,635-Speed 6321.29 samples/sec Loss 17.3234 LearningRate 0.0004 Epoch: 1 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:38,879-Speed 6314.38 samples/sec Loss 17.3459 LearningRate 0.0004 Epoch: 1 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 18:58:42,120-Speed 6320.57 samples/sec Loss 17.3754 LearningRate 0.0004 Epoch: 1 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:45,363-Speed 6316.76 samples/sec Loss 17.2783 LearningRate 0.0004 Epoch: 1 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:48,607-Speed 6313.29 samples/sec Loss 17.2691 LearningRate 0.0004 Epoch: 1 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:51,853-Speed 6312.47 samples/sec Loss 17.1744 LearningRate 0.0004 Epoch: 1 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:55,096-Speed 6317.24 samples/sec Loss 17.3348 LearningRate 0.0004 Epoch: 1 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:58:58,336-Speed 6322.92 samples/sec Loss 17.2868 LearningRate 0.0004 Epoch: 1 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:01,576-Speed 6322.25 samples/sec Loss 17.1738 LearningRate 0.0004 Epoch: 1 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:04,818-Speed 6318.32 samples/sec Loss 17.3343 LearningRate 0.0004 Epoch: 1 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 
2022-03-31 18:59:08,059-Speed 6318.65 samples/sec Loss 17.3243 LearningRate 0.0004 Epoch: 1 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:11,303-Speed 6315.93 samples/sec Loss 17.1872 LearningRate 0.0004 Epoch: 1 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:14,530-Speed 6347.23 samples/sec Loss 17.2264 LearningRate 0.0004 Epoch: 1 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:17,777-Speed 6308.73 samples/sec Loss 17.3053 LearningRate 0.0004 Epoch: 1 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:21,020-Speed 6317.15 samples/sec Loss 17.2228 LearningRate 0.0004 Epoch: 1 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:24,262-Speed 6318.33 samples/sec Loss 17.1994 LearningRate 0.0004 Epoch: 1 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:27,506-Speed 6313.84 samples/sec Loss 17.1899 LearningRate 0.0004 Epoch: 1 Global Step: 35820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:30,751-Speed 6312.67 samples/sec Loss 17.1702 LearningRate 0.0004 Epoch: 1 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:33,992-Speed 6321.27 samples/sec Loss 17.1593 LearningRate 0.0004 Epoch: 1 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:37,233-Speed 6319.08 samples/sec Loss 17.1671 LearningRate 0.0004 Epoch: 1 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:40,478-Speed 6313.87 samples/sec Loss 17.2346 LearningRate 0.0004 Epoch: 1 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:43,717-Speed 6324.22 samples/sec Loss 17.2493 LearningRate 0.0004 Epoch: 1 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:46,950-Speed 
6335.42 samples/sec Loss 16.9820 LearningRate 0.0004 Epoch: 1 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:50,191-Speed 6321.89 samples/sec Loss 17.1149 LearningRate 0.0004 Epoch: 1 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:53,429-Speed 6324.63 samples/sec Loss 17.0903 LearningRate 0.0004 Epoch: 1 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:56,672-Speed 6317.58 samples/sec Loss 17.1467 LearningRate 0.0004 Epoch: 1 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 18:59:59,929-Speed 6290.15 samples/sec Loss 17.2152 LearningRate 0.0004 Epoch: 1 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:03,174-Speed 6312.05 samples/sec Loss 17.1519 LearningRate 0.0004 Epoch: 1 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:06,416-Speed 6318.30 samples/sec Loss 16.9659 LearningRate 0.0004 Epoch: 1 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:09,659-Speed 6317.54 samples/sec Loss 17.2252 LearningRate 0.0004 Epoch: 1 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:12,903-Speed 6314.83 samples/sec Loss 17.0459 LearningRate 0.0004 Epoch: 1 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:16,179-Speed 6253.29 samples/sec Loss 17.2073 LearningRate 0.0004 Epoch: 1 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:19,407-Speed 6345.05 samples/sec Loss 17.1588 LearningRate 0.0004 Epoch: 1 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:22,651-Speed 6316.03 samples/sec Loss 17.0969 LearningRate 0.0004 Epoch: 1 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:25,892-Speed 6318.79 samples/sec Loss 
17.1553 LearningRate 0.0004 Epoch: 1 Global Step: 36000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:29,141-Speed 6306.23 samples/sec Loss 17.0856 LearningRate 0.0004 Epoch: 1 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:32,391-Speed 6301.82 samples/sec Loss 17.0297 LearningRate 0.0004 Epoch: 1 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:35,635-Speed 6315.40 samples/sec Loss 16.9507 LearningRate 0.0004 Epoch: 1 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:38,878-Speed 6315.52 samples/sec Loss 17.0614 LearningRate 0.0004 Epoch: 1 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:42,122-Speed 6316.31 samples/sec Loss 17.0752 LearningRate 0.0004 Epoch: 1 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:45,373-Speed 6300.32 samples/sec Loss 17.1455 LearningRate 0.0004 Epoch: 1 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:48,616-Speed 6316.68 samples/sec Loss 17.0168 LearningRate 0.0004 Epoch: 1 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:51,846-Speed 6341.38 samples/sec Loss 16.9173 LearningRate 0.0004 Epoch: 1 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:55,095-Speed 6305.94 samples/sec Loss 17.0079 LearningRate 0.0004 Epoch: 1 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:00:58,340-Speed 6312.72 samples/sec Loss 17.0095 LearningRate 0.0004 Epoch: 1 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:01,583-Speed 6314.81 samples/sec Loss 17.0384 LearningRate 0.0004 Epoch: 1 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:04,829-Speed 6312.01 samples/sec Loss 16.9961 LearningRate 0.0004 
Epoch: 1 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:08,068-Speed 6324.15 samples/sec Loss 17.1112 LearningRate 0.0004 Epoch: 1 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:11,313-Speed 6314.31 samples/sec Loss 17.0753 LearningRate 0.0004 Epoch: 1 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:14,554-Speed 6319.30 samples/sec Loss 16.9672 LearningRate 0.0004 Epoch: 1 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:17,801-Speed 6307.86 samples/sec Loss 16.9824 LearningRate 0.0004 Epoch: 1 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:21,050-Speed 6305.24 samples/sec Loss 16.9683 LearningRate 0.0004 Epoch: 1 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:24,278-Speed 6346.98 samples/sec Loss 16.9125 LearningRate 0.0004 Epoch: 1 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:27,525-Speed 6309.23 samples/sec Loss 16.9542 LearningRate 0.0004 Epoch: 1 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:30,771-Speed 6310.19 samples/sec Loss 17.0489 LearningRate 0.0004 Epoch: 1 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:34,068-Speed 6213.03 samples/sec Loss 16.9176 LearningRate 0.0004 Epoch: 1 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:37,312-Speed 6314.46 samples/sec Loss 17.0363 LearningRate 0.0004 Epoch: 1 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:40,554-Speed 6318.67 samples/sec Loss 16.9784 LearningRate 0.0004 Epoch: 1 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:43,797-Speed 6315.13 samples/sec Loss 16.9487 LearningRate 0.0004 Epoch: 1 Global Step: 36240 
Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:47,045-Speed 6308.07 samples/sec Loss 16.8542 LearningRate 0.0004 Epoch: 1 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:50,293-Speed 6306.65 samples/sec Loss 16.9743 LearningRate 0.0004 Epoch: 1 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:53,539-Speed 6309.43 samples/sec Loss 16.9342 LearningRate 0.0004 Epoch: 1 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:01:56,780-Speed 6321.81 samples/sec Loss 16.8458 LearningRate 0.0004 Epoch: 1 Global Step: 36280 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 19:02:00,011-Speed 6340.28 samples/sec Loss 16.8486 LearningRate 0.0004 Epoch: 1 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:03,258-Speed 6308.28 samples/sec Loss 16.9359 LearningRate 0.0004 Epoch: 1 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:06,489-Speed 6339.99 samples/sec Loss 16.9274 LearningRate 0.0004 Epoch: 1 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:09,726-Speed 6328.25 samples/sec Loss 16.8926 LearningRate 0.0004 Epoch: 1 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:12,972-Speed 6310.53 samples/sec Loss 16.8877 LearningRate 0.0004 Epoch: 1 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:16,216-Speed 6315.17 samples/sec Loss 16.8278 LearningRate 0.0004 Epoch: 1 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:19,457-Speed 6320.23 samples/sec Loss 16.8569 LearningRate 0.0004 Epoch: 1 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:22,708-Speed 6302.43 samples/sec Loss 16.9142 LearningRate 0.0004 Epoch: 1 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 72 
hours Training: 2022-03-31 19:02:25,954-Speed 6309.14 samples/sec Loss 16.8567 LearningRate 0.0004 Epoch: 1 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:29,198-Speed 6315.81 samples/sec Loss 16.8095 LearningRate 0.0004 Epoch: 1 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:32,439-Speed 6320.60 samples/sec Loss 16.8117 LearningRate 0.0004 Epoch: 1 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:35,709-Speed 6263.34 samples/sec Loss 16.7862 LearningRate 0.0004 Epoch: 1 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:02:38,964-Speed 6293.48 samples/sec Loss 16.8052 LearningRate 0.0004 Epoch: 1 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:42,206-Speed 6318.30 samples/sec Loss 16.7696 LearningRate 0.0004 Epoch: 1 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:45,452-Speed 6310.94 samples/sec Loss 16.8276 LearningRate 0.0004 Epoch: 1 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:48,695-Speed 6317.71 samples/sec Loss 16.8134 LearningRate 0.0004 Epoch: 1 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:51,936-Speed 6319.14 samples/sec Loss 16.9039 LearningRate 0.0004 Epoch: 1 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:55,181-Speed 6314.11 samples/sec Loss 16.7957 LearningRate 0.0004 Epoch: 1 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:02:58,422-Speed 6319.73 samples/sec Loss 16.6973 LearningRate 0.0004 Epoch: 1 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:01,664-Speed 6318.53 samples/sec Loss 16.8050 LearningRate 0.0004 Epoch: 1 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:03:04,910-Speed 6310.89 samples/sec Loss 16.8043 LearningRate 0.0004 Epoch: 1 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:08,159-Speed 6304.84 samples/sec Loss 16.6832 LearningRate 0.0004 Epoch: 1 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:11,390-Speed 6339.89 samples/sec Loss 16.7630 LearningRate 0.0004 Epoch: 1 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:14,636-Speed 6309.87 samples/sec Loss 16.8629 LearningRate 0.0004 Epoch: 1 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:17,884-Speed 6307.88 samples/sec Loss 16.8113 LearningRate 0.0004 Epoch: 1 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:21,131-Speed 6309.55 samples/sec Loss 16.6916 LearningRate 0.0004 Epoch: 1 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:24,373-Speed 6318.73 samples/sec Loss 16.7332 LearningRate 0.0004 Epoch: 1 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:27,614-Speed 6319.18 samples/sec Loss 16.7684 LearningRate 0.0004 Epoch: 1 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:30,858-Speed 6314.62 samples/sec Loss 16.7538 LearningRate 0.0004 Epoch: 1 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:34,100-Speed 6318.68 samples/sec Loss 16.6841 LearningRate 0.0004 Epoch: 1 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:37,342-Speed 6319.33 samples/sec Loss 16.8292 LearningRate 0.0004 Epoch: 1 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:40,586-Speed 6316.80 samples/sec Loss 16.6362 LearningRate 0.0004 Epoch: 1 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:43,818-Speed 6336.75 
samples/sec Loss 16.6169 LearningRate 0.0004 Epoch: 1 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:47,066-Speed 6308.31 samples/sec Loss 16.6733 LearningRate 0.0004 Epoch: 1 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:50,309-Speed 6315.07 samples/sec Loss 16.5634 LearningRate 0.0004 Epoch: 1 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:53,549-Speed 6321.80 samples/sec Loss 16.6418 LearningRate 0.0004 Epoch: 1 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:03:56,793-Speed 6315.84 samples/sec Loss 16.6430 LearningRate 0.0004 Epoch: 1 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:00,036-Speed 6316.97 samples/sec Loss 16.8327 LearningRate 0.0004 Epoch: 1 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:03,283-Speed 6307.30 samples/sec Loss 16.6992 LearningRate 0.0004 Epoch: 1 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:06,525-Speed 6320.06 samples/sec Loss 16.6524 LearningRate 0.0004 Epoch: 1 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:09,770-Speed 6310.79 samples/sec Loss 16.5973 LearningRate 0.0004 Epoch: 1 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:13,013-Speed 6317.06 samples/sec Loss 16.6753 LearningRate 0.0004 Epoch: 1 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:16,242-Speed 6344.16 samples/sec Loss 16.7449 LearningRate 0.0004 Epoch: 1 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:19,486-Speed 6314.55 samples/sec Loss 16.6270 LearningRate 0.0004 Epoch: 1 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:22,729-Speed 6317.57 samples/sec Loss 16.6475 
LearningRate 0.0004 Epoch: 1 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:25,973-Speed 6315.63 samples/sec Loss 16.5692 LearningRate 0.0004 Epoch: 1 Global Step: 36740 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:29,218-Speed 6312.54 samples/sec Loss 16.5827 LearningRate 0.0004 Epoch: 1 Global Step: 36750 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:32,461-Speed 6316.90 samples/sec Loss 16.6250 LearningRate 0.0004 Epoch: 1 Global Step: 36760 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:35,704-Speed 6316.36 samples/sec Loss 16.5981 LearningRate 0.0004 Epoch: 1 Global Step: 36770 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:38,946-Speed 6317.24 samples/sec Loss 16.6788 LearningRate 0.0004 Epoch: 1 Global Step: 36780 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:42,196-Speed 6303.76 samples/sec Loss 16.6896 LearningRate 0.0004 Epoch: 1 Global Step: 36790 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:45,436-Speed 6321.09 samples/sec Loss 16.6556 LearningRate 0.0004 Epoch: 1 Global Step: 36800 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:04:48,679-Speed 6318.32 samples/sec Loss 16.5433 LearningRate 0.0004 Epoch: 1 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:51,925-Speed 6309.60 samples/sec Loss 16.6648 LearningRate 0.0004 Epoch: 1 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:55,168-Speed 6316.85 samples/sec Loss 16.6175 LearningRate 0.0004 Epoch: 1 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:04:58,425-Speed 6289.82 samples/sec Loss 16.6552 LearningRate 0.0004 Epoch: 1 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:01,667-Speed 6318.44 samples/sec Loss 16.4766 LearningRate 0.0004 Epoch: 1 Global 
Step: 36850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:04,921-Speed 6295.43 samples/sec Loss 16.6869 LearningRate 0.0004 Epoch: 1 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:08,164-Speed 6315.06 samples/sec Loss 16.5540 LearningRate 0.0004 Epoch: 1 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:11,406-Speed 6318.53 samples/sec Loss 16.5571 LearningRate 0.0004 Epoch: 1 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:14,650-Speed 6314.07 samples/sec Loss 16.4721 LearningRate 0.0004 Epoch: 1 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:17,897-Speed 6310.33 samples/sec Loss 16.4870 LearningRate 0.0004 Epoch: 1 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:21,127-Speed 6341.97 samples/sec Loss 16.5772 LearningRate 0.0004 Epoch: 1 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:24,366-Speed 6323.03 samples/sec Loss 16.5614 LearningRate 0.0004 Epoch: 1 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:27,612-Speed 6313.03 samples/sec Loss 16.5744 LearningRate 0.0004 Epoch: 1 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:30,856-Speed 6313.08 samples/sec Loss 16.5674 LearningRate 0.0004 Epoch: 1 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:34,099-Speed 6316.95 samples/sec Loss 16.5223 LearningRate 0.0004 Epoch: 1 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:37,342-Speed 6316.65 samples/sec Loss 16.5039 LearningRate 0.0004 Epoch: 1 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:40,585-Speed 6317.42 samples/sec Loss 16.4792 LearningRate 0.0004 Epoch: 1 Global Step: 36970 Fp16 Grad Scale: 
131072 Required: 72 hours Training: 2022-03-31 19:05:43,827-Speed 6318.26 samples/sec Loss 16.5129 LearningRate 0.0004 Epoch: 1 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:47,070-Speed 6317.23 samples/sec Loss 16.4248 LearningRate 0.0004 Epoch: 1 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:50,312-Speed 6317.44 samples/sec Loss 16.4742 LearningRate 0.0004 Epoch: 1 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:53,544-Speed 6338.94 samples/sec Loss 16.4126 LearningRate 0.0004 Epoch: 1 Global Step: 37010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:05:56,784-Speed 6321.52 samples/sec Loss 16.4352 LearningRate 0.0004 Epoch: 1 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:00,024-Speed 6322.30 samples/sec Loss 16.4461 LearningRate 0.0004 Epoch: 1 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:03,270-Speed 6310.61 samples/sec Loss 16.3669 LearningRate 0.0004 Epoch: 1 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:06,514-Speed 6315.01 samples/sec Loss 16.4431 LearningRate 0.0004 Epoch: 1 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:09,766-Speed 6298.80 samples/sec Loss 16.4447 LearningRate 0.0004 Epoch: 1 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:13,009-Speed 6316.95 samples/sec Loss 16.4404 LearningRate 0.0004 Epoch: 1 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:16,257-Speed 6307.45 samples/sec Loss 16.4829 LearningRate 0.0004 Epoch: 1 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:19,500-Speed 6315.40 samples/sec Loss 16.3249 LearningRate 0.0004 Epoch: 1 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 72 hours 
Training: 2022-03-31 19:06:22,740-Speed 6321.71 samples/sec Loss 16.4650 LearningRate 0.0004 Epoch: 1 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:25,971-Speed 6340.44 samples/sec Loss 16.5235 LearningRate 0.0004 Epoch: 1 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:29,216-Speed 6313.26 samples/sec Loss 16.4814 LearningRate 0.0004 Epoch: 1 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:32,460-Speed 6315.66 samples/sec Loss 16.4756 LearningRate 0.0004 Epoch: 1 Global Step: 37130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:35,701-Speed 6321.05 samples/sec Loss 16.4051 LearningRate 0.0004 Epoch: 1 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:38,946-Speed 6311.52 samples/sec Loss 16.3869 LearningRate 0.0004 Epoch: 1 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:42,190-Speed 6313.88 samples/sec Loss 16.4139 LearningRate 0.0004 Epoch: 1 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:45,430-Speed 6324.08 samples/sec Loss 16.3910 LearningRate 0.0004 Epoch: 1 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:48,671-Speed 6320.51 samples/sec Loss 16.4420 LearningRate 0.0004 Epoch: 1 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:51,912-Speed 6319.00 samples/sec Loss 16.4732 LearningRate 0.0004 Epoch: 1 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:55,151-Speed 6325.20 samples/sec Loss 16.3770 LearningRate 0.0004 Epoch: 1 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:06:58,379-Speed 6346.14 samples/sec Loss 16.4087 LearningRate 0.0004 Epoch: 1 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:07:01,621-Speed 6317.59 samples/sec Loss 16.3441 LearningRate 0.0004 Epoch: 1 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:04,885-Speed 6277.17 samples/sec Loss 16.3601 LearningRate 0.0004 Epoch: 1 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:08,125-Speed 6322.64 samples/sec Loss 16.3397 LearningRate 0.0004 Epoch: 1 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:11,384-Speed 6284.24 samples/sec Loss 16.3230 LearningRate 0.0004 Epoch: 1 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:14,629-Speed 6313.10 samples/sec Loss 16.3091 LearningRate 0.0004 Epoch: 1 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:17,874-Speed 6312.34 samples/sec Loss 16.3042 LearningRate 0.0004 Epoch: 1 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:21,116-Speed 6318.17 samples/sec Loss 16.2946 LearningRate 0.0004 Epoch: 1 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:24,356-Speed 6322.69 samples/sec Loss 16.3577 LearningRate 0.0004 Epoch: 1 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:27,604-Speed 6307.36 samples/sec Loss 16.1693 LearningRate 0.0004 Epoch: 1 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:30,832-Speed 6344.83 samples/sec Loss 16.3109 LearningRate 0.0004 Epoch: 1 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:34,077-Speed 6313.63 samples/sec Loss 16.2636 LearningRate 0.0004 Epoch: 1 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:37,337-Speed 6284.69 samples/sec Loss 16.2819 LearningRate 0.0004 Epoch: 1 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:40,657-Speed 6169.93 
samples/sec Loss 16.2813 LearningRate 0.0005 Epoch: 1 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:43,897-Speed 6321.39 samples/sec Loss 16.3283 LearningRate 0.0005 Epoch: 1 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:47,140-Speed 6317.32 samples/sec Loss 16.2366 LearningRate 0.0005 Epoch: 1 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:50,384-Speed 6315.33 samples/sec Loss 16.3232 LearningRate 0.0005 Epoch: 1 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:53,629-Speed 6311.04 samples/sec Loss 16.3677 LearningRate 0.0005 Epoch: 1 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:07:56,870-Speed 6322.20 samples/sec Loss 16.4055 LearningRate 0.0005 Epoch: 1 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:00,122-Speed 6298.93 samples/sec Loss 16.2550 LearningRate 0.0005 Epoch: 1 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:03,394-Speed 6259.86 samples/sec Loss 16.1974 LearningRate 0.0005 Epoch: 1 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:06,648-Speed 6296.13 samples/sec Loss 16.1901 LearningRate 0.0005 Epoch: 1 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:09,888-Speed 6321.52 samples/sec Loss 16.2563 LearningRate 0.0005 Epoch: 1 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:13,130-Speed 6318.40 samples/sec Loss 16.2341 LearningRate 0.0005 Epoch: 1 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:16,370-Speed 6323.32 samples/sec Loss 16.2951 LearningRate 0.0005 Epoch: 1 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:19,619-Speed 6304.04 samples/sec Loss 16.2849 
LearningRate 0.0005 Epoch: 1 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:22,865-Speed 6311.13 samples/sec Loss 16.2731 LearningRate 0.0005 Epoch: 1 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:26,112-Speed 6308.05 samples/sec Loss 16.1757 LearningRate 0.0005 Epoch: 1 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:29,356-Speed 6314.02 samples/sec Loss 16.1978 LearningRate 0.0005 Epoch: 1 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:32,600-Speed 6316.06 samples/sec Loss 16.1935 LearningRate 0.0005 Epoch: 1 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:35,831-Speed 6338.78 samples/sec Loss 16.2103 LearningRate 0.0005 Epoch: 1 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:39,077-Speed 6311.49 samples/sec Loss 16.0818 LearningRate 0.0005 Epoch: 1 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:42,318-Speed 6320.25 samples/sec Loss 16.1266 LearningRate 0.0005 Epoch: 1 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:45,561-Speed 6318.08 samples/sec Loss 16.3364 LearningRate 0.0005 Epoch: 1 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:48,806-Speed 6312.10 samples/sec Loss 16.2895 LearningRate 0.0005 Epoch: 1 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:52,053-Speed 6308.38 samples/sec Loss 16.1886 LearningRate 0.0005 Epoch: 1 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:55,303-Speed 6303.38 samples/sec Loss 16.1468 LearningRate 0.0005 Epoch: 1 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:08:58,545-Speed 6318.44 samples/sec Loss 16.1254 LearningRate 0.0005 Epoch: 1 
Global Step: 37580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:01,790-Speed 6313.02 samples/sec Loss 16.0723 LearningRate 0.0005 Epoch: 1 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:05,099-Speed 6193.38 samples/sec Loss 16.0974 LearningRate 0.0005 Epoch: 1 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:08,330-Speed 6340.36 samples/sec Loss 16.0960 LearningRate 0.0005 Epoch: 1 Global Step: 37610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:11,578-Speed 6305.47 samples/sec Loss 16.1029 LearningRate 0.0005 Epoch: 1 Global Step: 37620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:14,827-Speed 6305.21 samples/sec Loss 16.1379 LearningRate 0.0005 Epoch: 1 Global Step: 37630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:18,071-Speed 6314.75 samples/sec Loss 15.9914 LearningRate 0.0005 Epoch: 1 Global Step: 37640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:21,313-Speed 6317.87 samples/sec Loss 16.0618 LearningRate 0.0005 Epoch: 1 Global Step: 37650 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:24,557-Speed 6315.31 samples/sec Loss 16.0592 LearningRate 0.0005 Epoch: 1 Global Step: 37660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:27,799-Speed 6318.11 samples/sec Loss 16.1058 LearningRate 0.0005 Epoch: 1 Global Step: 37670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:31,044-Speed 6312.26 samples/sec Loss 16.0630 LearningRate 0.0005 Epoch: 1 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:34,286-Speed 6318.34 samples/sec Loss 16.0612 LearningRate 0.0005 Epoch: 1 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:37,532-Speed 6312.02 samples/sec Loss 16.0529 LearningRate 0.0005 Epoch: 1 Global Step: 37700 Fp16 Grad 
Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:40,758-Speed 6348.55 samples/sec Loss 16.0135 LearningRate 0.0005 Epoch: 1 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:44,004-Speed 6311.78 samples/sec Loss 16.0334 LearningRate 0.0005 Epoch: 1 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:47,244-Speed 6323.71 samples/sec Loss 15.9963 LearningRate 0.0005 Epoch: 1 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:50,485-Speed 6320.22 samples/sec Loss 16.1200 LearningRate 0.0005 Epoch: 1 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:53,729-Speed 6314.14 samples/sec Loss 16.0389 LearningRate 0.0005 Epoch: 1 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:09:56,975-Speed 6311.09 samples/sec Loss 16.0829 LearningRate 0.0005 Epoch: 1 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:00,220-Speed 6312.28 samples/sec Loss 16.1194 LearningRate 0.0005 Epoch: 1 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:03,465-Speed 6312.14 samples/sec Loss 16.1184 LearningRate 0.0005 Epoch: 1 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:06,707-Speed 6319.59 samples/sec Loss 16.0732 LearningRate 0.0005 Epoch: 1 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:09,953-Speed 6309.92 samples/sec Loss 16.0486 LearningRate 0.0005 Epoch: 1 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:13,197-Speed 6315.37 samples/sec Loss 16.0587 LearningRate 0.0005 Epoch: 1 Global Step: 37810 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 19:10:16,425-Speed 6344.54 samples/sec Loss 16.1118 LearningRate 0.0005 Epoch: 1 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 72 
hours Training: 2022-03-31 19:10:19,669-Speed 6314.54 samples/sec Loss 16.0572 LearningRate 0.0005 Epoch: 1 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:22,912-Speed 6316.58 samples/sec Loss 15.9864 LearningRate 0.0005 Epoch: 1 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:26,216-Speed 6200.00 samples/sec Loss 16.1135 LearningRate 0.0005 Epoch: 1 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:10:29,445-Speed 6343.93 samples/sec Loss 15.9678 LearningRate 0.0005 Epoch: 1 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:32,686-Speed 6321.24 samples/sec Loss 16.0896 LearningRate 0.0005 Epoch: 1 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:35,941-Speed 6293.47 samples/sec Loss 16.0096 LearningRate 0.0005 Epoch: 1 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:39,182-Speed 6319.43 samples/sec Loss 16.0475 LearningRate 0.0005 Epoch: 1 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:42,428-Speed 6310.53 samples/sec Loss 15.9643 LearningRate 0.0005 Epoch: 1 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:45,670-Speed 6319.10 samples/sec Loss 16.0597 LearningRate 0.0005 Epoch: 1 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:48,916-Speed 6310.93 samples/sec Loss 15.9754 LearningRate 0.0005 Epoch: 1 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:52,160-Speed 6314.35 samples/sec Loss 15.9225 LearningRate 0.0005 Epoch: 1 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:10:55,403-Speed 6317.72 samples/sec Loss 15.9809 LearningRate 0.0005 Epoch: 1 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 
19:10:58,647-Speed 6314.31 samples/sec Loss 15.9772 LearningRate 0.0005 Epoch: 1 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:01,887-Speed 6323.15 samples/sec Loss 15.9385 LearningRate 0.0005 Epoch: 1 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:05,132-Speed 6312.52 samples/sec Loss 16.0238 LearningRate 0.0005 Epoch: 1 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:08,378-Speed 6309.76 samples/sec Loss 15.9262 LearningRate 0.0005 Epoch: 1 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:11,618-Speed 6322.48 samples/sec Loss 15.9167 LearningRate 0.0005 Epoch: 1 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:14,859-Speed 6320.85 samples/sec Loss 15.9147 LearningRate 0.0005 Epoch: 1 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:18,161-Speed 6203.18 samples/sec Loss 15.8326 LearningRate 0.0005 Epoch: 1 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:21,403-Speed 6319.26 samples/sec Loss 16.0097 LearningRate 0.0005 Epoch: 1 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:24,679-Speed 6252.06 samples/sec Loss 15.8812 LearningRate 0.0005 Epoch: 1 Global Step: 38030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:27,943-Speed 6276.53 samples/sec Loss 15.9215 LearningRate 0.0005 Epoch: 1 Global Step: 38040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:11:31,173-Speed 6340.73 samples/sec Loss 15.8393 LearningRate 0.0005 Epoch: 1 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:34,414-Speed 6322.39 samples/sec Loss 15.7977 LearningRate 0.0005 Epoch: 1 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:37,660-Speed 6310.37 
samples/sec Loss 15.9022 LearningRate 0.0005 Epoch: 1 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:40,906-Speed 6309.51 samples/sec Loss 15.8522 LearningRate 0.0005 Epoch: 1 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:44,150-Speed 6314.74 samples/sec Loss 15.9805 LearningRate 0.0005 Epoch: 1 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:47,393-Speed 6317.00 samples/sec Loss 15.8196 LearningRate 0.0005 Epoch: 1 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:50,637-Speed 6315.31 samples/sec Loss 15.9419 LearningRate 0.0005 Epoch: 1 Global Step: 38110 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:53,879-Speed 6317.04 samples/sec Loss 15.9068 LearningRate 0.0005 Epoch: 1 Global Step: 38120 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:11:57,122-Speed 6317.31 samples/sec Loss 15.9740 LearningRate 0.0005 Epoch: 1 Global Step: 38130 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:12:00,366-Speed 6315.01 samples/sec Loss 15.8411 LearningRate 0.0005 Epoch: 1 Global Step: 38140 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:12:03,614-Speed 6308.15 samples/sec Loss 15.9197 LearningRate 0.0005 Epoch: 1 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:06,859-Speed 6311.26 samples/sec Loss 15.8467 LearningRate 0.0005 Epoch: 1 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:10,104-Speed 6313.96 samples/sec Loss 15.8026 LearningRate 0.0005 Epoch: 1 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:13,348-Speed 6314.57 samples/sec Loss 15.8690 LearningRate 0.0005 Epoch: 1 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:16,594-Speed 6310.51 samples/sec Loss 15.8321 LearningRate 
0.0005 Epoch: 1 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:19,839-Speed 6311.84 samples/sec Loss 15.7889 LearningRate 0.0005 Epoch: 1 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:23,083-Speed 6314.81 samples/sec Loss 15.8465 LearningRate 0.0005 Epoch: 1 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:26,326-Speed 6316.28 samples/sec Loss 15.8961 LearningRate 0.0005 Epoch: 1 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:29,569-Speed 6316.87 samples/sec Loss 15.6586 LearningRate 0.0005 Epoch: 1 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:32,811-Speed 6317.80 samples/sec Loss 15.7431 LearningRate 0.0005 Epoch: 1 Global Step: 38240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:36,040-Speed 6343.75 samples/sec Loss 15.8614 LearningRate 0.0005 Epoch: 1 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:39,288-Speed 6309.07 samples/sec Loss 15.8534 LearningRate 0.0005 Epoch: 1 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:42,530-Speed 6316.84 samples/sec Loss 15.7008 LearningRate 0.0005 Epoch: 1 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:45,773-Speed 6316.88 samples/sec Loss 15.8268 LearningRate 0.0005 Epoch: 1 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:49,014-Speed 6321.09 samples/sec Loss 15.6704 LearningRate 0.0005 Epoch: 1 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:52,259-Speed 6312.23 samples/sec Loss 15.7434 LearningRate 0.0005 Epoch: 1 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:55,503-Speed 6314.32 samples/sec Loss 15.8452 LearningRate 0.0005 Epoch: 1 Global Step: 
38310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:12:58,748-Speed 6312.21 samples/sec Loss 15.7257 LearningRate 0.0005 Epoch: 1 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:01,993-Speed 6313.91 samples/sec Loss 15.7759 LearningRate 0.0005 Epoch: 1 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:05,240-Speed 6308.59 samples/sec Loss 15.7719 LearningRate 0.0005 Epoch: 1 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:08,472-Speed 6338.65 samples/sec Loss 15.7177 LearningRate 0.0005 Epoch: 1 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:11,719-Speed 6309.76 samples/sec Loss 15.7603 LearningRate 0.0005 Epoch: 1 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:14,963-Speed 6313.20 samples/sec Loss 15.7394 LearningRate 0.0005 Epoch: 1 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:18,207-Speed 6315.82 samples/sec Loss 15.7181 LearningRate 0.0005 Epoch: 1 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:21,449-Speed 6318.56 samples/sec Loss 15.6987 LearningRate 0.0005 Epoch: 1 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:24,692-Speed 6316.33 samples/sec Loss 15.7970 LearningRate 0.0005 Epoch: 1 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:27,940-Speed 6306.05 samples/sec Loss 15.7075 LearningRate 0.0005 Epoch: 1 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:31,185-Speed 6313.21 samples/sec Loss 15.6234 LearningRate 0.0005 Epoch: 1 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:34,433-Speed 6306.39 samples/sec Loss 15.7653 LearningRate 0.0005 Epoch: 1 Global Step: 38430 Fp16 Grad Scale: 131072 
Required: 72 hours Training: 2022-03-31 19:13:37,684-Speed 6302.06 samples/sec Loss 15.6054 LearningRate 0.0005 Epoch: 1 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:40,912-Speed 6344.52 samples/sec Loss 15.6412 LearningRate 0.0005 Epoch: 1 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:44,157-Speed 6313.42 samples/sec Loss 15.6569 LearningRate 0.0005 Epoch: 1 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:47,400-Speed 6315.60 samples/sec Loss 15.6381 LearningRate 0.0005 Epoch: 1 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:50,643-Speed 6316.94 samples/sec Loss 15.7547 LearningRate 0.0005 Epoch: 1 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:53,888-Speed 6313.91 samples/sec Loss 15.7356 LearningRate 0.0005 Epoch: 1 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:13:57,133-Speed 6311.03 samples/sec Loss 15.6976 LearningRate 0.0005 Epoch: 1 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:00,380-Speed 6309.42 samples/sec Loss 15.6746 LearningRate 0.0005 Epoch: 1 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:03,627-Speed 6309.73 samples/sec Loss 15.6542 LearningRate 0.0005 Epoch: 1 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:06,869-Speed 6319.08 samples/sec Loss 15.7635 LearningRate 0.0005 Epoch: 1 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:10,111-Speed 6318.94 samples/sec Loss 15.6349 LearningRate 0.0005 Epoch: 1 Global Step: 38540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:13,358-Speed 6307.86 samples/sec Loss 15.6544 LearningRate 0.0005 Epoch: 1 Global Step: 38550 Fp16 Grad Scale: 262144 Required: 72 hours Training: 
2022-03-31 19:14:16,590-Speed 6337.04 samples/sec Loss 15.6670 LearningRate 0.0005 Epoch: 1 Global Step: 38560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:19,834-Speed 6315.98 samples/sec Loss 15.6597 LearningRate 0.0005 Epoch: 1 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:23,080-Speed 6310.45 samples/sec Loss 15.6546 LearningRate 0.0005 Epoch: 1 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:26,317-Speed 6327.18 samples/sec Loss 15.6335 LearningRate 0.0005 Epoch: 1 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:29,592-Speed 6255.67 samples/sec Loss 15.5623 LearningRate 0.0005 Epoch: 1 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:32,838-Speed 6311.02 samples/sec Loss 15.6980 LearningRate 0.0005 Epoch: 1 Global Step: 38610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:36,081-Speed 6315.67 samples/sec Loss 15.7079 LearningRate 0.0005 Epoch: 1 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:14:39,312-Speed 6340.82 samples/sec Loss 15.5773 LearningRate 0.0005 Epoch: 1 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:42,550-Speed 6326.82 samples/sec Loss 15.6059 LearningRate 0.0005 Epoch: 1 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:45,791-Speed 6319.57 samples/sec Loss 15.6968 LearningRate 0.0005 Epoch: 1 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:49,036-Speed 6312.58 samples/sec Loss 15.5092 LearningRate 0.0005 Epoch: 1 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:52,280-Speed 6315.05 samples/sec Loss 15.6369 LearningRate 0.0005 Epoch: 1 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:55,526-Speed 
6310.51 samples/sec Loss 15.6261 LearningRate 0.0005 Epoch: 1 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:14:58,767-Speed 6319.23 samples/sec Loss 15.6377 LearningRate 0.0005 Epoch: 1 Global Step: 38690 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:15:02,012-Speed 6313.79 samples/sec Loss 15.4909 LearningRate 0.0005 Epoch: 1 Global Step: 38700 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:15:05,254-Speed 6318.80 samples/sec Loss 15.5189 LearningRate 0.0005 Epoch: 1 Global Step: 38710 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:15:08,497-Speed 6315.77 samples/sec Loss 15.6202 LearningRate 0.0005 Epoch: 1 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:15:11,738-Speed 6321.69 samples/sec Loss 15.5054 LearningRate 0.0005 Epoch: 1 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:14,979-Speed 6319.17 samples/sec Loss 15.5282 LearningRate 0.0005 Epoch: 1 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:18,226-Speed 6309.71 samples/sec Loss 15.5218 LearningRate 0.0005 Epoch: 1 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:21,467-Speed 6320.93 samples/sec Loss 15.5594 LearningRate 0.0005 Epoch: 1 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:24,706-Speed 6323.91 samples/sec Loss 15.6371 LearningRate 0.0005 Epoch: 1 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:27,944-Speed 6326.86 samples/sec Loss 15.4549 LearningRate 0.0005 Epoch: 1 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:31,186-Speed 6318.13 samples/sec Loss 15.5716 LearningRate 0.0005 Epoch: 1 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:34,426-Speed 6320.98 samples/sec Loss 15.6149 
LearningRate 0.0005 Epoch: 1 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:37,664-Speed 6327.83 samples/sec Loss 15.5241 LearningRate 0.0005 Epoch: 1 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:40,915-Speed 6301.56 samples/sec Loss 15.5886 LearningRate 0.0005 Epoch: 1 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:44,143-Speed 6344.90 samples/sec Loss 15.5551 LearningRate 0.0005 Epoch: 1 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:47,384-Speed 6320.16 samples/sec Loss 15.5173 LearningRate 0.0005 Epoch: 1 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:50,627-Speed 6316.29 samples/sec Loss 15.6213 LearningRate 0.0005 Epoch: 1 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:53,874-Speed 6309.90 samples/sec Loss 15.5841 LearningRate 0.0005 Epoch: 1 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:15:57,117-Speed 6316.99 samples/sec Loss 15.4196 LearningRate 0.0005 Epoch: 1 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:00,360-Speed 6314.61 samples/sec Loss 15.4989 LearningRate 0.0005 Epoch: 1 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:03,611-Speed 6301.84 samples/sec Loss 15.3760 LearningRate 0.0005 Epoch: 1 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:06,844-Speed 6335.53 samples/sec Loss 15.4278 LearningRate 0.0005 Epoch: 1 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:10,092-Speed 6307.23 samples/sec Loss 15.5033 LearningRate 0.0005 Epoch: 1 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:13,336-Speed 6314.95 samples/sec Loss 15.3935 LearningRate 0.0005 Epoch: 1 
Global Step: 38920 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:16,577-Speed 6321.62 samples/sec Loss 15.5345 LearningRate 0.0005 Epoch: 1 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:19,821-Speed 6314.32 samples/sec Loss 15.4678 LearningRate 0.0005 Epoch: 1 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:23,061-Speed 6322.15 samples/sec Loss 15.4752 LearningRate 0.0005 Epoch: 1 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:26,302-Speed 6321.34 samples/sec Loss 15.4677 LearningRate 0.0005 Epoch: 1 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:29,548-Speed 6310.40 samples/sec Loss 15.4243 LearningRate 0.0005 Epoch: 1 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:32,790-Speed 6318.44 samples/sec Loss 15.5178 LearningRate 0.0005 Epoch: 1 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:36,043-Speed 6296.54 samples/sec Loss 15.4590 LearningRate 0.0005 Epoch: 1 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:16:39,284-Speed 6321.37 samples/sec Loss 15.5181 LearningRate 0.0005 Epoch: 1 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:42,529-Speed 6311.47 samples/sec Loss 15.5368 LearningRate 0.0005 Epoch: 1 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:45,772-Speed 6316.51 samples/sec Loss 15.4734 LearningRate 0.0005 Epoch: 1 Global Step: 39020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:49,019-Speed 6309.70 samples/sec Loss 15.4073 LearningRate 0.0005 Epoch: 1 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:52,267-Speed 6307.32 samples/sec Loss 15.4275 LearningRate 0.0005 Epoch: 1 Global Step: 39040 Fp16 Grad Scale: 
131072 Required: 72 hours Training: 2022-03-31 19:16:55,510-Speed 6317.34 samples/sec Loss 15.3551 LearningRate 0.0005 Epoch: 1 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:16:58,741-Speed 6338.36 samples/sec Loss 15.4706 LearningRate 0.0005 Epoch: 1 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:01,992-Speed 6301.65 samples/sec Loss 15.3787 LearningRate 0.0005 Epoch: 1 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:05,245-Speed 6296.37 samples/sec Loss 15.3344 LearningRate 0.0005 Epoch: 1 Global Step: 39080 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:08,488-Speed 6316.69 samples/sec Loss 15.4464 LearningRate 0.0005 Epoch: 1 Global Step: 39090 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:11,728-Speed 6322.54 samples/sec Loss 15.4287 LearningRate 0.0005 Epoch: 1 Global Step: 39100 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:14,977-Speed 6304.96 samples/sec Loss 15.4087 LearningRate 0.0005 Epoch: 1 Global Step: 39110 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:18,219-Speed 6318.79 samples/sec Loss 15.3021 LearningRate 0.0005 Epoch: 1 Global Step: 39120 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:21,461-Speed 6318.51 samples/sec Loss 15.3308 LearningRate 0.0005 Epoch: 1 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:24,703-Speed 6319.41 samples/sec Loss 15.3191 LearningRate 0.0005 Epoch: 1 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:27,953-Speed 6303.95 samples/sec Loss 15.4550 LearningRate 0.0005 Epoch: 1 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:17:31,194-Speed 6319.92 samples/sec Loss 15.4897 LearningRate 0.0005 Epoch: 1 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 
2022-03-31 19:17:34,436-Speed 6316.94 samples/sec Loss 15.4587 LearningRate 0.0005 Epoch: 1 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:37,679-Speed 6317.24 samples/sec Loss 15.3911 LearningRate 0.0005 Epoch: 1 Global Step: 39180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:40,921-Speed 6318.01 samples/sec Loss 15.3416 LearningRate 0.0005 Epoch: 1 Global Step: 39190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:44,167-Speed 6310.89 samples/sec Loss 15.2923 LearningRate 0.0005 Epoch: 1 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:47,412-Speed 6313.19 samples/sec Loss 15.3623 LearningRate 0.0005 Epoch: 1 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:50,658-Speed 6310.52 samples/sec Loss 15.4795 LearningRate 0.0005 Epoch: 1 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:53,900-Speed 6320.38 samples/sec Loss 15.3099 LearningRate 0.0005 Epoch: 1 Global Step: 39230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:17:57,143-Speed 6316.34 samples/sec Loss 15.2864 LearningRate 0.0005 Epoch: 1 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:00,393-Speed 6303.40 samples/sec Loss 15.1935 LearningRate 0.0005 Epoch: 1 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:03,628-Speed 6331.56 samples/sec Loss 15.3509 LearningRate 0.0005 Epoch: 1 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:06,873-Speed 6313.43 samples/sec Loss 15.4117 LearningRate 0.0005 Epoch: 1 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:10,114-Speed 6319.89 samples/sec Loss 15.3022 LearningRate 0.0005 Epoch: 1 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:13,353-Speed 
6324.77 samples/sec Loss 15.3329 LearningRate 0.0005 Epoch: 1 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:16,599-Speed 6311.09 samples/sec Loss 15.2760 LearningRate 0.0005 Epoch: 1 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:19,842-Speed 6316.90 samples/sec Loss 15.2484 LearningRate 0.0005 Epoch: 1 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:23,085-Speed 6315.69 samples/sec Loss 15.2455 LearningRate 0.0005 Epoch: 1 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:26,341-Speed 6291.35 samples/sec Loss 15.2962 LearningRate 0.0005 Epoch: 1 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:29,598-Speed 6290.63 samples/sec Loss 15.3504 LearningRate 0.0005 Epoch: 1 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:32,845-Speed 6307.73 samples/sec Loss 15.2527 LearningRate 0.0005 Epoch: 1 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:36,071-Speed 6351.11 samples/sec Loss 15.2594 LearningRate 0.0005 Epoch: 1 Global Step: 39360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:39,315-Speed 6314.82 samples/sec Loss 15.3749 LearningRate 0.0005 Epoch: 1 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:42,555-Speed 6321.22 samples/sec Loss 15.1958 LearningRate 0.0005 Epoch: 1 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:45,800-Speed 6312.99 samples/sec Loss 15.3212 LearningRate 0.0005 Epoch: 1 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:49,043-Speed 6316.56 samples/sec Loss 15.2255 LearningRate 0.0005 Epoch: 1 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:52,286-Speed 6316.99 samples/sec Loss 
15.3154 LearningRate 0.0005 Epoch: 1 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:55,532-Speed 6311.23 samples/sec Loss 15.2336 LearningRate 0.0005 Epoch: 1 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:18:58,777-Speed 6312.94 samples/sec Loss 15.2058 LearningRate 0.0005 Epoch: 1 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:02,031-Speed 6294.97 samples/sec Loss 15.2303 LearningRate 0.0005 Epoch: 1 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:05,271-Speed 6320.60 samples/sec Loss 15.2666 LearningRate 0.0005 Epoch: 1 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:08,502-Speed 6340.62 samples/sec Loss 15.3573 LearningRate 0.0005 Epoch: 1 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:11,748-Speed 6310.94 samples/sec Loss 15.1919 LearningRate 0.0005 Epoch: 1 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:14,994-Speed 6311.28 samples/sec Loss 15.2158 LearningRate 0.0005 Epoch: 1 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:18,238-Speed 6314.69 samples/sec Loss 15.1734 LearningRate 0.0005 Epoch: 1 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:21,480-Speed 6317.38 samples/sec Loss 15.1872 LearningRate 0.0005 Epoch: 1 Global Step: 39500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:24,723-Speed 6316.86 samples/sec Loss 15.2010 LearningRate 0.0005 Epoch: 1 Global Step: 39510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:27,982-Speed 6286.06 samples/sec Loss 15.3367 LearningRate 0.0005 Epoch: 1 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:31,224-Speed 6318.35 samples/sec Loss 15.1659 LearningRate 0.0005 
Epoch: 1 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:34,468-Speed 6314.73 samples/sec Loss 15.0919 LearningRate 0.0005 Epoch: 1 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:37,709-Speed 6320.55 samples/sec Loss 15.2223 LearningRate 0.0005 Epoch: 1 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:40,940-Speed 6341.74 samples/sec Loss 15.2299 LearningRate 0.0005 Epoch: 1 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:44,180-Speed 6321.91 samples/sec Loss 15.2452 LearningRate 0.0005 Epoch: 1 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:47,424-Speed 6315.11 samples/sec Loss 15.1116 LearningRate 0.0005 Epoch: 1 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:50,669-Speed 6312.49 samples/sec Loss 15.1102 LearningRate 0.0005 Epoch: 1 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:53,913-Speed 6313.25 samples/sec Loss 15.2419 LearningRate 0.0005 Epoch: 1 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:19:57,152-Speed 6325.72 samples/sec Loss 15.1937 LearningRate 0.0005 Epoch: 1 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:00,395-Speed 6315.23 samples/sec Loss 15.1884 LearningRate 0.0005 Epoch: 1 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:03,641-Speed 6311.02 samples/sec Loss 15.1079 LearningRate 0.0005 Epoch: 1 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:06,893-Speed 6299.50 samples/sec Loss 15.1044 LearningRate 0.0005 Epoch: 1 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:10,132-Speed 6323.19 samples/sec Loss 15.1494 LearningRate 0.0005 Epoch: 1 Global Step: 39650 
Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:13,368-Speed 6331.78 samples/sec Loss 15.2502 LearningRate 0.0005 Epoch: 1 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:16,612-Speed 6313.91 samples/sec Loss 15.2042 LearningRate 0.0005 Epoch: 1 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:19,852-Speed 6322.13 samples/sec Loss 15.1672 LearningRate 0.0005 Epoch: 1 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:23,096-Speed 6315.24 samples/sec Loss 15.0346 LearningRate 0.0005 Epoch: 1 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:26,339-Speed 6315.51 samples/sec Loss 15.2524 LearningRate 0.0005 Epoch: 1 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:29,583-Speed 6315.73 samples/sec Loss 15.1201 LearningRate 0.0005 Epoch: 1 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:20:32,815-Speed 6337.83 samples/sec Loss 15.1023 LearningRate 0.0005 Epoch: 1 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:36,061-Speed 6311.09 samples/sec Loss 15.1261 LearningRate 0.0005 Epoch: 1 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:39,306-Speed 6312.08 samples/sec Loss 15.2325 LearningRate 0.0005 Epoch: 1 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:42,552-Speed 6312.33 samples/sec Loss 15.1572 LearningRate 0.0005 Epoch: 1 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:45,795-Speed 6315.31 samples/sec Loss 15.0263 LearningRate 0.0005 Epoch: 1 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:49,073-Speed 6248.85 samples/sec Loss 15.1187 LearningRate 0.0005 Epoch: 1 Global Step: 39770 Fp16 Grad Scale: 65536 Required: 72 
hours Training: 2022-03-31 19:20:52,330-Speed 6290.10 samples/sec Loss 15.0881 LearningRate 0.0005 Epoch: 1 Global Step: 39780 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:55,575-Speed 6311.90 samples/sec Loss 15.0257 LearningRate 0.0005 Epoch: 1 Global Step: 39790 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:20:58,820-Speed 6313.87 samples/sec Loss 15.1232 LearningRate 0.0005 Epoch: 1 Global Step: 39800 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:21:02,064-Speed 6315.18 samples/sec Loss 15.0316 LearningRate 0.0005 Epoch: 1 Global Step: 39810 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:21:05,306-Speed 6318.44 samples/sec Loss 14.9896 LearningRate 0.0005 Epoch: 1 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:08,553-Speed 6307.05 samples/sec Loss 14.9449 LearningRate 0.0005 Epoch: 1 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:11,791-Speed 6326.54 samples/sec Loss 14.9639 LearningRate 0.0005 Epoch: 1 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:15,036-Speed 6313.42 samples/sec Loss 15.0051 LearningRate 0.0005 Epoch: 1 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:18,283-Speed 6308.02 samples/sec Loss 14.9913 LearningRate 0.0005 Epoch: 1 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:21,533-Speed 6302.70 samples/sec Loss 14.9676 LearningRate 0.0005 Epoch: 1 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:24,780-Speed 6309.87 samples/sec Loss 15.0260 LearningRate 0.0005 Epoch: 1 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:28,024-Speed 6313.40 samples/sec Loss 15.0518 LearningRate 0.0005 Epoch: 1 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:21:31,268-Speed 6314.28 samples/sec Loss 14.9592 LearningRate 0.0005 Epoch: 1 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:34,516-Speed 6307.21 samples/sec Loss 15.1577 LearningRate 0.0005 Epoch: 1 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:37,744-Speed 6345.91 samples/sec Loss 14.9502 LearningRate 0.0005 Epoch: 1 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:40,988-Speed 6315.32 samples/sec Loss 15.0699 LearningRate 0.0005 Epoch: 1 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:44,229-Speed 6321.26 samples/sec Loss 15.1150 LearningRate 0.0005 Epoch: 1 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:47,469-Speed 6321.79 samples/sec Loss 15.0122 LearningRate 0.0005 Epoch: 1 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:50,709-Speed 6323.10 samples/sec Loss 15.0218 LearningRate 0.0005 Epoch: 1 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:53,951-Speed 6318.80 samples/sec Loss 15.1256 LearningRate 0.0005 Epoch: 1 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:21:57,192-Speed 6318.98 samples/sec Loss 14.9544 LearningRate 0.0005 Epoch: 1 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:00,437-Speed 6314.42 samples/sec Loss 14.9696 LearningRate 0.0005 Epoch: 1 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:03,677-Speed 6320.40 samples/sec Loss 14.9288 LearningRate 0.0005 Epoch: 1 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:06,919-Speed 6319.21 samples/sec Loss 15.0174 LearningRate 0.0005 Epoch: 1 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:10,147-Speed 6346.13 
samples/sec Loss 15.0095 LearningRate 0.0005 Epoch: 1 Global Step: 40020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:13,391-Speed 6314.57 samples/sec Loss 14.9888 LearningRate 0.0005 Epoch: 1 Global Step: 40030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:16,635-Speed 6315.03 samples/sec Loss 15.0395 LearningRate 0.0005 Epoch: 1 Global Step: 40040 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:19,878-Speed 6317.07 samples/sec Loss 15.0643 LearningRate 0.0005 Epoch: 1 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:23,125-Speed 6307.27 samples/sec Loss 14.9617 LearningRate 0.0005 Epoch: 1 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:26,373-Speed 6306.79 samples/sec Loss 15.0374 LearningRate 0.0005 Epoch: 1 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:29,618-Speed 6314.24 samples/sec Loss 14.9773 LearningRate 0.0005 Epoch: 1 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:32,860-Speed 6317.80 samples/sec Loss 15.1100 LearningRate 0.0005 Epoch: 1 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:36,110-Speed 6303.62 samples/sec Loss 14.8270 LearningRate 0.0005 Epoch: 1 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:39,352-Speed 6318.05 samples/sec Loss 15.0029 LearningRate 0.0005 Epoch: 1 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:42,596-Speed 6315.34 samples/sec Loss 14.9023 LearningRate 0.0005 Epoch: 1 Global Step: 40120 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 19:22:45,822-Speed 6350.47 samples/sec Loss 14.8630 LearningRate 0.0005 Epoch: 1 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:22:49,057-Speed 6332.33 samples/sec Loss 15.0177 
LearningRate 0.0005 Epoch: 1 Global Step: 40140 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:22:52,296-Speed 6324.00 samples/sec Loss 14.9418 LearningRate 0.0005 Epoch: 1 Global Step: 40150 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:22:55,541-Speed 6313.02 samples/sec Loss 14.9646 LearningRate 0.0005 Epoch: 1 Global Step: 40160 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:22:58,783-Speed 6317.62 samples/sec Loss 14.7742 LearningRate 0.0005 Epoch: 1 Global Step: 40170 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:02,026-Speed 6316.71 samples/sec Loss 14.9337 LearningRate 0.0005 Epoch: 1 Global Step: 40180 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:05,269-Speed 6317.90 samples/sec Loss 14.8964 LearningRate 0.0005 Epoch: 1 Global Step: 40190 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:08,509-Speed 6321.44 samples/sec Loss 14.9986 LearningRate 0.0005 Epoch: 1 Global Step: 40200 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:11,767-Speed 6286.98 samples/sec Loss 14.9205 LearningRate 0.0005 Epoch: 1 Global Step: 40210 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:15,009-Speed 6318.03 samples/sec Loss 14.9908 LearningRate 0.0005 Epoch: 1 Global Step: 40220 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:18,258-Speed 6306.13 samples/sec Loss 14.8193 LearningRate 0.0005 Epoch: 1 Global Step: 40230 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:23:21,500-Speed 6318.18 samples/sec Loss 14.8712 LearningRate 0.0005 Epoch: 1 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:24,748-Speed 6306.85 samples/sec Loss 14.9370 LearningRate 0.0005 Epoch: 1 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:27,995-Speed 6309.39 samples/sec Loss 14.9067 LearningRate 0.0005 Epoch: 1 Global Step: 
40260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:31,245-Speed 6302.05 samples/sec Loss 14.8705 LearningRate 0.0005 Epoch: 1 Global Step: 40270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:34,488-Speed 6317.56 samples/sec Loss 14.9188 LearningRate 0.0005 Epoch: 1 Global Step: 40280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:37,730-Speed 6318.55 samples/sec Loss 14.8123 LearningRate 0.0005 Epoch: 1 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:40,971-Speed 6319.41 samples/sec Loss 14.9511 LearningRate 0.0005 Epoch: 1 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:44,221-Speed 6302.25 samples/sec Loss 14.7752 LearningRate 0.0005 Epoch: 1 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:47,466-Speed 6314.87 samples/sec Loss 14.8314 LearningRate 0.0005 Epoch: 1 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:50,708-Speed 6317.72 samples/sec Loss 14.8189 LearningRate 0.0005 Epoch: 1 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:53,940-Speed 6337.49 samples/sec Loss 14.7393 LearningRate 0.0005 Epoch: 1 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:23:57,182-Speed 6319.75 samples/sec Loss 14.8603 LearningRate 0.0005 Epoch: 1 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:00,437-Speed 6292.63 samples/sec Loss 14.8004 LearningRate 0.0005 Epoch: 1 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:03,692-Speed 6293.47 samples/sec Loss 14.7951 LearningRate 0.0005 Epoch: 1 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:06,921-Speed 6343.18 samples/sec Loss 14.9536 LearningRate 0.0005 Epoch: 1 Global Step: 40380 Fp16 Grad Scale: 65536 
Required: 72 hours Training: 2022-03-31 19:24:10,169-Speed 6308.02 samples/sec Loss 14.8697 LearningRate 0.0005 Epoch: 1 Global Step: 40390 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:13,411-Speed 6318.31 samples/sec Loss 14.8354 LearningRate 0.0005 Epoch: 1 Global Step: 40400 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:16,653-Speed 6317.28 samples/sec Loss 14.9149 LearningRate 0.0005 Epoch: 1 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:19,896-Speed 6317.42 samples/sec Loss 14.7786 LearningRate 0.0005 Epoch: 1 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:23,142-Speed 6311.44 samples/sec Loss 14.9087 LearningRate 0.0005 Epoch: 1 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:26,383-Speed 6319.78 samples/sec Loss 14.8846 LearningRate 0.0005 Epoch: 1 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:29,629-Speed 6311.26 samples/sec Loss 14.8706 LearningRate 0.0005 Epoch: 1 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:32,876-Speed 6308.60 samples/sec Loss 14.8101 LearningRate 0.0005 Epoch: 1 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:36,115-Speed 6324.70 samples/sec Loss 14.7850 LearningRate 0.0005 Epoch: 1 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:24:39,358-Speed 6315.41 samples/sec Loss 14.8089 LearningRate 0.0005 Epoch: 1 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:42,605-Speed 6309.60 samples/sec Loss 14.7282 LearningRate 0.0005 Epoch: 1 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:45,849-Speed 6313.75 samples/sec Loss 14.8462 LearningRate 0.0005 Epoch: 1 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:24:49,096-Speed 6309.73 samples/sec Loss 14.7792 LearningRate 0.0005 Epoch: 1 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:52,350-Speed 6294.37 samples/sec Loss 14.8576 LearningRate 0.0005 Epoch: 1 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:55,598-Speed 6306.69 samples/sec Loss 14.7733 LearningRate 0.0005 Epoch: 1 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:24:58,835-Speed 6328.86 samples/sec Loss 14.7006 LearningRate 0.0005 Epoch: 1 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:25:02,091-Speed 6290.98 samples/sec Loss 14.7357 LearningRate 0.0005 Epoch: 1 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:25:05,367-Speed 6254.16 samples/sec Loss 14.8108 LearningRate 0.0005 Epoch: 1 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:25:08,609-Speed 6318.53 samples/sec Loss 14.7163 LearningRate 0.0005 Epoch: 1 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:25:11,839-Speed 6341.20 samples/sec Loss 14.7124 LearningRate 0.0005 Epoch: 1 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:15,081-Speed 6319.51 samples/sec Loss 14.7500 LearningRate 0.0005 Epoch: 1 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:18,325-Speed 6313.65 samples/sec Loss 14.8067 LearningRate 0.0005 Epoch: 1 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:21,568-Speed 6316.88 samples/sec Loss 14.6624 LearningRate 0.0005 Epoch: 1 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:24,814-Speed 6310.35 samples/sec Loss 14.7652 LearningRate 0.0005 Epoch: 1 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:28,056-Speed 6318.51 
samples/sec Loss 14.7511 LearningRate 0.0005 Epoch: 1 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:31,298-Speed 6319.21 samples/sec Loss 14.6002 LearningRate 0.0005 Epoch: 1 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:34,545-Speed 6309.55 samples/sec Loss 14.6557 LearningRate 0.0005 Epoch: 1 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:37,786-Speed 6320.11 samples/sec Loss 14.6872 LearningRate 0.0005 Epoch: 1 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:41,034-Speed 6305.59 samples/sec Loss 14.7164 LearningRate 0.0005 Epoch: 1 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:44,267-Speed 6336.24 samples/sec Loss 14.7393 LearningRate 0.0005 Epoch: 1 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:47,509-Speed 6318.61 samples/sec Loss 14.6653 LearningRate 0.0005 Epoch: 1 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:50,750-Speed 6320.71 samples/sec Loss 14.7994 LearningRate 0.0005 Epoch: 1 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:53,989-Speed 6323.29 samples/sec Loss 14.6869 LearningRate 0.0005 Epoch: 1 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:25:57,234-Speed 6313.84 samples/sec Loss 14.7650 LearningRate 0.0005 Epoch: 1 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:00,477-Speed 6317.12 samples/sec Loss 14.5328 LearningRate 0.0005 Epoch: 1 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:03,724-Speed 6309.15 samples/sec Loss 14.6448 LearningRate 0.0005 Epoch: 1 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:06,966-Speed 6317.87 samples/sec Loss 14.7119 
LearningRate 0.0005 Epoch: 1 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:10,212-Speed 6311.96 samples/sec Loss 14.6481 LearningRate 0.0005 Epoch: 1 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:13,458-Speed 6308.88 samples/sec Loss 14.7386 LearningRate 0.0005 Epoch: 1 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:16,688-Speed 6343.26 samples/sec Loss 14.6417 LearningRate 0.0005 Epoch: 1 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:19,921-Speed 6335.64 samples/sec Loss 14.5837 LearningRate 0.0005 Epoch: 1 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:23,164-Speed 6317.21 samples/sec Loss 14.6634 LearningRate 0.0005 Epoch: 1 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:26,414-Speed 6302.84 samples/sec Loss 14.6644 LearningRate 0.0005 Epoch: 1 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:29,660-Speed 6310.34 samples/sec Loss 14.6975 LearningRate 0.0005 Epoch: 1 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:32,902-Speed 6317.33 samples/sec Loss 14.6662 LearningRate 0.0005 Epoch: 1 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:36,148-Speed 6312.06 samples/sec Loss 14.6255 LearningRate 0.0005 Epoch: 1 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:39,395-Speed 6308.32 samples/sec Loss 14.5522 LearningRate 0.0005 Epoch: 1 Global Step: 40850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:42,645-Speed 6303.16 samples/sec Loss 14.6347 LearningRate 0.0005 Epoch: 1 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:45,890-Speed 6311.20 samples/sec Loss 14.5500 LearningRate 0.0005 Epoch: 1 Global 
Step: 40870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:49,131-Speed 6320.87 samples/sec Loss 14.6739 LearningRate 0.0005 Epoch: 1 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:26:52,375-Speed 6315.84 samples/sec Loss 14.6697 LearningRate 0.0005 Epoch: 1 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:55,618-Speed 6314.87 samples/sec Loss 14.6528 LearningRate 0.0005 Epoch: 1 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:26:58,865-Speed 6309.95 samples/sec Loss 14.5682 LearningRate 0.0005 Epoch: 1 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:02,112-Speed 6310.05 samples/sec Loss 14.5130 LearningRate 0.0005 Epoch: 1 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:05,361-Speed 6304.79 samples/sec Loss 14.7177 LearningRate 0.0005 Epoch: 1 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:08,603-Speed 6319.50 samples/sec Loss 14.7086 LearningRate 0.0005 Epoch: 1 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:11,847-Speed 6313.60 samples/sec Loss 14.5759 LearningRate 0.0005 Epoch: 1 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:15,090-Speed 6315.94 samples/sec Loss 14.6343 LearningRate 0.0005 Epoch: 1 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:18,332-Speed 6320.03 samples/sec Loss 14.5357 LearningRate 0.0005 Epoch: 1 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:21,575-Speed 6316.45 samples/sec Loss 14.5870 LearningRate 0.0005 Epoch: 1 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:24,817-Speed 6318.89 samples/sec Loss 14.5663 LearningRate 0.0005 Epoch: 1 Global Step: 40990 Fp16 Grad Scale: 
262144 Required: 71 hours Training: 2022-03-31 19:27:28,045-Speed 6345.53 samples/sec Loss 14.6094 LearningRate 0.0005 Epoch: 1 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:31,286-Speed 6319.18 samples/sec Loss 14.5890 LearningRate 0.0005 Epoch: 1 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:34,526-Speed 6322.70 samples/sec Loss 14.6044 LearningRate 0.0005 Epoch: 1 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:37,770-Speed 6314.64 samples/sec Loss 14.6080 LearningRate 0.0005 Epoch: 1 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:41,013-Speed 6316.69 samples/sec Loss 14.5219 LearningRate 0.0005 Epoch: 1 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:44,255-Speed 6317.78 samples/sec Loss 14.7203 LearningRate 0.0005 Epoch: 1 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:47,503-Speed 6307.46 samples/sec Loss 14.5329 LearningRate 0.0005 Epoch: 1 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:50,754-Speed 6301.28 samples/sec Loss 14.5560 LearningRate 0.0005 Epoch: 1 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:53,997-Speed 6315.80 samples/sec Loss 14.5832 LearningRate 0.0005 Epoch: 1 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:27:57,241-Speed 6314.42 samples/sec Loss 14.5207 LearningRate 0.0005 Epoch: 1 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:00,472-Speed 6340.69 samples/sec Loss 14.6268 LearningRate 0.0005 Epoch: 1 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:03,718-Speed 6312.25 samples/sec Loss 14.6910 LearningRate 0.0005 Epoch: 1 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 71 hours 
Training: 2022-03-31 19:28:06,960-Speed 6318.92 samples/sec Loss 14.5760 LearningRate 0.0005 Epoch: 1 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:10,201-Speed 6319.07 samples/sec Loss 14.5627 LearningRate 0.0005 Epoch: 1 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:13,429-Speed 6346.25 samples/sec Loss 14.5886 LearningRate 0.0005 Epoch: 1 Global Step: 41140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:16,670-Speed 6320.25 samples/sec Loss 14.4806 LearningRate 0.0005 Epoch: 1 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:19,910-Speed 6321.81 samples/sec Loss 14.4409 LearningRate 0.0005 Epoch: 1 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:23,151-Speed 6321.52 samples/sec Loss 14.4420 LearningRate 0.0005 Epoch: 1 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:26,394-Speed 6316.59 samples/sec Loss 14.5124 LearningRate 0.0005 Epoch: 1 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:29,636-Speed 6317.67 samples/sec Loss 14.5371 LearningRate 0.0005 Epoch: 1 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:32,880-Speed 6315.24 samples/sec Loss 14.5591 LearningRate 0.0005 Epoch: 1 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:36,124-Speed 6315.52 samples/sec Loss 14.5894 LearningRate 0.0005 Epoch: 1 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:39,363-Speed 6323.84 samples/sec Loss 14.5870 LearningRate 0.0005 Epoch: 1 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:42,610-Speed 6307.19 samples/sec Loss 14.3925 LearningRate 0.0005 Epoch: 1 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:28:45,853-Speed 
6316.38 samples/sec Loss 14.4553 LearningRate 0.0005 Epoch: 1 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:49,098-Speed 6313.85 samples/sec Loss 14.4477 LearningRate 0.0005 Epoch: 1 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:52,341-Speed 6316.08 samples/sec Loss 14.4990 LearningRate 0.0005 Epoch: 1 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:55,589-Speed 6306.34 samples/sec Loss 14.4383 LearningRate 0.0005 Epoch: 1 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:28:58,832-Speed 6317.33 samples/sec Loss 14.4505 LearningRate 0.0005 Epoch: 1 Global Step: 41280 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:02,079-Speed 6308.67 samples/sec Loss 14.5038 LearningRate 0.0005 Epoch: 1 Global Step: 41290 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:05,325-Speed 6311.52 samples/sec Loss 14.4833 LearningRate 0.0005 Epoch: 1 Global Step: 41300 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:08,569-Speed 6314.50 samples/sec Loss 14.4120 LearningRate 0.0005 Epoch: 1 Global Step: 41310 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:11,815-Speed 6311.04 samples/sec Loss 14.4042 LearningRate 0.0005 Epoch: 1 Global Step: 41320 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:15,045-Speed 6341.74 samples/sec Loss 14.4939 LearningRate 0.0005 Epoch: 1 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:18,287-Speed 6319.09 samples/sec Loss 14.5533 LearningRate 0.0005 Epoch: 1 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:21,531-Speed 6314.13 samples/sec Loss 14.4277 LearningRate 0.0005 Epoch: 1 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:24,778-Speed 6309.82 samples/sec Loss 14.4028 
LearningRate 0.0005 Epoch: 1 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:28,024-Speed 6310.48 samples/sec Loss 14.4645 LearningRate 0.0005 Epoch: 1 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:31,264-Speed 6322.11 samples/sec Loss 14.4906 LearningRate 0.0005 Epoch: 1 Global Step: 41380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:34,506-Speed 6318.74 samples/sec Loss 14.4805 LearningRate 0.0005 Epoch: 1 Global Step: 41390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:37,768-Speed 6279.67 samples/sec Loss 14.4269 LearningRate 0.0005 Epoch: 1 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:41,008-Speed 6321.43 samples/sec Loss 14.4502 LearningRate 0.0005 Epoch: 1 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:44,253-Speed 6312.68 samples/sec Loss 14.4972 LearningRate 0.0005 Epoch: 1 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:47,497-Speed 6314.85 samples/sec Loss 14.4737 LearningRate 0.0005 Epoch: 1 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:29:50,727-Speed 6341.55 samples/sec Loss 14.5535 LearningRate 0.0005 Epoch: 1 Global Step: 41440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:53,970-Speed 6317.18 samples/sec Loss 14.4642 LearningRate 0.0005 Epoch: 1 Global Step: 41450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:29:57,219-Speed 6304.55 samples/sec Loss 14.4714 LearningRate 0.0005 Epoch: 1 Global Step: 41460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:30:00,462-Speed 6315.86 samples/sec Loss 14.5117 LearningRate 0.0005 Epoch: 1 Global Step: 41470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:30:03,704-Speed 6318.59 samples/sec Loss 14.3569 LearningRate 0.0005 Epoch: 1 Global Step: 
41480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:31:02,727-Speed 346.99 samples/sec Loss 14.3855 LearningRate 0.0005 Epoch: 2 Global Step: 41490 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:05,966-Speed 6325.17 samples/sec Loss 14.4342 LearningRate 0.0005 Epoch: 2 Global Step: 41500 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:09,205-Speed 6324.73 samples/sec Loss 14.1968 LearningRate 0.0005 Epoch: 2 Global Step: 41510 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:12,440-Speed 6331.29 samples/sec Loss 14.4334 LearningRate 0.0005 Epoch: 2 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:15,671-Speed 6341.66 samples/sec Loss 14.4218 LearningRate 0.0005 Epoch: 2 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:18,908-Speed 6326.59 samples/sec Loss 14.3537 LearningRate 0.0005 Epoch: 2 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:31:22,144-Speed 6329.61 samples/sec Loss 14.3841 LearningRate 0.0005 Epoch: 2 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:31:25,384-Speed 6323.85 samples/sec Loss 14.3353 LearningRate 0.0005 Epoch: 2 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:31:28,620-Speed 6329.06 samples/sec Loss 14.3563 LearningRate 0.0005 Epoch: 2 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:31:31,857-Speed 6329.45 samples/sec Loss 14.3618 LearningRate 0.0005 Epoch: 2 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:31:35,091-Speed 6333.80 samples/sec Loss 14.3350 LearningRate 0.0005 Epoch: 2 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:38,334-Speed 6316.58 samples/sec Loss 14.3181 LearningRate 0.0005 Epoch: 2 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 
72 hours Training: 2022-03-31 19:31:41,569-Speed 6332.22 samples/sec Loss 14.3569 LearningRate 0.0005 Epoch: 2 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:44,808-Speed 6324.70 samples/sec Loss 14.4376 LearningRate 0.0005 Epoch: 2 Global Step: 41620 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:48,047-Speed 6323.60 samples/sec Loss 14.2308 LearningRate 0.0005 Epoch: 2 Global Step: 41630 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:51,286-Speed 6324.49 samples/sec Loss 14.2779 LearningRate 0.0005 Epoch: 2 Global Step: 41640 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:54,527-Speed 6320.59 samples/sec Loss 14.3948 LearningRate 0.0005 Epoch: 2 Global Step: 41650 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:31:57,775-Speed 6306.69 samples/sec Loss 14.3245 LearningRate 0.0005 Epoch: 2 Global Step: 41660 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:01,012-Speed 6328.72 samples/sec Loss 14.1715 LearningRate 0.0005 Epoch: 2 Global Step: 41670 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:04,247-Speed 6331.76 samples/sec Loss 14.2602 LearningRate 0.0005 Epoch: 2 Global Step: 41680 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:07,482-Speed 6333.06 samples/sec Loss 14.2458 LearningRate 0.0005 Epoch: 2 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:10,723-Speed 6320.30 samples/sec Loss 14.2141 LearningRate 0.0005 Epoch: 2 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:13,964-Speed 6319.94 samples/sec Loss 14.3162 LearningRate 0.0005 Epoch: 2 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:17,201-Speed 6328.03 samples/sec Loss 14.4180 LearningRate 0.0005 Epoch: 2 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:32:20,446-Speed 6314.21 samples/sec Loss 14.2593 LearningRate 0.0005 Epoch: 2 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:23,695-Speed 6304.03 samples/sec Loss 14.3552 LearningRate 0.0005 Epoch: 2 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:26,938-Speed 6316.03 samples/sec Loss 14.3179 LearningRate 0.0005 Epoch: 2 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:32:30,166-Speed 6347.77 samples/sec Loss 14.2828 LearningRate 0.0005 Epoch: 2 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:33,404-Speed 6324.40 samples/sec Loss 14.3094 LearningRate 0.0005 Epoch: 2 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:36,642-Speed 6327.34 samples/sec Loss 14.2040 LearningRate 0.0005 Epoch: 2 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:39,880-Speed 6326.93 samples/sec Loss 14.2757 LearningRate 0.0005 Epoch: 2 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:43,119-Speed 6322.71 samples/sec Loss 14.0943 LearningRate 0.0005 Epoch: 2 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:46,360-Speed 6321.10 samples/sec Loss 14.3568 LearningRate 0.0005 Epoch: 2 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:49,614-Speed 6295.71 samples/sec Loss 14.2562 LearningRate 0.0005 Epoch: 2 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:52,932-Speed 6173.73 samples/sec Loss 14.1531 LearningRate 0.0005 Epoch: 2 Global Step: 41830 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:56,178-Speed 6310.06 samples/sec Loss 14.2122 LearningRate 0.0005 Epoch: 2 Global Step: 41840 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:32:59,417-Speed 6325.56 samples/sec 
Loss 14.2306 LearningRate 0.0005 Epoch: 2 Global Step: 41850 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:33:02,657-Speed 6321.33 samples/sec Loss 14.2805 LearningRate 0.0005 Epoch: 2 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:05,901-Speed 6314.85 samples/sec Loss 14.2658 LearningRate 0.0005 Epoch: 2 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:09,143-Speed 6319.09 samples/sec Loss 14.2508 LearningRate 0.0005 Epoch: 2 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:12,380-Speed 6326.96 samples/sec Loss 14.2816 LearningRate 0.0005 Epoch: 2 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:15,624-Speed 6315.41 samples/sec Loss 14.2264 LearningRate 0.0005 Epoch: 2 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:18,861-Speed 6328.67 samples/sec Loss 14.2578 LearningRate 0.0005 Epoch: 2 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:22,103-Speed 6318.65 samples/sec Loss 14.2281 LearningRate 0.0005 Epoch: 2 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:25,344-Speed 6319.90 samples/sec Loss 14.2143 LearningRate 0.0005 Epoch: 2 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:28,589-Speed 6313.68 samples/sec Loss 14.1933 LearningRate 0.0005 Epoch: 2 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:31,829-Speed 6321.59 samples/sec Loss 14.1960 LearningRate 0.0005 Epoch: 2 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:35,053-Speed 6353.95 samples/sec Loss 14.0936 LearningRate 0.0005 Epoch: 2 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:38,297-Speed 6315.39 samples/sec Loss 14.3259 LearningRate 0.0005 
Epoch: 2 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:41,535-Speed 6325.81 samples/sec Loss 14.1985 LearningRate 0.0005 Epoch: 2 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:44,772-Speed 6328.52 samples/sec Loss 14.2579 LearningRate 0.0005 Epoch: 2 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:48,016-Speed 6315.19 samples/sec Loss 14.2079 LearningRate 0.0005 Epoch: 2 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:51,257-Speed 6319.21 samples/sec Loss 14.3410 LearningRate 0.0005 Epoch: 2 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:54,518-Speed 6282.55 samples/sec Loss 14.1595 LearningRate 0.0005 Epoch: 2 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:33:57,754-Speed 6329.43 samples/sec Loss 14.2137 LearningRate 0.0005 Epoch: 2 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:00,984-Speed 6342.20 samples/sec Loss 14.2672 LearningRate 0.0005 Epoch: 2 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:04,227-Speed 6316.69 samples/sec Loss 14.1317 LearningRate 0.0005 Epoch: 2 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:07,468-Speed 6321.09 samples/sec Loss 14.2499 LearningRate 0.0005 Epoch: 2 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:10,708-Speed 6321.98 samples/sec Loss 14.1672 LearningRate 0.0005 Epoch: 2 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:13,958-Speed 6303.45 samples/sec Loss 14.2089 LearningRate 0.0005 Epoch: 2 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:17,197-Speed 6324.23 samples/sec Loss 14.1660 LearningRate 0.0005 Epoch: 2 Global Step: 42090 Fp16 
Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:20,437-Speed 6322.28 samples/sec Loss 14.3073 LearningRate 0.0005 Epoch: 2 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:23,680-Speed 6317.49 samples/sec Loss 14.1398 LearningRate 0.0005 Epoch: 2 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:26,920-Speed 6322.54 samples/sec Loss 14.1558 LearningRate 0.0005 Epoch: 2 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:30,157-Speed 6327.68 samples/sec Loss 14.1284 LearningRate 0.0005 Epoch: 2 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:34:33,400-Speed 6316.43 samples/sec Loss 14.0976 LearningRate 0.0005 Epoch: 2 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:36,639-Speed 6325.13 samples/sec Loss 14.2400 LearningRate 0.0005 Epoch: 2 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:39,881-Speed 6318.17 samples/sec Loss 14.1202 LearningRate 0.0005 Epoch: 2 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:43,128-Speed 6309.34 samples/sec Loss 14.0776 LearningRate 0.0005 Epoch: 2 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:46,370-Speed 6318.17 samples/sec Loss 14.0027 LearningRate 0.0005 Epoch: 2 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:49,612-Speed 6318.85 samples/sec Loss 14.1224 LearningRate 0.0005 Epoch: 2 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:52,860-Speed 6306.75 samples/sec Loss 14.1262 LearningRate 0.0005 Epoch: 2 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:34:56,099-Speed 6324.16 samples/sec Loss 14.1879 LearningRate 0.0005 Epoch: 2 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 72 
hours Training: 2022-03-31 19:34:59,340-Speed 6320.87 samples/sec Loss 14.1270 LearningRate 0.0005 Epoch: 2 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:02,578-Speed 6325.20 samples/sec Loss 14.1629 LearningRate 0.0005 Epoch: 2 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:05,808-Speed 6341.87 samples/sec Loss 14.1185 LearningRate 0.0005 Epoch: 2 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:09,052-Speed 6315.52 samples/sec Loss 14.1385 LearningRate 0.0005 Epoch: 2 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:12,297-Speed 6311.55 samples/sec Loss 14.1456 LearningRate 0.0005 Epoch: 2 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:15,538-Speed 6320.66 samples/sec Loss 14.0442 LearningRate 0.0005 Epoch: 2 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:18,782-Speed 6315.72 samples/sec Loss 14.0222 LearningRate 0.0005 Epoch: 2 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:22,027-Speed 6312.30 samples/sec Loss 14.1300 LearningRate 0.0005 Epoch: 2 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:25,268-Speed 6321.49 samples/sec Loss 14.0230 LearningRate 0.0005 Epoch: 2 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:28,504-Speed 6328.96 samples/sec Loss 14.1121 LearningRate 0.0005 Epoch: 2 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:31,745-Speed 6320.76 samples/sec Loss 14.1171 LearningRate 0.0005 Epoch: 2 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:34,987-Speed 6318.49 samples/sec Loss 14.0548 LearningRate 0.0005 Epoch: 2 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 
19:35:38,213-Speed 6350.29 samples/sec Loss 14.1666 LearningRate 0.0005 Epoch: 2 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:41,452-Speed 6324.72 samples/sec Loss 14.1544 LearningRate 0.0005 Epoch: 2 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:44,696-Speed 6314.81 samples/sec Loss 14.0205 LearningRate 0.0005 Epoch: 2 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:47,934-Speed 6324.80 samples/sec Loss 14.0772 LearningRate 0.0005 Epoch: 2 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:51,173-Speed 6325.70 samples/sec Loss 14.1381 LearningRate 0.0005 Epoch: 2 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:54,434-Speed 6280.56 samples/sec Loss 14.0242 LearningRate 0.0005 Epoch: 2 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:35:57,673-Speed 6324.66 samples/sec Loss 14.0390 LearningRate 0.0005 Epoch: 2 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:00,916-Speed 6317.77 samples/sec Loss 14.0390 LearningRate 0.0005 Epoch: 2 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:04,161-Speed 6312.05 samples/sec Loss 14.0636 LearningRate 0.0005 Epoch: 2 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:07,403-Speed 6318.11 samples/sec Loss 14.0934 LearningRate 0.0005 Epoch: 2 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:10,643-Speed 6322.00 samples/sec Loss 14.0190 LearningRate 0.0005 Epoch: 2 Global Step: 42440 Fp16 Grad Scale: 262144 Required: 72 hours Training: 2022-03-31 19:36:13,896-Speed 6298.21 samples/sec Loss 14.1216 LearningRate 0.0005 Epoch: 2 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:17,137-Speed 6320.99 
samples/sec Loss 14.0987 LearningRate 0.0005 Epoch: 2 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:20,376-Speed 6324.58 samples/sec Loss 14.0839 LearningRate 0.0005 Epoch: 2 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:23,613-Speed 6327.72 samples/sec Loss 14.0575 LearningRate 0.0005 Epoch: 2 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:26,853-Speed 6321.15 samples/sec Loss 14.0814 LearningRate 0.0005 Epoch: 2 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:30,163-Speed 6189.48 samples/sec Loss 13.9951 LearningRate 0.0005 Epoch: 2 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:33,404-Speed 6321.31 samples/sec Loss 14.0260 LearningRate 0.0005 Epoch: 2 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:36,647-Speed 6317.86 samples/sec Loss 13.8795 LearningRate 0.0005 Epoch: 2 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:39,899-Speed 6298.21 samples/sec Loss 13.9717 LearningRate 0.0005 Epoch: 2 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:43,141-Speed 6319.07 samples/sec Loss 14.0254 LearningRate 0.0005 Epoch: 2 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:46,370-Speed 6343.74 samples/sec Loss 14.1232 LearningRate 0.0005 Epoch: 2 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:49,609-Speed 6323.63 samples/sec Loss 14.0306 LearningRate 0.0005 Epoch: 2 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:52,852-Speed 6317.47 samples/sec Loss 14.0618 LearningRate 0.0005 Epoch: 2 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:56,091-Speed 6322.98 samples/sec Loss 14.0097 
LearningRate 0.0005 Epoch: 2 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:36:59,330-Speed 6326.37 samples/sec Loss 13.9790 LearningRate 0.0005 Epoch: 2 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:02,572-Speed 6317.84 samples/sec Loss 13.9475 LearningRate 0.0005 Epoch: 2 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:05,797-Speed 6351.84 samples/sec Loss 14.0204 LearningRate 0.0005 Epoch: 2 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:09,034-Speed 6328.79 samples/sec Loss 13.9598 LearningRate 0.0005 Epoch: 2 Global Step: 42620 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:12,274-Speed 6322.35 samples/sec Loss 13.9910 LearningRate 0.0005 Epoch: 2 Global Step: 42630 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:15,516-Speed 6317.63 samples/sec Loss 14.0800 LearningRate 0.0005 Epoch: 2 Global Step: 42640 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:18,755-Speed 6324.22 samples/sec Loss 13.9838 LearningRate 0.0005 Epoch: 2 Global Step: 42650 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:21,992-Speed 6328.42 samples/sec Loss 14.0189 LearningRate 0.0005 Epoch: 2 Global Step: 42660 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:25,245-Speed 6296.85 samples/sec Loss 13.9895 LearningRate 0.0005 Epoch: 2 Global Step: 42670 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:28,486-Speed 6319.90 samples/sec Loss 13.8734 LearningRate 0.0005 Epoch: 2 Global Step: 42680 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:31,727-Speed 6320.71 samples/sec Loss 13.9220 LearningRate 0.0005 Epoch: 2 Global Step: 42690 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:34,972-Speed 6313.44 samples/sec Loss 13.9263 LearningRate 0.0005 Epoch: 2 Global 
Step: 42700 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:37:38,212-Speed 6322.81 samples/sec Loss 13.9095 LearningRate 0.0005 Epoch: 2 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:41,458-Speed 6310.62 samples/sec Loss 13.8582 LearningRate 0.0005 Epoch: 2 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:44,695-Speed 6328.61 samples/sec Loss 13.9654 LearningRate 0.0005 Epoch: 2 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:47,937-Speed 6318.38 samples/sec Loss 13.9439 LearningRate 0.0005 Epoch: 2 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:51,175-Speed 6327.50 samples/sec Loss 13.9998 LearningRate 0.0005 Epoch: 2 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:54,417-Speed 6318.37 samples/sec Loss 13.9604 LearningRate 0.0005 Epoch: 2 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:37:57,657-Speed 6321.58 samples/sec Loss 13.8546 LearningRate 0.0005 Epoch: 2 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:00,901-Speed 6315.24 samples/sec Loss 13.9472 LearningRate 0.0005 Epoch: 2 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:04,154-Speed 6296.53 samples/sec Loss 13.9653 LearningRate 0.0005 Epoch: 2 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:07,392-Speed 6326.73 samples/sec Loss 13.8727 LearningRate 0.0005 Epoch: 2 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:10,614-Speed 6357.40 samples/sec Loss 13.8569 LearningRate 0.0005 Epoch: 2 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:13,856-Speed 6317.77 samples/sec Loss 13.8065 LearningRate 0.0005 Epoch: 2 Global Step: 42820 Fp16 Grad Scale: 
131072 Required: 72 hours Training: 2022-03-31 19:38:17,095-Speed 6324.70 samples/sec Loss 13.9107 LearningRate 0.0005 Epoch: 2 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:20,334-Speed 6324.43 samples/sec Loss 13.9247 LearningRate 0.0005 Epoch: 2 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:23,576-Speed 6318.16 samples/sec Loss 14.0331 LearningRate 0.0005 Epoch: 2 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:26,818-Speed 6319.28 samples/sec Loss 13.9474 LearningRate 0.0005 Epoch: 2 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:38:30,066-Speed 6306.04 samples/sec Loss 13.8212 LearningRate 0.0005 Epoch: 2 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:33,308-Speed 6319.67 samples/sec Loss 13.8215 LearningRate 0.0005 Epoch: 2 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:36,558-Speed 6303.10 samples/sec Loss 13.9709 LearningRate 0.0005 Epoch: 2 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:39,798-Speed 6322.03 samples/sec Loss 13.8720 LearningRate 0.0005 Epoch: 2 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:43,040-Speed 6317.19 samples/sec Loss 13.8799 LearningRate 0.0005 Epoch: 2 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:46,280-Speed 6324.59 samples/sec Loss 13.8657 LearningRate 0.0005 Epoch: 2 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:49,525-Speed 6313.39 samples/sec Loss 13.9057 LearningRate 0.0005 Epoch: 2 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:52,764-Speed 6323.30 samples/sec Loss 13.9303 LearningRate 0.0005 Epoch: 2 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 72 hours Training: 
2022-03-31 19:38:56,011-Speed 6309.69 samples/sec Loss 13.8660 LearningRate 0.0005 Epoch: 2 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:38:59,257-Speed 6309.80 samples/sec Loss 13.7871 LearningRate 0.0005 Epoch: 2 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:02,488-Speed 6339.56 samples/sec Loss 13.8797 LearningRate 0.0005 Epoch: 2 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:05,736-Speed 6306.83 samples/sec Loss 13.7629 LearningRate 0.0005 Epoch: 2 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:08,978-Speed 6318.44 samples/sec Loss 13.8747 LearningRate 0.0005 Epoch: 2 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:12,220-Speed 6318.99 samples/sec Loss 13.8797 LearningRate 0.0005 Epoch: 2 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:15,463-Speed 6316.71 samples/sec Loss 13.8143 LearningRate 0.0005 Epoch: 2 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:18,705-Speed 6318.23 samples/sec Loss 13.8849 LearningRate 0.0005 Epoch: 2 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:21,949-Speed 6315.25 samples/sec Loss 13.8717 LearningRate 0.0005 Epoch: 2 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:25,189-Speed 6321.28 samples/sec Loss 13.7799 LearningRate 0.0005 Epoch: 2 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:28,432-Speed 6318.06 samples/sec Loss 13.7830 LearningRate 0.0005 Epoch: 2 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:31,677-Speed 6313.49 samples/sec Loss 13.8433 LearningRate 0.0005 Epoch: 2 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:34,910-Speed 6334.54 
samples/sec Loss 13.8170 LearningRate 0.0005 Epoch: 2 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:38,153-Speed 6316.39 samples/sec Loss 13.7714 LearningRate 0.0005 Epoch: 2 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:41,403-Speed 6304.48 samples/sec Loss 13.8514 LearningRate 0.0005 Epoch: 2 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:44,649-Speed 6309.44 samples/sec Loss 13.7812 LearningRate 0.0005 Epoch: 2 Global Step: 43100 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:47,891-Speed 6319.00 samples/sec Loss 13.8376 LearningRate 0.0005 Epoch: 2 Global Step: 43110 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:51,134-Speed 6317.15 samples/sec Loss 13.8766 LearningRate 0.0005 Epoch: 2 Global Step: 43120 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:54,374-Speed 6322.53 samples/sec Loss 13.7879 LearningRate 0.0005 Epoch: 2 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:39:57,615-Speed 6321.54 samples/sec Loss 13.6706 LearningRate 0.0005 Epoch: 2 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:00,860-Speed 6311.95 samples/sec Loss 13.8015 LearningRate 0.0005 Epoch: 2 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:04,116-Speed 6291.75 samples/sec Loss 13.8059 LearningRate 0.0005 Epoch: 2 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:07,389-Speed 6258.31 samples/sec Loss 13.7729 LearningRate 0.0005 Epoch: 2 Global Step: 43170 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:10,644-Speed 6292.69 samples/sec Loss 13.7379 LearningRate 0.0005 Epoch: 2 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:13,890-Speed 6310.22 samples/sec Loss 13.7879 LearningRate 
0.0005 Epoch: 2 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:17,135-Speed 6313.86 samples/sec Loss 13.7714 LearningRate 0.0005 Epoch: 2 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:20,375-Speed 6322.41 samples/sec Loss 13.9014 LearningRate 0.0005 Epoch: 2 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:23,618-Speed 6316.35 samples/sec Loss 13.7837 LearningRate 0.0005 Epoch: 2 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:26,869-Speed 6301.48 samples/sec Loss 13.8423 LearningRate 0.0005 Epoch: 2 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:30,111-Speed 6318.57 samples/sec Loss 13.6837 LearningRate 0.0005 Epoch: 2 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:40:33,336-Speed 6352.88 samples/sec Loss 13.7353 LearningRate 0.0005 Epoch: 2 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:36,574-Speed 6325.41 samples/sec Loss 13.8474 LearningRate 0.0005 Epoch: 2 Global Step: 43260 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:39,817-Speed 6316.82 samples/sec Loss 13.7535 LearningRate 0.0005 Epoch: 2 Global Step: 43270 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:43,059-Speed 6317.75 samples/sec Loss 13.7189 LearningRate 0.0005 Epoch: 2 Global Step: 43280 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:46,304-Speed 6313.34 samples/sec Loss 13.7371 LearningRate 0.0005 Epoch: 2 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:49,547-Speed 6315.93 samples/sec Loss 13.6423 LearningRate 0.0005 Epoch: 2 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:52,788-Speed 6321.56 samples/sec Loss 13.7765 LearningRate 0.0005 Epoch: 2 Global Step: 43310 
Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:56,052-Speed 6275.64 samples/sec Loss 13.7706 LearningRate 0.0005 Epoch: 2 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:40:59,294-Speed 6319.94 samples/sec Loss 13.7629 LearningRate 0.0005 Epoch: 2 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:41:02,539-Speed 6312.62 samples/sec Loss 13.6882 LearningRate 0.0005 Epoch: 2 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 72 hours Training: 2022-03-31 19:41:05,783-Speed 6313.74 samples/sec Loss 13.6901 LearningRate 0.0005 Epoch: 2 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:41:09,025-Speed 6318.04 samples/sec Loss 13.6996 LearningRate 0.0005 Epoch: 2 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 72 hours Training: 2022-03-31 19:41:12,267-Speed 6320.15 samples/sec Loss 13.6997 LearningRate 0.0005 Epoch: 2 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:15,505-Speed 6325.99 samples/sec Loss 13.7103 LearningRate 0.0005 Epoch: 2 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:18,765-Speed 6282.99 samples/sec Loss 13.7216 LearningRate 0.0005 Epoch: 2 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:22,006-Speed 6320.33 samples/sec Loss 13.8062 LearningRate 0.0005 Epoch: 2 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:25,251-Speed 6313.29 samples/sec Loss 13.6890 LearningRate 0.0005 Epoch: 2 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:28,493-Speed 6317.53 samples/sec Loss 13.7545 LearningRate 0.0005 Epoch: 2 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:31,734-Speed 6321.83 samples/sec Loss 13.7131 LearningRate 0.0005 Epoch: 2 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 
71 hours Training: 2022-03-31 19:41:34,992-Speed 6287.18 samples/sec Loss 13.7078 LearningRate 0.0005 Epoch: 2 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:38,220-Speed 6345.86 samples/sec Loss 13.7350 LearningRate 0.0005 Epoch: 2 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:41,485-Speed 6273.30 samples/sec Loss 13.6842 LearningRate 0.0005 Epoch: 2 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:44,728-Speed 6316.73 samples/sec Loss 13.6367 LearningRate 0.0005 Epoch: 2 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:47,968-Speed 6322.89 samples/sec Loss 13.7876 LearningRate 0.0005 Epoch: 2 Global Step: 43480 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:51,212-Speed 6314.68 samples/sec Loss 13.6601 LearningRate 0.0005 Epoch: 2 Global Step: 43490 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:54,450-Speed 6324.60 samples/sec Loss 13.6049 LearningRate 0.0005 Epoch: 2 Global Step: 43500 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:41:57,698-Speed 6308.25 samples/sec Loss 13.7210 LearningRate 0.0005 Epoch: 2 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:00,941-Speed 6315.52 samples/sec Loss 13.6876 LearningRate 0.0005 Epoch: 2 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:04,190-Speed 6305.38 samples/sec Loss 13.6480 LearningRate 0.0005 Epoch: 2 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:07,415-Speed 6352.08 samples/sec Loss 13.6516 LearningRate 0.0005 Epoch: 2 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:10,656-Speed 6321.77 samples/sec Loss 13.7549 LearningRate 0.0005 Epoch: 2 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
19:42:13,903-Speed 6308.25 samples/sec Loss 13.7042 LearningRate 0.0005 Epoch: 2 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:17,150-Speed 6307.80 samples/sec Loss 13.6769 LearningRate 0.0005 Epoch: 2 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:20,396-Speed 6310.63 samples/sec Loss 13.6524 LearningRate 0.0005 Epoch: 2 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:23,638-Speed 6318.64 samples/sec Loss 13.6378 LearningRate 0.0005 Epoch: 2 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:26,880-Speed 6319.91 samples/sec Loss 13.7512 LearningRate 0.0005 Epoch: 2 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:30,122-Speed 6317.61 samples/sec Loss 13.6092 LearningRate 0.0005 Epoch: 2 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:33,366-Speed 6314.54 samples/sec Loss 13.6945 LearningRate 0.0005 Epoch: 2 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:36,608-Speed 6318.92 samples/sec Loss 13.6699 LearningRate 0.0005 Epoch: 2 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:39,857-Speed 6304.04 samples/sec Loss 13.6019 LearningRate 0.0005 Epoch: 2 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:43,100-Speed 6316.45 samples/sec Loss 13.6629 LearningRate 0.0005 Epoch: 2 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:46,346-Speed 6310.81 samples/sec Loss 13.5979 LearningRate 0.0005 Epoch: 2 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:49,595-Speed 6306.03 samples/sec Loss 13.6208 LearningRate 0.0005 Epoch: 2 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:52,835-Speed 6322.46 samples/sec 
Loss 13.5515 LearningRate 0.0005 Epoch: 2 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:42:56,064-Speed 6343.90 samples/sec Loss 13.6959 LearningRate 0.0005 Epoch: 2 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:42:59,303-Speed 6323.68 samples/sec Loss 13.7313 LearningRate 0.0005 Epoch: 2 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:02,545-Speed 6318.32 samples/sec Loss 13.6272 LearningRate 0.0005 Epoch: 2 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:05,789-Speed 6316.32 samples/sec Loss 13.5487 LearningRate 0.0005 Epoch: 2 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:09,034-Speed 6311.66 samples/sec Loss 13.6853 LearningRate 0.0005 Epoch: 2 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:12,275-Speed 6320.84 samples/sec Loss 13.5748 LearningRate 0.0005 Epoch: 2 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:15,518-Speed 6317.72 samples/sec Loss 13.5733 LearningRate 0.0005 Epoch: 2 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:18,758-Speed 6321.71 samples/sec Loss 13.6586 LearningRate 0.0005 Epoch: 2 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:21,999-Speed 6320.86 samples/sec Loss 13.6797 LearningRate 0.0005 Epoch: 2 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:25,246-Speed 6309.16 samples/sec Loss 13.5118 LearningRate 0.0005 Epoch: 2 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:28,491-Speed 6311.44 samples/sec Loss 13.6574 LearningRate 0.0005 Epoch: 2 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:31,735-Speed 6314.88 samples/sec Loss 13.5846 LearningRate 0.0005 Epoch: 2 
Global Step: 43800 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:34,987-Speed 6303.66 samples/sec Loss 13.5753 LearningRate 0.0005 Epoch: 2 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:38,231-Speed 6313.95 samples/sec Loss 13.5666 LearningRate 0.0005 Epoch: 2 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:41,474-Speed 6316.22 samples/sec Loss 13.6064 LearningRate 0.0005 Epoch: 2 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:44,714-Speed 6322.53 samples/sec Loss 13.6430 LearningRate 0.0005 Epoch: 2 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:43:47,945-Speed 6340.65 samples/sec Loss 13.5689 LearningRate 0.0005 Epoch: 2 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:51,188-Speed 6316.48 samples/sec Loss 13.5926 LearningRate 0.0005 Epoch: 2 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:54,437-Speed 6304.50 samples/sec Loss 13.4597 LearningRate 0.0005 Epoch: 2 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:43:57,676-Speed 6324.67 samples/sec Loss 13.5897 LearningRate 0.0005 Epoch: 2 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:00,915-Speed 6324.22 samples/sec Loss 13.5871 LearningRate 0.0005 Epoch: 2 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:04,156-Speed 6321.10 samples/sec Loss 13.5169 LearningRate 0.0005 Epoch: 2 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:07,397-Speed 6319.08 samples/sec Loss 13.6221 LearningRate 0.0005 Epoch: 2 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:10,641-Speed 6315.98 samples/sec Loss 13.4549 LearningRate 0.0005 Epoch: 2 Global Step: 43920 Fp16 Grad Scale: 
65536 Required: 71 hours Training: 2022-03-31 19:44:13,882-Speed 6320.44 samples/sec Loss 13.5592 LearningRate 0.0005 Epoch: 2 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:17,123-Speed 6320.86 samples/sec Loss 13.5724 LearningRate 0.0005 Epoch: 2 Global Step: 43940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:20,352-Speed 6342.94 samples/sec Loss 13.6116 LearningRate 0.0005 Epoch: 2 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:23,592-Speed 6323.60 samples/sec Loss 13.5731 LearningRate 0.0005 Epoch: 2 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:26,840-Speed 6306.23 samples/sec Loss 13.4593 LearningRate 0.0005 Epoch: 2 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:30,085-Speed 6311.59 samples/sec Loss 13.4941 LearningRate 0.0005 Epoch: 2 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:33,327-Speed 6320.03 samples/sec Loss 13.5492 LearningRate 0.0005 Epoch: 2 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:36,566-Speed 6323.39 samples/sec Loss 13.4642 LearningRate 0.0005 Epoch: 2 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:39,811-Speed 6313.48 samples/sec Loss 13.5420 LearningRate 0.0005 Epoch: 2 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:43,055-Speed 6313.42 samples/sec Loss 13.4657 LearningRate 0.0005 Epoch: 2 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:46,295-Speed 6322.42 samples/sec Loss 13.5335 LearningRate 0.0005 Epoch: 2 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:44:49,539-Speed 6315.87 samples/sec Loss 13.5768 LearningRate 0.0005 Epoch: 2 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 
2022-03-31 19:44:52,779-Speed 6321.38 samples/sec Loss 13.5480 LearningRate 0.0005 Epoch: 2 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:44:56,020-Speed 6320.57 samples/sec Loss 13.5051 LearningRate 0.0005 Epoch: 2 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:44:59,263-Speed 6315.90 samples/sec Loss 13.5005 LearningRate 0.0005 Epoch: 2 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:02,508-Speed 6313.85 samples/sec Loss 13.6031 LearningRate 0.0005 Epoch: 2 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:05,756-Speed 6306.60 samples/sec Loss 13.4583 LearningRate 0.0005 Epoch: 2 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:09,001-Speed 6311.91 samples/sec Loss 13.3972 LearningRate 0.0005 Epoch: 2 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:12,247-Speed 6311.53 samples/sec Loss 13.4787 LearningRate 0.0005 Epoch: 2 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:15,494-Speed 6308.78 samples/sec Loss 13.6586 LearningRate 0.0005 Epoch: 2 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:18,737-Speed 6316.86 samples/sec Loss 13.5507 LearningRate 0.0005 Epoch: 2 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:21,981-Speed 6313.55 samples/sec Loss 13.5124 LearningRate 0.0005 Epoch: 2 Global Step: 44140 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:25,211-Speed 6343.83 samples/sec Loss 13.5356 LearningRate 0.0005 Epoch: 2 Global Step: 44150 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:28,453-Speed 6317.89 samples/sec Loss 13.5242 LearningRate 0.0005 Epoch: 2 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:45:31,684-Speed 
6341.24 samples/sec Loss 13.5226 LearningRate 0.0005 Epoch: 2 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:34,923-Speed 6323.07 samples/sec Loss 13.4683 LearningRate 0.0005 Epoch: 2 Global Step: 44180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:38,163-Speed 6321.74 samples/sec Loss 13.4787 LearningRate 0.0005 Epoch: 2 Global Step: 44190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:41,405-Speed 6319.07 samples/sec Loss 13.4588 LearningRate 0.0005 Epoch: 2 Global Step: 44200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:44,646-Speed 6320.30 samples/sec Loss 13.6168 LearningRate 0.0005 Epoch: 2 Global Step: 44210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:47,898-Speed 6299.59 samples/sec Loss 13.4973 LearningRate 0.0005 Epoch: 2 Global Step: 44220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:51,138-Speed 6321.34 samples/sec Loss 13.5237 LearningRate 0.0005 Epoch: 2 Global Step: 44230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:54,380-Speed 6318.55 samples/sec Loss 13.5049 LearningRate 0.0005 Epoch: 2 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:45:57,621-Speed 6321.96 samples/sec Loss 13.5394 LearningRate 0.0005 Epoch: 2 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:00,859-Speed 6324.95 samples/sec Loss 13.4471 LearningRate 0.0005 Epoch: 2 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:04,106-Speed 6310.17 samples/sec Loss 13.3248 LearningRate 0.0005 Epoch: 2 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:07,349-Speed 6316.68 samples/sec Loss 13.4336 LearningRate 0.0005 Epoch: 2 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:10,587-Speed 6325.26 samples/sec Loss 13.4610 
LearningRate 0.0005 Epoch: 2 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:13,851-Speed 6275.65 samples/sec Loss 13.5908 LearningRate 0.0005 Epoch: 2 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:17,147-Speed 6214.76 samples/sec Loss 13.4993 LearningRate 0.0005 Epoch: 2 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:20,395-Speed 6306.81 samples/sec Loss 13.4868 LearningRate 0.0005 Epoch: 2 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:23,638-Speed 6316.34 samples/sec Loss 13.4339 LearningRate 0.0005 Epoch: 2 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:26,885-Speed 6309.37 samples/sec Loss 13.4205 LearningRate 0.0005 Epoch: 2 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:30,142-Speed 6289.36 samples/sec Loss 13.4695 LearningRate 0.0005 Epoch: 2 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:46:33,367-Speed 6353.13 samples/sec Loss 13.4189 LearningRate 0.0005 Epoch: 2 Global Step: 44360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:36,607-Speed 6322.33 samples/sec Loss 13.3640 LearningRate 0.0005 Epoch: 2 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:39,847-Speed 6323.20 samples/sec Loss 13.4393 LearningRate 0.0005 Epoch: 2 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:43,086-Speed 6322.91 samples/sec Loss 13.3797 LearningRate 0.0005 Epoch: 2 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:46,326-Speed 6323.11 samples/sec Loss 13.3892 LearningRate 0.0005 Epoch: 2 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:49,568-Speed 6318.50 samples/sec Loss 13.3243 LearningRate 0.0005 Epoch: 2 Global 
Step: 44410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:52,807-Speed 6324.69 samples/sec Loss 13.3573 LearningRate 0.0005 Epoch: 2 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:56,049-Speed 6316.97 samples/sec Loss 13.4161 LearningRate 0.0005 Epoch: 2 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:46:59,291-Speed 6318.55 samples/sec Loss 13.4273 LearningRate 0.0005 Epoch: 2 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:02,536-Speed 6313.43 samples/sec Loss 13.4295 LearningRate 0.0005 Epoch: 2 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:05,768-Speed 6337.40 samples/sec Loss 13.5310 LearningRate 0.0005 Epoch: 2 Global Step: 44460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:09,014-Speed 6312.61 samples/sec Loss 13.4054 LearningRate 0.0005 Epoch: 2 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:12,258-Speed 6313.07 samples/sec Loss 13.3417 LearningRate 0.0005 Epoch: 2 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:15,502-Speed 6314.80 samples/sec Loss 13.3702 LearningRate 0.0005 Epoch: 2 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:18,742-Speed 6322.40 samples/sec Loss 13.3854 LearningRate 0.0005 Epoch: 2 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:21,984-Speed 6318.17 samples/sec Loss 13.4125 LearningRate 0.0005 Epoch: 2 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:25,229-Speed 6312.44 samples/sec Loss 13.4050 LearningRate 0.0005 Epoch: 2 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:28,470-Speed 6321.25 samples/sec Loss 13.3660 LearningRate 0.0005 Epoch: 2 Global Step: 44530 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 19:47:31,716-Speed 6311.22 samples/sec Loss 13.3118 LearningRate 0.0005 Epoch: 2 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:34,958-Speed 6319.13 samples/sec Loss 13.4485 LearningRate 0.0005 Epoch: 2 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:47:38,197-Speed 6323.76 samples/sec Loss 13.4551 LearningRate 0.0005 Epoch: 2 Global Step: 44560 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:41,447-Speed 6304.14 samples/sec Loss 13.2838 LearningRate 0.0005 Epoch: 2 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:44,762-Speed 6179.28 samples/sec Loss 13.3601 LearningRate 0.0005 Epoch: 2 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:48,007-Speed 6311.01 samples/sec Loss 13.3262 LearningRate 0.0005 Epoch: 2 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:51,254-Speed 6309.64 samples/sec Loss 13.3614 LearningRate 0.0005 Epoch: 2 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:54,496-Speed 6318.65 samples/sec Loss 13.3551 LearningRate 0.0005 Epoch: 2 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:47:57,739-Speed 6316.62 samples/sec Loss 13.3037 LearningRate 0.0005 Epoch: 2 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:00,985-Speed 6309.99 samples/sec Loss 13.3392 LearningRate 0.0005 Epoch: 2 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:04,232-Speed 6307.88 samples/sec Loss 13.3279 LearningRate 0.0005 Epoch: 2 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:07,488-Speed 6292.83 samples/sec Loss 13.3686 LearningRate 0.0005 Epoch: 2 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 71 hours Training: 
2022-03-31 19:48:10,717-Speed 6344.14 samples/sec Loss 13.2875 LearningRate 0.0005 Epoch: 2 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:13,961-Speed 6313.20 samples/sec Loss 13.3901 LearningRate 0.0005 Epoch: 2 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:17,201-Speed 6322.36 samples/sec Loss 13.3280 LearningRate 0.0005 Epoch: 2 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:20,445-Speed 6314.80 samples/sec Loss 13.2925 LearningRate 0.0005 Epoch: 2 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:23,688-Speed 6316.11 samples/sec Loss 13.2853 LearningRate 0.0005 Epoch: 2 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:26,935-Speed 6309.65 samples/sec Loss 13.2315 LearningRate 0.0005 Epoch: 2 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:30,178-Speed 6317.42 samples/sec Loss 13.2661 LearningRate 0.0005 Epoch: 2 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:33,414-Speed 6328.81 samples/sec Loss 13.2948 LearningRate 0.0005 Epoch: 2 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:48:36,646-Speed 6338.89 samples/sec Loss 13.3325 LearningRate 0.0005 Epoch: 2 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:39,890-Speed 6314.96 samples/sec Loss 13.3616 LearningRate 0.0005 Epoch: 2 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:43,132-Speed 6318.88 samples/sec Loss 13.3342 LearningRate 0.0005 Epoch: 2 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:46,378-Speed 6311.71 samples/sec Loss 13.2298 LearningRate 0.0005 Epoch: 2 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:49,621-Speed 
6315.08 samples/sec Loss 13.4481 LearningRate 0.0005 Epoch: 2 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:52,860-Speed 6323.92 samples/sec Loss 13.3584 LearningRate 0.0005 Epoch: 2 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:56,102-Speed 6319.39 samples/sec Loss 13.2388 LearningRate 0.0005 Epoch: 2 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:48:59,345-Speed 6315.81 samples/sec Loss 13.2952 LearningRate 0.0005 Epoch: 2 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:02,589-Speed 6315.87 samples/sec Loss 13.2895 LearningRate 0.0005 Epoch: 2 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:05,836-Speed 6309.01 samples/sec Loss 13.1208 LearningRate 0.0005 Epoch: 2 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:09,078-Speed 6318.24 samples/sec Loss 13.1650 LearningRate 0.0005 Epoch: 2 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:49:12,325-Speed 6309.49 samples/sec Loss 13.2332 LearningRate 0.0005 Epoch: 2 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:49:15,568-Speed 6315.62 samples/sec Loss 13.3001 LearningRate 0.0005 Epoch: 2 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:49:18,812-Speed 6315.14 samples/sec Loss 13.2208 LearningRate 0.0005 Epoch: 2 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:49:22,034-Speed 6356.62 samples/sec Loss 13.1948 LearningRate 0.0005 Epoch: 2 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:25,275-Speed 6320.74 samples/sec Loss 13.3705 LearningRate 0.0005 Epoch: 2 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:28,520-Speed 6312.53 samples/sec Loss 13.2720 
LearningRate 0.0005 Epoch: 2 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:31,760-Speed 6322.23 samples/sec Loss 13.1549 LearningRate 0.0005 Epoch: 2 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:34,999-Speed 6325.67 samples/sec Loss 13.2234 LearningRate 0.0005 Epoch: 2 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:38,247-Speed 6305.15 samples/sec Loss 13.2223 LearningRate 0.0005 Epoch: 2 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:41,488-Speed 6320.79 samples/sec Loss 13.2396 LearningRate 0.0005 Epoch: 2 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:44,730-Speed 6320.66 samples/sec Loss 13.1150 LearningRate 0.0005 Epoch: 2 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:47,974-Speed 6313.72 samples/sec Loss 13.2591 LearningRate 0.0005 Epoch: 2 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:51,215-Speed 6320.91 samples/sec Loss 13.2649 LearningRate 0.0005 Epoch: 2 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:49:54,460-Speed 6312.81 samples/sec Loss 13.2288 LearningRate 0.0005 Epoch: 2 Global Step: 44980 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:49:57,703-Speed 6316.04 samples/sec Loss 13.1136 LearningRate 0.0005 Epoch: 2 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:50:00,939-Speed 6331.40 samples/sec Loss 13.1582 LearningRate 0.0005 Epoch: 2 Global Step: 45000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:04,188-Speed 6304.11 samples/sec Loss 13.2007 LearningRate 0.0005 Epoch: 2 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:07,434-Speed 6311.64 samples/sec Loss 13.1473 LearningRate 0.0005 Epoch: 2 Global Step: 
45020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:10,677-Speed 6316.17 samples/sec Loss 13.1873 LearningRate 0.0005 Epoch: 2 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:13,919-Speed 6318.18 samples/sec Loss 13.2593 LearningRate 0.0005 Epoch: 2 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:17,161-Speed 6318.81 samples/sec Loss 13.1098 LearningRate 0.0005 Epoch: 2 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:20,400-Speed 6324.89 samples/sec Loss 13.1233 LearningRate 0.0005 Epoch: 2 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:23,641-Speed 6320.44 samples/sec Loss 13.2570 LearningRate 0.0005 Epoch: 2 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:26,882-Speed 6319.30 samples/sec Loss 13.2861 LearningRate 0.0005 Epoch: 2 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:30,126-Speed 6314.71 samples/sec Loss 13.2095 LearningRate 0.0005 Epoch: 2 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:33,374-Speed 6307.24 samples/sec Loss 13.2391 LearningRate 0.0005 Epoch: 2 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:50:36,614-Speed 6322.47 samples/sec Loss 13.2659 LearningRate 0.0005 Epoch: 2 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:50:39,859-Speed 6313.12 samples/sec Loss 13.1035 LearningRate 0.0005 Epoch: 2 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:50:43,101-Speed 6318.43 samples/sec Loss 13.1673 LearningRate 0.0005 Epoch: 2 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:50:46,346-Speed 6313.13 samples/sec Loss 13.0443 LearningRate 0.0005 Epoch: 2 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 
71 hours Training: 2022-03-31 19:50:49,578-Speed 6336.72 samples/sec Loss 13.1934 LearningRate 0.0005 Epoch: 2 Global Step: 45150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:52,821-Speed 6317.32 samples/sec Loss 13.1022 LearningRate 0.0005 Epoch: 2 Global Step: 45160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:56,067-Speed 6312.13 samples/sec Loss 13.2138 LearningRate 0.0005 Epoch: 2 Global Step: 45170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:50:59,319-Speed 6298.15 samples/sec Loss 13.2001 LearningRate 0.0005 Epoch: 2 Global Step: 45180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:02,575-Speed 6290.96 samples/sec Loss 13.1212 LearningRate 0.0005 Epoch: 2 Global Step: 45190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:05,826-Speed 6301.71 samples/sec Loss 13.1342 LearningRate 0.0005 Epoch: 2 Global Step: 45200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:09,067-Speed 6320.11 samples/sec Loss 13.2753 LearningRate 0.0005 Epoch: 2 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:12,310-Speed 6318.02 samples/sec Loss 13.1308 LearningRate 0.0005 Epoch: 2 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:15,554-Speed 6313.54 samples/sec Loss 13.2144 LearningRate 0.0005 Epoch: 2 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:18,794-Speed 6321.69 samples/sec Loss 13.2331 LearningRate 0.0005 Epoch: 2 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:22,040-Speed 6311.13 samples/sec Loss 13.1829 LearningRate 0.0005 Epoch: 2 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:51:25,289-Speed 6305.00 samples/sec Loss 13.0774 LearningRate 0.0005 Epoch: 2 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 
19:51:28,534-Speed 6313.24 samples/sec Loss 13.1920 LearningRate 0.0005 Epoch: 2 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:51:31,776-Speed 6317.24 samples/sec Loss 13.0818 LearningRate 0.0005 Epoch: 2 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:51:35,028-Speed 6299.99 samples/sec Loss 13.0686 LearningRate 0.0005 Epoch: 2 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:51:38,258-Speed 6341.96 samples/sec Loss 13.2429 LearningRate 0.0005 Epoch: 2 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:41,499-Speed 6319.92 samples/sec Loss 13.1775 LearningRate 0.0005 Epoch: 2 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:44,741-Speed 6319.03 samples/sec Loss 13.1991 LearningRate 0.0005 Epoch: 2 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:47,983-Speed 6318.89 samples/sec Loss 13.0738 LearningRate 0.0005 Epoch: 2 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:51,224-Speed 6319.62 samples/sec Loss 13.1229 LearningRate 0.0005 Epoch: 2 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:54,463-Speed 6323.52 samples/sec Loss 13.1103 LearningRate 0.0005 Epoch: 2 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:51:57,705-Speed 6320.79 samples/sec Loss 13.2178 LearningRate 0.0005 Epoch: 2 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:00,950-Speed 6311.16 samples/sec Loss 13.0331 LearningRate 0.0005 Epoch: 2 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:04,197-Speed 6309.58 samples/sec Loss 13.1528 LearningRate 0.0005 Epoch: 2 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:07,445-Speed 6307.04 samples/sec 
Loss 13.0637 LearningRate 0.0005 Epoch: 2 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:10,684-Speed 6323.83 samples/sec Loss 13.0390 LearningRate 0.0005 Epoch: 2 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:13,927-Speed 6317.48 samples/sec Loss 13.0534 LearningRate 0.0005 Epoch: 2 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:17,171-Speed 6315.20 samples/sec Loss 13.0844 LearningRate 0.0005 Epoch: 2 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:20,413-Speed 6318.53 samples/sec Loss 13.0955 LearningRate 0.0005 Epoch: 2 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:23,658-Speed 6310.88 samples/sec Loss 13.0943 LearningRate 0.0005 Epoch: 2 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:26,901-Speed 6317.62 samples/sec Loss 13.1379 LearningRate 0.0005 Epoch: 2 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:30,178-Speed 6250.71 samples/sec Loss 13.1478 LearningRate 0.0005 Epoch: 2 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:33,434-Speed 6291.51 samples/sec Loss 13.0948 LearningRate 0.0005 Epoch: 2 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:36,717-Speed 6239.54 samples/sec Loss 13.0909 LearningRate 0.0005 Epoch: 2 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:39,964-Speed 6308.01 samples/sec Loss 13.1406 LearningRate 0.0005 Epoch: 2 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:43,207-Speed 6317.68 samples/sec Loss 13.0213 LearningRate 0.0005 Epoch: 2 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:52:46,470-Speed 6277.67 samples/sec Loss 13.1318 LearningRate 0.0005 Epoch: 2 
Global Step: 45510 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:52:49,714-Speed 6314.37 samples/sec Loss 13.1113 LearningRate 0.0005 Epoch: 2 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:52:52,993-Speed 6246.82 samples/sec Loss 13.1261 LearningRate 0.0005 Epoch: 2 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:52:56,244-Speed 6301.20 samples/sec Loss 13.1095 LearningRate 0.0005 Epoch: 2 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:52:59,489-Speed 6312.53 samples/sec Loss 13.0357 LearningRate 0.0005 Epoch: 2 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:02,734-Speed 6312.50 samples/sec Loss 13.1415 LearningRate 0.0005 Epoch: 2 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:05,977-Speed 6318.24 samples/sec Loss 13.0074 LearningRate 0.0005 Epoch: 2 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:09,224-Speed 6307.28 samples/sec Loss 13.0643 LearningRate 0.0005 Epoch: 2 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:12,514-Speed 6227.62 samples/sec Loss 13.0759 LearningRate 0.0005 Epoch: 2 Global Step: 45590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:15,756-Speed 6319.03 samples/sec Loss 13.1077 LearningRate 0.0005 Epoch: 2 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:18,996-Speed 6320.73 samples/sec Loss 13.0030 LearningRate 0.0005 Epoch: 2 Global Step: 45610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:22,240-Speed 6316.01 samples/sec Loss 13.1459 LearningRate 0.0005 Epoch: 2 Global Step: 45620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:25,487-Speed 6311.10 samples/sec Loss 12.9387 LearningRate 0.0006 Epoch: 2 Global Step: 45630 Fp16 Grad Scale: 
65536 Required: 71 hours Training: 2022-03-31 19:53:28,725-Speed 6326.64 samples/sec Loss 12.9371 LearningRate 0.0006 Epoch: 2 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:31,967-Speed 6318.75 samples/sec Loss 13.0938 LearningRate 0.0006 Epoch: 2 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:35,219-Speed 6299.22 samples/sec Loss 12.9998 LearningRate 0.0006 Epoch: 2 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:38,472-Speed 6296.74 samples/sec Loss 13.1394 LearningRate 0.0006 Epoch: 2 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:41,718-Speed 6311.72 samples/sec Loss 13.0472 LearningRate 0.0006 Epoch: 2 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:44,959-Speed 6319.83 samples/sec Loss 13.0169 LearningRate 0.0006 Epoch: 2 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:53:48,191-Speed 6337.08 samples/sec Loss 13.0344 LearningRate 0.0006 Epoch: 2 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:51,439-Speed 6307.64 samples/sec Loss 13.0334 LearningRate 0.0006 Epoch: 2 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:54,719-Speed 6245.85 samples/sec Loss 13.0125 LearningRate 0.0006 Epoch: 2 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:53:57,966-Speed 6308.45 samples/sec Loss 13.0637 LearningRate 0.0006 Epoch: 2 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:01,207-Speed 6321.07 samples/sec Loss 12.9809 LearningRate 0.0006 Epoch: 2 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:04,456-Speed 6304.52 samples/sec Loss 12.8591 LearningRate 0.0006 Epoch: 2 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 
2022-03-31 19:54:07,701-Speed 6312.57 samples/sec Loss 13.0921 LearningRate 0.0006 Epoch: 2 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:10,947-Speed 6309.97 samples/sec Loss 12.9902 LearningRate 0.0006 Epoch: 2 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:14,196-Speed 6305.54 samples/sec Loss 13.0554 LearningRate 0.0006 Epoch: 2 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:17,445-Speed 6305.54 samples/sec Loss 13.0212 LearningRate 0.0006 Epoch: 2 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:20,695-Speed 6303.35 samples/sec Loss 12.9558 LearningRate 0.0006 Epoch: 2 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:54:23,927-Speed 6338.48 samples/sec Loss 13.0100 LearningRate 0.0006 Epoch: 2 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:27,175-Speed 6306.99 samples/sec Loss 13.0100 LearningRate 0.0006 Epoch: 2 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:30,418-Speed 6315.11 samples/sec Loss 13.0023 LearningRate 0.0006 Epoch: 2 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:33,667-Speed 6305.61 samples/sec Loss 13.0063 LearningRate 0.0006 Epoch: 2 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:36,910-Speed 6316.57 samples/sec Loss 12.8740 LearningRate 0.0006 Epoch: 2 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:40,156-Speed 6310.89 samples/sec Loss 12.9685 LearningRate 0.0006 Epoch: 2 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:43,413-Speed 6289.85 samples/sec Loss 12.9584 LearningRate 0.0006 Epoch: 2 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:46,656-Speed 6315.35 
samples/sec Loss 12.9671 LearningRate 0.0006 Epoch: 2 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:49,903-Speed 6308.88 samples/sec Loss 12.9761 LearningRate 0.0006 Epoch: 2 Global Step: 45890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:53,150-Speed 6308.71 samples/sec Loss 13.1016 LearningRate 0.0006 Epoch: 2 Global Step: 45900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:54:56,390-Speed 6323.27 samples/sec Loss 13.0963 LearningRate 0.0006 Epoch: 2 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:54:59,635-Speed 6312.01 samples/sec Loss 13.0309 LearningRate 0.0006 Epoch: 2 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:02,877-Speed 6318.07 samples/sec Loss 13.0307 LearningRate 0.0006 Epoch: 2 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:06,110-Speed 6335.75 samples/sec Loss 12.8690 LearningRate 0.0006 Epoch: 2 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:09,354-Speed 6315.13 samples/sec Loss 12.9445 LearningRate 0.0006 Epoch: 2 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:12,598-Speed 6314.95 samples/sec Loss 12.9240 LearningRate 0.0006 Epoch: 2 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:15,840-Speed 6318.68 samples/sec Loss 13.0450 LearningRate 0.0006 Epoch: 2 Global Step: 45970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:19,086-Speed 6311.51 samples/sec Loss 13.0469 LearningRate 0.0006 Epoch: 2 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:22,331-Speed 6312.80 samples/sec Loss 12.9906 LearningRate 0.0006 Epoch: 2 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:25,574-Speed 6315.50 samples/sec Loss 12.9459 LearningRate 
0.0006 Epoch: 2 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:28,819-Speed 6314.34 samples/sec Loss 12.8591 LearningRate 0.0006 Epoch: 2 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:32,063-Speed 6314.72 samples/sec Loss 12.8978 LearningRate 0.0006 Epoch: 2 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:35,367-Speed 6198.61 samples/sec Loss 12.9411 LearningRate 0.0006 Epoch: 2 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:55:38,628-Speed 6282.14 samples/sec Loss 12.9137 LearningRate 0.0006 Epoch: 2 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:41,871-Speed 6317.49 samples/sec Loss 12.9824 LearningRate 0.0006 Epoch: 2 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:45,113-Speed 6318.75 samples/sec Loss 12.9833 LearningRate 0.0006 Epoch: 2 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:48,361-Speed 6305.56 samples/sec Loss 12.9396 LearningRate 0.0006 Epoch: 2 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:51,618-Speed 6289.89 samples/sec Loss 12.9199 LearningRate 0.0006 Epoch: 2 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:54,864-Speed 6311.03 samples/sec Loss 12.8229 LearningRate 0.0006 Epoch: 2 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:55:58,106-Speed 6318.32 samples/sec Loss 12.9197 LearningRate 0.0006 Epoch: 2 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:56:01,352-Speed 6311.05 samples/sec Loss 12.8899 LearningRate 0.0006 Epoch: 2 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:56:04,592-Speed 6320.95 samples/sec Loss 12.9064 LearningRate 0.0006 Epoch: 2 Global Step: 46120 
Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:07,834-Speed 6318.40 samples/sec Loss 12.9368 LearningRate 0.0006 Epoch: 2 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:11,079-Speed 6313.78 samples/sec Loss 12.8214 LearningRate 0.0006 Epoch: 2 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:14,320-Speed 6319.29 samples/sec Loss 12.8963 LearningRate 0.0006 Epoch: 2 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:17,563-Speed 6318.92 samples/sec Loss 12.9674 LearningRate 0.0006 Epoch: 2 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:20,806-Speed 6316.09 samples/sec Loss 12.8627 LearningRate 0.0006 Epoch: 2 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:24,050-Speed 6315.30 samples/sec Loss 12.9396 LearningRate 0.0006 Epoch: 2 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:27,294-Speed 6314.35 samples/sec Loss 12.9029 LearningRate 0.0006 Epoch: 2 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:30,536-Speed 6318.72 samples/sec Loss 12.8871 LearningRate 0.0006 Epoch: 2 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:33,782-Speed 6309.84 samples/sec Loss 12.8502 LearningRate 0.0006 Epoch: 2 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:37,026-Speed 6315.53 samples/sec Loss 12.9356 LearningRate 0.0006 Epoch: 2 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:56:40,272-Speed 6309.79 samples/sec Loss 12.8688 LearningRate 0.0006 Epoch: 2 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:56:43,505-Speed 6337.48 samples/sec Loss 12.8232 LearningRate 0.0006 Epoch: 2 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 71 hours 
Training: 2022-03-31 19:56:46,746-Speed 6319.86 samples/sec Loss 12.7917 LearningRate 0.0006 Epoch: 2 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:50,003-Speed 6289.58 samples/sec Loss 13.0326 LearningRate 0.0006 Epoch: 2 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:53,287-Speed 6237.18 samples/sec Loss 12.8233 LearningRate 0.0006 Epoch: 2 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:56,537-Speed 6303.17 samples/sec Loss 12.8108 LearningRate 0.0006 Epoch: 2 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:56:59,781-Speed 6315.09 samples/sec Loss 12.8947 LearningRate 0.0006 Epoch: 2 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:03,026-Speed 6312.78 samples/sec Loss 12.7490 LearningRate 0.0006 Epoch: 2 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:06,281-Speed 6293.01 samples/sec Loss 12.8879 LearningRate 0.0006 Epoch: 2 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:09,525-Speed 6313.35 samples/sec Loss 12.9123 LearningRate 0.0006 Epoch: 2 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:12,781-Speed 6292.18 samples/sec Loss 12.9570 LearningRate 0.0006 Epoch: 2 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:16,123-Speed 6129.31 samples/sec Loss 12.7892 LearningRate 0.0006 Epoch: 2 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:19,405-Speed 6242.20 samples/sec Loss 12.8553 LearningRate 0.0006 Epoch: 2 Global Step: 46350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:22,655-Speed 6301.78 samples/sec Loss 12.9878 LearningRate 0.0006 Epoch: 2 Global Step: 46360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:25,903-Speed 
6307.75 samples/sec Loss 12.9579 LearningRate 0.0006 Epoch: 2 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:29,151-Speed 6306.83 samples/sec Loss 12.8825 LearningRate 0.0006 Epoch: 2 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:32,394-Speed 6316.75 samples/sec Loss 12.8002 LearningRate 0.0006 Epoch: 2 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:35,640-Speed 6311.09 samples/sec Loss 12.9046 LearningRate 0.0006 Epoch: 2 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:38,927-Speed 6231.88 samples/sec Loss 12.8837 LearningRate 0.0006 Epoch: 2 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:42,189-Speed 6280.54 samples/sec Loss 12.8291 LearningRate 0.0006 Epoch: 2 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:45,431-Speed 6317.67 samples/sec Loss 12.8520 LearningRate 0.0006 Epoch: 2 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:57:48,676-Speed 6315.36 samples/sec Loss 12.8523 LearningRate 0.0006 Epoch: 2 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:57:51,927-Speed 6300.48 samples/sec Loss 12.7494 LearningRate 0.0006 Epoch: 2 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:57:55,173-Speed 6310.66 samples/sec Loss 12.9323 LearningRate 0.0006 Epoch: 2 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:57:58,408-Speed 6332.01 samples/sec Loss 12.9522 LearningRate 0.0006 Epoch: 2 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:01,656-Speed 6307.63 samples/sec Loss 12.8191 LearningRate 0.0006 Epoch: 2 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:04,902-Speed 6311.12 samples/sec Loss 12.8245 
LearningRate 0.0006 Epoch: 2 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:08,145-Speed 6315.47 samples/sec Loss 12.7271 LearningRate 0.0006 Epoch: 2 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:11,389-Speed 6315.12 samples/sec Loss 12.7741 LearningRate 0.0006 Epoch: 2 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:14,633-Speed 6314.83 samples/sec Loss 12.8100 LearningRate 0.0006 Epoch: 2 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:17,879-Speed 6310.12 samples/sec Loss 12.8591 LearningRate 0.0006 Epoch: 2 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:21,124-Speed 6312.43 samples/sec Loss 12.7519 LearningRate 0.0006 Epoch: 2 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:24,370-Speed 6309.78 samples/sec Loss 12.7561 LearningRate 0.0006 Epoch: 2 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:27,615-Speed 6312.87 samples/sec Loss 12.7552 LearningRate 0.0006 Epoch: 2 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:30,860-Speed 6314.22 samples/sec Loss 12.7791 LearningRate 0.0006 Epoch: 2 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:58:34,094-Speed 6335.08 samples/sec Loss 12.8497 LearningRate 0.0006 Epoch: 2 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:37,338-Speed 6314.07 samples/sec Loss 12.8499 LearningRate 0.0006 Epoch: 2 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:40,584-Speed 6309.70 samples/sec Loss 12.7302 LearningRate 0.0006 Epoch: 2 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:43,827-Speed 6317.62 samples/sec Loss 12.7559 LearningRate 0.0006 Epoch: 2 Global Step: 
46610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:47,098-Speed 6261.66 samples/sec Loss 12.7861 LearningRate 0.0006 Epoch: 2 Global Step: 46620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:50,340-Speed 6319.34 samples/sec Loss 12.9050 LearningRate 0.0006 Epoch: 2 Global Step: 46630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:53,596-Speed 6291.95 samples/sec Loss 12.9100 LearningRate 0.0006 Epoch: 2 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:58:56,839-Speed 6314.93 samples/sec Loss 12.7866 LearningRate 0.0006 Epoch: 2 Global Step: 46650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:00,081-Speed 6319.70 samples/sec Loss 12.9242 LearningRate 0.0006 Epoch: 2 Global Step: 46660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:03,324-Speed 6315.61 samples/sec Loss 12.7893 LearningRate 0.0006 Epoch: 2 Global Step: 46670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:06,569-Speed 6314.35 samples/sec Loss 12.7394 LearningRate 0.0006 Epoch: 2 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:09,813-Speed 6313.77 samples/sec Loss 12.7028 LearningRate 0.0006 Epoch: 2 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:13,045-Speed 6338.35 samples/sec Loss 12.7470 LearningRate 0.0006 Epoch: 2 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:16,294-Speed 6303.78 samples/sec Loss 12.8207 LearningRate 0.0006 Epoch: 2 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:19,538-Speed 6316.00 samples/sec Loss 12.7430 LearningRate 0.0006 Epoch: 2 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:22,776-Speed 6325.06 samples/sec Loss 12.8536 LearningRate 0.0006 Epoch: 2 Global Step: 46730 Fp16 Grad Scale: 65536 Required: 71 
hours Training: 2022-03-31 19:59:26,021-Speed 6312.04 samples/sec Loss 12.8057 LearningRate 0.0006 Epoch: 2 Global Step: 46740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:29,262-Speed 6320.65 samples/sec Loss 12.7152 LearningRate 0.0006 Epoch: 2 Global Step: 46750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:32,506-Speed 6316.30 samples/sec Loss 12.7579 LearningRate 0.0006 Epoch: 2 Global Step: 46760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:35,750-Speed 6314.42 samples/sec Loss 12.7131 LearningRate 0.0006 Epoch: 2 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:38,989-Speed 6325.20 samples/sec Loss 12.7844 LearningRate 0.0006 Epoch: 2 Global Step: 46780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:42,233-Speed 6313.88 samples/sec Loss 12.8489 LearningRate 0.0006 Epoch: 2 Global Step: 46790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 19:59:45,485-Speed 6300.16 samples/sec Loss 12.8013 LearningRate 0.0006 Epoch: 2 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:48,728-Speed 6315.93 samples/sec Loss 12.7032 LearningRate 0.0006 Epoch: 2 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:51,973-Speed 6313.23 samples/sec Loss 12.6411 LearningRate 0.0006 Epoch: 2 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:55,213-Speed 6320.65 samples/sec Loss 12.6186 LearningRate 0.0006 Epoch: 2 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 19:59:58,443-Speed 6343.46 samples/sec Loss 12.7318 LearningRate 0.0006 Epoch: 2 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:01,691-Speed 6307.20 samples/sec Loss 12.6448 LearningRate 0.0006 Epoch: 2 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:00:04,942-Speed 6300.89 samples/sec Loss 12.6769 LearningRate 0.0006 Epoch: 2 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:08,199-Speed 6289.24 samples/sec Loss 12.7691 LearningRate 0.0006 Epoch: 2 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:11,444-Speed 6312.91 samples/sec Loss 12.6899 LearningRate 0.0006 Epoch: 2 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:14,690-Speed 6309.91 samples/sec Loss 12.8135 LearningRate 0.0006 Epoch: 2 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:17,935-Speed 6312.84 samples/sec Loss 12.7584 LearningRate 0.0006 Epoch: 2 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:21,179-Speed 6315.25 samples/sec Loss 12.7418 LearningRate 0.0006 Epoch: 2 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:24,423-Speed 6313.92 samples/sec Loss 12.7374 LearningRate 0.0006 Epoch: 2 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:27,670-Speed 6308.97 samples/sec Loss 12.7037 LearningRate 0.0006 Epoch: 2 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:30,898-Speed 6344.87 samples/sec Loss 12.6404 LearningRate 0.0006 Epoch: 2 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:34,146-Speed 6307.38 samples/sec Loss 12.7096 LearningRate 0.0006 Epoch: 2 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:37,391-Speed 6312.85 samples/sec Loss 12.6988 LearningRate 0.0006 Epoch: 2 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:40,634-Speed 6317.32 samples/sec Loss 12.6909 LearningRate 0.0006 Epoch: 2 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:43,874-Speed 6323.27 samples/sec Loss 
12.7572 LearningRate 0.0006 Epoch: 2 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:47,122-Speed 6306.37 samples/sec Loss 12.6635 LearningRate 0.0006 Epoch: 2 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:50,365-Speed 6315.55 samples/sec Loss 12.7357 LearningRate 0.0006 Epoch: 2 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:53,608-Speed 6316.44 samples/sec Loss 12.7743 LearningRate 0.0006 Epoch: 2 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:00:56,852-Speed 6315.16 samples/sec Loss 12.7073 LearningRate 0.0006 Epoch: 2 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:00,099-Speed 6309.34 samples/sec Loss 12.6915 LearningRate 0.0006 Epoch: 2 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:03,343-Speed 6313.87 samples/sec Loss 12.7310 LearningRate 0.0006 Epoch: 2 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:01:06,587-Speed 6315.71 samples/sec Loss 12.7281 LearningRate 0.0006 Epoch: 2 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:01:09,830-Speed 6315.84 samples/sec Loss 12.7438 LearningRate 0.0006 Epoch: 2 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:01:13,058-Speed 6346.67 samples/sec Loss 12.6787 LearningRate 0.0006 Epoch: 2 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:16,302-Speed 6313.11 samples/sec Loss 12.7474 LearningRate 0.0006 Epoch: 2 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:19,545-Speed 6318.67 samples/sec Loss 12.7222 LearningRate 0.0006 Epoch: 2 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:22,787-Speed 6318.22 samples/sec Loss 12.7612 LearningRate 0.0006 Epoch: 2 
Global Step: 47100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:26,030-Speed 6315.88 samples/sec Loss 12.6659 LearningRate 0.0006 Epoch: 2 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:29,274-Speed 6313.63 samples/sec Loss 12.7025 LearningRate 0.0006 Epoch: 2 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:32,518-Speed 6315.10 samples/sec Loss 12.6484 LearningRate 0.0006 Epoch: 2 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:35,761-Speed 6316.79 samples/sec Loss 12.7409 LearningRate 0.0006 Epoch: 2 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:39,007-Speed 6310.65 samples/sec Loss 12.6039 LearningRate 0.0006 Epoch: 2 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:42,253-Speed 6311.71 samples/sec Loss 12.7043 LearningRate 0.0006 Epoch: 2 Global Step: 47160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:45,495-Speed 6318.83 samples/sec Loss 12.5320 LearningRate 0.0006 Epoch: 2 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:01:48,741-Speed 6310.88 samples/sec Loss 12.7528 LearningRate 0.0006 Epoch: 2 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:51,986-Speed 6312.68 samples/sec Loss 12.5833 LearningRate 0.0006 Epoch: 2 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:55,229-Speed 6316.75 samples/sec Loss 12.6268 LearningRate 0.0006 Epoch: 2 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:01:58,483-Speed 6293.95 samples/sec Loss 12.6616 LearningRate 0.0006 Epoch: 2 Global Step: 47210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:01,729-Speed 6312.50 samples/sec Loss 12.5281 LearningRate 0.0006 Epoch: 2 Global Step: 47220 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 20:02:04,973-Speed 6313.08 samples/sec Loss 12.6239 LearningRate 0.0006 Epoch: 2 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:08,215-Speed 6319.73 samples/sec Loss 12.5856 LearningRate 0.0006 Epoch: 2 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:11,461-Speed 6309.83 samples/sec Loss 12.6457 LearningRate 0.0006 Epoch: 2 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:14,700-Speed 6325.40 samples/sec Loss 12.6727 LearningRate 0.0006 Epoch: 2 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:17,945-Speed 6312.32 samples/sec Loss 12.5973 LearningRate 0.0006 Epoch: 2 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:21,172-Speed 6347.96 samples/sec Loss 12.5568 LearningRate 0.0006 Epoch: 2 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:24,417-Speed 6312.64 samples/sec Loss 12.6099 LearningRate 0.0006 Epoch: 2 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:27,660-Speed 6315.53 samples/sec Loss 12.6506 LearningRate 0.0006 Epoch: 2 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:30,898-Speed 6326.80 samples/sec Loss 12.6057 LearningRate 0.0006 Epoch: 2 Global Step: 47310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:34,141-Speed 6316.43 samples/sec Loss 12.7011 LearningRate 0.0006 Epoch: 2 Global Step: 47320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:37,381-Speed 6322.12 samples/sec Loss 12.6555 LearningRate 0.0006 Epoch: 2 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:40,625-Speed 6314.88 samples/sec Loss 12.6272 LearningRate 0.0006 Epoch: 2 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:02:43,866-Speed 6319.58 samples/sec Loss 12.6272 LearningRate 0.0006 Epoch: 2 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:47,110-Speed 6315.77 samples/sec Loss 12.6997 LearningRate 0.0006 Epoch: 2 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:50,356-Speed 6310.04 samples/sec Loss 12.6001 LearningRate 0.0006 Epoch: 2 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:02:53,598-Speed 6319.45 samples/sec Loss 12.6085 LearningRate 0.0006 Epoch: 2 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:02:56,841-Speed 6316.88 samples/sec Loss 12.5737 LearningRate 0.0006 Epoch: 2 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:03:00,068-Speed 6346.66 samples/sec Loss 12.5990 LearningRate 0.0006 Epoch: 2 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:03,310-Speed 6319.31 samples/sec Loss 12.6067 LearningRate 0.0006 Epoch: 2 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:06,551-Speed 6319.90 samples/sec Loss 12.6296 LearningRate 0.0006 Epoch: 2 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:09,794-Speed 6317.34 samples/sec Loss 12.5759 LearningRate 0.0006 Epoch: 2 Global Step: 47430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:13,036-Speed 6318.46 samples/sec Loss 12.7020 LearningRate 0.0006 Epoch: 2 Global Step: 47440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:16,278-Speed 6317.35 samples/sec Loss 12.6430 LearningRate 0.0006 Epoch: 2 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:19,520-Speed 6319.45 samples/sec Loss 12.6131 LearningRate 0.0006 Epoch: 2 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:22,785-Speed 6273.79 samples/sec 
Loss 12.5974 LearningRate 0.0006 Epoch: 2 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:26,030-Speed 6312.74 samples/sec Loss 12.6281 LearningRate 0.0006 Epoch: 2 Global Step: 47480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:29,272-Speed 6319.29 samples/sec Loss 12.5736 LearningRate 0.0006 Epoch: 2 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:32,503-Speed 6339.86 samples/sec Loss 12.6569 LearningRate 0.0006 Epoch: 2 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:35,749-Speed 6310.33 samples/sec Loss 12.6243 LearningRate 0.0006 Epoch: 2 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:38,991-Speed 6317.35 samples/sec Loss 12.5000 LearningRate 0.0006 Epoch: 2 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:42,233-Speed 6318.92 samples/sec Loss 12.5717 LearningRate 0.0006 Epoch: 2 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:45,480-Speed 6309.16 samples/sec Loss 12.4976 LearningRate 0.0006 Epoch: 2 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:48,721-Speed 6321.04 samples/sec Loss 12.6183 LearningRate 0.0006 Epoch: 2 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:51,965-Speed 6314.51 samples/sec Loss 12.5576 LearningRate 0.0006 Epoch: 2 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:55,212-Speed 6309.80 samples/sec Loss 12.6119 LearningRate 0.0006 Epoch: 2 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:03:58,454-Speed 6318.08 samples/sec Loss 12.4808 LearningRate 0.0006 Epoch: 2 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:01,698-Speed 6314.90 samples/sec Loss 12.5775 LearningRate 0.0006 Epoch: 2 
Global Step: 47590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:04,924-Speed 6350.11 samples/sec Loss 12.5143 LearningRate 0.0006 Epoch: 2 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:08,164-Speed 6321.66 samples/sec Loss 12.5812 LearningRate 0.0006 Epoch: 2 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:11,408-Speed 6314.55 samples/sec Loss 12.5543 LearningRate 0.0006 Epoch: 2 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:14,676-Speed 6268.90 samples/sec Loss 12.5769 LearningRate 0.0006 Epoch: 2 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:17,988-Speed 6183.58 samples/sec Loss 12.4967 LearningRate 0.0006 Epoch: 2 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:21,246-Speed 6288.54 samples/sec Loss 12.5528 LearningRate 0.0006 Epoch: 2 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:24,489-Speed 6317.12 samples/sec Loss 12.5746 LearningRate 0.0006 Epoch: 2 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:27,732-Speed 6315.26 samples/sec Loss 12.5294 LearningRate 0.0006 Epoch: 2 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:30,977-Speed 6312.90 samples/sec Loss 12.6386 LearningRate 0.0006 Epoch: 2 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:34,218-Speed 6320.81 samples/sec Loss 12.6031 LearningRate 0.0006 Epoch: 2 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:37,464-Speed 6309.59 samples/sec Loss 12.6690 LearningRate 0.0006 Epoch: 2 Global Step: 47700 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:04:40,709-Speed 6313.90 samples/sec Loss 12.4345 LearningRate 0.0006 Epoch: 2 Global Step: 47710 Fp16 Grad Scale: 131072 
Required: 71 hours Training: 2022-03-31 20:04:43,939-Speed 6342.17 samples/sec Loss 12.5249 LearningRate 0.0006 Epoch: 2 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:47,185-Speed 6310.22 samples/sec Loss 12.5971 LearningRate 0.0006 Epoch: 2 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:50,424-Speed 6324.94 samples/sec Loss 12.5619 LearningRate 0.0006 Epoch: 2 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:53,667-Speed 6315.94 samples/sec Loss 12.4229 LearningRate 0.0006 Epoch: 2 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:04:56,913-Speed 6310.64 samples/sec Loss 12.5914 LearningRate 0.0006 Epoch: 2 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:00,169-Speed 6291.89 samples/sec Loss 12.5707 LearningRate 0.0006 Epoch: 2 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:03,410-Speed 6320.48 samples/sec Loss 12.5240 LearningRate 0.0006 Epoch: 2 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:06,653-Speed 6317.66 samples/sec Loss 12.5874 LearningRate 0.0006 Epoch: 2 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:09,896-Speed 6315.43 samples/sec Loss 12.5275 LearningRate 0.0006 Epoch: 2 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:13,140-Speed 6315.03 samples/sec Loss 12.5314 LearningRate 0.0006 Epoch: 2 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:16,386-Speed 6309.32 samples/sec Loss 12.4506 LearningRate 0.0006 Epoch: 2 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:05:19,635-Speed 6306.94 samples/sec Loss 12.5256 LearningRate 0.0006 Epoch: 2 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 
20:05:22,862-Speed 6347.16 samples/sec Loss 12.5060 LearningRate 0.0006 Epoch: 2 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:26,107-Speed 6312.23 samples/sec Loss 12.4437 LearningRate 0.0006 Epoch: 2 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:29,356-Speed 6305.86 samples/sec Loss 12.3905 LearningRate 0.0006 Epoch: 2 Global Step: 47860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:32,596-Speed 6322.03 samples/sec Loss 12.5825 LearningRate 0.0006 Epoch: 2 Global Step: 47870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:35,839-Speed 6316.31 samples/sec Loss 12.3995 LearningRate 0.0006 Epoch: 2 Global Step: 47880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:39,082-Speed 6315.66 samples/sec Loss 12.4374 LearningRate 0.0006 Epoch: 2 Global Step: 47890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:42,333-Speed 6302.33 samples/sec Loss 12.4296 LearningRate 0.0006 Epoch: 2 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:45,584-Speed 6299.15 samples/sec Loss 12.5015 LearningRate 0.0006 Epoch: 2 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:48,852-Speed 6269.03 samples/sec Loss 12.4209 LearningRate 0.0006 Epoch: 2 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:52,095-Speed 6316.42 samples/sec Loss 12.4830 LearningRate 0.0006 Epoch: 2 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:05:55,343-Speed 6308.19 samples/sec Loss 12.4782 LearningRate 0.0006 Epoch: 2 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:05:58,575-Speed 6337.95 samples/sec Loss 12.4085 LearningRate 0.0006 Epoch: 2 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:01,820-Speed 6311.37 samples/sec 
Loss 12.4064 LearningRate 0.0006 Epoch: 2 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:05,065-Speed 6314.50 samples/sec Loss 12.4814 LearningRate 0.0006 Epoch: 2 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:08,308-Speed 6316.09 samples/sec Loss 12.4671 LearningRate 0.0006 Epoch: 2 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:11,551-Speed 6317.44 samples/sec Loss 12.4325 LearningRate 0.0006 Epoch: 2 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:14,793-Speed 6317.26 samples/sec Loss 12.5100 LearningRate 0.0006 Epoch: 2 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:18,035-Speed 6318.44 samples/sec Loss 12.4505 LearningRate 0.0006 Epoch: 2 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:21,278-Speed 6317.40 samples/sec Loss 12.3329 LearningRate 0.0006 Epoch: 2 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:24,519-Speed 6320.41 samples/sec Loss 12.4307 LearningRate 0.0006 Epoch: 2 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:27,769-Speed 6303.40 samples/sec Loss 12.5165 LearningRate 0.0006 Epoch: 2 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:30,997-Speed 6344.81 samples/sec Loss 12.5318 LearningRate 0.0006 Epoch: 2 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:34,241-Speed 6315.61 samples/sec Loss 12.4321 LearningRate 0.0006 Epoch: 2 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:37,484-Speed 6315.89 samples/sec Loss 12.3542 LearningRate 0.0006 Epoch: 2 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:40,728-Speed 6315.31 samples/sec Loss 12.4570 LearningRate 0.0006 Epoch: 2 
Global Step: 48080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:43,973-Speed 6310.89 samples/sec Loss 12.4123 LearningRate 0.0006 Epoch: 2 Global Step: 48090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:47,220-Speed 6309.27 samples/sec Loss 12.5236 LearningRate 0.0006 Epoch: 2 Global Step: 48100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:50,465-Speed 6312.56 samples/sec Loss 12.3973 LearningRate 0.0006 Epoch: 2 Global Step: 48110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:53,711-Speed 6310.58 samples/sec Loss 12.3229 LearningRate 0.0006 Epoch: 2 Global Step: 48120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:06:56,955-Speed 6315.28 samples/sec Loss 12.4386 LearningRate 0.0006 Epoch: 2 Global Step: 48130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:00,202-Speed 6308.56 samples/sec Loss 12.4961 LearningRate 0.0006 Epoch: 2 Global Step: 48140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:03,437-Speed 6332.71 samples/sec Loss 12.4536 LearningRate 0.0006 Epoch: 2 Global Step: 48150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:06,682-Speed 6313.14 samples/sec Loss 12.4508 LearningRate 0.0006 Epoch: 2 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:09,927-Speed 6312.98 samples/sec Loss 12.3733 LearningRate 0.0006 Epoch: 2 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:13,168-Speed 6319.91 samples/sec Loss 12.4678 LearningRate 0.0006 Epoch: 2 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:16,417-Speed 6304.88 samples/sec Loss 12.4788 LearningRate 0.0006 Epoch: 2 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:19,657-Speed 6322.26 samples/sec Loss 12.5097 LearningRate 0.0006 Epoch: 2 Global Step: 48200 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 20:07:22,904-Speed 6310.27 samples/sec Loss 12.4573 LearningRate 0.0006 Epoch: 2 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:26,144-Speed 6322.61 samples/sec Loss 12.3992 LearningRate 0.0006 Epoch: 2 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:29,389-Speed 6312.40 samples/sec Loss 12.4761 LearningRate 0.0006 Epoch: 2 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:32,632-Speed 6316.06 samples/sec Loss 12.4127 LearningRate 0.0006 Epoch: 2 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:35,875-Speed 6315.49 samples/sec Loss 12.4334 LearningRate 0.0006 Epoch: 2 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:07:39,110-Speed 6333.60 samples/sec Loss 12.4191 LearningRate 0.0006 Epoch: 2 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:42,355-Speed 6312.53 samples/sec Loss 12.4660 LearningRate 0.0006 Epoch: 2 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:45,600-Speed 6312.84 samples/sec Loss 12.2709 LearningRate 0.0006 Epoch: 2 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:48,845-Speed 6312.75 samples/sec Loss 12.3128 LearningRate 0.0006 Epoch: 2 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:52,087-Speed 6318.38 samples/sec Loss 12.3879 LearningRate 0.0006 Epoch: 2 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:55,335-Speed 6306.50 samples/sec Loss 12.4281 LearningRate 0.0006 Epoch: 2 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:07:58,576-Speed 6320.64 samples/sec Loss 12.4447 LearningRate 0.0006 Epoch: 2 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:08:01,816-Speed 6322.08 samples/sec Loss 12.3378 LearningRate 0.0006 Epoch: 2 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:05,062-Speed 6309.71 samples/sec Loss 12.3669 LearningRate 0.0006 Epoch: 2 Global Step: 48340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:08,305-Speed 6317.66 samples/sec Loss 12.4342 LearningRate 0.0006 Epoch: 2 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:11,552-Speed 6309.21 samples/sec Loss 12.4260 LearningRate 0.0006 Epoch: 2 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:08:14,781-Speed 6344.21 samples/sec Loss 12.3772 LearningRate 0.0006 Epoch: 2 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:18,021-Speed 6321.74 samples/sec Loss 12.4509 LearningRate 0.0006 Epoch: 2 Global Step: 48380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:21,264-Speed 6317.98 samples/sec Loss 12.4193 LearningRate 0.0006 Epoch: 2 Global Step: 48390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:24,511-Speed 6307.68 samples/sec Loss 12.3692 LearningRate 0.0006 Epoch: 2 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:27,754-Speed 6317.32 samples/sec Loss 12.4905 LearningRate 0.0006 Epoch: 2 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:30,991-Speed 6327.48 samples/sec Loss 12.4188 LearningRate 0.0006 Epoch: 2 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:34,233-Speed 6318.22 samples/sec Loss 12.4134 LearningRate 0.0006 Epoch: 2 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:37,475-Speed 6319.86 samples/sec Loss 12.3089 LearningRate 0.0006 Epoch: 2 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:40,718-Speed 6315.37 samples/sec 
Loss 12.2963 LearningRate 0.0006 Epoch: 2 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:43,960-Speed 6317.84 samples/sec Loss 12.3780 LearningRate 0.0006 Epoch: 2 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:08:47,204-Speed 6314.87 samples/sec Loss 12.3303 LearningRate 0.0006 Epoch: 2 Global Step: 48470 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:08:50,446-Speed 6318.26 samples/sec Loss 12.3608 LearningRate 0.0006 Epoch: 2 Global Step: 48480 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:08:53,692-Speed 6312.54 samples/sec Loss 12.4034 LearningRate 0.0006 Epoch: 2 Global Step: 48490 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:08:56,923-Speed 6338.45 samples/sec Loss 12.4101 LearningRate 0.0006 Epoch: 2 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:00,163-Speed 6322.30 samples/sec Loss 12.3653 LearningRate 0.0006 Epoch: 2 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:03,410-Speed 6309.75 samples/sec Loss 12.2801 LearningRate 0.0006 Epoch: 2 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:06,654-Speed 6314.30 samples/sec Loss 12.3560 LearningRate 0.0006 Epoch: 2 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:09,899-Speed 6312.26 samples/sec Loss 12.4490 LearningRate 0.0006 Epoch: 2 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:13,150-Speed 6301.77 samples/sec Loss 12.3306 LearningRate 0.0006 Epoch: 2 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:16,393-Speed 6316.50 samples/sec Loss 12.4296 LearningRate 0.0006 Epoch: 2 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:19,637-Speed 6316.39 samples/sec Loss 12.3729 LearningRate 0.0006 Epoch: 
2 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:22,878-Speed 6319.47 samples/sec Loss 12.4135 LearningRate 0.0006 Epoch: 2 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:26,125-Speed 6309.77 samples/sec Loss 12.3039 LearningRate 0.0006 Epoch: 2 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:29,448-Speed 6164.74 samples/sec Loss 12.3684 LearningRate 0.0006 Epoch: 2 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:09:32,690-Speed 6317.53 samples/sec Loss 12.3408 LearningRate 0.0006 Epoch: 2 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:09:35,933-Speed 6317.60 samples/sec Loss 12.3167 LearningRate 0.0006 Epoch: 2 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:09:39,163-Speed 6342.45 samples/sec Loss 12.3133 LearningRate 0.0006 Epoch: 2 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:42,405-Speed 6317.00 samples/sec Loss 12.2633 LearningRate 0.0006 Epoch: 2 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:45,649-Speed 6314.47 samples/sec Loss 12.4145 LearningRate 0.0006 Epoch: 2 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:48,891-Speed 6319.69 samples/sec Loss 12.3623 LearningRate 0.0006 Epoch: 2 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:52,134-Speed 6315.70 samples/sec Loss 12.2713 LearningRate 0.0006 Epoch: 2 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:55,375-Speed 6320.79 samples/sec Loss 12.3212 LearningRate 0.0006 Epoch: 2 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:09:58,620-Speed 6313.51 samples/sec Loss 12.3021 LearningRate 0.0006 Epoch: 2 Global Step: 48690 Fp16 Grad Scale: 
65536 Required: 71 hours Training: 2022-03-31 20:10:01,864-Speed 6314.13 samples/sec Loss 12.4148 LearningRate 0.0006 Epoch: 2 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:05,110-Speed 6310.14 samples/sec Loss 12.4177 LearningRate 0.0006 Epoch: 2 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:08,353-Speed 6317.09 samples/sec Loss 12.3271 LearningRate 0.0006 Epoch: 2 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:11,594-Speed 6320.08 samples/sec Loss 12.3777 LearningRate 0.0006 Epoch: 2 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:10:14,843-Speed 6304.99 samples/sec Loss 12.3331 LearningRate 0.0006 Epoch: 2 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:10:18,073-Speed 6343.05 samples/sec Loss 12.3503 LearningRate 0.0006 Epoch: 2 Global Step: 48750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:21,317-Speed 6315.05 samples/sec Loss 12.3871 LearningRate 0.0006 Epoch: 2 Global Step: 48760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:24,601-Speed 6237.29 samples/sec Loss 12.2632 LearningRate 0.0006 Epoch: 2 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:27,854-Speed 6297.74 samples/sec Loss 12.3512 LearningRate 0.0006 Epoch: 2 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:31,104-Speed 6305.82 samples/sec Loss 12.2186 LearningRate 0.0006 Epoch: 2 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:34,348-Speed 6313.50 samples/sec Loss 12.3641 LearningRate 0.0006 Epoch: 2 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:37,590-Speed 6319.11 samples/sec Loss 12.3329 LearningRate 0.0006 Epoch: 2 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 
2022-03-31 20:10:40,834-Speed 6313.74 samples/sec Loss 12.2858 LearningRate 0.0006 Epoch: 2 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:44,076-Speed 6319.80 samples/sec Loss 12.3673 LearningRate 0.0006 Epoch: 2 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:47,319-Speed 6316.47 samples/sec Loss 12.2345 LearningRate 0.0006 Epoch: 2 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:50,563-Speed 6314.05 samples/sec Loss 12.3288 LearningRate 0.0006 Epoch: 2 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:10:53,791-Speed 6346.51 samples/sec Loss 12.2470 LearningRate 0.0006 Epoch: 2 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:10:57,036-Speed 6312.13 samples/sec Loss 12.1762 LearningRate 0.0006 Epoch: 2 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:00,281-Speed 6311.60 samples/sec Loss 12.2592 LearningRate 0.0006 Epoch: 2 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:03,531-Speed 6302.88 samples/sec Loss 12.2679 LearningRate 0.0006 Epoch: 2 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:06,773-Speed 6320.05 samples/sec Loss 12.3011 LearningRate 0.0006 Epoch: 2 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:10,015-Speed 6318.15 samples/sec Loss 12.2917 LearningRate 0.0006 Epoch: 2 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:13,260-Speed 6313.12 samples/sec Loss 12.3298 LearningRate 0.0006 Epoch: 2 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:16,503-Speed 6316.23 samples/sec Loss 12.2991 LearningRate 0.0006 Epoch: 2 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:19,748-Speed 6312.06 
samples/sec Loss 12.2597 LearningRate 0.0006 Epoch: 2 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:22,992-Speed 6315.28 samples/sec Loss 12.3157 LearningRate 0.0006 Epoch: 2 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:26,220-Speed 6345.24 samples/sec Loss 12.1700 LearningRate 0.0006 Epoch: 2 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:29,463-Speed 6317.73 samples/sec Loss 12.2522 LearningRate 0.0006 Epoch: 2 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:32,706-Speed 6315.97 samples/sec Loss 12.3332 LearningRate 0.0006 Epoch: 2 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:35,951-Speed 6312.70 samples/sec Loss 12.3054 LearningRate 0.0006 Epoch: 2 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:39,199-Speed 6307.43 samples/sec Loss 12.3476 LearningRate 0.0006 Epoch: 2 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:42,439-Speed 6322.10 samples/sec Loss 12.2508 LearningRate 0.0006 Epoch: 2 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:45,681-Speed 6318.82 samples/sec Loss 12.2918 LearningRate 0.0006 Epoch: 2 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:48,925-Speed 6314.23 samples/sec Loss 12.2790 LearningRate 0.0006 Epoch: 2 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:52,171-Speed 6310.08 samples/sec Loss 12.2414 LearningRate 0.0006 Epoch: 2 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:55,412-Speed 6321.31 samples/sec Loss 12.1713 LearningRate 0.0006 Epoch: 2 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:11:58,646-Speed 6334.03 samples/sec Loss 12.1504 LearningRate 
0.0006 Epoch: 2 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:01,890-Speed 6315.20 samples/sec Loss 12.2728 LearningRate 0.0006 Epoch: 2 Global Step: 49070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:05,238-Speed 6118.71 samples/sec Loss 12.2206 LearningRate 0.0006 Epoch: 2 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:08,493-Speed 6292.02 samples/sec Loss 12.1451 LearningRate 0.0006 Epoch: 2 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:11,735-Speed 6319.31 samples/sec Loss 12.1666 LearningRate 0.0006 Epoch: 2 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:14,980-Speed 6311.79 samples/sec Loss 12.2472 LearningRate 0.0006 Epoch: 2 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:18,218-Speed 6326.77 samples/sec Loss 12.2574 LearningRate 0.0006 Epoch: 2 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:21,464-Speed 6310.65 samples/sec Loss 12.1887 LearningRate 0.0006 Epoch: 2 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:24,708-Speed 6314.77 samples/sec Loss 12.1705 LearningRate 0.0006 Epoch: 2 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:27,965-Speed 6290.65 samples/sec Loss 12.2461 LearningRate 0.0006 Epoch: 2 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:31,212-Speed 6312.46 samples/sec Loss 12.2793 LearningRate 0.0006 Epoch: 2 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:12:34,440-Speed 6344.68 samples/sec Loss 12.1604 LearningRate 0.0006 Epoch: 2 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:37,685-Speed 6312.82 samples/sec Loss 12.2360 LearningRate 0.0006 Epoch: 2 Global Step: 49180 Fp16 
Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:40,930-Speed 6313.12 samples/sec Loss 12.2080 LearningRate 0.0006 Epoch: 2 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:44,174-Speed 6315.21 samples/sec Loss 12.2795 LearningRate 0.0006 Epoch: 2 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:47,416-Speed 6318.08 samples/sec Loss 12.2657 LearningRate 0.0006 Epoch: 2 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:50,665-Speed 6304.64 samples/sec Loss 12.3461 LearningRate 0.0006 Epoch: 2 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:53,912-Speed 6308.92 samples/sec Loss 12.1909 LearningRate 0.0006 Epoch: 2 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:12:57,158-Speed 6310.16 samples/sec Loss 12.1029 LearningRate 0.0006 Epoch: 2 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:00,404-Speed 6312.19 samples/sec Loss 12.1550 LearningRate 0.0006 Epoch: 2 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:03,651-Speed 6308.22 samples/sec Loss 12.2720 LearningRate 0.0006 Epoch: 2 Global Step: 49260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:06,898-Speed 6307.53 samples/sec Loss 12.3004 LearningRate 0.0006 Epoch: 2 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:13:10,144-Speed 6311.62 samples/sec Loss 12.2416 LearningRate 0.0006 Epoch: 2 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:13:13,407-Speed 6279.14 samples/sec Loss 12.1505 LearningRate 0.0006 Epoch: 2 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:16,651-Speed 6315.16 samples/sec Loss 12.2685 LearningRate 0.0006 Epoch: 2 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 71 hours 
Training: 2022-03-31 20:13:19,891-Speed 6321.67 samples/sec Loss 12.3411 LearningRate 0.0006 Epoch: 2 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:23,133-Speed 6319.56 samples/sec Loss 12.2445 LearningRate 0.0006 Epoch: 2 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:26,376-Speed 6317.50 samples/sec Loss 12.2332 LearningRate 0.0006 Epoch: 2 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:29,630-Speed 6295.38 samples/sec Loss 12.2087 LearningRate 0.0006 Epoch: 2 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:32,872-Speed 6317.88 samples/sec Loss 12.1564 LearningRate 0.0006 Epoch: 2 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:36,118-Speed 6311.76 samples/sec Loss 12.2763 LearningRate 0.0006 Epoch: 2 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:39,364-Speed 6310.54 samples/sec Loss 12.2359 LearningRate 0.0006 Epoch: 2 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:42,606-Speed 6318.91 samples/sec Loss 12.3224 LearningRate 0.0006 Epoch: 2 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:45,840-Speed 6333.29 samples/sec Loss 12.3397 LearningRate 0.0006 Epoch: 2 Global Step: 49390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:49,087-Speed 6309.39 samples/sec Loss 12.2227 LearningRate 0.0006 Epoch: 2 Global Step: 49400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:52,326-Speed 6322.95 samples/sec Loss 12.1357 LearningRate 0.0006 Epoch: 2 Global Step: 49410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:55,572-Speed 6312.03 samples/sec Loss 12.2290 LearningRate 0.0006 Epoch: 2 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:13:58,811-Speed 
6324.18 samples/sec Loss 12.1450 LearningRate 0.0006 Epoch: 2 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:02,056-Speed 6312.72 samples/sec Loss 12.1179 LearningRate 0.0006 Epoch: 2 Global Step: 49440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:05,301-Speed 6312.06 samples/sec Loss 12.2517 LearningRate 0.0006 Epoch: 2 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:08,543-Speed 6318.52 samples/sec Loss 12.1052 LearningRate 0.0006 Epoch: 2 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:11,786-Speed 6317.19 samples/sec Loss 12.1053 LearningRate 0.0006 Epoch: 2 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:15,028-Speed 6317.27 samples/sec Loss 12.1587 LearningRate 0.0006 Epoch: 2 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:18,272-Speed 6314.96 samples/sec Loss 12.1934 LearningRate 0.0006 Epoch: 2 Global Step: 49490 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:14:21,500-Speed 6345.89 samples/sec Loss 12.1139 LearningRate 0.0006 Epoch: 2 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:24,744-Speed 6315.29 samples/sec Loss 12.2100 LearningRate 0.0006 Epoch: 2 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:28,026-Speed 6242.37 samples/sec Loss 12.1422 LearningRate 0.0006 Epoch: 2 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:31,267-Speed 6319.40 samples/sec Loss 12.1376 LearningRate 0.0006 Epoch: 2 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:34,553-Speed 6233.98 samples/sec Loss 12.1034 LearningRate 0.0006 Epoch: 2 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:37,796-Speed 6316.93 samples/sec Loss 12.2102 
LearningRate 0.0006 Epoch: 2 Global Step: 49550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:41,050-Speed 6294.68 samples/sec Loss 12.1840 LearningRate 0.0006 Epoch: 2 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:44,298-Speed 6308.02 samples/sec Loss 12.2289 LearningRate 0.0006 Epoch: 2 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:47,540-Speed 6317.12 samples/sec Loss 12.2509 LearningRate 0.0006 Epoch: 2 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:50,785-Speed 6312.96 samples/sec Loss 12.1466 LearningRate 0.0006 Epoch: 2 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:14:54,029-Speed 6314.21 samples/sec Loss 12.1343 LearningRate 0.0006 Epoch: 2 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:14:57,260-Speed 6339.89 samples/sec Loss 12.2274 LearningRate 0.0006 Epoch: 2 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:00,506-Speed 6312.46 samples/sec Loss 12.1737 LearningRate 0.0006 Epoch: 2 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:03,755-Speed 6304.41 samples/sec Loss 12.1468 LearningRate 0.0006 Epoch: 2 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:07,000-Speed 6312.37 samples/sec Loss 12.1820 LearningRate 0.0006 Epoch: 2 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:10,244-Speed 6314.00 samples/sec Loss 12.2349 LearningRate 0.0006 Epoch: 2 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:13,484-Speed 6323.69 samples/sec Loss 12.1007 LearningRate 0.0006 Epoch: 2 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:16,738-Speed 6294.71 samples/sec Loss 12.1747 LearningRate 0.0006 Epoch: 2 Global Step: 
49670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:19,983-Speed 6311.00 samples/sec Loss 12.2250 LearningRate 0.0006 Epoch: 2 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:23,226-Speed 6317.88 samples/sec Loss 12.0802 LearningRate 0.0006 Epoch: 2 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:26,470-Speed 6313.43 samples/sec Loss 12.1487 LearningRate 0.0006 Epoch: 2 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:29,726-Speed 6292.97 samples/sec Loss 12.1645 LearningRate 0.0006 Epoch: 2 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:15:32,957-Speed 6339.85 samples/sec Loss 12.0967 LearningRate 0.0006 Epoch: 2 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:36,220-Speed 6278.99 samples/sec Loss 12.1842 LearningRate 0.0006 Epoch: 2 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:39,491-Speed 6262.18 samples/sec Loss 12.0368 LearningRate 0.0006 Epoch: 2 Global Step: 49740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:42,733-Speed 6318.99 samples/sec Loss 12.0524 LearningRate 0.0006 Epoch: 2 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:45,978-Speed 6311.33 samples/sec Loss 12.1784 LearningRate 0.0006 Epoch: 2 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:49,218-Speed 6323.08 samples/sec Loss 12.1491 LearningRate 0.0006 Epoch: 2 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:52,459-Speed 6319.82 samples/sec Loss 12.1579 LearningRate 0.0006 Epoch: 2 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:15:55,701-Speed 6319.47 samples/sec Loss 12.0177 LearningRate 0.0006 Epoch: 2 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 71 
hours Training: 2022-03-31 20:15:58,946-Speed 6312.49 samples/sec Loss 12.1774 LearningRate 0.0006 Epoch: 2 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:02,190-Speed 6315.43 samples/sec Loss 12.0771 LearningRate 0.0006 Epoch: 2 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:05,436-Speed 6310.00 samples/sec Loss 12.1801 LearningRate 0.0006 Epoch: 2 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:16:08,698-Speed 6280.17 samples/sec Loss 12.0886 LearningRate 0.0006 Epoch: 2 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:16:11,938-Speed 6321.36 samples/sec Loss 12.0474 LearningRate 0.0006 Epoch: 2 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:16:15,175-Speed 6328.67 samples/sec Loss 12.0846 LearningRate 0.0006 Epoch: 2 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:18,422-Speed 6308.74 samples/sec Loss 12.0582 LearningRate 0.0006 Epoch: 2 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:21,662-Speed 6321.73 samples/sec Loss 12.1434 LearningRate 0.0006 Epoch: 2 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:24,939-Speed 6251.40 samples/sec Loss 12.1372 LearningRate 0.0006 Epoch: 2 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:28,183-Speed 6315.22 samples/sec Loss 12.1281 LearningRate 0.0006 Epoch: 2 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:31,431-Speed 6307.03 samples/sec Loss 12.0837 LearningRate 0.0006 Epoch: 2 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:34,678-Speed 6307.17 samples/sec Loss 12.0291 LearningRate 0.0006 Epoch: 2 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:16:37,926-Speed 6307.76 samples/sec Loss 12.1629 LearningRate 0.0006 Epoch: 2 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:41,168-Speed 6318.87 samples/sec Loss 12.0702 LearningRate 0.0006 Epoch: 2 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:44,412-Speed 6315.64 samples/sec Loss 12.1033 LearningRate 0.0006 Epoch: 2 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:47,655-Speed 6315.93 samples/sec Loss 12.0682 LearningRate 0.0006 Epoch: 2 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:16:50,899-Speed 6314.46 samples/sec Loss 12.0999 LearningRate 0.0006 Epoch: 2 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:16:54,134-Speed 6333.05 samples/sec Loss 12.0730 LearningRate 0.0006 Epoch: 2 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:16:57,380-Speed 6309.19 samples/sec Loss 11.9420 LearningRate 0.0006 Epoch: 2 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:00,621-Speed 6320.78 samples/sec Loss 12.1448 LearningRate 0.0006 Epoch: 2 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:03,867-Speed 6310.86 samples/sec Loss 12.0199 LearningRate 0.0006 Epoch: 2 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:07,110-Speed 6316.53 samples/sec Loss 12.0659 LearningRate 0.0006 Epoch: 2 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:10,355-Speed 6313.09 samples/sec Loss 12.0842 LearningRate 0.0006 Epoch: 2 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:13,635-Speed 6244.74 samples/sec Loss 11.9746 LearningRate 0.0006 Epoch: 2 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:16,881-Speed 6310.27 samples/sec 
Loss 12.1868 LearningRate 0.0006 Epoch: 2 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:20,129-Speed 6306.94 samples/sec Loss 12.0341 LearningRate 0.0006 Epoch: 2 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:23,374-Speed 6313.82 samples/sec Loss 12.1440 LearningRate 0.0006 Epoch: 2 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:26,606-Speed 6338.52 samples/sec Loss 12.0100 LearningRate 0.0006 Epoch: 2 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:29,851-Speed 6312.80 samples/sec Loss 12.0368 LearningRate 0.0006 Epoch: 2 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:33,095-Speed 6313.02 samples/sec Loss 12.1150 LearningRate 0.0006 Epoch: 2 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:36,340-Speed 6313.05 samples/sec Loss 12.0800 LearningRate 0.0006 Epoch: 2 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:39,585-Speed 6312.79 samples/sec Loss 12.0274 LearningRate 0.0006 Epoch: 2 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:42,835-Speed 6303.78 samples/sec Loss 12.0697 LearningRate 0.0006 Epoch: 2 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:46,079-Speed 6314.15 samples/sec Loss 12.0312 LearningRate 0.0006 Epoch: 2 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:49,323-Speed 6314.62 samples/sec Loss 12.0906 LearningRate 0.0006 Epoch: 2 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:52,567-Speed 6314.75 samples/sec Loss 11.9915 LearningRate 0.0006 Epoch: 2 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:55,811-Speed 6314.63 samples/sec Loss 12.1113 LearningRate 0.0006 Epoch: 2 
Global Step: 50160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:17:59,060-Speed 6306.42 samples/sec Loss 12.1222 LearningRate 0.0006 Epoch: 2 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:18:02,302-Speed 6317.52 samples/sec Loss 12.0637 LearningRate 0.0006 Epoch: 2 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:18:05,531-Speed 6343.75 samples/sec Loss 11.9815 LearningRate 0.0006 Epoch: 2 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:08,777-Speed 6311.18 samples/sec Loss 12.0057 LearningRate 0.0006 Epoch: 2 Global Step: 50200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:12,023-Speed 6310.76 samples/sec Loss 11.9711 LearningRate 0.0006 Epoch: 2 Global Step: 50210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:15,317-Speed 6217.97 samples/sec Loss 11.9722 LearningRate 0.0006 Epoch: 2 Global Step: 50220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:18,566-Speed 6305.81 samples/sec Loss 12.0393 LearningRate 0.0006 Epoch: 2 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:21,809-Speed 6316.06 samples/sec Loss 12.0440 LearningRate 0.0006 Epoch: 2 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:25,052-Speed 6317.58 samples/sec Loss 12.0561 LearningRate 0.0006 Epoch: 2 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:28,297-Speed 6311.30 samples/sec Loss 12.1544 LearningRate 0.0006 Epoch: 2 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:31,545-Speed 6307.80 samples/sec Loss 12.0545 LearningRate 0.0006 Epoch: 2 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:34,791-Speed 6310.01 samples/sec Loss 11.9874 LearningRate 0.0006 Epoch: 2 Global Step: 50280 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 20:18:38,041-Speed 6302.52 samples/sec Loss 12.0399 LearningRate 0.0006 Epoch: 2 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:18:41,272-Speed 6340.85 samples/sec Loss 12.0572 LearningRate 0.0006 Epoch: 2 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:44,514-Speed 6318.94 samples/sec Loss 11.9723 LearningRate 0.0006 Epoch: 2 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:47,757-Speed 6316.20 samples/sec Loss 11.9688 LearningRate 0.0006 Epoch: 2 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:51,004-Speed 6308.92 samples/sec Loss 12.0629 LearningRate 0.0006 Epoch: 2 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:54,248-Speed 6315.35 samples/sec Loss 11.9081 LearningRate 0.0006 Epoch: 2 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:18:57,496-Speed 6307.26 samples/sec Loss 12.0233 LearningRate 0.0006 Epoch: 2 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:00,740-Speed 6314.30 samples/sec Loss 12.0405 LearningRate 0.0006 Epoch: 2 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:03,986-Speed 6311.02 samples/sec Loss 11.9511 LearningRate 0.0006 Epoch: 2 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:07,230-Speed 6313.81 samples/sec Loss 12.0508 LearningRate 0.0006 Epoch: 2 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:10,472-Speed 6317.86 samples/sec Loss 12.0017 LearningRate 0.0006 Epoch: 2 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:13,698-Speed 6351.15 samples/sec Loss 11.8848 LearningRate 0.0006 Epoch: 2 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:19:16,939-Speed 6318.86 samples/sec Loss 12.0297 LearningRate 0.0006 Epoch: 2 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:20,185-Speed 6311.67 samples/sec Loss 12.0414 LearningRate 0.0006 Epoch: 2 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:23,428-Speed 6315.82 samples/sec Loss 12.0159 LearningRate 0.0006 Epoch: 2 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:26,673-Speed 6312.87 samples/sec Loss 11.9149 LearningRate 0.0006 Epoch: 2 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:29,916-Speed 6317.37 samples/sec Loss 12.0228 LearningRate 0.0006 Epoch: 2 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:33,161-Speed 6311.33 samples/sec Loss 11.9859 LearningRate 0.0006 Epoch: 2 Global Step: 50460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:36,404-Speed 6317.09 samples/sec Loss 12.0894 LearningRate 0.0006 Epoch: 2 Global Step: 50470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:39,648-Speed 6315.76 samples/sec Loss 12.0169 LearningRate 0.0006 Epoch: 2 Global Step: 50480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:42,888-Speed 6320.91 samples/sec Loss 11.8690 LearningRate 0.0006 Epoch: 2 Global Step: 50490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:46,120-Speed 6339.17 samples/sec Loss 12.0066 LearningRate 0.0006 Epoch: 2 Global Step: 50500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:49,363-Speed 6315.21 samples/sec Loss 12.1112 LearningRate 0.0006 Epoch: 2 Global Step: 50510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:52,604-Speed 6321.77 samples/sec Loss 12.0421 LearningRate 0.0006 Epoch: 2 Global Step: 50520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:55,847-Speed 6316.76 samples/sec Loss 
11.9806 LearningRate 0.0006 Epoch: 2 Global Step: 50530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:19:59,090-Speed 6316.12 samples/sec Loss 12.0382 LearningRate 0.0006 Epoch: 2 Global Step: 50540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:02,336-Speed 6311.44 samples/sec Loss 11.9736 LearningRate 0.0006 Epoch: 2 Global Step: 50550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:05,577-Speed 6321.27 samples/sec Loss 12.1116 LearningRate 0.0006 Epoch: 2 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:08,820-Speed 6315.93 samples/sec Loss 11.9785 LearningRate 0.0006 Epoch: 2 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:12,068-Speed 6305.86 samples/sec Loss 12.0006 LearningRate 0.0006 Epoch: 2 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:15,309-Speed 6321.53 samples/sec Loss 11.9567 LearningRate 0.0006 Epoch: 2 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:18,555-Speed 6310.48 samples/sec Loss 11.9399 LearningRate 0.0006 Epoch: 2 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:20:21,786-Speed 6340.05 samples/sec Loss 12.0143 LearningRate 0.0006 Epoch: 2 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:25,033-Speed 6308.92 samples/sec Loss 11.9091 LearningRate 0.0006 Epoch: 2 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:28,279-Speed 6309.75 samples/sec Loss 12.0605 LearningRate 0.0006 Epoch: 2 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:31,526-Speed 6309.57 samples/sec Loss 11.9895 LearningRate 0.0006 Epoch: 2 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:34,770-Speed 6314.58 samples/sec Loss 11.9690 LearningRate 0.0006 Epoch: 2 
Global Step: 50650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:38,018-Speed 6307.12 samples/sec Loss 11.9936 LearningRate 0.0006 Epoch: 2 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:41,264-Speed 6309.31 samples/sec Loss 11.9049 LearningRate 0.0006 Epoch: 2 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:44,509-Speed 6312.85 samples/sec Loss 11.9695 LearningRate 0.0006 Epoch: 2 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:47,751-Speed 6318.72 samples/sec Loss 11.9296 LearningRate 0.0006 Epoch: 2 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:50,998-Speed 6309.38 samples/sec Loss 11.9676 LearningRate 0.0006 Epoch: 2 Global Step: 50700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:20:54,254-Speed 6291.35 samples/sec Loss 11.9453 LearningRate 0.0006 Epoch: 2 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:20:57,499-Speed 6311.89 samples/sec Loss 11.9809 LearningRate 0.0006 Epoch: 2 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:21:00,761-Speed 6281.63 samples/sec Loss 11.9414 LearningRate 0.0006 Epoch: 2 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:21:04,008-Speed 6309.50 samples/sec Loss 12.0075 LearningRate 0.0006 Epoch: 2 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:21:07,241-Speed 6336.45 samples/sec Loss 12.0605 LearningRate 0.0006 Epoch: 2 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:10,482-Speed 6319.20 samples/sec Loss 11.9531 LearningRate 0.0006 Epoch: 2 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:13,784-Speed 6204.00 samples/sec Loss 11.9050 LearningRate 0.0006 Epoch: 2 Global Step: 50770 Fp16 Grad Scale: 
65536 Required: 71 hours Training: 2022-03-31 20:21:17,026-Speed 6318.29 samples/sec Loss 11.9910 LearningRate 0.0006 Epoch: 2 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:20,275-Speed 6306.44 samples/sec Loss 11.9678 LearningRate 0.0006 Epoch: 2 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:23,517-Speed 6317.47 samples/sec Loss 11.9217 LearningRate 0.0006 Epoch: 2 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:26,762-Speed 6312.25 samples/sec Loss 11.9214 LearningRate 0.0006 Epoch: 2 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:30,011-Speed 6306.38 samples/sec Loss 11.9765 LearningRate 0.0006 Epoch: 2 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:33,252-Speed 6319.86 samples/sec Loss 11.8925 LearningRate 0.0006 Epoch: 2 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:36,499-Speed 6309.19 samples/sec Loss 11.9277 LearningRate 0.0006 Epoch: 2 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:39,728-Speed 6343.48 samples/sec Loss 11.9520 LearningRate 0.0006 Epoch: 2 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:42,970-Speed 6319.00 samples/sec Loss 11.9151 LearningRate 0.0006 Epoch: 2 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:46,219-Speed 6304.87 samples/sec Loss 11.9434 LearningRate 0.0006 Epoch: 2 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:49,463-Speed 6314.35 samples/sec Loss 11.8927 LearningRate 0.0006 Epoch: 2 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:52,708-Speed 6312.85 samples/sec Loss 11.9769 LearningRate 0.0006 Epoch: 2 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 
2022-03-31 20:21:55,953-Speed 6310.99 samples/sec Loss 11.9159 LearningRate 0.0006 Epoch: 2 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:21:59,200-Speed 6309.61 samples/sec Loss 11.9274 LearningRate 0.0006 Epoch: 2 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:02,447-Speed 6309.60 samples/sec Loss 11.9869 LearningRate 0.0006 Epoch: 2 Global Step: 50920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:05,689-Speed 6319.85 samples/sec Loss 11.9700 LearningRate 0.0006 Epoch: 2 Global Step: 50930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:08,932-Speed 6315.27 samples/sec Loss 11.8630 LearningRate 0.0006 Epoch: 2 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:12,177-Speed 6313.54 samples/sec Loss 11.8324 LearningRate 0.0006 Epoch: 2 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:22:15,406-Speed 6342.93 samples/sec Loss 11.9513 LearningRate 0.0006 Epoch: 2 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:18,660-Speed 6296.73 samples/sec Loss 11.9329 LearningRate 0.0006 Epoch: 2 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:21,903-Speed 6316.23 samples/sec Loss 11.9088 LearningRate 0.0006 Epoch: 2 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:25,148-Speed 6312.88 samples/sec Loss 11.9338 LearningRate 0.0006 Epoch: 2 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:28,394-Speed 6310.02 samples/sec Loss 11.8871 LearningRate 0.0006 Epoch: 2 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:31,638-Speed 6313.57 samples/sec Loss 11.8773 LearningRate 0.0006 Epoch: 2 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:34,879-Speed 6320.80 
samples/sec Loss 11.9435 LearningRate 0.0006 Epoch: 2 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:38,125-Speed 6311.68 samples/sec Loss 11.8342 LearningRate 0.0006 Epoch: 2 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:41,369-Speed 6314.06 samples/sec Loss 11.8698 LearningRate 0.0006 Epoch: 2 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:44,612-Speed 6316.86 samples/sec Loss 12.0103 LearningRate 0.0006 Epoch: 2 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:47,846-Speed 6334.67 samples/sec Loss 11.8402 LearningRate 0.0006 Epoch: 2 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:51,089-Speed 6315.60 samples/sec Loss 11.8511 LearningRate 0.0006 Epoch: 2 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:54,332-Speed 6316.60 samples/sec Loss 11.8926 LearningRate 0.0006 Epoch: 2 Global Step: 51080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:22:57,574-Speed 6319.33 samples/sec Loss 11.9322 LearningRate 0.0006 Epoch: 2 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:00,819-Speed 6312.77 samples/sec Loss 12.0037 LearningRate 0.0006 Epoch: 2 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:04,062-Speed 6315.91 samples/sec Loss 11.9326 LearningRate 0.0006 Epoch: 2 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:07,303-Speed 6320.32 samples/sec Loss 11.9407 LearningRate 0.0006 Epoch: 2 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:10,553-Speed 6303.41 samples/sec Loss 11.8873 LearningRate 0.0006 Epoch: 2 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:13,795-Speed 6319.89 samples/sec Loss 11.8810 LearningRate 
0.0006 Epoch: 2 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:17,042-Speed 6307.60 samples/sec Loss 11.7922 LearningRate 0.0006 Epoch: 2 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:20,286-Speed 6315.86 samples/sec Loss 11.8793 LearningRate 0.0006 Epoch: 2 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:23:23,532-Speed 6311.07 samples/sec Loss 11.9029 LearningRate 0.0006 Epoch: 2 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:26,786-Speed 6293.75 samples/sec Loss 11.9080 LearningRate 0.0006 Epoch: 2 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:30,042-Speed 6292.49 samples/sec Loss 11.9391 LearningRate 0.0006 Epoch: 2 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:33,285-Speed 6316.60 samples/sec Loss 11.9191 LearningRate 0.0006 Epoch: 2 Global Step: 51200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:36,529-Speed 6314.28 samples/sec Loss 11.9364 LearningRate 0.0006 Epoch: 2 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:39,776-Speed 6307.85 samples/sec Loss 11.9665 LearningRate 0.0006 Epoch: 2 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:43,022-Speed 6311.88 samples/sec Loss 11.8508 LearningRate 0.0006 Epoch: 2 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:46,280-Speed 6286.75 samples/sec Loss 11.8603 LearningRate 0.0006 Epoch: 2 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:49,527-Speed 6309.66 samples/sec Loss 11.8235 LearningRate 0.0006 Epoch: 2 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:52,776-Speed 6303.95 samples/sec Loss 11.8282 LearningRate 0.0006 Epoch: 2 Global Step: 51260 Fp16 
Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:56,004-Speed 6345.11 samples/sec Loss 11.9395 LearningRate 0.0006 Epoch: 2 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:23:59,253-Speed 6306.00 samples/sec Loss 11.9013 LearningRate 0.0006 Epoch: 2 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:02,501-Speed 6306.98 samples/sec Loss 11.7659 LearningRate 0.0006 Epoch: 2 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:05,745-Speed 6314.66 samples/sec Loss 11.8195 LearningRate 0.0006 Epoch: 2 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:08,987-Speed 6318.08 samples/sec Loss 11.8825 LearningRate 0.0006 Epoch: 2 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:12,228-Speed 6319.92 samples/sec Loss 11.8523 LearningRate 0.0006 Epoch: 2 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:15,482-Speed 6296.76 samples/sec Loss 11.7655 LearningRate 0.0006 Epoch: 2 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:18,723-Speed 6321.08 samples/sec Loss 11.9027 LearningRate 0.0006 Epoch: 2 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:21,967-Speed 6314.70 samples/sec Loss 11.8156 LearningRate 0.0006 Epoch: 2 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:25,208-Speed 6319.98 samples/sec Loss 11.8814 LearningRate 0.0006 Epoch: 2 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:28,453-Speed 6312.79 samples/sec Loss 11.8523 LearningRate 0.0006 Epoch: 2 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:24:31,699-Speed 6310.18 samples/sec Loss 11.9180 LearningRate 0.0006 Epoch: 2 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 71 hours 
Training: 2022-03-31 20:24:34,944-Speed 6313.02 samples/sec Loss 11.9264 LearningRate 0.0006 Epoch: 2 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:24:38,174-Speed 6342.01 samples/sec Loss 11.7980 LearningRate 0.0006 Epoch: 2 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:41,417-Speed 6316.24 samples/sec Loss 11.8783 LearningRate 0.0006 Epoch: 2 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:44,662-Speed 6312.24 samples/sec Loss 12.0265 LearningRate 0.0006 Epoch: 2 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:47,906-Speed 6314.97 samples/sec Loss 11.8623 LearningRate 0.0006 Epoch: 2 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:51,151-Speed 6313.18 samples/sec Loss 11.7323 LearningRate 0.0006 Epoch: 2 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:54,392-Speed 6319.88 samples/sec Loss 11.7154 LearningRate 0.0006 Epoch: 2 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:24:57,637-Speed 6312.24 samples/sec Loss 11.8211 LearningRate 0.0006 Epoch: 2 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:00,881-Speed 6316.08 samples/sec Loss 11.9084 LearningRate 0.0006 Epoch: 2 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:04,122-Speed 6319.12 samples/sec Loss 11.7796 LearningRate 0.0006 Epoch: 2 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:07,371-Speed 6304.98 samples/sec Loss 11.9260 LearningRate 0.0006 Epoch: 2 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:10,615-Speed 6314.22 samples/sec Loss 11.8531 LearningRate 0.0006 Epoch: 2 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:25:13,861-Speed 
6311.27 samples/sec Loss 11.7716 LearningRate 0.0006 Epoch: 2 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:25:17,103-Speed 6318.80 samples/sec Loss 11.7910 LearningRate 0.0006 Epoch: 2 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:25:20,333-Speed 6342.13 samples/sec Loss 11.7901 LearningRate 0.0006 Epoch: 2 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:23,578-Speed 6312.26 samples/sec Loss 11.8443 LearningRate 0.0006 Epoch: 2 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:26,827-Speed 6306.59 samples/sec Loss 11.8649 LearningRate 0.0006 Epoch: 2 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:30,068-Speed 6320.60 samples/sec Loss 11.7929 LearningRate 0.0006 Epoch: 2 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:33,313-Speed 6311.38 samples/sec Loss 11.8891 LearningRate 0.0006 Epoch: 2 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:36,559-Speed 6312.05 samples/sec Loss 11.7894 LearningRate 0.0006 Epoch: 2 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:39,799-Speed 6321.13 samples/sec Loss 11.7918 LearningRate 0.0006 Epoch: 2 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:43,043-Speed 6316.34 samples/sec Loss 11.7635 LearningRate 0.0006 Epoch: 2 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:46,283-Speed 6321.38 samples/sec Loss 11.9004 LearningRate 0.0006 Epoch: 2 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:49,526-Speed 6315.87 samples/sec Loss 11.8710 LearningRate 0.0006 Epoch: 2 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:52,770-Speed 6315.32 samples/sec Loss 11.7556 
LearningRate 0.0006 Epoch: 2 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:25:56,001-Speed 6341.05 samples/sec Loss 11.7768 LearningRate 0.0006 Epoch: 2 Global Step: 51640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:25:59,243-Speed 6316.71 samples/sec Loss 11.8465 LearningRate 0.0006 Epoch: 2 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:02,490-Speed 6309.99 samples/sec Loss 11.8080 LearningRate 0.0006 Epoch: 2 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:05,731-Speed 6321.02 samples/sec Loss 11.8248 LearningRate 0.0006 Epoch: 2 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:08,972-Speed 6318.76 samples/sec Loss 11.7701 LearningRate 0.0006 Epoch: 2 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:12,242-Speed 6264.78 samples/sec Loss 11.7913 LearningRate 0.0006 Epoch: 2 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:15,495-Speed 6298.12 samples/sec Loss 11.8104 LearningRate 0.0006 Epoch: 2 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:18,741-Speed 6309.99 samples/sec Loss 11.9359 LearningRate 0.0006 Epoch: 2 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:21,982-Speed 6319.25 samples/sec Loss 11.7569 LearningRate 0.0006 Epoch: 2 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:25,227-Speed 6315.01 samples/sec Loss 11.8158 LearningRate 0.0006 Epoch: 2 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:28,474-Speed 6308.24 samples/sec Loss 11.7574 LearningRate 0.0006 Epoch: 2 Global Step: 51740 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:26:31,718-Speed 6315.39 samples/sec Loss 11.8161 LearningRate 0.0006 Epoch: 2 Global Step: 
51750 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:26:34,945-Speed 6346.77 samples/sec Loss 11.8622 LearningRate 0.0006 Epoch: 2 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:38,191-Speed 6311.44 samples/sec Loss 11.8038 LearningRate 0.0006 Epoch: 2 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:41,437-Speed 6311.25 samples/sec Loss 11.7888 LearningRate 0.0006 Epoch: 2 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:44,678-Speed 6320.40 samples/sec Loss 11.7222 LearningRate 0.0006 Epoch: 2 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:47,921-Speed 6316.60 samples/sec Loss 11.7765 LearningRate 0.0006 Epoch: 2 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:51,161-Speed 6322.01 samples/sec Loss 11.7602 LearningRate 0.0006 Epoch: 2 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:54,405-Speed 6314.27 samples/sec Loss 11.6628 LearningRate 0.0006 Epoch: 2 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:26:57,650-Speed 6311.87 samples/sec Loss 11.6779 LearningRate 0.0006 Epoch: 2 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:00,894-Speed 6316.28 samples/sec Loss 11.8739 LearningRate 0.0006 Epoch: 2 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:04,138-Speed 6314.02 samples/sec Loss 11.8051 LearningRate 0.0006 Epoch: 2 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:07,369-Speed 6340.83 samples/sec Loss 11.7463 LearningRate 0.0006 Epoch: 2 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:10,610-Speed 6319.44 samples/sec Loss 11.7711 LearningRate 0.0006 Epoch: 2 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 71 
hours Training: 2022-03-31 20:27:13,856-Speed 6311.71 samples/sec Loss 11.7817 LearningRate 0.0006 Epoch: 2 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:17,112-Speed 6289.93 samples/sec Loss 11.7925 LearningRate 0.0006 Epoch: 2 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:20,357-Speed 6312.94 samples/sec Loss 11.7807 LearningRate 0.0006 Epoch: 2 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:23,603-Speed 6310.21 samples/sec Loss 11.7966 LearningRate 0.0006 Epoch: 2 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:26,849-Speed 6311.59 samples/sec Loss 11.7961 LearningRate 0.0006 Epoch: 2 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:30,137-Speed 6230.20 samples/sec Loss 11.7438 LearningRate 0.0006 Epoch: 2 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:33,381-Speed 6314.32 samples/sec Loss 11.7685 LearningRate 0.0006 Epoch: 2 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:36,627-Speed 6312.33 samples/sec Loss 11.8193 LearningRate 0.0006 Epoch: 2 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:39,853-Speed 6348.77 samples/sec Loss 11.6716 LearningRate 0.0006 Epoch: 2 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:43,109-Speed 6292.39 samples/sec Loss 11.7782 LearningRate 0.0006 Epoch: 2 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:46,355-Speed 6308.96 samples/sec Loss 11.6973 LearningRate 0.0006 Epoch: 2 Global Step: 51980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:49,602-Speed 6309.09 samples/sec Loss 11.6938 LearningRate 0.0006 Epoch: 2 Global Step: 51990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:27:52,849-Speed 6308.17 samples/sec Loss 11.8367 LearningRate 0.0006 Epoch: 2 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:56,095-Speed 6312.90 samples/sec Loss 11.8321 LearningRate 0.0006 Epoch: 2 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:27:59,337-Speed 6318.39 samples/sec Loss 11.7695 LearningRate 0.0006 Epoch: 2 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:02,580-Speed 6315.25 samples/sec Loss 11.7530 LearningRate 0.0006 Epoch: 2 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:05,822-Speed 6318.28 samples/sec Loss 11.7281 LearningRate 0.0006 Epoch: 2 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:09,063-Speed 6321.35 samples/sec Loss 11.6626 LearningRate 0.0006 Epoch: 2 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:12,309-Speed 6309.88 samples/sec Loss 11.8014 LearningRate 0.0006 Epoch: 2 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:28:15,555-Speed 6312.02 samples/sec Loss 11.7813 LearningRate 0.0006 Epoch: 2 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:28:18,786-Speed 6339.58 samples/sec Loss 11.7936 LearningRate 0.0006 Epoch: 2 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:22,028-Speed 6318.90 samples/sec Loss 11.6635 LearningRate 0.0006 Epoch: 2 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:25,273-Speed 6312.40 samples/sec Loss 11.7384 LearningRate 0.0006 Epoch: 2 Global Step: 52100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:28,517-Speed 6314.89 samples/sec Loss 11.7394 LearningRate 0.0006 Epoch: 2 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:31,765-Speed 6306.43 samples/sec 
Loss 11.6881 LearningRate 0.0006 Epoch: 2 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:35,008-Speed 6316.46 samples/sec Loss 11.7358 LearningRate 0.0006 Epoch: 2 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:38,257-Speed 6303.91 samples/sec Loss 11.7387 LearningRate 0.0006 Epoch: 2 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:41,501-Speed 6315.83 samples/sec Loss 11.6370 LearningRate 0.0006 Epoch: 2 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:44,745-Speed 6315.77 samples/sec Loss 11.7464 LearningRate 0.0006 Epoch: 2 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:47,989-Speed 6314.83 samples/sec Loss 11.7299 LearningRate 0.0006 Epoch: 2 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:51,216-Speed 6346.07 samples/sec Loss 11.7411 LearningRate 0.0006 Epoch: 2 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:54,461-Speed 6312.81 samples/sec Loss 11.7467 LearningRate 0.0006 Epoch: 2 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:28:57,705-Speed 6315.96 samples/sec Loss 11.7816 LearningRate 0.0006 Epoch: 2 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:00,949-Speed 6314.19 samples/sec Loss 11.7673 LearningRate 0.0006 Epoch: 2 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:04,193-Speed 6314.01 samples/sec Loss 11.7099 LearningRate 0.0006 Epoch: 2 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:07,437-Speed 6314.48 samples/sec Loss 11.7276 LearningRate 0.0006 Epoch: 2 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:10,683-Speed 6311.02 samples/sec Loss 11.6911 LearningRate 0.0006 Epoch: 2 
Global Step: 52240 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:13,930-Speed 6309.09 samples/sec Loss 11.7512 LearningRate 0.0006 Epoch: 2 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:17,170-Speed 6321.25 samples/sec Loss 11.7422 LearningRate 0.0006 Epoch: 2 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:20,417-Speed 6308.83 samples/sec Loss 11.7688 LearningRate 0.0006 Epoch: 2 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:23,662-Speed 6313.37 samples/sec Loss 11.7157 LearningRate 0.0006 Epoch: 2 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:29:26,892-Speed 6342.04 samples/sec Loss 11.6595 LearningRate 0.0006 Epoch: 2 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:30,135-Speed 6316.90 samples/sec Loss 11.6845 LearningRate 0.0006 Epoch: 2 Global Step: 52300 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:33,379-Speed 6313.84 samples/sec Loss 11.7849 LearningRate 0.0006 Epoch: 2 Global Step: 52310 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:36,623-Speed 6315.06 samples/sec Loss 11.7206 LearningRate 0.0006 Epoch: 2 Global Step: 52320 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:39,864-Speed 6320.03 samples/sec Loss 11.7742 LearningRate 0.0006 Epoch: 2 Global Step: 52330 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:43,177-Speed 6184.35 samples/sec Loss 11.7536 LearningRate 0.0006 Epoch: 2 Global Step: 52340 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:46,420-Speed 6317.05 samples/sec Loss 11.8043 LearningRate 0.0006 Epoch: 2 Global Step: 52350 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:49,661-Speed 6319.21 samples/sec Loss 11.7758 LearningRate 0.0006 Epoch: 2 Global Step: 52360 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 20:29:52,909-Speed 6308.31 samples/sec Loss 11.7670 LearningRate 0.0006 Epoch: 2 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:56,151-Speed 6318.63 samples/sec Loss 11.6323 LearningRate 0.0006 Epoch: 2 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:29:59,378-Speed 6347.12 samples/sec Loss 11.6709 LearningRate 0.0006 Epoch: 2 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:02,621-Speed 6316.28 samples/sec Loss 11.6653 LearningRate 0.0006 Epoch: 2 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:05,867-Speed 6311.36 samples/sec Loss 11.7329 LearningRate 0.0006 Epoch: 2 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:09,113-Speed 6310.66 samples/sec Loss 11.7944 LearningRate 0.0006 Epoch: 2 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:12,360-Speed 6308.26 samples/sec Loss 11.6709 LearningRate 0.0006 Epoch: 2 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:15,602-Speed 6320.05 samples/sec Loss 11.7182 LearningRate 0.0006 Epoch: 2 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:18,871-Speed 6265.94 samples/sec Loss 11.7033 LearningRate 0.0006 Epoch: 2 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:22,118-Speed 6308.64 samples/sec Loss 11.7161 LearningRate 0.0006 Epoch: 2 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:25,364-Speed 6309.45 samples/sec Loss 11.6842 LearningRate 0.0006 Epoch: 2 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:28,605-Speed 6320.27 samples/sec Loss 11.7184 LearningRate 0.0006 Epoch: 2 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:30:31,835-Speed 6343.43 samples/sec Loss 11.6403 LearningRate 0.0006 Epoch: 2 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:35,076-Speed 6320.49 samples/sec Loss 11.6696 LearningRate 0.0006 Epoch: 2 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:38,319-Speed 6315.89 samples/sec Loss 11.6504 LearningRate 0.0006 Epoch: 2 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:41,562-Speed 6316.17 samples/sec Loss 11.6506 LearningRate 0.0006 Epoch: 2 Global Step: 52520 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:44,805-Speed 6317.78 samples/sec Loss 11.6323 LearningRate 0.0006 Epoch: 2 Global Step: 52530 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:48,050-Speed 6312.05 samples/sec Loss 11.7561 LearningRate 0.0006 Epoch: 2 Global Step: 52540 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:51,294-Speed 6314.15 samples/sec Loss 11.8417 LearningRate 0.0006 Epoch: 2 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:54,537-Speed 6317.84 samples/sec Loss 11.7470 LearningRate 0.0006 Epoch: 2 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:30:57,791-Speed 6294.42 samples/sec Loss 11.6875 LearningRate 0.0006 Epoch: 2 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:01,043-Speed 6300.25 samples/sec Loss 11.6725 LearningRate 0.0006 Epoch: 2 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:04,277-Speed 6334.40 samples/sec Loss 11.6552 LearningRate 0.0006 Epoch: 2 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:07,523-Speed 6309.66 samples/sec Loss 11.7394 LearningRate 0.0006 Epoch: 2 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:10,767-Speed 6315.35 samples/sec Loss 
11.6455 LearningRate 0.0006 Epoch: 2 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:14,009-Speed 6319.03 samples/sec Loss 11.6777 LearningRate 0.0006 Epoch: 2 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:17,251-Speed 6318.77 samples/sec Loss 11.6208 LearningRate 0.0006 Epoch: 2 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:20,490-Speed 6322.79 samples/sec Loss 11.6610 LearningRate 0.0006 Epoch: 2 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:23,735-Speed 6313.78 samples/sec Loss 11.6721 LearningRate 0.0006 Epoch: 2 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:26,980-Speed 6313.05 samples/sec Loss 11.7101 LearningRate 0.0006 Epoch: 2 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:30,226-Speed 6309.70 samples/sec Loss 11.6533 LearningRate 0.0006 Epoch: 2 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:33,466-Speed 6322.58 samples/sec Loss 11.7003 LearningRate 0.0006 Epoch: 2 Global Step: 52680 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:36,712-Speed 6310.60 samples/sec Loss 11.7143 LearningRate 0.0006 Epoch: 2 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:31:39,959-Speed 6309.05 samples/sec Loss 11.6592 LearningRate 0.0006 Epoch: 2 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:31:43,192-Speed 6336.89 samples/sec Loss 11.6642 LearningRate 0.0006 Epoch: 2 Global Step: 52710 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:46,436-Speed 6313.04 samples/sec Loss 11.6721 LearningRate 0.0006 Epoch: 2 Global Step: 52720 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:49,682-Speed 6311.17 samples/sec Loss 11.7397 LearningRate 0.0006 Epoch: 2 
Global Step: 52730 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:52,929-Speed 6307.61 samples/sec Loss 11.5972 LearningRate 0.0006 Epoch: 2 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:56,178-Speed 6306.94 samples/sec Loss 11.7951 LearningRate 0.0006 Epoch: 2 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:31:59,425-Speed 6307.91 samples/sec Loss 11.6318 LearningRate 0.0006 Epoch: 2 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:02,669-Speed 6314.77 samples/sec Loss 11.6971 LearningRate 0.0006 Epoch: 2 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:05,915-Speed 6310.66 samples/sec Loss 11.6588 LearningRate 0.0006 Epoch: 2 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:09,156-Speed 6321.50 samples/sec Loss 11.5469 LearningRate 0.0006 Epoch: 2 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:12,402-Speed 6310.49 samples/sec Loss 11.5957 LearningRate 0.0006 Epoch: 2 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:15,628-Speed 6350.04 samples/sec Loss 11.6315 LearningRate 0.0006 Epoch: 2 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:18,874-Speed 6311.30 samples/sec Loss 11.5698 LearningRate 0.0006 Epoch: 2 Global Step: 52820 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:22,119-Speed 6311.29 samples/sec Loss 11.7217 LearningRate 0.0006 Epoch: 2 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:25,377-Speed 6287.47 samples/sec Loss 11.6811 LearningRate 0.0006 Epoch: 2 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:28,625-Speed 6307.86 samples/sec Loss 11.5745 LearningRate 0.0006 Epoch: 2 Global Step: 52850 Fp16 Grad Scale: 65536 
Required: 71 hours Training: 2022-03-31 20:32:31,869-Speed 6314.42 samples/sec Loss 11.6184 LearningRate 0.0006 Epoch: 2 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:35,110-Speed 6319.92 samples/sec Loss 11.6379 LearningRate 0.0006 Epoch: 2 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:38,348-Speed 6325.70 samples/sec Loss 11.5985 LearningRate 0.0006 Epoch: 2 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:41,594-Speed 6312.64 samples/sec Loss 11.6889 LearningRate 0.0006 Epoch: 2 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:44,838-Speed 6312.96 samples/sec Loss 11.6496 LearningRate 0.0006 Epoch: 2 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:48,082-Speed 6316.03 samples/sec Loss 11.7050 LearningRate 0.0006 Epoch: 2 Global Step: 52910 Fp16 Grad Scale: 131072 Required: 71 hours Training: 2022-03-31 20:32:51,312-Speed 6340.92 samples/sec Loss 11.6068 LearningRate 0.0006 Epoch: 2 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:54,564-Speed 6298.62 samples/sec Loss 11.5700 LearningRate 0.0006 Epoch: 2 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:32:57,809-Speed 6313.09 samples/sec Loss 11.6580 LearningRate 0.0006 Epoch: 2 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:01,052-Speed 6316.43 samples/sec Loss 11.5897 LearningRate 0.0006 Epoch: 2 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:04,297-Speed 6314.66 samples/sec Loss 11.6384 LearningRate 0.0006 Epoch: 2 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:07,539-Speed 6317.28 samples/sec Loss 11.6190 LearningRate 0.0006 Epoch: 2 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 
20:33:10,786-Speed 6310.48 samples/sec Loss 11.6262 LearningRate 0.0006 Epoch: 2 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:14,029-Speed 6315.41 samples/sec Loss 11.6349 LearningRate 0.0006 Epoch: 2 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:17,271-Speed 6319.37 samples/sec Loss 11.7704 LearningRate 0.0006 Epoch: 2 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:20,512-Speed 6320.01 samples/sec Loss 11.6207 LearningRate 0.0006 Epoch: 2 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:23,742-Speed 6341.84 samples/sec Loss 11.6564 LearningRate 0.0006 Epoch: 2 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:26,985-Speed 6315.90 samples/sec Loss 11.5980 LearningRate 0.0006 Epoch: 2 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:30,231-Speed 6311.64 samples/sec Loss 11.6726 LearningRate 0.0006 Epoch: 2 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:33,475-Speed 6314.76 samples/sec Loss 11.5757 LearningRate 0.0006 Epoch: 2 Global Step: 53050 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:36,724-Speed 6304.96 samples/sec Loss 11.6719 LearningRate 0.0006 Epoch: 2 Global Step: 53060 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:39,968-Speed 6314.54 samples/sec Loss 11.6360 LearningRate 0.0006 Epoch: 2 Global Step: 53070 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:43,212-Speed 6314.94 samples/sec Loss 11.6919 LearningRate 0.0006 Epoch: 2 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:46,460-Speed 6306.51 samples/sec Loss 11.5867 LearningRate 0.0006 Epoch: 2 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:49,705-Speed 6313.32 samples/sec Loss 
11.5048 LearningRate 0.0006 Epoch: 2 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 71 hours Training: 2022-03-31 20:33:52,959-Speed 6295.03 samples/sec Loss 11.5766 LearningRate 0.0006 Epoch: 2 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:33:56,202-Speed 6316.65 samples/sec Loss 11.5639 LearningRate 0.0006 Epoch: 2 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:33:59,434-Speed 6337.18 samples/sec Loss 11.4839 LearningRate 0.0006 Epoch: 2 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:02,680-Speed 6310.45 samples/sec Loss 11.5409 LearningRate 0.0006 Epoch: 2 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:05,929-Speed 6304.26 samples/sec Loss 11.4995 LearningRate 0.0006 Epoch: 2 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:09,174-Speed 6314.23 samples/sec Loss 11.5798 LearningRate 0.0006 Epoch: 2 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:12,418-Speed 6314.39 samples/sec Loss 11.5557 LearningRate 0.0006 Epoch: 2 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:15,666-Speed 6307.25 samples/sec Loss 11.5694 LearningRate 0.0006 Epoch: 2 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:18,917-Speed 6302.13 samples/sec Loss 11.5531 LearningRate 0.0006 Epoch: 2 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:22,159-Speed 6316.98 samples/sec Loss 11.5450 LearningRate 0.0006 Epoch: 2 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:25,404-Speed 6313.01 samples/sec Loss 11.5704 LearningRate 0.0006 Epoch: 2 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:28,645-Speed 6320.24 samples/sec Loss 11.6214 LearningRate 0.0006 Epoch: 2 
Global Step: 53220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:31,896-Speed 6302.47 samples/sec Loss 11.4615 LearningRate 0.0006 Epoch: 2 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:34:35,125-Speed 6343.19 samples/sec Loss 11.6078 LearningRate 0.0006 Epoch: 2 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:38,366-Speed 6319.89 samples/sec Loss 11.6785 LearningRate 0.0006 Epoch: 2 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:41,609-Speed 6316.50 samples/sec Loss 11.5754 LearningRate 0.0006 Epoch: 2 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:44,849-Speed 6323.49 samples/sec Loss 11.6676 LearningRate 0.0006 Epoch: 2 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:48,103-Speed 6295.68 samples/sec Loss 11.5044 LearningRate 0.0006 Epoch: 2 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:51,345-Speed 6318.07 samples/sec Loss 11.5893 LearningRate 0.0006 Epoch: 2 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:54,588-Speed 6315.22 samples/sec Loss 11.5724 LearningRate 0.0006 Epoch: 2 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:34:57,829-Speed 6320.85 samples/sec Loss 11.5316 LearningRate 0.0006 Epoch: 2 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:01,074-Speed 6313.58 samples/sec Loss 11.5544 LearningRate 0.0006 Epoch: 2 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:04,317-Speed 6314.99 samples/sec Loss 11.5952 LearningRate 0.0006 Epoch: 2 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:07,548-Speed 6340.20 samples/sec Loss 11.5802 LearningRate 0.0006 Epoch: 2 Global Step: 53340 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:35:10,789-Speed 6321.32 samples/sec Loss 11.5644 LearningRate 0.0006 Epoch: 2 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:14,033-Speed 6313.32 samples/sec Loss 11.5356 LearningRate 0.0006 Epoch: 2 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:17,277-Speed 6315.72 samples/sec Loss 11.5572 LearningRate 0.0006 Epoch: 2 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:20,521-Speed 6316.03 samples/sec Loss 11.6220 LearningRate 0.0006 Epoch: 2 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:23,764-Speed 6316.49 samples/sec Loss 11.5326 LearningRate 0.0006 Epoch: 2 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:27,010-Speed 6309.74 samples/sec Loss 11.5046 LearningRate 0.0006 Epoch: 2 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:30,255-Speed 6314.11 samples/sec Loss 11.4977 LearningRate 0.0006 Epoch: 2 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:33,501-Speed 6309.32 samples/sec Loss 11.6277 LearningRate 0.0006 Epoch: 2 Global Step: 53420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:36,747-Speed 6310.10 samples/sec Loss 11.6117 LearningRate 0.0006 Epoch: 2 Global Step: 53430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:40,158-Speed 6006.22 samples/sec Loss 11.4654 LearningRate 0.0006 Epoch: 2 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:35:43,468-Speed 6188.81 samples/sec Loss 11.4202 LearningRate 0.0006 Epoch: 2 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:46,712-Speed 6313.82 samples/sec Loss 11.6110 LearningRate 0.0006 Epoch: 2 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:35:49,958-Speed 6311.96 samples/sec Loss 11.5579 LearningRate 0.0006 Epoch: 2 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:53,203-Speed 6312.67 samples/sec Loss 11.5332 LearningRate 0.0006 Epoch: 2 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:56,444-Speed 6319.92 samples/sec Loss 11.5832 LearningRate 0.0006 Epoch: 2 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:35:59,690-Speed 6309.93 samples/sec Loss 11.5951 LearningRate 0.0006 Epoch: 2 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:02,933-Speed 6317.72 samples/sec Loss 11.4956 LearningRate 0.0006 Epoch: 2 Global Step: 53510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:06,178-Speed 6311.48 samples/sec Loss 11.4842 LearningRate 0.0006 Epoch: 2 Global Step: 53520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:09,420-Speed 6319.40 samples/sec Loss 11.5558 LearningRate 0.0006 Epoch: 2 Global Step: 53530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:12,661-Speed 6319.55 samples/sec Loss 11.5738 LearningRate 0.0006 Epoch: 2 Global Step: 53540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:15,890-Speed 6344.08 samples/sec Loss 11.5769 LearningRate 0.0006 Epoch: 2 Global Step: 53550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:19,136-Speed 6311.68 samples/sec Loss 11.5243 LearningRate 0.0006 Epoch: 2 Global Step: 53560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:22,380-Speed 6314.19 samples/sec Loss 11.6506 LearningRate 0.0006 Epoch: 2 Global Step: 53570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:25,627-Speed 6308.16 samples/sec Loss 11.5269 LearningRate 0.0006 Epoch: 2 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:28,872-Speed 6313.17 samples/sec Loss 
11.5265 LearningRate 0.0006 Epoch: 2 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:32,115-Speed 6316.46 samples/sec Loss 11.5659 LearningRate 0.0006 Epoch: 2 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:35,358-Speed 6317.27 samples/sec Loss 11.6630 LearningRate 0.0006 Epoch: 2 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:38,603-Speed 6312.54 samples/sec Loss 11.5445 LearningRate 0.0006 Epoch: 2 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:41,846-Speed 6318.55 samples/sec Loss 11.5146 LearningRate 0.0006 Epoch: 2 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:45,089-Speed 6316.49 samples/sec Loss 11.4532 LearningRate 0.0006 Epoch: 2 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:48,315-Speed 6349.11 samples/sec Loss 11.5573 LearningRate 0.0006 Epoch: 2 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:51,558-Speed 6317.99 samples/sec Loss 11.4737 LearningRate 0.0006 Epoch: 2 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:54,803-Speed 6311.20 samples/sec Loss 11.5203 LearningRate 0.0006 Epoch: 2 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:36:58,043-Speed 6322.56 samples/sec Loss 11.4075 LearningRate 0.0006 Epoch: 2 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:01,292-Speed 6305.36 samples/sec Loss 11.6251 LearningRate 0.0006 Epoch: 2 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:04,536-Speed 6314.41 samples/sec Loss 11.5492 LearningRate 0.0006 Epoch: 2 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:07,779-Speed 6315.40 samples/sec Loss 11.5761 LearningRate 0.0006 Epoch: 2 Global 
Step: 53710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:11,023-Speed 6315.57 samples/sec Loss 11.5071 LearningRate 0.0006 Epoch: 2 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:14,271-Speed 6307.52 samples/sec Loss 11.5511 LearningRate 0.0006 Epoch: 2 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:17,516-Speed 6311.99 samples/sec Loss 11.5058 LearningRate 0.0006 Epoch: 2 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:20,757-Speed 6320.35 samples/sec Loss 11.6054 LearningRate 0.0006 Epoch: 2 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:37:23,985-Speed 6346.22 samples/sec Loss 11.5546 LearningRate 0.0006 Epoch: 2 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:27,230-Speed 6312.76 samples/sec Loss 11.4382 LearningRate 0.0006 Epoch: 2 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:30,479-Speed 6305.58 samples/sec Loss 11.4826 LearningRate 0.0006 Epoch: 2 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:33,726-Speed 6309.04 samples/sec Loss 11.6150 LearningRate 0.0006 Epoch: 2 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:36,973-Speed 6309.39 samples/sec Loss 11.4749 LearningRate 0.0006 Epoch: 2 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:40,220-Speed 6307.57 samples/sec Loss 11.4520 LearningRate 0.0006 Epoch: 2 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:43,471-Speed 6300.79 samples/sec Loss 11.4260 LearningRate 0.0006 Epoch: 2 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:46,715-Speed 6315.84 samples/sec Loss 11.5228 LearningRate 0.0006 Epoch: 2 Global Step: 53830 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:37:49,958-Speed 6315.17 samples/sec Loss 11.3875 LearningRate 0.0006 Epoch: 2 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:53,203-Speed 6312.69 samples/sec Loss 11.5463 LearningRate 0.0006 Epoch: 2 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:56,431-Speed 6347.26 samples/sec Loss 11.5299 LearningRate 0.0006 Epoch: 2 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:37:59,674-Speed 6315.49 samples/sec Loss 11.4535 LearningRate 0.0006 Epoch: 2 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:02,954-Speed 6245.94 samples/sec Loss 11.4618 LearningRate 0.0006 Epoch: 2 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:06,202-Speed 6307.08 samples/sec Loss 11.5234 LearningRate 0.0006 Epoch: 2 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:09,445-Speed 6315.90 samples/sec Loss 11.5285 LearningRate 0.0006 Epoch: 2 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:12,689-Speed 6314.83 samples/sec Loss 11.5659 LearningRate 0.0006 Epoch: 2 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:15,934-Speed 6311.41 samples/sec Loss 11.5015 LearningRate 0.0006 Epoch: 2 Global Step: 53920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:19,224-Speed 6226.93 samples/sec Loss 11.6096 LearningRate 0.0007 Epoch: 2 Global Step: 53930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:22,467-Speed 6317.21 samples/sec Loss 11.4511 LearningRate 0.0007 Epoch: 2 Global Step: 53940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:25,724-Speed 6289.06 samples/sec Loss 11.4542 LearningRate 0.0007 Epoch: 2 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:38:28,988-Speed 6275.39 samples/sec Loss 11.4927 LearningRate 0.0007 Epoch: 2 Global Step: 53960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:32,231-Speed 6316.29 samples/sec Loss 11.4856 LearningRate 0.0007 Epoch: 2 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:35,478-Speed 6309.05 samples/sec Loss 11.4192 LearningRate 0.0007 Epoch: 2 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:38,720-Speed 6319.34 samples/sec Loss 11.4707 LearningRate 0.0007 Epoch: 2 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:41,961-Speed 6320.07 samples/sec Loss 11.4712 LearningRate 0.0007 Epoch: 2 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:45,204-Speed 6317.15 samples/sec Loss 11.4253 LearningRate 0.0007 Epoch: 2 Global Step: 54010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:48,445-Speed 6320.89 samples/sec Loss 11.4539 LearningRate 0.0007 Epoch: 2 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:51,690-Speed 6313.14 samples/sec Loss 11.4340 LearningRate 0.0007 Epoch: 2 Global Step: 54030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:54,934-Speed 6313.56 samples/sec Loss 11.4843 LearningRate 0.0007 Epoch: 2 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:38:58,176-Speed 6317.89 samples/sec Loss 11.4248 LearningRate 0.0007 Epoch: 2 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:01,432-Speed 6291.40 samples/sec Loss 11.4417 LearningRate 0.0007 Epoch: 2 Global Step: 54060 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:39:04,663-Speed 6341.81 samples/sec Loss 11.4438 LearningRate 0.0007 Epoch: 2 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:07,906-Speed 6314.66 samples/sec 
Loss 11.5100 LearningRate 0.0007 Epoch: 2 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:11,156-Speed 6304.54 samples/sec Loss 11.5430 LearningRate 0.0007 Epoch: 2 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:14,398-Speed 6318.43 samples/sec Loss 11.5423 LearningRate 0.0007 Epoch: 2 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:17,643-Speed 6312.94 samples/sec Loss 11.4876 LearningRate 0.0007 Epoch: 2 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:20,891-Speed 6305.61 samples/sec Loss 11.5083 LearningRate 0.0007 Epoch: 2 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:24,140-Speed 6304.30 samples/sec Loss 11.6344 LearningRate 0.0007 Epoch: 2 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:27,381-Speed 6321.78 samples/sec Loss 11.4733 LearningRate 0.0007 Epoch: 2 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:30,622-Speed 6320.64 samples/sec Loss 11.4550 LearningRate 0.0007 Epoch: 2 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:33,866-Speed 6314.53 samples/sec Loss 11.4114 LearningRate 0.0007 Epoch: 2 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:37,096-Speed 6340.27 samples/sec Loss 11.5013 LearningRate 0.0007 Epoch: 2 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:40,373-Speed 6251.67 samples/sec Loss 11.4233 LearningRate 0.0007 Epoch: 2 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:43,619-Speed 6311.76 samples/sec Loss 11.5058 LearningRate 0.0007 Epoch: 2 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:46,863-Speed 6314.87 samples/sec Loss 11.5057 LearningRate 0.0007 Epoch: 2 
Global Step: 54200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:50,105-Speed 6318.96 samples/sec Loss 11.4698 LearningRate 0.0007 Epoch: 2 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:53,358-Speed 6296.06 samples/sec Loss 11.3723 LearningRate 0.0007 Epoch: 2 Global Step: 54220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:56,600-Speed 6319.05 samples/sec Loss 11.4400 LearningRate 0.0007 Epoch: 2 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:39:59,848-Speed 6310.21 samples/sec Loss 11.5606 LearningRate 0.0007 Epoch: 2 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:03,098-Speed 6303.14 samples/sec Loss 11.4321 LearningRate 0.0007 Epoch: 2 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:06,343-Speed 6311.73 samples/sec Loss 11.4368 LearningRate 0.0007 Epoch: 2 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:09,577-Speed 6334.30 samples/sec Loss 11.5252 LearningRate 0.0007 Epoch: 2 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:12,821-Speed 6313.87 samples/sec Loss 11.4950 LearningRate 0.0007 Epoch: 2 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:16,065-Speed 6315.15 samples/sec Loss 11.3977 LearningRate 0.0007 Epoch: 2 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:19,312-Speed 6309.01 samples/sec Loss 11.4456 LearningRate 0.0007 Epoch: 2 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:22,562-Speed 6302.00 samples/sec Loss 11.4310 LearningRate 0.0007 Epoch: 2 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:25,804-Speed 6319.82 samples/sec Loss 11.4166 LearningRate 0.0007 Epoch: 2 Global Step: 54320 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:40:29,043-Speed 6324.26 samples/sec Loss 11.4670 LearningRate 0.0007 Epoch: 2 Global Step: 54330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:32,288-Speed 6311.28 samples/sec Loss 11.4530 LearningRate 0.0007 Epoch: 2 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:35,533-Speed 6313.82 samples/sec Loss 11.4096 LearningRate 0.0007 Epoch: 2 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:38,779-Speed 6310.24 samples/sec Loss 11.4666 LearningRate 0.0007 Epoch: 2 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:42,079-Speed 6207.73 samples/sec Loss 11.4054 LearningRate 0.0007 Epoch: 2 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:40:45,309-Speed 6342.57 samples/sec Loss 11.5313 LearningRate 0.0007 Epoch: 2 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:48,551-Speed 6317.22 samples/sec Loss 11.3261 LearningRate 0.0007 Epoch: 2 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:51,792-Speed 6319.88 samples/sec Loss 11.4099 LearningRate 0.0007 Epoch: 2 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:55,038-Speed 6312.48 samples/sec Loss 11.4615 LearningRate 0.0007 Epoch: 2 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:40:58,289-Speed 6301.59 samples/sec Loss 11.4713 LearningRate 0.0007 Epoch: 2 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:01,534-Speed 6315.06 samples/sec Loss 11.4975 LearningRate 0.0007 Epoch: 2 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:04,777-Speed 6317.50 samples/sec Loss 11.5759 LearningRate 0.0007 Epoch: 2 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:41:08,025-Speed 6306.63 samples/sec Loss 11.4689 LearningRate 0.0007 Epoch: 2 Global Step: 54450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:11,270-Speed 6312.35 samples/sec Loss 11.4348 LearningRate 0.0007 Epoch: 2 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:14,514-Speed 6313.85 samples/sec Loss 11.4012 LearningRate 0.0007 Epoch: 2 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:17,768-Speed 6296.81 samples/sec Loss 11.3491 LearningRate 0.0007 Epoch: 2 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:21,010-Speed 6317.11 samples/sec Loss 11.4359 LearningRate 0.0007 Epoch: 2 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:24,254-Speed 6315.50 samples/sec Loss 11.4348 LearningRate 0.0007 Epoch: 2 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:27,496-Speed 6317.36 samples/sec Loss 11.3940 LearningRate 0.0007 Epoch: 2 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:30,739-Speed 6317.08 samples/sec Loss 11.4156 LearningRate 0.0007 Epoch: 2 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:33,985-Speed 6311.66 samples/sec Loss 11.4186 LearningRate 0.0007 Epoch: 2 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:37,226-Speed 6320.01 samples/sec Loss 11.4652 LearningRate 0.0007 Epoch: 2 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:40,469-Speed 6315.90 samples/sec Loss 11.4803 LearningRate 0.0007 Epoch: 2 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:43,716-Speed 6308.55 samples/sec Loss 11.4540 LearningRate 0.0007 Epoch: 2 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:46,958-Speed 6319.63 samples/sec Loss 
11.5099 LearningRate 0.0007 Epoch: 2 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:50,203-Speed 6312.05 samples/sec Loss 11.4000 LearningRate 0.0007 Epoch: 2 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:41:53,433-Speed 6342.26 samples/sec Loss 11.3940 LearningRate 0.0007 Epoch: 2 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:56,683-Speed 6303.86 samples/sec Loss 11.4658 LearningRate 0.0007 Epoch: 2 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:41:59,929-Speed 6311.03 samples/sec Loss 11.4342 LearningRate 0.0007 Epoch: 2 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:03,174-Speed 6311.90 samples/sec Loss 11.3524 LearningRate 0.0007 Epoch: 2 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:06,424-Speed 6303.53 samples/sec Loss 11.3780 LearningRate 0.0007 Epoch: 2 Global Step: 54630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:09,671-Speed 6308.61 samples/sec Loss 11.4466 LearningRate 0.0007 Epoch: 2 Global Step: 54640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:12,922-Speed 6300.69 samples/sec Loss 11.4420 LearningRate 0.0007 Epoch: 2 Global Step: 54650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:16,168-Speed 6310.36 samples/sec Loss 11.4385 LearningRate 0.0007 Epoch: 2 Global Step: 54660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:19,413-Speed 6312.90 samples/sec Loss 11.4016 LearningRate 0.0007 Epoch: 2 Global Step: 54670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:22,657-Speed 6315.26 samples/sec Loss 11.4613 LearningRate 0.0007 Epoch: 2 Global Step: 54680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:25,887-Speed 6342.43 samples/sec Loss 11.4697 LearningRate 0.0007 Epoch: 2 
Global Step: 54690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:29,131-Speed 6314.19 samples/sec Loss 11.5308 LearningRate 0.0007 Epoch: 2 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:32,379-Speed 6307.46 samples/sec Loss 11.3502 LearningRate 0.0007 Epoch: 2 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:35,618-Speed 6323.04 samples/sec Loss 11.3989 LearningRate 0.0007 Epoch: 2 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:38,864-Speed 6311.21 samples/sec Loss 11.2838 LearningRate 0.0007 Epoch: 2 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:42,109-Speed 6312.30 samples/sec Loss 11.4369 LearningRate 0.0007 Epoch: 2 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:45,362-Speed 6297.21 samples/sec Loss 11.3894 LearningRate 0.0007 Epoch: 2 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:48,602-Speed 6322.09 samples/sec Loss 11.4005 LearningRate 0.0007 Epoch: 2 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:51,848-Speed 6311.81 samples/sec Loss 11.4065 LearningRate 0.0007 Epoch: 2 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:55,090-Speed 6317.21 samples/sec Loss 11.4186 LearningRate 0.0007 Epoch: 2 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:42:58,333-Speed 6315.96 samples/sec Loss 11.3649 LearningRate 0.0007 Epoch: 2 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:43:01,569-Speed 6335.21 samples/sec Loss 11.4125 LearningRate 0.0007 Epoch: 2 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:04,813-Speed 6314.18 samples/sec Loss 11.3724 LearningRate 0.0007 Epoch: 2 Global Step: 54810 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:43:08,057-Speed 6315.38 samples/sec Loss 11.3455 LearningRate 0.0007 Epoch: 2 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:11,297-Speed 6321.46 samples/sec Loss 11.4922 LearningRate 0.0007 Epoch: 2 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:14,543-Speed 6310.59 samples/sec Loss 11.3945 LearningRate 0.0007 Epoch: 2 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:17,789-Speed 6310.82 samples/sec Loss 11.4381 LearningRate 0.0007 Epoch: 2 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:21,032-Speed 6316.40 samples/sec Loss 11.3737 LearningRate 0.0007 Epoch: 2 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:24,277-Speed 6312.42 samples/sec Loss 11.4288 LearningRate 0.0007 Epoch: 2 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:27,518-Speed 6320.88 samples/sec Loss 11.4295 LearningRate 0.0007 Epoch: 2 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:30,764-Speed 6312.15 samples/sec Loss 11.3800 LearningRate 0.0007 Epoch: 2 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:33,995-Speed 6339.11 samples/sec Loss 11.3575 LearningRate 0.0007 Epoch: 2 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:37,242-Speed 6308.59 samples/sec Loss 11.3908 LearningRate 0.0007 Epoch: 2 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:40,485-Speed 6316.93 samples/sec Loss 11.3052 LearningRate 0.0007 Epoch: 2 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:43,729-Speed 6314.87 samples/sec Loss 11.4048 LearningRate 0.0007 Epoch: 2 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:43:46,978-Speed 6304.54 samples/sec Loss 11.4098 LearningRate 0.0007 Epoch: 2 Global Step: 54940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:50,218-Speed 6321.72 samples/sec Loss 11.4158 LearningRate 0.0007 Epoch: 2 Global Step: 54950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:53,463-Speed 6313.74 samples/sec Loss 11.3955 LearningRate 0.0007 Epoch: 2 Global Step: 54960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:56,718-Speed 6292.28 samples/sec Loss 11.4152 LearningRate 0.0007 Epoch: 2 Global Step: 54970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:43:59,961-Speed 6316.54 samples/sec Loss 11.3725 LearningRate 0.0007 Epoch: 2 Global Step: 54980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:03,208-Speed 6309.64 samples/sec Loss 11.4673 LearningRate 0.0007 Epoch: 2 Global Step: 54990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:06,433-Speed 6350.16 samples/sec Loss 11.3092 LearningRate 0.0007 Epoch: 2 Global Step: 55000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:09,678-Speed 6313.76 samples/sec Loss 11.3341 LearningRate 0.0007 Epoch: 2 Global Step: 55010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:12,924-Speed 6311.84 samples/sec Loss 11.3165 LearningRate 0.0007 Epoch: 2 Global Step: 55020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:16,167-Speed 6315.99 samples/sec Loss 11.4482 LearningRate 0.0007 Epoch: 2 Global Step: 55030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:19,413-Speed 6311.89 samples/sec Loss 11.3386 LearningRate 0.0007 Epoch: 2 Global Step: 55040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:22,656-Speed 6316.13 samples/sec Loss 11.4014 LearningRate 0.0007 Epoch: 2 Global Step: 55050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:25,904-Speed 6305.95 samples/sec Loss 
11.3225 LearningRate 0.0007 Epoch: 2 Global Step: 55060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:29,150-Speed 6312.06 samples/sec Loss 11.2948 LearningRate 0.0007 Epoch: 2 Global Step: 55070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:32,394-Speed 6314.90 samples/sec Loss 11.3683 LearningRate 0.0007 Epoch: 2 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:35,638-Speed 6314.45 samples/sec Loss 11.3658 LearningRate 0.0007 Epoch: 2 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:38,882-Speed 6313.89 samples/sec Loss 11.3782 LearningRate 0.0007 Epoch: 2 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:44:42,113-Speed 6340.27 samples/sec Loss 11.3005 LearningRate 0.0007 Epoch: 2 Global Step: 55110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:45,361-Speed 6305.85 samples/sec Loss 11.3138 LearningRate 0.0007 Epoch: 2 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:48,614-Speed 6297.39 samples/sec Loss 11.3204 LearningRate 0.0007 Epoch: 2 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:51,861-Speed 6308.41 samples/sec Loss 11.2976 LearningRate 0.0007 Epoch: 2 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:55,111-Speed 6303.27 samples/sec Loss 11.4781 LearningRate 0.0007 Epoch: 2 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:44:58,358-Speed 6309.58 samples/sec Loss 11.3296 LearningRate 0.0007 Epoch: 2 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:01,609-Speed 6299.95 samples/sec Loss 11.3934 LearningRate 0.0007 Epoch: 2 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:04,856-Speed 6308.50 samples/sec Loss 11.4469 LearningRate 0.0007 Epoch: 2 
Global Step: 55180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:08,104-Speed 6307.19 samples/sec Loss 11.3790 LearningRate 0.0007 Epoch: 2 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:11,345-Speed 6320.80 samples/sec Loss 11.4209 LearningRate 0.0007 Epoch: 2 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:14,575-Speed 6342.58 samples/sec Loss 11.3371 LearningRate 0.0007 Epoch: 2 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:17,816-Speed 6321.67 samples/sec Loss 11.2581 LearningRate 0.0007 Epoch: 2 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:21,060-Speed 6314.50 samples/sec Loss 11.3948 LearningRate 0.0007 Epoch: 2 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:24,304-Speed 6313.20 samples/sec Loss 11.3839 LearningRate 0.0007 Epoch: 2 Global Step: 55240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:27,546-Speed 6319.20 samples/sec Loss 11.2561 LearningRate 0.0007 Epoch: 2 Global Step: 55250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:30,790-Speed 6314.54 samples/sec Loss 11.4069 LearningRate 0.0007 Epoch: 2 Global Step: 55260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:34,031-Speed 6319.92 samples/sec Loss 11.2808 LearningRate 0.0007 Epoch: 2 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:37,278-Speed 6309.55 samples/sec Loss 11.3236 LearningRate 0.0007 Epoch: 2 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:40,523-Speed 6312.36 samples/sec Loss 11.3778 LearningRate 0.0007 Epoch: 2 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:43,772-Speed 6305.93 samples/sec Loss 11.2734 LearningRate 0.0007 Epoch: 2 Global Step: 55300 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:45:47,016-Speed 6314.86 samples/sec Loss 11.4130 LearningRate 0.0007 Epoch: 2 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:45:50,246-Speed 6340.88 samples/sec Loss 11.2554 LearningRate 0.0007 Epoch: 2 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:53,494-Speed 6307.15 samples/sec Loss 11.3592 LearningRate 0.0007 Epoch: 2 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:56,735-Speed 6319.41 samples/sec Loss 11.3431 LearningRate 0.0007 Epoch: 2 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:45:59,982-Speed 6310.23 samples/sec Loss 11.3308 LearningRate 0.0007 Epoch: 2 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:03,230-Speed 6305.85 samples/sec Loss 11.2767 LearningRate 0.0007 Epoch: 2 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:06,474-Speed 6315.17 samples/sec Loss 11.2382 LearningRate 0.0007 Epoch: 2 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:09,717-Speed 6316.25 samples/sec Loss 11.2867 LearningRate 0.0007 Epoch: 2 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:12,966-Speed 6307.74 samples/sec Loss 11.2843 LearningRate 0.0007 Epoch: 2 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:16,217-Speed 6299.90 samples/sec Loss 11.3995 LearningRate 0.0007 Epoch: 2 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:19,464-Speed 6310.47 samples/sec Loss 11.2506 LearningRate 0.0007 Epoch: 2 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:22,758-Speed 6218.60 samples/sec Loss 11.2375 LearningRate 0.0007 Epoch: 2 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 
20:46:26,040-Speed 6240.61 samples/sec Loss 11.4631 LearningRate 0.0007 Epoch: 2 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:29,285-Speed 6312.69 samples/sec Loss 11.3006 LearningRate 0.0007 Epoch: 2 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:32,531-Speed 6312.74 samples/sec Loss 11.2941 LearningRate 0.0007 Epoch: 2 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:35,776-Speed 6311.82 samples/sec Loss 11.3733 LearningRate 0.0007 Epoch: 2 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:39,021-Speed 6312.73 samples/sec Loss 11.3217 LearningRate 0.0007 Epoch: 2 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:42,262-Speed 6320.73 samples/sec Loss 11.4047 LearningRate 0.0007 Epoch: 2 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:45,502-Speed 6321.67 samples/sec Loss 11.2690 LearningRate 0.0007 Epoch: 2 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:48,744-Speed 6318.88 samples/sec Loss 11.2217 LearningRate 0.0007 Epoch: 2 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:51,989-Speed 6312.87 samples/sec Loss 11.3006 LearningRate 0.0007 Epoch: 2 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:55,232-Speed 6315.26 samples/sec Loss 11.3178 LearningRate 0.0007 Epoch: 2 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:46:58,464-Speed 6338.26 samples/sec Loss 11.2915 LearningRate 0.0007 Epoch: 2 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:01,755-Speed 6224.71 samples/sec Loss 11.2827 LearningRate 0.0007 Epoch: 2 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:05,049-Speed 6219.29 samples/sec Loss 
11.3248 LearningRate 0.0007 Epoch: 2 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:08,319-Speed 6263.82 samples/sec Loss 11.2249 LearningRate 0.0007 Epoch: 2 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:11,563-Speed 6316.72 samples/sec Loss 11.3530 LearningRate 0.0007 Epoch: 2 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:14,807-Speed 6314.36 samples/sec Loss 11.3333 LearningRate 0.0007 Epoch: 2 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:18,091-Speed 6237.98 samples/sec Loss 11.2991 LearningRate 0.0007 Epoch: 2 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:21,334-Speed 6315.94 samples/sec Loss 11.3011 LearningRate 0.0007 Epoch: 2 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:24,578-Speed 6313.68 samples/sec Loss 11.2943 LearningRate 0.0007 Epoch: 2 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:27,825-Speed 6308.62 samples/sec Loss 11.3164 LearningRate 0.0007 Epoch: 2 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:31,055-Speed 6343.10 samples/sec Loss 11.2413 LearningRate 0.0007 Epoch: 2 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:34,298-Speed 6317.55 samples/sec Loss 11.3963 LearningRate 0.0007 Epoch: 2 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:37,539-Speed 6320.69 samples/sec Loss 11.2789 LearningRate 0.0007 Epoch: 2 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:40,785-Speed 6309.51 samples/sec Loss 11.3655 LearningRate 0.0007 Epoch: 2 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:44,030-Speed 6314.21 samples/sec Loss 11.4121 LearningRate 0.0007 Epoch: 2 Global 
Step: 55670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:47,278-Speed 6305.74 samples/sec Loss 11.3220 LearningRate 0.0007 Epoch: 2 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:50,524-Speed 6311.73 samples/sec Loss 11.3632 LearningRate 0.0007 Epoch: 2 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:53,768-Speed 6313.54 samples/sec Loss 11.3493 LearningRate 0.0007 Epoch: 2 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:47:57,012-Speed 6313.99 samples/sec Loss 11.2183 LearningRate 0.0007 Epoch: 2 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:00,256-Speed 6316.26 samples/sec Loss 11.3520 LearningRate 0.0007 Epoch: 2 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:03,498-Speed 6317.07 samples/sec Loss 11.1524 LearningRate 0.0007 Epoch: 2 Global Step: 55730 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:48:06,745-Speed 6308.31 samples/sec Loss 11.2009 LearningRate 0.0007 Epoch: 2 Global Step: 55740 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:48:09,975-Speed 6343.20 samples/sec Loss 11.2933 LearningRate 0.0007 Epoch: 2 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:13,218-Speed 6316.50 samples/sec Loss 11.3387 LearningRate 0.0007 Epoch: 2 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:16,469-Speed 6301.39 samples/sec Loss 11.3679 LearningRate 0.0007 Epoch: 2 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:19,717-Speed 6305.57 samples/sec Loss 11.1393 LearningRate 0.0007 Epoch: 2 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:22,963-Speed 6310.83 samples/sec Loss 11.3407 LearningRate 0.0007 Epoch: 2 Global Step: 55790 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:48:26,204-Speed 6320.14 samples/sec Loss 11.2880 LearningRate 0.0007 Epoch: 2 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:29,446-Speed 6319.11 samples/sec Loss 11.2857 LearningRate 0.0007 Epoch: 2 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:32,689-Speed 6315.40 samples/sec Loss 11.2135 LearningRate 0.0007 Epoch: 2 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:35,945-Speed 6293.99 samples/sec Loss 11.2680 LearningRate 0.0007 Epoch: 2 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:39,189-Speed 6314.61 samples/sec Loss 11.2792 LearningRate 0.0007 Epoch: 2 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:42,415-Speed 6348.39 samples/sec Loss 11.1603 LearningRate 0.0007 Epoch: 2 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:45,658-Speed 6317.09 samples/sec Loss 11.2619 LearningRate 0.0007 Epoch: 2 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:48,901-Speed 6316.10 samples/sec Loss 11.1769 LearningRate 0.0007 Epoch: 2 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:52,143-Speed 6319.45 samples/sec Loss 11.2526 LearningRate 0.0007 Epoch: 2 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:55,386-Speed 6316.75 samples/sec Loss 11.2228 LearningRate 0.0007 Epoch: 2 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:48:58,630-Speed 6314.94 samples/sec Loss 11.2493 LearningRate 0.0007 Epoch: 2 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:01,884-Speed 6294.41 samples/sec Loss 11.2794 LearningRate 0.0007 Epoch: 2 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:49:05,126-Speed 6317.84 samples/sec Loss 11.2034 LearningRate 0.0007 Epoch: 2 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:08,372-Speed 6312.24 samples/sec Loss 11.3013 LearningRate 0.0007 Epoch: 2 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:11,613-Speed 6318.79 samples/sec Loss 11.2192 LearningRate 0.0007 Epoch: 2 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:14,862-Speed 6306.17 samples/sec Loss 11.2213 LearningRate 0.0007 Epoch: 2 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:49:18,093-Speed 6340.06 samples/sec Loss 11.3103 LearningRate 0.0007 Epoch: 2 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:21,337-Speed 6313.64 samples/sec Loss 11.1808 LearningRate 0.0007 Epoch: 2 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:24,580-Speed 6316.91 samples/sec Loss 11.2884 LearningRate 0.0007 Epoch: 2 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:27,824-Speed 6314.46 samples/sec Loss 11.3489 LearningRate 0.0007 Epoch: 2 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:31,069-Speed 6312.29 samples/sec Loss 11.3562 LearningRate 0.0007 Epoch: 2 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:34,313-Speed 6315.28 samples/sec Loss 11.2860 LearningRate 0.0007 Epoch: 2 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:37,558-Speed 6311.97 samples/sec Loss 11.2283 LearningRate 0.0007 Epoch: 2 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:40,802-Speed 6314.68 samples/sec Loss 11.2604 LearningRate 0.0007 Epoch: 2 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:44,045-Speed 6316.79 samples/sec 
Loss 11.2110 LearningRate 0.0007 Epoch: 2 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:47,289-Speed 6315.84 samples/sec Loss 11.2411 LearningRate 0.0007 Epoch: 2 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:50,518-Speed 6344.13 samples/sec Loss 11.2267 LearningRate 0.0007 Epoch: 2 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:53,768-Speed 6304.41 samples/sec Loss 11.2498 LearningRate 0.0007 Epoch: 2 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:49:57,012-Speed 6314.26 samples/sec Loss 11.3237 LearningRate 0.0007 Epoch: 2 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:00,250-Speed 6325.97 samples/sec Loss 11.3334 LearningRate 0.0007 Epoch: 2 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:03,495-Speed 6313.25 samples/sec Loss 11.3037 LearningRate 0.0007 Epoch: 2 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:06,736-Speed 6320.04 samples/sec Loss 11.1918 LearningRate 0.0007 Epoch: 2 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:09,979-Speed 6316.44 samples/sec Loss 11.1451 LearningRate 0.0007 Epoch: 2 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:13,223-Speed 6315.64 samples/sec Loss 11.2290 LearningRate 0.0007 Epoch: 2 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:16,466-Speed 6315.48 samples/sec Loss 11.1380 LearningRate 0.0007 Epoch: 2 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:19,712-Speed 6310.83 samples/sec Loss 11.2473 LearningRate 0.0007 Epoch: 2 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:22,945-Speed 6337.44 samples/sec Loss 11.2499 LearningRate 0.0007 Epoch: 2 
Global Step: 56160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:26,189-Speed 6313.54 samples/sec Loss 11.2583 LearningRate 0.0007 Epoch: 2 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:29,430-Speed 6321.42 samples/sec Loss 11.1835 LearningRate 0.0007 Epoch: 2 Global Step: 56180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:32,670-Speed 6320.95 samples/sec Loss 11.1726 LearningRate 0.0007 Epoch: 2 Global Step: 56190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:35,914-Speed 6316.14 samples/sec Loss 11.2225 LearningRate 0.0007 Epoch: 2 Global Step: 56200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:39,154-Speed 6321.61 samples/sec Loss 11.2378 LearningRate 0.0007 Epoch: 2 Global Step: 56210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:42,396-Speed 6317.16 samples/sec Loss 11.2627 LearningRate 0.0007 Epoch: 2 Global Step: 56220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:45,641-Speed 6313.67 samples/sec Loss 11.3144 LearningRate 0.0007 Epoch: 2 Global Step: 56230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:48,887-Speed 6310.93 samples/sec Loss 11.1722 LearningRate 0.0007 Epoch: 2 Global Step: 56240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:52,142-Speed 6293.73 samples/sec Loss 11.2693 LearningRate 0.0007 Epoch: 2 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:55,373-Speed 6341.44 samples/sec Loss 11.3206 LearningRate 0.0007 Epoch: 2 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:50:58,615-Speed 6318.44 samples/sec Loss 11.2817 LearningRate 0.0007 Epoch: 2 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:01,863-Speed 6306.61 samples/sec Loss 11.1757 LearningRate 0.0007 Epoch: 2 Global Step: 56280 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:51:05,110-Speed 6308.52 samples/sec Loss 11.1856 LearningRate 0.0007 Epoch: 2 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:08,353-Speed 6315.26 samples/sec Loss 11.3063 LearningRate 0.0007 Epoch: 2 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:11,597-Speed 6315.13 samples/sec Loss 11.2548 LearningRate 0.0007 Epoch: 2 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:14,846-Speed 6304.61 samples/sec Loss 11.3202 LearningRate 0.0007 Epoch: 2 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:18,088-Speed 6319.55 samples/sec Loss 11.1992 LearningRate 0.0007 Epoch: 2 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:21,332-Speed 6313.82 samples/sec Loss 11.1303 LearningRate 0.0007 Epoch: 2 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:24,577-Speed 6313.80 samples/sec Loss 11.1884 LearningRate 0.0007 Epoch: 2 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:27,810-Speed 6335.29 samples/sec Loss 11.1738 LearningRate 0.0007 Epoch: 2 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:31,054-Speed 6315.33 samples/sec Loss 11.2333 LearningRate 0.0007 Epoch: 2 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:34,301-Speed 6307.49 samples/sec Loss 11.1811 LearningRate 0.0007 Epoch: 2 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:37,549-Speed 6307.26 samples/sec Loss 11.2547 LearningRate 0.0007 Epoch: 2 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:40,791-Speed 6317.35 samples/sec Loss 11.2858 LearningRate 0.0007 Epoch: 2 Global Step: 56400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:51:44,036-Speed 6313.42 samples/sec Loss 11.2407 LearningRate 0.0007 Epoch: 2 Global Step: 56410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:47,284-Speed 6306.90 samples/sec Loss 11.2090 LearningRate 0.0007 Epoch: 2 Global Step: 56420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:50,529-Speed 6313.37 samples/sec Loss 11.1810 LearningRate 0.0007 Epoch: 2 Global Step: 56430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:53,776-Speed 6308.09 samples/sec Loss 11.2069 LearningRate 0.0007 Epoch: 2 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:51:57,016-Speed 6322.73 samples/sec Loss 11.2150 LearningRate 0.0007 Epoch: 2 Global Step: 56450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:00,247-Speed 6339.88 samples/sec Loss 11.1822 LearningRate 0.0007 Epoch: 2 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:03,500-Speed 6298.62 samples/sec Loss 11.2314 LearningRate 0.0007 Epoch: 2 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:06,758-Speed 6287.80 samples/sec Loss 11.2459 LearningRate 0.0007 Epoch: 2 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:10,003-Speed 6311.57 samples/sec Loss 11.1768 LearningRate 0.0007 Epoch: 2 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:13,248-Speed 6313.49 samples/sec Loss 11.1995 LearningRate 0.0007 Epoch: 2 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:16,564-Speed 6176.47 samples/sec Loss 11.1896 LearningRate 0.0007 Epoch: 2 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:19,811-Speed 6309.22 samples/sec Loss 11.2141 LearningRate 0.0007 Epoch: 2 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:23,060-Speed 6306.14 samples/sec Loss 
11.1941 LearningRate 0.0007 Epoch: 2 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:26,302-Speed 6317.29 samples/sec Loss 11.1297 LearningRate 0.0007 Epoch: 2 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:29,545-Speed 6315.91 samples/sec Loss 11.1656 LearningRate 0.0007 Epoch: 2 Global Step: 56550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:32,793-Speed 6307.16 samples/sec Loss 11.1756 LearningRate 0.0007 Epoch: 2 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:52:36,022-Speed 6344.27 samples/sec Loss 11.1799 LearningRate 0.0007 Epoch: 2 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:39,294-Speed 6260.03 samples/sec Loss 11.1196 LearningRate 0.0007 Epoch: 2 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:42,586-Speed 6223.55 samples/sec Loss 11.1342 LearningRate 0.0007 Epoch: 2 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:45,834-Speed 6306.76 samples/sec Loss 11.2181 LearningRate 0.0007 Epoch: 2 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:49,110-Speed 6251.44 samples/sec Loss 11.1717 LearningRate 0.0007 Epoch: 2 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:52,431-Speed 6168.13 samples/sec Loss 11.3306 LearningRate 0.0007 Epoch: 2 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:55,679-Speed 6307.80 samples/sec Loss 11.1784 LearningRate 0.0007 Epoch: 2 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:52:58,923-Speed 6315.63 samples/sec Loss 11.2059 LearningRate 0.0007 Epoch: 2 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:02,167-Speed 6314.49 samples/sec Loss 11.1928 LearningRate 0.0007 Epoch: 2 
Global Step: 56650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:05,410-Speed 6316.28 samples/sec Loss 11.2101 LearningRate 0.0007 Epoch: 2 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:08,636-Speed 6350.37 samples/sec Loss 11.0993 LearningRate 0.0007 Epoch: 2 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:11,880-Speed 6314.47 samples/sec Loss 11.2434 LearningRate 0.0007 Epoch: 2 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:15,123-Speed 6316.71 samples/sec Loss 11.2996 LearningRate 0.0007 Epoch: 2 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:18,374-Speed 6301.37 samples/sec Loss 11.1867 LearningRate 0.0007 Epoch: 2 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:21,620-Speed 6310.63 samples/sec Loss 11.1865 LearningRate 0.0007 Epoch: 2 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:24,864-Speed 6315.37 samples/sec Loss 11.1650 LearningRate 0.0007 Epoch: 2 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:28,131-Speed 6268.62 samples/sec Loss 11.1453 LearningRate 0.0007 Epoch: 2 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:31,377-Speed 6310.93 samples/sec Loss 11.1799 LearningRate 0.0007 Epoch: 2 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:34,623-Speed 6311.30 samples/sec Loss 11.2113 LearningRate 0.0007 Epoch: 2 Global Step: 56750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:37,871-Speed 6306.84 samples/sec Loss 11.1024 LearningRate 0.0007 Epoch: 2 Global Step: 56760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:41,103-Speed 6338.20 samples/sec Loss 11.2150 LearningRate 0.0007 Epoch: 2 Global Step: 56770 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:53:44,353-Speed 6302.97 samples/sec Loss 11.1464 LearningRate 0.0007 Epoch: 2 Global Step: 56780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:47,599-Speed 6309.48 samples/sec Loss 11.1104 LearningRate 0.0007 Epoch: 2 Global Step: 56790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:50,842-Speed 6316.00 samples/sec Loss 11.2483 LearningRate 0.0007 Epoch: 2 Global Step: 56800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:54,087-Speed 6313.72 samples/sec Loss 11.1133 LearningRate 0.0007 Epoch: 2 Global Step: 56810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:53:57,336-Speed 6305.77 samples/sec Loss 11.1709 LearningRate 0.0007 Epoch: 2 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:00,586-Speed 6302.13 samples/sec Loss 11.1920 LearningRate 0.0007 Epoch: 2 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:03,837-Speed 6301.67 samples/sec Loss 11.1327 LearningRate 0.0007 Epoch: 2 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:07,085-Speed 6307.74 samples/sec Loss 11.1510 LearningRate 0.0007 Epoch: 2 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:10,330-Speed 6312.01 samples/sec Loss 11.2437 LearningRate 0.0007 Epoch: 2 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:13,574-Speed 6314.56 samples/sec Loss 11.2864 LearningRate 0.0007 Epoch: 2 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:54:16,821-Speed 6308.99 samples/sec Loss 11.1883 LearningRate 0.0007 Epoch: 2 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:54:20,053-Speed 6338.93 samples/sec Loss 11.2325 LearningRate 0.0007 Epoch: 2 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:54:23,298-Speed 6311.71 samples/sec Loss 11.1959 LearningRate 0.0007 Epoch: 2 Global Step: 56900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:26,544-Speed 6310.04 samples/sec Loss 11.2257 LearningRate 0.0007 Epoch: 2 Global Step: 56910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:29,790-Speed 6311.44 samples/sec Loss 11.0849 LearningRate 0.0007 Epoch: 2 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:33,034-Speed 6313.97 samples/sec Loss 11.1310 LearningRate 0.0007 Epoch: 2 Global Step: 56930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:36,281-Speed 6309.82 samples/sec Loss 11.0128 LearningRate 0.0007 Epoch: 2 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:39,528-Speed 6308.95 samples/sec Loss 11.1392 LearningRate 0.0007 Epoch: 2 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:42,772-Speed 6313.90 samples/sec Loss 11.2013 LearningRate 0.0007 Epoch: 2 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:46,019-Speed 6307.93 samples/sec Loss 11.1566 LearningRate 0.0007 Epoch: 2 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:49,261-Speed 6319.08 samples/sec Loss 11.2209 LearningRate 0.0007 Epoch: 2 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:52,495-Speed 6334.95 samples/sec Loss 11.1674 LearningRate 0.0007 Epoch: 2 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:55,738-Speed 6316.21 samples/sec Loss 10.9729 LearningRate 0.0007 Epoch: 2 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:54:59,051-Speed 6183.36 samples/sec Loss 11.1514 LearningRate 0.0007 Epoch: 2 Global Step: 57010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:02,376-Speed 6159.48 samples/sec Loss 
11.1570 LearningRate 0.0007 Epoch: 2 Global Step: 57020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:05,630-Speed 6294.68 samples/sec Loss 11.0797 LearningRate 0.0007 Epoch: 2 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:08,878-Speed 6307.33 samples/sec Loss 11.0268 LearningRate 0.0007 Epoch: 2 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:12,121-Speed 6316.97 samples/sec Loss 11.2035 LearningRate 0.0007 Epoch: 2 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:15,367-Speed 6311.62 samples/sec Loss 11.0831 LearningRate 0.0007 Epoch: 2 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:18,610-Speed 6317.52 samples/sec Loss 11.1330 LearningRate 0.0007 Epoch: 2 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:21,851-Speed 6319.00 samples/sec Loss 11.0730 LearningRate 0.0007 Epoch: 2 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:25,080-Speed 6345.59 samples/sec Loss 11.1172 LearningRate 0.0007 Epoch: 2 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:28,326-Speed 6310.80 samples/sec Loss 11.1889 LearningRate 0.0007 Epoch: 2 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:31,575-Speed 6304.41 samples/sec Loss 11.1938 LearningRate 0.0007 Epoch: 2 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:34,819-Speed 6313.02 samples/sec Loss 11.1940 LearningRate 0.0007 Epoch: 2 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:38,066-Speed 6310.60 samples/sec Loss 11.0477 LearningRate 0.0007 Epoch: 2 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:41,310-Speed 6314.64 samples/sec Loss 11.0709 LearningRate 0.0007 Epoch: 2 Global 
Step: 57140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:44,552-Speed 6317.67 samples/sec Loss 11.1415 LearningRate 0.0007 Epoch: 2 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:47,798-Speed 6310.99 samples/sec Loss 11.0405 LearningRate 0.0007 Epoch: 2 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:51,040-Speed 6318.87 samples/sec Loss 11.1537 LearningRate 0.0007 Epoch: 2 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:54,281-Speed 6319.93 samples/sec Loss 11.2862 LearningRate 0.0007 Epoch: 2 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:55:57,515-Speed 6333.97 samples/sec Loss 11.2461 LearningRate 0.0007 Epoch: 2 Global Step: 57190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:00,764-Speed 6304.63 samples/sec Loss 11.1603 LearningRate 0.0007 Epoch: 2 Global Step: 57200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:04,012-Speed 6307.55 samples/sec Loss 11.2175 LearningRate 0.0007 Epoch: 2 Global Step: 57210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:07,252-Speed 6321.09 samples/sec Loss 11.1407 LearningRate 0.0007 Epoch: 2 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:10,498-Speed 6311.13 samples/sec Loss 11.0426 LearningRate 0.0007 Epoch: 2 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:13,751-Speed 6298.91 samples/sec Loss 11.0425 LearningRate 0.0007 Epoch: 2 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:16,999-Speed 6307.24 samples/sec Loss 11.1097 LearningRate 0.0007 Epoch: 2 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:20,246-Speed 6307.60 samples/sec Loss 11.0702 LearningRate 0.0007 Epoch: 2 Global Step: 57260 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:56:23,502-Speed 6291.53 samples/sec Loss 11.2069 LearningRate 0.0007 Epoch: 2 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:26,750-Speed 6306.79 samples/sec Loss 11.1276 LearningRate 0.0007 Epoch: 2 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:29,996-Speed 6312.41 samples/sec Loss 11.1443 LearningRate 0.0007 Epoch: 2 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:56:33,226-Speed 6340.08 samples/sec Loss 11.1379 LearningRate 0.0007 Epoch: 2 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:36,472-Speed 6311.64 samples/sec Loss 11.0924 LearningRate 0.0007 Epoch: 2 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:39,715-Speed 6315.17 samples/sec Loss 11.0378 LearningRate 0.0007 Epoch: 2 Global Step: 57320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:42,971-Speed 6291.84 samples/sec Loss 11.0960 LearningRate 0.0007 Epoch: 2 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:46,217-Speed 6311.41 samples/sec Loss 11.1220 LearningRate 0.0007 Epoch: 2 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:49,461-Speed 6314.60 samples/sec Loss 11.1783 LearningRate 0.0007 Epoch: 2 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:52,706-Speed 6313.58 samples/sec Loss 10.9992 LearningRate 0.0007 Epoch: 2 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:56,006-Speed 6206.10 samples/sec Loss 11.0815 LearningRate 0.0007 Epoch: 2 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:56:59,249-Speed 6316.62 samples/sec Loss 11.1787 LearningRate 0.0007 Epoch: 2 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:57:02,494-Speed 6312.79 samples/sec Loss 11.0822 LearningRate 0.0007 Epoch: 2 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:05,729-Speed 6332.45 samples/sec Loss 11.0897 LearningRate 0.0007 Epoch: 2 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:08,975-Speed 6310.48 samples/sec Loss 11.0734 LearningRate 0.0007 Epoch: 2 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:12,236-Speed 6282.02 samples/sec Loss 11.1156 LearningRate 0.0007 Epoch: 2 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:15,485-Speed 6305.04 samples/sec Loss 11.1850 LearningRate 0.0007 Epoch: 2 Global Step: 57430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:18,730-Speed 6313.17 samples/sec Loss 11.1083 LearningRate 0.0007 Epoch: 2 Global Step: 57440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:21,974-Speed 6314.10 samples/sec Loss 11.1545 LearningRate 0.0007 Epoch: 2 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:25,221-Speed 6308.98 samples/sec Loss 11.1663 LearningRate 0.0007 Epoch: 2 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:28,465-Speed 6315.59 samples/sec Loss 11.0593 LearningRate 0.0007 Epoch: 2 Global Step: 57470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:31,715-Speed 6301.67 samples/sec Loss 11.1493 LearningRate 0.0007 Epoch: 2 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:34,959-Speed 6315.13 samples/sec Loss 11.1064 LearningRate 0.0007 Epoch: 2 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:38,188-Speed 6343.71 samples/sec Loss 11.1016 LearningRate 0.0007 Epoch: 2 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:41,445-Speed 6290.34 samples/sec Loss 
11.0336 LearningRate 0.0007 Epoch: 2 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:44,686-Speed 6321.11 samples/sec Loss 11.1002 LearningRate 0.0007 Epoch: 2 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:47,943-Speed 6288.56 samples/sec Loss 11.1106 LearningRate 0.0007 Epoch: 2 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:51,302-Speed 6097.35 samples/sec Loss 11.1883 LearningRate 0.0007 Epoch: 2 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:54,546-Speed 6315.02 samples/sec Loss 11.1189 LearningRate 0.0007 Epoch: 2 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:57:57,788-Speed 6317.87 samples/sec Loss 11.0583 LearningRate 0.0007 Epoch: 2 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:01,033-Speed 6313.43 samples/sec Loss 11.0659 LearningRate 0.0007 Epoch: 2 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:04,284-Speed 6300.97 samples/sec Loss 11.1314 LearningRate 0.0007 Epoch: 2 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:07,526-Speed 6318.03 samples/sec Loss 11.0539 LearningRate 0.0007 Epoch: 2 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:10,760-Speed 6334.18 samples/sec Loss 11.0993 LearningRate 0.0007 Epoch: 2 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:14,004-Speed 6314.77 samples/sec Loss 11.0944 LearningRate 0.0007 Epoch: 2 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:17,254-Speed 6303.94 samples/sec Loss 11.1249 LearningRate 0.0007 Epoch: 2 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:20,502-Speed 6305.48 samples/sec Loss 11.0886 LearningRate 0.0007 Epoch: 2 Global 
Step: 57630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:23,746-Speed 6316.26 samples/sec Loss 11.0993 LearningRate 0.0007 Epoch: 2 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:26,996-Speed 6302.43 samples/sec Loss 11.0957 LearningRate 0.0007 Epoch: 2 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:30,243-Speed 6310.17 samples/sec Loss 11.1593 LearningRate 0.0007 Epoch: 2 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:33,487-Speed 6314.55 samples/sec Loss 11.1052 LearningRate 0.0007 Epoch: 2 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:36,733-Speed 6310.47 samples/sec Loss 11.1205 LearningRate 0.0007 Epoch: 2 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:39,976-Speed 6315.21 samples/sec Loss 11.0610 LearningRate 0.0007 Epoch: 2 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:43,222-Speed 6311.00 samples/sec Loss 11.1036 LearningRate 0.0007 Epoch: 2 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:58:46,452-Speed 6343.04 samples/sec Loss 11.0996 LearningRate 0.0007 Epoch: 2 Global Step: 57710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:49,695-Speed 6316.18 samples/sec Loss 11.1892 LearningRate 0.0007 Epoch: 2 Global Step: 57720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:52,941-Speed 6311.08 samples/sec Loss 11.0398 LearningRate 0.0007 Epoch: 2 Global Step: 57730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:56,189-Speed 6306.55 samples/sec Loss 10.9899 LearningRate 0.0007 Epoch: 2 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:58:59,432-Speed 6315.28 samples/sec Loss 10.9769 LearningRate 0.0007 Epoch: 2 Global Step: 57750 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 20:59:02,682-Speed 6303.92 samples/sec Loss 11.1243 LearningRate 0.0007 Epoch: 2 Global Step: 57760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:05,924-Speed 6318.61 samples/sec Loss 10.9445 LearningRate 0.0007 Epoch: 2 Global Step: 57770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:09,166-Speed 6318.23 samples/sec Loss 11.0417 LearningRate 0.0007 Epoch: 2 Global Step: 57780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:12,413-Speed 6309.03 samples/sec Loss 11.1167 LearningRate 0.0007 Epoch: 2 Global Step: 57790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:15,653-Speed 6321.52 samples/sec Loss 11.0831 LearningRate 0.0007 Epoch: 2 Global Step: 57800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:18,880-Speed 6348.43 samples/sec Loss 11.0779 LearningRate 0.0007 Epoch: 2 Global Step: 57810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:22,128-Speed 6307.48 samples/sec Loss 11.0718 LearningRate 0.0007 Epoch: 2 Global Step: 57820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:25,396-Speed 6267.02 samples/sec Loss 11.0256 LearningRate 0.0007 Epoch: 2 Global Step: 57830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:28,638-Speed 6319.19 samples/sec Loss 11.1006 LearningRate 0.0007 Epoch: 2 Global Step: 57840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:31,880-Speed 6318.65 samples/sec Loss 11.1202 LearningRate 0.0007 Epoch: 2 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:35,130-Speed 6303.18 samples/sec Loss 11.1803 LearningRate 0.0007 Epoch: 2 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:38,374-Speed 6314.34 samples/sec Loss 11.0990 LearningRate 0.0007 Epoch: 2 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
20:59:41,621-Speed 6308.92 samples/sec Loss 11.0430 LearningRate 0.0007 Epoch: 2 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:44,865-Speed 6314.80 samples/sec Loss 11.0862 LearningRate 0.0007 Epoch: 2 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:48,114-Speed 6305.38 samples/sec Loss 11.1019 LearningRate 0.0007 Epoch: 2 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:51,359-Speed 6312.22 samples/sec Loss 11.0832 LearningRate 0.0007 Epoch: 2 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 20:59:54,590-Speed 6339.62 samples/sec Loss 11.0408 LearningRate 0.0007 Epoch: 2 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 20:59:57,832-Speed 6319.20 samples/sec Loss 11.0199 LearningRate 0.0007 Epoch: 2 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:01,080-Speed 6309.82 samples/sec Loss 11.0592 LearningRate 0.0007 Epoch: 2 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:04,329-Speed 6305.43 samples/sec Loss 11.0057 LearningRate 0.0007 Epoch: 2 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:07,572-Speed 6315.08 samples/sec Loss 11.1203 LearningRate 0.0007 Epoch: 2 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:10,811-Speed 6323.89 samples/sec Loss 11.0634 LearningRate 0.0007 Epoch: 2 Global Step: 57970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:14,059-Speed 6307.03 samples/sec Loss 11.0919 LearningRate 0.0007 Epoch: 2 Global Step: 57980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:17,305-Speed 6312.31 samples/sec Loss 11.1238 LearningRate 0.0007 Epoch: 2 Global Step: 57990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:20,550-Speed 6311.51 samples/sec 
Loss 11.0944 LearningRate 0.0007 Epoch: 2 Global Step: 58000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:23,796-Speed 6309.99 samples/sec Loss 11.0843 LearningRate 0.0007 Epoch: 2 Global Step: 58010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:27,030-Speed 6335.17 samples/sec Loss 11.0746 LearningRate 0.0007 Epoch: 2 Global Step: 58020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:30,278-Speed 6306.34 samples/sec Loss 10.9815 LearningRate 0.0007 Epoch: 2 Global Step: 58030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:33,527-Speed 6306.21 samples/sec Loss 11.0134 LearningRate 0.0007 Epoch: 2 Global Step: 58040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:36,771-Speed 6312.90 samples/sec Loss 10.9945 LearningRate 0.0007 Epoch: 2 Global Step: 58050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:40,017-Speed 6311.74 samples/sec Loss 11.0769 LearningRate 0.0007 Epoch: 2 Global Step: 58060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:43,261-Speed 6314.05 samples/sec Loss 11.0766 LearningRate 0.0007 Epoch: 2 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:46,504-Speed 6317.21 samples/sec Loss 11.0785 LearningRate 0.0007 Epoch: 2 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:49,748-Speed 6316.14 samples/sec Loss 11.0429 LearningRate 0.0007 Epoch: 2 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:52,994-Speed 6310.42 samples/sec Loss 11.0630 LearningRate 0.0007 Epoch: 2 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:56,239-Speed 6310.88 samples/sec Loss 11.0032 LearningRate 0.0007 Epoch: 2 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:00:59,474-Speed 6333.53 samples/sec Loss 11.0751 LearningRate 0.0007 Epoch: 2 
Global Step: 58120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:02,722-Speed 6306.64 samples/sec Loss 10.9409 LearningRate 0.0007 Epoch: 2 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:05,965-Speed 6317.01 samples/sec Loss 10.9802 LearningRate 0.0007 Epoch: 2 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:09,210-Speed 6312.74 samples/sec Loss 11.0567 LearningRate 0.0007 Epoch: 2 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:12,452-Speed 6318.38 samples/sec Loss 11.0540 LearningRate 0.0007 Epoch: 2 Global Step: 58160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:15,702-Speed 6302.88 samples/sec Loss 11.1456 LearningRate 0.0007 Epoch: 2 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:18,946-Speed 6313.07 samples/sec Loss 10.9839 LearningRate 0.0007 Epoch: 2 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:22,190-Speed 6316.20 samples/sec Loss 11.0405 LearningRate 0.0007 Epoch: 2 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:25,437-Speed 6307.43 samples/sec Loss 11.0325 LearningRate 0.0007 Epoch: 2 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:28,684-Speed 6309.27 samples/sec Loss 11.0245 LearningRate 0.0007 Epoch: 2 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:31,927-Speed 6317.04 samples/sec Loss 11.1061 LearningRate 0.0007 Epoch: 2 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:01:35,157-Speed 6342.44 samples/sec Loss 11.0119 LearningRate 0.0007 Epoch: 2 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:38,399-Speed 6318.59 samples/sec Loss 11.0208 LearningRate 0.0007 Epoch: 2 Global Step: 58240 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:01:41,645-Speed 6308.86 samples/sec Loss 11.0135 LearningRate 0.0007 Epoch: 2 Global Step: 58250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:44,891-Speed 6311.82 samples/sec Loss 10.9832 LearningRate 0.0007 Epoch: 2 Global Step: 58260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:48,137-Speed 6310.42 samples/sec Loss 10.9078 LearningRate 0.0007 Epoch: 2 Global Step: 58270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:51,380-Speed 6316.83 samples/sec Loss 11.0459 LearningRate 0.0007 Epoch: 2 Global Step: 58280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:54,622-Speed 6319.36 samples/sec Loss 11.0259 LearningRate 0.0007 Epoch: 2 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:01:57,863-Speed 6319.40 samples/sec Loss 11.0262 LearningRate 0.0007 Epoch: 2 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:01,107-Speed 6315.92 samples/sec Loss 10.9560 LearningRate 0.0007 Epoch: 2 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:04,353-Speed 6309.72 samples/sec Loss 11.0346 LearningRate 0.0007 Epoch: 2 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:07,584-Speed 6339.64 samples/sec Loss 10.9315 LearningRate 0.0007 Epoch: 2 Global Step: 58330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:10,827-Speed 6317.78 samples/sec Loss 10.9780 LearningRate 0.0007 Epoch: 2 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:14,068-Speed 6320.37 samples/sec Loss 11.0259 LearningRate 0.0007 Epoch: 2 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:17,308-Speed 6322.09 samples/sec Loss 10.9998 LearningRate 0.0007 Epoch: 2 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:02:20,550-Speed 6317.54 samples/sec Loss 10.9937 LearningRate 0.0007 Epoch: 2 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:23,793-Speed 6318.25 samples/sec Loss 10.9210 LearningRate 0.0007 Epoch: 2 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:27,046-Speed 6296.92 samples/sec Loss 11.0204 LearningRate 0.0007 Epoch: 2 Global Step: 58390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:30,289-Speed 6315.36 samples/sec Loss 11.0679 LearningRate 0.0007 Epoch: 2 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:33,535-Speed 6311.81 samples/sec Loss 11.0047 LearningRate 0.0007 Epoch: 2 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:36,779-Speed 6314.51 samples/sec Loss 11.1361 LearningRate 0.0007 Epoch: 2 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:40,009-Speed 6340.93 samples/sec Loss 11.0183 LearningRate 0.0007 Epoch: 2 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:43,270-Speed 6281.74 samples/sec Loss 10.9745 LearningRate 0.0007 Epoch: 2 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:46,517-Speed 6309.71 samples/sec Loss 11.0641 LearningRate 0.0007 Epoch: 2 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:49,761-Speed 6312.89 samples/sec Loss 11.0052 LearningRate 0.0007 Epoch: 2 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:53,007-Speed 6311.77 samples/sec Loss 10.9817 LearningRate 0.0007 Epoch: 2 Global Step: 58470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:56,250-Speed 6317.74 samples/sec Loss 11.0183 LearningRate 0.0007 Epoch: 2 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:02:59,493-Speed 6316.76 samples/sec Loss 
11.0128 LearningRate 0.0007 Epoch: 2 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:02,738-Speed 6312.66 samples/sec Loss 10.9945 LearningRate 0.0007 Epoch: 2 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:05,983-Speed 6312.04 samples/sec Loss 11.0065 LearningRate 0.0007 Epoch: 2 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:09,224-Speed 6320.92 samples/sec Loss 11.0200 LearningRate 0.0007 Epoch: 2 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:12,456-Speed 6338.07 samples/sec Loss 10.9848 LearningRate 0.0007 Epoch: 2 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:15,690-Speed 6334.26 samples/sec Loss 10.9881 LearningRate 0.0007 Epoch: 2 Global Step: 58540 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:18,933-Speed 6316.03 samples/sec Loss 10.9537 LearningRate 0.0007 Epoch: 2 Global Step: 58550 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:22,177-Speed 6314.79 samples/sec Loss 10.9766 LearningRate 0.0007 Epoch: 2 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:25,423-Speed 6310.41 samples/sec Loss 10.9734 LearningRate 0.0007 Epoch: 2 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:28,667-Speed 6315.31 samples/sec Loss 10.9976 LearningRate 0.0007 Epoch: 2 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:31,914-Speed 6307.86 samples/sec Loss 10.9936 LearningRate 0.0007 Epoch: 2 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:35,180-Speed 6271.84 samples/sec Loss 10.9775 LearningRate 0.0007 Epoch: 2 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:38,428-Speed 6306.22 samples/sec Loss 10.9876 LearningRate 0.0007 Epoch: 2 Global 
Step: 58610 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:41,671-Speed 6318.19 samples/sec Loss 10.9466 LearningRate 0.0007 Epoch: 2 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:44,920-Speed 6304.91 samples/sec Loss 11.0260 LearningRate 0.0007 Epoch: 2 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:03:48,165-Speed 6312.19 samples/sec Loss 11.0085 LearningRate 0.0007 Epoch: 2 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:51,406-Speed 6318.94 samples/sec Loss 10.9900 LearningRate 0.0007 Epoch: 2 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:54,650-Speed 6315.61 samples/sec Loss 11.0023 LearningRate 0.0007 Epoch: 2 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:03:57,891-Speed 6319.59 samples/sec Loss 10.9862 LearningRate 0.0007 Epoch: 2 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:01,139-Speed 6308.50 samples/sec Loss 11.0673 LearningRate 0.0007 Epoch: 2 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:04,384-Speed 6313.51 samples/sec Loss 10.9564 LearningRate 0.0007 Epoch: 2 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:07,630-Speed 6310.45 samples/sec Loss 10.8813 LearningRate 0.0007 Epoch: 2 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:10,875-Speed 6312.72 samples/sec Loss 11.0371 LearningRate 0.0007 Epoch: 2 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:14,117-Speed 6318.34 samples/sec Loss 11.0151 LearningRate 0.0007 Epoch: 2 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:17,409-Speed 6222.71 samples/sec Loss 11.0513 LearningRate 0.0007 Epoch: 2 Global Step: 58730 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:04:20,647-Speed 6324.80 samples/sec Loss 11.0217 LearningRate 0.0007 Epoch: 2 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:04:23,879-Speed 6338.33 samples/sec Loss 10.9791 LearningRate 0.0007 Epoch: 2 Global Step: 58750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:27,130-Speed 6302.33 samples/sec Loss 10.9957 LearningRate 0.0007 Epoch: 2 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:30,372-Speed 6316.95 samples/sec Loss 10.9740 LearningRate 0.0007 Epoch: 2 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:33,621-Speed 6305.39 samples/sec Loss 11.0473 LearningRate 0.0007 Epoch: 2 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:36,865-Speed 6314.11 samples/sec Loss 10.8794 LearningRate 0.0007 Epoch: 2 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:40,107-Speed 6319.92 samples/sec Loss 10.8960 LearningRate 0.0007 Epoch: 2 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:43,352-Speed 6312.20 samples/sec Loss 10.8964 LearningRate 0.0007 Epoch: 2 Global Step: 58810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:46,598-Speed 6310.84 samples/sec Loss 11.0802 LearningRate 0.0007 Epoch: 2 Global Step: 58820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:49,842-Speed 6314.70 samples/sec Loss 10.9049 LearningRate 0.0007 Epoch: 2 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:53,085-Speed 6316.32 samples/sec Loss 11.0004 LearningRate 0.0007 Epoch: 2 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:04:56,315-Speed 6341.99 samples/sec Loss 10.8767 LearningRate 0.0007 Epoch: 2 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:04:59,556-Speed 6319.95 samples/sec Loss 10.9053 LearningRate 0.0007 Epoch: 2 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:02,805-Speed 6305.78 samples/sec Loss 10.9779 LearningRate 0.0007 Epoch: 2 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:06,052-Speed 6308.02 samples/sec Loss 10.9570 LearningRate 0.0007 Epoch: 2 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:09,293-Speed 6322.06 samples/sec Loss 10.9248 LearningRate 0.0007 Epoch: 2 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:12,538-Speed 6310.71 samples/sec Loss 10.8847 LearningRate 0.0007 Epoch: 2 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:15,793-Speed 6294.22 samples/sec Loss 10.9284 LearningRate 0.0007 Epoch: 2 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:19,039-Speed 6310.52 samples/sec Loss 10.9365 LearningRate 0.0007 Epoch: 2 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:22,283-Speed 6315.51 samples/sec Loss 10.9154 LearningRate 0.0007 Epoch: 2 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:25,522-Speed 6323.20 samples/sec Loss 10.9794 LearningRate 0.0007 Epoch: 2 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:28,766-Speed 6315.62 samples/sec Loss 10.9539 LearningRate 0.0007 Epoch: 2 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:05:31,995-Speed 6341.88 samples/sec Loss 10.9450 LearningRate 0.0007 Epoch: 2 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:35,247-Speed 6304.37 samples/sec Loss 10.8731 LearningRate 0.0007 Epoch: 2 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:38,485-Speed 6325.52 samples/sec 
Loss 10.9262 LearningRate 0.0007 Epoch: 2 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:41,728-Speed 6315.82 samples/sec Loss 11.0410 LearningRate 0.0007 Epoch: 2 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:44,982-Speed 6296.74 samples/sec Loss 11.0335 LearningRate 0.0007 Epoch: 2 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:48,231-Speed 6303.35 samples/sec Loss 10.9982 LearningRate 0.0007 Epoch: 2 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:51,473-Speed 6318.64 samples/sec Loss 10.8713 LearningRate 0.0007 Epoch: 2 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:54,719-Speed 6310.39 samples/sec Loss 10.8918 LearningRate 0.0007 Epoch: 2 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:05:57,961-Speed 6318.97 samples/sec Loss 10.9835 LearningRate 0.0007 Epoch: 2 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:01,207-Speed 6311.34 samples/sec Loss 10.9334 LearningRate 0.0007 Epoch: 2 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:04,441-Speed 6333.71 samples/sec Loss 10.9359 LearningRate 0.0007 Epoch: 2 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:07,687-Speed 6310.69 samples/sec Loss 10.9272 LearningRate 0.0007 Epoch: 2 Global Step: 59070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:10,930-Speed 6316.77 samples/sec Loss 10.9421 LearningRate 0.0007 Epoch: 2 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:14,176-Speed 6311.86 samples/sec Loss 10.9552 LearningRate 0.0007 Epoch: 2 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:17,417-Speed 6319.92 samples/sec Loss 10.8844 LearningRate 0.0007 Epoch: 2 
Global Step: 59100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:20,659-Speed 6318.93 samples/sec Loss 10.8588 LearningRate 0.0007 Epoch: 2 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:23,905-Speed 6311.42 samples/sec Loss 10.9725 LearningRate 0.0007 Epoch: 2 Global Step: 59120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:27,148-Speed 6316.15 samples/sec Loss 11.0101 LearningRate 0.0007 Epoch: 2 Global Step: 59130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:30,390-Speed 6318.21 samples/sec Loss 10.8721 LearningRate 0.0007 Epoch: 2 Global Step: 59140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:33,632-Speed 6318.65 samples/sec Loss 10.9243 LearningRate 0.0007 Epoch: 2 Global Step: 59150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:36,874-Speed 6317.98 samples/sec Loss 10.9002 LearningRate 0.0007 Epoch: 2 Global Step: 59160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:40,120-Speed 6311.10 samples/sec Loss 10.8641 LearningRate 0.0007 Epoch: 2 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:43,362-Speed 6318.57 samples/sec Loss 10.9556 LearningRate 0.0007 Epoch: 2 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:46,611-Speed 6305.87 samples/sec Loss 10.8915 LearningRate 0.0007 Epoch: 2 Global Step: 59190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:49,854-Speed 6316.53 samples/sec Loss 10.8850 LearningRate 0.0007 Epoch: 2 Global Step: 59200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:53,118-Speed 6274.69 samples/sec Loss 10.9096 LearningRate 0.0007 Epoch: 2 Global Step: 59210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:06:56,364-Speed 6312.02 samples/sec Loss 10.9407 LearningRate 0.0007 Epoch: 2 Global Step: 59220 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:06:59,612-Speed 6305.42 samples/sec Loss 10.8938 LearningRate 0.0007 Epoch: 2 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:02,860-Speed 6306.54 samples/sec Loss 10.9105 LearningRate 0.0007 Epoch: 2 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:06,103-Speed 6316.68 samples/sec Loss 10.9551 LearningRate 0.0007 Epoch: 2 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:09,348-Speed 6313.97 samples/sec Loss 10.9283 LearningRate 0.0007 Epoch: 2 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:07:12,576-Speed 6346.00 samples/sec Loss 10.8428 LearningRate 0.0007 Epoch: 2 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:15,820-Speed 6313.72 samples/sec Loss 10.9526 LearningRate 0.0007 Epoch: 2 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:19,067-Speed 6307.94 samples/sec Loss 10.8470 LearningRate 0.0007 Epoch: 2 Global Step: 59290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:22,314-Speed 6309.61 samples/sec Loss 10.9518 LearningRate 0.0007 Epoch: 2 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:25,562-Speed 6306.96 samples/sec Loss 10.9256 LearningRate 0.0007 Epoch: 2 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:28,826-Speed 6277.49 samples/sec Loss 10.8767 LearningRate 0.0007 Epoch: 2 Global Step: 59320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:32,072-Speed 6309.89 samples/sec Loss 10.9218 LearningRate 0.0007 Epoch: 2 Global Step: 59330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:35,317-Speed 6312.08 samples/sec Loss 10.9490 LearningRate 0.0007 Epoch: 2 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:07:38,560-Speed 6317.87 samples/sec Loss 10.9508 LearningRate 0.0007 Epoch: 2 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:41,852-Speed 6221.48 samples/sec Loss 10.9519 LearningRate 0.0007 Epoch: 2 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:45,079-Speed 6347.49 samples/sec Loss 10.8981 LearningRate 0.0007 Epoch: 2 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:48,324-Speed 6314.25 samples/sec Loss 10.8710 LearningRate 0.0007 Epoch: 2 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:51,566-Speed 6317.06 samples/sec Loss 10.8194 LearningRate 0.0007 Epoch: 2 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:54,810-Speed 6315.65 samples/sec Loss 11.0481 LearningRate 0.0007 Epoch: 2 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:07:58,050-Speed 6321.40 samples/sec Loss 10.9970 LearningRate 0.0007 Epoch: 2 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:01,309-Speed 6286.36 samples/sec Loss 10.8380 LearningRate 0.0007 Epoch: 2 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:04,553-Speed 6314.68 samples/sec Loss 10.9500 LearningRate 0.0007 Epoch: 2 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:07,793-Speed 6321.47 samples/sec Loss 10.9541 LearningRate 0.0007 Epoch: 2 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:11,049-Speed 6291.64 samples/sec Loss 10.9028 LearningRate 0.0007 Epoch: 2 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:14,294-Speed 6313.60 samples/sec Loss 10.8869 LearningRate 0.0007 Epoch: 2 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:17,525-Speed 6338.91 samples/sec Loss 
10.8979 LearningRate 0.0007 Epoch: 2 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:20,783-Speed 6287.86 samples/sec Loss 10.8329 LearningRate 0.0007 Epoch: 2 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:24,028-Speed 6313.08 samples/sec Loss 10.8664 LearningRate 0.0007 Epoch: 2 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:27,272-Speed 6314.28 samples/sec Loss 10.9547 LearningRate 0.0007 Epoch: 2 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:30,518-Speed 6310.38 samples/sec Loss 10.9002 LearningRate 0.0007 Epoch: 2 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:33,761-Speed 6317.47 samples/sec Loss 10.9552 LearningRate 0.0007 Epoch: 2 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:37,004-Speed 6317.53 samples/sec Loss 10.8394 LearningRate 0.0007 Epoch: 2 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:40,248-Speed 6315.12 samples/sec Loss 10.9320 LearningRate 0.0007 Epoch: 2 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:43,497-Speed 6303.78 samples/sec Loss 10.9767 LearningRate 0.0007 Epoch: 2 Global Step: 59550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:46,740-Speed 6317.53 samples/sec Loss 10.9381 LearningRate 0.0007 Epoch: 2 Global Step: 59560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:49,971-Speed 6340.25 samples/sec Loss 10.9731 LearningRate 0.0007 Epoch: 2 Global Step: 59570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:53,217-Speed 6309.43 samples/sec Loss 10.8742 LearningRate 0.0007 Epoch: 2 Global Step: 59580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:56,475-Speed 6288.33 samples/sec Loss 10.8930 LearningRate 0.0007 Epoch: 2 Global 
Step: 59590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:08:59,720-Speed 6313.34 samples/sec Loss 10.9124 LearningRate 0.0007 Epoch: 2 Global Step: 59600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:02,965-Speed 6310.87 samples/sec Loss 10.8928 LearningRate 0.0007 Epoch: 2 Global Step: 59610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:06,209-Speed 6315.78 samples/sec Loss 10.9197 LearningRate 0.0007 Epoch: 2 Global Step: 59620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:09,455-Speed 6311.94 samples/sec Loss 10.8891 LearningRate 0.0007 Epoch: 2 Global Step: 59630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:12,701-Speed 6309.44 samples/sec Loss 10.8661 LearningRate 0.0007 Epoch: 2 Global Step: 59640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:15,945-Speed 6314.94 samples/sec Loss 10.9063 LearningRate 0.0007 Epoch: 2 Global Step: 59650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:19,193-Speed 6306.50 samples/sec Loss 10.7651 LearningRate 0.0007 Epoch: 2 Global Step: 59660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:22,428-Speed 6332.61 samples/sec Loss 10.8096 LearningRate 0.0007 Epoch: 2 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:25,670-Speed 6318.15 samples/sec Loss 10.8997 LearningRate 0.0007 Epoch: 2 Global Step: 59680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:28,915-Speed 6312.81 samples/sec Loss 10.8504 LearningRate 0.0007 Epoch: 2 Global Step: 59690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:32,162-Speed 6307.72 samples/sec Loss 10.8710 LearningRate 0.0007 Epoch: 2 Global Step: 59700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:35,416-Speed 6296.42 samples/sec Loss 10.8768 LearningRate 0.0007 Epoch: 2 Global Step: 59710 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:09:38,665-Speed 6305.23 samples/sec Loss 10.9292 LearningRate 0.0007 Epoch: 2 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:41,911-Speed 6310.02 samples/sec Loss 10.9172 LearningRate 0.0007 Epoch: 2 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:45,156-Speed 6313.86 samples/sec Loss 10.9160 LearningRate 0.0007 Epoch: 2 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:48,399-Speed 6315.77 samples/sec Loss 10.7888 LearningRate 0.0007 Epoch: 2 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:51,643-Speed 6315.78 samples/sec Loss 10.8758 LearningRate 0.0007 Epoch: 2 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:09:54,893-Speed 6303.07 samples/sec Loss 10.8949 LearningRate 0.0007 Epoch: 2 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:09:58,121-Speed 6346.00 samples/sec Loss 10.8983 LearningRate 0.0007 Epoch: 2 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:01,377-Speed 6289.56 samples/sec Loss 10.8704 LearningRate 0.0007 Epoch: 2 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:04,618-Speed 6320.26 samples/sec Loss 10.8823 LearningRate 0.0007 Epoch: 2 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:07,859-Speed 6322.24 samples/sec Loss 10.9109 LearningRate 0.0007 Epoch: 2 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:11,104-Speed 6312.60 samples/sec Loss 10.8125 LearningRate 0.0007 Epoch: 2 Global Step: 59820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:14,353-Speed 6304.36 samples/sec Loss 10.8239 LearningRate 0.0007 Epoch: 2 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:10:17,598-Speed 6311.72 samples/sec Loss 10.8870 LearningRate 0.0007 Epoch: 2 Global Step: 59840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:20,847-Speed 6304.96 samples/sec Loss 10.8675 LearningRate 0.0007 Epoch: 2 Global Step: 59850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:24,093-Speed 6310.14 samples/sec Loss 10.8944 LearningRate 0.0007 Epoch: 2 Global Step: 59860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:27,338-Speed 6316.69 samples/sec Loss 10.9306 LearningRate 0.0007 Epoch: 2 Global Step: 59870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:30,566-Speed 6345.82 samples/sec Loss 10.8673 LearningRate 0.0007 Epoch: 2 Global Step: 59880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:33,809-Speed 6316.39 samples/sec Loss 10.9421 LearningRate 0.0007 Epoch: 2 Global Step: 59890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:37,059-Speed 6304.05 samples/sec Loss 10.8405 LearningRate 0.0007 Epoch: 2 Global Step: 59900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:40,303-Speed 6315.06 samples/sec Loss 10.8834 LearningRate 0.0007 Epoch: 2 Global Step: 59910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:43,546-Speed 6316.23 samples/sec Loss 10.9674 LearningRate 0.0007 Epoch: 2 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:46,791-Speed 6313.89 samples/sec Loss 10.8454 LearningRate 0.0007 Epoch: 2 Global Step: 59930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:50,039-Speed 6306.94 samples/sec Loss 10.9564 LearningRate 0.0007 Epoch: 2 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:53,279-Speed 6321.81 samples/sec Loss 10.8870 LearningRate 0.0007 Epoch: 2 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:56,523-Speed 6313.40 samples/sec Loss 
10.8453 LearningRate 0.0007 Epoch: 2 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:10:59,768-Speed 6313.80 samples/sec Loss 10.8614 LearningRate 0.0007 Epoch: 2 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:02,997-Speed 6342.96 samples/sec Loss 10.8149 LearningRate 0.0007 Epoch: 2 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:06,242-Speed 6313.29 samples/sec Loss 10.8860 LearningRate 0.0007 Epoch: 2 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:09,524-Speed 6241.07 samples/sec Loss 10.8596 LearningRate 0.0007 Epoch: 2 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:12,770-Speed 6311.11 samples/sec Loss 10.7465 LearningRate 0.0007 Epoch: 2 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:16,022-Speed 6299.25 samples/sec Loss 10.8269 LearningRate 0.0007 Epoch: 2 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:19,266-Speed 6315.42 samples/sec Loss 10.9307 LearningRate 0.0007 Epoch: 2 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:22,507-Speed 6319.85 samples/sec Loss 10.8932 LearningRate 0.0007 Epoch: 2 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:25,754-Speed 6309.15 samples/sec Loss 10.8150 LearningRate 0.0007 Epoch: 2 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:29,064-Speed 6188.94 samples/sec Loss 10.7566 LearningRate 0.0007 Epoch: 2 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:32,316-Speed 6298.21 samples/sec Loss 10.8863 LearningRate 0.0007 Epoch: 2 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:35,560-Speed 6313.79 samples/sec Loss 10.8117 LearningRate 0.0007 Epoch: 2 Global 
Step: 60080 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:11:38,792-Speed 6339.23 samples/sec Loss 10.8365 LearningRate 0.0007 Epoch: 2 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:42,035-Speed 6317.05 samples/sec Loss 10.8506 LearningRate 0.0007 Epoch: 2 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:11:45,267-Speed 6338.31 samples/sec Loss 10.8453 LearningRate 0.0007 Epoch: 2 Global Step: 60110 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:11:48,516-Speed 6305.07 samples/sec Loss 10.8192 LearningRate 0.0007 Epoch: 2 Global Step: 60120 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:11:51,759-Speed 6316.76 samples/sec Loss 10.8531 LearningRate 0.0007 Epoch: 2 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:11:55,000-Speed 6320.17 samples/sec Loss 11.0105 LearningRate 0.0007 Epoch: 2 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:11:58,265-Speed 6272.78 samples/sec Loss 10.8358 LearningRate 0.0007 Epoch: 2 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:12:01,612-Speed 6121.82 samples/sec Loss 10.8860 LearningRate 0.0007 Epoch: 2 Global Step: 60160 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:12:04,853-Speed 6319.08 samples/sec Loss 10.8762 LearningRate 0.0007 Epoch: 2 Global Step: 60170 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:12:08,102-Speed 6305.93 samples/sec Loss 10.8692 LearningRate 0.0007 Epoch: 2 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:12:11,345-Speed 6316.72 samples/sec Loss 10.8301 LearningRate 0.0007 Epoch: 2 Global Step: 60190 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:12:14,600-Speed 6293.68 samples/sec Loss 10.8921 LearningRate 0.0007 Epoch: 2 Global Step: 60200 Fp16 Grad Scale: 32768 
Required: 70 hours Training: 2022-03-31 21:12:17,843-Speed 6315.96 samples/sec Loss 10.8567 LearningRate 0.0007 Epoch: 2 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:21,085-Speed 6317.35 samples/sec Loss 10.8444 LearningRate 0.0007 Epoch: 2 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:24,329-Speed 6314.16 samples/sec Loss 10.7716 LearningRate 0.0007 Epoch: 2 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:27,576-Speed 6310.38 samples/sec Loss 10.8695 LearningRate 0.0007 Epoch: 2 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:30,822-Speed 6310.20 samples/sec Loss 10.8323 LearningRate 0.0007 Epoch: 2 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:34,068-Speed 6310.64 samples/sec Loss 10.8570 LearningRate 0.0007 Epoch: 2 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:37,319-Speed 6299.85 samples/sec Loss 10.8254 LearningRate 0.0007 Epoch: 2 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:40,561-Speed 6319.76 samples/sec Loss 10.8495 LearningRate 0.0007 Epoch: 2 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:43,803-Speed 6318.74 samples/sec Loss 10.7670 LearningRate 0.0007 Epoch: 2 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:47,047-Speed 6314.48 samples/sec Loss 10.8562 LearningRate 0.0007 Epoch: 2 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:12:50,293-Speed 6309.80 samples/sec Loss 10.9060 LearningRate 0.0007 Epoch: 2 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:12:53,558-Speed 6275.54 samples/sec Loss 10.8658 LearningRate 0.0007 Epoch: 2 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:12:56,802-Speed 6314.66 samples/sec Loss 10.7745 LearningRate 0.0007 Epoch: 2 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:00,048-Speed 6315.20 samples/sec Loss 10.7874 LearningRate 0.0007 Epoch: 2 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:03,291-Speed 6316.71 samples/sec Loss 10.7770 LearningRate 0.0007 Epoch: 2 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:06,534-Speed 6316.61 samples/sec Loss 10.7965 LearningRate 0.0007 Epoch: 2 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:09,781-Speed 6307.99 samples/sec Loss 10.7701 LearningRate 0.0007 Epoch: 2 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:13,024-Speed 6316.92 samples/sec Loss 10.7623 LearningRate 0.0007 Epoch: 2 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:16,285-Speed 6281.72 samples/sec Loss 10.7861 LearningRate 0.0007 Epoch: 2 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:19,574-Speed 6227.60 samples/sec Loss 10.7826 LearningRate 0.0007 Epoch: 2 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:22,840-Speed 6271.47 samples/sec Loss 10.8264 LearningRate 0.0007 Epoch: 2 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:26,092-Speed 6300.45 samples/sec Loss 10.8376 LearningRate 0.0007 Epoch: 2 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:13:29,322-Speed 6340.71 samples/sec Loss 10.8796 LearningRate 0.0007 Epoch: 2 Global Step: 60430 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:32,568-Speed 6310.58 samples/sec Loss 10.8336 LearningRate 0.0007 Epoch: 2 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:35,812-Speed 6314.64 samples/sec Loss 
10.8206 LearningRate 0.0007 Epoch: 2 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:39,059-Speed 6310.03 samples/sec Loss 10.8654 LearningRate 0.0007 Epoch: 2 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:42,304-Speed 6311.40 samples/sec Loss 10.8475 LearningRate 0.0007 Epoch: 2 Global Step: 60470 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:45,548-Speed 6316.01 samples/sec Loss 10.8007 LearningRate 0.0007 Epoch: 2 Global Step: 60480 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:48,793-Speed 6311.76 samples/sec Loss 10.7975 LearningRate 0.0007 Epoch: 2 Global Step: 60490 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:52,036-Speed 6315.71 samples/sec Loss 10.6817 LearningRate 0.0007 Epoch: 2 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:55,280-Speed 6314.54 samples/sec Loss 11.0003 LearningRate 0.0007 Epoch: 2 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:13:58,532-Speed 6300.75 samples/sec Loss 10.8416 LearningRate 0.0007 Epoch: 2 Global Step: 60520 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:14:01,779-Speed 6309.22 samples/sec Loss 10.7976 LearningRate 0.0007 Epoch: 2 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:05,022-Speed 6316.53 samples/sec Loss 10.8181 LearningRate 0.0007 Epoch: 2 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:08,267-Speed 6312.86 samples/sec Loss 10.9041 LearningRate 0.0007 Epoch: 2 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:11,513-Speed 6311.44 samples/sec Loss 10.7862 LearningRate 0.0007 Epoch: 2 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:14,758-Speed 6312.92 samples/sec Loss 10.8541 LearningRate 0.0007 Epoch: 2 Global 
Step: 60570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:18,000-Speed 6317.07 samples/sec Loss 10.7487 LearningRate 0.0007 Epoch: 2 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:21,244-Speed 6314.26 samples/sec Loss 10.7750 LearningRate 0.0007 Epoch: 2 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:24,493-Speed 6304.53 samples/sec Loss 10.7740 LearningRate 0.0007 Epoch: 2 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:27,737-Speed 6316.54 samples/sec Loss 10.7636 LearningRate 0.0007 Epoch: 2 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:30,983-Speed 6309.84 samples/sec Loss 10.7768 LearningRate 0.0007 Epoch: 2 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:34,214-Speed 6339.52 samples/sec Loss 10.8667 LearningRate 0.0007 Epoch: 2 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:37,456-Speed 6317.94 samples/sec Loss 10.7851 LearningRate 0.0007 Epoch: 2 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:40,699-Speed 6317.56 samples/sec Loss 10.7615 LearningRate 0.0007 Epoch: 2 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:43,947-Speed 6306.94 samples/sec Loss 10.8235 LearningRate 0.0007 Epoch: 2 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:47,195-Speed 6307.53 samples/sec Loss 10.7957 LearningRate 0.0007 Epoch: 2 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:50,440-Speed 6310.68 samples/sec Loss 10.6919 LearningRate 0.0007 Epoch: 2 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:14:53,688-Speed 6307.84 samples/sec Loss 10.8226 LearningRate 0.0007 Epoch: 2 Global Step: 60690 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:14:56,931-Speed 6316.70 samples/sec Loss 10.8821 LearningRate 0.0007 Epoch: 2 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:00,177-Speed 6310.96 samples/sec Loss 10.8623 LearningRate 0.0007 Epoch: 2 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:03,421-Speed 6315.22 samples/sec Loss 10.8199 LearningRate 0.0007 Epoch: 2 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:06,668-Speed 6307.75 samples/sec Loss 10.8597 LearningRate 0.0007 Epoch: 2 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:15:09,899-Speed 6341.73 samples/sec Loss 10.7378 LearningRate 0.0007 Epoch: 2 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:13,147-Speed 6305.23 samples/sec Loss 10.7958 LearningRate 0.0007 Epoch: 2 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:16,390-Speed 6317.92 samples/sec Loss 10.6844 LearningRate 0.0007 Epoch: 2 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:19,632-Speed 6318.71 samples/sec Loss 10.7295 LearningRate 0.0007 Epoch: 2 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:22,884-Speed 6299.17 samples/sec Loss 10.7550 LearningRate 0.0007 Epoch: 2 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:26,125-Speed 6319.20 samples/sec Loss 10.8507 LearningRate 0.0007 Epoch: 2 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:29,372-Speed 6309.85 samples/sec Loss 10.7911 LearningRate 0.0007 Epoch: 2 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:32,618-Speed 6309.54 samples/sec Loss 10.8440 LearningRate 0.0007 Epoch: 2 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:15:35,861-Speed 6317.47 samples/sec Loss 10.8187 LearningRate 0.0007 Epoch: 2 Global Step: 60820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:39,103-Speed 6318.85 samples/sec Loss 10.8309 LearningRate 0.0007 Epoch: 2 Global Step: 60830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:42,336-Speed 6334.37 samples/sec Loss 10.8480 LearningRate 0.0007 Epoch: 2 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:45,580-Speed 6315.77 samples/sec Loss 10.8221 LearningRate 0.0007 Epoch: 2 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:48,828-Speed 6306.73 samples/sec Loss 10.7689 LearningRate 0.0007 Epoch: 2 Global Step: 60860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:52,074-Speed 6310.70 samples/sec Loss 10.6671 LearningRate 0.0007 Epoch: 2 Global Step: 60870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:55,318-Speed 6313.73 samples/sec Loss 10.7855 LearningRate 0.0007 Epoch: 2 Global Step: 60880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:15:58,564-Speed 6311.86 samples/sec Loss 10.8000 LearningRate 0.0007 Epoch: 2 Global Step: 60890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:01,808-Speed 6313.31 samples/sec Loss 10.7168 LearningRate 0.0007 Epoch: 2 Global Step: 60900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:05,057-Speed 6304.85 samples/sec Loss 10.7403 LearningRate 0.0007 Epoch: 2 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:08,303-Speed 6310.79 samples/sec Loss 10.8713 LearningRate 0.0007 Epoch: 2 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:11,548-Speed 6313.23 samples/sec Loss 10.7642 LearningRate 0.0007 Epoch: 2 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:14,794-Speed 6311.80 samples/sec Loss 
10.8428 LearningRate 0.0007 Epoch: 2 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:16:18,031-Speed 6327.91 samples/sec Loss 10.7164 LearningRate 0.0007 Epoch: 2 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:21,277-Speed 6310.75 samples/sec Loss 10.8255 LearningRate 0.0007 Epoch: 2 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:24,536-Speed 6285.12 samples/sec Loss 10.8183 LearningRate 0.0007 Epoch: 2 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:27,848-Speed 6186.52 samples/sec Loss 10.8130 LearningRate 0.0007 Epoch: 2 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:31,100-Speed 6298.10 samples/sec Loss 10.7966 LearningRate 0.0007 Epoch: 2 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:34,357-Speed 6290.65 samples/sec Loss 10.7935 LearningRate 0.0007 Epoch: 2 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:37,606-Speed 6303.18 samples/sec Loss 10.8217 LearningRate 0.0007 Epoch: 2 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:40,850-Speed 6314.35 samples/sec Loss 10.7035 LearningRate 0.0007 Epoch: 2 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:44,098-Speed 6307.49 samples/sec Loss 10.7643 LearningRate 0.0007 Epoch: 2 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:47,339-Speed 6319.72 samples/sec Loss 10.7633 LearningRate 0.0007 Epoch: 2 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:50,575-Speed 6332.21 samples/sec Loss 10.7619 LearningRate 0.0007 Epoch: 2 Global Step: 61050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:53,818-Speed 6315.55 samples/sec Loss 10.6495 LearningRate 0.0007 Epoch: 2 
Global Step: 61060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:16:57,066-Speed 6306.89 samples/sec Loss 10.8514 LearningRate 0.0007 Epoch: 2 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:00,307-Speed 6319.09 samples/sec Loss 10.7162 LearningRate 0.0007 Epoch: 2 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:03,554-Speed 6310.46 samples/sec Loss 10.7719 LearningRate 0.0007 Epoch: 2 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:06,798-Speed 6314.56 samples/sec Loss 10.8433 LearningRate 0.0007 Epoch: 2 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:10,040-Speed 6317.88 samples/sec Loss 10.7957 LearningRate 0.0007 Epoch: 2 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:13,284-Speed 6314.39 samples/sec Loss 10.6861 LearningRate 0.0007 Epoch: 2 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:16,536-Speed 6298.16 samples/sec Loss 10.7486 LearningRate 0.0007 Epoch: 2 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:19,784-Speed 6308.94 samples/sec Loss 10.8590 LearningRate 0.0007 Epoch: 2 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:23,035-Speed 6301.06 samples/sec Loss 10.7681 LearningRate 0.0007 Epoch: 2 Global Step: 61150 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:17:26,263-Speed 6345.23 samples/sec Loss 10.8883 LearningRate 0.0007 Epoch: 2 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:29,509-Speed 6310.00 samples/sec Loss 10.8198 LearningRate 0.0007 Epoch: 2 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:32,768-Speed 6286.23 samples/sec Loss 10.7019 LearningRate 0.0007 Epoch: 2 Global Step: 61180 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:17:36,010-Speed 6319.33 samples/sec Loss 10.7741 LearningRate 0.0007 Epoch: 2 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:39,256-Speed 6311.01 samples/sec Loss 10.7329 LearningRate 0.0007 Epoch: 2 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:42,501-Speed 6312.21 samples/sec Loss 10.8377 LearningRate 0.0007 Epoch: 2 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:45,745-Speed 6314.33 samples/sec Loss 10.8183 LearningRate 0.0007 Epoch: 2 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:48,991-Speed 6310.94 samples/sec Loss 10.7137 LearningRate 0.0007 Epoch: 2 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:52,240-Speed 6305.45 samples/sec Loss 10.8295 LearningRate 0.0007 Epoch: 2 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:55,487-Speed 6307.38 samples/sec Loss 10.7625 LearningRate 0.0007 Epoch: 2 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:17:58,722-Speed 6332.55 samples/sec Loss 10.8298 LearningRate 0.0007 Epoch: 2 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:01,967-Speed 6311.57 samples/sec Loss 10.7131 LearningRate 0.0007 Epoch: 2 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:05,217-Speed 6303.54 samples/sec Loss 10.7695 LearningRate 0.0007 Epoch: 2 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:08,463-Speed 6311.36 samples/sec Loss 10.7988 LearningRate 0.0007 Epoch: 2 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:11,706-Speed 6316.18 samples/sec Loss 10.6695 LearningRate 0.0007 Epoch: 2 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:18:14,960-Speed 6296.29 samples/sec Loss 10.7657 LearningRate 0.0007 Epoch: 2 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:18,205-Speed 6312.06 samples/sec Loss 10.7444 LearningRate 0.0007 Epoch: 2 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:21,451-Speed 6311.00 samples/sec Loss 10.7688 LearningRate 0.0007 Epoch: 2 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:24,712-Speed 6281.76 samples/sec Loss 10.8025 LearningRate 0.0007 Epoch: 2 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:27,955-Speed 6317.11 samples/sec Loss 10.7780 LearningRate 0.0007 Epoch: 2 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:31,192-Speed 6328.54 samples/sec Loss 10.8000 LearningRate 0.0007 Epoch: 2 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:34,436-Speed 6315.20 samples/sec Loss 10.8241 LearningRate 0.0007 Epoch: 2 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:37,680-Speed 6314.46 samples/sec Loss 10.6323 LearningRate 0.0007 Epoch: 2 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:40,926-Speed 6309.40 samples/sec Loss 10.6778 LearningRate 0.0007 Epoch: 2 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:44,172-Speed 6312.00 samples/sec Loss 10.7449 LearningRate 0.0007 Epoch: 2 Global Step: 61400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:47,418-Speed 6309.84 samples/sec Loss 10.7133 LearningRate 0.0007 Epoch: 2 Global Step: 61410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:50,665-Speed 6309.86 samples/sec Loss 10.7618 LearningRate 0.0007 Epoch: 2 Global Step: 61420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:53,915-Speed 6301.42 samples/sec Loss 
10.7509 LearningRate 0.0007 Epoch: 2 Global Step: 61430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:18:57,161-Speed 6311.75 samples/sec Loss 10.8596 LearningRate 0.0007 Epoch: 2 Global Step: 61440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:00,406-Speed 6312.78 samples/sec Loss 10.7393 LearningRate 0.0007 Epoch: 2 Global Step: 61450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:03,643-Speed 6328.17 samples/sec Loss 10.6127 LearningRate 0.0007 Epoch: 2 Global Step: 61460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:06,887-Speed 6313.47 samples/sec Loss 10.7544 LearningRate 0.0007 Epoch: 2 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:10,133-Speed 6310.45 samples/sec Loss 10.7244 LearningRate 0.0007 Epoch: 2 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:13,378-Speed 6312.53 samples/sec Loss 10.6073 LearningRate 0.0007 Epoch: 2 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:16,628-Speed 6304.72 samples/sec Loss 10.6405 LearningRate 0.0007 Epoch: 2 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:19,881-Speed 6295.58 samples/sec Loss 10.6526 LearningRate 0.0007 Epoch: 2 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:23,129-Speed 6308.03 samples/sec Loss 10.7179 LearningRate 0.0007 Epoch: 2 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:26,383-Speed 6295.48 samples/sec Loss 10.6986 LearningRate 0.0007 Epoch: 2 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:29,628-Speed 6312.71 samples/sec Loss 10.7329 LearningRate 0.0007 Epoch: 2 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:32,873-Speed 6311.88 samples/sec Loss 10.7720 LearningRate 0.0007 Epoch: 2 Global 
Step: 61550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:36,108-Speed 6333.37 samples/sec Loss 10.7094 LearningRate 0.0007 Epoch: 2 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:39,353-Speed 6311.84 samples/sec Loss 10.7226 LearningRate 0.0007 Epoch: 2 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:42,599-Speed 6310.39 samples/sec Loss 10.7105 LearningRate 0.0007 Epoch: 2 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:45,840-Speed 6321.68 samples/sec Loss 10.6916 LearningRate 0.0007 Epoch: 2 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:49,087-Speed 6308.87 samples/sec Loss 10.7477 LearningRate 0.0007 Epoch: 2 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:52,330-Speed 6316.49 samples/sec Loss 10.6548 LearningRate 0.0007 Epoch: 2 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:55,580-Speed 6303.01 samples/sec Loss 10.7546 LearningRate 0.0007 Epoch: 2 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:19:58,823-Speed 6316.70 samples/sec Loss 10.6214 LearningRate 0.0007 Epoch: 2 Global Step: 61630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:02,066-Speed 6315.54 samples/sec Loss 10.6669 LearningRate 0.0007 Epoch: 2 Global Step: 61640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:05,316-Speed 6304.28 samples/sec Loss 10.7066 LearningRate 0.0007 Epoch: 2 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:08,546-Speed 6341.31 samples/sec Loss 10.6709 LearningRate 0.0007 Epoch: 2 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:11,789-Speed 6315.46 samples/sec Loss 10.8226 LearningRate 0.0007 Epoch: 2 Global Step: 61670 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:20:15,034-Speed 6313.58 samples/sec Loss 10.7279 LearningRate 0.0007 Epoch: 2 Global Step: 61680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:18,278-Speed 6314.74 samples/sec Loss 10.6439 LearningRate 0.0007 Epoch: 2 Global Step: 61690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:21,534-Speed 6291.78 samples/sec Loss 10.6540 LearningRate 0.0007 Epoch: 2 Global Step: 61700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:24,781-Speed 6308.35 samples/sec Loss 10.7833 LearningRate 0.0007 Epoch: 2 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:28,026-Speed 6312.50 samples/sec Loss 10.7613 LearningRate 0.0007 Epoch: 2 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:31,272-Speed 6311.23 samples/sec Loss 10.6774 LearningRate 0.0007 Epoch: 2 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:34,515-Speed 6315.70 samples/sec Loss 10.7678 LearningRate 0.0007 Epoch: 2 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:37,761-Speed 6311.52 samples/sec Loss 10.6800 LearningRate 0.0007 Epoch: 2 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:41,003-Speed 6319.46 samples/sec Loss 10.6957 LearningRate 0.0007 Epoch: 2 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:20:44,259-Speed 6291.78 samples/sec Loss 10.6353 LearningRate 0.0007 Epoch: 2 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:47,504-Speed 6312.57 samples/sec Loss 10.7021 LearningRate 0.0007 Epoch: 2 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:50,751-Speed 6307.43 samples/sec Loss 10.6958 LearningRate 0.0007 Epoch: 2 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:20:53,998-Speed 6310.08 samples/sec Loss 10.6599 LearningRate 0.0007 Epoch: 2 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:20:57,251-Speed 6295.59 samples/sec Loss 10.7544 LearningRate 0.0007 Epoch: 2 Global Step: 61810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:00,498-Speed 6308.96 samples/sec Loss 10.5770 LearningRate 0.0007 Epoch: 2 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:03,745-Speed 6309.35 samples/sec Loss 10.6849 LearningRate 0.0007 Epoch: 2 Global Step: 61830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:06,990-Speed 6312.10 samples/sec Loss 10.6605 LearningRate 0.0007 Epoch: 2 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:10,238-Speed 6307.97 samples/sec Loss 10.7671 LearningRate 0.0007 Epoch: 2 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:13,485-Speed 6307.15 samples/sec Loss 10.8402 LearningRate 0.0007 Epoch: 2 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:16,722-Speed 6330.37 samples/sec Loss 10.6816 LearningRate 0.0007 Epoch: 2 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:19,961-Speed 6322.76 samples/sec Loss 10.6487 LearningRate 0.0007 Epoch: 2 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:23,209-Speed 6307.16 samples/sec Loss 10.7720 LearningRate 0.0007 Epoch: 2 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:26,450-Speed 6320.24 samples/sec Loss 10.7181 LearningRate 0.0007 Epoch: 2 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:29,695-Speed 6313.90 samples/sec Loss 10.6893 LearningRate 0.0007 Epoch: 2 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:32,940-Speed 6311.13 samples/sec Loss 
10.7528 LearningRate 0.0007 Epoch: 2 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:36,183-Speed 6316.67 samples/sec Loss 10.6964 LearningRate 0.0007 Epoch: 2 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:39,429-Speed 6312.43 samples/sec Loss 10.7002 LearningRate 0.0007 Epoch: 2 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:42,673-Speed 6313.24 samples/sec Loss 10.7541 LearningRate 0.0007 Epoch: 2 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:45,918-Speed 6313.17 samples/sec Loss 10.7511 LearningRate 0.0007 Epoch: 2 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:49,160-Speed 6319.30 samples/sec Loss 10.6602 LearningRate 0.0007 Epoch: 2 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:21:52,405-Speed 6313.60 samples/sec Loss 10.7341 LearningRate 0.0007 Epoch: 2 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:21:55,635-Speed 6341.01 samples/sec Loss 10.7446 LearningRate 0.0007 Epoch: 2 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:21:58,877-Speed 6319.13 samples/sec Loss 10.6903 LearningRate 0.0007 Epoch: 2 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:02,128-Speed 6300.38 samples/sec Loss 10.7441 LearningRate 0.0007 Epoch: 2 Global Step: 62010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:05,376-Speed 6308.02 samples/sec Loss 10.7075 LearningRate 0.0007 Epoch: 2 Global Step: 62020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:08,617-Speed 6319.00 samples/sec Loss 10.6646 LearningRate 0.0007 Epoch: 2 Global Step: 62030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:11,862-Speed 6312.90 samples/sec Loss 10.7733 LearningRate 0.0007 Epoch: 2 
Global Step: 62040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:15,104-Speed 6317.99 samples/sec Loss 10.7180 LearningRate 0.0007 Epoch: 2 Global Step: 62050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:18,347-Speed 6317.20 samples/sec Loss 10.7179 LearningRate 0.0007 Epoch: 2 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:21,589-Speed 6319.29 samples/sec Loss 10.6956 LearningRate 0.0007 Epoch: 2 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:24,866-Speed 6249.89 samples/sec Loss 10.5963 LearningRate 0.0007 Epoch: 2 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:28,115-Speed 6304.55 samples/sec Loss 10.6603 LearningRate 0.0007 Epoch: 2 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:31,356-Speed 6321.27 samples/sec Loss 10.7287 LearningRate 0.0007 Epoch: 2 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:34,603-Speed 6309.86 samples/sec Loss 10.6793 LearningRate 0.0007 Epoch: 2 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:37,845-Speed 6317.23 samples/sec Loss 10.6767 LearningRate 0.0007 Epoch: 2 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:41,092-Speed 6308.85 samples/sec Loss 10.6857 LearningRate 0.0007 Epoch: 2 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:44,336-Speed 6314.45 samples/sec Loss 10.7927 LearningRate 0.0007 Epoch: 2 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:47,585-Speed 6306.09 samples/sec Loss 10.6811 LearningRate 0.0007 Epoch: 2 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:50,843-Speed 6288.52 samples/sec Loss 10.8060 LearningRate 0.0007 Epoch: 2 Global Step: 62160 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:22:54,084-Speed 6319.80 samples/sec Loss 10.6802 LearningRate 0.0007 Epoch: 2 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:22:57,326-Speed 6318.30 samples/sec Loss 10.6178 LearningRate 0.0007 Epoch: 2 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:23:00,554-Speed 6346.14 samples/sec Loss 10.7366 LearningRate 0.0007 Epoch: 2 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:23:03,802-Speed 6307.28 samples/sec Loss 10.7365 LearningRate 0.0007 Epoch: 2 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:23:07,044-Speed 6318.03 samples/sec Loss 10.7236 LearningRate 0.0007 Epoch: 2 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:05,592-Speed 349.80 samples/sec Loss 10.6180 LearningRate 0.0008 Epoch: 3 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:08,819-Speed 6348.99 samples/sec Loss 10.7578 LearningRate 0.0008 Epoch: 3 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:12,061-Speed 6317.74 samples/sec Loss 10.6456 LearningRate 0.0008 Epoch: 3 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:15,309-Speed 6308.32 samples/sec Loss 10.6824 LearningRate 0.0008 Epoch: 3 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:18,633-Speed 6161.02 samples/sec Loss 10.6956 LearningRate 0.0008 Epoch: 3 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:21,871-Speed 6326.76 samples/sec Loss 10.7246 LearningRate 0.0008 Epoch: 3 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:25,111-Speed 6323.14 samples/sec Loss 10.6781 LearningRate 0.0008 Epoch: 3 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:24:28,348-Speed 6328.82 samples/sec Loss 10.6630 LearningRate 0.0008 Epoch: 3 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:24:31,571-Speed 6355.83 samples/sec Loss 10.6495 LearningRate 0.0008 Epoch: 3 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:34,807-Speed 6328.64 samples/sec Loss 10.6188 LearningRate 0.0008 Epoch: 3 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:38,045-Speed 6326.82 samples/sec Loss 10.7085 LearningRate 0.0008 Epoch: 3 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:41,293-Speed 6307.51 samples/sec Loss 10.6871 LearningRate 0.0008 Epoch: 3 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:44,527-Speed 6334.93 samples/sec Loss 10.7052 LearningRate 0.0008 Epoch: 3 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:47,766-Speed 6325.20 samples/sec Loss 10.6803 LearningRate 0.0008 Epoch: 3 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:51,010-Speed 6314.71 samples/sec Loss 10.6479 LearningRate 0.0008 Epoch: 3 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:54,251-Speed 6320.35 samples/sec Loss 10.6867 LearningRate 0.0008 Epoch: 3 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:24:57,499-Speed 6305.46 samples/sec Loss 10.7210 LearningRate 0.0008 Epoch: 3 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:00,758-Speed 6285.81 samples/sec Loss 10.6700 LearningRate 0.0008 Epoch: 3 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:03,988-Speed 6342.88 samples/sec Loss 10.6588 LearningRate 0.0008 Epoch: 3 Global Step: 62400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:07,233-Speed 6311.61 samples/sec 
Loss 10.6489 LearningRate 0.0008 Epoch: 3 Global Step: 62410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:10,478-Speed 6312.56 samples/sec Loss 10.6900 LearningRate 0.0008 Epoch: 3 Global Step: 62420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:13,717-Speed 6323.62 samples/sec Loss 10.6111 LearningRate 0.0008 Epoch: 3 Global Step: 62430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:16,986-Speed 6268.42 samples/sec Loss 10.7236 LearningRate 0.0008 Epoch: 3 Global Step: 62440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:20,251-Speed 6272.98 samples/sec Loss 10.6091 LearningRate 0.0008 Epoch: 3 Global Step: 62450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:23,498-Speed 6308.12 samples/sec Loss 10.5777 LearningRate 0.0008 Epoch: 3 Global Step: 62460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:26,743-Speed 6313.02 samples/sec Loss 10.5816 LearningRate 0.0008 Epoch: 3 Global Step: 62470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:29,989-Speed 6310.39 samples/sec Loss 10.6223 LearningRate 0.0008 Epoch: 3 Global Step: 62480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:33,232-Speed 6317.39 samples/sec Loss 10.5509 LearningRate 0.0008 Epoch: 3 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:36,458-Speed 6348.72 samples/sec Loss 10.5480 LearningRate 0.0008 Epoch: 3 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:39,700-Speed 6320.11 samples/sec Loss 10.7150 LearningRate 0.0008 Epoch: 3 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:42,942-Speed 6316.63 samples/sec Loss 10.6325 LearningRate 0.0008 Epoch: 3 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:46,229-Speed 6233.04 samples/sec Loss 10.5997 LearningRate 0.0008 Epoch: 3 
Global Step: 62530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:49,470-Speed 6320.24 samples/sec Loss 10.5942 LearningRate 0.0008 Epoch: 3 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:52,712-Speed 6320.28 samples/sec Loss 10.6455 LearningRate 0.0008 Epoch: 3 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:55,953-Speed 6320.95 samples/sec Loss 10.6542 LearningRate 0.0008 Epoch: 3 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:25:59,196-Speed 6316.03 samples/sec Loss 10.6264 LearningRate 0.0008 Epoch: 3 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:02,449-Speed 6297.26 samples/sec Loss 10.6754 LearningRate 0.0008 Epoch: 3 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:05,692-Speed 6315.74 samples/sec Loss 10.6095 LearningRate 0.0008 Epoch: 3 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:08,916-Speed 6354.62 samples/sec Loss 10.5807 LearningRate 0.0008 Epoch: 3 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:12,155-Speed 6323.55 samples/sec Loss 10.6587 LearningRate 0.0008 Epoch: 3 Global Step: 62610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:15,393-Speed 6326.99 samples/sec Loss 10.6312 LearningRate 0.0008 Epoch: 3 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:18,630-Speed 6327.36 samples/sec Loss 10.6555 LearningRate 0.0008 Epoch: 3 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:21,873-Speed 6317.47 samples/sec Loss 10.5976 LearningRate 0.0008 Epoch: 3 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:25,109-Speed 6330.43 samples/sec Loss 10.6443 LearningRate 0.0008 Epoch: 3 Global Step: 62650 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:26:28,348-Speed 6324.52 samples/sec Loss 10.5764 LearningRate 0.0008 Epoch: 3 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:31,585-Speed 6326.95 samples/sec Loss 10.6049 LearningRate 0.0008 Epoch: 3 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:34,841-Speed 6292.77 samples/sec Loss 10.5452 LearningRate 0.0008 Epoch: 3 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:38,080-Speed 6323.19 samples/sec Loss 10.6185 LearningRate 0.0008 Epoch: 3 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:41,306-Speed 6349.24 samples/sec Loss 10.6787 LearningRate 0.0008 Epoch: 3 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:44,543-Speed 6329.34 samples/sec Loss 10.6892 LearningRate 0.0008 Epoch: 3 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:47,788-Speed 6311.83 samples/sec Loss 10.6292 LearningRate 0.0008 Epoch: 3 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:51,026-Speed 6326.41 samples/sec Loss 10.6039 LearningRate 0.0008 Epoch: 3 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:54,262-Speed 6330.79 samples/sec Loss 10.6076 LearningRate 0.0008 Epoch: 3 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:26:57,507-Speed 6311.71 samples/sec Loss 10.7028 LearningRate 0.0008 Epoch: 3 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:00,750-Speed 6317.21 samples/sec Loss 10.6189 LearningRate 0.0008 Epoch: 3 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:03,992-Speed 6320.32 samples/sec Loss 10.6344 LearningRate 0.0008 Epoch: 3 Global Step: 62770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:27:07,233-Speed 6320.84 samples/sec Loss 10.6851 LearningRate 0.0008 Epoch: 3 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:10,470-Speed 6326.38 samples/sec Loss 10.6148 LearningRate 0.0008 Epoch: 3 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:13,710-Speed 6323.39 samples/sec Loss 10.6560 LearningRate 0.0008 Epoch: 3 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:27:16,941-Speed 6340.59 samples/sec Loss 10.7069 LearningRate 0.0008 Epoch: 3 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:20,182-Speed 6320.37 samples/sec Loss 10.6157 LearningRate 0.0008 Epoch: 3 Global Step: 62820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:23,434-Speed 6298.45 samples/sec Loss 10.6659 LearningRate 0.0008 Epoch: 3 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:26,740-Speed 6196.28 samples/sec Loss 10.6522 LearningRate 0.0008 Epoch: 3 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:29,994-Speed 6295.62 samples/sec Loss 10.5766 LearningRate 0.0008 Epoch: 3 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:33,234-Speed 6320.97 samples/sec Loss 10.6039 LearningRate 0.0008 Epoch: 3 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:36,474-Speed 6324.19 samples/sec Loss 10.6101 LearningRate 0.0008 Epoch: 3 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:39,714-Speed 6321.46 samples/sec Loss 10.7145 LearningRate 0.0008 Epoch: 3 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:42,951-Speed 6327.56 samples/sec Loss 10.6758 LearningRate 0.0008 Epoch: 3 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:46,198-Speed 6308.62 samples/sec 
Loss 10.6501 LearningRate 0.0008 Epoch: 3 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:49,439-Speed 6320.54 samples/sec Loss 10.6376 LearningRate 0.0008 Epoch: 3 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:27:52,664-Speed 6352.55 samples/sec Loss 10.6224 LearningRate 0.0008 Epoch: 3 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:55,905-Speed 6320.45 samples/sec Loss 10.6214 LearningRate 0.0008 Epoch: 3 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:27:59,141-Speed 6330.03 samples/sec Loss 10.6624 LearningRate 0.0008 Epoch: 3 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:02,380-Speed 6325.29 samples/sec Loss 10.6199 LearningRate 0.0008 Epoch: 3 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:05,619-Speed 6323.37 samples/sec Loss 10.7174 LearningRate 0.0008 Epoch: 3 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:08,861-Speed 6318.64 samples/sec Loss 10.5408 LearningRate 0.0008 Epoch: 3 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:12,103-Speed 6319.71 samples/sec Loss 10.5484 LearningRate 0.0008 Epoch: 3 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:15,354-Speed 6301.69 samples/sec Loss 10.6781 LearningRate 0.0008 Epoch: 3 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:18,593-Speed 6323.69 samples/sec Loss 10.5706 LearningRate 0.0008 Epoch: 3 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:21,837-Speed 6313.99 samples/sec Loss 10.5854 LearningRate 0.0008 Epoch: 3 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:25,070-Speed 6337.01 samples/sec Loss 10.6452 LearningRate 0.0008 Epoch: 3 
Global Step: 63020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:28,313-Speed 6315.80 samples/sec Loss 10.7076 LearningRate 0.0008 Epoch: 3 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:31,552-Speed 6325.29 samples/sec Loss 10.6755 LearningRate 0.0008 Epoch: 3 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:34,811-Speed 6285.62 samples/sec Loss 10.5773 LearningRate 0.0008 Epoch: 3 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:38,049-Speed 6324.96 samples/sec Loss 10.6810 LearningRate 0.0008 Epoch: 3 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:41,293-Speed 6315.09 samples/sec Loss 10.6181 LearningRate 0.0008 Epoch: 3 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:44,533-Speed 6323.77 samples/sec Loss 10.6909 LearningRate 0.0008 Epoch: 3 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:47,772-Speed 6323.92 samples/sec Loss 10.6326 LearningRate 0.0008 Epoch: 3 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:51,013-Speed 6320.20 samples/sec Loss 10.6194 LearningRate 0.0008 Epoch: 3 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:54,255-Speed 6319.16 samples/sec Loss 10.6312 LearningRate 0.0008 Epoch: 3 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:28:57,484-Speed 6343.54 samples/sec Loss 10.5927 LearningRate 0.0008 Epoch: 3 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:00,723-Speed 6324.25 samples/sec Loss 10.6418 LearningRate 0.0008 Epoch: 3 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:03,980-Speed 6289.14 samples/sec Loss 10.6358 LearningRate 0.0008 Epoch: 3 Global Step: 63140 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:29:07,220-Speed 6322.29 samples/sec Loss 10.5349 LearningRate 0.0008 Epoch: 3 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:10,454-Speed 6334.29 samples/sec Loss 10.5961 LearningRate 0.0008 Epoch: 3 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:13,763-Speed 6191.02 samples/sec Loss 10.5304 LearningRate 0.0008 Epoch: 3 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:17,004-Speed 6320.79 samples/sec Loss 10.5696 LearningRate 0.0008 Epoch: 3 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:20,247-Speed 6315.69 samples/sec Loss 10.5537 LearningRate 0.0008 Epoch: 3 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:23,487-Speed 6323.73 samples/sec Loss 10.7083 LearningRate 0.0008 Epoch: 3 Global Step: 63200 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:26,725-Speed 6325.81 samples/sec Loss 10.6029 LearningRate 0.0008 Epoch: 3 Global Step: 63210 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:29,964-Speed 6323.43 samples/sec Loss 10.6243 LearningRate 0.0008 Epoch: 3 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:33,208-Speed 6316.38 samples/sec Loss 10.6422 LearningRate 0.0008 Epoch: 3 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:36,450-Speed 6317.04 samples/sec Loss 10.5630 LearningRate 0.0008 Epoch: 3 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:39,701-Speed 6301.38 samples/sec Loss 10.4591 LearningRate 0.0008 Epoch: 3 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:29:42,945-Speed 6315.45 samples/sec Loss 10.6708 LearningRate 0.0008 Epoch: 3 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:29:46,184-Speed 6323.50 samples/sec Loss 10.7163 LearningRate 0.0008 Epoch: 3 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:49,431-Speed 6308.65 samples/sec Loss 10.6573 LearningRate 0.0008 Epoch: 3 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:52,672-Speed 6320.72 samples/sec Loss 10.5924 LearningRate 0.0008 Epoch: 3 Global Step: 63290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:55,913-Speed 6321.13 samples/sec Loss 10.5756 LearningRate 0.0008 Epoch: 3 Global Step: 63300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:29:59,157-Speed 6313.07 samples/sec Loss 10.5074 LearningRate 0.0008 Epoch: 3 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:02,404-Speed 6310.17 samples/sec Loss 10.6114 LearningRate 0.0008 Epoch: 3 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:05,690-Speed 6232.66 samples/sec Loss 10.4841 LearningRate 0.0008 Epoch: 3 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:08,943-Speed 6298.43 samples/sec Loss 10.5975 LearningRate 0.0008 Epoch: 3 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:12,184-Speed 6319.90 samples/sec Loss 10.6317 LearningRate 0.0008 Epoch: 3 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:15,413-Speed 6344.43 samples/sec Loss 10.6024 LearningRate 0.0008 Epoch: 3 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:18,654-Speed 6320.47 samples/sec Loss 10.6083 LearningRate 0.0008 Epoch: 3 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:21,907-Speed 6296.60 samples/sec Loss 10.6095 LearningRate 0.0008 Epoch: 3 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:25,149-Speed 6320.67 samples/sec Loss 
10.6464 LearningRate 0.0008 Epoch: 3 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:28,395-Speed 6308.78 samples/sec Loss 10.6352 LearningRate 0.0008 Epoch: 3 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:31,641-Speed 6311.90 samples/sec Loss 10.5111 LearningRate 0.0008 Epoch: 3 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:34,880-Speed 6325.01 samples/sec Loss 10.5336 LearningRate 0.0008 Epoch: 3 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:30:38,110-Speed 6340.15 samples/sec Loss 10.6377 LearningRate 0.0008 Epoch: 3 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:41,351-Speed 6321.29 samples/sec Loss 10.5471 LearningRate 0.0008 Epoch: 3 Global Step: 63440 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:44,591-Speed 6322.28 samples/sec Loss 10.5437 LearningRate 0.0008 Epoch: 3 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:47,829-Speed 6326.39 samples/sec Loss 10.5489 LearningRate 0.0008 Epoch: 3 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:51,075-Speed 6311.36 samples/sec Loss 10.6135 LearningRate 0.0008 Epoch: 3 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:54,317-Speed 6317.30 samples/sec Loss 10.5658 LearningRate 0.0008 Epoch: 3 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:30:57,556-Speed 6325.70 samples/sec Loss 10.5640 LearningRate 0.0008 Epoch: 3 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:31:00,797-Speed 6318.44 samples/sec Loss 10.5144 LearningRate 0.0008 Epoch: 3 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:31:04,044-Speed 6310.00 samples/sec Loss 10.5313 LearningRate 0.0008 Epoch: 3 Global 
Step: 63510 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:31:07,284-Speed 6321.66 samples/sec Loss 10.5436 LearningRate 0.0008 Epoch: 3 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:31:10,528-Speed 6315.29 samples/sec Loss 10.6145 LearningRate 0.0008 Epoch: 3 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:13,769-Speed 6320.70 samples/sec Loss 10.5228 LearningRate 0.0008 Epoch: 3 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:17,014-Speed 6311.69 samples/sec Loss 10.6389 LearningRate 0.0008 Epoch: 3 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:20,255-Speed 6321.00 samples/sec Loss 10.5302 LearningRate 0.0008 Epoch: 3 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:23,498-Speed 6316.42 samples/sec Loss 10.5531 LearningRate 0.0008 Epoch: 3 Global Step: 63570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:26,742-Speed 6315.82 samples/sec Loss 10.6533 LearningRate 0.0008 Epoch: 3 Global Step: 63580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:29,984-Speed 6318.61 samples/sec Loss 10.5811 LearningRate 0.0008 Epoch: 3 Global Step: 63590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:33,230-Speed 6309.72 samples/sec Loss 10.5573 LearningRate 0.0008 Epoch: 3 Global Step: 63600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:36,472-Speed 6319.84 samples/sec Loss 10.5383 LearningRate 0.0008 Epoch: 3 Global Step: 63610 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:39,720-Speed 6306.07 samples/sec Loss 10.5843 LearningRate 0.0008 Epoch: 3 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:42,949-Speed 6343.54 samples/sec Loss 10.6328 LearningRate 0.0008 Epoch: 3 Global Step: 63630 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:31:46,203-Speed 6295.55 samples/sec Loss 10.5107 LearningRate 0.0008 Epoch: 3 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:49,448-Speed 6311.48 samples/sec Loss 10.5233 LearningRate 0.0008 Epoch: 3 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:52,688-Speed 6323.88 samples/sec Loss 10.5680 LearningRate 0.0008 Epoch: 3 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:55,928-Speed 6320.91 samples/sec Loss 10.5697 LearningRate 0.0008 Epoch: 3 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:31:59,170-Speed 6319.58 samples/sec Loss 10.6754 LearningRate 0.0008 Epoch: 3 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:02,422-Speed 6299.75 samples/sec Loss 10.5756 LearningRate 0.0008 Epoch: 3 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:05,668-Speed 6310.66 samples/sec Loss 10.6213 LearningRate 0.0008 Epoch: 3 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:08,910-Speed 6318.28 samples/sec Loss 10.5406 LearningRate 0.0008 Epoch: 3 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:12,150-Speed 6322.39 samples/sec Loss 10.5171 LearningRate 0.0008 Epoch: 3 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:15,393-Speed 6315.07 samples/sec Loss 10.5714 LearningRate 0.0008 Epoch: 3 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:32:18,621-Speed 6347.00 samples/sec Loss 10.5471 LearningRate 0.0008 Epoch: 3 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:21,864-Speed 6315.95 samples/sec Loss 10.4874 LearningRate 0.0008 Epoch: 3 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:32:25,108-Speed 6314.19 samples/sec Loss 10.5612 LearningRate 0.0008 Epoch: 3 Global Step: 63760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:28,352-Speed 6315.85 samples/sec Loss 10.4877 LearningRate 0.0008 Epoch: 3 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:31,595-Speed 6317.79 samples/sec Loss 10.5941 LearningRate 0.0008 Epoch: 3 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:34,860-Speed 6272.31 samples/sec Loss 10.5615 LearningRate 0.0008 Epoch: 3 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:38,098-Speed 6326.43 samples/sec Loss 10.5414 LearningRate 0.0008 Epoch: 3 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:41,341-Speed 6316.82 samples/sec Loss 10.5998 LearningRate 0.0008 Epoch: 3 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:44,585-Speed 6314.94 samples/sec Loss 10.5236 LearningRate 0.0008 Epoch: 3 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:47,830-Speed 6312.77 samples/sec Loss 10.4280 LearningRate 0.0008 Epoch: 3 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:51,087-Speed 6288.58 samples/sec Loss 10.5208 LearningRate 0.0008 Epoch: 3 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:54,331-Speed 6314.59 samples/sec Loss 10.5896 LearningRate 0.0008 Epoch: 3 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:32:57,575-Speed 6314.76 samples/sec Loss 10.5309 LearningRate 0.0008 Epoch: 3 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:00,806-Speed 6339.44 samples/sec Loss 10.6285 LearningRate 0.0008 Epoch: 3 Global Step: 63870 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:04,049-Speed 6316.13 samples/sec Loss 
10.5947 LearningRate 0.0008 Epoch: 3 Global Step: 63880 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:07,294-Speed 6314.28 samples/sec Loss 10.5340 LearningRate 0.0008 Epoch: 3 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:10,537-Speed 6315.24 samples/sec Loss 10.4801 LearningRate 0.0008 Epoch: 3 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:13,796-Speed 6285.77 samples/sec Loss 10.4625 LearningRate 0.0008 Epoch: 3 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:17,061-Speed 6274.93 samples/sec Loss 10.5655 LearningRate 0.0008 Epoch: 3 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:20,302-Speed 6319.57 samples/sec Loss 10.5583 LearningRate 0.0008 Epoch: 3 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:23,540-Speed 6326.41 samples/sec Loss 10.5451 LearningRate 0.0008 Epoch: 3 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:26,781-Speed 6320.65 samples/sec Loss 10.3764 LearningRate 0.0008 Epoch: 3 Global Step: 63950 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:30,028-Speed 6308.99 samples/sec Loss 10.5440 LearningRate 0.0008 Epoch: 3 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 70 hours Training: 2022-03-31 21:33:33,269-Speed 6320.16 samples/sec Loss 10.6523 LearningRate 0.0008 Epoch: 3 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:36,515-Speed 6312.55 samples/sec Loss 10.5774 LearningRate 0.0008 Epoch: 3 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:39,763-Speed 6307.45 samples/sec Loss 10.5377 LearningRate 0.0008 Epoch: 3 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:43,001-Speed 6325.75 samples/sec Loss 10.4689 LearningRate 0.0008 Epoch: 3 Global 
Step: 64000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:46,243-Speed 6319.19 samples/sec Loss 10.5737 LearningRate 0.0008 Epoch: 3 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:49,485-Speed 6318.65 samples/sec Loss 10.5501 LearningRate 0.0008 Epoch: 3 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:52,729-Speed 6313.99 samples/sec Loss 10.5500 LearningRate 0.0008 Epoch: 3 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:55,974-Speed 6313.29 samples/sec Loss 10.6167 LearningRate 0.0008 Epoch: 3 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:33:59,214-Speed 6321.74 samples/sec Loss 10.5005 LearningRate 0.0008 Epoch: 3 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:02,456-Speed 6317.63 samples/sec Loss 10.5370 LearningRate 0.0008 Epoch: 3 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:05,684-Speed 6347.05 samples/sec Loss 10.5341 LearningRate 0.0008 Epoch: 3 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:08,925-Speed 6319.11 samples/sec Loss 10.5510 LearningRate 0.0008 Epoch: 3 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:12,169-Speed 6316.24 samples/sec Loss 10.5538 LearningRate 0.0008 Epoch: 3 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:15,410-Speed 6320.02 samples/sec Loss 10.5390 LearningRate 0.0008 Epoch: 3 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:18,656-Speed 6310.30 samples/sec Loss 10.5430 LearningRate 0.0008 Epoch: 3 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:21,900-Speed 6313.40 samples/sec Loss 10.5190 LearningRate 0.0008 Epoch: 3 Global Step: 64120 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:34:25,153-Speed 6297.52 samples/sec Loss 10.5050 LearningRate 0.0008 Epoch: 3 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:28,394-Speed 6321.15 samples/sec Loss 10.5524 LearningRate 0.0008 Epoch: 3 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:31,637-Speed 6316.34 samples/sec Loss 10.5484 LearningRate 0.0008 Epoch: 3 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:34,877-Speed 6322.58 samples/sec Loss 10.4921 LearningRate 0.0008 Epoch: 3 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:38,118-Speed 6319.19 samples/sec Loss 10.4477 LearningRate 0.0008 Epoch: 3 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:41,363-Speed 6313.92 samples/sec Loss 10.4959 LearningRate 0.0008 Epoch: 3 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:44,606-Speed 6316.20 samples/sec Loss 10.4350 LearningRate 0.0008 Epoch: 3 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:47,850-Speed 6315.13 samples/sec Loss 10.4681 LearningRate 0.0008 Epoch: 3 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:51,096-Speed 6311.31 samples/sec Loss 10.6084 LearningRate 0.0008 Epoch: 3 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:54,342-Speed 6310.58 samples/sec Loss 10.4609 LearningRate 0.0008 Epoch: 3 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:34:57,585-Speed 6316.99 samples/sec Loss 10.5310 LearningRate 0.0008 Epoch: 3 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:00,831-Speed 6310.22 samples/sec Loss 10.6063 LearningRate 0.0008 Epoch: 3 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:35:04,078-Speed 6307.22 samples/sec Loss 10.6209 LearningRate 0.0008 Epoch: 3 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:07,321-Speed 6317.69 samples/sec Loss 10.5689 LearningRate 0.0008 Epoch: 3 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:10,553-Speed 6337.56 samples/sec Loss 10.6108 LearningRate 0.0008 Epoch: 3 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:13,801-Speed 6308.41 samples/sec Loss 10.5139 LearningRate 0.0008 Epoch: 3 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:17,042-Speed 6318.88 samples/sec Loss 10.5297 LearningRate 0.0008 Epoch: 3 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:20,284-Speed 6319.25 samples/sec Loss 10.5760 LearningRate 0.0008 Epoch: 3 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:23,523-Speed 6323.85 samples/sec Loss 10.4936 LearningRate 0.0008 Epoch: 3 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:26,769-Speed 6311.50 samples/sec Loss 10.5225 LearningRate 0.0008 Epoch: 3 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:30,018-Speed 6305.00 samples/sec Loss 10.4988 LearningRate 0.0008 Epoch: 3 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:33,264-Speed 6310.49 samples/sec Loss 10.5850 LearningRate 0.0008 Epoch: 3 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:36,509-Speed 6312.59 samples/sec Loss 10.5173 LearningRate 0.0008 Epoch: 3 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:39,756-Speed 6307.17 samples/sec Loss 10.5235 LearningRate 0.0008 Epoch: 3 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:43,004-Speed 6309.66 samples/sec Loss 
10.5414 LearningRate 0.0008 Epoch: 3 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:35:46,233-Speed 6342.85 samples/sec Loss 10.5545 LearningRate 0.0008 Epoch: 3 Global Step: 64380 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:49,477-Speed 6315.70 samples/sec Loss 10.4253 LearningRate 0.0008 Epoch: 3 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:52,723-Speed 6310.80 samples/sec Loss 10.5022 LearningRate 0.0008 Epoch: 3 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:55,965-Speed 6317.36 samples/sec Loss 10.5161 LearningRate 0.0008 Epoch: 3 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:35:59,211-Speed 6311.95 samples/sec Loss 10.5029 LearningRate 0.0008 Epoch: 3 Global Step: 64420 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:02,454-Speed 6316.45 samples/sec Loss 10.4685 LearningRate 0.0008 Epoch: 3 Global Step: 64430 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:05,696-Speed 6318.52 samples/sec Loss 10.4138 LearningRate 0.0008 Epoch: 3 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:08,938-Speed 6318.38 samples/sec Loss 10.4673 LearningRate 0.0008 Epoch: 3 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:12,183-Speed 6312.21 samples/sec Loss 10.5088 LearningRate 0.0008 Epoch: 3 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:15,426-Speed 6315.56 samples/sec Loss 10.5213 LearningRate 0.0008 Epoch: 3 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:18,652-Speed 6350.66 samples/sec Loss 10.5156 LearningRate 0.0008 Epoch: 3 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:21,899-Speed 6307.86 samples/sec Loss 10.5171 LearningRate 0.0008 Epoch: 3 
Global Step: 64490 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:25,144-Speed 6313.73 samples/sec Loss 10.4885 LearningRate 0.0008 Epoch: 3 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:28,389-Speed 6312.96 samples/sec Loss 10.4934 LearningRate 0.0008 Epoch: 3 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:31,629-Speed 6320.88 samples/sec Loss 10.5052 LearningRate 0.0008 Epoch: 3 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:34,874-Speed 6312.79 samples/sec Loss 10.4247 LearningRate 0.0008 Epoch: 3 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:38,119-Speed 6316.03 samples/sec Loss 10.4037 LearningRate 0.0008 Epoch: 3 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:41,362-Speed 6315.51 samples/sec Loss 10.4552 LearningRate 0.0008 Epoch: 3 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:44,606-Speed 6314.89 samples/sec Loss 10.4925 LearningRate 0.0008 Epoch: 3 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:47,848-Speed 6319.15 samples/sec Loss 10.4368 LearningRate 0.0008 Epoch: 3 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:51,078-Speed 6341.94 samples/sec Loss 10.4820 LearningRate 0.0008 Epoch: 3 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:54,322-Speed 6315.53 samples/sec Loss 10.5449 LearningRate 0.0008 Epoch: 3 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:36:57,565-Speed 6316.02 samples/sec Loss 10.4758 LearningRate 0.0008 Epoch: 3 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:00,806-Speed 6319.75 samples/sec Loss 10.4333 LearningRate 0.0008 Epoch: 3 Global Step: 64610 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:37:04,049-Speed 6317.35 samples/sec Loss 10.4918 LearningRate 0.0008 Epoch: 3 Global Step: 64620 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:07,295-Speed 6311.72 samples/sec Loss 10.5517 LearningRate 0.0008 Epoch: 3 Global Step: 64630 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:10,535-Speed 6321.08 samples/sec Loss 10.5484 LearningRate 0.0008 Epoch: 3 Global Step: 64640 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:13,780-Speed 6318.22 samples/sec Loss 10.4844 LearningRate 0.0008 Epoch: 3 Global Step: 64650 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:17,021-Speed 6318.95 samples/sec Loss 10.4852 LearningRate 0.0008 Epoch: 3 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:20,265-Speed 6315.72 samples/sec Loss 10.4635 LearningRate 0.0008 Epoch: 3 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:23,506-Speed 6319.03 samples/sec Loss 10.4556 LearningRate 0.0008 Epoch: 3 Global Step: 64680 Fp16 Grad Scale: 131072 Required: 70 hours Training: 2022-03-31 21:37:26,737-Speed 6341.52 samples/sec Loss 10.4554 LearningRate 0.0008 Epoch: 3 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:29,976-Speed 6324.11 samples/sec Loss 10.4059 LearningRate 0.0008 Epoch: 3 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:33,217-Speed 6319.66 samples/sec Loss 10.4866 LearningRate 0.0008 Epoch: 3 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:36,460-Speed 6317.07 samples/sec Loss 10.4860 LearningRate 0.0008 Epoch: 3 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:39,702-Speed 6317.74 samples/sec Loss 10.4509 LearningRate 0.0008 Epoch: 3 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 
21:37:42,952-Speed 6303.87 samples/sec Loss 10.4649 LearningRate 0.0008 Epoch: 3 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:46,196-Speed 6312.97 samples/sec Loss 10.5258 LearningRate 0.0008 Epoch: 3 Global Step: 64750 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:49,435-Speed 6325.45 samples/sec Loss 10.5332 LearningRate 0.0008 Epoch: 3 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:52,678-Speed 6316.69 samples/sec Loss 10.5037 LearningRate 0.0008 Epoch: 3 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:55,929-Speed 6301.03 samples/sec Loss 10.4590 LearningRate 0.0008 Epoch: 3 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:37:59,160-Speed 6340.22 samples/sec Loss 10.5561 LearningRate 0.0008 Epoch: 3 Global Step: 64790 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:02,402-Speed 6317.90 samples/sec Loss 10.5843 LearningRate 0.0008 Epoch: 3 Global Step: 64800 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:05,646-Speed 6316.75 samples/sec Loss 10.4515 LearningRate 0.0008 Epoch: 3 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:08,894-Speed 6305.51 samples/sec Loss 10.3919 LearningRate 0.0008 Epoch: 3 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:12,137-Speed 6317.24 samples/sec Loss 10.4153 LearningRate 0.0008 Epoch: 3 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:15,381-Speed 6314.87 samples/sec Loss 10.3839 LearningRate 0.0008 Epoch: 3 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:18,626-Speed 6311.57 samples/sec Loss 10.5390 LearningRate 0.0008 Epoch: 3 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:21,874-Speed 6307.62 samples/sec Loss 
10.5340 LearningRate 0.0008 Epoch: 3 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:25,118-Speed 6315.23 samples/sec Loss 10.4568 LearningRate 0.0008 Epoch: 3 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:28,360-Speed 6318.71 samples/sec Loss 10.5007 LearningRate 0.0008 Epoch: 3 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:31,589-Speed 6342.66 samples/sec Loss 10.5475 LearningRate 0.0008 Epoch: 3 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:34,833-Speed 6313.75 samples/sec Loss 10.5891 LearningRate 0.0008 Epoch: 3 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:38,072-Speed 6325.06 samples/sec Loss 10.5711 LearningRate 0.0008 Epoch: 3 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:41,317-Speed 6313.81 samples/sec Loss 10.4938 LearningRate 0.0008 Epoch: 3 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:44,560-Speed 6315.46 samples/sec Loss 10.4607 LearningRate 0.0008 Epoch: 3 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:47,805-Speed 6313.71 samples/sec Loss 10.4178 LearningRate 0.0008 Epoch: 3 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:51,047-Speed 6318.06 samples/sec Loss 10.3765 LearningRate 0.0008 Epoch: 3 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:54,295-Speed 6307.29 samples/sec Loss 10.4156 LearningRate 0.0008 Epoch: 3 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:38:57,538-Speed 6315.32 samples/sec Loss 10.4409 LearningRate 0.0008 Epoch: 3 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:00,782-Speed 6315.08 samples/sec Loss 10.4799 LearningRate 0.0008 Epoch: 3 Global 
Step: 64980 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:04,019-Speed 6329.59 samples/sec Loss 10.4573 LearningRate 0.0008 Epoch: 3 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:07,262-Speed 6315.84 samples/sec Loss 10.4149 LearningRate 0.0008 Epoch: 3 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:10,506-Speed 6314.95 samples/sec Loss 10.4165 LearningRate 0.0008 Epoch: 3 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:13,750-Speed 6314.18 samples/sec Loss 10.5248 LearningRate 0.0008 Epoch: 3 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:16,994-Speed 6315.88 samples/sec Loss 10.5162 LearningRate 0.0008 Epoch: 3 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:20,237-Speed 6316.38 samples/sec Loss 10.4477 LearningRate 0.0008 Epoch: 3 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:23,480-Speed 6315.52 samples/sec Loss 10.4735 LearningRate 0.0008 Epoch: 3 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:26,724-Speed 6314.80 samples/sec Loss 10.4312 LearningRate 0.0008 Epoch: 3 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:29,968-Speed 6315.59 samples/sec Loss 10.4274 LearningRate 0.0008 Epoch: 3 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:33,211-Speed 6316.23 samples/sec Loss 10.4755 LearningRate 0.0008 Epoch: 3 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:36,439-Speed 6345.94 samples/sec Loss 10.3861 LearningRate 0.0008 Epoch: 3 Global Step: 65090 Fp16 Grad Scale: 65536 Required: 70 hours Training: 2022-03-31 21:39:39,685-Speed 6310.83 samples/sec Loss 10.5597 LearningRate 0.0008 Epoch: 3 Global Step: 65100 Fp16 Grad Scale: 65536 
Required: 70 hours Training: 2022-03-31 21:39:42,931-Speed 6309.20 samples/sec Loss 10.5110 LearningRate 0.0008 Epoch: 3 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:39:46,177-Speed 6310.95 samples/sec Loss 10.5278 LearningRate 0.0008 Epoch: 3 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:39:49,420-Speed 6316.14 samples/sec Loss 10.5095 LearningRate 0.0008 Epoch: 3 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:39:52,665-Speed 6312.87 samples/sec Loss 10.4038 LearningRate 0.0008 Epoch: 3 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:39:55,908-Speed 6316.31 samples/sec Loss 10.4426 LearningRate 0.0008 Epoch: 3 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:39:59,151-Speed 6317.48 samples/sec Loss 10.4838 LearningRate 0.0008 Epoch: 3 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:02,398-Speed 6308.05 samples/sec Loss 10.4380 LearningRate 0.0008 Epoch: 3 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:05,649-Speed 6301.91 samples/sec Loss 10.5190 LearningRate 0.0008 Epoch: 3 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:08,896-Speed 6309.31 samples/sec Loss 10.4826 LearningRate 0.0008 Epoch: 3 Global Step: 65190 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:40:12,124-Speed 6346.02 samples/sec Loss 10.4367 LearningRate 0.0008 Epoch: 3 Global Step: 65200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:15,377-Speed 6296.60 samples/sec Loss 10.4629 LearningRate 0.0008 Epoch: 3 Global Step: 65210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:18,624-Speed 6309.95 samples/sec Loss 10.5248 LearningRate 0.0008 Epoch: 3 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:40:21,868-Speed 6314.76 samples/sec Loss 10.4994 LearningRate 0.0008 Epoch: 3 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:25,130-Speed 6280.02 samples/sec Loss 10.4191 LearningRate 0.0008 Epoch: 3 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:28,375-Speed 6312.24 samples/sec Loss 10.4808 LearningRate 0.0008 Epoch: 3 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:31,621-Speed 6309.79 samples/sec Loss 10.4838 LearningRate 0.0008 Epoch: 3 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:34,870-Speed 6305.51 samples/sec Loss 10.4843 LearningRate 0.0008 Epoch: 3 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:38,115-Speed 6311.64 samples/sec Loss 10.4565 LearningRate 0.0008 Epoch: 3 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:41,359-Speed 6314.42 samples/sec Loss 10.3967 LearningRate 0.0008 Epoch: 3 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:44,590-Speed 6340.66 samples/sec Loss 10.5473 LearningRate 0.0008 Epoch: 3 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:47,834-Speed 6313.71 samples/sec Loss 10.4407 LearningRate 0.0008 Epoch: 3 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:51,077-Speed 6316.49 samples/sec Loss 10.4828 LearningRate 0.0008 Epoch: 3 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:54,320-Speed 6316.23 samples/sec Loss 10.4783 LearningRate 0.0008 Epoch: 3 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:40:57,573-Speed 6299.04 samples/sec Loss 10.4445 LearningRate 0.0008 Epoch: 3 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:00,815-Speed 6317.39 samples/sec Loss 
10.4664 LearningRate 0.0008 Epoch: 3 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:04,062-Speed 6309.53 samples/sec Loss 10.4097 LearningRate 0.0008 Epoch: 3 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:07,309-Speed 6309.10 samples/sec Loss 10.4352 LearningRate 0.0008 Epoch: 3 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:10,565-Speed 6290.04 samples/sec Loss 10.3523 LearningRate 0.0008 Epoch: 3 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:13,807-Speed 6319.41 samples/sec Loss 10.4457 LearningRate 0.0008 Epoch: 3 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:17,051-Speed 6315.85 samples/sec Loss 10.4235 LearningRate 0.0008 Epoch: 3 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:41:20,281-Speed 6342.29 samples/sec Loss 10.4535 LearningRate 0.0008 Epoch: 3 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:23,523-Speed 6318.74 samples/sec Loss 10.4646 LearningRate 0.0008 Epoch: 3 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:26,768-Speed 6312.42 samples/sec Loss 10.4363 LearningRate 0.0008 Epoch: 3 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:30,010-Speed 6317.81 samples/sec Loss 10.4407 LearningRate 0.0008 Epoch: 3 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:33,252-Speed 6319.04 samples/sec Loss 10.4447 LearningRate 0.0008 Epoch: 3 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:36,497-Speed 6312.58 samples/sec Loss 10.4876 LearningRate 0.0008 Epoch: 3 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:39,740-Speed 6315.68 samples/sec Loss 10.4754 LearningRate 0.0008 Epoch: 3 
Global Step: 65470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:41:42,971-Speed 6341.89 samples/sec Loss 10.4047 LearningRate 0.0008 Epoch: 3 Global Step: 65480 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:41:46,218-Speed 6309.55 samples/sec Loss 10.4522 LearningRate 0.0008 Epoch: 3 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:41:49,463-Speed 6312.66 samples/sec Loss 10.3956 LearningRate 0.0008 Epoch: 3 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:41:52,706-Speed 6316.65 samples/sec Loss 10.4091 LearningRate 0.0008 Epoch: 3 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:41:55,989-Speed 6239.11 samples/sec Loss 10.3836 LearningRate 0.0008 Epoch: 3 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:41:59,238-Speed 6304.73 samples/sec Loss 10.3746 LearningRate 0.0008 Epoch: 3 Global Step: 65530 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:02,497-Speed 6288.99 samples/sec Loss 10.3751 LearningRate 0.0008 Epoch: 3 Global Step: 65540 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:05,740-Speed 6317.27 samples/sec Loss 10.4769 LearningRate 0.0008 Epoch: 3 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:08,990-Speed 6301.92 samples/sec Loss 10.3657 LearningRate 0.0008 Epoch: 3 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:12,235-Speed 6312.68 samples/sec Loss 10.4420 LearningRate 0.0008 Epoch: 3 Global Step: 65570 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:15,481-Speed 6310.41 samples/sec Loss 10.4041 LearningRate 0.0008 Epoch: 3 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:42:18,725-Speed 6314.52 samples/sec Loss 10.3386 LearningRate 0.0008 Epoch: 3 Global Step: 65590 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:42:21,977-Speed 6300.95 samples/sec Loss 10.2558 LearningRate 0.0008 Epoch: 3 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:42:25,221-Speed 6313.05 samples/sec Loss 10.3508 LearningRate 0.0008 Epoch: 3 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:42:28,481-Speed 6284.06 samples/sec Loss 10.4155 LearningRate 0.0008 Epoch: 3 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:42:31,710-Speed 6344.19 samples/sec Loss 10.4268 LearningRate 0.0008 Epoch: 3 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:34,955-Speed 6313.51 samples/sec Loss 10.3868 LearningRate 0.0008 Epoch: 3 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:38,199-Speed 6313.45 samples/sec Loss 10.4899 LearningRate 0.0008 Epoch: 3 Global Step: 65650 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:41,444-Speed 6314.06 samples/sec Loss 10.3382 LearningRate 0.0008 Epoch: 3 Global Step: 65660 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:44,687-Speed 6315.72 samples/sec Loss 10.4194 LearningRate 0.0008 Epoch: 3 Global Step: 65670 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:47,929-Speed 6318.27 samples/sec Loss 10.4527 LearningRate 0.0008 Epoch: 3 Global Step: 65680 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:51,172-Speed 6316.07 samples/sec Loss 10.4992 LearningRate 0.0008 Epoch: 3 Global Step: 65690 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:54,421-Speed 6305.46 samples/sec Loss 10.4313 LearningRate 0.0008 Epoch: 3 Global Step: 65700 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:42:57,666-Speed 6312.10 samples/sec Loss 10.3130 LearningRate 0.0008 Epoch: 3 Global Step: 65710 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 
21:43:00,908-Speed 6319.16 samples/sec Loss 10.4014 LearningRate 0.0008 Epoch: 3 Global Step: 65720 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:43:04,153-Speed 6311.75 samples/sec Loss 10.4355 LearningRate 0.0008 Epoch: 3 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:07,401-Speed 6308.21 samples/sec Loss 10.3808 LearningRate 0.0008 Epoch: 3 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:10,647-Speed 6310.06 samples/sec Loss 10.4120 LearningRate 0.0008 Epoch: 3 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:13,891-Speed 6314.48 samples/sec Loss 10.3539 LearningRate 0.0008 Epoch: 3 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:17,132-Speed 6319.99 samples/sec Loss 10.4409 LearningRate 0.0008 Epoch: 3 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:20,376-Speed 6316.33 samples/sec Loss 10.4476 LearningRate 0.0008 Epoch: 3 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:23,622-Speed 6309.27 samples/sec Loss 10.4486 LearningRate 0.0008 Epoch: 3 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:26,864-Speed 6319.71 samples/sec Loss 10.5094 LearningRate 0.0008 Epoch: 3 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:30,106-Speed 6317.49 samples/sec Loss 10.3955 LearningRate 0.0008 Epoch: 3 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:33,351-Speed 6313.70 samples/sec Loss 10.4018 LearningRate 0.0008 Epoch: 3 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:36,579-Speed 6345.17 samples/sec Loss 10.3986 LearningRate 0.0008 Epoch: 3 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:39,829-Speed 6303.27 samples/sec Loss 
10.3576 LearningRate 0.0008 Epoch: 3 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:43,073-Speed 6314.49 samples/sec Loss 10.3730 LearningRate 0.0008 Epoch: 3 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:46,315-Speed 6318.90 samples/sec Loss 10.4593 LearningRate 0.0008 Epoch: 3 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:49,561-Speed 6311.37 samples/sec Loss 10.4033 LearningRate 0.0008 Epoch: 3 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:52,805-Speed 6313.68 samples/sec Loss 10.3615 LearningRate 0.0008 Epoch: 3 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:56,050-Speed 6312.35 samples/sec Loss 10.5282 LearningRate 0.0008 Epoch: 3 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:43:59,293-Speed 6317.79 samples/sec Loss 10.3694 LearningRate 0.0008 Epoch: 3 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:02,547-Speed 6294.85 samples/sec Loss 10.3179 LearningRate 0.0008 Epoch: 3 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:05,788-Speed 6319.56 samples/sec Loss 10.3536 LearningRate 0.0008 Epoch: 3 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:09,019-Speed 6340.65 samples/sec Loss 10.4310 LearningRate 0.0008 Epoch: 3 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:12,261-Speed 6317.87 samples/sec Loss 10.4778 LearningRate 0.0008 Epoch: 3 Global Step: 65940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:15,508-Speed 6309.54 samples/sec Loss 10.4568 LearningRate 0.0008 Epoch: 3 Global Step: 65950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:18,749-Speed 6319.96 samples/sec Loss 10.3242 LearningRate 0.0008 Epoch: 3 Global 
Step: 65960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:21,994-Speed 6312.42 samples/sec Loss 10.3800 LearningRate 0.0008 Epoch: 3 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:25,236-Speed 6319.42 samples/sec Loss 10.3641 LearningRate 0.0008 Epoch: 3 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:28,478-Speed 6317.33 samples/sec Loss 10.3461 LearningRate 0.0008 Epoch: 3 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:31,724-Speed 6312.13 samples/sec Loss 10.4676 LearningRate 0.0008 Epoch: 3 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:34,969-Speed 6312.98 samples/sec Loss 10.3279 LearningRate 0.0008 Epoch: 3 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:38,224-Speed 6292.45 samples/sec Loss 10.3701 LearningRate 0.0008 Epoch: 3 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:41,456-Speed 6339.74 samples/sec Loss 10.3478 LearningRate 0.0008 Epoch: 3 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:44,699-Speed 6316.66 samples/sec Loss 10.4573 LearningRate 0.0008 Epoch: 3 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:47,945-Speed 6310.47 samples/sec Loss 10.4243 LearningRate 0.0008 Epoch: 3 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:51,186-Speed 6319.53 samples/sec Loss 10.3687 LearningRate 0.0008 Epoch: 3 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:54,432-Speed 6311.02 samples/sec Loss 10.3705 LearningRate 0.0008 Epoch: 3 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:44:57,675-Speed 6316.91 samples/sec Loss 10.3893 LearningRate 0.0008 Epoch: 3 Global Step: 66080 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:45:00,918-Speed 6316.15 samples/sec Loss 10.4668 LearningRate 0.0008 Epoch: 3 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:04,167-Speed 6305.85 samples/sec Loss 10.4120 LearningRate 0.0008 Epoch: 3 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:07,416-Speed 6304.25 samples/sec Loss 10.3575 LearningRate 0.0008 Epoch: 3 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:10,659-Speed 6315.53 samples/sec Loss 10.3739 LearningRate 0.0008 Epoch: 3 Global Step: 66120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:13,889-Speed 6343.46 samples/sec Loss 10.3240 LearningRate 0.0008 Epoch: 3 Global Step: 66130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:17,130-Speed 6320.15 samples/sec Loss 10.4180 LearningRate 0.0008 Epoch: 3 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:20,376-Speed 6311.00 samples/sec Loss 10.3936 LearningRate 0.0008 Epoch: 3 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:23,623-Speed 6308.29 samples/sec Loss 10.3758 LearningRate 0.0008 Epoch: 3 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:26,866-Speed 6315.81 samples/sec Loss 10.4229 LearningRate 0.0008 Epoch: 3 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:30,111-Speed 6314.05 samples/sec Loss 10.3976 LearningRate 0.0008 Epoch: 3 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:33,353-Speed 6318.10 samples/sec Loss 10.4257 LearningRate 0.0008 Epoch: 3 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:36,637-Speed 6238.00 samples/sec Loss 10.3166 LearningRate 0.0008 Epoch: 3 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:45:39,881-Speed 6313.65 samples/sec Loss 10.4467 LearningRate 0.0008 Epoch: 3 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:43,124-Speed 6317.93 samples/sec Loss 10.3931 LearningRate 0.0008 Epoch: 3 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:46,371-Speed 6308.11 samples/sec Loss 10.4548 LearningRate 0.0008 Epoch: 3 Global Step: 66230 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:45:49,602-Speed 6340.88 samples/sec Loss 10.3986 LearningRate 0.0008 Epoch: 3 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:52,846-Speed 6313.89 samples/sec Loss 10.4523 LearningRate 0.0008 Epoch: 3 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:56,091-Speed 6313.10 samples/sec Loss 10.3471 LearningRate 0.0008 Epoch: 3 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:45:59,345-Speed 6295.59 samples/sec Loss 10.4369 LearningRate 0.0008 Epoch: 3 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:02,574-Speed 6343.90 samples/sec Loss 10.3223 LearningRate 0.0008 Epoch: 3 Global Step: 66280 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:05,820-Speed 6309.47 samples/sec Loss 10.4802 LearningRate 0.0008 Epoch: 3 Global Step: 66290 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:09,062-Speed 6318.17 samples/sec Loss 10.4789 LearningRate 0.0008 Epoch: 3 Global Step: 66300 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:12,305-Speed 6317.31 samples/sec Loss 10.3985 LearningRate 0.0008 Epoch: 3 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:15,552-Speed 6309.75 samples/sec Loss 10.3007 LearningRate 0.0008 Epoch: 3 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:18,804-Speed 6297.65 samples/sec 
Loss 10.4222 LearningRate 0.0008 Epoch: 3 Global Step: 66330 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:22,051-Speed 6309.48 samples/sec Loss 10.3074 LearningRate 0.0008 Epoch: 3 Global Step: 66340 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:25,300-Speed 6305.16 samples/sec Loss 10.3706 LearningRate 0.0008 Epoch: 3 Global Step: 66350 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:28,543-Speed 6316.42 samples/sec Loss 10.3549 LearningRate 0.0008 Epoch: 3 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:31,787-Speed 6314.72 samples/sec Loss 10.3856 LearningRate 0.0008 Epoch: 3 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:46:35,031-Speed 6314.82 samples/sec Loss 10.3572 LearningRate 0.0008 Epoch: 3 Global Step: 66380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:38,273-Speed 6317.43 samples/sec Loss 10.3568 LearningRate 0.0008 Epoch: 3 Global Step: 66390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:41,513-Speed 6322.78 samples/sec Loss 10.3305 LearningRate 0.0008 Epoch: 3 Global Step: 66400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:44,761-Speed 6307.74 samples/sec Loss 10.3835 LearningRate 0.0008 Epoch: 3 Global Step: 66410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:48,003-Speed 6319.55 samples/sec Loss 10.3850 LearningRate 0.0008 Epoch: 3 Global Step: 66420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:51,364-Speed 6094.76 samples/sec Loss 10.3220 LearningRate 0.0008 Epoch: 3 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:54,621-Speed 6288.60 samples/sec Loss 10.3263 LearningRate 0.0008 Epoch: 3 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:46:57,873-Speed 6299.46 samples/sec Loss 10.3761 LearningRate 0.0008 Epoch: 3 
Global Step: 66450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:01,114-Speed 6319.96 samples/sec Loss 10.3056 LearningRate 0.0008 Epoch: 3 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:04,368-Speed 6294.43 samples/sec Loss 10.3070 LearningRate 0.0008 Epoch: 3 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:07,611-Speed 6316.80 samples/sec Loss 10.2907 LearningRate 0.0008 Epoch: 3 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:47:10,839-Speed 6346.13 samples/sec Loss 10.2776 LearningRate 0.0008 Epoch: 3 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:14,088-Speed 6305.50 samples/sec Loss 10.3434 LearningRate 0.0008 Epoch: 3 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:17,331-Speed 6315.53 samples/sec Loss 10.4585 LearningRate 0.0008 Epoch: 3 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:20,577-Speed 6311.31 samples/sec Loss 10.2133 LearningRate 0.0008 Epoch: 3 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:23,825-Speed 6306.68 samples/sec Loss 10.4119 LearningRate 0.0008 Epoch: 3 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:27,086-Speed 6282.86 samples/sec Loss 10.3663 LearningRate 0.0008 Epoch: 3 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:30,330-Speed 6314.31 samples/sec Loss 10.2961 LearningRate 0.0008 Epoch: 3 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:33,574-Speed 6315.10 samples/sec Loss 10.3283 LearningRate 0.0008 Epoch: 3 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:36,813-Speed 6323.21 samples/sec Loss 10.3498 LearningRate 0.0008 Epoch: 3 Global Step: 66570 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:47:40,060-Speed 6308.35 samples/sec Loss 10.4041 LearningRate 0.0008 Epoch: 3 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:43,293-Speed 6336.00 samples/sec Loss 10.3885 LearningRate 0.0008 Epoch: 3 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:46,539-Speed 6311.28 samples/sec Loss 10.3901 LearningRate 0.0008 Epoch: 3 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:49,786-Speed 6308.44 samples/sec Loss 10.2507 LearningRate 0.0008 Epoch: 3 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:53,087-Speed 6205.83 samples/sec Loss 10.3966 LearningRate 0.0008 Epoch: 3 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:56,342-Speed 6294.08 samples/sec Loss 10.3885 LearningRate 0.0008 Epoch: 3 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:47:59,589-Speed 6308.42 samples/sec Loss 10.3432 LearningRate 0.0008 Epoch: 3 Global Step: 66640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:02,820-Speed 6340.13 samples/sec Loss 10.4313 LearningRate 0.0008 Epoch: 3 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:06,067-Speed 6308.34 samples/sec Loss 10.3075 LearningRate 0.0008 Epoch: 3 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:09,310-Speed 6318.09 samples/sec Loss 10.3140 LearningRate 0.0008 Epoch: 3 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:12,561-Speed 6300.43 samples/sec Loss 10.3551 LearningRate 0.0008 Epoch: 3 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:15,802-Speed 6320.27 samples/sec Loss 10.2295 LearningRate 0.0008 Epoch: 3 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 
21:48:19,046-Speed 6315.03 samples/sec Loss 10.3373 LearningRate 0.0008 Epoch: 3 Global Step: 66700 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:22,290-Speed 6314.67 samples/sec Loss 10.3149 LearningRate 0.0008 Epoch: 3 Global Step: 66710 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:25,533-Speed 6315.57 samples/sec Loss 10.2429 LearningRate 0.0008 Epoch: 3 Global Step: 66720 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:28,779-Speed 6311.69 samples/sec Loss 10.3323 LearningRate 0.0008 Epoch: 3 Global Step: 66730 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:32,026-Speed 6309.01 samples/sec Loss 10.3064 LearningRate 0.0008 Epoch: 3 Global Step: 66740 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:48:35,269-Speed 6315.64 samples/sec Loss 10.3045 LearningRate 0.0008 Epoch: 3 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:38,514-Speed 6312.34 samples/sec Loss 10.3359 LearningRate 0.0008 Epoch: 3 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:41,759-Speed 6312.48 samples/sec Loss 10.3603 LearningRate 0.0008 Epoch: 3 Global Step: 66770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:45,007-Speed 6307.94 samples/sec Loss 10.3406 LearningRate 0.0008 Epoch: 3 Global Step: 66780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:48,250-Speed 6315.46 samples/sec Loss 10.3961 LearningRate 0.0008 Epoch: 3 Global Step: 66790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:51,493-Speed 6316.51 samples/sec Loss 10.2938 LearningRate 0.0008 Epoch: 3 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:54,734-Speed 6320.48 samples/sec Loss 10.3412 LearningRate 0.0008 Epoch: 3 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:48:57,990-Speed 6292.68 samples/sec Loss 
10.3591 LearningRate 0.0008 Epoch: 3 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:01,235-Speed 6312.04 samples/sec Loss 10.4302 LearningRate 0.0008 Epoch: 3 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:04,481-Speed 6311.12 samples/sec Loss 10.3173 LearningRate 0.0008 Epoch: 3 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:07,715-Speed 6336.15 samples/sec Loss 10.3460 LearningRate 0.0008 Epoch: 3 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:10,956-Speed 6320.04 samples/sec Loss 10.3006 LearningRate 0.0008 Epoch: 3 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:14,202-Speed 6312.31 samples/sec Loss 10.3675 LearningRate 0.0008 Epoch: 3 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:17,445-Speed 6316.23 samples/sec Loss 10.3784 LearningRate 0.0008 Epoch: 3 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:20,689-Speed 6313.90 samples/sec Loss 10.3220 LearningRate 0.0008 Epoch: 3 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:23,933-Speed 6315.88 samples/sec Loss 10.4032 LearningRate 0.0008 Epoch: 3 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:27,182-Speed 6303.63 samples/sec Loss 10.4430 LearningRate 0.0008 Epoch: 3 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:30,424-Speed 6317.91 samples/sec Loss 10.4040 LearningRate 0.0008 Epoch: 3 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:33,666-Speed 6318.08 samples/sec Loss 10.2868 LearningRate 0.0008 Epoch: 3 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:36,912-Speed 6311.99 samples/sec Loss 10.3520 LearningRate 0.0008 Epoch: 3 Global 
Step: 66940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:40,146-Speed 6334.35 samples/sec Loss 10.4092 LearningRate 0.0008 Epoch: 3 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:43,390-Speed 6313.70 samples/sec Loss 10.2878 LearningRate 0.0008 Epoch: 3 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:46,681-Speed 6224.19 samples/sec Loss 10.3271 LearningRate 0.0008 Epoch: 3 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:49,924-Speed 6317.54 samples/sec Loss 10.3401 LearningRate 0.0008 Epoch: 3 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:53,169-Speed 6312.58 samples/sec Loss 10.3115 LearningRate 0.0008 Epoch: 3 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:56,414-Speed 6312.32 samples/sec Loss 10.3101 LearningRate 0.0008 Epoch: 3 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:49:59,660-Speed 6310.31 samples/sec Loss 10.2892 LearningRate 0.0008 Epoch: 3 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:02,903-Speed 6316.93 samples/sec Loss 10.2725 LearningRate 0.0008 Epoch: 3 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:06,146-Speed 6316.82 samples/sec Loss 10.2957 LearningRate 0.0008 Epoch: 3 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:09,388-Speed 6319.39 samples/sec Loss 10.3409 LearningRate 0.0008 Epoch: 3 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:12,639-Speed 6300.54 samples/sec Loss 10.3251 LearningRate 0.0008 Epoch: 3 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:15,956-Speed 6175.10 samples/sec Loss 10.3392 LearningRate 0.0008 Epoch: 3 Global Step: 67060 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:50:19,199-Speed 6317.14 samples/sec Loss 10.4115 LearningRate 0.0008 Epoch: 3 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:22,451-Speed 6298.92 samples/sec Loss 10.3882 LearningRate 0.0008 Epoch: 3 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:25,697-Speed 6311.60 samples/sec Loss 10.4301 LearningRate 0.0008 Epoch: 3 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:28,939-Speed 6317.06 samples/sec Loss 10.3760 LearningRate 0.0008 Epoch: 3 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:32,183-Speed 6314.23 samples/sec Loss 10.3953 LearningRate 0.0008 Epoch: 3 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:35,431-Speed 6308.49 samples/sec Loss 10.3010 LearningRate 0.0008 Epoch: 3 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:38,676-Speed 6311.37 samples/sec Loss 10.3141 LearningRate 0.0008 Epoch: 3 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:41,922-Speed 6311.96 samples/sec Loss 10.3640 LearningRate 0.0008 Epoch: 3 Global Step: 67140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:45,166-Speed 6313.81 samples/sec Loss 10.2220 LearningRate 0.0008 Epoch: 3 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:50:48,395-Speed 6343.61 samples/sec Loss 10.3985 LearningRate 0.0008 Epoch: 3 Global Step: 67160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:51,639-Speed 6314.15 samples/sec Loss 10.3878 LearningRate 0.0008 Epoch: 3 Global Step: 67170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:50:54,891-Speed 6300.73 samples/sec Loss 10.2599 LearningRate 0.0008 Epoch: 3 Global Step: 67180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:50:58,137-Speed 6309.41 samples/sec Loss 10.3267 LearningRate 0.0008 Epoch: 3 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:01,380-Speed 6316.91 samples/sec Loss 10.2841 LearningRate 0.0008 Epoch: 3 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:04,625-Speed 6312.34 samples/sec Loss 10.3643 LearningRate 0.0008 Epoch: 3 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:07,872-Speed 6310.33 samples/sec Loss 10.3203 LearningRate 0.0008 Epoch: 3 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:11,117-Speed 6311.41 samples/sec Loss 10.3085 LearningRate 0.0008 Epoch: 3 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:14,360-Speed 6315.61 samples/sec Loss 10.2911 LearningRate 0.0008 Epoch: 3 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:17,605-Speed 6314.78 samples/sec Loss 10.4773 LearningRate 0.0008 Epoch: 3 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:20,836-Speed 6340.87 samples/sec Loss 10.2844 LearningRate 0.0008 Epoch: 3 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:24,078-Speed 6316.78 samples/sec Loss 10.3025 LearningRate 0.0008 Epoch: 3 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:27,324-Speed 6311.27 samples/sec Loss 10.2718 LearningRate 0.0008 Epoch: 3 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:30,569-Speed 6313.45 samples/sec Loss 10.3310 LearningRate 0.0008 Epoch: 3 Global Step: 67290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:33,811-Speed 6317.83 samples/sec Loss 10.3334 LearningRate 0.0008 Epoch: 3 Global Step: 67300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:37,058-Speed 6310.72 samples/sec Loss 
10.3168 LearningRate 0.0008 Epoch: 3 Global Step: 67310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:40,303-Speed 6312.93 samples/sec Loss 10.2650 LearningRate 0.0008 Epoch: 3 Global Step: 67320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:43,549-Speed 6310.01 samples/sec Loss 10.3148 LearningRate 0.0008 Epoch: 3 Global Step: 67330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:46,795-Speed 6310.46 samples/sec Loss 10.4087 LearningRate 0.0008 Epoch: 3 Global Step: 67340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:50,039-Speed 6313.65 samples/sec Loss 10.3050 LearningRate 0.0008 Epoch: 3 Global Step: 67350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:53,277-Speed 6329.08 samples/sec Loss 10.3728 LearningRate 0.0008 Epoch: 3 Global Step: 67360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:56,520-Speed 6316.23 samples/sec Loss 10.2654 LearningRate 0.0008 Epoch: 3 Global Step: 67370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:51:59,826-Speed 6195.79 samples/sec Loss 10.2737 LearningRate 0.0008 Epoch: 3 Global Step: 67380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:03,073-Speed 6308.37 samples/sec Loss 10.3070 LearningRate 0.0008 Epoch: 3 Global Step: 67390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:06,328-Speed 6294.26 samples/sec Loss 10.4062 LearningRate 0.0008 Epoch: 3 Global Step: 67400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:09,574-Speed 6310.02 samples/sec Loss 10.2831 LearningRate 0.0008 Epoch: 3 Global Step: 67410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:12,820-Speed 6310.01 samples/sec Loss 10.3002 LearningRate 0.0008 Epoch: 3 Global Step: 67420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:16,070-Speed 6303.68 samples/sec Loss 10.3683 LearningRate 0.0008 Epoch: 3 Global 
Step: 67430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:19,314-Speed 6315.73 samples/sec Loss 10.2769 LearningRate 0.0008 Epoch: 3 Global Step: 67440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:22,560-Speed 6310.07 samples/sec Loss 10.3337 LearningRate 0.0008 Epoch: 3 Global Step: 67450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:25,790-Speed 6342.11 samples/sec Loss 10.2943 LearningRate 0.0008 Epoch: 3 Global Step: 67460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:29,034-Speed 6314.18 samples/sec Loss 10.2761 LearningRate 0.0008 Epoch: 3 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:32,279-Speed 6313.09 samples/sec Loss 10.2701 LearningRate 0.0008 Epoch: 3 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:35,523-Speed 6315.11 samples/sec Loss 10.2891 LearningRate 0.0008 Epoch: 3 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:38,766-Speed 6317.52 samples/sec Loss 10.2734 LearningRate 0.0008 Epoch: 3 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:42,010-Speed 6313.45 samples/sec Loss 10.3441 LearningRate 0.0008 Epoch: 3 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:45,259-Speed 6305.52 samples/sec Loss 10.3002 LearningRate 0.0008 Epoch: 3 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:48,502-Speed 6315.65 samples/sec Loss 10.2877 LearningRate 0.0008 Epoch: 3 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:51,746-Speed 6314.98 samples/sec Loss 10.3823 LearningRate 0.0008 Epoch: 3 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:52:54,992-Speed 6311.53 samples/sec Loss 10.3323 LearningRate 0.0008 Epoch: 3 Global Step: 67550 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:52:58,236-Speed 6312.92 samples/sec Loss 10.2747 LearningRate 0.0008 Epoch: 3 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:53:01,468-Speed 6338.61 samples/sec Loss 10.2760 LearningRate 0.0008 Epoch: 3 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:04,709-Speed 6319.68 samples/sec Loss 10.3177 LearningRate 0.0008 Epoch: 3 Global Step: 67580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:07,957-Speed 6306.93 samples/sec Loss 10.2690 LearningRate 0.0008 Epoch: 3 Global Step: 67590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:11,206-Speed 6306.25 samples/sec Loss 10.2912 LearningRate 0.0008 Epoch: 3 Global Step: 67600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:14,448-Speed 6318.55 samples/sec Loss 10.2722 LearningRate 0.0008 Epoch: 3 Global Step: 67610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:17,693-Speed 6311.15 samples/sec Loss 10.3076 LearningRate 0.0008 Epoch: 3 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:20,940-Speed 6309.66 samples/sec Loss 10.3511 LearningRate 0.0008 Epoch: 3 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:24,189-Speed 6304.00 samples/sec Loss 10.2640 LearningRate 0.0008 Epoch: 3 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:27,435-Speed 6310.85 samples/sec Loss 10.2793 LearningRate 0.0008 Epoch: 3 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:30,679-Speed 6314.77 samples/sec Loss 10.3026 LearningRate 0.0008 Epoch: 3 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:33,911-Speed 6339.41 samples/sec Loss 10.3222 LearningRate 0.0008 Epoch: 3 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:53:37,155-Speed 6314.64 samples/sec Loss 10.3192 LearningRate 0.0008 Epoch: 3 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:40,401-Speed 6311.56 samples/sec Loss 10.2817 LearningRate 0.0008 Epoch: 3 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:53:43,629-Speed 6344.79 samples/sec Loss 10.2161 LearningRate 0.0008 Epoch: 3 Global Step: 67700 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:53:46,875-Speed 6310.81 samples/sec Loss 10.2917 LearningRate 0.0008 Epoch: 3 Global Step: 67710 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:53:50,118-Speed 6316.40 samples/sec Loss 10.3282 LearningRate 0.0008 Epoch: 3 Global Step: 67720 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:53:53,358-Speed 6323.35 samples/sec Loss 10.2740 LearningRate 0.0008 Epoch: 3 Global Step: 67730 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:53:56,601-Speed 6315.84 samples/sec Loss 10.2603 LearningRate 0.0008 Epoch: 3 Global Step: 67740 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:53:59,848-Speed 6309.58 samples/sec Loss 10.3100 LearningRate 0.0008 Epoch: 3 Global Step: 67750 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:54:03,094-Speed 6309.81 samples/sec Loss 10.2128 LearningRate 0.0008 Epoch: 3 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:54:06,339-Speed 6312.34 samples/sec Loss 10.3271 LearningRate 0.0008 Epoch: 3 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:54:09,585-Speed 6311.70 samples/sec Loss 10.3048 LearningRate 0.0008 Epoch: 3 Global Step: 67780 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:54:12,831-Speed 6310.57 samples/sec Loss 10.3957 LearningRate 0.0008 Epoch: 3 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 21:54:16,075-Speed 6313.29 samples/sec Loss 
10.2894 LearningRate 0.0008 Epoch: 3 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:19,320-Speed 6314.29 samples/sec Loss 10.3590 LearningRate 0.0008 Epoch: 3 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:22,567-Speed 6308.69 samples/sec Loss 10.3299 LearningRate 0.0008 Epoch: 3 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:25,813-Speed 6310.96 samples/sec Loss 10.3051 LearningRate 0.0008 Epoch: 3 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:29,057-Speed 6314.88 samples/sec Loss 10.2294 LearningRate 0.0008 Epoch: 3 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:32,299-Speed 6316.96 samples/sec Loss 10.3113 LearningRate 0.0008 Epoch: 3 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:35,549-Speed 6304.95 samples/sec Loss 10.2834 LearningRate 0.0008 Epoch: 3 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:38,792-Speed 6315.84 samples/sec Loss 10.2720 LearningRate 0.0008 Epoch: 3 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:42,039-Speed 6309.90 samples/sec Loss 10.2795 LearningRate 0.0008 Epoch: 3 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:45,283-Speed 6312.72 samples/sec Loss 10.3029 LearningRate 0.0008 Epoch: 3 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:48,514-Speed 6341.63 samples/sec Loss 10.3245 LearningRate 0.0008 Epoch: 3 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:51,759-Speed 6312.12 samples/sec Loss 10.2747 LearningRate 0.0008 Epoch: 3 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:55,005-Speed 6311.68 samples/sec Loss 10.2703 LearningRate 0.0008 Epoch: 3 Global 
Step: 67920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:54:58,253-Speed 6305.43 samples/sec Loss 10.3383 LearningRate 0.0008 Epoch: 3 Global Step: 67930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:01,495-Speed 6318.68 samples/sec Loss 10.3307 LearningRate 0.0008 Epoch: 3 Global Step: 67940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:04,742-Speed 6308.30 samples/sec Loss 10.3797 LearningRate 0.0008 Epoch: 3 Global Step: 67950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:07,997-Speed 6293.74 samples/sec Loss 10.2544 LearningRate 0.0008 Epoch: 3 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:11,256-Speed 6285.32 samples/sec Loss 10.2172 LearningRate 0.0008 Epoch: 3 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:14,503-Speed 6309.63 samples/sec Loss 10.1488 LearningRate 0.0008 Epoch: 3 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:17,768-Speed 6273.62 samples/sec Loss 10.2540 LearningRate 0.0008 Epoch: 3 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:21,005-Speed 6328.67 samples/sec Loss 10.2572 LearningRate 0.0008 Epoch: 3 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:24,253-Speed 6306.52 samples/sec Loss 10.1803 LearningRate 0.0008 Epoch: 3 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:27,499-Speed 6310.19 samples/sec Loss 10.2363 LearningRate 0.0008 Epoch: 3 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:30,746-Speed 6308.47 samples/sec Loss 10.3424 LearningRate 0.0008 Epoch: 3 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:33,992-Speed 6310.47 samples/sec Loss 10.3217 LearningRate 0.0008 Epoch: 3 Global Step: 68040 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:55:37,235-Speed 6317.62 samples/sec Loss 10.2932 LearningRate 0.0008 Epoch: 3 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:40,482-Speed 6308.01 samples/sec Loss 10.3773 LearningRate 0.0008 Epoch: 3 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:43,729-Speed 6309.58 samples/sec Loss 10.2691 LearningRate 0.0008 Epoch: 3 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:46,971-Speed 6317.71 samples/sec Loss 10.1248 LearningRate 0.0008 Epoch: 3 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:50,219-Speed 6308.21 samples/sec Loss 10.2937 LearningRate 0.0008 Epoch: 3 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:53,451-Speed 6338.43 samples/sec Loss 10.2482 LearningRate 0.0008 Epoch: 3 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:56,695-Speed 6313.66 samples/sec Loss 10.2204 LearningRate 0.0008 Epoch: 3 Global Step: 68110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:55:59,938-Speed 6316.48 samples/sec Loss 10.2540 LearningRate 0.0008 Epoch: 3 Global Step: 68120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:03,183-Speed 6312.31 samples/sec Loss 10.2477 LearningRate 0.0008 Epoch: 3 Global Step: 68130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:06,429-Speed 6311.52 samples/sec Loss 10.2564 LearningRate 0.0008 Epoch: 3 Global Step: 68140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:09,678-Speed 6304.35 samples/sec Loss 10.3054 LearningRate 0.0008 Epoch: 3 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:12,924-Speed 6312.09 samples/sec Loss 10.3146 LearningRate 0.0008 Epoch: 3 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:56:16,170-Speed 6309.90 samples/sec Loss 10.3114 LearningRate 0.0008 Epoch: 3 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:19,413-Speed 6316.00 samples/sec Loss 10.3208 LearningRate 0.0008 Epoch: 3 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:22,661-Speed 6307.41 samples/sec Loss 10.2358 LearningRate 0.0008 Epoch: 3 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:25,893-Speed 6337.56 samples/sec Loss 10.1805 LearningRate 0.0008 Epoch: 3 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:29,141-Speed 6306.37 samples/sec Loss 10.3258 LearningRate 0.0008 Epoch: 3 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:32,383-Speed 6318.65 samples/sec Loss 10.3559 LearningRate 0.0008 Epoch: 3 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:35,624-Speed 6321.82 samples/sec Loss 10.2740 LearningRate 0.0008 Epoch: 3 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:38,867-Speed 6315.77 samples/sec Loss 10.3508 LearningRate 0.0008 Epoch: 3 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:42,112-Speed 6312.18 samples/sec Loss 10.2668 LearningRate 0.0008 Epoch: 3 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:45,360-Speed 6306.24 samples/sec Loss 10.3326 LearningRate 0.0008 Epoch: 3 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:48,603-Speed 6316.48 samples/sec Loss 10.3367 LearningRate 0.0008 Epoch: 3 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:51,848-Speed 6314.26 samples/sec Loss 10.2263 LearningRate 0.0008 Epoch: 3 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:55,093-Speed 6311.65 samples/sec Loss 
10.3285 LearningRate 0.0008 Epoch: 3 Global Step: 68290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:56:58,339-Speed 6311.11 samples/sec Loss 10.2646 LearningRate 0.0008 Epoch: 3 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 21:57:01,573-Speed 6334.02 samples/sec Loss 10.2087 LearningRate 0.0008 Epoch: 3 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:04,817-Speed 6314.45 samples/sec Loss 10.2782 LearningRate 0.0008 Epoch: 3 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:08,063-Speed 6311.03 samples/sec Loss 10.2644 LearningRate 0.0008 Epoch: 3 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:11,309-Speed 6310.25 samples/sec Loss 10.2469 LearningRate 0.0008 Epoch: 3 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:14,555-Speed 6311.10 samples/sec Loss 10.1910 LearningRate 0.0008 Epoch: 3 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:17,801-Speed 6311.69 samples/sec Loss 10.2257 LearningRate 0.0008 Epoch: 3 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:21,042-Speed 6318.99 samples/sec Loss 10.1488 LearningRate 0.0008 Epoch: 3 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:24,301-Speed 6286.52 samples/sec Loss 10.1523 LearningRate 0.0008 Epoch: 3 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:27,551-Speed 6302.85 samples/sec Loss 10.2253 LearningRate 0.0008 Epoch: 3 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:30,792-Speed 6320.09 samples/sec Loss 10.3218 LearningRate 0.0008 Epoch: 3 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:34,025-Speed 6337.36 samples/sec Loss 10.2597 LearningRate 0.0008 Epoch: 3 
Global Step: 68410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:37,270-Speed 6311.87 samples/sec Loss 10.1558 LearningRate 0.0008 Epoch: 3 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:40,522-Speed 6299.29 samples/sec Loss 10.2707 LearningRate 0.0008 Epoch: 3 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:43,767-Speed 6311.71 samples/sec Loss 10.2732 LearningRate 0.0008 Epoch: 3 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:47,012-Speed 6313.85 samples/sec Loss 10.1355 LearningRate 0.0008 Epoch: 3 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:50,258-Speed 6308.94 samples/sec Loss 10.2876 LearningRate 0.0008 Epoch: 3 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:53,500-Speed 6320.20 samples/sec Loss 10.2187 LearningRate 0.0008 Epoch: 3 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:56,745-Speed 6313.66 samples/sec Loss 10.2369 LearningRate 0.0008 Epoch: 3 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:57:59,989-Speed 6313.73 samples/sec Loss 10.2142 LearningRate 0.0008 Epoch: 3 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:03,233-Speed 6315.43 samples/sec Loss 10.2296 LearningRate 0.0008 Epoch: 3 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:06,465-Speed 6338.31 samples/sec Loss 10.2679 LearningRate 0.0008 Epoch: 3 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:09,709-Speed 6313.65 samples/sec Loss 10.1079 LearningRate 0.0008 Epoch: 3 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:13,011-Speed 6204.36 samples/sec Loss 10.2750 LearningRate 0.0008 Epoch: 3 Global Step: 68530 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 21:58:16,257-Speed 6309.16 samples/sec Loss 10.2155 LearningRate 0.0008 Epoch: 3 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:19,499-Speed 6318.72 samples/sec Loss 10.3177 LearningRate 0.0008 Epoch: 3 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:22,745-Speed 6310.80 samples/sec Loss 10.2717 LearningRate 0.0008 Epoch: 3 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:25,989-Speed 6315.59 samples/sec Loss 10.2440 LearningRate 0.0008 Epoch: 3 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:29,232-Speed 6316.53 samples/sec Loss 10.2728 LearningRate 0.0008 Epoch: 3 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:32,481-Speed 6303.40 samples/sec Loss 10.2118 LearningRate 0.0008 Epoch: 3 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:35,724-Speed 6316.94 samples/sec Loss 10.1863 LearningRate 0.0008 Epoch: 3 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:38,957-Speed 6335.69 samples/sec Loss 10.2726 LearningRate 0.0008 Epoch: 3 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:42,208-Speed 6302.65 samples/sec Loss 10.2270 LearningRate 0.0008 Epoch: 3 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:45,451-Speed 6314.68 samples/sec Loss 10.1615 LearningRate 0.0008 Epoch: 3 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:48,705-Speed 6295.15 samples/sec Loss 10.1316 LearningRate 0.0008 Epoch: 3 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:51,949-Speed 6314.85 samples/sec Loss 10.2077 LearningRate 0.0008 Epoch: 3 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
21:58:55,200-Speed 6301.27 samples/sec Loss 10.2549 LearningRate 0.0008 Epoch: 3 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:58:58,443-Speed 6316.38 samples/sec Loss 10.2019 LearningRate 0.0008 Epoch: 3 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:01,693-Speed 6304.68 samples/sec Loss 10.1827 LearningRate 0.0008 Epoch: 3 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:04,935-Speed 6318.23 samples/sec Loss 10.2425 LearningRate 0.0008 Epoch: 3 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:08,182-Speed 6307.48 samples/sec Loss 10.3027 LearningRate 0.0008 Epoch: 3 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:11,413-Speed 6339.92 samples/sec Loss 10.3951 LearningRate 0.0008 Epoch: 3 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:14,656-Speed 6317.43 samples/sec Loss 10.2266 LearningRate 0.0008 Epoch: 3 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:17,899-Speed 6316.11 samples/sec Loss 10.2524 LearningRate 0.0008 Epoch: 3 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:21,144-Speed 6312.24 samples/sec Loss 10.2103 LearningRate 0.0008 Epoch: 3 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:24,389-Speed 6313.70 samples/sec Loss 10.2584 LearningRate 0.0008 Epoch: 3 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:27,630-Speed 6320.80 samples/sec Loss 10.1987 LearningRate 0.0008 Epoch: 3 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:30,875-Speed 6313.02 samples/sec Loss 10.2404 LearningRate 0.0008 Epoch: 3 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:34,119-Speed 6314.60 samples/sec Loss 
10.2533 LearningRate 0.0008 Epoch: 3 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:37,362-Speed 6316.20 samples/sec Loss 10.2549 LearningRate 0.0008 Epoch: 3 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:40,612-Speed 6302.93 samples/sec Loss 10.2742 LearningRate 0.0008 Epoch: 3 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:43,842-Speed 6341.23 samples/sec Loss 10.2120 LearningRate 0.0008 Epoch: 3 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:47,094-Speed 6298.39 samples/sec Loss 10.2257 LearningRate 0.0008 Epoch: 3 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:50,340-Speed 6311.58 samples/sec Loss 10.2371 LearningRate 0.0008 Epoch: 3 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:53,593-Speed 6296.87 samples/sec Loss 10.1858 LearningRate 0.0008 Epoch: 3 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 21:59:56,842-Speed 6304.87 samples/sec Loss 10.2074 LearningRate 0.0008 Epoch: 3 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:00,083-Speed 6320.47 samples/sec Loss 10.0828 LearningRate 0.0008 Epoch: 3 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:03,328-Speed 6312.28 samples/sec Loss 10.1720 LearningRate 0.0008 Epoch: 3 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:06,574-Speed 6310.73 samples/sec Loss 10.2810 LearningRate 0.0008 Epoch: 3 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:09,823-Speed 6306.36 samples/sec Loss 10.1394 LearningRate 0.0008 Epoch: 3 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:13,077-Speed 6293.99 samples/sec Loss 10.2337 LearningRate 0.0008 Epoch: 3 Global 
Step: 68900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:16,325-Speed 6308.18 samples/sec Loss 10.1453 LearningRate 0.0008 Epoch: 3 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:00:19,557-Speed 6337.62 samples/sec Loss 10.1595 LearningRate 0.0008 Epoch: 3 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:22,802-Speed 6312.90 samples/sec Loss 10.2139 LearningRate 0.0008 Epoch: 3 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:26,046-Speed 6313.61 samples/sec Loss 10.2239 LearningRate 0.0008 Epoch: 3 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:29,328-Speed 6243.05 samples/sec Loss 10.2853 LearningRate 0.0008 Epoch: 3 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:32,576-Speed 6305.21 samples/sec Loss 10.2504 LearningRate 0.0008 Epoch: 3 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:35,824-Speed 6308.28 samples/sec Loss 10.1176 LearningRate 0.0008 Epoch: 3 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:39,067-Speed 6315.30 samples/sec Loss 10.1961 LearningRate 0.0008 Epoch: 3 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:42,316-Speed 6306.31 samples/sec Loss 10.2109 LearningRate 0.0008 Epoch: 3 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:45,562-Speed 6309.59 samples/sec Loss 10.1636 LearningRate 0.0008 Epoch: 3 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:48,807-Speed 6312.57 samples/sec Loss 10.2166 LearningRate 0.0008 Epoch: 3 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:52,038-Speed 6339.27 samples/sec Loss 10.2665 LearningRate 0.0008 Epoch: 3 Global Step: 69020 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 22:00:55,283-Speed 6312.94 samples/sec Loss 10.1917 LearningRate 0.0008 Epoch: 3 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:00:58,529-Speed 6311.03 samples/sec Loss 10.2809 LearningRate 0.0008 Epoch: 3 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:01,773-Speed 6313.75 samples/sec Loss 10.2154 LearningRate 0.0008 Epoch: 3 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:05,021-Speed 6307.08 samples/sec Loss 10.1750 LearningRate 0.0008 Epoch: 3 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:08,272-Speed 6301.79 samples/sec Loss 10.1759 LearningRate 0.0008 Epoch: 3 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:11,513-Speed 6320.13 samples/sec Loss 10.1934 LearningRate 0.0008 Epoch: 3 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:14,757-Speed 6314.62 samples/sec Loss 10.1719 LearningRate 0.0008 Epoch: 3 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:18,010-Speed 6297.78 samples/sec Loss 10.2425 LearningRate 0.0008 Epoch: 3 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:21,258-Speed 6307.03 samples/sec Loss 10.0669 LearningRate 0.0008 Epoch: 3 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:24,492-Speed 6334.01 samples/sec Loss 10.2348 LearningRate 0.0008 Epoch: 3 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:27,738-Speed 6311.68 samples/sec Loss 10.2480 LearningRate 0.0008 Epoch: 3 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:30,986-Speed 6306.73 samples/sec Loss 10.1053 LearningRate 0.0008 Epoch: 3 Global Step: 69140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:01:34,230-Speed 6315.18 samples/sec Loss 10.2182 LearningRate 0.0008 Epoch: 3 Global Step: 69150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:37,476-Speed 6309.49 samples/sec Loss 10.3412 LearningRate 0.0008 Epoch: 3 Global Step: 69160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:40,718-Speed 6319.58 samples/sec Loss 10.1956 LearningRate 0.0008 Epoch: 3 Global Step: 69170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:43,965-Speed 6308.42 samples/sec Loss 10.2573 LearningRate 0.0008 Epoch: 3 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:47,208-Speed 6316.02 samples/sec Loss 10.1566 LearningRate 0.0008 Epoch: 3 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:50,459-Speed 6301.33 samples/sec Loss 10.1124 LearningRate 0.0008 Epoch: 3 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:53,701-Speed 6318.99 samples/sec Loss 10.1158 LearningRate 0.0008 Epoch: 3 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:01:56,926-Speed 6351.28 samples/sec Loss 10.1915 LearningRate 0.0008 Epoch: 3 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:00,172-Speed 6310.48 samples/sec Loss 10.2559 LearningRate 0.0008 Epoch: 3 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:03,421-Speed 6305.65 samples/sec Loss 10.2215 LearningRate 0.0008 Epoch: 3 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:06,670-Speed 6304.35 samples/sec Loss 10.2124 LearningRate 0.0008 Epoch: 3 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:09,917-Speed 6309.50 samples/sec Loss 10.1144 LearningRate 0.0008 Epoch: 3 Global Step: 69260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:13,165-Speed 6305.70 samples/sec Loss 
10.1970 LearningRate 0.0008 Epoch: 3 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:16,409-Speed 6313.78 samples/sec Loss 10.1581 LearningRate 0.0008 Epoch: 3 Global Step: 69280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:19,650-Speed 6321.29 samples/sec Loss 10.2882 LearningRate 0.0008 Epoch: 3 Global Step: 69290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:22,899-Speed 6305.42 samples/sec Loss 10.1988 LearningRate 0.0008 Epoch: 3 Global Step: 69300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:26,144-Speed 6312.16 samples/sec Loss 10.1632 LearningRate 0.0008 Epoch: 3 Global Step: 69310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:29,375-Speed 6340.93 samples/sec Loss 10.1341 LearningRate 0.0008 Epoch: 3 Global Step: 69320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:32,624-Speed 6304.27 samples/sec Loss 10.0580 LearningRate 0.0008 Epoch: 3 Global Step: 69330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:35,863-Speed 6324.27 samples/sec Loss 10.1963 LearningRate 0.0008 Epoch: 3 Global Step: 69340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:39,107-Speed 6315.31 samples/sec Loss 10.1552 LearningRate 0.0008 Epoch: 3 Global Step: 69350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:42,354-Speed 6309.37 samples/sec Loss 10.1798 LearningRate 0.0008 Epoch: 3 Global Step: 69360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:45,600-Speed 6309.91 samples/sec Loss 10.2534 LearningRate 0.0008 Epoch: 3 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:48,845-Speed 6312.96 samples/sec Loss 10.1973 LearningRate 0.0008 Epoch: 3 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:52,093-Speed 6306.26 samples/sec Loss 10.2806 LearningRate 0.0008 Epoch: 3 Global 
Step: 69390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:55,342-Speed 6306.15 samples/sec Loss 10.2810 LearningRate 0.0008 Epoch: 3 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:02:58,583-Speed 6319.52 samples/sec Loss 10.2048 LearningRate 0.0008 Epoch: 3 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:01,814-Speed 6339.35 samples/sec Loss 10.1636 LearningRate 0.0008 Epoch: 3 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:05,058-Speed 6315.43 samples/sec Loss 10.2429 LearningRate 0.0008 Epoch: 3 Global Step: 69430 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:08,301-Speed 6317.33 samples/sec Loss 10.2800 LearningRate 0.0008 Epoch: 3 Global Step: 69440 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:11,552-Speed 6300.85 samples/sec Loss 10.2295 LearningRate 0.0008 Epoch: 3 Global Step: 69450 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:14,793-Speed 6319.04 samples/sec Loss 10.2855 LearningRate 0.0008 Epoch: 3 Global Step: 69460 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:18,036-Speed 6317.78 samples/sec Loss 10.1692 LearningRate 0.0008 Epoch: 3 Global Step: 69470 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:21,281-Speed 6311.18 samples/sec Loss 10.1347 LearningRate 0.0008 Epoch: 3 Global Step: 69480 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:24,523-Speed 6318.82 samples/sec Loss 10.1449 LearningRate 0.0008 Epoch: 3 Global Step: 69490 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:27,773-Speed 6304.04 samples/sec Loss 10.2252 LearningRate 0.0008 Epoch: 3 Global Step: 69500 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:03:31,019-Speed 6310.51 samples/sec Loss 10.2205 LearningRate 0.0008 Epoch: 3 Global Step: 69510 Fp16 Grad Scale: 32768 
Required: 69 hours Training: 2022-03-31 22:03:34,264-Speed 6313.96 samples/sec Loss 10.1592 LearningRate 0.0008 Epoch: 3 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:37,508-Speed 6313.79 samples/sec Loss 10.2005 LearningRate 0.0008 Epoch: 3 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:40,751-Speed 6317.80 samples/sec Loss 10.1457 LearningRate 0.0008 Epoch: 3 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:43,994-Speed 6316.12 samples/sec Loss 10.1671 LearningRate 0.0008 Epoch: 3 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:47,239-Speed 6311.90 samples/sec Loss 10.1934 LearningRate 0.0008 Epoch: 3 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:50,484-Speed 6312.53 samples/sec Loss 10.0550 LearningRate 0.0008 Epoch: 3 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:53,726-Speed 6318.54 samples/sec Loss 10.2614 LearningRate 0.0008 Epoch: 3 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:03:56,972-Speed 6311.28 samples/sec Loss 10.1861 LearningRate 0.0008 Epoch: 3 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:00,216-Speed 6314.15 samples/sec Loss 10.1940 LearningRate 0.0008 Epoch: 3 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:03,464-Speed 6306.75 samples/sec Loss 10.1834 LearningRate 0.0008 Epoch: 3 Global Step: 69610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:06,708-Speed 6314.66 samples/sec Loss 10.1214 LearningRate 0.0008 Epoch: 3 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:04:09,942-Speed 6334.97 samples/sec Loss 10.1275 LearningRate 0.0008 Epoch: 3 Global Step: 69630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:04:13,188-Speed 6309.15 samples/sec Loss 10.1150 LearningRate 0.0008 Epoch: 3 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:16,431-Speed 6318.16 samples/sec Loss 10.2652 LearningRate 0.0008 Epoch: 3 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:19,677-Speed 6309.35 samples/sec Loss 10.1689 LearningRate 0.0008 Epoch: 3 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:22,922-Speed 6313.18 samples/sec Loss 10.0690 LearningRate 0.0008 Epoch: 3 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:26,171-Speed 6304.89 samples/sec Loss 10.1749 LearningRate 0.0008 Epoch: 3 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:29,418-Speed 6308.43 samples/sec Loss 10.1928 LearningRate 0.0008 Epoch: 3 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:32,674-Speed 6291.22 samples/sec Loss 10.1903 LearningRate 0.0008 Epoch: 3 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:35,926-Speed 6300.80 samples/sec Loss 10.1624 LearningRate 0.0008 Epoch: 3 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:39,171-Speed 6312.20 samples/sec Loss 10.2534 LearningRate 0.0008 Epoch: 3 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:42,403-Speed 6338.18 samples/sec Loss 10.1084 LearningRate 0.0008 Epoch: 3 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:45,647-Speed 6314.03 samples/sec Loss 10.2003 LearningRate 0.0008 Epoch: 3 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:48,889-Speed 6318.29 samples/sec Loss 10.2231 LearningRate 0.0008 Epoch: 3 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:52,132-Speed 6316.18 samples/sec Loss 
10.2049 LearningRate 0.0008 Epoch: 3 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:55,378-Speed 6312.17 samples/sec Loss 10.0981 LearningRate 0.0008 Epoch: 3 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:04:58,624-Speed 6310.07 samples/sec Loss 10.1107 LearningRate 0.0008 Epoch: 3 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:01,871-Speed 6308.09 samples/sec Loss 10.2467 LearningRate 0.0008 Epoch: 3 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:05,114-Speed 6317.33 samples/sec Loss 10.1128 LearningRate 0.0008 Epoch: 3 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:08,359-Speed 6312.81 samples/sec Loss 10.1625 LearningRate 0.0008 Epoch: 3 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:11,601-Speed 6318.22 samples/sec Loss 10.2020 LearningRate 0.0008 Epoch: 3 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:14,834-Speed 6337.12 samples/sec Loss 10.2061 LearningRate 0.0008 Epoch: 3 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:18,077-Speed 6315.71 samples/sec Loss 10.1130 LearningRate 0.0008 Epoch: 3 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:21,329-Speed 6299.69 samples/sec Loss 10.0297 LearningRate 0.0008 Epoch: 3 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:24,582-Speed 6296.15 samples/sec Loss 10.1112 LearningRate 0.0008 Epoch: 3 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:27,829-Speed 6309.50 samples/sec Loss 10.2137 LearningRate 0.0008 Epoch: 3 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:31,078-Speed 6303.82 samples/sec Loss 10.1757 LearningRate 0.0008 Epoch: 3 Global 
Step: 69880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:34,328-Speed 6303.24 samples/sec Loss 10.1528 LearningRate 0.0008 Epoch: 3 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:37,575-Speed 6308.39 samples/sec Loss 10.1232 LearningRate 0.0008 Epoch: 3 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:40,822-Speed 6310.22 samples/sec Loss 10.0897 LearningRate 0.0008 Epoch: 3 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:44,065-Speed 6317.44 samples/sec Loss 10.1392 LearningRate 0.0008 Epoch: 3 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:47,315-Speed 6302.46 samples/sec Loss 10.0797 LearningRate 0.0008 Epoch: 3 Global Step: 69930 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:05:50,548-Speed 6336.61 samples/sec Loss 10.1979 LearningRate 0.0008 Epoch: 3 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:53,800-Speed 6298.43 samples/sec Loss 10.1877 LearningRate 0.0008 Epoch: 3 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:05:57,048-Speed 6307.94 samples/sec Loss 10.0451 LearningRate 0.0008 Epoch: 3 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:00,278-Speed 6341.46 samples/sec Loss 10.1738 LearningRate 0.0008 Epoch: 3 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:03,528-Speed 6302.95 samples/sec Loss 10.1245 LearningRate 0.0008 Epoch: 3 Global Step: 69980 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:06,771-Speed 6315.32 samples/sec Loss 10.0595 LearningRate 0.0008 Epoch: 3 Global Step: 69990 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:10,018-Speed 6310.68 samples/sec Loss 10.0947 LearningRate 0.0008 Epoch: 3 Global Step: 70000 Fp16 Grad Scale: 32768 
Required: 69 hours Training: 2022-03-31 22:06:13,263-Speed 6315.54 samples/sec Loss 10.2870 LearningRate 0.0008 Epoch: 3 Global Step: 70010 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:16,509-Speed 6310.17 samples/sec Loss 10.0993 LearningRate 0.0008 Epoch: 3 Global Step: 70020 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:19,754-Speed 6314.32 samples/sec Loss 10.1048 LearningRate 0.0008 Epoch: 3 Global Step: 70030 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:23,003-Speed 6304.70 samples/sec Loss 10.1867 LearningRate 0.0008 Epoch: 3 Global Step: 70040 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:26,248-Speed 6311.11 samples/sec Loss 10.1581 LearningRate 0.0008 Epoch: 3 Global Step: 70050 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:29,490-Speed 6318.54 samples/sec Loss 10.1448 LearningRate 0.0008 Epoch: 3 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:06:32,735-Speed 6315.58 samples/sec Loss 10.1710 LearningRate 0.0008 Epoch: 3 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:35,985-Speed 6302.35 samples/sec Loss 10.1211 LearningRate 0.0008 Epoch: 3 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:39,226-Speed 6321.03 samples/sec Loss 10.0784 LearningRate 0.0008 Epoch: 3 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:42,470-Speed 6313.84 samples/sec Loss 10.0554 LearningRate 0.0008 Epoch: 3 Global Step: 70100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:45,713-Speed 6317.39 samples/sec Loss 10.1843 LearningRate 0.0008 Epoch: 3 Global Step: 70110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:48,957-Speed 6314.18 samples/sec Loss 10.2430 LearningRate 0.0008 Epoch: 3 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:06:52,195-Speed 6326.20 samples/sec Loss 10.1385 LearningRate 0.0008 Epoch: 3 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:55,437-Speed 6318.76 samples/sec Loss 10.1733 LearningRate 0.0008 Epoch: 3 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:06:58,690-Speed 6297.15 samples/sec Loss 10.1180 LearningRate 0.0008 Epoch: 3 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:01,936-Speed 6310.51 samples/sec Loss 10.1620 LearningRate 0.0008 Epoch: 3 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:05,188-Speed 6299.76 samples/sec Loss 10.2238 LearningRate 0.0008 Epoch: 3 Global Step: 70170 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:07:08,413-Speed 6351.77 samples/sec Loss 10.1152 LearningRate 0.0008 Epoch: 3 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:11,659-Speed 6310.15 samples/sec Loss 10.1254 LearningRate 0.0008 Epoch: 3 Global Step: 70190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:14,904-Speed 6313.83 samples/sec Loss 10.0902 LearningRate 0.0008 Epoch: 3 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:18,146-Speed 6316.92 samples/sec Loss 10.1090 LearningRate 0.0008 Epoch: 3 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:21,391-Speed 6314.00 samples/sec Loss 10.1512 LearningRate 0.0008 Epoch: 3 Global Step: 70220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:24,639-Speed 6307.12 samples/sec Loss 10.1561 LearningRate 0.0008 Epoch: 3 Global Step: 70230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:27,884-Speed 6312.90 samples/sec Loss 10.1255 LearningRate 0.0008 Epoch: 3 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:31,128-Speed 6314.39 samples/sec 
Loss 10.1982 LearningRate 0.0008 Epoch: 3 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:34,370-Speed 6317.02 samples/sec Loss 10.1825 LearningRate 0.0008 Epoch: 3 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:37,625-Speed 6294.31 samples/sec Loss 10.2574 LearningRate 0.0008 Epoch: 3 Global Step: 70270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:40,855-Speed 6340.58 samples/sec Loss 10.1481 LearningRate 0.0008 Epoch: 3 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:44,100-Speed 6313.79 samples/sec Loss 10.1791 LearningRate 0.0008 Epoch: 3 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:47,342-Speed 6316.89 samples/sec Loss 10.1247 LearningRate 0.0008 Epoch: 3 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:50,587-Speed 6313.34 samples/sec Loss 10.1746 LearningRate 0.0008 Epoch: 3 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:53,833-Speed 6312.61 samples/sec Loss 10.1802 LearningRate 0.0008 Epoch: 3 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:07:57,085-Speed 6299.50 samples/sec Loss 10.0939 LearningRate 0.0008 Epoch: 3 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:00,330-Speed 6311.90 samples/sec Loss 10.0767 LearningRate 0.0008 Epoch: 3 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:03,572-Speed 6317.79 samples/sec Loss 10.0816 LearningRate 0.0008 Epoch: 3 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:06,818-Speed 6310.81 samples/sec Loss 10.1714 LearningRate 0.0008 Epoch: 3 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:10,065-Speed 6310.01 samples/sec Loss 10.0708 LearningRate 0.0008 Epoch: 3 
Global Step: 70370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:13,298-Speed 6334.48 samples/sec Loss 10.1339 LearningRate 0.0008 Epoch: 3 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:16,542-Speed 6315.24 samples/sec Loss 10.1372 LearningRate 0.0008 Epoch: 3 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:19,788-Speed 6311.12 samples/sec Loss 10.1575 LearningRate 0.0008 Epoch: 3 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:23,036-Speed 6305.61 samples/sec Loss 10.1191 LearningRate 0.0008 Epoch: 3 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:26,284-Speed 6308.01 samples/sec Loss 10.1652 LearningRate 0.0008 Epoch: 3 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:29,526-Speed 6317.86 samples/sec Loss 10.0261 LearningRate 0.0008 Epoch: 3 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:32,770-Speed 6314.13 samples/sec Loss 10.1991 LearningRate 0.0008 Epoch: 3 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:36,013-Speed 6316.78 samples/sec Loss 10.1137 LearningRate 0.0008 Epoch: 3 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:39,260-Speed 6309.88 samples/sec Loss 10.1488 LearningRate 0.0008 Epoch: 3 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:42,501-Speed 6318.77 samples/sec Loss 10.1342 LearningRate 0.0008 Epoch: 3 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:45,733-Speed 6339.55 samples/sec Loss 10.1776 LearningRate 0.0008 Epoch: 3 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:48,978-Speed 6312.27 samples/sec Loss 10.1468 LearningRate 0.0008 Epoch: 3 Global Step: 70490 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 22:08:52,226-Speed 6306.52 samples/sec Loss 10.1471 LearningRate 0.0008 Epoch: 3 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:55,472-Speed 6311.83 samples/sec Loss 10.1615 LearningRate 0.0008 Epoch: 3 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:08:58,712-Speed 6321.40 samples/sec Loss 10.0995 LearningRate 0.0009 Epoch: 3 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:01,969-Speed 6290.91 samples/sec Loss 10.1076 LearningRate 0.0009 Epoch: 3 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:05,215-Speed 6310.40 samples/sec Loss 10.0921 LearningRate 0.0009 Epoch: 3 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:08,468-Speed 6297.65 samples/sec Loss 10.1019 LearningRate 0.0009 Epoch: 3 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:11,717-Speed 6303.96 samples/sec Loss 10.0686 LearningRate 0.0009 Epoch: 3 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:14,961-Speed 6314.14 samples/sec Loss 10.0990 LearningRate 0.0009 Epoch: 3 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:18,208-Speed 6310.12 samples/sec Loss 10.0540 LearningRate 0.0009 Epoch: 3 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:09:21,446-Speed 6326.09 samples/sec Loss 10.0398 LearningRate 0.0009 Epoch: 3 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:24,692-Speed 6310.76 samples/sec Loss 10.0926 LearningRate 0.0009 Epoch: 3 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:27,936-Speed 6313.20 samples/sec Loss 10.0198 LearningRate 0.0009 Epoch: 3 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:09:31,181-Speed 6313.65 samples/sec Loss 10.0637 LearningRate 0.0009 Epoch: 3 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:34,422-Speed 6320.63 samples/sec Loss 10.0505 LearningRate 0.0009 Epoch: 3 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:37,662-Speed 6321.33 samples/sec Loss 10.1105 LearningRate 0.0009 Epoch: 3 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:40,908-Speed 6310.36 samples/sec Loss 10.0968 LearningRate 0.0009 Epoch: 3 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:44,153-Speed 6313.92 samples/sec Loss 10.1286 LearningRate 0.0009 Epoch: 3 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:47,398-Speed 6311.41 samples/sec Loss 10.1203 LearningRate 0.0009 Epoch: 3 Global Step: 70670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:50,643-Speed 6312.75 samples/sec Loss 10.1286 LearningRate 0.0009 Epoch: 3 Global Step: 70680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:53,872-Speed 6344.94 samples/sec Loss 10.1465 LearningRate 0.0009 Epoch: 3 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:09:57,115-Speed 6316.16 samples/sec Loss 10.0734 LearningRate 0.0009 Epoch: 3 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:00,358-Speed 6317.32 samples/sec Loss 10.1272 LearningRate 0.0009 Epoch: 3 Global Step: 70710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:03,609-Speed 6300.97 samples/sec Loss 10.1445 LearningRate 0.0009 Epoch: 3 Global Step: 70720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:06,857-Speed 6306.77 samples/sec Loss 10.0331 LearningRate 0.0009 Epoch: 3 Global Step: 70730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:10,114-Speed 6290.42 samples/sec Loss 
10.0935 LearningRate 0.0009 Epoch: 3 Global Step: 70740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:13,363-Speed 6303.74 samples/sec Loss 10.1564 LearningRate 0.0009 Epoch: 3 Global Step: 70750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:16,615-Speed 6299.94 samples/sec Loss 10.1973 LearningRate 0.0009 Epoch: 3 Global Step: 70760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:19,859-Speed 6315.23 samples/sec Loss 10.0320 LearningRate 0.0009 Epoch: 3 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:23,105-Speed 6308.83 samples/sec Loss 10.0684 LearningRate 0.0009 Epoch: 3 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:26,335-Speed 6343.24 samples/sec Loss 10.1391 LearningRate 0.0009 Epoch: 3 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:29,584-Speed 6303.70 samples/sec Loss 10.1020 LearningRate 0.0009 Epoch: 3 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:32,838-Speed 6296.84 samples/sec Loss 10.1108 LearningRate 0.0009 Epoch: 3 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:36,083-Speed 6311.71 samples/sec Loss 10.1406 LearningRate 0.0009 Epoch: 3 Global Step: 70820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:39,329-Speed 6311.51 samples/sec Loss 10.0749 LearningRate 0.0009 Epoch: 3 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:42,578-Speed 6303.71 samples/sec Loss 10.0581 LearningRate 0.0009 Epoch: 3 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:45,828-Speed 6303.26 samples/sec Loss 10.1063 LearningRate 0.0009 Epoch: 3 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:49,071-Speed 6317.13 samples/sec Loss 10.0995 LearningRate 0.0009 Epoch: 3 Global 
Step: 70860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:52,317-Speed 6310.94 samples/sec Loss 10.1122 LearningRate 0.0009 Epoch: 3 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:55,559-Speed 6318.27 samples/sec Loss 10.0707 LearningRate 0.0009 Epoch: 3 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:10:58,795-Speed 6329.85 samples/sec Loss 10.0581 LearningRate 0.0009 Epoch: 3 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:02,039-Speed 6314.14 samples/sec Loss 10.0838 LearningRate 0.0009 Epoch: 3 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:05,295-Speed 6291.45 samples/sec Loss 10.1694 LearningRate 0.0009 Epoch: 3 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:08,541-Speed 6310.84 samples/sec Loss 10.0328 LearningRate 0.0009 Epoch: 3 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:11,786-Speed 6311.88 samples/sec Loss 10.0233 LearningRate 0.0009 Epoch: 3 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:15,034-Speed 6309.25 samples/sec Loss 10.1230 LearningRate 0.0009 Epoch: 3 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:18,276-Speed 6317.57 samples/sec Loss 10.0581 LearningRate 0.0009 Epoch: 3 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:21,520-Speed 6315.49 samples/sec Loss 10.1245 LearningRate 0.0009 Epoch: 3 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:24,767-Speed 6307.89 samples/sec Loss 10.0839 LearningRate 0.0009 Epoch: 3 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:28,010-Speed 6317.04 samples/sec Loss 10.0361 LearningRate 0.0009 Epoch: 3 Global Step: 70980 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 22:11:31,242-Speed 6338.29 samples/sec Loss 10.1287 LearningRate 0.0009 Epoch: 3 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:34,484-Speed 6317.33 samples/sec Loss 10.1081 LearningRate 0.0009 Epoch: 3 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:37,730-Speed 6311.85 samples/sec Loss 10.0546 LearningRate 0.0009 Epoch: 3 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:40,977-Speed 6308.24 samples/sec Loss 10.0256 LearningRate 0.0009 Epoch: 3 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:44,219-Speed 6318.15 samples/sec Loss 10.1502 LearningRate 0.0009 Epoch: 3 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:47,464-Speed 6313.76 samples/sec Loss 10.1759 LearningRate 0.0009 Epoch: 3 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:50,713-Speed 6304.02 samples/sec Loss 10.0777 LearningRate 0.0009 Epoch: 3 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:53,957-Speed 6314.17 samples/sec Loss 9.9975 LearningRate 0.0009 Epoch: 3 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:11:57,204-Speed 6310.01 samples/sec Loss 10.0221 LearningRate 0.0009 Epoch: 3 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:00,447-Speed 6316.86 samples/sec Loss 10.1032 LearningRate 0.0009 Epoch: 3 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:03,679-Speed 6336.68 samples/sec Loss 10.0703 LearningRate 0.0009 Epoch: 3 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:06,926-Speed 6309.19 samples/sec Loss 10.0839 LearningRate 0.0009 Epoch: 3 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:12:10,173-Speed 6309.38 samples/sec Loss 10.1961 LearningRate 0.0009 Epoch: 3 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:13,419-Speed 6310.06 samples/sec Loss 10.1549 LearningRate 0.0009 Epoch: 3 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:16,664-Speed 6311.75 samples/sec Loss 10.1076 LearningRate 0.0009 Epoch: 3 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:19,910-Speed 6310.59 samples/sec Loss 10.1140 LearningRate 0.0009 Epoch: 3 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:23,208-Speed 6210.99 samples/sec Loss 9.9957 LearningRate 0.0009 Epoch: 3 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:26,463-Speed 6294.10 samples/sec Loss 10.1531 LearningRate 0.0009 Epoch: 3 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:29,707-Speed 6315.97 samples/sec Loss 10.1016 LearningRate 0.0009 Epoch: 3 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:32,952-Speed 6311.47 samples/sec Loss 10.0879 LearningRate 0.0009 Epoch: 3 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:36,195-Speed 6317.02 samples/sec Loss 10.0775 LearningRate 0.0009 Epoch: 3 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:12:39,427-Speed 6338.61 samples/sec Loss 10.0887 LearningRate 0.0009 Epoch: 3 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:42,672-Speed 6312.35 samples/sec Loss 10.0200 LearningRate 0.0009 Epoch: 3 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:45,917-Speed 6312.16 samples/sec Loss 10.0862 LearningRate 0.0009 Epoch: 3 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:49,160-Speed 6318.34 samples/sec Loss 
9.9786 LearningRate 0.0009 Epoch: 3 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:52,400-Speed 6320.93 samples/sec Loss 10.0579 LearningRate 0.0009 Epoch: 3 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:55,645-Speed 6312.73 samples/sec Loss 10.0930 LearningRate 0.0009 Epoch: 3 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:12:58,897-Speed 6300.21 samples/sec Loss 10.1261 LearningRate 0.0009 Epoch: 3 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:02,141-Speed 6313.77 samples/sec Loss 10.0204 LearningRate 0.0009 Epoch: 3 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:05,385-Speed 6315.24 samples/sec Loss 10.0691 LearningRate 0.0009 Epoch: 3 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:08,631-Speed 6310.02 samples/sec Loss 10.1415 LearningRate 0.0009 Epoch: 3 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:11,863-Speed 6339.04 samples/sec Loss 10.0060 LearningRate 0.0009 Epoch: 3 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:15,111-Speed 6306.65 samples/sec Loss 10.0225 LearningRate 0.0009 Epoch: 3 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:18,358-Speed 6308.59 samples/sec Loss 10.0697 LearningRate 0.0009 Epoch: 3 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:21,602-Speed 6314.00 samples/sec Loss 10.0820 LearningRate 0.0009 Epoch: 3 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:24,846-Speed 6314.35 samples/sec Loss 10.0347 LearningRate 0.0009 Epoch: 3 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:13:28,075-Speed 6343.19 samples/sec Loss 10.2574 LearningRate 0.0009 Epoch: 3 Global 
Step: 71350 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:31,319-Speed 6314.97 samples/sec Loss 10.0923 LearningRate 0.0009 Epoch: 3 Global Step: 71360 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:34,562-Speed 6316.33 samples/sec Loss 10.1353 LearningRate 0.0009 Epoch: 3 Global Step: 71370 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:37,810-Speed 6309.55 samples/sec Loss 10.0794 LearningRate 0.0009 Epoch: 3 Global Step: 71380 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:41,076-Speed 6270.83 samples/sec Loss 10.0027 LearningRate 0.0009 Epoch: 3 Global Step: 71390 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:44,325-Speed 6305.86 samples/sec Loss 10.0068 LearningRate 0.0009 Epoch: 3 Global Step: 71400 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:47,565-Speed 6322.18 samples/sec Loss 10.0816 LearningRate 0.0009 Epoch: 3 Global Step: 71410 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:50,808-Speed 6315.63 samples/sec Loss 10.0330 LearningRate 0.0009 Epoch: 3 Global Step: 71420 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:54,052-Speed 6315.13 samples/sec Loss 10.0883 LearningRate 0.0009 Epoch: 3 Global Step: 71430 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:13:57,296-Speed 6315.06 samples/sec Loss 10.0292 LearningRate 0.0009 Epoch: 3 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:00,561-Speed 6273.26 samples/sec Loss 10.0970 LearningRate 0.0009 Epoch: 3 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:03,809-Speed 6306.41 samples/sec Loss 10.0452 LearningRate 0.0009 Epoch: 3 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:07,050-Speed 6321.04 samples/sec Loss 10.0700 LearningRate 0.0009 Epoch: 3 Global Step: 71470 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 22:14:10,297-Speed 6309.11 samples/sec Loss 10.0447 LearningRate 0.0009 Epoch: 3 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:13,549-Speed 6299.23 samples/sec Loss 10.0103 LearningRate 0.0009 Epoch: 3 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:16,798-Speed 6303.83 samples/sec Loss 10.0759 LearningRate 0.0009 Epoch: 3 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:20,046-Speed 6308.09 samples/sec Loss 10.0281 LearningRate 0.0009 Epoch: 3 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:23,292-Speed 6310.11 samples/sec Loss 10.0691 LearningRate 0.0009 Epoch: 3 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:26,539-Speed 6307.89 samples/sec Loss 10.0808 LearningRate 0.0009 Epoch: 3 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:14:29,780-Speed 6321.16 samples/sec Loss 10.0597 LearningRate 0.0009 Epoch: 3 Global Step: 71540 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:33,026-Speed 6310.18 samples/sec Loss 10.0650 LearningRate 0.0009 Epoch: 3 Global Step: 71550 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:36,271-Speed 6313.52 samples/sec Loss 10.0637 LearningRate 0.0009 Epoch: 3 Global Step: 71560 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:39,513-Speed 6318.08 samples/sec Loss 10.0890 LearningRate 0.0009 Epoch: 3 Global Step: 71570 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:42,760-Speed 6309.15 samples/sec Loss 10.0311 LearningRate 0.0009 Epoch: 3 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:46,005-Speed 6311.54 samples/sec Loss 10.0886 LearningRate 0.0009 Epoch: 3 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 
22:14:49,248-Speed 6318.54 samples/sec Loss 10.1251 LearningRate 0.0009 Epoch: 3 Global Step: 71600 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:52,495-Speed 6307.68 samples/sec Loss 10.1323 LearningRate 0.0009 Epoch: 3 Global Step: 71610 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:55,742-Speed 6310.22 samples/sec Loss 10.0777 LearningRate 0.0009 Epoch: 3 Global Step: 71620 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:14:58,985-Speed 6316.98 samples/sec Loss 10.0799 LearningRate 0.0009 Epoch: 3 Global Step: 71630 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:15:02,233-Speed 6305.98 samples/sec Loss 10.1190 LearningRate 0.0009 Epoch: 3 Global Step: 71640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:05,475-Speed 6319.65 samples/sec Loss 10.1497 LearningRate 0.0009 Epoch: 3 Global Step: 71650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:08,717-Speed 6318.28 samples/sec Loss 10.1266 LearningRate 0.0009 Epoch: 3 Global Step: 71660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:11,964-Speed 6308.70 samples/sec Loss 10.0106 LearningRate 0.0009 Epoch: 3 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:15,209-Speed 6311.24 samples/sec Loss 10.0706 LearningRate 0.0009 Epoch: 3 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:18,455-Speed 6312.27 samples/sec Loss 9.9954 LearningRate 0.0009 Epoch: 3 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:21,697-Speed 6318.28 samples/sec Loss 9.9595 LearningRate 0.0009 Epoch: 3 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:24,944-Speed 6307.51 samples/sec Loss 10.0371 LearningRate 0.0009 Epoch: 3 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:28,187-Speed 6317.00 samples/sec Loss 
10.0617 LearningRate 0.0009 Epoch: 3 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:31,436-Speed 6305.88 samples/sec Loss 10.0499 LearningRate 0.0009 Epoch: 3 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:34,667-Speed 6339.81 samples/sec Loss 10.0493 LearningRate 0.0009 Epoch: 3 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:37,911-Speed 6314.56 samples/sec Loss 10.1211 LearningRate 0.0009 Epoch: 3 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:41,210-Speed 6209.10 samples/sec Loss 10.0737 LearningRate 0.0009 Epoch: 3 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:44,452-Speed 6317.76 samples/sec Loss 10.0860 LearningRate 0.0009 Epoch: 3 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:47,702-Speed 6303.31 samples/sec Loss 10.0201 LearningRate 0.0009 Epoch: 3 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:50,948-Speed 6310.57 samples/sec Loss 10.1145 LearningRate 0.0009 Epoch: 3 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:54,197-Speed 6305.15 samples/sec Loss 10.0436 LearningRate 0.0009 Epoch: 3 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:15:57,437-Speed 6321.68 samples/sec Loss 9.9446 LearningRate 0.0009 Epoch: 3 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:00,681-Speed 6314.76 samples/sec Loss 9.9947 LearningRate 0.0009 Epoch: 3 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:03,927-Speed 6310.59 samples/sec Loss 10.0763 LearningRate 0.0009 Epoch: 3 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:07,157-Speed 6342.35 samples/sec Loss 10.1444 LearningRate 0.0009 Epoch: 3 Global 
Step: 71840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:10,411-Speed 6295.47 samples/sec Loss 10.0291 LearningRate 0.0009 Epoch: 3 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:13,653-Speed 6319.57 samples/sec Loss 10.1162 LearningRate 0.0009 Epoch: 3 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:16,899-Speed 6310.43 samples/sec Loss 10.0669 LearningRate 0.0009 Epoch: 3 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:20,144-Speed 6311.38 samples/sec Loss 9.9765 LearningRate 0.0009 Epoch: 3 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:23,392-Speed 6307.89 samples/sec Loss 10.0985 LearningRate 0.0009 Epoch: 3 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:26,638-Speed 6310.61 samples/sec Loss 10.0846 LearningRate 0.0009 Epoch: 3 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:29,881-Speed 6316.02 samples/sec Loss 10.0504 LearningRate 0.0009 Epoch: 3 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:16:33,117-Speed 6331.27 samples/sec Loss 9.9769 LearningRate 0.0009 Epoch: 3 Global Step: 71920 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:36,360-Speed 6315.92 samples/sec Loss 9.9372 LearningRate 0.0009 Epoch: 3 Global Step: 71930 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:39,605-Speed 6311.82 samples/sec Loss 9.9539 LearningRate 0.0009 Epoch: 3 Global Step: 71940 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:42,850-Speed 6312.35 samples/sec Loss 10.0097 LearningRate 0.0009 Epoch: 3 Global Step: 71950 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:46,092-Speed 6319.40 samples/sec Loss 10.0538 LearningRate 0.0009 Epoch: 3 Global Step: 71960 Fp16 Grad Scale: 32768 Required: 69 
hours Training: 2022-03-31 22:16:49,335-Speed 6315.80 samples/sec Loss 10.0612 LearningRate 0.0009 Epoch: 3 Global Step: 71970 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:52,580-Speed 6314.05 samples/sec Loss 10.0727 LearningRate 0.0009 Epoch: 3 Global Step: 71980 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:55,823-Speed 6315.93 samples/sec Loss 9.9725 LearningRate 0.0009 Epoch: 3 Global Step: 71990 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:16:59,071-Speed 6306.11 samples/sec Loss 10.1632 LearningRate 0.0009 Epoch: 3 Global Step: 72000 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:17:02,315-Speed 6316.20 samples/sec Loss 10.0632 LearningRate 0.0009 Epoch: 3 Global Step: 72010 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:17:05,559-Speed 6314.26 samples/sec Loss 10.0559 LearningRate 0.0009 Epoch: 3 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:08,804-Speed 6312.04 samples/sec Loss 10.0331 LearningRate 0.0009 Epoch: 3 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:12,047-Speed 6315.80 samples/sec Loss 10.1446 LearningRate 0.0009 Epoch: 3 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:15,297-Speed 6304.50 samples/sec Loss 10.0541 LearningRate 0.0009 Epoch: 3 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:18,563-Speed 6271.93 samples/sec Loss 9.9350 LearningRate 0.0009 Epoch: 3 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:21,807-Speed 6315.21 samples/sec Loss 9.9909 LearningRate 0.0009 Epoch: 3 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:25,052-Speed 6311.52 samples/sec Loss 10.1183 LearningRate 0.0009 Epoch: 3 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:28,299-Speed 
6309.37 samples/sec Loss 10.0060 LearningRate 0.0009 Epoch: 3 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:31,545-Speed 6310.91 samples/sec Loss 10.0631 LearningRate 0.0009 Epoch: 3 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:34,788-Speed 6317.08 samples/sec Loss 9.9916 LearningRate 0.0009 Epoch: 3 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:38,019-Speed 6339.26 samples/sec Loss 10.0407 LearningRate 0.0009 Epoch: 3 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:41,259-Speed 6323.04 samples/sec Loss 9.9652 LearningRate 0.0009 Epoch: 3 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:44,511-Speed 6298.63 samples/sec Loss 9.9781 LearningRate 0.0009 Epoch: 3 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:47,756-Speed 6311.56 samples/sec Loss 10.0503 LearningRate 0.0009 Epoch: 3 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:51,003-Speed 6310.17 samples/sec Loss 9.9459 LearningRate 0.0009 Epoch: 3 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:54,244-Speed 6320.18 samples/sec Loss 10.1147 LearningRate 0.0009 Epoch: 3 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:17:57,489-Speed 6312.91 samples/sec Loss 9.9589 LearningRate 0.0009 Epoch: 3 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:00,737-Speed 6306.99 samples/sec Loss 9.9692 LearningRate 0.0009 Epoch: 3 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:03,984-Speed 6307.96 samples/sec Loss 10.0316 LearningRate 0.0009 Epoch: 3 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:07,227-Speed 6317.19 samples/sec Loss 10.0550 LearningRate 
0.0009 Epoch: 3 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:10,459-Speed 6337.64 samples/sec Loss 10.0594 LearningRate 0.0009 Epoch: 3 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:13,707-Speed 6306.84 samples/sec Loss 10.0321 LearningRate 0.0009 Epoch: 3 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:16,955-Speed 6306.47 samples/sec Loss 9.9894 LearningRate 0.0009 Epoch: 3 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:20,197-Speed 6318.72 samples/sec Loss 9.9602 LearningRate 0.0009 Epoch: 3 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:23,444-Speed 6309.11 samples/sec Loss 10.0935 LearningRate 0.0009 Epoch: 3 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:26,688-Speed 6315.70 samples/sec Loss 9.9632 LearningRate 0.0009 Epoch: 3 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:29,932-Speed 6313.56 samples/sec Loss 10.0087 LearningRate 0.0009 Epoch: 3 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:33,184-Speed 6299.26 samples/sec Loss 10.0773 LearningRate 0.0009 Epoch: 3 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:36,432-Speed 6306.73 samples/sec Loss 10.0221 LearningRate 0.0009 Epoch: 3 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:39,676-Speed 6315.68 samples/sec Loss 10.0353 LearningRate 0.0009 Epoch: 3 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:42,906-Speed 6340.38 samples/sec Loss 10.0182 LearningRate 0.0009 Epoch: 3 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:46,150-Speed 6314.57 samples/sec Loss 9.9464 LearningRate 0.0009 Epoch: 3 Global Step: 72330 Fp16 Grad 
Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:49,398-Speed 6307.45 samples/sec Loss 10.0564 LearningRate 0.0009 Epoch: 3 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:52,641-Speed 6316.03 samples/sec Loss 10.0300 LearningRate 0.0009 Epoch: 3 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:55,886-Speed 6312.57 samples/sec Loss 10.0181 LearningRate 0.0009 Epoch: 3 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:18:59,133-Speed 6308.11 samples/sec Loss 10.0248 LearningRate 0.0009 Epoch: 3 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:02,408-Speed 6255.68 samples/sec Loss 9.9770 LearningRate 0.0009 Epoch: 3 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:05,656-Speed 6307.41 samples/sec Loss 10.0250 LearningRate 0.0009 Epoch: 3 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:08,902-Speed 6309.24 samples/sec Loss 10.1422 LearningRate 0.0009 Epoch: 3 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:12,154-Speed 6300.71 samples/sec Loss 9.9715 LearningRate 0.0009 Epoch: 3 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:15,385-Speed 6338.54 samples/sec Loss 10.0243 LearningRate 0.0009 Epoch: 3 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:18,632-Speed 6308.47 samples/sec Loss 10.1318 LearningRate 0.0009 Epoch: 3 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:21,879-Speed 6308.92 samples/sec Loss 10.0618 LearningRate 0.0009 Epoch: 3 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:25,124-Speed 6313.90 samples/sec Loss 10.0594 LearningRate 0.0009 Epoch: 3 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 
2022-03-31 22:19:28,383-Speed 6284.46 samples/sec Loss 10.0300 LearningRate 0.0009 Epoch: 3 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:31,630-Speed 6308.87 samples/sec Loss 9.9452 LearningRate 0.0009 Epoch: 3 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:34,880-Speed 6304.61 samples/sec Loss 9.9841 LearningRate 0.0009 Epoch: 3 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:38,136-Speed 6290.86 samples/sec Loss 9.9372 LearningRate 0.0009 Epoch: 3 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:41,381-Speed 6313.75 samples/sec Loss 9.9864 LearningRate 0.0009 Epoch: 3 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:44,631-Speed 6301.58 samples/sec Loss 9.9282 LearningRate 0.0009 Epoch: 3 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:47,887-Speed 6291.06 samples/sec Loss 9.9989 LearningRate 0.0009 Epoch: 3 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:19:51,120-Speed 6337.47 samples/sec Loss 9.9897 LearningRate 0.0009 Epoch: 3 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:54,367-Speed 6308.27 samples/sec Loss 9.9443 LearningRate 0.0009 Epoch: 3 Global Step: 72540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:19:57,610-Speed 6315.76 samples/sec Loss 10.0689 LearningRate 0.0009 Epoch: 3 Global Step: 72550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:00,856-Speed 6310.61 samples/sec Loss 10.1454 LearningRate 0.0009 Epoch: 3 Global Step: 72560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:04,106-Speed 6303.40 samples/sec Loss 9.9843 LearningRate 0.0009 Epoch: 3 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:07,369-Speed 6278.54 samples/sec 
Loss 9.9279 LearningRate 0.0009 Epoch: 3 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:10,621-Speed 6297.60 samples/sec Loss 10.0016 LearningRate 0.0009 Epoch: 3 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:13,862-Speed 6320.99 samples/sec Loss 10.0346 LearningRate 0.0009 Epoch: 3 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:17,113-Speed 6301.72 samples/sec Loss 10.0178 LearningRate 0.0009 Epoch: 3 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:20,356-Speed 6315.95 samples/sec Loss 10.0651 LearningRate 0.0009 Epoch: 3 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:23,598-Speed 6317.34 samples/sec Loss 10.0945 LearningRate 0.0009 Epoch: 3 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:26,843-Speed 6313.51 samples/sec Loss 10.0063 LearningRate 0.0009 Epoch: 3 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:30,097-Speed 6295.79 samples/sec Loss 9.9595 LearningRate 0.0009 Epoch: 3 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:33,341-Speed 6313.38 samples/sec Loss 10.0123 LearningRate 0.0009 Epoch: 3 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:36,588-Speed 6309.65 samples/sec Loss 10.0318 LearningRate 0.0009 Epoch: 3 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:39,829-Speed 6320.05 samples/sec Loss 9.9251 LearningRate 0.0009 Epoch: 3 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:43,074-Speed 6313.81 samples/sec Loss 10.0008 LearningRate 0.0009 Epoch: 3 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:46,316-Speed 6317.61 samples/sec Loss 9.9656 LearningRate 0.0009 Epoch: 3 
Global Step: 72700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:49,560-Speed 6316.08 samples/sec Loss 9.9700 LearningRate 0.0009 Epoch: 3 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:52,806-Speed 6310.59 samples/sec Loss 10.0064 LearningRate 0.0009 Epoch: 3 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:56,035-Speed 6343.84 samples/sec Loss 10.1038 LearningRate 0.0009 Epoch: 3 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:20:59,280-Speed 6312.42 samples/sec Loss 9.9762 LearningRate 0.0009 Epoch: 3 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:02,525-Speed 6312.87 samples/sec Loss 10.0818 LearningRate 0.0009 Epoch: 3 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:05,766-Speed 6320.71 samples/sec Loss 9.9197 LearningRate 0.0009 Epoch: 3 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:09,010-Speed 6313.77 samples/sec Loss 9.9966 LearningRate 0.0009 Epoch: 3 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:12,257-Speed 6309.63 samples/sec Loss 9.9678 LearningRate 0.0009 Epoch: 3 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:15,502-Speed 6311.28 samples/sec Loss 10.0417 LearningRate 0.0009 Epoch: 3 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:18,749-Speed 6310.24 samples/sec Loss 9.9762 LearningRate 0.0009 Epoch: 3 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:21,985-Speed 6329.47 samples/sec Loss 9.9763 LearningRate 0.0009 Epoch: 3 Global Step: 72810 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:25,231-Speed 6310.42 samples/sec Loss 9.9049 LearningRate 0.0009 Epoch: 3 Global Step: 72820 Fp16 Grad Scale: 32768 Required: 
69 hours Training: 2022-03-31 22:21:28,478-Speed 6309.13 samples/sec Loss 9.9586 LearningRate 0.0009 Epoch: 3 Global Step: 72830 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:31,719-Speed 6319.03 samples/sec Loss 9.9839 LearningRate 0.0009 Epoch: 3 Global Step: 72840 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:34,968-Speed 6305.63 samples/sec Loss 9.9927 LearningRate 0.0009 Epoch: 3 Global Step: 72850 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:38,210-Speed 6318.15 samples/sec Loss 10.0375 LearningRate 0.0009 Epoch: 3 Global Step: 72860 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:41,457-Speed 6309.93 samples/sec Loss 9.9784 LearningRate 0.0009 Epoch: 3 Global Step: 72870 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:44,702-Speed 6312.75 samples/sec Loss 9.9912 LearningRate 0.0009 Epoch: 3 Global Step: 72880 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:47,944-Speed 6318.08 samples/sec Loss 10.1139 LearningRate 0.0009 Epoch: 3 Global Step: 72890 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:51,184-Speed 6322.87 samples/sec Loss 9.9184 LearningRate 0.0009 Epoch: 3 Global Step: 72900 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:21:54,430-Speed 6311.50 samples/sec Loss 9.9544 LearningRate 0.0009 Epoch: 3 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:21:57,674-Speed 6314.79 samples/sec Loss 9.9800 LearningRate 0.0009 Epoch: 3 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:00,923-Speed 6304.83 samples/sec Loss 10.0162 LearningRate 0.0009 Epoch: 3 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:04,180-Speed 6289.24 samples/sec Loss 9.9864 LearningRate 0.0009 Epoch: 3 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:07,421-Speed 
6319.26 samples/sec Loss 9.8518 LearningRate 0.0009 Epoch: 3 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:10,670-Speed 6304.68 samples/sec Loss 9.9828 LearningRate 0.0009 Epoch: 3 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:13,916-Speed 6312.53 samples/sec Loss 10.0241 LearningRate 0.0009 Epoch: 3 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:17,158-Speed 6316.63 samples/sec Loss 9.9835 LearningRate 0.0009 Epoch: 3 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:20,406-Speed 6307.28 samples/sec Loss 9.9629 LearningRate 0.0009 Epoch: 3 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:23,651-Speed 6313.70 samples/sec Loss 9.9686 LearningRate 0.0009 Epoch: 3 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:26,880-Speed 6342.47 samples/sec Loss 10.0318 LearningRate 0.0009 Epoch: 3 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:30,126-Speed 6311.09 samples/sec Loss 10.0169 LearningRate 0.0009 Epoch: 3 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:33,375-Speed 6304.28 samples/sec Loss 10.0179 LearningRate 0.0009 Epoch: 3 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:36,621-Speed 6311.60 samples/sec Loss 10.0811 LearningRate 0.0009 Epoch: 3 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:39,863-Speed 6318.77 samples/sec Loss 10.0269 LearningRate 0.0009 Epoch: 3 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:43,114-Speed 6299.94 samples/sec Loss 9.9552 LearningRate 0.0009 Epoch: 3 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:46,373-Speed 6286.00 samples/sec Loss 10.0620 LearningRate 
0.0009 Epoch: 3 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:49,615-Speed 6317.74 samples/sec Loss 9.9750 LearningRate 0.0009 Epoch: 3 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:52,861-Speed 6311.62 samples/sec Loss 10.0574 LearningRate 0.0009 Epoch: 3 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:56,107-Speed 6311.27 samples/sec Loss 10.0133 LearningRate 0.0009 Epoch: 3 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:22:59,339-Speed 6339.15 samples/sec Loss 10.0486 LearningRate 0.0009 Epoch: 3 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:02,584-Speed 6311.49 samples/sec Loss 10.0391 LearningRate 0.0009 Epoch: 3 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:05,830-Speed 6312.21 samples/sec Loss 9.9678 LearningRate 0.0009 Epoch: 3 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:09,073-Speed 6315.46 samples/sec Loss 9.9437 LearningRate 0.0009 Epoch: 3 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:12,321-Speed 6308.10 samples/sec Loss 9.9778 LearningRate 0.0009 Epoch: 3 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:15,566-Speed 6311.92 samples/sec Loss 10.0994 LearningRate 0.0009 Epoch: 3 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:18,811-Speed 6311.80 samples/sec Loss 10.1060 LearningRate 0.0009 Epoch: 3 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:22,060-Speed 6304.30 samples/sec Loss 9.9328 LearningRate 0.0009 Epoch: 3 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:25,308-Speed 6308.22 samples/sec Loss 10.0015 LearningRate 0.0009 Epoch: 3 Global Step: 73190 Fp16 Grad 
Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:28,556-Speed 6306.73 samples/sec Loss 9.9841 LearningRate 0.0009 Epoch: 3 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:31,789-Speed 6336.23 samples/sec Loss 10.0626 LearningRate 0.0009 Epoch: 3 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:35,034-Speed 6312.70 samples/sec Loss 9.9893 LearningRate 0.0009 Epoch: 3 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:38,285-Speed 6299.32 samples/sec Loss 10.0350 LearningRate 0.0009 Epoch: 3 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:41,533-Speed 6308.29 samples/sec Loss 9.9308 LearningRate 0.0009 Epoch: 3 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:44,782-Speed 6305.19 samples/sec Loss 9.8969 LearningRate 0.0009 Epoch: 3 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:48,025-Speed 6315.04 samples/sec Loss 10.0499 LearningRate 0.0009 Epoch: 3 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:51,272-Speed 6309.84 samples/sec Loss 10.0139 LearningRate 0.0009 Epoch: 3 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:54,518-Speed 6309.55 samples/sec Loss 10.0578 LearningRate 0.0009 Epoch: 3 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:23:57,761-Speed 6316.34 samples/sec Loss 9.9192 LearningRate 0.0009 Epoch: 3 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:01,013-Speed 6300.41 samples/sec Loss 9.9447 LearningRate 0.0009 Epoch: 3 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:04,246-Speed 6336.43 samples/sec Loss 9.9070 LearningRate 0.0009 Epoch: 3 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 69 hours Training: 
2022-03-31 22:24:07,491-Speed 6312.65 samples/sec Loss 9.9513 LearningRate 0.0009 Epoch: 3 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:10,738-Speed 6309.43 samples/sec Loss 9.8540 LearningRate 0.0009 Epoch: 3 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:13,979-Speed 6320.23 samples/sec Loss 9.9295 LearningRate 0.0009 Epoch: 3 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:17,226-Speed 6309.09 samples/sec Loss 9.9460 LearningRate 0.0009 Epoch: 3 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:20,474-Speed 6305.80 samples/sec Loss 9.9397 LearningRate 0.0009 Epoch: 3 Global Step: 73360 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:23,727-Speed 6297.50 samples/sec Loss 10.0027 LearningRate 0.0009 Epoch: 3 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:26,972-Speed 6313.53 samples/sec Loss 10.0851 LearningRate 0.0009 Epoch: 3 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:30,214-Speed 6318.54 samples/sec Loss 10.0701 LearningRate 0.0009 Epoch: 3 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:33,462-Speed 6306.27 samples/sec Loss 10.0515 LearningRate 0.0009 Epoch: 3 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:36,694-Speed 6337.94 samples/sec Loss 9.9534 LearningRate 0.0009 Epoch: 3 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:39,941-Speed 6309.06 samples/sec Loss 10.0100 LearningRate 0.0009 Epoch: 3 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:43,189-Speed 6307.71 samples/sec Loss 10.0498 LearningRate 0.0009 Epoch: 3 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:46,434-Speed 6311.85 samples/sec 
Loss 9.9950 LearningRate 0.0009 Epoch: 3 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:49,686-Speed 6298.08 samples/sec Loss 9.8781 LearningRate 0.0009 Epoch: 3 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:52,931-Speed 6313.67 samples/sec Loss 10.0137 LearningRate 0.0009 Epoch: 3 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:56,175-Speed 6314.06 samples/sec Loss 9.8909 LearningRate 0.0009 Epoch: 3 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:24:59,428-Speed 6297.68 samples/sec Loss 9.9940 LearningRate 0.0009 Epoch: 3 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:02,673-Speed 6312.41 samples/sec Loss 9.9282 LearningRate 0.0009 Epoch: 3 Global Step: 73490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:05,916-Speed 6316.89 samples/sec Loss 10.0072 LearningRate 0.0009 Epoch: 3 Global Step: 73500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:09,145-Speed 6342.48 samples/sec Loss 9.9584 LearningRate 0.0009 Epoch: 3 Global Step: 73510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:12,393-Speed 6308.77 samples/sec Loss 10.0609 LearningRate 0.0009 Epoch: 3 Global Step: 73520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:15,642-Speed 6304.55 samples/sec Loss 9.8936 LearningRate 0.0009 Epoch: 3 Global Step: 73530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:18,885-Speed 6315.81 samples/sec Loss 10.0122 LearningRate 0.0009 Epoch: 3 Global Step: 73540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:22,138-Speed 6298.74 samples/sec Loss 9.9197 LearningRate 0.0009 Epoch: 3 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:25,385-Speed 6309.23 samples/sec Loss 9.9708 LearningRate 0.0009 Epoch: 3 Global 
Step: 73560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:28,627-Speed 6317.16 samples/sec Loss 10.0436 LearningRate 0.0009 Epoch: 3 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:31,870-Speed 6317.00 samples/sec Loss 9.9780 LearningRate 0.0009 Epoch: 3 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:35,129-Speed 6284.73 samples/sec Loss 9.9385 LearningRate 0.0009 Epoch: 3 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:38,375-Speed 6312.17 samples/sec Loss 9.9796 LearningRate 0.0009 Epoch: 3 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:41,605-Speed 6341.78 samples/sec Loss 9.9827 LearningRate 0.0009 Epoch: 3 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:44,852-Speed 6307.88 samples/sec Loss 9.8999 LearningRate 0.0009 Epoch: 3 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:48,094-Speed 6318.17 samples/sec Loss 9.9593 LearningRate 0.0009 Epoch: 3 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:51,340-Speed 6311.24 samples/sec Loss 9.9438 LearningRate 0.0009 Epoch: 3 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:54,582-Speed 6319.10 samples/sec Loss 9.9947 LearningRate 0.0009 Epoch: 3 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:25:57,825-Speed 6314.79 samples/sec Loss 9.9894 LearningRate 0.0009 Epoch: 3 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:01,070-Speed 6313.58 samples/sec Loss 10.0247 LearningRate 0.0009 Epoch: 3 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:04,314-Speed 6314.61 samples/sec Loss 10.0815 LearningRate 0.0009 Epoch: 3 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 69 
hours Training: 2022-03-31 22:26:07,566-Speed 6299.49 samples/sec Loss 9.9622 LearningRate 0.0009 Epoch: 3 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:10,809-Speed 6315.58 samples/sec Loss 9.9821 LearningRate 0.0009 Epoch: 3 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:14,056-Speed 6309.42 samples/sec Loss 9.9198 LearningRate 0.0009 Epoch: 3 Global Step: 73710 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:26:17,292-Speed 6329.64 samples/sec Loss 9.8314 LearningRate 0.0009 Epoch: 3 Global Step: 73720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:20,540-Speed 6307.73 samples/sec Loss 9.9668 LearningRate 0.0009 Epoch: 3 Global Step: 73730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:23,784-Speed 6313.50 samples/sec Loss 9.9035 LearningRate 0.0009 Epoch: 3 Global Step: 73740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:27,033-Speed 6306.40 samples/sec Loss 10.0032 LearningRate 0.0009 Epoch: 3 Global Step: 73750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:30,280-Speed 6308.76 samples/sec Loss 9.9315 LearningRate 0.0009 Epoch: 3 Global Step: 73760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:33,522-Speed 6318.80 samples/sec Loss 9.9501 LearningRate 0.0009 Epoch: 3 Global Step: 73770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:26:36,753-Speed 6339.71 samples/sec Loss 9.9235 LearningRate 0.0009 Epoch: 3 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:40,000-Speed 6309.23 samples/sec Loss 10.0404 LearningRate 0.0009 Epoch: 3 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:43,246-Speed 6310.02 samples/sec Loss 9.9452 LearningRate 0.0009 Epoch: 3 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:46,492-Speed 
6310.77 samples/sec Loss 9.9071 LearningRate 0.0009 Epoch: 3 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:49,738-Speed 6311.36 samples/sec Loss 9.9300 LearningRate 0.0009 Epoch: 3 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:52,982-Speed 6313.69 samples/sec Loss 10.0289 LearningRate 0.0009 Epoch: 3 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:56,229-Speed 6309.99 samples/sec Loss 9.9900 LearningRate 0.0009 Epoch: 3 Global Step: 73840 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:26:59,475-Speed 6310.22 samples/sec Loss 9.8457 LearningRate 0.0009 Epoch: 3 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:27:02,715-Speed 6321.92 samples/sec Loss 9.8634 LearningRate 0.0009 Epoch: 3 Global Step: 73860 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:27:05,958-Speed 6315.73 samples/sec Loss 9.8991 LearningRate 0.0009 Epoch: 3 Global Step: 73870 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:27:09,205-Speed 6309.58 samples/sec Loss 9.9286 LearningRate 0.0009 Epoch: 3 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:12,446-Speed 6320.04 samples/sec Loss 10.0006 LearningRate 0.0009 Epoch: 3 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:15,693-Speed 6309.95 samples/sec Loss 9.8838 LearningRate 0.0009 Epoch: 3 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:18,938-Speed 6312.77 samples/sec Loss 9.8738 LearningRate 0.0009 Epoch: 3 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:22,187-Speed 6304.80 samples/sec Loss 9.9506 LearningRate 0.0009 Epoch: 3 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:25,437-Speed 6302.40 samples/sec Loss 10.0138 LearningRate 0.0009 
Epoch: 3 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:28,679-Speed 6318.36 samples/sec Loss 9.8940 LearningRate 0.0009 Epoch: 3 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:31,930-Speed 6301.84 samples/sec Loss 9.8937 LearningRate 0.0009 Epoch: 3 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:35,174-Speed 6313.27 samples/sec Loss 9.9405 LearningRate 0.0009 Epoch: 3 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:38,421-Speed 6310.48 samples/sec Loss 9.9585 LearningRate 0.0009 Epoch: 3 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:41,654-Speed 6336.45 samples/sec Loss 9.8964 LearningRate 0.0009 Epoch: 3 Global Step: 73980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:44,898-Speed 6314.96 samples/sec Loss 9.8753 LearningRate 0.0009 Epoch: 3 Global Step: 73990 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:48,143-Speed 6312.61 samples/sec Loss 10.0307 LearningRate 0.0009 Epoch: 3 Global Step: 74000 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:51,388-Speed 6312.63 samples/sec Loss 9.8753 LearningRate 0.0009 Epoch: 3 Global Step: 74010 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:54,631-Speed 6314.94 samples/sec Loss 9.9520 LearningRate 0.0009 Epoch: 3 Global Step: 74020 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:27:57,877-Speed 6311.76 samples/sec Loss 9.9060 LearningRate 0.0009 Epoch: 3 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:01,158-Speed 6243.61 samples/sec Loss 10.0632 LearningRate 0.0009 Epoch: 3 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:04,404-Speed 6309.36 samples/sec Loss 9.8554 LearningRate 0.0009 Epoch: 3 Global Step: 74050 Fp16 Grad Scale: 65536 
Required: 69 hours Training: 2022-03-31 22:28:07,646-Speed 6318.78 samples/sec Loss 9.9033 LearningRate 0.0009 Epoch: 3 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:10,899-Speed 6297.39 samples/sec Loss 9.8822 LearningRate 0.0009 Epoch: 3 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:14,129-Speed 6341.83 samples/sec Loss 9.8983 LearningRate 0.0009 Epoch: 3 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:17,377-Speed 6307.68 samples/sec Loss 9.9145 LearningRate 0.0009 Epoch: 3 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:20,625-Speed 6306.22 samples/sec Loss 9.9492 LearningRate 0.0009 Epoch: 3 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:23,869-Speed 6314.21 samples/sec Loss 9.9349 LearningRate 0.0009 Epoch: 3 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:27,124-Speed 6292.37 samples/sec Loss 9.9491 LearningRate 0.0009 Epoch: 3 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:30,366-Speed 6319.40 samples/sec Loss 10.0035 LearningRate 0.0009 Epoch: 3 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:33,609-Speed 6316.51 samples/sec Loss 9.9404 LearningRate 0.0009 Epoch: 3 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:36,851-Speed 6318.38 samples/sec Loss 9.9743 LearningRate 0.0009 Epoch: 3 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:40,096-Speed 6312.90 samples/sec Loss 9.8727 LearningRate 0.0009 Epoch: 3 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:43,343-Speed 6309.90 samples/sec Loss 9.8520 LearningRate 0.0009 Epoch: 3 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 
22:28:46,586-Speed 6316.45 samples/sec Loss 9.8950 LearningRate 0.0009 Epoch: 3 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:28:49,820-Speed 6333.69 samples/sec Loss 9.8678 LearningRate 0.0009 Epoch: 3 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:53,069-Speed 6303.95 samples/sec Loss 9.8900 LearningRate 0.0009 Epoch: 3 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:56,311-Speed 6318.87 samples/sec Loss 9.9574 LearningRate 0.0009 Epoch: 3 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:28:59,568-Speed 6289.82 samples/sec Loss 9.9384 LearningRate 0.0009 Epoch: 3 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:02,810-Speed 6318.46 samples/sec Loss 9.8614 LearningRate 0.0009 Epoch: 3 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:06,056-Speed 6309.85 samples/sec Loss 9.8935 LearningRate 0.0009 Epoch: 3 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:09,297-Speed 6321.43 samples/sec Loss 9.9750 LearningRate 0.0009 Epoch: 3 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:12,543-Speed 6310.83 samples/sec Loss 9.9286 LearningRate 0.0009 Epoch: 3 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:15,785-Speed 6317.88 samples/sec Loss 9.9854 LearningRate 0.0009 Epoch: 3 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:19,016-Speed 6339.76 samples/sec Loss 9.9235 LearningRate 0.0009 Epoch: 3 Global Step: 74280 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:22,258-Speed 6318.09 samples/sec Loss 9.9489 LearningRate 0.0009 Epoch: 3 Global Step: 74290 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:25,503-Speed 6312.86 samples/sec Loss 9.9162 
LearningRate 0.0009 Epoch: 3 Global Step: 74300 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:28,797-Speed 6218.95 samples/sec Loss 9.9135 LearningRate 0.0009 Epoch: 3 Global Step: 74310 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:32,040-Speed 6316.94 samples/sec Loss 9.9848 LearningRate 0.0009 Epoch: 3 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:35,285-Speed 6312.29 samples/sec Loss 9.8550 LearningRate 0.0009 Epoch: 3 Global Step: 74330 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:38,531-Speed 6310.94 samples/sec Loss 9.9986 LearningRate 0.0009 Epoch: 3 Global Step: 74340 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:41,773-Speed 6319.16 samples/sec Loss 9.9222 LearningRate 0.0009 Epoch: 3 Global Step: 74350 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:45,015-Speed 6317.24 samples/sec Loss 10.0069 LearningRate 0.0009 Epoch: 3 Global Step: 74360 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:48,254-Speed 6324.57 samples/sec Loss 9.9894 LearningRate 0.0009 Epoch: 3 Global Step: 74370 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:29:51,498-Speed 6314.05 samples/sec Loss 9.9877 LearningRate 0.0009 Epoch: 3 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:54,742-Speed 6315.44 samples/sec Loss 9.8565 LearningRate 0.0009 Epoch: 3 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:29:57,990-Speed 6306.65 samples/sec Loss 9.9096 LearningRate 0.0009 Epoch: 3 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:01,235-Speed 6314.44 samples/sec Loss 9.9039 LearningRate 0.0009 Epoch: 3 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:04,483-Speed 6306.16 samples/sec Loss 9.9662 LearningRate 0.0009 Epoch: 3 Global Step: 74420 Fp16 
Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:07,726-Speed 6317.38 samples/sec Loss 9.9218 LearningRate 0.0009 Epoch: 3 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:10,971-Speed 6312.82 samples/sec Loss 9.9513 LearningRate 0.0009 Epoch: 3 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:14,217-Speed 6310.11 samples/sec Loss 9.8936 LearningRate 0.0009 Epoch: 3 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:17,458-Speed 6319.35 samples/sec Loss 9.9361 LearningRate 0.0009 Epoch: 3 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:20,708-Speed 6304.49 samples/sec Loss 9.9240 LearningRate 0.0009 Epoch: 3 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:23,947-Speed 6324.25 samples/sec Loss 9.8372 LearningRate 0.0009 Epoch: 3 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:27,198-Speed 6299.25 samples/sec Loss 9.8668 LearningRate 0.0009 Epoch: 3 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:30,443-Speed 6312.64 samples/sec Loss 9.9323 LearningRate 0.0009 Epoch: 3 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:33,690-Speed 6310.04 samples/sec Loss 9.9485 LearningRate 0.0009 Epoch: 3 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:36,937-Speed 6307.72 samples/sec Loss 9.8901 LearningRate 0.0009 Epoch: 3 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:40,182-Speed 6312.96 samples/sec Loss 9.9944 LearningRate 0.0009 Epoch: 3 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:43,430-Speed 6307.08 samples/sec Loss 9.8640 LearningRate 0.0009 Epoch: 3 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 69 hours Training: 
2022-03-31 22:30:46,677-Speed 6308.92 samples/sec Loss 9.8152 LearningRate 0.0009 Epoch: 3 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:49,919-Speed 6318.73 samples/sec Loss 9.7986 LearningRate 0.0009 Epoch: 3 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:53,164-Speed 6311.19 samples/sec Loss 9.8421 LearningRate 0.0009 Epoch: 3 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:56,396-Speed 6338.57 samples/sec Loss 9.8900 LearningRate 0.0009 Epoch: 3 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:30:59,642-Speed 6311.20 samples/sec Loss 9.9304 LearningRate 0.0009 Epoch: 3 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:02,889-Speed 6309.06 samples/sec Loss 9.9036 LearningRate 0.0009 Epoch: 3 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:06,130-Speed 6320.39 samples/sec Loss 9.9776 LearningRate 0.0009 Epoch: 3 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:09,374-Speed 6314.90 samples/sec Loss 9.9598 LearningRate 0.0009 Epoch: 3 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:12,619-Speed 6313.39 samples/sec Loss 9.9295 LearningRate 0.0009 Epoch: 3 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:15,859-Speed 6321.15 samples/sec Loss 9.9355 LearningRate 0.0009 Epoch: 3 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:19,102-Speed 6318.16 samples/sec Loss 9.9203 LearningRate 0.0009 Epoch: 3 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:22,346-Speed 6314.16 samples/sec Loss 9.9522 LearningRate 0.0009 Epoch: 3 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:25,691-Speed 6122.82 samples/sec Loss 
9.8446 LearningRate 0.0009 Epoch: 3 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:29,029-Speed 6136.29 samples/sec Loss 9.8199 LearningRate 0.0009 Epoch: 3 Global Step: 74680 Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:31:32,260-Speed 6340.50 samples/sec Loss 9.9038 LearningRate 0.0009 Epoch: 3 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:35,503-Speed 6316.33 samples/sec Loss 9.8218 LearningRate 0.0009 Epoch: 3 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:38,749-Speed 6310.50 samples/sec Loss 9.8813 LearningRate 0.0009 Epoch: 3 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:41,997-Speed 6307.49 samples/sec Loss 9.9050 LearningRate 0.0009 Epoch: 3 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:45,245-Speed 6306.65 samples/sec Loss 9.9365 LearningRate 0.0009 Epoch: 3 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:48,490-Speed 6313.58 samples/sec Loss 9.8816 LearningRate 0.0009 Epoch: 3 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:51,733-Speed 6315.09 samples/sec Loss 9.8881 LearningRate 0.0009 Epoch: 3 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:54,976-Speed 6317.96 samples/sec Loss 9.8473 LearningRate 0.0009 Epoch: 3 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:31:58,223-Speed 6308.81 samples/sec Loss 9.9810 LearningRate 0.0009 Epoch: 3 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:01,471-Speed 6307.18 samples/sec Loss 9.9221 LearningRate 0.0009 Epoch: 3 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:04,717-Speed 6309.06 samples/sec Loss 9.9251 LearningRate 0.0009 Epoch: 3 Global Step: 74790 
Fp16 Grad Scale: 131072 Required: 69 hours Training: 2022-03-31 22:32:07,947-Speed 6341.76 samples/sec Loss 9.9377 LearningRate 0.0009 Epoch: 3 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:11,201-Speed 6296.49 samples/sec Loss 9.8266 LearningRate 0.0009 Epoch: 3 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:14,512-Speed 6186.53 samples/sec Loss 9.8555 LearningRate 0.0009 Epoch: 3 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:17,759-Speed 6310.91 samples/sec Loss 9.8728 LearningRate 0.0009 Epoch: 3 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:21,002-Speed 6315.16 samples/sec Loss 9.9713 LearningRate 0.0009 Epoch: 3 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:24,248-Speed 6312.08 samples/sec Loss 9.9078 LearningRate 0.0009 Epoch: 3 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:27,491-Speed 6314.97 samples/sec Loss 9.9363 LearningRate 0.0009 Epoch: 3 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:30,739-Speed 6308.23 samples/sec Loss 9.9801 LearningRate 0.0009 Epoch: 3 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:33,987-Speed 6306.41 samples/sec Loss 9.8864 LearningRate 0.0009 Epoch: 3 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:37,231-Speed 6313.97 samples/sec Loss 9.8688 LearningRate 0.0009 Epoch: 3 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:40,464-Speed 6336.46 samples/sec Loss 9.9635 LearningRate 0.0009 Epoch: 3 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:43,712-Speed 6307.07 samples/sec Loss 9.9451 LearningRate 0.0009 Epoch: 3 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 69 hours Training: 
2022-03-31 22:32:46,956-Speed 6314.42 samples/sec Loss 9.9263 LearningRate 0.0009 Epoch: 3 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:50,204-Speed 6307.29 samples/sec Loss 9.8815 LearningRate 0.0009 Epoch: 3 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:53,454-Speed 6302.23 samples/sec Loss 9.9702 LearningRate 0.0009 Epoch: 3 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:56,703-Speed 6305.44 samples/sec Loss 9.9862 LearningRate 0.0009 Epoch: 3 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:32:59,947-Speed 6313.64 samples/sec Loss 9.9783 LearningRate 0.0009 Epoch: 3 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:03,193-Speed 6310.23 samples/sec Loss 9.9020 LearningRate 0.0009 Epoch: 3 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:06,442-Speed 6305.15 samples/sec Loss 9.9373 LearningRate 0.0009 Epoch: 3 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:09,673-Speed 6339.62 samples/sec Loss 9.9185 LearningRate 0.0009 Epoch: 3 Global Step: 74990 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:12,921-Speed 6307.43 samples/sec Loss 9.8797 LearningRate 0.0009 Epoch: 3 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:16,176-Speed 6292.48 samples/sec Loss 9.8960 LearningRate 0.0009 Epoch: 3 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:19,421-Speed 6313.97 samples/sec Loss 9.8639 LearningRate 0.0009 Epoch: 3 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:22,666-Speed 6313.46 samples/sec Loss 9.7804 LearningRate 0.0009 Epoch: 3 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:25,915-Speed 6303.87 samples/sec Loss 
9.9278 LearningRate 0.0009 Epoch: 3 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:29,177-Speed 6280.79 samples/sec Loss 9.9404 LearningRate 0.0009 Epoch: 3 Global Step: 75050 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:32,421-Speed 6313.76 samples/sec Loss 9.9354 LearningRate 0.0009 Epoch: 3 Global Step: 75060 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:35,670-Speed 6306.24 samples/sec Loss 9.9095 LearningRate 0.0009 Epoch: 3 Global Step: 75070 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:38,937-Speed 6268.18 samples/sec Loss 9.8539 LearningRate 0.0009 Epoch: 3 Global Step: 75080 Fp16 Grad Scale: 32768 Required: 69 hours Training: 2022-03-31 22:33:42,195-Speed 6289.21 samples/sec Loss 9.9050 LearningRate 0.0009 Epoch: 3 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:45,442-Speed 6308.35 samples/sec Loss 9.8669 LearningRate 0.0009 Epoch: 3 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:48,686-Speed 6314.90 samples/sec Loss 9.8937 LearningRate 0.0009 Epoch: 3 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:51,931-Speed 6312.67 samples/sec Loss 9.8475 LearningRate 0.0009 Epoch: 3 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:55,176-Speed 6312.74 samples/sec Loss 9.8213 LearningRate 0.0009 Epoch: 3 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:33:58,421-Speed 6310.56 samples/sec Loss 9.8369 LearningRate 0.0009 Epoch: 3 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:01,676-Speed 6295.34 samples/sec Loss 9.9320 LearningRate 0.0009 Epoch: 3 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:04,922-Speed 6309.88 samples/sec Loss 9.9593 LearningRate 0.0009 Epoch: 3 Global Step: 75160 
Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:08,165-Speed 6316.27 samples/sec Loss 10.0070 LearningRate 0.0009 Epoch: 3 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:11,410-Speed 6312.55 samples/sec Loss 9.8102 LearningRate 0.0009 Epoch: 3 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:14,645-Speed 6331.71 samples/sec Loss 9.7745 LearningRate 0.0009 Epoch: 3 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:17,987-Speed 6130.92 samples/sec Loss 10.0251 LearningRate 0.0009 Epoch: 3 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:21,233-Speed 6309.30 samples/sec Loss 9.9702 LearningRate 0.0009 Epoch: 3 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:24,482-Speed 6306.64 samples/sec Loss 9.9613 LearningRate 0.0009 Epoch: 3 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:27,734-Speed 6298.90 samples/sec Loss 9.9308 LearningRate 0.0009 Epoch: 3 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:30,979-Speed 6311.69 samples/sec Loss 9.9198 LearningRate 0.0009 Epoch: 3 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:34,222-Speed 6317.15 samples/sec Loss 9.9356 LearningRate 0.0009 Epoch: 3 Global Step: 75250 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:37,470-Speed 6307.09 samples/sec Loss 9.8765 LearningRate 0.0009 Epoch: 3 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 69 hours Training: 2022-03-31 22:34:40,718-Speed 6306.44 samples/sec Loss 9.8651 LearningRate 0.0009 Epoch: 3 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:34:43,965-Speed 6309.96 samples/sec Loss 9.8284 LearningRate 0.0009 Epoch: 3 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:34:47,199-Speed 6333.36 samples/sec Loss 9.9670 LearningRate 0.0009 Epoch: 3 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:34:50,447-Speed 6306.47 samples/sec Loss 9.8680 LearningRate 0.0009 Epoch: 3 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:34:53,693-Speed 6310.94 samples/sec Loss 9.9779 LearningRate 0.0009 Epoch: 3 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:34:56,939-Speed 6311.68 samples/sec Loss 9.8610 LearningRate 0.0009 Epoch: 3 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:00,171-Speed 6336.89 samples/sec Loss 9.8801 LearningRate 0.0009 Epoch: 3 Global Step: 75330 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:03,413-Speed 6319.86 samples/sec Loss 9.9051 LearningRate 0.0009 Epoch: 3 Global Step: 75340 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:06,661-Speed 6305.53 samples/sec Loss 9.8389 LearningRate 0.0009 Epoch: 3 Global Step: 75350 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:09,907-Speed 6311.15 samples/sec Loss 9.9061 LearningRate 0.0009 Epoch: 3 Global Step: 75360 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:13,152-Speed 6312.25 samples/sec Loss 9.9393 LearningRate 0.0009 Epoch: 3 Global Step: 75370 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:16,398-Speed 6310.85 samples/sec Loss 9.9102 LearningRate 0.0009 Epoch: 3 Global Step: 75380 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:19,642-Speed 6314.29 samples/sec Loss 9.8739 LearningRate 0.0009 Epoch: 3 Global Step: 75390 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:22,887-Speed 6312.68 samples/sec Loss 9.9517 LearningRate 0.0009 Epoch: 3 Global Step: 75400 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:26,163-Speed 6253.63 samples/sec Loss 
9.7888 LearningRate 0.0009 Epoch: 3 Global Step: 75410 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:29,409-Speed 6310.43 samples/sec Loss 9.9157 LearningRate 0.0009 Epoch: 3 Global Step: 75420 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:35:32,653-Speed 6313.76 samples/sec Loss 9.8524 LearningRate 0.0009 Epoch: 3 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:35,914-Speed 6283.01 samples/sec Loss 9.9158 LearningRate 0.0009 Epoch: 3 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:39,162-Speed 6306.47 samples/sec Loss 9.8263 LearningRate 0.0009 Epoch: 3 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:42,412-Speed 6303.27 samples/sec Loss 9.9183 LearningRate 0.0009 Epoch: 3 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:45,658-Speed 6310.61 samples/sec Loss 9.7917 LearningRate 0.0009 Epoch: 3 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:48,905-Speed 6308.84 samples/sec Loss 9.8822 LearningRate 0.0009 Epoch: 3 Global Step: 75480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:52,154-Speed 6305.44 samples/sec Loss 9.9194 LearningRate 0.0009 Epoch: 3 Global Step: 75490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:55,393-Speed 6324.52 samples/sec Loss 9.9048 LearningRate 0.0009 Epoch: 3 Global Step: 75500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:35:58,636-Speed 6316.60 samples/sec Loss 9.8036 LearningRate 0.0009 Epoch: 3 Global Step: 75510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:01,879-Speed 6317.05 samples/sec Loss 9.8843 LearningRate 0.0009 Epoch: 3 Global Step: 75520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:05,111-Speed 6336.39 samples/sec Loss 9.8469 LearningRate 0.0009 Epoch: 3 Global Step: 75530 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:08,356-Speed 6314.33 samples/sec Loss 9.8743 LearningRate 0.0009 Epoch: 3 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:11,599-Speed 6315.27 samples/sec Loss 9.8917 LearningRate 0.0009 Epoch: 3 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:14,842-Speed 6316.88 samples/sec Loss 9.8516 LearningRate 0.0009 Epoch: 3 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:18,087-Speed 6312.75 samples/sec Loss 9.9054 LearningRate 0.0009 Epoch: 3 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:21,329-Speed 6317.79 samples/sec Loss 9.8988 LearningRate 0.0009 Epoch: 3 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:24,581-Speed 6299.50 samples/sec Loss 9.8488 LearningRate 0.0009 Epoch: 3 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:27,824-Speed 6316.97 samples/sec Loss 9.9030 LearningRate 0.0009 Epoch: 3 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:31,069-Speed 6312.26 samples/sec Loss 9.8007 LearningRate 0.0009 Epoch: 3 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:34,310-Speed 6320.44 samples/sec Loss 9.8027 LearningRate 0.0009 Epoch: 3 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:37,555-Speed 6313.94 samples/sec Loss 9.8485 LearningRate 0.0009 Epoch: 3 Global Step: 75630 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:36:40,785-Speed 6341.74 samples/sec Loss 9.8224 LearningRate 0.0009 Epoch: 3 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:44,035-Speed 6301.65 samples/sec Loss 9.8436 LearningRate 0.0009 Epoch: 3 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:36:47,285-Speed 6303.07 samples/sec Loss 9.8858 LearningRate 0.0009 Epoch: 3 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:50,534-Speed 6305.41 samples/sec Loss 9.8287 LearningRate 0.0009 Epoch: 3 Global Step: 75670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:53,784-Speed 6302.68 samples/sec Loss 9.9082 LearningRate 0.0009 Epoch: 3 Global Step: 75680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:36:57,031-Speed 6308.50 samples/sec Loss 9.8981 LearningRate 0.0009 Epoch: 3 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:00,285-Speed 6296.60 samples/sec Loss 9.9666 LearningRate 0.0009 Epoch: 3 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:03,532-Speed 6307.54 samples/sec Loss 9.8935 LearningRate 0.0009 Epoch: 3 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:06,779-Speed 6310.38 samples/sec Loss 9.8113 LearningRate 0.0009 Epoch: 3 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:10,023-Speed 6314.67 samples/sec Loss 9.8794 LearningRate 0.0009 Epoch: 3 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:13,256-Speed 6335.48 samples/sec Loss 9.8446 LearningRate 0.0009 Epoch: 3 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:16,501-Speed 6312.35 samples/sec Loss 9.7774 LearningRate 0.0009 Epoch: 3 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:19,749-Speed 6306.84 samples/sec Loss 9.8856 LearningRate 0.0009 Epoch: 3 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:22,996-Speed 6309.19 samples/sec Loss 9.9719 LearningRate 0.0009 Epoch: 3 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:26,267-Speed 6263.19 samples/sec Loss 
9.9018 LearningRate 0.0009 Epoch: 3 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:29,514-Speed 6307.47 samples/sec Loss 9.8479 LearningRate 0.0009 Epoch: 3 Global Step: 75790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:32,763-Speed 6305.34 samples/sec Loss 9.8809 LearningRate 0.0009 Epoch: 3 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:36,009-Speed 6310.10 samples/sec Loss 9.8234 LearningRate 0.0009 Epoch: 3 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:39,254-Speed 6313.11 samples/sec Loss 9.8618 LearningRate 0.0009 Epoch: 3 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:42,511-Speed 6288.72 samples/sec Loss 9.8891 LearningRate 0.0009 Epoch: 3 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:45,744-Speed 6336.16 samples/sec Loss 9.8607 LearningRate 0.0009 Epoch: 3 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:48,994-Speed 6303.13 samples/sec Loss 9.9076 LearningRate 0.0009 Epoch: 3 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:52,236-Speed 6319.06 samples/sec Loss 9.8389 LearningRate 0.0009 Epoch: 3 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:55,484-Speed 6306.98 samples/sec Loss 9.8348 LearningRate 0.0009 Epoch: 3 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:37:58,739-Speed 6293.15 samples/sec Loss 9.9985 LearningRate 0.0009 Epoch: 3 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:01,984-Speed 6310.88 samples/sec Loss 9.9209 LearningRate 0.0009 Epoch: 3 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:05,233-Speed 6306.59 samples/sec Loss 9.8184 LearningRate 0.0009 Epoch: 3 Global Step: 75900 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:08,480-Speed 6308.41 samples/sec Loss 9.7766 LearningRate 0.0009 Epoch: 3 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:11,726-Speed 6311.08 samples/sec Loss 9.7714 LearningRate 0.0009 Epoch: 3 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:14,971-Speed 6313.24 samples/sec Loss 9.8883 LearningRate 0.0009 Epoch: 3 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:18,218-Speed 6307.35 samples/sec Loss 9.8140 LearningRate 0.0009 Epoch: 3 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:38:21,451-Speed 6336.21 samples/sec Loss 9.8748 LearningRate 0.0009 Epoch: 3 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:24,697-Speed 6310.71 samples/sec Loss 9.8515 LearningRate 0.0009 Epoch: 3 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:27,944-Speed 6309.45 samples/sec Loss 9.8658 LearningRate 0.0009 Epoch: 3 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:31,188-Speed 6313.73 samples/sec Loss 9.9351 LearningRate 0.0009 Epoch: 3 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:34,433-Speed 6313.46 samples/sec Loss 9.8474 LearningRate 0.0009 Epoch: 3 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:37,677-Speed 6314.70 samples/sec Loss 9.8610 LearningRate 0.0009 Epoch: 3 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:40,925-Speed 6305.77 samples/sec Loss 9.8661 LearningRate 0.0009 Epoch: 3 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:44,171-Speed 6311.05 samples/sec Loss 9.8394 LearningRate 0.0009 Epoch: 3 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:38:47,416-Speed 6312.89 samples/sec Loss 9.8524 LearningRate 0.0009 Epoch: 3 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:50,661-Speed 6313.10 samples/sec Loss 9.8766 LearningRate 0.0009 Epoch: 3 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:53,896-Speed 6332.38 samples/sec Loss 9.8160 LearningRate 0.0009 Epoch: 3 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:38:57,140-Speed 6313.33 samples/sec Loss 9.9436 LearningRate 0.0009 Epoch: 3 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:00,389-Speed 6305.99 samples/sec Loss 9.8720 LearningRate 0.0009 Epoch: 3 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:03,637-Speed 6305.50 samples/sec Loss 9.8308 LearningRate 0.0009 Epoch: 3 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:06,885-Speed 6308.26 samples/sec Loss 9.9681 LearningRate 0.0009 Epoch: 3 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:10,130-Speed 6313.67 samples/sec Loss 9.8567 LearningRate 0.0009 Epoch: 3 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:13,372-Speed 6317.30 samples/sec Loss 9.8719 LearningRate 0.0009 Epoch: 3 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:16,628-Speed 6293.30 samples/sec Loss 9.9192 LearningRate 0.0009 Epoch: 3 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:19,874-Speed 6310.50 samples/sec Loss 9.8618 LearningRate 0.0009 Epoch: 3 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:23,121-Speed 6308.35 samples/sec Loss 9.8207 LearningRate 0.0009 Epoch: 3 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:26,363-Speed 6317.91 samples/sec Loss 
9.8257 LearningRate 0.0009 Epoch: 3 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:39:29,598-Speed 6332.37 samples/sec Loss 9.8959 LearningRate 0.0009 Epoch: 3 Global Step: 76160 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:32,843-Speed 6313.26 samples/sec Loss 9.7791 LearningRate 0.0009 Epoch: 3 Global Step: 76170 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:36,089-Speed 6310.43 samples/sec Loss 9.9006 LearningRate 0.0009 Epoch: 3 Global Step: 76180 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:39,333-Speed 6313.81 samples/sec Loss 9.8193 LearningRate 0.0009 Epoch: 3 Global Step: 76190 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:42,581-Speed 6306.09 samples/sec Loss 9.7735 LearningRate 0.0009 Epoch: 3 Global Step: 76200 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:45,828-Speed 6310.14 samples/sec Loss 9.8042 LearningRate 0.0009 Epoch: 3 Global Step: 76210 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:49,076-Speed 6307.07 samples/sec Loss 9.9439 LearningRate 0.0009 Epoch: 3 Global Step: 76220 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:52,317-Speed 6319.01 samples/sec Loss 9.9326 LearningRate 0.0009 Epoch: 3 Global Step: 76230 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:55,570-Speed 6297.79 samples/sec Loss 9.8810 LearningRate 0.0009 Epoch: 3 Global Step: 76240 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:39:58,820-Speed 6302.41 samples/sec Loss 9.8787 LearningRate 0.0009 Epoch: 3 Global Step: 76250 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:40:02,070-Speed 6303.22 samples/sec Loss 9.8503 LearningRate 0.0009 Epoch: 3 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:05,315-Speed 6312.11 samples/sec Loss 9.8964 LearningRate 0.0009 Epoch: 3 Global Step: 76270 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:08,563-Speed 6307.10 samples/sec Loss 9.8642 LearningRate 0.0009 Epoch: 3 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:11,808-Speed 6312.54 samples/sec Loss 9.8859 LearningRate 0.0009 Epoch: 3 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:15,054-Speed 6311.35 samples/sec Loss 9.8563 LearningRate 0.0009 Epoch: 3 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:18,300-Speed 6311.15 samples/sec Loss 9.8986 LearningRate 0.0009 Epoch: 3 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:21,546-Speed 6310.28 samples/sec Loss 9.8280 LearningRate 0.0009 Epoch: 3 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:24,795-Speed 6306.37 samples/sec Loss 9.9200 LearningRate 0.0009 Epoch: 3 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:28,036-Speed 6320.34 samples/sec Loss 9.7801 LearningRate 0.0009 Epoch: 3 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:31,278-Speed 6317.49 samples/sec Loss 9.8791 LearningRate 0.0009 Epoch: 3 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:34,510-Speed 6338.16 samples/sec Loss 9.7910 LearningRate 0.0009 Epoch: 3 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:37,755-Speed 6312.81 samples/sec Loss 9.8001 LearningRate 0.0009 Epoch: 3 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:41,000-Speed 6312.62 samples/sec Loss 9.8373 LearningRate 0.0009 Epoch: 3 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:44,243-Speed 6316.18 samples/sec Loss 9.9019 LearningRate 0.0009 Epoch: 3 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:40:47,488-Speed 6312.28 samples/sec Loss 9.7857 LearningRate 0.0009 Epoch: 3 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:50,736-Speed 6306.49 samples/sec Loss 9.7745 LearningRate 0.0009 Epoch: 3 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:53,978-Speed 6320.38 samples/sec Loss 9.8382 LearningRate 0.0009 Epoch: 3 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:40:57,221-Speed 6315.37 samples/sec Loss 9.9051 LearningRate 0.0009 Epoch: 3 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:00,466-Speed 6312.42 samples/sec Loss 9.8169 LearningRate 0.0009 Epoch: 3 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:03,712-Speed 6310.42 samples/sec Loss 9.8048 LearningRate 0.0009 Epoch: 3 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:06,984-Speed 6261.53 samples/sec Loss 9.8512 LearningRate 0.0009 Epoch: 3 Global Step: 76460 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:41:10,215-Speed 6338.73 samples/sec Loss 9.8088 LearningRate 0.0009 Epoch: 3 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:13,462-Speed 6308.55 samples/sec Loss 9.8615 LearningRate 0.0009 Epoch: 3 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:16,702-Speed 6323.64 samples/sec Loss 9.8664 LearningRate 0.0009 Epoch: 3 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:19,949-Speed 6308.45 samples/sec Loss 9.7603 LearningRate 0.0009 Epoch: 3 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:23,197-Speed 6307.42 samples/sec Loss 9.8310 LearningRate 0.0009 Epoch: 3 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:26,440-Speed 6317.64 samples/sec Loss 
9.8947 LearningRate 0.0009 Epoch: 3 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:29,711-Speed 6262.17 samples/sec Loss 9.7785 LearningRate 0.0009 Epoch: 3 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:32,957-Speed 6311.35 samples/sec Loss 9.8520 LearningRate 0.0009 Epoch: 3 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:36,201-Speed 6313.26 samples/sec Loss 9.8073 LearningRate 0.0009 Epoch: 3 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:39,467-Speed 6272.23 samples/sec Loss 9.7515 LearningRate 0.0009 Epoch: 3 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:42,702-Speed 6333.57 samples/sec Loss 9.8478 LearningRate 0.0009 Epoch: 3 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:45,945-Speed 6314.85 samples/sec Loss 9.8323 LearningRate 0.0009 Epoch: 3 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:49,193-Speed 6306.44 samples/sec Loss 9.8253 LearningRate 0.0009 Epoch: 3 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:52,439-Speed 6310.92 samples/sec Loss 9.8319 LearningRate 0.0009 Epoch: 3 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:55,681-Speed 6319.78 samples/sec Loss 9.8701 LearningRate 0.0009 Epoch: 3 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:41:58,923-Speed 6318.04 samples/sec Loss 9.8287 LearningRate 0.0009 Epoch: 3 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:02,167-Speed 6313.57 samples/sec Loss 9.7252 LearningRate 0.0009 Epoch: 3 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:05,412-Speed 6313.18 samples/sec Loss 9.8527 LearningRate 0.0009 Epoch: 3 Global Step: 76640 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:08,662-Speed 6302.71 samples/sec Loss 9.7723 LearningRate 0.0009 Epoch: 3 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:11,909-Speed 6308.44 samples/sec Loss 9.7721 LearningRate 0.0009 Epoch: 3 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:15,154-Speed 6314.04 samples/sec Loss 9.7617 LearningRate 0.0009 Epoch: 3 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:42:18,382-Speed 6345.28 samples/sec Loss 9.8157 LearningRate 0.0009 Epoch: 3 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:21,630-Speed 6306.90 samples/sec Loss 9.8947 LearningRate 0.0009 Epoch: 3 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:24,877-Speed 6308.52 samples/sec Loss 9.8778 LearningRate 0.0009 Epoch: 3 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:28,118-Speed 6320.53 samples/sec Loss 9.8377 LearningRate 0.0009 Epoch: 3 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:31,364-Speed 6311.39 samples/sec Loss 9.8076 LearningRate 0.0009 Epoch: 3 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:34,607-Speed 6316.44 samples/sec Loss 9.8631 LearningRate 0.0009 Epoch: 3 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:37,852-Speed 6314.02 samples/sec Loss 9.7700 LearningRate 0.0009 Epoch: 3 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:41,098-Speed 6309.93 samples/sec Loss 9.8126 LearningRate 0.0009 Epoch: 3 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:44,343-Speed 6312.41 samples/sec Loss 9.7962 LearningRate 0.0009 Epoch: 3 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:42:47,590-Speed 6308.46 samples/sec Loss 9.7798 LearningRate 0.0009 Epoch: 3 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:50,850-Speed 6284.63 samples/sec Loss 9.7981 LearningRate 0.0009 Epoch: 3 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:54,096-Speed 6309.19 samples/sec Loss 9.8734 LearningRate 0.0009 Epoch: 3 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:42:57,323-Speed 6348.78 samples/sec Loss 9.7209 LearningRate 0.0009 Epoch: 3 Global Step: 76800 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:00,564-Speed 6319.50 samples/sec Loss 9.8467 LearningRate 0.0009 Epoch: 3 Global Step: 76810 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:03,811-Speed 6309.43 samples/sec Loss 9.8530 LearningRate 0.0009 Epoch: 3 Global Step: 76820 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:07,057-Speed 6311.13 samples/sec Loss 9.8011 LearningRate 0.0009 Epoch: 3 Global Step: 76830 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:10,303-Speed 6311.32 samples/sec Loss 9.8717 LearningRate 0.0009 Epoch: 3 Global Step: 76840 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:13,548-Speed 6310.82 samples/sec Loss 9.6880 LearningRate 0.0009 Epoch: 3 Global Step: 76850 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:16,794-Speed 6310.81 samples/sec Loss 9.8388 LearningRate 0.0009 Epoch: 3 Global Step: 76860 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:20,040-Speed 6311.26 samples/sec Loss 9.8103 LearningRate 0.0009 Epoch: 3 Global Step: 76870 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:23,290-Speed 6303.09 samples/sec Loss 9.8458 LearningRate 0.0009 Epoch: 3 Global Step: 76880 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:26,542-Speed 6298.89 samples/sec Loss 
9.8065 LearningRate 0.0009 Epoch: 3 Global Step: 76890 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:43:29,785-Speed 6316.79 samples/sec Loss 9.7477 LearningRate 0.0009 Epoch: 3 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:33,035-Speed 6303.35 samples/sec Loss 9.8260 LearningRate 0.0009 Epoch: 3 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:36,279-Speed 6314.18 samples/sec Loss 9.8172 LearningRate 0.0009 Epoch: 3 Global Step: 76920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:39,527-Speed 6308.11 samples/sec Loss 9.8328 LearningRate 0.0009 Epoch: 3 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:42,778-Speed 6300.69 samples/sec Loss 9.8996 LearningRate 0.0009 Epoch: 3 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:46,025-Speed 6307.79 samples/sec Loss 9.7780 LearningRate 0.0009 Epoch: 3 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:49,272-Speed 6309.48 samples/sec Loss 9.7994 LearningRate 0.0009 Epoch: 3 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:52,517-Speed 6312.11 samples/sec Loss 9.8283 LearningRate 0.0009 Epoch: 3 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:55,764-Speed 6309.23 samples/sec Loss 9.8170 LearningRate 0.0009 Epoch: 3 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:43:59,013-Speed 6305.61 samples/sec Loss 9.8441 LearningRate 0.0009 Epoch: 3 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:02,242-Speed 6342.09 samples/sec Loss 9.7733 LearningRate 0.0009 Epoch: 3 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:05,492-Speed 6304.47 samples/sec Loss 9.9066 LearningRate 0.0009 Epoch: 3 Global Step: 77010 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:08,734-Speed 6318.83 samples/sec Loss 9.7411 LearningRate 0.0009 Epoch: 3 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:11,980-Speed 6308.63 samples/sec Loss 9.8188 LearningRate 0.0009 Epoch: 3 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:15,225-Speed 6313.15 samples/sec Loss 9.7677 LearningRate 0.0009 Epoch: 3 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:18,471-Speed 6310.39 samples/sec Loss 9.8139 LearningRate 0.0009 Epoch: 3 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:21,706-Speed 6333.12 samples/sec Loss 9.8644 LearningRate 0.0009 Epoch: 3 Global Step: 77060 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:24,958-Speed 6299.63 samples/sec Loss 9.8749 LearningRate 0.0009 Epoch: 3 Global Step: 77070 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:28,204-Speed 6309.21 samples/sec Loss 9.8172 LearningRate 0.0009 Epoch: 3 Global Step: 77080 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:31,447-Speed 6317.60 samples/sec Loss 9.8466 LearningRate 0.0009 Epoch: 3 Global Step: 77090 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:34,691-Speed 6313.51 samples/sec Loss 9.8171 LearningRate 0.0009 Epoch: 3 Global Step: 77100 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:37,940-Speed 6306.54 samples/sec Loss 9.8478 LearningRate 0.0009 Epoch: 3 Global Step: 77110 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:41,184-Speed 6313.86 samples/sec Loss 9.8271 LearningRate 0.0009 Epoch: 3 Global Step: 77120 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:44,433-Speed 6305.34 samples/sec Loss 9.8699 LearningRate 0.0009 Epoch: 3 Global Step: 77130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 
2022-03-31 22:44:47,677-Speed 6315.96 samples/sec Loss 9.8629 LearningRate 0.0009 Epoch: 3 Global Step: 77140 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:50,923-Speed 6309.89 samples/sec Loss 9.6740 LearningRate 0.0009 Epoch: 3 Global Step: 77150 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:44:54,177-Speed 6295.63 samples/sec Loss 9.8055 LearningRate 0.0009 Epoch: 3 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:44:57,419-Speed 6318.89 samples/sec Loss 9.8528 LearningRate 0.0009 Epoch: 3 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:00,666-Speed 6308.44 samples/sec Loss 9.8799 LearningRate 0.0009 Epoch: 3 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:03,912-Speed 6311.01 samples/sec Loss 9.8337 LearningRate 0.0009 Epoch: 3 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:07,156-Speed 6313.31 samples/sec Loss 9.8664 LearningRate 0.0009 Epoch: 3 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:10,409-Speed 6298.04 samples/sec Loss 9.8846 LearningRate 0.0009 Epoch: 3 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:13,669-Speed 6283.72 samples/sec Loss 9.8467 LearningRate 0.0009 Epoch: 3 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:16,912-Speed 6316.69 samples/sec Loss 9.8352 LearningRate 0.0009 Epoch: 3 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:20,156-Speed 6314.43 samples/sec Loss 9.8480 LearningRate 0.0009 Epoch: 3 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:23,403-Speed 6307.45 samples/sec Loss 9.7941 LearningRate 0.0009 Epoch: 3 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:26,633-Speed 6341.60 samples/sec Loss 
9.8768 LearningRate 0.0009 Epoch: 3 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:29,884-Speed 6302.11 samples/sec Loss 9.8150 LearningRate 0.0009 Epoch: 3 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:33,128-Speed 6313.94 samples/sec Loss 9.7260 LearningRate 0.0009 Epoch: 3 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:45:36,358-Speed 6341.92 samples/sec Loss 9.8674 LearningRate 0.0009 Epoch: 3 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:39,599-Speed 6320.61 samples/sec Loss 9.8770 LearningRate 0.0009 Epoch: 3 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:42,843-Speed 6314.41 samples/sec Loss 9.7748 LearningRate 0.0009 Epoch: 3 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:46,090-Speed 6308.36 samples/sec Loss 9.7148 LearningRate 0.0009 Epoch: 3 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:49,368-Speed 6249.47 samples/sec Loss 9.8169 LearningRate 0.0009 Epoch: 3 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:52,616-Speed 6306.72 samples/sec Loss 9.8140 LearningRate 0.0009 Epoch: 3 Global Step: 77340 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:55,861-Speed 6314.21 samples/sec Loss 9.8668 LearningRate 0.0009 Epoch: 3 Global Step: 77350 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:45:59,106-Speed 6313.42 samples/sec Loss 9.7584 LearningRate 0.0009 Epoch: 3 Global Step: 77360 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:46:02,361-Speed 6293.09 samples/sec Loss 9.8824 LearningRate 0.0009 Epoch: 3 Global Step: 77370 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:46:05,610-Speed 6304.65 samples/sec Loss 9.8180 LearningRate 0.0009 Epoch: 3 Global Step: 77380 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:46:08,854-Speed 6314.92 samples/sec Loss 9.7055 LearningRate 0.0009 Epoch: 3 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:12,101-Speed 6308.25 samples/sec Loss 9.7914 LearningRate 0.0009 Epoch: 3 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:15,347-Speed 6309.17 samples/sec Loss 9.7580 LearningRate 0.0009 Epoch: 3 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:18,596-Speed 6305.94 samples/sec Loss 9.8274 LearningRate 0.0009 Epoch: 3 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:21,844-Speed 6306.22 samples/sec Loss 9.8566 LearningRate 0.0009 Epoch: 3 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:25,091-Speed 6308.27 samples/sec Loss 9.8448 LearningRate 0.0009 Epoch: 3 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:28,342-Speed 6301.59 samples/sec Loss 9.7522 LearningRate 0.0009 Epoch: 3 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:31,582-Speed 6321.97 samples/sec Loss 9.7457 LearningRate 0.0009 Epoch: 3 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:34,834-Speed 6300.54 samples/sec Loss 9.8300 LearningRate 0.0009 Epoch: 3 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:38,075-Speed 6320.36 samples/sec Loss 9.7568 LearningRate 0.0009 Epoch: 3 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:41,321-Speed 6310.50 samples/sec Loss 9.7969 LearningRate 0.0009 Epoch: 3 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:46:44,554-Speed 6336.26 samples/sec Loss 9.7790 LearningRate 0.0009 Epoch: 3 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:46:47,866-Speed 6184.18 samples/sec Loss 9.7525 LearningRate 0.0009 Epoch: 3 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:51,137-Speed 6263.03 samples/sec Loss 9.7192 LearningRate 0.0009 Epoch: 3 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:54,382-Speed 6312.29 samples/sec Loss 9.6832 LearningRate 0.0009 Epoch: 3 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:46:57,626-Speed 6314.79 samples/sec Loss 9.7919 LearningRate 0.0009 Epoch: 3 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:00,872-Speed 6310.75 samples/sec Loss 9.8481 LearningRate 0.0009 Epoch: 3 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:04,120-Speed 6306.51 samples/sec Loss 9.9093 LearningRate 0.0009 Epoch: 3 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:07,366-Speed 6312.29 samples/sec Loss 9.7412 LearningRate 0.0009 Epoch: 3 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:10,611-Speed 6312.43 samples/sec Loss 9.7975 LearningRate 0.0009 Epoch: 3 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:13,859-Speed 6306.52 samples/sec Loss 9.7138 LearningRate 0.0009 Epoch: 3 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:17,092-Speed 6335.76 samples/sec Loss 9.6916 LearningRate 0.0009 Epoch: 3 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:20,337-Speed 6313.68 samples/sec Loss 9.8009 LearningRate 0.0009 Epoch: 3 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:23,579-Speed 6317.27 samples/sec Loss 9.7471 LearningRate 0.0009 Epoch: 3 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:26,826-Speed 6308.85 samples/sec Loss 
9.8366 LearningRate 0.0009 Epoch: 3 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:30,096-Speed 6264.45 samples/sec Loss 9.7327 LearningRate 0.0009 Epoch: 3 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:33,342-Speed 6311.68 samples/sec Loss 9.8453 LearningRate 0.0009 Epoch: 3 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:36,588-Speed 6308.97 samples/sec Loss 9.8672 LearningRate 0.0009 Epoch: 3 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:39,835-Speed 6309.71 samples/sec Loss 9.7205 LearningRate 0.0009 Epoch: 3 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:43,086-Speed 6300.45 samples/sec Loss 9.8222 LearningRate 0.0009 Epoch: 3 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:46,332-Speed 6311.97 samples/sec Loss 9.8900 LearningRate 0.0009 Epoch: 3 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:49,567-Speed 6331.89 samples/sec Loss 9.8004 LearningRate 0.0009 Epoch: 3 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:52,813-Speed 6310.80 samples/sec Loss 9.7777 LearningRate 0.0009 Epoch: 3 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:56,058-Speed 6311.56 samples/sec Loss 9.7785 LearningRate 0.0009 Epoch: 3 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:47:59,304-Speed 6311.41 samples/sec Loss 9.7386 LearningRate 0.0009 Epoch: 3 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:02,556-Speed 6298.94 samples/sec Loss 9.8162 LearningRate 0.0009 Epoch: 3 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:05,799-Speed 6315.39 samples/sec Loss 9.7678 LearningRate 0.0009 Epoch: 3 Global Step: 77750 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:09,044-Speed 6313.99 samples/sec Loss 9.7768 LearningRate 0.0009 Epoch: 3 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:12,297-Speed 6295.87 samples/sec Loss 9.7841 LearningRate 0.0009 Epoch: 3 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:15,549-Speed 6300.17 samples/sec Loss 9.8630 LearningRate 0.0009 Epoch: 3 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:18,792-Speed 6317.11 samples/sec Loss 9.8278 LearningRate 0.0009 Epoch: 3 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:22,024-Speed 6337.59 samples/sec Loss 9.7820 LearningRate 0.0009 Epoch: 3 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:25,279-Speed 6293.62 samples/sec Loss 9.8001 LearningRate 0.0009 Epoch: 3 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:28,521-Speed 6318.21 samples/sec Loss 9.8243 LearningRate 0.0009 Epoch: 3 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:31,768-Speed 6310.19 samples/sec Loss 9.8090 LearningRate 0.0009 Epoch: 3 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:35,044-Speed 6252.99 samples/sec Loss 9.6731 LearningRate 0.0009 Epoch: 3 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:38,336-Speed 6222.84 samples/sec Loss 9.8630 LearningRate 0.0009 Epoch: 3 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:41,581-Speed 6311.56 samples/sec Loss 9.7243 LearningRate 0.0009 Epoch: 3 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:44,828-Speed 6309.81 samples/sec Loss 9.7833 LearningRate 0.0009 Epoch: 3 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:48:48,073-Speed 6312.67 samples/sec Loss 9.7144 LearningRate 0.0009 Epoch: 3 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:51,320-Speed 6307.74 samples/sec Loss 9.6753 LearningRate 0.0009 Epoch: 3 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:48:54,565-Speed 6313.25 samples/sec Loss 9.8535 LearningRate 0.0009 Epoch: 3 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:48:57,799-Speed 6334.36 samples/sec Loss 9.7751 LearningRate 0.0009 Epoch: 3 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:01,045-Speed 6309.51 samples/sec Loss 9.7977 LearningRate 0.0009 Epoch: 3 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:04,290-Speed 6312.75 samples/sec Loss 9.6699 LearningRate 0.0009 Epoch: 3 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:07,534-Speed 6314.32 samples/sec Loss 9.8352 LearningRate 0.0009 Epoch: 3 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:10,780-Speed 6310.54 samples/sec Loss 9.7504 LearningRate 0.0009 Epoch: 3 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:14,029-Speed 6306.30 samples/sec Loss 9.8732 LearningRate 0.0009 Epoch: 3 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:17,273-Speed 6313.84 samples/sec Loss 9.6953 LearningRate 0.0009 Epoch: 3 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:20,514-Speed 6320.39 samples/sec Loss 9.7546 LearningRate 0.0009 Epoch: 3 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:23,765-Speed 6301.63 samples/sec Loss 9.7607 LearningRate 0.0009 Epoch: 3 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:27,012-Speed 6309.79 samples/sec Loss 
9.7509 LearningRate 0.0009 Epoch: 3 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:30,244-Speed 6336.60 samples/sec Loss 9.8052 LearningRate 0.0009 Epoch: 3 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:33,495-Speed 6301.96 samples/sec Loss 9.8653 LearningRate 0.0009 Epoch: 3 Global Step: 78020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:36,745-Speed 6302.61 samples/sec Loss 9.7374 LearningRate 0.0009 Epoch: 3 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:39,996-Speed 6300.36 samples/sec Loss 9.7469 LearningRate 0.0009 Epoch: 3 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:43,239-Speed 6317.33 samples/sec Loss 9.7423 LearningRate 0.0009 Epoch: 3 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:46,487-Speed 6305.87 samples/sec Loss 9.7441 LearningRate 0.0009 Epoch: 3 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:49,737-Speed 6303.24 samples/sec Loss 9.8010 LearningRate 0.0009 Epoch: 3 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:49:52,974-Speed 6327.89 samples/sec Loss 9.7798 LearningRate 0.0009 Epoch: 3 Global Step: 78080 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:49:56,219-Speed 6312.49 samples/sec Loss 9.8792 LearningRate 0.0009 Epoch: 3 Global Step: 78090 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:49:59,465-Speed 6310.72 samples/sec Loss 9.8560 LearningRate 0.0009 Epoch: 3 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:02,711-Speed 6310.72 samples/sec Loss 9.7750 LearningRate 0.0009 Epoch: 3 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:05,954-Speed 6316.09 samples/sec Loss 9.7570 LearningRate 0.0009 Epoch: 3 Global Step: 78120 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:09,205-Speed 6301.61 samples/sec Loss 9.7788 LearningRate 0.0009 Epoch: 3 Global Step: 78130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:12,449-Speed 6315.19 samples/sec Loss 9.8049 LearningRate 0.0009 Epoch: 3 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:15,691-Speed 6318.16 samples/sec Loss 9.7814 LearningRate 0.0009 Epoch: 3 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:18,936-Speed 6312.60 samples/sec Loss 9.7729 LearningRate 0.0009 Epoch: 3 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:22,179-Speed 6315.85 samples/sec Loss 9.7703 LearningRate 0.0009 Epoch: 3 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:50:25,425-Speed 6310.91 samples/sec Loss 9.7866 LearningRate 0.0009 Epoch: 3 Global Step: 78180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:28,680-Speed 6294.43 samples/sec Loss 9.8090 LearningRate 0.0009 Epoch: 3 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:31,925-Speed 6312.45 samples/sec Loss 9.7688 LearningRate 0.0009 Epoch: 3 Global Step: 78200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:35,174-Speed 6305.87 samples/sec Loss 9.8099 LearningRate 0.0009 Epoch: 3 Global Step: 78210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:38,418-Speed 6313.50 samples/sec Loss 9.8400 LearningRate 0.0009 Epoch: 3 Global Step: 78220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:41,663-Speed 6313.27 samples/sec Loss 9.7946 LearningRate 0.0009 Epoch: 3 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:44,910-Speed 6308.64 samples/sec Loss 9.8486 LearningRate 0.0009 Epoch: 3 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:50:48,159-Speed 6305.22 samples/sec Loss 9.7073 LearningRate 0.0009 Epoch: 3 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:51,403-Speed 6314.35 samples/sec Loss 9.7627 LearningRate 0.0009 Epoch: 3 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:54,648-Speed 6313.07 samples/sec Loss 9.7764 LearningRate 0.0009 Epoch: 3 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:50:57,877-Speed 6345.02 samples/sec Loss 9.8756 LearningRate 0.0009 Epoch: 3 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:01,124-Speed 6307.73 samples/sec Loss 9.8014 LearningRate 0.0009 Epoch: 3 Global Step: 78290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:04,368-Speed 6313.67 samples/sec Loss 9.7418 LearningRate 0.0009 Epoch: 3 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:07,615-Speed 6309.27 samples/sec Loss 9.9035 LearningRate 0.0009 Epoch: 3 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:10,856-Speed 6321.45 samples/sec Loss 9.7535 LearningRate 0.0009 Epoch: 3 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:14,105-Speed 6305.44 samples/sec Loss 9.8509 LearningRate 0.0009 Epoch: 3 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:17,351-Speed 6309.07 samples/sec Loss 9.7881 LearningRate 0.0009 Epoch: 3 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:20,597-Speed 6312.17 samples/sec Loss 9.6541 LearningRate 0.0009 Epoch: 3 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:23,846-Speed 6303.65 samples/sec Loss 9.8285 LearningRate 0.0009 Epoch: 3 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:27,100-Speed 6295.07 samples/sec Loss 
9.6984 LearningRate 0.0009 Epoch: 3 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:30,350-Speed 6302.71 samples/sec Loss 9.7336 LearningRate 0.0009 Epoch: 3 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:51:33,580-Speed 6342.59 samples/sec Loss 9.8045 LearningRate 0.0009 Epoch: 3 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:36,823-Speed 6315.86 samples/sec Loss 9.8441 LearningRate 0.0009 Epoch: 3 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:40,069-Speed 6311.10 samples/sec Loss 9.8895 LearningRate 0.0009 Epoch: 3 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:43,320-Speed 6301.55 samples/sec Loss 9.6624 LearningRate 0.0009 Epoch: 3 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:46,565-Speed 6312.79 samples/sec Loss 9.8135 LearningRate 0.0009 Epoch: 3 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:49,809-Speed 6314.88 samples/sec Loss 9.7506 LearningRate 0.0009 Epoch: 3 Global Step: 78440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:53,053-Speed 6314.54 samples/sec Loss 9.8497 LearningRate 0.0009 Epoch: 3 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:56,304-Speed 6300.35 samples/sec Loss 9.7808 LearningRate 0.0009 Epoch: 3 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:51:59,548-Speed 6315.42 samples/sec Loss 9.7541 LearningRate 0.0009 Epoch: 3 Global Step: 78470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:02,796-Speed 6306.60 samples/sec Loss 9.7312 LearningRate 0.0009 Epoch: 3 Global Step: 78480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:06,027-Speed 6339.94 samples/sec Loss 9.8353 LearningRate 0.0009 Epoch: 3 Global Step: 78490 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:09,280-Speed 6298.53 samples/sec Loss 9.8082 LearningRate 0.0009 Epoch: 3 Global Step: 78500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:12,526-Speed 6310.96 samples/sec Loss 9.8168 LearningRate 0.0009 Epoch: 3 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:15,777-Speed 6301.39 samples/sec Loss 9.7564 LearningRate 0.0009 Epoch: 3 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:19,022-Speed 6312.32 samples/sec Loss 9.8076 LearningRate 0.0009 Epoch: 3 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:22,270-Speed 6306.45 samples/sec Loss 9.6583 LearningRate 0.0009 Epoch: 3 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:25,524-Speed 6293.83 samples/sec Loss 9.7066 LearningRate 0.0009 Epoch: 3 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:28,770-Speed 6312.23 samples/sec Loss 9.8538 LearningRate 0.0009 Epoch: 3 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:32,017-Speed 6308.75 samples/sec Loss 9.7990 LearningRate 0.0009 Epoch: 3 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:35,261-Speed 6314.37 samples/sec Loss 9.7906 LearningRate 0.0009 Epoch: 3 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:38,488-Speed 6348.35 samples/sec Loss 9.7183 LearningRate 0.0009 Epoch: 3 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:41,735-Speed 6307.73 samples/sec Loss 9.7512 LearningRate 0.0009 Epoch: 3 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:44,978-Speed 6317.18 samples/sec Loss 9.7519 LearningRate 0.0009 Epoch: 3 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:52:48,222-Speed 6314.75 samples/sec Loss 9.7116 LearningRate 0.0009 Epoch: 3 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:51,479-Speed 6288.74 samples/sec Loss 9.9174 LearningRate 0.0009 Epoch: 3 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:54,726-Speed 6308.50 samples/sec Loss 9.8171 LearningRate 0.0009 Epoch: 3 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:52:57,974-Speed 6308.31 samples/sec Loss 9.7493 LearningRate 0.0009 Epoch: 3 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:01,229-Speed 6292.84 samples/sec Loss 9.7268 LearningRate 0.0009 Epoch: 3 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:04,475-Speed 6310.23 samples/sec Loss 9.7727 LearningRate 0.0009 Epoch: 3 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:07,720-Speed 6313.39 samples/sec Loss 9.7755 LearningRate 0.0009 Epoch: 3 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:10,951-Speed 6340.67 samples/sec Loss 9.8318 LearningRate 0.0009 Epoch: 3 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:14,196-Speed 6312.62 samples/sec Loss 9.7312 LearningRate 0.0009 Epoch: 3 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:17,440-Speed 6313.92 samples/sec Loss 9.7934 LearningRate 0.0009 Epoch: 3 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:20,683-Speed 6316.94 samples/sec Loss 9.7479 LearningRate 0.0009 Epoch: 3 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:23,928-Speed 6312.80 samples/sec Loss 9.8144 LearningRate 0.0009 Epoch: 3 Global Step: 78730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:27,176-Speed 6305.94 samples/sec Loss 
9.7069 LearningRate 0.0009 Epoch: 3 Global Step: 78740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:30,420-Speed 6314.91 samples/sec Loss 9.7265 LearningRate 0.0009 Epoch: 3 Global Step: 78750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:33,667-Speed 6309.49 samples/sec Loss 9.6782 LearningRate 0.0009 Epoch: 3 Global Step: 78760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:36,912-Speed 6312.29 samples/sec Loss 9.7890 LearningRate 0.0009 Epoch: 3 Global Step: 78770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:40,167-Speed 6292.24 samples/sec Loss 9.7134 LearningRate 0.0009 Epoch: 3 Global Step: 78780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:43,404-Speed 6328.21 samples/sec Loss 9.8089 LearningRate 0.0009 Epoch: 3 Global Step: 78790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:46,648-Speed 6314.79 samples/sec Loss 9.7861 LearningRate 0.0009 Epoch: 3 Global Step: 78800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:49,892-Speed 6314.43 samples/sec Loss 9.7018 LearningRate 0.0010 Epoch: 3 Global Step: 78810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:53,135-Speed 6317.11 samples/sec Loss 9.8263 LearningRate 0.0010 Epoch: 3 Global Step: 78820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:56,381-Speed 6310.04 samples/sec Loss 9.6708 LearningRate 0.0010 Epoch: 3 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:53:59,627-Speed 6311.75 samples/sec Loss 9.7427 LearningRate 0.0010 Epoch: 3 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:02,873-Speed 6310.67 samples/sec Loss 9.6881 LearningRate 0.0010 Epoch: 3 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:06,123-Speed 6303.64 samples/sec Loss 9.7714 LearningRate 0.0010 Epoch: 3 Global Step: 78860 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:09,367-Speed 6314.34 samples/sec Loss 9.7575 LearningRate 0.0010 Epoch: 3 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:12,611-Speed 6314.05 samples/sec Loss 9.8061 LearningRate 0.0010 Epoch: 3 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:15,857-Speed 6312.09 samples/sec Loss 9.7626 LearningRate 0.0010 Epoch: 3 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 22:54:19,088-Speed 6338.46 samples/sec Loss 9.6147 LearningRate 0.0010 Epoch: 3 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:22,332-Speed 6315.94 samples/sec Loss 9.7034 LearningRate 0.0010 Epoch: 3 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:25,584-Speed 6297.58 samples/sec Loss 9.8707 LearningRate 0.0010 Epoch: 3 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:28,826-Speed 6318.73 samples/sec Loss 9.7718 LearningRate 0.0010 Epoch: 3 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:32,073-Speed 6309.41 samples/sec Loss 9.7166 LearningRate 0.0010 Epoch: 3 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:35,318-Speed 6312.74 samples/sec Loss 9.6369 LearningRate 0.0010 Epoch: 3 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:38,565-Speed 6307.81 samples/sec Loss 9.8592 LearningRate 0.0010 Epoch: 3 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:41,809-Speed 6316.00 samples/sec Loss 9.7446 LearningRate 0.0010 Epoch: 3 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:45,055-Speed 6310.33 samples/sec Loss 9.7387 LearningRate 0.0010 Epoch: 3 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:54:48,302-Speed 6307.77 samples/sec Loss 9.7287 LearningRate 0.0010 Epoch: 3 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:51,540-Speed 6326.63 samples/sec Loss 9.7303 LearningRate 0.0010 Epoch: 3 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:54,790-Speed 6304.31 samples/sec Loss 9.7732 LearningRate 0.0010 Epoch: 3 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:54:58,037-Speed 6307.68 samples/sec Loss 9.6536 LearningRate 0.0010 Epoch: 3 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:01,291-Speed 6295.76 samples/sec Loss 9.7455 LearningRate 0.0010 Epoch: 3 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:04,523-Speed 6337.64 samples/sec Loss 9.7954 LearningRate 0.0010 Epoch: 3 Global Step: 79040 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:07,766-Speed 6316.01 samples/sec Loss 9.6899 LearningRate 0.0010 Epoch: 3 Global Step: 79050 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:11,011-Speed 6313.64 samples/sec Loss 9.7216 LearningRate 0.0010 Epoch: 3 Global Step: 79060 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:14,272-Speed 6281.36 samples/sec Loss 9.7101 LearningRate 0.0010 Epoch: 3 Global Step: 79070 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:17,516-Speed 6315.86 samples/sec Loss 9.6249 LearningRate 0.0010 Epoch: 3 Global Step: 79080 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:20,761-Speed 6312.10 samples/sec Loss 9.7342 LearningRate 0.0010 Epoch: 3 Global Step: 79090 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:24,007-Speed 6312.08 samples/sec Loss 9.6795 LearningRate 0.0010 Epoch: 3 Global Step: 79100 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:27,251-Speed 6313.36 samples/sec Loss 
9.6181 LearningRate 0.0010 Epoch: 3 Global Step: 79110 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:30,499-Speed 6307.32 samples/sec Loss 9.7406 LearningRate 0.0010 Epoch: 3 Global Step: 79120 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:33,741-Speed 6319.43 samples/sec Loss 9.7743 LearningRate 0.0010 Epoch: 3 Global Step: 79130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:55:36,984-Speed 6316.40 samples/sec Loss 9.7215 LearningRate 0.0010 Epoch: 3 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:40,228-Speed 6313.87 samples/sec Loss 9.7286 LearningRate 0.0010 Epoch: 3 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:43,476-Speed 6305.97 samples/sec Loss 9.7772 LearningRate 0.0010 Epoch: 3 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:46,721-Speed 6313.86 samples/sec Loss 9.7854 LearningRate 0.0010 Epoch: 3 Global Step: 79170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:49,965-Speed 6314.32 samples/sec Loss 9.5983 LearningRate 0.0010 Epoch: 3 Global Step: 79180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:53,209-Speed 6314.57 samples/sec Loss 9.7066 LearningRate 0.0010 Epoch: 3 Global Step: 79190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:56,463-Speed 6294.00 samples/sec Loss 9.6833 LearningRate 0.0010 Epoch: 3 Global Step: 79200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:55:59,714-Speed 6302.18 samples/sec Loss 9.6983 LearningRate 0.0010 Epoch: 3 Global Step: 79210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:02,963-Speed 6304.52 samples/sec Loss 9.7923 LearningRate 0.0010 Epoch: 3 Global Step: 79220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:06,208-Speed 6313.50 samples/sec Loss 9.7824 LearningRate 0.0010 Epoch: 3 Global Step: 79230 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:09,439-Speed 6338.31 samples/sec Loss 9.7753 LearningRate 0.0010 Epoch: 3 Global Step: 79240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:12,684-Speed 6313.05 samples/sec Loss 9.7147 LearningRate 0.0010 Epoch: 3 Global Step: 79250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:15,929-Speed 6313.79 samples/sec Loss 9.7413 LearningRate 0.0010 Epoch: 3 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:19,171-Speed 6317.07 samples/sec Loss 9.6348 LearningRate 0.0010 Epoch: 3 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:22,424-Speed 6298.60 samples/sec Loss 9.6860 LearningRate 0.0010 Epoch: 3 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:25,670-Speed 6310.64 samples/sec Loss 9.7260 LearningRate 0.0010 Epoch: 3 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:28,915-Speed 6313.65 samples/sec Loss 9.8031 LearningRate 0.0010 Epoch: 3 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:32,164-Speed 6304.35 samples/sec Loss 9.6589 LearningRate 0.0010 Epoch: 3 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:35,411-Speed 6309.40 samples/sec Loss 9.6723 LearningRate 0.0010 Epoch: 3 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:38,655-Speed 6313.10 samples/sec Loss 9.7926 LearningRate 0.0010 Epoch: 3 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:41,887-Speed 6339.35 samples/sec Loss 9.7875 LearningRate 0.0010 Epoch: 3 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:45,134-Speed 6307.64 samples/sec Loss 9.6668 LearningRate 0.0010 Epoch: 3 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 22:56:48,382-Speed 6307.17 samples/sec Loss 9.7841 LearningRate 0.0010 Epoch: 3 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:51,632-Speed 6304.19 samples/sec Loss 9.7319 LearningRate 0.0010 Epoch: 3 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:54,878-Speed 6309.58 samples/sec Loss 9.7276 LearningRate 0.0010 Epoch: 3 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:56:58,123-Speed 6313.53 samples/sec Loss 9.6818 LearningRate 0.0010 Epoch: 3 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:01,371-Speed 6305.69 samples/sec Loss 9.8128 LearningRate 0.0010 Epoch: 3 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:04,616-Speed 6315.98 samples/sec Loss 9.6884 LearningRate 0.0010 Epoch: 3 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:07,858-Speed 6318.87 samples/sec Loss 9.7290 LearningRate 0.0010 Epoch: 3 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:11,107-Speed 6303.74 samples/sec Loss 9.7249 LearningRate 0.0010 Epoch: 3 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:14,338-Speed 6340.12 samples/sec Loss 9.7157 LearningRate 0.0010 Epoch: 3 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:17,586-Speed 6306.91 samples/sec Loss 9.6395 LearningRate 0.0010 Epoch: 3 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:20,830-Speed 6314.31 samples/sec Loss 9.7697 LearningRate 0.0010 Epoch: 3 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:24,089-Speed 6285.82 samples/sec Loss 9.7149 LearningRate 0.0010 Epoch: 3 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:27,334-Speed 6312.75 samples/sec Loss 
9.7176 LearningRate 0.0010 Epoch: 3 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:30,579-Speed 6311.91 samples/sec Loss 9.6838 LearningRate 0.0010 Epoch: 3 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:33,823-Speed 6315.69 samples/sec Loss 9.7210 LearningRate 0.0010 Epoch: 3 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:37,066-Speed 6317.29 samples/sec Loss 9.7999 LearningRate 0.0010 Epoch: 3 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:40,312-Speed 6311.08 samples/sec Loss 9.6681 LearningRate 0.0010 Epoch: 3 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:43,555-Speed 6315.45 samples/sec Loss 9.7416 LearningRate 0.0010 Epoch: 3 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:57:46,786-Speed 6340.33 samples/sec Loss 9.7011 LearningRate 0.0010 Epoch: 3 Global Step: 79540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:57:50,031-Speed 6312.83 samples/sec Loss 9.7225 LearningRate 0.0010 Epoch: 3 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:57:53,274-Speed 6317.57 samples/sec Loss 9.6452 LearningRate 0.0010 Epoch: 3 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:57:56,518-Speed 6314.48 samples/sec Loss 9.7315 LearningRate 0.0010 Epoch: 3 Global Step: 79570 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:57:59,765-Speed 6307.78 samples/sec Loss 9.7839 LearningRate 0.0010 Epoch: 3 Global Step: 79580 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:03,007-Speed 6318.61 samples/sec Loss 9.7070 LearningRate 0.0010 Epoch: 3 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:06,251-Speed 6315.14 samples/sec Loss 9.6822 LearningRate 0.0010 Epoch: 3 Global Step: 79600 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:09,493-Speed 6318.62 samples/sec Loss 9.6784 LearningRate 0.0010 Epoch: 3 Global Step: 79610 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:12,733-Speed 6321.27 samples/sec Loss 9.7529 LearningRate 0.0010 Epoch: 3 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:15,979-Speed 6312.12 samples/sec Loss 9.7443 LearningRate 0.0010 Epoch: 3 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:19,223-Speed 6313.26 samples/sec Loss 9.7661 LearningRate 0.0010 Epoch: 3 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:58:22,455-Speed 6338.01 samples/sec Loss 9.7101 LearningRate 0.0010 Epoch: 3 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:25,700-Speed 6312.60 samples/sec Loss 9.6434 LearningRate 0.0010 Epoch: 3 Global Step: 79660 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:28,944-Speed 6314.28 samples/sec Loss 9.6720 LearningRate 0.0010 Epoch: 3 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:32,191-Speed 6309.72 samples/sec Loss 9.7253 LearningRate 0.0010 Epoch: 3 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:35,435-Speed 6314.39 samples/sec Loss 9.7142 LearningRate 0.0010 Epoch: 3 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:38,679-Speed 6315.51 samples/sec Loss 9.6657 LearningRate 0.0010 Epoch: 3 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:41,934-Speed 6292.12 samples/sec Loss 9.7225 LearningRate 0.0010 Epoch: 3 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:45,179-Speed 6313.25 samples/sec Loss 9.6708 LearningRate 0.0010 Epoch: 3 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 68 hours Training: 
2022-03-31 22:58:48,426-Speed 6309.34 samples/sec Loss 9.6196 LearningRate 0.0010 Epoch: 3 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:51,667-Speed 6321.26 samples/sec Loss 9.7870 LearningRate 0.0010 Epoch: 3 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:58:54,909-Speed 6318.39 samples/sec Loss 9.6641 LearningRate 0.0010 Epoch: 3 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:58:58,158-Speed 6303.25 samples/sec Loss 9.6678 LearningRate 0.0010 Epoch: 3 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:01,404-Speed 6312.01 samples/sec Loss 9.7614 LearningRate 0.0010 Epoch: 3 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:04,646-Speed 6318.06 samples/sec Loss 9.7636 LearningRate 0.0010 Epoch: 3 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:07,890-Speed 6314.13 samples/sec Loss 9.7289 LearningRate 0.0010 Epoch: 3 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:11,138-Speed 6307.65 samples/sec Loss 9.7248 LearningRate 0.0010 Epoch: 3 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:14,386-Speed 6306.47 samples/sec Loss 9.6952 LearningRate 0.0010 Epoch: 3 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:17,628-Speed 6318.46 samples/sec Loss 9.7585 LearningRate 0.0010 Epoch: 3 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:20,880-Speed 6300.00 samples/sec Loss 9.7153 LearningRate 0.0010 Epoch: 3 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:24,127-Speed 6307.87 samples/sec Loss 9.7317 LearningRate 0.0010 Epoch: 3 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:27,359-Speed 6337.08 samples/sec Loss 
9.7025 LearningRate 0.0010 Epoch: 3 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:30,605-Speed 6311.37 samples/sec Loss 9.7185 LearningRate 0.0010 Epoch: 3 Global Step: 79860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:33,848-Speed 6316.77 samples/sec Loss 9.7839 LearningRate 0.0010 Epoch: 3 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:37,094-Speed 6309.97 samples/sec Loss 9.6403 LearningRate 0.0010 Epoch: 3 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:40,337-Speed 6316.26 samples/sec Loss 9.6427 LearningRate 0.0010 Epoch: 3 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:43,584-Speed 6309.89 samples/sec Loss 9.5786 LearningRate 0.0010 Epoch: 3 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:46,828-Speed 6314.57 samples/sec Loss 9.6274 LearningRate 0.0010 Epoch: 3 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:50,070-Speed 6317.77 samples/sec Loss 9.6123 LearningRate 0.0010 Epoch: 3 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:53,318-Speed 6308.15 samples/sec Loss 9.7666 LearningRate 0.0010 Epoch: 3 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 22:59:56,545-Speed 6348.56 samples/sec Loss 9.7217 LearningRate 0.0010 Epoch: 3 Global Step: 79940 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 22:59:59,791-Speed 6310.20 samples/sec Loss 9.8200 LearningRate 0.0010 Epoch: 3 Global Step: 79950 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:03,037-Speed 6311.66 samples/sec Loss 9.6910 LearningRate 0.0010 Epoch: 3 Global Step: 79960 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:06,294-Speed 6288.41 samples/sec Loss 9.7241 LearningRate 0.0010 Epoch: 3 Global Step: 79970 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:09,538-Speed 6314.14 samples/sec Loss 9.7274 LearningRate 0.0010 Epoch: 3 Global Step: 79980 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:12,779-Speed 6320.61 samples/sec Loss 9.7728 LearningRate 0.0010 Epoch: 3 Global Step: 79990 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:16,027-Speed 6308.08 samples/sec Loss 9.7561 LearningRate 0.0010 Epoch: 3 Global Step: 80000 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:19,276-Speed 6303.13 samples/sec Loss 9.6580 LearningRate 0.0010 Epoch: 3 Global Step: 80010 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:22,524-Speed 6307.38 samples/sec Loss 9.6587 LearningRate 0.0010 Epoch: 3 Global Step: 80020 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:25,771-Speed 6309.67 samples/sec Loss 9.7106 LearningRate 0.0010 Epoch: 3 Global Step: 80030 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:00:29,019-Speed 6305.69 samples/sec Loss 9.7149 LearningRate 0.0010 Epoch: 3 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:32,263-Speed 6315.01 samples/sec Loss 9.7336 LearningRate 0.0010 Epoch: 3 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:35,506-Speed 6317.25 samples/sec Loss 9.7564 LearningRate 0.0010 Epoch: 3 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:38,750-Speed 6313.06 samples/sec Loss 9.7139 LearningRate 0.0010 Epoch: 3 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:41,998-Speed 6307.23 samples/sec Loss 9.6840 LearningRate 0.0010 Epoch: 3 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:45,249-Speed 6300.56 samples/sec Loss 9.7102 LearningRate 0.0010 Epoch: 3 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:00:48,495-Speed 6310.94 samples/sec Loss 9.6485 LearningRate 0.0010 Epoch: 3 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:51,742-Speed 6309.26 samples/sec Loss 9.6572 LearningRate 0.0010 Epoch: 3 Global Step: 80110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:54,990-Speed 6307.40 samples/sec Loss 9.6219 LearningRate 0.0010 Epoch: 3 Global Step: 80120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:00:58,234-Speed 6314.19 samples/sec Loss 9.7117 LearningRate 0.0010 Epoch: 3 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:01,464-Speed 6342.32 samples/sec Loss 9.7361 LearningRate 0.0010 Epoch: 3 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:04,710-Speed 6311.73 samples/sec Loss 9.8123 LearningRate 0.0010 Epoch: 3 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:07,955-Speed 6311.51 samples/sec Loss 9.6561 LearningRate 0.0010 Epoch: 3 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:11,231-Speed 6252.82 samples/sec Loss 9.6344 LearningRate 0.0010 Epoch: 3 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:14,478-Speed 6310.39 samples/sec Loss 9.7025 LearningRate 0.0010 Epoch: 3 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:17,725-Speed 6308.61 samples/sec Loss 9.5837 LearningRate 0.0010 Epoch: 3 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:20,975-Speed 6302.00 samples/sec Loss 9.7689 LearningRate 0.0010 Epoch: 3 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:24,222-Speed 6309.83 samples/sec Loss 9.7012 LearningRate 0.0010 Epoch: 3 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:27,470-Speed 6305.34 samples/sec Loss 
9.7831 LearningRate 0.0010 Epoch: 3 Global Step: 80220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:01:30,701-Speed 6339.74 samples/sec Loss 9.6994 LearningRate 0.0010 Epoch: 3 Global Step: 80230 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:33,946-Speed 6312.66 samples/sec Loss 9.6224 LearningRate 0.0010 Epoch: 3 Global Step: 80240 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:37,190-Speed 6314.85 samples/sec Loss 9.7444 LearningRate 0.0010 Epoch: 3 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:40,436-Speed 6310.55 samples/sec Loss 9.6943 LearningRate 0.0010 Epoch: 3 Global Step: 80260 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:43,703-Speed 6271.54 samples/sec Loss 9.6810 LearningRate 0.0010 Epoch: 3 Global Step: 80270 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:46,947-Speed 6313.06 samples/sec Loss 9.7087 LearningRate 0.0010 Epoch: 3 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:50,195-Speed 6307.73 samples/sec Loss 9.6468 LearningRate 0.0010 Epoch: 3 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:53,436-Speed 6320.39 samples/sec Loss 9.7117 LearningRate 0.0010 Epoch: 3 Global Step: 80300 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:56,680-Speed 6315.16 samples/sec Loss 9.6302 LearningRate 0.0010 Epoch: 3 Global Step: 80310 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:01:59,921-Speed 6318.42 samples/sec Loss 9.7418 LearningRate 0.0010 Epoch: 3 Global Step: 80320 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:02:03,169-Speed 6308.72 samples/sec Loss 9.6529 LearningRate 0.0010 Epoch: 3 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:06,414-Speed 6311.93 samples/sec Loss 9.6209 LearningRate 0.0010 Epoch: 3 Global Step: 80340 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:09,657-Speed 6316.10 samples/sec Loss 9.6406 LearningRate 0.0010 Epoch: 3 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:12,905-Speed 6306.91 samples/sec Loss 9.6482 LearningRate 0.0010 Epoch: 3 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:16,150-Speed 6313.30 samples/sec Loss 9.6238 LearningRate 0.0010 Epoch: 3 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:19,405-Speed 6294.21 samples/sec Loss 9.7080 LearningRate 0.0010 Epoch: 3 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:22,650-Speed 6311.74 samples/sec Loss 9.6491 LearningRate 0.0010 Epoch: 3 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:25,892-Speed 6320.70 samples/sec Loss 9.6319 LearningRate 0.0010 Epoch: 3 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:29,139-Speed 6308.52 samples/sec Loss 9.6761 LearningRate 0.0010 Epoch: 3 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:32,383-Speed 6313.32 samples/sec Loss 9.7639 LearningRate 0.0010 Epoch: 3 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:35,614-Speed 6340.12 samples/sec Loss 9.7115 LearningRate 0.0010 Epoch: 3 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:38,857-Speed 6315.89 samples/sec Loss 9.6186 LearningRate 0.0010 Epoch: 3 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:42,101-Speed 6316.17 samples/sec Loss 9.6740 LearningRate 0.0010 Epoch: 3 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:45,346-Speed 6312.05 samples/sec Loss 9.6856 LearningRate 0.0010 Epoch: 3 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:02:48,590-Speed 6314.34 samples/sec Loss 9.8295 LearningRate 0.0010 Epoch: 3 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:51,832-Speed 6317.59 samples/sec Loss 9.6110 LearningRate 0.0010 Epoch: 3 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:55,075-Speed 6318.40 samples/sec Loss 9.6941 LearningRate 0.0010 Epoch: 3 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:02:58,319-Speed 6312.71 samples/sec Loss 9.7520 LearningRate 0.0010 Epoch: 3 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:01,576-Speed 6289.59 samples/sec Loss 9.6438 LearningRate 0.0010 Epoch: 3 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:04,820-Speed 6315.46 samples/sec Loss 9.6785 LearningRate 0.0010 Epoch: 3 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:08,065-Speed 6311.96 samples/sec Loss 9.6245 LearningRate 0.0010 Epoch: 3 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:03:11,295-Speed 6343.01 samples/sec Loss 9.6604 LearningRate 0.0010 Epoch: 3 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:14,542-Speed 6307.07 samples/sec Loss 9.7737 LearningRate 0.0010 Epoch: 3 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:17,788-Speed 6312.05 samples/sec Loss 9.7253 LearningRate 0.0010 Epoch: 3 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:21,045-Speed 6288.12 samples/sec Loss 9.7819 LearningRate 0.0010 Epoch: 3 Global Step: 80570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:24,295-Speed 6303.83 samples/sec Loss 9.6890 LearningRate 0.0010 Epoch: 3 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:27,540-Speed 6314.00 samples/sec Loss 
9.7353 LearningRate 0.0010 Epoch: 3 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:30,786-Speed 6309.76 samples/sec Loss 9.6779 LearningRate 0.0010 Epoch: 3 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:34,028-Speed 6318.27 samples/sec Loss 9.7271 LearningRate 0.0010 Epoch: 3 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:37,276-Speed 6306.82 samples/sec Loss 9.6681 LearningRate 0.0010 Epoch: 3 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:40,524-Speed 6307.03 samples/sec Loss 9.7536 LearningRate 0.0010 Epoch: 3 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:43,754-Speed 6342.12 samples/sec Loss 9.7044 LearningRate 0.0010 Epoch: 3 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:03:47,056-Speed 6204.31 samples/sec Loss 9.6429 LearningRate 0.0010 Epoch: 3 Global Step: 80650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:03:50,303-Speed 6307.96 samples/sec Loss 9.6594 LearningRate 0.0010 Epoch: 3 Global Step: 80660 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:03:53,561-Speed 6287.02 samples/sec Loss 9.7663 LearningRate 0.0010 Epoch: 3 Global Step: 80670 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:03:56,807-Speed 6311.46 samples/sec Loss 9.7022 LearningRate 0.0010 Epoch: 3 Global Step: 80680 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:00,051-Speed 6314.57 samples/sec Loss 9.6834 LearningRate 0.0010 Epoch: 3 Global Step: 80690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:03,300-Speed 6304.48 samples/sec Loss 9.6299 LearningRate 0.0010 Epoch: 3 Global Step: 80700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:06,547-Speed 6308.87 samples/sec Loss 9.7835 LearningRate 0.0010 Epoch: 3 Global Step: 80710 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:09,799-Speed 6298.82 samples/sec Loss 9.5439 LearningRate 0.0010 Epoch: 3 Global Step: 80720 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:13,048-Speed 6305.42 samples/sec Loss 9.6664 LearningRate 0.0010 Epoch: 3 Global Step: 80730 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:16,299-Speed 6299.79 samples/sec Loss 9.5931 LearningRate 0.0010 Epoch: 3 Global Step: 80740 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:04:19,545-Speed 6312.90 samples/sec Loss 9.7669 LearningRate 0.0010 Epoch: 3 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:22,788-Speed 6314.92 samples/sec Loss 9.7065 LearningRate 0.0010 Epoch: 3 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:26,035-Speed 6309.74 samples/sec Loss 9.7176 LearningRate 0.0010 Epoch: 3 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:29,279-Speed 6313.93 samples/sec Loss 9.6155 LearningRate 0.0010 Epoch: 3 Global Step: 80780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:32,526-Speed 6309.87 samples/sec Loss 9.6447 LearningRate 0.0010 Epoch: 3 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:35,772-Speed 6310.15 samples/sec Loss 9.5769 LearningRate 0.0010 Epoch: 3 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:39,018-Speed 6311.99 samples/sec Loss 9.6593 LearningRate 0.0010 Epoch: 3 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:42,263-Speed 6311.93 samples/sec Loss 9.6760 LearningRate 0.0010 Epoch: 3 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:45,509-Speed 6311.75 samples/sec Loss 9.6894 LearningRate 0.0010 Epoch: 3 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:04:48,755-Speed 6310.82 samples/sec Loss 9.6601 LearningRate 0.0010 Epoch: 3 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:52,009-Speed 6294.39 samples/sec Loss 9.5366 LearningRate 0.0010 Epoch: 3 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:55,303-Speed 6218.97 samples/sec Loss 9.6243 LearningRate 0.0010 Epoch: 3 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:04:58,553-Speed 6303.53 samples/sec Loss 9.6679 LearningRate 0.0010 Epoch: 3 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:01,797-Speed 6313.45 samples/sec Loss 9.7547 LearningRate 0.0010 Epoch: 3 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:05,041-Speed 6314.92 samples/sec Loss 9.5806 LearningRate 0.0010 Epoch: 3 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:08,285-Speed 6314.30 samples/sec Loss 9.6499 LearningRate 0.0010 Epoch: 3 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:11,536-Speed 6301.85 samples/sec Loss 9.7516 LearningRate 0.0010 Epoch: 3 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:14,782-Speed 6310.33 samples/sec Loss 9.6502 LearningRate 0.0010 Epoch: 3 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:18,026-Speed 6314.05 samples/sec Loss 9.6971 LearningRate 0.0010 Epoch: 3 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:21,274-Speed 6306.61 samples/sec Loss 9.6575 LearningRate 0.0010 Epoch: 3 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:24,530-Speed 6291.56 samples/sec Loss 9.6840 LearningRate 0.0010 Epoch: 3 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:27,782-Speed 6298.38 samples/sec Loss 
9.8220 LearningRate 0.0010 Epoch: 3 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:31,032-Speed 6304.14 samples/sec Loss 9.6727 LearningRate 0.0010 Epoch: 3 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:34,279-Speed 6308.03 samples/sec Loss 9.6912 LearningRate 0.0010 Epoch: 3 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:37,528-Speed 6305.44 samples/sec Loss 9.7163 LearningRate 0.0010 Epoch: 3 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:40,768-Speed 6323.34 samples/sec Loss 9.5989 LearningRate 0.0010 Epoch: 3 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:44,015-Speed 6308.37 samples/sec Loss 9.6495 LearningRate 0.0010 Epoch: 3 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:47,263-Speed 6307.07 samples/sec Loss 9.6568 LearningRate 0.0010 Epoch: 3 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:50,507-Speed 6314.11 samples/sec Loss 9.7189 LearningRate 0.0010 Epoch: 3 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:53,750-Speed 6317.45 samples/sec Loss 9.6530 LearningRate 0.0010 Epoch: 3 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:05:56,986-Speed 6329.63 samples/sec Loss 9.7477 LearningRate 0.0010 Epoch: 3 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:00,234-Speed 6307.53 samples/sec Loss 9.6717 LearningRate 0.0010 Epoch: 3 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:03,478-Speed 6313.21 samples/sec Loss 9.6947 LearningRate 0.0010 Epoch: 3 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:06,727-Speed 6305.68 samples/sec Loss 9.7016 LearningRate 0.0010 Epoch: 3 Global Step: 81080 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:09,976-Speed 6305.20 samples/sec Loss 9.6324 LearningRate 0.0010 Epoch: 3 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:13,223-Speed 6308.79 samples/sec Loss 9.6811 LearningRate 0.0010 Epoch: 3 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:16,479-Speed 6291.08 samples/sec Loss 9.6474 LearningRate 0.0010 Epoch: 3 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:19,726-Speed 6309.05 samples/sec Loss 9.6837 LearningRate 0.0010 Epoch: 3 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:22,974-Speed 6306.40 samples/sec Loss 9.7028 LearningRate 0.0010 Epoch: 3 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:26,231-Speed 6290.46 samples/sec Loss 9.6514 LearningRate 0.0010 Epoch: 3 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:29,475-Speed 6314.35 samples/sec Loss 9.6515 LearningRate 0.0010 Epoch: 3 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:32,720-Speed 6311.87 samples/sec Loss 9.6494 LearningRate 0.0010 Epoch: 3 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:35,963-Speed 6315.29 samples/sec Loss 9.6556 LearningRate 0.0010 Epoch: 3 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:39,216-Speed 6298.95 samples/sec Loss 9.6862 LearningRate 0.0010 Epoch: 3 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:42,463-Speed 6308.58 samples/sec Loss 9.7103 LearningRate 0.0010 Epoch: 3 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:45,713-Speed 6303.21 samples/sec Loss 9.6597 LearningRate 0.0010 Epoch: 3 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:06:48,959-Speed 6310.15 samples/sec Loss 9.6949 LearningRate 0.0010 Epoch: 3 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:52,206-Speed 6308.69 samples/sec Loss 9.6603 LearningRate 0.0010 Epoch: 3 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:55,464-Speed 6288.74 samples/sec Loss 9.7154 LearningRate 0.0010 Epoch: 3 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:06:58,709-Speed 6313.01 samples/sec Loss 9.6893 LearningRate 0.0010 Epoch: 3 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:01,964-Speed 6293.09 samples/sec Loss 9.6078 LearningRate 0.0010 Epoch: 3 Global Step: 81250 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:07:05,198-Speed 6332.70 samples/sec Loss 9.6933 LearningRate 0.0010 Epoch: 3 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:08,440-Speed 6318.76 samples/sec Loss 9.6577 LearningRate 0.0010 Epoch: 3 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:11,687-Speed 6309.04 samples/sec Loss 9.6755 LearningRate 0.0010 Epoch: 3 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:14,933-Speed 6311.14 samples/sec Loss 9.6532 LearningRate 0.0010 Epoch: 3 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:18,185-Speed 6298.59 samples/sec Loss 9.5348 LearningRate 0.0010 Epoch: 3 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:21,432-Speed 6308.74 samples/sec Loss 9.7341 LearningRate 0.0010 Epoch: 3 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:24,681-Speed 6304.88 samples/sec Loss 9.6102 LearningRate 0.0010 Epoch: 3 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:27,929-Speed 6307.28 samples/sec Loss 
9.6262 LearningRate 0.0010 Epoch: 3 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:31,178-Speed 6304.81 samples/sec Loss 9.5748 LearningRate 0.0010 Epoch: 3 Global Step: 81340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:34,473-Speed 6216.23 samples/sec Loss 9.6790 LearningRate 0.0010 Epoch: 3 Global Step: 81350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:37,726-Speed 6297.45 samples/sec Loss 9.6219 LearningRate 0.0010 Epoch: 3 Global Step: 81360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:40,973-Speed 6309.12 samples/sec Loss 9.6376 LearningRate 0.0010 Epoch: 3 Global Step: 81370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:44,221-Speed 6305.84 samples/sec Loss 9.7452 LearningRate 0.0010 Epoch: 3 Global Step: 81380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:47,467-Speed 6309.74 samples/sec Loss 9.6398 LearningRate 0.0010 Epoch: 3 Global Step: 81390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:50,714-Speed 6310.78 samples/sec Loss 9.6539 LearningRate 0.0010 Epoch: 3 Global Step: 81400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:53,957-Speed 6315.04 samples/sec Loss 9.6899 LearningRate 0.0010 Epoch: 3 Global Step: 81410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:07:57,204-Speed 6309.43 samples/sec Loss 9.6425 LearningRate 0.0010 Epoch: 3 Global Step: 81420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:00,456-Speed 6300.06 samples/sec Loss 9.6421 LearningRate 0.0010 Epoch: 3 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:03,703-Speed 6309.15 samples/sec Loss 9.6514 LearningRate 0.0010 Epoch: 3 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:06,948-Speed 6312.39 samples/sec Loss 9.6807 LearningRate 0.0010 Epoch: 3 Global Step: 81450 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:10,181-Speed 6336.08 samples/sec Loss 9.6937 LearningRate 0.0010 Epoch: 3 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:13,428-Speed 6308.75 samples/sec Loss 9.6583 LearningRate 0.0010 Epoch: 3 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:16,673-Speed 6311.52 samples/sec Loss 9.5627 LearningRate 0.0010 Epoch: 3 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:19,935-Speed 6281.05 samples/sec Loss 9.7585 LearningRate 0.0010 Epoch: 3 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:23,189-Speed 6294.58 samples/sec Loss 9.6478 LearningRate 0.0010 Epoch: 3 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:26,435-Speed 6311.93 samples/sec Loss 9.6629 LearningRate 0.0010 Epoch: 3 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:29,681-Speed 6310.03 samples/sec Loss 9.6477 LearningRate 0.0010 Epoch: 3 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:32,927-Speed 6310.00 samples/sec Loss 9.5769 LearningRate 0.0010 Epoch: 3 Global Step: 81530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:36,168-Speed 6319.81 samples/sec Loss 9.6063 LearningRate 0.0010 Epoch: 3 Global Step: 81540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:39,417-Speed 6305.81 samples/sec Loss 9.7567 LearningRate 0.0010 Epoch: 3 Global Step: 81550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:42,662-Speed 6312.26 samples/sec Loss 9.6937 LearningRate 0.0010 Epoch: 3 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:08:45,891-Speed 6343.36 samples/sec Loss 9.6754 LearningRate 0.0010 Epoch: 3 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:08:49,143-Speed 6299.41 samples/sec Loss 9.6498 LearningRate 0.0010 Epoch: 3 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:52,393-Speed 6304.48 samples/sec Loss 9.6395 LearningRate 0.0010 Epoch: 3 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:55,632-Speed 6323.44 samples/sec Loss 9.6163 LearningRate 0.0010 Epoch: 3 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:08:58,877-Speed 6312.19 samples/sec Loss 9.5954 LearningRate 0.0010 Epoch: 3 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:02,122-Speed 6313.93 samples/sec Loss 9.6551 LearningRate 0.0010 Epoch: 3 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:05,364-Speed 6319.53 samples/sec Loss 9.6964 LearningRate 0.0010 Epoch: 3 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:08,611-Speed 6307.48 samples/sec Loss 9.6799 LearningRate 0.0010 Epoch: 3 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:11,854-Speed 6316.64 samples/sec Loss 9.5975 LearningRate 0.0010 Epoch: 3 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:15,100-Speed 6310.72 samples/sec Loss 9.6361 LearningRate 0.0010 Epoch: 3 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:18,330-Speed 6342.84 samples/sec Loss 9.6471 LearningRate 0.0010 Epoch: 3 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:21,577-Speed 6308.13 samples/sec Loss 9.6190 LearningRate 0.0010 Epoch: 3 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:09:24,818-Speed 6321.52 samples/sec Loss 9.7055 LearningRate 0.0010 Epoch: 3 Global Step: 81690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:28,064-Speed 6310.12 samples/sec Loss 
9.6354 LearningRate 0.0010 Epoch: 3 Global Step: 81700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:31,309-Speed 6312.61 samples/sec Loss 9.7257 LearningRate 0.0010 Epoch: 3 Global Step: 81710 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:34,553-Speed 6314.66 samples/sec Loss 9.6560 LearningRate 0.0010 Epoch: 3 Global Step: 81720 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:37,796-Speed 6315.86 samples/sec Loss 9.6711 LearningRate 0.0010 Epoch: 3 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:41,041-Speed 6312.53 samples/sec Loss 9.7007 LearningRate 0.0010 Epoch: 3 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:44,288-Speed 6308.30 samples/sec Loss 9.6395 LearningRate 0.0010 Epoch: 3 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:47,535-Speed 6309.49 samples/sec Loss 9.5987 LearningRate 0.0010 Epoch: 3 Global Step: 81760 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:50,781-Speed 6311.93 samples/sec Loss 9.5192 LearningRate 0.0010 Epoch: 3 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:54,026-Speed 6311.52 samples/sec Loss 9.6827 LearningRate 0.0010 Epoch: 3 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:09:57,268-Speed 6319.17 samples/sec Loss 9.6838 LearningRate 0.0010 Epoch: 3 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:00,512-Speed 6313.54 samples/sec Loss 9.6611 LearningRate 0.0010 Epoch: 3 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:03,800-Speed 6231.38 samples/sec Loss 9.6017 LearningRate 0.0010 Epoch: 3 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:07,045-Speed 6311.45 samples/sec Loss 9.6396 LearningRate 0.0010 Epoch: 3 Global Step: 81820 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:10,289-Speed 6314.57 samples/sec Loss 9.6800 LearningRate 0.0010 Epoch: 3 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:13,538-Speed 6305.59 samples/sec Loss 9.5890 LearningRate 0.0010 Epoch: 3 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:16,787-Speed 6305.16 samples/sec Loss 9.6391 LearningRate 0.0010 Epoch: 3 Global Step: 81850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:20,033-Speed 6311.18 samples/sec Loss 9.7661 LearningRate 0.0010 Epoch: 3 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:23,279-Speed 6310.52 samples/sec Loss 9.6373 LearningRate 0.0010 Epoch: 3 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:26,526-Speed 6309.47 samples/sec Loss 9.6952 LearningRate 0.0010 Epoch: 3 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:29,759-Speed 6335.09 samples/sec Loss 9.6117 LearningRate 0.0010 Epoch: 3 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:33,005-Speed 6310.70 samples/sec Loss 9.6428 LearningRate 0.0010 Epoch: 3 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:36,251-Speed 6311.69 samples/sec Loss 9.6237 LearningRate 0.0010 Epoch: 3 Global Step: 81910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:39,498-Speed 6308.27 samples/sec Loss 9.6983 LearningRate 0.0010 Epoch: 3 Global Step: 81920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:42,743-Speed 6312.81 samples/sec Loss 9.6299 LearningRate 0.0010 Epoch: 3 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:45,987-Speed 6313.37 samples/sec Loss 9.6426 LearningRate 0.0010 Epoch: 3 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:10:49,236-Speed 6305.37 samples/sec Loss 9.6731 LearningRate 0.0010 Epoch: 3 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:52,483-Speed 6308.36 samples/sec Loss 9.6473 LearningRate 0.0010 Epoch: 3 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:55,727-Speed 6315.27 samples/sec Loss 9.6853 LearningRate 0.0010 Epoch: 3 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:10:58,971-Speed 6315.32 samples/sec Loss 9.5986 LearningRate 0.0010 Epoch: 3 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:02,208-Speed 6327.27 samples/sec Loss 9.7361 LearningRate 0.0010 Epoch: 3 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:05,455-Speed 6309.95 samples/sec Loss 9.6207 LearningRate 0.0010 Epoch: 3 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:08,702-Speed 6306.91 samples/sec Loss 9.6504 LearningRate 0.0010 Epoch: 3 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:11,950-Speed 6307.13 samples/sec Loss 9.6372 LearningRate 0.0010 Epoch: 3 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:15,200-Speed 6304.21 samples/sec Loss 9.6593 LearningRate 0.0010 Epoch: 3 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:18,447-Speed 6308.74 samples/sec Loss 9.5724 LearningRate 0.0010 Epoch: 3 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:21,699-Speed 6298.09 samples/sec Loss 9.6533 LearningRate 0.0010 Epoch: 3 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:24,953-Speed 6296.28 samples/sec Loss 9.5861 LearningRate 0.0010 Epoch: 3 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:28,200-Speed 6309.29 samples/sec Loss 
9.6178 LearningRate 0.0010 Epoch: 3 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:31,440-Speed 6322.14 samples/sec Loss 9.7299 LearningRate 0.0010 Epoch: 3 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:34,690-Speed 6302.25 samples/sec Loss 9.6784 LearningRate 0.0010 Epoch: 3 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:11:37,928-Speed 6327.10 samples/sec Loss 9.5591 LearningRate 0.0010 Epoch: 3 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:41,172-Speed 6314.81 samples/sec Loss 9.6273 LearningRate 0.0010 Epoch: 3 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:44,423-Speed 6300.54 samples/sec Loss 9.5539 LearningRate 0.0010 Epoch: 3 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:47,666-Speed 6316.47 samples/sec Loss 9.6489 LearningRate 0.0010 Epoch: 3 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:50,909-Speed 6316.93 samples/sec Loss 9.6046 LearningRate 0.0010 Epoch: 3 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:54,154-Speed 6311.33 samples/sec Loss 9.6187 LearningRate 0.0010 Epoch: 3 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:11:57,405-Speed 6302.65 samples/sec Loss 9.5769 LearningRate 0.0010 Epoch: 3 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:00,655-Speed 6301.65 samples/sec Loss 9.6178 LearningRate 0.0010 Epoch: 3 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:03,897-Speed 6318.00 samples/sec Loss 9.6679 LearningRate 0.0010 Epoch: 3 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:07,142-Speed 6312.79 samples/sec Loss 9.6408 LearningRate 0.0010 Epoch: 3 Global Step: 82190 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:10,373-Speed 6341.04 samples/sec Loss 9.5997 LearningRate 0.0010 Epoch: 3 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:13,617-Speed 6313.99 samples/sec Loss 9.6688 LearningRate 0.0010 Epoch: 3 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:16,863-Speed 6311.20 samples/sec Loss 9.5407 LearningRate 0.0010 Epoch: 3 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:20,105-Speed 6318.65 samples/sec Loss 9.6629 LearningRate 0.0010 Epoch: 3 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:23,355-Speed 6301.51 samples/sec Loss 9.7053 LearningRate 0.0010 Epoch: 3 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:26,599-Speed 6314.75 samples/sec Loss 9.6253 LearningRate 0.0010 Epoch: 3 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:29,848-Speed 6305.39 samples/sec Loss 9.5594 LearningRate 0.0010 Epoch: 3 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:33,093-Speed 6314.53 samples/sec Loss 9.6765 LearningRate 0.0010 Epoch: 3 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:36,341-Speed 6305.30 samples/sec Loss 9.5313 LearningRate 0.0010 Epoch: 3 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:39,590-Speed 6304.89 samples/sec Loss 9.6150 LearningRate 0.0010 Epoch: 3 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:42,826-Speed 6331.48 samples/sec Loss 9.6612 LearningRate 0.0010 Epoch: 3 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:46,073-Speed 6307.60 samples/sec Loss 9.5969 LearningRate 0.0010 Epoch: 3 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:12:49,322-Speed 6305.28 samples/sec Loss 9.5365 LearningRate 0.0010 Epoch: 3 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:52,565-Speed 6316.95 samples/sec Loss 9.5522 LearningRate 0.0010 Epoch: 3 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:55,808-Speed 6315.86 samples/sec Loss 9.6370 LearningRate 0.0010 Epoch: 3 Global Step: 82340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:12:59,053-Speed 6314.01 samples/sec Loss 9.6169 LearningRate 0.0010 Epoch: 3 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:02,301-Speed 6305.67 samples/sec Loss 9.6459 LearningRate 0.0010 Epoch: 3 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:05,544-Speed 6317.64 samples/sec Loss 9.6532 LearningRate 0.0010 Epoch: 3 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:08,789-Speed 6311.06 samples/sec Loss 9.5956 LearningRate 0.0010 Epoch: 3 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:12,033-Speed 6314.82 samples/sec Loss 9.6857 LearningRate 0.0010 Epoch: 3 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:15,281-Speed 6307.86 samples/sec Loss 9.5793 LearningRate 0.0010 Epoch: 3 Global Step: 82400 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:13:18,520-Speed 6324.51 samples/sec Loss 9.5728 LearningRate 0.0010 Epoch: 3 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:21,763-Speed 6314.92 samples/sec Loss 9.6258 LearningRate 0.0010 Epoch: 3 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:25,009-Speed 6312.59 samples/sec Loss 9.6965 LearningRate 0.0010 Epoch: 3 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:28,251-Speed 6317.02 samples/sec Loss 
9.6980 LearningRate 0.0010 Epoch: 3 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:31,499-Speed 6307.34 samples/sec Loss 9.6320 LearningRate 0.0010 Epoch: 3 Global Step: 82450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:34,745-Speed 6311.13 samples/sec Loss 9.5656 LearningRate 0.0010 Epoch: 3 Global Step: 82460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:37,989-Speed 6315.62 samples/sec Loss 9.6296 LearningRate 0.0010 Epoch: 3 Global Step: 82470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:41,234-Speed 6311.92 samples/sec Loss 9.5261 LearningRate 0.0010 Epoch: 3 Global Step: 82480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:13:44,467-Speed 6336.37 samples/sec Loss 9.5544 LearningRate 0.0010 Epoch: 3 Global Step: 82490 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:13:47,711-Speed 6314.43 samples/sec Loss 9.5961 LearningRate 0.0010 Epoch: 3 Global Step: 82500 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:13:50,965-Speed 6295.64 samples/sec Loss 9.6162 LearningRate 0.0010 Epoch: 3 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:13:54,206-Speed 6319.29 samples/sec Loss 9.6720 LearningRate 0.0010 Epoch: 3 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:13:57,451-Speed 6313.91 samples/sec Loss 9.6307 LearningRate 0.0010 Epoch: 3 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:00,694-Speed 6315.25 samples/sec Loss 9.5746 LearningRate 0.0010 Epoch: 3 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:03,939-Speed 6313.08 samples/sec Loss 9.6711 LearningRate 0.0010 Epoch: 3 Global Step: 82550 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:07,187-Speed 6307.74 samples/sec Loss 9.7172 LearningRate 0.0010 Epoch: 3 Global Step: 82560 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:10,432-Speed 6312.34 samples/sec Loss 9.5569 LearningRate 0.0010 Epoch: 3 Global Step: 82570 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:13,676-Speed 6314.94 samples/sec Loss 9.5546 LearningRate 0.0010 Epoch: 3 Global Step: 82580 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:14:16,920-Speed 6313.51 samples/sec Loss 9.6216 LearningRate 0.0010 Epoch: 3 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:20,168-Speed 6306.79 samples/sec Loss 9.6604 LearningRate 0.0010 Epoch: 3 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:23,412-Speed 6314.78 samples/sec Loss 9.6326 LearningRate 0.0010 Epoch: 3 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:26,658-Speed 6310.49 samples/sec Loss 9.6262 LearningRate 0.0010 Epoch: 3 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:29,907-Speed 6305.75 samples/sec Loss 9.6975 LearningRate 0.0010 Epoch: 3 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:33,151-Speed 6315.21 samples/sec Loss 9.5507 LearningRate 0.0010 Epoch: 3 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:36,394-Speed 6314.57 samples/sec Loss 9.5953 LearningRate 0.0010 Epoch: 3 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:39,639-Speed 6313.03 samples/sec Loss 9.6037 LearningRate 0.0010 Epoch: 3 Global Step: 82660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:42,883-Speed 6316.33 samples/sec Loss 9.6198 LearningRate 0.0010 Epoch: 3 Global Step: 82670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:46,130-Speed 6309.19 samples/sec Loss 9.6999 LearningRate 0.0010 Epoch: 3 Global Step: 82680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:14:49,360-Speed 6341.33 samples/sec Loss 9.6387 LearningRate 0.0010 Epoch: 3 Global Step: 82690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:52,610-Speed 6302.51 samples/sec Loss 9.6663 LearningRate 0.0010 Epoch: 3 Global Step: 82700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:55,854-Speed 6315.07 samples/sec Loss 9.6262 LearningRate 0.0010 Epoch: 3 Global Step: 82710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:14:59,101-Speed 6307.79 samples/sec Loss 9.6824 LearningRate 0.0010 Epoch: 3 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:02,351-Speed 6304.60 samples/sec Loss 9.6301 LearningRate 0.0010 Epoch: 3 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:05,599-Speed 6305.18 samples/sec Loss 9.6184 LearningRate 0.0010 Epoch: 3 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:08,845-Speed 6311.90 samples/sec Loss 9.6608 LearningRate 0.0010 Epoch: 3 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:12,094-Speed 6304.84 samples/sec Loss 9.6070 LearningRate 0.0010 Epoch: 3 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:15,337-Speed 6316.42 samples/sec Loss 9.5824 LearningRate 0.0010 Epoch: 3 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:18,584-Speed 6309.21 samples/sec Loss 9.6581 LearningRate 0.0010 Epoch: 3 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:21,816-Speed 6337.65 samples/sec Loss 9.6061 LearningRate 0.0010 Epoch: 3 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:25,064-Speed 6305.76 samples/sec Loss 9.7255 LearningRate 0.0010 Epoch: 3 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:28,316-Speed 6300.40 samples/sec Loss 
9.5692 LearningRate 0.0010 Epoch: 3 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:31,563-Speed 6308.33 samples/sec Loss 9.7126 LearningRate 0.0010 Epoch: 3 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:34,812-Speed 6304.43 samples/sec Loss 9.5318 LearningRate 0.0010 Epoch: 3 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:38,061-Speed 6304.85 samples/sec Loss 9.6550 LearningRate 0.0010 Epoch: 3 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:41,326-Speed 6274.22 samples/sec Loss 9.5812 LearningRate 0.0010 Epoch: 3 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:44,582-Speed 6291.79 samples/sec Loss 9.6057 LearningRate 0.0010 Epoch: 3 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:47,829-Speed 6307.19 samples/sec Loss 9.7515 LearningRate 0.0010 Epoch: 3 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:51,069-Speed 6323.58 samples/sec Loss 9.6078 LearningRate 0.0010 Epoch: 3 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:15:54,300-Speed 6340.41 samples/sec Loss 9.6544 LearningRate 0.0010 Epoch: 3 Global Step: 82890 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:15:57,540-Speed 6322.31 samples/sec Loss 9.5962 LearningRate 0.0010 Epoch: 3 Global Step: 82900 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:16:00,786-Speed 6310.43 samples/sec Loss 9.5266 LearningRate 0.0010 Epoch: 3 Global Step: 82910 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:16:04,031-Speed 6313.85 samples/sec Loss 9.6673 LearningRate 0.0010 Epoch: 3 Global Step: 82920 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:16:07,274-Speed 6316.51 samples/sec Loss 9.6966 LearningRate 0.0010 Epoch: 3 Global Step: 82930 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:16:10,518-Speed 6313.72 samples/sec Loss 9.5925 LearningRate 0.0010 Epoch: 3 Global Step: 82940 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:16:13,757-Speed 6323.26 samples/sec Loss 9.6214 LearningRate 0.0010 Epoch: 3 Global Step: 82950 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:17:13,321-Speed 343.84 samples/sec Loss 9.5561 LearningRate 0.0010 Epoch: 4 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:17:16,559-Speed 6326.25 samples/sec Loss 9.6963 LearningRate 0.0010 Epoch: 4 Global Step: 82970 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:17:19,827-Speed 6269.12 samples/sec Loss 9.6169 LearningRate 0.0010 Epoch: 4 Global Step: 82980 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:17:23,109-Speed 6241.53 samples/sec Loss 9.6237 LearningRate 0.0010 Epoch: 4 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:26,379-Speed 6264.89 samples/sec Loss 9.6183 LearningRate 0.0010 Epoch: 4 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:29,618-Speed 6323.15 samples/sec Loss 9.5520 LearningRate 0.0010 Epoch: 4 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:32,854-Speed 6331.02 samples/sec Loss 9.7283 LearningRate 0.0010 Epoch: 4 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:36,095-Speed 6321.01 samples/sec Loss 9.5615 LearningRate 0.0010 Epoch: 4 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:39,337-Speed 6318.70 samples/sec Loss 9.6220 LearningRate 0.0010 Epoch: 4 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:42,575-Speed 6324.89 samples/sec Loss 9.6320 LearningRate 0.0010 Epoch: 4 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:17:45,823-Speed 6306.18 samples/sec Loss 9.6090 LearningRate 0.0010 Epoch: 4 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:49,059-Speed 6331.29 samples/sec Loss 9.5824 LearningRate 0.0010 Epoch: 4 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:52,305-Speed 6310.88 samples/sec Loss 9.5965 LearningRate 0.0010 Epoch: 4 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:55,532-Speed 6348.05 samples/sec Loss 9.6156 LearningRate 0.0010 Epoch: 4 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:17:58,770-Speed 6326.73 samples/sec Loss 9.6297 LearningRate 0.0010 Epoch: 4 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:02,011-Speed 6320.25 samples/sec Loss 9.7442 LearningRate 0.0010 Epoch: 4 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:05,250-Speed 6324.71 samples/sec Loss 9.5225 LearningRate 0.0010 Epoch: 4 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:08,492-Speed 6319.04 samples/sec Loss 9.6616 LearningRate 0.0010 Epoch: 4 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:11,734-Speed 6318.03 samples/sec Loss 9.6135 LearningRate 0.0010 Epoch: 4 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:14,976-Speed 6319.26 samples/sec Loss 9.6356 LearningRate 0.0010 Epoch: 4 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:18,221-Speed 6312.75 samples/sec Loss 9.5169 LearningRate 0.0010 Epoch: 4 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:21,458-Speed 6328.87 samples/sec Loss 9.5183 LearningRate 0.0010 Epoch: 4 Global Step: 83170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:18:24,685-Speed 6346.03 samples/sec Loss 
9.5272 LearningRate 0.0010 Epoch: 4 Global Step: 83180 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:27,922-Speed 6329.66 samples/sec Loss 9.5114 LearningRate 0.0010 Epoch: 4 Global Step: 83190 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:31,161-Speed 6322.96 samples/sec Loss 9.6262 LearningRate 0.0010 Epoch: 4 Global Step: 83200 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:34,461-Speed 6207.86 samples/sec Loss 9.5762 LearningRate 0.0010 Epoch: 4 Global Step: 83210 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:37,699-Speed 6326.96 samples/sec Loss 9.5941 LearningRate 0.0010 Epoch: 4 Global Step: 83220 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:40,944-Speed 6311.14 samples/sec Loss 9.6245 LearningRate 0.0010 Epoch: 4 Global Step: 83230 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:44,182-Speed 6327.37 samples/sec Loss 9.6040 LearningRate 0.0010 Epoch: 4 Global Step: 83240 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:47,422-Speed 6322.27 samples/sec Loss 9.5056 LearningRate 0.0010 Epoch: 4 Global Step: 83250 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:50,660-Speed 6325.86 samples/sec Loss 9.5317 LearningRate 0.0010 Epoch: 4 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:53,901-Speed 6321.86 samples/sec Loss 9.5452 LearningRate 0.0010 Epoch: 4 Global Step: 83270 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:18:57,149-Speed 6307.16 samples/sec Loss 9.4824 LearningRate 0.0010 Epoch: 4 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:00,389-Speed 6322.31 samples/sec Loss 9.5911 LearningRate 0.0010 Epoch: 4 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:03,619-Speed 6341.49 samples/sec Loss 9.6427 LearningRate 0.0010 Epoch: 4 Global Step: 83300 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:06,858-Speed 6324.11 samples/sec Loss 9.6910 LearningRate 0.0010 Epoch: 4 Global Step: 83310 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:10,096-Speed 6327.79 samples/sec Loss 9.5826 LearningRate 0.0010 Epoch: 4 Global Step: 83320 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:13,339-Speed 6315.83 samples/sec Loss 9.5433 LearningRate 0.0010 Epoch: 4 Global Step: 83330 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:16,581-Speed 6318.62 samples/sec Loss 9.6074 LearningRate 0.0010 Epoch: 4 Global Step: 83340 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:19,826-Speed 6312.65 samples/sec Loss 9.6542 LearningRate 0.0010 Epoch: 4 Global Step: 83350 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:23,067-Speed 6319.48 samples/sec Loss 9.5748 LearningRate 0.0010 Epoch: 4 Global Step: 83360 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:26,323-Speed 6291.74 samples/sec Loss 9.5977 LearningRate 0.0010 Epoch: 4 Global Step: 83370 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:29,563-Speed 6323.18 samples/sec Loss 9.6250 LearningRate 0.0010 Epoch: 4 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:32,803-Speed 6320.85 samples/sec Loss 9.5431 LearningRate 0.0010 Epoch: 4 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:19:36,045-Speed 6319.30 samples/sec Loss 9.4562 LearningRate 0.0010 Epoch: 4 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:39,287-Speed 6318.46 samples/sec Loss 9.5732 LearningRate 0.0010 Epoch: 4 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:42,530-Speed 6317.27 samples/sec Loss 9.5627 LearningRate 0.0010 Epoch: 4 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:19:45,774-Speed 6314.15 samples/sec Loss 9.5933 LearningRate 0.0010 Epoch: 4 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:49,028-Speed 6294.10 samples/sec Loss 9.6031 LearningRate 0.0010 Epoch: 4 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:52,273-Speed 6314.32 samples/sec Loss 9.6269 LearningRate 0.0010 Epoch: 4 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:55,514-Speed 6319.92 samples/sec Loss 9.5707 LearningRate 0.0010 Epoch: 4 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:19:58,769-Speed 6293.85 samples/sec Loss 9.5672 LearningRate 0.0010 Epoch: 4 Global Step: 83470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:02,007-Speed 6325.07 samples/sec Loss 9.5885 LearningRate 0.0010 Epoch: 4 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:05,254-Speed 6308.85 samples/sec Loss 9.5747 LearningRate 0.0010 Epoch: 4 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:08,497-Speed 6316.99 samples/sec Loss 9.5938 LearningRate 0.0010 Epoch: 4 Global Step: 83500 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:20:11,731-Speed 6337.58 samples/sec Loss 9.4297 LearningRate 0.0010 Epoch: 4 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:14,980-Speed 6305.64 samples/sec Loss 9.6364 LearningRate 0.0010 Epoch: 4 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:18,223-Speed 6316.97 samples/sec Loss 9.4932 LearningRate 0.0010 Epoch: 4 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:21,469-Speed 6310.24 samples/sec Loss 9.5911 LearningRate 0.0010 Epoch: 4 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:24,711-Speed 6318.43 samples/sec Loss 
9.5061 LearningRate 0.0010 Epoch: 4 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:27,952-Speed 6321.54 samples/sec Loss 9.5854 LearningRate 0.0010 Epoch: 4 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:31,211-Speed 6285.18 samples/sec Loss 9.5242 LearningRate 0.0010 Epoch: 4 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:34,451-Speed 6321.92 samples/sec Loss 9.5997 LearningRate 0.0010 Epoch: 4 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:37,694-Speed 6316.87 samples/sec Loss 9.5145 LearningRate 0.0010 Epoch: 4 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:40,942-Speed 6306.97 samples/sec Loss 9.6042 LearningRate 0.0010 Epoch: 4 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:44,172-Speed 6341.59 samples/sec Loss 9.6896 LearningRate 0.0010 Epoch: 4 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:47,418-Speed 6310.16 samples/sec Loss 9.6150 LearningRate 0.0010 Epoch: 4 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:50,661-Speed 6317.37 samples/sec Loss 9.5823 LearningRate 0.0010 Epoch: 4 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:53,906-Speed 6310.77 samples/sec Loss 9.5234 LearningRate 0.0010 Epoch: 4 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:20:57,147-Speed 6321.86 samples/sec Loss 9.6197 LearningRate 0.0010 Epoch: 4 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:00,390-Speed 6316.78 samples/sec Loss 9.6514 LearningRate 0.0010 Epoch: 4 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:03,633-Speed 6316.33 samples/sec Loss 9.4617 LearningRate 0.0010 Epoch: 4 Global Step: 83670 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:06,873-Speed 6322.23 samples/sec Loss 9.5802 LearningRate 0.0010 Epoch: 4 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:10,113-Speed 6321.46 samples/sec Loss 9.5789 LearningRate 0.0010 Epoch: 4 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:13,370-Speed 6290.30 samples/sec Loss 9.6105 LearningRate 0.0010 Epoch: 4 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:16,599-Speed 6343.57 samples/sec Loss 9.5127 LearningRate 0.0010 Epoch: 4 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:19,843-Speed 6314.87 samples/sec Loss 9.4855 LearningRate 0.0010 Epoch: 4 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:23,087-Speed 6314.59 samples/sec Loss 9.5757 LearningRate 0.0010 Epoch: 4 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:26,329-Speed 6319.46 samples/sec Loss 9.5024 LearningRate 0.0010 Epoch: 4 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:29,570-Speed 6319.80 samples/sec Loss 9.5197 LearningRate 0.0010 Epoch: 4 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:32,814-Speed 6314.77 samples/sec Loss 9.6115 LearningRate 0.0010 Epoch: 4 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:36,057-Speed 6316.69 samples/sec Loss 9.5875 LearningRate 0.0010 Epoch: 4 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:39,303-Speed 6310.19 samples/sec Loss 9.5713 LearningRate 0.0010 Epoch: 4 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:42,548-Speed 6313.21 samples/sec Loss 9.6112 LearningRate 0.0010 Epoch: 4 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:21:45,790-Speed 6318.84 samples/sec Loss 9.5369 LearningRate 0.0010 Epoch: 4 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:49,017-Speed 6346.66 samples/sec Loss 9.6034 LearningRate 0.0010 Epoch: 4 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:52,259-Speed 6319.23 samples/sec Loss 9.5329 LearningRate 0.0010 Epoch: 4 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:55,503-Speed 6314.15 samples/sec Loss 9.6031 LearningRate 0.0010 Epoch: 4 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:21:58,754-Speed 6300.85 samples/sec Loss 9.5910 LearningRate 0.0010 Epoch: 4 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:01,996-Speed 6318.71 samples/sec Loss 9.5465 LearningRate 0.0010 Epoch: 4 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:05,238-Speed 6317.90 samples/sec Loss 9.5474 LearningRate 0.0010 Epoch: 4 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:08,475-Speed 6328.21 samples/sec Loss 9.7056 LearningRate 0.0010 Epoch: 4 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:11,719-Speed 6316.13 samples/sec Loss 9.5595 LearningRate 0.0010 Epoch: 4 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:14,961-Speed 6317.17 samples/sec Loss 9.5846 LearningRate 0.0010 Epoch: 4 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:18,205-Speed 6315.91 samples/sec Loss 9.5742 LearningRate 0.0010 Epoch: 4 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:21,436-Speed 6338.80 samples/sec Loss 9.6817 LearningRate 0.0010 Epoch: 4 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:24,681-Speed 6312.15 samples/sec Loss 
9.5506 LearningRate 0.0010 Epoch: 4 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:27,927-Speed 6311.42 samples/sec Loss 9.5411 LearningRate 0.0010 Epoch: 4 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:31,168-Speed 6322.38 samples/sec Loss 9.6054 LearningRate 0.0010 Epoch: 4 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:34,411-Speed 6315.14 samples/sec Loss 9.4868 LearningRate 0.0010 Epoch: 4 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:37,658-Speed 6310.49 samples/sec Loss 9.6785 LearningRate 0.0010 Epoch: 4 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:40,903-Speed 6312.31 samples/sec Loss 9.5405 LearningRate 0.0010 Epoch: 4 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:44,148-Speed 6312.03 samples/sec Loss 9.5001 LearningRate 0.0010 Epoch: 4 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:47,390-Speed 6318.59 samples/sec Loss 9.6404 LearningRate 0.0010 Epoch: 4 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:50,631-Speed 6319.93 samples/sec Loss 9.5702 LearningRate 0.0010 Epoch: 4 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:53,857-Speed 6350.62 samples/sec Loss 9.5759 LearningRate 0.0010 Epoch: 4 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:22:57,100-Speed 6316.71 samples/sec Loss 9.4849 LearningRate 0.0010 Epoch: 4 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:00,347-Speed 6307.54 samples/sec Loss 9.4722 LearningRate 0.0010 Epoch: 4 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:03,589-Speed 6318.70 samples/sec Loss 9.5060 LearningRate 0.0010 Epoch: 4 Global Step: 84040 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:06,837-Speed 6307.80 samples/sec Loss 9.5793 LearningRate 0.0010 Epoch: 4 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:10,080-Speed 6316.40 samples/sec Loss 9.4526 LearningRate 0.0010 Epoch: 4 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:13,327-Speed 6307.78 samples/sec Loss 9.5186 LearningRate 0.0010 Epoch: 4 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:16,571-Speed 6315.69 samples/sec Loss 9.5843 LearningRate 0.0010 Epoch: 4 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:19,819-Speed 6305.96 samples/sec Loss 9.6048 LearningRate 0.0010 Epoch: 4 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:23,063-Speed 6314.47 samples/sec Loss 9.5127 LearningRate 0.0010 Epoch: 4 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:26,338-Speed 6254.82 samples/sec Loss 9.5814 LearningRate 0.0010 Epoch: 4 Global Step: 84110 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:23:29,606-Speed 6267.85 samples/sec Loss 9.5562 LearningRate 0.0010 Epoch: 4 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:32,847-Speed 6320.91 samples/sec Loss 9.5152 LearningRate 0.0010 Epoch: 4 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:36,091-Speed 6314.48 samples/sec Loss 9.5007 LearningRate 0.0010 Epoch: 4 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:39,336-Speed 6314.92 samples/sec Loss 9.5814 LearningRate 0.0010 Epoch: 4 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:42,581-Speed 6310.88 samples/sec Loss 9.5920 LearningRate 0.0010 Epoch: 4 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:23:45,827-Speed 6311.03 samples/sec Loss 9.4367 LearningRate 0.0010 Epoch: 4 Global Step: 84170 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:49,083-Speed 6292.20 samples/sec Loss 9.4833 LearningRate 0.0010 Epoch: 4 Global Step: 84180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:52,371-Speed 6230.15 samples/sec Loss 9.5691 LearningRate 0.0010 Epoch: 4 Global Step: 84190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:55,616-Speed 6312.99 samples/sec Loss 9.4923 LearningRate 0.0010 Epoch: 4 Global Step: 84200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:23:58,859-Speed 6316.33 samples/sec Loss 9.6240 LearningRate 0.0010 Epoch: 4 Global Step: 84210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:02,089-Speed 6340.59 samples/sec Loss 9.5009 LearningRate 0.0010 Epoch: 4 Global Step: 84220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:05,334-Speed 6312.44 samples/sec Loss 9.5761 LearningRate 0.0010 Epoch: 4 Global Step: 84230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:08,582-Speed 6307.58 samples/sec Loss 9.4884 LearningRate 0.0010 Epoch: 4 Global Step: 84240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:11,826-Speed 6315.12 samples/sec Loss 9.4889 LearningRate 0.0010 Epoch: 4 Global Step: 84250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:15,070-Speed 6314.12 samples/sec Loss 9.5839 LearningRate 0.0010 Epoch: 4 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:18,316-Speed 6311.26 samples/sec Loss 9.5248 LearningRate 0.0010 Epoch: 4 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:21,557-Speed 6320.72 samples/sec Loss 9.4368 LearningRate 0.0010 Epoch: 4 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:24,804-Speed 6307.53 samples/sec Loss 
9.4936 LearningRate 0.0010 Epoch: 4 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:28,044-Speed 6322.98 samples/sec Loss 9.5371 LearningRate 0.0010 Epoch: 4 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:31,290-Speed 6310.64 samples/sec Loss 9.4231 LearningRate 0.0010 Epoch: 4 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:34,519-Speed 6343.70 samples/sec Loss 9.5299 LearningRate 0.0010 Epoch: 4 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:37,761-Speed 6319.63 samples/sec Loss 9.5199 LearningRate 0.0010 Epoch: 4 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:41,004-Speed 6316.21 samples/sec Loss 9.5094 LearningRate 0.0010 Epoch: 4 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:44,244-Speed 6322.17 samples/sec Loss 9.4532 LearningRate 0.0010 Epoch: 4 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:47,488-Speed 6315.06 samples/sec Loss 9.5247 LearningRate 0.0010 Epoch: 4 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:50,732-Speed 6315.97 samples/sec Loss 9.4961 LearningRate 0.0010 Epoch: 4 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:53,974-Speed 6318.31 samples/sec Loss 9.6175 LearningRate 0.0010 Epoch: 4 Global Step: 84380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:24:57,220-Speed 6309.66 samples/sec Loss 9.5251 LearningRate 0.0010 Epoch: 4 Global Step: 84390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:00,467-Speed 6308.04 samples/sec Loss 9.5497 LearningRate 0.0010 Epoch: 4 Global Step: 84400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:03,714-Speed 6310.15 samples/sec Loss 9.5290 LearningRate 0.0010 Epoch: 4 Global Step: 84410 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:06,957-Speed 6316.16 samples/sec Loss 9.5411 LearningRate 0.0010 Epoch: 4 Global Step: 84420 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:25:10,188-Speed 6339.27 samples/sec Loss 9.5238 LearningRate 0.0010 Epoch: 4 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:13,435-Speed 6308.31 samples/sec Loss 9.5414 LearningRate 0.0010 Epoch: 4 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:16,667-Speed 6339.73 samples/sec Loss 9.4980 LearningRate 0.0010 Epoch: 4 Global Step: 84450 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:19,911-Speed 6314.50 samples/sec Loss 9.4253 LearningRate 0.0010 Epoch: 4 Global Step: 84460 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:23,155-Speed 6314.91 samples/sec Loss 9.5208 LearningRate 0.0010 Epoch: 4 Global Step: 84470 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:26,404-Speed 6304.65 samples/sec Loss 9.6207 LearningRate 0.0010 Epoch: 4 Global Step: 84480 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:29,650-Speed 6310.31 samples/sec Loss 9.4787 LearningRate 0.0010 Epoch: 4 Global Step: 84490 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:32,892-Speed 6317.27 samples/sec Loss 9.5898 LearningRate 0.0010 Epoch: 4 Global Step: 84500 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:36,135-Speed 6317.95 samples/sec Loss 9.5325 LearningRate 0.0010 Epoch: 4 Global Step: 84510 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:39,377-Speed 6317.11 samples/sec Loss 9.6248 LearningRate 0.0010 Epoch: 4 Global Step: 84520 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:42,620-Speed 6317.34 samples/sec Loss 9.5186 LearningRate 0.0010 Epoch: 4 Global Step: 84530 Fp16 Grad Scale: 32768 Required: 68 hours Training: 
2022-03-31 23:25:45,867-Speed 6309.45 samples/sec Loss 9.5444 LearningRate 0.0010 Epoch: 4 Global Step: 84540 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:25:49,116-Speed 6303.81 samples/sec Loss 9.4771 LearningRate 0.0010 Epoch: 4 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:52,360-Speed 6316.12 samples/sec Loss 9.4640 LearningRate 0.0010 Epoch: 4 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:55,607-Speed 6308.46 samples/sec Loss 9.5725 LearningRate 0.0010 Epoch: 4 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:25:58,851-Speed 6313.57 samples/sec Loss 9.4679 LearningRate 0.0010 Epoch: 4 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:02,096-Speed 6313.18 samples/sec Loss 9.4537 LearningRate 0.0010 Epoch: 4 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:05,348-Speed 6300.30 samples/sec Loss 9.5356 LearningRate 0.0010 Epoch: 4 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:08,591-Speed 6316.72 samples/sec Loss 9.4387 LearningRate 0.0010 Epoch: 4 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:11,839-Speed 6305.64 samples/sec Loss 9.5410 LearningRate 0.0010 Epoch: 4 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:15,083-Speed 6314.97 samples/sec Loss 9.5037 LearningRate 0.0010 Epoch: 4 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:18,327-Speed 6314.94 samples/sec Loss 9.4436 LearningRate 0.0010 Epoch: 4 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:21,559-Speed 6338.19 samples/sec Loss 9.5363 LearningRate 0.0010 Epoch: 4 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:24,802-Speed 6316.03 samples/sec Loss 
9.5784 LearningRate 0.0010 Epoch: 4 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:28,058-Speed 6291.75 samples/sec Loss 9.4850 LearningRate 0.0010 Epoch: 4 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:31,301-Speed 6315.21 samples/sec Loss 9.5164 LearningRate 0.0010 Epoch: 4 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:34,546-Speed 6314.04 samples/sec Loss 9.5375 LearningRate 0.0010 Epoch: 4 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:37,794-Speed 6308.26 samples/sec Loss 9.4222 LearningRate 0.0010 Epoch: 4 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:41,036-Speed 6318.05 samples/sec Loss 9.6132 LearningRate 0.0010 Epoch: 4 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:44,279-Speed 6316.46 samples/sec Loss 9.5060 LearningRate 0.0010 Epoch: 4 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:47,531-Speed 6299.33 samples/sec Loss 9.5456 LearningRate 0.0010 Epoch: 4 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:50,775-Speed 6313.99 samples/sec Loss 9.4410 LearningRate 0.0010 Epoch: 4 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:26:54,021-Speed 6311.67 samples/sec Loss 9.5049 LearningRate 0.0010 Epoch: 4 Global Step: 84750 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:26:57,249-Speed 6344.93 samples/sec Loss 9.4511 LearningRate 0.0010 Epoch: 4 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:00,492-Speed 6318.24 samples/sec Loss 9.4477 LearningRate 0.0010 Epoch: 4 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:03,744-Speed 6299.60 samples/sec Loss 9.4851 LearningRate 0.0010 Epoch: 4 Global Step: 84780 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:06,991-Speed 6308.81 samples/sec Loss 9.5071 LearningRate 0.0010 Epoch: 4 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:10,236-Speed 6312.71 samples/sec Loss 9.4505 LearningRate 0.0010 Epoch: 4 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:13,481-Speed 6311.00 samples/sec Loss 9.4460 LearningRate 0.0010 Epoch: 4 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:16,727-Speed 6312.31 samples/sec Loss 9.4491 LearningRate 0.0010 Epoch: 4 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:19,959-Speed 6336.56 samples/sec Loss 9.5320 LearningRate 0.0010 Epoch: 4 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:23,211-Speed 6302.27 samples/sec Loss 9.4764 LearningRate 0.0010 Epoch: 4 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:26,469-Speed 6288.02 samples/sec Loss 9.4385 LearningRate 0.0010 Epoch: 4 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:29,729-Speed 6282.00 samples/sec Loss 9.4553 LearningRate 0.0010 Epoch: 4 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:32,972-Speed 6316.94 samples/sec Loss 9.5069 LearningRate 0.0010 Epoch: 4 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:36,217-Speed 6312.28 samples/sec Loss 9.5672 LearningRate 0.0010 Epoch: 4 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:39,463-Speed 6313.18 samples/sec Loss 9.4041 LearningRate 0.0010 Epoch: 4 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:42,712-Speed 6303.64 samples/sec Loss 9.5475 LearningRate 0.0010 Epoch: 4 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 68 hours Training: 
2022-03-31 23:27:45,962-Speed 6303.57 samples/sec Loss 9.4954 LearningRate 0.0010 Epoch: 4 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:49,207-Speed 6312.62 samples/sec Loss 9.5089 LearningRate 0.0010 Epoch: 4 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:27:52,450-Speed 6315.75 samples/sec Loss 9.3554 LearningRate 0.0010 Epoch: 4 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:55,701-Speed 6302.46 samples/sec Loss 9.4344 LearningRate 0.0010 Epoch: 4 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:27:58,950-Speed 6303.73 samples/sec Loss 9.5461 LearningRate 0.0010 Epoch: 4 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:02,199-Speed 6304.71 samples/sec Loss 9.4545 LearningRate 0.0010 Epoch: 4 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:05,449-Speed 6304.17 samples/sec Loss 9.4358 LearningRate 0.0010 Epoch: 4 Global Step: 84970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:08,693-Speed 6315.02 samples/sec Loss 9.4815 LearningRate 0.0010 Epoch: 4 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:11,940-Speed 6308.20 samples/sec Loss 9.4327 LearningRate 0.0010 Epoch: 4 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:15,183-Speed 6317.44 samples/sec Loss 9.5034 LearningRate 0.0010 Epoch: 4 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:18,428-Speed 6311.97 samples/sec Loss 9.5404 LearningRate 0.0010 Epoch: 4 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:21,682-Speed 6295.56 samples/sec Loss 9.4614 LearningRate 0.0010 Epoch: 4 Global Step: 85020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:24,914-Speed 6338.81 samples/sec Loss 
9.4933 LearningRate 0.0010 Epoch: 4 Global Step: 85030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:28,158-Speed 6312.71 samples/sec Loss 9.5151 LearningRate 0.0010 Epoch: 4 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:31,399-Speed 6321.38 samples/sec Loss 9.5053 LearningRate 0.0010 Epoch: 4 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:34,643-Speed 6314.73 samples/sec Loss 9.4089 LearningRate 0.0010 Epoch: 4 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:37,887-Speed 6314.97 samples/sec Loss 9.3736 LearningRate 0.0010 Epoch: 4 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:28:41,139-Speed 6299.48 samples/sec Loss 9.4130 LearningRate 0.0010 Epoch: 4 Global Step: 85080 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:28:44,383-Speed 6312.98 samples/sec Loss 9.5749 LearningRate 0.0010 Epoch: 4 Global Step: 85090 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:28:47,628-Speed 6312.61 samples/sec Loss 9.3820 LearningRate 0.0010 Epoch: 4 Global Step: 85100 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:28:50,874-Speed 6311.02 samples/sec Loss 9.4247 LearningRate 0.0010 Epoch: 4 Global Step: 85110 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:28:54,118-Speed 6314.88 samples/sec Loss 9.4819 LearningRate 0.0010 Epoch: 4 Global Step: 85120 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:28:57,362-Speed 6315.51 samples/sec Loss 9.4753 LearningRate 0.0010 Epoch: 4 Global Step: 85130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:29:00,608-Speed 6309.82 samples/sec Loss 9.4841 LearningRate 0.0010 Epoch: 4 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:29:03,851-Speed 6316.79 samples/sec Loss 9.5505 LearningRate 0.0010 Epoch: 4 Global Step: 85150 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:29:07,097-Speed 6309.94 samples/sec Loss 9.4790 LearningRate 0.0010 Epoch: 4 Global Step: 85160 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:29:10,341-Speed 6314.96 samples/sec Loss 9.4204 LearningRate 0.0010 Epoch: 4 Global Step: 85170 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:29:13,585-Speed 6316.59 samples/sec Loss 9.4324 LearningRate 0.0010 Epoch: 4 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:16,829-Speed 6314.91 samples/sec Loss 9.5830 LearningRate 0.0010 Epoch: 4 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:20,074-Speed 6312.64 samples/sec Loss 9.4939 LearningRate 0.0010 Epoch: 4 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:23,327-Speed 6296.12 samples/sec Loss 9.4620 LearningRate 0.0010 Epoch: 4 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:26,573-Speed 6312.06 samples/sec Loss 9.4734 LearningRate 0.0010 Epoch: 4 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:29,817-Speed 6312.82 samples/sec Loss 9.4287 LearningRate 0.0010 Epoch: 4 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:33,061-Speed 6315.42 samples/sec Loss 9.5534 LearningRate 0.0010 Epoch: 4 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:36,308-Speed 6308.59 samples/sec Loss 9.4677 LearningRate 0.0010 Epoch: 4 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:39,556-Speed 6307.76 samples/sec Loss 9.4603 LearningRate 0.0010 Epoch: 4 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:42,802-Speed 6310.35 samples/sec Loss 9.5726 LearningRate 0.0010 Epoch: 4 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:29:46,037-Speed 6331.73 samples/sec Loss 9.4346 LearningRate 0.0010 Epoch: 4 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:49,280-Speed 6317.08 samples/sec Loss 9.5124 LearningRate 0.0010 Epoch: 4 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:52,537-Speed 6289.13 samples/sec Loss 9.5564 LearningRate 0.0010 Epoch: 4 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:55,804-Speed 6269.82 samples/sec Loss 9.4628 LearningRate 0.0010 Epoch: 4 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:29:59,048-Speed 6314.87 samples/sec Loss 9.5290 LearningRate 0.0010 Epoch: 4 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:02,278-Speed 6342.16 samples/sec Loss 9.4307 LearningRate 0.0010 Epoch: 4 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:05,520-Speed 6318.18 samples/sec Loss 9.4564 LearningRate 0.0010 Epoch: 4 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:08,765-Speed 6313.40 samples/sec Loss 9.5009 LearningRate 0.0010 Epoch: 4 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:12,010-Speed 6311.59 samples/sec Loss 9.5620 LearningRate 0.0010 Epoch: 4 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:15,258-Speed 6309.25 samples/sec Loss 9.5166 LearningRate 0.0010 Epoch: 4 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:18,506-Speed 6307.42 samples/sec Loss 9.4320 LearningRate 0.0010 Epoch: 4 Global Step: 85380 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:21,749-Speed 6316.48 samples/sec Loss 9.4892 LearningRate 0.0010 Epoch: 4 Global Step: 85390 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:25,067-Speed 6174.37 samples/sec Loss 
9.4687 LearningRate 0.0010 Epoch: 4 Global Step: 85400 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:28,368-Speed 6205.42 samples/sec Loss 9.5022 LearningRate 0.0010 Epoch: 4 Global Step: 85410 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:31,627-Speed 6286.29 samples/sec Loss 9.4324 LearningRate 0.0010 Epoch: 4 Global Step: 85420 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:30:34,878-Speed 6299.37 samples/sec Loss 9.4831 LearningRate 0.0010 Epoch: 4 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:38,128-Speed 6305.87 samples/sec Loss 9.4374 LearningRate 0.0010 Epoch: 4 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:41,374-Speed 6310.48 samples/sec Loss 9.4760 LearningRate 0.0010 Epoch: 4 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:44,621-Speed 6309.02 samples/sec Loss 9.4689 LearningRate 0.0010 Epoch: 4 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:47,871-Speed 6301.89 samples/sec Loss 9.3857 LearningRate 0.0010 Epoch: 4 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:51,122-Speed 6301.20 samples/sec Loss 9.4746 LearningRate 0.0010 Epoch: 4 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:54,390-Speed 6267.75 samples/sec Loss 9.4270 LearningRate 0.0010 Epoch: 4 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:30:57,636-Speed 6311.61 samples/sec Loss 9.4466 LearningRate 0.0010 Epoch: 4 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:00,882-Speed 6311.08 samples/sec Loss 9.4456 LearningRate 0.0010 Epoch: 4 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:04,131-Speed 6303.91 samples/sec Loss 9.5687 LearningRate 0.0010 Epoch: 4 Global Step: 85520 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:07,365-Speed 6335.11 samples/sec Loss 9.3796 LearningRate 0.0010 Epoch: 4 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:10,611-Speed 6310.54 samples/sec Loss 9.3989 LearningRate 0.0010 Epoch: 4 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:13,856-Speed 6312.43 samples/sec Loss 9.4281 LearningRate 0.0010 Epoch: 4 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:17,101-Speed 6313.08 samples/sec Loss 9.4628 LearningRate 0.0010 Epoch: 4 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:20,347-Speed 6309.79 samples/sec Loss 9.3697 LearningRate 0.0010 Epoch: 4 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:23,593-Speed 6309.46 samples/sec Loss 9.3660 LearningRate 0.0010 Epoch: 4 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:26,840-Speed 6309.54 samples/sec Loss 9.4444 LearningRate 0.0010 Epoch: 4 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:30,091-Speed 6301.88 samples/sec Loss 9.4060 LearningRate 0.0010 Epoch: 4 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:33,336-Speed 6311.69 samples/sec Loss 9.4154 LearningRate 0.0010 Epoch: 4 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:36,583-Speed 6309.78 samples/sec Loss 9.4283 LearningRate 0.0010 Epoch: 4 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:39,824-Speed 6319.77 samples/sec Loss 9.4334 LearningRate 0.0010 Epoch: 4 Global Step: 85630 Fp16 Grad Scale: 131072 Required: 68 hours Training: 2022-03-31 23:31:43,058-Speed 6335.93 samples/sec Loss 9.3679 LearningRate 0.0010 Epoch: 4 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:31:46,336-Speed 6247.55 samples/sec Loss 9.4703 LearningRate 0.0010 Epoch: 4 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:49,671-Speed 6143.08 samples/sec Loss 9.5032 LearningRate 0.0010 Epoch: 4 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:52,918-Speed 6307.39 samples/sec Loss 9.4193 LearningRate 0.0010 Epoch: 4 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:56,163-Speed 6313.51 samples/sec Loss 9.4185 LearningRate 0.0010 Epoch: 4 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:31:59,410-Speed 6308.17 samples/sec Loss 9.3614 LearningRate 0.0010 Epoch: 4 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:02,657-Speed 6308.97 samples/sec Loss 9.4259 LearningRate 0.0010 Epoch: 4 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:05,901-Speed 6314.69 samples/sec Loss 9.4120 LearningRate 0.0010 Epoch: 4 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:09,143-Speed 6319.34 samples/sec Loss 9.4666 LearningRate 0.0010 Epoch: 4 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:12,389-Speed 6310.88 samples/sec Loss 9.4234 LearningRate 0.0010 Epoch: 4 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:15,620-Speed 6340.35 samples/sec Loss 9.4171 LearningRate 0.0010 Epoch: 4 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:18,865-Speed 6312.18 samples/sec Loss 9.4337 LearningRate 0.0010 Epoch: 4 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:22,118-Speed 6297.29 samples/sec Loss 9.4470 LearningRate 0.0010 Epoch: 4 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:25,360-Speed 6317.42 samples/sec Loss 
9.4433 LearningRate 0.0010 Epoch: 4 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:28,601-Speed 6321.20 samples/sec Loss 9.5465 LearningRate 0.0010 Epoch: 4 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:31,845-Speed 6312.99 samples/sec Loss 9.4416 LearningRate 0.0010 Epoch: 4 Global Step: 85790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:35,088-Speed 6316.70 samples/sec Loss 9.4615 LearningRate 0.0010 Epoch: 4 Global Step: 85800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:38,332-Speed 6315.24 samples/sec Loss 9.5023 LearningRate 0.0010 Epoch: 4 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:41,575-Speed 6316.62 samples/sec Loss 9.4843 LearningRate 0.0010 Epoch: 4 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:44,823-Speed 6308.44 samples/sec Loss 9.4511 LearningRate 0.0010 Epoch: 4 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:32:48,052-Speed 6344.02 samples/sec Loss 9.4529 LearningRate 0.0010 Epoch: 4 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:32:51,294-Speed 6317.19 samples/sec Loss 9.3489 LearningRate 0.0010 Epoch: 4 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:32:54,543-Speed 6304.66 samples/sec Loss 9.3365 LearningRate 0.0010 Epoch: 4 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:32:57,789-Speed 6311.78 samples/sec Loss 9.4391 LearningRate 0.0010 Epoch: 4 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:01,037-Speed 6305.97 samples/sec Loss 9.3446 LearningRate 0.0010 Epoch: 4 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:04,278-Speed 6321.37 samples/sec Loss 9.3889 LearningRate 0.0010 Epoch: 4 Global Step: 85890 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:07,523-Speed 6312.59 samples/sec Loss 9.5527 LearningRate 0.0010 Epoch: 4 Global Step: 85900 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:10,775-Speed 6299.04 samples/sec Loss 9.4501 LearningRate 0.0010 Epoch: 4 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:14,044-Speed 6268.48 samples/sec Loss 9.4172 LearningRate 0.0010 Epoch: 4 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:17,288-Speed 6315.90 samples/sec Loss 9.4492 LearningRate 0.0010 Epoch: 4 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:33:20,529-Speed 6320.41 samples/sec Loss 9.3828 LearningRate 0.0010 Epoch: 4 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:23,772-Speed 6315.18 samples/sec Loss 9.4016 LearningRate 0.0010 Epoch: 4 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:27,014-Speed 6320.10 samples/sec Loss 9.4115 LearningRate 0.0010 Epoch: 4 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:30,259-Speed 6313.92 samples/sec Loss 9.4121 LearningRate 0.0010 Epoch: 4 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:33,503-Speed 6313.81 samples/sec Loss 9.3380 LearningRate 0.0010 Epoch: 4 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:36,746-Speed 6316.65 samples/sec Loss 9.4240 LearningRate 0.0010 Epoch: 4 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:39,995-Speed 6304.50 samples/sec Loss 9.4411 LearningRate 0.0010 Epoch: 4 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:43,239-Speed 6316.48 samples/sec Loss 9.4836 LearningRate 0.0010 Epoch: 4 Global Step: 86010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:33:46,488-Speed 6304.91 samples/sec Loss 9.3742 LearningRate 0.0010 Epoch: 4 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:49,733-Speed 6311.42 samples/sec Loss 9.3397 LearningRate 0.0010 Epoch: 4 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:52,972-Speed 6324.25 samples/sec Loss 9.4071 LearningRate 0.0010 Epoch: 4 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:56,221-Speed 6305.24 samples/sec Loss 9.4637 LearningRate 0.0010 Epoch: 4 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:33:59,467-Speed 6311.46 samples/sec Loss 9.4366 LearningRate 0.0010 Epoch: 4 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:02,717-Speed 6303.84 samples/sec Loss 9.3999 LearningRate 0.0010 Epoch: 4 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:05,965-Speed 6306.28 samples/sec Loss 9.3654 LearningRate 0.0010 Epoch: 4 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:09,206-Speed 6320.99 samples/sec Loss 9.4031 LearningRate 0.0010 Epoch: 4 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:12,461-Speed 6293.43 samples/sec Loss 9.3810 LearningRate 0.0010 Epoch: 4 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:15,705-Speed 6313.87 samples/sec Loss 9.4363 LearningRate 0.0010 Epoch: 4 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:18,935-Speed 6342.59 samples/sec Loss 9.3853 LearningRate 0.0010 Epoch: 4 Global Step: 86120 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:22,298-Speed 6091.04 samples/sec Loss 9.4793 LearningRate 0.0010 Epoch: 4 Global Step: 86130 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:25,548-Speed 6302.94 samples/sec Loss 
9.3863 LearningRate 0.0010 Epoch: 4 Global Step: 86140 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:28,794-Speed 6309.95 samples/sec Loss 9.3550 LearningRate 0.0010 Epoch: 4 Global Step: 86150 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:32,039-Speed 6312.97 samples/sec Loss 9.4288 LearningRate 0.0010 Epoch: 4 Global Step: 86160 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:35,286-Speed 6308.14 samples/sec Loss 9.4241 LearningRate 0.0010 Epoch: 4 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:38,533-Speed 6310.20 samples/sec Loss 9.4126 LearningRate 0.0010 Epoch: 4 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:41,778-Speed 6311.58 samples/sec Loss 9.4379 LearningRate 0.0010 Epoch: 4 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:45,020-Speed 6319.94 samples/sec Loss 9.4075 LearningRate 0.0010 Epoch: 4 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:48,264-Speed 6313.35 samples/sec Loss 9.3991 LearningRate 0.0010 Epoch: 4 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:34:51,513-Speed 6304.44 samples/sec Loss 9.4237 LearningRate 0.0010 Epoch: 4 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:54,760-Speed 6310.02 samples/sec Loss 9.3506 LearningRate 0.0010 Epoch: 4 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:34:58,013-Speed 6296.29 samples/sec Loss 9.3084 LearningRate 0.0010 Epoch: 4 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:01,256-Speed 6317.42 samples/sec Loss 9.4438 LearningRate 0.0010 Epoch: 4 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:04,499-Speed 6317.24 samples/sec Loss 9.4979 LearningRate 0.0010 Epoch: 4 Global Step: 86260 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:07,743-Speed 6314.82 samples/sec Loss 9.3442 LearningRate 0.0010 Epoch: 4 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:10,984-Speed 6320.34 samples/sec Loss 9.3369 LearningRate 0.0010 Epoch: 4 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:14,230-Speed 6311.06 samples/sec Loss 9.4228 LearningRate 0.0010 Epoch: 4 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:17,473-Speed 6316.88 samples/sec Loss 9.3849 LearningRate 0.0010 Epoch: 4 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:20,720-Speed 6308.36 samples/sec Loss 9.4390 LearningRate 0.0010 Epoch: 4 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:23,948-Speed 6344.47 samples/sec Loss 9.4809 LearningRate 0.0010 Epoch: 4 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:27,193-Speed 6313.85 samples/sec Loss 9.4292 LearningRate 0.0010 Epoch: 4 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:30,439-Speed 6309.67 samples/sec Loss 9.3429 LearningRate 0.0010 Epoch: 4 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:33,687-Speed 6308.18 samples/sec Loss 9.3551 LearningRate 0.0010 Epoch: 4 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:36,938-Speed 6300.88 samples/sec Loss 9.2754 LearningRate 0.0010 Epoch: 4 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:40,182-Speed 6313.62 samples/sec Loss 9.3730 LearningRate 0.0010 Epoch: 4 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:43,424-Speed 6320.12 samples/sec Loss 9.3531 LearningRate 0.0010 Epoch: 4 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:35:46,677-Speed 6296.73 samples/sec Loss 9.3539 LearningRate 0.0010 Epoch: 4 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:49,920-Speed 6316.58 samples/sec Loss 9.3760 LearningRate 0.0010 Epoch: 4 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:53,167-Speed 6308.29 samples/sec Loss 9.3513 LearningRate 0.0010 Epoch: 4 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:56,404-Speed 6331.27 samples/sec Loss 9.3049 LearningRate 0.0010 Epoch: 4 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:35:59,652-Speed 6306.53 samples/sec Loss 9.4112 LearningRate 0.0010 Epoch: 4 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:02,898-Speed 6309.64 samples/sec Loss 9.3371 LearningRate 0.0010 Epoch: 4 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:06,143-Speed 6312.85 samples/sec Loss 9.3314 LearningRate 0.0010 Epoch: 4 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:09,392-Speed 6304.84 samples/sec Loss 9.4043 LearningRate 0.0010 Epoch: 4 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:12,635-Speed 6316.52 samples/sec Loss 9.4043 LearningRate 0.0010 Epoch: 4 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:15,886-Speed 6301.48 samples/sec Loss 9.4178 LearningRate 0.0010 Epoch: 4 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:19,134-Speed 6307.68 samples/sec Loss 9.3858 LearningRate 0.0010 Epoch: 4 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:22,380-Speed 6310.94 samples/sec Loss 9.3355 LearningRate 0.0010 Epoch: 4 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:25,629-Speed 6303.60 samples/sec Loss 
9.3262 LearningRate 0.0010 Epoch: 4 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:28,860-Speed 6341.34 samples/sec Loss 9.4058 LearningRate 0.0010 Epoch: 4 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:32,106-Speed 6310.30 samples/sec Loss 9.3962 LearningRate 0.0010 Epoch: 4 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:35,354-Speed 6305.42 samples/sec Loss 9.4251 LearningRate 0.0010 Epoch: 4 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:38,602-Speed 6307.50 samples/sec Loss 9.3740 LearningRate 0.0010 Epoch: 4 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:41,847-Speed 6313.57 samples/sec Loss 9.4830 LearningRate 0.0010 Epoch: 4 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:45,091-Speed 6314.40 samples/sec Loss 9.3979 LearningRate 0.0010 Epoch: 4 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:48,336-Speed 6312.77 samples/sec Loss 9.3532 LearningRate 0.0010 Epoch: 4 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:51,583-Speed 6307.74 samples/sec Loss 9.4634 LearningRate 0.0010 Epoch: 4 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:54,828-Speed 6312.59 samples/sec Loss 9.3691 LearningRate 0.0010 Epoch: 4 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:36:58,075-Speed 6309.35 samples/sec Loss 9.3459 LearningRate 0.0010 Epoch: 4 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:01,304-Speed 6344.92 samples/sec Loss 9.3922 LearningRate 0.0010 Epoch: 4 Global Step: 86620 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:04,547-Speed 6315.12 samples/sec Loss 9.3321 LearningRate 0.0010 Epoch: 4 Global Step: 86630 
Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:07,788-Speed 6320.47 samples/sec Loss 9.4375 LearningRate 0.0010 Epoch: 4 Global Step: 86640 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:11,031-Speed 6317.74 samples/sec Loss 9.3772 LearningRate 0.0010 Epoch: 4 Global Step: 86650 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:14,274-Speed 6316.48 samples/sec Loss 9.4515 LearningRate 0.0010 Epoch: 4 Global Step: 86660 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:17,519-Speed 6312.67 samples/sec Loss 9.2302 LearningRate 0.0010 Epoch: 4 Global Step: 86670 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:20,766-Speed 6308.54 samples/sec Loss 9.3454 LearningRate 0.0010 Epoch: 4 Global Step: 86680 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:24,011-Speed 6312.04 samples/sec Loss 9.4214 LearningRate 0.0010 Epoch: 4 Global Step: 86690 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:27,258-Speed 6310.92 samples/sec Loss 9.3037 LearningRate 0.0010 Epoch: 4 Global Step: 86700 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:30,499-Speed 6320.34 samples/sec Loss 9.2350 LearningRate 0.0010 Epoch: 4 Global Step: 86710 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:37:33,739-Speed 6321.15 samples/sec Loss 9.3545 LearningRate 0.0010 Epoch: 4 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:36,984-Speed 6312.72 samples/sec Loss 9.3689 LearningRate 0.0010 Epoch: 4 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:40,229-Speed 6312.73 samples/sec Loss 9.3107 LearningRate 0.0010 Epoch: 4 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:43,476-Speed 6310.28 samples/sec Loss 9.2404 LearningRate 0.0010 Epoch: 4 Global Step: 86750 Fp16 Grad Scale: 65536 Required: 68 hours Training: 
2022-03-31 23:37:46,716-Speed 6322.09 samples/sec Loss 9.4465 LearningRate 0.0010 Epoch: 4 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:49,962-Speed 6311.08 samples/sec Loss 9.4329 LearningRate 0.0010 Epoch: 4 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:53,209-Speed 6308.17 samples/sec Loss 9.3249 LearningRate 0.0010 Epoch: 4 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:56,464-Speed 6293.15 samples/sec Loss 9.3539 LearningRate 0.0010 Epoch: 4 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:37:59,722-Speed 6287.59 samples/sec Loss 9.2559 LearningRate 0.0010 Epoch: 4 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:02,972-Speed 6302.88 samples/sec Loss 9.3061 LearningRate 0.0010 Epoch: 4 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:06,204-Speed 6338.67 samples/sec Loss 9.3835 LearningRate 0.0010 Epoch: 4 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:09,445-Speed 6318.73 samples/sec Loss 9.5189 LearningRate 0.0010 Epoch: 4 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:12,691-Speed 6312.13 samples/sec Loss 9.3418 LearningRate 0.0010 Epoch: 4 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:15,938-Speed 6309.17 samples/sec Loss 9.3736 LearningRate 0.0010 Epoch: 4 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:19,179-Speed 6319.22 samples/sec Loss 9.2911 LearningRate 0.0010 Epoch: 4 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:22,433-Speed 6295.94 samples/sec Loss 9.3852 LearningRate 0.0010 Epoch: 4 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:25,676-Speed 6316.25 samples/sec Loss 
9.2773 LearningRate 0.0010 Epoch: 4 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:38:28,910-Speed 6334.95 samples/sec Loss 9.3871 LearningRate 0.0010 Epoch: 4 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:32,157-Speed 6309.62 samples/sec Loss 9.4025 LearningRate 0.0010 Epoch: 4 Global Step: 86900 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:35,407-Speed 6302.44 samples/sec Loss 9.3265 LearningRate 0.0010 Epoch: 4 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:38,654-Speed 6308.11 samples/sec Loss 9.3503 LearningRate 0.0010 Epoch: 4 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:41,898-Speed 6314.23 samples/sec Loss 9.3605 LearningRate 0.0010 Epoch: 4 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:45,146-Speed 6307.84 samples/sec Loss 9.4462 LearningRate 0.0010 Epoch: 4 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:48,390-Speed 6315.17 samples/sec Loss 9.3769 LearningRate 0.0010 Epoch: 4 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:51,634-Speed 6313.95 samples/sec Loss 9.3639 LearningRate 0.0010 Epoch: 4 Global Step: 86960 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:54,886-Speed 6299.68 samples/sec Loss 9.2823 LearningRate 0.0010 Epoch: 4 Global Step: 86970 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:38:58,130-Speed 6313.53 samples/sec Loss 9.3704 LearningRate 0.0010 Epoch: 4 Global Step: 86980 Fp16 Grad Scale: 32768 Required: 68 hours Training: 2022-03-31 23:39:01,378-Speed 6307.56 samples/sec Loss 9.3663 LearningRate 0.0010 Epoch: 4 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:39:04,623-Speed 6311.95 samples/sec Loss 9.2936 LearningRate 0.0010 Epoch: 4 Global Step: 87000 
Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:39:07,867-Speed 6313.47 samples/sec Loss 9.3308 LearningRate 0.0010 Epoch: 4 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:39:11,111-Speed 6315.12 samples/sec Loss 9.4149 LearningRate 0.0010 Epoch: 4 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 68 hours Training: 2022-03-31 23:39:14,356-Speed 6312.98 samples/sec Loss 9.2512 LearningRate 0.0010 Epoch: 4 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:17,605-Speed 6304.36 samples/sec Loss 9.3290 LearningRate 0.0010 Epoch: 4 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:20,867-Speed 6280.97 samples/sec Loss 9.3414 LearningRate 0.0010 Epoch: 4 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:24,111-Speed 6313.19 samples/sec Loss 9.3167 LearningRate 0.0010 Epoch: 4 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:27,356-Speed 6312.89 samples/sec Loss 9.3593 LearningRate 0.0010 Epoch: 4 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:30,604-Speed 6307.97 samples/sec Loss 9.3828 LearningRate 0.0010 Epoch: 4 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:33,853-Speed 6304.25 samples/sec Loss 9.3872 LearningRate 0.0010 Epoch: 4 Global Step: 87090 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-03-31 23:39:37,086-Speed 6336.36 samples/sec Loss 9.2825 LearningRate 0.0010 Epoch: 4 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:40,331-Speed 6312.72 samples/sec Loss 9.3792 LearningRate 0.0010 Epoch: 4 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:43,573-Speed 6319.91 samples/sec Loss 9.4073 LearningRate 0.0010 Epoch: 4 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:39:46,817-Speed 6313.26 samples/sec Loss 9.2802 LearningRate 0.0010 Epoch: 4 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:50,063-Speed 6312.43 samples/sec Loss 9.3422 LearningRate 0.0010 Epoch: 4 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:53,325-Speed 6279.31 samples/sec Loss 9.2921 LearningRate 0.0010 Epoch: 4 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:56,568-Speed 6316.40 samples/sec Loss 9.3299 LearningRate 0.0010 Epoch: 4 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:39:59,814-Speed 6309.76 samples/sec Loss 9.3147 LearningRate 0.0010 Epoch: 4 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:03,061-Speed 6310.37 samples/sec Loss 9.2619 LearningRate 0.0010 Epoch: 4 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:06,305-Speed 6314.05 samples/sec Loss 9.3447 LearningRate 0.0010 Epoch: 4 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:09,535-Speed 6342.48 samples/sec Loss 9.2233 LearningRate 0.0010 Epoch: 4 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:12,777-Speed 6317.32 samples/sec Loss 9.3388 LearningRate 0.0010 Epoch: 4 Global Step: 87210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:16,019-Speed 6319.14 samples/sec Loss 9.3320 LearningRate 0.0010 Epoch: 4 Global Step: 87220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:19,267-Speed 6306.25 samples/sec Loss 9.3657 LearningRate 0.0010 Epoch: 4 Global Step: 87230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:22,509-Speed 6318.05 samples/sec Loss 9.3429 LearningRate 0.0010 Epoch: 4 Global Step: 87240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:25,755-Speed 6310.72 samples/sec Loss 
9.3592 LearningRate 0.0010 Epoch: 4 Global Step: 87250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:29,001-Speed 6311.91 samples/sec Loss 9.3551 LearningRate 0.0010 Epoch: 4 Global Step: 87260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:32,251-Speed 6301.76 samples/sec Loss 9.3128 LearningRate 0.0010 Epoch: 4 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:35,495-Speed 6315.12 samples/sec Loss 9.3049 LearningRate 0.0010 Epoch: 4 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:38,753-Speed 6286.79 samples/sec Loss 9.3676 LearningRate 0.0010 Epoch: 4 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:41,983-Speed 6343.38 samples/sec Loss 9.2412 LearningRate 0.0010 Epoch: 4 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:45,223-Speed 6320.98 samples/sec Loss 9.3570 LearningRate 0.0010 Epoch: 4 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:48,469-Speed 6311.23 samples/sec Loss 9.3556 LearningRate 0.0010 Epoch: 4 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:51,721-Speed 6303.72 samples/sec Loss 9.3627 LearningRate 0.0010 Epoch: 4 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:40:54,950-Speed 6344.04 samples/sec Loss 9.3996 LearningRate 0.0010 Epoch: 4 Global Step: 87340 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:40:58,194-Speed 6314.58 samples/sec Loss 9.3470 LearningRate 0.0010 Epoch: 4 Global Step: 87350 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:01,437-Speed 6315.49 samples/sec Loss 9.3293 LearningRate 0.0010 Epoch: 4 Global Step: 87360 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:04,683-Speed 6311.09 samples/sec Loss 9.2695 LearningRate 0.0010 Epoch: 4 Global Step: 87370 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:07,930-Speed 6309.14 samples/sec Loss 9.3451 LearningRate 0.0010 Epoch: 4 Global Step: 87380 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:11,177-Speed 6309.05 samples/sec Loss 9.2825 LearningRate 0.0010 Epoch: 4 Global Step: 87390 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:14,420-Speed 6316.10 samples/sec Loss 9.3403 LearningRate 0.0010 Epoch: 4 Global Step: 87400 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:17,678-Speed 6287.11 samples/sec Loss 9.3524 LearningRate 0.0010 Epoch: 4 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:20,924-Speed 6311.07 samples/sec Loss 9.3403 LearningRate 0.0010 Epoch: 4 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:24,170-Speed 6311.17 samples/sec Loss 9.2841 LearningRate 0.0010 Epoch: 4 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:41:27,413-Speed 6316.66 samples/sec Loss 9.3240 LearningRate 0.0010 Epoch: 4 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:30,658-Speed 6312.44 samples/sec Loss 9.3105 LearningRate 0.0010 Epoch: 4 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:33,902-Speed 6314.22 samples/sec Loss 9.3166 LearningRate 0.0010 Epoch: 4 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:37,157-Speed 6293.74 samples/sec Loss 9.3099 LearningRate 0.0010 Epoch: 4 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:40,402-Speed 6313.17 samples/sec Loss 9.3435 LearningRate 0.0010 Epoch: 4 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:43,650-Speed 6306.71 samples/sec Loss 9.3087 LearningRate 0.0010 Epoch: 4 Global Step: 87490 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:41:46,895-Speed 6312.25 samples/sec Loss 9.1898 LearningRate 0.0010 Epoch: 4 Global Step: 87500 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:50,133-Speed 6326.29 samples/sec Loss 9.2897 LearningRate 0.0010 Epoch: 4 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:53,380-Speed 6309.77 samples/sec Loss 9.3809 LearningRate 0.0010 Epoch: 4 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:56,624-Speed 6313.37 samples/sec Loss 9.3166 LearningRate 0.0010 Epoch: 4 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:41:59,857-Speed 6336.62 samples/sec Loss 9.2833 LearningRate 0.0010 Epoch: 4 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:03,103-Speed 6310.04 samples/sec Loss 9.2430 LearningRate 0.0010 Epoch: 4 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:06,354-Speed 6301.98 samples/sec Loss 9.3223 LearningRate 0.0010 Epoch: 4 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:09,601-Speed 6308.82 samples/sec Loss 9.3090 LearningRate 0.0010 Epoch: 4 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:12,850-Speed 6306.26 samples/sec Loss 9.3628 LearningRate 0.0010 Epoch: 4 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:16,096-Speed 6309.29 samples/sec Loss 9.2941 LearningRate 0.0010 Epoch: 4 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:19,344-Speed 6308.14 samples/sec Loss 9.2684 LearningRate 0.0010 Epoch: 4 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:22,599-Speed 6292.68 samples/sec Loss 9.2681 LearningRate 0.0010 Epoch: 4 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:25,846-Speed 6309.55 samples/sec Loss 
9.3558 LearningRate 0.0010 Epoch: 4 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:29,089-Speed 6315.76 samples/sec Loss 9.3443 LearningRate 0.0010 Epoch: 4 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:32,323-Speed 6334.70 samples/sec Loss 9.2732 LearningRate 0.0010 Epoch: 4 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:35,573-Speed 6301.67 samples/sec Loss 9.3580 LearningRate 0.0010 Epoch: 4 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:38,821-Speed 6307.83 samples/sec Loss 9.2913 LearningRate 0.0010 Epoch: 4 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:42,068-Speed 6308.52 samples/sec Loss 9.2721 LearningRate 0.0010 Epoch: 4 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:45,316-Speed 6306.78 samples/sec Loss 9.3613 LearningRate 0.0010 Epoch: 4 Global Step: 87680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:48,561-Speed 6312.23 samples/sec Loss 9.2523 LearningRate 0.0010 Epoch: 4 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:51,809-Speed 6307.21 samples/sec Loss 9.2973 LearningRate 0.0010 Epoch: 4 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:55,056-Speed 6307.97 samples/sec Loss 9.2937 LearningRate 0.0010 Epoch: 4 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:42:58,304-Speed 6307.64 samples/sec Loss 9.2783 LearningRate 0.0010 Epoch: 4 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:01,552-Speed 6307.35 samples/sec Loss 9.4092 LearningRate 0.0010 Epoch: 4 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:04,801-Speed 6303.51 samples/sec Loss 9.3175 LearningRate 0.0010 Epoch: 4 Global Step: 87740 
Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-03-31 23:43:08,031-Speed 6341.76 samples/sec Loss 9.3552 LearningRate 0.0010 Epoch: 4 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:11,281-Speed 6303.01 samples/sec Loss 9.2285 LearningRate 0.0010 Epoch: 4 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:14,527-Speed 6310.62 samples/sec Loss 9.2827 LearningRate 0.0010 Epoch: 4 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:17,775-Speed 6306.64 samples/sec Loss 9.2685 LearningRate 0.0010 Epoch: 4 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:21,025-Speed 6304.48 samples/sec Loss 9.3568 LearningRate 0.0010 Epoch: 4 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:24,267-Speed 6318.11 samples/sec Loss 9.3452 LearningRate 0.0010 Epoch: 4 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:27,514-Speed 6310.19 samples/sec Loss 9.2598 LearningRate 0.0010 Epoch: 4 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:30,761-Speed 6308.44 samples/sec Loss 9.3638 LearningRate 0.0010 Epoch: 4 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:34,006-Speed 6311.80 samples/sec Loss 9.2815 LearningRate 0.0010 Epoch: 4 Global Step: 87830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:37,254-Speed 6308.00 samples/sec Loss 9.3845 LearningRate 0.0010 Epoch: 4 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:40,483-Speed 6343.39 samples/sec Loss 9.3053 LearningRate 0.0010 Epoch: 4 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:43,729-Speed 6315.86 samples/sec Loss 9.1940 LearningRate 0.0010 Epoch: 4 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:43:46,975-Speed 6310.68 samples/sec Loss 9.3690 LearningRate 0.0010 Epoch: 4 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:50,220-Speed 6314.31 samples/sec Loss 9.3091 LearningRate 0.0010 Epoch: 4 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:53,464-Speed 6313.59 samples/sec Loss 9.2804 LearningRate 0.0010 Epoch: 4 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:56,711-Speed 6308.40 samples/sec Loss 9.3127 LearningRate 0.0010 Epoch: 4 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:43:59,954-Speed 6315.66 samples/sec Loss 9.3360 LearningRate 0.0010 Epoch: 4 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:03,199-Speed 6314.65 samples/sec Loss 9.2397 LearningRate 0.0010 Epoch: 4 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:06,444-Speed 6313.59 samples/sec Loss 9.2735 LearningRate 0.0010 Epoch: 4 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:09,684-Speed 6320.98 samples/sec Loss 9.2073 LearningRate 0.0010 Epoch: 4 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:12,916-Speed 6339.44 samples/sec Loss 9.2342 LearningRate 0.0010 Epoch: 4 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:16,172-Speed 6289.77 samples/sec Loss 9.2821 LearningRate 0.0010 Epoch: 4 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:19,414-Speed 6319.65 samples/sec Loss 9.2818 LearningRate 0.0010 Epoch: 4 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:22,660-Speed 6310.47 samples/sec Loss 9.3530 LearningRate 0.0010 Epoch: 4 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:25,907-Speed 6309.19 samples/sec Loss 
9.2519 LearningRate 0.0010 Epoch: 4 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:29,156-Speed 6303.04 samples/sec Loss 9.2947 LearningRate 0.0010 Epoch: 4 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:32,403-Speed 6309.49 samples/sec Loss 9.2927 LearningRate 0.0010 Epoch: 4 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:35,650-Speed 6310.20 samples/sec Loss 9.2833 LearningRate 0.0010 Epoch: 4 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:38,900-Speed 6301.81 samples/sec Loss 9.2056 LearningRate 0.0010 Epoch: 4 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:42,142-Speed 6318.64 samples/sec Loss 9.2306 LearningRate 0.0010 Epoch: 4 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:45,384-Speed 6319.21 samples/sec Loss 9.3378 LearningRate 0.0010 Epoch: 4 Global Step: 88050 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-03-31 23:44:48,616-Speed 6338.05 samples/sec Loss 9.1532 LearningRate 0.0010 Epoch: 4 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:51,869-Speed 6297.80 samples/sec Loss 9.2609 LearningRate 0.0010 Epoch: 4 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:55,112-Speed 6316.48 samples/sec Loss 9.3238 LearningRate 0.0010 Epoch: 4 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:44:58,355-Speed 6314.96 samples/sec Loss 9.3786 LearningRate 0.0010 Epoch: 4 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:01,602-Speed 6308.90 samples/sec Loss 9.2978 LearningRate 0.0010 Epoch: 4 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:04,848-Speed 6311.79 samples/sec Loss 9.3065 LearningRate 0.0010 Epoch: 4 Global Step: 88110 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:08,095-Speed 6307.42 samples/sec Loss 9.3557 LearningRate 0.0010 Epoch: 4 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:11,339-Speed 6314.99 samples/sec Loss 9.3263 LearningRate 0.0010 Epoch: 4 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:14,595-Speed 6292.00 samples/sec Loss 9.2890 LearningRate 0.0010 Epoch: 4 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:17,842-Speed 6308.78 samples/sec Loss 9.3353 LearningRate 0.0010 Epoch: 4 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:21,074-Speed 6338.26 samples/sec Loss 9.3122 LearningRate 0.0010 Epoch: 4 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:24,316-Speed 6318.03 samples/sec Loss 9.3112 LearningRate 0.0010 Epoch: 4 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:27,564-Speed 6307.35 samples/sec Loss 9.2770 LearningRate 0.0010 Epoch: 4 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:30,815-Speed 6300.21 samples/sec Loss 9.2823 LearningRate 0.0010 Epoch: 4 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:34,058-Speed 6316.80 samples/sec Loss 9.2480 LearningRate 0.0010 Epoch: 4 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:37,302-Speed 6315.70 samples/sec Loss 9.2077 LearningRate 0.0010 Epoch: 4 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:40,546-Speed 6312.78 samples/sec Loss 9.3209 LearningRate 0.0010 Epoch: 4 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:43,791-Speed 6313.45 samples/sec Loss 9.1905 LearningRate 0.0010 Epoch: 4 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:45:47,038-Speed 6309.36 samples/sec Loss 9.2742 LearningRate 0.0010 Epoch: 4 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:50,283-Speed 6312.81 samples/sec Loss 9.2861 LearningRate 0.0010 Epoch: 4 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:53,519-Speed 6329.81 samples/sec Loss 9.2976 LearningRate 0.0010 Epoch: 4 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:45:56,768-Speed 6306.70 samples/sec Loss 9.2578 LearningRate 0.0010 Epoch: 4 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:00,013-Speed 6312.41 samples/sec Loss 9.2620 LearningRate 0.0010 Epoch: 4 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:03,262-Speed 6303.59 samples/sec Loss 9.2458 LearningRate 0.0010 Epoch: 4 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:06,505-Speed 6317.05 samples/sec Loss 9.1987 LearningRate 0.0010 Epoch: 4 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:09,750-Speed 6314.11 samples/sec Loss 9.2900 LearningRate 0.0010 Epoch: 4 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:12,996-Speed 6310.18 samples/sec Loss 9.2097 LearningRate 0.0010 Epoch: 4 Global Step: 88320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:16,242-Speed 6310.97 samples/sec Loss 9.2868 LearningRate 0.0010 Epoch: 4 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:19,493-Speed 6301.00 samples/sec Loss 9.2520 LearningRate 0.0010 Epoch: 4 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:22,738-Speed 6312.70 samples/sec Loss 9.2363 LearningRate 0.0010 Epoch: 4 Global Step: 88350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:25,970-Speed 6338.40 samples/sec Loss 
9.3637 LearningRate 0.0010 Epoch: 4 Global Step: 88360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:29,213-Speed 6315.37 samples/sec Loss 9.2976 LearningRate 0.0010 Epoch: 4 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:32,458-Speed 6313.28 samples/sec Loss 9.1945 LearningRate 0.0010 Epoch: 4 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:35,706-Speed 6305.42 samples/sec Loss 9.2871 LearningRate 0.0010 Epoch: 4 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:38,956-Speed 6303.34 samples/sec Loss 9.2469 LearningRate 0.0010 Epoch: 4 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:42,203-Speed 6308.33 samples/sec Loss 9.2125 LearningRate 0.0010 Epoch: 4 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:45,452-Speed 6304.96 samples/sec Loss 9.3864 LearningRate 0.0010 Epoch: 4 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:48,704-Speed 6300.14 samples/sec Loss 9.3335 LearningRate 0.0010 Epoch: 4 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:51,950-Speed 6310.74 samples/sec Loss 9.1888 LearningRate 0.0010 Epoch: 4 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:55,196-Speed 6312.08 samples/sec Loss 9.2287 LearningRate 0.0010 Epoch: 4 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:46:58,428-Speed 6337.50 samples/sec Loss 9.2544 LearningRate 0.0010 Epoch: 4 Global Step: 88460 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:01,676-Speed 6305.85 samples/sec Loss 9.3365 LearningRate 0.0010 Epoch: 4 Global Step: 88470 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:04,922-Speed 6310.91 samples/sec Loss 9.2121 LearningRate 0.0010 Epoch: 4 Global Step: 88480 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:08,164-Speed 6318.83 samples/sec Loss 9.2020 LearningRate 0.0010 Epoch: 4 Global Step: 88490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:11,412-Speed 6306.73 samples/sec Loss 9.2360 LearningRate 0.0010 Epoch: 4 Global Step: 88500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:14,664-Speed 6299.30 samples/sec Loss 9.2880 LearningRate 0.0010 Epoch: 4 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:17,908-Speed 6314.56 samples/sec Loss 9.2674 LearningRate 0.0010 Epoch: 4 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:21,150-Speed 6318.93 samples/sec Loss 9.2666 LearningRate 0.0010 Epoch: 4 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:24,394-Speed 6315.46 samples/sec Loss 9.2810 LearningRate 0.0010 Epoch: 4 Global Step: 88540 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:27,717-Speed 6162.89 samples/sec Loss 9.2909 LearningRate 0.0010 Epoch: 4 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:47:30,963-Speed 6310.35 samples/sec Loss 9.2157 LearningRate 0.0010 Epoch: 4 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:34,208-Speed 6312.67 samples/sec Loss 9.2909 LearningRate 0.0010 Epoch: 4 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:37,453-Speed 6313.10 samples/sec Loss 9.2586 LearningRate 0.0010 Epoch: 4 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:40,696-Speed 6316.47 samples/sec Loss 9.2697 LearningRate 0.0010 Epoch: 4 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:43,942-Speed 6311.88 samples/sec Loss 9.2353 LearningRate 0.0010 Epoch: 4 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:47:47,185-Speed 6315.13 samples/sec Loss 9.1836 LearningRate 0.0010 Epoch: 4 Global Step: 88610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:50,434-Speed 6306.04 samples/sec Loss 9.2064 LearningRate 0.0010 Epoch: 4 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:53,681-Speed 6307.63 samples/sec Loss 9.2594 LearningRate 0.0010 Epoch: 4 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:47:56,924-Speed 6318.15 samples/sec Loss 9.1629 LearningRate 0.0010 Epoch: 4 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:00,169-Speed 6311.96 samples/sec Loss 9.2329 LearningRate 0.0010 Epoch: 4 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:03,401-Speed 6337.40 samples/sec Loss 9.2499 LearningRate 0.0010 Epoch: 4 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:06,647-Speed 6311.21 samples/sec Loss 9.3007 LearningRate 0.0010 Epoch: 4 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:09,896-Speed 6306.75 samples/sec Loss 9.2348 LearningRate 0.0010 Epoch: 4 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:13,139-Speed 6316.18 samples/sec Loss 9.2518 LearningRate 0.0010 Epoch: 4 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:16,387-Speed 6306.10 samples/sec Loss 9.2255 LearningRate 0.0010 Epoch: 4 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:19,642-Speed 6294.38 samples/sec Loss 9.1878 LearningRate 0.0010 Epoch: 4 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:22,892-Speed 6301.13 samples/sec Loss 9.1740 LearningRate 0.0010 Epoch: 4 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:26,138-Speed 6312.10 samples/sec Loss 
9.2185 LearningRate 0.0010 Epoch: 4 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:29,383-Speed 6313.28 samples/sec Loss 9.1610 LearningRate 0.0010 Epoch: 4 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:32,629-Speed 6309.86 samples/sec Loss 9.2579 LearningRate 0.0010 Epoch: 4 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:35,860-Speed 6339.08 samples/sec Loss 9.2390 LearningRate 0.0010 Epoch: 4 Global Step: 88760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:39,107-Speed 6308.68 samples/sec Loss 9.2841 LearningRate 0.0010 Epoch: 4 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:42,353-Speed 6310.65 samples/sec Loss 9.2216 LearningRate 0.0010 Epoch: 4 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:45,598-Speed 6313.91 samples/sec Loss 9.2585 LearningRate 0.0010 Epoch: 4 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:48,931-Speed 6144.85 samples/sec Loss 9.2180 LearningRate 0.0010 Epoch: 4 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:52,208-Speed 6251.19 samples/sec Loss 9.2976 LearningRate 0.0010 Epoch: 4 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:55,450-Speed 6318.62 samples/sec Loss 9.1634 LearningRate 0.0010 Epoch: 4 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:48:58,697-Speed 6309.74 samples/sec Loss 9.2496 LearningRate 0.0010 Epoch: 4 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:01,949-Speed 6297.41 samples/sec Loss 9.2309 LearningRate 0.0010 Epoch: 4 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:05,196-Speed 6309.28 samples/sec Loss 9.2074 LearningRate 0.0010 Epoch: 4 Global Step: 88850 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:08,429-Speed 6337.98 samples/sec Loss 9.3183 LearningRate 0.0010 Epoch: 4 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:11,675-Speed 6309.94 samples/sec Loss 9.2668 LearningRate 0.0010 Epoch: 4 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:14,916-Speed 6321.21 samples/sec Loss 9.2168 LearningRate 0.0010 Epoch: 4 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:18,165-Speed 6305.63 samples/sec Loss 9.3297 LearningRate 0.0010 Epoch: 4 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:21,408-Speed 6316.81 samples/sec Loss 9.1884 LearningRate 0.0010 Epoch: 4 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:24,673-Speed 6273.59 samples/sec Loss 9.1959 LearningRate 0.0010 Epoch: 4 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:27,924-Speed 6301.51 samples/sec Loss 9.1195 LearningRate 0.0010 Epoch: 4 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:31,174-Speed 6301.13 samples/sec Loss 9.3372 LearningRate 0.0010 Epoch: 4 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:34,421-Speed 6308.89 samples/sec Loss 9.2189 LearningRate 0.0010 Epoch: 4 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:37,671-Speed 6303.02 samples/sec Loss 9.2001 LearningRate 0.0010 Epoch: 4 Global Step: 88950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:40,915-Speed 6315.08 samples/sec Loss 9.1835 LearningRate 0.0010 Epoch: 4 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-03-31 23:49:44,146-Speed 6340.13 samples/sec Loss 9.1841 LearningRate 0.0010 Epoch: 4 Global Step: 88970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:49:47,387-Speed 6320.83 samples/sec Loss 9.3498 LearningRate 0.0010 Epoch: 4 Global Step: 88980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:50,632-Speed 6312.93 samples/sec Loss 9.2379 LearningRate 0.0010 Epoch: 4 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:53,877-Speed 6311.89 samples/sec Loss 9.1634 LearningRate 0.0010 Epoch: 4 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:49:57,123-Speed 6309.96 samples/sec Loss 9.2794 LearningRate 0.0010 Epoch: 4 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:00,372-Speed 6306.10 samples/sec Loss 9.2477 LearningRate 0.0010 Epoch: 4 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:03,618-Speed 6310.05 samples/sec Loss 9.1764 LearningRate 0.0010 Epoch: 4 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:06,864-Speed 6310.54 samples/sec Loss 9.1418 LearningRate 0.0010 Epoch: 4 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:10,107-Speed 6317.27 samples/sec Loss 9.2299 LearningRate 0.0010 Epoch: 4 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:13,354-Speed 6308.34 samples/sec Loss 9.1877 LearningRate 0.0010 Epoch: 4 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:16,581-Speed 6347.53 samples/sec Loss 9.2420 LearningRate 0.0010 Epoch: 4 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:19,828-Speed 6310.67 samples/sec Loss 9.2426 LearningRate 0.0010 Epoch: 4 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:23,070-Speed 6316.89 samples/sec Loss 9.2919 LearningRate 0.0010 Epoch: 4 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:26,317-Speed 6310.34 samples/sec Loss 
9.2882 LearningRate 0.0010 Epoch: 4 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:29,585-Speed 6268.44 samples/sec Loss 9.1956 LearningRate 0.0010 Epoch: 4 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:32,823-Speed 6325.58 samples/sec Loss 9.0799 LearningRate 0.0010 Epoch: 4 Global Step: 89120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:36,069-Speed 6310.69 samples/sec Loss 9.2275 LearningRate 0.0010 Epoch: 4 Global Step: 89130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:39,314-Speed 6313.92 samples/sec Loss 9.1957 LearningRate 0.0010 Epoch: 4 Global Step: 89140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:42,557-Speed 6316.52 samples/sec Loss 9.2608 LearningRate 0.0010 Epoch: 4 Global Step: 89150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:45,800-Speed 6316.21 samples/sec Loss 9.1726 LearningRate 0.0010 Epoch: 4 Global Step: 89160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:49,037-Speed 6328.76 samples/sec Loss 9.2265 LearningRate 0.0010 Epoch: 4 Global Step: 89170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:52,282-Speed 6312.18 samples/sec Loss 9.1410 LearningRate 0.0010 Epoch: 4 Global Step: 89180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:55,529-Speed 6308.10 samples/sec Loss 9.2215 LearningRate 0.0010 Epoch: 4 Global Step: 89190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:50:58,777-Speed 6307.98 samples/sec Loss 9.2261 LearningRate 0.0010 Epoch: 4 Global Step: 89200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:02,018-Speed 6320.08 samples/sec Loss 9.2130 LearningRate 0.0010 Epoch: 4 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:05,260-Speed 6317.37 samples/sec Loss 9.1811 LearningRate 0.0010 Epoch: 4 Global Step: 89220 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:08,506-Speed 6310.47 samples/sec Loss 9.2477 LearningRate 0.0010 Epoch: 4 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:11,749-Speed 6317.75 samples/sec Loss 9.1250 LearningRate 0.0010 Epoch: 4 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:14,995-Speed 6310.67 samples/sec Loss 9.1674 LearningRate 0.0010 Epoch: 4 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:18,242-Speed 6307.48 samples/sec Loss 9.2129 LearningRate 0.0010 Epoch: 4 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:21,475-Speed 6337.48 samples/sec Loss 9.2950 LearningRate 0.0010 Epoch: 4 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:24,726-Speed 6300.39 samples/sec Loss 9.1753 LearningRate 0.0010 Epoch: 4 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:27,968-Speed 6319.85 samples/sec Loss 9.1817 LearningRate 0.0010 Epoch: 4 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:31,213-Speed 6312.65 samples/sec Loss 9.1751 LearningRate 0.0010 Epoch: 4 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:34,459-Speed 6310.23 samples/sec Loss 9.1207 LearningRate 0.0010 Epoch: 4 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:37,708-Speed 6304.54 samples/sec Loss 9.2282 LearningRate 0.0010 Epoch: 4 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:40,962-Speed 6296.65 samples/sec Loss 9.2301 LearningRate 0.0010 Epoch: 4 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:44,208-Speed 6310.17 samples/sec Loss 9.1114 LearningRate 0.0010 Epoch: 4 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:51:47,452-Speed 6314.28 samples/sec Loss 9.1844 LearningRate 0.0010 Epoch: 4 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:50,695-Speed 6315.59 samples/sec Loss 9.2196 LearningRate 0.0010 Epoch: 4 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:51:53,942-Speed 6310.39 samples/sec Loss 9.2638 LearningRate 0.0010 Epoch: 4 Global Step: 89370 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-03-31 23:51:57,173-Speed 6339.62 samples/sec Loss 9.1594 LearningRate 0.0010 Epoch: 4 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:00,423-Speed 6301.78 samples/sec Loss 9.0970 LearningRate 0.0010 Epoch: 4 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:03,731-Speed 6193.30 samples/sec Loss 9.2152 LearningRate 0.0010 Epoch: 4 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:06,971-Speed 6321.57 samples/sec Loss 9.1505 LearningRate 0.0010 Epoch: 4 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:10,219-Speed 6306.73 samples/sec Loss 9.0562 LearningRate 0.0010 Epoch: 4 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:13,460-Speed 6320.12 samples/sec Loss 9.1598 LearningRate 0.0010 Epoch: 4 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:16,720-Speed 6284.94 samples/sec Loss 9.1141 LearningRate 0.0010 Epoch: 4 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:19,967-Speed 6307.53 samples/sec Loss 9.2924 LearningRate 0.0010 Epoch: 4 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:23,211-Speed 6315.54 samples/sec Loss 9.2262 LearningRate 0.0010 Epoch: 4 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:26,460-Speed 6304.50 samples/sec Loss 
9.2271 LearningRate 0.0010 Epoch: 4 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:29,696-Speed 6330.57 samples/sec Loss 9.2583 LearningRate 0.0010 Epoch: 4 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:52:32,929-Speed 6337.30 samples/sec Loss 9.1735 LearningRate 0.0010 Epoch: 4 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:36,174-Speed 6312.30 samples/sec Loss 9.0995 LearningRate 0.0010 Epoch: 4 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:39,417-Speed 6316.49 samples/sec Loss 9.1975 LearningRate 0.0010 Epoch: 4 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:42,663-Speed 6311.63 samples/sec Loss 9.2051 LearningRate 0.0010 Epoch: 4 Global Step: 89520 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:45,906-Speed 6316.22 samples/sec Loss 9.2550 LearningRate 0.0010 Epoch: 4 Global Step: 89530 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:49,155-Speed 6303.83 samples/sec Loss 9.0652 LearningRate 0.0010 Epoch: 4 Global Step: 89540 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:52,405-Speed 6303.70 samples/sec Loss 9.2108 LearningRate 0.0010 Epoch: 4 Global Step: 89550 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:55,646-Speed 6320.74 samples/sec Loss 9.1604 LearningRate 0.0010 Epoch: 4 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:52:58,891-Speed 6312.58 samples/sec Loss 9.0963 LearningRate 0.0010 Epoch: 4 Global Step: 89570 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:53:02,136-Speed 6312.11 samples/sec Loss 9.0811 LearningRate 0.0010 Epoch: 4 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:53:05,388-Speed 6300.10 samples/sec Loss 9.1361 LearningRate 0.0010 Epoch: 4 Global Step: 89590 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:08,632-Speed 6314.00 samples/sec Loss 9.1462 LearningRate 0.0010 Epoch: 4 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:11,874-Speed 6318.34 samples/sec Loss 9.2137 LearningRate 0.0010 Epoch: 4 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:15,120-Speed 6310.23 samples/sec Loss 9.1429 LearningRate 0.0010 Epoch: 4 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:18,370-Speed 6303.13 samples/sec Loss 9.2037 LearningRate 0.0010 Epoch: 4 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:21,616-Speed 6311.49 samples/sec Loss 9.2357 LearningRate 0.0010 Epoch: 4 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:24,860-Speed 6316.22 samples/sec Loss 9.1803 LearningRate 0.0010 Epoch: 4 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:28,109-Speed 6304.46 samples/sec Loss 9.1756 LearningRate 0.0010 Epoch: 4 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:31,357-Speed 6305.69 samples/sec Loss 9.1946 LearningRate 0.0010 Epoch: 4 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:34,599-Speed 6319.50 samples/sec Loss 9.2418 LearningRate 0.0010 Epoch: 4 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:37,831-Speed 6337.91 samples/sec Loss 9.0819 LearningRate 0.0010 Epoch: 4 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:41,077-Speed 6309.91 samples/sec Loss 9.1725 LearningRate 0.0010 Epoch: 4 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:44,326-Speed 6306.99 samples/sec Loss 9.1522 LearningRate 0.0010 Epoch: 4 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:53:47,569-Speed 6315.01 samples/sec Loss 9.1735 LearningRate 0.0010 Epoch: 4 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:50,817-Speed 6307.45 samples/sec Loss 9.2063 LearningRate 0.0010 Epoch: 4 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:54,065-Speed 6307.11 samples/sec Loss 9.1014 LearningRate 0.0010 Epoch: 4 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:53:57,311-Speed 6311.40 samples/sec Loss 9.1458 LearningRate 0.0010 Epoch: 4 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:00,553-Speed 6317.36 samples/sec Loss 9.1719 LearningRate 0.0010 Epoch: 4 Global Step: 89760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:03,798-Speed 6312.73 samples/sec Loss 9.1447 LearningRate 0.0010 Epoch: 4 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:07,044-Speed 6311.31 samples/sec Loss 9.1974 LearningRate 0.0010 Epoch: 4 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:10,276-Speed 6337.43 samples/sec Loss 9.2131 LearningRate 0.0010 Epoch: 4 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:13,525-Speed 6306.03 samples/sec Loss 9.1217 LearningRate 0.0010 Epoch: 4 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:16,770-Speed 6312.13 samples/sec Loss 9.1250 LearningRate 0.0010 Epoch: 4 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:20,012-Speed 6318.83 samples/sec Loss 9.1342 LearningRate 0.0010 Epoch: 4 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:23,261-Speed 6304.69 samples/sec Loss 9.1233 LearningRate 0.0010 Epoch: 4 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:26,505-Speed 6314.86 samples/sec Loss 
9.1692 LearningRate 0.0010 Epoch: 4 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:29,760-Speed 6293.35 samples/sec Loss 9.1848 LearningRate 0.0010 Epoch: 4 Global Step: 89850 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:33,005-Speed 6312.69 samples/sec Loss 9.1090 LearningRate 0.0010 Epoch: 4 Global Step: 89860 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:36,246-Speed 6318.79 samples/sec Loss 9.2254 LearningRate 0.0010 Epoch: 4 Global Step: 89870 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:39,492-Speed 6311.06 samples/sec Loss 9.1646 LearningRate 0.0010 Epoch: 4 Global Step: 89880 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:54:42,738-Speed 6311.15 samples/sec Loss 9.1797 LearningRate 0.0010 Epoch: 4 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:45,987-Speed 6304.05 samples/sec Loss 9.1727 LearningRate 0.0010 Epoch: 4 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:49,234-Speed 6309.05 samples/sec Loss 9.1402 LearningRate 0.0010 Epoch: 4 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:52,477-Speed 6316.24 samples/sec Loss 9.0701 LearningRate 0.0010 Epoch: 4 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:55,724-Speed 6309.30 samples/sec Loss 9.1008 LearningRate 0.0010 Epoch: 4 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:54:58,968-Speed 6315.48 samples/sec Loss 9.2142 LearningRate 0.0010 Epoch: 4 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:02,215-Speed 6310.07 samples/sec Loss 9.0847 LearningRate 0.0010 Epoch: 4 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:05,463-Speed 6305.53 samples/sec Loss 9.2359 LearningRate 0.0010 Epoch: 4 Global Step: 89960 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:08,708-Speed 6313.24 samples/sec Loss 9.1837 LearningRate 0.0010 Epoch: 4 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:11,955-Speed 6308.02 samples/sec Loss 9.1142 LearningRate 0.0010 Epoch: 4 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:15,183-Speed 6347.59 samples/sec Loss 9.1566 LearningRate 0.0010 Epoch: 4 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:18,430-Speed 6308.14 samples/sec Loss 9.1424 LearningRate 0.0010 Epoch: 4 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:21,673-Speed 6315.14 samples/sec Loss 9.1524 LearningRate 0.0010 Epoch: 4 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:24,917-Speed 6314.98 samples/sec Loss 9.1321 LearningRate 0.0010 Epoch: 4 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:28,164-Speed 6310.50 samples/sec Loss 9.0940 LearningRate 0.0010 Epoch: 4 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:31,410-Speed 6310.09 samples/sec Loss 9.0928 LearningRate 0.0010 Epoch: 4 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:34,656-Speed 6310.18 samples/sec Loss 9.1409 LearningRate 0.0010 Epoch: 4 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:37,901-Speed 6313.63 samples/sec Loss 9.1887 LearningRate 0.0010 Epoch: 4 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:41,148-Speed 6308.25 samples/sec Loss 9.1346 LearningRate 0.0010 Epoch: 4 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:44,392-Speed 6315.14 samples/sec Loss 9.1479 LearningRate 0.0010 Epoch: 4 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:55:47,621-Speed 6343.23 samples/sec Loss 9.1834 LearningRate 0.0010 Epoch: 4 Global Step: 90090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:50,878-Speed 6290.06 samples/sec Loss 9.0684 LearningRate 0.0010 Epoch: 4 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:54,120-Speed 6318.18 samples/sec Loss 9.1766 LearningRate 0.0010 Epoch: 4 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:55:57,367-Speed 6308.38 samples/sec Loss 9.1083 LearningRate 0.0010 Epoch: 4 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:00,611-Speed 6314.73 samples/sec Loss 9.1018 LearningRate 0.0010 Epoch: 4 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:03,859-Speed 6306.97 samples/sec Loss 9.1779 LearningRate 0.0010 Epoch: 4 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:07,093-Speed 6334.73 samples/sec Loss 9.1442 LearningRate 0.0010 Epoch: 4 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:10,337-Speed 6313.80 samples/sec Loss 9.1401 LearningRate 0.0010 Epoch: 4 Global Step: 90160 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:13,582-Speed 6312.74 samples/sec Loss 9.0677 LearningRate 0.0010 Epoch: 4 Global Step: 90170 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:16,828-Speed 6310.95 samples/sec Loss 9.0768 LearningRate 0.0010 Epoch: 4 Global Step: 90180 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:20,072-Speed 6315.21 samples/sec Loss 9.1401 LearningRate 0.0010 Epoch: 4 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:23,313-Speed 6319.63 samples/sec Loss 9.1222 LearningRate 0.0010 Epoch: 4 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:26,574-Speed 6319.74 samples/sec Loss 
9.1606 LearningRate 0.0010 Epoch: 4 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:29,818-Speed 6314.17 samples/sec Loss 9.2196 LearningRate 0.0010 Epoch: 4 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:33,063-Speed 6314.01 samples/sec Loss 9.1344 LearningRate 0.0010 Epoch: 4 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:36,322-Speed 6286.71 samples/sec Loss 9.0808 LearningRate 0.0010 Epoch: 4 Global Step: 90240 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-03-31 23:56:39,565-Speed 6315.84 samples/sec Loss 9.1673 LearningRate 0.0010 Epoch: 4 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:42,808-Speed 6316.19 samples/sec Loss 9.1888 LearningRate 0.0010 Epoch: 4 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:46,063-Speed 6305.05 samples/sec Loss 9.2507 LearningRate 0.0010 Epoch: 4 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:49,307-Speed 6313.66 samples/sec Loss 9.1430 LearningRate 0.0010 Epoch: 4 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:52,556-Speed 6304.62 samples/sec Loss 9.0382 LearningRate 0.0010 Epoch: 4 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:55,800-Speed 6323.83 samples/sec Loss 9.1315 LearningRate 0.0010 Epoch: 4 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:56:59,120-Speed 6169.72 samples/sec Loss 9.1636 LearningRate 0.0010 Epoch: 4 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:02,372-Speed 6313.59 samples/sec Loss 9.1819 LearningRate 0.0010 Epoch: 4 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:05,621-Speed 6306.76 samples/sec Loss 9.1613 LearningRate 0.0010 Epoch: 4 Global Step: 90330 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:08,865-Speed 6313.82 samples/sec Loss 9.1797 LearningRate 0.0010 Epoch: 4 Global Step: 90340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:12,095-Speed 6340.99 samples/sec Loss 9.0852 LearningRate 0.0010 Epoch: 4 Global Step: 90350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:15,353-Speed 6314.33 samples/sec Loss 9.0845 LearningRate 0.0010 Epoch: 4 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:18,595-Speed 6319.24 samples/sec Loss 9.0730 LearningRate 0.0010 Epoch: 4 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:21,895-Speed 6207.63 samples/sec Loss 9.1017 LearningRate 0.0010 Epoch: 4 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:25,148-Speed 6310.37 samples/sec Loss 9.1736 LearningRate 0.0010 Epoch: 4 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:28,392-Speed 6315.15 samples/sec Loss 9.1197 LearningRate 0.0010 Epoch: 4 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:31,646-Speed 6313.92 samples/sec Loss 9.0666 LearningRate 0.0010 Epoch: 4 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:34,887-Speed 6319.73 samples/sec Loss 9.1140 LearningRate 0.0010 Epoch: 4 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:38,133-Speed 6310.83 samples/sec Loss 9.0651 LearningRate 0.0010 Epoch: 4 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:41,378-Speed 6311.76 samples/sec Loss 9.1167 LearningRate 0.0010 Epoch: 4 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:44,627-Speed 6319.08 samples/sec Loss 9.0490 LearningRate 0.0010 Epoch: 4 Global Step: 90450 Fp16 Grad Scale: 131072 Required: 67 hours Training: 
2022-03-31 23:57:47,862-Speed 6330.45 samples/sec Loss 9.0843 LearningRate 0.0010 Epoch: 4 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:51,108-Speed 6310.86 samples/sec Loss 9.0548 LearningRate 0.0010 Epoch: 4 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:54,363-Speed 6318.09 samples/sec Loss 9.1578 LearningRate 0.0010 Epoch: 4 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:57:57,610-Speed 6308.43 samples/sec Loss 9.1232 LearningRate 0.0010 Epoch: 4 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:00,852-Speed 6318.92 samples/sec Loss 9.1351 LearningRate 0.0010 Epoch: 4 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:04,099-Speed 6308.15 samples/sec Loss 9.0196 LearningRate 0.0010 Epoch: 4 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:07,345-Speed 6310.52 samples/sec Loss 9.1435 LearningRate 0.0010 Epoch: 4 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:10,594-Speed 6306.10 samples/sec Loss 9.1295 LearningRate 0.0010 Epoch: 4 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:13,841-Speed 6308.75 samples/sec Loss 9.1432 LearningRate 0.0010 Epoch: 4 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:17,091-Speed 6303.50 samples/sec Loss 9.1167 LearningRate 0.0010 Epoch: 4 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:20,322-Speed 6339.64 samples/sec Loss 9.0570 LearningRate 0.0010 Epoch: 4 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:23,568-Speed 6310.51 samples/sec Loss 9.1234 LearningRate 0.0010 Epoch: 4 Global Step: 90570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:26,815-Speed 6311.71 samples/sec Loss 
9.1517 LearningRate 0.0010 Epoch: 4 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:30,061-Speed 6310.65 samples/sec Loss 9.0625 LearningRate 0.0010 Epoch: 4 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:33,305-Speed 6315.87 samples/sec Loss 9.0788 LearningRate 0.0010 Epoch: 4 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:36,549-Speed 6314.15 samples/sec Loss 9.0938 LearningRate 0.0010 Epoch: 4 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:39,794-Speed 6311.70 samples/sec Loss 9.0841 LearningRate 0.0010 Epoch: 4 Global Step: 90620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:43,043-Speed 6304.99 samples/sec Loss 9.1502 LearningRate 0.0010 Epoch: 4 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:46,292-Speed 6305.18 samples/sec Loss 9.1560 LearningRate 0.0010 Epoch: 4 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:49,538-Speed 6310.29 samples/sec Loss 9.1178 LearningRate 0.0010 Epoch: 4 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:52,841-Speed 6201.01 samples/sec Loss 8.9672 LearningRate 0.0010 Epoch: 4 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:56,121-Speed 6246.61 samples/sec Loss 9.1222 LearningRate 0.0010 Epoch: 4 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:58:59,371-Speed 6302.41 samples/sec Loss 9.1278 LearningRate 0.0010 Epoch: 4 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:02,613-Speed 6319.74 samples/sec Loss 9.1726 LearningRate 0.0010 Epoch: 4 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:05,858-Speed 6311.80 samples/sec Loss 9.1502 LearningRate 0.0010 Epoch: 4 Global Step: 90700 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:09,098-Speed 6321.08 samples/sec Loss 9.1339 LearningRate 0.0010 Epoch: 4 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:12,345-Speed 6310.03 samples/sec Loss 9.0975 LearningRate 0.0010 Epoch: 4 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:15,589-Speed 6314.40 samples/sec Loss 9.0501 LearningRate 0.0010 Epoch: 4 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:18,838-Speed 6305.78 samples/sec Loss 9.0607 LearningRate 0.0010 Epoch: 4 Global Step: 90740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:22,085-Speed 6308.59 samples/sec Loss 9.0198 LearningRate 0.0010 Epoch: 4 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:25,313-Speed 6346.17 samples/sec Loss 8.9879 LearningRate 0.0010 Epoch: 4 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:28,558-Speed 6313.06 samples/sec Loss 9.1300 LearningRate 0.0010 Epoch: 4 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:31,801-Speed 6316.48 samples/sec Loss 9.0331 LearningRate 0.0010 Epoch: 4 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:35,046-Speed 6312.38 samples/sec Loss 9.0101 LearningRate 0.0010 Epoch: 4 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:38,289-Speed 6316.44 samples/sec Loss 9.0238 LearningRate 0.0010 Epoch: 4 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:41,531-Speed 6318.17 samples/sec Loss 9.1054 LearningRate 0.0010 Epoch: 4 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:44,774-Speed 6316.55 samples/sec Loss 9.0993 LearningRate 0.0010 Epoch: 4 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-03-31 23:59:48,016-Speed 6318.70 samples/sec Loss 9.0646 LearningRate 0.0010 Epoch: 4 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:51,260-Speed 6313.92 samples/sec Loss 9.0897 LearningRate 0.0010 Epoch: 4 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:54,503-Speed 6317.92 samples/sec Loss 9.1206 LearningRate 0.0010 Epoch: 4 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-03-31 23:59:57,734-Speed 6339.40 samples/sec Loss 9.0726 LearningRate 0.0010 Epoch: 4 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:00,978-Speed 6314.97 samples/sec Loss 9.1528 LearningRate 0.0010 Epoch: 4 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:04,223-Speed 6311.78 samples/sec Loss 9.1816 LearningRate 0.0010 Epoch: 4 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:07,469-Speed 6311.21 samples/sec Loss 9.1691 LearningRate 0.0010 Epoch: 4 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:10,713-Speed 6313.96 samples/sec Loss 9.1358 LearningRate 0.0010 Epoch: 4 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:13,961-Speed 6306.48 samples/sec Loss 9.0896 LearningRate 0.0010 Epoch: 4 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:17,212-Speed 6301.75 samples/sec Loss 9.0583 LearningRate 0.0010 Epoch: 4 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:20,455-Speed 6316.15 samples/sec Loss 9.1130 LearningRate 0.0010 Epoch: 4 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:23,700-Speed 6312.82 samples/sec Loss 9.0962 LearningRate 0.0010 Epoch: 4 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:26,943-Speed 6316.89 samples/sec Loss 
9.1763 LearningRate 0.0010 Epoch: 4 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:30,187-Speed 6315.41 samples/sec Loss 9.0765 LearningRate 0.0010 Epoch: 4 Global Step: 90960 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:00:33,421-Speed 6333.91 samples/sec Loss 9.0210 LearningRate 0.0010 Epoch: 4 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:36,665-Speed 6314.07 samples/sec Loss 8.9989 LearningRate 0.0010 Epoch: 4 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:39,911-Speed 6311.50 samples/sec Loss 9.1254 LearningRate 0.0010 Epoch: 4 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:43,152-Speed 6320.26 samples/sec Loss 9.1169 LearningRate 0.0010 Epoch: 4 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:46,395-Speed 6316.86 samples/sec Loss 9.0908 LearningRate 0.0010 Epoch: 4 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:49,643-Speed 6306.02 samples/sec Loss 9.1255 LearningRate 0.0010 Epoch: 4 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:52,887-Speed 6314.34 samples/sec Loss 9.0940 LearningRate 0.0010 Epoch: 4 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:56,131-Speed 6314.96 samples/sec Loss 9.0493 LearningRate 0.0010 Epoch: 4 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:00:59,373-Speed 6319.57 samples/sec Loss 9.0778 LearningRate 0.0010 Epoch: 4 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:02,618-Speed 6311.78 samples/sec Loss 9.1214 LearningRate 0.0010 Epoch: 4 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:05,858-Speed 6322.60 samples/sec Loss 9.1570 LearningRate 0.0010 Epoch: 4 Global Step: 91070 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:09,105-Speed 6307.94 samples/sec Loss 9.0448 LearningRate 0.0010 Epoch: 4 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:12,350-Speed 6315.00 samples/sec Loss 9.1077 LearningRate 0.0010 Epoch: 4 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:15,595-Speed 6312.03 samples/sec Loss 9.0360 LearningRate 0.0010 Epoch: 4 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:18,844-Speed 6305.62 samples/sec Loss 9.0656 LearningRate 0.0010 Epoch: 4 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:22,097-Speed 6297.03 samples/sec Loss 9.0446 LearningRate 0.0010 Epoch: 4 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:25,344-Speed 6308.97 samples/sec Loss 9.0693 LearningRate 0.0010 Epoch: 4 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:28,592-Speed 6307.14 samples/sec Loss 9.1133 LearningRate 0.0010 Epoch: 4 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:31,834-Speed 6317.93 samples/sec Loss 9.1249 LearningRate 0.0010 Epoch: 4 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:35,080-Speed 6311.52 samples/sec Loss 9.0436 LearningRate 0.0010 Epoch: 4 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:38,311-Speed 6339.47 samples/sec Loss 9.0323 LearningRate 0.0010 Epoch: 4 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:41,559-Speed 6307.38 samples/sec Loss 9.1524 LearningRate 0.0010 Epoch: 4 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:44,805-Speed 6310.24 samples/sec Loss 9.0786 LearningRate 0.0010 Epoch: 4 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:01:48,051-Speed 6310.96 samples/sec Loss 9.1230 LearningRate 0.0010 Epoch: 4 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:51,301-Speed 6302.32 samples/sec Loss 9.0246 LearningRate 0.0010 Epoch: 4 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:54,545-Speed 6314.93 samples/sec Loss 9.0214 LearningRate 0.0010 Epoch: 4 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:01:57,791-Speed 6312.24 samples/sec Loss 9.1798 LearningRate 0.0010 Epoch: 4 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:01,037-Speed 6309.75 samples/sec Loss 9.0528 LearningRate 0.0010 Epoch: 4 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:04,284-Speed 6309.51 samples/sec Loss 9.1165 LearningRate 0.0010 Epoch: 4 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:07,530-Speed 6310.17 samples/sec Loss 9.0546 LearningRate 0.0010 Epoch: 4 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:10,772-Speed 6319.56 samples/sec Loss 9.0273 LearningRate 0.0010 Epoch: 4 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:14,018-Speed 6309.77 samples/sec Loss 9.1053 LearningRate 0.0010 Epoch: 4 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:17,259-Speed 6319.88 samples/sec Loss 9.0624 LearningRate 0.0010 Epoch: 4 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:20,505-Speed 6311.14 samples/sec Loss 9.1382 LearningRate 0.0010 Epoch: 4 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:23,756-Speed 6302.39 samples/sec Loss 8.9368 LearningRate 0.0010 Epoch: 4 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:27,003-Speed 6309.09 samples/sec Loss 
9.0216 LearningRate 0.0010 Epoch: 4 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:30,248-Speed 6311.46 samples/sec Loss 9.0736 LearningRate 0.0010 Epoch: 4 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:33,491-Speed 6316.40 samples/sec Loss 9.0066 LearningRate 0.0010 Epoch: 4 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:36,737-Speed 6311.56 samples/sec Loss 8.9934 LearningRate 0.0010 Epoch: 4 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:39,981-Speed 6315.07 samples/sec Loss 8.9628 LearningRate 0.0010 Epoch: 4 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:43,213-Speed 6338.40 samples/sec Loss 9.0126 LearningRate 0.0010 Epoch: 4 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:46,462-Speed 6305.49 samples/sec Loss 8.9688 LearningRate 0.0010 Epoch: 4 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:49,707-Speed 6312.25 samples/sec Loss 9.1108 LearningRate 0.0010 Epoch: 4 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:52,954-Speed 6309.06 samples/sec Loss 9.0755 LearningRate 0.0010 Epoch: 4 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:56,197-Speed 6315.75 samples/sec Loss 9.1297 LearningRate 0.0010 Epoch: 4 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:02:59,429-Speed 6337.87 samples/sec Loss 9.0874 LearningRate 0.0010 Epoch: 4 Global Step: 91420 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:02,672-Speed 6316.06 samples/sec Loss 9.0291 LearningRate 0.0010 Epoch: 4 Global Step: 91430 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:05,917-Speed 6312.87 samples/sec Loss 8.9658 LearningRate 0.0010 Epoch: 4 Global Step: 91440 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:09,161-Speed 6316.28 samples/sec Loss 9.0034 LearningRate 0.0010 Epoch: 4 Global Step: 91450 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:12,405-Speed 6313.86 samples/sec Loss 9.0139 LearningRate 0.0010 Epoch: 4 Global Step: 91460 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:15,672-Speed 6270.32 samples/sec Loss 9.0149 LearningRate 0.0010 Epoch: 4 Global Step: 91470 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:18,912-Speed 6321.24 samples/sec Loss 9.0713 LearningRate 0.0010 Epoch: 4 Global Step: 91480 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:22,154-Speed 6319.50 samples/sec Loss 9.0354 LearningRate 0.0010 Epoch: 4 Global Step: 91490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:25,420-Speed 6271.89 samples/sec Loss 9.0269 LearningRate 0.0010 Epoch: 4 Global Step: 91500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:28,788-Speed 6082.48 samples/sec Loss 9.1225 LearningRate 0.0010 Epoch: 4 Global Step: 91510 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:03:32,053-Speed 6273.08 samples/sec Loss 8.9564 LearningRate 0.0010 Epoch: 4 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:35,296-Speed 6316.75 samples/sec Loss 9.0506 LearningRate 0.0010 Epoch: 4 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:38,542-Speed 6311.03 samples/sec Loss 8.9780 LearningRate 0.0010 Epoch: 4 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:41,787-Speed 6311.95 samples/sec Loss 9.0206 LearningRate 0.0010 Epoch: 4 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:45,034-Speed 6310.47 samples/sec Loss 9.0791 LearningRate 0.0010 Epoch: 4 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:03:48,279-Speed 6311.18 samples/sec Loss 9.0235 LearningRate 0.0010 Epoch: 4 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:51,528-Speed 6305.65 samples/sec Loss 9.0230 LearningRate 0.0010 Epoch: 4 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:54,773-Speed 6312.65 samples/sec Loss 9.0708 LearningRate 0.0010 Epoch: 4 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:03:58,014-Speed 6320.68 samples/sec Loss 9.0460 LearningRate 0.0010 Epoch: 4 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:01,258-Speed 6315.26 samples/sec Loss 9.1048 LearningRate 0.0010 Epoch: 4 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:04,491-Speed 6336.90 samples/sec Loss 9.0060 LearningRate 0.0010 Epoch: 4 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:07,736-Speed 6312.26 samples/sec Loss 9.0268 LearningRate 0.0010 Epoch: 4 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:10,982-Speed 6309.47 samples/sec Loss 9.0068 LearningRate 0.0010 Epoch: 4 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:14,225-Speed 6318.21 samples/sec Loss 8.9600 LearningRate 0.0010 Epoch: 4 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:17,473-Speed 6305.61 samples/sec Loss 8.9779 LearningRate 0.0010 Epoch: 4 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:20,720-Speed 6309.34 samples/sec Loss 9.0363 LearningRate 0.0010 Epoch: 4 Global Step: 91670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:23,965-Speed 6311.49 samples/sec Loss 9.0559 LearningRate 0.0010 Epoch: 4 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:27,210-Speed 6313.71 samples/sec Loss 
9.0300 LearningRate 0.0010 Epoch: 4 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:30,453-Speed 6316.03 samples/sec Loss 9.0331 LearningRate 0.0010 Epoch: 4 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:33,697-Speed 6315.16 samples/sec Loss 9.0410 LearningRate 0.0010 Epoch: 4 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:36,936-Speed 6324.89 samples/sec Loss 9.0440 LearningRate 0.0010 Epoch: 4 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:40,194-Speed 6286.38 samples/sec Loss 8.9333 LearningRate 0.0010 Epoch: 4 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:43,441-Speed 6308.30 samples/sec Loss 9.0117 LearningRate 0.0010 Epoch: 4 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:46,687-Speed 6311.14 samples/sec Loss 9.0852 LearningRate 0.0010 Epoch: 4 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:49,929-Speed 6319.07 samples/sec Loss 8.9816 LearningRate 0.0010 Epoch: 4 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:53,176-Speed 6307.24 samples/sec Loss 9.0709 LearningRate 0.0010 Epoch: 4 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:56,424-Speed 6308.17 samples/sec Loss 9.0074 LearningRate 0.0010 Epoch: 4 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:04:59,668-Speed 6315.71 samples/sec Loss 9.0400 LearningRate 0.0010 Epoch: 4 Global Step: 91790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:02,951-Speed 6239.89 samples/sec Loss 9.0099 LearningRate 0.0010 Epoch: 4 Global Step: 91800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:06,201-Speed 6303.14 samples/sec Loss 9.0274 LearningRate 0.0010 Epoch: 4 Global Step: 91810 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:09,429-Speed 6344.31 samples/sec Loss 9.0567 LearningRate 0.0010 Epoch: 4 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:12,676-Speed 6308.77 samples/sec Loss 9.0100 LearningRate 0.0010 Epoch: 4 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:15,923-Speed 6309.79 samples/sec Loss 9.0902 LearningRate 0.0010 Epoch: 4 Global Step: 91840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:19,172-Speed 6305.94 samples/sec Loss 8.9992 LearningRate 0.0010 Epoch: 4 Global Step: 91850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:22,423-Speed 6299.60 samples/sec Loss 8.9848 LearningRate 0.0010 Epoch: 4 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:05:25,659-Speed 6329.51 samples/sec Loss 8.9148 LearningRate 0.0010 Epoch: 4 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:28,905-Speed 6311.59 samples/sec Loss 9.0377 LearningRate 0.0010 Epoch: 4 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:32,151-Speed 6310.76 samples/sec Loss 9.0315 LearningRate 0.0010 Epoch: 4 Global Step: 91890 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:35,395-Speed 6313.75 samples/sec Loss 9.0616 LearningRate 0.0010 Epoch: 4 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:38,639-Speed 6315.09 samples/sec Loss 9.0170 LearningRate 0.0010 Epoch: 4 Global Step: 91910 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:41,889-Speed 6303.81 samples/sec Loss 9.0465 LearningRate 0.0010 Epoch: 4 Global Step: 91920 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:45,138-Speed 6303.77 samples/sec Loss 8.9978 LearningRate 0.0010 Epoch: 4 Global Step: 91930 Fp16 Grad Scale: 32768 Required: 67 hours Training: 
2022-04-01 00:05:48,378-Speed 6322.89 samples/sec Loss 8.9962 LearningRate 0.0010 Epoch: 4 Global Step: 91940 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:51,637-Speed 6284.94 samples/sec Loss 9.0059 LearningRate 0.0010 Epoch: 4 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:54,883-Speed 6311.01 samples/sec Loss 8.9662 LearningRate 0.0010 Epoch: 4 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:05:58,127-Speed 6314.01 samples/sec Loss 9.0994 LearningRate 0.0010 Epoch: 4 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:01,378-Speed 6301.94 samples/sec Loss 8.9898 LearningRate 0.0010 Epoch: 4 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:04,625-Speed 6309.30 samples/sec Loss 8.9556 LearningRate 0.0010 Epoch: 4 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:07,879-Speed 6294.65 samples/sec Loss 8.9844 LearningRate 0.0010 Epoch: 4 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:11,134-Speed 6293.94 samples/sec Loss 9.1444 LearningRate 0.0010 Epoch: 4 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:14,387-Speed 6297.76 samples/sec Loss 9.0635 LearningRate 0.0010 Epoch: 4 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:17,669-Speed 6240.53 samples/sec Loss 8.9919 LearningRate 0.0010 Epoch: 4 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:20,915-Speed 6310.45 samples/sec Loss 9.0122 LearningRate 0.0010 Epoch: 4 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:24,160-Speed 6313.03 samples/sec Loss 8.9939 LearningRate 0.0010 Epoch: 4 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:27,407-Speed 6309.79 samples/sec Loss 
9.0645 LearningRate 0.0010 Epoch: 4 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:30,716-Speed 6190.00 samples/sec Loss 8.9674 LearningRate 0.0010 Epoch: 4 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:33,986-Speed 6264.34 samples/sec Loss 9.0997 LearningRate 0.0010 Epoch: 4 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:37,290-Speed 6200.35 samples/sec Loss 9.0113 LearningRate 0.0010 Epoch: 4 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:40,537-Speed 6307.78 samples/sec Loss 9.0270 LearningRate 0.0010 Epoch: 4 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:06:43,772-Speed 6333.07 samples/sec Loss 8.9558 LearningRate 0.0010 Epoch: 4 Global Step: 92110 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:06:47,017-Speed 6311.44 samples/sec Loss 8.9590 LearningRate 0.0010 Epoch: 4 Global Step: 92120 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:06:50,267-Speed 6302.84 samples/sec Loss 9.0955 LearningRate 0.0010 Epoch: 4 Global Step: 92130 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:06:53,514-Speed 6309.94 samples/sec Loss 9.0527 LearningRate 0.0010 Epoch: 4 Global Step: 92140 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:06:56,763-Speed 6304.02 samples/sec Loss 8.9641 LearningRate 0.0010 Epoch: 4 Global Step: 92150 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:00,021-Speed 6288.74 samples/sec Loss 9.0116 LearningRate 0.0010 Epoch: 4 Global Step: 92160 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:03,310-Speed 6227.12 samples/sec Loss 8.9660 LearningRate 0.0010 Epoch: 4 Global Step: 92170 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:06,554-Speed 6314.44 samples/sec Loss 8.9988 LearningRate 0.0010 Epoch: 4 Global Step: 92180 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:09,812-Speed 6288.31 samples/sec Loss 9.0187 LearningRate 0.0010 Epoch: 4 Global Step: 92190 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:13,060-Speed 6308.03 samples/sec Loss 8.9882 LearningRate 0.0010 Epoch: 4 Global Step: 92200 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:16,310-Speed 6303.41 samples/sec Loss 9.0657 LearningRate 0.0010 Epoch: 4 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:19,555-Speed 6312.88 samples/sec Loss 9.0214 LearningRate 0.0010 Epoch: 4 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:22,802-Speed 6308.98 samples/sec Loss 8.9726 LearningRate 0.0010 Epoch: 4 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:26,051-Speed 6304.34 samples/sec Loss 9.0346 LearningRate 0.0010 Epoch: 4 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:29,292-Speed 6319.92 samples/sec Loss 8.9653 LearningRate 0.0010 Epoch: 4 Global Step: 92250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:32,542-Speed 6303.97 samples/sec Loss 8.9243 LearningRate 0.0010 Epoch: 4 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:35,791-Speed 6305.31 samples/sec Loss 8.9790 LearningRate 0.0010 Epoch: 4 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:07:39,021-Speed 6340.26 samples/sec Loss 8.9223 LearningRate 0.0010 Epoch: 4 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:42,266-Speed 6312.41 samples/sec Loss 9.0075 LearningRate 0.0010 Epoch: 4 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:45,509-Speed 6318.27 samples/sec Loss 9.0528 LearningRate 0.0010 Epoch: 4 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 67 hours Training: 
2022-04-01 00:07:48,758-Speed 6303.77 samples/sec Loss 9.0743 LearningRate 0.0010 Epoch: 4 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:52,001-Speed 6317.26 samples/sec Loss 9.0026 LearningRate 0.0010 Epoch: 4 Global Step: 92320 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:55,245-Speed 6314.21 samples/sec Loss 9.0052 LearningRate 0.0010 Epoch: 4 Global Step: 92330 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:07:58,491-Speed 6310.25 samples/sec Loss 8.9438 LearningRate 0.0010 Epoch: 4 Global Step: 92340 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:01,736-Speed 6313.25 samples/sec Loss 8.9145 LearningRate 0.0010 Epoch: 4 Global Step: 92350 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:04,980-Speed 6314.55 samples/sec Loss 8.9843 LearningRate 0.0010 Epoch: 4 Global Step: 92360 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:08,231-Speed 6300.25 samples/sec Loss 8.9755 LearningRate 0.0010 Epoch: 4 Global Step: 92370 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:11,486-Speed 6294.28 samples/sec Loss 9.0683 LearningRate 0.0010 Epoch: 4 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:14,731-Speed 6311.93 samples/sec Loss 8.9620 LearningRate 0.0010 Epoch: 4 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:17,980-Speed 6305.34 samples/sec Loss 8.9851 LearningRate 0.0010 Epoch: 4 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:21,225-Speed 6313.12 samples/sec Loss 9.0384 LearningRate 0.0010 Epoch: 4 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:24,471-Speed 6310.20 samples/sec Loss 8.9718 LearningRate 0.0010 Epoch: 4 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:27,715-Speed 6314.44 samples/sec Loss 
8.9906 LearningRate 0.0010 Epoch: 4 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:30,960-Speed 6313.09 samples/sec Loss 8.9968 LearningRate 0.0010 Epoch: 4 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:34,205-Speed 6312.95 samples/sec Loss 8.9895 LearningRate 0.0010 Epoch: 4 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:37,450-Speed 6312.80 samples/sec Loss 8.9339 LearningRate 0.0010 Epoch: 4 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:40,698-Speed 6307.41 samples/sec Loss 8.8525 LearningRate 0.0010 Epoch: 4 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:43,930-Speed 6337.29 samples/sec Loss 8.9836 LearningRate 0.0010 Epoch: 4 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:08:47,159-Speed 6344.40 samples/sec Loss 9.0478 LearningRate 0.0010 Epoch: 4 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:50,403-Speed 6313.39 samples/sec Loss 8.9543 LearningRate 0.0010 Epoch: 4 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:53,648-Speed 6314.61 samples/sec Loss 9.0110 LearningRate 0.0010 Epoch: 4 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:08:56,892-Speed 6314.24 samples/sec Loss 9.0016 LearningRate 0.0010 Epoch: 4 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:00,138-Speed 6308.99 samples/sec Loss 9.0391 LearningRate 0.0010 Epoch: 4 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:03,390-Speed 6300.44 samples/sec Loss 8.9932 LearningRate 0.0010 Epoch: 4 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:06,673-Speed 6239.93 samples/sec Loss 8.9543 LearningRate 0.0010 Epoch: 4 Global Step: 92550 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:09,916-Speed 6316.44 samples/sec Loss 8.9393 LearningRate 0.0010 Epoch: 4 Global Step: 92560 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:13,159-Speed 6316.15 samples/sec Loss 9.0632 LearningRate 0.0010 Epoch: 4 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:16,402-Speed 6315.40 samples/sec Loss 9.0207 LearningRate 0.0010 Epoch: 4 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:09:19,646-Speed 6315.70 samples/sec Loss 8.9868 LearningRate 0.0010 Epoch: 4 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:22,889-Speed 6316.41 samples/sec Loss 8.9975 LearningRate 0.0010 Epoch: 4 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:26,135-Speed 6310.83 samples/sec Loss 9.0703 LearningRate 0.0010 Epoch: 4 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:29,382-Speed 6309.78 samples/sec Loss 8.9593 LearningRate 0.0010 Epoch: 4 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:32,626-Speed 6314.21 samples/sec Loss 8.9933 LearningRate 0.0010 Epoch: 4 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:35,873-Speed 6308.74 samples/sec Loss 8.9662 LearningRate 0.0010 Epoch: 4 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:39,123-Speed 6302.99 samples/sec Loss 8.9970 LearningRate 0.0010 Epoch: 4 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:42,365-Speed 6318.46 samples/sec Loss 9.0192 LearningRate 0.0010 Epoch: 4 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:45,613-Speed 6307.69 samples/sec Loss 9.0220 LearningRate 0.0010 Epoch: 4 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:09:48,857-Speed 6312.66 samples/sec Loss 8.9974 LearningRate 0.0010 Epoch: 4 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:52,097-Speed 6323.65 samples/sec Loss 8.9190 LearningRate 0.0010 Epoch: 4 Global Step: 92690 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:09:55,326-Speed 6344.02 samples/sec Loss 9.0794 LearningRate 0.0010 Epoch: 4 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:09:58,567-Speed 6320.21 samples/sec Loss 8.8555 LearningRate 0.0010 Epoch: 4 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:01,812-Speed 6312.39 samples/sec Loss 9.0016 LearningRate 0.0010 Epoch: 4 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:05,057-Speed 6312.23 samples/sec Loss 8.8478 LearningRate 0.0010 Epoch: 4 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:08,299-Speed 6319.01 samples/sec Loss 8.9319 LearningRate 0.0010 Epoch: 4 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:11,558-Speed 6284.51 samples/sec Loss 8.9554 LearningRate 0.0010 Epoch: 4 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:14,800-Speed 6318.63 samples/sec Loss 9.0255 LearningRate 0.0010 Epoch: 4 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:18,048-Speed 6307.11 samples/sec Loss 8.9666 LearningRate 0.0010 Epoch: 4 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:21,296-Speed 6306.16 samples/sec Loss 9.0204 LearningRate 0.0010 Epoch: 4 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:24,541-Speed 6313.59 samples/sec Loss 8.9331 LearningRate 0.0010 Epoch: 4 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:27,777-Speed 6330.73 samples/sec Loss 
8.9724 LearningRate 0.0010 Epoch: 4 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:31,021-Speed 6313.33 samples/sec Loss 8.9973 LearningRate 0.0010 Epoch: 4 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:34,272-Speed 6302.23 samples/sec Loss 8.9891 LearningRate 0.0010 Epoch: 4 Global Step: 92820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:37,519-Speed 6309.01 samples/sec Loss 8.9946 LearningRate 0.0010 Epoch: 4 Global Step: 92830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:40,766-Speed 6309.07 samples/sec Loss 8.9024 LearningRate 0.0010 Epoch: 4 Global Step: 92840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:44,012-Speed 6310.80 samples/sec Loss 8.9407 LearningRate 0.0010 Epoch: 4 Global Step: 92850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:47,256-Speed 6315.67 samples/sec Loss 8.9619 LearningRate 0.0010 Epoch: 4 Global Step: 92860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:50,503-Speed 6308.42 samples/sec Loss 9.0037 LearningRate 0.0010 Epoch: 4 Global Step: 92870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:53,748-Speed 6312.10 samples/sec Loss 8.9506 LearningRate 0.0010 Epoch: 4 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:10:57,004-Speed 6291.98 samples/sec Loss 8.9563 LearningRate 0.0010 Epoch: 4 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:00,233-Speed 6343.76 samples/sec Loss 8.9787 LearningRate 0.0010 Epoch: 4 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:03,479-Speed 6309.87 samples/sec Loss 8.9559 LearningRate 0.0010 Epoch: 4 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:06,724-Speed 6311.99 samples/sec Loss 8.9338 LearningRate 0.0010 Epoch: 4 Global Step: 92920 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:09,967-Speed 6316.67 samples/sec Loss 8.8739 LearningRate 0.0010 Epoch: 4 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:13,217-Speed 6303.40 samples/sec Loss 8.9618 LearningRate 0.0010 Epoch: 4 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:16,462-Speed 6313.39 samples/sec Loss 9.0159 LearningRate 0.0010 Epoch: 4 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:19,708-Speed 6310.18 samples/sec Loss 8.9097 LearningRate 0.0010 Epoch: 4 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:22,955-Speed 6309.89 samples/sec Loss 8.9460 LearningRate 0.0010 Epoch: 4 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:26,204-Speed 6303.34 samples/sec Loss 8.8976 LearningRate 0.0010 Epoch: 4 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:29,454-Speed 6304.28 samples/sec Loss 8.9928 LearningRate 0.0010 Epoch: 4 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:32,684-Speed 6341.19 samples/sec Loss 8.9206 LearningRate 0.0010 Epoch: 4 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:35,928-Speed 6314.12 samples/sec Loss 8.9340 LearningRate 0.0010 Epoch: 4 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:39,171-Speed 6318.02 samples/sec Loss 8.9414 LearningRate 0.0010 Epoch: 4 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:42,419-Speed 6306.35 samples/sec Loss 8.9524 LearningRate 0.0010 Epoch: 4 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:45,667-Speed 6306.89 samples/sec Loss 8.8783 LearningRate 0.0010 Epoch: 4 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:11:48,912-Speed 6313.26 samples/sec Loss 9.0062 LearningRate 0.0010 Epoch: 4 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:52,159-Speed 6308.12 samples/sec Loss 8.9530 LearningRate 0.0010 Epoch: 4 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:55,410-Speed 6302.26 samples/sec Loss 8.9994 LearningRate 0.0010 Epoch: 4 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:11:58,656-Speed 6310.52 samples/sec Loss 8.9935 LearningRate 0.0010 Epoch: 4 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:01,904-Speed 6306.21 samples/sec Loss 8.9022 LearningRate 0.0010 Epoch: 4 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:05,133-Speed 6343.71 samples/sec Loss 9.0142 LearningRate 0.0010 Epoch: 4 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:08,378-Speed 6313.96 samples/sec Loss 8.9216 LearningRate 0.0010 Epoch: 4 Global Step: 93110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:11,620-Speed 6318.27 samples/sec Loss 8.9994 LearningRate 0.0010 Epoch: 4 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:14,868-Speed 6306.05 samples/sec Loss 8.8944 LearningRate 0.0010 Epoch: 4 Global Step: 93130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:18,146-Speed 6249.61 samples/sec Loss 8.9692 LearningRate 0.0010 Epoch: 4 Global Step: 93140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:21,388-Speed 6317.22 samples/sec Loss 8.9965 LearningRate 0.0010 Epoch: 4 Global Step: 93150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:24,636-Speed 6308.69 samples/sec Loss 8.9369 LearningRate 0.0010 Epoch: 4 Global Step: 93160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:27,880-Speed 6314.63 samples/sec Loss 
8.9191 LearningRate 0.0010 Epoch: 4 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:31,126-Speed 6309.86 samples/sec Loss 8.9200 LearningRate 0.0010 Epoch: 4 Global Step: 93180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:34,376-Speed 6303.36 samples/sec Loss 8.8806 LearningRate 0.0010 Epoch: 4 Global Step: 93190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:37,606-Speed 6341.46 samples/sec Loss 8.9810 LearningRate 0.0010 Epoch: 4 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:40,853-Speed 6309.44 samples/sec Loss 8.9334 LearningRate 0.0010 Epoch: 4 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:44,100-Speed 6309.03 samples/sec Loss 8.9988 LearningRate 0.0010 Epoch: 4 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:47,346-Speed 6310.41 samples/sec Loss 8.9396 LearningRate 0.0010 Epoch: 4 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:50,596-Speed 6302.38 samples/sec Loss 8.9715 LearningRate 0.0010 Epoch: 4 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:53,843-Speed 6309.39 samples/sec Loss 9.0381 LearningRate 0.0010 Epoch: 4 Global Step: 93250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:12:57,086-Speed 6316.92 samples/sec Loss 8.8536 LearningRate 0.0010 Epoch: 4 Global Step: 93260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:00,332-Speed 6311.59 samples/sec Loss 8.8196 LearningRate 0.0010 Epoch: 4 Global Step: 93270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:03,589-Speed 6288.35 samples/sec Loss 8.8541 LearningRate 0.0010 Epoch: 4 Global Step: 93280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:06,835-Speed 6312.25 samples/sec Loss 9.0075 LearningRate 0.0010 Epoch: 4 Global Step: 93290 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:10,081-Speed 6311.03 samples/sec Loss 8.8911 LearningRate 0.0010 Epoch: 4 Global Step: 93300 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:13:13,322-Speed 6319.98 samples/sec Loss 8.9479 LearningRate 0.0010 Epoch: 4 Global Step: 93310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:16,570-Speed 6306.76 samples/sec Loss 8.8308 LearningRate 0.0010 Epoch: 4 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:19,814-Speed 6314.39 samples/sec Loss 8.9622 LearningRate 0.0010 Epoch: 4 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:23,061-Speed 6308.70 samples/sec Loss 8.9857 LearningRate 0.0010 Epoch: 4 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:26,307-Speed 6310.64 samples/sec Loss 8.9714 LearningRate 0.0010 Epoch: 4 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:29,549-Speed 6319.37 samples/sec Loss 8.9892 LearningRate 0.0010 Epoch: 4 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:32,790-Speed 6319.41 samples/sec Loss 8.9043 LearningRate 0.0010 Epoch: 4 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:36,032-Speed 6318.49 samples/sec Loss 8.9544 LearningRate 0.0010 Epoch: 4 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:39,286-Speed 6294.22 samples/sec Loss 8.9366 LearningRate 0.0010 Epoch: 4 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:42,535-Speed 6305.17 samples/sec Loss 8.9535 LearningRate 0.0010 Epoch: 4 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:13:45,765-Speed 6343.07 samples/sec Loss 8.9568 LearningRate 0.0010 Epoch: 4 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 67 hours Training: 
2022-04-01 00:13:49,008-Speed 6316.42 samples/sec Loss 8.8539 LearningRate 0.0010 Epoch: 4 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:13:52,250-Speed 6317.68 samples/sec Loss 8.9228 LearningRate 0.0010 Epoch: 4 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:13:55,494-Speed 6314.34 samples/sec Loss 8.9392 LearningRate 0.0010 Epoch: 4 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:13:58,739-Speed 6312.90 samples/sec Loss 8.8513 LearningRate 0.0010 Epoch: 4 Global Step: 93450 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:01,985-Speed 6311.07 samples/sec Loss 8.9436 LearningRate 0.0010 Epoch: 4 Global Step: 93460 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:05,265-Speed 6246.37 samples/sec Loss 8.9656 LearningRate 0.0010 Epoch: 4 Global Step: 93470 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:08,507-Speed 6318.28 samples/sec Loss 8.9069 LearningRate 0.0010 Epoch: 4 Global Step: 93480 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:11,752-Speed 6314.11 samples/sec Loss 8.8398 LearningRate 0.0010 Epoch: 4 Global Step: 93490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:14,995-Speed 6315.59 samples/sec Loss 8.9365 LearningRate 0.0010 Epoch: 4 Global Step: 93500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:18,243-Speed 6305.94 samples/sec Loss 9.0126 LearningRate 0.0010 Epoch: 4 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:21,487-Speed 6315.14 samples/sec Loss 9.0004 LearningRate 0.0010 Epoch: 4 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:24,733-Speed 6311.97 samples/sec Loss 8.8986 LearningRate 0.0010 Epoch: 4 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:27,977-Speed 6314.24 samples/sec Loss 
8.8385 LearningRate 0.0010 Epoch: 4 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:31,220-Speed 6316.48 samples/sec Loss 8.8652 LearningRate 0.0010 Epoch: 4 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:34,468-Speed 6306.85 samples/sec Loss 8.9394 LearningRate 0.0010 Epoch: 4 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:37,712-Speed 6313.41 samples/sec Loss 8.9062 LearningRate 0.0010 Epoch: 4 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:40,956-Speed 6316.15 samples/sec Loss 8.8606 LearningRate 0.0010 Epoch: 4 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:14:44,188-Speed 6336.98 samples/sec Loss 8.8693 LearningRate 0.0010 Epoch: 4 Global Step: 93590 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:47,436-Speed 6306.80 samples/sec Loss 8.8934 LearningRate 0.0010 Epoch: 4 Global Step: 93600 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:50,683-Speed 6310.54 samples/sec Loss 8.8864 LearningRate 0.0010 Epoch: 4 Global Step: 93610 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:53,933-Speed 6303.36 samples/sec Loss 8.8959 LearningRate 0.0010 Epoch: 4 Global Step: 93620 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:14:57,177-Speed 6315.11 samples/sec Loss 8.8514 LearningRate 0.0010 Epoch: 4 Global Step: 93630 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:00,429-Speed 6299.22 samples/sec Loss 8.9344 LearningRate 0.0010 Epoch: 4 Global Step: 93640 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:03,679-Speed 6303.14 samples/sec Loss 8.9076 LearningRate 0.0010 Epoch: 4 Global Step: 93650 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:06,925-Speed 6310.55 samples/sec Loss 8.8908 LearningRate 0.0010 Epoch: 4 Global Step: 93660 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:10,172-Speed 6308.50 samples/sec Loss 8.9063 LearningRate 0.0010 Epoch: 4 Global Step: 93670 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:13,417-Speed 6313.79 samples/sec Loss 9.0245 LearningRate 0.0010 Epoch: 4 Global Step: 93680 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:15:16,663-Speed 6310.94 samples/sec Loss 8.8569 LearningRate 0.0010 Epoch: 4 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:19,907-Speed 6313.50 samples/sec Loss 8.8802 LearningRate 0.0010 Epoch: 4 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:23,154-Speed 6308.57 samples/sec Loss 8.9819 LearningRate 0.0010 Epoch: 4 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:26,400-Speed 6310.61 samples/sec Loss 8.9505 LearningRate 0.0010 Epoch: 4 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:29,645-Speed 6313.81 samples/sec Loss 8.9390 LearningRate 0.0010 Epoch: 4 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:32,890-Speed 6312.56 samples/sec Loss 8.8544 LearningRate 0.0010 Epoch: 4 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:36,137-Speed 6309.05 samples/sec Loss 8.8676 LearningRate 0.0010 Epoch: 4 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:39,377-Speed 6321.99 samples/sec Loss 8.8527 LearningRate 0.0010 Epoch: 4 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:42,621-Speed 6313.86 samples/sec Loss 8.8958 LearningRate 0.0010 Epoch: 4 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:45,868-Speed 6308.60 samples/sec Loss 8.8941 LearningRate 0.0010 Epoch: 4 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:15:49,104-Speed 6330.95 samples/sec Loss 8.8394 LearningRate 0.0010 Epoch: 4 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:52,349-Speed 6312.28 samples/sec Loss 8.9207 LearningRate 0.0010 Epoch: 4 Global Step: 93800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:55,597-Speed 6307.48 samples/sec Loss 8.9159 LearningRate 0.0010 Epoch: 4 Global Step: 93810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:15:58,842-Speed 6311.18 samples/sec Loss 9.0466 LearningRate 0.0010 Epoch: 4 Global Step: 93820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:02,089-Speed 6310.11 samples/sec Loss 9.0117 LearningRate 0.0010 Epoch: 4 Global Step: 93830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:05,330-Speed 6320.13 samples/sec Loss 8.9225 LearningRate 0.0010 Epoch: 4 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:08,577-Speed 6307.61 samples/sec Loss 8.9272 LearningRate 0.0010 Epoch: 4 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:11,821-Speed 6315.04 samples/sec Loss 8.9472 LearningRate 0.0010 Epoch: 4 Global Step: 93860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:15,063-Speed 6318.46 samples/sec Loss 8.9369 LearningRate 0.0010 Epoch: 4 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:18,309-Speed 6312.39 samples/sec Loss 8.8920 LearningRate 0.0010 Epoch: 4 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:21,540-Speed 6339.67 samples/sec Loss 8.9293 LearningRate 0.0010 Epoch: 4 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:24,780-Speed 6323.02 samples/sec Loss 8.8990 LearningRate 0.0010 Epoch: 4 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:28,025-Speed 6312.08 samples/sec Loss 
8.9007 LearningRate 0.0010 Epoch: 4 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:31,270-Speed 6312.07 samples/sec Loss 8.9937 LearningRate 0.0010 Epoch: 4 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:34,515-Speed 6312.48 samples/sec Loss 8.8692 LearningRate 0.0010 Epoch: 4 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:37,762-Speed 6309.20 samples/sec Loss 8.9079 LearningRate 0.0010 Epoch: 4 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:41,006-Speed 6315.91 samples/sec Loss 9.0203 LearningRate 0.0010 Epoch: 4 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:44,251-Speed 6311.44 samples/sec Loss 8.9483 LearningRate 0.0010 Epoch: 4 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:16:47,484-Speed 6336.24 samples/sec Loss 8.9572 LearningRate 0.0010 Epoch: 4 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:16:50,729-Speed 6313.91 samples/sec Loss 8.8847 LearningRate 0.0010 Epoch: 4 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:16:53,970-Speed 6318.86 samples/sec Loss 8.8914 LearningRate 0.0010 Epoch: 4 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:16:57,222-Speed 6299.34 samples/sec Loss 8.9129 LearningRate 0.0010 Epoch: 4 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:00,470-Speed 6307.53 samples/sec Loss 8.8821 LearningRate 0.0010 Epoch: 4 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:03,716-Speed 6310.57 samples/sec Loss 8.8308 LearningRate 0.0010 Epoch: 4 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:06,967-Speed 6301.01 samples/sec Loss 8.8246 LearningRate 0.0010 Epoch: 4 Global Step: 94030 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:10,213-Speed 6311.31 samples/sec Loss 8.7864 LearningRate 0.0010 Epoch: 4 Global Step: 94040 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:13,455-Speed 6316.85 samples/sec Loss 8.8313 LearningRate 0.0010 Epoch: 4 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:16,701-Speed 6311.26 samples/sec Loss 8.8866 LearningRate 0.0010 Epoch: 4 Global Step: 94060 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:17:19,947-Speed 6310.68 samples/sec Loss 8.9084 LearningRate 0.0010 Epoch: 4 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:23,194-Speed 6309.85 samples/sec Loss 8.8785 LearningRate 0.0010 Epoch: 4 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:26,439-Speed 6311.65 samples/sec Loss 8.8896 LearningRate 0.0010 Epoch: 4 Global Step: 94090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:29,685-Speed 6312.32 samples/sec Loss 8.8533 LearningRate 0.0010 Epoch: 4 Global Step: 94100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:32,928-Speed 6316.12 samples/sec Loss 8.8692 LearningRate 0.0010 Epoch: 4 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:36,173-Speed 6312.59 samples/sec Loss 8.8023 LearningRate 0.0010 Epoch: 4 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:39,418-Speed 6312.99 samples/sec Loss 8.8435 LearningRate 0.0010 Epoch: 4 Global Step: 94130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:42,663-Speed 6312.76 samples/sec Loss 8.8604 LearningRate 0.0010 Epoch: 4 Global Step: 94140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:45,908-Speed 6312.15 samples/sec Loss 8.9779 LearningRate 0.0010 Epoch: 4 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:17:49,153-Speed 6312.25 samples/sec Loss 8.9157 LearningRate 0.0010 Epoch: 4 Global Step: 94160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:52,384-Speed 6341.33 samples/sec Loss 8.9321 LearningRate 0.0010 Epoch: 4 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:55,630-Speed 6308.96 samples/sec Loss 8.9729 LearningRate 0.0010 Epoch: 4 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:17:58,873-Speed 6318.70 samples/sec Loss 8.8200 LearningRate 0.0010 Epoch: 4 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:02,118-Speed 6312.42 samples/sec Loss 8.8619 LearningRate 0.0010 Epoch: 4 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:05,365-Speed 6307.70 samples/sec Loss 8.9715 LearningRate 0.0010 Epoch: 4 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:08,594-Speed 6344.12 samples/sec Loss 8.8216 LearningRate 0.0010 Epoch: 4 Global Step: 94220 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:11,835-Speed 6319.93 samples/sec Loss 8.9365 LearningRate 0.0010 Epoch: 4 Global Step: 94230 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:15,079-Speed 6314.22 samples/sec Loss 8.8591 LearningRate 0.0010 Epoch: 4 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:18,325-Speed 6311.65 samples/sec Loss 8.8301 LearningRate 0.0010 Epoch: 4 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:21,569-Speed 6313.92 samples/sec Loss 8.8542 LearningRate 0.0010 Epoch: 4 Global Step: 94260 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:24,814-Speed 6313.78 samples/sec Loss 8.9218 LearningRate 0.0010 Epoch: 4 Global Step: 94270 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:28,055-Speed 6319.56 samples/sec Loss 
8.9494 LearningRate 0.0010 Epoch: 4 Global Step: 94280 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:31,337-Speed 6241.62 samples/sec Loss 8.8568 LearningRate 0.0010 Epoch: 4 Global Step: 94290 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:34,583-Speed 6311.32 samples/sec Loss 8.9131 LearningRate 0.0010 Epoch: 4 Global Step: 94300 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:37,839-Speed 6290.76 samples/sec Loss 8.8688 LearningRate 0.0010 Epoch: 4 Global Step: 94310 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:18:41,084-Speed 6312.64 samples/sec Loss 8.8547 LearningRate 0.0010 Epoch: 4 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:44,329-Speed 6314.25 samples/sec Loss 8.9115 LearningRate 0.0010 Epoch: 4 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:47,585-Speed 6289.97 samples/sec Loss 8.9185 LearningRate 0.0010 Epoch: 4 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:50,832-Speed 6309.87 samples/sec Loss 8.9770 LearningRate 0.0010 Epoch: 4 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:54,077-Speed 6312.38 samples/sec Loss 8.8200 LearningRate 0.0010 Epoch: 4 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:18:57,324-Speed 6308.85 samples/sec Loss 8.8082 LearningRate 0.0010 Epoch: 4 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:00,571-Speed 6308.06 samples/sec Loss 8.8130 LearningRate 0.0010 Epoch: 4 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:03,823-Speed 6298.77 samples/sec Loss 8.9227 LearningRate 0.0010 Epoch: 4 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:07,072-Speed 6306.46 samples/sec Loss 8.9371 LearningRate 0.0010 Epoch: 4 Global Step: 94400 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:10,317-Speed 6312.16 samples/sec Loss 8.7845 LearningRate 0.0010 Epoch: 4 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:13,573-Speed 6290.40 samples/sec Loss 8.7768 LearningRate 0.0010 Epoch: 4 Global Step: 94420 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:19:16,810-Speed 6328.67 samples/sec Loss 8.8896 LearningRate 0.0010 Epoch: 4 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:20,055-Speed 6312.64 samples/sec Loss 8.9512 LearningRate 0.0010 Epoch: 4 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:23,309-Speed 6295.19 samples/sec Loss 8.8529 LearningRate 0.0010 Epoch: 4 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:26,557-Speed 6306.81 samples/sec Loss 8.8853 LearningRate 0.0010 Epoch: 4 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:29,806-Speed 6305.65 samples/sec Loss 8.8648 LearningRate 0.0010 Epoch: 4 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:33,057-Speed 6301.80 samples/sec Loss 8.9303 LearningRate 0.0010 Epoch: 4 Global Step: 94480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:36,301-Speed 6313.10 samples/sec Loss 8.9922 LearningRate 0.0010 Epoch: 4 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:39,546-Speed 6313.27 samples/sec Loss 8.8757 LearningRate 0.0010 Epoch: 4 Global Step: 94500 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:42,790-Speed 6315.20 samples/sec Loss 8.9102 LearningRate 0.0010 Epoch: 4 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:46,037-Speed 6308.29 samples/sec Loss 8.9074 LearningRate 0.0010 Epoch: 4 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:19:49,270-Speed 6336.84 samples/sec Loss 8.7094 LearningRate 0.0010 Epoch: 4 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:52,513-Speed 6315.78 samples/sec Loss 8.7730 LearningRate 0.0010 Epoch: 4 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:55,761-Speed 6306.53 samples/sec Loss 8.8698 LearningRate 0.0010 Epoch: 4 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:19:59,015-Speed 6296.32 samples/sec Loss 8.9115 LearningRate 0.0010 Epoch: 4 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:02,263-Speed 6306.85 samples/sec Loss 8.7653 LearningRate 0.0010 Epoch: 4 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:05,519-Speed 6290.83 samples/sec Loss 8.9084 LearningRate 0.0010 Epoch: 4 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:08,765-Speed 6309.87 samples/sec Loss 8.8002 LearningRate 0.0010 Epoch: 4 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:12,011-Speed 6312.60 samples/sec Loss 8.9378 LearningRate 0.0010 Epoch: 4 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:15,260-Speed 6304.91 samples/sec Loss 8.9030 LearningRate 0.0010 Epoch: 4 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:18,504-Speed 6312.70 samples/sec Loss 8.9244 LearningRate 0.0010 Epoch: 4 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:21,739-Speed 6332.81 samples/sec Loss 8.8037 LearningRate 0.0010 Epoch: 4 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:24,986-Speed 6308.89 samples/sec Loss 8.9353 LearningRate 0.0010 Epoch: 4 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:28,232-Speed 6310.96 samples/sec Loss 
8.8399 LearningRate 0.0010 Epoch: 4 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:31,475-Speed 6316.39 samples/sec Loss 8.8064 LearningRate 0.0010 Epoch: 4 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:34,725-Speed 6302.79 samples/sec Loss 8.9008 LearningRate 0.0010 Epoch: 4 Global Step: 94670 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:37,973-Speed 6306.64 samples/sec Loss 8.8821 LearningRate 0.0010 Epoch: 4 Global Step: 94680 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:41,222-Speed 6305.68 samples/sec Loss 8.8464 LearningRate 0.0010 Epoch: 4 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:44,472-Speed 6301.86 samples/sec Loss 8.8635 LearningRate 0.0010 Epoch: 4 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:47,763-Speed 6224.33 samples/sec Loss 8.7490 LearningRate 0.0010 Epoch: 4 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:20:51,044-Speed 6242.84 samples/sec Loss 8.8326 LearningRate 0.0010 Epoch: 4 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:20:54,287-Speed 6316.35 samples/sec Loss 8.8366 LearningRate 0.0010 Epoch: 4 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:20:57,531-Speed 6316.58 samples/sec Loss 8.8602 LearningRate 0.0010 Epoch: 4 Global Step: 94740 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:00,777-Speed 6309.60 samples/sec Loss 8.7713 LearningRate 0.0010 Epoch: 4 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:04,026-Speed 6305.15 samples/sec Loss 8.7563 LearningRate 0.0010 Epoch: 4 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:07,271-Speed 6312.84 samples/sec Loss 8.8356 LearningRate 0.0010 Epoch: 4 Global Step: 94770 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:10,522-Speed 6301.43 samples/sec Loss 8.7659 LearningRate 0.0010 Epoch: 4 Global Step: 94780 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:13,766-Speed 6314.46 samples/sec Loss 8.8291 LearningRate 0.0010 Epoch: 4 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:17,013-Speed 6308.73 samples/sec Loss 8.9491 LearningRate 0.0010 Epoch: 4 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:20,259-Speed 6310.31 samples/sec Loss 8.9326 LearningRate 0.0010 Epoch: 4 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:21:23,503-Speed 6314.43 samples/sec Loss 8.7870 LearningRate 0.0010 Epoch: 4 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:26,751-Speed 6307.89 samples/sec Loss 8.7795 LearningRate 0.0010 Epoch: 4 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:29,997-Speed 6310.61 samples/sec Loss 8.7902 LearningRate 0.0010 Epoch: 4 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:33,242-Speed 6313.32 samples/sec Loss 8.8078 LearningRate 0.0010 Epoch: 4 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:36,489-Speed 6308.38 samples/sec Loss 8.8441 LearningRate 0.0010 Epoch: 4 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:39,736-Speed 6308.75 samples/sec Loss 8.8817 LearningRate 0.0010 Epoch: 4 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:42,982-Speed 6310.35 samples/sec Loss 8.9058 LearningRate 0.0010 Epoch: 4 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:46,236-Speed 6295.94 samples/sec Loss 8.9018 LearningRate 0.0010 Epoch: 4 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:21:49,483-Speed 6308.21 samples/sec Loss 8.8698 LearningRate 0.0010 Epoch: 4 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:52,729-Speed 6310.70 samples/sec Loss 8.8852 LearningRate 0.0010 Epoch: 4 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:21:55,975-Speed 6310.88 samples/sec Loss 8.7573 LearningRate 0.0010 Epoch: 4 Global Step: 94920 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:21:59,210-Speed 6332.55 samples/sec Loss 8.8583 LearningRate 0.0010 Epoch: 4 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:02,457-Speed 6306.98 samples/sec Loss 8.8391 LearningRate 0.0010 Epoch: 4 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:05,714-Speed 6290.36 samples/sec Loss 8.7709 LearningRate 0.0010 Epoch: 4 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:08,959-Speed 6313.45 samples/sec Loss 8.7492 LearningRate 0.0010 Epoch: 4 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:12,205-Speed 6310.53 samples/sec Loss 8.7435 LearningRate 0.0010 Epoch: 4 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:15,452-Speed 6309.42 samples/sec Loss 8.7718 LearningRate 0.0010 Epoch: 4 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:18,699-Speed 6308.62 samples/sec Loss 8.7873 LearningRate 0.0010 Epoch: 4 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:21,945-Speed 6311.63 samples/sec Loss 8.8614 LearningRate 0.0010 Epoch: 4 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:25,192-Speed 6307.56 samples/sec Loss 8.8531 LearningRate 0.0010 Epoch: 4 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:28,434-Speed 6319.50 samples/sec Loss 
8.7096 LearningRate 0.0010 Epoch: 4 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:31,666-Speed 6337.78 samples/sec Loss 8.8342 LearningRate 0.0010 Epoch: 4 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:34,912-Speed 6311.30 samples/sec Loss 8.8985 LearningRate 0.0010 Epoch: 4 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:38,153-Speed 6318.88 samples/sec Loss 8.8581 LearningRate 0.0010 Epoch: 4 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:41,401-Speed 6307.35 samples/sec Loss 8.8105 LearningRate 0.0010 Epoch: 4 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:44,644-Speed 6317.19 samples/sec Loss 8.7901 LearningRate 0.0010 Epoch: 4 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:47,892-Speed 6306.75 samples/sec Loss 8.8611 LearningRate 0.0010 Epoch: 4 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:51,139-Speed 6308.11 samples/sec Loss 8.8002 LearningRate 0.0010 Epoch: 4 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:54,391-Speed 6298.16 samples/sec Loss 8.8340 LearningRate 0.0010 Epoch: 4 Global Step: 95100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:22:57,638-Speed 6310.29 samples/sec Loss 8.7966 LearningRate 0.0010 Epoch: 4 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:00,880-Speed 6317.77 samples/sec Loss 8.7555 LearningRate 0.0010 Epoch: 4 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:04,125-Speed 6312.92 samples/sec Loss 8.8693 LearningRate 0.0010 Epoch: 4 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:07,375-Speed 6302.47 samples/sec Loss 8.8079 LearningRate 0.0010 Epoch: 4 Global Step: 95140 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:10,625-Speed 6303.04 samples/sec Loss 8.8552 LearningRate 0.0010 Epoch: 4 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:13,877-Speed 6299.14 samples/sec Loss 8.8431 LearningRate 0.0010 Epoch: 4 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:17,123-Speed 6312.21 samples/sec Loss 8.8211 LearningRate 0.0010 Epoch: 4 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:20,370-Speed 6308.98 samples/sec Loss 8.8495 LearningRate 0.0010 Epoch: 4 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:23,613-Speed 6315.56 samples/sec Loss 8.8114 LearningRate 0.0010 Epoch: 4 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:26,860-Speed 6309.33 samples/sec Loss 8.9512 LearningRate 0.0010 Epoch: 4 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:30,106-Speed 6310.13 samples/sec Loss 8.8291 LearningRate 0.0010 Epoch: 4 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:33,351-Speed 6313.03 samples/sec Loss 8.9370 LearningRate 0.0010 Epoch: 4 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:36,583-Speed 6338.78 samples/sec Loss 8.8095 LearningRate 0.0010 Epoch: 4 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:39,830-Speed 6308.53 samples/sec Loss 8.7450 LearningRate 0.0010 Epoch: 4 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:43,075-Speed 6313.12 samples/sec Loss 8.7613 LearningRate 0.0010 Epoch: 4 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:46,321-Speed 6309.53 samples/sec Loss 8.8028 LearningRate 0.0010 Epoch: 4 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:23:49,569-Speed 6307.05 samples/sec Loss 8.7989 LearningRate 0.0010 Epoch: 4 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:52,814-Speed 6312.71 samples/sec Loss 8.7924 LearningRate 0.0010 Epoch: 4 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:56,061-Speed 6309.65 samples/sec Loss 8.8076 LearningRate 0.0010 Epoch: 4 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:23:59,309-Speed 6305.57 samples/sec Loss 8.8127 LearningRate 0.0010 Epoch: 4 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:02,556-Speed 6310.39 samples/sec Loss 8.7761 LearningRate 0.0010 Epoch: 4 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:05,802-Speed 6309.64 samples/sec Loss 8.8473 LearningRate 0.0010 Epoch: 4 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:09,036-Speed 6334.14 samples/sec Loss 8.8441 LearningRate 0.0010 Epoch: 4 Global Step: 95330 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:12,283-Speed 6309.05 samples/sec Loss 8.7593 LearningRate 0.0010 Epoch: 4 Global Step: 95340 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:15,528-Speed 6312.09 samples/sec Loss 8.8544 LearningRate 0.0010 Epoch: 4 Global Step: 95350 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:18,775-Speed 6310.05 samples/sec Loss 8.8044 LearningRate 0.0010 Epoch: 4 Global Step: 95360 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:22,019-Speed 6314.35 samples/sec Loss 8.8395 LearningRate 0.0010 Epoch: 4 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:25,264-Speed 6311.24 samples/sec Loss 8.7218 LearningRate 0.0010 Epoch: 4 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:28,513-Speed 6306.81 samples/sec Loss 
8.7903 LearningRate 0.0010 Epoch: 4 Global Step: 95390 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:31,758-Speed 6313.55 samples/sec Loss 8.7463 LearningRate 0.0010 Epoch: 4 Global Step: 95400 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:35,000-Speed 6317.92 samples/sec Loss 8.8230 LearningRate 0.0010 Epoch: 4 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:38,245-Speed 6312.17 samples/sec Loss 8.8166 LearningRate 0.0010 Epoch: 4 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:24:41,491-Speed 6310.96 samples/sec Loss 8.7415 LearningRate 0.0010 Epoch: 4 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:44,749-Speed 6288.37 samples/sec Loss 8.8222 LearningRate 0.0010 Epoch: 4 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:47,993-Speed 6314.09 samples/sec Loss 8.7652 LearningRate 0.0010 Epoch: 4 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:51,239-Speed 6311.37 samples/sec Loss 8.7932 LearningRate 0.0010 Epoch: 4 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:54,485-Speed 6309.80 samples/sec Loss 8.8201 LearningRate 0.0010 Epoch: 4 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:24:57,732-Speed 6309.10 samples/sec Loss 8.8042 LearningRate 0.0010 Epoch: 4 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:00,980-Speed 6305.60 samples/sec Loss 8.7153 LearningRate 0.0010 Epoch: 4 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:04,227-Speed 6310.70 samples/sec Loss 8.8254 LearningRate 0.0010 Epoch: 4 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:07,475-Speed 6306.06 samples/sec Loss 8.8692 LearningRate 0.0010 Epoch: 4 Global Step: 95510 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:10,722-Speed 6309.15 samples/sec Loss 8.8068 LearningRate 0.0010 Epoch: 4 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:13,955-Speed 6334.46 samples/sec Loss 8.8322 LearningRate 0.0010 Epoch: 4 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:17,204-Speed 6305.91 samples/sec Loss 8.8738 LearningRate 0.0010 Epoch: 4 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:20,455-Speed 6300.40 samples/sec Loss 8.8887 LearningRate 0.0010 Epoch: 4 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:23,705-Speed 6303.91 samples/sec Loss 8.7756 LearningRate 0.0010 Epoch: 4 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:26,952-Speed 6308.60 samples/sec Loss 8.8759 LearningRate 0.0010 Epoch: 4 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:30,200-Speed 6306.01 samples/sec Loss 8.7908 LearningRate 0.0010 Epoch: 4 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:33,446-Speed 6310.94 samples/sec Loss 8.8490 LearningRate 0.0010 Epoch: 4 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:36,692-Speed 6311.81 samples/sec Loss 8.8294 LearningRate 0.0010 Epoch: 4 Global Step: 95600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:25:39,920-Speed 6344.90 samples/sec Loss 8.8694 LearningRate 0.0010 Epoch: 4 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:25:43,165-Speed 6313.92 samples/sec Loss 8.7951 LearningRate 0.0010 Epoch: 4 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:25:46,411-Speed 6309.63 samples/sec Loss 8.8350 LearningRate 0.0010 Epoch: 4 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 67 hours Training: 
2022-04-01 00:25:49,674-Speed 6279.40 samples/sec Loss 8.8738 LearningRate 0.0010 Epoch: 4 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:25:52,916-Speed 6318.23 samples/sec Loss 8.7180 LearningRate 0.0010 Epoch: 4 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:25:56,172-Speed 6290.83 samples/sec Loss 8.8264 LearningRate 0.0010 Epoch: 4 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:25:59,424-Speed 6298.84 samples/sec Loss 8.7946 LearningRate 0.0010 Epoch: 4 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:26:02,669-Speed 6311.65 samples/sec Loss 8.8017 LearningRate 0.0010 Epoch: 4 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:26:05,916-Speed 6310.68 samples/sec Loss 8.7998 LearningRate 0.0010 Epoch: 4 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:26:09,160-Speed 6313.70 samples/sec Loss 8.8085 LearningRate 0.0010 Epoch: 4 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:26:12,407-Speed 6308.04 samples/sec Loss 8.8483 LearningRate 0.0010 Epoch: 4 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:15,660-Speed 6298.28 samples/sec Loss 8.8121 LearningRate 0.0010 Epoch: 4 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:18,906-Speed 6309.82 samples/sec Loss 8.8256 LearningRate 0.0010 Epoch: 4 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:22,163-Speed 6289.25 samples/sec Loss 8.8031 LearningRate 0.0010 Epoch: 4 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:25,411-Speed 6306.12 samples/sec Loss 8.8113 LearningRate 0.0010 Epoch: 4 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:28,663-Speed 6300.83 samples/sec Loss 
8.8445 LearningRate 0.0010 Epoch: 4 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:31,905-Speed 6316.81 samples/sec Loss 8.8197 LearningRate 0.0010 Epoch: 4 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:35,153-Speed 6307.61 samples/sec Loss 8.8174 LearningRate 0.0010 Epoch: 4 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:38,398-Speed 6311.84 samples/sec Loss 8.7753 LearningRate 0.0010 Epoch: 4 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:41,641-Speed 6317.14 samples/sec Loss 8.8032 LearningRate 0.0010 Epoch: 4 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:44,870-Speed 6343.65 samples/sec Loss 8.8332 LearningRate 0.0010 Epoch: 4 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:48,114-Speed 6315.14 samples/sec Loss 8.8400 LearningRate 0.0010 Epoch: 4 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:51,367-Speed 6298.91 samples/sec Loss 8.8300 LearningRate 0.0010 Epoch: 4 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:54,622-Speed 6291.64 samples/sec Loss 8.8068 LearningRate 0.0010 Epoch: 4 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:26:57,867-Speed 6313.10 samples/sec Loss 8.8022 LearningRate 0.0010 Epoch: 4 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:01,112-Speed 6312.48 samples/sec Loss 8.8450 LearningRate 0.0010 Epoch: 4 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:04,358-Speed 6312.08 samples/sec Loss 8.8041 LearningRate 0.0010 Epoch: 4 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:07,605-Speed 6308.19 samples/sec Loss 8.6948 LearningRate 0.0010 Epoch: 4 Global Step: 95880 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:10,851-Speed 6310.12 samples/sec Loss 8.7533 LearningRate 0.0010 Epoch: 4 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:14,097-Speed 6310.79 samples/sec Loss 8.7627 LearningRate 0.0010 Epoch: 4 Global Step: 95900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:17,344-Speed 6309.99 samples/sec Loss 8.8666 LearningRate 0.0010 Epoch: 4 Global Step: 95910 Fp16 Grad Scale: 131072 Required: 67 hours Training: 2022-04-01 00:27:20,577-Speed 6334.90 samples/sec Loss 8.7959 LearningRate 0.0010 Epoch: 4 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:23,825-Speed 6307.21 samples/sec Loss 8.8136 LearningRate 0.0010 Epoch: 4 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:27,078-Speed 6296.18 samples/sec Loss 8.8300 LearningRate 0.0010 Epoch: 4 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:30,330-Speed 6298.73 samples/sec Loss 8.6769 LearningRate 0.0010 Epoch: 4 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:33,586-Speed 6293.03 samples/sec Loss 8.7798 LearningRate 0.0010 Epoch: 4 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:36,834-Speed 6306.29 samples/sec Loss 8.7705 LearningRate 0.0010 Epoch: 4 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:40,082-Speed 6307.01 samples/sec Loss 8.7442 LearningRate 0.0010 Epoch: 4 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:43,329-Speed 6308.98 samples/sec Loss 8.8038 LearningRate 0.0010 Epoch: 4 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:46,582-Speed 6296.95 samples/sec Loss 8.7729 LearningRate 0.0010 Epoch: 4 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:27:49,828-Speed 6310.21 samples/sec Loss 8.7892 LearningRate 0.0010 Epoch: 4 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:27:53,062-Speed 6334.72 samples/sec Loss 8.7480 LearningRate 0.0010 Epoch: 4 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:27:56,313-Speed 6300.62 samples/sec Loss 8.7890 LearningRate 0.0010 Epoch: 4 Global Step: 96030 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:27:59,585-Speed 6261.80 samples/sec Loss 8.8060 LearningRate 0.0010 Epoch: 4 Global Step: 96040 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:02,860-Speed 6253.27 samples/sec Loss 8.8410 LearningRate 0.0010 Epoch: 4 Global Step: 96050 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:06,105-Speed 6313.49 samples/sec Loss 8.7336 LearningRate 0.0010 Epoch: 4 Global Step: 96060 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:09,350-Speed 6313.60 samples/sec Loss 8.7813 LearningRate 0.0010 Epoch: 4 Global Step: 96070 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:12,599-Speed 6304.25 samples/sec Loss 8.7508 LearningRate 0.0010 Epoch: 4 Global Step: 96080 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:15,846-Speed 6307.86 samples/sec Loss 8.7703 LearningRate 0.0010 Epoch: 4 Global Step: 96090 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:19,094-Speed 6307.35 samples/sec Loss 8.8436 LearningRate 0.0010 Epoch: 4 Global Step: 96100 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:22,336-Speed 6318.19 samples/sec Loss 8.8861 LearningRate 0.0010 Epoch: 4 Global Step: 96110 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:28:25,584-Speed 6307.59 samples/sec Loss 8.8699 LearningRate 0.0010 Epoch: 4 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:28,833-Speed 6304.20 samples/sec Loss 
8.7828 LearningRate 0.0010 Epoch: 4 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:32,082-Speed 6303.99 samples/sec Loss 8.7068 LearningRate 0.0010 Epoch: 4 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:35,327-Speed 6314.28 samples/sec Loss 8.8008 LearningRate 0.0010 Epoch: 4 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:38,570-Speed 6315.92 samples/sec Loss 8.7361 LearningRate 0.0010 Epoch: 4 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:41,817-Speed 6309.22 samples/sec Loss 8.7065 LearningRate 0.0010 Epoch: 4 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:45,064-Speed 6307.61 samples/sec Loss 8.8067 LearningRate 0.0010 Epoch: 4 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:48,319-Speed 6293.70 samples/sec Loss 8.7019 LearningRate 0.0010 Epoch: 4 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:51,566-Speed 6308.92 samples/sec Loss 8.7953 LearningRate 0.0010 Epoch: 4 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:54,818-Speed 6299.31 samples/sec Loss 8.7896 LearningRate 0.0010 Epoch: 4 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:28:58,049-Speed 6339.45 samples/sec Loss 8.8199 LearningRate 0.0010 Epoch: 4 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:01,303-Speed 6295.37 samples/sec Loss 8.7227 LearningRate 0.0010 Epoch: 4 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:04,552-Speed 6304.74 samples/sec Loss 8.7867 LearningRate 0.0010 Epoch: 4 Global Step: 96240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:07,799-Speed 6309.26 samples/sec Loss 8.6746 LearningRate 0.0010 Epoch: 4 Global Step: 96250 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:11,045-Speed 6311.14 samples/sec Loss 8.6652 LearningRate 0.0010 Epoch: 4 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:14,293-Speed 6308.01 samples/sec Loss 8.7899 LearningRate 0.0010 Epoch: 4 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:17,543-Speed 6301.89 samples/sec Loss 8.7240 LearningRate 0.0010 Epoch: 4 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:20,791-Speed 6307.83 samples/sec Loss 8.7444 LearningRate 0.0010 Epoch: 4 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:24,032-Speed 6319.19 samples/sec Loss 8.8139 LearningRate 0.0010 Epoch: 4 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:27,281-Speed 6304.83 samples/sec Loss 8.7910 LearningRate 0.0010 Epoch: 4 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:30,513-Speed 6339.27 samples/sec Loss 8.7598 LearningRate 0.0010 Epoch: 4 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:33,762-Speed 6305.34 samples/sec Loss 8.8896 LearningRate 0.0010 Epoch: 4 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:37,007-Speed 6312.23 samples/sec Loss 8.7702 LearningRate 0.0010 Epoch: 4 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:40,252-Speed 6312.90 samples/sec Loss 8.7625 LearningRate 0.0010 Epoch: 4 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:43,502-Speed 6301.90 samples/sec Loss 8.7869 LearningRate 0.0010 Epoch: 4 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:46,753-Speed 6302.36 samples/sec Loss 8.8401 LearningRate 0.0010 Epoch: 4 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:29:50,000-Speed 6308.16 samples/sec Loss 8.8272 LearningRate 0.0010 Epoch: 4 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:53,254-Speed 6295.33 samples/sec Loss 8.8229 LearningRate 0.0010 Epoch: 4 Global Step: 96390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:56,503-Speed 6303.87 samples/sec Loss 8.7320 LearningRate 0.0010 Epoch: 4 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:29:59,749-Speed 6312.04 samples/sec Loss 8.7585 LearningRate 0.0010 Epoch: 4 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:02,990-Speed 6319.86 samples/sec Loss 8.7781 LearningRate 0.0010 Epoch: 4 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:06,234-Speed 6313.22 samples/sec Loss 8.7185 LearningRate 0.0010 Epoch: 4 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:09,497-Speed 6278.81 samples/sec Loss 8.8185 LearningRate 0.0010 Epoch: 4 Global Step: 96440 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:12,739-Speed 6319.32 samples/sec Loss 8.7693 LearningRate 0.0010 Epoch: 4 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:15,983-Speed 6315.15 samples/sec Loss 8.7865 LearningRate 0.0010 Epoch: 4 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:19,230-Speed 6308.86 samples/sec Loss 8.7611 LearningRate 0.0010 Epoch: 4 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:22,477-Speed 6307.35 samples/sec Loss 8.7599 LearningRate 0.0010 Epoch: 4 Global Step: 96480 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:25,731-Speed 6295.90 samples/sec Loss 8.7958 LearningRate 0.0010 Epoch: 4 Global Step: 96490 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:28,987-Speed 6291.27 samples/sec Loss 
8.8808 LearningRate 0.0010 Epoch: 4 Global Step: 96500 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:32,243-Speed 6291.96 samples/sec Loss 8.7291 LearningRate 0.0010 Epoch: 4 Global Step: 96510 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:35,484-Speed 6319.50 samples/sec Loss 8.8076 LearningRate 0.0010 Epoch: 4 Global Step: 96520 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:38,731-Speed 6310.09 samples/sec Loss 8.7504 LearningRate 0.0010 Epoch: 4 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:41,974-Speed 6316.54 samples/sec Loss 8.6659 LearningRate 0.0010 Epoch: 4 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:45,223-Speed 6303.28 samples/sec Loss 8.8308 LearningRate 0.0010 Epoch: 4 Global Step: 96550 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:48,467-Speed 6316.06 samples/sec Loss 8.7071 LearningRate 0.0010 Epoch: 4 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:51,713-Speed 6309.96 samples/sec Loss 8.7608 LearningRate 0.0010 Epoch: 4 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:30:54,958-Speed 6313.34 samples/sec Loss 8.7666 LearningRate 0.0010 Epoch: 4 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:30:58,203-Speed 6311.92 samples/sec Loss 8.8564 LearningRate 0.0010 Epoch: 4 Global Step: 96590 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:01,450-Speed 6309.35 samples/sec Loss 8.7347 LearningRate 0.0010 Epoch: 4 Global Step: 96600 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:04,742-Speed 6221.42 samples/sec Loss 8.7062 LearningRate 0.0010 Epoch: 4 Global Step: 96610 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:08,003-Speed 6281.58 samples/sec Loss 8.7367 LearningRate 0.0010 Epoch: 4 Global Step: 96620 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:11,250-Speed 6309.92 samples/sec Loss 8.7623 LearningRate 0.0010 Epoch: 4 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:14,495-Speed 6313.14 samples/sec Loss 8.6561 LearningRate 0.0010 Epoch: 4 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:17,744-Speed 6303.88 samples/sec Loss 8.7700 LearningRate 0.0010 Epoch: 4 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:20,988-Speed 6314.07 samples/sec Loss 8.7940 LearningRate 0.0010 Epoch: 4 Global Step: 96660 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:24,233-Speed 6313.22 samples/sec Loss 8.6784 LearningRate 0.0010 Epoch: 4 Global Step: 96670 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:27,481-Speed 6307.31 samples/sec Loss 8.7181 LearningRate 0.0010 Epoch: 4 Global Step: 96680 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:30,726-Speed 6311.87 samples/sec Loss 8.7295 LearningRate 0.0010 Epoch: 4 Global Step: 96690 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:33,972-Speed 6311.78 samples/sec Loss 8.7092 LearningRate 0.0010 Epoch: 4 Global Step: 96700 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:37,214-Speed 6319.15 samples/sec Loss 8.7635 LearningRate 0.0010 Epoch: 4 Global Step: 96710 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:31:40,459-Speed 6312.89 samples/sec Loss 8.7797 LearningRate 0.0010 Epoch: 4 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:43,701-Speed 6318.59 samples/sec Loss 8.7516 LearningRate 0.0010 Epoch: 4 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:46,946-Speed 6310.89 samples/sec Loss 8.7370 LearningRate 0.0010 Epoch: 4 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:31:50,195-Speed 6305.23 samples/sec Loss 8.8303 LearningRate 0.0010 Epoch: 4 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:53,443-Speed 6306.31 samples/sec Loss 8.7161 LearningRate 0.0010 Epoch: 4 Global Step: 96760 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:56,691-Speed 6308.43 samples/sec Loss 8.6332 LearningRate 0.0010 Epoch: 4 Global Step: 96770 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:31:59,937-Speed 6310.39 samples/sec Loss 8.6360 LearningRate 0.0010 Epoch: 4 Global Step: 96780 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:03,182-Speed 6311.14 samples/sec Loss 8.7356 LearningRate 0.0010 Epoch: 4 Global Step: 96790 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:06,428-Speed 6311.37 samples/sec Loss 8.8463 LearningRate 0.0010 Epoch: 4 Global Step: 96800 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:09,673-Speed 6313.76 samples/sec Loss 8.7232 LearningRate 0.0010 Epoch: 4 Global Step: 96810 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:12,910-Speed 6327.02 samples/sec Loss 8.6893 LearningRate 0.0010 Epoch: 4 Global Step: 96820 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:16,159-Speed 6306.30 samples/sec Loss 8.7465 LearningRate 0.0010 Epoch: 4 Global Step: 96830 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:19,403-Speed 6313.78 samples/sec Loss 8.7736 LearningRate 0.0010 Epoch: 4 Global Step: 96840 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:22,652-Speed 6305.02 samples/sec Loss 8.7448 LearningRate 0.0010 Epoch: 4 Global Step: 96850 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:25,899-Speed 6309.14 samples/sec Loss 8.7409 LearningRate 0.0010 Epoch: 4 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:29,144-Speed 6313.06 samples/sec Loss 
8.7193 LearningRate 0.0010 Epoch: 4 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:32,399-Speed 6293.09 samples/sec Loss 8.7958 LearningRate 0.0010 Epoch: 4 Global Step: 96880 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:35,655-Speed 6291.64 samples/sec Loss 8.7213 LearningRate 0.0010 Epoch: 4 Global Step: 96890 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:38,900-Speed 6312.65 samples/sec Loss 8.7524 LearningRate 0.0010 Epoch: 4 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:42,153-Speed 6298.04 samples/sec Loss 8.7653 LearningRate 0.0010 Epoch: 4 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:45,384-Speed 6339.67 samples/sec Loss 8.7843 LearningRate 0.0010 Epoch: 4 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:48,642-Speed 6286.24 samples/sec Loss 8.6920 LearningRate 0.0010 Epoch: 4 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:51,890-Speed 6308.33 samples/sec Loss 8.7143 LearningRate 0.0010 Epoch: 4 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:32:55,121-Speed 6339.47 samples/sec Loss 8.7148 LearningRate 0.0010 Epoch: 4 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:32:58,369-Speed 6306.60 samples/sec Loss 8.7484 LearningRate 0.0010 Epoch: 4 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:01,616-Speed 6309.08 samples/sec Loss 8.7607 LearningRate 0.0010 Epoch: 4 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:04,864-Speed 6306.14 samples/sec Loss 8.8262 LearningRate 0.0010 Epoch: 4 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:08,112-Speed 6307.55 samples/sec Loss 8.7901 LearningRate 0.0010 Epoch: 4 Global Step: 96990 
Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:11,354-Speed 6317.05 samples/sec Loss 8.6926 LearningRate 0.0010 Epoch: 4 Global Step: 97000 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:14,609-Speed 6293.58 samples/sec Loss 8.6614 LearningRate 0.0010 Epoch: 4 Global Step: 97010 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:17,859-Speed 6302.75 samples/sec Loss 8.8044 LearningRate 0.0010 Epoch: 4 Global Step: 97020 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:21,107-Speed 6307.87 samples/sec Loss 8.7327 LearningRate 0.0010 Epoch: 4 Global Step: 97030 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:24,352-Speed 6312.70 samples/sec Loss 8.7974 LearningRate 0.0010 Epoch: 4 Global Step: 97040 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:33:27,598-Speed 6311.05 samples/sec Loss 8.7516 LearningRate 0.0010 Epoch: 4 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:30,843-Speed 6312.11 samples/sec Loss 8.7271 LearningRate 0.0010 Epoch: 4 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:34,085-Speed 6318.70 samples/sec Loss 8.6688 LearningRate 0.0010 Epoch: 4 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:37,333-Speed 6305.95 samples/sec Loss 8.7486 LearningRate 0.0010 Epoch: 4 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:40,582-Speed 6305.64 samples/sec Loss 8.6922 LearningRate 0.0010 Epoch: 4 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:43,827-Speed 6312.15 samples/sec Loss 8.7091 LearningRate 0.0010 Epoch: 4 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:47,073-Speed 6312.58 samples/sec Loss 8.8356 LearningRate 0.0010 Epoch: 4 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 67 hours Training: 
2022-04-01 00:33:50,317-Speed 6313.64 samples/sec Loss 8.7097 LearningRate 0.0010 Epoch: 4 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:53,566-Speed 6305.39 samples/sec Loss 8.6884 LearningRate 0.0010 Epoch: 4 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:33:56,812-Speed 6310.92 samples/sec Loss 8.6962 LearningRate 0.0010 Epoch: 4 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:00,047-Speed 6332.21 samples/sec Loss 8.6763 LearningRate 0.0010 Epoch: 4 Global Step: 97150 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:03,299-Speed 6299.50 samples/sec Loss 8.7412 LearningRate 0.0010 Epoch: 4 Global Step: 97160 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:06,543-Speed 6313.86 samples/sec Loss 8.7813 LearningRate 0.0010 Epoch: 4 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:09,793-Speed 6303.03 samples/sec Loss 8.5554 LearningRate 0.0010 Epoch: 4 Global Step: 97180 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:13,034-Speed 6319.74 samples/sec Loss 8.6798 LearningRate 0.0010 Epoch: 4 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:16,283-Speed 6306.12 samples/sec Loss 8.7106 LearningRate 0.0010 Epoch: 4 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:19,532-Speed 6307.56 samples/sec Loss 8.7659 LearningRate 0.0010 Epoch: 4 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:22,781-Speed 6306.40 samples/sec Loss 8.6820 LearningRate 0.0010 Epoch: 4 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:26,028-Speed 6307.71 samples/sec Loss 8.7544 LearningRate 0.0010 Epoch: 4 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:29,269-Speed 6320.24 samples/sec Loss 
8.6072 LearningRate 0.0010 Epoch: 4 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:34:32,502-Speed 6336.70 samples/sec Loss 8.6754 LearningRate 0.0010 Epoch: 4 Global Step: 97250 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:35,760-Speed 6286.34 samples/sec Loss 8.6708 LearningRate 0.0010 Epoch: 4 Global Step: 97260 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:39,008-Speed 6308.49 samples/sec Loss 8.7330 LearningRate 0.0010 Epoch: 4 Global Step: 97270 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:42,254-Speed 6309.40 samples/sec Loss 8.6726 LearningRate 0.0010 Epoch: 4 Global Step: 97280 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:45,499-Speed 6313.87 samples/sec Loss 8.6685 LearningRate 0.0010 Epoch: 4 Global Step: 97290 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:48,745-Speed 6310.29 samples/sec Loss 8.8005 LearningRate 0.0010 Epoch: 4 Global Step: 97300 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:51,994-Speed 6303.76 samples/sec Loss 8.6803 LearningRate 0.0010 Epoch: 4 Global Step: 97310 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:55,241-Speed 6309.04 samples/sec Loss 8.6761 LearningRate 0.0010 Epoch: 4 Global Step: 97320 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:34:58,489-Speed 6308.77 samples/sec Loss 8.7524 LearningRate 0.0010 Epoch: 4 Global Step: 97330 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:35:01,733-Speed 6315.31 samples/sec Loss 8.6803 LearningRate 0.0010 Epoch: 4 Global Step: 97340 Fp16 Grad Scale: 32768 Required: 67 hours Training: 2022-04-01 00:35:04,982-Speed 6303.85 samples/sec Loss 8.6942 LearningRate 0.0010 Epoch: 4 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:08,234-Speed 6298.75 samples/sec Loss 8.7270 LearningRate 0.0010 Epoch: 4 Global Step: 97360 
Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:11,482-Speed 6307.32 samples/sec Loss 8.7534 LearningRate 0.0010 Epoch: 4 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:14,725-Speed 6316.59 samples/sec Loss 8.6792 LearningRate 0.0010 Epoch: 4 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:17,973-Speed 6306.28 samples/sec Loss 8.6798 LearningRate 0.0010 Epoch: 4 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:21,221-Speed 6308.24 samples/sec Loss 8.6170 LearningRate 0.0010 Epoch: 4 Global Step: 97400 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:24,467-Speed 6311.20 samples/sec Loss 8.6898 LearningRate 0.0010 Epoch: 4 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 67 hours Training: 2022-04-01 00:35:27,713-Speed 6310.13 samples/sec Loss 8.6856 LearningRate 0.0010 Epoch: 4 Global Step: 97420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:30,957-Speed 6313.27 samples/sec Loss 8.7870 LearningRate 0.0010 Epoch: 4 Global Step: 97430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:34,200-Speed 6317.04 samples/sec Loss 8.7321 LearningRate 0.0010 Epoch: 4 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:37,431-Speed 6340.80 samples/sec Loss 8.6482 LearningRate 0.0010 Epoch: 4 Global Step: 97450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:40,680-Speed 6304.34 samples/sec Loss 8.7974 LearningRate 0.0010 Epoch: 4 Global Step: 97460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:43,934-Speed 6295.60 samples/sec Loss 8.6568 LearningRate 0.0010 Epoch: 4 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:47,182-Speed 6305.20 samples/sec Loss 8.7645 LearningRate 0.0010 Epoch: 4 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 
2022-04-01 00:35:50,428-Speed 6311.54 samples/sec Loss 8.6688 LearningRate 0.0010 Epoch: 4 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:53,674-Speed 6310.07 samples/sec Loss 8.7472 LearningRate 0.0010 Epoch: 4 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:35:56,921-Speed 6309.87 samples/sec Loss 8.7086 LearningRate 0.0010 Epoch: 4 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:00,166-Speed 6311.60 samples/sec Loss 8.6749 LearningRate 0.0010 Epoch: 4 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:03,413-Speed 6310.09 samples/sec Loss 8.6398 LearningRate 0.0010 Epoch: 4 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:06,658-Speed 6311.66 samples/sec Loss 8.6761 LearningRate 0.0010 Epoch: 4 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:09,905-Speed 6309.55 samples/sec Loss 8.5732 LearningRate 0.0010 Epoch: 4 Global Step: 97550 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 00:36:13,135-Speed 6341.80 samples/sec Loss 8.7707 LearningRate 0.0010 Epoch: 4 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:16,381-Speed 6311.86 samples/sec Loss 8.6978 LearningRate 0.0010 Epoch: 4 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:19,629-Speed 6306.70 samples/sec Loss 8.6275 LearningRate 0.0010 Epoch: 4 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:22,876-Speed 6308.69 samples/sec Loss 8.5576 LearningRate 0.0010 Epoch: 4 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:26,125-Speed 6304.32 samples/sec Loss 8.6715 LearningRate 0.0010 Epoch: 4 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:29,372-Speed 6309.33 samples/sec Loss 
8.6516 LearningRate 0.0010 Epoch: 4 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:32,617-Speed 6312.79 samples/sec Loss 8.7977 LearningRate 0.0010 Epoch: 4 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:35,867-Speed 6301.44 samples/sec Loss 8.7639 LearningRate 0.0010 Epoch: 4 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:39,116-Speed 6305.21 samples/sec Loss 8.7561 LearningRate 0.0010 Epoch: 4 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:42,366-Speed 6304.13 samples/sec Loss 8.6531 LearningRate 0.0010 Epoch: 4 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:45,601-Speed 6331.14 samples/sec Loss 8.6870 LearningRate 0.0010 Epoch: 4 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:48,848-Speed 6309.69 samples/sec Loss 8.6700 LearningRate 0.0010 Epoch: 4 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:52,097-Speed 6304.41 samples/sec Loss 8.7718 LearningRate 0.0010 Epoch: 4 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:55,341-Speed 6314.22 samples/sec Loss 8.6764 LearningRate 0.0010 Epoch: 4 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:36:58,586-Speed 6313.05 samples/sec Loss 8.7599 LearningRate 0.0010 Epoch: 4 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:01,831-Speed 6312.51 samples/sec Loss 8.7499 LearningRate 0.0010 Epoch: 4 Global Step: 97710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:05,081-Speed 6303.24 samples/sec Loss 8.6389 LearningRate 0.0010 Epoch: 4 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:08,327-Speed 6310.08 samples/sec Loss 8.7078 LearningRate 0.0010 Epoch: 4 Global Step: 97730 
Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:11,574-Speed 6308.53 samples/sec Loss 8.6248 LearningRate 0.0010 Epoch: 4 Global Step: 97740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:14,821-Speed 6309.84 samples/sec Loss 8.6567 LearningRate 0.0010 Epoch: 4 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:18,056-Speed 6333.47 samples/sec Loss 8.6818 LearningRate 0.0010 Epoch: 4 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:21,299-Speed 6314.92 samples/sec Loss 8.6879 LearningRate 0.0010 Epoch: 4 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:24,548-Speed 6306.28 samples/sec Loss 8.6993 LearningRate 0.0010 Epoch: 4 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:37:27,778-Speed 6341.82 samples/sec Loss 8.7346 LearningRate 0.0010 Epoch: 4 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:31,026-Speed 6306.59 samples/sec Loss 8.5937 LearningRate 0.0010 Epoch: 4 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:34,282-Speed 6290.32 samples/sec Loss 8.7564 LearningRate 0.0010 Epoch: 4 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:37,527-Speed 6313.49 samples/sec Loss 8.6829 LearningRate 0.0010 Epoch: 4 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:40,770-Speed 6316.60 samples/sec Loss 8.7722 LearningRate 0.0010 Epoch: 4 Global Step: 97830 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:44,014-Speed 6314.50 samples/sec Loss 8.8043 LearningRate 0.0010 Epoch: 4 Global Step: 97840 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:47,262-Speed 6306.59 samples/sec Loss 8.7007 LearningRate 0.0010 Epoch: 4 Global Step: 97850 Fp16 Grad Scale: 32768 Required: 66 hours Training: 
2022-04-01 00:37:50,506-Speed 6315.08 samples/sec Loss 8.7907 LearningRate 0.0010 Epoch: 4 Global Step: 97860 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:53,751-Speed 6311.63 samples/sec Loss 8.7139 LearningRate 0.0010 Epoch: 4 Global Step: 97870 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:37:56,998-Speed 6310.30 samples/sec Loss 8.6585 LearningRate 0.0010 Epoch: 4 Global Step: 97880 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:00,252-Speed 6294.04 samples/sec Loss 8.7333 LearningRate 0.0010 Epoch: 4 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:03,498-Speed 6310.11 samples/sec Loss 8.6399 LearningRate 0.0010 Epoch: 4 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:06,743-Speed 6313.50 samples/sec Loss 8.6661 LearningRate 0.0010 Epoch: 4 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:09,987-Speed 6314.01 samples/sec Loss 8.6601 LearningRate 0.0010 Epoch: 4 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:13,234-Speed 6309.52 samples/sec Loss 8.7639 LearningRate 0.0010 Epoch: 4 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:16,463-Speed 6343.54 samples/sec Loss 8.7939 LearningRate 0.0010 Epoch: 4 Global Step: 97940 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:19,709-Speed 6311.42 samples/sec Loss 8.6101 LearningRate 0.0010 Epoch: 4 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:22,953-Speed 6313.84 samples/sec Loss 8.6539 LearningRate 0.0010 Epoch: 4 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:26,196-Speed 6317.89 samples/sec Loss 8.6703 LearningRate 0.0010 Epoch: 4 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:29,438-Speed 6318.01 samples/sec Loss 
8.7300 LearningRate 0.0010 Epoch: 4 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:32,683-Speed 6312.97 samples/sec Loss 8.6900 LearningRate 0.0010 Epoch: 4 Global Step: 97990 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:35,932-Speed 6304.77 samples/sec Loss 8.6873 LearningRate 0.0010 Epoch: 4 Global Step: 98000 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:39,179-Speed 6309.61 samples/sec Loss 8.7238 LearningRate 0.0010 Epoch: 4 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:42,424-Speed 6312.13 samples/sec Loss 8.7676 LearningRate 0.0010 Epoch: 4 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:45,670-Speed 6311.69 samples/sec Loss 8.7911 LearningRate 0.0010 Epoch: 4 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:38:48,914-Speed 6314.37 samples/sec Loss 8.7028 LearningRate 0.0010 Epoch: 4 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:52,159-Speed 6312.31 samples/sec Loss 8.6040 LearningRate 0.0010 Epoch: 4 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:55,409-Speed 6301.90 samples/sec Loss 8.7136 LearningRate 0.0010 Epoch: 4 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:38:58,657-Speed 6308.17 samples/sec Loss 8.7269 LearningRate 0.0010 Epoch: 4 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:01,900-Speed 6316.50 samples/sec Loss 8.6840 LearningRate 0.0010 Epoch: 4 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:05,141-Speed 6319.36 samples/sec Loss 8.5451 LearningRate 0.0010 Epoch: 4 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:08,389-Speed 6307.73 samples/sec Loss 8.6196 LearningRate 0.0010 Epoch: 4 Global Step: 98100 
Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:11,635-Speed 6311.00 samples/sec Loss 8.6894 LearningRate 0.0010 Epoch: 4 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:14,881-Speed 6310.88 samples/sec Loss 8.7297 LearningRate 0.0010 Epoch: 4 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:18,138-Speed 6287.97 samples/sec Loss 8.6122 LearningRate 0.0010 Epoch: 4 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:21,371-Speed 6336.12 samples/sec Loss 8.6188 LearningRate 0.0010 Epoch: 4 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:24,620-Speed 6304.47 samples/sec Loss 8.7847 LearningRate 0.0010 Epoch: 4 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:27,867-Speed 6309.67 samples/sec Loss 8.6669 LearningRate 0.0010 Epoch: 4 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:31,115-Speed 6306.02 samples/sec Loss 8.5945 LearningRate 0.0010 Epoch: 4 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:34,360-Speed 6313.93 samples/sec Loss 8.6332 LearningRate 0.0010 Epoch: 4 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:37,609-Speed 6304.03 samples/sec Loss 8.6755 LearningRate 0.0010 Epoch: 4 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:40,862-Speed 6296.43 samples/sec Loss 8.6587 LearningRate 0.0010 Epoch: 4 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:44,109-Speed 6310.81 samples/sec Loss 8.6827 LearningRate 0.0010 Epoch: 4 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:47,354-Speed 6311.74 samples/sec Loss 8.6318 LearningRate 0.0010 Epoch: 4 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 
2022-04-01 00:39:50,605-Speed 6302.21 samples/sec Loss 8.6714 LearningRate 0.0010 Epoch: 4 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:53,843-Speed 6325.00 samples/sec Loss 8.6670 LearningRate 0.0010 Epoch: 4 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:39:57,090-Speed 6308.83 samples/sec Loss 8.6747 LearningRate 0.0010 Epoch: 4 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:00,340-Speed 6303.79 samples/sec Loss 8.6339 LearningRate 0.0010 Epoch: 4 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:03,586-Speed 6311.33 samples/sec Loss 8.6042 LearningRate 0.0010 Epoch: 4 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:06,834-Speed 6305.90 samples/sec Loss 8.6896 LearningRate 0.0010 Epoch: 4 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:10,080-Speed 6310.44 samples/sec Loss 8.6238 LearningRate 0.0010 Epoch: 4 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:13,323-Speed 6315.92 samples/sec Loss 8.6477 LearningRate 0.0010 Epoch: 4 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:16,572-Speed 6305.34 samples/sec Loss 8.6684 LearningRate 0.0010 Epoch: 4 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:19,819-Speed 6309.04 samples/sec Loss 8.6248 LearningRate 0.0010 Epoch: 4 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:23,069-Speed 6304.02 samples/sec Loss 8.6917 LearningRate 0.0010 Epoch: 4 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:26,301-Speed 6337.28 samples/sec Loss 8.7579 LearningRate 0.0010 Epoch: 4 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:29,549-Speed 6306.21 samples/sec Loss 
8.6840 LearningRate 0.0010 Epoch: 4 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:32,797-Speed 6306.35 samples/sec Loss 8.7319 LearningRate 0.0010 Epoch: 4 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:36,043-Speed 6312.24 samples/sec Loss 8.7261 LearningRate 0.0010 Epoch: 4 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:39,288-Speed 6311.61 samples/sec Loss 8.6534 LearningRate 0.0010 Epoch: 4 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:42,536-Speed 6306.96 samples/sec Loss 8.6625 LearningRate 0.0010 Epoch: 4 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:45,787-Speed 6301.13 samples/sec Loss 8.6266 LearningRate 0.0010 Epoch: 4 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:49,031-Speed 6313.93 samples/sec Loss 8.6628 LearningRate 0.0010 Epoch: 4 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:52,280-Speed 6305.52 samples/sec Loss 8.5965 LearningRate 0.0010 Epoch: 4 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:55,529-Speed 6305.03 samples/sec Loss 8.5420 LearningRate 0.0010 Epoch: 4 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:40:58,757-Speed 6347.21 samples/sec Loss 8.6971 LearningRate 0.0010 Epoch: 4 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:02,005-Speed 6307.50 samples/sec Loss 8.6965 LearningRate 0.0010 Epoch: 4 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:05,250-Speed 6313.06 samples/sec Loss 8.6439 LearningRate 0.0010 Epoch: 4 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:08,478-Speed 6344.84 samples/sec Loss 8.6927 LearningRate 0.0010 Epoch: 4 Global Step: 98470 
Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:11,724-Speed 6311.48 samples/sec Loss 8.6588 LearningRate 0.0010 Epoch: 4 Global Step: 98480 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:14,966-Speed 6317.15 samples/sec Loss 8.7259 LearningRate 0.0010 Epoch: 4 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:18,211-Speed 6313.76 samples/sec Loss 8.5864 LearningRate 0.0010 Epoch: 4 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:21,454-Speed 6315.99 samples/sec Loss 8.7499 LearningRate 0.0010 Epoch: 4 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:24,699-Speed 6311.74 samples/sec Loss 8.6955 LearningRate 0.0010 Epoch: 4 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:27,944-Speed 6312.12 samples/sec Loss 8.6761 LearningRate 0.0010 Epoch: 4 Global Step: 98530 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:31,189-Speed 6314.60 samples/sec Loss 8.6349 LearningRate 0.0010 Epoch: 4 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:34,433-Speed 6312.85 samples/sec Loss 8.7434 LearningRate 0.0010 Epoch: 4 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:37,677-Speed 6315.40 samples/sec Loss 8.7456 LearningRate 0.0010 Epoch: 4 Global Step: 98560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:41:40,923-Speed 6310.50 samples/sec Loss 8.5827 LearningRate 0.0010 Epoch: 4 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:44,164-Speed 6321.32 samples/sec Loss 8.6374 LearningRate 0.0010 Epoch: 4 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:47,415-Speed 6299.22 samples/sec Loss 8.7209 LearningRate 0.0010 Epoch: 4 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 
2022-04-01 00:41:50,659-Speed 6316.35 samples/sec Loss 8.5824 LearningRate 0.0010 Epoch: 4 Global Step: 98600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:53,908-Speed 6303.52 samples/sec Loss 8.6318 LearningRate 0.0010 Epoch: 4 Global Step: 98610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:41:57,158-Speed 6302.72 samples/sec Loss 8.6118 LearningRate 0.0010 Epoch: 4 Global Step: 98620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:00,410-Speed 6300.78 samples/sec Loss 8.7276 LearningRate 0.0010 Epoch: 4 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:03,654-Speed 6313.89 samples/sec Loss 8.6766 LearningRate 0.0010 Epoch: 4 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:06,903-Speed 6305.80 samples/sec Loss 8.6514 LearningRate 0.0010 Epoch: 4 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:10,150-Speed 6309.23 samples/sec Loss 8.6317 LearningRate 0.0010 Epoch: 4 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:13,381-Speed 6338.29 samples/sec Loss 8.7135 LearningRate 0.0010 Epoch: 4 Global Step: 98670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:16,630-Speed 6306.45 samples/sec Loss 8.6656 LearningRate 0.0010 Epoch: 4 Global Step: 98680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:19,878-Speed 6306.70 samples/sec Loss 8.5648 LearningRate 0.0010 Epoch: 4 Global Step: 98690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:23,125-Speed 6309.11 samples/sec Loss 8.6665 LearningRate 0.0010 Epoch: 4 Global Step: 98700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:26,372-Speed 6308.82 samples/sec Loss 8.5995 LearningRate 0.0010 Epoch: 4 Global Step: 98710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:29,618-Speed 6309.50 samples/sec Loss 
8.6594 LearningRate 0.0010 Epoch: 4 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:32,862-Speed 6314.78 samples/sec Loss 8.5622 LearningRate 0.0010 Epoch: 4 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:36,113-Speed 6301.48 samples/sec Loss 8.6229 LearningRate 0.0010 Epoch: 4 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:39,358-Speed 6313.15 samples/sec Loss 8.6159 LearningRate 0.0010 Epoch: 4 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:42,605-Speed 6308.56 samples/sec Loss 8.5928 LearningRate 0.0010 Epoch: 4 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:45,838-Speed 6335.25 samples/sec Loss 8.6554 LearningRate 0.0010 Epoch: 4 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:42:49,078-Speed 6322.65 samples/sec Loss 8.7051 LearningRate 0.0010 Epoch: 4 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:42:52,325-Speed 6308.38 samples/sec Loss 8.6656 LearningRate 0.0010 Epoch: 4 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:42:55,573-Speed 6306.01 samples/sec Loss 8.6669 LearningRate 0.0010 Epoch: 4 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:42:58,837-Speed 6277.13 samples/sec Loss 8.6052 LearningRate 0.0010 Epoch: 4 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:02,088-Speed 6301.04 samples/sec Loss 8.5951 LearningRate 0.0010 Epoch: 4 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:05,337-Speed 6304.09 samples/sec Loss 8.6120 LearningRate 0.0010 Epoch: 4 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:08,585-Speed 6308.08 samples/sec Loss 8.5986 LearningRate 0.0010 Epoch: 4 Global Step: 98840 
Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:11,836-Speed 6302.11 samples/sec Loss 8.6727 LearningRate 0.0010 Epoch: 4 Global Step: 98850 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:15,082-Speed 6310.14 samples/sec Loss 8.5855 LearningRate 0.0010 Epoch: 4 Global Step: 98860 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:18,327-Speed 6312.74 samples/sec Loss 8.5884 LearningRate 0.0010 Epoch: 4 Global Step: 98870 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:21,575-Speed 6307.39 samples/sec Loss 8.7198 LearningRate 0.0010 Epoch: 4 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:24,823-Speed 6305.92 samples/sec Loss 8.7057 LearningRate 0.0010 Epoch: 4 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:28,069-Speed 6310.32 samples/sec Loss 8.6679 LearningRate 0.0010 Epoch: 4 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:31,317-Speed 6308.17 samples/sec Loss 8.6113 LearningRate 0.0010 Epoch: 4 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:34,563-Speed 6309.54 samples/sec Loss 8.6494 LearningRate 0.0010 Epoch: 4 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:37,814-Speed 6302.67 samples/sec Loss 8.6927 LearningRate 0.0010 Epoch: 4 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:41,062-Speed 6306.48 samples/sec Loss 8.7464 LearningRate 0.0010 Epoch: 4 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:44,309-Speed 6308.05 samples/sec Loss 8.6927 LearningRate 0.0010 Epoch: 4 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:43:47,553-Speed 6313.82 samples/sec Loss 8.7045 LearningRate 0.0010 Epoch: 4 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 
2022-04-01 00:43:50,786-Speed 6337.00 samples/sec Loss 8.6886 LearningRate 0.0010 Epoch: 4 Global Step: 98970 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:54,033-Speed 6309.23 samples/sec Loss 8.5307 LearningRate 0.0010 Epoch: 4 Global Step: 98980 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:43:57,278-Speed 6312.64 samples/sec Loss 8.5978 LearningRate 0.0010 Epoch: 4 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:00,534-Speed 6290.03 samples/sec Loss 8.6647 LearningRate 0.0010 Epoch: 4 Global Step: 99000 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:03,782-Speed 6306.82 samples/sec Loss 8.5274 LearningRate 0.0010 Epoch: 4 Global Step: 99010 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:07,026-Speed 6315.96 samples/sec Loss 8.6594 LearningRate 0.0010 Epoch: 4 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:10,273-Speed 6308.18 samples/sec Loss 8.6791 LearningRate 0.0010 Epoch: 4 Global Step: 99030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:13,520-Speed 6308.90 samples/sec Loss 8.6833 LearningRate 0.0010 Epoch: 4 Global Step: 99040 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:16,766-Speed 6310.14 samples/sec Loss 8.7709 LearningRate 0.0010 Epoch: 4 Global Step: 99050 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:20,010-Speed 6316.16 samples/sec Loss 8.6224 LearningRate 0.0010 Epoch: 4 Global Step: 99060 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:23,257-Speed 6308.33 samples/sec Loss 8.6769 LearningRate 0.0010 Epoch: 4 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:26,500-Speed 6317.17 samples/sec Loss 8.6467 LearningRate 0.0010 Epoch: 4 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:29,751-Speed 6299.94 samples/sec Loss 
8.5605 LearningRate 0.0010 Epoch: 4 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:32,999-Speed 6307.81 samples/sec Loss 8.6648 LearningRate 0.0010 Epoch: 4 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:36,248-Speed 6303.83 samples/sec Loss 8.6761 LearningRate 0.0010 Epoch: 4 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:39,494-Speed 6310.94 samples/sec Loss 8.6156 LearningRate 0.0010 Epoch: 4 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:42,742-Speed 6307.54 samples/sec Loss 8.5691 LearningRate 0.0010 Epoch: 4 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:45,988-Speed 6309.69 samples/sec Loss 8.6325 LearningRate 0.0010 Epoch: 4 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:49,238-Speed 6303.23 samples/sec Loss 8.6394 LearningRate 0.0010 Epoch: 4 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:52,484-Speed 6311.66 samples/sec Loss 8.7011 LearningRate 0.0010 Epoch: 4 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:44:55,719-Speed 6330.93 samples/sec Loss 8.5884 LearningRate 0.0010 Epoch: 4 Global Step: 99170 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:44:58,967-Speed 6306.89 samples/sec Loss 8.5622 LearningRate 0.0010 Epoch: 4 Global Step: 99180 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:02,211-Speed 6315.53 samples/sec Loss 8.6529 LearningRate 0.0010 Epoch: 4 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:05,464-Speed 6301.27 samples/sec Loss 8.5972 LearningRate 0.0010 Epoch: 4 Global Step: 99200 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:08,709-Speed 6312.98 samples/sec Loss 8.6786 LearningRate 0.0010 Epoch: 4 Global Step: 99210 
Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:11,953-Speed 6312.50 samples/sec Loss 8.6184 LearningRate 0.0010 Epoch: 4 Global Step: 99220 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:15,203-Speed 6304.48 samples/sec Loss 8.5989 LearningRate 0.0010 Epoch: 4 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:18,452-Speed 6305.23 samples/sec Loss 8.6443 LearningRate 0.0010 Epoch: 4 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:21,697-Speed 6311.51 samples/sec Loss 8.5511 LearningRate 0.0010 Epoch: 4 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:24,942-Speed 6313.09 samples/sec Loss 8.6975 LearningRate 0.0010 Epoch: 4 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:45:28,192-Speed 6303.75 samples/sec Loss 8.6246 LearningRate 0.0010 Epoch: 4 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:31,437-Speed 6313.23 samples/sec Loss 8.6853 LearningRate 0.0010 Epoch: 4 Global Step: 99280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:34,685-Speed 6306.12 samples/sec Loss 8.6314 LearningRate 0.0010 Epoch: 4 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:37,933-Speed 6307.89 samples/sec Loss 8.6688 LearningRate 0.0010 Epoch: 4 Global Step: 99300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:41,177-Speed 6314.33 samples/sec Loss 8.6641 LearningRate 0.0010 Epoch: 4 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:44,420-Speed 6315.23 samples/sec Loss 8.5989 LearningRate 0.0010 Epoch: 4 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:47,665-Speed 6313.81 samples/sec Loss 8.5727 LearningRate 0.0010 Epoch: 4 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 
2022-04-01 00:45:50,935-Speed 6263.69 samples/sec Loss 8.6036 LearningRate 0.0010 Epoch: 4 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:54,182-Speed 6308.46 samples/sec Loss 8.6408 LearningRate 0.0010 Epoch: 4 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:45:57,425-Speed 6316.40 samples/sec Loss 8.7043 LearningRate 0.0010 Epoch: 4 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:00,656-Speed 6340.02 samples/sec Loss 8.5456 LearningRate 0.0010 Epoch: 4 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:03,903-Speed 6308.94 samples/sec Loss 8.6251 LearningRate 0.0010 Epoch: 4 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:07,146-Speed 6316.33 samples/sec Loss 8.6997 LearningRate 0.0010 Epoch: 4 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:10,391-Speed 6313.52 samples/sec Loss 8.6167 LearningRate 0.0010 Epoch: 4 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:13,652-Speed 6280.50 samples/sec Loss 8.6359 LearningRate 0.0010 Epoch: 4 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:16,902-Speed 6304.29 samples/sec Loss 8.5599 LearningRate 0.0010 Epoch: 4 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:20,145-Speed 6316.44 samples/sec Loss 8.5934 LearningRate 0.0010 Epoch: 4 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:23,391-Speed 6309.39 samples/sec Loss 8.5935 LearningRate 0.0010 Epoch: 4 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:26,636-Speed 6313.37 samples/sec Loss 8.5782 LearningRate 0.0010 Epoch: 4 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:29,884-Speed 6308.49 samples/sec Loss 
8.4507 LearningRate 0.0010 Epoch: 4 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:46:33,104-Speed 6361.32 samples/sec Loss 8.7171 LearningRate 0.0010 Epoch: 4 Global Step: 99470 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:36,349-Speed 6313.07 samples/sec Loss 8.6102 LearningRate 0.0010 Epoch: 4 Global Step: 99480 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:39,596-Speed 6309.39 samples/sec Loss 8.5515 LearningRate 0.0010 Epoch: 4 Global Step: 99490 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:42,837-Speed 6319.14 samples/sec Loss 8.5425 LearningRate 0.0010 Epoch: 4 Global Step: 99500 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:46,088-Speed 6301.85 samples/sec Loss 8.5475 LearningRate 0.0010 Epoch: 4 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:49,341-Speed 6296.18 samples/sec Loss 8.6839 LearningRate 0.0010 Epoch: 4 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:52,583-Speed 6318.41 samples/sec Loss 8.6250 LearningRate 0.0010 Epoch: 4 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:55,829-Speed 6312.07 samples/sec Loss 8.6013 LearningRate 0.0010 Epoch: 4 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:46:59,075-Speed 6309.62 samples/sec Loss 8.6279 LearningRate 0.0010 Epoch: 4 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:02,322-Speed 6309.41 samples/sec Loss 8.6062 LearningRate 0.0010 Epoch: 4 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:05,567-Speed 6311.72 samples/sec Loss 8.6182 LearningRate 0.0010 Epoch: 4 Global Step: 99570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:08,816-Speed 6304.62 samples/sec Loss 8.5845 LearningRate 0.0010 Epoch: 4 Global Step: 99580 
Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:12,060-Speed 6316.49 samples/sec Loss 8.5864 LearningRate 0.0010 Epoch: 4 Global Step: 99590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:15,306-Speed 6310.53 samples/sec Loss 8.5819 LearningRate 0.0010 Epoch: 4 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:18,559-Speed 6295.37 samples/sec Loss 8.6112 LearningRate 0.0010 Epoch: 4 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:21,786-Speed 6349.13 samples/sec Loss 8.5631 LearningRate 0.0010 Epoch: 4 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:25,037-Speed 6300.70 samples/sec Loss 8.5930 LearningRate 0.0010 Epoch: 4 Global Step: 99630 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:28,283-Speed 6309.73 samples/sec Loss 8.5442 LearningRate 0.0010 Epoch: 4 Global Step: 99640 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:31,536-Speed 6297.12 samples/sec Loss 8.5660 LearningRate 0.0010 Epoch: 4 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:34,788-Speed 6300.97 samples/sec Loss 8.5742 LearningRate 0.0010 Epoch: 4 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:38,056-Speed 6268.45 samples/sec Loss 8.5738 LearningRate 0.0010 Epoch: 4 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:41,326-Speed 6264.84 samples/sec Loss 8.5784 LearningRate 0.0010 Epoch: 4 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:44,571-Speed 6312.60 samples/sec Loss 8.6420 LearningRate 0.0010 Epoch: 4 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:47,816-Speed 6311.43 samples/sec Loss 8.5672 LearningRate 0.0010 Epoch: 4 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 66 hours Training: 
2022-04-01 00:47:51,063-Speed 6308.80 samples/sec Loss 8.7005 LearningRate 0.0010 Epoch: 4 Global Step: 99710 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:47:54,307-Speed 6315.65 samples/sec Loss 8.7025 LearningRate 0.0010 Epoch: 4 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:47:57,553-Speed 6310.47 samples/sec Loss 8.5679 LearningRate 0.0010 Epoch: 4 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:00,799-Speed 6310.43 samples/sec Loss 8.5530 LearningRate 0.0010 Epoch: 4 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:04,052-Speed 6297.71 samples/sec Loss 8.5970 LearningRate 0.0010 Epoch: 4 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:07,294-Speed 6318.80 samples/sec Loss 8.6224 LearningRate 0.0010 Epoch: 4 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:10,541-Speed 6308.66 samples/sec Loss 8.6256 LearningRate 0.0010 Epoch: 4 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:13,788-Speed 6308.82 samples/sec Loss 8.5643 LearningRate 0.0010 Epoch: 4 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:17,034-Speed 6310.40 samples/sec Loss 8.5513 LearningRate 0.0010 Epoch: 4 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:20,282-Speed 6305.85 samples/sec Loss 8.6139 LearningRate 0.0010 Epoch: 4 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:23,528-Speed 6310.62 samples/sec Loss 8.5049 LearningRate 0.0010 Epoch: 4 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:26,767-Speed 6325.91 samples/sec Loss 8.6033 LearningRate 0.0010 Epoch: 4 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:30,016-Speed 6304.53 samples/sec Loss 
8.6070 LearningRate 0.0010 Epoch: 4 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:33,261-Speed 6312.29 samples/sec Loss 8.6692 LearningRate 0.0010 Epoch: 4 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:36,507-Speed 6310.93 samples/sec Loss 8.6548 LearningRate 0.0010 Epoch: 4 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:39,752-Speed 6312.16 samples/sec Loss 8.6160 LearningRate 0.0010 Epoch: 4 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:43,003-Speed 6304.72 samples/sec Loss 8.5101 LearningRate 0.0010 Epoch: 4 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:46,276-Speed 6257.77 samples/sec Loss 8.6212 LearningRate 0.0010 Epoch: 4 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:49,545-Speed 6267.78 samples/sec Loss 8.6250 LearningRate 0.0010 Epoch: 4 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:52,789-Speed 6313.71 samples/sec Loss 8.5895 LearningRate 0.0010 Epoch: 4 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:56,033-Speed 6314.74 samples/sec Loss 8.5901 LearningRate 0.0010 Epoch: 4 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:48:59,264-Speed 6340.52 samples/sec Loss 8.5468 LearningRate 0.0010 Epoch: 4 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:02,506-Speed 6318.26 samples/sec Loss 8.5574 LearningRate 0.0010 Epoch: 4 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:05,752-Speed 6311.62 samples/sec Loss 8.5948 LearningRate 0.0010 Epoch: 4 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:08,999-Speed 6308.68 samples/sec Loss 8.5666 LearningRate 0.0010 Epoch: 4 Global Step: 99950 
Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:12,242-Speed 6316.13 samples/sec Loss 8.4948 LearningRate 0.0010 Epoch: 4 Global Step: 99960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:15,492-Speed 6302.00 samples/sec Loss 8.5062 LearningRate 0.0010 Epoch: 4 Global Step: 99970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:18,734-Speed 6318.63 samples/sec Loss 8.6402 LearningRate 0.0010 Epoch: 4 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:21,983-Speed 6305.83 samples/sec Loss 8.6171 LearningRate 0.0010 Epoch: 4 Global Step: 99990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:25,236-Speed 6295.55 samples/sec Loss 8.4926 LearningRate 0.0010 Epoch: 4 Global Step: 100000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:28,483-Speed 6309.95 samples/sec Loss 8.5477 LearningRate 0.0010 Epoch: 4 Global Step: 100010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:31,713-Speed 6341.17 samples/sec Loss 8.5528 LearningRate 0.0010 Epoch: 4 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:34,960-Speed 6309.68 samples/sec Loss 8.5821 LearningRate 0.0010 Epoch: 4 Global Step: 100030 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:38,209-Speed 6304.38 samples/sec Loss 8.4595 LearningRate 0.0010 Epoch: 4 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:41,452-Speed 6316.88 samples/sec Loss 8.6985 LearningRate 0.0010 Epoch: 4 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:44,697-Speed 6312.24 samples/sec Loss 8.6401 LearningRate 0.0010 Epoch: 4 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:47,942-Speed 6311.74 samples/sec Loss 8.5294 LearningRate 0.0010 Epoch: 4 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 66 hours 
Training: 2022-04-01 00:49:51,187-Speed 6314.07 samples/sec Loss 8.5260 LearningRate 0.0010 Epoch: 4 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:54,429-Speed 6317.18 samples/sec Loss 8.5556 LearningRate 0.0010 Epoch: 4 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:49:57,673-Speed 6315.30 samples/sec Loss 8.6263 LearningRate 0.0010 Epoch: 4 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:00,920-Speed 6308.46 samples/sec Loss 8.5042 LearningRate 0.0010 Epoch: 4 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:04,153-Speed 6337.26 samples/sec Loss 8.6301 LearningRate 0.0010 Epoch: 4 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:07,397-Speed 6313.79 samples/sec Loss 8.5547 LearningRate 0.0010 Epoch: 4 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:10,642-Speed 6313.02 samples/sec Loss 8.6310 LearningRate 0.0010 Epoch: 4 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:13,885-Speed 6316.87 samples/sec Loss 8.6885 LearningRate 0.0010 Epoch: 4 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:17,142-Speed 6290.61 samples/sec Loss 8.5530 LearningRate 0.0010 Epoch: 4 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:20,383-Speed 6319.40 samples/sec Loss 8.5413 LearningRate 0.0010 Epoch: 4 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:23,628-Speed 6313.20 samples/sec Loss 8.5997 LearningRate 0.0010 Epoch: 4 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:26,877-Speed 6304.58 samples/sec Loss 8.5142 LearningRate 0.0010 Epoch: 4 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:30,172-Speed 
6215.84 samples/sec Loss 8.5012 LearningRate 0.0010 Epoch: 4 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:33,417-Speed 6313.70 samples/sec Loss 8.4916 LearningRate 0.0010 Epoch: 4 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:36,685-Speed 6268.02 samples/sec Loss 8.5532 LearningRate 0.0010 Epoch: 4 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 00:50:39,926-Speed 6320.78 samples/sec Loss 8.5242 LearningRate 0.0010 Epoch: 4 Global Step: 100230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:43,172-Speed 6310.50 samples/sec Loss 8.6006 LearningRate 0.0010 Epoch: 4 Global Step: 100240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:46,415-Speed 6316.73 samples/sec Loss 8.6317 LearningRate 0.0010 Epoch: 4 Global Step: 100250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:49,666-Speed 6299.67 samples/sec Loss 8.5931 LearningRate 0.0010 Epoch: 4 Global Step: 100260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:52,910-Speed 6314.96 samples/sec Loss 8.5297 LearningRate 0.0010 Epoch: 4 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:56,157-Speed 6309.15 samples/sec Loss 8.6888 LearningRate 0.0010 Epoch: 4 Global Step: 100280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:50:59,403-Speed 6309.68 samples/sec Loss 8.5611 LearningRate 0.0010 Epoch: 4 Global Step: 100290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:02,649-Speed 6311.79 samples/sec Loss 8.6395 LearningRate 0.0010 Epoch: 4 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:05,899-Speed 6302.01 samples/sec Loss 8.6091 LearningRate 0.0010 Epoch: 4 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:09,144-Speed 6313.76 samples/sec Loss 8.5645 
LearningRate 0.0010 Epoch: 4 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:12,378-Speed 6333.71 samples/sec Loss 8.4123 LearningRate 0.0010 Epoch: 4 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:15,624-Speed 6312.48 samples/sec Loss 8.5642 LearningRate 0.0010 Epoch: 4 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:18,870-Speed 6310.08 samples/sec Loss 8.4806 LearningRate 0.0010 Epoch: 4 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:22,098-Speed 6345.12 samples/sec Loss 8.6157 LearningRate 0.0010 Epoch: 4 Global Step: 100360 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:25,343-Speed 6315.60 samples/sec Loss 8.5709 LearningRate 0.0010 Epoch: 4 Global Step: 100370 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:28,602-Speed 6283.82 samples/sec Loss 8.6122 LearningRate 0.0010 Epoch: 4 Global Step: 100380 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:31,849-Speed 6309.83 samples/sec Loss 8.4808 LearningRate 0.0010 Epoch: 4 Global Step: 100390 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:35,095-Speed 6310.87 samples/sec Loss 8.5811 LearningRate 0.0010 Epoch: 4 Global Step: 100400 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:38,338-Speed 6317.44 samples/sec Loss 8.5116 LearningRate 0.0010 Epoch: 4 Global Step: 100410 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:41,582-Speed 6314.01 samples/sec Loss 8.4704 LearningRate 0.0010 Epoch: 4 Global Step: 100420 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:44,829-Speed 6309.24 samples/sec Loss 8.6221 LearningRate 0.0010 Epoch: 4 Global Step: 100430 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:48,070-Speed 6319.33 samples/sec Loss 8.5444 LearningRate 0.0010 Epoch: 4 Global Step: 
100440 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:51,316-Speed 6311.37 samples/sec Loss 8.5687 LearningRate 0.0010 Epoch: 4 Global Step: 100450 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:51:54,563-Speed 6308.92 samples/sec Loss 8.5188 LearningRate 0.0010 Epoch: 4 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:51:57,812-Speed 6304.42 samples/sec Loss 8.5493 LearningRate 0.0010 Epoch: 4 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:01,065-Speed 6297.38 samples/sec Loss 8.5312 LearningRate 0.0010 Epoch: 4 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:04,317-Speed 6299.15 samples/sec Loss 8.5614 LearningRate 0.0010 Epoch: 4 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:07,566-Speed 6305.38 samples/sec Loss 8.5069 LearningRate 0.0010 Epoch: 4 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:10,815-Speed 6303.37 samples/sec Loss 8.5646 LearningRate 0.0010 Epoch: 4 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:14,065-Speed 6303.98 samples/sec Loss 8.5755 LearningRate 0.0010 Epoch: 4 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:17,307-Speed 6317.25 samples/sec Loss 8.5820 LearningRate 0.0010 Epoch: 4 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:20,559-Speed 6300.12 samples/sec Loss 8.5938 LearningRate 0.0010 Epoch: 4 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:23,806-Speed 6310.09 samples/sec Loss 8.5193 LearningRate 0.0010 Epoch: 4 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:27,039-Speed 6335.50 samples/sec Loss 8.5635 LearningRate 0.0010 Epoch: 4 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 66 
hours Training: 2022-04-01 00:52:30,289-Speed 6303.97 samples/sec Loss 8.4774 LearningRate 0.0010 Epoch: 4 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:33,533-Speed 6312.84 samples/sec Loss 8.4579 LearningRate 0.0010 Epoch: 4 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:36,785-Speed 6299.87 samples/sec Loss 8.5521 LearningRate 0.0010 Epoch: 4 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:40,031-Speed 6311.50 samples/sec Loss 8.5243 LearningRate 0.0010 Epoch: 4 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:43,275-Speed 6313.23 samples/sec Loss 8.5998 LearningRate 0.0010 Epoch: 4 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:46,527-Speed 6299.15 samples/sec Loss 8.5524 LearningRate 0.0010 Epoch: 4 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:49,773-Speed 6312.10 samples/sec Loss 8.4526 LearningRate 0.0010 Epoch: 4 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:53,021-Speed 6305.56 samples/sec Loss 8.5314 LearningRate 0.0010 Epoch: 4 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:56,270-Speed 6306.11 samples/sec Loss 8.5243 LearningRate 0.0010 Epoch: 4 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:52:59,497-Speed 6347.87 samples/sec Loss 8.5425 LearningRate 0.0010 Epoch: 4 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:02,747-Speed 6302.13 samples/sec Loss 8.5051 LearningRate 0.0010 Epoch: 4 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:05,992-Speed 6312.60 samples/sec Loss 8.5234 LearningRate 0.0010 Epoch: 4 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
00:53:09,243-Speed 6300.89 samples/sec Loss 8.4125 LearningRate 0.0010 Epoch: 4 Global Step: 100690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:12,487-Speed 6315.89 samples/sec Loss 8.5256 LearningRate 0.0010 Epoch: 4 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:15,731-Speed 6312.99 samples/sec Loss 8.5634 LearningRate 0.0010 Epoch: 4 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:18,965-Speed 6334.85 samples/sec Loss 8.5738 LearningRate 0.0010 Epoch: 4 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:22,210-Speed 6312.79 samples/sec Loss 8.6050 LearningRate 0.0010 Epoch: 4 Global Step: 100730 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:25,457-Speed 6309.35 samples/sec Loss 8.5894 LearningRate 0.0010 Epoch: 4 Global Step: 100740 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:28,699-Speed 6317.66 samples/sec Loss 8.5827 LearningRate 0.0010 Epoch: 4 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:31,943-Speed 6314.59 samples/sec Loss 8.5925 LearningRate 0.0010 Epoch: 4 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:35,190-Speed 6309.95 samples/sec Loss 8.5305 LearningRate 0.0010 Epoch: 4 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:38,432-Speed 6318.79 samples/sec Loss 8.6309 LearningRate 0.0010 Epoch: 4 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:41,678-Speed 6310.61 samples/sec Loss 8.4986 LearningRate 0.0010 Epoch: 4 Global Step: 100790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:44,921-Speed 6317.28 samples/sec Loss 8.4703 LearningRate 0.0010 Epoch: 4 Global Step: 100800 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:48,169-Speed 6304.88 samples/sec Loss 
8.4499 LearningRate 0.0010 Epoch: 4 Global Step: 100810 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:53:51,424-Speed 6294.07 samples/sec Loss 8.5232 LearningRate 0.0010 Epoch: 4 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:54,668-Speed 6315.22 samples/sec Loss 8.5943 LearningRate 0.0010 Epoch: 4 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:53:57,914-Speed 6309.16 samples/sec Loss 8.5291 LearningRate 0.0010 Epoch: 4 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:01,157-Speed 6318.18 samples/sec Loss 8.4828 LearningRate 0.0010 Epoch: 4 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:04,400-Speed 6316.96 samples/sec Loss 8.5831 LearningRate 0.0010 Epoch: 4 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:07,642-Speed 6317.34 samples/sec Loss 8.5825 LearningRate 0.0010 Epoch: 4 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:10,888-Speed 6310.97 samples/sec Loss 8.4085 LearningRate 0.0010 Epoch: 4 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:14,137-Speed 6305.18 samples/sec Loss 8.5724 LearningRate 0.0010 Epoch: 4 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:17,380-Speed 6317.26 samples/sec Loss 8.4847 LearningRate 0.0010 Epoch: 4 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:20,626-Speed 6310.05 samples/sec Loss 8.5263 LearningRate 0.0010 Epoch: 4 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:23,859-Speed 6335.45 samples/sec Loss 8.5065 LearningRate 0.0010 Epoch: 4 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:27,109-Speed 6302.62 samples/sec Loss 8.5065 LearningRate 0.0010 Epoch: 4 Global 
Step: 100930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:30,355-Speed 6311.49 samples/sec Loss 8.5512 LearningRate 0.0010 Epoch: 4 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:33,599-Speed 6314.38 samples/sec Loss 8.5371 LearningRate 0.0010 Epoch: 4 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:36,842-Speed 6315.68 samples/sec Loss 8.6756 LearningRate 0.0010 Epoch: 4 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:40,088-Speed 6310.87 samples/sec Loss 8.5919 LearningRate 0.0010 Epoch: 4 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:43,337-Speed 6305.16 samples/sec Loss 8.5366 LearningRate 0.0010 Epoch: 4 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:46,586-Speed 6305.26 samples/sec Loss 8.5219 LearningRate 0.0010 Epoch: 4 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:49,832-Speed 6311.08 samples/sec Loss 8.5334 LearningRate 0.0010 Epoch: 4 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:53,079-Speed 6308.26 samples/sec Loss 8.5890 LearningRate 0.0010 Epoch: 4 Global Step: 101010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:54:56,318-Speed 6324.85 samples/sec Loss 8.4203 LearningRate 0.0010 Epoch: 4 Global Step: 101020 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:54:59,566-Speed 6306.88 samples/sec Loss 8.5328 LearningRate 0.0010 Epoch: 4 Global Step: 101030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:02,812-Speed 6311.60 samples/sec Loss 8.4825 LearningRate 0.0010 Epoch: 4 Global Step: 101040 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:06,065-Speed 6295.35 samples/sec Loss 8.5621 LearningRate 0.0010 Epoch: 4 Global Step: 101050 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 00:55:09,307-Speed 6318.59 samples/sec Loss 8.5602 LearningRate 0.0010 Epoch: 4 Global Step: 101060 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:12,553-Speed 6312.17 samples/sec Loss 8.5524 LearningRate 0.0010 Epoch: 4 Global Step: 101070 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:15,798-Speed 6311.32 samples/sec Loss 8.4820 LearningRate 0.0010 Epoch: 4 Global Step: 101080 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:19,044-Speed 6310.79 samples/sec Loss 8.5240 LearningRate 0.0010 Epoch: 4 Global Step: 101090 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:22,292-Speed 6307.55 samples/sec Loss 8.4790 LearningRate 0.0010 Epoch: 4 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:25,539-Speed 6307.69 samples/sec Loss 8.5378 LearningRate 0.0010 Epoch: 4 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:55:28,783-Speed 6315.06 samples/sec Loss 8.4606 LearningRate 0.0010 Epoch: 4 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:32,032-Speed 6304.88 samples/sec Loss 8.5772 LearningRate 0.0010 Epoch: 4 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:35,278-Speed 6311.30 samples/sec Loss 8.5469 LearningRate 0.0010 Epoch: 4 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:38,522-Speed 6314.22 samples/sec Loss 8.5690 LearningRate 0.0010 Epoch: 4 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:41,766-Speed 6314.89 samples/sec Loss 8.5674 LearningRate 0.0010 Epoch: 4 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:45,012-Speed 6310.91 samples/sec Loss 8.4631 LearningRate 0.0010 Epoch: 4 Global Step: 101170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
00:55:48,257-Speed 6312.48 samples/sec Loss 8.5691 LearningRate 0.0010 Epoch: 4 Global Step: 101180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:51,507-Speed 6303.76 samples/sec Loss 8.5217 LearningRate 0.0010 Epoch: 4 Global Step: 101190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:54,754-Speed 6309.07 samples/sec Loss 8.5398 LearningRate 0.0010 Epoch: 4 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:55:58,000-Speed 6311.02 samples/sec Loss 8.4923 LearningRate 0.0010 Epoch: 4 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:01,236-Speed 6329.40 samples/sec Loss 8.5587 LearningRate 0.0010 Epoch: 4 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:04,490-Speed 6295.04 samples/sec Loss 8.5816 LearningRate 0.0010 Epoch: 4 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:07,736-Speed 6310.39 samples/sec Loss 8.4447 LearningRate 0.0010 Epoch: 4 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:10,980-Speed 6316.47 samples/sec Loss 8.4067 LearningRate 0.0010 Epoch: 4 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:14,225-Speed 6310.92 samples/sec Loss 8.4545 LearningRate 0.0010 Epoch: 4 Global Step: 101260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:17,476-Speed 6301.25 samples/sec Loss 8.5936 LearningRate 0.0010 Epoch: 4 Global Step: 101270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:20,722-Speed 6310.90 samples/sec Loss 8.5682 LearningRate 0.0010 Epoch: 4 Global Step: 101280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:23,972-Speed 6303.02 samples/sec Loss 8.4982 LearningRate 0.0010 Epoch: 4 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:27,218-Speed 6310.44 samples/sec Loss 
8.4761 LearningRate 0.0010 Epoch: 4 Global Step: 101300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:30,464-Speed 6311.42 samples/sec Loss 8.5514 LearningRate 0.0010 Epoch: 4 Global Step: 101310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:33,708-Speed 6313.66 samples/sec Loss 8.5166 LearningRate 0.0010 Epoch: 4 Global Step: 101320 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 00:56:36,943-Speed 6331.74 samples/sec Loss 8.4804 LearningRate 0.0010 Epoch: 4 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:56:40,221-Speed 6251.08 samples/sec Loss 8.4965 LearningRate 0.0010 Epoch: 4 Global Step: 101340 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:43,463-Speed 6317.36 samples/sec Loss 8.5165 LearningRate 0.0010 Epoch: 4 Global Step: 101350 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:46,710-Speed 6309.14 samples/sec Loss 8.4602 LearningRate 0.0010 Epoch: 4 Global Step: 101360 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:49,955-Speed 6313.47 samples/sec Loss 8.4688 LearningRate 0.0010 Epoch: 4 Global Step: 101370 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:53,199-Speed 6313.55 samples/sec Loss 8.5916 LearningRate 0.0010 Epoch: 4 Global Step: 101380 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:56,448-Speed 6305.51 samples/sec Loss 8.5115 LearningRate 0.0010 Epoch: 4 Global Step: 101390 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:56:59,693-Speed 6313.12 samples/sec Loss 8.4891 LearningRate 0.0010 Epoch: 4 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:57:02,941-Speed 6306.98 samples/sec Loss 8.5053 LearningRate 0.0010 Epoch: 4 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:57:06,194-Speed 6296.23 samples/sec Loss 8.4862 LearningRate 0.0010 Epoch: 4 Global 
Step: 101420 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:57:09,453-Speed 6285.79 samples/sec Loss 8.4113 LearningRate 0.0010 Epoch: 4 Global Step: 101430 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:57:12,697-Speed 6314.64 samples/sec Loss 8.5673 LearningRate 0.0010 Epoch: 4 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:15,941-Speed 6315.01 samples/sec Loss 8.4351 LearningRate 0.0010 Epoch: 4 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:19,185-Speed 6314.45 samples/sec Loss 8.5783 LearningRate 0.0010 Epoch: 4 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:22,434-Speed 6305.29 samples/sec Loss 8.4897 LearningRate 0.0010 Epoch: 4 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:25,684-Speed 6302.86 samples/sec Loss 8.4967 LearningRate 0.0010 Epoch: 4 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:28,932-Speed 6306.06 samples/sec Loss 8.5414 LearningRate 0.0010 Epoch: 4 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:32,177-Speed 6314.07 samples/sec Loss 8.5357 LearningRate 0.0010 Epoch: 4 Global Step: 101500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:35,425-Speed 6306.25 samples/sec Loss 8.5704 LearningRate 0.0010 Epoch: 4 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:38,667-Speed 6318.90 samples/sec Loss 8.5185 LearningRate 0.0010 Epoch: 4 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:41,915-Speed 6306.79 samples/sec Loss 8.5301 LearningRate 0.0010 Epoch: 4 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:45,142-Speed 6346.40 samples/sec Loss 8.4747 LearningRate 0.0010 Epoch: 4 Global Step: 101540 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 00:57:48,387-Speed 6313.16 samples/sec Loss 8.5502 LearningRate 0.0010 Epoch: 4 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:51,637-Speed 6302.86 samples/sec Loss 8.5121 LearningRate 0.0010 Epoch: 4 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:57:54,866-Speed 6345.03 samples/sec Loss 8.4695 LearningRate 0.0010 Epoch: 4 Global Step: 101570 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:57:58,110-Speed 6313.83 samples/sec Loss 8.4001 LearningRate 0.0010 Epoch: 4 Global Step: 101580 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:01,356-Speed 6311.22 samples/sec Loss 8.5034 LearningRate 0.0010 Epoch: 4 Global Step: 101590 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:04,607-Speed 6300.91 samples/sec Loss 8.5304 LearningRate 0.0010 Epoch: 4 Global Step: 101600 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:07,852-Speed 6311.96 samples/sec Loss 8.5582 LearningRate 0.0010 Epoch: 4 Global Step: 101610 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:11,093-Speed 6320.10 samples/sec Loss 8.5085 LearningRate 0.0010 Epoch: 4 Global Step: 101620 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:14,343-Speed 6303.81 samples/sec Loss 8.4690 LearningRate 0.0010 Epoch: 4 Global Step: 101630 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:17,587-Speed 6314.42 samples/sec Loss 8.5007 LearningRate 0.0010 Epoch: 4 Global Step: 101640 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:20,834-Speed 6310.55 samples/sec Loss 8.4949 LearningRate 0.0010 Epoch: 4 Global Step: 101650 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:58:24,082-Speed 6305.24 samples/sec Loss 8.4662 LearningRate 0.0010 Epoch: 4 Global Step: 101660 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
00:58:27,325-Speed 6316.45 samples/sec Loss 8.4234 LearningRate 0.0010 Epoch: 4 Global Step: 101670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:30,570-Speed 6313.06 samples/sec Loss 8.5569 LearningRate 0.0010 Epoch: 4 Global Step: 101680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:33,811-Speed 6320.13 samples/sec Loss 8.5481 LearningRate 0.0010 Epoch: 4 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:37,063-Speed 6300.32 samples/sec Loss 8.4679 LearningRate 0.0010 Epoch: 4 Global Step: 101700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:40,306-Speed 6315.42 samples/sec Loss 8.4753 LearningRate 0.0010 Epoch: 4 Global Step: 101710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:43,550-Speed 6314.95 samples/sec Loss 8.5243 LearningRate 0.0010 Epoch: 4 Global Step: 101720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:46,798-Speed 6307.36 samples/sec Loss 8.4216 LearningRate 0.0010 Epoch: 4 Global Step: 101730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:50,044-Speed 6310.43 samples/sec Loss 8.3854 LearningRate 0.0010 Epoch: 4 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:53,290-Speed 6310.87 samples/sec Loss 8.4869 LearningRate 0.0010 Epoch: 4 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:56,538-Speed 6305.79 samples/sec Loss 8.5098 LearningRate 0.0010 Epoch: 4 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:58:59,786-Speed 6308.46 samples/sec Loss 8.4643 LearningRate 0.0010 Epoch: 4 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:03,058-Speed 6260.32 samples/sec Loss 8.4984 LearningRate 0.0010 Epoch: 4 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:06,292-Speed 6332.93 samples/sec Loss 
8.5062 LearningRate 0.0010 Epoch: 4 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:09,547-Speed 6294.14 samples/sec Loss 8.4507 LearningRate 0.0010 Epoch: 4 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:12,794-Speed 6309.34 samples/sec Loss 8.4964 LearningRate 0.0010 Epoch: 4 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:16,041-Speed 6307.07 samples/sec Loss 8.5590 LearningRate 0.0010 Epoch: 4 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:19,289-Speed 6306.76 samples/sec Loss 8.4451 LearningRate 0.0010 Epoch: 4 Global Step: 101830 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:22,536-Speed 6310.18 samples/sec Loss 8.4744 LearningRate 0.0010 Epoch: 4 Global Step: 101840 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:25,782-Speed 6310.96 samples/sec Loss 8.4431 LearningRate 0.0010 Epoch: 4 Global Step: 101850 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:29,029-Speed 6309.31 samples/sec Loss 8.4849 LearningRate 0.0010 Epoch: 4 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:32,280-Speed 6300.26 samples/sec Loss 8.5710 LearningRate 0.0009 Epoch: 4 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:35,528-Speed 6307.59 samples/sec Loss 8.5200 LearningRate 0.0009 Epoch: 4 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 00:59:38,774-Speed 6309.74 samples/sec Loss 8.5525 LearningRate 0.0009 Epoch: 4 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:42,022-Speed 6307.22 samples/sec Loss 8.4504 LearningRate 0.0009 Epoch: 4 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:45,273-Speed 6300.10 samples/sec Loss 8.4757 LearningRate 0.0009 Epoch: 4 Global 
Step: 101910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:48,523-Speed 6304.69 samples/sec Loss 8.5178 LearningRate 0.0009 Epoch: 4 Global Step: 101920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:51,769-Speed 6310.58 samples/sec Loss 8.4729 LearningRate 0.0009 Epoch: 4 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:55,020-Speed 6300.64 samples/sec Loss 8.4289 LearningRate 0.0009 Epoch: 4 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 00:59:58,274-Speed 6294.82 samples/sec Loss 8.5186 LearningRate 0.0009 Epoch: 4 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:01,538-Speed 6276.02 samples/sec Loss 8.5715 LearningRate 0.0009 Epoch: 4 Global Step: 101960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:04,789-Speed 6301.82 samples/sec Loss 8.5144 LearningRate 0.0009 Epoch: 4 Global Step: 101970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:08,031-Speed 6317.55 samples/sec Loss 8.4362 LearningRate 0.0009 Epoch: 4 Global Step: 101980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:11,268-Speed 6328.03 samples/sec Loss 8.4926 LearningRate 0.0009 Epoch: 4 Global Step: 101990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:14,514-Speed 6310.79 samples/sec Loss 8.4614 LearningRate 0.0009 Epoch: 4 Global Step: 102000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:17,765-Speed 6302.28 samples/sec Loss 8.4404 LearningRate 0.0009 Epoch: 4 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:21,009-Speed 6313.97 samples/sec Loss 8.4486 LearningRate 0.0009 Epoch: 4 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:24,253-Speed 6314.54 samples/sec Loss 8.4898 LearningRate 0.0009 Epoch: 4 Global Step: 102030 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:00:27,500-Speed 6307.86 samples/sec Loss 8.4547 LearningRate 0.0009 Epoch: 4 Global Step: 102040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:30,744-Speed 6315.83 samples/sec Loss 8.3963 LearningRate 0.0009 Epoch: 4 Global Step: 102050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:33,990-Speed 6309.55 samples/sec Loss 8.4795 LearningRate 0.0009 Epoch: 4 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:37,235-Speed 6313.14 samples/sec Loss 8.4183 LearningRate 0.0009 Epoch: 4 Global Step: 102070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:40,482-Speed 6309.98 samples/sec Loss 8.5623 LearningRate 0.0009 Epoch: 4 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:00:43,716-Speed 6332.48 samples/sec Loss 8.5031 LearningRate 0.0009 Epoch: 4 Global Step: 102090 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:00:46,965-Speed 6307.03 samples/sec Loss 8.5602 LearningRate 0.0009 Epoch: 4 Global Step: 102100 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:00:50,209-Speed 6313.89 samples/sec Loss 8.4620 LearningRate 0.0009 Epoch: 4 Global Step: 102110 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:00:53,453-Speed 6315.01 samples/sec Loss 8.4693 LearningRate 0.0009 Epoch: 4 Global Step: 102120 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:00:56,702-Speed 6305.01 samples/sec Loss 8.4250 LearningRate 0.0009 Epoch: 4 Global Step: 102130 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:00:59,948-Speed 6310.07 samples/sec Loss 8.5493 LearningRate 0.0009 Epoch: 4 Global Step: 102140 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:01:03,194-Speed 6311.88 samples/sec Loss 8.4109 LearningRate 0.0009 Epoch: 4 Global Step: 102150 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:01:06,439-Speed 6312.38 samples/sec Loss 8.4192 LearningRate 0.0009 Epoch: 4 Global Step: 102160 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:01:09,684-Speed 6312.26 samples/sec Loss 8.5236 LearningRate 0.0009 Epoch: 4 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:01:12,932-Speed 6306.22 samples/sec Loss 8.5553 LearningRate 0.0009 Epoch: 4 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:01:16,179-Speed 6309.05 samples/sec Loss 8.4947 LearningRate 0.0009 Epoch: 4 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:19,422-Speed 6317.03 samples/sec Loss 8.4845 LearningRate 0.0009 Epoch: 4 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:22,674-Speed 6299.30 samples/sec Loss 8.5270 LearningRate 0.0009 Epoch: 4 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:25,920-Speed 6310.27 samples/sec Loss 8.5211 LearningRate 0.0009 Epoch: 4 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:29,167-Speed 6309.13 samples/sec Loss 8.4846 LearningRate 0.0009 Epoch: 4 Global Step: 102230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:32,410-Speed 6315.13 samples/sec Loss 8.4535 LearningRate 0.0009 Epoch: 4 Global Step: 102240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:35,656-Speed 6311.43 samples/sec Loss 8.5051 LearningRate 0.0009 Epoch: 4 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:38,907-Speed 6301.28 samples/sec Loss 8.4447 LearningRate 0.0009 Epoch: 4 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:42,154-Speed 6308.01 samples/sec Loss 8.5325 LearningRate 0.0009 Epoch: 4 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:45,400-Speed 6311.99 samples/sec Loss 
8.4602 LearningRate 0.0009 Epoch: 4 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:48,629-Speed 6343.24 samples/sec Loss 8.5444 LearningRate 0.0009 Epoch: 4 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:51,874-Speed 6314.45 samples/sec Loss 8.4584 LearningRate 0.0009 Epoch: 4 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:55,120-Speed 6309.83 samples/sec Loss 8.4763 LearningRate 0.0009 Epoch: 4 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:01:58,368-Speed 6307.51 samples/sec Loss 8.4741 LearningRate 0.0009 Epoch: 4 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:01,615-Speed 6307.90 samples/sec Loss 8.5088 LearningRate 0.0009 Epoch: 4 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:04,861-Speed 6310.88 samples/sec Loss 8.5181 LearningRate 0.0009 Epoch: 4 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:08,106-Speed 6313.39 samples/sec Loss 8.5239 LearningRate 0.0009 Epoch: 4 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:11,355-Speed 6303.45 samples/sec Loss 8.5040 LearningRate 0.0009 Epoch: 4 Global Step: 102360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:14,605-Speed 6304.16 samples/sec Loss 8.4615 LearningRate 0.0009 Epoch: 4 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:17,852-Speed 6307.82 samples/sec Loss 8.4615 LearningRate 0.0009 Epoch: 4 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:21,085-Speed 6335.99 samples/sec Loss 8.3872 LearningRate 0.0009 Epoch: 4 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:24,331-Speed 6311.92 samples/sec Loss 8.4771 LearningRate 0.0009 Epoch: 4 Global 
Step: 102400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:27,580-Speed 6304.96 samples/sec Loss 8.4968 LearningRate 0.0009 Epoch: 4 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:30,822-Speed 6317.42 samples/sec Loss 8.3909 LearningRate 0.0009 Epoch: 4 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:34,076-Speed 6295.44 samples/sec Loss 8.4041 LearningRate 0.0009 Epoch: 4 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:37,325-Speed 6304.71 samples/sec Loss 8.4543 LearningRate 0.0009 Epoch: 4 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:40,569-Speed 6314.98 samples/sec Loss 8.5104 LearningRate 0.0009 Epoch: 4 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:43,814-Speed 6312.15 samples/sec Loss 8.4205 LearningRate 0.0009 Epoch: 4 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:47,058-Speed 6314.03 samples/sec Loss 8.3735 LearningRate 0.0009 Epoch: 4 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:50,301-Speed 6317.33 samples/sec Loss 8.4906 LearningRate 0.0009 Epoch: 4 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:02:53,548-Speed 6308.96 samples/sec Loss 8.4623 LearningRate 0.0009 Epoch: 4 Global Step: 102490 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 01:02:56,783-Speed 6333.04 samples/sec Loss 8.3985 LearningRate 0.0009 Epoch: 4 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:00,033-Speed 6303.68 samples/sec Loss 8.4477 LearningRate 0.0009 Epoch: 4 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:03,283-Speed 6301.70 samples/sec Loss 8.4424 LearningRate 0.0009 Epoch: 4 Global Step: 102520 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:03:06,529-Speed 6312.03 samples/sec Loss 8.4793 LearningRate 0.0009 Epoch: 4 Global Step: 102530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:09,775-Speed 6309.92 samples/sec Loss 8.5175 LearningRate 0.0009 Epoch: 4 Global Step: 102540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:13,034-Speed 6286.37 samples/sec Loss 8.3404 LearningRate 0.0009 Epoch: 4 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:16,333-Speed 6208.72 samples/sec Loss 8.4450 LearningRate 0.0009 Epoch: 4 Global Step: 102560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:19,597-Speed 6276.18 samples/sec Loss 8.5370 LearningRate 0.0009 Epoch: 4 Global Step: 102570 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:22,839-Speed 6319.90 samples/sec Loss 8.4729 LearningRate 0.0009 Epoch: 4 Global Step: 102580 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:26,086-Speed 6307.37 samples/sec Loss 8.5292 LearningRate 0.0009 Epoch: 4 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:29,334-Speed 6306.62 samples/sec Loss 8.4967 LearningRate 0.0009 Epoch: 4 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:32,577-Speed 6316.36 samples/sec Loss 8.4311 LearningRate 0.0009 Epoch: 4 Global Step: 102610 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:35,823-Speed 6312.45 samples/sec Loss 8.4376 LearningRate 0.0009 Epoch: 4 Global Step: 102620 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:39,066-Speed 6316.17 samples/sec Loss 8.6152 LearningRate 0.0009 Epoch: 4 Global Step: 102630 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:42,311-Speed 6311.03 samples/sec Loss 8.3636 LearningRate 0.0009 Epoch: 4 Global Step: 102640 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:03:45,554-Speed 6317.52 samples/sec Loss 8.4390 LearningRate 0.0009 Epoch: 4 Global Step: 102650 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:03:48,802-Speed 6307.10 samples/sec Loss 8.4572 LearningRate 0.0009 Epoch: 4 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:52,049-Speed 6307.67 samples/sec Loss 8.4901 LearningRate 0.0009 Epoch: 4 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:55,295-Speed 6311.94 samples/sec Loss 8.4284 LearningRate 0.0009 Epoch: 4 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:03:58,536-Speed 6319.00 samples/sec Loss 8.4505 LearningRate 0.0009 Epoch: 4 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:01,788-Speed 6300.42 samples/sec Loss 8.4761 LearningRate 0.0009 Epoch: 4 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:05,037-Speed 6304.95 samples/sec Loss 8.4379 LearningRate 0.0009 Epoch: 4 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:08,281-Speed 6314.42 samples/sec Loss 8.5159 LearningRate 0.0009 Epoch: 4 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:11,527-Speed 6311.31 samples/sec Loss 8.5260 LearningRate 0.0009 Epoch: 4 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:14,771-Speed 6313.77 samples/sec Loss 8.4286 LearningRate 0.0009 Epoch: 4 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:18,017-Speed 6310.96 samples/sec Loss 8.5056 LearningRate 0.0009 Epoch: 4 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:21,254-Speed 6329.31 samples/sec Loss 8.3964 LearningRate 0.0009 Epoch: 4 Global Step: 102760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:24,498-Speed 6314.64 samples/sec Loss 
8.4143 LearningRate 0.0009 Epoch: 4 Global Step: 102770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:27,748-Speed 6302.06 samples/sec Loss 8.4146 LearningRate 0.0009 Epoch: 4 Global Step: 102780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:30,992-Speed 6315.51 samples/sec Loss 8.4458 LearningRate 0.0009 Epoch: 4 Global Step: 102790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:34,238-Speed 6310.31 samples/sec Loss 8.4577 LearningRate 0.0009 Epoch: 4 Global Step: 102800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:37,490-Speed 6299.83 samples/sec Loss 8.3644 LearningRate 0.0009 Epoch: 4 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:40,737-Speed 6308.52 samples/sec Loss 8.3930 LearningRate 0.0009 Epoch: 4 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:43,989-Speed 6297.41 samples/sec Loss 8.4577 LearningRate 0.0009 Epoch: 4 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:47,237-Speed 6307.16 samples/sec Loss 8.5229 LearningRate 0.0009 Epoch: 4 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:50,485-Speed 6307.89 samples/sec Loss 8.4052 LearningRate 0.0009 Epoch: 4 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:53,718-Speed 6336.58 samples/sec Loss 8.3476 LearningRate 0.0009 Epoch: 4 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:04:56,965-Speed 6307.12 samples/sec Loss 8.4451 LearningRate 0.0009 Epoch: 4 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:00,215-Speed 6302.97 samples/sec Loss 8.4142 LearningRate 0.0009 Epoch: 4 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:03,459-Speed 6314.38 samples/sec Loss 8.4965 LearningRate 0.0009 Epoch: 4 Global 
Step: 102890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:06,707-Speed 6307.09 samples/sec Loss 8.4226 LearningRate 0.0009 Epoch: 4 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:09,952-Speed 6312.74 samples/sec Loss 8.4505 LearningRate 0.0009 Epoch: 4 Global Step: 102910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:13,195-Speed 6317.14 samples/sec Loss 8.4086 LearningRate 0.0009 Epoch: 4 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:16,444-Speed 6305.20 samples/sec Loss 8.4608 LearningRate 0.0009 Epoch: 4 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:19,689-Speed 6313.82 samples/sec Loss 8.4682 LearningRate 0.0009 Epoch: 4 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:22,935-Speed 6311.04 samples/sec Loss 8.4383 LearningRate 0.0009 Epoch: 4 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:26,168-Speed 6335.96 samples/sec Loss 8.4407 LearningRate 0.0009 Epoch: 4 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:29,411-Speed 6315.69 samples/sec Loss 8.3960 LearningRate 0.0009 Epoch: 4 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:32,659-Speed 6306.06 samples/sec Loss 8.4606 LearningRate 0.0009 Epoch: 4 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:35,937-Speed 6250.67 samples/sec Loss 8.4630 LearningRate 0.0009 Epoch: 4 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:39,184-Speed 6307.22 samples/sec Loss 8.4109 LearningRate 0.0009 Epoch: 4 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:42,429-Speed 6313.62 samples/sec Loss 8.4468 LearningRate 0.0009 Epoch: 4 Global Step: 103010 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:05:45,673-Speed 6315.24 samples/sec Loss 8.4592 LearningRate 0.0009 Epoch: 4 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:48,917-Speed 6312.98 samples/sec Loss 8.4802 LearningRate 0.0009 Epoch: 4 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:52,161-Speed 6315.90 samples/sec Loss 8.3566 LearningRate 0.0009 Epoch: 4 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:55,405-Speed 6315.11 samples/sec Loss 8.4550 LearningRate 0.0009 Epoch: 4 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:05:58,638-Speed 6335.94 samples/sec Loss 8.4591 LearningRate 0.0009 Epoch: 4 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:01,898-Speed 6282.59 samples/sec Loss 8.4210 LearningRate 0.0009 Epoch: 4 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:05,162-Speed 6276.74 samples/sec Loss 8.4008 LearningRate 0.0009 Epoch: 4 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:08,407-Speed 6312.32 samples/sec Loss 8.4658 LearningRate 0.0009 Epoch: 4 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:11,655-Speed 6305.75 samples/sec Loss 8.5053 LearningRate 0.0009 Epoch: 4 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:14,899-Speed 6314.23 samples/sec Loss 8.4048 LearningRate 0.0009 Epoch: 4 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:18,145-Speed 6310.62 samples/sec Loss 8.3542 LearningRate 0.0009 Epoch: 4 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:21,392-Speed 6310.06 samples/sec Loss 8.3684 LearningRate 0.0009 Epoch: 4 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:06:24,636-Speed 6314.98 samples/sec Loss 8.3390 LearningRate 0.0009 Epoch: 4 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:27,889-Speed 6297.72 samples/sec Loss 8.3932 LearningRate 0.0009 Epoch: 4 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:31,133-Speed 6314.89 samples/sec Loss 8.3697 LearningRate 0.0009 Epoch: 4 Global Step: 103160 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 01:06:34,370-Speed 6327.93 samples/sec Loss 8.4029 LearningRate 0.0009 Epoch: 4 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:37,617-Speed 6308.16 samples/sec Loss 8.3769 LearningRate 0.0009 Epoch: 4 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:40,866-Speed 6305.92 samples/sec Loss 8.4963 LearningRate 0.0009 Epoch: 4 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:44,113-Speed 6307.51 samples/sec Loss 8.3604 LearningRate 0.0009 Epoch: 4 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:47,360-Speed 6309.74 samples/sec Loss 8.3930 LearningRate 0.0009 Epoch: 4 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:50,606-Speed 6309.50 samples/sec Loss 8.4639 LearningRate 0.0009 Epoch: 4 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:53,851-Speed 6312.86 samples/sec Loss 8.4701 LearningRate 0.0009 Epoch: 4 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:06:57,096-Speed 6312.86 samples/sec Loss 8.4418 LearningRate 0.0009 Epoch: 4 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:00,338-Speed 6318.06 samples/sec Loss 8.4263 LearningRate 0.0009 Epoch: 4 Global Step: 103250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:03,585-Speed 6309.75 samples/sec 
Loss 8.4861 LearningRate 0.0009 Epoch: 4 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:06,818-Speed 6336.15 samples/sec Loss 8.4856 LearningRate 0.0009 Epoch: 4 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:10,064-Speed 6310.69 samples/sec Loss 8.4697 LearningRate 0.0009 Epoch: 4 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:13,297-Speed 6334.76 samples/sec Loss 8.4822 LearningRate 0.0009 Epoch: 4 Global Step: 103290 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:16,544-Speed 6310.22 samples/sec Loss 8.4850 LearningRate 0.0009 Epoch: 4 Global Step: 103300 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:19,786-Speed 6317.34 samples/sec Loss 8.4066 LearningRate 0.0009 Epoch: 4 Global Step: 103310 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:23,032-Speed 6311.79 samples/sec Loss 8.4869 LearningRate 0.0009 Epoch: 4 Global Step: 103320 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:26,280-Speed 6306.92 samples/sec Loss 8.4009 LearningRate 0.0009 Epoch: 4 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:29,524-Speed 6314.23 samples/sec Loss 8.4180 LearningRate 0.0009 Epoch: 4 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:32,769-Speed 6313.03 samples/sec Loss 8.4794 LearningRate 0.0009 Epoch: 4 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:36,013-Speed 6315.39 samples/sec Loss 8.4705 LearningRate 0.0009 Epoch: 4 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:39,259-Speed 6310.38 samples/sec Loss 8.4426 LearningRate 0.0009 Epoch: 4 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:42,503-Speed 6314.74 samples/sec Loss 8.4167 LearningRate 0.0009 Epoch: 4 
Global Step: 103380 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:07:45,749-Speed 6310.41 samples/sec Loss 8.4798 LearningRate 0.0009 Epoch: 4 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:48,999-Speed 6302.88 samples/sec Loss 8.4149 LearningRate 0.0009 Epoch: 4 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:52,244-Speed 6313.14 samples/sec Loss 8.4726 LearningRate 0.0009 Epoch: 4 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:55,489-Speed 6313.57 samples/sec Loss 8.4713 LearningRate 0.0009 Epoch: 4 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:07:58,734-Speed 6310.72 samples/sec Loss 8.4248 LearningRate 0.0009 Epoch: 4 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:01,986-Speed 6299.81 samples/sec Loss 8.4308 LearningRate 0.0009 Epoch: 4 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:05,231-Speed 6312.23 samples/sec Loss 8.3725 LearningRate 0.0009 Epoch: 4 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:08,478-Speed 6309.91 samples/sec Loss 8.4066 LearningRate 0.0009 Epoch: 4 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:11,707-Speed 6342.98 samples/sec Loss 8.4370 LearningRate 0.0009 Epoch: 4 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:14,951-Speed 6314.37 samples/sec Loss 8.3800 LearningRate 0.0009 Epoch: 4 Global Step: 103480 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:18,198-Speed 6309.00 samples/sec Loss 8.3142 LearningRate 0.0009 Epoch: 4 Global Step: 103490 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:21,445-Speed 6308.48 samples/sec Loss 8.5185 LearningRate 0.0009 Epoch: 4 Global Step: 103500 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 01:08:24,703-Speed 6288.21 samples/sec Loss 8.4401 LearningRate 0.0009 Epoch: 4 Global Step: 103510 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:27,950-Speed 6309.27 samples/sec Loss 8.3921 LearningRate 0.0009 Epoch: 4 Global Step: 103520 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:31,192-Speed 6317.68 samples/sec Loss 8.4571 LearningRate 0.0009 Epoch: 4 Global Step: 103530 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:34,438-Speed 6309.45 samples/sec Loss 8.4642 LearningRate 0.0009 Epoch: 4 Global Step: 103540 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:37,683-Speed 6314.18 samples/sec Loss 8.5207 LearningRate 0.0009 Epoch: 4 Global Step: 103550 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:40,931-Speed 6306.62 samples/sec Loss 8.4420 LearningRate 0.0009 Epoch: 4 Global Step: 103560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:08:44,175-Speed 6314.40 samples/sec Loss 8.4174 LearningRate 0.0009 Epoch: 4 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:47,418-Speed 6316.23 samples/sec Loss 8.4250 LearningRate 0.0009 Epoch: 4 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:50,665-Speed 6310.37 samples/sec Loss 8.3435 LearningRate 0.0009 Epoch: 4 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:53,906-Speed 6320.32 samples/sec Loss 8.4937 LearningRate 0.0009 Epoch: 4 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:08:57,153-Speed 6307.35 samples/sec Loss 8.3442 LearningRate 0.0009 Epoch: 4 Global Step: 103610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:00,403-Speed 6303.36 samples/sec Loss 8.3945 LearningRate 0.0009 Epoch: 4 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:09:03,654-Speed 6302.19 samples/sec Loss 8.5105 LearningRate 0.0009 Epoch: 4 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:06,899-Speed 6312.34 samples/sec Loss 8.4064 LearningRate 0.0009 Epoch: 4 Global Step: 103640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:10,144-Speed 6311.32 samples/sec Loss 8.4605 LearningRate 0.0009 Epoch: 4 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:13,391-Speed 6310.22 samples/sec Loss 8.4493 LearningRate 0.0009 Epoch: 4 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:16,629-Speed 6325.14 samples/sec Loss 8.4328 LearningRate 0.0009 Epoch: 4 Global Step: 103670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:19,875-Speed 6311.15 samples/sec Loss 8.4342 LearningRate 0.0009 Epoch: 4 Global Step: 103680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:09:23,124-Speed 6305.52 samples/sec Loss 8.3912 LearningRate 0.0009 Epoch: 4 Global Step: 103690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:24,130-Speed 335.71 samples/sec Loss 8.3719 LearningRate 0.0009 Epoch: 5 Global Step: 103700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:27,368-Speed 6326.77 samples/sec Loss 8.4628 LearningRate 0.0009 Epoch: 5 Global Step: 103710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:30,599-Speed 6339.33 samples/sec Loss 8.3986 LearningRate 0.0009 Epoch: 5 Global Step: 103720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:33,835-Speed 6331.61 samples/sec Loss 8.4231 LearningRate 0.0009 Epoch: 5 Global Step: 103730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:37,088-Speed 6296.36 samples/sec Loss 8.3903 LearningRate 0.0009 Epoch: 5 Global Step: 103740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:40,325-Speed 6327.39 samples/sec Loss 
8.4356 LearningRate 0.0009 Epoch: 5 Global Step: 103750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:43,566-Speed 6320.95 samples/sec Loss 8.4465 LearningRate 0.0009 Epoch: 5 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:46,788-Speed 6358.91 samples/sec Loss 8.3465 LearningRate 0.0009 Epoch: 5 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:50,030-Speed 6318.27 samples/sec Loss 8.3048 LearningRate 0.0009 Epoch: 5 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:53,269-Speed 6325.37 samples/sec Loss 8.3936 LearningRate 0.0009 Epoch: 5 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:56,508-Speed 6323.03 samples/sec Loss 8.3157 LearningRate 0.0009 Epoch: 5 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:10:59,764-Speed 6292.59 samples/sec Loss 8.4536 LearningRate 0.0009 Epoch: 5 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:03,038-Speed 6256.43 samples/sec Loss 8.4049 LearningRate 0.0009 Epoch: 5 Global Step: 103820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:06,279-Speed 6319.57 samples/sec Loss 8.4043 LearningRate 0.0009 Epoch: 5 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:09,514-Speed 6333.05 samples/sec Loss 8.3739 LearningRate 0.0009 Epoch: 5 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:12,759-Speed 6312.72 samples/sec Loss 8.4343 LearningRate 0.0009 Epoch: 5 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:15,999-Speed 6323.03 samples/sec Loss 8.4055 LearningRate 0.0009 Epoch: 5 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:19,220-Speed 6357.82 samples/sec Loss 8.3983 LearningRate 0.0009 Epoch: 5 Global 
Step: 103870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:22,458-Speed 6327.90 samples/sec Loss 8.3860 LearningRate 0.0009 Epoch: 5 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:11:25,682-Speed 6352.80 samples/sec Loss 8.3955 LearningRate 0.0009 Epoch: 5 Global Step: 103890 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:28,925-Speed 6318.01 samples/sec Loss 8.4157 LearningRate 0.0009 Epoch: 5 Global Step: 103900 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:32,161-Speed 6329.01 samples/sec Loss 8.4535 LearningRate 0.0009 Epoch: 5 Global Step: 103910 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:35,404-Speed 6315.91 samples/sec Loss 8.4140 LearningRate 0.0009 Epoch: 5 Global Step: 103920 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:38,653-Speed 6304.58 samples/sec Loss 8.3839 LearningRate 0.0009 Epoch: 5 Global Step: 103930 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:41,892-Speed 6326.37 samples/sec Loss 8.3278 LearningRate 0.0009 Epoch: 5 Global Step: 103940 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:45,134-Speed 6318.02 samples/sec Loss 8.4260 LearningRate 0.0009 Epoch: 5 Global Step: 103950 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:48,382-Speed 6306.06 samples/sec Loss 8.3331 LearningRate 0.0009 Epoch: 5 Global Step: 103960 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:51,626-Speed 6314.11 samples/sec Loss 8.3520 LearningRate 0.0009 Epoch: 5 Global Step: 103970 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:54,863-Speed 6328.15 samples/sec Loss 8.3914 LearningRate 0.0009 Epoch: 5 Global Step: 103980 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:11:58,103-Speed 6323.59 samples/sec Loss 8.3565 LearningRate 0.0009 Epoch: 5 Global Step: 103990 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:12:01,341-Speed 6326.55 samples/sec Loss 8.4086 LearningRate 0.0009 Epoch: 5 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:04,576-Speed 6332.65 samples/sec Loss 8.4303 LearningRate 0.0009 Epoch: 5 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:07,813-Speed 6327.38 samples/sec Loss 8.4043 LearningRate 0.0009 Epoch: 5 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:11,122-Speed 6191.86 samples/sec Loss 8.4459 LearningRate 0.0009 Epoch: 5 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:14,358-Speed 6329.37 samples/sec Loss 8.3316 LearningRate 0.0009 Epoch: 5 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:17,639-Speed 6243.63 samples/sec Loss 8.2993 LearningRate 0.0009 Epoch: 5 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:20,871-Speed 6337.82 samples/sec Loss 8.4017 LearningRate 0.0009 Epoch: 5 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:24,115-Speed 6315.66 samples/sec Loss 8.3725 LearningRate 0.0009 Epoch: 5 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:27,353-Speed 6324.61 samples/sec Loss 8.3263 LearningRate 0.0009 Epoch: 5 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:30,573-Speed 6363.06 samples/sec Loss 8.3695 LearningRate 0.0009 Epoch: 5 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:33,808-Speed 6331.48 samples/sec Loss 8.2624 LearningRate 0.0009 Epoch: 5 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:37,043-Speed 6332.46 samples/sec Loss 8.2875 LearningRate 0.0009 Epoch: 5 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:12:40,283-Speed 6322.36 samples/sec Loss 8.3580 LearningRate 0.0009 Epoch: 5 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:43,517-Speed 6334.74 samples/sec Loss 8.4166 LearningRate 0.0009 Epoch: 5 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:46,753-Speed 6330.27 samples/sec Loss 8.3882 LearningRate 0.0009 Epoch: 5 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:49,987-Speed 6332.74 samples/sec Loss 8.3835 LearningRate 0.0009 Epoch: 5 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:53,223-Speed 6331.15 samples/sec Loss 8.2681 LearningRate 0.0009 Epoch: 5 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:56,458-Speed 6331.66 samples/sec Loss 8.3518 LearningRate 0.0009 Epoch: 5 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:12:59,691-Speed 6335.60 samples/sec Loss 8.3534 LearningRate 0.0009 Epoch: 5 Global Step: 104180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:02,912-Speed 6359.62 samples/sec Loss 8.3563 LearningRate 0.0009 Epoch: 5 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:06,148-Speed 6330.76 samples/sec Loss 8.4016 LearningRate 0.0009 Epoch: 5 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:09,387-Speed 6324.58 samples/sec Loss 8.4074 LearningRate 0.0009 Epoch: 5 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:12,623-Speed 6330.87 samples/sec Loss 8.4274 LearningRate 0.0009 Epoch: 5 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:15,860-Speed 6328.81 samples/sec Loss 8.3316 LearningRate 0.0009 Epoch: 5 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:19,095-Speed 6331.88 samples/sec Loss 
8.4224 LearningRate 0.0009 Epoch: 5 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:22,329-Speed 6334.95 samples/sec Loss 8.2522 LearningRate 0.0009 Epoch: 5 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:25,562-Speed 6334.95 samples/sec Loss 8.4109 LearningRate 0.0009 Epoch: 5 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:28,804-Speed 6318.81 samples/sec Loss 8.3588 LearningRate 0.0009 Epoch: 5 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:32,043-Speed 6325.05 samples/sec Loss 8.4511 LearningRate 0.0009 Epoch: 5 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:35,279-Speed 6329.78 samples/sec Loss 8.4021 LearningRate 0.0009 Epoch: 5 Global Step: 104290 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 01:13:38,503-Speed 6354.21 samples/sec Loss 8.3836 LearningRate 0.0009 Epoch: 5 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:41,739-Speed 6329.59 samples/sec Loss 8.4474 LearningRate 0.0009 Epoch: 5 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:44,974-Speed 6331.22 samples/sec Loss 8.4383 LearningRate 0.0009 Epoch: 5 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:48,217-Speed 6317.13 samples/sec Loss 8.3721 LearningRate 0.0009 Epoch: 5 Global Step: 104330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:51,452-Speed 6333.15 samples/sec Loss 8.3718 LearningRate 0.0009 Epoch: 5 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:54,689-Speed 6328.34 samples/sec Loss 8.3346 LearningRate 0.0009 Epoch: 5 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:13:57,935-Speed 6310.50 samples/sec Loss 8.3783 LearningRate 0.0009 Epoch: 5 Global 
Step: 104360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:01,172-Speed 6328.14 samples/sec Loss 8.4314 LearningRate 0.0009 Epoch: 5 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:04,407-Speed 6331.80 samples/sec Loss 8.3273 LearningRate 0.0009 Epoch: 5 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:07,645-Speed 6325.33 samples/sec Loss 8.4031 LearningRate 0.0009 Epoch: 5 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:10,867-Speed 6358.29 samples/sec Loss 8.4848 LearningRate 0.0009 Epoch: 5 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:14,106-Speed 6325.64 samples/sec Loss 8.3627 LearningRate 0.0009 Epoch: 5 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:17,339-Speed 6335.48 samples/sec Loss 8.3302 LearningRate 0.0009 Epoch: 5 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:20,574-Speed 6333.52 samples/sec Loss 8.3896 LearningRate 0.0009 Epoch: 5 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:23,810-Speed 6329.06 samples/sec Loss 8.4945 LearningRate 0.0009 Epoch: 5 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:27,050-Speed 6322.56 samples/sec Loss 8.3653 LearningRate 0.0009 Epoch: 5 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:30,289-Speed 6325.03 samples/sec Loss 8.3684 LearningRate 0.0009 Epoch: 5 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:33,525-Speed 6330.53 samples/sec Loss 8.3042 LearningRate 0.0009 Epoch: 5 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:36,765-Speed 6322.45 samples/sec Loss 8.4636 LearningRate 0.0009 Epoch: 5 Global Step: 104480 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:14:40,002-Speed 6326.90 samples/sec Loss 8.3450 LearningRate 0.0009 Epoch: 5 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:43,225-Speed 6357.24 samples/sec Loss 8.4374 LearningRate 0.0009 Epoch: 5 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:46,468-Speed 6316.07 samples/sec Loss 8.4663 LearningRate 0.0009 Epoch: 5 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:49,706-Speed 6326.82 samples/sec Loss 8.2998 LearningRate 0.0009 Epoch: 5 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:52,955-Speed 6304.11 samples/sec Loss 8.3596 LearningRate 0.0009 Epoch: 5 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:56,191-Speed 6330.43 samples/sec Loss 8.3647 LearningRate 0.0009 Epoch: 5 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:14:59,427-Speed 6330.18 samples/sec Loss 8.3583 LearningRate 0.0009 Epoch: 5 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:02,666-Speed 6324.45 samples/sec Loss 8.3948 LearningRate 0.0009 Epoch: 5 Global Step: 104560 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:05,904-Speed 6327.01 samples/sec Loss 8.4119 LearningRate 0.0009 Epoch: 5 Global Step: 104570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:09,143-Speed 6323.48 samples/sec Loss 8.3668 LearningRate 0.0009 Epoch: 5 Global Step: 104580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:12,381-Speed 6325.73 samples/sec Loss 8.3822 LearningRate 0.0009 Epoch: 5 Global Step: 104590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:15,622-Speed 6320.64 samples/sec Loss 8.2982 LearningRate 0.0009 Epoch: 5 Global Step: 104600 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 
01:15:18,844-Speed 6358.35 samples/sec Loss 8.3278 LearningRate 0.0009 Epoch: 5 Global Step: 104610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:22,082-Speed 6325.74 samples/sec Loss 8.4061 LearningRate 0.0009 Epoch: 5 Global Step: 104620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:25,321-Speed 6324.32 samples/sec Loss 8.4232 LearningRate 0.0009 Epoch: 5 Global Step: 104630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:28,565-Speed 6315.80 samples/sec Loss 8.4455 LearningRate 0.0009 Epoch: 5 Global Step: 104640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:15:31,788-Speed 6355.60 samples/sec Loss 8.4371 LearningRate 0.0009 Epoch: 5 Global Step: 104650 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:35,025-Speed 6329.03 samples/sec Loss 8.3876 LearningRate 0.0009 Epoch: 5 Global Step: 104660 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:38,267-Speed 6318.08 samples/sec Loss 8.4417 LearningRate 0.0009 Epoch: 5 Global Step: 104670 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:41,504-Speed 6328.91 samples/sec Loss 8.3942 LearningRate 0.0009 Epoch: 5 Global Step: 104680 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:44,744-Speed 6320.89 samples/sec Loss 8.4246 LearningRate 0.0009 Epoch: 5 Global Step: 104690 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:47,983-Speed 6325.54 samples/sec Loss 8.4301 LearningRate 0.0009 Epoch: 5 Global Step: 104700 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:51,219-Speed 6329.23 samples/sec Loss 8.3798 LearningRate 0.0009 Epoch: 5 Global Step: 104710 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:54,460-Speed 6320.95 samples/sec Loss 8.4042 LearningRate 0.0009 Epoch: 5 Global Step: 104720 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:15:57,701-Speed 6320.00 samples/sec Loss 
8.3913 LearningRate 0.0009 Epoch: 5 Global Step: 104730 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:00,939-Speed 6326.39 samples/sec Loss 8.3380 LearningRate 0.0009 Epoch: 5 Global Step: 104740 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:04,180-Speed 6321.12 samples/sec Loss 8.3020 LearningRate 0.0009 Epoch: 5 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:07,420-Speed 6322.61 samples/sec Loss 8.3846 LearningRate 0.0009 Epoch: 5 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:10,660-Speed 6321.42 samples/sec Loss 8.3235 LearningRate 0.0009 Epoch: 5 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:13,917-Speed 6289.67 samples/sec Loss 8.3063 LearningRate 0.0009 Epoch: 5 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:17,160-Speed 6316.62 samples/sec Loss 8.3297 LearningRate 0.0009 Epoch: 5 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:20,400-Speed 6321.09 samples/sec Loss 8.3985 LearningRate 0.0009 Epoch: 5 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:23,640-Speed 6323.66 samples/sec Loss 8.3344 LearningRate 0.0009 Epoch: 5 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:26,891-Speed 6301.00 samples/sec Loss 8.3162 LearningRate 0.0009 Epoch: 5 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:30,130-Speed 6323.58 samples/sec Loss 8.3794 LearningRate 0.0009 Epoch: 5 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:33,372-Speed 6319.07 samples/sec Loss 8.2980 LearningRate 0.0009 Epoch: 5 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:36,602-Speed 6341.95 samples/sec Loss 8.4796 LearningRate 0.0009 Epoch: 5 Global 
Step: 104850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:39,843-Speed 6320.73 samples/sec Loss 8.3373 LearningRate 0.0009 Epoch: 5 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:43,086-Speed 6317.59 samples/sec Loss 8.3827 LearningRate 0.0009 Epoch: 5 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:16:46,318-Speed 6337.45 samples/sec Loss 8.4072 LearningRate 0.0009 Epoch: 5 Global Step: 104880 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:49,555-Speed 6328.37 samples/sec Loss 8.3546 LearningRate 0.0009 Epoch: 5 Global Step: 104890 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:52,792-Speed 6329.34 samples/sec Loss 8.3878 LearningRate 0.0009 Epoch: 5 Global Step: 104900 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:56,036-Speed 6312.64 samples/sec Loss 8.4531 LearningRate 0.0009 Epoch: 5 Global Step: 104910 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:16:59,276-Speed 6323.70 samples/sec Loss 8.3899 LearningRate 0.0009 Epoch: 5 Global Step: 104920 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:02,518-Speed 6319.16 samples/sec Loss 8.3646 LearningRate 0.0009 Epoch: 5 Global Step: 104930 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:05,757-Speed 6323.58 samples/sec Loss 8.3942 LearningRate 0.0009 Epoch: 5 Global Step: 104940 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:08,995-Speed 6325.29 samples/sec Loss 8.3509 LearningRate 0.0009 Epoch: 5 Global Step: 104950 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:12,238-Speed 6316.43 samples/sec Loss 8.3833 LearningRate 0.0009 Epoch: 5 Global Step: 104960 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:15,477-Speed 6325.95 samples/sec Loss 8.3958 LearningRate 0.0009 Epoch: 5 Global Step: 104970 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 01:17:18,719-Speed 6316.69 samples/sec Loss 8.3164 LearningRate 0.0009 Epoch: 5 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:17:21,963-Speed 6314.82 samples/sec Loss 8.3085 LearningRate 0.0009 Epoch: 5 Global Step: 104990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:17:25,204-Speed 6321.01 samples/sec Loss 8.3465 LearningRate 0.0009 Epoch: 5 Global Step: 105000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:17:28,431-Speed 6348.90 samples/sec Loss 8.3883 LearningRate 0.0009 Epoch: 5 Global Step: 105010 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:31,671-Speed 6322.37 samples/sec Loss 8.2588 LearningRate 0.0009 Epoch: 5 Global Step: 105020 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:34,911-Speed 6320.93 samples/sec Loss 8.3623 LearningRate 0.0009 Epoch: 5 Global Step: 105030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:38,154-Speed 6316.11 samples/sec Loss 8.3089 LearningRate 0.0009 Epoch: 5 Global Step: 105040 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:41,395-Speed 6322.64 samples/sec Loss 8.4322 LearningRate 0.0009 Epoch: 5 Global Step: 105050 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:44,633-Speed 6326.89 samples/sec Loss 8.3289 LearningRate 0.0009 Epoch: 5 Global Step: 105060 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:47,871-Speed 6325.06 samples/sec Loss 8.3022 LearningRate 0.0009 Epoch: 5 Global Step: 105070 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:51,116-Speed 6312.13 samples/sec Loss 8.3413 LearningRate 0.0009 Epoch: 5 Global Step: 105080 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:17:54,353-Speed 6330.22 samples/sec Loss 8.3302 LearningRate 0.0009 Epoch: 5 Global Step: 105090 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:17:57,598-Speed 6312.12 samples/sec Loss 8.3281 LearningRate 0.0009 Epoch: 5 Global Step: 105100 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:18:00,840-Speed 6318.31 samples/sec Loss 8.3834 LearningRate 0.0009 Epoch: 5 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:04,082-Speed 6319.11 samples/sec Loss 8.3420 LearningRate 0.0009 Epoch: 5 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:07,324-Speed 6318.37 samples/sec Loss 8.3874 LearningRate 0.0009 Epoch: 5 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:10,564-Speed 6322.51 samples/sec Loss 8.4245 LearningRate 0.0009 Epoch: 5 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:13,813-Speed 6304.41 samples/sec Loss 8.3320 LearningRate 0.0009 Epoch: 5 Global Step: 105150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:17,056-Speed 6317.00 samples/sec Loss 8.3651 LearningRate 0.0009 Epoch: 5 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:20,297-Speed 6320.23 samples/sec Loss 8.3415 LearningRate 0.0009 Epoch: 5 Global Step: 105170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:23,540-Speed 6315.29 samples/sec Loss 8.3076 LearningRate 0.0009 Epoch: 5 Global Step: 105180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:26,779-Speed 6325.23 samples/sec Loss 8.4109 LearningRate 0.0009 Epoch: 5 Global Step: 105190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:30,023-Speed 6315.07 samples/sec Loss 8.3681 LearningRate 0.0009 Epoch: 5 Global Step: 105200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:33,250-Speed 6347.78 samples/sec Loss 8.3384 LearningRate 0.0009 Epoch: 5 Global Step: 105210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:36,492-Speed 6318.00 samples/sec Loss 
8.2984 LearningRate 0.0009 Epoch: 5 Global Step: 105220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:39,731-Speed 6323.54 samples/sec Loss 8.3912 LearningRate 0.0009 Epoch: 5 Global Step: 105230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:42,973-Speed 6318.81 samples/sec Loss 8.3303 LearningRate 0.0009 Epoch: 5 Global Step: 105240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:46,216-Speed 6316.64 samples/sec Loss 8.2893 LearningRate 0.0009 Epoch: 5 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:49,461-Speed 6312.91 samples/sec Loss 8.2937 LearningRate 0.0009 Epoch: 5 Global Step: 105260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:52,720-Speed 6286.73 samples/sec Loss 8.4108 LearningRate 0.0009 Epoch: 5 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:55,960-Speed 6321.50 samples/sec Loss 8.3120 LearningRate 0.0009 Epoch: 5 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:18:59,206-Speed 6312.04 samples/sec Loss 8.3888 LearningRate 0.0009 Epoch: 5 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:02,449-Speed 6315.93 samples/sec Loss 8.3490 LearningRate 0.0009 Epoch: 5 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:05,678-Speed 6344.35 samples/sec Loss 8.4038 LearningRate 0.0009 Epoch: 5 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:08,920-Speed 6317.17 samples/sec Loss 8.3364 LearningRate 0.0009 Epoch: 5 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:12,171-Speed 6302.71 samples/sec Loss 8.3553 LearningRate 0.0009 Epoch: 5 Global Step: 105330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:15,414-Speed 6315.60 samples/sec Loss 8.2788 LearningRate 0.0009 Epoch: 5 Global 
Step: 105340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:18,654-Speed 6322.74 samples/sec Loss 8.3800 LearningRate 0.0009 Epoch: 5 Global Step: 105350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:21,895-Speed 6321.02 samples/sec Loss 8.3501 LearningRate 0.0009 Epoch: 5 Global Step: 105360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:25,139-Speed 6314.43 samples/sec Loss 8.3683 LearningRate 0.0009 Epoch: 5 Global Step: 105370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:28,382-Speed 6316.67 samples/sec Loss 8.2873 LearningRate 0.0009 Epoch: 5 Global Step: 105380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:31,624-Speed 6318.29 samples/sec Loss 8.3608 LearningRate 0.0009 Epoch: 5 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:34,868-Speed 6313.81 samples/sec Loss 8.3415 LearningRate 0.0009 Epoch: 5 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:19:38,094-Speed 6349.14 samples/sec Loss 8.3010 LearningRate 0.0009 Epoch: 5 Global Step: 105410 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:19:41,336-Speed 6318.88 samples/sec Loss 8.2796 LearningRate 0.0009 Epoch: 5 Global Step: 105420 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:19:44,575-Speed 6324.83 samples/sec Loss 8.4108 LearningRate 0.0009 Epoch: 5 Global Step: 105430 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:19:47,820-Speed 6312.27 samples/sec Loss 8.3381 LearningRate 0.0009 Epoch: 5 Global Step: 105440 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:19:51,062-Speed 6317.33 samples/sec Loss 8.3932 LearningRate 0.0009 Epoch: 5 Global Step: 105450 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:19:54,310-Speed 6307.78 samples/sec Loss 8.3239 LearningRate 0.0009 Epoch: 5 Global Step: 105460 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 01:19:57,555-Speed 6312.27 samples/sec Loss 8.2642 LearningRate 0.0009 Epoch: 5 Global Step: 105470 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:20:00,796-Speed 6322.28 samples/sec Loss 8.2293 LearningRate 0.0009 Epoch: 5 Global Step: 105480 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:20:04,040-Speed 6314.01 samples/sec Loss 8.3764 LearningRate 0.0009 Epoch: 5 Global Step: 105490 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:20:07,283-Speed 6317.44 samples/sec Loss 8.3952 LearningRate 0.0009 Epoch: 5 Global Step: 105500 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:20:10,526-Speed 6315.33 samples/sec Loss 8.3978 LearningRate 0.0009 Epoch: 5 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:13,768-Speed 6319.15 samples/sec Loss 8.4312 LearningRate 0.0009 Epoch: 5 Global Step: 105520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:17,008-Speed 6323.20 samples/sec Loss 8.3216 LearningRate 0.0009 Epoch: 5 Global Step: 105530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:20,250-Speed 6318.30 samples/sec Loss 8.2889 LearningRate 0.0009 Epoch: 5 Global Step: 105540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:23,490-Speed 6322.52 samples/sec Loss 8.2899 LearningRate 0.0009 Epoch: 5 Global Step: 105550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:26,733-Speed 6316.41 samples/sec Loss 8.3002 LearningRate 0.0009 Epoch: 5 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:29,986-Speed 6297.62 samples/sec Loss 8.3478 LearningRate 0.0009 Epoch: 5 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:33,228-Speed 6316.93 samples/sec Loss 8.3779 LearningRate 0.0009 Epoch: 5 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:20:36,480-Speed 6300.38 samples/sec Loss 8.3428 LearningRate 0.0009 Epoch: 5 Global Step: 105590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:39,721-Speed 6318.89 samples/sec Loss 8.3528 LearningRate 0.0009 Epoch: 5 Global Step: 105600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:42,952-Speed 6341.43 samples/sec Loss 8.3345 LearningRate 0.0009 Epoch: 5 Global Step: 105610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:46,202-Speed 6302.60 samples/sec Loss 8.3029 LearningRate 0.0009 Epoch: 5 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:49,443-Speed 6320.46 samples/sec Loss 8.3174 LearningRate 0.0009 Epoch: 5 Global Step: 105630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:52,684-Speed 6319.64 samples/sec Loss 8.2981 LearningRate 0.0009 Epoch: 5 Global Step: 105640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:55,925-Speed 6320.69 samples/sec Loss 8.2757 LearningRate 0.0009 Epoch: 5 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:20:59,167-Speed 6319.05 samples/sec Loss 8.2868 LearningRate 0.0009 Epoch: 5 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:02,406-Speed 6324.06 samples/sec Loss 8.4427 LearningRate 0.0009 Epoch: 5 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:05,647-Speed 6320.61 samples/sec Loss 8.3589 LearningRate 0.0009 Epoch: 5 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:08,893-Speed 6310.54 samples/sec Loss 8.2764 LearningRate 0.0009 Epoch: 5 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:12,137-Speed 6315.19 samples/sec Loss 8.4066 LearningRate 0.0009 Epoch: 5 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:15,364-Speed 6347.42 samples/sec Loss 
8.2526 LearningRate 0.0009 Epoch: 5 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:18,606-Speed 6318.88 samples/sec Loss 8.3701 LearningRate 0.0009 Epoch: 5 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:21,849-Speed 6317.10 samples/sec Loss 8.2714 LearningRate 0.0009 Epoch: 5 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:25,089-Speed 6322.12 samples/sec Loss 8.2720 LearningRate 0.0009 Epoch: 5 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:28,336-Speed 6308.81 samples/sec Loss 8.2750 LearningRate 0.0009 Epoch: 5 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:31,579-Speed 6317.70 samples/sec Loss 8.2828 LearningRate 0.0009 Epoch: 5 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:34,822-Speed 6316.41 samples/sec Loss 8.1829 LearningRate 0.0009 Epoch: 5 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:38,064-Speed 6317.55 samples/sec Loss 8.3592 LearningRate 0.0009 Epoch: 5 Global Step: 105780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:41,305-Speed 6321.06 samples/sec Loss 8.3626 LearningRate 0.0009 Epoch: 5 Global Step: 105790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:44,550-Speed 6312.06 samples/sec Loss 8.2889 LearningRate 0.0009 Epoch: 5 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:47,775-Speed 6351.73 samples/sec Loss 8.3085 LearningRate 0.0009 Epoch: 5 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:51,017-Speed 6319.10 samples/sec Loss 8.3494 LearningRate 0.0009 Epoch: 5 Global Step: 105820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:54,266-Speed 6304.55 samples/sec Loss 8.3413 LearningRate 0.0009 Epoch: 5 Global 
Step: 105830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:21:57,508-Speed 6318.78 samples/sec Loss 8.2979 LearningRate 0.0009 Epoch: 5 Global Step: 105840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:00,755-Speed 6308.93 samples/sec Loss 8.4034 LearningRate 0.0009 Epoch: 5 Global Step: 105850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:04,003-Speed 6310.85 samples/sec Loss 8.3946 LearningRate 0.0009 Epoch: 5 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:07,244-Speed 6320.04 samples/sec Loss 8.4056 LearningRate 0.0009 Epoch: 5 Global Step: 105870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:10,505-Speed 6281.85 samples/sec Loss 8.3159 LearningRate 0.0009 Epoch: 5 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:13,750-Speed 6313.24 samples/sec Loss 8.3967 LearningRate 0.0009 Epoch: 5 Global Step: 105890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:16,992-Speed 6318.30 samples/sec Loss 8.3997 LearningRate 0.0009 Epoch: 5 Global Step: 105900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:20,225-Speed 6336.41 samples/sec Loss 8.3543 LearningRate 0.0009 Epoch: 5 Global Step: 105910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:23,469-Speed 6313.65 samples/sec Loss 8.3240 LearningRate 0.0009 Epoch: 5 Global Step: 105920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:26,716-Speed 6309.51 samples/sec Loss 8.2423 LearningRate 0.0009 Epoch: 5 Global Step: 105930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:29,962-Speed 6311.37 samples/sec Loss 8.3075 LearningRate 0.0009 Epoch: 5 Global Step: 105940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:33,207-Speed 6312.95 samples/sec Loss 8.2417 LearningRate 0.0009 Epoch: 5 Global Step: 105950 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:22:36,448-Speed 6319.14 samples/sec Loss 8.3101 LearningRate 0.0009 Epoch: 5 Global Step: 105960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:39,694-Speed 6311.72 samples/sec Loss 8.2925 LearningRate 0.0009 Epoch: 5 Global Step: 105970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:42,934-Speed 6321.39 samples/sec Loss 8.2603 LearningRate 0.0009 Epoch: 5 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:46,181-Speed 6308.20 samples/sec Loss 8.2882 LearningRate 0.0009 Epoch: 5 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:49,424-Speed 6318.07 samples/sec Loss 8.3433 LearningRate 0.0009 Epoch: 5 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:52,652-Speed 6345.41 samples/sec Loss 8.3163 LearningRate 0.0009 Epoch: 5 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:55,898-Speed 6311.23 samples/sec Loss 8.4159 LearningRate 0.0009 Epoch: 5 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:22:59,139-Speed 6320.50 samples/sec Loss 8.3878 LearningRate 0.0009 Epoch: 5 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:02,382-Speed 6315.92 samples/sec Loss 8.2095 LearningRate 0.0009 Epoch: 5 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:05,626-Speed 6314.19 samples/sec Loss 8.3844 LearningRate 0.0009 Epoch: 5 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:08,879-Speed 6296.79 samples/sec Loss 8.4155 LearningRate 0.0009 Epoch: 5 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:12,126-Speed 6310.10 samples/sec Loss 8.3495 LearningRate 0.0009 Epoch: 5 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:23:15,376-Speed 6301.06 samples/sec Loss 8.3050 LearningRate 0.0009 Epoch: 5 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:18,622-Speed 6311.71 samples/sec Loss 8.3116 LearningRate 0.0009 Epoch: 5 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:21,866-Speed 6313.99 samples/sec Loss 8.3463 LearningRate 0.0009 Epoch: 5 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:25,098-Speed 6339.08 samples/sec Loss 8.3788 LearningRate 0.0009 Epoch: 5 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:28,341-Speed 6316.18 samples/sec Loss 8.3699 LearningRate 0.0009 Epoch: 5 Global Step: 106120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:31,584-Speed 6317.40 samples/sec Loss 8.2832 LearningRate 0.0009 Epoch: 5 Global Step: 106130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:34,826-Speed 6318.79 samples/sec Loss 8.1873 LearningRate 0.0009 Epoch: 5 Global Step: 106140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:38,068-Speed 6318.07 samples/sec Loss 8.3407 LearningRate 0.0009 Epoch: 5 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:41,314-Speed 6310.98 samples/sec Loss 8.2400 LearningRate 0.0009 Epoch: 5 Global Step: 106160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:44,556-Speed 6318.79 samples/sec Loss 8.3050 LearningRate 0.0009 Epoch: 5 Global Step: 106170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:47,802-Speed 6311.65 samples/sec Loss 8.2789 LearningRate 0.0009 Epoch: 5 Global Step: 106180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:51,043-Speed 6319.84 samples/sec Loss 8.2886 LearningRate 0.0009 Epoch: 5 Global Step: 106190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:54,299-Speed 6290.48 samples/sec Loss 
8.3159 LearningRate 0.0009 Epoch: 5 Global Step: 106200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:23:57,531-Speed 6339.37 samples/sec Loss 8.2769 LearningRate 0.0009 Epoch: 5 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:00,776-Speed 6311.24 samples/sec Loss 8.3299 LearningRate 0.0009 Epoch: 5 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:04,021-Speed 6313.25 samples/sec Loss 8.2608 LearningRate 0.0009 Epoch: 5 Global Step: 106230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:07,266-Speed 6313.57 samples/sec Loss 8.3560 LearningRate 0.0009 Epoch: 5 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:10,510-Speed 6313.44 samples/sec Loss 8.3001 LearningRate 0.0009 Epoch: 5 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:13,757-Speed 6308.28 samples/sec Loss 8.3397 LearningRate 0.0009 Epoch: 5 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:17,010-Speed 6297.84 samples/sec Loss 8.2789 LearningRate 0.0009 Epoch: 5 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:20,256-Speed 6309.81 samples/sec Loss 8.3529 LearningRate 0.0009 Epoch: 5 Global Step: 106280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:23,500-Speed 6314.77 samples/sec Loss 8.3055 LearningRate 0.0009 Epoch: 5 Global Step: 106290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:26,744-Speed 6314.20 samples/sec Loss 8.2087 LearningRate 0.0009 Epoch: 5 Global Step: 106300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:29,974-Speed 6342.97 samples/sec Loss 8.4078 LearningRate 0.0009 Epoch: 5 Global Step: 106310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:24:33,202-Speed 6346.79 samples/sec Loss 8.2812 LearningRate 0.0009 Epoch: 5 Global 
Step: 106320 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:36,446-Speed 6315.32 samples/sec Loss 8.2926 LearningRate 0.0009 Epoch: 5 Global Step: 106330 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:39,691-Speed 6312.08 samples/sec Loss 8.2857 LearningRate 0.0009 Epoch: 5 Global Step: 106340 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:42,941-Speed 6302.70 samples/sec Loss 8.2409 LearningRate 0.0009 Epoch: 5 Global Step: 106350 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:46,182-Speed 6320.58 samples/sec Loss 8.3252 LearningRate 0.0009 Epoch: 5 Global Step: 106360 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:49,425-Speed 6317.54 samples/sec Loss 8.1919 LearningRate 0.0009 Epoch: 5 Global Step: 106370 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:52,669-Speed 6314.68 samples/sec Loss 8.2946 LearningRate 0.0009 Epoch: 5 Global Step: 106380 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:55,909-Speed 6320.36 samples/sec Loss 8.2461 LearningRate 0.0009 Epoch: 5 Global Step: 106390 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:24:59,150-Speed 6320.78 samples/sec Loss 8.3011 LearningRate 0.0009 Epoch: 5 Global Step: 106400 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:25:02,396-Speed 6312.31 samples/sec Loss 8.2825 LearningRate 0.0009 Epoch: 5 Global Step: 106410 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:25:05,639-Speed 6316.16 samples/sec Loss 8.3338 LearningRate 0.0009 Epoch: 5 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:08,883-Speed 6313.24 samples/sec Loss 8.2580 LearningRate 0.0009 Epoch: 5 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:12,128-Speed 6314.18 samples/sec Loss 8.2935 LearningRate 0.0009 Epoch: 5 Global Step: 106440 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:25:15,374-Speed 6309.76 samples/sec Loss 8.3443 LearningRate 0.0009 Epoch: 5 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:18,617-Speed 6317.26 samples/sec Loss 8.2834 LearningRate 0.0009 Epoch: 5 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:21,863-Speed 6309.13 samples/sec Loss 8.2516 LearningRate 0.0009 Epoch: 5 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:25,111-Speed 6308.03 samples/sec Loss 8.3030 LearningRate 0.0009 Epoch: 5 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:28,356-Speed 6313.33 samples/sec Loss 8.4082 LearningRate 0.0009 Epoch: 5 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:31,599-Speed 6316.38 samples/sec Loss 8.3490 LearningRate 0.0009 Epoch: 5 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:34,842-Speed 6316.01 samples/sec Loss 8.3235 LearningRate 0.0009 Epoch: 5 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:38,074-Speed 6338.08 samples/sec Loss 8.3414 LearningRate 0.0009 Epoch: 5 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:41,321-Speed 6309.15 samples/sec Loss 8.3475 LearningRate 0.0009 Epoch: 5 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:44,566-Speed 6311.68 samples/sec Loss 8.3420 LearningRate 0.0009 Epoch: 5 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:25:47,801-Speed 6333.04 samples/sec Loss 8.3180 LearningRate 0.0009 Epoch: 5 Global Step: 106550 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:25:51,046-Speed 6313.49 samples/sec Loss 8.3193 LearningRate 0.0009 Epoch: 5 Global Step: 106560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:25:54,288-Speed 6318.53 samples/sec Loss 8.2525 LearningRate 0.0009 Epoch: 5 Global Step: 106570 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:25:57,530-Speed 6317.94 samples/sec Loss 8.2920 LearningRate 0.0009 Epoch: 5 Global Step: 106580 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:00,776-Speed 6310.94 samples/sec Loss 8.2043 LearningRate 0.0009 Epoch: 5 Global Step: 106590 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:04,023-Speed 6308.78 samples/sec Loss 8.3021 LearningRate 0.0009 Epoch: 5 Global Step: 106600 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:07,267-Speed 6314.27 samples/sec Loss 8.2937 LearningRate 0.0009 Epoch: 5 Global Step: 106610 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:10,517-Speed 6302.74 samples/sec Loss 8.2040 LearningRate 0.0009 Epoch: 5 Global Step: 106620 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:13,759-Speed 6319.58 samples/sec Loss 8.3403 LearningRate 0.0009 Epoch: 5 Global Step: 106630 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:17,004-Speed 6312.94 samples/sec Loss 8.3253 LearningRate 0.0009 Epoch: 5 Global Step: 106640 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:26:20,245-Speed 6318.62 samples/sec Loss 8.3163 LearningRate 0.0009 Epoch: 5 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:23,493-Speed 6308.50 samples/sec Loss 8.3300 LearningRate 0.0009 Epoch: 5 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:26,742-Speed 6303.95 samples/sec Loss 8.2997 LearningRate 0.0009 Epoch: 5 Global Step: 106670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:29,986-Speed 6314.91 samples/sec Loss 8.2740 LearningRate 0.0009 Epoch: 5 Global Step: 106680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:33,230-Speed 6314.68 samples/sec Loss 
8.2724 LearningRate 0.0009 Epoch: 5 Global Step: 106690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:36,473-Speed 6315.60 samples/sec Loss 8.2942 LearningRate 0.0009 Epoch: 5 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:39,767-Speed 6218.84 samples/sec Loss 8.3562 LearningRate 0.0009 Epoch: 5 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:43,013-Speed 6310.78 samples/sec Loss 8.2457 LearningRate 0.0009 Epoch: 5 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:46,256-Speed 6316.47 samples/sec Loss 8.2893 LearningRate 0.0009 Epoch: 5 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:49,504-Speed 6306.76 samples/sec Loss 8.2797 LearningRate 0.0009 Epoch: 5 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:52,734-Speed 6342.49 samples/sec Loss 8.3259 LearningRate 0.0009 Epoch: 5 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:55,983-Speed 6304.80 samples/sec Loss 8.3303 LearningRate 0.0009 Epoch: 5 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:26:59,228-Speed 6312.61 samples/sec Loss 8.2738 LearningRate 0.0009 Epoch: 5 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:02,474-Speed 6311.71 samples/sec Loss 8.2685 LearningRate 0.0009 Epoch: 5 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:05,735-Speed 6281.74 samples/sec Loss 8.2971 LearningRate 0.0009 Epoch: 5 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:08,979-Speed 6315.45 samples/sec Loss 8.2887 LearningRate 0.0009 Epoch: 5 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:12,224-Speed 6312.09 samples/sec Loss 8.4009 LearningRate 0.0009 Epoch: 5 Global 
Step: 106810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:15,468-Speed 6313.85 samples/sec Loss 8.3423 LearningRate 0.0009 Epoch: 5 Global Step: 106820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:18,715-Speed 6309.32 samples/sec Loss 8.2599 LearningRate 0.0009 Epoch: 5 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:21,962-Speed 6307.62 samples/sec Loss 8.3261 LearningRate 0.0009 Epoch: 5 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:27:25,176-Speed 6373.74 samples/sec Loss 8.3756 LearningRate 0.0009 Epoch: 5 Global Step: 106850 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:28,425-Speed 6306.47 samples/sec Loss 8.3825 LearningRate 0.0009 Epoch: 5 Global Step: 106860 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:31,670-Speed 6312.35 samples/sec Loss 8.3513 LearningRate 0.0009 Epoch: 5 Global Step: 106870 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:34,913-Speed 6315.27 samples/sec Loss 8.2839 LearningRate 0.0009 Epoch: 5 Global Step: 106880 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:38,161-Speed 6306.75 samples/sec Loss 8.2286 LearningRate 0.0009 Epoch: 5 Global Step: 106890 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:41,407-Speed 6311.22 samples/sec Loss 8.3004 LearningRate 0.0009 Epoch: 5 Global Step: 106900 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:44,652-Speed 6312.49 samples/sec Loss 8.2668 LearningRate 0.0009 Epoch: 5 Global Step: 106910 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:47,895-Speed 6316.98 samples/sec Loss 8.3192 LearningRate 0.0009 Epoch: 5 Global Step: 106920 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:51,147-Speed 6299.41 samples/sec Loss 8.3556 LearningRate 0.0009 Epoch: 5 Global Step: 106930 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 01:27:54,389-Speed 6317.83 samples/sec Loss 8.3920 LearningRate 0.0009 Epoch: 5 Global Step: 106940 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:27:57,639-Speed 6303.64 samples/sec Loss 8.3010 LearningRate 0.0009 Epoch: 5 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:00,885-Speed 6311.02 samples/sec Loss 8.2568 LearningRate 0.0009 Epoch: 5 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:04,132-Speed 6309.41 samples/sec Loss 8.3061 LearningRate 0.0009 Epoch: 5 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:07,365-Speed 6334.88 samples/sec Loss 8.2696 LearningRate 0.0009 Epoch: 5 Global Step: 106980 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:10,610-Speed 6313.95 samples/sec Loss 8.2472 LearningRate 0.0009 Epoch: 5 Global Step: 106990 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:13,856-Speed 6311.14 samples/sec Loss 8.2724 LearningRate 0.0009 Epoch: 5 Global Step: 107000 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:17,099-Speed 6315.36 samples/sec Loss 8.2109 LearningRate 0.0009 Epoch: 5 Global Step: 107010 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:20,346-Speed 6308.74 samples/sec Loss 8.3131 LearningRate 0.0009 Epoch: 5 Global Step: 107020 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:23,599-Speed 6297.96 samples/sec Loss 8.2346 LearningRate 0.0009 Epoch: 5 Global Step: 107030 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:26,845-Speed 6310.63 samples/sec Loss 8.1947 LearningRate 0.0009 Epoch: 5 Global Step: 107040 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:30,092-Speed 6307.79 samples/sec Loss 8.2958 LearningRate 0.0009 Epoch: 5 Global Step: 107050 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:28:33,338-Speed 6312.01 samples/sec Loss 8.1950 LearningRate 0.0009 Epoch: 5 Global Step: 107060 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:36,584-Speed 6309.64 samples/sec Loss 8.2989 LearningRate 0.0009 Epoch: 5 Global Step: 107070 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:28:39,830-Speed 6311.98 samples/sec Loss 8.2345 LearningRate 0.0009 Epoch: 5 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:43,074-Speed 6313.63 samples/sec Loss 8.1965 LearningRate 0.0009 Epoch: 5 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:46,326-Speed 6298.92 samples/sec Loss 8.2476 LearningRate 0.0009 Epoch: 5 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:49,569-Speed 6315.98 samples/sec Loss 8.2302 LearningRate 0.0009 Epoch: 5 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:52,813-Speed 6316.09 samples/sec Loss 8.2671 LearningRate 0.0009 Epoch: 5 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:56,058-Speed 6311.81 samples/sec Loss 8.2955 LearningRate 0.0009 Epoch: 5 Global Step: 107130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:28:59,308-Speed 6302.74 samples/sec Loss 8.1913 LearningRate 0.0009 Epoch: 5 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:02,556-Speed 6305.99 samples/sec Loss 8.2318 LearningRate 0.0009 Epoch: 5 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:05,805-Speed 6306.49 samples/sec Loss 8.2108 LearningRate 0.0009 Epoch: 5 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:09,051-Speed 6310.02 samples/sec Loss 8.2248 LearningRate 0.0009 Epoch: 5 Global Step: 107170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:12,294-Speed 6316.97 samples/sec Loss 
8.2550 LearningRate 0.0009 Epoch: 5 Global Step: 107180 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 01:29:15,524-Speed 6342.95 samples/sec Loss 8.3150 LearningRate 0.0009 Epoch: 5 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:18,770-Speed 6310.57 samples/sec Loss 8.2165 LearningRate 0.0009 Epoch: 5 Global Step: 107200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:22,015-Speed 6311.59 samples/sec Loss 8.3308 LearningRate 0.0009 Epoch: 5 Global Step: 107210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:25,259-Speed 6314.83 samples/sec Loss 8.2703 LearningRate 0.0009 Epoch: 5 Global Step: 107220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:28,514-Speed 6293.86 samples/sec Loss 8.3391 LearningRate 0.0009 Epoch: 5 Global Step: 107230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:29:31,745-Speed 6339.94 samples/sec Loss 8.3322 LearningRate 0.0009 Epoch: 5 Global Step: 107240 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:34,993-Speed 6307.83 samples/sec Loss 8.2224 LearningRate 0.0009 Epoch: 5 Global Step: 107250 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:38,239-Speed 6310.08 samples/sec Loss 8.2920 LearningRate 0.0009 Epoch: 5 Global Step: 107260 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:41,486-Speed 6307.96 samples/sec Loss 8.2409 LearningRate 0.0009 Epoch: 5 Global Step: 107270 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:44,729-Speed 6316.89 samples/sec Loss 8.3182 LearningRate 0.0009 Epoch: 5 Global Step: 107280 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:47,973-Speed 6315.06 samples/sec Loss 8.2920 LearningRate 0.0009 Epoch: 5 Global Step: 107290 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:51,221-Speed 6305.48 samples/sec Loss 8.2934 LearningRate 0.0009 Epoch: 5 Global 
Step: 107300 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:54,465-Speed 6315.08 samples/sec Loss 8.2852 LearningRate 0.0009 Epoch: 5 Global Step: 107310 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:29:57,709-Speed 6315.40 samples/sec Loss 8.2388 LearningRate 0.0009 Epoch: 5 Global Step: 107320 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:00,955-Speed 6310.85 samples/sec Loss 8.2531 LearningRate 0.0009 Epoch: 5 Global Step: 107330 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:04,205-Speed 6302.47 samples/sec Loss 8.2505 LearningRate 0.0009 Epoch: 5 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:07,451-Speed 6311.15 samples/sec Loss 8.2780 LearningRate 0.0009 Epoch: 5 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:10,698-Speed 6307.88 samples/sec Loss 8.2660 LearningRate 0.0009 Epoch: 5 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:13,952-Speed 6295.79 samples/sec Loss 8.2599 LearningRate 0.0009 Epoch: 5 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:17,206-Speed 6296.65 samples/sec Loss 8.2315 LearningRate 0.0009 Epoch: 5 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:20,450-Speed 6314.23 samples/sec Loss 8.2695 LearningRate 0.0009 Epoch: 5 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:23,702-Speed 6299.43 samples/sec Loss 8.2812 LearningRate 0.0009 Epoch: 5 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:26,947-Speed 6312.94 samples/sec Loss 8.2214 LearningRate 0.0009 Epoch: 5 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:30,191-Speed 6312.89 samples/sec Loss 8.2302 LearningRate 0.0009 Epoch: 5 Global Step: 107420 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:30:33,443-Speed 6300.65 samples/sec Loss 8.2825 LearningRate 0.0009 Epoch: 5 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:36,677-Speed 6334.39 samples/sec Loss 8.3053 LearningRate 0.0009 Epoch: 5 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:39,924-Speed 6308.34 samples/sec Loss 8.2451 LearningRate 0.0009 Epoch: 5 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:43,173-Speed 6303.76 samples/sec Loss 8.1881 LearningRate 0.0009 Epoch: 5 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:30:46,409-Speed 6330.95 samples/sec Loss 8.1660 LearningRate 0.0009 Epoch: 5 Global Step: 107470 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:49,656-Speed 6308.01 samples/sec Loss 8.2853 LearningRate 0.0009 Epoch: 5 Global Step: 107480 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:52,903-Speed 6309.80 samples/sec Loss 8.2682 LearningRate 0.0009 Epoch: 5 Global Step: 107490 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:56,141-Speed 6326.22 samples/sec Loss 8.2844 LearningRate 0.0009 Epoch: 5 Global Step: 107500 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:30:59,386-Speed 6312.02 samples/sec Loss 8.2537 LearningRate 0.0009 Epoch: 5 Global Step: 107510 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:31:02,634-Speed 6306.65 samples/sec Loss 8.2056 LearningRate 0.0009 Epoch: 5 Global Step: 107520 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:31:05,884-Speed 6302.62 samples/sec Loss 8.2607 LearningRate 0.0009 Epoch: 5 Global Step: 107530 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:31:09,129-Speed 6312.81 samples/sec Loss 8.1804 LearningRate 0.0009 Epoch: 5 Global Step: 107540 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 
01:31:12,379-Speed 6303.40 samples/sec Loss 8.3014 LearningRate 0.0009 Epoch: 5 Global Step: 107550 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:31:15,624-Speed 6313.44 samples/sec Loss 8.2416 LearningRate 0.0009 Epoch: 5 Global Step: 107560 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:31:18,868-Speed 6313.85 samples/sec Loss 8.2687 LearningRate 0.0009 Epoch: 5 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:22,115-Speed 6309.87 samples/sec Loss 8.2570 LearningRate 0.0009 Epoch: 5 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:25,360-Speed 6312.69 samples/sec Loss 8.2313 LearningRate 0.0009 Epoch: 5 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:28,608-Speed 6307.34 samples/sec Loss 8.2897 LearningRate 0.0009 Epoch: 5 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:31,854-Speed 6308.89 samples/sec Loss 8.2233 LearningRate 0.0009 Epoch: 5 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:35,096-Speed 6320.22 samples/sec Loss 8.3145 LearningRate 0.0009 Epoch: 5 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:38,341-Speed 6312.45 samples/sec Loss 8.3140 LearningRate 0.0009 Epoch: 5 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:41,590-Speed 6304.83 samples/sec Loss 8.2415 LearningRate 0.0009 Epoch: 5 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:44,834-Speed 6314.11 samples/sec Loss 8.3020 LearningRate 0.0009 Epoch: 5 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:48,078-Speed 6314.37 samples/sec Loss 8.2681 LearningRate 0.0009 Epoch: 5 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:51,323-Speed 6312.80 samples/sec Loss 
8.2560 LearningRate 0.0009 Epoch: 5 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:54,569-Speed 6311.11 samples/sec Loss 8.2518 LearningRate 0.0009 Epoch: 5 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:31:57,814-Speed 6312.77 samples/sec Loss 8.2229 LearningRate 0.0009 Epoch: 5 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:01,063-Speed 6304.79 samples/sec Loss 8.3297 LearningRate 0.0009 Epoch: 5 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:04,314-Speed 6301.79 samples/sec Loss 8.2913 LearningRate 0.0009 Epoch: 5 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:07,563-Speed 6304.62 samples/sec Loss 8.2247 LearningRate 0.0009 Epoch: 5 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:10,821-Speed 6287.33 samples/sec Loss 8.2447 LearningRate 0.0009 Epoch: 5 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:14,066-Speed 6312.73 samples/sec Loss 8.3267 LearningRate 0.0009 Epoch: 5 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:17,312-Speed 6310.50 samples/sec Loss 8.1950 LearningRate 0.0009 Epoch: 5 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:20,553-Speed 6320.95 samples/sec Loss 8.3288 LearningRate 0.0009 Epoch: 5 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:23,810-Speed 6289.69 samples/sec Loss 8.2560 LearningRate 0.0009 Epoch: 5 Global Step: 107770 Fp16 Grad Scale: 131072 Required: 66 hours Training: 2022-04-01 01:32:27,043-Speed 6335.05 samples/sec Loss 8.3152 LearningRate 0.0009 Epoch: 5 Global Step: 107780 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:30,290-Speed 6309.61 samples/sec Loss 8.2382 LearningRate 0.0009 Epoch: 5 Global 
Step: 107790 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:33,540-Speed 6303.14 samples/sec Loss 8.1501 LearningRate 0.0009 Epoch: 5 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:36,788-Speed 6307.58 samples/sec Loss 8.2811 LearningRate 0.0009 Epoch: 5 Global Step: 107810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:40,032-Speed 6313.60 samples/sec Loss 8.2336 LearningRate 0.0009 Epoch: 5 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:43,277-Speed 6312.35 samples/sec Loss 8.2100 LearningRate 0.0009 Epoch: 5 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:46,522-Speed 6313.18 samples/sec Loss 8.2910 LearningRate 0.0009 Epoch: 5 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:49,770-Speed 6306.38 samples/sec Loss 8.2392 LearningRate 0.0009 Epoch: 5 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:53,016-Speed 6311.27 samples/sec Loss 8.2584 LearningRate 0.0009 Epoch: 5 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:56,264-Speed 6306.97 samples/sec Loss 8.2902 LearningRate 0.0009 Epoch: 5 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:32:59,494-Speed 6341.92 samples/sec Loss 8.2174 LearningRate 0.0009 Epoch: 5 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:02,742-Speed 6305.45 samples/sec Loss 8.2447 LearningRate 0.0009 Epoch: 5 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:05,994-Speed 6299.24 samples/sec Loss 8.2538 LearningRate 0.0009 Epoch: 5 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:09,250-Speed 6291.03 samples/sec Loss 8.2053 LearningRate 0.0009 Epoch: 5 Global Step: 107910 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:33:12,497-Speed 6310.74 samples/sec Loss 8.2657 LearningRate 0.0009 Epoch: 5 Global Step: 107920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:15,748-Speed 6300.68 samples/sec Loss 8.2680 LearningRate 0.0009 Epoch: 5 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:18,997-Speed 6303.02 samples/sec Loss 8.2847 LearningRate 0.0009 Epoch: 5 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:22,242-Speed 6313.34 samples/sec Loss 8.2606 LearningRate 0.0009 Epoch: 5 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:25,489-Speed 6309.63 samples/sec Loss 8.2215 LearningRate 0.0009 Epoch: 5 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:28,734-Speed 6312.56 samples/sec Loss 8.1959 LearningRate 0.0009 Epoch: 5 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:31,961-Speed 6347.05 samples/sec Loss 8.3260 LearningRate 0.0009 Epoch: 5 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:35,219-Speed 6288.11 samples/sec Loss 8.2183 LearningRate 0.0009 Epoch: 5 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:38,472-Speed 6296.85 samples/sec Loss 8.1910 LearningRate 0.0009 Epoch: 5 Global Step: 108000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:41,731-Speed 6286.47 samples/sec Loss 8.2196 LearningRate 0.0009 Epoch: 5 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:44,978-Speed 6308.65 samples/sec Loss 8.2966 LearningRate 0.0009 Epoch: 5 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:48,221-Speed 6316.27 samples/sec Loss 8.2245 LearningRate 0.0009 Epoch: 5 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:33:51,466-Speed 6313.28 samples/sec Loss 8.2076 LearningRate 0.0009 Epoch: 5 Global Step: 108040 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:54,712-Speed 6310.31 samples/sec Loss 8.2399 LearningRate 0.0009 Epoch: 5 Global Step: 108050 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:33:57,958-Speed 6310.73 samples/sec Loss 8.1971 LearningRate 0.0009 Epoch: 5 Global Step: 108060 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:01,206-Speed 6306.10 samples/sec Loss 8.2446 LearningRate 0.0009 Epoch: 5 Global Step: 108070 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:04,439-Speed 6336.06 samples/sec Loss 8.2132 LearningRate 0.0009 Epoch: 5 Global Step: 108080 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:07,687-Speed 6307.29 samples/sec Loss 8.2274 LearningRate 0.0009 Epoch: 5 Global Step: 108090 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:10,931-Speed 6315.34 samples/sec Loss 8.2370 LearningRate 0.0009 Epoch: 5 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:14,183-Speed 6298.97 samples/sec Loss 8.2459 LearningRate 0.0009 Epoch: 5 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:17,430-Speed 6308.43 samples/sec Loss 8.2532 LearningRate 0.0009 Epoch: 5 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:20,678-Speed 6306.40 samples/sec Loss 8.2891 LearningRate 0.0009 Epoch: 5 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:23,927-Speed 6304.32 samples/sec Loss 8.2454 LearningRate 0.0009 Epoch: 5 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:27,183-Speed 6291.36 samples/sec Loss 8.1864 LearningRate 0.0009 Epoch: 5 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:30,429-Speed 6310.51 samples/sec Loss 
8.1490 LearningRate 0.0009 Epoch: 5 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:33,675-Speed 6310.75 samples/sec Loss 8.2440 LearningRate 0.0009 Epoch: 5 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:36,908-Speed 6336.78 samples/sec Loss 8.3230 LearningRate 0.0009 Epoch: 5 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:40,155-Speed 6309.35 samples/sec Loss 8.2857 LearningRate 0.0009 Epoch: 5 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:43,398-Speed 6316.95 samples/sec Loss 8.2403 LearningRate 0.0009 Epoch: 5 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:46,644-Speed 6309.88 samples/sec Loss 8.2206 LearningRate 0.0009 Epoch: 5 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:49,889-Speed 6313.41 samples/sec Loss 8.2149 LearningRate 0.0009 Epoch: 5 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:53,133-Speed 6314.70 samples/sec Loss 8.2384 LearningRate 0.0009 Epoch: 5 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:56,381-Speed 6306.18 samples/sec Loss 8.2048 LearningRate 0.0009 Epoch: 5 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:34:59,630-Speed 6305.53 samples/sec Loss 8.1677 LearningRate 0.0009 Epoch: 5 Global Step: 108250 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:02,877-Speed 6309.28 samples/sec Loss 8.2672 LearningRate 0.0009 Epoch: 5 Global Step: 108260 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:06,122-Speed 6311.44 samples/sec Loss 8.2301 LearningRate 0.0009 Epoch: 5 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:09,367-Speed 6312.71 samples/sec Loss 8.3647 LearningRate 0.0009 Epoch: 5 Global 
Step: 108280 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:12,610-Speed 6317.46 samples/sec Loss 8.1866 LearningRate 0.0009 Epoch: 5 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:15,852-Speed 6318.75 samples/sec Loss 8.2551 LearningRate 0.0009 Epoch: 5 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:19,100-Speed 6306.64 samples/sec Loss 8.1592 LearningRate 0.0009 Epoch: 5 Global Step: 108310 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:22,341-Speed 6320.23 samples/sec Loss 8.2111 LearningRate 0.0009 Epoch: 5 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:25,590-Speed 6305.31 samples/sec Loss 8.2232 LearningRate 0.0009 Epoch: 5 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:28,849-Speed 6283.73 samples/sec Loss 8.2147 LearningRate 0.0009 Epoch: 5 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:32,098-Speed 6305.24 samples/sec Loss 8.2626 LearningRate 0.0009 Epoch: 5 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:35,344-Speed 6310.30 samples/sec Loss 8.2360 LearningRate 0.0009 Epoch: 5 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:38,597-Speed 6298.89 samples/sec Loss 8.2693 LearningRate 0.0009 Epoch: 5 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:35:41,825-Speed 6345.62 samples/sec Loss 8.2220 LearningRate 0.0009 Epoch: 5 Global Step: 108380 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:35:45,066-Speed 6319.10 samples/sec Loss 8.3488 LearningRate 0.0009 Epoch: 5 Global Step: 108390 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:35:48,314-Speed 6307.01 samples/sec Loss 8.2502 LearningRate 0.0009 Epoch: 5 Global Step: 108400 Fp16 Grad Scale: 32768 
Required: 66 hours Training: 2022-04-01 01:35:51,560-Speed 6310.40 samples/sec Loss 8.2798 LearningRate 0.0009 Epoch: 5 Global Step: 108410 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:35:54,804-Speed 6315.42 samples/sec Loss 8.2287 LearningRate 0.0009 Epoch: 5 Global Step: 108420 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:35:58,056-Speed 6298.12 samples/sec Loss 8.2324 LearningRate 0.0009 Epoch: 5 Global Step: 108430 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:36:01,301-Speed 6313.28 samples/sec Loss 8.2499 LearningRate 0.0009 Epoch: 5 Global Step: 108440 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:36:04,546-Speed 6313.38 samples/sec Loss 8.1346 LearningRate 0.0009 Epoch: 5 Global Step: 108450 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:36:07,791-Speed 6311.73 samples/sec Loss 8.2127 LearningRate 0.0009 Epoch: 5 Global Step: 108460 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:36:11,040-Speed 6306.77 samples/sec Loss 8.2521 LearningRate 0.0009 Epoch: 5 Global Step: 108470 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:36:14,285-Speed 6311.44 samples/sec Loss 8.2672 LearningRate 0.0009 Epoch: 5 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:17,534-Speed 6305.55 samples/sec Loss 8.1032 LearningRate 0.0009 Epoch: 5 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:20,782-Speed 6306.99 samples/sec Loss 8.2269 LearningRate 0.0009 Epoch: 5 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:24,028-Speed 6310.22 samples/sec Loss 8.2078 LearningRate 0.0009 Epoch: 5 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:27,275-Speed 6307.80 samples/sec Loss 8.3069 LearningRate 0.0009 Epoch: 5 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 
01:36:30,524-Speed 6305.72 samples/sec Loss 8.2085 LearningRate 0.0009 Epoch: 5 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:33,773-Speed 6305.29 samples/sec Loss 8.3121 LearningRate 0.0009 Epoch: 5 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:37,026-Speed 6297.12 samples/sec Loss 8.2345 LearningRate 0.0009 Epoch: 5 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:40,273-Speed 6308.37 samples/sec Loss 8.2192 LearningRate 0.0009 Epoch: 5 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:43,520-Speed 6308.73 samples/sec Loss 8.1765 LearningRate 0.0009 Epoch: 5 Global Step: 108570 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:46,750-Speed 6343.34 samples/sec Loss 8.2945 LearningRate 0.0009 Epoch: 5 Global Step: 108580 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:49,993-Speed 6315.73 samples/sec Loss 8.3031 LearningRate 0.0009 Epoch: 5 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:53,249-Speed 6290.30 samples/sec Loss 8.3019 LearningRate 0.0009 Epoch: 5 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:56,495-Speed 6310.40 samples/sec Loss 8.3389 LearningRate 0.0009 Epoch: 5 Global Step: 108610 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:36:59,741-Speed 6312.43 samples/sec Loss 8.2037 LearningRate 0.0009 Epoch: 5 Global Step: 108620 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:02,988-Speed 6307.48 samples/sec Loss 8.2736 LearningRate 0.0009 Epoch: 5 Global Step: 108630 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:06,233-Speed 6311.96 samples/sec Loss 8.1979 LearningRate 0.0009 Epoch: 5 Global Step: 108640 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:09,482-Speed 6306.77 samples/sec Loss 
8.2504 LearningRate 0.0009 Epoch: 5 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:12,728-Speed 6310.34 samples/sec Loss 8.1462 LearningRate 0.0009 Epoch: 5 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:15,976-Speed 6306.60 samples/sec Loss 8.3309 LearningRate 0.0009 Epoch: 5 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:19,210-Speed 6335.78 samples/sec Loss 8.2762 LearningRate 0.0009 Epoch: 5 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:22,460-Speed 6302.44 samples/sec Loss 8.2412 LearningRate 0.0009 Epoch: 5 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:37:25,697-Speed 6327.16 samples/sec Loss 8.2232 LearningRate 0.0009 Epoch: 5 Global Step: 108700 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:28,948-Speed 6302.42 samples/sec Loss 8.2089 LearningRate 0.0009 Epoch: 5 Global Step: 108710 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:32,199-Speed 6301.19 samples/sec Loss 8.2037 LearningRate 0.0009 Epoch: 5 Global Step: 108720 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:35,454-Speed 6291.62 samples/sec Loss 8.1837 LearningRate 0.0009 Epoch: 5 Global Step: 108730 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:38,701-Speed 6309.59 samples/sec Loss 8.2460 LearningRate 0.0009 Epoch: 5 Global Step: 108740 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:41,944-Speed 6315.32 samples/sec Loss 8.1722 LearningRate 0.0009 Epoch: 5 Global Step: 108750 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:45,195-Speed 6302.89 samples/sec Loss 8.2299 LearningRate 0.0009 Epoch: 5 Global Step: 108760 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:48,450-Speed 6292.30 samples/sec Loss 8.1954 LearningRate 0.0009 Epoch: 5 Global 
Step: 108770 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:51,696-Speed 6309.90 samples/sec Loss 8.1175 LearningRate 0.0009 Epoch: 5 Global Step: 108780 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:54,940-Speed 6315.80 samples/sec Loss 8.1579 LearningRate 0.0009 Epoch: 5 Global Step: 108790 Fp16 Grad Scale: 32768 Required: 66 hours Training: 2022-04-01 01:37:58,188-Speed 6307.19 samples/sec Loss 8.2970 LearningRate 0.0009 Epoch: 5 Global Step: 108800 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:01,432-Speed 6314.86 samples/sec Loss 8.1815 LearningRate 0.0009 Epoch: 5 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:04,683-Speed 6300.07 samples/sec Loss 8.2431 LearningRate 0.0009 Epoch: 5 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:07,930-Speed 6309.30 samples/sec Loss 8.2585 LearningRate 0.0009 Epoch: 5 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:11,197-Speed 6268.70 samples/sec Loss 8.3245 LearningRate 0.0009 Epoch: 5 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:14,446-Speed 6305.84 samples/sec Loss 8.3033 LearningRate 0.0009 Epoch: 5 Global Step: 108850 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:17,693-Speed 6309.34 samples/sec Loss 8.2089 LearningRate 0.0009 Epoch: 5 Global Step: 108860 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:20,945-Speed 6300.08 samples/sec Loss 8.2264 LearningRate 0.0009 Epoch: 5 Global Step: 108870 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:24,192-Speed 6308.07 samples/sec Loss 8.2625 LearningRate 0.0009 Epoch: 5 Global Step: 108880 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:27,436-Speed 6313.53 samples/sec Loss 8.1618 LearningRate 0.0009 Epoch: 5 Global Step: 108890 Fp16 Grad Scale: 65536 
Required: 66 hours Training: 2022-04-01 01:38:30,671-Speed 6333.33 samples/sec Loss 8.1905 LearningRate 0.0009 Epoch: 5 Global Step: 108900 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:33,919-Speed 6306.75 samples/sec Loss 8.3161 LearningRate 0.0009 Epoch: 5 Global Step: 108910 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:37,166-Speed 6307.86 samples/sec Loss 8.1565 LearningRate 0.0009 Epoch: 5 Global Step: 108920 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:40,413-Speed 6308.95 samples/sec Loss 8.2851 LearningRate 0.0009 Epoch: 5 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:43,658-Speed 6312.32 samples/sec Loss 8.2554 LearningRate 0.0009 Epoch: 5 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:46,907-Speed 6306.32 samples/sec Loss 8.2001 LearningRate 0.0009 Epoch: 5 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:50,158-Speed 6299.57 samples/sec Loss 8.2061 LearningRate 0.0009 Epoch: 5 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:53,401-Speed 6316.12 samples/sec Loss 8.2194 LearningRate 0.0009 Epoch: 5 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:56,648-Speed 6309.31 samples/sec Loss 8.3734 LearningRate 0.0009 Epoch: 5 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:38:59,896-Speed 6307.35 samples/sec Loss 8.1545 LearningRate 0.0009 Epoch: 5 Global Step: 108990 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:39:03,134-Speed 6326.42 samples/sec Loss 8.2456 LearningRate 0.0009 Epoch: 5 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 66 hours Training: 2022-04-01 01:39:06,382-Speed 6306.56 samples/sec Loss 8.1933 LearningRate 0.0009 Epoch: 5 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
01:39:09,630-Speed 6306.59 samples/sec Loss 8.1699 LearningRate 0.0009 Epoch: 5 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:39:12,876-Speed 6311.35 samples/sec Loss 8.2198 LearningRate 0.0009 Epoch: 5 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:39:16,123-Speed 6309.50 samples/sec Loss 8.2848 LearningRate 0.0009 Epoch: 5 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:39:19,368-Speed 6311.49 samples/sec Loss 8.1630 LearningRate 0.0009 Epoch: 5 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:39:22,602-Speed 6333.51 samples/sec Loss 8.2045 LearningRate 0.0009 Epoch: 5 Global Step: 109060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:25,856-Speed 6295.26 samples/sec Loss 8.1821 LearningRate 0.0009 Epoch: 5 Global Step: 109070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:29,103-Speed 6310.04 samples/sec Loss 8.1341 LearningRate 0.0009 Epoch: 5 Global Step: 109080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:32,350-Speed 6308.98 samples/sec Loss 8.2868 LearningRate 0.0009 Epoch: 5 Global Step: 109090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:35,598-Speed 6307.36 samples/sec Loss 8.1872 LearningRate 0.0009 Epoch: 5 Global Step: 109100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:38,845-Speed 6307.99 samples/sec Loss 8.1159 LearningRate 0.0009 Epoch: 5 Global Step: 109110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:42,093-Speed 6307.70 samples/sec Loss 8.2097 LearningRate 0.0009 Epoch: 5 Global Step: 109120 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:45,340-Speed 6308.09 samples/sec Loss 8.1975 LearningRate 0.0009 Epoch: 5 Global Step: 109130 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:48,584-Speed 6314.80 samples/sec Loss 
8.2138 LearningRate 0.0009 Epoch: 5 Global Step: 109140 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:51,832-Speed 6307.06 samples/sec Loss 8.2641 LearningRate 0.0009 Epoch: 5 Global Step: 109150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:39:55,075-Speed 6315.31 samples/sec Loss 8.2463 LearningRate 0.0009 Epoch: 5 Global Step: 109160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:39:58,322-Speed 6310.02 samples/sec Loss 8.2888 LearningRate 0.0009 Epoch: 5 Global Step: 109170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:01,566-Speed 6314.12 samples/sec Loss 8.1534 LearningRate 0.0009 Epoch: 5 Global Step: 109180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:04,820-Speed 6294.97 samples/sec Loss 8.2928 LearningRate 0.0009 Epoch: 5 Global Step: 109190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:08,066-Speed 6311.10 samples/sec Loss 8.2896 LearningRate 0.0009 Epoch: 5 Global Step: 109200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:11,316-Speed 6302.74 samples/sec Loss 8.1902 LearningRate 0.0009 Epoch: 5 Global Step: 109210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:14,563-Speed 6308.28 samples/sec Loss 8.2342 LearningRate 0.0009 Epoch: 5 Global Step: 109220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:17,808-Speed 6313.35 samples/sec Loss 8.2384 LearningRate 0.0009 Epoch: 5 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:21,054-Speed 6310.69 samples/sec Loss 8.1412 LearningRate 0.0009 Epoch: 5 Global Step: 109240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:24,302-Speed 6307.20 samples/sec Loss 8.1640 LearningRate 0.0009 Epoch: 5 Global Step: 109250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:27,534-Speed 6338.16 samples/sec Loss 8.2800 LearningRate 0.0009 Epoch: 5 Global 
Step: 109260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:30,832-Speed 6211.30 samples/sec Loss 8.2066 LearningRate 0.0009 Epoch: 5 Global Step: 109270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:34,075-Speed 6315.30 samples/sec Loss 8.2747 LearningRate 0.0009 Epoch: 5 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:37,319-Speed 6315.00 samples/sec Loss 8.2921 LearningRate 0.0009 Epoch: 5 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:40,565-Speed 6311.13 samples/sec Loss 8.2250 LearningRate 0.0009 Epoch: 5 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:40:43,792-Speed 6349.44 samples/sec Loss 8.1070 LearningRate 0.0009 Epoch: 5 Global Step: 109310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:40:47,037-Speed 6311.78 samples/sec Loss 8.2785 LearningRate 0.0009 Epoch: 5 Global Step: 109320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:40:50,279-Speed 6319.30 samples/sec Loss 8.2177 LearningRate 0.0009 Epoch: 5 Global Step: 109330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:40:53,525-Speed 6309.41 samples/sec Loss 8.1713 LearningRate 0.0009 Epoch: 5 Global Step: 109340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:40:56,769-Speed 6315.82 samples/sec Loss 8.2433 LearningRate 0.0009 Epoch: 5 Global Step: 109350 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:00,016-Speed 6307.89 samples/sec Loss 8.2593 LearningRate 0.0009 Epoch: 5 Global Step: 109360 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:03,265-Speed 6305.15 samples/sec Loss 8.1358 LearningRate 0.0009 Epoch: 5 Global Step: 109370 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:06,510-Speed 6312.34 samples/sec Loss 8.3062 LearningRate 0.0009 Epoch: 5 Global Step: 109380 Fp16 Grad Scale: 32768 
Required: 65 hours Training: 2022-04-01 01:41:09,759-Speed 6304.58 samples/sec Loss 8.2765 LearningRate 0.0009 Epoch: 5 Global Step: 109390 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:13,006-Speed 6309.04 samples/sec Loss 8.2499 LearningRate 0.0009 Epoch: 5 Global Step: 109400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:16,254-Speed 6306.98 samples/sec Loss 8.1648 LearningRate 0.0009 Epoch: 5 Global Step: 109410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:41:19,500-Speed 6310.49 samples/sec Loss 8.2513 LearningRate 0.0009 Epoch: 5 Global Step: 109420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:41:22,747-Speed 6307.82 samples/sec Loss 8.2654 LearningRate 0.0009 Epoch: 5 Global Step: 109430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:41:26,013-Speed 6273.04 samples/sec Loss 8.2024 LearningRate 0.0009 Epoch: 5 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:41:29,244-Speed 6340.62 samples/sec Loss 8.2706 LearningRate 0.0009 Epoch: 5 Global Step: 109450 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:32,492-Speed 6305.98 samples/sec Loss 8.1038 LearningRate 0.0009 Epoch: 5 Global Step: 109460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:35,736-Speed 6314.74 samples/sec Loss 8.1697 LearningRate 0.0009 Epoch: 5 Global Step: 109470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:38,983-Speed 6309.52 samples/sec Loss 8.1627 LearningRate 0.0009 Epoch: 5 Global Step: 109480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:42,232-Speed 6304.67 samples/sec Loss 8.2863 LearningRate 0.0009 Epoch: 5 Global Step: 109490 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:45,483-Speed 6300.76 samples/sec Loss 8.2185 LearningRate 0.0009 Epoch: 5 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
01:41:48,728-Speed 6314.16 samples/sec Loss 8.2008 LearningRate 0.0009 Epoch: 5 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:51,986-Speed 6287.12 samples/sec Loss 8.1983 LearningRate 0.0009 Epoch: 5 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:55,233-Speed 6307.85 samples/sec Loss 8.1991 LearningRate 0.0009 Epoch: 5 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:41:58,478-Speed 6313.60 samples/sec Loss 8.3170 LearningRate 0.0009 Epoch: 5 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:42:01,725-Speed 6307.53 samples/sec Loss 8.2152 LearningRate 0.0009 Epoch: 5 Global Step: 109550 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:04,972-Speed 6309.34 samples/sec Loss 8.2845 LearningRate 0.0009 Epoch: 5 Global Step: 109560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:08,219-Speed 6309.09 samples/sec Loss 8.0993 LearningRate 0.0009 Epoch: 5 Global Step: 109570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:11,465-Speed 6311.26 samples/sec Loss 8.1749 LearningRate 0.0009 Epoch: 5 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:14,711-Speed 6310.97 samples/sec Loss 8.2018 LearningRate 0.0009 Epoch: 5 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:17,956-Speed 6312.37 samples/sec Loss 8.1753 LearningRate 0.0009 Epoch: 5 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:21,201-Speed 6311.27 samples/sec Loss 8.1996 LearningRate 0.0009 Epoch: 5 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:24,445-Speed 6314.46 samples/sec Loss 8.1536 LearningRate 0.0009 Epoch: 5 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:27,690-Speed 6313.74 samples/sec Loss 
8.2102 LearningRate 0.0009 Epoch: 5 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:30,931-Speed 6320.34 samples/sec Loss 8.1894 LearningRate 0.0009 Epoch: 5 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:34,163-Speed 6337.65 samples/sec Loss 8.2566 LearningRate 0.0009 Epoch: 5 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:37,406-Speed 6317.01 samples/sec Loss 8.2237 LearningRate 0.0009 Epoch: 5 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:40,656-Speed 6303.31 samples/sec Loss 8.2440 LearningRate 0.0009 Epoch: 5 Global Step: 109670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:43,905-Speed 6304.89 samples/sec Loss 8.2302 LearningRate 0.0009 Epoch: 5 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:47,152-Speed 6308.29 samples/sec Loss 8.1647 LearningRate 0.0009 Epoch: 5 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:50,399-Speed 6309.04 samples/sec Loss 8.1361 LearningRate 0.0009 Epoch: 5 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:53,646-Speed 6308.44 samples/sec Loss 8.1667 LearningRate 0.0009 Epoch: 5 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:42:56,894-Speed 6307.15 samples/sec Loss 8.1383 LearningRate 0.0009 Epoch: 5 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:00,141-Speed 6309.07 samples/sec Loss 8.1985 LearningRate 0.0009 Epoch: 5 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:03,389-Speed 6307.67 samples/sec Loss 8.2111 LearningRate 0.0009 Epoch: 5 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:06,621-Speed 6336.46 samples/sec Loss 8.0575 LearningRate 0.0009 Epoch: 5 Global 
Step: 109750 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:09,866-Speed 6313.81 samples/sec Loss 8.1018 LearningRate 0.0009 Epoch: 5 Global Step: 109760 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:13,113-Speed 6308.96 samples/sec Loss 8.1330 LearningRate 0.0009 Epoch: 5 Global Step: 109770 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:16,357-Speed 6313.39 samples/sec Loss 8.1576 LearningRate 0.0009 Epoch: 5 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:19,603-Speed 6310.76 samples/sec Loss 8.2645 LearningRate 0.0009 Epoch: 5 Global Step: 109790 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:22,853-Speed 6303.67 samples/sec Loss 8.0774 LearningRate 0.0009 Epoch: 5 Global Step: 109800 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:26,097-Speed 6313.88 samples/sec Loss 8.2270 LearningRate 0.0009 Epoch: 5 Global Step: 109810 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:29,342-Speed 6314.25 samples/sec Loss 8.1480 LearningRate 0.0009 Epoch: 5 Global Step: 109820 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:32,587-Speed 6311.80 samples/sec Loss 8.1866 LearningRate 0.0009 Epoch: 5 Global Step: 109830 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:35,835-Speed 6306.77 samples/sec Loss 8.1263 LearningRate 0.0009 Epoch: 5 Global Step: 109840 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:43:39,084-Speed 6304.05 samples/sec Loss 8.1703 LearningRate 0.0009 Epoch: 5 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:42,331-Speed 6309.62 samples/sec Loss 8.1701 LearningRate 0.0009 Epoch: 5 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:45,576-Speed 6312.32 samples/sec Loss 8.1243 LearningRate 0.0009 Epoch: 5 Global Step: 109870 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:43:48,821-Speed 6312.72 samples/sec Loss 8.1808 LearningRate 0.0009 Epoch: 5 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:52,066-Speed 6313.03 samples/sec Loss 8.2251 LearningRate 0.0009 Epoch: 5 Global Step: 109890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:55,311-Speed 6312.08 samples/sec Loss 8.1783 LearningRate 0.0009 Epoch: 5 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:43:58,561-Speed 6303.67 samples/sec Loss 8.2383 LearningRate 0.0009 Epoch: 5 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:01,809-Speed 6306.49 samples/sec Loss 8.2041 LearningRate 0.0009 Epoch: 5 Global Step: 109920 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:05,068-Speed 6285.76 samples/sec Loss 8.1710 LearningRate 0.0009 Epoch: 5 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:08,316-Speed 6306.63 samples/sec Loss 8.2046 LearningRate 0.0009 Epoch: 5 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:11,548-Speed 6337.84 samples/sec Loss 8.1382 LearningRate 0.0009 Epoch: 5 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:14,792-Speed 6315.43 samples/sec Loss 8.2490 LearningRate 0.0009 Epoch: 5 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:18,037-Speed 6313.41 samples/sec Loss 8.0696 LearningRate 0.0009 Epoch: 5 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:21,285-Speed 6305.83 samples/sec Loss 8.2430 LearningRate 0.0009 Epoch: 5 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:24,534-Speed 6305.69 samples/sec Loss 8.0915 LearningRate 0.0009 Epoch: 5 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
01:44:27,788-Speed 6295.33 samples/sec Loss 8.2673 LearningRate 0.0009 Epoch: 5 Global Step: 110000 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:31,041-Speed 6296.83 samples/sec Loss 8.2060 LearningRate 0.0009 Epoch: 5 Global Step: 110010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:34,286-Speed 6312.00 samples/sec Loss 8.2673 LearningRate 0.0009 Epoch: 5 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:37,531-Speed 6312.60 samples/sec Loss 8.2368 LearningRate 0.0009 Epoch: 5 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:40,775-Speed 6313.85 samples/sec Loss 8.2090 LearningRate 0.0009 Epoch: 5 Global Step: 110040 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:44,002-Speed 6347.52 samples/sec Loss 8.1658 LearningRate 0.0009 Epoch: 5 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:47,248-Speed 6312.48 samples/sec Loss 8.1544 LearningRate 0.0009 Epoch: 5 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:50,510-Speed 6278.78 samples/sec Loss 8.1802 LearningRate 0.0009 Epoch: 5 Global Step: 110070 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:53,753-Speed 6316.25 samples/sec Loss 8.1583 LearningRate 0.0009 Epoch: 5 Global Step: 110080 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:44:57,004-Speed 6301.38 samples/sec Loss 8.1352 LearningRate 0.0009 Epoch: 5 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:00,252-Speed 6307.21 samples/sec Loss 8.1628 LearningRate 0.0009 Epoch: 5 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:03,498-Speed 6309.73 samples/sec Loss 8.1578 LearningRate 0.0009 Epoch: 5 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:06,729-Speed 6340.02 samples/sec Loss 
8.2932 LearningRate 0.0009 Epoch: 5 Global Step: 110120 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:09,972-Speed 6316.35 samples/sec Loss 8.2130 LearningRate 0.0009 Epoch: 5 Global Step: 110130 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:13,217-Speed 6312.44 samples/sec Loss 8.1383 LearningRate 0.0009 Epoch: 5 Global Step: 110140 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:16,463-Speed 6312.12 samples/sec Loss 8.0926 LearningRate 0.0009 Epoch: 5 Global Step: 110150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:19,712-Speed 6305.82 samples/sec Loss 8.1913 LearningRate 0.0009 Epoch: 5 Global Step: 110160 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:22,953-Speed 6320.17 samples/sec Loss 8.1845 LearningRate 0.0009 Epoch: 5 Global Step: 110170 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:26,201-Speed 6306.10 samples/sec Loss 8.1523 LearningRate 0.0009 Epoch: 5 Global Step: 110180 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:29,446-Speed 6313.27 samples/sec Loss 8.1843 LearningRate 0.0009 Epoch: 5 Global Step: 110190 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:32,692-Speed 6310.47 samples/sec Loss 8.2049 LearningRate 0.0009 Epoch: 5 Global Step: 110200 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:35,935-Speed 6316.32 samples/sec Loss 8.1325 LearningRate 0.0009 Epoch: 5 Global Step: 110210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:45:39,182-Speed 6308.45 samples/sec Loss 8.2268 LearningRate 0.0009 Epoch: 5 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:42,430-Speed 6306.69 samples/sec Loss 8.1143 LearningRate 0.0009 Epoch: 5 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:45,681-Speed 6303.05 samples/sec Loss 8.2400 LearningRate 0.0009 Epoch: 5 Global 
Step: 110240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:48,931-Speed 6302.72 samples/sec Loss 8.1013 LearningRate 0.0009 Epoch: 5 Global Step: 110250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:52,176-Speed 6312.13 samples/sec Loss 8.1665 LearningRate 0.0009 Epoch: 5 Global Step: 110260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:55,425-Speed 6305.60 samples/sec Loss 8.0236 LearningRate 0.0009 Epoch: 5 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:45:58,669-Speed 6313.04 samples/sec Loss 8.1985 LearningRate 0.0009 Epoch: 5 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:01,915-Speed 6310.79 samples/sec Loss 8.1352 LearningRate 0.0009 Epoch: 5 Global Step: 110290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:05,165-Speed 6304.31 samples/sec Loss 8.1268 LearningRate 0.0009 Epoch: 5 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:08,413-Speed 6306.16 samples/sec Loss 8.2576 LearningRate 0.0009 Epoch: 5 Global Step: 110310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:11,649-Speed 6329.80 samples/sec Loss 8.1118 LearningRate 0.0009 Epoch: 5 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:14,893-Speed 6315.37 samples/sec Loss 8.2083 LearningRate 0.0009 Epoch: 5 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:18,144-Speed 6301.11 samples/sec Loss 8.1618 LearningRate 0.0009 Epoch: 5 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:21,410-Speed 6271.12 samples/sec Loss 8.0821 LearningRate 0.0009 Epoch: 5 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:24,662-Speed 6299.45 samples/sec Loss 8.1505 LearningRate 0.0009 Epoch: 5 Global Step: 110360 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:46:27,910-Speed 6306.97 samples/sec Loss 8.2362 LearningRate 0.0009 Epoch: 5 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:31,157-Speed 6309.57 samples/sec Loss 8.2633 LearningRate 0.0009 Epoch: 5 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:46:34,398-Speed 6320.21 samples/sec Loss 8.1199 LearningRate 0.0009 Epoch: 5 Global Step: 110390 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:37,641-Speed 6317.09 samples/sec Loss 8.2094 LearningRate 0.0009 Epoch: 5 Global Step: 110400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:40,890-Speed 6304.50 samples/sec Loss 8.2054 LearningRate 0.0009 Epoch: 5 Global Step: 110410 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:44,138-Speed 6306.60 samples/sec Loss 8.1414 LearningRate 0.0009 Epoch: 5 Global Step: 110420 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:47,386-Speed 6306.14 samples/sec Loss 8.1229 LearningRate 0.0009 Epoch: 5 Global Step: 110430 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:50,630-Speed 6314.41 samples/sec Loss 8.0672 LearningRate 0.0009 Epoch: 5 Global Step: 110440 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:53,875-Speed 6313.59 samples/sec Loss 8.1383 LearningRate 0.0009 Epoch: 5 Global Step: 110450 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:46:57,123-Speed 6305.97 samples/sec Loss 8.1195 LearningRate 0.0009 Epoch: 5 Global Step: 110460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:47:00,369-Speed 6310.99 samples/sec Loss 8.0421 LearningRate 0.0009 Epoch: 5 Global Step: 110470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:47:03,615-Speed 6310.94 samples/sec Loss 8.1414 LearningRate 0.0009 Epoch: 5 Global Step: 110480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
01:47:06,860-Speed 6313.86 samples/sec Loss 8.1277 LearningRate 0.0009 Epoch: 5 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:10,107-Speed 6307.17 samples/sec Loss 8.1449 LearningRate 0.0009 Epoch: 5 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:13,353-Speed 6310.23 samples/sec Loss 8.2052 LearningRate 0.0009 Epoch: 5 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:16,607-Speed 6296.78 samples/sec Loss 8.0478 LearningRate 0.0009 Epoch: 5 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:19,850-Speed 6316.81 samples/sec Loss 8.1199 LearningRate 0.0009 Epoch: 5 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:23,098-Speed 6306.35 samples/sec Loss 8.1756 LearningRate 0.0009 Epoch: 5 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:26,345-Speed 6307.69 samples/sec Loss 8.1681 LearningRate 0.0009 Epoch: 5 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:29,590-Speed 6312.68 samples/sec Loss 8.1508 LearningRate 0.0009 Epoch: 5 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:32,836-Speed 6311.45 samples/sec Loss 8.1947 LearningRate 0.0009 Epoch: 5 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:36,080-Speed 6314.32 samples/sec Loss 8.2060 LearningRate 0.0009 Epoch: 5 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:39,317-Speed 6329.07 samples/sec Loss 8.2084 LearningRate 0.0009 Epoch: 5 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:42,562-Speed 6311.96 samples/sec Loss 8.2197 LearningRate 0.0009 Epoch: 5 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:45,820-Speed 6288.82 samples/sec Loss 
8.1712 LearningRate 0.0009 Epoch: 5 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:49,065-Speed 6311.24 samples/sec Loss 8.2335 LearningRate 0.0009 Epoch: 5 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:52,313-Speed 6308.58 samples/sec Loss 8.0895 LearningRate 0.0009 Epoch: 5 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:55,557-Speed 6313.76 samples/sec Loss 8.2150 LearningRate 0.0009 Epoch: 5 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:47:58,806-Speed 6304.36 samples/sec Loss 8.1905 LearningRate 0.0009 Epoch: 5 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:02,051-Speed 6312.50 samples/sec Loss 8.1543 LearningRate 0.0009 Epoch: 5 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:05,299-Speed 6310.22 samples/sec Loss 8.1769 LearningRate 0.0009 Epoch: 5 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:08,546-Speed 6308.43 samples/sec Loss 8.1431 LearningRate 0.0009 Epoch: 5 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:11,792-Speed 6310.86 samples/sec Loss 8.1647 LearningRate 0.0009 Epoch: 5 Global Step: 110690 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 01:48:15,009-Speed 6367.46 samples/sec Loss 8.2214 LearningRate 0.0009 Epoch: 5 Global Step: 110700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:18,270-Speed 6282.32 samples/sec Loss 8.1904 LearningRate 0.0009 Epoch: 5 Global Step: 110710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:21,512-Speed 6318.81 samples/sec Loss 8.1732 LearningRate 0.0009 Epoch: 5 Global Step: 110720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:24,761-Speed 6304.45 samples/sec Loss 8.1961 LearningRate 0.0009 Epoch: 5 Global 
Step: 110730 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:28,004-Speed 6316.03 samples/sec Loss 8.1145 LearningRate 0.0009 Epoch: 5 Global Step: 110740 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:31,249-Speed 6312.14 samples/sec Loss 8.2589 LearningRate 0.0009 Epoch: 5 Global Step: 110750 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:34,506-Speed 6290.60 samples/sec Loss 8.1325 LearningRate 0.0009 Epoch: 5 Global Step: 110760 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:37,750-Speed 6315.23 samples/sec Loss 8.1563 LearningRate 0.0009 Epoch: 5 Global Step: 110770 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:40,996-Speed 6311.18 samples/sec Loss 8.1262 LearningRate 0.0009 Epoch: 5 Global Step: 110780 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:44,239-Speed 6316.81 samples/sec Loss 8.1594 LearningRate 0.0009 Epoch: 5 Global Step: 110790 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:48:47,486-Speed 6308.35 samples/sec Loss 8.0591 LearningRate 0.0009 Epoch: 5 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:50,736-Speed 6304.00 samples/sec Loss 8.0983 LearningRate 0.0009 Epoch: 5 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:53,980-Speed 6314.81 samples/sec Loss 8.2437 LearningRate 0.0009 Epoch: 5 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:48:57,223-Speed 6314.76 samples/sec Loss 8.1175 LearningRate 0.0009 Epoch: 5 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:00,469-Speed 6311.07 samples/sec Loss 8.0639 LearningRate 0.0009 Epoch: 5 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:03,720-Speed 6301.44 samples/sec Loss 8.1522 LearningRate 0.0009 Epoch: 5 Global Step: 110850 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:49:06,966-Speed 6311.51 samples/sec Loss 8.1486 LearningRate 0.0009 Epoch: 5 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:10,197-Speed 6339.95 samples/sec Loss 8.1371 LearningRate 0.0009 Epoch: 5 Global Step: 110870 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:13,447-Speed 6302.78 samples/sec Loss 8.2239 LearningRate 0.0009 Epoch: 5 Global Step: 110880 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:16,694-Speed 6307.87 samples/sec Loss 8.1624 LearningRate 0.0009 Epoch: 5 Global Step: 110890 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:19,938-Speed 6314.97 samples/sec Loss 8.1955 LearningRate 0.0009 Epoch: 5 Global Step: 110900 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:23,188-Speed 6305.98 samples/sec Loss 8.1703 LearningRate 0.0009 Epoch: 5 Global Step: 110910 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:26,434-Speed 6310.87 samples/sec Loss 8.0957 LearningRate 0.0009 Epoch: 5 Global Step: 110920 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:29,680-Speed 6310.62 samples/sec Loss 8.1380 LearningRate 0.0009 Epoch: 5 Global Step: 110930 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:32,926-Speed 6310.97 samples/sec Loss 8.1857 LearningRate 0.0009 Epoch: 5 Global Step: 110940 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:36,172-Speed 6309.24 samples/sec Loss 8.1844 LearningRate 0.0009 Epoch: 5 Global Step: 110950 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:39,417-Speed 6313.33 samples/sec Loss 8.1709 LearningRate 0.0009 Epoch: 5 Global Step: 110960 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:49:42,665-Speed 6307.36 samples/sec Loss 8.2134 LearningRate 0.0009 Epoch: 5 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
01:49:45,915-Speed 6301.93 samples/sec Loss 8.0935 LearningRate 0.0009 Epoch: 5 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:49,167-Speed 6299.27 samples/sec Loss 8.0729 LearningRate 0.0009 Epoch: 5 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:52,419-Speed 6301.18 samples/sec Loss 8.1926 LearningRate 0.0009 Epoch: 5 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:55,662-Speed 6315.59 samples/sec Loss 8.0800 LearningRate 0.0009 Epoch: 5 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:49:58,908-Speed 6310.37 samples/sec Loss 8.0749 LearningRate 0.0009 Epoch: 5 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:02,152-Speed 6315.43 samples/sec Loss 8.0991 LearningRate 0.0009 Epoch: 5 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:05,400-Speed 6305.88 samples/sec Loss 8.1512 LearningRate 0.0009 Epoch: 5 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:08,645-Speed 6312.21 samples/sec Loss 8.1948 LearningRate 0.0009 Epoch: 5 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:11,879-Speed 6334.29 samples/sec Loss 8.1592 LearningRate 0.0009 Epoch: 5 Global Step: 111060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:15,128-Speed 6305.11 samples/sec Loss 8.1786 LearningRate 0.0009 Epoch: 5 Global Step: 111070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:18,377-Speed 6304.56 samples/sec Loss 8.1741 LearningRate 0.0009 Epoch: 5 Global Step: 111080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:21,622-Speed 6314.52 samples/sec Loss 8.0889 LearningRate 0.0009 Epoch: 5 Global Step: 111090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:24,869-Speed 6308.05 samples/sec Loss 
8.1069 LearningRate 0.0009 Epoch: 5 Global Step: 111100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:28,113-Speed 6314.55 samples/sec Loss 8.1567 LearningRate 0.0009 Epoch: 5 Global Step: 111110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:31,359-Speed 6310.84 samples/sec Loss 8.1366 LearningRate 0.0009 Epoch: 5 Global Step: 111120 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:34,604-Speed 6312.72 samples/sec Loss 8.1922 LearningRate 0.0009 Epoch: 5 Global Step: 111130 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:37,851-Speed 6308.42 samples/sec Loss 8.1892 LearningRate 0.0009 Epoch: 5 Global Step: 111140 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:41,097-Speed 6311.16 samples/sec Loss 8.0950 LearningRate 0.0009 Epoch: 5 Global Step: 111150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:44,341-Speed 6313.16 samples/sec Loss 8.1306 LearningRate 0.0009 Epoch: 5 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:47,586-Speed 6313.81 samples/sec Loss 8.0664 LearningRate 0.0009 Epoch: 5 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:50:50,817-Speed 6339.11 samples/sec Loss 8.1948 LearningRate 0.0009 Epoch: 5 Global Step: 111180 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:54,063-Speed 6311.49 samples/sec Loss 8.1825 LearningRate 0.0009 Epoch: 5 Global Step: 111190 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:50:57,309-Speed 6311.27 samples/sec Loss 8.0532 LearningRate 0.0009 Epoch: 5 Global Step: 111200 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:00,569-Speed 6284.18 samples/sec Loss 8.1224 LearningRate 0.0009 Epoch: 5 Global Step: 111210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:03,873-Speed 6200.43 samples/sec Loss 8.1185 LearningRate 0.0009 Epoch: 5 Global 
Step: 111220 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:07,115-Speed 6317.27 samples/sec Loss 7.9927 LearningRate 0.0009 Epoch: 5 Global Step: 111230 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:10,363-Speed 6308.01 samples/sec Loss 8.0259 LearningRate 0.0009 Epoch: 5 Global Step: 111240 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:13,608-Speed 6311.59 samples/sec Loss 8.0997 LearningRate 0.0009 Epoch: 5 Global Step: 111250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:16,854-Speed 6310.35 samples/sec Loss 8.2020 LearningRate 0.0009 Epoch: 5 Global Step: 111260 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:20,103-Speed 6305.38 samples/sec Loss 8.1622 LearningRate 0.0009 Epoch: 5 Global Step: 111270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:51:23,346-Speed 6315.78 samples/sec Loss 8.1345 LearningRate 0.0009 Epoch: 5 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:26,595-Speed 6305.91 samples/sec Loss 8.1411 LearningRate 0.0009 Epoch: 5 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:29,839-Speed 6314.35 samples/sec Loss 8.1195 LearningRate 0.0009 Epoch: 5 Global Step: 111300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:33,085-Speed 6311.30 samples/sec Loss 8.1158 LearningRate 0.0009 Epoch: 5 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:36,335-Speed 6302.88 samples/sec Loss 8.2361 LearningRate 0.0009 Epoch: 5 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:39,581-Speed 6311.24 samples/sec Loss 8.1141 LearningRate 0.0009 Epoch: 5 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:42,830-Speed 6303.64 samples/sec Loss 8.1960 LearningRate 0.0009 Epoch: 5 Global Step: 111340 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:51:46,078-Speed 6307.60 samples/sec Loss 8.1518 LearningRate 0.0009 Epoch: 5 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:49,320-Speed 6317.16 samples/sec Loss 8.1585 LearningRate 0.0009 Epoch: 5 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:52,563-Speed 6316.44 samples/sec Loss 8.1882 LearningRate 0.0009 Epoch: 5 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:55,793-Speed 6341.89 samples/sec Loss 8.1256 LearningRate 0.0009 Epoch: 5 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:51:59,040-Speed 6310.24 samples/sec Loss 8.1488 LearningRate 0.0009 Epoch: 5 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:02,287-Speed 6307.70 samples/sec Loss 8.0880 LearningRate 0.0009 Epoch: 5 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:05,533-Speed 6312.45 samples/sec Loss 8.0915 LearningRate 0.0009 Epoch: 5 Global Step: 111410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:08,785-Speed 6298.75 samples/sec Loss 8.1044 LearningRate 0.0009 Epoch: 5 Global Step: 111420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:12,035-Speed 6302.54 samples/sec Loss 8.0472 LearningRate 0.0009 Epoch: 5 Global Step: 111430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:15,281-Speed 6310.18 samples/sec Loss 8.1302 LearningRate 0.0009 Epoch: 5 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:18,526-Speed 6313.12 samples/sec Loss 8.0359 LearningRate 0.0009 Epoch: 5 Global Step: 111450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:21,771-Speed 6312.87 samples/sec Loss 8.1042 LearningRate 0.0009 Epoch: 5 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
01:52:25,028-Speed 6289.97 samples/sec Loss 8.1510 LearningRate 0.0009 Epoch: 5 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:28,263-Speed 6332.56 samples/sec Loss 8.1193 LearningRate 0.0009 Epoch: 5 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:31,505-Speed 6317.22 samples/sec Loss 8.2160 LearningRate 0.0009 Epoch: 5 Global Step: 111490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:34,752-Speed 6308.68 samples/sec Loss 8.1288 LearningRate 0.0009 Epoch: 5 Global Step: 111500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:38,002-Speed 6304.18 samples/sec Loss 8.0610 LearningRate 0.0009 Epoch: 5 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:41,246-Speed 6314.67 samples/sec Loss 8.0785 LearningRate 0.0009 Epoch: 5 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:44,491-Speed 6312.54 samples/sec Loss 8.0915 LearningRate 0.0009 Epoch: 5 Global Step: 111530 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:47,738-Speed 6308.55 samples/sec Loss 8.1774 LearningRate 0.0009 Epoch: 5 Global Step: 111540 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:50,981-Speed 6315.55 samples/sec Loss 8.1102 LearningRate 0.0009 Epoch: 5 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:54,228-Speed 6309.00 samples/sec Loss 8.2015 LearningRate 0.0009 Epoch: 5 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:52:57,480-Speed 6298.97 samples/sec Loss 8.1908 LearningRate 0.0009 Epoch: 5 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:00,731-Speed 6301.17 samples/sec Loss 8.0800 LearningRate 0.0009 Epoch: 5 Global Step: 111580 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 01:53:03,966-Speed 6333.15 samples/sec 
Loss 8.0562 LearningRate 0.0009 Epoch: 5 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:07,216-Speed 6302.01 samples/sec Loss 8.1627 LearningRate 0.0009 Epoch: 5 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:10,463-Speed 6308.38 samples/sec Loss 8.0866 LearningRate 0.0009 Epoch: 5 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:13,709-Speed 6311.72 samples/sec Loss 8.2108 LearningRate 0.0009 Epoch: 5 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:16,956-Speed 6309.52 samples/sec Loss 8.1595 LearningRate 0.0009 Epoch: 5 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:20,186-Speed 6342.12 samples/sec Loss 8.1436 LearningRate 0.0009 Epoch: 5 Global Step: 111640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:23,455-Speed 6265.83 samples/sec Loss 8.1640 LearningRate 0.0009 Epoch: 5 Global Step: 111650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:26,812-Speed 6102.92 samples/sec Loss 8.1542 LearningRate 0.0009 Epoch: 5 Global Step: 111660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:30,069-Speed 6288.17 samples/sec Loss 8.0467 LearningRate 0.0009 Epoch: 5 Global Step: 111670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:33,321-Speed 6300.17 samples/sec Loss 7.9752 LearningRate 0.0009 Epoch: 5 Global Step: 111680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:36,568-Speed 6307.54 samples/sec Loss 8.1604 LearningRate 0.0009 Epoch: 5 Global Step: 111690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:39,815-Speed 6309.98 samples/sec Loss 8.1861 LearningRate 0.0009 Epoch: 5 Global Step: 111700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:43,063-Speed 6305.31 samples/sec Loss 8.1693 LearningRate 0.0009 Epoch: 5 
Global Step: 111710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:46,306-Speed 6317.17 samples/sec Loss 8.1452 LearningRate 0.0009 Epoch: 5 Global Step: 111720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:49,552-Speed 6310.61 samples/sec Loss 8.2210 LearningRate 0.0009 Epoch: 5 Global Step: 111730 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:53:52,800-Speed 6306.74 samples/sec Loss 8.0871 LearningRate 0.0009 Epoch: 5 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:56,049-Speed 6305.51 samples/sec Loss 8.1431 LearningRate 0.0009 Epoch: 5 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:53:59,294-Speed 6312.53 samples/sec Loss 8.0951 LearningRate 0.0009 Epoch: 5 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:02,542-Speed 6307.45 samples/sec Loss 8.1266 LearningRate 0.0009 Epoch: 5 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:05,798-Speed 6290.76 samples/sec Loss 8.1801 LearningRate 0.0009 Epoch: 5 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:09,044-Speed 6310.32 samples/sec Loss 8.1160 LearningRate 0.0009 Epoch: 5 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:12,288-Speed 6315.36 samples/sec Loss 8.1025 LearningRate 0.0009 Epoch: 5 Global Step: 111800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:15,538-Speed 6301.71 samples/sec Loss 8.1422 LearningRate 0.0009 Epoch: 5 Global Step: 111810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:18,784-Speed 6310.63 samples/sec Loss 8.1343 LearningRate 0.0009 Epoch: 5 Global Step: 111820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:22,033-Speed 6306.58 samples/sec Loss 8.1152 LearningRate 0.0009 Epoch: 5 Global Step: 111830 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:54:25,269-Speed 6330.22 samples/sec Loss 8.1652 LearningRate 0.0009 Epoch: 5 Global Step: 111840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:28,515-Speed 6310.82 samples/sec Loss 8.0842 LearningRate 0.0009 Epoch: 5 Global Step: 111850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:31,760-Speed 6312.78 samples/sec Loss 8.0970 LearningRate 0.0009 Epoch: 5 Global Step: 111860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:35,009-Speed 6304.13 samples/sec Loss 8.2120 LearningRate 0.0009 Epoch: 5 Global Step: 111870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:38,256-Speed 6308.62 samples/sec Loss 8.0894 LearningRate 0.0009 Epoch: 5 Global Step: 111880 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:41,506-Speed 6304.37 samples/sec Loss 8.1716 LearningRate 0.0009 Epoch: 5 Global Step: 111890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:44,753-Speed 6307.10 samples/sec Loss 8.1540 LearningRate 0.0009 Epoch: 5 Global Step: 111900 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:54:47,995-Speed 6318.95 samples/sec Loss 8.1728 LearningRate 0.0009 Epoch: 5 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:54:51,244-Speed 6305.47 samples/sec Loss 8.2477 LearningRate 0.0009 Epoch: 5 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:54:54,486-Speed 6317.33 samples/sec Loss 8.1219 LearningRate 0.0009 Epoch: 5 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:54:57,732-Speed 6310.58 samples/sec Loss 8.1492 LearningRate 0.0009 Epoch: 5 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:00,975-Speed 6317.60 samples/sec Loss 8.0422 LearningRate 0.0009 Epoch: 5 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
01:55:04,237-Speed 6280.10 samples/sec Loss 8.1013 LearningRate 0.0009 Epoch: 5 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:07,484-Speed 6306.84 samples/sec Loss 8.0934 LearningRate 0.0009 Epoch: 5 Global Step: 111970 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:10,733-Speed 6306.20 samples/sec Loss 8.1003 LearningRate 0.0009 Epoch: 5 Global Step: 111980 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:13,979-Speed 6311.14 samples/sec Loss 8.0437 LearningRate 0.0009 Epoch: 5 Global Step: 111990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:17,223-Speed 6313.39 samples/sec Loss 8.1203 LearningRate 0.0009 Epoch: 5 Global Step: 112000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:20,469-Speed 6311.82 samples/sec Loss 8.0744 LearningRate 0.0009 Epoch: 5 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:55:23,701-Speed 6338.22 samples/sec Loss 8.0884 LearningRate 0.0009 Epoch: 5 Global Step: 112020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:26,950-Speed 6305.34 samples/sec Loss 8.1429 LearningRate 0.0009 Epoch: 5 Global Step: 112030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:30,194-Speed 6314.32 samples/sec Loss 8.0595 LearningRate 0.0009 Epoch: 5 Global Step: 112040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:33,445-Speed 6301.70 samples/sec Loss 8.1386 LearningRate 0.0009 Epoch: 5 Global Step: 112050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:36,693-Speed 6306.08 samples/sec Loss 8.1412 LearningRate 0.0009 Epoch: 5 Global Step: 112060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:39,939-Speed 6311.50 samples/sec Loss 8.1111 LearningRate 0.0009 Epoch: 5 Global Step: 112070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:43,182-Speed 6315.03 samples/sec Loss 
8.0878 LearningRate 0.0009 Epoch: 5 Global Step: 112080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:46,430-Speed 6307.38 samples/sec Loss 8.0899 LearningRate 0.0009 Epoch: 5 Global Step: 112090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:49,675-Speed 6312.34 samples/sec Loss 8.1259 LearningRate 0.0009 Epoch: 5 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:52,921-Speed 6311.89 samples/sec Loss 8.0783 LearningRate 0.0009 Epoch: 5 Global Step: 112110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:55:56,169-Speed 6305.58 samples/sec Loss 8.1323 LearningRate 0.0009 Epoch: 5 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:55:59,415-Speed 6311.08 samples/sec Loss 8.1558 LearningRate 0.0009 Epoch: 5 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:02,658-Speed 6316.28 samples/sec Loss 8.1360 LearningRate 0.0009 Epoch: 5 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:05,893-Speed 6333.49 samples/sec Loss 8.1149 LearningRate 0.0009 Epoch: 5 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:09,140-Speed 6308.69 samples/sec Loss 8.1543 LearningRate 0.0009 Epoch: 5 Global Step: 112160 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:12,383-Speed 6315.89 samples/sec Loss 7.9963 LearningRate 0.0009 Epoch: 5 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:15,630-Speed 6307.79 samples/sec Loss 8.1476 LearningRate 0.0009 Epoch: 5 Global Step: 112180 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:18,874-Speed 6315.13 samples/sec Loss 8.0956 LearningRate 0.0009 Epoch: 5 Global Step: 112190 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:22,117-Speed 6317.22 samples/sec Loss 8.1048 LearningRate 0.0009 Epoch: 5 Global 
Step: 112200 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:25,361-Speed 6313.84 samples/sec Loss 8.0811 LearningRate 0.0009 Epoch: 5 Global Step: 112210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:28,604-Speed 6317.43 samples/sec Loss 8.1256 LearningRate 0.0009 Epoch: 5 Global Step: 112220 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:31,852-Speed 6305.90 samples/sec Loss 8.0479 LearningRate 0.0009 Epoch: 5 Global Step: 112230 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:35,107-Speed 6294.97 samples/sec Loss 8.1466 LearningRate 0.0009 Epoch: 5 Global Step: 112240 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:56:38,351-Speed 6313.88 samples/sec Loss 8.2002 LearningRate 0.0009 Epoch: 5 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:41,601-Speed 6302.50 samples/sec Loss 8.0794 LearningRate 0.0009 Epoch: 5 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:44,849-Speed 6307.27 samples/sec Loss 8.1326 LearningRate 0.0009 Epoch: 5 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:48,094-Speed 6313.75 samples/sec Loss 8.0895 LearningRate 0.0009 Epoch: 5 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:51,340-Speed 6310.11 samples/sec Loss 8.0815 LearningRate 0.0009 Epoch: 5 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:54,585-Speed 6312.41 samples/sec Loss 8.0916 LearningRate 0.0009 Epoch: 5 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:56:57,830-Speed 6316.06 samples/sec Loss 8.0006 LearningRate 0.0009 Epoch: 5 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:57:01,075-Speed 6313.52 samples/sec Loss 8.0516 LearningRate 0.0009 Epoch: 5 Global Step: 112320 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:57:04,325-Speed 6302.48 samples/sec Loss 8.1553 LearningRate 0.0009 Epoch: 5 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:57:07,577-Speed 6299.36 samples/sec Loss 8.1250 LearningRate 0.0009 Epoch: 5 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:57:10,812-Speed 6331.10 samples/sec Loss 8.0842 LearningRate 0.0009 Epoch: 5 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:57:14,043-Speed 6341.10 samples/sec Loss 8.1108 LearningRate 0.0009 Epoch: 5 Global Step: 112360 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:17,303-Speed 6283.43 samples/sec Loss 8.0470 LearningRate 0.0009 Epoch: 5 Global Step: 112370 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:20,547-Speed 6314.75 samples/sec Loss 8.1609 LearningRate 0.0009 Epoch: 5 Global Step: 112380 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:23,798-Speed 6301.33 samples/sec Loss 8.1206 LearningRate 0.0009 Epoch: 5 Global Step: 112390 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:27,046-Speed 6306.98 samples/sec Loss 8.0469 LearningRate 0.0009 Epoch: 5 Global Step: 112400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:30,291-Speed 6311.71 samples/sec Loss 8.0877 LearningRate 0.0009 Epoch: 5 Global Step: 112410 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:33,537-Speed 6311.88 samples/sec Loss 8.1610 LearningRate 0.0009 Epoch: 5 Global Step: 112420 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:36,780-Speed 6316.41 samples/sec Loss 8.0215 LearningRate 0.0009 Epoch: 5 Global Step: 112430 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:40,029-Speed 6304.88 samples/sec Loss 8.0545 LearningRate 0.0009 Epoch: 5 Global Step: 112440 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
01:57:43,273-Speed 6314.79 samples/sec Loss 8.0946 LearningRate 0.0009 Epoch: 5 Global Step: 112450 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:46,502-Speed 6344.19 samples/sec Loss 8.0461 LearningRate 0.0009 Epoch: 5 Global Step: 112460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:49,750-Speed 6305.82 samples/sec Loss 8.1077 LearningRate 0.0009 Epoch: 5 Global Step: 112470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:52,996-Speed 6311.66 samples/sec Loss 8.0993 LearningRate 0.0009 Epoch: 5 Global Step: 112480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:56,245-Speed 6304.57 samples/sec Loss 8.0769 LearningRate 0.0009 Epoch: 5 Global Step: 112490 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:57:59,492-Speed 6310.05 samples/sec Loss 8.0640 LearningRate 0.0009 Epoch: 5 Global Step: 112500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:02,737-Speed 6312.36 samples/sec Loss 8.1287 LearningRate 0.0009 Epoch: 5 Global Step: 112510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:05,982-Speed 6312.82 samples/sec Loss 8.0553 LearningRate 0.0009 Epoch: 5 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:09,226-Speed 6314.51 samples/sec Loss 8.0316 LearningRate 0.0009 Epoch: 5 Global Step: 112530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:12,473-Speed 6308.42 samples/sec Loss 8.1396 LearningRate 0.0009 Epoch: 5 Global Step: 112540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:15,719-Speed 6311.26 samples/sec Loss 8.0872 LearningRate 0.0009 Epoch: 5 Global Step: 112550 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 01:58:18,966-Speed 6307.12 samples/sec Loss 8.0765 LearningRate 0.0009 Epoch: 5 Global Step: 112560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:22,210-Speed 6314.72 samples/sec Loss 
8.0909 LearningRate 0.0009 Epoch: 5 Global Step: 112570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:25,453-Speed 6316.32 samples/sec Loss 8.1812 LearningRate 0.0009 Epoch: 5 Global Step: 112580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:28,699-Speed 6310.91 samples/sec Loss 8.1484 LearningRate 0.0009 Epoch: 5 Global Step: 112590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:31,949-Speed 6304.04 samples/sec Loss 8.0938 LearningRate 0.0009 Epoch: 5 Global Step: 112600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:35,194-Speed 6311.53 samples/sec Loss 8.1736 LearningRate 0.0009 Epoch: 5 Global Step: 112610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:38,440-Speed 6312.22 samples/sec Loss 7.9994 LearningRate 0.0009 Epoch: 5 Global Step: 112620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:41,686-Speed 6310.52 samples/sec Loss 8.1064 LearningRate 0.0009 Epoch: 5 Global Step: 112630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:44,933-Speed 6307.66 samples/sec Loss 8.0300 LearningRate 0.0009 Epoch: 5 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:48,182-Speed 6305.08 samples/sec Loss 8.1536 LearningRate 0.0009 Epoch: 5 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:51,427-Speed 6311.91 samples/sec Loss 8.0067 LearningRate 0.0009 Epoch: 5 Global Step: 112660 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 01:58:54,660-Speed 6337.43 samples/sec Loss 8.1259 LearningRate 0.0009 Epoch: 5 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:58:57,907-Speed 6307.93 samples/sec Loss 8.0756 LearningRate 0.0009 Epoch: 5 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:01,157-Speed 6304.12 samples/sec Loss 8.1971 LearningRate 0.0009 Epoch: 5 Global 
Step: 112690 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:04,408-Speed 6300.48 samples/sec Loss 8.0720 LearningRate 0.0009 Epoch: 5 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:07,657-Speed 6306.21 samples/sec Loss 8.0336 LearningRate 0.0009 Epoch: 5 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:10,898-Speed 6318.87 samples/sec Loss 8.0705 LearningRate 0.0009 Epoch: 5 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:14,141-Speed 6317.26 samples/sec Loss 8.0286 LearningRate 0.0009 Epoch: 5 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:17,388-Speed 6309.82 samples/sec Loss 8.1088 LearningRate 0.0009 Epoch: 5 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:20,637-Speed 6304.80 samples/sec Loss 8.0686 LearningRate 0.0009 Epoch: 5 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:23,884-Speed 6307.83 samples/sec Loss 8.0774 LearningRate 0.0009 Epoch: 5 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:27,115-Speed 6340.43 samples/sec Loss 8.1148 LearningRate 0.0009 Epoch: 5 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:30,360-Speed 6312.58 samples/sec Loss 8.0982 LearningRate 0.0009 Epoch: 5 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:33,607-Speed 6308.32 samples/sec Loss 8.0853 LearningRate 0.0009 Epoch: 5 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:36,855-Speed 6306.78 samples/sec Loss 8.1238 LearningRate 0.0009 Epoch: 5 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:40,103-Speed 6306.76 samples/sec Loss 8.0231 LearningRate 0.0009 Epoch: 5 Global Step: 112810 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 01:59:43,350-Speed 6308.26 samples/sec Loss 8.1044 LearningRate 0.0009 Epoch: 5 Global Step: 112820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:46,599-Speed 6305.09 samples/sec Loss 8.0328 LearningRate 0.0009 Epoch: 5 Global Step: 112830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:49,879-Speed 6245.21 samples/sec Loss 8.0446 LearningRate 0.0009 Epoch: 5 Global Step: 112840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:53,128-Speed 6305.64 samples/sec Loss 8.0602 LearningRate 0.0009 Epoch: 5 Global Step: 112850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:56,378-Speed 6302.85 samples/sec Loss 8.1546 LearningRate 0.0009 Epoch: 5 Global Step: 112860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 01:59:59,617-Speed 6323.23 samples/sec Loss 8.0763 LearningRate 0.0009 Epoch: 5 Global Step: 112870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:02,864-Speed 6309.43 samples/sec Loss 8.0912 LearningRate 0.0009 Epoch: 5 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:06,117-Speed 6298.07 samples/sec Loss 8.0733 LearningRate 0.0009 Epoch: 5 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:09,369-Speed 6298.01 samples/sec Loss 8.0818 LearningRate 0.0009 Epoch: 5 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:12,616-Speed 6309.05 samples/sec Loss 8.0455 LearningRate 0.0009 Epoch: 5 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:15,863-Speed 6308.59 samples/sec Loss 8.1734 LearningRate 0.0009 Epoch: 5 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:19,113-Speed 6304.44 samples/sec Loss 8.0792 LearningRate 0.0009 Epoch: 5 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
02:00:22,364-Speed 6300.82 samples/sec Loss 7.9999 LearningRate 0.0009 Epoch: 5 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:25,611-Speed 6308.87 samples/sec Loss 8.0745 LearningRate 0.0009 Epoch: 5 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:28,854-Speed 6315.67 samples/sec Loss 8.1210 LearningRate 0.0009 Epoch: 5 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:32,090-Speed 6331.01 samples/sec Loss 8.0883 LearningRate 0.0009 Epoch: 5 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:35,337-Speed 6309.50 samples/sec Loss 8.0633 LearningRate 0.0009 Epoch: 5 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:38,582-Speed 6312.62 samples/sec Loss 8.0262 LearningRate 0.0009 Epoch: 5 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:41,840-Speed 6285.60 samples/sec Loss 8.1016 LearningRate 0.0009 Epoch: 5 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:45,087-Speed 6308.74 samples/sec Loss 8.0405 LearningRate 0.0009 Epoch: 5 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:48,336-Speed 6306.30 samples/sec Loss 8.0663 LearningRate 0.0009 Epoch: 5 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:51,583-Speed 6309.07 samples/sec Loss 8.0960 LearningRate 0.0009 Epoch: 5 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:54,828-Speed 6311.20 samples/sec Loss 8.1353 LearningRate 0.0009 Epoch: 5 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:00:58,077-Speed 6306.68 samples/sec Loss 8.0800 LearningRate 0.0009 Epoch: 5 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:01,322-Speed 6311.24 samples/sec Loss 
8.1026 LearningRate 0.0009 Epoch: 5 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:04,557-Speed 6333.06 samples/sec Loss 8.0064 LearningRate 0.0009 Epoch: 5 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:07,800-Speed 6316.53 samples/sec Loss 8.0354 LearningRate 0.0009 Epoch: 5 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:11,046-Speed 6311.35 samples/sec Loss 8.0166 LearningRate 0.0009 Epoch: 5 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:14,289-Speed 6317.04 samples/sec Loss 8.0196 LearningRate 0.0009 Epoch: 5 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:17,535-Speed 6309.91 samples/sec Loss 8.0313 LearningRate 0.0009 Epoch: 5 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:20,780-Speed 6312.83 samples/sec Loss 8.0999 LearningRate 0.0009 Epoch: 5 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:24,039-Speed 6285.10 samples/sec Loss 8.1090 LearningRate 0.0009 Epoch: 5 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:27,283-Speed 6315.37 samples/sec Loss 8.0827 LearningRate 0.0009 Epoch: 5 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:30,526-Speed 6316.59 samples/sec Loss 8.0674 LearningRate 0.0009 Epoch: 5 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:33,775-Speed 6304.96 samples/sec Loss 8.0294 LearningRate 0.0009 Epoch: 5 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:37,010-Speed 6331.12 samples/sec Loss 8.0650 LearningRate 0.0009 Epoch: 5 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:40,257-Speed 6309.48 samples/sec Loss 8.1014 LearningRate 0.0009 Epoch: 5 Global 
Step: 113180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:43,501-Speed 6314.53 samples/sec Loss 8.0961 LearningRate 0.0009 Epoch: 5 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:46,744-Speed 6315.83 samples/sec Loss 8.1594 LearningRate 0.0009 Epoch: 5 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:49,994-Speed 6303.39 samples/sec Loss 8.1281 LearningRate 0.0009 Epoch: 5 Global Step: 113210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:53,236-Speed 6319.18 samples/sec Loss 8.0897 LearningRate 0.0009 Epoch: 5 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:56,479-Speed 6316.35 samples/sec Loss 8.0758 LearningRate 0.0009 Epoch: 5 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:01:59,728-Speed 6304.86 samples/sec Loss 8.0387 LearningRate 0.0009 Epoch: 5 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:02,976-Speed 6305.98 samples/sec Loss 8.0226 LearningRate 0.0009 Epoch: 5 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:06,229-Speed 6296.86 samples/sec Loss 8.0405 LearningRate 0.0009 Epoch: 5 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:09,467-Speed 6326.30 samples/sec Loss 8.0656 LearningRate 0.0009 Epoch: 5 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:12,714-Speed 6308.93 samples/sec Loss 8.0417 LearningRate 0.0009 Epoch: 5 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:15,945-Speed 6340.49 samples/sec Loss 8.0692 LearningRate 0.0009 Epoch: 5 Global Step: 113290 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:19,191-Speed 6310.06 samples/sec Loss 8.0483 LearningRate 0.0009 Epoch: 5 Global Step: 113300 Fp16 Grad Scale: 32768 
Required: 65 hours Training: 2022-04-01 02:02:22,440-Speed 6306.44 samples/sec Loss 8.1036 LearningRate 0.0009 Epoch: 5 Global Step: 113310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:25,686-Speed 6310.08 samples/sec Loss 8.0849 LearningRate 0.0009 Epoch: 5 Global Step: 113320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:28,934-Speed 6308.66 samples/sec Loss 8.0063 LearningRate 0.0009 Epoch: 5 Global Step: 113330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:32,181-Speed 6307.79 samples/sec Loss 8.1196 LearningRate 0.0009 Epoch: 5 Global Step: 113340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:35,425-Speed 6315.25 samples/sec Loss 8.0112 LearningRate 0.0009 Epoch: 5 Global Step: 113350 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:38,670-Speed 6311.89 samples/sec Loss 8.0536 LearningRate 0.0009 Epoch: 5 Global Step: 113360 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:41,916-Speed 6311.85 samples/sec Loss 8.0428 LearningRate 0.0009 Epoch: 5 Global Step: 113370 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:45,162-Speed 6310.76 samples/sec Loss 8.0592 LearningRate 0.0009 Epoch: 5 Global Step: 113380 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:48,407-Speed 6312.37 samples/sec Loss 8.0796 LearningRate 0.0009 Epoch: 5 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:02:51,638-Speed 6340.74 samples/sec Loss 8.1590 LearningRate 0.0009 Epoch: 5 Global Step: 113400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:54,882-Speed 6313.44 samples/sec Loss 8.0544 LearningRate 0.0009 Epoch: 5 Global Step: 113410 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:02:58,128-Speed 6310.43 samples/sec Loss 8.0483 LearningRate 0.0009 Epoch: 5 Global Step: 113420 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:03:01,375-Speed 6309.09 samples/sec Loss 8.1565 LearningRate 0.0009 Epoch: 5 Global Step: 113430 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:04,621-Speed 6311.86 samples/sec Loss 8.0377 LearningRate 0.0009 Epoch: 5 Global Step: 113440 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:07,868-Speed 6307.31 samples/sec Loss 8.1362 LearningRate 0.0009 Epoch: 5 Global Step: 113450 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:11,116-Speed 6306.49 samples/sec Loss 7.9747 LearningRate 0.0009 Epoch: 5 Global Step: 113460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:14,363-Speed 6310.40 samples/sec Loss 8.1370 LearningRate 0.0009 Epoch: 5 Global Step: 113470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:17,607-Speed 6314.60 samples/sec Loss 8.0639 LearningRate 0.0009 Epoch: 5 Global Step: 113480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:20,851-Speed 6313.64 samples/sec Loss 8.0412 LearningRate 0.0009 Epoch: 5 Global Step: 113490 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:24,096-Speed 6312.14 samples/sec Loss 8.0385 LearningRate 0.0009 Epoch: 5 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:03:27,346-Speed 6304.66 samples/sec Loss 8.0567 LearningRate 0.0009 Epoch: 5 Global Step: 113510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:03:30,590-Speed 6315.84 samples/sec Loss 8.0786 LearningRate 0.0009 Epoch: 5 Global Step: 113520 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:03:33,825-Speed 6332.22 samples/sec Loss 8.0716 LearningRate 0.0009 Epoch: 5 Global Step: 113530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:37,071-Speed 6310.51 samples/sec Loss 8.0403 LearningRate 0.0009 Epoch: 5 Global Step: 113540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:40,318-Speed 6309.39 samples/sec Loss 
8.0887 LearningRate 0.0009 Epoch: 5 Global Step: 113550 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:43,564-Speed 6309.41 samples/sec Loss 8.0534 LearningRate 0.0009 Epoch: 5 Global Step: 113560 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:46,814-Speed 6304.88 samples/sec Loss 8.1972 LearningRate 0.0009 Epoch: 5 Global Step: 113570 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:50,059-Speed 6311.53 samples/sec Loss 8.0879 LearningRate 0.0009 Epoch: 5 Global Step: 113580 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:53,306-Speed 6309.11 samples/sec Loss 8.0246 LearningRate 0.0009 Epoch: 5 Global Step: 113590 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:56,550-Speed 6315.26 samples/sec Loss 8.0757 LearningRate 0.0009 Epoch: 5 Global Step: 113600 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:03:59,792-Speed 6317.59 samples/sec Loss 8.0851 LearningRate 0.0009 Epoch: 5 Global Step: 113610 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:04:03,034-Speed 6318.82 samples/sec Loss 8.0329 LearningRate 0.0009 Epoch: 5 Global Step: 113620 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:04:06,295-Speed 6282.28 samples/sec Loss 8.0784 LearningRate 0.0009 Epoch: 5 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:09,544-Speed 6303.64 samples/sec Loss 8.0341 LearningRate 0.0009 Epoch: 5 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:12,788-Speed 6314.42 samples/sec Loss 8.0899 LearningRate 0.0009 Epoch: 5 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:16,034-Speed 6310.16 samples/sec Loss 8.0382 LearningRate 0.0009 Epoch: 5 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:19,288-Speed 6296.31 samples/sec Loss 8.1287 LearningRate 0.0009 Epoch: 5 Global 
Step: 113670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:22,533-Speed 6312.58 samples/sec Loss 8.0669 LearningRate 0.0009 Epoch: 5 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:25,781-Speed 6307.42 samples/sec Loss 8.1129 LearningRate 0.0009 Epoch: 5 Global Step: 113690 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:29,027-Speed 6310.11 samples/sec Loss 8.0361 LearningRate 0.0009 Epoch: 5 Global Step: 113700 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:32,276-Speed 6305.99 samples/sec Loss 8.0674 LearningRate 0.0009 Epoch: 5 Global Step: 113710 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:35,564-Speed 6229.71 samples/sec Loss 8.1197 LearningRate 0.0009 Epoch: 5 Global Step: 113720 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:38,807-Speed 6316.44 samples/sec Loss 8.0858 LearningRate 0.0009 Epoch: 5 Global Step: 113730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:42,054-Speed 6308.75 samples/sec Loss 8.0557 LearningRate 0.0009 Epoch: 5 Global Step: 113740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:45,305-Speed 6302.49 samples/sec Loss 8.1295 LearningRate 0.0009 Epoch: 5 Global Step: 113750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:48,549-Speed 6314.61 samples/sec Loss 7.9898 LearningRate 0.0009 Epoch: 5 Global Step: 113760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:51,799-Speed 6301.55 samples/sec Loss 8.0518 LearningRate 0.0009 Epoch: 5 Global Step: 113770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:55,046-Speed 6308.75 samples/sec Loss 8.0903 LearningRate 0.0009 Epoch: 5 Global Step: 113780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:04:58,296-Speed 6304.56 samples/sec Loss 8.0414 LearningRate 0.0009 Epoch: 5 Global Step: 113790 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:05:01,546-Speed 6303.01 samples/sec Loss 8.0604 LearningRate 0.0009 Epoch: 5 Global Step: 113800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:04,798-Speed 6297.40 samples/sec Loss 8.0566 LearningRate 0.0009 Epoch: 5 Global Step: 113810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:08,046-Speed 6308.69 samples/sec Loss 8.0890 LearningRate 0.0009 Epoch: 5 Global Step: 113820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:11,292-Speed 6310.23 samples/sec Loss 8.0497 LearningRate 0.0009 Epoch: 5 Global Step: 113830 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 02:05:14,528-Speed 6330.06 samples/sec Loss 7.9555 LearningRate 0.0009 Epoch: 5 Global Step: 113840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:17,780-Speed 6299.40 samples/sec Loss 8.1278 LearningRate 0.0009 Epoch: 5 Global Step: 113850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:21,025-Speed 6312.85 samples/sec Loss 7.9947 LearningRate 0.0009 Epoch: 5 Global Step: 113860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:24,271-Speed 6310.18 samples/sec Loss 8.0960 LearningRate 0.0009 Epoch: 5 Global Step: 113870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:27,526-Speed 6292.36 samples/sec Loss 8.0526 LearningRate 0.0009 Epoch: 5 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:30,772-Speed 6311.70 samples/sec Loss 7.9945 LearningRate 0.0009 Epoch: 5 Global Step: 113890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:05:34,003-Speed 6339.70 samples/sec Loss 8.0421 LearningRate 0.0009 Epoch: 5 Global Step: 113900 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:37,249-Speed 6310.52 samples/sec Loss 7.9023 LearningRate 0.0009 Epoch: 5 Global Step: 113910 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:05:40,494-Speed 6313.07 samples/sec Loss 8.1343 LearningRate 0.0009 Epoch: 5 Global Step: 113920 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:43,751-Speed 6288.96 samples/sec Loss 7.9368 LearningRate 0.0009 Epoch: 5 Global Step: 113930 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:46,999-Speed 6308.34 samples/sec Loss 8.0274 LearningRate 0.0009 Epoch: 5 Global Step: 113940 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:50,242-Speed 6316.32 samples/sec Loss 8.0530 LearningRate 0.0009 Epoch: 5 Global Step: 113950 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:53,487-Speed 6313.20 samples/sec Loss 8.0511 LearningRate 0.0009 Epoch: 5 Global Step: 113960 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:56,729-Speed 6317.51 samples/sec Loss 8.0568 LearningRate 0.0009 Epoch: 5 Global Step: 113970 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:05:59,979-Speed 6303.28 samples/sec Loss 7.9892 LearningRate 0.0009 Epoch: 5 Global Step: 113980 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:03,223-Speed 6314.15 samples/sec Loss 7.9999 LearningRate 0.0009 Epoch: 5 Global Step: 113990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:06,454-Speed 6339.99 samples/sec Loss 8.0807 LearningRate 0.0009 Epoch: 5 Global Step: 114000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:09,709-Speed 6293.68 samples/sec Loss 8.0445 LearningRate 0.0009 Epoch: 5 Global Step: 114010 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:12,956-Speed 6309.06 samples/sec Loss 7.9567 LearningRate 0.0009 Epoch: 5 Global Step: 114020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:16,207-Speed 6301.96 samples/sec Loss 8.0286 LearningRate 0.0009 Epoch: 5 Global Step: 114030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:19,452-Speed 6310.76 samples/sec Loss 
8.0421 LearningRate 0.0009 Epoch: 5 Global Step: 114040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:22,699-Speed 6309.93 samples/sec Loss 7.9658 LearningRate 0.0009 Epoch: 5 Global Step: 114050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:25,947-Speed 6307.27 samples/sec Loss 8.0168 LearningRate 0.0009 Epoch: 5 Global Step: 114060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:29,192-Speed 6311.37 samples/sec Loss 8.0052 LearningRate 0.0009 Epoch: 5 Global Step: 114070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:32,438-Speed 6312.12 samples/sec Loss 8.0694 LearningRate 0.0009 Epoch: 5 Global Step: 114080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:35,681-Speed 6315.79 samples/sec Loss 8.0341 LearningRate 0.0009 Epoch: 5 Global Step: 114090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:06:38,932-Speed 6302.06 samples/sec Loss 8.0731 LearningRate 0.0009 Epoch: 5 Global Step: 114100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:42,183-Speed 6301.26 samples/sec Loss 8.0444 LearningRate 0.0009 Epoch: 5 Global Step: 114110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:45,428-Speed 6311.73 samples/sec Loss 8.0148 LearningRate 0.0009 Epoch: 5 Global Step: 114120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:48,671-Speed 6316.10 samples/sec Loss 8.1342 LearningRate 0.0009 Epoch: 5 Global Step: 114130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:51,918-Speed 6309.60 samples/sec Loss 7.9890 LearningRate 0.0009 Epoch: 5 Global Step: 114140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:55,165-Speed 6308.01 samples/sec Loss 8.0478 LearningRate 0.0009 Epoch: 5 Global Step: 114150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:06:58,417-Speed 6300.95 samples/sec Loss 8.0401 LearningRate 0.0009 Epoch: 5 Global 
Step: 114160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:01,664-Speed 6308.90 samples/sec Loss 8.0438 LearningRate 0.0009 Epoch: 5 Global Step: 114170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:04,916-Speed 6298.56 samples/sec Loss 7.9804 LearningRate 0.0009 Epoch: 5 Global Step: 114180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:08,163-Speed 6310.09 samples/sec Loss 8.0067 LearningRate 0.0009 Epoch: 5 Global Step: 114190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:11,395-Speed 6336.89 samples/sec Loss 8.0018 LearningRate 0.0009 Epoch: 5 Global Step: 114200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:14,639-Speed 6315.81 samples/sec Loss 8.0223 LearningRate 0.0009 Epoch: 5 Global Step: 114210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:17,885-Speed 6308.95 samples/sec Loss 8.0420 LearningRate 0.0009 Epoch: 5 Global Step: 114220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:21,135-Speed 6305.05 samples/sec Loss 8.1115 LearningRate 0.0009 Epoch: 5 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:24,378-Speed 6314.89 samples/sec Loss 8.0455 LearningRate 0.0009 Epoch: 5 Global Step: 114240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:27,625-Speed 6308.79 samples/sec Loss 8.0779 LearningRate 0.0009 Epoch: 5 Global Step: 114250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:30,881-Speed 6291.55 samples/sec Loss 8.0031 LearningRate 0.0009 Epoch: 5 Global Step: 114260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:34,130-Speed 6304.89 samples/sec Loss 8.0073 LearningRate 0.0009 Epoch: 5 Global Step: 114270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:37,379-Speed 6304.19 samples/sec Loss 8.0110 LearningRate 0.0009 Epoch: 5 Global Step: 114280 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:07:40,623-Speed 6315.92 samples/sec Loss 7.9945 LearningRate 0.0009 Epoch: 5 Global Step: 114290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:43,854-Speed 6339.97 samples/sec Loss 8.0714 LearningRate 0.0009 Epoch: 5 Global Step: 114300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:47,096-Speed 6317.55 samples/sec Loss 8.0722 LearningRate 0.0009 Epoch: 5 Global Step: 114310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:50,344-Speed 6306.66 samples/sec Loss 8.0257 LearningRate 0.0009 Epoch: 5 Global Step: 114320 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:53,590-Speed 6311.78 samples/sec Loss 8.0085 LearningRate 0.0009 Epoch: 5 Global Step: 114330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:07:56,836-Speed 6309.56 samples/sec Loss 8.0008 LearningRate 0.0009 Epoch: 5 Global Step: 114340 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:00,081-Speed 6313.18 samples/sec Loss 8.0004 LearningRate 0.0009 Epoch: 5 Global Step: 114350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:03,330-Speed 6306.30 samples/sec Loss 7.9980 LearningRate 0.0009 Epoch: 5 Global Step: 114360 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:06,577-Speed 6309.52 samples/sec Loss 7.9854 LearningRate 0.0009 Epoch: 5 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:09,822-Speed 6312.15 samples/sec Loss 7.9999 LearningRate 0.0009 Epoch: 5 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:13,074-Speed 6299.33 samples/sec Loss 8.0436 LearningRate 0.0009 Epoch: 5 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:16,311-Speed 6327.34 samples/sec Loss 8.0033 LearningRate 0.0009 Epoch: 5 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
02:08:19,557-Speed 6312.08 samples/sec Loss 8.0464 LearningRate 0.0009 Epoch: 5 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:22,803-Speed 6310.60 samples/sec Loss 8.0859 LearningRate 0.0009 Epoch: 5 Global Step: 114420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:26,047-Speed 6314.01 samples/sec Loss 8.0102 LearningRate 0.0009 Epoch: 5 Global Step: 114430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:29,291-Speed 6314.68 samples/sec Loss 8.0832 LearningRate 0.0009 Epoch: 5 Global Step: 114440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:32,538-Speed 6308.39 samples/sec Loss 8.1397 LearningRate 0.0009 Epoch: 5 Global Step: 114450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:35,785-Speed 6310.15 samples/sec Loss 7.9817 LearningRate 0.0009 Epoch: 5 Global Step: 114460 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:39,033-Speed 6305.63 samples/sec Loss 8.0266 LearningRate 0.0009 Epoch: 5 Global Step: 114470 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:42,279-Speed 6311.09 samples/sec Loss 8.0054 LearningRate 0.0009 Epoch: 5 Global Step: 114480 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:45,526-Speed 6308.53 samples/sec Loss 8.0999 LearningRate 0.0009 Epoch: 5 Global Step: 114490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:48,759-Speed 6336.79 samples/sec Loss 8.0150 LearningRate 0.0009 Epoch: 5 Global Step: 114500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:52,003-Speed 6313.91 samples/sec Loss 8.0295 LearningRate 0.0009 Epoch: 5 Global Step: 114510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:55,253-Speed 6302.50 samples/sec Loss 8.0377 LearningRate 0.0009 Epoch: 5 Global Step: 114520 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:08:58,498-Speed 6313.96 samples/sec Loss 
8.0387 LearningRate 0.0009 Epoch: 5 Global Step: 114530 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:01,745-Speed 6309.04 samples/sec Loss 8.0574 LearningRate 0.0009 Epoch: 5 Global Step: 114540 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:04,994-Speed 6304.50 samples/sec Loss 7.9347 LearningRate 0.0009 Epoch: 5 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:08,241-Speed 6308.93 samples/sec Loss 8.0219 LearningRate 0.0009 Epoch: 5 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:11,488-Speed 6308.99 samples/sec Loss 8.0526 LearningRate 0.0009 Epoch: 5 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:14,735-Speed 6308.94 samples/sec Loss 8.0941 LearningRate 0.0009 Epoch: 5 Global Step: 114580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:17,987-Speed 6300.10 samples/sec Loss 8.0535 LearningRate 0.0009 Epoch: 5 Global Step: 114590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:21,217-Speed 6341.68 samples/sec Loss 8.0489 LearningRate 0.0009 Epoch: 5 Global Step: 114600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:24,463-Speed 6310.00 samples/sec Loss 8.0789 LearningRate 0.0009 Epoch: 5 Global Step: 114610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:27,711-Speed 6308.35 samples/sec Loss 8.1415 LearningRate 0.0009 Epoch: 5 Global Step: 114620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:09:30,941-Speed 6341.43 samples/sec Loss 7.9850 LearningRate 0.0009 Epoch: 5 Global Step: 114630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:34,187-Speed 6311.37 samples/sec Loss 8.0213 LearningRate 0.0009 Epoch: 5 Global Step: 114640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:37,445-Speed 6286.21 samples/sec Loss 8.0882 LearningRate 0.0009 Epoch: 5 Global 
Step: 114650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:40,689-Speed 6314.48 samples/sec Loss 8.0739 LearningRate 0.0009 Epoch: 5 Global Step: 114660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:43,934-Speed 6314.02 samples/sec Loss 8.0053 LearningRate 0.0009 Epoch: 5 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:47,181-Speed 6307.55 samples/sec Loss 8.0334 LearningRate 0.0009 Epoch: 5 Global Step: 114680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:50,442-Speed 6281.09 samples/sec Loss 8.0507 LearningRate 0.0009 Epoch: 5 Global Step: 114690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:53,695-Speed 6297.76 samples/sec Loss 8.0383 LearningRate 0.0009 Epoch: 5 Global Step: 114700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:09:56,942-Speed 6308.26 samples/sec Loss 8.0812 LearningRate 0.0009 Epoch: 5 Global Step: 114710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:10:00,188-Speed 6310.54 samples/sec Loss 8.0736 LearningRate 0.0009 Epoch: 5 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:10:03,435-Speed 6310.83 samples/sec Loss 8.1125 LearningRate 0.0009 Epoch: 5 Global Step: 114730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:06,685-Speed 6302.11 samples/sec Loss 8.1235 LearningRate 0.0009 Epoch: 5 Global Step: 114740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:09,931-Speed 6310.37 samples/sec Loss 7.9732 LearningRate 0.0009 Epoch: 5 Global Step: 114750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:13,178-Speed 6308.56 samples/sec Loss 8.0786 LearningRate 0.0009 Epoch: 5 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:16,425-Speed 6309.65 samples/sec Loss 8.0488 LearningRate 0.0009 Epoch: 5 Global Step: 114770 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:10:19,676-Speed 6301.40 samples/sec Loss 8.0275 LearningRate 0.0009 Epoch: 5 Global Step: 114780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:22,923-Speed 6309.50 samples/sec Loss 8.0738 LearningRate 0.0009 Epoch: 5 Global Step: 114790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:26,170-Speed 6309.10 samples/sec Loss 7.9951 LearningRate 0.0009 Epoch: 5 Global Step: 114800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:29,417-Speed 6309.15 samples/sec Loss 7.9435 LearningRate 0.0009 Epoch: 5 Global Step: 114810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:32,661-Speed 6314.62 samples/sec Loss 8.0128 LearningRate 0.0009 Epoch: 5 Global Step: 114820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:35,906-Speed 6313.22 samples/sec Loss 8.0211 LearningRate 0.0009 Epoch: 5 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:39,147-Speed 6320.36 samples/sec Loss 8.0645 LearningRate 0.0009 Epoch: 5 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:42,393-Speed 6310.10 samples/sec Loss 8.0389 LearningRate 0.0009 Epoch: 5 Global Step: 114850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:45,636-Speed 6317.02 samples/sec Loss 8.0630 LearningRate 0.0009 Epoch: 5 Global Step: 114860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:48,879-Speed 6315.83 samples/sec Loss 8.0065 LearningRate 0.0009 Epoch: 5 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:10:52,111-Speed 6338.68 samples/sec Loss 7.9637 LearningRate 0.0009 Epoch: 5 Global Step: 114880 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:10:55,359-Speed 6306.54 samples/sec Loss 7.9968 LearningRate 0.0009 Epoch: 5 Global Step: 114890 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:10:58,609-Speed 6301.72 samples/sec Loss 8.0696 LearningRate 0.0009 Epoch: 5 Global Step: 114900 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:01,859-Speed 6304.11 samples/sec Loss 8.0423 LearningRate 0.0009 Epoch: 5 Global Step: 114910 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:05,105-Speed 6311.13 samples/sec Loss 8.0274 LearningRate 0.0009 Epoch: 5 Global Step: 114920 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:08,350-Speed 6312.90 samples/sec Loss 8.0699 LearningRate 0.0009 Epoch: 5 Global Step: 114930 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:11,597-Speed 6307.62 samples/sec Loss 8.0153 LearningRate 0.0009 Epoch: 5 Global Step: 114940 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:14,842-Speed 6313.36 samples/sec Loss 7.9540 LearningRate 0.0009 Epoch: 5 Global Step: 114950 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:18,090-Speed 6307.22 samples/sec Loss 8.0225 LearningRate 0.0009 Epoch: 5 Global Step: 114960 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:21,335-Speed 6311.91 samples/sec Loss 7.9724 LearningRate 0.0009 Epoch: 5 Global Step: 114970 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:24,584-Speed 6306.33 samples/sec Loss 8.0710 LearningRate 0.0009 Epoch: 5 Global Step: 114980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:11:27,830-Speed 6310.75 samples/sec Loss 8.0758 LearningRate 0.0009 Epoch: 5 Global Step: 114990 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:11:31,076-Speed 6310.81 samples/sec Loss 8.0338 LearningRate 0.0009 Epoch: 5 Global Step: 115000 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:11:34,325-Speed 6305.13 samples/sec Loss 8.0429 LearningRate 0.0009 Epoch: 5 Global Step: 115010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:11:37,562-Speed 6327.73 samples/sec Loss 
7.9922 LearningRate 0.0009 Epoch: 5 Global Step: 115020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:40,810-Speed 6307.18 samples/sec Loss 7.9907 LearningRate 0.0009 Epoch: 5 Global Step: 115030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:44,057-Speed 6309.20 samples/sec Loss 8.0407 LearningRate 0.0009 Epoch: 5 Global Step: 115040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:47,302-Speed 6312.03 samples/sec Loss 7.9770 LearningRate 0.0009 Epoch: 5 Global Step: 115050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:50,549-Speed 6307.97 samples/sec Loss 8.0259 LearningRate 0.0009 Epoch: 5 Global Step: 115060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:53,798-Speed 6305.84 samples/sec Loss 7.8980 LearningRate 0.0009 Epoch: 5 Global Step: 115070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:11:57,045-Speed 6307.80 samples/sec Loss 8.0578 LearningRate 0.0009 Epoch: 5 Global Step: 115080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:00,294-Speed 6305.83 samples/sec Loss 8.0347 LearningRate 0.0009 Epoch: 5 Global Step: 115090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:03,545-Speed 6300.79 samples/sec Loss 7.9697 LearningRate 0.0009 Epoch: 5 Global Step: 115100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:06,792-Speed 6308.76 samples/sec Loss 8.0274 LearningRate 0.0009 Epoch: 5 Global Step: 115110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:10,038-Speed 6309.40 samples/sec Loss 8.1127 LearningRate 0.0009 Epoch: 5 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:13,288-Speed 6303.46 samples/sec Loss 7.9870 LearningRate 0.0009 Epoch: 5 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:16,537-Speed 6304.89 samples/sec Loss 8.0119 LearningRate 0.0009 Epoch: 5 Global 
Step: 115140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:19,784-Speed 6308.62 samples/sec Loss 8.1859 LearningRate 0.0009 Epoch: 5 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:23,032-Speed 6306.56 samples/sec Loss 8.0008 LearningRate 0.0009 Epoch: 5 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:26,277-Speed 6312.17 samples/sec Loss 8.0218 LearningRate 0.0009 Epoch: 5 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:29,524-Speed 6310.38 samples/sec Loss 8.0467 LearningRate 0.0009 Epoch: 5 Global Step: 115180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:32,772-Speed 6305.40 samples/sec Loss 8.0442 LearningRate 0.0009 Epoch: 5 Global Step: 115190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:12:36,003-Speed 6340.95 samples/sec Loss 8.0635 LearningRate 0.0009 Epoch: 5 Global Step: 115200 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:39,249-Speed 6311.57 samples/sec Loss 8.0056 LearningRate 0.0009 Epoch: 5 Global Step: 115210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:42,498-Speed 6304.07 samples/sec Loss 7.9625 LearningRate 0.0009 Epoch: 5 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:45,742-Speed 6314.85 samples/sec Loss 8.1104 LearningRate 0.0009 Epoch: 5 Global Step: 115230 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:48,992-Speed 6303.06 samples/sec Loss 7.9727 LearningRate 0.0009 Epoch: 5 Global Step: 115240 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:52,238-Speed 6311.56 samples/sec Loss 7.9691 LearningRate 0.0009 Epoch: 5 Global Step: 115250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:12:55,482-Speed 6314.61 samples/sec Loss 7.9321 LearningRate 0.0009 Epoch: 5 Global Step: 115260 Fp16 Grad Scale: 32768 
Required: 65 hours Training: 2022-04-01 02:12:58,726-Speed 6313.37 samples/sec Loss 8.0115 LearningRate 0.0009 Epoch: 5 Global Step: 115270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:01,971-Speed 6313.68 samples/sec Loss 7.9948 LearningRate 0.0009 Epoch: 5 Global Step: 115280 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:05,217-Speed 6310.05 samples/sec Loss 8.0205 LearningRate 0.0009 Epoch: 5 Global Step: 115290 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:08,466-Speed 6305.25 samples/sec Loss 7.9379 LearningRate 0.0009 Epoch: 5 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:13:11,713-Speed 6307.43 samples/sec Loss 8.0996 LearningRate 0.0009 Epoch: 5 Global Step: 115310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:13:14,957-Speed 6316.17 samples/sec Loss 8.0056 LearningRate 0.0009 Epoch: 5 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:13:18,204-Speed 6309.01 samples/sec Loss 7.9595 LearningRate 0.0009 Epoch: 5 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:13:21,438-Speed 6333.93 samples/sec Loss 8.0612 LearningRate 0.0009 Epoch: 5 Global Step: 115340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:24,685-Speed 6308.33 samples/sec Loss 8.0348 LearningRate 0.0009 Epoch: 5 Global Step: 115350 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:27,940-Speed 6293.11 samples/sec Loss 8.0741 LearningRate 0.0009 Epoch: 5 Global Step: 115360 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:31,181-Speed 6320.63 samples/sec Loss 7.8735 LearningRate 0.0009 Epoch: 5 Global Step: 115370 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:34,428-Speed 6307.99 samples/sec Loss 8.0034 LearningRate 0.0009 Epoch: 5 Global Step: 115380 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:13:37,682-Speed 6294.61 samples/sec Loss 8.0437 LearningRate 0.0009 Epoch: 5 Global Step: 115390 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:40,930-Speed 6306.62 samples/sec Loss 7.9322 LearningRate 0.0009 Epoch: 5 Global Step: 115400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:44,176-Speed 6312.02 samples/sec Loss 7.9767 LearningRate 0.0009 Epoch: 5 Global Step: 115410 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:47,423-Speed 6308.88 samples/sec Loss 7.9784 LearningRate 0.0009 Epoch: 5 Global Step: 115420 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:50,672-Speed 6305.38 samples/sec Loss 7.9921 LearningRate 0.0009 Epoch: 5 Global Step: 115430 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:13:53,921-Speed 6303.99 samples/sec Loss 7.9647 LearningRate 0.0009 Epoch: 5 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:13:57,169-Speed 6307.60 samples/sec Loss 8.0280 LearningRate 0.0009 Epoch: 5 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:00,404-Speed 6331.71 samples/sec Loss 7.8810 LearningRate 0.0009 Epoch: 5 Global Step: 115460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:03,663-Speed 6286.07 samples/sec Loss 7.8778 LearningRate 0.0009 Epoch: 5 Global Step: 115470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:06,912-Speed 6304.25 samples/sec Loss 7.9838 LearningRate 0.0009 Epoch: 5 Global Step: 115480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:10,154-Speed 6318.96 samples/sec Loss 8.0144 LearningRate 0.0009 Epoch: 5 Global Step: 115490 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:13,413-Speed 6285.33 samples/sec Loss 8.0359 LearningRate 0.0009 Epoch: 5 Global Step: 115500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:16,657-Speed 6316.01 samples/sec Loss 
8.0208 LearningRate 0.0009 Epoch: 5 Global Step: 115510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:19,901-Speed 6314.16 samples/sec Loss 7.9915 LearningRate 0.0009 Epoch: 5 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:23,145-Speed 6312.96 samples/sec Loss 8.0333 LearningRate 0.0009 Epoch: 5 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:26,387-Speed 6318.62 samples/sec Loss 7.9831 LearningRate 0.0009 Epoch: 5 Global Step: 115540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:29,631-Speed 6314.56 samples/sec Loss 8.0287 LearningRate 0.0009 Epoch: 5 Global Step: 115550 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:32,875-Speed 6316.00 samples/sec Loss 8.0461 LearningRate 0.0009 Epoch: 5 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:36,120-Speed 6311.39 samples/sec Loss 7.9760 LearningRate 0.0009 Epoch: 5 Global Step: 115570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:39,364-Speed 6315.05 samples/sec Loss 8.0233 LearningRate 0.0009 Epoch: 5 Global Step: 115580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:42,610-Speed 6310.48 samples/sec Loss 7.9462 LearningRate 0.0009 Epoch: 5 Global Step: 115590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:45,860-Speed 6302.73 samples/sec Loss 7.9406 LearningRate 0.0009 Epoch: 5 Global Step: 115600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:49,102-Speed 6318.37 samples/sec Loss 7.9628 LearningRate 0.0009 Epoch: 5 Global Step: 115610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:52,349-Speed 6310.42 samples/sec Loss 8.0138 LearningRate 0.0009 Epoch: 5 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:14:55,583-Speed 6332.97 samples/sec Loss 7.9539 LearningRate 0.0009 Epoch: 5 Global 
Step: 115630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:14:58,827-Speed 6315.29 samples/sec Loss 7.9306 LearningRate 0.0009 Epoch: 5 Global Step: 115640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:02,074-Speed 6309.34 samples/sec Loss 8.0309 LearningRate 0.0009 Epoch: 5 Global Step: 115650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:05,317-Speed 6315.93 samples/sec Loss 7.9993 LearningRate 0.0009 Epoch: 5 Global Step: 115660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:08,562-Speed 6312.45 samples/sec Loss 8.0721 LearningRate 0.0009 Epoch: 5 Global Step: 115670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:11,806-Speed 6314.78 samples/sec Loss 7.9789 LearningRate 0.0009 Epoch: 5 Global Step: 115680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:15,059-Speed 6298.01 samples/sec Loss 7.9237 LearningRate 0.0009 Epoch: 5 Global Step: 115690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:18,304-Speed 6312.16 samples/sec Loss 8.0690 LearningRate 0.0009 Epoch: 5 Global Step: 115700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:21,550-Speed 6310.82 samples/sec Loss 8.0163 LearningRate 0.0009 Epoch: 5 Global Step: 115710 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:24,797-Speed 6307.77 samples/sec Loss 8.0566 LearningRate 0.0009 Epoch: 5 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:15:28,040-Speed 6317.75 samples/sec Loss 8.0496 LearningRate 0.0009 Epoch: 5 Global Step: 115730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:31,282-Speed 6317.85 samples/sec Loss 7.9276 LearningRate 0.0009 Epoch: 5 Global Step: 115740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:34,533-Speed 6301.80 samples/sec Loss 7.9164 LearningRate 0.0009 Epoch: 5 Global Step: 115750 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:15:37,786-Speed 6296.06 samples/sec Loss 8.0384 LearningRate 0.0009 Epoch: 5 Global Step: 115760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:41,044-Speed 6288.19 samples/sec Loss 7.9828 LearningRate 0.0009 Epoch: 5 Global Step: 115770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:44,294-Speed 6302.36 samples/sec Loss 8.0693 LearningRate 0.0009 Epoch: 5 Global Step: 115780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:47,547-Speed 6297.20 samples/sec Loss 8.0358 LearningRate 0.0009 Epoch: 5 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:50,791-Speed 6314.17 samples/sec Loss 7.9959 LearningRate 0.0009 Epoch: 5 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:54,038-Speed 6309.76 samples/sec Loss 7.9881 LearningRate 0.0009 Epoch: 5 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:15:57,280-Speed 6317.46 samples/sec Loss 8.0398 LearningRate 0.0009 Epoch: 5 Global Step: 115820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:00,509-Speed 6344.26 samples/sec Loss 8.0365 LearningRate 0.0009 Epoch: 5 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:03,753-Speed 6313.67 samples/sec Loss 7.9679 LearningRate 0.0009 Epoch: 5 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:07,001-Speed 6307.59 samples/sec Loss 7.9241 LearningRate 0.0009 Epoch: 5 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:10,255-Speed 6296.15 samples/sec Loss 8.0338 LearningRate 0.0009 Epoch: 5 Global Step: 115860 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:13,502-Speed 6308.81 samples/sec Loss 7.9958 LearningRate 0.0009 Epoch: 5 Global Step: 115870 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:16:16,751-Speed 6305.10 samples/sec Loss 7.9702 LearningRate 0.0009 Epoch: 5 Global Step: 115880 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:19,996-Speed 6312.66 samples/sec Loss 8.0225 LearningRate 0.0009 Epoch: 5 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:23,238-Speed 6318.56 samples/sec Loss 7.9233 LearningRate 0.0009 Epoch: 5 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:26,486-Speed 6306.88 samples/sec Loss 7.9432 LearningRate 0.0009 Epoch: 5 Global Step: 115910 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:29,733-Speed 6309.20 samples/sec Loss 7.9040 LearningRate 0.0009 Epoch: 5 Global Step: 115920 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:32,978-Speed 6310.81 samples/sec Loss 7.9893 LearningRate 0.0009 Epoch: 5 Global Step: 115930 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:36,227-Speed 6306.38 samples/sec Loss 8.0418 LearningRate 0.0009 Epoch: 5 Global Step: 115940 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:39,474-Speed 6308.09 samples/sec Loss 7.9856 LearningRate 0.0009 Epoch: 5 Global Step: 115950 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:16:42,722-Speed 6307.13 samples/sec Loss 7.9164 LearningRate 0.0009 Epoch: 5 Global Step: 115960 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:45,972-Speed 6303.58 samples/sec Loss 7.9563 LearningRate 0.0009 Epoch: 5 Global Step: 115970 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:49,218-Speed 6310.36 samples/sec Loss 7.9450 LearningRate 0.0009 Epoch: 5 Global Step: 115980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:52,466-Speed 6305.56 samples/sec Loss 7.9716 LearningRate 0.0009 Epoch: 5 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:55,710-Speed 6314.26 samples/sec Loss 
8.0171 LearningRate 0.0009 Epoch: 5 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:16:58,955-Speed 6314.09 samples/sec Loss 7.9109 LearningRate 0.0009 Epoch: 5 Global Step: 116010 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:02,205-Speed 6302.18 samples/sec Loss 8.0358 LearningRate 0.0009 Epoch: 5 Global Step: 116020 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:05,450-Speed 6313.17 samples/sec Loss 8.0549 LearningRate 0.0009 Epoch: 5 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:08,698-Speed 6305.61 samples/sec Loss 8.0100 LearningRate 0.0009 Epoch: 5 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:11,942-Speed 6315.27 samples/sec Loss 8.0522 LearningRate 0.0009 Epoch: 5 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:15,180-Speed 6326.91 samples/sec Loss 8.0112 LearningRate 0.0009 Epoch: 5 Global Step: 116060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:18,423-Speed 6315.74 samples/sec Loss 8.0042 LearningRate 0.0009 Epoch: 5 Global Step: 116070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:21,671-Speed 6307.17 samples/sec Loss 8.0152 LearningRate 0.0009 Epoch: 5 Global Step: 116080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:24,920-Speed 6306.10 samples/sec Loss 7.9021 LearningRate 0.0009 Epoch: 5 Global Step: 116090 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:28,166-Speed 6311.82 samples/sec Loss 7.9091 LearningRate 0.0009 Epoch: 5 Global Step: 116100 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:31,410-Speed 6314.32 samples/sec Loss 7.8908 LearningRate 0.0009 Epoch: 5 Global Step: 116110 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:34,654-Speed 6314.38 samples/sec Loss 8.0099 LearningRate 0.0009 Epoch: 5 Global 
Step: 116120 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:37,903-Speed 6305.79 samples/sec Loss 7.9522 LearningRate 0.0009 Epoch: 5 Global Step: 116130 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:41,150-Speed 6308.96 samples/sec Loss 7.9105 LearningRate 0.0009 Epoch: 5 Global Step: 116140 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:44,401-Speed 6301.62 samples/sec Loss 7.9787 LearningRate 0.0009 Epoch: 5 Global Step: 116150 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:17:47,646-Speed 6312.48 samples/sec Loss 8.0542 LearningRate 0.0009 Epoch: 5 Global Step: 116160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:50,893-Speed 6308.71 samples/sec Loss 7.9049 LearningRate 0.0009 Epoch: 5 Global Step: 116170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:54,139-Speed 6309.14 samples/sec Loss 7.9842 LearningRate 0.0009 Epoch: 5 Global Step: 116180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:17:57,385-Speed 6310.95 samples/sec Loss 7.9558 LearningRate 0.0009 Epoch: 5 Global Step: 116190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:00,631-Speed 6310.74 samples/sec Loss 8.0099 LearningRate 0.0009 Epoch: 5 Global Step: 116200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:03,873-Speed 6317.96 samples/sec Loss 7.8973 LearningRate 0.0009 Epoch: 5 Global Step: 116210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:07,124-Speed 6302.68 samples/sec Loss 8.0249 LearningRate 0.0009 Epoch: 5 Global Step: 116220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:10,379-Speed 6293.29 samples/sec Loss 7.9259 LearningRate 0.0009 Epoch: 5 Global Step: 116230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:13,623-Speed 6313.44 samples/sec Loss 8.0288 LearningRate 0.0009 Epoch: 5 Global Step: 116240 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:18:16,871-Speed 6308.37 samples/sec Loss 7.9576 LearningRate 0.0009 Epoch: 5 Global Step: 116250 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:20,102-Speed 6338.47 samples/sec Loss 7.9296 LearningRate 0.0009 Epoch: 5 Global Step: 116260 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:23,349-Speed 6308.85 samples/sec Loss 7.9766 LearningRate 0.0009 Epoch: 5 Global Step: 116270 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:26,596-Speed 6308.60 samples/sec Loss 7.9172 LearningRate 0.0009 Epoch: 5 Global Step: 116280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:29,845-Speed 6305.58 samples/sec Loss 8.0895 LearningRate 0.0009 Epoch: 5 Global Step: 116290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:33,089-Speed 6315.16 samples/sec Loss 8.0032 LearningRate 0.0009 Epoch: 5 Global Step: 116300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:36,334-Speed 6312.30 samples/sec Loss 8.0115 LearningRate 0.0009 Epoch: 5 Global Step: 116310 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:39,585-Speed 6300.38 samples/sec Loss 7.9934 LearningRate 0.0009 Epoch: 5 Global Step: 116320 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:42,836-Speed 6301.83 samples/sec Loss 7.9221 LearningRate 0.0009 Epoch: 5 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:46,083-Speed 6308.91 samples/sec Loss 7.9957 LearningRate 0.0009 Epoch: 5 Global Step: 116340 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:49,333-Speed 6303.04 samples/sec Loss 8.0034 LearningRate 0.0009 Epoch: 5 Global Step: 116350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:52,564-Speed 6340.96 samples/sec Loss 7.9390 LearningRate 0.0009 Epoch: 5 Global Step: 116360 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
02:18:55,810-Speed 6309.64 samples/sec Loss 8.0809 LearningRate 0.0009 Epoch: 5 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:18:59,055-Speed 6312.47 samples/sec Loss 7.9718 LearningRate 0.0009 Epoch: 5 Global Step: 116380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:02,303-Speed 6306.29 samples/sec Loss 7.9707 LearningRate 0.0009 Epoch: 5 Global Step: 116390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:05,552-Speed 6304.84 samples/sec Loss 7.9906 LearningRate 0.0009 Epoch: 5 Global Step: 116400 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:08,800-Speed 6306.37 samples/sec Loss 8.0025 LearningRate 0.0009 Epoch: 5 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:12,047-Speed 6309.41 samples/sec Loss 7.9565 LearningRate 0.0009 Epoch: 5 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:15,293-Speed 6311.67 samples/sec Loss 7.9665 LearningRate 0.0009 Epoch: 5 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:18,546-Speed 6297.24 samples/sec Loss 8.0006 LearningRate 0.0009 Epoch: 5 Global Step: 116440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:21,795-Speed 6305.03 samples/sec Loss 7.9729 LearningRate 0.0009 Epoch: 5 Global Step: 116450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:25,027-Speed 6337.50 samples/sec Loss 8.0331 LearningRate 0.0009 Epoch: 5 Global Step: 116460 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:28,274-Speed 6307.73 samples/sec Loss 8.0295 LearningRate 0.0009 Epoch: 5 Global Step: 116470 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:31,519-Speed 6312.88 samples/sec Loss 7.9421 LearningRate 0.0009 Epoch: 5 Global Step: 116480 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:34,765-Speed 6311.87 samples/sec Loss 
7.9723 LearningRate 0.0009 Epoch: 5 Global Step: 116490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:19:38,003-Speed 6325.44 samples/sec Loss 7.9547 LearningRate 0.0009 Epoch: 5 Global Step: 116500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:41,248-Speed 6311.86 samples/sec Loss 8.0026 LearningRate 0.0009 Epoch: 5 Global Step: 116510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:44,497-Speed 6307.06 samples/sec Loss 8.0038 LearningRate 0.0009 Epoch: 5 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:47,740-Speed 6316.99 samples/sec Loss 8.0268 LearningRate 0.0009 Epoch: 5 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:50,984-Speed 6314.09 samples/sec Loss 7.9827 LearningRate 0.0009 Epoch: 5 Global Step: 116540 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:54,228-Speed 6314.25 samples/sec Loss 8.0061 LearningRate 0.0009 Epoch: 5 Global Step: 116550 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:19:57,477-Speed 6304.64 samples/sec Loss 8.0415 LearningRate 0.0009 Epoch: 5 Global Step: 116560 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:00,720-Speed 6316.55 samples/sec Loss 7.9806 LearningRate 0.0009 Epoch: 5 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:03,968-Speed 6307.14 samples/sec Loss 7.9678 LearningRate 0.0009 Epoch: 5 Global Step: 116580 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:07,219-Speed 6300.86 samples/sec Loss 7.9548 LearningRate 0.0009 Epoch: 5 Global Step: 116590 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:10,468-Speed 6304.07 samples/sec Loss 8.0313 LearningRate 0.0009 Epoch: 5 Global Step: 116600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:20:13,703-Speed 6332.87 samples/sec Loss 8.0416 LearningRate 0.0009 Epoch: 5 Global 
Step: 116610 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:16,949-Speed 6310.94 samples/sec Loss 7.9557 LearningRate 0.0009 Epoch: 5 Global Step: 116620 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:20,194-Speed 6311.89 samples/sec Loss 8.0074 LearningRate 0.0009 Epoch: 5 Global Step: 116630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:23,440-Speed 6311.77 samples/sec Loss 7.9444 LearningRate 0.0009 Epoch: 5 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:26,692-Speed 6298.38 samples/sec Loss 7.9461 LearningRate 0.0009 Epoch: 5 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:29,942-Speed 6302.89 samples/sec Loss 7.9973 LearningRate 0.0009 Epoch: 5 Global Step: 116660 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:33,185-Speed 6315.94 samples/sec Loss 8.0580 LearningRate 0.0009 Epoch: 5 Global Step: 116670 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:36,433-Speed 6307.49 samples/sec Loss 8.0186 LearningRate 0.0009 Epoch: 5 Global Step: 116680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:39,682-Speed 6305.30 samples/sec Loss 7.9657 LearningRate 0.0009 Epoch: 5 Global Step: 116690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:42,930-Speed 6306.09 samples/sec Loss 7.9880 LearningRate 0.0009 Epoch: 5 Global Step: 116700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:20:46,175-Speed 6312.28 samples/sec Loss 7.9545 LearningRate 0.0009 Epoch: 5 Global Step: 116710 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:20:49,421-Speed 6312.09 samples/sec Loss 7.8992 LearningRate 0.0009 Epoch: 5 Global Step: 116720 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:20:52,677-Speed 6290.81 samples/sec Loss 7.9434 LearningRate 0.0009 Epoch: 5 Global Step: 116730 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:20:55,922-Speed 6313.87 samples/sec Loss 7.9426 LearningRate 0.0009 Epoch: 5 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:20:59,170-Speed 6306.35 samples/sec Loss 7.9623 LearningRate 0.0009 Epoch: 5 Global Step: 116750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:02,415-Speed 6313.17 samples/sec Loss 7.9474 LearningRate 0.0009 Epoch: 5 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:05,660-Speed 6312.58 samples/sec Loss 7.9135 LearningRate 0.0009 Epoch: 5 Global Step: 116770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:08,909-Speed 6305.39 samples/sec Loss 7.9950 LearningRate 0.0009 Epoch: 5 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:12,155-Speed 6309.97 samples/sec Loss 7.9966 LearningRate 0.0009 Epoch: 5 Global Step: 116790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:15,402-Speed 6308.92 samples/sec Loss 7.8961 LearningRate 0.0009 Epoch: 5 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:18,635-Speed 6335.90 samples/sec Loss 7.9987 LearningRate 0.0009 Epoch: 5 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:21,883-Speed 6307.57 samples/sec Loss 7.8989 LearningRate 0.0009 Epoch: 5 Global Step: 116820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:25,132-Speed 6304.50 samples/sec Loss 7.9463 LearningRate 0.0009 Epoch: 5 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:28,374-Speed 6319.01 samples/sec Loss 7.9219 LearningRate 0.0009 Epoch: 5 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:31,621-Speed 6308.07 samples/sec Loss 7.9278 LearningRate 0.0009 Epoch: 5 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
02:21:34,869-Speed 6307.31 samples/sec Loss 7.9812 LearningRate 0.0009 Epoch: 5 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:38,119-Speed 6302.48 samples/sec Loss 8.0093 LearningRate 0.0009 Epoch: 5 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:41,369-Speed 6303.24 samples/sec Loss 7.8952 LearningRate 0.0009 Epoch: 5 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:44,612-Speed 6315.72 samples/sec Loss 7.9694 LearningRate 0.0009 Epoch: 5 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:47,859-Speed 6308.89 samples/sec Loss 7.9577 LearningRate 0.0009 Epoch: 5 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:51,087-Speed 6346.75 samples/sec Loss 7.9574 LearningRate 0.0009 Epoch: 5 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:54,330-Speed 6316.53 samples/sec Loss 8.0351 LearningRate 0.0009 Epoch: 5 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:21:57,576-Speed 6311.04 samples/sec Loss 7.9554 LearningRate 0.0009 Epoch: 5 Global Step: 116930 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:00,818-Speed 6317.68 samples/sec Loss 7.9853 LearningRate 0.0009 Epoch: 5 Global Step: 116940 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:04,068-Speed 6303.16 samples/sec Loss 7.9205 LearningRate 0.0009 Epoch: 5 Global Step: 116950 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:07,303-Speed 6332.65 samples/sec Loss 7.9484 LearningRate 0.0009 Epoch: 5 Global Step: 116960 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:10,549-Speed 6310.03 samples/sec Loss 7.9324 LearningRate 0.0009 Epoch: 5 Global Step: 116970 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:13,794-Speed 6313.01 samples/sec Loss 
7.9813 LearningRate 0.0009 Epoch: 5 Global Step: 116980 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:17,045-Speed 6302.26 samples/sec Loss 8.0018 LearningRate 0.0009 Epoch: 5 Global Step: 116990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:20,289-Speed 6313.21 samples/sec Loss 7.9310 LearningRate 0.0009 Epoch: 5 Global Step: 117000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:23,538-Speed 6305.55 samples/sec Loss 8.0310 LearningRate 0.0009 Epoch: 5 Global Step: 117010 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:26,782-Speed 6314.92 samples/sec Loss 7.9964 LearningRate 0.0009 Epoch: 5 Global Step: 117020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:30,026-Speed 6314.24 samples/sec Loss 7.9508 LearningRate 0.0009 Epoch: 5 Global Step: 117030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:33,273-Speed 6308.05 samples/sec Loss 7.8892 LearningRate 0.0009 Epoch: 5 Global Step: 117040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:36,519-Speed 6311.43 samples/sec Loss 8.0012 LearningRate 0.0009 Epoch: 5 Global Step: 117050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:22:39,764-Speed 6312.40 samples/sec Loss 7.9532 LearningRate 0.0009 Epoch: 5 Global Step: 117060 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:43,010-Speed 6311.68 samples/sec Loss 7.9585 LearningRate 0.0009 Epoch: 5 Global Step: 117070 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:46,256-Speed 6308.96 samples/sec Loss 7.9997 LearningRate 0.0009 Epoch: 5 Global Step: 117080 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:49,502-Speed 6312.03 samples/sec Loss 7.9959 LearningRate 0.0009 Epoch: 5 Global Step: 117090 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:52,748-Speed 6310.82 samples/sec Loss 7.9810 LearningRate 0.0009 Epoch: 5 Global 
Step: 117100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:55,996-Speed 6306.74 samples/sec Loss 8.0710 LearningRate 0.0009 Epoch: 5 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:22:59,243-Speed 6307.46 samples/sec Loss 7.9028 LearningRate 0.0009 Epoch: 5 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:02,489-Speed 6310.62 samples/sec Loss 7.9323 LearningRate 0.0009 Epoch: 5 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:05,740-Speed 6301.76 samples/sec Loss 8.0249 LearningRate 0.0009 Epoch: 5 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:08,990-Speed 6301.60 samples/sec Loss 7.8951 LearningRate 0.0009 Epoch: 5 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:12,225-Speed 6332.81 samples/sec Loss 7.9101 LearningRate 0.0009 Epoch: 5 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:15,471-Speed 6311.65 samples/sec Loss 7.9432 LearningRate 0.0009 Epoch: 5 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:18,718-Speed 6308.22 samples/sec Loss 7.9218 LearningRate 0.0009 Epoch: 5 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:21,965-Speed 6310.43 samples/sec Loss 7.9734 LearningRate 0.0009 Epoch: 5 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:25,208-Speed 6315.87 samples/sec Loss 7.9153 LearningRate 0.0009 Epoch: 5 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:28,453-Speed 6311.83 samples/sec Loss 7.9988 LearningRate 0.0009 Epoch: 5 Global Step: 117210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:31,699-Speed 6311.11 samples/sec Loss 7.9958 LearningRate 0.0009 Epoch: 5 Global Step: 117220 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:23:34,950-Speed 6301.68 samples/sec Loss 7.9776 LearningRate 0.0009 Epoch: 5 Global Step: 117230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:38,197-Speed 6308.35 samples/sec Loss 7.9770 LearningRate 0.0009 Epoch: 5 Global Step: 117240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:23:41,431-Speed 6334.75 samples/sec Loss 8.0157 LearningRate 0.0009 Epoch: 5 Global Step: 117250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:23:44,686-Speed 6292.42 samples/sec Loss 7.9751 LearningRate 0.0009 Epoch: 5 Global Step: 117260 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:23:47,932-Speed 6311.37 samples/sec Loss 7.8758 LearningRate 0.0009 Epoch: 5 Global Step: 117270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:23:51,177-Speed 6312.96 samples/sec Loss 7.9161 LearningRate 0.0009 Epoch: 5 Global Step: 117280 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:23:54,423-Speed 6310.03 samples/sec Loss 7.9511 LearningRate 0.0009 Epoch: 5 Global Step: 117290 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:23:57,666-Speed 6315.78 samples/sec Loss 7.9753 LearningRate 0.0009 Epoch: 5 Global Step: 117300 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:24:00,918-Speed 6299.42 samples/sec Loss 7.9061 LearningRate 0.0009 Epoch: 5 Global Step: 117310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:24:04,163-Speed 6313.72 samples/sec Loss 7.9140 LearningRate 0.0009 Epoch: 5 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:24:07,409-Speed 6309.24 samples/sec Loss 7.9389 LearningRate 0.0009 Epoch: 5 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:24:10,658-Speed 6305.25 samples/sec Loss 7.8534 LearningRate 0.0009 Epoch: 5 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:24:13,905-Speed 6310.04 samples/sec Loss 7.8897 LearningRate 0.0009 Epoch: 5 Global Step: 117350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:17,154-Speed 6304.25 samples/sec Loss 7.9797 LearningRate 0.0009 Epoch: 5 Global Step: 117360 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:20,402-Speed 6307.01 samples/sec Loss 7.9757 LearningRate 0.0009 Epoch: 5 Global Step: 117370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:23,653-Speed 6300.32 samples/sec Loss 7.9636 LearningRate 0.0009 Epoch: 5 Global Step: 117380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:26,934-Speed 6243.50 samples/sec Loss 7.9780 LearningRate 0.0009 Epoch: 5 Global Step: 117390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:30,192-Speed 6287.45 samples/sec Loss 7.9842 LearningRate 0.0009 Epoch: 5 Global Step: 117400 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:33,438-Speed 6311.32 samples/sec Loss 7.9246 LearningRate 0.0009 Epoch: 5 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:36,686-Speed 6308.06 samples/sec Loss 7.9489 LearningRate 0.0009 Epoch: 5 Global Step: 117420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:39,934-Speed 6305.60 samples/sec Loss 7.9759 LearningRate 0.0009 Epoch: 5 Global Step: 117430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:43,182-Speed 6309.50 samples/sec Loss 8.0063 LearningRate 0.0009 Epoch: 5 Global Step: 117440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:46,413-Speed 6340.95 samples/sec Loss 7.9784 LearningRate 0.0009 Epoch: 5 Global Step: 117450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:49,658-Speed 6312.21 samples/sec Loss 7.9420 LearningRate 0.0009 Epoch: 5 Global Step: 117460 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:52,913-Speed 6292.94 samples/sec Loss 
7.9943 LearningRate 0.0009 Epoch: 5 Global Step: 117470 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:56,170-Speed 6289.12 samples/sec Loss 7.9447 LearningRate 0.0009 Epoch: 5 Global Step: 117480 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:24:59,420-Speed 6304.37 samples/sec Loss 7.8951 LearningRate 0.0009 Epoch: 5 Global Step: 117490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:02,669-Speed 6304.20 samples/sec Loss 8.0495 LearningRate 0.0009 Epoch: 5 Global Step: 117500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:05,927-Speed 6287.10 samples/sec Loss 7.9580 LearningRate 0.0009 Epoch: 5 Global Step: 117510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:09,176-Speed 6305.46 samples/sec Loss 7.9263 LearningRate 0.0009 Epoch: 5 Global Step: 117520 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:12,435-Speed 6285.55 samples/sec Loss 7.9235 LearningRate 0.0009 Epoch: 5 Global Step: 117530 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:15,683-Speed 6307.51 samples/sec Loss 7.9323 LearningRate 0.0009 Epoch: 5 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:18,933-Speed 6302.35 samples/sec Loss 7.9192 LearningRate 0.0009 Epoch: 5 Global Step: 117550 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 02:25:22,163-Speed 6341.93 samples/sec Loss 7.9388 LearningRate 0.0009 Epoch: 5 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:25,408-Speed 6312.12 samples/sec Loss 7.9509 LearningRate 0.0009 Epoch: 5 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:28,656-Speed 6306.56 samples/sec Loss 7.9734 LearningRate 0.0009 Epoch: 5 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:31,905-Speed 6304.61 samples/sec Loss 7.9521 LearningRate 0.0009 Epoch: 5 Global 
Step: 117590 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:35,154-Speed 6304.73 samples/sec Loss 7.9024 LearningRate 0.0009 Epoch: 5 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:38,404-Speed 6304.57 samples/sec Loss 7.9191 LearningRate 0.0009 Epoch: 5 Global Step: 117610 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:41,652-Speed 6306.55 samples/sec Loss 7.9976 LearningRate 0.0009 Epoch: 5 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:44,902-Speed 6302.82 samples/sec Loss 7.9268 LearningRate 0.0009 Epoch: 5 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:48,147-Speed 6312.55 samples/sec Loss 7.9693 LearningRate 0.0009 Epoch: 5 Global Step: 117640 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:51,397-Speed 6303.97 samples/sec Loss 7.8937 LearningRate 0.0009 Epoch: 5 Global Step: 117650 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:54,646-Speed 6305.34 samples/sec Loss 7.9483 LearningRate 0.0009 Epoch: 5 Global Step: 117660 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:25:57,923-Speed 6250.45 samples/sec Loss 7.9268 LearningRate 0.0009 Epoch: 5 Global Step: 117670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:01,156-Speed 6334.91 samples/sec Loss 7.9409 LearningRate 0.0009 Epoch: 5 Global Step: 117680 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:04,403-Speed 6311.03 samples/sec Loss 7.9250 LearningRate 0.0009 Epoch: 5 Global Step: 117690 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:07,649-Speed 6310.53 samples/sec Loss 7.9095 LearningRate 0.0009 Epoch: 5 Global Step: 117700 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:10,895-Speed 6310.64 samples/sec Loss 7.8973 LearningRate 0.0009 Epoch: 5 Global Step: 117710 Fp16 Grad Scale: 32768 
Required: 65 hours Training: 2022-04-01 02:26:14,141-Speed 6310.21 samples/sec Loss 7.9128 LearningRate 0.0009 Epoch: 5 Global Step: 117720 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:17,389-Speed 6307.34 samples/sec Loss 7.8913 LearningRate 0.0009 Epoch: 5 Global Step: 117730 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:20,639-Speed 6302.51 samples/sec Loss 7.9975 LearningRate 0.0009 Epoch: 5 Global Step: 117740 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:23,885-Speed 6311.17 samples/sec Loss 7.9957 LearningRate 0.0009 Epoch: 5 Global Step: 117750 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:27,132-Speed 6308.30 samples/sec Loss 7.9544 LearningRate 0.0009 Epoch: 5 Global Step: 117760 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:30,377-Speed 6312.72 samples/sec Loss 7.9567 LearningRate 0.0009 Epoch: 5 Global Step: 117770 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:26:33,624-Speed 6308.35 samples/sec Loss 7.9309 LearningRate 0.0009 Epoch: 5 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:36,870-Speed 6311.27 samples/sec Loss 7.9786 LearningRate 0.0009 Epoch: 5 Global Step: 117790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:40,116-Speed 6310.72 samples/sec Loss 7.9061 LearningRate 0.0009 Epoch: 5 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:43,361-Speed 6311.54 samples/sec Loss 7.9449 LearningRate 0.0009 Epoch: 5 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:46,607-Speed 6311.70 samples/sec Loss 7.8666 LearningRate 0.0009 Epoch: 5 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:49,911-Speed 6200.21 samples/sec Loss 7.9623 LearningRate 0.0009 Epoch: 5 Global Step: 117830 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 
02:26:53,158-Speed 6307.22 samples/sec Loss 7.9714 LearningRate 0.0009 Epoch: 5 Global Step: 117840 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:56,407-Speed 6305.30 samples/sec Loss 7.9579 LearningRate 0.0009 Epoch: 5 Global Step: 117850 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:26:59,654-Speed 6308.87 samples/sec Loss 7.9877 LearningRate 0.0009 Epoch: 5 Global Step: 117860 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:02,899-Speed 6313.37 samples/sec Loss 7.9076 LearningRate 0.0009 Epoch: 5 Global Step: 117870 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:06,150-Speed 6302.37 samples/sec Loss 8.0323 LearningRate 0.0009 Epoch: 5 Global Step: 117880 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 02:27:09,388-Speed 6325.21 samples/sec Loss 7.8957 LearningRate 0.0009 Epoch: 5 Global Step: 117890 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:12,634-Speed 6310.81 samples/sec Loss 7.9529 LearningRate 0.0009 Epoch: 5 Global Step: 117900 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:15,881-Speed 6309.61 samples/sec Loss 7.9143 LearningRate 0.0009 Epoch: 5 Global Step: 117910 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:19,126-Speed 6312.91 samples/sec Loss 7.9247 LearningRate 0.0009 Epoch: 5 Global Step: 117920 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:22,371-Speed 6311.16 samples/sec Loss 7.8940 LearningRate 0.0009 Epoch: 5 Global Step: 117930 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:25,617-Speed 6310.72 samples/sec Loss 7.9220 LearningRate 0.0009 Epoch: 5 Global Step: 117940 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:28,862-Speed 6313.03 samples/sec Loss 7.9543 LearningRate 0.0009 Epoch: 5 Global Step: 117950 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:32,111-Speed 6304.72 samples/sec 
Loss 7.8909 LearningRate 0.0009 Epoch: 5 Global Step: 117960 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:27:35,390-Speed 6248.07 samples/sec Loss 7.9846 LearningRate 0.0009 Epoch: 5 Global Step: 117970 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:38,636-Speed 6310.83 samples/sec Loss 7.9309 LearningRate 0.0009 Epoch: 5 Global Step: 117980 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:41,881-Speed 6312.33 samples/sec Loss 7.9377 LearningRate 0.0009 Epoch: 5 Global Step: 117990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:45,129-Speed 6306.83 samples/sec Loss 7.9272 LearningRate 0.0009 Epoch: 5 Global Step: 118000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:48,378-Speed 6304.22 samples/sec Loss 7.9454 LearningRate 0.0009 Epoch: 5 Global Step: 118010 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:51,624-Speed 6310.58 samples/sec Loss 7.8544 LearningRate 0.0009 Epoch: 5 Global Step: 118020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:54,876-Speed 6299.87 samples/sec Loss 7.9447 LearningRate 0.0009 Epoch: 5 Global Step: 118030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:27:58,119-Speed 6315.24 samples/sec Loss 7.8104 LearningRate 0.0009 Epoch: 5 Global Step: 118040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:01,370-Speed 6301.75 samples/sec Loss 7.9189 LearningRate 0.0009 Epoch: 5 Global Step: 118050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:04,614-Speed 6315.12 samples/sec Loss 7.8377 LearningRate 0.0009 Epoch: 5 Global Step: 118060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:07,869-Speed 6293.66 samples/sec Loss 7.9246 LearningRate 0.0009 Epoch: 5 Global Step: 118070 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:11,116-Speed 6308.03 samples/sec Loss 7.8755 LearningRate 0.0009 Epoch: 5 
Global Step: 118080 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:14,363-Speed 6310.16 samples/sec Loss 7.8923 LearningRate 0.0009 Epoch: 5 Global Step: 118090 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:17,614-Speed 6299.92 samples/sec Loss 7.9994 LearningRate 0.0009 Epoch: 5 Global Step: 118100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:20,858-Speed 6315.05 samples/sec Loss 7.9318 LearningRate 0.0009 Epoch: 5 Global Step: 118110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:24,107-Speed 6305.50 samples/sec Loss 7.8853 LearningRate 0.0009 Epoch: 5 Global Step: 118120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:27,353-Speed 6310.80 samples/sec Loss 7.9628 LearningRate 0.0009 Epoch: 5 Global Step: 118130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:30,597-Speed 6314.37 samples/sec Loss 7.9532 LearningRate 0.0009 Epoch: 5 Global Step: 118140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:33,846-Speed 6305.01 samples/sec Loss 7.9110 LearningRate 0.0009 Epoch: 5 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:37,091-Speed 6312.89 samples/sec Loss 7.9877 LearningRate 0.0009 Epoch: 5 Global Step: 118160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:40,330-Speed 6323.65 samples/sec Loss 7.9010 LearningRate 0.0009 Epoch: 5 Global Step: 118170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:28:43,562-Speed 6337.27 samples/sec Loss 7.8441 LearningRate 0.0009 Epoch: 5 Global Step: 118180 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:46,810-Speed 6306.96 samples/sec Loss 7.8616 LearningRate 0.0009 Epoch: 5 Global Step: 118190 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:50,055-Speed 6312.94 samples/sec Loss 7.8862 LearningRate 0.0009 Epoch: 5 Global Step: 118200 Fp16 Grad Scale: 32768 
Required: 65 hours Training: 2022-04-01 02:28:53,296-Speed 6320.04 samples/sec Loss 7.9274 LearningRate 0.0009 Epoch: 5 Global Step: 118210 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:56,548-Speed 6299.09 samples/sec Loss 7.8257 LearningRate 0.0009 Epoch: 5 Global Step: 118220 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:28:59,796-Speed 6307.93 samples/sec Loss 7.9450 LearningRate 0.0009 Epoch: 5 Global Step: 118230 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:03,039-Speed 6315.48 samples/sec Loss 7.8875 LearningRate 0.0009 Epoch: 5 Global Step: 118240 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:06,288-Speed 6305.51 samples/sec Loss 7.9289 LearningRate 0.0009 Epoch: 5 Global Step: 118250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:09,550-Speed 6278.67 samples/sec Loss 7.9578 LearningRate 0.0009 Epoch: 5 Global Step: 118260 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:12,801-Speed 6301.77 samples/sec Loss 7.8810 LearningRate 0.0009 Epoch: 5 Global Step: 118270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:16,046-Speed 6312.43 samples/sec Loss 7.9455 LearningRate 0.0009 Epoch: 5 Global Step: 118280 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:29:19,295-Speed 6305.29 samples/sec Loss 7.8730 LearningRate 0.0009 Epoch: 5 Global Step: 118290 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:29:22,544-Speed 6305.81 samples/sec Loss 7.9149 LearningRate 0.0009 Epoch: 5 Global Step: 118300 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:29:25,779-Speed 6332.60 samples/sec Loss 7.9284 LearningRate 0.0009 Epoch: 5 Global Step: 118310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:29,026-Speed 6308.54 samples/sec Loss 7.9580 LearningRate 0.0009 Epoch: 5 Global Step: 118320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:29:32,273-Speed 6307.55 samples/sec Loss 7.9149 LearningRate 0.0009 Epoch: 5 Global Step: 118330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:35,522-Speed 6306.30 samples/sec Loss 7.9115 LearningRate 0.0009 Epoch: 5 Global Step: 118340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:38,768-Speed 6309.38 samples/sec Loss 8.0080 LearningRate 0.0009 Epoch: 5 Global Step: 118350 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:42,012-Speed 6316.45 samples/sec Loss 7.8756 LearningRate 0.0009 Epoch: 5 Global Step: 118360 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:45,260-Speed 6305.63 samples/sec Loss 7.8904 LearningRate 0.0009 Epoch: 5 Global Step: 118370 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:48,503-Speed 6317.06 samples/sec Loss 7.9682 LearningRate 0.0009 Epoch: 5 Global Step: 118380 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:51,750-Speed 6309.43 samples/sec Loss 7.9049 LearningRate 0.0009 Epoch: 5 Global Step: 118390 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:54,998-Speed 6305.80 samples/sec Loss 7.8450 LearningRate 0.0009 Epoch: 5 Global Step: 118400 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:29:58,248-Speed 6303.10 samples/sec Loss 7.8619 LearningRate 0.0009 Epoch: 5 Global Step: 118410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:30:01,496-Speed 6305.71 samples/sec Loss 7.8306 LearningRate 0.0009 Epoch: 5 Global Step: 118420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:30:04,731-Speed 6334.06 samples/sec Loss 7.8488 LearningRate 0.0009 Epoch: 5 Global Step: 118430 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:07,977-Speed 6309.52 samples/sec Loss 7.9583 LearningRate 0.0009 Epoch: 5 Global Step: 118440 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:11,221-Speed 6315.33 samples/sec Loss 
7.8769 LearningRate 0.0009 Epoch: 5 Global Step: 118450 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:14,470-Speed 6303.99 samples/sec Loss 7.8544 LearningRate 0.0009 Epoch: 5 Global Step: 118460 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:17,713-Speed 6316.29 samples/sec Loss 7.9351 LearningRate 0.0009 Epoch: 5 Global Step: 118470 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:20,954-Speed 6322.33 samples/sec Loss 7.9371 LearningRate 0.0009 Epoch: 5 Global Step: 118480 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:24,200-Speed 6309.73 samples/sec Loss 7.9704 LearningRate 0.0009 Epoch: 5 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:27,446-Speed 6311.06 samples/sec Loss 7.9392 LearningRate 0.0009 Epoch: 5 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:30,692-Speed 6310.09 samples/sec Loss 7.9405 LearningRate 0.0009 Epoch: 5 Global Step: 118510 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:33,937-Speed 6313.69 samples/sec Loss 7.9594 LearningRate 0.0009 Epoch: 5 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:37,185-Speed 6306.72 samples/sec Loss 7.9546 LearningRate 0.0009 Epoch: 5 Global Step: 118530 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:30:40,435-Speed 6303.84 samples/sec Loss 7.9376 LearningRate 0.0009 Epoch: 5 Global Step: 118540 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:30:43,677-Speed 6318.31 samples/sec Loss 7.9566 LearningRate 0.0009 Epoch: 5 Global Step: 118550 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:30:46,906-Speed 6343.32 samples/sec Loss 7.9502 LearningRate 0.0009 Epoch: 5 Global Step: 118560 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:50,217-Speed 6186.52 samples/sec Loss 7.8262 LearningRate 0.0009 Epoch: 5 Global 
Step: 118570 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:53,496-Speed 6248.39 samples/sec Loss 7.9514 LearningRate 0.0009 Epoch: 5 Global Step: 118580 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:56,748-Speed 6299.35 samples/sec Loss 7.8621 LearningRate 0.0009 Epoch: 5 Global Step: 118590 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:30:59,991-Speed 6316.04 samples/sec Loss 7.8824 LearningRate 0.0009 Epoch: 5 Global Step: 118600 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:03,235-Speed 6313.00 samples/sec Loss 7.8879 LearningRate 0.0009 Epoch: 5 Global Step: 118610 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:06,482-Speed 6308.86 samples/sec Loss 7.9600 LearningRate 0.0009 Epoch: 5 Global Step: 118620 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:09,729-Speed 6310.32 samples/sec Loss 7.9592 LearningRate 0.0009 Epoch: 5 Global Step: 118630 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:12,980-Speed 6300.64 samples/sec Loss 7.8934 LearningRate 0.0009 Epoch: 5 Global Step: 118640 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:16,225-Speed 6311.44 samples/sec Loss 7.9126 LearningRate 0.0009 Epoch: 5 Global Step: 118650 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:31:19,471-Speed 6311.97 samples/sec Loss 7.8317 LearningRate 0.0009 Epoch: 5 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:22,715-Speed 6313.87 samples/sec Loss 7.9909 LearningRate 0.0009 Epoch: 5 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:25,959-Speed 6314.43 samples/sec Loss 7.9309 LearningRate 0.0009 Epoch: 5 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:29,203-Speed 6315.33 samples/sec Loss 7.9232 LearningRate 0.0009 Epoch: 5 Global Step: 118690 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:31:32,447-Speed 6314.65 samples/sec Loss 7.8673 LearningRate 0.0009 Epoch: 5 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:35,697-Speed 6302.54 samples/sec Loss 7.9922 LearningRate 0.0009 Epoch: 5 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:38,944-Speed 6309.00 samples/sec Loss 7.9286 LearningRate 0.0009 Epoch: 5 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:42,190-Speed 6311.06 samples/sec Loss 7.8383 LearningRate 0.0009 Epoch: 5 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:45,436-Speed 6311.43 samples/sec Loss 7.9440 LearningRate 0.0009 Epoch: 5 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:48,682-Speed 6311.58 samples/sec Loss 7.9461 LearningRate 0.0009 Epoch: 5 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:51,926-Speed 6313.53 samples/sec Loss 7.9598 LearningRate 0.0009 Epoch: 5 Global Step: 118760 Fp16 Grad Scale: 131072 Required: 65 hours Training: 2022-04-01 02:31:55,155-Speed 6343.64 samples/sec Loss 7.9031 LearningRate 0.0009 Epoch: 5 Global Step: 118770 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:31:58,404-Speed 6306.28 samples/sec Loss 7.8370 LearningRate 0.0009 Epoch: 5 Global Step: 118780 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:01,650-Speed 6311.03 samples/sec Loss 8.0049 LearningRate 0.0009 Epoch: 5 Global Step: 118790 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:04,897-Speed 6308.48 samples/sec Loss 8.0009 LearningRate 0.0009 Epoch: 5 Global Step: 118800 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:08,129-Speed 6336.78 samples/sec Loss 7.9922 LearningRate 0.0009 Epoch: 5 Global Step: 118810 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:32:11,374-Speed 6313.69 samples/sec Loss 7.7922 LearningRate 0.0009 Epoch: 5 Global Step: 118820 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:14,618-Speed 6313.33 samples/sec Loss 7.8289 LearningRate 0.0009 Epoch: 5 Global Step: 118830 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:17,865-Speed 6310.11 samples/sec Loss 7.9249 LearningRate 0.0009 Epoch: 5 Global Step: 118840 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:21,113-Speed 6305.85 samples/sec Loss 7.8583 LearningRate 0.0009 Epoch: 5 Global Step: 118850 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:24,371-Speed 6287.85 samples/sec Loss 7.9294 LearningRate 0.0009 Epoch: 5 Global Step: 118860 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:27,615-Speed 6315.28 samples/sec Loss 7.9986 LearningRate 0.0009 Epoch: 5 Global Step: 118870 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:30,871-Speed 6290.68 samples/sec Loss 7.9506 LearningRate 0.0009 Epoch: 5 Global Step: 118880 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:34,117-Speed 6310.20 samples/sec Loss 7.9619 LearningRate 0.0009 Epoch: 5 Global Step: 118890 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:37,364-Speed 6308.39 samples/sec Loss 7.8271 LearningRate 0.0009 Epoch: 5 Global Step: 118900 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:32:40,608-Speed 6315.60 samples/sec Loss 8.0007 LearningRate 0.0009 Epoch: 5 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:43,854-Speed 6311.05 samples/sec Loss 7.9405 LearningRate 0.0009 Epoch: 5 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:47,097-Speed 6316.73 samples/sec Loss 7.8536 LearningRate 0.0009 Epoch: 5 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:50,341-Speed 6314.15 samples/sec Loss 
7.9039 LearningRate 0.0009 Epoch: 5 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:53,592-Speed 6302.01 samples/sec Loss 7.8902 LearningRate 0.0009 Epoch: 5 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:32:56,838-Speed 6311.19 samples/sec Loss 7.8743 LearningRate 0.0009 Epoch: 5 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:00,083-Speed 6312.37 samples/sec Loss 7.9925 LearningRate 0.0009 Epoch: 5 Global Step: 118970 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:03,330-Speed 6308.45 samples/sec Loss 7.8719 LearningRate 0.0009 Epoch: 5 Global Step: 118980 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:06,560-Speed 6342.19 samples/sec Loss 7.9387 LearningRate 0.0009 Epoch: 5 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:09,804-Speed 6313.54 samples/sec Loss 7.8907 LearningRate 0.0009 Epoch: 5 Global Step: 119000 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:13,051-Speed 6308.51 samples/sec Loss 8.0430 LearningRate 0.0009 Epoch: 5 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:16,296-Speed 6314.24 samples/sec Loss 7.8401 LearningRate 0.0009 Epoch: 5 Global Step: 119020 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:19,543-Speed 6307.44 samples/sec Loss 7.9137 LearningRate 0.0009 Epoch: 5 Global Step: 119030 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:22,791-Speed 6307.03 samples/sec Loss 7.8609 LearningRate 0.0009 Epoch: 5 Global Step: 119040 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:26,037-Speed 6310.71 samples/sec Loss 7.8875 LearningRate 0.0009 Epoch: 5 Global Step: 119050 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:29,289-Speed 6298.94 samples/sec Loss 7.8982 LearningRate 0.0009 Epoch: 5 Global 
Step: 119060 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:32,536-Speed 6308.68 samples/sec Loss 7.8630 LearningRate 0.0009 Epoch: 5 Global Step: 119070 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:35,781-Speed 6314.01 samples/sec Loss 7.8826 LearningRate 0.0009 Epoch: 5 Global Step: 119080 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:33:39,028-Speed 6308.23 samples/sec Loss 7.9631 LearningRate 0.0009 Epoch: 5 Global Step: 119090 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:42,280-Speed 6299.40 samples/sec Loss 7.8853 LearningRate 0.0009 Epoch: 5 Global Step: 119100 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:45,529-Speed 6304.61 samples/sec Loss 7.8578 LearningRate 0.0009 Epoch: 5 Global Step: 119110 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:48,774-Speed 6311.90 samples/sec Loss 7.9002 LearningRate 0.0009 Epoch: 5 Global Step: 119120 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:52,020-Speed 6310.07 samples/sec Loss 7.8991 LearningRate 0.0009 Epoch: 5 Global Step: 119130 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:55,263-Speed 6317.69 samples/sec Loss 7.8980 LearningRate 0.0009 Epoch: 5 Global Step: 119140 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:33:58,511-Speed 6307.66 samples/sec Loss 7.9108 LearningRate 0.0009 Epoch: 5 Global Step: 119150 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:01,761-Speed 6302.45 samples/sec Loss 7.9380 LearningRate 0.0009 Epoch: 5 Global Step: 119160 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:05,009-Speed 6306.90 samples/sec Loss 7.9109 LearningRate 0.0009 Epoch: 5 Global Step: 119170 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:08,255-Speed 6311.50 samples/sec Loss 7.9164 LearningRate 0.0009 Epoch: 5 Global Step: 119180 Fp16 Grad Scale: 65536 
Required: 65 hours Training: 2022-04-01 02:34:11,489-Speed 6332.71 samples/sec Loss 7.8645 LearningRate 0.0009 Epoch: 5 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:14,737-Speed 6308.79 samples/sec Loss 7.8693 LearningRate 0.0009 Epoch: 5 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:17,983-Speed 6309.28 samples/sec Loss 7.8758 LearningRate 0.0009 Epoch: 5 Global Step: 119210 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:21,254-Speed 6261.96 samples/sec Loss 7.9622 LearningRate 0.0009 Epoch: 5 Global Step: 119220 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:24,514-Speed 6285.20 samples/sec Loss 7.8640 LearningRate 0.0009 Epoch: 5 Global Step: 119230 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:27,770-Speed 6290.05 samples/sec Loss 7.9265 LearningRate 0.0009 Epoch: 5 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:34:31,003-Speed 6337.63 samples/sec Loss 7.9844 LearningRate 0.0009 Epoch: 5 Global Step: 119250 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:34,250-Speed 6308.01 samples/sec Loss 7.9028 LearningRate 0.0009 Epoch: 5 Global Step: 119260 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:37,497-Speed 6309.40 samples/sec Loss 7.8672 LearningRate 0.0009 Epoch: 5 Global Step: 119270 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:40,742-Speed 6311.30 samples/sec Loss 7.8247 LearningRate 0.0009 Epoch: 5 Global Step: 119280 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:43,998-Speed 6291.16 samples/sec Loss 7.9132 LearningRate 0.0009 Epoch: 5 Global Step: 119290 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:47,246-Speed 6307.24 samples/sec Loss 7.9328 LearningRate 0.0009 Epoch: 5 Global Step: 119300 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 
02:34:50,502-Speed 6292.11 samples/sec Loss 7.8983 LearningRate 0.0009 Epoch: 5 Global Step: 119310 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:53,747-Speed 6311.94 samples/sec Loss 7.8766 LearningRate 0.0009 Epoch: 5 Global Step: 119320 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:34:56,994-Speed 6307.90 samples/sec Loss 7.8748 LearningRate 0.0009 Epoch: 5 Global Step: 119330 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:35:00,235-Speed 6321.78 samples/sec Loss 7.8514 LearningRate 0.0009 Epoch: 5 Global Step: 119340 Fp16 Grad Scale: 32768 Required: 65 hours Training: 2022-04-01 02:35:03,482-Speed 6308.71 samples/sec Loss 7.9344 LearningRate 0.0009 Epoch: 5 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:06,726-Speed 6315.44 samples/sec Loss 7.9431 LearningRate 0.0009 Epoch: 5 Global Step: 119360 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:09,974-Speed 6307.56 samples/sec Loss 7.8708 LearningRate 0.0009 Epoch: 5 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:13,218-Speed 6313.17 samples/sec Loss 7.8450 LearningRate 0.0009 Epoch: 5 Global Step: 119380 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:16,464-Speed 6311.03 samples/sec Loss 7.9032 LearningRate 0.0009 Epoch: 5 Global Step: 119390 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:19,712-Speed 6306.43 samples/sec Loss 7.8409 LearningRate 0.0009 Epoch: 5 Global Step: 119400 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:22,961-Speed 6305.56 samples/sec Loss 7.8669 LearningRate 0.0009 Epoch: 5 Global Step: 119410 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:26,205-Speed 6314.49 samples/sec Loss 7.9469 LearningRate 0.0009 Epoch: 5 Global Step: 119420 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:29,454-Speed 6305.49 samples/sec Loss 
7.8518 LearningRate 0.0009 Epoch: 5 Global Step: 119430 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:32,702-Speed 6306.27 samples/sec Loss 7.9135 LearningRate 0.0009 Epoch: 5 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:35,950-Speed 6306.10 samples/sec Loss 7.8135 LearningRate 0.0009 Epoch: 5 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:39,193-Speed 6317.61 samples/sec Loss 7.8136 LearningRate 0.0009 Epoch: 5 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:42,438-Speed 6312.09 samples/sec Loss 7.7999 LearningRate 0.0009 Epoch: 5 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:45,684-Speed 6310.63 samples/sec Loss 7.8467 LearningRate 0.0009 Epoch: 5 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:48,950-Speed 6272.41 samples/sec Loss 7.9248 LearningRate 0.0009 Epoch: 5 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:52,218-Speed 6268.22 samples/sec Loss 7.9778 LearningRate 0.0009 Epoch: 5 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:55,466-Speed 6307.64 samples/sec Loss 7.9196 LearningRate 0.0009 Epoch: 5 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 65 hours Training: 2022-04-01 02:35:58,711-Speed 6311.98 samples/sec Loss 7.9316 LearningRate 0.0009 Epoch: 5 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:01,957-Speed 6311.42 samples/sec Loss 7.8363 LearningRate 0.0009 Epoch: 5 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:05,205-Speed 6306.63 samples/sec Loss 7.8727 LearningRate 0.0009 Epoch: 5 Global Step: 119540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:08,436-Speed 6340.35 samples/sec Loss 7.8809 LearningRate 0.0009 Epoch: 5 Global 
Step: 119550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:11,681-Speed 6312.84 samples/sec Loss 7.8608 LearningRate 0.0009 Epoch: 5 Global Step: 119560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:14,932-Speed 6301.70 samples/sec Loss 7.8251 LearningRate 0.0009 Epoch: 5 Global Step: 119570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:18,176-Speed 6313.27 samples/sec Loss 7.8482 LearningRate 0.0009 Epoch: 5 Global Step: 119580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:21,423-Speed 6308.80 samples/sec Loss 7.8504 LearningRate 0.0009 Epoch: 5 Global Step: 119590 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:24,668-Speed 6312.58 samples/sec Loss 7.8794 LearningRate 0.0009 Epoch: 5 Global Step: 119600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:27,917-Speed 6305.63 samples/sec Loss 7.7898 LearningRate 0.0009 Epoch: 5 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:31,167-Speed 6302.74 samples/sec Loss 7.8965 LearningRate 0.0009 Epoch: 5 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:34,414-Speed 6309.26 samples/sec Loss 7.8769 LearningRate 0.0009 Epoch: 5 Global Step: 119630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:37,662-Speed 6306.12 samples/sec Loss 7.9536 LearningRate 0.0009 Epoch: 5 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:40,892-Speed 6341.19 samples/sec Loss 7.9118 LearningRate 0.0009 Epoch: 5 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:44,138-Speed 6311.36 samples/sec Loss 7.8670 LearningRate 0.0009 Epoch: 5 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:47,392-Speed 6295.93 samples/sec Loss 7.8678 LearningRate 0.0009 Epoch: 5 Global Step: 119670 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:36:50,639-Speed 6308.35 samples/sec Loss 7.9363 LearningRate 0.0009 Epoch: 5 Global Step: 119680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:53,894-Speed 6293.10 samples/sec Loss 7.8486 LearningRate 0.0009 Epoch: 5 Global Step: 119690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:36:57,139-Speed 6312.24 samples/sec Loss 7.7939 LearningRate 0.0009 Epoch: 5 Global Step: 119700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:00,391-Speed 6298.45 samples/sec Loss 7.8961 LearningRate 0.0009 Epoch: 5 Global Step: 119710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:03,625-Speed 6334.80 samples/sec Loss 7.8792 LearningRate 0.0009 Epoch: 5 Global Step: 119720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:06,875-Speed 6304.06 samples/sec Loss 7.8896 LearningRate 0.0009 Epoch: 5 Global Step: 119730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:10,117-Speed 6317.28 samples/sec Loss 7.9047 LearningRate 0.0009 Epoch: 5 Global Step: 119740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:13,359-Speed 6320.31 samples/sec Loss 7.8898 LearningRate 0.0009 Epoch: 5 Global Step: 119750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:16,607-Speed 6306.33 samples/sec Loss 7.8947 LearningRate 0.0009 Epoch: 5 Global Step: 119760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:19,852-Speed 6313.29 samples/sec Loss 7.8994 LearningRate 0.0009 Epoch: 5 Global Step: 119770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:23,098-Speed 6310.19 samples/sec Loss 7.9306 LearningRate 0.0009 Epoch: 5 Global Step: 119780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:26,350-Speed 6299.94 samples/sec Loss 7.9514 LearningRate 0.0009 Epoch: 5 Global Step: 119790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:37:29,596-Speed 6310.33 samples/sec Loss 7.9008 LearningRate 0.0009 Epoch: 5 Global Step: 119800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:32,841-Speed 6311.59 samples/sec Loss 7.8600 LearningRate 0.0009 Epoch: 5 Global Step: 119810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:37:36,090-Speed 6305.30 samples/sec Loss 7.9611 LearningRate 0.0009 Epoch: 5 Global Step: 119820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:39,343-Speed 6297.63 samples/sec Loss 7.9053 LearningRate 0.0009 Epoch: 5 Global Step: 119830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:42,590-Speed 6307.70 samples/sec Loss 7.9735 LearningRate 0.0009 Epoch: 5 Global Step: 119840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:45,836-Speed 6311.46 samples/sec Loss 7.9053 LearningRate 0.0009 Epoch: 5 Global Step: 119850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:49,086-Speed 6303.60 samples/sec Loss 7.8726 LearningRate 0.0009 Epoch: 5 Global Step: 119860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:52,335-Speed 6303.93 samples/sec Loss 7.8655 LearningRate 0.0009 Epoch: 5 Global Step: 119870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:55,581-Speed 6310.13 samples/sec Loss 7.8810 LearningRate 0.0009 Epoch: 5 Global Step: 119880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:37:58,835-Speed 6295.72 samples/sec Loss 7.8708 LearningRate 0.0009 Epoch: 5 Global Step: 119890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:02,084-Speed 6305.44 samples/sec Loss 7.9698 LearningRate 0.0009 Epoch: 5 Global Step: 119900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:05,334-Speed 6301.76 samples/sec Loss 7.8312 LearningRate 0.0009 Epoch: 5 Global Step: 119910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:08,566-Speed 6338.14 samples/sec Loss 
7.8732 LearningRate 0.0009 Epoch: 5 Global Step: 119920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:11,811-Speed 6312.81 samples/sec Loss 7.9000 LearningRate 0.0009 Epoch: 5 Global Step: 119930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:15,060-Speed 6311.07 samples/sec Loss 7.9119 LearningRate 0.0009 Epoch: 5 Global Step: 119940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:18,308-Speed 6306.91 samples/sec Loss 7.9177 LearningRate 0.0009 Epoch: 5 Global Step: 119950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:21,557-Speed 6305.50 samples/sec Loss 7.8550 LearningRate 0.0009 Epoch: 5 Global Step: 119960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:38:24,787-Speed 6341.43 samples/sec Loss 7.8968 LearningRate 0.0009 Epoch: 5 Global Step: 119970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:28,035-Speed 6307.18 samples/sec Loss 7.7562 LearningRate 0.0009 Epoch: 5 Global Step: 119980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:31,285-Speed 6303.46 samples/sec Loss 7.8934 LearningRate 0.0009 Epoch: 5 Global Step: 119990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:34,534-Speed 6304.89 samples/sec Loss 7.8852 LearningRate 0.0009 Epoch: 5 Global Step: 120000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:37,782-Speed 6305.98 samples/sec Loss 7.8995 LearningRate 0.0009 Epoch: 5 Global Step: 120010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:41,030-Speed 6307.22 samples/sec Loss 7.9099 LearningRate 0.0009 Epoch: 5 Global Step: 120020 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:44,279-Speed 6304.29 samples/sec Loss 7.9443 LearningRate 0.0009 Epoch: 5 Global Step: 120030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:47,524-Speed 6313.26 samples/sec Loss 7.9114 LearningRate 0.0009 Epoch: 5 Global 
Step: 120040 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:50,770-Speed 6310.18 samples/sec Loss 7.8645 LearningRate 0.0009 Epoch: 5 Global Step: 120050 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:54,013-Speed 6316.56 samples/sec Loss 7.8986 LearningRate 0.0009 Epoch: 5 Global Step: 120060 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:38:57,256-Speed 6317.84 samples/sec Loss 7.8441 LearningRate 0.0009 Epoch: 5 Global Step: 120070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:00,505-Speed 6304.48 samples/sec Loss 7.8348 LearningRate 0.0009 Epoch: 5 Global Step: 120080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:03,752-Speed 6308.46 samples/sec Loss 7.9623 LearningRate 0.0009 Epoch: 5 Global Step: 120090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:06,996-Speed 6313.79 samples/sec Loss 7.8935 LearningRate 0.0009 Epoch: 5 Global Step: 120100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:10,240-Speed 6315.21 samples/sec Loss 7.9263 LearningRate 0.0009 Epoch: 5 Global Step: 120110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:13,483-Speed 6316.72 samples/sec Loss 7.9176 LearningRate 0.0009 Epoch: 5 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:16,729-Speed 6310.97 samples/sec Loss 7.7644 LearningRate 0.0009 Epoch: 5 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:19,974-Speed 6312.44 samples/sec Loss 7.8552 LearningRate 0.0009 Epoch: 5 Global Step: 120140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:23,257-Speed 6238.77 samples/sec Loss 7.8742 LearningRate 0.0009 Epoch: 5 Global Step: 120150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:26,511-Speed 6296.13 samples/sec Loss 8.0155 LearningRate 0.0009 Epoch: 5 Global Step: 120160 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:39:29,747-Speed 6330.20 samples/sec Loss 7.8841 LearningRate 0.0009 Epoch: 5 Global Step: 120170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:32,993-Speed 6311.26 samples/sec Loss 7.8103 LearningRate 0.0009 Epoch: 5 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:36,239-Speed 6309.76 samples/sec Loss 7.8349 LearningRate 0.0009 Epoch: 5 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:39,500-Speed 6281.96 samples/sec Loss 7.9487 LearningRate 0.0009 Epoch: 5 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:39:42,731-Speed 6339.78 samples/sec Loss 7.8051 LearningRate 0.0009 Epoch: 5 Global Step: 120210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:39:45,976-Speed 6313.77 samples/sec Loss 7.8961 LearningRate 0.0009 Epoch: 5 Global Step: 120220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:39:49,221-Speed 6312.46 samples/sec Loss 7.9048 LearningRate 0.0009 Epoch: 5 Global Step: 120230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:39:52,465-Speed 6313.19 samples/sec Loss 7.9519 LearningRate 0.0009 Epoch: 5 Global Step: 120240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:39:55,709-Speed 6315.52 samples/sec Loss 7.8020 LearningRate 0.0009 Epoch: 5 Global Step: 120250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:39:58,958-Speed 6304.31 samples/sec Loss 7.7991 LearningRate 0.0009 Epoch: 5 Global Step: 120260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:40:02,204-Speed 6311.10 samples/sec Loss 7.8893 LearningRate 0.0009 Epoch: 5 Global Step: 120270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:40:05,452-Speed 6306.52 samples/sec Loss 7.8975 LearningRate 0.0009 Epoch: 5 Global Step: 120280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:40:08,698-Speed 6310.51 samples/sec Loss 7.9155 LearningRate 0.0009 Epoch: 5 Global Step: 120290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:40:11,949-Speed 6300.88 samples/sec Loss 7.8909 LearningRate 0.0009 Epoch: 5 Global Step: 120300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:40:15,196-Speed 6309.81 samples/sec Loss 7.9669 LearningRate 0.0009 Epoch: 5 Global Step: 120310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:18,444-Speed 6306.45 samples/sec Loss 7.8790 LearningRate 0.0009 Epoch: 5 Global Step: 120320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:21,688-Speed 6313.75 samples/sec Loss 7.8465 LearningRate 0.0009 Epoch: 5 Global Step: 120330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:24,947-Speed 6286.77 samples/sec Loss 7.8460 LearningRate 0.0009 Epoch: 5 Global Step: 120340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:28,359-Speed 6003.65 samples/sec Loss 7.8549 LearningRate 0.0009 Epoch: 5 Global Step: 120350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:31,608-Speed 6304.45 samples/sec Loss 7.8407 LearningRate 0.0009 Epoch: 5 Global Step: 120360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:34,858-Speed 6303.16 samples/sec Loss 7.8944 LearningRate 0.0009 Epoch: 5 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:38,109-Speed 6301.10 samples/sec Loss 7.8871 LearningRate 0.0009 Epoch: 5 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:41,360-Speed 6301.94 samples/sec Loss 7.7894 LearningRate 0.0009 Epoch: 5 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:44,604-Speed 6314.72 samples/sec Loss 7.9130 LearningRate 0.0009 Epoch: 5 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:47,837-Speed 6335.02 samples/sec Loss 
7.8321 LearningRate 0.0009 Epoch: 5 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:51,081-Speed 6315.84 samples/sec Loss 7.8883 LearningRate 0.0009 Epoch: 5 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:54,330-Speed 6305.00 samples/sec Loss 7.8232 LearningRate 0.0009 Epoch: 5 Global Step: 120430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:40:57,576-Speed 6311.27 samples/sec Loss 7.8126 LearningRate 0.0009 Epoch: 5 Global Step: 120440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:00,823-Speed 6308.38 samples/sec Loss 7.8607 LearningRate 0.0009 Epoch: 5 Global Step: 120450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:04,069-Speed 6310.33 samples/sec Loss 7.9012 LearningRate 0.0009 Epoch: 5 Global Step: 120460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:07,316-Speed 6309.86 samples/sec Loss 7.8842 LearningRate 0.0009 Epoch: 5 Global Step: 120470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:10,570-Speed 6294.06 samples/sec Loss 7.7813 LearningRate 0.0009 Epoch: 5 Global Step: 120480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:13,821-Speed 6301.51 samples/sec Loss 7.7707 LearningRate 0.0009 Epoch: 5 Global Step: 120490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:17,052-Speed 6339.86 samples/sec Loss 7.8097 LearningRate 0.0009 Epoch: 5 Global Step: 120500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:20,300-Speed 6306.44 samples/sec Loss 7.8667 LearningRate 0.0009 Epoch: 5 Global Step: 120510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:23,548-Speed 6306.67 samples/sec Loss 7.8654 LearningRate 0.0009 Epoch: 5 Global Step: 120520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:26,798-Speed 6303.13 samples/sec Loss 7.8308 LearningRate 0.0009 Epoch: 5 Global 
Step: 120530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:30,043-Speed 6312.38 samples/sec Loss 7.8608 LearningRate 0.0009 Epoch: 5 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:33,290-Speed 6309.45 samples/sec Loss 7.8376 LearningRate 0.0009 Epoch: 5 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:36,535-Speed 6312.66 samples/sec Loss 7.9209 LearningRate 0.0009 Epoch: 5 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:39,781-Speed 6310.44 samples/sec Loss 7.8018 LearningRate 0.0009 Epoch: 5 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:43,026-Speed 6312.52 samples/sec Loss 7.8650 LearningRate 0.0009 Epoch: 5 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:46,278-Speed 6300.02 samples/sec Loss 7.8888 LearningRate 0.0009 Epoch: 5 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:41:49,526-Speed 6307.33 samples/sec Loss 7.8063 LearningRate 0.0009 Epoch: 5 Global Step: 120600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:52,774-Speed 6306.19 samples/sec Loss 7.8490 LearningRate 0.0009 Epoch: 5 Global Step: 120610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:56,020-Speed 6310.17 samples/sec Loss 7.9596 LearningRate 0.0009 Epoch: 5 Global Step: 120620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:41:59,268-Speed 6307.80 samples/sec Loss 7.8947 LearningRate 0.0009 Epoch: 5 Global Step: 120630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:02,517-Speed 6304.20 samples/sec Loss 7.8389 LearningRate 0.0009 Epoch: 5 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:05,752-Speed 6332.36 samples/sec Loss 7.8468 LearningRate 0.0009 Epoch: 5 Global Step: 120650 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 02:42:08,997-Speed 6312.78 samples/sec Loss 7.8987 LearningRate 0.0009 Epoch: 5 Global Step: 120660 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:12,244-Speed 6309.34 samples/sec Loss 7.9183 LearningRate 0.0009 Epoch: 5 Global Step: 120670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:15,495-Speed 6303.10 samples/sec Loss 7.9310 LearningRate 0.0009 Epoch: 5 Global Step: 120680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:18,742-Speed 6308.60 samples/sec Loss 7.8037 LearningRate 0.0009 Epoch: 5 Global Step: 120690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:21,988-Speed 6310.07 samples/sec Loss 7.7919 LearningRate 0.0009 Epoch: 5 Global Step: 120700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:25,241-Speed 6297.66 samples/sec Loss 7.8223 LearningRate 0.0009 Epoch: 5 Global Step: 120710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:28,487-Speed 6311.13 samples/sec Loss 7.8463 LearningRate 0.0009 Epoch: 5 Global Step: 120720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:31,735-Speed 6306.53 samples/sec Loss 7.8532 LearningRate 0.0009 Epoch: 5 Global Step: 120730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:34,984-Speed 6305.06 samples/sec Loss 7.7594 LearningRate 0.0009 Epoch: 5 Global Step: 120740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:42:38,235-Speed 6301.22 samples/sec Loss 7.8179 LearningRate 0.0009 Epoch: 5 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:41,480-Speed 6311.50 samples/sec Loss 7.9268 LearningRate 0.0009 Epoch: 5 Global Step: 120760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:44,733-Speed 6298.58 samples/sec Loss 7.7709 LearningRate 0.0009 Epoch: 5 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
02:42:47,978-Speed 6311.16 samples/sec Loss 7.8012 LearningRate 0.0009 Epoch: 5 Global Step: 120780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:51,225-Speed 6308.38 samples/sec Loss 7.8635 LearningRate 0.0009 Epoch: 5 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:54,472-Speed 6309.10 samples/sec Loss 7.8590 LearningRate 0.0009 Epoch: 5 Global Step: 120800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:42:57,706-Speed 6334.41 samples/sec Loss 7.8540 LearningRate 0.0009 Epoch: 5 Global Step: 120810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:00,953-Speed 6309.01 samples/sec Loss 7.8753 LearningRate 0.0009 Epoch: 5 Global Step: 120820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:04,212-Speed 6285.78 samples/sec Loss 7.7418 LearningRate 0.0009 Epoch: 5 Global Step: 120830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:07,463-Speed 6302.20 samples/sec Loss 7.8873 LearningRate 0.0009 Epoch: 5 Global Step: 120840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:10,711-Speed 6306.87 samples/sec Loss 7.7987 LearningRate 0.0009 Epoch: 5 Global Step: 120850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:13,956-Speed 6311.74 samples/sec Loss 7.9311 LearningRate 0.0009 Epoch: 5 Global Step: 120860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:17,203-Speed 6309.80 samples/sec Loss 7.8891 LearningRate 0.0009 Epoch: 5 Global Step: 120870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:20,448-Speed 6311.26 samples/sec Loss 7.9225 LearningRate 0.0009 Epoch: 5 Global Step: 120880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:23,694-Speed 6310.89 samples/sec Loss 7.8773 LearningRate 0.0009 Epoch: 5 Global Step: 120890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:26,943-Speed 6305.39 samples/sec Loss 
7.9625 LearningRate 0.0009 Epoch: 5 Global Step: 120900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:43:30,187-Speed 6314.22 samples/sec Loss 7.7787 LearningRate 0.0009 Epoch: 5 Global Step: 120910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:33,436-Speed 6306.08 samples/sec Loss 7.8747 LearningRate 0.0009 Epoch: 5 Global Step: 120920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:36,683-Speed 6307.29 samples/sec Loss 7.8573 LearningRate 0.0009 Epoch: 5 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:39,932-Speed 6306.22 samples/sec Loss 7.8360 LearningRate 0.0009 Epoch: 5 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:43,176-Speed 6313.32 samples/sec Loss 7.8464 LearningRate 0.0009 Epoch: 5 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:46,420-Speed 6314.38 samples/sec Loss 7.8481 LearningRate 0.0009 Epoch: 5 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:49,667-Speed 6309.63 samples/sec Loss 7.8727 LearningRate 0.0009 Epoch: 5 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:52,915-Speed 6306.16 samples/sec Loss 7.9399 LearningRate 0.0009 Epoch: 5 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:56,162-Speed 6310.17 samples/sec Loss 7.9298 LearningRate 0.0009 Epoch: 5 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:43:59,406-Speed 6313.64 samples/sec Loss 7.8056 LearningRate 0.0009 Epoch: 5 Global Step: 121000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:02,640-Speed 6334.57 samples/sec Loss 7.8075 LearningRate 0.0009 Epoch: 5 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:05,884-Speed 6313.81 samples/sec Loss 7.8660 LearningRate 0.0009 Epoch: 5 Global 
Step: 121020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:09,128-Speed 6314.89 samples/sec Loss 7.8356 LearningRate 0.0009 Epoch: 5 Global Step: 121030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:12,380-Speed 6299.11 samples/sec Loss 7.9337 LearningRate 0.0009 Epoch: 5 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:15,624-Speed 6316.06 samples/sec Loss 7.8533 LearningRate 0.0009 Epoch: 5 Global Step: 121050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:18,866-Speed 6317.28 samples/sec Loss 7.8241 LearningRate 0.0009 Epoch: 5 Global Step: 121060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:22,111-Speed 6313.03 samples/sec Loss 7.7932 LearningRate 0.0009 Epoch: 5 Global Step: 121070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:25,356-Speed 6312.08 samples/sec Loss 7.7039 LearningRate 0.0009 Epoch: 5 Global Step: 121080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:28,604-Speed 6308.22 samples/sec Loss 7.8411 LearningRate 0.0009 Epoch: 5 Global Step: 121090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:31,849-Speed 6312.00 samples/sec Loss 7.8414 LearningRate 0.0009 Epoch: 5 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:35,081-Speed 6338.03 samples/sec Loss 7.7369 LearningRate 0.0009 Epoch: 5 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:38,331-Speed 6302.16 samples/sec Loss 7.9180 LearningRate 0.0009 Epoch: 5 Global Step: 121120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:41,585-Speed 6296.48 samples/sec Loss 7.7604 LearningRate 0.0009 Epoch: 5 Global Step: 121130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:44,829-Speed 6313.23 samples/sec Loss 7.8281 LearningRate 0.0009 Epoch: 5 Global Step: 121140 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:44:48,073-Speed 6314.56 samples/sec Loss 7.8054 LearningRate 0.0009 Epoch: 5 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:51,322-Speed 6305.27 samples/sec Loss 7.8169 LearningRate 0.0009 Epoch: 5 Global Step: 121160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:54,567-Speed 6311.80 samples/sec Loss 7.8481 LearningRate 0.0009 Epoch: 5 Global Step: 121170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:44:57,812-Speed 6313.15 samples/sec Loss 7.8361 LearningRate 0.0009 Epoch: 5 Global Step: 121180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:01,061-Speed 6305.42 samples/sec Loss 7.8229 LearningRate 0.0009 Epoch: 5 Global Step: 121190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:04,307-Speed 6310.22 samples/sec Loss 7.8732 LearningRate 0.0009 Epoch: 5 Global Step: 121200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:07,542-Speed 6332.04 samples/sec Loss 7.7458 LearningRate 0.0009 Epoch: 5 Global Step: 121210 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:10,790-Speed 6306.52 samples/sec Loss 7.8407 LearningRate 0.0009 Epoch: 5 Global Step: 121220 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:14,024-Speed 6334.82 samples/sec Loss 7.8522 LearningRate 0.0009 Epoch: 5 Global Step: 121230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:17,269-Speed 6312.57 samples/sec Loss 7.8798 LearningRate 0.0009 Epoch: 5 Global Step: 121240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:20,518-Speed 6303.98 samples/sec Loss 7.8018 LearningRate 0.0009 Epoch: 5 Global Step: 121250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:23,767-Speed 6305.96 samples/sec Loss 7.7791 LearningRate 0.0009 Epoch: 5 Global Step: 121260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:45:27,012-Speed 6312.89 samples/sec Loss 7.8044 LearningRate 0.0009 Epoch: 5 Global Step: 121270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:30,259-Speed 6308.98 samples/sec Loss 7.8311 LearningRate 0.0009 Epoch: 5 Global Step: 121280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:33,506-Speed 6309.20 samples/sec Loss 7.7673 LearningRate 0.0009 Epoch: 5 Global Step: 121290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:36,758-Speed 6298.49 samples/sec Loss 7.8487 LearningRate 0.0009 Epoch: 5 Global Step: 121300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:40,004-Speed 6311.07 samples/sec Loss 7.7626 LearningRate 0.0009 Epoch: 5 Global Step: 121310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:43,249-Speed 6313.45 samples/sec Loss 7.7434 LearningRate 0.0009 Epoch: 5 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:45:46,495-Speed 6310.26 samples/sec Loss 7.8265 LearningRate 0.0009 Epoch: 5 Global Step: 121330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:49,740-Speed 6312.15 samples/sec Loss 7.8636 LearningRate 0.0009 Epoch: 5 Global Step: 121340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:52,988-Speed 6307.41 samples/sec Loss 7.8569 LearningRate 0.0009 Epoch: 5 Global Step: 121350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:56,232-Speed 6314.03 samples/sec Loss 7.8012 LearningRate 0.0009 Epoch: 5 Global Step: 121360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:45:59,481-Speed 6304.36 samples/sec Loss 7.8657 LearningRate 0.0009 Epoch: 5 Global Step: 121370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:02,731-Speed 6303.82 samples/sec Loss 7.7761 LearningRate 0.0009 Epoch: 5 Global Step: 121380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:05,978-Speed 6309.25 samples/sec Loss 
7.8174 LearningRate 0.0009 Epoch: 5 Global Step: 121390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:09,226-Speed 6306.70 samples/sec Loss 7.7977 LearningRate 0.0009 Epoch: 5 Global Step: 121400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:12,493-Speed 6268.83 samples/sec Loss 7.8414 LearningRate 0.0009 Epoch: 5 Global Step: 121410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:15,739-Speed 6311.41 samples/sec Loss 7.8595 LearningRate 0.0009 Epoch: 5 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:18,973-Speed 6334.90 samples/sec Loss 7.7552 LearningRate 0.0009 Epoch: 5 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:22,221-Speed 6306.63 samples/sec Loss 7.8026 LearningRate 0.0009 Epoch: 5 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:25,469-Speed 6306.69 samples/sec Loss 7.8171 LearningRate 0.0009 Epoch: 5 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:28,715-Speed 6309.98 samples/sec Loss 7.8276 LearningRate 0.0009 Epoch: 5 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:31,963-Speed 6306.88 samples/sec Loss 7.8578 LearningRate 0.0009 Epoch: 5 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:35,214-Speed 6300.84 samples/sec Loss 7.8021 LearningRate 0.0009 Epoch: 5 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:38,471-Speed 6291.09 samples/sec Loss 7.8255 LearningRate 0.0009 Epoch: 5 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:41,718-Speed 6309.16 samples/sec Loss 7.8351 LearningRate 0.0009 Epoch: 5 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:46:44,951-Speed 6335.93 samples/sec Loss 7.8055 LearningRate 0.0009 Epoch: 5 Global 
Step: 121510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:46:48,200-Speed 6304.91 samples/sec Loss 7.8137 LearningRate 0.0009 Epoch: 5 Global Step: 121520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:46:51,442-Speed 6317.70 samples/sec Loss 7.8467 LearningRate 0.0009 Epoch: 5 Global Step: 121530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:46:54,693-Speed 6301.83 samples/sec Loss 7.7892 LearningRate 0.0009 Epoch: 5 Global Step: 121540 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:46:57,942-Speed 6304.72 samples/sec Loss 7.8093 LearningRate 0.0009 Epoch: 5 Global Step: 121550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:01,187-Speed 6313.15 samples/sec Loss 7.8215 LearningRate 0.0009 Epoch: 5 Global Step: 121560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:04,433-Speed 6308.97 samples/sec Loss 7.8238 LearningRate 0.0009 Epoch: 5 Global Step: 121570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:07,677-Speed 6315.01 samples/sec Loss 7.9065 LearningRate 0.0009 Epoch: 5 Global Step: 121580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:10,926-Speed 6305.32 samples/sec Loss 7.8036 LearningRate 0.0009 Epoch: 5 Global Step: 121590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:14,169-Speed 6315.67 samples/sec Loss 7.8624 LearningRate 0.0009 Epoch: 5 Global Step: 121600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:17,422-Speed 6297.38 samples/sec Loss 7.7952 LearningRate 0.0009 Epoch: 5 Global Step: 121610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:47:20,668-Speed 6311.80 samples/sec Loss 7.9142 LearningRate 0.0009 Epoch: 5 Global Step: 121620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:47:23,913-Speed 6311.14 samples/sec Loss 7.8327 LearningRate 0.0009 Epoch: 5 Global Step: 121630 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:47:27,146-Speed 6335.71 samples/sec Loss 7.7866 LearningRate 0.0009 Epoch: 5 Global Step: 121640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:30,394-Speed 6307.30 samples/sec Loss 7.8461 LearningRate 0.0009 Epoch: 5 Global Step: 121650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:33,640-Speed 6310.64 samples/sec Loss 7.6930 LearningRate 0.0009 Epoch: 5 Global Step: 121660 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:36,888-Speed 6308.01 samples/sec Loss 7.7824 LearningRate 0.0009 Epoch: 5 Global Step: 121670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:40,130-Speed 6317.46 samples/sec Loss 7.9269 LearningRate 0.0009 Epoch: 5 Global Step: 121680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:43,381-Speed 6301.91 samples/sec Loss 7.8402 LearningRate 0.0009 Epoch: 5 Global Step: 121690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:46,634-Speed 6297.26 samples/sec Loss 7.8466 LearningRate 0.0009 Epoch: 5 Global Step: 121700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:49,881-Speed 6309.61 samples/sec Loss 7.8758 LearningRate 0.0009 Epoch: 5 Global Step: 121710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:53,126-Speed 6311.92 samples/sec Loss 7.7824 LearningRate 0.0009 Epoch: 5 Global Step: 121720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:56,377-Speed 6300.79 samples/sec Loss 7.7187 LearningRate 0.0009 Epoch: 5 Global Step: 121730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:47:59,622-Speed 6312.19 samples/sec Loss 7.8092 LearningRate 0.0009 Epoch: 5 Global Step: 121740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:02,856-Speed 6335.36 samples/sec Loss 7.8737 LearningRate 0.0009 Epoch: 5 Global Step: 121750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:48:06,108-Speed 6298.31 samples/sec Loss 7.8256 LearningRate 0.0009 Epoch: 5 Global Step: 121760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:09,356-Speed 6306.89 samples/sec Loss 7.8680 LearningRate 0.0009 Epoch: 5 Global Step: 121770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:12,600-Speed 6314.09 samples/sec Loss 7.7928 LearningRate 0.0009 Epoch: 5 Global Step: 121780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:15,849-Speed 6305.67 samples/sec Loss 7.8024 LearningRate 0.0009 Epoch: 5 Global Step: 121790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:19,098-Speed 6303.87 samples/sec Loss 7.8858 LearningRate 0.0009 Epoch: 5 Global Step: 121800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:22,343-Speed 6313.28 samples/sec Loss 7.8017 LearningRate 0.0009 Epoch: 5 Global Step: 121810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:25,587-Speed 6315.66 samples/sec Loss 7.7960 LearningRate 0.0009 Epoch: 5 Global Step: 121820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:28,832-Speed 6311.19 samples/sec Loss 7.8814 LearningRate 0.0009 Epoch: 5 Global Step: 121830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:32,088-Speed 6291.04 samples/sec Loss 7.7985 LearningRate 0.0009 Epoch: 5 Global Step: 121840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:48:35,334-Speed 6310.88 samples/sec Loss 7.7442 LearningRate 0.0009 Epoch: 5 Global Step: 121850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:38,578-Speed 6314.85 samples/sec Loss 7.8711 LearningRate 0.0009 Epoch: 5 Global Step: 121860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:41,823-Speed 6312.40 samples/sec Loss 7.8881 LearningRate 0.0009 Epoch: 5 Global Step: 121870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:45,068-Speed 6313.17 samples/sec Loss 
7.8425 LearningRate 0.0009 Epoch: 5 Global Step: 121880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:48,338-Speed 6264.97 samples/sec Loss 7.7666 LearningRate 0.0009 Epoch: 5 Global Step: 121890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:51,583-Speed 6312.29 samples/sec Loss 7.8326 LearningRate 0.0009 Epoch: 5 Global Step: 121900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:54,831-Speed 6306.32 samples/sec Loss 7.7992 LearningRate 0.0009 Epoch: 5 Global Step: 121910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:48:58,078-Speed 6310.48 samples/sec Loss 7.8581 LearningRate 0.0009 Epoch: 5 Global Step: 121920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:01,321-Speed 6315.38 samples/sec Loss 7.8343 LearningRate 0.0009 Epoch: 5 Global Step: 121930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:04,571-Speed 6304.50 samples/sec Loss 7.8989 LearningRate 0.0009 Epoch: 5 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:07,805-Speed 6333.90 samples/sec Loss 7.8226 LearningRate 0.0009 Epoch: 5 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:11,050-Speed 6312.66 samples/sec Loss 7.7783 LearningRate 0.0009 Epoch: 5 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:14,296-Speed 6309.58 samples/sec Loss 7.8625 LearningRate 0.0009 Epoch: 5 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:17,544-Speed 6307.03 samples/sec Loss 7.8751 LearningRate 0.0009 Epoch: 5 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:20,791-Speed 6310.06 samples/sec Loss 7.7844 LearningRate 0.0009 Epoch: 5 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:24,039-Speed 6305.11 samples/sec Loss 7.8870 LearningRate 0.0009 Epoch: 5 Global 
Step: 122000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:27,286-Speed 6308.32 samples/sec Loss 7.7847 LearningRate 0.0009 Epoch: 5 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:30,537-Speed 6302.21 samples/sec Loss 7.8796 LearningRate 0.0009 Epoch: 5 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:33,785-Speed 6306.95 samples/sec Loss 7.9033 LearningRate 0.0009 Epoch: 5 Global Step: 122030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:37,031-Speed 6313.95 samples/sec Loss 7.7623 LearningRate 0.0009 Epoch: 5 Global Step: 122040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:40,298-Speed 6269.42 samples/sec Loss 7.7969 LearningRate 0.0009 Epoch: 5 Global Step: 122050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:43,546-Speed 6306.59 samples/sec Loss 7.8468 LearningRate 0.0009 Epoch: 5 Global Step: 122060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:46,796-Speed 6303.39 samples/sec Loss 7.7486 LearningRate 0.0009 Epoch: 5 Global Step: 122070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:50,050-Speed 6296.35 samples/sec Loss 7.8105 LearningRate 0.0009 Epoch: 5 Global Step: 122080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:53,294-Speed 6313.32 samples/sec Loss 7.7199 LearningRate 0.0009 Epoch: 5 Global Step: 122090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:56,543-Speed 6304.88 samples/sec Loss 7.7801 LearningRate 0.0009 Epoch: 5 Global Step: 122100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:49:59,788-Speed 6312.25 samples/sec Loss 7.8492 LearningRate 0.0009 Epoch: 5 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:03,038-Speed 6303.38 samples/sec Loss 7.8436 LearningRate 0.0009 Epoch: 5 Global Step: 122120 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:50:06,287-Speed 6305.93 samples/sec Loss 7.8145 LearningRate 0.0009 Epoch: 5 Global Step: 122130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:09,535-Speed 6307.19 samples/sec Loss 7.8227 LearningRate 0.0009 Epoch: 5 Global Step: 122140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:12,771-Speed 6330.29 samples/sec Loss 7.8141 LearningRate 0.0009 Epoch: 5 Global Step: 122150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:16,003-Speed 6337.57 samples/sec Loss 7.8568 LearningRate 0.0009 Epoch: 5 Global Step: 122160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:19,248-Speed 6312.72 samples/sec Loss 7.7546 LearningRate 0.0009 Epoch: 5 Global Step: 122170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:22,494-Speed 6310.93 samples/sec Loss 7.8645 LearningRate 0.0009 Epoch: 5 Global Step: 122180 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:25,745-Speed 6299.82 samples/sec Loss 7.7897 LearningRate 0.0009 Epoch: 5 Global Step: 122190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:28,992-Speed 6308.78 samples/sec Loss 7.7336 LearningRate 0.0009 Epoch: 5 Global Step: 122200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:32,240-Speed 6308.53 samples/sec Loss 7.8332 LearningRate 0.0009 Epoch: 5 Global Step: 122210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:35,493-Speed 6297.20 samples/sec Loss 7.8776 LearningRate 0.0009 Epoch: 5 Global Step: 122220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:38,741-Speed 6307.36 samples/sec Loss 7.7832 LearningRate 0.0009 Epoch: 5 Global Step: 122230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:41,998-Speed 6288.86 samples/sec Loss 7.8713 LearningRate 0.0009 Epoch: 5 Global Step: 122240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:50:45,242-Speed 6314.36 samples/sec Loss 7.7738 LearningRate 0.0009 Epoch: 5 Global Step: 122250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:50:48,489-Speed 6308.46 samples/sec Loss 7.8023 LearningRate 0.0009 Epoch: 5 Global Step: 122260 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:51,739-Speed 6302.55 samples/sec Loss 7.8155 LearningRate 0.0009 Epoch: 5 Global Step: 122270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:54,983-Speed 6314.89 samples/sec Loss 7.7512 LearningRate 0.0009 Epoch: 5 Global Step: 122280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:50:58,226-Speed 6316.18 samples/sec Loss 7.7995 LearningRate 0.0009 Epoch: 5 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:01,474-Speed 6308.31 samples/sec Loss 7.7145 LearningRate 0.0009 Epoch: 5 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:04,722-Speed 6306.51 samples/sec Loss 7.8489 LearningRate 0.0009 Epoch: 5 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:07,967-Speed 6312.08 samples/sec Loss 7.7855 LearningRate 0.0009 Epoch: 5 Global Step: 122320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:11,215-Speed 6306.72 samples/sec Loss 7.8337 LearningRate 0.0009 Epoch: 5 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:14,468-Speed 6297.93 samples/sec Loss 7.8185 LearningRate 0.0009 Epoch: 5 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:17,717-Speed 6303.33 samples/sec Loss 7.7812 LearningRate 0.0009 Epoch: 5 Global Step: 122350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:20,964-Speed 6309.98 samples/sec Loss 7.7603 LearningRate 0.0009 Epoch: 5 Global Step: 122360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:24,213-Speed 6305.39 samples/sec Loss 
7.8568 LearningRate 0.0009 Epoch: 5 Global Step: 122370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:27,455-Speed 6318.07 samples/sec Loss 7.8799 LearningRate 0.0009 Epoch: 5 Global Step: 122380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:51:30,708-Speed 6297.32 samples/sec Loss 7.8088 LearningRate 0.0009 Epoch: 5 Global Step: 122390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:33,954-Speed 6311.08 samples/sec Loss 7.7602 LearningRate 0.0009 Epoch: 5 Global Step: 122400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:37,202-Speed 6306.99 samples/sec Loss 7.8131 LearningRate 0.0009 Epoch: 5 Global Step: 122410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:40,447-Speed 6312.54 samples/sec Loss 7.7345 LearningRate 0.0009 Epoch: 5 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:43,693-Speed 6310.68 samples/sec Loss 7.7967 LearningRate 0.0009 Epoch: 5 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:46,939-Speed 6310.39 samples/sec Loss 7.8759 LearningRate 0.0009 Epoch: 5 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:50,198-Speed 6286.06 samples/sec Loss 7.8888 LearningRate 0.0009 Epoch: 5 Global Step: 122450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:53,443-Speed 6312.43 samples/sec Loss 7.8405 LearningRate 0.0009 Epoch: 5 Global Step: 122460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:56,696-Speed 6296.88 samples/sec Loss 7.8131 LearningRate 0.0009 Epoch: 5 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:51:59,945-Speed 6304.12 samples/sec Loss 7.8770 LearningRate 0.0009 Epoch: 5 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:03,184-Speed 6325.31 samples/sec Loss 7.8169 LearningRate 0.0009 Epoch: 5 Global 
Step: 122490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:06,428-Speed 6314.14 samples/sec Loss 7.7357 LearningRate 0.0009 Epoch: 5 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:09,678-Speed 6303.95 samples/sec Loss 7.7763 LearningRate 0.0009 Epoch: 5 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:12,924-Speed 6310.94 samples/sec Loss 7.8263 LearningRate 0.0009 Epoch: 5 Global Step: 122520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:16,170-Speed 6310.05 samples/sec Loss 7.7415 LearningRate 0.0009 Epoch: 5 Global Step: 122530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:19,419-Speed 6305.28 samples/sec Loss 7.7486 LearningRate 0.0009 Epoch: 5 Global Step: 122540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:22,666-Speed 6308.92 samples/sec Loss 7.8540 LearningRate 0.0009 Epoch: 5 Global Step: 122550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:25,908-Speed 6317.76 samples/sec Loss 7.8072 LearningRate 0.0009 Epoch: 5 Global Step: 122560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:29,158-Speed 6304.22 samples/sec Loss 7.7620 LearningRate 0.0009 Epoch: 5 Global Step: 122570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:32,402-Speed 6314.06 samples/sec Loss 7.8322 LearningRate 0.0009 Epoch: 5 Global Step: 122580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:35,633-Speed 6339.95 samples/sec Loss 7.7769 LearningRate 0.0009 Epoch: 5 Global Step: 122590 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:38,882-Speed 6305.03 samples/sec Loss 7.8601 LearningRate 0.0009 Epoch: 5 Global Step: 122600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:42,130-Speed 6306.42 samples/sec Loss 7.8508 LearningRate 0.0009 Epoch: 5 Global Step: 122610 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:52:45,375-Speed 6313.31 samples/sec Loss 7.7105 LearningRate 0.0009 Epoch: 5 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:48,620-Speed 6312.23 samples/sec Loss 7.8050 LearningRate 0.0009 Epoch: 5 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:51,874-Speed 6294.97 samples/sec Loss 7.7671 LearningRate 0.0009 Epoch: 5 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:55,160-Speed 6235.42 samples/sec Loss 7.7982 LearningRate 0.0009 Epoch: 5 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:52:58,406-Speed 6309.38 samples/sec Loss 7.7318 LearningRate 0.0009 Epoch: 5 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:01,654-Speed 6307.89 samples/sec Loss 7.7896 LearningRate 0.0009 Epoch: 5 Global Step: 122670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:04,900-Speed 6310.61 samples/sec Loss 7.8070 LearningRate 0.0009 Epoch: 5 Global Step: 122680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:08,143-Speed 6316.05 samples/sec Loss 7.7898 LearningRate 0.0009 Epoch: 5 Global Step: 122690 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 02:53:11,375-Speed 6338.58 samples/sec Loss 7.7733 LearningRate 0.0009 Epoch: 5 Global Step: 122700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:14,627-Speed 6299.15 samples/sec Loss 7.7758 LearningRate 0.0009 Epoch: 5 Global Step: 122710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:17,876-Speed 6304.31 samples/sec Loss 7.7656 LearningRate 0.0009 Epoch: 5 Global Step: 122720 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:21,126-Speed 6303.79 samples/sec Loss 7.8434 LearningRate 0.0009 Epoch: 5 Global Step: 122730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
02:53:24,371-Speed 6311.56 samples/sec Loss 7.7328 LearningRate 0.0009 Epoch: 5 Global Step: 122740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:27,620-Speed 6305.33 samples/sec Loss 7.6929 LearningRate 0.0009 Epoch: 5 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:30,864-Speed 6313.67 samples/sec Loss 7.7916 LearningRate 0.0009 Epoch: 5 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:34,111-Speed 6308.76 samples/sec Loss 7.8589 LearningRate 0.0009 Epoch: 5 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:37,361-Speed 6304.18 samples/sec Loss 7.8420 LearningRate 0.0009 Epoch: 5 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:53:40,595-Speed 6334.18 samples/sec Loss 7.7708 LearningRate 0.0009 Epoch: 5 Global Step: 122790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:53:43,844-Speed 6304.59 samples/sec Loss 7.7576 LearningRate 0.0009 Epoch: 5 Global Step: 122800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:53:47,094-Speed 6303.89 samples/sec Loss 7.7757 LearningRate 0.0009 Epoch: 5 Global Step: 122810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:53:50,345-Speed 6301.19 samples/sec Loss 7.8780 LearningRate 0.0009 Epoch: 5 Global Step: 122820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:53:53,592-Speed 6309.17 samples/sec Loss 7.8049 LearningRate 0.0009 Epoch: 5 Global Step: 122830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:53:56,839-Speed 6308.25 samples/sec Loss 7.8974 LearningRate 0.0009 Epoch: 5 Global Step: 122840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:54:00,086-Speed 6308.40 samples/sec Loss 7.7607 LearningRate 0.0009 Epoch: 5 Global Step: 122850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:54:03,335-Speed 6304.72 samples/sec Loss 
7.8524 LearningRate 0.0009 Epoch: 5 Global Step: 122860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:54:06,584-Speed 6304.50 samples/sec Loss 7.8081 LearningRate 0.0009 Epoch: 5 Global Step: 122870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:54:09,831-Speed 6309.26 samples/sec Loss 7.7720 LearningRate 0.0009 Epoch: 5 Global Step: 122880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:54:13,075-Speed 6313.52 samples/sec Loss 7.8285 LearningRate 0.0009 Epoch: 5 Global Step: 122890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:16,330-Speed 6293.20 samples/sec Loss 7.8553 LearningRate 0.0009 Epoch: 5 Global Step: 122900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:19,579-Speed 6305.09 samples/sec Loss 7.7811 LearningRate 0.0009 Epoch: 5 Global Step: 122910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:22,833-Speed 6296.31 samples/sec Loss 7.7722 LearningRate 0.0009 Epoch: 5 Global Step: 122920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:26,080-Speed 6308.01 samples/sec Loss 7.8591 LearningRate 0.0009 Epoch: 5 Global Step: 122930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:29,331-Speed 6301.53 samples/sec Loss 7.8416 LearningRate 0.0009 Epoch: 5 Global Step: 122940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:32,578-Speed 6308.97 samples/sec Loss 7.7893 LearningRate 0.0009 Epoch: 5 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:35,826-Speed 6306.63 samples/sec Loss 7.7820 LearningRate 0.0009 Epoch: 5 Global Step: 122960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:39,095-Speed 6265.25 samples/sec Loss 7.7475 LearningRate 0.0009 Epoch: 5 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:42,344-Speed 6305.02 samples/sec Loss 7.8091 LearningRate 0.0009 Epoch: 5 Global 
Step: 122980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:45,582-Speed 6326.67 samples/sec Loss 7.7597 LearningRate 0.0009 Epoch: 5 Global Step: 122990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:48,829-Speed 6309.15 samples/sec Loss 7.8010 LearningRate 0.0009 Epoch: 5 Global Step: 123000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:52,077-Speed 6308.23 samples/sec Loss 7.7225 LearningRate 0.0009 Epoch: 5 Global Step: 123010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:55,322-Speed 6311.54 samples/sec Loss 7.7513 LearningRate 0.0009 Epoch: 5 Global Step: 123020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:54:58,570-Speed 6307.46 samples/sec Loss 7.7624 LearningRate 0.0009 Epoch: 5 Global Step: 123030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:01,818-Speed 6307.14 samples/sec Loss 7.7976 LearningRate 0.0009 Epoch: 5 Global Step: 123040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:05,068-Speed 6301.99 samples/sec Loss 7.8018 LearningRate 0.0009 Epoch: 5 Global Step: 123050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:08,315-Speed 6309.49 samples/sec Loss 7.8267 LearningRate 0.0009 Epoch: 5 Global Step: 123060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:11,557-Speed 6318.84 samples/sec Loss 7.7274 LearningRate 0.0009 Epoch: 5 Global Step: 123070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:14,799-Speed 6317.16 samples/sec Loss 7.7733 LearningRate 0.0009 Epoch: 5 Global Step: 123080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:18,032-Speed 6337.42 samples/sec Loss 7.8194 LearningRate 0.0009 Epoch: 5 Global Step: 123090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:21,281-Speed 6303.88 samples/sec Loss 7.8535 LearningRate 0.0009 Epoch: 5 Global Step: 123100 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:55:24,525-Speed 6313.75 samples/sec Loss 7.8636 LearningRate 0.0009 Epoch: 5 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:27,770-Speed 6314.73 samples/sec Loss 7.8395 LearningRate 0.0009 Epoch: 5 Global Step: 123120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:31,012-Speed 6317.85 samples/sec Loss 7.6538 LearningRate 0.0009 Epoch: 5 Global Step: 123130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:34,258-Speed 6310.10 samples/sec Loss 7.7482 LearningRate 0.0009 Epoch: 5 Global Step: 123140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:37,503-Speed 6312.45 samples/sec Loss 7.7733 LearningRate 0.0009 Epoch: 5 Global Step: 123150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:40,749-Speed 6310.92 samples/sec Loss 7.7951 LearningRate 0.0009 Epoch: 5 Global Step: 123160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:43,994-Speed 6313.18 samples/sec Loss 7.7827 LearningRate 0.0009 Epoch: 5 Global Step: 123170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:55:47,231-Speed 6328.88 samples/sec Loss 7.7294 LearningRate 0.0009 Epoch: 5 Global Step: 123180 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:55:50,474-Speed 6315.31 samples/sec Loss 7.7936 LearningRate 0.0009 Epoch: 5 Global Step: 123190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:55:53,721-Speed 6309.47 samples/sec Loss 7.8888 LearningRate 0.0009 Epoch: 5 Global Step: 123200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:55:56,965-Speed 6315.20 samples/sec Loss 7.7712 LearningRate 0.0009 Epoch: 5 Global Step: 123210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:00,211-Speed 6309.91 samples/sec Loss 7.8132 LearningRate 0.0009 Epoch: 5 Global Step: 123220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:56:03,458-Speed 6310.05 samples/sec Loss 7.8192 LearningRate 0.0009 Epoch: 5 Global Step: 123230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:06,701-Speed 6315.78 samples/sec Loss 7.6692 LearningRate 0.0009 Epoch: 5 Global Step: 123240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:09,947-Speed 6311.11 samples/sec Loss 7.8151 LearningRate 0.0009 Epoch: 5 Global Step: 123250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:13,193-Speed 6310.70 samples/sec Loss 7.8130 LearningRate 0.0009 Epoch: 5 Global Step: 123260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:16,435-Speed 6317.99 samples/sec Loss 7.7974 LearningRate 0.0009 Epoch: 5 Global Step: 123270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:19,682-Speed 6308.89 samples/sec Loss 7.7551 LearningRate 0.0009 Epoch: 5 Global Step: 123280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:22,927-Speed 6312.35 samples/sec Loss 7.7546 LearningRate 0.0009 Epoch: 5 Global Step: 123290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:26,172-Speed 6313.29 samples/sec Loss 7.7903 LearningRate 0.0009 Epoch: 5 Global Step: 123300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:29,419-Speed 6308.79 samples/sec Loss 7.7061 LearningRate 0.0009 Epoch: 5 Global Step: 123310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:32,665-Speed 6311.05 samples/sec Loss 7.7560 LearningRate 0.0009 Epoch: 5 Global Step: 123320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:35,912-Speed 6307.26 samples/sec Loss 7.7576 LearningRate 0.0009 Epoch: 5 Global Step: 123330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:39,158-Speed 6311.93 samples/sec Loss 7.7768 LearningRate 0.0009 Epoch: 5 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:56:42,394-Speed 6329.21 samples/sec Loss 
7.7585 LearningRate 0.0009 Epoch: 5 Global Step: 123350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:45,638-Speed 6315.55 samples/sec Loss 7.7426 LearningRate 0.0009 Epoch: 5 Global Step: 123360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:48,882-Speed 6313.75 samples/sec Loss 7.7571 LearningRate 0.0009 Epoch: 5 Global Step: 123370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:52,132-Speed 6303.23 samples/sec Loss 7.8329 LearningRate 0.0009 Epoch: 5 Global Step: 123380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:55,381-Speed 6304.62 samples/sec Loss 7.8126 LearningRate 0.0009 Epoch: 5 Global Step: 123390 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:56:58,626-Speed 6313.76 samples/sec Loss 7.7952 LearningRate 0.0009 Epoch: 5 Global Step: 123400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:57:01,868-Speed 6318.47 samples/sec Loss 7.8370 LearningRate 0.0009 Epoch: 5 Global Step: 123410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:57:05,111-Speed 6317.19 samples/sec Loss 7.7791 LearningRate 0.0009 Epoch: 5 Global Step: 123420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:57:08,356-Speed 6311.56 samples/sec Loss 7.7284 LearningRate 0.0009 Epoch: 5 Global Step: 123430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:57:11,600-Speed 6314.43 samples/sec Loss 7.7543 LearningRate 0.0009 Epoch: 5 Global Step: 123440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:57:14,849-Speed 6306.41 samples/sec Loss 7.7784 LearningRate 0.0009 Epoch: 5 Global Step: 123450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:18,100-Speed 6300.98 samples/sec Loss 7.9042 LearningRate 0.0009 Epoch: 5 Global Step: 123460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:21,347-Speed 6308.40 samples/sec Loss 7.7428 LearningRate 0.0009 Epoch: 5 Global 
Step: 123470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:24,591-Speed 6313.94 samples/sec Loss 7.8662 LearningRate 0.0009 Epoch: 5 Global Step: 123480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:27,856-Speed 6273.69 samples/sec Loss 7.8146 LearningRate 0.0009 Epoch: 5 Global Step: 123490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:31,103-Speed 6309.55 samples/sec Loss 7.8783 LearningRate 0.0009 Epoch: 5 Global Step: 123500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:34,352-Speed 6304.97 samples/sec Loss 7.7924 LearningRate 0.0009 Epoch: 5 Global Step: 123510 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:37,605-Speed 6297.14 samples/sec Loss 7.8216 LearningRate 0.0009 Epoch: 5 Global Step: 123520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:40,848-Speed 6316.51 samples/sec Loss 7.7154 LearningRate 0.0009 Epoch: 5 Global Step: 123530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:44,093-Speed 6313.61 samples/sec Loss 7.6751 LearningRate 0.0009 Epoch: 5 Global Step: 123540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:47,343-Speed 6301.85 samples/sec Loss 7.7726 LearningRate 0.0009 Epoch: 5 Global Step: 123550 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 02:57:50,578-Speed 6332.65 samples/sec Loss 7.7216 LearningRate 0.0009 Epoch: 5 Global Step: 123560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:53,825-Speed 6308.99 samples/sec Loss 7.7546 LearningRate 0.0009 Epoch: 5 Global Step: 123570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:57:57,073-Speed 6305.16 samples/sec Loss 7.8009 LearningRate 0.0009 Epoch: 5 Global Step: 123580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:00,323-Speed 6303.31 samples/sec Loss 7.7327 LearningRate 0.0009 Epoch: 5 Global Step: 123590 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 02:58:03,571-Speed 6307.44 samples/sec Loss 7.7165 LearningRate 0.0009 Epoch: 5 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:06,818-Speed 6307.76 samples/sec Loss 7.7350 LearningRate 0.0009 Epoch: 5 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:10,065-Speed 6308.98 samples/sec Loss 7.8038 LearningRate 0.0009 Epoch: 5 Global Step: 123620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:13,315-Speed 6305.19 samples/sec Loss 7.7484 LearningRate 0.0009 Epoch: 5 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:16,560-Speed 6312.37 samples/sec Loss 7.7653 LearningRate 0.0009 Epoch: 5 Global Step: 123640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:19,807-Speed 6308.32 samples/sec Loss 7.7190 LearningRate 0.0009 Epoch: 5 Global Step: 123650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:23,040-Speed 6336.03 samples/sec Loss 7.8122 LearningRate 0.0009 Epoch: 5 Global Step: 123660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:58:26,286-Speed 6311.27 samples/sec Loss 7.7446 LearningRate 0.0009 Epoch: 5 Global Step: 123670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:29,534-Speed 6305.84 samples/sec Loss 7.8035 LearningRate 0.0009 Epoch: 5 Global Step: 123680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:32,778-Speed 6314.73 samples/sec Loss 7.7580 LearningRate 0.0009 Epoch: 5 Global Step: 123690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:36,023-Speed 6311.96 samples/sec Loss 7.7516 LearningRate 0.0009 Epoch: 5 Global Step: 123700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:39,270-Speed 6310.38 samples/sec Loss 7.8006 LearningRate 0.0009 Epoch: 5 Global Step: 123710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
02:58:42,518-Speed 6306.32 samples/sec Loss 7.7219 LearningRate 0.0009 Epoch: 5 Global Step: 123720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:45,770-Speed 6298.33 samples/sec Loss 7.7565 LearningRate 0.0009 Epoch: 5 Global Step: 123730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:49,013-Speed 6317.33 samples/sec Loss 7.7518 LearningRate 0.0009 Epoch: 5 Global Step: 123740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:52,263-Speed 6302.69 samples/sec Loss 7.7573 LearningRate 0.0009 Epoch: 5 Global Step: 123750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:55,510-Speed 6308.32 samples/sec Loss 7.7271 LearningRate 0.0009 Epoch: 5 Global Step: 123760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 02:58:58,759-Speed 6305.53 samples/sec Loss 7.7345 LearningRate 0.0009 Epoch: 5 Global Step: 123770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:02,003-Speed 6313.58 samples/sec Loss 7.7523 LearningRate 0.0009 Epoch: 5 Global Step: 123780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:05,248-Speed 6312.93 samples/sec Loss 7.7000 LearningRate 0.0009 Epoch: 5 Global Step: 123790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:08,496-Speed 6308.62 samples/sec Loss 7.8058 LearningRate 0.0009 Epoch: 5 Global Step: 123800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:11,741-Speed 6311.09 samples/sec Loss 7.7640 LearningRate 0.0009 Epoch: 5 Global Step: 123810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:14,983-Speed 6319.89 samples/sec Loss 7.8326 LearningRate 0.0009 Epoch: 5 Global Step: 123820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:18,232-Speed 6304.71 samples/sec Loss 7.7975 LearningRate 0.0009 Epoch: 5 Global Step: 123830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:21,478-Speed 6311.48 samples/sec Loss 
7.8203 LearningRate 0.0009 Epoch: 5 Global Step: 123840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:24,718-Speed 6321.58 samples/sec Loss 7.7264 LearningRate 0.0009 Epoch: 5 Global Step: 123850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:27,967-Speed 6307.56 samples/sec Loss 7.8096 LearningRate 0.0009 Epoch: 5 Global Step: 123860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:31,198-Speed 6339.65 samples/sec Loss 7.7475 LearningRate 0.0009 Epoch: 5 Global Step: 123870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:34,444-Speed 6311.60 samples/sec Loss 7.7892 LearningRate 0.0009 Epoch: 5 Global Step: 123880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:37,687-Speed 6315.98 samples/sec Loss 7.7829 LearningRate 0.0009 Epoch: 5 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:40,936-Speed 6305.65 samples/sec Loss 7.8519 LearningRate 0.0009 Epoch: 5 Global Step: 123900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:44,185-Speed 6303.94 samples/sec Loss 7.7266 LearningRate 0.0009 Epoch: 5 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:47,483-Speed 6211.39 samples/sec Loss 7.8611 LearningRate 0.0009 Epoch: 5 Global Step: 123920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:50,730-Speed 6308.55 samples/sec Loss 7.8038 LearningRate 0.0009 Epoch: 5 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:53,979-Speed 6305.36 samples/sec Loss 7.8488 LearningRate 0.0009 Epoch: 5 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 02:59:57,227-Speed 6305.98 samples/sec Loss 7.7283 LearningRate 0.0009 Epoch: 5 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:00,474-Speed 6309.65 samples/sec Loss 7.6764 LearningRate 0.0009 Epoch: 5 Global 
Step: 123960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:03,706-Speed 6338.48 samples/sec Loss 7.7370 LearningRate 0.0009 Epoch: 5 Global Step: 123970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:06,951-Speed 6312.66 samples/sec Loss 7.6622 LearningRate 0.0009 Epoch: 5 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:10,194-Speed 6315.14 samples/sec Loss 7.7592 LearningRate 0.0009 Epoch: 5 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:13,440-Speed 6310.84 samples/sec Loss 7.7490 LearningRate 0.0009 Epoch: 5 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:16,690-Speed 6304.14 samples/sec Loss 7.7773 LearningRate 0.0009 Epoch: 5 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:19,925-Speed 6331.02 samples/sec Loss 7.7864 LearningRate 0.0009 Epoch: 5 Global Step: 124020 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:23,195-Speed 6264.23 samples/sec Loss 7.7667 LearningRate 0.0009 Epoch: 5 Global Step: 124030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:26,446-Speed 6302.24 samples/sec Loss 7.8124 LearningRate 0.0009 Epoch: 5 Global Step: 124040 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:29,692-Speed 6310.15 samples/sec Loss 7.6666 LearningRate 0.0009 Epoch: 5 Global Step: 124050 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:32,939-Speed 6309.60 samples/sec Loss 7.9391 LearningRate 0.0009 Epoch: 5 Global Step: 124060 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:36,185-Speed 6311.28 samples/sec Loss 7.8381 LearningRate 0.0009 Epoch: 5 Global Step: 124070 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:39,431-Speed 6309.92 samples/sec Loss 7.7345 LearningRate 0.0009 Epoch: 5 Global Step: 124080 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:00:42,678-Speed 6308.45 samples/sec Loss 7.7222 LearningRate 0.0009 Epoch: 5 Global Step: 124090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:45,925-Speed 6309.75 samples/sec Loss 7.7343 LearningRate 0.0009 Epoch: 5 Global Step: 124100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:49,168-Speed 6316.79 samples/sec Loss 7.7795 LearningRate 0.0009 Epoch: 5 Global Step: 124110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:00:52,416-Speed 6306.61 samples/sec Loss 7.7740 LearningRate 0.0009 Epoch: 5 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:55,659-Speed 6315.29 samples/sec Loss 7.7400 LearningRate 0.0009 Epoch: 5 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:00:58,905-Speed 6311.95 samples/sec Loss 7.8147 LearningRate 0.0009 Epoch: 5 Global Step: 124140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:02,160-Speed 6291.80 samples/sec Loss 7.8855 LearningRate 0.0009 Epoch: 5 Global Step: 124150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:05,418-Speed 6288.68 samples/sec Loss 7.6909 LearningRate 0.0009 Epoch: 5 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:08,662-Speed 6314.24 samples/sec Loss 7.7539 LearningRate 0.0009 Epoch: 5 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:11,909-Speed 6309.19 samples/sec Loss 7.7416 LearningRate 0.0009 Epoch: 5 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:15,158-Speed 6305.09 samples/sec Loss 7.7618 LearningRate 0.0009 Epoch: 5 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:18,406-Speed 6306.40 samples/sec Loss 7.7937 LearningRate 0.0009 Epoch: 5 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:01:21,635-Speed 6343.51 samples/sec Loss 7.6882 LearningRate 0.0009 Epoch: 5 Global Step: 124210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:24,882-Speed 6309.00 samples/sec Loss 7.7641 LearningRate 0.0009 Epoch: 5 Global Step: 124220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:28,133-Speed 6301.56 samples/sec Loss 7.7687 LearningRate 0.0009 Epoch: 5 Global Step: 124230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:31,381-Speed 6305.33 samples/sec Loss 7.7407 LearningRate 0.0009 Epoch: 5 Global Step: 124240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:34,625-Speed 6315.11 samples/sec Loss 7.7974 LearningRate 0.0009 Epoch: 5 Global Step: 124250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:37,872-Speed 6309.37 samples/sec Loss 7.8074 LearningRate 0.0009 Epoch: 5 Global Step: 124260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:41,119-Speed 6308.90 samples/sec Loss 7.7863 LearningRate 0.0009 Epoch: 5 Global Step: 124270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:44,365-Speed 6311.71 samples/sec Loss 7.7665 LearningRate 0.0009 Epoch: 5 Global Step: 124280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:47,617-Speed 6298.77 samples/sec Loss 7.8241 LearningRate 0.0009 Epoch: 5 Global Step: 124290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:50,867-Speed 6303.49 samples/sec Loss 7.7807 LearningRate 0.0009 Epoch: 5 Global Step: 124300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:01:54,111-Speed 6313.75 samples/sec Loss 7.8319 LearningRate 0.0009 Epoch: 5 Global Step: 124310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:01:57,359-Speed 6306.51 samples/sec Loss 7.8424 LearningRate 0.0009 Epoch: 5 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:00,606-Speed 6309.45 samples/sec Loss 
7.7236 LearningRate 0.0009 Epoch: 5 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:03,853-Speed 6307.67 samples/sec Loss 7.7314 LearningRate 0.0009 Epoch: 5 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:07,099-Speed 6312.32 samples/sec Loss 7.6972 LearningRate 0.0009 Epoch: 5 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:10,346-Speed 6307.66 samples/sec Loss 7.7826 LearningRate 0.0009 Epoch: 5 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:13,592-Speed 6310.83 samples/sec Loss 7.8137 LearningRate 0.0009 Epoch: 5 Global Step: 124370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:16,843-Speed 6302.10 samples/sec Loss 7.7262 LearningRate 0.0009 Epoch: 5 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:20,093-Speed 6302.56 samples/sec Loss 7.7161 LearningRate 0.0009 Epoch: 5 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:23,338-Speed 6311.91 samples/sec Loss 7.7533 LearningRate 0.0009 Epoch: 5 Global Step: 124400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:02:26,575-Speed 6329.37 samples/sec Loss 7.7911 LearningRate 0.0009 Epoch: 5 Global Step: 124410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:02:29,821-Speed 6309.68 samples/sec Loss 7.7629 LearningRate 0.0009 Epoch: 5 Global Step: 124420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:02:33,069-Speed 6307.36 samples/sec Loss 7.7951 LearningRate 0.0009 Epoch: 5 Global Step: 124430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:32,051-Speed 347.23 samples/sec Loss 7.8730 LearningRate 0.0009 Epoch: 6 Global Step: 124440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:35,288-Speed 6329.32 samples/sec Loss 7.7613 LearningRate 0.0009 Epoch: 6 Global 
Step: 124450 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:38,527-Speed 6325.06 samples/sec Loss 7.7584 LearningRate 0.0009 Epoch: 6 Global Step: 124460 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:41,764-Speed 6327.52 samples/sec Loss 7.8432 LearningRate 0.0009 Epoch: 6 Global Step: 124470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:45,007-Speed 6318.32 samples/sec Loss 7.7304 LearningRate 0.0009 Epoch: 6 Global Step: 124480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:48,244-Speed 6327.01 samples/sec Loss 7.7044 LearningRate 0.0009 Epoch: 6 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:51,484-Speed 6323.81 samples/sec Loss 7.7549 LearningRate 0.0009 Epoch: 6 Global Step: 124500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:03:54,723-Speed 6324.18 samples/sec Loss 7.7690 LearningRate 0.0009 Epoch: 6 Global Step: 124510 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:03:57,966-Speed 6316.19 samples/sec Loss 7.7258 LearningRate 0.0009 Epoch: 6 Global Step: 124520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:01,203-Speed 6328.09 samples/sec Loss 7.7209 LearningRate 0.0009 Epoch: 6 Global Step: 124530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:04,442-Speed 6324.42 samples/sec Loss 7.6982 LearningRate 0.0009 Epoch: 6 Global Step: 124540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:07,688-Speed 6309.81 samples/sec Loss 7.7993 LearningRate 0.0009 Epoch: 6 Global Step: 124550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:10,929-Speed 6321.33 samples/sec Loss 7.7800 LearningRate 0.0009 Epoch: 6 Global Step: 124560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:14,177-Speed 6307.24 samples/sec Loss 7.6868 LearningRate 0.0009 Epoch: 6 Global Step: 124570 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:04:17,409-Speed 6337.86 samples/sec Loss 7.7823 LearningRate 0.0009 Epoch: 6 Global Step: 124580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:20,650-Speed 6319.60 samples/sec Loss 7.7234 LearningRate 0.0009 Epoch: 6 Global Step: 124590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:23,903-Speed 6298.07 samples/sec Loss 7.7149 LearningRate 0.0009 Epoch: 6 Global Step: 124600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:27,143-Speed 6322.52 samples/sec Loss 7.7830 LearningRate 0.0009 Epoch: 6 Global Step: 124610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:30,386-Speed 6316.42 samples/sec Loss 7.8006 LearningRate 0.0009 Epoch: 6 Global Step: 124620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:33,633-Speed 6307.80 samples/sec Loss 7.6761 LearningRate 0.0009 Epoch: 6 Global Step: 124630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:36,885-Speed 6299.91 samples/sec Loss 7.7096 LearningRate 0.0009 Epoch: 6 Global Step: 124640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:40,134-Speed 6304.00 samples/sec Loss 7.7933 LearningRate 0.0009 Epoch: 6 Global Step: 124650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:43,373-Speed 6325.38 samples/sec Loss 7.7138 LearningRate 0.0009 Epoch: 6 Global Step: 124660 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:46,625-Speed 6298.40 samples/sec Loss 7.7646 LearningRate 0.0009 Epoch: 6 Global Step: 124670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:04:49,882-Speed 6290.54 samples/sec Loss 7.7120 LearningRate 0.0009 Epoch: 6 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:53,139-Speed 6289.63 samples/sec Loss 7.7309 LearningRate 0.0009 Epoch: 6 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:04:56,384-Speed 6312.78 samples/sec Loss 7.7161 LearningRate 0.0009 Epoch: 6 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:04:59,625-Speed 6319.44 samples/sec Loss 7.7056 LearningRate 0.0009 Epoch: 6 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:02,875-Speed 6304.47 samples/sec Loss 7.6990 LearningRate 0.0009 Epoch: 6 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:06,117-Speed 6318.53 samples/sec Loss 7.7248 LearningRate 0.0009 Epoch: 6 Global Step: 124730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:09,362-Speed 6311.32 samples/sec Loss 7.8423 LearningRate 0.0009 Epoch: 6 Global Step: 124740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:12,605-Speed 6316.65 samples/sec Loss 7.7514 LearningRate 0.0009 Epoch: 6 Global Step: 124750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:15,852-Speed 6308.33 samples/sec Loss 7.6767 LearningRate 0.0009 Epoch: 6 Global Step: 124760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:19,102-Speed 6304.13 samples/sec Loss 7.7330 LearningRate 0.0009 Epoch: 6 Global Step: 124770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:22,331-Speed 6342.39 samples/sec Loss 7.6551 LearningRate 0.0009 Epoch: 6 Global Step: 124780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:25,577-Speed 6310.74 samples/sec Loss 7.7329 LearningRate 0.0009 Epoch: 6 Global Step: 124790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:28,822-Speed 6313.51 samples/sec Loss 7.7135 LearningRate 0.0009 Epoch: 6 Global Step: 124800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:32,079-Speed 6290.44 samples/sec Loss 7.7188 LearningRate 0.0009 Epoch: 6 Global Step: 124810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:35,325-Speed 6309.53 samples/sec Loss 
7.8078 LearningRate 0.0009 Epoch: 6 Global Step: 124820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:38,571-Speed 6311.14 samples/sec Loss 7.7650 LearningRate 0.0009 Epoch: 6 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:05:41,811-Speed 6321.43 samples/sec Loss 7.7181 LearningRate 0.0009 Epoch: 6 Global Step: 124840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:05:45,073-Speed 6280.68 samples/sec Loss 7.7827 LearningRate 0.0009 Epoch: 6 Global Step: 124850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:05:48,315-Speed 6317.81 samples/sec Loss 7.6844 LearningRate 0.0009 Epoch: 6 Global Step: 124860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:05:51,563-Speed 6307.66 samples/sec Loss 7.7796 LearningRate 0.0009 Epoch: 6 Global Step: 124870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:05:54,805-Speed 6318.78 samples/sec Loss 7.7248 LearningRate 0.0009 Epoch: 6 Global Step: 124880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:05:58,047-Speed 6318.09 samples/sec Loss 7.7278 LearningRate 0.0009 Epoch: 6 Global Step: 124890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:01,288-Speed 6320.20 samples/sec Loss 7.7090 LearningRate 0.0009 Epoch: 6 Global Step: 124900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:04,532-Speed 6314.97 samples/sec Loss 7.7890 LearningRate 0.0009 Epoch: 6 Global Step: 124910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:07,775-Speed 6315.97 samples/sec Loss 7.7733 LearningRate 0.0009 Epoch: 6 Global Step: 124920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:11,022-Speed 6308.83 samples/sec Loss 7.7523 LearningRate 0.0009 Epoch: 6 Global Step: 124930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:14,264-Speed 6318.99 samples/sec Loss 7.8022 LearningRate 0.0009 Epoch: 6 Global 
Step: 124940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:17,504-Speed 6323.37 samples/sec Loss 7.6986 LearningRate 0.0009 Epoch: 6 Global Step: 124950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:20,743-Speed 6324.88 samples/sec Loss 7.7156 LearningRate 0.0009 Epoch: 6 Global Step: 124960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:23,985-Speed 6316.69 samples/sec Loss 7.6740 LearningRate 0.0009 Epoch: 6 Global Step: 124970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:27,228-Speed 6317.06 samples/sec Loss 7.7351 LearningRate 0.0009 Epoch: 6 Global Step: 124980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:30,531-Speed 6201.92 samples/sec Loss 7.7602 LearningRate 0.0009 Epoch: 6 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:33,770-Speed 6324.99 samples/sec Loss 7.6841 LearningRate 0.0009 Epoch: 6 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:37,013-Speed 6316.04 samples/sec Loss 7.6769 LearningRate 0.0009 Epoch: 6 Global Step: 125010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:06:40,239-Speed 6349.11 samples/sec Loss 7.7171 LearningRate 0.0009 Epoch: 6 Global Step: 125020 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:43,482-Speed 6316.55 samples/sec Loss 7.7268 LearningRate 0.0009 Epoch: 6 Global Step: 125030 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:46,724-Speed 6318.39 samples/sec Loss 7.7512 LearningRate 0.0009 Epoch: 6 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:49,971-Speed 6310.30 samples/sec Loss 7.7806 LearningRate 0.0009 Epoch: 6 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:53,214-Speed 6316.20 samples/sec Loss 7.6933 LearningRate 0.0009 Epoch: 6 Global Step: 125060 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:06:56,465-Speed 6299.81 samples/sec Loss 7.7475 LearningRate 0.0009 Epoch: 6 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:06:59,711-Speed 6311.26 samples/sec Loss 7.6499 LearningRate 0.0009 Epoch: 6 Global Step: 125080 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:07:02,965-Speed 6295.22 samples/sec Loss 7.7192 LearningRate 0.0009 Epoch: 6 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:07:06,214-Speed 6304.03 samples/sec Loss 7.7982 LearningRate 0.0009 Epoch: 6 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:07:09,464-Speed 6304.15 samples/sec Loss 7.8028 LearningRate 0.0009 Epoch: 6 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:07:12,707-Speed 6318.42 samples/sec Loss 7.7823 LearningRate 0.0009 Epoch: 6 Global Step: 125120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:15,952-Speed 6310.95 samples/sec Loss 7.7405 LearningRate 0.0009 Epoch: 6 Global Step: 125130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:19,204-Speed 6300.88 samples/sec Loss 7.7509 LearningRate 0.0009 Epoch: 6 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:22,449-Speed 6310.98 samples/sec Loss 7.6768 LearningRate 0.0009 Epoch: 6 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:25,692-Speed 6317.76 samples/sec Loss 7.7139 LearningRate 0.0009 Epoch: 6 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:28,944-Speed 6297.85 samples/sec Loss 7.7617 LearningRate 0.0009 Epoch: 6 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:32,191-Speed 6309.22 samples/sec Loss 7.7233 LearningRate 0.0009 Epoch: 6 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:07:35,438-Speed 6309.22 samples/sec Loss 7.8143 LearningRate 0.0009 Epoch: 6 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:38,683-Speed 6311.70 samples/sec Loss 7.7650 LearningRate 0.0009 Epoch: 6 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:41,936-Speed 6297.25 samples/sec Loss 7.6757 LearningRate 0.0009 Epoch: 6 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:45,166-Speed 6343.30 samples/sec Loss 7.7103 LearningRate 0.0009 Epoch: 6 Global Step: 125220 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:48,414-Speed 6306.20 samples/sec Loss 7.7639 LearningRate 0.0009 Epoch: 6 Global Step: 125230 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:51,657-Speed 6315.45 samples/sec Loss 7.7266 LearningRate 0.0009 Epoch: 6 Global Step: 125240 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:07:54,888-Speed 6340.35 samples/sec Loss 7.6111 LearningRate 0.0009 Epoch: 6 Global Step: 125250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:07:58,130-Speed 6319.03 samples/sec Loss 7.7942 LearningRate 0.0009 Epoch: 6 Global Step: 125260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:01,373-Speed 6315.58 samples/sec Loss 7.7020 LearningRate 0.0009 Epoch: 6 Global Step: 125270 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:04,620-Speed 6309.88 samples/sec Loss 7.7351 LearningRate 0.0009 Epoch: 6 Global Step: 125280 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:07,866-Speed 6311.21 samples/sec Loss 7.7918 LearningRate 0.0009 Epoch: 6 Global Step: 125290 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:11,111-Speed 6312.24 samples/sec Loss 7.8185 LearningRate 0.0009 Epoch: 6 Global Step: 125300 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:14,357-Speed 6309.29 samples/sec Loss 
7.8167 LearningRate 0.0009 Epoch: 6 Global Step: 125310 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:17,602-Speed 6313.07 samples/sec Loss 7.6777 LearningRate 0.0009 Epoch: 6 Global Step: 125320 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:20,846-Speed 6316.72 samples/sec Loss 7.6426 LearningRate 0.0009 Epoch: 6 Global Step: 125330 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:24,098-Speed 6298.34 samples/sec Loss 7.6719 LearningRate 0.0009 Epoch: 6 Global Step: 125340 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:27,345-Speed 6309.16 samples/sec Loss 7.6470 LearningRate 0.0009 Epoch: 6 Global Step: 125350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:08:30,590-Speed 6313.26 samples/sec Loss 7.7438 LearningRate 0.0009 Epoch: 6 Global Step: 125360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:08:33,836-Speed 6310.04 samples/sec Loss 7.7158 LearningRate 0.0009 Epoch: 6 Global Step: 125370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:08:37,082-Speed 6310.94 samples/sec Loss 7.6783 LearningRate 0.0009 Epoch: 6 Global Step: 125380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:08:40,330-Speed 6306.52 samples/sec Loss 7.6601 LearningRate 0.0009 Epoch: 6 Global Step: 125390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:08:43,561-Speed 6339.13 samples/sec Loss 7.8243 LearningRate 0.0009 Epoch: 6 Global Step: 125400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:46,803-Speed 6318.52 samples/sec Loss 7.7807 LearningRate 0.0009 Epoch: 6 Global Step: 125410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:50,046-Speed 6318.31 samples/sec Loss 7.7261 LearningRate 0.0009 Epoch: 6 Global Step: 125420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:53,286-Speed 6321.19 samples/sec Loss 7.7916 LearningRate 0.0009 Epoch: 6 Global 
Step: 125430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:56,559-Speed 6258.99 samples/sec Loss 7.7262 LearningRate 0.0009 Epoch: 6 Global Step: 125440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:08:59,805-Speed 6310.60 samples/sec Loss 7.7323 LearningRate 0.0009 Epoch: 6 Global Step: 125450 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:03,051-Speed 6310.86 samples/sec Loss 7.7587 LearningRate 0.0009 Epoch: 6 Global Step: 125460 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:06,297-Speed 6310.98 samples/sec Loss 7.6293 LearningRate 0.0009 Epoch: 6 Global Step: 125470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:09,543-Speed 6310.52 samples/sec Loss 7.7399 LearningRate 0.0009 Epoch: 6 Global Step: 125480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:12,788-Speed 6313.01 samples/sec Loss 7.6962 LearningRate 0.0009 Epoch: 6 Global Step: 125490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:16,031-Speed 6316.35 samples/sec Loss 7.7464 LearningRate 0.0009 Epoch: 6 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:09:19,275-Speed 6313.10 samples/sec Loss 7.6324 LearningRate 0.0009 Epoch: 6 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:09:22,521-Speed 6311.21 samples/sec Loss 7.6973 LearningRate 0.0009 Epoch: 6 Global Step: 125520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:09:25,761-Speed 6322.99 samples/sec Loss 7.6760 LearningRate 0.0009 Epoch: 6 Global Step: 125530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:09:29,006-Speed 6312.71 samples/sec Loss 7.7224 LearningRate 0.0009 Epoch: 6 Global Step: 125540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:09:32,246-Speed 6321.83 samples/sec Loss 7.7223 LearningRate 0.0009 Epoch: 6 Global Step: 125550 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:09:35,493-Speed 6310.47 samples/sec Loss 7.7704 LearningRate 0.0009 Epoch: 6 Global Step: 125560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:38,734-Speed 6319.47 samples/sec Loss 7.7640 LearningRate 0.0009 Epoch: 6 Global Step: 125570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:41,978-Speed 6315.61 samples/sec Loss 7.6596 LearningRate 0.0009 Epoch: 6 Global Step: 125580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:45,223-Speed 6312.12 samples/sec Loss 7.7568 LearningRate 0.0009 Epoch: 6 Global Step: 125590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:48,466-Speed 6316.25 samples/sec Loss 7.6433 LearningRate 0.0009 Epoch: 6 Global Step: 125600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:51,712-Speed 6311.11 samples/sec Loss 7.6898 LearningRate 0.0009 Epoch: 6 Global Step: 125610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:54,958-Speed 6311.38 samples/sec Loss 7.7066 LearningRate 0.0009 Epoch: 6 Global Step: 125620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:09:58,205-Speed 6308.66 samples/sec Loss 7.7384 LearningRate 0.0009 Epoch: 6 Global Step: 125630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:01,447-Speed 6318.27 samples/sec Loss 7.7380 LearningRate 0.0009 Epoch: 6 Global Step: 125640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:04,703-Speed 6290.49 samples/sec Loss 7.7462 LearningRate 0.0009 Epoch: 6 Global Step: 125650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:07,954-Speed 6301.39 samples/sec Loss 7.7459 LearningRate 0.0009 Epoch: 6 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:11,199-Speed 6312.92 samples/sec Loss 7.7859 LearningRate 0.0009 Epoch: 6 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:10:14,442-Speed 6314.86 samples/sec Loss 7.7970 LearningRate 0.0009 Epoch: 6 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:17,700-Speed 6288.34 samples/sec Loss 7.7610 LearningRate 0.0009 Epoch: 6 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:20,934-Speed 6334.90 samples/sec Loss 7.7642 LearningRate 0.0009 Epoch: 6 Global Step: 125700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:24,181-Speed 6308.26 samples/sec Loss 7.7175 LearningRate 0.0009 Epoch: 6 Global Step: 125710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:27,424-Speed 6315.83 samples/sec Loss 7.6959 LearningRate 0.0009 Epoch: 6 Global Step: 125720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:30,680-Speed 6291.45 samples/sec Loss 7.7737 LearningRate 0.0009 Epoch: 6 Global Step: 125730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:33,926-Speed 6311.92 samples/sec Loss 7.6340 LearningRate 0.0009 Epoch: 6 Global Step: 125740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:37,173-Speed 6310.06 samples/sec Loss 7.8154 LearningRate 0.0009 Epoch: 6 Global Step: 125750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:40,423-Speed 6304.24 samples/sec Loss 7.7377 LearningRate 0.0009 Epoch: 6 Global Step: 125760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:43,669-Speed 6311.15 samples/sec Loss 7.7586 LearningRate 0.0009 Epoch: 6 Global Step: 125770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:46,912-Speed 6316.44 samples/sec Loss 7.7225 LearningRate 0.0009 Epoch: 6 Global Step: 125780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:50,154-Speed 6319.70 samples/sec Loss 7.6907 LearningRate 0.0009 Epoch: 6 Global Step: 125790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:10:53,407-Speed 6296.71 samples/sec Loss 
7.7464 LearningRate 0.0009 Epoch: 6 Global Step: 125800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:56,655-Speed 6306.32 samples/sec Loss 7.6516 LearningRate 0.0009 Epoch: 6 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:10:59,899-Speed 6315.07 samples/sec Loss 7.7653 LearningRate 0.0009 Epoch: 6 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:03,144-Speed 6311.54 samples/sec Loss 7.6919 LearningRate 0.0009 Epoch: 6 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:06,387-Speed 6316.90 samples/sec Loss 7.7169 LearningRate 0.0009 Epoch: 6 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:09,634-Speed 6309.09 samples/sec Loss 7.6552 LearningRate 0.0009 Epoch: 6 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:12,878-Speed 6313.80 samples/sec Loss 7.7512 LearningRate 0.0009 Epoch: 6 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:16,124-Speed 6311.94 samples/sec Loss 7.7727 LearningRate 0.0009 Epoch: 6 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:19,369-Speed 6312.21 samples/sec Loss 7.7170 LearningRate 0.0009 Epoch: 6 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:22,616-Speed 6308.76 samples/sec Loss 7.7820 LearningRate 0.0009 Epoch: 6 Global Step: 125890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:25,850-Speed 6334.70 samples/sec Loss 7.7771 LearningRate 0.0009 Epoch: 6 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:11:29,080-Speed 6341.02 samples/sec Loss 7.6768 LearningRate 0.0009 Epoch: 6 Global Step: 125910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:32,333-Speed 6296.89 samples/sec Loss 7.7279 LearningRate 0.0009 Epoch: 6 Global 
Step: 125920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:35,580-Speed 6308.93 samples/sec Loss 7.6958 LearningRate 0.0009 Epoch: 6 Global Step: 125930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:38,836-Speed 6291.76 samples/sec Loss 7.7304 LearningRate 0.0009 Epoch: 6 Global Step: 125940 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:42,084-Speed 6306.42 samples/sec Loss 7.7767 LearningRate 0.0009 Epoch: 6 Global Step: 125950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:45,332-Speed 6307.77 samples/sec Loss 7.7272 LearningRate 0.0009 Epoch: 6 Global Step: 125960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:48,577-Speed 6312.22 samples/sec Loss 7.6403 LearningRate 0.0009 Epoch: 6 Global Step: 125970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:51,819-Speed 6318.18 samples/sec Loss 7.6321 LearningRate 0.0009 Epoch: 6 Global Step: 125980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:55,067-Speed 6308.42 samples/sec Loss 7.6805 LearningRate 0.0009 Epoch: 6 Global Step: 125990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:11:58,310-Speed 6317.00 samples/sec Loss 7.7043 LearningRate 0.0009 Epoch: 6 Global Step: 126000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:12:01,556-Speed 6309.22 samples/sec Loss 7.7544 LearningRate 0.0009 Epoch: 6 Global Step: 126010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:04,814-Speed 6287.83 samples/sec Loss 7.6975 LearningRate 0.0009 Epoch: 6 Global Step: 126020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:08,065-Speed 6301.11 samples/sec Loss 7.6542 LearningRate 0.0009 Epoch: 6 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:11,317-Speed 6299.85 samples/sec Loss 7.7140 LearningRate 0.0009 Epoch: 6 Global Step: 126040 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:12:14,575-Speed 6287.26 samples/sec Loss 7.6937 LearningRate 0.0009 Epoch: 6 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:17,820-Speed 6312.19 samples/sec Loss 7.7519 LearningRate 0.0009 Epoch: 6 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:21,068-Speed 6306.76 samples/sec Loss 7.7524 LearningRate 0.0009 Epoch: 6 Global Step: 126070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:24,313-Speed 6312.55 samples/sec Loss 7.6886 LearningRate 0.0009 Epoch: 6 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:27,557-Speed 6314.86 samples/sec Loss 7.7325 LearningRate 0.0009 Epoch: 6 Global Step: 126090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:30,808-Speed 6300.85 samples/sec Loss 7.7156 LearningRate 0.0009 Epoch: 6 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:34,044-Speed 6330.28 samples/sec Loss 7.7403 LearningRate 0.0009 Epoch: 6 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:37,291-Speed 6309.61 samples/sec Loss 7.6306 LearningRate 0.0009 Epoch: 6 Global Step: 126120 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:40,539-Speed 6306.78 samples/sec Loss 7.7564 LearningRate 0.0009 Epoch: 6 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:43,790-Speed 6301.25 samples/sec Loss 7.7659 LearningRate 0.0009 Epoch: 6 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:47,047-Speed 6289.00 samples/sec Loss 7.7090 LearningRate 0.0009 Epoch: 6 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:50,293-Speed 6310.83 samples/sec Loss 7.6492 LearningRate 0.0009 Epoch: 6 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:12:53,539-Speed 6310.53 samples/sec Loss 7.6497 LearningRate 0.0009 Epoch: 6 Global Step: 126170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:12:56,783-Speed 6314.11 samples/sec Loss 7.7431 LearningRate 0.0009 Epoch: 6 Global Step: 126180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:00,032-Speed 6304.77 samples/sec Loss 7.6062 LearningRate 0.0009 Epoch: 6 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:03,281-Speed 6306.21 samples/sec Loss 7.6677 LearningRate 0.0009 Epoch: 6 Global Step: 126200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:06,515-Speed 6333.59 samples/sec Loss 7.6949 LearningRate 0.0009 Epoch: 6 Global Step: 126210 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:09,760-Speed 6313.02 samples/sec Loss 7.7710 LearningRate 0.0009 Epoch: 6 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:13,002-Speed 6318.82 samples/sec Loss 7.7977 LearningRate 0.0009 Epoch: 6 Global Step: 126230 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:16,247-Speed 6312.32 samples/sec Loss 7.6650 LearningRate 0.0009 Epoch: 6 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:19,493-Speed 6311.40 samples/sec Loss 7.7984 LearningRate 0.0009 Epoch: 6 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:22,751-Speed 6285.86 samples/sec Loss 7.7133 LearningRate 0.0009 Epoch: 6 Global Step: 126260 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:26,007-Speed 6292.69 samples/sec Loss 7.6885 LearningRate 0.0009 Epoch: 6 Global Step: 126270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:29,251-Speed 6313.80 samples/sec Loss 7.6784 LearningRate 0.0009 Epoch: 6 Global Step: 126280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:32,498-Speed 6309.11 samples/sec Loss 
7.7435 LearningRate 0.0009 Epoch: 6 Global Step: 126290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:35,744-Speed 6310.62 samples/sec Loss 7.7457 LearningRate 0.0009 Epoch: 6 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:38,976-Speed 6338.96 samples/sec Loss 7.7309 LearningRate 0.0009 Epoch: 6 Global Step: 126310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:42,223-Speed 6308.39 samples/sec Loss 7.6636 LearningRate 0.0009 Epoch: 6 Global Step: 126320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:45,479-Speed 6289.80 samples/sec Loss 7.6658 LearningRate 0.0009 Epoch: 6 Global Step: 126330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:48,740-Speed 6282.07 samples/sec Loss 7.7027 LearningRate 0.0009 Epoch: 6 Global Step: 126340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:51,983-Speed 6316.46 samples/sec Loss 7.6776 LearningRate 0.0009 Epoch: 6 Global Step: 126350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:55,227-Speed 6316.30 samples/sec Loss 7.6892 LearningRate 0.0009 Epoch: 6 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:13:58,475-Speed 6306.40 samples/sec Loss 7.6937 LearningRate 0.0009 Epoch: 6 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:01,722-Speed 6307.97 samples/sec Loss 7.7218 LearningRate 0.0009 Epoch: 6 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:04,974-Speed 6299.16 samples/sec Loss 7.7228 LearningRate 0.0009 Epoch: 6 Global Step: 126390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:08,223-Speed 6306.23 samples/sec Loss 7.6438 LearningRate 0.0009 Epoch: 6 Global Step: 126400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:11,453-Speed 6341.39 samples/sec Loss 7.7639 LearningRate 0.0009 Epoch: 6 Global 
Step: 126410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:14,683-Speed 6342.90 samples/sec Loss 7.6873 LearningRate 0.0009 Epoch: 6 Global Step: 126420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:17,931-Speed 6306.38 samples/sec Loss 7.7296 LearningRate 0.0009 Epoch: 6 Global Step: 126430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:21,177-Speed 6310.94 samples/sec Loss 7.6794 LearningRate 0.0009 Epoch: 6 Global Step: 126440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:24,424-Speed 6309.36 samples/sec Loss 7.7089 LearningRate 0.0009 Epoch: 6 Global Step: 126450 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:27,673-Speed 6304.77 samples/sec Loss 7.5939 LearningRate 0.0009 Epoch: 6 Global Step: 126460 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:30,916-Speed 6316.57 samples/sec Loss 7.6825 LearningRate 0.0009 Epoch: 6 Global Step: 126470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:34,162-Speed 6310.63 samples/sec Loss 7.6887 LearningRate 0.0009 Epoch: 6 Global Step: 126480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:37,405-Speed 6317.01 samples/sec Loss 7.7575 LearningRate 0.0009 Epoch: 6 Global Step: 126490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:40,656-Speed 6300.62 samples/sec Loss 7.6907 LearningRate 0.0009 Epoch: 6 Global Step: 126500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:43,903-Speed 6308.14 samples/sec Loss 7.7943 LearningRate 0.0009 Epoch: 6 Global Step: 126510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:14:47,147-Speed 6315.10 samples/sec Loss 7.6841 LearningRate 0.0009 Epoch: 6 Global Step: 126520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:50,394-Speed 6308.19 samples/sec Loss 7.8048 LearningRate 0.0009 Epoch: 6 Global Step: 126530 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:14:53,637-Speed 6316.84 samples/sec Loss 7.7800 LearningRate 0.0009 Epoch: 6 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:14:56,882-Speed 6312.87 samples/sec Loss 7.7027 LearningRate 0.0009 Epoch: 6 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:00,131-Speed 6304.13 samples/sec Loss 7.6959 LearningRate 0.0009 Epoch: 6 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:03,379-Speed 6306.92 samples/sec Loss 7.6920 LearningRate 0.0009 Epoch: 6 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:06,626-Speed 6309.42 samples/sec Loss 7.6725 LearningRate 0.0009 Epoch: 6 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:09,869-Speed 6315.33 samples/sec Loss 7.7422 LearningRate 0.0009 Epoch: 6 Global Step: 126590 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:13,123-Speed 6295.69 samples/sec Loss 7.5972 LearningRate 0.0009 Epoch: 6 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:16,370-Speed 6308.70 samples/sec Loss 7.6350 LearningRate 0.0009 Epoch: 6 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:19,604-Speed 6335.45 samples/sec Loss 7.6526 LearningRate 0.0009 Epoch: 6 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:22,847-Speed 6316.77 samples/sec Loss 7.6850 LearningRate 0.0009 Epoch: 6 Global Step: 126630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:26,093-Speed 6309.89 samples/sec Loss 7.6183 LearningRate 0.0009 Epoch: 6 Global Step: 126640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:29,338-Speed 6312.38 samples/sec Loss 7.7698 LearningRate 0.0009 Epoch: 6 Global Step: 126650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:15:32,583-Speed 6312.93 samples/sec Loss 7.7107 LearningRate 0.0009 Epoch: 6 Global Step: 126660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:35,831-Speed 6308.08 samples/sec Loss 7.7755 LearningRate 0.0009 Epoch: 6 Global Step: 126670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:39,076-Speed 6311.39 samples/sec Loss 7.6793 LearningRate 0.0009 Epoch: 6 Global Step: 126680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:42,328-Speed 6298.90 samples/sec Loss 7.7154 LearningRate 0.0009 Epoch: 6 Global Step: 126690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:45,574-Speed 6311.75 samples/sec Loss 7.7757 LearningRate 0.0009 Epoch: 6 Global Step: 126700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:48,821-Speed 6308.45 samples/sec Loss 7.6773 LearningRate 0.0009 Epoch: 6 Global Step: 126710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:52,066-Speed 6312.45 samples/sec Loss 7.6868 LearningRate 0.0009 Epoch: 6 Global Step: 126720 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:15:55,296-Speed 6342.47 samples/sec Loss 7.7455 LearningRate 0.0009 Epoch: 6 Global Step: 126730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:15:58,529-Speed 6335.72 samples/sec Loss 7.6706 LearningRate 0.0009 Epoch: 6 Global Step: 126740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:01,777-Speed 6306.12 samples/sec Loss 7.7469 LearningRate 0.0009 Epoch: 6 Global Step: 126750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:05,021-Speed 6314.64 samples/sec Loss 7.7149 LearningRate 0.0009 Epoch: 6 Global Step: 126760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:08,266-Speed 6312.63 samples/sec Loss 7.6524 LearningRate 0.0009 Epoch: 6 Global Step: 126770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:11,511-Speed 6312.52 samples/sec 
Loss 7.7216 LearningRate 0.0009 Epoch: 6 Global Step: 126780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:14,760-Speed 6305.60 samples/sec Loss 7.6819 LearningRate 0.0009 Epoch: 6 Global Step: 126790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:18,002-Speed 6318.76 samples/sec Loss 7.5686 LearningRate 0.0009 Epoch: 6 Global Step: 126800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:21,245-Speed 6315.29 samples/sec Loss 7.6324 LearningRate 0.0009 Epoch: 6 Global Step: 126810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:24,493-Speed 6308.15 samples/sec Loss 7.6652 LearningRate 0.0009 Epoch: 6 Global Step: 126820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:27,738-Speed 6313.44 samples/sec Loss 7.6992 LearningRate 0.0009 Epoch: 6 Global Step: 126830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:16:30,986-Speed 6307.14 samples/sec Loss 7.7291 LearningRate 0.0009 Epoch: 6 Global Step: 126840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:34,230-Speed 6314.49 samples/sec Loss 7.6083 LearningRate 0.0009 Epoch: 6 Global Step: 126850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:37,477-Speed 6308.66 samples/sec Loss 7.5832 LearningRate 0.0009 Epoch: 6 Global Step: 126860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:40,827-Speed 6114.31 samples/sec Loss 7.6717 LearningRate 0.0009 Epoch: 6 Global Step: 126870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:44,106-Speed 6246.94 samples/sec Loss 7.7398 LearningRate 0.0009 Epoch: 6 Global Step: 126880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:47,356-Speed 6302.64 samples/sec Loss 7.7333 LearningRate 0.0009 Epoch: 6 Global Step: 126890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:50,602-Speed 6311.76 samples/sec Loss 7.7637 LearningRate 0.0009 Epoch: 6 
Global Step: 126900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:53,851-Speed 6305.28 samples/sec Loss 7.6883 LearningRate 0.0009 Epoch: 6 Global Step: 126910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:16:57,104-Speed 6296.44 samples/sec Loss 7.7041 LearningRate 0.0009 Epoch: 6 Global Step: 126920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:00,352-Speed 6307.41 samples/sec Loss 7.6670 LearningRate 0.0009 Epoch: 6 Global Step: 126930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:03,591-Speed 6323.57 samples/sec Loss 7.6996 LearningRate 0.0009 Epoch: 6 Global Step: 126940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:06,842-Speed 6301.12 samples/sec Loss 7.6529 LearningRate 0.0009 Epoch: 6 Global Step: 126950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:10,091-Speed 6305.31 samples/sec Loss 7.7410 LearningRate 0.0009 Epoch: 6 Global Step: 126960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:13,337-Speed 6310.44 samples/sec Loss 7.6698 LearningRate 0.0009 Epoch: 6 Global Step: 126970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:16,588-Speed 6300.91 samples/sec Loss 7.6970 LearningRate 0.0009 Epoch: 6 Global Step: 126980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:19,837-Speed 6303.33 samples/sec Loss 7.7054 LearningRate 0.0009 Epoch: 6 Global Step: 126990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:23,085-Speed 6307.59 samples/sec Loss 7.6765 LearningRate 0.0009 Epoch: 6 Global Step: 127000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:26,338-Speed 6297.13 samples/sec Loss 7.6505 LearningRate 0.0009 Epoch: 6 Global Step: 127010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:29,587-Speed 6305.12 samples/sec Loss 7.6926 LearningRate 0.0009 Epoch: 6 Global Step: 127020 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:17:32,840-Speed 6297.45 samples/sec Loss 7.6083 LearningRate 0.0009 Epoch: 6 Global Step: 127030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:36,085-Speed 6312.93 samples/sec Loss 7.6313 LearningRate 0.0009 Epoch: 6 Global Step: 127040 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:17:39,326-Speed 6320.31 samples/sec Loss 7.7248 LearningRate 0.0009 Epoch: 6 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:42,574-Speed 6307.20 samples/sec Loss 7.6013 LearningRate 0.0009 Epoch: 6 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:45,822-Speed 6307.40 samples/sec Loss 7.6883 LearningRate 0.0009 Epoch: 6 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:17:49,062-Speed 6322.08 samples/sec Loss 7.6712 LearningRate 0.0009 Epoch: 6 Global Step: 127080 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:17:52,314-Speed 6297.76 samples/sec Loss 7.6907 LearningRate 0.0009 Epoch: 6 Global Step: 127090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:17:55,564-Speed 6304.40 samples/sec Loss 7.6453 LearningRate 0.0009 Epoch: 6 Global Step: 127100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:17:58,811-Speed 6307.27 samples/sec Loss 7.7812 LearningRate 0.0009 Epoch: 6 Global Step: 127110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:02,059-Speed 6307.95 samples/sec Loss 7.6139 LearningRate 0.0009 Epoch: 6 Global Step: 127120 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:05,307-Speed 6306.52 samples/sec Loss 7.6913 LearningRate 0.0009 Epoch: 6 Global Step: 127130 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:08,552-Speed 6313.58 samples/sec Loss 7.6227 LearningRate 0.0009 Epoch: 6 Global Step: 127140 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
03:18:11,796-Speed 6313.41 samples/sec Loss 7.6259 LearningRate 0.0009 Epoch: 6 Global Step: 127150 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:15,042-Speed 6310.74 samples/sec Loss 7.5439 LearningRate 0.0009 Epoch: 6 Global Step: 127160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:18,290-Speed 6306.29 samples/sec Loss 7.7288 LearningRate 0.0009 Epoch: 6 Global Step: 127170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:18:21,537-Speed 6308.65 samples/sec Loss 7.6465 LearningRate 0.0009 Epoch: 6 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:24,785-Speed 6308.40 samples/sec Loss 7.6806 LearningRate 0.0009 Epoch: 6 Global Step: 127190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:28,036-Speed 6300.32 samples/sec Loss 7.7305 LearningRate 0.0009 Epoch: 6 Global Step: 127200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:31,279-Speed 6317.03 samples/sec Loss 7.6667 LearningRate 0.0009 Epoch: 6 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:34,530-Speed 6301.35 samples/sec Loss 7.6097 LearningRate 0.0009 Epoch: 6 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:37,786-Speed 6290.28 samples/sec Loss 7.6784 LearningRate 0.0009 Epoch: 6 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:41,034-Speed 6307.19 samples/sec Loss 7.7269 LearningRate 0.0009 Epoch: 6 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:44,287-Speed 6297.55 samples/sec Loss 7.7054 LearningRate 0.0009 Epoch: 6 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:47,529-Speed 6318.71 samples/sec Loss 7.8035 LearningRate 0.0009 Epoch: 6 Global Step: 127260 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:50,774-Speed 6313.15 samples/sec Loss 
7.7319 LearningRate 0.0009 Epoch: 6 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:54,005-Speed 6339.43 samples/sec Loss 7.7113 LearningRate 0.0009 Epoch: 6 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:18:57,250-Speed 6313.37 samples/sec Loss 7.6926 LearningRate 0.0009 Epoch: 6 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:00,495-Speed 6312.87 samples/sec Loss 7.7743 LearningRate 0.0009 Epoch: 6 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:03,743-Speed 6306.47 samples/sec Loss 7.7457 LearningRate 0.0009 Epoch: 6 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:06,990-Speed 6308.69 samples/sec Loss 7.5992 LearningRate 0.0009 Epoch: 6 Global Step: 127320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:10,235-Speed 6311.75 samples/sec Loss 7.7510 LearningRate 0.0009 Epoch: 6 Global Step: 127330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:13,482-Speed 6308.71 samples/sec Loss 7.6835 LearningRate 0.0009 Epoch: 6 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:16,745-Speed 6277.47 samples/sec Loss 7.6722 LearningRate 0.0009 Epoch: 6 Global Step: 127350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:19,991-Speed 6312.14 samples/sec Loss 7.5921 LearningRate 0.0009 Epoch: 6 Global Step: 127360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:23,235-Speed 6312.91 samples/sec Loss 7.6313 LearningRate 0.0009 Epoch: 6 Global Step: 127370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:26,466-Speed 6341.91 samples/sec Loss 7.6771 LearningRate 0.0009 Epoch: 6 Global Step: 127380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:29,710-Speed 6314.47 samples/sec Loss 7.6656 LearningRate 0.0009 Epoch: 6 Global 
Step: 127390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:32,960-Speed 6303.27 samples/sec Loss 7.7407 LearningRate 0.0009 Epoch: 6 Global Step: 127400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:36,224-Speed 6275.39 samples/sec Loss 7.6798 LearningRate 0.0009 Epoch: 6 Global Step: 127410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:39,471-Speed 6307.41 samples/sec Loss 7.6867 LearningRate 0.0009 Epoch: 6 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:42,716-Speed 6313.49 samples/sec Loss 7.6454 LearningRate 0.0009 Epoch: 6 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:45,968-Speed 6298.61 samples/sec Loss 7.6360 LearningRate 0.0009 Epoch: 6 Global Step: 127440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:49,216-Speed 6307.98 samples/sec Loss 7.6803 LearningRate 0.0009 Epoch: 6 Global Step: 127450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:52,461-Speed 6313.28 samples/sec Loss 7.7248 LearningRate 0.0009 Epoch: 6 Global Step: 127460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:55,713-Speed 6299.49 samples/sec Loss 7.6755 LearningRate 0.0009 Epoch: 6 Global Step: 127470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:19:58,958-Speed 6311.12 samples/sec Loss 7.7015 LearningRate 0.0009 Epoch: 6 Global Step: 127480 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:20:02,193-Speed 6332.84 samples/sec Loss 7.7818 LearningRate 0.0009 Epoch: 6 Global Step: 127490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:05,443-Speed 6304.11 samples/sec Loss 7.7245 LearningRate 0.0009 Epoch: 6 Global Step: 127500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:08,689-Speed 6309.27 samples/sec Loss 7.6793 LearningRate 0.0009 Epoch: 6 Global Step: 127510 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:20:11,937-Speed 6307.97 samples/sec Loss 7.6719 LearningRate 0.0009 Epoch: 6 Global Step: 127520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:15,189-Speed 6298.01 samples/sec Loss 7.6414 LearningRate 0.0009 Epoch: 6 Global Step: 127530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:18,438-Speed 6305.64 samples/sec Loss 7.6900 LearningRate 0.0009 Epoch: 6 Global Step: 127540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:21,688-Speed 6302.44 samples/sec Loss 7.6958 LearningRate 0.0009 Epoch: 6 Global Step: 127550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:24,935-Speed 6308.72 samples/sec Loss 7.7507 LearningRate 0.0009 Epoch: 6 Global Step: 127560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:28,188-Speed 6298.20 samples/sec Loss 7.7301 LearningRate 0.0009 Epoch: 6 Global Step: 127570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:31,433-Speed 6311.54 samples/sec Loss 7.5964 LearningRate 0.0009 Epoch: 6 Global Step: 127580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:34,669-Speed 6330.92 samples/sec Loss 7.6686 LearningRate 0.0009 Epoch: 6 Global Step: 127590 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:37,918-Speed 6304.27 samples/sec Loss 7.6832 LearningRate 0.0009 Epoch: 6 Global Step: 127600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:41,165-Speed 6308.46 samples/sec Loss 7.7415 LearningRate 0.0009 Epoch: 6 Global Step: 127610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:20:44,402-Speed 6329.45 samples/sec Loss 7.5940 LearningRate 0.0009 Epoch: 6 Global Step: 127620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:20:47,648-Speed 6309.66 samples/sec Loss 7.6264 LearningRate 0.0009 Epoch: 6 Global Step: 127630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
03:20:50,893-Speed 6312.09 samples/sec Loss 7.7476 LearningRate 0.0009 Epoch: 6 Global Step: 127640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:20:54,143-Speed 6303.77 samples/sec Loss 7.7155 LearningRate 0.0009 Epoch: 6 Global Step: 127650 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:20:57,391-Speed 6306.92 samples/sec Loss 7.6317 LearningRate 0.0009 Epoch: 6 Global Step: 127660 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:00,640-Speed 6306.00 samples/sec Loss 7.6348 LearningRate 0.0009 Epoch: 6 Global Step: 127670 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:03,884-Speed 6314.03 samples/sec Loss 7.6402 LearningRate 0.0009 Epoch: 6 Global Step: 127680 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:07,130-Speed 6310.25 samples/sec Loss 7.6973 LearningRate 0.0009 Epoch: 6 Global Step: 127690 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:10,380-Speed 6304.14 samples/sec Loss 7.6863 LearningRate 0.0009 Epoch: 6 Global Step: 127700 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:13,626-Speed 6310.51 samples/sec Loss 7.6934 LearningRate 0.0009 Epoch: 6 Global Step: 127710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:21:16,875-Speed 6304.04 samples/sec Loss 7.5736 LearningRate 0.0009 Epoch: 6 Global Step: 127720 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:20,127-Speed 6299.80 samples/sec Loss 7.7009 LearningRate 0.0009 Epoch: 6 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:23,377-Speed 6302.13 samples/sec Loss 7.6034 LearningRate 0.0009 Epoch: 6 Global Step: 127740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:26,626-Speed 6304.50 samples/sec Loss 7.6082 LearningRate 0.0009 Epoch: 6 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:29,873-Speed 6309.81 samples/sec Loss 
7.7102 LearningRate 0.0009 Epoch: 6 Global Step: 127760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:33,118-Speed 6313.08 samples/sec Loss 7.6619 LearningRate 0.0009 Epoch: 6 Global Step: 127770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:36,378-Speed 6283.94 samples/sec Loss 7.7385 LearningRate 0.0009 Epoch: 6 Global Step: 127780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:39,623-Speed 6312.45 samples/sec Loss 7.5874 LearningRate 0.0009 Epoch: 6 Global Step: 127790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:42,868-Speed 6311.73 samples/sec Loss 7.7675 LearningRate 0.0009 Epoch: 6 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:46,113-Speed 6313.88 samples/sec Loss 7.6628 LearningRate 0.0009 Epoch: 6 Global Step: 127810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:49,347-Speed 6332.86 samples/sec Loss 7.6807 LearningRate 0.0009 Epoch: 6 Global Step: 127820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:52,597-Speed 6302.51 samples/sec Loss 7.6400 LearningRate 0.0009 Epoch: 6 Global Step: 127830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:55,841-Speed 6315.27 samples/sec Loss 7.6952 LearningRate 0.0009 Epoch: 6 Global Step: 127840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:21:59,085-Speed 6314.13 samples/sec Loss 7.5944 LearningRate 0.0009 Epoch: 6 Global Step: 127850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:02,334-Speed 6305.01 samples/sec Loss 7.6667 LearningRate 0.0009 Epoch: 6 Global Step: 127860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:05,582-Speed 6308.44 samples/sec Loss 7.4994 LearningRate 0.0009 Epoch: 6 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:08,833-Speed 6300.76 samples/sec Loss 7.5744 LearningRate 0.0009 Epoch: 6 Global 
Step: 127880 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:12,079-Speed 6310.82 samples/sec Loss 7.7160 LearningRate 0.0009 Epoch: 6 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:15,328-Speed 6305.39 samples/sec Loss 7.6881 LearningRate 0.0009 Epoch: 6 Global Step: 127900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:18,576-Speed 6306.91 samples/sec Loss 7.7453 LearningRate 0.0009 Epoch: 6 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:21,816-Speed 6323.58 samples/sec Loss 7.5959 LearningRate 0.0009 Epoch: 6 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:25,061-Speed 6312.36 samples/sec Loss 7.6173 LearningRate 0.0009 Epoch: 6 Global Step: 127930 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:28,307-Speed 6309.72 samples/sec Loss 7.6565 LearningRate 0.0009 Epoch: 6 Global Step: 127940 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:31,557-Speed 6304.50 samples/sec Loss 7.6137 LearningRate 0.0009 Epoch: 6 Global Step: 127950 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:34,800-Speed 6315.36 samples/sec Loss 7.6365 LearningRate 0.0009 Epoch: 6 Global Step: 127960 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:38,048-Speed 6306.40 samples/sec Loss 7.6461 LearningRate 0.0009 Epoch: 6 Global Step: 127970 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:41,296-Speed 6307.80 samples/sec Loss 7.5682 LearningRate 0.0009 Epoch: 6 Global Step: 127980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:44,542-Speed 6311.02 samples/sec Loss 7.5911 LearningRate 0.0009 Epoch: 6 Global Step: 127990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:47,786-Speed 6314.14 samples/sec Loss 7.5899 LearningRate 0.0009 Epoch: 6 Global Step: 128000 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:22:51,034-Speed 6305.87 samples/sec Loss 7.6576 LearningRate 0.0009 Epoch: 6 Global Step: 128010 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:54,268-Speed 6335.06 samples/sec Loss 7.6805 LearningRate 0.0009 Epoch: 6 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:22:57,520-Speed 6299.56 samples/sec Loss 7.6118 LearningRate 0.0009 Epoch: 6 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:00,766-Speed 6310.00 samples/sec Loss 7.6593 LearningRate 0.0009 Epoch: 6 Global Step: 128040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:04,014-Speed 6306.36 samples/sec Loss 7.6753 LearningRate 0.0009 Epoch: 6 Global Step: 128050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:07,267-Speed 6298.62 samples/sec Loss 7.6837 LearningRate 0.0009 Epoch: 6 Global Step: 128060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:10,499-Speed 6337.27 samples/sec Loss 7.6729 LearningRate 0.0009 Epoch: 6 Global Step: 128070 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:13,743-Speed 6313.62 samples/sec Loss 7.6843 LearningRate 0.0009 Epoch: 6 Global Step: 128080 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:17,007-Speed 6277.24 samples/sec Loss 7.6529 LearningRate 0.0009 Epoch: 6 Global Step: 128090 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:20,254-Speed 6309.11 samples/sec Loss 7.7065 LearningRate 0.0009 Epoch: 6 Global Step: 128100 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:23,499-Speed 6312.17 samples/sec Loss 7.6718 LearningRate 0.0009 Epoch: 6 Global Step: 128110 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:26,746-Speed 6308.30 samples/sec Loss 7.6418 LearningRate 0.0009 Epoch: 6 Global Step: 128120 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
03:23:29,990-Speed 6315.14 samples/sec Loss 7.5620 LearningRate 0.0009 Epoch: 6 Global Step: 128130 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:33,235-Speed 6313.16 samples/sec Loss 7.6391 LearningRate 0.0009 Epoch: 6 Global Step: 128140 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:36,482-Speed 6311.33 samples/sec Loss 7.6841 LearningRate 0.0009 Epoch: 6 Global Step: 128150 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:39,728-Speed 6311.28 samples/sec Loss 7.6227 LearningRate 0.0009 Epoch: 6 Global Step: 128160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:23:42,974-Speed 6310.45 samples/sec Loss 7.7679 LearningRate 0.0009 Epoch: 6 Global Step: 128170 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:46,218-Speed 6313.40 samples/sec Loss 7.7877 LearningRate 0.0009 Epoch: 6 Global Step: 128180 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:49,465-Speed 6309.93 samples/sec Loss 7.6252 LearningRate 0.0009 Epoch: 6 Global Step: 128190 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:52,709-Speed 6314.76 samples/sec Loss 7.7572 LearningRate 0.0009 Epoch: 6 Global Step: 128200 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:55,955-Speed 6309.86 samples/sec Loss 7.7262 LearningRate 0.0009 Epoch: 6 Global Step: 128210 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:23:59,201-Speed 6312.13 samples/sec Loss 7.6186 LearningRate 0.0009 Epoch: 6 Global Step: 128220 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:02,450-Speed 6304.15 samples/sec Loss 7.6292 LearningRate 0.0009 Epoch: 6 Global Step: 128230 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:05,701-Speed 6300.65 samples/sec Loss 7.6506 LearningRate 0.0009 Epoch: 6 Global Step: 128240 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:08,954-Speed 6296.53 samples/sec Loss 
7.7352 LearningRate 0.0009 Epoch: 6 Global Step: 128250 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:12,204-Speed 6303.95 samples/sec Loss 7.5709 LearningRate 0.0009 Epoch: 6 Global Step: 128260 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:15,437-Speed 6334.74 samples/sec Loss 7.5692 LearningRate 0.0009 Epoch: 6 Global Step: 128270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:18,685-Speed 6308.06 samples/sec Loss 7.6982 LearningRate 0.0009 Epoch: 6 Global Step: 128280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:21,927-Speed 6317.31 samples/sec Loss 7.6558 LearningRate 0.0009 Epoch: 6 Global Step: 128290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:25,174-Speed 6309.43 samples/sec Loss 7.6645 LearningRate 0.0009 Epoch: 6 Global Step: 128300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:28,446-Speed 6261.34 samples/sec Loss 7.5801 LearningRate 0.0009 Epoch: 6 Global Step: 128310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:31,689-Speed 6315.85 samples/sec Loss 7.6828 LearningRate 0.0009 Epoch: 6 Global Step: 128320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:34,957-Speed 6268.07 samples/sec Loss 7.5384 LearningRate 0.0009 Epoch: 6 Global Step: 128330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:38,207-Speed 6305.05 samples/sec Loss 7.7137 LearningRate 0.0009 Epoch: 6 Global Step: 128340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:24:41,441-Speed 6332.67 samples/sec Loss 7.6669 LearningRate 0.0009 Epoch: 6 Global Step: 128350 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:24:44,689-Speed 6306.84 samples/sec Loss 7.6553 LearningRate 0.0009 Epoch: 6 Global Step: 128360 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:24:47,935-Speed 6310.47 samples/sec Loss 7.6606 LearningRate 0.0009 Epoch: 6 Global 
Step: 128370 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:24:51,184-Speed 6304.54 samples/sec Loss 7.6151 LearningRate 0.0009 Epoch: 6 Global Step: 128380 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:24:54,432-Speed 6306.74 samples/sec Loss 7.6767 LearningRate 0.0009 Epoch: 6 Global Step: 128390 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:24:57,680-Speed 6307.84 samples/sec Loss 7.6166 LearningRate 0.0009 Epoch: 6 Global Step: 128400 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:00,925-Speed 6313.39 samples/sec Loss 7.5672 LearningRate 0.0009 Epoch: 6 Global Step: 128410 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:04,174-Speed 6305.17 samples/sec Loss 7.7160 LearningRate 0.0009 Epoch: 6 Global Step: 128420 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:07,424-Speed 6302.40 samples/sec Loss 7.7000 LearningRate 0.0009 Epoch: 6 Global Step: 128430 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:10,670-Speed 6311.14 samples/sec Loss 7.6736 LearningRate 0.0009 Epoch: 6 Global Step: 128440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:13,917-Speed 6308.39 samples/sec Loss 7.6719 LearningRate 0.0009 Epoch: 6 Global Step: 128450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:25:17,163-Speed 6309.80 samples/sec Loss 7.6436 LearningRate 0.0009 Epoch: 6 Global Step: 128460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:25:20,413-Speed 6302.72 samples/sec Loss 7.6692 LearningRate 0.0009 Epoch: 6 Global Step: 128470 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:25:23,662-Speed 6304.61 samples/sec Loss 7.7040 LearningRate 0.0009 Epoch: 6 Global Step: 128480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:25:26,909-Speed 6310.40 samples/sec Loss 7.6615 LearningRate 0.0009 Epoch: 6 Global Step: 128490 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:25:30,143-Speed 6333.73 samples/sec Loss 7.7178 LearningRate 0.0009 Epoch: 6 Global Step: 128500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:33,386-Speed 6316.96 samples/sec Loss 7.6863 LearningRate 0.0009 Epoch: 6 Global Step: 128510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:36,634-Speed 6307.73 samples/sec Loss 7.7027 LearningRate 0.0009 Epoch: 6 Global Step: 128520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:39,881-Speed 6307.99 samples/sec Loss 7.6011 LearningRate 0.0009 Epoch: 6 Global Step: 128530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:43,125-Speed 6315.49 samples/sec Loss 7.6843 LearningRate 0.0009 Epoch: 6 Global Step: 128540 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:46,426-Speed 6204.22 samples/sec Loss 7.6248 LearningRate 0.0009 Epoch: 6 Global Step: 128550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:49,672-Speed 6312.34 samples/sec Loss 7.7684 LearningRate 0.0009 Epoch: 6 Global Step: 128560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:52,917-Speed 6312.49 samples/sec Loss 7.6441 LearningRate 0.0009 Epoch: 6 Global Step: 128570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:56,163-Speed 6309.74 samples/sec Loss 7.5630 LearningRate 0.0009 Epoch: 6 Global Step: 128580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:25:59,410-Speed 6309.45 samples/sec Loss 7.6225 LearningRate 0.0009 Epoch: 6 Global Step: 128590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:02,654-Speed 6315.31 samples/sec Loss 7.6602 LearningRate 0.0009 Epoch: 6 Global Step: 128600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:05,897-Speed 6315.86 samples/sec Loss 7.6450 LearningRate 0.0009 Epoch: 6 Global Step: 128610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:26:09,141-Speed 6314.22 samples/sec Loss 7.6831 LearningRate 0.0009 Epoch: 6 Global Step: 128620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:12,400-Speed 6286.55 samples/sec Loss 7.5902 LearningRate 0.0009 Epoch: 6 Global Step: 128630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:15,648-Speed 6304.96 samples/sec Loss 7.6678 LearningRate 0.0009 Epoch: 6 Global Step: 128640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:18,897-Speed 6305.95 samples/sec Loss 7.6437 LearningRate 0.0009 Epoch: 6 Global Step: 128650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:22,140-Speed 6316.75 samples/sec Loss 7.6359 LearningRate 0.0009 Epoch: 6 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:25,391-Speed 6301.22 samples/sec Loss 7.5782 LearningRate 0.0009 Epoch: 6 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:28,638-Speed 6308.77 samples/sec Loss 7.6025 LearningRate 0.0009 Epoch: 6 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:31,886-Speed 6305.53 samples/sec Loss 7.6239 LearningRate 0.0009 Epoch: 6 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:35,122-Speed 6331.47 samples/sec Loss 7.7257 LearningRate 0.0009 Epoch: 6 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:26:38,357-Speed 6332.03 samples/sec Loss 7.6725 LearningRate 0.0009 Epoch: 6 Global Step: 128710 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:41,604-Speed 6307.72 samples/sec Loss 7.7156 LearningRate 0.0009 Epoch: 6 Global Step: 128720 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:44,853-Speed 6305.69 samples/sec Loss 7.6154 LearningRate 0.0009 Epoch: 6 Global Step: 128730 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:48,098-Speed 6314.06 samples/sec Loss 
7.6632 LearningRate 0.0009 Epoch: 6 Global Step: 128740 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:51,346-Speed 6305.68 samples/sec Loss 7.5607 LearningRate 0.0009 Epoch: 6 Global Step: 128750 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:54,596-Speed 6303.58 samples/sec Loss 7.6666 LearningRate 0.0009 Epoch: 6 Global Step: 128760 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:26:57,841-Speed 6311.65 samples/sec Loss 7.6666 LearningRate 0.0009 Epoch: 6 Global Step: 128770 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:01,096-Speed 6294.19 samples/sec Loss 7.7295 LearningRate 0.0009 Epoch: 6 Global Step: 128780 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:04,346-Speed 6302.07 samples/sec Loss 7.5991 LearningRate 0.0009 Epoch: 6 Global Step: 128790 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:07,592-Speed 6311.83 samples/sec Loss 7.6505 LearningRate 0.0009 Epoch: 6 Global Step: 128800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:10,841-Speed 6304.65 samples/sec Loss 7.6096 LearningRate 0.0009 Epoch: 6 Global Step: 128810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:27:14,075-Speed 6333.32 samples/sec Loss 7.6020 LearningRate 0.0009 Epoch: 6 Global Step: 128820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:17,321-Speed 6311.54 samples/sec Loss 7.5803 LearningRate 0.0009 Epoch: 6 Global Step: 128830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:20,568-Speed 6309.19 samples/sec Loss 7.6564 LearningRate 0.0009 Epoch: 6 Global Step: 128840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:23,816-Speed 6306.56 samples/sec Loss 7.6800 LearningRate 0.0009 Epoch: 6 Global Step: 128850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:27,067-Speed 6300.67 samples/sec Loss 7.6409 LearningRate 0.0009 Epoch: 6 Global 
Step: 128860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:30,315-Speed 6306.94 samples/sec Loss 7.5752 LearningRate 0.0009 Epoch: 6 Global Step: 128870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:33,572-Speed 6288.54 samples/sec Loss 7.7437 LearningRate 0.0009 Epoch: 6 Global Step: 128880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:36,819-Speed 6309.43 samples/sec Loss 7.6211 LearningRate 0.0009 Epoch: 6 Global Step: 128890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:40,063-Speed 6315.05 samples/sec Loss 7.6502 LearningRate 0.0009 Epoch: 6 Global Step: 128900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:43,312-Speed 6304.58 samples/sec Loss 7.6570 LearningRate 0.0009 Epoch: 6 Global Step: 128910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:46,548-Speed 6330.26 samples/sec Loss 7.6088 LearningRate 0.0009 Epoch: 6 Global Step: 128920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:49,796-Speed 6307.23 samples/sec Loss 7.6707 LearningRate 0.0009 Epoch: 6 Global Step: 128930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:53,041-Speed 6311.90 samples/sec Loss 7.6486 LearningRate 0.0009 Epoch: 6 Global Step: 128940 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:56,297-Speed 6292.15 samples/sec Loss 7.5667 LearningRate 0.0009 Epoch: 6 Global Step: 128950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:27:59,545-Speed 6307.62 samples/sec Loss 7.5586 LearningRate 0.0009 Epoch: 6 Global Step: 128960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:28:02,793-Speed 6305.92 samples/sec Loss 7.6366 LearningRate 0.0009 Epoch: 6 Global Step: 128970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:28:06,041-Speed 6307.49 samples/sec Loss 7.6678 LearningRate 0.0009 Epoch: 6 Global Step: 128980 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:28:09,291-Speed 6303.71 samples/sec Loss 7.6045 LearningRate 0.0009 Epoch: 6 Global Step: 128990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:28:12,537-Speed 6310.19 samples/sec Loss 7.5858 LearningRate 0.0009 Epoch: 6 Global Step: 129000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:28:15,787-Speed 6302.34 samples/sec Loss 7.5806 LearningRate 0.0009 Epoch: 6 Global Step: 129010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:28:19,034-Speed 6308.38 samples/sec Loss 7.6588 LearningRate 0.0009 Epoch: 6 Global Step: 129020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:22,284-Speed 6303.42 samples/sec Loss 7.6365 LearningRate 0.0009 Epoch: 6 Global Step: 129030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:25,535-Speed 6301.69 samples/sec Loss 7.6359 LearningRate 0.0009 Epoch: 6 Global Step: 129040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:28,786-Speed 6299.93 samples/sec Loss 7.6229 LearningRate 0.0009 Epoch: 6 Global Step: 129050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:32,031-Speed 6313.54 samples/sec Loss 7.6006 LearningRate 0.0009 Epoch: 6 Global Step: 129060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:35,277-Speed 6310.03 samples/sec Loss 7.5951 LearningRate 0.0009 Epoch: 6 Global Step: 129070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:38,524-Speed 6309.85 samples/sec Loss 7.6240 LearningRate 0.0009 Epoch: 6 Global Step: 129080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:41,772-Speed 6305.70 samples/sec Loss 7.6678 LearningRate 0.0009 Epoch: 6 Global Step: 129090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:45,020-Speed 6307.57 samples/sec Loss 7.6851 LearningRate 0.0009 Epoch: 6 Global Step: 129100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:28:48,271-Speed 6300.66 samples/sec Loss 7.6090 LearningRate 0.0009 Epoch: 6 Global Step: 129110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:51,520-Speed 6304.94 samples/sec Loss 7.6678 LearningRate 0.0009 Epoch: 6 Global Step: 129120 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:28:54,754-Speed 6334.67 samples/sec Loss 7.6331 LearningRate 0.0009 Epoch: 6 Global Step: 129130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:28:58,000-Speed 6309.60 samples/sec Loss 7.6391 LearningRate 0.0009 Epoch: 6 Global Step: 129140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:01,249-Speed 6305.75 samples/sec Loss 7.6112 LearningRate 0.0009 Epoch: 6 Global Step: 129150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:04,482-Speed 6334.96 samples/sec Loss 7.6640 LearningRate 0.0009 Epoch: 6 Global Step: 129160 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:07,752-Speed 6265.15 samples/sec Loss 7.6076 LearningRate 0.0009 Epoch: 6 Global Step: 129170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:10,998-Speed 6312.05 samples/sec Loss 7.5682 LearningRate 0.0009 Epoch: 6 Global Step: 129180 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:14,245-Speed 6308.85 samples/sec Loss 7.6343 LearningRate 0.0009 Epoch: 6 Global Step: 129190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:17,492-Speed 6308.13 samples/sec Loss 7.6896 LearningRate 0.0009 Epoch: 6 Global Step: 129200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:20,737-Speed 6313.14 samples/sec Loss 7.6907 LearningRate 0.0009 Epoch: 6 Global Step: 129210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:23,983-Speed 6309.74 samples/sec Loss 7.5402 LearningRate 0.0009 Epoch: 6 Global Step: 129220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:27,233-Speed 6303.34 samples/sec 
Loss 7.7315 LearningRate 0.0009 Epoch: 6 Global Step: 129230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:30,485-Speed 6299.57 samples/sec Loss 7.6711 LearningRate 0.0009 Epoch: 6 Global Step: 129240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:33,731-Speed 6310.00 samples/sec Loss 7.6619 LearningRate 0.0009 Epoch: 6 Global Step: 129250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:29:36,978-Speed 6309.04 samples/sec Loss 7.7329 LearningRate 0.0009 Epoch: 6 Global Step: 129260 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:40,223-Speed 6312.22 samples/sec Loss 7.6115 LearningRate 0.0009 Epoch: 6 Global Step: 129270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:43,472-Speed 6304.40 samples/sec Loss 7.6715 LearningRate 0.0009 Epoch: 6 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:46,723-Speed 6300.84 samples/sec Loss 7.6966 LearningRate 0.0009 Epoch: 6 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:49,970-Speed 6309.13 samples/sec Loss 7.6026 LearningRate 0.0009 Epoch: 6 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:53,216-Speed 6311.49 samples/sec Loss 7.6810 LearningRate 0.0009 Epoch: 6 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:56,469-Speed 6296.38 samples/sec Loss 7.6158 LearningRate 0.0009 Epoch: 6 Global Step: 129320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:29:59,738-Speed 6265.73 samples/sec Loss 7.6325 LearningRate 0.0009 Epoch: 6 Global Step: 129330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:02,987-Speed 6305.11 samples/sec Loss 7.6323 LearningRate 0.0009 Epoch: 6 Global Step: 129340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:06,238-Speed 6301.63 samples/sec Loss 7.5900 LearningRate 0.0009 Epoch: 6 
Global Step: 129350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:09,469-Speed 6339.87 samples/sec Loss 7.5922 LearningRate 0.0009 Epoch: 6 Global Step: 129360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:12,727-Speed 6287.58 samples/sec Loss 7.6242 LearningRate 0.0009 Epoch: 6 Global Step: 129370 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:15,972-Speed 6312.89 samples/sec Loss 7.6205 LearningRate 0.0009 Epoch: 6 Global Step: 129380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:19,216-Speed 6314.78 samples/sec Loss 7.6389 LearningRate 0.0009 Epoch: 6 Global Step: 129390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:22,469-Speed 6297.82 samples/sec Loss 7.6665 LearningRate 0.0009 Epoch: 6 Global Step: 129400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:25,715-Speed 6311.18 samples/sec Loss 7.6369 LearningRate 0.0009 Epoch: 6 Global Step: 129410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:28,964-Speed 6303.97 samples/sec Loss 7.7364 LearningRate 0.0009 Epoch: 6 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:32,209-Speed 6311.76 samples/sec Loss 7.6316 LearningRate 0.0009 Epoch: 6 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:35,465-Speed 6292.72 samples/sec Loss 7.6814 LearningRate 0.0009 Epoch: 6 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:38,708-Speed 6315.44 samples/sec Loss 7.7287 LearningRate 0.0009 Epoch: 6 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:41,940-Speed 6337.86 samples/sec Loss 7.6073 LearningRate 0.0009 Epoch: 6 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:45,184-Speed 6316.06 samples/sec Loss 7.6585 LearningRate 0.0009 Epoch: 6 Global Step: 129470 Fp16 Grad Scale: 65536 
Required: 64 hours Training: 2022-04-01 03:30:48,429-Speed 6311.18 samples/sec Loss 7.6856 LearningRate 0.0009 Epoch: 6 Global Step: 129480 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:51,675-Speed 6310.95 samples/sec Loss 7.6280 LearningRate 0.0009 Epoch: 6 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:54,921-Speed 6311.35 samples/sec Loss 7.5875 LearningRate 0.0009 Epoch: 6 Global Step: 129500 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:30:58,170-Speed 6305.16 samples/sec Loss 7.6337 LearningRate 0.0009 Epoch: 6 Global Step: 129510 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:01,416-Speed 6309.44 samples/sec Loss 7.5941 LearningRate 0.0009 Epoch: 6 Global Step: 129520 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:04,665-Speed 6305.22 samples/sec Loss 7.5970 LearningRate 0.0009 Epoch: 6 Global Step: 129530 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:07,913-Speed 6307.70 samples/sec Loss 7.6462 LearningRate 0.0009 Epoch: 6 Global Step: 129540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:11,144-Speed 6340.14 samples/sec Loss 7.6344 LearningRate 0.0009 Epoch: 6 Global Step: 129550 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:14,386-Speed 6318.34 samples/sec Loss 7.6730 LearningRate 0.0009 Epoch: 6 Global Step: 129560 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:17,631-Speed 6311.65 samples/sec Loss 7.5260 LearningRate 0.0009 Epoch: 6 Global Step: 129570 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:20,883-Speed 6298.97 samples/sec Loss 7.5972 LearningRate 0.0009 Epoch: 6 Global Step: 129580 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:24,134-Speed 6302.13 samples/sec Loss 7.5613 LearningRate 0.0009 Epoch: 6 Global Step: 129590 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 
03:31:27,380-Speed 6310.49 samples/sec Loss 7.6128 LearningRate 0.0009 Epoch: 6 Global Step: 129600 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:30,626-Speed 6310.68 samples/sec Loss 7.6507 LearningRate 0.0009 Epoch: 6 Global Step: 129610 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:33,870-Speed 6314.93 samples/sec Loss 7.6796 LearningRate 0.0009 Epoch: 6 Global Step: 129620 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:37,119-Speed 6305.31 samples/sec Loss 7.6376 LearningRate 0.0009 Epoch: 6 Global Step: 129630 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:40,371-Speed 6299.34 samples/sec Loss 7.6449 LearningRate 0.0009 Epoch: 6 Global Step: 129640 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:31:43,625-Speed 6294.55 samples/sec Loss 7.6005 LearningRate 0.0009 Epoch: 6 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:46,877-Speed 6299.26 samples/sec Loss 7.6541 LearningRate 0.0009 Epoch: 6 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:50,122-Speed 6312.52 samples/sec Loss 7.5647 LearningRate 0.0009 Epoch: 6 Global Step: 129670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:53,397-Speed 6255.34 samples/sec Loss 7.6560 LearningRate 0.0009 Epoch: 6 Global Step: 129680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:56,649-Speed 6297.73 samples/sec Loss 7.7234 LearningRate 0.0009 Epoch: 6 Global Step: 129690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:31:59,907-Speed 6289.03 samples/sec Loss 7.7453 LearningRate 0.0009 Epoch: 6 Global Step: 129700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:03,177-Speed 6263.49 samples/sec Loss 7.6588 LearningRate 0.0009 Epoch: 6 Global Step: 129710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:06,422-Speed 6313.68 samples/sec Loss 
7.6648 LearningRate 0.0009 Epoch: 6 Global Step: 129720 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:09,670-Speed 6306.39 samples/sec Loss 7.6272 LearningRate 0.0009 Epoch: 6 Global Step: 129730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:12,919-Speed 6305.74 samples/sec Loss 7.6566 LearningRate 0.0009 Epoch: 6 Global Step: 129740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:16,152-Speed 6336.26 samples/sec Loss 7.5754 LearningRate 0.0009 Epoch: 6 Global Step: 129750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:19,400-Speed 6307.45 samples/sec Loss 7.6054 LearningRate 0.0009 Epoch: 6 Global Step: 129760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:22,647-Speed 6308.60 samples/sec Loss 7.5648 LearningRate 0.0009 Epoch: 6 Global Step: 129770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:25,907-Speed 6283.82 samples/sec Loss 7.5591 LearningRate 0.0009 Epoch: 6 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:29,156-Speed 6305.30 samples/sec Loss 7.6747 LearningRate 0.0009 Epoch: 6 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:32:32,388-Speed 6338.04 samples/sec Loss 7.6224 LearningRate 0.0009 Epoch: 6 Global Step: 129800 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:35,636-Speed 6307.08 samples/sec Loss 7.5803 LearningRate 0.0009 Epoch: 6 Global Step: 129810 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:38,890-Speed 6295.70 samples/sec Loss 7.6042 LearningRate 0.0009 Epoch: 6 Global Step: 129820 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:42,141-Speed 6301.82 samples/sec Loss 7.6256 LearningRate 0.0009 Epoch: 6 Global Step: 129830 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:45,392-Speed 6301.16 samples/sec Loss 7.6331 LearningRate 0.0009 Epoch: 6 Global 
Step: 129840 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:48,641-Speed 6303.29 samples/sec Loss 7.6397 LearningRate 0.0009 Epoch: 6 Global Step: 129850 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:51,891-Speed 6304.38 samples/sec Loss 7.6261 LearningRate 0.0009 Epoch: 6 Global Step: 129860 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:55,141-Speed 6302.61 samples/sec Loss 7.5799 LearningRate 0.0009 Epoch: 6 Global Step: 129870 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:32:58,389-Speed 6306.80 samples/sec Loss 7.5930 LearningRate 0.0009 Epoch: 6 Global Step: 129880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:01,643-Speed 6295.85 samples/sec Loss 7.5921 LearningRate 0.0009 Epoch: 6 Global Step: 129890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:04,891-Speed 6307.36 samples/sec Loss 7.7134 LearningRate 0.0009 Epoch: 6 Global Step: 129900 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:08,137-Speed 6310.28 samples/sec Loss 7.6262 LearningRate 0.0009 Epoch: 6 Global Step: 129910 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:11,373-Speed 6331.12 samples/sec Loss 7.6524 LearningRate 0.0009 Epoch: 6 Global Step: 129920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:14,615-Speed 6316.93 samples/sec Loss 7.5886 LearningRate 0.0009 Epoch: 6 Global Step: 129930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:17,867-Speed 6300.03 samples/sec Loss 7.6076 LearningRate 0.0009 Epoch: 6 Global Step: 129940 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:21,113-Speed 6310.00 samples/sec Loss 7.6739 LearningRate 0.0009 Epoch: 6 Global Step: 129950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:24,360-Speed 6311.05 samples/sec Loss 7.6273 LearningRate 0.0009 Epoch: 6 Global Step: 129960 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:33:27,606-Speed 6309.22 samples/sec Loss 7.5686 LearningRate 0.0009 Epoch: 6 Global Step: 129970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:30,854-Speed 6306.38 samples/sec Loss 7.6076 LearningRate 0.0009 Epoch: 6 Global Step: 129980 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:34,112-Speed 6288.55 samples/sec Loss 7.6409 LearningRate 0.0009 Epoch: 6 Global Step: 129990 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:37,362-Speed 6308.40 samples/sec Loss 7.6933 LearningRate 0.0009 Epoch: 6 Global Step: 130000 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:40,610-Speed 6307.87 samples/sec Loss 7.6960 LearningRate 0.0009 Epoch: 6 Global Step: 130010 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:33:43,858-Speed 6307.38 samples/sec Loss 7.6151 LearningRate 0.0009 Epoch: 6 Global Step: 130020 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:47,104-Speed 6309.78 samples/sec Loss 7.6002 LearningRate 0.0009 Epoch: 6 Global Step: 130030 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:50,364-Speed 6284.28 samples/sec Loss 7.5898 LearningRate 0.0009 Epoch: 6 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:53,609-Speed 6311.06 samples/sec Loss 7.5759 LearningRate 0.0009 Epoch: 6 Global Step: 130050 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:33:56,855-Speed 6312.07 samples/sec Loss 7.6860 LearningRate 0.0009 Epoch: 6 Global Step: 130060 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:00,106-Speed 6300.11 samples/sec Loss 7.7107 LearningRate 0.0009 Epoch: 6 Global Step: 130070 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:03,347-Speed 6320.11 samples/sec Loss 7.6607 LearningRate 0.0009 Epoch: 6 Global Step: 130080 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:34:06,592-Speed 6313.32 samples/sec Loss 7.6818 LearningRate 0.0009 Epoch: 6 Global Step: 130090 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:09,838-Speed 6311.03 samples/sec Loss 7.6315 LearningRate 0.0009 Epoch: 6 Global Step: 130100 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:13,085-Speed 6308.08 samples/sec Loss 7.6283 LearningRate 0.0009 Epoch: 6 Global Step: 130110 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:16,333-Speed 6306.90 samples/sec Loss 7.6748 LearningRate 0.0009 Epoch: 6 Global Step: 130120 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:34:19,564-Speed 6339.99 samples/sec Loss 7.6972 LearningRate 0.0009 Epoch: 6 Global Step: 130130 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:22,816-Speed 6299.33 samples/sec Loss 7.6571 LearningRate 0.0009 Epoch: 6 Global Step: 130140 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:26,063-Speed 6309.72 samples/sec Loss 7.6537 LearningRate 0.0009 Epoch: 6 Global Step: 130150 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:29,309-Speed 6309.93 samples/sec Loss 7.6307 LearningRate 0.0009 Epoch: 6 Global Step: 130160 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:34:32,540-Speed 6340.36 samples/sec Loss 7.5974 LearningRate 0.0009 Epoch: 6 Global Step: 130170 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:35,785-Speed 6311.26 samples/sec Loss 7.5730 LearningRate 0.0009 Epoch: 6 Global Step: 130180 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:39,030-Speed 6313.37 samples/sec Loss 7.5225 LearningRate 0.0009 Epoch: 6 Global Step: 130190 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:42,277-Speed 6308.79 samples/sec Loss 7.5958 LearningRate 0.0009 Epoch: 6 Global Step: 130200 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:45,528-Speed 6301.93 samples/sec 
Loss 7.6438 LearningRate 0.0009 Epoch: 6 Global Step: 130210 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:48,774-Speed 6310.64 samples/sec Loss 7.6073 LearningRate 0.0009 Epoch: 6 Global Step: 130220 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:52,022-Speed 6306.92 samples/sec Loss 7.5845 LearningRate 0.0009 Epoch: 6 Global Step: 130230 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:55,267-Speed 6313.26 samples/sec Loss 7.6363 LearningRate 0.0009 Epoch: 6 Global Step: 130240 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:34:58,510-Speed 6315.39 samples/sec Loss 7.6062 LearningRate 0.0009 Epoch: 6 Global Step: 130250 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:35:01,754-Speed 6314.45 samples/sec Loss 7.6599 LearningRate 0.0009 Epoch: 6 Global Step: 130260 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:35:05,002-Speed 6307.77 samples/sec Loss 7.6082 LearningRate 0.0009 Epoch: 6 Global Step: 130270 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:08,253-Speed 6302.02 samples/sec Loss 7.5149 LearningRate 0.0009 Epoch: 6 Global Step: 130280 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:11,501-Speed 6305.48 samples/sec Loss 7.6151 LearningRate 0.0009 Epoch: 6 Global Step: 130290 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:14,747-Speed 6311.28 samples/sec Loss 7.5036 LearningRate 0.0009 Epoch: 6 Global Step: 130300 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:17,998-Speed 6300.31 samples/sec Loss 7.6223 LearningRate 0.0009 Epoch: 6 Global Step: 130310 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:21,248-Speed 6302.39 samples/sec Loss 7.5394 LearningRate 0.0009 Epoch: 6 Global Step: 130320 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:24,498-Speed 6303.16 samples/sec Loss 7.6593 LearningRate 0.0009 Epoch: 6 
Global Step: 130330 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:27,742-Speed 6315.28 samples/sec Loss 7.6492 LearningRate 0.0009 Epoch: 6 Global Step: 130340 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:30,989-Speed 6308.03 samples/sec Loss 7.5554 LearningRate 0.0009 Epoch: 6 Global Step: 130350 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:34,237-Speed 6307.43 samples/sec Loss 7.6380 LearningRate 0.0009 Epoch: 6 Global Step: 130360 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:37,485-Speed 6307.55 samples/sec Loss 7.6821 LearningRate 0.0009 Epoch: 6 Global Step: 130370 Fp16 Grad Scale: 131072 Required: 64 hours Training: 2022-04-01 03:35:40,716-Speed 6338.22 samples/sec Loss 7.6336 LearningRate 0.0009 Epoch: 6 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:43,964-Speed 6307.74 samples/sec Loss 7.6669 LearningRate 0.0009 Epoch: 6 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:47,210-Speed 6311.45 samples/sec Loss 7.6996 LearningRate 0.0009 Epoch: 6 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:50,454-Speed 6313.14 samples/sec Loss 7.5970 LearningRate 0.0009 Epoch: 6 Global Step: 130410 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:53,699-Speed 6314.69 samples/sec Loss 7.5743 LearningRate 0.0009 Epoch: 6 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:35:56,946-Speed 6307.95 samples/sec Loss 7.5397 LearningRate 0.0009 Epoch: 6 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:00,178-Speed 6338.07 samples/sec Loss 7.5758 LearningRate 0.0009 Epoch: 6 Global Step: 130440 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:03,425-Speed 6309.36 samples/sec Loss 7.6096 LearningRate 0.0009 Epoch: 6 Global Step: 130450 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:36:06,674-Speed 6304.43 samples/sec Loss 7.6237 LearningRate 0.0009 Epoch: 6 Global Step: 130460 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:09,928-Speed 6295.46 samples/sec Loss 7.6315 LearningRate 0.0009 Epoch: 6 Global Step: 130470 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:13,222-Speed 6218.55 samples/sec Loss 7.6263 LearningRate 0.0009 Epoch: 6 Global Step: 130480 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:16,585-Speed 6090.97 samples/sec Loss 7.6872 LearningRate 0.0009 Epoch: 6 Global Step: 130490 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:19,866-Speed 6244.40 samples/sec Loss 7.7112 LearningRate 0.0009 Epoch: 6 Global Step: 130500 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:23,114-Speed 6306.10 samples/sec Loss 7.6319 LearningRate 0.0009 Epoch: 6 Global Step: 130510 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:26,360-Speed 6310.79 samples/sec Loss 7.6471 LearningRate 0.0009 Epoch: 6 Global Step: 130520 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:29,607-Speed 6309.31 samples/sec Loss 7.5757 LearningRate 0.0009 Epoch: 6 Global Step: 130530 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:36:32,854-Speed 6307.49 samples/sec Loss 7.6416 LearningRate 0.0009 Epoch: 6 Global Step: 130540 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:36,104-Speed 6304.11 samples/sec Loss 7.5657 LearningRate 0.0009 Epoch: 6 Global Step: 130550 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:39,353-Speed 6304.97 samples/sec Loss 7.5980 LearningRate 0.0009 Epoch: 6 Global Step: 130560 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:42,602-Speed 6304.44 samples/sec Loss 7.5549 LearningRate 0.0009 Epoch: 6 Global Step: 130570 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 
03:36:45,847-Speed 6312.74 samples/sec Loss 7.6169 LearningRate 0.0009 Epoch: 6 Global Step: 130580 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:49,092-Speed 6311.39 samples/sec Loss 7.7025 LearningRate 0.0009 Epoch: 6 Global Step: 130590 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:52,338-Speed 6310.99 samples/sec Loss 7.5845 LearningRate 0.0009 Epoch: 6 Global Step: 130600 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:55,583-Speed 6313.60 samples/sec Loss 7.5718 LearningRate 0.0009 Epoch: 6 Global Step: 130610 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:36:58,824-Speed 6319.43 samples/sec Loss 7.6321 LearningRate 0.0009 Epoch: 6 Global Step: 130620 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:02,074-Speed 6302.75 samples/sec Loss 7.6174 LearningRate 0.0009 Epoch: 6 Global Step: 130630 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:05,310-Speed 6331.83 samples/sec Loss 7.6038 LearningRate 0.0009 Epoch: 6 Global Step: 130640 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:08,562-Speed 6298.59 samples/sec Loss 7.6238 LearningRate 0.0009 Epoch: 6 Global Step: 130650 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:11,815-Speed 6298.17 samples/sec Loss 7.5785 LearningRate 0.0009 Epoch: 6 Global Step: 130660 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:15,062-Speed 6309.30 samples/sec Loss 7.5915 LearningRate 0.0009 Epoch: 6 Global Step: 130670 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:18,310-Speed 6305.70 samples/sec Loss 7.5752 LearningRate 0.0009 Epoch: 6 Global Step: 130680 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:21,581-Speed 6262.73 samples/sec Loss 7.5762 LearningRate 0.0009 Epoch: 6 Global Step: 130690 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:24,831-Speed 6303.09 samples/sec Loss 
7.6851 LearningRate 0.0009 Epoch: 6 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:28,080-Speed 6305.24 samples/sec Loss 7.6096 LearningRate 0.0009 Epoch: 6 Global Step: 130710 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:31,328-Speed 6305.84 samples/sec Loss 7.5920 LearningRate 0.0009 Epoch: 6 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:34,580-Speed 6299.23 samples/sec Loss 7.6121 LearningRate 0.0009 Epoch: 6 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:37,881-Speed 6206.61 samples/sec Loss 7.5274 LearningRate 0.0009 Epoch: 6 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:41,204-Speed 6164.34 samples/sec Loss 7.6263 LearningRate 0.0009 Epoch: 6 Global Step: 130750 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:44,454-Speed 6302.88 samples/sec Loss 7.5529 LearningRate 0.0009 Epoch: 6 Global Step: 130760 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:47,701-Speed 6307.19 samples/sec Loss 7.5720 LearningRate 0.0009 Epoch: 6 Global Step: 130770 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:50,948-Speed 6309.14 samples/sec Loss 7.6916 LearningRate 0.0009 Epoch: 6 Global Step: 130780 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:54,199-Speed 6300.60 samples/sec Loss 7.5896 LearningRate 0.0009 Epoch: 6 Global Step: 130790 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:37:57,447-Speed 6307.39 samples/sec Loss 7.5564 LearningRate 0.0009 Epoch: 6 Global Step: 130800 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:00,725-Speed 6248.85 samples/sec Loss 7.6262 LearningRate 0.0009 Epoch: 6 Global Step: 130810 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:03,973-Speed 6306.06 samples/sec Loss 7.6266 LearningRate 0.0009 Epoch: 6 Global 
Step: 130820 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:07,225-Speed 6299.40 samples/sec Loss 7.5971 LearningRate 0.0009 Epoch: 6 Global Step: 130830 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:10,462-Speed 6328.77 samples/sec Loss 7.5544 LearningRate 0.0009 Epoch: 6 Global Step: 130840 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:13,710-Speed 6307.10 samples/sec Loss 7.5804 LearningRate 0.0009 Epoch: 6 Global Step: 130850 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:16,961-Speed 6302.10 samples/sec Loss 7.5763 LearningRate 0.0009 Epoch: 6 Global Step: 130860 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:20,204-Speed 6316.06 samples/sec Loss 7.5351 LearningRate 0.0009 Epoch: 6 Global Step: 130870 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:23,436-Speed 6337.59 samples/sec Loss 7.6450 LearningRate 0.0009 Epoch: 6 Global Step: 130880 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:26,682-Speed 6311.88 samples/sec Loss 7.5828 LearningRate 0.0009 Epoch: 6 Global Step: 130890 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:29,936-Speed 6295.64 samples/sec Loss 7.5282 LearningRate 0.0009 Epoch: 6 Global Step: 130900 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:33,183-Speed 6306.72 samples/sec Loss 7.5779 LearningRate 0.0009 Epoch: 6 Global Step: 130910 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:36,441-Speed 6287.71 samples/sec Loss 7.6065 LearningRate 0.0009 Epoch: 6 Global Step: 130920 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:39,693-Speed 6298.46 samples/sec Loss 7.5711 LearningRate 0.0009 Epoch: 6 Global Step: 130930 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:42,939-Speed 6311.82 samples/sec Loss 7.5560 LearningRate 0.0009 Epoch: 6 Global Step: 130940 Fp16 Grad Scale: 32768 
Required: 64 hours Training: 2022-04-01 03:38:46,187-Speed 6307.84 samples/sec Loss 7.6381 LearningRate 0.0009 Epoch: 6 Global Step: 130950 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:49,435-Speed 6305.82 samples/sec Loss 7.5495 LearningRate 0.0009 Epoch: 6 Global Step: 130960 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:52,680-Speed 6312.16 samples/sec Loss 7.6695 LearningRate 0.0009 Epoch: 6 Global Step: 130970 Fp16 Grad Scale: 32768 Required: 64 hours Training: 2022-04-01 03:38:55,928-Speed 6307.83 samples/sec Loss 7.5711 LearningRate 0.0009 Epoch: 6 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:38:59,180-Speed 6297.61 samples/sec Loss 7.5846 LearningRate 0.0009 Epoch: 6 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:39:02,430-Speed 6304.65 samples/sec Loss 7.6603 LearningRate 0.0009 Epoch: 6 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 64 hours Training: 2022-04-01 03:39:05,676-Speed 6309.04 samples/sec Loss 7.6105 LearningRate 0.0009 Epoch: 6 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:08,923-Speed 6309.08 samples/sec Loss 7.6095 LearningRate 0.0009 Epoch: 6 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:12,158-Speed 6332.08 samples/sec Loss 7.5496 LearningRate 0.0009 Epoch: 6 Global Step: 131030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:15,402-Speed 6315.45 samples/sec Loss 7.4616 LearningRate 0.0009 Epoch: 6 Global Step: 131040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:18,649-Speed 6308.83 samples/sec Loss 7.5927 LearningRate 0.0009 Epoch: 6 Global Step: 131050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:21,901-Speed 6299.97 samples/sec Loss 7.5712 LearningRate 0.0009 Epoch: 6 Global Step: 131060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:39:25,151-Speed 6303.50 samples/sec Loss 7.5966 LearningRate 0.0009 Epoch: 6 Global Step: 131070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:28,400-Speed 6304.63 samples/sec Loss 7.5995 LearningRate 0.0009 Epoch: 6 Global Step: 131080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:31,641-Speed 6319.74 samples/sec Loss 7.6430 LearningRate 0.0009 Epoch: 6 Global Step: 131090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:34,889-Speed 6307.52 samples/sec Loss 7.5964 LearningRate 0.0009 Epoch: 6 Global Step: 131100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:38,134-Speed 6311.57 samples/sec Loss 7.6293 LearningRate 0.0009 Epoch: 6 Global Step: 131110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:41,379-Speed 6313.38 samples/sec Loss 7.5906 LearningRate 0.0009 Epoch: 6 Global Step: 131120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:39:44,626-Speed 6308.38 samples/sec Loss 7.5908 LearningRate 0.0009 Epoch: 6 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:47,878-Speed 6299.27 samples/sec Loss 7.5670 LearningRate 0.0009 Epoch: 6 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:51,126-Speed 6305.91 samples/sec Loss 7.6110 LearningRate 0.0009 Epoch: 6 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:54,374-Speed 6307.98 samples/sec Loss 7.6288 LearningRate 0.0009 Epoch: 6 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:39:57,612-Speed 6325.66 samples/sec Loss 7.6496 LearningRate 0.0009 Epoch: 6 Global Step: 131170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:00,861-Speed 6304.90 samples/sec Loss 7.5774 LearningRate 0.0009 Epoch: 6 Global Step: 131180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:04,108-Speed 6309.72 samples/sec Loss 
7.5603 LearningRate 0.0009 Epoch: 6 Global Step: 131190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:07,352-Speed 6314.57 samples/sec Loss 7.6166 LearningRate 0.0009 Epoch: 6 Global Step: 131200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:10,600-Speed 6306.18 samples/sec Loss 7.5347 LearningRate 0.0009 Epoch: 6 Global Step: 131210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:13,860-Speed 6284.04 samples/sec Loss 7.6193 LearningRate 0.0009 Epoch: 6 Global Step: 131220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:17,108-Speed 6307.11 samples/sec Loss 7.5248 LearningRate 0.0009 Epoch: 6 Global Step: 131230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:20,356-Speed 6306.27 samples/sec Loss 7.5368 LearningRate 0.0009 Epoch: 6 Global Step: 131240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:23,600-Speed 6313.98 samples/sec Loss 7.6097 LearningRate 0.0009 Epoch: 6 Global Step: 131250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:26,841-Speed 6320.41 samples/sec Loss 7.5935 LearningRate 0.0009 Epoch: 6 Global Step: 131260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:30,088-Speed 6309.00 samples/sec Loss 7.5574 LearningRate 0.0009 Epoch: 6 Global Step: 131270 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:40:33,319-Speed 6339.75 samples/sec Loss 7.6635 LearningRate 0.0009 Epoch: 6 Global Step: 131280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:36,571-Speed 6300.41 samples/sec Loss 7.5875 LearningRate 0.0009 Epoch: 6 Global Step: 131290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:39,818-Speed 6307.42 samples/sec Loss 7.5975 LearningRate 0.0009 Epoch: 6 Global Step: 131300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:43,063-Speed 6313.69 samples/sec Loss 7.5656 LearningRate 0.0009 Epoch: 6 Global 
Step: 131310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:46,312-Speed 6304.78 samples/sec Loss 7.6192 LearningRate 0.0009 Epoch: 6 Global Step: 131320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:49,558-Speed 6311.75 samples/sec Loss 7.6165 LearningRate 0.0009 Epoch: 6 Global Step: 131330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:52,804-Speed 6309.79 samples/sec Loss 7.5456 LearningRate 0.0009 Epoch: 6 Global Step: 131340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:56,057-Speed 6297.58 samples/sec Loss 7.6269 LearningRate 0.0009 Epoch: 6 Global Step: 131350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:40:59,300-Speed 6315.89 samples/sec Loss 7.6054 LearningRate 0.0009 Epoch: 6 Global Step: 131360 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:41:02,548-Speed 6306.80 samples/sec Loss 7.5769 LearningRate 0.0009 Epoch: 6 Global Step: 131370 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:41:05,789-Speed 6320.29 samples/sec Loss 7.4978 LearningRate 0.0009 Epoch: 6 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:09,034-Speed 6313.18 samples/sec Loss 7.5880 LearningRate 0.0009 Epoch: 6 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:12,275-Speed 6320.21 samples/sec Loss 7.5702 LearningRate 0.0009 Epoch: 6 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:15,521-Speed 6310.30 samples/sec Loss 7.5832 LearningRate 0.0009 Epoch: 6 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:18,767-Speed 6311.61 samples/sec Loss 7.6257 LearningRate 0.0009 Epoch: 6 Global Step: 131420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:22,008-Speed 6319.21 samples/sec Loss 7.6216 LearningRate 0.0009 Epoch: 6 Global Step: 131430 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 03:41:25,258-Speed 6304.31 samples/sec Loss 7.5955 LearningRate 0.0009 Epoch: 6 Global Step: 131440 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:28,503-Speed 6311.31 samples/sec Loss 7.5847 LearningRate 0.0009 Epoch: 6 Global Step: 131450 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:31,749-Speed 6312.59 samples/sec Loss 7.5696 LearningRate 0.0009 Epoch: 6 Global Step: 131460 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:34,995-Speed 6309.63 samples/sec Loss 7.5817 LearningRate 0.0009 Epoch: 6 Global Step: 131470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:38,224-Speed 6343.01 samples/sec Loss 7.5784 LearningRate 0.0009 Epoch: 6 Global Step: 131480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:41,474-Speed 6303.77 samples/sec Loss 7.5537 LearningRate 0.0009 Epoch: 6 Global Step: 131490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:44,728-Speed 6294.72 samples/sec Loss 7.5640 LearningRate 0.0009 Epoch: 6 Global Step: 131500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:47,979-Speed 6301.87 samples/sec Loss 7.5448 LearningRate 0.0009 Epoch: 6 Global Step: 131510 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:51,223-Speed 6314.74 samples/sec Loss 7.6080 LearningRate 0.0009 Epoch: 6 Global Step: 131520 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:54,471-Speed 6308.34 samples/sec Loss 7.5881 LearningRate 0.0009 Epoch: 6 Global Step: 131530 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:41:57,720-Speed 6303.95 samples/sec Loss 7.5757 LearningRate 0.0009 Epoch: 6 Global Step: 131540 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:00,963-Speed 6317.58 samples/sec Loss 7.5434 LearningRate 0.0009 Epoch: 6 Global Step: 131550 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
03:42:04,210-Speed 6308.69 samples/sec Loss 7.5984 LearningRate 0.0009 Epoch: 6 Global Step: 131560 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:07,443-Speed 6334.82 samples/sec Loss 7.5715 LearningRate 0.0009 Epoch: 6 Global Step: 131570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:10,687-Speed 6315.07 samples/sec Loss 7.6066 LearningRate 0.0009 Epoch: 6 Global Step: 131580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:13,935-Speed 6307.11 samples/sec Loss 7.6257 LearningRate 0.0009 Epoch: 6 Global Step: 131590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:17,182-Speed 6307.99 samples/sec Loss 7.6372 LearningRate 0.0009 Epoch: 6 Global Step: 131600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:20,428-Speed 6310.38 samples/sec Loss 7.5389 LearningRate 0.0009 Epoch: 6 Global Step: 131610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:23,674-Speed 6310.32 samples/sec Loss 7.5954 LearningRate 0.0009 Epoch: 6 Global Step: 131620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:26,917-Speed 6317.90 samples/sec Loss 7.4987 LearningRate 0.0009 Epoch: 6 Global Step: 131630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:30,165-Speed 6307.30 samples/sec Loss 7.5941 LearningRate 0.0009 Epoch: 6 Global Step: 131640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:33,413-Speed 6304.99 samples/sec Loss 7.5457 LearningRate 0.0009 Epoch: 6 Global Step: 131650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:36,658-Speed 6313.80 samples/sec Loss 7.6429 LearningRate 0.0009 Epoch: 6 Global Step: 131660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:42:39,902-Speed 6314.02 samples/sec Loss 7.6057 LearningRate 0.0009 Epoch: 6 Global Step: 131670 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:43,152-Speed 6303.29 samples/sec Loss 
7.5894 LearningRate 0.0009 Epoch: 6 Global Step: 131680 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:46,397-Speed 6311.99 samples/sec Loss 7.5728 LearningRate 0.0009 Epoch: 6 Global Step: 131690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:49,643-Speed 6310.99 samples/sec Loss 7.6230 LearningRate 0.0009 Epoch: 6 Global Step: 131700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:52,890-Speed 6308.75 samples/sec Loss 7.6443 LearningRate 0.0009 Epoch: 6 Global Step: 131710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:56,138-Speed 6306.89 samples/sec Loss 7.6384 LearningRate 0.0009 Epoch: 6 Global Step: 131720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:42:59,386-Speed 6307.25 samples/sec Loss 7.5971 LearningRate 0.0009 Epoch: 6 Global Step: 131730 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:02,631-Speed 6313.20 samples/sec Loss 7.5290 LearningRate 0.0009 Epoch: 6 Global Step: 131740 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:05,881-Speed 6303.45 samples/sec Loss 7.5796 LearningRate 0.0009 Epoch: 6 Global Step: 131750 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:09,125-Speed 6313.69 samples/sec Loss 7.5181 LearningRate 0.0009 Epoch: 6 Global Step: 131760 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:12,360-Speed 6332.29 samples/sec Loss 7.5888 LearningRate 0.0009 Epoch: 6 Global Step: 131770 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:15,603-Speed 6316.34 samples/sec Loss 7.6337 LearningRate 0.0009 Epoch: 6 Global Step: 131780 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:18,867-Speed 6276.05 samples/sec Loss 7.6064 LearningRate 0.0009 Epoch: 6 Global Step: 131790 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:22,112-Speed 6312.98 samples/sec Loss 7.5218 LearningRate 0.0009 Epoch: 6 Global 
Step: 131800 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:25,355-Speed 6316.45 samples/sec Loss 7.6170 LearningRate 0.0009 Epoch: 6 Global Step: 131810 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:43:28,585-Speed 6342.40 samples/sec Loss 7.5553 LearningRate 0.0009 Epoch: 6 Global Step: 131820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:31,831-Speed 6309.58 samples/sec Loss 7.7250 LearningRate 0.0009 Epoch: 6 Global Step: 131830 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:35,078-Speed 6309.22 samples/sec Loss 7.5783 LearningRate 0.0009 Epoch: 6 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:38,322-Speed 6314.26 samples/sec Loss 7.5855 LearningRate 0.0009 Epoch: 6 Global Step: 131850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:41,570-Speed 6307.60 samples/sec Loss 7.6144 LearningRate 0.0009 Epoch: 6 Global Step: 131860 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:44,819-Speed 6304.61 samples/sec Loss 7.5160 LearningRate 0.0009 Epoch: 6 Global Step: 131870 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:48,061-Speed 6318.45 samples/sec Loss 7.5372 LearningRate 0.0009 Epoch: 6 Global Step: 131880 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:51,307-Speed 6311.10 samples/sec Loss 7.5653 LearningRate 0.0009 Epoch: 6 Global Step: 131890 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:54,552-Speed 6311.20 samples/sec Loss 7.4655 LearningRate 0.0009 Epoch: 6 Global Step: 131900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:43:57,802-Speed 6303.48 samples/sec Loss 7.6737 LearningRate 0.0009 Epoch: 6 Global Step: 131910 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:01,053-Speed 6301.11 samples/sec Loss 7.5887 LearningRate 0.0009 Epoch: 6 Global Step: 131920 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 03:44:04,307-Speed 6296.09 samples/sec Loss 7.4840 LearningRate 0.0009 Epoch: 6 Global Step: 131930 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:07,556-Speed 6303.79 samples/sec Loss 7.5845 LearningRate 0.0009 Epoch: 6 Global Step: 131940 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:10,805-Speed 6305.98 samples/sec Loss 7.5836 LearningRate 0.0009 Epoch: 6 Global Step: 131950 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:14,039-Speed 6334.44 samples/sec Loss 7.5568 LearningRate 0.0009 Epoch: 6 Global Step: 131960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:17,285-Speed 6311.03 samples/sec Loss 7.5838 LearningRate 0.0009 Epoch: 6 Global Step: 131970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:20,530-Speed 6311.71 samples/sec Loss 7.5188 LearningRate 0.0009 Epoch: 6 Global Step: 131980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:23,779-Speed 6305.42 samples/sec Loss 7.5704 LearningRate 0.0009 Epoch: 6 Global Step: 131990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:27,026-Speed 6307.63 samples/sec Loss 7.5988 LearningRate 0.0009 Epoch: 6 Global Step: 132000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:30,273-Speed 6310.33 samples/sec Loss 7.5762 LearningRate 0.0009 Epoch: 6 Global Step: 132010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:33,519-Speed 6309.30 samples/sec Loss 7.5551 LearningRate 0.0009 Epoch: 6 Global Step: 132020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:36,770-Speed 6301.73 samples/sec Loss 7.5906 LearningRate 0.0009 Epoch: 6 Global Step: 132030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:40,022-Speed 6299.33 samples/sec Loss 7.6188 LearningRate 0.0009 Epoch: 6 Global Step: 132040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:44:43,267-Speed 6312.51 samples/sec Loss 7.5904 LearningRate 0.0009 Epoch: 6 Global Step: 132050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:44:46,514-Speed 6307.74 samples/sec Loss 7.5807 LearningRate 0.0009 Epoch: 6 Global Step: 132060 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:49,761-Speed 6309.63 samples/sec Loss 7.6530 LearningRate 0.0009 Epoch: 6 Global Step: 132070 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:53,007-Speed 6309.79 samples/sec Loss 7.5104 LearningRate 0.0009 Epoch: 6 Global Step: 132080 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:56,264-Speed 6289.81 samples/sec Loss 7.5540 LearningRate 0.0009 Epoch: 6 Global Step: 132090 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:44:59,510-Speed 6310.87 samples/sec Loss 7.4973 LearningRate 0.0009 Epoch: 6 Global Step: 132100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:45:02,758-Speed 6306.92 samples/sec Loss 7.6291 LearningRate 0.0009 Epoch: 6 Global Step: 132110 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:45:06,003-Speed 6313.20 samples/sec Loss 7.5233 LearningRate 0.0009 Epoch: 6 Global Step: 132120 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:45:09,251-Speed 6305.02 samples/sec Loss 7.5690 LearningRate 0.0009 Epoch: 6 Global Step: 132130 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:45:12,501-Speed 6304.60 samples/sec Loss 7.5391 LearningRate 0.0009 Epoch: 6 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:45:15,732-Speed 6339.29 samples/sec Loss 7.5374 LearningRate 0.0009 Epoch: 6 Global Step: 132150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:18,980-Speed 6308.52 samples/sec Loss 7.5646 LearningRate 0.0009 Epoch: 6 Global Step: 132160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:22,222-Speed 6317.90 samples/sec Loss 
7.6420 LearningRate 0.0009 Epoch: 6 Global Step: 132170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:25,468-Speed 6309.79 samples/sec Loss 7.6024 LearningRate 0.0009 Epoch: 6 Global Step: 132180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:28,728-Speed 6284.07 samples/sec Loss 7.5409 LearningRate 0.0009 Epoch: 6 Global Step: 132190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:31,973-Speed 6313.08 samples/sec Loss 7.5979 LearningRate 0.0009 Epoch: 6 Global Step: 132200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:35,215-Speed 6318.16 samples/sec Loss 7.5499 LearningRate 0.0009 Epoch: 6 Global Step: 132210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:38,461-Speed 6311.74 samples/sec Loss 7.5533 LearningRate 0.0009 Epoch: 6 Global Step: 132220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:41,706-Speed 6312.11 samples/sec Loss 7.5209 LearningRate 0.0009 Epoch: 6 Global Step: 132230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:44,956-Speed 6303.31 samples/sec Loss 7.5669 LearningRate 0.0009 Epoch: 6 Global Step: 132240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:48,184-Speed 6346.38 samples/sec Loss 7.5802 LearningRate 0.0009 Epoch: 6 Global Step: 132250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:51,433-Speed 6305.19 samples/sec Loss 7.5789 LearningRate 0.0009 Epoch: 6 Global Step: 132260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:54,677-Speed 6314.43 samples/sec Loss 7.6219 LearningRate 0.0009 Epoch: 6 Global Step: 132270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:45:57,919-Speed 6317.11 samples/sec Loss 7.5950 LearningRate 0.0009 Epoch: 6 Global Step: 132280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:01,167-Speed 6307.80 samples/sec Loss 7.5546 LearningRate 0.0009 Epoch: 6 Global 
Step: 132290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:04,414-Speed 6309.36 samples/sec Loss 7.5755 LearningRate 0.0009 Epoch: 6 Global Step: 132300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:07,668-Speed 6294.07 samples/sec Loss 7.5922 LearningRate 0.0009 Epoch: 6 Global Step: 132310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:10,914-Speed 6311.72 samples/sec Loss 7.5513 LearningRate 0.0009 Epoch: 6 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:14,158-Speed 6314.72 samples/sec Loss 7.5227 LearningRate 0.0009 Epoch: 6 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:17,400-Speed 6318.18 samples/sec Loss 7.5755 LearningRate 0.0009 Epoch: 6 Global Step: 132340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:20,646-Speed 6310.98 samples/sec Loss 7.5644 LearningRate 0.0009 Epoch: 6 Global Step: 132350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:46:23,893-Speed 6308.63 samples/sec Loss 7.5549 LearningRate 0.0009 Epoch: 6 Global Step: 132360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:46:27,139-Speed 6311.33 samples/sec Loss 7.5488 LearningRate 0.0009 Epoch: 6 Global Step: 132370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:46:30,387-Speed 6306.47 samples/sec Loss 7.4909 LearningRate 0.0009 Epoch: 6 Global Step: 132380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:46:33,634-Speed 6307.91 samples/sec Loss 7.6152 LearningRate 0.0009 Epoch: 6 Global Step: 132390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:46:36,870-Speed 6331.12 samples/sec Loss 7.5918 LearningRate 0.0009 Epoch: 6 Global Step: 132400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:40,117-Speed 6308.29 samples/sec Loss 7.4616 LearningRate 0.0009 Epoch: 6 Global Step: 132410 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 03:46:43,365-Speed 6307.35 samples/sec Loss 7.5555 LearningRate 0.0009 Epoch: 6 Global Step: 132420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:46,617-Speed 6300.16 samples/sec Loss 7.5992 LearningRate 0.0009 Epoch: 6 Global Step: 132430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:49,861-Speed 6314.16 samples/sec Loss 7.5641 LearningRate 0.0009 Epoch: 6 Global Step: 132440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:53,104-Speed 6315.75 samples/sec Loss 7.5411 LearningRate 0.0009 Epoch: 6 Global Step: 132450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:56,352-Speed 6306.54 samples/sec Loss 7.6186 LearningRate 0.0009 Epoch: 6 Global Step: 132460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:46:59,601-Speed 6305.06 samples/sec Loss 7.5711 LearningRate 0.0009 Epoch: 6 Global Step: 132470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:02,856-Speed 6293.40 samples/sec Loss 7.6352 LearningRate 0.0009 Epoch: 6 Global Step: 132480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:06,108-Speed 6299.83 samples/sec Loss 7.5842 LearningRate 0.0009 Epoch: 6 Global Step: 132490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:09,353-Speed 6312.05 samples/sec Loss 7.6505 LearningRate 0.0009 Epoch: 6 Global Step: 132500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:47:12,584-Speed 6340.27 samples/sec Loss 7.6024 LearningRate 0.0009 Epoch: 6 Global Step: 132510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:15,835-Speed 6301.18 samples/sec Loss 7.6636 LearningRate 0.0009 Epoch: 6 Global Step: 132520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:19,083-Speed 6307.12 samples/sec Loss 7.5717 LearningRate 0.0009 Epoch: 6 Global Step: 132530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:47:22,331-Speed 6305.43 samples/sec Loss 7.4668 LearningRate 0.0009 Epoch: 6 Global Step: 132540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:25,582-Speed 6301.08 samples/sec Loss 7.5249 LearningRate 0.0009 Epoch: 6 Global Step: 132550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:28,829-Speed 6309.02 samples/sec Loss 7.5641 LearningRate 0.0009 Epoch: 6 Global Step: 132560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:32,076-Speed 6309.04 samples/sec Loss 7.6428 LearningRate 0.0009 Epoch: 6 Global Step: 132570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:35,326-Speed 6303.92 samples/sec Loss 7.5072 LearningRate 0.0009 Epoch: 6 Global Step: 132580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:38,573-Speed 6308.38 samples/sec Loss 7.5687 LearningRate 0.0009 Epoch: 6 Global Step: 132590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:41,822-Speed 6305.20 samples/sec Loss 7.5903 LearningRate 0.0009 Epoch: 6 Global Step: 132600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:45,069-Speed 6308.06 samples/sec Loss 7.5462 LearningRate 0.0009 Epoch: 6 Global Step: 132610 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:47:48,316-Speed 6308.83 samples/sec Loss 7.5750 LearningRate 0.0009 Epoch: 6 Global Step: 132620 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:47:51,546-Speed 6342.33 samples/sec Loss 7.5192 LearningRate 0.0009 Epoch: 6 Global Step: 132630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:54,796-Speed 6303.47 samples/sec Loss 7.6816 LearningRate 0.0009 Epoch: 6 Global Step: 132640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:47:58,070-Speed 6256.09 samples/sec Loss 7.6146 LearningRate 0.0009 Epoch: 6 Global Step: 132650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:01,317-Speed 6309.93 samples/sec Loss 
7.5128 LearningRate 0.0009 Epoch: 6 Global Step: 132660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:04,609-Speed 6222.51 samples/sec Loss 7.5873 LearningRate 0.0009 Epoch: 6 Global Step: 132670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:07,855-Speed 6312.32 samples/sec Loss 7.5087 LearningRate 0.0009 Epoch: 6 Global Step: 132680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:11,104-Speed 6304.65 samples/sec Loss 7.5891 LearningRate 0.0009 Epoch: 6 Global Step: 132690 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:14,348-Speed 6313.84 samples/sec Loss 7.4849 LearningRate 0.0009 Epoch: 6 Global Step: 132700 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:17,599-Speed 6301.46 samples/sec Loss 7.5469 LearningRate 0.0009 Epoch: 6 Global Step: 132710 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:20,841-Speed 6317.46 samples/sec Loss 7.6442 LearningRate 0.0009 Epoch: 6 Global Step: 132720 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:24,089-Speed 6308.50 samples/sec Loss 7.5392 LearningRate 0.0009 Epoch: 6 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:48:27,339-Speed 6301.82 samples/sec Loss 7.5599 LearningRate 0.0009 Epoch: 6 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:48:30,583-Speed 6313.59 samples/sec Loss 7.6267 LearningRate 0.0009 Epoch: 6 Global Step: 132750 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:48:33,819-Speed 6330.42 samples/sec Loss 7.6676 LearningRate 0.0009 Epoch: 6 Global Step: 132760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:37,065-Speed 6311.63 samples/sec Loss 7.5645 LearningRate 0.0009 Epoch: 6 Global Step: 132770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:40,314-Speed 6305.09 samples/sec Loss 7.6391 LearningRate 0.0009 Epoch: 6 Global 
Step: 132780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:43,559-Speed 6312.07 samples/sec Loss 7.5574 LearningRate 0.0009 Epoch: 6 Global Step: 132790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:46,844-Speed 6236.26 samples/sec Loss 7.5104 LearningRate 0.0009 Epoch: 6 Global Step: 132800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:50,089-Speed 6313.39 samples/sec Loss 7.5429 LearningRate 0.0009 Epoch: 6 Global Step: 132810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:53,335-Speed 6310.85 samples/sec Loss 7.5039 LearningRate 0.0009 Epoch: 6 Global Step: 132820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:56,587-Speed 6298.49 samples/sec Loss 7.5200 LearningRate 0.0009 Epoch: 6 Global Step: 132830 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:48:59,833-Speed 6311.04 samples/sec Loss 7.5156 LearningRate 0.0009 Epoch: 6 Global Step: 132840 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:49:03,074-Speed 6319.49 samples/sec Loss 7.5461 LearningRate 0.0009 Epoch: 6 Global Step: 132850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:49:06,321-Speed 6310.13 samples/sec Loss 7.4245 LearningRate 0.0009 Epoch: 6 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:09,580-Speed 6284.88 samples/sec Loss 7.5775 LearningRate 0.0009 Epoch: 6 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:12,824-Speed 6313.84 samples/sec Loss 7.4908 LearningRate 0.0009 Epoch: 6 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:16,070-Speed 6310.93 samples/sec Loss 7.5648 LearningRate 0.0009 Epoch: 6 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:19,320-Speed 6303.98 samples/sec Loss 7.4862 LearningRate 0.0009 Epoch: 6 Global Step: 132900 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 03:49:22,565-Speed 6311.13 samples/sec Loss 7.5529 LearningRate 0.0009 Epoch: 6 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:25,814-Speed 6305.78 samples/sec Loss 7.5749 LearningRate 0.0009 Epoch: 6 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:29,060-Speed 6310.22 samples/sec Loss 7.5291 LearningRate 0.0009 Epoch: 6 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:32,304-Speed 6314.94 samples/sec Loss 7.5844 LearningRate 0.0009 Epoch: 6 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:35,558-Speed 6295.11 samples/sec Loss 7.5360 LearningRate 0.0009 Epoch: 6 Global Step: 132950 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:38,790-Speed 6338.08 samples/sec Loss 7.5033 LearningRate 0.0009 Epoch: 6 Global Step: 132960 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:42,034-Speed 6315.78 samples/sec Loss 7.6029 LearningRate 0.0009 Epoch: 6 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:45,284-Speed 6300.99 samples/sec Loss 7.5686 LearningRate 0.0009 Epoch: 6 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:48,530-Speed 6312.65 samples/sec Loss 7.5204 LearningRate 0.0009 Epoch: 6 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:49:51,764-Speed 6332.49 samples/sec Loss 7.5554 LearningRate 0.0009 Epoch: 6 Global Step: 133000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:49:55,009-Speed 6312.94 samples/sec Loss 7.5896 LearningRate 0.0009 Epoch: 6 Global Step: 133010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:49:58,257-Speed 6307.83 samples/sec Loss 7.6354 LearningRate 0.0009 Epoch: 6 Global Step: 133020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:50:01,502-Speed 6313.72 samples/sec Loss 7.5989 LearningRate 0.0009 Epoch: 6 Global Step: 133030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:04,760-Speed 6285.81 samples/sec Loss 7.4472 LearningRate 0.0009 Epoch: 6 Global Step: 133040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:08,006-Speed 6311.76 samples/sec Loss 7.5453 LearningRate 0.0009 Epoch: 6 Global Step: 133050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:11,251-Speed 6312.64 samples/sec Loss 7.4941 LearningRate 0.0009 Epoch: 6 Global Step: 133060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:14,497-Speed 6311.37 samples/sec Loss 7.5607 LearningRate 0.0009 Epoch: 6 Global Step: 133070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:17,743-Speed 6310.08 samples/sec Loss 7.5832 LearningRate 0.0009 Epoch: 6 Global Step: 133080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:20,989-Speed 6310.03 samples/sec Loss 7.5448 LearningRate 0.0009 Epoch: 6 Global Step: 133090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:50:24,243-Speed 6295.36 samples/sec Loss 7.4516 LearningRate 0.0009 Epoch: 6 Global Step: 133100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:27,490-Speed 6308.66 samples/sec Loss 7.5272 LearningRate 0.0009 Epoch: 6 Global Step: 133110 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:30,733-Speed 6316.89 samples/sec Loss 7.5393 LearningRate 0.0009 Epoch: 6 Global Step: 133120 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:33,976-Speed 6315.57 samples/sec Loss 7.5786 LearningRate 0.0009 Epoch: 6 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:37,224-Speed 6307.85 samples/sec Loss 7.5015 LearningRate 0.0009 Epoch: 6 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:40,472-Speed 6307.34 samples/sec Loss 
7.5266 LearningRate 0.0009 Epoch: 6 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:43,719-Speed 6307.78 samples/sec Loss 7.5403 LearningRate 0.0009 Epoch: 6 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:46,967-Speed 6306.91 samples/sec Loss 7.5563 LearningRate 0.0009 Epoch: 6 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:50,212-Speed 6312.17 samples/sec Loss 7.5606 LearningRate 0.0009 Epoch: 6 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:53,464-Speed 6300.13 samples/sec Loss 7.5614 LearningRate 0.0009 Epoch: 6 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:56,693-Speed 6344.81 samples/sec Loss 7.5828 LearningRate 0.0009 Epoch: 6 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:50:59,937-Speed 6312.81 samples/sec Loss 7.5065 LearningRate 0.0009 Epoch: 6 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:03,183-Speed 6311.55 samples/sec Loss 7.5492 LearningRate 0.0009 Epoch: 6 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:06,427-Speed 6315.14 samples/sec Loss 7.6033 LearningRate 0.0009 Epoch: 6 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:09,671-Speed 6313.00 samples/sec Loss 7.5013 LearningRate 0.0009 Epoch: 6 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:12,918-Speed 6310.28 samples/sec Loss 7.5331 LearningRate 0.0009 Epoch: 6 Global Step: 133250 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:16,151-Speed 6336.55 samples/sec Loss 7.5272 LearningRate 0.0009 Epoch: 6 Global Step: 133260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:19,400-Speed 6304.56 samples/sec Loss 7.5050 LearningRate 0.0009 Epoch: 6 Global 
Step: 133270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:22,650-Speed 6303.30 samples/sec Loss 7.5948 LearningRate 0.0009 Epoch: 6 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:25,893-Speed 6316.73 samples/sec Loss 7.5586 LearningRate 0.0009 Epoch: 6 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:29,135-Speed 6317.42 samples/sec Loss 7.5607 LearningRate 0.0009 Epoch: 6 Global Step: 133300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:32,381-Speed 6310.69 samples/sec Loss 7.5458 LearningRate 0.0009 Epoch: 6 Global Step: 133310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:35,625-Speed 6314.03 samples/sec Loss 7.5001 LearningRate 0.0009 Epoch: 6 Global Step: 133320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:38,869-Speed 6316.51 samples/sec Loss 7.5572 LearningRate 0.0009 Epoch: 6 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:42,115-Speed 6309.87 samples/sec Loss 7.5390 LearningRate 0.0009 Epoch: 6 Global Step: 133340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:45,360-Speed 6313.51 samples/sec Loss 7.5986 LearningRate 0.0009 Epoch: 6 Global Step: 133350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:51:48,608-Speed 6306.06 samples/sec Loss 7.6036 LearningRate 0.0009 Epoch: 6 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:51,859-Speed 6301.16 samples/sec Loss 7.4836 LearningRate 0.0009 Epoch: 6 Global Step: 133370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:55,105-Speed 6311.14 samples/sec Loss 7.6639 LearningRate 0.0009 Epoch: 6 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:51:58,364-Speed 6285.42 samples/sec Loss 7.4958 LearningRate 0.0009 Epoch: 6 Global Step: 133390 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 03:52:01,617-Speed 6297.26 samples/sec Loss 7.4887 LearningRate 0.0009 Epoch: 6 Global Step: 133400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:52:04,866-Speed 6304.36 samples/sec Loss 7.6131 LearningRate 0.0009 Epoch: 6 Global Step: 133410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:52:08,115-Speed 6305.81 samples/sec Loss 7.5411 LearningRate 0.0009 Epoch: 6 Global Step: 133420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:52:11,354-Speed 6324.06 samples/sec Loss 7.4503 LearningRate 0.0009 Epoch: 6 Global Step: 133430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:14,615-Speed 6280.81 samples/sec Loss 7.5024 LearningRate 0.0009 Epoch: 6 Global Step: 133440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:17,863-Speed 6308.70 samples/sec Loss 7.5372 LearningRate 0.0009 Epoch: 6 Global Step: 133450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:21,109-Speed 6310.44 samples/sec Loss 7.5118 LearningRate 0.0009 Epoch: 6 Global Step: 133460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:24,355-Speed 6311.27 samples/sec Loss 7.4354 LearningRate 0.0009 Epoch: 6 Global Step: 133470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:27,603-Speed 6306.67 samples/sec Loss 7.5646 LearningRate 0.0009 Epoch: 6 Global Step: 133480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:30,849-Speed 6310.65 samples/sec Loss 7.5247 LearningRate 0.0009 Epoch: 6 Global Step: 133490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:34,108-Speed 6285.88 samples/sec Loss 7.5199 LearningRate 0.0009 Epoch: 6 Global Step: 133500 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:37,359-Speed 6299.70 samples/sec Loss 7.5047 LearningRate 0.0009 Epoch: 6 Global Step: 133510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:52:40,610-Speed 6302.49 samples/sec Loss 7.6030 LearningRate 0.0009 Epoch: 6 Global Step: 133520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:43,847-Speed 6326.53 samples/sec Loss 7.5511 LearningRate 0.0009 Epoch: 6 Global Step: 133530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:47,096-Speed 6306.76 samples/sec Loss 7.4696 LearningRate 0.0009 Epoch: 6 Global Step: 133540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:50,345-Speed 6303.98 samples/sec Loss 7.5847 LearningRate 0.0009 Epoch: 6 Global Step: 133550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:53,594-Speed 6304.69 samples/sec Loss 7.6117 LearningRate 0.0009 Epoch: 6 Global Step: 133560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:52:56,842-Speed 6306.44 samples/sec Loss 7.5999 LearningRate 0.0009 Epoch: 6 Global Step: 133570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:00,090-Speed 6307.93 samples/sec Loss 7.5649 LearningRate 0.0009 Epoch: 6 Global Step: 133580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:03,340-Speed 6301.74 samples/sec Loss 7.6413 LearningRate 0.0009 Epoch: 6 Global Step: 133590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:06,589-Speed 6305.33 samples/sec Loss 7.5535 LearningRate 0.0009 Epoch: 6 Global Step: 133600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:09,835-Speed 6310.86 samples/sec Loss 7.4771 LearningRate 0.0009 Epoch: 6 Global Step: 133610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:13,093-Speed 6286.57 samples/sec Loss 7.6294 LearningRate 0.0009 Epoch: 6 Global Step: 133620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:16,344-Speed 6301.98 samples/sec Loss 7.4624 LearningRate 0.0009 Epoch: 6 Global Step: 133630 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:19,587-Speed 6316.21 samples/sec Loss 
7.5430 LearningRate 0.0009 Epoch: 6 Global Step: 133640 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:22,835-Speed 6306.92 samples/sec Loss 7.5102 LearningRate 0.0009 Epoch: 6 Global Step: 133650 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:26,085-Speed 6304.11 samples/sec Loss 7.5997 LearningRate 0.0009 Epoch: 6 Global Step: 133660 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:29,330-Speed 6311.42 samples/sec Loss 7.4531 LearningRate 0.0009 Epoch: 6 Global Step: 133670 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:32,579-Speed 6305.22 samples/sec Loss 7.6358 LearningRate 0.0009 Epoch: 6 Global Step: 133680 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:35,824-Speed 6312.68 samples/sec Loss 7.4301 LearningRate 0.0009 Epoch: 6 Global Step: 133690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:39,072-Speed 6307.63 samples/sec Loss 7.5106 LearningRate 0.0009 Epoch: 6 Global Step: 133700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:42,321-Speed 6304.60 samples/sec Loss 7.5272 LearningRate 0.0009 Epoch: 6 Global Step: 133710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:45,584-Speed 6277.60 samples/sec Loss 7.5224 LearningRate 0.0009 Epoch: 6 Global Step: 133720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:53:48,813-Speed 6343.44 samples/sec Loss 7.5066 LearningRate 0.0009 Epoch: 6 Global Step: 133730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:52,056-Speed 6317.89 samples/sec Loss 7.5196 LearningRate 0.0009 Epoch: 6 Global Step: 133740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:55,300-Speed 6313.21 samples/sec Loss 7.6044 LearningRate 0.0009 Epoch: 6 Global Step: 133750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:53:58,544-Speed 6316.23 samples/sec Loss 7.5526 LearningRate 0.0009 Epoch: 6 Global 
Step: 133760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:01,795-Speed 6300.39 samples/sec Loss 7.5758 LearningRate 0.0009 Epoch: 6 Global Step: 133770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:05,039-Speed 6313.07 samples/sec Loss 7.4760 LearningRate 0.0009 Epoch: 6 Global Step: 133780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:08,286-Speed 6310.21 samples/sec Loss 7.5224 LearningRate 0.0009 Epoch: 6 Global Step: 133790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:11,533-Speed 6308.58 samples/sec Loss 7.4741 LearningRate 0.0009 Epoch: 6 Global Step: 133800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:14,775-Speed 6318.40 samples/sec Loss 7.4663 LearningRate 0.0009 Epoch: 6 Global Step: 133810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:18,018-Speed 6316.29 samples/sec Loss 7.6110 LearningRate 0.0009 Epoch: 6 Global Step: 133820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:21,262-Speed 6314.58 samples/sec Loss 7.4852 LearningRate 0.0009 Epoch: 6 Global Step: 133830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:54:24,507-Speed 6312.72 samples/sec Loss 7.5149 LearningRate 0.0009 Epoch: 6 Global Step: 133840 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:54:27,757-Speed 6302.59 samples/sec Loss 7.4873 LearningRate 0.0009 Epoch: 6 Global Step: 133850 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:54:30,987-Speed 6342.86 samples/sec Loss 7.5416 LearningRate 0.0009 Epoch: 6 Global Step: 133860 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:34,233-Speed 6310.05 samples/sec Loss 7.5466 LearningRate 0.0009 Epoch: 6 Global Step: 133870 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:37,476-Speed 6316.15 samples/sec Loss 7.5474 LearningRate 0.0009 Epoch: 6 Global Step: 133880 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 03:54:40,724-Speed 6308.06 samples/sec Loss 7.5201 LearningRate 0.0009 Epoch: 6 Global Step: 133890 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:43,969-Speed 6313.18 samples/sec Loss 7.4774 LearningRate 0.0009 Epoch: 6 Global Step: 133900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:47,213-Speed 6313.11 samples/sec Loss 7.4822 LearningRate 0.0009 Epoch: 6 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:50,460-Speed 6308.67 samples/sec Loss 7.5468 LearningRate 0.0009 Epoch: 6 Global Step: 133920 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:53,707-Speed 6310.28 samples/sec Loss 7.4898 LearningRate 0.0009 Epoch: 6 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:54:56,950-Speed 6316.11 samples/sec Loss 7.5584 LearningRate 0.0009 Epoch: 6 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:00,194-Speed 6315.32 samples/sec Loss 7.5477 LearningRate 0.0009 Epoch: 6 Global Step: 133950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:03,443-Speed 6303.58 samples/sec Loss 7.5282 LearningRate 0.0009 Epoch: 6 Global Step: 133960 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:55:06,692-Speed 6305.60 samples/sec Loss 7.5632 LearningRate 0.0009 Epoch: 6 Global Step: 133970 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:55:09,927-Speed 6332.39 samples/sec Loss 7.5070 LearningRate 0.0009 Epoch: 6 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:13,174-Speed 6308.63 samples/sec Loss 7.5065 LearningRate 0.0009 Epoch: 6 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:16,423-Speed 6303.70 samples/sec Loss 7.5111 LearningRate 0.0009 Epoch: 6 Global Step: 134000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
03:55:19,671-Speed 6307.07 samples/sec Loss 7.5376 LearningRate 0.0009 Epoch: 6 Global Step: 134010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:22,918-Speed 6309.24 samples/sec Loss 7.5326 LearningRate 0.0009 Epoch: 6 Global Step: 134020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:26,162-Speed 6314.74 samples/sec Loss 7.6111 LearningRate 0.0009 Epoch: 6 Global Step: 134030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:29,416-Speed 6299.29 samples/sec Loss 7.4619 LearningRate 0.0009 Epoch: 6 Global Step: 134040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:32,664-Speed 6307.02 samples/sec Loss 7.5542 LearningRate 0.0009 Epoch: 6 Global Step: 134050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:35,910-Speed 6310.26 samples/sec Loss 7.4793 LearningRate 0.0009 Epoch: 6 Global Step: 134060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:39,160-Speed 6302.06 samples/sec Loss 7.5418 LearningRate 0.0009 Epoch: 6 Global Step: 134070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:42,407-Speed 6309.23 samples/sec Loss 7.5428 LearningRate 0.0009 Epoch: 6 Global Step: 134080 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:55:45,652-Speed 6313.37 samples/sec Loss 7.5756 LearningRate 0.0009 Epoch: 6 Global Step: 134090 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:55:48,902-Speed 6303.45 samples/sec Loss 7.5566 LearningRate 0.0009 Epoch: 6 Global Step: 134100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:55:52,140-Speed 6325.90 samples/sec Loss 7.5306 LearningRate 0.0009 Epoch: 6 Global Step: 134110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:55,391-Speed 6301.13 samples/sec Loss 7.6431 LearningRate 0.0009 Epoch: 6 Global Step: 134120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:55:58,637-Speed 6311.96 samples/sec Loss 
7.4762 LearningRate 0.0009 Epoch: 6 Global Step: 134130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:01,879-Speed 6318.03 samples/sec Loss 7.5041 LearningRate 0.0009 Epoch: 6 Global Step: 134140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:05,127-Speed 6305.64 samples/sec Loss 7.5508 LearningRate 0.0009 Epoch: 6 Global Step: 134150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:08,381-Speed 6296.00 samples/sec Loss 7.5734 LearningRate 0.0009 Epoch: 6 Global Step: 134160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:11,628-Speed 6309.59 samples/sec Loss 7.4944 LearningRate 0.0009 Epoch: 6 Global Step: 134170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:14,873-Speed 6311.47 samples/sec Loss 7.5258 LearningRate 0.0009 Epoch: 6 Global Step: 134180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:18,118-Speed 6312.53 samples/sec Loss 7.6304 LearningRate 0.0009 Epoch: 6 Global Step: 134190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:21,366-Speed 6307.30 samples/sec Loss 7.5843 LearningRate 0.0009 Epoch: 6 Global Step: 134200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:56:24,612-Speed 6311.07 samples/sec Loss 7.4407 LearningRate 0.0009 Epoch: 6 Global Step: 134210 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:27,860-Speed 6306.07 samples/sec Loss 7.5288 LearningRate 0.0009 Epoch: 6 Global Step: 134220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:31,112-Speed 6299.65 samples/sec Loss 7.4701 LearningRate 0.0009 Epoch: 6 Global Step: 134230 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:34,360-Speed 6307.13 samples/sec Loss 7.5068 LearningRate 0.0009 Epoch: 6 Global Step: 134240 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:37,611-Speed 6299.98 samples/sec Loss 7.5480 LearningRate 0.0009 Epoch: 6 Global 
Step: 134250 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:40,857-Speed 6311.05 samples/sec Loss 7.5175 LearningRate 0.0009 Epoch: 6 Global Step: 134260 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:44,105-Speed 6307.28 samples/sec Loss 7.5346 LearningRate 0.0009 Epoch: 6 Global Step: 134270 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:47,356-Speed 6299.77 samples/sec Loss 7.5192 LearningRate 0.0009 Epoch: 6 Global Step: 134280 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:50,601-Speed 6312.74 samples/sec Loss 7.4232 LearningRate 0.0009 Epoch: 6 Global Step: 134290 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:53,856-Speed 6294.36 samples/sec Loss 7.5051 LearningRate 0.0009 Epoch: 6 Global Step: 134300 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:56:57,088-Speed 6338.01 samples/sec Loss 7.6275 LearningRate 0.0009 Epoch: 6 Global Step: 134310 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:00,335-Speed 6309.45 samples/sec Loss 7.4431 LearningRate 0.0009 Epoch: 6 Global Step: 134320 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:03,582-Speed 6309.19 samples/sec Loss 7.5000 LearningRate 0.0009 Epoch: 6 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:06,827-Speed 6311.62 samples/sec Loss 7.5220 LearningRate 0.0009 Epoch: 6 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:10,073-Speed 6310.60 samples/sec Loss 7.5564 LearningRate 0.0009 Epoch: 6 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:13,305-Speed 6339.41 samples/sec Loss 7.4653 LearningRate 0.0009 Epoch: 6 Global Step: 134360 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:16,555-Speed 6300.97 samples/sec Loss 7.5524 LearningRate 0.0009 Epoch: 6 Global Step: 134370 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 03:57:19,802-Speed 6308.64 samples/sec Loss 7.4844 LearningRate 0.0009 Epoch: 6 Global Step: 134380 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:23,049-Speed 6309.19 samples/sec Loss 7.4818 LearningRate 0.0009 Epoch: 6 Global Step: 134390 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:26,308-Speed 6286.94 samples/sec Loss 7.4768 LearningRate 0.0009 Epoch: 6 Global Step: 134400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:29,551-Speed 6316.81 samples/sec Loss 7.4644 LearningRate 0.0009 Epoch: 6 Global Step: 134410 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:32,798-Speed 6308.38 samples/sec Loss 7.4103 LearningRate 0.0009 Epoch: 6 Global Step: 134420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:36,040-Speed 6316.71 samples/sec Loss 7.5376 LearningRate 0.0009 Epoch: 6 Global Step: 134430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:39,290-Speed 6304.58 samples/sec Loss 7.5998 LearningRate 0.0009 Epoch: 6 Global Step: 134440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:42,537-Speed 6308.33 samples/sec Loss 7.5599 LearningRate 0.0009 Epoch: 6 Global Step: 134450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:57:45,790-Speed 6297.06 samples/sec Loss 7.5095 LearningRate 0.0009 Epoch: 6 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:49,121-Speed 6148.65 samples/sec Loss 7.4939 LearningRate 0.0009 Epoch: 6 Global Step: 134470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:52,405-Speed 6238.61 samples/sec Loss 7.5303 LearningRate 0.0009 Epoch: 6 Global Step: 134480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:57:55,653-Speed 6307.46 samples/sec Loss 7.6116 LearningRate 0.0009 Epoch: 6 Global Step: 134490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
03:57:58,899-Speed 6310.37 samples/sec Loss 7.5209 LearningRate 0.0009 Epoch: 6 Global Step: 134500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:58:02,149-Speed 6302.81 samples/sec Loss 7.4787 LearningRate 0.0009 Epoch: 6 Global Step: 134510 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:58:05,383-Speed 6334.75 samples/sec Loss 7.5465 LearningRate 0.0009 Epoch: 6 Global Step: 134520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:08,626-Speed 6316.30 samples/sec Loss 7.4927 LearningRate 0.0009 Epoch: 6 Global Step: 134530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:11,873-Speed 6308.95 samples/sec Loss 7.5310 LearningRate 0.0009 Epoch: 6 Global Step: 134540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:15,119-Speed 6310.42 samples/sec Loss 7.5561 LearningRate 0.0009 Epoch: 6 Global Step: 134550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:18,367-Speed 6307.50 samples/sec Loss 7.5245 LearningRate 0.0009 Epoch: 6 Global Step: 134560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:21,615-Speed 6306.02 samples/sec Loss 7.5523 LearningRate 0.0009 Epoch: 6 Global Step: 134570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:24,861-Speed 6311.52 samples/sec Loss 7.5934 LearningRate 0.0009 Epoch: 6 Global Step: 134580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:28,119-Speed 6287.09 samples/sec Loss 7.5410 LearningRate 0.0009 Epoch: 6 Global Step: 134590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:31,369-Speed 6302.87 samples/sec Loss 7.4642 LearningRate 0.0009 Epoch: 6 Global Step: 134600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:34,617-Speed 6307.93 samples/sec Loss 7.4457 LearningRate 0.0009 Epoch: 6 Global Step: 134610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:37,865-Speed 6305.37 samples/sec Loss 
7.4614 LearningRate 0.0009 Epoch: 6 Global Step: 134620 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:58:41,115-Speed 6303.63 samples/sec Loss 7.5390 LearningRate 0.0009 Epoch: 6 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:58:44,380-Speed 6273.84 samples/sec Loss 7.4877 LearningRate 0.0009 Epoch: 6 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:58:47,757-Speed 6065.99 samples/sec Loss 7.4781 LearningRate 0.0009 Epoch: 6 Global Step: 134650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:51,160-Speed 6019.91 samples/sec Loss 7.4914 LearningRate 0.0009 Epoch: 6 Global Step: 134660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:54,514-Speed 6107.02 samples/sec Loss 7.4747 LearningRate 0.0009 Epoch: 6 Global Step: 134670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:58:57,758-Speed 6315.22 samples/sec Loss 7.5765 LearningRate 0.0009 Epoch: 6 Global Step: 134680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:01,009-Speed 6301.47 samples/sec Loss 7.5698 LearningRate 0.0009 Epoch: 6 Global Step: 134690 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:04,253-Speed 6313.91 samples/sec Loss 7.4650 LearningRate 0.0009 Epoch: 6 Global Step: 134700 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:07,502-Speed 6305.14 samples/sec Loss 7.5442 LearningRate 0.0009 Epoch: 6 Global Step: 134710 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:10,744-Speed 6317.89 samples/sec Loss 7.5241 LearningRate 0.0009 Epoch: 6 Global Step: 134720 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:13,991-Speed 6310.74 samples/sec Loss 7.5430 LearningRate 0.0009 Epoch: 6 Global Step: 134730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:17,236-Speed 6311.54 samples/sec Loss 7.4683 LearningRate 0.0009 Epoch: 6 Global 
Step: 134740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:20,483-Speed 6309.77 samples/sec Loss 7.5794 LearningRate 0.0009 Epoch: 6 Global Step: 134750 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 03:59:23,713-Speed 6342.04 samples/sec Loss 7.4667 LearningRate 0.0009 Epoch: 6 Global Step: 134760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:26,965-Speed 6298.65 samples/sec Loss 7.4687 LearningRate 0.0009 Epoch: 6 Global Step: 134770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:30,209-Speed 6314.40 samples/sec Loss 7.5153 LearningRate 0.0009 Epoch: 6 Global Step: 134780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:33,456-Speed 6308.59 samples/sec Loss 7.6089 LearningRate 0.0009 Epoch: 6 Global Step: 134790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:36,698-Speed 6319.32 samples/sec Loss 7.5069 LearningRate 0.0009 Epoch: 6 Global Step: 134800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:39,942-Speed 6313.39 samples/sec Loss 7.4796 LearningRate 0.0009 Epoch: 6 Global Step: 134810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:43,188-Speed 6310.98 samples/sec Loss 7.5322 LearningRate 0.0009 Epoch: 6 Global Step: 134820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:46,435-Speed 6308.50 samples/sec Loss 7.4996 LearningRate 0.0009 Epoch: 6 Global Step: 134830 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:49,678-Speed 6317.50 samples/sec Loss 7.4884 LearningRate 0.0009 Epoch: 6 Global Step: 134840 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:52,947-Speed 6267.13 samples/sec Loss 7.5744 LearningRate 0.0009 Epoch: 6 Global Step: 134850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 03:59:56,193-Speed 6308.89 samples/sec Loss 7.5178 LearningRate 0.0009 Epoch: 6 Global Step: 134860 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 03:59:59,510-Speed 6177.03 samples/sec Loss 7.4596 LearningRate 0.0009 Epoch: 6 Global Step: 134870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:02,848-Speed 6136.91 samples/sec Loss 7.4297 LearningRate 0.0009 Epoch: 6 Global Step: 134880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:06,095-Speed 6307.28 samples/sec Loss 7.5399 LearningRate 0.0009 Epoch: 6 Global Step: 134890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:09,349-Speed 6295.93 samples/sec Loss 7.5713 LearningRate 0.0009 Epoch: 6 Global Step: 134900 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:12,602-Speed 6297.85 samples/sec Loss 7.4666 LearningRate 0.0009 Epoch: 6 Global Step: 134910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:15,845-Speed 6316.79 samples/sec Loss 7.4468 LearningRate 0.0009 Epoch: 6 Global Step: 134920 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:19,091-Speed 6308.87 samples/sec Loss 7.4989 LearningRate 0.0009 Epoch: 6 Global Step: 134930 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:22,326-Speed 6333.89 samples/sec Loss 7.5002 LearningRate 0.0009 Epoch: 6 Global Step: 134940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:25,570-Speed 6314.60 samples/sec Loss 7.4892 LearningRate 0.0009 Epoch: 6 Global Step: 134950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:28,817-Speed 6309.26 samples/sec Loss 7.4477 LearningRate 0.0009 Epoch: 6 Global Step: 134960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:32,097-Speed 6245.45 samples/sec Loss 7.5043 LearningRate 0.0009 Epoch: 6 Global Step: 134970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:35,344-Speed 6309.34 samples/sec Loss 7.5244 LearningRate 0.0009 Epoch: 6 Global Step: 134980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:00:38,600-Speed 6290.44 samples/sec Loss 7.6263 LearningRate 0.0009 Epoch: 6 Global Step: 134990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:41,848-Speed 6306.40 samples/sec Loss 7.4699 LearningRate 0.0009 Epoch: 6 Global Step: 135000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:45,096-Speed 6307.36 samples/sec Loss 7.4706 LearningRate 0.0009 Epoch: 6 Global Step: 135010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:48,341-Speed 6312.69 samples/sec Loss 7.4235 LearningRate 0.0009 Epoch: 6 Global Step: 135020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:51,592-Speed 6299.82 samples/sec Loss 7.5167 LearningRate 0.0009 Epoch: 6 Global Step: 135030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:00:54,841-Speed 6305.56 samples/sec Loss 7.4579 LearningRate 0.0009 Epoch: 6 Global Step: 135040 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:00:58,089-Speed 6307.49 samples/sec Loss 7.4377 LearningRate 0.0009 Epoch: 6 Global Step: 135050 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:01,335-Speed 6309.62 samples/sec Loss 7.4750 LearningRate 0.0009 Epoch: 6 Global Step: 135060 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:04,582-Speed 6308.92 samples/sec Loss 7.5255 LearningRate 0.0009 Epoch: 6 Global Step: 135070 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:07,831-Speed 6305.92 samples/sec Loss 7.4862 LearningRate 0.0009 Epoch: 6 Global Step: 135080 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:11,064-Speed 6334.98 samples/sec Loss 7.5156 LearningRate 0.0009 Epoch: 6 Global Step: 135090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:14,307-Speed 6316.25 samples/sec Loss 7.5150 LearningRate 0.0009 Epoch: 6 Global Step: 135100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:17,557-Speed 6303.93 samples/sec Loss 
7.4477 LearningRate 0.0009 Epoch: 6 Global Step: 135110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:20,801-Speed 6313.54 samples/sec Loss 7.3714 LearningRate 0.0009 Epoch: 6 Global Step: 135120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:24,046-Speed 6313.05 samples/sec Loss 7.5037 LearningRate 0.0009 Epoch: 6 Global Step: 135130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:27,292-Speed 6311.40 samples/sec Loss 7.5136 LearningRate 0.0009 Epoch: 6 Global Step: 135140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:30,542-Speed 6302.96 samples/sec Loss 7.5678 LearningRate 0.0009 Epoch: 6 Global Step: 135150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:33,787-Speed 6312.86 samples/sec Loss 7.5510 LearningRate 0.0009 Epoch: 6 Global Step: 135160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:37,035-Speed 6306.80 samples/sec Loss 7.5461 LearningRate 0.0009 Epoch: 6 Global Step: 135170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:40,285-Speed 6304.07 samples/sec Loss 7.5116 LearningRate 0.0009 Epoch: 6 Global Step: 135180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:01:43,537-Speed 6297.80 samples/sec Loss 7.5098 LearningRate 0.0009 Epoch: 6 Global Step: 135190 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:46,794-Speed 6289.39 samples/sec Loss 7.5562 LearningRate 0.0009 Epoch: 6 Global Step: 135200 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:50,052-Speed 6288.70 samples/sec Loss 7.5195 LearningRate 0.0009 Epoch: 6 Global Step: 135210 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:53,296-Speed 6314.60 samples/sec Loss 7.4779 LearningRate 0.0009 Epoch: 6 Global Step: 135220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:56,542-Speed 6308.99 samples/sec Loss 7.5154 LearningRate 0.0009 Epoch: 6 Global 
Step: 135230 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:01:59,776-Speed 6334.56 samples/sec Loss 7.5235 LearningRate 0.0009 Epoch: 6 Global Step: 135240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:03,022-Speed 6311.27 samples/sec Loss 7.4459 LearningRate 0.0009 Epoch: 6 Global Step: 135250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:06,266-Speed 6315.03 samples/sec Loss 7.4778 LearningRate 0.0009 Epoch: 6 Global Step: 135260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:09,515-Speed 6304.48 samples/sec Loss 7.6110 LearningRate 0.0009 Epoch: 6 Global Step: 135270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:12,759-Speed 6314.38 samples/sec Loss 7.5161 LearningRate 0.0009 Epoch: 6 Global Step: 135280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:16,008-Speed 6305.46 samples/sec Loss 7.4567 LearningRate 0.0009 Epoch: 6 Global Step: 135290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:19,252-Speed 6315.26 samples/sec Loss 7.4882 LearningRate 0.0009 Epoch: 6 Global Step: 135300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:22,495-Speed 6315.67 samples/sec Loss 7.4703 LearningRate 0.0009 Epoch: 6 Global Step: 135310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:25,743-Speed 6307.52 samples/sec Loss 7.4963 LearningRate 0.0009 Epoch: 6 Global Step: 135320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:28,988-Speed 6311.47 samples/sec Loss 7.4734 LearningRate 0.0009 Epoch: 6 Global Step: 135330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:02:32,235-Speed 6309.09 samples/sec Loss 7.4783 LearningRate 0.0009 Epoch: 6 Global Step: 135340 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:35,482-Speed 6310.28 samples/sec Loss 7.6434 LearningRate 0.0009 Epoch: 6 Global Step: 135350 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:02:38,726-Speed 6313.98 samples/sec Loss 7.4551 LearningRate 0.0009 Epoch: 6 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:41,971-Speed 6312.56 samples/sec Loss 7.5731 LearningRate 0.0009 Epoch: 6 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:45,219-Speed 6306.62 samples/sec Loss 7.5642 LearningRate 0.0009 Epoch: 6 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:48,465-Speed 6312.31 samples/sec Loss 7.5162 LearningRate 0.0009 Epoch: 6 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:51,710-Speed 6312.03 samples/sec Loss 7.4613 LearningRate 0.0009 Epoch: 6 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:54,951-Speed 6320.12 samples/sec Loss 7.5542 LearningRate 0.0009 Epoch: 6 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:02:58,196-Speed 6313.25 samples/sec Loss 7.5153 LearningRate 0.0009 Epoch: 6 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:01,444-Speed 6305.78 samples/sec Loss 7.5192 LearningRate 0.0009 Epoch: 6 Global Step: 135430 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:04,666-Speed 6357.88 samples/sec Loss 7.5600 LearningRate 0.0009 Epoch: 6 Global Step: 135440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:07,908-Speed 6318.93 samples/sec Loss 7.4662 LearningRate 0.0009 Epoch: 6 Global Step: 135450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:11,153-Speed 6311.82 samples/sec Loss 7.5801 LearningRate 0.0009 Epoch: 6 Global Step: 135460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:14,401-Speed 6308.10 samples/sec Loss 7.5024 LearningRate 0.0009 Epoch: 6 Global Step: 135470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:03:17,645-Speed 6313.83 samples/sec Loss 7.4714 LearningRate 0.0009 Epoch: 6 Global Step: 135480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:20,903-Speed 6286.82 samples/sec Loss 7.4801 LearningRate 0.0009 Epoch: 6 Global Step: 135490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:24,147-Speed 6314.70 samples/sec Loss 7.6044 LearningRate 0.0009 Epoch: 6 Global Step: 135500 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:27,394-Speed 6308.45 samples/sec Loss 7.5716 LearningRate 0.0009 Epoch: 6 Global Step: 135510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:30,641-Speed 6309.12 samples/sec Loss 7.5009 LearningRate 0.0009 Epoch: 6 Global Step: 135520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:33,884-Speed 6316.81 samples/sec Loss 7.4585 LearningRate 0.0009 Epoch: 6 Global Step: 135530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:37,126-Speed 6319.73 samples/sec Loss 7.4616 LearningRate 0.0009 Epoch: 6 Global Step: 135540 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:40,375-Speed 6304.21 samples/sec Loss 7.4491 LearningRate 0.0009 Epoch: 6 Global Step: 135550 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:43,620-Speed 6312.66 samples/sec Loss 7.5290 LearningRate 0.0009 Epoch: 6 Global Step: 135560 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:46,868-Speed 6307.08 samples/sec Loss 7.5168 LearningRate 0.0009 Epoch: 6 Global Step: 135570 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:50,117-Speed 6304.32 samples/sec Loss 7.5055 LearningRate 0.0009 Epoch: 6 Global Step: 135580 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:03:53,349-Speed 6338.06 samples/sec Loss 7.5427 LearningRate 0.0009 Epoch: 6 Global Step: 135590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:56,594-Speed 6313.08 samples/sec Loss 
7.5786 LearningRate 0.0009 Epoch: 6 Global Step: 135600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:03:59,843-Speed 6304.73 samples/sec Loss 7.4939 LearningRate 0.0009 Epoch: 6 Global Step: 135610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:03,093-Speed 6303.34 samples/sec Loss 7.5578 LearningRate 0.0009 Epoch: 6 Global Step: 135620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:06,342-Speed 6305.61 samples/sec Loss 7.5106 LearningRate 0.0009 Epoch: 6 Global Step: 135630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:09,589-Speed 6308.46 samples/sec Loss 7.5469 LearningRate 0.0009 Epoch: 6 Global Step: 135640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:12,833-Speed 6314.44 samples/sec Loss 7.5868 LearningRate 0.0009 Epoch: 6 Global Step: 135650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:16,081-Speed 6307.13 samples/sec Loss 7.4643 LearningRate 0.0009 Epoch: 6 Global Step: 135660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:19,329-Speed 6306.78 samples/sec Loss 7.4733 LearningRate 0.0009 Epoch: 6 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:22,574-Speed 6311.25 samples/sec Loss 7.5301 LearningRate 0.0009 Epoch: 6 Global Step: 135680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:25,823-Speed 6305.38 samples/sec Loss 7.3797 LearningRate 0.0009 Epoch: 6 Global Step: 135690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:04:29,069-Speed 6311.03 samples/sec Loss 7.4482 LearningRate 0.0009 Epoch: 6 Global Step: 135700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:04:32,318-Speed 6305.51 samples/sec Loss 7.4461 LearningRate 0.0009 Epoch: 6 Global Step: 135710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:04:35,563-Speed 6311.64 samples/sec Loss 7.4342 LearningRate 0.0009 Epoch: 6 Global 
Step: 135720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:04:38,840-Speed 6251.58 samples/sec Loss 7.5634 LearningRate 0.0009 Epoch: 6 Global Step: 135730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:42,136-Speed 6213.73 samples/sec Loss 7.4647 LearningRate 0.0009 Epoch: 6 Global Step: 135740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:45,395-Speed 6286.30 samples/sec Loss 7.4876 LearningRate 0.0009 Epoch: 6 Global Step: 135750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:48,639-Speed 6315.13 samples/sec Loss 7.4661 LearningRate 0.0009 Epoch: 6 Global Step: 135760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:51,887-Speed 6307.52 samples/sec Loss 7.4955 LearningRate 0.0009 Epoch: 6 Global Step: 135770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:55,134-Speed 6307.78 samples/sec Loss 7.5002 LearningRate 0.0009 Epoch: 6 Global Step: 135780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:04:58,380-Speed 6312.33 samples/sec Loss 7.5713 LearningRate 0.0009 Epoch: 6 Global Step: 135790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:01,627-Speed 6308.23 samples/sec Loss 7.5654 LearningRate 0.0009 Epoch: 6 Global Step: 135800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:04,875-Speed 6307.56 samples/sec Loss 7.4811 LearningRate 0.0009 Epoch: 6 Global Step: 135810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:08,130-Speed 6292.73 samples/sec Loss 7.4545 LearningRate 0.0009 Epoch: 6 Global Step: 135820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:11,375-Speed 6312.70 samples/sec Loss 7.4735 LearningRate 0.0009 Epoch: 6 Global Step: 135830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:14,619-Speed 6313.80 samples/sec Loss 7.4220 LearningRate 0.0009 Epoch: 6 Global Step: 135840 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:05:17,863-Speed 6315.06 samples/sec Loss 7.5084 LearningRate 0.0009 Epoch: 6 Global Step: 135850 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:21,111-Speed 6306.19 samples/sec Loss 7.4424 LearningRate 0.0009 Epoch: 6 Global Step: 135860 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:24,363-Speed 6298.78 samples/sec Loss 7.4867 LearningRate 0.0009 Epoch: 6 Global Step: 135870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:27,608-Speed 6313.73 samples/sec Loss 7.5344 LearningRate 0.0009 Epoch: 6 Global Step: 135880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:30,860-Speed 6297.79 samples/sec Loss 7.5151 LearningRate 0.0009 Epoch: 6 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:05:34,097-Speed 6329.09 samples/sec Loss 7.5621 LearningRate 0.0009 Epoch: 6 Global Step: 135900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:37,341-Speed 6314.30 samples/sec Loss 7.4124 LearningRate 0.0009 Epoch: 6 Global Step: 135910 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:40,587-Speed 6309.78 samples/sec Loss 7.5095 LearningRate 0.0009 Epoch: 6 Global Step: 135920 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:43,842-Speed 6294.25 samples/sec Loss 7.5333 LearningRate 0.0009 Epoch: 6 Global Step: 135930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:47,091-Speed 6304.94 samples/sec Loss 7.5486 LearningRate 0.0009 Epoch: 6 Global Step: 135940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:50,338-Speed 6308.57 samples/sec Loss 7.4995 LearningRate 0.0009 Epoch: 6 Global Step: 135950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:05:53,588-Speed 6302.30 samples/sec Loss 7.5297 LearningRate 0.0009 Epoch: 6 Global Step: 135960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:05:56,838-Speed 6302.82 samples/sec Loss 7.5241 LearningRate 0.0009 Epoch: 6 Global Step: 135970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:00,080-Speed 6318.30 samples/sec Loss 7.4795 LearningRate 0.0009 Epoch: 6 Global Step: 135980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:03,330-Speed 6304.44 samples/sec Loss 7.5144 LearningRate 0.0009 Epoch: 6 Global Step: 135990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:06,563-Speed 6336.03 samples/sec Loss 7.4841 LearningRate 0.0009 Epoch: 6 Global Step: 136000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:09,815-Speed 6299.43 samples/sec Loss 7.5461 LearningRate 0.0009 Epoch: 6 Global Step: 136010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:13,063-Speed 6306.67 samples/sec Loss 7.4749 LearningRate 0.0009 Epoch: 6 Global Step: 136020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:16,314-Speed 6300.80 samples/sec Loss 7.4972 LearningRate 0.0009 Epoch: 6 Global Step: 136030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:19,563-Speed 6305.57 samples/sec Loss 7.4041 LearningRate 0.0009 Epoch: 6 Global Step: 136040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:22,811-Speed 6305.72 samples/sec Loss 7.4445 LearningRate 0.0009 Epoch: 6 Global Step: 136050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:26,061-Speed 6303.98 samples/sec Loss 7.5673 LearningRate 0.0009 Epoch: 6 Global Step: 136060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:29,315-Speed 6295.75 samples/sec Loss 7.4592 LearningRate 0.0009 Epoch: 6 Global Step: 136070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:32,561-Speed 6309.34 samples/sec Loss 7.5374 LearningRate 0.0009 Epoch: 6 Global Step: 136080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:35,805-Speed 6315.58 samples/sec Loss 
7.5744 LearningRate 0.0009 Epoch: 6 Global Step: 136090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:39,049-Speed 6314.42 samples/sec Loss 7.4403 LearningRate 0.0009 Epoch: 6 Global Step: 136100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:06:42,282-Speed 6336.29 samples/sec Loss 7.4823 LearningRate 0.0009 Epoch: 6 Global Step: 136110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:45,532-Speed 6302.04 samples/sec Loss 7.4388 LearningRate 0.0009 Epoch: 6 Global Step: 136120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:48,780-Speed 6306.80 samples/sec Loss 7.4575 LearningRate 0.0009 Epoch: 6 Global Step: 136130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:52,024-Speed 6314.04 samples/sec Loss 7.4212 LearningRate 0.0009 Epoch: 6 Global Step: 136140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:55,271-Speed 6309.44 samples/sec Loss 7.4062 LearningRate 0.0009 Epoch: 6 Global Step: 136150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:06:58,520-Speed 6304.32 samples/sec Loss 7.4405 LearningRate 0.0009 Epoch: 6 Global Step: 136160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:01,765-Speed 6312.76 samples/sec Loss 7.4711 LearningRate 0.0009 Epoch: 6 Global Step: 136170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:05,024-Speed 6287.16 samples/sec Loss 7.4268 LearningRate 0.0009 Epoch: 6 Global Step: 136180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:08,271-Speed 6306.95 samples/sec Loss 7.4980 LearningRate 0.0009 Epoch: 6 Global Step: 136190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:11,519-Speed 6308.69 samples/sec Loss 7.4278 LearningRate 0.0009 Epoch: 6 Global Step: 136200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:14,767-Speed 6307.11 samples/sec Loss 7.3819 LearningRate 0.0009 Epoch: 6 Global 
Step: 136210 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:07:18,012-Speed 6311.63 samples/sec Loss 7.4638 LearningRate 0.0009 Epoch: 6 Global Step: 136220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:07:21,262-Speed 6303.65 samples/sec Loss 7.5389 LearningRate 0.0009 Epoch: 6 Global Step: 136230 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:07:24,511-Speed 6304.08 samples/sec Loss 7.4514 LearningRate 0.0009 Epoch: 6 Global Step: 136240 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:07:27,747-Speed 6331.20 samples/sec Loss 7.4259 LearningRate 0.0009 Epoch: 6 Global Step: 136250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:31,010-Speed 6277.71 samples/sec Loss 7.4529 LearningRate 0.0009 Epoch: 6 Global Step: 136260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:34,257-Speed 6308.28 samples/sec Loss 7.5148 LearningRate 0.0009 Epoch: 6 Global Step: 136270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:37,510-Speed 6298.21 samples/sec Loss 7.5056 LearningRate 0.0009 Epoch: 6 Global Step: 136280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:40,757-Speed 6308.87 samples/sec Loss 7.4287 LearningRate 0.0009 Epoch: 6 Global Step: 136290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:44,006-Speed 6304.37 samples/sec Loss 7.4495 LearningRate 0.0009 Epoch: 6 Global Step: 136300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:47,257-Speed 6299.79 samples/sec Loss 7.4158 LearningRate 0.0009 Epoch: 6 Global Step: 136310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:50,516-Speed 6287.41 samples/sec Loss 7.4646 LearningRate 0.0009 Epoch: 6 Global Step: 136320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:07:53,762-Speed 6310.22 samples/sec Loss 7.4072 LearningRate 0.0009 Epoch: 6 Global Step: 136330 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:07:57,010-Speed 6305.82 samples/sec Loss 7.4463 LearningRate 0.0009 Epoch: 6 Global Step: 136340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:00,260-Speed 6303.12 samples/sec Loss 7.5221 LearningRate 0.0009 Epoch: 6 Global Step: 136350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:03,511-Speed 6301.80 samples/sec Loss 7.4383 LearningRate 0.0009 Epoch: 6 Global Step: 136360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:06,761-Speed 6303.64 samples/sec Loss 7.4016 LearningRate 0.0009 Epoch: 6 Global Step: 136370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:10,007-Speed 6310.35 samples/sec Loss 7.5532 LearningRate 0.0009 Epoch: 6 Global Step: 136380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:13,255-Speed 6306.14 samples/sec Loss 7.5302 LearningRate 0.0009 Epoch: 6 Global Step: 136390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:16,502-Speed 6307.76 samples/sec Loss 7.5265 LearningRate 0.0009 Epoch: 6 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:19,751-Speed 6305.83 samples/sec Loss 7.4145 LearningRate 0.0009 Epoch: 6 Global Step: 136410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:22,987-Speed 6331.28 samples/sec Loss 7.5527 LearningRate 0.0009 Epoch: 6 Global Step: 136420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:26,236-Speed 6304.97 samples/sec Loss 7.4448 LearningRate 0.0009 Epoch: 6 Global Step: 136430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:29,482-Speed 6310.24 samples/sec Loss 7.4774 LearningRate 0.0009 Epoch: 6 Global Step: 136440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:32,728-Speed 6310.12 samples/sec Loss 7.4965 LearningRate 0.0009 Epoch: 6 Global Step: 136450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:08:35,997-Speed 6267.11 samples/sec Loss 7.5157 LearningRate 0.0009 Epoch: 6 Global Step: 136460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:39,243-Speed 6310.74 samples/sec Loss 7.5429 LearningRate 0.0009 Epoch: 6 Global Step: 136470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:42,490-Speed 6308.85 samples/sec Loss 7.3989 LearningRate 0.0009 Epoch: 6 Global Step: 136480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:45,738-Speed 6306.72 samples/sec Loss 7.4142 LearningRate 0.0009 Epoch: 6 Global Step: 136490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:48,993-Speed 6293.40 samples/sec Loss 7.4900 LearningRate 0.0009 Epoch: 6 Global Step: 136500 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:52,241-Speed 6305.76 samples/sec Loss 7.4639 LearningRate 0.0009 Epoch: 6 Global Step: 136510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:08:55,488-Speed 6310.70 samples/sec Loss 7.5268 LearningRate 0.0009 Epoch: 6 Global Step: 136520 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:08:58,716-Speed 6345.26 samples/sec Loss 7.5150 LearningRate 0.0009 Epoch: 6 Global Step: 136530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:01,968-Speed 6298.26 samples/sec Loss 7.4094 LearningRate 0.0009 Epoch: 6 Global Step: 136540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:05,216-Speed 6307.94 samples/sec Loss 7.4593 LearningRate 0.0009 Epoch: 6 Global Step: 136550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:08,461-Speed 6311.61 samples/sec Loss 7.4813 LearningRate 0.0009 Epoch: 6 Global Step: 136560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:11,704-Speed 6316.15 samples/sec Loss 7.4555 LearningRate 0.0009 Epoch: 6 Global Step: 136570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:14,949-Speed 6314.15 samples/sec Loss 
7.4111 LearningRate 0.0009 Epoch: 6 Global Step: 136580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:18,198-Speed 6304.52 samples/sec Loss 7.5212 LearningRate 0.0009 Epoch: 6 Global Step: 136590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:21,442-Speed 6314.02 samples/sec Loss 7.5139 LearningRate 0.0009 Epoch: 6 Global Step: 136600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:24,685-Speed 6316.46 samples/sec Loss 7.4707 LearningRate 0.0009 Epoch: 6 Global Step: 136610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:27,928-Speed 6317.96 samples/sec Loss 7.5236 LearningRate 0.0009 Epoch: 6 Global Step: 136620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:31,173-Speed 6312.41 samples/sec Loss 7.4725 LearningRate 0.0009 Epoch: 6 Global Step: 136630 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:34,419-Speed 6310.12 samples/sec Loss 7.4548 LearningRate 0.0009 Epoch: 6 Global Step: 136640 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:37,666-Speed 6309.53 samples/sec Loss 7.4741 LearningRate 0.0009 Epoch: 6 Global Step: 136650 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:40,913-Speed 6309.36 samples/sec Loss 7.4652 LearningRate 0.0009 Epoch: 6 Global Step: 136660 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:44,160-Speed 6307.77 samples/sec Loss 7.4925 LearningRate 0.0009 Epoch: 6 Global Step: 136670 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:47,404-Speed 6313.83 samples/sec Loss 7.4934 LearningRate 0.0009 Epoch: 6 Global Step: 136680 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:50,649-Speed 6314.32 samples/sec Loss 7.4673 LearningRate 0.0009 Epoch: 6 Global Step: 136690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:09:53,881-Speed 6337.82 samples/sec Loss 7.4495 LearningRate 0.0009 Epoch: 6 Global 
Step: 136700 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:09:57,127-Speed 6310.19 samples/sec Loss 7.4155 LearningRate 0.0009 Epoch: 6 Global Step: 136710 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:00,371-Speed 6315.27 samples/sec Loss 7.4658 LearningRate 0.0009 Epoch: 6 Global Step: 136720 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:03,629-Speed 6286.82 samples/sec Loss 7.4031 LearningRate 0.0009 Epoch: 6 Global Step: 136730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:06,877-Speed 6306.49 samples/sec Loss 7.4710 LearningRate 0.0009 Epoch: 6 Global Step: 136740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:10,123-Speed 6310.64 samples/sec Loss 7.4492 LearningRate 0.0009 Epoch: 6 Global Step: 136750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:13,397-Speed 6256.09 samples/sec Loss 7.3854 LearningRate 0.0009 Epoch: 6 Global Step: 136760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:16,640-Speed 6317.94 samples/sec Loss 7.4493 LearningRate 0.0009 Epoch: 6 Global Step: 136770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:19,897-Speed 6288.31 samples/sec Loss 7.4729 LearningRate 0.0009 Epoch: 6 Global Step: 136780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:23,140-Speed 6316.15 samples/sec Loss 7.5090 LearningRate 0.0009 Epoch: 6 Global Step: 136790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:26,384-Speed 6316.03 samples/sec Loss 7.4958 LearningRate 0.0009 Epoch: 6 Global Step: 136800 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:10:29,617-Speed 6335.55 samples/sec Loss 7.3391 LearningRate 0.0009 Epoch: 6 Global Step: 136810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:32,866-Speed 6304.40 samples/sec Loss 7.4914 LearningRate 0.0009 Epoch: 6 Global Step: 136820 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:10:36,114-Speed 6306.95 samples/sec Loss 7.4931 LearningRate 0.0009 Epoch: 6 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:39,361-Speed 6310.08 samples/sec Loss 7.5302 LearningRate 0.0009 Epoch: 6 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:42,611-Speed 6302.85 samples/sec Loss 7.4296 LearningRate 0.0009 Epoch: 6 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:45,857-Speed 6309.54 samples/sec Loss 7.4436 LearningRate 0.0009 Epoch: 6 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:49,109-Speed 6300.65 samples/sec Loss 7.4438 LearningRate 0.0009 Epoch: 6 Global Step: 136870 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:52,360-Speed 6300.56 samples/sec Loss 7.4749 LearningRate 0.0009 Epoch: 6 Global Step: 136880 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:55,606-Speed 6310.41 samples/sec Loss 7.4130 LearningRate 0.0009 Epoch: 6 Global Step: 136890 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:10:58,858-Speed 6299.87 samples/sec Loss 7.4467 LearningRate 0.0009 Epoch: 6 Global Step: 136900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:02,101-Speed 6314.97 samples/sec Loss 7.4348 LearningRate 0.0009 Epoch: 6 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:05,355-Speed 6296.96 samples/sec Loss 7.4426 LearningRate 0.0009 Epoch: 6 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:08,590-Speed 6331.60 samples/sec Loss 7.3980 LearningRate 0.0009 Epoch: 6 Global Step: 136930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:11,836-Speed 6310.64 samples/sec Loss 7.4355 LearningRate 0.0009 Epoch: 6 Global Step: 136940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:11:15,086-Speed 6301.73 samples/sec Loss 7.3945 LearningRate 0.0009 Epoch: 6 Global Step: 136950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:18,332-Speed 6310.66 samples/sec Loss 7.4646 LearningRate 0.0009 Epoch: 6 Global Step: 136960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:21,582-Speed 6303.48 samples/sec Loss 7.5481 LearningRate 0.0009 Epoch: 6 Global Step: 136970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:24,833-Speed 6301.17 samples/sec Loss 7.3308 LearningRate 0.0009 Epoch: 6 Global Step: 136980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:28,094-Speed 6281.25 samples/sec Loss 7.4804 LearningRate 0.0009 Epoch: 6 Global Step: 136990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:31,342-Speed 6307.45 samples/sec Loss 7.4658 LearningRate 0.0009 Epoch: 6 Global Step: 137000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:34,588-Speed 6309.76 samples/sec Loss 7.4662 LearningRate 0.0009 Epoch: 6 Global Step: 137010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:37,835-Speed 6309.77 samples/sec Loss 7.4489 LearningRate 0.0009 Epoch: 6 Global Step: 137020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:11:41,083-Speed 6307.48 samples/sec Loss 7.4609 LearningRate 0.0009 Epoch: 6 Global Step: 137030 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:44,331-Speed 6305.51 samples/sec Loss 7.5887 LearningRate 0.0009 Epoch: 6 Global Step: 137040 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:47,578-Speed 6308.81 samples/sec Loss 7.4884 LearningRate 0.0009 Epoch: 6 Global Step: 137050 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:50,823-Speed 6313.52 samples/sec Loss 7.5650 LearningRate 0.0009 Epoch: 6 Global Step: 137060 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:54,069-Speed 6311.52 samples/sec Loss 
7.4749 LearningRate 0.0009 Epoch: 6 Global Step: 137070 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:11:57,311-Speed 6317.54 samples/sec Loss 7.5904 LearningRate 0.0009 Epoch: 6 Global Step: 137080 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:00,555-Speed 6315.12 samples/sec Loss 7.5230 LearningRate 0.0009 Epoch: 6 Global Step: 137090 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:03,802-Speed 6308.60 samples/sec Loss 7.4460 LearningRate 0.0009 Epoch: 6 Global Step: 137100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:07,047-Speed 6312.56 samples/sec Loss 7.4611 LearningRate 0.0009 Epoch: 6 Global Step: 137110 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:10,294-Speed 6309.91 samples/sec Loss 7.4961 LearningRate 0.0009 Epoch: 6 Global Step: 137120 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:13,528-Speed 6334.46 samples/sec Loss 7.4402 LearningRate 0.0009 Epoch: 6 Global Step: 137130 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:16,773-Speed 6310.82 samples/sec Loss 7.5351 LearningRate 0.0009 Epoch: 6 Global Step: 137140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:20,018-Speed 6313.41 samples/sec Loss 7.5716 LearningRate 0.0009 Epoch: 6 Global Step: 137150 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:23,263-Speed 6313.88 samples/sec Loss 7.5468 LearningRate 0.0009 Epoch: 6 Global Step: 137160 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:26,509-Speed 6309.19 samples/sec Loss 7.5128 LearningRate 0.0009 Epoch: 6 Global Step: 137170 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:12:29,741-Speed 6338.11 samples/sec Loss 7.4404 LearningRate 0.0009 Epoch: 6 Global Step: 137180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:32,992-Speed 6301.37 samples/sec Loss 7.4673 LearningRate 0.0009 Epoch: 6 Global 
Step: 137190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:36,236-Speed 6314.43 samples/sec Loss 7.4085 LearningRate 0.0009 Epoch: 6 Global Step: 137200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:39,484-Speed 6305.79 samples/sec Loss 7.5087 LearningRate 0.0009 Epoch: 6 Global Step: 137210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:42,730-Speed 6311.20 samples/sec Loss 7.3954 LearningRate 0.0009 Epoch: 6 Global Step: 137220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:45,976-Speed 6310.49 samples/sec Loss 7.3829 LearningRate 0.0009 Epoch: 6 Global Step: 137230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:49,222-Speed 6311.96 samples/sec Loss 7.4876 LearningRate 0.0009 Epoch: 6 Global Step: 137240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:52,469-Speed 6308.70 samples/sec Loss 7.4634 LearningRate 0.0009 Epoch: 6 Global Step: 137250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:55,713-Speed 6315.25 samples/sec Loss 7.4311 LearningRate 0.0009 Epoch: 6 Global Step: 137260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:12:58,956-Speed 6316.12 samples/sec Loss 7.4683 LearningRate 0.0009 Epoch: 6 Global Step: 137270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:02,203-Speed 6308.63 samples/sec Loss 7.4740 LearningRate 0.0009 Epoch: 6 Global Step: 137280 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:13:05,457-Speed 6296.53 samples/sec Loss 7.4506 LearningRate 0.0009 Epoch: 6 Global Step: 137290 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:13:08,701-Speed 6313.96 samples/sec Loss 7.4306 LearningRate 0.0009 Epoch: 6 Global Step: 137300 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:13:11,948-Speed 6308.05 samples/sec Loss 7.4450 LearningRate 0.0009 Epoch: 6 Global Step: 137310 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:13:15,181-Speed 6336.66 samples/sec Loss 7.5475 LearningRate 0.0009 Epoch: 6 Global Step: 137320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:18,429-Speed 6307.52 samples/sec Loss 7.4396 LearningRate 0.0009 Epoch: 6 Global Step: 137330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:21,675-Speed 6310.66 samples/sec Loss 7.5040 LearningRate 0.0009 Epoch: 6 Global Step: 137340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:24,920-Speed 6311.72 samples/sec Loss 7.4351 LearningRate 0.0009 Epoch: 6 Global Step: 137350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:28,166-Speed 6310.64 samples/sec Loss 7.5260 LearningRate 0.0009 Epoch: 6 Global Step: 137360 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:31,410-Speed 6314.03 samples/sec Loss 7.5698 LearningRate 0.0009 Epoch: 6 Global Step: 137370 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:34,659-Speed 6306.54 samples/sec Loss 7.4476 LearningRate 0.0009 Epoch: 6 Global Step: 137380 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:37,908-Speed 6304.39 samples/sec Loss 7.4918 LearningRate 0.0009 Epoch: 6 Global Step: 137390 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:41,153-Speed 6311.66 samples/sec Loss 7.4849 LearningRate 0.0009 Epoch: 6 Global Step: 137400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:44,403-Speed 6303.45 samples/sec Loss 7.5050 LearningRate 0.0009 Epoch: 6 Global Step: 137410 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:13:47,648-Speed 6313.11 samples/sec Loss 7.3557 LearningRate 0.0009 Epoch: 6 Global Step: 137420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:13:50,894-Speed 6309.88 samples/sec Loss 7.5586 LearningRate 0.0009 Epoch: 6 Global Step: 137430 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
04:13:54,140-Speed 6311.66 samples/sec Loss 7.4256 LearningRate 0.0009 Epoch: 6 Global Step: 137440 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:13:57,386-Speed 6310.33 samples/sec Loss 7.4260 LearningRate 0.0009 Epoch: 6 Global Step: 137450 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:00,636-Speed 6303.34 samples/sec Loss 7.4215 LearningRate 0.0009 Epoch: 6 Global Step: 137460 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:03,881-Speed 6312.34 samples/sec Loss 7.4306 LearningRate 0.0009 Epoch: 6 Global Step: 137470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:07,139-Speed 6287.64 samples/sec Loss 7.4832 LearningRate 0.0009 Epoch: 6 Global Step: 137480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:10,386-Speed 6310.24 samples/sec Loss 7.4329 LearningRate 0.0009 Epoch: 6 Global Step: 137490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:13,631-Speed 6312.75 samples/sec Loss 7.4678 LearningRate 0.0009 Epoch: 6 Global Step: 137500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:16,879-Speed 6306.06 samples/sec Loss 7.4757 LearningRate 0.0009 Epoch: 6 Global Step: 137510 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:20,111-Speed 6337.12 samples/sec Loss 7.5499 LearningRate 0.0009 Epoch: 6 Global Step: 137520 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:23,357-Speed 6311.81 samples/sec Loss 7.4582 LearningRate 0.0009 Epoch: 6 Global Step: 137530 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:26,605-Speed 6307.08 samples/sec Loss 7.4672 LearningRate 0.0009 Epoch: 6 Global Step: 137540 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:29,853-Speed 6305.86 samples/sec Loss 7.4022 LearningRate 0.0009 Epoch: 6 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:33,104-Speed 6301.35 samples/sec Loss 
7.5091 LearningRate 0.0009 Epoch: 6 Global Step: 137560 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:36,346-Speed 6317.83 samples/sec Loss 7.4483 LearningRate 0.0009 Epoch: 6 Global Step: 137570 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:39,593-Speed 6309.06 samples/sec Loss 7.5103 LearningRate 0.0009 Epoch: 6 Global Step: 137580 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:14:42,825-Speed 6339.31 samples/sec Loss 7.4705 LearningRate 0.0009 Epoch: 6 Global Step: 137590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:14:46,073-Speed 6305.09 samples/sec Loss 7.4169 LearningRate 0.0009 Epoch: 6 Global Step: 137600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:14:49,322-Speed 6305.99 samples/sec Loss 7.3875 LearningRate 0.0009 Epoch: 6 Global Step: 137610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:14:52,569-Speed 6309.04 samples/sec Loss 7.4350 LearningRate 0.0009 Epoch: 6 Global Step: 137620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:14:55,815-Speed 6310.54 samples/sec Loss 7.4720 LearningRate 0.0009 Epoch: 6 Global Step: 137630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:14:59,068-Speed 6297.16 samples/sec Loss 7.4496 LearningRate 0.0009 Epoch: 6 Global Step: 137640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:02,320-Speed 6298.57 samples/sec Loss 7.4005 LearningRate 0.0009 Epoch: 6 Global Step: 137650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:05,570-Speed 6302.18 samples/sec Loss 7.4673 LearningRate 0.0009 Epoch: 6 Global Step: 137660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:08,816-Speed 6311.24 samples/sec Loss 7.5623 LearningRate 0.0009 Epoch: 6 Global Step: 137670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:12,061-Speed 6313.68 samples/sec Loss 7.4156 LearningRate 0.0009 Epoch: 6 Global 
Step: 137680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:15,309-Speed 6305.23 samples/sec Loss 7.5652 LearningRate 0.0009 Epoch: 6 Global Step: 137690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:15:18,559-Speed 6304.18 samples/sec Loss 7.4788 LearningRate 0.0009 Epoch: 6 Global Step: 137700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:15:21,807-Speed 6307.14 samples/sec Loss 7.4522 LearningRate 0.0009 Epoch: 6 Global Step: 137710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:15:25,058-Speed 6300.69 samples/sec Loss 7.4308 LearningRate 0.0009 Epoch: 6 Global Step: 137720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:15:28,289-Speed 6339.79 samples/sec Loss 7.5399 LearningRate 0.0009 Epoch: 6 Global Step: 137730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:31,537-Speed 6308.00 samples/sec Loss 7.4167 LearningRate 0.0009 Epoch: 6 Global Step: 137740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:34,795-Speed 6287.32 samples/sec Loss 7.4621 LearningRate 0.0009 Epoch: 6 Global Step: 137750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:38,053-Speed 6287.35 samples/sec Loss 7.3670 LearningRate 0.0009 Epoch: 6 Global Step: 137760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:41,297-Speed 6315.23 samples/sec Loss 7.5204 LearningRate 0.0009 Epoch: 6 Global Step: 137770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:44,543-Speed 6310.01 samples/sec Loss 7.4076 LearningRate 0.0009 Epoch: 6 Global Step: 137780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:47,789-Speed 6309.94 samples/sec Loss 7.4354 LearningRate 0.0009 Epoch: 6 Global Step: 137790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:51,036-Speed 6308.58 samples/sec Loss 7.5003 LearningRate 0.0009 Epoch: 6 Global Step: 137800 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:15:54,285-Speed 6305.03 samples/sec Loss 7.4303 LearningRate 0.0009 Epoch: 6 Global Step: 137810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:15:57,532-Speed 6309.84 samples/sec Loss 7.4417 LearningRate 0.0009 Epoch: 6 Global Step: 137820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:00,787-Speed 6291.90 samples/sec Loss 7.4540 LearningRate 0.0009 Epoch: 6 Global Step: 137830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:04,035-Speed 6307.59 samples/sec Loss 7.4514 LearningRate 0.0009 Epoch: 6 Global Step: 137840 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:07,282-Speed 6309.21 samples/sec Loss 7.4059 LearningRate 0.0009 Epoch: 6 Global Step: 137850 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:10,532-Speed 6302.63 samples/sec Loss 7.4828 LearningRate 0.0009 Epoch: 6 Global Step: 137860 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:13,779-Speed 6309.60 samples/sec Loss 7.4143 LearningRate 0.0009 Epoch: 6 Global Step: 137870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:17,027-Speed 6306.38 samples/sec Loss 7.4284 LearningRate 0.0009 Epoch: 6 Global Step: 137880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:20,274-Speed 6308.22 samples/sec Loss 7.4145 LearningRate 0.0009 Epoch: 6 Global Step: 137890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:23,526-Speed 6299.54 samples/sec Loss 7.4450 LearningRate 0.0009 Epoch: 6 Global Step: 137900 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:26,780-Speed 6294.75 samples/sec Loss 7.4433 LearningRate 0.0009 Epoch: 6 Global Step: 137910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:16:30,010-Speed 6342.14 samples/sec Loss 7.4262 LearningRate 0.0009 Epoch: 6 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:16:33,256-Speed 6311.50 samples/sec Loss 7.4246 LearningRate 0.0009 Epoch: 6 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:36,503-Speed 6309.43 samples/sec Loss 7.4131 LearningRate 0.0009 Epoch: 6 Global Step: 137940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:39,755-Speed 6298.25 samples/sec Loss 7.3867 LearningRate 0.0009 Epoch: 6 Global Step: 137950 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:43,001-Speed 6311.35 samples/sec Loss 7.4656 LearningRate 0.0009 Epoch: 6 Global Step: 137960 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:46,248-Speed 6308.88 samples/sec Loss 7.4834 LearningRate 0.0009 Epoch: 6 Global Step: 137970 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:49,501-Speed 6297.01 samples/sec Loss 7.5766 LearningRate 0.0009 Epoch: 6 Global Step: 137980 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:52,759-Speed 6286.83 samples/sec Loss 7.4767 LearningRate 0.0009 Epoch: 6 Global Step: 137990 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:56,005-Speed 6310.74 samples/sec Loss 7.4375 LearningRate 0.0009 Epoch: 6 Global Step: 138000 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:16:59,258-Speed 6296.97 samples/sec Loss 7.4047 LearningRate 0.0009 Epoch: 6 Global Step: 138010 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:02,521-Speed 6277.67 samples/sec Loss 7.4148 LearningRate 0.0009 Epoch: 6 Global Step: 138020 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:05,774-Speed 6296.94 samples/sec Loss 7.4031 LearningRate 0.0009 Epoch: 6 Global Step: 138030 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:09,022-Speed 6307.53 samples/sec Loss 7.4527 LearningRate 0.0009 Epoch: 6 Global Step: 138040 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:12,275-Speed 6297.54 samples/sec Loss 
7.4869 LearningRate 0.0009 Epoch: 6 Global Step: 138050 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:15,534-Speed 6285.97 samples/sec Loss 7.5595 LearningRate 0.0009 Epoch: 6 Global Step: 138060 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:18,782-Speed 6307.06 samples/sec Loss 7.4362 LearningRate 0.0009 Epoch: 6 Global Step: 138070 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:22,027-Speed 6312.26 samples/sec Loss 7.4362 LearningRate 0.0009 Epoch: 6 Global Step: 138080 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:25,273-Speed 6309.12 samples/sec Loss 7.4332 LearningRate 0.0009 Epoch: 6 Global Step: 138090 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:28,522-Speed 6305.11 samples/sec Loss 7.4737 LearningRate 0.0009 Epoch: 6 Global Step: 138100 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:31,771-Speed 6304.98 samples/sec Loss 7.4821 LearningRate 0.0009 Epoch: 6 Global Step: 138110 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:17:35,006-Speed 6333.06 samples/sec Loss 7.5097 LearningRate 0.0009 Epoch: 6 Global Step: 138120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:38,253-Speed 6310.19 samples/sec Loss 7.5002 LearningRate 0.0009 Epoch: 6 Global Step: 138130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:41,500-Speed 6308.86 samples/sec Loss 7.4545 LearningRate 0.0009 Epoch: 6 Global Step: 138140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:44,745-Speed 6311.33 samples/sec Loss 7.4805 LearningRate 0.0009 Epoch: 6 Global Step: 138150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:47,994-Speed 6305.56 samples/sec Loss 7.4042 LearningRate 0.0009 Epoch: 6 Global Step: 138160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:51,240-Speed 6310.17 samples/sec Loss 7.4802 LearningRate 0.0009 Epoch: 6 Global 
Step: 138170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:54,521-Speed 6243.45 samples/sec Loss 7.4131 LearningRate 0.0009 Epoch: 6 Global Step: 138180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:17:57,766-Speed 6312.64 samples/sec Loss 7.4378 LearningRate 0.0009 Epoch: 6 Global Step: 138190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:01,011-Speed 6313.73 samples/sec Loss 7.3956 LearningRate 0.0009 Epoch: 6 Global Step: 138200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:04,260-Speed 6304.97 samples/sec Loss 7.4124 LearningRate 0.0009 Epoch: 6 Global Step: 138210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:07,506-Speed 6309.97 samples/sec Loss 7.4801 LearningRate 0.0009 Epoch: 6 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:10,741-Speed 6331.33 samples/sec Loss 7.4938 LearningRate 0.0009 Epoch: 6 Global Step: 138230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:13,985-Speed 6314.98 samples/sec Loss 7.4784 LearningRate 0.0009 Epoch: 6 Global Step: 138240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:17,230-Speed 6313.76 samples/sec Loss 7.4948 LearningRate 0.0009 Epoch: 6 Global Step: 138250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:20,474-Speed 6313.66 samples/sec Loss 7.4590 LearningRate 0.0009 Epoch: 6 Global Step: 138260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:23,720-Speed 6311.46 samples/sec Loss 7.4361 LearningRate 0.0009 Epoch: 6 Global Step: 138270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:26,965-Speed 6311.47 samples/sec Loss 7.4781 LearningRate 0.0009 Epoch: 6 Global Step: 138280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:30,211-Speed 6311.50 samples/sec Loss 7.4339 LearningRate 0.0009 Epoch: 6 Global Step: 138290 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:18:33,455-Speed 6313.68 samples/sec Loss 7.4544 LearningRate 0.0009 Epoch: 6 Global Step: 138300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:36,696-Speed 6320.19 samples/sec Loss 7.4209 LearningRate 0.0009 Epoch: 6 Global Step: 138310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:39,942-Speed 6310.86 samples/sec Loss 7.4690 LearningRate 0.0009 Epoch: 6 Global Step: 138320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:18:43,190-Speed 6308.37 samples/sec Loss 7.4510 LearningRate 0.0009 Epoch: 6 Global Step: 138330 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:46,437-Speed 6308.36 samples/sec Loss 7.4093 LearningRate 0.0009 Epoch: 6 Global Step: 138340 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:49,698-Speed 6282.15 samples/sec Loss 7.3810 LearningRate 0.0009 Epoch: 6 Global Step: 138350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:52,943-Speed 6312.61 samples/sec Loss 7.4017 LearningRate 0.0009 Epoch: 6 Global Step: 138360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:56,187-Speed 6313.75 samples/sec Loss 7.3774 LearningRate 0.0009 Epoch: 6 Global Step: 138370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:18:59,438-Speed 6300.99 samples/sec Loss 7.3898 LearningRate 0.0009 Epoch: 6 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:02,692-Speed 6296.93 samples/sec Loss 7.4660 LearningRate 0.0009 Epoch: 6 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:05,951-Speed 6285.52 samples/sec Loss 7.3802 LearningRate 0.0009 Epoch: 6 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:09,199-Speed 6306.24 samples/sec Loss 7.4378 LearningRate 0.0009 Epoch: 6 Global Step: 138410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
04:19:12,443-Speed 6313.43 samples/sec Loss 7.4924 LearningRate 0.0009 Epoch: 6 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:15,674-Speed 6341.32 samples/sec Loss 7.4613 LearningRate 0.0009 Epoch: 6 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:18,921-Speed 6309.29 samples/sec Loss 7.5165 LearningRate 0.0009 Epoch: 6 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:22,167-Speed 6309.83 samples/sec Loss 7.4283 LearningRate 0.0009 Epoch: 6 Global Step: 138450 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:25,414-Speed 6309.22 samples/sec Loss 7.4875 LearningRate 0.0009 Epoch: 6 Global Step: 138460 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:28,663-Speed 6305.02 samples/sec Loss 7.4388 LearningRate 0.0009 Epoch: 6 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:31,909-Speed 6309.85 samples/sec Loss 7.4190 LearningRate 0.0009 Epoch: 6 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:35,154-Speed 6312.61 samples/sec Loss 7.3733 LearningRate 0.0009 Epoch: 6 Global Step: 138490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:38,398-Speed 6313.66 samples/sec Loss 7.4692 LearningRate 0.0009 Epoch: 6 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:19:41,631-Speed 6337.97 samples/sec Loss 7.4365 LearningRate 0.0009 Epoch: 6 Global Step: 138510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:19:44,882-Speed 6300.52 samples/sec Loss 7.4707 LearningRate 0.0009 Epoch: 6 Global Step: 138520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:19:48,135-Speed 6297.90 samples/sec Loss 7.4891 LearningRate 0.0009 Epoch: 6 Global Step: 138530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:19:51,383-Speed 6306.65 samples/sec Loss 
7.3847 LearningRate 0.0009 Epoch: 6 Global Step: 138540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:19:54,632-Speed 6303.77 samples/sec Loss 7.4969 LearningRate 0.0009 Epoch: 6 Global Step: 138550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:19:57,877-Speed 6313.08 samples/sec Loss 7.4590 LearningRate 0.0009 Epoch: 6 Global Step: 138560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:01,121-Speed 6315.34 samples/sec Loss 7.4679 LearningRate 0.0009 Epoch: 6 Global Step: 138570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:04,365-Speed 6314.66 samples/sec Loss 7.4400 LearningRate 0.0009 Epoch: 6 Global Step: 138580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:07,615-Speed 6302.52 samples/sec Loss 7.4942 LearningRate 0.0009 Epoch: 6 Global Step: 138590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:10,871-Speed 6292.40 samples/sec Loss 7.3888 LearningRate 0.0009 Epoch: 6 Global Step: 138600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:14,120-Speed 6303.53 samples/sec Loss 7.4387 LearningRate 0.0009 Epoch: 6 Global Step: 138610 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:17,368-Speed 6306.76 samples/sec Loss 7.4543 LearningRate 0.0009 Epoch: 6 Global Step: 138620 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:20,616-Speed 6307.92 samples/sec Loss 7.4294 LearningRate 0.0009 Epoch: 6 Global Step: 138630 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:23,870-Speed 6293.98 samples/sec Loss 7.4243 LearningRate 0.0009 Epoch: 6 Global Step: 138640 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:27,116-Speed 6310.56 samples/sec Loss 7.4624 LearningRate 0.0009 Epoch: 6 Global Step: 138650 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:30,360-Speed 6314.92 samples/sec Loss 7.4041 LearningRate 0.0009 Epoch: 6 Global 
Step: 138660 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:33,606-Speed 6310.72 samples/sec Loss 7.4391 LearningRate 0.0009 Epoch: 6 Global Step: 138670 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:20:36,841-Speed 6332.36 samples/sec Loss 7.4928 LearningRate 0.0009 Epoch: 6 Global Step: 138680 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:40,087-Speed 6310.16 samples/sec Loss 7.4593 LearningRate 0.0009 Epoch: 6 Global Step: 138690 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:43,331-Speed 6315.46 samples/sec Loss 7.4225 LearningRate 0.0009 Epoch: 6 Global Step: 138700 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:46,580-Speed 6305.32 samples/sec Loss 7.4955 LearningRate 0.0009 Epoch: 6 Global Step: 138710 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:49,824-Speed 6313.76 samples/sec Loss 7.4040 LearningRate 0.0009 Epoch: 6 Global Step: 138720 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:53,075-Speed 6301.13 samples/sec Loss 7.4802 LearningRate 0.0009 Epoch: 6 Global Step: 138730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:56,319-Speed 6315.45 samples/sec Loss 7.4927 LearningRate 0.0009 Epoch: 6 Global Step: 138740 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:20:59,567-Speed 6307.07 samples/sec Loss 7.3684 LearningRate 0.0009 Epoch: 6 Global Step: 138750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:21:02,812-Speed 6311.74 samples/sec Loss 7.4055 LearningRate 0.0009 Epoch: 6 Global Step: 138760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:21:06,057-Speed 6314.49 samples/sec Loss 7.3444 LearningRate 0.0009 Epoch: 6 Global Step: 138770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:21:09,309-Speed 6298.94 samples/sec Loss 7.4625 LearningRate 0.0009 Epoch: 6 Global Step: 138780 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:21:12,556-Speed 6308.65 samples/sec Loss 7.3970 LearningRate 0.0009 Epoch: 6 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:15,816-Speed 6283.02 samples/sec Loss 7.4271 LearningRate 0.0009 Epoch: 6 Global Step: 138800 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:19,063-Speed 6309.78 samples/sec Loss 7.4040 LearningRate 0.0009 Epoch: 6 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:22,308-Speed 6311.87 samples/sec Loss 7.5330 LearningRate 0.0009 Epoch: 6 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:25,558-Speed 6303.79 samples/sec Loss 7.4380 LearningRate 0.0009 Epoch: 6 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:28,804-Speed 6310.72 samples/sec Loss 7.3875 LearningRate 0.0009 Epoch: 6 Global Step: 138840 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:32,051-Speed 6307.35 samples/sec Loss 7.4836 LearningRate 0.0009 Epoch: 6 Global Step: 138850 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:35,304-Speed 6297.48 samples/sec Loss 7.4095 LearningRate 0.0009 Epoch: 6 Global Step: 138860 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:38,554-Speed 6303.11 samples/sec Loss 7.3636 LearningRate 0.0009 Epoch: 6 Global Step: 138870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:41,786-Speed 6338.00 samples/sec Loss 7.3942 LearningRate 0.0009 Epoch: 6 Global Step: 138880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:45,028-Speed 6318.11 samples/sec Loss 7.4368 LearningRate 0.0009 Epoch: 6 Global Step: 138890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:48,279-Speed 6301.37 samples/sec Loss 7.3567 LearningRate 0.0009 Epoch: 6 Global Step: 138900 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
04:21:51,529-Speed 6303.99 samples/sec Loss 7.4394 LearningRate 0.0009 Epoch: 6 Global Step: 138910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:54,778-Speed 6303.59 samples/sec Loss 7.3372 LearningRate 0.0009 Epoch: 6 Global Step: 138920 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:21:58,026-Speed 6307.16 samples/sec Loss 7.3970 LearningRate 0.0009 Epoch: 6 Global Step: 138930 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:01,278-Speed 6299.91 samples/sec Loss 7.3827 LearningRate 0.0009 Epoch: 6 Global Step: 138940 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:04,528-Speed 6303.45 samples/sec Loss 7.4296 LearningRate 0.0009 Epoch: 6 Global Step: 138950 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:07,777-Speed 6304.71 samples/sec Loss 7.4050 LearningRate 0.0009 Epoch: 6 Global Step: 138960 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:11,026-Speed 6305.43 samples/sec Loss 7.2882 LearningRate 0.0009 Epoch: 6 Global Step: 138970 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:14,259-Speed 6336.15 samples/sec Loss 7.3709 LearningRate 0.0009 Epoch: 6 Global Step: 138980 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:17,507-Speed 6307.75 samples/sec Loss 7.3582 LearningRate 0.0009 Epoch: 6 Global Step: 138990 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:20,753-Speed 6311.64 samples/sec Loss 7.3805 LearningRate 0.0009 Epoch: 6 Global Step: 139000 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:23,996-Speed 6316.27 samples/sec Loss 7.4495 LearningRate 0.0009 Epoch: 6 Global Step: 139010 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:27,244-Speed 6306.59 samples/sec Loss 7.5035 LearningRate 0.0009 Epoch: 6 Global Step: 139020 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:30,491-Speed 6309.93 samples/sec Loss 
7.3702 LearningRate 0.0009 Epoch: 6 Global Step: 139030 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:22:33,721-Speed 6341.04 samples/sec Loss 7.4185 LearningRate 0.0009 Epoch: 6 Global Step: 139040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:36,969-Speed 6307.33 samples/sec Loss 7.3621 LearningRate 0.0009 Epoch: 6 Global Step: 139050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:40,212-Speed 6315.47 samples/sec Loss 7.4394 LearningRate 0.0009 Epoch: 6 Global Step: 139060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:43,461-Speed 6305.36 samples/sec Loss 7.3655 LearningRate 0.0009 Epoch: 6 Global Step: 139070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:46,713-Speed 6298.26 samples/sec Loss 7.4228 LearningRate 0.0009 Epoch: 6 Global Step: 139080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:49,959-Speed 6311.67 samples/sec Loss 7.4540 LearningRate 0.0009 Epoch: 6 Global Step: 139090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:53,206-Speed 6309.54 samples/sec Loss 7.3959 LearningRate 0.0009 Epoch: 6 Global Step: 139100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:56,451-Speed 6311.90 samples/sec Loss 7.4177 LearningRate 0.0009 Epoch: 6 Global Step: 139110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:22:59,698-Speed 6308.40 samples/sec Loss 7.5414 LearningRate 0.0009 Epoch: 6 Global Step: 139120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:23:02,946-Speed 6306.89 samples/sec Loss 7.5209 LearningRate 0.0009 Epoch: 6 Global Step: 139130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:23:06,191-Speed 6312.55 samples/sec Loss 7.3729 LearningRate 0.0009 Epoch: 6 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:09,440-Speed 6305.30 samples/sec Loss 7.4500 LearningRate 0.0009 Epoch: 6 Global 
Step: 139150 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:12,689-Speed 6305.73 samples/sec Loss 7.4691 LearningRate 0.0009 Epoch: 6 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:15,935-Speed 6310.44 samples/sec Loss 7.4215 LearningRate 0.0009 Epoch: 6 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:19,178-Speed 6316.46 samples/sec Loss 7.4037 LearningRate 0.0009 Epoch: 6 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:22,426-Speed 6307.26 samples/sec Loss 7.4424 LearningRate 0.0009 Epoch: 6 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:25,679-Speed 6297.07 samples/sec Loss 7.4219 LearningRate 0.0009 Epoch: 6 Global Step: 139200 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:28,941-Speed 6280.61 samples/sec Loss 7.4260 LearningRate 0.0009 Epoch: 6 Global Step: 139210 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:32,187-Speed 6311.47 samples/sec Loss 7.3497 LearningRate 0.0009 Epoch: 6 Global Step: 139220 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:35,434-Speed 6308.35 samples/sec Loss 7.4731 LearningRate 0.0009 Epoch: 6 Global Step: 139230 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:38,666-Speed 6338.35 samples/sec Loss 7.3716 LearningRate 0.0009 Epoch: 6 Global Step: 139240 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:41,912-Speed 6309.56 samples/sec Loss 7.4098 LearningRate 0.0009 Epoch: 6 Global Step: 139250 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:45,158-Speed 6311.16 samples/sec Loss 7.5521 LearningRate 0.0009 Epoch: 6 Global Step: 139260 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:48,408-Speed 6303.28 samples/sec Loss 7.4559 LearningRate 0.0009 Epoch: 6 Global Step: 139270 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:23:51,656-Speed 6307.66 samples/sec Loss 7.4681 LearningRate 0.0009 Epoch: 6 Global Step: 139280 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:54,901-Speed 6311.35 samples/sec Loss 7.4551 LearningRate 0.0009 Epoch: 6 Global Step: 139290 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:23:58,146-Speed 6312.32 samples/sec Loss 7.4783 LearningRate 0.0009 Epoch: 6 Global Step: 139300 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:01,393-Speed 6309.16 samples/sec Loss 7.4441 LearningRate 0.0009 Epoch: 6 Global Step: 139310 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:04,640-Speed 6309.90 samples/sec Loss 7.4270 LearningRate 0.0009 Epoch: 6 Global Step: 139320 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:07,876-Speed 6329.41 samples/sec Loss 7.4418 LearningRate 0.0009 Epoch: 6 Global Step: 139330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:11,124-Speed 6306.55 samples/sec Loss 7.4708 LearningRate 0.0009 Epoch: 6 Global Step: 139340 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:14,369-Speed 6313.54 samples/sec Loss 7.3891 LearningRate 0.0009 Epoch: 6 Global Step: 139350 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:17,619-Speed 6302.19 samples/sec Loss 7.4110 LearningRate 0.0009 Epoch: 6 Global Step: 139360 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:20,868-Speed 6305.33 samples/sec Loss 7.5380 LearningRate 0.0009 Epoch: 6 Global Step: 139370 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:24,120-Speed 6299.42 samples/sec Loss 7.4775 LearningRate 0.0009 Epoch: 6 Global Step: 139380 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:27,362-Speed 6318.54 samples/sec Loss 7.4382 LearningRate 0.0009 Epoch: 6 Global Step: 139390 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:24:30,607-Speed 6312.78 samples/sec Loss 7.4767 LearningRate 0.0009 Epoch: 6 Global Step: 139400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:33,856-Speed 6304.56 samples/sec Loss 7.4042 LearningRate 0.0009 Epoch: 6 Global Step: 139410 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:37,104-Speed 6307.91 samples/sec Loss 7.3759 LearningRate 0.0009 Epoch: 6 Global Step: 139420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:24:40,353-Speed 6305.03 samples/sec Loss 7.3946 LearningRate 0.0009 Epoch: 6 Global Step: 139430 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:43,643-Speed 6225.81 samples/sec Loss 7.4914 LearningRate 0.0009 Epoch: 6 Global Step: 139440 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:46,890-Speed 6309.18 samples/sec Loss 7.4144 LearningRate 0.0009 Epoch: 6 Global Step: 139450 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:50,139-Speed 6304.72 samples/sec Loss 7.4985 LearningRate 0.0009 Epoch: 6 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:53,385-Speed 6310.59 samples/sec Loss 7.3952 LearningRate 0.0009 Epoch: 6 Global Step: 139470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:56,641-Speed 6292.73 samples/sec Loss 7.4747 LearningRate 0.0009 Epoch: 6 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:24:59,884-Speed 6315.94 samples/sec Loss 7.4215 LearningRate 0.0009 Epoch: 6 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:03,132-Speed 6305.76 samples/sec Loss 7.4114 LearningRate 0.0009 Epoch: 6 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:06,377-Speed 6313.28 samples/sec Loss 7.3895 LearningRate 0.0009 Epoch: 6 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:09,626-Speed 6305.03 samples/sec Loss 
7.4728 LearningRate 0.0009 Epoch: 6 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:12,861-Speed 6331.72 samples/sec Loss 7.4476 LearningRate 0.0009 Epoch: 6 Global Step: 139530 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:16,106-Speed 6311.80 samples/sec Loss 7.3616 LearningRate 0.0009 Epoch: 6 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:19,353-Speed 6310.38 samples/sec Loss 7.5043 LearningRate 0.0009 Epoch: 6 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:22,601-Speed 6305.36 samples/sec Loss 7.4245 LearningRate 0.0009 Epoch: 6 Global Step: 139560 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:25,851-Speed 6303.67 samples/sec Loss 7.4217 LearningRate 0.0009 Epoch: 6 Global Step: 139570 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:25:29,083-Speed 6337.91 samples/sec Loss 7.3926 LearningRate 0.0009 Epoch: 6 Global Step: 139580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:32,328-Speed 6312.98 samples/sec Loss 7.4061 LearningRate 0.0009 Epoch: 6 Global Step: 139590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:35,579-Speed 6302.26 samples/sec Loss 7.3902 LearningRate 0.0009 Epoch: 6 Global Step: 139600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:38,822-Speed 6315.21 samples/sec Loss 7.4518 LearningRate 0.0009 Epoch: 6 Global Step: 139610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:42,068-Speed 6311.27 samples/sec Loss 7.4098 LearningRate 0.0009 Epoch: 6 Global Step: 139620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:45,312-Speed 6314.35 samples/sec Loss 7.4340 LearningRate 0.0009 Epoch: 6 Global Step: 139630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:48,558-Speed 6312.49 samples/sec Loss 7.4407 LearningRate 0.0009 Epoch: 6 Global 
Step: 139640 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:51,804-Speed 6310.21 samples/sec Loss 7.4122 LearningRate 0.0009 Epoch: 6 Global Step: 139650 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:55,052-Speed 6305.66 samples/sec Loss 7.3850 LearningRate 0.0009 Epoch: 6 Global Step: 139660 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:25:58,299-Speed 6309.44 samples/sec Loss 7.3764 LearningRate 0.0009 Epoch: 6 Global Step: 139670 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:26:01,545-Speed 6309.93 samples/sec Loss 7.4178 LearningRate 0.0009 Epoch: 6 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:04,797-Speed 6299.26 samples/sec Loss 7.3745 LearningRate 0.0009 Epoch: 6 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:08,045-Speed 6306.42 samples/sec Loss 7.3451 LearningRate 0.0009 Epoch: 6 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:11,293-Speed 6306.69 samples/sec Loss 7.3492 LearningRate 0.0009 Epoch: 6 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:14,539-Speed 6312.25 samples/sec Loss 7.3994 LearningRate 0.0009 Epoch: 6 Global Step: 139720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:17,784-Speed 6311.28 samples/sec Loss 7.4582 LearningRate 0.0009 Epoch: 6 Global Step: 139730 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:21,032-Speed 6307.99 samples/sec Loss 7.4232 LearningRate 0.0009 Epoch: 6 Global Step: 139740 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:24,280-Speed 6305.23 samples/sec Loss 7.4471 LearningRate 0.0009 Epoch: 6 Global Step: 139750 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:27,529-Speed 6305.87 samples/sec Loss 7.4435 LearningRate 0.0009 Epoch: 6 Global Step: 139760 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:26:30,778-Speed 6305.08 samples/sec Loss 7.4897 LearningRate 0.0009 Epoch: 6 Global Step: 139770 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:34,010-Speed 6336.80 samples/sec Loss 7.4785 LearningRate 0.0009 Epoch: 6 Global Step: 139780 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:37,259-Speed 6306.51 samples/sec Loss 7.3499 LearningRate 0.0009 Epoch: 6 Global Step: 139790 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:40,511-Speed 6298.57 samples/sec Loss 7.3803 LearningRate 0.0009 Epoch: 6 Global Step: 139800 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:43,759-Speed 6307.13 samples/sec Loss 7.3544 LearningRate 0.0009 Epoch: 6 Global Step: 139810 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:47,007-Speed 6306.07 samples/sec Loss 7.4384 LearningRate 0.0009 Epoch: 6 Global Step: 139820 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:50,259-Speed 6303.34 samples/sec Loss 7.3765 LearningRate 0.0009 Epoch: 6 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:53,504-Speed 6313.13 samples/sec Loss 7.3078 LearningRate 0.0009 Epoch: 6 Global Step: 139840 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:26:56,740-Speed 6330.42 samples/sec Loss 7.3913 LearningRate 0.0009 Epoch: 6 Global Step: 139850 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:26:59,991-Speed 6299.91 samples/sec Loss 7.4514 LearningRate 0.0009 Epoch: 6 Global Step: 139860 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:03,239-Speed 6306.71 samples/sec Loss 7.4086 LearningRate 0.0009 Epoch: 6 Global Step: 139870 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:06,485-Speed 6311.44 samples/sec Loss 7.4071 LearningRate 0.0009 Epoch: 6 Global Step: 139880 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:27:09,729-Speed 6314.45 samples/sec Loss 7.4682 LearningRate 0.0009 Epoch: 6 Global Step: 139890 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:12,981-Speed 6298.41 samples/sec Loss 7.4626 LearningRate 0.0009 Epoch: 6 Global Step: 139900 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:16,229-Speed 6308.16 samples/sec Loss 7.4386 LearningRate 0.0009 Epoch: 6 Global Step: 139910 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:19,475-Speed 6309.95 samples/sec Loss 7.4230 LearningRate 0.0009 Epoch: 6 Global Step: 139920 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:22,724-Speed 6305.90 samples/sec Loss 7.3518 LearningRate 0.0009 Epoch: 6 Global Step: 139930 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:25,970-Speed 6309.38 samples/sec Loss 7.4781 LearningRate 0.0009 Epoch: 6 Global Step: 139940 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:27:29,220-Speed 6303.36 samples/sec Loss 7.4716 LearningRate 0.0009 Epoch: 6 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:32,469-Speed 6304.09 samples/sec Loss 7.3908 LearningRate 0.0009 Epoch: 6 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:35,718-Speed 6306.24 samples/sec Loss 7.4705 LearningRate 0.0009 Epoch: 6 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:38,966-Speed 6307.20 samples/sec Loss 7.4612 LearningRate 0.0009 Epoch: 6 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:42,217-Speed 6300.38 samples/sec Loss 7.4452 LearningRate 0.0009 Epoch: 6 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:45,464-Speed 6309.28 samples/sec Loss 7.3942 LearningRate 0.0009 Epoch: 6 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:48,716-Speed 6298.88 samples/sec Loss 
7.4483 LearningRate 0.0009 Epoch: 6 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:51,964-Speed 6306.16 samples/sec Loss 7.4543 LearningRate 0.0009 Epoch: 6 Global Step: 140020 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:55,212-Speed 6307.08 samples/sec Loss 7.3640 LearningRate 0.0009 Epoch: 6 Global Step: 140030 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:27:58,450-Speed 6326.68 samples/sec Loss 7.4110 LearningRate 0.0009 Epoch: 6 Global Step: 140040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:01,695-Speed 6313.41 samples/sec Loss 7.4537 LearningRate 0.0009 Epoch: 6 Global Step: 140050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:04,944-Speed 6304.95 samples/sec Loss 7.4796 LearningRate 0.0009 Epoch: 6 Global Step: 140060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:08,193-Speed 6305.48 samples/sec Loss 7.4653 LearningRate 0.0009 Epoch: 6 Global Step: 140070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:11,441-Speed 6305.20 samples/sec Loss 7.3763 LearningRate 0.0009 Epoch: 6 Global Step: 140080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:14,691-Speed 6302.85 samples/sec Loss 7.4340 LearningRate 0.0009 Epoch: 6 Global Step: 140090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:17,934-Speed 6315.88 samples/sec Loss 7.3637 LearningRate 0.0009 Epoch: 6 Global Step: 140100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:21,185-Speed 6302.86 samples/sec Loss 7.3810 LearningRate 0.0009 Epoch: 6 Global Step: 140110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:24,432-Speed 6308.22 samples/sec Loss 7.3670 LearningRate 0.0009 Epoch: 6 Global Step: 140120 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:27,684-Speed 6297.95 samples/sec Loss 7.4621 LearningRate 0.0009 Epoch: 6 Global 
Step: 140130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:30,930-Speed 6310.63 samples/sec Loss 7.4382 LearningRate 0.0009 Epoch: 6 Global Step: 140140 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:28:34,178-Speed 6307.30 samples/sec Loss 7.3789 LearningRate 0.0009 Epoch: 6 Global Step: 140150 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:28:37,409-Speed 6339.90 samples/sec Loss 7.3471 LearningRate 0.0009 Epoch: 6 Global Step: 140160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:40,656-Speed 6310.26 samples/sec Loss 7.3633 LearningRate 0.0009 Epoch: 6 Global Step: 140170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:43,903-Speed 6308.38 samples/sec Loss 7.2760 LearningRate 0.0009 Epoch: 6 Global Step: 140180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:47,152-Speed 6303.62 samples/sec Loss 7.4416 LearningRate 0.0009 Epoch: 6 Global Step: 140190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:50,400-Speed 6308.39 samples/sec Loss 7.3702 LearningRate 0.0009 Epoch: 6 Global Step: 140200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:53,646-Speed 6308.76 samples/sec Loss 7.3135 LearningRate 0.0009 Epoch: 6 Global Step: 140210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:28:56,893-Speed 6310.85 samples/sec Loss 7.4033 LearningRate 0.0009 Epoch: 6 Global Step: 140220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:00,156-Speed 6276.91 samples/sec Loss 7.4033 LearningRate 0.0009 Epoch: 6 Global Step: 140230 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:03,408-Speed 6299.72 samples/sec Loss 7.3355 LearningRate 0.0009 Epoch: 6 Global Step: 140240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:06,656-Speed 6306.80 samples/sec Loss 7.3747 LearningRate 0.0009 Epoch: 6 Global Step: 140250 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:29:09,904-Speed 6307.71 samples/sec Loss 7.4164 LearningRate 0.0009 Epoch: 6 Global Step: 140260 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:13,154-Speed 6302.85 samples/sec Loss 7.3199 LearningRate 0.0009 Epoch: 6 Global Step: 140270 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:16,401-Speed 6307.04 samples/sec Loss 7.4120 LearningRate 0.0009 Epoch: 6 Global Step: 140280 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:19,692-Speed 6227.92 samples/sec Loss 7.4321 LearningRate 0.0009 Epoch: 6 Global Step: 140290 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:22,937-Speed 6313.07 samples/sec Loss 7.4036 LearningRate 0.0009 Epoch: 6 Global Step: 140300 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:26,190-Speed 6297.22 samples/sec Loss 7.4529 LearningRate 0.0009 Epoch: 6 Global Step: 140310 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:29,438-Speed 6307.13 samples/sec Loss 7.3228 LearningRate 0.0009 Epoch: 6 Global Step: 140320 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:32,681-Speed 6315.73 samples/sec Loss 7.5233 LearningRate 0.0009 Epoch: 6 Global Step: 140330 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:35,929-Speed 6306.64 samples/sec Loss 7.3524 LearningRate 0.0009 Epoch: 6 Global Step: 140340 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:39,182-Speed 6298.43 samples/sec Loss 7.3781 LearningRate 0.0009 Epoch: 6 Global Step: 140350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:42,415-Speed 6335.05 samples/sec Loss 7.4347 LearningRate 0.0009 Epoch: 6 Global Step: 140360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:29:45,651-Speed 6330.81 samples/sec Loss 7.4938 LearningRate 0.0009 Epoch: 6 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 
04:29:48,900-Speed 6304.71 samples/sec Loss 7.4358 LearningRate 0.0009 Epoch: 6 Global Step: 140380 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:52,146-Speed 6310.36 samples/sec Loss 7.3956 LearningRate 0.0009 Epoch: 6 Global Step: 140390 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:55,395-Speed 6305.49 samples/sec Loss 7.3394 LearningRate 0.0009 Epoch: 6 Global Step: 140400 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:29:58,647-Speed 6298.92 samples/sec Loss 7.3463 LearningRate 0.0009 Epoch: 6 Global Step: 140410 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:01,897-Speed 6304.68 samples/sec Loss 7.3766 LearningRate 0.0009 Epoch: 6 Global Step: 140420 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:05,148-Speed 6302.93 samples/sec Loss 7.4635 LearningRate 0.0009 Epoch: 6 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:08,397-Speed 6305.63 samples/sec Loss 7.4183 LearningRate 0.0009 Epoch: 6 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:11,639-Speed 6316.95 samples/sec Loss 7.3636 LearningRate 0.0009 Epoch: 6 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:14,884-Speed 6312.29 samples/sec Loss 7.3513 LearningRate 0.0009 Epoch: 6 Global Step: 140460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:18,137-Speed 6297.01 samples/sec Loss 7.4656 LearningRate 0.0009 Epoch: 6 Global Step: 140470 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:21,385-Speed 6307.27 samples/sec Loss 7.4232 LearningRate 0.0009 Epoch: 6 Global Step: 140480 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:24,629-Speed 6315.78 samples/sec Loss 7.4119 LearningRate 0.0009 Epoch: 6 Global Step: 140490 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:27,878-Speed 6304.20 samples/sec Loss 
7.4042 LearningRate 0.0009 Epoch: 6 Global Step: 140500 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:31,124-Speed 6310.37 samples/sec Loss 7.3313 LearningRate 0.0009 Epoch: 6 Global Step: 140510 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:34,374-Speed 6302.75 samples/sec Loss 7.4311 LearningRate 0.0009 Epoch: 6 Global Step: 140520 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:37,622-Speed 6306.72 samples/sec Loss 7.4707 LearningRate 0.0009 Epoch: 6 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:30:40,854-Speed 6339.44 samples/sec Loss 7.4415 LearningRate 0.0009 Epoch: 6 Global Step: 140540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:44,103-Speed 6304.02 samples/sec Loss 7.3818 LearningRate 0.0009 Epoch: 6 Global Step: 140550 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:47,354-Speed 6300.77 samples/sec Loss 7.3833 LearningRate 0.0009 Epoch: 6 Global Step: 140560 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:50,604-Speed 6302.97 samples/sec Loss 7.4191 LearningRate 0.0009 Epoch: 6 Global Step: 140570 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:53,849-Speed 6312.91 samples/sec Loss 7.3571 LearningRate 0.0009 Epoch: 6 Global Step: 140580 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:30:57,093-Speed 6314.06 samples/sec Loss 7.5122 LearningRate 0.0009 Epoch: 6 Global Step: 140590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:00,340-Speed 6309.07 samples/sec Loss 7.4656 LearningRate 0.0009 Epoch: 6 Global Step: 140600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:03,588-Speed 6307.76 samples/sec Loss 7.4576 LearningRate 0.0009 Epoch: 6 Global Step: 140610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:06,836-Speed 6305.64 samples/sec Loss 7.5055 LearningRate 0.0009 Epoch: 6 Global 
Step: 140620 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:10,097-Speed 6282.32 samples/sec Loss 7.3381 LearningRate 0.0009 Epoch: 6 Global Step: 140630 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:13,349-Speed 6299.71 samples/sec Loss 7.3577 LearningRate 0.0009 Epoch: 6 Global Step: 140640 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:16,597-Speed 6306.47 samples/sec Loss 7.4035 LearningRate 0.0009 Epoch: 6 Global Step: 140650 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:19,847-Speed 6303.93 samples/sec Loss 7.3892 LearningRate 0.0009 Epoch: 6 Global Step: 140660 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:23,095-Speed 6306.41 samples/sec Loss 7.3570 LearningRate 0.0009 Epoch: 6 Global Step: 140670 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:26,347-Speed 6299.70 samples/sec Loss 7.4212 LearningRate 0.0009 Epoch: 6 Global Step: 140680 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:29,601-Speed 6295.35 samples/sec Loss 7.3926 LearningRate 0.0009 Epoch: 6 Global Step: 140690 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:32,848-Speed 6308.20 samples/sec Loss 7.3003 LearningRate 0.0009 Epoch: 6 Global Step: 140700 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:36,097-Speed 6304.29 samples/sec Loss 7.4185 LearningRate 0.0009 Epoch: 6 Global Step: 140710 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:39,345-Speed 6308.33 samples/sec Loss 7.3576 LearningRate 0.0009 Epoch: 6 Global Step: 140720 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:31:42,578-Speed 6336.22 samples/sec Loss 7.4843 LearningRate 0.0009 Epoch: 6 Global Step: 140730 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:45,821-Speed 6315.80 samples/sec Loss 7.4348 LearningRate 0.0009 Epoch: 6 Global Step: 140740 Fp16 Grad Scale: 32768 
Required: 63 hours Training: 2022-04-01 04:31:49,071-Speed 6303.73 samples/sec Loss 7.4045 LearningRate 0.0009 Epoch: 6 Global Step: 140750 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:52,317-Speed 6310.50 samples/sec Loss 7.5172 LearningRate 0.0009 Epoch: 6 Global Step: 140760 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:55,561-Speed 6313.65 samples/sec Loss 7.3892 LearningRate 0.0009 Epoch: 6 Global Step: 140770 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:31:58,807-Speed 6310.17 samples/sec Loss 7.4417 LearningRate 0.0009 Epoch: 6 Global Step: 140780 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:32:02,058-Speed 6301.24 samples/sec Loss 7.4173 LearningRate 0.0009 Epoch: 6 Global Step: 140790 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:32:05,311-Speed 6297.23 samples/sec Loss 7.4017 LearningRate 0.0009 Epoch: 6 Global Step: 140800 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:32:08,565-Speed 6295.65 samples/sec Loss 7.3417 LearningRate 0.0009 Epoch: 6 Global Step: 140810 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:32:11,813-Speed 6307.49 samples/sec Loss 7.4418 LearningRate 0.0009 Epoch: 6 Global Step: 140820 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:32:15,059-Speed 6309.47 samples/sec Loss 7.3949 LearningRate 0.0009 Epoch: 6 Global Step: 140830 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:18,315-Speed 6292.32 samples/sec Loss 7.3258 LearningRate 0.0009 Epoch: 6 Global Step: 140840 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:21,562-Speed 6308.19 samples/sec Loss 7.4056 LearningRate 0.0009 Epoch: 6 Global Step: 140850 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:24,809-Speed 6308.12 samples/sec Loss 7.4192 LearningRate 0.0009 Epoch: 6 Global Step: 140860 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
04:32:28,061-Speed 6300.63 samples/sec Loss 7.4046 LearningRate 0.0009 Epoch: 6 Global Step: 140870 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:31,309-Speed 6306.00 samples/sec Loss 7.4061 LearningRate 0.0009 Epoch: 6 Global Step: 140880 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:34,557-Speed 6308.32 samples/sec Loss 7.4388 LearningRate 0.0009 Epoch: 6 Global Step: 140890 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:37,806-Speed 6303.65 samples/sec Loss 7.4102 LearningRate 0.0009 Epoch: 6 Global Step: 140900 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:41,054-Speed 6308.35 samples/sec Loss 7.4196 LearningRate 0.0009 Epoch: 6 Global Step: 140910 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:44,302-Speed 6306.94 samples/sec Loss 7.3577 LearningRate 0.0009 Epoch: 6 Global Step: 140920 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:47,538-Speed 6330.41 samples/sec Loss 7.3880 LearningRate 0.0009 Epoch: 6 Global Step: 140930 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:50,795-Speed 6288.11 samples/sec Loss 7.4414 LearningRate 0.0009 Epoch: 6 Global Step: 140940 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:54,045-Speed 6303.41 samples/sec Loss 7.3852 LearningRate 0.0009 Epoch: 6 Global Step: 140950 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:32:57,292-Speed 6308.94 samples/sec Loss 7.4344 LearningRate 0.0009 Epoch: 6 Global Step: 140960 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:00,544-Speed 6300.01 samples/sec Loss 7.3919 LearningRate 0.0009 Epoch: 6 Global Step: 140970 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:03,791-Speed 6308.15 samples/sec Loss 7.4195 LearningRate 0.0009 Epoch: 6 Global Step: 140980 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:07,038-Speed 6308.31 samples/sec Loss 
7.4094 LearningRate 0.0009 Epoch: 6 Global Step: 140990 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:10,286-Speed 6305.87 samples/sec Loss 7.4378 LearningRate 0.0009 Epoch: 6 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:13,530-Speed 6316.09 samples/sec Loss 7.3207 LearningRate 0.0009 Epoch: 6 Global Step: 141010 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:16,774-Speed 6314.71 samples/sec Loss 7.4245 LearningRate 0.0009 Epoch: 6 Global Step: 141020 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:20,020-Speed 6309.94 samples/sec Loss 7.4412 LearningRate 0.0009 Epoch: 6 Global Step: 141030 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:23,270-Speed 6302.27 samples/sec Loss 7.3569 LearningRate 0.0009 Epoch: 6 Global Step: 141040 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:26,518-Speed 6308.26 samples/sec Loss 7.4105 LearningRate 0.0009 Epoch: 6 Global Step: 141050 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:29,762-Speed 6313.87 samples/sec Loss 7.4244 LearningRate 0.0009 Epoch: 6 Global Step: 141060 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:33,009-Speed 6308.36 samples/sec Loss 7.3522 LearningRate 0.0009 Epoch: 6 Global Step: 141070 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:36,253-Speed 6315.92 samples/sec Loss 7.4302 LearningRate 0.0009 Epoch: 6 Global Step: 141080 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:39,501-Speed 6307.22 samples/sec Loss 7.3700 LearningRate 0.0009 Epoch: 6 Global Step: 141090 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:42,748-Speed 6309.00 samples/sec Loss 7.4119 LearningRate 0.0009 Epoch: 6 Global Step: 141100 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:46,001-Speed 6297.35 samples/sec Loss 7.4054 LearningRate 0.0009 Epoch: 6 Global 
Step: 141110 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:49,254-Speed 6297.21 samples/sec Loss 7.4396 LearningRate 0.0009 Epoch: 6 Global Step: 141120 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:33:52,483-Speed 6344.06 samples/sec Loss 7.3813 LearningRate 0.0009 Epoch: 6 Global Step: 141130 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:55,729-Speed 6310.63 samples/sec Loss 7.4346 LearningRate 0.0009 Epoch: 6 Global Step: 141140 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:33:58,974-Speed 6313.28 samples/sec Loss 7.3594 LearningRate 0.0009 Epoch: 6 Global Step: 141150 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:02,221-Speed 6308.26 samples/sec Loss 7.3745 LearningRate 0.0009 Epoch: 6 Global Step: 141160 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:05,470-Speed 6306.16 samples/sec Loss 7.4031 LearningRate 0.0009 Epoch: 6 Global Step: 141170 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:08,713-Speed 6315.01 samples/sec Loss 7.4542 LearningRate 0.0009 Epoch: 6 Global Step: 141180 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:11,971-Speed 6288.01 samples/sec Loss 7.4050 LearningRate 0.0009 Epoch: 6 Global Step: 141190 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:15,218-Speed 6308.26 samples/sec Loss 7.4011 LearningRate 0.0009 Epoch: 6 Global Step: 141200 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:18,470-Speed 6299.81 samples/sec Loss 7.3512 LearningRate 0.0009 Epoch: 6 Global Step: 141210 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:21,716-Speed 6309.90 samples/sec Loss 7.3487 LearningRate 0.0009 Epoch: 6 Global Step: 141220 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:24,965-Speed 6304.74 samples/sec Loss 7.3902 LearningRate 0.0008 Epoch: 6 Global Step: 141230 Fp16 Grad Scale: 65536 
Required: 63 hours Training: 2022-04-01 04:34:28,208-Speed 6316.84 samples/sec Loss 7.4009 LearningRate 0.0008 Epoch: 6 Global Step: 141240 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:31,452-Speed 6315.14 samples/sec Loss 7.3634 LearningRate 0.0008 Epoch: 6 Global Step: 141250 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:34,698-Speed 6309.65 samples/sec Loss 7.3867 LearningRate 0.0008 Epoch: 6 Global Step: 141260 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:37,946-Speed 6307.82 samples/sec Loss 7.4599 LearningRate 0.0008 Epoch: 6 Global Step: 141270 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:41,195-Speed 6304.66 samples/sec Loss 7.3545 LearningRate 0.0008 Epoch: 6 Global Step: 141280 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:44,442-Speed 6308.48 samples/sec Loss 7.3199 LearningRate 0.0008 Epoch: 6 Global Step: 141290 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:47,690-Speed 6308.03 samples/sec Loss 7.4069 LearningRate 0.0008 Epoch: 6 Global Step: 141300 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:50,934-Speed 6315.02 samples/sec Loss 7.3124 LearningRate 0.0008 Epoch: 6 Global Step: 141310 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:54,183-Speed 6304.72 samples/sec Loss 7.3693 LearningRate 0.0008 Epoch: 6 Global Step: 141320 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:34:57,430-Speed 6308.06 samples/sec Loss 7.3870 LearningRate 0.0008 Epoch: 6 Global Step: 141330 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:00,676-Speed 6310.98 samples/sec Loss 7.3916 LearningRate 0.0008 Epoch: 6 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:03,921-Speed 6312.68 samples/sec Loss 7.3523 LearningRate 0.0008 Epoch: 6 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 
04:35:07,171-Speed 6303.35 samples/sec Loss 7.3687 LearningRate 0.0008 Epoch: 6 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:10,417-Speed 6310.32 samples/sec Loss 7.3928 LearningRate 0.0008 Epoch: 6 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:13,666-Speed 6305.21 samples/sec Loss 7.3419 LearningRate 0.0008 Epoch: 6 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:16,915-Speed 6304.97 samples/sec Loss 7.3899 LearningRate 0.0008 Epoch: 6 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:20,164-Speed 6304.74 samples/sec Loss 7.3689 LearningRate 0.0008 Epoch: 6 Global Step: 141400 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:23,410-Speed 6309.42 samples/sec Loss 7.3341 LearningRate 0.0008 Epoch: 6 Global Step: 141410 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:26,663-Speed 6298.65 samples/sec Loss 7.3087 LearningRate 0.0008 Epoch: 6 Global Step: 141420 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:29,926-Speed 6276.34 samples/sec Loss 7.3791 LearningRate 0.0008 Epoch: 6 Global Step: 141430 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:33,157-Speed 6340.98 samples/sec Loss 7.4544 LearningRate 0.0008 Epoch: 6 Global Step: 141440 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:35:36,440-Speed 6239.68 samples/sec Loss 7.4062 LearningRate 0.0008 Epoch: 6 Global Step: 141450 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:39,730-Speed 6225.99 samples/sec Loss 7.3770 LearningRate 0.0008 Epoch: 6 Global Step: 141460 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:42,974-Speed 6315.37 samples/sec Loss 7.3886 LearningRate 0.0008 Epoch: 6 Global Step: 141470 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:46,218-Speed 6314.08 samples/sec Loss 
7.3483 LearningRate 0.0008 Epoch: 6 Global Step: 141480 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:49,465-Speed 6309.18 samples/sec Loss 7.3654 LearningRate 0.0008 Epoch: 6 Global Step: 141490 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:52,710-Speed 6313.10 samples/sec Loss 7.3554 LearningRate 0.0008 Epoch: 6 Global Step: 141500 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:55,956-Speed 6309.46 samples/sec Loss 7.3433 LearningRate 0.0008 Epoch: 6 Global Step: 141510 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:35:59,203-Speed 6309.73 samples/sec Loss 7.3120 LearningRate 0.0008 Epoch: 6 Global Step: 141520 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:02,450-Speed 6309.55 samples/sec Loss 7.3810 LearningRate 0.0008 Epoch: 6 Global Step: 141530 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:05,696-Speed 6309.87 samples/sec Loss 7.3658 LearningRate 0.0008 Epoch: 6 Global Step: 141540 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:08,941-Speed 6313.32 samples/sec Loss 7.4143 LearningRate 0.0008 Epoch: 6 Global Step: 141550 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:36:12,185-Speed 6314.39 samples/sec Loss 7.5297 LearningRate 0.0008 Epoch: 6 Global Step: 141560 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:36:15,433-Speed 6307.30 samples/sec Loss 7.4096 LearningRate 0.0008 Epoch: 6 Global Step: 141570 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:36:18,677-Speed 6314.59 samples/sec Loss 7.3160 LearningRate 0.0008 Epoch: 6 Global Step: 141580 Fp16 Grad Scale: 65536 Required: 63 hours Training: 2022-04-01 04:36:21,908-Speed 6339.26 samples/sec Loss 7.3730 LearningRate 0.0008 Epoch: 6 Global Step: 141590 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:25,153-Speed 6312.10 samples/sec Loss 7.4427 LearningRate 0.0008 Epoch: 6 Global 
Step: 141600 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:28,395-Speed 6318.74 samples/sec Loss 7.4554 LearningRate 0.0008 Epoch: 6 Global Step: 141610 Fp16 Grad Scale: 32768 Required: 63 hours Training: 2022-04-01 04:36:31,645-Speed 6304.18 samples/sec Loss 7.4019 LearningRate 0.0008 Epoch: 6 Global Step: 141620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:34,888-Speed 6316.20 samples/sec Loss 7.4143 LearningRate 0.0008 Epoch: 6 Global Step: 141630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:38,145-Speed 6289.41 samples/sec Loss 7.4298 LearningRate 0.0008 Epoch: 6 Global Step: 141640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:41,390-Speed 6311.81 samples/sec Loss 7.4372 LearningRate 0.0008 Epoch: 6 Global Step: 141650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:44,637-Speed 6308.97 samples/sec Loss 7.4268 LearningRate 0.0008 Epoch: 6 Global Step: 141660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:47,884-Speed 6308.80 samples/sec Loss 7.3867 LearningRate 0.0008 Epoch: 6 Global Step: 141670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:51,131-Speed 6307.76 samples/sec Loss 7.3530 LearningRate 0.0008 Epoch: 6 Global Step: 141680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:36:54,379-Speed 6308.61 samples/sec Loss 7.4033 LearningRate 0.0008 Epoch: 6 Global Step: 141690 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:36:57,622-Speed 6314.85 samples/sec Loss 7.3579 LearningRate 0.0008 Epoch: 6 Global Step: 141700 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:00,865-Speed 6317.62 samples/sec Loss 7.4059 LearningRate 0.0008 Epoch: 6 Global Step: 141710 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:04,114-Speed 6305.93 samples/sec Loss 7.3755 LearningRate 0.0008 Epoch: 6 Global Step: 141720 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 04:37:07,361-Speed 6309.18 samples/sec Loss 7.3233 LearningRate 0.0008 Epoch: 6 Global Step: 141730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:10,592-Speed 6338.61 samples/sec Loss 7.3107 LearningRate 0.0008 Epoch: 6 Global Step: 141740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:13,841-Speed 6304.85 samples/sec Loss 7.3947 LearningRate 0.0008 Epoch: 6 Global Step: 141750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:17,087-Speed 6311.46 samples/sec Loss 7.2791 LearningRate 0.0008 Epoch: 6 Global Step: 141760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:20,336-Speed 6305.33 samples/sec Loss 7.4403 LearningRate 0.0008 Epoch: 6 Global Step: 141770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:23,584-Speed 6307.24 samples/sec Loss 7.3826 LearningRate 0.0008 Epoch: 6 Global Step: 141780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:26,830-Speed 6309.83 samples/sec Loss 7.2932 LearningRate 0.0008 Epoch: 6 Global Step: 141790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:30,076-Speed 6311.48 samples/sec Loss 7.3276 LearningRate 0.0008 Epoch: 6 Global Step: 141800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:33,322-Speed 6310.13 samples/sec Loss 7.3879 LearningRate 0.0008 Epoch: 6 Global Step: 141810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:36,573-Speed 6301.78 samples/sec Loss 7.3086 LearningRate 0.0008 Epoch: 6 Global Step: 141820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:39,817-Speed 6314.55 samples/sec Loss 7.3570 LearningRate 0.0008 Epoch: 6 Global Step: 141830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:43,061-Speed 6314.31 samples/sec Loss 7.4600 LearningRate 0.0008 Epoch: 6 Global Step: 141840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 
04:37:46,305-Speed 6313.44 samples/sec Loss 7.3208 LearningRate 0.0008 Epoch: 6 Global Step: 141850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:49,558-Speed 6298.13 samples/sec Loss 7.3503 LearningRate 0.0008 Epoch: 6 Global Step: 141860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:52,804-Speed 6311.16 samples/sec Loss 7.3486 LearningRate 0.0008 Epoch: 6 Global Step: 141870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:37:56,038-Speed 6333.65 samples/sec Loss 7.3830 LearningRate 0.0008 Epoch: 6 Global Step: 141880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:37:59,287-Speed 6304.32 samples/sec Loss 7.3835 LearningRate 0.0008 Epoch: 6 Global Step: 141890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:02,538-Speed 6301.70 samples/sec Loss 7.3592 LearningRate 0.0008 Epoch: 6 Global Step: 141900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:05,793-Speed 6293.28 samples/sec Loss 7.4497 LearningRate 0.0008 Epoch: 6 Global Step: 141910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:09,041-Speed 6305.86 samples/sec Loss 7.3385 LearningRate 0.0008 Epoch: 6 Global Step: 141920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:12,288-Speed 6310.34 samples/sec Loss 7.2916 LearningRate 0.0008 Epoch: 6 Global Step: 141930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:15,539-Speed 6300.51 samples/sec Loss 7.3100 LearningRate 0.0008 Epoch: 6 Global Step: 141940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:18,783-Speed 6315.46 samples/sec Loss 7.4610 LearningRate 0.0008 Epoch: 6 Global Step: 141950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:22,032-Speed 6305.13 samples/sec Loss 7.3690 LearningRate 0.0008 Epoch: 6 Global Step: 141960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:25,284-Speed 6298.83 samples/sec Loss 
7.3356 LearningRate 0.0008 Epoch: 6 Global Step: 141970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:38:28,535-Speed 6300.14 samples/sec Loss 7.3985 LearningRate 0.0008 Epoch: 6 Global Step: 141980 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:31,784-Speed 6305.42 samples/sec Loss 7.4330 LearningRate 0.0008 Epoch: 6 Global Step: 141990 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:35,039-Speed 6292.63 samples/sec Loss 7.4236 LearningRate 0.0008 Epoch: 6 Global Step: 142000 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:38,289-Speed 6303.60 samples/sec Loss 7.4164 LearningRate 0.0008 Epoch: 6 Global Step: 142010 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:41,537-Speed 6307.74 samples/sec Loss 7.3146 LearningRate 0.0008 Epoch: 6 Global Step: 142020 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:44,789-Speed 6298.10 samples/sec Loss 7.3517 LearningRate 0.0008 Epoch: 6 Global Step: 142030 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:48,037-Speed 6307.16 samples/sec Loss 7.3219 LearningRate 0.0008 Epoch: 6 Global Step: 142040 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:51,285-Speed 6306.37 samples/sec Loss 7.3065 LearningRate 0.0008 Epoch: 6 Global Step: 142050 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:54,535-Speed 6302.79 samples/sec Loss 7.4016 LearningRate 0.0008 Epoch: 6 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:38:57,782-Speed 6308.90 samples/sec Loss 7.3352 LearningRate 0.0008 Epoch: 6 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:01,029-Speed 6308.96 samples/sec Loss 7.3339 LearningRate 0.0008 Epoch: 6 Global Step: 142080 Fp16 Grad Scale: 131072 Required: 62 hours Training: 2022-04-01 04:39:04,258-Speed 6343.74 samples/sec Loss 7.4640 LearningRate 0.0008 Epoch: 6 Global 
Step: 142090 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:07,522-Speed 6277.03 samples/sec Loss 7.3112 LearningRate 0.0008 Epoch: 6 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:10,773-Speed 6299.62 samples/sec Loss 7.3498 LearningRate 0.0008 Epoch: 6 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:14,023-Speed 6303.69 samples/sec Loss 7.3579 LearningRate 0.0008 Epoch: 6 Global Step: 142120 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:17,270-Speed 6307.99 samples/sec Loss 7.3514 LearningRate 0.0008 Epoch: 6 Global Step: 142130 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:20,527-Speed 6289.81 samples/sec Loss 7.3066 LearningRate 0.0008 Epoch: 6 Global Step: 142140 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:23,779-Speed 6300.76 samples/sec Loss 7.2864 LearningRate 0.0008 Epoch: 6 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:27,023-Speed 6314.06 samples/sec Loss 7.3227 LearningRate 0.0008 Epoch: 6 Global Step: 142160 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:30,269-Speed 6310.97 samples/sec Loss 7.3876 LearningRate 0.0008 Epoch: 6 Global Step: 142170 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:33,518-Speed 6303.52 samples/sec Loss 7.2866 LearningRate 0.0008 Epoch: 6 Global Step: 142180 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:36,748-Speed 6342.70 samples/sec Loss 7.3458 LearningRate 0.0008 Epoch: 6 Global Step: 142190 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:39,997-Speed 6304.45 samples/sec Loss 7.3187 LearningRate 0.0008 Epoch: 6 Global Step: 142200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:43,248-Speed 6302.25 samples/sec Loss 7.3914 LearningRate 0.0008 Epoch: 6 Global Step: 142210 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 04:39:46,497-Speed 6304.14 samples/sec Loss 7.4150 LearningRate 0.0008 Epoch: 6 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:49,748-Speed 6300.76 samples/sec Loss 7.3810 LearningRate 0.0008 Epoch: 6 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:52,998-Speed 6303.16 samples/sec Loss 7.3550 LearningRate 0.0008 Epoch: 6 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:56,246-Speed 6307.04 samples/sec Loss 7.3633 LearningRate 0.0008 Epoch: 6 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:39:59,495-Speed 6304.63 samples/sec Loss 7.3189 LearningRate 0.0008 Epoch: 6 Global Step: 142260 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:02,743-Speed 6306.91 samples/sec Loss 7.4703 LearningRate 0.0008 Epoch: 6 Global Step: 142270 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:06,003-Speed 6283.43 samples/sec Loss 7.4266 LearningRate 0.0008 Epoch: 6 Global Step: 142280 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:09,235-Speed 6337.15 samples/sec Loss 7.3725 LearningRate 0.0008 Epoch: 6 Global Step: 142290 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:12,485-Speed 6303.67 samples/sec Loss 7.4102 LearningRate 0.0008 Epoch: 6 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:15,743-Speed 6287.36 samples/sec Loss 7.3851 LearningRate 0.0008 Epoch: 6 Global Step: 142310 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:18,991-Speed 6307.16 samples/sec Loss 7.3060 LearningRate 0.0008 Epoch: 6 Global Step: 142320 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:22,224-Speed 6336.85 samples/sec Loss 7.2883 LearningRate 0.0008 Epoch: 6 Global Step: 142330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:40:25,516-Speed 6221.78 samples/sec Loss 7.2833 LearningRate 0.0008 Epoch: 6 Global Step: 142340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:28,777-Speed 6281.81 samples/sec Loss 7.3590 LearningRate 0.0008 Epoch: 6 Global Step: 142350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:32,021-Speed 6316.07 samples/sec Loss 7.3145 LearningRate 0.0008 Epoch: 6 Global Step: 142360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:35,278-Speed 6288.60 samples/sec Loss 7.4810 LearningRate 0.0008 Epoch: 6 Global Step: 142370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:38,521-Speed 6316.16 samples/sec Loss 7.3032 LearningRate 0.0008 Epoch: 6 Global Step: 142380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:41,768-Speed 6310.13 samples/sec Loss 7.3906 LearningRate 0.0008 Epoch: 6 Global Step: 142390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:45,013-Speed 6312.08 samples/sec Loss 7.3910 LearningRate 0.0008 Epoch: 6 Global Step: 142400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:48,256-Speed 6315.82 samples/sec Loss 7.3292 LearningRate 0.0008 Epoch: 6 Global Step: 142410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:51,510-Speed 6295.23 samples/sec Loss 7.4031 LearningRate 0.0008 Epoch: 6 Global Step: 142420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:40:54,755-Speed 6314.07 samples/sec Loss 7.3699 LearningRate 0.0008 Epoch: 6 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:40:58,000-Speed 6311.16 samples/sec Loss 7.3858 LearningRate 0.0008 Epoch: 6 Global Step: 142440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:01,245-Speed 6312.38 samples/sec Loss 7.3311 LearningRate 0.0008 Epoch: 6 Global Step: 142450 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:04,489-Speed 6315.62 samples/sec Loss 
7.3448 LearningRate 0.0008 Epoch: 6 Global Step: 142460 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:07,737-Speed 6305.74 samples/sec Loss 7.4059 LearningRate 0.0008 Epoch: 6 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:10,981-Speed 6315.43 samples/sec Loss 7.3878 LearningRate 0.0008 Epoch: 6 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:14,239-Speed 6287.47 samples/sec Loss 7.3072 LearningRate 0.0008 Epoch: 6 Global Step: 142490 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:17,487-Speed 6307.17 samples/sec Loss 7.3547 LearningRate 0.0008 Epoch: 6 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:20,722-Speed 6331.67 samples/sec Loss 7.4238 LearningRate 0.0008 Epoch: 6 Global Step: 142510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:23,970-Speed 6307.75 samples/sec Loss 7.4204 LearningRate 0.0008 Epoch: 6 Global Step: 142520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:27,213-Speed 6315.79 samples/sec Loss 7.2251 LearningRate 0.0008 Epoch: 6 Global Step: 142530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:30,464-Speed 6301.23 samples/sec Loss 7.4044 LearningRate 0.0008 Epoch: 6 Global Step: 142540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:33,712-Speed 6307.61 samples/sec Loss 7.3414 LearningRate 0.0008 Epoch: 6 Global Step: 142550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:36,960-Speed 6305.77 samples/sec Loss 7.3524 LearningRate 0.0008 Epoch: 6 Global Step: 142560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:40,204-Speed 6315.00 samples/sec Loss 7.3362 LearningRate 0.0008 Epoch: 6 Global Step: 142570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:43,450-Speed 6310.09 samples/sec Loss 7.3963 LearningRate 0.0008 Epoch: 6 Global 
Step: 142580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:46,696-Speed 6312.06 samples/sec Loss 7.3826 LearningRate 0.0008 Epoch: 6 Global Step: 142590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:49,943-Speed 6308.67 samples/sec Loss 7.3452 LearningRate 0.0008 Epoch: 6 Global Step: 142600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:53,191-Speed 6306.49 samples/sec Loss 7.3452 LearningRate 0.0008 Epoch: 6 Global Step: 142610 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:41:56,431-Speed 6322.83 samples/sec Loss 7.3896 LearningRate 0.0008 Epoch: 6 Global Step: 142620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:41:59,681-Speed 6302.08 samples/sec Loss 7.3321 LearningRate 0.0008 Epoch: 6 Global Step: 142630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:02,934-Speed 6297.62 samples/sec Loss 7.3911 LearningRate 0.0008 Epoch: 6 Global Step: 142640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:06,180-Speed 6311.37 samples/sec Loss 7.3272 LearningRate 0.0008 Epoch: 6 Global Step: 142650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:09,428-Speed 6305.52 samples/sec Loss 7.3683 LearningRate 0.0008 Epoch: 6 Global Step: 142660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:12,672-Speed 6314.02 samples/sec Loss 7.3529 LearningRate 0.0008 Epoch: 6 Global Step: 142670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:15,922-Speed 6302.77 samples/sec Loss 7.3384 LearningRate 0.0008 Epoch: 6 Global Step: 142680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:19,172-Speed 6303.84 samples/sec Loss 7.4148 LearningRate 0.0008 Epoch: 6 Global Step: 142690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:22,417-Speed 6312.54 samples/sec Loss 7.4495 LearningRate 0.0008 Epoch: 6 Global Step: 142700 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 04:42:25,668-Speed 6302.06 samples/sec Loss 7.3892 LearningRate 0.0008 Epoch: 6 Global Step: 142710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:42:28,915-Speed 6307.82 samples/sec Loss 7.3425 LearningRate 0.0008 Epoch: 6 Global Step: 142720 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:32,163-Speed 6307.79 samples/sec Loss 7.3055 LearningRate 0.0008 Epoch: 6 Global Step: 142730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:35,410-Speed 6307.48 samples/sec Loss 7.3957 LearningRate 0.0008 Epoch: 6 Global Step: 142740 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:38,661-Speed 6302.24 samples/sec Loss 7.3706 LearningRate 0.0008 Epoch: 6 Global Step: 142750 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:41,909-Speed 6306.84 samples/sec Loss 7.3752 LearningRate 0.0008 Epoch: 6 Global Step: 142760 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:45,155-Speed 6312.03 samples/sec Loss 7.3534 LearningRate 0.0008 Epoch: 6 Global Step: 142770 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:48,406-Speed 6300.46 samples/sec Loss 7.3383 LearningRate 0.0008 Epoch: 6 Global Step: 142780 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:51,662-Speed 6291.08 samples/sec Loss 7.3457 LearningRate 0.0008 Epoch: 6 Global Step: 142790 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:54,912-Speed 6303.99 samples/sec Loss 7.3911 LearningRate 0.0008 Epoch: 6 Global Step: 142800 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:42:58,165-Speed 6297.10 samples/sec Loss 7.3627 LearningRate 0.0008 Epoch: 6 Global Step: 142810 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:01,417-Speed 6299.00 samples/sec Loss 7.4339 LearningRate 0.0008 Epoch: 6 Global Step: 142820 Fp16 Grad Scale: 131072 Required: 62 hours Training: 2022-04-01 
04:43:04,651-Speed 6332.62 samples/sec Loss 7.4022 LearningRate 0.0008 Epoch: 6 Global Step: 142830 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:07,900-Speed 6305.80 samples/sec Loss 7.3760 LearningRate 0.0008 Epoch: 6 Global Step: 142840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:11,147-Speed 6308.38 samples/sec Loss 7.3612 LearningRate 0.0008 Epoch: 6 Global Step: 142850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:14,398-Speed 6300.60 samples/sec Loss 7.4329 LearningRate 0.0008 Epoch: 6 Global Step: 142860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:17,643-Speed 6314.31 samples/sec Loss 7.3563 LearningRate 0.0008 Epoch: 6 Global Step: 142870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:20,887-Speed 6313.54 samples/sec Loss 7.3392 LearningRate 0.0008 Epoch: 6 Global Step: 142880 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:24,142-Speed 6293.78 samples/sec Loss 7.2598 LearningRate 0.0008 Epoch: 6 Global Step: 142890 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:27,395-Speed 6296.96 samples/sec Loss 7.3149 LearningRate 0.0008 Epoch: 6 Global Step: 142900 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:30,642-Speed 6309.26 samples/sec Loss 7.3692 LearningRate 0.0008 Epoch: 6 Global Step: 142910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:33,891-Speed 6304.27 samples/sec Loss 7.3148 LearningRate 0.0008 Epoch: 6 Global Step: 142920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:37,128-Speed 6328.03 samples/sec Loss 7.3302 LearningRate 0.0008 Epoch: 6 Global Step: 142930 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:40,374-Speed 6310.59 samples/sec Loss 7.3574 LearningRate 0.0008 Epoch: 6 Global Step: 142940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:43:43,606-Speed 6338.60 samples/sec Loss 
7.4145 LearningRate 0.0008 Epoch: 6 Global Step: 142950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:43:46,852-Speed 6311.38 samples/sec Loss 7.4029 LearningRate 0.0008 Epoch: 6 Global Step: 142960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:43:50,097-Speed 6312.75 samples/sec Loss 7.3475 LearningRate 0.0008 Epoch: 6 Global Step: 142970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:43:53,340-Speed 6315.61 samples/sec Loss 7.4007 LearningRate 0.0008 Epoch: 6 Global Step: 142980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:43:56,594-Speed 6295.25 samples/sec Loss 7.3551 LearningRate 0.0008 Epoch: 6 Global Step: 142990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:43:59,841-Speed 6309.12 samples/sec Loss 7.4314 LearningRate 0.0008 Epoch: 6 Global Step: 143000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:03,088-Speed 6308.61 samples/sec Loss 7.2943 LearningRate 0.0008 Epoch: 6 Global Step: 143010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:06,337-Speed 6305.14 samples/sec Loss 7.3484 LearningRate 0.0008 Epoch: 6 Global Step: 143020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:09,582-Speed 6311.78 samples/sec Loss 7.3647 LearningRate 0.0008 Epoch: 6 Global Step: 143030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:12,829-Speed 6309.74 samples/sec Loss 7.3244 LearningRate 0.0008 Epoch: 6 Global Step: 143040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:16,073-Speed 6314.28 samples/sec Loss 7.3273 LearningRate 0.0008 Epoch: 6 Global Step: 143050 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:44:19,319-Speed 6310.25 samples/sec Loss 7.3771 LearningRate 0.0008 Epoch: 6 Global Step: 143060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:44:22,567-Speed 6307.01 samples/sec Loss 7.3715 LearningRate 0.0008 Epoch: 6 Global 
Step: 143070 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:44:25,814-Speed 6310.15 samples/sec Loss 7.2930 LearningRate 0.0008 Epoch: 6 Global Step: 143080 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:44:29,051-Speed 6328.75 samples/sec Loss 7.3215 LearningRate 0.0008 Epoch: 6 Global Step: 143090 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:32,309-Speed 6285.98 samples/sec Loss 7.3508 LearningRate 0.0008 Epoch: 6 Global Step: 143100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:35,558-Speed 6305.37 samples/sec Loss 7.3284 LearningRate 0.0008 Epoch: 6 Global Step: 143110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:38,806-Speed 6306.35 samples/sec Loss 7.2760 LearningRate 0.0008 Epoch: 6 Global Step: 143120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:42,061-Speed 6292.95 samples/sec Loss 7.4055 LearningRate 0.0008 Epoch: 6 Global Step: 143130 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:45,312-Speed 6302.27 samples/sec Loss 7.2512 LearningRate 0.0008 Epoch: 6 Global Step: 143140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:48,560-Speed 6305.49 samples/sec Loss 7.3293 LearningRate 0.0008 Epoch: 6 Global Step: 143150 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:51,807-Speed 6308.73 samples/sec Loss 7.3883 LearningRate 0.0008 Epoch: 6 Global Step: 143160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:55,056-Speed 6306.65 samples/sec Loss 7.3661 LearningRate 0.0008 Epoch: 6 Global Step: 143170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:44:58,309-Speed 6297.07 samples/sec Loss 7.3492 LearningRate 0.0008 Epoch: 6 Global Step: 143180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:01,558-Speed 6305.25 samples/sec Loss 7.4164 LearningRate 0.0008 Epoch: 6 Global Step: 143190 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 04:45:04,804-Speed 6311.08 samples/sec Loss 7.3473 LearningRate 0.0008 Epoch: 6 Global Step: 143200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:08,049-Speed 6311.70 samples/sec Loss 7.4099 LearningRate 0.0008 Epoch: 6 Global Step: 143210 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:11,298-Speed 6305.34 samples/sec Loss 7.3186 LearningRate 0.0008 Epoch: 6 Global Step: 143220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:14,544-Speed 6310.63 samples/sec Loss 7.3581 LearningRate 0.0008 Epoch: 6 Global Step: 143230 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:17,791-Speed 6308.76 samples/sec Loss 7.3904 LearningRate 0.0008 Epoch: 6 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:21,034-Speed 6316.58 samples/sec Loss 7.3511 LearningRate 0.0008 Epoch: 6 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:45:24,267-Speed 6336.17 samples/sec Loss 7.4019 LearningRate 0.0008 Epoch: 6 Global Step: 143260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:27,520-Speed 6297.72 samples/sec Loss 7.3454 LearningRate 0.0008 Epoch: 6 Global Step: 143270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:30,766-Speed 6310.57 samples/sec Loss 7.3441 LearningRate 0.0008 Epoch: 6 Global Step: 143280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:34,018-Speed 6297.81 samples/sec Loss 7.3098 LearningRate 0.0008 Epoch: 6 Global Step: 143290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:37,268-Speed 6304.27 samples/sec Loss 7.2893 LearningRate 0.0008 Epoch: 6 Global Step: 143300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:40,514-Speed 6310.74 samples/sec Loss 7.3519 LearningRate 0.0008 Epoch: 6 Global Step: 143310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:45:43,763-Speed 6303.67 samples/sec Loss 7.3159 LearningRate 0.0008 Epoch: 6 Global Step: 143320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:47,017-Speed 6295.85 samples/sec Loss 7.4488 LearningRate 0.0008 Epoch: 6 Global Step: 143330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:50,284-Speed 6269.12 samples/sec Loss 7.2864 LearningRate 0.0008 Epoch: 6 Global Step: 143340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:53,532-Speed 6308.83 samples/sec Loss 7.3148 LearningRate 0.0008 Epoch: 6 Global Step: 143350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:45:56,780-Speed 6306.23 samples/sec Loss 7.3560 LearningRate 0.0008 Epoch: 6 Global Step: 143360 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:00,028-Speed 6305.88 samples/sec Loss 7.3833 LearningRate 0.0008 Epoch: 6 Global Step: 143370 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:03,261-Speed 6337.10 samples/sec Loss 7.3682 LearningRate 0.0008 Epoch: 6 Global Step: 143380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:06,510-Speed 6304.30 samples/sec Loss 7.2489 LearningRate 0.0008 Epoch: 6 Global Step: 143390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:09,761-Speed 6301.82 samples/sec Loss 7.3503 LearningRate 0.0008 Epoch: 6 Global Step: 143400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:13,009-Speed 6307.35 samples/sec Loss 7.3495 LearningRate 0.0008 Epoch: 6 Global Step: 143410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:16,252-Speed 6315.72 samples/sec Loss 7.3881 LearningRate 0.0008 Epoch: 6 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:19,501-Speed 6305.59 samples/sec Loss 7.3026 LearningRate 0.0008 Epoch: 6 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:22,750-Speed 6305.65 samples/sec Loss 
7.3660 LearningRate 0.0008 Epoch: 6 Global Step: 143440 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:26,006-Speed 6291.10 samples/sec Loss 7.3681 LearningRate 0.0008 Epoch: 6 Global Step: 143450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:29,254-Speed 6308.89 samples/sec Loss 7.2552 LearningRate 0.0008 Epoch: 6 Global Step: 143460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:32,504-Speed 6303.49 samples/sec Loss 7.3348 LearningRate 0.0008 Epoch: 6 Global Step: 143470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:35,755-Speed 6300.55 samples/sec Loss 7.3461 LearningRate 0.0008 Epoch: 6 Global Step: 143480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:39,002-Speed 6308.06 samples/sec Loss 7.4263 LearningRate 0.0008 Epoch: 6 Global Step: 143490 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:42,255-Speed 6297.17 samples/sec Loss 7.3875 LearningRate 0.0008 Epoch: 6 Global Step: 143500 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:45,501-Speed 6311.45 samples/sec Loss 7.3281 LearningRate 0.0008 Epoch: 6 Global Step: 143510 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:46:48,736-Speed 6331.71 samples/sec Loss 7.4318 LearningRate 0.0008 Epoch: 6 Global Step: 143520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:51,989-Speed 6298.49 samples/sec Loss 7.3851 LearningRate 0.0008 Epoch: 6 Global Step: 143530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:55,232-Speed 6314.68 samples/sec Loss 7.3189 LearningRate 0.0008 Epoch: 6 Global Step: 143540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:46:58,486-Speed 6296.09 samples/sec Loss 7.3581 LearningRate 0.0008 Epoch: 6 Global Step: 143550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:01,747-Speed 6281.05 samples/sec Loss 7.2447 LearningRate 0.0008 Epoch: 6 Global 
Step: 143560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:05,005-Speed 6288.64 samples/sec Loss 7.3747 LearningRate 0.0008 Epoch: 6 Global Step: 143570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:08,252-Speed 6308.16 samples/sec Loss 7.3766 LearningRate 0.0008 Epoch: 6 Global Step: 143580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:11,501-Speed 6305.94 samples/sec Loss 7.3444 LearningRate 0.0008 Epoch: 6 Global Step: 143590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:14,750-Speed 6304.16 samples/sec Loss 7.3569 LearningRate 0.0008 Epoch: 6 Global Step: 143600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:17,997-Speed 6309.06 samples/sec Loss 7.3320 LearningRate 0.0008 Epoch: 6 Global Step: 143610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:21,243-Speed 6311.43 samples/sec Loss 7.3892 LearningRate 0.0008 Epoch: 6 Global Step: 143620 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:47:24,478-Speed 6330.84 samples/sec Loss 7.3681 LearningRate 0.0008 Epoch: 6 Global Step: 143630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:27,723-Speed 6312.85 samples/sec Loss 7.4045 LearningRate 0.0008 Epoch: 6 Global Step: 143640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:30,969-Speed 6310.85 samples/sec Loss 7.4331 LearningRate 0.0008 Epoch: 6 Global Step: 143650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:34,215-Speed 6311.19 samples/sec Loss 7.3105 LearningRate 0.0008 Epoch: 6 Global Step: 143660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:37,464-Speed 6305.82 samples/sec Loss 7.3190 LearningRate 0.0008 Epoch: 6 Global Step: 143670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:40,711-Speed 6308.42 samples/sec Loss 7.3332 LearningRate 0.0008 Epoch: 6 Global Step: 143680 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 04:47:43,958-Speed 6309.40 samples/sec Loss 7.3676 LearningRate 0.0008 Epoch: 6 Global Step: 143690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:47,202-Speed 6314.47 samples/sec Loss 7.3191 LearningRate 0.0008 Epoch: 6 Global Step: 143700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:50,448-Speed 6310.47 samples/sec Loss 7.2229 LearningRate 0.0008 Epoch: 6 Global Step: 143710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:53,699-Speed 6301.76 samples/sec Loss 7.3368 LearningRate 0.0008 Epoch: 6 Global Step: 143720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:47:56,944-Speed 6311.61 samples/sec Loss 7.2782 LearningRate 0.0008 Epoch: 6 Global Step: 143730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:00,190-Speed 6310.34 samples/sec Loss 7.3698 LearningRate 0.0008 Epoch: 6 Global Step: 143740 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:03,426-Speed 6330.40 samples/sec Loss 7.4437 LearningRate 0.0008 Epoch: 6 Global Step: 143750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:06,675-Speed 6305.09 samples/sec Loss 7.3392 LearningRate 0.0008 Epoch: 6 Global Step: 143760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:09,921-Speed 6311.51 samples/sec Loss 7.4297 LearningRate 0.0008 Epoch: 6 Global Step: 143770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:13,164-Speed 6315.34 samples/sec Loss 7.2843 LearningRate 0.0008 Epoch: 6 Global Step: 143780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:16,413-Speed 6305.87 samples/sec Loss 7.4290 LearningRate 0.0008 Epoch: 6 Global Step: 143790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:19,664-Speed 6301.14 samples/sec Loss 7.3512 LearningRate 0.0008 Epoch: 6 Global Step: 143800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:48:22,908-Speed 6314.25 samples/sec Loss 7.3770 LearningRate 0.0008 Epoch: 6 Global Step: 143810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:26,154-Speed 6310.65 samples/sec Loss 7.3606 LearningRate 0.0008 Epoch: 6 Global Step: 143820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:29,400-Speed 6310.79 samples/sec Loss 7.3717 LearningRate 0.0008 Epoch: 6 Global Step: 143830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:32,648-Speed 6307.86 samples/sec Loss 7.3459 LearningRate 0.0008 Epoch: 6 Global Step: 143840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:48:35,895-Speed 6307.56 samples/sec Loss 7.3436 LearningRate 0.0008 Epoch: 6 Global Step: 143850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:39,151-Speed 6291.70 samples/sec Loss 7.3338 LearningRate 0.0008 Epoch: 6 Global Step: 143860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:42,400-Speed 6305.58 samples/sec Loss 7.2834 LearningRate 0.0008 Epoch: 6 Global Step: 143870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:45,645-Speed 6312.48 samples/sec Loss 7.2932 LearningRate 0.0008 Epoch: 6 Global Step: 143880 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:48,892-Speed 6309.04 samples/sec Loss 7.3314 LearningRate 0.0008 Epoch: 6 Global Step: 143890 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:52,141-Speed 6304.82 samples/sec Loss 7.3332 LearningRate 0.0008 Epoch: 6 Global Step: 143900 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:55,388-Speed 6308.26 samples/sec Loss 7.2768 LearningRate 0.0008 Epoch: 6 Global Step: 143910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:48:58,632-Speed 6315.67 samples/sec Loss 7.2869 LearningRate 0.0008 Epoch: 6 Global Step: 143920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:49:01,883-Speed 6300.99 samples/sec Loss 
7.3348 LearningRate 0.0008 Epoch: 6 Global Step: 143930 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:49:05,133-Speed 6302.31 samples/sec Loss 7.3324 LearningRate 0.0008 Epoch: 6 Global Step: 143940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:49:08,365-Speed 6338.14 samples/sec Loss 7.3515 LearningRate 0.0008 Epoch: 6 Global Step: 143950 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:49:11,598-Speed 6336.30 samples/sec Loss 7.2899 LearningRate 0.0008 Epoch: 6 Global Step: 143960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:14,844-Speed 6309.07 samples/sec Loss 7.2865 LearningRate 0.0008 Epoch: 6 Global Step: 143970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:18,093-Speed 6306.24 samples/sec Loss 7.3472 LearningRate 0.0008 Epoch: 6 Global Step: 143980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:21,341-Speed 6304.95 samples/sec Loss 7.3391 LearningRate 0.0008 Epoch: 6 Global Step: 143990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:24,601-Speed 6284.56 samples/sec Loss 7.2151 LearningRate 0.0008 Epoch: 6 Global Step: 144000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:27,857-Speed 6292.23 samples/sec Loss 7.2619 LearningRate 0.0008 Epoch: 6 Global Step: 144010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:31,104-Speed 6308.27 samples/sec Loss 7.2729 LearningRate 0.0008 Epoch: 6 Global Step: 144020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:34,351-Speed 6308.44 samples/sec Loss 7.3647 LearningRate 0.0008 Epoch: 6 Global Step: 144030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:37,612-Speed 6282.55 samples/sec Loss 7.2610 LearningRate 0.0008 Epoch: 6 Global Step: 144040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:40,859-Speed 6308.07 samples/sec Loss 7.3260 LearningRate 0.0008 Epoch: 6 Global 
Step: 144050 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:44,092-Speed 6337.01 samples/sec Loss 7.3465 LearningRate 0.0008 Epoch: 6 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:47,343-Speed 6301.03 samples/sec Loss 7.3946 LearningRate 0.0008 Epoch: 6 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:50,592-Speed 6305.08 samples/sec Loss 7.3362 LearningRate 0.0008 Epoch: 6 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:53,837-Speed 6311.69 samples/sec Loss 7.3322 LearningRate 0.0008 Epoch: 6 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:49:57,085-Speed 6307.45 samples/sec Loss 7.3579 LearningRate 0.0008 Epoch: 6 Global Step: 144100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:00,330-Speed 6312.61 samples/sec Loss 7.4000 LearningRate 0.0008 Epoch: 6 Global Step: 144110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:03,579-Speed 6305.50 samples/sec Loss 7.3843 LearningRate 0.0008 Epoch: 6 Global Step: 144120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:06,830-Speed 6301.38 samples/sec Loss 7.2996 LearningRate 0.0008 Epoch: 6 Global Step: 144130 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:10,072-Speed 6317.27 samples/sec Loss 7.3096 LearningRate 0.0008 Epoch: 6 Global Step: 144140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:13,322-Speed 6304.06 samples/sec Loss 7.3636 LearningRate 0.0008 Epoch: 6 Global Step: 144150 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:16,565-Speed 6315.12 samples/sec Loss 7.2231 LearningRate 0.0008 Epoch: 6 Global Step: 144160 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:19,813-Speed 6306.28 samples/sec Loss 7.2753 LearningRate 0.0008 Epoch: 6 Global Step: 144170 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 04:50:23,062-Speed 6305.53 samples/sec Loss 7.2469 LearningRate 0.0008 Epoch: 6 Global Step: 144180 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:26,309-Speed 6309.63 samples/sec Loss 7.4002 LearningRate 0.0008 Epoch: 6 Global Step: 144190 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:29,555-Speed 6310.22 samples/sec Loss 7.3923 LearningRate 0.0008 Epoch: 6 Global Step: 144200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:32,800-Speed 6312.73 samples/sec Loss 7.3086 LearningRate 0.0008 Epoch: 6 Global Step: 144210 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:36,054-Speed 6295.66 samples/sec Loss 7.2512 LearningRate 0.0008 Epoch: 6 Global Step: 144220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:50:39,286-Speed 6338.02 samples/sec Loss 7.4475 LearningRate 0.0008 Epoch: 6 Global Step: 144230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:42,534-Speed 6305.71 samples/sec Loss 7.3252 LearningRate 0.0008 Epoch: 6 Global Step: 144240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:45,781-Speed 6309.71 samples/sec Loss 7.3652 LearningRate 0.0008 Epoch: 6 Global Step: 144250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:49,026-Speed 6312.06 samples/sec Loss 7.3277 LearningRate 0.0008 Epoch: 6 Global Step: 144260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:52,274-Speed 6308.29 samples/sec Loss 7.3531 LearningRate 0.0008 Epoch: 6 Global Step: 144270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:55,521-Speed 6309.50 samples/sec Loss 7.3620 LearningRate 0.0008 Epoch: 6 Global Step: 144280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:50:58,765-Speed 6313.03 samples/sec Loss 7.4079 LearningRate 0.0008 Epoch: 6 Global Step: 144290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:51:02,009-Speed 6316.22 samples/sec Loss 7.2755 LearningRate 0.0008 Epoch: 6 Global Step: 144300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:05,256-Speed 6307.76 samples/sec Loss 7.3792 LearningRate 0.0008 Epoch: 6 Global Step: 144310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:08,500-Speed 6315.04 samples/sec Loss 7.4056 LearningRate 0.0008 Epoch: 6 Global Step: 144320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:11,744-Speed 6314.64 samples/sec Loss 7.2971 LearningRate 0.0008 Epoch: 6 Global Step: 144330 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:51:14,991-Speed 6307.70 samples/sec Loss 7.3599 LearningRate 0.0008 Epoch: 6 Global Step: 144340 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:51:18,236-Speed 6313.03 samples/sec Loss 7.3076 LearningRate 0.0008 Epoch: 6 Global Step: 144350 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:51:21,467-Speed 6340.11 samples/sec Loss 7.3857 LearningRate 0.0008 Epoch: 6 Global Step: 144360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:24,713-Speed 6311.12 samples/sec Loss 7.2632 LearningRate 0.0008 Epoch: 6 Global Step: 144370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:27,966-Speed 6299.10 samples/sec Loss 7.3361 LearningRate 0.0008 Epoch: 6 Global Step: 144380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:31,214-Speed 6306.72 samples/sec Loss 7.3304 LearningRate 0.0008 Epoch: 6 Global Step: 144390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:34,457-Speed 6316.86 samples/sec Loss 7.3352 LearningRate 0.0008 Epoch: 6 Global Step: 144400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:37,703-Speed 6309.99 samples/sec Loss 7.3024 LearningRate 0.0008 Epoch: 6 Global Step: 144410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:40,947-Speed 6314.28 samples/sec Loss 
7.2351 LearningRate 0.0008 Epoch: 6 Global Step: 144420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:44,190-Speed 6317.27 samples/sec Loss 7.2294 LearningRate 0.0008 Epoch: 6 Global Step: 144430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:47,442-Speed 6299.22 samples/sec Loss 7.2508 LearningRate 0.0008 Epoch: 6 Global Step: 144440 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:50,688-Speed 6310.47 samples/sec Loss 7.4365 LearningRate 0.0008 Epoch: 6 Global Step: 144450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:51:53,941-Speed 6297.58 samples/sec Loss 7.3752 LearningRate 0.0008 Epoch: 6 Global Step: 144460 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:51:57,184-Speed 6315.44 samples/sec Loss 7.3696 LearningRate 0.0008 Epoch: 6 Global Step: 144470 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:00,433-Speed 6305.38 samples/sec Loss 7.3485 LearningRate 0.0008 Epoch: 6 Global Step: 144480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:03,683-Speed 6304.70 samples/sec Loss 7.3730 LearningRate 0.0008 Epoch: 6 Global Step: 144490 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:06,931-Speed 6306.28 samples/sec Loss 7.3836 LearningRate 0.0008 Epoch: 6 Global Step: 144500 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:10,176-Speed 6313.42 samples/sec Loss 7.3077 LearningRate 0.0008 Epoch: 6 Global Step: 144510 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:13,422-Speed 6308.99 samples/sec Loss 7.3523 LearningRate 0.0008 Epoch: 6 Global Step: 144520 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:16,676-Speed 6295.31 samples/sec Loss 7.2316 LearningRate 0.0008 Epoch: 6 Global Step: 144530 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:19,935-Speed 6286.62 samples/sec Loss 7.3170 LearningRate 0.0008 Epoch: 6 Global 
Step: 144540 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:23,198-Speed 6277.93 samples/sec Loss 7.2885 LearningRate 0.0008 Epoch: 6 Global Step: 144550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:26,441-Speed 6315.88 samples/sec Loss 7.3029 LearningRate 0.0008 Epoch: 6 Global Step: 144560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:29,687-Speed 6310.73 samples/sec Loss 7.4057 LearningRate 0.0008 Epoch: 6 Global Step: 144570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:32,936-Speed 6305.06 samples/sec Loss 7.2505 LearningRate 0.0008 Epoch: 6 Global Step: 144580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:36,186-Speed 6303.53 samples/sec Loss 7.1963 LearningRate 0.0008 Epoch: 6 Global Step: 144590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:39,429-Speed 6316.73 samples/sec Loss 7.3444 LearningRate 0.0008 Epoch: 6 Global Step: 144600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:42,697-Speed 6267.15 samples/sec Loss 7.3600 LearningRate 0.0008 Epoch: 6 Global Step: 144610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:45,938-Speed 6320.63 samples/sec Loss 7.3629 LearningRate 0.0008 Epoch: 6 Global Step: 144620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:49,183-Speed 6312.11 samples/sec Loss 7.3599 LearningRate 0.0008 Epoch: 6 Global Step: 144630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:52,431-Speed 6307.79 samples/sec Loss 7.2112 LearningRate 0.0008 Epoch: 6 Global Step: 144640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:52:55,676-Speed 6311.71 samples/sec Loss 7.3325 LearningRate 0.0008 Epoch: 6 Global Step: 144650 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:52:58,926-Speed 6304.13 samples/sec Loss 7.3432 LearningRate 0.0008 Epoch: 6 Global Step: 144660 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 04:53:02,173-Speed 6307.80 samples/sec Loss 7.3186 LearningRate 0.0008 Epoch: 6 Global Step: 144670 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:05,424-Speed 6301.27 samples/sec Loss 7.2958 LearningRate 0.0008 Epoch: 6 Global Step: 144680 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:08,672-Speed 6308.45 samples/sec Loss 7.4101 LearningRate 0.0008 Epoch: 6 Global Step: 144690 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:11,922-Speed 6301.61 samples/sec Loss 7.3176 LearningRate 0.0008 Epoch: 6 Global Step: 144700 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:15,176-Speed 6296.04 samples/sec Loss 7.3459 LearningRate 0.0008 Epoch: 6 Global Step: 144710 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:18,425-Speed 6303.55 samples/sec Loss 7.3048 LearningRate 0.0008 Epoch: 6 Global Step: 144720 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:21,677-Speed 6299.59 samples/sec Loss 7.3408 LearningRate 0.0008 Epoch: 6 Global Step: 144730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:53:24,910-Speed 6336.04 samples/sec Loss 7.2835 LearningRate 0.0008 Epoch: 6 Global Step: 144740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:28,156-Speed 6310.49 samples/sec Loss 7.2803 LearningRate 0.0008 Epoch: 6 Global Step: 144750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:31,404-Speed 6306.35 samples/sec Loss 7.2311 LearningRate 0.0008 Epoch: 6 Global Step: 144760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:34,661-Speed 6290.48 samples/sec Loss 7.3267 LearningRate 0.0008 Epoch: 6 Global Step: 144770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:37,908-Speed 6307.74 samples/sec Loss 7.3383 LearningRate 0.0008 Epoch: 6 Global Step: 144780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:53:41,152-Speed 6316.00 samples/sec Loss 7.2797 LearningRate 0.0008 Epoch: 6 Global Step: 144790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:44,402-Speed 6303.41 samples/sec Loss 7.3880 LearningRate 0.0008 Epoch: 6 Global Step: 144800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:47,656-Speed 6293.58 samples/sec Loss 7.2960 LearningRate 0.0008 Epoch: 6 Global Step: 144810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:50,903-Speed 6308.80 samples/sec Loss 7.3240 LearningRate 0.0008 Epoch: 6 Global Step: 144820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:54,149-Speed 6312.22 samples/sec Loss 7.2697 LearningRate 0.0008 Epoch: 6 Global Step: 144830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:53:57,394-Speed 6312.10 samples/sec Loss 7.3862 LearningRate 0.0008 Epoch: 6 Global Step: 144840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:54:00,628-Speed 6332.70 samples/sec Loss 7.3667 LearningRate 0.0008 Epoch: 6 Global Step: 144850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:03,879-Speed 6302.10 samples/sec Loss 7.3065 LearningRate 0.0008 Epoch: 6 Global Step: 144860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:07,128-Speed 6305.32 samples/sec Loss 7.2633 LearningRate 0.0008 Epoch: 6 Global Step: 144870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:10,373-Speed 6311.40 samples/sec Loss 7.3209 LearningRate 0.0008 Epoch: 6 Global Step: 144880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:13,618-Speed 6314.42 samples/sec Loss 7.3394 LearningRate 0.0008 Epoch: 6 Global Step: 144890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:16,868-Speed 6302.64 samples/sec Loss 7.3150 LearningRate 0.0008 Epoch: 6 Global Step: 144900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:20,113-Speed 6311.74 samples/sec Loss 
7.2823 LearningRate 0.0008 Epoch: 6 Global Step: 144910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:23,361-Speed 6308.49 samples/sec Loss 7.2878 LearningRate 0.0008 Epoch: 6 Global Step: 144920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:26,608-Speed 6308.44 samples/sec Loss 7.2630 LearningRate 0.0008 Epoch: 6 Global Step: 144930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:29,851-Speed 6316.89 samples/sec Loss 7.3150 LearningRate 0.0008 Epoch: 6 Global Step: 144940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:33,086-Speed 6332.25 samples/sec Loss 7.2871 LearningRate 0.0008 Epoch: 6 Global Step: 144950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:36,330-Speed 6314.79 samples/sec Loss 7.3423 LearningRate 0.0008 Epoch: 6 Global Step: 144960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:39,575-Speed 6311.51 samples/sec Loss 7.3367 LearningRate 0.0008 Epoch: 6 Global Step: 144970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:42,818-Speed 6316.33 samples/sec Loss 7.2814 LearningRate 0.0008 Epoch: 6 Global Step: 144980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:46,071-Speed 6297.08 samples/sec Loss 7.3324 LearningRate 0.0008 Epoch: 6 Global Step: 144990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:49,318-Speed 6309.57 samples/sec Loss 7.2979 LearningRate 0.0008 Epoch: 6 Global Step: 145000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:52,574-Speed 6291.08 samples/sec Loss 7.3721 LearningRate 0.0008 Epoch: 6 Global Step: 145010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:55,820-Speed 6310.22 samples/sec Loss 7.3808 LearningRate 0.0008 Epoch: 6 Global Step: 145020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:54:59,073-Speed 6297.28 samples/sec Loss 7.4186 LearningRate 0.0008 Epoch: 6 Global 
Step: 145030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:55:02,325-Speed 6299.24 samples/sec Loss 7.3134 LearningRate 0.0008 Epoch: 6 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:55:05,576-Speed 6300.54 samples/sec Loss 7.3658 LearningRate 0.0008 Epoch: 6 Global Step: 145050 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:08,824-Speed 6307.73 samples/sec Loss 7.3158 LearningRate 0.0008 Epoch: 6 Global Step: 145060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:12,073-Speed 6305.36 samples/sec Loss 7.3866 LearningRate 0.0008 Epoch: 6 Global Step: 145070 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:15,317-Speed 6313.67 samples/sec Loss 7.2845 LearningRate 0.0008 Epoch: 6 Global Step: 145080 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:18,567-Speed 6303.26 samples/sec Loss 7.4008 LearningRate 0.0008 Epoch: 6 Global Step: 145090 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:21,811-Speed 6313.43 samples/sec Loss 7.3270 LearningRate 0.0008 Epoch: 6 Global Step: 145100 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:25,060-Speed 6306.70 samples/sec Loss 7.2376 LearningRate 0.0008 Epoch: 6 Global Step: 145110 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:28,304-Speed 6314.17 samples/sec Loss 7.3755 LearningRate 0.0008 Epoch: 6 Global Step: 145120 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:31,552-Speed 6307.81 samples/sec Loss 7.3406 LearningRate 0.0008 Epoch: 6 Global Step: 145130 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:55:34,784-Speed 6338.35 samples/sec Loss 7.3539 LearningRate 0.0008 Epoch: 6 Global Step: 145140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:55:38,029-Speed 6311.02 samples/sec Loss 7.3604 LearningRate 0.0008 Epoch: 6 Global Step: 145150 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 04:55:41,276-Speed 6309.74 samples/sec Loss 7.3549 LearningRate 0.0008 Epoch: 6 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:55:44,520-Speed 6314.86 samples/sec Loss 7.3380 LearningRate 0.0008 Epoch: 6 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:56:43,860-Speed 345.13 samples/sec Loss 7.3084 LearningRate 0.0008 Epoch: 7 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:56:47,107-Speed 6310.25 samples/sec Loss 7.3697 LearningRate 0.0008 Epoch: 7 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:56:50,348-Speed 6320.15 samples/sec Loss 7.3111 LearningRate 0.0008 Epoch: 7 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:56:53,590-Speed 6317.70 samples/sec Loss 7.3185 LearningRate 0.0008 Epoch: 7 Global Step: 145210 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:56:56,833-Speed 6315.64 samples/sec Loss 7.3239 LearningRate 0.0008 Epoch: 7 Global Step: 145220 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:00,073-Speed 6323.69 samples/sec Loss 7.2724 LearningRate 0.0008 Epoch: 7 Global Step: 145230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:03,312-Speed 6324.00 samples/sec Loss 7.2820 LearningRate 0.0008 Epoch: 7 Global Step: 145240 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:06,539-Speed 6348.58 samples/sec Loss 7.3430 LearningRate 0.0008 Epoch: 7 Global Step: 145250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:09,775-Speed 6330.65 samples/sec Loss 7.3173 LearningRate 0.0008 Epoch: 7 Global Step: 145260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:13,009-Speed 6332.84 samples/sec Loss 7.3567 LearningRate 0.0008 Epoch: 7 Global Step: 145270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:57:16,248-Speed 6323.45 samples/sec Loss 7.2769 LearningRate 0.0008 Epoch: 7 Global Step: 145280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:19,490-Speed 6318.36 samples/sec Loss 7.4132 LearningRate 0.0008 Epoch: 7 Global Step: 145290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:22,731-Speed 6321.98 samples/sec Loss 7.3537 LearningRate 0.0008 Epoch: 7 Global Step: 145300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:25,968-Speed 6327.17 samples/sec Loss 7.3291 LearningRate 0.0008 Epoch: 7 Global Step: 145310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:29,207-Speed 6324.96 samples/sec Loss 7.3801 LearningRate 0.0008 Epoch: 7 Global Step: 145320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:32,447-Speed 6323.21 samples/sec Loss 7.2810 LearningRate 0.0008 Epoch: 7 Global Step: 145330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:35,684-Speed 6327.84 samples/sec Loss 7.3085 LearningRate 0.0008 Epoch: 7 Global Step: 145340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:57:38,925-Speed 6320.54 samples/sec Loss 7.2883 LearningRate 0.0008 Epoch: 7 Global Step: 145350 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:42,166-Speed 6322.02 samples/sec Loss 7.3162 LearningRate 0.0008 Epoch: 7 Global Step: 145360 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:45,405-Speed 6323.90 samples/sec Loss 7.2384 LearningRate 0.0008 Epoch: 7 Global Step: 145370 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:48,643-Speed 6325.97 samples/sec Loss 7.2859 LearningRate 0.0008 Epoch: 7 Global Step: 145380 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:51,892-Speed 6303.98 samples/sec Loss 7.3287 LearningRate 0.0008 Epoch: 7 Global Step: 145390 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:55,132-Speed 6323.22 samples/sec Loss 
7.2423 LearningRate 0.0008 Epoch: 7 Global Step: 145400 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:57:58,375-Speed 6316.32 samples/sec Loss 7.2257 LearningRate 0.0008 Epoch: 7 Global Step: 145410 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:01,616-Speed 6321.19 samples/sec Loss 7.3555 LearningRate 0.0008 Epoch: 7 Global Step: 145420 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:04,852-Speed 6330.02 samples/sec Loss 7.2611 LearningRate 0.0008 Epoch: 7 Global Step: 145430 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:08,094-Speed 6317.64 samples/sec Loss 7.2675 LearningRate 0.0008 Epoch: 7 Global Step: 145440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:11,318-Speed 6354.61 samples/sec Loss 7.3339 LearningRate 0.0008 Epoch: 7 Global Step: 145450 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:14,538-Speed 6361.47 samples/sec Loss 7.3259 LearningRate 0.0008 Epoch: 7 Global Step: 145460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:17,779-Speed 6319.96 samples/sec Loss 7.3355 LearningRate 0.0008 Epoch: 7 Global Step: 145470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:21,022-Speed 6317.51 samples/sec Loss 7.1863 LearningRate 0.0008 Epoch: 7 Global Step: 145480 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:24,259-Speed 6327.10 samples/sec Loss 7.4066 LearningRate 0.0008 Epoch: 7 Global Step: 145490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:27,499-Speed 6321.94 samples/sec Loss 7.1736 LearningRate 0.0008 Epoch: 7 Global Step: 145500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:30,737-Speed 6327.11 samples/sec Loss 7.3469 LearningRate 0.0008 Epoch: 7 Global Step: 145510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:33,977-Speed 6323.13 samples/sec Loss 7.3289 LearningRate 0.0008 Epoch: 7 Global 
Step: 145520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:37,213-Speed 6329.38 samples/sec Loss 7.3016 LearningRate 0.0008 Epoch: 7 Global Step: 145530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:40,449-Speed 6331.04 samples/sec Loss 7.2800 LearningRate 0.0008 Epoch: 7 Global Step: 145540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:43,689-Speed 6323.57 samples/sec Loss 7.2446 LearningRate 0.0008 Epoch: 7 Global Step: 145550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:46,931-Speed 6318.53 samples/sec Loss 7.3109 LearningRate 0.0008 Epoch: 7 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:58:50,153-Speed 6356.14 samples/sec Loss 7.3130 LearningRate 0.0008 Epoch: 7 Global Step: 145570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:53,388-Speed 6332.83 samples/sec Loss 7.2745 LearningRate 0.0008 Epoch: 7 Global Step: 145580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:56,623-Speed 6332.40 samples/sec Loss 7.2577 LearningRate 0.0008 Epoch: 7 Global Step: 145590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:58:59,859-Speed 6330.94 samples/sec Loss 7.2846 LearningRate 0.0008 Epoch: 7 Global Step: 145600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:03,100-Speed 6320.91 samples/sec Loss 7.3298 LearningRate 0.0008 Epoch: 7 Global Step: 145610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:06,369-Speed 6265.22 samples/sec Loss 7.3694 LearningRate 0.0008 Epoch: 7 Global Step: 145620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:09,604-Speed 6332.63 samples/sec Loss 7.3394 LearningRate 0.0008 Epoch: 7 Global Step: 145630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:12,838-Speed 6332.83 samples/sec Loss 7.3694 LearningRate 0.0008 Epoch: 7 Global Step: 145640 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 04:59:16,076-Speed 6327.47 samples/sec Loss 7.3392 LearningRate 0.0008 Epoch: 7 Global Step: 145650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:19,323-Speed 6309.09 samples/sec Loss 7.2249 LearningRate 0.0008 Epoch: 7 Global Step: 145660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:22,550-Speed 6347.50 samples/sec Loss 7.4132 LearningRate 0.0008 Epoch: 7 Global Step: 145670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:25,787-Speed 6328.01 samples/sec Loss 7.3118 LearningRate 0.0008 Epoch: 7 Global Step: 145680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:29,026-Speed 6323.34 samples/sec Loss 7.3652 LearningRate 0.0008 Epoch: 7 Global Step: 145690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:32,262-Speed 6330.74 samples/sec Loss 7.2971 LearningRate 0.0008 Epoch: 7 Global Step: 145700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:35,499-Speed 6328.79 samples/sec Loss 7.2687 LearningRate 0.0008 Epoch: 7 Global Step: 145710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:38,736-Speed 6328.33 samples/sec Loss 7.2489 LearningRate 0.0008 Epoch: 7 Global Step: 145720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:41,974-Speed 6327.10 samples/sec Loss 7.2771 LearningRate 0.0008 Epoch: 7 Global Step: 145730 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:45,213-Speed 6323.37 samples/sec Loss 7.2985 LearningRate 0.0008 Epoch: 7 Global Step: 145740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:48,450-Speed 6327.93 samples/sec Loss 7.2826 LearningRate 0.0008 Epoch: 7 Global Step: 145750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 04:59:51,698-Speed 6306.16 samples/sec Loss 7.2624 LearningRate 0.0008 Epoch: 7 Global Step: 145760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
04:59:54,937-Speed 6325.22 samples/sec Loss 7.2639 LearningRate 0.0008 Epoch: 7 Global Step: 145770 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 04:59:58,177-Speed 6322.38 samples/sec Loss 7.1964 LearningRate 0.0008 Epoch: 7 Global Step: 145780 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:01,417-Speed 6322.70 samples/sec Loss 7.2447 LearningRate 0.0008 Epoch: 7 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:04,657-Speed 6322.57 samples/sec Loss 7.2873 LearningRate 0.0008 Epoch: 7 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:07,882-Speed 6351.79 samples/sec Loss 7.4336 LearningRate 0.0008 Epoch: 7 Global Step: 145810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:11,117-Speed 6331.87 samples/sec Loss 7.2878 LearningRate 0.0008 Epoch: 7 Global Step: 145820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:14,361-Speed 6315.94 samples/sec Loss 7.2909 LearningRate 0.0008 Epoch: 7 Global Step: 145830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:17,599-Speed 6324.67 samples/sec Loss 7.2731 LearningRate 0.0008 Epoch: 7 Global Step: 145840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:20,836-Speed 6328.32 samples/sec Loss 7.2723 LearningRate 0.0008 Epoch: 7 Global Step: 145850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:24,073-Speed 6328.53 samples/sec Loss 7.2796 LearningRate 0.0008 Epoch: 7 Global Step: 145860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:27,310-Speed 6329.54 samples/sec Loss 7.2433 LearningRate 0.0008 Epoch: 7 Global Step: 145870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:30,545-Speed 6331.60 samples/sec Loss 7.3614 LearningRate 0.0008 Epoch: 7 Global Step: 145880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:33,789-Speed 6315.50 samples/sec Loss 
7.2638 LearningRate 0.0008 Epoch: 7 Global Step: 145890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:37,056-Speed 6269.20 samples/sec Loss 7.3228 LearningRate 0.0008 Epoch: 7 Global Step: 145900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:00:40,297-Speed 6321.78 samples/sec Loss 7.2770 LearningRate 0.0008 Epoch: 7 Global Step: 145910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:43,535-Speed 6324.88 samples/sec Loss 7.3177 LearningRate 0.0008 Epoch: 7 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:46,774-Speed 6325.72 samples/sec Loss 7.3164 LearningRate 0.0008 Epoch: 7 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:50,010-Speed 6329.22 samples/sec Loss 7.2360 LearningRate 0.0008 Epoch: 7 Global Step: 145940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:53,248-Speed 6326.10 samples/sec Loss 7.2782 LearningRate 0.0008 Epoch: 7 Global Step: 145950 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:56,483-Speed 6332.80 samples/sec Loss 7.2207 LearningRate 0.0008 Epoch: 7 Global Step: 145960 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:00:59,719-Speed 6329.07 samples/sec Loss 7.2243 LearningRate 0.0008 Epoch: 7 Global Step: 145970 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:02,962-Speed 6318.00 samples/sec Loss 7.3691 LearningRate 0.0008 Epoch: 7 Global Step: 145980 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:06,182-Speed 6362.11 samples/sec Loss 7.2995 LearningRate 0.0008 Epoch: 7 Global Step: 145990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:09,417-Speed 6332.55 samples/sec Loss 7.2960 LearningRate 0.0008 Epoch: 7 Global Step: 146000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:12,655-Speed 6326.53 samples/sec Loss 7.2624 LearningRate 0.0008 Epoch: 7 Global 
Step: 146010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:15,890-Speed 6331.29 samples/sec Loss 7.2686 LearningRate 0.0008 Epoch: 7 Global Step: 146020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:19,130-Speed 6323.03 samples/sec Loss 7.3604 LearningRate 0.0008 Epoch: 7 Global Step: 146030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:22,364-Speed 6332.51 samples/sec Loss 7.4179 LearningRate 0.0008 Epoch: 7 Global Step: 146040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:25,607-Speed 6317.54 samples/sec Loss 7.3624 LearningRate 0.0008 Epoch: 7 Global Step: 146050 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:28,844-Speed 6328.82 samples/sec Loss 7.3971 LearningRate 0.0008 Epoch: 7 Global Step: 146060 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:32,084-Speed 6321.80 samples/sec Loss 7.3198 LearningRate 0.0008 Epoch: 7 Global Step: 146070 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:35,328-Speed 6314.66 samples/sec Loss 7.3565 LearningRate 0.0008 Epoch: 7 Global Step: 146080 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:01:38,567-Speed 6323.88 samples/sec Loss 7.3078 LearningRate 0.0008 Epoch: 7 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:41,803-Speed 6330.60 samples/sec Loss 7.2881 LearningRate 0.0008 Epoch: 7 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:45,040-Speed 6329.07 samples/sec Loss 7.3508 LearningRate 0.0008 Epoch: 7 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:48,282-Speed 6317.57 samples/sec Loss 7.3541 LearningRate 0.0008 Epoch: 7 Global Step: 146120 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:51,521-Speed 6323.13 samples/sec Loss 7.2244 LearningRate 0.0008 Epoch: 7 Global Step: 146130 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 05:01:54,759-Speed 6327.46 samples/sec Loss 7.2947 LearningRate 0.0008 Epoch: 7 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:01:58,006-Speed 6308.60 samples/sec Loss 7.3227 LearningRate 0.0008 Epoch: 7 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:02:01,247-Speed 6321.12 samples/sec Loss 7.2842 LearningRate 0.0008 Epoch: 7 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:02:04,486-Speed 6323.53 samples/sec Loss 7.3339 LearningRate 0.0008 Epoch: 7 Global Step: 146170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:07,722-Speed 6330.00 samples/sec Loss 7.3188 LearningRate 0.0008 Epoch: 7 Global Step: 146180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:10,960-Speed 6326.26 samples/sec Loss 7.3027 LearningRate 0.0008 Epoch: 7 Global Step: 146190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:14,199-Speed 6325.79 samples/sec Loss 7.2758 LearningRate 0.0008 Epoch: 7 Global Step: 146200 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:17,446-Speed 6308.51 samples/sec Loss 7.2422 LearningRate 0.0008 Epoch: 7 Global Step: 146210 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:20,686-Speed 6323.55 samples/sec Loss 7.1459 LearningRate 0.0008 Epoch: 7 Global Step: 146220 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:23,926-Speed 6322.09 samples/sec Loss 7.2647 LearningRate 0.0008 Epoch: 7 Global Step: 146230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:27,166-Speed 6322.82 samples/sec Loss 7.3215 LearningRate 0.0008 Epoch: 7 Global Step: 146240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:30,408-Speed 6316.98 samples/sec Loss 7.3320 LearningRate 0.0008 Epoch: 7 Global Step: 146250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:02:33,656-Speed 6308.16 samples/sec Loss 7.2072 LearningRate 0.0008 Epoch: 7 Global Step: 146260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:36,894-Speed 6325.20 samples/sec Loss 7.1693 LearningRate 0.0008 Epoch: 7 Global Step: 146270 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:02:40,142-Speed 6307.37 samples/sec Loss 7.3067 LearningRate 0.0008 Epoch: 7 Global Step: 146280 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:02:43,384-Speed 6319.29 samples/sec Loss 7.3506 LearningRate 0.0008 Epoch: 7 Global Step: 146290 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:02:46,606-Speed 6357.12 samples/sec Loss 7.2744 LearningRate 0.0008 Epoch: 7 Global Step: 146300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:49,843-Speed 6328.72 samples/sec Loss 7.3105 LearningRate 0.0008 Epoch: 7 Global Step: 146310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:53,084-Speed 6320.60 samples/sec Loss 7.3821 LearningRate 0.0008 Epoch: 7 Global Step: 146320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:56,324-Speed 6321.23 samples/sec Loss 7.3054 LearningRate 0.0008 Epoch: 7 Global Step: 146330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:02:59,569-Speed 6313.30 samples/sec Loss 7.3101 LearningRate 0.0008 Epoch: 7 Global Step: 146340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:02,813-Speed 6314.53 samples/sec Loss 7.2531 LearningRate 0.0008 Epoch: 7 Global Step: 146350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:06,082-Speed 6266.15 samples/sec Loss 7.2763 LearningRate 0.0008 Epoch: 7 Global Step: 146360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:09,323-Speed 6319.41 samples/sec Loss 7.2951 LearningRate 0.0008 Epoch: 7 Global Step: 146370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:12,561-Speed 6326.25 samples/sec Loss 
7.2857 LearningRate 0.0008 Epoch: 7 Global Step: 146380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:15,805-Speed 6315.88 samples/sec Loss 7.2627 LearningRate 0.0008 Epoch: 7 Global Step: 146390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:19,047-Speed 6317.10 samples/sec Loss 7.2916 LearningRate 0.0008 Epoch: 7 Global Step: 146400 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:03:22,288-Speed 6321.48 samples/sec Loss 7.2891 LearningRate 0.0008 Epoch: 7 Global Step: 146410 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:03:25,526-Speed 6326.52 samples/sec Loss 7.3505 LearningRate 0.0008 Epoch: 7 Global Step: 146420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:28,782-Speed 6292.46 samples/sec Loss 7.2587 LearningRate 0.0008 Epoch: 7 Global Step: 146430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:32,027-Speed 6312.44 samples/sec Loss 7.3390 LearningRate 0.0008 Epoch: 7 Global Step: 146440 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:35,265-Speed 6325.84 samples/sec Loss 7.3006 LearningRate 0.0008 Epoch: 7 Global Step: 146450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:38,542-Speed 6250.88 samples/sec Loss 7.2795 LearningRate 0.0008 Epoch: 7 Global Step: 146460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:41,886-Speed 6126.13 samples/sec Loss 7.2682 LearningRate 0.0008 Epoch: 7 Global Step: 146470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:45,134-Speed 6306.36 samples/sec Loss 7.3216 LearningRate 0.0008 Epoch: 7 Global Step: 146480 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:48,374-Speed 6322.20 samples/sec Loss 7.2845 LearningRate 0.0008 Epoch: 7 Global Step: 146490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:51,618-Speed 6315.00 samples/sec Loss 7.2223 LearningRate 0.0008 Epoch: 7 Global 
Step: 146500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:54,857-Speed 6324.13 samples/sec Loss 7.2764 LearningRate 0.0008 Epoch: 7 Global Step: 146510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:03:58,098-Speed 6320.39 samples/sec Loss 7.2442 LearningRate 0.0008 Epoch: 7 Global Step: 146520 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:04:01,347-Speed 6306.06 samples/sec Loss 7.3593 LearningRate 0.0008 Epoch: 7 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:04:04,588-Speed 6319.43 samples/sec Loss 7.3381 LearningRate 0.0008 Epoch: 7 Global Step: 146540 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:04:07,820-Speed 6339.08 samples/sec Loss 7.3168 LearningRate 0.0008 Epoch: 7 Global Step: 146550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:11,062-Speed 6317.59 samples/sec Loss 7.2976 LearningRate 0.0008 Epoch: 7 Global Step: 146560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:14,300-Speed 6325.87 samples/sec Loss 7.3117 LearningRate 0.0008 Epoch: 7 Global Step: 146570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:17,543-Speed 6316.81 samples/sec Loss 7.2401 LearningRate 0.0008 Epoch: 7 Global Step: 146580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:20,786-Speed 6317.31 samples/sec Loss 7.3617 LearningRate 0.0008 Epoch: 7 Global Step: 146590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:24,029-Speed 6316.90 samples/sec Loss 7.3124 LearningRate 0.0008 Epoch: 7 Global Step: 146600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:27,265-Speed 6329.24 samples/sec Loss 7.1683 LearningRate 0.0008 Epoch: 7 Global Step: 146610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:30,508-Speed 6316.83 samples/sec Loss 7.2676 LearningRate 0.0008 Epoch: 7 Global Step: 146620 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:04:33,751-Speed 6318.22 samples/sec Loss 7.3715 LearningRate 0.0008 Epoch: 7 Global Step: 146630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:36,989-Speed 6326.46 samples/sec Loss 7.3141 LearningRate 0.0008 Epoch: 7 Global Step: 146640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:40,232-Speed 6315.96 samples/sec Loss 7.2887 LearningRate 0.0008 Epoch: 7 Global Step: 146650 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:04:43,462-Speed 6341.40 samples/sec Loss 7.3781 LearningRate 0.0008 Epoch: 7 Global Step: 146660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:46,704-Speed 6318.75 samples/sec Loss 7.2880 LearningRate 0.0008 Epoch: 7 Global Step: 146670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:49,945-Speed 6320.08 samples/sec Loss 7.2667 LearningRate 0.0008 Epoch: 7 Global Step: 146680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:53,183-Speed 6327.74 samples/sec Loss 7.2975 LearningRate 0.0008 Epoch: 7 Global Step: 146690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:56,423-Speed 6322.19 samples/sec Loss 7.2465 LearningRate 0.0008 Epoch: 7 Global Step: 146700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:04:59,667-Speed 6314.18 samples/sec Loss 7.3720 LearningRate 0.0008 Epoch: 7 Global Step: 146710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:02,913-Speed 6310.73 samples/sec Loss 7.2574 LearningRate 0.0008 Epoch: 7 Global Step: 146720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:06,154-Speed 6319.88 samples/sec Loss 7.3056 LearningRate 0.0008 Epoch: 7 Global Step: 146730 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:09,400-Speed 6310.85 samples/sec Loss 7.2553 LearningRate 0.0008 Epoch: 7 Global Step: 146740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:05:12,641-Speed 6320.33 samples/sec Loss 7.3209 LearningRate 0.0008 Epoch: 7 Global Step: 146750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:15,882-Speed 6319.88 samples/sec Loss 7.2446 LearningRate 0.0008 Epoch: 7 Global Step: 146760 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:05:19,127-Speed 6313.90 samples/sec Loss 7.3133 LearningRate 0.0008 Epoch: 7 Global Step: 146770 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:05:22,370-Speed 6315.87 samples/sec Loss 7.3188 LearningRate 0.0008 Epoch: 7 Global Step: 146780 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:05:25,616-Speed 6311.54 samples/sec Loss 7.2810 LearningRate 0.0008 Epoch: 7 Global Step: 146790 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:05:28,860-Speed 6314.12 samples/sec Loss 7.2365 LearningRate 0.0008 Epoch: 7 Global Step: 146800 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:05:32,083-Speed 6356.01 samples/sec Loss 7.2741 LearningRate 0.0008 Epoch: 7 Global Step: 146810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:35,326-Speed 6315.24 samples/sec Loss 7.3182 LearningRate 0.0008 Epoch: 7 Global Step: 146820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:38,573-Speed 6309.41 samples/sec Loss 7.3644 LearningRate 0.0008 Epoch: 7 Global Step: 146830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:41,816-Speed 6317.66 samples/sec Loss 7.2745 LearningRate 0.0008 Epoch: 7 Global Step: 146840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:45,055-Speed 6324.51 samples/sec Loss 7.3019 LearningRate 0.0008 Epoch: 7 Global Step: 146850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:48,300-Speed 6312.51 samples/sec Loss 7.3105 LearningRate 0.0008 Epoch: 7 Global Step: 146860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:51,541-Speed 6319.96 samples/sec Loss 
7.3158 LearningRate 0.0008 Epoch: 7 Global Step: 146870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:54,780-Speed 6324.77 samples/sec Loss 7.2847 LearningRate 0.0008 Epoch: 7 Global Step: 146880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:05:58,022-Speed 6318.00 samples/sec Loss 7.2357 LearningRate 0.0008 Epoch: 7 Global Step: 146890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:01,264-Speed 6318.35 samples/sec Loss 7.2939 LearningRate 0.0008 Epoch: 7 Global Step: 146900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:04,507-Speed 6317.56 samples/sec Loss 7.2277 LearningRate 0.0008 Epoch: 7 Global Step: 146910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:07,751-Speed 6314.75 samples/sec Loss 7.2526 LearningRate 0.0008 Epoch: 7 Global Step: 146920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:10,977-Speed 6348.39 samples/sec Loss 7.2464 LearningRate 0.0008 Epoch: 7 Global Step: 146930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:14,219-Speed 6318.65 samples/sec Loss 7.3158 LearningRate 0.0008 Epoch: 7 Global Step: 146940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:17,458-Speed 6325.09 samples/sec Loss 7.3184 LearningRate 0.0008 Epoch: 7 Global Step: 146950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:20,699-Speed 6320.79 samples/sec Loss 7.2695 LearningRate 0.0008 Epoch: 7 Global Step: 146960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:23,941-Speed 6318.27 samples/sec Loss 7.1611 LearningRate 0.0008 Epoch: 7 Global Step: 146970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:27,181-Speed 6322.10 samples/sec Loss 7.2573 LearningRate 0.0008 Epoch: 7 Global Step: 146980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:30,422-Speed 6320.90 samples/sec Loss 7.2998 LearningRate 0.0008 Epoch: 7 Global 
Step: 146990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:33,661-Speed 6323.56 samples/sec Loss 7.2009 LearningRate 0.0008 Epoch: 7 Global Step: 147000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:36,903-Speed 6319.21 samples/sec Loss 7.2316 LearningRate 0.0008 Epoch: 7 Global Step: 147010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:40,142-Speed 6323.18 samples/sec Loss 7.2418 LearningRate 0.0008 Epoch: 7 Global Step: 147020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:43,395-Speed 6297.58 samples/sec Loss 7.2412 LearningRate 0.0008 Epoch: 7 Global Step: 147030 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:46,637-Speed 6319.87 samples/sec Loss 7.2447 LearningRate 0.0008 Epoch: 7 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:49,879-Speed 6317.55 samples/sec Loss 7.3462 LearningRate 0.0008 Epoch: 7 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:53,126-Speed 6310.61 samples/sec Loss 7.2445 LearningRate 0.0008 Epoch: 7 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:06:56,352-Speed 6348.42 samples/sec Loss 7.3173 LearningRate 0.0008 Epoch: 7 Global Step: 147070 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:06:59,593-Speed 6320.11 samples/sec Loss 7.2469 LearningRate 0.0008 Epoch: 7 Global Step: 147080 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:02,835-Speed 6319.13 samples/sec Loss 7.2510 LearningRate 0.0008 Epoch: 7 Global Step: 147090 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:06,075-Speed 6322.46 samples/sec Loss 7.2119 LearningRate 0.0008 Epoch: 7 Global Step: 147100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:09,315-Speed 6322.60 samples/sec Loss 7.2318 LearningRate 0.0008 Epoch: 7 Global Step: 147110 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:07:12,557-Speed 6318.30 samples/sec Loss 7.2956 LearningRate 0.0008 Epoch: 7 Global Step: 147120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:15,796-Speed 6324.90 samples/sec Loss 7.3017 LearningRate 0.0008 Epoch: 7 Global Step: 147130 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:19,042-Speed 6310.62 samples/sec Loss 7.2545 LearningRate 0.0008 Epoch: 7 Global Step: 147140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:22,282-Speed 6321.40 samples/sec Loss 7.2791 LearningRate 0.0008 Epoch: 7 Global Step: 147150 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:25,525-Speed 6317.54 samples/sec Loss 7.3696 LearningRate 0.0008 Epoch: 7 Global Step: 147160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:28,765-Speed 6320.88 samples/sec Loss 7.3436 LearningRate 0.0008 Epoch: 7 Global Step: 147170 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:32,006-Speed 6320.71 samples/sec Loss 7.2885 LearningRate 0.0008 Epoch: 7 Global Step: 147180 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:35,248-Speed 6318.64 samples/sec Loss 7.2413 LearningRate 0.0008 Epoch: 7 Global Step: 147190 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:38,491-Speed 6317.55 samples/sec Loss 7.2377 LearningRate 0.0008 Epoch: 7 Global Step: 147200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:41,728-Speed 6327.84 samples/sec Loss 7.2547 LearningRate 0.0008 Epoch: 7 Global Step: 147210 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:44,969-Speed 6319.95 samples/sec Loss 7.2890 LearningRate 0.0008 Epoch: 7 Global Step: 147220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:07:48,209-Speed 6322.04 samples/sec Loss 7.3243 LearningRate 0.0008 Epoch: 7 Global Step: 147230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:07:51,453-Speed 6314.73 samples/sec Loss 7.2377 LearningRate 0.0008 Epoch: 7 Global Step: 147240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:54,697-Speed 6315.56 samples/sec Loss 7.3101 LearningRate 0.0008 Epoch: 7 Global Step: 147250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:07:57,944-Speed 6308.50 samples/sec Loss 7.2289 LearningRate 0.0008 Epoch: 7 Global Step: 147260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:01,186-Speed 6318.36 samples/sec Loss 7.3287 LearningRate 0.0008 Epoch: 7 Global Step: 147270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:04,429-Speed 6317.74 samples/sec Loss 7.3482 LearningRate 0.0008 Epoch: 7 Global Step: 147280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:07,678-Speed 6305.30 samples/sec Loss 7.2886 LearningRate 0.0008 Epoch: 7 Global Step: 147290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:10,921-Speed 6316.06 samples/sec Loss 7.2825 LearningRate 0.0008 Epoch: 7 Global Step: 147300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:14,162-Speed 6319.30 samples/sec Loss 7.3544 LearningRate 0.0008 Epoch: 7 Global Step: 147310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:17,408-Speed 6311.23 samples/sec Loss 7.3031 LearningRate 0.0008 Epoch: 7 Global Step: 147320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:20,653-Speed 6314.08 samples/sec Loss 7.2589 LearningRate 0.0008 Epoch: 7 Global Step: 147330 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:08:23,883-Speed 6340.43 samples/sec Loss 7.3494 LearningRate 0.0008 Epoch: 7 Global Step: 147340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:27,130-Speed 6309.15 samples/sec Loss 7.1889 LearningRate 0.0008 Epoch: 7 Global Step: 147350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:30,375-Speed 6312.87 samples/sec Loss 
7.2789 LearningRate 0.0008 Epoch: 7 Global Step: 147360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:33,618-Speed 6315.91 samples/sec Loss 7.2222 LearningRate 0.0008 Epoch: 7 Global Step: 147370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:36,859-Speed 6321.74 samples/sec Loss 7.2933 LearningRate 0.0008 Epoch: 7 Global Step: 147380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:40,112-Speed 6296.84 samples/sec Loss 7.2949 LearningRate 0.0008 Epoch: 7 Global Step: 147390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:43,360-Speed 6305.96 samples/sec Loss 7.3108 LearningRate 0.0008 Epoch: 7 Global Step: 147400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:46,605-Speed 6312.54 samples/sec Loss 7.3064 LearningRate 0.0008 Epoch: 7 Global Step: 147410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:49,855-Speed 6303.93 samples/sec Loss 7.2469 LearningRate 0.0008 Epoch: 7 Global Step: 147420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:53,099-Speed 6314.39 samples/sec Loss 7.2558 LearningRate 0.0008 Epoch: 7 Global Step: 147430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:08:56,344-Speed 6312.72 samples/sec Loss 7.3229 LearningRate 0.0008 Epoch: 7 Global Step: 147440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:08:59,593-Speed 6304.84 samples/sec Loss 7.2329 LearningRate 0.0008 Epoch: 7 Global Step: 147450 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:02,843-Speed 6302.57 samples/sec Loss 7.3374 LearningRate 0.0008 Epoch: 7 Global Step: 147460 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:06,088-Speed 6312.98 samples/sec Loss 7.2215 LearningRate 0.0008 Epoch: 7 Global Step: 147470 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:09,330-Speed 6319.09 samples/sec Loss 7.3212 LearningRate 0.0008 Epoch: 7 Global 
Step: 147480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:12,561-Speed 6339.71 samples/sec Loss 7.2620 LearningRate 0.0008 Epoch: 7 Global Step: 147490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:15,815-Speed 6296.10 samples/sec Loss 7.3305 LearningRate 0.0008 Epoch: 7 Global Step: 147500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:19,058-Speed 6315.05 samples/sec Loss 7.2086 LearningRate 0.0008 Epoch: 7 Global Step: 147510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:22,301-Speed 6317.85 samples/sec Loss 7.2759 LearningRate 0.0008 Epoch: 7 Global Step: 147520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:25,539-Speed 6324.80 samples/sec Loss 7.2630 LearningRate 0.0008 Epoch: 7 Global Step: 147530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:28,783-Speed 6314.68 samples/sec Loss 7.2929 LearningRate 0.0008 Epoch: 7 Global Step: 147540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:32,024-Speed 6321.54 samples/sec Loss 7.2092 LearningRate 0.0008 Epoch: 7 Global Step: 147550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:35,277-Speed 6297.53 samples/sec Loss 7.3033 LearningRate 0.0008 Epoch: 7 Global Step: 147560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:38,519-Speed 6318.44 samples/sec Loss 7.2462 LearningRate 0.0008 Epoch: 7 Global Step: 147570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:41,762-Speed 6315.62 samples/sec Loss 7.3392 LearningRate 0.0008 Epoch: 7 Global Step: 147580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:09:45,006-Speed 6313.79 samples/sec Loss 7.2484 LearningRate 0.0008 Epoch: 7 Global Step: 147590 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:48,249-Speed 6316.98 samples/sec Loss 7.1869 LearningRate 0.0008 Epoch: 7 Global Step: 147600 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 05:09:51,498-Speed 6305.59 samples/sec Loss 7.2517 LearningRate 0.0008 Epoch: 7 Global Step: 147610 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:54,740-Speed 6318.18 samples/sec Loss 7.2728 LearningRate 0.0008 Epoch: 7 Global Step: 147620 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:09:57,984-Speed 6313.81 samples/sec Loss 7.2611 LearningRate 0.0008 Epoch: 7 Global Step: 147630 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:01,248-Speed 6276.86 samples/sec Loss 7.2073 LearningRate 0.0008 Epoch: 7 Global Step: 147640 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:04,492-Speed 6314.81 samples/sec Loss 7.3430 LearningRate 0.0008 Epoch: 7 Global Step: 147650 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:07,736-Speed 6314.18 samples/sec Loss 7.2375 LearningRate 0.0008 Epoch: 7 Global Step: 147660 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:10,979-Speed 6316.82 samples/sec Loss 7.3284 LearningRate 0.0008 Epoch: 7 Global Step: 147670 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:14,222-Speed 6317.35 samples/sec Loss 7.2212 LearningRate 0.0008 Epoch: 7 Global Step: 147680 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:17,452-Speed 6342.43 samples/sec Loss 7.2267 LearningRate 0.0008 Epoch: 7 Global Step: 147690 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:20,694-Speed 6318.71 samples/sec Loss 7.3223 LearningRate 0.0008 Epoch: 7 Global Step: 147700 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:23,936-Speed 6317.93 samples/sec Loss 7.2257 LearningRate 0.0008 Epoch: 7 Global Step: 147710 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:27,195-Speed 6286.30 samples/sec Loss 7.2429 LearningRate 0.0008 Epoch: 7 Global Step: 147720 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 
05:10:30,439-Speed 6313.66 samples/sec Loss 7.2847 LearningRate 0.0008 Epoch: 7 Global Step: 147730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:10:33,671-Speed 6338.49 samples/sec Loss 7.3632 LearningRate 0.0008 Epoch: 7 Global Step: 147740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:36,915-Speed 6314.27 samples/sec Loss 7.2578 LearningRate 0.0008 Epoch: 7 Global Step: 147750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:40,157-Speed 6318.14 samples/sec Loss 7.2900 LearningRate 0.0008 Epoch: 7 Global Step: 147760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:43,400-Speed 6315.88 samples/sec Loss 7.2263 LearningRate 0.0008 Epoch: 7 Global Step: 147770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:46,644-Speed 6315.55 samples/sec Loss 7.2675 LearningRate 0.0008 Epoch: 7 Global Step: 147780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:49,884-Speed 6321.50 samples/sec Loss 7.2744 LearningRate 0.0008 Epoch: 7 Global Step: 147790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:53,130-Speed 6312.03 samples/sec Loss 7.2362 LearningRate 0.0008 Epoch: 7 Global Step: 147800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:56,373-Speed 6316.24 samples/sec Loss 7.3146 LearningRate 0.0008 Epoch: 7 Global Step: 147810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:10:59,614-Speed 6319.91 samples/sec Loss 7.2813 LearningRate 0.0008 Epoch: 7 Global Step: 147820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:02,858-Speed 6314.11 samples/sec Loss 7.2928 LearningRate 0.0008 Epoch: 7 Global Step: 147830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:06,085-Speed 6349.76 samples/sec Loss 7.1856 LearningRate 0.0008 Epoch: 7 Global Step: 147840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:09,331-Speed 6310.21 samples/sec Loss 
7.1691 LearningRate 0.0008 Epoch: 7 Global Step: 147850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:12,571-Speed 6321.76 samples/sec Loss 7.2973 LearningRate 0.0008 Epoch: 7 Global Step: 147860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:15,814-Speed 6317.56 samples/sec Loss 7.2222 LearningRate 0.0008 Epoch: 7 Global Step: 147870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:19,057-Speed 6315.37 samples/sec Loss 7.2986 LearningRate 0.0008 Epoch: 7 Global Step: 147880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:22,298-Speed 6320.17 samples/sec Loss 7.3849 LearningRate 0.0008 Epoch: 7 Global Step: 147890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:25,547-Speed 6306.37 samples/sec Loss 7.2964 LearningRate 0.0008 Epoch: 7 Global Step: 147900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:28,791-Speed 6315.46 samples/sec Loss 7.2275 LearningRate 0.0008 Epoch: 7 Global Step: 147910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:32,041-Speed 6303.08 samples/sec Loss 7.2184 LearningRate 0.0008 Epoch: 7 Global Step: 147920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:35,284-Speed 6316.92 samples/sec Loss 7.2282 LearningRate 0.0008 Epoch: 7 Global Step: 147930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:11:38,527-Speed 6315.22 samples/sec Loss 7.2247 LearningRate 0.0008 Epoch: 7 Global Step: 147940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:41,774-Speed 6309.37 samples/sec Loss 7.2448 LearningRate 0.0008 Epoch: 7 Global Step: 147950 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:45,017-Speed 6316.02 samples/sec Loss 7.2951 LearningRate 0.0008 Epoch: 7 Global Step: 147960 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:48,262-Speed 6312.70 samples/sec Loss 7.2842 LearningRate 0.0008 Epoch: 7 Global 
Step: 147970 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:51,506-Speed 6315.21 samples/sec Loss 7.2979 LearningRate 0.0008 Epoch: 7 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:54,751-Speed 6312.48 samples/sec Loss 7.3352 LearningRate 0.0008 Epoch: 7 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:11:57,994-Speed 6317.04 samples/sec Loss 7.3220 LearningRate 0.0008 Epoch: 7 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:01,240-Speed 6309.87 samples/sec Loss 7.2581 LearningRate 0.0008 Epoch: 7 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:04,500-Speed 6284.43 samples/sec Loss 7.2101 LearningRate 0.0008 Epoch: 7 Global Step: 148020 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:07,767-Speed 6270.17 samples/sec Loss 7.2657 LearningRate 0.0008 Epoch: 7 Global Step: 148030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:11,014-Speed 6307.45 samples/sec Loss 7.1668 LearningRate 0.0008 Epoch: 7 Global Step: 148040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:14,341-Speed 6157.94 samples/sec Loss 7.3181 LearningRate 0.0008 Epoch: 7 Global Step: 148050 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:17,589-Speed 6305.78 samples/sec Loss 7.2702 LearningRate 0.0008 Epoch: 7 Global Step: 148060 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:20,833-Speed 6316.31 samples/sec Loss 7.2752 LearningRate 0.0008 Epoch: 7 Global Step: 148070 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:24,078-Speed 6310.84 samples/sec Loss 7.2593 LearningRate 0.0008 Epoch: 7 Global Step: 148080 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:27,323-Speed 6312.93 samples/sec Loss 7.2591 LearningRate 0.0008 Epoch: 7 Global Step: 148090 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:12:30,570-Speed 6309.78 samples/sec Loss 7.2513 LearningRate 0.0008 Epoch: 7 Global Step: 148100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:33,813-Speed 6317.69 samples/sec Loss 7.1926 LearningRate 0.0008 Epoch: 7 Global Step: 148110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:37,060-Speed 6307.34 samples/sec Loss 7.2864 LearningRate 0.0008 Epoch: 7 Global Step: 148120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:40,304-Speed 6315.32 samples/sec Loss 7.2636 LearningRate 0.0008 Epoch: 7 Global Step: 148130 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:43,551-Speed 6308.80 samples/sec Loss 7.2417 LearningRate 0.0008 Epoch: 7 Global Step: 148140 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:46,797-Speed 6310.42 samples/sec Loss 7.2071 LearningRate 0.0008 Epoch: 7 Global Step: 148150 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:50,041-Speed 6314.36 samples/sec Loss 7.3267 LearningRate 0.0008 Epoch: 7 Global Step: 148160 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:53,288-Speed 6309.42 samples/sec Loss 7.3324 LearningRate 0.0008 Epoch: 7 Global Step: 148170 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:12:56,517-Speed 6343.31 samples/sec Loss 7.2488 LearningRate 0.0008 Epoch: 7 Global Step: 148180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:12:59,762-Speed 6313.29 samples/sec Loss 7.2797 LearningRate 0.0008 Epoch: 7 Global Step: 148190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:03,008-Speed 6308.77 samples/sec Loss 7.1881 LearningRate 0.0008 Epoch: 7 Global Step: 148200 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:06,253-Speed 6314.16 samples/sec Loss 7.3161 LearningRate 0.0008 Epoch: 7 Global Step: 148210 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:13:09,497-Speed 6313.51 samples/sec Loss 7.2391 LearningRate 0.0008 Epoch: 7 Global Step: 148220 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:12,739-Speed 6319.26 samples/sec Loss 7.2946 LearningRate 0.0008 Epoch: 7 Global Step: 148230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:15,983-Speed 6314.63 samples/sec Loss 7.2754 LearningRate 0.0008 Epoch: 7 Global Step: 148240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:19,227-Speed 6314.76 samples/sec Loss 7.3487 LearningRate 0.0008 Epoch: 7 Global Step: 148250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:22,469-Speed 6318.54 samples/sec Loss 7.2451 LearningRate 0.0008 Epoch: 7 Global Step: 148260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:25,712-Speed 6316.86 samples/sec Loss 7.2637 LearningRate 0.0008 Epoch: 7 Global Step: 148270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:13:28,957-Speed 6311.56 samples/sec Loss 7.2168 LearningRate 0.0008 Epoch: 7 Global Step: 148280 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:32,216-Speed 6285.06 samples/sec Loss 7.2283 LearningRate 0.0008 Epoch: 7 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:35,464-Speed 6307.17 samples/sec Loss 7.2793 LearningRate 0.0008 Epoch: 7 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:38,713-Speed 6304.96 samples/sec Loss 7.2834 LearningRate 0.0008 Epoch: 7 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:41,962-Speed 6304.12 samples/sec Loss 7.3145 LearningRate 0.0008 Epoch: 7 Global Step: 148320 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:45,210-Speed 6308.15 samples/sec Loss 7.2117 LearningRate 0.0008 Epoch: 7 Global Step: 148330 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:48,454-Speed 6314.80 samples/sec Loss 
7.2678 LearningRate 0.0008 Epoch: 7 Global Step: 148340 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:51,702-Speed 6306.93 samples/sec Loss 7.2782 LearningRate 0.0008 Epoch: 7 Global Step: 148350 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:54,950-Speed 6306.80 samples/sec Loss 7.2734 LearningRate 0.0008 Epoch: 7 Global Step: 148360 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:13:58,199-Speed 6305.61 samples/sec Loss 7.3054 LearningRate 0.0008 Epoch: 7 Global Step: 148370 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:14:01,449-Speed 6303.14 samples/sec Loss 7.2539 LearningRate 0.0008 Epoch: 7 Global Step: 148380 Fp16 Grad Scale: 131072 Required: 62 hours Training: 2022-04-01 05:14:04,671-Speed 6356.76 samples/sec Loss 7.1196 LearningRate 0.0008 Epoch: 7 Global Step: 148390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:07,916-Speed 6313.38 samples/sec Loss 7.1947 LearningRate 0.0008 Epoch: 7 Global Step: 148400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:11,160-Speed 6313.81 samples/sec Loss 7.1938 LearningRate 0.0008 Epoch: 7 Global Step: 148410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:14,405-Speed 6312.36 samples/sec Loss 7.2558 LearningRate 0.0008 Epoch: 7 Global Step: 148420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:17,650-Speed 6312.24 samples/sec Loss 7.2867 LearningRate 0.0008 Epoch: 7 Global Step: 148430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:20,898-Speed 6308.27 samples/sec Loss 7.3005 LearningRate 0.0008 Epoch: 7 Global Step: 148440 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:24,141-Speed 6315.04 samples/sec Loss 7.2366 LearningRate 0.0008 Epoch: 7 Global Step: 148450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:27,386-Speed 6313.32 samples/sec Loss 7.2142 LearningRate 0.0008 Epoch: 7 Global 
Step: 148460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:30,632-Speed 6310.56 samples/sec Loss 7.2743 LearningRate 0.0008 Epoch: 7 Global Step: 148470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:33,893-Speed 6282.16 samples/sec Loss 7.3146 LearningRate 0.0008 Epoch: 7 Global Step: 148480 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:37,140-Speed 6309.54 samples/sec Loss 7.1925 LearningRate 0.0008 Epoch: 7 Global Step: 148490 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:14:40,375-Speed 6330.15 samples/sec Loss 7.2406 LearningRate 0.0008 Epoch: 7 Global Step: 148500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:43,618-Speed 6316.43 samples/sec Loss 7.2645 LearningRate 0.0008 Epoch: 7 Global Step: 148510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:46,865-Speed 6309.92 samples/sec Loss 7.2775 LearningRate 0.0008 Epoch: 7 Global Step: 148520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:50,117-Speed 6299.50 samples/sec Loss 7.2523 LearningRate 0.0008 Epoch: 7 Global Step: 148530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:53,362-Speed 6312.01 samples/sec Loss 7.2275 LearningRate 0.0008 Epoch: 7 Global Step: 148540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:56,607-Speed 6313.33 samples/sec Loss 7.2515 LearningRate 0.0008 Epoch: 7 Global Step: 148550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:14:59,854-Speed 6308.43 samples/sec Loss 7.2949 LearningRate 0.0008 Epoch: 7 Global Step: 148560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:03,103-Speed 6305.41 samples/sec Loss 7.2532 LearningRate 0.0008 Epoch: 7 Global Step: 148570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:06,349-Speed 6311.36 samples/sec Loss 7.2188 LearningRate 0.0008 Epoch: 7 Global Step: 148580 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:15:09,593-Speed 6314.00 samples/sec Loss 7.2028 LearningRate 0.0008 Epoch: 7 Global Step: 148590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:12,839-Speed 6311.23 samples/sec Loss 7.2577 LearningRate 0.0008 Epoch: 7 Global Step: 148600 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:15:16,078-Speed 6323.51 samples/sec Loss 7.1685 LearningRate 0.0008 Epoch: 7 Global Step: 148610 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:15:19,327-Speed 6305.87 samples/sec Loss 7.2599 LearningRate 0.0008 Epoch: 7 Global Step: 148620 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:15:22,558-Speed 6338.94 samples/sec Loss 7.1531 LearningRate 0.0008 Epoch: 7 Global Step: 148630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:25,816-Speed 6288.33 samples/sec Loss 7.2310 LearningRate 0.0008 Epoch: 7 Global Step: 148640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:29,059-Speed 6316.48 samples/sec Loss 7.1479 LearningRate 0.0008 Epoch: 7 Global Step: 148650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:32,305-Speed 6310.69 samples/sec Loss 7.2219 LearningRate 0.0008 Epoch: 7 Global Step: 148660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:35,552-Speed 6309.00 samples/sec Loss 7.2504 LearningRate 0.0008 Epoch: 7 Global Step: 148670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:38,793-Speed 6318.80 samples/sec Loss 7.2045 LearningRate 0.0008 Epoch: 7 Global Step: 148680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:42,037-Speed 6314.90 samples/sec Loss 7.2325 LearningRate 0.0008 Epoch: 7 Global Step: 148690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:45,285-Speed 6307.11 samples/sec Loss 7.3039 LearningRate 0.0008 Epoch: 7 Global Step: 148700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:15:48,528-Speed 6317.08 samples/sec Loss 7.2111 LearningRate 0.0008 Epoch: 7 Global Step: 148710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:51,775-Speed 6308.19 samples/sec Loss 7.2243 LearningRate 0.0008 Epoch: 7 Global Step: 148720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:15:55,022-Speed 6308.24 samples/sec Loss 7.2094 LearningRate 0.0008 Epoch: 7 Global Step: 148730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:15:58,270-Speed 6308.30 samples/sec Loss 7.2499 LearningRate 0.0008 Epoch: 7 Global Step: 148740 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:01,515-Speed 6311.53 samples/sec Loss 7.1923 LearningRate 0.0008 Epoch: 7 Global Step: 148750 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:04,759-Speed 6316.01 samples/sec Loss 7.3119 LearningRate 0.0008 Epoch: 7 Global Step: 148760 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:08,002-Speed 6316.28 samples/sec Loss 7.2140 LearningRate 0.0008 Epoch: 7 Global Step: 148770 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:11,244-Speed 6318.41 samples/sec Loss 7.2349 LearningRate 0.0008 Epoch: 7 Global Step: 148780 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:14,500-Speed 6292.29 samples/sec Loss 7.2280 LearningRate 0.0008 Epoch: 7 Global Step: 148790 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:17,748-Speed 6306.36 samples/sec Loss 7.3052 LearningRate 0.0008 Epoch: 7 Global Step: 148800 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:20,993-Speed 6313.84 samples/sec Loss 7.2488 LearningRate 0.0008 Epoch: 7 Global Step: 148810 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:24,238-Speed 6311.79 samples/sec Loss 7.2190 LearningRate 0.0008 Epoch: 7 Global Step: 148820 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:27,483-Speed 6313.78 samples/sec Loss 
7.2546 LearningRate 0.0008 Epoch: 7 Global Step: 148830 Fp16 Grad Scale: 131072 Required: 62 hours Training: 2022-04-01 05:16:30,713-Speed 6340.67 samples/sec Loss 7.2312 LearningRate 0.0008 Epoch: 7 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:33,968-Speed 6292.88 samples/sec Loss 7.3339 LearningRate 0.0008 Epoch: 7 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:16:37,204-Speed 6331.79 samples/sec Loss 7.2422 LearningRate 0.0008 Epoch: 7 Global Step: 148860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:40,447-Speed 6315.87 samples/sec Loss 7.3152 LearningRate 0.0008 Epoch: 7 Global Step: 148870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:43,694-Speed 6308.71 samples/sec Loss 7.1537 LearningRate 0.0008 Epoch: 7 Global Step: 148880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:46,944-Speed 6302.84 samples/sec Loss 7.2085 LearningRate 0.0008 Epoch: 7 Global Step: 148890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:50,187-Speed 6316.82 samples/sec Loss 7.2702 LearningRate 0.0008 Epoch: 7 Global Step: 148900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:53,434-Speed 6309.24 samples/sec Loss 7.2111 LearningRate 0.0008 Epoch: 7 Global Step: 148910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:56,682-Speed 6306.18 samples/sec Loss 7.1954 LearningRate 0.0008 Epoch: 7 Global Step: 148920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:16:59,923-Speed 6319.30 samples/sec Loss 7.2326 LearningRate 0.0008 Epoch: 7 Global Step: 148930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:03,169-Speed 6310.94 samples/sec Loss 7.2941 LearningRate 0.0008 Epoch: 7 Global Step: 148940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:06,415-Speed 6311.41 samples/sec Loss 7.2270 LearningRate 0.0008 Epoch: 7 Global 
Step: 148950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:09,693-Speed 6249.18 samples/sec Loss 7.2881 LearningRate 0.0008 Epoch: 7 Global Step: 148960 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:12,962-Speed 6265.78 samples/sec Loss 7.2283 LearningRate 0.0008 Epoch: 7 Global Step: 148970 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:16,206-Speed 6315.56 samples/sec Loss 7.2272 LearningRate 0.0008 Epoch: 7 Global Step: 148980 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:19,450-Speed 6315.36 samples/sec Loss 7.2144 LearningRate 0.0008 Epoch: 7 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:22,698-Speed 6306.80 samples/sec Loss 7.3361 LearningRate 0.0008 Epoch: 7 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:25,947-Speed 6305.46 samples/sec Loss 7.2919 LearningRate 0.0008 Epoch: 7 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:29,190-Speed 6316.73 samples/sec Loss 7.2613 LearningRate 0.0008 Epoch: 7 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:17:32,417-Speed 6346.57 samples/sec Loss 7.2226 LearningRate 0.0008 Epoch: 7 Global Step: 149030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:35,660-Speed 6316.02 samples/sec Loss 7.2758 LearningRate 0.0008 Epoch: 7 Global Step: 149040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:38,905-Speed 6313.90 samples/sec Loss 7.1844 LearningRate 0.0008 Epoch: 7 Global Step: 149050 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:42,153-Speed 6306.05 samples/sec Loss 7.2669 LearningRate 0.0008 Epoch: 7 Global Step: 149060 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:45,399-Speed 6311.77 samples/sec Loss 7.2107 LearningRate 0.0008 Epoch: 7 Global Step: 149070 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:17:48,645-Speed 6310.02 samples/sec Loss 7.2086 LearningRate 0.0008 Epoch: 7 Global Step: 149080 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:51,892-Speed 6309.38 samples/sec Loss 7.2547 LearningRate 0.0008 Epoch: 7 Global Step: 149090 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:55,138-Speed 6310.05 samples/sec Loss 7.2553 LearningRate 0.0008 Epoch: 7 Global Step: 149100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:17:58,383-Speed 6312.34 samples/sec Loss 7.2409 LearningRate 0.0008 Epoch: 7 Global Step: 149110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:01,632-Speed 6305.31 samples/sec Loss 7.2303 LearningRate 0.0008 Epoch: 7 Global Step: 149120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:04,877-Speed 6313.29 samples/sec Loss 7.2083 LearningRate 0.0008 Epoch: 7 Global Step: 149130 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:08,121-Speed 6312.86 samples/sec Loss 7.2661 LearningRate 0.0008 Epoch: 7 Global Step: 149140 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:11,368-Speed 6309.10 samples/sec Loss 7.2439 LearningRate 0.0008 Epoch: 7 Global Step: 149150 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:14,621-Speed 6297.93 samples/sec Loss 7.3474 LearningRate 0.0008 Epoch: 7 Global Step: 149160 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:17,851-Speed 6342.01 samples/sec Loss 7.1929 LearningRate 0.0008 Epoch: 7 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:21,102-Speed 6300.93 samples/sec Loss 7.2100 LearningRate 0.0008 Epoch: 7 Global Step: 149180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:24,348-Speed 6310.34 samples/sec Loss 7.2703 LearningRate 0.0008 Epoch: 7 Global Step: 149190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:18:27,596-Speed 6308.43 samples/sec Loss 7.2097 LearningRate 0.0008 Epoch: 7 Global Step: 149200 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:30,836-Speed 6321.62 samples/sec Loss 7.2544 LearningRate 0.0008 Epoch: 7 Global Step: 149210 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:34,082-Speed 6311.23 samples/sec Loss 7.1479 LearningRate 0.0008 Epoch: 7 Global Step: 149220 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:37,326-Speed 6313.08 samples/sec Loss 7.2356 LearningRate 0.0008 Epoch: 7 Global Step: 149230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:40,572-Speed 6311.12 samples/sec Loss 7.3125 LearningRate 0.0008 Epoch: 7 Global Step: 149240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:43,820-Speed 6307.89 samples/sec Loss 7.2282 LearningRate 0.0008 Epoch: 7 Global Step: 149250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:47,064-Speed 6313.39 samples/sec Loss 7.2431 LearningRate 0.0008 Epoch: 7 Global Step: 149260 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:18:50,317-Speed 6297.63 samples/sec Loss 7.2942 LearningRate 0.0008 Epoch: 7 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:53,567-Speed 6302.75 samples/sec Loss 7.2403 LearningRate 0.0008 Epoch: 7 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:18:56,812-Speed 6313.39 samples/sec Loss 7.2532 LearningRate 0.0008 Epoch: 7 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:00,060-Speed 6307.43 samples/sec Loss 7.2778 LearningRate 0.0008 Epoch: 7 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:03,304-Speed 6314.05 samples/sec Loss 7.2219 LearningRate 0.0008 Epoch: 7 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:06,537-Speed 6334.71 samples/sec Loss 
7.2634 LearningRate 0.0008 Epoch: 7 Global Step: 149320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:09,783-Speed 6311.87 samples/sec Loss 7.2701 LearningRate 0.0008 Epoch: 7 Global Step: 149330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:13,056-Speed 6258.44 samples/sec Loss 7.2289 LearningRate 0.0008 Epoch: 7 Global Step: 149340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:16,301-Speed 6312.43 samples/sec Loss 7.2051 LearningRate 0.0008 Epoch: 7 Global Step: 149350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:19,548-Speed 6309.12 samples/sec Loss 7.2257 LearningRate 0.0008 Epoch: 7 Global Step: 149360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:22,796-Speed 6305.96 samples/sec Loss 7.2519 LearningRate 0.0008 Epoch: 7 Global Step: 149370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:26,041-Speed 6312.50 samples/sec Loss 7.2657 LearningRate 0.0008 Epoch: 7 Global Step: 149380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:29,286-Speed 6313.23 samples/sec Loss 7.2628 LearningRate 0.0008 Epoch: 7 Global Step: 149390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:32,529-Speed 6316.43 samples/sec Loss 7.1589 LearningRate 0.0008 Epoch: 7 Global Step: 149400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:35,772-Speed 6318.59 samples/sec Loss 7.2080 LearningRate 0.0008 Epoch: 7 Global Step: 149410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:19:39,030-Speed 6286.09 samples/sec Loss 7.2660 LearningRate 0.0008 Epoch: 7 Global Step: 149420 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:42,278-Speed 6308.02 samples/sec Loss 7.2534 LearningRate 0.0008 Epoch: 7 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:45,531-Speed 6296.79 samples/sec Loss 7.1974 LearningRate 0.0008 Epoch: 7 Global 
Step: 149440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:48,777-Speed 6309.58 samples/sec Loss 7.2563 LearningRate 0.0008 Epoch: 7 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:52,025-Speed 6308.05 samples/sec Loss 7.1772 LearningRate 0.0008 Epoch: 7 Global Step: 149460 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:55,272-Speed 6308.87 samples/sec Loss 7.2600 LearningRate 0.0008 Epoch: 7 Global Step: 149470 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:19:58,516-Speed 6313.56 samples/sec Loss 7.2423 LearningRate 0.0008 Epoch: 7 Global Step: 149480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:01,751-Speed 6332.31 samples/sec Loss 7.2336 LearningRate 0.0008 Epoch: 7 Global Step: 149490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:05,000-Speed 6305.93 samples/sec Loss 7.2822 LearningRate 0.0008 Epoch: 7 Global Step: 149500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:08,243-Speed 6314.78 samples/sec Loss 7.2695 LearningRate 0.0008 Epoch: 7 Global Step: 149510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:11,491-Speed 6307.56 samples/sec Loss 7.2379 LearningRate 0.0008 Epoch: 7 Global Step: 149520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:14,736-Speed 6312.84 samples/sec Loss 7.2617 LearningRate 0.0008 Epoch: 7 Global Step: 149530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:17,986-Speed 6302.65 samples/sec Loss 7.1328 LearningRate 0.0008 Epoch: 7 Global Step: 149540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:21,233-Speed 6310.17 samples/sec Loss 7.2156 LearningRate 0.0008 Epoch: 7 Global Step: 149550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:24,480-Speed 6309.03 samples/sec Loss 7.3069 LearningRate 0.0008 Epoch: 7 Global Step: 149560 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:20:27,722-Speed 6317.36 samples/sec Loss 7.2278 LearningRate 0.0008 Epoch: 7 Global Step: 149570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:30,970-Speed 6306.12 samples/sec Loss 7.1780 LearningRate 0.0008 Epoch: 7 Global Step: 149580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:20:34,218-Speed 6307.82 samples/sec Loss 7.2308 LearningRate 0.0008 Epoch: 7 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:37,465-Speed 6308.66 samples/sec Loss 7.1749 LearningRate 0.0008 Epoch: 7 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:40,709-Speed 6314.19 samples/sec Loss 7.2341 LearningRate 0.0008 Epoch: 7 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:43,952-Speed 6316.94 samples/sec Loss 7.2466 LearningRate 0.0008 Epoch: 7 Global Step: 149620 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:47,199-Speed 6309.08 samples/sec Loss 7.2627 LearningRate 0.0008 Epoch: 7 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:50,443-Speed 6315.42 samples/sec Loss 7.2826 LearningRate 0.0008 Epoch: 7 Global Step: 149640 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:53,696-Speed 6296.87 samples/sec Loss 7.3181 LearningRate 0.0008 Epoch: 7 Global Step: 149650 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:20:56,928-Speed 6337.63 samples/sec Loss 7.2002 LearningRate 0.0008 Epoch: 7 Global Step: 149660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:00,174-Speed 6311.78 samples/sec Loss 7.3124 LearningRate 0.0008 Epoch: 7 Global Step: 149670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:03,422-Speed 6306.99 samples/sec Loss 7.1587 LearningRate 0.0008 Epoch: 7 Global Step: 149680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:21:06,669-Speed 6307.92 samples/sec Loss 7.2741 LearningRate 0.0008 Epoch: 7 Global Step: 149690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:09,915-Speed 6310.51 samples/sec Loss 7.1749 LearningRate 0.0008 Epoch: 7 Global Step: 149700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:13,162-Speed 6309.90 samples/sec Loss 7.2626 LearningRate 0.0008 Epoch: 7 Global Step: 149710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:16,405-Speed 6315.43 samples/sec Loss 7.1478 LearningRate 0.0008 Epoch: 7 Global Step: 149720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:19,651-Speed 6310.22 samples/sec Loss 7.1827 LearningRate 0.0008 Epoch: 7 Global Step: 149730 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:22,897-Speed 6311.14 samples/sec Loss 7.2673 LearningRate 0.0008 Epoch: 7 Global Step: 149740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:26,141-Speed 6313.82 samples/sec Loss 7.1918 LearningRate 0.0008 Epoch: 7 Global Step: 149750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:29,371-Speed 6343.84 samples/sec Loss 7.2245 LearningRate 0.0008 Epoch: 7 Global Step: 149760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:32,622-Speed 6299.35 samples/sec Loss 7.2383 LearningRate 0.0008 Epoch: 7 Global Step: 149770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:35,869-Speed 6310.48 samples/sec Loss 7.2066 LearningRate 0.0008 Epoch: 7 Global Step: 149780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:39,114-Speed 6311.26 samples/sec Loss 7.2061 LearningRate 0.0008 Epoch: 7 Global Step: 149790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:42,362-Speed 6307.32 samples/sec Loss 7.2680 LearningRate 0.0008 Epoch: 7 Global Step: 149800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:45,607-Speed 6312.60 samples/sec Loss 
7.1975 LearningRate 0.0008 Epoch: 7 Global Step: 149810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:48,850-Speed 6315.98 samples/sec Loss 7.1729 LearningRate 0.0008 Epoch: 7 Global Step: 149820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:52,099-Speed 6305.47 samples/sec Loss 7.2093 LearningRate 0.0008 Epoch: 7 Global Step: 149830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:55,345-Speed 6311.47 samples/sec Loss 7.1693 LearningRate 0.0008 Epoch: 7 Global Step: 149840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:21:58,594-Speed 6305.73 samples/sec Loss 7.1710 LearningRate 0.0008 Epoch: 7 Global Step: 149850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:01,846-Speed 6298.99 samples/sec Loss 7.2856 LearningRate 0.0008 Epoch: 7 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:05,093-Speed 6307.31 samples/sec Loss 7.2152 LearningRate 0.0008 Epoch: 7 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:08,341-Speed 6307.94 samples/sec Loss 7.2388 LearningRate 0.0008 Epoch: 7 Global Step: 149880 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:11,586-Speed 6312.70 samples/sec Loss 7.2840 LearningRate 0.0008 Epoch: 7 Global Step: 149890 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:14,834-Speed 6306.42 samples/sec Loss 7.2637 LearningRate 0.0008 Epoch: 7 Global Step: 149900 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:18,081-Speed 6309.56 samples/sec Loss 7.2692 LearningRate 0.0008 Epoch: 7 Global Step: 149910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:21,326-Speed 6311.05 samples/sec Loss 7.2739 LearningRate 0.0008 Epoch: 7 Global Step: 149920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:24,575-Speed 6305.19 samples/sec Loss 7.2619 LearningRate 0.0008 Epoch: 7 Global 
Step: 149930 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:27,819-Speed 6316.22 samples/sec Loss 7.2201 LearningRate 0.0008 Epoch: 7 Global Step: 149940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:31,070-Speed 6299.32 samples/sec Loss 7.3174 LearningRate 0.0008 Epoch: 7 Global Step: 149950 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:22:34,290-Speed 6362.27 samples/sec Loss 7.2764 LearningRate 0.0008 Epoch: 7 Global Step: 149960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:37,539-Speed 6304.61 samples/sec Loss 7.2093 LearningRate 0.0008 Epoch: 7 Global Step: 149970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:40,790-Speed 6301.83 samples/sec Loss 7.2486 LearningRate 0.0008 Epoch: 7 Global Step: 149980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:44,036-Speed 6310.69 samples/sec Loss 7.3470 LearningRate 0.0008 Epoch: 7 Global Step: 149990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:47,280-Speed 6312.93 samples/sec Loss 7.1847 LearningRate 0.0008 Epoch: 7 Global Step: 150000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:50,527-Speed 6310.59 samples/sec Loss 7.2504 LearningRate 0.0008 Epoch: 7 Global Step: 150010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:53,774-Speed 6308.38 samples/sec Loss 7.2720 LearningRate 0.0008 Epoch: 7 Global Step: 150020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:22:57,024-Speed 6304.01 samples/sec Loss 7.1867 LearningRate 0.0008 Epoch: 7 Global Step: 150030 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:00,281-Speed 6290.01 samples/sec Loss 7.2810 LearningRate 0.0008 Epoch: 7 Global Step: 150040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:03,531-Speed 6305.64 samples/sec Loss 7.2036 LearningRate 0.0008 Epoch: 7 Global Step: 150050 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:23:06,776-Speed 6313.84 samples/sec Loss 7.3091 LearningRate 0.0008 Epoch: 7 Global Step: 150060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:10,028-Speed 6299.13 samples/sec Loss 7.2407 LearningRate 0.0008 Epoch: 7 Global Step: 150070 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:13,273-Speed 6312.70 samples/sec Loss 7.1929 LearningRate 0.0008 Epoch: 7 Global Step: 150080 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:16,521-Speed 6305.94 samples/sec Loss 7.2356 LearningRate 0.0008 Epoch: 7 Global Step: 150090 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:19,766-Speed 6311.81 samples/sec Loss 7.2639 LearningRate 0.0008 Epoch: 7 Global Step: 150100 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:23,023-Speed 6289.21 samples/sec Loss 7.2297 LearningRate 0.0008 Epoch: 7 Global Step: 150110 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:26,305-Speed 6241.51 samples/sec Loss 7.2099 LearningRate 0.0008 Epoch: 7 Global Step: 150120 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:29,572-Speed 6270.35 samples/sec Loss 7.2957 LearningRate 0.0008 Epoch: 7 Global Step: 150130 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:32,825-Speed 6298.30 samples/sec Loss 7.2679 LearningRate 0.0008 Epoch: 7 Global Step: 150140 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:36,074-Speed 6303.76 samples/sec Loss 7.2221 LearningRate 0.0008 Epoch: 7 Global Step: 150150 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:23:39,294-Speed 6361.32 samples/sec Loss 7.2196 LearningRate 0.0008 Epoch: 7 Global Step: 150160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:42,545-Speed 6301.95 samples/sec Loss 7.1821 LearningRate 0.0008 Epoch: 7 Global Step: 150170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:23:45,804-Speed 6284.24 samples/sec Loss 7.2690 LearningRate 0.0008 Epoch: 7 Global Step: 150180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:49,067-Speed 6278.34 samples/sec Loss 7.1374 LearningRate 0.0008 Epoch: 7 Global Step: 150190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:52,314-Speed 6308.90 samples/sec Loss 7.1862 LearningRate 0.0008 Epoch: 7 Global Step: 150200 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:55,562-Speed 6306.31 samples/sec Loss 7.1986 LearningRate 0.0008 Epoch: 7 Global Step: 150210 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:23:58,810-Speed 6308.15 samples/sec Loss 7.2271 LearningRate 0.0008 Epoch: 7 Global Step: 150220 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:02,061-Speed 6302.18 samples/sec Loss 7.3040 LearningRate 0.0008 Epoch: 7 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:05,308-Speed 6307.20 samples/sec Loss 7.2317 LearningRate 0.0008 Epoch: 7 Global Step: 150240 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:08,570-Speed 6280.02 samples/sec Loss 7.2297 LearningRate 0.0008 Epoch: 7 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:11,815-Speed 6313.20 samples/sec Loss 7.1687 LearningRate 0.0008 Epoch: 7 Global Step: 150260 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:24:15,045-Speed 6341.83 samples/sec Loss 7.1934 LearningRate 0.0008 Epoch: 7 Global Step: 150270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:18,293-Speed 6306.72 samples/sec Loss 7.1787 LearningRate 0.0008 Epoch: 7 Global Step: 150280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:21,540-Speed 6309.64 samples/sec Loss 7.1798 LearningRate 0.0008 Epoch: 7 Global Step: 150290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:24,785-Speed 6312.63 samples/sec Loss 
7.1635 LearningRate 0.0008 Epoch: 7 Global Step: 150300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:28,033-Speed 6307.32 samples/sec Loss 7.2822 LearningRate 0.0008 Epoch: 7 Global Step: 150310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:31,286-Speed 6296.68 samples/sec Loss 7.2452 LearningRate 0.0008 Epoch: 7 Global Step: 150320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:34,534-Speed 6305.87 samples/sec Loss 7.2041 LearningRate 0.0008 Epoch: 7 Global Step: 150330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:37,778-Speed 6314.14 samples/sec Loss 7.2175 LearningRate 0.0008 Epoch: 7 Global Step: 150340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:41,024-Speed 6312.03 samples/sec Loss 7.2849 LearningRate 0.0008 Epoch: 7 Global Step: 150350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:44,271-Speed 6307.22 samples/sec Loss 7.2023 LearningRate 0.0008 Epoch: 7 Global Step: 150360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:24:47,516-Speed 6313.21 samples/sec Loss 7.2355 LearningRate 0.0008 Epoch: 7 Global Step: 150370 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:24:50,761-Speed 6312.98 samples/sec Loss 7.1190 LearningRate 0.0008 Epoch: 7 Global Step: 150380 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:24:54,008-Speed 6308.47 samples/sec Loss 7.2533 LearningRate 0.0008 Epoch: 7 Global Step: 150390 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:24:57,257-Speed 6305.13 samples/sec Loss 7.2288 LearningRate 0.0008 Epoch: 7 Global Step: 150400 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:00,505-Speed 6306.65 samples/sec Loss 7.2210 LearningRate 0.0008 Epoch: 7 Global Step: 150410 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:03,750-Speed 6311.63 samples/sec Loss 7.2511 LearningRate 0.0008 Epoch: 7 Global 
Step: 150420 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:06,999-Speed 6306.11 samples/sec Loss 7.2286 LearningRate 0.0008 Epoch: 7 Global Step: 150430 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:10,245-Speed 6310.40 samples/sec Loss 7.3264 LearningRate 0.0008 Epoch: 7 Global Step: 150440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:13,482-Speed 6328.51 samples/sec Loss 7.2333 LearningRate 0.0008 Epoch: 7 Global Step: 150450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:16,728-Speed 6311.25 samples/sec Loss 7.3318 LearningRate 0.0008 Epoch: 7 Global Step: 150460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:19,972-Speed 6314.65 samples/sec Loss 7.2175 LearningRate 0.0008 Epoch: 7 Global Step: 150470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:23,219-Speed 6307.65 samples/sec Loss 7.1961 LearningRate 0.0008 Epoch: 7 Global Step: 150480 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:26,464-Speed 6314.48 samples/sec Loss 7.2294 LearningRate 0.0008 Epoch: 7 Global Step: 150490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:29,713-Speed 6303.99 samples/sec Loss 7.2215 LearningRate 0.0008 Epoch: 7 Global Step: 150500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:32,961-Speed 6306.25 samples/sec Loss 7.2815 LearningRate 0.0008 Epoch: 7 Global Step: 150510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:36,207-Speed 6311.25 samples/sec Loss 7.1960 LearningRate 0.0008 Epoch: 7 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:39,452-Speed 6313.31 samples/sec Loss 7.3381 LearningRate 0.0008 Epoch: 7 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:42,715-Speed 6277.59 samples/sec Loss 7.2219 LearningRate 0.0008 Epoch: 7 Global Step: 150540 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:25:45,966-Speed 6301.56 samples/sec Loss 7.2345 LearningRate 0.0008 Epoch: 7 Global Step: 150550 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:25:49,199-Speed 6334.71 samples/sec Loss 7.2929 LearningRate 0.0008 Epoch: 7 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:52,445-Speed 6311.65 samples/sec Loss 7.2229 LearningRate 0.0008 Epoch: 7 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:55,695-Speed 6301.62 samples/sec Loss 7.2127 LearningRate 0.0008 Epoch: 7 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:25:58,944-Speed 6304.76 samples/sec Loss 7.2593 LearningRate 0.0008 Epoch: 7 Global Step: 150590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:02,190-Speed 6311.40 samples/sec Loss 7.2799 LearningRate 0.0008 Epoch: 7 Global Step: 150600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:05,439-Speed 6304.36 samples/sec Loss 7.1687 LearningRate 0.0008 Epoch: 7 Global Step: 150610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:08,683-Speed 6315.31 samples/sec Loss 7.1922 LearningRate 0.0008 Epoch: 7 Global Step: 150620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:11,929-Speed 6310.70 samples/sec Loss 7.2068 LearningRate 0.0008 Epoch: 7 Global Step: 150630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:15,180-Speed 6301.02 samples/sec Loss 7.2046 LearningRate 0.0008 Epoch: 7 Global Step: 150640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:18,432-Speed 6299.91 samples/sec Loss 7.3102 LearningRate 0.0008 Epoch: 7 Global Step: 150650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:21,675-Speed 6316.87 samples/sec Loss 7.2862 LearningRate 0.0008 Epoch: 7 Global Step: 150660 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 
05:26:24,925-Speed 6302.07 samples/sec Loss 7.1480 LearningRate 0.0008 Epoch: 7 Global Step: 150670 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:26:28,173-Speed 6306.74 samples/sec Loss 7.1447 LearningRate 0.0008 Epoch: 7 Global Step: 150680 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:26:31,422-Speed 6305.64 samples/sec Loss 7.2331 LearningRate 0.0008 Epoch: 7 Global Step: 150690 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:26:34,669-Speed 6308.86 samples/sec Loss 7.2712 LearningRate 0.0008 Epoch: 7 Global Step: 150700 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:26:37,916-Speed 6308.63 samples/sec Loss 7.2163 LearningRate 0.0008 Epoch: 7 Global Step: 150710 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:26:41,148-Speed 6338.91 samples/sec Loss 7.2071 LearningRate 0.0008 Epoch: 7 Global Step: 150720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:44,393-Speed 6312.40 samples/sec Loss 7.2815 LearningRate 0.0008 Epoch: 7 Global Step: 150730 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:47,642-Speed 6304.03 samples/sec Loss 7.1580 LearningRate 0.0008 Epoch: 7 Global Step: 150740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:50,888-Speed 6311.23 samples/sec Loss 7.2045 LearningRate 0.0008 Epoch: 7 Global Step: 150750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:54,134-Speed 6309.41 samples/sec Loss 7.2425 LearningRate 0.0008 Epoch: 7 Global Step: 150760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:26:57,380-Speed 6311.10 samples/sec Loss 7.1800 LearningRate 0.0008 Epoch: 7 Global Step: 150770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:27:00,627-Speed 6309.41 samples/sec Loss 7.2094 LearningRate 0.0008 Epoch: 7 Global Step: 150780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:27:03,878-Speed 6300.86 samples/sec Loss 
7.1924 LearningRate 0.0008 Epoch: 7 Global Step: 150790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:27:07,127-Speed 6305.24 samples/sec Loss 7.2274 LearningRate 0.0008 Epoch: 7 Global Step: 150800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:27:10,374-Speed 6308.47 samples/sec Loss 7.1580 LearningRate 0.0008 Epoch: 7 Global Step: 150810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:27:13,619-Speed 6311.79 samples/sec Loss 7.1873 LearningRate 0.0008 Epoch: 7 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:16,863-Speed 6315.07 samples/sec Loss 7.1372 LearningRate 0.0008 Epoch: 7 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:20,117-Speed 6295.60 samples/sec Loss 7.2191 LearningRate 0.0008 Epoch: 7 Global Step: 150840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:23,361-Speed 6313.43 samples/sec Loss 7.2943 LearningRate 0.0008 Epoch: 7 Global Step: 150850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:26,622-Speed 6283.85 samples/sec Loss 7.2521 LearningRate 0.0008 Epoch: 7 Global Step: 150860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:29,867-Speed 6311.48 samples/sec Loss 7.2150 LearningRate 0.0008 Epoch: 7 Global Step: 150870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:33,114-Speed 6310.36 samples/sec Loss 7.2002 LearningRate 0.0008 Epoch: 7 Global Step: 150880 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:36,360-Speed 6310.36 samples/sec Loss 7.2402 LearningRate 0.0008 Epoch: 7 Global Step: 150890 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:39,612-Speed 6298.18 samples/sec Loss 7.2679 LearningRate 0.0008 Epoch: 7 Global Step: 150900 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:42,860-Speed 6307.94 samples/sec Loss 7.1545 LearningRate 0.0008 Epoch: 7 Global 
Step: 150910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:46,092-Speed 6337.35 samples/sec Loss 7.1644 LearningRate 0.0008 Epoch: 7 Global Step: 150920 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:49,338-Speed 6309.86 samples/sec Loss 7.1633 LearningRate 0.0008 Epoch: 7 Global Step: 150930 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:52,585-Speed 6310.54 samples/sec Loss 7.1842 LearningRate 0.0008 Epoch: 7 Global Step: 150940 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:55,841-Speed 6289.95 samples/sec Loss 7.2093 LearningRate 0.0008 Epoch: 7 Global Step: 150950 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:27:59,089-Speed 6307.91 samples/sec Loss 7.2732 LearningRate 0.0008 Epoch: 7 Global Step: 150960 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:28:02,336-Speed 6307.83 samples/sec Loss 7.2244 LearningRate 0.0008 Epoch: 7 Global Step: 150970 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:28:05,569-Speed 6337.20 samples/sec Loss 7.1802 LearningRate 0.0008 Epoch: 7 Global Step: 150980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:08,816-Speed 6308.92 samples/sec Loss 7.2868 LearningRate 0.0008 Epoch: 7 Global Step: 150990 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:12,063-Speed 6308.63 samples/sec Loss 7.3270 LearningRate 0.0008 Epoch: 7 Global Step: 151000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:15,313-Speed 6302.83 samples/sec Loss 7.1862 LearningRate 0.0008 Epoch: 7 Global Step: 151010 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:18,559-Speed 6309.12 samples/sec Loss 7.2524 LearningRate 0.0008 Epoch: 7 Global Step: 151020 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:21,805-Speed 6311.47 samples/sec Loss 7.3013 LearningRate 0.0008 Epoch: 7 Global Step: 151030 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:28:25,051-Speed 6309.62 samples/sec Loss 7.1580 LearningRate 0.0008 Epoch: 7 Global Step: 151040 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:28,296-Speed 6313.07 samples/sec Loss 7.1978 LearningRate 0.0008 Epoch: 7 Global Step: 151050 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:31,544-Speed 6307.73 samples/sec Loss 7.1570 LearningRate 0.0008 Epoch: 7 Global Step: 151060 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:34,876-Speed 6148.61 samples/sec Loss 7.1060 LearningRate 0.0008 Epoch: 7 Global Step: 151070 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:38,238-Speed 6093.01 samples/sec Loss 7.3002 LearningRate 0.0008 Epoch: 7 Global Step: 151080 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:28:41,507-Speed 6265.04 samples/sec Loss 7.2729 LearningRate 0.0008 Epoch: 7 Global Step: 151090 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:28:44,739-Speed 6338.44 samples/sec Loss 7.3023 LearningRate 0.0008 Epoch: 7 Global Step: 151100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:47,986-Speed 6309.99 samples/sec Loss 7.2213 LearningRate 0.0008 Epoch: 7 Global Step: 151110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:51,230-Speed 6315.34 samples/sec Loss 7.1776 LearningRate 0.0008 Epoch: 7 Global Step: 151120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:54,476-Speed 6308.79 samples/sec Loss 7.2873 LearningRate 0.0008 Epoch: 7 Global Step: 151130 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:28:57,720-Speed 6316.25 samples/sec Loss 7.2732 LearningRate 0.0008 Epoch: 7 Global Step: 151140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:00,968-Speed 6306.36 samples/sec Loss 7.2702 LearningRate 0.0008 Epoch: 7 Global Step: 151150 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:29:04,217-Speed 6304.81 samples/sec Loss 7.1635 LearningRate 0.0008 Epoch: 7 Global Step: 151160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:07,464-Speed 6308.34 samples/sec Loss 7.2593 LearningRate 0.0008 Epoch: 7 Global Step: 151170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:10,711-Speed 6309.13 samples/sec Loss 7.1678 LearningRate 0.0008 Epoch: 7 Global Step: 151180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:13,956-Speed 6313.26 samples/sec Loss 7.1344 LearningRate 0.0008 Epoch: 7 Global Step: 151190 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:17,201-Speed 6311.40 samples/sec Loss 7.1437 LearningRate 0.0008 Epoch: 7 Global Step: 151200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:20,446-Speed 6312.42 samples/sec Loss 7.2788 LearningRate 0.0008 Epoch: 7 Global Step: 151210 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:23,699-Speed 6298.71 samples/sec Loss 7.2361 LearningRate 0.0008 Epoch: 7 Global Step: 151220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:26,948-Speed 6307.47 samples/sec Loss 7.1511 LearningRate 0.0008 Epoch: 7 Global Step: 151230 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:30,195-Speed 6308.42 samples/sec Loss 7.1810 LearningRate 0.0008 Epoch: 7 Global Step: 151240 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:33,447-Speed 6299.74 samples/sec Loss 7.1598 LearningRate 0.0008 Epoch: 7 Global Step: 151250 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:36,697-Speed 6302.09 samples/sec Loss 7.2283 LearningRate 0.0008 Epoch: 7 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:39,944-Speed 6307.63 samples/sec Loss 7.1896 LearningRate 0.0008 Epoch: 7 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:43,191-Speed 6309.35 samples/sec Loss 
7.1883 LearningRate 0.0008 Epoch: 7 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:46,439-Speed 6307.94 samples/sec Loss 7.2362 LearningRate 0.0008 Epoch: 7 Global Step: 151290 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:49,672-Speed 6334.28 samples/sec Loss 7.1749 LearningRate 0.0008 Epoch: 7 Global Step: 151300 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:52,919-Speed 6310.69 samples/sec Loss 7.2363 LearningRate 0.0008 Epoch: 7 Global Step: 151310 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:29:56,151-Speed 6337.25 samples/sec Loss 7.2026 LearningRate 0.0008 Epoch: 7 Global Step: 151320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:29:59,402-Speed 6302.34 samples/sec Loss 7.1612 LearningRate 0.0008 Epoch: 7 Global Step: 151330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:02,651-Speed 6304.44 samples/sec Loss 7.2442 LearningRate 0.0008 Epoch: 7 Global Step: 151340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:05,898-Speed 6307.98 samples/sec Loss 7.2409 LearningRate 0.0008 Epoch: 7 Global Step: 151350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:09,145-Speed 6308.57 samples/sec Loss 7.2388 LearningRate 0.0008 Epoch: 7 Global Step: 151360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:12,391-Speed 6311.31 samples/sec Loss 7.2285 LearningRate 0.0008 Epoch: 7 Global Step: 151370 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:15,643-Speed 6299.81 samples/sec Loss 7.1767 LearningRate 0.0008 Epoch: 7 Global Step: 151380 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:18,891-Speed 6307.12 samples/sec Loss 7.1628 LearningRate 0.0008 Epoch: 7 Global Step: 151390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:22,138-Speed 6306.99 samples/sec Loss 7.2322 LearningRate 0.0008 Epoch: 7 Global 
Step: 151400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:25,389-Speed 6302.47 samples/sec Loss 7.1716 LearningRate 0.0008 Epoch: 7 Global Step: 151410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:28,642-Speed 6296.16 samples/sec Loss 7.1597 LearningRate 0.0008 Epoch: 7 Global Step: 151420 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:31,890-Speed 6308.00 samples/sec Loss 7.2023 LearningRate 0.0008 Epoch: 7 Global Step: 151430 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:35,137-Speed 6308.04 samples/sec Loss 7.1734 LearningRate 0.0008 Epoch: 7 Global Step: 151440 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:38,385-Speed 6306.38 samples/sec Loss 7.2234 LearningRate 0.0008 Epoch: 7 Global Step: 151450 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:41,630-Speed 6312.40 samples/sec Loss 7.1904 LearningRate 0.0008 Epoch: 7 Global Step: 151460 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:44,879-Speed 6305.11 samples/sec Loss 7.2208 LearningRate 0.0008 Epoch: 7 Global Step: 151470 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:48,123-Speed 6313.99 samples/sec Loss 7.1777 LearningRate 0.0008 Epoch: 7 Global Step: 151480 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:30:51,355-Speed 6338.28 samples/sec Loss 7.3138 LearningRate 0.0008 Epoch: 7 Global Step: 151490 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:54,600-Speed 6313.43 samples/sec Loss 7.2158 LearningRate 0.0008 Epoch: 7 Global Step: 151500 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:30:57,852-Speed 6300.15 samples/sec Loss 7.2225 LearningRate 0.0008 Epoch: 7 Global Step: 151510 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:01,099-Speed 6307.60 samples/sec Loss 7.1806 LearningRate 0.0008 Epoch: 7 Global Step: 151520 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:31:04,347-Speed 6308.81 samples/sec Loss 7.1277 LearningRate 0.0008 Epoch: 7 Global Step: 151530 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:07,595-Speed 6306.27 samples/sec Loss 7.2489 LearningRate 0.0008 Epoch: 7 Global Step: 151540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:10,852-Speed 6289.95 samples/sec Loss 7.2227 LearningRate 0.0008 Epoch: 7 Global Step: 151550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:14,110-Speed 6286.41 samples/sec Loss 7.1965 LearningRate 0.0008 Epoch: 7 Global Step: 151560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:17,354-Speed 6314.31 samples/sec Loss 7.2500 LearningRate 0.0008 Epoch: 7 Global Step: 151570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:20,604-Speed 6303.27 samples/sec Loss 7.2012 LearningRate 0.0008 Epoch: 7 Global Step: 151580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:23,851-Speed 6308.69 samples/sec Loss 7.1476 LearningRate 0.0008 Epoch: 7 Global Step: 151590 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:31:27,096-Speed 6313.80 samples/sec Loss 7.2635 LearningRate 0.0008 Epoch: 7 Global Step: 151600 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:31:30,328-Speed 6338.32 samples/sec Loss 7.2995 LearningRate 0.0008 Epoch: 7 Global Step: 151610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:33,573-Speed 6312.22 samples/sec Loss 7.1654 LearningRate 0.0008 Epoch: 7 Global Step: 151620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:36,822-Speed 6304.10 samples/sec Loss 7.1671 LearningRate 0.0008 Epoch: 7 Global Step: 151630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:40,064-Speed 6317.92 samples/sec Loss 7.1832 LearningRate 0.0008 Epoch: 7 Global Step: 151640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:31:43,308-Speed 6315.46 samples/sec Loss 7.1852 LearningRate 0.0008 Epoch: 7 Global Step: 151650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:46,564-Speed 6291.45 samples/sec Loss 7.1987 LearningRate 0.0008 Epoch: 7 Global Step: 151660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:49,808-Speed 6313.45 samples/sec Loss 7.1955 LearningRate 0.0008 Epoch: 7 Global Step: 151670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:53,058-Speed 6304.23 samples/sec Loss 7.1386 LearningRate 0.0008 Epoch: 7 Global Step: 151680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:56,307-Speed 6303.72 samples/sec Loss 7.2278 LearningRate 0.0008 Epoch: 7 Global Step: 151690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:31:59,555-Speed 6307.08 samples/sec Loss 7.1432 LearningRate 0.0008 Epoch: 7 Global Step: 151700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:02,811-Speed 6291.75 samples/sec Loss 7.2123 LearningRate 0.0008 Epoch: 7 Global Step: 151710 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:32:06,059-Speed 6307.15 samples/sec Loss 7.2222 LearningRate 0.0008 Epoch: 7 Global Step: 151720 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:32:09,306-Speed 6308.53 samples/sec Loss 7.1997 LearningRate 0.0008 Epoch: 7 Global Step: 151730 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:32:12,537-Speed 6340.68 samples/sec Loss 7.1779 LearningRate 0.0008 Epoch: 7 Global Step: 151740 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:15,786-Speed 6304.50 samples/sec Loss 7.2033 LearningRate 0.0008 Epoch: 7 Global Step: 151750 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:19,033-Speed 6309.11 samples/sec Loss 7.1596 LearningRate 0.0008 Epoch: 7 Global Step: 151760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:22,285-Speed 6300.40 samples/sec Loss 
7.1892 LearningRate 0.0008 Epoch: 7 Global Step: 151770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:25,532-Speed 6308.35 samples/sec Loss 7.1508 LearningRate 0.0008 Epoch: 7 Global Step: 151780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:28,782-Speed 6301.91 samples/sec Loss 7.2262 LearningRate 0.0008 Epoch: 7 Global Step: 151790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:32,029-Speed 6309.63 samples/sec Loss 7.2082 LearningRate 0.0008 Epoch: 7 Global Step: 151800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:35,278-Speed 6304.82 samples/sec Loss 7.1691 LearningRate 0.0008 Epoch: 7 Global Step: 151810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:38,527-Speed 6305.24 samples/sec Loss 7.2367 LearningRate 0.0008 Epoch: 7 Global Step: 151820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:41,773-Speed 6309.64 samples/sec Loss 7.2292 LearningRate 0.0008 Epoch: 7 Global Step: 151830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:45,018-Speed 6312.76 samples/sec Loss 7.2299 LearningRate 0.0008 Epoch: 7 Global Step: 151840 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:32:48,262-Speed 6314.24 samples/sec Loss 7.2630 LearningRate 0.0008 Epoch: 7 Global Step: 151850 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:32:51,494-Speed 6339.16 samples/sec Loss 7.2499 LearningRate 0.0008 Epoch: 7 Global Step: 151860 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:54,742-Speed 6306.10 samples/sec Loss 7.1599 LearningRate 0.0008 Epoch: 7 Global Step: 151870 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:32:57,996-Speed 6296.13 samples/sec Loss 7.2270 LearningRate 0.0008 Epoch: 7 Global Step: 151880 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:01,243-Speed 6308.58 samples/sec Loss 7.1944 LearningRate 0.0008 Epoch: 7 Global 
Step: 151890 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:04,494-Speed 6301.31 samples/sec Loss 7.2470 LearningRate 0.0008 Epoch: 7 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:07,740-Speed 6311.11 samples/sec Loss 7.1964 LearningRate 0.0008 Epoch: 7 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:10,987-Speed 6306.92 samples/sec Loss 7.1767 LearningRate 0.0008 Epoch: 7 Global Step: 151920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:14,231-Speed 6315.96 samples/sec Loss 7.1529 LearningRate 0.0008 Epoch: 7 Global Step: 151930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:17,475-Speed 6313.73 samples/sec Loss 7.2022 LearningRate 0.0008 Epoch: 7 Global Step: 151940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:20,725-Speed 6303.50 samples/sec Loss 7.1321 LearningRate 0.0008 Epoch: 7 Global Step: 151950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:33:23,975-Speed 6304.16 samples/sec Loss 7.1925 LearningRate 0.0008 Epoch: 7 Global Step: 151960 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:27,225-Speed 6302.70 samples/sec Loss 7.1855 LearningRate 0.0008 Epoch: 7 Global Step: 151970 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:30,471-Speed 6310.08 samples/sec Loss 7.1823 LearningRate 0.0008 Epoch: 7 Global Step: 151980 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:33,719-Speed 6307.76 samples/sec Loss 7.1832 LearningRate 0.0008 Epoch: 7 Global Step: 151990 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:36,967-Speed 6306.78 samples/sec Loss 7.1431 LearningRate 0.0008 Epoch: 7 Global Step: 152000 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:40,216-Speed 6304.87 samples/sec Loss 7.1245 LearningRate 0.0008 Epoch: 7 Global Step: 152010 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 05:33:43,476-Speed 6283.83 samples/sec Loss 7.1958 LearningRate 0.0008 Epoch: 7 Global Step: 152020 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:46,720-Speed 6313.44 samples/sec Loss 7.2655 LearningRate 0.0008 Epoch: 7 Global Step: 152030 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:49,976-Speed 6291.53 samples/sec Loss 7.3465 LearningRate 0.0008 Epoch: 7 Global Step: 152040 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:53,303-Speed 6157.43 samples/sec Loss 7.1070 LearningRate 0.0008 Epoch: 7 Global Step: 152050 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:56,537-Speed 6334.62 samples/sec Loss 7.0986 LearningRate 0.0008 Epoch: 7 Global Step: 152060 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:33:59,787-Speed 6301.39 samples/sec Loss 7.1988 LearningRate 0.0008 Epoch: 7 Global Step: 152070 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:03,032-Speed 6313.88 samples/sec Loss 7.1407 LearningRate 0.0008 Epoch: 7 Global Step: 152080 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:06,265-Speed 6335.25 samples/sec Loss 7.1732 LearningRate 0.0008 Epoch: 7 Global Step: 152090 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:09,513-Speed 6308.54 samples/sec Loss 7.1830 LearningRate 0.0008 Epoch: 7 Global Step: 152100 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:12,758-Speed 6311.47 samples/sec Loss 7.2172 LearningRate 0.0008 Epoch: 7 Global Step: 152110 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:16,003-Speed 6312.52 samples/sec Loss 7.1656 LearningRate 0.0008 Epoch: 7 Global Step: 152120 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:19,248-Speed 6313.57 samples/sec Loss 7.1479 LearningRate 0.0008 Epoch: 7 Global Step: 152130 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:34:22,494-Speed 6309.51 samples/sec Loss 7.1608 LearningRate 0.0008 Epoch: 7 Global Step: 152140 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:25,744-Speed 6303.94 samples/sec Loss 7.1631 LearningRate 0.0008 Epoch: 7 Global Step: 152150 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:28,995-Speed 6301.61 samples/sec Loss 7.1495 LearningRate 0.0008 Epoch: 7 Global Step: 152160 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:32,243-Speed 6307.34 samples/sec Loss 7.1882 LearningRate 0.0008 Epoch: 7 Global Step: 152170 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:35,489-Speed 6310.80 samples/sec Loss 7.2207 LearningRate 0.0008 Epoch: 7 Global Step: 152180 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:34:38,736-Speed 6307.28 samples/sec Loss 7.2290 LearningRate 0.0008 Epoch: 7 Global Step: 152190 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:41,984-Speed 6307.97 samples/sec Loss 7.2769 LearningRate 0.0008 Epoch: 7 Global Step: 152200 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:45,288-Speed 6199.61 samples/sec Loss 7.2139 LearningRate 0.0008 Epoch: 7 Global Step: 152210 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:48,537-Speed 6304.02 samples/sec Loss 7.2476 LearningRate 0.0008 Epoch: 7 Global Step: 152220 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:51,783-Speed 6312.08 samples/sec Loss 7.1990 LearningRate 0.0008 Epoch: 7 Global Step: 152230 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:55,027-Speed 6313.39 samples/sec Loss 7.1756 LearningRate 0.0008 Epoch: 7 Global Step: 152240 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:34:58,271-Speed 6315.66 samples/sec Loss 7.1220 LearningRate 0.0008 Epoch: 7 Global Step: 152250 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:35:01,522-Speed 6300.15 samples/sec Loss 
7.1823 LearningRate 0.0008 Epoch: 7 Global Step: 152260 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:35:04,756-Speed 6334.94 samples/sec Loss 7.1338 LearningRate 0.0008 Epoch: 7 Global Step: 152270 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:08,003-Speed 6308.79 samples/sec Loss 7.1884 LearningRate 0.0008 Epoch: 7 Global Step: 152280 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:11,247-Speed 6313.22 samples/sec Loss 7.2295 LearningRate 0.0008 Epoch: 7 Global Step: 152290 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:14,494-Speed 6308.38 samples/sec Loss 7.2146 LearningRate 0.0008 Epoch: 7 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:17,739-Speed 6313.46 samples/sec Loss 7.2913 LearningRate 0.0008 Epoch: 7 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:20,982-Speed 6316.04 samples/sec Loss 7.1966 LearningRate 0.0008 Epoch: 7 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:24,227-Speed 6313.76 samples/sec Loss 7.1929 LearningRate 0.0008 Epoch: 7 Global Step: 152330 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:27,472-Speed 6311.37 samples/sec Loss 7.2528 LearningRate 0.0008 Epoch: 7 Global Step: 152340 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:30,718-Speed 6310.34 samples/sec Loss 7.1807 LearningRate 0.0008 Epoch: 7 Global Step: 152350 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:33,965-Speed 6310.78 samples/sec Loss 7.1810 LearningRate 0.0008 Epoch: 7 Global Step: 152360 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:37,209-Speed 6314.69 samples/sec Loss 7.1164 LearningRate 0.0008 Epoch: 7 Global Step: 152370 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:35:40,454-Speed 6313.37 samples/sec Loss 7.2029 LearningRate 0.0008 Epoch: 7 Global 
Step: 152380 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:35:43,687-Speed 6335.66 samples/sec Loss 7.2775 LearningRate 0.0008 Epoch: 7 Global Step: 152390 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:46,934-Speed 6309.31 samples/sec Loss 7.1581 LearningRate 0.0008 Epoch: 7 Global Step: 152400 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:50,180-Speed 6308.86 samples/sec Loss 7.2505 LearningRate 0.0008 Epoch: 7 Global Step: 152410 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:53,426-Speed 6311.16 samples/sec Loss 7.1906 LearningRate 0.0008 Epoch: 7 Global Step: 152420 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:56,670-Speed 6314.40 samples/sec Loss 7.2841 LearningRate 0.0008 Epoch: 7 Global Step: 152430 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:35:59,913-Speed 6317.21 samples/sec Loss 7.1537 LearningRate 0.0008 Epoch: 7 Global Step: 152440 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:03,158-Speed 6312.21 samples/sec Loss 7.1801 LearningRate 0.0008 Epoch: 7 Global Step: 152450 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:06,401-Speed 6316.81 samples/sec Loss 7.2147 LearningRate 0.0008 Epoch: 7 Global Step: 152460 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:09,647-Speed 6310.74 samples/sec Loss 7.1707 LearningRate 0.0008 Epoch: 7 Global Step: 152470 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:12,892-Speed 6313.39 samples/sec Loss 7.2382 LearningRate 0.0008 Epoch: 7 Global Step: 152480 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:16,136-Speed 6314.45 samples/sec Loss 7.1778 LearningRate 0.0008 Epoch: 7 Global Step: 152490 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:36:19,383-Speed 6307.91 samples/sec Loss 7.2371 LearningRate 0.0008 Epoch: 7 Global Step: 152500 Fp16 Grad Scale: 65536 
Required: 62 hours Training: 2022-04-01 05:36:22,627-Speed 6315.66 samples/sec Loss 7.1784 LearningRate 0.0008 Epoch: 7 Global Step: 152510 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:36:25,875-Speed 6306.51 samples/sec Loss 7.2245 LearningRate 0.0008 Epoch: 7 Global Step: 152520 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:36:29,118-Speed 6315.77 samples/sec Loss 7.1023 LearningRate 0.0008 Epoch: 7 Global Step: 152530 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:36:32,355-Speed 6329.29 samples/sec Loss 7.2229 LearningRate 0.0008 Epoch: 7 Global Step: 152540 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:35,603-Speed 6305.83 samples/sec Loss 7.1748 LearningRate 0.0008 Epoch: 7 Global Step: 152550 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:38,848-Speed 6312.77 samples/sec Loss 7.2221 LearningRate 0.0008 Epoch: 7 Global Step: 152560 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:42,098-Speed 6303.59 samples/sec Loss 7.1890 LearningRate 0.0008 Epoch: 7 Global Step: 152570 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:45,353-Speed 6292.86 samples/sec Loss 7.2162 LearningRate 0.0008 Epoch: 7 Global Step: 152580 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:48,608-Speed 6293.24 samples/sec Loss 7.1795 LearningRate 0.0008 Epoch: 7 Global Step: 152590 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:51,855-Speed 6310.05 samples/sec Loss 7.1713 LearningRate 0.0008 Epoch: 7 Global Step: 152600 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:55,098-Speed 6315.30 samples/sec Loss 7.1853 LearningRate 0.0008 Epoch: 7 Global Step: 152610 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:36:58,359-Speed 6282.74 samples/sec Loss 7.1963 LearningRate 0.0008 Epoch: 7 Global Step: 152620 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 
05:37:01,607-Speed 6305.65 samples/sec Loss 7.1735 LearningRate 0.0008 Epoch: 7 Global Step: 152630 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:04,840-Speed 6336.08 samples/sec Loss 7.1975 LearningRate 0.0008 Epoch: 7 Global Step: 152640 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:08,085-Speed 6313.05 samples/sec Loss 7.1799 LearningRate 0.0008 Epoch: 7 Global Step: 152650 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:11,339-Speed 6295.71 samples/sec Loss 7.2391 LearningRate 0.0008 Epoch: 7 Global Step: 152660 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:14,589-Speed 6303.63 samples/sec Loss 7.1470 LearningRate 0.0008 Epoch: 7 Global Step: 152670 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:17,840-Speed 6299.35 samples/sec Loss 7.2109 LearningRate 0.0008 Epoch: 7 Global Step: 152680 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:21,088-Speed 6307.18 samples/sec Loss 7.2321 LearningRate 0.0008 Epoch: 7 Global Step: 152690 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:24,341-Speed 6298.14 samples/sec Loss 7.1717 LearningRate 0.0008 Epoch: 7 Global Step: 152700 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:27,590-Speed 6303.21 samples/sec Loss 7.1728 LearningRate 0.0008 Epoch: 7 Global Step: 152710 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:30,837-Speed 6309.51 samples/sec Loss 7.1865 LearningRate 0.0008 Epoch: 7 Global Step: 152720 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:34,085-Speed 6306.28 samples/sec Loss 7.2131 LearningRate 0.0008 Epoch: 7 Global Step: 152730 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:37,331-Speed 6311.88 samples/sec Loss 7.1252 LearningRate 0.0008 Epoch: 7 Global Step: 152740 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:37:40,579-Speed 6307.07 samples/sec Loss 
7.1262 LearningRate 0.0008 Epoch: 7 Global Step: 152750 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:37:43,815-Speed 6328.66 samples/sec Loss 7.1854 LearningRate 0.0008 Epoch: 7 Global Step: 152760 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:47,061-Speed 6311.82 samples/sec Loss 7.1721 LearningRate 0.0008 Epoch: 7 Global Step: 152770 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:50,309-Speed 6306.13 samples/sec Loss 7.1289 LearningRate 0.0008 Epoch: 7 Global Step: 152780 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:53,561-Speed 6299.34 samples/sec Loss 7.1514 LearningRate 0.0008 Epoch: 7 Global Step: 152790 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:37:56,810-Speed 6306.05 samples/sec Loss 7.1447 LearningRate 0.0008 Epoch: 7 Global Step: 152800 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:00,055-Speed 6313.15 samples/sec Loss 7.1013 LearningRate 0.0008 Epoch: 7 Global Step: 152810 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:03,301-Speed 6309.58 samples/sec Loss 7.1847 LearningRate 0.0008 Epoch: 7 Global Step: 152820 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:06,548-Speed 6308.52 samples/sec Loss 7.1215 LearningRate 0.0008 Epoch: 7 Global Step: 152830 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:09,795-Speed 6309.01 samples/sec Loss 7.1902 LearningRate 0.0008 Epoch: 7 Global Step: 152840 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:13,041-Speed 6311.06 samples/sec Loss 7.2685 LearningRate 0.0008 Epoch: 7 Global Step: 152850 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:16,285-Speed 6314.90 samples/sec Loss 7.1739 LearningRate 0.0008 Epoch: 7 Global Step: 152860 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:19,535-Speed 6303.17 samples/sec Loss 7.1009 LearningRate 0.0008 Epoch: 7 Global 
Step: 152870 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:22,783-Speed 6306.86 samples/sec Loss 7.1831 LearningRate 0.0008 Epoch: 7 Global Step: 152880 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:26,077-Speed 6217.37 samples/sec Loss 7.1831 LearningRate 0.0008 Epoch: 7 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:29,326-Speed 6306.14 samples/sec Loss 7.1618 LearningRate 0.0008 Epoch: 7 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:32,572-Speed 6309.62 samples/sec Loss 7.2270 LearningRate 0.0008 Epoch: 7 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 62 hours Training: 2022-04-01 05:38:35,802-Speed 6342.67 samples/sec Loss 7.1912 LearningRate 0.0008 Epoch: 7 Global Step: 152920 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:39,054-Speed 6298.70 samples/sec Loss 7.2373 LearningRate 0.0008 Epoch: 7 Global Step: 152930 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:42,302-Speed 6308.15 samples/sec Loss 7.2168 LearningRate 0.0008 Epoch: 7 Global Step: 152940 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:45,546-Speed 6314.16 samples/sec Loss 7.1880 LearningRate 0.0008 Epoch: 7 Global Step: 152950 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:48,793-Speed 6308.77 samples/sec Loss 7.1578 LearningRate 0.0008 Epoch: 7 Global Step: 152960 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:52,036-Speed 6316.18 samples/sec Loss 7.1772 LearningRate 0.0008 Epoch: 7 Global Step: 152970 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:55,283-Speed 6308.65 samples/sec Loss 7.1109 LearningRate 0.0008 Epoch: 7 Global Step: 152980 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:38:58,530-Speed 6308.26 samples/sec Loss 7.1325 LearningRate 0.0008 Epoch: 7 Global Step: 152990 Fp16 Grad Scale: 32768 
Required: 62 hours Training: 2022-04-01 05:39:01,788-Speed 6287.53 samples/sec Loss 7.2247 LearningRate 0.0008 Epoch: 7 Global Step: 153000 Fp16 Grad Scale: 32768 Required: 62 hours Training: 2022-04-01 05:39:05,033-Speed 6312.53 samples/sec Loss 7.1957 LearningRate 0.0008 Epoch: 7 Global Step: 153010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:08,280-Speed 6309.24 samples/sec Loss 7.1672 LearningRate 0.0008 Epoch: 7 Global Step: 153020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:39:11,512-Speed 6338.41 samples/sec Loss 7.1061 LearningRate 0.0008 Epoch: 7 Global Step: 153030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:14,757-Speed 6313.27 samples/sec Loss 7.1329 LearningRate 0.0008 Epoch: 7 Global Step: 153040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:18,002-Speed 6311.69 samples/sec Loss 7.1461 LearningRate 0.0008 Epoch: 7 Global Step: 153050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:21,246-Speed 6314.61 samples/sec Loss 7.1586 LearningRate 0.0008 Epoch: 7 Global Step: 153060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:24,499-Speed 6298.09 samples/sec Loss 7.1920 LearningRate 0.0008 Epoch: 7 Global Step: 153070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:27,745-Speed 6311.46 samples/sec Loss 7.0959 LearningRate 0.0008 Epoch: 7 Global Step: 153080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:31,015-Speed 6263.57 samples/sec Loss 7.2048 LearningRate 0.0008 Epoch: 7 Global Step: 153090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:34,261-Speed 6310.86 samples/sec Loss 7.1892 LearningRate 0.0008 Epoch: 7 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:37,508-Speed 6307.51 samples/sec Loss 7.1667 LearningRate 0.0008 Epoch: 7 Global Step: 153110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
05:39:40,758-Speed 6304.34 samples/sec Loss 7.2214 LearningRate 0.0008 Epoch: 7 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:39:44,004-Speed 6309.58 samples/sec Loss 7.1493 LearningRate 0.0008 Epoch: 7 Global Step: 153130 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:39:47,248-Speed 6315.96 samples/sec Loss 7.1074 LearningRate 0.0008 Epoch: 7 Global Step: 153140 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:39:50,492-Speed 6313.96 samples/sec Loss 7.2069 LearningRate 0.0008 Epoch: 7 Global Step: 153150 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:39:53,742-Speed 6303.33 samples/sec Loss 7.1778 LearningRate 0.0008 Epoch: 7 Global Step: 153160 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:39:56,976-Speed 6333.54 samples/sec Loss 7.2232 LearningRate 0.0008 Epoch: 7 Global Step: 153170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:00,222-Speed 6310.80 samples/sec Loss 7.2073 LearningRate 0.0008 Epoch: 7 Global Step: 153180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:03,471-Speed 6304.24 samples/sec Loss 7.2022 LearningRate 0.0008 Epoch: 7 Global Step: 153190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:06,720-Speed 6304.67 samples/sec Loss 7.2240 LearningRate 0.0008 Epoch: 7 Global Step: 153200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:09,972-Speed 6300.12 samples/sec Loss 7.2080 LearningRate 0.0008 Epoch: 7 Global Step: 153210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:13,221-Speed 6305.11 samples/sec Loss 7.2106 LearningRate 0.0008 Epoch: 7 Global Step: 153220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:16,460-Speed 6324.23 samples/sec Loss 7.1917 LearningRate 0.0008 Epoch: 7 Global Step: 153230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:19,709-Speed 6305.40 samples/sec Loss 
7.1665 LearningRate 0.0008 Epoch: 7 Global Step: 153240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:22,959-Speed 6303.82 samples/sec Loss 7.1633 LearningRate 0.0008 Epoch: 7 Global Step: 153250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:26,206-Speed 6308.21 samples/sec Loss 7.2303 LearningRate 0.0008 Epoch: 7 Global Step: 153260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:29,455-Speed 6305.45 samples/sec Loss 7.2660 LearningRate 0.0008 Epoch: 7 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:40:32,701-Speed 6309.58 samples/sec Loss 7.1803 LearningRate 0.0008 Epoch: 7 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:40:35,953-Speed 6299.80 samples/sec Loss 7.2225 LearningRate 0.0008 Epoch: 7 Global Step: 153290 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:40:39,186-Speed 6336.49 samples/sec Loss 7.2012 LearningRate 0.0008 Epoch: 7 Global Step: 153300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:42,431-Speed 6312.85 samples/sec Loss 7.2585 LearningRate 0.0008 Epoch: 7 Global Step: 153310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:45,674-Speed 6315.49 samples/sec Loss 7.2918 LearningRate 0.0008 Epoch: 7 Global Step: 153320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:48,922-Speed 6307.96 samples/sec Loss 7.2040 LearningRate 0.0008 Epoch: 7 Global Step: 153330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:52,171-Speed 6304.68 samples/sec Loss 7.2260 LearningRate 0.0008 Epoch: 7 Global Step: 153340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:55,422-Speed 6299.92 samples/sec Loss 7.2676 LearningRate 0.0008 Epoch: 7 Global Step: 153350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:40:58,668-Speed 6311.27 samples/sec Loss 7.1669 LearningRate 0.0008 Epoch: 7 Global 
Step: 153360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:01,915-Speed 6308.61 samples/sec Loss 7.1265 LearningRate 0.0008 Epoch: 7 Global Step: 153370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:05,170-Speed 6293.76 samples/sec Loss 7.2158 LearningRate 0.0008 Epoch: 7 Global Step: 153380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:08,417-Speed 6308.64 samples/sec Loss 7.1596 LearningRate 0.0008 Epoch: 7 Global Step: 153390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:11,660-Speed 6316.80 samples/sec Loss 7.0924 LearningRate 0.0008 Epoch: 7 Global Step: 153400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:41:14,908-Speed 6306.58 samples/sec Loss 7.1923 LearningRate 0.0008 Epoch: 7 Global Step: 153410 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:41:18,154-Speed 6310.30 samples/sec Loss 7.1346 LearningRate 0.0008 Epoch: 7 Global Step: 153420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:41:21,390-Speed 6330.95 samples/sec Loss 7.2245 LearningRate 0.0008 Epoch: 7 Global Step: 153430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:24,642-Speed 6298.85 samples/sec Loss 7.1182 LearningRate 0.0008 Epoch: 7 Global Step: 153440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:27,890-Speed 6308.33 samples/sec Loss 7.1508 LearningRate 0.0008 Epoch: 7 Global Step: 153450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:31,145-Speed 6293.48 samples/sec Loss 7.2003 LearningRate 0.0008 Epoch: 7 Global Step: 153460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:34,390-Speed 6312.73 samples/sec Loss 7.1204 LearningRate 0.0008 Epoch: 7 Global Step: 153470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:37,634-Speed 6314.17 samples/sec Loss 7.2223 LearningRate 0.0008 Epoch: 7 Global Step: 153480 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:41:40,881-Speed 6309.10 samples/sec Loss 7.1660 LearningRate 0.0008 Epoch: 7 Global Step: 153490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:44,131-Speed 6303.38 samples/sec Loss 7.0980 LearningRate 0.0008 Epoch: 7 Global Step: 153500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:47,378-Speed 6308.55 samples/sec Loss 7.1268 LearningRate 0.0008 Epoch: 7 Global Step: 153510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:50,623-Speed 6312.96 samples/sec Loss 7.1736 LearningRate 0.0008 Epoch: 7 Global Step: 153520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:41:53,872-Speed 6304.43 samples/sec Loss 7.2029 LearningRate 0.0008 Epoch: 7 Global Step: 153530 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:41:57,119-Speed 6308.16 samples/sec Loss 7.1776 LearningRate 0.0008 Epoch: 7 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:00,366-Speed 6309.65 samples/sec Loss 7.2243 LearningRate 0.0008 Epoch: 7 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:03,612-Speed 6311.24 samples/sec Loss 7.2138 LearningRate 0.0008 Epoch: 7 Global Step: 153560 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:06,839-Speed 6346.24 samples/sec Loss 7.1914 LearningRate 0.0008 Epoch: 7 Global Step: 153570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:10,091-Speed 6300.09 samples/sec Loss 7.1545 LearningRate 0.0008 Epoch: 7 Global Step: 153580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:13,334-Speed 6315.88 samples/sec Loss 7.1038 LearningRate 0.0008 Epoch: 7 Global Step: 153590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:16,582-Speed 6307.61 samples/sec Loss 7.0652 LearningRate 0.0008 Epoch: 7 Global Step: 153600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
05:42:19,826-Speed 6313.23 samples/sec Loss 7.0921 LearningRate 0.0008 Epoch: 7 Global Step: 153610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:23,069-Speed 6317.29 samples/sec Loss 7.1232 LearningRate 0.0008 Epoch: 7 Global Step: 153620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:26,315-Speed 6310.34 samples/sec Loss 7.2085 LearningRate 0.0008 Epoch: 7 Global Step: 153630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:29,560-Speed 6313.13 samples/sec Loss 7.2108 LearningRate 0.0008 Epoch: 7 Global Step: 153640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:32,804-Speed 6315.56 samples/sec Loss 7.1611 LearningRate 0.0008 Epoch: 7 Global Step: 153650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:36,051-Speed 6307.87 samples/sec Loss 7.1762 LearningRate 0.0008 Epoch: 7 Global Step: 153660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:42:39,296-Speed 6313.20 samples/sec Loss 7.1235 LearningRate 0.0008 Epoch: 7 Global Step: 153670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:42,547-Speed 6300.40 samples/sec Loss 7.1477 LearningRate 0.0008 Epoch: 7 Global Step: 153680 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:45,793-Speed 6312.02 samples/sec Loss 7.1408 LearningRate 0.0008 Epoch: 7 Global Step: 153690 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:49,038-Speed 6312.08 samples/sec Loss 7.1257 LearningRate 0.0008 Epoch: 7 Global Step: 153700 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:52,283-Speed 6313.68 samples/sec Loss 7.2566 LearningRate 0.0008 Epoch: 7 Global Step: 153710 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:55,529-Speed 6309.12 samples/sec Loss 7.1238 LearningRate 0.0008 Epoch: 7 Global Step: 153720 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:42:58,778-Speed 6305.82 samples/sec Loss 
7.1898 LearningRate 0.0008 Epoch: 7 Global Step: 153730 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:02,025-Speed 6309.37 samples/sec Loss 7.1820 LearningRate 0.0008 Epoch: 7 Global Step: 153740 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:05,273-Speed 6306.18 samples/sec Loss 7.2120 LearningRate 0.0008 Epoch: 7 Global Step: 153750 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:08,520-Speed 6308.92 samples/sec Loss 7.1967 LearningRate 0.0008 Epoch: 7 Global Step: 153760 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:11,753-Speed 6335.24 samples/sec Loss 7.2110 LearningRate 0.0008 Epoch: 7 Global Step: 153770 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:15,000-Speed 6309.35 samples/sec Loss 7.1013 LearningRate 0.0008 Epoch: 7 Global Step: 153780 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:18,248-Speed 6307.04 samples/sec Loss 7.1686 LearningRate 0.0008 Epoch: 7 Global Step: 153790 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:21,481-Speed 6335.26 samples/sec Loss 7.1949 LearningRate 0.0008 Epoch: 7 Global Step: 153800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:24,729-Speed 6307.59 samples/sec Loss 7.2007 LearningRate 0.0008 Epoch: 7 Global Step: 153810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:27,980-Speed 6300.74 samples/sec Loss 7.1751 LearningRate 0.0008 Epoch: 7 Global Step: 153820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:31,225-Speed 6312.96 samples/sec Loss 7.0783 LearningRate 0.0008 Epoch: 7 Global Step: 153830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:34,473-Speed 6306.98 samples/sec Loss 7.1071 LearningRate 0.0008 Epoch: 7 Global Step: 153840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:37,720-Speed 6307.34 samples/sec Loss 7.0777 LearningRate 0.0008 Epoch: 7 Global 
Step: 153850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:40,967-Speed 6308.99 samples/sec Loss 7.2207 LearningRate 0.0008 Epoch: 7 Global Step: 153860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:44,216-Speed 6307.32 samples/sec Loss 7.1858 LearningRate 0.0008 Epoch: 7 Global Step: 153870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:47,467-Speed 6300.16 samples/sec Loss 7.1574 LearningRate 0.0008 Epoch: 7 Global Step: 153880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:50,722-Speed 6293.44 samples/sec Loss 7.2123 LearningRate 0.0008 Epoch: 7 Global Step: 153890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:43:53,966-Speed 6313.21 samples/sec Loss 7.1893 LearningRate 0.0008 Epoch: 7 Global Step: 153900 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:43:57,219-Speed 6297.71 samples/sec Loss 7.2080 LearningRate 0.0008 Epoch: 7 Global Step: 153910 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:00,469-Speed 6303.87 samples/sec Loss 7.0784 LearningRate 0.0008 Epoch: 7 Global Step: 153920 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:03,700-Speed 6338.70 samples/sec Loss 7.1817 LearningRate 0.0008 Epoch: 7 Global Step: 153930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:06,948-Speed 6308.19 samples/sec Loss 7.2107 LearningRate 0.0008 Epoch: 7 Global Step: 153940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:10,194-Speed 6309.49 samples/sec Loss 7.2140 LearningRate 0.0008 Epoch: 7 Global Step: 153950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:13,441-Speed 6310.22 samples/sec Loss 7.1568 LearningRate 0.0008 Epoch: 7 Global Step: 153960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:16,685-Speed 6313.78 samples/sec Loss 7.1964 LearningRate 0.0008 Epoch: 7 Global Step: 153970 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:44:19,932-Speed 6309.41 samples/sec Loss 7.1738 LearningRate 0.0008 Epoch: 7 Global Step: 153980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:23,180-Speed 6305.26 samples/sec Loss 7.2003 LearningRate 0.0008 Epoch: 7 Global Step: 153990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:26,427-Speed 6310.41 samples/sec Loss 7.1985 LearningRate 0.0008 Epoch: 7 Global Step: 154000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:29,676-Speed 6304.40 samples/sec Loss 7.1233 LearningRate 0.0008 Epoch: 7 Global Step: 154010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:32,925-Speed 6304.81 samples/sec Loss 7.2057 LearningRate 0.0008 Epoch: 7 Global Step: 154020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:44:36,169-Speed 6313.42 samples/sec Loss 7.1818 LearningRate 0.0008 Epoch: 7 Global Step: 154030 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:39,414-Speed 6313.82 samples/sec Loss 7.1507 LearningRate 0.0008 Epoch: 7 Global Step: 154040 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:42,659-Speed 6312.29 samples/sec Loss 7.0667 LearningRate 0.0008 Epoch: 7 Global Step: 154050 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:45,911-Speed 6299.17 samples/sec Loss 7.1779 LearningRate 0.0008 Epoch: 7 Global Step: 154060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:49,161-Speed 6302.78 samples/sec Loss 7.1939 LearningRate 0.0008 Epoch: 7 Global Step: 154070 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:52,408-Speed 6308.15 samples/sec Loss 7.1597 LearningRate 0.0008 Epoch: 7 Global Step: 154080 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:44:55,662-Speed 6295.43 samples/sec Loss 7.1311 LearningRate 0.0008 Epoch: 7 Global Step: 154090 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
05:44:58,905-Speed 6317.02 samples/sec Loss 7.1892 LearningRate 0.0008 Epoch: 7 Global Step: 154100 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:02,163-Speed 6288.02 samples/sec Loss 7.1884 LearningRate 0.0008 Epoch: 7 Global Step: 154110 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:05,515-Speed 6112.31 samples/sec Loss 7.1868 LearningRate 0.0008 Epoch: 7 Global Step: 154120 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:08,734-Speed 6363.00 samples/sec Loss 7.2774 LearningRate 0.0008 Epoch: 7 Global Step: 154130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:11,979-Speed 6313.22 samples/sec Loss 7.2183 LearningRate 0.0008 Epoch: 7 Global Step: 154140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:15,229-Speed 6302.06 samples/sec Loss 7.1715 LearningRate 0.0008 Epoch: 7 Global Step: 154150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:18,479-Speed 6303.67 samples/sec Loss 7.1219 LearningRate 0.0008 Epoch: 7 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:21,750-Speed 6262.08 samples/sec Loss 7.0854 LearningRate 0.0008 Epoch: 7 Global Step: 154170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:25,040-Speed 6226.65 samples/sec Loss 7.1105 LearningRate 0.0008 Epoch: 7 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:28,285-Speed 6312.96 samples/sec Loss 7.1096 LearningRate 0.0008 Epoch: 7 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:31,527-Speed 6318.05 samples/sec Loss 7.2141 LearningRate 0.0008 Epoch: 7 Global Step: 154200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:34,779-Speed 6299.42 samples/sec Loss 7.1127 LearningRate 0.0008 Epoch: 7 Global Step: 154210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:38,024-Speed 6312.84 samples/sec Loss 
7.1613 LearningRate 0.0008 Epoch: 7 Global Step: 154220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:45:41,276-Speed 6297.72 samples/sec Loss 7.1602 LearningRate 0.0008 Epoch: 7 Global Step: 154230 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:44,524-Speed 6308.39 samples/sec Loss 7.0825 LearningRate 0.0008 Epoch: 7 Global Step: 154240 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:47,767-Speed 6315.56 samples/sec Loss 7.1861 LearningRate 0.0008 Epoch: 7 Global Step: 154250 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:51,017-Speed 6302.94 samples/sec Loss 7.1149 LearningRate 0.0008 Epoch: 7 Global Step: 154260 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:54,262-Speed 6312.35 samples/sec Loss 7.2178 LearningRate 0.0008 Epoch: 7 Global Step: 154270 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:45:57,510-Speed 6307.78 samples/sec Loss 7.1036 LearningRate 0.0008 Epoch: 7 Global Step: 154280 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:46:00,756-Speed 6310.82 samples/sec Loss 7.1501 LearningRate 0.0008 Epoch: 7 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:46:03,987-Speed 6339.67 samples/sec Loss 7.1435 LearningRate 0.0008 Epoch: 7 Global Step: 154300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:07,231-Speed 6314.65 samples/sec Loss 7.1995 LearningRate 0.0008 Epoch: 7 Global Step: 154310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:10,478-Speed 6310.26 samples/sec Loss 7.1129 LearningRate 0.0008 Epoch: 7 Global Step: 154320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:13,721-Speed 6315.31 samples/sec Loss 7.1971 LearningRate 0.0008 Epoch: 7 Global Step: 154330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:16,968-Speed 6310.21 samples/sec Loss 7.1148 LearningRate 0.0008 Epoch: 7 Global 
Step: 154340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:20,240-Speed 6260.01 samples/sec Loss 7.1099 LearningRate 0.0008 Epoch: 7 Global Step: 154350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:23,488-Speed 6307.06 samples/sec Loss 7.1188 LearningRate 0.0008 Epoch: 7 Global Step: 154360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:26,735-Speed 6308.26 samples/sec Loss 7.1517 LearningRate 0.0008 Epoch: 7 Global Step: 154370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:29,982-Speed 6308.32 samples/sec Loss 7.1870 LearningRate 0.0008 Epoch: 7 Global Step: 154380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:33,230-Speed 6308.06 samples/sec Loss 7.1606 LearningRate 0.0008 Epoch: 7 Global Step: 154390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:36,479-Speed 6303.36 samples/sec Loss 7.1769 LearningRate 0.0008 Epoch: 7 Global Step: 154400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:46:39,726-Speed 6310.30 samples/sec Loss 7.1437 LearningRate 0.0008 Epoch: 7 Global Step: 154410 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:46:42,976-Speed 6302.10 samples/sec Loss 7.0978 LearningRate 0.0008 Epoch: 7 Global Step: 154420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:46:46,220-Speed 6313.93 samples/sec Loss 7.1799 LearningRate 0.0008 Epoch: 7 Global Step: 154430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:49,470-Speed 6304.30 samples/sec Loss 7.1640 LearningRate 0.0008 Epoch: 7 Global Step: 154440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:52,718-Speed 6306.67 samples/sec Loss 7.2292 LearningRate 0.0008 Epoch: 7 Global Step: 154450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:46:55,962-Speed 6314.00 samples/sec Loss 7.1099 LearningRate 0.0008 Epoch: 7 Global Step: 154460 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:46:59,215-Speed 6296.52 samples/sec Loss 7.1750 LearningRate 0.0008 Epoch: 7 Global Step: 154470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:02,465-Speed 6303.32 samples/sec Loss 7.1666 LearningRate 0.0008 Epoch: 7 Global Step: 154480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:05,713-Speed 6306.51 samples/sec Loss 7.2065 LearningRate 0.0008 Epoch: 7 Global Step: 154490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:08,963-Speed 6304.68 samples/sec Loss 7.1577 LearningRate 0.0008 Epoch: 7 Global Step: 154500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:12,215-Speed 6299.40 samples/sec Loss 7.1599 LearningRate 0.0008 Epoch: 7 Global Step: 154510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:15,460-Speed 6312.11 samples/sec Loss 7.2323 LearningRate 0.0008 Epoch: 7 Global Step: 154520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:18,690-Speed 6341.25 samples/sec Loss 7.1915 LearningRate 0.0008 Epoch: 7 Global Step: 154530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:21,938-Speed 6307.50 samples/sec Loss 7.2057 LearningRate 0.0008 Epoch: 7 Global Step: 154540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:25,185-Speed 6308.96 samples/sec Loss 7.1357 LearningRate 0.0008 Epoch: 7 Global Step: 154550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:28,442-Speed 6289.77 samples/sec Loss 7.2175 LearningRate 0.0008 Epoch: 7 Global Step: 154560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:31,691-Speed 6305.21 samples/sec Loss 7.1432 LearningRate 0.0008 Epoch: 7 Global Step: 154570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:34,939-Speed 6307.84 samples/sec Loss 7.2447 LearningRate 0.0008 Epoch: 7 Global Step: 154580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
05:47:38,189-Speed 6302.96 samples/sec Loss 7.1314 LearningRate 0.0008 Epoch: 7 Global Step: 154590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:41,440-Speed 6300.51 samples/sec Loss 7.1093 LearningRate 0.0008 Epoch: 7 Global Step: 154600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:44,691-Speed 6300.78 samples/sec Loss 7.1324 LearningRate 0.0008 Epoch: 7 Global Step: 154610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:47,937-Speed 6310.83 samples/sec Loss 7.2064 LearningRate 0.0008 Epoch: 7 Global Step: 154620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:47:51,183-Speed 6310.49 samples/sec Loss 7.0965 LearningRate 0.0008 Epoch: 7 Global Step: 154630 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:47:54,429-Speed 6309.60 samples/sec Loss 7.1302 LearningRate 0.0008 Epoch: 7 Global Step: 154640 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:47:57,676-Speed 6309.98 samples/sec Loss 7.1121 LearningRate 0.0008 Epoch: 7 Global Step: 154650 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:48:00,925-Speed 6304.81 samples/sec Loss 7.1551 LearningRate 0.0008 Epoch: 7 Global Step: 154660 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:48:04,181-Speed 6290.80 samples/sec Loss 7.1774 LearningRate 0.0008 Epoch: 7 Global Step: 154670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:48:07,415-Speed 6334.35 samples/sec Loss 7.0966 LearningRate 0.0008 Epoch: 7 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:10,659-Speed 6315.45 samples/sec Loss 7.0768 LearningRate 0.0008 Epoch: 7 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:13,904-Speed 6312.32 samples/sec Loss 7.1338 LearningRate 0.0008 Epoch: 7 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:17,152-Speed 6308.01 samples/sec Loss 
7.2698 LearningRate 0.0008 Epoch: 7 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:20,395-Speed 6316.70 samples/sec Loss 7.1527 LearningRate 0.0008 Epoch: 7 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:23,639-Speed 6313.30 samples/sec Loss 7.1500 LearningRate 0.0008 Epoch: 7 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:26,886-Speed 6308.56 samples/sec Loss 7.1559 LearningRate 0.0008 Epoch: 7 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:30,139-Speed 6298.13 samples/sec Loss 7.0837 LearningRate 0.0008 Epoch: 7 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:33,386-Speed 6307.78 samples/sec Loss 7.1740 LearningRate 0.0008 Epoch: 7 Global Step: 154760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:36,634-Speed 6306.94 samples/sec Loss 7.1430 LearningRate 0.0008 Epoch: 7 Global Step: 154770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:39,872-Speed 6326.17 samples/sec Loss 7.0661 LearningRate 0.0008 Epoch: 7 Global Step: 154780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:43,118-Speed 6312.06 samples/sec Loss 7.1265 LearningRate 0.0008 Epoch: 7 Global Step: 154790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:46,362-Speed 6314.11 samples/sec Loss 7.1869 LearningRate 0.0008 Epoch: 7 Global Step: 154800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:49,613-Speed 6301.60 samples/sec Loss 7.0982 LearningRate 0.0008 Epoch: 7 Global Step: 154810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:52,866-Speed 6295.91 samples/sec Loss 7.1592 LearningRate 0.0008 Epoch: 7 Global Step: 154820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:56,119-Speed 6298.37 samples/sec Loss 7.1800 LearningRate 0.0008 Epoch: 7 Global 
Step: 154830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:48:59,363-Speed 6313.78 samples/sec Loss 7.1711 LearningRate 0.0008 Epoch: 7 Global Step: 154840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:02,609-Speed 6311.46 samples/sec Loss 7.1664 LearningRate 0.0008 Epoch: 7 Global Step: 154850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:05,862-Speed 6296.79 samples/sec Loss 7.1194 LearningRate 0.0008 Epoch: 7 Global Step: 154860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:09,110-Speed 6305.69 samples/sec Loss 7.1633 LearningRate 0.0008 Epoch: 7 Global Step: 154870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:12,359-Speed 6306.10 samples/sec Loss 7.1016 LearningRate 0.0008 Epoch: 7 Global Step: 154880 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:49:15,607-Speed 6307.53 samples/sec Loss 7.2198 LearningRate 0.0008 Epoch: 7 Global Step: 154890 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:49:18,854-Speed 6307.86 samples/sec Loss 7.2280 LearningRate 0.0008 Epoch: 7 Global Step: 154900 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:49:22,090-Speed 6330.15 samples/sec Loss 7.1956 LearningRate 0.0008 Epoch: 7 Global Step: 154910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:25,334-Speed 6314.49 samples/sec Loss 7.1344 LearningRate 0.0008 Epoch: 7 Global Step: 154920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:28,576-Speed 6319.24 samples/sec Loss 7.2299 LearningRate 0.0008 Epoch: 7 Global Step: 154930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:31,825-Speed 6305.10 samples/sec Loss 7.0766 LearningRate 0.0008 Epoch: 7 Global Step: 154940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:35,072-Speed 6309.61 samples/sec Loss 7.0286 LearningRate 0.0008 Epoch: 7 Global Step: 154950 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:49:38,318-Speed 6309.86 samples/sec Loss 7.1466 LearningRate 0.0008 Epoch: 7 Global Step: 154960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:41,562-Speed 6314.94 samples/sec Loss 7.2108 LearningRate 0.0008 Epoch: 7 Global Step: 154970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:44,813-Speed 6299.83 samples/sec Loss 7.2036 LearningRate 0.0008 Epoch: 7 Global Step: 154980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:48,057-Speed 6316.21 samples/sec Loss 7.1599 LearningRate 0.0008 Epoch: 7 Global Step: 154990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:51,301-Speed 6313.28 samples/sec Loss 7.1214 LearningRate 0.0008 Epoch: 7 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:49:54,547-Speed 6310.99 samples/sec Loss 7.1304 LearningRate 0.0008 Epoch: 7 Global Step: 155010 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:49:57,797-Speed 6302.50 samples/sec Loss 7.1458 LearningRate 0.0008 Epoch: 7 Global Step: 155020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:01,051-Speed 6295.26 samples/sec Loss 7.1667 LearningRate 0.0008 Epoch: 7 Global Step: 155030 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:04,297-Speed 6310.34 samples/sec Loss 7.1860 LearningRate 0.0008 Epoch: 7 Global Step: 155040 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:07,550-Speed 6298.19 samples/sec Loss 7.1839 LearningRate 0.0008 Epoch: 7 Global Step: 155050 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:10,796-Speed 6310.83 samples/sec Loss 7.1351 LearningRate 0.0008 Epoch: 7 Global Step: 155060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:14,030-Speed 6334.58 samples/sec Loss 7.1924 LearningRate 0.0008 Epoch: 7 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
05:50:17,276-Speed 6310.23 samples/sec Loss 7.1106 LearningRate 0.0008 Epoch: 7 Global Step: 155080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:20,526-Speed 6302.96 samples/sec Loss 7.1455 LearningRate 0.0008 Epoch: 7 Global Step: 155090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:23,772-Speed 6310.50 samples/sec Loss 7.1151 LearningRate 0.0008 Epoch: 7 Global Step: 155100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:27,017-Speed 6313.73 samples/sec Loss 7.1352 LearningRate 0.0008 Epoch: 7 Global Step: 155110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:30,261-Speed 6314.52 samples/sec Loss 7.1977 LearningRate 0.0008 Epoch: 7 Global Step: 155120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:33,523-Speed 6279.38 samples/sec Loss 7.2104 LearningRate 0.0008 Epoch: 7 Global Step: 155130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:36,770-Speed 6309.86 samples/sec Loss 7.2343 LearningRate 0.0008 Epoch: 7 Global Step: 155140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:40,016-Speed 6309.59 samples/sec Loss 7.1325 LearningRate 0.0008 Epoch: 7 Global Step: 155150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:43,261-Speed 6313.07 samples/sec Loss 7.1488 LearningRate 0.0008 Epoch: 7 Global Step: 155160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:50:46,514-Speed 6296.06 samples/sec Loss 7.1804 LearningRate 0.0008 Epoch: 7 Global Step: 155170 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:49,763-Speed 6306.49 samples/sec Loss 7.1188 LearningRate 0.0008 Epoch: 7 Global Step: 155180 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:53,008-Speed 6311.02 samples/sec Loss 7.2001 LearningRate 0.0008 Epoch: 7 Global Step: 155190 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:56,260-Speed 6300.72 samples/sec Loss 
7.1220 LearningRate 0.0008 Epoch: 7 Global Step: 155200 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:50:59,516-Speed 6290.92 samples/sec Loss 7.1265 LearningRate 0.0008 Epoch: 7 Global Step: 155210 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:02,764-Speed 6305.79 samples/sec Loss 7.1512 LearningRate 0.0008 Epoch: 7 Global Step: 155220 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:06,012-Speed 6307.31 samples/sec Loss 7.1595 LearningRate 0.0008 Epoch: 7 Global Step: 155230 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:09,261-Speed 6304.26 samples/sec Loss 7.1142 LearningRate 0.0008 Epoch: 7 Global Step: 155240 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:12,496-Speed 6332.01 samples/sec Loss 7.1958 LearningRate 0.0008 Epoch: 7 Global Step: 155250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:15,746-Speed 6303.41 samples/sec Loss 7.1378 LearningRate 0.0008 Epoch: 7 Global Step: 155260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:18,993-Speed 6307.96 samples/sec Loss 7.1740 LearningRate 0.0008 Epoch: 7 Global Step: 155270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:22,242-Speed 6306.40 samples/sec Loss 7.1290 LearningRate 0.0008 Epoch: 7 Global Step: 155280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:25,489-Speed 6309.05 samples/sec Loss 7.1416 LearningRate 0.0008 Epoch: 7 Global Step: 155290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:28,737-Speed 6305.82 samples/sec Loss 7.0941 LearningRate 0.0008 Epoch: 7 Global Step: 155300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:31,991-Speed 6295.87 samples/sec Loss 7.1663 LearningRate 0.0008 Epoch: 7 Global Step: 155310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:35,238-Speed 6308.53 samples/sec Loss 7.1322 LearningRate 0.0008 Epoch: 7 Global 
Step: 155320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:38,489-Speed 6301.28 samples/sec Loss 7.0939 LearningRate 0.0008 Epoch: 7 Global Step: 155330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:41,736-Speed 6309.32 samples/sec Loss 7.1319 LearningRate 0.0008 Epoch: 7 Global Step: 155340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:51:44,984-Speed 6307.18 samples/sec Loss 7.0971 LearningRate 0.0008 Epoch: 7 Global Step: 155350 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:48,236-Speed 6298.75 samples/sec Loss 7.0921 LearningRate 0.0008 Epoch: 7 Global Step: 155360 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:51,490-Speed 6295.84 samples/sec Loss 7.1566 LearningRate 0.0008 Epoch: 7 Global Step: 155370 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:54,739-Speed 6304.72 samples/sec Loss 7.1721 LearningRate 0.0008 Epoch: 7 Global Step: 155380 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:51:57,987-Speed 6306.38 samples/sec Loss 7.1947 LearningRate 0.0008 Epoch: 7 Global Step: 155390 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:01,267-Speed 6244.86 samples/sec Loss 7.1508 LearningRate 0.0008 Epoch: 7 Global Step: 155400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:04,505-Speed 6325.98 samples/sec Loss 7.1656 LearningRate 0.0008 Epoch: 7 Global Step: 155410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:07,757-Speed 6300.06 samples/sec Loss 7.1104 LearningRate 0.0008 Epoch: 7 Global Step: 155420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:11,007-Speed 6302.66 samples/sec Loss 7.2394 LearningRate 0.0008 Epoch: 7 Global Step: 155430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:14,254-Speed 6308.35 samples/sec Loss 7.1973 LearningRate 0.0008 Epoch: 7 Global Step: 155440 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:52:17,557-Speed 6202.32 samples/sec Loss 7.1220 LearningRate 0.0008 Epoch: 7 Global Step: 155450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:20,802-Speed 6312.65 samples/sec Loss 7.1256 LearningRate 0.0008 Epoch: 7 Global Step: 155460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:24,062-Speed 6282.74 samples/sec Loss 7.1042 LearningRate 0.0008 Epoch: 7 Global Step: 155470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:27,310-Speed 6307.62 samples/sec Loss 7.1715 LearningRate 0.0008 Epoch: 7 Global Step: 155480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:30,557-Speed 6308.08 samples/sec Loss 7.0673 LearningRate 0.0008 Epoch: 7 Global Step: 155490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:33,818-Speed 6282.36 samples/sec Loss 7.0941 LearningRate 0.0008 Epoch: 7 Global Step: 155500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:37,080-Speed 6279.73 samples/sec Loss 7.1332 LearningRate 0.0008 Epoch: 7 Global Step: 155510 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:40,322-Speed 6319.11 samples/sec Loss 7.1626 LearningRate 0.0008 Epoch: 7 Global Step: 155520 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:43,569-Speed 6308.55 samples/sec Loss 7.0629 LearningRate 0.0008 Epoch: 7 Global Step: 155530 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:46,825-Speed 6291.98 samples/sec Loss 7.1131 LearningRate 0.0008 Epoch: 7 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:50,071-Speed 6311.26 samples/sec Loss 7.0717 LearningRate 0.0008 Epoch: 7 Global Step: 155550 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:52:53,318-Speed 6308.25 samples/sec Loss 7.0656 LearningRate 0.0008 Epoch: 7 Global Step: 155560 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
05:52:56,550-Speed 6337.12 samples/sec Loss 7.1045 LearningRate 0.0008 Epoch: 7 Global Step: 155570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:52:59,792-Speed 6318.21 samples/sec Loss 7.1858 LearningRate 0.0008 Epoch: 7 Global Step: 155580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:03,038-Speed 6310.96 samples/sec Loss 7.2187 LearningRate 0.0008 Epoch: 7 Global Step: 155590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:06,294-Speed 6291.14 samples/sec Loss 7.1054 LearningRate 0.0008 Epoch: 7 Global Step: 155600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:09,544-Speed 6303.23 samples/sec Loss 7.1164 LearningRate 0.0008 Epoch: 7 Global Step: 155610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:12,786-Speed 6317.65 samples/sec Loss 7.1713 LearningRate 0.0008 Epoch: 7 Global Step: 155620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:16,039-Speed 6297.42 samples/sec Loss 7.0718 LearningRate 0.0008 Epoch: 7 Global Step: 155630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:19,289-Speed 6304.16 samples/sec Loss 7.0871 LearningRate 0.0008 Epoch: 7 Global Step: 155640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:22,535-Speed 6310.55 samples/sec Loss 7.1519 LearningRate 0.0008 Epoch: 7 Global Step: 155650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:25,781-Speed 6310.13 samples/sec Loss 7.0479 LearningRate 0.0008 Epoch: 7 Global Step: 155660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:29,027-Speed 6310.31 samples/sec Loss 7.2027 LearningRate 0.0008 Epoch: 7 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:53:32,256-Speed 6344.45 samples/sec Loss 7.1791 LearningRate 0.0008 Epoch: 7 Global Step: 155680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:35,500-Speed 6314.60 samples/sec Loss 
7.1595 LearningRate 0.0008 Epoch: 7 Global Step: 155690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:38,747-Speed 6309.31 samples/sec Loss 7.0893 LearningRate 0.0008 Epoch: 7 Global Step: 155700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:41,994-Speed 6308.74 samples/sec Loss 7.1079 LearningRate 0.0008 Epoch: 7 Global Step: 155710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:45,241-Speed 6308.22 samples/sec Loss 7.1215 LearningRate 0.0008 Epoch: 7 Global Step: 155720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:48,494-Speed 6298.84 samples/sec Loss 7.0419 LearningRate 0.0008 Epoch: 7 Global Step: 155730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:51,739-Speed 6312.99 samples/sec Loss 7.0772 LearningRate 0.0008 Epoch: 7 Global Step: 155740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:54,984-Speed 6311.31 samples/sec Loss 7.1584 LearningRate 0.0008 Epoch: 7 Global Step: 155750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:53:58,230-Speed 6309.97 samples/sec Loss 7.2456 LearningRate 0.0008 Epoch: 7 Global Step: 155760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:01,482-Speed 6298.99 samples/sec Loss 7.0553 LearningRate 0.0008 Epoch: 7 Global Step: 155770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:04,715-Speed 6337.32 samples/sec Loss 7.2102 LearningRate 0.0008 Epoch: 7 Global Step: 155780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:07,963-Speed 6305.78 samples/sec Loss 7.0792 LearningRate 0.0008 Epoch: 7 Global Step: 155790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:11,207-Speed 6316.07 samples/sec Loss 7.1564 LearningRate 0.0008 Epoch: 7 Global Step: 155800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:14,452-Speed 6312.29 samples/sec Loss 7.2288 LearningRate 0.0008 Epoch: 7 Global 
Step: 155810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:17,697-Speed 6312.31 samples/sec Loss 7.1109 LearningRate 0.0008 Epoch: 7 Global Step: 155820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:20,940-Speed 6316.45 samples/sec Loss 7.1204 LearningRate 0.0008 Epoch: 7 Global Step: 155830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:24,186-Speed 6310.93 samples/sec Loss 7.1745 LearningRate 0.0008 Epoch: 7 Global Step: 155840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:27,434-Speed 6307.03 samples/sec Loss 7.1179 LearningRate 0.0008 Epoch: 7 Global Step: 155850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:30,681-Speed 6309.09 samples/sec Loss 7.1845 LearningRate 0.0008 Epoch: 7 Global Step: 155860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:33,927-Speed 6310.45 samples/sec Loss 7.2225 LearningRate 0.0008 Epoch: 7 Global Step: 155870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:37,173-Speed 6309.58 samples/sec Loss 7.1748 LearningRate 0.0008 Epoch: 7 Global Step: 155880 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:54:40,408-Speed 6333.07 samples/sec Loss 7.1298 LearningRate 0.0008 Epoch: 7 Global Step: 155890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:43,654-Speed 6310.26 samples/sec Loss 7.1110 LearningRate 0.0008 Epoch: 7 Global Step: 155900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:46,910-Speed 6292.03 samples/sec Loss 7.1035 LearningRate 0.0008 Epoch: 7 Global Step: 155910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:50,156-Speed 6312.42 samples/sec Loss 7.1843 LearningRate 0.0008 Epoch: 7 Global Step: 155920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:53,401-Speed 6311.23 samples/sec Loss 7.1881 LearningRate 0.0008 Epoch: 7 Global Step: 155930 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:54:56,650-Speed 6304.61 samples/sec Loss 7.1063 LearningRate 0.0008 Epoch: 7 Global Step: 155940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:54:59,894-Speed 6314.35 samples/sec Loss 7.0337 LearningRate 0.0008 Epoch: 7 Global Step: 155950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:03,142-Speed 6308.24 samples/sec Loss 7.1416 LearningRate 0.0008 Epoch: 7 Global Step: 155960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:06,385-Speed 6315.11 samples/sec Loss 7.0737 LearningRate 0.0008 Epoch: 7 Global Step: 155970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:09,644-Speed 6285.50 samples/sec Loss 7.1413 LearningRate 0.0008 Epoch: 7 Global Step: 155980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:12,896-Speed 6300.10 samples/sec Loss 7.1283 LearningRate 0.0008 Epoch: 7 Global Step: 155990 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:16,143-Speed 6308.78 samples/sec Loss 7.1393 LearningRate 0.0008 Epoch: 7 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:19,391-Speed 6305.86 samples/sec Loss 7.1664 LearningRate 0.0008 Epoch: 7 Global Step: 156010 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:22,638-Speed 6309.41 samples/sec Loss 7.1399 LearningRate 0.0008 Epoch: 7 Global Step: 156020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:25,893-Speed 6293.88 samples/sec Loss 7.1822 LearningRate 0.0008 Epoch: 7 Global Step: 156030 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:29,139-Speed 6310.81 samples/sec Loss 7.1472 LearningRate 0.0008 Epoch: 7 Global Step: 156040 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:32,385-Speed 6309.82 samples/sec Loss 7.1067 LearningRate 0.0008 Epoch: 7 Global Step: 156050 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
05:55:35,634-Speed 6304.77 samples/sec Loss 7.1600 LearningRate 0.0008 Epoch: 7 Global Step: 156060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:38,885-Speed 6300.74 samples/sec Loss 7.1411 LearningRate 0.0008 Epoch: 7 Global Step: 156070 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:42,133-Speed 6306.72 samples/sec Loss 7.1838 LearningRate 0.0008 Epoch: 7 Global Step: 156080 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:55:45,354-Speed 6361.20 samples/sec Loss 7.2056 LearningRate 0.0008 Epoch: 7 Global Step: 156090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:48,603-Speed 6305.29 samples/sec Loss 7.1354 LearningRate 0.0008 Epoch: 7 Global Step: 156100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:51,852-Speed 6304.54 samples/sec Loss 7.1824 LearningRate 0.0008 Epoch: 7 Global Step: 156110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:55,103-Speed 6300.71 samples/sec Loss 7.0989 LearningRate 0.0008 Epoch: 7 Global Step: 156120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:55:58,350-Speed 6308.99 samples/sec Loss 7.0740 LearningRate 0.0008 Epoch: 7 Global Step: 156130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:01,631-Speed 6244.12 samples/sec Loss 7.1941 LearningRate 0.0008 Epoch: 7 Global Step: 156140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:04,891-Speed 6284.49 samples/sec Loss 7.0803 LearningRate 0.0008 Epoch: 7 Global Step: 156150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:08,138-Speed 6308.49 samples/sec Loss 7.1214 LearningRate 0.0008 Epoch: 7 Global Step: 156160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:11,384-Speed 6310.74 samples/sec Loss 7.0458 LearningRate 0.0008 Epoch: 7 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:14,629-Speed 6312.35 samples/sec Loss 
7.1462 LearningRate 0.0008 Epoch: 7 Global Step: 156180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:17,877-Speed 6305.98 samples/sec Loss 7.1880 LearningRate 0.0008 Epoch: 7 Global Step: 156190 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:56:21,127-Speed 6304.51 samples/sec Loss 7.1920 LearningRate 0.0008 Epoch: 7 Global Step: 156200 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:56:24,375-Speed 6305.73 samples/sec Loss 7.1202 LearningRate 0.0008 Epoch: 7 Global Step: 156210 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:56:27,611-Speed 6330.68 samples/sec Loss 7.2057 LearningRate 0.0008 Epoch: 7 Global Step: 156220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:30,856-Speed 6312.77 samples/sec Loss 7.2138 LearningRate 0.0008 Epoch: 7 Global Step: 156230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:34,102-Speed 6309.66 samples/sec Loss 7.1427 LearningRate 0.0008 Epoch: 7 Global Step: 156240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:37,351-Speed 6305.52 samples/sec Loss 7.2099 LearningRate 0.0008 Epoch: 7 Global Step: 156250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:40,656-Speed 6198.52 samples/sec Loss 7.0833 LearningRate 0.0008 Epoch: 7 Global Step: 156260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:43,909-Speed 6297.70 samples/sec Loss 7.0672 LearningRate 0.0008 Epoch: 7 Global Step: 156270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:47,155-Speed 6309.93 samples/sec Loss 7.0765 LearningRate 0.0008 Epoch: 7 Global Step: 156280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:50,398-Speed 6316.09 samples/sec Loss 7.0657 LearningRate 0.0008 Epoch: 7 Global Step: 156290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:53,658-Speed 6284.79 samples/sec Loss 7.1083 LearningRate 0.0008 Epoch: 7 Global 
Step: 156300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:56:56,903-Speed 6311.44 samples/sec Loss 7.1175 LearningRate 0.0008 Epoch: 7 Global Step: 156310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:00,149-Speed 6311.17 samples/sec Loss 7.1197 LearningRate 0.0008 Epoch: 7 Global Step: 156320 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:03,401-Speed 6299.22 samples/sec Loss 7.0503 LearningRate 0.0008 Epoch: 7 Global Step: 156330 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:06,650-Speed 6304.92 samples/sec Loss 7.0746 LearningRate 0.0008 Epoch: 7 Global Step: 156340 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:09,912-Speed 6280.76 samples/sec Loss 7.1318 LearningRate 0.0008 Epoch: 7 Global Step: 156350 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:13,159-Speed 6307.47 samples/sec Loss 7.0850 LearningRate 0.0008 Epoch: 7 Global Step: 156360 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:16,407-Speed 6307.74 samples/sec Loss 7.2421 LearningRate 0.0008 Epoch: 7 Global Step: 156370 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:19,640-Speed 6335.93 samples/sec Loss 7.1505 LearningRate 0.0008 Epoch: 7 Global Step: 156380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:22,886-Speed 6310.01 samples/sec Loss 7.1111 LearningRate 0.0008 Epoch: 7 Global Step: 156390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:26,140-Speed 6296.82 samples/sec Loss 7.1803 LearningRate 0.0008 Epoch: 7 Global Step: 156400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:29,385-Speed 6312.32 samples/sec Loss 7.1753 LearningRate 0.0008 Epoch: 7 Global Step: 156410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:32,634-Speed 6305.20 samples/sec Loss 7.2061 LearningRate 0.0008 Epoch: 7 Global Step: 156420 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 05:57:35,878-Speed 6313.86 samples/sec Loss 7.1662 LearningRate 0.0008 Epoch: 7 Global Step: 156430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:39,123-Speed 6311.71 samples/sec Loss 7.0371 LearningRate 0.0008 Epoch: 7 Global Step: 156440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:42,369-Speed 6311.21 samples/sec Loss 7.1348 LearningRate 0.0008 Epoch: 7 Global Step: 156450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:45,615-Speed 6310.94 samples/sec Loss 7.1414 LearningRate 0.0008 Epoch: 7 Global Step: 156460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:48,866-Speed 6301.08 samples/sec Loss 7.1787 LearningRate 0.0008 Epoch: 7 Global Step: 156470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:57:52,112-Speed 6310.00 samples/sec Loss 7.1213 LearningRate 0.0008 Epoch: 7 Global Step: 156480 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:55,366-Speed 6296.37 samples/sec Loss 7.1161 LearningRate 0.0008 Epoch: 7 Global Step: 156490 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:57:58,618-Speed 6298.53 samples/sec Loss 7.1900 LearningRate 0.0008 Epoch: 7 Global Step: 156500 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:01,848-Speed 6341.90 samples/sec Loss 7.1580 LearningRate 0.0008 Epoch: 7 Global Step: 156510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:05,093-Speed 6312.08 samples/sec Loss 7.1740 LearningRate 0.0008 Epoch: 7 Global Step: 156520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:08,343-Speed 6302.99 samples/sec Loss 7.1754 LearningRate 0.0008 Epoch: 7 Global Step: 156530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:11,600-Speed 6290.94 samples/sec Loss 6.9906 LearningRate 0.0008 Epoch: 7 Global Step: 156540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
05:58:14,847-Speed 6308.01 samples/sec Loss 7.1618 LearningRate 0.0008 Epoch: 7 Global Step: 156550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:18,098-Speed 6300.68 samples/sec Loss 7.0983 LearningRate 0.0008 Epoch: 7 Global Step: 156560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:21,344-Speed 6311.94 samples/sec Loss 7.1541 LearningRate 0.0008 Epoch: 7 Global Step: 156570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:24,590-Speed 6310.52 samples/sec Loss 7.1194 LearningRate 0.0008 Epoch: 7 Global Step: 156580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:27,845-Speed 6293.61 samples/sec Loss 7.1952 LearningRate 0.0008 Epoch: 7 Global Step: 156590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:31,090-Speed 6312.50 samples/sec Loss 7.0786 LearningRate 0.0008 Epoch: 7 Global Step: 156600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:34,337-Speed 6307.99 samples/sec Loss 7.1286 LearningRate 0.0008 Epoch: 7 Global Step: 156610 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:37,586-Speed 6304.24 samples/sec Loss 7.1100 LearningRate 0.0008 Epoch: 7 Global Step: 156620 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:40,835-Speed 6306.20 samples/sec Loss 7.1076 LearningRate 0.0008 Epoch: 7 Global Step: 156630 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:44,088-Speed 6297.58 samples/sec Loss 7.2007 LearningRate 0.0008 Epoch: 7 Global Step: 156640 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:47,334-Speed 6309.35 samples/sec Loss 7.0514 LearningRate 0.0008 Epoch: 7 Global Step: 156650 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:58:50,569-Speed 6333.88 samples/sec Loss 7.0232 LearningRate 0.0008 Epoch: 7 Global Step: 156660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:53,815-Speed 6308.90 samples/sec Loss 
7.1253 LearningRate 0.0008 Epoch: 7 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:58:57,064-Speed 6305.02 samples/sec Loss 7.1233 LearningRate 0.0008 Epoch: 7 Global Step: 156680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:00,313-Speed 6305.82 samples/sec Loss 7.1318 LearningRate 0.0008 Epoch: 7 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:03,557-Speed 6314.43 samples/sec Loss 7.0837 LearningRate 0.0008 Epoch: 7 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:06,804-Speed 6308.32 samples/sec Loss 7.1642 LearningRate 0.0008 Epoch: 7 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:10,050-Speed 6309.97 samples/sec Loss 7.0941 LearningRate 0.0008 Epoch: 7 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:13,296-Speed 6312.25 samples/sec Loss 7.1658 LearningRate 0.0008 Epoch: 7 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:16,543-Speed 6309.28 samples/sec Loss 7.1478 LearningRate 0.0008 Epoch: 7 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:19,790-Speed 6307.88 samples/sec Loss 7.0932 LearningRate 0.0008 Epoch: 7 Global Step: 156750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 05:59:23,035-Speed 6312.93 samples/sec Loss 7.0247 LearningRate 0.0008 Epoch: 7 Global Step: 156760 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:26,278-Speed 6316.83 samples/sec Loss 7.1202 LearningRate 0.0008 Epoch: 7 Global Step: 156770 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:29,526-Speed 6307.13 samples/sec Loss 7.1592 LearningRate 0.0008 Epoch: 7 Global Step: 156780 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:32,771-Speed 6312.75 samples/sec Loss 7.0687 LearningRate 0.0008 Epoch: 7 Global 
Step: 156790 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:36,020-Speed 6303.84 samples/sec Loss 7.0926 LearningRate 0.0008 Epoch: 7 Global Step: 156800 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:39,268-Speed 6307.21 samples/sec Loss 7.1316 LearningRate 0.0008 Epoch: 7 Global Step: 156810 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:42,513-Speed 6312.09 samples/sec Loss 7.0369 LearningRate 0.0008 Epoch: 7 Global Step: 156820 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:45,759-Speed 6310.50 samples/sec Loss 7.1209 LearningRate 0.0008 Epoch: 7 Global Step: 156830 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:49,006-Speed 6309.95 samples/sec Loss 7.1045 LearningRate 0.0008 Epoch: 7 Global Step: 156840 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:52,251-Speed 6312.11 samples/sec Loss 7.0744 LearningRate 0.0008 Epoch: 7 Global Step: 156850 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 05:59:55,498-Speed 6309.01 samples/sec Loss 7.1825 LearningRate 0.0008 Epoch: 7 Global Step: 156860 Fp16 Grad Scale: 131072 Required: 61 hours Training: 2022-04-01 05:59:58,718-Speed 6360.82 samples/sec Loss 7.1439 LearningRate 0.0008 Epoch: 7 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:01,966-Speed 6307.11 samples/sec Loss 7.1062 LearningRate 0.0008 Epoch: 7 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:05,217-Speed 6301.65 samples/sec Loss 7.0640 LearningRate 0.0008 Epoch: 7 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:08,462-Speed 6313.57 samples/sec Loss 7.1006 LearningRate 0.0008 Epoch: 7 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:11,708-Speed 6309.43 samples/sec Loss 7.1004 LearningRate 0.0008 Epoch: 7 Global Step: 156910 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:00:14,951-Speed 6317.62 samples/sec Loss 7.1902 LearningRate 0.0008 Epoch: 7 Global Step: 156920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:18,200-Speed 6304.45 samples/sec Loss 7.0575 LearningRate 0.0008 Epoch: 7 Global Step: 156930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:21,446-Speed 6310.70 samples/sec Loss 7.0828 LearningRate 0.0008 Epoch: 7 Global Step: 156940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:24,690-Speed 6313.74 samples/sec Loss 7.0762 LearningRate 0.0008 Epoch: 7 Global Step: 156950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:27,940-Speed 6304.15 samples/sec Loss 7.0997 LearningRate 0.0008 Epoch: 7 Global Step: 156960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:31,189-Speed 6307.35 samples/sec Loss 7.1142 LearningRate 0.0008 Epoch: 7 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:34,437-Speed 6306.82 samples/sec Loss 7.2018 LearningRate 0.0008 Epoch: 7 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:37,700-Speed 6278.36 samples/sec Loss 7.2411 LearningRate 0.0008 Epoch: 7 Global Step: 156990 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:40,947-Speed 6308.16 samples/sec Loss 7.1353 LearningRate 0.0008 Epoch: 7 Global Step: 157000 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:44,192-Speed 6312.40 samples/sec Loss 7.1121 LearningRate 0.0008 Epoch: 7 Global Step: 157010 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:47,440-Speed 6307.13 samples/sec Loss 7.1495 LearningRate 0.0008 Epoch: 7 Global Step: 157020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:00:50,676-Speed 6330.80 samples/sec Loss 7.0821 LearningRate 0.0008 Epoch: 7 Global Step: 157030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:00:53,922-Speed 6310.24 samples/sec Loss 7.1250 LearningRate 0.0008 Epoch: 7 Global Step: 157040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:00:57,169-Speed 6308.82 samples/sec Loss 7.0921 LearningRate 0.0008 Epoch: 7 Global Step: 157050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:00,418-Speed 6304.33 samples/sec Loss 7.0893 LearningRate 0.0008 Epoch: 7 Global Step: 157060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:03,666-Speed 6307.39 samples/sec Loss 7.0010 LearningRate 0.0008 Epoch: 7 Global Step: 157070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:06,914-Speed 6307.21 samples/sec Loss 7.0885 LearningRate 0.0008 Epoch: 7 Global Step: 157080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:10,155-Speed 6320.43 samples/sec Loss 7.1537 LearningRate 0.0008 Epoch: 7 Global Step: 157090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:13,403-Speed 6306.92 samples/sec Loss 7.1239 LearningRate 0.0008 Epoch: 7 Global Step: 157100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:16,648-Speed 6311.02 samples/sec Loss 7.0964 LearningRate 0.0008 Epoch: 7 Global Step: 157110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:19,894-Speed 6312.43 samples/sec Loss 7.0778 LearningRate 0.0008 Epoch: 7 Global Step: 157120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:23,138-Speed 6313.30 samples/sec Loss 7.1091 LearningRate 0.0008 Epoch: 7 Global Step: 157130 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:26,383-Speed 6312.76 samples/sec Loss 7.1669 LearningRate 0.0008 Epoch: 7 Global Step: 157140 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:29,632-Speed 6306.07 samples/sec Loss 7.1391 LearningRate 0.0008 Epoch: 7 Global Step: 157150 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:32,882-Speed 6302.56 samples/sec Loss 
7.0378 LearningRate 0.0008 Epoch: 7 Global Step: 157160 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:36,127-Speed 6311.99 samples/sec Loss 7.1189 LearningRate 0.0008 Epoch: 7 Global Step: 157170 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:39,377-Speed 6302.35 samples/sec Loss 7.0732 LearningRate 0.0008 Epoch: 7 Global Step: 157180 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:42,623-Speed 6310.74 samples/sec Loss 7.1070 LearningRate 0.0008 Epoch: 7 Global Step: 157190 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:45,870-Speed 6309.23 samples/sec Loss 7.0977 LearningRate 0.0008 Epoch: 7 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:01:49,107-Speed 6328.89 samples/sec Loss 7.0976 LearningRate 0.0008 Epoch: 7 Global Step: 157210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:52,353-Speed 6311.26 samples/sec Loss 7.1396 LearningRate 0.0008 Epoch: 7 Global Step: 157220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:55,604-Speed 6301.71 samples/sec Loss 7.0540 LearningRate 0.0008 Epoch: 7 Global Step: 157230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:01:58,867-Speed 6277.12 samples/sec Loss 7.1545 LearningRate 0.0008 Epoch: 7 Global Step: 157240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:02,116-Speed 6305.96 samples/sec Loss 7.1699 LearningRate 0.0008 Epoch: 7 Global Step: 157250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:05,362-Speed 6309.29 samples/sec Loss 7.0990 LearningRate 0.0008 Epoch: 7 Global Step: 157260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:08,613-Speed 6303.28 samples/sec Loss 7.1805 LearningRate 0.0008 Epoch: 7 Global Step: 157270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:11,860-Speed 6308.64 samples/sec Loss 7.1631 LearningRate 0.0008 Epoch: 7 Global 
Step: 157280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:15,110-Speed 6302.39 samples/sec Loss 7.2072 LearningRate 0.0008 Epoch: 7 Global Step: 157290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:18,352-Speed 6318.40 samples/sec Loss 7.1073 LearningRate 0.0008 Epoch: 7 Global Step: 157300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:21,599-Speed 6308.79 samples/sec Loss 7.1050 LearningRate 0.0008 Epoch: 7 Global Step: 157310 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:02:24,838-Speed 6324.30 samples/sec Loss 7.1627 LearningRate 0.0008 Epoch: 7 Global Step: 157320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:28,086-Speed 6307.44 samples/sec Loss 7.1531 LearningRate 0.0008 Epoch: 7 Global Step: 157330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:31,342-Speed 6291.91 samples/sec Loss 7.0494 LearningRate 0.0008 Epoch: 7 Global Step: 157340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:34,626-Speed 6237.63 samples/sec Loss 7.0736 LearningRate 0.0008 Epoch: 7 Global Step: 157350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:37,874-Speed 6306.65 samples/sec Loss 7.2013 LearningRate 0.0008 Epoch: 7 Global Step: 157360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:41,118-Speed 6314.71 samples/sec Loss 7.1701 LearningRate 0.0008 Epoch: 7 Global Step: 157370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:44,368-Speed 6302.08 samples/sec Loss 7.0583 LearningRate 0.0008 Epoch: 7 Global Step: 157380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:47,618-Speed 6303.80 samples/sec Loss 7.0838 LearningRate 0.0008 Epoch: 7 Global Step: 157390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:50,868-Speed 6301.72 samples/sec Loss 6.9996 LearningRate 0.0008 Epoch: 7 Global Step: 157400 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:02:54,118-Speed 6304.22 samples/sec Loss 7.1151 LearningRate 0.0008 Epoch: 7 Global Step: 157410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:02:57,367-Speed 6305.11 samples/sec Loss 7.1128 LearningRate 0.0008 Epoch: 7 Global Step: 157420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:00,616-Speed 6304.80 samples/sec Loss 7.0532 LearningRate 0.0008 Epoch: 7 Global Step: 157430 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:03,864-Speed 6306.26 samples/sec Loss 7.0660 LearningRate 0.0008 Epoch: 7 Global Step: 157440 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:07,096-Speed 6338.77 samples/sec Loss 7.0647 LearningRate 0.0008 Epoch: 7 Global Step: 157450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:10,341-Speed 6313.56 samples/sec Loss 7.0480 LearningRate 0.0008 Epoch: 7 Global Step: 157460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:13,587-Speed 6309.01 samples/sec Loss 7.0603 LearningRate 0.0008 Epoch: 7 Global Step: 157470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:16,842-Speed 6294.16 samples/sec Loss 7.1129 LearningRate 0.0008 Epoch: 7 Global Step: 157480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:20,091-Speed 6305.60 samples/sec Loss 7.1374 LearningRate 0.0008 Epoch: 7 Global Step: 157490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:23,339-Speed 6305.76 samples/sec Loss 7.1127 LearningRate 0.0008 Epoch: 7 Global Step: 157500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:26,586-Speed 6308.45 samples/sec Loss 7.0424 LearningRate 0.0008 Epoch: 7 Global Step: 157510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:29,833-Speed 6310.27 samples/sec Loss 7.0964 LearningRate 0.0008 Epoch: 7 Global Step: 157520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:03:33,082-Speed 6304.36 samples/sec Loss 7.1911 LearningRate 0.0008 Epoch: 7 Global Step: 157530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:36,332-Speed 6302.24 samples/sec Loss 7.1432 LearningRate 0.0008 Epoch: 7 Global Step: 157540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:03:39,583-Speed 6300.46 samples/sec Loss 7.0670 LearningRate 0.0008 Epoch: 7 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:42,827-Speed 6315.08 samples/sec Loss 7.0681 LearningRate 0.0008 Epoch: 7 Global Step: 157560 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:46,075-Speed 6306.44 samples/sec Loss 7.1570 LearningRate 0.0008 Epoch: 7 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:49,324-Speed 6306.18 samples/sec Loss 7.2157 LearningRate 0.0008 Epoch: 7 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:52,569-Speed 6311.01 samples/sec Loss 7.2020 LearningRate 0.0008 Epoch: 7 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:55,811-Speed 6320.44 samples/sec Loss 7.1339 LearningRate 0.0008 Epoch: 7 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:03:59,061-Speed 6301.21 samples/sec Loss 7.1259 LearningRate 0.0008 Epoch: 7 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:02,310-Speed 6305.97 samples/sec Loss 7.1036 LearningRate 0.0008 Epoch: 7 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:05,557-Speed 6310.02 samples/sec Loss 7.0775 LearningRate 0.0008 Epoch: 7 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:08,804-Speed 6307.90 samples/sec Loss 7.1076 LearningRate 0.0008 Epoch: 7 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:12,039-Speed 6333.01 samples/sec Loss 
7.0407 LearningRate 0.0008 Epoch: 7 Global Step: 157650 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:15,281-Speed 6317.59 samples/sec Loss 7.1127 LearningRate 0.0008 Epoch: 7 Global Step: 157660 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:18,526-Speed 6311.72 samples/sec Loss 6.9588 LearningRate 0.0008 Epoch: 7 Global Step: 157670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:21,775-Speed 6306.23 samples/sec Loss 7.1177 LearningRate 0.0008 Epoch: 7 Global Step: 157680 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:25,018-Speed 6316.65 samples/sec Loss 7.0406 LearningRate 0.0008 Epoch: 7 Global Step: 157690 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:04:28,252-Speed 6333.57 samples/sec Loss 7.0154 LearningRate 0.0008 Epoch: 7 Global Step: 157700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:31,501-Speed 6305.33 samples/sec Loss 7.0547 LearningRate 0.0008 Epoch: 7 Global Step: 157710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:34,753-Speed 6299.42 samples/sec Loss 7.1537 LearningRate 0.0008 Epoch: 7 Global Step: 157720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:38,000-Speed 6308.26 samples/sec Loss 7.0567 LearningRate 0.0008 Epoch: 7 Global Step: 157730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:41,246-Speed 6311.43 samples/sec Loss 7.0732 LearningRate 0.0008 Epoch: 7 Global Step: 157740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:44,489-Speed 6314.88 samples/sec Loss 7.0563 LearningRate 0.0008 Epoch: 7 Global Step: 157750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:47,737-Speed 6307.19 samples/sec Loss 7.0429 LearningRate 0.0008 Epoch: 7 Global Step: 157760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:50,985-Speed 6306.88 samples/sec Loss 7.1554 LearningRate 0.0008 Epoch: 7 Global 
Step: 157770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:54,235-Speed 6302.05 samples/sec Loss 7.1495 LearningRate 0.0008 Epoch: 7 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:04:57,482-Speed 6308.92 samples/sec Loss 7.0922 LearningRate 0.0008 Epoch: 7 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:00,737-Speed 6294.12 samples/sec Loss 7.1435 LearningRate 0.0008 Epoch: 7 Global Step: 157800 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:03,992-Speed 6292.26 samples/sec Loss 7.1052 LearningRate 0.0008 Epoch: 7 Global Step: 157810 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:07,239-Speed 6309.53 samples/sec Loss 7.1245 LearningRate 0.0008 Epoch: 7 Global Step: 157820 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:10,489-Speed 6304.61 samples/sec Loss 7.1263 LearningRate 0.0008 Epoch: 7 Global Step: 157830 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:13,740-Speed 6300.97 samples/sec Loss 7.2278 LearningRate 0.0008 Epoch: 7 Global Step: 157840 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:16,987-Speed 6308.53 samples/sec Loss 7.1969 LearningRate 0.0008 Epoch: 7 Global Step: 157850 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:20,233-Speed 6310.11 samples/sec Loss 7.1268 LearningRate 0.0008 Epoch: 7 Global Step: 157860 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:23,484-Speed 6301.89 samples/sec Loss 7.1694 LearningRate 0.0008 Epoch: 7 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:26,748-Speed 6276.44 samples/sec Loss 7.1169 LearningRate 0.0008 Epoch: 7 Global Step: 157880 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:29,994-Speed 6308.94 samples/sec Loss 7.1528 LearningRate 0.0008 Epoch: 7 Global Step: 157890 Fp16 Grad Scale: 65536 
Required: 61 hours Training: 2022-04-01 06:05:33,224-Speed 6342.62 samples/sec Loss 7.1487 LearningRate 0.0008 Epoch: 7 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:05:36,457-Speed 6336.43 samples/sec Loss 7.1468 LearningRate 0.0008 Epoch: 7 Global Step: 157910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:39,703-Speed 6310.65 samples/sec Loss 7.1192 LearningRate 0.0008 Epoch: 7 Global Step: 157920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:42,951-Speed 6306.23 samples/sec Loss 7.1394 LearningRate 0.0008 Epoch: 7 Global Step: 157930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:46,200-Speed 6305.52 samples/sec Loss 7.1192 LearningRate 0.0008 Epoch: 7 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:49,461-Speed 6281.71 samples/sec Loss 7.0514 LearningRate 0.0008 Epoch: 7 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:52,705-Speed 6313.47 samples/sec Loss 7.1453 LearningRate 0.0008 Epoch: 7 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:55,949-Speed 6315.58 samples/sec Loss 7.0224 LearningRate 0.0008 Epoch: 7 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:05:59,196-Speed 6307.99 samples/sec Loss 7.1712 LearningRate 0.0008 Epoch: 7 Global Step: 157980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:02,446-Speed 6303.31 samples/sec Loss 7.1290 LearningRate 0.0008 Epoch: 7 Global Step: 157990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:05,693-Speed 6308.69 samples/sec Loss 7.0771 LearningRate 0.0008 Epoch: 7 Global Step: 158000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:08,940-Speed 6308.19 samples/sec Loss 7.1164 LearningRate 0.0008 Epoch: 7 Global Step: 158010 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
06:06:12,174-Speed 6334.18 samples/sec Loss 7.0446 LearningRate 0.0008 Epoch: 7 Global Step: 158020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:15,423-Speed 6306.16 samples/sec Loss 7.1024 LearningRate 0.0008 Epoch: 7 Global Step: 158030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:18,667-Speed 6314.87 samples/sec Loss 7.1329 LearningRate 0.0008 Epoch: 7 Global Step: 158040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:21,913-Speed 6311.34 samples/sec Loss 7.1665 LearningRate 0.0008 Epoch: 7 Global Step: 158050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:25,159-Speed 6310.63 samples/sec Loss 7.0768 LearningRate 0.0008 Epoch: 7 Global Step: 158060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:28,409-Speed 6303.11 samples/sec Loss 7.0646 LearningRate 0.0008 Epoch: 7 Global Step: 158070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:31,656-Speed 6307.74 samples/sec Loss 7.0827 LearningRate 0.0008 Epoch: 7 Global Step: 158080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:34,897-Speed 6320.43 samples/sec Loss 7.1596 LearningRate 0.0008 Epoch: 7 Global Step: 158090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:38,141-Speed 6314.86 samples/sec Loss 7.1462 LearningRate 0.0008 Epoch: 7 Global Step: 158100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:41,391-Speed 6302.80 samples/sec Loss 7.0895 LearningRate 0.0008 Epoch: 7 Global Step: 158110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:06:44,638-Speed 6309.46 samples/sec Loss 7.1124 LearningRate 0.0008 Epoch: 7 Global Step: 158120 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:06:47,885-Speed 6308.20 samples/sec Loss 7.0469 LearningRate 0.0008 Epoch: 7 Global Step: 158130 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:06:51,134-Speed 6304.75 samples/sec Loss 
7.0845 LearningRate 0.0008 Epoch: 7 Global Step: 158140 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:06:54,381-Speed 6307.85 samples/sec Loss 7.0710 LearningRate 0.0008 Epoch: 7 Global Step: 158150 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:06:57,629-Speed 6307.24 samples/sec Loss 7.0458 LearningRate 0.0008 Epoch: 7 Global Step: 158160 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:07:00,861-Speed 6339.03 samples/sec Loss 7.1111 LearningRate 0.0008 Epoch: 7 Global Step: 158170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:04,111-Speed 6301.91 samples/sec Loss 7.1601 LearningRate 0.0008 Epoch: 7 Global Step: 158180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:07,355-Speed 6315.00 samples/sec Loss 7.1293 LearningRate 0.0008 Epoch: 7 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:10,602-Speed 6308.66 samples/sec Loss 7.0476 LearningRate 0.0008 Epoch: 7 Global Step: 158200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:13,846-Speed 6314.07 samples/sec Loss 7.0967 LearningRate 0.0008 Epoch: 7 Global Step: 158210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:17,091-Speed 6313.79 samples/sec Loss 7.0731 LearningRate 0.0008 Epoch: 7 Global Step: 158220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:20,339-Speed 6305.89 samples/sec Loss 7.1441 LearningRate 0.0008 Epoch: 7 Global Step: 158230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:23,587-Speed 6306.85 samples/sec Loss 7.1995 LearningRate 0.0008 Epoch: 7 Global Step: 158240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:26,836-Speed 6306.13 samples/sec Loss 7.0374 LearningRate 0.0008 Epoch: 7 Global Step: 158250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:30,081-Speed 6312.59 samples/sec Loss 7.1486 LearningRate 0.0008 Epoch: 7 Global 
Step: 158260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:33,331-Speed 6302.45 samples/sec Loss 7.1586 LearningRate 0.0008 Epoch: 7 Global Step: 158270 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:07:36,578-Speed 6308.55 samples/sec Loss 7.1341 LearningRate 0.0008 Epoch: 7 Global Step: 158280 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:07:39,825-Speed 6310.57 samples/sec Loss 7.1138 LearningRate 0.0008 Epoch: 7 Global Step: 158290 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:07:43,057-Speed 6337.50 samples/sec Loss 7.1548 LearningRate 0.0008 Epoch: 7 Global Step: 158300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:46,305-Speed 6306.35 samples/sec Loss 7.0558 LearningRate 0.0008 Epoch: 7 Global Step: 158310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:49,552-Speed 6308.71 samples/sec Loss 7.1445 LearningRate 0.0008 Epoch: 7 Global Step: 158320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:52,796-Speed 6314.15 samples/sec Loss 7.1367 LearningRate 0.0008 Epoch: 7 Global Step: 158330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:56,043-Speed 6309.01 samples/sec Loss 7.0913 LearningRate 0.0008 Epoch: 7 Global Step: 158340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:07:59,291-Speed 6308.24 samples/sec Loss 7.1041 LearningRate 0.0008 Epoch: 7 Global Step: 158350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:02,539-Speed 6305.76 samples/sec Loss 7.1857 LearningRate 0.0008 Epoch: 7 Global Step: 158360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:05,784-Speed 6312.03 samples/sec Loss 7.0581 LearningRate 0.0008 Epoch: 7 Global Step: 158370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:09,027-Speed 6316.79 samples/sec Loss 7.0577 LearningRate 0.0008 Epoch: 7 Global Step: 158380 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:08:12,273-Speed 6310.64 samples/sec Loss 7.0606 LearningRate 0.0008 Epoch: 7 Global Step: 158390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:15,523-Speed 6302.49 samples/sec Loss 7.1116 LearningRate 0.0008 Epoch: 7 Global Step: 158400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:08:18,773-Speed 6304.19 samples/sec Loss 7.1393 LearningRate 0.0008 Epoch: 7 Global Step: 158410 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:08:22,008-Speed 6331.38 samples/sec Loss 7.0245 LearningRate 0.0008 Epoch: 7 Global Step: 158420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:25,258-Speed 6303.21 samples/sec Loss 7.0306 LearningRate 0.0008 Epoch: 7 Global Step: 158430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:28,508-Speed 6303.21 samples/sec Loss 7.0872 LearningRate 0.0008 Epoch: 7 Global Step: 158440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:31,754-Speed 6311.32 samples/sec Loss 7.0982 LearningRate 0.0008 Epoch: 7 Global Step: 158450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:34,998-Speed 6312.86 samples/sec Loss 7.1162 LearningRate 0.0008 Epoch: 7 Global Step: 158460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:38,247-Speed 6306.22 samples/sec Loss 7.1087 LearningRate 0.0008 Epoch: 7 Global Step: 158470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:41,490-Speed 6317.30 samples/sec Loss 7.1972 LearningRate 0.0008 Epoch: 7 Global Step: 158480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:44,735-Speed 6311.93 samples/sec Loss 7.0561 LearningRate 0.0008 Epoch: 7 Global Step: 158490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:47,977-Speed 6317.96 samples/sec Loss 7.0736 LearningRate 0.0008 Epoch: 7 Global Step: 158500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:08:51,223-Speed 6310.89 samples/sec Loss 6.9989 LearningRate 0.0008 Epoch: 7 Global Step: 158510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:54,460-Speed 6329.40 samples/sec Loss 7.0651 LearningRate 0.0008 Epoch: 7 Global Step: 158520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:08:57,709-Speed 6304.08 samples/sec Loss 7.2163 LearningRate 0.0008 Epoch: 7 Global Step: 158530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:00,955-Speed 6310.61 samples/sec Loss 7.2112 LearningRate 0.0008 Epoch: 7 Global Step: 158540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:04,201-Speed 6310.82 samples/sec Loss 7.0573 LearningRate 0.0008 Epoch: 7 Global Step: 158550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:07,448-Speed 6308.50 samples/sec Loss 7.1806 LearningRate 0.0008 Epoch: 7 Global Step: 158560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:10,691-Speed 6317.87 samples/sec Loss 7.1696 LearningRate 0.0008 Epoch: 7 Global Step: 158570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:13,932-Speed 6319.53 samples/sec Loss 7.0400 LearningRate 0.0008 Epoch: 7 Global Step: 158580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:17,177-Speed 6313.61 samples/sec Loss 7.0976 LearningRate 0.0008 Epoch: 7 Global Step: 158590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:20,423-Speed 6310.51 samples/sec Loss 7.0619 LearningRate 0.0008 Epoch: 7 Global Step: 158600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:23,674-Speed 6301.45 samples/sec Loss 7.1003 LearningRate 0.0008 Epoch: 7 Global Step: 158610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:26,916-Speed 6318.21 samples/sec Loss 7.1430 LearningRate 0.0008 Epoch: 7 Global Step: 158620 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:30,169-Speed 6297.71 samples/sec Loss 
7.1223 LearningRate 0.0008 Epoch: 7 Global Step: 158630 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:33,429-Speed 6283.39 samples/sec Loss 7.0107 LearningRate 0.0008 Epoch: 7 Global Step: 158640 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:36,674-Speed 6312.59 samples/sec Loss 7.1808 LearningRate 0.0008 Epoch: 7 Global Step: 158650 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:39,919-Speed 6311.58 samples/sec Loss 7.0911 LearningRate 0.0008 Epoch: 7 Global Step: 158660 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:43,167-Speed 6307.87 samples/sec Loss 7.1120 LearningRate 0.0008 Epoch: 7 Global Step: 158670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:09:46,398-Speed 6338.75 samples/sec Loss 7.1307 LearningRate 0.0008 Epoch: 7 Global Step: 158680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:49,646-Speed 6308.90 samples/sec Loss 7.0735 LearningRate 0.0008 Epoch: 7 Global Step: 158690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:52,892-Speed 6309.77 samples/sec Loss 7.1030 LearningRate 0.0008 Epoch: 7 Global Step: 158700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:56,142-Speed 6304.18 samples/sec Loss 7.0688 LearningRate 0.0008 Epoch: 7 Global Step: 158710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:09:59,394-Speed 6298.15 samples/sec Loss 6.9913 LearningRate 0.0008 Epoch: 7 Global Step: 158720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:02,642-Speed 6307.95 samples/sec Loss 7.0898 LearningRate 0.0008 Epoch: 7 Global Step: 158730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:05,886-Speed 6314.17 samples/sec Loss 7.1396 LearningRate 0.0008 Epoch: 7 Global Step: 158740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:09,129-Speed 6315.46 samples/sec Loss 7.0891 LearningRate 0.0008 Epoch: 7 Global 
Step: 158750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:12,377-Speed 6307.60 samples/sec Loss 7.0934 LearningRate 0.0008 Epoch: 7 Global Step: 158760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:15,619-Speed 6317.98 samples/sec Loss 7.1101 LearningRate 0.0008 Epoch: 7 Global Step: 158770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:18,870-Speed 6302.10 samples/sec Loss 7.0693 LearningRate 0.0008 Epoch: 7 Global Step: 158780 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:10:22,113-Speed 6315.71 samples/sec Loss 7.1189 LearningRate 0.0008 Epoch: 7 Global Step: 158790 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:10:25,347-Speed 6334.17 samples/sec Loss 7.1133 LearningRate 0.0008 Epoch: 7 Global Step: 158800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:28,623-Speed 6254.05 samples/sec Loss 7.1222 LearningRate 0.0008 Epoch: 7 Global Step: 158810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:31,868-Speed 6312.40 samples/sec Loss 7.1268 LearningRate 0.0008 Epoch: 7 Global Step: 158820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:35,113-Speed 6311.29 samples/sec Loss 7.0987 LearningRate 0.0008 Epoch: 7 Global Step: 158830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:38,358-Speed 6312.84 samples/sec Loss 7.1358 LearningRate 0.0008 Epoch: 7 Global Step: 158840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:41,604-Speed 6310.37 samples/sec Loss 7.1320 LearningRate 0.0008 Epoch: 7 Global Step: 158850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:44,846-Speed 6320.02 samples/sec Loss 7.0629 LearningRate 0.0008 Epoch: 7 Global Step: 158860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:48,093-Speed 6307.21 samples/sec Loss 7.1330 LearningRate 0.0008 Epoch: 7 Global Step: 158870 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:10:51,338-Speed 6313.89 samples/sec Loss 7.1688 LearningRate 0.0008 Epoch: 7 Global Step: 158880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:54,583-Speed 6312.96 samples/sec Loss 7.0485 LearningRate 0.0008 Epoch: 7 Global Step: 158890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:10:57,830-Speed 6308.59 samples/sec Loss 7.1024 LearningRate 0.0008 Epoch: 7 Global Step: 158900 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:11:01,061-Speed 6339.18 samples/sec Loss 7.1589 LearningRate 0.0008 Epoch: 7 Global Step: 158910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:04,308-Speed 6310.46 samples/sec Loss 7.0937 LearningRate 0.0008 Epoch: 7 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:07,570-Speed 6280.33 samples/sec Loss 6.9769 LearningRate 0.0008 Epoch: 7 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:10,819-Speed 6303.65 samples/sec Loss 7.0764 LearningRate 0.0008 Epoch: 7 Global Step: 158940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:14,069-Speed 6303.47 samples/sec Loss 7.0810 LearningRate 0.0008 Epoch: 7 Global Step: 158950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:17,314-Speed 6312.43 samples/sec Loss 7.0721 LearningRate 0.0008 Epoch: 7 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:20,562-Speed 6306.09 samples/sec Loss 7.0834 LearningRate 0.0008 Epoch: 7 Global Step: 158970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:23,811-Speed 6305.72 samples/sec Loss 7.1319 LearningRate 0.0008 Epoch: 7 Global Step: 158980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:27,058-Speed 6308.80 samples/sec Loss 7.1791 LearningRate 0.0008 Epoch: 7 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:11:30,303-Speed 6312.68 samples/sec Loss 7.0804 LearningRate 0.0008 Epoch: 7 Global Step: 159000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:33,544-Speed 6320.27 samples/sec Loss 6.9987 LearningRate 0.0008 Epoch: 7 Global Step: 159010 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:11:36,788-Speed 6314.73 samples/sec Loss 7.1119 LearningRate 0.0008 Epoch: 7 Global Step: 159020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:11:40,035-Speed 6309.04 samples/sec Loss 7.0203 LearningRate 0.0008 Epoch: 7 Global Step: 159030 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:11:43,285-Speed 6302.98 samples/sec Loss 7.0875 LearningRate 0.0008 Epoch: 7 Global Step: 159040 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:11:46,516-Speed 6340.10 samples/sec Loss 7.0398 LearningRate 0.0008 Epoch: 7 Global Step: 159050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:49,762-Speed 6310.39 samples/sec Loss 7.1374 LearningRate 0.0008 Epoch: 7 Global Step: 159060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:53,010-Speed 6306.97 samples/sec Loss 7.1149 LearningRate 0.0008 Epoch: 7 Global Step: 159070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:56,262-Speed 6298.98 samples/sec Loss 7.0697 LearningRate 0.0008 Epoch: 7 Global Step: 159080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:11:59,517-Speed 6293.03 samples/sec Loss 7.0724 LearningRate 0.0008 Epoch: 7 Global Step: 159090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:02,761-Speed 6314.03 samples/sec Loss 7.1319 LearningRate 0.0008 Epoch: 7 Global Step: 159100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:06,015-Speed 6295.34 samples/sec Loss 7.1499 LearningRate 0.0008 Epoch: 7 Global Step: 159110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:09,265-Speed 6303.13 samples/sec Loss 
7.0576 LearningRate 0.0008 Epoch: 7 Global Step: 159120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:12,512-Speed 6309.69 samples/sec Loss 7.0999 LearningRate 0.0008 Epoch: 7 Global Step: 159130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:15,757-Speed 6312.82 samples/sec Loss 7.1151 LearningRate 0.0008 Epoch: 7 Global Step: 159140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:19,000-Speed 6316.35 samples/sec Loss 7.0942 LearningRate 0.0008 Epoch: 7 Global Step: 159150 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:12:22,249-Speed 6304.94 samples/sec Loss 7.1502 LearningRate 0.0008 Epoch: 7 Global Step: 159160 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:12:25,492-Speed 6316.47 samples/sec Loss 7.0837 LearningRate 0.0008 Epoch: 7 Global Step: 159170 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:12:28,739-Speed 6309.65 samples/sec Loss 7.0786 LearningRate 0.0008 Epoch: 7 Global Step: 159180 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:12:31,972-Speed 6336.76 samples/sec Loss 7.0534 LearningRate 0.0008 Epoch: 7 Global Step: 159190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:35,217-Speed 6312.60 samples/sec Loss 7.0887 LearningRate 0.0008 Epoch: 7 Global Step: 159200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:38,465-Speed 6305.28 samples/sec Loss 7.0124 LearningRate 0.0008 Epoch: 7 Global Step: 159210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:41,709-Speed 6315.53 samples/sec Loss 7.0876 LearningRate 0.0008 Epoch: 7 Global Step: 159220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:44,959-Speed 6302.19 samples/sec Loss 7.1341 LearningRate 0.0008 Epoch: 7 Global Step: 159230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:48,208-Speed 6304.33 samples/sec Loss 7.1211 LearningRate 0.0008 Epoch: 7 Global 
Step: 159240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:51,455-Speed 6308.86 samples/sec Loss 7.0973 LearningRate 0.0008 Epoch: 7 Global Step: 159250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:54,696-Speed 6320.98 samples/sec Loss 7.1193 LearningRate 0.0008 Epoch: 7 Global Step: 159260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:12:57,952-Speed 6291.42 samples/sec Loss 7.1649 LearningRate 0.0008 Epoch: 7 Global Step: 159270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:01,196-Speed 6314.99 samples/sec Loss 7.0831 LearningRate 0.0008 Epoch: 7 Global Step: 159280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:04,440-Speed 6313.95 samples/sec Loss 7.1934 LearningRate 0.0008 Epoch: 7 Global Step: 159290 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:07,670-Speed 6341.52 samples/sec Loss 7.0478 LearningRate 0.0008 Epoch: 7 Global Step: 159300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:10,919-Speed 6305.48 samples/sec Loss 7.1236 LearningRate 0.0008 Epoch: 7 Global Step: 159310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:14,167-Speed 6307.41 samples/sec Loss 7.1426 LearningRate 0.0008 Epoch: 7 Global Step: 159320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:17,418-Speed 6300.14 samples/sec Loss 7.1014 LearningRate 0.0008 Epoch: 7 Global Step: 159330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:20,667-Speed 6305.52 samples/sec Loss 7.0845 LearningRate 0.0008 Epoch: 7 Global Step: 159340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:23,914-Speed 6309.52 samples/sec Loss 7.0245 LearningRate 0.0008 Epoch: 7 Global Step: 159350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:27,161-Speed 6308.37 samples/sec Loss 7.0997 LearningRate 0.0008 Epoch: 7 Global Step: 159360 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:13:30,410-Speed 6306.40 samples/sec Loss 7.1035 LearningRate 0.0008 Epoch: 7 Global Step: 159370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:33,655-Speed 6312.22 samples/sec Loss 7.0662 LearningRate 0.0008 Epoch: 7 Global Step: 159380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:36,899-Speed 6314.76 samples/sec Loss 7.1405 LearningRate 0.0008 Epoch: 7 Global Step: 159390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:13:40,147-Speed 6306.44 samples/sec Loss 7.1243 LearningRate 0.0008 Epoch: 7 Global Step: 159400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:43,417-Speed 6263.54 samples/sec Loss 6.9949 LearningRate 0.0008 Epoch: 7 Global Step: 159410 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:46,675-Speed 6289.11 samples/sec Loss 7.0924 LearningRate 0.0008 Epoch: 7 Global Step: 159420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:49,919-Speed 6314.25 samples/sec Loss 7.0467 LearningRate 0.0008 Epoch: 7 Global Step: 159430 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:53,165-Speed 6309.48 samples/sec Loss 7.0799 LearningRate 0.0008 Epoch: 7 Global Step: 159440 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:56,428-Speed 6278.42 samples/sec Loss 6.9840 LearningRate 0.0008 Epoch: 7 Global Step: 159450 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:13:59,663-Speed 6332.53 samples/sec Loss 7.0201 LearningRate 0.0008 Epoch: 7 Global Step: 159460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:02,912-Speed 6305.52 samples/sec Loss 7.0693 LearningRate 0.0008 Epoch: 7 Global Step: 159470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:06,158-Speed 6310.23 samples/sec Loss 7.1207 LearningRate 0.0008 Epoch: 7 Global Step: 159480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:14:09,407-Speed 6305.04 samples/sec Loss 7.1669 LearningRate 0.0008 Epoch: 7 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:12,655-Speed 6305.31 samples/sec Loss 7.1105 LearningRate 0.0008 Epoch: 7 Global Step: 159500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:15,901-Speed 6310.99 samples/sec Loss 7.0822 LearningRate 0.0008 Epoch: 7 Global Step: 159510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:19,153-Speed 6299.11 samples/sec Loss 7.0418 LearningRate 0.0008 Epoch: 7 Global Step: 159520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:22,411-Speed 6286.81 samples/sec Loss 7.0469 LearningRate 0.0008 Epoch: 7 Global Step: 159530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:25,656-Speed 6314.08 samples/sec Loss 7.0056 LearningRate 0.0008 Epoch: 7 Global Step: 159540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:28,901-Speed 6313.62 samples/sec Loss 7.1085 LearningRate 0.0008 Epoch: 7 Global Step: 159550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:32,144-Speed 6315.47 samples/sec Loss 7.0195 LearningRate 0.0008 Epoch: 7 Global Step: 159560 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:14:35,391-Speed 6308.50 samples/sec Loss 7.0685 LearningRate 0.0008 Epoch: 7 Global Step: 159570 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:14:38,622-Speed 6340.50 samples/sec Loss 7.0909 LearningRate 0.0008 Epoch: 7 Global Step: 159580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:41,868-Speed 6310.58 samples/sec Loss 7.0918 LearningRate 0.0008 Epoch: 7 Global Step: 159590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:45,117-Speed 6305.93 samples/sec Loss 7.0747 LearningRate 0.0008 Epoch: 7 Global Step: 159600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:48,363-Speed 6310.12 samples/sec Loss 
7.0712 LearningRate 0.0008 Epoch: 7 Global Step: 159610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:51,607-Speed 6315.14 samples/sec Loss 7.0301 LearningRate 0.0008 Epoch: 7 Global Step: 159620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:54,858-Speed 6300.02 samples/sec Loss 7.1094 LearningRate 0.0008 Epoch: 7 Global Step: 159630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:14:58,105-Speed 6308.57 samples/sec Loss 7.0075 LearningRate 0.0008 Epoch: 7 Global Step: 159640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:01,352-Speed 6309.22 samples/sec Loss 7.0720 LearningRate 0.0008 Epoch: 7 Global Step: 159650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:04,597-Speed 6314.05 samples/sec Loss 6.9882 LearningRate 0.0008 Epoch: 7 Global Step: 159660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:07,840-Speed 6316.10 samples/sec Loss 7.0622 LearningRate 0.0008 Epoch: 7 Global Step: 159670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:11,084-Speed 6314.39 samples/sec Loss 7.1251 LearningRate 0.0008 Epoch: 7 Global Step: 159680 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:15:14,341-Speed 6288.13 samples/sec Loss 7.1001 LearningRate 0.0008 Epoch: 7 Global Step: 159690 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:15:17,576-Speed 6333.48 samples/sec Loss 7.0557 LearningRate 0.0008 Epoch: 7 Global Step: 159700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:20,821-Speed 6311.38 samples/sec Loss 7.0980 LearningRate 0.0008 Epoch: 7 Global Step: 159710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:24,068-Speed 6308.53 samples/sec Loss 6.9719 LearningRate 0.0008 Epoch: 7 Global Step: 159720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:27,315-Speed 6310.51 samples/sec Loss 7.0570 LearningRate 0.0008 Epoch: 7 Global 
Step: 159730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:30,561-Speed 6309.59 samples/sec Loss 7.0872 LearningRate 0.0008 Epoch: 7 Global Step: 159740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:33,806-Speed 6313.61 samples/sec Loss 7.0449 LearningRate 0.0008 Epoch: 7 Global Step: 159750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:37,049-Speed 6316.17 samples/sec Loss 7.0598 LearningRate 0.0008 Epoch: 7 Global Step: 159760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:40,329-Speed 6245.04 samples/sec Loss 7.1096 LearningRate 0.0008 Epoch: 7 Global Step: 159770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:43,605-Speed 6254.22 samples/sec Loss 7.0454 LearningRate 0.0008 Epoch: 7 Global Step: 159780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:46,852-Speed 6307.66 samples/sec Loss 7.0904 LearningRate 0.0008 Epoch: 7 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:50,099-Speed 6309.89 samples/sec Loss 7.0817 LearningRate 0.0008 Epoch: 7 Global Step: 159800 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:15:53,344-Speed 6311.67 samples/sec Loss 7.0619 LearningRate 0.0008 Epoch: 7 Global Step: 159810 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:15:56,574-Speed 6343.45 samples/sec Loss 7.1326 LearningRate 0.0008 Epoch: 7 Global Step: 159820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:15:59,823-Speed 6304.66 samples/sec Loss 7.1269 LearningRate 0.0008 Epoch: 7 Global Step: 159830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:03,072-Speed 6304.97 samples/sec Loss 6.9843 LearningRate 0.0008 Epoch: 7 Global Step: 159840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:06,317-Speed 6312.27 samples/sec Loss 7.1134 LearningRate 0.0008 Epoch: 7 Global Step: 159850 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:16:09,565-Speed 6305.84 samples/sec Loss 7.0595 LearningRate 0.0008 Epoch: 7 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:12,813-Speed 6307.59 samples/sec Loss 7.1448 LearningRate 0.0008 Epoch: 7 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:16,065-Speed 6300.29 samples/sec Loss 7.1393 LearningRate 0.0008 Epoch: 7 Global Step: 159880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:19,309-Speed 6313.57 samples/sec Loss 7.0972 LearningRate 0.0008 Epoch: 7 Global Step: 159890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:22,566-Speed 6289.93 samples/sec Loss 7.0940 LearningRate 0.0008 Epoch: 7 Global Step: 159900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:25,817-Speed 6302.13 samples/sec Loss 7.0687 LearningRate 0.0008 Epoch: 7 Global Step: 159910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:29,068-Speed 6299.18 samples/sec Loss 7.0436 LearningRate 0.0008 Epoch: 7 Global Step: 159920 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:16:32,313-Speed 6313.03 samples/sec Loss 7.0381 LearningRate 0.0008 Epoch: 7 Global Step: 159930 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:16:35,559-Speed 6310.88 samples/sec Loss 7.0754 LearningRate 0.0008 Epoch: 7 Global Step: 159940 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:16:38,803-Speed 6314.60 samples/sec Loss 7.1118 LearningRate 0.0008 Epoch: 7 Global Step: 159950 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:16:42,051-Speed 6306.45 samples/sec Loss 7.1489 LearningRate 0.0008 Epoch: 7 Global Step: 159960 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:16:45,298-Speed 6310.92 samples/sec Loss 7.0576 LearningRate 0.0008 Epoch: 7 Global Step: 159970 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
06:16:48,528-Speed 6341.43 samples/sec Loss 7.0927 LearningRate 0.0008 Epoch: 7 Global Step: 159980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:51,776-Speed 6306.66 samples/sec Loss 7.0834 LearningRate 0.0008 Epoch: 7 Global Step: 159990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:55,021-Speed 6312.94 samples/sec Loss 7.1181 LearningRate 0.0008 Epoch: 7 Global Step: 160000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:16:58,269-Speed 6307.42 samples/sec Loss 7.1450 LearningRate 0.0008 Epoch: 7 Global Step: 160010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:01,517-Speed 6305.32 samples/sec Loss 7.0496 LearningRate 0.0008 Epoch: 7 Global Step: 160020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:04,762-Speed 6314.16 samples/sec Loss 7.1397 LearningRate 0.0008 Epoch: 7 Global Step: 160030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:08,006-Speed 6313.73 samples/sec Loss 7.0550 LearningRate 0.0008 Epoch: 7 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:11,264-Speed 6287.35 samples/sec Loss 7.1530 LearningRate 0.0008 Epoch: 7 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:14,509-Speed 6311.77 samples/sec Loss 7.1046 LearningRate 0.0008 Epoch: 7 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:17,759-Speed 6303.24 samples/sec Loss 7.0502 LearningRate 0.0008 Epoch: 7 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:21,007-Speed 6307.73 samples/sec Loss 7.0260 LearningRate 0.0008 Epoch: 7 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:24,253-Speed 6309.66 samples/sec Loss 7.0677 LearningRate 0.0008 Epoch: 7 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:27,502-Speed 6304.83 samples/sec Loss 
7.1186 LearningRate 0.0008 Epoch: 7 Global Step: 160100 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:30,753-Speed 6300.65 samples/sec Loss 7.1269 LearningRate 0.0008 Epoch: 7 Global Step: 160110 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:34,001-Speed 6308.63 samples/sec Loss 7.0902 LearningRate 0.0008 Epoch: 7 Global Step: 160120 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:37,249-Speed 6305.93 samples/sec Loss 7.0930 LearningRate 0.0008 Epoch: 7 Global Step: 160130 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:40,498-Speed 6305.07 samples/sec Loss 7.1399 LearningRate 0.0008 Epoch: 7 Global Step: 160140 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:17:43,733-Speed 6331.59 samples/sec Loss 7.0388 LearningRate 0.0008 Epoch: 7 Global Step: 160150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:46,977-Speed 6313.86 samples/sec Loss 7.0476 LearningRate 0.0008 Epoch: 7 Global Step: 160160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:50,224-Speed 6309.44 samples/sec Loss 7.0856 LearningRate 0.0008 Epoch: 7 Global Step: 160170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:53,473-Speed 6304.02 samples/sec Loss 7.0526 LearningRate 0.0008 Epoch: 7 Global Step: 160180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:56,722-Speed 6306.45 samples/sec Loss 6.9985 LearningRate 0.0008 Epoch: 7 Global Step: 160190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:17:59,969-Speed 6309.04 samples/sec Loss 7.1610 LearningRate 0.0008 Epoch: 7 Global Step: 160200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:03,215-Speed 6311.49 samples/sec Loss 7.1234 LearningRate 0.0008 Epoch: 7 Global Step: 160210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:06,463-Speed 6306.20 samples/sec Loss 7.1427 LearningRate 0.0008 Epoch: 7 Global 
Step: 160220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:09,713-Speed 6302.59 samples/sec Loss 7.0320 LearningRate 0.0008 Epoch: 7 Global Step: 160230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:12,960-Speed 6309.86 samples/sec Loss 7.1235 LearningRate 0.0008 Epoch: 7 Global Step: 160240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:16,209-Speed 6304.42 samples/sec Loss 7.0289 LearningRate 0.0008 Epoch: 7 Global Step: 160250 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:18:19,442-Speed 6336.31 samples/sec Loss 6.9993 LearningRate 0.0008 Epoch: 7 Global Step: 160260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:22,701-Speed 6284.09 samples/sec Loss 7.0557 LearningRate 0.0008 Epoch: 7 Global Step: 160270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:25,942-Speed 6320.51 samples/sec Loss 7.1463 LearningRate 0.0008 Epoch: 7 Global Step: 160280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:29,188-Speed 6311.50 samples/sec Loss 7.1171 LearningRate 0.0008 Epoch: 7 Global Step: 160290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:32,447-Speed 6284.93 samples/sec Loss 7.0853 LearningRate 0.0008 Epoch: 7 Global Step: 160300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:35,694-Speed 6309.65 samples/sec Loss 7.0583 LearningRate 0.0008 Epoch: 7 Global Step: 160310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:38,940-Speed 6310.33 samples/sec Loss 7.0648 LearningRate 0.0008 Epoch: 7 Global Step: 160320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:42,184-Speed 6315.20 samples/sec Loss 7.1589 LearningRate 0.0008 Epoch: 7 Global Step: 160330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:45,432-Speed 6306.97 samples/sec Loss 7.0888 LearningRate 0.0008 Epoch: 7 Global Step: 160340 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:18:48,679-Speed 6309.00 samples/sec Loss 7.0995 LearningRate 0.0008 Epoch: 7 Global Step: 160350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:51,922-Speed 6314.63 samples/sec Loss 7.0335 LearningRate 0.0008 Epoch: 7 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:18:55,150-Speed 6347.02 samples/sec Loss 7.0582 LearningRate 0.0008 Epoch: 7 Global Step: 160370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:18:58,393-Speed 6316.95 samples/sec Loss 7.0126 LearningRate 0.0008 Epoch: 7 Global Step: 160380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:01,642-Speed 6306.54 samples/sec Loss 7.1654 LearningRate 0.0008 Epoch: 7 Global Step: 160390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:04,887-Speed 6313.85 samples/sec Loss 7.1476 LearningRate 0.0008 Epoch: 7 Global Step: 160400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:08,133-Speed 6311.49 samples/sec Loss 7.1100 LearningRate 0.0008 Epoch: 7 Global Step: 160410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:11,392-Speed 6286.23 samples/sec Loss 7.1006 LearningRate 0.0008 Epoch: 7 Global Step: 160420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:14,637-Speed 6312.51 samples/sec Loss 6.9863 LearningRate 0.0008 Epoch: 7 Global Step: 160430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:17,881-Speed 6312.79 samples/sec Loss 7.1859 LearningRate 0.0008 Epoch: 7 Global Step: 160440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:21,127-Speed 6311.38 samples/sec Loss 7.0576 LearningRate 0.0008 Epoch: 7 Global Step: 160450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:24,372-Speed 6313.40 samples/sec Loss 7.1609 LearningRate 0.0008 Epoch: 7 Global Step: 160460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:19:27,616-Speed 6313.57 samples/sec Loss 7.0461 LearningRate 0.0008 Epoch: 7 Global Step: 160470 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:19:30,862-Speed 6311.64 samples/sec Loss 7.1084 LearningRate 0.0008 Epoch: 7 Global Step: 160480 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:19:34,096-Speed 6333.49 samples/sec Loss 7.1421 LearningRate 0.0008 Epoch: 7 Global Step: 160490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:37,341-Speed 6313.65 samples/sec Loss 7.0937 LearningRate 0.0008 Epoch: 7 Global Step: 160500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:40,589-Speed 6307.23 samples/sec Loss 7.0494 LearningRate 0.0008 Epoch: 7 Global Step: 160510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:43,833-Speed 6313.18 samples/sec Loss 7.0113 LearningRate 0.0008 Epoch: 7 Global Step: 160520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:47,076-Speed 6316.50 samples/sec Loss 7.0830 LearningRate 0.0008 Epoch: 7 Global Step: 160530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:50,325-Speed 6305.05 samples/sec Loss 7.0112 LearningRate 0.0008 Epoch: 7 Global Step: 160540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:53,572-Speed 6309.40 samples/sec Loss 7.0843 LearningRate 0.0008 Epoch: 7 Global Step: 160550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:19:56,817-Speed 6311.99 samples/sec Loss 7.0552 LearningRate 0.0008 Epoch: 7 Global Step: 160560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:00,061-Speed 6314.91 samples/sec Loss 7.0583 LearningRate 0.0008 Epoch: 7 Global Step: 160570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:03,311-Speed 6302.30 samples/sec Loss 7.0154 LearningRate 0.0008 Epoch: 7 Global Step: 160580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:06,557-Speed 6311.75 samples/sec Loss 
7.0253 LearningRate 0.0008 Epoch: 7 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:20:09,785-Speed 6344.74 samples/sec Loss 7.0482 LearningRate 0.0008 Epoch: 7 Global Step: 160600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:13,030-Speed 6313.43 samples/sec Loss 7.0342 LearningRate 0.0008 Epoch: 7 Global Step: 160610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:16,292-Speed 6279.61 samples/sec Loss 7.0383 LearningRate 0.0008 Epoch: 7 Global Step: 160620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:19,541-Speed 6304.63 samples/sec Loss 7.0587 LearningRate 0.0008 Epoch: 7 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:22,790-Speed 6306.20 samples/sec Loss 7.1308 LearningRate 0.0008 Epoch: 7 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:26,088-Speed 6210.94 samples/sec Loss 7.0432 LearningRate 0.0008 Epoch: 7 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:29,336-Speed 6307.15 samples/sec Loss 7.1125 LearningRate 0.0008 Epoch: 7 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:32,584-Speed 6306.85 samples/sec Loss 7.1670 LearningRate 0.0008 Epoch: 7 Global Step: 160670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:35,830-Speed 6309.82 samples/sec Loss 7.0740 LearningRate 0.0008 Epoch: 7 Global Step: 160680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:39,089-Speed 6286.75 samples/sec Loss 7.0150 LearningRate 0.0008 Epoch: 7 Global Step: 160690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:42,337-Speed 6306.92 samples/sec Loss 7.0735 LearningRate 0.0008 Epoch: 7 Global Step: 160700 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:20:45,588-Speed 6301.24 samples/sec Loss 7.1291 LearningRate 0.0008 Epoch: 7 Global 
Step: 160710 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:20:48,819-Speed 6339.52 samples/sec Loss 7.0514 LearningRate 0.0008 Epoch: 7 Global Step: 160720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:52,065-Speed 6309.78 samples/sec Loss 7.0852 LearningRate 0.0008 Epoch: 7 Global Step: 160730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:55,307-Speed 6319.12 samples/sec Loss 7.0221 LearningRate 0.0008 Epoch: 7 Global Step: 160740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:20:58,585-Speed 6248.49 samples/sec Loss 7.0515 LearningRate 0.0008 Epoch: 7 Global Step: 160750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:01,890-Speed 6197.78 samples/sec Loss 7.0481 LearningRate 0.0008 Epoch: 7 Global Step: 160760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:05,135-Speed 6313.71 samples/sec Loss 7.1277 LearningRate 0.0008 Epoch: 7 Global Step: 160770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:08,382-Speed 6309.24 samples/sec Loss 7.0999 LearningRate 0.0008 Epoch: 7 Global Step: 160780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:11,628-Speed 6309.52 samples/sec Loss 7.0458 LearningRate 0.0008 Epoch: 7 Global Step: 160790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:14,873-Speed 6313.76 samples/sec Loss 7.0681 LearningRate 0.0008 Epoch: 7 Global Step: 160800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:18,118-Speed 6310.61 samples/sec Loss 7.0328 LearningRate 0.0008 Epoch: 7 Global Step: 160810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:21,368-Speed 6305.11 samples/sec Loss 6.9989 LearningRate 0.0008 Epoch: 7 Global Step: 160820 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:21:24,614-Speed 6309.58 samples/sec Loss 7.0319 LearningRate 0.0008 Epoch: 7 Global Step: 160830 Fp16 Grad Scale: 65536 
Required: 61 hours Training: 2022-04-01 06:21:27,863-Speed 6306.90 samples/sec Loss 7.0728 LearningRate 0.0008 Epoch: 7 Global Step: 160840 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:21:31,110-Speed 6307.99 samples/sec Loss 7.0252 LearningRate 0.0008 Epoch: 7 Global Step: 160850 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:21:34,357-Speed 6308.90 samples/sec Loss 7.0551 LearningRate 0.0008 Epoch: 7 Global Step: 160860 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:21:37,591-Speed 6334.67 samples/sec Loss 7.0823 LearningRate 0.0008 Epoch: 7 Global Step: 160870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:40,845-Speed 6295.54 samples/sec Loss 7.0315 LearningRate 0.0008 Epoch: 7 Global Step: 160880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:44,090-Speed 6312.05 samples/sec Loss 7.0885 LearningRate 0.0008 Epoch: 7 Global Step: 160890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:47,341-Speed 6299.92 samples/sec Loss 7.1117 LearningRate 0.0008 Epoch: 7 Global Step: 160900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:50,585-Speed 6316.13 samples/sec Loss 7.0468 LearningRate 0.0008 Epoch: 7 Global Step: 160910 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:53,833-Speed 6305.22 samples/sec Loss 7.0273 LearningRate 0.0008 Epoch: 7 Global Step: 160920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:21:57,076-Speed 6316.95 samples/sec Loss 7.0736 LearningRate 0.0008 Epoch: 7 Global Step: 160930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:00,328-Speed 6299.47 samples/sec Loss 7.0379 LearningRate 0.0008 Epoch: 7 Global Step: 160940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:03,577-Speed 6305.07 samples/sec Loss 7.0382 LearningRate 0.0008 Epoch: 7 Global Step: 160950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:22:06,819-Speed 6317.00 samples/sec Loss 7.0333 LearningRate 0.0008 Epoch: 7 Global Step: 160960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:10,064-Speed 6314.24 samples/sec Loss 7.0190 LearningRate 0.0008 Epoch: 7 Global Step: 160970 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:22:13,312-Speed 6305.62 samples/sec Loss 7.0366 LearningRate 0.0008 Epoch: 7 Global Step: 160980 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:22:16,561-Speed 6304.81 samples/sec Loss 7.1050 LearningRate 0.0008 Epoch: 7 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:22:19,813-Speed 6298.66 samples/sec Loss 7.0311 LearningRate 0.0008 Epoch: 7 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:22:23,048-Speed 6332.77 samples/sec Loss 7.0545 LearningRate 0.0008 Epoch: 7 Global Step: 161010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:26,296-Speed 6307.16 samples/sec Loss 6.9974 LearningRate 0.0008 Epoch: 7 Global Step: 161020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:29,544-Speed 6306.95 samples/sec Loss 7.0337 LearningRate 0.0008 Epoch: 7 Global Step: 161030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:32,793-Speed 6306.79 samples/sec Loss 7.1147 LearningRate 0.0008 Epoch: 7 Global Step: 161040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:36,040-Speed 6308.43 samples/sec Loss 7.0998 LearningRate 0.0008 Epoch: 7 Global Step: 161050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:39,287-Speed 6308.41 samples/sec Loss 7.1207 LearningRate 0.0008 Epoch: 7 Global Step: 161060 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:42,534-Speed 6309.78 samples/sec Loss 7.0870 LearningRate 0.0008 Epoch: 7 Global Step: 161070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:45,786-Speed 6298.18 samples/sec Loss 
7.0636 LearningRate 0.0008 Epoch: 7 Global Step: 161080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:49,036-Speed 6302.52 samples/sec Loss 7.1244 LearningRate 0.0008 Epoch: 7 Global Step: 161090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:52,284-Speed 6308.27 samples/sec Loss 7.1432 LearningRate 0.0008 Epoch: 7 Global Step: 161100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:22:55,539-Speed 6291.99 samples/sec Loss 7.0241 LearningRate 0.0008 Epoch: 7 Global Step: 161110 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:22:58,786-Speed 6309.67 samples/sec Loss 6.9551 LearningRate 0.0008 Epoch: 7 Global Step: 161120 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:02,034-Speed 6306.99 samples/sec Loss 7.0548 LearningRate 0.0008 Epoch: 7 Global Step: 161130 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:05,270-Speed 6330.06 samples/sec Loss 7.0949 LearningRate 0.0008 Epoch: 7 Global Step: 161140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:08,516-Speed 6311.03 samples/sec Loss 7.0333 LearningRate 0.0008 Epoch: 7 Global Step: 161150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:11,764-Speed 6306.84 samples/sec Loss 7.1038 LearningRate 0.0008 Epoch: 7 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:15,018-Speed 6295.49 samples/sec Loss 7.0194 LearningRate 0.0008 Epoch: 7 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:18,263-Speed 6311.71 samples/sec Loss 7.0396 LearningRate 0.0008 Epoch: 7 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:21,511-Speed 6306.52 samples/sec Loss 7.0644 LearningRate 0.0008 Epoch: 7 Global Step: 161190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:24,760-Speed 6305.20 samples/sec Loss 7.0348 LearningRate 0.0008 Epoch: 7 Global 
Step: 161200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:28,010-Speed 6302.61 samples/sec Loss 7.0513 LearningRate 0.0008 Epoch: 7 Global Step: 161210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:31,257-Speed 6309.28 samples/sec Loss 7.0468 LearningRate 0.0008 Epoch: 7 Global Step: 161220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:34,506-Speed 6303.95 samples/sec Loss 7.1235 LearningRate 0.0008 Epoch: 7 Global Step: 161230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:23:37,752-Speed 6310.98 samples/sec Loss 7.1473 LearningRate 0.0008 Epoch: 7 Global Step: 161240 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:41,000-Speed 6306.26 samples/sec Loss 7.0345 LearningRate 0.0008 Epoch: 7 Global Step: 161250 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:44,247-Speed 6310.59 samples/sec Loss 7.1147 LearningRate 0.0008 Epoch: 7 Global Step: 161260 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:47,492-Speed 6312.92 samples/sec Loss 7.1138 LearningRate 0.0008 Epoch: 7 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:50,742-Speed 6304.05 samples/sec Loss 7.0823 LearningRate 0.0008 Epoch: 7 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:53,989-Speed 6308.22 samples/sec Loss 7.0921 LearningRate 0.0008 Epoch: 7 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:23:57,223-Speed 6333.50 samples/sec Loss 7.0248 LearningRate 0.0008 Epoch: 7 Global Step: 161300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:00,471-Speed 6307.19 samples/sec Loss 7.1245 LearningRate 0.0008 Epoch: 7 Global Step: 161310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:03,717-Speed 6310.55 samples/sec Loss 7.1371 LearningRate 0.0008 Epoch: 7 Global Step: 161320 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:24:06,965-Speed 6306.98 samples/sec Loss 7.0812 LearningRate 0.0008 Epoch: 7 Global Step: 161330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:10,209-Speed 6313.33 samples/sec Loss 7.0864 LearningRate 0.0008 Epoch: 7 Global Step: 161340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:13,458-Speed 6306.91 samples/sec Loss 7.1585 LearningRate 0.0008 Epoch: 7 Global Step: 161350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:16,702-Speed 6313.23 samples/sec Loss 7.1212 LearningRate 0.0008 Epoch: 7 Global Step: 161360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:19,947-Speed 6312.33 samples/sec Loss 7.1138 LearningRate 0.0008 Epoch: 7 Global Step: 161370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:23,197-Speed 6304.66 samples/sec Loss 7.0133 LearningRate 0.0008 Epoch: 7 Global Step: 161380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:26,440-Speed 6315.86 samples/sec Loss 7.0509 LearningRate 0.0008 Epoch: 7 Global Step: 161390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:29,688-Speed 6305.91 samples/sec Loss 7.0402 LearningRate 0.0008 Epoch: 7 Global Step: 161400 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:24:32,938-Speed 6303.16 samples/sec Loss 7.0949 LearningRate 0.0008 Epoch: 7 Global Step: 161410 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:24:36,182-Speed 6314.80 samples/sec Loss 7.0606 LearningRate 0.0008 Epoch: 7 Global Step: 161420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:24:39,428-Speed 6309.55 samples/sec Loss 7.0304 LearningRate 0.0008 Epoch: 7 Global Step: 161430 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:24:42,662-Speed 6336.06 samples/sec Loss 7.0613 LearningRate 0.0008 Epoch: 7 Global Step: 161440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:24:45,912-Speed 6302.10 samples/sec Loss 7.0872 LearningRate 0.0008 Epoch: 7 Global Step: 161450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:49,160-Speed 6307.06 samples/sec Loss 7.1154 LearningRate 0.0008 Epoch: 7 Global Step: 161460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:52,414-Speed 6295.64 samples/sec Loss 7.1726 LearningRate 0.0008 Epoch: 7 Global Step: 161470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:55,666-Speed 6299.96 samples/sec Loss 7.0946 LearningRate 0.0008 Epoch: 7 Global Step: 161480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:24:58,910-Speed 6313.27 samples/sec Loss 7.0371 LearningRate 0.0008 Epoch: 7 Global Step: 161490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:02,166-Speed 6292.54 samples/sec Loss 7.0822 LearningRate 0.0008 Epoch: 7 Global Step: 161500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:05,417-Speed 6301.24 samples/sec Loss 7.0570 LearningRate 0.0008 Epoch: 7 Global Step: 161510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:08,662-Speed 6312.67 samples/sec Loss 7.0908 LearningRate 0.0008 Epoch: 7 Global Step: 161520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:11,913-Speed 6300.12 samples/sec Loss 7.0287 LearningRate 0.0008 Epoch: 7 Global Step: 161530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:15,167-Speed 6296.11 samples/sec Loss 7.1868 LearningRate 0.0008 Epoch: 7 Global Step: 161540 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:25:18,421-Speed 6294.91 samples/sec Loss 7.1186 LearningRate 0.0008 Epoch: 7 Global Step: 161550 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:25:21,658-Speed 6327.33 samples/sec Loss 7.1011 LearningRate 0.0008 Epoch: 7 Global Step: 161560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:24,904-Speed 6310.71 samples/sec Loss 
7.0975 LearningRate 0.0008 Epoch: 7 Global Step: 161570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:28,154-Speed 6303.40 samples/sec Loss 7.0474 LearningRate 0.0008 Epoch: 7 Global Step: 161580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:31,404-Speed 6302.46 samples/sec Loss 6.9989 LearningRate 0.0008 Epoch: 7 Global Step: 161590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:34,655-Speed 6300.89 samples/sec Loss 7.0292 LearningRate 0.0008 Epoch: 7 Global Step: 161600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:37,916-Speed 6281.28 samples/sec Loss 7.0909 LearningRate 0.0008 Epoch: 7 Global Step: 161610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:41,172-Speed 6292.01 samples/sec Loss 7.0025 LearningRate 0.0008 Epoch: 7 Global Step: 161620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:44,421-Speed 6304.35 samples/sec Loss 7.1038 LearningRate 0.0008 Epoch: 7 Global Step: 161630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:47,667-Speed 6311.90 samples/sec Loss 7.0702 LearningRate 0.0008 Epoch: 7 Global Step: 161640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:50,914-Speed 6309.09 samples/sec Loss 6.9571 LearningRate 0.0008 Epoch: 7 Global Step: 161650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:25:54,159-Speed 6312.11 samples/sec Loss 7.0319 LearningRate 0.0008 Epoch: 7 Global Step: 161660 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:25:57,409-Speed 6302.76 samples/sec Loss 6.9952 LearningRate 0.0008 Epoch: 7 Global Step: 161670 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:26:00,662-Speed 6297.53 samples/sec Loss 7.1307 LearningRate 0.0008 Epoch: 7 Global Step: 161680 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:26:03,909-Speed 6309.03 samples/sec Loss 7.0676 LearningRate 0.0008 Epoch: 7 Global 
Step: 161690 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:26:07,157-Speed 6306.00 samples/sec Loss 7.1154 LearningRate 0.0008 Epoch: 7 Global Step: 161700 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:26:10,393-Speed 6330.87 samples/sec Loss 6.9882 LearningRate 0.0008 Epoch: 7 Global Step: 161710 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:13,644-Speed 6300.79 samples/sec Loss 7.0212 LearningRate 0.0008 Epoch: 7 Global Step: 161720 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:16,890-Speed 6310.57 samples/sec Loss 7.0399 LearningRate 0.0008 Epoch: 7 Global Step: 161730 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:20,138-Speed 6307.59 samples/sec Loss 7.0507 LearningRate 0.0008 Epoch: 7 Global Step: 161740 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:23,387-Speed 6304.27 samples/sec Loss 7.1378 LearningRate 0.0008 Epoch: 7 Global Step: 161750 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:26,678-Speed 6224.89 samples/sec Loss 7.0920 LearningRate 0.0008 Epoch: 7 Global Step: 161760 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:29,929-Speed 6300.07 samples/sec Loss 7.0178 LearningRate 0.0008 Epoch: 7 Global Step: 161770 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:33,179-Speed 6303.21 samples/sec Loss 6.9795 LearningRate 0.0008 Epoch: 7 Global Step: 161780 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:36,428-Speed 6305.49 samples/sec Loss 7.0162 LearningRate 0.0008 Epoch: 7 Global Step: 161790 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:39,677-Speed 6304.81 samples/sec Loss 6.9896 LearningRate 0.0008 Epoch: 7 Global Step: 161800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:42,910-Speed 6336.36 samples/sec Loss 7.0307 LearningRate 0.0008 Epoch: 7 Global Step: 161810 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:26:46,155-Speed 6312.87 samples/sec Loss 7.0072 LearningRate 0.0008 Epoch: 7 Global Step: 161820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:49,402-Speed 6308.01 samples/sec Loss 7.0452 LearningRate 0.0008 Epoch: 7 Global Step: 161830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:52,649-Speed 6308.05 samples/sec Loss 7.0885 LearningRate 0.0008 Epoch: 7 Global Step: 161840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:55,902-Speed 6297.10 samples/sec Loss 7.0391 LearningRate 0.0008 Epoch: 7 Global Step: 161850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:26:59,149-Speed 6310.04 samples/sec Loss 7.0690 LearningRate 0.0008 Epoch: 7 Global Step: 161860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:02,400-Speed 6300.25 samples/sec Loss 7.0776 LearningRate 0.0008 Epoch: 7 Global Step: 161870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:05,656-Speed 6290.28 samples/sec Loss 7.1018 LearningRate 0.0008 Epoch: 7 Global Step: 161880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:08,902-Speed 6310.57 samples/sec Loss 7.0642 LearningRate 0.0008 Epoch: 7 Global Step: 161890 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:12,151-Speed 6306.90 samples/sec Loss 7.1139 LearningRate 0.0008 Epoch: 7 Global Step: 161900 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:15,400-Speed 6303.48 samples/sec Loss 7.0352 LearningRate 0.0008 Epoch: 7 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:27:18,645-Speed 6312.77 samples/sec Loss 7.0378 LearningRate 0.0008 Epoch: 7 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:27:21,895-Speed 6304.59 samples/sec Loss 6.9482 LearningRate 0.0008 Epoch: 7 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
06:27:25,144-Speed 6305.21 samples/sec Loss 7.0782 LearningRate 0.0008 Epoch: 7 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:27:28,387-Speed 6316.55 samples/sec Loss 7.0485 LearningRate 0.0008 Epoch: 7 Global Step: 161950 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:27:31,622-Speed 6332.79 samples/sec Loss 7.0350 LearningRate 0.0008 Epoch: 7 Global Step: 161960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:34,873-Speed 6299.11 samples/sec Loss 6.9790 LearningRate 0.0008 Epoch: 7 Global Step: 161970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:38,125-Speed 6300.63 samples/sec Loss 7.1199 LearningRate 0.0008 Epoch: 7 Global Step: 161980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:41,373-Speed 6305.48 samples/sec Loss 7.0139 LearningRate 0.0008 Epoch: 7 Global Step: 161990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:44,622-Speed 6305.94 samples/sec Loss 7.0674 LearningRate 0.0008 Epoch: 7 Global Step: 162000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:47,870-Speed 6305.62 samples/sec Loss 7.0654 LearningRate 0.0008 Epoch: 7 Global Step: 162010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:51,120-Speed 6303.77 samples/sec Loss 7.0706 LearningRate 0.0008 Epoch: 7 Global Step: 162020 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:54,366-Speed 6310.84 samples/sec Loss 6.9996 LearningRate 0.0008 Epoch: 7 Global Step: 162030 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:27:57,616-Speed 6303.42 samples/sec Loss 7.1290 LearningRate 0.0008 Epoch: 7 Global Step: 162040 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:00,863-Speed 6308.07 samples/sec Loss 7.0157 LearningRate 0.0008 Epoch: 7 Global Step: 162050 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:04,112-Speed 6304.38 samples/sec Loss 
7.0472 LearningRate 0.0008 Epoch: 7 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:28:07,342-Speed 6343.19 samples/sec Loss 7.0347 LearningRate 0.0008 Epoch: 7 Global Step: 162070 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:10,590-Speed 6305.72 samples/sec Loss 7.0488 LearningRate 0.0008 Epoch: 7 Global Step: 162080 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:13,832-Speed 6318.85 samples/sec Loss 7.0876 LearningRate 0.0008 Epoch: 7 Global Step: 162090 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:17,079-Speed 6308.42 samples/sec Loss 7.0330 LearningRate 0.0008 Epoch: 7 Global Step: 162100 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:20,328-Speed 6304.49 samples/sec Loss 6.9686 LearningRate 0.0008 Epoch: 7 Global Step: 162110 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:23,577-Speed 6304.68 samples/sec Loss 7.0617 LearningRate 0.0008 Epoch: 7 Global Step: 162120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:26,851-Speed 6258.74 samples/sec Loss 7.0350 LearningRate 0.0008 Epoch: 7 Global Step: 162130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:30,214-Speed 6090.88 samples/sec Loss 7.0095 LearningRate 0.0008 Epoch: 7 Global Step: 162140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:33,460-Speed 6311.51 samples/sec Loss 7.0698 LearningRate 0.0008 Epoch: 7 Global Step: 162150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:36,718-Speed 6285.84 samples/sec Loss 7.0035 LearningRate 0.0008 Epoch: 7 Global Step: 162160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:39,966-Speed 6306.51 samples/sec Loss 7.0239 LearningRate 0.0008 Epoch: 7 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:28:43,212-Speed 6311.30 samples/sec Loss 7.0701 LearningRate 0.0008 Epoch: 7 Global 
Step: 162180 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:28:46,462-Speed 6302.64 samples/sec Loss 7.0445 LearningRate 0.0008 Epoch: 7 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:28:49,707-Speed 6313.14 samples/sec Loss 7.0689 LearningRate 0.0008 Epoch: 7 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:28:52,940-Speed 6337.05 samples/sec Loss 7.0530 LearningRate 0.0008 Epoch: 7 Global Step: 162210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:56,190-Speed 6301.35 samples/sec Loss 6.8933 LearningRate 0.0008 Epoch: 7 Global Step: 162220 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:28:59,440-Speed 6303.32 samples/sec Loss 6.9869 LearningRate 0.0008 Epoch: 7 Global Step: 162230 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:02,683-Speed 6316.94 samples/sec Loss 6.9835 LearningRate 0.0008 Epoch: 7 Global Step: 162240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:05,930-Speed 6309.21 samples/sec Loss 7.0354 LearningRate 0.0008 Epoch: 7 Global Step: 162250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:09,176-Speed 6309.74 samples/sec Loss 6.9877 LearningRate 0.0008 Epoch: 7 Global Step: 162260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:12,425-Speed 6306.01 samples/sec Loss 7.0069 LearningRate 0.0008 Epoch: 7 Global Step: 162270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:15,673-Speed 6305.28 samples/sec Loss 7.0304 LearningRate 0.0008 Epoch: 7 Global Step: 162280 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:18,924-Speed 6301.15 samples/sec Loss 7.1470 LearningRate 0.0008 Epoch: 7 Global Step: 162290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:22,177-Speed 6298.44 samples/sec Loss 7.1444 LearningRate 0.0008 Epoch: 7 Global Step: 162300 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:29:25,421-Speed 6312.92 samples/sec Loss 6.9904 LearningRate 0.0008 Epoch: 7 Global Step: 162310 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:29:28,656-Speed 6332.00 samples/sec Loss 7.0933 LearningRate 0.0008 Epoch: 7 Global Step: 162320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:31,901-Speed 6314.16 samples/sec Loss 7.0150 LearningRate 0.0008 Epoch: 7 Global Step: 162330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:35,148-Speed 6308.77 samples/sec Loss 7.0529 LearningRate 0.0008 Epoch: 7 Global Step: 162340 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:38,392-Speed 6315.82 samples/sec Loss 7.0350 LearningRate 0.0008 Epoch: 7 Global Step: 162350 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:41,645-Speed 6295.59 samples/sec Loss 7.0133 LearningRate 0.0008 Epoch: 7 Global Step: 162360 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:44,891-Speed 6310.75 samples/sec Loss 7.1106 LearningRate 0.0008 Epoch: 7 Global Step: 162370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:48,139-Speed 6306.90 samples/sec Loss 7.0772 LearningRate 0.0008 Epoch: 7 Global Step: 162380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:51,385-Speed 6311.09 samples/sec Loss 7.0031 LearningRate 0.0008 Epoch: 7 Global Step: 162390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:54,630-Speed 6312.45 samples/sec Loss 7.0935 LearningRate 0.0008 Epoch: 7 Global Step: 162400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:29:57,880-Speed 6302.43 samples/sec Loss 7.1220 LearningRate 0.0008 Epoch: 7 Global Step: 162410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:01,129-Speed 6305.85 samples/sec Loss 7.0661 LearningRate 0.0008 Epoch: 7 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
06:30:04,409-Speed 6245.40 samples/sec Loss 7.0523 LearningRate 0.0008 Epoch: 7 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:07,658-Speed 6303.99 samples/sec Loss 7.0410 LearningRate 0.0008 Epoch: 7 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:10,926-Speed 6269.06 samples/sec Loss 7.0699 LearningRate 0.0008 Epoch: 7 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:14,158-Speed 6337.85 samples/sec Loss 7.0479 LearningRate 0.0008 Epoch: 7 Global Step: 162460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:17,410-Speed 6298.31 samples/sec Loss 7.0037 LearningRate 0.0008 Epoch: 7 Global Step: 162470 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:20,660-Speed 6304.69 samples/sec Loss 6.9941 LearningRate 0.0008 Epoch: 7 Global Step: 162480 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:23,910-Speed 6301.69 samples/sec Loss 7.0294 LearningRate 0.0008 Epoch: 7 Global Step: 162490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:27,172-Speed 6279.29 samples/sec Loss 7.1352 LearningRate 0.0008 Epoch: 7 Global Step: 162500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:30,420-Speed 6308.18 samples/sec Loss 7.0534 LearningRate 0.0008 Epoch: 7 Global Step: 162510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:33,672-Speed 6297.61 samples/sec Loss 6.9176 LearningRate 0.0008 Epoch: 7 Global Step: 162520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:36,921-Speed 6306.67 samples/sec Loss 7.0699 LearningRate 0.0008 Epoch: 7 Global Step: 162530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:40,169-Speed 6306.00 samples/sec Loss 7.0386 LearningRate 0.0008 Epoch: 7 Global Step: 162540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:43,417-Speed 6308.70 samples/sec Loss 
7.0159 LearningRate 0.0008 Epoch: 7 Global Step: 162550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:30:46,665-Speed 6305.10 samples/sec Loss 7.0292 LearningRate 0.0008 Epoch: 7 Global Step: 162560 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:49,910-Speed 6313.25 samples/sec Loss 7.0887 LearningRate 0.0008 Epoch: 7 Global Step: 162570 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:53,159-Speed 6305.59 samples/sec Loss 7.0976 LearningRate 0.0008 Epoch: 7 Global Step: 162580 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:56,406-Speed 6307.99 samples/sec Loss 6.9819 LearningRate 0.0008 Epoch: 7 Global Step: 162590 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:30:59,657-Speed 6301.85 samples/sec Loss 7.0618 LearningRate 0.0008 Epoch: 7 Global Step: 162600 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:02,890-Speed 6335.97 samples/sec Loss 6.9924 LearningRate 0.0008 Epoch: 7 Global Step: 162610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:06,135-Speed 6312.43 samples/sec Loss 7.1214 LearningRate 0.0008 Epoch: 7 Global Step: 162620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:09,379-Speed 6315.01 samples/sec Loss 7.0144 LearningRate 0.0008 Epoch: 7 Global Step: 162630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:12,626-Speed 6307.59 samples/sec Loss 7.0547 LearningRate 0.0008 Epoch: 7 Global Step: 162640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:15,872-Speed 6310.92 samples/sec Loss 6.9489 LearningRate 0.0008 Epoch: 7 Global Step: 162650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:19,120-Speed 6306.81 samples/sec Loss 7.0029 LearningRate 0.0008 Epoch: 7 Global Step: 162660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:22,377-Speed 6290.02 samples/sec Loss 6.9466 LearningRate 0.0008 Epoch: 7 Global 
Step: 162670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:25,626-Speed 6305.46 samples/sec Loss 7.0274 LearningRate 0.0008 Epoch: 7 Global Step: 162680 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:28,874-Speed 6306.46 samples/sec Loss 7.0238 LearningRate 0.0008 Epoch: 7 Global Step: 162690 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:32,123-Speed 6304.47 samples/sec Loss 7.0960 LearningRate 0.0008 Epoch: 7 Global Step: 162700 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:31:35,369-Speed 6311.40 samples/sec Loss 7.0778 LearningRate 0.0008 Epoch: 7 Global Step: 162710 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:38,611-Speed 6318.32 samples/sec Loss 7.0276 LearningRate 0.0008 Epoch: 7 Global Step: 162720 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:41,858-Speed 6308.84 samples/sec Loss 7.0269 LearningRate 0.0008 Epoch: 7 Global Step: 162730 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:45,102-Speed 6315.35 samples/sec Loss 6.9853 LearningRate 0.0008 Epoch: 7 Global Step: 162740 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:48,351-Speed 6304.79 samples/sec Loss 7.0512 LearningRate 0.0008 Epoch: 7 Global Step: 162750 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:51,599-Speed 6306.99 samples/sec Loss 7.0804 LearningRate 0.0008 Epoch: 7 Global Step: 162760 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:54,843-Speed 6314.49 samples/sec Loss 7.0586 LearningRate 0.0008 Epoch: 7 Global Step: 162770 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:31:58,092-Speed 6304.57 samples/sec Loss 7.0363 LearningRate 0.0008 Epoch: 7 Global Step: 162780 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:32:01,331-Speed 6324.14 samples/sec Loss 7.0371 LearningRate 0.0008 Epoch: 7 Global Step: 162790 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:32:04,578-Speed 6308.86 samples/sec Loss 7.0708 LearningRate 0.0008 Epoch: 7 Global Step: 162800 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:07,827-Speed 6305.59 samples/sec Loss 6.9997 LearningRate 0.0008 Epoch: 7 Global Step: 162810 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:11,070-Speed 6315.03 samples/sec Loss 7.0155 LearningRate 0.0008 Epoch: 7 Global Step: 162820 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:14,317-Speed 6309.64 samples/sec Loss 7.0409 LearningRate 0.0008 Epoch: 7 Global Step: 162830 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:17,562-Speed 6313.00 samples/sec Loss 7.0512 LearningRate 0.0008 Epoch: 7 Global Step: 162840 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:20,806-Speed 6314.54 samples/sec Loss 6.9347 LearningRate 0.0008 Epoch: 7 Global Step: 162850 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:24,049-Speed 6317.28 samples/sec Loss 6.9505 LearningRate 0.0008 Epoch: 7 Global Step: 162860 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:27,300-Speed 6299.76 samples/sec Loss 6.9952 LearningRate 0.0008 Epoch: 7 Global Step: 162870 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:30,544-Speed 6315.20 samples/sec Loss 7.0406 LearningRate 0.0008 Epoch: 7 Global Step: 162880 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:33,805-Speed 6281.64 samples/sec Loss 7.0816 LearningRate 0.0008 Epoch: 7 Global Step: 162890 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:32:37,054-Speed 6304.54 samples/sec Loss 7.0275 LearningRate 0.0008 Epoch: 7 Global Step: 162900 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:32:40,300-Speed 6309.69 samples/sec Loss 7.0495 LearningRate 0.0008 Epoch: 7 Global Step: 162910 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 
06:32:43,537-Speed 6328.29 samples/sec Loss 7.0317 LearningRate 0.0008 Epoch: 7 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:46,791-Speed 6295.77 samples/sec Loss 7.0558 LearningRate 0.0008 Epoch: 7 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:50,042-Speed 6302.03 samples/sec Loss 7.0577 LearningRate 0.0008 Epoch: 7 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:53,357-Speed 6179.57 samples/sec Loss 6.9449 LearningRate 0.0008 Epoch: 7 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:56,605-Speed 6307.28 samples/sec Loss 7.0925 LearningRate 0.0008 Epoch: 7 Global Step: 162960 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:32:59,851-Speed 6310.90 samples/sec Loss 7.0293 LearningRate 0.0008 Epoch: 7 Global Step: 162970 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:03,097-Speed 6310.99 samples/sec Loss 7.0060 LearningRate 0.0008 Epoch: 7 Global Step: 162980 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:06,342-Speed 6311.09 samples/sec Loss 7.0339 LearningRate 0.0008 Epoch: 7 Global Step: 162990 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:09,591-Speed 6304.82 samples/sec Loss 7.0037 LearningRate 0.0008 Epoch: 7 Global Step: 163000 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:12,839-Speed 6308.19 samples/sec Loss 7.0395 LearningRate 0.0008 Epoch: 7 Global Step: 163010 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:16,092-Speed 6295.92 samples/sec Loss 7.0425 LearningRate 0.0008 Epoch: 7 Global Step: 163020 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:19,341-Speed 6306.16 samples/sec Loss 7.0160 LearningRate 0.0008 Epoch: 7 Global Step: 163030 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:22,591-Speed 6302.99 samples/sec Loss 
7.0503 LearningRate 0.0008 Epoch: 7 Global Step: 163040 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:25,841-Speed 6312.57 samples/sec Loss 7.0528 LearningRate 0.0008 Epoch: 7 Global Step: 163050 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:29,093-Speed 6298.16 samples/sec Loss 7.0149 LearningRate 0.0008 Epoch: 7 Global Step: 163060 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:32,354-Speed 6281.23 samples/sec Loss 7.0740 LearningRate 0.0008 Epoch: 7 Global Step: 163070 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:35,600-Speed 6311.12 samples/sec Loss 7.0626 LearningRate 0.0008 Epoch: 7 Global Step: 163080 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:38,846-Speed 6310.37 samples/sec Loss 7.0151 LearningRate 0.0008 Epoch: 7 Global Step: 163090 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:42,099-Speed 6296.96 samples/sec Loss 6.9756 LearningRate 0.0008 Epoch: 7 Global Step: 163100 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:45,344-Speed 6313.71 samples/sec Loss 6.9540 LearningRate 0.0008 Epoch: 7 Global Step: 163110 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:33:48,578-Speed 6333.53 samples/sec Loss 7.0419 LearningRate 0.0008 Epoch: 7 Global Step: 163120 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:51,831-Speed 6297.25 samples/sec Loss 7.0087 LearningRate 0.0008 Epoch: 7 Global Step: 163130 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:55,078-Speed 6309.56 samples/sec Loss 7.0135 LearningRate 0.0008 Epoch: 7 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:33:58,324-Speed 6311.19 samples/sec Loss 7.0645 LearningRate 0.0008 Epoch: 7 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:01,584-Speed 6283.90 samples/sec Loss 6.9861 LearningRate 0.0008 Epoch: 7 Global 
Step: 163160 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:04,834-Speed 6301.19 samples/sec Loss 7.0207 LearningRate 0.0008 Epoch: 7 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:08,081-Speed 6310.24 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 7 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:11,328-Speed 6309.35 samples/sec Loss 7.1031 LearningRate 0.0008 Epoch: 7 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:14,575-Speed 6306.85 samples/sec Loss 7.0980 LearningRate 0.0008 Epoch: 7 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:17,826-Speed 6306.46 samples/sec Loss 7.0606 LearningRate 0.0008 Epoch: 7 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:21,074-Speed 6306.40 samples/sec Loss 7.0010 LearningRate 0.0008 Epoch: 7 Global Step: 163220 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:34:24,324-Speed 6302.19 samples/sec Loss 7.0569 LearningRate 0.0008 Epoch: 7 Global Step: 163230 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:34:27,555-Speed 6339.76 samples/sec Loss 7.0851 LearningRate 0.0008 Epoch: 7 Global Step: 163240 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:30,801-Speed 6312.17 samples/sec Loss 7.1043 LearningRate 0.0008 Epoch: 7 Global Step: 163250 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:34,048-Speed 6308.53 samples/sec Loss 7.0788 LearningRate 0.0008 Epoch: 7 Global Step: 163260 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:37,303-Speed 6292.72 samples/sec Loss 7.0310 LearningRate 0.0008 Epoch: 7 Global Step: 163270 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:40,548-Speed 6312.05 samples/sec Loss 6.9547 LearningRate 0.0008 Epoch: 7 Global Step: 163280 Fp16 Grad Scale: 32768 
Required: 61 hours Training: 2022-04-01 06:34:43,809-Speed 6281.23 samples/sec Loss 7.0211 LearningRate 0.0008 Epoch: 7 Global Step: 163290 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:47,056-Speed 6310.23 samples/sec Loss 7.0085 LearningRate 0.0008 Epoch: 7 Global Step: 163300 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:50,307-Speed 6300.30 samples/sec Loss 7.0179 LearningRate 0.0008 Epoch: 7 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:53,549-Speed 6319.37 samples/sec Loss 7.0204 LearningRate 0.0008 Epoch: 7 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:34:56,796-Speed 6309.56 samples/sec Loss 7.0422 LearningRate 0.0008 Epoch: 7 Global Step: 163330 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:00,047-Speed 6300.76 samples/sec Loss 7.0026 LearningRate 0.0008 Epoch: 7 Global Step: 163340 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:35:03,309-Speed 6280.35 samples/sec Loss 7.0035 LearningRate 0.0008 Epoch: 7 Global Step: 163350 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:35:06,561-Speed 6297.29 samples/sec Loss 7.0184 LearningRate 0.0008 Epoch: 7 Global Step: 163360 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:35:09,794-Speed 6336.36 samples/sec Loss 7.0570 LearningRate 0.0008 Epoch: 7 Global Step: 163370 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:13,041-Speed 6310.52 samples/sec Loss 6.9891 LearningRate 0.0008 Epoch: 7 Global Step: 163380 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:16,287-Speed 6309.14 samples/sec Loss 6.9654 LearningRate 0.0008 Epoch: 7 Global Step: 163390 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:19,533-Speed 6311.86 samples/sec Loss 7.0126 LearningRate 0.0008 Epoch: 7 Global Step: 163400 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 
06:35:22,785-Speed 6298.33 samples/sec Loss 6.9950 LearningRate 0.0008 Epoch: 7 Global Step: 163410 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:26,035-Speed 6303.87 samples/sec Loss 7.1089 LearningRate 0.0008 Epoch: 7 Global Step: 163420 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:29,283-Speed 6307.25 samples/sec Loss 7.0570 LearningRate 0.0008 Epoch: 7 Global Step: 163430 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:32,530-Speed 6308.64 samples/sec Loss 7.0437 LearningRate 0.0008 Epoch: 7 Global Step: 163440 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:35,778-Speed 6306.46 samples/sec Loss 6.9502 LearningRate 0.0008 Epoch: 7 Global Step: 163450 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:39,025-Speed 6308.28 samples/sec Loss 7.0253 LearningRate 0.0008 Epoch: 7 Global Step: 163460 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:42,272-Speed 6309.49 samples/sec Loss 6.9644 LearningRate 0.0008 Epoch: 7 Global Step: 163470 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:35:45,520-Speed 6306.45 samples/sec Loss 7.0187 LearningRate 0.0008 Epoch: 7 Global Step: 163480 Fp16 Grad Scale: 65536 Required: 61 hours Training: 2022-04-01 06:35:48,747-Speed 6346.82 samples/sec Loss 6.9917 LearningRate 0.0008 Epoch: 7 Global Step: 163490 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:51,996-Speed 6305.98 samples/sec Loss 7.1009 LearningRate 0.0008 Epoch: 7 Global Step: 163500 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:55,246-Speed 6302.61 samples/sec Loss 7.0943 LearningRate 0.0008 Epoch: 7 Global Step: 163510 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:35:58,493-Speed 6308.89 samples/sec Loss 6.9903 LearningRate 0.0008 Epoch: 7 Global Step: 163520 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:01,741-Speed 6306.50 samples/sec Loss 
6.9894 LearningRate 0.0008 Epoch: 7 Global Step: 163530 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:04,989-Speed 6307.92 samples/sec Loss 7.0346 LearningRate 0.0008 Epoch: 7 Global Step: 163540 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:08,240-Speed 6300.58 samples/sec Loss 6.9805 LearningRate 0.0008 Epoch: 7 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:11,490-Speed 6303.62 samples/sec Loss 7.0706 LearningRate 0.0008 Epoch: 7 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:14,734-Speed 6313.93 samples/sec Loss 7.0357 LearningRate 0.0008 Epoch: 7 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:17,981-Speed 6308.98 samples/sec Loss 7.0624 LearningRate 0.0008 Epoch: 7 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:21,216-Speed 6332.17 samples/sec Loss 7.0319 LearningRate 0.0008 Epoch: 7 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:24,466-Speed 6304.16 samples/sec Loss 7.0683 LearningRate 0.0008 Epoch: 7 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:27,717-Speed 6300.97 samples/sec Loss 7.0318 LearningRate 0.0008 Epoch: 7 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:30,963-Speed 6309.85 samples/sec Loss 7.1084 LearningRate 0.0008 Epoch: 7 Global Step: 163620 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:34,214-Speed 6301.26 samples/sec Loss 6.9963 LearningRate 0.0008 Epoch: 7 Global Step: 163630 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:37,462-Speed 6307.53 samples/sec Loss 7.0188 LearningRate 0.0008 Epoch: 7 Global Step: 163640 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:40,723-Speed 6280.50 samples/sec Loss 7.0220 LearningRate 0.0008 Epoch: 7 Global 
Step: 163650 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:43,968-Speed 6312.15 samples/sec Loss 7.0121 LearningRate 0.0008 Epoch: 7 Global Step: 163660 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:47,216-Speed 6306.54 samples/sec Loss 6.9615 LearningRate 0.0008 Epoch: 7 Global Step: 163670 Fp16 Grad Scale: 32768 Required: 61 hours Training: 2022-04-01 06:36:50,462-Speed 6311.84 samples/sec Loss 7.0728 LearningRate 0.0008 Epoch: 7 Global Step: 163680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:36:53,707-Speed 6311.61 samples/sec Loss 6.9554 LearningRate 0.0008 Epoch: 7 Global Step: 163690 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:36:56,957-Speed 6304.63 samples/sec Loss 7.0073 LearningRate 0.0008 Epoch: 7 Global Step: 163700 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:00,204-Speed 6306.96 samples/sec Loss 7.0365 LearningRate 0.0008 Epoch: 7 Global Step: 163710 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:03,449-Speed 6313.26 samples/sec Loss 6.9962 LearningRate 0.0008 Epoch: 7 Global Step: 163720 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:06,700-Speed 6300.54 samples/sec Loss 7.0018 LearningRate 0.0008 Epoch: 7 Global Step: 163730 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:09,934-Speed 6335.34 samples/sec Loss 7.0429 LearningRate 0.0008 Epoch: 7 Global Step: 163740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:13,180-Speed 6310.91 samples/sec Loss 7.0318 LearningRate 0.0008 Epoch: 7 Global Step: 163750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:16,430-Speed 6302.32 samples/sec Loss 7.1109 LearningRate 0.0008 Epoch: 7 Global Step: 163760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:19,676-Speed 6311.47 samples/sec Loss 7.0443 LearningRate 0.0008 Epoch: 7 Global Step: 163770 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:37:22,919-Speed 6317.60 samples/sec Loss 6.9445 LearningRate 0.0008 Epoch: 7 Global Step: 163780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:26,164-Speed 6314.30 samples/sec Loss 7.0813 LearningRate 0.0008 Epoch: 7 Global Step: 163790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:29,409-Speed 6312.77 samples/sec Loss 7.0173 LearningRate 0.0008 Epoch: 7 Global Step: 163800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:32,658-Speed 6305.10 samples/sec Loss 7.0535 LearningRate 0.0008 Epoch: 7 Global Step: 163810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:35,904-Speed 6310.30 samples/sec Loss 7.0551 LearningRate 0.0008 Epoch: 7 Global Step: 163820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:39,152-Speed 6307.95 samples/sec Loss 7.0569 LearningRate 0.0008 Epoch: 7 Global Step: 163830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:42,396-Speed 6314.54 samples/sec Loss 6.9902 LearningRate 0.0008 Epoch: 7 Global Step: 163840 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:45,642-Speed 6309.02 samples/sec Loss 6.9551 LearningRate 0.0008 Epoch: 7 Global Step: 163850 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:37:48,869-Speed 6348.94 samples/sec Loss 7.0376 LearningRate 0.0008 Epoch: 7 Global Step: 163860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:52,114-Speed 6312.19 samples/sec Loss 7.0185 LearningRate 0.0008 Epoch: 7 Global Step: 163870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:55,398-Speed 6237.43 samples/sec Loss 7.0468 LearningRate 0.0008 Epoch: 7 Global Step: 163880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:37:58,644-Speed 6311.09 samples/sec Loss 7.0509 LearningRate 0.0008 Epoch: 7 Global Step: 163890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:38:01,886-Speed 6318.09 samples/sec Loss 7.0064 LearningRate 0.0008 Epoch: 7 Global Step: 163900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:05,132-Speed 6311.37 samples/sec Loss 7.1453 LearningRate 0.0008 Epoch: 7 Global Step: 163910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:08,377-Speed 6313.20 samples/sec Loss 7.0262 LearningRate 0.0008 Epoch: 7 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:11,625-Speed 6306.76 samples/sec Loss 7.0391 LearningRate 0.0008 Epoch: 7 Global Step: 163930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:14,871-Speed 6309.56 samples/sec Loss 6.9213 LearningRate 0.0008 Epoch: 7 Global Step: 163940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:18,121-Speed 6305.05 samples/sec Loss 6.9982 LearningRate 0.0008 Epoch: 7 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:21,400-Speed 6246.05 samples/sec Loss 7.0561 LearningRate 0.0008 Epoch: 7 Global Step: 163960 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:38:24,631-Speed 6340.43 samples/sec Loss 6.9803 LearningRate 0.0008 Epoch: 7 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:27,877-Speed 6311.03 samples/sec Loss 7.0568 LearningRate 0.0008 Epoch: 7 Global Step: 163980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:31,124-Speed 6307.80 samples/sec Loss 7.0462 LearningRate 0.0008 Epoch: 7 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:34,370-Speed 6310.95 samples/sec Loss 6.9383 LearningRate 0.0008 Epoch: 7 Global Step: 164000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:37,617-Speed 6309.76 samples/sec Loss 7.1062 LearningRate 0.0008 Epoch: 7 Global Step: 164010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:40,861-Speed 6314.53 samples/sec Loss 
7.0513 LearningRate 0.0008 Epoch: 7 Global Step: 164020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:44,114-Speed 6296.94 samples/sec Loss 7.0144 LearningRate 0.0008 Epoch: 7 Global Step: 164030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:47,358-Speed 6314.48 samples/sec Loss 7.0056 LearningRate 0.0008 Epoch: 7 Global Step: 164040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:50,611-Speed 6296.72 samples/sec Loss 7.0822 LearningRate 0.0008 Epoch: 7 Global Step: 164050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:53,859-Speed 6307.16 samples/sec Loss 7.0527 LearningRate 0.0008 Epoch: 7 Global Step: 164060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:38:57,116-Speed 6288.96 samples/sec Loss 7.0391 LearningRate 0.0008 Epoch: 7 Global Step: 164070 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:00,365-Speed 6304.85 samples/sec Loss 6.9333 LearningRate 0.0008 Epoch: 7 Global Step: 164080 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:03,617-Speed 6298.65 samples/sec Loss 7.0180 LearningRate 0.0008 Epoch: 7 Global Step: 164090 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:06,865-Speed 6307.13 samples/sec Loss 6.9810 LearningRate 0.0008 Epoch: 7 Global Step: 164100 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:10,110-Speed 6312.24 samples/sec Loss 7.0818 LearningRate 0.0008 Epoch: 7 Global Step: 164110 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:13,359-Speed 6306.40 samples/sec Loss 6.9355 LearningRate 0.0008 Epoch: 7 Global Step: 164120 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:16,604-Speed 6312.41 samples/sec Loss 7.0640 LearningRate 0.0008 Epoch: 7 Global Step: 164130 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:19,850-Speed 6310.43 samples/sec Loss 6.9812 LearningRate 0.0008 Epoch: 7 Global 
Step: 164140 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:23,098-Speed 6307.87 samples/sec Loss 6.9417 LearningRate 0.0008 Epoch: 7 Global Step: 164150 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:26,353-Speed 6292.85 samples/sec Loss 7.0968 LearningRate 0.0008 Epoch: 7 Global Step: 164160 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:29,585-Speed 6338.46 samples/sec Loss 7.0818 LearningRate 0.0008 Epoch: 7 Global Step: 164170 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:32,831-Speed 6311.09 samples/sec Loss 7.0433 LearningRate 0.0008 Epoch: 7 Global Step: 164180 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:36,074-Speed 6315.26 samples/sec Loss 7.0459 LearningRate 0.0008 Epoch: 7 Global Step: 164190 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:39:39,307-Speed 6335.75 samples/sec Loss 7.0539 LearningRate 0.0008 Epoch: 7 Global Step: 164200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:42,561-Speed 6298.19 samples/sec Loss 6.9860 LearningRate 0.0008 Epoch: 7 Global Step: 164210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:45,808-Speed 6309.48 samples/sec Loss 6.9976 LearningRate 0.0008 Epoch: 7 Global Step: 164220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:49,055-Speed 6307.29 samples/sec Loss 6.9988 LearningRate 0.0008 Epoch: 7 Global Step: 164230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:52,298-Speed 6316.11 samples/sec Loss 6.9327 LearningRate 0.0008 Epoch: 7 Global Step: 164240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:55,547-Speed 6306.27 samples/sec Loss 7.1373 LearningRate 0.0008 Epoch: 7 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:39:58,791-Speed 6314.10 samples/sec Loss 6.9733 LearningRate 0.0008 Epoch: 7 Global Step: 164260 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:40:02,040-Speed 6306.30 samples/sec Loss 7.0474 LearningRate 0.0008 Epoch: 7 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:05,286-Speed 6310.20 samples/sec Loss 6.9241 LearningRate 0.0008 Epoch: 7 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:08,536-Speed 6302.87 samples/sec Loss 6.9705 LearningRate 0.0008 Epoch: 7 Global Step: 164290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:11,804-Speed 6267.70 samples/sec Loss 6.9674 LearningRate 0.0008 Epoch: 7 Global Step: 164300 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:15,054-Speed 6303.21 samples/sec Loss 6.9289 LearningRate 0.0008 Epoch: 7 Global Step: 164310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:18,312-Speed 6287.92 samples/sec Loss 7.0180 LearningRate 0.0008 Epoch: 7 Global Step: 164320 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:21,556-Speed 6313.11 samples/sec Loss 6.9456 LearningRate 0.0008 Epoch: 7 Global Step: 164330 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:24,802-Speed 6311.15 samples/sec Loss 7.0689 LearningRate 0.0008 Epoch: 7 Global Step: 164340 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:28,051-Speed 6304.25 samples/sec Loss 7.0409 LearningRate 0.0008 Epoch: 7 Global Step: 164350 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:31,297-Speed 6312.24 samples/sec Loss 6.9853 LearningRate 0.0008 Epoch: 7 Global Step: 164360 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:34,544-Speed 6308.22 samples/sec Loss 7.0187 LearningRate 0.0008 Epoch: 7 Global Step: 164370 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:40:37,782-Speed 6327.42 samples/sec Loss 7.0119 LearningRate 0.0008 Epoch: 7 Global Step: 164380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:40:41,029-Speed 6307.14 samples/sec Loss 6.9984 LearningRate 0.0008 Epoch: 7 Global Step: 164390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:44,276-Speed 6309.55 samples/sec Loss 6.9776 LearningRate 0.0008 Epoch: 7 Global Step: 164400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:47,527-Speed 6301.87 samples/sec Loss 7.0080 LearningRate 0.0008 Epoch: 7 Global Step: 164410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:50,777-Speed 6302.14 samples/sec Loss 7.0141 LearningRate 0.0008 Epoch: 7 Global Step: 164420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:54,026-Speed 6304.39 samples/sec Loss 7.0387 LearningRate 0.0008 Epoch: 7 Global Step: 164430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:40:57,342-Speed 6178.00 samples/sec Loss 7.0916 LearningRate 0.0008 Epoch: 7 Global Step: 164440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:00,589-Speed 6308.94 samples/sec Loss 7.0534 LearningRate 0.0008 Epoch: 7 Global Step: 164450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:03,838-Speed 6305.55 samples/sec Loss 6.9725 LearningRate 0.0008 Epoch: 7 Global Step: 164460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:07,082-Speed 6313.21 samples/sec Loss 7.0646 LearningRate 0.0008 Epoch: 7 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:10,328-Speed 6310.45 samples/sec Loss 7.0090 LearningRate 0.0008 Epoch: 7 Global Step: 164480 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:13,577-Speed 6305.25 samples/sec Loss 6.9968 LearningRate 0.0008 Epoch: 7 Global Step: 164490 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:16,826-Speed 6304.50 samples/sec Loss 7.0418 LearningRate 0.0008 Epoch: 7 Global Step: 164500 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:20,073-Speed 6308.72 samples/sec Loss 
6.9638 LearningRate 0.0008 Epoch: 7 Global Step: 164510 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:23,323-Speed 6303.94 samples/sec Loss 7.0374 LearningRate 0.0008 Epoch: 7 Global Step: 164520 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:26,569-Speed 6310.33 samples/sec Loss 7.0994 LearningRate 0.0008 Epoch: 7 Global Step: 164530 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:41:29,811-Speed 6319.75 samples/sec Loss 7.0078 LearningRate 0.0008 Epoch: 7 Global Step: 164540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:33,059-Speed 6305.53 samples/sec Loss 7.0215 LearningRate 0.0008 Epoch: 7 Global Step: 164550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:36,305-Speed 6310.47 samples/sec Loss 7.1244 LearningRate 0.0008 Epoch: 7 Global Step: 164560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:39,554-Speed 6305.67 samples/sec Loss 7.0664 LearningRate 0.0008 Epoch: 7 Global Step: 164570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:42,808-Speed 6294.11 samples/sec Loss 7.1028 LearningRate 0.0008 Epoch: 7 Global Step: 164580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:46,066-Speed 6287.84 samples/sec Loss 7.0177 LearningRate 0.0008 Epoch: 7 Global Step: 164590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:49,379-Speed 6183.65 samples/sec Loss 6.9544 LearningRate 0.0008 Epoch: 7 Global Step: 164600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:52,626-Speed 6308.55 samples/sec Loss 7.1026 LearningRate 0.0008 Epoch: 7 Global Step: 164610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:55,878-Speed 6300.33 samples/sec Loss 7.0312 LearningRate 0.0008 Epoch: 7 Global Step: 164620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:41:59,125-Speed 6308.92 samples/sec Loss 7.0280 LearningRate 0.0008 Epoch: 7 Global 
Step: 164630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:02,376-Speed 6304.00 samples/sec Loss 7.1078 LearningRate 0.0008 Epoch: 7 Global Step: 164640 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:42:05,621-Speed 6313.05 samples/sec Loss 7.0175 LearningRate 0.0008 Epoch: 7 Global Step: 164650 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:42:08,868-Speed 6308.40 samples/sec Loss 6.9596 LearningRate 0.0008 Epoch: 7 Global Step: 164660 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:42:12,100-Speed 6337.86 samples/sec Loss 6.9004 LearningRate 0.0008 Epoch: 7 Global Step: 164670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:15,349-Speed 6305.62 samples/sec Loss 7.0022 LearningRate 0.0008 Epoch: 7 Global Step: 164680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:18,596-Speed 6309.23 samples/sec Loss 7.0361 LearningRate 0.0008 Epoch: 7 Global Step: 164690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:21,842-Speed 6309.78 samples/sec Loss 6.9785 LearningRate 0.0008 Epoch: 7 Global Step: 164700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:25,087-Speed 6312.16 samples/sec Loss 6.9661 LearningRate 0.0008 Epoch: 7 Global Step: 164710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:28,334-Speed 6309.92 samples/sec Loss 7.0405 LearningRate 0.0008 Epoch: 7 Global Step: 164720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:31,578-Speed 6315.31 samples/sec Loss 7.0361 LearningRate 0.0008 Epoch: 7 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:34,826-Speed 6307.17 samples/sec Loss 7.0550 LearningRate 0.0008 Epoch: 7 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:38,071-Speed 6311.63 samples/sec Loss 6.9904 LearningRate 0.0008 Epoch: 7 Global Step: 164750 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:42:41,320-Speed 6305.61 samples/sec Loss 6.8811 LearningRate 0.0008 Epoch: 7 Global Step: 164760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:44,550-Speed 6341.21 samples/sec Loss 7.0720 LearningRate 0.0008 Epoch: 7 Global Step: 164770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:47,797-Speed 6309.16 samples/sec Loss 6.9775 LearningRate 0.0008 Epoch: 7 Global Step: 164780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:51,041-Speed 6314.32 samples/sec Loss 7.1171 LearningRate 0.0008 Epoch: 7 Global Step: 164790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:54,288-Speed 6309.51 samples/sec Loss 7.0528 LearningRate 0.0008 Epoch: 7 Global Step: 164800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:42:57,537-Speed 6304.52 samples/sec Loss 7.0355 LearningRate 0.0008 Epoch: 7 Global Step: 164810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:00,784-Speed 6310.25 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 7 Global Step: 164820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:04,031-Speed 6308.30 samples/sec Loss 6.9922 LearningRate 0.0008 Epoch: 7 Global Step: 164830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:07,277-Speed 6310.32 samples/sec Loss 7.0623 LearningRate 0.0008 Epoch: 7 Global Step: 164840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:10,527-Speed 6302.31 samples/sec Loss 7.0386 LearningRate 0.0008 Epoch: 7 Global Step: 164850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:13,867-Speed 6134.03 samples/sec Loss 7.0533 LearningRate 0.0008 Epoch: 7 Global Step: 164860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:17,169-Speed 6203.23 samples/sec Loss 7.0239 LearningRate 0.0008 Epoch: 7 Global Step: 164870 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
06:43:20,417-Speed 6306.34 samples/sec Loss 7.0034 LearningRate 0.0008 Epoch: 7 Global Step: 164880 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:23,662-Speed 6314.00 samples/sec Loss 6.9501 LearningRate 0.0008 Epoch: 7 Global Step: 164890 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:26,908-Speed 6308.92 samples/sec Loss 7.0557 LearningRate 0.0008 Epoch: 7 Global Step: 164900 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:30,157-Speed 6305.06 samples/sec Loss 6.9657 LearningRate 0.0008 Epoch: 7 Global Step: 164910 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:33,404-Speed 6308.58 samples/sec Loss 6.9579 LearningRate 0.0008 Epoch: 7 Global Step: 164920 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:36,667-Speed 6278.37 samples/sec Loss 7.0635 LearningRate 0.0008 Epoch: 7 Global Step: 164930 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:43:39,900-Speed 6335.15 samples/sec Loss 7.0612 LearningRate 0.0008 Epoch: 7 Global Step: 164940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:43,149-Speed 6306.18 samples/sec Loss 6.9819 LearningRate 0.0008 Epoch: 7 Global Step: 164950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:46,397-Speed 6306.73 samples/sec Loss 7.0402 LearningRate 0.0008 Epoch: 7 Global Step: 164960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:49,644-Speed 6308.96 samples/sec Loss 7.0123 LearningRate 0.0008 Epoch: 7 Global Step: 164970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:52,890-Speed 6309.16 samples/sec Loss 6.9716 LearningRate 0.0008 Epoch: 7 Global Step: 164980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:56,141-Speed 6302.44 samples/sec Loss 6.9460 LearningRate 0.0008 Epoch: 7 Global Step: 164990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:43:59,390-Speed 6304.00 samples/sec Loss 
6.9923 LearningRate 0.0008 Epoch: 7 Global Step: 165000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:02,639-Speed 6304.98 samples/sec Loss 7.0113 LearningRate 0.0008 Epoch: 7 Global Step: 165010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:05,887-Speed 6308.62 samples/sec Loss 6.9808 LearningRate 0.0008 Epoch: 7 Global Step: 165020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:09,135-Speed 6305.72 samples/sec Loss 6.9503 LearningRate 0.0008 Epoch: 7 Global Step: 165030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:12,380-Speed 6312.56 samples/sec Loss 6.9483 LearningRate 0.0008 Epoch: 7 Global Step: 165040 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:44:15,629-Speed 6305.74 samples/sec Loss 7.0712 LearningRate 0.0008 Epoch: 7 Global Step: 165050 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:44:18,861-Speed 6337.77 samples/sec Loss 7.0428 LearningRate 0.0008 Epoch: 7 Global Step: 165060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:22,109-Speed 6307.13 samples/sec Loss 6.9667 LearningRate 0.0008 Epoch: 7 Global Step: 165070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:25,355-Speed 6311.57 samples/sec Loss 6.9490 LearningRate 0.0008 Epoch: 7 Global Step: 165080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:28,597-Speed 6317.96 samples/sec Loss 7.0876 LearningRate 0.0008 Epoch: 7 Global Step: 165090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:31,843-Speed 6310.77 samples/sec Loss 6.9988 LearningRate 0.0008 Epoch: 7 Global Step: 165100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:35,088-Speed 6312.40 samples/sec Loss 7.0819 LearningRate 0.0008 Epoch: 7 Global Step: 165110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:38,346-Speed 6287.45 samples/sec Loss 7.0158 LearningRate 0.0008 Epoch: 7 Global 
Step: 165120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:41,597-Speed 6300.03 samples/sec Loss 6.9695 LearningRate 0.0008 Epoch: 7 Global Step: 165130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:44,841-Speed 6314.63 samples/sec Loss 6.9781 LearningRate 0.0008 Epoch: 7 Global Step: 165140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:48,086-Speed 6313.86 samples/sec Loss 7.0520 LearningRate 0.0008 Epoch: 7 Global Step: 165150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:44:51,332-Speed 6309.79 samples/sec Loss 6.9550 LearningRate 0.0008 Epoch: 7 Global Step: 165160 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:44:54,581-Speed 6304.65 samples/sec Loss 6.9984 LearningRate 0.0008 Epoch: 7 Global Step: 165170 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:44:57,814-Speed 6336.46 samples/sec Loss 6.9502 LearningRate 0.0008 Epoch: 7 Global Step: 165180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:01,061-Speed 6309.00 samples/sec Loss 6.9909 LearningRate 0.0008 Epoch: 7 Global Step: 165190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:04,307-Speed 6311.21 samples/sec Loss 7.0071 LearningRate 0.0008 Epoch: 7 Global Step: 165200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:07,549-Speed 6316.68 samples/sec Loss 6.9847 LearningRate 0.0008 Epoch: 7 Global Step: 165210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:10,797-Speed 6306.94 samples/sec Loss 7.0059 LearningRate 0.0008 Epoch: 7 Global Step: 165220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:14,043-Speed 6311.77 samples/sec Loss 6.9633 LearningRate 0.0008 Epoch: 7 Global Step: 165230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:17,291-Speed 6307.71 samples/sec Loss 6.9467 LearningRate 0.0008 Epoch: 7 Global Step: 165240 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:45:20,546-Speed 6293.83 samples/sec Loss 6.9959 LearningRate 0.0008 Epoch: 7 Global Step: 165250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:23,792-Speed 6310.77 samples/sec Loss 7.0681 LearningRate 0.0008 Epoch: 7 Global Step: 165260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:27,037-Speed 6312.36 samples/sec Loss 7.0048 LearningRate 0.0008 Epoch: 7 Global Step: 165270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:30,280-Speed 6316.62 samples/sec Loss 7.0113 LearningRate 0.0008 Epoch: 7 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:45:33,524-Speed 6314.40 samples/sec Loss 6.9770 LearningRate 0.0008 Epoch: 7 Global Step: 165290 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:45:36,769-Speed 6311.71 samples/sec Loss 7.0527 LearningRate 0.0008 Epoch: 7 Global Step: 165300 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:45:40,016-Speed 6309.32 samples/sec Loss 6.9407 LearningRate 0.0008 Epoch: 7 Global Step: 165310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:45:43,249-Speed 6336.77 samples/sec Loss 6.9577 LearningRate 0.0008 Epoch: 7 Global Step: 165320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:46,492-Speed 6315.44 samples/sec Loss 6.9587 LearningRate 0.0008 Epoch: 7 Global Step: 165330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:49,736-Speed 6315.62 samples/sec Loss 7.0094 LearningRate 0.0008 Epoch: 7 Global Step: 165340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:52,984-Speed 6305.85 samples/sec Loss 7.0045 LearningRate 0.0008 Epoch: 7 Global Step: 165350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:45:56,237-Speed 6297.58 samples/sec Loss 7.0348 LearningRate 0.0008 Epoch: 7 Global Step: 165360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:45:59,488-Speed 6300.74 samples/sec Loss 7.0365 LearningRate 0.0008 Epoch: 7 Global Step: 165370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:02,739-Speed 6301.53 samples/sec Loss 7.0506 LearningRate 0.0008 Epoch: 7 Global Step: 165380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:05,990-Speed 6300.78 samples/sec Loss 7.0360 LearningRate 0.0008 Epoch: 7 Global Step: 165390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:09,240-Speed 6302.64 samples/sec Loss 6.9703 LearningRate 0.0008 Epoch: 7 Global Step: 165400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:12,489-Speed 6303.99 samples/sec Loss 6.9731 LearningRate 0.0008 Epoch: 7 Global Step: 165410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:15,736-Speed 6309.38 samples/sec Loss 7.0108 LearningRate 0.0008 Epoch: 7 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:46:19,006-Speed 6265.54 samples/sec Loss 7.0624 LearningRate 0.0008 Epoch: 7 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:46:22,241-Speed 6330.41 samples/sec Loss 7.0359 LearningRate 0.0008 Epoch: 7 Global Step: 165440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:25,490-Speed 6305.33 samples/sec Loss 7.0084 LearningRate 0.0008 Epoch: 7 Global Step: 165450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:28,745-Speed 6293.34 samples/sec Loss 6.9836 LearningRate 0.0008 Epoch: 7 Global Step: 165460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:31,995-Speed 6303.31 samples/sec Loss 6.9752 LearningRate 0.0008 Epoch: 7 Global Step: 165470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:35,252-Speed 6290.24 samples/sec Loss 6.9661 LearningRate 0.0008 Epoch: 7 Global Step: 165480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:38,499-Speed 6308.62 samples/sec Loss 
7.0247 LearningRate 0.0008 Epoch: 7 Global Step: 165490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:41,745-Speed 6310.79 samples/sec Loss 7.0499 LearningRate 0.0008 Epoch: 7 Global Step: 165500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:44,992-Speed 6309.47 samples/sec Loss 6.9855 LearningRate 0.0008 Epoch: 7 Global Step: 165510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:48,239-Speed 6309.14 samples/sec Loss 7.1183 LearningRate 0.0008 Epoch: 7 Global Step: 165520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:51,485-Speed 6309.80 samples/sec Loss 6.9625 LearningRate 0.0008 Epoch: 7 Global Step: 165530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:46:54,732-Speed 6308.96 samples/sec Loss 7.0024 LearningRate 0.0008 Epoch: 7 Global Step: 165540 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:46:57,979-Speed 6308.70 samples/sec Loss 7.0630 LearningRate 0.0008 Epoch: 7 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:01,230-Speed 6301.49 samples/sec Loss 7.0290 LearningRate 0.0008 Epoch: 7 Global Step: 165560 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:04,476-Speed 6309.30 samples/sec Loss 7.0394 LearningRate 0.0008 Epoch: 7 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:07,729-Speed 6298.18 samples/sec Loss 7.0246 LearningRate 0.0008 Epoch: 7 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:10,973-Speed 6314.54 samples/sec Loss 7.0040 LearningRate 0.0008 Epoch: 7 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:14,224-Speed 6300.68 samples/sec Loss 7.0037 LearningRate 0.0008 Epoch: 7 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:17,457-Speed 6335.74 samples/sec Loss 7.0150 LearningRate 0.0008 Epoch: 7 Global 
Step: 165610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:20,706-Speed 6304.41 samples/sec Loss 6.9648 LearningRate 0.0008 Epoch: 7 Global Step: 165620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:23,956-Speed 6303.90 samples/sec Loss 6.9863 LearningRate 0.0008 Epoch: 7 Global Step: 165630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:27,203-Speed 6308.39 samples/sec Loss 7.0259 LearningRate 0.0008 Epoch: 7 Global Step: 165640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:30,443-Speed 6322.05 samples/sec Loss 7.0024 LearningRate 0.0008 Epoch: 7 Global Step: 165650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:33,689-Speed 6310.97 samples/sec Loss 6.9816 LearningRate 0.0008 Epoch: 7 Global Step: 165660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:36,937-Speed 6307.03 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 7 Global Step: 165670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:40,183-Speed 6312.66 samples/sec Loss 6.9542 LearningRate 0.0008 Epoch: 7 Global Step: 165680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:43,429-Speed 6310.35 samples/sec Loss 6.9643 LearningRate 0.0008 Epoch: 7 Global Step: 165690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:46,678-Speed 6304.97 samples/sec Loss 7.0035 LearningRate 0.0008 Epoch: 7 Global Step: 165700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:47:49,920-Speed 6317.39 samples/sec Loss 7.0076 LearningRate 0.0008 Epoch: 7 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:53,166-Speed 6310.03 samples/sec Loss 7.0587 LearningRate 0.0008 Epoch: 7 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:47:56,414-Speed 6307.67 samples/sec Loss 7.0652 LearningRate 0.0008 Epoch: 7 Global Step: 165730 Fp16 Grad Scale: 65536 
Required: 60 hours Training: 2022-04-01 06:47:59,650-Speed 6329.34 samples/sec Loss 7.0657 LearningRate 0.0008 Epoch: 7 Global Step: 165740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:02,902-Speed 6300.27 samples/sec Loss 7.0154 LearningRate 0.0008 Epoch: 7 Global Step: 165750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:06,149-Speed 6308.68 samples/sec Loss 7.0640 LearningRate 0.0008 Epoch: 7 Global Step: 165760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:09,398-Speed 6305.24 samples/sec Loss 7.0213 LearningRate 0.0008 Epoch: 7 Global Step: 165770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:12,648-Speed 6302.60 samples/sec Loss 7.0385 LearningRate 0.0008 Epoch: 7 Global Step: 165780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:15,901-Speed 6297.44 samples/sec Loss 7.0117 LearningRate 0.0008 Epoch: 7 Global Step: 165790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:19,150-Speed 6303.66 samples/sec Loss 7.0067 LearningRate 0.0008 Epoch: 7 Global Step: 165800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:22,410-Speed 6283.22 samples/sec Loss 7.0005 LearningRate 0.0008 Epoch: 7 Global Step: 165810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:25,657-Speed 6310.11 samples/sec Loss 6.9797 LearningRate 0.0008 Epoch: 7 Global Step: 165820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:28,903-Speed 6310.79 samples/sec Loss 6.9591 LearningRate 0.0008 Epoch: 7 Global Step: 165830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:48:32,149-Speed 6310.97 samples/sec Loss 7.0446 LearningRate 0.0008 Epoch: 7 Global Step: 165840 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:35,400-Speed 6300.53 samples/sec Loss 7.0397 LearningRate 0.0008 Epoch: 7 Global Step: 165850 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
06:48:38,647-Speed 6309.03 samples/sec Loss 6.9934 LearningRate 0.0008 Epoch: 7 Global Step: 165860 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:41,895-Speed 6305.90 samples/sec Loss 7.0315 LearningRate 0.0008 Epoch: 7 Global Step: 165870 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:45,144-Speed 6306.94 samples/sec Loss 6.9967 LearningRate 0.0008 Epoch: 7 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:48,392-Speed 6305.16 samples/sec Loss 7.0770 LearningRate 0.0008 Epoch: 7 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:51,640-Speed 6308.44 samples/sec Loss 7.0096 LearningRate 0.0008 Epoch: 7 Global Step: 165900 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:48:54,888-Speed 6306.35 samples/sec Loss 6.9793 LearningRate 0.0008 Epoch: 7 Global Step: 165910 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:49:54,139-Speed 345.65 samples/sec Loss 6.9755 LearningRate 0.0008 Epoch: 8 Global Step: 165920 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:49:57,378-Speed 6324.52 samples/sec Loss 7.0571 LearningRate 0.0008 Epoch: 8 Global Step: 165930 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:50:00,591-Speed 6375.56 samples/sec Loss 7.0336 LearningRate 0.0008 Epoch: 8 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:03,830-Speed 6323.89 samples/sec Loss 7.1146 LearningRate 0.0008 Epoch: 8 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:07,070-Speed 6323.33 samples/sec Loss 6.9721 LearningRate 0.0008 Epoch: 8 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:10,314-Speed 6313.75 samples/sec Loss 6.9649 LearningRate 0.0008 Epoch: 8 Global Step: 165970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:13,553-Speed 6323.84 samples/sec Loss 
6.9688 LearningRate 0.0008 Epoch: 8 Global Step: 165980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:16,796-Speed 6317.71 samples/sec Loss 6.9912 LearningRate 0.0008 Epoch: 8 Global Step: 165990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:20,034-Speed 6324.52 samples/sec Loss 6.9246 LearningRate 0.0008 Epoch: 8 Global Step: 166000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:23,279-Speed 6312.49 samples/sec Loss 6.9810 LearningRate 0.0008 Epoch: 8 Global Step: 166010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:26,519-Speed 6323.66 samples/sec Loss 6.9157 LearningRate 0.0008 Epoch: 8 Global Step: 166020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:29,754-Speed 6330.81 samples/sec Loss 6.9007 LearningRate 0.0008 Epoch: 8 Global Step: 166030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:33,000-Speed 6311.56 samples/sec Loss 7.0160 LearningRate 0.0008 Epoch: 8 Global Step: 166040 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:50:36,239-Speed 6323.75 samples/sec Loss 7.0622 LearningRate 0.0008 Epoch: 8 Global Step: 166050 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:50:39,470-Speed 6341.54 samples/sec Loss 7.0085 LearningRate 0.0008 Epoch: 8 Global Step: 166060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:42,711-Speed 6319.43 samples/sec Loss 7.0139 LearningRate 0.0008 Epoch: 8 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:45,961-Speed 6304.54 samples/sec Loss 7.0628 LearningRate 0.0008 Epoch: 8 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:49,203-Speed 6317.49 samples/sec Loss 7.0553 LearningRate 0.0008 Epoch: 8 Global Step: 166090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:52,444-Speed 6321.06 samples/sec Loss 6.9860 LearningRate 0.0008 Epoch: 8 Global 
Step: 166100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:55,689-Speed 6313.58 samples/sec Loss 6.9629 LearningRate 0.0008 Epoch: 8 Global Step: 166110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:50:58,927-Speed 6326.65 samples/sec Loss 6.9482 LearningRate 0.0008 Epoch: 8 Global Step: 166120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:02,168-Speed 6320.59 samples/sec Loss 6.9637 LearningRate 0.0008 Epoch: 8 Global Step: 166130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:05,407-Speed 6324.78 samples/sec Loss 7.0246 LearningRate 0.0008 Epoch: 8 Global Step: 166140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:08,644-Speed 6327.48 samples/sec Loss 7.0027 LearningRate 0.0008 Epoch: 8 Global Step: 166150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:11,885-Speed 6319.78 samples/sec Loss 6.8858 LearningRate 0.0008 Epoch: 8 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:51:15,133-Speed 6307.57 samples/sec Loss 6.8861 LearningRate 0.0008 Epoch: 8 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:51:18,376-Speed 6316.73 samples/sec Loss 7.0087 LearningRate 0.0008 Epoch: 8 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:51:21,621-Speed 6311.43 samples/sec Loss 6.9525 LearningRate 0.0008 Epoch: 8 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:51:24,861-Speed 6323.38 samples/sec Loss 6.8924 LearningRate 0.0008 Epoch: 8 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:51:28,089-Speed 6345.14 samples/sec Loss 7.0103 LearningRate 0.0008 Epoch: 8 Global Step: 166210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:31,334-Speed 6314.30 samples/sec Loss 6.9470 LearningRate 0.0008 Epoch: 8 Global Step: 166220 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:51:34,574-Speed 6321.94 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 8 Global Step: 166230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:37,844-Speed 6263.45 samples/sec Loss 7.0069 LearningRate 0.0008 Epoch: 8 Global Step: 166240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:41,084-Speed 6322.67 samples/sec Loss 7.0230 LearningRate 0.0008 Epoch: 8 Global Step: 166250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:44,323-Speed 6325.47 samples/sec Loss 7.0148 LearningRate 0.0008 Epoch: 8 Global Step: 166260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:47,563-Speed 6321.76 samples/sec Loss 6.8856 LearningRate 0.0008 Epoch: 8 Global Step: 166270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:50,805-Speed 6318.07 samples/sec Loss 7.0250 LearningRate 0.0008 Epoch: 8 Global Step: 166280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:54,046-Speed 6321.86 samples/sec Loss 7.0069 LearningRate 0.0008 Epoch: 8 Global Step: 166290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:51:57,289-Speed 6315.84 samples/sec Loss 6.9730 LearningRate 0.0008 Epoch: 8 Global Step: 166300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:00,531-Speed 6318.20 samples/sec Loss 6.9991 LearningRate 0.0008 Epoch: 8 Global Step: 166310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:52:03,797-Speed 6271.96 samples/sec Loss 7.0144 LearningRate 0.0008 Epoch: 8 Global Step: 166320 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:52:07,024-Speed 6347.77 samples/sec Loss 6.9715 LearningRate 0.0008 Epoch: 8 Global Step: 166330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:10,263-Speed 6325.84 samples/sec Loss 6.9998 LearningRate 0.0008 Epoch: 8 Global Step: 166340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:52:13,503-Speed 6322.44 samples/sec Loss 6.9756 LearningRate 0.0008 Epoch: 8 Global Step: 166350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:16,743-Speed 6321.91 samples/sec Loss 6.9329 LearningRate 0.0008 Epoch: 8 Global Step: 166360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:19,980-Speed 6328.05 samples/sec Loss 6.9371 LearningRate 0.0008 Epoch: 8 Global Step: 166370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:23,225-Speed 6312.02 samples/sec Loss 7.0339 LearningRate 0.0008 Epoch: 8 Global Step: 166380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:26,466-Speed 6321.22 samples/sec Loss 6.9522 LearningRate 0.0008 Epoch: 8 Global Step: 166390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:29,707-Speed 6319.08 samples/sec Loss 7.0252 LearningRate 0.0008 Epoch: 8 Global Step: 166400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:32,950-Speed 6317.84 samples/sec Loss 6.9468 LearningRate 0.0008 Epoch: 8 Global Step: 166410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:36,195-Speed 6312.69 samples/sec Loss 7.0253 LearningRate 0.0008 Epoch: 8 Global Step: 166420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:39,439-Speed 6313.89 samples/sec Loss 7.0124 LearningRate 0.0008 Epoch: 8 Global Step: 166430 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:52:42,666-Speed 6348.28 samples/sec Loss 7.0507 LearningRate 0.0008 Epoch: 8 Global Step: 166440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:45,906-Speed 6322.76 samples/sec Loss 6.9162 LearningRate 0.0008 Epoch: 8 Global Step: 166450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:49,144-Speed 6324.81 samples/sec Loss 6.9626 LearningRate 0.0008 Epoch: 8 Global Step: 166460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:52,387-Speed 6317.37 samples/sec Loss 
6.9857 LearningRate 0.0008 Epoch: 8 Global Step: 166470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:55,634-Speed 6309.55 samples/sec Loss 6.9703 LearningRate 0.0008 Epoch: 8 Global Step: 166480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:52:58,876-Speed 6318.95 samples/sec Loss 7.0112 LearningRate 0.0008 Epoch: 8 Global Step: 166490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:02,119-Speed 6316.17 samples/sec Loss 6.9094 LearningRate 0.0008 Epoch: 8 Global Step: 166500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:05,358-Speed 6325.12 samples/sec Loss 7.0015 LearningRate 0.0008 Epoch: 8 Global Step: 166510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:08,603-Speed 6313.08 samples/sec Loss 6.9007 LearningRate 0.0008 Epoch: 8 Global Step: 166520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:11,844-Speed 6320.23 samples/sec Loss 6.8957 LearningRate 0.0008 Epoch: 8 Global Step: 166530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:15,087-Speed 6316.23 samples/sec Loss 7.0609 LearningRate 0.0008 Epoch: 8 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:53:18,317-Speed 6341.14 samples/sec Loss 7.0272 LearningRate 0.0008 Epoch: 8 Global Step: 166550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:21,560-Speed 6317.06 samples/sec Loss 6.9244 LearningRate 0.0008 Epoch: 8 Global Step: 166560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:24,802-Speed 6318.94 samples/sec Loss 7.0682 LearningRate 0.0008 Epoch: 8 Global Step: 166570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:28,042-Speed 6322.41 samples/sec Loss 7.0637 LearningRate 0.0008 Epoch: 8 Global Step: 166580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:31,282-Speed 6322.41 samples/sec Loss 7.0237 LearningRate 0.0008 Epoch: 8 Global 
Step: 166590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:34,525-Speed 6316.64 samples/sec Loss 6.9776 LearningRate 0.0008 Epoch: 8 Global Step: 166600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:37,768-Speed 6315.24 samples/sec Loss 6.9175 LearningRate 0.0008 Epoch: 8 Global Step: 166610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:41,009-Speed 6320.52 samples/sec Loss 6.8906 LearningRate 0.0008 Epoch: 8 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:44,252-Speed 6317.81 samples/sec Loss 6.9819 LearningRate 0.0008 Epoch: 8 Global Step: 166630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:47,493-Speed 6319.37 samples/sec Loss 6.9941 LearningRate 0.0008 Epoch: 8 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:53:50,733-Speed 6322.58 samples/sec Loss 7.0195 LearningRate 0.0008 Epoch: 8 Global Step: 166650 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:53:53,974-Speed 6320.50 samples/sec Loss 6.9688 LearningRate 0.0008 Epoch: 8 Global Step: 166660 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:53:57,216-Speed 6318.38 samples/sec Loss 7.0237 LearningRate 0.0008 Epoch: 8 Global Step: 166670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:00,459-Speed 6317.50 samples/sec Loss 6.9557 LearningRate 0.0008 Epoch: 8 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:03,708-Speed 6305.20 samples/sec Loss 6.9565 LearningRate 0.0008 Epoch: 8 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:06,953-Speed 6311.27 samples/sec Loss 7.0103 LearningRate 0.0008 Epoch: 8 Global Step: 166700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:10,245-Speed 6223.63 samples/sec Loss 6.9798 LearningRate 0.0008 Epoch: 8 Global Step: 166710 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:54:13,485-Speed 6322.03 samples/sec Loss 6.9928 LearningRate 0.0008 Epoch: 8 Global Step: 166720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:16,730-Speed 6313.08 samples/sec Loss 6.9710 LearningRate 0.0008 Epoch: 8 Global Step: 166730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:19,975-Speed 6312.43 samples/sec Loss 7.0087 LearningRate 0.0008 Epoch: 8 Global Step: 166740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:23,218-Speed 6317.43 samples/sec Loss 6.9251 LearningRate 0.0008 Epoch: 8 Global Step: 166750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:26,455-Speed 6328.83 samples/sec Loss 7.0002 LearningRate 0.0008 Epoch: 8 Global Step: 166760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:29,682-Speed 6346.93 samples/sec Loss 7.0526 LearningRate 0.0008 Epoch: 8 Global Step: 166770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:32,929-Speed 6309.24 samples/sec Loss 6.9747 LearningRate 0.0008 Epoch: 8 Global Step: 166780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:36,177-Speed 6306.49 samples/sec Loss 7.0224 LearningRate 0.0008 Epoch: 8 Global Step: 166790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:39,417-Speed 6322.40 samples/sec Loss 7.0424 LearningRate 0.0008 Epoch: 8 Global Step: 166800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:42,661-Speed 6314.15 samples/sec Loss 6.9612 LearningRate 0.0008 Epoch: 8 Global Step: 166810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:45,907-Speed 6310.32 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 8 Global Step: 166820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:49,152-Speed 6313.64 samples/sec Loss 7.0856 LearningRate 0.0008 Epoch: 8 Global Step: 166830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:54:52,403-Speed 6300.79 samples/sec Loss 6.9154 LearningRate 0.0008 Epoch: 8 Global Step: 166840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:55,647-Speed 6315.48 samples/sec Loss 6.9299 LearningRate 0.0008 Epoch: 8 Global Step: 166850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:54:58,888-Speed 6320.14 samples/sec Loss 6.9595 LearningRate 0.0008 Epoch: 8 Global Step: 166860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:02,129-Speed 6319.55 samples/sec Loss 6.9830 LearningRate 0.0008 Epoch: 8 Global Step: 166870 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:55:05,369-Speed 6322.76 samples/sec Loss 6.9368 LearningRate 0.0008 Epoch: 8 Global Step: 166880 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:55:08,616-Speed 6308.52 samples/sec Loss 7.0491 LearningRate 0.0008 Epoch: 8 Global Step: 166890 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:55:11,860-Speed 6313.96 samples/sec Loss 7.0352 LearningRate 0.0008 Epoch: 8 Global Step: 166900 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:55:15,104-Speed 6315.68 samples/sec Loss 6.9879 LearningRate 0.0008 Epoch: 8 Global Step: 166910 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:55:18,333-Speed 6344.20 samples/sec Loss 7.0077 LearningRate 0.0008 Epoch: 8 Global Step: 166920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:21,579-Speed 6311.37 samples/sec Loss 6.9905 LearningRate 0.0008 Epoch: 8 Global Step: 166930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:24,821-Speed 6317.88 samples/sec Loss 7.0642 LearningRate 0.0008 Epoch: 8 Global Step: 166940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:28,064-Speed 6316.20 samples/sec Loss 6.9622 LearningRate 0.0008 Epoch: 8 Global Step: 166950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:31,305-Speed 6322.29 samples/sec Loss 
7.0075 LearningRate 0.0008 Epoch: 8 Global Step: 166960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:34,550-Speed 6312.31 samples/sec Loss 7.0307 LearningRate 0.0008 Epoch: 8 Global Step: 166970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:37,792-Speed 6318.28 samples/sec Loss 6.9935 LearningRate 0.0008 Epoch: 8 Global Step: 166980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:41,033-Speed 6319.33 samples/sec Loss 6.9126 LearningRate 0.0008 Epoch: 8 Global Step: 166990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:44,280-Speed 6310.20 samples/sec Loss 6.9816 LearningRate 0.0008 Epoch: 8 Global Step: 167000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:47,570-Speed 6224.87 samples/sec Loss 7.0486 LearningRate 0.0008 Epoch: 8 Global Step: 167010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:50,823-Speed 6296.90 samples/sec Loss 6.9070 LearningRate 0.0008 Epoch: 8 Global Step: 167020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:54,065-Speed 6318.30 samples/sec Loss 7.0184 LearningRate 0.0008 Epoch: 8 Global Step: 167030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:55:57,314-Speed 6305.14 samples/sec Loss 6.9667 LearningRate 0.0008 Epoch: 8 Global Step: 167040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:00,557-Speed 6316.51 samples/sec Loss 6.9815 LearningRate 0.0008 Epoch: 8 Global Step: 167050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:03,805-Speed 6308.14 samples/sec Loss 7.0115 LearningRate 0.0008 Epoch: 8 Global Step: 167060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:07,048-Speed 6315.76 samples/sec Loss 6.8829 LearningRate 0.0008 Epoch: 8 Global Step: 167070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:10,289-Speed 6320.74 samples/sec Loss 7.0105 LearningRate 0.0008 Epoch: 8 Global 
Step: 167080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:13,538-Speed 6303.58 samples/sec Loss 6.9813 LearningRate 0.0008 Epoch: 8 Global Step: 167090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:16,787-Speed 6306.64 samples/sec Loss 7.0148 LearningRate 0.0008 Epoch: 8 Global Step: 167100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:20,029-Speed 6317.79 samples/sec Loss 6.9778 LearningRate 0.0008 Epoch: 8 Global Step: 167110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:23,261-Speed 6337.66 samples/sec Loss 6.9847 LearningRate 0.0008 Epoch: 8 Global Step: 167120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:26,501-Speed 6322.41 samples/sec Loss 6.9308 LearningRate 0.0008 Epoch: 8 Global Step: 167130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:29,744-Speed 6318.14 samples/sec Loss 7.0627 LearningRate 0.0008 Epoch: 8 Global Step: 167140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:32,987-Speed 6316.16 samples/sec Loss 6.9612 LearningRate 0.0008 Epoch: 8 Global Step: 167150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:36,229-Speed 6319.03 samples/sec Loss 6.9662 LearningRate 0.0008 Epoch: 8 Global Step: 167160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:39,476-Speed 6307.90 samples/sec Loss 6.9303 LearningRate 0.0008 Epoch: 8 Global Step: 167170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:42,725-Speed 6304.40 samples/sec Loss 6.9816 LearningRate 0.0008 Epoch: 8 Global Step: 167180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:45,970-Speed 6313.91 samples/sec Loss 6.9194 LearningRate 0.0008 Epoch: 8 Global Step: 167190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:49,214-Speed 6313.87 samples/sec Loss 6.9530 LearningRate 0.0008 Epoch: 8 Global Step: 167200 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:56:52,462-Speed 6307.49 samples/sec Loss 7.0245 LearningRate 0.0008 Epoch: 8 Global Step: 167210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:56:55,712-Speed 6301.76 samples/sec Loss 6.9490 LearningRate 0.0008 Epoch: 8 Global Step: 167220 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:56:58,939-Speed 6347.82 samples/sec Loss 6.9863 LearningRate 0.0008 Epoch: 8 Global Step: 167230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:02,186-Speed 6309.68 samples/sec Loss 6.9734 LearningRate 0.0008 Epoch: 8 Global Step: 167240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:05,429-Speed 6315.81 samples/sec Loss 6.9907 LearningRate 0.0008 Epoch: 8 Global Step: 167250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:08,675-Speed 6311.62 samples/sec Loss 6.9400 LearningRate 0.0008 Epoch: 8 Global Step: 167260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:11,921-Speed 6309.70 samples/sec Loss 7.0190 LearningRate 0.0008 Epoch: 8 Global Step: 167270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:15,168-Speed 6308.64 samples/sec Loss 6.8513 LearningRate 0.0008 Epoch: 8 Global Step: 167280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:18,414-Speed 6310.69 samples/sec Loss 7.0105 LearningRate 0.0008 Epoch: 8 Global Step: 167290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:21,661-Speed 6310.21 samples/sec Loss 6.9949 LearningRate 0.0008 Epoch: 8 Global Step: 167300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:24,910-Speed 6303.42 samples/sec Loss 7.0184 LearningRate 0.0008 Epoch: 8 Global Step: 167310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:28,157-Speed 6309.67 samples/sec Loss 6.8702 LearningRate 0.0008 Epoch: 8 Global Step: 167320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
06:57:31,399-Speed 6318.49 samples/sec Loss 6.9269 LearningRate 0.0008 Epoch: 8 Global Step: 167330 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:34,643-Speed 6315.33 samples/sec Loss 7.0133 LearningRate 0.0008 Epoch: 8 Global Step: 167340 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:37,888-Speed 6312.29 samples/sec Loss 6.9271 LearningRate 0.0008 Epoch: 8 Global Step: 167350 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:41,133-Speed 6313.33 samples/sec Loss 7.0277 LearningRate 0.0008 Epoch: 8 Global Step: 167360 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:44,377-Speed 6314.69 samples/sec Loss 7.0375 LearningRate 0.0008 Epoch: 8 Global Step: 167370 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:47,622-Speed 6313.23 samples/sec Loss 6.9481 LearningRate 0.0008 Epoch: 8 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:57:50,852-Speed 6342.09 samples/sec Loss 6.9353 LearningRate 0.0008 Epoch: 8 Global Step: 167390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:54,096-Speed 6314.52 samples/sec Loss 6.9844 LearningRate 0.0008 Epoch: 8 Global Step: 167400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:57:57,339-Speed 6315.51 samples/sec Loss 6.9559 LearningRate 0.0008 Epoch: 8 Global Step: 167410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:00,595-Speed 6291.78 samples/sec Loss 6.9889 LearningRate 0.0008 Epoch: 8 Global Step: 167420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:03,843-Speed 6306.48 samples/sec Loss 6.9497 LearningRate 0.0008 Epoch: 8 Global Step: 167430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:07,087-Speed 6314.41 samples/sec Loss 6.9424 LearningRate 0.0008 Epoch: 8 Global Step: 167440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:10,333-Speed 6311.49 samples/sec Loss 
6.9980 LearningRate 0.0008 Epoch: 8 Global Step: 167450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:13,579-Speed 6311.44 samples/sec Loss 6.9476 LearningRate 0.0008 Epoch: 8 Global Step: 167460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:16,824-Speed 6312.82 samples/sec Loss 7.1030 LearningRate 0.0008 Epoch: 8 Global Step: 167470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:20,072-Speed 6306.48 samples/sec Loss 6.9310 LearningRate 0.0008 Epoch: 8 Global Step: 167480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:23,314-Speed 6317.46 samples/sec Loss 6.9721 LearningRate 0.0008 Epoch: 8 Global Step: 167490 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:58:26,560-Speed 6310.83 samples/sec Loss 6.9672 LearningRate 0.0008 Epoch: 8 Global Step: 167500 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:58:29,790-Speed 6341.37 samples/sec Loss 7.0011 LearningRate 0.0008 Epoch: 8 Global Step: 167510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:33,049-Speed 6286.20 samples/sec Loss 6.9507 LearningRate 0.0008 Epoch: 8 Global Step: 167520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:36,292-Speed 6317.75 samples/sec Loss 6.9478 LearningRate 0.0008 Epoch: 8 Global Step: 167530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:39,533-Speed 6319.93 samples/sec Loss 6.9822 LearningRate 0.0008 Epoch: 8 Global Step: 167540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:42,795-Speed 6279.49 samples/sec Loss 6.9997 LearningRate 0.0008 Epoch: 8 Global Step: 167550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:46,042-Speed 6309.00 samples/sec Loss 7.0379 LearningRate 0.0008 Epoch: 8 Global Step: 167560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:49,287-Speed 6312.38 samples/sec Loss 6.9388 LearningRate 0.0008 Epoch: 8 Global 
Step: 167570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:52,536-Speed 6305.20 samples/sec Loss 7.0392 LearningRate 0.0008 Epoch: 8 Global Step: 167580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:55,780-Speed 6315.29 samples/sec Loss 7.0207 LearningRate 0.0008 Epoch: 8 Global Step: 167590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:58:59,024-Speed 6314.96 samples/sec Loss 7.0038 LearningRate 0.0008 Epoch: 8 Global Step: 167600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:02,268-Speed 6314.37 samples/sec Loss 6.9747 LearningRate 0.0008 Epoch: 8 Global Step: 167610 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:05,512-Speed 6315.12 samples/sec Loss 7.0420 LearningRate 0.0008 Epoch: 8 Global Step: 167620 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:08,761-Speed 6304.54 samples/sec Loss 6.9358 LearningRate 0.0008 Epoch: 8 Global Step: 167630 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:12,007-Speed 6310.14 samples/sec Loss 6.9789 LearningRate 0.0008 Epoch: 8 Global Step: 167640 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:15,240-Speed 6336.25 samples/sec Loss 6.9276 LearningRate 0.0008 Epoch: 8 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:18,490-Speed 6302.89 samples/sec Loss 6.9733 LearningRate 0.0008 Epoch: 8 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:21,734-Speed 6314.47 samples/sec Loss 7.0129 LearningRate 0.0008 Epoch: 8 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:24,978-Speed 6314.13 samples/sec Loss 6.9557 LearningRate 0.0008 Epoch: 8 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:28,221-Speed 6317.24 samples/sec Loss 6.9575 LearningRate 0.0008 Epoch: 8 Global Step: 167690 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 06:59:31,466-Speed 6311.74 samples/sec Loss 6.9586 LearningRate 0.0008 Epoch: 8 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:34,710-Speed 6315.56 samples/sec Loss 6.9397 LearningRate 0.0008 Epoch: 8 Global Step: 167710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:37,957-Speed 6308.49 samples/sec Loss 6.8973 LearningRate 0.0008 Epoch: 8 Global Step: 167720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:41,200-Speed 6317.46 samples/sec Loss 7.0134 LearningRate 0.0008 Epoch: 8 Global Step: 167730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:44,444-Speed 6312.58 samples/sec Loss 6.9588 LearningRate 0.0008 Epoch: 8 Global Step: 167740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 06:59:47,702-Speed 6288.15 samples/sec Loss 6.9464 LearningRate 0.0008 Epoch: 8 Global Step: 167750 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:50,949-Speed 6309.25 samples/sec Loss 6.9330 LearningRate 0.0008 Epoch: 8 Global Step: 167760 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:54,193-Speed 6315.62 samples/sec Loss 6.9589 LearningRate 0.0008 Epoch: 8 Global Step: 167770 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 06:59:57,422-Speed 6342.62 samples/sec Loss 7.0086 LearningRate 0.0008 Epoch: 8 Global Step: 167780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:00,671-Speed 6306.01 samples/sec Loss 6.9861 LearningRate 0.0008 Epoch: 8 Global Step: 167790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:03,918-Speed 6308.16 samples/sec Loss 7.0073 LearningRate 0.0008 Epoch: 8 Global Step: 167800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:07,163-Speed 6313.08 samples/sec Loss 6.9803 LearningRate 0.0008 Epoch: 8 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:00:10,410-Speed 6309.32 samples/sec Loss 7.0458 LearningRate 0.0008 Epoch: 8 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:13,654-Speed 6313.79 samples/sec Loss 6.9156 LearningRate 0.0008 Epoch: 8 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:16,898-Speed 6315.63 samples/sec Loss 6.9683 LearningRate 0.0008 Epoch: 8 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:20,142-Speed 6313.64 samples/sec Loss 6.9699 LearningRate 0.0008 Epoch: 8 Global Step: 167850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:23,390-Speed 6307.18 samples/sec Loss 6.9536 LearningRate 0.0008 Epoch: 8 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:26,636-Speed 6310.92 samples/sec Loss 7.0074 LearningRate 0.0008 Epoch: 8 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:29,882-Speed 6309.49 samples/sec Loss 6.9248 LearningRate 0.0008 Epoch: 8 Global Step: 167880 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:00:33,132-Speed 6303.65 samples/sec Loss 6.9510 LearningRate 0.0008 Epoch: 8 Global Step: 167890 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:00:36,369-Speed 6328.71 samples/sec Loss 7.0006 LearningRate 0.0008 Epoch: 8 Global Step: 167900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:39,616-Speed 6308.79 samples/sec Loss 7.0024 LearningRate 0.0008 Epoch: 8 Global Step: 167910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:42,858-Speed 6321.01 samples/sec Loss 7.0348 LearningRate 0.0008 Epoch: 8 Global Step: 167920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:46,099-Speed 6320.31 samples/sec Loss 6.8696 LearningRate 0.0008 Epoch: 8 Global Step: 167930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:49,345-Speed 6310.19 samples/sec Loss 
6.9946 LearningRate 0.0008 Epoch: 8 Global Step: 167940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:52,595-Speed 6303.28 samples/sec Loss 6.9921 LearningRate 0.0008 Epoch: 8 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:55,840-Speed 6312.17 samples/sec Loss 6.9843 LearningRate 0.0008 Epoch: 8 Global Step: 167960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:00:59,086-Speed 6312.78 samples/sec Loss 7.0300 LearningRate 0.0008 Epoch: 8 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:02,332-Speed 6311.06 samples/sec Loss 6.9452 LearningRate 0.0008 Epoch: 8 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:05,577-Speed 6312.45 samples/sec Loss 6.9738 LearningRate 0.0008 Epoch: 8 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:08,820-Speed 6317.27 samples/sec Loss 7.0799 LearningRate 0.0008 Epoch: 8 Global Step: 168000 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:01:12,068-Speed 6305.38 samples/sec Loss 6.9063 LearningRate 0.0008 Epoch: 8 Global Step: 168010 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:01:15,300-Speed 6339.17 samples/sec Loss 6.8916 LearningRate 0.0008 Epoch: 8 Global Step: 168020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:18,545-Speed 6311.93 samples/sec Loss 6.9122 LearningRate 0.0008 Epoch: 8 Global Step: 168030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:21,790-Speed 6311.79 samples/sec Loss 7.0193 LearningRate 0.0008 Epoch: 8 Global Step: 168040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:25,038-Speed 6308.19 samples/sec Loss 6.9741 LearningRate 0.0008 Epoch: 8 Global Step: 168050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:28,288-Speed 6302.77 samples/sec Loss 6.9529 LearningRate 0.0008 Epoch: 8 Global 
Step: 168060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:31,531-Speed 6315.48 samples/sec Loss 6.9693 LearningRate 0.0008 Epoch: 8 Global Step: 168070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:34,773-Speed 6318.15 samples/sec Loss 6.9032 LearningRate 0.0008 Epoch: 8 Global Step: 168080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:38,017-Speed 6315.81 samples/sec Loss 6.9385 LearningRate 0.0008 Epoch: 8 Global Step: 168090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:41,262-Speed 6312.38 samples/sec Loss 6.9018 LearningRate 0.0008 Epoch: 8 Global Step: 168100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:44,508-Speed 6311.40 samples/sec Loss 6.9397 LearningRate 0.0008 Epoch: 8 Global Step: 168110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:01:47,753-Speed 6312.58 samples/sec Loss 6.9180 LearningRate 0.0008 Epoch: 8 Global Step: 168120 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:01:50,997-Speed 6313.15 samples/sec Loss 6.9573 LearningRate 0.0008 Epoch: 8 Global Step: 168130 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:01:54,241-Speed 6314.64 samples/sec Loss 6.9234 LearningRate 0.0008 Epoch: 8 Global Step: 168140 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:01:57,488-Speed 6308.37 samples/sec Loss 6.9629 LearningRate 0.0008 Epoch: 8 Global Step: 168150 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:00,722-Speed 6335.22 samples/sec Loss 6.9905 LearningRate 0.0008 Epoch: 8 Global Step: 168160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:03,967-Speed 6313.90 samples/sec Loss 6.9805 LearningRate 0.0008 Epoch: 8 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:07,211-Speed 6313.91 samples/sec Loss 7.0381 LearningRate 0.0008 Epoch: 8 Global Step: 168180 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:02:10,467-Speed 6291.96 samples/sec Loss 6.9104 LearningRate 0.0008 Epoch: 8 Global Step: 168190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:13,710-Speed 6317.48 samples/sec Loss 6.9412 LearningRate 0.0008 Epoch: 8 Global Step: 168200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:16,953-Speed 6316.50 samples/sec Loss 7.0030 LearningRate 0.0008 Epoch: 8 Global Step: 168210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:20,197-Speed 6313.59 samples/sec Loss 6.9387 LearningRate 0.0008 Epoch: 8 Global Step: 168220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:23,448-Speed 6301.11 samples/sec Loss 6.8712 LearningRate 0.0008 Epoch: 8 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:26,696-Speed 6306.54 samples/sec Loss 6.9723 LearningRate 0.0008 Epoch: 8 Global Step: 168240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:29,943-Speed 6307.98 samples/sec Loss 6.9773 LearningRate 0.0008 Epoch: 8 Global Step: 168250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:33,189-Speed 6310.91 samples/sec Loss 7.0390 LearningRate 0.0008 Epoch: 8 Global Step: 168260 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:36,434-Speed 6313.00 samples/sec Loss 7.0150 LearningRate 0.0008 Epoch: 8 Global Step: 168270 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:39,677-Speed 6317.13 samples/sec Loss 6.9980 LearningRate 0.0008 Epoch: 8 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:42,922-Speed 6313.03 samples/sec Loss 6.9495 LearningRate 0.0008 Epoch: 8 Global Step: 168290 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:46,172-Speed 6301.69 samples/sec Loss 6.9546 LearningRate 0.0008 Epoch: 8 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:02:49,418-Speed 6311.89 samples/sec Loss 6.9993 LearningRate 0.0008 Epoch: 8 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:52,663-Speed 6312.34 samples/sec Loss 6.9594 LearningRate 0.0008 Epoch: 8 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:02:55,897-Speed 6333.69 samples/sec Loss 6.9358 LearningRate 0.0008 Epoch: 8 Global Step: 168330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:02:59,140-Speed 6315.92 samples/sec Loss 6.9449 LearningRate 0.0008 Epoch: 8 Global Step: 168340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:02,393-Speed 6297.15 samples/sec Loss 6.9921 LearningRate 0.0008 Epoch: 8 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:05,641-Speed 6307.60 samples/sec Loss 6.9623 LearningRate 0.0008 Epoch: 8 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:08,884-Speed 6315.61 samples/sec Loss 6.9578 LearningRate 0.0008 Epoch: 8 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:12,129-Speed 6314.20 samples/sec Loss 6.9946 LearningRate 0.0008 Epoch: 8 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:15,378-Speed 6305.06 samples/sec Loss 6.9474 LearningRate 0.0008 Epoch: 8 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:18,626-Speed 6306.21 samples/sec Loss 6.8863 LearningRate 0.0008 Epoch: 8 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:21,873-Speed 6309.60 samples/sec Loss 6.9217 LearningRate 0.0008 Epoch: 8 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:25,117-Speed 6314.97 samples/sec Loss 6.9519 LearningRate 0.0008 Epoch: 8 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:28,364-Speed 6308.96 samples/sec Loss 
6.9754 LearningRate 0.0008 Epoch: 8 Global Step: 168430 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:03:31,616-Speed 6298.67 samples/sec Loss 6.8864 LearningRate 0.0008 Epoch: 8 Global Step: 168440 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:03:34,859-Speed 6315.57 samples/sec Loss 6.9296 LearningRate 0.0008 Epoch: 8 Global Step: 168450 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:03:38,105-Speed 6310.69 samples/sec Loss 6.9943 LearningRate 0.0008 Epoch: 8 Global Step: 168460 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:03:41,338-Speed 6336.93 samples/sec Loss 7.0169 LearningRate 0.0008 Epoch: 8 Global Step: 168470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:44,579-Speed 6320.98 samples/sec Loss 6.9478 LearningRate 0.0008 Epoch: 8 Global Step: 168480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:47,824-Speed 6311.05 samples/sec Loss 6.9640 LearningRate 0.0008 Epoch: 8 Global Step: 168490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:51,068-Speed 6315.04 samples/sec Loss 6.9708 LearningRate 0.0008 Epoch: 8 Global Step: 168500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:54,315-Speed 6309.61 samples/sec Loss 6.9676 LearningRate 0.0008 Epoch: 8 Global Step: 168510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:03:57,563-Speed 6305.66 samples/sec Loss 6.8364 LearningRate 0.0008 Epoch: 8 Global Step: 168520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:00,808-Speed 6313.22 samples/sec Loss 6.9301 LearningRate 0.0008 Epoch: 8 Global Step: 168530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:04,054-Speed 6310.81 samples/sec Loss 6.9403 LearningRate 0.0008 Epoch: 8 Global Step: 168540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:07,298-Speed 6314.57 samples/sec Loss 6.9806 LearningRate 0.0008 Epoch: 8 Global 
Step: 168550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:10,549-Speed 6301.68 samples/sec Loss 7.0105 LearningRate 0.0008 Epoch: 8 Global Step: 168560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:13,791-Speed 6318.32 samples/sec Loss 6.9468 LearningRate 0.0008 Epoch: 8 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:17,039-Speed 6305.95 samples/sec Loss 6.9566 LearningRate 0.0008 Epoch: 8 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:20,288-Speed 6305.38 samples/sec Loss 6.9868 LearningRate 0.0008 Epoch: 8 Global Step: 168590 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:23,535-Speed 6307.93 samples/sec Loss 6.8916 LearningRate 0.0008 Epoch: 8 Global Step: 168600 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:26,781-Speed 6311.96 samples/sec Loss 6.9032 LearningRate 0.0008 Epoch: 8 Global Step: 168610 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:30,028-Speed 6308.97 samples/sec Loss 6.9817 LearningRate 0.0008 Epoch: 8 Global Step: 168620 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:33,283-Speed 6293.23 samples/sec Loss 6.9451 LearningRate 0.0008 Epoch: 8 Global Step: 168630 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:36,531-Speed 6306.35 samples/sec Loss 6.9192 LearningRate 0.0008 Epoch: 8 Global Step: 168640 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:39,776-Speed 6312.25 samples/sec Loss 7.0006 LearningRate 0.0008 Epoch: 8 Global Step: 168650 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:43,025-Speed 6306.97 samples/sec Loss 6.9532 LearningRate 0.0008 Epoch: 8 Global Step: 168660 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:46,256-Speed 6339.67 samples/sec Loss 7.0187 LearningRate 0.0008 Epoch: 8 Global Step: 168670 Fp16 Grad Scale: 65536 
Required: 60 hours Training: 2022-04-01 07:04:49,499-Speed 6314.74 samples/sec Loss 6.9381 LearningRate 0.0008 Epoch: 8 Global Step: 168680 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:52,749-Speed 6304.04 samples/sec Loss 6.9860 LearningRate 0.0008 Epoch: 8 Global Step: 168690 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:04:55,979-Speed 6341.56 samples/sec Loss 7.0249 LearningRate 0.0008 Epoch: 8 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:04:59,225-Speed 6311.97 samples/sec Loss 6.9592 LearningRate 0.0008 Epoch: 8 Global Step: 168710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:02,474-Speed 6304.49 samples/sec Loss 6.9798 LearningRate 0.0008 Epoch: 8 Global Step: 168720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:05,721-Speed 6309.12 samples/sec Loss 6.9053 LearningRate 0.0008 Epoch: 8 Global Step: 168730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:08,971-Speed 6303.22 samples/sec Loss 6.8943 LearningRate 0.0008 Epoch: 8 Global Step: 168740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:12,224-Speed 6298.92 samples/sec Loss 6.8918 LearningRate 0.0008 Epoch: 8 Global Step: 168750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:15,466-Speed 6317.43 samples/sec Loss 6.9479 LearningRate 0.0008 Epoch: 8 Global Step: 168760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:18,711-Speed 6313.72 samples/sec Loss 6.9258 LearningRate 0.0008 Epoch: 8 Global Step: 168770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:21,960-Speed 6304.41 samples/sec Loss 6.8706 LearningRate 0.0008 Epoch: 8 Global Step: 168780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:25,206-Speed 6309.96 samples/sec Loss 6.9831 LearningRate 0.0008 Epoch: 8 Global Step: 168790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:05:28,448-Speed 6318.71 samples/sec Loss 6.9286 LearningRate 0.0008 Epoch: 8 Global Step: 168800 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:05:31,685-Speed 6329.14 samples/sec Loss 6.8962 LearningRate 0.0008 Epoch: 8 Global Step: 168810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:34,928-Speed 6316.36 samples/sec Loss 6.9283 LearningRate 0.0008 Epoch: 8 Global Step: 168820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:38,173-Speed 6312.93 samples/sec Loss 6.9559 LearningRate 0.0008 Epoch: 8 Global Step: 168830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:41,418-Speed 6313.72 samples/sec Loss 6.8579 LearningRate 0.0008 Epoch: 8 Global Step: 168840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:44,664-Speed 6309.68 samples/sec Loss 6.9384 LearningRate 0.0008 Epoch: 8 Global Step: 168850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:47,907-Speed 6317.21 samples/sec Loss 6.9607 LearningRate 0.0008 Epoch: 8 Global Step: 168860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:51,151-Speed 6313.47 samples/sec Loss 7.0248 LearningRate 0.0008 Epoch: 8 Global Step: 168870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:54,401-Speed 6303.30 samples/sec Loss 7.0028 LearningRate 0.0008 Epoch: 8 Global Step: 168880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:05:57,643-Speed 6318.63 samples/sec Loss 6.9554 LearningRate 0.0008 Epoch: 8 Global Step: 168890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:00,891-Speed 6307.24 samples/sec Loss 6.9870 LearningRate 0.0008 Epoch: 8 Global Step: 168900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:04,138-Speed 6307.65 samples/sec Loss 7.0740 LearningRate 0.0008 Epoch: 8 Global Step: 168910 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:07,388-Speed 6304.20 samples/sec Loss 
6.9589 LearningRate 0.0008 Epoch: 8 Global Step: 168920 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:10,623-Speed 6331.21 samples/sec Loss 7.0143 LearningRate 0.0008 Epoch: 8 Global Step: 168930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:13,869-Speed 6311.98 samples/sec Loss 7.0261 LearningRate 0.0008 Epoch: 8 Global Step: 168940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:17,113-Speed 6313.29 samples/sec Loss 6.9670 LearningRate 0.0008 Epoch: 8 Global Step: 168950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:20,366-Speed 6298.29 samples/sec Loss 6.9105 LearningRate 0.0008 Epoch: 8 Global Step: 168960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:23,625-Speed 6284.28 samples/sec Loss 6.9165 LearningRate 0.0008 Epoch: 8 Global Step: 168970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:26,883-Speed 6288.04 samples/sec Loss 6.9217 LearningRate 0.0008 Epoch: 8 Global Step: 168980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:30,134-Speed 6301.04 samples/sec Loss 6.9269 LearningRate 0.0008 Epoch: 8 Global Step: 168990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:33,387-Speed 6297.41 samples/sec Loss 7.0277 LearningRate 0.0008 Epoch: 8 Global Step: 169000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:36,631-Speed 6314.92 samples/sec Loss 7.0203 LearningRate 0.0008 Epoch: 8 Global Step: 169010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:39,876-Speed 6311.56 samples/sec Loss 6.9396 LearningRate 0.0008 Epoch: 8 Global Step: 169020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:43,125-Speed 6306.76 samples/sec Loss 7.0045 LearningRate 0.0008 Epoch: 8 Global Step: 169030 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:46,368-Speed 6315.87 samples/sec Loss 6.9950 LearningRate 0.0008 Epoch: 8 Global 
Step: 169040 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:49,617-Speed 6304.28 samples/sec Loss 6.9869 LearningRate 0.0008 Epoch: 8 Global Step: 169050 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:52,868-Speed 6302.01 samples/sec Loss 7.0331 LearningRate 0.0008 Epoch: 8 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:06:56,103-Speed 6332.29 samples/sec Loss 6.9892 LearningRate 0.0008 Epoch: 8 Global Step: 169070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:06:59,349-Speed 6309.06 samples/sec Loss 6.8484 LearningRate 0.0008 Epoch: 8 Global Step: 169080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:02,598-Speed 6306.98 samples/sec Loss 6.9413 LearningRate 0.0008 Epoch: 8 Global Step: 169090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:05,843-Speed 6311.19 samples/sec Loss 6.9358 LearningRate 0.0008 Epoch: 8 Global Step: 169100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:09,087-Speed 6314.37 samples/sec Loss 6.9032 LearningRate 0.0008 Epoch: 8 Global Step: 169110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:12,336-Speed 6305.98 samples/sec Loss 7.0287 LearningRate 0.0008 Epoch: 8 Global Step: 169120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:15,581-Speed 6311.94 samples/sec Loss 6.9888 LearningRate 0.0008 Epoch: 8 Global Step: 169130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:18,829-Speed 6307.91 samples/sec Loss 6.9229 LearningRate 0.0008 Epoch: 8 Global Step: 169140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:22,078-Speed 6304.17 samples/sec Loss 6.8422 LearningRate 0.0008 Epoch: 8 Global Step: 169150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:25,322-Speed 6314.37 samples/sec Loss 6.9831 LearningRate 0.0008 Epoch: 8 Global Step: 169160 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:07:28,556-Speed 6333.91 samples/sec Loss 7.0418 LearningRate 0.0008 Epoch: 8 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:31,804-Speed 6307.24 samples/sec Loss 6.9574 LearningRate 0.0008 Epoch: 8 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:35,054-Speed 6302.30 samples/sec Loss 6.9800 LearningRate 0.0008 Epoch: 8 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:38,308-Speed 6295.70 samples/sec Loss 7.0247 LearningRate 0.0008 Epoch: 8 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:41,560-Speed 6299.15 samples/sec Loss 6.8704 LearningRate 0.0008 Epoch: 8 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:44,805-Speed 6311.52 samples/sec Loss 6.9888 LearningRate 0.0008 Epoch: 8 Global Step: 169220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:48,062-Speed 6290.56 samples/sec Loss 6.9244 LearningRate 0.0008 Epoch: 8 Global Step: 169230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:51,314-Speed 6300.68 samples/sec Loss 6.9128 LearningRate 0.0008 Epoch: 8 Global Step: 169240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:54,615-Speed 6204.40 samples/sec Loss 6.9030 LearningRate 0.0008 Epoch: 8 Global Step: 169250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:07:57,870-Speed 6293.57 samples/sec Loss 6.9637 LearningRate 0.0008 Epoch: 8 Global Step: 169260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:01,116-Speed 6310.81 samples/sec Loss 6.8990 LearningRate 0.0008 Epoch: 8 Global Step: 169270 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:08:04,361-Speed 6312.92 samples/sec Loss 6.9032 LearningRate 0.0008 Epoch: 8 Global Step: 169280 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:08:07,590-Speed 6342.69 samples/sec Loss 6.8775 LearningRate 0.0008 Epoch: 8 Global Step: 169290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:10,838-Speed 6308.78 samples/sec Loss 7.0172 LearningRate 0.0008 Epoch: 8 Global Step: 169300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:14,086-Speed 6306.04 samples/sec Loss 6.8750 LearningRate 0.0008 Epoch: 8 Global Step: 169310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:17,330-Speed 6314.68 samples/sec Loss 6.9093 LearningRate 0.0008 Epoch: 8 Global Step: 169320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:20,577-Speed 6308.51 samples/sec Loss 6.9051 LearningRate 0.0008 Epoch: 8 Global Step: 169330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:23,826-Speed 6304.90 samples/sec Loss 6.9204 LearningRate 0.0008 Epoch: 8 Global Step: 169340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:27,073-Speed 6309.15 samples/sec Loss 6.8703 LearningRate 0.0008 Epoch: 8 Global Step: 169350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:30,315-Speed 6318.08 samples/sec Loss 6.9846 LearningRate 0.0008 Epoch: 8 Global Step: 169360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:33,557-Speed 6317.64 samples/sec Loss 6.9297 LearningRate 0.0008 Epoch: 8 Global Step: 169370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:36,807-Speed 6303.93 samples/sec Loss 6.9906 LearningRate 0.0008 Epoch: 8 Global Step: 169380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:40,054-Speed 6308.25 samples/sec Loss 6.9871 LearningRate 0.0008 Epoch: 8 Global Step: 169390 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:08:43,298-Speed 6314.55 samples/sec Loss 6.8663 LearningRate 0.0008 Epoch: 8 Global Step: 169400 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:08:46,538-Speed 6321.99 samples/sec Loss 
6.9531 LearningRate 0.0008 Epoch: 8 Global Step: 169410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:49,820-Speed 6243.17 samples/sec Loss 6.9748 LearningRate 0.0008 Epoch: 8 Global Step: 169420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:53,066-Speed 6310.60 samples/sec Loss 6.8954 LearningRate 0.0008 Epoch: 8 Global Step: 169430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:56,312-Speed 6309.94 samples/sec Loss 6.9862 LearningRate 0.0008 Epoch: 8 Global Step: 169440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:08:59,562-Speed 6303.76 samples/sec Loss 6.9124 LearningRate 0.0008 Epoch: 8 Global Step: 169450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:02,812-Speed 6302.31 samples/sec Loss 6.9034 LearningRate 0.0008 Epoch: 8 Global Step: 169460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:06,057-Speed 6313.21 samples/sec Loss 6.9367 LearningRate 0.0008 Epoch: 8 Global Step: 169470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:09,303-Speed 6310.82 samples/sec Loss 7.0057 LearningRate 0.0008 Epoch: 8 Global Step: 169480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:12,549-Speed 6311.41 samples/sec Loss 6.9708 LearningRate 0.0008 Epoch: 8 Global Step: 169490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:15,795-Speed 6309.73 samples/sec Loss 6.9003 LearningRate 0.0008 Epoch: 8 Global Step: 169500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:19,045-Speed 6303.53 samples/sec Loss 6.8836 LearningRate 0.0008 Epoch: 8 Global Step: 169510 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:09:22,290-Speed 6311.98 samples/sec Loss 7.0133 LearningRate 0.0008 Epoch: 8 Global Step: 169520 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:09:25,523-Speed 6336.36 samples/sec Loss 7.0514 LearningRate 0.0008 Epoch: 8 Global 
Step: 169530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:28,765-Speed 6318.30 samples/sec Loss 6.9017 LearningRate 0.0008 Epoch: 8 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:32,012-Speed 6308.50 samples/sec Loss 7.0088 LearningRate 0.0008 Epoch: 8 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:35,268-Speed 6290.82 samples/sec Loss 6.9558 LearningRate 0.0008 Epoch: 8 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:38,510-Speed 6320.15 samples/sec Loss 6.9584 LearningRate 0.0008 Epoch: 8 Global Step: 169570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:41,755-Speed 6311.90 samples/sec Loss 6.8444 LearningRate 0.0008 Epoch: 8 Global Step: 169580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:45,002-Speed 6309.41 samples/sec Loss 6.9754 LearningRate 0.0008 Epoch: 8 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:48,252-Speed 6303.18 samples/sec Loss 6.9182 LearningRate 0.0008 Epoch: 8 Global Step: 169600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:51,496-Speed 6314.59 samples/sec Loss 6.9085 LearningRate 0.0008 Epoch: 8 Global Step: 169610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:54,740-Speed 6314.30 samples/sec Loss 6.9087 LearningRate 0.0008 Epoch: 8 Global Step: 169620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:09:57,983-Speed 6315.91 samples/sec Loss 6.9338 LearningRate 0.0008 Epoch: 8 Global Step: 169630 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:10:01,236-Speed 6298.63 samples/sec Loss 7.0190 LearningRate 0.0008 Epoch: 8 Global Step: 169640 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:10:04,466-Speed 6340.69 samples/sec Loss 6.8976 LearningRate 0.0008 Epoch: 8 Global Step: 169650 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:10:07,710-Speed 6315.32 samples/sec Loss 6.9395 LearningRate 0.0008 Epoch: 8 Global Step: 169660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:10,957-Speed 6309.81 samples/sec Loss 6.9288 LearningRate 0.0008 Epoch: 8 Global Step: 169670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:14,202-Speed 6312.64 samples/sec Loss 6.9436 LearningRate 0.0008 Epoch: 8 Global Step: 169680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:17,451-Speed 6304.93 samples/sec Loss 6.9499 LearningRate 0.0008 Epoch: 8 Global Step: 169690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:20,696-Speed 6312.28 samples/sec Loss 6.9299 LearningRate 0.0008 Epoch: 8 Global Step: 169700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:23,940-Speed 6313.43 samples/sec Loss 7.0189 LearningRate 0.0008 Epoch: 8 Global Step: 169710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:27,189-Speed 6306.00 samples/sec Loss 6.8947 LearningRate 0.0008 Epoch: 8 Global Step: 169720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:30,433-Speed 6313.62 samples/sec Loss 6.9694 LearningRate 0.0008 Epoch: 8 Global Step: 169730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:33,695-Speed 6280.13 samples/sec Loss 6.9423 LearningRate 0.0008 Epoch: 8 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:36,942-Speed 6308.94 samples/sec Loss 6.9692 LearningRate 0.0008 Epoch: 8 Global Step: 169750 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:10:40,172-Speed 6342.53 samples/sec Loss 6.9373 LearningRate 0.0008 Epoch: 8 Global Step: 169760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:43,417-Speed 6313.10 samples/sec Loss 6.9797 LearningRate 0.0008 Epoch: 8 Global Step: 169770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:10:46,662-Speed 6311.36 samples/sec Loss 7.0030 LearningRate 0.0008 Epoch: 8 Global Step: 169780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:49,908-Speed 6311.19 samples/sec Loss 6.8615 LearningRate 0.0008 Epoch: 8 Global Step: 169790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:53,151-Speed 6317.45 samples/sec Loss 6.9626 LearningRate 0.0008 Epoch: 8 Global Step: 169800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:56,396-Speed 6312.25 samples/sec Loss 6.8131 LearningRate 0.0008 Epoch: 8 Global Step: 169810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:10:59,638-Speed 6317.36 samples/sec Loss 6.9569 LearningRate 0.0008 Epoch: 8 Global Step: 169820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:02,885-Speed 6309.92 samples/sec Loss 7.0423 LearningRate 0.0008 Epoch: 8 Global Step: 169830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:06,132-Speed 6308.64 samples/sec Loss 6.9137 LearningRate 0.0008 Epoch: 8 Global Step: 169840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:09,380-Speed 6305.41 samples/sec Loss 6.9770 LearningRate 0.0008 Epoch: 8 Global Step: 169850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:12,614-Speed 6335.65 samples/sec Loss 7.0138 LearningRate 0.0008 Epoch: 8 Global Step: 169860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:15,862-Speed 6307.31 samples/sec Loss 6.9985 LearningRate 0.0008 Epoch: 8 Global Step: 169870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:19,104-Speed 6319.03 samples/sec Loss 6.9418 LearningRate 0.0008 Epoch: 8 Global Step: 169880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:22,348-Speed 6314.76 samples/sec Loss 6.8969 LearningRate 0.0008 Epoch: 8 Global Step: 169890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:25,594-Speed 6309.64 samples/sec Loss 
6.9183 LearningRate 0.0008 Epoch: 8 Global Step: 169900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:28,840-Speed 6310.00 samples/sec Loss 6.9965 LearningRate 0.0008 Epoch: 8 Global Step: 169910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:32,087-Speed 6309.50 samples/sec Loss 7.0589 LearningRate 0.0008 Epoch: 8 Global Step: 169920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:35,358-Speed 6263.03 samples/sec Loss 6.9120 LearningRate 0.0008 Epoch: 8 Global Step: 169930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:38,609-Speed 6299.70 samples/sec Loss 6.9667 LearningRate 0.0008 Epoch: 8 Global Step: 169940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:41,855-Speed 6313.97 samples/sec Loss 6.9372 LearningRate 0.0008 Epoch: 8 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:45,088-Speed 6336.87 samples/sec Loss 6.9086 LearningRate 0.0008 Epoch: 8 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:48,344-Speed 6292.26 samples/sec Loss 7.0232 LearningRate 0.0008 Epoch: 8 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:51,588-Speed 6313.61 samples/sec Loss 6.9919 LearningRate 0.0008 Epoch: 8 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:54,839-Speed 6300.85 samples/sec Loss 6.9873 LearningRate 0.0008 Epoch: 8 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:11:58,084-Speed 6313.05 samples/sec Loss 6.9499 LearningRate 0.0008 Epoch: 8 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:01,331-Speed 6311.89 samples/sec Loss 6.9838 LearningRate 0.0008 Epoch: 8 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:04,578-Speed 6306.99 samples/sec Loss 7.0044 LearningRate 0.0008 Epoch: 8 Global 
Step: 170020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:07,827-Speed 6304.71 samples/sec Loss 6.9736 LearningRate 0.0008 Epoch: 8 Global Step: 170030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:11,075-Speed 6307.45 samples/sec Loss 6.9155 LearningRate 0.0008 Epoch: 8 Global Step: 170040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:14,319-Speed 6315.85 samples/sec Loss 6.8908 LearningRate 0.0008 Epoch: 8 Global Step: 170050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:17,626-Speed 6194.63 samples/sec Loss 6.9129 LearningRate 0.0008 Epoch: 8 Global Step: 170060 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:12:20,856-Speed 6340.47 samples/sec Loss 6.9398 LearningRate 0.0008 Epoch: 8 Global Step: 170070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:24,105-Speed 6305.32 samples/sec Loss 6.9929 LearningRate 0.0008 Epoch: 8 Global Step: 170080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:27,353-Speed 6308.26 samples/sec Loss 6.9071 LearningRate 0.0008 Epoch: 8 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:30,599-Speed 6311.10 samples/sec Loss 6.8835 LearningRate 0.0008 Epoch: 8 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:33,845-Speed 6309.41 samples/sec Loss 6.9012 LearningRate 0.0008 Epoch: 8 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:37,088-Speed 6317.21 samples/sec Loss 6.9231 LearningRate 0.0008 Epoch: 8 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:40,337-Speed 6305.03 samples/sec Loss 6.9400 LearningRate 0.0008 Epoch: 8 Global Step: 170130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:43,581-Speed 6312.94 samples/sec Loss 6.9056 LearningRate 0.0008 Epoch: 8 Global Step: 170140 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:12:46,834-Speed 6297.81 samples/sec Loss 6.9417 LearningRate 0.0008 Epoch: 8 Global Step: 170150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:50,077-Speed 6316.00 samples/sec Loss 6.8716 LearningRate 0.0008 Epoch: 8 Global Step: 170160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:12:53,318-Speed 6321.73 samples/sec Loss 6.9075 LearningRate 0.0008 Epoch: 8 Global Step: 170170 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:12:56,564-Speed 6310.38 samples/sec Loss 6.9657 LearningRate 0.0008 Epoch: 8 Global Step: 170180 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:12:59,814-Speed 6303.71 samples/sec Loss 6.9714 LearningRate 0.0008 Epoch: 8 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:13:03,061-Speed 6307.99 samples/sec Loss 6.8869 LearningRate 0.0008 Epoch: 8 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:13:06,298-Speed 6328.54 samples/sec Loss 6.9380 LearningRate 0.0008 Epoch: 8 Global Step: 170210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:09,540-Speed 6319.50 samples/sec Loss 6.9058 LearningRate 0.0008 Epoch: 8 Global Step: 170220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:12,789-Speed 6303.04 samples/sec Loss 6.8677 LearningRate 0.0008 Epoch: 8 Global Step: 170230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:16,039-Speed 6304.13 samples/sec Loss 6.9053 LearningRate 0.0008 Epoch: 8 Global Step: 170240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:19,291-Speed 6298.94 samples/sec Loss 6.8875 LearningRate 0.0008 Epoch: 8 Global Step: 170250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:22,539-Speed 6307.77 samples/sec Loss 6.9183 LearningRate 0.0008 Epoch: 8 Global Step: 170260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:13:25,786-Speed 6308.63 samples/sec Loss 6.9765 LearningRate 0.0008 Epoch: 8 Global Step: 170270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:29,035-Speed 6303.91 samples/sec Loss 6.9239 LearningRate 0.0008 Epoch: 8 Global Step: 170280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:32,282-Speed 6309.35 samples/sec Loss 6.9093 LearningRate 0.0008 Epoch: 8 Global Step: 170290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:35,529-Speed 6307.55 samples/sec Loss 6.9871 LearningRate 0.0008 Epoch: 8 Global Step: 170300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:38,778-Speed 6306.14 samples/sec Loss 6.8477 LearningRate 0.0008 Epoch: 8 Global Step: 170310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:13:42,013-Speed 6332.18 samples/sec Loss 6.9515 LearningRate 0.0008 Epoch: 8 Global Step: 170320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:45,257-Speed 6313.86 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 170330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:48,503-Speed 6311.69 samples/sec Loss 6.9059 LearningRate 0.0008 Epoch: 8 Global Step: 170340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:51,754-Speed 6301.68 samples/sec Loss 6.9179 LearningRate 0.0008 Epoch: 8 Global Step: 170350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:55,002-Speed 6306.10 samples/sec Loss 7.0003 LearningRate 0.0008 Epoch: 8 Global Step: 170360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:13:58,247-Speed 6313.02 samples/sec Loss 6.9051 LearningRate 0.0008 Epoch: 8 Global Step: 170370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:01,498-Speed 6300.32 samples/sec Loss 6.9401 LearningRate 0.0008 Epoch: 8 Global Step: 170380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:04,744-Speed 6311.05 samples/sec Loss 
6.9937 LearningRate 0.0008 Epoch: 8 Global Step: 170390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:07,990-Speed 6311.32 samples/sec Loss 6.9781 LearningRate 0.0008 Epoch: 8 Global Step: 170400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:11,235-Speed 6311.51 samples/sec Loss 6.9795 LearningRate 0.0008 Epoch: 8 Global Step: 170410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:14,485-Speed 6303.52 samples/sec Loss 6.9573 LearningRate 0.0008 Epoch: 8 Global Step: 170420 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:17,734-Speed 6305.51 samples/sec Loss 6.8995 LearningRate 0.0008 Epoch: 8 Global Step: 170430 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:20,980-Speed 6310.17 samples/sec Loss 6.8848 LearningRate 0.0008 Epoch: 8 Global Step: 170440 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:24,229-Speed 6304.54 samples/sec Loss 6.9193 LearningRate 0.0008 Epoch: 8 Global Step: 170450 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:27,497-Speed 6268.59 samples/sec Loss 6.9624 LearningRate 0.0008 Epoch: 8 Global Step: 170460 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:30,754-Speed 6289.08 samples/sec Loss 6.9581 LearningRate 0.0008 Epoch: 8 Global Step: 170470 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:14:33,987-Speed 6335.70 samples/sec Loss 6.8861 LearningRate 0.0008 Epoch: 8 Global Step: 170480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:37,231-Speed 6314.89 samples/sec Loss 6.9955 LearningRate 0.0008 Epoch: 8 Global Step: 170490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:40,477-Speed 6311.24 samples/sec Loss 6.9241 LearningRate 0.0008 Epoch: 8 Global Step: 170500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:43,723-Speed 6310.76 samples/sec Loss 6.8968 LearningRate 0.0008 Epoch: 8 Global 
Step: 170510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:46,970-Speed 6308.26 samples/sec Loss 6.8944 LearningRate 0.0008 Epoch: 8 Global Step: 170520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:50,220-Speed 6304.31 samples/sec Loss 6.9664 LearningRate 0.0008 Epoch: 8 Global Step: 170530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:53,471-Speed 6301.30 samples/sec Loss 6.9450 LearningRate 0.0008 Epoch: 8 Global Step: 170540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:56,717-Speed 6309.49 samples/sec Loss 7.0005 LearningRate 0.0008 Epoch: 8 Global Step: 170550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:14:59,963-Speed 6311.10 samples/sec Loss 6.9583 LearningRate 0.0008 Epoch: 8 Global Step: 170560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:03,216-Speed 6297.08 samples/sec Loss 6.9121 LearningRate 0.0008 Epoch: 8 Global Step: 170570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:06,466-Speed 6303.46 samples/sec Loss 6.9124 LearningRate 0.0008 Epoch: 8 Global Step: 170580 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:09,715-Speed 6304.14 samples/sec Loss 6.8970 LearningRate 0.0008 Epoch: 8 Global Step: 170590 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:12,963-Speed 6306.77 samples/sec Loss 6.9483 LearningRate 0.0008 Epoch: 8 Global Step: 170600 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:16,209-Speed 6311.11 samples/sec Loss 6.9559 LearningRate 0.0008 Epoch: 8 Global Step: 170610 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:19,457-Speed 6307.17 samples/sec Loss 6.9488 LearningRate 0.0008 Epoch: 8 Global Step: 170620 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:22,690-Speed 6335.81 samples/sec Loss 6.9426 LearningRate 0.0008 Epoch: 8 Global Step: 170630 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:15:25,938-Speed 6307.94 samples/sec Loss 6.9697 LearningRate 0.0008 Epoch: 8 Global Step: 170640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:29,187-Speed 6304.38 samples/sec Loss 6.9843 LearningRate 0.0008 Epoch: 8 Global Step: 170650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:32,434-Speed 6308.60 samples/sec Loss 6.9424 LearningRate 0.0008 Epoch: 8 Global Step: 170660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:35,681-Speed 6309.41 samples/sec Loss 6.8302 LearningRate 0.0008 Epoch: 8 Global Step: 170670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:38,924-Speed 6314.84 samples/sec Loss 6.8989 LearningRate 0.0008 Epoch: 8 Global Step: 170680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:42,168-Speed 6314.63 samples/sec Loss 6.9324 LearningRate 0.0008 Epoch: 8 Global Step: 170690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:45,411-Speed 6317.68 samples/sec Loss 6.9464 LearningRate 0.0008 Epoch: 8 Global Step: 170700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:48,725-Speed 6181.02 samples/sec Loss 6.9206 LearningRate 0.0008 Epoch: 8 Global Step: 170710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:51,992-Speed 6270.57 samples/sec Loss 6.9494 LearningRate 0.0008 Epoch: 8 Global Step: 170720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:15:55,236-Speed 6313.62 samples/sec Loss 6.9353 LearningRate 0.0008 Epoch: 8 Global Step: 170730 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:15:58,475-Speed 6324.57 samples/sec Loss 7.0205 LearningRate 0.0008 Epoch: 8 Global Step: 170740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:01,724-Speed 6304.69 samples/sec Loss 6.9993 LearningRate 0.0008 Epoch: 8 Global Step: 170750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:16:04,972-Speed 6307.30 samples/sec Loss 6.9595 LearningRate 0.0008 Epoch: 8 Global Step: 170760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:08,216-Speed 6316.13 samples/sec Loss 6.9135 LearningRate 0.0008 Epoch: 8 Global Step: 170770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:11,464-Speed 6307.10 samples/sec Loss 6.9025 LearningRate 0.0008 Epoch: 8 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:14,709-Speed 6311.27 samples/sec Loss 6.9275 LearningRate 0.0008 Epoch: 8 Global Step: 170790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:17,953-Speed 6315.97 samples/sec Loss 6.9114 LearningRate 0.0008 Epoch: 8 Global Step: 170800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:21,198-Speed 6311.79 samples/sec Loss 6.8654 LearningRate 0.0008 Epoch: 8 Global Step: 170810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:24,448-Speed 6303.15 samples/sec Loss 6.9271 LearningRate 0.0008 Epoch: 8 Global Step: 170820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:27,694-Speed 6311.19 samples/sec Loss 6.9903 LearningRate 0.0008 Epoch: 8 Global Step: 170830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:30,939-Speed 6311.81 samples/sec Loss 6.9331 LearningRate 0.0008 Epoch: 8 Global Step: 170840 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:16:34,187-Speed 6307.24 samples/sec Loss 6.9092 LearningRate 0.0008 Epoch: 8 Global Step: 170850 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:16:37,420-Speed 6335.83 samples/sec Loss 6.9533 LearningRate 0.0008 Epoch: 8 Global Step: 170860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:40,668-Speed 6306.69 samples/sec Loss 6.9048 LearningRate 0.0008 Epoch: 8 Global Step: 170870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:43,917-Speed 6305.21 samples/sec Loss 
7.0045 LearningRate 0.0008 Epoch: 8 Global Step: 170880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:47,167-Speed 6303.76 samples/sec Loss 6.8946 LearningRate 0.0008 Epoch: 8 Global Step: 170890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:50,415-Speed 6305.59 samples/sec Loss 6.8601 LearningRate 0.0008 Epoch: 8 Global Step: 170900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:53,662-Speed 6309.50 samples/sec Loss 6.9620 LearningRate 0.0008 Epoch: 8 Global Step: 170910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:16:56,909-Speed 6308.71 samples/sec Loss 6.8394 LearningRate 0.0008 Epoch: 8 Global Step: 170920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:00,158-Speed 6305.30 samples/sec Loss 6.9789 LearningRate 0.0008 Epoch: 8 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:03,402-Speed 6313.06 samples/sec Loss 6.9635 LearningRate 0.0008 Epoch: 8 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:06,652-Speed 6306.24 samples/sec Loss 6.8710 LearningRate 0.0008 Epoch: 8 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:09,920-Speed 6267.94 samples/sec Loss 6.9019 LearningRate 0.0008 Epoch: 8 Global Step: 170960 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:17:13,170-Speed 6303.54 samples/sec Loss 6.8983 LearningRate 0.0008 Epoch: 8 Global Step: 170970 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:17:16,420-Speed 6302.83 samples/sec Loss 6.9130 LearningRate 0.0008 Epoch: 8 Global Step: 170980 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:17:19,670-Speed 6302.96 samples/sec Loss 6.9119 LearningRate 0.0008 Epoch: 8 Global Step: 170990 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:17:22,905-Speed 6332.53 samples/sec Loss 6.8748 LearningRate 0.0008 Epoch: 8 Global 
Step: 171000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:26,150-Speed 6313.14 samples/sec Loss 6.9339 LearningRate 0.0008 Epoch: 8 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:29,398-Speed 6307.68 samples/sec Loss 6.9648 LearningRate 0.0008 Epoch: 8 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:32,680-Speed 6241.49 samples/sec Loss 6.9292 LearningRate 0.0008 Epoch: 8 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:35,926-Speed 6309.24 samples/sec Loss 6.9638 LearningRate 0.0008 Epoch: 8 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:39,177-Speed 6300.98 samples/sec Loss 6.8867 LearningRate 0.0008 Epoch: 8 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:42,422-Speed 6312.41 samples/sec Loss 6.9619 LearningRate 0.0008 Epoch: 8 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:45,669-Speed 6308.55 samples/sec Loss 6.9292 LearningRate 0.0008 Epoch: 8 Global Step: 171070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:48,916-Speed 6309.40 samples/sec Loss 6.9781 LearningRate 0.0008 Epoch: 8 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:52,165-Speed 6305.65 samples/sec Loss 6.9083 LearningRate 0.0008 Epoch: 8 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:17:55,412-Speed 6308.04 samples/sec Loss 7.0618 LearningRate 0.0008 Epoch: 8 Global Step: 171100 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:17:58,643-Speed 6341.31 samples/sec Loss 6.9835 LearningRate 0.0008 Epoch: 8 Global Step: 171110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:01,889-Speed 6309.86 samples/sec Loss 6.9500 LearningRate 0.0008 Epoch: 8 Global Step: 171120 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:18:05,135-Speed 6310.75 samples/sec Loss 6.9858 LearningRate 0.0008 Epoch: 8 Global Step: 171130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:08,378-Speed 6316.01 samples/sec Loss 6.9713 LearningRate 0.0008 Epoch: 8 Global Step: 171140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:11,642-Speed 6277.32 samples/sec Loss 6.8934 LearningRate 0.0008 Epoch: 8 Global Step: 171150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:14,918-Speed 6252.44 samples/sec Loss 6.9072 LearningRate 0.0008 Epoch: 8 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:18,168-Speed 6304.53 samples/sec Loss 6.9298 LearningRate 0.0008 Epoch: 8 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:21,413-Speed 6311.41 samples/sec Loss 6.9496 LearningRate 0.0008 Epoch: 8 Global Step: 171180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:24,664-Speed 6302.12 samples/sec Loss 6.9178 LearningRate 0.0008 Epoch: 8 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:27,907-Speed 6315.78 samples/sec Loss 6.9883 LearningRate 0.0008 Epoch: 8 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:31,153-Speed 6310.77 samples/sec Loss 6.8447 LearningRate 0.0008 Epoch: 8 Global Step: 171210 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:18:34,400-Speed 6308.87 samples/sec Loss 6.9163 LearningRate 0.0008 Epoch: 8 Global Step: 171220 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:18:37,647-Speed 6308.47 samples/sec Loss 6.9571 LearningRate 0.0008 Epoch: 8 Global Step: 171230 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:18:40,896-Speed 6305.13 samples/sec Loss 6.9302 LearningRate 0.0008 Epoch: 8 Global Step: 171240 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:18:44,131-Speed 6332.48 samples/sec Loss 6.9474 LearningRate 0.0008 Epoch: 8 Global Step: 171250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:47,375-Speed 6315.18 samples/sec Loss 6.8970 LearningRate 0.0008 Epoch: 8 Global Step: 171260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:50,626-Speed 6301.03 samples/sec Loss 6.9755 LearningRate 0.0008 Epoch: 8 Global Step: 171270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:53,873-Speed 6308.46 samples/sec Loss 6.8490 LearningRate 0.0008 Epoch: 8 Global Step: 171280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:18:57,122-Speed 6304.73 samples/sec Loss 6.9259 LearningRate 0.0008 Epoch: 8 Global Step: 171290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:00,367-Speed 6312.35 samples/sec Loss 6.9230 LearningRate 0.0008 Epoch: 8 Global Step: 171300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:03,621-Speed 6296.19 samples/sec Loss 6.8900 LearningRate 0.0008 Epoch: 8 Global Step: 171310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:06,873-Speed 6298.96 samples/sec Loss 6.8758 LearningRate 0.0008 Epoch: 8 Global Step: 171320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:10,120-Speed 6307.71 samples/sec Loss 6.9283 LearningRate 0.0008 Epoch: 8 Global Step: 171330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:13,372-Speed 6298.52 samples/sec Loss 6.9706 LearningRate 0.0008 Epoch: 8 Global Step: 171340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:16,624-Speed 6299.96 samples/sec Loss 7.0047 LearningRate 0.0008 Epoch: 8 Global Step: 171350 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:19:19,860-Speed 6330.14 samples/sec Loss 6.8601 LearningRate 0.0008 Epoch: 8 Global Step: 171360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:23,109-Speed 6304.95 samples/sec Loss 
6.8123 LearningRate 0.0008 Epoch: 8 Global Step: 171370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:26,356-Speed 6308.19 samples/sec Loss 7.0597 LearningRate 0.0008 Epoch: 8 Global Step: 171380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:29,602-Speed 6311.96 samples/sec Loss 6.9016 LearningRate 0.0008 Epoch: 8 Global Step: 171390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:32,850-Speed 6306.58 samples/sec Loss 6.9186 LearningRate 0.0008 Epoch: 8 Global Step: 171400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:36,095-Speed 6313.62 samples/sec Loss 6.9352 LearningRate 0.0008 Epoch: 8 Global Step: 171410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:39,346-Speed 6300.31 samples/sec Loss 6.9587 LearningRate 0.0008 Epoch: 8 Global Step: 171420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:42,592-Speed 6311.61 samples/sec Loss 7.0017 LearningRate 0.0008 Epoch: 8 Global Step: 171430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:45,854-Speed 6279.28 samples/sec Loss 6.9037 LearningRate 0.0008 Epoch: 8 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:49,097-Speed 6316.47 samples/sec Loss 6.9225 LearningRate 0.0008 Epoch: 8 Global Step: 171450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:52,329-Speed 6337.12 samples/sec Loss 6.9254 LearningRate 0.0008 Epoch: 8 Global Step: 171460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:55,575-Speed 6312.10 samples/sec Loss 6.9188 LearningRate 0.0008 Epoch: 8 Global Step: 171470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:19:58,821-Speed 6309.81 samples/sec Loss 6.9264 LearningRate 0.0008 Epoch: 8 Global Step: 171480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:02,068-Speed 6309.19 samples/sec Loss 6.9277 LearningRate 0.0008 Epoch: 8 Global 
Step: 171490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:05,317-Speed 6305.02 samples/sec Loss 6.8263 LearningRate 0.0008 Epoch: 8 Global Step: 171500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:08,565-Speed 6306.91 samples/sec Loss 6.9155 LearningRate 0.0008 Epoch: 8 Global Step: 171510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:11,813-Speed 6307.78 samples/sec Loss 6.9995 LearningRate 0.0008 Epoch: 8 Global Step: 171520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:15,041-Speed 6344.43 samples/sec Loss 6.9278 LearningRate 0.0008 Epoch: 8 Global Step: 171530 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:18,285-Speed 6315.08 samples/sec Loss 6.9457 LearningRate 0.0008 Epoch: 8 Global Step: 171540 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:21,529-Speed 6315.23 samples/sec Loss 6.9576 LearningRate 0.0008 Epoch: 8 Global Step: 171550 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:24,774-Speed 6311.71 samples/sec Loss 6.8520 LearningRate 0.0008 Epoch: 8 Global Step: 171560 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:28,019-Speed 6313.47 samples/sec Loss 6.9145 LearningRate 0.0008 Epoch: 8 Global Step: 171570 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:31,268-Speed 6304.34 samples/sec Loss 6.9264 LearningRate 0.0008 Epoch: 8 Global Step: 171580 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:34,510-Speed 6318.29 samples/sec Loss 7.0232 LearningRate 0.0008 Epoch: 8 Global Step: 171590 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:37,767-Speed 6291.60 samples/sec Loss 6.9459 LearningRate 0.0008 Epoch: 8 Global Step: 171600 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:41,008-Speed 6318.78 samples/sec Loss 6.9745 LearningRate 0.0008 Epoch: 8 Global Step: 171610 Fp16 Grad Scale: 16384 
Required: 60 hours Training: 2022-04-01 07:20:44,254-Speed 6310.82 samples/sec Loss 6.9321 LearningRate 0.0008 Epoch: 8 Global Step: 171620 Fp16 Grad Scale: 16384 Required: 60 hours Training: 2022-04-01 07:20:47,498-Speed 6315.93 samples/sec Loss 6.8783 LearningRate 0.0008 Epoch: 8 Global Step: 171630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:50,745-Speed 6309.07 samples/sec Loss 6.8252 LearningRate 0.0008 Epoch: 8 Global Step: 171640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:53,989-Speed 6314.23 samples/sec Loss 6.9468 LearningRate 0.0008 Epoch: 8 Global Step: 171650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:20:57,238-Speed 6304.13 samples/sec Loss 6.8783 LearningRate 0.0008 Epoch: 8 Global Step: 171660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:00,484-Speed 6310.31 samples/sec Loss 6.8824 LearningRate 0.0008 Epoch: 8 Global Step: 171670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:03,735-Speed 6301.82 samples/sec Loss 6.8876 LearningRate 0.0008 Epoch: 8 Global Step: 171680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:06,979-Speed 6315.34 samples/sec Loss 6.8869 LearningRate 0.0008 Epoch: 8 Global Step: 171690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:10,224-Speed 6311.48 samples/sec Loss 6.9700 LearningRate 0.0008 Epoch: 8 Global Step: 171700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:13,470-Speed 6311.45 samples/sec Loss 6.9405 LearningRate 0.0008 Epoch: 8 Global Step: 171710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:16,714-Speed 6315.37 samples/sec Loss 6.9452 LearningRate 0.0008 Epoch: 8 Global Step: 171720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:19,959-Speed 6312.09 samples/sec Loss 6.8920 LearningRate 0.0008 Epoch: 8 Global Step: 171730 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:21:23,203-Speed 6313.48 samples/sec Loss 6.9006 LearningRate 0.0008 Epoch: 8 Global Step: 171740 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:26,450-Speed 6308.43 samples/sec Loss 6.9394 LearningRate 0.0008 Epoch: 8 Global Step: 171750 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:29,695-Speed 6314.17 samples/sec Loss 6.9782 LearningRate 0.0008 Epoch: 8 Global Step: 171760 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:32,967-Speed 6259.49 samples/sec Loss 7.0088 LearningRate 0.0008 Epoch: 8 Global Step: 171770 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:36,221-Speed 6295.79 samples/sec Loss 6.8854 LearningRate 0.0008 Epoch: 8 Global Step: 171780 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:39,471-Speed 6302.72 samples/sec Loss 6.9052 LearningRate 0.0008 Epoch: 8 Global Step: 171790 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:42,719-Speed 6306.34 samples/sec Loss 6.9068 LearningRate 0.0008 Epoch: 8 Global Step: 171800 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:21:45,952-Speed 6338.36 samples/sec Loss 7.0565 LearningRate 0.0008 Epoch: 8 Global Step: 171810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:49,199-Speed 6307.81 samples/sec Loss 6.9610 LearningRate 0.0008 Epoch: 8 Global Step: 171820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:52,445-Speed 6310.72 samples/sec Loss 6.8695 LearningRate 0.0008 Epoch: 8 Global Step: 171830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:55,690-Speed 6313.02 samples/sec Loss 6.9412 LearningRate 0.0008 Epoch: 8 Global Step: 171840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:21:58,985-Speed 6217.13 samples/sec Loss 6.9821 LearningRate 0.0008 Epoch: 8 Global Step: 171850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:02,247-Speed 6278.89 samples/sec Loss 
6.8884 LearningRate 0.0008 Epoch: 8 Global Step: 171860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:05,567-Speed 6171.63 samples/sec Loss 6.9478 LearningRate 0.0008 Epoch: 8 Global Step: 171870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:08,830-Speed 6275.96 samples/sec Loss 6.9451 LearningRate 0.0008 Epoch: 8 Global Step: 171880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:12,086-Speed 6292.96 samples/sec Loss 6.8901 LearningRate 0.0008 Epoch: 8 Global Step: 171890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:15,331-Speed 6310.98 samples/sec Loss 6.8560 LearningRate 0.0008 Epoch: 8 Global Step: 171900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:18,582-Speed 6302.55 samples/sec Loss 6.9054 LearningRate 0.0008 Epoch: 8 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:22:21,813-Speed 6338.92 samples/sec Loss 6.8494 LearningRate 0.0008 Epoch: 8 Global Step: 171920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:25,061-Speed 6307.72 samples/sec Loss 6.8939 LearningRate 0.0008 Epoch: 8 Global Step: 171930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:28,306-Speed 6312.38 samples/sec Loss 6.9542 LearningRate 0.0008 Epoch: 8 Global Step: 171940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:31,553-Speed 6308.39 samples/sec Loss 6.8482 LearningRate 0.0008 Epoch: 8 Global Step: 171950 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:34,809-Speed 6291.54 samples/sec Loss 6.9354 LearningRate 0.0008 Epoch: 8 Global Step: 171960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:38,054-Speed 6312.56 samples/sec Loss 7.0353 LearningRate 0.0008 Epoch: 8 Global Step: 171970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:41,298-Speed 6313.59 samples/sec Loss 6.8647 LearningRate 0.0008 Epoch: 8 Global 
Step: 171980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:44,557-Speed 6285.80 samples/sec Loss 6.9617 LearningRate 0.0008 Epoch: 8 Global Step: 171990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:47,804-Speed 6310.15 samples/sec Loss 6.8973 LearningRate 0.0008 Epoch: 8 Global Step: 172000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:51,051-Speed 6307.82 samples/sec Loss 6.8812 LearningRate 0.0008 Epoch: 8 Global Step: 172010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:22:54,297-Speed 6310.68 samples/sec Loss 6.9222 LearningRate 0.0008 Epoch: 8 Global Step: 172020 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:22:57,533-Speed 6331.11 samples/sec Loss 6.8900 LearningRate 0.0008 Epoch: 8 Global Step: 172030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:00,782-Speed 6305.03 samples/sec Loss 6.9383 LearningRate 0.0008 Epoch: 8 Global Step: 172040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:04,027-Speed 6312.43 samples/sec Loss 6.7950 LearningRate 0.0008 Epoch: 8 Global Step: 172050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:07,272-Speed 6312.88 samples/sec Loss 6.9344 LearningRate 0.0008 Epoch: 8 Global Step: 172060 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:10,515-Speed 6315.47 samples/sec Loss 6.8356 LearningRate 0.0008 Epoch: 8 Global Step: 172070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:13,761-Speed 6311.21 samples/sec Loss 6.9464 LearningRate 0.0008 Epoch: 8 Global Step: 172080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:17,008-Speed 6308.25 samples/sec Loss 6.9749 LearningRate 0.0008 Epoch: 8 Global Step: 172090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:20,257-Speed 6305.03 samples/sec Loss 6.9826 LearningRate 0.0008 Epoch: 8 Global Step: 172100 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:23:23,526-Speed 6266.14 samples/sec Loss 6.8924 LearningRate 0.0008 Epoch: 8 Global Step: 172110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:26,770-Speed 6315.77 samples/sec Loss 6.9145 LearningRate 0.0008 Epoch: 8 Global Step: 172120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:30,018-Speed 6305.94 samples/sec Loss 7.0185 LearningRate 0.0008 Epoch: 8 Global Step: 172130 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:23:33,245-Speed 6347.83 samples/sec Loss 6.8678 LearningRate 0.0008 Epoch: 8 Global Step: 172140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:36,491-Speed 6310.98 samples/sec Loss 6.9037 LearningRate 0.0008 Epoch: 8 Global Step: 172150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:39,738-Speed 6308.74 samples/sec Loss 6.8633 LearningRate 0.0008 Epoch: 8 Global Step: 172160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:42,982-Speed 6314.39 samples/sec Loss 6.9407 LearningRate 0.0008 Epoch: 8 Global Step: 172170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:46,228-Speed 6310.95 samples/sec Loss 6.8322 LearningRate 0.0008 Epoch: 8 Global Step: 172180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:49,476-Speed 6306.08 samples/sec Loss 6.8976 LearningRate 0.0008 Epoch: 8 Global Step: 172190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:52,722-Speed 6311.30 samples/sec Loss 6.9200 LearningRate 0.0008 Epoch: 8 Global Step: 172200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:55,966-Speed 6314.60 samples/sec Loss 6.9421 LearningRate 0.0008 Epoch: 8 Global Step: 172210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:23:59,213-Speed 6310.08 samples/sec Loss 6.9465 LearningRate 0.0008 Epoch: 8 Global Step: 172220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:24:02,460-Speed 6309.25 samples/sec Loss 7.0417 LearningRate 0.0008 Epoch: 8 Global Step: 172230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:05,709-Speed 6305.33 samples/sec Loss 7.0458 LearningRate 0.0008 Epoch: 8 Global Step: 172240 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:24:08,942-Speed 6335.46 samples/sec Loss 6.9647 LearningRate 0.0008 Epoch: 8 Global Step: 172250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:12,189-Speed 6308.61 samples/sec Loss 6.9240 LearningRate 0.0008 Epoch: 8 Global Step: 172260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:15,433-Speed 6313.79 samples/sec Loss 6.9095 LearningRate 0.0008 Epoch: 8 Global Step: 172270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:18,680-Speed 6309.76 samples/sec Loss 6.8398 LearningRate 0.0008 Epoch: 8 Global Step: 172280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:21,926-Speed 6309.41 samples/sec Loss 6.9285 LearningRate 0.0008 Epoch: 8 Global Step: 172290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:25,179-Speed 6297.92 samples/sec Loss 6.8946 LearningRate 0.0008 Epoch: 8 Global Step: 172300 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:28,420-Speed 6319.83 samples/sec Loss 6.8313 LearningRate 0.0008 Epoch: 8 Global Step: 172310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:31,663-Speed 6316.98 samples/sec Loss 6.8418 LearningRate 0.0008 Epoch: 8 Global Step: 172320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:34,910-Speed 6309.11 samples/sec Loss 6.8893 LearningRate 0.0008 Epoch: 8 Global Step: 172330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:38,160-Speed 6302.44 samples/sec Loss 6.8834 LearningRate 0.0008 Epoch: 8 Global Step: 172340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:41,408-Speed 6307.80 samples/sec Loss 
6.9492 LearningRate 0.0008 Epoch: 8 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:24:44,640-Speed 6336.86 samples/sec Loss 6.8129 LearningRate 0.0008 Epoch: 8 Global Step: 172360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:47,886-Speed 6312.00 samples/sec Loss 6.9416 LearningRate 0.0008 Epoch: 8 Global Step: 172370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:51,130-Speed 6313.83 samples/sec Loss 6.8760 LearningRate 0.0008 Epoch: 8 Global Step: 172380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:54,377-Speed 6308.86 samples/sec Loss 6.8694 LearningRate 0.0008 Epoch: 8 Global Step: 172390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:24:57,621-Speed 6314.67 samples/sec Loss 6.8799 LearningRate 0.0008 Epoch: 8 Global Step: 172400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:00,891-Speed 6265.27 samples/sec Loss 6.8984 LearningRate 0.0008 Epoch: 8 Global Step: 172410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:04,139-Speed 6306.53 samples/sec Loss 6.8508 LearningRate 0.0008 Epoch: 8 Global Step: 172420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:07,389-Speed 6303.21 samples/sec Loss 6.8894 LearningRate 0.0008 Epoch: 8 Global Step: 172430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:10,635-Speed 6312.02 samples/sec Loss 6.9326 LearningRate 0.0008 Epoch: 8 Global Step: 172440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:13,884-Speed 6303.69 samples/sec Loss 6.8857 LearningRate 0.0008 Epoch: 8 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:17,135-Speed 6301.93 samples/sec Loss 6.9558 LearningRate 0.0008 Epoch: 8 Global Step: 172460 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:25:20,381-Speed 6309.65 samples/sec Loss 6.8924 LearningRate 0.0008 Epoch: 8 Global 
Step: 172470 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:25:23,630-Speed 6306.05 samples/sec Loss 6.8746 LearningRate 0.0008 Epoch: 8 Global Step: 172480 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:25:26,868-Speed 6325.18 samples/sec Loss 6.8352 LearningRate 0.0008 Epoch: 8 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:30,116-Speed 6306.87 samples/sec Loss 6.8024 LearningRate 0.0008 Epoch: 8 Global Step: 172500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:33,359-Speed 6317.11 samples/sec Loss 6.9216 LearningRate 0.0008 Epoch: 8 Global Step: 172510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:36,606-Speed 6309.26 samples/sec Loss 6.9478 LearningRate 0.0008 Epoch: 8 Global Step: 172520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:39,853-Speed 6308.50 samples/sec Loss 6.8621 LearningRate 0.0008 Epoch: 8 Global Step: 172530 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:43,113-Speed 6283.53 samples/sec Loss 6.8789 LearningRate 0.0008 Epoch: 8 Global Step: 172540 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:46,359-Speed 6309.73 samples/sec Loss 6.8552 LearningRate 0.0008 Epoch: 8 Global Step: 172550 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:49,606-Speed 6309.56 samples/sec Loss 6.9266 LearningRate 0.0008 Epoch: 8 Global Step: 172560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:52,862-Speed 6291.12 samples/sec Loss 6.9251 LearningRate 0.0008 Epoch: 8 Global Step: 172570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:56,109-Speed 6308.48 samples/sec Loss 6.9248 LearningRate 0.0008 Epoch: 8 Global Step: 172580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:25:59,341-Speed 6337.50 samples/sec Loss 6.9232 LearningRate 0.0008 Epoch: 8 Global Step: 172590 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:26:02,586-Speed 6314.68 samples/sec Loss 6.9532 LearningRate 0.0008 Epoch: 8 Global Step: 172600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:05,832-Speed 6308.90 samples/sec Loss 6.9261 LearningRate 0.0008 Epoch: 8 Global Step: 172610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:09,077-Speed 6313.07 samples/sec Loss 6.8441 LearningRate 0.0008 Epoch: 8 Global Step: 172620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:12,337-Speed 6285.23 samples/sec Loss 6.9411 LearningRate 0.0008 Epoch: 8 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:15,584-Speed 6308.36 samples/sec Loss 6.9877 LearningRate 0.0008 Epoch: 8 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:18,830-Speed 6310.87 samples/sec Loss 6.8948 LearningRate 0.0008 Epoch: 8 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:22,077-Speed 6309.46 samples/sec Loss 6.9279 LearningRate 0.0008 Epoch: 8 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:25,327-Speed 6302.99 samples/sec Loss 6.8700 LearningRate 0.0008 Epoch: 8 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:28,580-Speed 6297.66 samples/sec Loss 6.9190 LearningRate 0.0008 Epoch: 8 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:31,823-Speed 6314.85 samples/sec Loss 6.9863 LearningRate 0.0008 Epoch: 8 Global Step: 172690 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:26:35,054-Speed 6340.50 samples/sec Loss 6.9371 LearningRate 0.0008 Epoch: 8 Global Step: 172700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:38,302-Speed 6306.94 samples/sec Loss 6.8740 LearningRate 0.0008 Epoch: 8 Global Step: 172710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:26:41,547-Speed 6313.27 samples/sec Loss 6.8727 LearningRate 0.0008 Epoch: 8 Global Step: 172720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:44,796-Speed 6303.78 samples/sec Loss 6.8581 LearningRate 0.0008 Epoch: 8 Global Step: 172730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:48,043-Speed 6308.94 samples/sec Loss 6.8808 LearningRate 0.0008 Epoch: 8 Global Step: 172740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:51,291-Speed 6307.42 samples/sec Loss 6.8861 LearningRate 0.0008 Epoch: 8 Global Step: 172750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:54,534-Speed 6316.53 samples/sec Loss 6.8443 LearningRate 0.0008 Epoch: 8 Global Step: 172760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:26:57,780-Speed 6310.01 samples/sec Loss 6.8492 LearningRate 0.0008 Epoch: 8 Global Step: 172770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:01,029-Speed 6304.95 samples/sec Loss 6.9811 LearningRate 0.0008 Epoch: 8 Global Step: 172780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:04,280-Speed 6302.54 samples/sec Loss 6.9080 LearningRate 0.0008 Epoch: 8 Global Step: 172790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:07,526-Speed 6308.89 samples/sec Loss 6.8991 LearningRate 0.0008 Epoch: 8 Global Step: 172800 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:10,769-Speed 6317.06 samples/sec Loss 6.8940 LearningRate 0.0008 Epoch: 8 Global Step: 172810 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:14,016-Speed 6310.14 samples/sec Loss 7.0345 LearningRate 0.0008 Epoch: 8 Global Step: 172820 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:17,263-Speed 6307.81 samples/sec Loss 7.0018 LearningRate 0.0008 Epoch: 8 Global Step: 172830 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:20,497-Speed 6333.95 samples/sec Loss 
6.9142 LearningRate 0.0008 Epoch: 8 Global Step: 172840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:23,747-Speed 6303.72 samples/sec Loss 6.9295 LearningRate 0.0008 Epoch: 8 Global Step: 172850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:26,994-Speed 6308.98 samples/sec Loss 6.9292 LearningRate 0.0008 Epoch: 8 Global Step: 172860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:30,243-Speed 6304.73 samples/sec Loss 7.0253 LearningRate 0.0008 Epoch: 8 Global Step: 172870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:33,490-Speed 6310.10 samples/sec Loss 6.9202 LearningRate 0.0008 Epoch: 8 Global Step: 172880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:36,733-Speed 6315.10 samples/sec Loss 6.7811 LearningRate 0.0008 Epoch: 8 Global Step: 172890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:39,979-Speed 6311.33 samples/sec Loss 6.9296 LearningRate 0.0008 Epoch: 8 Global Step: 172900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:43,225-Speed 6310.39 samples/sec Loss 6.9398 LearningRate 0.0008 Epoch: 8 Global Step: 172910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:46,474-Speed 6306.52 samples/sec Loss 6.9132 LearningRate 0.0008 Epoch: 8 Global Step: 172920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:49,802-Speed 6153.61 samples/sec Loss 6.9380 LearningRate 0.0008 Epoch: 8 Global Step: 172930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:27:53,143-Speed 6131.41 samples/sec Loss 6.8352 LearningRate 0.0008 Epoch: 8 Global Step: 172940 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:56,395-Speed 6299.88 samples/sec Loss 6.8742 LearningRate 0.0008 Epoch: 8 Global Step: 172950 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:27:59,627-Speed 6337.46 samples/sec Loss 6.9012 LearningRate 0.0008 Epoch: 8 Global 
Step: 172960 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:02,878-Speed 6300.52 samples/sec Loss 6.9701 LearningRate 0.0008 Epoch: 8 Global Step: 172970 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:06,130-Speed 6299.52 samples/sec Loss 6.9054 LearningRate 0.0008 Epoch: 8 Global Step: 172980 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:09,377-Speed 6309.32 samples/sec Loss 6.9560 LearningRate 0.0008 Epoch: 8 Global Step: 172990 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:12,624-Speed 6309.13 samples/sec Loss 6.8809 LearningRate 0.0008 Epoch: 8 Global Step: 173000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:15,869-Speed 6311.19 samples/sec Loss 6.9312 LearningRate 0.0008 Epoch: 8 Global Step: 173010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:19,119-Speed 6302.93 samples/sec Loss 6.8776 LearningRate 0.0008 Epoch: 8 Global Step: 173020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:22,372-Speed 6298.28 samples/sec Loss 6.8132 LearningRate 0.0008 Epoch: 8 Global Step: 173030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:25,621-Speed 6305.81 samples/sec Loss 6.9089 LearningRate 0.0008 Epoch: 8 Global Step: 173040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:28,871-Speed 6302.66 samples/sec Loss 6.9293 LearningRate 0.0008 Epoch: 8 Global Step: 173050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:32,113-Speed 6318.28 samples/sec Loss 6.9523 LearningRate 0.0008 Epoch: 8 Global Step: 173060 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:28:35,360-Speed 6308.75 samples/sec Loss 6.9429 LearningRate 0.0008 Epoch: 8 Global Step: 173070 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:28:38,606-Speed 6310.46 samples/sec Loss 6.8965 LearningRate 0.0008 Epoch: 8 Global Step: 173080 Fp16 Grad Scale: 65536 
Required: 60 hours Training: 2022-04-01 07:28:41,839-Speed 6336.07 samples/sec Loss 6.8908 LearningRate 0.0008 Epoch: 8 Global Step: 173090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:45,086-Speed 6309.72 samples/sec Loss 6.8862 LearningRate 0.0008 Epoch: 8 Global Step: 173100 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:48,334-Speed 6306.67 samples/sec Loss 6.9695 LearningRate 0.0008 Epoch: 8 Global Step: 173110 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:51,577-Speed 6315.84 samples/sec Loss 6.9361 LearningRate 0.0008 Epoch: 8 Global Step: 173120 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:54,818-Speed 6321.66 samples/sec Loss 6.9535 LearningRate 0.0008 Epoch: 8 Global Step: 173130 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:28:58,068-Speed 6303.38 samples/sec Loss 6.8884 LearningRate 0.0008 Epoch: 8 Global Step: 173140 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:01,311-Speed 6316.02 samples/sec Loss 6.9154 LearningRate 0.0008 Epoch: 8 Global Step: 173150 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:04,558-Speed 6309.66 samples/sec Loss 6.8383 LearningRate 0.0008 Epoch: 8 Global Step: 173160 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:07,805-Speed 6308.02 samples/sec Loss 6.9411 LearningRate 0.0008 Epoch: 8 Global Step: 173170 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:11,052-Speed 6308.46 samples/sec Loss 6.8799 LearningRate 0.0008 Epoch: 8 Global Step: 173180 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:14,312-Speed 6283.14 samples/sec Loss 6.9045 LearningRate 0.0008 Epoch: 8 Global Step: 173190 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:29:17,541-Speed 6343.67 samples/sec Loss 6.9134 LearningRate 0.0008 Epoch: 8 Global Step: 173200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:29:20,788-Speed 6309.80 samples/sec Loss 6.8949 LearningRate 0.0008 Epoch: 8 Global Step: 173210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:24,037-Speed 6304.85 samples/sec Loss 6.8498 LearningRate 0.0008 Epoch: 8 Global Step: 173220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:27,293-Speed 6290.89 samples/sec Loss 6.9043 LearningRate 0.0008 Epoch: 8 Global Step: 173230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:30,541-Speed 6307.03 samples/sec Loss 6.8469 LearningRate 0.0008 Epoch: 8 Global Step: 173240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:33,786-Speed 6312.52 samples/sec Loss 6.9336 LearningRate 0.0008 Epoch: 8 Global Step: 173250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:37,037-Speed 6300.98 samples/sec Loss 6.9168 LearningRate 0.0008 Epoch: 8 Global Step: 173260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:40,328-Speed 6224.27 samples/sec Loss 6.9700 LearningRate 0.0008 Epoch: 8 Global Step: 173270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:43,574-Speed 6310.58 samples/sec Loss 6.9816 LearningRate 0.0008 Epoch: 8 Global Step: 173280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:46,819-Speed 6312.82 samples/sec Loss 6.9596 LearningRate 0.0008 Epoch: 8 Global Step: 173290 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:50,068-Speed 6305.38 samples/sec Loss 6.8471 LearningRate 0.0008 Epoch: 8 Global Step: 173300 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:29:53,319-Speed 6301.35 samples/sec Loss 6.9368 LearningRate 0.0008 Epoch: 8 Global Step: 173310 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:29:56,558-Speed 6324.71 samples/sec Loss 6.9050 LearningRate 0.0008 Epoch: 8 Global Step: 173320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:29:59,817-Speed 6286.30 samples/sec Loss 
6.7785 LearningRate 0.0008 Epoch: 8 Global Step: 173330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:03,065-Speed 6307.55 samples/sec Loss 6.9226 LearningRate 0.0008 Epoch: 8 Global Step: 173340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:06,320-Speed 6293.11 samples/sec Loss 6.9064 LearningRate 0.0008 Epoch: 8 Global Step: 173350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:09,569-Speed 6304.85 samples/sec Loss 6.8740 LearningRate 0.0008 Epoch: 8 Global Step: 173360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:12,816-Speed 6308.68 samples/sec Loss 6.9772 LearningRate 0.0008 Epoch: 8 Global Step: 173370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:16,060-Speed 6314.76 samples/sec Loss 6.8890 LearningRate 0.0008 Epoch: 8 Global Step: 173380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:19,306-Speed 6309.41 samples/sec Loss 6.9731 LearningRate 0.0008 Epoch: 8 Global Step: 173390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:22,555-Speed 6306.15 samples/sec Loss 6.8978 LearningRate 0.0008 Epoch: 8 Global Step: 173400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:25,803-Speed 6306.65 samples/sec Loss 6.9044 LearningRate 0.0008 Epoch: 8 Global Step: 173410 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:29,035-Speed 6337.17 samples/sec Loss 6.9343 LearningRate 0.0008 Epoch: 8 Global Step: 173420 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:32,286-Speed 6301.95 samples/sec Loss 6.8427 LearningRate 0.0008 Epoch: 8 Global Step: 173430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:35,533-Speed 6307.90 samples/sec Loss 6.8653 LearningRate 0.0008 Epoch: 8 Global Step: 173440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:38,782-Speed 6306.08 samples/sec Loss 6.8781 LearningRate 0.0008 Epoch: 8 Global 
Step: 173450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:42,036-Speed 6295.16 samples/sec Loss 6.9610 LearningRate 0.0008 Epoch: 8 Global Step: 173460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:45,285-Speed 6305.11 samples/sec Loss 6.8717 LearningRate 0.0008 Epoch: 8 Global Step: 173470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:48,532-Speed 6307.00 samples/sec Loss 6.9384 LearningRate 0.0008 Epoch: 8 Global Step: 173480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:51,775-Speed 6317.58 samples/sec Loss 6.8956 LearningRate 0.0008 Epoch: 8 Global Step: 173490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:55,024-Speed 6305.43 samples/sec Loss 6.8997 LearningRate 0.0008 Epoch: 8 Global Step: 173500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:30:58,291-Speed 6269.90 samples/sec Loss 6.8316 LearningRate 0.0008 Epoch: 8 Global Step: 173510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:01,537-Speed 6310.57 samples/sec Loss 6.8983 LearningRate 0.0008 Epoch: 8 Global Step: 173520 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:04,792-Speed 6293.48 samples/sec Loss 6.8866 LearningRate 0.0008 Epoch: 8 Global Step: 173530 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:08,038-Speed 6311.78 samples/sec Loss 6.8575 LearningRate 0.0008 Epoch: 8 Global Step: 173540 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:11,292-Speed 6296.31 samples/sec Loss 6.8501 LearningRate 0.0008 Epoch: 8 Global Step: 173550 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:14,538-Speed 6310.36 samples/sec Loss 6.8948 LearningRate 0.0008 Epoch: 8 Global Step: 173560 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:17,786-Speed 6305.86 samples/sec Loss 6.9177 LearningRate 0.0008 Epoch: 8 Global Step: 173570 Fp16 Grad Scale: 65536 
Required: 60 hours Training: 2022-04-01 07:31:21,032-Speed 6310.62 samples/sec Loss 6.9226 LearningRate 0.0008 Epoch: 8 Global Step: 173580 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:24,279-Speed 6309.95 samples/sec Loss 6.8721 LearningRate 0.0008 Epoch: 8 Global Step: 173590 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:31:27,513-Speed 6332.70 samples/sec Loss 6.9002 LearningRate 0.0008 Epoch: 8 Global Step: 173600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:30,759-Speed 6311.97 samples/sec Loss 6.9011 LearningRate 0.0008 Epoch: 8 Global Step: 173610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:34,012-Speed 6297.53 samples/sec Loss 6.8943 LearningRate 0.0008 Epoch: 8 Global Step: 173620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:37,260-Speed 6306.11 samples/sec Loss 6.8552 LearningRate 0.0008 Epoch: 8 Global Step: 173630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:40,507-Speed 6308.79 samples/sec Loss 6.8333 LearningRate 0.0008 Epoch: 8 Global Step: 173640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:43,757-Speed 6302.35 samples/sec Loss 6.9126 LearningRate 0.0008 Epoch: 8 Global Step: 173650 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:47,005-Speed 6307.35 samples/sec Loss 6.8436 LearningRate 0.0008 Epoch: 8 Global Step: 173660 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:50,256-Speed 6300.22 samples/sec Loss 6.8349 LearningRate 0.0008 Epoch: 8 Global Step: 173670 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:53,504-Speed 6308.58 samples/sec Loss 6.9451 LearningRate 0.0008 Epoch: 8 Global Step: 173680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:31:56,748-Speed 6313.06 samples/sec Loss 6.8348 LearningRate 0.0008 Epoch: 8 Global Step: 173690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 
07:32:00,009-Speed 6282.95 samples/sec Loss 6.9162 LearningRate 0.0008 Epoch: 8 Global Step: 173700 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:32:03,258-Speed 6303.51 samples/sec Loss 6.8487 LearningRate 0.0008 Epoch: 8 Global Step: 173710 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:32:06,507-Speed 6304.76 samples/sec Loss 6.8532 LearningRate 0.0008 Epoch: 8 Global Step: 173720 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:32:09,753-Speed 6311.12 samples/sec Loss 6.9539 LearningRate 0.0008 Epoch: 8 Global Step: 173730 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:32:13,002-Speed 6306.55 samples/sec Loss 6.8540 LearningRate 0.0008 Epoch: 8 Global Step: 173740 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:32:16,235-Speed 6336.59 samples/sec Loss 6.8787 LearningRate 0.0008 Epoch: 8 Global Step: 173750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:19,478-Speed 6315.67 samples/sec Loss 6.8915 LearningRate 0.0008 Epoch: 8 Global Step: 173760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:22,727-Speed 6306.39 samples/sec Loss 6.8131 LearningRate 0.0008 Epoch: 8 Global Step: 173770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:25,972-Speed 6311.13 samples/sec Loss 6.9216 LearningRate 0.0008 Epoch: 8 Global Step: 173780 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:29,218-Speed 6311.51 samples/sec Loss 6.9383 LearningRate 0.0008 Epoch: 8 Global Step: 173790 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:32,467-Speed 6304.54 samples/sec Loss 6.9270 LearningRate 0.0008 Epoch: 8 Global Step: 173800 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:35,709-Speed 6319.65 samples/sec Loss 6.8610 LearningRate 0.0008 Epoch: 8 Global Step: 173810 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:38,956-Speed 6308.22 samples/sec Loss 
6.9600 LearningRate 0.0008 Epoch: 8 Global Step: 173820 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:42,201-Speed 6312.83 samples/sec Loss 6.9310 LearningRate 0.0008 Epoch: 8 Global Step: 173830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:45,451-Speed 6302.90 samples/sec Loss 6.8862 LearningRate 0.0008 Epoch: 8 Global Step: 173840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:48,683-Speed 6337.50 samples/sec Loss 6.8235 LearningRate 0.0008 Epoch: 8 Global Step: 173850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:51,929-Speed 6311.77 samples/sec Loss 6.8897 LearningRate 0.0008 Epoch: 8 Global Step: 173860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:55,172-Speed 6315.57 samples/sec Loss 6.8723 LearningRate 0.0008 Epoch: 8 Global Step: 173870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:32:58,417-Speed 6312.14 samples/sec Loss 6.9512 LearningRate 0.0008 Epoch: 8 Global Step: 173880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:01,666-Speed 6305.00 samples/sec Loss 6.7913 LearningRate 0.0008 Epoch: 8 Global Step: 173890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:04,912-Speed 6311.89 samples/sec Loss 6.9836 LearningRate 0.0008 Epoch: 8 Global Step: 173900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:08,162-Speed 6302.75 samples/sec Loss 6.8396 LearningRate 0.0008 Epoch: 8 Global Step: 173910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:11,414-Speed 6298.14 samples/sec Loss 6.9386 LearningRate 0.0008 Epoch: 8 Global Step: 173920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:14,655-Speed 6320.15 samples/sec Loss 6.7688 LearningRate 0.0008 Epoch: 8 Global Step: 173930 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:17,909-Speed 6295.56 samples/sec Loss 6.9542 LearningRate 0.0008 Epoch: 8 Global 
Step: 173940 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:21,158-Speed 6306.01 samples/sec Loss 6.9605 LearningRate 0.0008 Epoch: 8 Global Step: 173950 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:33:24,410-Speed 6299.21 samples/sec Loss 6.8554 LearningRate 0.0008 Epoch: 8 Global Step: 173960 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:33:27,660-Speed 6302.63 samples/sec Loss 6.9151 LearningRate 0.0008 Epoch: 8 Global Step: 173970 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:33:30,906-Speed 6311.33 samples/sec Loss 6.9123 LearningRate 0.0008 Epoch: 8 Global Step: 173980 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:33:34,158-Speed 6298.70 samples/sec Loss 6.8570 LearningRate 0.0008 Epoch: 8 Global Step: 173990 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:33:37,395-Speed 6328.07 samples/sec Loss 6.9002 LearningRate 0.0008 Epoch: 8 Global Step: 174000 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:40,644-Speed 6304.79 samples/sec Loss 6.8673 LearningRate 0.0008 Epoch: 8 Global Step: 174010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:43,895-Speed 6300.87 samples/sec Loss 6.9782 LearningRate 0.0008 Epoch: 8 Global Step: 174020 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:47,147-Speed 6299.90 samples/sec Loss 6.9561 LearningRate 0.0008 Epoch: 8 Global Step: 174030 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:50,395-Speed 6306.11 samples/sec Loss 6.8617 LearningRate 0.0008 Epoch: 8 Global Step: 174040 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:53,647-Speed 6298.60 samples/sec Loss 6.9225 LearningRate 0.0008 Epoch: 8 Global Step: 174050 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:33:56,895-Speed 6307.99 samples/sec Loss 6.8970 LearningRate 0.0008 Epoch: 8 Global Step: 174060 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:34:00,140-Speed 6311.65 samples/sec Loss 6.8834 LearningRate 0.0008 Epoch: 8 Global Step: 174070 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:03,384-Speed 6314.52 samples/sec Loss 6.8925 LearningRate 0.0008 Epoch: 8 Global Step: 174080 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:06,633-Speed 6305.04 samples/sec Loss 6.8212 LearningRate 0.0008 Epoch: 8 Global Step: 174090 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:09,886-Speed 6296.35 samples/sec Loss 6.8151 LearningRate 0.0008 Epoch: 8 Global Step: 174100 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:13,142-Speed 6291.38 samples/sec Loss 6.9677 LearningRate 0.0008 Epoch: 8 Global Step: 174110 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:16,400-Speed 6289.25 samples/sec Loss 6.9319 LearningRate 0.0008 Epoch: 8 Global Step: 174120 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:19,645-Speed 6311.20 samples/sec Loss 6.9094 LearningRate 0.0008 Epoch: 8 Global Step: 174130 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:22,904-Speed 6285.86 samples/sec Loss 6.9933 LearningRate 0.0008 Epoch: 8 Global Step: 174140 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:26,153-Speed 6304.87 samples/sec Loss 6.8529 LearningRate 0.0008 Epoch: 8 Global Step: 174150 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:29,401-Speed 6306.82 samples/sec Loss 6.9007 LearningRate 0.0008 Epoch: 8 Global Step: 174160 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:32,645-Speed 6315.15 samples/sec Loss 6.8254 LearningRate 0.0008 Epoch: 8 Global Step: 174170 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:34:35,894-Speed 6305.79 samples/sec Loss 6.9105 LearningRate 0.0008 Epoch: 8 Global Step: 174180 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:34:39,130-Speed 6330.16 samples/sec Loss 6.8951 LearningRate 0.0008 Epoch: 8 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:42,378-Speed 6307.03 samples/sec Loss 6.8886 LearningRate 0.0008 Epoch: 8 Global Step: 174200 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:45,628-Speed 6302.99 samples/sec Loss 6.8714 LearningRate 0.0008 Epoch: 8 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:48,873-Speed 6311.98 samples/sec Loss 6.9411 LearningRate 0.0008 Epoch: 8 Global Step: 174220 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:52,121-Speed 6306.97 samples/sec Loss 6.8187 LearningRate 0.0008 Epoch: 8 Global Step: 174230 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:55,366-Speed 6311.57 samples/sec Loss 6.8500 LearningRate 0.0008 Epoch: 8 Global Step: 174240 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:34:58,612-Speed 6311.60 samples/sec Loss 6.8689 LearningRate 0.0008 Epoch: 8 Global Step: 174250 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:01,854-Speed 6318.11 samples/sec Loss 6.9522 LearningRate 0.0008 Epoch: 8 Global Step: 174260 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:05,102-Speed 6306.69 samples/sec Loss 6.7792 LearningRate 0.0008 Epoch: 8 Global Step: 174270 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:08,352-Speed 6303.48 samples/sec Loss 6.8571 LearningRate 0.0008 Epoch: 8 Global Step: 174280 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:11,599-Speed 6308.63 samples/sec Loss 6.9362 LearningRate 0.0008 Epoch: 8 Global Step: 174290 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:35:14,849-Speed 6303.77 samples/sec Loss 6.8734 LearningRate 0.0008 Epoch: 8 Global Step: 174300 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:35:18,090-Speed 6319.07 samples/sec Loss 
6.9384 LearningRate 0.0008 Epoch: 8 Global Step: 174310 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:21,341-Speed 6301.03 samples/sec Loss 6.9086 LearningRate 0.0008 Epoch: 8 Global Step: 174320 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:24,589-Speed 6306.75 samples/sec Loss 6.9562 LearningRate 0.0008 Epoch: 8 Global Step: 174330 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:27,839-Speed 6303.24 samples/sec Loss 6.8892 LearningRate 0.0008 Epoch: 8 Global Step: 174340 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:31,090-Speed 6300.81 samples/sec Loss 6.9311 LearningRate 0.0008 Epoch: 8 Global Step: 174350 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:34,338-Speed 6306.60 samples/sec Loss 6.8551 LearningRate 0.0008 Epoch: 8 Global Step: 174360 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:37,587-Speed 6305.34 samples/sec Loss 6.8571 LearningRate 0.0008 Epoch: 8 Global Step: 174370 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:40,837-Speed 6303.61 samples/sec Loss 6.8493 LearningRate 0.0008 Epoch: 8 Global Step: 174380 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:44,084-Speed 6308.79 samples/sec Loss 6.8975 LearningRate 0.0008 Epoch: 8 Global Step: 174390 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:47,334-Speed 6304.20 samples/sec Loss 6.8702 LearningRate 0.0008 Epoch: 8 Global Step: 174400 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:35:50,581-Speed 6307.26 samples/sec Loss 6.8475 LearningRate 0.0008 Epoch: 8 Global Step: 174410 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:35:53,835-Speed 6296.54 samples/sec Loss 6.8702 LearningRate 0.0008 Epoch: 8 Global Step: 174420 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:35:57,067-Speed 6336.65 samples/sec Loss 6.8787 LearningRate 0.0008 Epoch: 8 Global 
Step: 174430 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:00,339-Speed 6261.37 samples/sec Loss 6.8894 LearningRate 0.0008 Epoch: 8 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:03,587-Speed 6306.52 samples/sec Loss 6.9233 LearningRate 0.0008 Epoch: 8 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:06,837-Speed 6303.54 samples/sec Loss 6.9217 LearningRate 0.0008 Epoch: 8 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:10,090-Speed 6296.87 samples/sec Loss 6.8813 LearningRate 0.0008 Epoch: 8 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:13,339-Speed 6305.37 samples/sec Loss 6.8650 LearningRate 0.0008 Epoch: 8 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:16,586-Speed 6308.15 samples/sec Loss 6.9107 LearningRate 0.0008 Epoch: 8 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:19,833-Speed 6308.25 samples/sec Loss 6.8491 LearningRate 0.0008 Epoch: 8 Global Step: 174500 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:23,082-Speed 6305.17 samples/sec Loss 6.8999 LearningRate 0.0008 Epoch: 8 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:26,327-Speed 6311.90 samples/sec Loss 6.9055 LearningRate 0.0008 Epoch: 8 Global Step: 174520 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:29,574-Speed 6309.80 samples/sec Loss 6.9311 LearningRate 0.0008 Epoch: 8 Global Step: 174530 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:36:32,827-Speed 6296.79 samples/sec Loss 6.9613 LearningRate 0.0008 Epoch: 8 Global Step: 174540 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:36:36,062-Speed 6332.35 samples/sec Loss 6.7836 LearningRate 0.0008 Epoch: 8 Global Step: 174550 Fp16 Grad Scale: 32768 
Required: 60 hours Training: 2022-04-01 07:36:39,310-Speed 6306.55 samples/sec Loss 6.9217 LearningRate 0.0008 Epoch: 8 Global Step: 174560 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:42,559-Speed 6306.07 samples/sec Loss 6.9053 LearningRate 0.0008 Epoch: 8 Global Step: 174570 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:45,803-Speed 6314.50 samples/sec Loss 6.8532 LearningRate 0.0008 Epoch: 8 Global Step: 174580 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:49,050-Speed 6308.81 samples/sec Loss 6.8562 LearningRate 0.0008 Epoch: 8 Global Step: 174590 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:52,298-Speed 6306.29 samples/sec Loss 6.8468 LearningRate 0.0008 Epoch: 8 Global Step: 174600 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:55,544-Speed 6310.64 samples/sec Loss 6.8422 LearningRate 0.0008 Epoch: 8 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:36:58,790-Speed 6312.43 samples/sec Loss 6.8560 LearningRate 0.0008 Epoch: 8 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:02,038-Speed 6306.41 samples/sec Loss 6.7754 LearningRate 0.0008 Epoch: 8 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:05,285-Speed 6307.19 samples/sec Loss 6.8209 LearningRate 0.0008 Epoch: 8 Global Step: 174640 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:08,531-Speed 6311.82 samples/sec Loss 6.8236 LearningRate 0.0008 Epoch: 8 Global Step: 174650 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:37:11,780-Speed 6304.47 samples/sec Loss 6.9115 LearningRate 0.0008 Epoch: 8 Global Step: 174660 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:37:15,031-Speed 6301.71 samples/sec Loss 6.9474 LearningRate 0.0008 Epoch: 8 Global Step: 174670 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 
07:37:18,267-Speed 6330.12 samples/sec Loss 6.8861 LearningRate 0.0008 Epoch: 8 Global Step: 174680 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:21,512-Speed 6312.58 samples/sec Loss 6.7750 LearningRate 0.0008 Epoch: 8 Global Step: 174690 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:24,755-Speed 6315.60 samples/sec Loss 6.9627 LearningRate 0.0008 Epoch: 8 Global Step: 174700 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:28,005-Speed 6303.41 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 174710 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:31,251-Speed 6311.15 samples/sec Loss 6.8658 LearningRate 0.0008 Epoch: 8 Global Step: 174720 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:34,509-Speed 6287.43 samples/sec Loss 6.8153 LearningRate 0.0008 Epoch: 8 Global Step: 174730 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:37,755-Speed 6309.33 samples/sec Loss 6.9462 LearningRate 0.0008 Epoch: 8 Global Step: 174740 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:40,998-Speed 6317.71 samples/sec Loss 6.8875 LearningRate 0.0008 Epoch: 8 Global Step: 174750 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:44,243-Speed 6312.91 samples/sec Loss 6.8449 LearningRate 0.0008 Epoch: 8 Global Step: 174760 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:47,483-Speed 6322.13 samples/sec Loss 6.8803 LearningRate 0.0008 Epoch: 8 Global Step: 174770 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:37:50,729-Speed 6310.65 samples/sec Loss 6.8265 LearningRate 0.0008 Epoch: 8 Global Step: 174780 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:37:53,975-Speed 6311.00 samples/sec Loss 6.8697 LearningRate 0.0008 Epoch: 8 Global Step: 174790 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:37:57,223-Speed 6306.64 samples/sec Loss 
6.8785 LearningRate 0.0008 Epoch: 8 Global Step: 174800 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:00,471-Speed 6307.20 samples/sec Loss 6.9178 LearningRate 0.0008 Epoch: 8 Global Step: 174810 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:03,717-Speed 6311.76 samples/sec Loss 6.8760 LearningRate 0.0008 Epoch: 8 Global Step: 174820 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:06,953-Speed 6329.83 samples/sec Loss 6.8778 LearningRate 0.0008 Epoch: 8 Global Step: 174830 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:10,197-Speed 6314.28 samples/sec Loss 6.8591 LearningRate 0.0008 Epoch: 8 Global Step: 174840 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:13,442-Speed 6312.10 samples/sec Loss 6.9651 LearningRate 0.0008 Epoch: 8 Global Step: 174850 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:16,704-Speed 6280.16 samples/sec Loss 6.9075 LearningRate 0.0008 Epoch: 8 Global Step: 174860 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:19,949-Speed 6312.94 samples/sec Loss 6.8693 LearningRate 0.0008 Epoch: 8 Global Step: 174870 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:23,200-Speed 6301.35 samples/sec Loss 6.8454 LearningRate 0.0008 Epoch: 8 Global Step: 174880 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:26,448-Speed 6305.90 samples/sec Loss 6.9242 LearningRate 0.0008 Epoch: 8 Global Step: 174890 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:29,703-Speed 6294.09 samples/sec Loss 6.8956 LearningRate 0.0008 Epoch: 8 Global Step: 174900 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:32,952-Speed 6305.03 samples/sec Loss 6.9509 LearningRate 0.0008 Epoch: 8 Global Step: 174910 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:36,199-Speed 6308.18 samples/sec Loss 6.8140 LearningRate 0.0008 Epoch: 8 Global 
Step: 174920 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:38:39,449-Speed 6302.79 samples/sec Loss 6.9145 LearningRate 0.0008 Epoch: 8 Global Step: 174930 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:42,695-Speed 6310.45 samples/sec Loss 6.8917 LearningRate 0.0008 Epoch: 8 Global Step: 174940 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:45,948-Speed 6298.21 samples/sec Loss 6.9139 LearningRate 0.0008 Epoch: 8 Global Step: 174950 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:49,196-Speed 6305.92 samples/sec Loss 6.8644 LearningRate 0.0008 Epoch: 8 Global Step: 174960 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:52,446-Speed 6302.96 samples/sec Loss 6.8449 LearningRate 0.0008 Epoch: 8 Global Step: 174970 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:55,692-Speed 6310.93 samples/sec Loss 6.8403 LearningRate 0.0008 Epoch: 8 Global Step: 174980 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:38:58,941-Speed 6306.15 samples/sec Loss 6.8488 LearningRate 0.0008 Epoch: 8 Global Step: 174990 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:39:02,191-Speed 6302.91 samples/sec Loss 7.0005 LearningRate 0.0008 Epoch: 8 Global Step: 175000 Fp16 Grad Scale: 65536 Required: 60 hours Training: 2022-04-01 07:39:05,442-Speed 6300.65 samples/sec Loss 6.8507 LearningRate 0.0008 Epoch: 8 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 60 hours Training: 2022-04-01 07:39:08,690-Speed 6307.23 samples/sec Loss 6.8687 LearningRate 0.0008 Epoch: 8 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:11,938-Speed 6307.29 samples/sec Loss 6.8979 LearningRate 0.0008 Epoch: 8 Global Step: 175030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:15,186-Speed 6307.17 samples/sec Loss 6.9466 LearningRate 0.0008 Epoch: 8 Global Step: 175040 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 07:39:18,431-Speed 6312.80 samples/sec Loss 6.8474 LearningRate 0.0008 Epoch: 8 Global Step: 175050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:21,676-Speed 6311.97 samples/sec Loss 6.8629 LearningRate 0.0008 Epoch: 8 Global Step: 175060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:24,925-Speed 6304.77 samples/sec Loss 6.8792 LearningRate 0.0008 Epoch: 8 Global Step: 175070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:28,173-Speed 6306.65 samples/sec Loss 6.8824 LearningRate 0.0008 Epoch: 8 Global Step: 175080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:31,425-Speed 6298.69 samples/sec Loss 6.8192 LearningRate 0.0008 Epoch: 8 Global Step: 175090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:34,686-Speed 6281.34 samples/sec Loss 6.8627 LearningRate 0.0008 Epoch: 8 Global Step: 175100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:37,929-Speed 6316.09 samples/sec Loss 6.7937 LearningRate 0.0008 Epoch: 8 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:39:41,164-Speed 6333.55 samples/sec Loss 6.8282 LearningRate 0.0008 Epoch: 8 Global Step: 175120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:44,412-Speed 6306.02 samples/sec Loss 6.9145 LearningRate 0.0008 Epoch: 8 Global Step: 175130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:47,662-Speed 6304.32 samples/sec Loss 6.8507 LearningRate 0.0008 Epoch: 8 Global Step: 175140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:50,914-Speed 6298.25 samples/sec Loss 6.8415 LearningRate 0.0008 Epoch: 8 Global Step: 175150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:39:54,158-Speed 6313.80 samples/sec Loss 6.8708 LearningRate 0.0008 Epoch: 8 Global Step: 175160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
07:39:57,409-Speed 6301.74 samples/sec Loss 6.8671 LearningRate 0.0008 Epoch: 8 Global Step: 175170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:00,656-Speed 6309.36 samples/sec Loss 6.8647 LearningRate 0.0008 Epoch: 8 Global Step: 175180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:03,902-Speed 6310.53 samples/sec Loss 6.8371 LearningRate 0.0008 Epoch: 8 Global Step: 175190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:07,152-Speed 6303.59 samples/sec Loss 6.8817 LearningRate 0.0008 Epoch: 8 Global Step: 175200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:10,395-Speed 6316.20 samples/sec Loss 6.8640 LearningRate 0.0008 Epoch: 8 Global Step: 175210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:13,646-Speed 6300.05 samples/sec Loss 6.9027 LearningRate 0.0008 Epoch: 8 Global Step: 175220 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:40:16,896-Speed 6304.43 samples/sec Loss 6.8697 LearningRate 0.0008 Epoch: 8 Global Step: 175230 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:40:20,137-Speed 6320.27 samples/sec Loss 7.0053 LearningRate 0.0008 Epoch: 8 Global Step: 175240 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:40:23,381-Speed 6314.20 samples/sec Loss 6.8630 LearningRate 0.0008 Epoch: 8 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:40:26,613-Speed 6338.36 samples/sec Loss 6.9073 LearningRate 0.0008 Epoch: 8 Global Step: 175260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:29,859-Speed 6311.22 samples/sec Loss 6.8047 LearningRate 0.0008 Epoch: 8 Global Step: 175270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:33,103-Speed 6313.01 samples/sec Loss 6.8120 LearningRate 0.0008 Epoch: 8 Global Step: 175280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:36,351-Speed 6308.01 samples/sec Loss 
6.9052 LearningRate 0.0008 Epoch: 8 Global Step: 175290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:39,597-Speed 6310.88 samples/sec Loss 6.9448 LearningRate 0.0008 Epoch: 8 Global Step: 175300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:42,877-Speed 6244.59 samples/sec Loss 6.9171 LearningRate 0.0008 Epoch: 8 Global Step: 175310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:46,123-Speed 6311.19 samples/sec Loss 6.8699 LearningRate 0.0008 Epoch: 8 Global Step: 175320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:49,371-Speed 6306.69 samples/sec Loss 6.7547 LearningRate 0.0008 Epoch: 8 Global Step: 175330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:52,620-Speed 6305.40 samples/sec Loss 6.8870 LearningRate 0.0008 Epoch: 8 Global Step: 175340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:55,867-Speed 6308.95 samples/sec Loss 6.8793 LearningRate 0.0008 Epoch: 8 Global Step: 175350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:40:59,114-Speed 6308.45 samples/sec Loss 6.8092 LearningRate 0.0008 Epoch: 8 Global Step: 175360 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:41:02,356-Speed 6317.59 samples/sec Loss 6.8631 LearningRate 0.0008 Epoch: 8 Global Step: 175370 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:41:05,602-Speed 6311.24 samples/sec Loss 6.8449 LearningRate 0.0008 Epoch: 8 Global Step: 175380 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:41:08,838-Speed 6329.53 samples/sec Loss 6.9095 LearningRate 0.0008 Epoch: 8 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:12,082-Speed 6315.22 samples/sec Loss 6.8744 LearningRate 0.0008 Epoch: 8 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:15,328-Speed 6311.54 samples/sec Loss 6.9219 LearningRate 0.0008 Epoch: 8 Global 
Step: 175410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:18,572-Speed 6313.37 samples/sec Loss 6.9028 LearningRate 0.0008 Epoch: 8 Global Step: 175420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:21,825-Speed 6298.68 samples/sec Loss 6.8950 LearningRate 0.0008 Epoch: 8 Global Step: 175430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:25,073-Speed 6307.31 samples/sec Loss 6.8733 LearningRate 0.0008 Epoch: 8 Global Step: 175440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:28,324-Speed 6301.44 samples/sec Loss 6.8509 LearningRate 0.0008 Epoch: 8 Global Step: 175450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:31,567-Speed 6315.44 samples/sec Loss 6.8812 LearningRate 0.0008 Epoch: 8 Global Step: 175460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:34,815-Speed 6307.66 samples/sec Loss 6.8194 LearningRate 0.0008 Epoch: 8 Global Step: 175470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:38,060-Speed 6311.81 samples/sec Loss 6.8035 LearningRate 0.0008 Epoch: 8 Global Step: 175480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:41,305-Speed 6312.14 samples/sec Loss 6.8629 LearningRate 0.0008 Epoch: 8 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:41:44,535-Speed 6342.15 samples/sec Loss 6.8883 LearningRate 0.0008 Epoch: 8 Global Step: 175500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:47,778-Speed 6317.41 samples/sec Loss 6.8508 LearningRate 0.0008 Epoch: 8 Global Step: 175510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:51,023-Speed 6312.27 samples/sec Loss 6.8794 LearningRate 0.0008 Epoch: 8 Global Step: 175520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:41:54,275-Speed 6299.48 samples/sec Loss 6.8554 LearningRate 0.0008 Epoch: 8 Global Step: 175530 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 07:41:57,521-Speed 6310.35 samples/sec Loss 6.8738 LearningRate 0.0008 Epoch: 8 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:00,767-Speed 6310.31 samples/sec Loss 6.8798 LearningRate 0.0008 Epoch: 8 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:04,018-Speed 6301.74 samples/sec Loss 6.8017 LearningRate 0.0008 Epoch: 8 Global Step: 175560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:07,266-Speed 6305.51 samples/sec Loss 6.8553 LearningRate 0.0008 Epoch: 8 Global Step: 175570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:10,511-Speed 6313.51 samples/sec Loss 6.8844 LearningRate 0.0008 Epoch: 8 Global Step: 175580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:13,755-Speed 6313.72 samples/sec Loss 6.8027 LearningRate 0.0008 Epoch: 8 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:17,003-Speed 6307.11 samples/sec Loss 6.9523 LearningRate 0.0008 Epoch: 8 Global Step: 175600 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:42:20,240-Speed 6329.03 samples/sec Loss 6.9369 LearningRate 0.0008 Epoch: 8 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:23,487-Speed 6309.05 samples/sec Loss 6.9349 LearningRate 0.0008 Epoch: 8 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:26,732-Speed 6312.83 samples/sec Loss 6.8668 LearningRate 0.0008 Epoch: 8 Global Step: 175630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:29,977-Speed 6313.19 samples/sec Loss 6.9270 LearningRate 0.0008 Epoch: 8 Global Step: 175640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:33,221-Speed 6313.85 samples/sec Loss 6.9000 LearningRate 0.0008 Epoch: 8 Global Step: 175650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
07:42:36,464-Speed 6316.58 samples/sec Loss 6.9169 LearningRate 0.0008 Epoch: 8 Global Step: 175660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:39,711-Speed 6308.11 samples/sec Loss 6.9190 LearningRate 0.0008 Epoch: 8 Global Step: 175670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:42,960-Speed 6304.93 samples/sec Loss 6.9608 LearningRate 0.0008 Epoch: 8 Global Step: 175680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:46,206-Speed 6311.43 samples/sec Loss 6.8609 LearningRate 0.0008 Epoch: 8 Global Step: 175690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:49,457-Speed 6301.69 samples/sec Loss 6.8818 LearningRate 0.0008 Epoch: 8 Global Step: 175700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:52,688-Speed 6340.66 samples/sec Loss 6.9139 LearningRate 0.0008 Epoch: 8 Global Step: 175710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:55,936-Speed 6305.20 samples/sec Loss 6.8775 LearningRate 0.0008 Epoch: 8 Global Step: 175720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:42:59,186-Speed 6302.83 samples/sec Loss 6.8027 LearningRate 0.0008 Epoch: 8 Global Step: 175730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:02,434-Speed 6307.55 samples/sec Loss 6.8784 LearningRate 0.0008 Epoch: 8 Global Step: 175740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:05,681-Speed 6307.93 samples/sec Loss 6.8578 LearningRate 0.0008 Epoch: 8 Global Step: 175750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:08,927-Speed 6312.31 samples/sec Loss 6.7666 LearningRate 0.0008 Epoch: 8 Global Step: 175760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:12,177-Speed 6301.20 samples/sec Loss 6.8291 LearningRate 0.0008 Epoch: 8 Global Step: 175770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:15,422-Speed 6313.34 samples/sec Loss 
6.8081 LearningRate 0.0008 Epoch: 8 Global Step: 175780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:18,671-Speed 6305.81 samples/sec Loss 6.9114 LearningRate 0.0008 Epoch: 8 Global Step: 175790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:21,918-Speed 6308.84 samples/sec Loss 6.8778 LearningRate 0.0008 Epoch: 8 Global Step: 175800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:25,164-Speed 6309.94 samples/sec Loss 6.8181 LearningRate 0.0008 Epoch: 8 Global Step: 175810 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:43:28,497-Speed 6145.84 samples/sec Loss 6.8801 LearningRate 0.0008 Epoch: 8 Global Step: 175820 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:43:31,739-Speed 6317.51 samples/sec Loss 6.8515 LearningRate 0.0008 Epoch: 8 Global Step: 175830 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:43:34,976-Speed 6330.15 samples/sec Loss 6.9516 LearningRate 0.0008 Epoch: 8 Global Step: 175840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:38,221-Speed 6312.06 samples/sec Loss 6.8985 LearningRate 0.0008 Epoch: 8 Global Step: 175850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:41,467-Speed 6310.87 samples/sec Loss 6.9036 LearningRate 0.0008 Epoch: 8 Global Step: 175860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:44,716-Speed 6304.67 samples/sec Loss 6.9013 LearningRate 0.0008 Epoch: 8 Global Step: 175870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:47,961-Speed 6313.22 samples/sec Loss 6.9403 LearningRate 0.0008 Epoch: 8 Global Step: 175880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:51,207-Speed 6313.45 samples/sec Loss 6.9487 LearningRate 0.0008 Epoch: 8 Global Step: 175890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:54,456-Speed 6305.34 samples/sec Loss 6.8861 LearningRate 0.0008 Epoch: 8 Global 
Step: 175900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:43:57,702-Speed 6309.86 samples/sec Loss 6.8795 LearningRate 0.0008 Epoch: 8 Global Step: 175910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:00,946-Speed 6314.79 samples/sec Loss 6.8749 LearningRate 0.0008 Epoch: 8 Global Step: 175920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:04,302-Speed 6104.57 samples/sec Loss 6.8892 LearningRate 0.0008 Epoch: 8 Global Step: 175930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:07,592-Speed 6224.56 samples/sec Loss 6.8431 LearningRate 0.0008 Epoch: 8 Global Step: 175940 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:44:10,826-Speed 6334.83 samples/sec Loss 6.7882 LearningRate 0.0008 Epoch: 8 Global Step: 175950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:14,075-Speed 6306.20 samples/sec Loss 6.8595 LearningRate 0.0008 Epoch: 8 Global Step: 175960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:17,328-Speed 6295.57 samples/sec Loss 6.8519 LearningRate 0.0008 Epoch: 8 Global Step: 175970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:20,576-Speed 6307.62 samples/sec Loss 6.9209 LearningRate 0.0008 Epoch: 8 Global Step: 175980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:23,819-Speed 6316.60 samples/sec Loss 6.7866 LearningRate 0.0008 Epoch: 8 Global Step: 175990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:27,065-Speed 6311.13 samples/sec Loss 6.8222 LearningRate 0.0008 Epoch: 8 Global Step: 176000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:30,355-Speed 6225.75 samples/sec Loss 6.8384 LearningRate 0.0008 Epoch: 8 Global Step: 176010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:33,600-Speed 6313.09 samples/sec Loss 6.8553 LearningRate 0.0008 Epoch: 8 Global Step: 176020 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 07:44:36,844-Speed 6313.49 samples/sec Loss 6.8086 LearningRate 0.0008 Epoch: 8 Global Step: 176030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:40,091-Speed 6308.87 samples/sec Loss 6.8619 LearningRate 0.0008 Epoch: 8 Global Step: 176040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:43,344-Speed 6296.81 samples/sec Loss 6.8223 LearningRate 0.0008 Epoch: 8 Global Step: 176050 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:44:46,599-Speed 6294.39 samples/sec Loss 6.8710 LearningRate 0.0008 Epoch: 8 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:49,949-Speed 6114.10 samples/sec Loss 6.9002 LearningRate 0.0008 Epoch: 8 Global Step: 176070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:53,248-Speed 6213.13 samples/sec Loss 6.8416 LearningRate 0.0008 Epoch: 8 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:56,490-Speed 6317.63 samples/sec Loss 6.8052 LearningRate 0.0008 Epoch: 8 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:44:59,736-Speed 6311.23 samples/sec Loss 6.8551 LearningRate 0.0008 Epoch: 8 Global Step: 176100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:02,987-Speed 6300.63 samples/sec Loss 6.8528 LearningRate 0.0008 Epoch: 8 Global Step: 176110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:06,235-Speed 6307.08 samples/sec Loss 6.9418 LearningRate 0.0008 Epoch: 8 Global Step: 176120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:09,485-Speed 6302.24 samples/sec Loss 6.8017 LearningRate 0.0008 Epoch: 8 Global Step: 176130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:12,732-Speed 6309.68 samples/sec Loss 6.8438 LearningRate 0.0008 Epoch: 8 Global Step: 176140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
07:45:15,981-Speed 6305.47 samples/sec Loss 6.7749 LearningRate 0.0008 Epoch: 8 Global Step: 176150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:19,215-Speed 6332.17 samples/sec Loss 6.8050 LearningRate 0.0008 Epoch: 8 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:22,465-Speed 6304.43 samples/sec Loss 6.8078 LearningRate 0.0008 Epoch: 8 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:25,711-Speed 6310.72 samples/sec Loss 6.8180 LearningRate 0.0008 Epoch: 8 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:28,976-Speed 6274.27 samples/sec Loss 6.8822 LearningRate 0.0008 Epoch: 8 Global Step: 176190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:32,225-Speed 6304.63 samples/sec Loss 6.8404 LearningRate 0.0008 Epoch: 8 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:35,470-Speed 6312.35 samples/sec Loss 6.8355 LearningRate 0.0008 Epoch: 8 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:38,730-Speed 6283.15 samples/sec Loss 6.8769 LearningRate 0.0008 Epoch: 8 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:41,983-Speed 6296.30 samples/sec Loss 6.8883 LearningRate 0.0008 Epoch: 8 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:45,233-Speed 6303.45 samples/sec Loss 6.7763 LearningRate 0.0008 Epoch: 8 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:48,478-Speed 6314.34 samples/sec Loss 6.9254 LearningRate 0.0008 Epoch: 8 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:51,723-Speed 6312.45 samples/sec Loss 6.8591 LearningRate 0.0008 Epoch: 8 Global Step: 176260 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:45:54,957-Speed 6333.48 samples/sec Loss 
6.8684 LearningRate 0.0008 Epoch: 8 Global Step: 176270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:45:58,206-Speed 6305.70 samples/sec Loss 6.8581 LearningRate 0.0008 Epoch: 8 Global Step: 176280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:01,452-Speed 6310.31 samples/sec Loss 6.8667 LearningRate 0.0008 Epoch: 8 Global Step: 176290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:04,699-Speed 6310.30 samples/sec Loss 6.8761 LearningRate 0.0008 Epoch: 8 Global Step: 176300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:07,949-Speed 6302.07 samples/sec Loss 6.9042 LearningRate 0.0008 Epoch: 8 Global Step: 176310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:11,192-Speed 6315.86 samples/sec Loss 6.9008 LearningRate 0.0008 Epoch: 8 Global Step: 176320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:14,444-Speed 6299.34 samples/sec Loss 6.8381 LearningRate 0.0008 Epoch: 8 Global Step: 176330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:17,693-Speed 6306.14 samples/sec Loss 6.8321 LearningRate 0.0008 Epoch: 8 Global Step: 176340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:20,940-Speed 6308.50 samples/sec Loss 6.8702 LearningRate 0.0008 Epoch: 8 Global Step: 176350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:24,185-Speed 6311.48 samples/sec Loss 6.8703 LearningRate 0.0008 Epoch: 8 Global Step: 176360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:27,430-Speed 6314.15 samples/sec Loss 6.9253 LearningRate 0.0008 Epoch: 8 Global Step: 176370 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:46:30,680-Speed 6302.79 samples/sec Loss 6.8063 LearningRate 0.0008 Epoch: 8 Global Step: 176380 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:46:33,915-Speed 6331.40 samples/sec Loss 6.8260 LearningRate 0.0008 Epoch: 8 Global 
Step: 176390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:37,165-Speed 6303.04 samples/sec Loss 6.8384 LearningRate 0.0008 Epoch: 8 Global Step: 176400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:40,409-Speed 6314.75 samples/sec Loss 6.8871 LearningRate 0.0008 Epoch: 8 Global Step: 176410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:43,658-Speed 6303.63 samples/sec Loss 6.8455 LearningRate 0.0008 Epoch: 8 Global Step: 176420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:46,907-Speed 6305.61 samples/sec Loss 6.8897 LearningRate 0.0008 Epoch: 8 Global Step: 176430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:50,152-Speed 6312.19 samples/sec Loss 6.8451 LearningRate 0.0008 Epoch: 8 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:53,401-Speed 6306.21 samples/sec Loss 6.8790 LearningRate 0.0008 Epoch: 8 Global Step: 176450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:56,649-Speed 6306.01 samples/sec Loss 6.8262 LearningRate 0.0008 Epoch: 8 Global Step: 176460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:46:59,898-Speed 6305.63 samples/sec Loss 6.8619 LearningRate 0.0008 Epoch: 8 Global Step: 176470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:03,142-Speed 6312.67 samples/sec Loss 6.8712 LearningRate 0.0008 Epoch: 8 Global Step: 176480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:06,387-Speed 6313.95 samples/sec Loss 6.8581 LearningRate 0.0008 Epoch: 8 Global Step: 176490 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:47:09,639-Speed 6298.97 samples/sec Loss 6.7646 LearningRate 0.0008 Epoch: 8 Global Step: 176500 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:47:12,885-Speed 6311.14 samples/sec Loss 6.8646 LearningRate 0.0008 Epoch: 8 Global Step: 176510 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 07:47:16,136-Speed 6301.93 samples/sec Loss 6.8941 LearningRate 0.0008 Epoch: 8 Global Step: 176520 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:47:19,382-Speed 6310.35 samples/sec Loss 6.7978 LearningRate 0.0008 Epoch: 8 Global Step: 176530 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:47:22,637-Speed 6293.38 samples/sec Loss 6.8148 LearningRate 0.0008 Epoch: 8 Global Step: 176540 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:47:25,871-Speed 6333.84 samples/sec Loss 6.7981 LearningRate 0.0008 Epoch: 8 Global Step: 176550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:29,118-Speed 6308.21 samples/sec Loss 6.8472 LearningRate 0.0008 Epoch: 8 Global Step: 176560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:32,364-Speed 6310.81 samples/sec Loss 6.9375 LearningRate 0.0008 Epoch: 8 Global Step: 176570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:35,609-Speed 6313.69 samples/sec Loss 6.8664 LearningRate 0.0008 Epoch: 8 Global Step: 176580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:38,855-Speed 6309.80 samples/sec Loss 6.8294 LearningRate 0.0008 Epoch: 8 Global Step: 176590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:42,101-Speed 6310.71 samples/sec Loss 6.8101 LearningRate 0.0008 Epoch: 8 Global Step: 176600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:45,345-Speed 6314.60 samples/sec Loss 6.9088 LearningRate 0.0008 Epoch: 8 Global Step: 176610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:48,588-Speed 6315.82 samples/sec Loss 6.7755 LearningRate 0.0008 Epoch: 8 Global Step: 176620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:51,834-Speed 6311.05 samples/sec Loss 6.7835 LearningRate 0.0008 Epoch: 8 Global Step: 176630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
07:47:55,081-Speed 6308.88 samples/sec Loss 6.8763 LearningRate 0.0008 Epoch: 8 Global Step: 176640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:47:58,324-Speed 6316.14 samples/sec Loss 6.8797 LearningRate 0.0008 Epoch: 8 Global Step: 176650 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:48:01,572-Speed 6307.65 samples/sec Loss 6.8436 LearningRate 0.0008 Epoch: 8 Global Step: 176660 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:48:04,808-Speed 6330.80 samples/sec Loss 6.9435 LearningRate 0.0008 Epoch: 8 Global Step: 176670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:08,059-Speed 6299.31 samples/sec Loss 6.9126 LearningRate 0.0008 Epoch: 8 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:11,308-Speed 6305.72 samples/sec Loss 6.8255 LearningRate 0.0008 Epoch: 8 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:14,552-Speed 6314.45 samples/sec Loss 6.8300 LearningRate 0.0008 Epoch: 8 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:17,805-Speed 6297.52 samples/sec Loss 6.8421 LearningRate 0.0008 Epoch: 8 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:21,054-Speed 6306.13 samples/sec Loss 6.8937 LearningRate 0.0008 Epoch: 8 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:24,305-Speed 6300.26 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:27,557-Speed 6299.29 samples/sec Loss 6.8819 LearningRate 0.0008 Epoch: 8 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:30,802-Speed 6312.35 samples/sec Loss 6.9126 LearningRate 0.0008 Epoch: 8 Global Step: 176750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:34,047-Speed 6313.73 samples/sec Loss 
6.8234 LearningRate 0.0008 Epoch: 8 Global Step: 176760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:37,277-Speed 6342.49 samples/sec Loss 6.9899 LearningRate 0.0008 Epoch: 8 Global Step: 176770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:40,528-Speed 6301.54 samples/sec Loss 6.9151 LearningRate 0.0008 Epoch: 8 Global Step: 176780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:43,773-Speed 6312.51 samples/sec Loss 6.8676 LearningRate 0.0008 Epoch: 8 Global Step: 176790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:47,022-Speed 6305.94 samples/sec Loss 6.8278 LearningRate 0.0008 Epoch: 8 Global Step: 176800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:50,271-Speed 6304.51 samples/sec Loss 6.8233 LearningRate 0.0008 Epoch: 8 Global Step: 176810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:53,527-Speed 6291.18 samples/sec Loss 6.8420 LearningRate 0.0008 Epoch: 8 Global Step: 176820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:48:56,775-Speed 6307.27 samples/sec Loss 6.8415 LearningRate 0.0008 Epoch: 8 Global Step: 176830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:00,025-Speed 6303.19 samples/sec Loss 6.8894 LearningRate 0.0008 Epoch: 8 Global Step: 176840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:03,274-Speed 6304.01 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 176850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:06,521-Speed 6309.08 samples/sec Loss 6.8192 LearningRate 0.0008 Epoch: 8 Global Step: 176860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:09,768-Speed 6308.78 samples/sec Loss 6.9090 LearningRate 0.0008 Epoch: 8 Global Step: 176870 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:49:13,011-Speed 6315.72 samples/sec Loss 6.8740 LearningRate 0.0008 Epoch: 8 Global 
Step: 176880 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:49:16,255-Speed 6313.91 samples/sec Loss 6.8946 LearningRate 0.0008 Epoch: 8 Global Step: 176890 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:49:19,505-Speed 6304.86 samples/sec Loss 6.8025 LearningRate 0.0008 Epoch: 8 Global Step: 176900 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:49:22,750-Speed 6311.33 samples/sec Loss 6.8843 LearningRate 0.0008 Epoch: 8 Global Step: 176910 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:49:25,985-Speed 6333.07 samples/sec Loss 6.8416 LearningRate 0.0008 Epoch: 8 Global Step: 176920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:29,231-Speed 6311.91 samples/sec Loss 6.8605 LearningRate 0.0008 Epoch: 8 Global Step: 176930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:32,477-Speed 6310.01 samples/sec Loss 6.8204 LearningRate 0.0008 Epoch: 8 Global Step: 176940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:35,725-Speed 6306.10 samples/sec Loss 6.8509 LearningRate 0.0008 Epoch: 8 Global Step: 176950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:38,977-Speed 6300.62 samples/sec Loss 6.8564 LearningRate 0.0008 Epoch: 8 Global Step: 176960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:42,220-Speed 6315.24 samples/sec Loss 6.8684 LearningRate 0.0008 Epoch: 8 Global Step: 176970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:45,466-Speed 6310.55 samples/sec Loss 6.8902 LearningRate 0.0008 Epoch: 8 Global Step: 176980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:48,713-Speed 6310.20 samples/sec Loss 6.8986 LearningRate 0.0008 Epoch: 8 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:51,960-Speed 6308.61 samples/sec Loss 6.8312 LearningRate 0.0008 Epoch: 8 Global Step: 177000 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 07:49:55,206-Speed 6309.83 samples/sec Loss 6.8493 LearningRate 0.0008 Epoch: 8 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:49:58,443-Speed 6328.81 samples/sec Loss 6.7916 LearningRate 0.0008 Epoch: 8 Global Step: 177020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:01,692-Speed 6304.68 samples/sec Loss 6.9714 LearningRate 0.0008 Epoch: 8 Global Step: 177030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:04,937-Speed 6312.02 samples/sec Loss 6.8385 LearningRate 0.0008 Epoch: 8 Global Step: 177040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:08,185-Speed 6307.71 samples/sec Loss 6.7945 LearningRate 0.0008 Epoch: 8 Global Step: 177050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:11,430-Speed 6312.96 samples/sec Loss 6.7898 LearningRate 0.0008 Epoch: 8 Global Step: 177060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:14,676-Speed 6309.61 samples/sec Loss 6.8838 LearningRate 0.0008 Epoch: 8 Global Step: 177070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:17,925-Speed 6305.09 samples/sec Loss 6.8319 LearningRate 0.0008 Epoch: 8 Global Step: 177080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:21,175-Speed 6303.63 samples/sec Loss 6.9025 LearningRate 0.0008 Epoch: 8 Global Step: 177090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:24,429-Speed 6295.04 samples/sec Loss 6.8570 LearningRate 0.0008 Epoch: 8 Global Step: 177100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:27,678-Speed 6305.57 samples/sec Loss 6.9005 LearningRate 0.0008 Epoch: 8 Global Step: 177110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:30,928-Speed 6302.28 samples/sec Loss 6.8411 LearningRate 0.0008 Epoch: 8 Global Step: 177120 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
07:50:34,175-Speed 6308.40 samples/sec Loss 6.8433 LearningRate 0.0008 Epoch: 8 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:50:37,424-Speed 6306.08 samples/sec Loss 6.7695 LearningRate 0.0008 Epoch: 8 Global Step: 177140 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:50:40,670-Speed 6311.36 samples/sec Loss 6.8664 LearningRate 0.0008 Epoch: 8 Global Step: 177150 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:50:43,919-Speed 6303.80 samples/sec Loss 6.7895 LearningRate 0.0008 Epoch: 8 Global Step: 177160 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:50:47,168-Speed 6304.58 samples/sec Loss 6.9128 LearningRate 0.0008 Epoch: 8 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:50:50,402-Speed 6334.15 samples/sec Loss 6.8576 LearningRate 0.0008 Epoch: 8 Global Step: 177180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:53,652-Speed 6304.56 samples/sec Loss 6.8737 LearningRate 0.0008 Epoch: 8 Global Step: 177190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:50:56,900-Speed 6305.81 samples/sec Loss 6.8475 LearningRate 0.0008 Epoch: 8 Global Step: 177200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:00,145-Speed 6313.70 samples/sec Loss 6.9267 LearningRate 0.0008 Epoch: 8 Global Step: 177210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:03,418-Speed 6257.35 samples/sec Loss 6.8678 LearningRate 0.0008 Epoch: 8 Global Step: 177220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:06,668-Speed 6303.56 samples/sec Loss 6.9359 LearningRate 0.0008 Epoch: 8 Global Step: 177230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:09,915-Speed 6308.41 samples/sec Loss 6.8719 LearningRate 0.0008 Epoch: 8 Global Step: 177240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:13,162-Speed 6308.24 samples/sec Loss 
6.8612 LearningRate 0.0008 Epoch: 8 Global Step: 177250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:16,407-Speed 6313.00 samples/sec Loss 6.8599 LearningRate 0.0008 Epoch: 8 Global Step: 177260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:19,655-Speed 6308.21 samples/sec Loss 6.8982 LearningRate 0.0008 Epoch: 8 Global Step: 177270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:22,900-Speed 6311.73 samples/sec Loss 6.8655 LearningRate 0.0008 Epoch: 8 Global Step: 177280 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:26,146-Speed 6310.55 samples/sec Loss 6.8631 LearningRate 0.0008 Epoch: 8 Global Step: 177290 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:29,390-Speed 6315.49 samples/sec Loss 6.8042 LearningRate 0.0008 Epoch: 8 Global Step: 177300 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:32,637-Speed 6307.36 samples/sec Loss 6.8198 LearningRate 0.0008 Epoch: 8 Global Step: 177310 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:35,881-Speed 6315.19 samples/sec Loss 6.9361 LearningRate 0.0008 Epoch: 8 Global Step: 177320 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:39,122-Speed 6320.29 samples/sec Loss 6.9217 LearningRate 0.0008 Epoch: 8 Global Step: 177330 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:42,369-Speed 6308.38 samples/sec Loss 6.8623 LearningRate 0.0008 Epoch: 8 Global Step: 177340 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:45,619-Speed 6302.96 samples/sec Loss 6.7994 LearningRate 0.0008 Epoch: 8 Global Step: 177350 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:51:48,861-Speed 6320.35 samples/sec Loss 6.9106 LearningRate 0.0008 Epoch: 8 Global Step: 177360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:52,119-Speed 6286.83 samples/sec Loss 6.8831 LearningRate 0.0008 Epoch: 8 Global 
Step: 177370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:55,364-Speed 6311.84 samples/sec Loss 6.8471 LearningRate 0.0008 Epoch: 8 Global Step: 177380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:51:58,611-Speed 6310.03 samples/sec Loss 6.8281 LearningRate 0.0008 Epoch: 8 Global Step: 177390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:01,860-Speed 6304.65 samples/sec Loss 6.7888 LearningRate 0.0008 Epoch: 8 Global Step: 177400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:05,113-Speed 6296.56 samples/sec Loss 6.8909 LearningRate 0.0008 Epoch: 8 Global Step: 177410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:08,359-Speed 6310.66 samples/sec Loss 6.8430 LearningRate 0.0008 Epoch: 8 Global Step: 177420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:11,605-Speed 6312.07 samples/sec Loss 6.7802 LearningRate 0.0008 Epoch: 8 Global Step: 177430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:14,850-Speed 6310.91 samples/sec Loss 6.9010 LearningRate 0.0008 Epoch: 8 Global Step: 177440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:18,097-Speed 6310.02 samples/sec Loss 6.9050 LearningRate 0.0008 Epoch: 8 Global Step: 177450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:21,347-Speed 6302.45 samples/sec Loss 6.8559 LearningRate 0.0008 Epoch: 8 Global Step: 177460 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:52:24,595-Speed 6306.03 samples/sec Loss 6.8513 LearningRate 0.0008 Epoch: 8 Global Step: 177470 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:52:27,847-Speed 6300.48 samples/sec Loss 6.6954 LearningRate 0.0008 Epoch: 8 Global Step: 177480 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:52:31,094-Speed 6308.46 samples/sec Loss 6.8426 LearningRate 0.0008 Epoch: 8 Global Step: 177490 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 07:52:34,343-Speed 6304.38 samples/sec Loss 6.8490 LearningRate 0.0008 Epoch: 8 Global Step: 177500 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:52:37,575-Speed 6338.65 samples/sec Loss 6.8794 LearningRate 0.0008 Epoch: 8 Global Step: 177510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:40,825-Speed 6302.74 samples/sec Loss 6.8043 LearningRate 0.0008 Epoch: 8 Global Step: 177520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:44,074-Speed 6304.39 samples/sec Loss 6.8466 LearningRate 0.0008 Epoch: 8 Global Step: 177530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:47,324-Speed 6303.70 samples/sec Loss 6.7947 LearningRate 0.0008 Epoch: 8 Global Step: 177540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:50,568-Speed 6318.19 samples/sec Loss 6.8244 LearningRate 0.0008 Epoch: 8 Global Step: 177550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:53,820-Speed 6298.64 samples/sec Loss 6.7982 LearningRate 0.0008 Epoch: 8 Global Step: 177560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:52:57,069-Speed 6304.99 samples/sec Loss 6.8433 LearningRate 0.0008 Epoch: 8 Global Step: 177570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:00,317-Speed 6306.60 samples/sec Loss 6.8933 LearningRate 0.0008 Epoch: 8 Global Step: 177580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:03,561-Speed 6315.06 samples/sec Loss 6.8384 LearningRate 0.0008 Epoch: 8 Global Step: 177590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:06,810-Speed 6305.96 samples/sec Loss 6.8688 LearningRate 0.0008 Epoch: 8 Global Step: 177600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:10,056-Speed 6310.01 samples/sec Loss 6.8183 LearningRate 0.0008 Epoch: 8 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
07:53:13,285-Speed 6342.83 samples/sec Loss 6.8305 LearningRate 0.0008 Epoch: 8 Global Step: 177620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:16,537-Speed 6300.63 samples/sec Loss 6.7736 LearningRate 0.0008 Epoch: 8 Global Step: 177630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:19,783-Speed 6309.52 samples/sec Loss 6.7922 LearningRate 0.0008 Epoch: 8 Global Step: 177640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:23,035-Speed 6298.63 samples/sec Loss 6.7972 LearningRate 0.0008 Epoch: 8 Global Step: 177650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:26,285-Speed 6304.16 samples/sec Loss 6.8188 LearningRate 0.0008 Epoch: 8 Global Step: 177660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:29,533-Speed 6306.36 samples/sec Loss 6.8258 LearningRate 0.0008 Epoch: 8 Global Step: 177670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:32,782-Speed 6304.60 samples/sec Loss 6.8184 LearningRate 0.0008 Epoch: 8 Global Step: 177680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:36,034-Speed 6300.53 samples/sec Loss 6.8070 LearningRate 0.0008 Epoch: 8 Global Step: 177690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:39,278-Speed 6313.30 samples/sec Loss 6.7516 LearningRate 0.0008 Epoch: 8 Global Step: 177700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:42,523-Speed 6312.47 samples/sec Loss 6.8332 LearningRate 0.0008 Epoch: 8 Global Step: 177710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:45,779-Speed 6291.12 samples/sec Loss 6.8710 LearningRate 0.0008 Epoch: 8 Global Step: 177720 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:53:49,026-Speed 6309.83 samples/sec Loss 6.7465 LearningRate 0.0008 Epoch: 8 Global Step: 177730 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:53:52,258-Speed 6337.16 samples/sec Loss 
6.8904 LearningRate 0.0008 Epoch: 8 Global Step: 177740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:55,505-Speed 6308.49 samples/sec Loss 6.8822 LearningRate 0.0008 Epoch: 8 Global Step: 177750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:53:58,747-Speed 6318.76 samples/sec Loss 6.7775 LearningRate 0.0008 Epoch: 8 Global Step: 177760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:01,994-Speed 6309.64 samples/sec Loss 6.7569 LearningRate 0.0008 Epoch: 8 Global Step: 177770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:05,224-Speed 6342.00 samples/sec Loss 6.7684 LearningRate 0.0008 Epoch: 8 Global Step: 177780 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:08,472-Speed 6306.71 samples/sec Loss 6.9435 LearningRate 0.0008 Epoch: 8 Global Step: 177790 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:11,714-Speed 6319.08 samples/sec Loss 6.8110 LearningRate 0.0008 Epoch: 8 Global Step: 177800 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:14,960-Speed 6310.32 samples/sec Loss 6.8246 LearningRate 0.0008 Epoch: 8 Global Step: 177810 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:18,207-Speed 6309.02 samples/sec Loss 6.8845 LearningRate 0.0008 Epoch: 8 Global Step: 177820 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:21,455-Speed 6307.35 samples/sec Loss 6.9120 LearningRate 0.0008 Epoch: 8 Global Step: 177830 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:24,702-Speed 6308.92 samples/sec Loss 6.9046 LearningRate 0.0008 Epoch: 8 Global Step: 177840 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:27,948-Speed 6309.29 samples/sec Loss 6.8234 LearningRate 0.0008 Epoch: 8 Global Step: 177850 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:31,196-Speed 6308.29 samples/sec Loss 6.7704 LearningRate 0.0008 Epoch: 8 Global 
Step: 177860 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:34,445-Speed 6305.19 samples/sec Loss 6.8302 LearningRate 0.0008 Epoch: 8 Global Step: 177870 Fp16 Grad Scale: 16384 Required: 59 hours Training: 2022-04-01 07:54:37,688-Speed 6316.52 samples/sec Loss 6.9301 LearningRate 0.0008 Epoch: 8 Global Step: 177880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:40,935-Speed 6308.84 samples/sec Loss 6.9270 LearningRate 0.0008 Epoch: 8 Global Step: 177890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:44,181-Speed 6309.58 samples/sec Loss 6.8473 LearningRate 0.0008 Epoch: 8 Global Step: 177900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:47,429-Speed 6307.88 samples/sec Loss 6.8046 LearningRate 0.0008 Epoch: 8 Global Step: 177910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:50,675-Speed 6310.60 samples/sec Loss 6.8401 LearningRate 0.0008 Epoch: 8 Global Step: 177920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:53,919-Speed 6313.63 samples/sec Loss 6.8032 LearningRate 0.0008 Epoch: 8 Global Step: 177930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:54:57,178-Speed 6285.11 samples/sec Loss 6.7868 LearningRate 0.0008 Epoch: 8 Global Step: 177940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:00,433-Speed 6293.09 samples/sec Loss 6.7839 LearningRate 0.0008 Epoch: 8 Global Step: 177950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:03,680-Speed 6308.73 samples/sec Loss 6.8492 LearningRate 0.0008 Epoch: 8 Global Step: 177960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:06,929-Speed 6306.64 samples/sec Loss 6.8977 LearningRate 0.0008 Epoch: 8 Global Step: 177970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:10,178-Speed 6304.57 samples/sec Loss 6.8271 LearningRate 0.0008 Epoch: 8 Global Step: 177980 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 07:55:13,409-Speed 6339.16 samples/sec Loss 6.8229 LearningRate 0.0008 Epoch: 8 Global Step: 177990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:16,654-Speed 6312.35 samples/sec Loss 6.7477 LearningRate 0.0008 Epoch: 8 Global Step: 178000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:19,900-Speed 6311.33 samples/sec Loss 6.8768 LearningRate 0.0008 Epoch: 8 Global Step: 178010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:23,148-Speed 6306.79 samples/sec Loss 6.8254 LearningRate 0.0008 Epoch: 8 Global Step: 178020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:26,405-Speed 6290.55 samples/sec Loss 6.8002 LearningRate 0.0008 Epoch: 8 Global Step: 178030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:29,649-Speed 6313.20 samples/sec Loss 6.8521 LearningRate 0.0008 Epoch: 8 Global Step: 178040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:32,897-Speed 6306.98 samples/sec Loss 6.8425 LearningRate 0.0008 Epoch: 8 Global Step: 178050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:36,144-Speed 6308.91 samples/sec Loss 6.8601 LearningRate 0.0008 Epoch: 8 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:39,391-Speed 6309.17 samples/sec Loss 6.8466 LearningRate 0.0008 Epoch: 8 Global Step: 178070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:42,636-Speed 6313.56 samples/sec Loss 6.7726 LearningRate 0.0008 Epoch: 8 Global Step: 178080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:45,885-Speed 6303.33 samples/sec Loss 6.8691 LearningRate 0.0008 Epoch: 8 Global Step: 178090 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:55:49,136-Speed 6300.53 samples/sec Loss 6.8412 LearningRate 0.0008 Epoch: 8 Global Step: 178100 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
07:55:52,389-Speed 6298.75 samples/sec Loss 6.8543 LearningRate 0.0008 Epoch: 8 Global Step: 178110 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:55:55,620-Speed 6338.52 samples/sec Loss 6.8900 LearningRate 0.0008 Epoch: 8 Global Step: 178120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:55:58,865-Speed 6312.58 samples/sec Loss 6.8351 LearningRate 0.0008 Epoch: 8 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:02,113-Speed 6308.10 samples/sec Loss 6.8484 LearningRate 0.0008 Epoch: 8 Global Step: 178140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:05,363-Speed 6302.04 samples/sec Loss 6.7705 LearningRate 0.0008 Epoch: 8 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:08,612-Speed 6304.46 samples/sec Loss 6.7919 LearningRate 0.0008 Epoch: 8 Global Step: 178160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:11,856-Speed 6316.29 samples/sec Loss 6.8006 LearningRate 0.0008 Epoch: 8 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:15,103-Speed 6307.23 samples/sec Loss 6.8086 LearningRate 0.0008 Epoch: 8 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:18,347-Speed 6315.04 samples/sec Loss 6.7891 LearningRate 0.0008 Epoch: 8 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:21,593-Speed 6310.83 samples/sec Loss 6.8115 LearningRate 0.0008 Epoch: 8 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:24,848-Speed 6293.38 samples/sec Loss 6.8872 LearningRate 0.0008 Epoch: 8 Global Step: 178210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:28,096-Speed 6308.16 samples/sec Loss 6.7791 LearningRate 0.0008 Epoch: 8 Global Step: 178220 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:56:31,328-Speed 6337.99 samples/sec Loss 
6.8677 LearningRate 0.0008 Epoch: 8 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:34,576-Speed 6305.81 samples/sec Loss 6.8446 LearningRate 0.0008 Epoch: 8 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:37,824-Speed 6307.05 samples/sec Loss 6.8827 LearningRate 0.0008 Epoch: 8 Global Step: 178250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:41,069-Speed 6312.05 samples/sec Loss 6.8331 LearningRate 0.0008 Epoch: 8 Global Step: 178260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:44,312-Speed 6316.85 samples/sec Loss 6.8829 LearningRate 0.0008 Epoch: 8 Global Step: 178270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:47,564-Speed 6300.19 samples/sec Loss 6.8578 LearningRate 0.0008 Epoch: 8 Global Step: 178280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:50,835-Speed 6261.90 samples/sec Loss 6.7989 LearningRate 0.0008 Epoch: 8 Global Step: 178290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:54,149-Speed 6180.69 samples/sec Loss 6.7052 LearningRate 0.0008 Epoch: 8 Global Step: 178300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:56:57,397-Speed 6307.26 samples/sec Loss 6.8088 LearningRate 0.0008 Epoch: 8 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:00,645-Speed 6307.65 samples/sec Loss 6.8315 LearningRate 0.0008 Epoch: 8 Global Step: 178320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:03,894-Speed 6303.73 samples/sec Loss 6.8281 LearningRate 0.0008 Epoch: 8 Global Step: 178330 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:57:07,138-Speed 6314.80 samples/sec Loss 6.9232 LearningRate 0.0008 Epoch: 8 Global Step: 178340 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:57:10,378-Speed 6322.75 samples/sec Loss 6.8276 LearningRate 0.0008 Epoch: 8 Global 
Step: 178350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:13,623-Speed 6311.80 samples/sec Loss 6.8321 LearningRate 0.0008 Epoch: 8 Global Step: 178360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:16,868-Speed 6313.49 samples/sec Loss 6.8326 LearningRate 0.0008 Epoch: 8 Global Step: 178370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:20,121-Speed 6297.29 samples/sec Loss 6.8225 LearningRate 0.0008 Epoch: 8 Global Step: 178380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:23,368-Speed 6308.96 samples/sec Loss 6.8318 LearningRate 0.0008 Epoch: 8 Global Step: 178390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:26,618-Speed 6301.04 samples/sec Loss 6.7268 LearningRate 0.0008 Epoch: 8 Global Step: 178400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:29,869-Speed 6302.66 samples/sec Loss 6.7806 LearningRate 0.0008 Epoch: 8 Global Step: 178410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:33,116-Speed 6309.43 samples/sec Loss 6.7646 LearningRate 0.0008 Epoch: 8 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:36,362-Speed 6309.76 samples/sec Loss 6.8168 LearningRate 0.0008 Epoch: 8 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:39,612-Speed 6304.89 samples/sec Loss 6.8505 LearningRate 0.0008 Epoch: 8 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:42,861-Speed 6304.74 samples/sec Loss 6.7568 LearningRate 0.0008 Epoch: 8 Global Step: 178450 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:57:46,108-Speed 6307.91 samples/sec Loss 6.9465 LearningRate 0.0008 Epoch: 8 Global Step: 178460 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:57:49,369-Speed 6281.26 samples/sec Loss 6.8046 LearningRate 0.0008 Epoch: 8 Global Step: 178470 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 07:57:52,602-Speed 6336.28 samples/sec Loss 6.8406 LearningRate 0.0008 Epoch: 8 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:55,846-Speed 6314.06 samples/sec Loss 6.8633 LearningRate 0.0008 Epoch: 8 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:57:59,095-Speed 6305.08 samples/sec Loss 6.8901 LearningRate 0.0008 Epoch: 8 Global Step: 178500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:02,339-Speed 6314.43 samples/sec Loss 6.8519 LearningRate 0.0008 Epoch: 8 Global Step: 178510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:05,587-Speed 6308.10 samples/sec Loss 6.8945 LearningRate 0.0008 Epoch: 8 Global Step: 178520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:08,829-Speed 6318.14 samples/sec Loss 6.7977 LearningRate 0.0008 Epoch: 8 Global Step: 178530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:12,090-Speed 6281.95 samples/sec Loss 6.8192 LearningRate 0.0008 Epoch: 8 Global Step: 178540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:15,338-Speed 6306.80 samples/sec Loss 6.7662 LearningRate 0.0008 Epoch: 8 Global Step: 178550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:18,586-Speed 6306.85 samples/sec Loss 6.8446 LearningRate 0.0008 Epoch: 8 Global Step: 178560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:21,833-Speed 6308.55 samples/sec Loss 6.8622 LearningRate 0.0008 Epoch: 8 Global Step: 178570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:58:25,084-Speed 6301.34 samples/sec Loss 6.9413 LearningRate 0.0008 Epoch: 8 Global Step: 178580 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:28,336-Speed 6299.32 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 178590 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
07:58:31,583-Speed 6307.21 samples/sec Loss 6.8237 LearningRate 0.0008 Epoch: 8 Global Step: 178600 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:34,829-Speed 6310.71 samples/sec Loss 6.8942 LearningRate 0.0008 Epoch: 8 Global Step: 178610 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:38,075-Speed 6310.51 samples/sec Loss 6.9489 LearningRate 0.0008 Epoch: 8 Global Step: 178620 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:41,325-Speed 6305.08 samples/sec Loss 6.8077 LearningRate 0.0008 Epoch: 8 Global Step: 178630 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:44,574-Speed 6303.72 samples/sec Loss 6.7803 LearningRate 0.0008 Epoch: 8 Global Step: 178640 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:47,820-Speed 6311.84 samples/sec Loss 6.9095 LearningRate 0.0008 Epoch: 8 Global Step: 178650 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:51,069-Speed 6303.82 samples/sec Loss 6.8256 LearningRate 0.0008 Epoch: 8 Global Step: 178660 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:54,317-Speed 6307.31 samples/sec Loss 6.8121 LearningRate 0.0008 Epoch: 8 Global Step: 178670 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:58:57,546-Speed 6343.67 samples/sec Loss 6.7668 LearningRate 0.0008 Epoch: 8 Global Step: 178680 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:59:00,793-Speed 6309.71 samples/sec Loss 6.8878 LearningRate 0.0008 Epoch: 8 Global Step: 178690 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 07:59:04,025-Speed 6338.24 samples/sec Loss 6.8133 LearningRate 0.0008 Epoch: 8 Global Step: 178700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:07,273-Speed 6306.78 samples/sec Loss 6.9096 LearningRate 0.0008 Epoch: 8 Global Step: 178710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:10,518-Speed 6311.63 samples/sec Loss 
6.8376 LearningRate 0.0008 Epoch: 8 Global Step: 178720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:13,768-Speed 6302.39 samples/sec Loss 6.8879 LearningRate 0.0008 Epoch: 8 Global Step: 178730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:17,016-Speed 6308.17 samples/sec Loss 6.8344 LearningRate 0.0008 Epoch: 8 Global Step: 178740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:20,264-Speed 6307.05 samples/sec Loss 6.8348 LearningRate 0.0008 Epoch: 8 Global Step: 178750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:23,511-Speed 6307.42 samples/sec Loss 6.8421 LearningRate 0.0008 Epoch: 8 Global Step: 178760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:26,768-Speed 6288.84 samples/sec Loss 6.8731 LearningRate 0.0008 Epoch: 8 Global Step: 178770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:30,016-Speed 6307.89 samples/sec Loss 6.8695 LearningRate 0.0008 Epoch: 8 Global Step: 178780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:33,263-Speed 6307.86 samples/sec Loss 6.8159 LearningRate 0.0008 Epoch: 8 Global Step: 178790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:36,493-Speed 6343.51 samples/sec Loss 6.7729 LearningRate 0.0008 Epoch: 8 Global Step: 178800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:39,739-Speed 6308.86 samples/sec Loss 6.7824 LearningRate 0.0008 Epoch: 8 Global Step: 178810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:42,988-Speed 6306.38 samples/sec Loss 6.8324 LearningRate 0.0008 Epoch: 8 Global Step: 178820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:46,237-Speed 6304.29 samples/sec Loss 6.8969 LearningRate 0.0008 Epoch: 8 Global Step: 178830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:49,484-Speed 6309.39 samples/sec Loss 6.8631 LearningRate 0.0008 Epoch: 8 Global 
Step: 178840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:52,729-Speed 6312.09 samples/sec Loss 6.8105 LearningRate 0.0008 Epoch: 8 Global Step: 178850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:55,975-Speed 6311.48 samples/sec Loss 6.8551 LearningRate 0.0008 Epoch: 8 Global Step: 178860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 07:59:59,227-Speed 6298.72 samples/sec Loss 6.8539 LearningRate 0.0008 Epoch: 8 Global Step: 178870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:02,475-Speed 6307.67 samples/sec Loss 6.9363 LearningRate 0.0008 Epoch: 8 Global Step: 178880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:05,718-Speed 6315.31 samples/sec Loss 6.8440 LearningRate 0.0008 Epoch: 8 Global Step: 178890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:08,971-Speed 6297.01 samples/sec Loss 6.8432 LearningRate 0.0008 Epoch: 8 Global Step: 178900 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:00:12,219-Speed 6307.52 samples/sec Loss 6.8448 LearningRate 0.0008 Epoch: 8 Global Step: 178910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:15,467-Speed 6306.40 samples/sec Loss 6.9128 LearningRate 0.0008 Epoch: 8 Global Step: 178920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:18,721-Speed 6296.68 samples/sec Loss 6.7847 LearningRate 0.0008 Epoch: 8 Global Step: 178930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:21,967-Speed 6310.38 samples/sec Loss 6.7779 LearningRate 0.0008 Epoch: 8 Global Step: 178940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:25,213-Speed 6310.44 samples/sec Loss 6.7929 LearningRate 0.0008 Epoch: 8 Global Step: 178950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:28,463-Speed 6301.68 samples/sec Loss 6.8099 LearningRate 0.0008 Epoch: 8 Global Step: 178960 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:00:31,713-Speed 6304.53 samples/sec Loss 6.7654 LearningRate 0.0008 Epoch: 8 Global Step: 178970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:34,959-Speed 6310.73 samples/sec Loss 6.8328 LearningRate 0.0008 Epoch: 8 Global Step: 178980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:38,205-Speed 6309.76 samples/sec Loss 6.7965 LearningRate 0.0008 Epoch: 8 Global Step: 178990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:41,455-Speed 6303.72 samples/sec Loss 6.8469 LearningRate 0.0008 Epoch: 8 Global Step: 179000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:00:44,699-Speed 6314.70 samples/sec Loss 6.9230 LearningRate 0.0008 Epoch: 8 Global Step: 179010 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:00:47,950-Speed 6301.73 samples/sec Loss 6.8098 LearningRate 0.0008 Epoch: 8 Global Step: 179020 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:00:51,215-Speed 6272.14 samples/sec Loss 6.8939 LearningRate 0.0008 Epoch: 8 Global Step: 179030 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:00:54,460-Speed 6312.63 samples/sec Loss 6.7860 LearningRate 0.0008 Epoch: 8 Global Step: 179040 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:00:57,707-Speed 6310.34 samples/sec Loss 6.8281 LearningRate 0.0008 Epoch: 8 Global Step: 179050 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:00,956-Speed 6304.52 samples/sec Loss 6.8415 LearningRate 0.0008 Epoch: 8 Global Step: 179060 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:04,207-Speed 6301.99 samples/sec Loss 6.8793 LearningRate 0.0008 Epoch: 8 Global Step: 179070 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:07,456-Speed 6304.69 samples/sec Loss 6.7469 LearningRate 0.0008 Epoch: 8 Global Step: 179080 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
08:01:10,704-Speed 6306.92 samples/sec Loss 6.8299 LearningRate 0.0008 Epoch: 8 Global Step: 179090 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:13,954-Speed 6303.07 samples/sec Loss 6.8665 LearningRate 0.0008 Epoch: 8 Global Step: 179100 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:17,185-Speed 6339.64 samples/sec Loss 6.8311 LearningRate 0.0008 Epoch: 8 Global Step: 179110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:20,431-Speed 6310.72 samples/sec Loss 6.8043 LearningRate 0.0008 Epoch: 8 Global Step: 179120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:23,677-Speed 6311.27 samples/sec Loss 6.8897 LearningRate 0.0008 Epoch: 8 Global Step: 179130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:26,924-Speed 6308.16 samples/sec Loss 6.7559 LearningRate 0.0008 Epoch: 8 Global Step: 179140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:30,168-Speed 6315.36 samples/sec Loss 6.8217 LearningRate 0.0008 Epoch: 8 Global Step: 179150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:33,412-Speed 6314.40 samples/sec Loss 6.8893 LearningRate 0.0008 Epoch: 8 Global Step: 179160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:36,655-Speed 6315.39 samples/sec Loss 6.8251 LearningRate 0.0008 Epoch: 8 Global Step: 179170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:39,901-Speed 6310.43 samples/sec Loss 6.9091 LearningRate 0.0008 Epoch: 8 Global Step: 179180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:43,147-Speed 6312.04 samples/sec Loss 6.8459 LearningRate 0.0008 Epoch: 8 Global Step: 179190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:46,392-Speed 6311.84 samples/sec Loss 6.8278 LearningRate 0.0008 Epoch: 8 Global Step: 179200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:01:49,641-Speed 6305.72 samples/sec Loss 
6.7854 LearningRate 0.0008 Epoch: 8 Global Step: 179210 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:52,902-Speed 6281.32 samples/sec Loss 6.9178 LearningRate 0.0008 Epoch: 8 Global Step: 179220 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:56,147-Speed 6312.76 samples/sec Loss 6.7480 LearningRate 0.0008 Epoch: 8 Global Step: 179230 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:01:59,393-Speed 6310.44 samples/sec Loss 6.8583 LearningRate 0.0008 Epoch: 8 Global Step: 179240 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:02:02,626-Speed 6336.10 samples/sec Loss 6.8556 LearningRate 0.0008 Epoch: 8 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:05,872-Speed 6311.54 samples/sec Loss 6.8503 LearningRate 0.0008 Epoch: 8 Global Step: 179260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:09,118-Speed 6310.68 samples/sec Loss 6.7378 LearningRate 0.0008 Epoch: 8 Global Step: 179270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:12,365-Speed 6309.70 samples/sec Loss 6.8493 LearningRate 0.0008 Epoch: 8 Global Step: 179280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:15,615-Speed 6303.15 samples/sec Loss 6.8648 LearningRate 0.0008 Epoch: 8 Global Step: 179290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:18,860-Speed 6311.56 samples/sec Loss 6.8786 LearningRate 0.0008 Epoch: 8 Global Step: 179300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:22,111-Speed 6301.25 samples/sec Loss 6.8209 LearningRate 0.0008 Epoch: 8 Global Step: 179310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:25,360-Speed 6305.60 samples/sec Loss 6.9680 LearningRate 0.0008 Epoch: 8 Global Step: 179320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:28,610-Speed 6301.79 samples/sec Loss 6.8417 LearningRate 0.0008 Epoch: 8 Global 
Step: 179330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:31,861-Speed 6301.48 samples/sec Loss 6.8476 LearningRate 0.0008 Epoch: 8 Global Step: 179340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:35,108-Speed 6309.57 samples/sec Loss 6.8256 LearningRate 0.0008 Epoch: 8 Global Step: 179350 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:02:38,358-Speed 6301.24 samples/sec Loss 6.8038 LearningRate 0.0008 Epoch: 8 Global Step: 179360 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:02:41,591-Speed 6337.24 samples/sec Loss 6.8067 LearningRate 0.0008 Epoch: 8 Global Step: 179370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:44,838-Speed 6309.16 samples/sec Loss 6.7134 LearningRate 0.0008 Epoch: 8 Global Step: 179380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:48,091-Speed 6297.35 samples/sec Loss 6.8164 LearningRate 0.0008 Epoch: 8 Global Step: 179390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:51,337-Speed 6310.56 samples/sec Loss 6.8404 LearningRate 0.0008 Epoch: 8 Global Step: 179400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:54,583-Speed 6309.84 samples/sec Loss 6.8632 LearningRate 0.0008 Epoch: 8 Global Step: 179410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:02:57,832-Speed 6305.25 samples/sec Loss 6.7587 LearningRate 0.0008 Epoch: 8 Global Step: 179420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:01,077-Speed 6312.14 samples/sec Loss 6.8754 LearningRate 0.0008 Epoch: 8 Global Step: 179430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:04,325-Speed 6306.29 samples/sec Loss 6.8412 LearningRate 0.0008 Epoch: 8 Global Step: 179440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:07,566-Speed 6320.30 samples/sec Loss 6.7700 LearningRate 0.0008 Epoch: 8 Global Step: 179450 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:03:10,813-Speed 6309.46 samples/sec Loss 6.7968 LearningRate 0.0008 Epoch: 8 Global Step: 179460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:14,058-Speed 6313.43 samples/sec Loss 6.8618 LearningRate 0.0008 Epoch: 8 Global Step: 179470 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:03:17,294-Speed 6331.03 samples/sec Loss 6.8310 LearningRate 0.0008 Epoch: 8 Global Step: 179480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:20,550-Speed 6291.16 samples/sec Loss 6.9077 LearningRate 0.0008 Epoch: 8 Global Step: 179490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:23,800-Speed 6303.13 samples/sec Loss 6.9093 LearningRate 0.0008 Epoch: 8 Global Step: 179500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:27,047-Speed 6309.34 samples/sec Loss 6.8097 LearningRate 0.0008 Epoch: 8 Global Step: 179510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:30,291-Speed 6313.70 samples/sec Loss 6.8066 LearningRate 0.0008 Epoch: 8 Global Step: 179520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:33,536-Speed 6313.02 samples/sec Loss 6.8498 LearningRate 0.0008 Epoch: 8 Global Step: 179530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:36,786-Speed 6303.26 samples/sec Loss 6.9417 LearningRate 0.0008 Epoch: 8 Global Step: 179540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:40,034-Speed 6306.13 samples/sec Loss 6.8575 LearningRate 0.0008 Epoch: 8 Global Step: 179550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:43,285-Speed 6301.18 samples/sec Loss 6.8141 LearningRate 0.0008 Epoch: 8 Global Step: 179560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:46,532-Speed 6308.42 samples/sec Loss 6.8244 LearningRate 0.0008 Epoch: 8 Global Step: 179570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:03:49,773-Speed 6321.65 samples/sec Loss 6.9009 LearningRate 0.0008 Epoch: 8 Global Step: 179580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:53,025-Speed 6298.91 samples/sec Loss 6.8107 LearningRate 0.0008 Epoch: 8 Global Step: 179590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:56,273-Speed 6305.45 samples/sec Loss 6.7815 LearningRate 0.0008 Epoch: 8 Global Step: 179600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:03:59,516-Speed 6317.44 samples/sec Loss 6.8579 LearningRate 0.0008 Epoch: 8 Global Step: 179610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:02,764-Speed 6306.31 samples/sec Loss 6.8989 LearningRate 0.0008 Epoch: 8 Global Step: 179620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:06,013-Speed 6305.25 samples/sec Loss 6.8151 LearningRate 0.0008 Epoch: 8 Global Step: 179630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:09,262-Speed 6305.44 samples/sec Loss 6.8388 LearningRate 0.0008 Epoch: 8 Global Step: 179640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:12,522-Speed 6283.02 samples/sec Loss 6.8185 LearningRate 0.0008 Epoch: 8 Global Step: 179650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:15,768-Speed 6310.74 samples/sec Loss 6.8036 LearningRate 0.0008 Epoch: 8 Global Step: 179660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:19,019-Speed 6300.19 samples/sec Loss 6.7941 LearningRate 0.0008 Epoch: 8 Global Step: 179670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:22,268-Speed 6307.40 samples/sec Loss 6.7823 LearningRate 0.0008 Epoch: 8 Global Step: 179680 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:25,516-Speed 6306.55 samples/sec Loss 6.8281 LearningRate 0.0008 Epoch: 8 Global Step: 179690 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:28,764-Speed 6306.44 samples/sec Loss 
6.7672 LearningRate 0.0008 Epoch: 8 Global Step: 179700 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:32,011-Speed 6308.48 samples/sec Loss 6.8066 LearningRate 0.0008 Epoch: 8 Global Step: 179710 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:35,258-Speed 6309.62 samples/sec Loss 6.8413 LearningRate 0.0008 Epoch: 8 Global Step: 179720 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:38,506-Speed 6305.69 samples/sec Loss 6.8525 LearningRate 0.0008 Epoch: 8 Global Step: 179730 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:04:41,738-Speed 6339.06 samples/sec Loss 6.8201 LearningRate 0.0008 Epoch: 8 Global Step: 179740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:44,987-Speed 6305.67 samples/sec Loss 6.8405 LearningRate 0.0008 Epoch: 8 Global Step: 179750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:48,232-Speed 6311.80 samples/sec Loss 6.7798 LearningRate 0.0008 Epoch: 8 Global Step: 179760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:51,478-Speed 6311.46 samples/sec Loss 6.7416 LearningRate 0.0008 Epoch: 8 Global Step: 179770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:54,723-Speed 6312.63 samples/sec Loss 6.9006 LearningRate 0.0008 Epoch: 8 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:04:57,968-Speed 6311.34 samples/sec Loss 6.7568 LearningRate 0.0008 Epoch: 8 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:01,217-Speed 6306.05 samples/sec Loss 6.8096 LearningRate 0.0008 Epoch: 8 Global Step: 179800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:04,464-Speed 6307.30 samples/sec Loss 6.8768 LearningRate 0.0008 Epoch: 8 Global Step: 179810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:07,715-Speed 6302.35 samples/sec Loss 6.8246 LearningRate 0.0008 Epoch: 8 Global 
Step: 179820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:10,964-Speed 6306.25 samples/sec Loss 6.7295 LearningRate 0.0008 Epoch: 8 Global Step: 179830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:14,198-Speed 6334.06 samples/sec Loss 6.8140 LearningRate 0.0008 Epoch: 8 Global Step: 179840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:17,443-Speed 6311.04 samples/sec Loss 6.8366 LearningRate 0.0008 Epoch: 8 Global Step: 179850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:20,688-Speed 6313.86 samples/sec Loss 6.8842 LearningRate 0.0008 Epoch: 8 Global Step: 179860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:23,943-Speed 6292.60 samples/sec Loss 6.7761 LearningRate 0.0008 Epoch: 8 Global Step: 179870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:27,189-Speed 6310.38 samples/sec Loss 6.8103 LearningRate 0.0008 Epoch: 8 Global Step: 179880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:30,433-Speed 6314.57 samples/sec Loss 6.9069 LearningRate 0.0008 Epoch: 8 Global Step: 179890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:33,681-Speed 6307.56 samples/sec Loss 6.8384 LearningRate 0.0008 Epoch: 8 Global Step: 179900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:36,933-Speed 6300.11 samples/sec Loss 6.7714 LearningRate 0.0008 Epoch: 8 Global Step: 179910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:40,179-Speed 6309.50 samples/sec Loss 6.8404 LearningRate 0.0008 Epoch: 8 Global Step: 179920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:43,428-Speed 6304.99 samples/sec Loss 6.8080 LearningRate 0.0008 Epoch: 8 Global Step: 179930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:46,674-Speed 6311.03 samples/sec Loss 6.7656 LearningRate 0.0008 Epoch: 8 Global Step: 179940 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 08:05:49,924-Speed 6302.52 samples/sec Loss 6.8165 LearningRate 0.0008 Epoch: 8 Global Step: 179950 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:05:53,173-Speed 6305.80 samples/sec Loss 6.8485 LearningRate 0.0008 Epoch: 8 Global Step: 179960 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:05:56,404-Speed 6339.51 samples/sec Loss 6.7893 LearningRate 0.0008 Epoch: 8 Global Step: 179970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:05:59,651-Speed 6309.83 samples/sec Loss 6.8470 LearningRate 0.0008 Epoch: 8 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:02,898-Speed 6307.44 samples/sec Loss 6.8090 LearningRate 0.0008 Epoch: 8 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:06,142-Speed 6314.91 samples/sec Loss 6.9210 LearningRate 0.0008 Epoch: 8 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:09,388-Speed 6309.92 samples/sec Loss 6.8428 LearningRate 0.0008 Epoch: 8 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:12,636-Speed 6308.33 samples/sec Loss 6.7712 LearningRate 0.0008 Epoch: 8 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:15,881-Speed 6313.08 samples/sec Loss 6.8913 LearningRate 0.0008 Epoch: 8 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:19,127-Speed 6310.07 samples/sec Loss 6.8491 LearningRate 0.0008 Epoch: 8 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:22,374-Speed 6308.90 samples/sec Loss 6.8260 LearningRate 0.0008 Epoch: 8 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:25,620-Speed 6309.41 samples/sec Loss 6.7960 LearningRate 0.0008 Epoch: 8 Global Step: 180060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:06:28,868-Speed 6307.91 samples/sec Loss 6.7938 LearningRate 0.0008 Epoch: 8 Global Step: 180070 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:06:32,111-Speed 6317.20 samples/sec Loss 6.8417 LearningRate 0.0008 Epoch: 8 Global Step: 180080 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:06:35,344-Speed 6335.84 samples/sec Loss 6.8292 LearningRate 0.0008 Epoch: 8 Global Step: 180090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:38,590-Speed 6310.96 samples/sec Loss 6.7621 LearningRate 0.0008 Epoch: 8 Global Step: 180100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:41,838-Speed 6306.99 samples/sec Loss 6.8162 LearningRate 0.0008 Epoch: 8 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:45,084-Speed 6310.24 samples/sec Loss 6.8556 LearningRate 0.0008 Epoch: 8 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:48,333-Speed 6305.20 samples/sec Loss 6.8601 LearningRate 0.0008 Epoch: 8 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:51,580-Speed 6310.14 samples/sec Loss 6.8045 LearningRate 0.0008 Epoch: 8 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:54,826-Speed 6309.56 samples/sec Loss 6.8326 LearningRate 0.0008 Epoch: 8 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:06:58,073-Speed 6310.06 samples/sec Loss 6.7982 LearningRate 0.0008 Epoch: 8 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:01,318-Speed 6310.74 samples/sec Loss 6.7911 LearningRate 0.0008 Epoch: 8 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:04,571-Speed 6298.59 samples/sec Loss 6.7981 LearningRate 0.0008 Epoch: 8 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:07,817-Speed 6309.09 samples/sec Loss 
6.7723 LearningRate 0.0008 Epoch: 8 Global Step: 180190 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:07:11,050-Speed 6338.17 samples/sec Loss 6.7809 LearningRate 0.0008 Epoch: 8 Global Step: 180200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:14,294-Speed 6313.94 samples/sec Loss 6.8558 LearningRate 0.0008 Epoch: 8 Global Step: 180210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:17,542-Speed 6306.41 samples/sec Loss 6.7636 LearningRate 0.0008 Epoch: 8 Global Step: 180220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:20,786-Speed 6314.91 samples/sec Loss 6.7940 LearningRate 0.0008 Epoch: 8 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:24,038-Speed 6299.18 samples/sec Loss 6.7502 LearningRate 0.0008 Epoch: 8 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:27,287-Speed 6306.03 samples/sec Loss 6.7889 LearningRate 0.0008 Epoch: 8 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:30,532-Speed 6311.35 samples/sec Loss 6.8027 LearningRate 0.0008 Epoch: 8 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:33,782-Speed 6303.18 samples/sec Loss 6.7379 LearningRate 0.0008 Epoch: 8 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:37,032-Speed 6302.96 samples/sec Loss 6.7086 LearningRate 0.0008 Epoch: 8 Global Step: 180280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:40,275-Speed 6315.80 samples/sec Loss 6.7702 LearningRate 0.0008 Epoch: 8 Global Step: 180290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:43,526-Speed 6302.48 samples/sec Loss 6.8088 LearningRate 0.0008 Epoch: 8 Global Step: 180300 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:07:46,773-Speed 6308.85 samples/sec Loss 6.7746 LearningRate 0.0008 Epoch: 8 Global 
Step: 180310 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:07:50,006-Speed 6336.79 samples/sec Loss 6.8101 LearningRate 0.0008 Epoch: 8 Global Step: 180320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:53,254-Speed 6305.87 samples/sec Loss 6.6695 LearningRate 0.0008 Epoch: 8 Global Step: 180330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:56,501-Speed 6309.18 samples/sec Loss 6.7515 LearningRate 0.0008 Epoch: 8 Global Step: 180340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:07:59,753-Speed 6298.51 samples/sec Loss 6.8443 LearningRate 0.0008 Epoch: 8 Global Step: 180350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:02,996-Speed 6317.28 samples/sec Loss 6.8344 LearningRate 0.0008 Epoch: 8 Global Step: 180360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:06,243-Speed 6309.41 samples/sec Loss 6.8412 LearningRate 0.0008 Epoch: 8 Global Step: 180370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:09,488-Speed 6310.80 samples/sec Loss 6.7349 LearningRate 0.0008 Epoch: 8 Global Step: 180380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:12,735-Speed 6310.15 samples/sec Loss 6.8710 LearningRate 0.0008 Epoch: 8 Global Step: 180390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:16,029-Speed 6219.32 samples/sec Loss 6.7607 LearningRate 0.0008 Epoch: 8 Global Step: 180400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:19,275-Speed 6309.63 samples/sec Loss 6.8821 LearningRate 0.0008 Epoch: 8 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:22,522-Speed 6308.69 samples/sec Loss 6.8414 LearningRate 0.0008 Epoch: 8 Global Step: 180420 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:08:25,759-Speed 6327.67 samples/sec Loss 6.7404 LearningRate 0.0008 Epoch: 8 Global Step: 180430 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:08:29,005-Speed 6310.74 samples/sec Loss 6.7750 LearningRate 0.0008 Epoch: 8 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:32,258-Speed 6297.09 samples/sec Loss 6.9130 LearningRate 0.0008 Epoch: 8 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:35,505-Speed 6308.39 samples/sec Loss 6.8105 LearningRate 0.0008 Epoch: 8 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:38,759-Speed 6296.23 samples/sec Loss 6.7531 LearningRate 0.0008 Epoch: 8 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:42,006-Speed 6308.54 samples/sec Loss 6.7814 LearningRate 0.0008 Epoch: 8 Global Step: 180480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:45,258-Speed 6299.28 samples/sec Loss 6.9155 LearningRate 0.0008 Epoch: 8 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:48,505-Speed 6307.58 samples/sec Loss 6.8487 LearningRate 0.0008 Epoch: 8 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:51,757-Speed 6300.66 samples/sec Loss 6.7954 LearningRate 0.0008 Epoch: 8 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:55,016-Speed 6284.44 samples/sec Loss 6.7485 LearningRate 0.0008 Epoch: 8 Global Step: 180520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:08:58,264-Speed 6307.80 samples/sec Loss 6.7286 LearningRate 0.0008 Epoch: 8 Global Step: 180530 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:09:01,514-Speed 6302.88 samples/sec Loss 6.7919 LearningRate 0.0008 Epoch: 8 Global Step: 180540 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:09:04,763-Speed 6305.83 samples/sec Loss 6.8522 LearningRate 0.0008 Epoch: 8 Global Step: 180550 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
08:09:07,998-Speed 6331.85 samples/sec Loss 6.8329 LearningRate 0.0008 Epoch: 8 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:11,249-Speed 6301.00 samples/sec Loss 6.7785 LearningRate 0.0008 Epoch: 8 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:14,501-Speed 6298.51 samples/sec Loss 6.8225 LearningRate 0.0008 Epoch: 8 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:17,757-Speed 6291.74 samples/sec Loss 6.8400 LearningRate 0.0008 Epoch: 8 Global Step: 180590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:21,006-Speed 6304.98 samples/sec Loss 6.8585 LearningRate 0.0008 Epoch: 8 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:24,252-Speed 6309.88 samples/sec Loss 6.7659 LearningRate 0.0008 Epoch: 8 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:27,503-Speed 6301.94 samples/sec Loss 6.7935 LearningRate 0.0008 Epoch: 8 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:30,754-Speed 6299.74 samples/sec Loss 6.8425 LearningRate 0.0008 Epoch: 8 Global Step: 180630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:34,001-Speed 6308.97 samples/sec Loss 6.8045 LearningRate 0.0008 Epoch: 8 Global Step: 180640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:37,249-Speed 6306.52 samples/sec Loss 6.8537 LearningRate 0.0008 Epoch: 8 Global Step: 180650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:40,505-Speed 6292.91 samples/sec Loss 6.7967 LearningRate 0.0008 Epoch: 8 Global Step: 180660 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:09:43,752-Speed 6308.44 samples/sec Loss 6.8003 LearningRate 0.0008 Epoch: 8 Global Step: 180670 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:09:46,999-Speed 6307.74 samples/sec Loss 
6.7863 LearningRate 0.0008 Epoch: 8 Global Step: 180680 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:09:50,235-Speed 6331.03 samples/sec Loss 6.8494 LearningRate 0.0008 Epoch: 8 Global Step: 180690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:53,481-Speed 6310.55 samples/sec Loss 6.8113 LearningRate 0.0008 Epoch: 8 Global Step: 180700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:56,730-Speed 6305.29 samples/sec Loss 6.6771 LearningRate 0.0008 Epoch: 8 Global Step: 180710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:09:59,977-Speed 6307.75 samples/sec Loss 6.8913 LearningRate 0.0008 Epoch: 8 Global Step: 180720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:03,224-Speed 6308.19 samples/sec Loss 6.8735 LearningRate 0.0008 Epoch: 8 Global Step: 180730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:06,475-Speed 6301.35 samples/sec Loss 6.7438 LearningRate 0.0008 Epoch: 8 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:09,725-Speed 6304.45 samples/sec Loss 6.8245 LearningRate 0.0008 Epoch: 8 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:12,978-Speed 6297.92 samples/sec Loss 6.8728 LearningRate 0.0008 Epoch: 8 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:16,225-Speed 6306.98 samples/sec Loss 6.8143 LearningRate 0.0008 Epoch: 8 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:19,492-Speed 6271.58 samples/sec Loss 6.7962 LearningRate 0.0008 Epoch: 8 Global Step: 180780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:22,743-Speed 6300.25 samples/sec Loss 6.9283 LearningRate 0.0008 Epoch: 8 Global Step: 180790 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:10:25,989-Speed 6311.01 samples/sec Loss 6.8053 LearningRate 0.0008 Epoch: 8 Global 
Step: 180800 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:10:29,235-Speed 6310.73 samples/sec Loss 6.7757 LearningRate 0.0008 Epoch: 8 Global Step: 180810 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:10:32,468-Speed 6336.15 samples/sec Loss 6.9315 LearningRate 0.0008 Epoch: 8 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:35,720-Speed 6299.07 samples/sec Loss 6.8753 LearningRate 0.0008 Epoch: 8 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:38,962-Speed 6317.44 samples/sec Loss 6.8188 LearningRate 0.0008 Epoch: 8 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:42,211-Speed 6305.87 samples/sec Loss 6.7822 LearningRate 0.0008 Epoch: 8 Global Step: 180850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:45,454-Speed 6316.54 samples/sec Loss 6.8212 LearningRate 0.0008 Epoch: 8 Global Step: 180860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:48,701-Speed 6308.38 samples/sec Loss 6.8102 LearningRate 0.0008 Epoch: 8 Global Step: 180870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:51,954-Speed 6297.71 samples/sec Loss 6.7756 LearningRate 0.0008 Epoch: 8 Global Step: 180880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:55,202-Speed 6305.73 samples/sec Loss 6.8671 LearningRate 0.0008 Epoch: 8 Global Step: 180890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:10:58,462-Speed 6284.41 samples/sec Loss 6.8941 LearningRate 0.0008 Epoch: 8 Global Step: 180900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:01,710-Speed 6307.33 samples/sec Loss 6.8544 LearningRate 0.0008 Epoch: 8 Global Step: 180910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:04,956-Speed 6310.12 samples/sec Loss 6.8597 LearningRate 0.0008 Epoch: 8 Global Step: 180920 Fp16 Grad Scale: 65536 
Required: 59 hours Training: 2022-04-01 08:11:08,195-Speed 6325.06 samples/sec Loss 6.8811 LearningRate 0.0008 Epoch: 8 Global Step: 180930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:11,442-Speed 6307.79 samples/sec Loss 6.8139 LearningRate 0.0008 Epoch: 8 Global Step: 180940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:14,693-Speed 6301.40 samples/sec Loss 6.8545 LearningRate 0.0008 Epoch: 8 Global Step: 180950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:17,942-Speed 6305.16 samples/sec Loss 6.8111 LearningRate 0.0008 Epoch: 8 Global Step: 180960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:21,187-Speed 6312.65 samples/sec Loss 6.8324 LearningRate 0.0008 Epoch: 8 Global Step: 180970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:24,433-Speed 6310.61 samples/sec Loss 6.8689 LearningRate 0.0008 Epoch: 8 Global Step: 180980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:27,679-Speed 6310.16 samples/sec Loss 6.7849 LearningRate 0.0008 Epoch: 8 Global Step: 180990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:30,925-Speed 6311.58 samples/sec Loss 6.8098 LearningRate 0.0008 Epoch: 8 Global Step: 181000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:34,171-Speed 6310.62 samples/sec Loss 6.8083 LearningRate 0.0008 Epoch: 8 Global Step: 181010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:37,419-Speed 6306.56 samples/sec Loss 6.7613 LearningRate 0.0008 Epoch: 8 Global Step: 181020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:40,668-Speed 6305.39 samples/sec Loss 6.8067 LearningRate 0.0008 Epoch: 8 Global Step: 181030 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:11:43,902-Speed 6334.01 samples/sec Loss 6.8461 LearningRate 0.0008 Epoch: 8 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:11:47,156-Speed 6295.37 samples/sec Loss 6.8619 LearningRate 0.0008 Epoch: 8 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:50,402-Speed 6310.30 samples/sec Loss 6.7694 LearningRate 0.0008 Epoch: 8 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:53,663-Speed 6281.40 samples/sec Loss 6.8672 LearningRate 0.0008 Epoch: 8 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:11:56,912-Speed 6306.07 samples/sec Loss 6.8344 LearningRate 0.0008 Epoch: 8 Global Step: 181080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:00,156-Speed 6313.11 samples/sec Loss 6.8773 LearningRate 0.0008 Epoch: 8 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:03,412-Speed 6292.71 samples/sec Loss 6.8419 LearningRate 0.0008 Epoch: 8 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:06,658-Speed 6310.43 samples/sec Loss 6.7312 LearningRate 0.0008 Epoch: 8 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:09,906-Speed 6306.66 samples/sec Loss 6.8049 LearningRate 0.0008 Epoch: 8 Global Step: 181120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:13,158-Speed 6299.41 samples/sec Loss 6.7296 LearningRate 0.0008 Epoch: 8 Global Step: 181130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:16,405-Speed 6307.94 samples/sec Loss 6.8115 LearningRate 0.0008 Epoch: 8 Global Step: 181140 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:12:19,637-Speed 6337.61 samples/sec Loss 6.8249 LearningRate 0.0008 Epoch: 8 Global Step: 181150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:22,883-Speed 6310.68 samples/sec Loss 6.7488 LearningRate 0.0008 Epoch: 8 Global Step: 181160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:26,127-Speed 6313.97 samples/sec Loss 
6.7711 LearningRate 0.0008 Epoch: 8 Global Step: 181170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:29,374-Speed 6310.13 samples/sec Loss 6.8568 LearningRate 0.0008 Epoch: 8 Global Step: 181180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:32,621-Speed 6309.42 samples/sec Loss 6.8326 LearningRate 0.0008 Epoch: 8 Global Step: 181190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:35,866-Speed 6313.14 samples/sec Loss 6.7310 LearningRate 0.0008 Epoch: 8 Global Step: 181200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:39,113-Speed 6307.50 samples/sec Loss 6.7396 LearningRate 0.0008 Epoch: 8 Global Step: 181210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:42,365-Speed 6299.53 samples/sec Loss 6.8521 LearningRate 0.0008 Epoch: 8 Global Step: 181220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:45,613-Speed 6306.53 samples/sec Loss 6.8367 LearningRate 0.0008 Epoch: 8 Global Step: 181230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:48,860-Speed 6309.66 samples/sec Loss 6.7280 LearningRate 0.0008 Epoch: 8 Global Step: 181240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:52,110-Speed 6301.62 samples/sec Loss 6.7854 LearningRate 0.0008 Epoch: 8 Global Step: 181250 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:12:55,346-Speed 6331.47 samples/sec Loss 6.7989 LearningRate 0.0008 Epoch: 8 Global Step: 181260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:12:58,599-Speed 6297.17 samples/sec Loss 6.8316 LearningRate 0.0008 Epoch: 8 Global Step: 181270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:01,846-Speed 6307.98 samples/sec Loss 6.8714 LearningRate 0.0008 Epoch: 8 Global Step: 181280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:05,094-Speed 6307.90 samples/sec Loss 6.7711 LearningRate 0.0008 Epoch: 8 Global 
Step: 181290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:08,340-Speed 6309.90 samples/sec Loss 6.8109 LearningRate 0.0008 Epoch: 8 Global Step: 181300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:11,587-Speed 6309.05 samples/sec Loss 6.8881 LearningRate 0.0008 Epoch: 8 Global Step: 181310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:14,833-Speed 6310.12 samples/sec Loss 6.7570 LearningRate 0.0008 Epoch: 8 Global Step: 181320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:18,082-Speed 6305.79 samples/sec Loss 6.7439 LearningRate 0.0008 Epoch: 8 Global Step: 181330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:21,339-Speed 6288.85 samples/sec Loss 6.7457 LearningRate 0.0008 Epoch: 8 Global Step: 181340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:24,584-Speed 6311.47 samples/sec Loss 6.7908 LearningRate 0.0008 Epoch: 8 Global Step: 181350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:27,815-Speed 6341.14 samples/sec Loss 6.8859 LearningRate 0.0008 Epoch: 8 Global Step: 181360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:31,062-Speed 6308.31 samples/sec Loss 6.8759 LearningRate 0.0008 Epoch: 8 Global Step: 181370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:34,307-Speed 6313.58 samples/sec Loss 6.8502 LearningRate 0.0008 Epoch: 8 Global Step: 181380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:37,555-Speed 6306.70 samples/sec Loss 6.7816 LearningRate 0.0008 Epoch: 8 Global Step: 181390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:40,800-Speed 6313.46 samples/sec Loss 6.7847 LearningRate 0.0008 Epoch: 8 Global Step: 181400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:44,048-Speed 6305.88 samples/sec Loss 6.8080 LearningRate 0.0008 Epoch: 8 Global Step: 181410 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:13:47,298-Speed 6303.88 samples/sec Loss 6.7997 LearningRate 0.0008 Epoch: 8 Global Step: 181420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:50,544-Speed 6310.38 samples/sec Loss 6.7720 LearningRate 0.0008 Epoch: 8 Global Step: 181430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:53,803-Speed 6285.54 samples/sec Loss 6.7432 LearningRate 0.0008 Epoch: 8 Global Step: 181440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:13:57,048-Speed 6312.82 samples/sec Loss 6.7834 LearningRate 0.0008 Epoch: 8 Global Step: 181450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:00,281-Speed 6336.67 samples/sec Loss 6.7739 LearningRate 0.0008 Epoch: 8 Global Step: 181460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:03,530-Speed 6303.78 samples/sec Loss 6.7472 LearningRate 0.0008 Epoch: 8 Global Step: 181470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:06,777-Speed 6309.08 samples/sec Loss 6.8371 LearningRate 0.0008 Epoch: 8 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:10,026-Speed 6304.51 samples/sec Loss 6.7942 LearningRate 0.0008 Epoch: 8 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:13,281-Speed 6293.73 samples/sec Loss 6.7718 LearningRate 0.0008 Epoch: 8 Global Step: 181500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:16,530-Speed 6304.77 samples/sec Loss 6.8299 LearningRate 0.0008 Epoch: 8 Global Step: 181510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:19,778-Speed 6307.74 samples/sec Loss 6.7862 LearningRate 0.0008 Epoch: 8 Global Step: 181520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:23,025-Speed 6307.24 samples/sec Loss 6.7880 LearningRate 0.0008 Epoch: 8 Global Step: 181530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:14:26,273-Speed 6306.47 samples/sec Loss 6.8852 LearningRate 0.0008 Epoch: 8 Global Step: 181540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:29,519-Speed 6311.60 samples/sec Loss 6.8076 LearningRate 0.0008 Epoch: 8 Global Step: 181550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:32,767-Speed 6307.43 samples/sec Loss 6.7961 LearningRate 0.0008 Epoch: 8 Global Step: 181560 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:14:36,012-Speed 6311.37 samples/sec Loss 6.7635 LearningRate 0.0008 Epoch: 8 Global Step: 181570 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:14:39,305-Speed 6220.65 samples/sec Loss 6.7684 LearningRate 0.0008 Epoch: 8 Global Step: 181580 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:14:42,539-Speed 6334.87 samples/sec Loss 6.7673 LearningRate 0.0008 Epoch: 8 Global Step: 181590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:45,786-Speed 6308.40 samples/sec Loss 6.8309 LearningRate 0.0008 Epoch: 8 Global Step: 181600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:49,032-Speed 6310.80 samples/sec Loss 6.7984 LearningRate 0.0008 Epoch: 8 Global Step: 181610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:52,313-Speed 6242.82 samples/sec Loss 6.8640 LearningRate 0.0008 Epoch: 8 Global Step: 181620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:55,563-Speed 6303.37 samples/sec Loss 6.7478 LearningRate 0.0008 Epoch: 8 Global Step: 181630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:14:58,808-Speed 6314.52 samples/sec Loss 6.7821 LearningRate 0.0008 Epoch: 8 Global Step: 181640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:02,054-Speed 6310.76 samples/sec Loss 6.8232 LearningRate 0.0008 Epoch: 8 Global Step: 181650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:05,302-Speed 6305.65 samples/sec Loss 
6.8064 LearningRate 0.0008 Epoch: 8 Global Step: 181660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:08,549-Speed 6309.40 samples/sec Loss 6.7724 LearningRate 0.0008 Epoch: 8 Global Step: 181670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:11,799-Speed 6303.57 samples/sec Loss 6.8409 LearningRate 0.0008 Epoch: 8 Global Step: 181680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:15,047-Speed 6305.65 samples/sec Loss 6.7915 LearningRate 0.0008 Epoch: 8 Global Step: 181690 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:15:18,284-Speed 6328.51 samples/sec Loss 6.7699 LearningRate 0.0008 Epoch: 8 Global Step: 181700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:21,534-Speed 6302.43 samples/sec Loss 6.8454 LearningRate 0.0008 Epoch: 8 Global Step: 181710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:24,796-Speed 6280.07 samples/sec Loss 6.7364 LearningRate 0.0008 Epoch: 8 Global Step: 181720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:28,104-Speed 6192.11 samples/sec Loss 6.7638 LearningRate 0.0008 Epoch: 8 Global Step: 181730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:31,354-Speed 6302.74 samples/sec Loss 6.7983 LearningRate 0.0008 Epoch: 8 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:34,603-Speed 6305.94 samples/sec Loss 6.8468 LearningRate 0.0008 Epoch: 8 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:37,856-Speed 6296.84 samples/sec Loss 6.7004 LearningRate 0.0008 Epoch: 8 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:41,103-Speed 6308.15 samples/sec Loss 6.8261 LearningRate 0.0008 Epoch: 8 Global Step: 181770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:44,348-Speed 6312.44 samples/sec Loss 6.8414 LearningRate 0.0008 Epoch: 8 Global 
Step: 181780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:47,602-Speed 6296.18 samples/sec Loss 6.8220 LearningRate 0.0008 Epoch: 8 Global Step: 181790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:50,851-Speed 6304.46 samples/sec Loss 6.7769 LearningRate 0.0008 Epoch: 8 Global Step: 181800 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:15:54,086-Speed 6331.86 samples/sec Loss 6.7736 LearningRate 0.0008 Epoch: 8 Global Step: 181810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:15:57,333-Speed 6309.70 samples/sec Loss 6.7582 LearningRate 0.0008 Epoch: 8 Global Step: 181820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:00,583-Speed 6302.26 samples/sec Loss 6.8837 LearningRate 0.0008 Epoch: 8 Global Step: 181830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:03,833-Speed 6303.16 samples/sec Loss 6.8243 LearningRate 0.0008 Epoch: 8 Global Step: 181840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:07,081-Speed 6306.55 samples/sec Loss 6.8684 LearningRate 0.0008 Epoch: 8 Global Step: 181850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:10,333-Speed 6299.79 samples/sec Loss 6.7541 LearningRate 0.0008 Epoch: 8 Global Step: 181860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:13,583-Speed 6302.88 samples/sec Loss 6.8277 LearningRate 0.0008 Epoch: 8 Global Step: 181870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:16,829-Speed 6312.07 samples/sec Loss 6.8394 LearningRate 0.0008 Epoch: 8 Global Step: 181880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:20,079-Speed 6303.14 samples/sec Loss 6.7911 LearningRate 0.0008 Epoch: 8 Global Step: 181890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:23,325-Speed 6309.20 samples/sec Loss 6.8334 LearningRate 0.0008 Epoch: 8 Global Step: 181900 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:16:26,578-Speed 6298.93 samples/sec Loss 6.8560 LearningRate 0.0008 Epoch: 8 Global Step: 181910 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:16:29,811-Speed 6334.84 samples/sec Loss 6.7901 LearningRate 0.0008 Epoch: 8 Global Step: 181920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:33,054-Speed 6317.84 samples/sec Loss 6.8297 LearningRate 0.0008 Epoch: 8 Global Step: 181930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:36,308-Speed 6293.37 samples/sec Loss 6.7834 LearningRate 0.0008 Epoch: 8 Global Step: 181940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:39,556-Speed 6307.49 samples/sec Loss 6.8172 LearningRate 0.0008 Epoch: 8 Global Step: 181950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:42,801-Speed 6312.77 samples/sec Loss 6.7983 LearningRate 0.0008 Epoch: 8 Global Step: 181960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:46,046-Speed 6312.08 samples/sec Loss 6.8396 LearningRate 0.0008 Epoch: 8 Global Step: 181970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:49,294-Speed 6308.10 samples/sec Loss 6.7508 LearningRate 0.0008 Epoch: 8 Global Step: 181980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:52,545-Speed 6300.21 samples/sec Loss 6.7987 LearningRate 0.0008 Epoch: 8 Global Step: 181990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:55,792-Speed 6308.42 samples/sec Loss 6.8326 LearningRate 0.0008 Epoch: 8 Global Step: 182000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:16:59,047-Speed 6293.55 samples/sec Loss 6.8079 LearningRate 0.0008 Epoch: 8 Global Step: 182010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:02,298-Speed 6302.19 samples/sec Loss 6.8469 LearningRate 0.0008 Epoch: 8 Global Step: 182020 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
08:17:05,544-Speed 6308.92 samples/sec Loss 6.8441 LearningRate 0.0008 Epoch: 8 Global Step: 182030 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:17:08,793-Speed 6305.18 samples/sec Loss 6.8150 LearningRate 0.0008 Epoch: 8 Global Step: 182040 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:17:12,025-Speed 6337.16 samples/sec Loss 6.7402 LearningRate 0.0008 Epoch: 8 Global Step: 182050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:15,275-Speed 6304.38 samples/sec Loss 6.8664 LearningRate 0.0008 Epoch: 8 Global Step: 182060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:18,526-Speed 6302.16 samples/sec Loss 6.7358 LearningRate 0.0008 Epoch: 8 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:21,775-Speed 6304.94 samples/sec Loss 6.7765 LearningRate 0.0008 Epoch: 8 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:25,021-Speed 6309.64 samples/sec Loss 6.8644 LearningRate 0.0008 Epoch: 8 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:28,276-Speed 6293.77 samples/sec Loss 6.8136 LearningRate 0.0008 Epoch: 8 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:31,525-Speed 6304.94 samples/sec Loss 6.9200 LearningRate 0.0008 Epoch: 8 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:34,776-Speed 6301.36 samples/sec Loss 6.8107 LearningRate 0.0008 Epoch: 8 Global Step: 182120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:38,021-Speed 6312.24 samples/sec Loss 6.7695 LearningRate 0.0008 Epoch: 8 Global Step: 182130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:41,270-Speed 6304.80 samples/sec Loss 6.8245 LearningRate 0.0008 Epoch: 8 Global Step: 182140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:44,500-Speed 6341.74 samples/sec Loss 
6.7678 LearningRate 0.0008 Epoch: 8 Global Step: 182150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:47,747-Speed 6308.90 samples/sec Loss 6.7510 LearningRate 0.0008 Epoch: 8 Global Step: 182160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:50,995-Speed 6307.61 samples/sec Loss 6.8247 LearningRate 0.0008 Epoch: 8 Global Step: 182170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:54,247-Speed 6299.19 samples/sec Loss 6.7465 LearningRate 0.0008 Epoch: 8 Global Step: 182180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:17:57,492-Speed 6311.71 samples/sec Loss 6.8272 LearningRate 0.0008 Epoch: 8 Global Step: 182190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:00,739-Speed 6309.10 samples/sec Loss 6.6921 LearningRate 0.0008 Epoch: 8 Global Step: 182200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:03,988-Speed 6304.29 samples/sec Loss 6.7921 LearningRate 0.0008 Epoch: 8 Global Step: 182210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:07,266-Speed 6249.25 samples/sec Loss 6.8530 LearningRate 0.0008 Epoch: 8 Global Step: 182220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:10,516-Speed 6303.01 samples/sec Loss 6.7744 LearningRate 0.0008 Epoch: 8 Global Step: 182230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:13,763-Speed 6309.90 samples/sec Loss 6.8083 LearningRate 0.0008 Epoch: 8 Global Step: 182240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:17,015-Speed 6299.70 samples/sec Loss 6.7817 LearningRate 0.0008 Epoch: 8 Global Step: 182250 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:18:20,246-Speed 6339.98 samples/sec Loss 6.7803 LearningRate 0.0008 Epoch: 8 Global Step: 182260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:23,494-Speed 6306.40 samples/sec Loss 6.6834 LearningRate 0.0008 Epoch: 8 Global 
Step: 182270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:26,737-Speed 6316.80 samples/sec Loss 6.7712 LearningRate 0.0008 Epoch: 8 Global Step: 182280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:29,986-Speed 6305.55 samples/sec Loss 6.8464 LearningRate 0.0008 Epoch: 8 Global Step: 182290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:33,231-Speed 6313.47 samples/sec Loss 6.7633 LearningRate 0.0008 Epoch: 8 Global Step: 182300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:36,506-Speed 6254.45 samples/sec Loss 6.9037 LearningRate 0.0008 Epoch: 8 Global Step: 182310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:39,752-Speed 6311.09 samples/sec Loss 6.7954 LearningRate 0.0008 Epoch: 8 Global Step: 182320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:42,995-Speed 6315.85 samples/sec Loss 6.8653 LearningRate 0.0008 Epoch: 8 Global Step: 182330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:46,243-Speed 6307.33 samples/sec Loss 6.7406 LearningRate 0.0008 Epoch: 8 Global Step: 182340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:49,492-Speed 6304.68 samples/sec Loss 6.8527 LearningRate 0.0008 Epoch: 8 Global Step: 182350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:18:52,736-Speed 6315.05 samples/sec Loss 6.7974 LearningRate 0.0008 Epoch: 8 Global Step: 182360 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:18:55,988-Speed 6297.73 samples/sec Loss 6.7800 LearningRate 0.0008 Epoch: 8 Global Step: 182370 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:18:59,224-Speed 6331.22 samples/sec Loss 6.7818 LearningRate 0.0008 Epoch: 8 Global Step: 182380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:02,477-Speed 6299.58 samples/sec Loss 6.8330 LearningRate 0.0008 Epoch: 8 Global Step: 182390 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:19:05,723-Speed 6310.90 samples/sec Loss 6.7530 LearningRate 0.0008 Epoch: 8 Global Step: 182400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:08,980-Speed 6289.67 samples/sec Loss 6.7806 LearningRate 0.0008 Epoch: 8 Global Step: 182410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:12,228-Speed 6305.99 samples/sec Loss 6.8134 LearningRate 0.0008 Epoch: 8 Global Step: 182420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:15,483-Speed 6293.31 samples/sec Loss 6.8159 LearningRate 0.0008 Epoch: 8 Global Step: 182430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:18,729-Speed 6310.36 samples/sec Loss 6.8614 LearningRate 0.0008 Epoch: 8 Global Step: 182440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:21,973-Speed 6315.80 samples/sec Loss 6.8171 LearningRate 0.0008 Epoch: 8 Global Step: 182450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:25,218-Speed 6312.21 samples/sec Loss 6.7381 LearningRate 0.0008 Epoch: 8 Global Step: 182460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:28,468-Speed 6302.93 samples/sec Loss 6.8072 LearningRate 0.0008 Epoch: 8 Global Step: 182470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:31,718-Speed 6304.02 samples/sec Loss 6.8572 LearningRate 0.0008 Epoch: 8 Global Step: 182480 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:19:34,955-Speed 6327.59 samples/sec Loss 6.9369 LearningRate 0.0008 Epoch: 8 Global Step: 182490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:38,207-Speed 6299.95 samples/sec Loss 6.7781 LearningRate 0.0008 Epoch: 8 Global Step: 182500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:41,454-Speed 6309.32 samples/sec Loss 6.7237 LearningRate 0.0008 Epoch: 8 Global Step: 182510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:19:44,709-Speed 6293.22 samples/sec Loss 6.7723 LearningRate 0.0008 Epoch: 8 Global Step: 182520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:47,954-Speed 6312.37 samples/sec Loss 6.8118 LearningRate 0.0008 Epoch: 8 Global Step: 182530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:51,200-Speed 6310.22 samples/sec Loss 6.8565 LearningRate 0.0008 Epoch: 8 Global Step: 182540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:54,452-Speed 6299.94 samples/sec Loss 6.8406 LearningRate 0.0008 Epoch: 8 Global Step: 182550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:19:57,698-Speed 6309.03 samples/sec Loss 6.7838 LearningRate 0.0008 Epoch: 8 Global Step: 182560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:00,947-Speed 6306.00 samples/sec Loss 6.8048 LearningRate 0.0008 Epoch: 8 Global Step: 182570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:04,197-Speed 6302.48 samples/sec Loss 6.7877 LearningRate 0.0008 Epoch: 8 Global Step: 182580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:07,446-Speed 6304.40 samples/sec Loss 6.8914 LearningRate 0.0008 Epoch: 8 Global Step: 182590 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:20:10,696-Speed 6302.81 samples/sec Loss 6.7702 LearningRate 0.0008 Epoch: 8 Global Step: 182600 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:20:13,935-Speed 6325.90 samples/sec Loss 6.8465 LearningRate 0.0008 Epoch: 8 Global Step: 182610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:17,180-Speed 6311.11 samples/sec Loss 6.8293 LearningRate 0.0008 Epoch: 8 Global Step: 182620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:20,428-Speed 6309.27 samples/sec Loss 6.7847 LearningRate 0.0008 Epoch: 8 Global Step: 182630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:23,679-Speed 6301.12 samples/sec Loss 
6.6406 LearningRate 0.0008 Epoch: 8 Global Step: 182640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:26,927-Speed 6305.88 samples/sec Loss 6.8415 LearningRate 0.0008 Epoch: 8 Global Step: 182650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:30,174-Speed 6308.75 samples/sec Loss 6.7740 LearningRate 0.0008 Epoch: 8 Global Step: 182660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:33,425-Speed 6302.65 samples/sec Loss 6.7933 LearningRate 0.0008 Epoch: 8 Global Step: 182670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:36,670-Speed 6311.21 samples/sec Loss 6.8206 LearningRate 0.0008 Epoch: 8 Global Step: 182680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:39,922-Speed 6300.78 samples/sec Loss 6.7508 LearningRate 0.0008 Epoch: 8 Global Step: 182690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:43,167-Speed 6312.38 samples/sec Loss 6.7685 LearningRate 0.0008 Epoch: 8 Global Step: 182700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:46,407-Speed 6323.04 samples/sec Loss 6.8202 LearningRate 0.0008 Epoch: 8 Global Step: 182710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:49,652-Speed 6311.99 samples/sec Loss 6.7560 LearningRate 0.0008 Epoch: 8 Global Step: 182720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:52,898-Speed 6312.15 samples/sec Loss 6.7599 LearningRate 0.0008 Epoch: 8 Global Step: 182730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:56,147-Speed 6303.83 samples/sec Loss 6.7512 LearningRate 0.0008 Epoch: 8 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:20:59,392-Speed 6312.70 samples/sec Loss 6.7294 LearningRate 0.0008 Epoch: 8 Global Step: 182750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:02,642-Speed 6303.97 samples/sec Loss 6.8402 LearningRate 0.0008 Epoch: 8 Global 
Step: 182760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:05,889-Speed 6308.37 samples/sec Loss 6.8438 LearningRate 0.0008 Epoch: 8 Global Step: 182770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:09,136-Speed 6308.37 samples/sec Loss 6.8293 LearningRate 0.0008 Epoch: 8 Global Step: 182780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:12,381-Speed 6313.03 samples/sec Loss 6.7684 LearningRate 0.0008 Epoch: 8 Global Step: 182790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:15,631-Speed 6302.83 samples/sec Loss 6.7243 LearningRate 0.0008 Epoch: 8 Global Step: 182800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:18,884-Speed 6297.35 samples/sec Loss 6.8507 LearningRate 0.0008 Epoch: 8 Global Step: 182810 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:21:22,128-Speed 6313.82 samples/sec Loss 6.8006 LearningRate 0.0008 Epoch: 8 Global Step: 182820 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:21:25,360-Speed 6338.33 samples/sec Loss 6.7953 LearningRate 0.0008 Epoch: 8 Global Step: 182830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:28,606-Speed 6309.42 samples/sec Loss 6.7225 LearningRate 0.0008 Epoch: 8 Global Step: 182840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:31,853-Speed 6309.19 samples/sec Loss 6.7964 LearningRate 0.0008 Epoch: 8 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:35,097-Speed 6314.43 samples/sec Loss 6.8529 LearningRate 0.0008 Epoch: 8 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:38,345-Speed 6307.19 samples/sec Loss 6.7722 LearningRate 0.0008 Epoch: 8 Global Step: 182870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:41,593-Speed 6306.64 samples/sec Loss 6.7742 LearningRate 0.0008 Epoch: 8 Global Step: 182880 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:21:44,840-Speed 6309.30 samples/sec Loss 6.7555 LearningRate 0.0008 Epoch: 8 Global Step: 182890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:48,083-Speed 6316.88 samples/sec Loss 6.7761 LearningRate 0.0008 Epoch: 8 Global Step: 182900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:51,328-Speed 6311.90 samples/sec Loss 6.7325 LearningRate 0.0008 Epoch: 8 Global Step: 182910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:54,577-Speed 6305.99 samples/sec Loss 6.7267 LearningRate 0.0008 Epoch: 8 Global Step: 182920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:21:57,825-Speed 6306.65 samples/sec Loss 6.7616 LearningRate 0.0008 Epoch: 8 Global Step: 182930 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:22:01,055-Speed 6342.45 samples/sec Loss 6.8296 LearningRate 0.0008 Epoch: 8 Global Step: 182940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:04,299-Speed 6314.31 samples/sec Loss 6.7915 LearningRate 0.0008 Epoch: 8 Global Step: 182950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:07,549-Speed 6304.54 samples/sec Loss 6.7664 LearningRate 0.0008 Epoch: 8 Global Step: 182960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:10,798-Speed 6303.28 samples/sec Loss 6.7380 LearningRate 0.0008 Epoch: 8 Global Step: 182970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:14,047-Speed 6305.45 samples/sec Loss 6.7058 LearningRate 0.0008 Epoch: 8 Global Step: 182980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:17,295-Speed 6306.03 samples/sec Loss 6.7250 LearningRate 0.0007 Epoch: 8 Global Step: 182990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:20,543-Speed 6307.77 samples/sec Loss 6.7959 LearningRate 0.0007 Epoch: 8 Global Step: 183000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:22:23,788-Speed 6311.37 samples/sec Loss 6.8188 LearningRate 0.0007 Epoch: 8 Global Step: 183010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:27,037-Speed 6305.25 samples/sec Loss 6.7706 LearningRate 0.0007 Epoch: 8 Global Step: 183020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:30,286-Speed 6305.80 samples/sec Loss 6.7995 LearningRate 0.0007 Epoch: 8 Global Step: 183030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:33,531-Speed 6313.05 samples/sec Loss 6.8102 LearningRate 0.0007 Epoch: 8 Global Step: 183040 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:22:36,778-Speed 6308.62 samples/sec Loss 6.8822 LearningRate 0.0007 Epoch: 8 Global Step: 183050 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:22:40,026-Speed 6306.83 samples/sec Loss 6.7545 LearningRate 0.0007 Epoch: 8 Global Step: 183060 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:22:43,262-Speed 6329.64 samples/sec Loss 6.7531 LearningRate 0.0007 Epoch: 8 Global Step: 183070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:46,510-Speed 6307.35 samples/sec Loss 6.7810 LearningRate 0.0007 Epoch: 8 Global Step: 183080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:49,754-Speed 6314.11 samples/sec Loss 6.8046 LearningRate 0.0007 Epoch: 8 Global Step: 183090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:53,004-Speed 6303.60 samples/sec Loss 6.8197 LearningRate 0.0007 Epoch: 8 Global Step: 183100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:56,253-Speed 6304.81 samples/sec Loss 6.7794 LearningRate 0.0007 Epoch: 8 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:22:59,497-Speed 6315.60 samples/sec Loss 6.8086 LearningRate 0.0007 Epoch: 8 Global Step: 183120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:02,744-Speed 6307.38 samples/sec Loss 
6.8274 LearningRate 0.0007 Epoch: 8 Global Step: 183130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:05,995-Speed 6301.83 samples/sec Loss 6.8077 LearningRate 0.0007 Epoch: 8 Global Step: 183140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:09,243-Speed 6306.81 samples/sec Loss 6.8057 LearningRate 0.0007 Epoch: 8 Global Step: 183150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:12,495-Speed 6299.38 samples/sec Loss 6.7032 LearningRate 0.0007 Epoch: 8 Global Step: 183160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:15,744-Speed 6304.24 samples/sec Loss 6.7760 LearningRate 0.0007 Epoch: 8 Global Step: 183170 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:23:18,977-Speed 6336.49 samples/sec Loss 6.7819 LearningRate 0.0007 Epoch: 8 Global Step: 183180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:22,227-Speed 6303.18 samples/sec Loss 6.7299 LearningRate 0.0007 Epoch: 8 Global Step: 183190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:25,474-Speed 6308.84 samples/sec Loss 6.7590 LearningRate 0.0007 Epoch: 8 Global Step: 183200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:28,717-Speed 6315.22 samples/sec Loss 6.7535 LearningRate 0.0007 Epoch: 8 Global Step: 183210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:31,964-Speed 6310.04 samples/sec Loss 6.7998 LearningRate 0.0007 Epoch: 8 Global Step: 183220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:35,220-Speed 6291.50 samples/sec Loss 6.7755 LearningRate 0.0007 Epoch: 8 Global Step: 183230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:38,469-Speed 6304.42 samples/sec Loss 6.7829 LearningRate 0.0007 Epoch: 8 Global Step: 183240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:41,710-Speed 6319.23 samples/sec Loss 6.8173 LearningRate 0.0007 Epoch: 8 Global 
Step: 183250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:44,958-Speed 6307.72 samples/sec Loss 6.6827 LearningRate 0.0007 Epoch: 8 Global Step: 183260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:48,206-Speed 6307.45 samples/sec Loss 6.7766 LearningRate 0.0007 Epoch: 8 Global Step: 183270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:51,437-Speed 6339.68 samples/sec Loss 6.7963 LearningRate 0.0007 Epoch: 8 Global Step: 183280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:54,684-Speed 6307.69 samples/sec Loss 6.7274 LearningRate 0.0007 Epoch: 8 Global Step: 183290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:23:57,930-Speed 6312.33 samples/sec Loss 6.8375 LearningRate 0.0007 Epoch: 8 Global Step: 183300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:01,182-Speed 6298.63 samples/sec Loss 6.7500 LearningRate 0.0007 Epoch: 8 Global Step: 183310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:04,428-Speed 6310.19 samples/sec Loss 6.8004 LearningRate 0.0007 Epoch: 8 Global Step: 183320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:07,673-Speed 6312.31 samples/sec Loss 6.7480 LearningRate 0.0007 Epoch: 8 Global Step: 183330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:10,922-Speed 6305.34 samples/sec Loss 6.7446 LearningRate 0.0007 Epoch: 8 Global Step: 183340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:14,170-Speed 6307.42 samples/sec Loss 6.8110 LearningRate 0.0007 Epoch: 8 Global Step: 183350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:17,418-Speed 6306.46 samples/sec Loss 6.8015 LearningRate 0.0007 Epoch: 8 Global Step: 183360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:20,665-Speed 6309.34 samples/sec Loss 6.7948 LearningRate 0.0007 Epoch: 8 Global Step: 183370 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:24:23,909-Speed 6313.99 samples/sec Loss 6.7348 LearningRate 0.0007 Epoch: 8 Global Step: 183380 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:24:27,160-Speed 6301.24 samples/sec Loss 6.8051 LearningRate 0.0007 Epoch: 8 Global Step: 183390 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:24:30,404-Speed 6315.16 samples/sec Loss 6.7248 LearningRate 0.0007 Epoch: 8 Global Step: 183400 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:24:33,651-Speed 6309.18 samples/sec Loss 6.7641 LearningRate 0.0007 Epoch: 8 Global Step: 183410 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:24:36,895-Speed 6314.46 samples/sec Loss 6.7275 LearningRate 0.0007 Epoch: 8 Global Step: 183420 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:24:40,127-Speed 6338.02 samples/sec Loss 6.7524 LearningRate 0.0007 Epoch: 8 Global Step: 183430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:43,372-Speed 6311.25 samples/sec Loss 6.8358 LearningRate 0.0007 Epoch: 8 Global Step: 183440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:46,620-Speed 6308.06 samples/sec Loss 6.7639 LearningRate 0.0007 Epoch: 8 Global Step: 183450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:49,866-Speed 6311.23 samples/sec Loss 6.8192 LearningRate 0.0007 Epoch: 8 Global Step: 183460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:53,113-Speed 6308.85 samples/sec Loss 6.7503 LearningRate 0.0007 Epoch: 8 Global Step: 183470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:56,368-Speed 6292.76 samples/sec Loss 6.7612 LearningRate 0.0007 Epoch: 8 Global Step: 183480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:24:59,611-Speed 6316.87 samples/sec Loss 6.7941 LearningRate 0.0007 Epoch: 8 Global Step: 183490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:25:02,880-Speed 6265.97 samples/sec Loss 6.7026 LearningRate 0.0007 Epoch: 8 Global Step: 183500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:06,127-Speed 6308.09 samples/sec Loss 6.7407 LearningRate 0.0007 Epoch: 8 Global Step: 183510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:09,376-Speed 6306.23 samples/sec Loss 6.7933 LearningRate 0.0007 Epoch: 8 Global Step: 183520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:12,609-Speed 6335.26 samples/sec Loss 6.8405 LearningRate 0.0007 Epoch: 8 Global Step: 183530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:15,859-Speed 6304.22 samples/sec Loss 6.8254 LearningRate 0.0007 Epoch: 8 Global Step: 183540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:19,108-Speed 6304.32 samples/sec Loss 6.7664 LearningRate 0.0007 Epoch: 8 Global Step: 183550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:22,351-Speed 6315.96 samples/sec Loss 6.7955 LearningRate 0.0007 Epoch: 8 Global Step: 183560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:25,599-Speed 6308.39 samples/sec Loss 6.7472 LearningRate 0.0007 Epoch: 8 Global Step: 183570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:28,845-Speed 6309.80 samples/sec Loss 6.6800 LearningRate 0.0007 Epoch: 8 Global Step: 183580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:32,090-Speed 6312.80 samples/sec Loss 6.6896 LearningRate 0.0007 Epoch: 8 Global Step: 183590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:35,368-Speed 6249.08 samples/sec Loss 6.8045 LearningRate 0.0007 Epoch: 8 Global Step: 183600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:38,631-Speed 6278.18 samples/sec Loss 6.8011 LearningRate 0.0007 Epoch: 8 Global Step: 183610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:41,879-Speed 6305.28 samples/sec Loss 
6.8250 LearningRate 0.0007 Epoch: 8 Global Step: 183620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:45,127-Speed 6307.69 samples/sec Loss 6.7528 LearningRate 0.0007 Epoch: 8 Global Step: 183630 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:25:48,358-Speed 6339.60 samples/sec Loss 6.7515 LearningRate 0.0007 Epoch: 8 Global Step: 183640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:51,609-Speed 6300.76 samples/sec Loss 6.7836 LearningRate 0.0007 Epoch: 8 Global Step: 183650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:54,857-Speed 6307.69 samples/sec Loss 6.8381 LearningRate 0.0007 Epoch: 8 Global Step: 183660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:25:58,108-Speed 6300.33 samples/sec Loss 6.6758 LearningRate 0.0007 Epoch: 8 Global Step: 183670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:01,376-Speed 6268.07 samples/sec Loss 6.8217 LearningRate 0.0007 Epoch: 8 Global Step: 183680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:04,628-Speed 6300.49 samples/sec Loss 6.7688 LearningRate 0.0007 Epoch: 8 Global Step: 183690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:07,876-Speed 6305.79 samples/sec Loss 6.7011 LearningRate 0.0007 Epoch: 8 Global Step: 183700 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:11,124-Speed 6307.16 samples/sec Loss 6.6726 LearningRate 0.0007 Epoch: 8 Global Step: 183710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:14,368-Speed 6314.67 samples/sec Loss 6.8576 LearningRate 0.0007 Epoch: 8 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:17,616-Speed 6306.87 samples/sec Loss 6.7989 LearningRate 0.0007 Epoch: 8 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:20,862-Speed 6311.36 samples/sec Loss 6.7752 LearningRate 0.0007 Epoch: 8 Global 
Step: 183740 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:26:24,101-Speed 6325.74 samples/sec Loss 6.7918 LearningRate 0.0007 Epoch: 8 Global Step: 183750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:27,346-Speed 6314.96 samples/sec Loss 6.7267 LearningRate 0.0007 Epoch: 8 Global Step: 183760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:30,597-Speed 6300.71 samples/sec Loss 6.7106 LearningRate 0.0007 Epoch: 8 Global Step: 183770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:33,846-Speed 6305.51 samples/sec Loss 6.7117 LearningRate 0.0007 Epoch: 8 Global Step: 183780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:37,096-Speed 6302.94 samples/sec Loss 6.7535 LearningRate 0.0007 Epoch: 8 Global Step: 183790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:40,345-Speed 6304.47 samples/sec Loss 6.7463 LearningRate 0.0007 Epoch: 8 Global Step: 183800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:43,592-Speed 6309.54 samples/sec Loss 6.7969 LearningRate 0.0007 Epoch: 8 Global Step: 183810 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:46,836-Speed 6313.16 samples/sec Loss 6.7583 LearningRate 0.0007 Epoch: 8 Global Step: 183820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:50,085-Speed 6305.61 samples/sec Loss 6.8113 LearningRate 0.0007 Epoch: 8 Global Step: 183830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:53,331-Speed 6311.40 samples/sec Loss 6.8178 LearningRate 0.0007 Epoch: 8 Global Step: 183840 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:26:56,582-Speed 6299.98 samples/sec Loss 6.7353 LearningRate 0.0007 Epoch: 8 Global Step: 183850 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:26:59,820-Speed 6327.80 samples/sec Loss 6.7770 LearningRate 0.0007 Epoch: 8 Global Step: 183860 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:27:03,065-Speed 6311.63 samples/sec Loss 6.7268 LearningRate 0.0007 Epoch: 8 Global Step: 183870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:06,323-Speed 6288.00 samples/sec Loss 6.7191 LearningRate 0.0007 Epoch: 8 Global Step: 183880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:09,581-Speed 6287.27 samples/sec Loss 6.7622 LearningRate 0.0007 Epoch: 8 Global Step: 183890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:12,837-Speed 6290.95 samples/sec Loss 6.8037 LearningRate 0.0007 Epoch: 8 Global Step: 183900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:16,084-Speed 6307.80 samples/sec Loss 6.7278 LearningRate 0.0007 Epoch: 8 Global Step: 183910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:19,332-Speed 6306.72 samples/sec Loss 6.8074 LearningRate 0.0007 Epoch: 8 Global Step: 183920 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:22,576-Speed 6316.13 samples/sec Loss 6.7866 LearningRate 0.0007 Epoch: 8 Global Step: 183930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:25,835-Speed 6284.79 samples/sec Loss 6.7795 LearningRate 0.0007 Epoch: 8 Global Step: 183940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:29,082-Speed 6309.66 samples/sec Loss 6.8098 LearningRate 0.0007 Epoch: 8 Global Step: 183950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:32,328-Speed 6311.97 samples/sec Loss 6.7012 LearningRate 0.0007 Epoch: 8 Global Step: 183960 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:27:35,577-Speed 6303.69 samples/sec Loss 6.7336 LearningRate 0.0007 Epoch: 8 Global Step: 183970 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:27:38,826-Speed 6305.15 samples/sec Loss 6.7523 LearningRate 0.0007 Epoch: 8 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
08:27:42,056-Speed 6341.06 samples/sec Loss 6.8300 LearningRate 0.0007 Epoch: 8 Global Step: 183990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:45,301-Speed 6313.61 samples/sec Loss 6.7815 LearningRate 0.0007 Epoch: 8 Global Step: 184000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:48,549-Speed 6307.41 samples/sec Loss 6.7532 LearningRate 0.0007 Epoch: 8 Global Step: 184010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:51,801-Speed 6297.41 samples/sec Loss 6.7641 LearningRate 0.0007 Epoch: 8 Global Step: 184020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:55,052-Speed 6302.77 samples/sec Loss 6.7886 LearningRate 0.0007 Epoch: 8 Global Step: 184030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:27:58,301-Speed 6303.70 samples/sec Loss 6.7643 LearningRate 0.0007 Epoch: 8 Global Step: 184040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:01,544-Speed 6317.03 samples/sec Loss 6.7709 LearningRate 0.0007 Epoch: 8 Global Step: 184050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:04,791-Speed 6308.81 samples/sec Loss 6.7856 LearningRate 0.0007 Epoch: 8 Global Step: 184060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:08,042-Speed 6300.82 samples/sec Loss 6.8245 LearningRate 0.0007 Epoch: 8 Global Step: 184070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:11,287-Speed 6313.24 samples/sec Loss 6.8025 LearningRate 0.0007 Epoch: 8 Global Step: 184080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:14,535-Speed 6305.34 samples/sec Loss 6.8262 LearningRate 0.0007 Epoch: 8 Global Step: 184090 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:28:17,768-Speed 6336.36 samples/sec Loss 6.7675 LearningRate 0.0007 Epoch: 8 Global Step: 184100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:21,015-Speed 6309.78 samples/sec Loss 
6.7258 LearningRate 0.0007 Epoch: 8 Global Step: 184110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:24,262-Speed 6307.56 samples/sec Loss 6.6489 LearningRate 0.0007 Epoch: 8 Global Step: 184120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:27,508-Speed 6311.99 samples/sec Loss 6.7581 LearningRate 0.0007 Epoch: 8 Global Step: 184130 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:30,755-Speed 6308.07 samples/sec Loss 6.7270 LearningRate 0.0007 Epoch: 8 Global Step: 184140 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:34,000-Speed 6313.76 samples/sec Loss 6.6884 LearningRate 0.0007 Epoch: 8 Global Step: 184150 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:37,260-Speed 6284.25 samples/sec Loss 6.8701 LearningRate 0.0007 Epoch: 8 Global Step: 184160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:40,507-Speed 6307.75 samples/sec Loss 6.7788 LearningRate 0.0007 Epoch: 8 Global Step: 184170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:43,757-Speed 6302.43 samples/sec Loss 6.7354 LearningRate 0.0007 Epoch: 8 Global Step: 184180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:47,009-Speed 6299.90 samples/sec Loss 6.7096 LearningRate 0.0007 Epoch: 8 Global Step: 184190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:50,258-Speed 6304.27 samples/sec Loss 6.7753 LearningRate 0.0007 Epoch: 8 Global Step: 184200 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:28:53,489-Speed 6341.32 samples/sec Loss 6.8102 LearningRate 0.0007 Epoch: 8 Global Step: 184210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:56,736-Speed 6308.35 samples/sec Loss 6.8412 LearningRate 0.0007 Epoch: 8 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:28:59,985-Speed 6305.07 samples/sec Loss 6.7330 LearningRate 0.0007 Epoch: 8 Global 
Step: 184230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:03,230-Speed 6312.04 samples/sec Loss 6.7691 LearningRate 0.0007 Epoch: 8 Global Step: 184240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:06,477-Speed 6309.57 samples/sec Loss 6.8388 LearningRate 0.0007 Epoch: 8 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:09,725-Speed 6306.56 samples/sec Loss 6.8668 LearningRate 0.0007 Epoch: 8 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:12,972-Speed 6308.36 samples/sec Loss 6.7700 LearningRate 0.0007 Epoch: 8 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:16,231-Speed 6285.36 samples/sec Loss 6.7644 LearningRate 0.0007 Epoch: 8 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:19,485-Speed 6295.81 samples/sec Loss 6.7576 LearningRate 0.0007 Epoch: 8 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:22,731-Speed 6310.43 samples/sec Loss 6.8250 LearningRate 0.0007 Epoch: 8 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:25,977-Speed 6311.09 samples/sec Loss 6.7772 LearningRate 0.0007 Epoch: 8 Global Step: 184310 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:29:29,221-Speed 6314.19 samples/sec Loss 6.7830 LearningRate 0.0007 Epoch: 8 Global Step: 184320 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:29:32,455-Speed 6333.44 samples/sec Loss 6.7936 LearningRate 0.0007 Epoch: 8 Global Step: 184330 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:35,700-Speed 6314.28 samples/sec Loss 6.7209 LearningRate 0.0007 Epoch: 8 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:38,944-Speed 6313.40 samples/sec Loss 6.8344 LearningRate 0.0007 Epoch: 8 Global Step: 184350 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:29:42,190-Speed 6310.91 samples/sec Loss 6.8483 LearningRate 0.0007 Epoch: 8 Global Step: 184360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:45,434-Speed 6315.87 samples/sec Loss 6.8117 LearningRate 0.0007 Epoch: 8 Global Step: 184370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:48,680-Speed 6310.43 samples/sec Loss 6.8031 LearningRate 0.0007 Epoch: 8 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:51,923-Speed 6315.66 samples/sec Loss 6.7711 LearningRate 0.0007 Epoch: 8 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:55,174-Speed 6301.47 samples/sec Loss 6.8195 LearningRate 0.0007 Epoch: 8 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:29:58,420-Speed 6311.13 samples/sec Loss 6.7445 LearningRate 0.0007 Epoch: 8 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:01,670-Speed 6302.75 samples/sec Loss 6.7427 LearningRate 0.0007 Epoch: 8 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:04,916-Speed 6309.61 samples/sec Loss 6.7130 LearningRate 0.0007 Epoch: 8 Global Step: 184430 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:30:08,169-Speed 6298.28 samples/sec Loss 6.7825 LearningRate 0.0007 Epoch: 8 Global Step: 184440 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:30:11,432-Speed 6277.09 samples/sec Loss 6.7165 LearningRate 0.0007 Epoch: 8 Global Step: 184450 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:30:14,678-Speed 6311.08 samples/sec Loss 6.7575 LearningRate 0.0007 Epoch: 8 Global Step: 184460 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:30:17,928-Speed 6302.43 samples/sec Loss 6.7464 LearningRate 0.0007 Epoch: 8 Global Step: 184470 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 
08:30:21,176-Speed 6306.79 samples/sec Loss 6.8119 LearningRate 0.0007 Epoch: 8 Global Step: 184480 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:30:24,409-Speed 6336.63 samples/sec Loss 6.7397 LearningRate 0.0007 Epoch: 8 Global Step: 184490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:27,664-Speed 6292.47 samples/sec Loss 6.7815 LearningRate 0.0007 Epoch: 8 Global Step: 184500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:30,913-Speed 6304.92 samples/sec Loss 6.7827 LearningRate 0.0007 Epoch: 8 Global Step: 184510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:34,159-Speed 6310.60 samples/sec Loss 6.7805 LearningRate 0.0007 Epoch: 8 Global Step: 184520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:37,405-Speed 6311.58 samples/sec Loss 6.7663 LearningRate 0.0007 Epoch: 8 Global Step: 184530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:40,652-Speed 6309.38 samples/sec Loss 6.6717 LearningRate 0.0007 Epoch: 8 Global Step: 184540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:43,900-Speed 6307.27 samples/sec Loss 6.8465 LearningRate 0.0007 Epoch: 8 Global Step: 184550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:47,149-Speed 6304.94 samples/sec Loss 6.7314 LearningRate 0.0007 Epoch: 8 Global Step: 184560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:50,401-Speed 6299.00 samples/sec Loss 6.7752 LearningRate 0.0007 Epoch: 8 Global Step: 184570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:53,649-Speed 6307.19 samples/sec Loss 6.7255 LearningRate 0.0007 Epoch: 8 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:30:56,898-Speed 6303.92 samples/sec Loss 6.7054 LearningRate 0.0007 Epoch: 8 Global Step: 184590 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:31:00,133-Speed 6333.24 samples/sec Loss 
6.8169 LearningRate 0.0007 Epoch: 8 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:03,385-Speed 6298.58 samples/sec Loss 6.7175 LearningRate 0.0007 Epoch: 8 Global Step: 184610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:06,634-Speed 6304.84 samples/sec Loss 6.6799 LearningRate 0.0007 Epoch: 8 Global Step: 184620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:09,880-Speed 6310.90 samples/sec Loss 6.7886 LearningRate 0.0007 Epoch: 8 Global Step: 184630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:13,127-Speed 6309.52 samples/sec Loss 6.7244 LearningRate 0.0007 Epoch: 8 Global Step: 184640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:16,378-Speed 6300.76 samples/sec Loss 6.8374 LearningRate 0.0007 Epoch: 8 Global Step: 184650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:19,628-Speed 6302.20 samples/sec Loss 6.7948 LearningRate 0.0007 Epoch: 8 Global Step: 184660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:22,872-Speed 6313.92 samples/sec Loss 6.7552 LearningRate 0.0007 Epoch: 8 Global Step: 184670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:26,123-Speed 6301.99 samples/sec Loss 6.7735 LearningRate 0.0007 Epoch: 8 Global Step: 184680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:29,377-Speed 6295.51 samples/sec Loss 6.7520 LearningRate 0.0007 Epoch: 8 Global Step: 184690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:32,626-Speed 6305.34 samples/sec Loss 6.7986 LearningRate 0.0007 Epoch: 8 Global Step: 184700 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:31:35,858-Speed 6338.34 samples/sec Loss 6.8091 LearningRate 0.0007 Epoch: 8 Global Step: 184710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:39,105-Speed 6308.19 samples/sec Loss 6.8129 LearningRate 0.0007 Epoch: 8 Global 
Step: 184720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:42,354-Speed 6304.47 samples/sec Loss 6.6666 LearningRate 0.0007 Epoch: 8 Global Step: 184730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:45,601-Speed 6309.47 samples/sec Loss 6.7340 LearningRate 0.0007 Epoch: 8 Global Step: 184740 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:48,844-Speed 6316.00 samples/sec Loss 6.7432 LearningRate 0.0007 Epoch: 8 Global Step: 184750 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:52,091-Speed 6308.93 samples/sec Loss 6.7018 LearningRate 0.0007 Epoch: 8 Global Step: 184760 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:55,350-Speed 6286.43 samples/sec Loss 6.7267 LearningRate 0.0007 Epoch: 8 Global Step: 184770 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:31:58,598-Speed 6306.83 samples/sec Loss 6.7763 LearningRate 0.0007 Epoch: 8 Global Step: 184780 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:01,848-Speed 6302.51 samples/sec Loss 6.8580 LearningRate 0.0007 Epoch: 8 Global Step: 184790 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:05,094-Speed 6310.94 samples/sec Loss 6.8177 LearningRate 0.0007 Epoch: 8 Global Step: 184800 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:08,341-Speed 6308.78 samples/sec Loss 6.7616 LearningRate 0.0007 Epoch: 8 Global Step: 184810 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:32:11,577-Speed 6330.11 samples/sec Loss 6.7727 LearningRate 0.0007 Epoch: 8 Global Step: 184820 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:14,827-Speed 6302.89 samples/sec Loss 6.7702 LearningRate 0.0007 Epoch: 8 Global Step: 184830 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:18,076-Speed 6305.48 samples/sec Loss 6.7679 LearningRate 0.0007 Epoch: 8 Global Step: 184840 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:32:21,325-Speed 6304.75 samples/sec Loss 6.7434 LearningRate 0.0007 Epoch: 8 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:24,573-Speed 6306.14 samples/sec Loss 6.7032 LearningRate 0.0007 Epoch: 8 Global Step: 184860 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:27,826-Speed 6296.73 samples/sec Loss 6.7710 LearningRate 0.0007 Epoch: 8 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:31,077-Speed 6301.97 samples/sec Loss 6.7793 LearningRate 0.0007 Epoch: 8 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:34,323-Speed 6311.21 samples/sec Loss 6.8294 LearningRate 0.0007 Epoch: 8 Global Step: 184890 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:37,572-Speed 6303.92 samples/sec Loss 6.7704 LearningRate 0.0007 Epoch: 8 Global Step: 184900 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:40,831-Speed 6286.28 samples/sec Loss 6.8133 LearningRate 0.0007 Epoch: 8 Global Step: 184910 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:44,079-Speed 6306.30 samples/sec Loss 6.7648 LearningRate 0.0007 Epoch: 8 Global Step: 184920 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:32:47,311-Speed 6337.52 samples/sec Loss 6.7025 LearningRate 0.0007 Epoch: 8 Global Step: 184930 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:50,563-Speed 6298.94 samples/sec Loss 6.7216 LearningRate 0.0007 Epoch: 8 Global Step: 184940 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:53,814-Speed 6301.57 samples/sec Loss 6.6984 LearningRate 0.0007 Epoch: 8 Global Step: 184950 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:32:57,063-Speed 6304.97 samples/sec Loss 6.7436 LearningRate 0.0007 Epoch: 8 Global Step: 184960 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:33:00,315-Speed 6300.16 samples/sec Loss 6.6731 LearningRate 0.0007 Epoch: 8 Global Step: 184970 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:03,566-Speed 6301.28 samples/sec Loss 6.7186 LearningRate 0.0007 Epoch: 8 Global Step: 184980 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:06,816-Speed 6302.93 samples/sec Loss 6.7356 LearningRate 0.0007 Epoch: 8 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:10,065-Speed 6304.09 samples/sec Loss 6.7365 LearningRate 0.0007 Epoch: 8 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:13,312-Speed 6309.47 samples/sec Loss 6.7781 LearningRate 0.0007 Epoch: 8 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:16,563-Speed 6300.76 samples/sec Loss 6.7375 LearningRate 0.0007 Epoch: 8 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:19,797-Speed 6334.70 samples/sec Loss 6.7607 LearningRate 0.0007 Epoch: 8 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:23,040-Speed 6315.08 samples/sec Loss 6.7491 LearningRate 0.0007 Epoch: 8 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:26,286-Speed 6311.53 samples/sec Loss 6.8028 LearningRate 0.0007 Epoch: 8 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:29,531-Speed 6311.68 samples/sec Loss 6.7115 LearningRate 0.0007 Epoch: 8 Global Step: 185060 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:32,779-Speed 6306.69 samples/sec Loss 6.6656 LearningRate 0.0007 Epoch: 8 Global Step: 185070 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:36,029-Speed 6305.89 samples/sec Loss 6.8084 LearningRate 0.0007 Epoch: 8 Global Step: 185080 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:39,277-Speed 6308.06 samples/sec Loss 
6.8096 LearningRate 0.0007 Epoch: 8 Global Step: 185090 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:42,524-Speed 6308.56 samples/sec Loss 6.8056 LearningRate 0.0007 Epoch: 8 Global Step: 185100 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:45,766-Speed 6317.07 samples/sec Loss 6.8282 LearningRate 0.0007 Epoch: 8 Global Step: 185110 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:49,015-Speed 6306.02 samples/sec Loss 6.7931 LearningRate 0.0007 Epoch: 8 Global Step: 185120 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:33:52,260-Speed 6312.56 samples/sec Loss 6.7854 LearningRate 0.0007 Epoch: 8 Global Step: 185130 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:33:55,510-Speed 6303.01 samples/sec Loss 6.7239 LearningRate 0.0007 Epoch: 8 Global Step: 185140 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:33:58,760-Speed 6303.44 samples/sec Loss 6.7749 LearningRate 0.0007 Epoch: 8 Global Step: 185150 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:34:01,991-Speed 6340.73 samples/sec Loss 6.7517 LearningRate 0.0007 Epoch: 8 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:05,246-Speed 6292.96 samples/sec Loss 6.8822 LearningRate 0.0007 Epoch: 8 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:08,489-Speed 6317.11 samples/sec Loss 6.7959 LearningRate 0.0007 Epoch: 8 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:11,738-Speed 6304.57 samples/sec Loss 6.7750 LearningRate 0.0007 Epoch: 8 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:14,984-Speed 6309.60 samples/sec Loss 6.7540 LearningRate 0.0007 Epoch: 8 Global Step: 185200 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:18,241-Speed 6290.24 samples/sec Loss 6.7699 LearningRate 0.0007 Epoch: 8 Global 
Step: 185210 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:21,491-Speed 6303.17 samples/sec Loss 6.6995 LearningRate 0.0007 Epoch: 8 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:24,740-Speed 6307.28 samples/sec Loss 6.7847 LearningRate 0.0007 Epoch: 8 Global Step: 185230 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:27,985-Speed 6314.31 samples/sec Loss 6.7906 LearningRate 0.0007 Epoch: 8 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:31,230-Speed 6312.09 samples/sec Loss 6.7755 LearningRate 0.0007 Epoch: 8 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:34,473-Speed 6316.37 samples/sec Loss 6.7647 LearningRate 0.0007 Epoch: 8 Global Step: 185260 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:34:37,723-Speed 6303.07 samples/sec Loss 6.7671 LearningRate 0.0007 Epoch: 8 Global Step: 185270 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:34:40,972-Speed 6304.50 samples/sec Loss 6.7387 LearningRate 0.0007 Epoch: 8 Global Step: 185280 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:34:44,209-Speed 6328.60 samples/sec Loss 6.7683 LearningRate 0.0007 Epoch: 8 Global Step: 185290 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:47,452-Speed 6315.95 samples/sec Loss 6.7700 LearningRate 0.0007 Epoch: 8 Global Step: 185300 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:50,700-Speed 6307.47 samples/sec Loss 6.7301 LearningRate 0.0007 Epoch: 8 Global Step: 185310 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:53,952-Speed 6299.83 samples/sec Loss 6.7076 LearningRate 0.0007 Epoch: 8 Global Step: 185320 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:34:57,199-Speed 6308.42 samples/sec Loss 6.8083 LearningRate 0.0007 Epoch: 8 Global Step: 185330 Fp16 Grad Scale: 32768 
Required: 59 hours Training: 2022-04-01 08:35:00,450-Speed 6300.74 samples/sec Loss 6.7615 LearningRate 0.0007 Epoch: 8 Global Step: 185340 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:03,694-Speed 6314.29 samples/sec Loss 6.7174 LearningRate 0.0007 Epoch: 8 Global Step: 185350 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:06,941-Speed 6307.92 samples/sec Loss 6.7252 LearningRate 0.0007 Epoch: 8 Global Step: 185360 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:10,196-Speed 6294.56 samples/sec Loss 6.7985 LearningRate 0.0007 Epoch: 8 Global Step: 185370 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:13,441-Speed 6313.44 samples/sec Loss 6.8243 LearningRate 0.0007 Epoch: 8 Global Step: 185380 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:16,687-Speed 6311.05 samples/sec Loss 6.7301 LearningRate 0.0007 Epoch: 8 Global Step: 185390 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:35:19,917-Speed 6341.84 samples/sec Loss 6.7001 LearningRate 0.0007 Epoch: 8 Global Step: 185400 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:23,166-Speed 6304.38 samples/sec Loss 6.7625 LearningRate 0.0007 Epoch: 8 Global Step: 185410 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:26,410-Speed 6314.47 samples/sec Loss 6.8552 LearningRate 0.0007 Epoch: 8 Global Step: 185420 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:29,659-Speed 6305.13 samples/sec Loss 6.8245 LearningRate 0.0007 Epoch: 8 Global Step: 185430 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:32,903-Speed 6313.52 samples/sec Loss 6.7185 LearningRate 0.0007 Epoch: 8 Global Step: 185440 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:36,146-Speed 6317.78 samples/sec Loss 6.7631 LearningRate 0.0007 Epoch: 8 Global Step: 185450 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 
08:35:39,391-Speed 6311.50 samples/sec Loss 6.7496 LearningRate 0.0007 Epoch: 8 Global Step: 185460 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:42,641-Speed 6302.47 samples/sec Loss 6.8188 LearningRate 0.0007 Epoch: 8 Global Step: 185470 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:45,892-Speed 6301.94 samples/sec Loss 6.7284 LearningRate 0.0007 Epoch: 8 Global Step: 185480 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:49,136-Speed 6314.97 samples/sec Loss 6.6950 LearningRate 0.0007 Epoch: 8 Global Step: 185490 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:52,370-Speed 6333.61 samples/sec Loss 6.7607 LearningRate 0.0007 Epoch: 8 Global Step: 185500 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:55,619-Speed 6304.70 samples/sec Loss 6.8622 LearningRate 0.0007 Epoch: 8 Global Step: 185510 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:35:58,867-Speed 6306.87 samples/sec Loss 6.8090 LearningRate 0.0007 Epoch: 8 Global Step: 185520 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:02,123-Speed 6291.12 samples/sec Loss 6.7453 LearningRate 0.0007 Epoch: 8 Global Step: 185530 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:05,373-Speed 6302.17 samples/sec Loss 6.7471 LearningRate 0.0007 Epoch: 8 Global Step: 185540 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:08,634-Speed 6283.10 samples/sec Loss 6.7726 LearningRate 0.0007 Epoch: 8 Global Step: 185550 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:11,880-Speed 6310.44 samples/sec Loss 6.7922 LearningRate 0.0007 Epoch: 8 Global Step: 185560 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:15,140-Speed 6284.04 samples/sec Loss 6.6256 LearningRate 0.0007 Epoch: 8 Global Step: 185570 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:18,384-Speed 6313.73 samples/sec Loss 
6.7946 LearningRate 0.0007 Epoch: 8 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:21,634-Speed 6303.71 samples/sec Loss 6.7649 LearningRate 0.0007 Epoch: 8 Global Step: 185590 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:24,868-Speed 6333.80 samples/sec Loss 6.7481 LearningRate 0.0007 Epoch: 8 Global Step: 185600 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:28,117-Speed 6305.58 samples/sec Loss 6.7388 LearningRate 0.0007 Epoch: 8 Global Step: 185610 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:31,366-Speed 6303.98 samples/sec Loss 6.8214 LearningRate 0.0007 Epoch: 8 Global Step: 185620 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:34,613-Speed 6309.40 samples/sec Loss 6.7780 LearningRate 0.0007 Epoch: 8 Global Step: 185630 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:37,859-Speed 6310.76 samples/sec Loss 6.8506 LearningRate 0.0007 Epoch: 8 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:41,106-Speed 6309.08 samples/sec Loss 6.7604 LearningRate 0.0007 Epoch: 8 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:44,354-Speed 6307.47 samples/sec Loss 6.6744 LearningRate 0.0007 Epoch: 8 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:47,604-Speed 6302.35 samples/sec Loss 6.8076 LearningRate 0.0007 Epoch: 8 Global Step: 185670 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:50,859-Speed 6292.67 samples/sec Loss 6.7508 LearningRate 0.0007 Epoch: 8 Global Step: 185680 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:54,103-Speed 6314.30 samples/sec Loss 6.7416 LearningRate 0.0007 Epoch: 8 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:36:57,349-Speed 6311.12 samples/sec Loss 6.7337 LearningRate 0.0007 Epoch: 8 Global 
Step: 185700 Fp16 Grad Scale: 65536 Required: 59 hours Training: 2022-04-01 08:37:00,589-Speed 6322.98 samples/sec Loss 6.6891 LearningRate 0.0007 Epoch: 8 Global Step: 185710 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:37:03,843-Speed 6295.27 samples/sec Loss 6.7552 LearningRate 0.0007 Epoch: 8 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:37:07,093-Speed 6302.07 samples/sec Loss 6.7522 LearningRate 0.0007 Epoch: 8 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 59 hours Training: 2022-04-01 08:37:10,337-Speed 6314.01 samples/sec Loss 6.7044 LearningRate 0.0007 Epoch: 8 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:13,586-Speed 6306.86 samples/sec Loss 6.7753 LearningRate 0.0007 Epoch: 8 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:16,829-Speed 6316.41 samples/sec Loss 6.7634 LearningRate 0.0007 Epoch: 8 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:20,074-Speed 6311.73 samples/sec Loss 6.7026 LearningRate 0.0007 Epoch: 8 Global Step: 185770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:23,319-Speed 6312.87 samples/sec Loss 6.7861 LearningRate 0.0007 Epoch: 8 Global Step: 185780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:26,571-Speed 6299.71 samples/sec Loss 6.7565 LearningRate 0.0007 Epoch: 8 Global Step: 185790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:29,818-Speed 6309.44 samples/sec Loss 6.7694 LearningRate 0.0007 Epoch: 8 Global Step: 185800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:37:33,064-Speed 6310.59 samples/sec Loss 6.7453 LearningRate 0.0007 Epoch: 8 Global Step: 185810 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:36,312-Speed 6306.82 samples/sec Loss 6.7442 LearningRate 0.0007 Epoch: 8 Global Step: 185820 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 08:37:39,561-Speed 6304.21 samples/sec Loss 6.8461 LearningRate 0.0007 Epoch: 8 Global Step: 185830 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:42,814-Speed 6298.14 samples/sec Loss 6.7226 LearningRate 0.0007 Epoch: 8 Global Step: 185840 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:46,061-Speed 6308.95 samples/sec Loss 6.8326 LearningRate 0.0007 Epoch: 8 Global Step: 185850 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:49,307-Speed 6309.09 samples/sec Loss 6.6626 LearningRate 0.0007 Epoch: 8 Global Step: 185860 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:52,554-Speed 6310.23 samples/sec Loss 6.7203 LearningRate 0.0007 Epoch: 8 Global Step: 185870 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:55,800-Speed 6310.21 samples/sec Loss 6.7761 LearningRate 0.0007 Epoch: 8 Global Step: 185880 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:37:59,048-Speed 6307.01 samples/sec Loss 6.7429 LearningRate 0.0007 Epoch: 8 Global Step: 185890 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:02,291-Speed 6315.07 samples/sec Loss 6.7083 LearningRate 0.0007 Epoch: 8 Global Step: 185900 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:05,524-Speed 6336.64 samples/sec Loss 6.7172 LearningRate 0.0007 Epoch: 8 Global Step: 185910 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:08,757-Speed 6336.02 samples/sec Loss 6.7089 LearningRate 0.0007 Epoch: 8 Global Step: 185920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:12,003-Speed 6311.81 samples/sec Loss 6.7748 LearningRate 0.0007 Epoch: 8 Global Step: 185930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:15,252-Speed 6304.75 samples/sec Loss 6.7358 LearningRate 0.0007 Epoch: 8 Global Step: 185940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:38:18,499-Speed 6308.68 samples/sec Loss 6.7551 LearningRate 0.0007 Epoch: 8 Global Step: 185950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:21,741-Speed 6318.05 samples/sec Loss 6.7876 LearningRate 0.0007 Epoch: 8 Global Step: 185960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:24,992-Speed 6301.40 samples/sec Loss 6.7544 LearningRate 0.0007 Epoch: 8 Global Step: 185970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:28,240-Speed 6305.77 samples/sec Loss 6.6676 LearningRate 0.0007 Epoch: 8 Global Step: 185980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:31,490-Speed 6303.09 samples/sec Loss 6.7277 LearningRate 0.0007 Epoch: 8 Global Step: 185990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:34,736-Speed 6312.67 samples/sec Loss 6.7237 LearningRate 0.0007 Epoch: 8 Global Step: 186000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:37,987-Speed 6300.91 samples/sec Loss 6.7758 LearningRate 0.0007 Epoch: 8 Global Step: 186010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:41,237-Speed 6302.24 samples/sec Loss 6.7993 LearningRate 0.0007 Epoch: 8 Global Step: 186020 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:44,491-Speed 6294.96 samples/sec Loss 6.6238 LearningRate 0.0007 Epoch: 8 Global Step: 186030 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:47,743-Speed 6300.38 samples/sec Loss 6.7261 LearningRate 0.0007 Epoch: 8 Global Step: 186040 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:38:50,974-Speed 6339.01 samples/sec Loss 6.7332 LearningRate 0.0007 Epoch: 8 Global Step: 186050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:54,223-Speed 6305.53 samples/sec Loss 6.7012 LearningRate 0.0007 Epoch: 8 Global Step: 186060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:38:57,469-Speed 6310.68 samples/sec Loss 
6.7861 LearningRate 0.0007 Epoch: 8 Global Step: 186070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:00,718-Speed 6303.77 samples/sec Loss 6.7533 LearningRate 0.0007 Epoch: 8 Global Step: 186080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:03,966-Speed 6308.27 samples/sec Loss 6.6586 LearningRate 0.0007 Epoch: 8 Global Step: 186090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:07,216-Speed 6302.79 samples/sec Loss 6.8246 LearningRate 0.0007 Epoch: 8 Global Step: 186100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:10,462-Speed 6310.58 samples/sec Loss 6.7694 LearningRate 0.0007 Epoch: 8 Global Step: 186110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:13,711-Speed 6304.64 samples/sec Loss 6.7158 LearningRate 0.0007 Epoch: 8 Global Step: 186120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:16,958-Speed 6307.86 samples/sec Loss 6.7193 LearningRate 0.0007 Epoch: 8 Global Step: 186130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:20,209-Speed 6301.43 samples/sec Loss 6.7377 LearningRate 0.0007 Epoch: 8 Global Step: 186140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:23,458-Speed 6304.31 samples/sec Loss 6.7065 LearningRate 0.0007 Epoch: 8 Global Step: 186150 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:39:26,704-Speed 6310.61 samples/sec Loss 6.7164 LearningRate 0.0007 Epoch: 8 Global Step: 186160 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:39:29,949-Speed 6313.76 samples/sec Loss 6.6571 LearningRate 0.0007 Epoch: 8 Global Step: 186170 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:39:33,183-Speed 6333.24 samples/sec Loss 6.8051 LearningRate 0.0007 Epoch: 8 Global Step: 186180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:36,430-Speed 6308.54 samples/sec Loss 6.6744 LearningRate 0.0007 Epoch: 8 Global 
Step: 186190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:39,672-Speed 6317.94 samples/sec Loss 6.8327 LearningRate 0.0007 Epoch: 8 Global Step: 186200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:42,921-Speed 6306.99 samples/sec Loss 6.7223 LearningRate 0.0007 Epoch: 8 Global Step: 186210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:46,164-Speed 6315.60 samples/sec Loss 6.8193 LearningRate 0.0007 Epoch: 8 Global Step: 186220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:49,411-Speed 6309.33 samples/sec Loss 6.7765 LearningRate 0.0007 Epoch: 8 Global Step: 186230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:52,659-Speed 6308.14 samples/sec Loss 6.7720 LearningRate 0.0007 Epoch: 8 Global Step: 186240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:55,906-Speed 6310.61 samples/sec Loss 6.8265 LearningRate 0.0007 Epoch: 8 Global Step: 186250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:39:59,150-Speed 6313.12 samples/sec Loss 6.7409 LearningRate 0.0007 Epoch: 8 Global Step: 186260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:02,399-Speed 6306.14 samples/sec Loss 6.7931 LearningRate 0.0007 Epoch: 8 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:05,657-Speed 6287.35 samples/sec Loss 6.7753 LearningRate 0.0007 Epoch: 8 Global Step: 186280 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:40:08,902-Speed 6313.05 samples/sec Loss 6.6804 LearningRate 0.0007 Epoch: 8 Global Step: 186290 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:40:12,150-Speed 6305.71 samples/sec Loss 6.7808 LearningRate 0.0007 Epoch: 8 Global Step: 186300 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:40:15,397-Speed 6308.73 samples/sec Loss 6.7245 LearningRate 0.0007 Epoch: 8 Global Step: 186310 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 08:40:18,633-Speed 6330.57 samples/sec Loss 6.7778 LearningRate 0.0007 Epoch: 8 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:21,880-Speed 6308.83 samples/sec Loss 6.7007 LearningRate 0.0007 Epoch: 8 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:25,134-Speed 6294.81 samples/sec Loss 6.7512 LearningRate 0.0007 Epoch: 8 Global Step: 186340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:28,382-Speed 6307.53 samples/sec Loss 6.8175 LearningRate 0.0007 Epoch: 8 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:31,630-Speed 6306.09 samples/sec Loss 6.7261 LearningRate 0.0007 Epoch: 8 Global Step: 186360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:34,876-Speed 6310.37 samples/sec Loss 6.8250 LearningRate 0.0007 Epoch: 8 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:38,120-Speed 6315.48 samples/sec Loss 6.7851 LearningRate 0.0007 Epoch: 8 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:41,375-Speed 6294.12 samples/sec Loss 6.6643 LearningRate 0.0007 Epoch: 8 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:44,620-Speed 6312.06 samples/sec Loss 6.7520 LearningRate 0.0007 Epoch: 8 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:47,875-Speed 6292.83 samples/sec Loss 6.7419 LearningRate 0.0007 Epoch: 8 Global Step: 186410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:40:51,121-Speed 6312.03 samples/sec Loss 6.7480 LearningRate 0.0007 Epoch: 8 Global Step: 186420 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:40:54,367-Speed 6309.92 samples/sec Loss 6.7914 LearningRate 0.0007 Epoch: 8 Global Step: 186430 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 
08:40:57,599-Speed 6337.98 samples/sec Loss 6.7059 LearningRate 0.0007 Epoch: 8 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:00,847-Speed 6307.04 samples/sec Loss 6.7399 LearningRate 0.0007 Epoch: 8 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:04,093-Speed 6310.48 samples/sec Loss 6.7756 LearningRate 0.0007 Epoch: 8 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:07,345-Speed 6299.32 samples/sec Loss 6.7699 LearningRate 0.0007 Epoch: 8 Global Step: 186470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:10,590-Speed 6312.83 samples/sec Loss 6.7743 LearningRate 0.0007 Epoch: 8 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:13,844-Speed 6294.12 samples/sec Loss 6.7149 LearningRate 0.0007 Epoch: 8 Global Step: 186490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:17,097-Speed 6297.14 samples/sec Loss 6.8184 LearningRate 0.0007 Epoch: 8 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:20,343-Speed 6311.69 samples/sec Loss 6.8155 LearningRate 0.0007 Epoch: 8 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:23,592-Speed 6304.13 samples/sec Loss 6.8151 LearningRate 0.0007 Epoch: 8 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:26,834-Speed 6319.06 samples/sec Loss 6.6789 LearningRate 0.0007 Epoch: 8 Global Step: 186530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:30,078-Speed 6313.88 samples/sec Loss 6.7395 LearningRate 0.0007 Epoch: 8 Global Step: 186540 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:41:33,308-Speed 6341.60 samples/sec Loss 6.8178 LearningRate 0.0007 Epoch: 8 Global Step: 186550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:36,568-Speed 6284.80 samples/sec Loss 
6.8267 LearningRate 0.0007 Epoch: 8 Global Step: 186560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:39,815-Speed 6309.46 samples/sec Loss 6.7130 LearningRate 0.0007 Epoch: 8 Global Step: 186570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:43,061-Speed 6308.94 samples/sec Loss 6.7513 LearningRate 0.0007 Epoch: 8 Global Step: 186580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:46,306-Speed 6312.32 samples/sec Loss 6.6888 LearningRate 0.0007 Epoch: 8 Global Step: 186590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:49,560-Speed 6296.64 samples/sec Loss 6.7599 LearningRate 0.0007 Epoch: 8 Global Step: 186600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:52,804-Speed 6314.12 samples/sec Loss 6.7839 LearningRate 0.0007 Epoch: 8 Global Step: 186610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:56,051-Speed 6309.28 samples/sec Loss 6.7534 LearningRate 0.0007 Epoch: 8 Global Step: 186620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:41:59,298-Speed 6308.68 samples/sec Loss 6.8034 LearningRate 0.0007 Epoch: 8 Global Step: 186630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:42:02,549-Speed 6301.20 samples/sec Loss 6.7331 LearningRate 0.0007 Epoch: 8 Global Step: 186640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:42:05,799-Speed 6302.76 samples/sec Loss 6.8133 LearningRate 0.0007 Epoch: 8 Global Step: 186650 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:43:07,155-Speed 333.80 samples/sec Loss 6.7902 LearningRate 0.0007 Epoch: 9 Global Step: 186660 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:43:10,388-Speed 6336.70 samples/sec Loss 6.7529 LearningRate 0.0007 Epoch: 9 Global Step: 186670 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:43:13,623-Speed 6331.32 samples/sec Loss 6.7307 LearningRate 0.0007 Epoch: 9 Global 
Step: 186680 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:43:16,845-Speed 6358.46 samples/sec Loss 6.7843 LearningRate 0.0007 Epoch: 9 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:20,085-Speed 6322.93 samples/sec Loss 6.7943 LearningRate 0.0007 Epoch: 9 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:23,318-Speed 6335.72 samples/sec Loss 6.8406 LearningRate 0.0007 Epoch: 9 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:26,560-Speed 6317.60 samples/sec Loss 6.6843 LearningRate 0.0007 Epoch: 9 Global Step: 186720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:29,792-Speed 6337.85 samples/sec Loss 6.7756 LearningRate 0.0007 Epoch: 9 Global Step: 186730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:33,029-Speed 6328.75 samples/sec Loss 6.7177 LearningRate 0.0007 Epoch: 9 Global Step: 186740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:36,270-Speed 6321.10 samples/sec Loss 6.6981 LearningRate 0.0007 Epoch: 9 Global Step: 186750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:39,507-Speed 6327.53 samples/sec Loss 6.7486 LearningRate 0.0007 Epoch: 9 Global Step: 186760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:42,745-Speed 6325.94 samples/sec Loss 6.6910 LearningRate 0.0007 Epoch: 9 Global Step: 186770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:45,985-Speed 6321.71 samples/sec Loss 6.6905 LearningRate 0.0007 Epoch: 9 Global Step: 186780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:49,232-Speed 6309.60 samples/sec Loss 6.7992 LearningRate 0.0007 Epoch: 9 Global Step: 186790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:52,471-Speed 6324.48 samples/sec Loss 6.7122 LearningRate 0.0007 Epoch: 9 Global Step: 186800 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 08:43:55,713-Speed 6318.64 samples/sec Loss 6.7088 LearningRate 0.0007 Epoch: 9 Global Step: 186810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:43:58,950-Speed 6328.35 samples/sec Loss 6.7342 LearningRate 0.0007 Epoch: 9 Global Step: 186820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:02,192-Speed 6318.12 samples/sec Loss 6.6991 LearningRate 0.0007 Epoch: 9 Global Step: 186830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:05,434-Speed 6319.22 samples/sec Loss 6.8066 LearningRate 0.0007 Epoch: 9 Global Step: 186840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:08,677-Speed 6316.35 samples/sec Loss 6.7466 LearningRate 0.0007 Epoch: 9 Global Step: 186850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:11,916-Speed 6324.00 samples/sec Loss 6.6938 LearningRate 0.0007 Epoch: 9 Global Step: 186860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:15,160-Speed 6315.89 samples/sec Loss 6.7331 LearningRate 0.0007 Epoch: 9 Global Step: 186870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:18,403-Speed 6316.54 samples/sec Loss 6.7105 LearningRate 0.0007 Epoch: 9 Global Step: 186880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:21,642-Speed 6324.81 samples/sec Loss 6.7042 LearningRate 0.0007 Epoch: 9 Global Step: 186890 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:44:24,868-Speed 6349.69 samples/sec Loss 6.7033 LearningRate 0.0007 Epoch: 9 Global Step: 186900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:28,111-Speed 6316.81 samples/sec Loss 6.6427 LearningRate 0.0007 Epoch: 9 Global Step: 186910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:31,351-Speed 6321.77 samples/sec Loss 6.7754 LearningRate 0.0007 Epoch: 9 Global Step: 186920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:44:34,587-Speed 6330.46 samples/sec Loss 6.7129 LearningRate 0.0007 Epoch: 9 Global Step: 186930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:37,828-Speed 6319.98 samples/sec Loss 6.7748 LearningRate 0.0007 Epoch: 9 Global Step: 186940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:41,069-Speed 6319.38 samples/sec Loss 6.7236 LearningRate 0.0007 Epoch: 9 Global Step: 186950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:44,308-Speed 6324.73 samples/sec Loss 6.6765 LearningRate 0.0007 Epoch: 9 Global Step: 186960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:47,547-Speed 6325.48 samples/sec Loss 6.7483 LearningRate 0.0007 Epoch: 9 Global Step: 186970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:50,800-Speed 6296.75 samples/sec Loss 6.7595 LearningRate 0.0007 Epoch: 9 Global Step: 186980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:54,041-Speed 6321.19 samples/sec Loss 6.6838 LearningRate 0.0007 Epoch: 9 Global Step: 186990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:44:57,277-Speed 6329.86 samples/sec Loss 6.7133 LearningRate 0.0007 Epoch: 9 Global Step: 187000 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:45:00,529-Speed 6298.93 samples/sec Loss 6.7094 LearningRate 0.0007 Epoch: 9 Global Step: 187010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:03,765-Speed 6330.18 samples/sec Loss 6.7247 LearningRate 0.0007 Epoch: 9 Global Step: 187020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:07,006-Speed 6320.32 samples/sec Loss 6.7362 LearningRate 0.0007 Epoch: 9 Global Step: 187030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:10,247-Speed 6320.64 samples/sec Loss 6.7648 LearningRate 0.0007 Epoch: 9 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:13,486-Speed 6324.08 samples/sec Loss 
6.6710 LearningRate 0.0007 Epoch: 9 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:16,731-Speed 6314.05 samples/sec Loss 6.7972 LearningRate 0.0007 Epoch: 9 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:19,969-Speed 6325.84 samples/sec Loss 6.8337 LearningRate 0.0007 Epoch: 9 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:23,206-Speed 6328.27 samples/sec Loss 6.6466 LearningRate 0.0007 Epoch: 9 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:26,444-Speed 6326.53 samples/sec Loss 6.7648 LearningRate 0.0007 Epoch: 9 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:29,680-Speed 6329.27 samples/sec Loss 6.8335 LearningRate 0.0007 Epoch: 9 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:32,918-Speed 6326.52 samples/sec Loss 6.6953 LearningRate 0.0007 Epoch: 9 Global Step: 187110 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:45:36,157-Speed 6324.11 samples/sec Loss 6.7392 LearningRate 0.0007 Epoch: 9 Global Step: 187120 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:45:39,394-Speed 6328.64 samples/sec Loss 6.8125 LearningRate 0.0007 Epoch: 9 Global Step: 187130 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:45:42,618-Speed 6353.69 samples/sec Loss 6.7443 LearningRate 0.0007 Epoch: 9 Global Step: 187140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:45,856-Speed 6326.89 samples/sec Loss 6.7556 LearningRate 0.0007 Epoch: 9 Global Step: 187150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:49,096-Speed 6321.87 samples/sec Loss 6.8113 LearningRate 0.0007 Epoch: 9 Global Step: 187160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:52,344-Speed 6307.97 samples/sec Loss 6.8707 LearningRate 0.0007 Epoch: 9 Global 
Step: 187170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:55,582-Speed 6325.65 samples/sec Loss 6.6912 LearningRate 0.0007 Epoch: 9 Global Step: 187180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:45:58,819-Speed 6327.09 samples/sec Loss 6.7772 LearningRate 0.0007 Epoch: 9 Global Step: 187190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:02,058-Speed 6324.90 samples/sec Loss 6.6911 LearningRate 0.0007 Epoch: 9 Global Step: 187200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:05,300-Speed 6319.48 samples/sec Loss 6.7872 LearningRate 0.0007 Epoch: 9 Global Step: 187210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:08,537-Speed 6327.64 samples/sec Loss 6.7192 LearningRate 0.0007 Epoch: 9 Global Step: 187220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:11,777-Speed 6322.87 samples/sec Loss 6.7929 LearningRate 0.0007 Epoch: 9 Global Step: 187230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:15,015-Speed 6325.70 samples/sec Loss 6.7556 LearningRate 0.0007 Epoch: 9 Global Step: 187240 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:46:18,256-Speed 6320.93 samples/sec Loss 6.7275 LearningRate 0.0007 Epoch: 9 Global Step: 187250 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:46:21,493-Speed 6327.94 samples/sec Loss 6.7912 LearningRate 0.0007 Epoch: 9 Global Step: 187260 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:46:24,734-Speed 6321.60 samples/sec Loss 6.6589 LearningRate 0.0007 Epoch: 9 Global Step: 187270 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:46:27,974-Speed 6321.90 samples/sec Loss 6.7156 LearningRate 0.0007 Epoch: 9 Global Step: 187280 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:46:31,212-Speed 6326.31 samples/sec Loss 6.6309 LearningRate 0.0007 Epoch: 9 Global Step: 187290 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 08:46:34,438-Speed 6349.25 samples/sec Loss 6.7462 LearningRate 0.0007 Epoch: 9 Global Step: 187300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:37,680-Speed 6318.72 samples/sec Loss 6.8013 LearningRate 0.0007 Epoch: 9 Global Step: 187310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:40,924-Speed 6315.60 samples/sec Loss 6.7272 LearningRate 0.0007 Epoch: 9 Global Step: 187320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:44,163-Speed 6323.07 samples/sec Loss 6.7575 LearningRate 0.0007 Epoch: 9 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:47,408-Speed 6312.20 samples/sec Loss 6.6660 LearningRate 0.0007 Epoch: 9 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:50,646-Speed 6326.26 samples/sec Loss 6.7112 LearningRate 0.0007 Epoch: 9 Global Step: 187350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:53,898-Speed 6299.28 samples/sec Loss 6.6316 LearningRate 0.0007 Epoch: 9 Global Step: 187360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:46:57,139-Speed 6320.15 samples/sec Loss 6.7062 LearningRate 0.0007 Epoch: 9 Global Step: 187370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:00,385-Speed 6311.44 samples/sec Loss 6.6982 LearningRate 0.0007 Epoch: 9 Global Step: 187380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:03,630-Speed 6312.61 samples/sec Loss 6.6993 LearningRate 0.0007 Epoch: 9 Global Step: 187390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:06,870-Speed 6322.90 samples/sec Loss 6.7826 LearningRate 0.0007 Epoch: 9 Global Step: 187400 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:10,095-Speed 6350.65 samples/sec Loss 6.8027 LearningRate 0.0007 Epoch: 9 Global Step: 187410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:47:13,338-Speed 6317.27 samples/sec Loss 6.7669 LearningRate 0.0007 Epoch: 9 Global Step: 187420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:16,647-Speed 6190.39 samples/sec Loss 6.6730 LearningRate 0.0007 Epoch: 9 Global Step: 187430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:19,924-Speed 6251.77 samples/sec Loss 6.7709 LearningRate 0.0007 Epoch: 9 Global Step: 187440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:23,165-Speed 6320.23 samples/sec Loss 6.7511 LearningRate 0.0007 Epoch: 9 Global Step: 187450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:26,409-Speed 6315.47 samples/sec Loss 6.8450 LearningRate 0.0007 Epoch: 9 Global Step: 187460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:29,657-Speed 6307.93 samples/sec Loss 6.7517 LearningRate 0.0007 Epoch: 9 Global Step: 187470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:32,898-Speed 6319.74 samples/sec Loss 6.7984 LearningRate 0.0007 Epoch: 9 Global Step: 187480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:36,140-Speed 6317.27 samples/sec Loss 6.7342 LearningRate 0.0007 Epoch: 9 Global Step: 187490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:39,382-Speed 6319.74 samples/sec Loss 6.7766 LearningRate 0.0007 Epoch: 9 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:47:42,631-Speed 6304.91 samples/sec Loss 6.6858 LearningRate 0.0007 Epoch: 9 Global Step: 187510 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:45,871-Speed 6322.05 samples/sec Loss 6.7316 LearningRate 0.0007 Epoch: 9 Global Step: 187520 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:49,126-Speed 6293.23 samples/sec Loss 6.7144 LearningRate 0.0007 Epoch: 9 Global Step: 187530 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:52,371-Speed 6312.15 samples/sec Loss 
6.7235 LearningRate 0.0007 Epoch: 9 Global Step: 187540 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:55,614-Speed 6317.53 samples/sec Loss 6.6367 LearningRate 0.0007 Epoch: 9 Global Step: 187550 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:47:58,858-Speed 6314.61 samples/sec Loss 6.7171 LearningRate 0.0007 Epoch: 9 Global Step: 187560 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:48:02,142-Speed 6236.49 samples/sec Loss 6.7360 LearningRate 0.0007 Epoch: 9 Global Step: 187570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:05,497-Speed 6105.53 samples/sec Loss 6.6826 LearningRate 0.0007 Epoch: 9 Global Step: 187580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:08,747-Speed 6303.53 samples/sec Loss 6.7135 LearningRate 0.0007 Epoch: 9 Global Step: 187590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:11,986-Speed 6325.18 samples/sec Loss 6.8728 LearningRate 0.0007 Epoch: 9 Global Step: 187600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:15,231-Speed 6311.29 samples/sec Loss 6.7539 LearningRate 0.0007 Epoch: 9 Global Step: 187610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:18,473-Speed 6319.08 samples/sec Loss 6.7209 LearningRate 0.0007 Epoch: 9 Global Step: 187620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:21,719-Speed 6311.80 samples/sec Loss 6.7725 LearningRate 0.0007 Epoch: 9 Global Step: 187630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:24,965-Speed 6310.58 samples/sec Loss 6.7881 LearningRate 0.0007 Epoch: 9 Global Step: 187640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:28,269-Speed 6200.18 samples/sec Loss 6.7336 LearningRate 0.0007 Epoch: 9 Global Step: 187650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:31,562-Speed 6220.82 samples/sec Loss 6.7161 LearningRate 0.0007 Epoch: 9 Global 
Step: 187660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:34,802-Speed 6322.07 samples/sec Loss 6.7428 LearningRate 0.0007 Epoch: 9 Global Step: 187670 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:48:38,032-Speed 6341.78 samples/sec Loss 6.6810 LearningRate 0.0007 Epoch: 9 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:41,343-Speed 6187.78 samples/sec Loss 6.6808 LearningRate 0.0007 Epoch: 9 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:44,594-Speed 6299.81 samples/sec Loss 6.6965 LearningRate 0.0007 Epoch: 9 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:47,837-Speed 6316.02 samples/sec Loss 6.7010 LearningRate 0.0007 Epoch: 9 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:51,078-Speed 6321.68 samples/sec Loss 6.7454 LearningRate 0.0007 Epoch: 9 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:54,324-Speed 6309.96 samples/sec Loss 6.7069 LearningRate 0.0007 Epoch: 9 Global Step: 187730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:48:57,566-Speed 6319.45 samples/sec Loss 6.7623 LearningRate 0.0007 Epoch: 9 Global Step: 187740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:00,810-Speed 6314.79 samples/sec Loss 6.7126 LearningRate 0.0007 Epoch: 9 Global Step: 187750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:04,059-Speed 6303.93 samples/sec Loss 6.7700 LearningRate 0.0007 Epoch: 9 Global Step: 187760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:07,300-Speed 6320.59 samples/sec Loss 6.6755 LearningRate 0.0007 Epoch: 9 Global Step: 187770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:10,547-Speed 6309.69 samples/sec Loss 6.6690 LearningRate 0.0007 Epoch: 9 Global Step: 187780 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 08:49:13,772-Speed 6350.24 samples/sec Loss 6.7218 LearningRate 0.0007 Epoch: 9 Global Step: 187790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:17,015-Speed 6316.21 samples/sec Loss 6.6273 LearningRate 0.0007 Epoch: 9 Global Step: 187800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:20,265-Speed 6304.86 samples/sec Loss 6.6553 LearningRate 0.0007 Epoch: 9 Global Step: 187810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:23,519-Speed 6294.68 samples/sec Loss 6.7862 LearningRate 0.0007 Epoch: 9 Global Step: 187820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:26,760-Speed 6321.16 samples/sec Loss 6.7619 LearningRate 0.0007 Epoch: 9 Global Step: 187830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:29,999-Speed 6322.37 samples/sec Loss 6.7063 LearningRate 0.0007 Epoch: 9 Global Step: 187840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:33,244-Speed 6312.96 samples/sec Loss 6.7370 LearningRate 0.0007 Epoch: 9 Global Step: 187850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:36,493-Speed 6306.12 samples/sec Loss 6.7495 LearningRate 0.0007 Epoch: 9 Global Step: 187860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:39,735-Speed 6317.99 samples/sec Loss 6.7488 LearningRate 0.0007 Epoch: 9 Global Step: 187870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:42,978-Speed 6316.58 samples/sec Loss 6.7828 LearningRate 0.0007 Epoch: 9 Global Step: 187880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:46,221-Speed 6316.96 samples/sec Loss 6.7157 LearningRate 0.0007 Epoch: 9 Global Step: 187890 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:49:49,451-Speed 6341.36 samples/sec Loss 6.7232 LearningRate 0.0007 Epoch: 9 Global Step: 187900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:49:52,735-Speed 6237.34 samples/sec Loss 6.7245 LearningRate 0.0007 Epoch: 9 Global Step: 187910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:55,978-Speed 6316.74 samples/sec Loss 6.7328 LearningRate 0.0007 Epoch: 9 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:49:59,220-Speed 6319.03 samples/sec Loss 6.7557 LearningRate 0.0007 Epoch: 9 Global Step: 187930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:02,464-Speed 6315.47 samples/sec Loss 6.7024 LearningRate 0.0007 Epoch: 9 Global Step: 187940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:05,713-Speed 6304.69 samples/sec Loss 6.7789 LearningRate 0.0007 Epoch: 9 Global Step: 187950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:08,956-Speed 6316.90 samples/sec Loss 6.7239 LearningRate 0.0007 Epoch: 9 Global Step: 187960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:12,204-Speed 6305.26 samples/sec Loss 6.7009 LearningRate 0.0007 Epoch: 9 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:15,453-Speed 6305.52 samples/sec Loss 6.7529 LearningRate 0.0007 Epoch: 9 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:18,699-Speed 6311.02 samples/sec Loss 6.7391 LearningRate 0.0007 Epoch: 9 Global Step: 187990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:21,929-Speed 6341.39 samples/sec Loss 6.7905 LearningRate 0.0007 Epoch: 9 Global Step: 188000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:25,174-Speed 6312.05 samples/sec Loss 6.6828 LearningRate 0.0007 Epoch: 9 Global Step: 188010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:28,422-Speed 6307.63 samples/sec Loss 6.6814 LearningRate 0.0007 Epoch: 9 Global Step: 188020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:31,669-Speed 6308.94 samples/sec Loss 
6.7585 LearningRate 0.0007 Epoch: 9 Global Step: 188030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:34,913-Speed 6314.12 samples/sec Loss 6.7624 LearningRate 0.0007 Epoch: 9 Global Step: 188040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:38,154-Speed 6321.13 samples/sec Loss 6.7883 LearningRate 0.0007 Epoch: 9 Global Step: 188050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:41,397-Speed 6316.57 samples/sec Loss 6.8356 LearningRate 0.0007 Epoch: 9 Global Step: 188060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:44,643-Speed 6311.33 samples/sec Loss 6.7617 LearningRate 0.0007 Epoch: 9 Global Step: 188070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:47,921-Speed 6249.62 samples/sec Loss 6.7161 LearningRate 0.0007 Epoch: 9 Global Step: 188080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:51,165-Speed 6314.62 samples/sec Loss 6.6482 LearningRate 0.0007 Epoch: 9 Global Step: 188090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:50:54,463-Speed 6210.60 samples/sec Loss 6.7091 LearningRate 0.0007 Epoch: 9 Global Step: 188100 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:50:57,706-Speed 6317.77 samples/sec Loss 6.7782 LearningRate 0.0007 Epoch: 9 Global Step: 188110 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:00,956-Speed 6301.94 samples/sec Loss 6.7551 LearningRate 0.0007 Epoch: 9 Global Step: 188120 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:04,200-Speed 6313.64 samples/sec Loss 6.7475 LearningRate 0.0007 Epoch: 9 Global Step: 188130 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:07,430-Speed 6341.66 samples/sec Loss 6.6580 LearningRate 0.0007 Epoch: 9 Global Step: 188140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:10,688-Speed 6288.32 samples/sec Loss 6.8189 LearningRate 0.0007 Epoch: 9 Global 
Step: 188150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:13,931-Speed 6316.96 samples/sec Loss 6.6905 LearningRate 0.0007 Epoch: 9 Global Step: 188160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:17,173-Speed 6318.31 samples/sec Loss 6.6939 LearningRate 0.0007 Epoch: 9 Global Step: 188170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:20,417-Speed 6315.20 samples/sec Loss 6.7464 LearningRate 0.0007 Epoch: 9 Global Step: 188180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:23,664-Speed 6308.83 samples/sec Loss 6.6932 LearningRate 0.0007 Epoch: 9 Global Step: 188190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:26,906-Speed 6317.46 samples/sec Loss 6.6974 LearningRate 0.0007 Epoch: 9 Global Step: 188200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:30,150-Speed 6314.73 samples/sec Loss 6.7907 LearningRate 0.0007 Epoch: 9 Global Step: 188210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:33,397-Speed 6308.38 samples/sec Loss 6.6732 LearningRate 0.0007 Epoch: 9 Global Step: 188220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:36,641-Speed 6314.20 samples/sec Loss 6.7551 LearningRate 0.0007 Epoch: 9 Global Step: 188230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:39,885-Speed 6314.53 samples/sec Loss 6.6856 LearningRate 0.0007 Epoch: 9 Global Step: 188240 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:43,131-Speed 6310.54 samples/sec Loss 6.6844 LearningRate 0.0007 Epoch: 9 Global Step: 188250 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:46,375-Speed 6315.14 samples/sec Loss 6.6750 LearningRate 0.0007 Epoch: 9 Global Step: 188260 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:51:49,607-Speed 6338.21 samples/sec Loss 6.7972 LearningRate 0.0007 Epoch: 9 Global Step: 188270 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 08:51:52,853-Speed 6311.62 samples/sec Loss 6.8057 LearningRate 0.0007 Epoch: 9 Global Step: 188280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:56,102-Speed 6305.45 samples/sec Loss 6.7702 LearningRate 0.0007 Epoch: 9 Global Step: 188290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:51:59,346-Speed 6314.50 samples/sec Loss 6.7938 LearningRate 0.0007 Epoch: 9 Global Step: 188300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:02,606-Speed 6283.99 samples/sec Loss 6.6900 LearningRate 0.0007 Epoch: 9 Global Step: 188310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:05,863-Speed 6288.91 samples/sec Loss 6.7817 LearningRate 0.0007 Epoch: 9 Global Step: 188320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:09,111-Speed 6307.51 samples/sec Loss 6.7029 LearningRate 0.0007 Epoch: 9 Global Step: 188330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:12,354-Speed 6315.12 samples/sec Loss 6.7342 LearningRate 0.0007 Epoch: 9 Global Step: 188340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:15,598-Speed 6314.65 samples/sec Loss 6.7239 LearningRate 0.0007 Epoch: 9 Global Step: 188350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:18,861-Speed 6278.38 samples/sec Loss 6.7466 LearningRate 0.0007 Epoch: 9 Global Step: 188360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:22,103-Speed 6319.29 samples/sec Loss 6.7542 LearningRate 0.0007 Epoch: 9 Global Step: 188370 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:52:25,338-Speed 6330.45 samples/sec Loss 6.6714 LearningRate 0.0007 Epoch: 9 Global Step: 188380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:28,591-Speed 6298.80 samples/sec Loss 6.8579 LearningRate 0.0007 Epoch: 9 Global Step: 188390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:52:31,838-Speed 6308.53 samples/sec Loss 6.7376 LearningRate 0.0007 Epoch: 9 Global Step: 188400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:35,086-Speed 6306.58 samples/sec Loss 6.7191 LearningRate 0.0007 Epoch: 9 Global Step: 188410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:38,331-Speed 6311.74 samples/sec Loss 6.6867 LearningRate 0.0007 Epoch: 9 Global Step: 188420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:41,577-Speed 6311.09 samples/sec Loss 6.7728 LearningRate 0.0007 Epoch: 9 Global Step: 188430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:44,822-Speed 6313.41 samples/sec Loss 6.7036 LearningRate 0.0007 Epoch: 9 Global Step: 188440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:48,077-Speed 6292.14 samples/sec Loss 6.8065 LearningRate 0.0007 Epoch: 9 Global Step: 188450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:51,328-Speed 6302.16 samples/sec Loss 6.7326 LearningRate 0.0007 Epoch: 9 Global Step: 188460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:54,572-Speed 6313.60 samples/sec Loss 6.7662 LearningRate 0.0007 Epoch: 9 Global Step: 188470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:52:57,810-Speed 6327.61 samples/sec Loss 6.6677 LearningRate 0.0007 Epoch: 9 Global Step: 188480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:01,056-Speed 6310.13 samples/sec Loss 6.7126 LearningRate 0.0007 Epoch: 9 Global Step: 188490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:04,306-Speed 6302.37 samples/sec Loss 6.7511 LearningRate 0.0007 Epoch: 9 Global Step: 188500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:07,560-Speed 6295.57 samples/sec Loss 6.7529 LearningRate 0.0007 Epoch: 9 Global Step: 188510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:10,805-Speed 6312.72 samples/sec Loss 
6.7163 LearningRate 0.0007 Epoch: 9 Global Step: 188520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:14,057-Speed 6299.43 samples/sec Loss 6.6975 LearningRate 0.0007 Epoch: 9 Global Step: 188530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:17,306-Speed 6304.28 samples/sec Loss 6.6427 LearningRate 0.0007 Epoch: 9 Global Step: 188540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:20,556-Speed 6303.39 samples/sec Loss 6.7818 LearningRate 0.0007 Epoch: 9 Global Step: 188550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:23,808-Speed 6300.44 samples/sec Loss 6.7103 LearningRate 0.0007 Epoch: 9 Global Step: 188560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:27,059-Speed 6299.92 samples/sec Loss 6.7261 LearningRate 0.0007 Epoch: 9 Global Step: 188570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:30,306-Speed 6309.94 samples/sec Loss 6.6823 LearningRate 0.0007 Epoch: 9 Global Step: 188580 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:53:33,554-Speed 6306.05 samples/sec Loss 6.7108 LearningRate 0.0007 Epoch: 9 Global Step: 188590 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:53:36,790-Speed 6328.84 samples/sec Loss 6.6739 LearningRate 0.0007 Epoch: 9 Global Step: 188600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:40,037-Speed 6309.44 samples/sec Loss 6.6462 LearningRate 0.0007 Epoch: 9 Global Step: 188610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:43,284-Speed 6309.09 samples/sec Loss 6.7508 LearningRate 0.0007 Epoch: 9 Global Step: 188620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:46,534-Speed 6301.88 samples/sec Loss 6.7275 LearningRate 0.0007 Epoch: 9 Global Step: 188630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:49,785-Speed 6302.31 samples/sec Loss 6.6236 LearningRate 0.0007 Epoch: 9 Global 
Step: 188640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:53,035-Speed 6303.12 samples/sec Loss 6.7323 LearningRate 0.0007 Epoch: 9 Global Step: 188650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:56,283-Speed 6305.91 samples/sec Loss 6.7451 LearningRate 0.0007 Epoch: 9 Global Step: 188660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:53:59,533-Speed 6303.40 samples/sec Loss 6.6891 LearningRate 0.0007 Epoch: 9 Global Step: 188670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:02,785-Speed 6299.12 samples/sec Loss 6.6947 LearningRate 0.0007 Epoch: 9 Global Step: 188680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:06,031-Speed 6310.61 samples/sec Loss 6.6567 LearningRate 0.0007 Epoch: 9 Global Step: 188690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:09,281-Speed 6302.91 samples/sec Loss 6.6606 LearningRate 0.0007 Epoch: 9 Global Step: 188700 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:54:12,520-Speed 6325.40 samples/sec Loss 6.6999 LearningRate 0.0007 Epoch: 9 Global Step: 188710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:15,766-Speed 6309.69 samples/sec Loss 6.7622 LearningRate 0.0007 Epoch: 9 Global Step: 188720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:19,015-Speed 6306.27 samples/sec Loss 6.7202 LearningRate 0.0007 Epoch: 9 Global Step: 188730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:22,264-Speed 6304.34 samples/sec Loss 6.7635 LearningRate 0.0007 Epoch: 9 Global Step: 188740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:25,515-Speed 6301.18 samples/sec Loss 6.7461 LearningRate 0.0007 Epoch: 9 Global Step: 188750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:28,764-Speed 6305.38 samples/sec Loss 6.6381 LearningRate 0.0007 Epoch: 9 Global Step: 188760 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 08:54:32,012-Speed 6306.33 samples/sec Loss 6.7099 LearningRate 0.0007 Epoch: 9 Global Step: 188770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:35,263-Speed 6300.25 samples/sec Loss 6.6805 LearningRate 0.0007 Epoch: 9 Global Step: 188780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:38,511-Speed 6308.22 samples/sec Loss 6.7182 LearningRate 0.0007 Epoch: 9 Global Step: 188790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:41,758-Speed 6307.90 samples/sec Loss 6.7901 LearningRate 0.0007 Epoch: 9 Global Step: 188800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:45,007-Speed 6305.79 samples/sec Loss 6.7749 LearningRate 0.0007 Epoch: 9 Global Step: 188810 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:54:48,254-Speed 6308.50 samples/sec Loss 6.6518 LearningRate 0.0007 Epoch: 9 Global Step: 188820 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:54:51,515-Speed 6280.80 samples/sec Loss 6.6533 LearningRate 0.0007 Epoch: 9 Global Step: 188830 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:54:54,752-Speed 6328.81 samples/sec Loss 6.7177 LearningRate 0.0007 Epoch: 9 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:54:58,002-Speed 6303.90 samples/sec Loss 6.7348 LearningRate 0.0007 Epoch: 9 Global Step: 188850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:01,248-Speed 6308.69 samples/sec Loss 6.7170 LearningRate 0.0007 Epoch: 9 Global Step: 188860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:04,495-Speed 6309.31 samples/sec Loss 6.7473 LearningRate 0.0007 Epoch: 9 Global Step: 188870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:07,750-Speed 6294.41 samples/sec Loss 6.6906 LearningRate 0.0007 Epoch: 9 Global Step: 188880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:55:10,998-Speed 6306.32 samples/sec Loss 6.6518 LearningRate 0.0007 Epoch: 9 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:14,243-Speed 6311.71 samples/sec Loss 6.7539 LearningRate 0.0007 Epoch: 9 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:17,496-Speed 6299.35 samples/sec Loss 6.6483 LearningRate 0.0007 Epoch: 9 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:20,743-Speed 6307.59 samples/sec Loss 6.7287 LearningRate 0.0007 Epoch: 9 Global Step: 188920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:23,994-Speed 6301.23 samples/sec Loss 6.6861 LearningRate 0.0007 Epoch: 9 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:27,254-Speed 6285.11 samples/sec Loss 6.8469 LearningRate 0.0007 Epoch: 9 Global Step: 188940 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:55:30,492-Speed 6324.45 samples/sec Loss 6.7259 LearningRate 0.0007 Epoch: 9 Global Step: 188950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:33,745-Speed 6297.63 samples/sec Loss 6.7575 LearningRate 0.0007 Epoch: 9 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:37,000-Speed 6294.86 samples/sec Loss 6.6733 LearningRate 0.0007 Epoch: 9 Global Step: 188970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:40,249-Speed 6304.75 samples/sec Loss 6.6945 LearningRate 0.0007 Epoch: 9 Global Step: 188980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:43,497-Speed 6306.08 samples/sec Loss 6.7015 LearningRate 0.0007 Epoch: 9 Global Step: 188990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:46,746-Speed 6303.79 samples/sec Loss 6.6830 LearningRate 0.0007 Epoch: 9 Global Step: 189000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:49,993-Speed 6309.97 samples/sec Loss 
6.6861 LearningRate 0.0007 Epoch: 9 Global Step: 189010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:53,254-Speed 6281.57 samples/sec Loss 6.6897 LearningRate 0.0007 Epoch: 9 Global Step: 189020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:56,504-Speed 6302.52 samples/sec Loss 6.6663 LearningRate 0.0007 Epoch: 9 Global Step: 189030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:55:59,753-Speed 6304.62 samples/sec Loss 6.6841 LearningRate 0.0007 Epoch: 9 Global Step: 189040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:03,003-Speed 6303.90 samples/sec Loss 6.7043 LearningRate 0.0007 Epoch: 9 Global Step: 189050 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:06,262-Speed 6285.42 samples/sec Loss 6.7176 LearningRate 0.0007 Epoch: 9 Global Step: 189060 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:09,510-Speed 6307.28 samples/sec Loss 6.6618 LearningRate 0.0007 Epoch: 9 Global Step: 189070 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:12,761-Speed 6300.10 samples/sec Loss 6.8112 LearningRate 0.0007 Epoch: 9 Global Step: 189080 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:16,017-Speed 6290.51 samples/sec Loss 6.6851 LearningRate 0.0007 Epoch: 9 Global Step: 189090 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:19,267-Speed 6304.14 samples/sec Loss 6.7162 LearningRate 0.0007 Epoch: 9 Global Step: 189100 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:22,515-Speed 6305.88 samples/sec Loss 6.7521 LearningRate 0.0007 Epoch: 9 Global Step: 189110 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:56:25,750-Speed 6331.94 samples/sec Loss 6.7202 LearningRate 0.0007 Epoch: 9 Global Step: 189120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:29,007-Speed 6289.69 samples/sec Loss 6.7494 LearningRate 0.0007 Epoch: 9 Global 
Step: 189130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:32,257-Speed 6304.62 samples/sec Loss 6.8134 LearningRate 0.0007 Epoch: 9 Global Step: 189140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:35,507-Speed 6303.20 samples/sec Loss 6.6980 LearningRate 0.0007 Epoch: 9 Global Step: 189150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:38,756-Speed 6303.52 samples/sec Loss 6.6795 LearningRate 0.0007 Epoch: 9 Global Step: 189160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:42,009-Speed 6297.77 samples/sec Loss 6.6794 LearningRate 0.0007 Epoch: 9 Global Step: 189170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:45,260-Speed 6301.95 samples/sec Loss 6.7339 LearningRate 0.0007 Epoch: 9 Global Step: 189180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:48,521-Speed 6281.63 samples/sec Loss 6.6077 LearningRate 0.0007 Epoch: 9 Global Step: 189190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:51,770-Speed 6304.87 samples/sec Loss 6.7807 LearningRate 0.0007 Epoch: 9 Global Step: 189200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:55,017-Speed 6308.17 samples/sec Loss 6.7494 LearningRate 0.0007 Epoch: 9 Global Step: 189210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:56:58,268-Speed 6300.44 samples/sec Loss 6.6886 LearningRate 0.0007 Epoch: 9 Global Step: 189220 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:57:01,519-Speed 6301.73 samples/sec Loss 6.6927 LearningRate 0.0007 Epoch: 9 Global Step: 189230 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:57:04,753-Speed 6333.40 samples/sec Loss 6.7332 LearningRate 0.0007 Epoch: 9 Global Step: 189240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:08,005-Speed 6298.80 samples/sec Loss 6.6652 LearningRate 0.0007 Epoch: 9 Global Step: 189250 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 08:57:11,259-Speed 6296.66 samples/sec Loss 6.7702 LearningRate 0.0007 Epoch: 9 Global Step: 189260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:14,524-Speed 6272.16 samples/sec Loss 6.6952 LearningRate 0.0007 Epoch: 9 Global Step: 189270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:17,776-Speed 6299.68 samples/sec Loss 6.6952 LearningRate 0.0007 Epoch: 9 Global Step: 189280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:21,027-Speed 6300.93 samples/sec Loss 6.7137 LearningRate 0.0007 Epoch: 9 Global Step: 189290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:24,278-Speed 6300.48 samples/sec Loss 6.6546 LearningRate 0.0007 Epoch: 9 Global Step: 189300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:27,525-Speed 6309.43 samples/sec Loss 6.6873 LearningRate 0.0007 Epoch: 9 Global Step: 189310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:30,775-Speed 6303.07 samples/sec Loss 6.7321 LearningRate 0.0007 Epoch: 9 Global Step: 189320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:34,023-Speed 6306.27 samples/sec Loss 6.6908 LearningRate 0.0007 Epoch: 9 Global Step: 189330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:37,262-Speed 6325.22 samples/sec Loss 6.6823 LearningRate 0.0007 Epoch: 9 Global Step: 189340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:40,547-Speed 6236.52 samples/sec Loss 6.7354 LearningRate 0.0007 Epoch: 9 Global Step: 189350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:43,827-Speed 6245.39 samples/sec Loss 6.7780 LearningRate 0.0007 Epoch: 9 Global Step: 189360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:47,083-Speed 6291.66 samples/sec Loss 6.7677 LearningRate 0.0007 Epoch: 9 Global Step: 189370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
08:57:50,329-Speed 6309.27 samples/sec Loss 6.7742 LearningRate 0.0007 Epoch: 9 Global Step: 189380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:53,579-Speed 6303.40 samples/sec Loss 6.7751 LearningRate 0.0007 Epoch: 9 Global Step: 189390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:57:56,839-Speed 6283.23 samples/sec Loss 6.6131 LearningRate 0.0007 Epoch: 9 Global Step: 189400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:00,087-Speed 6306.61 samples/sec Loss 6.7092 LearningRate 0.0007 Epoch: 9 Global Step: 189410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:03,336-Speed 6305.06 samples/sec Loss 6.7146 LearningRate 0.0007 Epoch: 9 Global Step: 189420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:06,584-Speed 6308.07 samples/sec Loss 6.6402 LearningRate 0.0007 Epoch: 9 Global Step: 189430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:09,834-Speed 6302.06 samples/sec Loss 6.6647 LearningRate 0.0007 Epoch: 9 Global Step: 189440 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:58:13,082-Speed 6306.72 samples/sec Loss 6.7241 LearningRate 0.0007 Epoch: 9 Global Step: 189450 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:58:16,339-Speed 6288.93 samples/sec Loss 6.7762 LearningRate 0.0007 Epoch: 9 Global Step: 189460 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:58:19,575-Speed 6330.25 samples/sec Loss 6.7176 LearningRate 0.0007 Epoch: 9 Global Step: 189470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:22,823-Speed 6307.80 samples/sec Loss 6.8023 LearningRate 0.0007 Epoch: 9 Global Step: 189480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:26,076-Speed 6296.29 samples/sec Loss 6.7107 LearningRate 0.0007 Epoch: 9 Global Step: 189490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:29,325-Speed 6305.32 samples/sec Loss 
6.7706 LearningRate 0.0007 Epoch: 9 Global Step: 189500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:32,577-Speed 6299.96 samples/sec Loss 6.6568 LearningRate 0.0007 Epoch: 9 Global Step: 189510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:35,824-Speed 6308.47 samples/sec Loss 6.7465 LearningRate 0.0007 Epoch: 9 Global Step: 189520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:39,074-Speed 6302.12 samples/sec Loss 6.7302 LearningRate 0.0007 Epoch: 9 Global Step: 189530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:42,324-Speed 6302.60 samples/sec Loss 6.6579 LearningRate 0.0007 Epoch: 9 Global Step: 189540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:45,573-Speed 6305.14 samples/sec Loss 6.7124 LearningRate 0.0007 Epoch: 9 Global Step: 189550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:48,821-Speed 6308.28 samples/sec Loss 6.8120 LearningRate 0.0007 Epoch: 9 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:58:52,071-Speed 6302.63 samples/sec Loss 6.6950 LearningRate 0.0007 Epoch: 9 Global Step: 189570 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:58:55,319-Speed 6306.77 samples/sec Loss 6.6643 LearningRate 0.0007 Epoch: 9 Global Step: 189580 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:58:58,567-Speed 6306.17 samples/sec Loss 6.7017 LearningRate 0.0007 Epoch: 9 Global Step: 189590 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:59:01,815-Speed 6307.48 samples/sec Loss 6.7144 LearningRate 0.0007 Epoch: 9 Global Step: 189600 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:59:05,050-Speed 6331.96 samples/sec Loss 6.6921 LearningRate 0.0007 Epoch: 9 Global Step: 189610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:08,306-Speed 6292.17 samples/sec Loss 6.7229 LearningRate 0.0007 Epoch: 9 Global 
Step: 189620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:11,555-Speed 6303.98 samples/sec Loss 6.6534 LearningRate 0.0007 Epoch: 9 Global Step: 189630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:14,809-Speed 6295.72 samples/sec Loss 6.7273 LearningRate 0.0007 Epoch: 9 Global Step: 189640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:18,056-Speed 6308.69 samples/sec Loss 6.7447 LearningRate 0.0007 Epoch: 9 Global Step: 189650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:21,307-Speed 6301.93 samples/sec Loss 6.6731 LearningRate 0.0007 Epoch: 9 Global Step: 189660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:24,553-Speed 6309.69 samples/sec Loss 6.5951 LearningRate 0.0007 Epoch: 9 Global Step: 189670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:27,799-Speed 6310.13 samples/sec Loss 6.7794 LearningRate 0.0007 Epoch: 9 Global Step: 189680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:31,047-Speed 6306.84 samples/sec Loss 6.6899 LearningRate 0.0007 Epoch: 9 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:34,294-Speed 6309.09 samples/sec Loss 6.7112 LearningRate 0.0007 Epoch: 9 Global Step: 189700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:37,543-Speed 6305.68 samples/sec Loss 6.6643 LearningRate 0.0007 Epoch: 9 Global Step: 189710 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 08:59:40,776-Speed 6336.51 samples/sec Loss 6.7460 LearningRate 0.0007 Epoch: 9 Global Step: 189720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:44,025-Speed 6303.98 samples/sec Loss 6.7109 LearningRate 0.0007 Epoch: 9 Global Step: 189730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:47,275-Speed 6303.07 samples/sec Loss 6.6783 LearningRate 0.0007 Epoch: 9 Global Step: 189740 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 08:59:50,523-Speed 6305.43 samples/sec Loss 6.7278 LearningRate 0.0007 Epoch: 9 Global Step: 189750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:53,771-Speed 6308.52 samples/sec Loss 6.7524 LearningRate 0.0007 Epoch: 9 Global Step: 189760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 08:59:57,017-Speed 6310.81 samples/sec Loss 6.7679 LearningRate 0.0007 Epoch: 9 Global Step: 189770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:00,268-Speed 6301.93 samples/sec Loss 6.7140 LearningRate 0.0007 Epoch: 9 Global Step: 189780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:03,518-Speed 6303.04 samples/sec Loss 6.7277 LearningRate 0.0007 Epoch: 9 Global Step: 189790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:06,768-Speed 6303.11 samples/sec Loss 6.8205 LearningRate 0.0007 Epoch: 9 Global Step: 189800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:10,026-Speed 6287.93 samples/sec Loss 6.7709 LearningRate 0.0007 Epoch: 9 Global Step: 189810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:13,264-Speed 6325.24 samples/sec Loss 6.7412 LearningRate 0.0007 Epoch: 9 Global Step: 189820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:16,519-Speed 6293.23 samples/sec Loss 6.7730 LearningRate 0.0007 Epoch: 9 Global Step: 189830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:19,765-Speed 6310.32 samples/sec Loss 6.6794 LearningRate 0.0007 Epoch: 9 Global Step: 189840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:23,015-Speed 6304.37 samples/sec Loss 6.6941 LearningRate 0.0007 Epoch: 9 Global Step: 189850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:26,264-Speed 6304.01 samples/sec Loss 6.6973 LearningRate 0.0007 Epoch: 9 Global Step: 189860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:00:29,511-Speed 6308.42 samples/sec Loss 6.5980 LearningRate 0.0007 Epoch: 9 Global Step: 189870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:32,760-Speed 6305.62 samples/sec Loss 6.6926 LearningRate 0.0007 Epoch: 9 Global Step: 189880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:36,003-Speed 6315.74 samples/sec Loss 6.6928 LearningRate 0.0007 Epoch: 9 Global Step: 189890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:39,254-Speed 6301.61 samples/sec Loss 6.6586 LearningRate 0.0007 Epoch: 9 Global Step: 189900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:42,500-Speed 6311.52 samples/sec Loss 6.7100 LearningRate 0.0007 Epoch: 9 Global Step: 189910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:45,751-Speed 6300.09 samples/sec Loss 6.7881 LearningRate 0.0007 Epoch: 9 Global Step: 189920 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:00:48,991-Speed 6321.90 samples/sec Loss 6.7264 LearningRate 0.0007 Epoch: 9 Global Step: 189930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:52,237-Speed 6311.69 samples/sec Loss 6.6554 LearningRate 0.0007 Epoch: 9 Global Step: 189940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:55,485-Speed 6305.69 samples/sec Loss 6.6873 LearningRate 0.0007 Epoch: 9 Global Step: 189950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:00:58,730-Speed 6313.71 samples/sec Loss 6.7582 LearningRate 0.0007 Epoch: 9 Global Step: 189960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:01,979-Speed 6304.85 samples/sec Loss 6.6535 LearningRate 0.0007 Epoch: 9 Global Step: 189970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:05,229-Speed 6303.69 samples/sec Loss 6.7520 LearningRate 0.0007 Epoch: 9 Global Step: 189980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:08,475-Speed 6311.58 samples/sec Loss 
6.6562 LearningRate 0.0007 Epoch: 9 Global Step: 189990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:11,724-Speed 6303.98 samples/sec Loss 6.7259 LearningRate 0.0007 Epoch: 9 Global Step: 190000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:14,973-Speed 6304.86 samples/sec Loss 6.6750 LearningRate 0.0007 Epoch: 9 Global Step: 190010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:18,222-Speed 6304.04 samples/sec Loss 6.7705 LearningRate 0.0007 Epoch: 9 Global Step: 190020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:21,471-Speed 6305.44 samples/sec Loss 6.6256 LearningRate 0.0007 Epoch: 9 Global Step: 190030 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:01:24,705-Speed 6334.85 samples/sec Loss 6.7174 LearningRate 0.0007 Epoch: 9 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:27,951-Speed 6309.22 samples/sec Loss 6.6702 LearningRate 0.0007 Epoch: 9 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:31,200-Speed 6304.63 samples/sec Loss 6.7323 LearningRate 0.0007 Epoch: 9 Global Step: 190060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:34,448-Speed 6307.51 samples/sec Loss 6.5887 LearningRate 0.0007 Epoch: 9 Global Step: 190070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:37,693-Speed 6313.68 samples/sec Loss 6.6595 LearningRate 0.0007 Epoch: 9 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:40,939-Speed 6310.87 samples/sec Loss 6.6081 LearningRate 0.0007 Epoch: 9 Global Step: 190090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:44,186-Speed 6306.97 samples/sec Loss 6.6778 LearningRate 0.0007 Epoch: 9 Global Step: 190100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:47,440-Speed 6294.84 samples/sec Loss 6.6972 LearningRate 0.0007 Epoch: 9 Global 
Step: 190110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:50,691-Speed 6301.33 samples/sec Loss 6.6539 LearningRate 0.0007 Epoch: 9 Global Step: 190120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:53,941-Speed 6303.22 samples/sec Loss 6.7325 LearningRate 0.0007 Epoch: 9 Global Step: 190130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:01:57,195-Speed 6295.73 samples/sec Loss 6.7012 LearningRate 0.0007 Epoch: 9 Global Step: 190140 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:02:00,439-Speed 6314.27 samples/sec Loss 6.6976 LearningRate 0.0007 Epoch: 9 Global Step: 190150 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:02:03,689-Speed 6303.98 samples/sec Loss 6.7759 LearningRate 0.0007 Epoch: 9 Global Step: 190160 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:02:06,923-Speed 6333.09 samples/sec Loss 6.6394 LearningRate 0.0007 Epoch: 9 Global Step: 190170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:10,171-Speed 6308.61 samples/sec Loss 6.7135 LearningRate 0.0007 Epoch: 9 Global Step: 190180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:13,421-Speed 6301.96 samples/sec Loss 6.7633 LearningRate 0.0007 Epoch: 9 Global Step: 190190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:16,669-Speed 6307.17 samples/sec Loss 6.7567 LearningRate 0.0007 Epoch: 9 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:19,917-Speed 6306.94 samples/sec Loss 6.7145 LearningRate 0.0007 Epoch: 9 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:23,166-Speed 6303.82 samples/sec Loss 6.7196 LearningRate 0.0007 Epoch: 9 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:26,417-Speed 6302.67 samples/sec Loss 6.6668 LearningRate 0.0007 Epoch: 9 Global Step: 190230 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:02:29,666-Speed 6303.21 samples/sec Loss 6.7157 LearningRate 0.0007 Epoch: 9 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:32,911-Speed 6313.76 samples/sec Loss 6.7097 LearningRate 0.0007 Epoch: 9 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:36,158-Speed 6308.64 samples/sec Loss 6.6701 LearningRate 0.0007 Epoch: 9 Global Step: 190260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:39,389-Speed 6339.83 samples/sec Loss 6.6292 LearningRate 0.0007 Epoch: 9 Global Step: 190270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:42,635-Speed 6311.40 samples/sec Loss 6.6783 LearningRate 0.0007 Epoch: 9 Global Step: 190280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:45,880-Speed 6311.65 samples/sec Loss 6.6805 LearningRate 0.0007 Epoch: 9 Global Step: 190290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:49,126-Speed 6310.00 samples/sec Loss 6.7575 LearningRate 0.0007 Epoch: 9 Global Step: 190300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:52,378-Speed 6300.69 samples/sec Loss 6.6483 LearningRate 0.0007 Epoch: 9 Global Step: 190310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:55,622-Speed 6313.54 samples/sec Loss 6.7228 LearningRate 0.0007 Epoch: 9 Global Step: 190320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:02:58,873-Speed 6301.81 samples/sec Loss 6.6585 LearningRate 0.0007 Epoch: 9 Global Step: 190330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:02,123-Speed 6302.53 samples/sec Loss 6.7056 LearningRate 0.0007 Epoch: 9 Global Step: 190340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:05,372-Speed 6304.30 samples/sec Loss 6.7373 LearningRate 0.0007 Epoch: 9 Global Step: 190350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:03:08,621-Speed 6305.59 samples/sec Loss 6.6869 LearningRate 0.0007 Epoch: 9 Global Step: 190360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:11,850-Speed 6342.84 samples/sec Loss 6.7191 LearningRate 0.0007 Epoch: 9 Global Step: 190370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:15,099-Speed 6305.43 samples/sec Loss 6.7133 LearningRate 0.0007 Epoch: 9 Global Step: 190380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:18,347-Speed 6306.93 samples/sec Loss 6.6625 LearningRate 0.0007 Epoch: 9 Global Step: 190390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:21,593-Speed 6311.22 samples/sec Loss 6.6669 LearningRate 0.0007 Epoch: 9 Global Step: 190400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:24,840-Speed 6309.51 samples/sec Loss 6.7267 LearningRate 0.0007 Epoch: 9 Global Step: 190410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:28,090-Speed 6303.35 samples/sec Loss 6.7707 LearningRate 0.0007 Epoch: 9 Global Step: 190420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:31,337-Speed 6308.70 samples/sec Loss 6.7554 LearningRate 0.0007 Epoch: 9 Global Step: 190430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:34,585-Speed 6305.94 samples/sec Loss 6.7209 LearningRate 0.0007 Epoch: 9 Global Step: 190440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:37,837-Speed 6298.83 samples/sec Loss 6.7260 LearningRate 0.0007 Epoch: 9 Global Step: 190450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:41,084-Speed 6309.89 samples/sec Loss 6.6911 LearningRate 0.0007 Epoch: 9 Global Step: 190460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:44,326-Speed 6317.39 samples/sec Loss 6.7018 LearningRate 0.0007 Epoch: 9 Global Step: 190470 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:03:47,561-Speed 6332.08 samples/sec Loss 
6.7128 LearningRate 0.0007 Epoch: 9 Global Step: 190480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:50,806-Speed 6313.54 samples/sec Loss 6.7349 LearningRate 0.0007 Epoch: 9 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:54,051-Speed 6311.51 samples/sec Loss 6.7202 LearningRate 0.0007 Epoch: 9 Global Step: 190500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:03:57,314-Speed 6278.95 samples/sec Loss 6.6980 LearningRate 0.0007 Epoch: 9 Global Step: 190510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:00,561-Speed 6308.95 samples/sec Loss 6.7429 LearningRate 0.0007 Epoch: 9 Global Step: 190520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:03,807-Speed 6309.29 samples/sec Loss 6.6354 LearningRate 0.0007 Epoch: 9 Global Step: 190530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:07,056-Speed 6305.00 samples/sec Loss 6.6473 LearningRate 0.0007 Epoch: 9 Global Step: 190540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:10,306-Speed 6304.23 samples/sec Loss 6.7654 LearningRate 0.0007 Epoch: 9 Global Step: 190550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:13,551-Speed 6311.69 samples/sec Loss 6.6947 LearningRate 0.0007 Epoch: 9 Global Step: 190560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:16,798-Speed 6308.45 samples/sec Loss 6.7535 LearningRate 0.0007 Epoch: 9 Global Step: 190570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:20,045-Speed 6308.38 samples/sec Loss 6.7160 LearningRate 0.0007 Epoch: 9 Global Step: 190580 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:04:23,288-Speed 6318.27 samples/sec Loss 6.7056 LearningRate 0.0007 Epoch: 9 Global Step: 190590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:26,534-Speed 6311.34 samples/sec Loss 6.6160 LearningRate 0.0007 Epoch: 9 Global 
Step: 190600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:29,782-Speed 6306.93 samples/sec Loss 6.7117 LearningRate 0.0007 Epoch: 9 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:33,028-Speed 6310.87 samples/sec Loss 6.7295 LearningRate 0.0007 Epoch: 9 Global Step: 190620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:36,277-Speed 6304.51 samples/sec Loss 6.6896 LearningRate 0.0007 Epoch: 9 Global Step: 190630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:39,520-Speed 6315.74 samples/sec Loss 6.7726 LearningRate 0.0007 Epoch: 9 Global Step: 190640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:42,772-Speed 6299.11 samples/sec Loss 6.7216 LearningRate 0.0007 Epoch: 9 Global Step: 190650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:46,017-Speed 6312.18 samples/sec Loss 6.8084 LearningRate 0.0007 Epoch: 9 Global Step: 190660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:49,267-Speed 6303.11 samples/sec Loss 6.6904 LearningRate 0.0007 Epoch: 9 Global Step: 190670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:52,512-Speed 6314.15 samples/sec Loss 6.7600 LearningRate 0.0007 Epoch: 9 Global Step: 190680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:04:55,760-Speed 6305.59 samples/sec Loss 6.6992 LearningRate 0.0007 Epoch: 9 Global Step: 190690 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:04:59,008-Speed 6306.19 samples/sec Loss 6.7174 LearningRate 0.0007 Epoch: 9 Global Step: 190700 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:05:02,258-Speed 6303.52 samples/sec Loss 6.6722 LearningRate 0.0007 Epoch: 9 Global Step: 190710 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:05:05,509-Speed 6300.69 samples/sec Loss 6.7303 LearningRate 0.0007 Epoch: 9 Global Step: 190720 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 09:05:08,741-Speed 6338.06 samples/sec Loss 6.6568 LearningRate 0.0007 Epoch: 9 Global Step: 190730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:11,988-Speed 6308.90 samples/sec Loss 6.6320 LearningRate 0.0007 Epoch: 9 Global Step: 190740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:15,240-Speed 6300.12 samples/sec Loss 6.6498 LearningRate 0.0007 Epoch: 9 Global Step: 190750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:18,486-Speed 6310.28 samples/sec Loss 6.7142 LearningRate 0.0007 Epoch: 9 Global Step: 190760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:21,731-Speed 6311.77 samples/sec Loss 6.6287 LearningRate 0.0007 Epoch: 9 Global Step: 190770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:24,977-Speed 6312.21 samples/sec Loss 6.7525 LearningRate 0.0007 Epoch: 9 Global Step: 190780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:28,225-Speed 6304.75 samples/sec Loss 6.6968 LearningRate 0.0007 Epoch: 9 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:31,471-Speed 6310.68 samples/sec Loss 6.7058 LearningRate 0.0007 Epoch: 9 Global Step: 190800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:34,721-Speed 6304.60 samples/sec Loss 6.6654 LearningRate 0.0007 Epoch: 9 Global Step: 190810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:37,973-Speed 6299.28 samples/sec Loss 6.7088 LearningRate 0.0007 Epoch: 9 Global Step: 190820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:41,220-Speed 6308.64 samples/sec Loss 6.6778 LearningRate 0.0007 Epoch: 9 Global Step: 190830 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:05:44,457-Speed 6329.19 samples/sec Loss 6.6712 LearningRate 0.0007 Epoch: 9 Global Step: 190840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:05:47,707-Speed 6301.76 samples/sec Loss 6.6683 LearningRate 0.0007 Epoch: 9 Global Step: 190850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:50,959-Speed 6300.32 samples/sec Loss 6.6078 LearningRate 0.0007 Epoch: 9 Global Step: 190860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:54,203-Speed 6314.49 samples/sec Loss 6.7550 LearningRate 0.0007 Epoch: 9 Global Step: 190870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:05:57,447-Speed 6314.15 samples/sec Loss 6.6610 LearningRate 0.0007 Epoch: 9 Global Step: 190880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:00,692-Speed 6312.37 samples/sec Loss 6.6721 LearningRate 0.0007 Epoch: 9 Global Step: 190890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:03,943-Speed 6301.18 samples/sec Loss 6.7141 LearningRate 0.0007 Epoch: 9 Global Step: 190900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:07,188-Speed 6312.22 samples/sec Loss 6.7523 LearningRate 0.0007 Epoch: 9 Global Step: 190910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:10,432-Speed 6314.34 samples/sec Loss 6.6348 LearningRate 0.0007 Epoch: 9 Global Step: 190920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:13,686-Speed 6295.02 samples/sec Loss 6.6790 LearningRate 0.0007 Epoch: 9 Global Step: 190930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:16,931-Speed 6313.33 samples/sec Loss 6.6095 LearningRate 0.0007 Epoch: 9 Global Step: 190940 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:06:20,163-Speed 6338.72 samples/sec Loss 6.7105 LearningRate 0.0007 Epoch: 9 Global Step: 190950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:23,414-Speed 6301.05 samples/sec Loss 6.6287 LearningRate 0.0007 Epoch: 9 Global Step: 190960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:26,661-Speed 6308.83 samples/sec Loss 
6.6762 LearningRate 0.0007 Epoch: 9 Global Step: 190970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:29,906-Speed 6312.30 samples/sec Loss 6.6394 LearningRate 0.0007 Epoch: 9 Global Step: 190980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:33,151-Speed 6312.84 samples/sec Loss 6.6456 LearningRate 0.0007 Epoch: 9 Global Step: 190990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:36,397-Speed 6310.87 samples/sec Loss 6.6945 LearningRate 0.0007 Epoch: 9 Global Step: 191000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:39,643-Speed 6309.15 samples/sec Loss 6.7915 LearningRate 0.0007 Epoch: 9 Global Step: 191010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:42,884-Speed 6321.33 samples/sec Loss 6.7473 LearningRate 0.0007 Epoch: 9 Global Step: 191020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:46,132-Speed 6307.14 samples/sec Loss 6.7122 LearningRate 0.0007 Epoch: 9 Global Step: 191030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:49,380-Speed 6307.22 samples/sec Loss 6.6753 LearningRate 0.0007 Epoch: 9 Global Step: 191040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:52,614-Speed 6335.37 samples/sec Loss 6.7723 LearningRate 0.0007 Epoch: 9 Global Step: 191050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:55,858-Speed 6314.23 samples/sec Loss 6.6961 LearningRate 0.0007 Epoch: 9 Global Step: 191060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:06:59,107-Speed 6304.33 samples/sec Loss 6.6852 LearningRate 0.0007 Epoch: 9 Global Step: 191070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:02,360-Speed 6297.80 samples/sec Loss 6.7102 LearningRate 0.0007 Epoch: 9 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:05,604-Speed 6315.44 samples/sec Loss 6.6161 LearningRate 0.0007 Epoch: 9 Global 
Step: 191090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:08,848-Speed 6313.73 samples/sec Loss 6.6759 LearningRate 0.0007 Epoch: 9 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:12,097-Speed 6305.87 samples/sec Loss 6.7167 LearningRate 0.0007 Epoch: 9 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:15,342-Speed 6311.81 samples/sec Loss 6.7118 LearningRate 0.0007 Epoch: 9 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:18,594-Speed 6299.19 samples/sec Loss 6.7353 LearningRate 0.0007 Epoch: 9 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:21,836-Speed 6318.43 samples/sec Loss 6.6600 LearningRate 0.0007 Epoch: 9 Global Step: 191140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:25,082-Speed 6311.04 samples/sec Loss 6.6090 LearningRate 0.0007 Epoch: 9 Global Step: 191150 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:07:28,317-Speed 6332.18 samples/sec Loss 6.6416 LearningRate 0.0007 Epoch: 9 Global Step: 191160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:31,568-Speed 6299.55 samples/sec Loss 6.6663 LearningRate 0.0007 Epoch: 9 Global Step: 191170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:34,815-Speed 6310.05 samples/sec Loss 6.7290 LearningRate 0.0007 Epoch: 9 Global Step: 191180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:38,059-Speed 6313.33 samples/sec Loss 6.6685 LearningRate 0.0007 Epoch: 9 Global Step: 191190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:41,306-Speed 6309.11 samples/sec Loss 6.6524 LearningRate 0.0007 Epoch: 9 Global Step: 191200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:44,552-Speed 6311.88 samples/sec Loss 6.6826 LearningRate 0.0007 Epoch: 9 Global Step: 191210 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:07:47,794-Speed 6317.15 samples/sec Loss 6.6230 LearningRate 0.0007 Epoch: 9 Global Step: 191220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:51,043-Speed 6306.15 samples/sec Loss 6.7504 LearningRate 0.0007 Epoch: 9 Global Step: 191230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:54,292-Speed 6305.64 samples/sec Loss 6.6976 LearningRate 0.0007 Epoch: 9 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:07:57,538-Speed 6310.82 samples/sec Loss 6.7368 LearningRate 0.0007 Epoch: 9 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:00,769-Speed 6339.34 samples/sec Loss 6.6777 LearningRate 0.0007 Epoch: 9 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:04,029-Speed 6282.85 samples/sec Loss 6.7207 LearningRate 0.0007 Epoch: 9 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:07,279-Speed 6303.12 samples/sec Loss 6.6552 LearningRate 0.0007 Epoch: 9 Global Step: 191280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:10,525-Speed 6311.04 samples/sec Loss 6.6565 LearningRate 0.0007 Epoch: 9 Global Step: 191290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:13,775-Speed 6303.81 samples/sec Loss 6.7214 LearningRate 0.0007 Epoch: 9 Global Step: 191300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:17,022-Speed 6307.40 samples/sec Loss 6.6418 LearningRate 0.0007 Epoch: 9 Global Step: 191310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:20,270-Speed 6306.65 samples/sec Loss 6.6695 LearningRate 0.0007 Epoch: 9 Global Step: 191320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:23,515-Speed 6313.69 samples/sec Loss 6.6433 LearningRate 0.0007 Epoch: 9 Global Step: 191330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:08:26,777-Speed 6279.26 samples/sec Loss 6.6716 LearningRate 0.0007 Epoch: 9 Global Step: 191340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:30,025-Speed 6306.46 samples/sec Loss 6.7346 LearningRate 0.0007 Epoch: 9 Global Step: 191350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:33,270-Speed 6313.85 samples/sec Loss 6.6460 LearningRate 0.0007 Epoch: 9 Global Step: 191360 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:08:36,519-Speed 6303.85 samples/sec Loss 6.6657 LearningRate 0.0007 Epoch: 9 Global Step: 191370 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:08:39,756-Speed 6327.84 samples/sec Loss 6.7387 LearningRate 0.0007 Epoch: 9 Global Step: 191380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:43,009-Speed 6298.01 samples/sec Loss 6.6866 LearningRate 0.0007 Epoch: 9 Global Step: 191390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:46,260-Speed 6301.55 samples/sec Loss 6.7052 LearningRate 0.0007 Epoch: 9 Global Step: 191400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:49,505-Speed 6312.88 samples/sec Loss 6.7800 LearningRate 0.0007 Epoch: 9 Global Step: 191410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:52,748-Speed 6314.88 samples/sec Loss 6.7169 LearningRate 0.0007 Epoch: 9 Global Step: 191420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:55,995-Speed 6308.74 samples/sec Loss 6.7094 LearningRate 0.0007 Epoch: 9 Global Step: 191430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:08:59,240-Speed 6314.02 samples/sec Loss 6.5620 LearningRate 0.0007 Epoch: 9 Global Step: 191440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:02,489-Speed 6304.73 samples/sec Loss 6.6776 LearningRate 0.0007 Epoch: 9 Global Step: 191450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:05,736-Speed 6308.93 samples/sec Loss 
6.7183 LearningRate 0.0007 Epoch: 9 Global Step: 191460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:08,984-Speed 6308.19 samples/sec Loss 6.7671 LearningRate 0.0007 Epoch: 9 Global Step: 191470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:12,232-Speed 6306.70 samples/sec Loss 6.8163 LearningRate 0.0007 Epoch: 9 Global Step: 191480 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:09:15,463-Speed 6338.59 samples/sec Loss 6.6601 LearningRate 0.0007 Epoch: 9 Global Step: 191490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:18,711-Speed 6308.38 samples/sec Loss 6.7876 LearningRate 0.0007 Epoch: 9 Global Step: 191500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:21,964-Speed 6295.37 samples/sec Loss 6.6138 LearningRate 0.0007 Epoch: 9 Global Step: 191510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:25,216-Speed 6300.29 samples/sec Loss 6.6035 LearningRate 0.0007 Epoch: 9 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:28,460-Speed 6314.09 samples/sec Loss 6.6817 LearningRate 0.0007 Epoch: 9 Global Step: 191530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:31,708-Speed 6306.13 samples/sec Loss 6.7393 LearningRate 0.0007 Epoch: 9 Global Step: 191540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:34,951-Speed 6316.91 samples/sec Loss 6.6959 LearningRate 0.0007 Epoch: 9 Global Step: 191550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:38,199-Speed 6306.29 samples/sec Loss 6.7636 LearningRate 0.0007 Epoch: 9 Global Step: 191560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:41,445-Speed 6311.06 samples/sec Loss 6.6538 LearningRate 0.0007 Epoch: 9 Global Step: 191570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:44,694-Speed 6306.53 samples/sec Loss 6.6690 LearningRate 0.0007 Epoch: 9 Global 
Step: 191580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:47,929-Speed 6331.66 samples/sec Loss 6.6772 LearningRate 0.0007 Epoch: 9 Global Step: 191590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:51,175-Speed 6310.85 samples/sec Loss 6.7584 LearningRate 0.0007 Epoch: 9 Global Step: 191600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:54,425-Speed 6301.16 samples/sec Loss 6.6980 LearningRate 0.0007 Epoch: 9 Global Step: 191610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:09:57,669-Speed 6314.99 samples/sec Loss 6.7150 LearningRate 0.0007 Epoch: 9 Global Step: 191620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:00,917-Speed 6307.67 samples/sec Loss 6.6824 LearningRate 0.0007 Epoch: 9 Global Step: 191630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:04,163-Speed 6309.88 samples/sec Loss 6.6681 LearningRate 0.0007 Epoch: 9 Global Step: 191640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:07,408-Speed 6314.85 samples/sec Loss 6.6564 LearningRate 0.0007 Epoch: 9 Global Step: 191650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:10,664-Speed 6289.85 samples/sec Loss 6.6415 LearningRate 0.0007 Epoch: 9 Global Step: 191660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:13,908-Speed 6315.43 samples/sec Loss 6.6674 LearningRate 0.0007 Epoch: 9 Global Step: 191670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:17,158-Speed 6303.67 samples/sec Loss 6.6483 LearningRate 0.0007 Epoch: 9 Global Step: 191680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:20,393-Speed 6332.22 samples/sec Loss 6.6394 LearningRate 0.0007 Epoch: 9 Global Step: 191690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:23,638-Speed 6312.05 samples/sec Loss 6.6685 LearningRate 0.0007 Epoch: 9 Global Step: 191700 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:10:26,892-Speed 6295.90 samples/sec Loss 6.7277 LearningRate 0.0007 Epoch: 9 Global Step: 191710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:30,136-Speed 6313.30 samples/sec Loss 6.6387 LearningRate 0.0007 Epoch: 9 Global Step: 191720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:33,388-Speed 6299.99 samples/sec Loss 6.6477 LearningRate 0.0007 Epoch: 9 Global Step: 191730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:36,633-Speed 6310.85 samples/sec Loss 6.7197 LearningRate 0.0007 Epoch: 9 Global Step: 191740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:39,883-Speed 6303.43 samples/sec Loss 6.6737 LearningRate 0.0007 Epoch: 9 Global Step: 191750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:43,129-Speed 6310.56 samples/sec Loss 6.6774 LearningRate 0.0007 Epoch: 9 Global Step: 191760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:46,373-Speed 6315.84 samples/sec Loss 6.6846 LearningRate 0.0007 Epoch: 9 Global Step: 191770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:49,622-Speed 6305.18 samples/sec Loss 6.6747 LearningRate 0.0007 Epoch: 9 Global Step: 191780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:52,869-Speed 6308.58 samples/sec Loss 6.6880 LearningRate 0.0007 Epoch: 9 Global Step: 191790 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:10:56,101-Speed 6336.51 samples/sec Loss 6.7067 LearningRate 0.0007 Epoch: 9 Global Step: 191800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:10:59,345-Speed 6315.10 samples/sec Loss 6.8190 LearningRate 0.0007 Epoch: 9 Global Step: 191810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:02,594-Speed 6305.94 samples/sec Loss 6.6888 LearningRate 0.0007 Epoch: 9 Global Step: 191820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:11:05,837-Speed 6315.89 samples/sec Loss 6.7604 LearningRate 0.0007 Epoch: 9 Global Step: 191830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:09,083-Speed 6310.15 samples/sec Loss 6.7697 LearningRate 0.0007 Epoch: 9 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:12,330-Speed 6309.15 samples/sec Loss 6.6575 LearningRate 0.0007 Epoch: 9 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:15,578-Speed 6307.69 samples/sec Loss 6.6409 LearningRate 0.0007 Epoch: 9 Global Step: 191860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:18,827-Speed 6305.42 samples/sec Loss 6.6489 LearningRate 0.0007 Epoch: 9 Global Step: 191870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:22,074-Speed 6308.24 samples/sec Loss 6.6825 LearningRate 0.0007 Epoch: 9 Global Step: 191880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:25,322-Speed 6307.01 samples/sec Loss 6.7306 LearningRate 0.0007 Epoch: 9 Global Step: 191890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:28,569-Speed 6308.81 samples/sec Loss 6.6582 LearningRate 0.0007 Epoch: 9 Global Step: 191900 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:11:31,811-Speed 6318.80 samples/sec Loss 6.7142 LearningRate 0.0007 Epoch: 9 Global Step: 191910 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:11:35,043-Speed 6338.30 samples/sec Loss 6.6273 LearningRate 0.0007 Epoch: 9 Global Step: 191920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:38,325-Speed 6241.57 samples/sec Loss 6.6532 LearningRate 0.0007 Epoch: 9 Global Step: 191930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:41,568-Speed 6315.20 samples/sec Loss 6.6690 LearningRate 0.0007 Epoch: 9 Global Step: 191940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:44,815-Speed 6310.26 samples/sec Loss 
6.6516 LearningRate 0.0007 Epoch: 9 Global Step: 191950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:48,057-Speed 6318.24 samples/sec Loss 6.7207 LearningRate 0.0007 Epoch: 9 Global Step: 191960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:51,306-Speed 6304.08 samples/sec Loss 6.6837 LearningRate 0.0007 Epoch: 9 Global Step: 191970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:54,555-Speed 6304.98 samples/sec Loss 6.6221 LearningRate 0.0007 Epoch: 9 Global Step: 191980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:11:57,813-Speed 6288.54 samples/sec Loss 6.7101 LearningRate 0.0007 Epoch: 9 Global Step: 191990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:01,072-Speed 6285.02 samples/sec Loss 6.7322 LearningRate 0.0007 Epoch: 9 Global Step: 192000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:04,323-Speed 6300.49 samples/sec Loss 6.6929 LearningRate 0.0007 Epoch: 9 Global Step: 192010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:07,574-Speed 6301.03 samples/sec Loss 6.7599 LearningRate 0.0007 Epoch: 9 Global Step: 192020 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:12:10,807-Speed 6337.07 samples/sec Loss 6.7338 LearningRate 0.0007 Epoch: 9 Global Step: 192030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:14,052-Speed 6311.52 samples/sec Loss 6.6528 LearningRate 0.0007 Epoch: 9 Global Step: 192040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:17,300-Speed 6307.89 samples/sec Loss 6.6430 LearningRate 0.0007 Epoch: 9 Global Step: 192050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:20,549-Speed 6304.73 samples/sec Loss 6.6244 LearningRate 0.0007 Epoch: 9 Global Step: 192060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:23,795-Speed 6311.98 samples/sec Loss 6.7178 LearningRate 0.0007 Epoch: 9 Global 
Step: 192070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:27,043-Speed 6306.90 samples/sec Loss 6.7111 LearningRate 0.0007 Epoch: 9 Global Step: 192080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:30,299-Speed 6290.02 samples/sec Loss 6.6893 LearningRate 0.0007 Epoch: 9 Global Step: 192090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:33,548-Speed 6305.18 samples/sec Loss 6.7526 LearningRate 0.0007 Epoch: 9 Global Step: 192100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:36,795-Speed 6308.27 samples/sec Loss 6.6648 LearningRate 0.0007 Epoch: 9 Global Step: 192110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:40,041-Speed 6311.91 samples/sec Loss 6.6159 LearningRate 0.0007 Epoch: 9 Global Step: 192120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:43,286-Speed 6312.63 samples/sec Loss 6.7447 LearningRate 0.0007 Epoch: 9 Global Step: 192130 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:12:46,537-Speed 6300.73 samples/sec Loss 6.7048 LearningRate 0.0007 Epoch: 9 Global Step: 192140 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:12:49,767-Speed 6342.66 samples/sec Loss 6.6933 LearningRate 0.0007 Epoch: 9 Global Step: 192150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:53,065-Speed 6209.76 samples/sec Loss 6.6870 LearningRate 0.0007 Epoch: 9 Global Step: 192160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:56,329-Speed 6277.87 samples/sec Loss 6.6754 LearningRate 0.0007 Epoch: 9 Global Step: 192170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:12:59,574-Speed 6312.47 samples/sec Loss 6.6962 LearningRate 0.0007 Epoch: 9 Global Step: 192180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:02,827-Speed 6297.03 samples/sec Loss 6.6679 LearningRate 0.0007 Epoch: 9 Global Step: 192190 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:13:06,071-Speed 6314.31 samples/sec Loss 6.6230 LearningRate 0.0007 Epoch: 9 Global Step: 192200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:09,320-Speed 6305.29 samples/sec Loss 6.6733 LearningRate 0.0007 Epoch: 9 Global Step: 192210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:12,569-Speed 6304.14 samples/sec Loss 6.6398 LearningRate 0.0007 Epoch: 9 Global Step: 192220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:15,815-Speed 6310.97 samples/sec Loss 6.7182 LearningRate 0.0007 Epoch: 9 Global Step: 192230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:19,062-Speed 6307.69 samples/sec Loss 6.6941 LearningRate 0.0007 Epoch: 9 Global Step: 192240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:22,308-Speed 6311.44 samples/sec Loss 6.7003 LearningRate 0.0007 Epoch: 9 Global Step: 192250 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:13:25,562-Speed 6295.47 samples/sec Loss 6.6940 LearningRate 0.0007 Epoch: 9 Global Step: 192260 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:13:28,811-Speed 6305.23 samples/sec Loss 6.7744 LearningRate 0.0007 Epoch: 9 Global Step: 192270 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:13:32,057-Speed 6311.51 samples/sec Loss 6.6987 LearningRate 0.0007 Epoch: 9 Global Step: 192280 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:13:35,303-Speed 6309.98 samples/sec Loss 6.7590 LearningRate 0.0007 Epoch: 9 Global Step: 192290 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:13:38,535-Speed 6338.39 samples/sec Loss 6.6397 LearningRate 0.0007 Epoch: 9 Global Step: 192300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:41,783-Speed 6308.06 samples/sec Loss 6.6584 LearningRate 0.0007 Epoch: 9 Global Step: 192310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:13:45,032-Speed 6304.45 samples/sec Loss 6.7470 LearningRate 0.0007 Epoch: 9 Global Step: 192320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:48,276-Speed 6313.38 samples/sec Loss 6.6191 LearningRate 0.0007 Epoch: 9 Global Step: 192330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:51,524-Speed 6307.16 samples/sec Loss 6.6658 LearningRate 0.0007 Epoch: 9 Global Step: 192340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:54,786-Speed 6280.15 samples/sec Loss 6.7241 LearningRate 0.0007 Epoch: 9 Global Step: 192350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:13:58,035-Speed 6305.47 samples/sec Loss 6.6177 LearningRate 0.0007 Epoch: 9 Global Step: 192360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:01,293-Speed 6287.63 samples/sec Loss 6.6513 LearningRate 0.0007 Epoch: 9 Global Step: 192370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:04,557-Speed 6275.33 samples/sec Loss 6.5823 LearningRate 0.0007 Epoch: 9 Global Step: 192380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:07,803-Speed 6310.71 samples/sec Loss 6.6963 LearningRate 0.0007 Epoch: 9 Global Step: 192390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:11,048-Speed 6311.46 samples/sec Loss 6.7195 LearningRate 0.0007 Epoch: 9 Global Step: 192400 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:14,299-Speed 6302.15 samples/sec Loss 6.6790 LearningRate 0.0007 Epoch: 9 Global Step: 192410 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:17,551-Speed 6297.76 samples/sec Loss 6.6706 LearningRate 0.0007 Epoch: 9 Global Step: 192420 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:20,801-Speed 6304.39 samples/sec Loss 6.6671 LearningRate 0.0007 Epoch: 9 Global Step: 192430 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:24,042-Speed 6319.48 samples/sec Loss 
6.5910 LearningRate 0.0007 Epoch: 9 Global Step: 192440 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:27,289-Speed 6309.44 samples/sec Loss 6.6611 LearningRate 0.0007 Epoch: 9 Global Step: 192450 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:30,535-Speed 6310.14 samples/sec Loss 6.7206 LearningRate 0.0007 Epoch: 9 Global Step: 192460 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:33,783-Speed 6306.88 samples/sec Loss 6.5747 LearningRate 0.0007 Epoch: 9 Global Step: 192470 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:14:37,017-Speed 6334.95 samples/sec Loss 6.6363 LearningRate 0.0007 Epoch: 9 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:40,266-Speed 6304.98 samples/sec Loss 6.6878 LearningRate 0.0007 Epoch: 9 Global Step: 192490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:43,516-Speed 6303.16 samples/sec Loss 6.7017 LearningRate 0.0007 Epoch: 9 Global Step: 192500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:46,764-Speed 6307.28 samples/sec Loss 6.7473 LearningRate 0.0007 Epoch: 9 Global Step: 192510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:50,009-Speed 6312.86 samples/sec Loss 6.6745 LearningRate 0.0007 Epoch: 9 Global Step: 192520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:53,256-Speed 6308.79 samples/sec Loss 6.7058 LearningRate 0.0007 Epoch: 9 Global Step: 192530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:56,498-Speed 6317.78 samples/sec Loss 6.6852 LearningRate 0.0007 Epoch: 9 Global Step: 192540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:14:59,753-Speed 6293.64 samples/sec Loss 6.6452 LearningRate 0.0007 Epoch: 9 Global Step: 192550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:03,002-Speed 6303.85 samples/sec Loss 6.7259 LearningRate 0.0007 Epoch: 9 Global 
Step: 192560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:06,251-Speed 6305.17 samples/sec Loss 6.6298 LearningRate 0.0007 Epoch: 9 Global Step: 192570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:09,486-Speed 6332.97 samples/sec Loss 6.6621 LearningRate 0.0007 Epoch: 9 Global Step: 192580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:12,729-Speed 6315.54 samples/sec Loss 6.7001 LearningRate 0.0007 Epoch: 9 Global Step: 192590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:15,980-Speed 6302.06 samples/sec Loss 6.6734 LearningRate 0.0007 Epoch: 9 Global Step: 192600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:19,223-Speed 6315.75 samples/sec Loss 6.6912 LearningRate 0.0007 Epoch: 9 Global Step: 192610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:22,468-Speed 6312.62 samples/sec Loss 6.6599 LearningRate 0.0007 Epoch: 9 Global Step: 192620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:25,715-Speed 6309.50 samples/sec Loss 6.7049 LearningRate 0.0007 Epoch: 9 Global Step: 192630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:28,967-Speed 6299.38 samples/sec Loss 6.6375 LearningRate 0.0007 Epoch: 9 Global Step: 192640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:32,216-Speed 6304.77 samples/sec Loss 6.6701 LearningRate 0.0007 Epoch: 9 Global Step: 192650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:35,467-Speed 6301.24 samples/sec Loss 6.7198 LearningRate 0.0007 Epoch: 9 Global Step: 192660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:38,713-Speed 6309.33 samples/sec Loss 6.6746 LearningRate 0.0007 Epoch: 9 Global Step: 192670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:41,962-Speed 6306.25 samples/sec Loss 6.6356 LearningRate 0.0007 Epoch: 9 Global Step: 192680 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 09:15:45,204-Speed 6318.53 samples/sec Loss 6.6679 LearningRate 0.0007 Epoch: 9 Global Step: 192690 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:15:48,437-Speed 6336.78 samples/sec Loss 6.7620 LearningRate 0.0007 Epoch: 9 Global Step: 192700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:51,684-Speed 6307.28 samples/sec Loss 6.7197 LearningRate 0.0007 Epoch: 9 Global Step: 192710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:54,927-Speed 6316.90 samples/sec Loss 6.7132 LearningRate 0.0007 Epoch: 9 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:15:58,178-Speed 6301.77 samples/sec Loss 6.6430 LearningRate 0.0007 Epoch: 9 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:01,440-Speed 6279.72 samples/sec Loss 6.6942 LearningRate 0.0007 Epoch: 9 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:04,693-Speed 6296.53 samples/sec Loss 6.7212 LearningRate 0.0007 Epoch: 9 Global Step: 192750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:07,944-Speed 6302.63 samples/sec Loss 6.7046 LearningRate 0.0007 Epoch: 9 Global Step: 192760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:11,194-Speed 6302.77 samples/sec Loss 6.6525 LearningRate 0.0007 Epoch: 9 Global Step: 192770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:14,438-Speed 6312.85 samples/sec Loss 6.6940 LearningRate 0.0007 Epoch: 9 Global Step: 192780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:17,687-Speed 6304.76 samples/sec Loss 6.7006 LearningRate 0.0007 Epoch: 9 Global Step: 192790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:20,935-Speed 6307.08 samples/sec Loss 6.6909 LearningRate 0.0007 Epoch: 9 Global Step: 192800 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 
09:16:24,180-Speed 6312.85 samples/sec Loss 6.6104 LearningRate 0.0007 Epoch: 9 Global Step: 192810 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:16:27,414-Speed 6335.51 samples/sec Loss 6.6665 LearningRate 0.0007 Epoch: 9 Global Step: 192820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:30,661-Speed 6307.49 samples/sec Loss 6.7012 LearningRate 0.0007 Epoch: 9 Global Step: 192830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:33,919-Speed 6288.05 samples/sec Loss 6.6832 LearningRate 0.0007 Epoch: 9 Global Step: 192840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:37,170-Speed 6299.67 samples/sec Loss 6.7554 LearningRate 0.0007 Epoch: 9 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:40,419-Speed 6307.09 samples/sec Loss 6.7102 LearningRate 0.0007 Epoch: 9 Global Step: 192860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:43,665-Speed 6308.67 samples/sec Loss 6.5772 LearningRate 0.0007 Epoch: 9 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:46,945-Speed 6246.03 samples/sec Loss 6.6395 LearningRate 0.0007 Epoch: 9 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:50,197-Speed 6298.83 samples/sec Loss 6.5594 LearningRate 0.0007 Epoch: 9 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:53,454-Speed 6289.53 samples/sec Loss 6.6306 LearningRate 0.0007 Epoch: 9 Global Step: 192900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:56,701-Speed 6312.22 samples/sec Loss 6.6596 LearningRate 0.0007 Epoch: 9 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:16:59,948-Speed 6309.32 samples/sec Loss 6.6918 LearningRate 0.0007 Epoch: 9 Global Step: 192920 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:03,209-Speed 6281.36 samples/sec Loss 
6.6925 LearningRate 0.0007 Epoch: 9 Global Step: 192930 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:06,459-Speed 6303.39 samples/sec Loss 6.6903 LearningRate 0.0007 Epoch: 9 Global Step: 192940 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:09,715-Speed 6291.61 samples/sec Loss 6.7493 LearningRate 0.0007 Epoch: 9 Global Step: 192950 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:13,038-Speed 6164.65 samples/sec Loss 6.6702 LearningRate 0.0007 Epoch: 9 Global Step: 192960 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:16,284-Speed 6310.79 samples/sec Loss 6.6947 LearningRate 0.0007 Epoch: 9 Global Step: 192970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:19,559-Speed 6255.43 samples/sec Loss 6.6311 LearningRate 0.0007 Epoch: 9 Global Step: 192980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:22,807-Speed 6305.85 samples/sec Loss 6.7603 LearningRate 0.0007 Epoch: 9 Global Step: 192990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:26,053-Speed 6310.43 samples/sec Loss 6.7428 LearningRate 0.0007 Epoch: 9 Global Step: 193000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:29,300-Speed 6308.74 samples/sec Loss 6.6909 LearningRate 0.0007 Epoch: 9 Global Step: 193010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:32,565-Speed 6275.02 samples/sec Loss 6.6483 LearningRate 0.0007 Epoch: 9 Global Step: 193020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:35,811-Speed 6308.98 samples/sec Loss 6.6894 LearningRate 0.0007 Epoch: 9 Global Step: 193030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:39,063-Speed 6300.23 samples/sec Loss 6.6013 LearningRate 0.0007 Epoch: 9 Global Step: 193040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:42,314-Speed 6300.91 samples/sec Loss 6.5782 LearningRate 0.0007 Epoch: 9 Global 
Step: 193050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:45,563-Speed 6304.07 samples/sec Loss 6.6553 LearningRate 0.0007 Epoch: 9 Global Step: 193060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:48,809-Speed 6311.07 samples/sec Loss 6.6120 LearningRate 0.0007 Epoch: 9 Global Step: 193070 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:17:52,041-Speed 6338.00 samples/sec Loss 6.6550 LearningRate 0.0007 Epoch: 9 Global Step: 193080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:55,288-Speed 6309.98 samples/sec Loss 6.7337 LearningRate 0.0007 Epoch: 9 Global Step: 193090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:17:58,531-Speed 6314.96 samples/sec Loss 6.6420 LearningRate 0.0007 Epoch: 9 Global Step: 193100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:01,812-Speed 6244.80 samples/sec Loss 6.6072 LearningRate 0.0007 Epoch: 9 Global Step: 193110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:05,074-Speed 6279.90 samples/sec Loss 6.6099 LearningRate 0.0007 Epoch: 9 Global Step: 193120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:08,328-Speed 6296.62 samples/sec Loss 6.6746 LearningRate 0.0007 Epoch: 9 Global Step: 193130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:11,575-Speed 6307.00 samples/sec Loss 6.6963 LearningRate 0.0007 Epoch: 9 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:14,823-Speed 6307.86 samples/sec Loss 6.6992 LearningRate 0.0007 Epoch: 9 Global Step: 193150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:18,072-Speed 6304.30 samples/sec Loss 6.6353 LearningRate 0.0007 Epoch: 9 Global Step: 193160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:21,321-Speed 6304.40 samples/sec Loss 6.6690 LearningRate 0.0007 Epoch: 9 Global Step: 193170 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:18:24,574-Speed 6297.85 samples/sec Loss 6.6496 LearningRate 0.0007 Epoch: 9 Global Step: 193180 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:18:27,818-Speed 6314.25 samples/sec Loss 6.7431 LearningRate 0.0007 Epoch: 9 Global Step: 193190 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:18:31,066-Speed 6306.84 samples/sec Loss 6.6834 LearningRate 0.0007 Epoch: 9 Global Step: 193200 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:18:34,300-Speed 6333.73 samples/sec Loss 6.6714 LearningRate 0.0007 Epoch: 9 Global Step: 193210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:37,567-Speed 6271.81 samples/sec Loss 6.6340 LearningRate 0.0007 Epoch: 9 Global Step: 193220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:40,816-Speed 6303.14 samples/sec Loss 6.5790 LearningRate 0.0007 Epoch: 9 Global Step: 193230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:44,064-Speed 6308.01 samples/sec Loss 6.7079 LearningRate 0.0007 Epoch: 9 Global Step: 193240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:47,311-Speed 6307.06 samples/sec Loss 6.7079 LearningRate 0.0007 Epoch: 9 Global Step: 193250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:50,566-Speed 6293.71 samples/sec Loss 6.6849 LearningRate 0.0007 Epoch: 9 Global Step: 193260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:53,815-Speed 6306.13 samples/sec Loss 6.6031 LearningRate 0.0007 Epoch: 9 Global Step: 193270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:18:57,060-Speed 6312.18 samples/sec Loss 6.6809 LearningRate 0.0007 Epoch: 9 Global Step: 193280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:00,302-Speed 6317.37 samples/sec Loss 6.6520 LearningRate 0.0007 Epoch: 9 Global Step: 193290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:19:03,553-Speed 6302.13 samples/sec Loss 6.6375 LearningRate 0.0007 Epoch: 9 Global Step: 193300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:06,784-Speed 6339.86 samples/sec Loss 6.6320 LearningRate 0.0007 Epoch: 9 Global Step: 193310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:10,031-Speed 6308.65 samples/sec Loss 6.7443 LearningRate 0.0007 Epoch: 9 Global Step: 193320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:13,280-Speed 6305.46 samples/sec Loss 6.6878 LearningRate 0.0007 Epoch: 9 Global Step: 193330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:16,528-Speed 6306.54 samples/sec Loss 6.6911 LearningRate 0.0007 Epoch: 9 Global Step: 193340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:19,781-Speed 6297.30 samples/sec Loss 6.7087 LearningRate 0.0007 Epoch: 9 Global Step: 193350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:23,025-Speed 6314.34 samples/sec Loss 6.6702 LearningRate 0.0007 Epoch: 9 Global Step: 193360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:26,284-Speed 6285.63 samples/sec Loss 6.7002 LearningRate 0.0007 Epoch: 9 Global Step: 193370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:29,532-Speed 6306.96 samples/sec Loss 6.6269 LearningRate 0.0007 Epoch: 9 Global Step: 193380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:32,777-Speed 6312.88 samples/sec Loss 6.6552 LearningRate 0.0007 Epoch: 9 Global Step: 193390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:36,038-Speed 6282.54 samples/sec Loss 6.6962 LearningRate 0.0007 Epoch: 9 Global Step: 193400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:39,286-Speed 6306.20 samples/sec Loss 6.5949 LearningRate 0.0007 Epoch: 9 Global Step: 193410 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:19:42,520-Speed 6333.71 samples/sec Loss 
6.6361 LearningRate 0.0007 Epoch: 9 Global Step: 193420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:45,764-Speed 6315.66 samples/sec Loss 6.6312 LearningRate 0.0007 Epoch: 9 Global Step: 193430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:49,008-Speed 6314.71 samples/sec Loss 6.7535 LearningRate 0.0007 Epoch: 9 Global Step: 193440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:52,256-Speed 6306.06 samples/sec Loss 6.6964 LearningRate 0.0007 Epoch: 9 Global Step: 193450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:55,504-Speed 6307.38 samples/sec Loss 6.5995 LearningRate 0.0007 Epoch: 9 Global Step: 193460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:19:58,752-Speed 6306.72 samples/sec Loss 6.7219 LearningRate 0.0007 Epoch: 9 Global Step: 193470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:01,996-Speed 6312.96 samples/sec Loss 6.6249 LearningRate 0.0007 Epoch: 9 Global Step: 193480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:05,241-Speed 6313.19 samples/sec Loss 6.6064 LearningRate 0.0007 Epoch: 9 Global Step: 193490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:08,485-Speed 6314.43 samples/sec Loss 6.7097 LearningRate 0.0007 Epoch: 9 Global Step: 193500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:11,730-Speed 6313.15 samples/sec Loss 6.6861 LearningRate 0.0007 Epoch: 9 Global Step: 193510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:14,977-Speed 6308.19 samples/sec Loss 6.6585 LearningRate 0.0007 Epoch: 9 Global Step: 193520 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:20:18,225-Speed 6307.83 samples/sec Loss 6.6184 LearningRate 0.0007 Epoch: 9 Global Step: 193530 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:20:21,473-Speed 6307.23 samples/sec Loss 6.6964 LearningRate 0.0007 Epoch: 9 Global 
Step: 193540 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:20:24,724-Speed 6301.69 samples/sec Loss 6.6425 LearningRate 0.0007 Epoch: 9 Global Step: 193550 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:20:27,970-Speed 6310.29 samples/sec Loss 6.7092 LearningRate 0.0007 Epoch: 9 Global Step: 193560 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:20:31,205-Speed 6332.39 samples/sec Loss 6.7651 LearningRate 0.0007 Epoch: 9 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:34,459-Speed 6294.49 samples/sec Loss 6.6535 LearningRate 0.0007 Epoch: 9 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:37,705-Speed 6312.11 samples/sec Loss 6.6472 LearningRate 0.0007 Epoch: 9 Global Step: 193590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:40,953-Speed 6304.85 samples/sec Loss 6.7252 LearningRate 0.0007 Epoch: 9 Global Step: 193600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:44,202-Speed 6305.40 samples/sec Loss 6.7680 LearningRate 0.0007 Epoch: 9 Global Step: 193610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:47,445-Speed 6316.73 samples/sec Loss 6.7109 LearningRate 0.0007 Epoch: 9 Global Step: 193620 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:50,694-Speed 6305.24 samples/sec Loss 6.6827 LearningRate 0.0007 Epoch: 9 Global Step: 193630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:53,940-Speed 6310.40 samples/sec Loss 6.6589 LearningRate 0.0007 Epoch: 9 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:20:57,186-Speed 6311.87 samples/sec Loss 6.5961 LearningRate 0.0007 Epoch: 9 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:00,436-Speed 6303.07 samples/sec Loss 6.7550 LearningRate 0.0007 Epoch: 9 Global Step: 193660 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:21:03,684-Speed 6306.48 samples/sec Loss 6.6295 LearningRate 0.0007 Epoch: 9 Global Step: 193670 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:21:06,936-Speed 6298.85 samples/sec Loss 6.6859 LearningRate 0.0007 Epoch: 9 Global Step: 193680 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:21:10,164-Speed 6345.56 samples/sec Loss 6.7259 LearningRate 0.0007 Epoch: 9 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:13,408-Speed 6315.16 samples/sec Loss 6.6669 LearningRate 0.0007 Epoch: 9 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:16,657-Speed 6303.76 samples/sec Loss 6.6544 LearningRate 0.0007 Epoch: 9 Global Step: 193710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:19,903-Speed 6311.28 samples/sec Loss 6.6800 LearningRate 0.0007 Epoch: 9 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:23,151-Speed 6307.01 samples/sec Loss 6.6716 LearningRate 0.0007 Epoch: 9 Global Step: 193730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:26,397-Speed 6310.43 samples/sec Loss 6.6498 LearningRate 0.0007 Epoch: 9 Global Step: 193740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:29,648-Speed 6301.77 samples/sec Loss 6.6469 LearningRate 0.0007 Epoch: 9 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:32,906-Speed 6286.46 samples/sec Loss 6.6649 LearningRate 0.0007 Epoch: 9 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:36,154-Speed 6307.53 samples/sec Loss 6.6777 LearningRate 0.0007 Epoch: 9 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:39,404-Speed 6303.63 samples/sec Loss 6.7008 LearningRate 0.0007 Epoch: 9 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:21:42,652-Speed 6306.19 samples/sec Loss 6.6864 LearningRate 0.0007 Epoch: 9 Global Step: 193790 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:21:45,888-Speed 6329.52 samples/sec Loss 6.6105 LearningRate 0.0007 Epoch: 9 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:49,139-Speed 6301.98 samples/sec Loss 6.6881 LearningRate 0.0007 Epoch: 9 Global Step: 193810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:52,386-Speed 6308.34 samples/sec Loss 6.6195 LearningRate 0.0007 Epoch: 9 Global Step: 193820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:55,631-Speed 6313.66 samples/sec Loss 6.7125 LearningRate 0.0007 Epoch: 9 Global Step: 193830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:21:58,876-Speed 6311.35 samples/sec Loss 6.7199 LearningRate 0.0007 Epoch: 9 Global Step: 193840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:02,125-Speed 6305.10 samples/sec Loss 6.7089 LearningRate 0.0007 Epoch: 9 Global Step: 193850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:05,372-Speed 6309.34 samples/sec Loss 6.6391 LearningRate 0.0007 Epoch: 9 Global Step: 193860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:08,618-Speed 6309.67 samples/sec Loss 6.7613 LearningRate 0.0007 Epoch: 9 Global Step: 193870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:11,868-Speed 6302.47 samples/sec Loss 6.7328 LearningRate 0.0007 Epoch: 9 Global Step: 193880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:15,117-Speed 6305.30 samples/sec Loss 6.6947 LearningRate 0.0007 Epoch: 9 Global Step: 193890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:18,368-Speed 6300.63 samples/sec Loss 6.6641 LearningRate 0.0007 Epoch: 9 Global Step: 193900 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:22:21,604-Speed 6331.95 samples/sec Loss 
6.6859 LearningRate 0.0007 Epoch: 9 Global Step: 193910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:24,848-Speed 6314.39 samples/sec Loss 6.6203 LearningRate 0.0007 Epoch: 9 Global Step: 193920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:28,097-Speed 6304.80 samples/sec Loss 6.6986 LearningRate 0.0007 Epoch: 9 Global Step: 193930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:31,344-Speed 6309.33 samples/sec Loss 6.6742 LearningRate 0.0007 Epoch: 9 Global Step: 193940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:34,591-Speed 6308.42 samples/sec Loss 6.6108 LearningRate 0.0007 Epoch: 9 Global Step: 193950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:37,843-Speed 6299.68 samples/sec Loss 6.7110 LearningRate 0.0007 Epoch: 9 Global Step: 193960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:41,093-Speed 6301.89 samples/sec Loss 6.6868 LearningRate 0.0007 Epoch: 9 Global Step: 193970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:44,346-Speed 6298.76 samples/sec Loss 6.7030 LearningRate 0.0007 Epoch: 9 Global Step: 193980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:47,593-Speed 6308.75 samples/sec Loss 6.6687 LearningRate 0.0007 Epoch: 9 Global Step: 193990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:50,839-Speed 6310.53 samples/sec Loss 6.6440 LearningRate 0.0007 Epoch: 9 Global Step: 194000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:22:54,086-Speed 6307.91 samples/sec Loss 6.5860 LearningRate 0.0007 Epoch: 9 Global Step: 194010 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:22:57,333-Speed 6308.40 samples/sec Loss 6.6292 LearningRate 0.0007 Epoch: 9 Global Step: 194020 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:23:00,573-Speed 6323.16 samples/sec Loss 6.6306 LearningRate 0.0007 Epoch: 9 Global 
Step: 194030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:03,817-Speed 6314.75 samples/sec Loss 6.6510 LearningRate 0.0007 Epoch: 9 Global Step: 194040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:07,063-Speed 6310.63 samples/sec Loss 6.6473 LearningRate 0.0007 Epoch: 9 Global Step: 194050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:10,309-Speed 6310.28 samples/sec Loss 6.6411 LearningRate 0.0007 Epoch: 9 Global Step: 194060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:13,560-Speed 6300.60 samples/sec Loss 6.6174 LearningRate 0.0007 Epoch: 9 Global Step: 194070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:16,808-Speed 6306.31 samples/sec Loss 6.6588 LearningRate 0.0007 Epoch: 9 Global Step: 194080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:20,056-Speed 6307.39 samples/sec Loss 6.7319 LearningRate 0.0007 Epoch: 9 Global Step: 194090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:23,305-Speed 6305.50 samples/sec Loss 6.7305 LearningRate 0.0007 Epoch: 9 Global Step: 194100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:26,552-Speed 6307.64 samples/sec Loss 6.6480 LearningRate 0.0007 Epoch: 9 Global Step: 194110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:29,804-Speed 6298.97 samples/sec Loss 6.6241 LearningRate 0.0007 Epoch: 9 Global Step: 194120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:33,049-Speed 6312.94 samples/sec Loss 6.7047 LearningRate 0.0007 Epoch: 9 Global Step: 194130 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:23:36,287-Speed 6327.53 samples/sec Loss 6.6352 LearningRate 0.0007 Epoch: 9 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:39,532-Speed 6311.71 samples/sec Loss 6.6631 LearningRate 0.0007 Epoch: 9 Global Step: 194150 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:23:42,777-Speed 6311.76 samples/sec Loss 6.5618 LearningRate 0.0007 Epoch: 9 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:46,027-Speed 6304.77 samples/sec Loss 6.6197 LearningRate 0.0007 Epoch: 9 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:49,273-Speed 6310.53 samples/sec Loss 6.7598 LearningRate 0.0007 Epoch: 9 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:52,521-Speed 6306.82 samples/sec Loss 6.7249 LearningRate 0.0007 Epoch: 9 Global Step: 194190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:55,769-Speed 6307.17 samples/sec Loss 6.5781 LearningRate 0.0007 Epoch: 9 Global Step: 194200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:23:59,014-Speed 6312.96 samples/sec Loss 6.6649 LearningRate 0.0007 Epoch: 9 Global Step: 194210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:02,264-Speed 6303.49 samples/sec Loss 6.7225 LearningRate 0.0007 Epoch: 9 Global Step: 194220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:05,512-Speed 6306.09 samples/sec Loss 6.6272 LearningRate 0.0007 Epoch: 9 Global Step: 194230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:08,758-Speed 6309.74 samples/sec Loss 6.5473 LearningRate 0.0007 Epoch: 9 Global Step: 194240 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:24:11,993-Speed 6331.63 samples/sec Loss 6.6619 LearningRate 0.0007 Epoch: 9 Global Step: 194250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:15,238-Speed 6312.85 samples/sec Loss 6.6237 LearningRate 0.0007 Epoch: 9 Global Step: 194260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:18,486-Speed 6308.37 samples/sec Loss 6.6573 LearningRate 0.0007 Epoch: 9 Global Step: 194270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:24:21,730-Speed 6314.01 samples/sec Loss 6.6173 LearningRate 0.0007 Epoch: 9 Global Step: 194280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:24,980-Speed 6302.57 samples/sec Loss 6.7330 LearningRate 0.0007 Epoch: 9 Global Step: 194290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:28,226-Speed 6311.26 samples/sec Loss 6.6127 LearningRate 0.0007 Epoch: 9 Global Step: 194300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:31,471-Speed 6312.49 samples/sec Loss 6.7434 LearningRate 0.0007 Epoch: 9 Global Step: 194310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:34,719-Speed 6306.91 samples/sec Loss 6.6519 LearningRate 0.0007 Epoch: 9 Global Step: 194320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:37,967-Speed 6306.49 samples/sec Loss 6.7035 LearningRate 0.0007 Epoch: 9 Global Step: 194330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:41,214-Speed 6309.02 samples/sec Loss 6.7052 LearningRate 0.0007 Epoch: 9 Global Step: 194340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:24:44,460-Speed 6311.33 samples/sec Loss 6.6177 LearningRate 0.0007 Epoch: 9 Global Step: 194350 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:24:47,707-Speed 6307.94 samples/sec Loss 6.6488 LearningRate 0.0007 Epoch: 9 Global Step: 194360 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:24:50,957-Speed 6303.70 samples/sec Loss 6.6349 LearningRate 0.0007 Epoch: 9 Global Step: 194370 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:24:54,207-Speed 6302.78 samples/sec Loss 6.6343 LearningRate 0.0007 Epoch: 9 Global Step: 194380 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:24:57,453-Speed 6310.48 samples/sec Loss 6.5827 LearningRate 0.0007 Epoch: 9 Global Step: 194390 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:25:00,683-Speed 6341.08 samples/sec Loss 
6.7117 LearningRate 0.0007 Epoch: 9 Global Step: 194400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:03,933-Speed 6304.48 samples/sec Loss 6.6057 LearningRate 0.0007 Epoch: 9 Global Step: 194410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:07,176-Speed 6317.36 samples/sec Loss 6.5461 LearningRate 0.0007 Epoch: 9 Global Step: 194420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:10,421-Speed 6311.75 samples/sec Loss 6.6406 LearningRate 0.0007 Epoch: 9 Global Step: 194430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:13,668-Speed 6308.22 samples/sec Loss 6.6326 LearningRate 0.0007 Epoch: 9 Global Step: 194440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:16,914-Speed 6311.29 samples/sec Loss 6.6758 LearningRate 0.0007 Epoch: 9 Global Step: 194450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:20,160-Speed 6310.26 samples/sec Loss 6.6198 LearningRate 0.0007 Epoch: 9 Global Step: 194460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:23,407-Speed 6309.25 samples/sec Loss 6.7396 LearningRate 0.0007 Epoch: 9 Global Step: 194470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:26,659-Speed 6298.87 samples/sec Loss 6.6249 LearningRate 0.0007 Epoch: 9 Global Step: 194480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:29,904-Speed 6313.29 samples/sec Loss 6.7116 LearningRate 0.0007 Epoch: 9 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:33,141-Speed 6327.95 samples/sec Loss 6.6021 LearningRate 0.0007 Epoch: 9 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:36,390-Speed 6304.04 samples/sec Loss 6.6355 LearningRate 0.0007 Epoch: 9 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:39,641-Speed 6301.69 samples/sec Loss 6.6525 LearningRate 0.0007 Epoch: 9 Global 
Step: 194520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:42,885-Speed 6314.18 samples/sec Loss 6.6296 LearningRate 0.0007 Epoch: 9 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:46,131-Speed 6311.37 samples/sec Loss 6.6752 LearningRate 0.0007 Epoch: 9 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:49,380-Speed 6303.48 samples/sec Loss 6.6086 LearningRate 0.0007 Epoch: 9 Global Step: 194550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:52,627-Speed 6309.70 samples/sec Loss 6.6800 LearningRate 0.0007 Epoch: 9 Global Step: 194560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:55,875-Speed 6307.43 samples/sec Loss 6.6640 LearningRate 0.0007 Epoch: 9 Global Step: 194570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:25:59,123-Speed 6306.89 samples/sec Loss 6.7264 LearningRate 0.0007 Epoch: 9 Global Step: 194580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:02,367-Speed 6314.09 samples/sec Loss 6.5765 LearningRate 0.0007 Epoch: 9 Global Step: 194590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:05,684-Speed 6176.62 samples/sec Loss 6.6841 LearningRate 0.0007 Epoch: 9 Global Step: 194600 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:26:08,926-Speed 6318.24 samples/sec Loss 6.6084 LearningRate 0.0007 Epoch: 9 Global Step: 194610 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:26:12,175-Speed 6304.08 samples/sec Loss 6.8327 LearningRate 0.0007 Epoch: 9 Global Step: 194620 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:26:15,426-Speed 6301.47 samples/sec Loss 6.6140 LearningRate 0.0007 Epoch: 9 Global Step: 194630 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:26:18,675-Speed 6304.92 samples/sec Loss 6.6665 LearningRate 0.0007 Epoch: 9 Global Step: 194640 Fp16 Grad Scale: 65536 
Required: 58 hours Training: 2022-04-01 09:26:21,912-Speed 6328.22 samples/sec Loss 6.6084 LearningRate 0.0007 Epoch: 9 Global Step: 194650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:25,157-Speed 6313.92 samples/sec Loss 6.7146 LearningRate 0.0007 Epoch: 9 Global Step: 194660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:28,406-Speed 6305.23 samples/sec Loss 6.7019 LearningRate 0.0007 Epoch: 9 Global Step: 194670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:31,658-Speed 6297.72 samples/sec Loss 6.6553 LearningRate 0.0007 Epoch: 9 Global Step: 194680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:34,905-Speed 6309.07 samples/sec Loss 6.7437 LearningRate 0.0007 Epoch: 9 Global Step: 194690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:38,155-Speed 6303.81 samples/sec Loss 6.7223 LearningRate 0.0007 Epoch: 9 Global Step: 194700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:41,398-Speed 6315.14 samples/sec Loss 6.6346 LearningRate 0.0007 Epoch: 9 Global Step: 194710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:44,646-Speed 6306.90 samples/sec Loss 6.6542 LearningRate 0.0007 Epoch: 9 Global Step: 194720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:47,890-Speed 6314.88 samples/sec Loss 6.7456 LearningRate 0.0007 Epoch: 9 Global Step: 194730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:51,137-Speed 6308.13 samples/sec Loss 6.7005 LearningRate 0.0007 Epoch: 9 Global Step: 194740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:26:54,381-Speed 6314.96 samples/sec Loss 6.6395 LearningRate 0.0007 Epoch: 9 Global Step: 194750 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:26:57,611-Speed 6341.73 samples/sec Loss 6.6014 LearningRate 0.0007 Epoch: 9 Global Step: 194760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:27:00,859-Speed 6306.74 samples/sec Loss 6.6834 LearningRate 0.0007 Epoch: 9 Global Step: 194770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:04,107-Speed 6308.31 samples/sec Loss 6.6416 LearningRate 0.0007 Epoch: 9 Global Step: 194780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:07,353-Speed 6310.49 samples/sec Loss 6.6624 LearningRate 0.0007 Epoch: 9 Global Step: 194790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:10,595-Speed 6317.16 samples/sec Loss 6.6189 LearningRate 0.0007 Epoch: 9 Global Step: 194800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:13,841-Speed 6312.05 samples/sec Loss 6.6355 LearningRate 0.0007 Epoch: 9 Global Step: 194810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:17,088-Speed 6307.50 samples/sec Loss 6.5639 LearningRate 0.0007 Epoch: 9 Global Step: 194820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:20,335-Speed 6308.41 samples/sec Loss 6.6508 LearningRate 0.0007 Epoch: 9 Global Step: 194830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:23,586-Speed 6303.03 samples/sec Loss 6.6325 LearningRate 0.0007 Epoch: 9 Global Step: 194840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:26,834-Speed 6305.96 samples/sec Loss 6.6662 LearningRate 0.0007 Epoch: 9 Global Step: 194850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:30,089-Speed 6293.93 samples/sec Loss 6.5860 LearningRate 0.0007 Epoch: 9 Global Step: 194860 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:27:33,327-Speed 6326.56 samples/sec Loss 6.6918 LearningRate 0.0007 Epoch: 9 Global Step: 194870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:36,573-Speed 6310.17 samples/sec Loss 6.6888 LearningRate 0.0007 Epoch: 9 Global Step: 194880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:39,823-Speed 6302.78 samples/sec Loss 
6.6062 LearningRate 0.0007 Epoch: 9 Global Step: 194890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:43,074-Speed 6300.19 samples/sec Loss 6.6805 LearningRate 0.0007 Epoch: 9 Global Step: 194900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:46,318-Speed 6315.43 samples/sec Loss 6.6377 LearningRate 0.0007 Epoch: 9 Global Step: 194910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:49,564-Speed 6310.31 samples/sec Loss 6.6500 LearningRate 0.0007 Epoch: 9 Global Step: 194920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:52,811-Speed 6308.88 samples/sec Loss 6.6215 LearningRate 0.0007 Epoch: 9 Global Step: 194930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:56,058-Speed 6308.13 samples/sec Loss 6.6417 LearningRate 0.0007 Epoch: 9 Global Step: 194940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:27:59,305-Speed 6310.39 samples/sec Loss 6.6317 LearningRate 0.0007 Epoch: 9 Global Step: 194950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:02,551-Speed 6309.63 samples/sec Loss 6.6863 LearningRate 0.0007 Epoch: 9 Global Step: 194960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:05,797-Speed 6311.25 samples/sec Loss 6.6275 LearningRate 0.0007 Epoch: 9 Global Step: 194970 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:28:09,033-Speed 6329.88 samples/sec Loss 6.7266 LearningRate 0.0007 Epoch: 9 Global Step: 194980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:12,287-Speed 6295.18 samples/sec Loss 6.5672 LearningRate 0.0007 Epoch: 9 Global Step: 194990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:15,539-Speed 6298.59 samples/sec Loss 6.6662 LearningRate 0.0007 Epoch: 9 Global Step: 195000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:18,790-Speed 6301.56 samples/sec Loss 6.6937 LearningRate 0.0007 Epoch: 9 Global 
Step: 195010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:22,038-Speed 6306.56 samples/sec Loss 6.6945 LearningRate 0.0007 Epoch: 9 Global Step: 195020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:25,285-Speed 6308.85 samples/sec Loss 6.6313 LearningRate 0.0007 Epoch: 9 Global Step: 195030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:28,539-Speed 6294.47 samples/sec Loss 6.6251 LearningRate 0.0007 Epoch: 9 Global Step: 195040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:31,787-Speed 6307.84 samples/sec Loss 6.6202 LearningRate 0.0007 Epoch: 9 Global Step: 195050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:35,034-Speed 6309.69 samples/sec Loss 6.5757 LearningRate 0.0007 Epoch: 9 Global Step: 195060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:38,280-Speed 6310.71 samples/sec Loss 6.5934 LearningRate 0.0007 Epoch: 9 Global Step: 195070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:41,513-Speed 6335.07 samples/sec Loss 6.6749 LearningRate 0.0007 Epoch: 9 Global Step: 195080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:44,758-Speed 6314.06 samples/sec Loss 6.7081 LearningRate 0.0007 Epoch: 9 Global Step: 195090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:48,011-Speed 6296.72 samples/sec Loss 6.6018 LearningRate 0.0007 Epoch: 9 Global Step: 195100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:51,260-Speed 6305.64 samples/sec Loss 6.6535 LearningRate 0.0007 Epoch: 9 Global Step: 195110 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:54,505-Speed 6311.13 samples/sec Loss 6.5794 LearningRate 0.0007 Epoch: 9 Global Step: 195120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:28:57,754-Speed 6305.60 samples/sec Loss 6.6362 LearningRate 0.0007 Epoch: 9 Global Step: 195130 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:29:01,003-Speed 6305.77 samples/sec Loss 6.6284 LearningRate 0.0007 Epoch: 9 Global Step: 195140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:04,251-Speed 6307.85 samples/sec Loss 6.6373 LearningRate 0.0007 Epoch: 9 Global Step: 195150 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:07,498-Speed 6308.41 samples/sec Loss 6.6371 LearningRate 0.0007 Epoch: 9 Global Step: 195160 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:10,742-Speed 6313.76 samples/sec Loss 6.6342 LearningRate 0.0007 Epoch: 9 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:14,003-Speed 6282.67 samples/sec Loss 6.6172 LearningRate 0.0007 Epoch: 9 Global Step: 195180 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:29:17,272-Speed 6264.72 samples/sec Loss 6.6964 LearningRate 0.0007 Epoch: 9 Global Step: 195190 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:29:20,520-Speed 6307.92 samples/sec Loss 6.6305 LearningRate 0.0007 Epoch: 9 Global Step: 195200 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:29:23,752-Speed 6338.00 samples/sec Loss 6.6836 LearningRate 0.0007 Epoch: 9 Global Step: 195210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:27,012-Speed 6283.44 samples/sec Loss 6.6380 LearningRate 0.0007 Epoch: 9 Global Step: 195220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:30,258-Speed 6311.24 samples/sec Loss 6.6543 LearningRate 0.0007 Epoch: 9 Global Step: 195230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:33,511-Speed 6295.86 samples/sec Loss 6.6860 LearningRate 0.0007 Epoch: 9 Global Step: 195240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:36,755-Speed 6315.59 samples/sec Loss 6.6728 LearningRate 0.0007 Epoch: 9 Global Step: 195250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:29:40,002-Speed 6308.47 samples/sec Loss 6.6402 LearningRate 0.0007 Epoch: 9 Global Step: 195260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:43,251-Speed 6305.00 samples/sec Loss 6.6599 LearningRate 0.0007 Epoch: 9 Global Step: 195270 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:46,499-Speed 6306.45 samples/sec Loss 6.6740 LearningRate 0.0007 Epoch: 9 Global Step: 195280 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:49,744-Speed 6313.93 samples/sec Loss 6.6859 LearningRate 0.0007 Epoch: 9 Global Step: 195290 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:52,992-Speed 6305.55 samples/sec Loss 6.7100 LearningRate 0.0007 Epoch: 9 Global Step: 195300 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:56,228-Speed 6330.88 samples/sec Loss 6.7444 LearningRate 0.0007 Epoch: 9 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:29:59,475-Speed 6308.95 samples/sec Loss 6.6436 LearningRate 0.0007 Epoch: 9 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:02,721-Speed 6310.31 samples/sec Loss 6.6547 LearningRate 0.0007 Epoch: 9 Global Step: 195330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:05,971-Speed 6304.06 samples/sec Loss 6.6019 LearningRate 0.0007 Epoch: 9 Global Step: 195340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:09,218-Speed 6309.01 samples/sec Loss 6.6669 LearningRate 0.0007 Epoch: 9 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:12,466-Speed 6305.78 samples/sec Loss 6.7144 LearningRate 0.0007 Epoch: 9 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:15,715-Speed 6305.58 samples/sec Loss 6.6018 LearningRate 0.0007 Epoch: 9 Global Step: 195370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:18,965-Speed 6303.11 samples/sec Loss 
6.6186 LearningRate 0.0007 Epoch: 9 Global Step: 195380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:22,213-Speed 6307.03 samples/sec Loss 6.6209 LearningRate 0.0007 Epoch: 9 Global Step: 195390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:25,462-Speed 6303.81 samples/sec Loss 6.7555 LearningRate 0.0007 Epoch: 9 Global Step: 195400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:28,710-Speed 6307.66 samples/sec Loss 6.7057 LearningRate 0.0007 Epoch: 9 Global Step: 195410 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:31,958-Speed 6306.84 samples/sec Loss 6.6653 LearningRate 0.0007 Epoch: 9 Global Step: 195420 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:35,206-Speed 6306.39 samples/sec Loss 6.6070 LearningRate 0.0007 Epoch: 9 Global Step: 195430 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:38,451-Speed 6312.21 samples/sec Loss 6.5947 LearningRate 0.0007 Epoch: 9 Global Step: 195440 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:41,695-Speed 6314.41 samples/sec Loss 6.6034 LearningRate 0.0007 Epoch: 9 Global Step: 195450 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:44,943-Speed 6307.86 samples/sec Loss 6.6871 LearningRate 0.0007 Epoch: 9 Global Step: 195460 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:30:48,178-Speed 6332.27 samples/sec Loss 6.6730 LearningRate 0.0007 Epoch: 9 Global Step: 195470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:51,423-Speed 6311.92 samples/sec Loss 6.6593 LearningRate 0.0007 Epoch: 9 Global Step: 195480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:54,669-Speed 6311.44 samples/sec Loss 6.6498 LearningRate 0.0007 Epoch: 9 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:30:57,913-Speed 6314.28 samples/sec Loss 6.6023 LearningRate 0.0007 Epoch: 9 Global 
Step: 195500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:01,160-Speed 6310.24 samples/sec Loss 6.6178 LearningRate 0.0007 Epoch: 9 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:04,406-Speed 6308.99 samples/sec Loss 6.6243 LearningRate 0.0007 Epoch: 9 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:07,654-Speed 6307.78 samples/sec Loss 6.6779 LearningRate 0.0007 Epoch: 9 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:10,904-Speed 6303.69 samples/sec Loss 6.6317 LearningRate 0.0007 Epoch: 9 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:14,150-Speed 6310.29 samples/sec Loss 6.5181 LearningRate 0.0007 Epoch: 9 Global Step: 195550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:17,395-Speed 6311.19 samples/sec Loss 6.6017 LearningRate 0.0007 Epoch: 9 Global Step: 195560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:20,634-Speed 6326.44 samples/sec Loss 6.5985 LearningRate 0.0007 Epoch: 9 Global Step: 195570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:23,881-Speed 6308.92 samples/sec Loss 6.6874 LearningRate 0.0007 Epoch: 9 Global Step: 195580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:27,131-Speed 6300.96 samples/sec Loss 6.6388 LearningRate 0.0007 Epoch: 9 Global Step: 195590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:30,377-Speed 6312.41 samples/sec Loss 6.6452 LearningRate 0.0007 Epoch: 9 Global Step: 195600 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:33,624-Speed 6307.78 samples/sec Loss 6.6238 LearningRate 0.0007 Epoch: 9 Global Step: 195610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:36,867-Speed 6316.25 samples/sec Loss 6.5747 LearningRate 0.0007 Epoch: 9 Global Step: 195620 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:31:40,115-Speed 6308.33 samples/sec Loss 6.6489 LearningRate 0.0007 Epoch: 9 Global Step: 195630 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:43,360-Speed 6312.23 samples/sec Loss 6.7065 LearningRate 0.0007 Epoch: 9 Global Step: 195640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:46,608-Speed 6305.28 samples/sec Loss 6.6118 LearningRate 0.0007 Epoch: 9 Global Step: 195650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:49,853-Speed 6314.43 samples/sec Loss 6.6494 LearningRate 0.0007 Epoch: 9 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:53,100-Speed 6307.18 samples/sec Loss 6.5618 LearningRate 0.0007 Epoch: 9 Global Step: 195670 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:31:56,350-Speed 6304.00 samples/sec Loss 6.6122 LearningRate 0.0007 Epoch: 9 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:31:59,617-Speed 6270.54 samples/sec Loss 6.6960 LearningRate 0.0007 Epoch: 9 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:02,864-Speed 6307.83 samples/sec Loss 6.5575 LearningRate 0.0007 Epoch: 9 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:06,113-Speed 6305.66 samples/sec Loss 6.6404 LearningRate 0.0007 Epoch: 9 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:09,363-Speed 6303.39 samples/sec Loss 6.5428 LearningRate 0.0007 Epoch: 9 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:12,611-Speed 6305.84 samples/sec Loss 6.6532 LearningRate 0.0007 Epoch: 9 Global Step: 195730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:15,859-Speed 6308.16 samples/sec Loss 6.6733 LearningRate 0.0007 Epoch: 9 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:32:19,107-Speed 6306.38 samples/sec Loss 6.6658 LearningRate 0.0007 Epoch: 9 Global Step: 195750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:22,351-Speed 6314.06 samples/sec Loss 6.5585 LearningRate 0.0007 Epoch: 9 Global Step: 195760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:25,597-Speed 6312.05 samples/sec Loss 6.5818 LearningRate 0.0007 Epoch: 9 Global Step: 195770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:28,828-Speed 6338.76 samples/sec Loss 6.6220 LearningRate 0.0007 Epoch: 9 Global Step: 195780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:32,080-Speed 6299.51 samples/sec Loss 6.6967 LearningRate 0.0007 Epoch: 9 Global Step: 195790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:35,326-Speed 6309.96 samples/sec Loss 6.6482 LearningRate 0.0007 Epoch: 9 Global Step: 195800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:38,572-Speed 6312.27 samples/sec Loss 6.6555 LearningRate 0.0007 Epoch: 9 Global Step: 195810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:41,825-Speed 6297.01 samples/sec Loss 6.6389 LearningRate 0.0007 Epoch: 9 Global Step: 195820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:45,074-Speed 6305.08 samples/sec Loss 6.6172 LearningRate 0.0007 Epoch: 9 Global Step: 195830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:48,322-Speed 6305.85 samples/sec Loss 6.5761 LearningRate 0.0007 Epoch: 9 Global Step: 195840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:51,570-Speed 6306.99 samples/sec Loss 6.6489 LearningRate 0.0007 Epoch: 9 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:54,814-Speed 6313.72 samples/sec Loss 6.6373 LearningRate 0.0007 Epoch: 9 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:32:58,064-Speed 6303.95 samples/sec Loss 
6.7379 LearningRate 0.0007 Epoch: 9 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:01,309-Speed 6313.32 samples/sec Loss 6.6322 LearningRate 0.0007 Epoch: 9 Global Step: 195880 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:04,541-Speed 6338.10 samples/sec Loss 6.6205 LearningRate 0.0007 Epoch: 9 Global Step: 195890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:07,785-Speed 6314.07 samples/sec Loss 6.5975 LearningRate 0.0007 Epoch: 9 Global Step: 195900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:11,033-Speed 6306.80 samples/sec Loss 6.6857 LearningRate 0.0007 Epoch: 9 Global Step: 195910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:14,281-Speed 6306.43 samples/sec Loss 6.6824 LearningRate 0.0007 Epoch: 9 Global Step: 195920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:17,527-Speed 6310.83 samples/sec Loss 6.6920 LearningRate 0.0007 Epoch: 9 Global Step: 195930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:20,777-Speed 6303.54 samples/sec Loss 6.6805 LearningRate 0.0007 Epoch: 9 Global Step: 195940 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:24,022-Speed 6313.81 samples/sec Loss 6.7283 LearningRate 0.0007 Epoch: 9 Global Step: 195950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:27,267-Speed 6311.48 samples/sec Loss 6.6772 LearningRate 0.0007 Epoch: 9 Global Step: 195960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:30,515-Speed 6307.50 samples/sec Loss 6.6806 LearningRate 0.0007 Epoch: 9 Global Step: 195970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:33,767-Speed 6298.66 samples/sec Loss 6.6198 LearningRate 0.0007 Epoch: 9 Global Step: 195980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:37,011-Speed 6315.17 samples/sec Loss 6.6861 LearningRate 0.0007 Epoch: 9 Global 
Step: 195990 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:40,259-Speed 6306.32 samples/sec Loss 6.6168 LearningRate 0.0007 Epoch: 9 Global Step: 196000 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:43,505-Speed 6311.24 samples/sec Loss 6.7030 LearningRate 0.0007 Epoch: 9 Global Step: 196010 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:46,758-Speed 6297.17 samples/sec Loss 6.6423 LearningRate 0.0007 Epoch: 9 Global Step: 196020 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:50,002-Speed 6313.64 samples/sec Loss 6.5433 LearningRate 0.0007 Epoch: 9 Global Step: 196030 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:53,247-Speed 6312.49 samples/sec Loss 6.6000 LearningRate 0.0007 Epoch: 9 Global Step: 196040 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:33:56,481-Speed 6334.12 samples/sec Loss 6.7177 LearningRate 0.0007 Epoch: 9 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:33:59,731-Speed 6303.01 samples/sec Loss 6.6721 LearningRate 0.0007 Epoch: 9 Global Step: 196060 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:02,981-Speed 6302.67 samples/sec Loss 6.7138 LearningRate 0.0007 Epoch: 9 Global Step: 196070 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:06,228-Speed 6309.50 samples/sec Loss 6.6618 LearningRate 0.0007 Epoch: 9 Global Step: 196080 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:09,479-Speed 6301.48 samples/sec Loss 6.6364 LearningRate 0.0007 Epoch: 9 Global Step: 196090 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:12,730-Speed 6301.06 samples/sec Loss 6.6364 LearningRate 0.0007 Epoch: 9 Global Step: 196100 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:15,977-Speed 6306.93 samples/sec Loss 6.6248 LearningRate 0.0007 Epoch: 9 Global Step: 196110 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:34:19,228-Speed 6302.30 samples/sec Loss 6.6490 LearningRate 0.0007 Epoch: 9 Global Step: 196120 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:22,476-Speed 6307.39 samples/sec Loss 6.5446 LearningRate 0.0007 Epoch: 9 Global Step: 196130 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:25,725-Speed 6304.67 samples/sec Loss 6.6757 LearningRate 0.0007 Epoch: 9 Global Step: 196140 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:28,973-Speed 6307.20 samples/sec Loss 6.6892 LearningRate 0.0007 Epoch: 9 Global Step: 196150 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:34:32,224-Speed 6301.80 samples/sec Loss 6.6833 LearningRate 0.0007 Epoch: 9 Global Step: 196160 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:34:35,463-Speed 6324.40 samples/sec Loss 6.6307 LearningRate 0.0007 Epoch: 9 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:38,713-Speed 6303.01 samples/sec Loss 6.7461 LearningRate 0.0007 Epoch: 9 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:41,955-Speed 6317.03 samples/sec Loss 6.6055 LearningRate 0.0007 Epoch: 9 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:45,201-Speed 6310.24 samples/sec Loss 6.5568 LearningRate 0.0007 Epoch: 9 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:48,457-Speed 6291.37 samples/sec Loss 6.7240 LearningRate 0.0007 Epoch: 9 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:51,707-Speed 6303.61 samples/sec Loss 6.6611 LearningRate 0.0007 Epoch: 9 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:34:54,959-Speed 6298.79 samples/sec Loss 6.6673 LearningRate 0.0007 Epoch: 9 Global Step: 196230 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:34:58,211-Speed 6298.72 samples/sec Loss 6.6093 LearningRate 0.0007 Epoch: 9 Global Step: 196240 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:01,456-Speed 6314.20 samples/sec Loss 6.5695 LearningRate 0.0007 Epoch: 9 Global Step: 196250 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:04,704-Speed 6305.13 samples/sec Loss 6.5710 LearningRate 0.0007 Epoch: 9 Global Step: 196260 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:07,948-Speed 6316.27 samples/sec Loss 6.6007 LearningRate 0.0007 Epoch: 9 Global Step: 196270 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:35:11,197-Speed 6305.11 samples/sec Loss 6.6810 LearningRate 0.0007 Epoch: 9 Global Step: 196280 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:35:14,444-Speed 6307.27 samples/sec Loss 6.6331 LearningRate 0.0007 Epoch: 9 Global Step: 196290 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:35:17,690-Speed 6309.98 samples/sec Loss 6.6353 LearningRate 0.0007 Epoch: 9 Global Step: 196300 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:35:20,920-Speed 6342.45 samples/sec Loss 6.5199 LearningRate 0.0007 Epoch: 9 Global Step: 196310 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:24,175-Speed 6294.07 samples/sec Loss 6.6144 LearningRate 0.0007 Epoch: 9 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:27,446-Speed 6261.19 samples/sec Loss 6.5829 LearningRate 0.0007 Epoch: 9 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:30,694-Speed 6307.05 samples/sec Loss 6.6066 LearningRate 0.0007 Epoch: 9 Global Step: 196340 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:33,946-Speed 6300.77 samples/sec Loss 6.6153 LearningRate 0.0007 Epoch: 9 Global Step: 196350 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:37,193-Speed 6308.16 samples/sec Loss 
6.5703 LearningRate 0.0007 Epoch: 9 Global Step: 196360 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:40,440-Speed 6310.29 samples/sec Loss 6.6599 LearningRate 0.0007 Epoch: 9 Global Step: 196370 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:43,694-Speed 6295.09 samples/sec Loss 6.5602 LearningRate 0.0007 Epoch: 9 Global Step: 196380 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:46,940-Speed 6310.74 samples/sec Loss 6.6431 LearningRate 0.0007 Epoch: 9 Global Step: 196390 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:50,195-Speed 6293.38 samples/sec Loss 6.6850 LearningRate 0.0007 Epoch: 9 Global Step: 196400 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:53,437-Speed 6318.08 samples/sec Loss 6.6908 LearningRate 0.0007 Epoch: 9 Global Step: 196410 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:56,707-Speed 6265.01 samples/sec Loss 6.6135 LearningRate 0.0007 Epoch: 9 Global Step: 196420 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:35:59,955-Speed 6306.32 samples/sec Loss 6.6517 LearningRate 0.0007 Epoch: 9 Global Step: 196430 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:03,269-Speed 6179.46 samples/sec Loss 6.6318 LearningRate 0.0007 Epoch: 9 Global Step: 196440 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:06,518-Speed 6306.45 samples/sec Loss 6.6478 LearningRate 0.0007 Epoch: 9 Global Step: 196450 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:09,765-Speed 6308.71 samples/sec Loss 6.5886 LearningRate 0.0007 Epoch: 9 Global Step: 196460 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:13,013-Speed 6306.04 samples/sec Loss 6.5729 LearningRate 0.0007 Epoch: 9 Global Step: 196470 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:16,268-Speed 6293.24 samples/sec Loss 6.6549 LearningRate 0.0007 Epoch: 9 Global 
Step: 196480 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:19,513-Speed 6312.20 samples/sec Loss 6.6324 LearningRate 0.0007 Epoch: 9 Global Step: 196490 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:22,759-Speed 6312.01 samples/sec Loss 6.6734 LearningRate 0.0007 Epoch: 9 Global Step: 196500 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:26,008-Speed 6304.27 samples/sec Loss 6.6246 LearningRate 0.0007 Epoch: 9 Global Step: 196510 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:36:29,240-Speed 6338.30 samples/sec Loss 6.6794 LearningRate 0.0007 Epoch: 9 Global Step: 196520 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:32,487-Speed 6308.64 samples/sec Loss 6.5475 LearningRate 0.0007 Epoch: 9 Global Step: 196530 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:35,733-Speed 6311.30 samples/sec Loss 6.6808 LearningRate 0.0007 Epoch: 9 Global Step: 196540 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:38,977-Speed 6314.11 samples/sec Loss 6.6438 LearningRate 0.0007 Epoch: 9 Global Step: 196550 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:42,224-Speed 6308.54 samples/sec Loss 6.7284 LearningRate 0.0007 Epoch: 9 Global Step: 196560 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:45,472-Speed 6307.94 samples/sec Loss 6.6830 LearningRate 0.0007 Epoch: 9 Global Step: 196570 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:48,718-Speed 6311.55 samples/sec Loss 6.5902 LearningRate 0.0007 Epoch: 9 Global Step: 196580 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:51,966-Speed 6306.31 samples/sec Loss 6.5938 LearningRate 0.0007 Epoch: 9 Global Step: 196590 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:36:55,215-Speed 6303.93 samples/sec Loss 6.5913 LearningRate 0.0007 Epoch: 9 Global Step: 196600 Fp16 Grad Scale: 32768 
Required: 58 hours Training: 2022-04-01 09:36:58,462-Speed 6308.65 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 9 Global Step: 196610 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:01,712-Speed 6304.22 samples/sec Loss 6.6194 LearningRate 0.0007 Epoch: 9 Global Step: 196620 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:37:04,959-Speed 6307.68 samples/sec Loss 6.6573 LearningRate 0.0007 Epoch: 9 Global Step: 196630 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:37:08,193-Speed 6334.97 samples/sec Loss 6.6054 LearningRate 0.0007 Epoch: 9 Global Step: 196640 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:11,460-Speed 6270.75 samples/sec Loss 6.6318 LearningRate 0.0007 Epoch: 9 Global Step: 196650 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:14,711-Speed 6300.20 samples/sec Loss 6.6181 LearningRate 0.0007 Epoch: 9 Global Step: 196660 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:17,959-Speed 6306.54 samples/sec Loss 6.6600 LearningRate 0.0007 Epoch: 9 Global Step: 196670 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:21,202-Speed 6316.01 samples/sec Loss 6.6173 LearningRate 0.0007 Epoch: 9 Global Step: 196680 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:24,447-Speed 6314.55 samples/sec Loss 6.6900 LearningRate 0.0007 Epoch: 9 Global Step: 196690 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:27,693-Speed 6309.05 samples/sec Loss 6.6255 LearningRate 0.0007 Epoch: 9 Global Step: 196700 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:30,942-Speed 6305.95 samples/sec Loss 6.6871 LearningRate 0.0007 Epoch: 9 Global Step: 196710 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:34,187-Speed 6311.26 samples/sec Loss 6.6971 LearningRate 0.0007 Epoch: 9 Global Step: 196720 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 
09:37:37,433-Speed 6311.97 samples/sec Loss 6.6322 LearningRate 0.0007 Epoch: 9 Global Step: 196730 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:40,675-Speed 6317.57 samples/sec Loss 6.5921 LearningRate 0.0007 Epoch: 9 Global Step: 196740 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:43,920-Speed 6312.15 samples/sec Loss 6.6056 LearningRate 0.0007 Epoch: 9 Global Step: 196750 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:47,168-Speed 6307.54 samples/sec Loss 6.5899 LearningRate 0.0007 Epoch: 9 Global Step: 196760 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:50,416-Speed 6307.41 samples/sec Loss 6.6208 LearningRate 0.0007 Epoch: 9 Global Step: 196770 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:53,661-Speed 6311.62 samples/sec Loss 6.7291 LearningRate 0.0007 Epoch: 9 Global Step: 196780 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:37:56,909-Speed 6308.30 samples/sec Loss 6.6459 LearningRate 0.0007 Epoch: 9 Global Step: 196790 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:00,166-Speed 6290.02 samples/sec Loss 6.6614 LearningRate 0.0007 Epoch: 9 Global Step: 196800 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:03,414-Speed 6306.05 samples/sec Loss 6.5816 LearningRate 0.0007 Epoch: 9 Global Step: 196810 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:06,663-Speed 6305.53 samples/sec Loss 6.6436 LearningRate 0.0007 Epoch: 9 Global Step: 196820 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:09,910-Speed 6307.19 samples/sec Loss 6.6203 LearningRate 0.0007 Epoch: 9 Global Step: 196830 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:13,147-Speed 6329.97 samples/sec Loss 6.6442 LearningRate 0.0007 Epoch: 9 Global Step: 196840 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:16,395-Speed 6306.21 samples/sec Loss 
6.5478 LearningRate 0.0007 Epoch: 9 Global Step: 196850 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:19,641-Speed 6310.23 samples/sec Loss 6.6267 LearningRate 0.0007 Epoch: 9 Global Step: 196860 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:22,888-Speed 6309.20 samples/sec Loss 6.6306 LearningRate 0.0007 Epoch: 9 Global Step: 196870 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:26,135-Speed 6308.48 samples/sec Loss 6.6160 LearningRate 0.0007 Epoch: 9 Global Step: 196880 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:29,381-Speed 6310.36 samples/sec Loss 6.5765 LearningRate 0.0007 Epoch: 9 Global Step: 196890 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:32,623-Speed 6318.70 samples/sec Loss 6.5880 LearningRate 0.0007 Epoch: 9 Global Step: 196900 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:35,868-Speed 6312.05 samples/sec Loss 6.5802 LearningRate 0.0007 Epoch: 9 Global Step: 196910 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:39,118-Speed 6302.92 samples/sec Loss 6.5329 LearningRate 0.0007 Epoch: 9 Global Step: 196920 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:42,365-Speed 6309.71 samples/sec Loss 6.5882 LearningRate 0.0007 Epoch: 9 Global Step: 196930 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:45,614-Speed 6304.26 samples/sec Loss 6.5919 LearningRate 0.0007 Epoch: 9 Global Step: 196940 Fp16 Grad Scale: 65536 Required: 58 hours Training: 2022-04-01 09:38:48,849-Speed 6332.55 samples/sec Loss 6.5675 LearningRate 0.0007 Epoch: 9 Global Step: 196950 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:52,095-Speed 6310.36 samples/sec Loss 6.6913 LearningRate 0.0007 Epoch: 9 Global Step: 196960 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:55,342-Speed 6309.93 samples/sec Loss 6.6946 LearningRate 0.0007 Epoch: 9 Global 
Step: 196970 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:38:58,589-Speed 6307.12 samples/sec Loss 6.7074 LearningRate 0.0007 Epoch: 9 Global Step: 196980 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:01,835-Speed 6311.75 samples/sec Loss 6.6804 LearningRate 0.0007 Epoch: 9 Global Step: 196990 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:05,088-Speed 6296.36 samples/sec Loss 6.5435 LearningRate 0.0007 Epoch: 9 Global Step: 197000 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:08,333-Speed 6313.07 samples/sec Loss 6.6222 LearningRate 0.0007 Epoch: 9 Global Step: 197010 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:11,580-Speed 6309.92 samples/sec Loss 6.6891 LearningRate 0.0007 Epoch: 9 Global Step: 197020 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:14,828-Speed 6306.49 samples/sec Loss 6.6149 LearningRate 0.0007 Epoch: 9 Global Step: 197030 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:18,073-Speed 6311.83 samples/sec Loss 6.5945 LearningRate 0.0007 Epoch: 9 Global Step: 197040 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:21,309-Speed 6330.22 samples/sec Loss 6.5991 LearningRate 0.0007 Epoch: 9 Global Step: 197050 Fp16 Grad Scale: 32768 Required: 58 hours Training: 2022-04-01 09:39:24,559-Speed 6304.30 samples/sec Loss 6.5602 LearningRate 0.0007 Epoch: 9 Global Step: 197060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:27,805-Speed 6310.73 samples/sec Loss 6.6042 LearningRate 0.0007 Epoch: 9 Global Step: 197070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:31,082-Speed 6251.59 samples/sec Loss 6.6523 LearningRate 0.0007 Epoch: 9 Global Step: 197080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:34,325-Speed 6316.51 samples/sec Loss 6.6221 LearningRate 0.0007 Epoch: 9 Global Step: 197090 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:39:37,573-Speed 6306.21 samples/sec Loss 6.6443 LearningRate 0.0007 Epoch: 9 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:40,823-Speed 6302.40 samples/sec Loss 6.6907 LearningRate 0.0007 Epoch: 9 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:44,066-Speed 6316.99 samples/sec Loss 6.7398 LearningRate 0.0007 Epoch: 9 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:47,371-Speed 6198.28 samples/sec Loss 6.5397 LearningRate 0.0007 Epoch: 9 Global Step: 197130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:50,662-Speed 6223.87 samples/sec Loss 6.6043 LearningRate 0.0007 Epoch: 9 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:39:53,909-Speed 6308.23 samples/sec Loss 6.6128 LearningRate 0.0007 Epoch: 9 Global Step: 197150 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:39:57,234-Speed 6161.69 samples/sec Loss 6.5986 LearningRate 0.0007 Epoch: 9 Global Step: 197160 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:00,478-Speed 6314.47 samples/sec Loss 6.5834 LearningRate 0.0007 Epoch: 9 Global Step: 197170 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:03,708-Speed 6340.99 samples/sec Loss 6.5872 LearningRate 0.0007 Epoch: 9 Global Step: 197180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:06,956-Speed 6308.16 samples/sec Loss 6.6025 LearningRate 0.0007 Epoch: 9 Global Step: 197190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:10,198-Speed 6317.73 samples/sec Loss 6.6646 LearningRate 0.0007 Epoch: 9 Global Step: 197200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:13,464-Speed 6273.04 samples/sec Loss 6.6083 LearningRate 0.0007 Epoch: 9 Global Step: 197210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:40:16,709-Speed 6312.61 samples/sec Loss 6.6790 LearningRate 0.0007 Epoch: 9 Global Step: 197220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:19,954-Speed 6312.01 samples/sec Loss 6.5522 LearningRate 0.0007 Epoch: 9 Global Step: 197230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:23,201-Speed 6309.06 samples/sec Loss 6.6099 LearningRate 0.0007 Epoch: 9 Global Step: 197240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:26,448-Speed 6309.97 samples/sec Loss 6.5912 LearningRate 0.0007 Epoch: 9 Global Step: 197250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:29,693-Speed 6312.01 samples/sec Loss 6.5348 LearningRate 0.0007 Epoch: 9 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:32,948-Speed 6294.19 samples/sec Loss 6.6267 LearningRate 0.0007 Epoch: 9 Global Step: 197270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:36,195-Speed 6309.32 samples/sec Loss 6.6911 LearningRate 0.0007 Epoch: 9 Global Step: 197280 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:39,547-Speed 6110.02 samples/sec Loss 6.6002 LearningRate 0.0007 Epoch: 9 Global Step: 197290 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:42,792-Speed 6313.69 samples/sec Loss 6.6252 LearningRate 0.0007 Epoch: 9 Global Step: 197300 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:46,035-Speed 6315.13 samples/sec Loss 6.5850 LearningRate 0.0007 Epoch: 9 Global Step: 197310 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:40:49,271-Speed 6330.81 samples/sec Loss 6.6206 LearningRate 0.0007 Epoch: 9 Global Step: 197320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:52,518-Speed 6308.40 samples/sec Loss 6.6012 LearningRate 0.0007 Epoch: 9 Global Step: 197330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:55,768-Speed 6303.65 samples/sec Loss 
6.6096 LearningRate 0.0007 Epoch: 9 Global Step: 197340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:40:59,016-Speed 6305.57 samples/sec Loss 6.6313 LearningRate 0.0007 Epoch: 9 Global Step: 197350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:02,262-Speed 6310.90 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 9 Global Step: 197360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:05,508-Speed 6311.07 samples/sec Loss 6.6989 LearningRate 0.0007 Epoch: 9 Global Step: 197370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:08,756-Speed 6307.69 samples/sec Loss 6.6310 LearningRate 0.0007 Epoch: 9 Global Step: 197380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:12,004-Speed 6305.63 samples/sec Loss 6.6585 LearningRate 0.0007 Epoch: 9 Global Step: 197390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:15,250-Speed 6310.56 samples/sec Loss 6.6187 LearningRate 0.0007 Epoch: 9 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:18,495-Speed 6313.18 samples/sec Loss 6.5360 LearningRate 0.0007 Epoch: 9 Global Step: 197410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:21,727-Speed 6338.88 samples/sec Loss 6.5598 LearningRate 0.0007 Epoch: 9 Global Step: 197420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:24,972-Speed 6312.34 samples/sec Loss 6.5952 LearningRate 0.0007 Epoch: 9 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:28,218-Speed 6310.34 samples/sec Loss 6.5895 LearningRate 0.0007 Epoch: 9 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:31,466-Speed 6306.34 samples/sec Loss 6.6738 LearningRate 0.0007 Epoch: 9 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:34,710-Speed 6314.91 samples/sec Loss 6.5941 LearningRate 0.0007 Epoch: 9 Global 
Step: 197460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:37,960-Speed 6303.36 samples/sec Loss 6.6883 LearningRate 0.0007 Epoch: 9 Global Step: 197470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:41,210-Speed 6303.56 samples/sec Loss 6.6855 LearningRate 0.0007 Epoch: 9 Global Step: 197480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:44,453-Speed 6316.27 samples/sec Loss 6.6784 LearningRate 0.0007 Epoch: 9 Global Step: 197490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:47,700-Speed 6308.64 samples/sec Loss 6.6705 LearningRate 0.0007 Epoch: 9 Global Step: 197500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:50,945-Speed 6312.74 samples/sec Loss 6.6092 LearningRate 0.0007 Epoch: 9 Global Step: 197510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:41:54,193-Speed 6306.02 samples/sec Loss 6.6907 LearningRate 0.0007 Epoch: 9 Global Step: 197520 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:41:57,443-Speed 6303.31 samples/sec Loss 6.6217 LearningRate 0.0007 Epoch: 9 Global Step: 197530 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:42:00,680-Speed 6327.72 samples/sec Loss 6.6915 LearningRate 0.0007 Epoch: 9 Global Step: 197540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:03,926-Speed 6312.36 samples/sec Loss 6.7605 LearningRate 0.0007 Epoch: 9 Global Step: 197550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:07,175-Speed 6304.17 samples/sec Loss 6.5857 LearningRate 0.0007 Epoch: 9 Global Step: 197560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:10,418-Speed 6316.93 samples/sec Loss 6.6538 LearningRate 0.0007 Epoch: 9 Global Step: 197570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:13,664-Speed 6310.77 samples/sec Loss 6.6072 LearningRate 0.0007 Epoch: 9 Global Step: 197580 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:42:16,911-Speed 6309.78 samples/sec Loss 6.7487 LearningRate 0.0007 Epoch: 9 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:20,158-Speed 6307.28 samples/sec Loss 6.7036 LearningRate 0.0007 Epoch: 9 Global Step: 197600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:23,412-Speed 6294.88 samples/sec Loss 6.5841 LearningRate 0.0007 Epoch: 9 Global Step: 197610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:26,660-Speed 6307.46 samples/sec Loss 6.6409 LearningRate 0.0007 Epoch: 9 Global Step: 197620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:29,909-Speed 6304.12 samples/sec Loss 6.5889 LearningRate 0.0007 Epoch: 9 Global Step: 197630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:33,158-Speed 6304.77 samples/sec Loss 6.6558 LearningRate 0.0007 Epoch: 9 Global Step: 197640 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:42:36,392-Speed 6335.56 samples/sec Loss 6.6055 LearningRate 0.0007 Epoch: 9 Global Step: 197650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:39,640-Speed 6306.62 samples/sec Loss 6.5881 LearningRate 0.0007 Epoch: 9 Global Step: 197660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:42,890-Speed 6302.94 samples/sec Loss 6.6802 LearningRate 0.0007 Epoch: 9 Global Step: 197670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:46,140-Speed 6303.35 samples/sec Loss 6.6344 LearningRate 0.0007 Epoch: 9 Global Step: 197680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:49,389-Speed 6305.83 samples/sec Loss 6.6300 LearningRate 0.0007 Epoch: 9 Global Step: 197690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:52,633-Speed 6313.08 samples/sec Loss 6.6712 LearningRate 0.0007 Epoch: 9 Global Step: 197700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:42:55,881-Speed 6307.59 samples/sec Loss 6.6397 LearningRate 0.0007 Epoch: 9 Global Step: 197710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:42:59,127-Speed 6311.11 samples/sec Loss 6.6513 LearningRate 0.0007 Epoch: 9 Global Step: 197720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:02,373-Speed 6309.98 samples/sec Loss 6.6367 LearningRate 0.0007 Epoch: 9 Global Step: 197730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:05,622-Speed 6304.75 samples/sec Loss 6.6100 LearningRate 0.0007 Epoch: 9 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:08,869-Speed 6308.64 samples/sec Loss 6.5581 LearningRate 0.0007 Epoch: 9 Global Step: 197750 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:43:12,113-Speed 6315.82 samples/sec Loss 6.6127 LearningRate 0.0007 Epoch: 9 Global Step: 197760 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:43:15,353-Speed 6321.46 samples/sec Loss 6.6473 LearningRate 0.0007 Epoch: 9 Global Step: 197770 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:43:18,586-Speed 6337.00 samples/sec Loss 6.6118 LearningRate 0.0007 Epoch: 9 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:21,833-Speed 6308.38 samples/sec Loss 6.6018 LearningRate 0.0007 Epoch: 9 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:25,079-Speed 6310.26 samples/sec Loss 6.5868 LearningRate 0.0007 Epoch: 9 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:28,324-Speed 6313.71 samples/sec Loss 6.6018 LearningRate 0.0007 Epoch: 9 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:31,569-Speed 6311.34 samples/sec Loss 6.5838 LearningRate 0.0007 Epoch: 9 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:34,821-Speed 6300.52 samples/sec Loss 
6.6475 LearningRate 0.0007 Epoch: 9 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:38,064-Speed 6315.47 samples/sec Loss 6.6201 LearningRate 0.0007 Epoch: 9 Global Step: 197840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:41,310-Speed 6311.36 samples/sec Loss 6.6563 LearningRate 0.0007 Epoch: 9 Global Step: 197850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:44,557-Speed 6309.10 samples/sec Loss 6.6646 LearningRate 0.0007 Epoch: 9 Global Step: 197860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:47,801-Speed 6313.39 samples/sec Loss 6.5810 LearningRate 0.0007 Epoch: 9 Global Step: 197870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:51,034-Speed 6336.65 samples/sec Loss 6.6499 LearningRate 0.0007 Epoch: 9 Global Step: 197880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:54,280-Speed 6310.72 samples/sec Loss 6.6485 LearningRate 0.0007 Epoch: 9 Global Step: 197890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:43:57,531-Speed 6302.22 samples/sec Loss 6.5757 LearningRate 0.0007 Epoch: 9 Global Step: 197900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:00,779-Speed 6307.53 samples/sec Loss 6.6519 LearningRate 0.0007 Epoch: 9 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:04,027-Speed 6305.52 samples/sec Loss 6.5933 LearningRate 0.0007 Epoch: 9 Global Step: 197920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:07,276-Speed 6305.01 samples/sec Loss 6.6385 LearningRate 0.0007 Epoch: 9 Global Step: 197930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:10,523-Speed 6309.84 samples/sec Loss 6.5997 LearningRate 0.0007 Epoch: 9 Global Step: 197940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:13,774-Speed 6300.47 samples/sec Loss 6.6496 LearningRate 0.0007 Epoch: 9 Global 
Step: 197950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:17,019-Speed 6312.17 samples/sec Loss 6.6329 LearningRate 0.0007 Epoch: 9 Global Step: 197960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:20,268-Speed 6305.65 samples/sec Loss 6.6280 LearningRate 0.0007 Epoch: 9 Global Step: 197970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:23,520-Speed 6299.52 samples/sec Loss 6.6466 LearningRate 0.0007 Epoch: 9 Global Step: 197980 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:44:26,767-Speed 6308.45 samples/sec Loss 6.6150 LearningRate 0.0007 Epoch: 9 Global Step: 197990 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:44:30,001-Speed 6334.42 samples/sec Loss 6.5680 LearningRate 0.0007 Epoch: 9 Global Step: 198000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:33,247-Speed 6309.67 samples/sec Loss 6.7450 LearningRate 0.0007 Epoch: 9 Global Step: 198010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:36,496-Speed 6305.09 samples/sec Loss 6.6060 LearningRate 0.0007 Epoch: 9 Global Step: 198020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:39,749-Speed 6296.76 samples/sec Loss 6.6497 LearningRate 0.0007 Epoch: 9 Global Step: 198030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:43,001-Speed 6300.16 samples/sec Loss 6.6057 LearningRate 0.0007 Epoch: 9 Global Step: 198040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:46,250-Speed 6304.76 samples/sec Loss 6.5555 LearningRate 0.0007 Epoch: 9 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:49,495-Speed 6312.36 samples/sec Loss 6.6203 LearningRate 0.0007 Epoch: 9 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:52,744-Speed 6304.44 samples/sec Loss 6.6659 LearningRate 0.0007 Epoch: 9 Global Step: 198070 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:44:55,993-Speed 6305.15 samples/sec Loss 6.6490 LearningRate 0.0007 Epoch: 9 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:44:59,243-Speed 6301.96 samples/sec Loss 6.6691 LearningRate 0.0007 Epoch: 9 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:02,494-Speed 6302.23 samples/sec Loss 6.6437 LearningRate 0.0007 Epoch: 9 Global Step: 198100 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:05,740-Speed 6311.26 samples/sec Loss 6.6916 LearningRate 0.0007 Epoch: 9 Global Step: 198110 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:08,985-Speed 6312.89 samples/sec Loss 6.6397 LearningRate 0.0007 Epoch: 9 Global Step: 198120 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:12,229-Speed 6313.66 samples/sec Loss 6.6031 LearningRate 0.0007 Epoch: 9 Global Step: 198130 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:15,477-Speed 6308.15 samples/sec Loss 6.6378 LearningRate 0.0007 Epoch: 9 Global Step: 198140 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:18,726-Speed 6304.27 samples/sec Loss 6.5114 LearningRate 0.0007 Epoch: 9 Global Step: 198150 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:21,971-Speed 6312.05 samples/sec Loss 6.5625 LearningRate 0.0007 Epoch: 9 Global Step: 198160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:25,254-Speed 6240.22 samples/sec Loss 6.6667 LearningRate 0.0007 Epoch: 9 Global Step: 198170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:28,575-Speed 6168.10 samples/sec Loss 6.6119 LearningRate 0.0007 Epoch: 9 Global Step: 198180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:31,825-Speed 6303.31 samples/sec Loss 6.5275 LearningRate 0.0007 Epoch: 9 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:45:35,069-Speed 6314.80 samples/sec Loss 6.6194 LearningRate 0.0007 Epoch: 9 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:38,314-Speed 6311.66 samples/sec Loss 6.6495 LearningRate 0.0007 Epoch: 9 Global Step: 198210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:41,559-Speed 6312.37 samples/sec Loss 6.5824 LearningRate 0.0007 Epoch: 9 Global Step: 198220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:44,806-Speed 6309.44 samples/sec Loss 6.5663 LearningRate 0.0007 Epoch: 9 Global Step: 198230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:48,055-Speed 6304.20 samples/sec Loss 6.5398 LearningRate 0.0007 Epoch: 9 Global Step: 198240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:51,303-Speed 6307.15 samples/sec Loss 6.5852 LearningRate 0.0007 Epoch: 9 Global Step: 198250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:45:54,553-Speed 6303.42 samples/sec Loss 6.5859 LearningRate 0.0007 Epoch: 9 Global Step: 198260 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:45:57,792-Speed 6324.78 samples/sec Loss 6.5861 LearningRate 0.0007 Epoch: 9 Global Step: 198270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:01,037-Speed 6311.57 samples/sec Loss 6.6230 LearningRate 0.0007 Epoch: 9 Global Step: 198280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:04,288-Speed 6302.21 samples/sec Loss 6.7160 LearningRate 0.0007 Epoch: 9 Global Step: 198290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:07,535-Speed 6307.45 samples/sec Loss 6.5938 LearningRate 0.0007 Epoch: 9 Global Step: 198300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:10,794-Speed 6285.82 samples/sec Loss 6.5833 LearningRate 0.0007 Epoch: 9 Global Step: 198310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:14,041-Speed 6310.06 samples/sec Loss 
6.6193 LearningRate 0.0007 Epoch: 9 Global Step: 198320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:17,287-Speed 6311.69 samples/sec Loss 6.5670 LearningRate 0.0007 Epoch: 9 Global Step: 198330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:20,536-Speed 6304.51 samples/sec Loss 6.6080 LearningRate 0.0007 Epoch: 9 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:23,783-Speed 6307.60 samples/sec Loss 6.5667 LearningRate 0.0007 Epoch: 9 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:27,033-Speed 6302.30 samples/sec Loss 6.5728 LearningRate 0.0007 Epoch: 9 Global Step: 198360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:30,281-Speed 6307.16 samples/sec Loss 6.5972 LearningRate 0.0007 Epoch: 9 Global Step: 198370 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:46:33,516-Speed 6332.53 samples/sec Loss 6.6407 LearningRate 0.0007 Epoch: 9 Global Step: 198380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:36,764-Speed 6307.22 samples/sec Loss 6.5282 LearningRate 0.0007 Epoch: 9 Global Step: 198390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:40,017-Speed 6297.26 samples/sec Loss 6.6157 LearningRate 0.0007 Epoch: 9 Global Step: 198400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:43,269-Speed 6301.76 samples/sec Loss 6.6345 LearningRate 0.0007 Epoch: 9 Global Step: 198410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:46,571-Speed 6202.75 samples/sec Loss 6.6305 LearningRate 0.0007 Epoch: 9 Global Step: 198420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:49,820-Speed 6305.20 samples/sec Loss 6.7285 LearningRate 0.0007 Epoch: 9 Global Step: 198430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:53,073-Speed 6298.47 samples/sec Loss 6.5984 LearningRate 0.0007 Epoch: 9 Global 
Step: 198440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:56,322-Speed 6303.08 samples/sec Loss 6.5581 LearningRate 0.0007 Epoch: 9 Global Step: 198450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:46:59,571-Speed 6304.77 samples/sec Loss 6.6815 LearningRate 0.0007 Epoch: 9 Global Step: 198460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:02,817-Speed 6312.44 samples/sec Loss 6.6444 LearningRate 0.0007 Epoch: 9 Global Step: 198470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:06,069-Speed 6298.83 samples/sec Loss 6.6261 LearningRate 0.0007 Epoch: 9 Global Step: 198480 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:47:09,316-Speed 6308.34 samples/sec Loss 6.5917 LearningRate 0.0007 Epoch: 9 Global Step: 198490 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:47:12,567-Speed 6301.05 samples/sec Loss 6.6494 LearningRate 0.0007 Epoch: 9 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:47:15,820-Speed 6297.20 samples/sec Loss 6.5742 LearningRate 0.0007 Epoch: 9 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:47:19,051-Speed 6340.40 samples/sec Loss 6.6548 LearningRate 0.0007 Epoch: 9 Global Step: 198520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:22,299-Speed 6306.90 samples/sec Loss 6.5554 LearningRate 0.0007 Epoch: 9 Global Step: 198530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:25,548-Speed 6305.56 samples/sec Loss 6.6115 LearningRate 0.0007 Epoch: 9 Global Step: 198540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:28,792-Speed 6313.76 samples/sec Loss 6.6605 LearningRate 0.0007 Epoch: 9 Global Step: 198550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:32,039-Speed 6309.42 samples/sec Loss 6.5661 LearningRate 0.0007 Epoch: 9 Global Step: 198560 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:47:35,287-Speed 6306.45 samples/sec Loss 6.5908 LearningRate 0.0007 Epoch: 9 Global Step: 198570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:38,537-Speed 6302.82 samples/sec Loss 6.5751 LearningRate 0.0007 Epoch: 9 Global Step: 198580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:41,785-Speed 6306.65 samples/sec Loss 6.5899 LearningRate 0.0007 Epoch: 9 Global Step: 198590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:45,035-Speed 6303.84 samples/sec Loss 6.5763 LearningRate 0.0007 Epoch: 9 Global Step: 198600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:48,284-Speed 6303.68 samples/sec Loss 6.5683 LearningRate 0.0007 Epoch: 9 Global Step: 198610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:51,516-Speed 6339.06 samples/sec Loss 6.5715 LearningRate 0.0007 Epoch: 9 Global Step: 198620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:54,761-Speed 6313.07 samples/sec Loss 6.6224 LearningRate 0.0007 Epoch: 9 Global Step: 198630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:47:58,011-Speed 6301.65 samples/sec Loss 6.6309 LearningRate 0.0007 Epoch: 9 Global Step: 198640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:01,260-Speed 6304.59 samples/sec Loss 6.5964 LearningRate 0.0007 Epoch: 9 Global Step: 198650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:04,508-Speed 6306.73 samples/sec Loss 6.5994 LearningRate 0.0007 Epoch: 9 Global Step: 198660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:07,756-Speed 6308.64 samples/sec Loss 6.6275 LearningRate 0.0007 Epoch: 9 Global Step: 198670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:10,997-Speed 6318.78 samples/sec Loss 6.5735 LearningRate 0.0007 Epoch: 9 Global Step: 198680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:48:14,246-Speed 6306.12 samples/sec Loss 6.5902 LearningRate 0.0007 Epoch: 9 Global Step: 198690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:17,493-Speed 6308.20 samples/sec Loss 6.6583 LearningRate 0.0007 Epoch: 9 Global Step: 198700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:20,737-Speed 6315.45 samples/sec Loss 6.6036 LearningRate 0.0007 Epoch: 9 Global Step: 198710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:23,982-Speed 6312.61 samples/sec Loss 6.5941 LearningRate 0.0007 Epoch: 9 Global Step: 198720 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:48:27,215-Speed 6335.24 samples/sec Loss 6.5981 LearningRate 0.0007 Epoch: 9 Global Step: 198730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:30,468-Speed 6298.30 samples/sec Loss 6.5503 LearningRate 0.0007 Epoch: 9 Global Step: 198740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:33,719-Speed 6300.85 samples/sec Loss 6.6002 LearningRate 0.0007 Epoch: 9 Global Step: 198750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:36,965-Speed 6310.68 samples/sec Loss 6.4974 LearningRate 0.0007 Epoch: 9 Global Step: 198760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:40,213-Speed 6306.92 samples/sec Loss 6.6953 LearningRate 0.0007 Epoch: 9 Global Step: 198770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:43,459-Speed 6310.53 samples/sec Loss 6.6456 LearningRate 0.0007 Epoch: 9 Global Step: 198780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:46,713-Speed 6295.34 samples/sec Loss 6.6240 LearningRate 0.0007 Epoch: 9 Global Step: 198790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:49,958-Speed 6313.80 samples/sec Loss 6.6606 LearningRate 0.0007 Epoch: 9 Global Step: 198800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:53,205-Speed 6308.79 samples/sec Loss 
6.6239 LearningRate 0.0007 Epoch: 9 Global Step: 198810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:56,449-Speed 6314.04 samples/sec Loss 6.7121 LearningRate 0.0007 Epoch: 9 Global Step: 198820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:48:59,690-Speed 6319.60 samples/sec Loss 6.6646 LearningRate 0.0007 Epoch: 9 Global Step: 198830 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:49:02,940-Speed 6303.78 samples/sec Loss 6.5716 LearningRate 0.0007 Epoch: 9 Global Step: 198840 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:49:06,186-Speed 6310.31 samples/sec Loss 6.5417 LearningRate 0.0007 Epoch: 9 Global Step: 198850 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:49:09,422-Speed 6330.11 samples/sec Loss 6.5627 LearningRate 0.0007 Epoch: 9 Global Step: 198860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:12,673-Speed 6301.84 samples/sec Loss 6.6201 LearningRate 0.0007 Epoch: 9 Global Step: 198870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:15,920-Speed 6307.54 samples/sec Loss 6.5647 LearningRate 0.0007 Epoch: 9 Global Step: 198880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:19,167-Speed 6310.03 samples/sec Loss 6.5900 LearningRate 0.0007 Epoch: 9 Global Step: 198890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:22,413-Speed 6311.39 samples/sec Loss 6.6236 LearningRate 0.0007 Epoch: 9 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:25,660-Speed 6309.03 samples/sec Loss 6.5036 LearningRate 0.0007 Epoch: 9 Global Step: 198910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:28,905-Speed 6311.53 samples/sec Loss 6.5639 LearningRate 0.0007 Epoch: 9 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:32,153-Speed 6308.14 samples/sec Loss 6.6041 LearningRate 0.0007 Epoch: 9 Global 
Step: 198930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:35,402-Speed 6304.20 samples/sec Loss 6.6062 LearningRate 0.0007 Epoch: 9 Global Step: 198940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:38,649-Speed 6309.24 samples/sec Loss 6.6758 LearningRate 0.0007 Epoch: 9 Global Step: 198950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:41,900-Speed 6301.66 samples/sec Loss 6.6341 LearningRate 0.0007 Epoch: 9 Global Step: 198960 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:49:45,146-Speed 6310.27 samples/sec Loss 6.4902 LearningRate 0.0007 Epoch: 9 Global Step: 198970 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:49:48,375-Speed 6344.45 samples/sec Loss 6.5798 LearningRate 0.0007 Epoch: 9 Global Step: 198980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:51,628-Speed 6297.97 samples/sec Loss 6.5521 LearningRate 0.0007 Epoch: 9 Global Step: 198990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:54,874-Speed 6310.20 samples/sec Loss 6.6169 LearningRate 0.0007 Epoch: 9 Global Step: 199000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:49:58,123-Speed 6305.91 samples/sec Loss 6.6324 LearningRate 0.0007 Epoch: 9 Global Step: 199010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:01,370-Speed 6307.76 samples/sec Loss 6.5829 LearningRate 0.0007 Epoch: 9 Global Step: 199020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:04,620-Speed 6303.44 samples/sec Loss 6.5725 LearningRate 0.0007 Epoch: 9 Global Step: 199030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:07,864-Speed 6313.96 samples/sec Loss 6.6405 LearningRate 0.0007 Epoch: 9 Global Step: 199040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:11,114-Speed 6303.95 samples/sec Loss 6.5586 LearningRate 0.0007 Epoch: 9 Global Step: 199050 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:50:14,364-Speed 6302.85 samples/sec Loss 6.5883 LearningRate 0.0007 Epoch: 9 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:17,609-Speed 6312.41 samples/sec Loss 6.5406 LearningRate 0.0007 Epoch: 9 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:20,873-Speed 6274.50 samples/sec Loss 6.5652 LearningRate 0.0007 Epoch: 9 Global Step: 199080 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:50:24,109-Speed 6332.10 samples/sec Loss 6.6220 LearningRate 0.0007 Epoch: 9 Global Step: 199090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:27,357-Speed 6305.08 samples/sec Loss 6.6895 LearningRate 0.0007 Epoch: 9 Global Step: 199100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:30,607-Speed 6303.75 samples/sec Loss 6.6008 LearningRate 0.0007 Epoch: 9 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:33,852-Speed 6312.03 samples/sec Loss 6.5546 LearningRate 0.0007 Epoch: 9 Global Step: 199120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:37,102-Speed 6304.01 samples/sec Loss 6.5624 LearningRate 0.0007 Epoch: 9 Global Step: 199130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:40,349-Speed 6308.27 samples/sec Loss 6.6461 LearningRate 0.0007 Epoch: 9 Global Step: 199140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:43,597-Speed 6308.18 samples/sec Loss 6.5686 LearningRate 0.0007 Epoch: 9 Global Step: 199150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:46,856-Speed 6285.95 samples/sec Loss 6.5737 LearningRate 0.0007 Epoch: 9 Global Step: 199160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:50,102-Speed 6309.11 samples/sec Loss 6.5802 LearningRate 0.0007 Epoch: 9 Global Step: 199170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:50:53,355-Speed 6297.77 samples/sec Loss 6.5816 LearningRate 0.0007 Epoch: 9 Global Step: 199180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:50:56,602-Speed 6308.86 samples/sec Loss 6.6306 LearningRate 0.0007 Epoch: 9 Global Step: 199190 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:50:59,836-Speed 6335.36 samples/sec Loss 6.5733 LearningRate 0.0007 Epoch: 9 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:03,080-Speed 6313.20 samples/sec Loss 6.6327 LearningRate 0.0007 Epoch: 9 Global Step: 199210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:06,328-Speed 6306.32 samples/sec Loss 6.5848 LearningRate 0.0007 Epoch: 9 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:09,579-Speed 6302.04 samples/sec Loss 6.5905 LearningRate 0.0007 Epoch: 9 Global Step: 199230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:12,828-Speed 6305.41 samples/sec Loss 6.5867 LearningRate 0.0007 Epoch: 9 Global Step: 199240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:16,073-Speed 6312.60 samples/sec Loss 6.6362 LearningRate 0.0007 Epoch: 9 Global Step: 199250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:19,315-Speed 6316.64 samples/sec Loss 6.7145 LearningRate 0.0007 Epoch: 9 Global Step: 199260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:22,562-Speed 6309.00 samples/sec Loss 6.6113 LearningRate 0.0007 Epoch: 9 Global Step: 199270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:25,816-Speed 6295.21 samples/sec Loss 6.6203 LearningRate 0.0007 Epoch: 9 Global Step: 199280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:29,066-Speed 6303.80 samples/sec Loss 6.5744 LearningRate 0.0007 Epoch: 9 Global Step: 199290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:32,313-Speed 6308.35 samples/sec Loss 
6.6395 LearningRate 0.0007 Epoch: 9 Global Step: 199300 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:51:35,559-Speed 6310.16 samples/sec Loss 6.6020 LearningRate 0.0007 Epoch: 9 Global Step: 199310 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:51:38,794-Speed 6333.28 samples/sec Loss 6.5960 LearningRate 0.0007 Epoch: 9 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:42,040-Speed 6310.31 samples/sec Loss 6.5934 LearningRate 0.0007 Epoch: 9 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:45,284-Speed 6315.37 samples/sec Loss 6.6271 LearningRate 0.0007 Epoch: 9 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:48,529-Speed 6312.31 samples/sec Loss 6.6768 LearningRate 0.0007 Epoch: 9 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:51,776-Speed 6307.80 samples/sec Loss 6.6135 LearningRate 0.0007 Epoch: 9 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:55,040-Speed 6277.65 samples/sec Loss 6.6410 LearningRate 0.0007 Epoch: 9 Global Step: 199370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:51:58,287-Speed 6308.85 samples/sec Loss 6.6060 LearningRate 0.0007 Epoch: 9 Global Step: 199380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:01,536-Speed 6305.14 samples/sec Loss 6.6390 LearningRate 0.0007 Epoch: 9 Global Step: 199390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:04,788-Speed 6297.85 samples/sec Loss 6.5773 LearningRate 0.0007 Epoch: 9 Global Step: 199400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:08,036-Speed 6306.82 samples/sec Loss 6.6107 LearningRate 0.0007 Epoch: 9 Global Step: 199410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:11,269-Speed 6336.99 samples/sec Loss 6.6293 LearningRate 0.0007 Epoch: 9 Global 
Step: 199420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:14,515-Speed 6311.01 samples/sec Loss 6.7093 LearningRate 0.0007 Epoch: 9 Global Step: 199430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:17,761-Speed 6310.13 samples/sec Loss 6.5972 LearningRate 0.0007 Epoch: 9 Global Step: 199440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:21,005-Speed 6314.10 samples/sec Loss 6.6589 LearningRate 0.0007 Epoch: 9 Global Step: 199450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:24,253-Speed 6306.79 samples/sec Loss 6.6322 LearningRate 0.0007 Epoch: 9 Global Step: 199460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:27,502-Speed 6304.24 samples/sec Loss 6.6023 LearningRate 0.0007 Epoch: 9 Global Step: 199470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:30,745-Speed 6317.76 samples/sec Loss 6.6205 LearningRate 0.0007 Epoch: 9 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:33,994-Speed 6304.55 samples/sec Loss 6.6143 LearningRate 0.0007 Epoch: 9 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:37,241-Speed 6308.46 samples/sec Loss 6.6765 LearningRate 0.0007 Epoch: 9 Global Step: 199500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:40,485-Speed 6314.01 samples/sec Loss 6.6121 LearningRate 0.0007 Epoch: 9 Global Step: 199510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:43,734-Speed 6309.91 samples/sec Loss 6.5996 LearningRate 0.0007 Epoch: 9 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:52:46,965-Speed 6339.87 samples/sec Loss 6.5628 LearningRate 0.0007 Epoch: 9 Global Step: 199530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:50,211-Speed 6310.39 samples/sec Loss 6.6002 LearningRate 0.0007 Epoch: 9 Global Step: 199540 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:52:53,460-Speed 6303.91 samples/sec Loss 6.6553 LearningRate 0.0007 Epoch: 9 Global Step: 199550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:56,705-Speed 6312.99 samples/sec Loss 6.5494 LearningRate 0.0007 Epoch: 9 Global Step: 199560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:52:59,948-Speed 6317.86 samples/sec Loss 6.5467 LearningRate 0.0007 Epoch: 9 Global Step: 199570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:03,197-Speed 6305.62 samples/sec Loss 6.6405 LearningRate 0.0007 Epoch: 9 Global Step: 199580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:06,445-Speed 6306.17 samples/sec Loss 6.6560 LearningRate 0.0007 Epoch: 9 Global Step: 199590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:09,688-Speed 6316.27 samples/sec Loss 6.6137 LearningRate 0.0007 Epoch: 9 Global Step: 199600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:12,935-Speed 6309.98 samples/sec Loss 6.6392 LearningRate 0.0007 Epoch: 9 Global Step: 199610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:16,183-Speed 6305.10 samples/sec Loss 6.6223 LearningRate 0.0007 Epoch: 9 Global Step: 199620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:19,418-Speed 6332.80 samples/sec Loss 6.6648 LearningRate 0.0007 Epoch: 9 Global Step: 199630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:22,667-Speed 6305.63 samples/sec Loss 6.5763 LearningRate 0.0007 Epoch: 9 Global Step: 199640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:25,913-Speed 6309.52 samples/sec Loss 6.5952 LearningRate 0.0007 Epoch: 9 Global Step: 199650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:29,162-Speed 6305.78 samples/sec Loss 6.5535 LearningRate 0.0007 Epoch: 9 Global Step: 199660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:53:32,409-Speed 6307.94 samples/sec Loss 6.5764 LearningRate 0.0007 Epoch: 9 Global Step: 199670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:35,652-Speed 6316.50 samples/sec Loss 6.6199 LearningRate 0.0007 Epoch: 9 Global Step: 199680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:38,900-Speed 6306.95 samples/sec Loss 6.5629 LearningRate 0.0007 Epoch: 9 Global Step: 199690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:42,149-Speed 6306.22 samples/sec Loss 6.5881 LearningRate 0.0007 Epoch: 9 Global Step: 199700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:45,397-Speed 6304.99 samples/sec Loss 6.6078 LearningRate 0.0007 Epoch: 9 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:48,645-Speed 6307.13 samples/sec Loss 6.5948 LearningRate 0.0007 Epoch: 9 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:51,883-Speed 6326.67 samples/sec Loss 6.6031 LearningRate 0.0007 Epoch: 9 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:55,133-Speed 6303.80 samples/sec Loss 6.6227 LearningRate 0.0007 Epoch: 9 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:53:58,380-Speed 6307.54 samples/sec Loss 6.6520 LearningRate 0.0007 Epoch: 9 Global Step: 199750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:01,631-Speed 6302.16 samples/sec Loss 6.5580 LearningRate 0.0007 Epoch: 9 Global Step: 199760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:04,876-Speed 6311.58 samples/sec Loss 6.6451 LearningRate 0.0007 Epoch: 9 Global Step: 199770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:08,120-Speed 6314.21 samples/sec Loss 6.6395 LearningRate 0.0007 Epoch: 9 Global Step: 199780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:11,370-Speed 6306.25 samples/sec Loss 
6.6624 LearningRate 0.0007 Epoch: 9 Global Step: 199790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:14,616-Speed 6310.98 samples/sec Loss 6.6153 LearningRate 0.0007 Epoch: 9 Global Step: 199800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:17,862-Speed 6308.99 samples/sec Loss 6.5740 LearningRate 0.0007 Epoch: 9 Global Step: 199810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:21,108-Speed 6312.16 samples/sec Loss 6.6313 LearningRate 0.0007 Epoch: 9 Global Step: 199820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:24,350-Speed 6317.62 samples/sec Loss 6.6008 LearningRate 0.0007 Epoch: 9 Global Step: 199830 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:54:27,582-Speed 6339.21 samples/sec Loss 6.6235 LearningRate 0.0007 Epoch: 9 Global Step: 199840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:30,834-Speed 6297.93 samples/sec Loss 6.5380 LearningRate 0.0007 Epoch: 9 Global Step: 199850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:34,076-Speed 6319.61 samples/sec Loss 6.6460 LearningRate 0.0007 Epoch: 9 Global Step: 199860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:37,321-Speed 6312.67 samples/sec Loss 6.6273 LearningRate 0.0007 Epoch: 9 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:40,578-Speed 6288.78 samples/sec Loss 6.5558 LearningRate 0.0007 Epoch: 9 Global Step: 199880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:43,824-Speed 6311.47 samples/sec Loss 6.6590 LearningRate 0.0007 Epoch: 9 Global Step: 199890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:47,071-Speed 6308.21 samples/sec Loss 6.5783 LearningRate 0.0007 Epoch: 9 Global Step: 199900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:50,320-Speed 6305.05 samples/sec Loss 6.5920 LearningRate 0.0007 Epoch: 9 Global 
Step: 199910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:53,565-Speed 6312.43 samples/sec Loss 6.6302 LearningRate 0.0007 Epoch: 9 Global Step: 199920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:54:56,810-Speed 6311.77 samples/sec Loss 6.5575 LearningRate 0.0007 Epoch: 9 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:00,057-Speed 6310.51 samples/sec Loss 6.5550 LearningRate 0.0007 Epoch: 9 Global Step: 199940 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:55:03,290-Speed 6334.93 samples/sec Loss 6.6954 LearningRate 0.0007 Epoch: 9 Global Step: 199950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:06,538-Speed 6306.48 samples/sec Loss 6.7151 LearningRate 0.0007 Epoch: 9 Global Step: 199960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:09,785-Speed 6308.92 samples/sec Loss 6.6395 LearningRate 0.0007 Epoch: 9 Global Step: 199970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:13,033-Speed 6307.37 samples/sec Loss 6.6594 LearningRate 0.0007 Epoch: 9 Global Step: 199980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:16,283-Speed 6301.37 samples/sec Loss 6.5984 LearningRate 0.0007 Epoch: 9 Global Step: 199990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:19,531-Speed 6308.32 samples/sec Loss 6.5735 LearningRate 0.0007 Epoch: 9 Global Step: 200000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:22,781-Speed 6302.70 samples/sec Loss 6.6327 LearningRate 0.0007 Epoch: 9 Global Step: 200010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:26,030-Speed 6304.55 samples/sec Loss 6.6146 LearningRate 0.0007 Epoch: 9 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:29,278-Speed 6308.60 samples/sec Loss 6.5783 LearningRate 0.0007 Epoch: 9 Global Step: 200030 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:55:32,525-Speed 6307.51 samples/sec Loss 6.5161 LearningRate 0.0007 Epoch: 9 Global Step: 200040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:35,758-Speed 6336.60 samples/sec Loss 6.6834 LearningRate 0.0007 Epoch: 9 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:39,006-Speed 6306.93 samples/sec Loss 6.5383 LearningRate 0.0007 Epoch: 9 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:42,258-Speed 6299.19 samples/sec Loss 6.5625 LearningRate 0.0007 Epoch: 9 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:45,505-Speed 6307.99 samples/sec Loss 6.5860 LearningRate 0.0007 Epoch: 9 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:48,751-Speed 6310.43 samples/sec Loss 6.5785 LearningRate 0.0007 Epoch: 9 Global Step: 200090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:51,995-Speed 6316.12 samples/sec Loss 6.6141 LearningRate 0.0007 Epoch: 9 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:55,241-Speed 6309.30 samples/sec Loss 6.6632 LearningRate 0.0007 Epoch: 9 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:55:58,486-Speed 6312.96 samples/sec Loss 6.5973 LearningRate 0.0007 Epoch: 9 Global Step: 200120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:01,736-Speed 6303.44 samples/sec Loss 6.6064 LearningRate 0.0007 Epoch: 9 Global Step: 200130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:04,982-Speed 6311.50 samples/sec Loss 6.4908 LearningRate 0.0007 Epoch: 9 Global Step: 200140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:08,216-Speed 6334.23 samples/sec Loss 6.5517 LearningRate 0.0007 Epoch: 9 Global Step: 200150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:56:11,469-Speed 6296.31 samples/sec Loss 6.6809 LearningRate 0.0007 Epoch: 9 Global Step: 200160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:14,720-Speed 6301.46 samples/sec Loss 6.6282 LearningRate 0.0007 Epoch: 9 Global Step: 200170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:17,969-Speed 6303.25 samples/sec Loss 6.5658 LearningRate 0.0007 Epoch: 9 Global Step: 200180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:21,220-Speed 6302.10 samples/sec Loss 6.5532 LearningRate 0.0007 Epoch: 9 Global Step: 200190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:24,473-Speed 6297.25 samples/sec Loss 6.5574 LearningRate 0.0007 Epoch: 9 Global Step: 200200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:27,717-Speed 6315.16 samples/sec Loss 6.6112 LearningRate 0.0007 Epoch: 9 Global Step: 200210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:30,962-Speed 6311.35 samples/sec Loss 6.7252 LearningRate 0.0007 Epoch: 9 Global Step: 200220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:34,211-Speed 6305.58 samples/sec Loss 6.6478 LearningRate 0.0007 Epoch: 9 Global Step: 200230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:37,456-Speed 6313.90 samples/sec Loss 6.4834 LearningRate 0.0007 Epoch: 9 Global Step: 200240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:40,702-Speed 6310.41 samples/sec Loss 6.5852 LearningRate 0.0007 Epoch: 9 Global Step: 200250 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:56:43,961-Speed 6284.95 samples/sec Loss 6.6153 LearningRate 0.0007 Epoch: 9 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:56:47,194-Speed 6337.01 samples/sec Loss 6.5913 LearningRate 0.0007 Epoch: 9 Global Step: 200270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:50,446-Speed 6298.44 samples/sec Loss 
6.6321 LearningRate 0.0007 Epoch: 9 Global Step: 200280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:53,695-Speed 6305.37 samples/sec Loss 6.6954 LearningRate 0.0007 Epoch: 9 Global Step: 200290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:56:56,942-Speed 6309.40 samples/sec Loss 6.6068 LearningRate 0.0007 Epoch: 9 Global Step: 200300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:00,191-Speed 6303.87 samples/sec Loss 6.5460 LearningRate 0.0007 Epoch: 9 Global Step: 200310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:03,439-Speed 6307.16 samples/sec Loss 6.5429 LearningRate 0.0007 Epoch: 9 Global Step: 200320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:06,683-Speed 6314.50 samples/sec Loss 6.5921 LearningRate 0.0007 Epoch: 9 Global Step: 200330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:09,927-Speed 6314.64 samples/sec Loss 6.5083 LearningRate 0.0007 Epoch: 9 Global Step: 200340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:13,175-Speed 6306.64 samples/sec Loss 6.6354 LearningRate 0.0007 Epoch: 9 Global Step: 200350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:16,423-Speed 6307.81 samples/sec Loss 6.5936 LearningRate 0.0007 Epoch: 9 Global Step: 200360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:19,651-Speed 6344.05 samples/sec Loss 6.5419 LearningRate 0.0007 Epoch: 9 Global Step: 200370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:22,899-Speed 6307.79 samples/sec Loss 6.6642 LearningRate 0.0007 Epoch: 9 Global Step: 200380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:26,145-Speed 6309.69 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 9 Global Step: 200390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:29,388-Speed 6317.23 samples/sec Loss 6.6676 LearningRate 0.0007 Epoch: 9 Global 
Step: 200400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:32,637-Speed 6304.97 samples/sec Loss 6.6012 LearningRate 0.0007 Epoch: 9 Global Step: 200410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:35,887-Speed 6302.79 samples/sec Loss 6.5819 LearningRate 0.0007 Epoch: 9 Global Step: 200420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:39,134-Speed 6309.72 samples/sec Loss 6.5869 LearningRate 0.0007 Epoch: 9 Global Step: 200430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:42,389-Speed 6292.74 samples/sec Loss 6.6275 LearningRate 0.0007 Epoch: 9 Global Step: 200440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:45,635-Speed 6311.21 samples/sec Loss 6.6042 LearningRate 0.0007 Epoch: 9 Global Step: 200450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:48,883-Speed 6307.95 samples/sec Loss 6.5761 LearningRate 0.0007 Epoch: 9 Global Step: 200460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:52,133-Speed 6302.81 samples/sec Loss 6.6456 LearningRate 0.0007 Epoch: 9 Global Step: 200470 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:57:55,366-Speed 6335.53 samples/sec Loss 6.5506 LearningRate 0.0007 Epoch: 9 Global Step: 200480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:57:58,614-Speed 6306.39 samples/sec Loss 6.5976 LearningRate 0.0007 Epoch: 9 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:01,874-Speed 6283.98 samples/sec Loss 6.5886 LearningRate 0.0007 Epoch: 9 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:05,120-Speed 6309.88 samples/sec Loss 6.6257 LearningRate 0.0007 Epoch: 9 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:08,366-Speed 6311.39 samples/sec Loss 6.5962 LearningRate 0.0007 Epoch: 9 Global Step: 200520 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 09:58:11,615-Speed 6305.10 samples/sec Loss 6.5760 LearningRate 0.0007 Epoch: 9 Global Step: 200530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:14,859-Speed 6313.91 samples/sec Loss 6.5547 LearningRate 0.0007 Epoch: 9 Global Step: 200540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:18,109-Speed 6304.02 samples/sec Loss 6.5751 LearningRate 0.0007 Epoch: 9 Global Step: 200550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:21,358-Speed 6304.17 samples/sec Loss 6.5783 LearningRate 0.0007 Epoch: 9 Global Step: 200560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:24,612-Speed 6295.34 samples/sec Loss 6.5342 LearningRate 0.0007 Epoch: 9 Global Step: 200570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:27,859-Speed 6309.41 samples/sec Loss 6.6217 LearningRate 0.0007 Epoch: 9 Global Step: 200580 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:58:31,093-Speed 6333.67 samples/sec Loss 6.6121 LearningRate 0.0007 Epoch: 9 Global Step: 200590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:34,346-Speed 6296.30 samples/sec Loss 6.7274 LearningRate 0.0007 Epoch: 9 Global Step: 200600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:37,591-Speed 6313.01 samples/sec Loss 6.6683 LearningRate 0.0007 Epoch: 9 Global Step: 200610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:40,839-Speed 6306.58 samples/sec Loss 6.6093 LearningRate 0.0007 Epoch: 9 Global Step: 200620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:44,089-Speed 6304.05 samples/sec Loss 6.6426 LearningRate 0.0007 Epoch: 9 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:47,337-Speed 6305.99 samples/sec Loss 6.5235 LearningRate 0.0007 Epoch: 9 Global Step: 200640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
09:58:50,585-Speed 6308.19 samples/sec Loss 6.5611 LearningRate 0.0007 Epoch: 9 Global Step: 200650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:53,834-Speed 6304.29 samples/sec Loss 6.5409 LearningRate 0.0007 Epoch: 9 Global Step: 200660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:58:57,078-Speed 6314.39 samples/sec Loss 6.6554 LearningRate 0.0007 Epoch: 9 Global Step: 200670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:00,323-Speed 6313.63 samples/sec Loss 6.5923 LearningRate 0.0007 Epoch: 9 Global Step: 200680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:03,557-Speed 6333.69 samples/sec Loss 6.6392 LearningRate 0.0007 Epoch: 9 Global Step: 200690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:06,805-Speed 6306.62 samples/sec Loss 6.5643 LearningRate 0.0007 Epoch: 9 Global Step: 200700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:10,054-Speed 6305.60 samples/sec Loss 6.5709 LearningRate 0.0007 Epoch: 9 Global Step: 200710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:13,307-Speed 6295.63 samples/sec Loss 6.6774 LearningRate 0.0007 Epoch: 9 Global Step: 200720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:16,551-Speed 6315.34 samples/sec Loss 6.6029 LearningRate 0.0007 Epoch: 9 Global Step: 200730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:19,794-Speed 6315.86 samples/sec Loss 6.6150 LearningRate 0.0007 Epoch: 9 Global Step: 200740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:23,054-Speed 6284.48 samples/sec Loss 6.5850 LearningRate 0.0007 Epoch: 9 Global Step: 200750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:26,300-Speed 6310.13 samples/sec Loss 6.6185 LearningRate 0.0007 Epoch: 9 Global Step: 200760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:29,554-Speed 6295.29 samples/sec Loss 
6.6216 LearningRate 0.0007 Epoch: 9 Global Step: 200770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:32,800-Speed 6310.66 samples/sec Loss 6.5612 LearningRate 0.0007 Epoch: 9 Global Step: 200780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:36,058-Speed 6288.07 samples/sec Loss 6.6687 LearningRate 0.0007 Epoch: 9 Global Step: 200790 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 09:59:39,290-Speed 6338.72 samples/sec Loss 6.6330 LearningRate 0.0007 Epoch: 9 Global Step: 200800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:42,534-Speed 6313.31 samples/sec Loss 6.6280 LearningRate 0.0007 Epoch: 9 Global Step: 200810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:45,780-Speed 6311.79 samples/sec Loss 6.5771 LearningRate 0.0007 Epoch: 9 Global Step: 200820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:49,030-Speed 6303.35 samples/sec Loss 6.6263 LearningRate 0.0007 Epoch: 9 Global Step: 200830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:52,276-Speed 6308.74 samples/sec Loss 6.6084 LearningRate 0.0007 Epoch: 9 Global Step: 200840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:55,524-Speed 6308.90 samples/sec Loss 6.6000 LearningRate 0.0007 Epoch: 9 Global Step: 200850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 09:59:58,772-Speed 6307.17 samples/sec Loss 6.5679 LearningRate 0.0007 Epoch: 9 Global Step: 200860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:02,019-Speed 6307.81 samples/sec Loss 6.5265 LearningRate 0.0007 Epoch: 9 Global Step: 200870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:05,270-Speed 6301.19 samples/sec Loss 6.5730 LearningRate 0.0007 Epoch: 9 Global Step: 200880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:08,521-Speed 6301.54 samples/sec Loss 6.6267 LearningRate 0.0007 Epoch: 9 Global 
Step: 200890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:11,753-Speed 6337.61 samples/sec Loss 6.5394 LearningRate 0.0007 Epoch: 9 Global Step: 200900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:15,004-Speed 6300.91 samples/sec Loss 6.5330 LearningRate 0.0007 Epoch: 9 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:18,250-Speed 6311.60 samples/sec Loss 6.5897 LearningRate 0.0007 Epoch: 9 Global Step: 200920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:21,498-Speed 6305.28 samples/sec Loss 6.5993 LearningRate 0.0007 Epoch: 9 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:24,748-Speed 6303.46 samples/sec Loss 6.5230 LearningRate 0.0007 Epoch: 9 Global Step: 200940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:28,008-Speed 6283.86 samples/sec Loss 6.5938 LearningRate 0.0007 Epoch: 9 Global Step: 200950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:31,253-Speed 6312.17 samples/sec Loss 6.5590 LearningRate 0.0007 Epoch: 9 Global Step: 200960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:34,498-Speed 6313.60 samples/sec Loss 6.6020 LearningRate 0.0007 Epoch: 9 Global Step: 200970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:37,745-Speed 6308.80 samples/sec Loss 6.5904 LearningRate 0.0007 Epoch: 9 Global Step: 200980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:40,991-Speed 6310.80 samples/sec Loss 6.5883 LearningRate 0.0007 Epoch: 9 Global Step: 200990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:44,237-Speed 6309.78 samples/sec Loss 6.5527 LearningRate 0.0007 Epoch: 9 Global Step: 201000 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:00:47,498-Speed 6281.63 samples/sec Loss 6.6037 LearningRate 0.0007 Epoch: 9 Global Step: 201010 Fp16 Grad Scale: 65536 
Required: 57 hours Training: 2022-04-01 10:00:50,738-Speed 6322.47 samples/sec Loss 6.6524 LearningRate 0.0007 Epoch: 9 Global Step: 201020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:53,984-Speed 6311.06 samples/sec Loss 6.5106 LearningRate 0.0007 Epoch: 9 Global Step: 201030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:00:57,232-Speed 6307.75 samples/sec Loss 6.5724 LearningRate 0.0007 Epoch: 9 Global Step: 201040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:00,478-Speed 6310.05 samples/sec Loss 6.5968 LearningRate 0.0007 Epoch: 9 Global Step: 201050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:03,736-Speed 6287.29 samples/sec Loss 6.5933 LearningRate 0.0007 Epoch: 9 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:06,982-Speed 6312.02 samples/sec Loss 6.5532 LearningRate 0.0007 Epoch: 9 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:10,232-Speed 6302.69 samples/sec Loss 6.5380 LearningRate 0.0007 Epoch: 9 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:13,474-Speed 6317.61 samples/sec Loss 6.5993 LearningRate 0.0007 Epoch: 9 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:16,723-Speed 6306.32 samples/sec Loss 6.6110 LearningRate 0.0007 Epoch: 9 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:19,968-Speed 6311.42 samples/sec Loss 6.6044 LearningRate 0.0007 Epoch: 9 Global Step: 201110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:23,204-Speed 6331.58 samples/sec Loss 6.5383 LearningRate 0.0007 Epoch: 9 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:26,459-Speed 6292.22 samples/sec Loss 6.5176 LearningRate 0.0007 Epoch: 9 Global Step: 201130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:01:29,710-Speed 6300.69 samples/sec Loss 6.6084 LearningRate 0.0007 Epoch: 9 Global Step: 201140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:32,963-Speed 6298.34 samples/sec Loss 6.6299 LearningRate 0.0007 Epoch: 9 Global Step: 201150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:36,215-Speed 6299.74 samples/sec Loss 6.5546 LearningRate 0.0007 Epoch: 9 Global Step: 201160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:39,458-Speed 6314.86 samples/sec Loss 6.5894 LearningRate 0.0007 Epoch: 9 Global Step: 201170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:42,703-Speed 6312.40 samples/sec Loss 6.5575 LearningRate 0.0007 Epoch: 9 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:45,948-Speed 6313.41 samples/sec Loss 6.5431 LearningRate 0.0007 Epoch: 9 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:49,198-Speed 6302.82 samples/sec Loss 6.5595 LearningRate 0.0007 Epoch: 9 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:52,447-Speed 6305.79 samples/sec Loss 6.5827 LearningRate 0.0007 Epoch: 9 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:01:55,702-Speed 6292.15 samples/sec Loss 6.5994 LearningRate 0.0007 Epoch: 9 Global Step: 201220 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:01:58,931-Speed 6343.77 samples/sec Loss 6.5630 LearningRate 0.0007 Epoch: 9 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:02,178-Speed 6308.56 samples/sec Loss 6.6312 LearningRate 0.0007 Epoch: 9 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:05,427-Speed 6306.97 samples/sec Loss 6.4898 LearningRate 0.0007 Epoch: 9 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:08,672-Speed 6311.03 samples/sec Loss 
6.4668 LearningRate 0.0007 Epoch: 9 Global Step: 201260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:11,925-Speed 6296.62 samples/sec Loss 6.5258 LearningRate 0.0007 Epoch: 9 Global Step: 201270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:15,173-Speed 6306.92 samples/sec Loss 6.5476 LearningRate 0.0007 Epoch: 9 Global Step: 201280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:18,418-Speed 6314.18 samples/sec Loss 6.5656 LearningRate 0.0007 Epoch: 9 Global Step: 201290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:21,669-Speed 6301.68 samples/sec Loss 6.6041 LearningRate 0.0007 Epoch: 9 Global Step: 201300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:24,915-Speed 6310.04 samples/sec Loss 6.6067 LearningRate 0.0007 Epoch: 9 Global Step: 201310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:28,163-Speed 6307.46 samples/sec Loss 6.6578 LearningRate 0.0007 Epoch: 9 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:31,408-Speed 6311.74 samples/sec Loss 6.5278 LearningRate 0.0007 Epoch: 9 Global Step: 201330 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:02:34,656-Speed 6307.10 samples/sec Loss 6.5607 LearningRate 0.0007 Epoch: 9 Global Step: 201340 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:02:37,905-Speed 6305.34 samples/sec Loss 6.6348 LearningRate 0.0007 Epoch: 9 Global Step: 201350 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:02:41,155-Speed 6303.45 samples/sec Loss 6.6068 LearningRate 0.0007 Epoch: 9 Global Step: 201360 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:02:44,390-Speed 6332.33 samples/sec Loss 6.6250 LearningRate 0.0007 Epoch: 9 Global Step: 201370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:47,635-Speed 6313.12 samples/sec Loss 6.5194 LearningRate 0.0007 Epoch: 9 Global 
Step: 201380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:50,881-Speed 6309.24 samples/sec Loss 6.6298 LearningRate 0.0007 Epoch: 9 Global Step: 201390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:54,122-Speed 6321.22 samples/sec Loss 6.5926 LearningRate 0.0007 Epoch: 9 Global Step: 201400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:02:57,372-Speed 6302.06 samples/sec Loss 6.5935 LearningRate 0.0007 Epoch: 9 Global Step: 201410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:00,620-Speed 6307.33 samples/sec Loss 6.5719 LearningRate 0.0007 Epoch: 9 Global Step: 201420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:03,867-Speed 6310.06 samples/sec Loss 6.4903 LearningRate 0.0007 Epoch: 9 Global Step: 201430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:07,111-Speed 6314.42 samples/sec Loss 6.6453 LearningRate 0.0007 Epoch: 9 Global Step: 201440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:10,358-Speed 6309.11 samples/sec Loss 6.6759 LearningRate 0.0007 Epoch: 9 Global Step: 201450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:13,605-Speed 6307.65 samples/sec Loss 6.6512 LearningRate 0.0007 Epoch: 9 Global Step: 201460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:16,839-Speed 6337.73 samples/sec Loss 6.5950 LearningRate 0.0007 Epoch: 9 Global Step: 201470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:20,086-Speed 6309.78 samples/sec Loss 6.5805 LearningRate 0.0007 Epoch: 9 Global Step: 201480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:23,331-Speed 6312.00 samples/sec Loss 6.6311 LearningRate 0.0007 Epoch: 9 Global Step: 201490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:26,579-Speed 6306.43 samples/sec Loss 6.6613 LearningRate 0.0007 Epoch: 9 Global Step: 201500 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:03:29,828-Speed 6304.47 samples/sec Loss 6.5847 LearningRate 0.0007 Epoch: 9 Global Step: 201510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:33,071-Speed 6316.82 samples/sec Loss 6.6077 LearningRate 0.0007 Epoch: 9 Global Step: 201520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:36,322-Speed 6301.46 samples/sec Loss 6.6409 LearningRate 0.0007 Epoch: 9 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:39,581-Speed 6286.71 samples/sec Loss 6.5837 LearningRate 0.0007 Epoch: 9 Global Step: 201540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:42,830-Speed 6304.07 samples/sec Loss 6.6303 LearningRate 0.0007 Epoch: 9 Global Step: 201550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:46,074-Speed 6314.14 samples/sec Loss 6.6029 LearningRate 0.0007 Epoch: 9 Global Step: 201560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:49,319-Speed 6312.02 samples/sec Loss 6.6163 LearningRate 0.0007 Epoch: 9 Global Step: 201570 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:03:52,552-Speed 6337.81 samples/sec Loss 6.6080 LearningRate 0.0007 Epoch: 9 Global Step: 201580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:55,799-Speed 6308.68 samples/sec Loss 6.5914 LearningRate 0.0007 Epoch: 9 Global Step: 201590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:03:59,045-Speed 6308.99 samples/sec Loss 6.6482 LearningRate 0.0007 Epoch: 9 Global Step: 201600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:02,296-Speed 6300.80 samples/sec Loss 6.5804 LearningRate 0.0007 Epoch: 9 Global Step: 201610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:05,545-Speed 6306.02 samples/sec Loss 6.6638 LearningRate 0.0007 Epoch: 9 Global Step: 201620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:04:08,792-Speed 6308.59 samples/sec Loss 6.5655 LearningRate 0.0007 Epoch: 9 Global Step: 201630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:12,043-Speed 6302.28 samples/sec Loss 6.4794 LearningRate 0.0007 Epoch: 9 Global Step: 201640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:15,290-Speed 6308.08 samples/sec Loss 6.5392 LearningRate 0.0007 Epoch: 9 Global Step: 201650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:18,540-Speed 6303.59 samples/sec Loss 6.5150 LearningRate 0.0007 Epoch: 9 Global Step: 201660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:21,788-Speed 6306.24 samples/sec Loss 6.5602 LearningRate 0.0007 Epoch: 9 Global Step: 201670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:25,027-Speed 6327.11 samples/sec Loss 6.5770 LearningRate 0.0007 Epoch: 9 Global Step: 201680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:28,274-Speed 6308.64 samples/sec Loss 6.6237 LearningRate 0.0007 Epoch: 9 Global Step: 201690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:31,520-Speed 6310.59 samples/sec Loss 6.5997 LearningRate 0.0007 Epoch: 9 Global Step: 201700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:34,770-Speed 6302.71 samples/sec Loss 6.6480 LearningRate 0.0007 Epoch: 9 Global Step: 201710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:38,017-Speed 6309.05 samples/sec Loss 6.6223 LearningRate 0.0007 Epoch: 9 Global Step: 201720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:41,265-Speed 6307.91 samples/sec Loss 6.5222 LearningRate 0.0007 Epoch: 9 Global Step: 201730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:44,512-Speed 6307.69 samples/sec Loss 6.5463 LearningRate 0.0007 Epoch: 9 Global Step: 201740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:47,761-Speed 6305.89 samples/sec Loss 
6.6688 LearningRate 0.0007 Epoch: 9 Global Step: 201750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:51,095-Speed 6143.72 samples/sec Loss 6.5465 LearningRate 0.0007 Epoch: 9 Global Step: 201760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:54,362-Speed 6270.47 samples/sec Loss 6.6011 LearningRate 0.0007 Epoch: 9 Global Step: 201770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:04:57,609-Speed 6308.13 samples/sec Loss 6.6190 LearningRate 0.0007 Epoch: 9 Global Step: 201780 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:05:00,839-Speed 6342.05 samples/sec Loss 6.6044 LearningRate 0.0007 Epoch: 9 Global Step: 201790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:04,094-Speed 6293.68 samples/sec Loss 6.5926 LearningRate 0.0007 Epoch: 9 Global Step: 201800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:07,341-Speed 6309.08 samples/sec Loss 6.5720 LearningRate 0.0007 Epoch: 9 Global Step: 201810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:10,583-Speed 6317.57 samples/sec Loss 6.5944 LearningRate 0.0007 Epoch: 9 Global Step: 201820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:13,830-Speed 6309.63 samples/sec Loss 6.4948 LearningRate 0.0007 Epoch: 9 Global Step: 201830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:17,084-Speed 6295.16 samples/sec Loss 6.5550 LearningRate 0.0007 Epoch: 9 Global Step: 201840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:20,337-Speed 6296.43 samples/sec Loss 6.6544 LearningRate 0.0007 Epoch: 9 Global Step: 201850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:23,581-Speed 6314.42 samples/sec Loss 6.6194 LearningRate 0.0007 Epoch: 9 Global Step: 201860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:26,830-Speed 6305.11 samples/sec Loss 6.6563 LearningRate 0.0007 Epoch: 9 Global 
Step: 201870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:30,075-Speed 6312.32 samples/sec Loss 6.6210 LearningRate 0.0007 Epoch: 9 Global Step: 201880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:33,320-Speed 6313.70 samples/sec Loss 6.5522 LearningRate 0.0007 Epoch: 9 Global Step: 201890 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:05:36,566-Speed 6310.06 samples/sec Loss 6.6437 LearningRate 0.0007 Epoch: 9 Global Step: 201900 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:05:39,796-Speed 6341.69 samples/sec Loss 6.6019 LearningRate 0.0007 Epoch: 9 Global Step: 201910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:43,045-Speed 6306.48 samples/sec Loss 6.6613 LearningRate 0.0007 Epoch: 9 Global Step: 201920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:46,290-Speed 6311.88 samples/sec Loss 6.5902 LearningRate 0.0007 Epoch: 9 Global Step: 201930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:49,537-Speed 6309.48 samples/sec Loss 6.5824 LearningRate 0.0007 Epoch: 9 Global Step: 201940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:52,785-Speed 6307.32 samples/sec Loss 6.5757 LearningRate 0.0007 Epoch: 9 Global Step: 201950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:56,030-Speed 6311.85 samples/sec Loss 6.5772 LearningRate 0.0007 Epoch: 9 Global Step: 201960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:05:59,274-Speed 6314.30 samples/sec Loss 6.5593 LearningRate 0.0007 Epoch: 9 Global Step: 201970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:02,519-Speed 6312.51 samples/sec Loss 6.5205 LearningRate 0.0007 Epoch: 9 Global Step: 201980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:05,765-Speed 6312.53 samples/sec Loss 6.6089 LearningRate 0.0007 Epoch: 9 Global Step: 201990 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:06:09,008-Speed 6316.25 samples/sec Loss 6.5819 LearningRate 0.0007 Epoch: 9 Global Step: 202000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:12,254-Speed 6309.62 samples/sec Loss 6.5805 LearningRate 0.0007 Epoch: 9 Global Step: 202010 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:06:15,491-Speed 6329.73 samples/sec Loss 6.5331 LearningRate 0.0007 Epoch: 9 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:18,735-Speed 6314.43 samples/sec Loss 6.6032 LearningRate 0.0007 Epoch: 9 Global Step: 202030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:21,989-Speed 6293.72 samples/sec Loss 6.5678 LearningRate 0.0007 Epoch: 9 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:25,232-Speed 6317.66 samples/sec Loss 6.5612 LearningRate 0.0007 Epoch: 9 Global Step: 202050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:28,478-Speed 6310.75 samples/sec Loss 6.5490 LearningRate 0.0007 Epoch: 9 Global Step: 202060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:31,723-Speed 6312.01 samples/sec Loss 6.5400 LearningRate 0.0007 Epoch: 9 Global Step: 202070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:34,976-Speed 6298.96 samples/sec Loss 6.5394 LearningRate 0.0007 Epoch: 9 Global Step: 202080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:38,220-Speed 6314.16 samples/sec Loss 6.6469 LearningRate 0.0007 Epoch: 9 Global Step: 202090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:41,470-Speed 6303.49 samples/sec Loss 6.6105 LearningRate 0.0007 Epoch: 9 Global Step: 202100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:44,729-Speed 6285.84 samples/sec Loss 6.5547 LearningRate 0.0007 Epoch: 9 Global Step: 202110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:06:47,971-Speed 6317.71 samples/sec Loss 6.5627 LearningRate 0.0007 Epoch: 9 Global Step: 202120 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:06:51,204-Speed 6336.38 samples/sec Loss 6.5899 LearningRate 0.0007 Epoch: 9 Global Step: 202130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:54,454-Speed 6303.33 samples/sec Loss 6.5949 LearningRate 0.0007 Epoch: 9 Global Step: 202140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:06:57,696-Speed 6318.63 samples/sec Loss 6.5312 LearningRate 0.0007 Epoch: 9 Global Step: 202150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:00,949-Speed 6297.05 samples/sec Loss 6.6118 LearningRate 0.0007 Epoch: 9 Global Step: 202160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:04,194-Speed 6313.50 samples/sec Loss 6.6088 LearningRate 0.0007 Epoch: 9 Global Step: 202170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:07,440-Speed 6309.15 samples/sec Loss 6.5556 LearningRate 0.0007 Epoch: 9 Global Step: 202180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:10,690-Speed 6303.80 samples/sec Loss 6.5277 LearningRate 0.0007 Epoch: 9 Global Step: 202190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:13,938-Speed 6307.21 samples/sec Loss 6.6248 LearningRate 0.0007 Epoch: 9 Global Step: 202200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:17,186-Speed 6305.89 samples/sec Loss 6.6562 LearningRate 0.0007 Epoch: 9 Global Step: 202210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:20,432-Speed 6311.27 samples/sec Loss 6.6078 LearningRate 0.0007 Epoch: 9 Global Step: 202220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:23,679-Speed 6308.44 samples/sec Loss 6.6001 LearningRate 0.0007 Epoch: 9 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:07:26,932-Speed 6296.54 samples/sec Loss 
6.5661 LearningRate 0.0007 Epoch: 9 Global Step: 202240 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:07:30,163-Speed 6340.68 samples/sec Loss 6.5288 LearningRate 0.0007 Epoch: 9 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:33,412-Speed 6305.44 samples/sec Loss 6.5468 LearningRate 0.0007 Epoch: 9 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:36,661-Speed 6304.97 samples/sec Loss 6.5738 LearningRate 0.0007 Epoch: 9 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:39,937-Speed 6252.39 samples/sec Loss 6.5299 LearningRate 0.0007 Epoch: 9 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:43,195-Speed 6288.50 samples/sec Loss 6.6174 LearningRate 0.0007 Epoch: 9 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:46,444-Speed 6304.99 samples/sec Loss 6.6147 LearningRate 0.0007 Epoch: 9 Global Step: 202300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:49,703-Speed 6285.20 samples/sec Loss 6.5500 LearningRate 0.0007 Epoch: 9 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:52,950-Speed 6308.60 samples/sec Loss 6.5487 LearningRate 0.0007 Epoch: 9 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:56,200-Speed 6301.98 samples/sec Loss 6.5774 LearningRate 0.0007 Epoch: 9 Global Step: 202330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:07:59,447-Speed 6309.74 samples/sec Loss 6.6500 LearningRate 0.0007 Epoch: 9 Global Step: 202340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:02,696-Speed 6304.91 samples/sec Loss 6.6391 LearningRate 0.0007 Epoch: 9 Global Step: 202350 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:08:05,945-Speed 6305.32 samples/sec Loss 6.5238 LearningRate 0.0007 Epoch: 9 Global 
Step: 202360 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:08:09,191-Speed 6310.09 samples/sec Loss 6.6074 LearningRate 0.0007 Epoch: 9 Global Step: 202370 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:08:12,425-Speed 6335.70 samples/sec Loss 6.5672 LearningRate 0.0007 Epoch: 9 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:15,672-Speed 6307.28 samples/sec Loss 6.5561 LearningRate 0.0007 Epoch: 9 Global Step: 202390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:18,918-Speed 6311.13 samples/sec Loss 6.6203 LearningRate 0.0007 Epoch: 9 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:22,164-Speed 6310.86 samples/sec Loss 6.5869 LearningRate 0.0007 Epoch: 9 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:25,408-Speed 6314.00 samples/sec Loss 6.5550 LearningRate 0.0007 Epoch: 9 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:28,655-Speed 6308.93 samples/sec Loss 6.4700 LearningRate 0.0007 Epoch: 9 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:31,897-Speed 6318.19 samples/sec Loss 6.5140 LearningRate 0.0007 Epoch: 9 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:35,148-Speed 6302.21 samples/sec Loss 6.5994 LearningRate 0.0007 Epoch: 9 Global Step: 202450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:38,396-Speed 6306.27 samples/sec Loss 6.5566 LearningRate 0.0007 Epoch: 9 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:41,641-Speed 6313.42 samples/sec Loss 6.5817 LearningRate 0.0007 Epoch: 9 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:44,886-Speed 6311.54 samples/sec Loss 6.6673 LearningRate 0.0007 Epoch: 9 Global Step: 202480 Fp16 Grad Scale: 65536 
Required: 57 hours Training: 2022-04-01 10:08:48,128-Speed 6318.08 samples/sec Loss 6.5740 LearningRate 0.0007 Epoch: 9 Global Step: 202490 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:08:51,374-Speed 6311.58 samples/sec Loss 6.5709 LearningRate 0.0007 Epoch: 9 Global Step: 202500 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:08:54,605-Speed 6339.12 samples/sec Loss 6.5809 LearningRate 0.0007 Epoch: 9 Global Step: 202510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:08:57,853-Speed 6307.80 samples/sec Loss 6.5425 LearningRate 0.0007 Epoch: 9 Global Step: 202520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:01,101-Speed 6305.49 samples/sec Loss 6.5083 LearningRate 0.0007 Epoch: 9 Global Step: 202530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:04,349-Speed 6307.74 samples/sec Loss 6.5576 LearningRate 0.0007 Epoch: 9 Global Step: 202540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:07,599-Speed 6303.54 samples/sec Loss 6.5374 LearningRate 0.0007 Epoch: 9 Global Step: 202550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:10,845-Speed 6311.15 samples/sec Loss 6.5765 LearningRate 0.0007 Epoch: 9 Global Step: 202560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:14,091-Speed 6310.54 samples/sec Loss 6.5662 LearningRate 0.0007 Epoch: 9 Global Step: 202570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:17,334-Speed 6316.61 samples/sec Loss 6.5212 LearningRate 0.0007 Epoch: 9 Global Step: 202580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:20,579-Speed 6312.82 samples/sec Loss 6.4996 LearningRate 0.0007 Epoch: 9 Global Step: 202590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:23,831-Speed 6299.04 samples/sec Loss 6.5475 LearningRate 0.0007 Epoch: 9 Global Step: 202600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:09:27,076-Speed 6313.25 samples/sec Loss 6.6010 LearningRate 0.0007 Epoch: 9 Global Step: 202610 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:09:30,309-Speed 6336.01 samples/sec Loss 6.6135 LearningRate 0.0007 Epoch: 9 Global Step: 202620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:33,555-Speed 6310.36 samples/sec Loss 6.6234 LearningRate 0.0007 Epoch: 9 Global Step: 202630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:36,801-Speed 6311.00 samples/sec Loss 6.5859 LearningRate 0.0007 Epoch: 9 Global Step: 202640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:40,053-Speed 6298.12 samples/sec Loss 6.6767 LearningRate 0.0007 Epoch: 9 Global Step: 202650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:43,296-Speed 6316.09 samples/sec Loss 6.4885 LearningRate 0.0007 Epoch: 9 Global Step: 202660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:46,544-Speed 6308.11 samples/sec Loss 6.5947 LearningRate 0.0007 Epoch: 9 Global Step: 202670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:49,787-Speed 6315.68 samples/sec Loss 6.5548 LearningRate 0.0007 Epoch: 9 Global Step: 202680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:53,033-Speed 6311.69 samples/sec Loss 6.6579 LearningRate 0.0007 Epoch: 9 Global Step: 202690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:56,277-Speed 6312.72 samples/sec Loss 6.5857 LearningRate 0.0007 Epoch: 9 Global Step: 202700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:09:59,524-Speed 6308.67 samples/sec Loss 6.5453 LearningRate 0.0007 Epoch: 9 Global Step: 202710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:02,771-Speed 6310.41 samples/sec Loss 6.5692 LearningRate 0.0007 Epoch: 9 Global Step: 202720 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:10:06,002-Speed 6340.22 samples/sec Loss 
6.5627 LearningRate 0.0007 Epoch: 9 Global Step: 202730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:09,263-Speed 6279.85 samples/sec Loss 6.5713 LearningRate 0.0007 Epoch: 9 Global Step: 202740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:12,508-Speed 6312.96 samples/sec Loss 6.5696 LearningRate 0.0007 Epoch: 9 Global Step: 202750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:15,758-Speed 6303.65 samples/sec Loss 6.5694 LearningRate 0.0007 Epoch: 9 Global Step: 202760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:19,007-Speed 6305.43 samples/sec Loss 6.6138 LearningRate 0.0007 Epoch: 9 Global Step: 202770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:22,253-Speed 6310.25 samples/sec Loss 6.5025 LearningRate 0.0007 Epoch: 9 Global Step: 202780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:25,501-Speed 6307.52 samples/sec Loss 6.5308 LearningRate 0.0007 Epoch: 9 Global Step: 202790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:28,748-Speed 6308.17 samples/sec Loss 6.6106 LearningRate 0.0007 Epoch: 9 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:31,990-Speed 6317.80 samples/sec Loss 6.6468 LearningRate 0.0007 Epoch: 9 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:35,237-Speed 6309.27 samples/sec Loss 6.6549 LearningRate 0.0007 Epoch: 9 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:38,484-Speed 6310.08 samples/sec Loss 6.5782 LearningRate 0.0007 Epoch: 9 Global Step: 202830 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:10:41,719-Speed 6331.52 samples/sec Loss 6.5808 LearningRate 0.0007 Epoch: 9 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:44,961-Speed 6319.07 samples/sec Loss 6.6462 LearningRate 0.0007 Epoch: 9 Global 
Step: 202850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:48,213-Speed 6298.78 samples/sec Loss 6.5898 LearningRate 0.0007 Epoch: 9 Global Step: 202860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:51,460-Speed 6308.63 samples/sec Loss 6.5339 LearningRate 0.0007 Epoch: 9 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:54,702-Speed 6319.07 samples/sec Loss 6.5436 LearningRate 0.0007 Epoch: 9 Global Step: 202880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:10:57,950-Speed 6305.21 samples/sec Loss 6.6455 LearningRate 0.0007 Epoch: 9 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:01,193-Speed 6316.14 samples/sec Loss 6.5867 LearningRate 0.0007 Epoch: 9 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:04,442-Speed 6306.80 samples/sec Loss 6.6254 LearningRate 0.0007 Epoch: 9 Global Step: 202910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:07,688-Speed 6309.81 samples/sec Loss 6.6046 LearningRate 0.0007 Epoch: 9 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:10,940-Speed 6300.08 samples/sec Loss 6.5633 LearningRate 0.0007 Epoch: 9 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:14,184-Speed 6313.85 samples/sec Loss 6.6849 LearningRate 0.0007 Epoch: 9 Global Step: 202940 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:11:17,428-Speed 6313.97 samples/sec Loss 6.6336 LearningRate 0.0007 Epoch: 9 Global Step: 202950 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:11:20,675-Speed 6308.55 samples/sec Loss 6.6171 LearningRate 0.0007 Epoch: 9 Global Step: 202960 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:11:23,926-Speed 6302.47 samples/sec Loss 6.5309 LearningRate 0.0007 Epoch: 9 Global Step: 202970 Fp16 Grad Scale: 65536 
Required: 57 hours Training: 2022-04-01 10:11:27,172-Speed 6309.66 samples/sec Loss 6.5201 LearningRate 0.0007 Epoch: 9 Global Step: 202980 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:11:30,408-Speed 6331.36 samples/sec Loss 6.6091 LearningRate 0.0007 Epoch: 9 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:33,654-Speed 6309.61 samples/sec Loss 6.6008 LearningRate 0.0007 Epoch: 9 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:36,905-Speed 6302.49 samples/sec Loss 6.5345 LearningRate 0.0007 Epoch: 9 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:40,151-Speed 6310.13 samples/sec Loss 6.5282 LearningRate 0.0007 Epoch: 9 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:43,398-Speed 6307.94 samples/sec Loss 6.6159 LearningRate 0.0007 Epoch: 9 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:46,642-Speed 6315.86 samples/sec Loss 6.5874 LearningRate 0.0007 Epoch: 9 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:49,890-Speed 6306.96 samples/sec Loss 6.5826 LearningRate 0.0007 Epoch: 9 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:53,132-Speed 6316.94 samples/sec Loss 6.6065 LearningRate 0.0007 Epoch: 9 Global Step: 203060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:56,376-Speed 6314.94 samples/sec Loss 6.5470 LearningRate 0.0007 Epoch: 9 Global Step: 203070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:11:59,620-Speed 6314.99 samples/sec Loss 6.6229 LearningRate 0.0007 Epoch: 9 Global Step: 203080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:02,893-Speed 6258.60 samples/sec Loss 6.7036 LearningRate 0.0007 Epoch: 9 Global Step: 203090 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 
10:12:06,165-Speed 6260.13 samples/sec Loss 6.5941 LearningRate 0.0007 Epoch: 9 Global Step: 203100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:09,420-Speed 6293.92 samples/sec Loss 6.5682 LearningRate 0.0007 Epoch: 9 Global Step: 203110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:12,666-Speed 6309.96 samples/sec Loss 6.5707 LearningRate 0.0007 Epoch: 9 Global Step: 203120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:15,917-Speed 6302.60 samples/sec Loss 6.5614 LearningRate 0.0007 Epoch: 9 Global Step: 203130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:19,164-Speed 6307.87 samples/sec Loss 6.6266 LearningRate 0.0007 Epoch: 9 Global Step: 203140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:22,411-Speed 6307.65 samples/sec Loss 6.6245 LearningRate 0.0007 Epoch: 9 Global Step: 203150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:25,757-Speed 6122.76 samples/sec Loss 6.5800 LearningRate 0.0007 Epoch: 9 Global Step: 203160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:29,025-Speed 6268.41 samples/sec Loss 6.5036 LearningRate 0.0007 Epoch: 9 Global Step: 203170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:32,279-Speed 6294.64 samples/sec Loss 6.5656 LearningRate 0.0007 Epoch: 9 Global Step: 203180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:35,531-Speed 6300.60 samples/sec Loss 6.6017 LearningRate 0.0007 Epoch: 9 Global Step: 203190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:38,781-Speed 6303.76 samples/sec Loss 6.5856 LearningRate 0.0007 Epoch: 9 Global Step: 203200 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:12:42,050-Speed 6264.62 samples/sec Loss 6.6166 LearningRate 0.0007 Epoch: 9 Global Step: 203210 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:12:45,280-Speed 6342.07 samples/sec Loss 
6.5865 LearningRate 0.0007 Epoch: 9 Global Step: 203220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:48,526-Speed 6312.19 samples/sec Loss 6.5166 LearningRate 0.0007 Epoch: 9 Global Step: 203230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:51,773-Speed 6308.84 samples/sec Loss 6.6609 LearningRate 0.0007 Epoch: 9 Global Step: 203240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:55,020-Speed 6308.01 samples/sec Loss 6.6432 LearningRate 0.0007 Epoch: 9 Global Step: 203250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:12:58,265-Speed 6312.20 samples/sec Loss 6.6237 LearningRate 0.0007 Epoch: 9 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:01,517-Speed 6298.88 samples/sec Loss 6.5460 LearningRate 0.0007 Epoch: 9 Global Step: 203270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:04,777-Speed 6284.60 samples/sec Loss 6.5415 LearningRate 0.0007 Epoch: 9 Global Step: 203280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:08,027-Speed 6302.56 samples/sec Loss 6.5109 LearningRate 0.0007 Epoch: 9 Global Step: 203290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:11,275-Speed 6306.32 samples/sec Loss 6.5107 LearningRate 0.0007 Epoch: 9 Global Step: 203300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:14,532-Speed 6289.40 samples/sec Loss 6.5321 LearningRate 0.0007 Epoch: 9 Global Step: 203310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:17,767-Speed 6332.45 samples/sec Loss 6.5876 LearningRate 0.0007 Epoch: 9 Global Step: 203320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:21,019-Speed 6299.12 samples/sec Loss 6.5181 LearningRate 0.0007 Epoch: 9 Global Step: 203330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:24,270-Speed 6301.17 samples/sec Loss 6.5074 LearningRate 0.0007 Epoch: 9 Global 
Step: 203340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:27,516-Speed 6309.94 samples/sec Loss 6.5134 LearningRate 0.0007 Epoch: 9 Global Step: 203350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:30,765-Speed 6304.87 samples/sec Loss 6.6064 LearningRate 0.0007 Epoch: 9 Global Step: 203360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:34,014-Speed 6304.99 samples/sec Loss 6.6137 LearningRate 0.0007 Epoch: 9 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:37,259-Speed 6313.26 samples/sec Loss 6.5901 LearningRate 0.0007 Epoch: 9 Global Step: 203380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:40,510-Speed 6301.35 samples/sec Loss 6.5192 LearningRate 0.0007 Epoch: 9 Global Step: 203390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:43,762-Speed 6299.23 samples/sec Loss 6.6202 LearningRate 0.0007 Epoch: 9 Global Step: 203400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:47,007-Speed 6313.03 samples/sec Loss 6.4921 LearningRate 0.0007 Epoch: 9 Global Step: 203410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:13:50,254-Speed 6308.96 samples/sec Loss 6.6269 LearningRate 0.0007 Epoch: 9 Global Step: 203420 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:13:53,501-Speed 6308.71 samples/sec Loss 6.5609 LearningRate 0.0007 Epoch: 9 Global Step: 203430 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:13:56,748-Speed 6308.83 samples/sec Loss 6.5409 LearningRate 0.0007 Epoch: 9 Global Step: 203440 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:13:59,979-Speed 6338.85 samples/sec Loss 6.5099 LearningRate 0.0007 Epoch: 9 Global Step: 203450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:03,228-Speed 6306.10 samples/sec Loss 6.6033 LearningRate 0.0007 Epoch: 9 Global Step: 203460 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:14:06,474-Speed 6309.04 samples/sec Loss 6.5692 LearningRate 0.0007 Epoch: 9 Global Step: 203470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:09,723-Speed 6306.95 samples/sec Loss 6.5264 LearningRate 0.0007 Epoch: 9 Global Step: 203480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:12,968-Speed 6311.87 samples/sec Loss 6.5096 LearningRate 0.0007 Epoch: 9 Global Step: 203490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:16,216-Speed 6307.73 samples/sec Loss 6.5785 LearningRate 0.0007 Epoch: 9 Global Step: 203500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:19,464-Speed 6305.65 samples/sec Loss 6.5739 LearningRate 0.0007 Epoch: 9 Global Step: 203510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:22,708-Speed 6314.47 samples/sec Loss 6.5567 LearningRate 0.0007 Epoch: 9 Global Step: 203520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:25,952-Speed 6314.11 samples/sec Loss 6.6326 LearningRate 0.0007 Epoch: 9 Global Step: 203530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:29,200-Speed 6308.07 samples/sec Loss 6.6488 LearningRate 0.0007 Epoch: 9 Global Step: 203540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:32,452-Speed 6299.13 samples/sec Loss 6.4895 LearningRate 0.0007 Epoch: 9 Global Step: 203550 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:14:35,684-Speed 6337.10 samples/sec Loss 6.5210 LearningRate 0.0007 Epoch: 9 Global Step: 203560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:38,929-Speed 6313.29 samples/sec Loss 6.5985 LearningRate 0.0007 Epoch: 9 Global Step: 203570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:42,178-Speed 6305.32 samples/sec Loss 6.5145 LearningRate 0.0007 Epoch: 9 Global Step: 203580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:14:45,428-Speed 6303.12 samples/sec Loss 6.5666 LearningRate 0.0007 Epoch: 9 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:48,696-Speed 6268.18 samples/sec Loss 6.5716 LearningRate 0.0007 Epoch: 9 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:51,946-Speed 6303.13 samples/sec Loss 6.4915 LearningRate 0.0007 Epoch: 9 Global Step: 203610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:55,189-Speed 6317.76 samples/sec Loss 6.5899 LearningRate 0.0007 Epoch: 9 Global Step: 203620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:14:58,434-Speed 6311.94 samples/sec Loss 6.5941 LearningRate 0.0007 Epoch: 9 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:01,683-Speed 6305.86 samples/sec Loss 6.5565 LearningRate 0.0007 Epoch: 9 Global Step: 203640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:04,929-Speed 6310.32 samples/sec Loss 6.5808 LearningRate 0.0007 Epoch: 9 Global Step: 203650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:08,172-Speed 6315.43 samples/sec Loss 6.5767 LearningRate 0.0007 Epoch: 9 Global Step: 203660 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:15:11,406-Speed 6335.63 samples/sec Loss 6.5311 LearningRate 0.0007 Epoch: 9 Global Step: 203670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:14,655-Speed 6304.38 samples/sec Loss 6.5802 LearningRate 0.0007 Epoch: 9 Global Step: 203680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:17,897-Speed 6318.81 samples/sec Loss 6.4915 LearningRate 0.0007 Epoch: 9 Global Step: 203690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:21,144-Speed 6308.30 samples/sec Loss 6.5233 LearningRate 0.0007 Epoch: 9 Global Step: 203700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:24,392-Speed 6307.79 samples/sec Loss 
6.5628 LearningRate 0.0007 Epoch: 9 Global Step: 203710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:27,635-Speed 6315.35 samples/sec Loss 6.5159 LearningRate 0.0007 Epoch: 9 Global Step: 203720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:30,885-Speed 6304.14 samples/sec Loss 6.5715 LearningRate 0.0007 Epoch: 9 Global Step: 203730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:34,135-Speed 6302.82 samples/sec Loss 6.6456 LearningRate 0.0007 Epoch: 9 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:37,382-Speed 6309.58 samples/sec Loss 6.5116 LearningRate 0.0007 Epoch: 9 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:40,628-Speed 6309.91 samples/sec Loss 6.5447 LearningRate 0.0007 Epoch: 9 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:43,877-Speed 6304.28 samples/sec Loss 6.5966 LearningRate 0.0007 Epoch: 9 Global Step: 203770 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:15:47,124-Speed 6309.31 samples/sec Loss 6.5835 LearningRate 0.0007 Epoch: 9 Global Step: 203780 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:15:50,371-Speed 6309.32 samples/sec Loss 6.6013 LearningRate 0.0007 Epoch: 9 Global Step: 203790 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:15:53,614-Speed 6315.51 samples/sec Loss 6.6251 LearningRate 0.0007 Epoch: 9 Global Step: 203800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:15:56,860-Speed 6310.64 samples/sec Loss 6.5984 LearningRate 0.0007 Epoch: 9 Global Step: 203810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:00,108-Speed 6308.79 samples/sec Loss 6.6436 LearningRate 0.0007 Epoch: 9 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:03,378-Speed 6263.73 samples/sec Loss 6.5734 LearningRate 0.0007 Epoch: 9 Global 
Step: 203830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:06,622-Speed 6315.23 samples/sec Loss 6.7119 LearningRate 0.0007 Epoch: 9 Global Step: 203840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:09,871-Speed 6305.12 samples/sec Loss 6.5980 LearningRate 0.0007 Epoch: 9 Global Step: 203850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:13,119-Speed 6307.21 samples/sec Loss 6.6132 LearningRate 0.0007 Epoch: 9 Global Step: 203860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:16,366-Speed 6307.84 samples/sec Loss 6.5930 LearningRate 0.0007 Epoch: 9 Global Step: 203870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:19,614-Speed 6306.60 samples/sec Loss 6.5503 LearningRate 0.0007 Epoch: 9 Global Step: 203880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:22,862-Speed 6306.83 samples/sec Loss 6.5064 LearningRate 0.0007 Epoch: 9 Global Step: 203890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:26,109-Speed 6308.19 samples/sec Loss 6.5612 LearningRate 0.0007 Epoch: 9 Global Step: 203900 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:16:29,350-Speed 6322.19 samples/sec Loss 6.5466 LearningRate 0.0007 Epoch: 9 Global Step: 203910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:32,595-Speed 6312.50 samples/sec Loss 6.5165 LearningRate 0.0007 Epoch: 9 Global Step: 203920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:35,841-Speed 6310.96 samples/sec Loss 6.4978 LearningRate 0.0007 Epoch: 9 Global Step: 203930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:39,089-Speed 6306.19 samples/sec Loss 6.5443 LearningRate 0.0007 Epoch: 9 Global Step: 203940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:42,339-Speed 6302.64 samples/sec Loss 6.5060 LearningRate 0.0007 Epoch: 9 Global Step: 203950 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:16:45,584-Speed 6312.73 samples/sec Loss 6.5691 LearningRate 0.0007 Epoch: 9 Global Step: 203960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:48,831-Speed 6308.51 samples/sec Loss 6.5238 LearningRate 0.0007 Epoch: 9 Global Step: 203970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:52,083-Speed 6298.40 samples/sec Loss 6.6024 LearningRate 0.0007 Epoch: 9 Global Step: 203980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:16:55,310-Speed 6349.36 samples/sec Loss 6.5493 LearningRate 0.0007 Epoch: 9 Global Step: 203990 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:16:58,558-Speed 6306.99 samples/sec Loss 6.5905 LearningRate 0.0007 Epoch: 9 Global Step: 204000 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:01,808-Speed 6302.47 samples/sec Loss 6.5392 LearningRate 0.0007 Epoch: 9 Global Step: 204010 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:05,053-Speed 6312.98 samples/sec Loss 6.4587 LearningRate 0.0007 Epoch: 9 Global Step: 204020 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:08,302-Speed 6305.54 samples/sec Loss 6.5209 LearningRate 0.0007 Epoch: 9 Global Step: 204030 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:11,547-Speed 6312.34 samples/sec Loss 6.5449 LearningRate 0.0007 Epoch: 9 Global Step: 204040 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:14,789-Speed 6319.33 samples/sec Loss 6.5916 LearningRate 0.0007 Epoch: 9 Global Step: 204050 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:18,036-Speed 6309.32 samples/sec Loss 6.5590 LearningRate 0.0007 Epoch: 9 Global Step: 204060 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:21,281-Speed 6310.73 samples/sec Loss 6.4931 LearningRate 0.0007 Epoch: 9 Global Step: 204070 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 
10:17:24,531-Speed 6303.60 samples/sec Loss 6.5307 LearningRate 0.0007 Epoch: 9 Global Step: 204080 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:17:27,788-Speed 6289.51 samples/sec Loss 6.5997 LearningRate 0.0007 Epoch: 9 Global Step: 204090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:31,032-Speed 6314.43 samples/sec Loss 6.4691 LearningRate 0.0007 Epoch: 9 Global Step: 204100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:34,288-Speed 6292.19 samples/sec Loss 6.5751 LearningRate 0.0007 Epoch: 9 Global Step: 204110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:37,531-Speed 6314.87 samples/sec Loss 6.5832 LearningRate 0.0007 Epoch: 9 Global Step: 204120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:40,778-Speed 6308.92 samples/sec Loss 6.5999 LearningRate 0.0007 Epoch: 9 Global Step: 204130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:44,025-Speed 6310.30 samples/sec Loss 6.5506 LearningRate 0.0007 Epoch: 9 Global Step: 204140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:47,279-Speed 6294.34 samples/sec Loss 6.5062 LearningRate 0.0007 Epoch: 9 Global Step: 204150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:50,529-Speed 6303.18 samples/sec Loss 6.5263 LearningRate 0.0007 Epoch: 9 Global Step: 204160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:53,773-Speed 6314.94 samples/sec Loss 6.5771 LearningRate 0.0007 Epoch: 9 Global Step: 204170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:17:57,024-Speed 6299.89 samples/sec Loss 6.5430 LearningRate 0.0007 Epoch: 9 Global Step: 204180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:00,257-Speed 6336.78 samples/sec Loss 6.5171 LearningRate 0.0007 Epoch: 9 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:03,502-Speed 6313.08 samples/sec Loss 
6.4660 LearningRate 0.0007 Epoch: 9 Global Step: 204200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:06,751-Speed 6304.75 samples/sec Loss 6.5612 LearningRate 0.0007 Epoch: 9 Global Step: 204210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:10,004-Speed 6297.72 samples/sec Loss 6.5672 LearningRate 0.0007 Epoch: 9 Global Step: 204220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:13,263-Speed 6285.14 samples/sec Loss 6.4846 LearningRate 0.0007 Epoch: 9 Global Step: 204230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:16,509-Speed 6311.82 samples/sec Loss 6.4880 LearningRate 0.0007 Epoch: 9 Global Step: 204240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:19,753-Speed 6314.29 samples/sec Loss 6.5157 LearningRate 0.0007 Epoch: 9 Global Step: 204250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:23,002-Speed 6303.81 samples/sec Loss 6.5343 LearningRate 0.0007 Epoch: 9 Global Step: 204260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:26,254-Speed 6299.19 samples/sec Loss 6.5336 LearningRate 0.0007 Epoch: 9 Global Step: 204270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:29,504-Speed 6303.33 samples/sec Loss 6.5635 LearningRate 0.0007 Epoch: 9 Global Step: 204280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:32,762-Speed 6286.97 samples/sec Loss 6.5285 LearningRate 0.0007 Epoch: 9 Global Step: 204290 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:18:36,000-Speed 6326.45 samples/sec Loss 6.4689 LearningRate 0.0007 Epoch: 9 Global Step: 204300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:39,249-Speed 6305.25 samples/sec Loss 6.5366 LearningRate 0.0007 Epoch: 9 Global Step: 204310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:42,501-Speed 6299.84 samples/sec Loss 6.5164 LearningRate 0.0007 Epoch: 9 Global 
Step: 204320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:45,747-Speed 6310.05 samples/sec Loss 6.5458 LearningRate 0.0007 Epoch: 9 Global Step: 204330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:48,994-Speed 6308.11 samples/sec Loss 6.4567 LearningRate 0.0007 Epoch: 9 Global Step: 204340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:52,243-Speed 6305.99 samples/sec Loss 6.5426 LearningRate 0.0007 Epoch: 9 Global Step: 204350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:55,491-Speed 6305.95 samples/sec Loss 6.5944 LearningRate 0.0007 Epoch: 9 Global Step: 204360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:18:58,740-Speed 6304.80 samples/sec Loss 6.5874 LearningRate 0.0007 Epoch: 9 Global Step: 204370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:01,989-Speed 6306.02 samples/sec Loss 6.4695 LearningRate 0.0007 Epoch: 9 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:05,244-Speed 6292.74 samples/sec Loss 6.5506 LearningRate 0.0007 Epoch: 9 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:08,488-Speed 6315.56 samples/sec Loss 6.5705 LearningRate 0.0007 Epoch: 9 Global Step: 204400 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:19:11,735-Speed 6307.78 samples/sec Loss 6.5595 LearningRate 0.0007 Epoch: 9 Global Step: 204410 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:19:14,971-Speed 6332.21 samples/sec Loss 6.6212 LearningRate 0.0007 Epoch: 9 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:18,218-Speed 6308.16 samples/sec Loss 6.6357 LearningRate 0.0007 Epoch: 9 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:21,463-Speed 6313.01 samples/sec Loss 6.5851 LearningRate 0.0007 Epoch: 9 Global Step: 204440 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:19:24,712-Speed 6306.15 samples/sec Loss 6.5885 LearningRate 0.0007 Epoch: 9 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:27,959-Speed 6307.50 samples/sec Loss 6.5162 LearningRate 0.0007 Epoch: 9 Global Step: 204460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:31,206-Speed 6308.06 samples/sec Loss 6.5296 LearningRate 0.0007 Epoch: 9 Global Step: 204470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:34,452-Speed 6311.78 samples/sec Loss 6.5708 LearningRate 0.0007 Epoch: 9 Global Step: 204480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:37,698-Speed 6310.96 samples/sec Loss 6.5311 LearningRate 0.0007 Epoch: 9 Global Step: 204490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:40,955-Speed 6289.55 samples/sec Loss 6.5372 LearningRate 0.0007 Epoch: 9 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:44,200-Speed 6312.28 samples/sec Loss 6.6086 LearningRate 0.0007 Epoch: 9 Global Step: 204510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:47,444-Speed 6314.60 samples/sec Loss 6.5467 LearningRate 0.0007 Epoch: 9 Global Step: 204520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:50,687-Speed 6317.26 samples/sec Loss 6.5110 LearningRate 0.0007 Epoch: 9 Global Step: 204530 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:53,930-Speed 6315.88 samples/sec Loss 6.4692 LearningRate 0.0007 Epoch: 9 Global Step: 204540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:19:57,180-Speed 6302.56 samples/sec Loss 6.6375 LearningRate 0.0007 Epoch: 9 Global Step: 204550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:00,427-Speed 6308.48 samples/sec Loss 6.5770 LearningRate 0.0007 Epoch: 9 Global Step: 204560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:20:03,682-Speed 6292.96 samples/sec Loss 6.5332 LearningRate 0.0007 Epoch: 9 Global Step: 204570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:06,965-Speed 6239.69 samples/sec Loss 6.5965 LearningRate 0.0007 Epoch: 9 Global Step: 204580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:10,215-Speed 6302.32 samples/sec Loss 6.5811 LearningRate 0.0007 Epoch: 9 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:13,462-Speed 6308.64 samples/sec Loss 6.5318 LearningRate 0.0007 Epoch: 9 Global Step: 204600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:16,704-Speed 6320.24 samples/sec Loss 6.5757 LearningRate 0.0007 Epoch: 9 Global Step: 204610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:19,952-Speed 6305.35 samples/sec Loss 6.5030 LearningRate 0.0007 Epoch: 9 Global Step: 204620 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:20:23,197-Speed 6313.03 samples/sec Loss 6.5986 LearningRate 0.0007 Epoch: 9 Global Step: 204630 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:20:26,445-Speed 6307.89 samples/sec Loss 6.6081 LearningRate 0.0007 Epoch: 9 Global Step: 204640 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:20:29,695-Speed 6303.59 samples/sec Loss 6.5464 LearningRate 0.0007 Epoch: 9 Global Step: 204650 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:20:32,929-Speed 6333.53 samples/sec Loss 6.4506 LearningRate 0.0007 Epoch: 9 Global Step: 204660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:36,177-Speed 6307.07 samples/sec Loss 6.5631 LearningRate 0.0007 Epoch: 9 Global Step: 204670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:39,421-Speed 6315.25 samples/sec Loss 6.5778 LearningRate 0.0007 Epoch: 9 Global Step: 204680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:42,668-Speed 6308.23 samples/sec Loss 
6.5128 LearningRate 0.0007 Epoch: 9 Global Step: 204690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:45,917-Speed 6305.23 samples/sec Loss 6.5295 LearningRate 0.0007 Epoch: 9 Global Step: 204700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:49,165-Speed 6307.11 samples/sec Loss 6.6230 LearningRate 0.0007 Epoch: 9 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:52,411-Speed 6310.14 samples/sec Loss 6.4959 LearningRate 0.0007 Epoch: 9 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:55,656-Speed 6311.87 samples/sec Loss 6.5781 LearningRate 0.0007 Epoch: 9 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:20:58,904-Speed 6307.24 samples/sec Loss 6.4561 LearningRate 0.0007 Epoch: 9 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:02,153-Speed 6304.61 samples/sec Loss 6.5504 LearningRate 0.0007 Epoch: 9 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:05,389-Speed 6331.23 samples/sec Loss 6.6280 LearningRate 0.0007 Epoch: 9 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:08,640-Speed 6299.99 samples/sec Loss 6.4648 LearningRate 0.0007 Epoch: 9 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:11,895-Speed 6292.96 samples/sec Loss 6.4698 LearningRate 0.0007 Epoch: 9 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:15,141-Speed 6311.78 samples/sec Loss 6.5433 LearningRate 0.0007 Epoch: 9 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:18,384-Speed 6315.09 samples/sec Loss 6.6144 LearningRate 0.0007 Epoch: 9 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:21,635-Speed 6301.03 samples/sec Loss 6.5740 LearningRate 0.0007 Epoch: 9 Global 
Step: 204810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:24,882-Speed 6309.28 samples/sec Loss 6.5438 LearningRate 0.0007 Epoch: 9 Global Step: 204820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:28,147-Speed 6273.46 samples/sec Loss 6.5840 LearningRate 0.0007 Epoch: 9 Global Step: 204830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:31,396-Speed 6304.92 samples/sec Loss 6.5605 LearningRate 0.0007 Epoch: 9 Global Step: 204840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:34,645-Speed 6306.13 samples/sec Loss 6.6179 LearningRate 0.0007 Epoch: 9 Global Step: 204850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:37,896-Speed 6301.31 samples/sec Loss 6.5323 LearningRate 0.0007 Epoch: 9 Global Step: 204860 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:21:41,141-Speed 6311.90 samples/sec Loss 6.6007 LearningRate 0.0007 Epoch: 9 Global Step: 204870 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:21:44,391-Speed 6304.84 samples/sec Loss 6.5566 LearningRate 0.0007 Epoch: 9 Global Step: 204880 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:21:47,626-Speed 6330.72 samples/sec Loss 6.6132 LearningRate 0.0007 Epoch: 9 Global Step: 204890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:50,875-Speed 6305.34 samples/sec Loss 6.5275 LearningRate 0.0007 Epoch: 9 Global Step: 204900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:54,121-Speed 6310.85 samples/sec Loss 6.5604 LearningRate 0.0007 Epoch: 9 Global Step: 204910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:21:57,368-Speed 6308.51 samples/sec Loss 6.5951 LearningRate 0.0007 Epoch: 9 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:00,620-Speed 6298.23 samples/sec Loss 6.5964 LearningRate 0.0007 Epoch: 9 Global Step: 204930 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:22:03,876-Speed 6292.50 samples/sec Loss 6.5448 LearningRate 0.0007 Epoch: 9 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:07,125-Speed 6304.14 samples/sec Loss 6.4450 LearningRate 0.0007 Epoch: 9 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:10,377-Speed 6299.58 samples/sec Loss 6.5423 LearningRate 0.0007 Epoch: 9 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:13,627-Speed 6302.11 samples/sec Loss 6.5901 LearningRate 0.0007 Epoch: 9 Global Step: 204970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:16,875-Speed 6307.29 samples/sec Loss 6.5984 LearningRate 0.0007 Epoch: 9 Global Step: 204980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:20,106-Speed 6340.06 samples/sec Loss 6.5923 LearningRate 0.0007 Epoch: 9 Global Step: 204990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:23,355-Speed 6305.69 samples/sec Loss 6.5113 LearningRate 0.0007 Epoch: 9 Global Step: 205000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:26,602-Speed 6308.50 samples/sec Loss 6.5095 LearningRate 0.0007 Epoch: 9 Global Step: 205010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:29,854-Speed 6298.59 samples/sec Loss 6.5512 LearningRate 0.0007 Epoch: 9 Global Step: 205020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:33,099-Speed 6311.63 samples/sec Loss 6.5210 LearningRate 0.0007 Epoch: 9 Global Step: 205030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:36,350-Speed 6302.69 samples/sec Loss 6.5681 LearningRate 0.0007 Epoch: 9 Global Step: 205040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:39,595-Speed 6311.32 samples/sec Loss 6.5036 LearningRate 0.0007 Epoch: 9 Global Step: 205050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:22:42,840-Speed 6313.35 samples/sec Loss 6.6029 LearningRate 0.0007 Epoch: 9 Global Step: 205060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:46,088-Speed 6306.79 samples/sec Loss 6.6798 LearningRate 0.0007 Epoch: 9 Global Step: 205070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:49,343-Speed 6295.13 samples/sec Loss 6.5853 LearningRate 0.0007 Epoch: 9 Global Step: 205080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:22:52,585-Speed 6317.32 samples/sec Loss 6.6266 LearningRate 0.0007 Epoch: 9 Global Step: 205090 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:22:55,831-Speed 6310.69 samples/sec Loss 6.5273 LearningRate 0.0007 Epoch: 9 Global Step: 205100 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:22:59,071-Speed 6322.26 samples/sec Loss 6.6067 LearningRate 0.0007 Epoch: 9 Global Step: 205110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:02,320-Speed 6305.59 samples/sec Loss 6.4781 LearningRate 0.0007 Epoch: 9 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:05,579-Speed 6285.80 samples/sec Loss 6.5946 LearningRate 0.0007 Epoch: 9 Global Step: 205130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:08,824-Speed 6312.27 samples/sec Loss 6.5427 LearningRate 0.0007 Epoch: 9 Global Step: 205140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:12,074-Speed 6302.49 samples/sec Loss 6.5345 LearningRate 0.0007 Epoch: 9 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:15,331-Speed 6289.49 samples/sec Loss 6.5220 LearningRate 0.0007 Epoch: 9 Global Step: 205160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:18,580-Speed 6305.13 samples/sec Loss 6.5950 LearningRate 0.0007 Epoch: 9 Global Step: 205170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:21,828-Speed 6307.89 samples/sec Loss 
6.5868 LearningRate 0.0007 Epoch: 9 Global Step: 205180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:25,074-Speed 6310.98 samples/sec Loss 6.5754 LearningRate 0.0007 Epoch: 9 Global Step: 205190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:28,322-Speed 6306.67 samples/sec Loss 6.5027 LearningRate 0.0007 Epoch: 9 Global Step: 205200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:31,570-Speed 6306.33 samples/sec Loss 6.5339 LearningRate 0.0007 Epoch: 9 Global Step: 205210 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:23:34,821-Speed 6301.21 samples/sec Loss 6.5135 LearningRate 0.0007 Epoch: 9 Global Step: 205220 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:23:38,052-Speed 6339.47 samples/sec Loss 6.5341 LearningRate 0.0007 Epoch: 9 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:41,302-Speed 6303.20 samples/sec Loss 6.4871 LearningRate 0.0007 Epoch: 9 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:44,554-Speed 6299.48 samples/sec Loss 6.4931 LearningRate 0.0007 Epoch: 9 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:47,802-Speed 6306.06 samples/sec Loss 6.5066 LearningRate 0.0007 Epoch: 9 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:51,051-Speed 6306.00 samples/sec Loss 6.5310 LearningRate 0.0007 Epoch: 9 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:54,296-Speed 6311.51 samples/sec Loss 6.6092 LearningRate 0.0007 Epoch: 9 Global Step: 205280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:23:57,544-Speed 6306.79 samples/sec Loss 6.5663 LearningRate 0.0007 Epoch: 9 Global Step: 205290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:00,796-Speed 6299.39 samples/sec Loss 6.5404 LearningRate 0.0007 Epoch: 9 Global 
Step: 205300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:04,044-Speed 6307.74 samples/sec Loss 6.4788 LearningRate 0.0007 Epoch: 9 Global Step: 205310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:07,291-Speed 6309.07 samples/sec Loss 6.5089 LearningRate 0.0007 Epoch: 9 Global Step: 205320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:10,520-Speed 6343.54 samples/sec Loss 6.5185 LearningRate 0.0007 Epoch: 9 Global Step: 205330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:13,766-Speed 6311.12 samples/sec Loss 6.5734 LearningRate 0.0007 Epoch: 9 Global Step: 205340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:17,012-Speed 6309.97 samples/sec Loss 6.5337 LearningRate 0.0007 Epoch: 9 Global Step: 205350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:20,265-Speed 6296.12 samples/sec Loss 6.5760 LearningRate 0.0007 Epoch: 9 Global Step: 205360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:23,512-Speed 6311.10 samples/sec Loss 6.5085 LearningRate 0.0007 Epoch: 9 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:26,769-Speed 6288.97 samples/sec Loss 6.5335 LearningRate 0.0007 Epoch: 9 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:30,013-Speed 6313.51 samples/sec Loss 6.6371 LearningRate 0.0007 Epoch: 9 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:33,258-Speed 6313.09 samples/sec Loss 6.6092 LearningRate 0.0007 Epoch: 9 Global Step: 205400 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:36,505-Speed 6308.88 samples/sec Loss 6.5196 LearningRate 0.0007 Epoch: 9 Global Step: 205410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:39,752-Speed 6307.80 samples/sec Loss 6.5440 LearningRate 0.0007 Epoch: 9 Global Step: 205420 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:24:42,983-Speed 6339.50 samples/sec Loss 6.5031 LearningRate 0.0007 Epoch: 9 Global Step: 205430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:46,230-Speed 6310.80 samples/sec Loss 6.5322 LearningRate 0.0007 Epoch: 9 Global Step: 205440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:49,473-Speed 6314.96 samples/sec Loss 6.5559 LearningRate 0.0007 Epoch: 9 Global Step: 205450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:52,720-Speed 6309.01 samples/sec Loss 6.5970 LearningRate 0.0007 Epoch: 9 Global Step: 205460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:55,966-Speed 6310.32 samples/sec Loss 6.5066 LearningRate 0.0007 Epoch: 9 Global Step: 205470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:24:59,209-Speed 6316.86 samples/sec Loss 6.5457 LearningRate 0.0007 Epoch: 9 Global Step: 205480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:02,455-Speed 6310.17 samples/sec Loss 6.4750 LearningRate 0.0007 Epoch: 9 Global Step: 205490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:05,703-Speed 6309.29 samples/sec Loss 6.6545 LearningRate 0.0007 Epoch: 9 Global Step: 205500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:08,948-Speed 6312.79 samples/sec Loss 6.5287 LearningRate 0.0007 Epoch: 9 Global Step: 205510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:12,199-Speed 6300.89 samples/sec Loss 6.6640 LearningRate 0.0007 Epoch: 9 Global Step: 205520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:15,454-Speed 6293.73 samples/sec Loss 6.4768 LearningRate 0.0007 Epoch: 9 Global Step: 205530 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:25:18,702-Speed 6305.83 samples/sec Loss 6.5037 LearningRate 0.0007 Epoch: 9 Global Step: 205540 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 
10:25:21,936-Speed 6335.34 samples/sec Loss 6.4786 LearningRate 0.0007 Epoch: 9 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:25,186-Speed 6301.24 samples/sec Loss 6.5187 LearningRate 0.0007 Epoch: 9 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:28,432-Speed 6312.09 samples/sec Loss 6.5092 LearningRate 0.0007 Epoch: 9 Global Step: 205570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:31,677-Speed 6310.68 samples/sec Loss 6.6016 LearningRate 0.0007 Epoch: 9 Global Step: 205580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:34,921-Speed 6315.26 samples/sec Loss 6.5760 LearningRate 0.0007 Epoch: 9 Global Step: 205590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:38,179-Speed 6288.14 samples/sec Loss 6.4063 LearningRate 0.0007 Epoch: 9 Global Step: 205600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:41,425-Speed 6309.96 samples/sec Loss 6.6102 LearningRate 0.0007 Epoch: 9 Global Step: 205610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:44,668-Speed 6317.98 samples/sec Loss 6.4808 LearningRate 0.0007 Epoch: 9 Global Step: 205620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:47,913-Speed 6311.56 samples/sec Loss 6.5332 LearningRate 0.0007 Epoch: 9 Global Step: 205630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:51,159-Speed 6310.46 samples/sec Loss 6.4877 LearningRate 0.0007 Epoch: 9 Global Step: 205640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:25:54,408-Speed 6305.23 samples/sec Loss 6.5160 LearningRate 0.0007 Epoch: 9 Global Step: 205650 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:25:57,647-Speed 6323.94 samples/sec Loss 6.4929 LearningRate 0.0007 Epoch: 9 Global Step: 205660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:00,893-Speed 6309.84 samples/sec Loss 
6.5282 LearningRate 0.0007 Epoch: 9 Global Step: 205670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:04,147-Speed 6296.18 samples/sec Loss 6.6317 LearningRate 0.0007 Epoch: 9 Global Step: 205680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:07,391-Speed 6314.90 samples/sec Loss 6.5682 LearningRate 0.0007 Epoch: 9 Global Step: 205690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:10,641-Speed 6301.78 samples/sec Loss 6.5356 LearningRate 0.0007 Epoch: 9 Global Step: 205700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:13,889-Speed 6306.98 samples/sec Loss 6.4612 LearningRate 0.0007 Epoch: 9 Global Step: 205710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:17,136-Speed 6310.28 samples/sec Loss 6.5380 LearningRate 0.0007 Epoch: 9 Global Step: 205720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:20,386-Speed 6303.64 samples/sec Loss 6.5079 LearningRate 0.0007 Epoch: 9 Global Step: 205730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:23,632-Speed 6310.85 samples/sec Loss 6.5166 LearningRate 0.0007 Epoch: 9 Global Step: 205740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:26,879-Speed 6308.48 samples/sec Loss 6.5606 LearningRate 0.0007 Epoch: 9 Global Step: 205750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:30,109-Speed 6341.00 samples/sec Loss 6.5850 LearningRate 0.0007 Epoch: 9 Global Step: 205760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:33,354-Speed 6312.21 samples/sec Loss 6.6046 LearningRate 0.0007 Epoch: 9 Global Step: 205770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:36,607-Speed 6298.17 samples/sec Loss 6.5877 LearningRate 0.0007 Epoch: 9 Global Step: 205780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:39,856-Speed 6304.32 samples/sec Loss 6.5175 LearningRate 0.0007 Epoch: 9 Global 
Step: 205790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:43,105-Speed 6306.01 samples/sec Loss 6.5842 LearningRate 0.0007 Epoch: 9 Global Step: 205800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:46,348-Speed 6315.22 samples/sec Loss 6.5024 LearningRate 0.0007 Epoch: 9 Global Step: 205810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:49,595-Speed 6309.53 samples/sec Loss 6.6191 LearningRate 0.0007 Epoch: 9 Global Step: 205820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:52,841-Speed 6311.02 samples/sec Loss 6.6165 LearningRate 0.0007 Epoch: 9 Global Step: 205830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:56,088-Speed 6309.32 samples/sec Loss 6.5250 LearningRate 0.0007 Epoch: 9 Global Step: 205840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:26:59,338-Speed 6301.26 samples/sec Loss 6.4940 LearningRate 0.0007 Epoch: 9 Global Step: 205850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:02,585-Speed 6309.46 samples/sec Loss 6.5808 LearningRate 0.0007 Epoch: 9 Global Step: 205860 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:27:05,826-Speed 6320.57 samples/sec Loss 6.5494 LearningRate 0.0007 Epoch: 9 Global Step: 205870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:09,069-Speed 6315.90 samples/sec Loss 6.5995 LearningRate 0.0007 Epoch: 9 Global Step: 205880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:12,323-Speed 6294.83 samples/sec Loss 6.5409 LearningRate 0.0007 Epoch: 9 Global Step: 205890 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:15,567-Speed 6316.24 samples/sec Loss 6.6050 LearningRate 0.0007 Epoch: 9 Global Step: 205900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:18,812-Speed 6311.65 samples/sec Loss 6.5427 LearningRate 0.0007 Epoch: 9 Global Step: 205910 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:27:22,057-Speed 6312.29 samples/sec Loss 6.5486 LearningRate 0.0007 Epoch: 9 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:25,303-Speed 6311.33 samples/sec Loss 6.4487 LearningRate 0.0007 Epoch: 9 Global Step: 205930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:28,552-Speed 6305.35 samples/sec Loss 6.5356 LearningRate 0.0007 Epoch: 9 Global Step: 205940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:31,799-Speed 6308.68 samples/sec Loss 6.4830 LearningRate 0.0007 Epoch: 9 Global Step: 205950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:35,045-Speed 6312.63 samples/sec Loss 6.5787 LearningRate 0.0007 Epoch: 9 Global Step: 205960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:38,291-Speed 6310.07 samples/sec Loss 6.5739 LearningRate 0.0007 Epoch: 9 Global Step: 205970 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:27:41,539-Speed 6305.82 samples/sec Loss 6.5600 LearningRate 0.0007 Epoch: 9 Global Step: 205980 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:27:44,771-Speed 6339.29 samples/sec Loss 6.5537 LearningRate 0.0007 Epoch: 9 Global Step: 205990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:48,016-Speed 6312.74 samples/sec Loss 6.4648 LearningRate 0.0007 Epoch: 9 Global Step: 206000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:51,266-Speed 6302.77 samples/sec Loss 6.5567 LearningRate 0.0007 Epoch: 9 Global Step: 206010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:54,525-Speed 6284.57 samples/sec Loss 6.5116 LearningRate 0.0007 Epoch: 9 Global Step: 206020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:27:57,769-Speed 6314.65 samples/sec Loss 6.5195 LearningRate 0.0007 Epoch: 9 Global Step: 206030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:28:01,019-Speed 6303.97 samples/sec Loss 6.5823 LearningRate 0.0007 Epoch: 9 Global Step: 206040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:04,270-Speed 6301.13 samples/sec Loss 6.5431 LearningRate 0.0007 Epoch: 9 Global Step: 206050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:07,515-Speed 6312.13 samples/sec Loss 6.5177 LearningRate 0.0007 Epoch: 9 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:10,763-Speed 6307.85 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 9 Global Step: 206070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:14,007-Speed 6313.11 samples/sec Loss 6.5837 LearningRate 0.0007 Epoch: 9 Global Step: 206080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:17,253-Speed 6310.23 samples/sec Loss 6.5785 LearningRate 0.0007 Epoch: 9 Global Step: 206090 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:28:20,498-Speed 6312.85 samples/sec Loss 6.5939 LearningRate 0.0007 Epoch: 9 Global Step: 206100 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:28:23,733-Speed 6333.70 samples/sec Loss 6.4633 LearningRate 0.0007 Epoch: 9 Global Step: 206110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:26,977-Speed 6314.05 samples/sec Loss 6.5777 LearningRate 0.0007 Epoch: 9 Global Step: 206120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:30,224-Speed 6308.72 samples/sec Loss 6.5711 LearningRate 0.0007 Epoch: 9 Global Step: 206130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:33,474-Speed 6302.60 samples/sec Loss 6.5661 LearningRate 0.0007 Epoch: 9 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:36,720-Speed 6311.25 samples/sec Loss 6.5673 LearningRate 0.0007 Epoch: 9 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:39,964-Speed 6316.19 samples/sec Loss 
6.5684 LearningRate 0.0007 Epoch: 9 Global Step: 206160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:43,213-Speed 6304.10 samples/sec Loss 6.5651 LearningRate 0.0007 Epoch: 9 Global Step: 206170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:46,460-Speed 6308.06 samples/sec Loss 6.5037 LearningRate 0.0007 Epoch: 9 Global Step: 206180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:49,705-Speed 6312.73 samples/sec Loss 6.5474 LearningRate 0.0007 Epoch: 9 Global Step: 206190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:52,952-Speed 6309.26 samples/sec Loss 6.5009 LearningRate 0.0007 Epoch: 9 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:28:56,200-Speed 6306.87 samples/sec Loss 6.4985 LearningRate 0.0007 Epoch: 9 Global Step: 206210 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:28:59,434-Speed 6333.97 samples/sec Loss 6.5212 LearningRate 0.0007 Epoch: 9 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:02,680-Speed 6310.73 samples/sec Loss 6.5645 LearningRate 0.0007 Epoch: 9 Global Step: 206230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:05,927-Speed 6309.34 samples/sec Loss 6.5705 LearningRate 0.0007 Epoch: 9 Global Step: 206240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:09,174-Speed 6307.27 samples/sec Loss 6.6084 LearningRate 0.0007 Epoch: 9 Global Step: 206250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:12,422-Speed 6310.89 samples/sec Loss 6.5632 LearningRate 0.0007 Epoch: 9 Global Step: 206260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:15,668-Speed 6309.44 samples/sec Loss 6.4705 LearningRate 0.0007 Epoch: 9 Global Step: 206270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:18,919-Speed 6301.42 samples/sec Loss 6.5918 LearningRate 0.0007 Epoch: 9 Global 
Step: 206280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:22,167-Speed 6307.08 samples/sec Loss 6.5493 LearningRate 0.0007 Epoch: 9 Global Step: 206290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:25,413-Speed 6309.96 samples/sec Loss 6.4728 LearningRate 0.0007 Epoch: 9 Global Step: 206300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:28,662-Speed 6305.67 samples/sec Loss 6.5394 LearningRate 0.0007 Epoch: 9 Global Step: 206310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:31,895-Speed 6335.05 samples/sec Loss 6.5822 LearningRate 0.0007 Epoch: 9 Global Step: 206320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:35,141-Speed 6311.65 samples/sec Loss 6.5859 LearningRate 0.0007 Epoch: 9 Global Step: 206330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:38,391-Speed 6303.43 samples/sec Loss 6.6069 LearningRate 0.0007 Epoch: 9 Global Step: 206340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:41,646-Speed 6293.67 samples/sec Loss 6.5119 LearningRate 0.0007 Epoch: 9 Global Step: 206350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:44,891-Speed 6312.43 samples/sec Loss 6.4952 LearningRate 0.0007 Epoch: 9 Global Step: 206360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:48,154-Speed 6278.23 samples/sec Loss 6.5814 LearningRate 0.0007 Epoch: 9 Global Step: 206370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:51,402-Speed 6307.42 samples/sec Loss 6.5968 LearningRate 0.0007 Epoch: 9 Global Step: 206380 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:54,650-Speed 6306.56 samples/sec Loss 6.5989 LearningRate 0.0007 Epoch: 9 Global Step: 206390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:29:57,897-Speed 6308.64 samples/sec Loss 6.5647 LearningRate 0.0007 Epoch: 9 Global Step: 206400 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:30:01,146-Speed 6305.35 samples/sec Loss 6.5079 LearningRate 0.0007 Epoch: 9 Global Step: 206410 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:04,396-Speed 6301.44 samples/sec Loss 6.4687 LearningRate 0.0007 Epoch: 9 Global Step: 206420 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:30:07,634-Speed 6326.65 samples/sec Loss 6.5149 LearningRate 0.0007 Epoch: 9 Global Step: 206430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:10,881-Speed 6309.59 samples/sec Loss 6.5292 LearningRate 0.0007 Epoch: 9 Global Step: 206440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:14,128-Speed 6307.35 samples/sec Loss 6.5127 LearningRate 0.0007 Epoch: 9 Global Step: 206450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:17,378-Speed 6302.97 samples/sec Loss 6.4594 LearningRate 0.0007 Epoch: 9 Global Step: 206460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:20,627-Speed 6306.27 samples/sec Loss 6.6148 LearningRate 0.0007 Epoch: 9 Global Step: 206470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:23,873-Speed 6310.41 samples/sec Loss 6.5517 LearningRate 0.0007 Epoch: 9 Global Step: 206480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:27,122-Speed 6305.31 samples/sec Loss 6.5464 LearningRate 0.0007 Epoch: 9 Global Step: 206490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:30,365-Speed 6316.28 samples/sec Loss 6.5032 LearningRate 0.0007 Epoch: 9 Global Step: 206500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:33,614-Speed 6304.13 samples/sec Loss 6.5639 LearningRate 0.0007 Epoch: 9 Global Step: 206510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:36,862-Speed 6306.51 samples/sec Loss 6.5263 LearningRate 0.0007 Epoch: 9 Global Step: 206520 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:30:40,121-Speed 6286.40 samples/sec Loss 6.4818 LearningRate 0.0007 Epoch: 9 Global Step: 206530 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:30:43,357-Speed 6329.96 samples/sec Loss 6.5303 LearningRate 0.0007 Epoch: 9 Global Step: 206540 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:46,602-Speed 6311.33 samples/sec Loss 6.5178 LearningRate 0.0007 Epoch: 9 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:49,846-Speed 6314.73 samples/sec Loss 6.6022 LearningRate 0.0007 Epoch: 9 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:53,101-Speed 6294.57 samples/sec Loss 6.5156 LearningRate 0.0007 Epoch: 9 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:56,345-Speed 6314.11 samples/sec Loss 6.4237 LearningRate 0.0007 Epoch: 9 Global Step: 206580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:30:59,596-Speed 6301.68 samples/sec Loss 6.5767 LearningRate 0.0007 Epoch: 9 Global Step: 206590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:02,849-Speed 6298.48 samples/sec Loss 6.5750 LearningRate 0.0007 Epoch: 9 Global Step: 206600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:06,095-Speed 6309.32 samples/sec Loss 6.4928 LearningRate 0.0007 Epoch: 9 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:09,345-Speed 6303.76 samples/sec Loss 6.5160 LearningRate 0.0007 Epoch: 9 Global Step: 206620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:12,595-Speed 6302.14 samples/sec Loss 6.5247 LearningRate 0.0007 Epoch: 9 Global Step: 206630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:15,829-Speed 6335.03 samples/sec Loss 6.5230 LearningRate 0.0007 Epoch: 9 Global Step: 206640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:19,074-Speed 6313.11 samples/sec Loss 
6.4022 LearningRate 0.0007 Epoch: 9 Global Step: 206650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:22,323-Speed 6303.73 samples/sec Loss 6.5885 LearningRate 0.0007 Epoch: 9 Global Step: 206660 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:25,572-Speed 6306.24 samples/sec Loss 6.4646 LearningRate 0.0007 Epoch: 9 Global Step: 206670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:28,817-Speed 6312.18 samples/sec Loss 6.5473 LearningRate 0.0007 Epoch: 9 Global Step: 206680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:32,067-Speed 6302.96 samples/sec Loss 6.5139 LearningRate 0.0007 Epoch: 9 Global Step: 206690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:35,317-Speed 6303.37 samples/sec Loss 6.5641 LearningRate 0.0007 Epoch: 9 Global Step: 206700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:38,566-Speed 6303.46 samples/sec Loss 6.5284 LearningRate 0.0007 Epoch: 9 Global Step: 206710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:41,814-Speed 6308.27 samples/sec Loss 6.4813 LearningRate 0.0007 Epoch: 9 Global Step: 206720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:45,058-Speed 6312.59 samples/sec Loss 6.4585 LearningRate 0.0007 Epoch: 9 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:48,303-Speed 6313.04 samples/sec Loss 6.5227 LearningRate 0.0007 Epoch: 9 Global Step: 206740 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:31:51,539-Speed 6331.18 samples/sec Loss 6.5875 LearningRate 0.0007 Epoch: 9 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:54,781-Speed 6318.12 samples/sec Loss 6.5972 LearningRate 0.0007 Epoch: 9 Global Step: 206760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:31:58,025-Speed 6315.23 samples/sec Loss 6.4778 LearningRate 0.0007 Epoch: 9 Global 
Step: 206770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:01,284-Speed 6285.29 samples/sec Loss 6.5390 LearningRate 0.0007 Epoch: 9 Global Step: 206780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:04,530-Speed 6312.10 samples/sec Loss 6.5829 LearningRate 0.0007 Epoch: 9 Global Step: 206790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:07,772-Speed 6316.57 samples/sec Loss 6.6321 LearningRate 0.0007 Epoch: 9 Global Step: 206800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:11,026-Speed 6296.68 samples/sec Loss 6.4874 LearningRate 0.0007 Epoch: 9 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:14,273-Speed 6307.55 samples/sec Loss 6.5736 LearningRate 0.0007 Epoch: 9 Global Step: 206820 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:17,524-Speed 6301.41 samples/sec Loss 6.5460 LearningRate 0.0007 Epoch: 9 Global Step: 206830 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:20,778-Speed 6297.19 samples/sec Loss 6.4640 LearningRate 0.0007 Epoch: 9 Global Step: 206840 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:24,009-Speed 6340.53 samples/sec Loss 6.4713 LearningRate 0.0007 Epoch: 9 Global Step: 206850 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:27,258-Speed 6304.42 samples/sec Loss 6.4799 LearningRate 0.0007 Epoch: 9 Global Step: 206860 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:30,506-Speed 6307.43 samples/sec Loss 6.5679 LearningRate 0.0007 Epoch: 9 Global Step: 206870 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:33,759-Speed 6297.37 samples/sec Loss 6.4780 LearningRate 0.0007 Epoch: 9 Global Step: 206880 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:37,005-Speed 6310.32 samples/sec Loss 6.5906 LearningRate 0.0007 Epoch: 9 Global Step: 206890 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:32:40,251-Speed 6310.70 samples/sec Loss 6.4973 LearningRate 0.0007 Epoch: 9 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:43,499-Speed 6306.37 samples/sec Loss 6.5203 LearningRate 0.0007 Epoch: 9 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:46,748-Speed 6304.76 samples/sec Loss 6.5173 LearningRate 0.0007 Epoch: 9 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:49,995-Speed 6308.98 samples/sec Loss 6.5373 LearningRate 0.0007 Epoch: 9 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:53,239-Speed 6315.09 samples/sec Loss 6.5425 LearningRate 0.0007 Epoch: 9 Global Step: 206940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:32:56,486-Speed 6307.70 samples/sec Loss 6.5248 LearningRate 0.0007 Epoch: 9 Global Step: 206950 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:32:59,724-Speed 6326.79 samples/sec Loss 6.5368 LearningRate 0.0007 Epoch: 9 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:02,972-Speed 6305.98 samples/sec Loss 6.4579 LearningRate 0.0007 Epoch: 9 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:06,218-Speed 6311.97 samples/sec Loss 6.5346 LearningRate 0.0007 Epoch: 9 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:09,464-Speed 6310.63 samples/sec Loss 6.5643 LearningRate 0.0007 Epoch: 9 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:12,710-Speed 6311.41 samples/sec Loss 6.5251 LearningRate 0.0007 Epoch: 9 Global Step: 207000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:15,958-Speed 6306.75 samples/sec Loss 6.5148 LearningRate 0.0007 Epoch: 9 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:33:19,205-Speed 6307.60 samples/sec Loss 6.5416 LearningRate 0.0007 Epoch: 9 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:22,461-Speed 6291.81 samples/sec Loss 6.6037 LearningRate 0.0007 Epoch: 9 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:25,710-Speed 6306.04 samples/sec Loss 6.5297 LearningRate 0.0007 Epoch: 9 Global Step: 207040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:28,957-Speed 6308.36 samples/sec Loss 6.6043 LearningRate 0.0007 Epoch: 9 Global Step: 207050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:32,205-Speed 6306.57 samples/sec Loss 6.5258 LearningRate 0.0007 Epoch: 9 Global Step: 207060 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:33:35,438-Speed 6336.08 samples/sec Loss 6.5513 LearningRate 0.0007 Epoch: 9 Global Step: 207070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:38,684-Speed 6311.26 samples/sec Loss 6.5349 LearningRate 0.0007 Epoch: 9 Global Step: 207080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:41,930-Speed 6311.27 samples/sec Loss 6.5357 LearningRate 0.0007 Epoch: 9 Global Step: 207090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:45,179-Speed 6303.16 samples/sec Loss 6.5121 LearningRate 0.0007 Epoch: 9 Global Step: 207100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:48,424-Speed 6313.01 samples/sec Loss 6.5629 LearningRate 0.0007 Epoch: 9 Global Step: 207110 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:51,671-Speed 6309.52 samples/sec Loss 6.4435 LearningRate 0.0007 Epoch: 9 Global Step: 207120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:54,923-Speed 6299.23 samples/sec Loss 6.5376 LearningRate 0.0007 Epoch: 9 Global Step: 207130 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:33:58,168-Speed 6312.71 samples/sec Loss 
6.5578 LearningRate 0.0007 Epoch: 9 Global Step: 207140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:01,414-Speed 6310.70 samples/sec Loss 6.5468 LearningRate 0.0007 Epoch: 9 Global Step: 207150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:04,660-Speed 6310.28 samples/sec Loss 6.5039 LearningRate 0.0007 Epoch: 9 Global Step: 207160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:07,908-Speed 6307.03 samples/sec Loss 6.5311 LearningRate 0.0007 Epoch: 9 Global Step: 207170 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:34:11,153-Speed 6311.38 samples/sec Loss 6.5428 LearningRate 0.0007 Epoch: 9 Global Step: 207180 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:34:14,385-Speed 6338.91 samples/sec Loss 6.4747 LearningRate 0.0007 Epoch: 9 Global Step: 207190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:17,632-Speed 6309.08 samples/sec Loss 6.5926 LearningRate 0.0007 Epoch: 9 Global Step: 207200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:20,881-Speed 6306.24 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 9 Global Step: 207210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:24,127-Speed 6309.42 samples/sec Loss 6.5298 LearningRate 0.0007 Epoch: 9 Global Step: 207220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:27,371-Speed 6315.96 samples/sec Loss 6.5811 LearningRate 0.0007 Epoch: 9 Global Step: 207230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:30,620-Speed 6303.65 samples/sec Loss 6.5608 LearningRate 0.0007 Epoch: 9 Global Step: 207240 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:33,866-Speed 6310.90 samples/sec Loss 6.5430 LearningRate 0.0007 Epoch: 9 Global Step: 207250 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:37,121-Speed 6294.53 samples/sec Loss 6.5493 LearningRate 0.0007 Epoch: 9 Global 
Step: 207260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:40,368-Speed 6308.09 samples/sec Loss 6.5410 LearningRate 0.0007 Epoch: 9 Global Step: 207270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:43,612-Speed 6314.19 samples/sec Loss 6.5686 LearningRate 0.0007 Epoch: 9 Global Step: 207280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:46,859-Speed 6308.45 samples/sec Loss 6.5982 LearningRate 0.0007 Epoch: 9 Global Step: 207290 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:34:50,093-Speed 6334.70 samples/sec Loss 6.5425 LearningRate 0.0007 Epoch: 9 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:53,339-Speed 6310.77 samples/sec Loss 6.4973 LearningRate 0.0007 Epoch: 9 Global Step: 207310 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:56,584-Speed 6313.30 samples/sec Loss 6.5365 LearningRate 0.0007 Epoch: 9 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:34:59,828-Speed 6313.42 samples/sec Loss 6.4664 LearningRate 0.0007 Epoch: 9 Global Step: 207330 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:35:03,079-Speed 6303.28 samples/sec Loss 6.5423 LearningRate 0.0007 Epoch: 9 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:35:06,326-Speed 6310.40 samples/sec Loss 6.5859 LearningRate 0.0007 Epoch: 9 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:35:09,580-Speed 6294.55 samples/sec Loss 6.5009 LearningRate 0.0007 Epoch: 9 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:35:12,832-Speed 6298.19 samples/sec Loss 6.5203 LearningRate 0.0007 Epoch: 9 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:35:16,074-Speed 6318.23 samples/sec Loss 6.5545 LearningRate 0.0007 Epoch: 9 Global Step: 207380 Fp16 Grad Scale: 32768 
Required: 57 hours Training: 2022-04-01 10:35:19,321-Speed 6309.36 samples/sec Loss 6.6548 LearningRate 0.0007 Epoch: 9 Global Step: 207390 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:18,979-Speed 343.30 samples/sec Loss 6.5035 LearningRate 0.0007 Epoch: 10 Global Step: 207400 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:36:22,218-Speed 6324.58 samples/sec Loss 6.5958 LearningRate 0.0007 Epoch: 10 Global Step: 207410 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:36:25,449-Speed 6339.70 samples/sec Loss 6.5235 LearningRate 0.0007 Epoch: 10 Global Step: 207420 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:28,683-Speed 6334.41 samples/sec Loss 6.5755 LearningRate 0.0007 Epoch: 10 Global Step: 207430 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:31,919-Speed 6329.39 samples/sec Loss 6.5778 LearningRate 0.0007 Epoch: 10 Global Step: 207440 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:35,157-Speed 6327.13 samples/sec Loss 6.5304 LearningRate 0.0007 Epoch: 10 Global Step: 207450 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:38,405-Speed 6306.59 samples/sec Loss 6.5746 LearningRate 0.0007 Epoch: 10 Global Step: 207460 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:41,641-Speed 6329.94 samples/sec Loss 6.5229 LearningRate 0.0007 Epoch: 10 Global Step: 207470 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:44,879-Speed 6327.08 samples/sec Loss 6.4706 LearningRate 0.0007 Epoch: 10 Global Step: 207480 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:48,118-Speed 6324.62 samples/sec Loss 6.5318 LearningRate 0.0007 Epoch: 10 Global Step: 207490 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:51,357-Speed 6323.23 samples/sec Loss 6.5294 LearningRate 0.0007 Epoch: 10 Global Step: 207500 Fp16 Grad Scale: 32768 Required: 57 hours Training: 
2022-04-01 10:36:54,606-Speed 6305.84 samples/sec Loss 6.6249 LearningRate 0.0007 Epoch: 10 Global Step: 207510 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:36:57,849-Speed 6315.61 samples/sec Loss 6.6138 LearningRate 0.0007 Epoch: 10 Global Step: 207520 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:37:01,090-Speed 6321.05 samples/sec Loss 6.6006 LearningRate 0.0007 Epoch: 10 Global Step: 207530 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:37:04,335-Speed 6312.92 samples/sec Loss 6.5986 LearningRate 0.0007 Epoch: 10 Global Step: 207540 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:37:07,577-Speed 6317.02 samples/sec Loss 6.4926 LearningRate 0.0007 Epoch: 10 Global Step: 207550 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:37:10,813-Speed 6331.09 samples/sec Loss 6.4833 LearningRate 0.0007 Epoch: 10 Global Step: 207560 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:14,060-Speed 6309.88 samples/sec Loss 6.4064 LearningRate 0.0007 Epoch: 10 Global Step: 207570 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:17,304-Speed 6314.69 samples/sec Loss 6.5641 LearningRate 0.0007 Epoch: 10 Global Step: 207580 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:20,552-Speed 6306.95 samples/sec Loss 6.4905 LearningRate 0.0007 Epoch: 10 Global Step: 207590 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:23,795-Speed 6315.96 samples/sec Loss 6.6322 LearningRate 0.0007 Epoch: 10 Global Step: 207600 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:27,043-Speed 6306.49 samples/sec Loss 6.4800 LearningRate 0.0007 Epoch: 10 Global Step: 207610 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:30,284-Speed 6320.56 samples/sec Loss 6.4647 LearningRate 0.0007 Epoch: 10 Global Step: 207620 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:33,528-Speed 
6315.30 samples/sec Loss 6.4340 LearningRate 0.0007 Epoch: 10 Global Step: 207630 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:36,776-Speed 6309.00 samples/sec Loss 6.5512 LearningRate 0.0007 Epoch: 10 Global Step: 207640 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:40,026-Speed 6304.05 samples/sec Loss 6.4411 LearningRate 0.0007 Epoch: 10 Global Step: 207650 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:43,275-Speed 6304.61 samples/sec Loss 6.6028 LearningRate 0.0007 Epoch: 10 Global Step: 207660 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:37:46,513-Speed 6326.26 samples/sec Loss 6.5386 LearningRate 0.0007 Epoch: 10 Global Step: 207670 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:49,763-Speed 6303.32 samples/sec Loss 6.5915 LearningRate 0.0007 Epoch: 10 Global Step: 207680 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:53,007-Speed 6312.78 samples/sec Loss 6.4996 LearningRate 0.0007 Epoch: 10 Global Step: 207690 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:56,257-Speed 6307.52 samples/sec Loss 6.5789 LearningRate 0.0007 Epoch: 10 Global Step: 207700 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:37:59,511-Speed 6294.11 samples/sec Loss 6.5154 LearningRate 0.0007 Epoch: 10 Global Step: 207710 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:02,756-Speed 6312.05 samples/sec Loss 6.4222 LearningRate 0.0007 Epoch: 10 Global Step: 207720 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:06,007-Speed 6302.29 samples/sec Loss 6.4930 LearningRate 0.0007 Epoch: 10 Global Step: 207730 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:09,253-Speed 6309.86 samples/sec Loss 6.4921 LearningRate 0.0007 Epoch: 10 Global Step: 207740 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:12,503-Speed 6304.30 samples/sec Loss 6.4930 
LearningRate 0.0007 Epoch: 10 Global Step: 207750 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:15,751-Speed 6306.82 samples/sec Loss 6.4266 LearningRate 0.0007 Epoch: 10 Global Step: 207760 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:18,985-Speed 6333.92 samples/sec Loss 6.5520 LearningRate 0.0007 Epoch: 10 Global Step: 207770 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:22,240-Speed 6292.65 samples/sec Loss 6.5173 LearningRate 0.0007 Epoch: 10 Global Step: 207780 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:25,485-Speed 6313.73 samples/sec Loss 6.5401 LearningRate 0.0007 Epoch: 10 Global Step: 207790 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:28,732-Speed 6308.40 samples/sec Loss 6.5599 LearningRate 0.0007 Epoch: 10 Global Step: 207800 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:38:31,963-Speed 6340.16 samples/sec Loss 6.4896 LearningRate 0.0007 Epoch: 10 Global Step: 207810 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:35,209-Speed 6311.14 samples/sec Loss 6.5094 LearningRate 0.0007 Epoch: 10 Global Step: 207820 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:38,458-Speed 6304.13 samples/sec Loss 6.5256 LearningRate 0.0007 Epoch: 10 Global Step: 207830 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:41,699-Speed 6319.94 samples/sec Loss 6.4966 LearningRate 0.0007 Epoch: 10 Global Step: 207840 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:44,945-Speed 6311.40 samples/sec Loss 6.5403 LearningRate 0.0007 Epoch: 10 Global Step: 207850 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:48,187-Speed 6318.27 samples/sec Loss 6.4556 LearningRate 0.0007 Epoch: 10 Global Step: 207860 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:51,435-Speed 6307.22 samples/sec Loss 6.4140 LearningRate 0.0007 Epoch: 10 
Global Step: 207870 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:54,677-Speed 6318.70 samples/sec Loss 6.4360 LearningRate 0.0007 Epoch: 10 Global Step: 207880 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:38:57,919-Speed 6318.03 samples/sec Loss 6.5324 LearningRate 0.0007 Epoch: 10 Global Step: 207890 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:39:01,163-Speed 6314.47 samples/sec Loss 6.5656 LearningRate 0.0007 Epoch: 10 Global Step: 207900 Fp16 Grad Scale: 16384 Required: 57 hours Training: 2022-04-01 10:39:04,520-Speed 6102.40 samples/sec Loss 6.4271 LearningRate 0.0007 Epoch: 10 Global Step: 207910 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:07,776-Speed 6290.64 samples/sec Loss 6.4309 LearningRate 0.0007 Epoch: 10 Global Step: 207920 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:11,019-Speed 6316.08 samples/sec Loss 6.5819 LearningRate 0.0007 Epoch: 10 Global Step: 207930 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:14,264-Speed 6314.83 samples/sec Loss 6.4279 LearningRate 0.0007 Epoch: 10 Global Step: 207940 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:17,510-Speed 6309.17 samples/sec Loss 6.4879 LearningRate 0.0007 Epoch: 10 Global Step: 207950 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:20,758-Speed 6307.78 samples/sec Loss 6.5954 LearningRate 0.0007 Epoch: 10 Global Step: 207960 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:24,000-Speed 6316.89 samples/sec Loss 6.5686 LearningRate 0.0007 Epoch: 10 Global Step: 207970 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:27,255-Speed 6294.75 samples/sec Loss 6.4685 LearningRate 0.0007 Epoch: 10 Global Step: 207980 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:30,503-Speed 6306.14 samples/sec Loss 6.5543 LearningRate 0.0007 Epoch: 10 Global Step: 207990 Fp16 Grad 
Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:33,750-Speed 6309.47 samples/sec Loss 6.5204 LearningRate 0.0007 Epoch: 10 Global Step: 208000 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:37,000-Speed 6304.15 samples/sec Loss 6.4773 LearningRate 0.0007 Epoch: 10 Global Step: 208010 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:39:40,250-Speed 6302.74 samples/sec Loss 6.5873 LearningRate 0.0007 Epoch: 10 Global Step: 208020 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:39:43,486-Speed 6329.53 samples/sec Loss 6.5041 LearningRate 0.0007 Epoch: 10 Global Step: 208030 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:46,729-Speed 6316.61 samples/sec Loss 6.5468 LearningRate 0.0007 Epoch: 10 Global Step: 208040 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:49,976-Speed 6309.41 samples/sec Loss 6.5423 LearningRate 0.0007 Epoch: 10 Global Step: 208050 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:53,218-Speed 6316.91 samples/sec Loss 6.6137 LearningRate 0.0007 Epoch: 10 Global Step: 208060 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:56,480-Speed 6279.65 samples/sec Loss 6.5292 LearningRate 0.0007 Epoch: 10 Global Step: 208070 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:39:59,723-Speed 6317.99 samples/sec Loss 6.5530 LearningRate 0.0007 Epoch: 10 Global Step: 208080 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:02,971-Speed 6305.38 samples/sec Loss 6.4683 LearningRate 0.0007 Epoch: 10 Global Step: 208090 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:06,218-Speed 6309.33 samples/sec Loss 6.5204 LearningRate 0.0007 Epoch: 10 Global Step: 208100 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:09,467-Speed 6305.23 samples/sec Loss 6.4901 LearningRate 0.0007 Epoch: 10 Global Step: 208110 Fp16 Grad Scale: 32768 Required: 57 hours 
Training: 2022-04-01 10:40:12,760-Speed 6220.04 samples/sec Loss 6.5196 LearningRate 0.0007 Epoch: 10 Global Step: 208120 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:16,096-Speed 6140.64 samples/sec Loss 6.4624 LearningRate 0.0007 Epoch: 10 Global Step: 208130 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:40:19,333-Speed 6327.63 samples/sec Loss 6.5006 LearningRate 0.0007 Epoch: 10 Global Step: 208140 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:22,585-Speed 6300.47 samples/sec Loss 6.4928 LearningRate 0.0007 Epoch: 10 Global Step: 208150 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:25,834-Speed 6305.15 samples/sec Loss 6.5472 LearningRate 0.0007 Epoch: 10 Global Step: 208160 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:29,077-Speed 6314.78 samples/sec Loss 6.4683 LearningRate 0.0007 Epoch: 10 Global Step: 208170 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:32,320-Speed 6316.77 samples/sec Loss 6.5194 LearningRate 0.0007 Epoch: 10 Global Step: 208180 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:35,566-Speed 6310.61 samples/sec Loss 6.5375 LearningRate 0.0007 Epoch: 10 Global Step: 208190 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:38,828-Speed 6281.25 samples/sec Loss 6.5888 LearningRate 0.0007 Epoch: 10 Global Step: 208200 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:42,074-Speed 6311.10 samples/sec Loss 6.5632 LearningRate 0.0007 Epoch: 10 Global Step: 208210 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:45,317-Speed 6316.04 samples/sec Loss 6.4539 LearningRate 0.0007 Epoch: 10 Global Step: 208220 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:40:48,567-Speed 6303.08 samples/sec Loss 6.5163 LearningRate 0.0007 Epoch: 10 Global Step: 208230 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 
10:40:51,818-Speed 6300.04 samples/sec Loss 6.5505 LearningRate 0.0007 Epoch: 10 Global Step: 208240 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:40:55,065-Speed 6309.22 samples/sec Loss 6.5606 LearningRate 0.0007 Epoch: 10 Global Step: 208250 Fp16 Grad Scale: 65536 Required: 57 hours Training: 2022-04-01 10:40:58,296-Speed 6339.84 samples/sec Loss 6.5532 LearningRate 0.0007 Epoch: 10 Global Step: 208260 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:41:01,539-Speed 6316.51 samples/sec Loss 6.4721 LearningRate 0.0007 Epoch: 10 Global Step: 208270 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:41:04,787-Speed 6306.27 samples/sec Loss 6.4868 LearningRate 0.0007 Epoch: 10 Global Step: 208280 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:41:08,034-Speed 6309.61 samples/sec Loss 6.5374 LearningRate 0.0007 Epoch: 10 Global Step: 208290 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:41:11,279-Speed 6313.11 samples/sec Loss 6.5093 LearningRate 0.0007 Epoch: 10 Global Step: 208300 Fp16 Grad Scale: 32768 Required: 57 hours Training: 2022-04-01 10:41:14,531-Speed 6302.40 samples/sec Loss 6.6354 LearningRate 0.0007 Epoch: 10 Global Step: 208310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:17,778-Speed 6308.94 samples/sec Loss 6.5058 LearningRate 0.0007 Epoch: 10 Global Step: 208320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:21,025-Speed 6308.41 samples/sec Loss 6.4659 LearningRate 0.0007 Epoch: 10 Global Step: 208330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:24,272-Speed 6307.68 samples/sec Loss 6.5693 LearningRate 0.0007 Epoch: 10 Global Step: 208340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:27,521-Speed 6306.32 samples/sec Loss 6.5684 LearningRate 0.0007 Epoch: 10 Global Step: 208350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:30,756-Speed 6331.73 
samples/sec Loss 6.5593 LearningRate 0.0007 Epoch: 10 Global Step: 208360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:34,001-Speed 6312.31 samples/sec Loss 6.4482 LearningRate 0.0007 Epoch: 10 Global Step: 208370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:37,251-Speed 6303.29 samples/sec Loss 6.5270 LearningRate 0.0007 Epoch: 10 Global Step: 208380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:40,496-Speed 6311.18 samples/sec Loss 6.4893 LearningRate 0.0007 Epoch: 10 Global Step: 208390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:43,747-Speed 6301.49 samples/sec Loss 6.5075 LearningRate 0.0007 Epoch: 10 Global Step: 208400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:46,994-Speed 6310.60 samples/sec Loss 6.6149 LearningRate 0.0007 Epoch: 10 Global Step: 208410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:50,242-Speed 6307.24 samples/sec Loss 6.5761 LearningRate 0.0007 Epoch: 10 Global Step: 208420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:53,495-Speed 6296.89 samples/sec Loss 6.4534 LearningRate 0.0007 Epoch: 10 Global Step: 208430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:56,741-Speed 6310.80 samples/sec Loss 6.5377 LearningRate 0.0007 Epoch: 10 Global Step: 208440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:41:59,992-Speed 6299.94 samples/sec Loss 6.5298 LearningRate 0.0007 Epoch: 10 Global Step: 208450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:03,242-Speed 6303.77 samples/sec Loss 6.4883 LearningRate 0.0007 Epoch: 10 Global Step: 208460 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:42:06,488-Speed 6310.60 samples/sec Loss 6.4996 LearningRate 0.0007 Epoch: 10 Global Step: 208470 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:42:09,734-Speed 6310.86 samples/sec Loss 6.4625 
LearningRate 0.0007 Epoch: 10 Global Step: 208480 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:42:12,984-Speed 6303.39 samples/sec Loss 6.4793 LearningRate 0.0007 Epoch: 10 Global Step: 208490 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:42:16,220-Speed 6328.94 samples/sec Loss 6.4653 LearningRate 0.0007 Epoch: 10 Global Step: 208500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:19,471-Speed 6302.17 samples/sec Loss 6.5126 LearningRate 0.0007 Epoch: 10 Global Step: 208510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:22,717-Speed 6311.18 samples/sec Loss 6.5583 LearningRate 0.0007 Epoch: 10 Global Step: 208520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:25,964-Speed 6307.44 samples/sec Loss 6.4672 LearningRate 0.0007 Epoch: 10 Global Step: 208530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:29,211-Speed 6308.51 samples/sec Loss 6.5643 LearningRate 0.0007 Epoch: 10 Global Step: 208540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:32,459-Speed 6308.26 samples/sec Loss 6.5033 LearningRate 0.0007 Epoch: 10 Global Step: 208550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:35,705-Speed 6309.10 samples/sec Loss 6.5571 LearningRate 0.0007 Epoch: 10 Global Step: 208560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:38,958-Speed 6298.34 samples/sec Loss 6.4516 LearningRate 0.0007 Epoch: 10 Global Step: 208570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:42,203-Speed 6313.36 samples/sec Loss 6.4851 LearningRate 0.0007 Epoch: 10 Global Step: 208580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:45,448-Speed 6310.99 samples/sec Loss 6.4881 LearningRate 0.0007 Epoch: 10 Global Step: 208590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:48,695-Speed 6309.54 samples/sec Loss 6.5275 LearningRate 0.0007 Epoch: 10 
Global Step: 208600 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:42:51,927-Speed 6338.55 samples/sec Loss 6.5524 LearningRate 0.0007 Epoch: 10 Global Step: 208610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:55,175-Speed 6307.40 samples/sec Loss 6.4875 LearningRate 0.0007 Epoch: 10 Global Step: 208620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:42:58,418-Speed 6315.69 samples/sec Loss 6.4837 LearningRate 0.0007 Epoch: 10 Global Step: 208630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:01,666-Speed 6307.18 samples/sec Loss 6.5336 LearningRate 0.0007 Epoch: 10 Global Step: 208640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:04,913-Speed 6308.89 samples/sec Loss 6.5068 LearningRate 0.0007 Epoch: 10 Global Step: 208650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:08,162-Speed 6305.13 samples/sec Loss 6.5062 LearningRate 0.0007 Epoch: 10 Global Step: 208660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:11,406-Speed 6313.65 samples/sec Loss 6.5307 LearningRate 0.0007 Epoch: 10 Global Step: 208670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:14,651-Speed 6312.49 samples/sec Loss 6.5413 LearningRate 0.0007 Epoch: 10 Global Step: 208680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:17,899-Speed 6308.70 samples/sec Loss 6.5101 LearningRate 0.0007 Epoch: 10 Global Step: 208690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:21,149-Speed 6303.16 samples/sec Loss 6.4986 LearningRate 0.0007 Epoch: 10 Global Step: 208700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:24,395-Speed 6310.01 samples/sec Loss 6.3712 LearningRate 0.0007 Epoch: 10 Global Step: 208710 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:43:27,628-Speed 6336.79 samples/sec Loss 6.4873 LearningRate 0.0007 Epoch: 10 Global Step: 208720 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:30,869-Speed 6319.30 samples/sec Loss 6.5077 LearningRate 0.0007 Epoch: 10 Global Step: 208730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:34,117-Speed 6307.77 samples/sec Loss 6.5791 LearningRate 0.0007 Epoch: 10 Global Step: 208740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:37,364-Speed 6309.00 samples/sec Loss 6.5747 LearningRate 0.0007 Epoch: 10 Global Step: 208750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:40,613-Speed 6304.32 samples/sec Loss 6.5450 LearningRate 0.0007 Epoch: 10 Global Step: 208760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:43,863-Speed 6302.97 samples/sec Loss 6.5422 LearningRate 0.0007 Epoch: 10 Global Step: 208770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:47,110-Speed 6309.35 samples/sec Loss 6.4831 LearningRate 0.0007 Epoch: 10 Global Step: 208780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:50,358-Speed 6306.29 samples/sec Loss 6.5318 LearningRate 0.0007 Epoch: 10 Global Step: 208790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:53,602-Speed 6314.14 samples/sec Loss 6.4797 LearningRate 0.0007 Epoch: 10 Global Step: 208800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:43:56,847-Speed 6312.54 samples/sec Loss 6.4542 LearningRate 0.0007 Epoch: 10 Global Step: 208810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:00,094-Speed 6308.81 samples/sec Loss 6.4735 LearningRate 0.0007 Epoch: 10 Global Step: 208820 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:44:03,343-Speed 6305.03 samples/sec Loss 6.4605 LearningRate 0.0007 Epoch: 10 Global Step: 208830 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:44:06,575-Speed 6338.83 samples/sec Loss 6.4794 LearningRate 0.0007 Epoch: 10 Global Step: 208840 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 10:44:09,819-Speed 6314.25 samples/sec Loss 6.4994 LearningRate 0.0007 Epoch: 10 Global Step: 208850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:13,064-Speed 6312.57 samples/sec Loss 6.5063 LearningRate 0.0007 Epoch: 10 Global Step: 208860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:16,307-Speed 6317.31 samples/sec Loss 6.5195 LearningRate 0.0007 Epoch: 10 Global Step: 208870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:19,553-Speed 6310.50 samples/sec Loss 6.5434 LearningRate 0.0007 Epoch: 10 Global Step: 208880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:22,800-Speed 6308.98 samples/sec Loss 6.5025 LearningRate 0.0007 Epoch: 10 Global Step: 208890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:26,053-Speed 6298.04 samples/sec Loss 6.4937 LearningRate 0.0007 Epoch: 10 Global Step: 208900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:29,309-Speed 6289.65 samples/sec Loss 6.4170 LearningRate 0.0007 Epoch: 10 Global Step: 208910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:32,555-Speed 6311.49 samples/sec Loss 6.5159 LearningRate 0.0007 Epoch: 10 Global Step: 208920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:35,800-Speed 6312.87 samples/sec Loss 6.5213 LearningRate 0.0007 Epoch: 10 Global Step: 208930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:39,045-Speed 6311.89 samples/sec Loss 6.4432 LearningRate 0.0007 Epoch: 10 Global Step: 208940 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:44:42,278-Speed 6336.97 samples/sec Loss 6.5174 LearningRate 0.0007 Epoch: 10 Global Step: 208950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:45,526-Speed 6306.66 samples/sec Loss 6.4812 LearningRate 0.0007 Epoch: 10 Global Step: 208960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
10:44:48,771-Speed 6312.55 samples/sec Loss 6.4976 LearningRate 0.0007 Epoch: 10 Global Step: 208970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:52,019-Speed 6306.08 samples/sec Loss 6.5906 LearningRate 0.0007 Epoch: 10 Global Step: 208980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:55,265-Speed 6310.98 samples/sec Loss 6.5778 LearningRate 0.0007 Epoch: 10 Global Step: 208990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:44:58,517-Speed 6299.14 samples/sec Loss 6.4834 LearningRate 0.0007 Epoch: 10 Global Step: 209000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:01,768-Speed 6301.13 samples/sec Loss 6.4957 LearningRate 0.0007 Epoch: 10 Global Step: 209010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:05,019-Speed 6300.94 samples/sec Loss 6.5727 LearningRate 0.0007 Epoch: 10 Global Step: 209020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:08,270-Speed 6302.09 samples/sec Loss 6.5172 LearningRate 0.0007 Epoch: 10 Global Step: 209030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:11,514-Speed 6313.67 samples/sec Loss 6.4495 LearningRate 0.0007 Epoch: 10 Global Step: 209040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:14,745-Speed 6339.76 samples/sec Loss 6.5312 LearningRate 0.0007 Epoch: 10 Global Step: 209050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:17,993-Speed 6308.51 samples/sec Loss 6.4848 LearningRate 0.0007 Epoch: 10 Global Step: 209060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:21,238-Speed 6312.92 samples/sec Loss 6.4968 LearningRate 0.0007 Epoch: 10 Global Step: 209070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:24,484-Speed 6310.49 samples/sec Loss 6.5394 LearningRate 0.0007 Epoch: 10 Global Step: 209080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:27,734-Speed 6302.82 
samples/sec Loss 6.5174 LearningRate 0.0007 Epoch: 10 Global Step: 209090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:30,984-Speed 6303.73 samples/sec Loss 6.5552 LearningRate 0.0007 Epoch: 10 Global Step: 209100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:34,230-Speed 6310.63 samples/sec Loss 6.5082 LearningRate 0.0007 Epoch: 10 Global Step: 209110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:37,476-Speed 6310.31 samples/sec Loss 6.4318 LearningRate 0.0007 Epoch: 10 Global Step: 209120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:40,721-Speed 6312.07 samples/sec Loss 6.4769 LearningRate 0.0007 Epoch: 10 Global Step: 209130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:43,965-Speed 6314.96 samples/sec Loss 6.4994 LearningRate 0.0007 Epoch: 10 Global Step: 209140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:47,200-Speed 6332.19 samples/sec Loss 6.5103 LearningRate 0.0007 Epoch: 10 Global Step: 209150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:50,448-Speed 6306.84 samples/sec Loss 6.5262 LearningRate 0.0007 Epoch: 10 Global Step: 209160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:53,695-Speed 6308.67 samples/sec Loss 6.4642 LearningRate 0.0007 Epoch: 10 Global Step: 209170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:45:56,942-Speed 6309.43 samples/sec Loss 6.5161 LearningRate 0.0007 Epoch: 10 Global Step: 209180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:00,188-Speed 6309.75 samples/sec Loss 6.5037 LearningRate 0.0007 Epoch: 10 Global Step: 209190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:03,439-Speed 6301.41 samples/sec Loss 6.5365 LearningRate 0.0007 Epoch: 10 Global Step: 209200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:06,684-Speed 6313.42 samples/sec Loss 6.5385 
LearningRate 0.0007 Epoch: 10 Global Step: 209210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:09,931-Speed 6307.73 samples/sec Loss 6.4876 LearningRate 0.0007 Epoch: 10 Global Step: 209220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:13,179-Speed 6306.92 samples/sec Loss 6.5672 LearningRate 0.0007 Epoch: 10 Global Step: 209230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:16,433-Speed 6293.94 samples/sec Loss 6.5409 LearningRate 0.0007 Epoch: 10 Global Step: 209240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:19,682-Speed 6305.39 samples/sec Loss 6.5017 LearningRate 0.0007 Epoch: 10 Global Step: 209250 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:46:22,915-Speed 6335.81 samples/sec Loss 6.5323 LearningRate 0.0007 Epoch: 10 Global Step: 209260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:26,166-Speed 6301.69 samples/sec Loss 6.5248 LearningRate 0.0007 Epoch: 10 Global Step: 209270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:29,412-Speed 6310.93 samples/sec Loss 6.4458 LearningRate 0.0007 Epoch: 10 Global Step: 209280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:32,658-Speed 6311.20 samples/sec Loss 6.5624 LearningRate 0.0007 Epoch: 10 Global Step: 209290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:35,905-Speed 6308.25 samples/sec Loss 6.5367 LearningRate 0.0007 Epoch: 10 Global Step: 209300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:39,152-Speed 6309.18 samples/sec Loss 6.5616 LearningRate 0.0007 Epoch: 10 Global Step: 209310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:42,398-Speed 6311.28 samples/sec Loss 6.5274 LearningRate 0.0007 Epoch: 10 Global Step: 209320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:45,642-Speed 6314.63 samples/sec Loss 6.5260 LearningRate 0.0007 Epoch: 10 
Global Step: 209330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:48,887-Speed 6311.59 samples/sec Loss 6.5428 LearningRate 0.0007 Epoch: 10 Global Step: 209340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:52,132-Speed 6312.98 samples/sec Loss 6.4543 LearningRate 0.0007 Epoch: 10 Global Step: 209350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:46:55,376-Speed 6314.94 samples/sec Loss 6.4786 LearningRate 0.0007 Epoch: 10 Global Step: 209360 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:46:58,607-Speed 6341.33 samples/sec Loss 6.5161 LearningRate 0.0007 Epoch: 10 Global Step: 209370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:01,851-Speed 6314.65 samples/sec Loss 6.5307 LearningRate 0.0007 Epoch: 10 Global Step: 209380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:05,099-Speed 6305.59 samples/sec Loss 6.5788 LearningRate 0.0007 Epoch: 10 Global Step: 209390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:08,348-Speed 6304.86 samples/sec Loss 6.4550 LearningRate 0.0007 Epoch: 10 Global Step: 209400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:11,607-Speed 6285.60 samples/sec Loss 6.6022 LearningRate 0.0007 Epoch: 10 Global Step: 209410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:14,866-Speed 6286.52 samples/sec Loss 6.5663 LearningRate 0.0007 Epoch: 10 Global Step: 209420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:18,112-Speed 6311.12 samples/sec Loss 6.5272 LearningRate 0.0007 Epoch: 10 Global Step: 209430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:21,357-Speed 6312.01 samples/sec Loss 6.4713 LearningRate 0.0007 Epoch: 10 Global Step: 209440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:24,603-Speed 6310.48 samples/sec Loss 6.4829 LearningRate 0.0007 Epoch: 10 Global Step: 209450 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:27,849-Speed 6310.86 samples/sec Loss 6.4939 LearningRate 0.0007 Epoch: 10 Global Step: 209460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:31,106-Speed 6288.81 samples/sec Loss 6.4084 LearningRate 0.0007 Epoch: 10 Global Step: 209470 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:47:34,342-Speed 6330.89 samples/sec Loss 6.4899 LearningRate 0.0007 Epoch: 10 Global Step: 209480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:37,591-Speed 6306.34 samples/sec Loss 6.5440 LearningRate 0.0007 Epoch: 10 Global Step: 209490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:40,837-Speed 6310.22 samples/sec Loss 6.4937 LearningRate 0.0007 Epoch: 10 Global Step: 209500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:44,082-Speed 6312.82 samples/sec Loss 6.4782 LearningRate 0.0007 Epoch: 10 Global Step: 209510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:47,324-Speed 6318.22 samples/sec Loss 6.5061 LearningRate 0.0007 Epoch: 10 Global Step: 209520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:50,573-Speed 6304.72 samples/sec Loss 6.4988 LearningRate 0.0007 Epoch: 10 Global Step: 209530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:53,821-Speed 6306.56 samples/sec Loss 6.5267 LearningRate 0.0007 Epoch: 10 Global Step: 209540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:47:57,066-Speed 6313.60 samples/sec Loss 6.5231 LearningRate 0.0007 Epoch: 10 Global Step: 209550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:00,314-Speed 6305.73 samples/sec Loss 6.4685 LearningRate 0.0007 Epoch: 10 Global Step: 209560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:03,561-Speed 6309.29 samples/sec Loss 6.4915 LearningRate 0.0007 Epoch: 10 Global Step: 209570 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 10:48:06,809-Speed 6307.41 samples/sec Loss 6.5191 LearningRate 0.0007 Epoch: 10 Global Step: 209580 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:10,056-Speed 6309.38 samples/sec Loss 6.4579 LearningRate 0.0007 Epoch: 10 Global Step: 209590 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:13,298-Speed 6317.65 samples/sec Loss 6.4311 LearningRate 0.0007 Epoch: 10 Global Step: 209600 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:16,544-Speed 6310.97 samples/sec Loss 6.4857 LearningRate 0.0007 Epoch: 10 Global Step: 209610 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:19,791-Speed 6307.72 samples/sec Loss 6.5085 LearningRate 0.0007 Epoch: 10 Global Step: 209620 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:23,026-Speed 6332.25 samples/sec Loss 6.4490 LearningRate 0.0007 Epoch: 10 Global Step: 209630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:26,275-Speed 6306.02 samples/sec Loss 6.5029 LearningRate 0.0007 Epoch: 10 Global Step: 209640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:29,520-Speed 6312.33 samples/sec Loss 6.5631 LearningRate 0.0007 Epoch: 10 Global Step: 209650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:32,768-Speed 6307.52 samples/sec Loss 6.5149 LearningRate 0.0007 Epoch: 10 Global Step: 209660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:36,015-Speed 6307.17 samples/sec Loss 6.4292 LearningRate 0.0007 Epoch: 10 Global Step: 209670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:39,264-Speed 6306.24 samples/sec Loss 6.5074 LearningRate 0.0007 Epoch: 10 Global Step: 209680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:42,509-Speed 6311.61 samples/sec Loss 6.5457 LearningRate 0.0007 Epoch: 10 Global Step: 209690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
10:48:45,753-Speed 6316.42 samples/sec Loss 6.4727 LearningRate 0.0007 Epoch: 10 Global Step: 209700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:49,000-Speed 6308.60 samples/sec Loss 6.6218 LearningRate 0.0007 Epoch: 10 Global Step: 209710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:52,246-Speed 6311.18 samples/sec Loss 6.5440 LearningRate 0.0007 Epoch: 10 Global Step: 209720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:48:55,494-Speed 6305.55 samples/sec Loss 6.5765 LearningRate 0.0007 Epoch: 10 Global Step: 209730 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:48:58,743-Speed 6306.20 samples/sec Loss 6.4498 LearningRate 0.0007 Epoch: 10 Global Step: 209740 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:01,991-Speed 6305.28 samples/sec Loss 6.4585 LearningRate 0.0007 Epoch: 10 Global Step: 209750 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:05,245-Speed 6296.23 samples/sec Loss 6.4982 LearningRate 0.0007 Epoch: 10 Global Step: 209760 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:08,479-Speed 6333.16 samples/sec Loss 6.5513 LearningRate 0.0007 Epoch: 10 Global Step: 209770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:11,723-Speed 6315.45 samples/sec Loss 6.4978 LearningRate 0.0007 Epoch: 10 Global Step: 209780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:14,969-Speed 6309.69 samples/sec Loss 6.4600 LearningRate 0.0007 Epoch: 10 Global Step: 209790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:18,219-Speed 6302.90 samples/sec Loss 6.5020 LearningRate 0.0007 Epoch: 10 Global Step: 209800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:21,467-Speed 6306.82 samples/sec Loss 6.5349 LearningRate 0.0007 Epoch: 10 Global Step: 209810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:24,711-Speed 6315.14 
samples/sec Loss 6.4833 LearningRate 0.0007 Epoch: 10 Global Step: 209820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:27,960-Speed 6306.05 samples/sec Loss 6.5306 LearningRate 0.0007 Epoch: 10 Global Step: 209830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:31,203-Speed 6315.57 samples/sec Loss 6.4631 LearningRate 0.0007 Epoch: 10 Global Step: 209840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:34,450-Speed 6308.34 samples/sec Loss 6.4503 LearningRate 0.0007 Epoch: 10 Global Step: 209850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:37,698-Speed 6306.84 samples/sec Loss 6.4787 LearningRate 0.0007 Epoch: 10 Global Step: 209860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:40,948-Speed 6303.79 samples/sec Loss 6.4652 LearningRate 0.0007 Epoch: 10 Global Step: 209870 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:44,196-Speed 6305.64 samples/sec Loss 6.5259 LearningRate 0.0007 Epoch: 10 Global Step: 209880 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:47,441-Speed 6313.38 samples/sec Loss 6.4901 LearningRate 0.0007 Epoch: 10 Global Step: 209890 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:49:50,675-Speed 6334.28 samples/sec Loss 6.5379 LearningRate 0.0007 Epoch: 10 Global Step: 209900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:53,923-Speed 6307.51 samples/sec Loss 6.5243 LearningRate 0.0007 Epoch: 10 Global Step: 209910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:49:57,177-Speed 6296.27 samples/sec Loss 6.5158 LearningRate 0.0007 Epoch: 10 Global Step: 209920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:00,421-Speed 6313.66 samples/sec Loss 6.4802 LearningRate 0.0007 Epoch: 10 Global Step: 209930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:03,668-Speed 6308.57 samples/sec Loss 6.4348 
LearningRate 0.0007 Epoch: 10 Global Step: 209940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:06,915-Speed 6309.05 samples/sec Loss 6.5741 LearningRate 0.0007 Epoch: 10 Global Step: 209950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:10,157-Speed 6319.21 samples/sec Loss 6.5308 LearningRate 0.0007 Epoch: 10 Global Step: 209960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:13,403-Speed 6310.51 samples/sec Loss 6.5514 LearningRate 0.0007 Epoch: 10 Global Step: 209970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:16,649-Speed 6309.99 samples/sec Loss 6.5184 LearningRate 0.0007 Epoch: 10 Global Step: 209980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:19,896-Speed 6309.54 samples/sec Loss 6.4353 LearningRate 0.0007 Epoch: 10 Global Step: 209990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:23,143-Speed 6307.86 samples/sec Loss 6.5259 LearningRate 0.0007 Epoch: 10 Global Step: 210000 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:50:26,378-Speed 6332.80 samples/sec Loss 6.5102 LearningRate 0.0007 Epoch: 10 Global Step: 210010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:29,621-Speed 6315.90 samples/sec Loss 6.5080 LearningRate 0.0007 Epoch: 10 Global Step: 210020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:32,865-Speed 6314.35 samples/sec Loss 6.5293 LearningRate 0.0007 Epoch: 10 Global Step: 210030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:36,112-Speed 6310.30 samples/sec Loss 6.5101 LearningRate 0.0007 Epoch: 10 Global Step: 210040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:39,358-Speed 6309.25 samples/sec Loss 6.5072 LearningRate 0.0007 Epoch: 10 Global Step: 210050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:42,610-Speed 6300.06 samples/sec Loss 6.5162 LearningRate 0.0007 Epoch: 10 
Global Step: 210060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:45,857-Speed 6307.25 samples/sec Loss 6.5135 LearningRate 0.0007 Epoch: 10 Global Step: 210070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:49,104-Speed 6310.35 samples/sec Loss 6.5533 LearningRate 0.0007 Epoch: 10 Global Step: 210080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:52,347-Speed 6315.41 samples/sec Loss 6.5009 LearningRate 0.0007 Epoch: 10 Global Step: 210090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:55,592-Speed 6313.44 samples/sec Loss 6.4681 LearningRate 0.0007 Epoch: 10 Global Step: 210100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:50:58,822-Speed 6342.11 samples/sec Loss 6.4995 LearningRate 0.0007 Epoch: 10 Global Step: 210110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:02,068-Speed 6310.18 samples/sec Loss 6.4699 LearningRate 0.0007 Epoch: 10 Global Step: 210120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:05,316-Speed 6307.44 samples/sec Loss 6.4547 LearningRate 0.0007 Epoch: 10 Global Step: 210130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:08,561-Speed 6313.62 samples/sec Loss 6.5022 LearningRate 0.0007 Epoch: 10 Global Step: 210140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:11,809-Speed 6305.64 samples/sec Loss 6.5210 LearningRate 0.0007 Epoch: 10 Global Step: 210150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:15,055-Speed 6311.27 samples/sec Loss 6.5784 LearningRate 0.0007 Epoch: 10 Global Step: 210160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:18,300-Speed 6313.52 samples/sec Loss 6.5467 LearningRate 0.0007 Epoch: 10 Global Step: 210170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:21,547-Speed 6307.91 samples/sec Loss 6.5374 LearningRate 0.0007 Epoch: 10 Global Step: 210180 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:24,793-Speed 6309.91 samples/sec Loss 6.4740 LearningRate 0.0007 Epoch: 10 Global Step: 210190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:28,041-Speed 6307.61 samples/sec Loss 6.4586 LearningRate 0.0007 Epoch: 10 Global Step: 210200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:31,271-Speed 6341.73 samples/sec Loss 6.4824 LearningRate 0.0007 Epoch: 10 Global Step: 210210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:34,521-Speed 6303.19 samples/sec Loss 6.4891 LearningRate 0.0007 Epoch: 10 Global Step: 210220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:37,770-Speed 6305.63 samples/sec Loss 6.4695 LearningRate 0.0007 Epoch: 10 Global Step: 210230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:41,017-Speed 6307.97 samples/sec Loss 6.4591 LearningRate 0.0007 Epoch: 10 Global Step: 210240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:44,261-Speed 6313.61 samples/sec Loss 6.3857 LearningRate 0.0007 Epoch: 10 Global Step: 210250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:47,514-Speed 6297.37 samples/sec Loss 6.5180 LearningRate 0.0007 Epoch: 10 Global Step: 210260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:50,760-Speed 6311.70 samples/sec Loss 6.5307 LearningRate 0.0007 Epoch: 10 Global Step: 210270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:54,006-Speed 6310.63 samples/sec Loss 6.5476 LearningRate 0.0007 Epoch: 10 Global Step: 210280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:51:57,249-Speed 6316.34 samples/sec Loss 6.4612 LearningRate 0.0007 Epoch: 10 Global Step: 210290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:00,501-Speed 6299.36 samples/sec Loss 6.5206 LearningRate 0.0007 Epoch: 10 Global Step: 210300 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 10:52:03,735-Speed 6334.15 samples/sec Loss 6.4610 LearningRate 0.0007 Epoch: 10 Global Step: 210310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:06,984-Speed 6305.50 samples/sec Loss 6.4958 LearningRate 0.0007 Epoch: 10 Global Step: 210320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:10,226-Speed 6317.29 samples/sec Loss 6.4853 LearningRate 0.0007 Epoch: 10 Global Step: 210330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:13,471-Speed 6311.90 samples/sec Loss 6.5178 LearningRate 0.0007 Epoch: 10 Global Step: 210340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:16,719-Speed 6307.93 samples/sec Loss 6.6704 LearningRate 0.0007 Epoch: 10 Global Step: 210350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:19,962-Speed 6317.23 samples/sec Loss 6.5159 LearningRate 0.0007 Epoch: 10 Global Step: 210360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:23,207-Speed 6313.41 samples/sec Loss 6.4618 LearningRate 0.0007 Epoch: 10 Global Step: 210370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:26,453-Speed 6309.88 samples/sec Loss 6.4886 LearningRate 0.0007 Epoch: 10 Global Step: 210380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:29,703-Speed 6303.45 samples/sec Loss 6.5380 LearningRate 0.0007 Epoch: 10 Global Step: 210390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:32,948-Speed 6313.25 samples/sec Loss 6.4865 LearningRate 0.0007 Epoch: 10 Global Step: 210400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:36,219-Speed 6261.53 samples/sec Loss 6.5160 LearningRate 0.0007 Epoch: 10 Global Step: 210410 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:52:39,454-Speed 6332.79 samples/sec Loss 6.5336 LearningRate 0.0007 Epoch: 10 Global Step: 210420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
10:52:42,702-Speed 6306.44 samples/sec Loss 6.4888 LearningRate 0.0007 Epoch: 10 Global Step: 210430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:45,948-Speed 6309.79 samples/sec Loss 6.4684 LearningRate 0.0007 Epoch: 10 Global Step: 210440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:49,195-Speed 6310.03 samples/sec Loss 6.5002 LearningRate 0.0007 Epoch: 10 Global Step: 210450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:52,440-Speed 6312.47 samples/sec Loss 6.5446 LearningRate 0.0007 Epoch: 10 Global Step: 210460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:55,689-Speed 6304.51 samples/sec Loss 6.4772 LearningRate 0.0007 Epoch: 10 Global Step: 210470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:52:58,937-Speed 6306.64 samples/sec Loss 6.4360 LearningRate 0.0007 Epoch: 10 Global Step: 210480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:02,182-Speed 6314.26 samples/sec Loss 6.5186 LearningRate 0.0007 Epoch: 10 Global Step: 210490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:05,435-Speed 6297.31 samples/sec Loss 6.5121 LearningRate 0.0007 Epoch: 10 Global Step: 210500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:08,683-Speed 6305.82 samples/sec Loss 6.5456 LearningRate 0.0007 Epoch: 10 Global Step: 210510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:11,935-Speed 6299.73 samples/sec Loss 6.5821 LearningRate 0.0007 Epoch: 10 Global Step: 210520 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:53:15,167-Speed 6337.21 samples/sec Loss 6.5044 LearningRate 0.0007 Epoch: 10 Global Step: 210530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:18,413-Speed 6310.76 samples/sec Loss 6.4160 LearningRate 0.0007 Epoch: 10 Global Step: 210540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:21,658-Speed 6312.15 
samples/sec Loss 6.4872 LearningRate 0.0007 Epoch: 10 Global Step: 210550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:24,907-Speed 6306.23 samples/sec Loss 6.4717 LearningRate 0.0007 Epoch: 10 Global Step: 210560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:28,160-Speed 6296.85 samples/sec Loss 6.5378 LearningRate 0.0007 Epoch: 10 Global Step: 210570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:31,409-Speed 6306.52 samples/sec Loss 6.4716 LearningRate 0.0007 Epoch: 10 Global Step: 210580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:34,656-Speed 6309.18 samples/sec Loss 6.5214 LearningRate 0.0007 Epoch: 10 Global Step: 210590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:37,899-Speed 6315.36 samples/sec Loss 6.5525 LearningRate 0.0007 Epoch: 10 Global Step: 210600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:41,144-Speed 6313.06 samples/sec Loss 6.4924 LearningRate 0.0007 Epoch: 10 Global Step: 210610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:44,393-Speed 6304.60 samples/sec Loss 6.5419 LearningRate 0.0007 Epoch: 10 Global Step: 210620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:47,640-Speed 6309.26 samples/sec Loss 6.4624 LearningRate 0.0007 Epoch: 10 Global Step: 210630 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:53:50,874-Speed 6334.48 samples/sec Loss 6.4481 LearningRate 0.0007 Epoch: 10 Global Step: 210640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:54,125-Speed 6299.78 samples/sec Loss 6.4614 LearningRate 0.0007 Epoch: 10 Global Step: 210650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:53:57,373-Speed 6308.00 samples/sec Loss 6.4657 LearningRate 0.0007 Epoch: 10 Global Step: 210660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:00,626-Speed 6296.76 samples/sec Loss 6.4926 
LearningRate 0.0007 Epoch: 10 Global Step: 210670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:03,876-Speed 6302.87 samples/sec Loss 6.5618 LearningRate 0.0007 Epoch: 10 Global Step: 210680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:07,123-Speed 6307.93 samples/sec Loss 6.4647 LearningRate 0.0007 Epoch: 10 Global Step: 210690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:10,374-Speed 6301.76 samples/sec Loss 6.5527 LearningRate 0.0007 Epoch: 10 Global Step: 210700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:13,631-Speed 6289.27 samples/sec Loss 6.5223 LearningRate 0.0007 Epoch: 10 Global Step: 210710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:16,879-Speed 6307.38 samples/sec Loss 6.4489 LearningRate 0.0007 Epoch: 10 Global Step: 210720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:20,131-Speed 6298.96 samples/sec Loss 6.5216 LearningRate 0.0007 Epoch: 10 Global Step: 210730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:23,379-Speed 6305.43 samples/sec Loss 6.4936 LearningRate 0.0007 Epoch: 10 Global Step: 210740 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:54:26,630-Speed 6301.03 samples/sec Loss 6.4562 LearningRate 0.0007 Epoch: 10 Global Step: 210750 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:54:29,869-Speed 6326.19 samples/sec Loss 6.5004 LearningRate 0.0007 Epoch: 10 Global Step: 210760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:33,117-Speed 6307.23 samples/sec Loss 6.4902 LearningRate 0.0007 Epoch: 10 Global Step: 210770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:36,361-Speed 6313.46 samples/sec Loss 6.4464 LearningRate 0.0007 Epoch: 10 Global Step: 210780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:39,622-Speed 6282.03 samples/sec Loss 6.4576 LearningRate 0.0007 Epoch: 10 
Global Step: 210790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:42,868-Speed 6311.33 samples/sec Loss 6.3950 LearningRate 0.0007 Epoch: 10 Global Step: 210800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:46,117-Speed 6303.86 samples/sec Loss 6.4249 LearningRate 0.0007 Epoch: 10 Global Step: 210810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:49,366-Speed 6305.76 samples/sec Loss 6.4882 LearningRate 0.0007 Epoch: 10 Global Step: 210820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:52,616-Speed 6303.39 samples/sec Loss 6.4412 LearningRate 0.0007 Epoch: 10 Global Step: 210830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:55,863-Speed 6308.64 samples/sec Loss 6.4551 LearningRate 0.0007 Epoch: 10 Global Step: 210840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:54:59,108-Speed 6312.16 samples/sec Loss 6.4650 LearningRate 0.0007 Epoch: 10 Global Step: 210850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:02,361-Speed 6301.56 samples/sec Loss 6.3963 LearningRate 0.0007 Epoch: 10 Global Step: 210860 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:55:05,595-Speed 6332.44 samples/sec Loss 6.4038 LearningRate 0.0007 Epoch: 10 Global Step: 210870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:08,841-Speed 6310.87 samples/sec Loss 6.4537 LearningRate 0.0007 Epoch: 10 Global Step: 210880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:12,088-Speed 6308.56 samples/sec Loss 6.4213 LearningRate 0.0007 Epoch: 10 Global Step: 210890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:15,336-Speed 6307.58 samples/sec Loss 6.4637 LearningRate 0.0007 Epoch: 10 Global Step: 210900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:18,578-Speed 6318.04 samples/sec Loss 6.4659 LearningRate 0.0007 Epoch: 10 Global Step: 210910 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:21,824-Speed 6310.71 samples/sec Loss 6.4806 LearningRate 0.0007 Epoch: 10 Global Step: 210920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:25,070-Speed 6311.30 samples/sec Loss 6.4475 LearningRate 0.0007 Epoch: 10 Global Step: 210930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:28,318-Speed 6305.49 samples/sec Loss 6.5496 LearningRate 0.0007 Epoch: 10 Global Step: 210940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:31,562-Speed 6315.49 samples/sec Loss 6.5235 LearningRate 0.0007 Epoch: 10 Global Step: 210950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:34,809-Speed 6308.35 samples/sec Loss 6.5312 LearningRate 0.0007 Epoch: 10 Global Step: 210960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:38,055-Speed 6311.39 samples/sec Loss 6.5095 LearningRate 0.0007 Epoch: 10 Global Step: 210970 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:55:41,299-Speed 6315.19 samples/sec Loss 6.4935 LearningRate 0.0007 Epoch: 10 Global Step: 210980 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:55:44,547-Speed 6306.94 samples/sec Loss 6.4594 LearningRate 0.0007 Epoch: 10 Global Step: 210990 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:55:47,778-Speed 6339.25 samples/sec Loss 6.5331 LearningRate 0.0007 Epoch: 10 Global Step: 211000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:51,030-Speed 6299.36 samples/sec Loss 6.4446 LearningRate 0.0007 Epoch: 10 Global Step: 211010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:54,274-Speed 6314.86 samples/sec Loss 6.5057 LearningRate 0.0007 Epoch: 10 Global Step: 211020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:55:57,521-Speed 6308.86 samples/sec Loss 6.5382 LearningRate 0.0007 Epoch: 10 Global Step: 211030 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 10:56:00,771-Speed 6303.43 samples/sec Loss 6.5725 LearningRate 0.0007 Epoch: 10 Global Step: 211040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:04,016-Speed 6311.60 samples/sec Loss 6.5476 LearningRate 0.0007 Epoch: 10 Global Step: 211050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:07,264-Speed 6306.73 samples/sec Loss 6.5020 LearningRate 0.0007 Epoch: 10 Global Step: 211060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:10,521-Speed 6290.46 samples/sec Loss 6.5534 LearningRate 0.0007 Epoch: 10 Global Step: 211070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:13,774-Speed 6296.04 samples/sec Loss 6.4774 LearningRate 0.0007 Epoch: 10 Global Step: 211080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:17,022-Speed 6308.25 samples/sec Loss 6.4611 LearningRate 0.0007 Epoch: 10 Global Step: 211090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:20,259-Speed 6327.82 samples/sec Loss 6.4972 LearningRate 0.0007 Epoch: 10 Global Step: 211100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:23,504-Speed 6312.24 samples/sec Loss 6.5622 LearningRate 0.0007 Epoch: 10 Global Step: 211110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:26,754-Speed 6302.26 samples/sec Loss 6.4043 LearningRate 0.0007 Epoch: 10 Global Step: 211120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:30,011-Speed 6290.41 samples/sec Loss 6.5286 LearningRate 0.0007 Epoch: 10 Global Step: 211130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:33,259-Speed 6307.88 samples/sec Loss 6.4215 LearningRate 0.0007 Epoch: 10 Global Step: 211140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:36,507-Speed 6306.02 samples/sec Loss 6.4298 LearningRate 0.0007 Epoch: 10 Global Step: 211150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
10:56:39,751-Speed 6315.09 samples/sec Loss 6.4712 LearningRate 0.0007 Epoch: 10 Global Step: 211160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:42,996-Speed 6312.25 samples/sec Loss 6.4714 LearningRate 0.0007 Epoch: 10 Global Step: 211170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:46,239-Speed 6316.38 samples/sec Loss 6.5174 LearningRate 0.0007 Epoch: 10 Global Step: 211180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:49,485-Speed 6310.06 samples/sec Loss 6.5087 LearningRate 0.0007 Epoch: 10 Global Step: 211190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:56:52,733-Speed 6307.35 samples/sec Loss 6.4327 LearningRate 0.0007 Epoch: 10 Global Step: 211200 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:56:55,981-Speed 6306.70 samples/sec Loss 6.4433 LearningRate 0.0007 Epoch: 10 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:56:59,213-Speed 6338.36 samples/sec Loss 6.4173 LearningRate 0.0007 Epoch: 10 Global Step: 211220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:02,458-Speed 6312.38 samples/sec Loss 6.5063 LearningRate 0.0007 Epoch: 10 Global Step: 211230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:05,705-Speed 6310.34 samples/sec Loss 6.5626 LearningRate 0.0007 Epoch: 10 Global Step: 211240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:08,951-Speed 6309.65 samples/sec Loss 6.4742 LearningRate 0.0007 Epoch: 10 Global Step: 211250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:12,201-Speed 6303.68 samples/sec Loss 6.4974 LearningRate 0.0007 Epoch: 10 Global Step: 211260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:15,447-Speed 6311.01 samples/sec Loss 6.5429 LearningRate 0.0007 Epoch: 10 Global Step: 211270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:18,695-Speed 6305.65 
samples/sec Loss 6.4245 LearningRate 0.0007 Epoch: 10 Global Step: 211280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:21,942-Speed 6309.97 samples/sec Loss 6.5059 LearningRate 0.0007 Epoch: 10 Global Step: 211290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:25,191-Speed 6305.07 samples/sec Loss 6.4492 LearningRate 0.0007 Epoch: 10 Global Step: 211300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:28,441-Speed 6302.55 samples/sec Loss 6.5286 LearningRate 0.0007 Epoch: 10 Global Step: 211310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:31,690-Speed 6304.48 samples/sec Loss 6.5211 LearningRate 0.0007 Epoch: 10 Global Step: 211320 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:57:34,922-Speed 6337.74 samples/sec Loss 6.4835 LearningRate 0.0007 Epoch: 10 Global Step: 211330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:38,192-Speed 6263.98 samples/sec Loss 6.5104 LearningRate 0.0007 Epoch: 10 Global Step: 211340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:41,456-Speed 6277.13 samples/sec Loss 6.4971 LearningRate 0.0007 Epoch: 10 Global Step: 211350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:44,700-Speed 6314.46 samples/sec Loss 6.5631 LearningRate 0.0007 Epoch: 10 Global Step: 211360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:47,955-Speed 6293.25 samples/sec Loss 6.5517 LearningRate 0.0007 Epoch: 10 Global Step: 211370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:51,198-Speed 6315.43 samples/sec Loss 6.4825 LearningRate 0.0007 Epoch: 10 Global Step: 211380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:54,443-Speed 6312.74 samples/sec Loss 6.4979 LearningRate 0.0007 Epoch: 10 Global Step: 211390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:57:57,704-Speed 6282.89 samples/sec Loss 6.4345 
LearningRate 0.0007 Epoch: 10 Global Step: 211400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:00,953-Speed 6305.28 samples/sec Loss 6.4628 LearningRate 0.0007 Epoch: 10 Global Step: 211410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:04,234-Speed 6243.87 samples/sec Loss 6.4351 LearningRate 0.0007 Epoch: 10 Global Step: 211420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:07,467-Speed 6335.61 samples/sec Loss 6.5196 LearningRate 0.0007 Epoch: 10 Global Step: 211430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:10,712-Speed 6311.94 samples/sec Loss 6.4963 LearningRate 0.0007 Epoch: 10 Global Step: 211440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:13,962-Speed 6303.24 samples/sec Loss 6.4904 LearningRate 0.0007 Epoch: 10 Global Step: 211450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:17,210-Speed 6306.79 samples/sec Loss 6.4482 LearningRate 0.0007 Epoch: 10 Global Step: 211460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:20,453-Speed 6316.61 samples/sec Loss 6.5723 LearningRate 0.0007 Epoch: 10 Global Step: 211470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:23,701-Speed 6306.42 samples/sec Loss 6.4594 LearningRate 0.0007 Epoch: 10 Global Step: 211480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:26,949-Speed 6307.50 samples/sec Loss 6.4525 LearningRate 0.0007 Epoch: 10 Global Step: 211490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:30,195-Speed 6310.82 samples/sec Loss 6.5043 LearningRate 0.0007 Epoch: 10 Global Step: 211500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:33,445-Speed 6303.77 samples/sec Loss 6.5538 LearningRate 0.0007 Epoch: 10 Global Step: 211510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:36,695-Speed 6302.20 samples/sec Loss 6.4154 LearningRate 0.0007 Epoch: 10 
Global Step: 211520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:39,933-Speed 6327.24 samples/sec Loss 6.5547 LearningRate 0.0007 Epoch: 10 Global Step: 211530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:43,179-Speed 6310.12 samples/sec Loss 6.4423 LearningRate 0.0007 Epoch: 10 Global Step: 211540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:46,427-Speed 6306.69 samples/sec Loss 6.5995 LearningRate 0.0007 Epoch: 10 Global Step: 211550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:49,671-Speed 6313.30 samples/sec Loss 6.4654 LearningRate 0.0007 Epoch: 10 Global Step: 211560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:52,918-Speed 6310.39 samples/sec Loss 6.4700 LearningRate 0.0007 Epoch: 10 Global Step: 211570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:56,165-Speed 6307.79 samples/sec Loss 6.4675 LearningRate 0.0007 Epoch: 10 Global Step: 211580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:58:59,421-Speed 6290.53 samples/sec Loss 6.5020 LearningRate 0.0007 Epoch: 10 Global Step: 211590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:02,670-Speed 6305.29 samples/sec Loss 6.4384 LearningRate 0.0007 Epoch: 10 Global Step: 211600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:05,915-Speed 6313.94 samples/sec Loss 6.5416 LearningRate 0.0007 Epoch: 10 Global Step: 211610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:09,162-Speed 6308.85 samples/sec Loss 6.5058 LearningRate 0.0007 Epoch: 10 Global Step: 211620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:12,409-Speed 6308.93 samples/sec Loss 6.5234 LearningRate 0.0007 Epoch: 10 Global Step: 211630 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:59:15,644-Speed 6332.59 samples/sec Loss 6.4763 LearningRate 0.0007 Epoch: 10 Global Step: 211640 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:18,889-Speed 6311.22 samples/sec Loss 6.4852 LearningRate 0.0007 Epoch: 10 Global Step: 211650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:22,136-Speed 6308.79 samples/sec Loss 6.5211 LearningRate 0.0007 Epoch: 10 Global Step: 211660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:25,383-Speed 6309.52 samples/sec Loss 6.4564 LearningRate 0.0007 Epoch: 10 Global Step: 211670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:28,630-Speed 6309.48 samples/sec Loss 6.5419 LearningRate 0.0007 Epoch: 10 Global Step: 211680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:31,878-Speed 6306.26 samples/sec Loss 6.4441 LearningRate 0.0007 Epoch: 10 Global Step: 211690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:35,127-Speed 6304.62 samples/sec Loss 6.5011 LearningRate 0.0007 Epoch: 10 Global Step: 211700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:38,370-Speed 6315.95 samples/sec Loss 6.5547 LearningRate 0.0007 Epoch: 10 Global Step: 211710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:41,617-Speed 6308.89 samples/sec Loss 6.4634 LearningRate 0.0007 Epoch: 10 Global Step: 211720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:44,861-Speed 6314.31 samples/sec Loss 6.4924 LearningRate 0.0007 Epoch: 10 Global Step: 211730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:48,107-Speed 6310.90 samples/sec Loss 6.4043 LearningRate 0.0007 Epoch: 10 Global Step: 211740 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 10:59:51,339-Speed 6338.15 samples/sec Loss 6.4980 LearningRate 0.0007 Epoch: 10 Global Step: 211750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 10:59:54,581-Speed 6319.29 samples/sec Loss 6.4569 LearningRate 0.0007 Epoch: 10 Global Step: 211760 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 10:59:57,830-Speed 6303.84 samples/sec Loss 6.4840 LearningRate 0.0007 Epoch: 10 Global Step: 211770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:01,083-Speed 6297.21 samples/sec Loss 6.5100 LearningRate 0.0007 Epoch: 10 Global Step: 211780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:04,332-Speed 6304.95 samples/sec Loss 6.5078 LearningRate 0.0007 Epoch: 10 Global Step: 211790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:07,577-Speed 6312.35 samples/sec Loss 6.4685 LearningRate 0.0007 Epoch: 10 Global Step: 211800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:10,831-Speed 6295.83 samples/sec Loss 6.4539 LearningRate 0.0007 Epoch: 10 Global Step: 211810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:14,078-Speed 6308.65 samples/sec Loss 6.4117 LearningRate 0.0007 Epoch: 10 Global Step: 211820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:17,323-Speed 6313.12 samples/sec Loss 6.4615 LearningRate 0.0007 Epoch: 10 Global Step: 211830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:20,571-Speed 6307.35 samples/sec Loss 6.4664 LearningRate 0.0007 Epoch: 10 Global Step: 211840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:23,822-Speed 6301.47 samples/sec Loss 6.5122 LearningRate 0.0007 Epoch: 10 Global Step: 211850 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:00:27,052-Speed 6342.61 samples/sec Loss 6.5189 LearningRate 0.0007 Epoch: 10 Global Step: 211860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:30,297-Speed 6312.48 samples/sec Loss 6.4440 LearningRate 0.0007 Epoch: 10 Global Step: 211870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:33,542-Speed 6312.27 samples/sec Loss 6.4900 LearningRate 0.0007 Epoch: 10 Global Step: 211880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:00:36,786-Speed 6315.14 samples/sec Loss 6.4510 LearningRate 0.0007 Epoch: 10 Global Step: 211890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:40,045-Speed 6284.66 samples/sec Loss 6.5136 LearningRate 0.0007 Epoch: 10 Global Step: 211900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:43,291-Speed 6310.99 samples/sec Loss 6.4883 LearningRate 0.0007 Epoch: 10 Global Step: 211910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:46,540-Speed 6304.17 samples/sec Loss 6.4633 LearningRate 0.0007 Epoch: 10 Global Step: 211920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:49,790-Speed 6304.40 samples/sec Loss 6.5107 LearningRate 0.0007 Epoch: 10 Global Step: 211930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:53,038-Speed 6306.66 samples/sec Loss 6.4790 LearningRate 0.0007 Epoch: 10 Global Step: 211940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:56,284-Speed 6310.55 samples/sec Loss 6.4411 LearningRate 0.0007 Epoch: 10 Global Step: 211950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:00:59,514-Speed 6340.67 samples/sec Loss 6.4387 LearningRate 0.0007 Epoch: 10 Global Step: 211960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:02,769-Speed 6292.97 samples/sec Loss 6.4159 LearningRate 0.0007 Epoch: 10 Global Step: 211970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:06,018-Speed 6305.76 samples/sec Loss 6.4458 LearningRate 0.0007 Epoch: 10 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:09,262-Speed 6313.99 samples/sec Loss 6.4665 LearningRate 0.0007 Epoch: 10 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:12,520-Speed 6287.99 samples/sec Loss 6.3957 LearningRate 0.0007 Epoch: 10 Global Step: 212000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:15,769-Speed 6305.11 
samples/sec Loss 6.4827 LearningRate 0.0007 Epoch: 10 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:19,019-Speed 6303.27 samples/sec Loss 6.5143 LearningRate 0.0007 Epoch: 10 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:22,265-Speed 6309.48 samples/sec Loss 6.4603 LearningRate 0.0007 Epoch: 10 Global Step: 212030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:25,516-Speed 6301.68 samples/sec Loss 6.4121 LearningRate 0.0007 Epoch: 10 Global Step: 212040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:28,762-Speed 6310.98 samples/sec Loss 6.3941 LearningRate 0.0007 Epoch: 10 Global Step: 212050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:32,010-Speed 6308.23 samples/sec Loss 6.5119 LearningRate 0.0007 Epoch: 10 Global Step: 212060 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:01:35,257-Speed 6308.60 samples/sec Loss 6.4702 LearningRate 0.0007 Epoch: 10 Global Step: 212070 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:01:38,493-Speed 6330.25 samples/sec Loss 6.4311 LearningRate 0.0007 Epoch: 10 Global Step: 212080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:41,737-Speed 6314.25 samples/sec Loss 6.4720 LearningRate 0.0007 Epoch: 10 Global Step: 212090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:44,980-Speed 6316.80 samples/sec Loss 6.5460 LearningRate 0.0007 Epoch: 10 Global Step: 212100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:48,232-Speed 6299.71 samples/sec Loss 6.5172 LearningRate 0.0007 Epoch: 10 Global Step: 212110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:51,482-Speed 6302.13 samples/sec Loss 6.5682 LearningRate 0.0007 Epoch: 10 Global Step: 212120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:54,726-Speed 6314.59 samples/sec Loss 6.5026 
LearningRate 0.0007 Epoch: 10 Global Step: 212130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:01:57,974-Speed 6306.10 samples/sec Loss 6.4277 LearningRate 0.0007 Epoch: 10 Global Step: 212140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:01,220-Speed 6310.86 samples/sec Loss 6.4128 LearningRate 0.0007 Epoch: 10 Global Step: 212150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:04,467-Speed 6312.56 samples/sec Loss 6.5385 LearningRate 0.0007 Epoch: 10 Global Step: 212160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:07,713-Speed 6309.39 samples/sec Loss 6.4645 LearningRate 0.0007 Epoch: 10 Global Step: 212170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:10,965-Speed 6299.83 samples/sec Loss 6.5126 LearningRate 0.0007 Epoch: 10 Global Step: 212180 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:02:14,196-Speed 6340.58 samples/sec Loss 6.4458 LearningRate 0.0007 Epoch: 10 Global Step: 212190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:17,437-Speed 6319.43 samples/sec Loss 6.4606 LearningRate 0.0007 Epoch: 10 Global Step: 212200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:20,681-Speed 6315.26 samples/sec Loss 6.5196 LearningRate 0.0007 Epoch: 10 Global Step: 212210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:23,931-Speed 6302.71 samples/sec Loss 6.5326 LearningRate 0.0007 Epoch: 10 Global Step: 212220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:27,176-Speed 6312.83 samples/sec Loss 6.4806 LearningRate 0.0007 Epoch: 10 Global Step: 212230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:30,423-Speed 6307.94 samples/sec Loss 6.5002 LearningRate 0.0007 Epoch: 10 Global Step: 212240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:33,669-Speed 6311.46 samples/sec Loss 6.4856 LearningRate 0.0007 Epoch: 10 
Global Step: 212250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:36,916-Speed 6309.89 samples/sec Loss 6.5756 LearningRate 0.0007 Epoch: 10 Global Step: 212260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:40,161-Speed 6312.99 samples/sec Loss 6.5228 LearningRate 0.0007 Epoch: 10 Global Step: 212270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:43,404-Speed 6315.41 samples/sec Loss 6.5186 LearningRate 0.0007 Epoch: 10 Global Step: 212280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:46,635-Speed 6341.46 samples/sec Loss 6.4639 LearningRate 0.0007 Epoch: 10 Global Step: 212290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:49,876-Speed 6319.64 samples/sec Loss 6.5054 LearningRate 0.0007 Epoch: 10 Global Step: 212300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:53,123-Speed 6308.95 samples/sec Loss 6.4571 LearningRate 0.0007 Epoch: 10 Global Step: 212310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:56,368-Speed 6312.23 samples/sec Loss 6.5720 LearningRate 0.0007 Epoch: 10 Global Step: 212320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:02:59,619-Speed 6302.18 samples/sec Loss 6.4432 LearningRate 0.0007 Epoch: 10 Global Step: 212330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:02,863-Speed 6314.54 samples/sec Loss 6.4140 LearningRate 0.0007 Epoch: 10 Global Step: 212340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:06,110-Speed 6307.81 samples/sec Loss 6.3915 LearningRate 0.0007 Epoch: 10 Global Step: 212350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:09,358-Speed 6306.80 samples/sec Loss 6.4509 LearningRate 0.0007 Epoch: 10 Global Step: 212360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:12,605-Speed 6307.99 samples/sec Loss 6.4573 LearningRate 0.0007 Epoch: 10 Global Step: 212370 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:15,849-Speed 6319.14 samples/sec Loss 6.5316 LearningRate 0.0007 Epoch: 10 Global Step: 212380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:19,094-Speed 6312.64 samples/sec Loss 6.5217 LearningRate 0.0007 Epoch: 10 Global Step: 212390 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:03:22,326-Speed 6338.45 samples/sec Loss 6.4621 LearningRate 0.0007 Epoch: 10 Global Step: 212400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:25,570-Speed 6313.77 samples/sec Loss 6.5662 LearningRate 0.0007 Epoch: 10 Global Step: 212410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:28,826-Speed 6291.41 samples/sec Loss 6.4956 LearningRate 0.0007 Epoch: 10 Global Step: 212420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:32,079-Speed 6296.44 samples/sec Loss 6.4042 LearningRate 0.0007 Epoch: 10 Global Step: 212430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:35,325-Speed 6311.66 samples/sec Loss 6.5299 LearningRate 0.0007 Epoch: 10 Global Step: 212440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:38,575-Speed 6302.60 samples/sec Loss 6.4633 LearningRate 0.0007 Epoch: 10 Global Step: 212450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:41,824-Speed 6304.06 samples/sec Loss 6.5135 LearningRate 0.0007 Epoch: 10 Global Step: 212460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:45,076-Speed 6300.79 samples/sec Loss 6.4555 LearningRate 0.0007 Epoch: 10 Global Step: 212470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:48,322-Speed 6311.46 samples/sec Loss 6.4252 LearningRate 0.0007 Epoch: 10 Global Step: 212480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:03:51,569-Speed 6308.88 samples/sec Loss 6.4528 LearningRate 0.0007 Epoch: 10 Global Step: 212490 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:03:54,815-Speed 6309.50 samples/sec Loss 6.4377 LearningRate 0.0007 Epoch: 10 Global Step: 212500 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:03:58,061-Speed 6311.49 samples/sec Loss 6.5275 LearningRate 0.0007 Epoch: 10 Global Step: 212510 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:04:01,294-Speed 6335.92 samples/sec Loss 6.4853 LearningRate 0.0007 Epoch: 10 Global Step: 212520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:04,540-Speed 6310.20 samples/sec Loss 6.4896 LearningRate 0.0007 Epoch: 10 Global Step: 212530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:07,787-Speed 6308.83 samples/sec Loss 6.4571 LearningRate 0.0007 Epoch: 10 Global Step: 212540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:11,032-Speed 6313.16 samples/sec Loss 6.4762 LearningRate 0.0007 Epoch: 10 Global Step: 212550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:14,276-Speed 6314.02 samples/sec Loss 6.5121 LearningRate 0.0007 Epoch: 10 Global Step: 212560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:17,523-Speed 6309.36 samples/sec Loss 6.4890 LearningRate 0.0007 Epoch: 10 Global Step: 212570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:20,793-Speed 6264.48 samples/sec Loss 6.5047 LearningRate 0.0007 Epoch: 10 Global Step: 212580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:24,083-Speed 6225.70 samples/sec Loss 6.4471 LearningRate 0.0007 Epoch: 10 Global Step: 212590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:27,332-Speed 6304.50 samples/sec Loss 6.5002 LearningRate 0.0007 Epoch: 10 Global Step: 212600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:30,579-Speed 6310.36 samples/sec Loss 6.5026 LearningRate 0.0007 Epoch: 10 Global Step: 212610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:04:33,826-Speed 6307.90 samples/sec Loss 6.4785 LearningRate 0.0007 Epoch: 10 Global Step: 212620 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:04:37,058-Speed 6337.00 samples/sec Loss 6.5161 LearningRate 0.0007 Epoch: 10 Global Step: 212630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:40,306-Speed 6310.38 samples/sec Loss 6.5336 LearningRate 0.0007 Epoch: 10 Global Step: 212640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:43,551-Speed 6313.76 samples/sec Loss 6.4549 LearningRate 0.0007 Epoch: 10 Global Step: 212650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:46,800-Speed 6304.89 samples/sec Loss 6.5247 LearningRate 0.0007 Epoch: 10 Global Step: 212660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:50,048-Speed 6306.78 samples/sec Loss 6.5025 LearningRate 0.0007 Epoch: 10 Global Step: 212670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:53,293-Speed 6311.72 samples/sec Loss 6.4601 LearningRate 0.0007 Epoch: 10 Global Step: 212680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:56,539-Speed 6311.51 samples/sec Loss 6.4076 LearningRate 0.0007 Epoch: 10 Global Step: 212690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:04:59,785-Speed 6310.60 samples/sec Loss 6.4923 LearningRate 0.0007 Epoch: 10 Global Step: 212700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:03,032-Speed 6309.29 samples/sec Loss 6.5509 LearningRate 0.0007 Epoch: 10 Global Step: 212710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:06,279-Speed 6308.90 samples/sec Loss 6.5491 LearningRate 0.0007 Epoch: 10 Global Step: 212720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:09,510-Speed 6340.66 samples/sec Loss 6.4313 LearningRate 0.0007 Epoch: 10 Global Step: 212730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:12,753-Speed 6316.04 
samples/sec Loss 6.5111 LearningRate 0.0007 Epoch: 10 Global Step: 212740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:16,002-Speed 6304.63 samples/sec Loss 6.4474 LearningRate 0.0007 Epoch: 10 Global Step: 212750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:19,274-Speed 6259.97 samples/sec Loss 6.4561 LearningRate 0.0007 Epoch: 10 Global Step: 212760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:22,519-Speed 6314.20 samples/sec Loss 6.5104 LearningRate 0.0007 Epoch: 10 Global Step: 212770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:25,769-Speed 6302.97 samples/sec Loss 6.5100 LearningRate 0.0007 Epoch: 10 Global Step: 212780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:29,030-Speed 6280.68 samples/sec Loss 6.5805 LearningRate 0.0007 Epoch: 10 Global Step: 212790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:32,277-Speed 6308.79 samples/sec Loss 6.4113 LearningRate 0.0007 Epoch: 10 Global Step: 212800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:35,522-Speed 6313.24 samples/sec Loss 6.4570 LearningRate 0.0007 Epoch: 10 Global Step: 212810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:38,768-Speed 6310.47 samples/sec Loss 6.4628 LearningRate 0.0007 Epoch: 10 Global Step: 212820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:42,027-Speed 6286.19 samples/sec Loss 6.4342 LearningRate 0.0007 Epoch: 10 Global Step: 212830 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:05:45,262-Speed 6332.14 samples/sec Loss 6.4937 LearningRate 0.0007 Epoch: 10 Global Step: 212840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:48,512-Speed 6302.43 samples/sec Loss 6.4654 LearningRate 0.0007 Epoch: 10 Global Step: 212850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:51,756-Speed 6314.46 samples/sec Loss 6.5009 
LearningRate 0.0007 Epoch: 10 Global Step: 212860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:55,001-Speed 6312.61 samples/sec Loss 6.5333 LearningRate 0.0007 Epoch: 10 Global Step: 212870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:05:58,244-Speed 6315.81 samples/sec Loss 6.5130 LearningRate 0.0007 Epoch: 10 Global Step: 212880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:01,485-Speed 6320.15 samples/sec Loss 6.4652 LearningRate 0.0007 Epoch: 10 Global Step: 212890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:04,730-Speed 6312.94 samples/sec Loss 6.4660 LearningRate 0.0007 Epoch: 10 Global Step: 212900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:07,998-Speed 6268.92 samples/sec Loss 6.5215 LearningRate 0.0007 Epoch: 10 Global Step: 212910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:11,243-Speed 6312.93 samples/sec Loss 6.4903 LearningRate 0.0007 Epoch: 10 Global Step: 212920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:14,492-Speed 6305.84 samples/sec Loss 6.4240 LearningRate 0.0007 Epoch: 10 Global Step: 212930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:17,738-Speed 6309.34 samples/sec Loss 6.4667 LearningRate 0.0007 Epoch: 10 Global Step: 212940 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:06:20,988-Speed 6303.71 samples/sec Loss 6.4986 LearningRate 0.0007 Epoch: 10 Global Step: 212950 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:06:24,221-Speed 6336.16 samples/sec Loss 6.4775 LearningRate 0.0007 Epoch: 10 Global Step: 212960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:27,465-Speed 6313.92 samples/sec Loss 6.4721 LearningRate 0.0007 Epoch: 10 Global Step: 212970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:30,711-Speed 6311.19 samples/sec Loss 6.4871 LearningRate 0.0007 Epoch: 10 
Global Step: 212980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:33,962-Speed 6299.91 samples/sec Loss 6.4369 LearningRate 0.0007 Epoch: 10 Global Step: 212990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:37,209-Speed 6309.16 samples/sec Loss 6.4855 LearningRate 0.0007 Epoch: 10 Global Step: 213000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:40,457-Speed 6307.54 samples/sec Loss 6.4663 LearningRate 0.0007 Epoch: 10 Global Step: 213010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:43,703-Speed 6310.07 samples/sec Loss 6.4788 LearningRate 0.0007 Epoch: 10 Global Step: 213020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:46,948-Speed 6313.50 samples/sec Loss 6.5079 LearningRate 0.0007 Epoch: 10 Global Step: 213030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:50,194-Speed 6311.26 samples/sec Loss 6.5588 LearningRate 0.0007 Epoch: 10 Global Step: 213040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:53,441-Speed 6306.90 samples/sec Loss 6.5331 LearningRate 0.0007 Epoch: 10 Global Step: 213050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:56,672-Speed 6341.35 samples/sec Loss 6.4605 LearningRate 0.0007 Epoch: 10 Global Step: 213060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:06:59,923-Speed 6299.63 samples/sec Loss 6.4710 LearningRate 0.0007 Epoch: 10 Global Step: 213070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:03,170-Speed 6309.31 samples/sec Loss 6.3725 LearningRate 0.0007 Epoch: 10 Global Step: 213080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:06,414-Speed 6315.55 samples/sec Loss 6.4459 LearningRate 0.0007 Epoch: 10 Global Step: 213090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:09,658-Speed 6314.90 samples/sec Loss 6.3808 LearningRate 0.0007 Epoch: 10 Global Step: 213100 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:12,906-Speed 6304.99 samples/sec Loss 6.3962 LearningRate 0.0007 Epoch: 10 Global Step: 213110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:16,150-Speed 6315.72 samples/sec Loss 6.4639 LearningRate 0.0007 Epoch: 10 Global Step: 213120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:19,398-Speed 6308.01 samples/sec Loss 6.4590 LearningRate 0.0007 Epoch: 10 Global Step: 213130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:22,642-Speed 6314.61 samples/sec Loss 6.4645 LearningRate 0.0007 Epoch: 10 Global Step: 213140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:25,889-Speed 6309.28 samples/sec Loss 6.4164 LearningRate 0.0007 Epoch: 10 Global Step: 213150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:29,133-Speed 6313.19 samples/sec Loss 6.4995 LearningRate 0.0007 Epoch: 10 Global Step: 213160 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:07:32,369-Speed 6331.56 samples/sec Loss 6.4930 LearningRate 0.0007 Epoch: 10 Global Step: 213170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:35,612-Speed 6315.29 samples/sec Loss 6.4763 LearningRate 0.0007 Epoch: 10 Global Step: 213180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:38,870-Speed 6289.04 samples/sec Loss 6.4885 LearningRate 0.0007 Epoch: 10 Global Step: 213190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:42,117-Speed 6307.25 samples/sec Loss 6.4538 LearningRate 0.0007 Epoch: 10 Global Step: 213200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:45,364-Speed 6309.37 samples/sec Loss 6.4411 LearningRate 0.0007 Epoch: 10 Global Step: 213210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:48,612-Speed 6306.67 samples/sec Loss 6.4040 LearningRate 0.0007 Epoch: 10 Global Step: 213220 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:07:51,857-Speed 6313.50 samples/sec Loss 6.4056 LearningRate 0.0007 Epoch: 10 Global Step: 213230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:55,101-Speed 6313.88 samples/sec Loss 6.4459 LearningRate 0.0007 Epoch: 10 Global Step: 213240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:07:58,345-Speed 6314.30 samples/sec Loss 6.4076 LearningRate 0.0007 Epoch: 10 Global Step: 213250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:01,590-Speed 6313.25 samples/sec Loss 6.4740 LearningRate 0.0007 Epoch: 10 Global Step: 213260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:04,825-Speed 6331.60 samples/sec Loss 6.4127 LearningRate 0.0007 Epoch: 10 Global Step: 213270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:08,068-Speed 6315.74 samples/sec Loss 6.4608 LearningRate 0.0007 Epoch: 10 Global Step: 213280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:11,318-Speed 6303.59 samples/sec Loss 6.5057 LearningRate 0.0007 Epoch: 10 Global Step: 213290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:14,564-Speed 6311.66 samples/sec Loss 6.4230 LearningRate 0.0007 Epoch: 10 Global Step: 213300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:17,810-Speed 6309.58 samples/sec Loss 6.4766 LearningRate 0.0007 Epoch: 10 Global Step: 213310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:21,056-Speed 6311.83 samples/sec Loss 6.4569 LearningRate 0.0007 Epoch: 10 Global Step: 213320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:24,303-Speed 6309.49 samples/sec Loss 6.4163 LearningRate 0.0007 Epoch: 10 Global Step: 213330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:27,553-Speed 6303.43 samples/sec Loss 6.5481 LearningRate 0.0007 Epoch: 10 Global Step: 213340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:08:30,799-Speed 6310.00 samples/sec Loss 6.3779 LearningRate 0.0007 Epoch: 10 Global Step: 213350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:34,045-Speed 6310.37 samples/sec Loss 6.4470 LearningRate 0.0007 Epoch: 10 Global Step: 213360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:37,289-Speed 6314.70 samples/sec Loss 6.3923 LearningRate 0.0007 Epoch: 10 Global Step: 213370 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:08:40,523-Speed 6335.24 samples/sec Loss 6.5385 LearningRate 0.0007 Epoch: 10 Global Step: 213380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:43,770-Speed 6307.51 samples/sec Loss 6.5422 LearningRate 0.0007 Epoch: 10 Global Step: 213390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:47,014-Speed 6316.34 samples/sec Loss 6.5373 LearningRate 0.0007 Epoch: 10 Global Step: 213400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:50,264-Speed 6302.21 samples/sec Loss 6.4833 LearningRate 0.0007 Epoch: 10 Global Step: 213410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:53,507-Speed 6316.41 samples/sec Loss 6.5143 LearningRate 0.0007 Epoch: 10 Global Step: 213420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:56,748-Speed 6321.06 samples/sec Loss 6.4330 LearningRate 0.0007 Epoch: 10 Global Step: 213430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:08:59,991-Speed 6315.36 samples/sec Loss 6.4657 LearningRate 0.0007 Epoch: 10 Global Step: 213440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:03,242-Speed 6302.22 samples/sec Loss 6.5367 LearningRate 0.0007 Epoch: 10 Global Step: 213450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:06,490-Speed 6305.72 samples/sec Loss 6.4034 LearningRate 0.0007 Epoch: 10 Global Step: 213460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:09,734-Speed 6314.68 
samples/sec Loss 6.5047 LearningRate 0.0007 Epoch: 10 Global Step: 213470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:12,968-Speed 6334.86 samples/sec Loss 6.4356 LearningRate 0.0007 Epoch: 10 Global Step: 213480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:16,225-Speed 6290.06 samples/sec Loss 6.4865 LearningRate 0.0007 Epoch: 10 Global Step: 213490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:19,467-Speed 6317.25 samples/sec Loss 6.5024 LearningRate 0.0007 Epoch: 10 Global Step: 213500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:22,713-Speed 6310.87 samples/sec Loss 6.5052 LearningRate 0.0007 Epoch: 10 Global Step: 213510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:25,958-Speed 6312.13 samples/sec Loss 6.4575 LearningRate 0.0007 Epoch: 10 Global Step: 213520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:29,201-Speed 6316.79 samples/sec Loss 6.3766 LearningRate 0.0007 Epoch: 10 Global Step: 213530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:32,448-Speed 6310.37 samples/sec Loss 6.4117 LearningRate 0.0007 Epoch: 10 Global Step: 213540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:35,695-Speed 6308.07 samples/sec Loss 6.4795 LearningRate 0.0007 Epoch: 10 Global Step: 213550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:38,942-Speed 6309.15 samples/sec Loss 6.4790 LearningRate 0.0007 Epoch: 10 Global Step: 213560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:42,190-Speed 6307.61 samples/sec Loss 6.4354 LearningRate 0.0007 Epoch: 10 Global Step: 213570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:45,422-Speed 6338.20 samples/sec Loss 6.5176 LearningRate 0.0007 Epoch: 10 Global Step: 213580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:48,674-Speed 6298.14 samples/sec Loss 6.4743 
LearningRate 0.0007 Epoch: 10 Global Step: 213590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:51,920-Speed 6310.08 samples/sec Loss 6.4712 LearningRate 0.0007 Epoch: 10 Global Step: 213600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:55,163-Speed 6316.06 samples/sec Loss 6.4957 LearningRate 0.0007 Epoch: 10 Global Step: 213610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:09:58,407-Speed 6316.37 samples/sec Loss 6.4449 LearningRate 0.0007 Epoch: 10 Global Step: 213620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:01,658-Speed 6299.47 samples/sec Loss 6.5114 LearningRate 0.0007 Epoch: 10 Global Step: 213630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:04,904-Speed 6310.91 samples/sec Loss 6.4292 LearningRate 0.0007 Epoch: 10 Global Step: 213640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:08,148-Speed 6314.60 samples/sec Loss 6.4811 LearningRate 0.0007 Epoch: 10 Global Step: 213650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:11,393-Speed 6313.71 samples/sec Loss 6.4834 LearningRate 0.0007 Epoch: 10 Global Step: 213660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:14,644-Speed 6301.33 samples/sec Loss 6.5267 LearningRate 0.0007 Epoch: 10 Global Step: 213670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:17,894-Speed 6301.76 samples/sec Loss 6.4322 LearningRate 0.0007 Epoch: 10 Global Step: 213680 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:10:21,126-Speed 6338.51 samples/sec Loss 6.4683 LearningRate 0.0007 Epoch: 10 Global Step: 213690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:24,374-Speed 6307.38 samples/sec Loss 6.4474 LearningRate 0.0007 Epoch: 10 Global Step: 213700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:27,625-Speed 6300.15 samples/sec Loss 6.4561 LearningRate 0.0007 Epoch: 10 
Global Step: 213710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:30,885-Speed 6282.90 samples/sec Loss 6.4703 LearningRate 0.0007 Epoch: 10 Global Step: 213720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:34,134-Speed 6306.37 samples/sec Loss 6.4302 LearningRate 0.0007 Epoch: 10 Global Step: 213730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:37,380-Speed 6311.90 samples/sec Loss 6.5186 LearningRate 0.0007 Epoch: 10 Global Step: 213740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:40,628-Speed 6305.53 samples/sec Loss 6.4982 LearningRate 0.0007 Epoch: 10 Global Step: 213750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:43,876-Speed 6306.91 samples/sec Loss 6.4117 LearningRate 0.0007 Epoch: 10 Global Step: 213760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:47,123-Speed 6309.15 samples/sec Loss 6.3973 LearningRate 0.0007 Epoch: 10 Global Step: 213770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:50,367-Speed 6314.64 samples/sec Loss 6.5933 LearningRate 0.0007 Epoch: 10 Global Step: 213780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:53,603-Speed 6330.95 samples/sec Loss 6.4539 LearningRate 0.0007 Epoch: 10 Global Step: 213790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:10:56,852-Speed 6304.24 samples/sec Loss 6.4813 LearningRate 0.0007 Epoch: 10 Global Step: 213800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:00,116-Speed 6276.38 samples/sec Loss 6.3864 LearningRate 0.0007 Epoch: 10 Global Step: 213810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:03,366-Speed 6303.01 samples/sec Loss 6.4392 LearningRate 0.0007 Epoch: 10 Global Step: 213820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:06,615-Speed 6304.70 samples/sec Loss 6.4073 LearningRate 0.0007 Epoch: 10 Global Step: 213830 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:09,864-Speed 6304.73 samples/sec Loss 6.4225 LearningRate 0.0007 Epoch: 10 Global Step: 213840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:13,110-Speed 6310.43 samples/sec Loss 6.4087 LearningRate 0.0007 Epoch: 10 Global Step: 213850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:16,361-Speed 6302.30 samples/sec Loss 6.4527 LearningRate 0.0007 Epoch: 10 Global Step: 213860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:19,606-Speed 6310.87 samples/sec Loss 6.5111 LearningRate 0.0007 Epoch: 10 Global Step: 213870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:22,854-Speed 6310.37 samples/sec Loss 6.4866 LearningRate 0.0007 Epoch: 10 Global Step: 213880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:26,098-Speed 6314.35 samples/sec Loss 6.4262 LearningRate 0.0007 Epoch: 10 Global Step: 213890 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:11:29,347-Speed 6304.22 samples/sec Loss 6.4530 LearningRate 0.0007 Epoch: 10 Global Step: 213900 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:11:32,579-Speed 6339.24 samples/sec Loss 6.4790 LearningRate 0.0007 Epoch: 10 Global Step: 213910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:35,823-Speed 6314.23 samples/sec Loss 6.6015 LearningRate 0.0007 Epoch: 10 Global Step: 213920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:39,070-Speed 6308.28 samples/sec Loss 6.4462 LearningRate 0.0007 Epoch: 10 Global Step: 213930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:42,317-Speed 6308.87 samples/sec Loss 6.4753 LearningRate 0.0007 Epoch: 10 Global Step: 213940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:45,561-Speed 6315.27 samples/sec Loss 6.5409 LearningRate 0.0007 Epoch: 10 Global Step: 213950 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:11:48,806-Speed 6313.16 samples/sec Loss 6.4511 LearningRate 0.0007 Epoch: 10 Global Step: 213960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:52,057-Speed 6300.65 samples/sec Loss 6.4754 LearningRate 0.0007 Epoch: 10 Global Step: 213970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:55,306-Speed 6305.70 samples/sec Loss 6.4315 LearningRate 0.0007 Epoch: 10 Global Step: 213980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:11:58,554-Speed 6306.53 samples/sec Loss 6.4670 LearningRate 0.0007 Epoch: 10 Global Step: 213990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:01,799-Speed 6312.00 samples/sec Loss 6.4158 LearningRate 0.0007 Epoch: 10 Global Step: 214000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:05,046-Speed 6308.85 samples/sec Loss 6.5572 LearningRate 0.0007 Epoch: 10 Global Step: 214010 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:12:08,277-Speed 6340.66 samples/sec Loss 6.4875 LearningRate 0.0007 Epoch: 10 Global Step: 214020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:11,519-Speed 6317.76 samples/sec Loss 6.4648 LearningRate 0.0007 Epoch: 10 Global Step: 214030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:14,771-Speed 6298.84 samples/sec Loss 6.4733 LearningRate 0.0007 Epoch: 10 Global Step: 214040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:18,016-Speed 6313.87 samples/sec Loss 6.4458 LearningRate 0.0007 Epoch: 10 Global Step: 214050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:21,261-Speed 6312.03 samples/sec Loss 6.4383 LearningRate 0.0007 Epoch: 10 Global Step: 214060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:24,507-Speed 6311.30 samples/sec Loss 6.4538 LearningRate 0.0007 Epoch: 10 Global Step: 214070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:12:27,752-Speed 6312.40 samples/sec Loss 6.4730 LearningRate 0.0007 Epoch: 10 Global Step: 214080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:30,996-Speed 6315.07 samples/sec Loss 6.3953 LearningRate 0.0007 Epoch: 10 Global Step: 214090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:34,241-Speed 6312.40 samples/sec Loss 6.4937 LearningRate 0.0007 Epoch: 10 Global Step: 214100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:37,485-Speed 6313.26 samples/sec Loss 6.4084 LearningRate 0.0007 Epoch: 10 Global Step: 214110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:40,736-Speed 6302.44 samples/sec Loss 6.4680 LearningRate 0.0007 Epoch: 10 Global Step: 214120 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:12:43,969-Speed 6335.14 samples/sec Loss 6.5340 LearningRate 0.0007 Epoch: 10 Global Step: 214130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:47,218-Speed 6304.69 samples/sec Loss 6.3992 LearningRate 0.0007 Epoch: 10 Global Step: 214140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:50,466-Speed 6306.78 samples/sec Loss 6.4828 LearningRate 0.0007 Epoch: 10 Global Step: 214150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:53,758-Speed 6223.41 samples/sec Loss 6.4895 LearningRate 0.0007 Epoch: 10 Global Step: 214160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:12:57,006-Speed 6307.86 samples/sec Loss 6.4924 LearningRate 0.0007 Epoch: 10 Global Step: 214170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:00,253-Speed 6307.77 samples/sec Loss 6.4413 LearningRate 0.0007 Epoch: 10 Global Step: 214180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:03,499-Speed 6310.73 samples/sec Loss 6.4802 LearningRate 0.0007 Epoch: 10 Global Step: 214190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:06,746-Speed 6309.54 
samples/sec Loss 6.4464 LearningRate 0.0007 Epoch: 10 Global Step: 214200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:09,990-Speed 6313.78 samples/sec Loss 6.4143 LearningRate 0.0007 Epoch: 10 Global Step: 214210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:13,236-Speed 6310.94 samples/sec Loss 6.5387 LearningRate 0.0007 Epoch: 10 Global Step: 214220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:16,470-Speed 6334.93 samples/sec Loss 6.4245 LearningRate 0.0007 Epoch: 10 Global Step: 214230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:19,717-Speed 6308.53 samples/sec Loss 6.4244 LearningRate 0.0007 Epoch: 10 Global Step: 214240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:22,961-Speed 6315.07 samples/sec Loss 6.4990 LearningRate 0.0007 Epoch: 10 Global Step: 214250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:26,209-Speed 6306.78 samples/sec Loss 6.3935 LearningRate 0.0007 Epoch: 10 Global Step: 214260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:29,462-Speed 6296.99 samples/sec Loss 6.4861 LearningRate 0.0007 Epoch: 10 Global Step: 214270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:32,710-Speed 6305.79 samples/sec Loss 6.4434 LearningRate 0.0007 Epoch: 10 Global Step: 214280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:35,958-Speed 6308.41 samples/sec Loss 6.4481 LearningRate 0.0007 Epoch: 10 Global Step: 214290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:39,208-Speed 6301.92 samples/sec Loss 6.4085 LearningRate 0.0007 Epoch: 10 Global Step: 214300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:42,520-Speed 6184.39 samples/sec Loss 6.4117 LearningRate 0.0007 Epoch: 10 Global Step: 214310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:45,766-Speed 6310.89 samples/sec Loss 6.4726 
LearningRate 0.0007 Epoch: 10 Global Step: 214320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:48,997-Speed 6340.37 samples/sec Loss 6.5322 LearningRate 0.0007 Epoch: 10 Global Step: 214330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:52,240-Speed 6315.88 samples/sec Loss 6.4847 LearningRate 0.0007 Epoch: 10 Global Step: 214340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:55,490-Speed 6303.01 samples/sec Loss 6.4593 LearningRate 0.0007 Epoch: 10 Global Step: 214350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:13:58,734-Speed 6315.28 samples/sec Loss 6.4706 LearningRate 0.0007 Epoch: 10 Global Step: 214360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:01,983-Speed 6304.09 samples/sec Loss 6.4753 LearningRate 0.0007 Epoch: 10 Global Step: 214370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:05,228-Speed 6312.74 samples/sec Loss 6.4889 LearningRate 0.0007 Epoch: 10 Global Step: 214380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:08,475-Speed 6309.52 samples/sec Loss 6.4613 LearningRate 0.0007 Epoch: 10 Global Step: 214390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:11,723-Speed 6306.46 samples/sec Loss 6.4964 LearningRate 0.0007 Epoch: 10 Global Step: 214400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:14,967-Speed 6315.34 samples/sec Loss 6.5438 LearningRate 0.0007 Epoch: 10 Global Step: 214410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:18,215-Speed 6307.94 samples/sec Loss 6.4272 LearningRate 0.0007 Epoch: 10 Global Step: 214420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:21,447-Speed 6337.34 samples/sec Loss 6.4795 LearningRate 0.0007 Epoch: 10 Global Step: 214430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:24,705-Speed 6288.49 samples/sec Loss 6.5181 LearningRate 0.0007 Epoch: 10 
Global Step: 214440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:27,952-Speed 6306.76 samples/sec Loss 6.4776 LearningRate 0.0007 Epoch: 10 Global Step: 214450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:31,202-Speed 6304.29 samples/sec Loss 6.3750 LearningRate 0.0007 Epoch: 10 Global Step: 214460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:34,447-Speed 6312.54 samples/sec Loss 6.4126 LearningRate 0.0007 Epoch: 10 Global Step: 214470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:37,694-Speed 6309.05 samples/sec Loss 6.4386 LearningRate 0.0007 Epoch: 10 Global Step: 214480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:40,939-Speed 6311.08 samples/sec Loss 6.4869 LearningRate 0.0007 Epoch: 10 Global Step: 214490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:44,189-Speed 6303.95 samples/sec Loss 6.4253 LearningRate 0.0007 Epoch: 10 Global Step: 214500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:47,431-Speed 6317.72 samples/sec Loss 6.5047 LearningRate 0.0007 Epoch: 10 Global Step: 214510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:50,678-Speed 6308.96 samples/sec Loss 6.4807 LearningRate 0.0007 Epoch: 10 Global Step: 214520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:14:53,926-Speed 6307.99 samples/sec Loss 6.5349 LearningRate 0.0007 Epoch: 10 Global Step: 214530 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:14:57,160-Speed 6334.21 samples/sec Loss 6.4763 LearningRate 0.0007 Epoch: 10 Global Step: 214540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:00,408-Speed 6305.49 samples/sec Loss 6.4058 LearningRate 0.0007 Epoch: 10 Global Step: 214550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:03,654-Speed 6309.95 samples/sec Loss 6.4435 LearningRate 0.0007 Epoch: 10 Global Step: 214560 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:06,906-Speed 6300.02 samples/sec Loss 6.4542 LearningRate 0.0007 Epoch: 10 Global Step: 214570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:10,156-Speed 6303.11 samples/sec Loss 6.4559 LearningRate 0.0007 Epoch: 10 Global Step: 214580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:13,400-Speed 6313.52 samples/sec Loss 6.4429 LearningRate 0.0007 Epoch: 10 Global Step: 214590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:16,651-Speed 6301.59 samples/sec Loss 6.4596 LearningRate 0.0007 Epoch: 10 Global Step: 214600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:19,898-Speed 6310.56 samples/sec Loss 6.4915 LearningRate 0.0007 Epoch: 10 Global Step: 214610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:23,149-Speed 6301.41 samples/sec Loss 6.5147 LearningRate 0.0007 Epoch: 10 Global Step: 214620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:26,396-Speed 6308.01 samples/sec Loss 6.4700 LearningRate 0.0007 Epoch: 10 Global Step: 214630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:29,640-Speed 6314.35 samples/sec Loss 6.4572 LearningRate 0.0007 Epoch: 10 Global Step: 214640 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:15:32,884-Speed 6314.59 samples/sec Loss 6.5602 LearningRate 0.0007 Epoch: 10 Global Step: 214650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:36,130-Speed 6310.86 samples/sec Loss 6.5355 LearningRate 0.0007 Epoch: 10 Global Step: 214660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:39,378-Speed 6306.53 samples/sec Loss 6.4714 LearningRate 0.0007 Epoch: 10 Global Step: 214670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:42,625-Speed 6308.13 samples/sec Loss 6.4869 LearningRate 0.0007 Epoch: 10 Global Step: 214680 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:15:45,875-Speed 6307.43 samples/sec Loss 6.4437 LearningRate 0.0007 Epoch: 10 Global Step: 214690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:49,120-Speed 6313.00 samples/sec Loss 6.4512 LearningRate 0.0007 Epoch: 10 Global Step: 214700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:52,366-Speed 6309.96 samples/sec Loss 6.5211 LearningRate 0.0007 Epoch: 10 Global Step: 214710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:55,610-Speed 6315.04 samples/sec Loss 6.3772 LearningRate 0.0007 Epoch: 10 Global Step: 214720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:15:58,864-Speed 6293.99 samples/sec Loss 6.4503 LearningRate 0.0007 Epoch: 10 Global Step: 214730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:02,117-Speed 6298.29 samples/sec Loss 6.4586 LearningRate 0.0007 Epoch: 10 Global Step: 214740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:05,365-Speed 6307.11 samples/sec Loss 6.4414 LearningRate 0.0007 Epoch: 10 Global Step: 214750 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:16:08,610-Speed 6311.47 samples/sec Loss 6.4624 LearningRate 0.0007 Epoch: 10 Global Step: 214760 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:16:11,846-Speed 6331.21 samples/sec Loss 6.5205 LearningRate 0.0007 Epoch: 10 Global Step: 214770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:15,094-Speed 6306.51 samples/sec Loss 6.4979 LearningRate 0.0007 Epoch: 10 Global Step: 214780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:18,341-Speed 6307.74 samples/sec Loss 6.4565 LearningRate 0.0007 Epoch: 10 Global Step: 214790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:21,586-Speed 6313.81 samples/sec Loss 6.4364 LearningRate 0.0007 Epoch: 10 Global Step: 214800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:16:24,834-Speed 6307.03 samples/sec Loss 6.4689 LearningRate 0.0007 Epoch: 10 Global Step: 214810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:28,083-Speed 6305.60 samples/sec Loss 6.3903 LearningRate 0.0007 Epoch: 10 Global Step: 214820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:31,335-Speed 6297.93 samples/sec Loss 6.4740 LearningRate 0.0007 Epoch: 10 Global Step: 214830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:34,580-Speed 6313.85 samples/sec Loss 6.4465 LearningRate 0.0007 Epoch: 10 Global Step: 214840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:37,822-Speed 6317.55 samples/sec Loss 6.4422 LearningRate 0.0007 Epoch: 10 Global Step: 214850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:41,066-Speed 6314.41 samples/sec Loss 6.4990 LearningRate 0.0007 Epoch: 10 Global Step: 214860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:44,310-Speed 6314.65 samples/sec Loss 6.4534 LearningRate 0.0007 Epoch: 10 Global Step: 214870 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:16:47,559-Speed 6304.95 samples/sec Loss 6.3903 LearningRate 0.0007 Epoch: 10 Global Step: 214880 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:16:50,790-Speed 6341.18 samples/sec Loss 6.4523 LearningRate 0.0007 Epoch: 10 Global Step: 214890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:54,037-Speed 6308.32 samples/sec Loss 6.4247 LearningRate 0.0007 Epoch: 10 Global Step: 214900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:16:57,284-Speed 6308.72 samples/sec Loss 6.5193 LearningRate 0.0007 Epoch: 10 Global Step: 214910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:00,542-Speed 6286.53 samples/sec Loss 6.4344 LearningRate 0.0007 Epoch: 10 Global Step: 214920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:03,794-Speed 6299.35 
samples/sec Loss 6.4591 LearningRate 0.0007 Epoch: 10 Global Step: 214930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:07,039-Speed 6312.58 samples/sec Loss 6.4158 LearningRate 0.0007 Epoch: 10 Global Step: 214940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:10,284-Speed 6313.46 samples/sec Loss 6.3994 LearningRate 0.0007 Epoch: 10 Global Step: 214950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:13,526-Speed 6318.39 samples/sec Loss 6.4975 LearningRate 0.0007 Epoch: 10 Global Step: 214960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:16,777-Speed 6300.75 samples/sec Loss 6.3866 LearningRate 0.0007 Epoch: 10 Global Step: 214970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:20,023-Speed 6311.12 samples/sec Loss 6.4679 LearningRate 0.0007 Epoch: 10 Global Step: 214980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:23,267-Speed 6313.87 samples/sec Loss 6.4576 LearningRate 0.0007 Epoch: 10 Global Step: 214990 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:17:26,500-Speed 6335.85 samples/sec Loss 6.4048 LearningRate 0.0007 Epoch: 10 Global Step: 215000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:29,747-Speed 6310.15 samples/sec Loss 6.4488 LearningRate 0.0007 Epoch: 10 Global Step: 215010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:32,997-Speed 6302.90 samples/sec Loss 6.3842 LearningRate 0.0007 Epoch: 10 Global Step: 215020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:36,246-Speed 6305.60 samples/sec Loss 6.4582 LearningRate 0.0007 Epoch: 10 Global Step: 215030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:39,495-Speed 6303.53 samples/sec Loss 6.4658 LearningRate 0.0007 Epoch: 10 Global Step: 215040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:42,747-Speed 6300.63 samples/sec Loss 6.4370 
LearningRate 0.0007 Epoch: 10 Global Step: 215050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:45,988-Speed 6320.66 samples/sec Loss 6.3781 LearningRate 0.0007 Epoch: 10 Global Step: 215060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:49,235-Speed 6307.39 samples/sec Loss 6.5017 LearningRate 0.0007 Epoch: 10 Global Step: 215070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:52,484-Speed 6305.54 samples/sec Loss 6.4750 LearningRate 0.0007 Epoch: 10 Global Step: 215080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:55,728-Speed 6314.01 samples/sec Loss 6.3986 LearningRate 0.0007 Epoch: 10 Global Step: 215090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:17:58,964-Speed 6330.43 samples/sec Loss 6.4439 LearningRate 0.0007 Epoch: 10 Global Step: 215100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:02,208-Speed 6314.29 samples/sec Loss 6.4435 LearningRate 0.0007 Epoch: 10 Global Step: 215110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:05,455-Speed 6310.02 samples/sec Loss 6.4496 LearningRate 0.0007 Epoch: 10 Global Step: 215120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:08,701-Speed 6309.39 samples/sec Loss 6.4867 LearningRate 0.0007 Epoch: 10 Global Step: 215130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:11,948-Speed 6308.80 samples/sec Loss 6.4789 LearningRate 0.0007 Epoch: 10 Global Step: 215140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:15,197-Speed 6305.96 samples/sec Loss 6.4112 LearningRate 0.0007 Epoch: 10 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:18,446-Speed 6303.53 samples/sec Loss 6.3813 LearningRate 0.0007 Epoch: 10 Global Step: 215160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:21,694-Speed 6307.55 samples/sec Loss 6.5165 LearningRate 0.0007 Epoch: 10 
Global Step: 215170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:24,945-Speed 6301.64 samples/sec Loss 6.4491 LearningRate 0.0007 Epoch: 10 Global Step: 215180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:28,193-Speed 6307.20 samples/sec Loss 6.4218 LearningRate 0.0007 Epoch: 10 Global Step: 215190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:31,434-Speed 6320.66 samples/sec Loss 6.4292 LearningRate 0.0007 Epoch: 10 Global Step: 215200 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:18:34,665-Speed 6339.72 samples/sec Loss 6.4890 LearningRate 0.0007 Epoch: 10 Global Step: 215210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:37,909-Speed 6314.41 samples/sec Loss 6.4012 LearningRate 0.0007 Epoch: 10 Global Step: 215220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:41,156-Speed 6309.31 samples/sec Loss 6.4949 LearningRate 0.0007 Epoch: 10 Global Step: 215230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:44,402-Speed 6311.22 samples/sec Loss 6.5064 LearningRate 0.0007 Epoch: 10 Global Step: 215240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:47,646-Speed 6314.66 samples/sec Loss 6.4320 LearningRate 0.0007 Epoch: 10 Global Step: 215250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:50,893-Speed 6307.68 samples/sec Loss 6.4403 LearningRate 0.0007 Epoch: 10 Global Step: 215260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:54,147-Speed 6296.59 samples/sec Loss 6.5344 LearningRate 0.0007 Epoch: 10 Global Step: 215270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:18:57,390-Speed 6316.27 samples/sec Loss 6.3820 LearningRate 0.0007 Epoch: 10 Global Step: 215280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:00,638-Speed 6306.94 samples/sec Loss 6.4092 LearningRate 0.0007 Epoch: 10 Global Step: 215290 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:03,884-Speed 6309.53 samples/sec Loss 6.5047 LearningRate 0.0007 Epoch: 10 Global Step: 215300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:07,126-Speed 6318.77 samples/sec Loss 6.4480 LearningRate 0.0007 Epoch: 10 Global Step: 215310 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:19:10,374-Speed 6307.51 samples/sec Loss 6.3763 LearningRate 0.0007 Epoch: 10 Global Step: 215320 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:19:13,606-Speed 6336.58 samples/sec Loss 6.4262 LearningRate 0.0007 Epoch: 10 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:16,854-Speed 6308.35 samples/sec Loss 6.4345 LearningRate 0.0007 Epoch: 10 Global Step: 215340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:20,108-Speed 6295.41 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 215350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:23,355-Speed 6308.45 samples/sec Loss 6.4090 LearningRate 0.0007 Epoch: 10 Global Step: 215360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:26,600-Speed 6313.36 samples/sec Loss 6.4250 LearningRate 0.0007 Epoch: 10 Global Step: 215370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:29,846-Speed 6310.47 samples/sec Loss 6.4271 LearningRate 0.0007 Epoch: 10 Global Step: 215380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:33,101-Speed 6291.46 samples/sec Loss 6.4187 LearningRate 0.0007 Epoch: 10 Global Step: 215390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:36,345-Speed 6314.76 samples/sec Loss 6.4023 LearningRate 0.0007 Epoch: 10 Global Step: 215400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:39,591-Speed 6312.36 samples/sec Loss 6.4152 LearningRate 0.0007 Epoch: 10 Global Step: 215410 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:19:42,833-Speed 6317.18 samples/sec Loss 6.3670 LearningRate 0.0007 Epoch: 10 Global Step: 215420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:46,079-Speed 6310.35 samples/sec Loss 6.5270 LearningRate 0.0007 Epoch: 10 Global Step: 215430 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:19:49,316-Speed 6330.45 samples/sec Loss 6.4342 LearningRate 0.0007 Epoch: 10 Global Step: 215440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:52,560-Speed 6313.71 samples/sec Loss 6.5330 LearningRate 0.0007 Epoch: 10 Global Step: 215450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:55,812-Speed 6299.01 samples/sec Loss 6.4588 LearningRate 0.0007 Epoch: 10 Global Step: 215460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:19:59,058-Speed 6311.20 samples/sec Loss 6.4508 LearningRate 0.0007 Epoch: 10 Global Step: 215470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:02,319-Speed 6282.30 samples/sec Loss 6.4812 LearningRate 0.0007 Epoch: 10 Global Step: 215480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:05,569-Speed 6303.01 samples/sec Loss 6.4081 LearningRate 0.0007 Epoch: 10 Global Step: 215490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:08,816-Speed 6309.23 samples/sec Loss 6.4613 LearningRate 0.0007 Epoch: 10 Global Step: 215500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:12,065-Speed 6306.86 samples/sec Loss 6.4776 LearningRate 0.0007 Epoch: 10 Global Step: 215510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:15,313-Speed 6306.48 samples/sec Loss 6.4618 LearningRate 0.0007 Epoch: 10 Global Step: 215520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:18,567-Speed 6294.91 samples/sec Loss 6.4393 LearningRate 0.0007 Epoch: 10 Global Step: 215530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:20:21,800-Speed 6335.56 samples/sec Loss 6.5344 LearningRate 0.0007 Epoch: 10 Global Step: 215540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:25,048-Speed 6306.61 samples/sec Loss 6.5137 LearningRate 0.0007 Epoch: 10 Global Step: 215550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:28,293-Speed 6313.94 samples/sec Loss 6.5427 LearningRate 0.0007 Epoch: 10 Global Step: 215560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:31,572-Speed 6246.69 samples/sec Loss 6.4500 LearningRate 0.0007 Epoch: 10 Global Step: 215570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:34,818-Speed 6310.88 samples/sec Loss 6.4531 LearningRate 0.0007 Epoch: 10 Global Step: 215580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:38,062-Speed 6314.03 samples/sec Loss 6.4189 LearningRate 0.0007 Epoch: 10 Global Step: 215590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:41,312-Speed 6303.77 samples/sec Loss 6.4269 LearningRate 0.0007 Epoch: 10 Global Step: 215600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:44,557-Speed 6312.98 samples/sec Loss 6.4890 LearningRate 0.0007 Epoch: 10 Global Step: 215610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:47,803-Speed 6310.72 samples/sec Loss 6.5063 LearningRate 0.0007 Epoch: 10 Global Step: 215620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:51,052-Speed 6304.24 samples/sec Loss 6.4790 LearningRate 0.0007 Epoch: 10 Global Step: 215630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:20:54,297-Speed 6312.67 samples/sec Loss 6.4880 LearningRate 0.0007 Epoch: 10 Global Step: 215640 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:20:57,534-Speed 6329.06 samples/sec Loss 6.4290 LearningRate 0.0007 Epoch: 10 Global Step: 215650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:00,780-Speed 6309.89 
samples/sec Loss 6.4619 LearningRate 0.0007 Epoch: 10 Global Step: 215660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:04,027-Speed 6309.67 samples/sec Loss 6.4626 LearningRate 0.0007 Epoch: 10 Global Step: 215670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:07,272-Speed 6312.32 samples/sec Loss 6.4171 LearningRate 0.0007 Epoch: 10 Global Step: 215680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:10,517-Speed 6312.60 samples/sec Loss 6.4672 LearningRate 0.0007 Epoch: 10 Global Step: 215690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:13,765-Speed 6308.11 samples/sec Loss 6.3980 LearningRate 0.0007 Epoch: 10 Global Step: 215700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:17,012-Speed 6308.74 samples/sec Loss 6.4723 LearningRate 0.0007 Epoch: 10 Global Step: 215710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:20,258-Speed 6309.80 samples/sec Loss 6.4475 LearningRate 0.0007 Epoch: 10 Global Step: 215720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:23,505-Speed 6308.15 samples/sec Loss 6.4265 LearningRate 0.0007 Epoch: 10 Global Step: 215730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:26,757-Speed 6300.57 samples/sec Loss 6.4863 LearningRate 0.0007 Epoch: 10 Global Step: 215740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:30,005-Speed 6306.67 samples/sec Loss 6.4169 LearningRate 0.0007 Epoch: 10 Global Step: 215750 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:21:33,251-Speed 6309.33 samples/sec Loss 6.5120 LearningRate 0.0007 Epoch: 10 Global Step: 215760 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:21:36,485-Speed 6335.63 samples/sec Loss 6.4696 LearningRate 0.0007 Epoch: 10 Global Step: 215770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:39,731-Speed 6309.39 samples/sec Loss 6.4559 
LearningRate 0.0007 Epoch: 10 Global Step: 215780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:42,980-Speed 6304.69 samples/sec Loss 6.4578 LearningRate 0.0007 Epoch: 10 Global Step: 215790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:46,228-Speed 6306.54 samples/sec Loss 6.3866 LearningRate 0.0007 Epoch: 10 Global Step: 215800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:49,476-Speed 6306.96 samples/sec Loss 6.3772 LearningRate 0.0007 Epoch: 10 Global Step: 215810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:52,724-Speed 6306.91 samples/sec Loss 6.4069 LearningRate 0.0007 Epoch: 10 Global Step: 215820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:55,969-Speed 6313.54 samples/sec Loss 6.3853 LearningRate 0.0007 Epoch: 10 Global Step: 215830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:21:59,214-Speed 6312.32 samples/sec Loss 6.4313 LearningRate 0.0007 Epoch: 10 Global Step: 215840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:02,461-Speed 6308.62 samples/sec Loss 6.4435 LearningRate 0.0007 Epoch: 10 Global Step: 215850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:05,707-Speed 6310.74 samples/sec Loss 6.4541 LearningRate 0.0007 Epoch: 10 Global Step: 215860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:08,936-Speed 6344.41 samples/sec Loss 6.4486 LearningRate 0.0007 Epoch: 10 Global Step: 215870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:12,182-Speed 6310.99 samples/sec Loss 6.3991 LearningRate 0.0007 Epoch: 10 Global Step: 215880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:15,431-Speed 6305.58 samples/sec Loss 6.3879 LearningRate 0.0007 Epoch: 10 Global Step: 215890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:18,691-Speed 6282.16 samples/sec Loss 6.3453 LearningRate 0.0007 Epoch: 10 
Global Step: 215900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:21,942-Speed 6302.64 samples/sec Loss 6.4450 LearningRate 0.0007 Epoch: 10 Global Step: 215910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:25,192-Speed 6303.06 samples/sec Loss 6.4680 LearningRate 0.0007 Epoch: 10 Global Step: 215920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:28,441-Speed 6304.79 samples/sec Loss 6.5357 LearningRate 0.0007 Epoch: 10 Global Step: 215930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:31,686-Speed 6311.86 samples/sec Loss 6.4974 LearningRate 0.0007 Epoch: 10 Global Step: 215940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:34,945-Speed 6284.94 samples/sec Loss 6.4548 LearningRate 0.0007 Epoch: 10 Global Step: 215950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:38,194-Speed 6305.17 samples/sec Loss 6.4926 LearningRate 0.0007 Epoch: 10 Global Step: 215960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:41,427-Speed 6336.01 samples/sec Loss 6.4714 LearningRate 0.0007 Epoch: 10 Global Step: 215970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:44,676-Speed 6306.29 samples/sec Loss 6.4238 LearningRate 0.0007 Epoch: 10 Global Step: 215980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:47,921-Speed 6310.93 samples/sec Loss 6.4798 LearningRate 0.0007 Epoch: 10 Global Step: 215990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:51,168-Speed 6309.17 samples/sec Loss 6.3917 LearningRate 0.0007 Epoch: 10 Global Step: 216000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:54,418-Speed 6302.51 samples/sec Loss 6.3260 LearningRate 0.0007 Epoch: 10 Global Step: 216010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:22:57,665-Speed 6310.32 samples/sec Loss 6.3438 LearningRate 0.0007 Epoch: 10 Global Step: 216020 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:00,912-Speed 6307.71 samples/sec Loss 6.4014 LearningRate 0.0007 Epoch: 10 Global Step: 216030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:04,161-Speed 6304.15 samples/sec Loss 6.4591 LearningRate 0.0007 Epoch: 10 Global Step: 216040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:07,410-Speed 6304.90 samples/sec Loss 6.4217 LearningRate 0.0007 Epoch: 10 Global Step: 216050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:10,657-Speed 6310.58 samples/sec Loss 6.3976 LearningRate 0.0007 Epoch: 10 Global Step: 216060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:13,901-Speed 6313.70 samples/sec Loss 6.4459 LearningRate 0.0007 Epoch: 10 Global Step: 216070 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:23:17,132-Speed 6339.26 samples/sec Loss 6.3515 LearningRate 0.0007 Epoch: 10 Global Step: 216080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:20,383-Speed 6303.24 samples/sec Loss 6.4859 LearningRate 0.0007 Epoch: 10 Global Step: 216090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:23,629-Speed 6309.83 samples/sec Loss 6.4241 LearningRate 0.0007 Epoch: 10 Global Step: 216100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:26,877-Speed 6307.66 samples/sec Loss 6.4144 LearningRate 0.0007 Epoch: 10 Global Step: 216110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:30,127-Speed 6302.93 samples/sec Loss 6.4666 LearningRate 0.0007 Epoch: 10 Global Step: 216120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:33,378-Speed 6300.09 samples/sec Loss 6.4996 LearningRate 0.0007 Epoch: 10 Global Step: 216130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:36,634-Speed 6291.50 samples/sec Loss 6.4107 LearningRate 0.0007 Epoch: 10 Global Step: 216140 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:23:39,882-Speed 6306.64 samples/sec Loss 6.4614 LearningRate 0.0007 Epoch: 10 Global Step: 216150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:43,130-Speed 6306.84 samples/sec Loss 6.4284 LearningRate 0.0007 Epoch: 10 Global Step: 216160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:46,377-Speed 6309.84 samples/sec Loss 6.4549 LearningRate 0.0007 Epoch: 10 Global Step: 216170 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:49,613-Speed 6328.45 samples/sec Loss 6.4010 LearningRate 0.0007 Epoch: 10 Global Step: 216180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:52,858-Speed 6314.14 samples/sec Loss 6.4982 LearningRate 0.0007 Epoch: 10 Global Step: 216190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:56,102-Speed 6313.34 samples/sec Loss 6.5354 LearningRate 0.0007 Epoch: 10 Global Step: 216200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:23:59,346-Speed 6314.96 samples/sec Loss 6.5017 LearningRate 0.0007 Epoch: 10 Global Step: 216210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:02,590-Speed 6314.69 samples/sec Loss 6.4542 LearningRate 0.0007 Epoch: 10 Global Step: 216220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:05,839-Speed 6305.11 samples/sec Loss 6.3848 LearningRate 0.0007 Epoch: 10 Global Step: 216230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:09,085-Speed 6310.75 samples/sec Loss 6.3449 LearningRate 0.0007 Epoch: 10 Global Step: 216240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:12,335-Speed 6302.12 samples/sec Loss 6.4200 LearningRate 0.0007 Epoch: 10 Global Step: 216250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:15,589-Speed 6294.91 samples/sec Loss 6.4146 LearningRate 0.0007 Epoch: 10 Global Step: 216260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:24:18,839-Speed 6304.31 samples/sec Loss 6.5011 LearningRate 0.0007 Epoch: 10 Global Step: 216270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:22,084-Speed 6311.91 samples/sec Loss 6.4206 LearningRate 0.0007 Epoch: 10 Global Step: 216280 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:24:25,328-Speed 6316.18 samples/sec Loss 6.4152 LearningRate 0.0007 Epoch: 10 Global Step: 216290 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:24:28,560-Speed 6338.20 samples/sec Loss 6.4800 LearningRate 0.0007 Epoch: 10 Global Step: 216300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:31,804-Speed 6313.92 samples/sec Loss 6.4241 LearningRate 0.0007 Epoch: 10 Global Step: 216310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:35,053-Speed 6305.73 samples/sec Loss 6.4419 LearningRate 0.0007 Epoch: 10 Global Step: 216320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:38,299-Speed 6310.77 samples/sec Loss 6.4415 LearningRate 0.0007 Epoch: 10 Global Step: 216330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:41,550-Speed 6300.43 samples/sec Loss 6.4062 LearningRate 0.0007 Epoch: 10 Global Step: 216340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:44,798-Speed 6307.55 samples/sec Loss 6.4399 LearningRate 0.0007 Epoch: 10 Global Step: 216350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:48,042-Speed 6314.52 samples/sec Loss 6.4734 LearningRate 0.0007 Epoch: 10 Global Step: 216360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:51,305-Speed 6277.09 samples/sec Loss 6.4281 LearningRate 0.0007 Epoch: 10 Global Step: 216370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:54,550-Speed 6311.94 samples/sec Loss 6.4376 LearningRate 0.0007 Epoch: 10 Global Step: 216380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:24:57,796-Speed 6312.51 
samples/sec Loss 6.3517 LearningRate 0.0007 Epoch: 10 Global Step: 216390 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:01,037-Speed 6320.39 samples/sec Loss 6.4226 LearningRate 0.0007 Epoch: 10 Global Step: 216400 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:04,283-Speed 6310.54 samples/sec Loss 6.4755 LearningRate 0.0007 Epoch: 10 Global Step: 216410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:07,528-Speed 6310.71 samples/sec Loss 6.3539 LearningRate 0.0007 Epoch: 10 Global Step: 216420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:10,777-Speed 6306.54 samples/sec Loss 6.4329 LearningRate 0.0007 Epoch: 10 Global Step: 216430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:14,023-Speed 6309.69 samples/sec Loss 6.4454 LearningRate 0.0007 Epoch: 10 Global Step: 216440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:17,271-Speed 6307.71 samples/sec Loss 6.4666 LearningRate 0.0007 Epoch: 10 Global Step: 216450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:20,531-Speed 6283.16 samples/sec Loss 6.4065 LearningRate 0.0007 Epoch: 10 Global Step: 216460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:23,778-Speed 6308.90 samples/sec Loss 6.3595 LearningRate 0.0007 Epoch: 10 Global Step: 216470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:27,032-Speed 6295.88 samples/sec Loss 6.3997 LearningRate 0.0007 Epoch: 10 Global Step: 216480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:30,281-Speed 6303.66 samples/sec Loss 6.4780 LearningRate 0.0007 Epoch: 10 Global Step: 216490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:33,537-Speed 6293.12 samples/sec Loss 6.4317 LearningRate 0.0007 Epoch: 10 Global Step: 216500 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:25:36,785-Speed 6306.41 samples/sec Loss 6.4231 
LearningRate 0.0007 Epoch: 10 Global Step: 216510 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:25:40,017-Speed 6338.42 samples/sec Loss 6.4424 LearningRate 0.0007 Epoch: 10 Global Step: 216520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:43,267-Speed 6302.54 samples/sec Loss 6.4268 LearningRate 0.0007 Epoch: 10 Global Step: 216530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:46,518-Speed 6302.23 samples/sec Loss 6.3713 LearningRate 0.0007 Epoch: 10 Global Step: 216540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:49,774-Speed 6290.23 samples/sec Loss 6.4492 LearningRate 0.0007 Epoch: 10 Global Step: 216550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:53,023-Speed 6305.21 samples/sec Loss 6.4948 LearningRate 0.0007 Epoch: 10 Global Step: 216560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:56,270-Speed 6308.53 samples/sec Loss 6.3849 LearningRate 0.0007 Epoch: 10 Global Step: 216570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:25:59,518-Speed 6306.83 samples/sec Loss 6.4449 LearningRate 0.0007 Epoch: 10 Global Step: 216580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:02,765-Speed 6308.90 samples/sec Loss 6.4029 LearningRate 0.0007 Epoch: 10 Global Step: 216590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:06,012-Speed 6308.94 samples/sec Loss 6.4341 LearningRate 0.0007 Epoch: 10 Global Step: 216600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:09,261-Speed 6303.67 samples/sec Loss 6.3969 LearningRate 0.0007 Epoch: 10 Global Step: 216610 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:12,497-Speed 6330.72 samples/sec Loss 6.4189 LearningRate 0.0007 Epoch: 10 Global Step: 216620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:15,747-Speed 6303.94 samples/sec Loss 6.3861 LearningRate 0.0007 Epoch: 10 
Global Step: 216630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:18,990-Speed 6315.33 samples/sec Loss 6.4930 LearningRate 0.0007 Epoch: 10 Global Step: 216640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:22,240-Speed 6303.47 samples/sec Loss 6.4398 LearningRate 0.0007 Epoch: 10 Global Step: 216650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:25,484-Speed 6315.10 samples/sec Loss 6.5094 LearningRate 0.0007 Epoch: 10 Global Step: 216660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:28,738-Speed 6295.49 samples/sec Loss 6.4386 LearningRate 0.0007 Epoch: 10 Global Step: 216670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:31,990-Speed 6299.25 samples/sec Loss 6.4573 LearningRate 0.0007 Epoch: 10 Global Step: 216680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:35,237-Speed 6307.56 samples/sec Loss 6.4322 LearningRate 0.0007 Epoch: 10 Global Step: 216690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:38,484-Speed 6308.86 samples/sec Loss 6.4919 LearningRate 0.0007 Epoch: 10 Global Step: 216700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:41,731-Speed 6309.38 samples/sec Loss 6.4595 LearningRate 0.0007 Epoch: 10 Global Step: 216710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:44,978-Speed 6309.89 samples/sec Loss 6.4934 LearningRate 0.0007 Epoch: 10 Global Step: 216720 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:26:48,228-Speed 6301.60 samples/sec Loss 6.4468 LearningRate 0.0007 Epoch: 10 Global Step: 216730 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:26:51,462-Speed 6335.29 samples/sec Loss 6.4660 LearningRate 0.0007 Epoch: 10 Global Step: 216740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:54,706-Speed 6314.92 samples/sec Loss 6.4238 LearningRate 0.0007 Epoch: 10 Global Step: 216750 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:26:57,952-Speed 6310.68 samples/sec Loss 6.3976 LearningRate 0.0007 Epoch: 10 Global Step: 216760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:01,200-Speed 6306.99 samples/sec Loss 6.4179 LearningRate 0.0007 Epoch: 10 Global Step: 216770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:04,449-Speed 6305.00 samples/sec Loss 6.3524 LearningRate 0.0007 Epoch: 10 Global Step: 216780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:07,697-Speed 6306.81 samples/sec Loss 6.4620 LearningRate 0.0007 Epoch: 10 Global Step: 216790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:10,946-Speed 6304.43 samples/sec Loss 6.3996 LearningRate 0.0007 Epoch: 10 Global Step: 216800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:14,193-Speed 6308.06 samples/sec Loss 6.3748 LearningRate 0.0007 Epoch: 10 Global Step: 216810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:17,440-Speed 6309.58 samples/sec Loss 6.4201 LearningRate 0.0007 Epoch: 10 Global Step: 216820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:20,690-Speed 6301.63 samples/sec Loss 6.4208 LearningRate 0.0007 Epoch: 10 Global Step: 216830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:23,925-Speed 6331.84 samples/sec Loss 6.4614 LearningRate 0.0007 Epoch: 10 Global Step: 216840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:27,168-Speed 6317.74 samples/sec Loss 6.4125 LearningRate 0.0007 Epoch: 10 Global Step: 216850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:30,417-Speed 6304.79 samples/sec Loss 6.4590 LearningRate 0.0007 Epoch: 10 Global Step: 216860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:33,668-Speed 6301.15 samples/sec Loss 6.4273 LearningRate 0.0007 Epoch: 10 Global Step: 216870 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:27:36,912-Speed 6314.54 samples/sec Loss 6.4139 LearningRate 0.0007 Epoch: 10 Global Step: 216880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:40,156-Speed 6314.80 samples/sec Loss 6.4483 LearningRate 0.0007 Epoch: 10 Global Step: 216890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:43,400-Speed 6313.21 samples/sec Loss 6.4057 LearningRate 0.0007 Epoch: 10 Global Step: 216900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:46,648-Speed 6307.21 samples/sec Loss 6.4372 LearningRate 0.0007 Epoch: 10 Global Step: 216910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:49,894-Speed 6311.01 samples/sec Loss 6.3688 LearningRate 0.0007 Epoch: 10 Global Step: 216920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:53,139-Speed 6314.22 samples/sec Loss 6.4327 LearningRate 0.0007 Epoch: 10 Global Step: 216930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:27:56,386-Speed 6308.69 samples/sec Loss 6.3941 LearningRate 0.0007 Epoch: 10 Global Step: 216940 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:27:59,653-Speed 6269.31 samples/sec Loss 6.3658 LearningRate 0.0007 Epoch: 10 Global Step: 216950 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:28:02,885-Speed 6338.81 samples/sec Loss 6.4451 LearningRate 0.0007 Epoch: 10 Global Step: 216960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:06,133-Speed 6307.64 samples/sec Loss 6.4192 LearningRate 0.0007 Epoch: 10 Global Step: 216970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:09,381-Speed 6305.21 samples/sec Loss 6.4726 LearningRate 0.0007 Epoch: 10 Global Step: 216980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:12,640-Speed 6285.62 samples/sec Loss 6.5386 LearningRate 0.0007 Epoch: 10 Global Step: 216990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:28:15,887-Speed 6309.76 samples/sec Loss 6.4157 LearningRate 0.0007 Epoch: 10 Global Step: 217000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:19,134-Speed 6308.46 samples/sec Loss 6.4079 LearningRate 0.0007 Epoch: 10 Global Step: 217010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:22,381-Speed 6309.25 samples/sec Loss 6.3615 LearningRate 0.0007 Epoch: 10 Global Step: 217020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:25,627-Speed 6309.53 samples/sec Loss 6.4782 LearningRate 0.0007 Epoch: 10 Global Step: 217030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:28,909-Speed 6241.83 samples/sec Loss 6.4876 LearningRate 0.0007 Epoch: 10 Global Step: 217040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:32,157-Speed 6306.69 samples/sec Loss 6.4648 LearningRate 0.0007 Epoch: 10 Global Step: 217050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:35,391-Speed 6334.95 samples/sec Loss 6.4479 LearningRate 0.0007 Epoch: 10 Global Step: 217060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:38,635-Speed 6313.50 samples/sec Loss 6.3550 LearningRate 0.0007 Epoch: 10 Global Step: 217070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:41,881-Speed 6310.97 samples/sec Loss 6.4627 LearningRate 0.0007 Epoch: 10 Global Step: 217080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:45,125-Speed 6314.02 samples/sec Loss 6.3831 LearningRate 0.0007 Epoch: 10 Global Step: 217090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:48,371-Speed 6312.34 samples/sec Loss 6.4056 LearningRate 0.0007 Epoch: 10 Global Step: 217100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:51,613-Speed 6317.43 samples/sec Loss 6.4042 LearningRate 0.0007 Epoch: 10 Global Step: 217110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:54,861-Speed 6307.26 
samples/sec Loss 6.4172 LearningRate 0.0007 Epoch: 10 Global Step: 217120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:28:58,106-Speed 6313.08 samples/sec Loss 6.4852 LearningRate 0.0007 Epoch: 10 Global Step: 217130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:01,353-Speed 6308.12 samples/sec Loss 6.5176 LearningRate 0.0007 Epoch: 10 Global Step: 217140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:04,602-Speed 6306.36 samples/sec Loss 6.3895 LearningRate 0.0007 Epoch: 10 Global Step: 217150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:07,847-Speed 6312.41 samples/sec Loss 6.4183 LearningRate 0.0007 Epoch: 10 Global Step: 217160 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:29:11,096-Speed 6305.29 samples/sec Loss 6.5039 LearningRate 0.0007 Epoch: 10 Global Step: 217170 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:29:14,347-Speed 6300.97 samples/sec Loss 6.3895 LearningRate 0.0007 Epoch: 10 Global Step: 217180 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:29:17,580-Speed 6334.79 samples/sec Loss 6.4709 LearningRate 0.0007 Epoch: 10 Global Step: 217190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:20,831-Speed 6300.95 samples/sec Loss 6.4020 LearningRate 0.0007 Epoch: 10 Global Step: 217200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:24,081-Speed 6302.88 samples/sec Loss 6.3553 LearningRate 0.0007 Epoch: 10 Global Step: 217210 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:27,330-Speed 6305.42 samples/sec Loss 6.4539 LearningRate 0.0007 Epoch: 10 Global Step: 217220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:30,577-Speed 6307.98 samples/sec Loss 6.4904 LearningRate 0.0007 Epoch: 10 Global Step: 217230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:33,822-Speed 6313.06 samples/sec Loss 6.4214 
LearningRate 0.0007 Epoch: 10 Global Step: 217240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:37,067-Speed 6314.20 samples/sec Loss 6.2936 LearningRate 0.0007 Epoch: 10 Global Step: 217250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:40,312-Speed 6310.86 samples/sec Loss 6.4987 LearningRate 0.0007 Epoch: 10 Global Step: 217260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:43,555-Speed 6316.19 samples/sec Loss 6.4126 LearningRate 0.0007 Epoch: 10 Global Step: 217270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:46,802-Speed 6310.08 samples/sec Loss 6.4596 LearningRate 0.0007 Epoch: 10 Global Step: 217280 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:50,033-Speed 6339.25 samples/sec Loss 6.3516 LearningRate 0.0007 Epoch: 10 Global Step: 217290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:53,279-Speed 6311.53 samples/sec Loss 6.3378 LearningRate 0.0007 Epoch: 10 Global Step: 217300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:56,525-Speed 6310.49 samples/sec Loss 6.3782 LearningRate 0.0007 Epoch: 10 Global Step: 217310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:29:59,772-Speed 6308.65 samples/sec Loss 6.4319 LearningRate 0.0007 Epoch: 10 Global Step: 217320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:03,023-Speed 6300.98 samples/sec Loss 6.4570 LearningRate 0.0007 Epoch: 10 Global Step: 217330 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:06,269-Speed 6311.18 samples/sec Loss 6.3746 LearningRate 0.0007 Epoch: 10 Global Step: 217340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:09,515-Speed 6312.24 samples/sec Loss 6.3851 LearningRate 0.0007 Epoch: 10 Global Step: 217350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:12,763-Speed 6306.26 samples/sec Loss 6.5139 LearningRate 0.0007 Epoch: 10 
Global Step: 217360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:16,011-Speed 6307.29 samples/sec Loss 6.4449 LearningRate 0.0007 Epoch: 10 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:19,260-Speed 6303.77 samples/sec Loss 6.4113 LearningRate 0.0007 Epoch: 10 Global Step: 217380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:22,507-Speed 6309.12 samples/sec Loss 6.4478 LearningRate 0.0007 Epoch: 10 Global Step: 217390 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:30:25,760-Speed 6296.83 samples/sec Loss 6.4727 LearningRate 0.0007 Epoch: 10 Global Step: 217400 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:30:28,992-Speed 6339.06 samples/sec Loss 6.5146 LearningRate 0.0007 Epoch: 10 Global Step: 217410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:32,239-Speed 6307.86 samples/sec Loss 6.4135 LearningRate 0.0007 Epoch: 10 Global Step: 217420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:35,487-Speed 6306.60 samples/sec Loss 6.4551 LearningRate 0.0007 Epoch: 10 Global Step: 217430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:38,730-Speed 6316.96 samples/sec Loss 6.4440 LearningRate 0.0007 Epoch: 10 Global Step: 217440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:41,984-Speed 6296.40 samples/sec Loss 6.4061 LearningRate 0.0007 Epoch: 10 Global Step: 217450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:45,227-Speed 6316.63 samples/sec Loss 6.3761 LearningRate 0.0007 Epoch: 10 Global Step: 217460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:48,473-Speed 6309.36 samples/sec Loss 6.3676 LearningRate 0.0007 Epoch: 10 Global Step: 217470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:51,723-Speed 6302.85 samples/sec Loss 6.3973 LearningRate 0.0007 Epoch: 10 Global Step: 217480 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:54,973-Speed 6303.70 samples/sec Loss 6.4421 LearningRate 0.0007 Epoch: 10 Global Step: 217490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:30:58,222-Speed 6304.17 samples/sec Loss 6.4199 LearningRate 0.0007 Epoch: 10 Global Step: 217500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:01,454-Speed 6339.25 samples/sec Loss 6.3932 LearningRate 0.0007 Epoch: 10 Global Step: 217510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:04,702-Speed 6305.48 samples/sec Loss 6.3804 LearningRate 0.0007 Epoch: 10 Global Step: 217520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:07,949-Speed 6309.03 samples/sec Loss 6.4281 LearningRate 0.0007 Epoch: 10 Global Step: 217530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:11,191-Speed 6318.82 samples/sec Loss 6.4148 LearningRate 0.0007 Epoch: 10 Global Step: 217540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:14,441-Speed 6303.42 samples/sec Loss 6.4229 LearningRate 0.0007 Epoch: 10 Global Step: 217550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:17,690-Speed 6305.28 samples/sec Loss 6.4133 LearningRate 0.0007 Epoch: 10 Global Step: 217560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:20,938-Speed 6306.70 samples/sec Loss 6.4745 LearningRate 0.0007 Epoch: 10 Global Step: 217570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:24,260-Speed 6167.04 samples/sec Loss 6.4773 LearningRate 0.0007 Epoch: 10 Global Step: 217580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:27,616-Speed 6104.27 samples/sec Loss 6.5102 LearningRate 0.0007 Epoch: 10 Global Step: 217590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:30,869-Speed 6296.89 samples/sec Loss 6.4226 LearningRate 0.0007 Epoch: 10 Global Step: 217600 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:31:34,117-Speed 6305.52 samples/sec Loss 6.4452 LearningRate 0.0007 Epoch: 10 Global Step: 217610 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:31:37,351-Speed 6335.35 samples/sec Loss 6.4019 LearningRate 0.0007 Epoch: 10 Global Step: 217620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:40,596-Speed 6311.69 samples/sec Loss 6.4055 LearningRate 0.0007 Epoch: 10 Global Step: 217630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:43,841-Speed 6313.10 samples/sec Loss 6.4206 LearningRate 0.0007 Epoch: 10 Global Step: 217640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:47,091-Speed 6302.25 samples/sec Loss 6.4536 LearningRate 0.0007 Epoch: 10 Global Step: 217650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:50,337-Speed 6310.84 samples/sec Loss 6.3448 LearningRate 0.0007 Epoch: 10 Global Step: 217660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:53,584-Speed 6310.23 samples/sec Loss 6.3859 LearningRate 0.0007 Epoch: 10 Global Step: 217670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:31:56,829-Speed 6311.41 samples/sec Loss 6.3747 LearningRate 0.0007 Epoch: 10 Global Step: 217680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:00,077-Speed 6307.09 samples/sec Loss 6.4621 LearningRate 0.0007 Epoch: 10 Global Step: 217690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:03,325-Speed 6307.64 samples/sec Loss 6.4377 LearningRate 0.0007 Epoch: 10 Global Step: 217700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:06,572-Speed 6308.96 samples/sec Loss 6.4346 LearningRate 0.0007 Epoch: 10 Global Step: 217710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:09,814-Speed 6317.62 samples/sec Loss 6.4356 LearningRate 0.0007 Epoch: 10 Global Step: 217720 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 
11:32:13,062-Speed 6307.36 samples/sec Loss 6.3850 LearningRate 0.0007 Epoch: 10 Global Step: 217730 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:32:16,319-Speed 6289.38 samples/sec Loss 6.3906 LearningRate 0.0007 Epoch: 10 Global Step: 217740 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:32:19,548-Speed 6342.41 samples/sec Loss 6.3449 LearningRate 0.0007 Epoch: 10 Global Step: 217750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:22,791-Speed 6316.88 samples/sec Loss 6.4925 LearningRate 0.0007 Epoch: 10 Global Step: 217760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:26,040-Speed 6306.28 samples/sec Loss 6.3818 LearningRate 0.0007 Epoch: 10 Global Step: 217770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:29,288-Speed 6307.56 samples/sec Loss 6.4320 LearningRate 0.0007 Epoch: 10 Global Step: 217780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:32,537-Speed 6305.03 samples/sec Loss 6.4307 LearningRate 0.0007 Epoch: 10 Global Step: 217790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:35,785-Speed 6306.22 samples/sec Loss 6.3747 LearningRate 0.0007 Epoch: 10 Global Step: 217800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:39,034-Speed 6304.62 samples/sec Loss 6.5120 LearningRate 0.0007 Epoch: 10 Global Step: 217810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:42,282-Speed 6306.94 samples/sec Loss 6.4489 LearningRate 0.0007 Epoch: 10 Global Step: 217820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:45,527-Speed 6312.27 samples/sec Loss 6.4487 LearningRate 0.0007 Epoch: 10 Global Step: 217830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:48,775-Speed 6307.56 samples/sec Loss 6.4383 LearningRate 0.0007 Epoch: 10 Global Step: 217840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:52,008-Speed 6334.94 
samples/sec Loss 6.4954 LearningRate 0.0007 Epoch: 10 Global Step: 217850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:55,258-Speed 6304.64 samples/sec Loss 6.3653 LearningRate 0.0007 Epoch: 10 Global Step: 217860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:32:58,503-Speed 6311.03 samples/sec Loss 6.3512 LearningRate 0.0007 Epoch: 10 Global Step: 217870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:01,747-Speed 6315.78 samples/sec Loss 6.4438 LearningRate 0.0007 Epoch: 10 Global Step: 217880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:04,994-Speed 6308.04 samples/sec Loss 6.3525 LearningRate 0.0007 Epoch: 10 Global Step: 217890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:08,249-Speed 6293.64 samples/sec Loss 6.4349 LearningRate 0.0007 Epoch: 10 Global Step: 217900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:11,498-Speed 6304.94 samples/sec Loss 6.4266 LearningRate 0.0007 Epoch: 10 Global Step: 217910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:14,745-Speed 6308.05 samples/sec Loss 6.4635 LearningRate 0.0007 Epoch: 10 Global Step: 217920 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:18,069-Speed 6163.32 samples/sec Loss 6.5164 LearningRate 0.0007 Epoch: 10 Global Step: 217930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:21,316-Speed 6307.60 samples/sec Loss 6.4399 LearningRate 0.0007 Epoch: 10 Global Step: 217940 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:24,570-Speed 6296.09 samples/sec Loss 6.4218 LearningRate 0.0007 Epoch: 10 Global Step: 217950 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:33:27,822-Speed 6298.17 samples/sec Loss 6.3983 LearningRate 0.0007 Epoch: 10 Global Step: 217960 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:33:31,052-Speed 6343.00 samples/sec Loss 6.3012 
LearningRate 0.0007 Epoch: 10 Global Step: 217970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:34,298-Speed 6311.44 samples/sec Loss 6.4424 LearningRate 0.0007 Epoch: 10 Global Step: 217980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:37,544-Speed 6310.80 samples/sec Loss 6.3548 LearningRate 0.0007 Epoch: 10 Global Step: 217990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:40,792-Speed 6306.75 samples/sec Loss 6.3697 LearningRate 0.0007 Epoch: 10 Global Step: 218000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:44,037-Speed 6312.56 samples/sec Loss 6.3887 LearningRate 0.0007 Epoch: 10 Global Step: 218010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:47,284-Speed 6308.32 samples/sec Loss 6.4076 LearningRate 0.0007 Epoch: 10 Global Step: 218020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:50,529-Speed 6313.34 samples/sec Loss 6.4496 LearningRate 0.0007 Epoch: 10 Global Step: 218030 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:53,781-Speed 6299.41 samples/sec Loss 6.4225 LearningRate 0.0007 Epoch: 10 Global Step: 218040 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:33:57,037-Speed 6291.10 samples/sec Loss 6.3886 LearningRate 0.0007 Epoch: 10 Global Step: 218050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:00,286-Speed 6304.97 samples/sec Loss 6.3876 LearningRate 0.0007 Epoch: 10 Global Step: 218060 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:03,518-Speed 6336.51 samples/sec Loss 6.3833 LearningRate 0.0007 Epoch: 10 Global Step: 218070 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:06,769-Speed 6302.87 samples/sec Loss 6.2934 LearningRate 0.0007 Epoch: 10 Global Step: 218080 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:10,015-Speed 6309.35 samples/sec Loss 6.4836 LearningRate 0.0007 Epoch: 10 
Global Step: 218090 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:13,262-Speed 6308.55 samples/sec Loss 6.5147 LearningRate 0.0007 Epoch: 10 Global Step: 218100 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:16,511-Speed 6304.84 samples/sec Loss 6.4722 LearningRate 0.0007 Epoch: 10 Global Step: 218110 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:19,757-Speed 6311.21 samples/sec Loss 6.4593 LearningRate 0.0007 Epoch: 10 Global Step: 218120 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:23,005-Speed 6306.49 samples/sec Loss 6.4798 LearningRate 0.0007 Epoch: 10 Global Step: 218130 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:26,255-Speed 6302.52 samples/sec Loss 6.4162 LearningRate 0.0007 Epoch: 10 Global Step: 218140 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:29,507-Speed 6299.65 samples/sec Loss 6.4865 LearningRate 0.0007 Epoch: 10 Global Step: 218150 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:32,759-Speed 6298.87 samples/sec Loss 6.3438 LearningRate 0.0007 Epoch: 10 Global Step: 218160 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:36,002-Speed 6316.06 samples/sec Loss 6.4243 LearningRate 0.0007 Epoch: 10 Global Step: 218170 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:34:39,235-Speed 6336.55 samples/sec Loss 6.3548 LearningRate 0.0007 Epoch: 10 Global Step: 218180 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:42,485-Speed 6303.70 samples/sec Loss 6.4289 LearningRate 0.0007 Epoch: 10 Global Step: 218190 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:45,738-Speed 6297.46 samples/sec Loss 6.4065 LearningRate 0.0007 Epoch: 10 Global Step: 218200 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:48,984-Speed 6309.88 samples/sec Loss 6.4102 LearningRate 0.0007 Epoch: 10 Global Step: 218210 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:52,236-Speed 6301.05 samples/sec Loss 6.4848 LearningRate 0.0007 Epoch: 10 Global Step: 218220 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:55,484-Speed 6305.32 samples/sec Loss 6.4619 LearningRate 0.0007 Epoch: 10 Global Step: 218230 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:34:58,726-Speed 6318.30 samples/sec Loss 6.4357 LearningRate 0.0007 Epoch: 10 Global Step: 218240 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:01,977-Speed 6301.84 samples/sec Loss 6.4402 LearningRate 0.0007 Epoch: 10 Global Step: 218250 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:05,237-Speed 6282.63 samples/sec Loss 6.3841 LearningRate 0.0007 Epoch: 10 Global Step: 218260 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:08,485-Speed 6306.68 samples/sec Loss 6.4114 LearningRate 0.0007 Epoch: 10 Global Step: 218270 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:11,733-Speed 6307.69 samples/sec Loss 6.4100 LearningRate 0.0007 Epoch: 10 Global Step: 218280 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:35:14,963-Speed 6341.42 samples/sec Loss 6.3411 LearningRate 0.0007 Epoch: 10 Global Step: 218290 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:18,223-Speed 6285.42 samples/sec Loss 6.3599 LearningRate 0.0007 Epoch: 10 Global Step: 218300 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:21,467-Speed 6313.61 samples/sec Loss 6.4120 LearningRate 0.0007 Epoch: 10 Global Step: 218310 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:24,716-Speed 6304.88 samples/sec Loss 6.4078 LearningRate 0.0007 Epoch: 10 Global Step: 218320 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:27,959-Speed 6317.27 samples/sec Loss 6.4248 LearningRate 0.0007 Epoch: 10 Global Step: 218330 Fp16 Grad Scale: 32768 Required: 56 hours 
Training: 2022-04-01 11:35:31,207-Speed 6305.86 samples/sec Loss 6.4863 LearningRate 0.0007 Epoch: 10 Global Step: 218340 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:34,460-Speed 6296.39 samples/sec Loss 6.3598 LearningRate 0.0007 Epoch: 10 Global Step: 218350 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:37,707-Speed 6309.38 samples/sec Loss 6.3704 LearningRate 0.0007 Epoch: 10 Global Step: 218360 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:40,960-Speed 6298.22 samples/sec Loss 6.4674 LearningRate 0.0007 Epoch: 10 Global Step: 218370 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:44,211-Speed 6300.29 samples/sec Loss 6.4619 LearningRate 0.0007 Epoch: 10 Global Step: 218380 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:47,460-Speed 6304.86 samples/sec Loss 6.4423 LearningRate 0.0007 Epoch: 10 Global Step: 218390 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:35:50,708-Speed 6308.13 samples/sec Loss 6.4454 LearningRate 0.0007 Epoch: 10 Global Step: 218400 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:35:53,938-Speed 6340.84 samples/sec Loss 6.4063 LearningRate 0.0007 Epoch: 10 Global Step: 218410 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:35:57,190-Speed 6300.37 samples/sec Loss 6.4513 LearningRate 0.0007 Epoch: 10 Global Step: 218420 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:00,432-Speed 6317.09 samples/sec Loss 6.5193 LearningRate 0.0007 Epoch: 10 Global Step: 218430 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:03,682-Speed 6304.08 samples/sec Loss 6.3831 LearningRate 0.0007 Epoch: 10 Global Step: 218440 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:06,934-Speed 6299.31 samples/sec Loss 6.4792 LearningRate 0.0007 Epoch: 10 Global Step: 218450 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 
11:36:10,183-Speed 6304.05 samples/sec Loss 6.4661 LearningRate 0.0007 Epoch: 10 Global Step: 218460 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:13,434-Speed 6301.02 samples/sec Loss 6.4646 LearningRate 0.0007 Epoch: 10 Global Step: 218470 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:16,685-Speed 6301.89 samples/sec Loss 6.3506 LearningRate 0.0007 Epoch: 10 Global Step: 218480 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:19,933-Speed 6306.87 samples/sec Loss 6.4388 LearningRate 0.0007 Epoch: 10 Global Step: 218490 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:23,203-Speed 6263.67 samples/sec Loss 6.5140 LearningRate 0.0007 Epoch: 10 Global Step: 218500 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:26,437-Speed 6334.58 samples/sec Loss 6.4071 LearningRate 0.0007 Epoch: 10 Global Step: 218510 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:29,682-Speed 6311.27 samples/sec Loss 6.3439 LearningRate 0.0007 Epoch: 10 Global Step: 218520 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:32,931-Speed 6305.13 samples/sec Loss 6.4275 LearningRate 0.0007 Epoch: 10 Global Step: 218530 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:36,182-Speed 6302.10 samples/sec Loss 6.4085 LearningRate 0.0007 Epoch: 10 Global Step: 218540 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:39,435-Speed 6297.28 samples/sec Loss 6.4200 LearningRate 0.0007 Epoch: 10 Global Step: 218550 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:42,682-Speed 6307.67 samples/sec Loss 6.3990 LearningRate 0.0007 Epoch: 10 Global Step: 218560 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:45,924-Speed 6318.05 samples/sec Loss 6.4624 LearningRate 0.0007 Epoch: 10 Global Step: 218570 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:49,173-Speed 6305.28 
samples/sec Loss 6.4116 LearningRate 0.0007 Epoch: 10 Global Step: 218580 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:52,419-Speed 6310.43 samples/sec Loss 6.3791 LearningRate 0.0007 Epoch: 10 Global Step: 218590 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:55,667-Speed 6308.80 samples/sec Loss 6.4678 LearningRate 0.0007 Epoch: 10 Global Step: 218600 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:36:58,917-Speed 6303.05 samples/sec Loss 6.4249 LearningRate 0.0007 Epoch: 10 Global Step: 218610 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:37:02,149-Speed 6337.65 samples/sec Loss 6.4088 LearningRate 0.0007 Epoch: 10 Global Step: 218620 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:05,397-Speed 6307.06 samples/sec Loss 6.4517 LearningRate 0.0007 Epoch: 10 Global Step: 218630 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:08,643-Speed 6310.95 samples/sec Loss 6.3578 LearningRate 0.0007 Epoch: 10 Global Step: 218640 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:11,885-Speed 6317.32 samples/sec Loss 6.3066 LearningRate 0.0007 Epoch: 10 Global Step: 218650 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:15,133-Speed 6307.55 samples/sec Loss 6.4743 LearningRate 0.0007 Epoch: 10 Global Step: 218660 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:18,381-Speed 6307.35 samples/sec Loss 6.3853 LearningRate 0.0007 Epoch: 10 Global Step: 218670 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:21,628-Speed 6306.98 samples/sec Loss 6.4028 LearningRate 0.0007 Epoch: 10 Global Step: 218680 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:24,873-Speed 6314.62 samples/sec Loss 6.4494 LearningRate 0.0007 Epoch: 10 Global Step: 218690 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:28,118-Speed 6311.09 samples/sec Loss 6.4578 
LearningRate 0.0007 Epoch: 10 Global Step: 218700 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:31,367-Speed 6305.79 samples/sec Loss 6.4685 LearningRate 0.0007 Epoch: 10 Global Step: 218710 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:34,598-Speed 6339.23 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 218720 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:37,844-Speed 6310.73 samples/sec Loss 6.4706 LearningRate 0.0007 Epoch: 10 Global Step: 218730 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:41,093-Speed 6305.35 samples/sec Loss 6.4093 LearningRate 0.0007 Epoch: 10 Global Step: 218740 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:44,338-Speed 6312.77 samples/sec Loss 6.4336 LearningRate 0.0007 Epoch: 10 Global Step: 218750 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:47,583-Speed 6313.22 samples/sec Loss 6.4452 LearningRate 0.0007 Epoch: 10 Global Step: 218760 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:50,830-Speed 6308.58 samples/sec Loss 6.4037 LearningRate 0.0007 Epoch: 10 Global Step: 218770 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:54,074-Speed 6313.42 samples/sec Loss 6.4117 LearningRate 0.0007 Epoch: 10 Global Step: 218780 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:37:57,323-Speed 6305.67 samples/sec Loss 6.4550 LearningRate 0.0007 Epoch: 10 Global Step: 218790 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:00,570-Speed 6309.83 samples/sec Loss 6.4183 LearningRate 0.0007 Epoch: 10 Global Step: 218800 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:03,812-Speed 6317.09 samples/sec Loss 6.4109 LearningRate 0.0007 Epoch: 10 Global Step: 218810 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:07,050-Speed 6326.98 samples/sec Loss 6.4126 LearningRate 0.0007 Epoch: 10 
Global Step: 218820 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:10,294-Speed 6315.64 samples/sec Loss 6.4059 LearningRate 0.0007 Epoch: 10 Global Step: 218830 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:13,542-Speed 6307.12 samples/sec Loss 6.3738 LearningRate 0.0007 Epoch: 10 Global Step: 218840 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:16,790-Speed 6306.34 samples/sec Loss 6.3906 LearningRate 0.0007 Epoch: 10 Global Step: 218850 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:20,035-Speed 6313.15 samples/sec Loss 6.5064 LearningRate 0.0007 Epoch: 10 Global Step: 218860 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:23,281-Speed 6310.89 samples/sec Loss 6.3987 LearningRate 0.0007 Epoch: 10 Global Step: 218870 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:26,521-Speed 6321.49 samples/sec Loss 6.4668 LearningRate 0.0007 Epoch: 10 Global Step: 218880 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:29,767-Speed 6310.79 samples/sec Loss 6.4155 LearningRate 0.0007 Epoch: 10 Global Step: 218890 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:33,014-Speed 6307.97 samples/sec Loss 6.4450 LearningRate 0.0007 Epoch: 10 Global Step: 218900 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:36,263-Speed 6305.73 samples/sec Loss 6.4374 LearningRate 0.0007 Epoch: 10 Global Step: 218910 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:39,506-Speed 6315.21 samples/sec Loss 6.3930 LearningRate 0.0007 Epoch: 10 Global Step: 218920 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:38:42,739-Speed 6335.88 samples/sec Loss 6.4359 LearningRate 0.0007 Epoch: 10 Global Step: 218930 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:45,986-Speed 6309.89 samples/sec Loss 6.4459 LearningRate 0.0007 Epoch: 10 Global Step: 218940 Fp16 Grad 
Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:49,230-Speed 6313.56 samples/sec Loss 6.4199 LearningRate 0.0007 Epoch: 10 Global Step: 218950 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:52,474-Speed 6315.44 samples/sec Loss 6.4354 LearningRate 0.0007 Epoch: 10 Global Step: 218960 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:55,720-Speed 6310.86 samples/sec Loss 6.3717 LearningRate 0.0007 Epoch: 10 Global Step: 218970 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:38:58,967-Speed 6308.33 samples/sec Loss 6.3399 LearningRate 0.0007 Epoch: 10 Global Step: 218980 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:02,213-Speed 6310.30 samples/sec Loss 6.3850 LearningRate 0.0007 Epoch: 10 Global Step: 218990 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:05,462-Speed 6305.57 samples/sec Loss 6.5044 LearningRate 0.0007 Epoch: 10 Global Step: 219000 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:08,706-Speed 6316.40 samples/sec Loss 6.3367 LearningRate 0.0007 Epoch: 10 Global Step: 219010 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:11,961-Speed 6292.04 samples/sec Loss 6.3643 LearningRate 0.0007 Epoch: 10 Global Step: 219020 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:15,206-Speed 6312.80 samples/sec Loss 6.4221 LearningRate 0.0007 Epoch: 10 Global Step: 219030 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:39:18,456-Speed 6303.56 samples/sec Loss 6.4066 LearningRate 0.0007 Epoch: 10 Global Step: 219040 Fp16 Grad Scale: 65536 Required: 56 hours Training: 2022-04-01 11:39:21,692-Speed 6330.70 samples/sec Loss 6.4387 LearningRate 0.0007 Epoch: 10 Global Step: 219050 Fp16 Grad Scale: 32768 Required: 56 hours Training: 2022-04-01 11:39:25,002-Speed 6188.49 samples/sec Loss 6.4118 LearningRate 0.0007 Epoch: 10 Global Step: 219060 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 11:39:28,249-Speed 6308.42 samples/sec Loss 6.4248 LearningRate 0.0007 Epoch: 10 Global Step: 219070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:31,518-Speed 6265.55 samples/sec Loss 6.3849 LearningRate 0.0007 Epoch: 10 Global Step: 219080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:34,765-Speed 6308.90 samples/sec Loss 6.3657 LearningRate 0.0007 Epoch: 10 Global Step: 219090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:38,013-Speed 6307.98 samples/sec Loss 6.3236 LearningRate 0.0007 Epoch: 10 Global Step: 219100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:41,260-Speed 6307.31 samples/sec Loss 6.3603 LearningRate 0.0007 Epoch: 10 Global Step: 219110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:44,507-Speed 6308.08 samples/sec Loss 6.4468 LearningRate 0.0007 Epoch: 10 Global Step: 219120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:47,753-Speed 6312.73 samples/sec Loss 6.3025 LearningRate 0.0007 Epoch: 10 Global Step: 219130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:50,998-Speed 6312.37 samples/sec Loss 6.3755 LearningRate 0.0007 Epoch: 10 Global Step: 219140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:39:54,243-Speed 6312.11 samples/sec Loss 6.4352 LearningRate 0.0007 Epoch: 10 Global Step: 219150 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:39:57,475-Speed 6337.85 samples/sec Loss 6.4158 LearningRate 0.0007 Epoch: 10 Global Step: 219160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:00,722-Speed 6309.75 samples/sec Loss 6.4524 LearningRate 0.0007 Epoch: 10 Global Step: 219170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:03,968-Speed 6310.44 samples/sec Loss 6.4319 LearningRate 0.0007 Epoch: 10 Global Step: 219180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
11:40:07,217-Speed 6303.76 samples/sec Loss 6.4311 LearningRate 0.0007 Epoch: 10 Global Step: 219190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:10,463-Speed 6310.13 samples/sec Loss 6.3365 LearningRate 0.0007 Epoch: 10 Global Step: 219200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:13,730-Speed 6271.81 samples/sec Loss 6.4686 LearningRate 0.0007 Epoch: 10 Global Step: 219210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:17,094-Speed 6088.66 samples/sec Loss 6.4290 LearningRate 0.0007 Epoch: 10 Global Step: 219220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:20,363-Speed 6267.55 samples/sec Loss 6.4434 LearningRate 0.0007 Epoch: 10 Global Step: 219230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:23,612-Speed 6303.89 samples/sec Loss 6.3766 LearningRate 0.0007 Epoch: 10 Global Step: 219240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:26,856-Speed 6315.02 samples/sec Loss 6.3523 LearningRate 0.0007 Epoch: 10 Global Step: 219250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:30,091-Speed 6333.04 samples/sec Loss 6.4201 LearningRate 0.0007 Epoch: 10 Global Step: 219260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:33,341-Speed 6302.97 samples/sec Loss 6.3504 LearningRate 0.0007 Epoch: 10 Global Step: 219270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:36,596-Speed 6293.13 samples/sec Loss 6.3726 LearningRate 0.0007 Epoch: 10 Global Step: 219280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:39,842-Speed 6310.86 samples/sec Loss 6.4598 LearningRate 0.0007 Epoch: 10 Global Step: 219290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:43,089-Speed 6307.56 samples/sec Loss 6.4383 LearningRate 0.0007 Epoch: 10 Global Step: 219300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:46,338-Speed 6304.44 
samples/sec Loss 6.3862 LearningRate 0.0007 Epoch: 10 Global Step: 219310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:49,583-Speed 6313.31 samples/sec Loss 6.3268 LearningRate 0.0007 Epoch: 10 Global Step: 219320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:52,827-Speed 6315.32 samples/sec Loss 6.4360 LearningRate 0.0007 Epoch: 10 Global Step: 219330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:56,073-Speed 6310.05 samples/sec Loss 6.4213 LearningRate 0.0007 Epoch: 10 Global Step: 219340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:40:59,315-Speed 6317.95 samples/sec Loss 6.3877 LearningRate 0.0007 Epoch: 10 Global Step: 219350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:02,548-Speed 6336.17 samples/sec Loss 6.4082 LearningRate 0.0007 Epoch: 10 Global Step: 219360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:05,795-Speed 6308.66 samples/sec Loss 6.4292 LearningRate 0.0007 Epoch: 10 Global Step: 219370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:09,040-Speed 6312.90 samples/sec Loss 6.4337 LearningRate 0.0007 Epoch: 10 Global Step: 219380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:12,293-Speed 6298.00 samples/sec Loss 6.4427 LearningRate 0.0007 Epoch: 10 Global Step: 219390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:15,540-Speed 6308.01 samples/sec Loss 6.3913 LearningRate 0.0007 Epoch: 10 Global Step: 219400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:18,789-Speed 6305.04 samples/sec Loss 6.3433 LearningRate 0.0007 Epoch: 10 Global Step: 219410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:22,033-Speed 6314.01 samples/sec Loss 6.3806 LearningRate 0.0007 Epoch: 10 Global Step: 219420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:25,286-Speed 6297.65 samples/sec Loss 6.4776 
LearningRate 0.0007 Epoch: 10 Global Step: 219430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:28,534-Speed 6307.03 samples/sec Loss 6.3598 LearningRate 0.0007 Epoch: 10 Global Step: 219440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:31,781-Speed 6309.50 samples/sec Loss 6.3372 LearningRate 0.0007 Epoch: 10 Global Step: 219450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:35,026-Speed 6312.17 samples/sec Loss 6.3944 LearningRate 0.0007 Epoch: 10 Global Step: 219460 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:41:38,271-Speed 6314.14 samples/sec Loss 6.3804 LearningRate 0.0007 Epoch: 10 Global Step: 219470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:41,519-Speed 6305.92 samples/sec Loss 6.3165 LearningRate 0.0007 Epoch: 10 Global Step: 219480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:44,765-Speed 6309.36 samples/sec Loss 6.4233 LearningRate 0.0007 Epoch: 10 Global Step: 219490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:48,011-Speed 6312.22 samples/sec Loss 6.4068 LearningRate 0.0007 Epoch: 10 Global Step: 219500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:51,256-Speed 6311.33 samples/sec Loss 6.4363 LearningRate 0.0007 Epoch: 10 Global Step: 219510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:54,527-Speed 6262.83 samples/sec Loss 6.4442 LearningRate 0.0007 Epoch: 10 Global Step: 219520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:41:57,777-Speed 6304.60 samples/sec Loss 6.4017 LearningRate 0.0007 Epoch: 10 Global Step: 219530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:01,025-Speed 6304.78 samples/sec Loss 6.3531 LearningRate 0.0007 Epoch: 10 Global Step: 219540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:04,272-Speed 6310.12 samples/sec Loss 6.4432 LearningRate 0.0007 Epoch: 10 
Global Step: 219550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:07,516-Speed 6313.66 samples/sec Loss 6.4095 LearningRate 0.0007 Epoch: 10 Global Step: 219560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:10,749-Speed 6336.80 samples/sec Loss 6.3874 LearningRate 0.0007 Epoch: 10 Global Step: 219570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:14,000-Speed 6300.08 samples/sec Loss 6.4449 LearningRate 0.0007 Epoch: 10 Global Step: 219580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:17,248-Speed 6308.46 samples/sec Loss 6.4697 LearningRate 0.0007 Epoch: 10 Global Step: 219590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:20,489-Speed 6318.82 samples/sec Loss 6.4493 LearningRate 0.0007 Epoch: 10 Global Step: 219600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:23,733-Speed 6315.39 samples/sec Loss 6.3597 LearningRate 0.0007 Epoch: 10 Global Step: 219610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:26,983-Speed 6303.47 samples/sec Loss 6.3880 LearningRate 0.0007 Epoch: 10 Global Step: 219620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:30,234-Speed 6301.04 samples/sec Loss 6.4283 LearningRate 0.0007 Epoch: 10 Global Step: 219630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:33,480-Speed 6310.50 samples/sec Loss 6.4262 LearningRate 0.0007 Epoch: 10 Global Step: 219640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:36,723-Speed 6316.30 samples/sec Loss 6.3735 LearningRate 0.0007 Epoch: 10 Global Step: 219650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:39,969-Speed 6310.97 samples/sec Loss 6.4266 LearningRate 0.0007 Epoch: 10 Global Step: 219660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:43,215-Speed 6311.19 samples/sec Loss 6.4201 LearningRate 0.0007 Epoch: 10 Global Step: 219670 Fp16 Grad 
Scale: 65536 Required: 55 hours Training: 2022-04-01 11:42:46,447-Speed 6336.74 samples/sec Loss 6.4881 LearningRate 0.0007 Epoch: 10 Global Step: 219680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:49,695-Speed 6307.69 samples/sec Loss 6.3702 LearningRate 0.0007 Epoch: 10 Global Step: 219690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:52,942-Speed 6309.83 samples/sec Loss 6.3711 LearningRate 0.0007 Epoch: 10 Global Step: 219700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:56,185-Speed 6315.08 samples/sec Loss 6.3237 LearningRate 0.0007 Epoch: 10 Global Step: 219710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:42:59,430-Speed 6313.04 samples/sec Loss 6.3802 LearningRate 0.0007 Epoch: 10 Global Step: 219720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:02,674-Speed 6314.23 samples/sec Loss 6.3163 LearningRate 0.0007 Epoch: 10 Global Step: 219730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:05,919-Speed 6312.66 samples/sec Loss 6.3808 LearningRate 0.0007 Epoch: 10 Global Step: 219740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:09,165-Speed 6311.47 samples/sec Loss 6.4170 LearningRate 0.0007 Epoch: 10 Global Step: 219750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:12,408-Speed 6317.42 samples/sec Loss 6.4002 LearningRate 0.0007 Epoch: 10 Global Step: 219760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:15,653-Speed 6312.28 samples/sec Loss 6.3631 LearningRate 0.0007 Epoch: 10 Global Step: 219770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:18,899-Speed 6309.81 samples/sec Loss 6.4266 LearningRate 0.0007 Epoch: 10 Global Step: 219780 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:43:22,145-Speed 6311.93 samples/sec Loss 6.4057 LearningRate 0.0007 Epoch: 10 Global Step: 219790 Fp16 Grad Scale: 65536 Required: 55 hours 
Training: 2022-04-01 11:43:25,392-Speed 6308.67 samples/sec Loss 6.4636 LearningRate 0.0007 Epoch: 10 Global Step: 219800 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:43:28,624-Speed 6337.06 samples/sec Loss 6.3893 LearningRate 0.0007 Epoch: 10 Global Step: 219810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:31,871-Speed 6308.53 samples/sec Loss 6.4473 LearningRate 0.0007 Epoch: 10 Global Step: 219820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:35,130-Speed 6286.67 samples/sec Loss 6.3703 LearningRate 0.0007 Epoch: 10 Global Step: 219830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:38,378-Speed 6305.78 samples/sec Loss 6.4443 LearningRate 0.0007 Epoch: 10 Global Step: 219840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:41,626-Speed 6306.67 samples/sec Loss 6.3317 LearningRate 0.0007 Epoch: 10 Global Step: 219850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:44,877-Speed 6302.36 samples/sec Loss 6.3641 LearningRate 0.0007 Epoch: 10 Global Step: 219860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:48,132-Speed 6292.36 samples/sec Loss 6.3875 LearningRate 0.0007 Epoch: 10 Global Step: 219870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:51,377-Speed 6313.75 samples/sec Loss 6.3556 LearningRate 0.0007 Epoch: 10 Global Step: 219880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:54,623-Speed 6310.06 samples/sec Loss 6.3699 LearningRate 0.0007 Epoch: 10 Global Step: 219890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:43:57,869-Speed 6310.81 samples/sec Loss 6.4995 LearningRate 0.0007 Epoch: 10 Global Step: 219900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:01,120-Speed 6301.74 samples/sec Loss 6.4040 LearningRate 0.0007 Epoch: 10 Global Step: 219910 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 
11:44:04,355-Speed 6332.28 samples/sec Loss 6.3838 LearningRate 0.0007 Epoch: 10 Global Step: 219920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:07,603-Speed 6305.94 samples/sec Loss 6.4401 LearningRate 0.0007 Epoch: 10 Global Step: 219930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:10,851-Speed 6307.38 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 219940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:14,097-Speed 6311.10 samples/sec Loss 6.4724 LearningRate 0.0007 Epoch: 10 Global Step: 219950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:17,343-Speed 6311.51 samples/sec Loss 6.3875 LearningRate 0.0007 Epoch: 10 Global Step: 219960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:20,586-Speed 6315.74 samples/sec Loss 6.4601 LearningRate 0.0007 Epoch: 10 Global Step: 219970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:23,833-Speed 6308.26 samples/sec Loss 6.4164 LearningRate 0.0007 Epoch: 10 Global Step: 219980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:27,083-Speed 6303.06 samples/sec Loss 6.4491 LearningRate 0.0007 Epoch: 10 Global Step: 219990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:30,330-Speed 6309.28 samples/sec Loss 6.4009 LearningRate 0.0007 Epoch: 10 Global Step: 220000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:33,579-Speed 6305.80 samples/sec Loss 6.4643 LearningRate 0.0007 Epoch: 10 Global Step: 220010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:36,823-Speed 6312.91 samples/sec Loss 6.3841 LearningRate 0.0007 Epoch: 10 Global Step: 220020 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:44:40,079-Speed 6292.24 samples/sec Loss 6.3945 LearningRate 0.0007 Epoch: 10 Global Step: 220030 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:44:43,329-Speed 6302.48 
samples/sec Loss 6.3935 LearningRate 0.0007 Epoch: 10 Global Step: 220040 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:44:46,572-Speed 6315.90 samples/sec Loss 6.3851 LearningRate 0.0007 Epoch: 10 Global Step: 220050 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:44:49,806-Speed 6334.80 samples/sec Loss 6.3456 LearningRate 0.0007 Epoch: 10 Global Step: 220060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:53,055-Speed 6305.38 samples/sec Loss 6.4085 LearningRate 0.0007 Epoch: 10 Global Step: 220070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:56,302-Speed 6308.93 samples/sec Loss 6.4178 LearningRate 0.0007 Epoch: 10 Global Step: 220080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:44:59,547-Speed 6313.66 samples/sec Loss 6.3458 LearningRate 0.0007 Epoch: 10 Global Step: 220090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:02,797-Speed 6301.47 samples/sec Loss 6.4082 LearningRate 0.0007 Epoch: 10 Global Step: 220100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:06,045-Speed 6306.44 samples/sec Loss 6.4370 LearningRate 0.0007 Epoch: 10 Global Step: 220110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:09,295-Speed 6303.74 samples/sec Loss 6.4236 LearningRate 0.0007 Epoch: 10 Global Step: 220120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:12,540-Speed 6312.74 samples/sec Loss 6.4601 LearningRate 0.0007 Epoch: 10 Global Step: 220130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:15,787-Speed 6308.25 samples/sec Loss 6.4866 LearningRate 0.0007 Epoch: 10 Global Step: 220140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:19,035-Speed 6308.25 samples/sec Loss 6.4587 LearningRate 0.0007 Epoch: 10 Global Step: 220150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:22,267-Speed 6336.90 samples/sec Loss 6.3676 
LearningRate 0.0007 Epoch: 10 Global Step: 220160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:25,511-Speed 6314.94 samples/sec Loss 6.3404 LearningRate 0.0007 Epoch: 10 Global Step: 220170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:28,757-Speed 6310.83 samples/sec Loss 6.4016 LearningRate 0.0007 Epoch: 10 Global Step: 220180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:32,006-Speed 6305.96 samples/sec Loss 6.4108 LearningRate 0.0007 Epoch: 10 Global Step: 220190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:35,256-Speed 6301.87 samples/sec Loss 6.4209 LearningRate 0.0007 Epoch: 10 Global Step: 220200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:38,504-Speed 6307.36 samples/sec Loss 6.4201 LearningRate 0.0007 Epoch: 10 Global Step: 220210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:41,762-Speed 6286.64 samples/sec Loss 6.2942 LearningRate 0.0007 Epoch: 10 Global Step: 220220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:45,009-Speed 6308.63 samples/sec Loss 6.4784 LearningRate 0.0007 Epoch: 10 Global Step: 220230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:48,267-Speed 6288.42 samples/sec Loss 6.3805 LearningRate 0.0007 Epoch: 10 Global Step: 220240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:51,512-Speed 6312.08 samples/sec Loss 6.4243 LearningRate 0.0007 Epoch: 10 Global Step: 220250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:54,744-Speed 6337.27 samples/sec Loss 6.4507 LearningRate 0.0007 Epoch: 10 Global Step: 220260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:45:57,990-Speed 6311.91 samples/sec Loss 6.3935 LearningRate 0.0007 Epoch: 10 Global Step: 220270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:01,241-Speed 6299.93 samples/sec Loss 6.3882 LearningRate 0.0007 Epoch: 10 
Global Step: 220280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:04,492-Speed 6302.46 samples/sec Loss 6.4840 LearningRate 0.0007 Epoch: 10 Global Step: 220290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:07,740-Speed 6307.30 samples/sec Loss 6.3862 LearningRate 0.0007 Epoch: 10 Global Step: 220300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:10,990-Speed 6302.08 samples/sec Loss 6.4057 LearningRate 0.0007 Epoch: 10 Global Step: 220310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:14,235-Speed 6311.99 samples/sec Loss 6.3768 LearningRate 0.0007 Epoch: 10 Global Step: 220320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:17,481-Speed 6311.71 samples/sec Loss 6.4055 LearningRate 0.0007 Epoch: 10 Global Step: 220330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:20,729-Speed 6305.95 samples/sec Loss 6.3645 LearningRate 0.0007 Epoch: 10 Global Step: 220340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:23,974-Speed 6313.51 samples/sec Loss 6.3765 LearningRate 0.0007 Epoch: 10 Global Step: 220350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:27,210-Speed 6330.86 samples/sec Loss 6.4215 LearningRate 0.0007 Epoch: 10 Global Step: 220360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:30,459-Speed 6304.34 samples/sec Loss 6.5266 LearningRate 0.0007 Epoch: 10 Global Step: 220370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:33,705-Speed 6310.43 samples/sec Loss 6.4541 LearningRate 0.0007 Epoch: 10 Global Step: 220380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:36,950-Speed 6312.26 samples/sec Loss 6.4549 LearningRate 0.0007 Epoch: 10 Global Step: 220390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:46:40,191-Speed 6321.22 samples/sec Loss 6.3986 LearningRate 0.0007 Epoch: 10 Global Step: 220400 Fp16 Grad 
Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:43,438-Speed 6308.77 samples/sec Loss 6.4596 LearningRate 0.0007 Epoch: 10 Global Step: 220410 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:46,689-Speed 6300.82 samples/sec Loss 6.3947 LearningRate 0.0007 Epoch: 10 Global Step: 220420 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:49,934-Speed 6312.21 samples/sec Loss 6.3959 LearningRate 0.0007 Epoch: 10 Global Step: 220430 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:53,181-Speed 6308.65 samples/sec Loss 6.3647 LearningRate 0.0007 Epoch: 10 Global Step: 220440 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:56,429-Speed 6305.97 samples/sec Loss 6.3821 LearningRate 0.0007 Epoch: 10 Global Step: 220450 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:46:59,677-Speed 6309.64 samples/sec Loss 6.3656 LearningRate 0.0007 Epoch: 10 Global Step: 220460 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:47:02,937-Speed 6283.66 samples/sec Loss 6.3991 LearningRate 0.0007 Epoch: 10 Global Step: 220470 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:47:06,183-Speed 6310.76 samples/sec Loss 6.3682 LearningRate 0.0007 Epoch: 10 Global Step: 220480 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:47:09,429-Speed 6310.12 samples/sec Loss 6.3430 LearningRate 0.0007 Epoch: 10 Global Step: 220490 Fp16 Grad Scale: 16384 Required: 55 hours Training: 2022-04-01 11:47:12,677-Speed 6308.36 samples/sec Loss 6.3785 LearningRate 0.0007 Epoch: 10 Global Step: 220500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:15,924-Speed 6309.12 samples/sec Loss 6.3814 LearningRate 0.0007 Epoch: 10 Global Step: 220510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:19,173-Speed 6306.11 samples/sec Loss 6.4426 LearningRate 0.0007 Epoch: 10 Global Step: 220520 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 11:47:22,418-Speed 6312.33 samples/sec Loss 6.4174 LearningRate 0.0007 Epoch: 10 Global Step: 220530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:25,665-Speed 6308.56 samples/sec Loss 6.3740 LearningRate 0.0007 Epoch: 10 Global Step: 220540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:28,923-Speed 6287.75 samples/sec Loss 6.3854 LearningRate 0.0007 Epoch: 10 Global Step: 220550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:32,172-Speed 6305.78 samples/sec Loss 6.3944 LearningRate 0.0007 Epoch: 10 Global Step: 220560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:35,453-Speed 6242.31 samples/sec Loss 6.3845 LearningRate 0.0007 Epoch: 10 Global Step: 220570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:38,700-Speed 6308.43 samples/sec Loss 6.4144 LearningRate 0.0007 Epoch: 10 Global Step: 220580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:41,949-Speed 6306.47 samples/sec Loss 6.3647 LearningRate 0.0007 Epoch: 10 Global Step: 220590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:45,182-Speed 6335.70 samples/sec Loss 6.4226 LearningRate 0.0007 Epoch: 10 Global Step: 220600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:48,428-Speed 6309.36 samples/sec Loss 6.4688 LearningRate 0.0007 Epoch: 10 Global Step: 220610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:51,672-Speed 6315.89 samples/sec Loss 6.4697 LearningRate 0.0007 Epoch: 10 Global Step: 220620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:54,917-Speed 6312.77 samples/sec Loss 6.4332 LearningRate 0.0007 Epoch: 10 Global Step: 220630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:47:58,169-Speed 6298.92 samples/sec Loss 6.3731 LearningRate 0.0007 Epoch: 10 Global Step: 220640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
11:48:01,414-Speed 6312.21 samples/sec Loss 6.4054 LearningRate 0.0007 Epoch: 10 Global Step: 220650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:04,661-Speed 6308.73 samples/sec Loss 6.3934 LearningRate 0.0007 Epoch: 10 Global Step: 220660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:07,912-Speed 6300.77 samples/sec Loss 6.3959 LearningRate 0.0007 Epoch: 10 Global Step: 220670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:11,161-Speed 6304.72 samples/sec Loss 6.4045 LearningRate 0.0007 Epoch: 10 Global Step: 220680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:14,411-Speed 6304.05 samples/sec Loss 6.4279 LearningRate 0.0007 Epoch: 10 Global Step: 220690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:17,642-Speed 6340.08 samples/sec Loss 6.4169 LearningRate 0.0007 Epoch: 10 Global Step: 220700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:20,892-Speed 6301.28 samples/sec Loss 6.3979 LearningRate 0.0007 Epoch: 10 Global Step: 220710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:24,142-Speed 6304.24 samples/sec Loss 6.4198 LearningRate 0.0007 Epoch: 10 Global Step: 220720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:27,395-Speed 6297.41 samples/sec Loss 6.3839 LearningRate 0.0007 Epoch: 10 Global Step: 220730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:30,644-Speed 6305.95 samples/sec Loss 6.5141 LearningRate 0.0007 Epoch: 10 Global Step: 220740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:33,893-Speed 6303.25 samples/sec Loss 6.3820 LearningRate 0.0007 Epoch: 10 Global Step: 220750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:37,144-Speed 6302.01 samples/sec Loss 6.4246 LearningRate 0.0007 Epoch: 10 Global Step: 220760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:40,388-Speed 6314.16 
samples/sec Loss 6.3759 LearningRate 0.0007 Epoch: 10 Global Step: 220770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:43,634-Speed 6310.67 samples/sec Loss 6.3687 LearningRate 0.0007 Epoch: 10 Global Step: 220780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:46,881-Speed 6308.93 samples/sec Loss 6.4003 LearningRate 0.0007 Epoch: 10 Global Step: 220790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:50,115-Speed 6334.34 samples/sec Loss 6.4112 LearningRate 0.0007 Epoch: 10 Global Step: 220800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:53,361-Speed 6310.10 samples/sec Loss 6.4425 LearningRate 0.0007 Epoch: 10 Global Step: 220810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:56,608-Speed 6309.32 samples/sec Loss 6.3695 LearningRate 0.0007 Epoch: 10 Global Step: 220820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:48:59,855-Speed 6309.29 samples/sec Loss 6.4577 LearningRate 0.0007 Epoch: 10 Global Step: 220830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:03,101-Speed 6309.05 samples/sec Loss 6.4725 LearningRate 0.0007 Epoch: 10 Global Step: 220840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:06,348-Speed 6309.54 samples/sec Loss 6.3959 LearningRate 0.0007 Epoch: 10 Global Step: 220850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:09,599-Speed 6300.53 samples/sec Loss 6.4019 LearningRate 0.0007 Epoch: 10 Global Step: 220860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:12,869-Speed 6264.31 samples/sec Loss 6.4470 LearningRate 0.0007 Epoch: 10 Global Step: 220870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:16,208-Speed 6135.02 samples/sec Loss 6.3765 LearningRate 0.0007 Epoch: 10 Global Step: 220880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:19,453-Speed 6313.11 samples/sec Loss 6.3324 
LearningRate 0.0007 Epoch: 10 Global Step: 220890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:22,702-Speed 6304.42 samples/sec Loss 6.3806 LearningRate 0.0007 Epoch: 10 Global Step: 220900 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:49:25,936-Speed 6333.62 samples/sec Loss 6.3454 LearningRate 0.0007 Epoch: 10 Global Step: 220910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:29,183-Speed 6310.21 samples/sec Loss 6.4366 LearningRate 0.0007 Epoch: 10 Global Step: 220920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:32,429-Speed 6309.43 samples/sec Loss 6.4014 LearningRate 0.0007 Epoch: 10 Global Step: 220930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:35,674-Speed 6314.19 samples/sec Loss 6.4203 LearningRate 0.0007 Epoch: 10 Global Step: 220940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:38,920-Speed 6309.98 samples/sec Loss 6.3902 LearningRate 0.0007 Epoch: 10 Global Step: 220950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:42,178-Speed 6288.45 samples/sec Loss 6.4559 LearningRate 0.0007 Epoch: 10 Global Step: 220960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:45,427-Speed 6305.56 samples/sec Loss 6.3299 LearningRate 0.0007 Epoch: 10 Global Step: 220970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:48,674-Speed 6308.69 samples/sec Loss 6.4705 LearningRate 0.0007 Epoch: 10 Global Step: 220980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:51,918-Speed 6313.40 samples/sec Loss 6.4089 LearningRate 0.0007 Epoch: 10 Global Step: 220990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:55,164-Speed 6310.74 samples/sec Loss 6.3561 LearningRate 0.0007 Epoch: 10 Global Step: 221000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:49:58,394-Speed 6341.69 samples/sec Loss 6.4311 LearningRate 0.0007 Epoch: 10 
Global Step: 221010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:01,643-Speed 6305.74 samples/sec Loss 6.3952 LearningRate 0.0007 Epoch: 10 Global Step: 221020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:04,889-Speed 6310.97 samples/sec Loss 6.4442 LearningRate 0.0007 Epoch: 10 Global Step: 221030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:08,135-Speed 6309.92 samples/sec Loss 6.4399 LearningRate 0.0007 Epoch: 10 Global Step: 221040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:11,386-Speed 6301.76 samples/sec Loss 6.3934 LearningRate 0.0007 Epoch: 10 Global Step: 221050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:14,637-Speed 6299.55 samples/sec Loss 6.4597 LearningRate 0.0007 Epoch: 10 Global Step: 221060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:17,883-Speed 6310.67 samples/sec Loss 6.3874 LearningRate 0.0007 Epoch: 10 Global Step: 221070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:21,130-Speed 6310.23 samples/sec Loss 6.4372 LearningRate 0.0007 Epoch: 10 Global Step: 221080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:24,375-Speed 6311.79 samples/sec Loss 6.4157 LearningRate 0.0007 Epoch: 10 Global Step: 221090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:27,621-Speed 6311.06 samples/sec Loss 6.3638 LearningRate 0.0007 Epoch: 10 Global Step: 221100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:30,869-Speed 6306.34 samples/sec Loss 6.4334 LearningRate 0.0007 Epoch: 10 Global Step: 221110 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:50:34,102-Speed 6337.34 samples/sec Loss 6.3587 LearningRate 0.0007 Epoch: 10 Global Step: 221120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:37,350-Speed 6306.43 samples/sec Loss 6.3604 LearningRate 0.0007 Epoch: 10 Global Step: 221130 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:40,597-Speed 6307.82 samples/sec Loss 6.3837 LearningRate 0.0007 Epoch: 10 Global Step: 221140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:43,844-Speed 6308.90 samples/sec Loss 6.3128 LearningRate 0.0007 Epoch: 10 Global Step: 221150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:47,100-Speed 6291.62 samples/sec Loss 6.3959 LearningRate 0.0007 Epoch: 10 Global Step: 221160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:50,348-Speed 6306.97 samples/sec Loss 6.4636 LearningRate 0.0007 Epoch: 10 Global Step: 221170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:53,595-Speed 6308.91 samples/sec Loss 6.3346 LearningRate 0.0007 Epoch: 10 Global Step: 221180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:50:56,843-Speed 6308.11 samples/sec Loss 6.3523 LearningRate 0.0007 Epoch: 10 Global Step: 221190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:00,089-Speed 6310.84 samples/sec Loss 6.3964 LearningRate 0.0007 Epoch: 10 Global Step: 221200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:03,336-Speed 6308.61 samples/sec Loss 6.3908 LearningRate 0.0007 Epoch: 10 Global Step: 221210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:06,584-Speed 6306.21 samples/sec Loss 6.3656 LearningRate 0.0007 Epoch: 10 Global Step: 221220 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:51:09,816-Speed 6337.27 samples/sec Loss 6.3413 LearningRate 0.0007 Epoch: 10 Global Step: 221230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:13,069-Speed 6296.99 samples/sec Loss 6.3710 LearningRate 0.0007 Epoch: 10 Global Step: 221240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:16,318-Speed 6305.90 samples/sec Loss 6.3431 LearningRate 0.0007 Epoch: 10 Global Step: 221250 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 11:51:19,566-Speed 6306.04 samples/sec Loss 6.4086 LearningRate 0.0007 Epoch: 10 Global Step: 221260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:22,812-Speed 6311.16 samples/sec Loss 6.4239 LearningRate 0.0007 Epoch: 10 Global Step: 221270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:26,060-Speed 6306.97 samples/sec Loss 6.3372 LearningRate 0.0007 Epoch: 10 Global Step: 221280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:29,308-Speed 6307.07 samples/sec Loss 6.3666 LearningRate 0.0007 Epoch: 10 Global Step: 221290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:32,560-Speed 6298.41 samples/sec Loss 6.3411 LearningRate 0.0007 Epoch: 10 Global Step: 221300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:35,823-Speed 6277.58 samples/sec Loss 6.3056 LearningRate 0.0007 Epoch: 10 Global Step: 221310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:39,139-Speed 6178.67 samples/sec Loss 6.4179 LearningRate 0.0007 Epoch: 10 Global Step: 221320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:42,381-Speed 6318.69 samples/sec Loss 6.3970 LearningRate 0.0007 Epoch: 10 Global Step: 221330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:45,623-Speed 6317.82 samples/sec Loss 6.4632 LearningRate 0.0007 Epoch: 10 Global Step: 221340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:48,870-Speed 6309.14 samples/sec Loss 6.4207 LearningRate 0.0007 Epoch: 10 Global Step: 221350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:52,117-Speed 6308.65 samples/sec Loss 6.4652 LearningRate 0.0007 Epoch: 10 Global Step: 221360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:51:55,360-Speed 6315.53 samples/sec Loss 6.3733 LearningRate 0.0007 Epoch: 10 Global Step: 221370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
11:51:58,609-Speed 6306.81 samples/sec Loss 6.3870 LearningRate 0.0007 Epoch: 10 Global Step: 221380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:01,865-Speed 6289.85 samples/sec Loss 6.4129 LearningRate 0.0007 Epoch: 10 Global Step: 221390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:05,112-Speed 6309.69 samples/sec Loss 6.3910 LearningRate 0.0007 Epoch: 10 Global Step: 221400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:08,358-Speed 6310.71 samples/sec Loss 6.3913 LearningRate 0.0007 Epoch: 10 Global Step: 221410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:11,607-Speed 6304.29 samples/sec Loss 6.3946 LearningRate 0.0007 Epoch: 10 Global Step: 221420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:14,856-Speed 6305.32 samples/sec Loss 6.4544 LearningRate 0.0007 Epoch: 10 Global Step: 221430 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:52:18,096-Speed 6323.15 samples/sec Loss 6.4517 LearningRate 0.0007 Epoch: 10 Global Step: 221440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:21,344-Speed 6306.63 samples/sec Loss 6.4861 LearningRate 0.0007 Epoch: 10 Global Step: 221450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:24,590-Speed 6310.13 samples/sec Loss 6.3695 LearningRate 0.0007 Epoch: 10 Global Step: 221460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:27,844-Speed 6294.72 samples/sec Loss 6.3873 LearningRate 0.0007 Epoch: 10 Global Step: 221470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:31,085-Speed 6321.02 samples/sec Loss 6.4600 LearningRate 0.0007 Epoch: 10 Global Step: 221480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:34,332-Speed 6309.18 samples/sec Loss 6.3183 LearningRate 0.0007 Epoch: 10 Global Step: 221490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:37,574-Speed 6317.61 
samples/sec Loss 6.4077 LearningRate 0.0007 Epoch: 10 Global Step: 221500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:40,820-Speed 6311.36 samples/sec Loss 6.3551 LearningRate 0.0007 Epoch: 10 Global Step: 221510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:44,066-Speed 6309.95 samples/sec Loss 6.4345 LearningRate 0.0007 Epoch: 10 Global Step: 221520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:47,313-Speed 6308.35 samples/sec Loss 6.4554 LearningRate 0.0007 Epoch: 10 Global Step: 221530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:50,557-Speed 6315.22 samples/sec Loss 6.3990 LearningRate 0.0007 Epoch: 10 Global Step: 221540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:53,803-Speed 6310.57 samples/sec Loss 6.4494 LearningRate 0.0007 Epoch: 10 Global Step: 221550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:52:57,046-Speed 6315.66 samples/sec Loss 6.4418 LearningRate 0.0007 Epoch: 10 Global Step: 221560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:00,294-Speed 6307.90 samples/sec Loss 6.4221 LearningRate 0.0007 Epoch: 10 Global Step: 221570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:03,539-Speed 6312.16 samples/sec Loss 6.4295 LearningRate 0.0007 Epoch: 10 Global Step: 221580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:06,787-Speed 6307.51 samples/sec Loss 6.3196 LearningRate 0.0007 Epoch: 10 Global Step: 221590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:10,033-Speed 6309.57 samples/sec Loss 6.4075 LearningRate 0.0007 Epoch: 10 Global Step: 221600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:13,282-Speed 6306.44 samples/sec Loss 6.3485 LearningRate 0.0007 Epoch: 10 Global Step: 221610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:16,530-Speed 6306.79 samples/sec Loss 6.3314 
LearningRate 0.0007 Epoch: 10 Global Step: 221620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:19,777-Speed 6310.17 samples/sec Loss 6.2964 LearningRate 0.0007 Epoch: 10 Global Step: 221630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:23,013-Speed 6329.37 samples/sec Loss 6.4303 LearningRate 0.0007 Epoch: 10 Global Step: 221640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:26,262-Speed 6305.08 samples/sec Loss 6.4149 LearningRate 0.0007 Epoch: 10 Global Step: 221650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:29,509-Speed 6308.08 samples/sec Loss 6.3965 LearningRate 0.0007 Epoch: 10 Global Step: 221660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:32,759-Speed 6302.41 samples/sec Loss 6.3993 LearningRate 0.0007 Epoch: 10 Global Step: 221670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:36,005-Speed 6311.80 samples/sec Loss 6.4094 LearningRate 0.0007 Epoch: 10 Global Step: 221680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:39,252-Speed 6307.69 samples/sec Loss 6.4747 LearningRate 0.0007 Epoch: 10 Global Step: 221690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:42,499-Speed 6310.14 samples/sec Loss 6.3915 LearningRate 0.0007 Epoch: 10 Global Step: 221700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:45,745-Speed 6309.26 samples/sec Loss 6.3177 LearningRate 0.0007 Epoch: 10 Global Step: 221710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:48,992-Speed 6309.96 samples/sec Loss 6.3364 LearningRate 0.0007 Epoch: 10 Global Step: 221720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:52,243-Speed 6300.70 samples/sec Loss 6.3127 LearningRate 0.0007 Epoch: 10 Global Step: 221730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:53:55,488-Speed 6312.14 samples/sec Loss 6.3149 LearningRate 0.0007 Epoch: 10 
Global Step: 221740 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:53:58,726-Speed 6326.16 samples/sec Loss 6.4377 LearningRate 0.0007 Epoch: 10 Global Step: 221750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:01,971-Speed 6313.78 samples/sec Loss 6.3486 LearningRate 0.0007 Epoch: 10 Global Step: 221760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:05,216-Speed 6312.19 samples/sec Loss 6.3662 LearningRate 0.0007 Epoch: 10 Global Step: 221770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:08,459-Speed 6316.13 samples/sec Loss 6.3568 LearningRate 0.0007 Epoch: 10 Global Step: 221780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:11,705-Speed 6311.23 samples/sec Loss 6.3078 LearningRate 0.0007 Epoch: 10 Global Step: 221790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:14,951-Speed 6311.20 samples/sec Loss 6.4834 LearningRate 0.0007 Epoch: 10 Global Step: 221800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:18,200-Speed 6305.34 samples/sec Loss 6.3326 LearningRate 0.0007 Epoch: 10 Global Step: 221810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:21,448-Speed 6306.32 samples/sec Loss 6.4724 LearningRate 0.0007 Epoch: 10 Global Step: 221820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:24,692-Speed 6315.07 samples/sec Loss 6.3770 LearningRate 0.0007 Epoch: 10 Global Step: 221830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:27,938-Speed 6310.92 samples/sec Loss 6.3057 LearningRate 0.0007 Epoch: 10 Global Step: 221840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:31,176-Speed 6325.53 samples/sec Loss 6.4281 LearningRate 0.0007 Epoch: 10 Global Step: 221850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:34,425-Speed 6305.07 samples/sec Loss 6.3484 LearningRate 0.0007 Epoch: 10 Global Step: 221860 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:37,676-Speed 6300.52 samples/sec Loss 6.3352 LearningRate 0.0007 Epoch: 10 Global Step: 221870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:40,923-Speed 6308.76 samples/sec Loss 6.4186 LearningRate 0.0007 Epoch: 10 Global Step: 221880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:44,169-Speed 6311.39 samples/sec Loss 6.3897 LearningRate 0.0007 Epoch: 10 Global Step: 221890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:47,416-Speed 6308.20 samples/sec Loss 6.4557 LearningRate 0.0007 Epoch: 10 Global Step: 221900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:50,663-Speed 6309.25 samples/sec Loss 6.4281 LearningRate 0.0007 Epoch: 10 Global Step: 221910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:53,906-Speed 6317.15 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 221920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:54:57,153-Speed 6308.88 samples/sec Loss 6.4009 LearningRate 0.0007 Epoch: 10 Global Step: 221930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:00,401-Speed 6306.48 samples/sec Loss 6.3797 LearningRate 0.0007 Epoch: 10 Global Step: 221940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:03,649-Speed 6307.22 samples/sec Loss 6.3749 LearningRate 0.0007 Epoch: 10 Global Step: 221950 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:55:06,884-Speed 6331.82 samples/sec Loss 6.3623 LearningRate 0.0007 Epoch: 10 Global Step: 221960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:10,138-Speed 6294.47 samples/sec Loss 6.4215 LearningRate 0.0007 Epoch: 10 Global Step: 221970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:13,384-Speed 6311.74 samples/sec Loss 6.2952 LearningRate 0.0007 Epoch: 10 Global Step: 221980 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 11:55:16,635-Speed 6299.71 samples/sec Loss 6.3711 LearningRate 0.0007 Epoch: 10 Global Step: 221990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:19,883-Speed 6307.24 samples/sec Loss 6.3337 LearningRate 0.0007 Epoch: 10 Global Step: 222000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:23,131-Speed 6306.59 samples/sec Loss 6.4251 LearningRate 0.0007 Epoch: 10 Global Step: 222010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:26,380-Speed 6306.49 samples/sec Loss 6.3928 LearningRate 0.0007 Epoch: 10 Global Step: 222020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:29,631-Speed 6301.57 samples/sec Loss 6.4544 LearningRate 0.0007 Epoch: 10 Global Step: 222030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:32,876-Speed 6311.38 samples/sec Loss 6.3540 LearningRate 0.0007 Epoch: 10 Global Step: 222040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:36,125-Speed 6306.00 samples/sec Loss 6.3707 LearningRate 0.0007 Epoch: 10 Global Step: 222050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:39,358-Speed 6336.44 samples/sec Loss 6.3578 LearningRate 0.0007 Epoch: 10 Global Step: 222060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:42,602-Speed 6313.58 samples/sec Loss 6.4319 LearningRate 0.0007 Epoch: 10 Global Step: 222070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:45,852-Speed 6303.43 samples/sec Loss 6.3890 LearningRate 0.0007 Epoch: 10 Global Step: 222080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:49,097-Speed 6312.04 samples/sec Loss 6.3713 LearningRate 0.0007 Epoch: 10 Global Step: 222090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:52,343-Speed 6311.27 samples/sec Loss 6.3649 LearningRate 0.0007 Epoch: 10 Global Step: 222100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
11:55:55,590-Speed 6308.72 samples/sec Loss 6.4192 LearningRate 0.0007 Epoch: 10 Global Step: 222110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:55:58,835-Speed 6312.24 samples/sec Loss 6.3011 LearningRate 0.0007 Epoch: 10 Global Step: 222120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:02,081-Speed 6310.68 samples/sec Loss 6.3759 LearningRate 0.0007 Epoch: 10 Global Step: 222130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:05,323-Speed 6318.67 samples/sec Loss 6.3774 LearningRate 0.0007 Epoch: 10 Global Step: 222140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:08,570-Speed 6309.51 samples/sec Loss 6.3795 LearningRate 0.0007 Epoch: 10 Global Step: 222150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:11,814-Speed 6314.40 samples/sec Loss 6.3950 LearningRate 0.0007 Epoch: 10 Global Step: 222160 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:56:15,062-Speed 6306.50 samples/sec Loss 6.3479 LearningRate 0.0007 Epoch: 10 Global Step: 222170 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:56:18,300-Speed 6326.09 samples/sec Loss 6.5079 LearningRate 0.0007 Epoch: 10 Global Step: 222180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:21,551-Speed 6301.27 samples/sec Loss 6.2931 LearningRate 0.0007 Epoch: 10 Global Step: 222190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:24,798-Speed 6307.91 samples/sec Loss 6.3742 LearningRate 0.0007 Epoch: 10 Global Step: 222200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:28,044-Speed 6311.63 samples/sec Loss 6.4030 LearningRate 0.0007 Epoch: 10 Global Step: 222210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:31,289-Speed 6311.36 samples/sec Loss 6.2976 LearningRate 0.0007 Epoch: 10 Global Step: 222220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:34,539-Speed 6304.03 
samples/sec Loss 6.3747 LearningRate 0.0007 Epoch: 10 Global Step: 222230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:37,781-Speed 6318.55 samples/sec Loss 6.4656 LearningRate 0.0007 Epoch: 10 Global Step: 222240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:41,026-Speed 6312.70 samples/sec Loss 6.4016 LearningRate 0.0007 Epoch: 10 Global Step: 222250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:44,273-Speed 6308.48 samples/sec Loss 6.3850 LearningRate 0.0007 Epoch: 10 Global Step: 222260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:47,518-Speed 6312.84 samples/sec Loss 6.4217 LearningRate 0.0007 Epoch: 10 Global Step: 222270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:56:50,765-Speed 6309.23 samples/sec Loss 6.3908 LearningRate 0.0007 Epoch: 10 Global Step: 222280 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:56:54,010-Speed 6312.23 samples/sec Loss 6.3572 LearningRate 0.0007 Epoch: 10 Global Step: 222290 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:56:57,243-Speed 6337.49 samples/sec Loss 6.4173 LearningRate 0.0007 Epoch: 10 Global Step: 222300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:00,534-Speed 6223.46 samples/sec Loss 6.3707 LearningRate 0.0007 Epoch: 10 Global Step: 222310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:03,784-Speed 6303.52 samples/sec Loss 6.4462 LearningRate 0.0007 Epoch: 10 Global Step: 222320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:07,033-Speed 6305.66 samples/sec Loss 6.4116 LearningRate 0.0007 Epoch: 10 Global Step: 222330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:10,277-Speed 6313.57 samples/sec Loss 6.3813 LearningRate 0.0007 Epoch: 10 Global Step: 222340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:13,528-Speed 6300.01 samples/sec Loss 6.3185 
LearningRate 0.0007 Epoch: 10 Global Step: 222350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:16,774-Speed 6310.82 samples/sec Loss 6.3965 LearningRate 0.0007 Epoch: 10 Global Step: 222360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:20,017-Speed 6316.64 samples/sec Loss 6.3422 LearningRate 0.0007 Epoch: 10 Global Step: 222370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:23,263-Speed 6310.54 samples/sec Loss 6.3362 LearningRate 0.0007 Epoch: 10 Global Step: 222380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:26,511-Speed 6307.13 samples/sec Loss 6.3909 LearningRate 0.0007 Epoch: 10 Global Step: 222390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:29,744-Speed 6337.16 samples/sec Loss 6.4223 LearningRate 0.0007 Epoch: 10 Global Step: 222400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:32,994-Speed 6302.33 samples/sec Loss 6.4277 LearningRate 0.0007 Epoch: 10 Global Step: 222410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:36,238-Speed 6314.46 samples/sec Loss 6.4320 LearningRate 0.0007 Epoch: 10 Global Step: 222420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:39,485-Speed 6309.73 samples/sec Loss 6.4067 LearningRate 0.0007 Epoch: 10 Global Step: 222430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:42,739-Speed 6293.75 samples/sec Loss 6.4515 LearningRate 0.0007 Epoch: 10 Global Step: 222440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:45,995-Speed 6292.07 samples/sec Loss 6.4077 LearningRate 0.0007 Epoch: 10 Global Step: 222450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:49,240-Speed 6314.30 samples/sec Loss 6.4307 LearningRate 0.0007 Epoch: 10 Global Step: 222460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:52,488-Speed 6305.54 samples/sec Loss 6.4061 LearningRate 0.0007 Epoch: 10 
Global Step: 222470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:55,732-Speed 6315.12 samples/sec Loss 6.4266 LearningRate 0.0007 Epoch: 10 Global Step: 222480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:57:58,976-Speed 6314.45 samples/sec Loss 6.4185 LearningRate 0.0007 Epoch: 10 Global Step: 222490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:02,222-Speed 6309.67 samples/sec Loss 6.3339 LearningRate 0.0007 Epoch: 10 Global Step: 222500 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:58:05,453-Speed 6341.80 samples/sec Loss 6.3402 LearningRate 0.0007 Epoch: 10 Global Step: 222510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:08,695-Speed 6317.01 samples/sec Loss 6.5081 LearningRate 0.0007 Epoch: 10 Global Step: 222520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:11,942-Speed 6309.74 samples/sec Loss 6.3613 LearningRate 0.0007 Epoch: 10 Global Step: 222530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:15,200-Speed 6287.81 samples/sec Loss 6.4198 LearningRate 0.0007 Epoch: 10 Global Step: 222540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:18,445-Speed 6312.60 samples/sec Loss 6.4162 LearningRate 0.0007 Epoch: 10 Global Step: 222550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:21,693-Speed 6307.00 samples/sec Loss 6.3243 LearningRate 0.0007 Epoch: 10 Global Step: 222560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:24,940-Speed 6307.87 samples/sec Loss 6.3775 LearningRate 0.0007 Epoch: 10 Global Step: 222570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:28,186-Speed 6310.69 samples/sec Loss 6.4121 LearningRate 0.0007 Epoch: 10 Global Step: 222580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:31,438-Speed 6299.34 samples/sec Loss 6.3885 LearningRate 0.0007 Epoch: 10 Global Step: 222590 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:34,686-Speed 6306.60 samples/sec Loss 6.3869 LearningRate 0.0007 Epoch: 10 Global Step: 222600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:37,931-Speed 6312.92 samples/sec Loss 6.3897 LearningRate 0.0007 Epoch: 10 Global Step: 222610 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:58:41,169-Speed 6326.59 samples/sec Loss 6.3568 LearningRate 0.0007 Epoch: 10 Global Step: 222620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:44,415-Speed 6310.55 samples/sec Loss 6.4449 LearningRate 0.0007 Epoch: 10 Global Step: 222630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:47,662-Speed 6309.46 samples/sec Loss 6.4659 LearningRate 0.0007 Epoch: 10 Global Step: 222640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:50,905-Speed 6316.08 samples/sec Loss 6.3957 LearningRate 0.0007 Epoch: 10 Global Step: 222650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:54,151-Speed 6311.27 samples/sec Loss 6.3981 LearningRate 0.0007 Epoch: 10 Global Step: 222660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:58:57,396-Speed 6312.44 samples/sec Loss 6.3814 LearningRate 0.0007 Epoch: 10 Global Step: 222670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:00,641-Speed 6312.44 samples/sec Loss 6.3092 LearningRate 0.0007 Epoch: 10 Global Step: 222680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:03,888-Speed 6310.14 samples/sec Loss 6.3688 LearningRate 0.0007 Epoch: 10 Global Step: 222690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:07,141-Speed 6295.82 samples/sec Loss 6.3939 LearningRate 0.0007 Epoch: 10 Global Step: 222700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:10,388-Speed 6310.75 samples/sec Loss 6.3055 LearningRate 0.0007 Epoch: 10 Global Step: 222710 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 11:59:13,620-Speed 6337.12 samples/sec Loss 6.4490 LearningRate 0.0007 Epoch: 10 Global Step: 222720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:16,866-Speed 6311.25 samples/sec Loss 6.3048 LearningRate 0.0007 Epoch: 10 Global Step: 222730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:20,109-Speed 6316.08 samples/sec Loss 6.3021 LearningRate 0.0007 Epoch: 10 Global Step: 222740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:23,355-Speed 6311.10 samples/sec Loss 6.3764 LearningRate 0.0007 Epoch: 10 Global Step: 222750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:26,600-Speed 6311.37 samples/sec Loss 6.2684 LearningRate 0.0007 Epoch: 10 Global Step: 222760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:29,849-Speed 6306.17 samples/sec Loss 6.2865 LearningRate 0.0007 Epoch: 10 Global Step: 222770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:33,099-Speed 6301.74 samples/sec Loss 6.3729 LearningRate 0.0007 Epoch: 10 Global Step: 222780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:36,345-Speed 6311.74 samples/sec Loss 6.4310 LearningRate 0.0007 Epoch: 10 Global Step: 222790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:39,593-Speed 6306.46 samples/sec Loss 6.3699 LearningRate 0.0007 Epoch: 10 Global Step: 222800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:42,838-Speed 6313.16 samples/sec Loss 6.3432 LearningRate 0.0007 Epoch: 10 Global Step: 222810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:46,086-Speed 6306.40 samples/sec Loss 6.4108 LearningRate 0.0007 Epoch: 10 Global Step: 222820 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 11:59:49,318-Speed 6337.85 samples/sec Loss 6.4305 LearningRate 0.0007 Epoch: 10 Global Step: 222830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
11:59:52,565-Speed 6308.98 samples/sec Loss 6.4569 LearningRate 0.0007 Epoch: 10 Global Step: 222840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:55,816-Speed 6301.38 samples/sec Loss 6.3281 LearningRate 0.0007 Epoch: 10 Global Step: 222850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 11:59:59,063-Speed 6307.87 samples/sec Loss 6.4427 LearningRate 0.0007 Epoch: 10 Global Step: 222860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:02,314-Speed 6302.31 samples/sec Loss 6.3178 LearningRate 0.0007 Epoch: 10 Global Step: 222870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:05,562-Speed 6306.34 samples/sec Loss 6.4428 LearningRate 0.0007 Epoch: 10 Global Step: 222880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:08,811-Speed 6304.16 samples/sec Loss 6.3961 LearningRate 0.0007 Epoch: 10 Global Step: 222890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:12,060-Speed 6306.47 samples/sec Loss 6.4192 LearningRate 0.0007 Epoch: 10 Global Step: 222900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:15,306-Speed 6309.65 samples/sec Loss 6.3881 LearningRate 0.0007 Epoch: 10 Global Step: 222910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:18,553-Speed 6308.67 samples/sec Loss 6.3328 LearningRate 0.0007 Epoch: 10 Global Step: 222920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:21,796-Speed 6316.48 samples/sec Loss 6.3376 LearningRate 0.0007 Epoch: 10 Global Step: 222930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:25,046-Speed 6302.63 samples/sec Loss 6.3201 LearningRate 0.0007 Epoch: 10 Global Step: 222940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:28,298-Speed 6299.89 samples/sec Loss 6.3785 LearningRate 0.0007 Epoch: 10 Global Step: 222950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:31,543-Speed 6311.82 
samples/sec Loss 6.4754 LearningRate 0.0007 Epoch: 10 Global Step: 222960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:34,793-Speed 6303.23 samples/sec Loss 6.3414 LearningRate 0.0007 Epoch: 10 Global Step: 222970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:38,040-Speed 6309.54 samples/sec Loss 6.3053 LearningRate 0.0007 Epoch: 10 Global Step: 222980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:41,284-Speed 6314.86 samples/sec Loss 6.3255 LearningRate 0.0007 Epoch: 10 Global Step: 222990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:44,534-Speed 6302.26 samples/sec Loss 6.3602 LearningRate 0.0007 Epoch: 10 Global Step: 223000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:47,783-Speed 6304.77 samples/sec Loss 6.3818 LearningRate 0.0007 Epoch: 10 Global Step: 223010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:51,033-Speed 6302.38 samples/sec Loss 6.3220 LearningRate 0.0007 Epoch: 10 Global Step: 223020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:00:54,279-Speed 6312.13 samples/sec Loss 6.3178 LearningRate 0.0007 Epoch: 10 Global Step: 223030 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:00:57,529-Speed 6302.17 samples/sec Loss 6.3379 LearningRate 0.0007 Epoch: 10 Global Step: 223040 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:01:00,764-Speed 6331.55 samples/sec Loss 6.4481 LearningRate 0.0007 Epoch: 10 Global Step: 223050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:04,013-Speed 6306.52 samples/sec Loss 6.4731 LearningRate 0.0007 Epoch: 10 Global Step: 223060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:07,260-Speed 6308.50 samples/sec Loss 6.3568 LearningRate 0.0007 Epoch: 10 Global Step: 223070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:10,507-Speed 6309.61 samples/sec Loss 6.4800 
LearningRate 0.0007 Epoch: 10 Global Step: 223080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:13,755-Speed 6305.94 samples/sec Loss 6.3781 LearningRate 0.0007 Epoch: 10 Global Step: 223090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:17,002-Speed 6309.84 samples/sec Loss 6.3614 LearningRate 0.0007 Epoch: 10 Global Step: 223100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:20,248-Speed 6309.65 samples/sec Loss 6.4231 LearningRate 0.0007 Epoch: 10 Global Step: 223110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:23,498-Speed 6303.79 samples/sec Loss 6.3750 LearningRate 0.0007 Epoch: 10 Global Step: 223120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:26,747-Speed 6305.10 samples/sec Loss 6.3685 LearningRate 0.0007 Epoch: 10 Global Step: 223130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:29,991-Speed 6312.73 samples/sec Loss 6.3681 LearningRate 0.0007 Epoch: 10 Global Step: 223140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:33,228-Speed 6329.17 samples/sec Loss 6.3652 LearningRate 0.0007 Epoch: 10 Global Step: 223150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:36,496-Speed 6268.05 samples/sec Loss 6.3878 LearningRate 0.0007 Epoch: 10 Global Step: 223160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:39,741-Speed 6311.95 samples/sec Loss 6.3173 LearningRate 0.0007 Epoch: 10 Global Step: 223170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:42,988-Speed 6310.86 samples/sec Loss 6.3688 LearningRate 0.0007 Epoch: 10 Global Step: 223180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:46,236-Speed 6305.25 samples/sec Loss 6.3454 LearningRate 0.0007 Epoch: 10 Global Step: 223190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:49,480-Speed 6314.47 samples/sec Loss 6.3326 LearningRate 0.0007 Epoch: 10 
Global Step: 223200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:52,734-Speed 6295.85 samples/sec Loss 6.3846 LearningRate 0.0007 Epoch: 10 Global Step: 223210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:55,981-Speed 6309.58 samples/sec Loss 6.3409 LearningRate 0.0007 Epoch: 10 Global Step: 223220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:01:59,227-Speed 6310.19 samples/sec Loss 6.3683 LearningRate 0.0007 Epoch: 10 Global Step: 223230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:02,484-Speed 6289.15 samples/sec Loss 6.3423 LearningRate 0.0007 Epoch: 10 Global Step: 223240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:05,744-Speed 6284.02 samples/sec Loss 6.3636 LearningRate 0.0007 Epoch: 10 Global Step: 223250 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:02:08,977-Speed 6335.25 samples/sec Loss 6.3416 LearningRate 0.0007 Epoch: 10 Global Step: 223260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:12,230-Speed 6299.03 samples/sec Loss 6.3901 LearningRate 0.0007 Epoch: 10 Global Step: 223270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:15,479-Speed 6304.78 samples/sec Loss 6.4287 LearningRate 0.0007 Epoch: 10 Global Step: 223280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:18,731-Speed 6297.63 samples/sec Loss 6.3178 LearningRate 0.0007 Epoch: 10 Global Step: 223290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:21,978-Speed 6310.49 samples/sec Loss 6.4003 LearningRate 0.0007 Epoch: 10 Global Step: 223300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:25,228-Speed 6303.13 samples/sec Loss 6.3451 LearningRate 0.0007 Epoch: 10 Global Step: 223310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:28,472-Speed 6313.26 samples/sec Loss 6.4755 LearningRate 0.0007 Epoch: 10 Global Step: 223320 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:31,719-Speed 6308.70 samples/sec Loss 6.3679 LearningRate 0.0007 Epoch: 10 Global Step: 223330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:34,968-Speed 6304.80 samples/sec Loss 6.4416 LearningRate 0.0007 Epoch: 10 Global Step: 223340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:38,212-Speed 6315.41 samples/sec Loss 6.3658 LearningRate 0.0007 Epoch: 10 Global Step: 223350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:41,459-Speed 6308.32 samples/sec Loss 6.4148 LearningRate 0.0007 Epoch: 10 Global Step: 223360 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:02:44,691-Speed 6339.23 samples/sec Loss 6.3526 LearningRate 0.0007 Epoch: 10 Global Step: 223370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:47,937-Speed 6309.42 samples/sec Loss 6.3670 LearningRate 0.0007 Epoch: 10 Global Step: 223380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:51,184-Speed 6308.71 samples/sec Loss 6.3849 LearningRate 0.0007 Epoch: 10 Global Step: 223390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:54,431-Speed 6309.71 samples/sec Loss 6.3469 LearningRate 0.0007 Epoch: 10 Global Step: 223400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:02:57,680-Speed 6304.38 samples/sec Loss 6.4336 LearningRate 0.0007 Epoch: 10 Global Step: 223410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:00,925-Speed 6313.20 samples/sec Loss 6.4333 LearningRate 0.0007 Epoch: 10 Global Step: 223420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:04,173-Speed 6306.35 samples/sec Loss 6.4146 LearningRate 0.0007 Epoch: 10 Global Step: 223430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:07,423-Speed 6302.69 samples/sec Loss 6.3813 LearningRate 0.0007 Epoch: 10 Global Step: 223440 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:03:10,669-Speed 6311.80 samples/sec Loss 6.4037 LearningRate 0.0007 Epoch: 10 Global Step: 223450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:13,916-Speed 6307.84 samples/sec Loss 6.4120 LearningRate 0.0007 Epoch: 10 Global Step: 223460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:17,164-Speed 6307.11 samples/sec Loss 6.3970 LearningRate 0.0007 Epoch: 10 Global Step: 223470 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:03:20,400-Speed 6330.49 samples/sec Loss 6.4751 LearningRate 0.0007 Epoch: 10 Global Step: 223480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:23,645-Speed 6313.15 samples/sec Loss 6.3490 LearningRate 0.0007 Epoch: 10 Global Step: 223490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:26,892-Speed 6308.19 samples/sec Loss 6.4607 LearningRate 0.0007 Epoch: 10 Global Step: 223500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:30,138-Speed 6310.18 samples/sec Loss 6.3295 LearningRate 0.0007 Epoch: 10 Global Step: 223510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:33,386-Speed 6307.78 samples/sec Loss 6.3739 LearningRate 0.0007 Epoch: 10 Global Step: 223520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:36,631-Speed 6311.86 samples/sec Loss 6.4388 LearningRate 0.0007 Epoch: 10 Global Step: 223530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:39,879-Speed 6308.36 samples/sec Loss 6.3571 LearningRate 0.0007 Epoch: 10 Global Step: 223540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:43,130-Speed 6301.22 samples/sec Loss 6.3308 LearningRate 0.0007 Epoch: 10 Global Step: 223550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:46,375-Speed 6311.97 samples/sec Loss 6.4454 LearningRate 0.0007 Epoch: 10 Global Step: 223560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:03:49,623-Speed 6306.30 samples/sec Loss 6.4007 LearningRate 0.0007 Epoch: 10 Global Step: 223570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:52,857-Speed 6333.58 samples/sec Loss 6.4336 LearningRate 0.0007 Epoch: 10 Global Step: 223580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:56,109-Speed 6300.46 samples/sec Loss 6.3908 LearningRate 0.0007 Epoch: 10 Global Step: 223590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:03:59,355-Speed 6310.11 samples/sec Loss 6.3889 LearningRate 0.0007 Epoch: 10 Global Step: 223600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:02,606-Speed 6301.44 samples/sec Loss 6.4119 LearningRate 0.0007 Epoch: 10 Global Step: 223610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:05,855-Speed 6303.56 samples/sec Loss 6.3760 LearningRate 0.0007 Epoch: 10 Global Step: 223620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:09,105-Speed 6303.84 samples/sec Loss 6.4114 LearningRate 0.0007 Epoch: 10 Global Step: 223630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:12,351-Speed 6311.28 samples/sec Loss 6.3205 LearningRate 0.0007 Epoch: 10 Global Step: 223640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:15,597-Speed 6310.96 samples/sec Loss 6.3453 LearningRate 0.0007 Epoch: 10 Global Step: 223650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:18,843-Speed 6308.98 samples/sec Loss 6.2447 LearningRate 0.0007 Epoch: 10 Global Step: 223660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:22,091-Speed 6307.96 samples/sec Loss 6.3373 LearningRate 0.0007 Epoch: 10 Global Step: 223670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:25,339-Speed 6306.36 samples/sec Loss 6.3464 LearningRate 0.0007 Epoch: 10 Global Step: 223680 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:04:28,574-Speed 6332.98 
samples/sec Loss 6.3172 LearningRate 0.0007 Epoch: 10 Global Step: 223690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:31,823-Speed 6304.81 samples/sec Loss 6.4007 LearningRate 0.0007 Epoch: 10 Global Step: 223700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:35,073-Speed 6303.66 samples/sec Loss 6.3505 LearningRate 0.0007 Epoch: 10 Global Step: 223710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:38,322-Speed 6303.66 samples/sec Loss 6.3377 LearningRate 0.0007 Epoch: 10 Global Step: 223720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:41,570-Speed 6308.24 samples/sec Loss 6.3229 LearningRate 0.0007 Epoch: 10 Global Step: 223730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:44,816-Speed 6309.96 samples/sec Loss 6.3947 LearningRate 0.0007 Epoch: 10 Global Step: 223740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:48,075-Speed 6286.29 samples/sec Loss 6.3247 LearningRate 0.0007 Epoch: 10 Global Step: 223750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:51,330-Speed 6292.24 samples/sec Loss 6.4635 LearningRate 0.0007 Epoch: 10 Global Step: 223760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:54,576-Speed 6310.89 samples/sec Loss 6.3596 LearningRate 0.0007 Epoch: 10 Global Step: 223770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:04:57,820-Speed 6314.37 samples/sec Loss 6.4125 LearningRate 0.0007 Epoch: 10 Global Step: 223780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:01,054-Speed 6334.47 samples/sec Loss 6.4013 LearningRate 0.0007 Epoch: 10 Global Step: 223790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:04,299-Speed 6312.30 samples/sec Loss 6.3810 LearningRate 0.0007 Epoch: 10 Global Step: 223800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:07,544-Speed 6313.11 samples/sec Loss 6.3659 
LearningRate 0.0007 Epoch: 10 Global Step: 223810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:10,792-Speed 6307.56 samples/sec Loss 6.5093 LearningRate 0.0007 Epoch: 10 Global Step: 223820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:14,040-Speed 6306.87 samples/sec Loss 6.3719 LearningRate 0.0007 Epoch: 10 Global Step: 223830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:17,294-Speed 6294.77 samples/sec Loss 6.4081 LearningRate 0.0007 Epoch: 10 Global Step: 223840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:20,541-Speed 6308.65 samples/sec Loss 6.3990 LearningRate 0.0007 Epoch: 10 Global Step: 223850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:23,801-Speed 6282.78 samples/sec Loss 6.4093 LearningRate 0.0007 Epoch: 10 Global Step: 223860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:27,047-Speed 6311.69 samples/sec Loss 6.3788 LearningRate 0.0007 Epoch: 10 Global Step: 223870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:30,293-Speed 6309.53 samples/sec Loss 6.3887 LearningRate 0.0007 Epoch: 10 Global Step: 223880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:33,540-Speed 6309.24 samples/sec Loss 6.4192 LearningRate 0.0007 Epoch: 10 Global Step: 223890 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:05:36,773-Speed 6335.34 samples/sec Loss 6.3722 LearningRate 0.0007 Epoch: 10 Global Step: 223900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:40,104-Speed 6151.05 samples/sec Loss 6.3397 LearningRate 0.0007 Epoch: 10 Global Step: 223910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:43,355-Speed 6301.59 samples/sec Loss 6.3877 LearningRate 0.0007 Epoch: 10 Global Step: 223920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:46,602-Speed 6308.65 samples/sec Loss 6.3545 LearningRate 0.0007 Epoch: 10 
Global Step: 223930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:49,847-Speed 6312.26 samples/sec Loss 6.3485 LearningRate 0.0007 Epoch: 10 Global Step: 223940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:53,094-Speed 6309.52 samples/sec Loss 6.3992 LearningRate 0.0007 Epoch: 10 Global Step: 223950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:56,336-Speed 6317.49 samples/sec Loss 6.4057 LearningRate 0.0007 Epoch: 10 Global Step: 223960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:05:59,583-Speed 6308.74 samples/sec Loss 6.2560 LearningRate 0.0007 Epoch: 10 Global Step: 223970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:02,830-Speed 6309.50 samples/sec Loss 6.3558 LearningRate 0.0007 Epoch: 10 Global Step: 223980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:06,075-Speed 6312.19 samples/sec Loss 6.3726 LearningRate 0.0007 Epoch: 10 Global Step: 223990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:09,320-Speed 6312.38 samples/sec Loss 6.4120 LearningRate 0.0007 Epoch: 10 Global Step: 224000 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:06:12,571-Speed 6301.03 samples/sec Loss 6.3722 LearningRate 0.0007 Epoch: 10 Global Step: 224010 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:06:15,816-Speed 6312.87 samples/sec Loss 6.3558 LearningRate 0.0007 Epoch: 10 Global Step: 224020 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:06:19,051-Speed 6332.44 samples/sec Loss 6.3787 LearningRate 0.0007 Epoch: 10 Global Step: 224030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:22,305-Speed 6294.20 samples/sec Loss 6.4326 LearningRate 0.0007 Epoch: 10 Global Step: 224040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:25,553-Speed 6307.86 samples/sec Loss 6.4453 LearningRate 0.0007 Epoch: 10 Global Step: 224050 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:28,804-Speed 6301.72 samples/sec Loss 6.4985 LearningRate 0.0007 Epoch: 10 Global Step: 224060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:32,052-Speed 6305.63 samples/sec Loss 6.3562 LearningRate 0.0007 Epoch: 10 Global Step: 224070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:35,301-Speed 6304.83 samples/sec Loss 6.3263 LearningRate 0.0007 Epoch: 10 Global Step: 224080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:38,548-Speed 6309.23 samples/sec Loss 6.3972 LearningRate 0.0007 Epoch: 10 Global Step: 224090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:41,795-Speed 6308.62 samples/sec Loss 6.4031 LearningRate 0.0007 Epoch: 10 Global Step: 224100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:45,053-Speed 6286.31 samples/sec Loss 6.3655 LearningRate 0.0007 Epoch: 10 Global Step: 224110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:48,320-Speed 6271.03 samples/sec Loss 6.4143 LearningRate 0.0007 Epoch: 10 Global Step: 224120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:51,565-Speed 6313.40 samples/sec Loss 6.4749 LearningRate 0.0007 Epoch: 10 Global Step: 224130 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:06:54,799-Speed 6335.36 samples/sec Loss 6.3536 LearningRate 0.0007 Epoch: 10 Global Step: 224140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:06:58,046-Speed 6308.25 samples/sec Loss 6.3840 LearningRate 0.0007 Epoch: 10 Global Step: 224150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:01,292-Speed 6310.43 samples/sec Loss 6.3637 LearningRate 0.0007 Epoch: 10 Global Step: 224160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:04,544-Speed 6299.25 samples/sec Loss 6.3921 LearningRate 0.0007 Epoch: 10 Global Step: 224170 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:07:07,788-Speed 6313.75 samples/sec Loss 6.3674 LearningRate 0.0007 Epoch: 10 Global Step: 224180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:11,035-Speed 6309.22 samples/sec Loss 6.3617 LearningRate 0.0007 Epoch: 10 Global Step: 224190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:14,280-Speed 6312.72 samples/sec Loss 6.3908 LearningRate 0.0007 Epoch: 10 Global Step: 224200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:17,530-Speed 6302.69 samples/sec Loss 6.3848 LearningRate 0.0007 Epoch: 10 Global Step: 224210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:20,780-Speed 6302.21 samples/sec Loss 6.3775 LearningRate 0.0007 Epoch: 10 Global Step: 224220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:24,027-Speed 6309.12 samples/sec Loss 6.2862 LearningRate 0.0007 Epoch: 10 Global Step: 224230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:27,285-Speed 6287.58 samples/sec Loss 6.3931 LearningRate 0.0007 Epoch: 10 Global Step: 224240 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:07:30,516-Speed 6339.94 samples/sec Loss 6.4077 LearningRate 0.0007 Epoch: 10 Global Step: 224250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:33,764-Speed 6307.99 samples/sec Loss 6.3976 LearningRate 0.0007 Epoch: 10 Global Step: 224260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:37,014-Speed 6302.03 samples/sec Loss 6.4089 LearningRate 0.0007 Epoch: 10 Global Step: 224270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:40,268-Speed 6294.96 samples/sec Loss 6.2692 LearningRate 0.0007 Epoch: 10 Global Step: 224280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:43,521-Speed 6298.69 samples/sec Loss 6.3957 LearningRate 0.0007 Epoch: 10 Global Step: 224290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:07:46,766-Speed 6311.64 samples/sec Loss 6.3019 LearningRate 0.0007 Epoch: 10 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:50,013-Speed 6308.99 samples/sec Loss 6.4013 LearningRate 0.0007 Epoch: 10 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:53,261-Speed 6306.09 samples/sec Loss 6.3418 LearningRate 0.0007 Epoch: 10 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:56,507-Speed 6311.77 samples/sec Loss 6.3401 LearningRate 0.0007 Epoch: 10 Global Step: 224330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:07:59,753-Speed 6310.18 samples/sec Loss 6.3425 LearningRate 0.0007 Epoch: 10 Global Step: 224340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:02,988-Speed 6332.82 samples/sec Loss 6.3576 LearningRate 0.0007 Epoch: 10 Global Step: 224350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:06,240-Speed 6298.69 samples/sec Loss 6.3854 LearningRate 0.0007 Epoch: 10 Global Step: 224360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:09,491-Speed 6301.57 samples/sec Loss 6.3827 LearningRate 0.0007 Epoch: 10 Global Step: 224370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:12,738-Speed 6309.08 samples/sec Loss 6.3438 LearningRate 0.0007 Epoch: 10 Global Step: 224380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:15,985-Speed 6308.67 samples/sec Loss 6.2818 LearningRate 0.0007 Epoch: 10 Global Step: 224390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:19,232-Speed 6309.45 samples/sec Loss 6.3249 LearningRate 0.0007 Epoch: 10 Global Step: 224400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:22,478-Speed 6309.51 samples/sec Loss 6.3830 LearningRate 0.0007 Epoch: 10 Global Step: 224410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:25,726-Speed 6306.34 
samples/sec Loss 6.3018 LearningRate 0.0007 Epoch: 10 Global Step: 224420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:28,976-Speed 6304.56 samples/sec Loss 6.3915 LearningRate 0.0007 Epoch: 10 Global Step: 224430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:32,224-Speed 6306.20 samples/sec Loss 6.3477 LearningRate 0.0007 Epoch: 10 Global Step: 224440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:35,481-Speed 6289.99 samples/sec Loss 6.4030 LearningRate 0.0007 Epoch: 10 Global Step: 224450 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:08:38,732-Speed 6300.30 samples/sec Loss 6.4479 LearningRate 0.0007 Epoch: 10 Global Step: 224460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:41,981-Speed 6305.33 samples/sec Loss 6.3090 LearningRate 0.0007 Epoch: 10 Global Step: 224470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:45,231-Speed 6303.12 samples/sec Loss 6.3100 LearningRate 0.0007 Epoch: 10 Global Step: 224480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:48,477-Speed 6311.27 samples/sec Loss 6.3547 LearningRate 0.0007 Epoch: 10 Global Step: 224490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:51,744-Speed 6270.04 samples/sec Loss 6.3899 LearningRate 0.0007 Epoch: 10 Global Step: 224500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:54,991-Speed 6308.21 samples/sec Loss 6.3751 LearningRate 0.0007 Epoch: 10 Global Step: 224510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:08:58,237-Speed 6310.23 samples/sec Loss 6.4060 LearningRate 0.0007 Epoch: 10 Global Step: 224520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:01,490-Speed 6297.26 samples/sec Loss 6.3477 LearningRate 0.0007 Epoch: 10 Global Step: 224530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:04,749-Speed 6285.80 samples/sec Loss 6.4002 
LearningRate 0.0007 Epoch: 10 Global Step: 224540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:07,995-Speed 6310.81 samples/sec Loss 6.3473 LearningRate 0.0007 Epoch: 10 Global Step: 224550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:11,246-Speed 6302.45 samples/sec Loss 6.4535 LearningRate 0.0007 Epoch: 10 Global Step: 224560 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:09:14,492-Speed 6309.34 samples/sec Loss 6.3141 LearningRate 0.0007 Epoch: 10 Global Step: 224570 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:09:17,725-Speed 6338.05 samples/sec Loss 6.4671 LearningRate 0.0007 Epoch: 10 Global Step: 224580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:20,972-Speed 6308.36 samples/sec Loss 6.3835 LearningRate 0.0007 Epoch: 10 Global Step: 224590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:24,223-Speed 6300.70 samples/sec Loss 6.3769 LearningRate 0.0007 Epoch: 10 Global Step: 224600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:27,470-Speed 6307.63 samples/sec Loss 6.3628 LearningRate 0.0007 Epoch: 10 Global Step: 224610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:30,714-Speed 6315.89 samples/sec Loss 6.2445 LearningRate 0.0007 Epoch: 10 Global Step: 224620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:33,963-Speed 6304.96 samples/sec Loss 6.2841 LearningRate 0.0007 Epoch: 10 Global Step: 224630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:37,208-Speed 6311.99 samples/sec Loss 6.3386 LearningRate 0.0007 Epoch: 10 Global Step: 224640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:40,453-Speed 6312.01 samples/sec Loss 6.3223 LearningRate 0.0007 Epoch: 10 Global Step: 224650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:43,700-Speed 6308.61 samples/sec Loss 6.3506 LearningRate 0.0007 Epoch: 10 
Global Step: 224660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:46,950-Speed 6304.69 samples/sec Loss 6.4019 LearningRate 0.0007 Epoch: 10 Global Step: 224670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:50,196-Speed 6310.30 samples/sec Loss 6.3375 LearningRate 0.0007 Epoch: 10 Global Step: 224680 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:09:53,440-Speed 6314.92 samples/sec Loss 6.3673 LearningRate 0.0007 Epoch: 10 Global Step: 224690 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:09:56,673-Speed 6335.21 samples/sec Loss 6.3291 LearningRate 0.0007 Epoch: 10 Global Step: 224700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:09:59,920-Speed 6308.56 samples/sec Loss 6.3497 LearningRate 0.0007 Epoch: 10 Global Step: 224710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:03,163-Speed 6316.05 samples/sec Loss 6.3810 LearningRate 0.0007 Epoch: 10 Global Step: 224720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:06,407-Speed 6314.38 samples/sec Loss 6.3566 LearningRate 0.0007 Epoch: 10 Global Step: 224730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:09,652-Speed 6312.93 samples/sec Loss 6.4257 LearningRate 0.0007 Epoch: 10 Global Step: 224740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:12,901-Speed 6305.47 samples/sec Loss 6.4011 LearningRate 0.0007 Epoch: 10 Global Step: 224750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:16,150-Speed 6306.24 samples/sec Loss 6.3600 LearningRate 0.0007 Epoch: 10 Global Step: 224760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:19,394-Speed 6314.07 samples/sec Loss 6.3591 LearningRate 0.0007 Epoch: 10 Global Step: 224770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:22,641-Speed 6309.43 samples/sec Loss 6.3384 LearningRate 0.0007 Epoch: 10 Global Step: 224780 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:25,887-Speed 6310.45 samples/sec Loss 6.4046 LearningRate 0.0007 Epoch: 10 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:29,147-Speed 6283.48 samples/sec Loss 6.4551 LearningRate 0.0007 Epoch: 10 Global Step: 224800 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:10:32,381-Speed 6335.13 samples/sec Loss 6.2897 LearningRate 0.0007 Epoch: 10 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:35,627-Speed 6309.26 samples/sec Loss 6.3706 LearningRate 0.0007 Epoch: 10 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:38,871-Speed 6315.38 samples/sec Loss 6.3310 LearningRate 0.0007 Epoch: 10 Global Step: 224830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:42,124-Speed 6296.28 samples/sec Loss 6.3205 LearningRate 0.0007 Epoch: 10 Global Step: 224840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:45,370-Speed 6311.59 samples/sec Loss 6.3729 LearningRate 0.0007 Epoch: 10 Global Step: 224850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:48,613-Speed 6315.57 samples/sec Loss 6.2926 LearningRate 0.0007 Epoch: 10 Global Step: 224860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:51,859-Speed 6311.25 samples/sec Loss 6.3199 LearningRate 0.0007 Epoch: 10 Global Step: 224870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:55,107-Speed 6306.81 samples/sec Loss 6.3639 LearningRate 0.0007 Epoch: 10 Global Step: 224880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:10:58,347-Speed 6322.03 samples/sec Loss 6.2947 LearningRate 0.0007 Epoch: 10 Global Step: 224890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:01,595-Speed 6306.72 samples/sec Loss 6.3113 LearningRate 0.0007 Epoch: 10 Global Step: 224900 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:11:04,843-Speed 6306.95 samples/sec Loss 6.3878 LearningRate 0.0007 Epoch: 10 Global Step: 224910 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:11:08,075-Speed 6337.75 samples/sec Loss 6.3475 LearningRate 0.0007 Epoch: 10 Global Step: 224920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:11,316-Speed 6321.05 samples/sec Loss 6.3972 LearningRate 0.0007 Epoch: 10 Global Step: 224930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:14,561-Speed 6313.25 samples/sec Loss 6.3531 LearningRate 0.0007 Epoch: 10 Global Step: 224940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:17,807-Speed 6309.97 samples/sec Loss 6.3459 LearningRate 0.0007 Epoch: 10 Global Step: 224950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:21,053-Speed 6311.93 samples/sec Loss 6.3467 LearningRate 0.0007 Epoch: 10 Global Step: 224960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:24,300-Speed 6308.76 samples/sec Loss 6.3725 LearningRate 0.0007 Epoch: 10 Global Step: 224970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:27,548-Speed 6306.09 samples/sec Loss 6.3450 LearningRate 0.0007 Epoch: 10 Global Step: 224980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:30,796-Speed 6307.47 samples/sec Loss 6.3579 LearningRate 0.0007 Epoch: 10 Global Step: 224990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:34,048-Speed 6298.64 samples/sec Loss 6.2788 LearningRate 0.0007 Epoch: 10 Global Step: 225000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:37,300-Speed 6299.39 samples/sec Loss 6.3457 LearningRate 0.0007 Epoch: 10 Global Step: 225010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:40,534-Speed 6334.13 samples/sec Loss 6.3623 LearningRate 0.0007 Epoch: 10 Global Step: 225020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:11:43,779-Speed 6313.62 samples/sec Loss 6.3508 LearningRate 0.0007 Epoch: 10 Global Step: 225030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:47,026-Speed 6307.63 samples/sec Loss 6.4029 LearningRate 0.0007 Epoch: 10 Global Step: 225040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:50,274-Speed 6306.77 samples/sec Loss 6.3970 LearningRate 0.0007 Epoch: 10 Global Step: 225050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:53,519-Speed 6313.01 samples/sec Loss 6.3245 LearningRate 0.0007 Epoch: 10 Global Step: 225060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:11:56,765-Speed 6310.23 samples/sec Loss 6.3341 LearningRate 0.0007 Epoch: 10 Global Step: 225070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:00,040-Speed 6255.76 samples/sec Loss 6.3685 LearningRate 0.0007 Epoch: 10 Global Step: 225080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:03,291-Speed 6300.02 samples/sec Loss 6.2838 LearningRate 0.0007 Epoch: 10 Global Step: 225090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:06,537-Speed 6310.36 samples/sec Loss 6.4217 LearningRate 0.0007 Epoch: 10 Global Step: 225100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:09,782-Speed 6313.74 samples/sec Loss 6.2818 LearningRate 0.0007 Epoch: 10 Global Step: 225110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:13,019-Speed 6329.03 samples/sec Loss 6.4012 LearningRate 0.0007 Epoch: 10 Global Step: 225120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:16,262-Speed 6317.01 samples/sec Loss 6.3766 LearningRate 0.0007 Epoch: 10 Global Step: 225130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:19,511-Speed 6304.17 samples/sec Loss 6.4211 LearningRate 0.0007 Epoch: 10 Global Step: 225140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:22,755-Speed 6313.99 
samples/sec Loss 6.3334 LearningRate 0.0007 Epoch: 10 Global Step: 225150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:26,005-Speed 6303.73 samples/sec Loss 6.3963 LearningRate 0.0007 Epoch: 10 Global Step: 225160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:29,255-Speed 6301.74 samples/sec Loss 6.4231 LearningRate 0.0007 Epoch: 10 Global Step: 225170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:32,500-Speed 6313.86 samples/sec Loss 6.3194 LearningRate 0.0007 Epoch: 10 Global Step: 225180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:35,779-Speed 6246.65 samples/sec Loss 6.3082 LearningRate 0.0007 Epoch: 10 Global Step: 225190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:39,088-Speed 6191.18 samples/sec Loss 6.3158 LearningRate 0.0007 Epoch: 10 Global Step: 225200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:42,342-Speed 6296.43 samples/sec Loss 6.3174 LearningRate 0.0007 Epoch: 10 Global Step: 225210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:45,593-Speed 6300.50 samples/sec Loss 6.3768 LearningRate 0.0007 Epoch: 10 Global Step: 225220 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:12:48,839-Speed 6310.12 samples/sec Loss 6.3629 LearningRate 0.0007 Epoch: 10 Global Step: 225230 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:12:52,074-Speed 6331.85 samples/sec Loss 6.2134 LearningRate 0.0007 Epoch: 10 Global Step: 225240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:55,317-Speed 6317.65 samples/sec Loss 6.4007 LearningRate 0.0007 Epoch: 10 Global Step: 225250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:12:58,566-Speed 6304.88 samples/sec Loss 6.3790 LearningRate 0.0007 Epoch: 10 Global Step: 225260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:01,813-Speed 6309.18 samples/sec Loss 6.3501 
LearningRate 0.0007 Epoch: 10 Global Step: 225270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:05,063-Speed 6302.84 samples/sec Loss 6.3631 LearningRate 0.0007 Epoch: 10 Global Step: 225280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:08,307-Speed 6313.47 samples/sec Loss 6.3938 LearningRate 0.0007 Epoch: 10 Global Step: 225290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:11,559-Speed 6300.38 samples/sec Loss 6.3211 LearningRate 0.0007 Epoch: 10 Global Step: 225300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:14,848-Speed 6227.61 samples/sec Loss 6.3074 LearningRate 0.0007 Epoch: 10 Global Step: 225310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:18,185-Speed 6138.33 samples/sec Loss 6.3398 LearningRate 0.0007 Epoch: 10 Global Step: 225320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:21,432-Speed 6309.43 samples/sec Loss 6.2678 LearningRate 0.0007 Epoch: 10 Global Step: 225330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:24,668-Speed 6329.66 samples/sec Loss 6.3510 LearningRate 0.0007 Epoch: 10 Global Step: 225340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:27,921-Speed 6296.84 samples/sec Loss 6.3851 LearningRate 0.0007 Epoch: 10 Global Step: 225350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:31,177-Speed 6291.42 samples/sec Loss 6.3605 LearningRate 0.0007 Epoch: 10 Global Step: 225360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:34,427-Speed 6303.12 samples/sec Loss 6.2912 LearningRate 0.0007 Epoch: 10 Global Step: 225370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:37,675-Speed 6306.37 samples/sec Loss 6.2745 LearningRate 0.0007 Epoch: 10 Global Step: 225380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:40,923-Speed 6306.37 samples/sec Loss 6.3516 LearningRate 0.0007 Epoch: 10 
Global Step: 225390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:44,173-Speed 6303.82 samples/sec Loss 6.3856 LearningRate 0.0007 Epoch: 10 Global Step: 225400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:47,425-Speed 6299.83 samples/sec Loss 6.4659 LearningRate 0.0007 Epoch: 10 Global Step: 225410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:50,675-Speed 6302.45 samples/sec Loss 6.3023 LearningRate 0.0007 Epoch: 10 Global Step: 225420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:53,925-Speed 6304.25 samples/sec Loss 6.3037 LearningRate 0.0007 Epoch: 10 Global Step: 225430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:13:57,171-Speed 6311.39 samples/sec Loss 6.2780 LearningRate 0.0007 Epoch: 10 Global Step: 225440 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:14:00,418-Speed 6308.03 samples/sec Loss 6.3337 LearningRate 0.0007 Epoch: 10 Global Step: 225450 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:14:03,653-Speed 6332.05 samples/sec Loss 6.3870 LearningRate 0.0007 Epoch: 10 Global Step: 225460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:06,900-Speed 6309.05 samples/sec Loss 6.4486 LearningRate 0.0007 Epoch: 10 Global Step: 225470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:10,149-Speed 6304.47 samples/sec Loss 6.4043 LearningRate 0.0007 Epoch: 10 Global Step: 225480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:13,398-Speed 6305.62 samples/sec Loss 6.3576 LearningRate 0.0007 Epoch: 10 Global Step: 225490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:16,664-Speed 6270.79 samples/sec Loss 6.4154 LearningRate 0.0007 Epoch: 10 Global Step: 225500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:19,912-Speed 6306.56 samples/sec Loss 6.4090 LearningRate 0.0007 Epoch: 10 Global Step: 225510 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:23,166-Speed 6295.68 samples/sec Loss 6.3769 LearningRate 0.0007 Epoch: 10 Global Step: 225520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:26,415-Speed 6305.55 samples/sec Loss 6.2694 LearningRate 0.0007 Epoch: 10 Global Step: 225530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:29,662-Speed 6308.84 samples/sec Loss 6.3366 LearningRate 0.0007 Epoch: 10 Global Step: 225540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:32,910-Speed 6306.54 samples/sec Loss 6.2957 LearningRate 0.0007 Epoch: 10 Global Step: 225550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:36,141-Speed 6339.48 samples/sec Loss 6.4403 LearningRate 0.0007 Epoch: 10 Global Step: 225560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:39,386-Speed 6312.75 samples/sec Loss 6.2740 LearningRate 0.0007 Epoch: 10 Global Step: 225570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:42,634-Speed 6306.64 samples/sec Loss 6.3310 LearningRate 0.0007 Epoch: 10 Global Step: 225580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:45,882-Speed 6307.24 samples/sec Loss 6.3750 LearningRate 0.0007 Epoch: 10 Global Step: 225590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:49,127-Speed 6312.06 samples/sec Loss 6.3505 LearningRate 0.0007 Epoch: 10 Global Step: 225600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:52,377-Speed 6302.76 samples/sec Loss 6.3752 LearningRate 0.0007 Epoch: 10 Global Step: 225610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:55,623-Speed 6310.96 samples/sec Loss 6.3856 LearningRate 0.0007 Epoch: 10 Global Step: 225620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:14:58,873-Speed 6302.41 samples/sec Loss 6.3092 LearningRate 0.0007 Epoch: 10 Global Step: 225630 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:15:02,124-Speed 6302.00 samples/sec Loss 6.3435 LearningRate 0.0007 Epoch: 10 Global Step: 225640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:05,380-Speed 6291.50 samples/sec Loss 6.2905 LearningRate 0.0007 Epoch: 10 Global Step: 225650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:08,628-Speed 6307.96 samples/sec Loss 6.3486 LearningRate 0.0007 Epoch: 10 Global Step: 225660 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:15:11,862-Speed 6333.56 samples/sec Loss 6.2946 LearningRate 0.0007 Epoch: 10 Global Step: 225670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:15,107-Speed 6313.22 samples/sec Loss 6.3214 LearningRate 0.0007 Epoch: 10 Global Step: 225680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:18,353-Speed 6309.01 samples/sec Loss 6.3492 LearningRate 0.0007 Epoch: 10 Global Step: 225690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:21,600-Speed 6310.31 samples/sec Loss 6.4016 LearningRate 0.0007 Epoch: 10 Global Step: 225700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:24,855-Speed 6293.33 samples/sec Loss 6.3530 LearningRate 0.0007 Epoch: 10 Global Step: 225710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:28,102-Speed 6307.21 samples/sec Loss 6.3786 LearningRate 0.0007 Epoch: 10 Global Step: 225720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:31,347-Speed 6313.08 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 225730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:34,593-Speed 6310.43 samples/sec Loss 6.3623 LearningRate 0.0007 Epoch: 10 Global Step: 225740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:37,840-Speed 6308.70 samples/sec Loss 6.3628 LearningRate 0.0007 Epoch: 10 Global Step: 225750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:15:41,088-Speed 6307.12 samples/sec Loss 6.4352 LearningRate 0.0007 Epoch: 10 Global Step: 225760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:44,333-Speed 6312.20 samples/sec Loss 6.4078 LearningRate 0.0007 Epoch: 10 Global Step: 225770 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:15:47,583-Speed 6304.32 samples/sec Loss 6.3791 LearningRate 0.0007 Epoch: 10 Global Step: 225780 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:15:50,819-Speed 6330.13 samples/sec Loss 6.3702 LearningRate 0.0007 Epoch: 10 Global Step: 225790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:54,065-Speed 6309.67 samples/sec Loss 6.3210 LearningRate 0.0007 Epoch: 10 Global Step: 225800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:15:57,313-Speed 6307.12 samples/sec Loss 6.4353 LearningRate 0.0007 Epoch: 10 Global Step: 225810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:00,560-Speed 6308.04 samples/sec Loss 6.3180 LearningRate 0.0007 Epoch: 10 Global Step: 225820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:03,808-Speed 6307.34 samples/sec Loss 6.3064 LearningRate 0.0007 Epoch: 10 Global Step: 225830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:07,055-Speed 6310.16 samples/sec Loss 6.3493 LearningRate 0.0007 Epoch: 10 Global Step: 225840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:10,302-Speed 6308.52 samples/sec Loss 6.3682 LearningRate 0.0007 Epoch: 10 Global Step: 225850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:13,550-Speed 6307.56 samples/sec Loss 6.2298 LearningRate 0.0007 Epoch: 10 Global Step: 225860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:16,796-Speed 6310.63 samples/sec Loss 6.3932 LearningRate 0.0007 Epoch: 10 Global Step: 225870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:20,044-Speed 6306.75 
samples/sec Loss 6.3420 LearningRate 0.0007 Epoch: 10 Global Step: 225880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:23,279-Speed 6331.98 samples/sec Loss 6.4298 LearningRate 0.0007 Epoch: 10 Global Step: 225890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:26,523-Speed 6315.24 samples/sec Loss 6.3572 LearningRate 0.0007 Epoch: 10 Global Step: 225900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:29,769-Speed 6310.09 samples/sec Loss 6.3263 LearningRate 0.0007 Epoch: 10 Global Step: 225910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:33,016-Speed 6307.48 samples/sec Loss 6.4072 LearningRate 0.0007 Epoch: 10 Global Step: 225920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:36,265-Speed 6305.56 samples/sec Loss 6.3445 LearningRate 0.0007 Epoch: 10 Global Step: 225930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:39,590-Speed 6160.98 samples/sec Loss 6.3776 LearningRate 0.0007 Epoch: 10 Global Step: 225940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:42,838-Speed 6307.10 samples/sec Loss 6.3559 LearningRate 0.0007 Epoch: 10 Global Step: 225950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:46,086-Speed 6307.23 samples/sec Loss 6.4092 LearningRate 0.0007 Epoch: 10 Global Step: 225960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:49,333-Speed 6308.65 samples/sec Loss 6.3333 LearningRate 0.0007 Epoch: 10 Global Step: 225970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:52,583-Speed 6303.26 samples/sec Loss 6.3075 LearningRate 0.0007 Epoch: 10 Global Step: 225980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:16:55,831-Speed 6305.91 samples/sec Loss 6.3917 LearningRate 0.0007 Epoch: 10 Global Step: 225990 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:16:59,063-Speed 6337.71 samples/sec Loss 6.3795 
LearningRate 0.0007 Epoch: 10 Global Step: 226000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:02,314-Speed 6302.16 samples/sec Loss 6.3417 LearningRate 0.0007 Epoch: 10 Global Step: 226010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:05,563-Speed 6303.98 samples/sec Loss 6.3865 LearningRate 0.0007 Epoch: 10 Global Step: 226020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:08,807-Speed 6314.80 samples/sec Loss 6.4183 LearningRate 0.0007 Epoch: 10 Global Step: 226030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:12,054-Speed 6308.58 samples/sec Loss 6.2930 LearningRate 0.0007 Epoch: 10 Global Step: 226040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:15,302-Speed 6308.31 samples/sec Loss 6.3277 LearningRate 0.0007 Epoch: 10 Global Step: 226050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:18,549-Speed 6309.11 samples/sec Loss 6.3379 LearningRate 0.0007 Epoch: 10 Global Step: 226060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:21,795-Speed 6309.56 samples/sec Loss 6.3594 LearningRate 0.0007 Epoch: 10 Global Step: 226070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:25,052-Speed 6289.12 samples/sec Loss 6.3820 LearningRate 0.0007 Epoch: 10 Global Step: 226080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:28,297-Speed 6312.67 samples/sec Loss 6.3253 LearningRate 0.0007 Epoch: 10 Global Step: 226090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:31,545-Speed 6306.55 samples/sec Loss 6.4069 LearningRate 0.0007 Epoch: 10 Global Step: 226100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:34,790-Speed 6313.16 samples/sec Loss 6.2692 LearningRate 0.0007 Epoch: 10 Global Step: 226110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:38,036-Speed 6310.03 samples/sec Loss 6.3599 LearningRate 0.0007 Epoch: 10 
Global Step: 226120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:41,282-Speed 6311.76 samples/sec Loss 6.3959 LearningRate 0.0007 Epoch: 10 Global Step: 226130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:44,526-Speed 6313.69 samples/sec Loss 6.3648 LearningRate 0.0007 Epoch: 10 Global Step: 226140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:47,781-Speed 6294.80 samples/sec Loss 6.2851 LearningRate 0.0007 Epoch: 10 Global Step: 226150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:51,025-Speed 6314.28 samples/sec Loss 6.3151 LearningRate 0.0007 Epoch: 10 Global Step: 226160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:54,272-Speed 6309.16 samples/sec Loss 6.3191 LearningRate 0.0007 Epoch: 10 Global Step: 226170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:17:57,518-Speed 6309.58 samples/sec Loss 6.3162 LearningRate 0.0007 Epoch: 10 Global Step: 226180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:00,762-Speed 6313.90 samples/sec Loss 6.3518 LearningRate 0.0007 Epoch: 10 Global Step: 226190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:04,013-Speed 6301.99 samples/sec Loss 6.3309 LearningRate 0.0007 Epoch: 10 Global Step: 226200 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:18:07,241-Speed 6345.73 samples/sec Loss 6.3251 LearningRate 0.0007 Epoch: 10 Global Step: 226210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:10,486-Speed 6312.29 samples/sec Loss 6.3215 LearningRate 0.0007 Epoch: 10 Global Step: 226220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:13,730-Speed 6314.89 samples/sec Loss 6.4083 LearningRate 0.0007 Epoch: 10 Global Step: 226230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:16,978-Speed 6306.64 samples/sec Loss 6.3534 LearningRate 0.0007 Epoch: 10 Global Step: 226240 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:20,224-Speed 6311.80 samples/sec Loss 6.3107 LearningRate 0.0007 Epoch: 10 Global Step: 226250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:23,473-Speed 6305.31 samples/sec Loss 6.3577 LearningRate 0.0007 Epoch: 10 Global Step: 226260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:26,719-Speed 6310.67 samples/sec Loss 6.2721 LearningRate 0.0007 Epoch: 10 Global Step: 226270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:29,965-Speed 6311.17 samples/sec Loss 6.3488 LearningRate 0.0007 Epoch: 10 Global Step: 226280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:33,211-Speed 6309.54 samples/sec Loss 6.4530 LearningRate 0.0007 Epoch: 10 Global Step: 226290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:36,462-Speed 6302.02 samples/sec Loss 6.3565 LearningRate 0.0007 Epoch: 10 Global Step: 226300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:39,708-Speed 6310.50 samples/sec Loss 6.3384 LearningRate 0.0007 Epoch: 10 Global Step: 226310 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:18:42,939-Speed 6339.60 samples/sec Loss 6.3204 LearningRate 0.0007 Epoch: 10 Global Step: 226320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:46,184-Speed 6312.66 samples/sec Loss 6.3763 LearningRate 0.0007 Epoch: 10 Global Step: 226330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:49,435-Speed 6300.77 samples/sec Loss 6.3125 LearningRate 0.0007 Epoch: 10 Global Step: 226340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:52,681-Speed 6310.58 samples/sec Loss 6.3122 LearningRate 0.0007 Epoch: 10 Global Step: 226350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:18:55,930-Speed 6305.90 samples/sec Loss 6.3573 LearningRate 0.0007 Epoch: 10 Global Step: 226360 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:18:59,184-Speed 6294.46 samples/sec Loss 6.3697 LearningRate 0.0007 Epoch: 10 Global Step: 226370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:02,432-Speed 6306.87 samples/sec Loss 6.3604 LearningRate 0.0007 Epoch: 10 Global Step: 226380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:05,677-Speed 6313.00 samples/sec Loss 6.3834 LearningRate 0.0007 Epoch: 10 Global Step: 226390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:08,922-Speed 6312.36 samples/sec Loss 6.2665 LearningRate 0.0007 Epoch: 10 Global Step: 226400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:12,169-Speed 6308.09 samples/sec Loss 6.3894 LearningRate 0.0007 Epoch: 10 Global Step: 226410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:15,398-Speed 6345.44 samples/sec Loss 6.2805 LearningRate 0.0007 Epoch: 10 Global Step: 226420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:18,650-Speed 6298.80 samples/sec Loss 6.3478 LearningRate 0.0007 Epoch: 10 Global Step: 226430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:21,897-Speed 6308.59 samples/sec Loss 6.3789 LearningRate 0.0007 Epoch: 10 Global Step: 226440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:25,142-Speed 6313.86 samples/sec Loss 6.3490 LearningRate 0.0007 Epoch: 10 Global Step: 226450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:28,388-Speed 6310.45 samples/sec Loss 6.3735 LearningRate 0.0007 Epoch: 10 Global Step: 226460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:31,632-Speed 6314.05 samples/sec Loss 6.4132 LearningRate 0.0007 Epoch: 10 Global Step: 226470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:34,901-Speed 6266.82 samples/sec Loss 6.3343 LearningRate 0.0007 Epoch: 10 Global Step: 226480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:19:38,180-Speed 6247.38 samples/sec Loss 6.4021 LearningRate 0.0007 Epoch: 10 Global Step: 226490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:41,427-Speed 6308.50 samples/sec Loss 6.4228 LearningRate 0.0007 Epoch: 10 Global Step: 226500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:44,675-Speed 6306.50 samples/sec Loss 6.3015 LearningRate 0.0007 Epoch: 10 Global Step: 226510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:47,903-Speed 6345.53 samples/sec Loss 6.3855 LearningRate 0.0007 Epoch: 10 Global Step: 226520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:51,147-Speed 6315.35 samples/sec Loss 6.3127 LearningRate 0.0007 Epoch: 10 Global Step: 226530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:54,393-Speed 6310.25 samples/sec Loss 6.3753 LearningRate 0.0007 Epoch: 10 Global Step: 226540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:19:57,635-Speed 6318.02 samples/sec Loss 6.3594 LearningRate 0.0007 Epoch: 10 Global Step: 226550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:00,880-Speed 6312.80 samples/sec Loss 6.3290 LearningRate 0.0007 Epoch: 10 Global Step: 226560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:04,128-Speed 6306.88 samples/sec Loss 6.4214 LearningRate 0.0007 Epoch: 10 Global Step: 226570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:07,371-Speed 6317.28 samples/sec Loss 6.3052 LearningRate 0.0007 Epoch: 10 Global Step: 226580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:10,616-Speed 6312.33 samples/sec Loss 6.2904 LearningRate 0.0007 Epoch: 10 Global Step: 226590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:13,864-Speed 6306.28 samples/sec Loss 6.3627 LearningRate 0.0007 Epoch: 10 Global Step: 226600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:17,108-Speed 6316.12 
samples/sec Loss 6.4039 LearningRate 0.0007 Epoch: 10 Global Step: 226610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:20,337-Speed 6342.81 samples/sec Loss 6.3297 LearningRate 0.0007 Epoch: 10 Global Step: 226620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:23,586-Speed 6305.22 samples/sec Loss 6.4275 LearningRate 0.0007 Epoch: 10 Global Step: 226630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:26,831-Speed 6312.91 samples/sec Loss 6.2844 LearningRate 0.0007 Epoch: 10 Global Step: 226640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:30,076-Speed 6311.59 samples/sec Loss 6.3367 LearningRate 0.0007 Epoch: 10 Global Step: 226650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:33,322-Speed 6311.25 samples/sec Loss 6.2997 LearningRate 0.0007 Epoch: 10 Global Step: 226660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:36,570-Speed 6308.11 samples/sec Loss 6.3360 LearningRate 0.0007 Epoch: 10 Global Step: 226670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:39,815-Speed 6312.48 samples/sec Loss 6.3923 LearningRate 0.0007 Epoch: 10 Global Step: 226680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:43,063-Speed 6306.82 samples/sec Loss 6.3564 LearningRate 0.0007 Epoch: 10 Global Step: 226690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:46,308-Speed 6313.68 samples/sec Loss 6.3012 LearningRate 0.0007 Epoch: 10 Global Step: 226700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:49,554-Speed 6310.42 samples/sec Loss 6.4000 LearningRate 0.0007 Epoch: 10 Global Step: 226710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:52,813-Speed 6284.62 samples/sec Loss 6.3979 LearningRate 0.0007 Epoch: 10 Global Step: 226720 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:20:56,066-Speed 6296.59 samples/sec Loss 6.3650 
LearningRate 0.0007 Epoch: 10 Global Step: 226730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:20:59,316-Speed 6304.65 samples/sec Loss 6.4087 LearningRate 0.0007 Epoch: 10 Global Step: 226740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:02,563-Speed 6307.79 samples/sec Loss 6.4168 LearningRate 0.0007 Epoch: 10 Global Step: 226750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:05,851-Speed 6229.74 samples/sec Loss 6.3755 LearningRate 0.0007 Epoch: 10 Global Step: 226760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:09,097-Speed 6310.83 samples/sec Loss 6.3609 LearningRate 0.0007 Epoch: 10 Global Step: 226770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:12,343-Speed 6312.00 samples/sec Loss 6.3556 LearningRate 0.0007 Epoch: 10 Global Step: 226780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:15,590-Speed 6307.18 samples/sec Loss 6.3612 LearningRate 0.0007 Epoch: 10 Global Step: 226790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:18,838-Speed 6306.99 samples/sec Loss 6.3247 LearningRate 0.0007 Epoch: 10 Global Step: 226800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:22,086-Speed 6307.78 samples/sec Loss 6.2584 LearningRate 0.0007 Epoch: 10 Global Step: 226810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:25,332-Speed 6309.32 samples/sec Loss 6.2955 LearningRate 0.0007 Epoch: 10 Global Step: 226820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:28,582-Speed 6304.51 samples/sec Loss 6.3896 LearningRate 0.0007 Epoch: 10 Global Step: 226830 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:21:31,815-Speed 6336.15 samples/sec Loss 6.3269 LearningRate 0.0007 Epoch: 10 Global Step: 226840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:35,077-Speed 6278.97 samples/sec Loss 6.3394 LearningRate 0.0007 Epoch: 10 
Global Step: 226850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:38,324-Speed 6308.73 samples/sec Loss 6.3534 LearningRate 0.0007 Epoch: 10 Global Step: 226860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:41,571-Speed 6308.78 samples/sec Loss 6.3050 LearningRate 0.0007 Epoch: 10 Global Step: 226870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:44,815-Speed 6315.29 samples/sec Loss 6.2396 LearningRate 0.0007 Epoch: 10 Global Step: 226880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:48,064-Speed 6305.34 samples/sec Loss 6.3234 LearningRate 0.0007 Epoch: 10 Global Step: 226890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:51,306-Speed 6317.44 samples/sec Loss 6.3402 LearningRate 0.0007 Epoch: 10 Global Step: 226900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:54,557-Speed 6301.10 samples/sec Loss 6.3449 LearningRate 0.0007 Epoch: 10 Global Step: 226910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:21:57,803-Speed 6310.67 samples/sec Loss 6.3309 LearningRate 0.0007 Epoch: 10 Global Step: 226920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:01,049-Speed 6311.40 samples/sec Loss 6.4223 LearningRate 0.0007 Epoch: 10 Global Step: 226930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:04,284-Speed 6332.24 samples/sec Loss 6.3664 LearningRate 0.0007 Epoch: 10 Global Step: 226940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:07,529-Speed 6313.03 samples/sec Loss 6.2949 LearningRate 0.0007 Epoch: 10 Global Step: 226950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:10,785-Speed 6290.69 samples/sec Loss 6.2965 LearningRate 0.0007 Epoch: 10 Global Step: 226960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:14,033-Speed 6307.92 samples/sec Loss 6.2736 LearningRate 0.0007 Epoch: 10 Global Step: 226970 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:17,280-Speed 6308.49 samples/sec Loss 6.2930 LearningRate 0.0007 Epoch: 10 Global Step: 226980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:20,529-Speed 6304.74 samples/sec Loss 6.3704 LearningRate 0.0007 Epoch: 10 Global Step: 226990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:23,786-Speed 6288.22 samples/sec Loss 6.3295 LearningRate 0.0007 Epoch: 10 Global Step: 227000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:27,031-Speed 6313.61 samples/sec Loss 6.3815 LearningRate 0.0007 Epoch: 10 Global Step: 227010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:30,277-Speed 6310.55 samples/sec Loss 6.3345 LearningRate 0.0007 Epoch: 10 Global Step: 227020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:33,523-Speed 6311.16 samples/sec Loss 6.3314 LearningRate 0.0007 Epoch: 10 Global Step: 227030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:36,755-Speed 6337.69 samples/sec Loss 6.3003 LearningRate 0.0007 Epoch: 10 Global Step: 227040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:39,998-Speed 6314.86 samples/sec Loss 6.3118 LearningRate 0.0007 Epoch: 10 Global Step: 227050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:43,246-Speed 6308.63 samples/sec Loss 6.3548 LearningRate 0.0007 Epoch: 10 Global Step: 227060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:46,487-Speed 6319.02 samples/sec Loss 6.3517 LearningRate 0.0007 Epoch: 10 Global Step: 227070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:49,733-Speed 6311.11 samples/sec Loss 6.3170 LearningRate 0.0007 Epoch: 10 Global Step: 227080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:52,979-Speed 6311.44 samples/sec Loss 6.4162 LearningRate 0.0007 Epoch: 10 Global Step: 227090 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:22:56,225-Speed 6310.84 samples/sec Loss 6.3073 LearningRate 0.0007 Epoch: 10 Global Step: 227100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:22:59,477-Speed 6299.55 samples/sec Loss 6.3593 LearningRate 0.0007 Epoch: 10 Global Step: 227110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:02,727-Speed 6302.47 samples/sec Loss 6.4100 LearningRate 0.0007 Epoch: 10 Global Step: 227120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:05,978-Speed 6302.38 samples/sec Loss 6.3311 LearningRate 0.0007 Epoch: 10 Global Step: 227130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:09,228-Speed 6302.16 samples/sec Loss 6.3653 LearningRate 0.0007 Epoch: 10 Global Step: 227140 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:23:12,457-Speed 6343.87 samples/sec Loss 6.3946 LearningRate 0.0007 Epoch: 10 Global Step: 227150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:15,703-Speed 6310.43 samples/sec Loss 6.3677 LearningRate 0.0007 Epoch: 10 Global Step: 227160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:18,958-Speed 6294.21 samples/sec Loss 6.3014 LearningRate 0.0007 Epoch: 10 Global Step: 227170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:22,211-Speed 6295.95 samples/sec Loss 6.3538 LearningRate 0.0007 Epoch: 10 Global Step: 227180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:25,457-Speed 6310.99 samples/sec Loss 6.4366 LearningRate 0.0007 Epoch: 10 Global Step: 227190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:28,706-Speed 6305.85 samples/sec Loss 6.3874 LearningRate 0.0007 Epoch: 10 Global Step: 227200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:31,951-Speed 6313.00 samples/sec Loss 6.4075 LearningRate 0.0007 Epoch: 10 Global Step: 227210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:23:35,195-Speed 6313.02 samples/sec Loss 6.3684 LearningRate 0.0007 Epoch: 10 Global Step: 227220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:38,446-Speed 6302.22 samples/sec Loss 6.3509 LearningRate 0.0007 Epoch: 10 Global Step: 227230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:41,691-Speed 6312.98 samples/sec Loss 6.3430 LearningRate 0.0007 Epoch: 10 Global Step: 227240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:44,937-Speed 6308.96 samples/sec Loss 6.2953 LearningRate 0.0007 Epoch: 10 Global Step: 227250 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:23:48,195-Speed 6288.39 samples/sec Loss 6.2658 LearningRate 0.0007 Epoch: 10 Global Step: 227260 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:23:51,429-Speed 6334.96 samples/sec Loss 6.3903 LearningRate 0.0007 Epoch: 10 Global Step: 227270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:54,676-Speed 6307.25 samples/sec Loss 6.3684 LearningRate 0.0007 Epoch: 10 Global Step: 227280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:23:57,924-Speed 6308.45 samples/sec Loss 6.3528 LearningRate 0.0007 Epoch: 10 Global Step: 227290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:01,169-Speed 6311.10 samples/sec Loss 6.3730 LearningRate 0.0007 Epoch: 10 Global Step: 227300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:04,418-Speed 6305.85 samples/sec Loss 6.3953 LearningRate 0.0007 Epoch: 10 Global Step: 227310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:07,664-Speed 6310.32 samples/sec Loss 6.3350 LearningRate 0.0007 Epoch: 10 Global Step: 227320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:10,911-Speed 6308.86 samples/sec Loss 6.3262 LearningRate 0.0007 Epoch: 10 Global Step: 227330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:14,156-Speed 6313.35 
samples/sec Loss 6.3523 LearningRate 0.0007 Epoch: 10 Global Step: 227340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:17,402-Speed 6311.38 samples/sec Loss 6.3643 LearningRate 0.0007 Epoch: 10 Global Step: 227350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:20,647-Speed 6312.18 samples/sec Loss 6.2904 LearningRate 0.0007 Epoch: 10 Global Step: 227360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:23,890-Speed 6317.06 samples/sec Loss 6.3369 LearningRate 0.0007 Epoch: 10 Global Step: 227370 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:24:27,131-Speed 6319.54 samples/sec Loss 6.3045 LearningRate 0.0007 Epoch: 10 Global Step: 227380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:30,377-Speed 6311.82 samples/sec Loss 6.2301 LearningRate 0.0007 Epoch: 10 Global Step: 227390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:33,623-Speed 6309.32 samples/sec Loss 6.2731 LearningRate 0.0007 Epoch: 10 Global Step: 227400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:36,869-Speed 6311.12 samples/sec Loss 6.4001 LearningRate 0.0007 Epoch: 10 Global Step: 227410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:40,114-Speed 6312.00 samples/sec Loss 6.3060 LearningRate 0.0007 Epoch: 10 Global Step: 227420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:43,362-Speed 6307.66 samples/sec Loss 6.4091 LearningRate 0.0007 Epoch: 10 Global Step: 227430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:46,608-Speed 6310.75 samples/sec Loss 6.3177 LearningRate 0.0007 Epoch: 10 Global Step: 227440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:49,857-Speed 6304.73 samples/sec Loss 6.3777 LearningRate 0.0007 Epoch: 10 Global Step: 227450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:53,102-Speed 6311.80 samples/sec Loss 6.2594 
LearningRate 0.0007 Epoch: 10 Global Step: 227460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:56,349-Speed 6309.25 samples/sec Loss 6.3849 LearningRate 0.0007 Epoch: 10 Global Step: 227470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:24:59,598-Speed 6305.53 samples/sec Loss 6.3329 LearningRate 0.0007 Epoch: 10 Global Step: 227480 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:25:02,829-Speed 6339.67 samples/sec Loss 6.3699 LearningRate 0.0007 Epoch: 10 Global Step: 227490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:06,073-Speed 6313.68 samples/sec Loss 6.3819 LearningRate 0.0007 Epoch: 10 Global Step: 227500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:09,321-Speed 6306.47 samples/sec Loss 6.3341 LearningRate 0.0007 Epoch: 10 Global Step: 227510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:12,572-Speed 6301.78 samples/sec Loss 6.3624 LearningRate 0.0007 Epoch: 10 Global Step: 227520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:15,818-Speed 6310.78 samples/sec Loss 6.3705 LearningRate 0.0007 Epoch: 10 Global Step: 227530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:19,065-Speed 6309.25 samples/sec Loss 6.3246 LearningRate 0.0007 Epoch: 10 Global Step: 227540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:22,317-Speed 6300.11 samples/sec Loss 6.3227 LearningRate 0.0007 Epoch: 10 Global Step: 227550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:25,562-Speed 6312.64 samples/sec Loss 6.2602 LearningRate 0.0007 Epoch: 10 Global Step: 227560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:28,807-Speed 6311.12 samples/sec Loss 6.2423 LearningRate 0.0007 Epoch: 10 Global Step: 227570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:32,051-Speed 6314.86 samples/sec Loss 6.3485 LearningRate 0.0007 Epoch: 10 
Global Step: 227580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:35,285-Speed 6335.05 samples/sec Loss 6.3590 LearningRate 0.0007 Epoch: 10 Global Step: 227590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:38,532-Speed 6309.14 samples/sec Loss 6.3117 LearningRate 0.0007 Epoch: 10 Global Step: 227600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:41,782-Speed 6302.24 samples/sec Loss 6.4018 LearningRate 0.0007 Epoch: 10 Global Step: 227610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:45,053-Speed 6263.58 samples/sec Loss 6.3268 LearningRate 0.0007 Epoch: 10 Global Step: 227620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:48,297-Speed 6313.97 samples/sec Loss 6.2445 LearningRate 0.0006 Epoch: 10 Global Step: 227630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:51,580-Speed 6238.79 samples/sec Loss 6.3583 LearningRate 0.0006 Epoch: 10 Global Step: 227640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:54,829-Speed 6305.34 samples/sec Loss 6.3601 LearningRate 0.0006 Epoch: 10 Global Step: 227650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:25:58,076-Speed 6308.90 samples/sec Loss 6.3964 LearningRate 0.0006 Epoch: 10 Global Step: 227660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:01,322-Speed 6311.38 samples/sec Loss 6.2337 LearningRate 0.0006 Epoch: 10 Global Step: 227670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:04,564-Speed 6317.80 samples/sec Loss 6.3350 LearningRate 0.0006 Epoch: 10 Global Step: 227680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:07,810-Speed 6310.19 samples/sec Loss 6.2694 LearningRate 0.0006 Epoch: 10 Global Step: 227690 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:26:11,043-Speed 6336.59 samples/sec Loss 6.2855 LearningRate 0.0006 Epoch: 10 Global Step: 227700 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:14,291-Speed 6306.20 samples/sec Loss 6.3329 LearningRate 0.0006 Epoch: 10 Global Step: 227710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:17,538-Speed 6310.26 samples/sec Loss 6.3753 LearningRate 0.0006 Epoch: 10 Global Step: 227720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:20,784-Speed 6308.83 samples/sec Loss 6.4126 LearningRate 0.0006 Epoch: 10 Global Step: 227730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:24,033-Speed 6306.18 samples/sec Loss 6.3113 LearningRate 0.0006 Epoch: 10 Global Step: 227740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:27,281-Speed 6307.04 samples/sec Loss 6.3153 LearningRate 0.0006 Epoch: 10 Global Step: 227750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:30,527-Speed 6310.15 samples/sec Loss 6.3575 LearningRate 0.0006 Epoch: 10 Global Step: 227760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:33,771-Speed 6314.75 samples/sec Loss 6.3516 LearningRate 0.0006 Epoch: 10 Global Step: 227770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:37,020-Speed 6305.27 samples/sec Loss 6.3724 LearningRate 0.0006 Epoch: 10 Global Step: 227780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:40,279-Speed 6287.34 samples/sec Loss 6.4339 LearningRate 0.0006 Epoch: 10 Global Step: 227790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:43,525-Speed 6310.56 samples/sec Loss 6.2540 LearningRate 0.0006 Epoch: 10 Global Step: 227800 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:26:46,776-Speed 6301.21 samples/sec Loss 6.3921 LearningRate 0.0006 Epoch: 10 Global Step: 227810 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:26:50,004-Speed 6344.39 samples/sec Loss 6.3604 LearningRate 0.0006 Epoch: 10 Global Step: 227820 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:26:53,251-Speed 6308.40 samples/sec Loss 6.3655 LearningRate 0.0006 Epoch: 10 Global Step: 227830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:56,499-Speed 6308.33 samples/sec Loss 6.3342 LearningRate 0.0006 Epoch: 10 Global Step: 227840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:26:59,745-Speed 6309.27 samples/sec Loss 6.3187 LearningRate 0.0006 Epoch: 10 Global Step: 227850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:02,992-Speed 6310.14 samples/sec Loss 6.3288 LearningRate 0.0006 Epoch: 10 Global Step: 227860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:06,234-Speed 6317.63 samples/sec Loss 6.3793 LearningRate 0.0006 Epoch: 10 Global Step: 227870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:09,480-Speed 6310.16 samples/sec Loss 6.4508 LearningRate 0.0006 Epoch: 10 Global Step: 227880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:12,729-Speed 6305.34 samples/sec Loss 6.3621 LearningRate 0.0006 Epoch: 10 Global Step: 227890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:15,974-Speed 6312.40 samples/sec Loss 6.4280 LearningRate 0.0006 Epoch: 10 Global Step: 227900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:19,220-Speed 6311.52 samples/sec Loss 6.2567 LearningRate 0.0006 Epoch: 10 Global Step: 227910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:22,449-Speed 6343.83 samples/sec Loss 6.3404 LearningRate 0.0006 Epoch: 10 Global Step: 227920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:25,692-Speed 6315.94 samples/sec Loss 6.3852 LearningRate 0.0006 Epoch: 10 Global Step: 227930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:28,937-Speed 6312.26 samples/sec Loss 6.3310 LearningRate 0.0006 Epoch: 10 Global Step: 227940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:27:32,182-Speed 6313.15 samples/sec Loss 6.3089 LearningRate 0.0006 Epoch: 10 Global Step: 227950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:35,431-Speed 6305.17 samples/sec Loss 6.3104 LearningRate 0.0006 Epoch: 10 Global Step: 227960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:38,675-Speed 6315.22 samples/sec Loss 6.4036 LearningRate 0.0006 Epoch: 10 Global Step: 227970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:41,921-Speed 6311.26 samples/sec Loss 6.3749 LearningRate 0.0006 Epoch: 10 Global Step: 227980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:45,167-Speed 6310.46 samples/sec Loss 6.3288 LearningRate 0.0006 Epoch: 10 Global Step: 227990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:48,414-Speed 6309.23 samples/sec Loss 6.3311 LearningRate 0.0006 Epoch: 10 Global Step: 228000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:51,660-Speed 6310.14 samples/sec Loss 6.3475 LearningRate 0.0006 Epoch: 10 Global Step: 228010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:54,888-Speed 6346.15 samples/sec Loss 6.2866 LearningRate 0.0006 Epoch: 10 Global Step: 228020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:27:58,136-Speed 6306.35 samples/sec Loss 6.3757 LearningRate 0.0006 Epoch: 10 Global Step: 228030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:01,378-Speed 6318.66 samples/sec Loss 6.3732 LearningRate 0.0006 Epoch: 10 Global Step: 228040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:04,627-Speed 6303.68 samples/sec Loss 6.4122 LearningRate 0.0006 Epoch: 10 Global Step: 228050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:07,875-Speed 6308.04 samples/sec Loss 6.3436 LearningRate 0.0006 Epoch: 10 Global Step: 228060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:11,116-Speed 6321.12 
samples/sec Loss 6.2771 LearningRate 0.0006 Epoch: 10 Global Step: 228070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:14,362-Speed 6310.42 samples/sec Loss 6.3354 LearningRate 0.0006 Epoch: 10 Global Step: 228080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:17,611-Speed 6305.18 samples/sec Loss 6.3137 LearningRate 0.0006 Epoch: 10 Global Step: 228090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:20,856-Speed 6311.04 samples/sec Loss 6.2868 LearningRate 0.0006 Epoch: 10 Global Step: 228100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:24,106-Speed 6303.51 samples/sec Loss 6.3167 LearningRate 0.0006 Epoch: 10 Global Step: 228110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:28:27,353-Speed 6308.50 samples/sec Loss 6.3890 LearningRate 0.0006 Epoch: 10 Global Step: 228120 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:28:30,582-Speed 6343.99 samples/sec Loss 6.3990 LearningRate 0.0006 Epoch: 10 Global Step: 228130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:30,799-Speed 340.11 samples/sec Loss 6.3907 LearningRate 0.0006 Epoch: 11 Global Step: 228140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:34,035-Speed 6330.57 samples/sec Loss 6.3245 LearningRate 0.0006 Epoch: 11 Global Step: 228150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:37,275-Speed 6322.39 samples/sec Loss 6.3374 LearningRate 0.0006 Epoch: 11 Global Step: 228160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:40,509-Speed 6333.79 samples/sec Loss 6.3192 LearningRate 0.0006 Epoch: 11 Global Step: 228170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:43,744-Speed 6333.66 samples/sec Loss 6.3988 LearningRate 0.0006 Epoch: 11 Global Step: 228180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:46,980-Speed 6329.90 samples/sec Loss 6.3333 
LearningRate 0.0006 Epoch: 11 Global Step: 228190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:50,220-Speed 6321.63 samples/sec Loss 6.3381 LearningRate 0.0006 Epoch: 11 Global Step: 228200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:53,464-Speed 6314.75 samples/sec Loss 6.3000 LearningRate 0.0006 Epoch: 11 Global Step: 228210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:56,710-Speed 6311.64 samples/sec Loss 6.3598 LearningRate 0.0006 Epoch: 11 Global Step: 228220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:29:59,953-Speed 6315.91 samples/sec Loss 6.3335 LearningRate 0.0006 Epoch: 11 Global Step: 228230 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:30:03,177-Speed 6354.66 samples/sec Loss 6.4269 LearningRate 0.0006 Epoch: 11 Global Step: 228240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:06,417-Speed 6321.94 samples/sec Loss 6.2399 LearningRate 0.0006 Epoch: 11 Global Step: 228250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:09,653-Speed 6329.88 samples/sec Loss 6.3888 LearningRate 0.0006 Epoch: 11 Global Step: 228260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:12,890-Speed 6327.53 samples/sec Loss 6.3658 LearningRate 0.0006 Epoch: 11 Global Step: 228270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:16,127-Speed 6329.77 samples/sec Loss 6.2765 LearningRate 0.0006 Epoch: 11 Global Step: 228280 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:19,365-Speed 6325.44 samples/sec Loss 6.3413 LearningRate 0.0006 Epoch: 11 Global Step: 228290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:22,606-Speed 6320.00 samples/sec Loss 6.3336 LearningRate 0.0006 Epoch: 11 Global Step: 228300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:25,842-Speed 6329.59 samples/sec Loss 6.3441 LearningRate 0.0006 Epoch: 11 
Global Step: 228310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:29,081-Speed 6325.14 samples/sec Loss 6.3468 LearningRate 0.0006 Epoch: 11 Global Step: 228320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:32,321-Speed 6323.30 samples/sec Loss 6.3772 LearningRate 0.0006 Epoch: 11 Global Step: 228330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:35,549-Speed 6344.77 samples/sec Loss 6.3604 LearningRate 0.0006 Epoch: 11 Global Step: 228340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:38,791-Speed 6318.55 samples/sec Loss 6.3281 LearningRate 0.0006 Epoch: 11 Global Step: 228350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:42,029-Speed 6325.74 samples/sec Loss 6.3482 LearningRate 0.0006 Epoch: 11 Global Step: 228360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:45,271-Speed 6318.33 samples/sec Loss 6.2784 LearningRate 0.0006 Epoch: 11 Global Step: 228370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:48,512-Speed 6322.93 samples/sec Loss 6.2801 LearningRate 0.0006 Epoch: 11 Global Step: 228380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:51,752-Speed 6320.98 samples/sec Loss 6.3149 LearningRate 0.0006 Epoch: 11 Global Step: 228390 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:54,996-Speed 6314.94 samples/sec Loss 6.2022 LearningRate 0.0006 Epoch: 11 Global Step: 228400 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:30:58,236-Speed 6322.56 samples/sec Loss 6.2351 LearningRate 0.0006 Epoch: 11 Global Step: 228410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:01,476-Speed 6323.28 samples/sec Loss 6.3689 LearningRate 0.0006 Epoch: 11 Global Step: 228420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:04,712-Speed 6329.89 samples/sec Loss 6.3202 LearningRate 0.0006 Epoch: 11 Global Step: 228430 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:07,951-Speed 6324.04 samples/sec Loss 6.3639 LearningRate 0.0006 Epoch: 11 Global Step: 228440 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:31:11,176-Speed 6353.07 samples/sec Loss 6.2958 LearningRate 0.0006 Epoch: 11 Global Step: 228450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:14,411-Speed 6331.34 samples/sec Loss 6.2734 LearningRate 0.0006 Epoch: 11 Global Step: 228460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:17,648-Speed 6329.04 samples/sec Loss 6.3279 LearningRate 0.0006 Epoch: 11 Global Step: 228470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:20,883-Speed 6331.46 samples/sec Loss 6.3148 LearningRate 0.0006 Epoch: 11 Global Step: 228480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:24,124-Speed 6319.84 samples/sec Loss 6.3548 LearningRate 0.0006 Epoch: 11 Global Step: 228490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:27,363-Speed 6324.93 samples/sec Loss 6.3383 LearningRate 0.0006 Epoch: 11 Global Step: 228500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:30,610-Speed 6309.22 samples/sec Loss 6.2911 LearningRate 0.0006 Epoch: 11 Global Step: 228510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:33,848-Speed 6325.19 samples/sec Loss 6.2879 LearningRate 0.0006 Epoch: 11 Global Step: 228520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:37,083-Speed 6333.21 samples/sec Loss 6.3831 LearningRate 0.0006 Epoch: 11 Global Step: 228530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:40,320-Speed 6328.19 samples/sec Loss 6.4041 LearningRate 0.0006 Epoch: 11 Global Step: 228540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:43,544-Speed 6354.09 samples/sec Loss 6.2549 LearningRate 0.0006 Epoch: 11 Global Step: 228550 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:31:46,804-Speed 6283.22 samples/sec Loss 6.2157 LearningRate 0.0006 Epoch: 11 Global Step: 228560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:50,052-Speed 6307.03 samples/sec Loss 6.2882 LearningRate 0.0006 Epoch: 11 Global Step: 228570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:53,292-Speed 6320.94 samples/sec Loss 6.2805 LearningRate 0.0006 Epoch: 11 Global Step: 228580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:56,534-Speed 6319.77 samples/sec Loss 6.2801 LearningRate 0.0006 Epoch: 11 Global Step: 228590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:31:59,769-Speed 6331.93 samples/sec Loss 6.2702 LearningRate 0.0006 Epoch: 11 Global Step: 228600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:03,008-Speed 6324.37 samples/sec Loss 6.3269 LearningRate 0.0006 Epoch: 11 Global Step: 228610 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:06,245-Speed 6328.62 samples/sec Loss 6.3486 LearningRate 0.0006 Epoch: 11 Global Step: 228620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:09,482-Speed 6328.56 samples/sec Loss 6.2507 LearningRate 0.0006 Epoch: 11 Global Step: 228630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:12,719-Speed 6328.75 samples/sec Loss 6.3260 LearningRate 0.0006 Epoch: 11 Global Step: 228640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:15,946-Speed 6347.17 samples/sec Loss 6.3148 LearningRate 0.0006 Epoch: 11 Global Step: 228650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:19,181-Speed 6332.36 samples/sec Loss 6.2410 LearningRate 0.0006 Epoch: 11 Global Step: 228660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:22,414-Speed 6335.80 samples/sec Loss 6.3397 LearningRate 0.0006 Epoch: 11 Global Step: 228670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:32:25,650-Speed 6329.81 samples/sec Loss 6.3750 LearningRate 0.0006 Epoch: 11 Global Step: 228680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:28,897-Speed 6309.33 samples/sec Loss 6.2996 LearningRate 0.0006 Epoch: 11 Global Step: 228690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:32,133-Speed 6330.79 samples/sec Loss 6.4045 LearningRate 0.0006 Epoch: 11 Global Step: 228700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:35,374-Speed 6320.25 samples/sec Loss 6.2836 LearningRate 0.0006 Epoch: 11 Global Step: 228710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:38,609-Speed 6331.37 samples/sec Loss 6.3320 LearningRate 0.0006 Epoch: 11 Global Step: 228720 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:41,845-Speed 6330.41 samples/sec Loss 6.3291 LearningRate 0.0006 Epoch: 11 Global Step: 228730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:45,081-Speed 6330.61 samples/sec Loss 6.3613 LearningRate 0.0006 Epoch: 11 Global Step: 228740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:48,320-Speed 6323.27 samples/sec Loss 6.2787 LearningRate 0.0006 Epoch: 11 Global Step: 228750 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:32:51,546-Speed 6351.35 samples/sec Loss 6.3249 LearningRate 0.0006 Epoch: 11 Global Step: 228760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:54,781-Speed 6331.75 samples/sec Loss 6.2836 LearningRate 0.0006 Epoch: 11 Global Step: 228770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:32:58,017-Speed 6328.86 samples/sec Loss 6.3438 LearningRate 0.0006 Epoch: 11 Global Step: 228780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:01,258-Speed 6320.82 samples/sec Loss 6.3758 LearningRate 0.0006 Epoch: 11 Global Step: 228790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:04,498-Speed 6323.74 
samples/sec Loss 6.3403 LearningRate 0.0006 Epoch: 11 Global Step: 228800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:07,739-Speed 6319.43 samples/sec Loss 6.2903 LearningRate 0.0006 Epoch: 11 Global Step: 228810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:10,980-Speed 6322.18 samples/sec Loss 6.3958 LearningRate 0.0006 Epoch: 11 Global Step: 228820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:14,219-Speed 6324.34 samples/sec Loss 6.4332 LearningRate 0.0006 Epoch: 11 Global Step: 228830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:17,461-Speed 6317.72 samples/sec Loss 6.2902 LearningRate 0.0006 Epoch: 11 Global Step: 228840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:20,701-Speed 6323.70 samples/sec Loss 6.2713 LearningRate 0.0006 Epoch: 11 Global Step: 228850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:23,938-Speed 6326.99 samples/sec Loss 6.2484 LearningRate 0.0006 Epoch: 11 Global Step: 228860 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:33:27,161-Speed 6356.50 samples/sec Loss 6.2875 LearningRate 0.0006 Epoch: 11 Global Step: 228870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:30,401-Speed 6321.58 samples/sec Loss 6.3834 LearningRate 0.0006 Epoch: 11 Global Step: 228880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:33,642-Speed 6321.34 samples/sec Loss 6.4313 LearningRate 0.0006 Epoch: 11 Global Step: 228890 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:36,881-Speed 6323.78 samples/sec Loss 6.2435 LearningRate 0.0006 Epoch: 11 Global Step: 228900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:40,119-Speed 6326.57 samples/sec Loss 6.3657 LearningRate 0.0006 Epoch: 11 Global Step: 228910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:43,358-Speed 6323.92 samples/sec Loss 6.3168 
LearningRate 0.0006 Epoch: 11 Global Step: 228920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:46,592-Speed 6333.93 samples/sec Loss 6.3362 LearningRate 0.0006 Epoch: 11 Global Step: 228930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:49,831-Speed 6324.96 samples/sec Loss 6.4125 LearningRate 0.0006 Epoch: 11 Global Step: 228940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:53,077-Speed 6310.66 samples/sec Loss 6.3361 LearningRate 0.0006 Epoch: 11 Global Step: 228950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:56,318-Speed 6320.86 samples/sec Loss 6.3748 LearningRate 0.0006 Epoch: 11 Global Step: 228960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:33:59,546-Speed 6345.08 samples/sec Loss 6.3989 LearningRate 0.0006 Epoch: 11 Global Step: 228970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:02,790-Speed 6315.24 samples/sec Loss 6.2934 LearningRate 0.0006 Epoch: 11 Global Step: 228980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:06,031-Speed 6320.45 samples/sec Loss 6.3659 LearningRate 0.0006 Epoch: 11 Global Step: 228990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:09,270-Speed 6324.79 samples/sec Loss 6.3149 LearningRate 0.0006 Epoch: 11 Global Step: 229000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:12,508-Speed 6326.18 samples/sec Loss 6.3125 LearningRate 0.0006 Epoch: 11 Global Step: 229010 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:15,750-Speed 6319.47 samples/sec Loss 6.3301 LearningRate 0.0006 Epoch: 11 Global Step: 229020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:18,989-Speed 6323.66 samples/sec Loss 6.2331 LearningRate 0.0006 Epoch: 11 Global Step: 229030 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:22,232-Speed 6317.67 samples/sec Loss 6.1991 LearningRate 0.0006 Epoch: 11 
Global Step: 229040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:25,474-Speed 6317.27 samples/sec Loss 6.3664 LearningRate 0.0006 Epoch: 11 Global Step: 229050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:28,712-Speed 6326.25 samples/sec Loss 6.3634 LearningRate 0.0006 Epoch: 11 Global Step: 229060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:31,937-Speed 6352.85 samples/sec Loss 6.2672 LearningRate 0.0006 Epoch: 11 Global Step: 229070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:35,174-Speed 6327.00 samples/sec Loss 6.3557 LearningRate 0.0006 Epoch: 11 Global Step: 229080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:38,416-Speed 6318.12 samples/sec Loss 6.3784 LearningRate 0.0006 Epoch: 11 Global Step: 229090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:41,657-Speed 6321.39 samples/sec Loss 6.3393 LearningRate 0.0006 Epoch: 11 Global Step: 229100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:44,901-Speed 6314.89 samples/sec Loss 6.4232 LearningRate 0.0006 Epoch: 11 Global Step: 229110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:48,143-Speed 6317.21 samples/sec Loss 6.3418 LearningRate 0.0006 Epoch: 11 Global Step: 229120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:51,383-Speed 6322.62 samples/sec Loss 6.3260 LearningRate 0.0006 Epoch: 11 Global Step: 229130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:54,624-Speed 6321.32 samples/sec Loss 6.3417 LearningRate 0.0006 Epoch: 11 Global Step: 229140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:34:57,860-Speed 6329.21 samples/sec Loss 6.2942 LearningRate 0.0006 Epoch: 11 Global Step: 229150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:01,106-Speed 6310.52 samples/sec Loss 6.2919 LearningRate 0.0006 Epoch: 11 Global Step: 229160 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:04,350-Speed 6316.56 samples/sec Loss 6.3008 LearningRate 0.0006 Epoch: 11 Global Step: 229170 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:35:07,578-Speed 6344.68 samples/sec Loss 6.3127 LearningRate 0.0006 Epoch: 11 Global Step: 229180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:10,823-Speed 6312.97 samples/sec Loss 6.2643 LearningRate 0.0006 Epoch: 11 Global Step: 229190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:14,061-Speed 6326.92 samples/sec Loss 6.3261 LearningRate 0.0006 Epoch: 11 Global Step: 229200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:17,331-Speed 6264.24 samples/sec Loss 6.3083 LearningRate 0.0006 Epoch: 11 Global Step: 229210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:20,582-Speed 6300.46 samples/sec Loss 6.2465 LearningRate 0.0006 Epoch: 11 Global Step: 229220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:23,824-Speed 6319.27 samples/sec Loss 6.2677 LearningRate 0.0006 Epoch: 11 Global Step: 229230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:27,083-Speed 6284.11 samples/sec Loss 6.3586 LearningRate 0.0006 Epoch: 11 Global Step: 229240 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:30,369-Speed 6235.88 samples/sec Loss 6.3206 LearningRate 0.0006 Epoch: 11 Global Step: 229250 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:33,611-Speed 6318.82 samples/sec Loss 6.3643 LearningRate 0.0006 Epoch: 11 Global Step: 229260 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:36,852-Speed 6319.41 samples/sec Loss 6.2960 LearningRate 0.0006 Epoch: 11 Global Step: 229270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:40,105-Speed 6297.09 samples/sec Loss 6.3109 LearningRate 0.0006 Epoch: 11 Global Step: 229280 Fp16 Grad Scale: 65536 Required: 55 hours 
Training: 2022-04-01 12:35:43,331-Speed 6349.38 samples/sec Loss 6.3012 LearningRate 0.0006 Epoch: 11 Global Step: 229290 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:46,574-Speed 6317.60 samples/sec Loss 6.3007 LearningRate 0.0006 Epoch: 11 Global Step: 229300 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:49,816-Speed 6318.64 samples/sec Loss 6.2811 LearningRate 0.0006 Epoch: 11 Global Step: 229310 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:53,062-Speed 6310.62 samples/sec Loss 6.3367 LearningRate 0.0006 Epoch: 11 Global Step: 229320 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:56,304-Speed 6318.75 samples/sec Loss 6.3575 LearningRate 0.0006 Epoch: 11 Global Step: 229330 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:35:59,542-Speed 6326.21 samples/sec Loss 6.2420 LearningRate 0.0006 Epoch: 11 Global Step: 229340 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:02,782-Speed 6321.54 samples/sec Loss 6.2557 LearningRate 0.0006 Epoch: 11 Global Step: 229350 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:06,032-Speed 6303.21 samples/sec Loss 6.4130 LearningRate 0.0006 Epoch: 11 Global Step: 229360 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:09,271-Speed 6325.28 samples/sec Loss 6.3645 LearningRate 0.0006 Epoch: 11 Global Step: 229370 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:12,509-Speed 6324.63 samples/sec Loss 6.3534 LearningRate 0.0006 Epoch: 11 Global Step: 229380 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:15,755-Speed 6312.28 samples/sec Loss 6.3120 LearningRate 0.0006 Epoch: 11 Global Step: 229390 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:36:18,997-Speed 6317.65 samples/sec Loss 6.2324 LearningRate 0.0006 Epoch: 11 Global Step: 229400 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 
12:36:22,221-Speed 6354.04 samples/sec Loss 6.3489 LearningRate 0.0006 Epoch: 11 Global Step: 229410 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:25,464-Speed 6316.17 samples/sec Loss 6.3495 LearningRate 0.0006 Epoch: 11 Global Step: 229420 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:28,704-Speed 6322.24 samples/sec Loss 6.3578 LearningRate 0.0006 Epoch: 11 Global Step: 229430 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:31,944-Speed 6321.56 samples/sec Loss 6.2805 LearningRate 0.0006 Epoch: 11 Global Step: 229440 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:35,184-Speed 6322.67 samples/sec Loss 6.3713 LearningRate 0.0006 Epoch: 11 Global Step: 229450 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:38,428-Speed 6315.69 samples/sec Loss 6.3021 LearningRate 0.0006 Epoch: 11 Global Step: 229460 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:41,671-Speed 6316.57 samples/sec Loss 6.3657 LearningRate 0.0006 Epoch: 11 Global Step: 229470 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:44,911-Speed 6322.63 samples/sec Loss 6.3118 LearningRate 0.0006 Epoch: 11 Global Step: 229480 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:48,149-Speed 6327.15 samples/sec Loss 6.2907 LearningRate 0.0006 Epoch: 11 Global Step: 229490 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:51,390-Speed 6319.74 samples/sec Loss 6.3516 LearningRate 0.0006 Epoch: 11 Global Step: 229500 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:54,612-Speed 6357.48 samples/sec Loss 6.3203 LearningRate 0.0006 Epoch: 11 Global Step: 229510 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:36:57,855-Speed 6317.14 samples/sec Loss 6.3198 LearningRate 0.0006 Epoch: 11 Global Step: 229520 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:01,097-Speed 6318.55 
samples/sec Loss 6.3747 LearningRate 0.0006 Epoch: 11 Global Step: 229530 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:04,335-Speed 6325.57 samples/sec Loss 6.2821 LearningRate 0.0006 Epoch: 11 Global Step: 229540 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:07,573-Speed 6326.82 samples/sec Loss 6.3395 LearningRate 0.0006 Epoch: 11 Global Step: 229550 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:10,810-Speed 6328.65 samples/sec Loss 6.2942 LearningRate 0.0006 Epoch: 11 Global Step: 229560 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:14,051-Speed 6320.47 samples/sec Loss 6.3466 LearningRate 0.0006 Epoch: 11 Global Step: 229570 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:17,292-Speed 6320.17 samples/sec Loss 6.2917 LearningRate 0.0006 Epoch: 11 Global Step: 229580 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:20,535-Speed 6315.53 samples/sec Loss 6.2428 LearningRate 0.0006 Epoch: 11 Global Step: 229590 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:23,774-Speed 6324.20 samples/sec Loss 6.3639 LearningRate 0.0006 Epoch: 11 Global Step: 229600 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:27,013-Speed 6325.38 samples/sec Loss 6.2927 LearningRate 0.0006 Epoch: 11 Global Step: 229610 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:37:30,242-Speed 6343.07 samples/sec Loss 6.3647 LearningRate 0.0006 Epoch: 11 Global Step: 229620 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:33,485-Speed 6317.95 samples/sec Loss 6.2905 LearningRate 0.0006 Epoch: 11 Global Step: 229630 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:36,726-Speed 6319.91 samples/sec Loss 6.2770 LearningRate 0.0006 Epoch: 11 Global Step: 229640 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:39,966-Speed 6322.29 samples/sec Loss 6.2386 
LearningRate 0.0006 Epoch: 11 Global Step: 229650 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:43,205-Speed 6324.55 samples/sec Loss 6.3540 LearningRate 0.0006 Epoch: 11 Global Step: 229660 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:46,446-Speed 6320.08 samples/sec Loss 6.3799 LearningRate 0.0006 Epoch: 11 Global Step: 229670 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:49,687-Speed 6320.50 samples/sec Loss 6.2999 LearningRate 0.0006 Epoch: 11 Global Step: 229680 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:52,930-Speed 6317.19 samples/sec Loss 6.3042 LearningRate 0.0006 Epoch: 11 Global Step: 229690 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:56,174-Speed 6315.14 samples/sec Loss 6.3598 LearningRate 0.0006 Epoch: 11 Global Step: 229700 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:37:59,414-Speed 6321.10 samples/sec Loss 6.3366 LearningRate 0.0006 Epoch: 11 Global Step: 229710 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:02,659-Speed 6314.36 samples/sec Loss 6.2352 LearningRate 0.0006 Epoch: 11 Global Step: 229720 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:38:05,888-Speed 6343.46 samples/sec Loss 6.3190 LearningRate 0.0006 Epoch: 11 Global Step: 229730 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:09,129-Speed 6320.05 samples/sec Loss 6.3374 LearningRate 0.0006 Epoch: 11 Global Step: 229740 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:12,372-Speed 6315.87 samples/sec Loss 6.3012 LearningRate 0.0006 Epoch: 11 Global Step: 229750 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:15,611-Speed 6325.09 samples/sec Loss 6.3624 LearningRate 0.0006 Epoch: 11 Global Step: 229760 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:18,857-Speed 6310.81 samples/sec Loss 6.4243 LearningRate 0.0006 Epoch: 11 
Global Step: 229770 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:22,098-Speed 6319.75 samples/sec Loss 6.3408 LearningRate 0.0006 Epoch: 11 Global Step: 229780 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:25,340-Speed 6319.74 samples/sec Loss 6.3856 LearningRate 0.0006 Epoch: 11 Global Step: 229790 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:28,583-Speed 6316.58 samples/sec Loss 6.4136 LearningRate 0.0006 Epoch: 11 Global Step: 229800 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:31,827-Speed 6313.75 samples/sec Loss 6.2431 LearningRate 0.0006 Epoch: 11 Global Step: 229810 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:35,069-Speed 6319.51 samples/sec Loss 6.3255 LearningRate 0.0006 Epoch: 11 Global Step: 229820 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:38,301-Speed 6337.25 samples/sec Loss 6.3037 LearningRate 0.0006 Epoch: 11 Global Step: 229830 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:41,542-Speed 6319.42 samples/sec Loss 6.2546 LearningRate 0.0006 Epoch: 11 Global Step: 229840 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:44,784-Speed 6318.54 samples/sec Loss 6.2920 LearningRate 0.0006 Epoch: 11 Global Step: 229850 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:48,029-Speed 6312.81 samples/sec Loss 6.2286 LearningRate 0.0006 Epoch: 11 Global Step: 229860 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:51,274-Speed 6314.37 samples/sec Loss 6.2761 LearningRate 0.0006 Epoch: 11 Global Step: 229870 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:54,515-Speed 6318.38 samples/sec Loss 6.2746 LearningRate 0.0006 Epoch: 11 Global Step: 229880 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:38:57,754-Speed 6324.38 samples/sec Loss 6.3329 LearningRate 0.0006 Epoch: 11 Global Step: 229890 Fp16 Grad 
Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:00,997-Speed 6316.68 samples/sec Loss 6.2955 LearningRate 0.0006 Epoch: 11 Global Step: 229900 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:04,243-Speed 6313.05 samples/sec Loss 6.3684 LearningRate 0.0006 Epoch: 11 Global Step: 229910 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:07,482-Speed 6322.88 samples/sec Loss 6.3048 LearningRate 0.0006 Epoch: 11 Global Step: 229920 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:10,711-Speed 6343.86 samples/sec Loss 6.3456 LearningRate 0.0006 Epoch: 11 Global Step: 229930 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:13,949-Speed 6326.44 samples/sec Loss 6.3323 LearningRate 0.0006 Epoch: 11 Global Step: 229940 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:17,192-Speed 6316.72 samples/sec Loss 6.3965 LearningRate 0.0006 Epoch: 11 Global Step: 229950 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:20,440-Speed 6308.43 samples/sec Loss 6.2813 LearningRate 0.0006 Epoch: 11 Global Step: 229960 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:23,686-Speed 6309.67 samples/sec Loss 6.3524 LearningRate 0.0006 Epoch: 11 Global Step: 229970 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:26,926-Speed 6322.68 samples/sec Loss 6.2605 LearningRate 0.0006 Epoch: 11 Global Step: 229980 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:30,168-Speed 6317.35 samples/sec Loss 6.3505 LearningRate 0.0006 Epoch: 11 Global Step: 229990 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:33,412-Speed 6314.59 samples/sec Loss 6.3056 LearningRate 0.0006 Epoch: 11 Global Step: 230000 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:36,660-Speed 6307.50 samples/sec Loss 6.3090 LearningRate 0.0006 Epoch: 11 Global Step: 230010 Fp16 Grad Scale: 32768 Required: 55 hours 
Training: 2022-04-01 12:39:39,905-Speed 6311.68 samples/sec Loss 6.2695 LearningRate 0.0006 Epoch: 11 Global Step: 230020 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:43,148-Speed 6317.71 samples/sec Loss 6.2983 LearningRate 0.0006 Epoch: 11 Global Step: 230030 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:39:46,377-Speed 6344.54 samples/sec Loss 6.3796 LearningRate 0.0006 Epoch: 11 Global Step: 230040 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:49,618-Speed 6319.56 samples/sec Loss 6.3215 LearningRate 0.0006 Epoch: 11 Global Step: 230050 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:52,864-Speed 6311.37 samples/sec Loss 6.3010 LearningRate 0.0006 Epoch: 11 Global Step: 230060 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:56,107-Speed 6315.74 samples/sec Loss 6.3306 LearningRate 0.0006 Epoch: 11 Global Step: 230070 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:39:59,375-Speed 6267.53 samples/sec Loss 6.2760 LearningRate 0.0006 Epoch: 11 Global Step: 230080 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:02,620-Speed 6314.58 samples/sec Loss 6.2981 LearningRate 0.0006 Epoch: 11 Global Step: 230090 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:05,864-Speed 6313.18 samples/sec Loss 6.2630 LearningRate 0.0006 Epoch: 11 Global Step: 230100 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:09,106-Speed 6317.86 samples/sec Loss 6.3351 LearningRate 0.0006 Epoch: 11 Global Step: 230110 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:12,349-Speed 6317.48 samples/sec Loss 6.3155 LearningRate 0.0006 Epoch: 11 Global Step: 230120 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:15,592-Speed 6316.47 samples/sec Loss 6.3684 LearningRate 0.0006 Epoch: 11 Global Step: 230130 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 
12:40:18,821-Speed 6345.62 samples/sec Loss 6.2794 LearningRate 0.0006 Epoch: 11 Global Step: 230140 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:22,070-Speed 6305.40 samples/sec Loss 6.3208 LearningRate 0.0006 Epoch: 11 Global Step: 230150 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:25,309-Speed 6322.91 samples/sec Loss 6.2929 LearningRate 0.0006 Epoch: 11 Global Step: 230160 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:28,554-Speed 6313.86 samples/sec Loss 6.2578 LearningRate 0.0006 Epoch: 11 Global Step: 230170 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:31,795-Speed 6319.69 samples/sec Loss 6.3483 LearningRate 0.0006 Epoch: 11 Global Step: 230180 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:35,038-Speed 6317.30 samples/sec Loss 6.3128 LearningRate 0.0006 Epoch: 11 Global Step: 230190 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:38,283-Speed 6311.45 samples/sec Loss 6.3548 LearningRate 0.0006 Epoch: 11 Global Step: 230200 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:41,529-Speed 6310.02 samples/sec Loss 6.2982 LearningRate 0.0006 Epoch: 11 Global Step: 230210 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:44,779-Speed 6304.24 samples/sec Loss 6.2417 LearningRate 0.0006 Epoch: 11 Global Step: 230220 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:48,018-Speed 6324.14 samples/sec Loss 6.3450 LearningRate 0.0006 Epoch: 11 Global Step: 230230 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:40:51,280-Speed 6279.97 samples/sec Loss 6.3228 LearningRate 0.0006 Epoch: 11 Global Step: 230240 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:40:54,523-Speed 6317.01 samples/sec Loss 6.4148 LearningRate 0.0006 Epoch: 11 Global Step: 230250 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:40:57,768-Speed 6312.42 
samples/sec Loss 6.3308 LearningRate 0.0006 Epoch: 11 Global Step: 230260 Fp16 Grad Scale: 65536 Required: 55 hours Training: 2022-04-01 12:41:01,000-Speed 6336.84 samples/sec Loss 6.3070 LearningRate 0.0006 Epoch: 11 Global Step: 230270 Fp16 Grad Scale: 32768 Required: 55 hours Training: 2022-04-01 12:41:04,256-Speed 6292.17 samples/sec Loss 6.2458 LearningRate 0.0006 Epoch: 11 Global Step: 230280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:07,499-Speed 6316.31 samples/sec Loss 6.2875 LearningRate 0.0006 Epoch: 11 Global Step: 230290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:10,741-Speed 6317.33 samples/sec Loss 6.2672 LearningRate 0.0006 Epoch: 11 Global Step: 230300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:13,983-Speed 6318.89 samples/sec Loss 6.3023 LearningRate 0.0006 Epoch: 11 Global Step: 230310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:17,343-Speed 6096.41 samples/sec Loss 6.3459 LearningRate 0.0006 Epoch: 11 Global Step: 230320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:20,598-Speed 6294.59 samples/sec Loss 6.2929 LearningRate 0.0006 Epoch: 11 Global Step: 230330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:23,844-Speed 6310.75 samples/sec Loss 6.3130 LearningRate 0.0006 Epoch: 11 Global Step: 230340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:27,087-Speed 6317.15 samples/sec Loss 6.3044 LearningRate 0.0006 Epoch: 11 Global Step: 230350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:30,333-Speed 6309.44 samples/sec Loss 6.3581 LearningRate 0.0006 Epoch: 11 Global Step: 230360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:33,578-Speed 6313.19 samples/sec Loss 6.2533 LearningRate 0.0006 Epoch: 11 Global Step: 230370 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:41:36,819-Speed 6320.92 samples/sec Loss 6.3814 
LearningRate 0.0006 Epoch: 11 Global Step: 230380 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:41:40,051-Speed 6338.91 samples/sec Loss 6.2637 LearningRate 0.0006 Epoch: 11 Global Step: 230390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:43,295-Speed 6313.33 samples/sec Loss 6.2864 LearningRate 0.0006 Epoch: 11 Global Step: 230400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:46,538-Speed 6316.62 samples/sec Loss 6.2712 LearningRate 0.0006 Epoch: 11 Global Step: 230410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:49,781-Speed 6317.63 samples/sec Loss 6.3125 LearningRate 0.0006 Epoch: 11 Global Step: 230420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:53,024-Speed 6316.93 samples/sec Loss 6.2327 LearningRate 0.0006 Epoch: 11 Global Step: 230430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:56,268-Speed 6314.41 samples/sec Loss 6.2862 LearningRate 0.0006 Epoch: 11 Global Step: 230440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:41:59,510-Speed 6317.12 samples/sec Loss 6.3073 LearningRate 0.0006 Epoch: 11 Global Step: 230450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:02,773-Speed 6278.26 samples/sec Loss 6.2801 LearningRate 0.0006 Epoch: 11 Global Step: 230460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:06,023-Speed 6303.36 samples/sec Loss 6.3204 LearningRate 0.0006 Epoch: 11 Global Step: 230470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:09,271-Speed 6306.48 samples/sec Loss 6.3684 LearningRate 0.0006 Epoch: 11 Global Step: 230480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:12,502-Speed 6340.17 samples/sec Loss 6.3511 LearningRate 0.0006 Epoch: 11 Global Step: 230490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:15,766-Speed 6276.47 samples/sec Loss 6.3437 LearningRate 0.0006 Epoch: 11 
Global Step: 230500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:19,026-Speed 6283.11 samples/sec Loss 6.2219 LearningRate 0.0006 Epoch: 11 Global Step: 230510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:22,266-Speed 6322.09 samples/sec Loss 6.3482 LearningRate 0.0006 Epoch: 11 Global Step: 230520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:25,510-Speed 6315.27 samples/sec Loss 6.3644 LearningRate 0.0006 Epoch: 11 Global Step: 230530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:28,755-Speed 6312.28 samples/sec Loss 6.2681 LearningRate 0.0006 Epoch: 11 Global Step: 230540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:32,011-Speed 6291.51 samples/sec Loss 6.2873 LearningRate 0.0006 Epoch: 11 Global Step: 230550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:35,255-Speed 6316.70 samples/sec Loss 6.2999 LearningRate 0.0006 Epoch: 11 Global Step: 230560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:38,504-Speed 6303.31 samples/sec Loss 6.2501 LearningRate 0.0006 Epoch: 11 Global Step: 230570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:41,747-Speed 6317.34 samples/sec Loss 6.3023 LearningRate 0.0006 Epoch: 11 Global Step: 230580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:44,991-Speed 6314.77 samples/sec Loss 6.2205 LearningRate 0.0006 Epoch: 11 Global Step: 230590 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:42:48,223-Speed 6337.82 samples/sec Loss 6.3269 LearningRate 0.0006 Epoch: 11 Global Step: 230600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:51,469-Speed 6310.61 samples/sec Loss 6.2940 LearningRate 0.0006 Epoch: 11 Global Step: 230610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:54,711-Speed 6318.79 samples/sec Loss 6.3265 LearningRate 0.0006 Epoch: 11 Global Step: 230620 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 12:42:57,955-Speed 6313.96 samples/sec Loss 6.2764 LearningRate 0.0006 Epoch: 11 Global Step: 230630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:01,197-Speed 6319.08 samples/sec Loss 6.2838 LearningRate 0.0006 Epoch: 11 Global Step: 230640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:04,443-Speed 6310.00 samples/sec Loss 6.3304 LearningRate 0.0006 Epoch: 11 Global Step: 230650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:07,688-Speed 6312.48 samples/sec Loss 6.3495 LearningRate 0.0006 Epoch: 11 Global Step: 230660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:10,931-Speed 6316.51 samples/sec Loss 6.2668 LearningRate 0.0006 Epoch: 11 Global Step: 230670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:14,182-Speed 6301.36 samples/sec Loss 6.3227 LearningRate 0.0006 Epoch: 11 Global Step: 230680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:17,426-Speed 6314.53 samples/sec Loss 6.3000 LearningRate 0.0006 Epoch: 11 Global Step: 230690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:20,673-Speed 6310.09 samples/sec Loss 6.2904 LearningRate 0.0006 Epoch: 11 Global Step: 230700 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:43:23,904-Speed 6338.87 samples/sec Loss 6.3035 LearningRate 0.0006 Epoch: 11 Global Step: 230710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:27,154-Speed 6302.34 samples/sec Loss 6.3407 LearningRate 0.0006 Epoch: 11 Global Step: 230720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:30,397-Speed 6315.95 samples/sec Loss 6.3315 LearningRate 0.0006 Epoch: 11 Global Step: 230730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:33,645-Speed 6307.81 samples/sec Loss 6.3389 LearningRate 0.0006 Epoch: 11 Global Step: 230740 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 12:43:36,895-Speed 6304.31 samples/sec Loss 6.3066 LearningRate 0.0006 Epoch: 11 Global Step: 230750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:40,139-Speed 6313.81 samples/sec Loss 6.2586 LearningRate 0.0006 Epoch: 11 Global Step: 230760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:43,384-Speed 6313.26 samples/sec Loss 6.2944 LearningRate 0.0006 Epoch: 11 Global Step: 230770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:46,627-Speed 6316.15 samples/sec Loss 6.3343 LearningRate 0.0006 Epoch: 11 Global Step: 230780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:49,871-Speed 6315.42 samples/sec Loss 6.2902 LearningRate 0.0006 Epoch: 11 Global Step: 230790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:53,115-Speed 6313.74 samples/sec Loss 6.2533 LearningRate 0.0006 Epoch: 11 Global Step: 230800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:43:56,360-Speed 6313.23 samples/sec Loss 6.3088 LearningRate 0.0006 Epoch: 11 Global Step: 230810 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:43:59,592-Speed 6337.90 samples/sec Loss 6.2748 LearningRate 0.0006 Epoch: 11 Global Step: 230820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:02,838-Speed 6311.19 samples/sec Loss 6.3161 LearningRate 0.0006 Epoch: 11 Global Step: 230830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:06,081-Speed 6316.72 samples/sec Loss 6.2816 LearningRate 0.0006 Epoch: 11 Global Step: 230840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:09,326-Speed 6311.20 samples/sec Loss 6.3656 LearningRate 0.0006 Epoch: 11 Global Step: 230850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:12,567-Speed 6321.49 samples/sec Loss 6.2434 LearningRate 0.0006 Epoch: 11 Global Step: 230860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
12:44:15,814-Speed 6308.52 samples/sec Loss 6.2618 LearningRate 0.0006 Epoch: 11 Global Step: 230870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:19,055-Speed 6319.85 samples/sec Loss 6.3341 LearningRate 0.0006 Epoch: 11 Global Step: 230880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:22,300-Speed 6312.41 samples/sec Loss 6.3252 LearningRate 0.0006 Epoch: 11 Global Step: 230890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:25,543-Speed 6316.42 samples/sec Loss 6.2751 LearningRate 0.0006 Epoch: 11 Global Step: 230900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:28,786-Speed 6317.69 samples/sec Loss 6.3447 LearningRate 0.0006 Epoch: 11 Global Step: 230910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:32,016-Speed 6340.33 samples/sec Loss 6.3868 LearningRate 0.0006 Epoch: 11 Global Step: 230920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:35,262-Speed 6311.04 samples/sec Loss 6.3633 LearningRate 0.0006 Epoch: 11 Global Step: 230930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:38,512-Speed 6304.24 samples/sec Loss 6.3160 LearningRate 0.0006 Epoch: 11 Global Step: 230940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:41,758-Speed 6309.64 samples/sec Loss 6.3558 LearningRate 0.0006 Epoch: 11 Global Step: 230950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:45,002-Speed 6315.13 samples/sec Loss 6.3089 LearningRate 0.0006 Epoch: 11 Global Step: 230960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:48,252-Speed 6304.30 samples/sec Loss 6.2448 LearningRate 0.0006 Epoch: 11 Global Step: 230970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:51,497-Speed 6312.54 samples/sec Loss 6.2874 LearningRate 0.0006 Epoch: 11 Global Step: 230980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:54,738-Speed 6319.94 
samples/sec Loss 6.3916 LearningRate 0.0006 Epoch: 11 Global Step: 230990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:44:57,980-Speed 6317.91 samples/sec Loss 6.2965 LearningRate 0.0006 Epoch: 11 Global Step: 231000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:01,229-Speed 6306.14 samples/sec Loss 6.3321 LearningRate 0.0006 Epoch: 11 Global Step: 231010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:04,475-Speed 6310.80 samples/sec Loss 6.3765 LearningRate 0.0006 Epoch: 11 Global Step: 231020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:07,725-Speed 6302.76 samples/sec Loss 6.3165 LearningRate 0.0006 Epoch: 11 Global Step: 231030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:10,968-Speed 6315.23 samples/sec Loss 6.3379 LearningRate 0.0006 Epoch: 11 Global Step: 231040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:14,217-Speed 6306.25 samples/sec Loss 6.3830 LearningRate 0.0006 Epoch: 11 Global Step: 231050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:17,464-Speed 6308.42 samples/sec Loss 6.2993 LearningRate 0.0006 Epoch: 11 Global Step: 231060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:20,712-Speed 6306.51 samples/sec Loss 6.2575 LearningRate 0.0006 Epoch: 11 Global Step: 231070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:23,957-Speed 6313.39 samples/sec Loss 6.2876 LearningRate 0.0006 Epoch: 11 Global Step: 231080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:27,201-Speed 6313.27 samples/sec Loss 6.2502 LearningRate 0.0006 Epoch: 11 Global Step: 231090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:30,448-Speed 6309.44 samples/sec Loss 6.3314 LearningRate 0.0006 Epoch: 11 Global Step: 231100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:33,695-Speed 6308.94 samples/sec Loss 6.3851 
LearningRate 0.0006 Epoch: 11 Global Step: 231110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:36,940-Speed 6312.54 samples/sec Loss 6.2414 LearningRate 0.0006 Epoch: 11 Global Step: 231120 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:45:40,177-Speed 6327.23 samples/sec Loss 6.2521 LearningRate 0.0006 Epoch: 11 Global Step: 231130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:43,427-Speed 6303.55 samples/sec Loss 6.3638 LearningRate 0.0006 Epoch: 11 Global Step: 231140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:46,674-Speed 6309.22 samples/sec Loss 6.2382 LearningRate 0.0006 Epoch: 11 Global Step: 231150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:49,928-Speed 6295.33 samples/sec Loss 6.3254 LearningRate 0.0006 Epoch: 11 Global Step: 231160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:53,173-Speed 6313.79 samples/sec Loss 6.2619 LearningRate 0.0006 Epoch: 11 Global Step: 231170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:56,416-Speed 6315.15 samples/sec Loss 6.2533 LearningRate 0.0006 Epoch: 11 Global Step: 231180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:45:59,663-Speed 6308.32 samples/sec Loss 6.2816 LearningRate 0.0006 Epoch: 11 Global Step: 231190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:02,905-Speed 6319.20 samples/sec Loss 6.3472 LearningRate 0.0006 Epoch: 11 Global Step: 231200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:06,149-Speed 6315.37 samples/sec Loss 6.3636 LearningRate 0.0006 Epoch: 11 Global Step: 231210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:09,395-Speed 6309.42 samples/sec Loss 6.3559 LearningRate 0.0006 Epoch: 11 Global Step: 231220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:12,639-Speed 6315.86 samples/sec Loss 6.3174 LearningRate 0.0006 Epoch: 11 
Global Step: 231230 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:46:15,874-Speed 6332.73 samples/sec Loss 6.3593 LearningRate 0.0006 Epoch: 11 Global Step: 231240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:19,117-Speed 6315.03 samples/sec Loss 6.2856 LearningRate 0.0006 Epoch: 11 Global Step: 231250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:22,361-Speed 6314.71 samples/sec Loss 6.2857 LearningRate 0.0006 Epoch: 11 Global Step: 231260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:25,611-Speed 6303.63 samples/sec Loss 6.3258 LearningRate 0.0006 Epoch: 11 Global Step: 231270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:28,856-Speed 6312.65 samples/sec Loss 6.3900 LearningRate 0.0006 Epoch: 11 Global Step: 231280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:32,101-Speed 6312.51 samples/sec Loss 6.3053 LearningRate 0.0006 Epoch: 11 Global Step: 231290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:35,350-Speed 6304.38 samples/sec Loss 6.3087 LearningRate 0.0006 Epoch: 11 Global Step: 231300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:38,600-Speed 6303.78 samples/sec Loss 6.3753 LearningRate 0.0006 Epoch: 11 Global Step: 231310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:41,843-Speed 6315.85 samples/sec Loss 6.4138 LearningRate 0.0006 Epoch: 11 Global Step: 231320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:45,086-Speed 6316.84 samples/sec Loss 6.2326 LearningRate 0.0006 Epoch: 11 Global Step: 231330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:48,320-Speed 6334.42 samples/sec Loss 6.2380 LearningRate 0.0006 Epoch: 11 Global Step: 231340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:51,565-Speed 6313.08 samples/sec Loss 6.2967 LearningRate 0.0006 Epoch: 11 Global Step: 231350 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:54,807-Speed 6317.76 samples/sec Loss 6.3256 LearningRate 0.0006 Epoch: 11 Global Step: 231360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:46:58,049-Speed 6317.34 samples/sec Loss 6.1970 LearningRate 0.0006 Epoch: 11 Global Step: 231370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:01,293-Speed 6315.99 samples/sec Loss 6.2360 LearningRate 0.0006 Epoch: 11 Global Step: 231380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:04,540-Speed 6309.24 samples/sec Loss 6.3480 LearningRate 0.0006 Epoch: 11 Global Step: 231390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:07,784-Speed 6313.89 samples/sec Loss 6.1734 LearningRate 0.0006 Epoch: 11 Global Step: 231400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:11,032-Speed 6307.96 samples/sec Loss 6.2203 LearningRate 0.0006 Epoch: 11 Global Step: 231410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:14,277-Speed 6313.43 samples/sec Loss 6.3345 LearningRate 0.0006 Epoch: 11 Global Step: 231420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:17,523-Speed 6308.88 samples/sec Loss 6.2474 LearningRate 0.0006 Epoch: 11 Global Step: 231430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:20,752-Speed 6344.21 samples/sec Loss 6.2489 LearningRate 0.0006 Epoch: 11 Global Step: 231440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:23,999-Speed 6308.69 samples/sec Loss 6.2879 LearningRate 0.0006 Epoch: 11 Global Step: 231450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:27,246-Speed 6309.46 samples/sec Loss 6.3240 LearningRate 0.0006 Epoch: 11 Global Step: 231460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:30,492-Speed 6311.21 samples/sec Loss 6.3454 LearningRate 0.0006 Epoch: 11 Global Step: 231470 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 12:47:33,736-Speed 6313.57 samples/sec Loss 6.3149 LearningRate 0.0006 Epoch: 11 Global Step: 231480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:36,982-Speed 6310.82 samples/sec Loss 6.3074 LearningRate 0.0006 Epoch: 11 Global Step: 231490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:40,232-Speed 6303.07 samples/sec Loss 6.3505 LearningRate 0.0006 Epoch: 11 Global Step: 231500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:43,474-Speed 6319.43 samples/sec Loss 6.2323 LearningRate 0.0006 Epoch: 11 Global Step: 231510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:46,719-Speed 6312.49 samples/sec Loss 6.3326 LearningRate 0.0006 Epoch: 11 Global Step: 231520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:49,966-Speed 6308.58 samples/sec Loss 6.2688 LearningRate 0.0006 Epoch: 11 Global Step: 231530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:53,201-Speed 6331.15 samples/sec Loss 6.2962 LearningRate 0.0006 Epoch: 11 Global Step: 231540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:56,448-Speed 6308.94 samples/sec Loss 6.3109 LearningRate 0.0006 Epoch: 11 Global Step: 231550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:47:59,694-Speed 6311.26 samples/sec Loss 6.3133 LearningRate 0.0006 Epoch: 11 Global Step: 231560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:02,942-Speed 6306.27 samples/sec Loss 6.2469 LearningRate 0.0006 Epoch: 11 Global Step: 231570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:06,189-Speed 6307.99 samples/sec Loss 6.3044 LearningRate 0.0006 Epoch: 11 Global Step: 231580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:09,438-Speed 6306.01 samples/sec Loss 6.3061 LearningRate 0.0006 Epoch: 11 Global Step: 231590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
12:48:12,683-Speed 6312.75 samples/sec Loss 6.2353 LearningRate 0.0006 Epoch: 11 Global Step: 231600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:15,928-Speed 6312.95 samples/sec Loss 6.2220 LearningRate 0.0006 Epoch: 11 Global Step: 231610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:19,177-Speed 6305.57 samples/sec Loss 6.2587 LearningRate 0.0006 Epoch: 11 Global Step: 231620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:22,422-Speed 6312.93 samples/sec Loss 6.2221 LearningRate 0.0006 Epoch: 11 Global Step: 231630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:25,687-Speed 6274.30 samples/sec Loss 6.3388 LearningRate 0.0006 Epoch: 11 Global Step: 231640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:29,051-Speed 6088.29 samples/sec Loss 6.2518 LearningRate 0.0006 Epoch: 11 Global Step: 231650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:32,313-Speed 6279.93 samples/sec Loss 6.2775 LearningRate 0.0006 Epoch: 11 Global Step: 231660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:35,555-Speed 6318.52 samples/sec Loss 6.2597 LearningRate 0.0006 Epoch: 11 Global Step: 231670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:38,801-Speed 6311.05 samples/sec Loss 6.2464 LearningRate 0.0006 Epoch: 11 Global Step: 231680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:42,047-Speed 6311.28 samples/sec Loss 6.3161 LearningRate 0.0006 Epoch: 11 Global Step: 231690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:45,287-Speed 6322.01 samples/sec Loss 6.2637 LearningRate 0.0006 Epoch: 11 Global Step: 231700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:48,543-Speed 6290.39 samples/sec Loss 6.3058 LearningRate 0.0006 Epoch: 11 Global Step: 231710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:51,889-Speed 6123.50 
samples/sec Loss 6.3413 LearningRate 0.0006 Epoch: 11 Global Step: 231720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:55,139-Speed 6301.80 samples/sec Loss 6.3151 LearningRate 0.0006 Epoch: 11 Global Step: 231730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:48:58,371-Speed 6337.87 samples/sec Loss 6.3093 LearningRate 0.0006 Epoch: 11 Global Step: 231740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:01,621-Speed 6303.95 samples/sec Loss 6.2381 LearningRate 0.0006 Epoch: 11 Global Step: 231750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:04,867-Speed 6309.67 samples/sec Loss 6.3855 LearningRate 0.0006 Epoch: 11 Global Step: 231760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:08,111-Speed 6314.94 samples/sec Loss 6.2882 LearningRate 0.0006 Epoch: 11 Global Step: 231770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:11,357-Speed 6310.30 samples/sec Loss 6.2918 LearningRate 0.0006 Epoch: 11 Global Step: 231780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:14,603-Speed 6311.52 samples/sec Loss 6.2987 LearningRate 0.0006 Epoch: 11 Global Step: 231790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:17,849-Speed 6310.41 samples/sec Loss 6.3525 LearningRate 0.0006 Epoch: 11 Global Step: 231800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:21,097-Speed 6307.56 samples/sec Loss 6.3174 LearningRate 0.0006 Epoch: 11 Global Step: 231810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:24,342-Speed 6312.85 samples/sec Loss 6.3331 LearningRate 0.0006 Epoch: 11 Global Step: 231820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:27,588-Speed 6310.05 samples/sec Loss 6.3040 LearningRate 0.0006 Epoch: 11 Global Step: 231830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:30,837-Speed 6306.26 samples/sec Loss 6.2351 
LearningRate 0.0006 Epoch: 11 Global Step: 231840 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:49:34,069-Speed 6337.02 samples/sec Loss 6.3101 LearningRate 0.0006 Epoch: 11 Global Step: 231850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:37,319-Speed 6303.59 samples/sec Loss 6.2643 LearningRate 0.0006 Epoch: 11 Global Step: 231860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:40,560-Speed 6320.52 samples/sec Loss 6.1743 LearningRate 0.0006 Epoch: 11 Global Step: 231870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:43,809-Speed 6305.35 samples/sec Loss 6.3462 LearningRate 0.0006 Epoch: 11 Global Step: 231880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:47,052-Speed 6316.41 samples/sec Loss 6.3385 LearningRate 0.0006 Epoch: 11 Global Step: 231890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:50,298-Speed 6310.67 samples/sec Loss 6.3715 LearningRate 0.0006 Epoch: 11 Global Step: 231900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:53,551-Speed 6296.69 samples/sec Loss 6.3005 LearningRate 0.0006 Epoch: 11 Global Step: 231910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:49:56,793-Speed 6319.04 samples/sec Loss 6.3294 LearningRate 0.0006 Epoch: 11 Global Step: 231920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:00,043-Speed 6301.82 samples/sec Loss 6.3060 LearningRate 0.0006 Epoch: 11 Global Step: 231930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:03,290-Speed 6309.05 samples/sec Loss 6.2515 LearningRate 0.0006 Epoch: 11 Global Step: 231940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:06,535-Speed 6313.25 samples/sec Loss 6.2908 LearningRate 0.0006 Epoch: 11 Global Step: 231950 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:50:09,773-Speed 6325.91 samples/sec Loss 6.3380 LearningRate 0.0006 Epoch: 11 
Global Step: 231960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:13,018-Speed 6312.56 samples/sec Loss 6.3276 LearningRate 0.0006 Epoch: 11 Global Step: 231970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:16,267-Speed 6304.05 samples/sec Loss 6.2928 LearningRate 0.0006 Epoch: 11 Global Step: 231980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:19,514-Speed 6309.85 samples/sec Loss 6.2860 LearningRate 0.0006 Epoch: 11 Global Step: 231990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:22,759-Speed 6313.27 samples/sec Loss 6.2478 LearningRate 0.0006 Epoch: 11 Global Step: 232000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:26,006-Speed 6308.62 samples/sec Loss 6.2845 LearningRate 0.0006 Epoch: 11 Global Step: 232010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:29,253-Speed 6307.77 samples/sec Loss 6.2891 LearningRate 0.0006 Epoch: 11 Global Step: 232020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:32,497-Speed 6315.83 samples/sec Loss 6.2542 LearningRate 0.0006 Epoch: 11 Global Step: 232030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:35,744-Speed 6309.21 samples/sec Loss 6.2616 LearningRate 0.0006 Epoch: 11 Global Step: 232040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:38,991-Speed 6307.10 samples/sec Loss 6.2635 LearningRate 0.0006 Epoch: 11 Global Step: 232050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:42,229-Speed 6327.38 samples/sec Loss 6.3873 LearningRate 0.0006 Epoch: 11 Global Step: 232060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:45,475-Speed 6311.82 samples/sec Loss 6.2227 LearningRate 0.0006 Epoch: 11 Global Step: 232070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:48,727-Speed 6298.20 samples/sec Loss 6.2746 LearningRate 0.0006 Epoch: 11 Global Step: 232080 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:51,969-Speed 6318.51 samples/sec Loss 6.2660 LearningRate 0.0006 Epoch: 11 Global Step: 232090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:55,211-Speed 6318.77 samples/sec Loss 6.2618 LearningRate 0.0006 Epoch: 11 Global Step: 232100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:50:58,457-Speed 6310.94 samples/sec Loss 6.3116 LearningRate 0.0006 Epoch: 11 Global Step: 232110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:01,703-Speed 6309.37 samples/sec Loss 6.3380 LearningRate 0.0006 Epoch: 11 Global Step: 232120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:04,943-Speed 6323.10 samples/sec Loss 6.3260 LearningRate 0.0006 Epoch: 11 Global Step: 232130 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:08,186-Speed 6316.74 samples/sec Loss 6.2871 LearningRate 0.0006 Epoch: 11 Global Step: 232140 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:11,449-Speed 6276.55 samples/sec Loss 6.3273 LearningRate 0.0006 Epoch: 11 Global Step: 232150 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:14,693-Speed 6315.58 samples/sec Loss 6.3244 LearningRate 0.0006 Epoch: 11 Global Step: 232160 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:17,941-Speed 6307.39 samples/sec Loss 6.3859 LearningRate 0.0006 Epoch: 11 Global Step: 232170 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:21,200-Speed 6284.70 samples/sec Loss 6.3938 LearningRate 0.0006 Epoch: 11 Global Step: 232180 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:24,472-Speed 6261.58 samples/sec Loss 6.3554 LearningRate 0.0006 Epoch: 11 Global Step: 232190 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:27,776-Speed 6199.89 samples/sec Loss 6.2472 LearningRate 0.0006 Epoch: 11 Global Step: 232200 Fp16 Grad Scale: 16384 Required: 54 hours 
Training: 2022-04-01 12:51:31,022-Speed 6308.73 samples/sec Loss 6.2716 LearningRate 0.0006 Epoch: 11 Global Step: 232210 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:34,267-Speed 6313.96 samples/sec Loss 6.3921 LearningRate 0.0006 Epoch: 11 Global Step: 232220 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 12:51:37,510-Speed 6315.28 samples/sec Loss 6.3162 LearningRate 0.0006 Epoch: 11 Global Step: 232230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:40,758-Speed 6307.32 samples/sec Loss 6.2807 LearningRate 0.0006 Epoch: 11 Global Step: 232240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:44,004-Speed 6312.93 samples/sec Loss 6.2965 LearningRate 0.0006 Epoch: 11 Global Step: 232250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:47,249-Speed 6311.84 samples/sec Loss 6.2775 LearningRate 0.0006 Epoch: 11 Global Step: 232260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:50,496-Speed 6308.34 samples/sec Loss 6.3102 LearningRate 0.0006 Epoch: 11 Global Step: 232270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:53,742-Speed 6312.12 samples/sec Loss 6.3250 LearningRate 0.0006 Epoch: 11 Global Step: 232280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:51:56,987-Speed 6311.69 samples/sec Loss 6.3579 LearningRate 0.0006 Epoch: 11 Global Step: 232290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:00,232-Speed 6313.88 samples/sec Loss 6.2929 LearningRate 0.0006 Epoch: 11 Global Step: 232300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:03,482-Speed 6302.50 samples/sec Loss 6.3292 LearningRate 0.0006 Epoch: 11 Global Step: 232310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:06,729-Speed 6307.54 samples/sec Loss 6.3048 LearningRate 0.0006 Epoch: 11 Global Step: 232320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
12:52:09,973-Speed 6315.10 samples/sec Loss 6.3154 LearningRate 0.0006 Epoch: 11 Global Step: 232330 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:52:13,207-Speed 6335.29 samples/sec Loss 6.3183 LearningRate 0.0006 Epoch: 11 Global Step: 232340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:16,449-Speed 6317.24 samples/sec Loss 6.2777 LearningRate 0.0006 Epoch: 11 Global Step: 232350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:19,694-Speed 6313.39 samples/sec Loss 6.3485 LearningRate 0.0006 Epoch: 11 Global Step: 232360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:22,937-Speed 6315.87 samples/sec Loss 6.3024 LearningRate 0.0006 Epoch: 11 Global Step: 232370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:26,180-Speed 6317.17 samples/sec Loss 6.2369 LearningRate 0.0006 Epoch: 11 Global Step: 232380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:29,424-Speed 6313.88 samples/sec Loss 6.2498 LearningRate 0.0006 Epoch: 11 Global Step: 232390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:32,681-Speed 6290.56 samples/sec Loss 6.2947 LearningRate 0.0006 Epoch: 11 Global Step: 232400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:35,927-Speed 6309.94 samples/sec Loss 6.2387 LearningRate 0.0006 Epoch: 11 Global Step: 232410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:39,177-Speed 6303.58 samples/sec Loss 6.2393 LearningRate 0.0006 Epoch: 11 Global Step: 232420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:42,420-Speed 6315.67 samples/sec Loss 6.3321 LearningRate 0.0006 Epoch: 11 Global Step: 232430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:45,651-Speed 6340.70 samples/sec Loss 6.2955 LearningRate 0.0006 Epoch: 11 Global Step: 232440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:48,894-Speed 6315.56 
samples/sec Loss 6.3087 LearningRate 0.0006 Epoch: 11 Global Step: 232450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:52,144-Speed 6304.54 samples/sec Loss 6.2608 LearningRate 0.0006 Epoch: 11 Global Step: 232460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:55,389-Speed 6313.45 samples/sec Loss 6.2896 LearningRate 0.0006 Epoch: 11 Global Step: 232470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:52:58,637-Speed 6305.25 samples/sec Loss 6.2710 LearningRate 0.0006 Epoch: 11 Global Step: 232480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:01,879-Speed 6319.21 samples/sec Loss 6.3497 LearningRate 0.0006 Epoch: 11 Global Step: 232490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:05,127-Speed 6305.83 samples/sec Loss 6.2366 LearningRate 0.0006 Epoch: 11 Global Step: 232500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:08,376-Speed 6306.02 samples/sec Loss 6.2848 LearningRate 0.0006 Epoch: 11 Global Step: 232510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:11,620-Speed 6314.67 samples/sec Loss 6.3343 LearningRate 0.0006 Epoch: 11 Global Step: 232520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:14,867-Speed 6308.77 samples/sec Loss 6.1745 LearningRate 0.0006 Epoch: 11 Global Step: 232530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:18,115-Speed 6306.99 samples/sec Loss 6.3397 LearningRate 0.0006 Epoch: 11 Global Step: 232540 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:53:21,348-Speed 6336.16 samples/sec Loss 6.2239 LearningRate 0.0006 Epoch: 11 Global Step: 232550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:24,594-Speed 6309.99 samples/sec Loss 6.2750 LearningRate 0.0006 Epoch: 11 Global Step: 232560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:27,840-Speed 6310.56 samples/sec Loss 6.3193 
LearningRate 0.0006 Epoch: 11 Global Step: 232570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:31,088-Speed 6307.99 samples/sec Loss 6.2864 LearningRate 0.0006 Epoch: 11 Global Step: 232580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:34,337-Speed 6303.99 samples/sec Loss 6.2865 LearningRate 0.0006 Epoch: 11 Global Step: 232590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:37,591-Speed 6295.25 samples/sec Loss 6.2108 LearningRate 0.0006 Epoch: 11 Global Step: 232600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:40,841-Speed 6302.74 samples/sec Loss 6.2803 LearningRate 0.0006 Epoch: 11 Global Step: 232610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:44,089-Speed 6307.20 samples/sec Loss 6.2557 LearningRate 0.0006 Epoch: 11 Global Step: 232620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:47,346-Speed 6288.27 samples/sec Loss 6.2730 LearningRate 0.0006 Epoch: 11 Global Step: 232630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:50,591-Speed 6313.45 samples/sec Loss 6.2213 LearningRate 0.0006 Epoch: 11 Global Step: 232640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:53,824-Speed 6335.74 samples/sec Loss 6.3530 LearningRate 0.0006 Epoch: 11 Global Step: 232650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:53:57,074-Speed 6302.95 samples/sec Loss 6.2604 LearningRate 0.0006 Epoch: 11 Global Step: 232660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:00,324-Speed 6304.93 samples/sec Loss 6.3757 LearningRate 0.0006 Epoch: 11 Global Step: 232670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:03,569-Speed 6312.20 samples/sec Loss 6.3330 LearningRate 0.0006 Epoch: 11 Global Step: 232680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:06,814-Speed 6312.86 samples/sec Loss 6.2416 LearningRate 0.0006 Epoch: 11 
Global Step: 232690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:10,064-Speed 6302.49 samples/sec Loss 6.2249 LearningRate 0.0006 Epoch: 11 Global Step: 232700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:13,309-Speed 6311.63 samples/sec Loss 6.2910 LearningRate 0.0006 Epoch: 11 Global Step: 232710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:16,559-Speed 6304.48 samples/sec Loss 6.2652 LearningRate 0.0006 Epoch: 11 Global Step: 232720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:19,805-Speed 6309.19 samples/sec Loss 6.2759 LearningRate 0.0006 Epoch: 11 Global Step: 232730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:23,052-Speed 6310.16 samples/sec Loss 6.2751 LearningRate 0.0006 Epoch: 11 Global Step: 232740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:26,282-Speed 6340.85 samples/sec Loss 6.2235 LearningRate 0.0006 Epoch: 11 Global Step: 232750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:29,528-Speed 6311.23 samples/sec Loss 6.2686 LearningRate 0.0006 Epoch: 11 Global Step: 232760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:32,775-Speed 6308.78 samples/sec Loss 6.2233 LearningRate 0.0006 Epoch: 11 Global Step: 232770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:36,023-Speed 6306.46 samples/sec Loss 6.2931 LearningRate 0.0006 Epoch: 11 Global Step: 232780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:39,269-Speed 6310.28 samples/sec Loss 6.2416 LearningRate 0.0006 Epoch: 11 Global Step: 232790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:42,516-Speed 6308.70 samples/sec Loss 6.3239 LearningRate 0.0006 Epoch: 11 Global Step: 232800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:45,761-Speed 6313.70 samples/sec Loss 6.3203 LearningRate 0.0006 Epoch: 11 Global Step: 232810 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:49,009-Speed 6306.12 samples/sec Loss 6.3524 LearningRate 0.0006 Epoch: 11 Global Step: 232820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:52,321-Speed 6184.95 samples/sec Loss 6.2279 LearningRate 0.0006 Epoch: 11 Global Step: 232830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:55,566-Speed 6313.81 samples/sec Loss 6.2348 LearningRate 0.0006 Epoch: 11 Global Step: 232840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:54:58,796-Speed 6340.89 samples/sec Loss 6.2429 LearningRate 0.0006 Epoch: 11 Global Step: 232850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:02,042-Speed 6311.63 samples/sec Loss 6.3502 LearningRate 0.0006 Epoch: 11 Global Step: 232860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:05,289-Speed 6307.76 samples/sec Loss 6.2741 LearningRate 0.0006 Epoch: 11 Global Step: 232870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:08,531-Speed 6319.91 samples/sec Loss 6.3678 LearningRate 0.0006 Epoch: 11 Global Step: 232880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:11,783-Speed 6299.36 samples/sec Loss 6.2508 LearningRate 0.0006 Epoch: 11 Global Step: 232890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:15,025-Speed 6318.25 samples/sec Loss 6.2897 LearningRate 0.0006 Epoch: 11 Global Step: 232900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:18,269-Speed 6314.29 samples/sec Loss 6.3462 LearningRate 0.0006 Epoch: 11 Global Step: 232910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:21,519-Speed 6302.79 samples/sec Loss 6.3076 LearningRate 0.0006 Epoch: 11 Global Step: 232920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:24,766-Speed 6309.37 samples/sec Loss 6.2864 LearningRate 0.0006 Epoch: 11 Global Step: 232930 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 12:55:28,009-Speed 6316.33 samples/sec Loss 6.3270 LearningRate 0.0006 Epoch: 11 Global Step: 232940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:31,252-Speed 6317.19 samples/sec Loss 6.3026 LearningRate 0.0006 Epoch: 11 Global Step: 232950 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:55:34,484-Speed 6337.20 samples/sec Loss 6.3067 LearningRate 0.0006 Epoch: 11 Global Step: 232960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:37,732-Speed 6306.64 samples/sec Loss 6.2131 LearningRate 0.0006 Epoch: 11 Global Step: 232970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:40,978-Speed 6311.16 samples/sec Loss 6.2107 LearningRate 0.0006 Epoch: 11 Global Step: 232980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:44,221-Speed 6316.56 samples/sec Loss 6.2313 LearningRate 0.0006 Epoch: 11 Global Step: 232990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:47,476-Speed 6293.14 samples/sec Loss 6.2906 LearningRate 0.0006 Epoch: 11 Global Step: 233000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:50,721-Speed 6312.05 samples/sec Loss 6.2759 LearningRate 0.0006 Epoch: 11 Global Step: 233010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:53,967-Speed 6310.00 samples/sec Loss 6.3353 LearningRate 0.0006 Epoch: 11 Global Step: 233020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:55:57,213-Speed 6310.95 samples/sec Loss 6.2663 LearningRate 0.0006 Epoch: 11 Global Step: 233030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:00,457-Speed 6316.04 samples/sec Loss 6.3066 LearningRate 0.0006 Epoch: 11 Global Step: 233040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:03,704-Speed 6307.68 samples/sec Loss 6.3788 LearningRate 0.0006 Epoch: 11 Global Step: 233050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
12:56:06,935-Speed 6339.24 samples/sec Loss 6.3305 LearningRate 0.0006 Epoch: 11 Global Step: 233060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:10,179-Speed 6315.73 samples/sec Loss 6.3006 LearningRate 0.0006 Epoch: 11 Global Step: 233070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:13,422-Speed 6316.87 samples/sec Loss 6.2043 LearningRate 0.0006 Epoch: 11 Global Step: 233080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:16,669-Speed 6309.60 samples/sec Loss 6.2608 LearningRate 0.0006 Epoch: 11 Global Step: 233090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:19,910-Speed 6319.00 samples/sec Loss 6.3492 LearningRate 0.0006 Epoch: 11 Global Step: 233100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:23,167-Speed 6289.45 samples/sec Loss 6.4549 LearningRate 0.0006 Epoch: 11 Global Step: 233110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:26,415-Speed 6308.09 samples/sec Loss 6.3075 LearningRate 0.0006 Epoch: 11 Global Step: 233120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:29,662-Speed 6308.34 samples/sec Loss 6.2836 LearningRate 0.0006 Epoch: 11 Global Step: 233130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:32,906-Speed 6314.72 samples/sec Loss 6.2720 LearningRate 0.0006 Epoch: 11 Global Step: 233140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:36,149-Speed 6315.60 samples/sec Loss 6.2775 LearningRate 0.0006 Epoch: 11 Global Step: 233150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:39,392-Speed 6316.92 samples/sec Loss 6.3189 LearningRate 0.0006 Epoch: 11 Global Step: 233160 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:56:42,622-Speed 6341.81 samples/sec Loss 6.3730 LearningRate 0.0006 Epoch: 11 Global Step: 233170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:45,869-Speed 6308.37 
samples/sec Loss 6.2363 LearningRate 0.0006 Epoch: 11 Global Step: 233180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:49,119-Speed 6304.26 samples/sec Loss 6.2814 LearningRate 0.0006 Epoch: 11 Global Step: 233190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:52,364-Speed 6312.97 samples/sec Loss 6.3221 LearningRate 0.0006 Epoch: 11 Global Step: 233200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:55,608-Speed 6314.37 samples/sec Loss 6.2989 LearningRate 0.0006 Epoch: 11 Global Step: 233210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:56:58,854-Speed 6310.71 samples/sec Loss 6.2776 LearningRate 0.0006 Epoch: 11 Global Step: 233220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:02,109-Speed 6291.90 samples/sec Loss 6.3417 LearningRate 0.0006 Epoch: 11 Global Step: 233230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:05,359-Speed 6303.36 samples/sec Loss 6.3198 LearningRate 0.0006 Epoch: 11 Global Step: 233240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:08,611-Speed 6298.84 samples/sec Loss 6.3561 LearningRate 0.0006 Epoch: 11 Global Step: 233250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:11,856-Speed 6314.44 samples/sec Loss 6.2221 LearningRate 0.0006 Epoch: 11 Global Step: 233260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:15,101-Speed 6311.08 samples/sec Loss 6.4183 LearningRate 0.0006 Epoch: 11 Global Step: 233270 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:57:18,334-Speed 6335.77 samples/sec Loss 6.3681 LearningRate 0.0006 Epoch: 11 Global Step: 233280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:21,615-Speed 6245.02 samples/sec Loss 6.2920 LearningRate 0.0006 Epoch: 11 Global Step: 233290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:24,878-Speed 6276.24 samples/sec Loss 6.3876 
LearningRate 0.0006 Epoch: 11 Global Step: 233300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:28,128-Speed 6304.37 samples/sec Loss 6.2317 LearningRate 0.0006 Epoch: 11 Global Step: 233310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:31,369-Speed 6320.29 samples/sec Loss 6.2926 LearningRate 0.0006 Epoch: 11 Global Step: 233320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:34,615-Speed 6311.75 samples/sec Loss 6.3441 LearningRate 0.0006 Epoch: 11 Global Step: 233330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:37,858-Speed 6315.31 samples/sec Loss 6.2593 LearningRate 0.0006 Epoch: 11 Global Step: 233340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:41,104-Speed 6312.31 samples/sec Loss 6.2994 LearningRate 0.0006 Epoch: 11 Global Step: 233350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:44,352-Speed 6307.07 samples/sec Loss 6.3208 LearningRate 0.0006 Epoch: 11 Global Step: 233360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:47,596-Speed 6314.22 samples/sec Loss 6.2552 LearningRate 0.0006 Epoch: 11 Global Step: 233370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:50,825-Speed 6342.62 samples/sec Loss 6.2263 LearningRate 0.0006 Epoch: 11 Global Step: 233380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:54,074-Speed 6306.10 samples/sec Loss 6.2928 LearningRate 0.0006 Epoch: 11 Global Step: 233390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:57:57,318-Speed 6315.09 samples/sec Loss 6.2815 LearningRate 0.0006 Epoch: 11 Global Step: 233400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:00,561-Speed 6314.82 samples/sec Loss 6.2986 LearningRate 0.0006 Epoch: 11 Global Step: 233410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:03,809-Speed 6307.67 samples/sec Loss 6.3479 LearningRate 0.0006 Epoch: 11 
Global Step: 233420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:07,057-Speed 6307.09 samples/sec Loss 6.3556 LearningRate 0.0006 Epoch: 11 Global Step: 233430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:10,309-Speed 6298.10 samples/sec Loss 6.3425 LearningRate 0.0006 Epoch: 11 Global Step: 233440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:13,557-Speed 6307.72 samples/sec Loss 6.2998 LearningRate 0.0006 Epoch: 11 Global Step: 233450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:16,801-Speed 6313.74 samples/sec Loss 6.3263 LearningRate 0.0006 Epoch: 11 Global Step: 233460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:20,049-Speed 6308.10 samples/sec Loss 6.3069 LearningRate 0.0006 Epoch: 11 Global Step: 233470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:23,294-Speed 6310.81 samples/sec Loss 6.2308 LearningRate 0.0006 Epoch: 11 Global Step: 233480 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:58:26,536-Speed 6320.41 samples/sec Loss 6.2862 LearningRate 0.0006 Epoch: 11 Global Step: 233490 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:58:29,787-Speed 6300.85 samples/sec Loss 6.3318 LearningRate 0.0006 Epoch: 11 Global Step: 233500 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:58:33,029-Speed 6317.62 samples/sec Loss 6.3094 LearningRate 0.0006 Epoch: 11 Global Step: 233510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:36,276-Speed 6308.15 samples/sec Loss 6.3803 LearningRate 0.0006 Epoch: 11 Global Step: 233520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:39,525-Speed 6306.04 samples/sec Loss 6.2603 LearningRate 0.0006 Epoch: 11 Global Step: 233530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:42,771-Speed 6311.63 samples/sec Loss 6.3584 LearningRate 0.0006 Epoch: 11 Global Step: 233540 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:46,014-Speed 6315.85 samples/sec Loss 6.2263 LearningRate 0.0006 Epoch: 11 Global Step: 233550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:49,258-Speed 6314.68 samples/sec Loss 6.1961 LearningRate 0.0006 Epoch: 11 Global Step: 233560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:52,504-Speed 6310.63 samples/sec Loss 6.3179 LearningRate 0.0006 Epoch: 11 Global Step: 233570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:55,750-Speed 6311.66 samples/sec Loss 6.2620 LearningRate 0.0006 Epoch: 11 Global Step: 233580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:58:58,993-Speed 6315.30 samples/sec Loss 6.1951 LearningRate 0.0006 Epoch: 11 Global Step: 233590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:02,237-Speed 6314.95 samples/sec Loss 6.1705 LearningRate 0.0006 Epoch: 11 Global Step: 233600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:05,468-Speed 6339.94 samples/sec Loss 6.2620 LearningRate 0.0006 Epoch: 11 Global Step: 233610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:08,720-Speed 6299.07 samples/sec Loss 6.2025 LearningRate 0.0006 Epoch: 11 Global Step: 233620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:11,967-Speed 6309.30 samples/sec Loss 6.2674 LearningRate 0.0006 Epoch: 11 Global Step: 233630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:15,213-Speed 6310.41 samples/sec Loss 6.3581 LearningRate 0.0006 Epoch: 11 Global Step: 233640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:18,457-Speed 6313.61 samples/sec Loss 6.2302 LearningRate 0.0006 Epoch: 11 Global Step: 233650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:21,703-Speed 6310.46 samples/sec Loss 6.2821 LearningRate 0.0006 Epoch: 11 Global Step: 233660 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 12:59:24,954-Speed 6301.92 samples/sec Loss 6.3233 LearningRate 0.0006 Epoch: 11 Global Step: 233670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:28,202-Speed 6306.89 samples/sec Loss 6.2978 LearningRate 0.0006 Epoch: 11 Global Step: 233680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:31,445-Speed 6316.46 samples/sec Loss 6.3156 LearningRate 0.0006 Epoch: 11 Global Step: 233690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:34,689-Speed 6313.46 samples/sec Loss 6.2371 LearningRate 0.0006 Epoch: 11 Global Step: 233700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:37,933-Speed 6316.02 samples/sec Loss 6.1933 LearningRate 0.0006 Epoch: 11 Global Step: 233710 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 12:59:41,162-Speed 6343.15 samples/sec Loss 6.2773 LearningRate 0.0006 Epoch: 11 Global Step: 233720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:44,407-Speed 6312.44 samples/sec Loss 6.2223 LearningRate 0.0006 Epoch: 11 Global Step: 233730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:47,674-Speed 6269.50 samples/sec Loss 6.2421 LearningRate 0.0006 Epoch: 11 Global Step: 233740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:50,915-Speed 6320.57 samples/sec Loss 6.2924 LearningRate 0.0006 Epoch: 11 Global Step: 233750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:54,160-Speed 6314.46 samples/sec Loss 6.2858 LearningRate 0.0006 Epoch: 11 Global Step: 233760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 12:59:57,405-Speed 6311.62 samples/sec Loss 6.3697 LearningRate 0.0006 Epoch: 11 Global Step: 233770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:00,655-Speed 6303.67 samples/sec Loss 6.2990 LearningRate 0.0006 Epoch: 11 Global Step: 233780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:00:03,903-Speed 6307.47 samples/sec Loss 6.2220 LearningRate 0.0006 Epoch: 11 Global Step: 233790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:07,136-Speed 6335.62 samples/sec Loss 6.2627 LearningRate 0.0006 Epoch: 11 Global Step: 233800 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:10,383-Speed 6309.17 samples/sec Loss 6.2851 LearningRate 0.0006 Epoch: 11 Global Step: 233810 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:13,624-Speed 6320.02 samples/sec Loss 6.2680 LearningRate 0.0006 Epoch: 11 Global Step: 233820 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:16,872-Speed 6305.88 samples/sec Loss 6.2819 LearningRate 0.0006 Epoch: 11 Global Step: 233830 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:20,117-Speed 6312.97 samples/sec Loss 6.2336 LearningRate 0.0006 Epoch: 11 Global Step: 233840 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:23,360-Speed 6316.59 samples/sec Loss 6.3071 LearningRate 0.0006 Epoch: 11 Global Step: 233850 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:26,626-Speed 6272.74 samples/sec Loss 6.2067 LearningRate 0.0006 Epoch: 11 Global Step: 233860 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:29,867-Speed 6320.35 samples/sec Loss 6.2823 LearningRate 0.0006 Epoch: 11 Global Step: 233870 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:33,116-Speed 6304.14 samples/sec Loss 6.2844 LearningRate 0.0006 Epoch: 11 Global Step: 233880 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:36,358-Speed 6319.86 samples/sec Loss 6.3162 LearningRate 0.0006 Epoch: 11 Global Step: 233890 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:00:39,606-Speed 6306.86 samples/sec Loss 6.3637 LearningRate 0.0006 Epoch: 11 Global Step: 233900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:42,850-Speed 6312.94 
samples/sec Loss 6.2157 LearningRate 0.0006 Epoch: 11 Global Step: 233910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:46,098-Speed 6307.65 samples/sec Loss 6.3587 LearningRate 0.0006 Epoch: 11 Global Step: 233920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:49,399-Speed 6204.52 samples/sec Loss 6.3444 LearningRate 0.0006 Epoch: 11 Global Step: 233930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:52,643-Speed 6315.52 samples/sec Loss 6.2398 LearningRate 0.0006 Epoch: 11 Global Step: 233940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:55,888-Speed 6311.97 samples/sec Loss 6.3232 LearningRate 0.0006 Epoch: 11 Global Step: 233950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:00:59,136-Speed 6308.30 samples/sec Loss 6.2399 LearningRate 0.0006 Epoch: 11 Global Step: 233960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:02,382-Speed 6311.02 samples/sec Loss 6.2704 LearningRate 0.0006 Epoch: 11 Global Step: 233970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:05,626-Speed 6314.05 samples/sec Loss 6.2991 LearningRate 0.0006 Epoch: 11 Global Step: 233980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:08,872-Speed 6312.13 samples/sec Loss 6.2781 LearningRate 0.0006 Epoch: 11 Global Step: 233990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:12,106-Speed 6333.91 samples/sec Loss 6.2840 LearningRate 0.0006 Epoch: 11 Global Step: 234000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:15,351-Speed 6311.94 samples/sec Loss 6.3138 LearningRate 0.0006 Epoch: 11 Global Step: 234010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:18,600-Speed 6305.57 samples/sec Loss 6.2976 LearningRate 0.0006 Epoch: 11 Global Step: 234020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:21,843-Speed 6314.98 samples/sec Loss 6.3724 
LearningRate 0.0006 Epoch: 11 Global Step: 234030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:25,139-Speed 6215.69 samples/sec Loss 6.2360 LearningRate 0.0006 Epoch: 11 Global Step: 234040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:28,435-Speed 6215.03 samples/sec Loss 6.2900 LearningRate 0.0006 Epoch: 11 Global Step: 234050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:31,679-Speed 6315.19 samples/sec Loss 6.2562 LearningRate 0.0006 Epoch: 11 Global Step: 234060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:34,928-Speed 6303.43 samples/sec Loss 6.2736 LearningRate 0.0006 Epoch: 11 Global Step: 234070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:38,176-Speed 6306.77 samples/sec Loss 6.4024 LearningRate 0.0006 Epoch: 11 Global Step: 234080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:41,423-Speed 6309.84 samples/sec Loss 6.2987 LearningRate 0.0006 Epoch: 11 Global Step: 234090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:44,670-Speed 6309.00 samples/sec Loss 6.3056 LearningRate 0.0006 Epoch: 11 Global Step: 234100 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:01:47,900-Speed 6341.07 samples/sec Loss 6.3654 LearningRate 0.0006 Epoch: 11 Global Step: 234110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:51,148-Speed 6306.99 samples/sec Loss 6.3435 LearningRate 0.0006 Epoch: 11 Global Step: 234120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:54,391-Speed 6316.27 samples/sec Loss 6.2396 LearningRate 0.0006 Epoch: 11 Global Step: 234130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:01:57,637-Speed 6310.90 samples/sec Loss 6.3434 LearningRate 0.0006 Epoch: 11 Global Step: 234140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:00,884-Speed 6307.59 samples/sec Loss 6.2494 LearningRate 0.0006 Epoch: 11 
Global Step: 234150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:04,135-Speed 6301.70 samples/sec Loss 6.3125 LearningRate 0.0006 Epoch: 11 Global Step: 234160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:07,382-Speed 6308.16 samples/sec Loss 6.2904 LearningRate 0.0006 Epoch: 11 Global Step: 234170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:10,626-Speed 6314.96 samples/sec Loss 6.2356 LearningRate 0.0006 Epoch: 11 Global Step: 234180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:13,877-Speed 6301.21 samples/sec Loss 6.3300 LearningRate 0.0006 Epoch: 11 Global Step: 234190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:17,122-Speed 6313.29 samples/sec Loss 6.2790 LearningRate 0.0006 Epoch: 11 Global Step: 234200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:20,355-Speed 6336.37 samples/sec Loss 6.3518 LearningRate 0.0006 Epoch: 11 Global Step: 234210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:23,600-Speed 6312.36 samples/sec Loss 6.2232 LearningRate 0.0006 Epoch: 11 Global Step: 234220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:26,844-Speed 6315.59 samples/sec Loss 6.2489 LearningRate 0.0006 Epoch: 11 Global Step: 234230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:30,094-Speed 6302.06 samples/sec Loss 6.3152 LearningRate 0.0006 Epoch: 11 Global Step: 234240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:33,340-Speed 6310.84 samples/sec Loss 6.3266 LearningRate 0.0006 Epoch: 11 Global Step: 234250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:36,582-Speed 6318.21 samples/sec Loss 6.3199 LearningRate 0.0006 Epoch: 11 Global Step: 234260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:39,905-Speed 6164.72 samples/sec Loss 6.3540 LearningRate 0.0006 Epoch: 11 Global Step: 234270 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:43,168-Speed 6278.32 samples/sec Loss 6.2517 LearningRate 0.0006 Epoch: 11 Global Step: 234280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:46,411-Speed 6316.55 samples/sec Loss 6.2790 LearningRate 0.0006 Epoch: 11 Global Step: 234290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:49,656-Speed 6312.98 samples/sec Loss 6.2548 LearningRate 0.0006 Epoch: 11 Global Step: 234300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:52,887-Speed 6338.40 samples/sec Loss 6.2589 LearningRate 0.0006 Epoch: 11 Global Step: 234310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:56,132-Speed 6314.48 samples/sec Loss 6.3009 LearningRate 0.0006 Epoch: 11 Global Step: 234320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:02:59,374-Speed 6318.36 samples/sec Loss 6.2596 LearningRate 0.0006 Epoch: 11 Global Step: 234330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:02,617-Speed 6316.38 samples/sec Loss 6.2738 LearningRate 0.0006 Epoch: 11 Global Step: 234340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:05,860-Speed 6315.36 samples/sec Loss 6.2932 LearningRate 0.0006 Epoch: 11 Global Step: 234350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:09,104-Speed 6314.96 samples/sec Loss 6.2616 LearningRate 0.0006 Epoch: 11 Global Step: 234360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:12,348-Speed 6315.04 samples/sec Loss 6.3118 LearningRate 0.0006 Epoch: 11 Global Step: 234370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:15,590-Speed 6317.83 samples/sec Loss 6.3175 LearningRate 0.0006 Epoch: 11 Global Step: 234380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:18,837-Speed 6309.95 samples/sec Loss 6.3414 LearningRate 0.0006 Epoch: 11 Global Step: 234390 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:03:22,080-Speed 6316.45 samples/sec Loss 6.2380 LearningRate 0.0006 Epoch: 11 Global Step: 234400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:25,313-Speed 6336.69 samples/sec Loss 6.2730 LearningRate 0.0006 Epoch: 11 Global Step: 234410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:28,558-Speed 6311.47 samples/sec Loss 6.2497 LearningRate 0.0006 Epoch: 11 Global Step: 234420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:31,806-Speed 6307.25 samples/sec Loss 6.1660 LearningRate 0.0006 Epoch: 11 Global Step: 234430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:35,052-Speed 6311.05 samples/sec Loss 6.2677 LearningRate 0.0006 Epoch: 11 Global Step: 234440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:38,296-Speed 6314.10 samples/sec Loss 6.2347 LearningRate 0.0006 Epoch: 11 Global Step: 234450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:41,540-Speed 6314.45 samples/sec Loss 6.3295 LearningRate 0.0006 Epoch: 11 Global Step: 234460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:44,786-Speed 6310.58 samples/sec Loss 6.3027 LearningRate 0.0006 Epoch: 11 Global Step: 234470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:48,033-Speed 6308.47 samples/sec Loss 6.3042 LearningRate 0.0006 Epoch: 11 Global Step: 234480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:51,280-Speed 6310.04 samples/sec Loss 6.1853 LearningRate 0.0006 Epoch: 11 Global Step: 234490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:54,529-Speed 6303.27 samples/sec Loss 6.2354 LearningRate 0.0006 Epoch: 11 Global Step: 234500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:03:57,762-Speed 6337.54 samples/sec Loss 6.2811 LearningRate 0.0006 Epoch: 11 Global Step: 234510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:04:01,010-Speed 6306.74 samples/sec Loss 6.3042 LearningRate 0.0006 Epoch: 11 Global Step: 234520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:04,257-Speed 6309.15 samples/sec Loss 6.2922 LearningRate 0.0006 Epoch: 11 Global Step: 234530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:07,504-Speed 6308.05 samples/sec Loss 6.2845 LearningRate 0.0006 Epoch: 11 Global Step: 234540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:10,750-Speed 6311.57 samples/sec Loss 6.2317 LearningRate 0.0006 Epoch: 11 Global Step: 234550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:13,996-Speed 6309.04 samples/sec Loss 6.2558 LearningRate 0.0006 Epoch: 11 Global Step: 234560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:17,244-Speed 6307.86 samples/sec Loss 6.3015 LearningRate 0.0006 Epoch: 11 Global Step: 234570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:20,493-Speed 6304.64 samples/sec Loss 6.3711 LearningRate 0.0006 Epoch: 11 Global Step: 234580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:23,739-Speed 6311.13 samples/sec Loss 6.2258 LearningRate 0.0006 Epoch: 11 Global Step: 234590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:26,986-Speed 6309.47 samples/sec Loss 6.2186 LearningRate 0.0006 Epoch: 11 Global Step: 234600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:30,234-Speed 6306.73 samples/sec Loss 6.2331 LearningRate 0.0006 Epoch: 11 Global Step: 234610 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:04:33,480-Speed 6311.88 samples/sec Loss 6.2321 LearningRate 0.0006 Epoch: 11 Global Step: 234620 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:04:36,724-Speed 6313.98 samples/sec Loss 6.2396 LearningRate 0.0006 Epoch: 11 Global Step: 234630 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:04:39,971-Speed 6308.70 
samples/sec Loss 6.2800 LearningRate 0.0006 Epoch: 11 Global Step: 234640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:43,220-Speed 6303.96 samples/sec Loss 6.2693 LearningRate 0.0006 Epoch: 11 Global Step: 234650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:46,464-Speed 6317.77 samples/sec Loss 6.2171 LearningRate 0.0006 Epoch: 11 Global Step: 234660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:49,712-Speed 6307.63 samples/sec Loss 6.2136 LearningRate 0.0006 Epoch: 11 Global Step: 234670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:52,960-Speed 6305.17 samples/sec Loss 6.2525 LearningRate 0.0006 Epoch: 11 Global Step: 234680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:56,205-Speed 6313.53 samples/sec Loss 6.3071 LearningRate 0.0006 Epoch: 11 Global Step: 234690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:04:59,451-Speed 6310.87 samples/sec Loss 6.2612 LearningRate 0.0006 Epoch: 11 Global Step: 234700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:02,695-Speed 6315.76 samples/sec Loss 6.2634 LearningRate 0.0006 Epoch: 11 Global Step: 234710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:05,941-Speed 6309.45 samples/sec Loss 6.2457 LearningRate 0.0006 Epoch: 11 Global Step: 234720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:09,184-Speed 6317.99 samples/sec Loss 6.2757 LearningRate 0.0006 Epoch: 11 Global Step: 234730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:12,423-Speed 6323.65 samples/sec Loss 6.3695 LearningRate 0.0006 Epoch: 11 Global Step: 234740 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:05:15,656-Speed 6335.37 samples/sec Loss 6.2978 LearningRate 0.0006 Epoch: 11 Global Step: 234750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:18,903-Speed 6308.76 samples/sec Loss 6.2721 
LearningRate 0.0006 Epoch: 11 Global Step: 234760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:22,148-Speed 6313.98 samples/sec Loss 6.2928 LearningRate 0.0006 Epoch: 11 Global Step: 234770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:25,392-Speed 6313.81 samples/sec Loss 6.2444 LearningRate 0.0006 Epoch: 11 Global Step: 234780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:28,642-Speed 6303.90 samples/sec Loss 6.2302 LearningRate 0.0006 Epoch: 11 Global Step: 234790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:31,886-Speed 6313.26 samples/sec Loss 6.2444 LearningRate 0.0006 Epoch: 11 Global Step: 234800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:35,135-Speed 6305.95 samples/sec Loss 6.3042 LearningRate 0.0006 Epoch: 11 Global Step: 234810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:38,409-Speed 6257.24 samples/sec Loss 6.2387 LearningRate 0.0006 Epoch: 11 Global Step: 234820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:41,708-Speed 6208.65 samples/sec Loss 6.2970 LearningRate 0.0006 Epoch: 11 Global Step: 234830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:44,953-Speed 6314.38 samples/sec Loss 6.3055 LearningRate 0.0006 Epoch: 11 Global Step: 234840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:48,183-Speed 6340.50 samples/sec Loss 6.2943 LearningRate 0.0006 Epoch: 11 Global Step: 234850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:51,429-Speed 6311.24 samples/sec Loss 6.2965 LearningRate 0.0006 Epoch: 11 Global Step: 234860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:54,671-Speed 6319.00 samples/sec Loss 6.3207 LearningRate 0.0006 Epoch: 11 Global Step: 234870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:05:57,917-Speed 6309.67 samples/sec Loss 6.3777 LearningRate 0.0006 Epoch: 11 
Global Step: 234880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:01,161-Speed 6314.26 samples/sec Loss 6.2839 LearningRate 0.0006 Epoch: 11 Global Step: 234890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:04,411-Speed 6303.93 samples/sec Loss 6.2489 LearningRate 0.0006 Epoch: 11 Global Step: 234900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:07,658-Speed 6309.29 samples/sec Loss 6.3596 LearningRate 0.0006 Epoch: 11 Global Step: 234910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:10,910-Speed 6297.97 samples/sec Loss 6.2541 LearningRate 0.0006 Epoch: 11 Global Step: 234920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:14,156-Speed 6310.29 samples/sec Loss 6.2626 LearningRate 0.0006 Epoch: 11 Global Step: 234930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:17,397-Speed 6320.40 samples/sec Loss 6.2283 LearningRate 0.0006 Epoch: 11 Global Step: 234940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:20,645-Speed 6307.80 samples/sec Loss 6.2846 LearningRate 0.0006 Epoch: 11 Global Step: 234950 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:06:23,877-Speed 6338.13 samples/sec Loss 6.2671 LearningRate 0.0006 Epoch: 11 Global Step: 234960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:27,127-Speed 6302.66 samples/sec Loss 6.2607 LearningRate 0.0006 Epoch: 11 Global Step: 234970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:30,370-Speed 6316.37 samples/sec Loss 6.2625 LearningRate 0.0006 Epoch: 11 Global Step: 234980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:33,631-Speed 6281.64 samples/sec Loss 6.2722 LearningRate 0.0006 Epoch: 11 Global Step: 234990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:36,876-Speed 6312.29 samples/sec Loss 6.3444 LearningRate 0.0006 Epoch: 11 Global Step: 235000 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:40,137-Speed 6282.44 samples/sec Loss 6.3009 LearningRate 0.0006 Epoch: 11 Global Step: 235010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:43,381-Speed 6313.51 samples/sec Loss 6.2507 LearningRate 0.0006 Epoch: 11 Global Step: 235020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:46,632-Speed 6301.60 samples/sec Loss 6.2595 LearningRate 0.0006 Epoch: 11 Global Step: 235030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:49,878-Speed 6311.81 samples/sec Loss 6.2228 LearningRate 0.0006 Epoch: 11 Global Step: 235040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:53,121-Speed 6315.60 samples/sec Loss 6.1972 LearningRate 0.0006 Epoch: 11 Global Step: 235050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:06:56,371-Speed 6303.85 samples/sec Loss 6.2209 LearningRate 0.0006 Epoch: 11 Global Step: 235060 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:06:59,606-Speed 6333.24 samples/sec Loss 6.2969 LearningRate 0.0006 Epoch: 11 Global Step: 235070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:02,868-Speed 6279.87 samples/sec Loss 6.2761 LearningRate 0.0006 Epoch: 11 Global Step: 235080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:06,115-Speed 6308.30 samples/sec Loss 6.2823 LearningRate 0.0006 Epoch: 11 Global Step: 235090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:09,362-Speed 6308.09 samples/sec Loss 6.2616 LearningRate 0.0006 Epoch: 11 Global Step: 235100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:12,606-Speed 6315.17 samples/sec Loss 6.3150 LearningRate 0.0006 Epoch: 11 Global Step: 235110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:15,851-Speed 6312.82 samples/sec Loss 6.3632 LearningRate 0.0006 Epoch: 11 Global Step: 235120 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:07:19,096-Speed 6311.56 samples/sec Loss 6.2704 LearningRate 0.0006 Epoch: 11 Global Step: 235130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:22,339-Speed 6318.00 samples/sec Loss 6.2689 LearningRate 0.0006 Epoch: 11 Global Step: 235140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:25,589-Speed 6302.35 samples/sec Loss 6.3708 LearningRate 0.0006 Epoch: 11 Global Step: 235150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:28,833-Speed 6313.43 samples/sec Loss 6.2336 LearningRate 0.0006 Epoch: 11 Global Step: 235160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:32,076-Speed 6318.47 samples/sec Loss 6.2693 LearningRate 0.0006 Epoch: 11 Global Step: 235170 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:07:35,309-Speed 6335.27 samples/sec Loss 6.2293 LearningRate 0.0006 Epoch: 11 Global Step: 235180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:38,555-Speed 6310.80 samples/sec Loss 6.2963 LearningRate 0.0006 Epoch: 11 Global Step: 235190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:41,809-Speed 6294.68 samples/sec Loss 6.3237 LearningRate 0.0006 Epoch: 11 Global Step: 235200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:45,056-Speed 6309.34 samples/sec Loss 6.2432 LearningRate 0.0006 Epoch: 11 Global Step: 235210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:48,301-Speed 6312.92 samples/sec Loss 6.2327 LearningRate 0.0006 Epoch: 11 Global Step: 235220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:51,539-Speed 6325.10 samples/sec Loss 6.2431 LearningRate 0.0006 Epoch: 11 Global Step: 235230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:07:54,782-Speed 6317.79 samples/sec Loss 6.3277 LearningRate 0.0006 Epoch: 11 Global Step: 235240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:07:58,027-Speed 6312.02 samples/sec Loss 6.2823 LearningRate 0.0006 Epoch: 11 Global Step: 235250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:01,274-Speed 6310.20 samples/sec Loss 6.2184 LearningRate 0.0006 Epoch: 11 Global Step: 235260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:04,517-Speed 6316.06 samples/sec Loss 6.2446 LearningRate 0.0006 Epoch: 11 Global Step: 235270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:07,748-Speed 6341.26 samples/sec Loss 6.2774 LearningRate 0.0006 Epoch: 11 Global Step: 235280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:10,991-Speed 6315.25 samples/sec Loss 6.3136 LearningRate 0.0006 Epoch: 11 Global Step: 235290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:14,237-Speed 6311.60 samples/sec Loss 6.2517 LearningRate 0.0006 Epoch: 11 Global Step: 235300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:17,483-Speed 6309.63 samples/sec Loss 6.2559 LearningRate 0.0006 Epoch: 11 Global Step: 235310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:20,736-Speed 6298.02 samples/sec Loss 6.2440 LearningRate 0.0006 Epoch: 11 Global Step: 235320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:23,980-Speed 6313.43 samples/sec Loss 6.3666 LearningRate 0.0006 Epoch: 11 Global Step: 235330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:27,224-Speed 6314.31 samples/sec Loss 6.3139 LearningRate 0.0006 Epoch: 11 Global Step: 235340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:30,469-Speed 6314.13 samples/sec Loss 6.2953 LearningRate 0.0006 Epoch: 11 Global Step: 235350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:33,714-Speed 6312.43 samples/sec Loss 6.2212 LearningRate 0.0006 Epoch: 11 Global Step: 235360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:36,962-Speed 6305.83 
samples/sec Loss 6.2556 LearningRate 0.0006 Epoch: 11 Global Step: 235370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:40,191-Speed 6343.73 samples/sec Loss 6.2105 LearningRate 0.0006 Epoch: 11 Global Step: 235380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:43,441-Speed 6304.41 samples/sec Loss 6.2792 LearningRate 0.0006 Epoch: 11 Global Step: 235390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:46,683-Speed 6318.40 samples/sec Loss 6.3550 LearningRate 0.0006 Epoch: 11 Global Step: 235400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:49,927-Speed 6314.98 samples/sec Loss 6.2329 LearningRate 0.0006 Epoch: 11 Global Step: 235410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:53,174-Speed 6308.57 samples/sec Loss 6.2595 LearningRate 0.0006 Epoch: 11 Global Step: 235420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:56,417-Speed 6315.31 samples/sec Loss 6.2741 LearningRate 0.0006 Epoch: 11 Global Step: 235430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:08:59,666-Speed 6305.37 samples/sec Loss 6.3693 LearningRate 0.0006 Epoch: 11 Global Step: 235440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:02,911-Speed 6312.59 samples/sec Loss 6.2808 LearningRate 0.0006 Epoch: 11 Global Step: 235450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:06,154-Speed 6315.89 samples/sec Loss 6.1430 LearningRate 0.0006 Epoch: 11 Global Step: 235460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:09,401-Speed 6311.12 samples/sec Loss 6.2855 LearningRate 0.0006 Epoch: 11 Global Step: 235470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:12,647-Speed 6309.71 samples/sec Loss 6.3233 LearningRate 0.0006 Epoch: 11 Global Step: 235480 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:09:15,878-Speed 6339.78 samples/sec Loss 6.2026 
LearningRate 0.0006 Epoch: 11 Global Step: 235490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:19,126-Speed 6306.73 samples/sec Loss 6.1791 LearningRate 0.0006 Epoch: 11 Global Step: 235500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:22,372-Speed 6311.24 samples/sec Loss 6.2957 LearningRate 0.0006 Epoch: 11 Global Step: 235510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:25,617-Speed 6313.17 samples/sec Loss 6.3652 LearningRate 0.0006 Epoch: 11 Global Step: 235520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:28,860-Speed 6316.37 samples/sec Loss 6.2020 LearningRate 0.0006 Epoch: 11 Global Step: 235530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:32,109-Speed 6303.97 samples/sec Loss 6.2251 LearningRate 0.0006 Epoch: 11 Global Step: 235540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:35,355-Speed 6310.26 samples/sec Loss 6.2238 LearningRate 0.0006 Epoch: 11 Global Step: 235550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:38,601-Speed 6313.43 samples/sec Loss 6.3059 LearningRate 0.0006 Epoch: 11 Global Step: 235560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:41,847-Speed 6311.20 samples/sec Loss 6.2184 LearningRate 0.0006 Epoch: 11 Global Step: 235570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:45,094-Speed 6308.45 samples/sec Loss 6.3207 LearningRate 0.0006 Epoch: 11 Global Step: 235580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:48,343-Speed 6304.78 samples/sec Loss 6.3234 LearningRate 0.0006 Epoch: 11 Global Step: 235590 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:09:51,581-Speed 6327.50 samples/sec Loss 6.2492 LearningRate 0.0006 Epoch: 11 Global Step: 235600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:54,828-Speed 6307.13 samples/sec Loss 6.2490 LearningRate 0.0006 Epoch: 11 
Global Step: 235610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:09:58,073-Speed 6313.79 samples/sec Loss 6.2592 LearningRate 0.0006 Epoch: 11 Global Step: 235620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:01,319-Speed 6309.96 samples/sec Loss 6.2840 LearningRate 0.0006 Epoch: 11 Global Step: 235630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:04,566-Speed 6309.11 samples/sec Loss 6.2707 LearningRate 0.0006 Epoch: 11 Global Step: 235640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:07,819-Speed 6298.16 samples/sec Loss 6.2851 LearningRate 0.0006 Epoch: 11 Global Step: 235650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:11,076-Speed 6289.84 samples/sec Loss 6.1296 LearningRate 0.0006 Epoch: 11 Global Step: 235660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:14,324-Speed 6306.76 samples/sec Loss 6.2582 LearningRate 0.0006 Epoch: 11 Global Step: 235670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:17,570-Speed 6311.13 samples/sec Loss 6.2138 LearningRate 0.0006 Epoch: 11 Global Step: 235680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:20,819-Speed 6304.80 samples/sec Loss 6.2883 LearningRate 0.0006 Epoch: 11 Global Step: 235690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:24,064-Speed 6311.90 samples/sec Loss 6.2676 LearningRate 0.0006 Epoch: 11 Global Step: 235700 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:10:27,297-Speed 6335.31 samples/sec Loss 6.3158 LearningRate 0.0006 Epoch: 11 Global Step: 235710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:30,542-Speed 6314.10 samples/sec Loss 6.2331 LearningRate 0.0006 Epoch: 11 Global Step: 235720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:33,788-Speed 6309.41 samples/sec Loss 6.3487 LearningRate 0.0006 Epoch: 11 Global Step: 235730 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:37,036-Speed 6307.00 samples/sec Loss 6.2621 LearningRate 0.0006 Epoch: 11 Global Step: 235740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:40,291-Speed 6293.77 samples/sec Loss 6.1715 LearningRate 0.0006 Epoch: 11 Global Step: 235750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:43,542-Speed 6300.20 samples/sec Loss 6.2422 LearningRate 0.0006 Epoch: 11 Global Step: 235760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:46,788-Speed 6311.85 samples/sec Loss 6.3177 LearningRate 0.0006 Epoch: 11 Global Step: 235770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:50,034-Speed 6309.65 samples/sec Loss 6.1914 LearningRate 0.0006 Epoch: 11 Global Step: 235780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:53,281-Speed 6308.64 samples/sec Loss 6.2060 LearningRate 0.0006 Epoch: 11 Global Step: 235790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:56,534-Speed 6296.85 samples/sec Loss 6.1566 LearningRate 0.0006 Epoch: 11 Global Step: 235800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:10:59,773-Speed 6325.35 samples/sec Loss 6.3237 LearningRate 0.0006 Epoch: 11 Global Step: 235810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:03,015-Speed 6318.67 samples/sec Loss 6.2991 LearningRate 0.0006 Epoch: 11 Global Step: 235820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:06,261-Speed 6310.45 samples/sec Loss 6.3157 LearningRate 0.0006 Epoch: 11 Global Step: 235830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:09,509-Speed 6307.07 samples/sec Loss 6.2162 LearningRate 0.0006 Epoch: 11 Global Step: 235840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:12,761-Speed 6297.86 samples/sec Loss 6.2601 LearningRate 0.0006 Epoch: 11 Global Step: 235850 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:11:16,009-Speed 6308.31 samples/sec Loss 6.2620 LearningRate 0.0006 Epoch: 11 Global Step: 235860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:19,253-Speed 6315.35 samples/sec Loss 6.2954 LearningRate 0.0006 Epoch: 11 Global Step: 235870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:22,502-Speed 6304.00 samples/sec Loss 6.2986 LearningRate 0.0006 Epoch: 11 Global Step: 235880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:25,743-Speed 6320.79 samples/sec Loss 6.2496 LearningRate 0.0006 Epoch: 11 Global Step: 235890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:28,989-Speed 6309.73 samples/sec Loss 6.2378 LearningRate 0.0006 Epoch: 11 Global Step: 235900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:32,237-Speed 6308.06 samples/sec Loss 6.1967 LearningRate 0.0006 Epoch: 11 Global Step: 235910 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:11:35,471-Speed 6334.03 samples/sec Loss 6.2938 LearningRate 0.0006 Epoch: 11 Global Step: 235920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:38,717-Speed 6311.07 samples/sec Loss 6.2700 LearningRate 0.0006 Epoch: 11 Global Step: 235930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:41,964-Speed 6309.07 samples/sec Loss 6.2873 LearningRate 0.0006 Epoch: 11 Global Step: 235940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:45,222-Speed 6286.91 samples/sec Loss 6.2106 LearningRate 0.0006 Epoch: 11 Global Step: 235950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:48,467-Speed 6312.92 samples/sec Loss 6.2520 LearningRate 0.0006 Epoch: 11 Global Step: 235960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:51,723-Speed 6291.59 samples/sec Loss 6.2970 LearningRate 0.0006 Epoch: 11 Global Step: 235970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:11:54,971-Speed 6305.86 samples/sec Loss 6.3299 LearningRate 0.0006 Epoch: 11 Global Step: 235980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:11:58,216-Speed 6311.73 samples/sec Loss 6.1760 LearningRate 0.0006 Epoch: 11 Global Step: 235990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:01,463-Speed 6310.14 samples/sec Loss 6.2065 LearningRate 0.0006 Epoch: 11 Global Step: 236000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:04,708-Speed 6311.73 samples/sec Loss 6.2913 LearningRate 0.0006 Epoch: 11 Global Step: 236010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:07,952-Speed 6315.58 samples/sec Loss 6.3105 LearningRate 0.0006 Epoch: 11 Global Step: 236020 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:12:11,181-Speed 6343.10 samples/sec Loss 6.2756 LearningRate 0.0006 Epoch: 11 Global Step: 236030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:14,428-Speed 6308.41 samples/sec Loss 6.3296 LearningRate 0.0006 Epoch: 11 Global Step: 236040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:17,678-Speed 6303.63 samples/sec Loss 6.2081 LearningRate 0.0006 Epoch: 11 Global Step: 236050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:20,925-Speed 6309.19 samples/sec Loss 6.3211 LearningRate 0.0006 Epoch: 11 Global Step: 236060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:24,173-Speed 6308.08 samples/sec Loss 6.2225 LearningRate 0.0006 Epoch: 11 Global Step: 236070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:27,416-Speed 6316.38 samples/sec Loss 6.2668 LearningRate 0.0006 Epoch: 11 Global Step: 236080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:30,663-Speed 6308.40 samples/sec Loss 6.2538 LearningRate 0.0006 Epoch: 11 Global Step: 236090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:33,911-Speed 6307.58 
samples/sec Loss 6.1982 LearningRate 0.0006 Epoch: 11 Global Step: 236100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:37,157-Speed 6309.73 samples/sec Loss 6.2254 LearningRate 0.0006 Epoch: 11 Global Step: 236110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:40,410-Speed 6296.69 samples/sec Loss 6.2834 LearningRate 0.0006 Epoch: 11 Global Step: 236120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:43,639-Speed 6343.99 samples/sec Loss 6.2559 LearningRate 0.0006 Epoch: 11 Global Step: 236130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:46,885-Speed 6310.71 samples/sec Loss 6.2324 LearningRate 0.0006 Epoch: 11 Global Step: 236140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:50,145-Speed 6284.08 samples/sec Loss 6.2494 LearningRate 0.0006 Epoch: 11 Global Step: 236150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:53,389-Speed 6315.74 samples/sec Loss 6.2307 LearningRate 0.0006 Epoch: 11 Global Step: 236160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:56,634-Speed 6310.69 samples/sec Loss 6.2816 LearningRate 0.0006 Epoch: 11 Global Step: 236170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:12:59,885-Speed 6301.26 samples/sec Loss 6.2138 LearningRate 0.0006 Epoch: 11 Global Step: 236180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:03,131-Speed 6310.78 samples/sec Loss 6.3379 LearningRate 0.0006 Epoch: 11 Global Step: 236190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:06,375-Speed 6315.40 samples/sec Loss 6.3545 LearningRate 0.0006 Epoch: 11 Global Step: 236200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:09,619-Speed 6314.17 samples/sec Loss 6.2319 LearningRate 0.0006 Epoch: 11 Global Step: 236210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:12,864-Speed 6313.82 samples/sec Loss 6.2792 
LearningRate 0.0006 Epoch: 11 Global Step: 236220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:16,103-Speed 6323.03 samples/sec Loss 6.2850 LearningRate 0.0006 Epoch: 11 Global Step: 236230 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:13:19,334-Speed 6339.60 samples/sec Loss 6.3177 LearningRate 0.0006 Epoch: 11 Global Step: 236240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:22,575-Speed 6321.22 samples/sec Loss 6.2253 LearningRate 0.0006 Epoch: 11 Global Step: 236250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:25,821-Speed 6310.77 samples/sec Loss 6.3236 LearningRate 0.0006 Epoch: 11 Global Step: 236260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:29,066-Speed 6313.14 samples/sec Loss 6.3367 LearningRate 0.0006 Epoch: 11 Global Step: 236270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:32,318-Speed 6298.87 samples/sec Loss 6.3149 LearningRate 0.0006 Epoch: 11 Global Step: 236280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:35,559-Speed 6320.39 samples/sec Loss 6.3000 LearningRate 0.0006 Epoch: 11 Global Step: 236290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:38,805-Speed 6312.13 samples/sec Loss 6.2085 LearningRate 0.0006 Epoch: 11 Global Step: 236300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:42,048-Speed 6315.73 samples/sec Loss 6.1982 LearningRate 0.0006 Epoch: 11 Global Step: 236310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:45,295-Speed 6308.38 samples/sec Loss 6.2094 LearningRate 0.0006 Epoch: 11 Global Step: 236320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:48,544-Speed 6305.47 samples/sec Loss 6.2063 LearningRate 0.0006 Epoch: 11 Global Step: 236330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:13:51,846-Speed 6204.30 samples/sec Loss 6.3780 LearningRate 0.0006 Epoch: 11 
Global Step: 236340 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:13:55,088-Speed 6318.61 samples/sec Loss 6.2733 LearningRate 0.0006 Epoch: 11 Global Step: 236350 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:13:58,318-Speed 6340.50 samples/sec Loss 6.2937 LearningRate 0.0006 Epoch: 11 Global Step: 236360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:01,562-Speed 6316.09 samples/sec Loss 6.3494 LearningRate 0.0006 Epoch: 11 Global Step: 236370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:04,812-Speed 6302.02 samples/sec Loss 6.3169 LearningRate 0.0006 Epoch: 11 Global Step: 236380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:08,057-Speed 6312.32 samples/sec Loss 6.2534 LearningRate 0.0006 Epoch: 11 Global Step: 236390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:11,304-Speed 6309.54 samples/sec Loss 6.2603 LearningRate 0.0006 Epoch: 11 Global Step: 236400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:14,551-Speed 6309.12 samples/sec Loss 6.2903 LearningRate 0.0006 Epoch: 11 Global Step: 236410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:17,796-Speed 6311.72 samples/sec Loss 6.2486 LearningRate 0.0006 Epoch: 11 Global Step: 236420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:21,043-Speed 6308.94 samples/sec Loss 6.2446 LearningRate 0.0006 Epoch: 11 Global Step: 236430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:24,290-Speed 6309.24 samples/sec Loss 6.2401 LearningRate 0.0006 Epoch: 11 Global Step: 236440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:27,605-Speed 6178.93 samples/sec Loss 6.2599 LearningRate 0.0006 Epoch: 11 Global Step: 236450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:30,851-Speed 6310.10 samples/sec Loss 6.3852 LearningRate 0.0006 Epoch: 11 Global Step: 236460 Fp16 Grad 
Scale: 65536 Required: 54 hours Training: 2022-04-01 13:14:34,086-Speed 6331.10 samples/sec Loss 6.2614 LearningRate 0.0006 Epoch: 11 Global Step: 236470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:37,338-Speed 6301.01 samples/sec Loss 6.1999 LearningRate 0.0006 Epoch: 11 Global Step: 236480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:40,588-Speed 6302.34 samples/sec Loss 6.2662 LearningRate 0.0006 Epoch: 11 Global Step: 236490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:43,837-Speed 6305.03 samples/sec Loss 6.2897 LearningRate 0.0006 Epoch: 11 Global Step: 236500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:47,084-Speed 6309.07 samples/sec Loss 6.2090 LearningRate 0.0006 Epoch: 11 Global Step: 236510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:50,336-Speed 6299.96 samples/sec Loss 6.2839 LearningRate 0.0006 Epoch: 11 Global Step: 236520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:53,584-Speed 6306.50 samples/sec Loss 6.2488 LearningRate 0.0006 Epoch: 11 Global Step: 236530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:14:56,828-Speed 6314.51 samples/sec Loss 6.2471 LearningRate 0.0006 Epoch: 11 Global Step: 236540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:00,073-Speed 6313.25 samples/sec Loss 6.2547 LearningRate 0.0006 Epoch: 11 Global Step: 236550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:03,320-Speed 6307.86 samples/sec Loss 6.2445 LearningRate 0.0006 Epoch: 11 Global Step: 236560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:06,551-Speed 6340.29 samples/sec Loss 6.2281 LearningRate 0.0006 Epoch: 11 Global Step: 236570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:09,798-Speed 6310.03 samples/sec Loss 6.2765 LearningRate 0.0006 Epoch: 11 Global Step: 236580 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:15:13,042-Speed 6314.48 samples/sec Loss 6.2798 LearningRate 0.0006 Epoch: 11 Global Step: 236590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:16,291-Speed 6303.61 samples/sec Loss 6.2232 LearningRate 0.0006 Epoch: 11 Global Step: 236600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:19,537-Speed 6311.41 samples/sec Loss 6.2602 LearningRate 0.0006 Epoch: 11 Global Step: 236610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:22,781-Speed 6313.00 samples/sec Loss 6.2861 LearningRate 0.0006 Epoch: 11 Global Step: 236620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:26,054-Speed 6259.29 samples/sec Loss 6.2692 LearningRate 0.0006 Epoch: 11 Global Step: 236630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:29,301-Speed 6308.76 samples/sec Loss 6.2038 LearningRate 0.0006 Epoch: 11 Global Step: 236640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:32,548-Speed 6308.47 samples/sec Loss 6.2152 LearningRate 0.0006 Epoch: 11 Global Step: 236650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:35,793-Speed 6312.78 samples/sec Loss 6.3338 LearningRate 0.0006 Epoch: 11 Global Step: 236660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:39,021-Speed 6345.46 samples/sec Loss 6.3505 LearningRate 0.0006 Epoch: 11 Global Step: 236670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:42,274-Speed 6298.77 samples/sec Loss 6.2933 LearningRate 0.0006 Epoch: 11 Global Step: 236680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:45,518-Speed 6313.44 samples/sec Loss 6.2473 LearningRate 0.0006 Epoch: 11 Global Step: 236690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:48,765-Speed 6308.57 samples/sec Loss 6.1379 LearningRate 0.0006 Epoch: 11 Global Step: 236700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:15:52,010-Speed 6314.22 samples/sec Loss 6.2051 LearningRate 0.0006 Epoch: 11 Global Step: 236710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:55,255-Speed 6311.37 samples/sec Loss 6.2352 LearningRate 0.0006 Epoch: 11 Global Step: 236720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:15:58,502-Speed 6309.78 samples/sec Loss 6.2547 LearningRate 0.0006 Epoch: 11 Global Step: 236730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:01,746-Speed 6314.81 samples/sec Loss 6.3035 LearningRate 0.0006 Epoch: 11 Global Step: 236740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:04,993-Speed 6307.91 samples/sec Loss 6.2444 LearningRate 0.0006 Epoch: 11 Global Step: 236750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:08,241-Speed 6307.89 samples/sec Loss 6.2299 LearningRate 0.0006 Epoch: 11 Global Step: 236760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:11,472-Speed 6340.55 samples/sec Loss 6.2526 LearningRate 0.0006 Epoch: 11 Global Step: 236770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:14,720-Speed 6305.80 samples/sec Loss 6.2321 LearningRate 0.0006 Epoch: 11 Global Step: 236780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:17,970-Speed 6303.90 samples/sec Loss 6.2182 LearningRate 0.0006 Epoch: 11 Global Step: 236790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:21,216-Speed 6311.04 samples/sec Loss 6.2788 LearningRate 0.0006 Epoch: 11 Global Step: 236800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:24,459-Speed 6316.40 samples/sec Loss 6.3094 LearningRate 0.0006 Epoch: 11 Global Step: 236810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:27,705-Speed 6309.95 samples/sec Loss 6.3189 LearningRate 0.0006 Epoch: 11 Global Step: 236820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:30,952-Speed 6309.36 
samples/sec Loss 6.1584 LearningRate 0.0006 Epoch: 11 Global Step: 236830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:34,196-Speed 6314.73 samples/sec Loss 6.2473 LearningRate 0.0006 Epoch: 11 Global Step: 236840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:37,446-Speed 6303.37 samples/sec Loss 6.2710 LearningRate 0.0006 Epoch: 11 Global Step: 236850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:40,700-Speed 6293.55 samples/sec Loss 6.2453 LearningRate 0.0006 Epoch: 11 Global Step: 236860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:43,947-Speed 6309.42 samples/sec Loss 6.2256 LearningRate 0.0006 Epoch: 11 Global Step: 236870 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:16:47,192-Speed 6311.89 samples/sec Loss 6.1707 LearningRate 0.0006 Epoch: 11 Global Step: 236880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:50,439-Speed 6310.40 samples/sec Loss 6.2577 LearningRate 0.0006 Epoch: 11 Global Step: 236890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:53,684-Speed 6312.60 samples/sec Loss 6.1566 LearningRate 0.0006 Epoch: 11 Global Step: 236900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:16:56,925-Speed 6319.04 samples/sec Loss 6.2481 LearningRate 0.0006 Epoch: 11 Global Step: 236910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:00,174-Speed 6305.21 samples/sec Loss 6.1890 LearningRate 0.0006 Epoch: 11 Global Step: 236920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:03,423-Speed 6306.06 samples/sec Loss 6.2232 LearningRate 0.0006 Epoch: 11 Global Step: 236930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:06,671-Speed 6307.35 samples/sec Loss 6.3106 LearningRate 0.0006 Epoch: 11 Global Step: 236940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:09,915-Speed 6315.12 samples/sec Loss 6.2690 
LearningRate 0.0006 Epoch: 11 Global Step: 236950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:13,159-Speed 6313.52 samples/sec Loss 6.2516 LearningRate 0.0006 Epoch: 11 Global Step: 236960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:16,404-Speed 6313.15 samples/sec Loss 6.2955 LearningRate 0.0006 Epoch: 11 Global Step: 236970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:19,650-Speed 6310.11 samples/sec Loss 6.1971 LearningRate 0.0006 Epoch: 11 Global Step: 236980 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:17:22,883-Speed 6335.56 samples/sec Loss 6.2401 LearningRate 0.0006 Epoch: 11 Global Step: 236990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:26,127-Speed 6316.05 samples/sec Loss 6.3206 LearningRate 0.0006 Epoch: 11 Global Step: 237000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:29,375-Speed 6305.35 samples/sec Loss 6.2655 LearningRate 0.0006 Epoch: 11 Global Step: 237010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:32,618-Speed 6318.10 samples/sec Loss 6.2783 LearningRate 0.0006 Epoch: 11 Global Step: 237020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:35,861-Speed 6316.30 samples/sec Loss 6.2045 LearningRate 0.0006 Epoch: 11 Global Step: 237030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:39,104-Speed 6315.95 samples/sec Loss 6.3189 LearningRate 0.0006 Epoch: 11 Global Step: 237040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:42,349-Speed 6312.91 samples/sec Loss 6.2919 LearningRate 0.0006 Epoch: 11 Global Step: 237050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:45,591-Speed 6318.53 samples/sec Loss 6.3066 LearningRate 0.0006 Epoch: 11 Global Step: 237060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:48,837-Speed 6309.66 samples/sec Loss 6.1513 LearningRate 0.0006 Epoch: 11 
Global Step: 237070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:52,081-Speed 6314.57 samples/sec Loss 6.2451 LearningRate 0.0006 Epoch: 11 Global Step: 237080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:55,313-Speed 6338.06 samples/sec Loss 6.2838 LearningRate 0.0006 Epoch: 11 Global Step: 237090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:17:58,560-Speed 6308.96 samples/sec Loss 6.2933 LearningRate 0.0006 Epoch: 11 Global Step: 237100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:01,806-Speed 6311.39 samples/sec Loss 6.2535 LearningRate 0.0006 Epoch: 11 Global Step: 237110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:05,060-Speed 6294.94 samples/sec Loss 6.2805 LearningRate 0.0006 Epoch: 11 Global Step: 237120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:08,309-Speed 6305.63 samples/sec Loss 6.2120 LearningRate 0.0006 Epoch: 11 Global Step: 237130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:11,555-Speed 6310.76 samples/sec Loss 6.3833 LearningRate 0.0006 Epoch: 11 Global Step: 237140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:14,807-Speed 6303.39 samples/sec Loss 6.2195 LearningRate 0.0006 Epoch: 11 Global Step: 237150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:18,054-Speed 6309.62 samples/sec Loss 6.2813 LearningRate 0.0006 Epoch: 11 Global Step: 237160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:21,297-Speed 6314.96 samples/sec Loss 6.2146 LearningRate 0.0006 Epoch: 11 Global Step: 237170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:24,546-Speed 6305.15 samples/sec Loss 6.1610 LearningRate 0.0006 Epoch: 11 Global Step: 237180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:27,780-Speed 6334.96 samples/sec Loss 6.2592 LearningRate 0.0006 Epoch: 11 Global Step: 237190 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:31,022-Speed 6318.11 samples/sec Loss 6.2686 LearningRate 0.0006 Epoch: 11 Global Step: 237200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:34,268-Speed 6309.47 samples/sec Loss 6.2113 LearningRate 0.0006 Epoch: 11 Global Step: 237210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:37,516-Speed 6311.24 samples/sec Loss 6.2410 LearningRate 0.0006 Epoch: 11 Global Step: 237220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:40,780-Speed 6274.24 samples/sec Loss 6.2536 LearningRate 0.0006 Epoch: 11 Global Step: 237230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:44,029-Speed 6304.99 samples/sec Loss 6.2092 LearningRate 0.0006 Epoch: 11 Global Step: 237240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:47,275-Speed 6311.37 samples/sec Loss 6.2117 LearningRate 0.0006 Epoch: 11 Global Step: 237250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:50,521-Speed 6310.45 samples/sec Loss 6.2384 LearningRate 0.0006 Epoch: 11 Global Step: 237260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:53,765-Speed 6314.37 samples/sec Loss 6.1905 LearningRate 0.0006 Epoch: 11 Global Step: 237270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:18:57,005-Speed 6322.22 samples/sec Loss 6.2415 LearningRate 0.0006 Epoch: 11 Global Step: 237280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:00,238-Speed 6335.99 samples/sec Loss 6.2284 LearningRate 0.0006 Epoch: 11 Global Step: 237290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:03,486-Speed 6308.08 samples/sec Loss 6.3078 LearningRate 0.0006 Epoch: 11 Global Step: 237300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:06,728-Speed 6317.20 samples/sec Loss 6.1941 LearningRate 0.0006 Epoch: 11 Global Step: 237310 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:19:09,974-Speed 6311.71 samples/sec Loss 6.2091 LearningRate 0.0006 Epoch: 11 Global Step: 237320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:13,218-Speed 6314.98 samples/sec Loss 6.2765 LearningRate 0.0006 Epoch: 11 Global Step: 237330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:16,465-Speed 6308.84 samples/sec Loss 6.2351 LearningRate 0.0006 Epoch: 11 Global Step: 237340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:19,714-Speed 6303.79 samples/sec Loss 6.2225 LearningRate 0.0006 Epoch: 11 Global Step: 237350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:22,962-Speed 6307.64 samples/sec Loss 6.3341 LearningRate 0.0006 Epoch: 11 Global Step: 237360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:26,205-Speed 6316.78 samples/sec Loss 6.2531 LearningRate 0.0006 Epoch: 11 Global Step: 237370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:29,453-Speed 6306.72 samples/sec Loss 6.3579 LearningRate 0.0006 Epoch: 11 Global Step: 237380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:32,683-Speed 6341.42 samples/sec Loss 6.3479 LearningRate 0.0006 Epoch: 11 Global Step: 237390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:35,930-Speed 6310.27 samples/sec Loss 6.2534 LearningRate 0.0006 Epoch: 11 Global Step: 237400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:39,172-Speed 6317.60 samples/sec Loss 6.2391 LearningRate 0.0006 Epoch: 11 Global Step: 237410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:42,423-Speed 6300.98 samples/sec Loss 6.1828 LearningRate 0.0006 Epoch: 11 Global Step: 237420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:45,665-Speed 6319.60 samples/sec Loss 6.2990 LearningRate 0.0006 Epoch: 11 Global Step: 237430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:19:48,912-Speed 6307.49 samples/sec Loss 6.1906 LearningRate 0.0006 Epoch: 11 Global Step: 237440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:52,159-Speed 6308.98 samples/sec Loss 6.2416 LearningRate 0.0006 Epoch: 11 Global Step: 237450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:55,402-Speed 6317.05 samples/sec Loss 6.3449 LearningRate 0.0006 Epoch: 11 Global Step: 237460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:19:58,647-Speed 6312.41 samples/sec Loss 6.1513 LearningRate 0.0006 Epoch: 11 Global Step: 237470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:01,908-Speed 6282.61 samples/sec Loss 6.2733 LearningRate 0.0006 Epoch: 11 Global Step: 237480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:05,221-Speed 6182.36 samples/sec Loss 6.2887 LearningRate 0.0006 Epoch: 11 Global Step: 237490 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:20:08,476-Speed 6296.15 samples/sec Loss 6.3108 LearningRate 0.0006 Epoch: 11 Global Step: 237500 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:20:11,710-Speed 6334.90 samples/sec Loss 6.2003 LearningRate 0.0006 Epoch: 11 Global Step: 237510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:14,954-Speed 6314.77 samples/sec Loss 6.2159 LearningRate 0.0006 Epoch: 11 Global Step: 237520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:18,203-Speed 6303.62 samples/sec Loss 6.2544 LearningRate 0.0006 Epoch: 11 Global Step: 237530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:21,447-Speed 6315.57 samples/sec Loss 6.1833 LearningRate 0.0006 Epoch: 11 Global Step: 237540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:24,695-Speed 6306.68 samples/sec Loss 6.3137 LearningRate 0.0006 Epoch: 11 Global Step: 237550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:27,946-Speed 6300.28 
samples/sec Loss 6.2221 LearningRate 0.0006 Epoch: 11 Global Step: 237560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:31,193-Speed 6309.30 samples/sec Loss 6.2866 LearningRate 0.0006 Epoch: 11 Global Step: 237570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:34,439-Speed 6311.58 samples/sec Loss 6.2542 LearningRate 0.0006 Epoch: 11 Global Step: 237580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:37,684-Speed 6312.80 samples/sec Loss 6.1959 LearningRate 0.0006 Epoch: 11 Global Step: 237590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:40,930-Speed 6310.24 samples/sec Loss 6.2667 LearningRate 0.0006 Epoch: 11 Global Step: 237600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:44,160-Speed 6342.06 samples/sec Loss 6.1775 LearningRate 0.0006 Epoch: 11 Global Step: 237610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:47,414-Speed 6295.26 samples/sec Loss 6.3052 LearningRate 0.0006 Epoch: 11 Global Step: 237620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:50,659-Speed 6313.20 samples/sec Loss 6.2940 LearningRate 0.0006 Epoch: 11 Global Step: 237630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:53,904-Speed 6312.59 samples/sec Loss 6.2263 LearningRate 0.0006 Epoch: 11 Global Step: 237640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:20:57,151-Speed 6308.99 samples/sec Loss 6.3116 LearningRate 0.0006 Epoch: 11 Global Step: 237650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:00,396-Speed 6312.75 samples/sec Loss 6.1620 LearningRate 0.0006 Epoch: 11 Global Step: 237660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:03,643-Speed 6308.20 samples/sec Loss 6.1961 LearningRate 0.0006 Epoch: 11 Global Step: 237670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:06,890-Speed 6309.57 samples/sec Loss 6.2359 
LearningRate 0.0006 Epoch: 11 Global Step: 237680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:10,131-Speed 6318.63 samples/sec Loss 6.2056 LearningRate 0.0006 Epoch: 11 Global Step: 237690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:13,363-Speed 6338.33 samples/sec Loss 6.2523 LearningRate 0.0006 Epoch: 11 Global Step: 237700 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:16,607-Speed 6315.85 samples/sec Loss 6.1814 LearningRate 0.0006 Epoch: 11 Global Step: 237710 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:19,855-Speed 6305.36 samples/sec Loss 6.2638 LearningRate 0.0006 Epoch: 11 Global Step: 237720 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:23,104-Speed 6305.60 samples/sec Loss 6.2003 LearningRate 0.0006 Epoch: 11 Global Step: 237730 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:26,348-Speed 6314.71 samples/sec Loss 6.2721 LearningRate 0.0006 Epoch: 11 Global Step: 237740 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:29,597-Speed 6304.78 samples/sec Loss 6.2023 LearningRate 0.0006 Epoch: 11 Global Step: 237750 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:32,844-Speed 6308.55 samples/sec Loss 6.2391 LearningRate 0.0006 Epoch: 11 Global Step: 237760 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:36,087-Speed 6316.22 samples/sec Loss 6.2800 LearningRate 0.0006 Epoch: 11 Global Step: 237770 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:39,330-Speed 6316.39 samples/sec Loss 6.2578 LearningRate 0.0006 Epoch: 11 Global Step: 237780 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:42,573-Speed 6317.25 samples/sec Loss 6.2450 LearningRate 0.0006 Epoch: 11 Global Step: 237790 Fp16 Grad Scale: 16384 Required: 54 hours Training: 2022-04-01 13:21:45,820-Speed 6309.12 samples/sec Loss 6.2220 LearningRate 0.0006 Epoch: 11 
Global Step: 237800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:49,066-Speed 6311.13 samples/sec Loss 6.2514 LearningRate 0.0006 Epoch: 11 Global Step: 237810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:52,317-Speed 6300.21 samples/sec Loss 6.2386 LearningRate 0.0006 Epoch: 11 Global Step: 237820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:55,560-Speed 6316.53 samples/sec Loss 6.2478 LearningRate 0.0006 Epoch: 11 Global Step: 237830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:21:58,802-Speed 6319.62 samples/sec Loss 6.2368 LearningRate 0.0006 Epoch: 11 Global Step: 237840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:02,051-Speed 6304.83 samples/sec Loss 6.2497 LearningRate 0.0006 Epoch: 11 Global Step: 237850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:05,301-Speed 6302.49 samples/sec Loss 6.2517 LearningRate 0.0006 Epoch: 11 Global Step: 237860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:08,544-Speed 6315.83 samples/sec Loss 6.3039 LearningRate 0.0006 Epoch: 11 Global Step: 237870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:11,795-Speed 6302.64 samples/sec Loss 6.2469 LearningRate 0.0006 Epoch: 11 Global Step: 237880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:15,041-Speed 6309.91 samples/sec Loss 6.3393 LearningRate 0.0006 Epoch: 11 Global Step: 237890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:18,287-Speed 6311.74 samples/sec Loss 6.2376 LearningRate 0.0006 Epoch: 11 Global Step: 237900 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:22:21,531-Speed 6314.60 samples/sec Loss 6.2533 LearningRate 0.0006 Epoch: 11 Global Step: 237910 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:22:24,766-Speed 6331.52 samples/sec Loss 6.3098 LearningRate 0.0006 Epoch: 11 Global Step: 237920 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:28,009-Speed 6316.63 samples/sec Loss 6.2479 LearningRate 0.0006 Epoch: 11 Global Step: 237930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:31,254-Speed 6312.48 samples/sec Loss 6.2381 LearningRate 0.0006 Epoch: 11 Global Step: 237940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:34,497-Speed 6315.78 samples/sec Loss 6.2492 LearningRate 0.0006 Epoch: 11 Global Step: 237950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:37,749-Speed 6299.32 samples/sec Loss 6.2839 LearningRate 0.0006 Epoch: 11 Global Step: 237960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:40,995-Speed 6311.30 samples/sec Loss 6.2404 LearningRate 0.0006 Epoch: 11 Global Step: 237970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:44,238-Speed 6315.87 samples/sec Loss 6.3058 LearningRate 0.0006 Epoch: 11 Global Step: 237980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:47,482-Speed 6313.90 samples/sec Loss 6.1851 LearningRate 0.0006 Epoch: 11 Global Step: 237990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:50,734-Speed 6300.43 samples/sec Loss 6.2923 LearningRate 0.0006 Epoch: 11 Global Step: 238000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:53,980-Speed 6311.61 samples/sec Loss 6.2543 LearningRate 0.0006 Epoch: 11 Global Step: 238010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:22:57,215-Speed 6330.94 samples/sec Loss 6.2164 LearningRate 0.0006 Epoch: 11 Global Step: 238020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:00,460-Speed 6312.46 samples/sec Loss 6.2941 LearningRate 0.0006 Epoch: 11 Global Step: 238030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:03,711-Speed 6301.66 samples/sec Loss 6.2512 LearningRate 0.0006 Epoch: 11 Global Step: 238040 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:23:06,954-Speed 6317.30 samples/sec Loss 6.2733 LearningRate 0.0006 Epoch: 11 Global Step: 238050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:10,197-Speed 6315.45 samples/sec Loss 6.2436 LearningRate 0.0006 Epoch: 11 Global Step: 238060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:13,445-Speed 6306.69 samples/sec Loss 6.2635 LearningRate 0.0006 Epoch: 11 Global Step: 238070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:16,689-Speed 6316.02 samples/sec Loss 6.1530 LearningRate 0.0006 Epoch: 11 Global Step: 238080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:19,935-Speed 6310.58 samples/sec Loss 6.2475 LearningRate 0.0006 Epoch: 11 Global Step: 238090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:23,179-Speed 6314.34 samples/sec Loss 6.2059 LearningRate 0.0006 Epoch: 11 Global Step: 238100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:26,426-Speed 6307.99 samples/sec Loss 6.2668 LearningRate 0.0006 Epoch: 11 Global Step: 238110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:29,675-Speed 6304.52 samples/sec Loss 6.1939 LearningRate 0.0006 Epoch: 11 Global Step: 238120 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:23:32,910-Speed 6333.34 samples/sec Loss 6.2203 LearningRate 0.0006 Epoch: 11 Global Step: 238130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:36,156-Speed 6309.41 samples/sec Loss 6.2175 LearningRate 0.0006 Epoch: 11 Global Step: 238140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:39,410-Speed 6296.32 samples/sec Loss 6.3110 LearningRate 0.0006 Epoch: 11 Global Step: 238150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:42,658-Speed 6307.07 samples/sec Loss 6.2628 LearningRate 0.0006 Epoch: 11 Global Step: 238160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:23:45,904-Speed 6310.25 samples/sec Loss 6.2950 LearningRate 0.0006 Epoch: 11 Global Step: 238170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:49,149-Speed 6313.29 samples/sec Loss 6.2320 LearningRate 0.0006 Epoch: 11 Global Step: 238180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:52,396-Speed 6308.41 samples/sec Loss 6.2076 LearningRate 0.0006 Epoch: 11 Global Step: 238190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:55,643-Speed 6309.08 samples/sec Loss 6.2434 LearningRate 0.0006 Epoch: 11 Global Step: 238200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:23:58,888-Speed 6313.12 samples/sec Loss 6.2771 LearningRate 0.0006 Epoch: 11 Global Step: 238210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:02,138-Speed 6303.39 samples/sec Loss 6.2965 LearningRate 0.0006 Epoch: 11 Global Step: 238220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:05,387-Speed 6303.82 samples/sec Loss 6.1740 LearningRate 0.0006 Epoch: 11 Global Step: 238230 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:24:08,642-Speed 6294.34 samples/sec Loss 6.3114 LearningRate 0.0006 Epoch: 11 Global Step: 238240 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:24:11,869-Speed 6346.45 samples/sec Loss 6.2172 LearningRate 0.0006 Epoch: 11 Global Step: 238250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:15,117-Speed 6307.59 samples/sec Loss 6.2752 LearningRate 0.0006 Epoch: 11 Global Step: 238260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:18,365-Speed 6307.71 samples/sec Loss 6.2874 LearningRate 0.0006 Epoch: 11 Global Step: 238270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:21,608-Speed 6314.66 samples/sec Loss 6.2518 LearningRate 0.0006 Epoch: 11 Global Step: 238280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:24,857-Speed 6306.03 
samples/sec Loss 6.1950 LearningRate 0.0006 Epoch: 11 Global Step: 238290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:28,107-Speed 6301.94 samples/sec Loss 6.2720 LearningRate 0.0006 Epoch: 11 Global Step: 238300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:31,354-Speed 6309.78 samples/sec Loss 6.2511 LearningRate 0.0006 Epoch: 11 Global Step: 238310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:34,600-Speed 6309.93 samples/sec Loss 6.2137 LearningRate 0.0006 Epoch: 11 Global Step: 238320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:37,861-Speed 6282.50 samples/sec Loss 6.2745 LearningRate 0.0006 Epoch: 11 Global Step: 238330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:41,106-Speed 6311.00 samples/sec Loss 6.1850 LearningRate 0.0006 Epoch: 11 Global Step: 238340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:44,337-Speed 6341.71 samples/sec Loss 6.2701 LearningRate 0.0006 Epoch: 11 Global Step: 238350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:47,585-Speed 6306.30 samples/sec Loss 6.2093 LearningRate 0.0006 Epoch: 11 Global Step: 238360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:50,829-Speed 6314.88 samples/sec Loss 6.2615 LearningRate 0.0006 Epoch: 11 Global Step: 238370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:54,074-Speed 6311.80 samples/sec Loss 6.2492 LearningRate 0.0006 Epoch: 11 Global Step: 238380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:24:57,319-Speed 6313.23 samples/sec Loss 6.2090 LearningRate 0.0006 Epoch: 11 Global Step: 238390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:00,570-Speed 6300.60 samples/sec Loss 6.1829 LearningRate 0.0006 Epoch: 11 Global Step: 238400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:03,816-Speed 6310.83 samples/sec Loss 6.2740 
LearningRate 0.0006 Epoch: 11 Global Step: 238410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:07,061-Speed 6312.78 samples/sec Loss 6.2670 LearningRate 0.0006 Epoch: 11 Global Step: 238420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:10,311-Speed 6303.95 samples/sec Loss 6.2446 LearningRate 0.0006 Epoch: 11 Global Step: 238430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:13,556-Speed 6312.40 samples/sec Loss 6.1951 LearningRate 0.0006 Epoch: 11 Global Step: 238440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:16,803-Speed 6308.00 samples/sec Loss 6.2072 LearningRate 0.0006 Epoch: 11 Global Step: 238450 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:25:20,036-Speed 6338.09 samples/sec Loss 6.2280 LearningRate 0.0006 Epoch: 11 Global Step: 238460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:23,281-Speed 6312.15 samples/sec Loss 6.2848 LearningRate 0.0006 Epoch: 11 Global Step: 238470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:26,531-Speed 6301.73 samples/sec Loss 6.2751 LearningRate 0.0006 Epoch: 11 Global Step: 238480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:29,775-Speed 6314.89 samples/sec Loss 6.2500 LearningRate 0.0006 Epoch: 11 Global Step: 238490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:33,025-Speed 6302.43 samples/sec Loss 6.1708 LearningRate 0.0006 Epoch: 11 Global Step: 238500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:36,272-Speed 6310.44 samples/sec Loss 6.2416 LearningRate 0.0006 Epoch: 11 Global Step: 238510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:39,516-Speed 6313.13 samples/sec Loss 6.1534 LearningRate 0.0006 Epoch: 11 Global Step: 238520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:42,767-Speed 6302.50 samples/sec Loss 6.1876 LearningRate 0.0006 Epoch: 11 
Global Step: 238530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:46,009-Speed 6317.30 samples/sec Loss 6.2221 LearningRate 0.0006 Epoch: 11 Global Step: 238540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:49,257-Speed 6307.84 samples/sec Loss 6.2907 LearningRate 0.0006 Epoch: 11 Global Step: 238550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:52,492-Speed 6331.60 samples/sec Loss 6.2143 LearningRate 0.0006 Epoch: 11 Global Step: 238560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:55,737-Speed 6311.83 samples/sec Loss 6.2757 LearningRate 0.0006 Epoch: 11 Global Step: 238570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:25:58,985-Speed 6307.64 samples/sec Loss 6.2344 LearningRate 0.0006 Epoch: 11 Global Step: 238580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:02,231-Speed 6311.06 samples/sec Loss 6.2182 LearningRate 0.0006 Epoch: 11 Global Step: 238590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:05,480-Speed 6304.63 samples/sec Loss 6.2554 LearningRate 0.0006 Epoch: 11 Global Step: 238600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:08,723-Speed 6314.78 samples/sec Loss 6.1970 LearningRate 0.0006 Epoch: 11 Global Step: 238610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:11,971-Speed 6306.74 samples/sec Loss 6.1814 LearningRate 0.0006 Epoch: 11 Global Step: 238620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:15,219-Speed 6307.48 samples/sec Loss 6.2001 LearningRate 0.0006 Epoch: 11 Global Step: 238630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:18,470-Speed 6302.45 samples/sec Loss 6.2710 LearningRate 0.0006 Epoch: 11 Global Step: 238640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:21,716-Speed 6310.32 samples/sec Loss 6.2254 LearningRate 0.0006 Epoch: 11 Global Step: 238650 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:24,959-Speed 6316.44 samples/sec Loss 6.2301 LearningRate 0.0006 Epoch: 11 Global Step: 238660 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:26:28,196-Speed 6328.63 samples/sec Loss 6.2229 LearningRate 0.0006 Epoch: 11 Global Step: 238670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:31,442-Speed 6309.87 samples/sec Loss 6.2962 LearningRate 0.0006 Epoch: 11 Global Step: 238680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:34,696-Speed 6296.13 samples/sec Loss 6.2224 LearningRate 0.0006 Epoch: 11 Global Step: 238690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:37,944-Speed 6307.04 samples/sec Loss 6.2033 LearningRate 0.0006 Epoch: 11 Global Step: 238700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:41,191-Speed 6308.33 samples/sec Loss 6.3156 LearningRate 0.0006 Epoch: 11 Global Step: 238710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:44,438-Speed 6308.67 samples/sec Loss 6.2565 LearningRate 0.0006 Epoch: 11 Global Step: 238720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:47,682-Speed 6315.13 samples/sec Loss 6.2354 LearningRate 0.0006 Epoch: 11 Global Step: 238730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:50,927-Speed 6311.80 samples/sec Loss 6.1845 LearningRate 0.0006 Epoch: 11 Global Step: 238740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:54,176-Speed 6305.01 samples/sec Loss 6.2516 LearningRate 0.0006 Epoch: 11 Global Step: 238750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:26:57,426-Speed 6303.15 samples/sec Loss 6.2019 LearningRate 0.0006 Epoch: 11 Global Step: 238760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:00,658-Speed 6339.79 samples/sec Loss 6.2482 LearningRate 0.0006 Epoch: 11 Global Step: 238770 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:27:03,901-Speed 6314.76 samples/sec Loss 6.1870 LearningRate 0.0006 Epoch: 11 Global Step: 238780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:07,146-Speed 6312.53 samples/sec Loss 6.2363 LearningRate 0.0006 Epoch: 11 Global Step: 238790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:10,389-Speed 6317.39 samples/sec Loss 6.2273 LearningRate 0.0006 Epoch: 11 Global Step: 238800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:13,636-Speed 6308.43 samples/sec Loss 6.2373 LearningRate 0.0006 Epoch: 11 Global Step: 238810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:16,887-Speed 6300.77 samples/sec Loss 6.1835 LearningRate 0.0006 Epoch: 11 Global Step: 238820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:20,130-Speed 6316.57 samples/sec Loss 6.3145 LearningRate 0.0006 Epoch: 11 Global Step: 238830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:23,380-Speed 6302.76 samples/sec Loss 6.2445 LearningRate 0.0006 Epoch: 11 Global Step: 238840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:26,624-Speed 6315.10 samples/sec Loss 6.1793 LearningRate 0.0006 Epoch: 11 Global Step: 238850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:29,867-Speed 6317.15 samples/sec Loss 6.2184 LearningRate 0.0006 Epoch: 11 Global Step: 238860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:33,099-Speed 6338.29 samples/sec Loss 6.2003 LearningRate 0.0006 Epoch: 11 Global Step: 238870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:36,344-Speed 6313.30 samples/sec Loss 6.2289 LearningRate 0.0006 Epoch: 11 Global Step: 238880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:39,591-Speed 6308.26 samples/sec Loss 6.1891 LearningRate 0.0006 Epoch: 11 Global Step: 238890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:27:42,835-Speed 6314.20 samples/sec Loss 6.1814 LearningRate 0.0006 Epoch: 11 Global Step: 238900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:46,077-Speed 6318.39 samples/sec Loss 6.2161 LearningRate 0.0006 Epoch: 11 Global Step: 238910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:49,318-Speed 6323.60 samples/sec Loss 6.2531 LearningRate 0.0006 Epoch: 11 Global Step: 238920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:52,560-Speed 6318.21 samples/sec Loss 6.2193 LearningRate 0.0006 Epoch: 11 Global Step: 238930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:55,804-Speed 6313.07 samples/sec Loss 6.2010 LearningRate 0.0006 Epoch: 11 Global Step: 238940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:27:59,048-Speed 6315.72 samples/sec Loss 6.2520 LearningRate 0.0006 Epoch: 11 Global Step: 238950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:02,295-Speed 6308.15 samples/sec Loss 6.2085 LearningRate 0.0006 Epoch: 11 Global Step: 238960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:05,522-Speed 6349.23 samples/sec Loss 6.2418 LearningRate 0.0006 Epoch: 11 Global Step: 238970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:08,766-Speed 6313.28 samples/sec Loss 6.2613 LearningRate 0.0006 Epoch: 11 Global Step: 238980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:12,014-Speed 6306.84 samples/sec Loss 6.2505 LearningRate 0.0006 Epoch: 11 Global Step: 238990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:15,261-Speed 6310.24 samples/sec Loss 6.1665 LearningRate 0.0006 Epoch: 11 Global Step: 239000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:18,510-Speed 6304.58 samples/sec Loss 6.2229 LearningRate 0.0006 Epoch: 11 Global Step: 239010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:21,756-Speed 6311.21 
samples/sec Loss 6.2244 LearningRate 0.0006 Epoch: 11 Global Step: 239020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:25,001-Speed 6310.90 samples/sec Loss 6.2708 LearningRate 0.0006 Epoch: 11 Global Step: 239030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:28,245-Speed 6316.28 samples/sec Loss 6.2369 LearningRate 0.0006 Epoch: 11 Global Step: 239040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:31,492-Speed 6308.46 samples/sec Loss 6.3081 LearningRate 0.0006 Epoch: 11 Global Step: 239050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:34,735-Speed 6316.16 samples/sec Loss 6.1593 LearningRate 0.0006 Epoch: 11 Global Step: 239060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:37,981-Speed 6309.56 samples/sec Loss 6.2814 LearningRate 0.0006 Epoch: 11 Global Step: 239070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:41,230-Speed 6305.31 samples/sec Loss 6.2769 LearningRate 0.0006 Epoch: 11 Global Step: 239080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:44,480-Speed 6303.17 samples/sec Loss 6.2920 LearningRate 0.0006 Epoch: 11 Global Step: 239090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:47,727-Speed 6308.96 samples/sec Loss 6.3093 LearningRate 0.0006 Epoch: 11 Global Step: 239100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:50,977-Speed 6303.69 samples/sec Loss 6.2251 LearningRate 0.0006 Epoch: 11 Global Step: 239110 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:54,220-Speed 6316.71 samples/sec Loss 6.2647 LearningRate 0.0006 Epoch: 11 Global Step: 239120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:28:57,462-Speed 6319.09 samples/sec Loss 6.2594 LearningRate 0.0006 Epoch: 11 Global Step: 239130 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:00,706-Speed 6313.23 samples/sec Loss 6.2300 
LearningRate 0.0006 Epoch: 11 Global Step: 239140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:03,953-Speed 6311.25 samples/sec Loss 6.3467 LearningRate 0.0006 Epoch: 11 Global Step: 239150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:07,202-Speed 6305.47 samples/sec Loss 6.1939 LearningRate 0.0006 Epoch: 11 Global Step: 239160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:10,450-Speed 6306.91 samples/sec Loss 6.3081 LearningRate 0.0006 Epoch: 11 Global Step: 239170 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:29:13,679-Speed 6342.49 samples/sec Loss 6.2297 LearningRate 0.0006 Epoch: 11 Global Step: 239180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:16,931-Speed 6308.90 samples/sec Loss 6.1977 LearningRate 0.0006 Epoch: 11 Global Step: 239190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:20,173-Speed 6318.14 samples/sec Loss 6.3315 LearningRate 0.0006 Epoch: 11 Global Step: 239200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:23,416-Speed 6317.05 samples/sec Loss 6.2198 LearningRate 0.0006 Epoch: 11 Global Step: 239210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:26,659-Speed 6316.21 samples/sec Loss 6.2688 LearningRate 0.0006 Epoch: 11 Global Step: 239220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:29,910-Speed 6300.47 samples/sec Loss 6.1910 LearningRate 0.0006 Epoch: 11 Global Step: 239230 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:33,154-Speed 6314.73 samples/sec Loss 6.2828 LearningRate 0.0006 Epoch: 11 Global Step: 239240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:36,399-Speed 6312.70 samples/sec Loss 6.2827 LearningRate 0.0006 Epoch: 11 Global Step: 239250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:39,637-Speed 6325.53 samples/sec Loss 6.2509 LearningRate 0.0006 Epoch: 11 
Global Step: 239260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:42,883-Speed 6310.58 samples/sec Loss 6.2055 LearningRate 0.0006 Epoch: 11 Global Step: 239270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:46,110-Speed 6347.94 samples/sec Loss 6.2407 LearningRate 0.0006 Epoch: 11 Global Step: 239280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:49,349-Speed 6324.43 samples/sec Loss 6.2690 LearningRate 0.0006 Epoch: 11 Global Step: 239290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:52,597-Speed 6308.70 samples/sec Loss 6.2429 LearningRate 0.0006 Epoch: 11 Global Step: 239300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:55,843-Speed 6310.73 samples/sec Loss 6.1960 LearningRate 0.0006 Epoch: 11 Global Step: 239310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:29:59,087-Speed 6313.53 samples/sec Loss 6.1873 LearningRate 0.0006 Epoch: 11 Global Step: 239320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:02,336-Speed 6305.09 samples/sec Loss 6.1947 LearningRate 0.0006 Epoch: 11 Global Step: 239330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:05,600-Speed 6277.71 samples/sec Loss 6.2625 LearningRate 0.0006 Epoch: 11 Global Step: 239340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:08,876-Speed 6252.12 samples/sec Loss 6.2705 LearningRate 0.0006 Epoch: 11 Global Step: 239350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:12,121-Speed 6311.46 samples/sec Loss 6.2089 LearningRate 0.0006 Epoch: 11 Global Step: 239360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:15,370-Speed 6305.04 samples/sec Loss 6.2686 LearningRate 0.0006 Epoch: 11 Global Step: 239370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:18,614-Speed 6316.21 samples/sec Loss 6.1056 LearningRate 0.0006 Epoch: 11 Global Step: 239380 Fp16 Grad 
Scale: 65536 Required: 54 hours Training: 2022-04-01 13:30:21,845-Speed 6338.96 samples/sec Loss 6.2327 LearningRate 0.0006 Epoch: 11 Global Step: 239390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:25,092-Speed 6309.03 samples/sec Loss 6.2192 LearningRate 0.0006 Epoch: 11 Global Step: 239400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:28,333-Speed 6319.76 samples/sec Loss 6.2043 LearningRate 0.0006 Epoch: 11 Global Step: 239410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:31,580-Speed 6309.24 samples/sec Loss 6.1740 LearningRate 0.0006 Epoch: 11 Global Step: 239420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:34,825-Speed 6313.77 samples/sec Loss 6.2545 LearningRate 0.0006 Epoch: 11 Global Step: 239430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:38,074-Speed 6304.20 samples/sec Loss 6.2355 LearningRate 0.0006 Epoch: 11 Global Step: 239440 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:41,315-Speed 6319.40 samples/sec Loss 6.2246 LearningRate 0.0006 Epoch: 11 Global Step: 239450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:44,562-Speed 6309.91 samples/sec Loss 6.2783 LearningRate 0.0006 Epoch: 11 Global Step: 239460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:47,816-Speed 6294.65 samples/sec Loss 6.3125 LearningRate 0.0006 Epoch: 11 Global Step: 239470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:51,082-Speed 6272.76 samples/sec Loss 6.2856 LearningRate 0.0006 Epoch: 11 Global Step: 239480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:30:54,334-Speed 6298.77 samples/sec Loss 6.2404 LearningRate 0.0006 Epoch: 11 Global Step: 239490 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:30:57,562-Speed 6344.71 samples/sec Loss 6.2306 LearningRate 0.0006 Epoch: 11 Global Step: 239500 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:31:00,823-Speed 6283.04 samples/sec Loss 6.2201 LearningRate 0.0006 Epoch: 11 Global Step: 239510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:04,071-Speed 6306.46 samples/sec Loss 6.1710 LearningRate 0.0006 Epoch: 11 Global Step: 239520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:07,321-Speed 6303.83 samples/sec Loss 6.2459 LearningRate 0.0006 Epoch: 11 Global Step: 239530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:10,569-Speed 6307.69 samples/sec Loss 6.1838 LearningRate 0.0006 Epoch: 11 Global Step: 239540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:13,811-Speed 6318.09 samples/sec Loss 6.2571 LearningRate 0.0006 Epoch: 11 Global Step: 239550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:17,115-Speed 6199.39 samples/sec Loss 6.2060 LearningRate 0.0006 Epoch: 11 Global Step: 239560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:20,382-Speed 6269.30 samples/sec Loss 6.2982 LearningRate 0.0006 Epoch: 11 Global Step: 239570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:23,629-Speed 6309.38 samples/sec Loss 6.2547 LearningRate 0.0006 Epoch: 11 Global Step: 239580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:26,880-Speed 6300.99 samples/sec Loss 6.2105 LearningRate 0.0006 Epoch: 11 Global Step: 239590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:30,113-Speed 6336.75 samples/sec Loss 6.1952 LearningRate 0.0006 Epoch: 11 Global Step: 239600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:33,361-Speed 6306.35 samples/sec Loss 6.2183 LearningRate 0.0006 Epoch: 11 Global Step: 239610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:36,605-Speed 6313.79 samples/sec Loss 6.1781 LearningRate 0.0006 Epoch: 11 Global Step: 239620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:31:39,850-Speed 6312.86 samples/sec Loss 6.1289 LearningRate 0.0006 Epoch: 11 Global Step: 239630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:43,099-Speed 6305.11 samples/sec Loss 6.1475 LearningRate 0.0006 Epoch: 11 Global Step: 239640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:46,343-Speed 6315.21 samples/sec Loss 6.1577 LearningRate 0.0006 Epoch: 11 Global Step: 239650 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:49,593-Speed 6303.16 samples/sec Loss 6.2908 LearningRate 0.0006 Epoch: 11 Global Step: 239660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:52,838-Speed 6312.44 samples/sec Loss 6.2621 LearningRate 0.0006 Epoch: 11 Global Step: 239670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:56,089-Speed 6299.97 samples/sec Loss 6.2403 LearningRate 0.0006 Epoch: 11 Global Step: 239680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:31:59,336-Speed 6309.70 samples/sec Loss 6.2204 LearningRate 0.0006 Epoch: 11 Global Step: 239690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:02,581-Speed 6311.67 samples/sec Loss 6.2149 LearningRate 0.0006 Epoch: 11 Global Step: 239700 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:32:05,818-Speed 6328.41 samples/sec Loss 6.2649 LearningRate 0.0006 Epoch: 11 Global Step: 239710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:09,067-Speed 6305.50 samples/sec Loss 6.2676 LearningRate 0.0006 Epoch: 11 Global Step: 239720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:12,319-Speed 6300.65 samples/sec Loss 6.1764 LearningRate 0.0006 Epoch: 11 Global Step: 239730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:15,565-Speed 6309.82 samples/sec Loss 6.2699 LearningRate 0.0006 Epoch: 11 Global Step: 239740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:18,812-Speed 6309.62 
samples/sec Loss 6.1636 LearningRate 0.0006 Epoch: 11 Global Step: 239750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:22,066-Speed 6294.75 samples/sec Loss 6.2324 LearningRate 0.0006 Epoch: 11 Global Step: 239760 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:25,310-Speed 6313.27 samples/sec Loss 6.2911 LearningRate 0.0006 Epoch: 11 Global Step: 239770 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:28,558-Speed 6307.35 samples/sec Loss 6.1954 LearningRate 0.0006 Epoch: 11 Global Step: 239780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:31,801-Speed 6316.51 samples/sec Loss 6.2453 LearningRate 0.0006 Epoch: 11 Global Step: 239790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:35,048-Speed 6308.58 samples/sec Loss 6.2087 LearningRate 0.0006 Epoch: 11 Global Step: 239800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:38,279-Speed 6341.73 samples/sec Loss 6.2381 LearningRate 0.0006 Epoch: 11 Global Step: 239810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:41,523-Speed 6313.22 samples/sec Loss 6.1683 LearningRate 0.0006 Epoch: 11 Global Step: 239820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:44,770-Speed 6309.23 samples/sec Loss 6.2070 LearningRate 0.0006 Epoch: 11 Global Step: 239830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:48,015-Speed 6312.44 samples/sec Loss 6.3016 LearningRate 0.0006 Epoch: 11 Global Step: 239840 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:51,263-Speed 6306.51 samples/sec Loss 6.2600 LearningRate 0.0006 Epoch: 11 Global Step: 239850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:54,511-Speed 6306.71 samples/sec Loss 6.2246 LearningRate 0.0006 Epoch: 11 Global Step: 239860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:32:57,760-Speed 6305.84 samples/sec Loss 6.2647 
LearningRate 0.0006 Epoch: 11 Global Step: 239870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:01,018-Speed 6286.57 samples/sec Loss 6.1411 LearningRate 0.0006 Epoch: 11 Global Step: 239880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:04,262-Speed 6315.32 samples/sec Loss 6.1521 LearningRate 0.0006 Epoch: 11 Global Step: 239890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:07,511-Speed 6304.82 samples/sec Loss 6.2282 LearningRate 0.0006 Epoch: 11 Global Step: 239900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:10,755-Speed 6313.47 samples/sec Loss 6.2095 LearningRate 0.0006 Epoch: 11 Global Step: 239910 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:33:14,002-Speed 6309.16 samples/sec Loss 6.2261 LearningRate 0.0006 Epoch: 11 Global Step: 239920 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:33:17,235-Speed 6338.09 samples/sec Loss 6.2498 LearningRate 0.0006 Epoch: 11 Global Step: 239930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:20,484-Speed 6303.74 samples/sec Loss 6.3070 LearningRate 0.0006 Epoch: 11 Global Step: 239940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:23,735-Speed 6302.05 samples/sec Loss 6.2011 LearningRate 0.0006 Epoch: 11 Global Step: 239950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:26,978-Speed 6314.96 samples/sec Loss 6.2645 LearningRate 0.0006 Epoch: 11 Global Step: 239960 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:30,230-Speed 6301.04 samples/sec Loss 6.2447 LearningRate 0.0006 Epoch: 11 Global Step: 239970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:33,476-Speed 6309.70 samples/sec Loss 6.1952 LearningRate 0.0006 Epoch: 11 Global Step: 239980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:36,722-Speed 6310.99 samples/sec Loss 6.1772 LearningRate 0.0006 Epoch: 11 
Global Step: 239990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:39,975-Speed 6297.67 samples/sec Loss 6.2268 LearningRate 0.0006 Epoch: 11 Global Step: 240000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:43,222-Speed 6308.61 samples/sec Loss 6.2786 LearningRate 0.0006 Epoch: 11 Global Step: 240010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:46,468-Speed 6310.00 samples/sec Loss 6.2589 LearningRate 0.0006 Epoch: 11 Global Step: 240020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:49,704-Speed 6330.04 samples/sec Loss 6.3176 LearningRate 0.0006 Epoch: 11 Global Step: 240030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:52,950-Speed 6310.78 samples/sec Loss 6.1735 LearningRate 0.0006 Epoch: 11 Global Step: 240040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:56,196-Speed 6310.59 samples/sec Loss 6.2151 LearningRate 0.0006 Epoch: 11 Global Step: 240050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:33:59,441-Speed 6313.82 samples/sec Loss 6.1854 LearningRate 0.0006 Epoch: 11 Global Step: 240060 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:02,688-Speed 6308.41 samples/sec Loss 6.2057 LearningRate 0.0006 Epoch: 11 Global Step: 240070 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:05,932-Speed 6314.26 samples/sec Loss 6.1686 LearningRate 0.0006 Epoch: 11 Global Step: 240080 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:09,175-Speed 6316.56 samples/sec Loss 6.3221 LearningRate 0.0006 Epoch: 11 Global Step: 240090 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:12,420-Speed 6311.51 samples/sec Loss 6.2302 LearningRate 0.0006 Epoch: 11 Global Step: 240100 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:15,668-Speed 6308.50 samples/sec Loss 6.1468 LearningRate 0.0006 Epoch: 11 Global Step: 240110 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:18,921-Speed 6295.96 samples/sec Loss 6.1952 LearningRate 0.0006 Epoch: 11 Global Step: 240120 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:22,173-Speed 6299.83 samples/sec Loss 6.2418 LearningRate 0.0006 Epoch: 11 Global Step: 240130 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:34:25,407-Speed 6334.40 samples/sec Loss 6.1938 LearningRate 0.0006 Epoch: 11 Global Step: 240140 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:28,655-Speed 6307.28 samples/sec Loss 6.2005 LearningRate 0.0006 Epoch: 11 Global Step: 240150 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:31,900-Speed 6311.96 samples/sec Loss 6.2358 LearningRate 0.0006 Epoch: 11 Global Step: 240160 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:35,150-Speed 6302.83 samples/sec Loss 6.1862 LearningRate 0.0006 Epoch: 11 Global Step: 240170 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:38,397-Speed 6308.89 samples/sec Loss 6.1916 LearningRate 0.0006 Epoch: 11 Global Step: 240180 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:41,644-Speed 6309.37 samples/sec Loss 6.2496 LearningRate 0.0006 Epoch: 11 Global Step: 240190 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:44,893-Speed 6305.89 samples/sec Loss 6.1632 LearningRate 0.0006 Epoch: 11 Global Step: 240200 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:48,139-Speed 6310.42 samples/sec Loss 6.1896 LearningRate 0.0006 Epoch: 11 Global Step: 240210 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:51,384-Speed 6312.24 samples/sec Loss 6.1892 LearningRate 0.0006 Epoch: 11 Global Step: 240220 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:34:54,632-Speed 6307.17 samples/sec Loss 6.2879 LearningRate 0.0006 Epoch: 11 Global Step: 240230 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:34:57,864-Speed 6337.97 samples/sec Loss 6.1979 LearningRate 0.0006 Epoch: 11 Global Step: 240240 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:01,106-Speed 6317.58 samples/sec Loss 6.2080 LearningRate 0.0006 Epoch: 11 Global Step: 240250 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:04,355-Speed 6304.58 samples/sec Loss 6.1823 LearningRate 0.0006 Epoch: 11 Global Step: 240260 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:07,604-Speed 6305.16 samples/sec Loss 6.2318 LearningRate 0.0006 Epoch: 11 Global Step: 240270 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:10,849-Speed 6312.81 samples/sec Loss 6.2557 LearningRate 0.0006 Epoch: 11 Global Step: 240280 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:14,102-Speed 6295.65 samples/sec Loss 6.2418 LearningRate 0.0006 Epoch: 11 Global Step: 240290 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:17,348-Speed 6310.95 samples/sec Loss 6.2463 LearningRate 0.0006 Epoch: 11 Global Step: 240300 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:20,594-Speed 6310.93 samples/sec Loss 6.2820 LearningRate 0.0006 Epoch: 11 Global Step: 240310 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:23,841-Speed 6309.66 samples/sec Loss 6.1881 LearningRate 0.0006 Epoch: 11 Global Step: 240320 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:27,089-Speed 6305.80 samples/sec Loss 6.3054 LearningRate 0.0006 Epoch: 11 Global Step: 240330 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:30,332-Speed 6318.52 samples/sec Loss 6.2473 LearningRate 0.0006 Epoch: 11 Global Step: 240340 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:33,576-Speed 6313.88 samples/sec Loss 6.2724 LearningRate 0.0006 Epoch: 11 Global Step: 240350 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 
13:35:36,827-Speed 6302.08 samples/sec Loss 6.1922 LearningRate 0.0006 Epoch: 11 Global Step: 240360 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:40,071-Speed 6314.26 samples/sec Loss 6.1819 LearningRate 0.0006 Epoch: 11 Global Step: 240370 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:43,320-Speed 6303.97 samples/sec Loss 6.1886 LearningRate 0.0006 Epoch: 11 Global Step: 240380 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:46,570-Speed 6303.96 samples/sec Loss 6.2126 LearningRate 0.0006 Epoch: 11 Global Step: 240390 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:49,813-Speed 6315.84 samples/sec Loss 6.1844 LearningRate 0.0006 Epoch: 11 Global Step: 240400 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:53,059-Speed 6309.97 samples/sec Loss 6.2414 LearningRate 0.0006 Epoch: 11 Global Step: 240410 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:56,301-Speed 6319.09 samples/sec Loss 6.1409 LearningRate 0.0006 Epoch: 11 Global Step: 240420 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:35:59,546-Speed 6312.32 samples/sec Loss 6.1055 LearningRate 0.0006 Epoch: 11 Global Step: 240430 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:02,788-Speed 6318.58 samples/sec Loss 6.2178 LearningRate 0.0006 Epoch: 11 Global Step: 240440 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:36:06,023-Speed 6331.56 samples/sec Loss 6.2694 LearningRate 0.0006 Epoch: 11 Global Step: 240450 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:09,281-Speed 6288.36 samples/sec Loss 6.2545 LearningRate 0.0006 Epoch: 11 Global Step: 240460 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:12,523-Speed 6318.11 samples/sec Loss 6.2303 LearningRate 0.0006 Epoch: 11 Global Step: 240470 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:15,772-Speed 6306.90 
samples/sec Loss 6.3032 LearningRate 0.0006 Epoch: 11 Global Step: 240480 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:19,040-Speed 6268.84 samples/sec Loss 6.2042 LearningRate 0.0006 Epoch: 11 Global Step: 240490 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:22,304-Speed 6276.04 samples/sec Loss 6.2335 LearningRate 0.0006 Epoch: 11 Global Step: 240500 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:25,549-Speed 6311.51 samples/sec Loss 6.1981 LearningRate 0.0006 Epoch: 11 Global Step: 240510 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:28,797-Speed 6307.74 samples/sec Loss 6.2171 LearningRate 0.0006 Epoch: 11 Global Step: 240520 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:32,042-Speed 6313.35 samples/sec Loss 6.2440 LearningRate 0.0006 Epoch: 11 Global Step: 240530 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:35,292-Speed 6303.39 samples/sec Loss 6.2806 LearningRate 0.0006 Epoch: 11 Global Step: 240540 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:38,522-Speed 6341.53 samples/sec Loss 6.2572 LearningRate 0.0006 Epoch: 11 Global Step: 240550 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:41,768-Speed 6311.05 samples/sec Loss 6.1752 LearningRate 0.0006 Epoch: 11 Global Step: 240560 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:45,011-Speed 6316.63 samples/sec Loss 6.1944 LearningRate 0.0006 Epoch: 11 Global Step: 240570 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:48,254-Speed 6315.41 samples/sec Loss 6.1936 LearningRate 0.0006 Epoch: 11 Global Step: 240580 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:51,503-Speed 6305.64 samples/sec Loss 6.2114 LearningRate 0.0006 Epoch: 11 Global Step: 240590 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:54,748-Speed 6312.43 samples/sec Loss 6.1968 
LearningRate 0.0006 Epoch: 11 Global Step: 240600 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:36:57,993-Speed 6313.28 samples/sec Loss 6.1989 LearningRate 0.0006 Epoch: 11 Global Step: 240610 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:01,238-Speed 6311.49 samples/sec Loss 6.1762 LearningRate 0.0006 Epoch: 11 Global Step: 240620 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:04,486-Speed 6307.65 samples/sec Loss 6.1956 LearningRate 0.0006 Epoch: 11 Global Step: 240630 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:07,732-Speed 6310.43 samples/sec Loss 6.1795 LearningRate 0.0006 Epoch: 11 Global Step: 240640 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:10,982-Speed 6303.19 samples/sec Loss 6.2157 LearningRate 0.0006 Epoch: 11 Global Step: 240650 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:37:14,216-Speed 6333.17 samples/sec Loss 6.2044 LearningRate 0.0006 Epoch: 11 Global Step: 240660 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:17,511-Speed 6218.13 samples/sec Loss 6.2495 LearningRate 0.0006 Epoch: 11 Global Step: 240670 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:20,754-Speed 6315.65 samples/sec Loss 6.2319 LearningRate 0.0006 Epoch: 11 Global Step: 240680 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:24,007-Speed 6297.89 samples/sec Loss 6.2708 LearningRate 0.0006 Epoch: 11 Global Step: 240690 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:27,252-Speed 6311.92 samples/sec Loss 6.2691 LearningRate 0.0006 Epoch: 11 Global Step: 240700 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:30,502-Speed 6305.26 samples/sec Loss 6.1265 LearningRate 0.0006 Epoch: 11 Global Step: 240710 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:33,746-Speed 6313.78 samples/sec Loss 6.2372 LearningRate 0.0006 Epoch: 11 
Global Step: 240720 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:36,989-Speed 6316.91 samples/sec Loss 6.3750 LearningRate 0.0006 Epoch: 11 Global Step: 240730 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:40,236-Speed 6309.90 samples/sec Loss 6.2410 LearningRate 0.0006 Epoch: 11 Global Step: 240740 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:43,482-Speed 6310.82 samples/sec Loss 6.1865 LearningRate 0.0006 Epoch: 11 Global Step: 240750 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:46,733-Speed 6300.93 samples/sec Loss 6.1915 LearningRate 0.0006 Epoch: 11 Global Step: 240760 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:37:49,980-Speed 6308.09 samples/sec Loss 6.1823 LearningRate 0.0006 Epoch: 11 Global Step: 240770 Fp16 Grad Scale: 65536 Required: 54 hours Training: 2022-04-01 13:37:53,210-Speed 6343.46 samples/sec Loss 6.2937 LearningRate 0.0006 Epoch: 11 Global Step: 240780 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:56,456-Speed 6309.95 samples/sec Loss 6.2532 LearningRate 0.0006 Epoch: 11 Global Step: 240790 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:37:59,702-Speed 6311.54 samples/sec Loss 6.2640 LearningRate 0.0006 Epoch: 11 Global Step: 240800 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:02,950-Speed 6306.45 samples/sec Loss 6.2163 LearningRate 0.0006 Epoch: 11 Global Step: 240810 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:06,198-Speed 6306.10 samples/sec Loss 6.1771 LearningRate 0.0006 Epoch: 11 Global Step: 240820 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:09,441-Speed 6316.68 samples/sec Loss 6.1621 LearningRate 0.0006 Epoch: 11 Global Step: 240830 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:12,688-Speed 6308.51 samples/sec Loss 6.2717 LearningRate 0.0006 Epoch: 11 Global Step: 240840 Fp16 Grad 
Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:15,934-Speed 6311.81 samples/sec Loss 6.2034 LearningRate 0.0006 Epoch: 11 Global Step: 240850 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:19,180-Speed 6309.94 samples/sec Loss 6.2498 LearningRate 0.0006 Epoch: 11 Global Step: 240860 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:22,426-Speed 6310.88 samples/sec Loss 6.2450 LearningRate 0.0006 Epoch: 11 Global Step: 240870 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:25,662-Speed 6330.35 samples/sec Loss 6.2345 LearningRate 0.0006 Epoch: 11 Global Step: 240880 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:28,914-Speed 6298.71 samples/sec Loss 6.2738 LearningRate 0.0006 Epoch: 11 Global Step: 240890 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:32,156-Speed 6318.34 samples/sec Loss 6.3072 LearningRate 0.0006 Epoch: 11 Global Step: 240900 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:35,419-Speed 6278.02 samples/sec Loss 6.1936 LearningRate 0.0006 Epoch: 11 Global Step: 240910 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:38,671-Speed 6297.95 samples/sec Loss 6.1898 LearningRate 0.0006 Epoch: 11 Global Step: 240920 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:41,922-Speed 6302.40 samples/sec Loss 6.1845 LearningRate 0.0006 Epoch: 11 Global Step: 240930 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:45,171-Speed 6305.44 samples/sec Loss 6.2118 LearningRate 0.0006 Epoch: 11 Global Step: 240940 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:48,416-Speed 6312.64 samples/sec Loss 6.1820 LearningRate 0.0006 Epoch: 11 Global Step: 240950 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:51,664-Speed 6305.71 samples/sec Loss 6.3018 LearningRate 0.0006 Epoch: 11 Global Step: 240960 Fp16 Grad Scale: 32768 Required: 54 hours 
Training: 2022-04-01 13:38:54,909-Speed 6312.76 samples/sec Loss 6.2260 LearningRate 0.0006 Epoch: 11 Global Step: 240970 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:38:58,147-Speed 6327.52 samples/sec Loss 6.1556 LearningRate 0.0006 Epoch: 11 Global Step: 240980 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:01,397-Speed 6303.23 samples/sec Loss 6.2226 LearningRate 0.0006 Epoch: 11 Global Step: 240990 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:04,645-Speed 6305.78 samples/sec Loss 6.2432 LearningRate 0.0006 Epoch: 11 Global Step: 241000 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:07,891-Speed 6311.32 samples/sec Loss 6.2604 LearningRate 0.0006 Epoch: 11 Global Step: 241010 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:11,137-Speed 6311.01 samples/sec Loss 6.1943 LearningRate 0.0006 Epoch: 11 Global Step: 241020 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:14,380-Speed 6316.18 samples/sec Loss 6.2368 LearningRate 0.0006 Epoch: 11 Global Step: 241030 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:17,629-Speed 6305.52 samples/sec Loss 6.1872 LearningRate 0.0006 Epoch: 11 Global Step: 241040 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:20,875-Speed 6309.90 samples/sec Loss 6.1620 LearningRate 0.0006 Epoch: 11 Global Step: 241050 Fp16 Grad Scale: 32768 Required: 54 hours Training: 2022-04-01 13:39:24,123-Speed 6307.37 samples/sec Loss 6.2579 LearningRate 0.0006 Epoch: 11 Global Step: 241060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:27,376-Speed 6298.00 samples/sec Loss 6.2695 LearningRate 0.0006 Epoch: 11 Global Step: 241070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:30,619-Speed 6315.85 samples/sec Loss 6.3065 LearningRate 0.0006 Epoch: 11 Global Step: 241080 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 
13:39:33,850-Speed 6340.50 samples/sec Loss 6.3008 LearningRate 0.0006 Epoch: 11 Global Step: 241090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:37,102-Speed 6298.52 samples/sec Loss 6.1923 LearningRate 0.0006 Epoch: 11 Global Step: 241100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:40,349-Speed 6308.43 samples/sec Loss 6.2434 LearningRate 0.0006 Epoch: 11 Global Step: 241110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:43,589-Speed 6323.39 samples/sec Loss 6.2850 LearningRate 0.0006 Epoch: 11 Global Step: 241120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:46,832-Speed 6316.42 samples/sec Loss 6.1795 LearningRate 0.0006 Epoch: 11 Global Step: 241130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:50,084-Speed 6298.03 samples/sec Loss 6.2545 LearningRate 0.0006 Epoch: 11 Global Step: 241140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:53,335-Speed 6302.00 samples/sec Loss 6.1951 LearningRate 0.0006 Epoch: 11 Global Step: 241150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:56,579-Speed 6314.28 samples/sec Loss 6.1626 LearningRate 0.0006 Epoch: 11 Global Step: 241160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:39:59,826-Speed 6308.22 samples/sec Loss 6.2703 LearningRate 0.0006 Epoch: 11 Global Step: 241170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:03,072-Speed 6310.39 samples/sec Loss 6.1411 LearningRate 0.0006 Epoch: 11 Global Step: 241180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:06,305-Speed 6337.06 samples/sec Loss 6.2938 LearningRate 0.0006 Epoch: 11 Global Step: 241190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:09,550-Speed 6312.57 samples/sec Loss 6.2741 LearningRate 0.0006 Epoch: 11 Global Step: 241200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:12,796-Speed 6311.31 
samples/sec Loss 6.2561 LearningRate 0.0006 Epoch: 11 Global Step: 241210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:16,042-Speed 6310.74 samples/sec Loss 6.2287 LearningRate 0.0006 Epoch: 11 Global Step: 241220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:19,287-Speed 6312.18 samples/sec Loss 6.1681 LearningRate 0.0006 Epoch: 11 Global Step: 241230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:22,544-Speed 6290.34 samples/sec Loss 6.1950 LearningRate 0.0006 Epoch: 11 Global Step: 241240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:25,791-Speed 6308.50 samples/sec Loss 6.1969 LearningRate 0.0006 Epoch: 11 Global Step: 241250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:29,037-Speed 6310.81 samples/sec Loss 6.2867 LearningRate 0.0006 Epoch: 11 Global Step: 241260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:32,297-Speed 6283.06 samples/sec Loss 6.2199 LearningRate 0.0006 Epoch: 11 Global Step: 241270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:35,540-Speed 6316.01 samples/sec Loss 6.2401 LearningRate 0.0006 Epoch: 11 Global Step: 241280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:38,784-Speed 6315.80 samples/sec Loss 6.2262 LearningRate 0.0006 Epoch: 11 Global Step: 241290 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:40:42,019-Speed 6332.52 samples/sec Loss 6.2022 LearningRate 0.0006 Epoch: 11 Global Step: 241300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:45,266-Speed 6308.67 samples/sec Loss 6.1622 LearningRate 0.0006 Epoch: 11 Global Step: 241310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:48,515-Speed 6304.20 samples/sec Loss 6.2741 LearningRate 0.0006 Epoch: 11 Global Step: 241320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:51,759-Speed 6315.04 samples/sec Loss 6.2085 
LearningRate 0.0006 Epoch: 11 Global Step: 241330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:55,010-Speed 6300.23 samples/sec Loss 6.1608 LearningRate 0.0006 Epoch: 11 Global Step: 241340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:40:58,257-Speed 6310.46 samples/sec Loss 6.2309 LearningRate 0.0006 Epoch: 11 Global Step: 241350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:01,539-Speed 6239.74 samples/sec Loss 6.2336 LearningRate 0.0006 Epoch: 11 Global Step: 241360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:04,883-Speed 6127.02 samples/sec Loss 6.2690 LearningRate 0.0006 Epoch: 11 Global Step: 241370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:08,127-Speed 6313.95 samples/sec Loss 6.1956 LearningRate 0.0006 Epoch: 11 Global Step: 241380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:11,377-Speed 6303.62 samples/sec Loss 6.1957 LearningRate 0.0006 Epoch: 11 Global Step: 241390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:14,609-Speed 6337.26 samples/sec Loss 6.1697 LearningRate 0.0006 Epoch: 11 Global Step: 241400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:17,853-Speed 6315.44 samples/sec Loss 6.1852 LearningRate 0.0006 Epoch: 11 Global Step: 241410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:21,103-Speed 6303.85 samples/sec Loss 6.2025 LearningRate 0.0006 Epoch: 11 Global Step: 241420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:24,347-Speed 6315.06 samples/sec Loss 6.2361 LearningRate 0.0006 Epoch: 11 Global Step: 241430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:27,593-Speed 6309.50 samples/sec Loss 6.2751 LearningRate 0.0006 Epoch: 11 Global Step: 241440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:30,893-Speed 6207.14 samples/sec Loss 6.1893 LearningRate 0.0006 Epoch: 11 
Global Step: 241450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:34,139-Speed 6311.15 samples/sec Loss 6.2752 LearningRate 0.0006 Epoch: 11 Global Step: 241460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:37,426-Speed 6232.78 samples/sec Loss 6.2961 LearningRate 0.0006 Epoch: 11 Global Step: 241470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:40,673-Speed 6307.59 samples/sec Loss 6.2222 LearningRate 0.0006 Epoch: 11 Global Step: 241480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:43,918-Speed 6313.94 samples/sec Loss 6.2838 LearningRate 0.0006 Epoch: 11 Global Step: 241490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:47,163-Speed 6311.62 samples/sec Loss 6.2008 LearningRate 0.0006 Epoch: 11 Global Step: 241500 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:41:50,396-Speed 6339.46 samples/sec Loss 6.1909 LearningRate 0.0006 Epoch: 11 Global Step: 241510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:53,641-Speed 6313.72 samples/sec Loss 6.1956 LearningRate 0.0006 Epoch: 11 Global Step: 241520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:41:56,883-Speed 6317.41 samples/sec Loss 6.2130 LearningRate 0.0006 Epoch: 11 Global Step: 241530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:00,132-Speed 6305.45 samples/sec Loss 6.2471 LearningRate 0.0006 Epoch: 11 Global Step: 241540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:03,379-Speed 6307.92 samples/sec Loss 6.1893 LearningRate 0.0006 Epoch: 11 Global Step: 241550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:06,630-Speed 6301.96 samples/sec Loss 6.2333 LearningRate 0.0006 Epoch: 11 Global Step: 241560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:09,874-Speed 6312.90 samples/sec Loss 6.2056 LearningRate 0.0006 Epoch: 11 Global Step: 241570 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:13,118-Speed 6314.79 samples/sec Loss 6.1577 LearningRate 0.0006 Epoch: 11 Global Step: 241580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:16,361-Speed 6318.02 samples/sec Loss 6.2229 LearningRate 0.0006 Epoch: 11 Global Step: 241590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:19,609-Speed 6305.78 samples/sec Loss 6.1709 LearningRate 0.0006 Epoch: 11 Global Step: 241600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:22,843-Speed 6334.45 samples/sec Loss 6.2582 LearningRate 0.0006 Epoch: 11 Global Step: 241610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:26,089-Speed 6311.03 samples/sec Loss 6.2185 LearningRate 0.0006 Epoch: 11 Global Step: 241620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:29,342-Speed 6298.28 samples/sec Loss 6.2597 LearningRate 0.0006 Epoch: 11 Global Step: 241630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:32,591-Speed 6303.80 samples/sec Loss 6.2838 LearningRate 0.0006 Epoch: 11 Global Step: 241640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:35,842-Speed 6302.50 samples/sec Loss 6.1922 LearningRate 0.0006 Epoch: 11 Global Step: 241650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:39,091-Speed 6303.64 samples/sec Loss 6.2415 LearningRate 0.0006 Epoch: 11 Global Step: 241660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:42,337-Speed 6311.95 samples/sec Loss 6.2739 LearningRate 0.0006 Epoch: 11 Global Step: 241670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:45,580-Speed 6315.71 samples/sec Loss 6.1545 LearningRate 0.0006 Epoch: 11 Global Step: 241680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:48,823-Speed 6317.58 samples/sec Loss 6.2198 LearningRate 0.0006 Epoch: 11 Global Step: 241690 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 13:42:52,068-Speed 6312.93 samples/sec Loss 6.1861 LearningRate 0.0006 Epoch: 11 Global Step: 241700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:55,307-Speed 6322.85 samples/sec Loss 6.2238 LearningRate 0.0006 Epoch: 11 Global Step: 241710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:42:58,658-Speed 6112.48 samples/sec Loss 6.2002 LearningRate 0.0006 Epoch: 11 Global Step: 241720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:01,907-Speed 6305.13 samples/sec Loss 6.1892 LearningRate 0.0006 Epoch: 11 Global Step: 241730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:05,154-Speed 6309.65 samples/sec Loss 6.2842 LearningRate 0.0006 Epoch: 11 Global Step: 241740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:08,401-Speed 6308.95 samples/sec Loss 6.2726 LearningRate 0.0006 Epoch: 11 Global Step: 241750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:11,646-Speed 6312.68 samples/sec Loss 6.1636 LearningRate 0.0006 Epoch: 11 Global Step: 241760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:14,892-Speed 6311.32 samples/sec Loss 6.1619 LearningRate 0.0006 Epoch: 11 Global Step: 241770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:18,137-Speed 6311.04 samples/sec Loss 6.2260 LearningRate 0.0006 Epoch: 11 Global Step: 241780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:21,384-Speed 6309.45 samples/sec Loss 6.2281 LearningRate 0.0006 Epoch: 11 Global Step: 241790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:24,631-Speed 6309.00 samples/sec Loss 6.2855 LearningRate 0.0006 Epoch: 11 Global Step: 241800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:27,881-Speed 6302.55 samples/sec Loss 6.2088 LearningRate 0.0006 Epoch: 11 Global Step: 241810 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 
13:43:31,114-Speed 6335.14 samples/sec Loss 6.1270 LearningRate 0.0006 Epoch: 11 Global Step: 241820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:34,359-Speed 6313.64 samples/sec Loss 6.2296 LearningRate 0.0006 Epoch: 11 Global Step: 241830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:37,605-Speed 6310.79 samples/sec Loss 6.2703 LearningRate 0.0006 Epoch: 11 Global Step: 241840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:40,849-Speed 6315.14 samples/sec Loss 6.1853 LearningRate 0.0006 Epoch: 11 Global Step: 241850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:44,096-Speed 6308.11 samples/sec Loss 6.2005 LearningRate 0.0006 Epoch: 11 Global Step: 241860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:47,342-Speed 6312.54 samples/sec Loss 6.1893 LearningRate 0.0006 Epoch: 11 Global Step: 241870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:50,590-Speed 6306.76 samples/sec Loss 6.1791 LearningRate 0.0006 Epoch: 11 Global Step: 241880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:53,836-Speed 6310.05 samples/sec Loss 6.1612 LearningRate 0.0006 Epoch: 11 Global Step: 241890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:43:57,085-Speed 6303.94 samples/sec Loss 6.2077 LearningRate 0.0006 Epoch: 11 Global Step: 241900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:00,337-Speed 6299.70 samples/sec Loss 6.3188 LearningRate 0.0006 Epoch: 11 Global Step: 241910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:03,573-Speed 6333.92 samples/sec Loss 6.1624 LearningRate 0.0006 Epoch: 11 Global Step: 241920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:06,820-Speed 6307.67 samples/sec Loss 6.1544 LearningRate 0.0006 Epoch: 11 Global Step: 241930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:10,067-Speed 6309.79 
samples/sec Loss 6.2006 LearningRate 0.0006 Epoch: 11 Global Step: 241940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:13,312-Speed 6312.30 samples/sec Loss 6.1843 LearningRate 0.0006 Epoch: 11 Global Step: 241950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:16,560-Speed 6305.89 samples/sec Loss 6.2192 LearningRate 0.0006 Epoch: 11 Global Step: 241960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:19,805-Speed 6312.46 samples/sec Loss 6.1844 LearningRate 0.0006 Epoch: 11 Global Step: 241970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:23,048-Speed 6317.57 samples/sec Loss 6.3401 LearningRate 0.0006 Epoch: 11 Global Step: 241980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:26,301-Speed 6296.10 samples/sec Loss 6.2423 LearningRate 0.0006 Epoch: 11 Global Step: 241990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:29,549-Speed 6307.74 samples/sec Loss 6.1930 LearningRate 0.0006 Epoch: 11 Global Step: 242000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:32,795-Speed 6309.99 samples/sec Loss 6.2165 LearningRate 0.0006 Epoch: 11 Global Step: 242010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:36,045-Speed 6303.00 samples/sec Loss 6.2594 LearningRate 0.0006 Epoch: 11 Global Step: 242020 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:44:39,289-Speed 6315.64 samples/sec Loss 6.2486 LearningRate 0.0006 Epoch: 11 Global Step: 242030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:42,532-Speed 6314.68 samples/sec Loss 6.2555 LearningRate 0.0006 Epoch: 11 Global Step: 242040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:45,778-Speed 6312.50 samples/sec Loss 6.2488 LearningRate 0.0006 Epoch: 11 Global Step: 242050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:49,026-Speed 6306.94 samples/sec Loss 6.2568 
LearningRate 0.0006 Epoch: 11 Global Step: 242060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:52,275-Speed 6307.13 samples/sec Loss 6.3155 LearningRate 0.0006 Epoch: 11 Global Step: 242070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:55,520-Speed 6313.48 samples/sec Loss 6.1873 LearningRate 0.0006 Epoch: 11 Global Step: 242080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:44:58,767-Speed 6308.24 samples/sec Loss 6.1706 LearningRate 0.0006 Epoch: 11 Global Step: 242090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:02,013-Speed 6310.40 samples/sec Loss 6.3300 LearningRate 0.0006 Epoch: 11 Global Step: 242100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:05,258-Speed 6313.65 samples/sec Loss 6.1978 LearningRate 0.0006 Epoch: 11 Global Step: 242110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:08,507-Speed 6304.26 samples/sec Loss 6.2024 LearningRate 0.0006 Epoch: 11 Global Step: 242120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:11,743-Speed 6330.63 samples/sec Loss 6.2139 LearningRate 0.0006 Epoch: 11 Global Step: 242130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:14,986-Speed 6316.13 samples/sec Loss 6.1887 LearningRate 0.0006 Epoch: 11 Global Step: 242140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:18,229-Speed 6315.70 samples/sec Loss 6.2259 LearningRate 0.0006 Epoch: 11 Global Step: 242150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:21,489-Speed 6285.47 samples/sec Loss 6.2840 LearningRate 0.0006 Epoch: 11 Global Step: 242160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:24,736-Speed 6307.81 samples/sec Loss 6.2210 LearningRate 0.0006 Epoch: 11 Global Step: 242170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:27,980-Speed 6314.26 samples/sec Loss 6.2271 LearningRate 0.0006 Epoch: 11 
Global Step: 242180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:31,228-Speed 6306.93 samples/sec Loss 6.2058 LearningRate 0.0006 Epoch: 11 Global Step: 242190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:34,484-Speed 6290.71 samples/sec Loss 6.2333 LearningRate 0.0006 Epoch: 11 Global Step: 242200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:37,729-Speed 6313.15 samples/sec Loss 6.2021 LearningRate 0.0006 Epoch: 11 Global Step: 242210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:40,981-Speed 6299.00 samples/sec Loss 6.2788 LearningRate 0.0006 Epoch: 11 Global Step: 242220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:44,231-Speed 6302.37 samples/sec Loss 6.2030 LearningRate 0.0006 Epoch: 11 Global Step: 242230 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:45:47,461-Speed 6342.13 samples/sec Loss 6.2244 LearningRate 0.0006 Epoch: 11 Global Step: 242240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:50,705-Speed 6315.51 samples/sec Loss 6.2819 LearningRate 0.0006 Epoch: 11 Global Step: 242250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:53,953-Speed 6308.49 samples/sec Loss 6.2358 LearningRate 0.0006 Epoch: 11 Global Step: 242260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:45:57,204-Speed 6300.84 samples/sec Loss 6.2300 LearningRate 0.0006 Epoch: 11 Global Step: 242270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:00,451-Speed 6307.87 samples/sec Loss 6.2300 LearningRate 0.0006 Epoch: 11 Global Step: 242280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:03,698-Speed 6310.35 samples/sec Loss 6.1707 LearningRate 0.0006 Epoch: 11 Global Step: 242290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:06,941-Speed 6315.15 samples/sec Loss 6.2538 LearningRate 0.0006 Epoch: 11 Global Step: 242300 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:10,185-Speed 6314.17 samples/sec Loss 6.1678 LearningRate 0.0006 Epoch: 11 Global Step: 242310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:13,431-Speed 6310.60 samples/sec Loss 6.2665 LearningRate 0.0006 Epoch: 11 Global Step: 242320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:16,676-Speed 6312.62 samples/sec Loss 6.2392 LearningRate 0.0006 Epoch: 11 Global Step: 242330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:19,910-Speed 6334.48 samples/sec Loss 6.1518 LearningRate 0.0006 Epoch: 11 Global Step: 242340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:23,159-Speed 6306.06 samples/sec Loss 6.2718 LearningRate 0.0006 Epoch: 11 Global Step: 242350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:26,404-Speed 6311.67 samples/sec Loss 6.1752 LearningRate 0.0006 Epoch: 11 Global Step: 242360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:29,651-Speed 6308.77 samples/sec Loss 6.2700 LearningRate 0.0006 Epoch: 11 Global Step: 242370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:32,895-Speed 6315.70 samples/sec Loss 6.1311 LearningRate 0.0006 Epoch: 11 Global Step: 242380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:36,142-Speed 6308.27 samples/sec Loss 6.2343 LearningRate 0.0006 Epoch: 11 Global Step: 242390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:39,391-Speed 6304.54 samples/sec Loss 6.2243 LearningRate 0.0006 Epoch: 11 Global Step: 242400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:42,636-Speed 6313.14 samples/sec Loss 6.1330 LearningRate 0.0006 Epoch: 11 Global Step: 242410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:45,886-Speed 6301.85 samples/sec Loss 6.1501 LearningRate 0.0006 Epoch: 11 Global Step: 242420 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 13:46:49,136-Speed 6303.08 samples/sec Loss 6.1730 LearningRate 0.0006 Epoch: 11 Global Step: 242430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:52,383-Speed 6310.42 samples/sec Loss 6.1561 LearningRate 0.0006 Epoch: 11 Global Step: 242440 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:46:55,616-Speed 6334.20 samples/sec Loss 6.2521 LearningRate 0.0006 Epoch: 11 Global Step: 242450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:46:58,862-Speed 6311.18 samples/sec Loss 6.1986 LearningRate 0.0006 Epoch: 11 Global Step: 242460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:02,122-Speed 6284.63 samples/sec Loss 6.2195 LearningRate 0.0006 Epoch: 11 Global Step: 242470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:05,374-Speed 6299.57 samples/sec Loss 6.1661 LearningRate 0.0006 Epoch: 11 Global Step: 242480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:08,620-Speed 6311.09 samples/sec Loss 6.3010 LearningRate 0.0006 Epoch: 11 Global Step: 242490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:11,863-Speed 6315.24 samples/sec Loss 6.1759 LearningRate 0.0006 Epoch: 11 Global Step: 242500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:15,105-Speed 6318.86 samples/sec Loss 6.1780 LearningRate 0.0006 Epoch: 11 Global Step: 242510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:18,354-Speed 6305.31 samples/sec Loss 6.1869 LearningRate 0.0006 Epoch: 11 Global Step: 242520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:21,598-Speed 6315.51 samples/sec Loss 6.1600 LearningRate 0.0006 Epoch: 11 Global Step: 242530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:24,843-Speed 6311.99 samples/sec Loss 6.1742 LearningRate 0.0006 Epoch: 11 Global Step: 242540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
13:47:28,077-Speed 6334.37 samples/sec Loss 6.2191 LearningRate 0.0006 Epoch: 11 Global Step: 242550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:31,326-Speed 6304.61 samples/sec Loss 6.2854 LearningRate 0.0006 Epoch: 11 Global Step: 242560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:34,575-Speed 6305.13 samples/sec Loss 6.1790 LearningRate 0.0006 Epoch: 11 Global Step: 242570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:37,821-Speed 6310.67 samples/sec Loss 6.1777 LearningRate 0.0006 Epoch: 11 Global Step: 242580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:41,066-Speed 6311.41 samples/sec Loss 6.1927 LearningRate 0.0006 Epoch: 11 Global Step: 242590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:44,316-Speed 6303.76 samples/sec Loss 6.2089 LearningRate 0.0006 Epoch: 11 Global Step: 242600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:47,562-Speed 6310.90 samples/sec Loss 6.1541 LearningRate 0.0006 Epoch: 11 Global Step: 242610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:50,812-Speed 6302.53 samples/sec Loss 6.1223 LearningRate 0.0006 Epoch: 11 Global Step: 242620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:54,055-Speed 6315.76 samples/sec Loss 6.2225 LearningRate 0.0006 Epoch: 11 Global Step: 242630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:47:57,301-Speed 6311.69 samples/sec Loss 6.1962 LearningRate 0.0006 Epoch: 11 Global Step: 242640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:00,536-Speed 6332.20 samples/sec Loss 6.1254 LearningRate 0.0006 Epoch: 11 Global Step: 242650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:03,783-Speed 6308.50 samples/sec Loss 6.1825 LearningRate 0.0006 Epoch: 11 Global Step: 242660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:07,035-Speed 6299.53 
samples/sec Loss 6.2340 LearningRate 0.0006 Epoch: 11 Global Step: 242670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:10,283-Speed 6307.48 samples/sec Loss 6.2478 LearningRate 0.0006 Epoch: 11 Global Step: 242680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:13,527-Speed 6314.72 samples/sec Loss 6.2299 LearningRate 0.0006 Epoch: 11 Global Step: 242690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:16,777-Speed 6303.79 samples/sec Loss 6.2337 LearningRate 0.0006 Epoch: 11 Global Step: 242700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:20,027-Speed 6302.54 samples/sec Loss 6.2274 LearningRate 0.0006 Epoch: 11 Global Step: 242710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:23,277-Speed 6302.44 samples/sec Loss 6.1698 LearningRate 0.0006 Epoch: 11 Global Step: 242720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:26,526-Speed 6305.05 samples/sec Loss 6.2096 LearningRate 0.0006 Epoch: 11 Global Step: 242730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:29,772-Speed 6309.74 samples/sec Loss 6.1017 LearningRate 0.0006 Epoch: 11 Global Step: 242740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:33,004-Speed 6339.16 samples/sec Loss 6.1711 LearningRate 0.0006 Epoch: 11 Global Step: 242750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:36,277-Speed 6258.14 samples/sec Loss 6.1338 LearningRate 0.0006 Epoch: 11 Global Step: 242760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:39,547-Speed 6265.06 samples/sec Loss 6.2326 LearningRate 0.0006 Epoch: 11 Global Step: 242770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:42,792-Speed 6311.32 samples/sec Loss 6.1922 LearningRate 0.0006 Epoch: 11 Global Step: 242780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:46,038-Speed 6311.50 samples/sec Loss 6.1656 
LearningRate 0.0006 Epoch: 11 Global Step: 242790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:49,282-Speed 6314.69 samples/sec Loss 6.2375 LearningRate 0.0006 Epoch: 11 Global Step: 242800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:52,534-Speed 6297.86 samples/sec Loss 6.2565 LearningRate 0.0006 Epoch: 11 Global Step: 242810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:55,781-Speed 6310.78 samples/sec Loss 6.2029 LearningRate 0.0006 Epoch: 11 Global Step: 242820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:48:59,030-Speed 6304.44 samples/sec Loss 6.1415 LearningRate 0.0006 Epoch: 11 Global Step: 242830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:02,278-Speed 6305.89 samples/sec Loss 6.2269 LearningRate 0.0006 Epoch: 11 Global Step: 242840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:05,522-Speed 6315.26 samples/sec Loss 6.1940 LearningRate 0.0006 Epoch: 11 Global Step: 242850 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:49:08,755-Speed 6336.53 samples/sec Loss 6.2492 LearningRate 0.0006 Epoch: 11 Global Step: 242860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:12,003-Speed 6306.67 samples/sec Loss 6.1511 LearningRate 0.0006 Epoch: 11 Global Step: 242870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:15,249-Speed 6310.69 samples/sec Loss 6.2284 LearningRate 0.0006 Epoch: 11 Global Step: 242880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:18,491-Speed 6319.18 samples/sec Loss 6.2493 LearningRate 0.0006 Epoch: 11 Global Step: 242890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:21,734-Speed 6316.43 samples/sec Loss 6.2845 LearningRate 0.0006 Epoch: 11 Global Step: 242900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:24,983-Speed 6305.31 samples/sec Loss 6.3465 LearningRate 0.0006 Epoch: 11 
Global Step: 242910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:28,227-Speed 6313.72 samples/sec Loss 6.2622 LearningRate 0.0006 Epoch: 11 Global Step: 242920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:31,469-Speed 6320.08 samples/sec Loss 6.2221 LearningRate 0.0006 Epoch: 11 Global Step: 242930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:34,714-Speed 6312.56 samples/sec Loss 6.2065 LearningRate 0.0006 Epoch: 11 Global Step: 242940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:37,962-Speed 6306.33 samples/sec Loss 6.1803 LearningRate 0.0006 Epoch: 11 Global Step: 242950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:41,194-Speed 6338.79 samples/sec Loss 6.1677 LearningRate 0.0006 Epoch: 11 Global Step: 242960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:44,441-Speed 6307.89 samples/sec Loss 6.2045 LearningRate 0.0006 Epoch: 11 Global Step: 242970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:47,687-Speed 6309.80 samples/sec Loss 6.2016 LearningRate 0.0006 Epoch: 11 Global Step: 242980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:50,935-Speed 6307.58 samples/sec Loss 6.2325 LearningRate 0.0006 Epoch: 11 Global Step: 242990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:54,181-Speed 6310.63 samples/sec Loss 6.2800 LearningRate 0.0006 Epoch: 11 Global Step: 243000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:49:57,426-Speed 6313.79 samples/sec Loss 6.2515 LearningRate 0.0006 Epoch: 11 Global Step: 243010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:00,668-Speed 6316.83 samples/sec Loss 6.2380 LearningRate 0.0006 Epoch: 11 Global Step: 243020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:03,916-Speed 6306.87 samples/sec Loss 6.1779 LearningRate 0.0006 Epoch: 11 Global Step: 243030 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:07,159-Speed 6316.55 samples/sec Loss 6.1628 LearningRate 0.0006 Epoch: 11 Global Step: 243040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:10,401-Speed 6319.55 samples/sec Loss 6.2315 LearningRate 0.0006 Epoch: 11 Global Step: 243050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:13,636-Speed 6331.51 samples/sec Loss 6.2588 LearningRate 0.0006 Epoch: 11 Global Step: 243060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:16,881-Speed 6311.84 samples/sec Loss 6.1665 LearningRate 0.0006 Epoch: 11 Global Step: 243070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:20,126-Speed 6312.94 samples/sec Loss 6.1914 LearningRate 0.0006 Epoch: 11 Global Step: 243080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:23,374-Speed 6308.86 samples/sec Loss 6.1716 LearningRate 0.0006 Epoch: 11 Global Step: 243090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:26,619-Speed 6311.87 samples/sec Loss 6.2530 LearningRate 0.0006 Epoch: 11 Global Step: 243100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:29,867-Speed 6307.08 samples/sec Loss 6.2458 LearningRate 0.0006 Epoch: 11 Global Step: 243110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:33,112-Speed 6312.66 samples/sec Loss 6.2920 LearningRate 0.0006 Epoch: 11 Global Step: 243120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:36,359-Speed 6308.63 samples/sec Loss 6.2079 LearningRate 0.0006 Epoch: 11 Global Step: 243130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:39,604-Speed 6313.54 samples/sec Loss 6.1018 LearningRate 0.0006 Epoch: 11 Global Step: 243140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:42,851-Speed 6307.91 samples/sec Loss 6.1516 LearningRate 0.0006 Epoch: 11 Global Step: 243150 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 13:50:46,102-Speed 6300.63 samples/sec Loss 6.2994 LearningRate 0.0006 Epoch: 11 Global Step: 243160 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:50:49,351-Speed 6304.99 samples/sec Loss 6.2242 LearningRate 0.0006 Epoch: 11 Global Step: 243170 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:50:52,583-Speed 6338.48 samples/sec Loss 6.1780 LearningRate 0.0006 Epoch: 11 Global Step: 243180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:55,834-Speed 6301.61 samples/sec Loss 6.1257 LearningRate 0.0006 Epoch: 11 Global Step: 243190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:50:59,086-Speed 6298.57 samples/sec Loss 6.2231 LearningRate 0.0006 Epoch: 11 Global Step: 243200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:02,330-Speed 6315.21 samples/sec Loss 6.2688 LearningRate 0.0006 Epoch: 11 Global Step: 243210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:05,576-Speed 6309.31 samples/sec Loss 6.2198 LearningRate 0.0006 Epoch: 11 Global Step: 243220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:08,822-Speed 6310.52 samples/sec Loss 6.1993 LearningRate 0.0006 Epoch: 11 Global Step: 243230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:12,071-Speed 6305.12 samples/sec Loss 6.2352 LearningRate 0.0006 Epoch: 11 Global Step: 243240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:15,318-Speed 6309.52 samples/sec Loss 6.2167 LearningRate 0.0006 Epoch: 11 Global Step: 243250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:18,560-Speed 6318.00 samples/sec Loss 6.2200 LearningRate 0.0006 Epoch: 11 Global Step: 243260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:21,808-Speed 6306.73 samples/sec Loss 6.2360 LearningRate 0.0006 Epoch: 11 Global Step: 243270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
13:51:25,058-Speed 6302.57 samples/sec Loss 6.1473 LearningRate 0.0006 Epoch: 11 Global Step: 243280 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:51:28,289-Speed 6340.10 samples/sec Loss 6.1968 LearningRate 0.0006 Epoch: 11 Global Step: 243290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:31,544-Speed 6293.24 samples/sec Loss 6.2407 LearningRate 0.0006 Epoch: 11 Global Step: 243300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:34,797-Speed 6298.60 samples/sec Loss 6.0455 LearningRate 0.0006 Epoch: 11 Global Step: 243310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:38,047-Speed 6302.73 samples/sec Loss 6.1883 LearningRate 0.0006 Epoch: 11 Global Step: 243320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:41,295-Speed 6306.86 samples/sec Loss 6.2576 LearningRate 0.0006 Epoch: 11 Global Step: 243330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:44,544-Speed 6306.23 samples/sec Loss 6.2121 LearningRate 0.0006 Epoch: 11 Global Step: 243340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:47,815-Speed 6262.13 samples/sec Loss 6.2795 LearningRate 0.0006 Epoch: 11 Global Step: 243350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:51,083-Speed 6268.25 samples/sec Loss 6.1543 LearningRate 0.0006 Epoch: 11 Global Step: 243360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:54,332-Speed 6305.16 samples/sec Loss 6.1178 LearningRate 0.0006 Epoch: 11 Global Step: 243370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:51:57,578-Speed 6310.96 samples/sec Loss 6.1747 LearningRate 0.0006 Epoch: 11 Global Step: 243380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:00,814-Speed 6330.12 samples/sec Loss 6.1502 LearningRate 0.0006 Epoch: 11 Global Step: 243390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:04,059-Speed 6311.68 
samples/sec Loss 6.1365 LearningRate 0.0006 Epoch: 11 Global Step: 243400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:07,307-Speed 6306.76 samples/sec Loss 6.2169 LearningRate 0.0006 Epoch: 11 Global Step: 243410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:10,559-Speed 6299.11 samples/sec Loss 6.2539 LearningRate 0.0006 Epoch: 11 Global Step: 243420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:13,805-Speed 6310.42 samples/sec Loss 6.2068 LearningRate 0.0006 Epoch: 11 Global Step: 243430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:17,051-Speed 6310.97 samples/sec Loss 6.2296 LearningRate 0.0006 Epoch: 11 Global Step: 243440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:20,297-Speed 6310.52 samples/sec Loss 6.2401 LearningRate 0.0006 Epoch: 11 Global Step: 243450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:23,544-Speed 6309.64 samples/sec Loss 6.2638 LearningRate 0.0006 Epoch: 11 Global Step: 243460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:26,792-Speed 6307.28 samples/sec Loss 6.2166 LearningRate 0.0006 Epoch: 11 Global Step: 243470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:30,041-Speed 6304.90 samples/sec Loss 6.1693 LearningRate 0.0006 Epoch: 11 Global Step: 243480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:33,290-Speed 6304.75 samples/sec Loss 6.2342 LearningRate 0.0006 Epoch: 11 Global Step: 243490 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:52:36,523-Speed 6334.70 samples/sec Loss 6.2556 LearningRate 0.0006 Epoch: 11 Global Step: 243500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:39,769-Speed 6311.57 samples/sec Loss 6.2242 LearningRate 0.0006 Epoch: 11 Global Step: 243510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:43,015-Speed 6309.67 samples/sec Loss 6.2402 
LearningRate 0.0006 Epoch: 11 Global Step: 243520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:46,267-Speed 6301.08 samples/sec Loss 6.1502 LearningRate 0.0006 Epoch: 11 Global Step: 243530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:49,511-Speed 6314.23 samples/sec Loss 6.1053 LearningRate 0.0006 Epoch: 11 Global Step: 243540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:52,758-Speed 6308.62 samples/sec Loss 6.1542 LearningRate 0.0006 Epoch: 11 Global Step: 243550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:56,004-Speed 6311.91 samples/sec Loss 6.1897 LearningRate 0.0006 Epoch: 11 Global Step: 243560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:52:59,249-Speed 6314.14 samples/sec Loss 6.1620 LearningRate 0.0006 Epoch: 11 Global Step: 243570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:02,495-Speed 6312.14 samples/sec Loss 6.1925 LearningRate 0.0006 Epoch: 11 Global Step: 243580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:05,741-Speed 6309.75 samples/sec Loss 6.1972 LearningRate 0.0006 Epoch: 11 Global Step: 243590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:08,967-Speed 6349.09 samples/sec Loss 6.3075 LearningRate 0.0006 Epoch: 11 Global Step: 243600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:12,216-Speed 6306.39 samples/sec Loss 6.1531 LearningRate 0.0006 Epoch: 11 Global Step: 243610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:15,476-Speed 6283.58 samples/sec Loss 6.2318 LearningRate 0.0006 Epoch: 11 Global Step: 243620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:18,726-Speed 6301.92 samples/sec Loss 6.1935 LearningRate 0.0006 Epoch: 11 Global Step: 243630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:21,968-Speed 6319.30 samples/sec Loss 6.1498 LearningRate 0.0006 Epoch: 11 
Global Step: 243640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:25,211-Speed 6315.72 samples/sec Loss 6.2256 LearningRate 0.0006 Epoch: 11 Global Step: 243650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:28,456-Speed 6313.99 samples/sec Loss 6.1884 LearningRate 0.0006 Epoch: 11 Global Step: 243660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:31,701-Speed 6311.87 samples/sec Loss 6.1512 LearningRate 0.0006 Epoch: 11 Global Step: 243670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:34,952-Speed 6301.08 samples/sec Loss 6.1559 LearningRate 0.0006 Epoch: 11 Global Step: 243680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:38,203-Speed 6300.58 samples/sec Loss 6.1774 LearningRate 0.0006 Epoch: 11 Global Step: 243690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:41,448-Speed 6313.33 samples/sec Loss 6.2247 LearningRate 0.0006 Epoch: 11 Global Step: 243700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:44,711-Speed 6278.06 samples/sec Loss 6.2482 LearningRate 0.0006 Epoch: 11 Global Step: 243710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:47,956-Speed 6312.17 samples/sec Loss 6.1500 LearningRate 0.0006 Epoch: 11 Global Step: 243720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:51,207-Speed 6301.12 samples/sec Loss 6.2629 LearningRate 0.0006 Epoch: 11 Global Step: 243730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:54,448-Speed 6322.03 samples/sec Loss 6.1128 LearningRate 0.0006 Epoch: 11 Global Step: 243740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:53:57,695-Speed 6308.53 samples/sec Loss 6.1618 LearningRate 0.0006 Epoch: 11 Global Step: 243750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:00,940-Speed 6313.35 samples/sec Loss 6.2077 LearningRate 0.0006 Epoch: 11 Global Step: 243760 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:04,185-Speed 6312.47 samples/sec Loss 6.2156 LearningRate 0.0006 Epoch: 11 Global Step: 243770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:07,429-Speed 6314.03 samples/sec Loss 6.1815 LearningRate 0.0006 Epoch: 11 Global Step: 243780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:10,671-Speed 6318.34 samples/sec Loss 6.1221 LearningRate 0.0006 Epoch: 11 Global Step: 243790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:13,918-Speed 6308.18 samples/sec Loss 6.1978 LearningRate 0.0006 Epoch: 11 Global Step: 243800 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:54:17,163-Speed 6313.32 samples/sec Loss 6.1691 LearningRate 0.0006 Epoch: 11 Global Step: 243810 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:54:20,399-Speed 6330.95 samples/sec Loss 6.1772 LearningRate 0.0006 Epoch: 11 Global Step: 243820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:23,648-Speed 6304.25 samples/sec Loss 6.1465 LearningRate 0.0006 Epoch: 11 Global Step: 243830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:26,893-Speed 6312.57 samples/sec Loss 6.2662 LearningRate 0.0006 Epoch: 11 Global Step: 243840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:30,141-Speed 6307.57 samples/sec Loss 6.2410 LearningRate 0.0006 Epoch: 11 Global Step: 243850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:33,385-Speed 6314.04 samples/sec Loss 6.2375 LearningRate 0.0006 Epoch: 11 Global Step: 243860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:36,636-Speed 6301.48 samples/sec Loss 6.1638 LearningRate 0.0006 Epoch: 11 Global Step: 243870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:39,881-Speed 6312.16 samples/sec Loss 6.1665 LearningRate 0.0006 Epoch: 11 Global Step: 243880 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 13:54:43,135-Speed 6295.82 samples/sec Loss 6.2144 LearningRate 0.0006 Epoch: 11 Global Step: 243890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:46,390-Speed 6292.74 samples/sec Loss 6.2288 LearningRate 0.0006 Epoch: 11 Global Step: 243900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:49,640-Speed 6302.70 samples/sec Loss 6.2204 LearningRate 0.0006 Epoch: 11 Global Step: 243910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:52,888-Speed 6307.53 samples/sec Loss 6.1681 LearningRate 0.0006 Epoch: 11 Global Step: 243920 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:54:56,120-Speed 6338.63 samples/sec Loss 6.2344 LearningRate 0.0006 Epoch: 11 Global Step: 243930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:54:59,369-Speed 6304.75 samples/sec Loss 6.1124 LearningRate 0.0006 Epoch: 11 Global Step: 243940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:02,619-Speed 6302.54 samples/sec Loss 6.2503 LearningRate 0.0006 Epoch: 11 Global Step: 243950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:05,869-Speed 6303.10 samples/sec Loss 6.1667 LearningRate 0.0006 Epoch: 11 Global Step: 243960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:09,113-Speed 6315.43 samples/sec Loss 6.1503 LearningRate 0.0006 Epoch: 11 Global Step: 243970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:12,358-Speed 6312.23 samples/sec Loss 6.2099 LearningRate 0.0006 Epoch: 11 Global Step: 243980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:15,610-Speed 6299.29 samples/sec Loss 6.1476 LearningRate 0.0006 Epoch: 11 Global Step: 243990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:18,854-Speed 6314.28 samples/sec Loss 6.2146 LearningRate 0.0006 Epoch: 11 Global Step: 244000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
13:55:22,098-Speed 6315.14 samples/sec Loss 6.2582 LearningRate 0.0006 Epoch: 11 Global Step: 244010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:25,346-Speed 6306.57 samples/sec Loss 6.1976 LearningRate 0.0006 Epoch: 11 Global Step: 244020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:28,576-Speed 6341.37 samples/sec Loss 6.1743 LearningRate 0.0006 Epoch: 11 Global Step: 244030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:31,825-Speed 6306.40 samples/sec Loss 6.1434 LearningRate 0.0006 Epoch: 11 Global Step: 244040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:35,068-Speed 6316.56 samples/sec Loss 6.1606 LearningRate 0.0006 Epoch: 11 Global Step: 244050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:38,313-Speed 6311.19 samples/sec Loss 6.2898 LearningRate 0.0006 Epoch: 11 Global Step: 244060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:41,561-Speed 6307.50 samples/sec Loss 6.2607 LearningRate 0.0006 Epoch: 11 Global Step: 244070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:44,804-Speed 6316.41 samples/sec Loss 6.2746 LearningRate 0.0006 Epoch: 11 Global Step: 244080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:48,052-Speed 6308.07 samples/sec Loss 6.1394 LearningRate 0.0006 Epoch: 11 Global Step: 244090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:51,300-Speed 6306.60 samples/sec Loss 6.1331 LearningRate 0.0006 Epoch: 11 Global Step: 244100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:54,546-Speed 6309.11 samples/sec Loss 6.1489 LearningRate 0.0006 Epoch: 11 Global Step: 244110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:55:57,796-Speed 6304.73 samples/sec Loss 6.1384 LearningRate 0.0006 Epoch: 11 Global Step: 244120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:01,027-Speed 6340.04 
samples/sec Loss 6.2177 LearningRate 0.0006 Epoch: 11 Global Step: 244130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:04,272-Speed 6311.92 samples/sec Loss 6.2226 LearningRate 0.0006 Epoch: 11 Global Step: 244140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:07,522-Speed 6304.11 samples/sec Loss 6.2190 LearningRate 0.0006 Epoch: 11 Global Step: 244150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:10,766-Speed 6314.80 samples/sec Loss 6.2163 LearningRate 0.0006 Epoch: 11 Global Step: 244160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:14,013-Speed 6308.81 samples/sec Loss 6.2555 LearningRate 0.0006 Epoch: 11 Global Step: 244170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:17,258-Speed 6311.92 samples/sec Loss 6.2555 LearningRate 0.0006 Epoch: 11 Global Step: 244180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:20,505-Speed 6308.24 samples/sec Loss 6.2381 LearningRate 0.0006 Epoch: 11 Global Step: 244190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:23,756-Speed 6301.59 samples/sec Loss 6.2734 LearningRate 0.0006 Epoch: 11 Global Step: 244200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:27,001-Speed 6314.11 samples/sec Loss 6.1547 LearningRate 0.0006 Epoch: 11 Global Step: 244210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:30,250-Speed 6304.50 samples/sec Loss 6.2189 LearningRate 0.0006 Epoch: 11 Global Step: 244220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:33,479-Speed 6343.50 samples/sec Loss 6.1274 LearningRate 0.0006 Epoch: 11 Global Step: 244230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:36,724-Speed 6312.85 samples/sec Loss 6.2260 LearningRate 0.0006 Epoch: 11 Global Step: 244240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:39,966-Speed 6318.50 samples/sec Loss 6.2284 
LearningRate 0.0006 Epoch: 11 Global Step: 244250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:43,212-Speed 6310.15 samples/sec Loss 6.1893 LearningRate 0.0006 Epoch: 11 Global Step: 244260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:46,459-Speed 6309.69 samples/sec Loss 6.2461 LearningRate 0.0006 Epoch: 11 Global Step: 244270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:49,703-Speed 6314.26 samples/sec Loss 6.3098 LearningRate 0.0006 Epoch: 11 Global Step: 244280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:52,952-Speed 6305.24 samples/sec Loss 6.1677 LearningRate 0.0006 Epoch: 11 Global Step: 244290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:56,195-Speed 6316.59 samples/sec Loss 6.1567 LearningRate 0.0006 Epoch: 11 Global Step: 244300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:56:59,449-Speed 6295.04 samples/sec Loss 6.1345 LearningRate 0.0006 Epoch: 11 Global Step: 244310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:02,698-Speed 6305.65 samples/sec Loss 6.1474 LearningRate 0.0006 Epoch: 11 Global Step: 244320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:05,939-Speed 6319.32 samples/sec Loss 6.1629 LearningRate 0.0006 Epoch: 11 Global Step: 244330 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:57:09,169-Speed 6342.30 samples/sec Loss 6.1809 LearningRate 0.0006 Epoch: 11 Global Step: 244340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:12,413-Speed 6314.90 samples/sec Loss 6.1597 LearningRate 0.0006 Epoch: 11 Global Step: 244350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:15,657-Speed 6314.72 samples/sec Loss 6.1803 LearningRate 0.0006 Epoch: 11 Global Step: 244360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:18,902-Speed 6313.36 samples/sec Loss 6.1632 LearningRate 0.0006 Epoch: 11 
Global Step: 244370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:22,146-Speed 6312.89 samples/sec Loss 6.1637 LearningRate 0.0006 Epoch: 11 Global Step: 244380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:25,397-Speed 6303.42 samples/sec Loss 6.2908 LearningRate 0.0006 Epoch: 11 Global Step: 244390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:28,643-Speed 6311.25 samples/sec Loss 6.2112 LearningRate 0.0006 Epoch: 11 Global Step: 244400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:31,891-Speed 6305.95 samples/sec Loss 6.1541 LearningRate 0.0006 Epoch: 11 Global Step: 244410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:35,139-Speed 6307.93 samples/sec Loss 6.1996 LearningRate 0.0006 Epoch: 11 Global Step: 244420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:38,385-Speed 6310.77 samples/sec Loss 6.2526 LearningRate 0.0006 Epoch: 11 Global Step: 244430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:41,615-Speed 6340.73 samples/sec Loss 6.1988 LearningRate 0.0006 Epoch: 11 Global Step: 244440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:44,860-Speed 6313.07 samples/sec Loss 6.1448 LearningRate 0.0006 Epoch: 11 Global Step: 244450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:48,108-Speed 6307.74 samples/sec Loss 6.1299 LearningRate 0.0006 Epoch: 11 Global Step: 244460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:51,354-Speed 6310.51 samples/sec Loss 6.1892 LearningRate 0.0006 Epoch: 11 Global Step: 244470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:54,601-Speed 6308.01 samples/sec Loss 6.2502 LearningRate 0.0006 Epoch: 11 Global Step: 244480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:57:57,849-Speed 6307.46 samples/sec Loss 6.2281 LearningRate 0.0006 Epoch: 11 Global Step: 244490 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:01,098-Speed 6305.27 samples/sec Loss 6.2844 LearningRate 0.0006 Epoch: 11 Global Step: 244500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:04,346-Speed 6307.03 samples/sec Loss 6.1995 LearningRate 0.0006 Epoch: 11 Global Step: 244510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:07,612-Speed 6270.99 samples/sec Loss 6.2210 LearningRate 0.0006 Epoch: 11 Global Step: 244520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:10,857-Speed 6312.85 samples/sec Loss 6.1391 LearningRate 0.0006 Epoch: 11 Global Step: 244530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:14,100-Speed 6317.12 samples/sec Loss 6.1950 LearningRate 0.0006 Epoch: 11 Global Step: 244540 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 13:58:17,333-Speed 6336.31 samples/sec Loss 6.2866 LearningRate 0.0006 Epoch: 11 Global Step: 244550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:20,578-Speed 6311.33 samples/sec Loss 6.1500 LearningRate 0.0006 Epoch: 11 Global Step: 244560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:23,826-Speed 6307.05 samples/sec Loss 6.2684 LearningRate 0.0006 Epoch: 11 Global Step: 244570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:27,073-Speed 6308.62 samples/sec Loss 6.1788 LearningRate 0.0006 Epoch: 11 Global Step: 244580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:30,318-Speed 6312.25 samples/sec Loss 6.1548 LearningRate 0.0006 Epoch: 11 Global Step: 244590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:33,580-Speed 6280.84 samples/sec Loss 6.2114 LearningRate 0.0006 Epoch: 11 Global Step: 244600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:36,923-Speed 6128.21 samples/sec Loss 6.2168 LearningRate 0.0006 Epoch: 11 Global Step: 244610 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 13:58:40,167-Speed 6314.04 samples/sec Loss 6.1585 LearningRate 0.0006 Epoch: 11 Global Step: 244620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:43,419-Speed 6300.64 samples/sec Loss 6.1911 LearningRate 0.0006 Epoch: 11 Global Step: 244630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:46,666-Speed 6309.01 samples/sec Loss 6.2824 LearningRate 0.0006 Epoch: 11 Global Step: 244640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:49,896-Speed 6340.60 samples/sec Loss 6.1248 LearningRate 0.0006 Epoch: 11 Global Step: 244650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:53,142-Speed 6311.85 samples/sec Loss 6.1982 LearningRate 0.0006 Epoch: 11 Global Step: 244660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:56,391-Speed 6304.61 samples/sec Loss 6.2356 LearningRate 0.0006 Epoch: 11 Global Step: 244670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:58:59,634-Speed 6316.92 samples/sec Loss 6.2311 LearningRate 0.0006 Epoch: 11 Global Step: 244680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:02,883-Speed 6304.86 samples/sec Loss 6.2029 LearningRate 0.0006 Epoch: 11 Global Step: 244690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:06,133-Speed 6302.96 samples/sec Loss 6.1957 LearningRate 0.0006 Epoch: 11 Global Step: 244700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:09,378-Speed 6313.39 samples/sec Loss 6.1073 LearningRate 0.0006 Epoch: 11 Global Step: 244710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:12,627-Speed 6303.71 samples/sec Loss 6.1749 LearningRate 0.0006 Epoch: 11 Global Step: 244720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:15,876-Speed 6305.19 samples/sec Loss 6.1812 LearningRate 0.0006 Epoch: 11 Global Step: 244730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
13:59:19,119-Speed 6316.55 samples/sec Loss 6.1719 LearningRate 0.0006 Epoch: 11 Global Step: 244740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:22,353-Speed 6333.29 samples/sec Loss 6.1989 LearningRate 0.0006 Epoch: 11 Global Step: 244750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:25,605-Speed 6300.33 samples/sec Loss 6.1747 LearningRate 0.0006 Epoch: 11 Global Step: 244760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:28,856-Speed 6300.40 samples/sec Loss 6.0622 LearningRate 0.0006 Epoch: 11 Global Step: 244770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:32,101-Speed 6312.87 samples/sec Loss 6.2505 LearningRate 0.0006 Epoch: 11 Global Step: 244780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:35,375-Speed 6257.99 samples/sec Loss 6.1236 LearningRate 0.0006 Epoch: 11 Global Step: 244790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:38,663-Speed 6230.39 samples/sec Loss 6.1674 LearningRate 0.0006 Epoch: 11 Global Step: 244800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:41,922-Speed 6286.10 samples/sec Loss 6.1834 LearningRate 0.0006 Epoch: 11 Global Step: 244810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:45,170-Speed 6305.71 samples/sec Loss 6.1948 LearningRate 0.0006 Epoch: 11 Global Step: 244820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:48,431-Speed 6281.96 samples/sec Loss 6.2003 LearningRate 0.0006 Epoch: 11 Global Step: 244830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:51,679-Speed 6308.04 samples/sec Loss 6.1859 LearningRate 0.0006 Epoch: 11 Global Step: 244840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:54,910-Speed 6339.92 samples/sec Loss 6.2305 LearningRate 0.0006 Epoch: 11 Global Step: 244850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 13:59:58,156-Speed 6311.49 
samples/sec Loss 6.2301 LearningRate 0.0006 Epoch: 11 Global Step: 244860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:01,398-Speed 6318.23 samples/sec Loss 6.1427 LearningRate 0.0006 Epoch: 11 Global Step: 244870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:04,643-Speed 6313.18 samples/sec Loss 6.1716 LearningRate 0.0006 Epoch: 11 Global Step: 244880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:07,887-Speed 6312.75 samples/sec Loss 6.1640 LearningRate 0.0006 Epoch: 11 Global Step: 244890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:11,138-Speed 6303.07 samples/sec Loss 6.1849 LearningRate 0.0006 Epoch: 11 Global Step: 244900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:14,381-Speed 6315.69 samples/sec Loss 6.2014 LearningRate 0.0006 Epoch: 11 Global Step: 244910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:17,627-Speed 6310.41 samples/sec Loss 6.1422 LearningRate 0.0006 Epoch: 11 Global Step: 244920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:20,872-Speed 6313.44 samples/sec Loss 6.2200 LearningRate 0.0006 Epoch: 11 Global Step: 244930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:24,118-Speed 6309.23 samples/sec Loss 6.1767 LearningRate 0.0006 Epoch: 11 Global Step: 244940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:27,366-Speed 6308.34 samples/sec Loss 6.2119 LearningRate 0.0006 Epoch: 11 Global Step: 244950 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:00:30,598-Speed 6337.35 samples/sec Loss 6.2411 LearningRate 0.0006 Epoch: 11 Global Step: 244960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:33,848-Speed 6303.82 samples/sec Loss 6.2324 LearningRate 0.0006 Epoch: 11 Global Step: 244970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:37,092-Speed 6312.96 samples/sec Loss 6.1482 
LearningRate 0.0006 Epoch: 11 Global Step: 244980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:40,335-Speed 6317.53 samples/sec Loss 6.2258 LearningRate 0.0006 Epoch: 11 Global Step: 244990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:43,584-Speed 6305.84 samples/sec Loss 6.1110 LearningRate 0.0006 Epoch: 11 Global Step: 245000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:46,831-Speed 6309.19 samples/sec Loss 6.2405 LearningRate 0.0006 Epoch: 11 Global Step: 245010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:50,075-Speed 6312.73 samples/sec Loss 6.1506 LearningRate 0.0006 Epoch: 11 Global Step: 245020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:53,320-Speed 6312.79 samples/sec Loss 6.1747 LearningRate 0.0006 Epoch: 11 Global Step: 245030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:56,565-Speed 6314.02 samples/sec Loss 6.1990 LearningRate 0.0006 Epoch: 11 Global Step: 245040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:00:59,834-Speed 6266.87 samples/sec Loss 6.1764 LearningRate 0.0006 Epoch: 11 Global Step: 245050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:03,148-Speed 6179.95 samples/sec Loss 6.1947 LearningRate 0.0006 Epoch: 11 Global Step: 245060 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:01:06,384-Speed 6330.90 samples/sec Loss 6.1928 LearningRate 0.0006 Epoch: 11 Global Step: 245070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:09,630-Speed 6311.57 samples/sec Loss 6.0684 LearningRate 0.0006 Epoch: 11 Global Step: 245080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:12,872-Speed 6317.71 samples/sec Loss 6.1773 LearningRate 0.0006 Epoch: 11 Global Step: 245090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:16,122-Speed 6304.08 samples/sec Loss 6.2083 LearningRate 0.0006 Epoch: 11 
Global Step: 245100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:19,367-Speed 6311.80 samples/sec Loss 6.1056 LearningRate 0.0006 Epoch: 11 Global Step: 245110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:22,614-Speed 6307.98 samples/sec Loss 6.1416 LearningRate 0.0006 Epoch: 11 Global Step: 245120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:25,864-Speed 6303.24 samples/sec Loss 6.2135 LearningRate 0.0006 Epoch: 11 Global Step: 245130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:29,108-Speed 6315.99 samples/sec Loss 6.1468 LearningRate 0.0006 Epoch: 11 Global Step: 245140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:32,356-Speed 6309.54 samples/sec Loss 6.2730 LearningRate 0.0006 Epoch: 11 Global Step: 245150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:35,603-Speed 6307.89 samples/sec Loss 6.1588 LearningRate 0.0006 Epoch: 11 Global Step: 245160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:38,834-Speed 6339.57 samples/sec Loss 6.2348 LearningRate 0.0006 Epoch: 11 Global Step: 245170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:42,079-Speed 6313.41 samples/sec Loss 6.1339 LearningRate 0.0006 Epoch: 11 Global Step: 245180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:45,328-Speed 6304.86 samples/sec Loss 6.1374 LearningRate 0.0006 Epoch: 11 Global Step: 245190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:48,578-Speed 6302.62 samples/sec Loss 6.1301 LearningRate 0.0006 Epoch: 11 Global Step: 245200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:51,820-Speed 6318.41 samples/sec Loss 6.1984 LearningRate 0.0006 Epoch: 11 Global Step: 245210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:55,135-Speed 6179.91 samples/sec Loss 6.2228 LearningRate 0.0006 Epoch: 11 Global Step: 245220 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:01:58,392-Speed 6289.56 samples/sec Loss 6.1979 LearningRate 0.0006 Epoch: 11 Global Step: 245230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:01,640-Speed 6306.12 samples/sec Loss 6.2209 LearningRate 0.0006 Epoch: 11 Global Step: 245240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:04,889-Speed 6306.52 samples/sec Loss 6.1713 LearningRate 0.0006 Epoch: 11 Global Step: 245250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:08,130-Speed 6320.93 samples/sec Loss 6.1682 LearningRate 0.0006 Epoch: 11 Global Step: 245260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:11,365-Speed 6331.77 samples/sec Loss 6.1856 LearningRate 0.0006 Epoch: 11 Global Step: 245270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:14,633-Speed 6269.22 samples/sec Loss 6.1630 LearningRate 0.0006 Epoch: 11 Global Step: 245280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:17,875-Speed 6317.57 samples/sec Loss 6.1998 LearningRate 0.0006 Epoch: 11 Global Step: 245290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:21,128-Speed 6298.79 samples/sec Loss 6.1384 LearningRate 0.0006 Epoch: 11 Global Step: 245300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:24,372-Speed 6313.53 samples/sec Loss 6.1520 LearningRate 0.0006 Epoch: 11 Global Step: 245310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:27,616-Speed 6315.53 samples/sec Loss 6.1780 LearningRate 0.0006 Epoch: 11 Global Step: 245320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:30,861-Speed 6311.80 samples/sec Loss 6.1565 LearningRate 0.0006 Epoch: 11 Global Step: 245330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:34,107-Speed 6310.27 samples/sec Loss 6.2412 LearningRate 0.0006 Epoch: 11 Global Step: 245340 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:02:37,355-Speed 6307.82 samples/sec Loss 6.1120 LearningRate 0.0006 Epoch: 11 Global Step: 245350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:40,600-Speed 6312.07 samples/sec Loss 6.1713 LearningRate 0.0006 Epoch: 11 Global Step: 245360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:43,845-Speed 6313.32 samples/sec Loss 6.1866 LearningRate 0.0006 Epoch: 11 Global Step: 245370 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:02:47,076-Speed 6339.77 samples/sec Loss 6.2197 LearningRate 0.0006 Epoch: 11 Global Step: 245380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:50,322-Speed 6309.65 samples/sec Loss 6.1601 LearningRate 0.0006 Epoch: 11 Global Step: 245390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:53,573-Speed 6300.83 samples/sec Loss 6.1862 LearningRate 0.0006 Epoch: 11 Global Step: 245400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:02:56,818-Speed 6313.33 samples/sec Loss 6.1592 LearningRate 0.0006 Epoch: 11 Global Step: 245410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:00,065-Speed 6309.74 samples/sec Loss 6.2037 LearningRate 0.0006 Epoch: 11 Global Step: 245420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:03,309-Speed 6314.56 samples/sec Loss 6.2127 LearningRate 0.0006 Epoch: 11 Global Step: 245430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:06,560-Speed 6299.74 samples/sec Loss 6.1577 LearningRate 0.0006 Epoch: 11 Global Step: 245440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:09,806-Speed 6311.21 samples/sec Loss 6.1753 LearningRate 0.0006 Epoch: 11 Global Step: 245450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:13,068-Speed 6281.57 samples/sec Loss 6.1281 LearningRate 0.0006 Epoch: 11 Global Step: 245460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:03:16,316-Speed 6306.72 samples/sec Loss 6.2063 LearningRate 0.0006 Epoch: 11 Global Step: 245470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:19,567-Speed 6301.47 samples/sec Loss 6.1228 LearningRate 0.0006 Epoch: 11 Global Step: 245480 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:03:22,819-Speed 6298.92 samples/sec Loss 6.2158 LearningRate 0.0006 Epoch: 11 Global Step: 245490 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:03:26,051-Speed 6337.16 samples/sec Loss 6.1682 LearningRate 0.0006 Epoch: 11 Global Step: 245500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:29,295-Speed 6314.35 samples/sec Loss 6.1699 LearningRate 0.0006 Epoch: 11 Global Step: 245510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:32,542-Speed 6308.74 samples/sec Loss 6.1120 LearningRate 0.0006 Epoch: 11 Global Step: 245520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:35,791-Speed 6306.31 samples/sec Loss 6.2083 LearningRate 0.0006 Epoch: 11 Global Step: 245530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:39,037-Speed 6309.14 samples/sec Loss 6.1699 LearningRate 0.0006 Epoch: 11 Global Step: 245540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:42,287-Speed 6303.53 samples/sec Loss 6.2752 LearningRate 0.0006 Epoch: 11 Global Step: 245550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:45,530-Speed 6317.75 samples/sec Loss 6.2222 LearningRate 0.0006 Epoch: 11 Global Step: 245560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:48,776-Speed 6309.43 samples/sec Loss 6.2306 LearningRate 0.0006 Epoch: 11 Global Step: 245570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:52,030-Speed 6295.53 samples/sec Loss 6.1879 LearningRate 0.0006 Epoch: 11 Global Step: 245580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:55,278-Speed 6306.66 
samples/sec Loss 6.1656 LearningRate 0.0006 Epoch: 11 Global Step: 245590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:03:58,508-Speed 6341.24 samples/sec Loss 6.1176 LearningRate 0.0006 Epoch: 11 Global Step: 245600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:01,754-Speed 6311.64 samples/sec Loss 6.1645 LearningRate 0.0006 Epoch: 11 Global Step: 245610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:05,005-Speed 6301.25 samples/sec Loss 6.1637 LearningRate 0.0006 Epoch: 11 Global Step: 245620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:08,256-Speed 6300.83 samples/sec Loss 6.1455 LearningRate 0.0006 Epoch: 11 Global Step: 245630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:11,520-Speed 6276.21 samples/sec Loss 6.1889 LearningRate 0.0006 Epoch: 11 Global Step: 245640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:14,771-Speed 6301.07 samples/sec Loss 6.1747 LearningRate 0.0006 Epoch: 11 Global Step: 245650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:18,018-Speed 6309.74 samples/sec Loss 6.1903 LearningRate 0.0006 Epoch: 11 Global Step: 245660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:21,266-Speed 6307.81 samples/sec Loss 6.1075 LearningRate 0.0006 Epoch: 11 Global Step: 245670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:24,512-Speed 6309.97 samples/sec Loss 6.1643 LearningRate 0.0006 Epoch: 11 Global Step: 245680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:27,758-Speed 6311.46 samples/sec Loss 6.1298 LearningRate 0.0006 Epoch: 11 Global Step: 245690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:31,001-Speed 6316.47 samples/sec Loss 6.1681 LearningRate 0.0006 Epoch: 11 Global Step: 245700 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:04:34,233-Speed 6338.14 samples/sec Loss 6.1091 
LearningRate 0.0006 Epoch: 11 Global Step: 245710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:37,479-Speed 6310.32 samples/sec Loss 6.1549 LearningRate 0.0006 Epoch: 11 Global Step: 245720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:40,724-Speed 6312.79 samples/sec Loss 6.2919 LearningRate 0.0006 Epoch: 11 Global Step: 245730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:43,967-Speed 6315.55 samples/sec Loss 6.2171 LearningRate 0.0006 Epoch: 11 Global Step: 245740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:47,214-Speed 6309.51 samples/sec Loss 6.1623 LearningRate 0.0006 Epoch: 11 Global Step: 245750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:50,458-Speed 6314.42 samples/sec Loss 6.1200 LearningRate 0.0006 Epoch: 11 Global Step: 245760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:53,710-Speed 6299.04 samples/sec Loss 6.0616 LearningRate 0.0006 Epoch: 11 Global Step: 245770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:04:56,954-Speed 6315.20 samples/sec Loss 6.1816 LearningRate 0.0006 Epoch: 11 Global Step: 245780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:00,199-Speed 6312.51 samples/sec Loss 6.1328 LearningRate 0.0006 Epoch: 11 Global Step: 245790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:03,443-Speed 6315.26 samples/sec Loss 6.1798 LearningRate 0.0006 Epoch: 11 Global Step: 245800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:06,677-Speed 6334.33 samples/sec Loss 6.1615 LearningRate 0.0006 Epoch: 11 Global Step: 245810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:09,923-Speed 6309.96 samples/sec Loss 6.1151 LearningRate 0.0006 Epoch: 11 Global Step: 245820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:13,167-Speed 6314.94 samples/sec Loss 6.1490 LearningRate 0.0006 Epoch: 11 
Global Step: 245830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:16,414-Speed 6308.89 samples/sec Loss 6.0951 LearningRate 0.0006 Epoch: 11 Global Step: 245840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:19,663-Speed 6304.58 samples/sec Loss 6.1480 LearningRate 0.0006 Epoch: 11 Global Step: 245850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:22,909-Speed 6310.56 samples/sec Loss 6.2584 LearningRate 0.0006 Epoch: 11 Global Step: 245860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:26,167-Speed 6287.94 samples/sec Loss 6.1850 LearningRate 0.0006 Epoch: 11 Global Step: 245870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:29,414-Speed 6310.01 samples/sec Loss 6.1174 LearningRate 0.0006 Epoch: 11 Global Step: 245880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:32,662-Speed 6307.06 samples/sec Loss 6.1252 LearningRate 0.0006 Epoch: 11 Global Step: 245890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:35,906-Speed 6314.54 samples/sec Loss 6.1239 LearningRate 0.0006 Epoch: 11 Global Step: 245900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:39,163-Speed 6288.48 samples/sec Loss 6.1701 LearningRate 0.0006 Epoch: 11 Global Step: 245910 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:05:42,397-Speed 6334.20 samples/sec Loss 6.1657 LearningRate 0.0006 Epoch: 11 Global Step: 245920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:45,640-Speed 6316.04 samples/sec Loss 6.2118 LearningRate 0.0006 Epoch: 11 Global Step: 245930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:48,886-Speed 6310.44 samples/sec Loss 6.1642 LearningRate 0.0006 Epoch: 11 Global Step: 245940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:52,131-Speed 6313.09 samples/sec Loss 6.1873 LearningRate 0.0006 Epoch: 11 Global Step: 245950 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:55,376-Speed 6313.56 samples/sec Loss 6.2123 LearningRate 0.0006 Epoch: 11 Global Step: 245960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:05:58,623-Speed 6308.35 samples/sec Loss 6.2039 LearningRate 0.0006 Epoch: 11 Global Step: 245970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:01,872-Speed 6304.91 samples/sec Loss 6.1924 LearningRate 0.0006 Epoch: 11 Global Step: 245980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:05,119-Speed 6309.10 samples/sec Loss 6.2397 LearningRate 0.0006 Epoch: 11 Global Step: 245990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:08,367-Speed 6307.20 samples/sec Loss 6.2950 LearningRate 0.0006 Epoch: 11 Global Step: 246000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:11,615-Speed 6306.69 samples/sec Loss 6.2057 LearningRate 0.0006 Epoch: 11 Global Step: 246010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:14,861-Speed 6310.37 samples/sec Loss 6.2000 LearningRate 0.0006 Epoch: 11 Global Step: 246020 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:06:18,098-Speed 6328.80 samples/sec Loss 6.1292 LearningRate 0.0006 Epoch: 11 Global Step: 246030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:21,345-Speed 6309.44 samples/sec Loss 6.1840 LearningRate 0.0006 Epoch: 11 Global Step: 246040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:24,595-Speed 6302.02 samples/sec Loss 6.2230 LearningRate 0.0006 Epoch: 11 Global Step: 246050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:27,842-Speed 6308.57 samples/sec Loss 6.2507 LearningRate 0.0006 Epoch: 11 Global Step: 246060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:31,092-Speed 6303.75 samples/sec Loss 6.1682 LearningRate 0.0006 Epoch: 11 Global Step: 246070 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:06:34,335-Speed 6315.25 samples/sec Loss 6.2003 LearningRate 0.0006 Epoch: 11 Global Step: 246080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:37,584-Speed 6306.35 samples/sec Loss 6.1447 LearningRate 0.0006 Epoch: 11 Global Step: 246090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:40,831-Speed 6309.28 samples/sec Loss 6.1228 LearningRate 0.0006 Epoch: 11 Global Step: 246100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:44,075-Speed 6314.62 samples/sec Loss 6.1854 LearningRate 0.0006 Epoch: 11 Global Step: 246110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:47,324-Speed 6303.73 samples/sec Loss 6.1194 LearningRate 0.0006 Epoch: 11 Global Step: 246120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:50,554-Speed 6343.23 samples/sec Loss 6.1714 LearningRate 0.0006 Epoch: 11 Global Step: 246130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:53,802-Speed 6307.34 samples/sec Loss 6.1452 LearningRate 0.0006 Epoch: 11 Global Step: 246140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:06:57,052-Speed 6301.81 samples/sec Loss 6.1817 LearningRate 0.0006 Epoch: 11 Global Step: 246150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:00,298-Speed 6311.06 samples/sec Loss 6.1672 LearningRate 0.0006 Epoch: 11 Global Step: 246160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:03,546-Speed 6307.35 samples/sec Loss 6.2017 LearningRate 0.0006 Epoch: 11 Global Step: 246170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:06,792-Speed 6309.95 samples/sec Loss 6.1206 LearningRate 0.0006 Epoch: 11 Global Step: 246180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:10,040-Speed 6306.69 samples/sec Loss 6.2416 LearningRate 0.0006 Epoch: 11 Global Step: 246190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:07:13,287-Speed 6309.79 samples/sec Loss 6.1814 LearningRate 0.0006 Epoch: 11 Global Step: 246200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:16,536-Speed 6303.89 samples/sec Loss 6.2428 LearningRate 0.0006 Epoch: 11 Global Step: 246210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:19,784-Speed 6307.64 samples/sec Loss 6.1486 LearningRate 0.0006 Epoch: 11 Global Step: 246220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:23,030-Speed 6311.23 samples/sec Loss 6.1477 LearningRate 0.0006 Epoch: 11 Global Step: 246230 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:07:26,279-Speed 6303.85 samples/sec Loss 6.1985 LearningRate 0.0006 Epoch: 11 Global Step: 246240 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:07:29,510-Speed 6342.33 samples/sec Loss 6.1353 LearningRate 0.0006 Epoch: 11 Global Step: 246250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:32,757-Speed 6308.62 samples/sec Loss 6.1928 LearningRate 0.0006 Epoch: 11 Global Step: 246260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:36,017-Speed 6281.93 samples/sec Loss 6.1755 LearningRate 0.0006 Epoch: 11 Global Step: 246270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:39,265-Speed 6308.13 samples/sec Loss 6.2097 LearningRate 0.0006 Epoch: 11 Global Step: 246280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:42,514-Speed 6304.75 samples/sec Loss 6.1562 LearningRate 0.0006 Epoch: 11 Global Step: 246290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:45,755-Speed 6319.78 samples/sec Loss 6.1259 LearningRate 0.0006 Epoch: 11 Global Step: 246300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:49,003-Speed 6308.54 samples/sec Loss 6.1880 LearningRate 0.0006 Epoch: 11 Global Step: 246310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:52,244-Speed 6320.88 
samples/sec Loss 6.1675 LearningRate 0.0006 Epoch: 11 Global Step: 246320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:55,489-Speed 6311.32 samples/sec Loss 6.1551 LearningRate 0.0006 Epoch: 11 Global Step: 246330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:07:58,730-Speed 6321.01 samples/sec Loss 6.1349 LearningRate 0.0006 Epoch: 11 Global Step: 246340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:01,963-Speed 6336.93 samples/sec Loss 6.1422 LearningRate 0.0006 Epoch: 11 Global Step: 246350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:05,208-Speed 6311.77 samples/sec Loss 6.1701 LearningRate 0.0006 Epoch: 11 Global Step: 246360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:08,452-Speed 6314.33 samples/sec Loss 6.2421 LearningRate 0.0006 Epoch: 11 Global Step: 246370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:11,699-Speed 6309.02 samples/sec Loss 6.1683 LearningRate 0.0006 Epoch: 11 Global Step: 246380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:14,947-Speed 6307.14 samples/sec Loss 6.0835 LearningRate 0.0006 Epoch: 11 Global Step: 246390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:18,195-Speed 6307.55 samples/sec Loss 6.1784 LearningRate 0.0006 Epoch: 11 Global Step: 246400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:21,439-Speed 6313.25 samples/sec Loss 6.2252 LearningRate 0.0006 Epoch: 11 Global Step: 246410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:24,688-Speed 6305.97 samples/sec Loss 6.2000 LearningRate 0.0006 Epoch: 11 Global Step: 246420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:27,932-Speed 6314.92 samples/sec Loss 6.2324 LearningRate 0.0006 Epoch: 11 Global Step: 246430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:31,179-Speed 6307.87 samples/sec Loss 6.1487 
LearningRate 0.0006 Epoch: 11 Global Step: 246440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:34,426-Speed 6308.93 samples/sec Loss 6.1212 LearningRate 0.0006 Epoch: 11 Global Step: 246450 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:08:37,657-Speed 6340.76 samples/sec Loss 6.1879 LearningRate 0.0006 Epoch: 11 Global Step: 246460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:40,907-Speed 6302.50 samples/sec Loss 6.2183 LearningRate 0.0006 Epoch: 11 Global Step: 246470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:44,158-Speed 6301.44 samples/sec Loss 6.1615 LearningRate 0.0006 Epoch: 11 Global Step: 246480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:47,414-Speed 6289.70 samples/sec Loss 6.1590 LearningRate 0.0006 Epoch: 11 Global Step: 246490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:50,664-Speed 6304.50 samples/sec Loss 6.2435 LearningRate 0.0006 Epoch: 11 Global Step: 246500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:53,909-Speed 6313.00 samples/sec Loss 6.1125 LearningRate 0.0006 Epoch: 11 Global Step: 246510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:08:57,161-Speed 6298.90 samples/sec Loss 6.1542 LearningRate 0.0006 Epoch: 11 Global Step: 246520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:00,404-Speed 6316.98 samples/sec Loss 6.1523 LearningRate 0.0006 Epoch: 11 Global Step: 246530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:03,650-Speed 6310.41 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 11 Global Step: 246540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:06,898-Speed 6306.37 samples/sec Loss 6.1269 LearningRate 0.0006 Epoch: 11 Global Step: 246550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:10,151-Speed 6298.51 samples/sec Loss 6.1998 LearningRate 0.0006 Epoch: 11 
Global Step: 246560 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:09:13,383-Speed 6338.52 samples/sec Loss 6.1531 LearningRate 0.0006 Epoch: 11 Global Step: 246570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:16,627-Speed 6312.83 samples/sec Loss 6.1220 LearningRate 0.0006 Epoch: 11 Global Step: 246580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:19,874-Speed 6308.89 samples/sec Loss 6.2464 LearningRate 0.0006 Epoch: 11 Global Step: 246590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:23,126-Speed 6300.66 samples/sec Loss 6.1803 LearningRate 0.0006 Epoch: 11 Global Step: 246600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:26,368-Speed 6317.59 samples/sec Loss 6.1649 LearningRate 0.0006 Epoch: 11 Global Step: 246610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:29,617-Speed 6305.70 samples/sec Loss 6.1074 LearningRate 0.0006 Epoch: 11 Global Step: 246620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:32,860-Speed 6317.27 samples/sec Loss 6.1325 LearningRate 0.0006 Epoch: 11 Global Step: 246630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:36,111-Speed 6300.74 samples/sec Loss 6.1936 LearningRate 0.0006 Epoch: 11 Global Step: 246640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:39,359-Speed 6306.20 samples/sec Loss 6.2225 LearningRate 0.0006 Epoch: 11 Global Step: 246650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:42,605-Speed 6310.85 samples/sec Loss 6.1755 LearningRate 0.0006 Epoch: 11 Global Step: 246660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:45,838-Speed 6336.78 samples/sec Loss 6.1234 LearningRate 0.0006 Epoch: 11 Global Step: 246670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:49,082-Speed 6313.57 samples/sec Loss 6.1900 LearningRate 0.0006 Epoch: 11 Global Step: 246680 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:52,326-Speed 6313.83 samples/sec Loss 6.2211 LearningRate 0.0006 Epoch: 11 Global Step: 246690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:55,572-Speed 6311.16 samples/sec Loss 6.1388 LearningRate 0.0006 Epoch: 11 Global Step: 246700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:09:58,818-Speed 6310.01 samples/sec Loss 6.1669 LearningRate 0.0006 Epoch: 11 Global Step: 246710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:02,068-Speed 6304.04 samples/sec Loss 6.2451 LearningRate 0.0006 Epoch: 11 Global Step: 246720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:05,315-Speed 6310.38 samples/sec Loss 6.1551 LearningRate 0.0006 Epoch: 11 Global Step: 246730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:08,560-Speed 6312.99 samples/sec Loss 6.1737 LearningRate 0.0006 Epoch: 11 Global Step: 246740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:11,808-Speed 6306.98 samples/sec Loss 6.0797 LearningRate 0.0006 Epoch: 11 Global Step: 246750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:15,052-Speed 6314.32 samples/sec Loss 6.1230 LearningRate 0.0006 Epoch: 11 Global Step: 246760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:18,286-Speed 6334.26 samples/sec Loss 6.1833 LearningRate 0.0006 Epoch: 11 Global Step: 246770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:21,530-Speed 6314.90 samples/sec Loss 6.1365 LearningRate 0.0006 Epoch: 11 Global Step: 246780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:24,777-Speed 6308.42 samples/sec Loss 6.2168 LearningRate 0.0006 Epoch: 11 Global Step: 246790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:28,023-Speed 6309.23 samples/sec Loss 6.2068 LearningRate 0.0006 Epoch: 11 Global Step: 246800 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:10:31,270-Speed 6310.34 samples/sec Loss 6.1534 LearningRate 0.0006 Epoch: 11 Global Step: 246810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:34,518-Speed 6306.52 samples/sec Loss 6.1326 LearningRate 0.0006 Epoch: 11 Global Step: 246820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:37,773-Speed 6292.38 samples/sec Loss 6.1561 LearningRate 0.0006 Epoch: 11 Global Step: 246830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:41,023-Speed 6303.03 samples/sec Loss 6.1343 LearningRate 0.0006 Epoch: 11 Global Step: 246840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:44,269-Speed 6311.44 samples/sec Loss 6.2516 LearningRate 0.0006 Epoch: 11 Global Step: 246850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:47,516-Speed 6307.86 samples/sec Loss 6.2033 LearningRate 0.0006 Epoch: 11 Global Step: 246860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:50,762-Speed 6312.02 samples/sec Loss 6.1146 LearningRate 0.0006 Epoch: 11 Global Step: 246870 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:10:53,996-Speed 6334.10 samples/sec Loss 6.1491 LearningRate 0.0006 Epoch: 11 Global Step: 246880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:10:57,247-Speed 6301.07 samples/sec Loss 6.1473 LearningRate 0.0006 Epoch: 11 Global Step: 246890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:00,490-Speed 6315.10 samples/sec Loss 6.1684 LearningRate 0.0006 Epoch: 11 Global Step: 246900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:03,738-Speed 6307.85 samples/sec Loss 6.2533 LearningRate 0.0006 Epoch: 11 Global Step: 246910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:06,986-Speed 6307.28 samples/sec Loss 6.1869 LearningRate 0.0006 Epoch: 11 Global Step: 246920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:11:10,232-Speed 6310.07 samples/sec Loss 6.1808 LearningRate 0.0006 Epoch: 11 Global Step: 246930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:13,477-Speed 6313.29 samples/sec Loss 6.1156 LearningRate 0.0006 Epoch: 11 Global Step: 246940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:16,722-Speed 6313.12 samples/sec Loss 6.2286 LearningRate 0.0006 Epoch: 11 Global Step: 246950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:19,969-Speed 6309.04 samples/sec Loss 6.1181 LearningRate 0.0006 Epoch: 11 Global Step: 246960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:23,211-Speed 6318.44 samples/sec Loss 6.1455 LearningRate 0.0006 Epoch: 11 Global Step: 246970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:26,441-Speed 6342.18 samples/sec Loss 6.2483 LearningRate 0.0006 Epoch: 11 Global Step: 246980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:29,685-Speed 6313.39 samples/sec Loss 6.1376 LearningRate 0.0006 Epoch: 11 Global Step: 246990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:32,935-Speed 6303.30 samples/sec Loss 6.2259 LearningRate 0.0006 Epoch: 11 Global Step: 247000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:36,180-Speed 6312.89 samples/sec Loss 6.1151 LearningRate 0.0006 Epoch: 11 Global Step: 247010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:39,424-Speed 6314.82 samples/sec Loss 6.1148 LearningRate 0.0006 Epoch: 11 Global Step: 247020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:42,675-Speed 6301.75 samples/sec Loss 6.1221 LearningRate 0.0006 Epoch: 11 Global Step: 247030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:45,923-Speed 6306.28 samples/sec Loss 6.1429 LearningRate 0.0006 Epoch: 11 Global Step: 247040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:49,251-Speed 6155.46 
samples/sec Loss 6.2186 LearningRate 0.0006 Epoch: 11 Global Step: 247050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:52,493-Speed 6318.60 samples/sec Loss 6.1051 LearningRate 0.0006 Epoch: 11 Global Step: 247060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:55,738-Speed 6311.71 samples/sec Loss 6.1500 LearningRate 0.0006 Epoch: 11 Global Step: 247070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:11:58,988-Speed 6302.54 samples/sec Loss 6.1127 LearningRate 0.0006 Epoch: 11 Global Step: 247080 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:12:02,222-Speed 6335.31 samples/sec Loss 6.1965 LearningRate 0.0006 Epoch: 11 Global Step: 247090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:05,471-Speed 6305.40 samples/sec Loss 6.1146 LearningRate 0.0006 Epoch: 11 Global Step: 247100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:08,715-Speed 6313.20 samples/sec Loss 6.2316 LearningRate 0.0006 Epoch: 11 Global Step: 247110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:11,965-Speed 6302.70 samples/sec Loss 6.2447 LearningRate 0.0006 Epoch: 11 Global Step: 247120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:15,208-Speed 6316.37 samples/sec Loss 6.2308 LearningRate 0.0006 Epoch: 11 Global Step: 247130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:18,455-Speed 6310.02 samples/sec Loss 6.1995 LearningRate 0.0006 Epoch: 11 Global Step: 247140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:21,702-Speed 6309.76 samples/sec Loss 6.1334 LearningRate 0.0006 Epoch: 11 Global Step: 247150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:24,946-Speed 6314.69 samples/sec Loss 6.2172 LearningRate 0.0006 Epoch: 11 Global Step: 247160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:28,192-Speed 6310.42 samples/sec Loss 6.2067 
LearningRate 0.0006 Epoch: 11 Global Step: 247170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:31,439-Speed 6309.48 samples/sec Loss 6.1745 LearningRate 0.0006 Epoch: 11 Global Step: 247180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:34,686-Speed 6309.36 samples/sec Loss 6.1418 LearningRate 0.0006 Epoch: 11 Global Step: 247190 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:12:37,919-Speed 6335.31 samples/sec Loss 6.1543 LearningRate 0.0006 Epoch: 11 Global Step: 247200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:41,161-Speed 6317.67 samples/sec Loss 6.0686 LearningRate 0.0006 Epoch: 11 Global Step: 247210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:44,411-Speed 6303.24 samples/sec Loss 6.1055 LearningRate 0.0006 Epoch: 11 Global Step: 247220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:47,664-Speed 6297.57 samples/sec Loss 6.1744 LearningRate 0.0006 Epoch: 11 Global Step: 247230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:50,912-Speed 6307.08 samples/sec Loss 6.2486 LearningRate 0.0006 Epoch: 11 Global Step: 247240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:54,161-Speed 6304.44 samples/sec Loss 6.2096 LearningRate 0.0006 Epoch: 11 Global Step: 247250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:12:57,404-Speed 6315.80 samples/sec Loss 6.1697 LearningRate 0.0006 Epoch: 11 Global Step: 247260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:00,653-Speed 6305.43 samples/sec Loss 6.2385 LearningRate 0.0006 Epoch: 11 Global Step: 247270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:03,902-Speed 6305.73 samples/sec Loss 6.2013 LearningRate 0.0006 Epoch: 11 Global Step: 247280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:07,151-Speed 6304.69 samples/sec Loss 6.1665 LearningRate 0.0006 Epoch: 11 
Global Step: 247290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:10,405-Speed 6295.28 samples/sec Loss 6.1929 LearningRate 0.0006 Epoch: 11 Global Step: 247300 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:13:13,643-Speed 6326.65 samples/sec Loss 6.1501 LearningRate 0.0006 Epoch: 11 Global Step: 247310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:16,891-Speed 6305.15 samples/sec Loss 6.1788 LearningRate 0.0006 Epoch: 11 Global Step: 247320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:20,135-Speed 6314.84 samples/sec Loss 6.1726 LearningRate 0.0006 Epoch: 11 Global Step: 247330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:23,385-Speed 6304.18 samples/sec Loss 6.2167 LearningRate 0.0006 Epoch: 11 Global Step: 247340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:26,627-Speed 6317.15 samples/sec Loss 6.1806 LearningRate 0.0006 Epoch: 11 Global Step: 247350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:29,870-Speed 6317.13 samples/sec Loss 6.2241 LearningRate 0.0006 Epoch: 11 Global Step: 247360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:33,117-Speed 6310.19 samples/sec Loss 6.1684 LearningRate 0.0006 Epoch: 11 Global Step: 247370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:36,362-Speed 6312.00 samples/sec Loss 6.0938 LearningRate 0.0006 Epoch: 11 Global Step: 247380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:39,609-Speed 6310.36 samples/sec Loss 6.1305 LearningRate 0.0006 Epoch: 11 Global Step: 247390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:42,854-Speed 6311.22 samples/sec Loss 6.1717 LearningRate 0.0006 Epoch: 11 Global Step: 247400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:46,085-Speed 6339.86 samples/sec Loss 6.2299 LearningRate 0.0006 Epoch: 11 Global Step: 247410 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:49,333-Speed 6307.09 samples/sec Loss 6.1927 LearningRate 0.0006 Epoch: 11 Global Step: 247420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:52,587-Speed 6296.18 samples/sec Loss 6.1468 LearningRate 0.0006 Epoch: 11 Global Step: 247430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:55,832-Speed 6313.07 samples/sec Loss 6.2036 LearningRate 0.0006 Epoch: 11 Global Step: 247440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:13:59,086-Speed 6293.87 samples/sec Loss 6.1137 LearningRate 0.0006 Epoch: 11 Global Step: 247450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:02,333-Speed 6309.60 samples/sec Loss 6.1093 LearningRate 0.0006 Epoch: 11 Global Step: 247460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:05,585-Speed 6298.90 samples/sec Loss 6.2259 LearningRate 0.0006 Epoch: 11 Global Step: 247470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:08,832-Speed 6310.02 samples/sec Loss 6.2406 LearningRate 0.0006 Epoch: 11 Global Step: 247480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:12,077-Speed 6311.34 samples/sec Loss 6.2278 LearningRate 0.0006 Epoch: 11 Global Step: 247490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:15,320-Speed 6315.94 samples/sec Loss 6.1498 LearningRate 0.0006 Epoch: 11 Global Step: 247500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:18,571-Speed 6302.10 samples/sec Loss 6.2948 LearningRate 0.0006 Epoch: 11 Global Step: 247510 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:14:21,805-Speed 6334.61 samples/sec Loss 6.2431 LearningRate 0.0006 Epoch: 11 Global Step: 247520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:25,051-Speed 6309.59 samples/sec Loss 6.1794 LearningRate 0.0006 Epoch: 11 Global Step: 247530 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:14:28,301-Speed 6303.64 samples/sec Loss 6.2359 LearningRate 0.0006 Epoch: 11 Global Step: 247540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:31,585-Speed 6237.84 samples/sec Loss 6.1791 LearningRate 0.0006 Epoch: 11 Global Step: 247550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:34,830-Speed 6312.29 samples/sec Loss 6.1843 LearningRate 0.0006 Epoch: 11 Global Step: 247560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:38,075-Speed 6311.86 samples/sec Loss 6.1086 LearningRate 0.0006 Epoch: 11 Global Step: 247570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:41,402-Speed 6157.12 samples/sec Loss 6.1657 LearningRate 0.0006 Epoch: 11 Global Step: 247580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:44,670-Speed 6269.49 samples/sec Loss 6.1288 LearningRate 0.0006 Epoch: 11 Global Step: 247590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:47,915-Speed 6312.65 samples/sec Loss 6.1611 LearningRate 0.0006 Epoch: 11 Global Step: 247600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:51,161-Speed 6311.30 samples/sec Loss 6.1054 LearningRate 0.0006 Epoch: 11 Global Step: 247610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:54,392-Speed 6340.32 samples/sec Loss 6.1881 LearningRate 0.0006 Epoch: 11 Global Step: 247620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:14:57,642-Speed 6302.24 samples/sec Loss 6.1198 LearningRate 0.0006 Epoch: 11 Global Step: 247630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:00,888-Speed 6312.01 samples/sec Loss 6.1612 LearningRate 0.0006 Epoch: 11 Global Step: 247640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:04,133-Speed 6311.85 samples/sec Loss 6.0532 LearningRate 0.0006 Epoch: 11 Global Step: 247650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:15:07,381-Speed 6306.54 samples/sec Loss 6.1832 LearningRate 0.0006 Epoch: 11 Global Step: 247660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:10,624-Speed 6318.16 samples/sec Loss 6.1095 LearningRate 0.0006 Epoch: 11 Global Step: 247670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:13,869-Speed 6311.74 samples/sec Loss 6.2384 LearningRate 0.0006 Epoch: 11 Global Step: 247680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:17,118-Speed 6305.44 samples/sec Loss 6.1024 LearningRate 0.0006 Epoch: 11 Global Step: 247690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:20,369-Speed 6300.38 samples/sec Loss 6.2095 LearningRate 0.0006 Epoch: 11 Global Step: 247700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:23,614-Speed 6312.71 samples/sec Loss 6.1316 LearningRate 0.0006 Epoch: 11 Global Step: 247710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:26,862-Speed 6307.06 samples/sec Loss 6.1509 LearningRate 0.0006 Epoch: 11 Global Step: 247720 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:15:30,108-Speed 6310.61 samples/sec Loss 6.0926 LearningRate 0.0006 Epoch: 11 Global Step: 247730 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:15:33,341-Speed 6335.86 samples/sec Loss 6.1415 LearningRate 0.0006 Epoch: 11 Global Step: 247740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:36,588-Speed 6309.20 samples/sec Loss 6.1455 LearningRate 0.0006 Epoch: 11 Global Step: 247750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:39,833-Speed 6311.70 samples/sec Loss 6.2341 LearningRate 0.0006 Epoch: 11 Global Step: 247760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:43,086-Speed 6298.31 samples/sec Loss 6.0727 LearningRate 0.0006 Epoch: 11 Global Step: 247770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:46,331-Speed 6311.44 
samples/sec Loss 6.1171 LearningRate 0.0006 Epoch: 11 Global Step: 247780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:49,582-Speed 6300.76 samples/sec Loss 6.0753 LearningRate 0.0006 Epoch: 11 Global Step: 247790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:52,827-Speed 6315.36 samples/sec Loss 6.2261 LearningRate 0.0006 Epoch: 11 Global Step: 247800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:56,073-Speed 6309.79 samples/sec Loss 6.1392 LearningRate 0.0006 Epoch: 11 Global Step: 247810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:15:59,318-Speed 6313.62 samples/sec Loss 6.2186 LearningRate 0.0006 Epoch: 11 Global Step: 247820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:02,567-Speed 6304.64 samples/sec Loss 6.1755 LearningRate 0.0006 Epoch: 11 Global Step: 247830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:05,797-Speed 6342.52 samples/sec Loss 6.1960 LearningRate 0.0006 Epoch: 11 Global Step: 247840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:09,050-Speed 6297.74 samples/sec Loss 6.1647 LearningRate 0.0006 Epoch: 11 Global Step: 247850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:12,293-Speed 6315.13 samples/sec Loss 6.1869 LearningRate 0.0006 Epoch: 11 Global Step: 247860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:15,544-Speed 6301.87 samples/sec Loss 6.1859 LearningRate 0.0006 Epoch: 11 Global Step: 247870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:18,788-Speed 6314.48 samples/sec Loss 6.3179 LearningRate 0.0006 Epoch: 11 Global Step: 247880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:22,053-Speed 6274.55 samples/sec Loss 6.1723 LearningRate 0.0006 Epoch: 11 Global Step: 247890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:25,301-Speed 6306.51 samples/sec Loss 6.1585 
LearningRate 0.0006 Epoch: 11 Global Step: 247900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:28,551-Speed 6302.80 samples/sec Loss 6.2111 LearningRate 0.0006 Epoch: 11 Global Step: 247910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:31,797-Speed 6309.67 samples/sec Loss 6.1249 LearningRate 0.0006 Epoch: 11 Global Step: 247920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:35,047-Speed 6304.19 samples/sec Loss 6.0989 LearningRate 0.0006 Epoch: 11 Global Step: 247930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:38,296-Speed 6304.61 samples/sec Loss 6.1806 LearningRate 0.0006 Epoch: 11 Global Step: 247940 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:16:41,529-Speed 6335.62 samples/sec Loss 6.1638 LearningRate 0.0006 Epoch: 11 Global Step: 247950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:44,774-Speed 6313.78 samples/sec Loss 6.1246 LearningRate 0.0006 Epoch: 11 Global Step: 247960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:48,021-Speed 6309.27 samples/sec Loss 6.2087 LearningRate 0.0006 Epoch: 11 Global Step: 247970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:51,267-Speed 6309.54 samples/sec Loss 6.1817 LearningRate 0.0006 Epoch: 11 Global Step: 247980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:54,513-Speed 6312.09 samples/sec Loss 6.1151 LearningRate 0.0006 Epoch: 11 Global Step: 247990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:16:57,759-Speed 6310.27 samples/sec Loss 6.1035 LearningRate 0.0006 Epoch: 11 Global Step: 248000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:01,009-Speed 6302.85 samples/sec Loss 6.1659 LearningRate 0.0006 Epoch: 11 Global Step: 248010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:04,261-Speed 6299.86 samples/sec Loss 6.0477 LearningRate 0.0006 Epoch: 11 
Global Step: 248020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:07,511-Speed 6303.49 samples/sec Loss 6.2083 LearningRate 0.0006 Epoch: 11 Global Step: 248030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:10,758-Speed 6309.26 samples/sec Loss 6.0997 LearningRate 0.0006 Epoch: 11 Global Step: 248040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:14,082-Speed 6161.58 samples/sec Loss 6.1344 LearningRate 0.0006 Epoch: 11 Global Step: 248050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:17,438-Speed 6103.42 samples/sec Loss 6.1390 LearningRate 0.0006 Epoch: 11 Global Step: 248060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:20,687-Speed 6304.68 samples/sec Loss 6.1603 LearningRate 0.0006 Epoch: 11 Global Step: 248070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:23,980-Speed 6222.27 samples/sec Loss 6.1246 LearningRate 0.0006 Epoch: 11 Global Step: 248080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:27,226-Speed 6310.54 samples/sec Loss 6.1068 LearningRate 0.0006 Epoch: 11 Global Step: 248090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:30,470-Speed 6313.42 samples/sec Loss 6.1788 LearningRate 0.0006 Epoch: 11 Global Step: 248100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:33,718-Speed 6306.98 samples/sec Loss 6.1933 LearningRate 0.0006 Epoch: 11 Global Step: 248110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:37,023-Speed 6198.49 samples/sec Loss 6.0594 LearningRate 0.0006 Epoch: 11 Global Step: 248120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:40,278-Speed 6293.79 samples/sec Loss 6.1712 LearningRate 0.0006 Epoch: 11 Global Step: 248130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:43,525-Speed 6308.79 samples/sec Loss 6.1839 LearningRate 0.0006 Epoch: 11 Global Step: 248140 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:46,772-Speed 6307.30 samples/sec Loss 6.1543 LearningRate 0.0006 Epoch: 11 Global Step: 248150 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:17:50,011-Speed 6325.49 samples/sec Loss 6.1957 LearningRate 0.0006 Epoch: 11 Global Step: 248160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:53,263-Speed 6299.23 samples/sec Loss 6.1280 LearningRate 0.0006 Epoch: 11 Global Step: 248170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:56,508-Speed 6311.56 samples/sec Loss 6.1769 LearningRate 0.0006 Epoch: 11 Global Step: 248180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:17:59,760-Speed 6299.37 samples/sec Loss 6.2294 LearningRate 0.0006 Epoch: 11 Global Step: 248190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:03,007-Speed 6309.01 samples/sec Loss 6.1337 LearningRate 0.0006 Epoch: 11 Global Step: 248200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:06,254-Speed 6308.71 samples/sec Loss 6.1029 LearningRate 0.0006 Epoch: 11 Global Step: 248210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:09,500-Speed 6312.02 samples/sec Loss 6.0664 LearningRate 0.0006 Epoch: 11 Global Step: 248220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:12,748-Speed 6307.17 samples/sec Loss 6.1765 LearningRate 0.0006 Epoch: 11 Global Step: 248230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:15,994-Speed 6309.80 samples/sec Loss 6.0881 LearningRate 0.0006 Epoch: 11 Global Step: 248240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:19,240-Speed 6310.62 samples/sec Loss 6.2049 LearningRate 0.0006 Epoch: 11 Global Step: 248250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:22,476-Speed 6330.76 samples/sec Loss 6.1210 LearningRate 0.0006 Epoch: 11 Global Step: 248260 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:18:25,726-Speed 6303.69 samples/sec Loss 6.1343 LearningRate 0.0006 Epoch: 11 Global Step: 248270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:28,973-Speed 6308.63 samples/sec Loss 6.1634 LearningRate 0.0006 Epoch: 11 Global Step: 248280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:32,217-Speed 6315.39 samples/sec Loss 6.0931 LearningRate 0.0006 Epoch: 11 Global Step: 248290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:35,461-Speed 6313.31 samples/sec Loss 6.1464 LearningRate 0.0006 Epoch: 11 Global Step: 248300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:38,710-Speed 6305.34 samples/sec Loss 6.1071 LearningRate 0.0006 Epoch: 11 Global Step: 248310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:41,981-Speed 6262.46 samples/sec Loss 6.2437 LearningRate 0.0006 Epoch: 11 Global Step: 248320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:45,228-Speed 6308.12 samples/sec Loss 6.1972 LearningRate 0.0006 Epoch: 11 Global Step: 248330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:48,471-Speed 6316.04 samples/sec Loss 6.1865 LearningRate 0.0006 Epoch: 11 Global Step: 248340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:51,721-Speed 6303.92 samples/sec Loss 6.2527 LearningRate 0.0006 Epoch: 11 Global Step: 248350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:54,953-Speed 6338.20 samples/sec Loss 6.1278 LearningRate 0.0006 Epoch: 11 Global Step: 248360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:18:58,196-Speed 6315.35 samples/sec Loss 6.2348 LearningRate 0.0006 Epoch: 11 Global Step: 248370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:01,441-Speed 6313.59 samples/sec Loss 6.1290 LearningRate 0.0006 Epoch: 11 Global Step: 248380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:19:04,689-Speed 6307.39 samples/sec Loss 6.1271 LearningRate 0.0006 Epoch: 11 Global Step: 248390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:07,935-Speed 6310.61 samples/sec Loss 6.1763 LearningRate 0.0006 Epoch: 11 Global Step: 248400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:11,182-Speed 6308.63 samples/sec Loss 6.1032 LearningRate 0.0006 Epoch: 11 Global Step: 248410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:14,432-Speed 6301.79 samples/sec Loss 6.1172 LearningRate 0.0006 Epoch: 11 Global Step: 248420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:17,680-Speed 6308.46 samples/sec Loss 6.1142 LearningRate 0.0006 Epoch: 11 Global Step: 248430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:20,974-Speed 6217.70 samples/sec Loss 6.1463 LearningRate 0.0006 Epoch: 11 Global Step: 248440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:24,227-Speed 6301.77 samples/sec Loss 6.2391 LearningRate 0.0006 Epoch: 11 Global Step: 248450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:27,471-Speed 6314.35 samples/sec Loss 6.1908 LearningRate 0.0006 Epoch: 11 Global Step: 248460 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:19:30,705-Speed 6333.43 samples/sec Loss 6.1877 LearningRate 0.0006 Epoch: 11 Global Step: 248470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:33,997-Speed 6223.69 samples/sec Loss 6.2445 LearningRate 0.0006 Epoch: 11 Global Step: 248480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:37,315-Speed 6173.13 samples/sec Loss 6.1592 LearningRate 0.0006 Epoch: 11 Global Step: 248490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:40,593-Speed 6248.73 samples/sec Loss 6.0777 LearningRate 0.0006 Epoch: 11 Global Step: 248500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:43,841-Speed 6308.06 
samples/sec Loss 6.1118 LearningRate 0.0006 Epoch: 11 Global Step: 248510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:47,085-Speed 6314.58 samples/sec Loss 6.1784 LearningRate 0.0006 Epoch: 11 Global Step: 248520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:50,332-Speed 6308.47 samples/sec Loss 6.1028 LearningRate 0.0006 Epoch: 11 Global Step: 248530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:53,573-Speed 6320.42 samples/sec Loss 6.1917 LearningRate 0.0006 Epoch: 11 Global Step: 248540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:19:56,827-Speed 6295.08 samples/sec Loss 6.1609 LearningRate 0.0006 Epoch: 11 Global Step: 248550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:00,073-Speed 6310.19 samples/sec Loss 6.1338 LearningRate 0.0006 Epoch: 11 Global Step: 248560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:03,309-Speed 6329.85 samples/sec Loss 6.2015 LearningRate 0.0006 Epoch: 11 Global Step: 248570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:06,552-Speed 6316.94 samples/sec Loss 6.0921 LearningRate 0.0006 Epoch: 11 Global Step: 248580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:09,804-Speed 6298.24 samples/sec Loss 6.1125 LearningRate 0.0006 Epoch: 11 Global Step: 248590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:13,055-Speed 6302.10 samples/sec Loss 6.1789 LearningRate 0.0006 Epoch: 11 Global Step: 248600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:16,301-Speed 6310.89 samples/sec Loss 6.2337 LearningRate 0.0006 Epoch: 11 Global Step: 248610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:19,546-Speed 6310.99 samples/sec Loss 6.2098 LearningRate 0.0006 Epoch: 11 Global Step: 248620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:22,793-Speed 6308.93 samples/sec Loss 6.1980 
LearningRate 0.0006 Epoch: 11 Global Step: 248630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:26,064-Speed 6263.55 samples/sec Loss 6.1262 LearningRate 0.0006 Epoch: 11 Global Step: 248640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:29,312-Speed 6306.22 samples/sec Loss 6.1293 LearningRate 0.0006 Epoch: 11 Global Step: 248650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:32,560-Speed 6308.54 samples/sec Loss 6.1069 LearningRate 0.0006 Epoch: 11 Global Step: 248660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:35,793-Speed 6335.88 samples/sec Loss 6.1442 LearningRate 0.0006 Epoch: 11 Global Step: 248670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:39,038-Speed 6312.81 samples/sec Loss 6.1214 LearningRate 0.0006 Epoch: 11 Global Step: 248680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:42,341-Speed 6201.74 samples/sec Loss 6.1941 LearningRate 0.0006 Epoch: 11 Global Step: 248690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:45,587-Speed 6310.44 samples/sec Loss 6.2470 LearningRate 0.0006 Epoch: 11 Global Step: 248700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:48,830-Speed 6316.38 samples/sec Loss 6.2563 LearningRate 0.0006 Epoch: 11 Global Step: 248710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:52,076-Speed 6311.34 samples/sec Loss 6.2733 LearningRate 0.0006 Epoch: 11 Global Step: 248720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:55,320-Speed 6314.88 samples/sec Loss 6.1467 LearningRate 0.0006 Epoch: 11 Global Step: 248730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:20:58,567-Speed 6307.85 samples/sec Loss 6.0894 LearningRate 0.0006 Epoch: 11 Global Step: 248740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:01,814-Speed 6308.54 samples/sec Loss 6.2317 LearningRate 0.0006 Epoch: 11 
Global Step: 248750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:05,060-Speed 6312.35 samples/sec Loss 6.1646 LearningRate 0.0006 Epoch: 11 Global Step: 248760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:08,306-Speed 6310.28 samples/sec Loss 6.1880 LearningRate 0.0006 Epoch: 11 Global Step: 248770 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:21:11,535-Speed 6345.83 samples/sec Loss 6.1176 LearningRate 0.0006 Epoch: 11 Global Step: 248780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:14,781-Speed 6309.53 samples/sec Loss 6.1286 LearningRate 0.0006 Epoch: 11 Global Step: 248790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:18,027-Speed 6310.41 samples/sec Loss 6.1806 LearningRate 0.0006 Epoch: 11 Global Step: 248800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:21,277-Speed 6302.70 samples/sec Loss 6.2383 LearningRate 0.0006 Epoch: 11 Global Step: 248810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:24,526-Speed 6304.75 samples/sec Loss 6.1129 LearningRate 0.0006 Epoch: 11 Global Step: 248820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:27,768-Speed 6318.84 samples/sec Loss 6.2096 LearningRate 0.0006 Epoch: 11 Global Step: 248830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:31,013-Speed 6313.52 samples/sec Loss 6.1260 LearningRate 0.0006 Epoch: 11 Global Step: 248840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:34,257-Speed 6314.70 samples/sec Loss 6.2181 LearningRate 0.0006 Epoch: 11 Global Step: 248850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:37,507-Speed 6302.66 samples/sec Loss 6.2036 LearningRate 0.0006 Epoch: 11 Global Step: 248860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:21:40,739-Speed 6339.16 samples/sec Loss 6.1888 LearningRate 0.0006 Epoch: 11 Global Step: 248870 Fp16 Grad 
Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:39,714-Speed 347.27 samples/sec Loss 6.1399 LearningRate 0.0006 Epoch: 12 Global Step: 248880 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:42,956-Speed 6318.88 samples/sec Loss 6.1493 LearningRate 0.0006 Epoch: 12 Global Step: 248890 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:46,192-Speed 6329.96 samples/sec Loss 6.1384 LearningRate 0.0006 Epoch: 12 Global Step: 248900 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:49,437-Speed 6312.29 samples/sec Loss 6.1674 LearningRate 0.0006 Epoch: 12 Global Step: 248910 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:52,683-Speed 6310.25 samples/sec Loss 6.1302 LearningRate 0.0006 Epoch: 12 Global Step: 248920 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:55,920-Speed 6328.73 samples/sec Loss 6.1073 LearningRate 0.0006 Epoch: 12 Global Step: 248930 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:22:59,163-Speed 6316.73 samples/sec Loss 6.1910 LearningRate 0.0006 Epoch: 12 Global Step: 248940 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:23:02,400-Speed 6328.35 samples/sec Loss 6.1604 LearningRate 0.0006 Epoch: 12 Global Step: 248950 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:23:05,639-Speed 6323.53 samples/sec Loss 6.1473 LearningRate 0.0006 Epoch: 12 Global Step: 248960 Fp16 Grad Scale: 16384 Required: 53 hours Training: 2022-04-01 14:23:08,877-Speed 6326.81 samples/sec Loss 6.1672 LearningRate 0.0006 Epoch: 12 Global Step: 248970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:12,122-Speed 6313.92 samples/sec Loss 6.1311 LearningRate 0.0006 Epoch: 12 Global Step: 248980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:15,359-Speed 6327.39 samples/sec Loss 6.0295 LearningRate 0.0006 Epoch: 12 Global Step: 248990 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:23:18,603-Speed 6314.56 samples/sec Loss 6.2114 LearningRate 0.0006 Epoch: 12 Global Step: 249000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:21,843-Speed 6321.66 samples/sec Loss 6.2805 LearningRate 0.0006 Epoch: 12 Global Step: 249010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:25,084-Speed 6320.79 samples/sec Loss 6.2170 LearningRate 0.0006 Epoch: 12 Global Step: 249020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:28,327-Speed 6315.95 samples/sec Loss 6.1692 LearningRate 0.0006 Epoch: 12 Global Step: 249030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:31,579-Speed 6300.71 samples/sec Loss 6.1311 LearningRate 0.0006 Epoch: 12 Global Step: 249040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:34,819-Speed 6321.94 samples/sec Loss 6.1843 LearningRate 0.0006 Epoch: 12 Global Step: 249050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:38,061-Speed 6319.46 samples/sec Loss 6.1979 LearningRate 0.0006 Epoch: 12 Global Step: 249060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:41,293-Speed 6338.74 samples/sec Loss 6.1765 LearningRate 0.0006 Epoch: 12 Global Step: 249070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:44,535-Speed 6317.20 samples/sec Loss 6.1359 LearningRate 0.0006 Epoch: 12 Global Step: 249080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:47,779-Speed 6314.57 samples/sec Loss 6.0944 LearningRate 0.0006 Epoch: 12 Global Step: 249090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:51,024-Speed 6312.71 samples/sec Loss 6.1536 LearningRate 0.0006 Epoch: 12 Global Step: 249100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:23:54,266-Speed 6318.53 samples/sec Loss 6.1117 LearningRate 0.0006 Epoch: 12 Global Step: 249110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:23:57,509-Speed 6317.00 samples/sec Loss 6.1557 LearningRate 0.0006 Epoch: 12 Global Step: 249120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:00,752-Speed 6316.19 samples/sec Loss 6.1083 LearningRate 0.0006 Epoch: 12 Global Step: 249130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:03,996-Speed 6314.36 samples/sec Loss 6.1442 LearningRate 0.0006 Epoch: 12 Global Step: 249140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:07,241-Speed 6313.38 samples/sec Loss 6.1686 LearningRate 0.0006 Epoch: 12 Global Step: 249150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:10,487-Speed 6310.18 samples/sec Loss 6.2183 LearningRate 0.0006 Epoch: 12 Global Step: 249160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:13,726-Speed 6323.93 samples/sec Loss 6.1811 LearningRate 0.0006 Epoch: 12 Global Step: 249170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:16,972-Speed 6312.20 samples/sec Loss 6.0675 LearningRate 0.0006 Epoch: 12 Global Step: 249180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:20,216-Speed 6313.42 samples/sec Loss 6.1320 LearningRate 0.0006 Epoch: 12 Global Step: 249190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:23,473-Speed 6289.43 samples/sec Loss 6.1479 LearningRate 0.0006 Epoch: 12 Global Step: 249200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:26,717-Speed 6315.27 samples/sec Loss 6.1671 LearningRate 0.0006 Epoch: 12 Global Step: 249210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:29,963-Speed 6309.75 samples/sec Loss 6.0628 LearningRate 0.0006 Epoch: 12 Global Step: 249220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:33,206-Speed 6318.19 samples/sec Loss 6.1451 LearningRate 0.0006 Epoch: 12 Global Step: 249230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:36,454-Speed 6306.14 
samples/sec Loss 6.1352 LearningRate 0.0006 Epoch: 12 Global Step: 249240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:39,699-Speed 6312.71 samples/sec Loss 6.1136 LearningRate 0.0006 Epoch: 12 Global Step: 249250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:42,942-Speed 6316.54 samples/sec Loss 6.1649 LearningRate 0.0006 Epoch: 12 Global Step: 249260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:46,172-Speed 6342.23 samples/sec Loss 6.1331 LearningRate 0.0006 Epoch: 12 Global Step: 249270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:49,421-Speed 6305.37 samples/sec Loss 6.0957 LearningRate 0.0006 Epoch: 12 Global Step: 249280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:52,668-Speed 6309.01 samples/sec Loss 6.1730 LearningRate 0.0006 Epoch: 12 Global Step: 249290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:55,909-Speed 6319.32 samples/sec Loss 6.0694 LearningRate 0.0006 Epoch: 12 Global Step: 249300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:24:59,151-Speed 6319.70 samples/sec Loss 6.1439 LearningRate 0.0006 Epoch: 12 Global Step: 249310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:02,396-Speed 6311.28 samples/sec Loss 6.1552 LearningRate 0.0006 Epoch: 12 Global Step: 249320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:05,642-Speed 6311.99 samples/sec Loss 6.1678 LearningRate 0.0006 Epoch: 12 Global Step: 249330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:08,885-Speed 6315.27 samples/sec Loss 6.1311 LearningRate 0.0006 Epoch: 12 Global Step: 249340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:12,134-Speed 6306.86 samples/sec Loss 6.0550 LearningRate 0.0006 Epoch: 12 Global Step: 249350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:15,379-Speed 6312.24 samples/sec Loss 6.2135 
LearningRate 0.0006 Epoch: 12 Global Step: 249360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:18,621-Speed 6317.28 samples/sec Loss 6.0605 LearningRate 0.0006 Epoch: 12 Global Step: 249370 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:25:21,851-Speed 6341.77 samples/sec Loss 6.1069 LearningRate 0.0006 Epoch: 12 Global Step: 249380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:25,102-Speed 6302.08 samples/sec Loss 6.0665 LearningRate 0.0006 Epoch: 12 Global Step: 249390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:28,345-Speed 6316.73 samples/sec Loss 6.0665 LearningRate 0.0006 Epoch: 12 Global Step: 249400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:31,589-Speed 6314.30 samples/sec Loss 6.1683 LearningRate 0.0006 Epoch: 12 Global Step: 249410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:34,834-Speed 6312.41 samples/sec Loss 6.1578 LearningRate 0.0006 Epoch: 12 Global Step: 249420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:38,127-Speed 6220.98 samples/sec Loss 6.1547 LearningRate 0.0006 Epoch: 12 Global Step: 249430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:41,385-Speed 6287.74 samples/sec Loss 6.1719 LearningRate 0.0006 Epoch: 12 Global Step: 249440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:44,635-Speed 6303.45 samples/sec Loss 6.1592 LearningRate 0.0006 Epoch: 12 Global Step: 249450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:47,883-Speed 6306.68 samples/sec Loss 6.1893 LearningRate 0.0006 Epoch: 12 Global Step: 249460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:51,128-Speed 6312.13 samples/sec Loss 6.1228 LearningRate 0.0006 Epoch: 12 Global Step: 249470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:54,359-Speed 6339.89 samples/sec Loss 6.1383 LearningRate 0.0006 Epoch: 12 
Global Step: 249480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:25:57,605-Speed 6312.37 samples/sec Loss 6.1121 LearningRate 0.0006 Epoch: 12 Global Step: 249490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:00,846-Speed 6319.69 samples/sec Loss 6.1825 LearningRate 0.0006 Epoch: 12 Global Step: 249500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:04,097-Speed 6301.01 samples/sec Loss 6.2041 LearningRate 0.0006 Epoch: 12 Global Step: 249510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:07,342-Speed 6311.95 samples/sec Loss 6.2089 LearningRate 0.0006 Epoch: 12 Global Step: 249520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:10,586-Speed 6315.70 samples/sec Loss 6.2021 LearningRate 0.0006 Epoch: 12 Global Step: 249530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:13,830-Speed 6313.94 samples/sec Loss 6.2174 LearningRate 0.0006 Epoch: 12 Global Step: 249540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:17,076-Speed 6311.69 samples/sec Loss 6.1475 LearningRate 0.0006 Epoch: 12 Global Step: 249550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:20,319-Speed 6314.83 samples/sec Loss 6.1213 LearningRate 0.0006 Epoch: 12 Global Step: 249560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:23,564-Speed 6313.42 samples/sec Loss 6.1229 LearningRate 0.0006 Epoch: 12 Global Step: 249570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:26,811-Speed 6308.35 samples/sec Loss 6.1168 LearningRate 0.0006 Epoch: 12 Global Step: 249580 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:26:30,044-Speed 6337.10 samples/sec Loss 6.0854 LearningRate 0.0006 Epoch: 12 Global Step: 249590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:33,286-Speed 6317.59 samples/sec Loss 6.1469 LearningRate 0.0006 Epoch: 12 Global Step: 249600 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:36,535-Speed 6304.53 samples/sec Loss 6.1185 LearningRate 0.0006 Epoch: 12 Global Step: 249610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:39,782-Speed 6310.54 samples/sec Loss 6.1533 LearningRate 0.0006 Epoch: 12 Global Step: 249620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:43,029-Speed 6307.07 samples/sec Loss 6.1337 LearningRate 0.0006 Epoch: 12 Global Step: 249630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:46,273-Speed 6315.26 samples/sec Loss 6.2580 LearningRate 0.0006 Epoch: 12 Global Step: 249640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:49,517-Speed 6314.76 samples/sec Loss 6.1846 LearningRate 0.0006 Epoch: 12 Global Step: 249650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:52,767-Speed 6303.09 samples/sec Loss 6.1735 LearningRate 0.0006 Epoch: 12 Global Step: 249660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:56,010-Speed 6316.37 samples/sec Loss 6.1426 LearningRate 0.0006 Epoch: 12 Global Step: 249670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:26:59,255-Speed 6313.22 samples/sec Loss 6.2085 LearningRate 0.0006 Epoch: 12 Global Step: 249680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:02,496-Speed 6320.79 samples/sec Loss 6.1263 LearningRate 0.0006 Epoch: 12 Global Step: 249690 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:27:05,725-Speed 6344.83 samples/sec Loss 6.1375 LearningRate 0.0006 Epoch: 12 Global Step: 249700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:08,970-Speed 6312.23 samples/sec Loss 6.1942 LearningRate 0.0006 Epoch: 12 Global Step: 249710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:12,212-Speed 6318.52 samples/sec Loss 6.1711 LearningRate 0.0006 Epoch: 12 Global Step: 249720 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:27:15,457-Speed 6312.69 samples/sec Loss 6.1572 LearningRate 0.0006 Epoch: 12 Global Step: 249730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:18,698-Speed 6318.89 samples/sec Loss 6.1822 LearningRate 0.0006 Epoch: 12 Global Step: 249740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:21,944-Speed 6311.15 samples/sec Loss 6.1447 LearningRate 0.0006 Epoch: 12 Global Step: 249750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:25,189-Speed 6313.13 samples/sec Loss 6.0710 LearningRate 0.0006 Epoch: 12 Global Step: 249760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:28,435-Speed 6309.85 samples/sec Loss 6.1690 LearningRate 0.0006 Epoch: 12 Global Step: 249770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:31,676-Speed 6321.68 samples/sec Loss 6.1231 LearningRate 0.0006 Epoch: 12 Global Step: 249780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:34,920-Speed 6313.96 samples/sec Loss 6.1518 LearningRate 0.0006 Epoch: 12 Global Step: 249790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:38,152-Speed 6338.69 samples/sec Loss 6.1165 LearningRate 0.0006 Epoch: 12 Global Step: 249800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:41,393-Speed 6319.69 samples/sec Loss 6.1708 LearningRate 0.0006 Epoch: 12 Global Step: 249810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:44,650-Speed 6290.11 samples/sec Loss 6.1246 LearningRate 0.0006 Epoch: 12 Global Step: 249820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:47,950-Speed 6207.43 samples/sec Loss 6.1465 LearningRate 0.0006 Epoch: 12 Global Step: 249830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:51,202-Speed 6298.27 samples/sec Loss 6.1502 LearningRate 0.0006 Epoch: 12 Global Step: 249840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:27:54,450-Speed 6307.66 samples/sec Loss 6.1145 LearningRate 0.0006 Epoch: 12 Global Step: 249850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:27:57,694-Speed 6314.37 samples/sec Loss 6.0749 LearningRate 0.0006 Epoch: 12 Global Step: 249860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:00,940-Speed 6309.15 samples/sec Loss 6.1403 LearningRate 0.0006 Epoch: 12 Global Step: 249870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:04,234-Speed 6219.28 samples/sec Loss 6.1663 LearningRate 0.0006 Epoch: 12 Global Step: 249880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:07,482-Speed 6308.50 samples/sec Loss 6.2037 LearningRate 0.0006 Epoch: 12 Global Step: 249890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:10,710-Speed 6345.51 samples/sec Loss 6.2040 LearningRate 0.0006 Epoch: 12 Global Step: 249900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:13,957-Speed 6308.95 samples/sec Loss 6.1358 LearningRate 0.0006 Epoch: 12 Global Step: 249910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:17,207-Speed 6303.73 samples/sec Loss 6.1417 LearningRate 0.0006 Epoch: 12 Global Step: 249920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:20,454-Speed 6308.21 samples/sec Loss 6.0655 LearningRate 0.0006 Epoch: 12 Global Step: 249930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:23,700-Speed 6311.60 samples/sec Loss 6.0671 LearningRate 0.0006 Epoch: 12 Global Step: 249940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:26,947-Speed 6308.02 samples/sec Loss 6.0921 LearningRate 0.0006 Epoch: 12 Global Step: 249950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:30,293-Speed 6121.51 samples/sec Loss 6.1476 LearningRate 0.0006 Epoch: 12 Global Step: 249960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:33,538-Speed 6312.72 
samples/sec Loss 6.1915 LearningRate 0.0006 Epoch: 12 Global Step: 249970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:36,789-Speed 6301.55 samples/sec Loss 6.1087 LearningRate 0.0006 Epoch: 12 Global Step: 249980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:40,035-Speed 6309.92 samples/sec Loss 6.0695 LearningRate 0.0006 Epoch: 12 Global Step: 249990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:43,279-Speed 6314.88 samples/sec Loss 6.1163 LearningRate 0.0006 Epoch: 12 Global Step: 250000 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:28:46,528-Speed 6306.27 samples/sec Loss 6.1396 LearningRate 0.0006 Epoch: 12 Global Step: 250010 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:28:49,759-Speed 6339.87 samples/sec Loss 6.0609 LearningRate 0.0006 Epoch: 12 Global Step: 250020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:53,005-Speed 6310.12 samples/sec Loss 6.1453 LearningRate 0.0006 Epoch: 12 Global Step: 250030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:56,248-Speed 6316.20 samples/sec Loss 6.1668 LearningRate 0.0006 Epoch: 12 Global Step: 250040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:28:59,492-Speed 6314.48 samples/sec Loss 6.1750 LearningRate 0.0006 Epoch: 12 Global Step: 250050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:02,745-Speed 6297.76 samples/sec Loss 6.2434 LearningRate 0.0006 Epoch: 12 Global Step: 250060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:05,989-Speed 6314.50 samples/sec Loss 6.1844 LearningRate 0.0006 Epoch: 12 Global Step: 250070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:09,232-Speed 6315.83 samples/sec Loss 6.1288 LearningRate 0.0006 Epoch: 12 Global Step: 250080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:12,474-Speed 6319.07 samples/sec Loss 6.1187 
LearningRate 0.0006 Epoch: 12 Global Step: 250090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:15,721-Speed 6310.76 samples/sec Loss 6.1395 LearningRate 0.0006 Epoch: 12 Global Step: 250100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:18,965-Speed 6313.49 samples/sec Loss 6.1857 LearningRate 0.0006 Epoch: 12 Global Step: 250110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:22,213-Speed 6307.11 samples/sec Loss 6.1471 LearningRate 0.0006 Epoch: 12 Global Step: 250120 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:29:25,447-Speed 6334.17 samples/sec Loss 6.1858 LearningRate 0.0006 Epoch: 12 Global Step: 250130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:28,691-Speed 6314.18 samples/sec Loss 6.1740 LearningRate 0.0006 Epoch: 12 Global Step: 250140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:31,941-Speed 6304.07 samples/sec Loss 6.0818 LearningRate 0.0006 Epoch: 12 Global Step: 250150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:35,184-Speed 6315.37 samples/sec Loss 6.1201 LearningRate 0.0006 Epoch: 12 Global Step: 250160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:38,427-Speed 6317.60 samples/sec Loss 6.1055 LearningRate 0.0006 Epoch: 12 Global Step: 250170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:41,672-Speed 6313.21 samples/sec Loss 6.1428 LearningRate 0.0006 Epoch: 12 Global Step: 250180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:44,917-Speed 6312.34 samples/sec Loss 6.2498 LearningRate 0.0006 Epoch: 12 Global Step: 250190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:48,160-Speed 6315.76 samples/sec Loss 6.0751 LearningRate 0.0006 Epoch: 12 Global Step: 250200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:51,403-Speed 6316.30 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 
Global Step: 250210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:54,648-Speed 6313.08 samples/sec Loss 6.1514 LearningRate 0.0006 Epoch: 12 Global Step: 250220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:29:57,883-Speed 6331.29 samples/sec Loss 6.1233 LearningRate 0.0006 Epoch: 12 Global Step: 250230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:01,129-Speed 6311.26 samples/sec Loss 6.1694 LearningRate 0.0006 Epoch: 12 Global Step: 250240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:04,378-Speed 6305.08 samples/sec Loss 6.1523 LearningRate 0.0006 Epoch: 12 Global Step: 250250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:07,624-Speed 6310.90 samples/sec Loss 6.1181 LearningRate 0.0006 Epoch: 12 Global Step: 250260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:10,866-Speed 6317.28 samples/sec Loss 6.1693 LearningRate 0.0006 Epoch: 12 Global Step: 250270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:14,111-Speed 6312.71 samples/sec Loss 6.1418 LearningRate 0.0006 Epoch: 12 Global Step: 250280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:17,363-Speed 6300.96 samples/sec Loss 6.1113 LearningRate 0.0006 Epoch: 12 Global Step: 250290 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:20,609-Speed 6310.65 samples/sec Loss 6.1837 LearningRate 0.0006 Epoch: 12 Global Step: 250300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:23,857-Speed 6307.97 samples/sec Loss 6.1222 LearningRate 0.0006 Epoch: 12 Global Step: 250310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:27,122-Speed 6273.48 samples/sec Loss 6.1252 LearningRate 0.0006 Epoch: 12 Global Step: 250320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:30,366-Speed 6314.37 samples/sec Loss 6.2216 LearningRate 0.0006 Epoch: 12 Global Step: 250330 Fp16 Grad 
Scale: 65536 Required: 53 hours Training: 2022-04-01 14:30:33,597-Speed 6340.05 samples/sec Loss 6.1378 LearningRate 0.0006 Epoch: 12 Global Step: 250340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:36,844-Speed 6309.00 samples/sec Loss 6.1190 LearningRate 0.0006 Epoch: 12 Global Step: 250350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:40,089-Speed 6312.50 samples/sec Loss 6.1360 LearningRate 0.0006 Epoch: 12 Global Step: 250360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:43,336-Speed 6309.88 samples/sec Loss 6.1336 LearningRate 0.0006 Epoch: 12 Global Step: 250370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:46,576-Speed 6322.70 samples/sec Loss 6.1418 LearningRate 0.0006 Epoch: 12 Global Step: 250380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:49,821-Speed 6311.79 samples/sec Loss 6.1757 LearningRate 0.0006 Epoch: 12 Global Step: 250390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:53,064-Speed 6316.70 samples/sec Loss 6.1461 LearningRate 0.0006 Epoch: 12 Global Step: 250400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:56,309-Speed 6312.06 samples/sec Loss 6.1757 LearningRate 0.0006 Epoch: 12 Global Step: 250410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:30:59,555-Speed 6310.73 samples/sec Loss 6.1290 LearningRate 0.0006 Epoch: 12 Global Step: 250420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:02,799-Speed 6314.20 samples/sec Loss 6.1057 LearningRate 0.0006 Epoch: 12 Global Step: 250430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:06,032-Speed 6337.47 samples/sec Loss 6.0128 LearningRate 0.0006 Epoch: 12 Global Step: 250440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:09,275-Speed 6316.08 samples/sec Loss 6.0947 LearningRate 0.0006 Epoch: 12 Global Step: 250450 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:31:12,520-Speed 6311.92 samples/sec Loss 6.0962 LearningRate 0.0006 Epoch: 12 Global Step: 250460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:15,769-Speed 6305.86 samples/sec Loss 6.1013 LearningRate 0.0006 Epoch: 12 Global Step: 250470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:19,014-Speed 6312.54 samples/sec Loss 6.1164 LearningRate 0.0006 Epoch: 12 Global Step: 250480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:22,260-Speed 6310.26 samples/sec Loss 6.1000 LearningRate 0.0006 Epoch: 12 Global Step: 250490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:25,503-Speed 6316.12 samples/sec Loss 6.1119 LearningRate 0.0006 Epoch: 12 Global Step: 250500 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:28,750-Speed 6310.90 samples/sec Loss 6.2307 LearningRate 0.0006 Epoch: 12 Global Step: 250510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:31,996-Speed 6310.87 samples/sec Loss 6.1073 LearningRate 0.0006 Epoch: 12 Global Step: 250520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:35,243-Speed 6308.62 samples/sec Loss 6.0859 LearningRate 0.0006 Epoch: 12 Global Step: 250530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:38,493-Speed 6303.28 samples/sec Loss 6.1146 LearningRate 0.0006 Epoch: 12 Global Step: 250540 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:31:41,736-Speed 6315.46 samples/sec Loss 6.1437 LearningRate 0.0006 Epoch: 12 Global Step: 250550 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:31:44,965-Speed 6343.37 samples/sec Loss 6.1792 LearningRate 0.0006 Epoch: 12 Global Step: 250560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:48,210-Speed 6312.55 samples/sec Loss 6.1187 LearningRate 0.0006 Epoch: 12 Global Step: 250570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:31:51,467-Speed 6290.04 samples/sec Loss 6.1626 LearningRate 0.0006 Epoch: 12 Global Step: 250580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:54,710-Speed 6316.27 samples/sec Loss 6.1703 LearningRate 0.0006 Epoch: 12 Global Step: 250590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:31:57,965-Speed 6294.50 samples/sec Loss 6.1992 LearningRate 0.0006 Epoch: 12 Global Step: 250600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:01,210-Speed 6312.72 samples/sec Loss 6.1210 LearningRate 0.0006 Epoch: 12 Global Step: 250610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:04,459-Speed 6304.64 samples/sec Loss 6.1461 LearningRate 0.0006 Epoch: 12 Global Step: 250620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:07,706-Speed 6308.48 samples/sec Loss 6.1630 LearningRate 0.0006 Epoch: 12 Global Step: 250630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:10,959-Speed 6297.16 samples/sec Loss 6.2386 LearningRate 0.0006 Epoch: 12 Global Step: 250640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:14,208-Speed 6304.51 samples/sec Loss 6.0965 LearningRate 0.0006 Epoch: 12 Global Step: 250650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:17,457-Speed 6305.75 samples/sec Loss 6.1943 LearningRate 0.0006 Epoch: 12 Global Step: 250660 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:32:20,687-Speed 6340.85 samples/sec Loss 6.1211 LearningRate 0.0006 Epoch: 12 Global Step: 250670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:23,930-Speed 6316.30 samples/sec Loss 6.1342 LearningRate 0.0006 Epoch: 12 Global Step: 250680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:27,184-Speed 6295.56 samples/sec Loss 6.1002 LearningRate 0.0006 Epoch: 12 Global Step: 250690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:30,432-Speed 6308.18 
samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 Global Step: 250700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:33,678-Speed 6310.42 samples/sec Loss 6.1313 LearningRate 0.0006 Epoch: 12 Global Step: 250710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:36,928-Speed 6301.77 samples/sec Loss 6.1710 LearningRate 0.0006 Epoch: 12 Global Step: 250720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:40,174-Speed 6312.83 samples/sec Loss 6.1544 LearningRate 0.0006 Epoch: 12 Global Step: 250730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:43,419-Speed 6311.07 samples/sec Loss 6.1348 LearningRate 0.0006 Epoch: 12 Global Step: 250740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:46,666-Speed 6309.12 samples/sec Loss 6.1261 LearningRate 0.0006 Epoch: 12 Global Step: 250750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:49,913-Speed 6310.00 samples/sec Loss 6.1511 LearningRate 0.0006 Epoch: 12 Global Step: 250760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:53,146-Speed 6335.29 samples/sec Loss 6.2201 LearningRate 0.0006 Epoch: 12 Global Step: 250770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:56,392-Speed 6310.86 samples/sec Loss 6.1498 LearningRate 0.0006 Epoch: 12 Global Step: 250780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:32:59,644-Speed 6299.59 samples/sec Loss 6.0591 LearningRate 0.0006 Epoch: 12 Global Step: 250790 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:02,888-Speed 6313.99 samples/sec Loss 6.1863 LearningRate 0.0006 Epoch: 12 Global Step: 250800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:06,137-Speed 6303.97 samples/sec Loss 6.1430 LearningRate 0.0006 Epoch: 12 Global Step: 250810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:09,384-Speed 6310.20 samples/sec Loss 6.1978 
LearningRate 0.0006 Epoch: 12 Global Step: 250820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:12,631-Speed 6308.55 samples/sec Loss 6.1789 LearningRate 0.0006 Epoch: 12 Global Step: 250830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:15,885-Speed 6293.63 samples/sec Loss 6.1202 LearningRate 0.0006 Epoch: 12 Global Step: 250840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:19,133-Speed 6308.46 samples/sec Loss 6.1246 LearningRate 0.0006 Epoch: 12 Global Step: 250850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:22,379-Speed 6308.84 samples/sec Loss 6.1660 LearningRate 0.0006 Epoch: 12 Global Step: 250860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:25,644-Speed 6274.34 samples/sec Loss 6.1360 LearningRate 0.0006 Epoch: 12 Global Step: 250870 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:33:28,875-Speed 6340.64 samples/sec Loss 6.0827 LearningRate 0.0006 Epoch: 12 Global Step: 250880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:32,119-Speed 6315.47 samples/sec Loss 6.1493 LearningRate 0.0006 Epoch: 12 Global Step: 250890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:35,367-Speed 6307.31 samples/sec Loss 6.0403 LearningRate 0.0006 Epoch: 12 Global Step: 250900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:38,614-Speed 6308.87 samples/sec Loss 6.1458 LearningRate 0.0006 Epoch: 12 Global Step: 250910 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:41,857-Speed 6316.85 samples/sec Loss 6.2349 LearningRate 0.0006 Epoch: 12 Global Step: 250920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:45,101-Speed 6313.70 samples/sec Loss 5.9802 LearningRate 0.0006 Epoch: 12 Global Step: 250930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:48,345-Speed 6315.39 samples/sec Loss 6.1115 LearningRate 0.0006 Epoch: 12 
Global Step: 250940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:51,590-Speed 6311.30 samples/sec Loss 6.1565 LearningRate 0.0006 Epoch: 12 Global Step: 250950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:54,840-Speed 6303.17 samples/sec Loss 6.1090 LearningRate 0.0006 Epoch: 12 Global Step: 250960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:33:58,083-Speed 6316.98 samples/sec Loss 6.2058 LearningRate 0.0006 Epoch: 12 Global Step: 250970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:01,318-Speed 6332.81 samples/sec Loss 6.1123 LearningRate 0.0006 Epoch: 12 Global Step: 250980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:04,644-Speed 6158.01 samples/sec Loss 6.1546 LearningRate 0.0006 Epoch: 12 Global Step: 250990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:07,892-Speed 6307.63 samples/sec Loss 6.1869 LearningRate 0.0006 Epoch: 12 Global Step: 251000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:11,144-Speed 6298.29 samples/sec Loss 6.1255 LearningRate 0.0006 Epoch: 12 Global Step: 251010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:14,422-Speed 6249.02 samples/sec Loss 6.1979 LearningRate 0.0006 Epoch: 12 Global Step: 251020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:17,779-Speed 6102.50 samples/sec Loss 6.1003 LearningRate 0.0006 Epoch: 12 Global Step: 251030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:21,027-Speed 6306.78 samples/sec Loss 6.1390 LearningRate 0.0006 Epoch: 12 Global Step: 251040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:24,274-Speed 6309.15 samples/sec Loss 6.1303 LearningRate 0.0006 Epoch: 12 Global Step: 251050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:27,521-Speed 6308.80 samples/sec Loss 6.1730 LearningRate 0.0006 Epoch: 12 Global Step: 251060 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:30,767-Speed 6310.89 samples/sec Loss 6.1362 LearningRate 0.0006 Epoch: 12 Global Step: 251070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:34,010-Speed 6315.40 samples/sec Loss 6.1033 LearningRate 0.0006 Epoch: 12 Global Step: 251080 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:34:37,242-Speed 6337.77 samples/sec Loss 6.0661 LearningRate 0.0006 Epoch: 12 Global Step: 251090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:40,491-Speed 6305.74 samples/sec Loss 6.0164 LearningRate 0.0006 Epoch: 12 Global Step: 251100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:43,738-Speed 6309.54 samples/sec Loss 6.0551 LearningRate 0.0006 Epoch: 12 Global Step: 251110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:46,990-Speed 6299.79 samples/sec Loss 6.1251 LearningRate 0.0006 Epoch: 12 Global Step: 251120 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:50,235-Speed 6312.36 samples/sec Loss 6.1282 LearningRate 0.0006 Epoch: 12 Global Step: 251130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:53,481-Speed 6309.46 samples/sec Loss 6.1452 LearningRate 0.0006 Epoch: 12 Global Step: 251140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:56,729-Speed 6309.04 samples/sec Loss 6.0803 LearningRate 0.0006 Epoch: 12 Global Step: 251150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:34:59,973-Speed 6313.36 samples/sec Loss 6.1345 LearningRate 0.0006 Epoch: 12 Global Step: 251160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:03,223-Speed 6303.85 samples/sec Loss 6.0500 LearningRate 0.0006 Epoch: 12 Global Step: 251170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:06,485-Speed 6278.73 samples/sec Loss 6.1587 LearningRate 0.0006 Epoch: 12 Global Step: 251180 Fp16 Grad Scale: 32768 Required: 53 hours 
Training: 2022-04-01 14:35:09,720-Speed 6332.67 samples/sec Loss 6.0751 LearningRate 0.0006 Epoch: 12 Global Step: 251190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:12,993-Speed 6257.82 samples/sec Loss 6.1774 LearningRate 0.0006 Epoch: 12 Global Step: 251200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:16,239-Speed 6310.82 samples/sec Loss 6.1324 LearningRate 0.0006 Epoch: 12 Global Step: 251210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:19,484-Speed 6312.38 samples/sec Loss 6.1571 LearningRate 0.0006 Epoch: 12 Global Step: 251220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:22,730-Speed 6310.30 samples/sec Loss 6.1410 LearningRate 0.0006 Epoch: 12 Global Step: 251230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:25,983-Speed 6299.23 samples/sec Loss 6.1637 LearningRate 0.0006 Epoch: 12 Global Step: 251240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:29,231-Speed 6306.42 samples/sec Loss 6.1782 LearningRate 0.0006 Epoch: 12 Global Step: 251250 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:32,479-Speed 6305.72 samples/sec Loss 6.1788 LearningRate 0.0006 Epoch: 12 Global Step: 251260 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:35,721-Speed 6318.71 samples/sec Loss 6.0533 LearningRate 0.0006 Epoch: 12 Global Step: 251270 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:38,969-Speed 6307.75 samples/sec Loss 6.1422 LearningRate 0.0006 Epoch: 12 Global Step: 251280 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:42,217-Speed 6306.73 samples/sec Loss 6.1620 LearningRate 0.0006 Epoch: 12 Global Step: 251290 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:35:45,450-Speed 6335.57 samples/sec Loss 6.0791 LearningRate 0.0006 Epoch: 12 Global Step: 251300 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:35:48,696-Speed 6312.36 samples/sec Loss 6.0779 LearningRate 0.0006 Epoch: 12 Global Step: 251310 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:51,942-Speed 6309.53 samples/sec Loss 6.0838 LearningRate 0.0006 Epoch: 12 Global Step: 251320 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:55,184-Speed 6318.56 samples/sec Loss 6.1332 LearningRate 0.0006 Epoch: 12 Global Step: 251330 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:35:58,467-Speed 6239.59 samples/sec Loss 6.1085 LearningRate 0.0006 Epoch: 12 Global Step: 251340 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:01,768-Speed 6206.67 samples/sec Loss 6.2179 LearningRate 0.0006 Epoch: 12 Global Step: 251350 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:05,010-Speed 6318.33 samples/sec Loss 6.0726 LearningRate 0.0006 Epoch: 12 Global Step: 251360 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:08,257-Speed 6308.17 samples/sec Loss 6.2053 LearningRate 0.0006 Epoch: 12 Global Step: 251370 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:11,503-Speed 6310.59 samples/sec Loss 6.1439 LearningRate 0.0006 Epoch: 12 Global Step: 251380 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:14,749-Speed 6310.34 samples/sec Loss 6.1896 LearningRate 0.0006 Epoch: 12 Global Step: 251390 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:17,982-Speed 6337.12 samples/sec Loss 6.0274 LearningRate 0.0006 Epoch: 12 Global Step: 251400 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:21,231-Speed 6304.38 samples/sec Loss 6.0854 LearningRate 0.0006 Epoch: 12 Global Step: 251410 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:24,474-Speed 6317.64 samples/sec Loss 6.1411 LearningRate 0.0006 Epoch: 12 Global Step: 251420 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:27,719-Speed 6311.19 
samples/sec Loss 6.1234 LearningRate 0.0006 Epoch: 12 Global Step: 251430 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:30,964-Speed 6313.70 samples/sec Loss 6.1672 LearningRate 0.0006 Epoch: 12 Global Step: 251440 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:34,210-Speed 6310.79 samples/sec Loss 6.1205 LearningRate 0.0006 Epoch: 12 Global Step: 251450 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:37,456-Speed 6309.33 samples/sec Loss 6.1382 LearningRate 0.0006 Epoch: 12 Global Step: 251460 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:40,702-Speed 6311.30 samples/sec Loss 6.1190 LearningRate 0.0006 Epoch: 12 Global Step: 251470 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:43,949-Speed 6308.40 samples/sec Loss 6.0600 LearningRate 0.0006 Epoch: 12 Global Step: 251480 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:47,198-Speed 6306.11 samples/sec Loss 6.1077 LearningRate 0.0006 Epoch: 12 Global Step: 251490 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:50,460-Speed 6279.40 samples/sec Loss 6.0469 LearningRate 0.0006 Epoch: 12 Global Step: 251500 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:36:53,693-Speed 6337.12 samples/sec Loss 6.1427 LearningRate 0.0006 Epoch: 12 Global Step: 251510 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:36:56,940-Speed 6307.85 samples/sec Loss 6.1472 LearningRate 0.0006 Epoch: 12 Global Step: 251520 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:00,196-Speed 6291.07 samples/sec Loss 6.1576 LearningRate 0.0006 Epoch: 12 Global Step: 251530 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:03,446-Speed 6304.42 samples/sec Loss 6.2118 LearningRate 0.0006 Epoch: 12 Global Step: 251540 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:06,690-Speed 6313.57 samples/sec Loss 6.1071 
LearningRate 0.0006 Epoch: 12 Global Step: 251550 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:09,937-Speed 6308.60 samples/sec Loss 6.1022 LearningRate 0.0006 Epoch: 12 Global Step: 251560 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:13,185-Speed 6306.72 samples/sec Loss 6.1289 LearningRate 0.0006 Epoch: 12 Global Step: 251570 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:16,431-Speed 6312.06 samples/sec Loss 6.0833 LearningRate 0.0006 Epoch: 12 Global Step: 251580 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:19,684-Speed 6296.20 samples/sec Loss 6.1590 LearningRate 0.0006 Epoch: 12 Global Step: 251590 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:22,931-Speed 6308.82 samples/sec Loss 6.1877 LearningRate 0.0006 Epoch: 12 Global Step: 251600 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:26,165-Speed 6335.41 samples/sec Loss 6.1083 LearningRate 0.0006 Epoch: 12 Global Step: 251610 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:29,412-Speed 6308.27 samples/sec Loss 6.0790 LearningRate 0.0006 Epoch: 12 Global Step: 251620 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:32,658-Speed 6311.40 samples/sec Loss 6.1976 LearningRate 0.0006 Epoch: 12 Global Step: 251630 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:35,906-Speed 6305.71 samples/sec Loss 6.1516 LearningRate 0.0006 Epoch: 12 Global Step: 251640 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:39,151-Speed 6312.14 samples/sec Loss 6.1205 LearningRate 0.0006 Epoch: 12 Global Step: 251650 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:42,398-Speed 6309.34 samples/sec Loss 6.1647 LearningRate 0.0006 Epoch: 12 Global Step: 251660 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:45,645-Speed 6308.00 samples/sec Loss 6.0974 LearningRate 0.0006 Epoch: 12 
Global Step: 251670 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:48,902-Speed 6290.56 samples/sec Loss 6.1175 LearningRate 0.0006 Epoch: 12 Global Step: 251680 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:52,152-Speed 6303.26 samples/sec Loss 6.0929 LearningRate 0.0006 Epoch: 12 Global Step: 251690 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:55,402-Speed 6302.22 samples/sec Loss 6.0964 LearningRate 0.0006 Epoch: 12 Global Step: 251700 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:37:58,636-Speed 6335.99 samples/sec Loss 6.1870 LearningRate 0.0006 Epoch: 12 Global Step: 251710 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:01,884-Speed 6306.78 samples/sec Loss 6.2015 LearningRate 0.0006 Epoch: 12 Global Step: 251720 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:05,136-Speed 6299.19 samples/sec Loss 6.1565 LearningRate 0.0006 Epoch: 12 Global Step: 251730 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:08,381-Speed 6313.10 samples/sec Loss 6.2236 LearningRate 0.0006 Epoch: 12 Global Step: 251740 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:11,628-Speed 6307.97 samples/sec Loss 6.1586 LearningRate 0.0006 Epoch: 12 Global Step: 251750 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:14,872-Speed 6314.01 samples/sec Loss 6.0628 LearningRate 0.0006 Epoch: 12 Global Step: 251760 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:18,121-Speed 6305.59 samples/sec Loss 6.1613 LearningRate 0.0006 Epoch: 12 Global Step: 251770 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:21,368-Speed 6308.04 samples/sec Loss 6.0626 LearningRate 0.0006 Epoch: 12 Global Step: 251780 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:24,613-Speed 6313.49 samples/sec Loss 6.0880 LearningRate 0.0006 Epoch: 12 Global Step: 251790 Fp16 Grad 
Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:27,859-Speed 6311.12 samples/sec Loss 6.0998 LearningRate 0.0006 Epoch: 12 Global Step: 251800 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:31,093-Speed 6333.26 samples/sec Loss 6.1700 LearningRate 0.0006 Epoch: 12 Global Step: 251810 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:34,342-Speed 6304.98 samples/sec Loss 6.1474 LearningRate 0.0006 Epoch: 12 Global Step: 251820 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:37,587-Speed 6311.81 samples/sec Loss 6.1435 LearningRate 0.0006 Epoch: 12 Global Step: 251830 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:40,835-Speed 6307.82 samples/sec Loss 6.0945 LearningRate 0.0006 Epoch: 12 Global Step: 251840 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:44,084-Speed 6304.41 samples/sec Loss 6.0174 LearningRate 0.0006 Epoch: 12 Global Step: 251850 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:47,332-Speed 6307.12 samples/sec Loss 6.1425 LearningRate 0.0006 Epoch: 12 Global Step: 251860 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:50,588-Speed 6290.66 samples/sec Loss 6.1774 LearningRate 0.0006 Epoch: 12 Global Step: 251870 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:53,830-Speed 6318.66 samples/sec Loss 6.1822 LearningRate 0.0006 Epoch: 12 Global Step: 251880 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:38:57,078-Speed 6306.72 samples/sec Loss 6.1321 LearningRate 0.0006 Epoch: 12 Global Step: 251890 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:00,325-Speed 6308.97 samples/sec Loss 6.1149 LearningRate 0.0006 Epoch: 12 Global Step: 251900 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:03,573-Speed 6307.48 samples/sec Loss 6.0672 LearningRate 0.0006 Epoch: 12 Global Step: 251910 Fp16 Grad Scale: 65536 Required: 53 hours 
Training: 2022-04-01 14:39:06,806-Speed 6336.53 samples/sec Loss 6.1130 LearningRate 0.0006 Epoch: 12 Global Step: 251920 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:10,097-Speed 6225.44 samples/sec Loss 6.0958 LearningRate 0.0006 Epoch: 12 Global Step: 251930 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:13,413-Speed 6177.59 samples/sec Loss 6.0699 LearningRate 0.0006 Epoch: 12 Global Step: 251940 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:16,659-Speed 6309.47 samples/sec Loss 6.1738 LearningRate 0.0006 Epoch: 12 Global Step: 251950 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:19,905-Speed 6310.28 samples/sec Loss 6.1069 LearningRate 0.0006 Epoch: 12 Global Step: 251960 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:23,153-Speed 6307.28 samples/sec Loss 6.1739 LearningRate 0.0006 Epoch: 12 Global Step: 251970 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:26,402-Speed 6304.84 samples/sec Loss 6.1169 LearningRate 0.0006 Epoch: 12 Global Step: 251980 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:29,652-Speed 6302.62 samples/sec Loss 6.1098 LearningRate 0.0006 Epoch: 12 Global Step: 251990 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:32,900-Speed 6307.79 samples/sec Loss 6.1017 LearningRate 0.0006 Epoch: 12 Global Step: 252000 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:36,147-Speed 6309.42 samples/sec Loss 6.1289 LearningRate 0.0006 Epoch: 12 Global Step: 252010 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:39,383-Speed 6330.46 samples/sec Loss 6.1784 LearningRate 0.0006 Epoch: 12 Global Step: 252020 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:42,630-Speed 6308.22 samples/sec Loss 6.0832 LearningRate 0.0006 Epoch: 12 Global Step: 252030 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 
14:39:45,876-Speed 6310.81 samples/sec Loss 6.0857 LearningRate 0.0006 Epoch: 12 Global Step: 252040 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:49,135-Speed 6284.58 samples/sec Loss 6.2101 LearningRate 0.0006 Epoch: 12 Global Step: 252050 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:52,380-Speed 6313.19 samples/sec Loss 6.1480 LearningRate 0.0006 Epoch: 12 Global Step: 252060 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:55,627-Speed 6307.91 samples/sec Loss 6.2408 LearningRate 0.0006 Epoch: 12 Global Step: 252070 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:39:58,875-Speed 6307.49 samples/sec Loss 6.1129 LearningRate 0.0006 Epoch: 12 Global Step: 252080 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:02,123-Speed 6306.66 samples/sec Loss 6.1435 LearningRate 0.0006 Epoch: 12 Global Step: 252090 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:05,367-Speed 6315.89 samples/sec Loss 6.1291 LearningRate 0.0006 Epoch: 12 Global Step: 252100 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:08,613-Speed 6310.77 samples/sec Loss 6.1091 LearningRate 0.0006 Epoch: 12 Global Step: 252110 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:11,859-Speed 6310.16 samples/sec Loss 6.1438 LearningRate 0.0006 Epoch: 12 Global Step: 252120 Fp16 Grad Scale: 65536 Required: 53 hours Training: 2022-04-01 14:40:15,090-Speed 6340.74 samples/sec Loss 6.1293 LearningRate 0.0006 Epoch: 12 Global Step: 252130 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:18,340-Speed 6301.87 samples/sec Loss 6.0455 LearningRate 0.0006 Epoch: 12 Global Step: 252140 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:21,586-Speed 6311.64 samples/sec Loss 6.1273 LearningRate 0.0006 Epoch: 12 Global Step: 252150 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:24,834-Speed 6307.37 
samples/sec Loss 6.0820 LearningRate 0.0006 Epoch: 12 Global Step: 252160 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:28,083-Speed 6304.62 samples/sec Loss 6.0324 LearningRate 0.0006 Epoch: 12 Global Step: 252170 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:31,332-Speed 6303.63 samples/sec Loss 6.1402 LearningRate 0.0006 Epoch: 12 Global Step: 252180 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:34,579-Speed 6309.96 samples/sec Loss 6.1624 LearningRate 0.0006 Epoch: 12 Global Step: 252190 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:37,827-Speed 6307.03 samples/sec Loss 6.0854 LearningRate 0.0006 Epoch: 12 Global Step: 252200 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:41,070-Speed 6317.11 samples/sec Loss 6.0427 LearningRate 0.0006 Epoch: 12 Global Step: 252210 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:44,314-Speed 6312.93 samples/sec Loss 6.1074 LearningRate 0.0006 Epoch: 12 Global Step: 252220 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:47,550-Speed 6331.53 samples/sec Loss 6.1858 LearningRate 0.0006 Epoch: 12 Global Step: 252230 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:50,797-Speed 6307.75 samples/sec Loss 6.0626 LearningRate 0.0006 Epoch: 12 Global Step: 252240 Fp16 Grad Scale: 32768 Required: 53 hours Training: 2022-04-01 14:40:54,044-Speed 6308.91 samples/sec Loss 6.1554 LearningRate 0.0006 Epoch: 12 Global Step: 252250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:40:57,293-Speed 6304.54 samples/sec Loss 6.0931 LearningRate 0.0006 Epoch: 12 Global Step: 252260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:00,537-Speed 6314.19 samples/sec Loss 6.1201 LearningRate 0.0006 Epoch: 12 Global Step: 252270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:03,769-Speed 6338.74 samples/sec Loss 6.1579 
LearningRate 0.0006 Epoch: 12 Global Step: 252280 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:07,021-Speed 6299.57 samples/sec Loss 6.1498 LearningRate 0.0006 Epoch: 12 Global Step: 252290 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:10,280-Speed 6284.98 samples/sec Loss 6.0431 LearningRate 0.0006 Epoch: 12 Global Step: 252300 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:13,529-Speed 6304.37 samples/sec Loss 6.1098 LearningRate 0.0006 Epoch: 12 Global Step: 252310 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:16,774-Speed 6314.12 samples/sec Loss 6.1923 LearningRate 0.0006 Epoch: 12 Global Step: 252320 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:20,034-Speed 6282.37 samples/sec Loss 6.0766 LearningRate 0.0006 Epoch: 12 Global Step: 252330 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:23,282-Speed 6307.57 samples/sec Loss 6.0979 LearningRate 0.0006 Epoch: 12 Global Step: 252340 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:26,531-Speed 6304.72 samples/sec Loss 6.1798 LearningRate 0.0006 Epoch: 12 Global Step: 252350 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:29,778-Speed 6309.67 samples/sec Loss 6.1091 LearningRate 0.0006 Epoch: 12 Global Step: 252360 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:33,024-Speed 6310.83 samples/sec Loss 6.1177 LearningRate 0.0006 Epoch: 12 Global Step: 252370 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:41:36,272-Speed 6306.22 samples/sec Loss 6.1431 LearningRate 0.0006 Epoch: 12 Global Step: 252380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:39,523-Speed 6301.97 samples/sec Loss 6.0748 LearningRate 0.0006 Epoch: 12 Global Step: 252390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:42,769-Speed 6310.51 samples/sec Loss 6.1452 LearningRate 0.0006 Epoch: 12 
Global Step: 252400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:46,014-Speed 6312.53 samples/sec Loss 6.0980 LearningRate 0.0006 Epoch: 12 Global Step: 252410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:49,262-Speed 6306.09 samples/sec Loss 6.1145 LearningRate 0.0006 Epoch: 12 Global Step: 252420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:52,509-Speed 6309.60 samples/sec Loss 6.0559 LearningRate 0.0006 Epoch: 12 Global Step: 252430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:55,755-Speed 6310.60 samples/sec Loss 6.0932 LearningRate 0.0006 Epoch: 12 Global Step: 252440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:41:59,004-Speed 6305.10 samples/sec Loss 6.1039 LearningRate 0.0006 Epoch: 12 Global Step: 252450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:02,245-Speed 6320.58 samples/sec Loss 6.1036 LearningRate 0.0006 Epoch: 12 Global Step: 252460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:05,494-Speed 6305.12 samples/sec Loss 6.1040 LearningRate 0.0006 Epoch: 12 Global Step: 252470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:08,728-Speed 6333.89 samples/sec Loss 6.0587 LearningRate 0.0006 Epoch: 12 Global Step: 252480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:11,974-Speed 6310.30 samples/sec Loss 6.0995 LearningRate 0.0006 Epoch: 12 Global Step: 252490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:15,220-Speed 6309.68 samples/sec Loss 6.0866 LearningRate 0.0006 Epoch: 12 Global Step: 252500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:18,463-Speed 6317.47 samples/sec Loss 6.0271 LearningRate 0.0006 Epoch: 12 Global Step: 252510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:21,719-Speed 6290.63 samples/sec Loss 6.0972 LearningRate 0.0006 Epoch: 12 Global Step: 252520 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:24,962-Speed 6317.42 samples/sec Loss 6.1900 LearningRate 0.0006 Epoch: 12 Global Step: 252530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:28,209-Speed 6308.39 samples/sec Loss 6.1253 LearningRate 0.0006 Epoch: 12 Global Step: 252540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:31,457-Speed 6306.74 samples/sec Loss 6.0938 LearningRate 0.0006 Epoch: 12 Global Step: 252550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:34,704-Speed 6309.98 samples/sec Loss 6.1154 LearningRate 0.0006 Epoch: 12 Global Step: 252560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:37,948-Speed 6314.03 samples/sec Loss 6.1130 LearningRate 0.0006 Epoch: 12 Global Step: 252570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:41,196-Speed 6307.61 samples/sec Loss 6.0632 LearningRate 0.0006 Epoch: 12 Global Step: 252580 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:42:44,426-Speed 6341.52 samples/sec Loss 6.1473 LearningRate 0.0006 Epoch: 12 Global Step: 252590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:47,673-Speed 6308.20 samples/sec Loss 6.1211 LearningRate 0.0006 Epoch: 12 Global Step: 252600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:50,929-Speed 6291.53 samples/sec Loss 6.0935 LearningRate 0.0006 Epoch: 12 Global Step: 252610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:54,181-Speed 6299.43 samples/sec Loss 6.0827 LearningRate 0.0006 Epoch: 12 Global Step: 252620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:42:57,425-Speed 6313.72 samples/sec Loss 6.0302 LearningRate 0.0006 Epoch: 12 Global Step: 252630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:00,679-Speed 6296.21 samples/sec Loss 6.0950 LearningRate 0.0006 Epoch: 12 Global Step: 252640 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 14:43:03,923-Speed 6314.50 samples/sec Loss 6.1356 LearningRate 0.0006 Epoch: 12 Global Step: 252650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:07,169-Speed 6309.19 samples/sec Loss 6.1058 LearningRate 0.0006 Epoch: 12 Global Step: 252660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:10,420-Speed 6302.72 samples/sec Loss 6.1019 LearningRate 0.0006 Epoch: 12 Global Step: 252670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:13,671-Speed 6300.54 samples/sec Loss 6.0607 LearningRate 0.0006 Epoch: 12 Global Step: 252680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:16,903-Speed 6338.74 samples/sec Loss 6.0167 LearningRate 0.0006 Epoch: 12 Global Step: 252690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:20,150-Speed 6307.31 samples/sec Loss 6.1126 LearningRate 0.0006 Epoch: 12 Global Step: 252700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:23,396-Speed 6311.16 samples/sec Loss 6.1359 LearningRate 0.0006 Epoch: 12 Global Step: 252710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:26,642-Speed 6309.98 samples/sec Loss 6.0986 LearningRate 0.0006 Epoch: 12 Global Step: 252720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:29,888-Speed 6312.10 samples/sec Loss 6.1706 LearningRate 0.0006 Epoch: 12 Global Step: 252730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:33,135-Speed 6307.46 samples/sec Loss 6.1669 LearningRate 0.0006 Epoch: 12 Global Step: 252740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:36,383-Speed 6307.56 samples/sec Loss 6.1052 LearningRate 0.0006 Epoch: 12 Global Step: 252750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:39,632-Speed 6304.43 samples/sec Loss 6.1166 LearningRate 0.0006 Epoch: 12 Global Step: 252760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
14:43:42,880-Speed 6308.09 samples/sec Loss 6.1064 LearningRate 0.0006 Epoch: 12 Global Step: 252770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:46,128-Speed 6307.14 samples/sec Loss 6.1759 LearningRate 0.0006 Epoch: 12 Global Step: 252780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:49,361-Speed 6335.26 samples/sec Loss 6.1290 LearningRate 0.0006 Epoch: 12 Global Step: 252790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:52,610-Speed 6305.56 samples/sec Loss 6.1371 LearningRate 0.0006 Epoch: 12 Global Step: 252800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:55,855-Speed 6312.82 samples/sec Loss 6.2193 LearningRate 0.0006 Epoch: 12 Global Step: 252810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:43:59,099-Speed 6315.20 samples/sec Loss 6.1258 LearningRate 0.0006 Epoch: 12 Global Step: 252820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:02,344-Speed 6311.29 samples/sec Loss 6.1471 LearningRate 0.0006 Epoch: 12 Global Step: 252830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:05,594-Speed 6304.53 samples/sec Loss 6.1385 LearningRate 0.0006 Epoch: 12 Global Step: 252840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:08,842-Speed 6306.09 samples/sec Loss 6.2023 LearningRate 0.0006 Epoch: 12 Global Step: 252850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:12,088-Speed 6310.92 samples/sec Loss 6.1562 LearningRate 0.0006 Epoch: 12 Global Step: 252860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:15,339-Speed 6299.74 samples/sec Loss 6.1412 LearningRate 0.0006 Epoch: 12 Global Step: 252870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:18,590-Speed 6301.98 samples/sec Loss 6.1162 LearningRate 0.0006 Epoch: 12 Global Step: 252880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:21,835-Speed 6312.21 
samples/sec Loss 6.1574 LearningRate 0.0006 Epoch: 12 Global Step: 252890 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:44:25,069-Speed 6334.31 samples/sec Loss 6.0791 LearningRate 0.0006 Epoch: 12 Global Step: 252900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:28,316-Speed 6308.33 samples/sec Loss 6.0858 LearningRate 0.0006 Epoch: 12 Global Step: 252910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:31,578-Speed 6279.60 samples/sec Loss 6.1689 LearningRate 0.0006 Epoch: 12 Global Step: 252920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:34,820-Speed 6318.46 samples/sec Loss 6.0815 LearningRate 0.0006 Epoch: 12 Global Step: 252930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:38,066-Speed 6311.02 samples/sec Loss 6.0770 LearningRate 0.0006 Epoch: 12 Global Step: 252940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:41,315-Speed 6305.47 samples/sec Loss 6.0643 LearningRate 0.0006 Epoch: 12 Global Step: 252950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:44,560-Speed 6312.19 samples/sec Loss 6.1276 LearningRate 0.0006 Epoch: 12 Global Step: 252960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:47,807-Speed 6308.77 samples/sec Loss 6.1175 LearningRate 0.0006 Epoch: 12 Global Step: 252970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:51,052-Speed 6312.69 samples/sec Loss 6.0433 LearningRate 0.0006 Epoch: 12 Global Step: 252980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:54,295-Speed 6317.26 samples/sec Loss 6.0941 LearningRate 0.0006 Epoch: 12 Global Step: 252990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:44:57,542-Speed 6309.63 samples/sec Loss 6.0881 LearningRate 0.0006 Epoch: 12 Global Step: 253000 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:45:00,775-Speed 6335.13 samples/sec Loss 6.1027 
LearningRate 0.0006 Epoch: 12 Global Step: 253010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:04,020-Speed 6312.75 samples/sec Loss 6.0688 LearningRate 0.0006 Epoch: 12 Global Step: 253020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:07,267-Speed 6310.27 samples/sec Loss 6.0964 LearningRate 0.0006 Epoch: 12 Global Step: 253030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:10,510-Speed 6316.58 samples/sec Loss 6.1597 LearningRate 0.0006 Epoch: 12 Global Step: 253040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:13,755-Speed 6312.31 samples/sec Loss 6.1535 LearningRate 0.0006 Epoch: 12 Global Step: 253050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:17,000-Speed 6313.10 samples/sec Loss 6.1345 LearningRate 0.0006 Epoch: 12 Global Step: 253060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:20,252-Speed 6297.56 samples/sec Loss 6.1806 LearningRate 0.0006 Epoch: 12 Global Step: 253070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:23,497-Speed 6312.89 samples/sec Loss 6.0091 LearningRate 0.0006 Epoch: 12 Global Step: 253080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:26,747-Speed 6302.76 samples/sec Loss 6.1266 LearningRate 0.0006 Epoch: 12 Global Step: 253090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:29,996-Speed 6304.92 samples/sec Loss 6.1552 LearningRate 0.0006 Epoch: 12 Global Step: 253100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:33,224-Speed 6346.52 samples/sec Loss 6.1519 LearningRate 0.0006 Epoch: 12 Global Step: 253110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:36,473-Speed 6303.99 samples/sec Loss 6.1430 LearningRate 0.0006 Epoch: 12 Global Step: 253120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:39,718-Speed 6312.42 samples/sec Loss 6.0605 LearningRate 0.0006 Epoch: 12 
Global Step: 253130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:42,964-Speed 6310.58 samples/sec Loss 6.1073 LearningRate 0.0006 Epoch: 12 Global Step: 253140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:46,217-Speed 6298.94 samples/sec Loss 6.0820 LearningRate 0.0006 Epoch: 12 Global Step: 253150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:49,465-Speed 6306.47 samples/sec Loss 6.2079 LearningRate 0.0006 Epoch: 12 Global Step: 253160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:52,710-Speed 6312.97 samples/sec Loss 6.1180 LearningRate 0.0006 Epoch: 12 Global Step: 253170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:55,954-Speed 6313.93 samples/sec Loss 6.1558 LearningRate 0.0006 Epoch: 12 Global Step: 253180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:45:59,210-Speed 6291.48 samples/sec Loss 6.0942 LearningRate 0.0006 Epoch: 12 Global Step: 253190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:02,464-Speed 6296.80 samples/sec Loss 6.0902 LearningRate 0.0006 Epoch: 12 Global Step: 253200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:05,712-Speed 6306.69 samples/sec Loss 6.1147 LearningRate 0.0006 Epoch: 12 Global Step: 253210 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:46:08,938-Speed 6348.36 samples/sec Loss 6.1500 LearningRate 0.0006 Epoch: 12 Global Step: 253220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:12,194-Speed 6292.67 samples/sec Loss 6.1299 LearningRate 0.0006 Epoch: 12 Global Step: 253230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:15,551-Speed 6101.84 samples/sec Loss 6.1129 LearningRate 0.0006 Epoch: 12 Global Step: 253240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:18,794-Speed 6315.86 samples/sec Loss 6.1385 LearningRate 0.0006 Epoch: 12 Global Step: 253250 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:22,042-Speed 6307.38 samples/sec Loss 6.0743 LearningRate 0.0006 Epoch: 12 Global Step: 253260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:25,291-Speed 6304.53 samples/sec Loss 6.0711 LearningRate 0.0006 Epoch: 12 Global Step: 253270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:28,539-Speed 6307.02 samples/sec Loss 6.1729 LearningRate 0.0006 Epoch: 12 Global Step: 253280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:31,782-Speed 6316.23 samples/sec Loss 6.1577 LearningRate 0.0006 Epoch: 12 Global Step: 253290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:35,026-Speed 6314.56 samples/sec Loss 6.0988 LearningRate 0.0006 Epoch: 12 Global Step: 253300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:38,307-Speed 6243.09 samples/sec Loss 6.1255 LearningRate 0.0006 Epoch: 12 Global Step: 253310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:41,542-Speed 6333.14 samples/sec Loss 6.1111 LearningRate 0.0006 Epoch: 12 Global Step: 253320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:44,792-Speed 6302.69 samples/sec Loss 6.1507 LearningRate 0.0006 Epoch: 12 Global Step: 253330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:48,041-Speed 6305.17 samples/sec Loss 6.0647 LearningRate 0.0006 Epoch: 12 Global Step: 253340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:51,287-Speed 6310.83 samples/sec Loss 6.0456 LearningRate 0.0006 Epoch: 12 Global Step: 253350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:54,534-Speed 6307.96 samples/sec Loss 6.1437 LearningRate 0.0006 Epoch: 12 Global Step: 253360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:46:57,779-Speed 6312.04 samples/sec Loss 6.1177 LearningRate 0.0006 Epoch: 12 Global Step: 253370 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 14:47:01,026-Speed 6309.67 samples/sec Loss 6.0722 LearningRate 0.0006 Epoch: 12 Global Step: 253380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:04,280-Speed 6295.60 samples/sec Loss 6.1083 LearningRate 0.0006 Epoch: 12 Global Step: 253390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:07,524-Speed 6314.72 samples/sec Loss 6.1723 LearningRate 0.0006 Epoch: 12 Global Step: 253400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:10,771-Speed 6309.55 samples/sec Loss 6.1355 LearningRate 0.0006 Epoch: 12 Global Step: 253410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:14,017-Speed 6308.97 samples/sec Loss 6.1576 LearningRate 0.0006 Epoch: 12 Global Step: 253420 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:47:17,248-Speed 6340.91 samples/sec Loss 6.1373 LearningRate 0.0006 Epoch: 12 Global Step: 253430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:20,497-Speed 6305.62 samples/sec Loss 6.1778 LearningRate 0.0006 Epoch: 12 Global Step: 253440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:23,740-Speed 6316.55 samples/sec Loss 6.1009 LearningRate 0.0006 Epoch: 12 Global Step: 253450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:26,986-Speed 6309.43 samples/sec Loss 6.0974 LearningRate 0.0006 Epoch: 12 Global Step: 253460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:30,236-Speed 6304.26 samples/sec Loss 6.0306 LearningRate 0.0006 Epoch: 12 Global Step: 253470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:33,484-Speed 6306.17 samples/sec Loss 6.0704 LearningRate 0.0006 Epoch: 12 Global Step: 253480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:36,731-Speed 6309.14 samples/sec Loss 6.1079 LearningRate 0.0006 Epoch: 12 Global Step: 253490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
14:47:39,977-Speed 6309.96 samples/sec Loss 6.1457 LearningRate 0.0006 Epoch: 12 Global Step: 253500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:43,222-Speed 6312.00 samples/sec Loss 6.0729 LearningRate 0.0006 Epoch: 12 Global Step: 253510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:46,463-Speed 6320.86 samples/sec Loss 6.0653 LearningRate 0.0006 Epoch: 12 Global Step: 253520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:49,709-Speed 6311.84 samples/sec Loss 6.1450 LearningRate 0.0006 Epoch: 12 Global Step: 253530 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:47:52,951-Speed 6319.08 samples/sec Loss 6.0598 LearningRate 0.0006 Epoch: 12 Global Step: 253540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:56,196-Speed 6311.97 samples/sec Loss 6.0677 LearningRate 0.0006 Epoch: 12 Global Step: 253550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:47:59,444-Speed 6306.63 samples/sec Loss 6.0560 LearningRate 0.0006 Epoch: 12 Global Step: 253560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:02,690-Speed 6309.72 samples/sec Loss 6.1698 LearningRate 0.0006 Epoch: 12 Global Step: 253570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:05,938-Speed 6306.86 samples/sec Loss 6.1505 LearningRate 0.0006 Epoch: 12 Global Step: 253580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:09,183-Speed 6314.04 samples/sec Loss 6.1276 LearningRate 0.0006 Epoch: 12 Global Step: 253590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:12,429-Speed 6310.79 samples/sec Loss 6.1002 LearningRate 0.0006 Epoch: 12 Global Step: 253600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:15,674-Speed 6312.32 samples/sec Loss 6.1308 LearningRate 0.0006 Epoch: 12 Global Step: 253610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:18,918-Speed 6314.32 
samples/sec Loss 6.1444 LearningRate 0.0006 Epoch: 12 Global Step: 253620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:22,167-Speed 6305.72 samples/sec Loss 6.0851 LearningRate 0.0006 Epoch: 12 Global Step: 253630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:25,418-Speed 6301.65 samples/sec Loss 6.0723 LearningRate 0.0006 Epoch: 12 Global Step: 253640 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:48:28,655-Speed 6328.61 samples/sec Loss 6.1596 LearningRate 0.0006 Epoch: 12 Global Step: 253650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:31,901-Speed 6310.76 samples/sec Loss 6.1066 LearningRate 0.0006 Epoch: 12 Global Step: 253660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:35,150-Speed 6304.82 samples/sec Loss 6.0701 LearningRate 0.0006 Epoch: 12 Global Step: 253670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:38,394-Speed 6313.06 samples/sec Loss 6.1174 LearningRate 0.0006 Epoch: 12 Global Step: 253680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:41,641-Speed 6310.51 samples/sec Loss 6.1772 LearningRate 0.0006 Epoch: 12 Global Step: 253690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:44,887-Speed 6310.15 samples/sec Loss 6.1382 LearningRate 0.0006 Epoch: 12 Global Step: 253700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:48,132-Speed 6311.17 samples/sec Loss 6.1340 LearningRate 0.0006 Epoch: 12 Global Step: 253710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:51,382-Speed 6304.75 samples/sec Loss 6.0643 LearningRate 0.0006 Epoch: 12 Global Step: 253720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:54,628-Speed 6310.70 samples/sec Loss 6.1005 LearningRate 0.0006 Epoch: 12 Global Step: 253730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:48:57,878-Speed 6302.45 samples/sec Loss 6.0836 
LearningRate 0.0006 Epoch: 12 Global Step: 253740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:01,123-Speed 6312.08 samples/sec Loss 6.0662 LearningRate 0.0006 Epoch: 12 Global Step: 253750 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:49:04,354-Speed 6339.90 samples/sec Loss 6.1485 LearningRate 0.0006 Epoch: 12 Global Step: 253760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:07,602-Speed 6307.51 samples/sec Loss 6.0672 LearningRate 0.0006 Epoch: 12 Global Step: 253770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:10,856-Speed 6295.44 samples/sec Loss 6.1192 LearningRate 0.0006 Epoch: 12 Global Step: 253780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:14,102-Speed 6310.26 samples/sec Loss 6.0653 LearningRate 0.0006 Epoch: 12 Global Step: 253790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:17,346-Speed 6314.48 samples/sec Loss 6.1813 LearningRate 0.0006 Epoch: 12 Global Step: 253800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:20,588-Speed 6317.69 samples/sec Loss 6.1235 LearningRate 0.0006 Epoch: 12 Global Step: 253810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:23,834-Speed 6312.34 samples/sec Loss 6.1472 LearningRate 0.0006 Epoch: 12 Global Step: 253820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:27,081-Speed 6308.69 samples/sec Loss 6.1144 LearningRate 0.0006 Epoch: 12 Global Step: 253830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:30,327-Speed 6310.30 samples/sec Loss 6.1513 LearningRate 0.0006 Epoch: 12 Global Step: 253840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:33,571-Speed 6315.22 samples/sec Loss 6.1191 LearningRate 0.0006 Epoch: 12 Global Step: 253850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:36,816-Speed 6312.37 samples/sec Loss 6.1024 LearningRate 0.0006 Epoch: 12 
Global Step: 253860 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:49:40,047-Speed 6341.01 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 Global Step: 253870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:43,311-Speed 6275.58 samples/sec Loss 6.1156 LearningRate 0.0006 Epoch: 12 Global Step: 253880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:46,559-Speed 6307.46 samples/sec Loss 6.0768 LearningRate 0.0006 Epoch: 12 Global Step: 253890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:49,808-Speed 6304.65 samples/sec Loss 6.1259 LearningRate 0.0006 Epoch: 12 Global Step: 253900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:53,059-Speed 6299.99 samples/sec Loss 6.0791 LearningRate 0.0006 Epoch: 12 Global Step: 253910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:56,306-Speed 6308.50 samples/sec Loss 6.1349 LearningRate 0.0006 Epoch: 12 Global Step: 253920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:49:59,558-Speed 6300.33 samples/sec Loss 6.0672 LearningRate 0.0006 Epoch: 12 Global Step: 253930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:02,809-Speed 6300.81 samples/sec Loss 6.1259 LearningRate 0.0006 Epoch: 12 Global Step: 253940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:06,057-Speed 6305.94 samples/sec Loss 6.1355 LearningRate 0.0006 Epoch: 12 Global Step: 253950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:09,305-Speed 6307.35 samples/sec Loss 6.0768 LearningRate 0.0006 Epoch: 12 Global Step: 253960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:12,536-Speed 6339.12 samples/sec Loss 6.0739 LearningRate 0.0006 Epoch: 12 Global Step: 253970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:15,782-Speed 6311.15 samples/sec Loss 6.0997 LearningRate 0.0006 Epoch: 12 Global Step: 253980 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:19,031-Speed 6305.70 samples/sec Loss 6.0617 LearningRate 0.0006 Epoch: 12 Global Step: 253990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:22,283-Speed 6298.65 samples/sec Loss 6.1417 LearningRate 0.0006 Epoch: 12 Global Step: 254000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:25,535-Speed 6299.43 samples/sec Loss 6.1065 LearningRate 0.0006 Epoch: 12 Global Step: 254010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:28,779-Speed 6314.33 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 Global Step: 254020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:32,025-Speed 6311.59 samples/sec Loss 6.1076 LearningRate 0.0006 Epoch: 12 Global Step: 254030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:35,292-Speed 6269.23 samples/sec Loss 6.1352 LearningRate 0.0006 Epoch: 12 Global Step: 254040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:38,575-Speed 6240.13 samples/sec Loss 6.1522 LearningRate 0.0006 Epoch: 12 Global Step: 254050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:41,855-Speed 6245.84 samples/sec Loss 6.1349 LearningRate 0.0006 Epoch: 12 Global Step: 254060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:45,088-Speed 6336.64 samples/sec Loss 6.0786 LearningRate 0.0006 Epoch: 12 Global Step: 254070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:48,335-Speed 6308.90 samples/sec Loss 6.0887 LearningRate 0.0006 Epoch: 12 Global Step: 254080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:51,582-Speed 6307.76 samples/sec Loss 6.1485 LearningRate 0.0006 Epoch: 12 Global Step: 254090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:50:54,825-Speed 6317.31 samples/sec Loss 6.0401 LearningRate 0.0006 Epoch: 12 Global Step: 254100 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 14:50:58,072-Speed 6307.70 samples/sec Loss 6.0737 LearningRate 0.0006 Epoch: 12 Global Step: 254110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:01,317-Speed 6312.05 samples/sec Loss 6.1023 LearningRate 0.0006 Epoch: 12 Global Step: 254120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:04,566-Speed 6310.93 samples/sec Loss 6.1764 LearningRate 0.0006 Epoch: 12 Global Step: 254130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:07,810-Speed 6314.57 samples/sec Loss 6.1315 LearningRate 0.0006 Epoch: 12 Global Step: 254140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:11,054-Speed 6313.17 samples/sec Loss 6.0874 LearningRate 0.0006 Epoch: 12 Global Step: 254150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:14,299-Speed 6312.03 samples/sec Loss 6.0804 LearningRate 0.0006 Epoch: 12 Global Step: 254160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:17,528-Speed 6344.26 samples/sec Loss 6.1062 LearningRate 0.0006 Epoch: 12 Global Step: 254170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:20,773-Speed 6313.82 samples/sec Loss 6.1364 LearningRate 0.0006 Epoch: 12 Global Step: 254180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:24,027-Speed 6295.29 samples/sec Loss 6.1252 LearningRate 0.0006 Epoch: 12 Global Step: 254190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:27,279-Speed 6298.20 samples/sec Loss 6.0924 LearningRate 0.0006 Epoch: 12 Global Step: 254200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:30,522-Speed 6316.87 samples/sec Loss 6.0445 LearningRate 0.0006 Epoch: 12 Global Step: 254210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:33,768-Speed 6311.22 samples/sec Loss 6.0880 LearningRate 0.0006 Epoch: 12 Global Step: 254220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
14:51:37,020-Speed 6298.32 samples/sec Loss 6.1457 LearningRate 0.0006 Epoch: 12 Global Step: 254230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:40,263-Speed 6316.73 samples/sec Loss 6.1212 LearningRate 0.0006 Epoch: 12 Global Step: 254240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:43,507-Speed 6315.53 samples/sec Loss 6.1076 LearningRate 0.0006 Epoch: 12 Global Step: 254250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:46,757-Speed 6302.62 samples/sec Loss 6.1064 LearningRate 0.0006 Epoch: 12 Global Step: 254260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:50,007-Speed 6303.00 samples/sec Loss 6.1003 LearningRate 0.0006 Epoch: 12 Global Step: 254270 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:51:53,242-Speed 6331.82 samples/sec Loss 6.1424 LearningRate 0.0006 Epoch: 12 Global Step: 254280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:56,488-Speed 6312.09 samples/sec Loss 6.0110 LearningRate 0.0006 Epoch: 12 Global Step: 254290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:51:59,738-Speed 6302.64 samples/sec Loss 6.0544 LearningRate 0.0006 Epoch: 12 Global Step: 254300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:02,982-Speed 6313.94 samples/sec Loss 6.1798 LearningRate 0.0006 Epoch: 12 Global Step: 254310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:06,229-Speed 6308.55 samples/sec Loss 6.0296 LearningRate 0.0006 Epoch: 12 Global Step: 254320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:09,476-Speed 6308.61 samples/sec Loss 6.0399 LearningRate 0.0006 Epoch: 12 Global Step: 254330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:12,720-Speed 6315.37 samples/sec Loss 6.0763 LearningRate 0.0006 Epoch: 12 Global Step: 254340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:15,965-Speed 6312.02 
samples/sec Loss 6.0854 LearningRate 0.0006 Epoch: 12 Global Step: 254350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:19,213-Speed 6308.07 samples/sec Loss 6.1457 LearningRate 0.0006 Epoch: 12 Global Step: 254360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:22,460-Speed 6308.30 samples/sec Loss 6.2024 LearningRate 0.0006 Epoch: 12 Global Step: 254370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:25,711-Speed 6301.34 samples/sec Loss 6.1427 LearningRate 0.0006 Epoch: 12 Global Step: 254380 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:52:28,944-Speed 6336.25 samples/sec Loss 6.0598 LearningRate 0.0006 Epoch: 12 Global Step: 254390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:32,186-Speed 6318.21 samples/sec Loss 6.1793 LearningRate 0.0006 Epoch: 12 Global Step: 254400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:35,428-Speed 6318.94 samples/sec Loss 6.1778 LearningRate 0.0006 Epoch: 12 Global Step: 254410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:38,687-Speed 6284.40 samples/sec Loss 6.0424 LearningRate 0.0006 Epoch: 12 Global Step: 254420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:41,932-Speed 6312.02 samples/sec Loss 6.0866 LearningRate 0.0006 Epoch: 12 Global Step: 254430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:45,176-Speed 6315.43 samples/sec Loss 6.0737 LearningRate 0.0006 Epoch: 12 Global Step: 254440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:48,425-Speed 6305.70 samples/sec Loss 6.0647 LearningRate 0.0006 Epoch: 12 Global Step: 254450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:51,669-Speed 6314.38 samples/sec Loss 6.0797 LearningRate 0.0006 Epoch: 12 Global Step: 254460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:54,916-Speed 6307.69 samples/sec Loss 6.1238 
LearningRate 0.0006 Epoch: 12 Global Step: 254470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:52:58,165-Speed 6305.96 samples/sec Loss 6.2044 LearningRate 0.0006 Epoch: 12 Global Step: 254480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:01,394-Speed 6344.47 samples/sec Loss 6.1361 LearningRate 0.0006 Epoch: 12 Global Step: 254490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:04,641-Speed 6309.00 samples/sec Loss 6.0696 LearningRate 0.0006 Epoch: 12 Global Step: 254500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:07,891-Speed 6302.33 samples/sec Loss 6.0803 LearningRate 0.0006 Epoch: 12 Global Step: 254510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:11,139-Speed 6307.79 samples/sec Loss 6.1099 LearningRate 0.0006 Epoch: 12 Global Step: 254520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:14,384-Speed 6311.61 samples/sec Loss 6.0929 LearningRate 0.0006 Epoch: 12 Global Step: 254530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:17,631-Speed 6307.89 samples/sec Loss 6.0596 LearningRate 0.0006 Epoch: 12 Global Step: 254540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:20,878-Speed 6310.28 samples/sec Loss 6.0977 LearningRate 0.0006 Epoch: 12 Global Step: 254550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:24,125-Speed 6307.98 samples/sec Loss 6.1056 LearningRate 0.0006 Epoch: 12 Global Step: 254560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:27,372-Speed 6309.89 samples/sec Loss 6.1403 LearningRate 0.0006 Epoch: 12 Global Step: 254570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:30,618-Speed 6310.36 samples/sec Loss 6.1633 LearningRate 0.0006 Epoch: 12 Global Step: 254580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:33,850-Speed 6338.38 samples/sec Loss 6.1236 LearningRate 0.0006 Epoch: 12 
Global Step: 254590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:37,095-Speed 6312.50 samples/sec Loss 6.1435 LearningRate 0.0006 Epoch: 12 Global Step: 254600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:40,339-Speed 6314.74 samples/sec Loss 6.1202 LearningRate 0.0006 Epoch: 12 Global Step: 254610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:43,587-Speed 6305.89 samples/sec Loss 6.1389 LearningRate 0.0006 Epoch: 12 Global Step: 254620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:46,832-Speed 6313.46 samples/sec Loss 6.1072 LearningRate 0.0006 Epoch: 12 Global Step: 254630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:50,075-Speed 6315.67 samples/sec Loss 6.1038 LearningRate 0.0006 Epoch: 12 Global Step: 254640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:53,318-Speed 6316.17 samples/sec Loss 6.1031 LearningRate 0.0006 Epoch: 12 Global Step: 254650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:56,600-Speed 6242.08 samples/sec Loss 6.1370 LearningRate 0.0006 Epoch: 12 Global Step: 254660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:53:59,847-Speed 6309.20 samples/sec Loss 6.0026 LearningRate 0.0006 Epoch: 12 Global Step: 254670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:03,096-Speed 6305.42 samples/sec Loss 6.1611 LearningRate 0.0006 Epoch: 12 Global Step: 254680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:06,331-Speed 6332.14 samples/sec Loss 6.0862 LearningRate 0.0006 Epoch: 12 Global Step: 254690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:09,573-Speed 6317.78 samples/sec Loss 6.1066 LearningRate 0.0006 Epoch: 12 Global Step: 254700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:12,823-Speed 6303.92 samples/sec Loss 6.0231 LearningRate 0.0006 Epoch: 12 Global Step: 254710 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:16,065-Speed 6318.82 samples/sec Loss 6.2287 LearningRate 0.0006 Epoch: 12 Global Step: 254720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:19,309-Speed 6314.36 samples/sec Loss 6.1390 LearningRate 0.0006 Epoch: 12 Global Step: 254730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:22,558-Speed 6305.64 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 Global Step: 254740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:25,814-Speed 6290.12 samples/sec Loss 6.0567 LearningRate 0.0006 Epoch: 12 Global Step: 254750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:29,094-Speed 6245.23 samples/sec Loss 6.0397 LearningRate 0.0006 Epoch: 12 Global Step: 254760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:32,338-Speed 6314.79 samples/sec Loss 6.2009 LearningRate 0.0006 Epoch: 12 Global Step: 254770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:35,585-Speed 6308.70 samples/sec Loss 6.1300 LearningRate 0.0006 Epoch: 12 Global Step: 254780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:38,817-Speed 6338.50 samples/sec Loss 6.0859 LearningRate 0.0006 Epoch: 12 Global Step: 254790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:42,064-Speed 6309.08 samples/sec Loss 6.0208 LearningRate 0.0006 Epoch: 12 Global Step: 254800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:45,316-Speed 6298.41 samples/sec Loss 6.1547 LearningRate 0.0006 Epoch: 12 Global Step: 254810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:48,565-Speed 6304.71 samples/sec Loss 6.1237 LearningRate 0.0006 Epoch: 12 Global Step: 254820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:51,819-Speed 6296.50 samples/sec Loss 6.1005 LearningRate 0.0006 Epoch: 12 Global Step: 254830 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 14:54:55,070-Speed 6300.48 samples/sec Loss 6.1292 LearningRate 0.0006 Epoch: 12 Global Step: 254840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:54:58,317-Speed 6309.16 samples/sec Loss 6.1456 LearningRate 0.0006 Epoch: 12 Global Step: 254850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:01,562-Speed 6312.50 samples/sec Loss 6.1059 LearningRate 0.0006 Epoch: 12 Global Step: 254860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:04,809-Speed 6308.46 samples/sec Loss 6.1571 LearningRate 0.0006 Epoch: 12 Global Step: 254870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:08,054-Speed 6313.41 samples/sec Loss 6.2149 LearningRate 0.0006 Epoch: 12 Global Step: 254880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:11,296-Speed 6318.46 samples/sec Loss 6.1071 LearningRate 0.0006 Epoch: 12 Global Step: 254890 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:55:14,525-Speed 6344.20 samples/sec Loss 6.0976 LearningRate 0.0006 Epoch: 12 Global Step: 254900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:17,778-Speed 6297.60 samples/sec Loss 6.1146 LearningRate 0.0006 Epoch: 12 Global Step: 254910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:21,024-Speed 6310.13 samples/sec Loss 6.0875 LearningRate 0.0006 Epoch: 12 Global Step: 254920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:24,267-Speed 6316.69 samples/sec Loss 6.0993 LearningRate 0.0006 Epoch: 12 Global Step: 254930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:55:27,498-Speed 6339.93 samples/sec Loss 6.1660 LearningRate 0.0006 Epoch: 12 Global Step: 254940 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:30,746-Speed 6307.27 samples/sec Loss 6.0918 LearningRate 0.0006 Epoch: 12 Global Step: 254950 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 
14:55:33,991-Speed 6311.63 samples/sec Loss 6.1189 LearningRate 0.0006 Epoch: 12 Global Step: 254960 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:37,237-Speed 6311.80 samples/sec Loss 6.1006 LearningRate 0.0006 Epoch: 12 Global Step: 254970 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:40,486-Speed 6303.84 samples/sec Loss 6.1086 LearningRate 0.0006 Epoch: 12 Global Step: 254980 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:43,734-Speed 6307.25 samples/sec Loss 6.0951 LearningRate 0.0006 Epoch: 12 Global Step: 254990 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:46,985-Speed 6301.47 samples/sec Loss 6.0368 LearningRate 0.0006 Epoch: 12 Global Step: 255000 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:50,231-Speed 6311.17 samples/sec Loss 6.0993 LearningRate 0.0006 Epoch: 12 Global Step: 255010 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:53,482-Speed 6300.80 samples/sec Loss 6.1398 LearningRate 0.0006 Epoch: 12 Global Step: 255020 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:56,725-Speed 6316.95 samples/sec Loss 6.1178 LearningRate 0.0006 Epoch: 12 Global Step: 255030 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:55:59,972-Speed 6307.74 samples/sec Loss 6.1411 LearningRate 0.0006 Epoch: 12 Global Step: 255040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:03,221-Speed 6304.29 samples/sec Loss 6.0412 LearningRate 0.0006 Epoch: 12 Global Step: 255050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:06,480-Speed 6285.09 samples/sec Loss 6.0506 LearningRate 0.0006 Epoch: 12 Global Step: 255060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:09,728-Speed 6307.00 samples/sec Loss 6.0682 LearningRate 0.0006 Epoch: 12 Global Step: 255070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:12,978-Speed 6304.40 
samples/sec Loss 6.1083 LearningRate 0.0006 Epoch: 12 Global Step: 255080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:16,220-Speed 6318.89 samples/sec Loss 6.0750 LearningRate 0.0006 Epoch: 12 Global Step: 255090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:19,463-Speed 6315.55 samples/sec Loss 6.0774 LearningRate 0.0006 Epoch: 12 Global Step: 255100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:22,711-Speed 6306.90 samples/sec Loss 6.1439 LearningRate 0.0006 Epoch: 12 Global Step: 255110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:25,956-Speed 6314.02 samples/sec Loss 6.0352 LearningRate 0.0006 Epoch: 12 Global Step: 255120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:29,201-Speed 6311.32 samples/sec Loss 6.0709 LearningRate 0.0006 Epoch: 12 Global Step: 255130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:32,444-Speed 6316.39 samples/sec Loss 6.1344 LearningRate 0.0006 Epoch: 12 Global Step: 255140 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:56:35,675-Speed 6340.03 samples/sec Loss 6.0946 LearningRate 0.0006 Epoch: 12 Global Step: 255150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:38,920-Speed 6313.45 samples/sec Loss 6.1334 LearningRate 0.0006 Epoch: 12 Global Step: 255160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:42,166-Speed 6311.21 samples/sec Loss 6.0784 LearningRate 0.0006 Epoch: 12 Global Step: 255170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:45,408-Speed 6318.25 samples/sec Loss 6.1629 LearningRate 0.0006 Epoch: 12 Global Step: 255180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:48,656-Speed 6307.41 samples/sec Loss 6.1565 LearningRate 0.0006 Epoch: 12 Global Step: 255190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:51,899-Speed 6315.55 samples/sec Loss 6.1583 
LearningRate 0.0006 Epoch: 12 Global Step: 255200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:55,144-Speed 6313.00 samples/sec Loss 6.2178 LearningRate 0.0006 Epoch: 12 Global Step: 255210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:56:58,394-Speed 6303.19 samples/sec Loss 6.0662 LearningRate 0.0006 Epoch: 12 Global Step: 255220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:01,641-Speed 6307.92 samples/sec Loss 6.0829 LearningRate 0.0006 Epoch: 12 Global Step: 255230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:04,895-Speed 6296.30 samples/sec Loss 6.0747 LearningRate 0.0006 Epoch: 12 Global Step: 255240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:08,123-Speed 6345.98 samples/sec Loss 6.0410 LearningRate 0.0006 Epoch: 12 Global Step: 255250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:11,368-Speed 6311.41 samples/sec Loss 6.0973 LearningRate 0.0006 Epoch: 12 Global Step: 255260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:14,621-Speed 6301.50 samples/sec Loss 6.1393 LearningRate 0.0006 Epoch: 12 Global Step: 255270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:17,865-Speed 6313.93 samples/sec Loss 6.1070 LearningRate 0.0006 Epoch: 12 Global Step: 255280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:21,109-Speed 6314.85 samples/sec Loss 6.0227 LearningRate 0.0006 Epoch: 12 Global Step: 255290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:24,355-Speed 6310.26 samples/sec Loss 6.0788 LearningRate 0.0006 Epoch: 12 Global Step: 255300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:27,604-Speed 6305.04 samples/sec Loss 6.0986 LearningRate 0.0006 Epoch: 12 Global Step: 255310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:30,854-Speed 6303.25 samples/sec Loss 6.1427 LearningRate 0.0006 Epoch: 12 
Global Step: 255320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:34,103-Speed 6303.95 samples/sec Loss 6.0635 LearningRate 0.0006 Epoch: 12 Global Step: 255330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:37,349-Speed 6312.12 samples/sec Loss 6.0749 LearningRate 0.0006 Epoch: 12 Global Step: 255340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:57:40,581-Speed 6338.28 samples/sec Loss 6.0335 LearningRate 0.0006 Epoch: 12 Global Step: 255350 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:57:43,826-Speed 6312.79 samples/sec Loss 6.0665 LearningRate 0.0006 Epoch: 12 Global Step: 255360 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:57:47,076-Speed 6301.95 samples/sec Loss 6.1521 LearningRate 0.0006 Epoch: 12 Global Step: 255370 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:57:50,320-Speed 6315.97 samples/sec Loss 6.1671 LearningRate 0.0006 Epoch: 12 Global Step: 255380 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:57:53,565-Speed 6311.66 samples/sec Loss 6.0886 LearningRate 0.0006 Epoch: 12 Global Step: 255390 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:57:56,814-Speed 6304.95 samples/sec Loss 6.1500 LearningRate 0.0006 Epoch: 12 Global Step: 255400 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:58:00,060-Speed 6310.85 samples/sec Loss 6.0114 LearningRate 0.0006 Epoch: 12 Global Step: 255410 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:58:03,303-Speed 6315.48 samples/sec Loss 6.1295 LearningRate 0.0006 Epoch: 12 Global Step: 255420 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:58:06,553-Speed 6304.50 samples/sec Loss 6.0722 LearningRate 0.0006 Epoch: 12 Global Step: 255430 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 14:58:09,799-Speed 6309.68 samples/sec Loss 6.0977 LearningRate 0.0006 Epoch: 12 Global Step: 255440 Fp16 Grad 
Scale: 16384 Required: 52 hours Training: 2022-04-01 14:58:13,046-Speed 6309.87 samples/sec Loss 6.1304 LearningRate 0.0006 Epoch: 12 Global Step: 255450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:16,291-Speed 6311.39 samples/sec Loss 6.0295 LearningRate 0.0006 Epoch: 12 Global Step: 255460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:19,537-Speed 6311.87 samples/sec Loss 6.0502 LearningRate 0.0006 Epoch: 12 Global Step: 255470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:22,782-Speed 6311.27 samples/sec Loss 6.1413 LearningRate 0.0006 Epoch: 12 Global Step: 255480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:26,027-Speed 6314.14 samples/sec Loss 6.1272 LearningRate 0.0006 Epoch: 12 Global Step: 255490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:29,276-Speed 6304.18 samples/sec Loss 6.1156 LearningRate 0.0006 Epoch: 12 Global Step: 255500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:32,530-Speed 6296.06 samples/sec Loss 6.1668 LearningRate 0.0006 Epoch: 12 Global Step: 255510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:35,776-Speed 6308.97 samples/sec Loss 6.0786 LearningRate 0.0006 Epoch: 12 Global Step: 255520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:39,026-Speed 6304.07 samples/sec Loss 6.0527 LearningRate 0.0006 Epoch: 12 Global Step: 255530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:42,269-Speed 6316.34 samples/sec Loss 6.1646 LearningRate 0.0006 Epoch: 12 Global Step: 255540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:45,514-Speed 6312.36 samples/sec Loss 6.1195 LearningRate 0.0006 Epoch: 12 Global Step: 255550 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:58:48,753-Speed 6325.37 samples/sec Loss 6.1319 LearningRate 0.0006 Epoch: 12 Global Step: 255560 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 14:58:52,005-Speed 6298.43 samples/sec Loss 6.1260 LearningRate 0.0006 Epoch: 12 Global Step: 255570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:55,267-Speed 6280.18 samples/sec Loss 6.1007 LearningRate 0.0006 Epoch: 12 Global Step: 255580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:58:58,509-Speed 6319.39 samples/sec Loss 6.1126 LearningRate 0.0006 Epoch: 12 Global Step: 255590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:01,754-Speed 6311.75 samples/sec Loss 6.0724 LearningRate 0.0006 Epoch: 12 Global Step: 255600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:05,004-Speed 6303.21 samples/sec Loss 6.0281 LearningRate 0.0006 Epoch: 12 Global Step: 255610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:08,258-Speed 6295.40 samples/sec Loss 6.1184 LearningRate 0.0006 Epoch: 12 Global Step: 255620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:11,501-Speed 6317.32 samples/sec Loss 6.0287 LearningRate 0.0006 Epoch: 12 Global Step: 255630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:14,757-Speed 6290.47 samples/sec Loss 6.0219 LearningRate 0.0006 Epoch: 12 Global Step: 255640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:18,003-Speed 6311.43 samples/sec Loss 6.0504 LearningRate 0.0006 Epoch: 12 Global Step: 255650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:21,232-Speed 6344.05 samples/sec Loss 6.0496 LearningRate 0.0006 Epoch: 12 Global Step: 255660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:24,480-Speed 6305.86 samples/sec Loss 6.1314 LearningRate 0.0006 Epoch: 12 Global Step: 255670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:27,729-Speed 6305.29 samples/sec Loss 6.1119 LearningRate 0.0006 Epoch: 12 Global Step: 255680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
14:59:30,977-Speed 6306.98 samples/sec Loss 6.0936 LearningRate 0.0006 Epoch: 12 Global Step: 255690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:34,232-Speed 6291.80 samples/sec Loss 6.0556 LearningRate 0.0006 Epoch: 12 Global Step: 255700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:37,477-Speed 6313.07 samples/sec Loss 6.1469 LearningRate 0.0006 Epoch: 12 Global Step: 255710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:40,724-Speed 6309.91 samples/sec Loss 6.0456 LearningRate 0.0006 Epoch: 12 Global Step: 255720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:43,977-Speed 6296.79 samples/sec Loss 6.0528 LearningRate 0.0006 Epoch: 12 Global Step: 255730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:47,221-Speed 6315.05 samples/sec Loss 6.1156 LearningRate 0.0006 Epoch: 12 Global Step: 255740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:50,466-Speed 6313.57 samples/sec Loss 6.0843 LearningRate 0.0006 Epoch: 12 Global Step: 255750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 14:59:53,710-Speed 6315.02 samples/sec Loss 6.0629 LearningRate 0.0006 Epoch: 12 Global Step: 255760 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 14:59:56,942-Speed 6338.25 samples/sec Loss 6.1406 LearningRate 0.0006 Epoch: 12 Global Step: 255770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:00,190-Speed 6307.63 samples/sec Loss 6.0762 LearningRate 0.0006 Epoch: 12 Global Step: 255780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:03,435-Speed 6312.75 samples/sec Loss 6.0708 LearningRate 0.0006 Epoch: 12 Global Step: 255790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:06,687-Speed 6297.79 samples/sec Loss 6.0677 LearningRate 0.0006 Epoch: 12 Global Step: 255800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:09,937-Speed 6302.73 
samples/sec Loss 6.1360 LearningRate 0.0006 Epoch: 12 Global Step: 255810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:13,185-Speed 6307.47 samples/sec Loss 5.9970 LearningRate 0.0006 Epoch: 12 Global Step: 255820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:16,438-Speed 6297.93 samples/sec Loss 6.0516 LearningRate 0.0006 Epoch: 12 Global Step: 255830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:19,680-Speed 6318.40 samples/sec Loss 6.0958 LearningRate 0.0006 Epoch: 12 Global Step: 255840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:22,928-Speed 6306.90 samples/sec Loss 6.1418 LearningRate 0.0006 Epoch: 12 Global Step: 255850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:26,175-Speed 6308.50 samples/sec Loss 6.0660 LearningRate 0.0006 Epoch: 12 Global Step: 255860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:29,406-Speed 6339.84 samples/sec Loss 6.1840 LearningRate 0.0006 Epoch: 12 Global Step: 255870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:32,649-Speed 6315.27 samples/sec Loss 6.1557 LearningRate 0.0006 Epoch: 12 Global Step: 255880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:35,897-Speed 6307.42 samples/sec Loss 6.1454 LearningRate 0.0006 Epoch: 12 Global Step: 255890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:39,147-Speed 6304.16 samples/sec Loss 6.1040 LearningRate 0.0006 Epoch: 12 Global Step: 255900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:42,392-Speed 6311.13 samples/sec Loss 6.0705 LearningRate 0.0006 Epoch: 12 Global Step: 255910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:45,644-Speed 6298.81 samples/sec Loss 6.0966 LearningRate 0.0006 Epoch: 12 Global Step: 255920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:48,888-Speed 6316.03 samples/sec Loss 6.0889 
LearningRate 0.0006 Epoch: 12 Global Step: 255930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:52,132-Speed 6312.99 samples/sec Loss 6.0850 LearningRate 0.0006 Epoch: 12 Global Step: 255940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:55,378-Speed 6311.96 samples/sec Loss 6.0524 LearningRate 0.0006 Epoch: 12 Global Step: 255950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:00:58,627-Speed 6304.99 samples/sec Loss 6.0223 LearningRate 0.0006 Epoch: 12 Global Step: 255960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:01,872-Speed 6313.74 samples/sec Loss 6.0546 LearningRate 0.0006 Epoch: 12 Global Step: 255970 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:01:05,105-Speed 6336.13 samples/sec Loss 6.0370 LearningRate 0.0006 Epoch: 12 Global Step: 255980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:08,348-Speed 6315.84 samples/sec Loss 6.1103 LearningRate 0.0006 Epoch: 12 Global Step: 255990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:11,597-Speed 6305.38 samples/sec Loss 6.0427 LearningRate 0.0006 Epoch: 12 Global Step: 256000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:14,841-Speed 6313.76 samples/sec Loss 6.0814 LearningRate 0.0006 Epoch: 12 Global Step: 256010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:18,092-Speed 6301.67 samples/sec Loss 6.1688 LearningRate 0.0006 Epoch: 12 Global Step: 256020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:21,341-Speed 6304.21 samples/sec Loss 6.0650 LearningRate 0.0006 Epoch: 12 Global Step: 256030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:24,592-Speed 6302.74 samples/sec Loss 6.1079 LearningRate 0.0006 Epoch: 12 Global Step: 256040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:27,838-Speed 6310.19 samples/sec Loss 6.1515 LearningRate 0.0006 Epoch: 12 
Global Step: 256050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:31,093-Speed 6292.20 samples/sec Loss 6.0926 LearningRate 0.0006 Epoch: 12 Global Step: 256060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:34,338-Speed 6312.09 samples/sec Loss 6.1532 LearningRate 0.0006 Epoch: 12 Global Step: 256070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:37,567-Speed 6344.43 samples/sec Loss 6.1349 LearningRate 0.0006 Epoch: 12 Global Step: 256080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:40,810-Speed 6317.61 samples/sec Loss 6.0518 LearningRate 0.0006 Epoch: 12 Global Step: 256090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:44,057-Speed 6308.26 samples/sec Loss 6.0741 LearningRate 0.0006 Epoch: 12 Global Step: 256100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:47,305-Speed 6306.54 samples/sec Loss 6.1490 LearningRate 0.0006 Epoch: 12 Global Step: 256110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:50,551-Speed 6311.48 samples/sec Loss 6.1094 LearningRate 0.0006 Epoch: 12 Global Step: 256120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:53,800-Speed 6304.03 samples/sec Loss 6.1537 LearningRate 0.0006 Epoch: 12 Global Step: 256130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:01:57,048-Speed 6307.54 samples/sec Loss 6.1133 LearningRate 0.0006 Epoch: 12 Global Step: 256140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:00,296-Speed 6306.98 samples/sec Loss 6.0445 LearningRate 0.0006 Epoch: 12 Global Step: 256150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:03,545-Speed 6303.38 samples/sec Loss 6.0877 LearningRate 0.0006 Epoch: 12 Global Step: 256160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:06,793-Speed 6307.03 samples/sec Loss 6.1446 LearningRate 0.0006 Epoch: 12 Global Step: 256170 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:10,023-Speed 6343.25 samples/sec Loss 6.1805 LearningRate 0.0006 Epoch: 12 Global Step: 256180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:13,268-Speed 6312.41 samples/sec Loss 6.1460 LearningRate 0.0006 Epoch: 12 Global Step: 256190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:16,512-Speed 6314.67 samples/sec Loss 6.0897 LearningRate 0.0006 Epoch: 12 Global Step: 256200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:19,761-Speed 6306.62 samples/sec Loss 6.0842 LearningRate 0.0006 Epoch: 12 Global Step: 256210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:22,989-Speed 6346.08 samples/sec Loss 6.0058 LearningRate 0.0006 Epoch: 12 Global Step: 256220 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:26,236-Speed 6307.56 samples/sec Loss 6.0991 LearningRate 0.0006 Epoch: 12 Global Step: 256230 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:29,483-Speed 6309.75 samples/sec Loss 6.1047 LearningRate 0.0006 Epoch: 12 Global Step: 256240 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:32,732-Speed 6304.81 samples/sec Loss 6.0692 LearningRate 0.0006 Epoch: 12 Global Step: 256250 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:35,973-Speed 6319.76 samples/sec Loss 6.1049 LearningRate 0.0006 Epoch: 12 Global Step: 256260 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:39,214-Speed 6319.54 samples/sec Loss 6.1466 LearningRate 0.0006 Epoch: 12 Global Step: 256270 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:42,461-Speed 6309.72 samples/sec Loss 6.1332 LearningRate 0.0006 Epoch: 12 Global Step: 256280 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:45,704-Speed 6315.25 samples/sec Loss 6.0516 LearningRate 0.0006 Epoch: 12 Global Step: 256290 Fp16 Grad Scale: 16384 Required: 52 hours 
Training: 2022-04-01 15:02:48,951-Speed 6309.15 samples/sec Loss 6.1204 LearningRate 0.0006 Epoch: 12 Global Step: 256300 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:52,195-Speed 6314.38 samples/sec Loss 6.0784 LearningRate 0.0006 Epoch: 12 Global Step: 256310 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:02:55,442-Speed 6309.88 samples/sec Loss 6.0653 LearningRate 0.0006 Epoch: 12 Global Step: 256320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:02:58,681-Speed 6324.85 samples/sec Loss 6.1267 LearningRate 0.0006 Epoch: 12 Global Step: 256330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:01,930-Speed 6304.83 samples/sec Loss 6.1524 LearningRate 0.0006 Epoch: 12 Global Step: 256340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:05,176-Speed 6310.18 samples/sec Loss 6.1070 LearningRate 0.0006 Epoch: 12 Global Step: 256350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:08,418-Speed 6319.06 samples/sec Loss 6.1404 LearningRate 0.0006 Epoch: 12 Global Step: 256360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:11,674-Speed 6290.63 samples/sec Loss 6.1371 LearningRate 0.0006 Epoch: 12 Global Step: 256370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:14,921-Speed 6308.86 samples/sec Loss 6.1378 LearningRate 0.0006 Epoch: 12 Global Step: 256380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:18,172-Speed 6301.27 samples/sec Loss 6.1462 LearningRate 0.0006 Epoch: 12 Global Step: 256390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:21,421-Speed 6306.37 samples/sec Loss 6.1070 LearningRate 0.0006 Epoch: 12 Global Step: 256400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:24,670-Speed 6308.27 samples/sec Loss 6.1004 LearningRate 0.0006 Epoch: 12 Global Step: 256410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:03:27,909-Speed 6322.67 samples/sec Loss 6.0790 LearningRate 0.0006 Epoch: 12 Global Step: 256420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:31,157-Speed 6306.98 samples/sec Loss 6.0843 LearningRate 0.0006 Epoch: 12 Global Step: 256430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:34,404-Speed 6308.41 samples/sec Loss 6.0726 LearningRate 0.0006 Epoch: 12 Global Step: 256440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:37,656-Speed 6300.06 samples/sec Loss 6.0839 LearningRate 0.0006 Epoch: 12 Global Step: 256450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:40,900-Speed 6314.08 samples/sec Loss 6.0767 LearningRate 0.0006 Epoch: 12 Global Step: 256460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:44,146-Speed 6310.44 samples/sec Loss 6.1263 LearningRate 0.0006 Epoch: 12 Global Step: 256470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:47,392-Speed 6311.17 samples/sec Loss 6.0090 LearningRate 0.0006 Epoch: 12 Global Step: 256480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:50,640-Speed 6306.56 samples/sec Loss 6.1002 LearningRate 0.0006 Epoch: 12 Global Step: 256490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:53,886-Speed 6310.99 samples/sec Loss 6.1369 LearningRate 0.0006 Epoch: 12 Global Step: 256500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:03:57,135-Speed 6304.54 samples/sec Loss 6.0805 LearningRate 0.0006 Epoch: 12 Global Step: 256510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:00,371-Speed 6331.12 samples/sec Loss 6.0451 LearningRate 0.0006 Epoch: 12 Global Step: 256520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:03,618-Speed 6307.97 samples/sec Loss 6.0282 LearningRate 0.0006 Epoch: 12 Global Step: 256530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:06,866-Speed 6306.44 
samples/sec Loss 6.0869 LearningRate 0.0006 Epoch: 12 Global Step: 256540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:10,112-Speed 6309.77 samples/sec Loss 6.0809 LearningRate 0.0006 Epoch: 12 Global Step: 256550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:13,390-Speed 6249.59 samples/sec Loss 6.1302 LearningRate 0.0006 Epoch: 12 Global Step: 256560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:16,650-Speed 6283.99 samples/sec Loss 6.0875 LearningRate 0.0006 Epoch: 12 Global Step: 256570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:19,901-Speed 6300.93 samples/sec Loss 6.0915 LearningRate 0.0006 Epoch: 12 Global Step: 256580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:23,153-Speed 6299.09 samples/sec Loss 6.1086 LearningRate 0.0006 Epoch: 12 Global Step: 256590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:26,401-Speed 6308.39 samples/sec Loss 6.0662 LearningRate 0.0006 Epoch: 12 Global Step: 256600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:29,645-Speed 6314.20 samples/sec Loss 6.0388 LearningRate 0.0006 Epoch: 12 Global Step: 256610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:32,896-Speed 6300.04 samples/sec Loss 6.0533 LearningRate 0.0006 Epoch: 12 Global Step: 256620 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:04:36,135-Speed 6324.51 samples/sec Loss 6.0927 LearningRate 0.0006 Epoch: 12 Global Step: 256630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:39,427-Speed 6222.26 samples/sec Loss 6.1079 LearningRate 0.0006 Epoch: 12 Global Step: 256640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:42,674-Speed 6310.26 samples/sec Loss 6.0819 LearningRate 0.0006 Epoch: 12 Global Step: 256650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:45,917-Speed 6316.48 samples/sec Loss 6.0989 
LearningRate 0.0006 Epoch: 12 Global Step: 256660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:49,160-Speed 6315.59 samples/sec Loss 6.1747 LearningRate 0.0006 Epoch: 12 Global Step: 256670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:52,406-Speed 6310.86 samples/sec Loss 6.1587 LearningRate 0.0006 Epoch: 12 Global Step: 256680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:55,654-Speed 6306.89 samples/sec Loss 6.0787 LearningRate 0.0006 Epoch: 12 Global Step: 256690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:04:58,897-Speed 6316.15 samples/sec Loss 6.0800 LearningRate 0.0006 Epoch: 12 Global Step: 256700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:02,146-Speed 6305.73 samples/sec Loss 6.1016 LearningRate 0.0006 Epoch: 12 Global Step: 256710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:05,392-Speed 6310.49 samples/sec Loss 6.0518 LearningRate 0.0006 Epoch: 12 Global Step: 256720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:08,625-Speed 6335.19 samples/sec Loss 5.9933 LearningRate 0.0006 Epoch: 12 Global Step: 256730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:11,871-Speed 6311.16 samples/sec Loss 6.0723 LearningRate 0.0006 Epoch: 12 Global Step: 256740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:15,119-Speed 6307.47 samples/sec Loss 6.1104 LearningRate 0.0006 Epoch: 12 Global Step: 256750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:18,370-Speed 6300.51 samples/sec Loss 6.0240 LearningRate 0.0006 Epoch: 12 Global Step: 256760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:21,615-Speed 6312.35 samples/sec Loss 6.0452 LearningRate 0.0006 Epoch: 12 Global Step: 256770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:24,862-Speed 6308.42 samples/sec Loss 6.1260 LearningRate 0.0006 Epoch: 12 
Global Step: 256780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:28,107-Speed 6313.05 samples/sec Loss 6.0994 LearningRate 0.0006 Epoch: 12 Global Step: 256790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:31,352-Speed 6313.40 samples/sec Loss 6.0674 LearningRate 0.0006 Epoch: 12 Global Step: 256800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:34,597-Speed 6312.93 samples/sec Loss 6.1250 LearningRate 0.0006 Epoch: 12 Global Step: 256810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:37,843-Speed 6310.01 samples/sec Loss 6.0121 LearningRate 0.0006 Epoch: 12 Global Step: 256820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:41,089-Speed 6311.78 samples/sec Loss 6.0982 LearningRate 0.0006 Epoch: 12 Global Step: 256830 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:05:44,325-Speed 6330.77 samples/sec Loss 6.1093 LearningRate 0.0006 Epoch: 12 Global Step: 256840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:47,570-Speed 6311.38 samples/sec Loss 6.0811 LearningRate 0.0006 Epoch: 12 Global Step: 256850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:50,822-Speed 6299.21 samples/sec Loss 6.0098 LearningRate 0.0006 Epoch: 12 Global Step: 256860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:54,067-Speed 6312.90 samples/sec Loss 6.1192 LearningRate 0.0006 Epoch: 12 Global Step: 256870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:05:57,311-Speed 6314.42 samples/sec Loss 6.0541 LearningRate 0.0006 Epoch: 12 Global Step: 256880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:00,555-Speed 6315.45 samples/sec Loss 6.1403 LearningRate 0.0006 Epoch: 12 Global Step: 256890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:03,802-Speed 6307.75 samples/sec Loss 6.0729 LearningRate 0.0006 Epoch: 12 Global Step: 256900 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:07,048-Speed 6311.21 samples/sec Loss 6.0902 LearningRate 0.0006 Epoch: 12 Global Step: 256910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:10,289-Speed 6320.45 samples/sec Loss 6.0716 LearningRate 0.0006 Epoch: 12 Global Step: 256920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:13,534-Speed 6312.91 samples/sec Loss 6.1507 LearningRate 0.0006 Epoch: 12 Global Step: 256930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:16,769-Speed 6332.81 samples/sec Loss 6.1363 LearningRate 0.0006 Epoch: 12 Global Step: 256940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:20,015-Speed 6309.66 samples/sec Loss 6.0608 LearningRate 0.0006 Epoch: 12 Global Step: 256950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:23,262-Speed 6310.38 samples/sec Loss 6.0240 LearningRate 0.0006 Epoch: 12 Global Step: 256960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:26,506-Speed 6314.32 samples/sec Loss 6.0503 LearningRate 0.0006 Epoch: 12 Global Step: 256970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:29,765-Speed 6285.64 samples/sec Loss 6.1390 LearningRate 0.0006 Epoch: 12 Global Step: 256980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:33,009-Speed 6313.81 samples/sec Loss 6.1490 LearningRate 0.0006 Epoch: 12 Global Step: 256990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:36,258-Speed 6305.88 samples/sec Loss 6.0465 LearningRate 0.0006 Epoch: 12 Global Step: 257000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:39,554-Speed 6214.64 samples/sec Loss 6.1872 LearningRate 0.0006 Epoch: 12 Global Step: 257010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:42,867-Speed 6182.40 samples/sec Loss 6.0804 LearningRate 0.0006 Epoch: 12 Global Step: 257020 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:06:46,116-Speed 6306.19 samples/sec Loss 6.0919 LearningRate 0.0006 Epoch: 12 Global Step: 257030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:49,346-Speed 6342.32 samples/sec Loss 6.0732 LearningRate 0.0006 Epoch: 12 Global Step: 257040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:52,593-Speed 6309.23 samples/sec Loss 6.1044 LearningRate 0.0006 Epoch: 12 Global Step: 257050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:55,841-Speed 6306.72 samples/sec Loss 6.0928 LearningRate 0.0006 Epoch: 12 Global Step: 257060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:06:59,089-Speed 6307.02 samples/sec Loss 6.0519 LearningRate 0.0006 Epoch: 12 Global Step: 257070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:02,336-Speed 6308.39 samples/sec Loss 6.1526 LearningRate 0.0006 Epoch: 12 Global Step: 257080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:05,582-Speed 6310.26 samples/sec Loss 6.0578 LearningRate 0.0006 Epoch: 12 Global Step: 257090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:08,825-Speed 6316.59 samples/sec Loss 6.0852 LearningRate 0.0006 Epoch: 12 Global Step: 257100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:12,075-Speed 6305.15 samples/sec Loss 6.1270 LearningRate 0.0006 Epoch: 12 Global Step: 257110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:15,320-Speed 6312.53 samples/sec Loss 6.1169 LearningRate 0.0006 Epoch: 12 Global Step: 257120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:18,565-Speed 6313.81 samples/sec Loss 6.0263 LearningRate 0.0006 Epoch: 12 Global Step: 257130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:21,811-Speed 6310.08 samples/sec Loss 6.0525 LearningRate 0.0006 Epoch: 12 Global Step: 257140 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 
15:07:25,045-Speed 6334.01 samples/sec Loss 6.0857 LearningRate 0.0006 Epoch: 12 Global Step: 257150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:28,290-Speed 6315.20 samples/sec Loss 6.1603 LearningRate 0.0006 Epoch: 12 Global Step: 257160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:31,537-Speed 6309.10 samples/sec Loss 6.0532 LearningRate 0.0006 Epoch: 12 Global Step: 257170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:34,782-Speed 6311.97 samples/sec Loss 6.0548 LearningRate 0.0006 Epoch: 12 Global Step: 257180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:38,027-Speed 6312.86 samples/sec Loss 6.1824 LearningRate 0.0006 Epoch: 12 Global Step: 257190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:41,273-Speed 6310.40 samples/sec Loss 6.0523 LearningRate 0.0006 Epoch: 12 Global Step: 257200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:44,522-Speed 6304.68 samples/sec Loss 6.1071 LearningRate 0.0006 Epoch: 12 Global Step: 257210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:47,777-Speed 6293.00 samples/sec Loss 6.0879 LearningRate 0.0006 Epoch: 12 Global Step: 257220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:51,026-Speed 6307.17 samples/sec Loss 6.0821 LearningRate 0.0006 Epoch: 12 Global Step: 257230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:54,270-Speed 6314.27 samples/sec Loss 6.0587 LearningRate 0.0006 Epoch: 12 Global Step: 257240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:07:57,502-Speed 6337.89 samples/sec Loss 6.0392 LearningRate 0.0006 Epoch: 12 Global Step: 257250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:00,747-Speed 6312.63 samples/sec Loss 6.1569 LearningRate 0.0006 Epoch: 12 Global Step: 257260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:03,993-Speed 6310.83 
samples/sec Loss 6.0770 LearningRate 0.0006 Epoch: 12 Global Step: 257270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:07,241-Speed 6306.72 samples/sec Loss 6.0623 LearningRate 0.0006 Epoch: 12 Global Step: 257280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:10,487-Speed 6310.70 samples/sec Loss 6.0663 LearningRate 0.0006 Epoch: 12 Global Step: 257290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:13,736-Speed 6304.91 samples/sec Loss 6.0284 LearningRate 0.0006 Epoch: 12 Global Step: 257300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:16,981-Speed 6313.16 samples/sec Loss 6.0656 LearningRate 0.0006 Epoch: 12 Global Step: 257310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:20,225-Speed 6314.86 samples/sec Loss 6.1247 LearningRate 0.0006 Epoch: 12 Global Step: 257320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:23,475-Speed 6301.70 samples/sec Loss 6.0769 LearningRate 0.0006 Epoch: 12 Global Step: 257330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:26,724-Speed 6305.40 samples/sec Loss 5.9906 LearningRate 0.0006 Epoch: 12 Global Step: 257340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:29,956-Speed 6337.95 samples/sec Loss 6.0402 LearningRate 0.0006 Epoch: 12 Global Step: 257350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:33,201-Speed 6313.03 samples/sec Loss 6.0926 LearningRate 0.0006 Epoch: 12 Global Step: 257360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:36,449-Speed 6306.58 samples/sec Loss 6.1289 LearningRate 0.0006 Epoch: 12 Global Step: 257370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:39,695-Speed 6310.72 samples/sec Loss 6.0045 LearningRate 0.0006 Epoch: 12 Global Step: 257380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:42,941-Speed 6310.32 samples/sec Loss 6.0731 
LearningRate 0.0006 Epoch: 12 Global Step: 257390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:46,187-Speed 6311.27 samples/sec Loss 6.0049 LearningRate 0.0006 Epoch: 12 Global Step: 257400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:49,432-Speed 6312.78 samples/sec Loss 6.0818 LearningRate 0.0006 Epoch: 12 Global Step: 257410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:52,677-Speed 6312.52 samples/sec Loss 6.0856 LearningRate 0.0006 Epoch: 12 Global Step: 257420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:55,928-Speed 6301.67 samples/sec Loss 6.2246 LearningRate 0.0006 Epoch: 12 Global Step: 257430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:08:59,172-Speed 6314.03 samples/sec Loss 6.0492 LearningRate 0.0006 Epoch: 12 Global Step: 257440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:02,417-Speed 6314.02 samples/sec Loss 6.0725 LearningRate 0.0006 Epoch: 12 Global Step: 257450 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:09:05,652-Speed 6330.84 samples/sec Loss 6.0013 LearningRate 0.0006 Epoch: 12 Global Step: 257460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:08,896-Speed 6314.83 samples/sec Loss 6.0520 LearningRate 0.0006 Epoch: 12 Global Step: 257470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:12,143-Speed 6308.25 samples/sec Loss 6.1187 LearningRate 0.0006 Epoch: 12 Global Step: 257480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:15,391-Speed 6307.22 samples/sec Loss 6.1970 LearningRate 0.0006 Epoch: 12 Global Step: 257490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:18,636-Speed 6312.12 samples/sec Loss 6.0411 LearningRate 0.0006 Epoch: 12 Global Step: 257500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:21,890-Speed 6295.86 samples/sec Loss 6.0220 LearningRate 0.0006 Epoch: 12 
Global Step: 257510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:25,135-Speed 6312.06 samples/sec Loss 6.1455 LearningRate 0.0006 Epoch: 12 Global Step: 257520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:28,382-Speed 6309.55 samples/sec Loss 6.0492 LearningRate 0.0006 Epoch: 12 Global Step: 257530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:31,629-Speed 6308.87 samples/sec Loss 6.0662 LearningRate 0.0006 Epoch: 12 Global Step: 257540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:34,876-Speed 6308.78 samples/sec Loss 5.9766 LearningRate 0.0006 Epoch: 12 Global Step: 257550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:38,108-Speed 6337.42 samples/sec Loss 6.1165 LearningRate 0.0006 Epoch: 12 Global Step: 257560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:41,354-Speed 6310.66 samples/sec Loss 6.0523 LearningRate 0.0006 Epoch: 12 Global Step: 257570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:44,605-Speed 6300.76 samples/sec Loss 6.0911 LearningRate 0.0006 Epoch: 12 Global Step: 257580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:47,861-Speed 6291.80 samples/sec Loss 6.0684 LearningRate 0.0006 Epoch: 12 Global Step: 257590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:51,110-Speed 6304.19 samples/sec Loss 6.0779 LearningRate 0.0006 Epoch: 12 Global Step: 257600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:54,355-Speed 6313.57 samples/sec Loss 6.0985 LearningRate 0.0006 Epoch: 12 Global Step: 257610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:09:57,602-Speed 6308.55 samples/sec Loss 6.1456 LearningRate 0.0006 Epoch: 12 Global Step: 257620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:00,874-Speed 6260.93 samples/sec Loss 6.1803 LearningRate 0.0006 Epoch: 12 Global Step: 257630 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:04,123-Speed 6305.29 samples/sec Loss 6.1193 LearningRate 0.0006 Epoch: 12 Global Step: 257640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:07,368-Speed 6312.80 samples/sec Loss 6.1002 LearningRate 0.0006 Epoch: 12 Global Step: 257650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:10,613-Speed 6313.26 samples/sec Loss 6.0439 LearningRate 0.0006 Epoch: 12 Global Step: 257660 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:10:13,850-Speed 6328.25 samples/sec Loss 6.1130 LearningRate 0.0006 Epoch: 12 Global Step: 257670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:17,100-Speed 6302.78 samples/sec Loss 6.0683 LearningRate 0.0006 Epoch: 12 Global Step: 257680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:20,348-Speed 6307.14 samples/sec Loss 6.0491 LearningRate 0.0006 Epoch: 12 Global Step: 257690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:23,593-Speed 6312.63 samples/sec Loss 6.0595 LearningRate 0.0006 Epoch: 12 Global Step: 257700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:26,838-Speed 6313.09 samples/sec Loss 6.0848 LearningRate 0.0006 Epoch: 12 Global Step: 257710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:30,089-Speed 6300.30 samples/sec Loss 6.1164 LearningRate 0.0006 Epoch: 12 Global Step: 257720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:33,334-Speed 6313.09 samples/sec Loss 6.1096 LearningRate 0.0006 Epoch: 12 Global Step: 257730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:36,582-Speed 6307.17 samples/sec Loss 6.1191 LearningRate 0.0006 Epoch: 12 Global Step: 257740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:39,826-Speed 6313.25 samples/sec Loss 6.0485 LearningRate 0.0006 Epoch: 12 Global Step: 257750 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:10:43,071-Speed 6314.12 samples/sec Loss 6.1056 LearningRate 0.0006 Epoch: 12 Global Step: 257760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:46,310-Speed 6323.36 samples/sec Loss 6.0007 LearningRate 0.0006 Epoch: 12 Global Step: 257770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:49,554-Speed 6315.07 samples/sec Loss 6.0996 LearningRate 0.0006 Epoch: 12 Global Step: 257780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:52,798-Speed 6314.02 samples/sec Loss 6.0157 LearningRate 0.0006 Epoch: 12 Global Step: 257790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:56,041-Speed 6317.16 samples/sec Loss 6.0690 LearningRate 0.0006 Epoch: 12 Global Step: 257800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:10:59,288-Speed 6309.49 samples/sec Loss 6.0457 LearningRate 0.0006 Epoch: 12 Global Step: 257810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:02,539-Speed 6300.74 samples/sec Loss 6.0579 LearningRate 0.0006 Epoch: 12 Global Step: 257820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:05,785-Speed 6311.17 samples/sec Loss 6.0861 LearningRate 0.0006 Epoch: 12 Global Step: 257830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:09,030-Speed 6311.32 samples/sec Loss 6.0623 LearningRate 0.0006 Epoch: 12 Global Step: 257840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:12,279-Speed 6305.85 samples/sec Loss 6.0003 LearningRate 0.0006 Epoch: 12 Global Step: 257850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:15,523-Speed 6315.03 samples/sec Loss 6.0020 LearningRate 0.0006 Epoch: 12 Global Step: 257860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:18,770-Speed 6308.63 samples/sec Loss 6.0650 LearningRate 0.0006 Epoch: 12 Global Step: 257870 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 
15:11:22,017-Speed 6308.63 samples/sec Loss 5.9739 LearningRate 0.0006 Epoch: 12 Global Step: 257880 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:11:25,256-Speed 6324.79 samples/sec Loss 6.1045 LearningRate 0.0006 Epoch: 12 Global Step: 257890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:28,508-Speed 6299.29 samples/sec Loss 5.9993 LearningRate 0.0006 Epoch: 12 Global Step: 257900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:31,755-Speed 6308.64 samples/sec Loss 6.0799 LearningRate 0.0006 Epoch: 12 Global Step: 257910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:35,000-Speed 6311.96 samples/sec Loss 6.0817 LearningRate 0.0006 Epoch: 12 Global Step: 257920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:38,247-Speed 6308.74 samples/sec Loss 6.0517 LearningRate 0.0006 Epoch: 12 Global Step: 257930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:41,492-Speed 6311.75 samples/sec Loss 6.0846 LearningRate 0.0006 Epoch: 12 Global Step: 257940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:44,737-Speed 6314.01 samples/sec Loss 6.0854 LearningRate 0.0006 Epoch: 12 Global Step: 257950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:47,987-Speed 6301.18 samples/sec Loss 6.0439 LearningRate 0.0006 Epoch: 12 Global Step: 257960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:51,236-Speed 6305.27 samples/sec Loss 6.0656 LearningRate 0.0006 Epoch: 12 Global Step: 257970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:54,486-Speed 6304.13 samples/sec Loss 6.1143 LearningRate 0.0006 Epoch: 12 Global Step: 257980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:11:57,731-Speed 6313.22 samples/sec Loss 6.0470 LearningRate 0.0006 Epoch: 12 Global Step: 257990 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:12:00,962-Speed 6339.01 
samples/sec Loss 6.1236 LearningRate 0.0006 Epoch: 12 Global Step: 258000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:04,215-Speed 6296.47 samples/sec Loss 6.1233 LearningRate 0.0006 Epoch: 12 Global Step: 258010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:07,458-Speed 6317.96 samples/sec Loss 6.0822 LearningRate 0.0006 Epoch: 12 Global Step: 258020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:10,702-Speed 6314.10 samples/sec Loss 6.0464 LearningRate 0.0006 Epoch: 12 Global Step: 258030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:13,951-Speed 6304.90 samples/sec Loss 6.0232 LearningRate 0.0006 Epoch: 12 Global Step: 258040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:17,194-Speed 6316.40 samples/sec Loss 6.0619 LearningRate 0.0006 Epoch: 12 Global Step: 258050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:20,441-Speed 6309.25 samples/sec Loss 6.0728 LearningRate 0.0006 Epoch: 12 Global Step: 258060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:23,690-Speed 6305.44 samples/sec Loss 6.0982 LearningRate 0.0006 Epoch: 12 Global Step: 258070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:26,939-Speed 6305.17 samples/sec Loss 6.0635 LearningRate 0.0006 Epoch: 12 Global Step: 258080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:30,190-Speed 6301.66 samples/sec Loss 6.0527 LearningRate 0.0006 Epoch: 12 Global Step: 258090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:33,434-Speed 6314.32 samples/sec Loss 6.0849 LearningRate 0.0006 Epoch: 12 Global Step: 258100 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:12:36,663-Speed 6343.07 samples/sec Loss 6.1068 LearningRate 0.0006 Epoch: 12 Global Step: 258110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:39,904-Speed 6319.77 samples/sec Loss 6.0652 
LearningRate 0.0006 Epoch: 12 Global Step: 258120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:43,150-Speed 6312.49 samples/sec Loss 6.1131 LearningRate 0.0006 Epoch: 12 Global Step: 258130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:46,399-Speed 6304.26 samples/sec Loss 6.0556 LearningRate 0.0006 Epoch: 12 Global Step: 258140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:49,645-Speed 6309.58 samples/sec Loss 6.0798 LearningRate 0.0006 Epoch: 12 Global Step: 258150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:52,899-Speed 6296.16 samples/sec Loss 6.0984 LearningRate 0.0006 Epoch: 12 Global Step: 258160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:56,141-Speed 6318.32 samples/sec Loss 6.1124 LearningRate 0.0006 Epoch: 12 Global Step: 258170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:12:59,394-Speed 6298.32 samples/sec Loss 6.1267 LearningRate 0.0006 Epoch: 12 Global Step: 258180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:02,642-Speed 6304.99 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 258190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:05,891-Speed 6304.62 samples/sec Loss 6.1012 LearningRate 0.0006 Epoch: 12 Global Step: 258200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:09,120-Speed 6343.78 samples/sec Loss 6.0450 LearningRate 0.0006 Epoch: 12 Global Step: 258210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:12,367-Speed 6309.69 samples/sec Loss 6.1294 LearningRate 0.0006 Epoch: 12 Global Step: 258220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:15,609-Speed 6317.92 samples/sec Loss 6.0410 LearningRate 0.0006 Epoch: 12 Global Step: 258230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:18,850-Speed 6320.01 samples/sec Loss 6.0432 LearningRate 0.0006 Epoch: 12 
Global Step: 258240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:22,098-Speed 6306.93 samples/sec Loss 6.0807 LearningRate 0.0006 Epoch: 12 Global Step: 258250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:25,344-Speed 6312.10 samples/sec Loss 6.0269 LearningRate 0.0006 Epoch: 12 Global Step: 258260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:28,593-Speed 6304.91 samples/sec Loss 6.0058 LearningRate 0.0006 Epoch: 12 Global Step: 258270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:31,837-Speed 6314.64 samples/sec Loss 6.0509 LearningRate 0.0006 Epoch: 12 Global Step: 258280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:35,083-Speed 6311.25 samples/sec Loss 6.0463 LearningRate 0.0006 Epoch: 12 Global Step: 258290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:38,334-Speed 6300.75 samples/sec Loss 6.0252 LearningRate 0.0006 Epoch: 12 Global Step: 258300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:41,563-Speed 6344.18 samples/sec Loss 6.0820 LearningRate 0.0006 Epoch: 12 Global Step: 258310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:44,807-Speed 6313.71 samples/sec Loss 6.1307 LearningRate 0.0006 Epoch: 12 Global Step: 258320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:48,053-Speed 6311.83 samples/sec Loss 6.1093 LearningRate 0.0006 Epoch: 12 Global Step: 258330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:51,298-Speed 6311.73 samples/sec Loss 6.0724 LearningRate 0.0006 Epoch: 12 Global Step: 258340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:54,545-Speed 6308.88 samples/sec Loss 6.2115 LearningRate 0.0006 Epoch: 12 Global Step: 258350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:13:57,788-Speed 6317.50 samples/sec Loss 6.0955 LearningRate 0.0006 Epoch: 12 Global Step: 258360 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:01,033-Speed 6312.01 samples/sec Loss 6.0091 LearningRate 0.0006 Epoch: 12 Global Step: 258370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:04,284-Speed 6300.77 samples/sec Loss 6.0705 LearningRate 0.0006 Epoch: 12 Global Step: 258380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:07,531-Speed 6309.58 samples/sec Loss 6.0499 LearningRate 0.0006 Epoch: 12 Global Step: 258390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:10,777-Speed 6309.87 samples/sec Loss 6.1048 LearningRate 0.0006 Epoch: 12 Global Step: 258400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:14,018-Speed 6321.71 samples/sec Loss 6.0427 LearningRate 0.0006 Epoch: 12 Global Step: 258410 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:14:17,252-Speed 6333.22 samples/sec Loss 6.0059 LearningRate 0.0006 Epoch: 12 Global Step: 258420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:20,498-Speed 6309.53 samples/sec Loss 6.0861 LearningRate 0.0006 Epoch: 12 Global Step: 258430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:23,747-Speed 6306.13 samples/sec Loss 6.0603 LearningRate 0.0006 Epoch: 12 Global Step: 258440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:26,997-Speed 6302.15 samples/sec Loss 6.0732 LearningRate 0.0006 Epoch: 12 Global Step: 258450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:30,245-Speed 6308.38 samples/sec Loss 6.0443 LearningRate 0.0006 Epoch: 12 Global Step: 258460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:33,490-Speed 6311.39 samples/sec Loss 6.1007 LearningRate 0.0006 Epoch: 12 Global Step: 258470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:36,735-Speed 6312.77 samples/sec Loss 6.0623 LearningRate 0.0006 Epoch: 12 Global Step: 258480 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:14:39,983-Speed 6307.44 samples/sec Loss 6.0670 LearningRate 0.0006 Epoch: 12 Global Step: 258490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:43,231-Speed 6306.38 samples/sec Loss 6.0846 LearningRate 0.0006 Epoch: 12 Global Step: 258500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:46,476-Speed 6313.08 samples/sec Loss 5.9918 LearningRate 0.0006 Epoch: 12 Global Step: 258510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:49,726-Speed 6303.03 samples/sec Loss 6.0408 LearningRate 0.0006 Epoch: 12 Global Step: 258520 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:14:52,956-Speed 6343.87 samples/sec Loss 6.0773 LearningRate 0.0006 Epoch: 12 Global Step: 258530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:56,203-Speed 6308.15 samples/sec Loss 6.0465 LearningRate 0.0006 Epoch: 12 Global Step: 258540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:14:59,447-Speed 6315.13 samples/sec Loss 6.1063 LearningRate 0.0006 Epoch: 12 Global Step: 258550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:02,697-Speed 6302.03 samples/sec Loss 6.0569 LearningRate 0.0006 Epoch: 12 Global Step: 258560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:05,943-Speed 6311.44 samples/sec Loss 6.0866 LearningRate 0.0006 Epoch: 12 Global Step: 258570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:09,188-Speed 6311.22 samples/sec Loss 6.0385 LearningRate 0.0006 Epoch: 12 Global Step: 258580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:12,446-Speed 6289.21 samples/sec Loss 5.9784 LearningRate 0.0006 Epoch: 12 Global Step: 258590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:15,693-Speed 6307.81 samples/sec Loss 6.1000 LearningRate 0.0006 Epoch: 12 Global Step: 258600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:15:18,937-Speed 6314.58 samples/sec Loss 6.1304 LearningRate 0.0006 Epoch: 12 Global Step: 258610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:22,183-Speed 6311.28 samples/sec Loss 6.1649 LearningRate 0.0006 Epoch: 12 Global Step: 258620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:25,416-Speed 6338.70 samples/sec Loss 6.1712 LearningRate 0.0006 Epoch: 12 Global Step: 258630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:28,666-Speed 6303.59 samples/sec Loss 6.0984 LearningRate 0.0006 Epoch: 12 Global Step: 258640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:31,915-Speed 6305.09 samples/sec Loss 6.1144 LearningRate 0.0006 Epoch: 12 Global Step: 258650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:35,164-Speed 6304.32 samples/sec Loss 6.0810 LearningRate 0.0006 Epoch: 12 Global Step: 258660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:38,406-Speed 6317.51 samples/sec Loss 6.0693 LearningRate 0.0006 Epoch: 12 Global Step: 258670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:41,651-Speed 6314.06 samples/sec Loss 6.0455 LearningRate 0.0006 Epoch: 12 Global Step: 258680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:44,893-Speed 6318.34 samples/sec Loss 6.0146 LearningRate 0.0006 Epoch: 12 Global Step: 258690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:48,138-Speed 6312.77 samples/sec Loss 6.1252 LearningRate 0.0006 Epoch: 12 Global Step: 258700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:51,383-Speed 6313.38 samples/sec Loss 6.1923 LearningRate 0.0006 Epoch: 12 Global Step: 258710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:54,625-Speed 6319.32 samples/sec Loss 6.0219 LearningRate 0.0006 Epoch: 12 Global Step: 258720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:15:57,868-Speed 6314.60 
samples/sec Loss 6.0689 LearningRate 0.0006 Epoch: 12 Global Step: 258730 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:16:01,106-Speed 6327.26 samples/sec Loss 6.0951 LearningRate 0.0006 Epoch: 12 Global Step: 258740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:04,353-Speed 6309.01 samples/sec Loss 6.0563 LearningRate 0.0006 Epoch: 12 Global Step: 258750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:07,602-Speed 6304.91 samples/sec Loss 6.0597 LearningRate 0.0006 Epoch: 12 Global Step: 258760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:10,847-Speed 6311.63 samples/sec Loss 6.0985 LearningRate 0.0006 Epoch: 12 Global Step: 258770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:14,090-Speed 6316.69 samples/sec Loss 5.9962 LearningRate 0.0006 Epoch: 12 Global Step: 258780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:17,336-Speed 6311.74 samples/sec Loss 6.0180 LearningRate 0.0006 Epoch: 12 Global Step: 258790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:20,585-Speed 6303.76 samples/sec Loss 6.0109 LearningRate 0.0006 Epoch: 12 Global Step: 258800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:23,832-Speed 6309.31 samples/sec Loss 6.0896 LearningRate 0.0006 Epoch: 12 Global Step: 258810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:27,077-Speed 6312.37 samples/sec Loss 6.0169 LearningRate 0.0006 Epoch: 12 Global Step: 258820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:30,323-Speed 6311.57 samples/sec Loss 6.0276 LearningRate 0.0006 Epoch: 12 Global Step: 258830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:33,556-Speed 6335.63 samples/sec Loss 6.1163 LearningRate 0.0006 Epoch: 12 Global Step: 258840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:36,801-Speed 6313.06 samples/sec Loss 6.0445 
LearningRate 0.0006 Epoch: 12 Global Step: 258850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:40,046-Speed 6312.62 samples/sec Loss 6.0855 LearningRate 0.0006 Epoch: 12 Global Step: 258860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:43,289-Speed 6316.00 samples/sec Loss 6.1304 LearningRate 0.0006 Epoch: 12 Global Step: 258870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:46,537-Speed 6306.68 samples/sec Loss 6.1155 LearningRate 0.0006 Epoch: 12 Global Step: 258880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:49,784-Speed 6308.47 samples/sec Loss 6.0468 LearningRate 0.0006 Epoch: 12 Global Step: 258890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:53,036-Speed 6300.46 samples/sec Loss 6.0861 LearningRate 0.0006 Epoch: 12 Global Step: 258900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:56,281-Speed 6311.46 samples/sec Loss 6.0595 LearningRate 0.0006 Epoch: 12 Global Step: 258910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:16:59,530-Speed 6305.35 samples/sec Loss 6.0623 LearningRate 0.0006 Epoch: 12 Global Step: 258920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:02,774-Speed 6315.41 samples/sec Loss 6.0148 LearningRate 0.0006 Epoch: 12 Global Step: 258930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:06,027-Speed 6296.98 samples/sec Loss 6.1345 LearningRate 0.0006 Epoch: 12 Global Step: 258940 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:17:09,261-Speed 6334.82 samples/sec Loss 6.0473 LearningRate 0.0006 Epoch: 12 Global Step: 258950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:12,505-Speed 6315.51 samples/sec Loss 6.0884 LearningRate 0.0006 Epoch: 12 Global Step: 258960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:15,753-Speed 6306.29 samples/sec Loss 6.0675 LearningRate 0.0006 Epoch: 12 
Global Step: 258970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:18,999-Speed 6309.78 samples/sec Loss 6.0105 LearningRate 0.0006 Epoch: 12 Global Step: 258980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:22,245-Speed 6310.76 samples/sec Loss 6.0749 LearningRate 0.0006 Epoch: 12 Global Step: 258990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:25,492-Speed 6309.70 samples/sec Loss 6.1149 LearningRate 0.0006 Epoch: 12 Global Step: 259000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:28,738-Speed 6310.66 samples/sec Loss 6.0979 LearningRate 0.0006 Epoch: 12 Global Step: 259010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:31,984-Speed 6310.23 samples/sec Loss 6.0304 LearningRate 0.0006 Epoch: 12 Global Step: 259020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:35,232-Speed 6305.78 samples/sec Loss 6.0148 LearningRate 0.0006 Epoch: 12 Global Step: 259030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:38,487-Speed 6294.69 samples/sec Loss 6.1041 LearningRate 0.0006 Epoch: 12 Global Step: 259040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:41,734-Speed 6307.68 samples/sec Loss 6.1276 LearningRate 0.0006 Epoch: 12 Global Step: 259050 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:17:44,964-Speed 6341.37 samples/sec Loss 6.1155 LearningRate 0.0006 Epoch: 12 Global Step: 259060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:48,220-Speed 6291.76 samples/sec Loss 6.0598 LearningRate 0.0006 Epoch: 12 Global Step: 259070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:51,463-Speed 6316.29 samples/sec Loss 6.0858 LearningRate 0.0006 Epoch: 12 Global Step: 259080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:54,713-Speed 6304.53 samples/sec Loss 6.0537 LearningRate 0.0006 Epoch: 12 Global Step: 259090 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:17:57,958-Speed 6311.74 samples/sec Loss 5.9899 LearningRate 0.0006 Epoch: 12 Global Step: 259100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:01,206-Speed 6308.03 samples/sec Loss 6.0495 LearningRate 0.0006 Epoch: 12 Global Step: 259110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:04,451-Speed 6312.10 samples/sec Loss 6.1012 LearningRate 0.0006 Epoch: 12 Global Step: 259120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:07,701-Speed 6302.12 samples/sec Loss 6.0221 LearningRate 0.0006 Epoch: 12 Global Step: 259130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:10,949-Speed 6308.15 samples/sec Loss 6.0859 LearningRate 0.0006 Epoch: 12 Global Step: 259140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:14,206-Speed 6289.99 samples/sec Loss 6.1013 LearningRate 0.0006 Epoch: 12 Global Step: 259150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:17,436-Speed 6342.20 samples/sec Loss 6.0163 LearningRate 0.0006 Epoch: 12 Global Step: 259160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:20,688-Speed 6297.98 samples/sec Loss 6.0874 LearningRate 0.0006 Epoch: 12 Global Step: 259170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:23,969-Speed 6247.82 samples/sec Loss 6.0726 LearningRate 0.0006 Epoch: 12 Global Step: 259180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:27,215-Speed 6309.91 samples/sec Loss 6.0507 LearningRate 0.0006 Epoch: 12 Global Step: 259190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:30,460-Speed 6313.88 samples/sec Loss 6.0251 LearningRate 0.0006 Epoch: 12 Global Step: 259200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:33,705-Speed 6311.38 samples/sec Loss 6.0324 LearningRate 0.0006 Epoch: 12 Global Step: 259210 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:18:36,955-Speed 6303.19 samples/sec Loss 6.0514 LearningRate 0.0006 Epoch: 12 Global Step: 259220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:40,198-Speed 6316.58 samples/sec Loss 6.0875 LearningRate 0.0006 Epoch: 12 Global Step: 259230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:43,446-Speed 6307.27 samples/sec Loss 6.0137 LearningRate 0.0006 Epoch: 12 Global Step: 259240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:46,691-Speed 6312.33 samples/sec Loss 6.0516 LearningRate 0.0006 Epoch: 12 Global Step: 259250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:49,927-Speed 6329.54 samples/sec Loss 6.0060 LearningRate 0.0006 Epoch: 12 Global Step: 259260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:53,174-Speed 6308.78 samples/sec Loss 6.0483 LearningRate 0.0006 Epoch: 12 Global Step: 259270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:56,422-Speed 6307.11 samples/sec Loss 6.0052 LearningRate 0.0006 Epoch: 12 Global Step: 259280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:18:59,670-Speed 6307.14 samples/sec Loss 6.0193 LearningRate 0.0006 Epoch: 12 Global Step: 259290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:02,915-Speed 6313.55 samples/sec Loss 6.0017 LearningRate 0.0006 Epoch: 12 Global Step: 259300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:06,161-Speed 6310.17 samples/sec Loss 6.0586 LearningRate 0.0006 Epoch: 12 Global Step: 259310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:09,403-Speed 6317.95 samples/sec Loss 6.1451 LearningRate 0.0006 Epoch: 12 Global Step: 259320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:12,650-Speed 6308.84 samples/sec Loss 6.0888 LearningRate 0.0006 Epoch: 12 Global Step: 259330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:19:15,893-Speed 6317.26 samples/sec Loss 6.1170 LearningRate 0.0006 Epoch: 12 Global Step: 259340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:19,141-Speed 6306.89 samples/sec Loss 6.0384 LearningRate 0.0006 Epoch: 12 Global Step: 259350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:22,372-Speed 6340.46 samples/sec Loss 6.0194 LearningRate 0.0006 Epoch: 12 Global Step: 259360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:25,616-Speed 6318.23 samples/sec Loss 6.0892 LearningRate 0.0006 Epoch: 12 Global Step: 259370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:28,862-Speed 6309.39 samples/sec Loss 6.0549 LearningRate 0.0006 Epoch: 12 Global Step: 259380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:32,108-Speed 6311.15 samples/sec Loss 6.1355 LearningRate 0.0006 Epoch: 12 Global Step: 259390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:35,358-Speed 6302.87 samples/sec Loss 6.0902 LearningRate 0.0006 Epoch: 12 Global Step: 259400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:38,606-Speed 6307.45 samples/sec Loss 6.0964 LearningRate 0.0006 Epoch: 12 Global Step: 259410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:41,846-Speed 6321.93 samples/sec Loss 6.1409 LearningRate 0.0006 Epoch: 12 Global Step: 259420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:45,089-Speed 6316.56 samples/sec Loss 6.1116 LearningRate 0.0006 Epoch: 12 Global Step: 259430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:48,335-Speed 6311.03 samples/sec Loss 6.0522 LearningRate 0.0006 Epoch: 12 Global Step: 259440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:51,639-Speed 6199.45 samples/sec Loss 6.0754 LearningRate 0.0006 Epoch: 12 Global Step: 259450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:19:54,920-Speed 6243.07 
samples/sec Loss 6.0482 LearningRate 0.0006 Epoch: 12 Global Step: 259460 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:19:58,146-Speed 6349.55 samples/sec Loss 6.0899 LearningRate 0.0006 Epoch: 12 Global Step: 259470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:01,394-Speed 6307.17 samples/sec Loss 6.0496 LearningRate 0.0006 Epoch: 12 Global Step: 259480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:04,697-Speed 6202.26 samples/sec Loss 6.0601 LearningRate 0.0006 Epoch: 12 Global Step: 259490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:07,942-Speed 6312.84 samples/sec Loss 5.9613 LearningRate 0.0006 Epoch: 12 Global Step: 259500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:11,189-Speed 6307.89 samples/sec Loss 6.0618 LearningRate 0.0006 Epoch: 12 Global Step: 259510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:14,430-Speed 6322.11 samples/sec Loss 6.1231 LearningRate 0.0006 Epoch: 12 Global Step: 259520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:17,673-Speed 6314.66 samples/sec Loss 6.0537 LearningRate 0.0006 Epoch: 12 Global Step: 259530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:20,917-Speed 6316.05 samples/sec Loss 5.9965 LearningRate 0.0006 Epoch: 12 Global Step: 259540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:24,164-Speed 6309.07 samples/sec Loss 6.0267 LearningRate 0.0006 Epoch: 12 Global Step: 259550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:27,407-Speed 6317.40 samples/sec Loss 6.0272 LearningRate 0.0006 Epoch: 12 Global Step: 259560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:30,637-Speed 6340.83 samples/sec Loss 6.0872 LearningRate 0.0006 Epoch: 12 Global Step: 259570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:33,880-Speed 6316.89 samples/sec Loss 5.9507 
LearningRate 0.0006 Epoch: 12 Global Step: 259580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:37,126-Speed 6311.65 samples/sec Loss 5.9919 LearningRate 0.0006 Epoch: 12 Global Step: 259590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:40,371-Speed 6311.08 samples/sec Loss 6.0119 LearningRate 0.0006 Epoch: 12 Global Step: 259600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:43,614-Speed 6318.16 samples/sec Loss 6.0009 LearningRate 0.0006 Epoch: 12 Global Step: 259610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:46,858-Speed 6313.44 samples/sec Loss 6.0123 LearningRate 0.0006 Epoch: 12 Global Step: 259620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:50,151-Speed 6220.41 samples/sec Loss 6.0208 LearningRate 0.0006 Epoch: 12 Global Step: 259630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:53,454-Speed 6202.29 samples/sec Loss 5.9670 LearningRate 0.0006 Epoch: 12 Global Step: 259640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:56,702-Speed 6308.19 samples/sec Loss 6.0918 LearningRate 0.0006 Epoch: 12 Global Step: 259650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:20:59,946-Speed 6314.97 samples/sec Loss 6.0440 LearningRate 0.0006 Epoch: 12 Global Step: 259660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:03,178-Speed 6337.96 samples/sec Loss 6.0382 LearningRate 0.0006 Epoch: 12 Global Step: 259670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:06,427-Speed 6304.22 samples/sec Loss 6.1145 LearningRate 0.0006 Epoch: 12 Global Step: 259680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:09,673-Speed 6310.28 samples/sec Loss 6.1088 LearningRate 0.0006 Epoch: 12 Global Step: 259690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:12,917-Speed 6315.66 samples/sec Loss 6.1220 LearningRate 0.0006 Epoch: 12 
Global Step: 259700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:16,163-Speed 6308.89 samples/sec Loss 6.0501 LearningRate 0.0006 Epoch: 12 Global Step: 259710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:19,409-Speed 6312.19 samples/sec Loss 6.0577 LearningRate 0.0006 Epoch: 12 Global Step: 259720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:22,652-Speed 6316.02 samples/sec Loss 5.9492 LearningRate 0.0006 Epoch: 12 Global Step: 259730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:25,898-Speed 6310.72 samples/sec Loss 6.1138 LearningRate 0.0006 Epoch: 12 Global Step: 259740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:29,142-Speed 6313.86 samples/sec Loss 6.0211 LearningRate 0.0006 Epoch: 12 Global Step: 259750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:32,387-Speed 6313.82 samples/sec Loss 6.0615 LearningRate 0.0006 Epoch: 12 Global Step: 259760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:35,630-Speed 6317.86 samples/sec Loss 6.0580 LearningRate 0.0006 Epoch: 12 Global Step: 259770 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:21:38,863-Speed 6335.34 samples/sec Loss 6.1754 LearningRate 0.0006 Epoch: 12 Global Step: 259780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:42,112-Speed 6305.06 samples/sec Loss 6.0687 LearningRate 0.0006 Epoch: 12 Global Step: 259790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:45,357-Speed 6311.56 samples/sec Loss 6.0632 LearningRate 0.0006 Epoch: 12 Global Step: 259800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:48,611-Speed 6296.58 samples/sec Loss 6.0671 LearningRate 0.0006 Epoch: 12 Global Step: 259810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:51,859-Speed 6305.97 samples/sec Loss 6.0875 LearningRate 0.0006 Epoch: 12 Global Step: 259820 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:55,107-Speed 6306.20 samples/sec Loss 6.0946 LearningRate 0.0006 Epoch: 12 Global Step: 259830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:21:58,355-Speed 6307.73 samples/sec Loss 6.1477 LearningRate 0.0006 Epoch: 12 Global Step: 259840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:01,598-Speed 6316.49 samples/sec Loss 6.0975 LearningRate 0.0006 Epoch: 12 Global Step: 259850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:04,842-Speed 6314.12 samples/sec Loss 6.0369 LearningRate 0.0006 Epoch: 12 Global Step: 259860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:08,089-Speed 6309.26 samples/sec Loss 6.1428 LearningRate 0.0006 Epoch: 12 Global Step: 259870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:11,336-Speed 6309.63 samples/sec Loss 6.0024 LearningRate 0.0006 Epoch: 12 Global Step: 259880 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:22:14,587-Speed 6299.21 samples/sec Loss 6.1281 LearningRate 0.0006 Epoch: 12 Global Step: 259890 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:22:17,820-Speed 6337.68 samples/sec Loss 6.1013 LearningRate 0.0006 Epoch: 12 Global Step: 259900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:21,065-Speed 6311.82 samples/sec Loss 6.1065 LearningRate 0.0006 Epoch: 12 Global Step: 259910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:24,324-Speed 6286.26 samples/sec Loss 6.1509 LearningRate 0.0006 Epoch: 12 Global Step: 259920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:27,565-Speed 6318.89 samples/sec Loss 6.0715 LearningRate 0.0006 Epoch: 12 Global Step: 259930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:30,810-Speed 6312.59 samples/sec Loss 6.1105 LearningRate 0.0006 Epoch: 12 Global Step: 259940 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:22:34,057-Speed 6310.06 samples/sec Loss 6.1202 LearningRate 0.0006 Epoch: 12 Global Step: 259950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:37,303-Speed 6311.47 samples/sec Loss 6.0767 LearningRate 0.0006 Epoch: 12 Global Step: 259960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:40,547-Speed 6315.14 samples/sec Loss 5.9904 LearningRate 0.0006 Epoch: 12 Global Step: 259970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:43,790-Speed 6316.36 samples/sec Loss 6.0795 LearningRate 0.0006 Epoch: 12 Global Step: 259980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:47,037-Speed 6308.99 samples/sec Loss 6.0001 LearningRate 0.0006 Epoch: 12 Global Step: 259990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:50,266-Speed 6343.08 samples/sec Loss 5.9633 LearningRate 0.0006 Epoch: 12 Global Step: 260000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:53,513-Speed 6308.02 samples/sec Loss 6.0958 LearningRate 0.0006 Epoch: 12 Global Step: 260010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:22:56,758-Speed 6314.10 samples/sec Loss 6.0446 LearningRate 0.0006 Epoch: 12 Global Step: 260020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:00,011-Speed 6296.06 samples/sec Loss 6.0467 LearningRate 0.0006 Epoch: 12 Global Step: 260030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:03,255-Speed 6315.34 samples/sec Loss 6.1226 LearningRate 0.0006 Epoch: 12 Global Step: 260040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:06,498-Speed 6315.85 samples/sec Loss 6.0833 LearningRate 0.0006 Epoch: 12 Global Step: 260050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:09,747-Speed 6306.47 samples/sec Loss 6.0862 LearningRate 0.0006 Epoch: 12 Global Step: 260060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:23:12,992-Speed 6312.40 samples/sec Loss 6.1080 LearningRate 0.0006 Epoch: 12 Global Step: 260070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:16,240-Speed 6305.79 samples/sec Loss 6.0341 LearningRate 0.0006 Epoch: 12 Global Step: 260080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:19,486-Speed 6312.02 samples/sec Loss 6.0501 LearningRate 0.0006 Epoch: 12 Global Step: 260090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:22,714-Speed 6344.86 samples/sec Loss 6.0917 LearningRate 0.0006 Epoch: 12 Global Step: 260100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:25,958-Speed 6314.87 samples/sec Loss 6.0428 LearningRate 0.0006 Epoch: 12 Global Step: 260110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:29,202-Speed 6314.62 samples/sec Loss 6.1433 LearningRate 0.0006 Epoch: 12 Global Step: 260120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:32,448-Speed 6310.63 samples/sec Loss 6.0739 LearningRate 0.0006 Epoch: 12 Global Step: 260130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:35,696-Speed 6307.87 samples/sec Loss 6.1188 LearningRate 0.0006 Epoch: 12 Global Step: 260140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:38,955-Speed 6285.37 samples/sec Loss 6.0632 LearningRate 0.0006 Epoch: 12 Global Step: 260150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:42,196-Speed 6319.00 samples/sec Loss 6.0655 LearningRate 0.0006 Epoch: 12 Global Step: 260160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:45,442-Speed 6312.24 samples/sec Loss 6.0642 LearningRate 0.0006 Epoch: 12 Global Step: 260170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:48,689-Speed 6311.30 samples/sec Loss 6.0690 LearningRate 0.0006 Epoch: 12 Global Step: 260180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:51,933-Speed 6314.90 
samples/sec Loss 6.0636 LearningRate 0.0006 Epoch: 12 Global Step: 260190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:55,164-Speed 6339.59 samples/sec Loss 6.0405 LearningRate 0.0006 Epoch: 12 Global Step: 260200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:23:58,411-Speed 6308.22 samples/sec Loss 6.0559 LearningRate 0.0006 Epoch: 12 Global Step: 260210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:01,661-Speed 6304.00 samples/sec Loss 6.0283 LearningRate 0.0006 Epoch: 12 Global Step: 260220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:04,914-Speed 6296.42 samples/sec Loss 6.0509 LearningRate 0.0006 Epoch: 12 Global Step: 260230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:08,165-Speed 6302.08 samples/sec Loss 6.1164 LearningRate 0.0006 Epoch: 12 Global Step: 260240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:11,418-Speed 6295.58 samples/sec Loss 6.0516 LearningRate 0.0006 Epoch: 12 Global Step: 260250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:14,661-Speed 6317.71 samples/sec Loss 6.0743 LearningRate 0.0006 Epoch: 12 Global Step: 260260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:17,909-Speed 6307.18 samples/sec Loss 6.0902 LearningRate 0.0006 Epoch: 12 Global Step: 260270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:21,157-Speed 6305.19 samples/sec Loss 6.0235 LearningRate 0.0006 Epoch: 12 Global Step: 260280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:24,406-Speed 6306.12 samples/sec Loss 6.0524 LearningRate 0.0006 Epoch: 12 Global Step: 260290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:27,635-Speed 6342.26 samples/sec Loss 6.0503 LearningRate 0.0006 Epoch: 12 Global Step: 260300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:30,880-Speed 6313.13 samples/sec Loss 5.9685 
LearningRate 0.0006 Epoch: 12 Global Step: 260310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:34,128-Speed 6307.88 samples/sec Loss 6.0499 LearningRate 0.0006 Epoch: 12 Global Step: 260320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:37,371-Speed 6317.19 samples/sec Loss 6.0908 LearningRate 0.0006 Epoch: 12 Global Step: 260330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:40,620-Speed 6304.55 samples/sec Loss 6.0539 LearningRate 0.0006 Epoch: 12 Global Step: 260340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:43,868-Speed 6305.47 samples/sec Loss 6.0855 LearningRate 0.0006 Epoch: 12 Global Step: 260350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:47,115-Speed 6309.90 samples/sec Loss 6.0138 LearningRate 0.0006 Epoch: 12 Global Step: 260360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:50,361-Speed 6310.48 samples/sec Loss 5.9971 LearningRate 0.0006 Epoch: 12 Global Step: 260370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:53,612-Speed 6300.76 samples/sec Loss 6.1114 LearningRate 0.0006 Epoch: 12 Global Step: 260380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:24:56,864-Speed 6298.35 samples/sec Loss 5.9498 LearningRate 0.0006 Epoch: 12 Global Step: 260390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:00,112-Speed 6307.47 samples/sec Loss 6.0331 LearningRate 0.0006 Epoch: 12 Global Step: 260400 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:25:03,346-Speed 6335.14 samples/sec Loss 6.0167 LearningRate 0.0006 Epoch: 12 Global Step: 260410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:06,591-Speed 6313.68 samples/sec Loss 6.0584 LearningRate 0.0006 Epoch: 12 Global Step: 260420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:09,834-Speed 6316.00 samples/sec Loss 6.0610 LearningRate 0.0006 Epoch: 12 
Global Step: 260430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:13,085-Speed 6300.62 samples/sec Loss 6.0170 LearningRate 0.0006 Epoch: 12 Global Step: 260440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:16,332-Speed 6307.69 samples/sec Loss 6.0545 LearningRate 0.0006 Epoch: 12 Global Step: 260450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:19,580-Speed 6308.04 samples/sec Loss 6.1069 LearningRate 0.0006 Epoch: 12 Global Step: 260460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:22,830-Speed 6303.00 samples/sec Loss 6.0694 LearningRate 0.0006 Epoch: 12 Global Step: 260470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:26,075-Speed 6311.73 samples/sec Loss 6.0609 LearningRate 0.0006 Epoch: 12 Global Step: 260480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:29,321-Speed 6311.13 samples/sec Loss 6.0556 LearningRate 0.0006 Epoch: 12 Global Step: 260490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:32,562-Speed 6320.17 samples/sec Loss 6.0024 LearningRate 0.0006 Epoch: 12 Global Step: 260500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:35,807-Speed 6314.17 samples/sec Loss 6.0738 LearningRate 0.0006 Epoch: 12 Global Step: 260510 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:25:39,042-Speed 6331.03 samples/sec Loss 5.9828 LearningRate 0.0006 Epoch: 12 Global Step: 260520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:42,284-Speed 6318.25 samples/sec Loss 6.0116 LearningRate 0.0006 Epoch: 12 Global Step: 260530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:45,533-Speed 6305.21 samples/sec Loss 6.0166 LearningRate 0.0006 Epoch: 12 Global Step: 260540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:48,783-Speed 6302.94 samples/sec Loss 5.9876 LearningRate 0.0006 Epoch: 12 Global Step: 260550 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:52,045-Speed 6280.16 samples/sec Loss 6.0477 LearningRate 0.0006 Epoch: 12 Global Step: 260560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:55,291-Speed 6309.36 samples/sec Loss 6.0512 LearningRate 0.0006 Epoch: 12 Global Step: 260570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:25:58,536-Speed 6313.31 samples/sec Loss 6.1106 LearningRate 0.0006 Epoch: 12 Global Step: 260580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:01,784-Speed 6307.37 samples/sec Loss 6.0529 LearningRate 0.0006 Epoch: 12 Global Step: 260590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:05,027-Speed 6316.46 samples/sec Loss 6.0270 LearningRate 0.0006 Epoch: 12 Global Step: 260600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:08,275-Speed 6307.57 samples/sec Loss 6.0182 LearningRate 0.0006 Epoch: 12 Global Step: 260610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:11,522-Speed 6307.96 samples/sec Loss 6.0879 LearningRate 0.0006 Epoch: 12 Global Step: 260620 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:26:14,769-Speed 6310.06 samples/sec Loss 6.0479 LearningRate 0.0006 Epoch: 12 Global Step: 260630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:18,014-Speed 6312.19 samples/sec Loss 6.0661 LearningRate 0.0006 Epoch: 12 Global Step: 260640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:21,262-Speed 6307.20 samples/sec Loss 5.9520 LearningRate 0.0006 Epoch: 12 Global Step: 260650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:24,509-Speed 6308.68 samples/sec Loss 6.0821 LearningRate 0.0006 Epoch: 12 Global Step: 260660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:27,752-Speed 6316.68 samples/sec Loss 6.0660 LearningRate 0.0006 Epoch: 12 Global Step: 260670 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:26:31,001-Speed 6304.98 samples/sec Loss 6.0243 LearningRate 0.0006 Epoch: 12 Global Step: 260680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:34,244-Speed 6316.84 samples/sec Loss 6.0551 LearningRate 0.0006 Epoch: 12 Global Step: 260690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:37,491-Speed 6309.18 samples/sec Loss 6.0229 LearningRate 0.0006 Epoch: 12 Global Step: 260700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:40,734-Speed 6315.49 samples/sec Loss 6.0761 LearningRate 0.0006 Epoch: 12 Global Step: 260710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:43,984-Speed 6303.25 samples/sec Loss 6.0473 LearningRate 0.0006 Epoch: 12 Global Step: 260720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:47,216-Speed 6336.72 samples/sec Loss 6.1075 LearningRate 0.0006 Epoch: 12 Global Step: 260730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:50,460-Speed 6315.95 samples/sec Loss 6.0403 LearningRate 0.0006 Epoch: 12 Global Step: 260740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:53,712-Speed 6299.57 samples/sec Loss 5.9865 LearningRate 0.0006 Epoch: 12 Global Step: 260750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:26:56,960-Speed 6305.35 samples/sec Loss 6.0671 LearningRate 0.0006 Epoch: 12 Global Step: 260760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:00,220-Speed 6283.76 samples/sec Loss 6.0257 LearningRate 0.0006 Epoch: 12 Global Step: 260770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:03,469-Speed 6305.79 samples/sec Loss 6.1878 LearningRate 0.0006 Epoch: 12 Global Step: 260780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:06,712-Speed 6317.16 samples/sec Loss 6.0206 LearningRate 0.0006 Epoch: 12 Global Step: 260790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:27:09,958-Speed 6309.22 samples/sec Loss 6.0476 LearningRate 0.0006 Epoch: 12 Global Step: 260800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:13,204-Speed 6311.54 samples/sec Loss 5.9892 LearningRate 0.0006 Epoch: 12 Global Step: 260810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:16,514-Speed 6188.24 samples/sec Loss 6.1029 LearningRate 0.0006 Epoch: 12 Global Step: 260820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:19,748-Speed 6333.78 samples/sec Loss 5.9800 LearningRate 0.0006 Epoch: 12 Global Step: 260830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:22,997-Speed 6306.92 samples/sec Loss 6.0046 LearningRate 0.0006 Epoch: 12 Global Step: 260840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:26,244-Speed 6308.05 samples/sec Loss 6.1221 LearningRate 0.0006 Epoch: 12 Global Step: 260850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:29,488-Speed 6314.08 samples/sec Loss 6.0348 LearningRate 0.0006 Epoch: 12 Global Step: 260860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:32,738-Speed 6304.64 samples/sec Loss 6.0415 LearningRate 0.0006 Epoch: 12 Global Step: 260870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:35,984-Speed 6311.71 samples/sec Loss 6.1404 LearningRate 0.0006 Epoch: 12 Global Step: 260880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:39,230-Speed 6310.93 samples/sec Loss 6.0745 LearningRate 0.0006 Epoch: 12 Global Step: 260890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:42,476-Speed 6309.11 samples/sec Loss 6.1049 LearningRate 0.0006 Epoch: 12 Global Step: 260900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:45,720-Speed 6314.16 samples/sec Loss 6.0189 LearningRate 0.0006 Epoch: 12 Global Step: 260910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:48,969-Speed 6305.09 
samples/sec Loss 6.0958 LearningRate 0.0006 Epoch: 12 Global Step: 260920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:52,202-Speed 6336.05 samples/sec Loss 6.0981 LearningRate 0.0006 Epoch: 12 Global Step: 260930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:55,451-Speed 6306.07 samples/sec Loss 6.0583 LearningRate 0.0006 Epoch: 12 Global Step: 260940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:27:58,697-Speed 6309.63 samples/sec Loss 6.1300 LearningRate 0.0006 Epoch: 12 Global Step: 260950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:01,942-Speed 6313.43 samples/sec Loss 6.0261 LearningRate 0.0006 Epoch: 12 Global Step: 260960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:05,189-Speed 6308.69 samples/sec Loss 6.0395 LearningRate 0.0006 Epoch: 12 Global Step: 260970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:08,433-Speed 6314.86 samples/sec Loss 6.0527 LearningRate 0.0006 Epoch: 12 Global Step: 260980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:11,680-Speed 6309.24 samples/sec Loss 6.0663 LearningRate 0.0006 Epoch: 12 Global Step: 260990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:14,928-Speed 6305.33 samples/sec Loss 6.0672 LearningRate 0.0006 Epoch: 12 Global Step: 261000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:18,175-Speed 6308.94 samples/sec Loss 6.1183 LearningRate 0.0006 Epoch: 12 Global Step: 261010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:21,419-Speed 6314.73 samples/sec Loss 6.0786 LearningRate 0.0006 Epoch: 12 Global Step: 261020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:24,671-Speed 6298.74 samples/sec Loss 6.0666 LearningRate 0.0006 Epoch: 12 Global Step: 261030 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:28:27,904-Speed 6336.34 samples/sec Loss 5.9912 
LearningRate 0.0006 Epoch: 12 Global Step: 261040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:31,152-Speed 6306.85 samples/sec Loss 6.0164 LearningRate 0.0006 Epoch: 12 Global Step: 261050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:34,404-Speed 6301.15 samples/sec Loss 6.0304 LearningRate 0.0006 Epoch: 12 Global Step: 261060 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:37,649-Speed 6312.45 samples/sec Loss 6.0468 LearningRate 0.0006 Epoch: 12 Global Step: 261070 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:40,894-Speed 6313.22 samples/sec Loss 6.0098 LearningRate 0.0006 Epoch: 12 Global Step: 261080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:44,139-Speed 6312.44 samples/sec Loss 6.0691 LearningRate 0.0006 Epoch: 12 Global Step: 261090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:47,385-Speed 6310.94 samples/sec Loss 6.0501 LearningRate 0.0006 Epoch: 12 Global Step: 261100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:50,630-Speed 6311.70 samples/sec Loss 6.0245 LearningRate 0.0006 Epoch: 12 Global Step: 261110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:53,880-Speed 6302.66 samples/sec Loss 5.9612 LearningRate 0.0006 Epoch: 12 Global Step: 261120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:28:57,128-Speed 6308.24 samples/sec Loss 6.0228 LearningRate 0.0006 Epoch: 12 Global Step: 261130 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:00,359-Speed 6340.47 samples/sec Loss 6.0685 LearningRate 0.0006 Epoch: 12 Global Step: 261140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:03,601-Speed 6317.29 samples/sec Loss 6.0377 LearningRate 0.0006 Epoch: 12 Global Step: 261150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:06,849-Speed 6307.04 samples/sec Loss 5.9661 LearningRate 0.0006 Epoch: 12 
Global Step: 261160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:10,090-Speed 6319.46 samples/sec Loss 6.0450 LearningRate 0.0006 Epoch: 12 Global Step: 261170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:13,337-Speed 6310.15 samples/sec Loss 6.0116 LearningRate 0.0006 Epoch: 12 Global Step: 261180 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:16,579-Speed 6316.77 samples/sec Loss 6.0342 LearningRate 0.0006 Epoch: 12 Global Step: 261190 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:19,825-Speed 6312.26 samples/sec Loss 6.0156 LearningRate 0.0006 Epoch: 12 Global Step: 261200 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:23,067-Speed 6318.00 samples/sec Loss 6.0432 LearningRate 0.0006 Epoch: 12 Global Step: 261210 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:26,308-Speed 6320.37 samples/sec Loss 5.9607 LearningRate 0.0006 Epoch: 12 Global Step: 261220 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:29,558-Speed 6303.48 samples/sec Loss 6.0677 LearningRate 0.0006 Epoch: 12 Global Step: 261230 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:32,789-Speed 6339.04 samples/sec Loss 6.0835 LearningRate 0.0006 Epoch: 12 Global Step: 261240 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:36,032-Speed 6317.83 samples/sec Loss 6.0206 LearningRate 0.0006 Epoch: 12 Global Step: 261250 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:39,279-Speed 6307.36 samples/sec Loss 6.0631 LearningRate 0.0006 Epoch: 12 Global Step: 261260 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:42,530-Speed 6302.50 samples/sec Loss 6.0252 LearningRate 0.0006 Epoch: 12 Global Step: 261270 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:45,774-Speed 6315.21 samples/sec Loss 6.0115 LearningRate 0.0006 Epoch: 12 Global Step: 261280 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:49,022-Speed 6306.05 samples/sec Loss 6.0641 LearningRate 0.0006 Epoch: 12 Global Step: 261290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:52,265-Speed 6316.91 samples/sec Loss 5.9978 LearningRate 0.0006 Epoch: 12 Global Step: 261300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:55,512-Speed 6308.94 samples/sec Loss 6.0906 LearningRate 0.0006 Epoch: 12 Global Step: 261310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:29:58,755-Speed 6315.88 samples/sec Loss 5.9461 LearningRate 0.0006 Epoch: 12 Global Step: 261320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:02,004-Speed 6305.36 samples/sec Loss 6.0193 LearningRate 0.0006 Epoch: 12 Global Step: 261330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:05,248-Speed 6315.36 samples/sec Loss 6.0969 LearningRate 0.0006 Epoch: 12 Global Step: 261340 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:30:08,476-Speed 6344.72 samples/sec Loss 6.0761 LearningRate 0.0006 Epoch: 12 Global Step: 261350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:11,725-Speed 6305.43 samples/sec Loss 6.0314 LearningRate 0.0006 Epoch: 12 Global Step: 261360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:14,972-Speed 6309.24 samples/sec Loss 6.0502 LearningRate 0.0006 Epoch: 12 Global Step: 261370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:18,217-Speed 6312.16 samples/sec Loss 6.1531 LearningRate 0.0006 Epoch: 12 Global Step: 261380 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:21,495-Speed 6249.45 samples/sec Loss 6.0294 LearningRate 0.0006 Epoch: 12 Global Step: 261390 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:24,784-Speed 6228.47 samples/sec Loss 6.0054 LearningRate 0.0006 Epoch: 12 Global Step: 261400 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:30:28,026-Speed 6316.84 samples/sec Loss 6.1025 LearningRate 0.0006 Epoch: 12 Global Step: 261410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:31,271-Speed 6314.18 samples/sec Loss 6.0408 LearningRate 0.0006 Epoch: 12 Global Step: 261420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:34,517-Speed 6309.24 samples/sec Loss 5.9984 LearningRate 0.0006 Epoch: 12 Global Step: 261430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:37,761-Speed 6315.43 samples/sec Loss 5.9908 LearningRate 0.0006 Epoch: 12 Global Step: 261440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:41,004-Speed 6317.63 samples/sec Loss 6.0594 LearningRate 0.0006 Epoch: 12 Global Step: 261450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:44,252-Speed 6305.21 samples/sec Loss 6.0824 LearningRate 0.0006 Epoch: 12 Global Step: 261460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:47,502-Speed 6304.10 samples/sec Loss 6.0816 LearningRate 0.0006 Epoch: 12 Global Step: 261470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:50,751-Speed 6305.25 samples/sec Loss 5.9565 LearningRate 0.0006 Epoch: 12 Global Step: 261480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:53,998-Speed 6308.82 samples/sec Loss 6.0800 LearningRate 0.0006 Epoch: 12 Global Step: 261490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:30:57,249-Speed 6300.33 samples/sec Loss 6.0994 LearningRate 0.0006 Epoch: 12 Global Step: 261500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:00,495-Speed 6312.07 samples/sec Loss 6.0515 LearningRate 0.0006 Epoch: 12 Global Step: 261510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:03,745-Speed 6303.12 samples/sec Loss 6.0639 LearningRate 0.0006 Epoch: 12 Global Step: 261520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:31:06,991-Speed 6310.20 samples/sec Loss 6.0012 LearningRate 0.0006 Epoch: 12 Global Step: 261530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:10,237-Speed 6311.51 samples/sec Loss 6.0293 LearningRate 0.0006 Epoch: 12 Global Step: 261540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:13,476-Speed 6322.68 samples/sec Loss 6.0473 LearningRate 0.0006 Epoch: 12 Global Step: 261550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:16,720-Speed 6314.57 samples/sec Loss 6.0022 LearningRate 0.0006 Epoch: 12 Global Step: 261560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:19,966-Speed 6310.79 samples/sec Loss 6.0960 LearningRate 0.0006 Epoch: 12 Global Step: 261570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:23,212-Speed 6312.55 samples/sec Loss 6.0919 LearningRate 0.0006 Epoch: 12 Global Step: 261580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:26,457-Speed 6311.72 samples/sec Loss 6.0699 LearningRate 0.0006 Epoch: 12 Global Step: 261590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:29,703-Speed 6310.19 samples/sec Loss 6.0859 LearningRate 0.0006 Epoch: 12 Global Step: 261600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:32,946-Speed 6316.75 samples/sec Loss 6.0210 LearningRate 0.0006 Epoch: 12 Global Step: 261610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:36,193-Speed 6309.76 samples/sec Loss 6.0510 LearningRate 0.0006 Epoch: 12 Global Step: 261620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:39,442-Speed 6303.87 samples/sec Loss 6.0515 LearningRate 0.0006 Epoch: 12 Global Step: 261630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:42,688-Speed 6310.93 samples/sec Loss 5.9904 LearningRate 0.0006 Epoch: 12 Global Step: 261640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:45,919-Speed 6340.14 
samples/sec Loss 6.0283 LearningRate 0.0006 Epoch: 12 Global Step: 261650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:49,166-Speed 6309.55 samples/sec Loss 6.0897 LearningRate 0.0006 Epoch: 12 Global Step: 261660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:52,416-Speed 6302.94 samples/sec Loss 6.0531 LearningRate 0.0006 Epoch: 12 Global Step: 261670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:55,661-Speed 6312.65 samples/sec Loss 6.0437 LearningRate 0.0006 Epoch: 12 Global Step: 261680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:31:58,916-Speed 6291.56 samples/sec Loss 6.1343 LearningRate 0.0006 Epoch: 12 Global Step: 261690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:02,166-Speed 6304.20 samples/sec Loss 6.0321 LearningRate 0.0006 Epoch: 12 Global Step: 261700 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:05,417-Speed 6301.93 samples/sec Loss 5.9982 LearningRate 0.0006 Epoch: 12 Global Step: 261710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:08,666-Speed 6304.49 samples/sec Loss 6.0737 LearningRate 0.0006 Epoch: 12 Global Step: 261720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:11,914-Speed 6306.95 samples/sec Loss 5.9815 LearningRate 0.0006 Epoch: 12 Global Step: 261730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:15,160-Speed 6310.16 samples/sec Loss 6.0749 LearningRate 0.0006 Epoch: 12 Global Step: 261740 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:18,405-Speed 6312.80 samples/sec Loss 6.0366 LearningRate 0.0006 Epoch: 12 Global Step: 261750 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:32:21,639-Speed 6334.45 samples/sec Loss 6.0039 LearningRate 0.0006 Epoch: 12 Global Step: 261760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:24,886-Speed 6310.01 samples/sec Loss 6.1543 
LearningRate 0.0006 Epoch: 12 Global Step: 261770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:28,130-Speed 6314.67 samples/sec Loss 6.0261 LearningRate 0.0006 Epoch: 12 Global Step: 261780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:31,374-Speed 6313.27 samples/sec Loss 6.0569 LearningRate 0.0006 Epoch: 12 Global Step: 261790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:34,620-Speed 6311.89 samples/sec Loss 6.0368 LearningRate 0.0006 Epoch: 12 Global Step: 261800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:37,864-Speed 6313.90 samples/sec Loss 6.0696 LearningRate 0.0006 Epoch: 12 Global Step: 261810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:41,108-Speed 6314.26 samples/sec Loss 6.0876 LearningRate 0.0006 Epoch: 12 Global Step: 261820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:44,355-Speed 6309.49 samples/sec Loss 6.0677 LearningRate 0.0006 Epoch: 12 Global Step: 261830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:47,598-Speed 6315.67 samples/sec Loss 5.9699 LearningRate 0.0006 Epoch: 12 Global Step: 261840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:50,843-Speed 6312.94 samples/sec Loss 6.1244 LearningRate 0.0006 Epoch: 12 Global Step: 261850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:54,074-Speed 6339.78 samples/sec Loss 6.1088 LearningRate 0.0006 Epoch: 12 Global Step: 261860 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:32:57,318-Speed 6314.49 samples/sec Loss 6.0607 LearningRate 0.0006 Epoch: 12 Global Step: 261870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:00,571-Speed 6297.84 samples/sec Loss 5.9711 LearningRate 0.0006 Epoch: 12 Global Step: 261880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:03,819-Speed 6307.73 samples/sec Loss 6.0650 LearningRate 0.0006 Epoch: 12 
Global Step: 261890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:07,060-Speed 6319.57 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 261900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:10,307-Speed 6308.70 samples/sec Loss 6.1036 LearningRate 0.0006 Epoch: 12 Global Step: 261910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:13,555-Speed 6308.23 samples/sec Loss 6.0619 LearningRate 0.0006 Epoch: 12 Global Step: 261920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:16,796-Speed 6319.22 samples/sec Loss 6.0671 LearningRate 0.0006 Epoch: 12 Global Step: 261930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:20,040-Speed 6314.87 samples/sec Loss 6.1135 LearningRate 0.0006 Epoch: 12 Global Step: 261940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:23,286-Speed 6310.77 samples/sec Loss 6.0258 LearningRate 0.0006 Epoch: 12 Global Step: 261950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:26,532-Speed 6311.39 samples/sec Loss 6.0577 LearningRate 0.0006 Epoch: 12 Global Step: 261960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:29,775-Speed 6316.48 samples/sec Loss 6.0402 LearningRate 0.0006 Epoch: 12 Global Step: 261970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:33,021-Speed 6309.74 samples/sec Loss 6.0767 LearningRate 0.0006 Epoch: 12 Global Step: 261980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:36,267-Speed 6312.74 samples/sec Loss 6.0869 LearningRate 0.0006 Epoch: 12 Global Step: 261990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:39,511-Speed 6314.20 samples/sec Loss 6.0155 LearningRate 0.0006 Epoch: 12 Global Step: 262000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:42,754-Speed 6316.64 samples/sec Loss 5.9944 LearningRate 0.0006 Epoch: 12 Global Step: 262010 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:45,998-Speed 6314.23 samples/sec Loss 6.0168 LearningRate 0.0006 Epoch: 12 Global Step: 262020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:49,241-Speed 6315.46 samples/sec Loss 6.0107 LearningRate 0.0006 Epoch: 12 Global Step: 262030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:52,490-Speed 6305.07 samples/sec Loss 6.0351 LearningRate 0.0006 Epoch: 12 Global Step: 262040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:55,736-Speed 6311.03 samples/sec Loss 6.0996 LearningRate 0.0006 Epoch: 12 Global Step: 262050 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:33:58,986-Speed 6302.79 samples/sec Loss 6.0220 LearningRate 0.0006 Epoch: 12 Global Step: 262060 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:34:02,235-Speed 6305.68 samples/sec Loss 6.0826 LearningRate 0.0006 Epoch: 12 Global Step: 262070 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:34:05,467-Speed 6338.02 samples/sec Loss 6.0282 LearningRate 0.0006 Epoch: 12 Global Step: 262080 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:08,708-Speed 6319.14 samples/sec Loss 6.0647 LearningRate 0.0006 Epoch: 12 Global Step: 262090 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:11,955-Speed 6309.41 samples/sec Loss 6.0429 LearningRate 0.0006 Epoch: 12 Global Step: 262100 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:15,200-Speed 6313.24 samples/sec Loss 6.0809 LearningRate 0.0006 Epoch: 12 Global Step: 262110 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:18,447-Speed 6308.50 samples/sec Loss 6.0356 LearningRate 0.0006 Epoch: 12 Global Step: 262120 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:21,694-Speed 6309.97 samples/sec Loss 6.0182 LearningRate 0.0006 Epoch: 12 Global Step: 262130 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:34:24,944-Speed 6302.24 samples/sec Loss 6.0618 LearningRate 0.0006 Epoch: 12 Global Step: 262140 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:28,190-Speed 6310.18 samples/sec Loss 6.1015 LearningRate 0.0006 Epoch: 12 Global Step: 262150 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:31,434-Speed 6315.02 samples/sec Loss 6.0081 LearningRate 0.0006 Epoch: 12 Global Step: 262160 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:34,682-Speed 6307.12 samples/sec Loss 6.0050 LearningRate 0.0006 Epoch: 12 Global Step: 262170 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:34:37,899-Speed 6368.93 samples/sec Loss 6.1580 LearningRate 0.0006 Epoch: 12 Global Step: 262180 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:41,179-Speed 6244.46 samples/sec Loss 6.0778 LearningRate 0.0006 Epoch: 12 Global Step: 262190 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:44,434-Speed 6293.58 samples/sec Loss 6.0799 LearningRate 0.0006 Epoch: 12 Global Step: 262200 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:47,680-Speed 6310.02 samples/sec Loss 6.0790 LearningRate 0.0006 Epoch: 12 Global Step: 262210 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:50,928-Speed 6306.70 samples/sec Loss 5.9980 LearningRate 0.0006 Epoch: 12 Global Step: 262220 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:54,174-Speed 6309.71 samples/sec Loss 6.0753 LearningRate 0.0006 Epoch: 12 Global Step: 262230 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:34:57,426-Speed 6300.40 samples/sec Loss 6.0933 LearningRate 0.0006 Epoch: 12 Global Step: 262240 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:35:00,676-Speed 6302.83 samples/sec Loss 6.0496 LearningRate 0.0006 Epoch: 12 Global Step: 262250 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 
15:35:03,929-Speed 6296.13 samples/sec Loss 6.0012 LearningRate 0.0006 Epoch: 12 Global Step: 262260 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:35:07,175-Speed 6311.75 samples/sec Loss 6.0306 LearningRate 0.0006 Epoch: 12 Global Step: 262270 Fp16 Grad Scale: 16384 Required: 52 hours Training: 2022-04-01 15:35:10,423-Speed 6307.09 samples/sec Loss 6.0080 LearningRate 0.0006 Epoch: 12 Global Step: 262280 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:13,694-Speed 6262.68 samples/sec Loss 6.0089 LearningRate 0.0006 Epoch: 12 Global Step: 262290 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:16,976-Speed 6241.12 samples/sec Loss 6.0392 LearningRate 0.0006 Epoch: 12 Global Step: 262300 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:20,224-Speed 6305.63 samples/sec Loss 6.0437 LearningRate 0.0006 Epoch: 12 Global Step: 262310 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:23,470-Speed 6311.74 samples/sec Loss 5.9518 LearningRate 0.0006 Epoch: 12 Global Step: 262320 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:26,715-Speed 6313.43 samples/sec Loss 6.0317 LearningRate 0.0006 Epoch: 12 Global Step: 262330 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:29,968-Speed 6297.20 samples/sec Loss 6.1362 LearningRate 0.0006 Epoch: 12 Global Step: 262340 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:33,213-Speed 6312.46 samples/sec Loss 6.1133 LearningRate 0.0006 Epoch: 12 Global Step: 262350 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:36,464-Speed 6300.40 samples/sec Loss 6.0369 LearningRate 0.0006 Epoch: 12 Global Step: 262360 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:39,705-Speed 6320.55 samples/sec Loss 6.0353 LearningRate 0.0006 Epoch: 12 Global Step: 262370 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:42,948-Speed 6317.85 
samples/sec Loss 6.0669 LearningRate 0.0006 Epoch: 12 Global Step: 262380 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:35:46,199-Speed 6299.79 samples/sec Loss 6.0217 LearningRate 0.0006 Epoch: 12 Global Step: 262390 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:35:49,430-Speed 6339.58 samples/sec Loss 6.0406 LearningRate 0.0006 Epoch: 12 Global Step: 262400 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:52,675-Speed 6313.78 samples/sec Loss 6.1284 LearningRate 0.0006 Epoch: 12 Global Step: 262410 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:55,919-Speed 6313.33 samples/sec Loss 6.0684 LearningRate 0.0006 Epoch: 12 Global Step: 262420 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:35:59,166-Speed 6310.07 samples/sec Loss 6.0269 LearningRate 0.0006 Epoch: 12 Global Step: 262430 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:02,419-Speed 6297.15 samples/sec Loss 6.0732 LearningRate 0.0006 Epoch: 12 Global Step: 262440 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:05,661-Speed 6317.41 samples/sec Loss 6.0943 LearningRate 0.0006 Epoch: 12 Global Step: 262450 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:08,905-Speed 6315.61 samples/sec Loss 6.0089 LearningRate 0.0006 Epoch: 12 Global Step: 262460 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:12,149-Speed 6313.68 samples/sec Loss 5.9950 LearningRate 0.0006 Epoch: 12 Global Step: 262470 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:15,395-Speed 6311.32 samples/sec Loss 6.0034 LearningRate 0.0006 Epoch: 12 Global Step: 262480 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:18,640-Speed 6312.94 samples/sec Loss 6.0584 LearningRate 0.0006 Epoch: 12 Global Step: 262490 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:21,872-Speed 6338.08 samples/sec Loss 5.9396 
LearningRate 0.0006 Epoch: 12 Global Step: 262500 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:25,117-Speed 6311.78 samples/sec Loss 6.0448 LearningRate 0.0006 Epoch: 12 Global Step: 262510 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:28,364-Speed 6308.55 samples/sec Loss 6.0559 LearningRate 0.0006 Epoch: 12 Global Step: 262520 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:31,614-Speed 6302.70 samples/sec Loss 6.0691 LearningRate 0.0006 Epoch: 12 Global Step: 262530 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:34,859-Speed 6313.40 samples/sec Loss 5.9544 LearningRate 0.0006 Epoch: 12 Global Step: 262540 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:38,104-Speed 6313.51 samples/sec Loss 6.0535 LearningRate 0.0006 Epoch: 12 Global Step: 262550 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:41,350-Speed 6310.72 samples/sec Loss 6.1116 LearningRate 0.0006 Epoch: 12 Global Step: 262560 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:44,592-Speed 6319.20 samples/sec Loss 6.0980 LearningRate 0.0006 Epoch: 12 Global Step: 262570 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:47,846-Speed 6295.41 samples/sec Loss 6.0906 LearningRate 0.0006 Epoch: 12 Global Step: 262580 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:51,123-Speed 6250.84 samples/sec Loss 5.9507 LearningRate 0.0006 Epoch: 12 Global Step: 262590 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:54,356-Speed 6336.52 samples/sec Loss 5.9895 LearningRate 0.0006 Epoch: 12 Global Step: 262600 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:36:57,602-Speed 6308.89 samples/sec Loss 6.0568 LearningRate 0.0006 Epoch: 12 Global Step: 262610 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:00,852-Speed 6303.05 samples/sec Loss 6.0590 LearningRate 0.0006 Epoch: 12 
Global Step: 262620 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:04,114-Speed 6280.92 samples/sec Loss 6.1248 LearningRate 0.0006 Epoch: 12 Global Step: 262630 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:07,374-Speed 6283.28 samples/sec Loss 6.0480 LearningRate 0.0006 Epoch: 12 Global Step: 262640 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:10,622-Speed 6305.85 samples/sec Loss 6.0198 LearningRate 0.0006 Epoch: 12 Global Step: 262650 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:13,874-Speed 6300.78 samples/sec Loss 5.9964 LearningRate 0.0006 Epoch: 12 Global Step: 262660 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:17,118-Speed 6312.89 samples/sec Loss 6.0597 LearningRate 0.0006 Epoch: 12 Global Step: 262670 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:20,373-Speed 6293.83 samples/sec Loss 6.0256 LearningRate 0.0006 Epoch: 12 Global Step: 262680 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:23,640-Speed 6269.64 samples/sec Loss 6.0226 LearningRate 0.0006 Epoch: 12 Global Step: 262690 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:26,886-Speed 6311.98 samples/sec Loss 6.0289 LearningRate 0.0006 Epoch: 12 Global Step: 262700 Fp16 Grad Scale: 65536 Required: 52 hours Training: 2022-04-01 15:37:30,119-Speed 6336.32 samples/sec Loss 5.9621 LearningRate 0.0006 Epoch: 12 Global Step: 262710 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:33,366-Speed 6307.63 samples/sec Loss 6.0206 LearningRate 0.0006 Epoch: 12 Global Step: 262720 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:36,615-Speed 6305.47 samples/sec Loss 6.0801 LearningRate 0.0006 Epoch: 12 Global Step: 262730 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:39,863-Speed 6306.68 samples/sec Loss 6.0146 LearningRate 0.0006 Epoch: 12 Global Step: 262740 Fp16 Grad 
Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:43,110-Speed 6309.09 samples/sec Loss 6.0244 LearningRate 0.0006 Epoch: 12 Global Step: 262750 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:46,382-Speed 6260.74 samples/sec Loss 6.1130 LearningRate 0.0006 Epoch: 12 Global Step: 262760 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:49,628-Speed 6310.68 samples/sec Loss 6.0460 LearningRate 0.0006 Epoch: 12 Global Step: 262770 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:52,880-Speed 6299.69 samples/sec Loss 6.0452 LearningRate 0.0006 Epoch: 12 Global Step: 262780 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:56,125-Speed 6311.85 samples/sec Loss 5.9995 LearningRate 0.0006 Epoch: 12 Global Step: 262790 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:37:59,371-Speed 6311.40 samples/sec Loss 5.9825 LearningRate 0.0006 Epoch: 12 Global Step: 262800 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:02,600-Speed 6342.66 samples/sec Loss 6.0904 LearningRate 0.0006 Epoch: 12 Global Step: 262810 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:05,848-Speed 6308.63 samples/sec Loss 6.0619 LearningRate 0.0006 Epoch: 12 Global Step: 262820 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:09,092-Speed 6313.78 samples/sec Loss 6.0147 LearningRate 0.0006 Epoch: 12 Global Step: 262830 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:12,337-Speed 6312.85 samples/sec Loss 6.0901 LearningRate 0.0006 Epoch: 12 Global Step: 262840 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:15,584-Speed 6308.20 samples/sec Loss 5.9754 LearningRate 0.0006 Epoch: 12 Global Step: 262850 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:18,830-Speed 6310.73 samples/sec Loss 6.0532 LearningRate 0.0006 Epoch: 12 Global Step: 262860 Fp16 Grad Scale: 32768 Required: 52 hours 
Training: 2022-04-01 15:38:22,073-Speed 6315.69 samples/sec Loss 6.0911 LearningRate 0.0006 Epoch: 12 Global Step: 262870 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:25,321-Speed 6309.31 samples/sec Loss 6.0352 LearningRate 0.0006 Epoch: 12 Global Step: 262880 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:28,565-Speed 6314.84 samples/sec Loss 6.0609 LearningRate 0.0006 Epoch: 12 Global Step: 262890 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:31,812-Speed 6308.56 samples/sec Loss 6.0900 LearningRate 0.0006 Epoch: 12 Global Step: 262900 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:35,045-Speed 6336.92 samples/sec Loss 6.0154 LearningRate 0.0006 Epoch: 12 Global Step: 262910 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:38,293-Speed 6307.08 samples/sec Loss 6.0189 LearningRate 0.0006 Epoch: 12 Global Step: 262920 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:41,559-Speed 6271.81 samples/sec Loss 6.0812 LearningRate 0.0006 Epoch: 12 Global Step: 262930 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:44,807-Speed 6307.32 samples/sec Loss 5.9724 LearningRate 0.0006 Epoch: 12 Global Step: 262940 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:48,056-Speed 6304.67 samples/sec Loss 6.0239 LearningRate 0.0006 Epoch: 12 Global Step: 262950 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:51,300-Speed 6314.15 samples/sec Loss 6.1556 LearningRate 0.0006 Epoch: 12 Global Step: 262960 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:54,544-Speed 6314.77 samples/sec Loss 5.9820 LearningRate 0.0006 Epoch: 12 Global Step: 262970 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:38:57,791-Speed 6309.67 samples/sec Loss 6.0142 LearningRate 0.0006 Epoch: 12 Global Step: 262980 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 
15:39:01,038-Speed 6308.84 samples/sec Loss 5.9623 LearningRate 0.0006 Epoch: 12 Global Step: 262990 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:04,284-Speed 6310.45 samples/sec Loss 6.0791 LearningRate 0.0006 Epoch: 12 Global Step: 263000 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:07,516-Speed 6338.27 samples/sec Loss 6.0679 LearningRate 0.0006 Epoch: 12 Global Step: 263010 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:10,759-Speed 6315.38 samples/sec Loss 6.1193 LearningRate 0.0006 Epoch: 12 Global Step: 263020 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:14,004-Speed 6313.35 samples/sec Loss 6.0482 LearningRate 0.0006 Epoch: 12 Global Step: 263030 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:17,250-Speed 6311.09 samples/sec Loss 5.9990 LearningRate 0.0006 Epoch: 12 Global Step: 263040 Fp16 Grad Scale: 32768 Required: 52 hours Training: 2022-04-01 15:39:20,493-Speed 6315.80 samples/sec Loss 6.0441 LearningRate 0.0006 Epoch: 12 Global Step: 263050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:23,741-Speed 6307.70 samples/sec Loss 6.1023 LearningRate 0.0006 Epoch: 12 Global Step: 263060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:26,990-Speed 6304.54 samples/sec Loss 6.1059 LearningRate 0.0006 Epoch: 12 Global Step: 263070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:30,242-Speed 6298.88 samples/sec Loss 6.0591 LearningRate 0.0006 Epoch: 12 Global Step: 263080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:33,487-Speed 6312.33 samples/sec Loss 6.1293 LearningRate 0.0006 Epoch: 12 Global Step: 263090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:36,737-Speed 6304.48 samples/sec Loss 6.0083 LearningRate 0.0006 Epoch: 12 Global Step: 263100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:39,972-Speed 6331.26 
samples/sec Loss 5.9381 LearningRate 0.0006 Epoch: 12 Global Step: 263110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:43,219-Speed 6308.95 samples/sec Loss 6.0083 LearningRate 0.0006 Epoch: 12 Global Step: 263120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:46,465-Speed 6310.06 samples/sec Loss 6.0628 LearningRate 0.0006 Epoch: 12 Global Step: 263130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:49,713-Speed 6306.55 samples/sec Loss 5.9899 LearningRate 0.0006 Epoch: 12 Global Step: 263140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:52,959-Speed 6310.48 samples/sec Loss 6.0047 LearningRate 0.0006 Epoch: 12 Global Step: 263150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:56,206-Speed 6310.88 samples/sec Loss 6.0673 LearningRate 0.0006 Epoch: 12 Global Step: 263160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:39:59,455-Speed 6304.06 samples/sec Loss 6.0263 LearningRate 0.0006 Epoch: 12 Global Step: 263170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:02,707-Speed 6299.01 samples/sec Loss 5.9236 LearningRate 0.0006 Epoch: 12 Global Step: 263180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:05,959-Speed 6300.84 samples/sec Loss 6.0021 LearningRate 0.0006 Epoch: 12 Global Step: 263190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:09,204-Speed 6312.55 samples/sec Loss 6.0032 LearningRate 0.0006 Epoch: 12 Global Step: 263200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:12,452-Speed 6307.19 samples/sec Loss 6.0863 LearningRate 0.0006 Epoch: 12 Global Step: 263210 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:40:15,682-Speed 6341.82 samples/sec Loss 6.0480 LearningRate 0.0006 Epoch: 12 Global Step: 263220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:18,959-Speed 6249.50 samples/sec Loss 5.9948 
LearningRate 0.0006 Epoch: 12 Global Step: 263230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:22,205-Speed 6310.47 samples/sec Loss 6.0263 LearningRate 0.0006 Epoch: 12 Global Step: 263240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:25,451-Speed 6311.04 samples/sec Loss 6.0377 LearningRate 0.0006 Epoch: 12 Global Step: 263250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:28,695-Speed 6315.07 samples/sec Loss 5.9632 LearningRate 0.0006 Epoch: 12 Global Step: 263260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:31,941-Speed 6310.91 samples/sec Loss 6.0183 LearningRate 0.0006 Epoch: 12 Global Step: 263270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:35,188-Speed 6308.72 samples/sec Loss 5.9950 LearningRate 0.0006 Epoch: 12 Global Step: 263280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:38,430-Speed 6318.34 samples/sec Loss 6.0402 LearningRate 0.0006 Epoch: 12 Global Step: 263290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:41,679-Speed 6305.09 samples/sec Loss 5.9954 LearningRate 0.0006 Epoch: 12 Global Step: 263300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:44,925-Speed 6310.53 samples/sec Loss 5.9833 LearningRate 0.0006 Epoch: 12 Global Step: 263310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:48,158-Speed 6336.64 samples/sec Loss 6.1276 LearningRate 0.0006 Epoch: 12 Global Step: 263320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:51,402-Speed 6313.92 samples/sec Loss 6.0037 LearningRate 0.0006 Epoch: 12 Global Step: 263330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:54,649-Speed 6309.35 samples/sec Loss 6.0387 LearningRate 0.0006 Epoch: 12 Global Step: 263340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:40:57,895-Speed 6310.41 samples/sec Loss 6.0569 LearningRate 0.0006 Epoch: 12 
Global Step: 263350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:01,143-Speed 6306.67 samples/sec Loss 6.0210 LearningRate 0.0006 Epoch: 12 Global Step: 263360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:04,391-Speed 6306.34 samples/sec Loss 5.9694 LearningRate 0.0006 Epoch: 12 Global Step: 263370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:07,640-Speed 6304.78 samples/sec Loss 6.0138 LearningRate 0.0006 Epoch: 12 Global Step: 263380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:10,885-Speed 6313.72 samples/sec Loss 6.0256 LearningRate 0.0006 Epoch: 12 Global Step: 263390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:14,131-Speed 6310.96 samples/sec Loss 6.0300 LearningRate 0.0006 Epoch: 12 Global Step: 263400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:17,377-Speed 6311.92 samples/sec Loss 6.0188 LearningRate 0.0006 Epoch: 12 Global Step: 263410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:20,610-Speed 6336.25 samples/sec Loss 6.0145 LearningRate 0.0006 Epoch: 12 Global Step: 263420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:23,858-Speed 6306.25 samples/sec Loss 5.9502 LearningRate 0.0006 Epoch: 12 Global Step: 263430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:27,120-Speed 6279.77 samples/sec Loss 5.9959 LearningRate 0.0006 Epoch: 12 Global Step: 263440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:30,363-Speed 6315.76 samples/sec Loss 6.0120 LearningRate 0.0006 Epoch: 12 Global Step: 263450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:33,608-Speed 6313.81 samples/sec Loss 6.0420 LearningRate 0.0006 Epoch: 12 Global Step: 263460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:36,851-Speed 6314.54 samples/sec Loss 5.9741 LearningRate 0.0006 Epoch: 12 Global Step: 263470 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:40,094-Speed 6316.98 samples/sec Loss 6.0581 LearningRate 0.0006 Epoch: 12 Global Step: 263480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:43,336-Speed 6319.10 samples/sec Loss 5.9605 LearningRate 0.0006 Epoch: 12 Global Step: 263490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:46,584-Speed 6307.45 samples/sec Loss 6.0052 LearningRate 0.0006 Epoch: 12 Global Step: 263500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:49,829-Speed 6312.33 samples/sec Loss 6.0433 LearningRate 0.0006 Epoch: 12 Global Step: 263510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:53,063-Speed 6335.23 samples/sec Loss 6.0480 LearningRate 0.0006 Epoch: 12 Global Step: 263520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:56,308-Speed 6312.63 samples/sec Loss 6.0647 LearningRate 0.0006 Epoch: 12 Global Step: 263530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:41:59,554-Speed 6310.51 samples/sec Loss 6.0281 LearningRate 0.0006 Epoch: 12 Global Step: 263540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:02,799-Speed 6312.32 samples/sec Loss 6.0642 LearningRate 0.0006 Epoch: 12 Global Step: 263550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:06,043-Speed 6314.93 samples/sec Loss 6.0421 LearningRate 0.0006 Epoch: 12 Global Step: 263560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:09,286-Speed 6316.74 samples/sec Loss 6.0596 LearningRate 0.0006 Epoch: 12 Global Step: 263570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:12,527-Speed 6319.39 samples/sec Loss 6.0481 LearningRate 0.0006 Epoch: 12 Global Step: 263580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:15,775-Speed 6308.61 samples/sec Loss 6.0820 LearningRate 0.0006 Epoch: 12 Global Step: 263590 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 15:42:19,024-Speed 6304.54 samples/sec Loss 6.0440 LearningRate 0.0006 Epoch: 12 Global Step: 263600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:22,268-Speed 6315.13 samples/sec Loss 6.0339 LearningRate 0.0006 Epoch: 12 Global Step: 263610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:25,513-Speed 6311.41 samples/sec Loss 6.0809 LearningRate 0.0006 Epoch: 12 Global Step: 263620 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:42:28,743-Speed 6341.79 samples/sec Loss 6.0414 LearningRate 0.0006 Epoch: 12 Global Step: 263630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:31,990-Speed 6309.10 samples/sec Loss 6.0432 LearningRate 0.0006 Epoch: 12 Global Step: 263640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:35,237-Speed 6310.12 samples/sec Loss 6.0267 LearningRate 0.0006 Epoch: 12 Global Step: 263650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:38,482-Speed 6312.15 samples/sec Loss 6.0538 LearningRate 0.0006 Epoch: 12 Global Step: 263660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:41,726-Speed 6313.37 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 263670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:44,972-Speed 6310.55 samples/sec Loss 6.0815 LearningRate 0.0006 Epoch: 12 Global Step: 263680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:48,216-Speed 6315.16 samples/sec Loss 6.0350 LearningRate 0.0006 Epoch: 12 Global Step: 263690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:51,464-Speed 6306.57 samples/sec Loss 6.1080 LearningRate 0.0006 Epoch: 12 Global Step: 263700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:42:54,714-Speed 6304.13 samples/sec Loss 6.0762 LearningRate 0.0006 Epoch: 12 Global Step: 263710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
15:42:57,964-Speed 6303.33 samples/sec Loss 6.0158 LearningRate 0.0006 Epoch: 12 Global Step: 263720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:01,194-Speed 6340.22 samples/sec Loss 6.0680 LearningRate 0.0006 Epoch: 12 Global Step: 263730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:04,440-Speed 6311.68 samples/sec Loss 5.9538 LearningRate 0.0006 Epoch: 12 Global Step: 263740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:07,686-Speed 6310.28 samples/sec Loss 6.0020 LearningRate 0.0006 Epoch: 12 Global Step: 263750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:10,930-Speed 6314.25 samples/sec Loss 6.0204 LearningRate 0.0006 Epoch: 12 Global Step: 263760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:14,175-Speed 6312.58 samples/sec Loss 5.9730 LearningRate 0.0006 Epoch: 12 Global Step: 263770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:17,421-Speed 6311.19 samples/sec Loss 6.0883 LearningRate 0.0006 Epoch: 12 Global Step: 263780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:20,666-Speed 6313.15 samples/sec Loss 6.0529 LearningRate 0.0006 Epoch: 12 Global Step: 263790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:23,918-Speed 6300.15 samples/sec Loss 6.0576 LearningRate 0.0006 Epoch: 12 Global Step: 263800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:27,165-Speed 6308.27 samples/sec Loss 6.0693 LearningRate 0.0006 Epoch: 12 Global Step: 263810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:30,410-Speed 6313.28 samples/sec Loss 5.9916 LearningRate 0.0006 Epoch: 12 Global Step: 263820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:33,642-Speed 6337.44 samples/sec Loss 6.1410 LearningRate 0.0006 Epoch: 12 Global Step: 263830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:36,887-Speed 6313.74 
samples/sec Loss 6.0472 LearningRate 0.0006 Epoch: 12 Global Step: 263840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:40,138-Speed 6299.86 samples/sec Loss 6.1554 LearningRate 0.0006 Epoch: 12 Global Step: 263850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:43,382-Speed 6314.69 samples/sec Loss 5.9886 LearningRate 0.0006 Epoch: 12 Global Step: 263860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:46,640-Speed 6287.88 samples/sec Loss 5.9577 LearningRate 0.0006 Epoch: 12 Global Step: 263870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:49,883-Speed 6315.27 samples/sec Loss 6.0379 LearningRate 0.0006 Epoch: 12 Global Step: 263880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:53,131-Speed 6308.57 samples/sec Loss 5.9094 LearningRate 0.0006 Epoch: 12 Global Step: 263890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:56,393-Speed 6279.34 samples/sec Loss 6.0194 LearningRate 0.0006 Epoch: 12 Global Step: 263900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:43:59,639-Speed 6309.68 samples/sec Loss 6.0420 LearningRate 0.0006 Epoch: 12 Global Step: 263910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:02,883-Speed 6314.34 samples/sec Loss 6.0809 LearningRate 0.0006 Epoch: 12 Global Step: 263920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:06,127-Speed 6315.32 samples/sec Loss 5.9896 LearningRate 0.0006 Epoch: 12 Global Step: 263930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:09,368-Speed 6321.26 samples/sec Loss 6.0441 LearningRate 0.0006 Epoch: 12 Global Step: 263940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:12,616-Speed 6305.18 samples/sec Loss 6.0940 LearningRate 0.0006 Epoch: 12 Global Step: 263950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:15,867-Speed 6301.94 samples/sec Loss 6.1087 
LearningRate 0.0006 Epoch: 12 Global Step: 263960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:19,114-Speed 6308.08 samples/sec Loss 6.0700 LearningRate 0.0006 Epoch: 12 Global Step: 263970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:22,356-Speed 6319.97 samples/sec Loss 6.0330 LearningRate 0.0006 Epoch: 12 Global Step: 263980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:25,636-Speed 6245.42 samples/sec Loss 6.0699 LearningRate 0.0006 Epoch: 12 Global Step: 263990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:28,900-Speed 6275.80 samples/sec Loss 5.9987 LearningRate 0.0006 Epoch: 12 Global Step: 264000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:32,146-Speed 6310.73 samples/sec Loss 5.9267 LearningRate 0.0006 Epoch: 12 Global Step: 264010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:35,391-Speed 6313.75 samples/sec Loss 5.9871 LearningRate 0.0006 Epoch: 12 Global Step: 264020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:38,619-Speed 6344.99 samples/sec Loss 6.0002 LearningRate 0.0006 Epoch: 12 Global Step: 264030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:41,864-Speed 6311.55 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 264040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:45,114-Speed 6304.02 samples/sec Loss 6.0403 LearningRate 0.0006 Epoch: 12 Global Step: 264050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:48,358-Speed 6315.17 samples/sec Loss 6.0246 LearningRate 0.0006 Epoch: 12 Global Step: 264060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:51,607-Speed 6303.89 samples/sec Loss 6.0054 LearningRate 0.0006 Epoch: 12 Global Step: 264070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:54,852-Speed 6313.57 samples/sec Loss 6.0550 LearningRate 0.0006 Epoch: 12 
Global Step: 264080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:44:58,103-Speed 6300.27 samples/sec Loss 6.0170 LearningRate 0.0006 Epoch: 12 Global Step: 264090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:01,350-Speed 6309.28 samples/sec Loss 6.0144 LearningRate 0.0006 Epoch: 12 Global Step: 264100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:04,596-Speed 6310.19 samples/sec Loss 6.0129 LearningRate 0.0006 Epoch: 12 Global Step: 264110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:07,841-Speed 6312.46 samples/sec Loss 5.9911 LearningRate 0.0006 Epoch: 12 Global Step: 264120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:11,074-Speed 6336.89 samples/sec Loss 5.9442 LearningRate 0.0006 Epoch: 12 Global Step: 264130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:14,314-Speed 6321.53 samples/sec Loss 6.1236 LearningRate 0.0006 Epoch: 12 Global Step: 264140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:17,562-Speed 6308.25 samples/sec Loss 6.0131 LearningRate 0.0006 Epoch: 12 Global Step: 264150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:20,804-Speed 6317.41 samples/sec Loss 6.0456 LearningRate 0.0006 Epoch: 12 Global Step: 264160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:24,049-Speed 6313.07 samples/sec Loss 6.0245 LearningRate 0.0006 Epoch: 12 Global Step: 264170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:27,292-Speed 6316.34 samples/sec Loss 6.0075 LearningRate 0.0006 Epoch: 12 Global Step: 264180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:30,541-Speed 6305.45 samples/sec Loss 5.9733 LearningRate 0.0006 Epoch: 12 Global Step: 264190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:33,788-Speed 6308.85 samples/sec Loss 6.0309 LearningRate 0.0006 Epoch: 12 Global Step: 264200 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:37,031-Speed 6316.64 samples/sec Loss 5.9748 LearningRate 0.0006 Epoch: 12 Global Step: 264210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:40,282-Speed 6300.33 samples/sec Loss 6.0160 LearningRate 0.0006 Epoch: 12 Global Step: 264220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:43,524-Speed 6319.55 samples/sec Loss 6.0294 LearningRate 0.0006 Epoch: 12 Global Step: 264230 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:45:46,753-Speed 6343.07 samples/sec Loss 5.9919 LearningRate 0.0006 Epoch: 12 Global Step: 264240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:49,998-Speed 6313.66 samples/sec Loss 6.0481 LearningRate 0.0006 Epoch: 12 Global Step: 264250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:53,247-Speed 6304.92 samples/sec Loss 6.0219 LearningRate 0.0006 Epoch: 12 Global Step: 264260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:56,497-Speed 6302.69 samples/sec Loss 6.0269 LearningRate 0.0006 Epoch: 12 Global Step: 264270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:45:59,744-Speed 6309.98 samples/sec Loss 6.0343 LearningRate 0.0006 Epoch: 12 Global Step: 264280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:02,993-Speed 6304.60 samples/sec Loss 6.0090 LearningRate 0.0006 Epoch: 12 Global Step: 264290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:06,239-Speed 6310.02 samples/sec Loss 6.0481 LearningRate 0.0006 Epoch: 12 Global Step: 264300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:09,483-Speed 6314.65 samples/sec Loss 6.0529 LearningRate 0.0006 Epoch: 12 Global Step: 264310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:12,733-Speed 6302.96 samples/sec Loss 5.9710 LearningRate 0.0006 Epoch: 12 Global Step: 264320 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 15:46:15,978-Speed 6312.83 samples/sec Loss 6.0346 LearningRate 0.0006 Epoch: 12 Global Step: 264330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:19,210-Speed 6336.70 samples/sec Loss 6.0549 LearningRate 0.0006 Epoch: 12 Global Step: 264340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:22,454-Speed 6314.86 samples/sec Loss 6.0137 LearningRate 0.0006 Epoch: 12 Global Step: 264350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:25,704-Speed 6304.86 samples/sec Loss 5.9773 LearningRate 0.0006 Epoch: 12 Global Step: 264360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:28,954-Speed 6302.65 samples/sec Loss 5.9933 LearningRate 0.0006 Epoch: 12 Global Step: 264370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:32,199-Speed 6311.89 samples/sec Loss 6.1027 LearningRate 0.0006 Epoch: 12 Global Step: 264380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:35,446-Speed 6308.34 samples/sec Loss 5.9762 LearningRate 0.0006 Epoch: 12 Global Step: 264390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:38,692-Speed 6310.84 samples/sec Loss 6.0602 LearningRate 0.0006 Epoch: 12 Global Step: 264400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:41,947-Speed 6292.61 samples/sec Loss 6.0469 LearningRate 0.0006 Epoch: 12 Global Step: 264410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:45,193-Speed 6313.09 samples/sec Loss 6.0014 LearningRate 0.0006 Epoch: 12 Global Step: 264420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:48,442-Speed 6303.93 samples/sec Loss 6.0872 LearningRate 0.0006 Epoch: 12 Global Step: 264430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:51,683-Speed 6321.37 samples/sec Loss 6.0604 LearningRate 0.0006 Epoch: 12 Global Step: 264440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
15:46:54,935-Speed 6298.20 samples/sec Loss 5.9804 LearningRate 0.0006 Epoch: 12 Global Step: 264450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:46:58,186-Speed 6300.85 samples/sec Loss 6.0122 LearningRate 0.0006 Epoch: 12 Global Step: 264460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:01,435-Speed 6305.97 samples/sec Loss 6.0528 LearningRate 0.0006 Epoch: 12 Global Step: 264470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:04,684-Speed 6304.39 samples/sec Loss 6.0305 LearningRate 0.0006 Epoch: 12 Global Step: 264480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:07,934-Speed 6303.70 samples/sec Loss 6.0001 LearningRate 0.0006 Epoch: 12 Global Step: 264490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:11,180-Speed 6309.66 samples/sec Loss 6.0409 LearningRate 0.0006 Epoch: 12 Global Step: 264500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:14,434-Speed 6294.55 samples/sec Loss 6.0619 LearningRate 0.0006 Epoch: 12 Global Step: 264510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:17,684-Speed 6303.07 samples/sec Loss 6.0043 LearningRate 0.0006 Epoch: 12 Global Step: 264520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:20,932-Speed 6307.65 samples/sec Loss 6.0307 LearningRate 0.0006 Epoch: 12 Global Step: 264530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:24,182-Speed 6301.94 samples/sec Loss 6.0830 LearningRate 0.0006 Epoch: 12 Global Step: 264540 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:47:27,412-Speed 6343.95 samples/sec Loss 6.0340 LearningRate 0.0006 Epoch: 12 Global Step: 264550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:30,659-Speed 6307.83 samples/sec Loss 6.0622 LearningRate 0.0006 Epoch: 12 Global Step: 264560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:33,907-Speed 6306.35 
samples/sec Loss 6.0179 LearningRate 0.0006 Epoch: 12 Global Step: 264570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:37,153-Speed 6311.81 samples/sec Loss 5.9305 LearningRate 0.0006 Epoch: 12 Global Step: 264580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:40,400-Speed 6308.16 samples/sec Loss 5.9582 LearningRate 0.0006 Epoch: 12 Global Step: 264590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:43,653-Speed 6296.99 samples/sec Loss 5.9795 LearningRate 0.0006 Epoch: 12 Global Step: 264600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:46,902-Speed 6305.72 samples/sec Loss 6.0716 LearningRate 0.0006 Epoch: 12 Global Step: 264610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:50,154-Speed 6298.26 samples/sec Loss 5.9636 LearningRate 0.0006 Epoch: 12 Global Step: 264620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:53,407-Speed 6298.52 samples/sec Loss 6.0103 LearningRate 0.0006 Epoch: 12 Global Step: 264630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:56,649-Speed 6318.17 samples/sec Loss 6.0641 LearningRate 0.0006 Epoch: 12 Global Step: 264640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:47:59,885-Speed 6331.06 samples/sec Loss 5.9708 LearningRate 0.0006 Epoch: 12 Global Step: 264650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:03,142-Speed 6289.12 samples/sec Loss 6.0826 LearningRate 0.0006 Epoch: 12 Global Step: 264660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:06,394-Speed 6298.46 samples/sec Loss 5.9613 LearningRate 0.0006 Epoch: 12 Global Step: 264670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:09,639-Speed 6312.30 samples/sec Loss 6.0442 LearningRate 0.0006 Epoch: 12 Global Step: 264680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:12,883-Speed 6315.48 samples/sec Loss 6.0090 
LearningRate 0.0006 Epoch: 12 Global Step: 264690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:16,131-Speed 6306.83 samples/sec Loss 6.0067 LearningRate 0.0006 Epoch: 12 Global Step: 264700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:19,376-Speed 6312.95 samples/sec Loss 6.0197 LearningRate 0.0006 Epoch: 12 Global Step: 264710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:22,623-Speed 6307.65 samples/sec Loss 6.0066 LearningRate 0.0006 Epoch: 12 Global Step: 264720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:25,870-Speed 6310.33 samples/sec Loss 6.0842 LearningRate 0.0006 Epoch: 12 Global Step: 264730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:29,115-Speed 6311.75 samples/sec Loss 6.0254 LearningRate 0.0006 Epoch: 12 Global Step: 264740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:32,362-Speed 6309.81 samples/sec Loss 6.0108 LearningRate 0.0006 Epoch: 12 Global Step: 264750 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:48:35,591-Speed 6342.14 samples/sec Loss 6.0794 LearningRate 0.0006 Epoch: 12 Global Step: 264760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:38,833-Speed 6318.30 samples/sec Loss 6.0002 LearningRate 0.0006 Epoch: 12 Global Step: 264770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:42,076-Speed 6317.71 samples/sec Loss 5.9514 LearningRate 0.0006 Epoch: 12 Global Step: 264780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:45,318-Speed 6319.11 samples/sec Loss 5.9975 LearningRate 0.0006 Epoch: 12 Global Step: 264790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:48,564-Speed 6309.99 samples/sec Loss 5.9168 LearningRate 0.0006 Epoch: 12 Global Step: 264800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:51,811-Speed 6308.13 samples/sec Loss 6.1024 LearningRate 0.0006 Epoch: 12 
Global Step: 264810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:55,058-Speed 6309.36 samples/sec Loss 6.0929 LearningRate 0.0006 Epoch: 12 Global Step: 264820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:48:58,305-Speed 6309.55 samples/sec Loss 6.0258 LearningRate 0.0006 Epoch: 12 Global Step: 264830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:01,554-Speed 6304.68 samples/sec Loss 5.9763 LearningRate 0.0006 Epoch: 12 Global Step: 264840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:04,802-Speed 6307.94 samples/sec Loss 5.9967 LearningRate 0.0006 Epoch: 12 Global Step: 264850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:08,030-Speed 6345.71 samples/sec Loss 5.9727 LearningRate 0.0006 Epoch: 12 Global Step: 264860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:11,276-Speed 6310.18 samples/sec Loss 6.0493 LearningRate 0.0006 Epoch: 12 Global Step: 264870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:14,519-Speed 6315.50 samples/sec Loss 6.0534 LearningRate 0.0006 Epoch: 12 Global Step: 264880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:17,767-Speed 6307.87 samples/sec Loss 6.0716 LearningRate 0.0006 Epoch: 12 Global Step: 264890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:21,013-Speed 6310.60 samples/sec Loss 5.9633 LearningRate 0.0006 Epoch: 12 Global Step: 264900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:24,261-Speed 6307.52 samples/sec Loss 6.0283 LearningRate 0.0006 Epoch: 12 Global Step: 264910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:27,507-Speed 6309.99 samples/sec Loss 6.0703 LearningRate 0.0006 Epoch: 12 Global Step: 264920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:30,757-Speed 6303.10 samples/sec Loss 6.0469 LearningRate 0.0006 Epoch: 12 Global Step: 264930 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:34,006-Speed 6304.74 samples/sec Loss 5.9840 LearningRate 0.0006 Epoch: 12 Global Step: 264940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:37,250-Speed 6313.95 samples/sec Loss 5.9837 LearningRate 0.0006 Epoch: 12 Global Step: 264950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:40,498-Speed 6307.77 samples/sec Loss 6.0292 LearningRate 0.0006 Epoch: 12 Global Step: 264960 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:49:43,732-Speed 6334.04 samples/sec Loss 6.0019 LearningRate 0.0006 Epoch: 12 Global Step: 264970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:47,081-Speed 6116.03 samples/sec Loss 6.1162 LearningRate 0.0006 Epoch: 12 Global Step: 264980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:50,364-Speed 6239.84 samples/sec Loss 5.9746 LearningRate 0.0006 Epoch: 12 Global Step: 264990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:53,644-Speed 6245.26 samples/sec Loss 6.0275 LearningRate 0.0006 Epoch: 12 Global Step: 265000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:49:56,893-Speed 6305.57 samples/sec Loss 6.0172 LearningRate 0.0006 Epoch: 12 Global Step: 265010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:00,147-Speed 6295.55 samples/sec Loss 5.9619 LearningRate 0.0006 Epoch: 12 Global Step: 265020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:03,394-Speed 6308.13 samples/sec Loss 6.0165 LearningRate 0.0006 Epoch: 12 Global Step: 265030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:06,645-Speed 6301.60 samples/sec Loss 6.0122 LearningRate 0.0006 Epoch: 12 Global Step: 265040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:09,890-Speed 6312.36 samples/sec Loss 6.0789 LearningRate 0.0006 Epoch: 12 Global Step: 265050 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 15:50:13,138-Speed 6307.46 samples/sec Loss 6.1414 LearningRate 0.0006 Epoch: 12 Global Step: 265060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:16,374-Speed 6330.25 samples/sec Loss 6.0688 LearningRate 0.0006 Epoch: 12 Global Step: 265070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:19,626-Speed 6299.74 samples/sec Loss 6.0197 LearningRate 0.0006 Epoch: 12 Global Step: 265080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:22,878-Speed 6298.45 samples/sec Loss 5.9636 LearningRate 0.0006 Epoch: 12 Global Step: 265090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:26,130-Speed 6299.41 samples/sec Loss 6.0003 LearningRate 0.0006 Epoch: 12 Global Step: 265100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:29,377-Speed 6309.01 samples/sec Loss 6.0150 LearningRate 0.0006 Epoch: 12 Global Step: 265110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:32,628-Speed 6300.38 samples/sec Loss 6.0905 LearningRate 0.0006 Epoch: 12 Global Step: 265120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:35,879-Speed 6301.30 samples/sec Loss 6.0127 LearningRate 0.0006 Epoch: 12 Global Step: 265130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:39,126-Speed 6308.59 samples/sec Loss 6.0504 LearningRate 0.0006 Epoch: 12 Global Step: 265140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:42,372-Speed 6310.78 samples/sec Loss 6.0491 LearningRate 0.0006 Epoch: 12 Global Step: 265150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:45,619-Speed 6309.46 samples/sec Loss 6.0405 LearningRate 0.0006 Epoch: 12 Global Step: 265160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:48,851-Speed 6337.53 samples/sec Loss 6.1088 LearningRate 0.0006 Epoch: 12 Global Step: 265170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
15:50:52,097-Speed 6310.76 samples/sec Loss 6.0537 LearningRate 0.0006 Epoch: 12 Global Step: 265180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:55,346-Speed 6304.35 samples/sec Loss 5.9924 LearningRate 0.0006 Epoch: 12 Global Step: 265190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:50:58,592-Speed 6310.63 samples/sec Loss 5.9875 LearningRate 0.0006 Epoch: 12 Global Step: 265200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:01,863-Speed 6262.49 samples/sec Loss 5.9217 LearningRate 0.0006 Epoch: 12 Global Step: 265210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:05,180-Speed 6176.72 samples/sec Loss 6.0051 LearningRate 0.0006 Epoch: 12 Global Step: 265220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:08,425-Speed 6311.75 samples/sec Loss 6.0241 LearningRate 0.0006 Epoch: 12 Global Step: 265230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:11,670-Speed 6312.69 samples/sec Loss 6.0544 LearningRate 0.0006 Epoch: 12 Global Step: 265240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:14,915-Speed 6312.40 samples/sec Loss 5.9851 LearningRate 0.0006 Epoch: 12 Global Step: 265250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:18,160-Speed 6313.62 samples/sec Loss 6.0586 LearningRate 0.0006 Epoch: 12 Global Step: 265260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:21,395-Speed 6332.99 samples/sec Loss 6.0401 LearningRate 0.0006 Epoch: 12 Global Step: 265270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:24,642-Speed 6307.80 samples/sec Loss 5.9649 LearningRate 0.0006 Epoch: 12 Global Step: 265280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:27,888-Speed 6310.09 samples/sec Loss 6.0083 LearningRate 0.0006 Epoch: 12 Global Step: 265290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:31,134-Speed 6311.67 
samples/sec Loss 6.0714 LearningRate 0.0006 Epoch: 12 Global Step: 265300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:34,375-Speed 6319.76 samples/sec Loss 6.0827 LearningRate 0.0006 Epoch: 12 Global Step: 265310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:37,621-Speed 6312.11 samples/sec Loss 5.9667 LearningRate 0.0006 Epoch: 12 Global Step: 265320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:40,868-Speed 6307.92 samples/sec Loss 6.0917 LearningRate 0.0006 Epoch: 12 Global Step: 265330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:44,111-Speed 6316.72 samples/sec Loss 6.0108 LearningRate 0.0006 Epoch: 12 Global Step: 265340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:47,356-Speed 6313.56 samples/sec Loss 6.0954 LearningRate 0.0006 Epoch: 12 Global Step: 265350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:50,605-Speed 6302.97 samples/sec Loss 6.0043 LearningRate 0.0006 Epoch: 12 Global Step: 265360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:53,836-Speed 6340.48 samples/sec Loss 6.0021 LearningRate 0.0006 Epoch: 12 Global Step: 265370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:51:57,082-Speed 6310.46 samples/sec Loss 6.0323 LearningRate 0.0006 Epoch: 12 Global Step: 265380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:00,334-Speed 6299.30 samples/sec Loss 6.0171 LearningRate 0.0006 Epoch: 12 Global Step: 265390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:03,582-Speed 6308.32 samples/sec Loss 5.9998 LearningRate 0.0006 Epoch: 12 Global Step: 265400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:06,835-Speed 6296.44 samples/sec Loss 5.9800 LearningRate 0.0006 Epoch: 12 Global Step: 265410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:10,085-Speed 6302.21 samples/sec Loss 5.9452 
LearningRate 0.0006 Epoch: 12 Global Step: 265420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:13,332-Speed 6308.03 samples/sec Loss 5.9393 LearningRate 0.0006 Epoch: 12 Global Step: 265430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:16,577-Speed 6313.98 samples/sec Loss 5.9785 LearningRate 0.0006 Epoch: 12 Global Step: 265440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:19,826-Speed 6303.56 samples/sec Loss 5.9970 LearningRate 0.0006 Epoch: 12 Global Step: 265450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:23,071-Speed 6314.21 samples/sec Loss 6.0601 LearningRate 0.0006 Epoch: 12 Global Step: 265460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:26,318-Speed 6309.34 samples/sec Loss 6.0504 LearningRate 0.0006 Epoch: 12 Global Step: 265470 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:52:29,551-Speed 6335.71 samples/sec Loss 6.0325 LearningRate 0.0006 Epoch: 12 Global Step: 265480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:32,803-Speed 6299.32 samples/sec Loss 6.0208 LearningRate 0.0006 Epoch: 12 Global Step: 265490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:36,058-Speed 6294.16 samples/sec Loss 6.0542 LearningRate 0.0006 Epoch: 12 Global Step: 265500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:39,311-Speed 6296.32 samples/sec Loss 5.9832 LearningRate 0.0006 Epoch: 12 Global Step: 265510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:42,560-Speed 6304.85 samples/sec Loss 5.9606 LearningRate 0.0006 Epoch: 12 Global Step: 265520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:45,805-Speed 6313.29 samples/sec Loss 6.0422 LearningRate 0.0006 Epoch: 12 Global Step: 265530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:49,050-Speed 6311.11 samples/sec Loss 6.1107 LearningRate 0.0006 Epoch: 12 
Global Step: 265540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:52,298-Speed 6308.02 samples/sec Loss 6.0442 LearningRate 0.0006 Epoch: 12 Global Step: 265550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:55,543-Speed 6312.28 samples/sec Loss 5.9830 LearningRate 0.0006 Epoch: 12 Global Step: 265560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:52:58,792-Speed 6304.29 samples/sec Loss 6.0244 LearningRate 0.0006 Epoch: 12 Global Step: 265570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:02,019-Speed 6348.51 samples/sec Loss 5.9899 LearningRate 0.0006 Epoch: 12 Global Step: 265580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:05,264-Speed 6313.17 samples/sec Loss 6.0275 LearningRate 0.0006 Epoch: 12 Global Step: 265590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:08,508-Speed 6314.70 samples/sec Loss 6.1250 LearningRate 0.0006 Epoch: 12 Global Step: 265600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:11,750-Speed 6317.70 samples/sec Loss 5.9813 LearningRate 0.0006 Epoch: 12 Global Step: 265610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:14,997-Speed 6308.21 samples/sec Loss 5.9974 LearningRate 0.0006 Epoch: 12 Global Step: 265620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:18,245-Speed 6306.79 samples/sec Loss 5.9619 LearningRate 0.0006 Epoch: 12 Global Step: 265630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:21,493-Speed 6306.91 samples/sec Loss 5.9889 LearningRate 0.0006 Epoch: 12 Global Step: 265640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:24,740-Speed 6309.34 samples/sec Loss 6.0046 LearningRate 0.0006 Epoch: 12 Global Step: 265650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:27,987-Speed 6308.49 samples/sec Loss 6.0209 LearningRate 0.0006 Epoch: 12 Global Step: 265660 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:31,231-Speed 6314.73 samples/sec Loss 6.0338 LearningRate 0.0006 Epoch: 12 Global Step: 265670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:34,475-Speed 6315.59 samples/sec Loss 5.9796 LearningRate 0.0006 Epoch: 12 Global Step: 265680 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:53:37,710-Speed 6332.32 samples/sec Loss 5.9826 LearningRate 0.0006 Epoch: 12 Global Step: 265690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:40,955-Speed 6311.71 samples/sec Loss 5.9444 LearningRate 0.0006 Epoch: 12 Global Step: 265700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:44,203-Speed 6308.26 samples/sec Loss 6.0676 LearningRate 0.0006 Epoch: 12 Global Step: 265710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:47,448-Speed 6311.27 samples/sec Loss 5.9702 LearningRate 0.0006 Epoch: 12 Global Step: 265720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:50,691-Speed 6317.33 samples/sec Loss 6.0264 LearningRate 0.0006 Epoch: 12 Global Step: 265730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:53,972-Speed 6243.58 samples/sec Loss 6.0514 LearningRate 0.0006 Epoch: 12 Global Step: 265740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:53:57,223-Speed 6301.86 samples/sec Loss 6.0304 LearningRate 0.0006 Epoch: 12 Global Step: 265750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:00,465-Speed 6317.08 samples/sec Loss 5.9723 LearningRate 0.0006 Epoch: 12 Global Step: 265760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:03,712-Speed 6310.22 samples/sec Loss 5.9830 LearningRate 0.0006 Epoch: 12 Global Step: 265770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:06,956-Speed 6312.87 samples/sec Loss 6.0075 LearningRate 0.0006 Epoch: 12 Global Step: 265780 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 15:54:10,204-Speed 6309.01 samples/sec Loss 5.9993 LearningRate 0.0006 Epoch: 12 Global Step: 265790 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:54:13,444-Speed 6321.02 samples/sec Loss 5.9807 LearningRate 0.0006 Epoch: 12 Global Step: 265800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:16,693-Speed 6305.07 samples/sec Loss 6.0269 LearningRate 0.0006 Epoch: 12 Global Step: 265810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:19,939-Speed 6310.30 samples/sec Loss 5.9492 LearningRate 0.0006 Epoch: 12 Global Step: 265820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:23,198-Speed 6285.83 samples/sec Loss 6.0850 LearningRate 0.0006 Epoch: 12 Global Step: 265830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:26,446-Speed 6306.65 samples/sec Loss 5.9767 LearningRate 0.0006 Epoch: 12 Global Step: 265840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:29,695-Speed 6304.39 samples/sec Loss 6.0125 LearningRate 0.0006 Epoch: 12 Global Step: 265850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:32,941-Speed 6312.06 samples/sec Loss 6.0289 LearningRate 0.0006 Epoch: 12 Global Step: 265860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:36,193-Speed 6299.44 samples/sec Loss 5.9998 LearningRate 0.0006 Epoch: 12 Global Step: 265870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:39,486-Speed 6220.43 samples/sec Loss 5.8924 LearningRate 0.0006 Epoch: 12 Global Step: 265880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:42,735-Speed 6304.16 samples/sec Loss 5.9990 LearningRate 0.0006 Epoch: 12 Global Step: 265890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:45,968-Speed 6337.57 samples/sec Loss 6.0064 LearningRate 0.0006 Epoch: 12 Global Step: 265900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
15:54:49,212-Speed 6314.88 samples/sec Loss 6.0654 LearningRate 0.0006 Epoch: 12 Global Step: 265910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:52,455-Speed 6316.92 samples/sec Loss 5.9727 LearningRate 0.0006 Epoch: 12 Global Step: 265920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:55,702-Speed 6307.01 samples/sec Loss 5.9324 LearningRate 0.0006 Epoch: 12 Global Step: 265930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:54:58,950-Speed 6306.78 samples/sec Loss 5.9906 LearningRate 0.0006 Epoch: 12 Global Step: 265940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:02,199-Speed 6305.66 samples/sec Loss 5.9818 LearningRate 0.0006 Epoch: 12 Global Step: 265950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:05,444-Speed 6312.55 samples/sec Loss 6.0404 LearningRate 0.0006 Epoch: 12 Global Step: 265960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:08,691-Speed 6308.28 samples/sec Loss 5.9837 LearningRate 0.0006 Epoch: 12 Global Step: 265970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:11,939-Speed 6307.86 samples/sec Loss 6.0194 LearningRate 0.0006 Epoch: 12 Global Step: 265980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:15,186-Speed 6308.52 samples/sec Loss 5.9826 LearningRate 0.0006 Epoch: 12 Global Step: 265990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:18,420-Speed 6333.17 samples/sec Loss 5.9758 LearningRate 0.0006 Epoch: 12 Global Step: 266000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:21,664-Speed 6316.01 samples/sec Loss 6.0582 LearningRate 0.0006 Epoch: 12 Global Step: 266010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:24,909-Speed 6312.71 samples/sec Loss 6.0669 LearningRate 0.0006 Epoch: 12 Global Step: 266020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:28,151-Speed 6318.11 
samples/sec Loss 6.0936 LearningRate 0.0006 Epoch: 12 Global Step: 266030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:31,395-Speed 6314.82 samples/sec Loss 5.9578 LearningRate 0.0006 Epoch: 12 Global Step: 266040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:34,641-Speed 6310.15 samples/sec Loss 6.0187 LearningRate 0.0006 Epoch: 12 Global Step: 266050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:37,885-Speed 6314.94 samples/sec Loss 5.9669 LearningRate 0.0006 Epoch: 12 Global Step: 266060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:41,129-Speed 6314.29 samples/sec Loss 6.0990 LearningRate 0.0006 Epoch: 12 Global Step: 266070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:44,375-Speed 6310.40 samples/sec Loss 6.0904 LearningRate 0.0006 Epoch: 12 Global Step: 266080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:47,627-Speed 6299.64 samples/sec Loss 6.0303 LearningRate 0.0006 Epoch: 12 Global Step: 266090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:50,865-Speed 6326.12 samples/sec Loss 6.0608 LearningRate 0.0006 Epoch: 12 Global Step: 266100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:54,111-Speed 6310.64 samples/sec Loss 6.0505 LearningRate 0.0006 Epoch: 12 Global Step: 266110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:55:57,358-Speed 6308.58 samples/sec Loss 6.0213 LearningRate 0.0006 Epoch: 12 Global Step: 266120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:00,608-Speed 6303.85 samples/sec Loss 5.9351 LearningRate 0.0006 Epoch: 12 Global Step: 266130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:03,855-Speed 6309.29 samples/sec Loss 5.9928 LearningRate 0.0006 Epoch: 12 Global Step: 266140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:07,102-Speed 6308.15 samples/sec Loss 6.0098 
LearningRate 0.0006 Epoch: 12 Global Step: 266150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:10,348-Speed 6310.70 samples/sec Loss 5.9815 LearningRate 0.0006 Epoch: 12 Global Step: 266160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:13,594-Speed 6311.63 samples/sec Loss 6.0236 LearningRate 0.0006 Epoch: 12 Global Step: 266170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:16,839-Speed 6311.48 samples/sec Loss 6.0811 LearningRate 0.0006 Epoch: 12 Global Step: 266180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:20,086-Speed 6309.87 samples/sec Loss 6.0057 LearningRate 0.0006 Epoch: 12 Global Step: 266190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:23,315-Speed 6343.62 samples/sec Loss 6.0527 LearningRate 0.0006 Epoch: 12 Global Step: 266200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:26,561-Speed 6309.94 samples/sec Loss 6.0590 LearningRate 0.0006 Epoch: 12 Global Step: 266210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:29,809-Speed 6306.69 samples/sec Loss 6.0172 LearningRate 0.0006 Epoch: 12 Global Step: 266220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:33,056-Speed 6309.65 samples/sec Loss 5.9706 LearningRate 0.0006 Epoch: 12 Global Step: 266230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:36,304-Speed 6305.92 samples/sec Loss 5.9913 LearningRate 0.0006 Epoch: 12 Global Step: 266240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:39,555-Speed 6302.18 samples/sec Loss 6.0201 LearningRate 0.0006 Epoch: 12 Global Step: 266250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:42,816-Speed 6281.46 samples/sec Loss 5.9445 LearningRate 0.0006 Epoch: 12 Global Step: 266260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:46,095-Speed 6247.02 samples/sec Loss 5.9827 LearningRate 0.0006 Epoch: 12 
Global Step: 266270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:49,352-Speed 6289.19 samples/sec Loss 5.9676 LearningRate 0.0006 Epoch: 12 Global Step: 266280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:52,600-Speed 6307.58 samples/sec Loss 6.0239 LearningRate 0.0006 Epoch: 12 Global Step: 266290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:55,833-Speed 6335.58 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 266300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:56:59,078-Speed 6312.22 samples/sec Loss 6.0545 LearningRate 0.0006 Epoch: 12 Global Step: 266310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:02,327-Speed 6307.07 samples/sec Loss 6.0180 LearningRate 0.0006 Epoch: 12 Global Step: 266320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:05,573-Speed 6310.72 samples/sec Loss 6.0327 LearningRate 0.0006 Epoch: 12 Global Step: 266330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:08,822-Speed 6304.47 samples/sec Loss 5.9627 LearningRate 0.0006 Epoch: 12 Global Step: 266340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:12,066-Speed 6314.47 samples/sec Loss 5.9712 LearningRate 0.0006 Epoch: 12 Global Step: 266350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:15,314-Speed 6305.50 samples/sec Loss 6.0470 LearningRate 0.0006 Epoch: 12 Global Step: 266360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:18,560-Speed 6311.22 samples/sec Loss 5.9875 LearningRate 0.0006 Epoch: 12 Global Step: 266370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:21,810-Speed 6303.55 samples/sec Loss 5.9870 LearningRate 0.0006 Epoch: 12 Global Step: 266380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:25,059-Speed 6305.30 samples/sec Loss 5.9974 LearningRate 0.0006 Epoch: 12 Global Step: 266390 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:28,300-Speed 6318.64 samples/sec Loss 5.9926 LearningRate 0.0006 Epoch: 12 Global Step: 266400 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 15:57:31,533-Speed 6336.45 samples/sec Loss 6.0054 LearningRate 0.0006 Epoch: 12 Global Step: 266410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:34,777-Speed 6315.08 samples/sec Loss 5.9602 LearningRate 0.0006 Epoch: 12 Global Step: 266420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:38,025-Speed 6306.52 samples/sec Loss 6.0546 LearningRate 0.0006 Epoch: 12 Global Step: 266430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:41,270-Speed 6312.49 samples/sec Loss 5.9974 LearningRate 0.0006 Epoch: 12 Global Step: 266440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:44,520-Speed 6302.92 samples/sec Loss 5.9791 LearningRate 0.0006 Epoch: 12 Global Step: 266450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:47,767-Speed 6308.38 samples/sec Loss 6.0546 LearningRate 0.0006 Epoch: 12 Global Step: 266460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:51,014-Speed 6310.32 samples/sec Loss 5.9952 LearningRate 0.0006 Epoch: 12 Global Step: 266470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:54,259-Speed 6312.68 samples/sec Loss 6.0628 LearningRate 0.0006 Epoch: 12 Global Step: 266480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:57:57,506-Speed 6307.58 samples/sec Loss 5.9794 LearningRate 0.0006 Epoch: 12 Global Step: 266490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:00,752-Speed 6310.75 samples/sec Loss 6.0385 LearningRate 0.0006 Epoch: 12 Global Step: 266500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:03,985-Speed 6336.93 samples/sec Loss 5.9680 LearningRate 0.0006 Epoch: 12 Global Step: 266510 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 15:58:07,234-Speed 6304.29 samples/sec Loss 6.0377 LearningRate 0.0006 Epoch: 12 Global Step: 266520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:10,483-Speed 6307.10 samples/sec Loss 6.0564 LearningRate 0.0006 Epoch: 12 Global Step: 266530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:13,734-Speed 6299.62 samples/sec Loss 6.0163 LearningRate 0.0006 Epoch: 12 Global Step: 266540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:16,969-Speed 6331.98 samples/sec Loss 5.9901 LearningRate 0.0006 Epoch: 12 Global Step: 266550 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:20,217-Speed 6307.42 samples/sec Loss 5.9859 LearningRate 0.0006 Epoch: 12 Global Step: 266560 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:23,463-Speed 6311.14 samples/sec Loss 5.9415 LearningRate 0.0006 Epoch: 12 Global Step: 266570 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:26,708-Speed 6311.62 samples/sec Loss 6.0173 LearningRate 0.0006 Epoch: 12 Global Step: 266580 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:29,956-Speed 6306.82 samples/sec Loss 5.9594 LearningRate 0.0006 Epoch: 12 Global Step: 266590 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:33,213-Speed 6291.89 samples/sec Loss 5.9973 LearningRate 0.0006 Epoch: 12 Global Step: 266600 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:36,461-Speed 6307.23 samples/sec Loss 6.0027 LearningRate 0.0006 Epoch: 12 Global Step: 266610 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:39,708-Speed 6309.14 samples/sec Loss 5.9359 LearningRate 0.0006 Epoch: 12 Global Step: 266620 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:42,953-Speed 6312.06 samples/sec Loss 6.0209 LearningRate 0.0006 Epoch: 12 Global Step: 266630 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 
15:58:46,198-Speed 6312.76 samples/sec Loss 6.0624 LearningRate 0.0006 Epoch: 12 Global Step: 266640 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 15:58:49,448-Speed 6302.86 samples/sec Loss 5.9710 LearningRate 0.0006 Epoch: 12 Global Step: 266650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:52,694-Speed 6310.97 samples/sec Loss 6.0075 LearningRate 0.0006 Epoch: 12 Global Step: 266660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:55,949-Speed 6293.53 samples/sec Loss 6.0294 LearningRate 0.0006 Epoch: 12 Global Step: 266670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:58:59,198-Speed 6303.58 samples/sec Loss 6.0408 LearningRate 0.0006 Epoch: 12 Global Step: 266680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:02,444-Speed 6312.26 samples/sec Loss 5.9721 LearningRate 0.0006 Epoch: 12 Global Step: 266690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:05,692-Speed 6305.37 samples/sec Loss 5.9511 LearningRate 0.0006 Epoch: 12 Global Step: 266700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:08,939-Speed 6310.33 samples/sec Loss 6.0520 LearningRate 0.0006 Epoch: 12 Global Step: 266710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:12,184-Speed 6312.55 samples/sec Loss 6.0200 LearningRate 0.0006 Epoch: 12 Global Step: 266720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:15,431-Speed 6308.65 samples/sec Loss 6.0427 LearningRate 0.0006 Epoch: 12 Global Step: 266730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:18,676-Speed 6313.36 samples/sec Loss 6.0545 LearningRate 0.0006 Epoch: 12 Global Step: 266740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:21,908-Speed 6337.66 samples/sec Loss 6.0322 LearningRate 0.0006 Epoch: 12 Global Step: 266750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:25,154-Speed 6311.54 
samples/sec Loss 6.0396 LearningRate 0.0006 Epoch: 12 Global Step: 266760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:28,398-Speed 6314.17 samples/sec Loss 5.9992 LearningRate 0.0006 Epoch: 12 Global Step: 266770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:31,642-Speed 6314.53 samples/sec Loss 6.0318 LearningRate 0.0006 Epoch: 12 Global Step: 266780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:34,895-Speed 6297.02 samples/sec Loss 5.8960 LearningRate 0.0006 Epoch: 12 Global Step: 266790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:38,144-Speed 6304.59 samples/sec Loss 5.9232 LearningRate 0.0006 Epoch: 12 Global Step: 266800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:41,399-Speed 6294.43 samples/sec Loss 5.9548 LearningRate 0.0006 Epoch: 12 Global Step: 266810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:44,642-Speed 6315.12 samples/sec Loss 5.9916 LearningRate 0.0006 Epoch: 12 Global Step: 266820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:47,885-Speed 6316.23 samples/sec Loss 6.0480 LearningRate 0.0006 Epoch: 12 Global Step: 266830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:51,132-Speed 6309.84 samples/sec Loss 5.9899 LearningRate 0.0006 Epoch: 12 Global Step: 266840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:54,366-Speed 6333.80 samples/sec Loss 5.9384 LearningRate 0.0006 Epoch: 12 Global Step: 266850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 15:59:57,612-Speed 6310.88 samples/sec Loss 6.0103 LearningRate 0.0006 Epoch: 12 Global Step: 266860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:00,856-Speed 6314.06 samples/sec Loss 6.0176 LearningRate 0.0006 Epoch: 12 Global Step: 266870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:04,105-Speed 6304.71 samples/sec Loss 6.0239 
LearningRate 0.0006 Epoch: 12 Global Step: 266880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:07,349-Speed 6315.70 samples/sec Loss 5.9637 LearningRate 0.0006 Epoch: 12 Global Step: 266890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:10,595-Speed 6309.94 samples/sec Loss 6.0071 LearningRate 0.0006 Epoch: 12 Global Step: 266900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:13,843-Speed 6308.08 samples/sec Loss 5.9170 LearningRate 0.0006 Epoch: 12 Global Step: 266910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:17,086-Speed 6315.71 samples/sec Loss 6.0339 LearningRate 0.0006 Epoch: 12 Global Step: 266920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:20,329-Speed 6317.03 samples/sec Loss 5.9464 LearningRate 0.0006 Epoch: 12 Global Step: 266930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:23,577-Speed 6306.31 samples/sec Loss 5.9700 LearningRate 0.0006 Epoch: 12 Global Step: 266940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:26,807-Speed 6342.95 samples/sec Loss 6.0021 LearningRate 0.0006 Epoch: 12 Global Step: 266950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:30,054-Speed 6307.62 samples/sec Loss 5.9782 LearningRate 0.0006 Epoch: 12 Global Step: 266960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:33,299-Speed 6313.61 samples/sec Loss 6.0166 LearningRate 0.0006 Epoch: 12 Global Step: 266970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:36,543-Speed 6315.38 samples/sec Loss 5.9864 LearningRate 0.0006 Epoch: 12 Global Step: 266980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:39,787-Speed 6315.29 samples/sec Loss 6.0831 LearningRate 0.0006 Epoch: 12 Global Step: 266990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:43,029-Speed 6317.03 samples/sec Loss 6.0031 LearningRate 0.0006 Epoch: 12 
Global Step: 267000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:46,277-Speed 6306.55 samples/sec Loss 5.9383 LearningRate 0.0006 Epoch: 12 Global Step: 267010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:49,523-Speed 6312.22 samples/sec Loss 6.0144 LearningRate 0.0006 Epoch: 12 Global Step: 267020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:52,768-Speed 6311.57 samples/sec Loss 5.9471 LearningRate 0.0006 Epoch: 12 Global Step: 267030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:56,014-Speed 6310.22 samples/sec Loss 6.0100 LearningRate 0.0006 Epoch: 12 Global Step: 267040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:00:59,263-Speed 6305.30 samples/sec Loss 5.9544 LearningRate 0.0006 Epoch: 12 Global Step: 267050 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:01:02,495-Speed 6337.78 samples/sec Loss 5.9830 LearningRate 0.0006 Epoch: 12 Global Step: 267060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:05,743-Speed 6307.00 samples/sec Loss 5.9255 LearningRate 0.0006 Epoch: 12 Global Step: 267070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:08,987-Speed 6315.40 samples/sec Loss 6.0499 LearningRate 0.0006 Epoch: 12 Global Step: 267080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:12,233-Speed 6310.03 samples/sec Loss 6.0359 LearningRate 0.0006 Epoch: 12 Global Step: 267090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:15,478-Speed 6313.44 samples/sec Loss 6.0332 LearningRate 0.0006 Epoch: 12 Global Step: 267100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:18,744-Speed 6270.69 samples/sec Loss 6.0862 LearningRate 0.0006 Epoch: 12 Global Step: 267110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:21,990-Speed 6312.00 samples/sec Loss 5.9532 LearningRate 0.0006 Epoch: 12 Global Step: 267120 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:25,240-Speed 6302.28 samples/sec Loss 6.0539 LearningRate 0.0006 Epoch: 12 Global Step: 267130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:28,489-Speed 6304.93 samples/sec Loss 5.9731 LearningRate 0.0006 Epoch: 12 Global Step: 267140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:31,737-Speed 6307.50 samples/sec Loss 6.0019 LearningRate 0.0006 Epoch: 12 Global Step: 267150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:34,969-Speed 6336.50 samples/sec Loss 6.0386 LearningRate 0.0006 Epoch: 12 Global Step: 267160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:38,216-Speed 6310.59 samples/sec Loss 5.9896 LearningRate 0.0006 Epoch: 12 Global Step: 267170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:41,461-Speed 6312.80 samples/sec Loss 5.9704 LearningRate 0.0006 Epoch: 12 Global Step: 267180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:44,707-Speed 6309.82 samples/sec Loss 5.9397 LearningRate 0.0006 Epoch: 12 Global Step: 267190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:48,043-Speed 6142.21 samples/sec Loss 6.0270 LearningRate 0.0006 Epoch: 12 Global Step: 267200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:51,289-Speed 6310.61 samples/sec Loss 5.9592 LearningRate 0.0006 Epoch: 12 Global Step: 267210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:54,532-Speed 6314.89 samples/sec Loss 6.0260 LearningRate 0.0006 Epoch: 12 Global Step: 267220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:01:57,781-Speed 6305.68 samples/sec Loss 5.9808 LearningRate 0.0006 Epoch: 12 Global Step: 267230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:01,031-Speed 6302.35 samples/sec Loss 6.0029 LearningRate 0.0006 Epoch: 12 Global Step: 267240 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:02:04,277-Speed 6311.61 samples/sec Loss 6.0521 LearningRate 0.0006 Epoch: 12 Global Step: 267250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:07,509-Speed 6337.23 samples/sec Loss 6.0498 LearningRate 0.0006 Epoch: 12 Global Step: 267260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:10,755-Speed 6311.98 samples/sec Loss 6.0401 LearningRate 0.0006 Epoch: 12 Global Step: 267270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:14,051-Speed 6214.13 samples/sec Loss 6.0953 LearningRate 0.0006 Epoch: 12 Global Step: 267280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:17,298-Speed 6309.67 samples/sec Loss 6.0264 LearningRate 0.0006 Epoch: 12 Global Step: 267290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:20,543-Speed 6311.31 samples/sec Loss 6.0766 LearningRate 0.0006 Epoch: 12 Global Step: 267300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:23,786-Speed 6317.56 samples/sec Loss 5.9853 LearningRate 0.0006 Epoch: 12 Global Step: 267310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:27,043-Speed 6289.29 samples/sec Loss 6.0506 LearningRate 0.0006 Epoch: 12 Global Step: 267320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:30,290-Speed 6309.43 samples/sec Loss 6.0089 LearningRate 0.0006 Epoch: 12 Global Step: 267330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:33,533-Speed 6315.24 samples/sec Loss 5.9757 LearningRate 0.0006 Epoch: 12 Global Step: 267340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:36,780-Speed 6309.48 samples/sec Loss 6.0041 LearningRate 0.0006 Epoch: 12 Global Step: 267350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:40,013-Speed 6335.62 samples/sec Loss 5.9332 LearningRate 0.0006 Epoch: 12 Global Step: 267360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:02:43,258-Speed 6312.96 samples/sec Loss 5.9491 LearningRate 0.0006 Epoch: 12 Global Step: 267370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:46,500-Speed 6319.58 samples/sec Loss 5.9477 LearningRate 0.0006 Epoch: 12 Global Step: 267380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:49,748-Speed 6306.62 samples/sec Loss 5.9755 LearningRate 0.0006 Epoch: 12 Global Step: 267390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:52,992-Speed 6314.92 samples/sec Loss 6.0105 LearningRate 0.0006 Epoch: 12 Global Step: 267400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:56,242-Speed 6303.20 samples/sec Loss 6.0023 LearningRate 0.0006 Epoch: 12 Global Step: 267410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:02:59,499-Speed 6289.55 samples/sec Loss 6.0243 LearningRate 0.0006 Epoch: 12 Global Step: 267420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:02,819-Speed 6169.42 samples/sec Loss 6.0311 LearningRate 0.0006 Epoch: 12 Global Step: 267430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:06,063-Speed 6314.57 samples/sec Loss 5.9800 LearningRate 0.0006 Epoch: 12 Global Step: 267440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:09,308-Speed 6312.76 samples/sec Loss 5.9404 LearningRate 0.0006 Epoch: 12 Global Step: 267450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:12,556-Speed 6306.57 samples/sec Loss 6.0383 LearningRate 0.0006 Epoch: 12 Global Step: 267460 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:03:15,795-Speed 6324.67 samples/sec Loss 5.9864 LearningRate 0.0006 Epoch: 12 Global Step: 267470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:19,038-Speed 6315.95 samples/sec Loss 6.0782 LearningRate 0.0006 Epoch: 12 Global Step: 267480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:22,284-Speed 6311.31 
samples/sec Loss 5.9553 LearningRate 0.0006 Epoch: 12 Global Step: 267490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:25,543-Speed 6285.51 samples/sec Loss 5.9535 LearningRate 0.0006 Epoch: 12 Global Step: 267500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:28,825-Speed 6241.36 samples/sec Loss 6.0430 LearningRate 0.0006 Epoch: 12 Global Step: 267510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:32,069-Speed 6314.91 samples/sec Loss 6.0108 LearningRate 0.0006 Epoch: 12 Global Step: 267520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:35,311-Speed 6318.14 samples/sec Loss 6.0399 LearningRate 0.0006 Epoch: 12 Global Step: 267530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:38,566-Speed 6294.51 samples/sec Loss 6.0076 LearningRate 0.0006 Epoch: 12 Global Step: 267540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:41,813-Speed 6309.06 samples/sec Loss 5.9644 LearningRate 0.0006 Epoch: 12 Global Step: 267550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:45,056-Speed 6315.40 samples/sec Loss 5.9932 LearningRate 0.0006 Epoch: 12 Global Step: 267560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:48,290-Speed 6334.93 samples/sec Loss 5.9940 LearningRate 0.0006 Epoch: 12 Global Step: 267570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:51,534-Speed 6313.64 samples/sec Loss 6.0262 LearningRate 0.0006 Epoch: 12 Global Step: 267580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:54,780-Speed 6311.07 samples/sec Loss 6.0030 LearningRate 0.0006 Epoch: 12 Global Step: 267590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:03:58,028-Speed 6308.14 samples/sec Loss 6.0122 LearningRate 0.0006 Epoch: 12 Global Step: 267600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:01,274-Speed 6310.07 samples/sec Loss 5.9839 
LearningRate 0.0006 Epoch: 12 Global Step: 267610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:04,524-Speed 6303.71 samples/sec Loss 6.0258 LearningRate 0.0006 Epoch: 12 Global Step: 267620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:07,773-Speed 6305.53 samples/sec Loss 6.0341 LearningRate 0.0006 Epoch: 12 Global Step: 267630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:11,017-Speed 6314.43 samples/sec Loss 6.0241 LearningRate 0.0006 Epoch: 12 Global Step: 267640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:14,319-Speed 6203.54 samples/sec Loss 6.0361 LearningRate 0.0006 Epoch: 12 Global Step: 267650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:17,623-Speed 6199.89 samples/sec Loss 6.0174 LearningRate 0.0006 Epoch: 12 Global Step: 267660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:20,861-Speed 6327.19 samples/sec Loss 5.8996 LearningRate 0.0006 Epoch: 12 Global Step: 267670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:24,107-Speed 6310.09 samples/sec Loss 5.9279 LearningRate 0.0006 Epoch: 12 Global Step: 267680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:27,358-Speed 6301.58 samples/sec Loss 6.0665 LearningRate 0.0006 Epoch: 12 Global Step: 267690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:30,604-Speed 6309.44 samples/sec Loss 6.0154 LearningRate 0.0006 Epoch: 12 Global Step: 267700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:33,853-Speed 6304.77 samples/sec Loss 6.0001 LearningRate 0.0006 Epoch: 12 Global Step: 267710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:37,099-Speed 6310.79 samples/sec Loss 5.9625 LearningRate 0.0006 Epoch: 12 Global Step: 267720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:40,347-Speed 6308.55 samples/sec Loss 6.0292 LearningRate 0.0006 Epoch: 12 
Global Step: 267730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:43,594-Speed 6307.66 samples/sec Loss 6.0561 LearningRate 0.0006 Epoch: 12 Global Step: 267740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:46,839-Speed 6312.44 samples/sec Loss 5.9463 LearningRate 0.0006 Epoch: 12 Global Step: 267750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:50,096-Speed 6290.15 samples/sec Loss 6.0240 LearningRate 0.0006 Epoch: 12 Global Step: 267760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:53,374-Speed 6249.06 samples/sec Loss 6.0422 LearningRate 0.0006 Epoch: 12 Global Step: 267770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:56,619-Speed 6311.58 samples/sec Loss 6.0404 LearningRate 0.0006 Epoch: 12 Global Step: 267780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:04:59,890-Speed 6264.54 samples/sec Loss 5.9724 LearningRate 0.0006 Epoch: 12 Global Step: 267790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:03,137-Speed 6307.76 samples/sec Loss 6.0175 LearningRate 0.0006 Epoch: 12 Global Step: 267800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:06,381-Speed 6314.33 samples/sec Loss 6.0157 LearningRate 0.0006 Epoch: 12 Global Step: 267810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:09,625-Speed 6316.06 samples/sec Loss 6.0422 LearningRate 0.0006 Epoch: 12 Global Step: 267820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:12,870-Speed 6313.05 samples/sec Loss 6.0047 LearningRate 0.0006 Epoch: 12 Global Step: 267830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:16,119-Speed 6303.85 samples/sec Loss 6.0333 LearningRate 0.0006 Epoch: 12 Global Step: 267840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:19,369-Speed 6303.36 samples/sec Loss 6.0406 LearningRate 0.0006 Epoch: 12 Global Step: 267850 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:22,619-Speed 6302.60 samples/sec Loss 5.9440 LearningRate 0.0006 Epoch: 12 Global Step: 267860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:25,868-Speed 6304.68 samples/sec Loss 5.9765 LearningRate 0.0006 Epoch: 12 Global Step: 267870 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:05:29,102-Speed 6334.77 samples/sec Loss 5.9421 LearningRate 0.0006 Epoch: 12 Global Step: 267880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:32,347-Speed 6313.34 samples/sec Loss 5.9626 LearningRate 0.0006 Epoch: 12 Global Step: 267890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:35,594-Speed 6308.82 samples/sec Loss 5.9976 LearningRate 0.0006 Epoch: 12 Global Step: 267900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:38,842-Speed 6307.17 samples/sec Loss 5.9808 LearningRate 0.0006 Epoch: 12 Global Step: 267910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:42,085-Speed 6316.45 samples/sec Loss 5.9784 LearningRate 0.0006 Epoch: 12 Global Step: 267920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:45,334-Speed 6305.06 samples/sec Loss 5.9626 LearningRate 0.0006 Epoch: 12 Global Step: 267930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:48,579-Speed 6312.35 samples/sec Loss 6.0451 LearningRate 0.0006 Epoch: 12 Global Step: 267940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:51,828-Speed 6303.88 samples/sec Loss 6.0038 LearningRate 0.0006 Epoch: 12 Global Step: 267950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:55,084-Speed 6290.77 samples/sec Loss 5.9899 LearningRate 0.0006 Epoch: 12 Global Step: 267960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:05:58,324-Speed 6322.39 samples/sec Loss 5.9702 LearningRate 0.0006 Epoch: 12 Global Step: 267970 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:06:01,572-Speed 6306.87 samples/sec Loss 5.9525 LearningRate 0.0006 Epoch: 12 Global Step: 267980 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:06:04,805-Speed 6335.84 samples/sec Loss 5.9840 LearningRate 0.0006 Epoch: 12 Global Step: 267990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:08,049-Speed 6315.85 samples/sec Loss 5.9850 LearningRate 0.0006 Epoch: 12 Global Step: 268000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:11,297-Speed 6306.32 samples/sec Loss 6.0180 LearningRate 0.0006 Epoch: 12 Global Step: 268010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:14,543-Speed 6311.43 samples/sec Loss 5.9200 LearningRate 0.0006 Epoch: 12 Global Step: 268020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:17,790-Speed 6309.48 samples/sec Loss 6.0618 LearningRate 0.0006 Epoch: 12 Global Step: 268030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:21,034-Speed 6313.62 samples/sec Loss 6.0245 LearningRate 0.0006 Epoch: 12 Global Step: 268040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:24,278-Speed 6314.62 samples/sec Loss 6.0221 LearningRate 0.0006 Epoch: 12 Global Step: 268050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:27,523-Speed 6313.97 samples/sec Loss 6.0060 LearningRate 0.0006 Epoch: 12 Global Step: 268060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:30,766-Speed 6316.71 samples/sec Loss 5.9490 LearningRate 0.0006 Epoch: 12 Global Step: 268070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:34,011-Speed 6311.73 samples/sec Loss 5.9538 LearningRate 0.0006 Epoch: 12 Global Step: 268080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:37,242-Speed 6339.61 samples/sec Loss 6.0550 LearningRate 0.0006 Epoch: 12 Global Step: 268090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:06:40,488-Speed 6311.56 samples/sec Loss 6.0174 LearningRate 0.0006 Epoch: 12 Global Step: 268100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:43,731-Speed 6316.30 samples/sec Loss 5.9845 LearningRate 0.0006 Epoch: 12 Global Step: 268110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:46,978-Speed 6309.30 samples/sec Loss 6.0271 LearningRate 0.0006 Epoch: 12 Global Step: 268120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:50,223-Speed 6311.26 samples/sec Loss 6.0648 LearningRate 0.0006 Epoch: 12 Global Step: 268130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:53,469-Speed 6312.03 samples/sec Loss 5.9669 LearningRate 0.0006 Epoch: 12 Global Step: 268140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:56,713-Speed 6314.18 samples/sec Loss 6.0592 LearningRate 0.0006 Epoch: 12 Global Step: 268150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:06:59,958-Speed 6312.55 samples/sec Loss 5.9608 LearningRate 0.0006 Epoch: 12 Global Step: 268160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:03,204-Speed 6309.46 samples/sec Loss 6.0540 LearningRate 0.0006 Epoch: 12 Global Step: 268170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:06,452-Speed 6308.10 samples/sec Loss 5.9819 LearningRate 0.0006 Epoch: 12 Global Step: 268180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:09,682-Speed 6341.07 samples/sec Loss 5.9737 LearningRate 0.0006 Epoch: 12 Global Step: 268190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:12,935-Speed 6297.90 samples/sec Loss 5.9394 LearningRate 0.0006 Epoch: 12 Global Step: 268200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:16,181-Speed 6311.30 samples/sec Loss 5.9628 LearningRate 0.0006 Epoch: 12 Global Step: 268210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:19,427-Speed 6309.81 
samples/sec Loss 5.9962 LearningRate 0.0006 Epoch: 12 Global Step: 268220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:22,677-Speed 6304.31 samples/sec Loss 5.9549 LearningRate 0.0006 Epoch: 12 Global Step: 268230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:25,924-Speed 6307.23 samples/sec Loss 6.0093 LearningRate 0.0006 Epoch: 12 Global Step: 268240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:29,170-Speed 6311.17 samples/sec Loss 6.0823 LearningRate 0.0006 Epoch: 12 Global Step: 268250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:32,416-Speed 6312.24 samples/sec Loss 6.0568 LearningRate 0.0006 Epoch: 12 Global Step: 268260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:35,669-Speed 6296.32 samples/sec Loss 5.9732 LearningRate 0.0006 Epoch: 12 Global Step: 268270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:38,913-Speed 6314.58 samples/sec Loss 5.9854 LearningRate 0.0006 Epoch: 12 Global Step: 268280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:42,145-Speed 6339.04 samples/sec Loss 5.8848 LearningRate 0.0006 Epoch: 12 Global Step: 268290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:45,392-Speed 6307.32 samples/sec Loss 6.0076 LearningRate 0.0006 Epoch: 12 Global Step: 268300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:48,638-Speed 6312.04 samples/sec Loss 6.0879 LearningRate 0.0006 Epoch: 12 Global Step: 268310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:51,883-Speed 6312.41 samples/sec Loss 6.0431 LearningRate 0.0006 Epoch: 12 Global Step: 268320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:55,130-Speed 6309.03 samples/sec Loss 6.0804 LearningRate 0.0006 Epoch: 12 Global Step: 268330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:07:58,397-Speed 6270.52 samples/sec Loss 5.8900 
LearningRate 0.0006 Epoch: 12 Global Step: 268340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:01,642-Speed 6312.00 samples/sec Loss 5.9887 LearningRate 0.0006 Epoch: 12 Global Step: 268350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:04,892-Speed 6302.03 samples/sec Loss 5.9770 LearningRate 0.0006 Epoch: 12 Global Step: 268360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:08,139-Speed 6308.51 samples/sec Loss 5.9874 LearningRate 0.0006 Epoch: 12 Global Step: 268370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:11,394-Speed 6294.84 samples/sec Loss 5.9936 LearningRate 0.0006 Epoch: 12 Global Step: 268380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:14,656-Speed 6278.29 samples/sec Loss 5.9064 LearningRate 0.0006 Epoch: 12 Global Step: 268390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:17,900-Speed 6314.44 samples/sec Loss 5.9246 LearningRate 0.0006 Epoch: 12 Global Step: 268400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:21,148-Speed 6306.67 samples/sec Loss 5.9110 LearningRate 0.0006 Epoch: 12 Global Step: 268410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:24,390-Speed 6319.07 samples/sec Loss 5.9609 LearningRate 0.0006 Epoch: 12 Global Step: 268420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:27,637-Speed 6309.47 samples/sec Loss 5.9585 LearningRate 0.0006 Epoch: 12 Global Step: 268430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:30,878-Speed 6318.80 samples/sec Loss 5.9400 LearningRate 0.0006 Epoch: 12 Global Step: 268440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:34,123-Speed 6313.62 samples/sec Loss 5.9394 LearningRate 0.0006 Epoch: 12 Global Step: 268450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:37,368-Speed 6314.18 samples/sec Loss 6.0769 LearningRate 0.0006 Epoch: 12 
Global Step: 268460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:40,612-Speed 6314.13 samples/sec Loss 5.9442 LearningRate 0.0006 Epoch: 12 Global Step: 268470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:43,855-Speed 6315.76 samples/sec Loss 6.0349 LearningRate 0.0006 Epoch: 12 Global Step: 268480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:47,088-Speed 6336.19 samples/sec Loss 6.0307 LearningRate 0.0006 Epoch: 12 Global Step: 268490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:50,337-Speed 6306.58 samples/sec Loss 6.0278 LearningRate 0.0006 Epoch: 12 Global Step: 268500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:53,583-Speed 6309.93 samples/sec Loss 6.0311 LearningRate 0.0006 Epoch: 12 Global Step: 268510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:08:56,825-Speed 6317.94 samples/sec Loss 5.9462 LearningRate 0.0006 Epoch: 12 Global Step: 268520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:00,070-Speed 6313.02 samples/sec Loss 6.0047 LearningRate 0.0006 Epoch: 12 Global Step: 268530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:03,314-Speed 6314.52 samples/sec Loss 6.0093 LearningRate 0.0006 Epoch: 12 Global Step: 268540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:06,558-Speed 6314.14 samples/sec Loss 5.9499 LearningRate 0.0006 Epoch: 12 Global Step: 268550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:09,803-Speed 6312.83 samples/sec Loss 6.0172 LearningRate 0.0006 Epoch: 12 Global Step: 268560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:13,043-Speed 6322.07 samples/sec Loss 6.0072 LearningRate 0.0006 Epoch: 12 Global Step: 268570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:16,289-Speed 6311.74 samples/sec Loss 5.9999 LearningRate 0.0006 Epoch: 12 Global Step: 268580 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:19,522-Speed 6336.21 samples/sec Loss 5.9808 LearningRate 0.0006 Epoch: 12 Global Step: 268590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:22,770-Speed 6306.28 samples/sec Loss 6.0300 LearningRate 0.0006 Epoch: 12 Global Step: 268600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:26,014-Speed 6314.54 samples/sec Loss 5.9627 LearningRate 0.0006 Epoch: 12 Global Step: 268610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:29,256-Speed 6319.17 samples/sec Loss 5.9454 LearningRate 0.0006 Epoch: 12 Global Step: 268620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:32,497-Speed 6319.06 samples/sec Loss 6.0126 LearningRate 0.0006 Epoch: 12 Global Step: 268630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:35,744-Speed 6308.58 samples/sec Loss 5.9631 LearningRate 0.0006 Epoch: 12 Global Step: 268640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:38,990-Speed 6312.03 samples/sec Loss 6.0597 LearningRate 0.0006 Epoch: 12 Global Step: 268650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:42,232-Speed 6318.08 samples/sec Loss 5.9981 LearningRate 0.0006 Epoch: 12 Global Step: 268660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:45,476-Speed 6315.04 samples/sec Loss 5.9598 LearningRate 0.0006 Epoch: 12 Global Step: 268670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:48,720-Speed 6314.89 samples/sec Loss 5.9999 LearningRate 0.0006 Epoch: 12 Global Step: 268680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:51,949-Speed 6343.55 samples/sec Loss 6.0183 LearningRate 0.0006 Epoch: 12 Global Step: 268690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:09:55,193-Speed 6315.46 samples/sec Loss 5.9118 LearningRate 0.0006 Epoch: 12 Global Step: 268700 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:09:58,441-Speed 6305.78 samples/sec Loss 5.9230 LearningRate 0.0006 Epoch: 12 Global Step: 268710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:01,684-Speed 6317.16 samples/sec Loss 6.0164 LearningRate 0.0006 Epoch: 12 Global Step: 268720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:04,927-Speed 6315.82 samples/sec Loss 6.0596 LearningRate 0.0006 Epoch: 12 Global Step: 268730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:08,173-Speed 6311.87 samples/sec Loss 6.0106 LearningRate 0.0006 Epoch: 12 Global Step: 268740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:11,443-Speed 6262.95 samples/sec Loss 5.9649 LearningRate 0.0006 Epoch: 12 Global Step: 268750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:14,687-Speed 6316.06 samples/sec Loss 5.9766 LearningRate 0.0006 Epoch: 12 Global Step: 268760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:17,933-Speed 6309.51 samples/sec Loss 5.9644 LearningRate 0.0006 Epoch: 12 Global Step: 268770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:21,178-Speed 6313.68 samples/sec Loss 5.9859 LearningRate 0.0006 Epoch: 12 Global Step: 268780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:24,411-Speed 6336.65 samples/sec Loss 5.9627 LearningRate 0.0006 Epoch: 12 Global Step: 268790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:27,659-Speed 6306.32 samples/sec Loss 6.0030 LearningRate 0.0006 Epoch: 12 Global Step: 268800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:30,906-Speed 6308.69 samples/sec Loss 5.9333 LearningRate 0.0006 Epoch: 12 Global Step: 268810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:34,152-Speed 6310.83 samples/sec Loss 5.9284 LearningRate 0.0006 Epoch: 12 Global Step: 268820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:10:37,396-Speed 6314.20 samples/sec Loss 5.9086 LearningRate 0.0006 Epoch: 12 Global Step: 268830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:40,642-Speed 6309.53 samples/sec Loss 5.9865 LearningRate 0.0006 Epoch: 12 Global Step: 268840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:43,889-Speed 6309.95 samples/sec Loss 5.9912 LearningRate 0.0006 Epoch: 12 Global Step: 268850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:47,136-Speed 6308.67 samples/sec Loss 5.9759 LearningRate 0.0006 Epoch: 12 Global Step: 268860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:50,382-Speed 6310.51 samples/sec Loss 6.0294 LearningRate 0.0006 Epoch: 12 Global Step: 268870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:53,628-Speed 6311.30 samples/sec Loss 6.0407 LearningRate 0.0006 Epoch: 12 Global Step: 268880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:10:56,873-Speed 6312.71 samples/sec Loss 5.9351 LearningRate 0.0006 Epoch: 12 Global Step: 268890 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:11:00,152-Speed 6248.08 samples/sec Loss 5.9989 LearningRate 0.0006 Epoch: 12 Global Step: 268900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:03,442-Speed 6226.48 samples/sec Loss 6.0028 LearningRate 0.0006 Epoch: 12 Global Step: 268910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:06,690-Speed 6305.85 samples/sec Loss 5.9980 LearningRate 0.0006 Epoch: 12 Global Step: 268920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:09,936-Speed 6311.28 samples/sec Loss 6.0156 LearningRate 0.0006 Epoch: 12 Global Step: 268930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:13,187-Speed 6301.30 samples/sec Loss 5.9762 LearningRate 0.0006 Epoch: 12 Global Step: 268940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:16,431-Speed 6313.49 
samples/sec Loss 6.0531 LearningRate 0.0006 Epoch: 12 Global Step: 268950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:19,682-Speed 6302.41 samples/sec Loss 5.9485 LearningRate 0.0006 Epoch: 12 Global Step: 268960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:22,925-Speed 6315.07 samples/sec Loss 5.9754 LearningRate 0.0006 Epoch: 12 Global Step: 268970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:26,174-Speed 6305.07 samples/sec Loss 6.0178 LearningRate 0.0006 Epoch: 12 Global Step: 268980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:29,421-Speed 6308.27 samples/sec Loss 5.9688 LearningRate 0.0006 Epoch: 12 Global Step: 268990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:32,671-Speed 6304.30 samples/sec Loss 6.0445 LearningRate 0.0006 Epoch: 12 Global Step: 269000 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:11:35,902-Speed 6340.28 samples/sec Loss 5.9967 LearningRate 0.0006 Epoch: 12 Global Step: 269010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:39,148-Speed 6310.48 samples/sec Loss 5.9912 LearningRate 0.0006 Epoch: 12 Global Step: 269020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:42,395-Speed 6308.45 samples/sec Loss 6.0785 LearningRate 0.0006 Epoch: 12 Global Step: 269030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:45,648-Speed 6297.43 samples/sec Loss 6.0166 LearningRate 0.0006 Epoch: 12 Global Step: 269040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:48,889-Speed 6319.38 samples/sec Loss 5.9766 LearningRate 0.0006 Epoch: 12 Global Step: 269050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:52,140-Speed 6301.90 samples/sec Loss 5.9230 LearningRate 0.0006 Epoch: 12 Global Step: 269060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:55,387-Speed 6309.17 samples/sec Loss 5.9381 
LearningRate 0.0006 Epoch: 12 Global Step: 269070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:11:58,635-Speed 6306.52 samples/sec Loss 5.9571 LearningRate 0.0006 Epoch: 12 Global Step: 269080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:01,881-Speed 6310.21 samples/sec Loss 5.9426 LearningRate 0.0006 Epoch: 12 Global Step: 269090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:05,129-Speed 6307.09 samples/sec Loss 6.0289 LearningRate 0.0006 Epoch: 12 Global Step: 269100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:08,365-Speed 6331.85 samples/sec Loss 6.0103 LearningRate 0.0006 Epoch: 12 Global Step: 269110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:11,610-Speed 6312.09 samples/sec Loss 5.9480 LearningRate 0.0006 Epoch: 12 Global Step: 269120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:14,857-Speed 6307.88 samples/sec Loss 5.9278 LearningRate 0.0006 Epoch: 12 Global Step: 269130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:18,104-Speed 6309.92 samples/sec Loss 5.9852 LearningRate 0.0006 Epoch: 12 Global Step: 269140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:21,355-Speed 6300.82 samples/sec Loss 5.9389 LearningRate 0.0006 Epoch: 12 Global Step: 269150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:24,601-Speed 6309.49 samples/sec Loss 5.9996 LearningRate 0.0006 Epoch: 12 Global Step: 269160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:27,846-Speed 6313.70 samples/sec Loss 5.9961 LearningRate 0.0006 Epoch: 12 Global Step: 269170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:31,096-Speed 6302.69 samples/sec Loss 6.0043 LearningRate 0.0006 Epoch: 12 Global Step: 269180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:34,348-Speed 6299.64 samples/sec Loss 5.9840 LearningRate 0.0006 Epoch: 12 
Global Step: 269190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:37,593-Speed 6311.02 samples/sec Loss 6.0249 LearningRate 0.0006 Epoch: 12 Global Step: 269200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:40,824-Speed 6339.65 samples/sec Loss 5.9848 LearningRate 0.0006 Epoch: 12 Global Step: 269210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:44,072-Speed 6307.43 samples/sec Loss 6.0630 LearningRate 0.0006 Epoch: 12 Global Step: 269220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:47,321-Speed 6305.48 samples/sec Loss 6.0436 LearningRate 0.0006 Epoch: 12 Global Step: 269230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:50,569-Speed 6307.88 samples/sec Loss 5.9735 LearningRate 0.0006 Epoch: 12 Global Step: 269240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:53,812-Speed 6315.10 samples/sec Loss 5.9843 LearningRate 0.0006 Epoch: 12 Global Step: 269250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:12:57,060-Speed 6307.61 samples/sec Loss 5.9267 LearningRate 0.0006 Epoch: 12 Global Step: 269260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:00,316-Speed 6289.95 samples/sec Loss 5.8872 LearningRate 0.0006 Epoch: 12 Global Step: 269270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:03,564-Speed 6308.14 samples/sec Loss 6.0152 LearningRate 0.0006 Epoch: 12 Global Step: 269280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:06,813-Speed 6306.45 samples/sec Loss 6.0383 LearningRate 0.0006 Epoch: 12 Global Step: 269290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:10,057-Speed 6312.88 samples/sec Loss 6.0171 LearningRate 0.0006 Epoch: 12 Global Step: 269300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:13,288-Speed 6341.93 samples/sec Loss 6.0286 LearningRate 0.0006 Epoch: 12 Global Step: 269310 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:16,532-Speed 6314.51 samples/sec Loss 6.0856 LearningRate 0.0006 Epoch: 12 Global Step: 269320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:19,780-Speed 6306.83 samples/sec Loss 5.9566 LearningRate 0.0006 Epoch: 12 Global Step: 269330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:23,024-Speed 6315.14 samples/sec Loss 6.0231 LearningRate 0.0006 Epoch: 12 Global Step: 269340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:26,270-Speed 6308.96 samples/sec Loss 6.0582 LearningRate 0.0006 Epoch: 12 Global Step: 269350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:29,517-Speed 6310.21 samples/sec Loss 6.0487 LearningRate 0.0006 Epoch: 12 Global Step: 269360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:32,765-Speed 6306.30 samples/sec Loss 5.9245 LearningRate 0.0006 Epoch: 12 Global Step: 269370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:36,044-Speed 6247.64 samples/sec Loss 6.0393 LearningRate 0.0006 Epoch: 12 Global Step: 269380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:39,322-Speed 6247.84 samples/sec Loss 5.9411 LearningRate 0.0006 Epoch: 12 Global Step: 269390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:42,571-Speed 6305.36 samples/sec Loss 6.0042 LearningRate 0.0006 Epoch: 12 Global Step: 269400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:45,817-Speed 6311.86 samples/sec Loss 5.9660 LearningRate 0.0006 Epoch: 12 Global Step: 269410 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:13:49,050-Speed 6335.03 samples/sec Loss 5.9626 LearningRate 0.0006 Epoch: 12 Global Step: 269420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:52,300-Speed 6303.79 samples/sec Loss 5.9939 LearningRate 0.0006 Epoch: 12 Global Step: 269430 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:13:55,545-Speed 6311.57 samples/sec Loss 6.0685 LearningRate 0.0006 Epoch: 12 Global Step: 269440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:13:58,794-Speed 6305.41 samples/sec Loss 5.9781 LearningRate 0.0006 Epoch: 12 Global Step: 269450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:02,044-Speed 6301.88 samples/sec Loss 6.0015 LearningRate 0.0006 Epoch: 12 Global Step: 269460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:05,290-Speed 6311.10 samples/sec Loss 5.9796 LearningRate 0.0006 Epoch: 12 Global Step: 269470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:08,539-Speed 6306.34 samples/sec Loss 6.0327 LearningRate 0.0006 Epoch: 12 Global Step: 269480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:11,783-Speed 6315.78 samples/sec Loss 6.0113 LearningRate 0.0006 Epoch: 12 Global Step: 269490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:15,030-Speed 6307.49 samples/sec Loss 6.0164 LearningRate 0.0006 Epoch: 12 Global Step: 269500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:18,277-Speed 6308.50 samples/sec Loss 5.9868 LearningRate 0.0006 Epoch: 12 Global Step: 269510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:21,531-Speed 6296.47 samples/sec Loss 5.9747 LearningRate 0.0006 Epoch: 12 Global Step: 269520 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:14:24,763-Speed 6337.45 samples/sec Loss 6.1001 LearningRate 0.0006 Epoch: 12 Global Step: 269530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:28,013-Speed 6302.81 samples/sec Loss 5.9710 LearningRate 0.0006 Epoch: 12 Global Step: 269540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:31,260-Speed 6308.82 samples/sec Loss 5.9681 LearningRate 0.0006 Epoch: 12 Global Step: 269550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:14:34,503-Speed 6316.01 samples/sec Loss 6.0366 LearningRate 0.0006 Epoch: 12 Global Step: 269560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:37,752-Speed 6304.95 samples/sec Loss 5.9780 LearningRate 0.0006 Epoch: 12 Global Step: 269570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:41,000-Speed 6307.66 samples/sec Loss 6.0058 LearningRate 0.0006 Epoch: 12 Global Step: 269580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:44,251-Speed 6301.89 samples/sec Loss 5.9893 LearningRate 0.0006 Epoch: 12 Global Step: 269590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:47,498-Speed 6308.08 samples/sec Loss 5.9852 LearningRate 0.0006 Epoch: 12 Global Step: 269600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:14:50,743-Speed 6312.55 samples/sec Loss 5.9381 LearningRate 0.0006 Epoch: 12 Global Step: 269610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:15:51,961-Speed 334.54 samples/sec Loss 5.9197 LearningRate 0.0006 Epoch: 13 Global Step: 269620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:15:55,207-Speed 6311.52 samples/sec Loss 6.0645 LearningRate 0.0006 Epoch: 13 Global Step: 269630 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:15:58,433-Speed 6349.09 samples/sec Loss 5.9131 LearningRate 0.0006 Epoch: 13 Global Step: 269640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:01,665-Speed 6337.68 samples/sec Loss 5.9339 LearningRate 0.0006 Epoch: 13 Global Step: 269650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:04,911-Speed 6312.66 samples/sec Loss 5.9551 LearningRate 0.0006 Epoch: 13 Global Step: 269660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:08,149-Speed 6325.94 samples/sec Loss 6.0394 LearningRate 0.0006 Epoch: 13 Global Step: 269670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:11,396-Speed 6309.06 
samples/sec Loss 5.9913 LearningRate 0.0006 Epoch: 13 Global Step: 269680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:14,630-Speed 6333.97 samples/sec Loss 5.9645 LearningRate 0.0006 Epoch: 13 Global Step: 269690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:17,865-Speed 6331.91 samples/sec Loss 5.9922 LearningRate 0.0006 Epoch: 13 Global Step: 269700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:21,103-Speed 6326.81 samples/sec Loss 5.9708 LearningRate 0.0006 Epoch: 13 Global Step: 269710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:24,408-Speed 6198.79 samples/sec Loss 6.0181 LearningRate 0.0006 Epoch: 13 Global Step: 269720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:27,690-Speed 6241.39 samples/sec Loss 5.9182 LearningRate 0.0006 Epoch: 13 Global Step: 269730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:30,913-Speed 6354.64 samples/sec Loss 5.9198 LearningRate 0.0006 Epoch: 13 Global Step: 269740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:34,155-Speed 6320.13 samples/sec Loss 5.9906 LearningRate 0.0006 Epoch: 13 Global Step: 269750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:37,394-Speed 6323.95 samples/sec Loss 6.0076 LearningRate 0.0006 Epoch: 13 Global Step: 269760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:40,630-Speed 6328.76 samples/sec Loss 6.0376 LearningRate 0.0006 Epoch: 13 Global Step: 269770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:43,873-Speed 6317.82 samples/sec Loss 5.9747 LearningRate 0.0006 Epoch: 13 Global Step: 269780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:47,115-Speed 6317.53 samples/sec Loss 5.9332 LearningRate 0.0006 Epoch: 13 Global Step: 269790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:50,354-Speed 6324.28 samples/sec Loss 5.9547 
LearningRate 0.0006 Epoch: 13 Global Step: 269800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:53,596-Speed 6318.56 samples/sec Loss 5.9551 LearningRate 0.0006 Epoch: 13 Global Step: 269810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:16:56,836-Speed 6323.71 samples/sec Loss 5.9423 LearningRate 0.0006 Epoch: 13 Global Step: 269820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:00,081-Speed 6311.90 samples/sec Loss 5.9885 LearningRate 0.0006 Epoch: 13 Global Step: 269830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:03,309-Speed 6345.21 samples/sec Loss 5.8942 LearningRate 0.0006 Epoch: 13 Global Step: 269840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:06,546-Speed 6328.44 samples/sec Loss 5.9419 LearningRate 0.0006 Epoch: 13 Global Step: 269850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:09,781-Speed 6333.41 samples/sec Loss 5.9104 LearningRate 0.0006 Epoch: 13 Global Step: 269860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:13,024-Speed 6317.57 samples/sec Loss 6.0516 LearningRate 0.0006 Epoch: 13 Global Step: 269870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:16,261-Speed 6327.00 samples/sec Loss 5.9123 LearningRate 0.0006 Epoch: 13 Global Step: 269880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:19,498-Speed 6329.14 samples/sec Loss 6.0326 LearningRate 0.0006 Epoch: 13 Global Step: 269890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:22,738-Speed 6321.50 samples/sec Loss 5.9323 LearningRate 0.0006 Epoch: 13 Global Step: 269900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:25,987-Speed 6305.21 samples/sec Loss 5.9678 LearningRate 0.0006 Epoch: 13 Global Step: 269910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:29,226-Speed 6325.06 samples/sec Loss 5.9296 LearningRate 0.0006 Epoch: 13 
Global Step: 269920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:32,461-Speed 6332.90 samples/sec Loss 5.9225 LearningRate 0.0006 Epoch: 13 Global Step: 269930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:35,717-Speed 6290.10 samples/sec Loss 5.9937 LearningRate 0.0006 Epoch: 13 Global Step: 269940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:38,956-Speed 6325.14 samples/sec Loss 5.9797 LearningRate 0.0006 Epoch: 13 Global Step: 269950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:42,196-Speed 6320.67 samples/sec Loss 5.9540 LearningRate 0.0006 Epoch: 13 Global Step: 269960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:45,432-Speed 6331.92 samples/sec Loss 6.0198 LearningRate 0.0006 Epoch: 13 Global Step: 269970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:48,666-Speed 6332.49 samples/sec Loss 5.9174 LearningRate 0.0006 Epoch: 13 Global Step: 269980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:51,902-Speed 6331.83 samples/sec Loss 5.9741 LearningRate 0.0006 Epoch: 13 Global Step: 269990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:55,135-Speed 6335.36 samples/sec Loss 5.9683 LearningRate 0.0006 Epoch: 13 Global Step: 270000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:17:58,369-Speed 6333.54 samples/sec Loss 5.9352 LearningRate 0.0006 Epoch: 13 Global Step: 270010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:01,621-Speed 6299.17 samples/sec Loss 6.0085 LearningRate 0.0006 Epoch: 13 Global Step: 270020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:04,860-Speed 6325.45 samples/sec Loss 5.9467 LearningRate 0.0006 Epoch: 13 Global Step: 270030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:08,098-Speed 6325.29 samples/sec Loss 5.9706 LearningRate 0.0006 Epoch: 13 Global Step: 270040 Fp16 Grad 
Scale: 65536 Required: 51 hours Training: 2022-04-01 16:18:11,316-Speed 6366.22 samples/sec Loss 6.0411 LearningRate 0.0006 Epoch: 13 Global Step: 270050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:14,552-Speed 6330.22 samples/sec Loss 5.9958 LearningRate 0.0006 Epoch: 13 Global Step: 270060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:17,786-Speed 6333.24 samples/sec Loss 6.0243 LearningRate 0.0006 Epoch: 13 Global Step: 270070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:21,022-Speed 6331.51 samples/sec Loss 5.9887 LearningRate 0.0006 Epoch: 13 Global Step: 270080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:24,261-Speed 6327.06 samples/sec Loss 5.9686 LearningRate 0.0006 Epoch: 13 Global Step: 270090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:27,497-Speed 6332.01 samples/sec Loss 6.0195 LearningRate 0.0006 Epoch: 13 Global Step: 270100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:30,730-Speed 6334.74 samples/sec Loss 5.9651 LearningRate 0.0006 Epoch: 13 Global Step: 270110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:33,966-Speed 6330.75 samples/sec Loss 5.9452 LearningRate 0.0006 Epoch: 13 Global Step: 270120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:37,201-Speed 6331.27 samples/sec Loss 5.9958 LearningRate 0.0006 Epoch: 13 Global Step: 270130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:40,437-Speed 6330.23 samples/sec Loss 5.9837 LearningRate 0.0006 Epoch: 13 Global Step: 270140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:43,660-Speed 6356.92 samples/sec Loss 5.9943 LearningRate 0.0006 Epoch: 13 Global Step: 270150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:46,897-Speed 6327.41 samples/sec Loss 6.0327 LearningRate 0.0006 Epoch: 13 Global Step: 270160 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:18:50,137-Speed 6322.98 samples/sec Loss 6.0450 LearningRate 0.0006 Epoch: 13 Global Step: 270170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:53,373-Speed 6329.88 samples/sec Loss 5.9966 LearningRate 0.0006 Epoch: 13 Global Step: 270180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:56,610-Speed 6328.94 samples/sec Loss 5.9281 LearningRate 0.0006 Epoch: 13 Global Step: 270190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:18:59,847-Speed 6327.60 samples/sec Loss 5.9438 LearningRate 0.0006 Epoch: 13 Global Step: 270200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:03,083-Speed 6330.44 samples/sec Loss 5.8622 LearningRate 0.0006 Epoch: 13 Global Step: 270210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:06,322-Speed 6324.65 samples/sec Loss 5.9726 LearningRate 0.0006 Epoch: 13 Global Step: 270220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:09,562-Speed 6322.85 samples/sec Loss 5.9834 LearningRate 0.0006 Epoch: 13 Global Step: 270230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:12,796-Speed 6332.54 samples/sec Loss 5.9992 LearningRate 0.0006 Epoch: 13 Global Step: 270240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:16,023-Speed 6349.86 samples/sec Loss 6.0110 LearningRate 0.0006 Epoch: 13 Global Step: 270250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:19,257-Speed 6333.66 samples/sec Loss 5.9922 LearningRate 0.0006 Epoch: 13 Global Step: 270260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:22,497-Speed 6322.40 samples/sec Loss 5.9986 LearningRate 0.0006 Epoch: 13 Global Step: 270270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:25,734-Speed 6328.44 samples/sec Loss 5.9130 LearningRate 0.0006 Epoch: 13 Global Step: 270280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:19:28,969-Speed 6331.86 samples/sec Loss 5.9320 LearningRate 0.0006 Epoch: 13 Global Step: 270290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:32,206-Speed 6328.44 samples/sec Loss 5.9908 LearningRate 0.0006 Epoch: 13 Global Step: 270300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:35,442-Speed 6331.25 samples/sec Loss 5.9042 LearningRate 0.0006 Epoch: 13 Global Step: 270310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:38,685-Speed 6315.97 samples/sec Loss 5.9199 LearningRate 0.0006 Epoch: 13 Global Step: 270320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:41,930-Speed 6312.69 samples/sec Loss 6.0165 LearningRate 0.0006 Epoch: 13 Global Step: 270330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:45,167-Speed 6327.63 samples/sec Loss 5.9086 LearningRate 0.0006 Epoch: 13 Global Step: 270340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:48,405-Speed 6326.61 samples/sec Loss 5.9784 LearningRate 0.0006 Epoch: 13 Global Step: 270350 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:19:51,627-Speed 6358.17 samples/sec Loss 5.9527 LearningRate 0.0006 Epoch: 13 Global Step: 270360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:54,869-Speed 6317.65 samples/sec Loss 5.9833 LearningRate 0.0006 Epoch: 13 Global Step: 270370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:19:58,106-Speed 6329.66 samples/sec Loss 5.9252 LearningRate 0.0006 Epoch: 13 Global Step: 270380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:01,350-Speed 6314.13 samples/sec Loss 6.0129 LearningRate 0.0006 Epoch: 13 Global Step: 270390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:04,590-Speed 6323.92 samples/sec Loss 5.9338 LearningRate 0.0006 Epoch: 13 Global Step: 270400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:07,828-Speed 6325.71 
samples/sec Loss 6.0267 LearningRate 0.0006 Epoch: 13 Global Step: 270410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:11,068-Speed 6323.20 samples/sec Loss 6.0098 LearningRate 0.0006 Epoch: 13 Global Step: 270420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:14,308-Speed 6321.28 samples/sec Loss 5.9940 LearningRate 0.0006 Epoch: 13 Global Step: 270430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:17,553-Speed 6313.51 samples/sec Loss 6.0284 LearningRate 0.0006 Epoch: 13 Global Step: 270440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:20,800-Speed 6307.25 samples/sec Loss 5.9341 LearningRate 0.0006 Epoch: 13 Global Step: 270450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:24,023-Speed 6357.19 samples/sec Loss 6.0034 LearningRate 0.0006 Epoch: 13 Global Step: 270460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:27,259-Speed 6329.05 samples/sec Loss 5.9479 LearningRate 0.0006 Epoch: 13 Global Step: 270470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:30,496-Speed 6328.47 samples/sec Loss 6.0259 LearningRate 0.0006 Epoch: 13 Global Step: 270480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:33,737-Speed 6319.72 samples/sec Loss 6.0408 LearningRate 0.0006 Epoch: 13 Global Step: 270490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:36,977-Speed 6322.79 samples/sec Loss 6.0230 LearningRate 0.0006 Epoch: 13 Global Step: 270500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:40,223-Speed 6312.25 samples/sec Loss 5.9302 LearningRate 0.0006 Epoch: 13 Global Step: 270510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:43,462-Speed 6324.90 samples/sec Loss 5.9475 LearningRate 0.0006 Epoch: 13 Global Step: 270520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:46,698-Speed 6330.35 samples/sec Loss 5.9698 
LearningRate 0.0006 Epoch: 13 Global Step: 270530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:49,940-Speed 6317.56 samples/sec Loss 5.9389 LearningRate 0.0006 Epoch: 13 Global Step: 270540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:53,180-Speed 6322.24 samples/sec Loss 5.9963 LearningRate 0.0006 Epoch: 13 Global Step: 270550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:56,407-Speed 6347.86 samples/sec Loss 5.9996 LearningRate 0.0006 Epoch: 13 Global Step: 270560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:20:59,644-Speed 6327.99 samples/sec Loss 6.0749 LearningRate 0.0006 Epoch: 13 Global Step: 270570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:02,885-Speed 6320.71 samples/sec Loss 5.9941 LearningRate 0.0006 Epoch: 13 Global Step: 270580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:06,126-Speed 6320.06 samples/sec Loss 6.0135 LearningRate 0.0006 Epoch: 13 Global Step: 270590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:09,364-Speed 6327.99 samples/sec Loss 5.9502 LearningRate 0.0006 Epoch: 13 Global Step: 270600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:12,602-Speed 6325.38 samples/sec Loss 5.8984 LearningRate 0.0006 Epoch: 13 Global Step: 270610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:15,840-Speed 6326.36 samples/sec Loss 6.0793 LearningRate 0.0006 Epoch: 13 Global Step: 270620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:19,079-Speed 6324.83 samples/sec Loss 6.0273 LearningRate 0.0006 Epoch: 13 Global Step: 270630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:22,320-Speed 6320.63 samples/sec Loss 5.9375 LearningRate 0.0006 Epoch: 13 Global Step: 270640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:25,558-Speed 6326.44 samples/sec Loss 5.9907 LearningRate 0.0006 Epoch: 13 
Global Step: 270650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:28,786-Speed 6345.53 samples/sec Loss 5.9931 LearningRate 0.0006 Epoch: 13 Global Step: 270660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:32,027-Speed 6321.06 samples/sec Loss 5.9594 LearningRate 0.0006 Epoch: 13 Global Step: 270670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:35,268-Speed 6319.97 samples/sec Loss 5.8933 LearningRate 0.0006 Epoch: 13 Global Step: 270680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:38,507-Speed 6324.44 samples/sec Loss 6.0076 LearningRate 0.0006 Epoch: 13 Global Step: 270690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:41,745-Speed 6326.62 samples/sec Loss 5.8777 LearningRate 0.0006 Epoch: 13 Global Step: 270700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:44,986-Speed 6319.80 samples/sec Loss 5.9197 LearningRate 0.0006 Epoch: 13 Global Step: 270710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:48,226-Speed 6324.14 samples/sec Loss 5.9088 LearningRate 0.0006 Epoch: 13 Global Step: 270720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:51,468-Speed 6318.40 samples/sec Loss 5.9908 LearningRate 0.0006 Epoch: 13 Global Step: 270730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:54,706-Speed 6325.97 samples/sec Loss 5.9984 LearningRate 0.0006 Epoch: 13 Global Step: 270740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:21:57,944-Speed 6324.76 samples/sec Loss 5.9191 LearningRate 0.0006 Epoch: 13 Global Step: 270750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:01,190-Speed 6312.02 samples/sec Loss 6.0379 LearningRate 0.0006 Epoch: 13 Global Step: 270760 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:22:04,416-Speed 6350.30 samples/sec Loss 6.0202 LearningRate 0.0006 Epoch: 13 Global Step: 270770 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:07,656-Speed 6320.89 samples/sec Loss 5.9607 LearningRate 0.0006 Epoch: 13 Global Step: 270780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:10,895-Speed 6325.72 samples/sec Loss 5.9588 LearningRate 0.0006 Epoch: 13 Global Step: 270790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:14,135-Speed 6322.11 samples/sec Loss 5.9363 LearningRate 0.0006 Epoch: 13 Global Step: 270800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:17,394-Speed 6286.29 samples/sec Loss 6.0119 LearningRate 0.0006 Epoch: 13 Global Step: 270810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:20,635-Speed 6318.97 samples/sec Loss 5.9853 LearningRate 0.0006 Epoch: 13 Global Step: 270820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:23,874-Speed 6324.16 samples/sec Loss 5.9639 LearningRate 0.0006 Epoch: 13 Global Step: 270830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:27,114-Speed 6323.03 samples/sec Loss 5.9848 LearningRate 0.0006 Epoch: 13 Global Step: 270840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:30,358-Speed 6314.77 samples/sec Loss 5.9564 LearningRate 0.0006 Epoch: 13 Global Step: 270850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:33,599-Speed 6319.48 samples/sec Loss 5.9242 LearningRate 0.0006 Epoch: 13 Global Step: 270860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:36,825-Speed 6349.67 samples/sec Loss 6.0015 LearningRate 0.0006 Epoch: 13 Global Step: 270870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:40,067-Speed 6320.46 samples/sec Loss 5.9275 LearningRate 0.0006 Epoch: 13 Global Step: 270880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:43,305-Speed 6325.82 samples/sec Loss 5.9621 LearningRate 0.0006 Epoch: 13 Global Step: 270890 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:22:46,547-Speed 6317.70 samples/sec Loss 6.0106 LearningRate 0.0006 Epoch: 13 Global Step: 270900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:49,790-Speed 6316.82 samples/sec Loss 5.9920 LearningRate 0.0006 Epoch: 13 Global Step: 270910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:53,028-Speed 6326.21 samples/sec Loss 6.0166 LearningRate 0.0006 Epoch: 13 Global Step: 270920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:56,266-Speed 6326.45 samples/sec Loss 5.9479 LearningRate 0.0006 Epoch: 13 Global Step: 270930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:22:59,510-Speed 6316.78 samples/sec Loss 5.9358 LearningRate 0.0006 Epoch: 13 Global Step: 270940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:02,748-Speed 6326.19 samples/sec Loss 5.8883 LearningRate 0.0006 Epoch: 13 Global Step: 270950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:05,986-Speed 6324.59 samples/sec Loss 5.9376 LearningRate 0.0006 Epoch: 13 Global Step: 270960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:09,214-Speed 6347.57 samples/sec Loss 6.1088 LearningRate 0.0006 Epoch: 13 Global Step: 270970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:12,457-Speed 6316.26 samples/sec Loss 5.9005 LearningRate 0.0006 Epoch: 13 Global Step: 270980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:15,699-Speed 6318.58 samples/sec Loss 6.0100 LearningRate 0.0006 Epoch: 13 Global Step: 270990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:18,946-Speed 6308.52 samples/sec Loss 5.9798 LearningRate 0.0006 Epoch: 13 Global Step: 271000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:22,186-Speed 6322.22 samples/sec Loss 6.0013 LearningRate 0.0006 Epoch: 13 Global Step: 271010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:23:25,426-Speed 6322.64 samples/sec Loss 5.9464 LearningRate 0.0006 Epoch: 13 Global Step: 271020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:28,666-Speed 6323.25 samples/sec Loss 5.8987 LearningRate 0.0006 Epoch: 13 Global Step: 271030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:31,907-Speed 6318.78 samples/sec Loss 5.8801 LearningRate 0.0006 Epoch: 13 Global Step: 271040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:35,146-Speed 6325.62 samples/sec Loss 5.9419 LearningRate 0.0006 Epoch: 13 Global Step: 271050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:38,384-Speed 6326.01 samples/sec Loss 5.9734 LearningRate 0.0006 Epoch: 13 Global Step: 271060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:41,613-Speed 6342.83 samples/sec Loss 5.9481 LearningRate 0.0006 Epoch: 13 Global Step: 271070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:44,856-Speed 6316.33 samples/sec Loss 5.9814 LearningRate 0.0006 Epoch: 13 Global Step: 271080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:48,098-Speed 6320.25 samples/sec Loss 6.0107 LearningRate 0.0006 Epoch: 13 Global Step: 271090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:51,340-Speed 6318.72 samples/sec Loss 5.9099 LearningRate 0.0006 Epoch: 13 Global Step: 271100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:54,586-Speed 6310.01 samples/sec Loss 5.9790 LearningRate 0.0006 Epoch: 13 Global Step: 271110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:23:57,829-Speed 6316.61 samples/sec Loss 5.9292 LearningRate 0.0006 Epoch: 13 Global Step: 271120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:01,077-Speed 6307.17 samples/sec Loss 5.9532 LearningRate 0.0006 Epoch: 13 Global Step: 271130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:04,320-Speed 6315.75 
samples/sec Loss 5.9440 LearningRate 0.0006 Epoch: 13 Global Step: 271140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:07,561-Speed 6321.85 samples/sec Loss 6.0711 LearningRate 0.0006 Epoch: 13 Global Step: 271150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:10,805-Speed 6314.35 samples/sec Loss 5.9100 LearningRate 0.0006 Epoch: 13 Global Step: 271160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:14,046-Speed 6319.73 samples/sec Loss 5.9225 LearningRate 0.0006 Epoch: 13 Global Step: 271170 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:24:17,275-Speed 6345.19 samples/sec Loss 5.9981 LearningRate 0.0006 Epoch: 13 Global Step: 271180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:20,529-Speed 6294.02 samples/sec Loss 6.0211 LearningRate 0.0006 Epoch: 13 Global Step: 271190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:23,772-Speed 6317.65 samples/sec Loss 6.0006 LearningRate 0.0006 Epoch: 13 Global Step: 271200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:27,013-Speed 6320.66 samples/sec Loss 5.9849 LearningRate 0.0006 Epoch: 13 Global Step: 271210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:30,253-Speed 6320.87 samples/sec Loss 5.9893 LearningRate 0.0006 Epoch: 13 Global Step: 271220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:33,496-Speed 6317.66 samples/sec Loss 5.9629 LearningRate 0.0006 Epoch: 13 Global Step: 271230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:36,739-Speed 6316.33 samples/sec Loss 5.9602 LearningRate 0.0006 Epoch: 13 Global Step: 271240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:39,986-Speed 6308.37 samples/sec Loss 5.9269 LearningRate 0.0006 Epoch: 13 Global Step: 271250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:43,230-Speed 6315.04 samples/sec Loss 5.8894 
LearningRate 0.0006 Epoch: 13 Global Step: 271260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:46,475-Speed 6312.79 samples/sec Loss 5.9370 LearningRate 0.0006 Epoch: 13 Global Step: 271270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:49,706-Speed 6340.27 samples/sec Loss 6.0342 LearningRate 0.0006 Epoch: 13 Global Step: 271280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:52,948-Speed 6318.17 samples/sec Loss 5.8887 LearningRate 0.0006 Epoch: 13 Global Step: 271290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:56,188-Speed 6321.71 samples/sec Loss 5.9543 LearningRate 0.0006 Epoch: 13 Global Step: 271300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:24:59,433-Speed 6313.65 samples/sec Loss 6.0215 LearningRate 0.0006 Epoch: 13 Global Step: 271310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:02,677-Speed 6313.28 samples/sec Loss 6.0069 LearningRate 0.0006 Epoch: 13 Global Step: 271320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:05,922-Speed 6313.68 samples/sec Loss 5.9565 LearningRate 0.0006 Epoch: 13 Global Step: 271330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:09,198-Speed 6253.06 samples/sec Loss 5.9066 LearningRate 0.0006 Epoch: 13 Global Step: 271340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:12,524-Speed 6158.42 samples/sec Loss 6.0175 LearningRate 0.0006 Epoch: 13 Global Step: 271350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:15,769-Speed 6314.69 samples/sec Loss 5.9564 LearningRate 0.0006 Epoch: 13 Global Step: 271360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:19,011-Speed 6318.17 samples/sec Loss 5.9462 LearningRate 0.0006 Epoch: 13 Global Step: 271370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:22,243-Speed 6338.69 samples/sec Loss 5.9601 LearningRate 0.0006 Epoch: 13 
Global Step: 271380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:25,489-Speed 6311.48 samples/sec Loss 5.9920 LearningRate 0.0006 Epoch: 13 Global Step: 271390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:28,734-Speed 6312.02 samples/sec Loss 6.0332 LearningRate 0.0006 Epoch: 13 Global Step: 271400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:31,974-Speed 6321.48 samples/sec Loss 5.9698 LearningRate 0.0006 Epoch: 13 Global Step: 271410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:35,216-Speed 6318.74 samples/sec Loss 5.9290 LearningRate 0.0006 Epoch: 13 Global Step: 271420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:38,459-Speed 6316.28 samples/sec Loss 6.0446 LearningRate 0.0006 Epoch: 13 Global Step: 271430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:41,773-Speed 6182.88 samples/sec Loss 5.9071 LearningRate 0.0006 Epoch: 13 Global Step: 271440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:45,015-Speed 6316.55 samples/sec Loss 5.9389 LearningRate 0.0006 Epoch: 13 Global Step: 271450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:48,257-Speed 6318.98 samples/sec Loss 5.8961 LearningRate 0.0006 Epoch: 13 Global Step: 271460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:51,504-Speed 6309.61 samples/sec Loss 5.9852 LearningRate 0.0006 Epoch: 13 Global Step: 271470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:54,735-Speed 6339.86 samples/sec Loss 5.9925 LearningRate 0.0006 Epoch: 13 Global Step: 271480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:25:57,980-Speed 6312.58 samples/sec Loss 5.8699 LearningRate 0.0006 Epoch: 13 Global Step: 271490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:01,228-Speed 6306.40 samples/sec Loss 5.9919 LearningRate 0.0006 Epoch: 13 Global Step: 271500 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:04,469-Speed 6319.83 samples/sec Loss 5.9782 LearningRate 0.0006 Epoch: 13 Global Step: 271510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:07,713-Speed 6315.58 samples/sec Loss 5.9364 LearningRate 0.0006 Epoch: 13 Global Step: 271520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:10,959-Speed 6310.08 samples/sec Loss 6.0459 LearningRate 0.0006 Epoch: 13 Global Step: 271530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:14,203-Speed 6315.83 samples/sec Loss 5.9290 LearningRate 0.0006 Epoch: 13 Global Step: 271540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:17,446-Speed 6316.46 samples/sec Loss 5.9549 LearningRate 0.0006 Epoch: 13 Global Step: 271550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:20,690-Speed 6314.86 samples/sec Loss 5.9783 LearningRate 0.0006 Epoch: 13 Global Step: 271560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:23,937-Speed 6309.05 samples/sec Loss 5.9975 LearningRate 0.0006 Epoch: 13 Global Step: 271570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:27,169-Speed 6337.51 samples/sec Loss 5.9578 LearningRate 0.0006 Epoch: 13 Global Step: 271580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:26:30,396-Speed 6347.88 samples/sec Loss 6.0275 LearningRate 0.0006 Epoch: 13 Global Step: 271590 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:33,644-Speed 6306.54 samples/sec Loss 6.0184 LearningRate 0.0006 Epoch: 13 Global Step: 271600 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:36,887-Speed 6316.06 samples/sec Loss 5.9132 LearningRate 0.0006 Epoch: 13 Global Step: 271610 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:40,132-Speed 6314.68 samples/sec Loss 5.9662 LearningRate 0.0006 Epoch: 13 Global Step: 271620 Fp16 Grad Scale: 16384 Required: 51 hours 
Training: 2022-04-01 16:26:43,372-Speed 6320.92 samples/sec Loss 5.9742 LearningRate 0.0006 Epoch: 13 Global Step: 271630 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:46,615-Speed 6317.11 samples/sec Loss 5.9132 LearningRate 0.0006 Epoch: 13 Global Step: 271640 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:49,856-Speed 6320.19 samples/sec Loss 5.9704 LearningRate 0.0006 Epoch: 13 Global Step: 271650 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:53,098-Speed 6318.53 samples/sec Loss 5.9486 LearningRate 0.0006 Epoch: 13 Global Step: 271660 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:56,343-Speed 6313.19 samples/sec Loss 5.9568 LearningRate 0.0006 Epoch: 13 Global Step: 271670 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:26:59,583-Speed 6322.60 samples/sec Loss 6.0362 LearningRate 0.0006 Epoch: 13 Global Step: 271680 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:02,821-Speed 6326.58 samples/sec Loss 5.9254 LearningRate 0.0006 Epoch: 13 Global Step: 271690 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:06,066-Speed 6313.04 samples/sec Loss 6.0195 LearningRate 0.0006 Epoch: 13 Global Step: 271700 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:09,334-Speed 6268.36 samples/sec Loss 6.0285 LearningRate 0.0006 Epoch: 13 Global Step: 271710 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:12,609-Speed 6252.95 samples/sec Loss 5.9915 LearningRate 0.0006 Epoch: 13 Global Step: 271720 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:15,851-Speed 6320.03 samples/sec Loss 5.8898 LearningRate 0.0006 Epoch: 13 Global Step: 271730 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:19,099-Speed 6305.70 samples/sec Loss 6.0136 LearningRate 0.0006 Epoch: 13 Global Step: 271740 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 
16:27:22,349-Speed 6304.06 samples/sec Loss 5.9322 LearningRate 0.0006 Epoch: 13 Global Step: 271750 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:25,594-Speed 6312.18 samples/sec Loss 5.9540 LearningRate 0.0006 Epoch: 13 Global Step: 271760 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:28,843-Speed 6304.90 samples/sec Loss 5.9582 LearningRate 0.0006 Epoch: 13 Global Step: 271770 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:32,086-Speed 6317.91 samples/sec Loss 5.8987 LearningRate 0.0006 Epoch: 13 Global Step: 271780 Fp16 Grad Scale: 16384 Required: 51 hours Training: 2022-04-01 16:27:35,331-Speed 6311.90 samples/sec Loss 5.9775 LearningRate 0.0006 Epoch: 13 Global Step: 271790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:38,576-Speed 6312.81 samples/sec Loss 5.9552 LearningRate 0.0006 Epoch: 13 Global Step: 271800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:41,824-Speed 6307.58 samples/sec Loss 5.8826 LearningRate 0.0006 Epoch: 13 Global Step: 271810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:45,069-Speed 6311.74 samples/sec Loss 5.9408 LearningRate 0.0006 Epoch: 13 Global Step: 271820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:48,316-Speed 6309.48 samples/sec Loss 6.0125 LearningRate 0.0006 Epoch: 13 Global Step: 271830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:51,562-Speed 6312.75 samples/sec Loss 5.9544 LearningRate 0.0006 Epoch: 13 Global Step: 271840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:54,829-Speed 6269.87 samples/sec Loss 5.9322 LearningRate 0.0006 Epoch: 13 Global Step: 271850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:27:58,073-Speed 6314.55 samples/sec Loss 5.9881 LearningRate 0.0006 Epoch: 13 Global Step: 271860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:01,316-Speed 6315.58 
samples/sec Loss 6.0240 LearningRate 0.0006 Epoch: 13 Global Step: 271870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:04,568-Speed 6299.73 samples/sec Loss 5.9210 LearningRate 0.0006 Epoch: 13 Global Step: 271880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:07,811-Speed 6316.45 samples/sec Loss 5.9463 LearningRate 0.0006 Epoch: 13 Global Step: 271890 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:28:11,041-Speed 6342.48 samples/sec Loss 5.8991 LearningRate 0.0006 Epoch: 13 Global Step: 271900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:14,286-Speed 6312.97 samples/sec Loss 5.9049 LearningRate 0.0006 Epoch: 13 Global Step: 271910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:17,534-Speed 6305.25 samples/sec Loss 6.0153 LearningRate 0.0006 Epoch: 13 Global Step: 271920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:20,778-Speed 6315.56 samples/sec Loss 5.8676 LearningRate 0.0006 Epoch: 13 Global Step: 271930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:24,021-Speed 6316.17 samples/sec Loss 5.9709 LearningRate 0.0006 Epoch: 13 Global Step: 271940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:27,270-Speed 6305.17 samples/sec Loss 6.0009 LearningRate 0.0006 Epoch: 13 Global Step: 271950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:30,514-Speed 6314.54 samples/sec Loss 6.0051 LearningRate 0.0006 Epoch: 13 Global Step: 271960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:33,759-Speed 6312.92 samples/sec Loss 6.0026 LearningRate 0.0006 Epoch: 13 Global Step: 271970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:37,011-Speed 6298.57 samples/sec Loss 5.9851 LearningRate 0.0006 Epoch: 13 Global Step: 271980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:40,259-Speed 6308.80 samples/sec Loss 5.9748 
LearningRate 0.0006 Epoch: 13 Global Step: 271990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:43,494-Speed 6331.15 samples/sec Loss 5.9408 LearningRate 0.0006 Epoch: 13 Global Step: 272000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:46,734-Speed 6321.76 samples/sec Loss 5.9669 LearningRate 0.0006 Epoch: 13 Global Step: 272010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:49,980-Speed 6313.01 samples/sec Loss 5.9224 LearningRate 0.0006 Epoch: 13 Global Step: 272020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:53,222-Speed 6318.01 samples/sec Loss 5.9248 LearningRate 0.0006 Epoch: 13 Global Step: 272030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:56,464-Speed 6318.99 samples/sec Loss 5.8420 LearningRate 0.0006 Epoch: 13 Global Step: 272040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:28:59,715-Speed 6299.41 samples/sec Loss 6.0092 LearningRate 0.0006 Epoch: 13 Global Step: 272050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:02,962-Speed 6309.16 samples/sec Loss 5.9521 LearningRate 0.0006 Epoch: 13 Global Step: 272060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:06,206-Speed 6314.28 samples/sec Loss 6.0069 LearningRate 0.0006 Epoch: 13 Global Step: 272070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:09,451-Speed 6313.97 samples/sec Loss 5.9724 LearningRate 0.0006 Epoch: 13 Global Step: 272080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:12,695-Speed 6313.28 samples/sec Loss 6.0493 LearningRate 0.0006 Epoch: 13 Global Step: 272090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:15,922-Speed 6348.45 samples/sec Loss 5.9846 LearningRate 0.0006 Epoch: 13 Global Step: 272100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:19,166-Speed 6313.27 samples/sec Loss 5.9243 LearningRate 0.0006 Epoch: 13 
Global Step: 272110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:22,410-Speed 6315.86 samples/sec Loss 5.9877 LearningRate 0.0006 Epoch: 13 Global Step: 272120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:25,654-Speed 6313.55 samples/sec Loss 5.9613 LearningRate 0.0006 Epoch: 13 Global Step: 272130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:28,898-Speed 6315.10 samples/sec Loss 5.9048 LearningRate 0.0006 Epoch: 13 Global Step: 272140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:32,143-Speed 6312.62 samples/sec Loss 5.9622 LearningRate 0.0006 Epoch: 13 Global Step: 272150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:35,390-Speed 6309.42 samples/sec Loss 5.9662 LearningRate 0.0006 Epoch: 13 Global Step: 272160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:38,632-Speed 6317.35 samples/sec Loss 6.0025 LearningRate 0.0006 Epoch: 13 Global Step: 272170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:41,878-Speed 6315.23 samples/sec Loss 5.9478 LearningRate 0.0006 Epoch: 13 Global Step: 272180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:45,122-Speed 6314.09 samples/sec Loss 5.9992 LearningRate 0.0006 Epoch: 13 Global Step: 272190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:48,358-Speed 6330.82 samples/sec Loss 6.0292 LearningRate 0.0006 Epoch: 13 Global Step: 272200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:51,599-Speed 6320.44 samples/sec Loss 6.0058 LearningRate 0.0006 Epoch: 13 Global Step: 272210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:54,847-Speed 6307.90 samples/sec Loss 5.9567 LearningRate 0.0006 Epoch: 13 Global Step: 272220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:29:58,089-Speed 6317.35 samples/sec Loss 5.9176 LearningRate 0.0006 Epoch: 13 Global Step: 272230 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:01,335-Speed 6311.55 samples/sec Loss 5.9803 LearningRate 0.0006 Epoch: 13 Global Step: 272240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:04,581-Speed 6310.48 samples/sec Loss 5.9657 LearningRate 0.0006 Epoch: 13 Global Step: 272250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:07,826-Speed 6312.03 samples/sec Loss 5.9109 LearningRate 0.0006 Epoch: 13 Global Step: 272260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:11,072-Speed 6310.72 samples/sec Loss 6.0003 LearningRate 0.0006 Epoch: 13 Global Step: 272270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:14,316-Speed 6315.70 samples/sec Loss 5.9256 LearningRate 0.0006 Epoch: 13 Global Step: 272280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:17,558-Speed 6317.83 samples/sec Loss 5.9817 LearningRate 0.0006 Epoch: 13 Global Step: 272290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:20,791-Speed 6336.91 samples/sec Loss 5.9722 LearningRate 0.0006 Epoch: 13 Global Step: 272300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:24,038-Speed 6308.36 samples/sec Loss 5.9860 LearningRate 0.0006 Epoch: 13 Global Step: 272310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:27,286-Speed 6307.36 samples/sec Loss 5.8395 LearningRate 0.0006 Epoch: 13 Global Step: 272320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:30,531-Speed 6311.12 samples/sec Loss 5.9289 LearningRate 0.0006 Epoch: 13 Global Step: 272330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:33,776-Speed 6313.44 samples/sec Loss 5.9868 LearningRate 0.0006 Epoch: 13 Global Step: 272340 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:37,021-Speed 6312.36 samples/sec Loss 5.9864 LearningRate 0.0006 Epoch: 13 Global Step: 272350 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:30:40,265-Speed 6313.80 samples/sec Loss 5.9197 LearningRate 0.0006 Epoch: 13 Global Step: 272360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:43,515-Speed 6303.70 samples/sec Loss 5.9123 LearningRate 0.0006 Epoch: 13 Global Step: 272370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:46,754-Speed 6324.02 samples/sec Loss 6.0198 LearningRate 0.0006 Epoch: 13 Global Step: 272380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:50,004-Speed 6306.73 samples/sec Loss 5.9569 LearningRate 0.0006 Epoch: 13 Global Step: 272390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:53,233-Speed 6343.74 samples/sec Loss 6.0012 LearningRate 0.0006 Epoch: 13 Global Step: 272400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:56,480-Speed 6308.89 samples/sec Loss 5.9174 LearningRate 0.0006 Epoch: 13 Global Step: 272410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:30:59,724-Speed 6315.51 samples/sec Loss 5.9067 LearningRate 0.0006 Epoch: 13 Global Step: 272420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:02,972-Speed 6306.27 samples/sec Loss 5.9677 LearningRate 0.0006 Epoch: 13 Global Step: 272430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:06,220-Speed 6308.15 samples/sec Loss 5.9592 LearningRate 0.0006 Epoch: 13 Global Step: 272440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:09,463-Speed 6315.31 samples/sec Loss 5.8866 LearningRate 0.0006 Epoch: 13 Global Step: 272450 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:12,706-Speed 6317.12 samples/sec Loss 6.0407 LearningRate 0.0006 Epoch: 13 Global Step: 272460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:15,953-Speed 6307.76 samples/sec Loss 5.9576 LearningRate 0.0006 Epoch: 13 Global Step: 272470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:31:19,202-Speed 6304.85 samples/sec Loss 5.9287 LearningRate 0.0006 Epoch: 13 Global Step: 272480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:22,445-Speed 6317.72 samples/sec Loss 5.9451 LearningRate 0.0006 Epoch: 13 Global Step: 272490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:25,687-Speed 6319.12 samples/sec Loss 5.9847 LearningRate 0.0006 Epoch: 13 Global Step: 272500 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:31:28,991-Speed 6198.86 samples/sec Loss 5.9354 LearningRate 0.0006 Epoch: 13 Global Step: 272510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:32,258-Speed 6271.34 samples/sec Loss 5.8519 LearningRate 0.0006 Epoch: 13 Global Step: 272520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:35,501-Speed 6314.93 samples/sec Loss 5.9756 LearningRate 0.0006 Epoch: 13 Global Step: 272530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:38,746-Speed 6313.60 samples/sec Loss 6.0177 LearningRate 0.0006 Epoch: 13 Global Step: 272540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:41,987-Speed 6319.50 samples/sec Loss 5.9184 LearningRate 0.0006 Epoch: 13 Global Step: 272550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:45,234-Speed 6309.90 samples/sec Loss 5.9572 LearningRate 0.0006 Epoch: 13 Global Step: 272560 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:48,479-Speed 6312.75 samples/sec Loss 5.9239 LearningRate 0.0006 Epoch: 13 Global Step: 272570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:51,728-Speed 6304.90 samples/sec Loss 5.9396 LearningRate 0.0006 Epoch: 13 Global Step: 272580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:54,971-Speed 6315.98 samples/sec Loss 5.9395 LearningRate 0.0006 Epoch: 13 Global Step: 272590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:31:58,224-Speed 6297.32 
samples/sec Loss 6.0041 LearningRate 0.0006 Epoch: 13 Global Step: 272600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:01,463-Speed 6324.61 samples/sec Loss 6.0193 LearningRate 0.0006 Epoch: 13 Global Step: 272610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:04,713-Speed 6301.92 samples/sec Loss 5.9964 LearningRate 0.0006 Epoch: 13 Global Step: 272620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:07,962-Speed 6305.21 samples/sec Loss 6.0036 LearningRate 0.0006 Epoch: 13 Global Step: 272630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:11,207-Speed 6313.11 samples/sec Loss 5.9946 LearningRate 0.0006 Epoch: 13 Global Step: 272640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:14,453-Speed 6312.22 samples/sec Loss 5.9985 LearningRate 0.0006 Epoch: 13 Global Step: 272650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:17,699-Speed 6310.11 samples/sec Loss 5.8765 LearningRate 0.0006 Epoch: 13 Global Step: 272660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:20,943-Speed 6315.37 samples/sec Loss 5.9930 LearningRate 0.0006 Epoch: 13 Global Step: 272670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:24,187-Speed 6314.23 samples/sec Loss 5.9534 LearningRate 0.0006 Epoch: 13 Global Step: 272680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:27,433-Speed 6310.87 samples/sec Loss 6.0285 LearningRate 0.0006 Epoch: 13 Global Step: 272690 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:30,677-Speed 6314.59 samples/sec Loss 5.9971 LearningRate 0.0006 Epoch: 13 Global Step: 272700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:33,922-Speed 6311.73 samples/sec Loss 5.9625 LearningRate 0.0006 Epoch: 13 Global Step: 272710 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:32:37,150-Speed 6345.38 samples/sec Loss 6.0434 
LearningRate 0.0006 Epoch: 13 Global Step: 272720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:40,397-Speed 6310.52 samples/sec Loss 5.9660 LearningRate 0.0006 Epoch: 13 Global Step: 272730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:43,641-Speed 6312.69 samples/sec Loss 6.0506 LearningRate 0.0006 Epoch: 13 Global Step: 272740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:46,889-Speed 6307.44 samples/sec Loss 6.0370 LearningRate 0.0006 Epoch: 13 Global Step: 272750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:50,135-Speed 6311.20 samples/sec Loss 6.0099 LearningRate 0.0006 Epoch: 13 Global Step: 272760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:53,381-Speed 6309.80 samples/sec Loss 5.9268 LearningRate 0.0006 Epoch: 13 Global Step: 272770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:56,636-Speed 6295.49 samples/sec Loss 5.8955 LearningRate 0.0006 Epoch: 13 Global Step: 272780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:32:59,879-Speed 6315.63 samples/sec Loss 6.0155 LearningRate 0.0006 Epoch: 13 Global Step: 272790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:03,126-Speed 6309.81 samples/sec Loss 5.8556 LearningRate 0.0006 Epoch: 13 Global Step: 272800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:06,368-Speed 6318.13 samples/sec Loss 5.9753 LearningRate 0.0006 Epoch: 13 Global Step: 272810 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:09,600-Speed 6337.77 samples/sec Loss 5.9299 LearningRate 0.0006 Epoch: 13 Global Step: 272820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:12,847-Speed 6308.83 samples/sec Loss 5.9041 LearningRate 0.0006 Epoch: 13 Global Step: 272830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:16,095-Speed 6306.63 samples/sec Loss 5.9691 LearningRate 0.0006 Epoch: 13 
Global Step: 272840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:19,346-Speed 6302.37 samples/sec Loss 5.9933 LearningRate 0.0006 Epoch: 13 Global Step: 272850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:22,591-Speed 6311.82 samples/sec Loss 5.9237 LearningRate 0.0006 Epoch: 13 Global Step: 272860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:25,861-Speed 6264.29 samples/sec Loss 5.9279 LearningRate 0.0006 Epoch: 13 Global Step: 272870 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:29,105-Speed 6315.57 samples/sec Loss 5.9088 LearningRate 0.0006 Epoch: 13 Global Step: 272880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:32,351-Speed 6311.06 samples/sec Loss 5.9882 LearningRate 0.0006 Epoch: 13 Global Step: 272890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:35,595-Speed 6313.69 samples/sec Loss 5.9168 LearningRate 0.0006 Epoch: 13 Global Step: 272900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:38,839-Speed 6313.91 samples/sec Loss 5.9850 LearningRate 0.0006 Epoch: 13 Global Step: 272910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:42,073-Speed 6334.67 samples/sec Loss 5.9293 LearningRate 0.0006 Epoch: 13 Global Step: 272920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:45,322-Speed 6304.60 samples/sec Loss 5.9432 LearningRate 0.0006 Epoch: 13 Global Step: 272930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:48,572-Speed 6304.04 samples/sec Loss 5.8595 LearningRate 0.0006 Epoch: 13 Global Step: 272940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:51,816-Speed 6313.34 samples/sec Loss 5.9166 LearningRate 0.0006 Epoch: 13 Global Step: 272950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:55,061-Speed 6312.77 samples/sec Loss 5.8800 LearningRate 0.0006 Epoch: 13 Global Step: 272960 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:33:58,313-Speed 6298.74 samples/sec Loss 5.9426 LearningRate 0.0006 Epoch: 13 Global Step: 272970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:01,560-Speed 6309.91 samples/sec Loss 5.9464 LearningRate 0.0006 Epoch: 13 Global Step: 272980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:04,809-Speed 6304.37 samples/sec Loss 5.9659 LearningRate 0.0006 Epoch: 13 Global Step: 272990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:08,061-Speed 6299.81 samples/sec Loss 5.9916 LearningRate 0.0006 Epoch: 13 Global Step: 273000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:11,315-Speed 6293.61 samples/sec Loss 5.9206 LearningRate 0.0006 Epoch: 13 Global Step: 273010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:14,560-Speed 6312.35 samples/sec Loss 5.9208 LearningRate 0.0006 Epoch: 13 Global Step: 273020 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:34:17,791-Speed 6342.18 samples/sec Loss 5.9270 LearningRate 0.0006 Epoch: 13 Global Step: 273030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:21,038-Speed 6308.48 samples/sec Loss 5.9166 LearningRate 0.0006 Epoch: 13 Global Step: 273040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:24,291-Speed 6297.02 samples/sec Loss 5.9366 LearningRate 0.0006 Epoch: 13 Global Step: 273050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:27,535-Speed 6313.75 samples/sec Loss 5.8691 LearningRate 0.0006 Epoch: 13 Global Step: 273060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:30,779-Speed 6315.43 samples/sec Loss 5.9606 LearningRate 0.0006 Epoch: 13 Global Step: 273070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:34,026-Speed 6310.44 samples/sec Loss 6.0274 LearningRate 0.0006 Epoch: 13 Global Step: 273080 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:34:37,279-Speed 6297.49 samples/sec Loss 5.8936 LearningRate 0.0006 Epoch: 13 Global Step: 273090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:40,524-Speed 6311.72 samples/sec Loss 5.9035 LearningRate 0.0006 Epoch: 13 Global Step: 273100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:43,767-Speed 6316.43 samples/sec Loss 5.9690 LearningRate 0.0006 Epoch: 13 Global Step: 273110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:47,017-Speed 6303.18 samples/sec Loss 5.9470 LearningRate 0.0006 Epoch: 13 Global Step: 273120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:50,261-Speed 6315.39 samples/sec Loss 5.8566 LearningRate 0.0006 Epoch: 13 Global Step: 273130 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:34:53,489-Speed 6344.63 samples/sec Loss 5.9972 LearningRate 0.0006 Epoch: 13 Global Step: 273140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:56,734-Speed 6312.89 samples/sec Loss 5.9374 LearningRate 0.0006 Epoch: 13 Global Step: 273150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:34:59,979-Speed 6313.81 samples/sec Loss 5.9315 LearningRate 0.0006 Epoch: 13 Global Step: 273160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:03,226-Speed 6307.58 samples/sec Loss 6.0040 LearningRate 0.0006 Epoch: 13 Global Step: 273170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:06,473-Speed 6308.35 samples/sec Loss 5.9539 LearningRate 0.0006 Epoch: 13 Global Step: 273180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:09,720-Speed 6310.08 samples/sec Loss 5.9762 LearningRate 0.0006 Epoch: 13 Global Step: 273190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:12,965-Speed 6312.42 samples/sec Loss 5.9628 LearningRate 0.0006 Epoch: 13 Global Step: 273200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:35:16,212-Speed 6309.20 samples/sec Loss 5.9012 LearningRate 0.0006 Epoch: 13 Global Step: 273210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:19,459-Speed 6307.61 samples/sec Loss 5.9272 LearningRate 0.0006 Epoch: 13 Global Step: 273220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:22,706-Speed 6309.15 samples/sec Loss 5.9084 LearningRate 0.0006 Epoch: 13 Global Step: 273230 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:25,936-Speed 6342.23 samples/sec Loss 5.9473 LearningRate 0.0006 Epoch: 13 Global Step: 273240 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:29,185-Speed 6304.57 samples/sec Loss 6.0069 LearningRate 0.0006 Epoch: 13 Global Step: 273250 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:32,433-Speed 6307.80 samples/sec Loss 5.9668 LearningRate 0.0006 Epoch: 13 Global Step: 273260 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:35,681-Speed 6306.43 samples/sec Loss 5.9527 LearningRate 0.0006 Epoch: 13 Global Step: 273270 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:38,927-Speed 6311.68 samples/sec Loss 5.9513 LearningRate 0.0006 Epoch: 13 Global Step: 273280 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:42,169-Speed 6318.00 samples/sec Loss 5.9152 LearningRate 0.0006 Epoch: 13 Global Step: 273290 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:45,416-Speed 6308.98 samples/sec Loss 5.9452 LearningRate 0.0006 Epoch: 13 Global Step: 273300 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:48,663-Speed 6309.43 samples/sec Loss 5.9530 LearningRate 0.0006 Epoch: 13 Global Step: 273310 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:51,908-Speed 6311.59 samples/sec Loss 5.9167 LearningRate 0.0006 Epoch: 13 Global Step: 273320 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:55,155-Speed 6309.34 
samples/sec Loss 5.9568 LearningRate 0.0006 Epoch: 13 Global Step: 273330 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:35:58,405-Speed 6302.65 samples/sec Loss 5.8625 LearningRate 0.0006 Epoch: 13 Global Step: 273340 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:36:01,638-Speed 6335.91 samples/sec Loss 5.8778 LearningRate 0.0006 Epoch: 13 Global Step: 273350 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:04,882-Speed 6315.70 samples/sec Loss 5.9906 LearningRate 0.0006 Epoch: 13 Global Step: 273360 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:08,127-Speed 6311.85 samples/sec Loss 5.9156 LearningRate 0.0006 Epoch: 13 Global Step: 273370 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:11,378-Speed 6301.32 samples/sec Loss 6.0187 LearningRate 0.0006 Epoch: 13 Global Step: 273380 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:14,628-Speed 6303.03 samples/sec Loss 5.9520 LearningRate 0.0006 Epoch: 13 Global Step: 273390 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:17,875-Speed 6308.40 samples/sec Loss 5.8702 LearningRate 0.0006 Epoch: 13 Global Step: 273400 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:21,121-Speed 6310.32 samples/sec Loss 5.9346 LearningRate 0.0006 Epoch: 13 Global Step: 273410 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:24,372-Speed 6301.54 samples/sec Loss 6.0054 LearningRate 0.0006 Epoch: 13 Global Step: 273420 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:27,617-Speed 6312.24 samples/sec Loss 5.9267 LearningRate 0.0006 Epoch: 13 Global Step: 273430 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:30,862-Speed 6312.89 samples/sec Loss 5.9387 LearningRate 0.0006 Epoch: 13 Global Step: 273440 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:34,110-Speed 6307.30 samples/sec Loss 5.8975 
LearningRate 0.0006 Epoch: 13 Global Step: 273450 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:36:37,377-Speed 6269.82 samples/sec Loss 6.0337 LearningRate 0.0006 Epoch: 13 Global Step: 273460 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:40,692-Speed 6179.55 samples/sec Loss 5.9704 LearningRate 0.0006 Epoch: 13 Global Step: 273470 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:43,935-Speed 6316.54 samples/sec Loss 5.9212 LearningRate 0.0006 Epoch: 13 Global Step: 273480 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:47,213-Speed 6248.70 samples/sec Loss 5.9828 LearningRate 0.0006 Epoch: 13 Global Step: 273490 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:50,577-Speed 6089.45 samples/sec Loss 5.9057 LearningRate 0.0006 Epoch: 13 Global Step: 273500 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:53,823-Speed 6312.72 samples/sec Loss 5.9807 LearningRate 0.0006 Epoch: 13 Global Step: 273510 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:36:57,068-Speed 6311.73 samples/sec Loss 5.9735 LearningRate 0.0006 Epoch: 13 Global Step: 273520 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:00,311-Speed 6317.79 samples/sec Loss 5.9517 LearningRate 0.0006 Epoch: 13 Global Step: 273530 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:03,557-Speed 6310.68 samples/sec Loss 5.9197 LearningRate 0.0006 Epoch: 13 Global Step: 273540 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:06,804-Speed 6309.04 samples/sec Loss 5.9162 LearningRate 0.0006 Epoch: 13 Global Step: 273550 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:10,048-Speed 6314.02 samples/sec Loss 5.8825 LearningRate 0.0006 Epoch: 13 Global Step: 273560 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:37:13,278-Speed 6341.71 samples/sec Loss 5.9780 LearningRate 0.0006 Epoch: 13 
Global Step: 273570 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:16,524-Speed 6312.89 samples/sec Loss 5.9656 LearningRate 0.0006 Epoch: 13 Global Step: 273580 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:19,768-Speed 6313.30 samples/sec Loss 5.9788 LearningRate 0.0006 Epoch: 13 Global Step: 273590 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:23,014-Speed 6310.45 samples/sec Loss 5.8851 LearningRate 0.0006 Epoch: 13 Global Step: 273600 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:26,259-Speed 6313.10 samples/sec Loss 5.9269 LearningRate 0.0006 Epoch: 13 Global Step: 273610 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:29,506-Speed 6308.21 samples/sec Loss 5.9757 LearningRate 0.0006 Epoch: 13 Global Step: 273620 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:32,753-Speed 6310.90 samples/sec Loss 5.9923 LearningRate 0.0006 Epoch: 13 Global Step: 273630 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:36,004-Speed 6300.51 samples/sec Loss 6.0096 LearningRate 0.0006 Epoch: 13 Global Step: 273640 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:39,247-Speed 6315.08 samples/sec Loss 5.9642 LearningRate 0.0006 Epoch: 13 Global Step: 273650 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:42,501-Speed 6295.93 samples/sec Loss 5.9763 LearningRate 0.0006 Epoch: 13 Global Step: 273660 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:45,732-Speed 6339.63 samples/sec Loss 5.9951 LearningRate 0.0006 Epoch: 13 Global Step: 273670 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:48,978-Speed 6311.15 samples/sec Loss 5.9616 LearningRate 0.0006 Epoch: 13 Global Step: 273680 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:52,226-Speed 6306.07 samples/sec Loss 5.9092 LearningRate 0.0006 Epoch: 13 Global Step: 273690 Fp16 Grad 
Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:55,473-Speed 6310.15 samples/sec Loss 5.9485 LearningRate 0.0006 Epoch: 13 Global Step: 273700 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:37:58,714-Speed 6321.06 samples/sec Loss 5.9310 LearningRate 0.0006 Epoch: 13 Global Step: 273710 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:01,960-Speed 6309.62 samples/sec Loss 5.9329 LearningRate 0.0006 Epoch: 13 Global Step: 273720 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:05,207-Speed 6310.58 samples/sec Loss 5.9491 LearningRate 0.0006 Epoch: 13 Global Step: 273730 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:08,451-Speed 6313.82 samples/sec Loss 5.9684 LearningRate 0.0006 Epoch: 13 Global Step: 273740 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:11,695-Speed 6314.31 samples/sec Loss 5.9359 LearningRate 0.0006 Epoch: 13 Global Step: 273750 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:14,954-Speed 6286.16 samples/sec Loss 5.8868 LearningRate 0.0006 Epoch: 13 Global Step: 273760 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:18,183-Speed 6342.60 samples/sec Loss 5.9275 LearningRate 0.0006 Epoch: 13 Global Step: 273770 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:21,429-Speed 6312.44 samples/sec Loss 5.9579 LearningRate 0.0006 Epoch: 13 Global Step: 273780 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:24,677-Speed 6305.00 samples/sec Loss 5.9680 LearningRate 0.0006 Epoch: 13 Global Step: 273790 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:27,925-Speed 6307.54 samples/sec Loss 5.9094 LearningRate 0.0006 Epoch: 13 Global Step: 273800 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:31,169-Speed 6315.28 samples/sec Loss 5.9651 LearningRate 0.0006 Epoch: 13 Global Step: 273810 Fp16 Grad Scale: 32768 Required: 51 hours 
Training: 2022-04-01 16:38:34,415-Speed 6311.06 samples/sec Loss 6.0258 LearningRate 0.0006 Epoch: 13 Global Step: 273820 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:37,658-Speed 6316.56 samples/sec Loss 5.8949 LearningRate 0.0006 Epoch: 13 Global Step: 273830 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:40,900-Speed 6316.63 samples/sec Loss 6.0245 LearningRate 0.0006 Epoch: 13 Global Step: 273840 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:44,145-Speed 6313.40 samples/sec Loss 5.9337 LearningRate 0.0006 Epoch: 13 Global Step: 273850 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:47,392-Speed 6309.81 samples/sec Loss 5.9376 LearningRate 0.0006 Epoch: 13 Global Step: 273860 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:50,642-Speed 6302.03 samples/sec Loss 5.8866 LearningRate 0.0006 Epoch: 13 Global Step: 273870 Fp16 Grad Scale: 65536 Required: 51 hours Training: 2022-04-01 16:38:53,873-Speed 6339.56 samples/sec Loss 5.8560 LearningRate 0.0006 Epoch: 13 Global Step: 273880 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:38:57,119-Speed 6311.83 samples/sec Loss 5.9340 LearningRate 0.0006 Epoch: 13 Global Step: 273890 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:00,362-Speed 6315.78 samples/sec Loss 5.8571 LearningRate 0.0006 Epoch: 13 Global Step: 273900 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:03,613-Speed 6300.58 samples/sec Loss 5.8694 LearningRate 0.0006 Epoch: 13 Global Step: 273910 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:06,859-Speed 6311.66 samples/sec Loss 5.9848 LearningRate 0.0006 Epoch: 13 Global Step: 273920 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:10,106-Speed 6308.48 samples/sec Loss 5.9714 LearningRate 0.0006 Epoch: 13 Global Step: 273930 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 
16:39:13,347-Speed 6320.35 samples/sec Loss 5.9841 LearningRate 0.0006 Epoch: 13 Global Step: 273940 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:16,612-Speed 6275.72 samples/sec Loss 5.9342 LearningRate 0.0006 Epoch: 13 Global Step: 273950 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:19,858-Speed 6309.51 samples/sec Loss 5.8833 LearningRate 0.0006 Epoch: 13 Global Step: 273960 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:23,100-Speed 6318.28 samples/sec Loss 5.9909 LearningRate 0.0006 Epoch: 13 Global Step: 273970 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:26,336-Speed 6331.63 samples/sec Loss 5.9426 LearningRate 0.0006 Epoch: 13 Global Step: 273980 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:29,609-Speed 6257.78 samples/sec Loss 5.9888 LearningRate 0.0006 Epoch: 13 Global Step: 273990 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:32,866-Speed 6289.24 samples/sec Loss 5.9734 LearningRate 0.0006 Epoch: 13 Global Step: 274000 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:36,114-Speed 6306.94 samples/sec Loss 5.9047 LearningRate 0.0006 Epoch: 13 Global Step: 274010 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:39,357-Speed 6317.64 samples/sec Loss 5.9012 LearningRate 0.0006 Epoch: 13 Global Step: 274020 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:42,604-Speed 6308.11 samples/sec Loss 5.9776 LearningRate 0.0006 Epoch: 13 Global Step: 274030 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:45,848-Speed 6315.03 samples/sec Loss 5.8898 LearningRate 0.0006 Epoch: 13 Global Step: 274040 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:49,097-Speed 6303.64 samples/sec Loss 5.9645 LearningRate 0.0006 Epoch: 13 Global Step: 274050 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:52,344-Speed 6309.53 
samples/sec Loss 5.9361 LearningRate 0.0006 Epoch: 13 Global Step: 274060 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:55,589-Speed 6311.46 samples/sec Loss 5.8660 LearningRate 0.0006 Epoch: 13 Global Step: 274070 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:39:58,819-Speed 6342.23 samples/sec Loss 5.9644 LearningRate 0.0006 Epoch: 13 Global Step: 274080 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:02,066-Speed 6310.13 samples/sec Loss 5.9509 LearningRate 0.0006 Epoch: 13 Global Step: 274090 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:05,312-Speed 6309.39 samples/sec Loss 5.9795 LearningRate 0.0006 Epoch: 13 Global Step: 274100 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:08,557-Speed 6314.05 samples/sec Loss 6.0142 LearningRate 0.0006 Epoch: 13 Global Step: 274110 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:11,808-Speed 6299.12 samples/sec Loss 5.9339 LearningRate 0.0006 Epoch: 13 Global Step: 274120 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:15,056-Speed 6308.48 samples/sec Loss 5.9173 LearningRate 0.0006 Epoch: 13 Global Step: 274130 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:18,302-Speed 6309.96 samples/sec Loss 5.9712 LearningRate 0.0006 Epoch: 13 Global Step: 274140 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:21,548-Speed 6311.70 samples/sec Loss 5.9429 LearningRate 0.0006 Epoch: 13 Global Step: 274150 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:24,792-Speed 6313.98 samples/sec Loss 5.9140 LearningRate 0.0006 Epoch: 13 Global Step: 274160 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:28,032-Speed 6322.06 samples/sec Loss 5.9549 LearningRate 0.0006 Epoch: 13 Global Step: 274170 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:31,304-Speed 6261.89 samples/sec Loss 5.9385 
LearningRate 0.0006 Epoch: 13 Global Step: 274180 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:34,549-Speed 6313.45 samples/sec Loss 5.9688 LearningRate 0.0006 Epoch: 13 Global Step: 274190 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:37,797-Speed 6306.94 samples/sec Loss 5.8912 LearningRate 0.0006 Epoch: 13 Global Step: 274200 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:41,050-Speed 6295.66 samples/sec Loss 5.9293 LearningRate 0.0006 Epoch: 13 Global Step: 274210 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:44,297-Speed 6309.19 samples/sec Loss 5.9102 LearningRate 0.0006 Epoch: 13 Global Step: 274220 Fp16 Grad Scale: 32768 Required: 51 hours Training: 2022-04-01 16:40:47,550-Speed 6297.48 samples/sec Loss 5.9784 LearningRate 0.0006 Epoch: 13 Global Step: 274230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:40:50,800-Speed 6302.43 samples/sec Loss 5.9097 LearningRate 0.0006 Epoch: 13 Global Step: 274240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:40:54,048-Speed 6306.84 samples/sec Loss 5.8757 LearningRate 0.0006 Epoch: 13 Global Step: 274250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:40:57,292-Speed 6314.13 samples/sec Loss 5.9390 LearningRate 0.0006 Epoch: 13 Global Step: 274260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:00,538-Speed 6310.40 samples/sec Loss 5.9962 LearningRate 0.0006 Epoch: 13 Global Step: 274270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:03,791-Speed 6297.77 samples/sec Loss 5.9637 LearningRate 0.0006 Epoch: 13 Global Step: 274280 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:41:07,022-Speed 6339.98 samples/sec Loss 5.9342 LearningRate 0.0006 Epoch: 13 Global Step: 274290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:10,268-Speed 6310.90 samples/sec Loss 5.9610 LearningRate 0.0006 Epoch: 13 
Global Step: 274300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:13,516-Speed 6308.39 samples/sec Loss 5.9437 LearningRate 0.0006 Epoch: 13 Global Step: 274310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:16,764-Speed 6305.21 samples/sec Loss 5.8737 LearningRate 0.0006 Epoch: 13 Global Step: 274320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:20,011-Speed 6309.70 samples/sec Loss 6.0033 LearningRate 0.0006 Epoch: 13 Global Step: 274330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:23,260-Speed 6304.34 samples/sec Loss 5.9419 LearningRate 0.0006 Epoch: 13 Global Step: 274340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:26,505-Speed 6313.45 samples/sec Loss 5.9512 LearningRate 0.0006 Epoch: 13 Global Step: 274350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:29,750-Speed 6312.08 samples/sec Loss 5.9608 LearningRate 0.0006 Epoch: 13 Global Step: 274360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:32,998-Speed 6307.19 samples/sec Loss 5.9696 LearningRate 0.0006 Epoch: 13 Global Step: 274370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:36,247-Speed 6306.12 samples/sec Loss 6.0097 LearningRate 0.0006 Epoch: 13 Global Step: 274380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:39,478-Speed 6339.38 samples/sec Loss 5.9366 LearningRate 0.0006 Epoch: 13 Global Step: 274390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:42,722-Speed 6314.97 samples/sec Loss 5.9719 LearningRate 0.0006 Epoch: 13 Global Step: 274400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:45,972-Speed 6301.91 samples/sec Loss 5.9881 LearningRate 0.0006 Epoch: 13 Global Step: 274410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:49,217-Speed 6314.04 samples/sec Loss 5.9360 LearningRate 0.0006 Epoch: 13 Global Step: 274420 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:52,460-Speed 6315.43 samples/sec Loss 5.9803 LearningRate 0.0006 Epoch: 13 Global Step: 274430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:55,705-Speed 6312.68 samples/sec Loss 5.9929 LearningRate 0.0006 Epoch: 13 Global Step: 274440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:41:58,950-Speed 6312.25 samples/sec Loss 5.9707 LearningRate 0.0006 Epoch: 13 Global Step: 274450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:02,202-Speed 6300.93 samples/sec Loss 5.9007 LearningRate 0.0006 Epoch: 13 Global Step: 274460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:05,448-Speed 6310.32 samples/sec Loss 5.8995 LearningRate 0.0006 Epoch: 13 Global Step: 274470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:08,697-Speed 6304.96 samples/sec Loss 5.8917 LearningRate 0.0006 Epoch: 13 Global Step: 274480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:11,946-Speed 6305.47 samples/sec Loss 5.9850 LearningRate 0.0006 Epoch: 13 Global Step: 274490 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:42:15,190-Speed 6315.21 samples/sec Loss 5.9452 LearningRate 0.0006 Epoch: 13 Global Step: 274500 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:42:18,422-Speed 6336.50 samples/sec Loss 5.9444 LearningRate 0.0006 Epoch: 13 Global Step: 274510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:21,669-Speed 6311.05 samples/sec Loss 5.9320 LearningRate 0.0006 Epoch: 13 Global Step: 274520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:24,916-Speed 6307.79 samples/sec Loss 5.9217 LearningRate 0.0006 Epoch: 13 Global Step: 274530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:28,163-Speed 6309.23 samples/sec Loss 5.9266 LearningRate 0.0006 Epoch: 13 Global Step: 274540 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 16:42:31,409-Speed 6310.87 samples/sec Loss 6.0171 LearningRate 0.0006 Epoch: 13 Global Step: 274550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:34,654-Speed 6313.57 samples/sec Loss 5.9107 LearningRate 0.0006 Epoch: 13 Global Step: 274560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:37,897-Speed 6315.98 samples/sec Loss 5.9207 LearningRate 0.0006 Epoch: 13 Global Step: 274570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:41,152-Speed 6293.09 samples/sec Loss 5.9545 LearningRate 0.0006 Epoch: 13 Global Step: 274580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:44,398-Speed 6311.40 samples/sec Loss 5.9334 LearningRate 0.0006 Epoch: 13 Global Step: 274590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:47,644-Speed 6310.89 samples/sec Loss 6.0052 LearningRate 0.0006 Epoch: 13 Global Step: 274600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:50,875-Speed 6339.15 samples/sec Loss 5.9820 LearningRate 0.0006 Epoch: 13 Global Step: 274610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:54,122-Speed 6308.73 samples/sec Loss 5.9436 LearningRate 0.0006 Epoch: 13 Global Step: 274620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:42:57,367-Speed 6312.73 samples/sec Loss 5.8886 LearningRate 0.0006 Epoch: 13 Global Step: 274630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:00,666-Speed 6209.40 samples/sec Loss 5.9504 LearningRate 0.0006 Epoch: 13 Global Step: 274640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:03,988-Speed 6165.38 samples/sec Loss 6.0022 LearningRate 0.0006 Epoch: 13 Global Step: 274650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:07,233-Speed 6313.25 samples/sec Loss 5.9055 LearningRate 0.0006 Epoch: 13 Global Step: 274660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
16:43:10,488-Speed 6293.24 samples/sec Loss 5.9879 LearningRate 0.0006 Epoch: 13 Global Step: 274670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:13,732-Speed 6315.20 samples/sec Loss 5.9872 LearningRate 0.0006 Epoch: 13 Global Step: 274680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:17,017-Speed 6235.84 samples/sec Loss 5.9245 LearningRate 0.0006 Epoch: 13 Global Step: 274690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:20,279-Speed 6278.84 samples/sec Loss 5.9649 LearningRate 0.0006 Epoch: 13 Global Step: 274700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:23,529-Speed 6306.25 samples/sec Loss 5.8787 LearningRate 0.0006 Epoch: 13 Global Step: 274710 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:43:26,763-Speed 6334.27 samples/sec Loss 5.9312 LearningRate 0.0006 Epoch: 13 Global Step: 274720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:30,007-Speed 6313.62 samples/sec Loss 5.9257 LearningRate 0.0006 Epoch: 13 Global Step: 274730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:33,251-Speed 6315.08 samples/sec Loss 5.9222 LearningRate 0.0006 Epoch: 13 Global Step: 274740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:36,498-Speed 6308.31 samples/sec Loss 5.9541 LearningRate 0.0006 Epoch: 13 Global Step: 274750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:39,743-Speed 6312.71 samples/sec Loss 5.9572 LearningRate 0.0006 Epoch: 13 Global Step: 274760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:42,991-Speed 6308.21 samples/sec Loss 5.9963 LearningRate 0.0006 Epoch: 13 Global Step: 274770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:46,232-Speed 6321.31 samples/sec Loss 5.9771 LearningRate 0.0006 Epoch: 13 Global Step: 274780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:49,482-Speed 6302.85 
samples/sec Loss 5.9430 LearningRate 0.0006 Epoch: 13 Global Step: 274790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:52,729-Speed 6307.96 samples/sec Loss 5.9878 LearningRate 0.0006 Epoch: 13 Global Step: 274800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:55,979-Speed 6303.22 samples/sec Loss 5.9767 LearningRate 0.0006 Epoch: 13 Global Step: 274810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:43:59,217-Speed 6325.93 samples/sec Loss 5.9665 LearningRate 0.0006 Epoch: 13 Global Step: 274820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:02,467-Speed 6304.43 samples/sec Loss 5.9080 LearningRate 0.0006 Epoch: 13 Global Step: 274830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:05,710-Speed 6315.13 samples/sec Loss 5.9103 LearningRate 0.0006 Epoch: 13 Global Step: 274840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:08,962-Speed 6300.18 samples/sec Loss 5.9623 LearningRate 0.0006 Epoch: 13 Global Step: 274850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:12,208-Speed 6310.17 samples/sec Loss 6.0161 LearningRate 0.0006 Epoch: 13 Global Step: 274860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:15,453-Speed 6312.38 samples/sec Loss 5.8510 LearningRate 0.0006 Epoch: 13 Global Step: 274870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:18,700-Speed 6309.41 samples/sec Loss 5.9637 LearningRate 0.0006 Epoch: 13 Global Step: 274880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:21,947-Speed 6308.45 samples/sec Loss 5.8732 LearningRate 0.0006 Epoch: 13 Global Step: 274890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:25,191-Speed 6314.54 samples/sec Loss 5.9750 LearningRate 0.0006 Epoch: 13 Global Step: 274900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:28,440-Speed 6305.14 samples/sec Loss 5.9258 
LearningRate 0.0006 Epoch: 13 Global Step: 274910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:31,671-Speed 6339.03 samples/sec Loss 6.0147 LearningRate 0.0006 Epoch: 13 Global Step: 274920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:34,916-Speed 6313.07 samples/sec Loss 5.9350 LearningRate 0.0006 Epoch: 13 Global Step: 274930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:38,171-Speed 6293.25 samples/sec Loss 5.9667 LearningRate 0.0006 Epoch: 13 Global Step: 274940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:41,418-Speed 6307.91 samples/sec Loss 5.9371 LearningRate 0.0006 Epoch: 13 Global Step: 274950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:44,664-Speed 6310.55 samples/sec Loss 5.9480 LearningRate 0.0006 Epoch: 13 Global Step: 274960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:47,912-Speed 6307.84 samples/sec Loss 5.8659 LearningRate 0.0006 Epoch: 13 Global Step: 274970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:51,157-Speed 6311.73 samples/sec Loss 5.8913 LearningRate 0.0006 Epoch: 13 Global Step: 274980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:54,404-Speed 6310.10 samples/sec Loss 5.9544 LearningRate 0.0006 Epoch: 13 Global Step: 274990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:44:57,649-Speed 6312.85 samples/sec Loss 5.9572 LearningRate 0.0006 Epoch: 13 Global Step: 275000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:00,890-Speed 6320.63 samples/sec Loss 5.9811 LearningRate 0.0006 Epoch: 13 Global Step: 275010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:04,138-Speed 6310.49 samples/sec Loss 5.9596 LearningRate 0.0006 Epoch: 13 Global Step: 275020 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:45:07,371-Speed 6334.94 samples/sec Loss 5.8896 LearningRate 0.0006 Epoch: 13 
Global Step: 275030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:10,623-Speed 6299.16 samples/sec Loss 5.9788 LearningRate 0.0006 Epoch: 13 Global Step: 275040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:13,870-Speed 6309.61 samples/sec Loss 5.9444 LearningRate 0.0006 Epoch: 13 Global Step: 275050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:17,116-Speed 6309.79 samples/sec Loss 5.8537 LearningRate 0.0006 Epoch: 13 Global Step: 275060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:20,364-Speed 6307.37 samples/sec Loss 5.9177 LearningRate 0.0006 Epoch: 13 Global Step: 275070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:23,609-Speed 6312.84 samples/sec Loss 5.9337 LearningRate 0.0006 Epoch: 13 Global Step: 275080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:26,853-Speed 6313.76 samples/sec Loss 5.9688 LearningRate 0.0006 Epoch: 13 Global Step: 275090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:30,099-Speed 6310.70 samples/sec Loss 6.0640 LearningRate 0.0006 Epoch: 13 Global Step: 275100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:33,345-Speed 6310.94 samples/sec Loss 5.9425 LearningRate 0.0006 Epoch: 13 Global Step: 275110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:36,589-Speed 6313.89 samples/sec Loss 5.9744 LearningRate 0.0006 Epoch: 13 Global Step: 275120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:39,822-Speed 6336.53 samples/sec Loss 5.9982 LearningRate 0.0006 Epoch: 13 Global Step: 275130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:43,068-Speed 6312.37 samples/sec Loss 5.9564 LearningRate 0.0006 Epoch: 13 Global Step: 275140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:46,312-Speed 6312.96 samples/sec Loss 5.8791 LearningRate 0.0006 Epoch: 13 Global Step: 275150 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:49,558-Speed 6311.36 samples/sec Loss 5.9359 LearningRate 0.0006 Epoch: 13 Global Step: 275160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:52,804-Speed 6310.16 samples/sec Loss 5.9205 LearningRate 0.0006 Epoch: 13 Global Step: 275170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:56,051-Speed 6309.80 samples/sec Loss 5.9471 LearningRate 0.0006 Epoch: 13 Global Step: 275180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:45:59,296-Speed 6311.12 samples/sec Loss 6.0064 LearningRate 0.0006 Epoch: 13 Global Step: 275190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:02,545-Speed 6305.59 samples/sec Loss 5.9293 LearningRate 0.0006 Epoch: 13 Global Step: 275200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:05,794-Speed 6306.38 samples/sec Loss 5.9694 LearningRate 0.0006 Epoch: 13 Global Step: 275210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:09,040-Speed 6310.47 samples/sec Loss 5.8941 LearningRate 0.0006 Epoch: 13 Global Step: 275220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:12,290-Speed 6303.64 samples/sec Loss 5.9194 LearningRate 0.0006 Epoch: 13 Global Step: 275230 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:46:15,523-Speed 6336.02 samples/sec Loss 5.9890 LearningRate 0.0006 Epoch: 13 Global Step: 275240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:18,770-Speed 6309.17 samples/sec Loss 5.9414 LearningRate 0.0006 Epoch: 13 Global Step: 275250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:22,020-Speed 6302.30 samples/sec Loss 5.8933 LearningRate 0.0006 Epoch: 13 Global Step: 275260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:25,267-Speed 6308.51 samples/sec Loss 5.9958 LearningRate 0.0006 Epoch: 13 Global Step: 275270 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 16:46:28,515-Speed 6306.07 samples/sec Loss 5.9082 LearningRate 0.0006 Epoch: 13 Global Step: 275280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:31,764-Speed 6305.57 samples/sec Loss 5.9685 LearningRate 0.0006 Epoch: 13 Global Step: 275290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:35,025-Speed 6282.58 samples/sec Loss 5.9492 LearningRate 0.0006 Epoch: 13 Global Step: 275300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:38,269-Speed 6313.85 samples/sec Loss 5.9494 LearningRate 0.0006 Epoch: 13 Global Step: 275310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:41,516-Speed 6309.39 samples/sec Loss 5.9541 LearningRate 0.0006 Epoch: 13 Global Step: 275320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:44,761-Speed 6313.18 samples/sec Loss 5.9303 LearningRate 0.0006 Epoch: 13 Global Step: 275330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:47,991-Speed 6340.76 samples/sec Loss 5.8716 LearningRate 0.0006 Epoch: 13 Global Step: 275340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:51,235-Speed 6314.27 samples/sec Loss 5.9504 LearningRate 0.0006 Epoch: 13 Global Step: 275350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:54,482-Speed 6309.47 samples/sec Loss 5.9030 LearningRate 0.0006 Epoch: 13 Global Step: 275360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:46:57,728-Speed 6310.98 samples/sec Loss 5.9271 LearningRate 0.0006 Epoch: 13 Global Step: 275370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:00,976-Speed 6305.44 samples/sec Loss 5.8904 LearningRate 0.0006 Epoch: 13 Global Step: 275380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:04,221-Speed 6314.11 samples/sec Loss 5.9014 LearningRate 0.0006 Epoch: 13 Global Step: 275390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
16:47:07,488-Speed 6269.20 samples/sec Loss 5.8789 LearningRate 0.0006 Epoch: 13 Global Step: 275400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:10,741-Speed 6297.65 samples/sec Loss 5.9364 LearningRate 0.0006 Epoch: 13 Global Step: 275410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:13,988-Speed 6310.17 samples/sec Loss 5.9694 LearningRate 0.0006 Epoch: 13 Global Step: 275420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:17,238-Speed 6302.85 samples/sec Loss 5.9669 LearningRate 0.0006 Epoch: 13 Global Step: 275430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:20,470-Speed 6337.66 samples/sec Loss 5.9709 LearningRate 0.0006 Epoch: 13 Global Step: 275440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:23,718-Speed 6306.28 samples/sec Loss 5.9394 LearningRate 0.0006 Epoch: 13 Global Step: 275450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:26,962-Speed 6314.07 samples/sec Loss 5.9174 LearningRate 0.0006 Epoch: 13 Global Step: 275460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:30,210-Speed 6307.37 samples/sec Loss 5.9301 LearningRate 0.0006 Epoch: 13 Global Step: 275470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:33,457-Speed 6309.20 samples/sec Loss 5.9026 LearningRate 0.0006 Epoch: 13 Global Step: 275480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:36,709-Speed 6298.35 samples/sec Loss 5.8280 LearningRate 0.0006 Epoch: 13 Global Step: 275490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:39,964-Speed 6293.58 samples/sec Loss 5.9400 LearningRate 0.0006 Epoch: 13 Global Step: 275500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:43,230-Speed 6271.49 samples/sec Loss 5.8701 LearningRate 0.0006 Epoch: 13 Global Step: 275510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:46,478-Speed 6308.14 
samples/sec Loss 5.9467 LearningRate 0.0006 Epoch: 13 Global Step: 275520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:49,725-Speed 6307.47 samples/sec Loss 5.9848 LearningRate 0.0006 Epoch: 13 Global Step: 275530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:52,961-Speed 6330.54 samples/sec Loss 6.0348 LearningRate 0.0006 Epoch: 13 Global Step: 275540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:56,207-Speed 6311.72 samples/sec Loss 5.8659 LearningRate 0.0006 Epoch: 13 Global Step: 275550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:47:59,460-Speed 6295.74 samples/sec Loss 5.9195 LearningRate 0.0006 Epoch: 13 Global Step: 275560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:02,706-Speed 6311.01 samples/sec Loss 5.9620 LearningRate 0.0006 Epoch: 13 Global Step: 275570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:05,951-Speed 6313.16 samples/sec Loss 5.9708 LearningRate 0.0006 Epoch: 13 Global Step: 275580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:09,200-Speed 6305.63 samples/sec Loss 5.9521 LearningRate 0.0006 Epoch: 13 Global Step: 275590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:12,450-Speed 6302.20 samples/sec Loss 5.9653 LearningRate 0.0006 Epoch: 13 Global Step: 275600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:15,695-Speed 6315.08 samples/sec Loss 6.0010 LearningRate 0.0006 Epoch: 13 Global Step: 275610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:18,944-Speed 6304.91 samples/sec Loss 5.9494 LearningRate 0.0006 Epoch: 13 Global Step: 275620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:22,197-Speed 6297.85 samples/sec Loss 5.9437 LearningRate 0.0006 Epoch: 13 Global Step: 275630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:25,442-Speed 6311.97 samples/sec Loss 5.9931 
LearningRate 0.0006 Epoch: 13 Global Step: 275640 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:48:28,678-Speed 6330.58 samples/sec Loss 5.9259 LearningRate 0.0006 Epoch: 13 Global Step: 275650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:31,926-Speed 6306.32 samples/sec Loss 5.9195 LearningRate 0.0006 Epoch: 13 Global Step: 275660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:35,174-Speed 6307.66 samples/sec Loss 5.9065 LearningRate 0.0006 Epoch: 13 Global Step: 275670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:38,423-Speed 6305.56 samples/sec Loss 5.9147 LearningRate 0.0006 Epoch: 13 Global Step: 275680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:41,671-Speed 6305.12 samples/sec Loss 5.9702 LearningRate 0.0006 Epoch: 13 Global Step: 275690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:44,921-Speed 6303.74 samples/sec Loss 5.9506 LearningRate 0.0006 Epoch: 13 Global Step: 275700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:48,166-Speed 6312.77 samples/sec Loss 5.9138 LearningRate 0.0006 Epoch: 13 Global Step: 275710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:51,416-Speed 6303.34 samples/sec Loss 5.9509 LearningRate 0.0006 Epoch: 13 Global Step: 275720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:54,659-Speed 6315.71 samples/sec Loss 5.9602 LearningRate 0.0006 Epoch: 13 Global Step: 275730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:48:57,904-Speed 6313.47 samples/sec Loss 5.9059 LearningRate 0.0006 Epoch: 13 Global Step: 275740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:01,136-Speed 6337.40 samples/sec Loss 5.8641 LearningRate 0.0006 Epoch: 13 Global Step: 275750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:04,385-Speed 6304.90 samples/sec Loss 5.8952 LearningRate 0.0006 Epoch: 13 
Global Step: 275760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:07,633-Speed 6307.05 samples/sec Loss 5.9168 LearningRate 0.0006 Epoch: 13 Global Step: 275770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:10,879-Speed 6310.98 samples/sec Loss 5.9714 LearningRate 0.0006 Epoch: 13 Global Step: 275780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:14,126-Speed 6309.22 samples/sec Loss 5.8337 LearningRate 0.0006 Epoch: 13 Global Step: 275790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:17,370-Speed 6314.13 samples/sec Loss 5.8704 LearningRate 0.0006 Epoch: 13 Global Step: 275800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:20,614-Speed 6313.24 samples/sec Loss 5.9272 LearningRate 0.0006 Epoch: 13 Global Step: 275810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:23,860-Speed 6310.73 samples/sec Loss 5.8490 LearningRate 0.0006 Epoch: 13 Global Step: 275820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:27,104-Speed 6314.77 samples/sec Loss 5.9384 LearningRate 0.0006 Epoch: 13 Global Step: 275830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:30,345-Speed 6320.54 samples/sec Loss 5.9841 LearningRate 0.0006 Epoch: 13 Global Step: 275840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:33,577-Speed 6339.24 samples/sec Loss 5.9178 LearningRate 0.0006 Epoch: 13 Global Step: 275850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:36,821-Speed 6314.34 samples/sec Loss 5.9228 LearningRate 0.0006 Epoch: 13 Global Step: 275860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:40,068-Speed 6309.98 samples/sec Loss 6.0206 LearningRate 0.0005 Epoch: 13 Global Step: 275870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:43,313-Speed 6311.85 samples/sec Loss 5.9712 LearningRate 0.0005 Epoch: 13 Global Step: 275880 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:46,557-Speed 6315.49 samples/sec Loss 5.9971 LearningRate 0.0005 Epoch: 13 Global Step: 275890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:49,807-Speed 6303.04 samples/sec Loss 6.0208 LearningRate 0.0005 Epoch: 13 Global Step: 275900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:53,054-Speed 6307.05 samples/sec Loss 5.9047 LearningRate 0.0005 Epoch: 13 Global Step: 275910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:56,298-Speed 6314.70 samples/sec Loss 5.9546 LearningRate 0.0005 Epoch: 13 Global Step: 275920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:49:59,546-Speed 6308.23 samples/sec Loss 5.9428 LearningRate 0.0005 Epoch: 13 Global Step: 275930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:02,791-Speed 6311.21 samples/sec Loss 5.9600 LearningRate 0.0005 Epoch: 13 Global Step: 275940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:06,039-Speed 6306.58 samples/sec Loss 5.8895 LearningRate 0.0005 Epoch: 13 Global Step: 275950 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:50:09,271-Speed 6338.09 samples/sec Loss 5.9117 LearningRate 0.0005 Epoch: 13 Global Step: 275960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:12,520-Speed 6305.08 samples/sec Loss 5.8408 LearningRate 0.0005 Epoch: 13 Global Step: 275970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:15,784-Speed 6277.25 samples/sec Loss 5.9450 LearningRate 0.0005 Epoch: 13 Global Step: 275980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:19,030-Speed 6310.83 samples/sec Loss 5.9345 LearningRate 0.0005 Epoch: 13 Global Step: 275990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:22,273-Speed 6316.42 samples/sec Loss 5.8878 LearningRate 0.0005 Epoch: 13 Global Step: 276000 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 16:50:25,515-Speed 6316.99 samples/sec Loss 5.8328 LearningRate 0.0005 Epoch: 13 Global Step: 276010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:28,762-Speed 6309.57 samples/sec Loss 5.8819 LearningRate 0.0005 Epoch: 13 Global Step: 276020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:32,008-Speed 6309.73 samples/sec Loss 5.9360 LearningRate 0.0005 Epoch: 13 Global Step: 276030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:35,253-Speed 6312.61 samples/sec Loss 5.8717 LearningRate 0.0005 Epoch: 13 Global Step: 276040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:38,506-Speed 6298.91 samples/sec Loss 5.9425 LearningRate 0.0005 Epoch: 13 Global Step: 276050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:41,751-Speed 6311.80 samples/sec Loss 5.8542 LearningRate 0.0005 Epoch: 13 Global Step: 276060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:44,997-Speed 6311.82 samples/sec Loss 6.0141 LearningRate 0.0005 Epoch: 13 Global Step: 276070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:48,242-Speed 6313.05 samples/sec Loss 5.8564 LearningRate 0.0005 Epoch: 13 Global Step: 276080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:51,522-Speed 6245.77 samples/sec Loss 5.8338 LearningRate 0.0005 Epoch: 13 Global Step: 276090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:54,766-Speed 6314.37 samples/sec Loss 5.9935 LearningRate 0.0005 Epoch: 13 Global Step: 276100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:50:58,009-Speed 6315.84 samples/sec Loss 5.9069 LearningRate 0.0005 Epoch: 13 Global Step: 276110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:01,294-Speed 6236.52 samples/sec Loss 5.9506 LearningRate 0.0005 Epoch: 13 Global Step: 276120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
16:51:04,536-Speed 6317.31 samples/sec Loss 5.9254 LearningRate 0.0005 Epoch: 13 Global Step: 276130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:07,763-Speed 6348.36 samples/sec Loss 5.8722 LearningRate 0.0005 Epoch: 13 Global Step: 276140 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:11,007-Speed 6315.77 samples/sec Loss 5.9030 LearningRate 0.0005 Epoch: 13 Global Step: 276150 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:14,255-Speed 6306.32 samples/sec Loss 5.9234 LearningRate 0.0005 Epoch: 13 Global Step: 276160 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:17,502-Speed 6308.13 samples/sec Loss 5.9386 LearningRate 0.0005 Epoch: 13 Global Step: 276170 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:20,749-Speed 6309.37 samples/sec Loss 5.9389 LearningRate 0.0005 Epoch: 13 Global Step: 276180 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:23,992-Speed 6316.57 samples/sec Loss 5.9555 LearningRate 0.0005 Epoch: 13 Global Step: 276190 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:27,239-Speed 6308.34 samples/sec Loss 5.9499 LearningRate 0.0005 Epoch: 13 Global Step: 276200 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:30,481-Speed 6318.41 samples/sec Loss 5.9257 LearningRate 0.0005 Epoch: 13 Global Step: 276210 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:33,731-Speed 6303.39 samples/sec Loss 5.9138 LearningRate 0.0005 Epoch: 13 Global Step: 276220 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:36,977-Speed 6310.00 samples/sec Loss 5.9808 LearningRate 0.0005 Epoch: 13 Global Step: 276230 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:51:40,219-Speed 6318.56 samples/sec Loss 5.8476 LearningRate 0.0005 Epoch: 13 Global Step: 276240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:43,465-Speed 6311.75 
samples/sec Loss 5.8945 LearningRate 0.0005 Epoch: 13 Global Step: 276250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:46,714-Speed 6305.08 samples/sec Loss 5.9804 LearningRate 0.0005 Epoch: 13 Global Step: 276260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:49,963-Speed 6304.00 samples/sec Loss 5.9066 LearningRate 0.0005 Epoch: 13 Global Step: 276270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:53,209-Speed 6312.04 samples/sec Loss 5.9467 LearningRate 0.0005 Epoch: 13 Global Step: 276280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:56,456-Speed 6308.01 samples/sec Loss 6.0277 LearningRate 0.0005 Epoch: 13 Global Step: 276290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:51:59,702-Speed 6311.58 samples/sec Loss 5.9625 LearningRate 0.0005 Epoch: 13 Global Step: 276300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:02,949-Speed 6308.43 samples/sec Loss 5.9376 LearningRate 0.0005 Epoch: 13 Global Step: 276310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:06,195-Speed 6310.13 samples/sec Loss 5.8847 LearningRate 0.0005 Epoch: 13 Global Step: 276320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:09,443-Speed 6306.76 samples/sec Loss 5.9570 LearningRate 0.0005 Epoch: 13 Global Step: 276330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:12,679-Speed 6331.74 samples/sec Loss 5.9823 LearningRate 0.0005 Epoch: 13 Global Step: 276340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:15,932-Speed 6296.88 samples/sec Loss 5.8781 LearningRate 0.0005 Epoch: 13 Global Step: 276350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:19,181-Speed 6303.85 samples/sec Loss 5.8554 LearningRate 0.0005 Epoch: 13 Global Step: 276360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:22,427-Speed 6312.18 samples/sec Loss 5.9745 
LearningRate 0.0005 Epoch: 13 Global Step: 276370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:25,674-Speed 6307.92 samples/sec Loss 5.9854 LearningRate 0.0005 Epoch: 13 Global Step: 276380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:28,919-Speed 6312.83 samples/sec Loss 5.9766 LearningRate 0.0005 Epoch: 13 Global Step: 276390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:32,164-Speed 6313.16 samples/sec Loss 5.8867 LearningRate 0.0005 Epoch: 13 Global Step: 276400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:35,410-Speed 6310.34 samples/sec Loss 5.9347 LearningRate 0.0005 Epoch: 13 Global Step: 276410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:38,653-Speed 6315.03 samples/sec Loss 5.9360 LearningRate 0.0005 Epoch: 13 Global Step: 276420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:41,935-Speed 6242.13 samples/sec Loss 5.9214 LearningRate 0.0005 Epoch: 13 Global Step: 276430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:45,170-Speed 6332.48 samples/sec Loss 5.8392 LearningRate 0.0005 Epoch: 13 Global Step: 276440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:48,418-Speed 6307.19 samples/sec Loss 5.9758 LearningRate 0.0005 Epoch: 13 Global Step: 276450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:51,663-Speed 6311.99 samples/sec Loss 5.8524 LearningRate 0.0005 Epoch: 13 Global Step: 276460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:54,910-Speed 6309.06 samples/sec Loss 5.9078 LearningRate 0.0005 Epoch: 13 Global Step: 276470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:52:58,182-Speed 6259.34 samples/sec Loss 6.0231 LearningRate 0.0005 Epoch: 13 Global Step: 276480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:01,430-Speed 6309.15 samples/sec Loss 5.8937 LearningRate 0.0005 Epoch: 13 
Global Step: 276490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:04,678-Speed 6306.02 samples/sec Loss 5.8845 LearningRate 0.0005 Epoch: 13 Global Step: 276500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:07,925-Speed 6309.77 samples/sec Loss 5.8677 LearningRate 0.0005 Epoch: 13 Global Step: 276510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:11,172-Speed 6308.19 samples/sec Loss 5.9323 LearningRate 0.0005 Epoch: 13 Global Step: 276520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:14,422-Speed 6302.03 samples/sec Loss 5.9093 LearningRate 0.0005 Epoch: 13 Global Step: 276530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:17,665-Speed 6317.60 samples/sec Loss 5.8916 LearningRate 0.0005 Epoch: 13 Global Step: 276540 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:53:20,899-Speed 6334.38 samples/sec Loss 5.9258 LearningRate 0.0005 Epoch: 13 Global Step: 276550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:24,149-Speed 6303.31 samples/sec Loss 5.9934 LearningRate 0.0005 Epoch: 13 Global Step: 276560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:27,395-Speed 6310.65 samples/sec Loss 5.9314 LearningRate 0.0005 Epoch: 13 Global Step: 276570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:30,642-Speed 6308.68 samples/sec Loss 5.9179 LearningRate 0.0005 Epoch: 13 Global Step: 276580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:33,884-Speed 6318.53 samples/sec Loss 5.9426 LearningRate 0.0005 Epoch: 13 Global Step: 276590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:37,130-Speed 6310.46 samples/sec Loss 5.8860 LearningRate 0.0005 Epoch: 13 Global Step: 276600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:40,373-Speed 6316.43 samples/sec Loss 5.9451 LearningRate 0.0005 Epoch: 13 Global Step: 276610 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:43,619-Speed 6310.13 samples/sec Loss 5.9343 LearningRate 0.0005 Epoch: 13 Global Step: 276620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:46,865-Speed 6311.02 samples/sec Loss 5.8630 LearningRate 0.0005 Epoch: 13 Global Step: 276630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:50,115-Speed 6302.05 samples/sec Loss 5.9805 LearningRate 0.0005 Epoch: 13 Global Step: 276640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:53,349-Speed 6334.98 samples/sec Loss 5.8893 LearningRate 0.0005 Epoch: 13 Global Step: 276650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:56,595-Speed 6310.45 samples/sec Loss 5.8829 LearningRate 0.0005 Epoch: 13 Global Step: 276660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:53:59,846-Speed 6302.61 samples/sec Loss 5.9601 LearningRate 0.0005 Epoch: 13 Global Step: 276670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:03,123-Speed 6249.19 samples/sec Loss 5.9149 LearningRate 0.0005 Epoch: 13 Global Step: 276680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:06,371-Speed 6306.88 samples/sec Loss 5.9606 LearningRate 0.0005 Epoch: 13 Global Step: 276690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:09,619-Speed 6307.39 samples/sec Loss 5.9144 LearningRate 0.0005 Epoch: 13 Global Step: 276700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:12,867-Speed 6307.57 samples/sec Loss 5.9522 LearningRate 0.0005 Epoch: 13 Global Step: 276710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:16,118-Speed 6300.45 samples/sec Loss 5.8623 LearningRate 0.0005 Epoch: 13 Global Step: 276720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:19,361-Speed 6317.01 samples/sec Loss 5.9259 LearningRate 0.0005 Epoch: 13 Global Step: 276730 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 16:54:22,609-Speed 6307.77 samples/sec Loss 5.9449 LearningRate 0.0005 Epoch: 13 Global Step: 276740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:25,840-Speed 6338.95 samples/sec Loss 5.9494 LearningRate 0.0005 Epoch: 13 Global Step: 276750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:54:29,071-Speed 6341.93 samples/sec Loss 5.9282 LearningRate 0.0005 Epoch: 13 Global Step: 276760 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:32,319-Speed 6306.64 samples/sec Loss 5.8892 LearningRate 0.0005 Epoch: 13 Global Step: 276770 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:35,566-Speed 6307.80 samples/sec Loss 5.9905 LearningRate 0.0005 Epoch: 13 Global Step: 276780 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:38,816-Speed 6302.94 samples/sec Loss 5.9041 LearningRate 0.0005 Epoch: 13 Global Step: 276790 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:42,064-Speed 6306.33 samples/sec Loss 5.9073 LearningRate 0.0005 Epoch: 13 Global Step: 276800 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:45,310-Speed 6311.08 samples/sec Loss 5.9233 LearningRate 0.0005 Epoch: 13 Global Step: 276810 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:48,558-Speed 6307.52 samples/sec Loss 5.8997 LearningRate 0.0005 Epoch: 13 Global Step: 276820 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:51,802-Speed 6313.16 samples/sec Loss 5.9351 LearningRate 0.0005 Epoch: 13 Global Step: 276830 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:55,052-Speed 6303.37 samples/sec Loss 5.9295 LearningRate 0.0005 Epoch: 13 Global Step: 276840 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 16:54:58,298-Speed 6311.72 samples/sec Loss 6.0012 LearningRate 0.0005 Epoch: 13 Global Step: 276850 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 
16:55:01,550-Speed 6299.00 samples/sec Loss 5.8644 LearningRate 0.0005 Epoch: 13 Global Step: 276860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:04,807-Speed 6288.52 samples/sec Loss 5.9625 LearningRate 0.0005 Epoch: 13 Global Step: 276870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:08,052-Speed 6311.90 samples/sec Loss 5.9899 LearningRate 0.0005 Epoch: 13 Global Step: 276880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:11,306-Speed 6295.33 samples/sec Loss 5.9462 LearningRate 0.0005 Epoch: 13 Global Step: 276890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:14,553-Speed 6310.75 samples/sec Loss 5.8639 LearningRate 0.0005 Epoch: 13 Global Step: 276900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:17,798-Speed 6313.20 samples/sec Loss 5.9546 LearningRate 0.0005 Epoch: 13 Global Step: 276910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:21,046-Speed 6306.00 samples/sec Loss 5.9275 LearningRate 0.0005 Epoch: 13 Global Step: 276920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:24,288-Speed 6319.61 samples/sec Loss 5.9478 LearningRate 0.0005 Epoch: 13 Global Step: 276930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:27,532-Speed 6313.24 samples/sec Loss 5.9666 LearningRate 0.0005 Epoch: 13 Global Step: 276940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:30,780-Speed 6308.02 samples/sec Loss 5.8992 LearningRate 0.0005 Epoch: 13 Global Step: 276950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:34,008-Speed 6344.76 samples/sec Loss 5.9861 LearningRate 0.0005 Epoch: 13 Global Step: 276960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:37,254-Speed 6311.00 samples/sec Loss 5.9718 LearningRate 0.0005 Epoch: 13 Global Step: 276970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:40,504-Speed 6303.02 
samples/sec Loss 5.8516 LearningRate 0.0005 Epoch: 13 Global Step: 276980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:43,751-Speed 6309.20 samples/sec Loss 5.8972 LearningRate 0.0005 Epoch: 13 Global Step: 276990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:46,997-Speed 6310.04 samples/sec Loss 5.8965 LearningRate 0.0005 Epoch: 13 Global Step: 277000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:50,246-Speed 6305.31 samples/sec Loss 5.9536 LearningRate 0.0005 Epoch: 13 Global Step: 277010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:53,495-Speed 6304.35 samples/sec Loss 5.9510 LearningRate 0.0005 Epoch: 13 Global Step: 277020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:56,741-Speed 6311.88 samples/sec Loss 5.9843 LearningRate 0.0005 Epoch: 13 Global Step: 277030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:55:59,985-Speed 6313.19 samples/sec Loss 5.8771 LearningRate 0.0005 Epoch: 13 Global Step: 277040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:03,234-Speed 6305.19 samples/sec Loss 5.9058 LearningRate 0.0005 Epoch: 13 Global Step: 277050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:06,470-Speed 6330.41 samples/sec Loss 5.8661 LearningRate 0.0005 Epoch: 13 Global Step: 277060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:09,717-Speed 6308.16 samples/sec Loss 5.9149 LearningRate 0.0005 Epoch: 13 Global Step: 277070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:12,964-Speed 6309.83 samples/sec Loss 5.9316 LearningRate 0.0005 Epoch: 13 Global Step: 277080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:16,212-Speed 6305.35 samples/sec Loss 5.9942 LearningRate 0.0005 Epoch: 13 Global Step: 277090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:19,462-Speed 6304.55 samples/sec Loss 5.8943 
LearningRate 0.0005 Epoch: 13 Global Step: 277100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:22,709-Speed 6309.19 samples/sec Loss 5.9077 LearningRate 0.0005 Epoch: 13 Global Step: 277110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:25,964-Speed 6292.63 samples/sec Loss 5.9265 LearningRate 0.0005 Epoch: 13 Global Step: 277120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:29,220-Speed 6292.57 samples/sec Loss 5.9370 LearningRate 0.0005 Epoch: 13 Global Step: 277130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:32,463-Speed 6316.55 samples/sec Loss 5.8495 LearningRate 0.0005 Epoch: 13 Global Step: 277140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:35,711-Speed 6306.26 samples/sec Loss 5.9194 LearningRate 0.0005 Epoch: 13 Global Step: 277150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:38,943-Speed 6337.61 samples/sec Loss 5.8933 LearningRate 0.0005 Epoch: 13 Global Step: 277160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:42,187-Speed 6315.28 samples/sec Loss 6.0101 LearningRate 0.0005 Epoch: 13 Global Step: 277170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:45,431-Speed 6313.21 samples/sec Loss 5.9346 LearningRate 0.0005 Epoch: 13 Global Step: 277180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:48,677-Speed 6310.81 samples/sec Loss 5.9461 LearningRate 0.0005 Epoch: 13 Global Step: 277190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:51,921-Speed 6314.33 samples/sec Loss 5.9170 LearningRate 0.0005 Epoch: 13 Global Step: 277200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:55,168-Speed 6310.08 samples/sec Loss 5.8815 LearningRate 0.0005 Epoch: 13 Global Step: 277210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:56:58,416-Speed 6306.07 samples/sec Loss 5.9664 LearningRate 0.0005 Epoch: 13 
Global Step: 277220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:01,661-Speed 6312.28 samples/sec Loss 5.9865 LearningRate 0.0005 Epoch: 13 Global Step: 277230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:04,909-Speed 6308.31 samples/sec Loss 5.9156 LearningRate 0.0005 Epoch: 13 Global Step: 277240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:08,155-Speed 6309.74 samples/sec Loss 5.9145 LearningRate 0.0005 Epoch: 13 Global Step: 277250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:11,391-Speed 6331.04 samples/sec Loss 5.9300 LearningRate 0.0005 Epoch: 13 Global Step: 277260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:14,642-Speed 6300.61 samples/sec Loss 5.8999 LearningRate 0.0005 Epoch: 13 Global Step: 277270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:17,887-Speed 6311.76 samples/sec Loss 5.8290 LearningRate 0.0005 Epoch: 13 Global Step: 277280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:21,132-Speed 6313.22 samples/sec Loss 5.9686 LearningRate 0.0005 Epoch: 13 Global Step: 277290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:24,377-Speed 6312.21 samples/sec Loss 5.9325 LearningRate 0.0005 Epoch: 13 Global Step: 277300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:27,624-Speed 6309.76 samples/sec Loss 5.8944 LearningRate 0.0005 Epoch: 13 Global Step: 277310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:30,872-Speed 6308.37 samples/sec Loss 5.8837 LearningRate 0.0005 Epoch: 13 Global Step: 277320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:34,115-Speed 6315.85 samples/sec Loss 5.9316 LearningRate 0.0005 Epoch: 13 Global Step: 277330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:37,362-Speed 6309.03 samples/sec Loss 5.9542 LearningRate 0.0005 Epoch: 13 Global Step: 277340 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:40,610-Speed 6306.93 samples/sec Loss 5.9154 LearningRate 0.0005 Epoch: 13 Global Step: 277350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:43,857-Speed 6307.44 samples/sec Loss 5.8916 LearningRate 0.0005 Epoch: 13 Global Step: 277360 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:57:47,090-Speed 6337.18 samples/sec Loss 5.9178 LearningRate 0.0005 Epoch: 13 Global Step: 277370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:50,340-Speed 6303.76 samples/sec Loss 5.8936 LearningRate 0.0005 Epoch: 13 Global Step: 277380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:53,585-Speed 6311.51 samples/sec Loss 5.9772 LearningRate 0.0005 Epoch: 13 Global Step: 277390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:57:56,834-Speed 6305.30 samples/sec Loss 5.9064 LearningRate 0.0005 Epoch: 13 Global Step: 277400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:00,081-Speed 6308.56 samples/sec Loss 5.8483 LearningRate 0.0005 Epoch: 13 Global Step: 277410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:03,330-Speed 6303.87 samples/sec Loss 5.8957 LearningRate 0.0005 Epoch: 13 Global Step: 277420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:06,575-Speed 6312.44 samples/sec Loss 5.9436 LearningRate 0.0005 Epoch: 13 Global Step: 277430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:09,825-Speed 6303.58 samples/sec Loss 5.8362 LearningRate 0.0005 Epoch: 13 Global Step: 277440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:13,070-Speed 6312.29 samples/sec Loss 5.8443 LearningRate 0.0005 Epoch: 13 Global Step: 277450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:16,317-Speed 6309.08 samples/sec Loss 5.8987 LearningRate 0.0005 Epoch: 13 Global Step: 277460 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 16:58:19,552-Speed 6332.34 samples/sec Loss 5.9778 LearningRate 0.0005 Epoch: 13 Global Step: 277470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:22,799-Speed 6308.45 samples/sec Loss 5.9623 LearningRate 0.0005 Epoch: 13 Global Step: 277480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:26,041-Speed 6318.37 samples/sec Loss 5.8726 LearningRate 0.0005 Epoch: 13 Global Step: 277490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:29,287-Speed 6310.27 samples/sec Loss 5.9572 LearningRate 0.0005 Epoch: 13 Global Step: 277500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:32,539-Speed 6300.77 samples/sec Loss 5.8132 LearningRate 0.0005 Epoch: 13 Global Step: 277510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:35,783-Speed 6314.70 samples/sec Loss 5.9592 LearningRate 0.0005 Epoch: 13 Global Step: 277520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:39,034-Speed 6301.77 samples/sec Loss 5.9143 LearningRate 0.0005 Epoch: 13 Global Step: 277530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:42,282-Speed 6306.50 samples/sec Loss 5.8851 LearningRate 0.0005 Epoch: 13 Global Step: 277540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:45,530-Speed 6305.49 samples/sec Loss 5.8706 LearningRate 0.0005 Epoch: 13 Global Step: 277550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:48,774-Speed 6315.08 samples/sec Loss 5.8367 LearningRate 0.0005 Epoch: 13 Global Step: 277560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:58:52,023-Speed 6305.00 samples/sec Loss 5.9191 LearningRate 0.0005 Epoch: 13 Global Step: 277570 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 16:58:55,252-Speed 6344.03 samples/sec Loss 6.0059 LearningRate 0.0005 Epoch: 13 Global Step: 277580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
16:58:58,497-Speed 6313.07 samples/sec Loss 5.9766 LearningRate 0.0005 Epoch: 13 Global Step: 277590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:01,738-Speed 6319.10 samples/sec Loss 5.9345 LearningRate 0.0005 Epoch: 13 Global Step: 277600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:04,983-Speed 6313.58 samples/sec Loss 5.9045 LearningRate 0.0005 Epoch: 13 Global Step: 277610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:08,230-Speed 6307.83 samples/sec Loss 5.9083 LearningRate 0.0005 Epoch: 13 Global Step: 277620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:11,473-Speed 6316.96 samples/sec Loss 5.9138 LearningRate 0.0005 Epoch: 13 Global Step: 277630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:14,722-Speed 6306.33 samples/sec Loss 5.9855 LearningRate 0.0005 Epoch: 13 Global Step: 277640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:17,963-Speed 6318.80 samples/sec Loss 5.9794 LearningRate 0.0005 Epoch: 13 Global Step: 277650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:21,212-Speed 6305.81 samples/sec Loss 5.8774 LearningRate 0.0005 Epoch: 13 Global Step: 277660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:24,461-Speed 6304.94 samples/sec Loss 5.9735 LearningRate 0.0005 Epoch: 13 Global Step: 277670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:27,694-Speed 6336.72 samples/sec Loss 5.9781 LearningRate 0.0005 Epoch: 13 Global Step: 277680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:30,938-Speed 6312.78 samples/sec Loss 5.9819 LearningRate 0.0005 Epoch: 13 Global Step: 277690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:34,184-Speed 6311.22 samples/sec Loss 5.9004 LearningRate 0.0005 Epoch: 13 Global Step: 277700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:37,430-Speed 6311.01 
samples/sec Loss 5.9724 LearningRate 0.0005 Epoch: 13 Global Step: 277710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:40,674-Speed 6315.77 samples/sec Loss 5.8772 LearningRate 0.0005 Epoch: 13 Global Step: 277720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:43,921-Speed 6308.64 samples/sec Loss 5.9417 LearningRate 0.0005 Epoch: 13 Global Step: 277730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:47,168-Speed 6309.02 samples/sec Loss 5.9388 LearningRate 0.0005 Epoch: 13 Global Step: 277740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:50,412-Speed 6313.72 samples/sec Loss 5.9458 LearningRate 0.0005 Epoch: 13 Global Step: 277750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:53,669-Speed 6290.15 samples/sec Loss 5.8818 LearningRate 0.0005 Epoch: 13 Global Step: 277760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 16:59:56,915-Speed 6311.46 samples/sec Loss 5.8896 LearningRate 0.0005 Epoch: 13 Global Step: 277770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:00,142-Speed 6347.14 samples/sec Loss 5.8401 LearningRate 0.0005 Epoch: 13 Global Step: 277780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:03,388-Speed 6309.91 samples/sec Loss 5.9265 LearningRate 0.0005 Epoch: 13 Global Step: 277790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:06,634-Speed 6312.40 samples/sec Loss 5.8735 LearningRate 0.0005 Epoch: 13 Global Step: 277800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:09,878-Speed 6314.46 samples/sec Loss 5.9384 LearningRate 0.0005 Epoch: 13 Global Step: 277810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:13,119-Speed 6319.13 samples/sec Loss 5.9102 LearningRate 0.0005 Epoch: 13 Global Step: 277820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:16,363-Speed 6314.53 samples/sec Loss 5.8392 
LearningRate 0.0005 Epoch: 13 Global Step: 277830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:19,608-Speed 6312.77 samples/sec Loss 5.8899 LearningRate 0.0005 Epoch: 13 Global Step: 277840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:22,853-Speed 6313.08 samples/sec Loss 5.9636 LearningRate 0.0005 Epoch: 13 Global Step: 277850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:26,098-Speed 6313.20 samples/sec Loss 6.0180 LearningRate 0.0005 Epoch: 13 Global Step: 277860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:29,342-Speed 6314.48 samples/sec Loss 5.9029 LearningRate 0.0005 Epoch: 13 Global Step: 277870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:32,588-Speed 6309.94 samples/sec Loss 5.8087 LearningRate 0.0005 Epoch: 13 Global Step: 277880 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:00:35,819-Speed 6340.47 samples/sec Loss 5.8499 LearningRate 0.0005 Epoch: 13 Global Step: 277890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:39,067-Speed 6306.36 samples/sec Loss 5.9887 LearningRate 0.0005 Epoch: 13 Global Step: 277900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:42,320-Speed 6298.21 samples/sec Loss 5.9718 LearningRate 0.0005 Epoch: 13 Global Step: 277910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:45,562-Speed 6318.14 samples/sec Loss 5.9181 LearningRate 0.0005 Epoch: 13 Global Step: 277920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:48,810-Speed 6306.07 samples/sec Loss 5.9544 LearningRate 0.0005 Epoch: 13 Global Step: 277930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:52,062-Speed 6298.93 samples/sec Loss 5.9409 LearningRate 0.0005 Epoch: 13 Global Step: 277940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:55,305-Speed 6317.43 samples/sec Loss 5.8814 LearningRate 0.0005 Epoch: 13 
Global Step: 277950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:00:58,550-Speed 6313.04 samples/sec Loss 5.9656 LearningRate 0.0005 Epoch: 13 Global Step: 277960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:01,798-Speed 6306.41 samples/sec Loss 5.9308 LearningRate 0.0005 Epoch: 13 Global Step: 277970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:05,052-Speed 6296.44 samples/sec Loss 5.8671 LearningRate 0.0005 Epoch: 13 Global Step: 277980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:08,280-Speed 6344.75 samples/sec Loss 5.9942 LearningRate 0.0005 Epoch: 13 Global Step: 277990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:11,526-Speed 6311.43 samples/sec Loss 5.8504 LearningRate 0.0005 Epoch: 13 Global Step: 278000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:14,772-Speed 6310.49 samples/sec Loss 5.8690 LearningRate 0.0005 Epoch: 13 Global Step: 278010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:18,018-Speed 6310.88 samples/sec Loss 5.9030 LearningRate 0.0005 Epoch: 13 Global Step: 278020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:21,267-Speed 6304.58 samples/sec Loss 5.9201 LearningRate 0.0005 Epoch: 13 Global Step: 278030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:24,520-Speed 6297.97 samples/sec Loss 5.9057 LearningRate 0.0005 Epoch: 13 Global Step: 278040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:27,766-Speed 6309.05 samples/sec Loss 5.9274 LearningRate 0.0005 Epoch: 13 Global Step: 278050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:31,010-Speed 6314.87 samples/sec Loss 5.8565 LearningRate 0.0005 Epoch: 13 Global Step: 278060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:34,257-Speed 6308.63 samples/sec Loss 5.9596 LearningRate 0.0005 Epoch: 13 Global Step: 278070 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:37,511-Speed 6296.47 samples/sec Loss 5.9736 LearningRate 0.0005 Epoch: 13 Global Step: 278080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:40,757-Speed 6310.44 samples/sec Loss 5.8535 LearningRate 0.0005 Epoch: 13 Global Step: 278090 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:01:43,993-Speed 6329.50 samples/sec Loss 5.8899 LearningRate 0.0005 Epoch: 13 Global Step: 278100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:47,240-Speed 6308.45 samples/sec Loss 6.0183 LearningRate 0.0005 Epoch: 13 Global Step: 278110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:50,487-Speed 6309.16 samples/sec Loss 5.8759 LearningRate 0.0005 Epoch: 13 Global Step: 278120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:53,767-Speed 6244.99 samples/sec Loss 5.9177 LearningRate 0.0005 Epoch: 13 Global Step: 278130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:01:57,010-Speed 6317.69 samples/sec Loss 5.8589 LearningRate 0.0005 Epoch: 13 Global Step: 278140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:00,255-Speed 6311.91 samples/sec Loss 5.9309 LearningRate 0.0005 Epoch: 13 Global Step: 278150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:03,498-Speed 6316.49 samples/sec Loss 5.9192 LearningRate 0.0005 Epoch: 13 Global Step: 278160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:06,747-Speed 6305.02 samples/sec Loss 5.9730 LearningRate 0.0005 Epoch: 13 Global Step: 278170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:09,988-Speed 6321.64 samples/sec Loss 5.8838 LearningRate 0.0005 Epoch: 13 Global Step: 278180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:13,234-Speed 6311.09 samples/sec Loss 5.8462 LearningRate 0.0005 Epoch: 13 Global Step: 278190 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:02:16,467-Speed 6335.37 samples/sec Loss 5.9580 LearningRate 0.0005 Epoch: 13 Global Step: 278200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:19,710-Speed 6315.65 samples/sec Loss 5.9187 LearningRate 0.0005 Epoch: 13 Global Step: 278210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:22,958-Speed 6308.74 samples/sec Loss 5.9796 LearningRate 0.0005 Epoch: 13 Global Step: 278220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:26,201-Speed 6316.15 samples/sec Loss 5.9121 LearningRate 0.0005 Epoch: 13 Global Step: 278230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:29,530-Speed 6152.77 samples/sec Loss 5.9017 LearningRate 0.0005 Epoch: 13 Global Step: 278240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:32,778-Speed 6307.35 samples/sec Loss 5.9401 LearningRate 0.0005 Epoch: 13 Global Step: 278250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:36,024-Speed 6310.54 samples/sec Loss 5.8565 LearningRate 0.0005 Epoch: 13 Global Step: 278260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:02:39,258-Speed 6338.57 samples/sec Loss 5.8632 LearningRate 0.0005 Epoch: 13 Global Step: 278270 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:02:42,506-Speed 6306.46 samples/sec Loss 5.9280 LearningRate 0.0005 Epoch: 13 Global Step: 278280 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:02:45,750-Speed 6314.65 samples/sec Loss 5.9201 LearningRate 0.0005 Epoch: 13 Global Step: 278290 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:02:48,993-Speed 6316.66 samples/sec Loss 5.9567 LearningRate 0.0005 Epoch: 13 Global Step: 278300 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:02:52,239-Speed 6311.45 samples/sec Loss 5.9201 LearningRate 0.0005 Epoch: 13 Global Step: 278310 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 
17:02:55,482-Speed 6315.34 samples/sec Loss 5.9099 LearningRate 0.0005 Epoch: 13 Global Step: 278320 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:02:58,729-Speed 6310.20 samples/sec Loss 5.9315 LearningRate 0.0005 Epoch: 13 Global Step: 278330 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:01,981-Speed 6297.67 samples/sec Loss 5.8636 LearningRate 0.0005 Epoch: 13 Global Step: 278340 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:05,228-Speed 6309.90 samples/sec Loss 5.9506 LearningRate 0.0005 Epoch: 13 Global Step: 278350 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:08,480-Speed 6299.35 samples/sec Loss 5.8287 LearningRate 0.0005 Epoch: 13 Global Step: 278360 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:11,727-Speed 6308.81 samples/sec Loss 5.9244 LearningRate 0.0005 Epoch: 13 Global Step: 278370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:03:14,972-Speed 6311.85 samples/sec Loss 6.0345 LearningRate 0.0005 Epoch: 13 Global Step: 278380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:03:18,220-Speed 6306.51 samples/sec Loss 5.8904 LearningRate 0.0005 Epoch: 13 Global Step: 278390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:03:21,466-Speed 6311.60 samples/sec Loss 5.8905 LearningRate 0.0005 Epoch: 13 Global Step: 278400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:03:24,715-Speed 6304.61 samples/sec Loss 5.9109 LearningRate 0.0005 Epoch: 13 Global Step: 278410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:03:27,947-Speed 6338.40 samples/sec Loss 5.9327 LearningRate 0.0005 Epoch: 13 Global Step: 278420 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:31,196-Speed 6305.43 samples/sec Loss 5.9248 LearningRate 0.0005 Epoch: 13 Global Step: 278430 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:34,441-Speed 6312.72 
samples/sec Loss 5.8615 LearningRate 0.0005 Epoch: 13 Global Step: 278440 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:37,684-Speed 6316.77 samples/sec Loss 5.9117 LearningRate 0.0005 Epoch: 13 Global Step: 278450 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:40,929-Speed 6312.37 samples/sec Loss 5.9080 LearningRate 0.0005 Epoch: 13 Global Step: 278460 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:44,171-Speed 6317.70 samples/sec Loss 5.9290 LearningRate 0.0005 Epoch: 13 Global Step: 278470 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:47,414-Speed 6316.64 samples/sec Loss 5.8447 LearningRate 0.0005 Epoch: 13 Global Step: 278480 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:50,663-Speed 6305.38 samples/sec Loss 5.8883 LearningRate 0.0005 Epoch: 13 Global Step: 278490 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:53,910-Speed 6312.48 samples/sec Loss 5.8744 LearningRate 0.0005 Epoch: 13 Global Step: 278500 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:03:57,154-Speed 6313.79 samples/sec Loss 5.8955 LearningRate 0.0005 Epoch: 13 Global Step: 278510 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:00,408-Speed 6295.03 samples/sec Loss 5.8881 LearningRate 0.0005 Epoch: 13 Global Step: 278520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:03,656-Speed 6307.74 samples/sec Loss 5.9298 LearningRate 0.0005 Epoch: 13 Global Step: 278530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:06,912-Speed 6289.63 samples/sec Loss 5.9591 LearningRate 0.0005 Epoch: 13 Global Step: 278540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:10,158-Speed 6312.13 samples/sec Loss 5.8962 LearningRate 0.0005 Epoch: 13 Global Step: 278550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:13,401-Speed 6316.12 samples/sec Loss 5.9279 
LearningRate 0.0005 Epoch: 13 Global Step: 278560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:16,633-Speed 6338.31 samples/sec Loss 5.9348 LearningRate 0.0005 Epoch: 13 Global Step: 278570 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:19,883-Speed 6303.34 samples/sec Loss 5.9238 LearningRate 0.0005 Epoch: 13 Global Step: 278580 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:23,124-Speed 6320.27 samples/sec Loss 5.8258 LearningRate 0.0005 Epoch: 13 Global Step: 278590 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:26,366-Speed 6318.12 samples/sec Loss 5.9031 LearningRate 0.0005 Epoch: 13 Global Step: 278600 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:29,612-Speed 6311.94 samples/sec Loss 5.9503 LearningRate 0.0005 Epoch: 13 Global Step: 278610 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:32,859-Speed 6308.87 samples/sec Loss 5.8930 LearningRate 0.0005 Epoch: 13 Global Step: 278620 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:36,104-Speed 6313.34 samples/sec Loss 5.7987 LearningRate 0.0005 Epoch: 13 Global Step: 278630 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:39,347-Speed 6315.13 samples/sec Loss 5.9082 LearningRate 0.0005 Epoch: 13 Global Step: 278640 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:42,590-Speed 6318.45 samples/sec Loss 5.8678 LearningRate 0.0005 Epoch: 13 Global Step: 278650 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:45,829-Speed 6324.13 samples/sec Loss 5.8822 LearningRate 0.0005 Epoch: 13 Global Step: 278660 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:04:49,074-Speed 6311.15 samples/sec Loss 5.8577 LearningRate 0.0005 Epoch: 13 Global Step: 278670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:52,316-Speed 6319.40 samples/sec Loss 5.8941 LearningRate 0.0005 Epoch: 13 
Global Step: 278680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:55,559-Speed 6315.75 samples/sec Loss 5.8772 LearningRate 0.0005 Epoch: 13 Global Step: 278690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:04:58,802-Speed 6316.00 samples/sec Loss 5.9772 LearningRate 0.0005 Epoch: 13 Global Step: 278700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:02,047-Speed 6313.59 samples/sec Loss 5.8964 LearningRate 0.0005 Epoch: 13 Global Step: 278710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:05,293-Speed 6309.91 samples/sec Loss 5.9807 LearningRate 0.0005 Epoch: 13 Global Step: 278720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:08,541-Speed 6307.83 samples/sec Loss 5.8842 LearningRate 0.0005 Epoch: 13 Global Step: 278730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:11,787-Speed 6310.88 samples/sec Loss 5.9671 LearningRate 0.0005 Epoch: 13 Global Step: 278740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:15,037-Speed 6302.57 samples/sec Loss 5.8992 LearningRate 0.0005 Epoch: 13 Global Step: 278750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:18,282-Speed 6312.86 samples/sec Loss 5.9404 LearningRate 0.0005 Epoch: 13 Global Step: 278760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:21,519-Speed 6327.99 samples/sec Loss 5.8248 LearningRate 0.0005 Epoch: 13 Global Step: 278770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:24,767-Speed 6307.73 samples/sec Loss 5.9810 LearningRate 0.0005 Epoch: 13 Global Step: 278780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:28,015-Speed 6305.57 samples/sec Loss 5.9036 LearningRate 0.0005 Epoch: 13 Global Step: 278790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:31,273-Speed 6287.24 samples/sec Loss 5.9166 LearningRate 0.0005 Epoch: 13 Global Step: 278800 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:34,521-Speed 6308.78 samples/sec Loss 5.8960 LearningRate 0.0005 Epoch: 13 Global Step: 278810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:37,767-Speed 6311.49 samples/sec Loss 5.9186 LearningRate 0.0005 Epoch: 13 Global Step: 278820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:41,013-Speed 6309.46 samples/sec Loss 5.8727 LearningRate 0.0005 Epoch: 13 Global Step: 278830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:44,260-Speed 6310.36 samples/sec Loss 5.8992 LearningRate 0.0005 Epoch: 13 Global Step: 278840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:47,504-Speed 6313.59 samples/sec Loss 5.8540 LearningRate 0.0005 Epoch: 13 Global Step: 278850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:50,749-Speed 6312.17 samples/sec Loss 5.8893 LearningRate 0.0005 Epoch: 13 Global Step: 278860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:53,981-Speed 6338.70 samples/sec Loss 5.8536 LearningRate 0.0005 Epoch: 13 Global Step: 278870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:05:57,226-Speed 6312.03 samples/sec Loss 5.9547 LearningRate 0.0005 Epoch: 13 Global Step: 278880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:00,468-Speed 6318.23 samples/sec Loss 5.9834 LearningRate 0.0005 Epoch: 13 Global Step: 278890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:03,714-Speed 6312.04 samples/sec Loss 5.9476 LearningRate 0.0005 Epoch: 13 Global Step: 278900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:06,964-Speed 6301.55 samples/sec Loss 5.8864 LearningRate 0.0005 Epoch: 13 Global Step: 278910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:10,211-Speed 6310.27 samples/sec Loss 5.9495 LearningRate 0.0005 Epoch: 13 Global Step: 278920 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:06:13,452-Speed 6319.77 samples/sec Loss 5.8880 LearningRate 0.0005 Epoch: 13 Global Step: 278930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:16,699-Speed 6308.02 samples/sec Loss 5.8066 LearningRate 0.0005 Epoch: 13 Global Step: 278940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:19,948-Speed 6304.79 samples/sec Loss 5.8986 LearningRate 0.0005 Epoch: 13 Global Step: 278950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:23,201-Speed 6299.65 samples/sec Loss 5.9619 LearningRate 0.0005 Epoch: 13 Global Step: 278960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:26,431-Speed 6341.12 samples/sec Loss 5.9247 LearningRate 0.0005 Epoch: 13 Global Step: 278970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:29,681-Speed 6303.68 samples/sec Loss 5.9390 LearningRate 0.0005 Epoch: 13 Global Step: 278980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:32,928-Speed 6307.25 samples/sec Loss 5.8428 LearningRate 0.0005 Epoch: 13 Global Step: 278990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:36,174-Speed 6311.31 samples/sec Loss 5.9575 LearningRate 0.0005 Epoch: 13 Global Step: 279000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:39,421-Speed 6308.78 samples/sec Loss 5.9111 LearningRate 0.0005 Epoch: 13 Global Step: 279010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:42,665-Speed 6314.28 samples/sec Loss 5.8664 LearningRate 0.0005 Epoch: 13 Global Step: 279020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:45,916-Speed 6301.84 samples/sec Loss 5.8727 LearningRate 0.0005 Epoch: 13 Global Step: 279030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:49,164-Speed 6307.51 samples/sec Loss 5.9756 LearningRate 0.0005 Epoch: 13 Global Step: 279040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:06:52,416-Speed 6299.87 samples/sec Loss 5.9095 LearningRate 0.0005 Epoch: 13 Global Step: 279050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:55,662-Speed 6310.09 samples/sec Loss 5.8450 LearningRate 0.0005 Epoch: 13 Global Step: 279060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:06:58,912-Speed 6303.79 samples/sec Loss 5.9445 LearningRate 0.0005 Epoch: 13 Global Step: 279070 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:07:02,143-Speed 6338.44 samples/sec Loss 5.9106 LearningRate 0.0005 Epoch: 13 Global Step: 279080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:05,390-Speed 6308.69 samples/sec Loss 5.8840 LearningRate 0.0005 Epoch: 13 Global Step: 279090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:08,639-Speed 6305.25 samples/sec Loss 5.9127 LearningRate 0.0005 Epoch: 13 Global Step: 279100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:11,890-Speed 6302.19 samples/sec Loss 5.8680 LearningRate 0.0005 Epoch: 13 Global Step: 279110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:15,137-Speed 6309.57 samples/sec Loss 5.9173 LearningRate 0.0005 Epoch: 13 Global Step: 279120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:18,386-Speed 6304.45 samples/sec Loss 5.8834 LearningRate 0.0005 Epoch: 13 Global Step: 279130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:21,632-Speed 6309.99 samples/sec Loss 5.8665 LearningRate 0.0005 Epoch: 13 Global Step: 279140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:24,877-Speed 6312.54 samples/sec Loss 5.9002 LearningRate 0.0005 Epoch: 13 Global Step: 279150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:28,124-Speed 6308.65 samples/sec Loss 5.8668 LearningRate 0.0005 Epoch: 13 Global Step: 279160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:31,370-Speed 6310.25 
samples/sec Loss 5.8498 LearningRate 0.0005 Epoch: 13 Global Step: 279170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:34,605-Speed 6331.76 samples/sec Loss 5.8067 LearningRate 0.0005 Epoch: 13 Global Step: 279180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:37,852-Speed 6309.82 samples/sec Loss 5.9270 LearningRate 0.0005 Epoch: 13 Global Step: 279190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:41,099-Speed 6307.90 samples/sec Loss 5.8788 LearningRate 0.0005 Epoch: 13 Global Step: 279200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:44,353-Speed 6295.97 samples/sec Loss 5.8922 LearningRate 0.0005 Epoch: 13 Global Step: 279210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:47,599-Speed 6311.25 samples/sec Loss 5.9130 LearningRate 0.0005 Epoch: 13 Global Step: 279220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:50,845-Speed 6310.74 samples/sec Loss 5.9039 LearningRate 0.0005 Epoch: 13 Global Step: 279230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:54,088-Speed 6317.45 samples/sec Loss 5.9143 LearningRate 0.0005 Epoch: 13 Global Step: 279240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:07:57,335-Speed 6308.53 samples/sec Loss 5.9038 LearningRate 0.0005 Epoch: 13 Global Step: 279250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:00,582-Speed 6308.20 samples/sec Loss 5.9504 LearningRate 0.0005 Epoch: 13 Global Step: 279260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:03,831-Speed 6306.43 samples/sec Loss 5.9148 LearningRate 0.0005 Epoch: 13 Global Step: 279270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:07,067-Speed 6329.79 samples/sec Loss 5.8453 LearningRate 0.0005 Epoch: 13 Global Step: 279280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:10,313-Speed 6310.20 samples/sec Loss 5.9161 
LearningRate 0.0005 Epoch: 13 Global Step: 279290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:13,559-Speed 6310.54 samples/sec Loss 5.8661 LearningRate 0.0005 Epoch: 13 Global Step: 279300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:16,810-Speed 6301.52 samples/sec Loss 5.9269 LearningRate 0.0005 Epoch: 13 Global Step: 279310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:20,059-Speed 6304.70 samples/sec Loss 5.8663 LearningRate 0.0005 Epoch: 13 Global Step: 279320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:23,309-Speed 6303.05 samples/sec Loss 5.9369 LearningRate 0.0005 Epoch: 13 Global Step: 279330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:26,557-Speed 6306.27 samples/sec Loss 5.8877 LearningRate 0.0005 Epoch: 13 Global Step: 279340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:29,803-Speed 6311.64 samples/sec Loss 5.8692 LearningRate 0.0005 Epoch: 13 Global Step: 279350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:33,050-Speed 6308.93 samples/sec Loss 5.9482 LearningRate 0.0005 Epoch: 13 Global Step: 279360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:36,295-Speed 6311.97 samples/sec Loss 5.7977 LearningRate 0.0005 Epoch: 13 Global Step: 279370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:39,529-Speed 6334.70 samples/sec Loss 5.9256 LearningRate 0.0005 Epoch: 13 Global Step: 279380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:42,777-Speed 6306.45 samples/sec Loss 5.9395 LearningRate 0.0005 Epoch: 13 Global Step: 279390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:46,022-Speed 6312.41 samples/sec Loss 5.8931 LearningRate 0.0005 Epoch: 13 Global Step: 279400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:49,267-Speed 6313.87 samples/sec Loss 5.9318 LearningRate 0.0005 Epoch: 13 
Global Step: 279410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:52,511-Speed 6312.79 samples/sec Loss 5.9559 LearningRate 0.0005 Epoch: 13 Global Step: 279420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:55,757-Speed 6311.46 samples/sec Loss 5.9416 LearningRate 0.0005 Epoch: 13 Global Step: 279430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:08:59,002-Speed 6313.62 samples/sec Loss 5.8872 LearningRate 0.0005 Epoch: 13 Global Step: 279440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:02,248-Speed 6310.11 samples/sec Loss 5.9224 LearningRate 0.0005 Epoch: 13 Global Step: 279450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:05,492-Speed 6314.32 samples/sec Loss 5.9371 LearningRate 0.0005 Epoch: 13 Global Step: 279460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:08,739-Speed 6311.00 samples/sec Loss 5.8916 LearningRate 0.0005 Epoch: 13 Global Step: 279470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:11,984-Speed 6312.12 samples/sec Loss 5.8945 LearningRate 0.0005 Epoch: 13 Global Step: 279480 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:09:15,216-Speed 6337.11 samples/sec Loss 5.9518 LearningRate 0.0005 Epoch: 13 Global Step: 279490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:18,446-Speed 6342.53 samples/sec Loss 5.9446 LearningRate 0.0005 Epoch: 13 Global Step: 279500 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:21,696-Speed 6302.90 samples/sec Loss 5.8721 LearningRate 0.0005 Epoch: 13 Global Step: 279510 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:24,940-Speed 6314.87 samples/sec Loss 5.8488 LearningRate 0.0005 Epoch: 13 Global Step: 279520 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:28,185-Speed 6313.04 samples/sec Loss 5.8504 LearningRate 0.0005 Epoch: 13 Global Step: 279530 Fp16 Grad 
Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:31,429-Speed 6313.94 samples/sec Loss 5.8741 LearningRate 0.0005 Epoch: 13 Global Step: 279540 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:34,672-Speed 6316.89 samples/sec Loss 5.9722 LearningRate 0.0005 Epoch: 13 Global Step: 279550 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:37,912-Speed 6321.95 samples/sec Loss 5.8894 LearningRate 0.0005 Epoch: 13 Global Step: 279560 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:41,154-Speed 6318.68 samples/sec Loss 5.8341 LearningRate 0.0005 Epoch: 13 Global Step: 279570 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:44,398-Speed 6314.37 samples/sec Loss 5.8325 LearningRate 0.0005 Epoch: 13 Global Step: 279580 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:47,663-Speed 6274.01 samples/sec Loss 5.8618 LearningRate 0.0005 Epoch: 13 Global Step: 279590 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:09:50,905-Speed 6318.91 samples/sec Loss 6.0185 LearningRate 0.0005 Epoch: 13 Global Step: 279600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:54,149-Speed 6314.49 samples/sec Loss 5.9227 LearningRate 0.0005 Epoch: 13 Global Step: 279610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:09:57,396-Speed 6308.13 samples/sec Loss 5.9078 LearningRate 0.0005 Epoch: 13 Global Step: 279620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:00,641-Speed 6313.66 samples/sec Loss 5.9279 LearningRate 0.0005 Epoch: 13 Global Step: 279630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:03,886-Speed 6312.63 samples/sec Loss 5.8244 LearningRate 0.0005 Epoch: 13 Global Step: 279640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:07,130-Speed 6313.61 samples/sec Loss 5.8283 LearningRate 0.0005 Epoch: 13 Global Step: 279650 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:10:10,379-Speed 6306.79 samples/sec Loss 5.8984 LearningRate 0.0005 Epoch: 13 Global Step: 279660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:13,621-Speed 6318.45 samples/sec Loss 5.8520 LearningRate 0.0005 Epoch: 13 Global Step: 279670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:16,863-Speed 6317.03 samples/sec Loss 5.9050 LearningRate 0.0005 Epoch: 13 Global Step: 279680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:20,120-Speed 6290.87 samples/sec Loss 5.8992 LearningRate 0.0005 Epoch: 13 Global Step: 279690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:23,349-Speed 6342.24 samples/sec Loss 5.9318 LearningRate 0.0005 Epoch: 13 Global Step: 279700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:26,597-Speed 6308.28 samples/sec Loss 5.9147 LearningRate 0.0005 Epoch: 13 Global Step: 279710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:29,842-Speed 6311.47 samples/sec Loss 5.8535 LearningRate 0.0005 Epoch: 13 Global Step: 279720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:33,089-Speed 6309.25 samples/sec Loss 5.9045 LearningRate 0.0005 Epoch: 13 Global Step: 279730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:36,336-Speed 6308.96 samples/sec Loss 5.8829 LearningRate 0.0005 Epoch: 13 Global Step: 279740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:39,579-Speed 6316.78 samples/sec Loss 5.8774 LearningRate 0.0005 Epoch: 13 Global Step: 279750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:42,827-Speed 6307.56 samples/sec Loss 5.8856 LearningRate 0.0005 Epoch: 13 Global Step: 279760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:46,071-Speed 6313.55 samples/sec Loss 5.9392 LearningRate 0.0005 Epoch: 13 Global Step: 279770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:10:49,316-Speed 6313.78 samples/sec Loss 5.9227 LearningRate 0.0005 Epoch: 13 Global Step: 279780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:52,568-Speed 6298.25 samples/sec Loss 5.8711 LearningRate 0.0005 Epoch: 13 Global Step: 279790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:55,800-Speed 6337.53 samples/sec Loss 5.9006 LearningRate 0.0005 Epoch: 13 Global Step: 279800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:10:59,046-Speed 6310.66 samples/sec Loss 5.9645 LearningRate 0.0005 Epoch: 13 Global Step: 279810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:02,290-Speed 6315.11 samples/sec Loss 5.9333 LearningRate 0.0005 Epoch: 13 Global Step: 279820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:05,538-Speed 6307.11 samples/sec Loss 5.9413 LearningRate 0.0005 Epoch: 13 Global Step: 279830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:08,785-Speed 6308.12 samples/sec Loss 5.8157 LearningRate 0.0005 Epoch: 13 Global Step: 279840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:12,031-Speed 6310.90 samples/sec Loss 5.9130 LearningRate 0.0005 Epoch: 13 Global Step: 279850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:15,282-Speed 6300.83 samples/sec Loss 5.9039 LearningRate 0.0005 Epoch: 13 Global Step: 279860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:18,532-Speed 6303.37 samples/sec Loss 5.9023 LearningRate 0.0005 Epoch: 13 Global Step: 279870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:21,777-Speed 6314.49 samples/sec Loss 5.8967 LearningRate 0.0005 Epoch: 13 Global Step: 279880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:25,024-Speed 6308.69 samples/sec Loss 5.8870 LearningRate 0.0005 Epoch: 13 Global Step: 279890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:28,254-Speed 6341.42 
samples/sec Loss 5.9030 LearningRate 0.0005 Epoch: 13 Global Step: 279900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:31,500-Speed 6310.67 samples/sec Loss 5.8306 LearningRate 0.0005 Epoch: 13 Global Step: 279910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:34,746-Speed 6311.06 samples/sec Loss 5.9936 LearningRate 0.0005 Epoch: 13 Global Step: 279920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:37,991-Speed 6311.24 samples/sec Loss 5.9206 LearningRate 0.0005 Epoch: 13 Global Step: 279930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:41,281-Speed 6227.61 samples/sec Loss 5.9088 LearningRate 0.0005 Epoch: 13 Global Step: 279940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:44,524-Speed 6315.28 samples/sec Loss 5.9062 LearningRate 0.0005 Epoch: 13 Global Step: 279950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:47,771-Speed 6309.58 samples/sec Loss 5.9233 LearningRate 0.0005 Epoch: 13 Global Step: 279960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:51,021-Speed 6303.57 samples/sec Loss 5.9517 LearningRate 0.0005 Epoch: 13 Global Step: 279970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:54,365-Speed 6125.32 samples/sec Loss 5.8529 LearningRate 0.0005 Epoch: 13 Global Step: 279980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:11:57,611-Speed 6309.81 samples/sec Loss 5.9363 LearningRate 0.0005 Epoch: 13 Global Step: 279990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:00,859-Speed 6308.46 samples/sec Loss 5.9496 LearningRate 0.0005 Epoch: 13 Global Step: 280000 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:12:04,095-Speed 6329.24 samples/sec Loss 5.9408 LearningRate 0.0005 Epoch: 13 Global Step: 280010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:07,342-Speed 6308.11 samples/sec Loss 5.8426 
LearningRate 0.0005 Epoch: 13 Global Step: 280020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:10,584-Speed 6319.31 samples/sec Loss 5.8611 LearningRate 0.0005 Epoch: 13 Global Step: 280030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:13,827-Speed 6315.96 samples/sec Loss 5.9272 LearningRate 0.0005 Epoch: 13 Global Step: 280040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:17,073-Speed 6311.27 samples/sec Loss 5.8736 LearningRate 0.0005 Epoch: 13 Global Step: 280050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:20,320-Speed 6309.54 samples/sec Loss 5.9366 LearningRate 0.0005 Epoch: 13 Global Step: 280060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:23,562-Speed 6318.61 samples/sec Loss 5.8980 LearningRate 0.0005 Epoch: 13 Global Step: 280070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:26,816-Speed 6295.20 samples/sec Loss 5.7871 LearningRate 0.0005 Epoch: 13 Global Step: 280080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:30,063-Speed 6309.14 samples/sec Loss 5.8788 LearningRate 0.0005 Epoch: 13 Global Step: 280090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:33,311-Speed 6305.45 samples/sec Loss 5.8534 LearningRate 0.0005 Epoch: 13 Global Step: 280100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:36,543-Speed 6339.87 samples/sec Loss 5.8894 LearningRate 0.0005 Epoch: 13 Global Step: 280110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:39,792-Speed 6304.09 samples/sec Loss 5.9260 LearningRate 0.0005 Epoch: 13 Global Step: 280120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:43,038-Speed 6311.32 samples/sec Loss 5.8503 LearningRate 0.0005 Epoch: 13 Global Step: 280130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:46,285-Speed 6307.87 samples/sec Loss 5.8858 LearningRate 0.0005 Epoch: 13 
Global Step: 280140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:49,530-Speed 6313.72 samples/sec Loss 5.8103 LearningRate 0.0005 Epoch: 13 Global Step: 280150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:52,784-Speed 6294.18 samples/sec Loss 5.8295 LearningRate 0.0005 Epoch: 13 Global Step: 280160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:56,027-Speed 6316.81 samples/sec Loss 5.9401 LearningRate 0.0005 Epoch: 13 Global Step: 280170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:12:59,273-Speed 6311.10 samples/sec Loss 5.8952 LearningRate 0.0005 Epoch: 13 Global Step: 280180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:02,521-Speed 6305.67 samples/sec Loss 5.8528 LearningRate 0.0005 Epoch: 13 Global Step: 280190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:05,767-Speed 6312.22 samples/sec Loss 5.8608 LearningRate 0.0005 Epoch: 13 Global Step: 280200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:08,999-Speed 6337.61 samples/sec Loss 5.8745 LearningRate 0.0005 Epoch: 13 Global Step: 280210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:12,243-Speed 6314.09 samples/sec Loss 5.8281 LearningRate 0.0005 Epoch: 13 Global Step: 280220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:15,487-Speed 6315.60 samples/sec Loss 5.8902 LearningRate 0.0005 Epoch: 13 Global Step: 280230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:18,733-Speed 6310.61 samples/sec Loss 5.8437 LearningRate 0.0005 Epoch: 13 Global Step: 280240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:21,981-Speed 6307.58 samples/sec Loss 5.8112 LearningRate 0.0005 Epoch: 13 Global Step: 280250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:25,230-Speed 6304.58 samples/sec Loss 5.9652 LearningRate 0.0005 Epoch: 13 Global Step: 280260 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:28,474-Speed 6313.93 samples/sec Loss 5.8910 LearningRate 0.0005 Epoch: 13 Global Step: 280270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:31,725-Speed 6302.53 samples/sec Loss 5.9753 LearningRate 0.0005 Epoch: 13 Global Step: 280280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:34,970-Speed 6313.55 samples/sec Loss 5.8641 LearningRate 0.0005 Epoch: 13 Global Step: 280290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:38,215-Speed 6312.87 samples/sec Loss 5.8506 LearningRate 0.0005 Epoch: 13 Global Step: 280300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:41,450-Speed 6331.52 samples/sec Loss 5.9384 LearningRate 0.0005 Epoch: 13 Global Step: 280310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:44,693-Speed 6315.80 samples/sec Loss 6.0028 LearningRate 0.0005 Epoch: 13 Global Step: 280320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:47,944-Speed 6301.94 samples/sec Loss 5.8692 LearningRate 0.0005 Epoch: 13 Global Step: 280330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:51,189-Speed 6312.26 samples/sec Loss 5.8724 LearningRate 0.0005 Epoch: 13 Global Step: 280340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:54,436-Speed 6308.27 samples/sec Loss 5.8579 LearningRate 0.0005 Epoch: 13 Global Step: 280350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:13:57,680-Speed 6316.03 samples/sec Loss 5.8299 LearningRate 0.0005 Epoch: 13 Global Step: 280360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:00,930-Speed 6302.06 samples/sec Loss 5.8503 LearningRate 0.0005 Epoch: 13 Global Step: 280370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:04,180-Speed 6303.47 samples/sec Loss 5.8837 LearningRate 0.0005 Epoch: 13 Global Step: 280380 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:14:07,428-Speed 6307.28 samples/sec Loss 5.9145 LearningRate 0.0005 Epoch: 13 Global Step: 280390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:10,673-Speed 6312.18 samples/sec Loss 5.9872 LearningRate 0.0005 Epoch: 13 Global Step: 280400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:13,922-Speed 6305.14 samples/sec Loss 5.8826 LearningRate 0.0005 Epoch: 13 Global Step: 280410 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:14:17,153-Speed 6340.86 samples/sec Loss 5.9390 LearningRate 0.0005 Epoch: 13 Global Step: 280420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:20,403-Speed 6302.07 samples/sec Loss 5.8861 LearningRate 0.0005 Epoch: 13 Global Step: 280430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:23,652-Speed 6305.96 samples/sec Loss 5.9029 LearningRate 0.0005 Epoch: 13 Global Step: 280440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:26,899-Speed 6308.70 samples/sec Loss 5.8784 LearningRate 0.0005 Epoch: 13 Global Step: 280450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:30,147-Speed 6306.01 samples/sec Loss 5.9062 LearningRate 0.0005 Epoch: 13 Global Step: 280460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:33,399-Speed 6298.82 samples/sec Loss 5.8910 LearningRate 0.0005 Epoch: 13 Global Step: 280470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:36,647-Speed 6307.74 samples/sec Loss 5.9203 LearningRate 0.0005 Epoch: 13 Global Step: 280480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:39,893-Speed 6311.60 samples/sec Loss 5.9762 LearningRate 0.0005 Epoch: 13 Global Step: 280490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:43,139-Speed 6310.66 samples/sec Loss 5.8839 LearningRate 0.0005 Epoch: 13 Global Step: 280500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:14:46,386-Speed 6307.98 samples/sec Loss 5.9833 LearningRate 0.0005 Epoch: 13 Global Step: 280510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:49,633-Speed 6309.77 samples/sec Loss 5.8573 LearningRate 0.0005 Epoch: 13 Global Step: 280520 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:14:52,863-Speed 6342.03 samples/sec Loss 5.9369 LearningRate 0.0005 Epoch: 13 Global Step: 280530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:56,109-Speed 6310.59 samples/sec Loss 5.9272 LearningRate 0.0005 Epoch: 13 Global Step: 280540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:14:59,351-Speed 6317.53 samples/sec Loss 5.8759 LearningRate 0.0005 Epoch: 13 Global Step: 280550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:02,598-Speed 6308.38 samples/sec Loss 5.8309 LearningRate 0.0005 Epoch: 13 Global Step: 280560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:05,843-Speed 6314.26 samples/sec Loss 5.8969 LearningRate 0.0005 Epoch: 13 Global Step: 280570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:09,090-Speed 6307.15 samples/sec Loss 5.9048 LearningRate 0.0005 Epoch: 13 Global Step: 280580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:12,333-Speed 6316.56 samples/sec Loss 5.8479 LearningRate 0.0005 Epoch: 13 Global Step: 280590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:15,582-Speed 6305.10 samples/sec Loss 5.9352 LearningRate 0.0005 Epoch: 13 Global Step: 280600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:18,828-Speed 6310.27 samples/sec Loss 5.9376 LearningRate 0.0005 Epoch: 13 Global Step: 280610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:22,077-Speed 6305.88 samples/sec Loss 5.8617 LearningRate 0.0005 Epoch: 13 Global Step: 280620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:25,315-Speed 6325.48 
samples/sec Loss 5.8532 LearningRate 0.0005 Epoch: 13 Global Step: 280630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:28,669-Speed 6107.93 samples/sec Loss 5.8594 LearningRate 0.0005 Epoch: 13 Global Step: 280640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:15:31,945-Speed 6253.46 samples/sec Loss 5.8965 LearningRate 0.0005 Epoch: 13 Global Step: 280650 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:35,187-Speed 6319.03 samples/sec Loss 5.9013 LearningRate 0.0005 Epoch: 13 Global Step: 280660 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:38,436-Speed 6303.81 samples/sec Loss 5.9375 LearningRate 0.0005 Epoch: 13 Global Step: 280670 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:41,684-Speed 6308.27 samples/sec Loss 5.9359 LearningRate 0.0005 Epoch: 13 Global Step: 280680 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:44,934-Speed 6303.47 samples/sec Loss 5.9078 LearningRate 0.0005 Epoch: 13 Global Step: 280690 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:48,183-Speed 6304.99 samples/sec Loss 5.8932 LearningRate 0.0005 Epoch: 13 Global Step: 280700 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:51,427-Speed 6314.13 samples/sec Loss 5.8604 LearningRate 0.0005 Epoch: 13 Global Step: 280710 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:54,672-Speed 6313.12 samples/sec Loss 5.9344 LearningRate 0.0005 Epoch: 13 Global Step: 280720 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:15:57,919-Speed 6308.81 samples/sec Loss 5.8434 LearningRate 0.0005 Epoch: 13 Global Step: 280730 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:16:01,167-Speed 6306.43 samples/sec Loss 5.8380 LearningRate 0.0005 Epoch: 13 Global Step: 280740 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:16:04,414-Speed 6308.65 samples/sec Loss 5.8690 
LearningRate 0.0005 Epoch: 13 Global Step: 280750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:07,662-Speed 6307.06 samples/sec Loss 5.8693 LearningRate 0.0005 Epoch: 13 Global Step: 280760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:10,910-Speed 6305.95 samples/sec Loss 5.9332 LearningRate 0.0005 Epoch: 13 Global Step: 280770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:14,156-Speed 6309.98 samples/sec Loss 5.9142 LearningRate 0.0005 Epoch: 13 Global Step: 280780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:17,408-Speed 6299.40 samples/sec Loss 5.8991 LearningRate 0.0005 Epoch: 13 Global Step: 280790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:20,654-Speed 6311.87 samples/sec Loss 5.9619 LearningRate 0.0005 Epoch: 13 Global Step: 280800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:23,902-Speed 6306.73 samples/sec Loss 5.8599 LearningRate 0.0005 Epoch: 13 Global Step: 280810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:27,149-Speed 6308.49 samples/sec Loss 5.8577 LearningRate 0.0005 Epoch: 13 Global Step: 280820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:30,394-Speed 6312.66 samples/sec Loss 5.9567 LearningRate 0.0005 Epoch: 13 Global Step: 280830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:33,640-Speed 6310.92 samples/sec Loss 5.9217 LearningRate 0.0005 Epoch: 13 Global Step: 280840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:36,873-Speed 6335.66 samples/sec Loss 5.9117 LearningRate 0.0005 Epoch: 13 Global Step: 280850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:40,117-Speed 6314.42 samples/sec Loss 5.9113 LearningRate 0.0005 Epoch: 13 Global Step: 280860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:43,362-Speed 6312.21 samples/sec Loss 5.9139 LearningRate 0.0005 Epoch: 13 
Global Step: 280870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:46,610-Speed 6307.89 samples/sec Loss 5.8688 LearningRate 0.0005 Epoch: 13 Global Step: 280880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:49,861-Speed 6301.95 samples/sec Loss 5.9465 LearningRate 0.0005 Epoch: 13 Global Step: 280890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:16:53,095-Speed 6333.97 samples/sec Loss 5.8909 LearningRate 0.0005 Epoch: 13 Global Step: 280900 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:16:56,338-Speed 6317.46 samples/sec Loss 5.8823 LearningRate 0.0005 Epoch: 13 Global Step: 280910 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:16:59,589-Speed 6300.81 samples/sec Loss 5.8645 LearningRate 0.0005 Epoch: 13 Global Step: 280920 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:02,835-Speed 6309.15 samples/sec Loss 5.9623 LearningRate 0.0005 Epoch: 13 Global Step: 280930 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:06,079-Speed 6314.87 samples/sec Loss 5.8910 LearningRate 0.0005 Epoch: 13 Global Step: 280940 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:09,321-Speed 6319.39 samples/sec Loss 5.9162 LearningRate 0.0005 Epoch: 13 Global Step: 280950 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:12,567-Speed 6310.88 samples/sec Loss 5.9431 LearningRate 0.0005 Epoch: 13 Global Step: 280960 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:15,812-Speed 6312.09 samples/sec Loss 5.9030 LearningRate 0.0005 Epoch: 13 Global Step: 280970 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:19,064-Speed 6299.76 samples/sec Loss 5.8585 LearningRate 0.0005 Epoch: 13 Global Step: 280980 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:22,311-Speed 6308.57 samples/sec Loss 5.8721 LearningRate 0.0005 Epoch: 13 Global Step: 280990 Fp16 Grad 
Scale: 16384 Required: 50 hours Training: 2022-04-01 17:17:25,556-Speed 6313.11 samples/sec Loss 5.8742 LearningRate 0.0005 Epoch: 13 Global Step: 281000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:28,804-Speed 6305.56 samples/sec Loss 5.9801 LearningRate 0.0005 Epoch: 13 Global Step: 281010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:32,049-Speed 6312.49 samples/sec Loss 5.9047 LearningRate 0.0005 Epoch: 13 Global Step: 281020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:35,294-Speed 6313.82 samples/sec Loss 5.8691 LearningRate 0.0005 Epoch: 13 Global Step: 281030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:38,539-Speed 6311.15 samples/sec Loss 5.8720 LearningRate 0.0005 Epoch: 13 Global Step: 281040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:41,785-Speed 6310.85 samples/sec Loss 5.8876 LearningRate 0.0005 Epoch: 13 Global Step: 281050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:45,031-Speed 6310.42 samples/sec Loss 5.8768 LearningRate 0.0005 Epoch: 13 Global Step: 281060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:48,281-Speed 6304.02 samples/sec Loss 5.8756 LearningRate 0.0005 Epoch: 13 Global Step: 281070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:51,526-Speed 6312.61 samples/sec Loss 5.8627 LearningRate 0.0005 Epoch: 13 Global Step: 281080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:54,772-Speed 6311.47 samples/sec Loss 5.9198 LearningRate 0.0005 Epoch: 13 Global Step: 281090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:17:58,007-Speed 6332.67 samples/sec Loss 5.9483 LearningRate 0.0005 Epoch: 13 Global Step: 281100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:01,254-Speed 6308.70 samples/sec Loss 5.9162 LearningRate 0.0005 Epoch: 13 Global Step: 281110 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:18:04,498-Speed 6313.66 samples/sec Loss 5.8503 LearningRate 0.0005 Epoch: 13 Global Step: 281120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:07,744-Speed 6310.25 samples/sec Loss 5.9121 LearningRate 0.0005 Epoch: 13 Global Step: 281130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:10,993-Speed 6306.27 samples/sec Loss 5.8731 LearningRate 0.0005 Epoch: 13 Global Step: 281140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:14,244-Speed 6300.57 samples/sec Loss 5.8833 LearningRate 0.0005 Epoch: 13 Global Step: 281150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:17,487-Speed 6316.43 samples/sec Loss 5.9231 LearningRate 0.0005 Epoch: 13 Global Step: 281160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:20,735-Speed 6306.46 samples/sec Loss 5.8501 LearningRate 0.0005 Epoch: 13 Global Step: 281170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:23,984-Speed 6305.07 samples/sec Loss 5.9226 LearningRate 0.0005 Epoch: 13 Global Step: 281180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:27,234-Speed 6303.79 samples/sec Loss 5.8231 LearningRate 0.0005 Epoch: 13 Global Step: 281190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:30,465-Speed 6338.47 samples/sec Loss 5.8218 LearningRate 0.0005 Epoch: 13 Global Step: 281200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:33,708-Speed 6316.18 samples/sec Loss 5.8136 LearningRate 0.0005 Epoch: 13 Global Step: 281210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:36,957-Speed 6305.89 samples/sec Loss 5.8289 LearningRate 0.0005 Epoch: 13 Global Step: 281220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:40,201-Speed 6313.94 samples/sec Loss 5.8091 LearningRate 0.0005 Epoch: 13 Global Step: 281230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:18:43,446-Speed 6313.41 samples/sec Loss 5.9026 LearningRate 0.0005 Epoch: 13 Global Step: 281240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:46,693-Speed 6308.35 samples/sec Loss 5.9479 LearningRate 0.0005 Epoch: 13 Global Step: 281250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:49,938-Speed 6312.31 samples/sec Loss 5.8968 LearningRate 0.0005 Epoch: 13 Global Step: 281260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:53,184-Speed 6311.89 samples/sec Loss 5.8946 LearningRate 0.0005 Epoch: 13 Global Step: 281270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:56,428-Speed 6314.73 samples/sec Loss 5.9076 LearningRate 0.0005 Epoch: 13 Global Step: 281280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:18:59,674-Speed 6310.69 samples/sec Loss 5.8619 LearningRate 0.0005 Epoch: 13 Global Step: 281290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:02,930-Speed 6291.56 samples/sec Loss 5.8147 LearningRate 0.0005 Epoch: 13 Global Step: 281300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:06,175-Speed 6312.18 samples/sec Loss 5.9153 LearningRate 0.0005 Epoch: 13 Global Step: 281310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:09,423-Speed 6307.74 samples/sec Loss 5.8717 LearningRate 0.0005 Epoch: 13 Global Step: 281320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:12,666-Speed 6316.73 samples/sec Loss 5.8518 LearningRate 0.0005 Epoch: 13 Global Step: 281330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:15,910-Speed 6315.47 samples/sec Loss 5.9000 LearningRate 0.0005 Epoch: 13 Global Step: 281340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:19,155-Speed 6310.96 samples/sec Loss 5.7796 LearningRate 0.0005 Epoch: 13 Global Step: 281350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:22,403-Speed 6307.87 
samples/sec Loss 5.8283 LearningRate 0.0005 Epoch: 13 Global Step: 281360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:25,646-Speed 6316.66 samples/sec Loss 5.7880 LearningRate 0.0005 Epoch: 13 Global Step: 281370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:28,892-Speed 6310.97 samples/sec Loss 5.8055 LearningRate 0.0005 Epoch: 13 Global Step: 281380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:32,138-Speed 6310.61 samples/sec Loss 5.8470 LearningRate 0.0005 Epoch: 13 Global Step: 281390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:35,385-Speed 6308.40 samples/sec Loss 5.8321 LearningRate 0.0005 Epoch: 13 Global Step: 281400 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:19:38,611-Speed 6349.36 samples/sec Loss 5.9351 LearningRate 0.0005 Epoch: 13 Global Step: 281410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:41,855-Speed 6315.47 samples/sec Loss 5.8559 LearningRate 0.0005 Epoch: 13 Global Step: 281420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:45,102-Speed 6308.34 samples/sec Loss 5.8741 LearningRate 0.0005 Epoch: 13 Global Step: 281430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:48,349-Speed 6308.00 samples/sec Loss 5.9258 LearningRate 0.0005 Epoch: 13 Global Step: 281440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:51,594-Speed 6313.47 samples/sec Loss 5.8745 LearningRate 0.0005 Epoch: 13 Global Step: 281450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:54,837-Speed 6316.50 samples/sec Loss 5.9155 LearningRate 0.0005 Epoch: 13 Global Step: 281460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:19:58,083-Speed 6310.78 samples/sec Loss 5.9859 LearningRate 0.0005 Epoch: 13 Global Step: 281470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:01,334-Speed 6300.69 samples/sec Loss 5.8781 
LearningRate 0.0005 Epoch: 13 Global Step: 281480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:04,584-Speed 6304.14 samples/sec Loss 5.8945 LearningRate 0.0005 Epoch: 13 Global Step: 281490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:07,830-Speed 6311.06 samples/sec Loss 5.9074 LearningRate 0.0005 Epoch: 13 Global Step: 281500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:11,059-Speed 6343.78 samples/sec Loss 5.8377 LearningRate 0.0005 Epoch: 13 Global Step: 281510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:14,306-Speed 6309.12 samples/sec Loss 5.9747 LearningRate 0.0005 Epoch: 13 Global Step: 281520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:17,551-Speed 6311.80 samples/sec Loss 5.9239 LearningRate 0.0005 Epoch: 13 Global Step: 281530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:20,797-Speed 6312.38 samples/sec Loss 5.7534 LearningRate 0.0005 Epoch: 13 Global Step: 281540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:24,053-Speed 6290.20 samples/sec Loss 5.9026 LearningRate 0.0005 Epoch: 13 Global Step: 281550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:27,299-Speed 6311.95 samples/sec Loss 5.9455 LearningRate 0.0005 Epoch: 13 Global Step: 281560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:30,546-Speed 6308.53 samples/sec Loss 5.9063 LearningRate 0.0005 Epoch: 13 Global Step: 281570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:33,792-Speed 6310.47 samples/sec Loss 5.8195 LearningRate 0.0005 Epoch: 13 Global Step: 281580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:37,038-Speed 6310.26 samples/sec Loss 5.8919 LearningRate 0.0005 Epoch: 13 Global Step: 281590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:40,282-Speed 6314.15 samples/sec Loss 5.8197 LearningRate 0.0005 Epoch: 13 
Global Step: 281600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:43,515-Speed 6336.21 samples/sec Loss 5.9246 LearningRate 0.0005 Epoch: 13 Global Step: 281610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:46,764-Speed 6304.80 samples/sec Loss 5.8574 LearningRate 0.0005 Epoch: 13 Global Step: 281620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:50,009-Speed 6312.96 samples/sec Loss 5.8871 LearningRate 0.0005 Epoch: 13 Global Step: 281630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:53,259-Speed 6302.72 samples/sec Loss 5.9352 LearningRate 0.0005 Epoch: 13 Global Step: 281640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:56,511-Speed 6299.00 samples/sec Loss 5.7889 LearningRate 0.0005 Epoch: 13 Global Step: 281650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:20:59,761-Speed 6302.90 samples/sec Loss 5.9081 LearningRate 0.0005 Epoch: 13 Global Step: 281660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:03,010-Speed 6305.66 samples/sec Loss 5.8314 LearningRate 0.0005 Epoch: 13 Global Step: 281670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:06,256-Speed 6309.90 samples/sec Loss 5.9277 LearningRate 0.0005 Epoch: 13 Global Step: 281680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:09,506-Speed 6303.83 samples/sec Loss 5.8936 LearningRate 0.0005 Epoch: 13 Global Step: 281690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:12,755-Speed 6303.83 samples/sec Loss 5.9478 LearningRate 0.0005 Epoch: 13 Global Step: 281700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:16,002-Speed 6309.90 samples/sec Loss 5.8622 LearningRate 0.0005 Epoch: 13 Global Step: 281710 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:21:19,232-Speed 6340.74 samples/sec Loss 5.8739 LearningRate 0.0005 Epoch: 13 Global Step: 281720 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:22,480-Speed 6307.69 samples/sec Loss 5.8395 LearningRate 0.0005 Epoch: 13 Global Step: 281730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:25,724-Speed 6313.85 samples/sec Loss 5.9439 LearningRate 0.0005 Epoch: 13 Global Step: 281740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:28,976-Speed 6299.83 samples/sec Loss 5.9354 LearningRate 0.0005 Epoch: 13 Global Step: 281750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:32,222-Speed 6310.96 samples/sec Loss 5.8502 LearningRate 0.0005 Epoch: 13 Global Step: 281760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:35,472-Speed 6302.76 samples/sec Loss 5.8590 LearningRate 0.0005 Epoch: 13 Global Step: 281770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:38,720-Speed 6306.77 samples/sec Loss 5.8278 LearningRate 0.0005 Epoch: 13 Global Step: 281780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:41,968-Speed 6307.35 samples/sec Loss 5.9676 LearningRate 0.0005 Epoch: 13 Global Step: 281790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:45,223-Speed 6293.25 samples/sec Loss 5.7873 LearningRate 0.0005 Epoch: 13 Global Step: 281800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:48,468-Speed 6311.86 samples/sec Loss 5.8171 LearningRate 0.0005 Epoch: 13 Global Step: 281810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:51,710-Speed 6319.82 samples/sec Loss 5.8939 LearningRate 0.0005 Epoch: 13 Global Step: 281820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:54,956-Speed 6310.17 samples/sec Loss 5.9033 LearningRate 0.0005 Epoch: 13 Global Step: 281830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:21:58,204-Speed 6307.63 samples/sec Loss 5.8671 LearningRate 0.0005 Epoch: 13 Global Step: 281840 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:22:01,450-Speed 6309.33 samples/sec Loss 5.8204 LearningRate 0.0005 Epoch: 13 Global Step: 281850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:04,697-Speed 6308.95 samples/sec Loss 5.8857 LearningRate 0.0005 Epoch: 13 Global Step: 281860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:07,945-Speed 6306.02 samples/sec Loss 5.9745 LearningRate 0.0005 Epoch: 13 Global Step: 281870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:11,197-Speed 6299.72 samples/sec Loss 5.9183 LearningRate 0.0005 Epoch: 13 Global Step: 281880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:14,438-Speed 6320.47 samples/sec Loss 5.8797 LearningRate 0.0005 Epoch: 13 Global Step: 281890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:17,684-Speed 6310.03 samples/sec Loss 5.9181 LearningRate 0.0005 Epoch: 13 Global Step: 281900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:20,931-Speed 6309.50 samples/sec Loss 5.8073 LearningRate 0.0005 Epoch: 13 Global Step: 281910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:24,163-Speed 6338.16 samples/sec Loss 5.9054 LearningRate 0.0005 Epoch: 13 Global Step: 281920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:27,411-Speed 6307.29 samples/sec Loss 5.8985 LearningRate 0.0005 Epoch: 13 Global Step: 281930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:30,656-Speed 6313.63 samples/sec Loss 5.9397 LearningRate 0.0005 Epoch: 13 Global Step: 281940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:33,902-Speed 6309.93 samples/sec Loss 5.8606 LearningRate 0.0005 Epoch: 13 Global Step: 281950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:37,147-Speed 6311.91 samples/sec Loss 5.8808 LearningRate 0.0005 Epoch: 13 Global Step: 281960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:22:40,393-Speed 6312.17 samples/sec Loss 5.8995 LearningRate 0.0005 Epoch: 13 Global Step: 281970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:43,638-Speed 6312.74 samples/sec Loss 5.8997 LearningRate 0.0005 Epoch: 13 Global Step: 281980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:46,885-Speed 6308.04 samples/sec Loss 5.8300 LearningRate 0.0005 Epoch: 13 Global Step: 281990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:50,131-Speed 6311.00 samples/sec Loss 5.9313 LearningRate 0.0005 Epoch: 13 Global Step: 282000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:53,378-Speed 6308.22 samples/sec Loss 5.8676 LearningRate 0.0005 Epoch: 13 Global Step: 282010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:56,610-Speed 6337.80 samples/sec Loss 5.8422 LearningRate 0.0005 Epoch: 13 Global Step: 282020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:22:59,857-Speed 6309.73 samples/sec Loss 5.8075 LearningRate 0.0005 Epoch: 13 Global Step: 282030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:03,101-Speed 6314.18 samples/sec Loss 5.9004 LearningRate 0.0005 Epoch: 13 Global Step: 282040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:06,346-Speed 6312.32 samples/sec Loss 5.8537 LearningRate 0.0005 Epoch: 13 Global Step: 282050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:09,600-Speed 6296.71 samples/sec Loss 5.8347 LearningRate 0.0005 Epoch: 13 Global Step: 282060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:12,844-Speed 6313.71 samples/sec Loss 5.8919 LearningRate 0.0005 Epoch: 13 Global Step: 282070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:16,089-Speed 6312.99 samples/sec Loss 5.8443 LearningRate 0.0005 Epoch: 13 Global Step: 282080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:19,334-Speed 6312.21 
samples/sec Loss 5.8959 LearningRate 0.0005 Epoch: 13 Global Step: 282090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:22,575-Speed 6321.08 samples/sec Loss 5.9288 LearningRate 0.0005 Epoch: 13 Global Step: 282100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:25,818-Speed 6314.83 samples/sec Loss 5.8684 LearningRate 0.0005 Epoch: 13 Global Step: 282110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:29,063-Speed 6313.55 samples/sec Loss 5.8576 LearningRate 0.0005 Epoch: 13 Global Step: 282120 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:23:32,295-Speed 6338.44 samples/sec Loss 5.7657 LearningRate 0.0005 Epoch: 13 Global Step: 282130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:35,538-Speed 6317.04 samples/sec Loss 5.8730 LearningRate 0.0005 Epoch: 13 Global Step: 282140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:38,787-Speed 6304.16 samples/sec Loss 5.8436 LearningRate 0.0005 Epoch: 13 Global Step: 282150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:42,034-Speed 6310.62 samples/sec Loss 5.8099 LearningRate 0.0005 Epoch: 13 Global Step: 282160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:45,275-Speed 6318.73 samples/sec Loss 5.8842 LearningRate 0.0005 Epoch: 13 Global Step: 282170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:48,520-Speed 6314.21 samples/sec Loss 5.8189 LearningRate 0.0005 Epoch: 13 Global Step: 282180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:51,784-Speed 6275.65 samples/sec Loss 5.8919 LearningRate 0.0005 Epoch: 13 Global Step: 282190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:55,062-Speed 6248.80 samples/sec Loss 5.8579 LearningRate 0.0005 Epoch: 13 Global Step: 282200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:23:58,309-Speed 6309.02 samples/sec Loss 5.8598 
LearningRate 0.0005 Epoch: 13 Global Step: 282210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:01,554-Speed 6312.49 samples/sec Loss 5.8695 LearningRate 0.0005 Epoch: 13 Global Step: 282220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:04,784-Speed 6341.26 samples/sec Loss 5.8783 LearningRate 0.0005 Epoch: 13 Global Step: 282230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:08,034-Speed 6303.51 samples/sec Loss 5.8222 LearningRate 0.0005 Epoch: 13 Global Step: 282240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:11,278-Speed 6314.06 samples/sec Loss 5.9269 LearningRate 0.0005 Epoch: 13 Global Step: 282250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:14,528-Speed 6304.45 samples/sec Loss 5.9667 LearningRate 0.0005 Epoch: 13 Global Step: 282260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:17,820-Speed 6221.53 samples/sec Loss 5.9089 LearningRate 0.0005 Epoch: 13 Global Step: 282270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:21,070-Speed 6302.65 samples/sec Loss 5.9477 LearningRate 0.0005 Epoch: 13 Global Step: 282280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:24,318-Speed 6306.41 samples/sec Loss 5.8967 LearningRate 0.0005 Epoch: 13 Global Step: 282290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:27,565-Speed 6310.34 samples/sec Loss 6.0091 LearningRate 0.0005 Epoch: 13 Global Step: 282300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:30,811-Speed 6309.45 samples/sec Loss 5.8784 LearningRate 0.0005 Epoch: 13 Global Step: 282310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:34,056-Speed 6313.37 samples/sec Loss 5.9233 LearningRate 0.0005 Epoch: 13 Global Step: 282320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:37,287-Speed 6339.25 samples/sec Loss 5.8917 LearningRate 0.0005 Epoch: 13 
Global Step: 282330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:40,534-Speed 6310.21 samples/sec Loss 5.8472 LearningRate 0.0005 Epoch: 13 Global Step: 282340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:43,777-Speed 6315.68 samples/sec Loss 5.8693 LearningRate 0.0005 Epoch: 13 Global Step: 282350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:47,020-Speed 6317.23 samples/sec Loss 5.9057 LearningRate 0.0005 Epoch: 13 Global Step: 282360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:50,267-Speed 6310.13 samples/sec Loss 5.9603 LearningRate 0.0005 Epoch: 13 Global Step: 282370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:53,512-Speed 6310.83 samples/sec Loss 5.8910 LearningRate 0.0005 Epoch: 13 Global Step: 282380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:24:56,758-Speed 6311.46 samples/sec Loss 5.8591 LearningRate 0.0005 Epoch: 13 Global Step: 282390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:00,003-Speed 6312.81 samples/sec Loss 5.8802 LearningRate 0.0005 Epoch: 13 Global Step: 282400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:03,249-Speed 6311.30 samples/sec Loss 5.8981 LearningRate 0.0005 Epoch: 13 Global Step: 282410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:06,498-Speed 6303.77 samples/sec Loss 5.8900 LearningRate 0.0005 Epoch: 13 Global Step: 282420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:09,743-Speed 6312.63 samples/sec Loss 5.8594 LearningRate 0.0005 Epoch: 13 Global Step: 282430 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:25:12,975-Speed 6339.28 samples/sec Loss 5.9068 LearningRate 0.0005 Epoch: 13 Global Step: 282440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:16,223-Speed 6305.93 samples/sec Loss 5.8878 LearningRate 0.0005 Epoch: 13 Global Step: 282450 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:19,468-Speed 6312.92 samples/sec Loss 5.9440 LearningRate 0.0005 Epoch: 13 Global Step: 282460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:22,714-Speed 6309.84 samples/sec Loss 5.8615 LearningRate 0.0005 Epoch: 13 Global Step: 282470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:25,963-Speed 6305.22 samples/sec Loss 5.8809 LearningRate 0.0005 Epoch: 13 Global Step: 282480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:29,211-Speed 6306.99 samples/sec Loss 5.9301 LearningRate 0.0005 Epoch: 13 Global Step: 282490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:32,463-Speed 6299.34 samples/sec Loss 5.8822 LearningRate 0.0005 Epoch: 13 Global Step: 282500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:35,716-Speed 6296.95 samples/sec Loss 5.8282 LearningRate 0.0005 Epoch: 13 Global Step: 282510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:38,962-Speed 6310.69 samples/sec Loss 5.8153 LearningRate 0.0005 Epoch: 13 Global Step: 282520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:42,209-Speed 6307.63 samples/sec Loss 5.8933 LearningRate 0.0005 Epoch: 13 Global Step: 282530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:45,443-Speed 6334.05 samples/sec Loss 5.8140 LearningRate 0.0005 Epoch: 13 Global Step: 282540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:48,693-Speed 6303.65 samples/sec Loss 5.8544 LearningRate 0.0005 Epoch: 13 Global Step: 282550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:51,942-Speed 6305.20 samples/sec Loss 5.9587 LearningRate 0.0005 Epoch: 13 Global Step: 282560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:25:55,189-Speed 6308.86 samples/sec Loss 5.9079 LearningRate 0.0005 Epoch: 13 Global Step: 282570 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:25:58,436-Speed 6309.87 samples/sec Loss 5.9214 LearningRate 0.0005 Epoch: 13 Global Step: 282580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:01,681-Speed 6313.88 samples/sec Loss 5.9427 LearningRate 0.0005 Epoch: 13 Global Step: 282590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:04,932-Speed 6300.17 samples/sec Loss 5.9132 LearningRate 0.0005 Epoch: 13 Global Step: 282600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:08,180-Speed 6307.13 samples/sec Loss 5.9239 LearningRate 0.0005 Epoch: 13 Global Step: 282610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:11,427-Speed 6308.10 samples/sec Loss 5.8992 LearningRate 0.0005 Epoch: 13 Global Step: 282620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:14,674-Speed 6310.06 samples/sec Loss 5.8169 LearningRate 0.0005 Epoch: 13 Global Step: 282630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:17,910-Speed 6329.28 samples/sec Loss 5.9142 LearningRate 0.0005 Epoch: 13 Global Step: 282640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:21,157-Speed 6308.66 samples/sec Loss 5.8859 LearningRate 0.0005 Epoch: 13 Global Step: 282650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:24,457-Speed 6207.16 samples/sec Loss 5.8302 LearningRate 0.0005 Epoch: 13 Global Step: 282660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:27,728-Speed 6261.67 samples/sec Loss 5.8318 LearningRate 0.0005 Epoch: 13 Global Step: 282670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:30,973-Speed 6313.49 samples/sec Loss 5.7972 LearningRate 0.0005 Epoch: 13 Global Step: 282680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:34,228-Speed 6293.79 samples/sec Loss 5.8489 LearningRate 0.0005 Epoch: 13 Global Step: 282690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:26:37,479-Speed 6301.71 samples/sec Loss 5.8598 LearningRate 0.0005 Epoch: 13 Global Step: 282700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:40,723-Speed 6314.52 samples/sec Loss 5.8786 LearningRate 0.0005 Epoch: 13 Global Step: 282710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:43,977-Speed 6293.60 samples/sec Loss 5.8156 LearningRate 0.0005 Epoch: 13 Global Step: 282720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:47,226-Speed 6305.58 samples/sec Loss 5.8723 LearningRate 0.0005 Epoch: 13 Global Step: 282730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:50,471-Speed 6311.91 samples/sec Loss 5.8674 LearningRate 0.0005 Epoch: 13 Global Step: 282740 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:26:53,703-Speed 6339.65 samples/sec Loss 5.8977 LearningRate 0.0005 Epoch: 13 Global Step: 282750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:26:56,948-Speed 6312.71 samples/sec Loss 5.8693 LearningRate 0.0005 Epoch: 13 Global Step: 282760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:00,198-Speed 6303.07 samples/sec Loss 5.8884 LearningRate 0.0005 Epoch: 13 Global Step: 282770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:03,502-Speed 6200.30 samples/sec Loss 5.8635 LearningRate 0.0005 Epoch: 13 Global Step: 282780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:06,778-Speed 6251.92 samples/sec Loss 5.9536 LearningRate 0.0005 Epoch: 13 Global Step: 282790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:10,024-Speed 6310.99 samples/sec Loss 5.9283 LearningRate 0.0005 Epoch: 13 Global Step: 282800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:13,273-Speed 6304.71 samples/sec Loss 5.8053 LearningRate 0.0005 Epoch: 13 Global Step: 282810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:16,515-Speed 6318.54 
samples/sec Loss 5.9200 LearningRate 0.0005 Epoch: 13 Global Step: 282820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:19,767-Speed 6300.41 samples/sec Loss 5.8734 LearningRate 0.0005 Epoch: 13 Global Step: 282830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:23,013-Speed 6310.66 samples/sec Loss 5.8642 LearningRate 0.0005 Epoch: 13 Global Step: 282840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:26,262-Speed 6303.82 samples/sec Loss 5.8652 LearningRate 0.0005 Epoch: 13 Global Step: 282850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:29,506-Speed 6315.10 samples/sec Loss 5.8913 LearningRate 0.0005 Epoch: 13 Global Step: 282860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:32,752-Speed 6310.53 samples/sec Loss 5.8685 LearningRate 0.0005 Epoch: 13 Global Step: 282870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:35,993-Speed 6320.67 samples/sec Loss 5.8967 LearningRate 0.0005 Epoch: 13 Global Step: 282880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:39,238-Speed 6311.88 samples/sec Loss 5.8265 LearningRate 0.0005 Epoch: 13 Global Step: 282890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:42,483-Speed 6312.68 samples/sec Loss 5.8615 LearningRate 0.0005 Epoch: 13 Global Step: 282900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:45,729-Speed 6310.13 samples/sec Loss 5.9103 LearningRate 0.0005 Epoch: 13 Global Step: 282910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:48,979-Speed 6304.47 samples/sec Loss 5.8192 LearningRate 0.0005 Epoch: 13 Global Step: 282920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:52,224-Speed 6312.21 samples/sec Loss 5.9037 LearningRate 0.0005 Epoch: 13 Global Step: 282930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:55,472-Speed 6306.06 samples/sec Loss 5.8710 
LearningRate 0.0005 Epoch: 13 Global Step: 282940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:27:58,705-Speed 6336.63 samples/sec Loss 5.8916 LearningRate 0.0005 Epoch: 13 Global Step: 282950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:01,948-Speed 6317.72 samples/sec Loss 5.9031 LearningRate 0.0005 Epoch: 13 Global Step: 282960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:05,189-Speed 6319.86 samples/sec Loss 5.8871 LearningRate 0.0005 Epoch: 13 Global Step: 282970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:08,434-Speed 6313.46 samples/sec Loss 5.9213 LearningRate 0.0005 Epoch: 13 Global Step: 282980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:11,676-Speed 6318.62 samples/sec Loss 5.9633 LearningRate 0.0005 Epoch: 13 Global Step: 282990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:14,918-Speed 6318.08 samples/sec Loss 5.9636 LearningRate 0.0005 Epoch: 13 Global Step: 283000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:18,170-Speed 6299.34 samples/sec Loss 5.8858 LearningRate 0.0005 Epoch: 13 Global Step: 283010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:21,415-Speed 6311.82 samples/sec Loss 5.9148 LearningRate 0.0005 Epoch: 13 Global Step: 283020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:24,661-Speed 6312.11 samples/sec Loss 5.8267 LearningRate 0.0005 Epoch: 13 Global Step: 283030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:27,909-Speed 6305.77 samples/sec Loss 5.8532 LearningRate 0.0005 Epoch: 13 Global Step: 283040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:31,147-Speed 6327.36 samples/sec Loss 5.9250 LearningRate 0.0005 Epoch: 13 Global Step: 283050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:34,394-Speed 6308.27 samples/sec Loss 5.8297 LearningRate 0.0005 Epoch: 13 
Global Step: 283060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:37,638-Speed 6315.51 samples/sec Loss 5.9116 LearningRate 0.0005 Epoch: 13 Global Step: 283070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:40,884-Speed 6310.14 samples/sec Loss 5.8921 LearningRate 0.0005 Epoch: 13 Global Step: 283080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:44,136-Speed 6299.99 samples/sec Loss 5.9226 LearningRate 0.0005 Epoch: 13 Global Step: 283090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:47,385-Speed 6304.45 samples/sec Loss 5.9070 LearningRate 0.0005 Epoch: 13 Global Step: 283100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:50,629-Speed 6315.12 samples/sec Loss 5.8175 LearningRate 0.0005 Epoch: 13 Global Step: 283110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:53,880-Speed 6300.05 samples/sec Loss 5.7876 LearningRate 0.0005 Epoch: 13 Global Step: 283120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:28:57,125-Speed 6312.30 samples/sec Loss 5.8714 LearningRate 0.0005 Epoch: 13 Global Step: 283130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:00,374-Speed 6306.26 samples/sec Loss 5.9054 LearningRate 0.0005 Epoch: 13 Global Step: 283140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:03,606-Speed 6338.36 samples/sec Loss 5.8842 LearningRate 0.0005 Epoch: 13 Global Step: 283150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:06,857-Speed 6300.79 samples/sec Loss 5.9112 LearningRate 0.0005 Epoch: 13 Global Step: 283160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:10,093-Speed 6331.32 samples/sec Loss 5.8960 LearningRate 0.0005 Epoch: 13 Global Step: 283170 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:13,340-Speed 6307.18 samples/sec Loss 5.9196 LearningRate 0.0005 Epoch: 13 Global Step: 283180 Fp16 Grad 
Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:16,590-Speed 6302.71 samples/sec Loss 5.7849 LearningRate 0.0005 Epoch: 13 Global Step: 283190 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:19,832-Speed 6318.28 samples/sec Loss 5.8648 LearningRate 0.0005 Epoch: 13 Global Step: 283200 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:23,078-Speed 6312.45 samples/sec Loss 5.9419 LearningRate 0.0005 Epoch: 13 Global Step: 283210 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:26,323-Speed 6311.25 samples/sec Loss 5.9064 LearningRate 0.0005 Epoch: 13 Global Step: 283220 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:29,571-Speed 6308.18 samples/sec Loss 5.8548 LearningRate 0.0005 Epoch: 13 Global Step: 283230 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:32,820-Speed 6303.45 samples/sec Loss 5.9313 LearningRate 0.0005 Epoch: 13 Global Step: 283240 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:36,065-Speed 6312.88 samples/sec Loss 5.8126 LearningRate 0.0005 Epoch: 13 Global Step: 283250 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:39,309-Speed 6315.92 samples/sec Loss 5.9532 LearningRate 0.0005 Epoch: 13 Global Step: 283260 Fp16 Grad Scale: 16384 Required: 50 hours Training: 2022-04-01 17:29:42,556-Speed 6307.09 samples/sec Loss 5.8551 LearningRate 0.0005 Epoch: 13 Global Step: 283270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:45,800-Speed 6314.61 samples/sec Loss 5.9286 LearningRate 0.0005 Epoch: 13 Global Step: 283280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:49,046-Speed 6311.72 samples/sec Loss 5.8544 LearningRate 0.0005 Epoch: 13 Global Step: 283290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:52,293-Speed 6308.70 samples/sec Loss 5.9059 LearningRate 0.0005 Epoch: 13 Global Step: 283300 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:29:55,541-Speed 6307.38 samples/sec Loss 5.8859 LearningRate 0.0005 Epoch: 13 Global Step: 283310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:29:58,786-Speed 6310.90 samples/sec Loss 5.9257 LearningRate 0.0005 Epoch: 13 Global Step: 283320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:02,036-Speed 6304.39 samples/sec Loss 5.7605 LearningRate 0.0005 Epoch: 13 Global Step: 283330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:05,365-Speed 6154.55 samples/sec Loss 5.8506 LearningRate 0.0005 Epoch: 13 Global Step: 283340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:08,613-Speed 6305.48 samples/sec Loss 5.8319 LearningRate 0.0005 Epoch: 13 Global Step: 283350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:11,863-Speed 6304.14 samples/sec Loss 5.8434 LearningRate 0.0005 Epoch: 13 Global Step: 283360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:15,128-Speed 6274.82 samples/sec Loss 5.9100 LearningRate 0.0005 Epoch: 13 Global Step: 283370 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:30:18,394-Speed 6271.86 samples/sec Loss 5.7902 LearningRate 0.0005 Epoch: 13 Global Step: 283380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:21,639-Speed 6313.24 samples/sec Loss 5.8309 LearningRate 0.0005 Epoch: 13 Global Step: 283390 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:24,887-Speed 6305.97 samples/sec Loss 5.9335 LearningRate 0.0005 Epoch: 13 Global Step: 283400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:28,137-Speed 6302.62 samples/sec Loss 5.8881 LearningRate 0.0005 Epoch: 13 Global Step: 283410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:31,386-Speed 6306.56 samples/sec Loss 5.8329 LearningRate 0.0005 Epoch: 13 Global Step: 283420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:30:34,631-Speed 6312.68 samples/sec Loss 5.8355 LearningRate 0.0005 Epoch: 13 Global Step: 283430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:37,877-Speed 6310.40 samples/sec Loss 5.9249 LearningRate 0.0005 Epoch: 13 Global Step: 283440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:41,125-Speed 6305.90 samples/sec Loss 5.9291 LearningRate 0.0005 Epoch: 13 Global Step: 283450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:44,375-Speed 6303.53 samples/sec Loss 5.8398 LearningRate 0.0005 Epoch: 13 Global Step: 283460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:47,626-Speed 6301.17 samples/sec Loss 5.8322 LearningRate 0.0005 Epoch: 13 Global Step: 283470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:50,865-Speed 6324.63 samples/sec Loss 5.8692 LearningRate 0.0005 Epoch: 13 Global Step: 283480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:54,108-Speed 6315.13 samples/sec Loss 5.8349 LearningRate 0.0005 Epoch: 13 Global Step: 283490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:30:57,352-Speed 6315.91 samples/sec Loss 5.9113 LearningRate 0.0005 Epoch: 13 Global Step: 283500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:00,604-Speed 6299.35 samples/sec Loss 5.9073 LearningRate 0.0005 Epoch: 13 Global Step: 283510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:03,853-Speed 6304.64 samples/sec Loss 5.8588 LearningRate 0.0005 Epoch: 13 Global Step: 283520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:07,112-Speed 6284.93 samples/sec Loss 5.8390 LearningRate 0.0005 Epoch: 13 Global Step: 283530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:10,362-Speed 6302.41 samples/sec Loss 5.8957 LearningRate 0.0005 Epoch: 13 Global Step: 283540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:13,612-Speed 6302.71 
samples/sec Loss 5.8960 LearningRate 0.0005 Epoch: 13 Global Step: 283550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:16,859-Speed 6310.02 samples/sec Loss 5.9014 LearningRate 0.0005 Epoch: 13 Global Step: 283560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:20,105-Speed 6311.40 samples/sec Loss 5.8122 LearningRate 0.0005 Epoch: 13 Global Step: 283570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:23,337-Speed 6337.64 samples/sec Loss 5.8298 LearningRate 0.0005 Epoch: 13 Global Step: 283580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:26,585-Speed 6306.69 samples/sec Loss 5.9823 LearningRate 0.0005 Epoch: 13 Global Step: 283590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:29,858-Speed 6259.03 samples/sec Loss 5.8940 LearningRate 0.0005 Epoch: 13 Global Step: 283600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:33,106-Speed 6307.62 samples/sec Loss 5.8003 LearningRate 0.0005 Epoch: 13 Global Step: 283610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:36,350-Speed 6312.66 samples/sec Loss 5.9293 LearningRate 0.0005 Epoch: 13 Global Step: 283620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:39,602-Speed 6299.67 samples/sec Loss 5.8563 LearningRate 0.0005 Epoch: 13 Global Step: 283630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:42,847-Speed 6313.12 samples/sec Loss 5.8642 LearningRate 0.0005 Epoch: 13 Global Step: 283640 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:46,098-Speed 6300.50 samples/sec Loss 5.8548 LearningRate 0.0005 Epoch: 13 Global Step: 283650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:49,344-Speed 6310.60 samples/sec Loss 5.8825 LearningRate 0.0005 Epoch: 13 Global Step: 283660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:52,589-Speed 6312.47 samples/sec Loss 5.8384 
LearningRate 0.0005 Epoch: 13 Global Step: 283670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:55,842-Speed 6297.42 samples/sec Loss 5.7869 LearningRate 0.0005 Epoch: 13 Global Step: 283680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:31:59,085-Speed 6315.84 samples/sec Loss 5.9265 LearningRate 0.0005 Epoch: 13 Global Step: 283690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:02,334-Speed 6306.42 samples/sec Loss 5.8471 LearningRate 0.0005 Epoch: 13 Global Step: 283700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:05,583-Speed 6303.71 samples/sec Loss 5.9092 LearningRate 0.0005 Epoch: 13 Global Step: 283710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:08,833-Speed 6303.05 samples/sec Loss 5.9508 LearningRate 0.0005 Epoch: 13 Global Step: 283720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:12,078-Speed 6313.41 samples/sec Loss 5.8766 LearningRate 0.0005 Epoch: 13 Global Step: 283730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:15,323-Speed 6312.21 samples/sec Loss 5.8918 LearningRate 0.0005 Epoch: 13 Global Step: 283740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:18,572-Speed 6304.67 samples/sec Loss 5.8763 LearningRate 0.0005 Epoch: 13 Global Step: 283750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:21,818-Speed 6310.77 samples/sec Loss 5.8994 LearningRate 0.0005 Epoch: 13 Global Step: 283760 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:25,066-Speed 6308.14 samples/sec Loss 5.8206 LearningRate 0.0005 Epoch: 13 Global Step: 283770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:28,297-Speed 6338.43 samples/sec Loss 5.9046 LearningRate 0.0005 Epoch: 13 Global Step: 283780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:31,539-Speed 6320.76 samples/sec Loss 5.8713 LearningRate 0.0005 Epoch: 13 
Global Step: 283790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:34,788-Speed 6304.40 samples/sec Loss 5.7519 LearningRate 0.0005 Epoch: 13 Global Step: 283800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:38,038-Speed 6302.65 samples/sec Loss 5.8298 LearningRate 0.0005 Epoch: 13 Global Step: 283810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:41,286-Speed 6307.02 samples/sec Loss 5.9028 LearningRate 0.0005 Epoch: 13 Global Step: 283820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:44,532-Speed 6311.60 samples/sec Loss 5.8794 LearningRate 0.0005 Epoch: 13 Global Step: 283830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:47,774-Speed 6318.09 samples/sec Loss 5.8760 LearningRate 0.0005 Epoch: 13 Global Step: 283840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:51,017-Speed 6316.70 samples/sec Loss 5.9179 LearningRate 0.0005 Epoch: 13 Global Step: 283850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:54,262-Speed 6312.29 samples/sec Loss 5.9267 LearningRate 0.0005 Epoch: 13 Global Step: 283860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:32:57,507-Speed 6313.04 samples/sec Loss 5.8513 LearningRate 0.0005 Epoch: 13 Global Step: 283870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:00,743-Speed 6329.75 samples/sec Loss 5.8542 LearningRate 0.0005 Epoch: 13 Global Step: 283880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:04,004-Speed 6281.82 samples/sec Loss 5.8093 LearningRate 0.0005 Epoch: 13 Global Step: 283890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:07,252-Speed 6306.05 samples/sec Loss 5.9179 LearningRate 0.0005 Epoch: 13 Global Step: 283900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:10,507-Speed 6293.54 samples/sec Loss 5.9083 LearningRate 0.0005 Epoch: 13 Global Step: 283910 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:13,753-Speed 6311.93 samples/sec Loss 5.8579 LearningRate 0.0005 Epoch: 13 Global Step: 283920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:16,998-Speed 6310.80 samples/sec Loss 5.8852 LearningRate 0.0005 Epoch: 13 Global Step: 283930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:20,245-Speed 6310.09 samples/sec Loss 5.9128 LearningRate 0.0005 Epoch: 13 Global Step: 283940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:23,491-Speed 6309.42 samples/sec Loss 5.8741 LearningRate 0.0005 Epoch: 13 Global Step: 283950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:26,738-Speed 6309.30 samples/sec Loss 5.8484 LearningRate 0.0005 Epoch: 13 Global Step: 283960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:29,987-Speed 6304.49 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 13 Global Step: 283970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:33,233-Speed 6310.76 samples/sec Loss 5.9537 LearningRate 0.0005 Epoch: 13 Global Step: 283980 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:33:36,471-Speed 6326.81 samples/sec Loss 5.8503 LearningRate 0.0005 Epoch: 13 Global Step: 283990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:39,717-Speed 6310.87 samples/sec Loss 5.8872 LearningRate 0.0005 Epoch: 13 Global Step: 284000 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:42,964-Speed 6309.21 samples/sec Loss 5.8452 LearningRate 0.0005 Epoch: 13 Global Step: 284010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:46,211-Speed 6308.47 samples/sec Loss 5.9303 LearningRate 0.0005 Epoch: 13 Global Step: 284020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:49,454-Speed 6317.02 samples/sec Loss 5.8032 LearningRate 0.0005 Epoch: 13 Global Step: 284030 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:33:52,704-Speed 6304.58 samples/sec Loss 5.8390 LearningRate 0.0005 Epoch: 13 Global Step: 284040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:55,949-Speed 6310.76 samples/sec Loss 5.8187 LearningRate 0.0005 Epoch: 13 Global Step: 284050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:33:59,197-Speed 6307.07 samples/sec Loss 5.8654 LearningRate 0.0005 Epoch: 13 Global Step: 284060 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:02,447-Speed 6303.09 samples/sec Loss 5.9176 LearningRate 0.0005 Epoch: 13 Global Step: 284070 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:05,695-Speed 6306.77 samples/sec Loss 5.9211 LearningRate 0.0005 Epoch: 13 Global Step: 284080 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:08,940-Speed 6313.97 samples/sec Loss 5.8067 LearningRate 0.0005 Epoch: 13 Global Step: 284090 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:12,200-Speed 6283.23 samples/sec Loss 5.8790 LearningRate 0.0005 Epoch: 13 Global Step: 284100 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:15,454-Speed 6294.20 samples/sec Loss 5.8203 LearningRate 0.0005 Epoch: 13 Global Step: 284110 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:18,706-Speed 6300.86 samples/sec Loss 5.8844 LearningRate 0.0005 Epoch: 13 Global Step: 284120 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:21,950-Speed 6313.76 samples/sec Loss 5.8771 LearningRate 0.0005 Epoch: 13 Global Step: 284130 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:25,192-Speed 6317.70 samples/sec Loss 5.8392 LearningRate 0.0005 Epoch: 13 Global Step: 284140 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:28,440-Speed 6306.70 samples/sec Loss 5.8283 LearningRate 0.0005 Epoch: 13 Global Step: 284150 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:34:31,685-Speed 6312.97 samples/sec Loss 5.8600 LearningRate 0.0005 Epoch: 13 Global Step: 284160 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:34,928-Speed 6317.03 samples/sec Loss 5.7936 LearningRate 0.0005 Epoch: 13 Global Step: 284170 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:38,175-Speed 6309.15 samples/sec Loss 5.8434 LearningRate 0.0005 Epoch: 13 Global Step: 284180 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:41,409-Speed 6334.12 samples/sec Loss 5.8099 LearningRate 0.0005 Epoch: 13 Global Step: 284190 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:44,658-Speed 6303.32 samples/sec Loss 5.8032 LearningRate 0.0005 Epoch: 13 Global Step: 284200 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:47,925-Speed 6275.54 samples/sec Loss 5.8552 LearningRate 0.0005 Epoch: 13 Global Step: 284210 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:51,253-Speed 6155.53 samples/sec Loss 5.9040 LearningRate 0.0005 Epoch: 13 Global Step: 284220 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:54,532-Speed 6246.73 samples/sec Loss 5.8683 LearningRate 0.0005 Epoch: 13 Global Step: 284230 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:34:57,777-Speed 6313.14 samples/sec Loss 5.9051 LearningRate 0.0005 Epoch: 13 Global Step: 284240 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:01,025-Speed 6306.44 samples/sec Loss 5.8596 LearningRate 0.0005 Epoch: 13 Global Step: 284250 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:04,275-Speed 6302.40 samples/sec Loss 5.9277 LearningRate 0.0005 Epoch: 13 Global Step: 284260 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:07,522-Speed 6309.28 samples/sec Loss 5.8729 LearningRate 0.0005 Epoch: 13 Global Step: 284270 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:10,766-Speed 6314.49 
samples/sec Loss 5.9075 LearningRate 0.0005 Epoch: 13 Global Step: 284280 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:13,999-Speed 6337.04 samples/sec Loss 5.8019 LearningRate 0.0005 Epoch: 13 Global Step: 284290 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:17,243-Speed 6314.68 samples/sec Loss 5.7640 LearningRate 0.0005 Epoch: 13 Global Step: 284300 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:20,492-Speed 6304.01 samples/sec Loss 5.8078 LearningRate 0.0005 Epoch: 13 Global Step: 284310 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:23,736-Speed 6315.37 samples/sec Loss 5.8486 LearningRate 0.0005 Epoch: 13 Global Step: 284320 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:26,983-Speed 6307.41 samples/sec Loss 5.7998 LearningRate 0.0005 Epoch: 13 Global Step: 284330 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:30,229-Speed 6311.93 samples/sec Loss 5.8498 LearningRate 0.0005 Epoch: 13 Global Step: 284340 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:33,471-Speed 6317.57 samples/sec Loss 5.9464 LearningRate 0.0005 Epoch: 13 Global Step: 284350 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:36,715-Speed 6315.09 samples/sec Loss 5.8877 LearningRate 0.0005 Epoch: 13 Global Step: 284360 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:39,956-Speed 6319.89 samples/sec Loss 5.8372 LearningRate 0.0005 Epoch: 13 Global Step: 284370 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:43,204-Speed 6307.44 samples/sec Loss 5.9019 LearningRate 0.0005 Epoch: 13 Global Step: 284380 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:46,449-Speed 6312.02 samples/sec Loss 5.8089 LearningRate 0.0005 Epoch: 13 Global Step: 284390 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:35:49,681-Speed 6338.55 samples/sec Loss 5.9120 
LearningRate 0.0005 Epoch: 13 Global Step: 284400 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:52,923-Speed 6318.97 samples/sec Loss 5.8359 LearningRate 0.0005 Epoch: 13 Global Step: 284410 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:56,166-Speed 6315.57 samples/sec Loss 5.8283 LearningRate 0.0005 Epoch: 13 Global Step: 284420 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:35:59,417-Speed 6300.93 samples/sec Loss 5.9071 LearningRate 0.0005 Epoch: 13 Global Step: 284430 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:02,659-Speed 6319.62 samples/sec Loss 5.8348 LearningRate 0.0005 Epoch: 13 Global Step: 284440 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:05,905-Speed 6310.25 samples/sec Loss 5.9396 LearningRate 0.0005 Epoch: 13 Global Step: 284450 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:09,148-Speed 6317.93 samples/sec Loss 5.8302 LearningRate 0.0005 Epoch: 13 Global Step: 284460 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:12,394-Speed 6311.28 samples/sec Loss 5.9612 LearningRate 0.0005 Epoch: 13 Global Step: 284470 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:15,641-Speed 6306.85 samples/sec Loss 5.8319 LearningRate 0.0005 Epoch: 13 Global Step: 284480 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:18,887-Speed 6311.14 samples/sec Loss 5.9013 LearningRate 0.0005 Epoch: 13 Global Step: 284490 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:22,119-Speed 6338.49 samples/sec Loss 5.9059 LearningRate 0.0005 Epoch: 13 Global Step: 284500 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:25,367-Speed 6306.38 samples/sec Loss 5.9131 LearningRate 0.0005 Epoch: 13 Global Step: 284510 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:28,615-Speed 6307.77 samples/sec Loss 5.8691 LearningRate 0.0005 Epoch: 13 
Global Step: 284520 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:31,865-Speed 6303.11 samples/sec Loss 5.9240 LearningRate 0.0005 Epoch: 13 Global Step: 284530 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:35,108-Speed 6316.61 samples/sec Loss 5.7880 LearningRate 0.0005 Epoch: 13 Global Step: 284540 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:38,386-Speed 6247.62 samples/sec Loss 5.8810 LearningRate 0.0005 Epoch: 13 Global Step: 284550 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:41,633-Speed 6310.26 samples/sec Loss 5.8949 LearningRate 0.0005 Epoch: 13 Global Step: 284560 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:44,887-Speed 6294.35 samples/sec Loss 5.8955 LearningRate 0.0005 Epoch: 13 Global Step: 284570 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:48,137-Speed 6303.66 samples/sec Loss 5.9153 LearningRate 0.0005 Epoch: 13 Global Step: 284580 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:51,383-Speed 6309.77 samples/sec Loss 5.8768 LearningRate 0.0005 Epoch: 13 Global Step: 284590 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:54,618-Speed 6333.46 samples/sec Loss 5.9096 LearningRate 0.0005 Epoch: 13 Global Step: 284600 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:36:57,861-Speed 6316.45 samples/sec Loss 5.8626 LearningRate 0.0005 Epoch: 13 Global Step: 284610 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:01,110-Speed 6303.97 samples/sec Loss 5.8954 LearningRate 0.0005 Epoch: 13 Global Step: 284620 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:04,354-Speed 6313.73 samples/sec Loss 5.8785 LearningRate 0.0005 Epoch: 13 Global Step: 284630 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:07,600-Speed 6312.18 samples/sec Loss 5.8231 LearningRate 0.0005 Epoch: 13 Global Step: 284640 Fp16 Grad 
Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:10,846-Speed 6310.37 samples/sec Loss 5.8830 LearningRate 0.0005 Epoch: 13 Global Step: 284650 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:14,095-Speed 6305.41 samples/sec Loss 5.9302 LearningRate 0.0005 Epoch: 13 Global Step: 284660 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:17,352-Speed 6289.34 samples/sec Loss 5.8360 LearningRate 0.0005 Epoch: 13 Global Step: 284670 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:20,601-Speed 6305.67 samples/sec Loss 5.9720 LearningRate 0.0005 Epoch: 13 Global Step: 284680 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:23,844-Speed 6315.67 samples/sec Loss 5.8593 LearningRate 0.0005 Epoch: 13 Global Step: 284690 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:27,077-Speed 6337.02 samples/sec Loss 5.7902 LearningRate 0.0005 Epoch: 13 Global Step: 284700 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:30,324-Speed 6308.24 samples/sec Loss 5.9332 LearningRate 0.0005 Epoch: 13 Global Step: 284710 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:33,577-Speed 6297.40 samples/sec Loss 5.8472 LearningRate 0.0005 Epoch: 13 Global Step: 284720 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:36,821-Speed 6313.95 samples/sec Loss 5.8206 LearningRate 0.0005 Epoch: 13 Global Step: 284730 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:40,064-Speed 6317.74 samples/sec Loss 5.9451 LearningRate 0.0005 Epoch: 13 Global Step: 284740 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:43,311-Speed 6308.20 samples/sec Loss 5.8736 LearningRate 0.0005 Epoch: 13 Global Step: 284750 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:46,554-Speed 6316.25 samples/sec Loss 5.8587 LearningRate 0.0005 Epoch: 13 Global Step: 284760 Fp16 Grad Scale: 32768 Required: 50 hours 
Training: 2022-04-01 17:37:49,807-Speed 6298.14 samples/sec Loss 5.8557 LearningRate 0.0005 Epoch: 13 Global Step: 284770 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:53,053-Speed 6309.07 samples/sec Loss 5.8835 LearningRate 0.0005 Epoch: 13 Global Step: 284780 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:56,302-Speed 6306.22 samples/sec Loss 5.8772 LearningRate 0.0005 Epoch: 13 Global Step: 284790 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:37:59,532-Speed 6341.29 samples/sec Loss 5.9384 LearningRate 0.0005 Epoch: 13 Global Step: 284800 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:02,784-Speed 6299.44 samples/sec Loss 5.9026 LearningRate 0.0005 Epoch: 13 Global Step: 284810 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:06,029-Speed 6312.75 samples/sec Loss 5.8944 LearningRate 0.0005 Epoch: 13 Global Step: 284820 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:09,276-Speed 6307.99 samples/sec Loss 5.8711 LearningRate 0.0005 Epoch: 13 Global Step: 284830 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:12,527-Speed 6302.43 samples/sec Loss 5.8018 LearningRate 0.0005 Epoch: 13 Global Step: 284840 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:15,770-Speed 6316.74 samples/sec Loss 5.7702 LearningRate 0.0005 Epoch: 13 Global Step: 284850 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:19,016-Speed 6310.93 samples/sec Loss 5.8175 LearningRate 0.0005 Epoch: 13 Global Step: 284860 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:22,261-Speed 6311.70 samples/sec Loss 5.7821 LearningRate 0.0005 Epoch: 13 Global Step: 284870 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:25,504-Speed 6316.56 samples/sec Loss 5.8200 LearningRate 0.0005 Epoch: 13 Global Step: 284880 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 
17:38:28,752-Speed 6307.17 samples/sec Loss 5.8173 LearningRate 0.0005 Epoch: 13 Global Step: 284890 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:31,983-Speed 6340.69 samples/sec Loss 5.8304 LearningRate 0.0005 Epoch: 13 Global Step: 284900 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:35,234-Speed 6299.90 samples/sec Loss 5.8982 LearningRate 0.0005 Epoch: 13 Global Step: 284910 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:38,478-Speed 6315.81 samples/sec Loss 5.8474 LearningRate 0.0005 Epoch: 13 Global Step: 284920 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:41,724-Speed 6310.13 samples/sec Loss 5.8877 LearningRate 0.0005 Epoch: 13 Global Step: 284930 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:44,968-Speed 6313.88 samples/sec Loss 5.8402 LearningRate 0.0005 Epoch: 13 Global Step: 284940 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:48,220-Speed 6299.07 samples/sec Loss 5.9273 LearningRate 0.0005 Epoch: 13 Global Step: 284950 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:51,469-Speed 6305.58 samples/sec Loss 5.8864 LearningRate 0.0005 Epoch: 13 Global Step: 284960 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:54,714-Speed 6312.94 samples/sec Loss 5.9095 LearningRate 0.0005 Epoch: 13 Global Step: 284970 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:38:57,962-Speed 6306.91 samples/sec Loss 5.8056 LearningRate 0.0005 Epoch: 13 Global Step: 284980 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:01,210-Speed 6306.00 samples/sec Loss 5.8396 LearningRate 0.0005 Epoch: 13 Global Step: 284990 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:04,455-Speed 6313.06 samples/sec Loss 5.8210 LearningRate 0.0005 Epoch: 13 Global Step: 285000 Fp16 Grad Scale: 65536 Required: 50 hours Training: 2022-04-01 17:39:07,692-Speed 6329.10 
samples/sec Loss 5.8173 LearningRate 0.0005 Epoch: 13 Global Step: 285010 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:10,936-Speed 6313.51 samples/sec Loss 5.8055 LearningRate 0.0005 Epoch: 13 Global Step: 285020 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:14,180-Speed 6315.06 samples/sec Loss 5.8737 LearningRate 0.0005 Epoch: 13 Global Step: 285030 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:17,429-Speed 6304.71 samples/sec Loss 5.8759 LearningRate 0.0005 Epoch: 13 Global Step: 285040 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:20,674-Speed 6313.61 samples/sec Loss 5.8893 LearningRate 0.0005 Epoch: 13 Global Step: 285050 Fp16 Grad Scale: 32768 Required: 50 hours Training: 2022-04-01 17:39:23,923-Speed 6304.75 samples/sec Loss 5.8824 LearningRate 0.0005 Epoch: 13 Global Step: 285060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:27,172-Speed 6305.23 samples/sec Loss 5.8261 LearningRate 0.0005 Epoch: 13 Global Step: 285070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:30,415-Speed 6317.39 samples/sec Loss 5.8192 LearningRate 0.0005 Epoch: 13 Global Step: 285080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:33,657-Speed 6318.03 samples/sec Loss 5.9237 LearningRate 0.0005 Epoch: 13 Global Step: 285090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:36,904-Speed 6309.57 samples/sec Loss 5.8757 LearningRate 0.0005 Epoch: 13 Global Step: 285100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:40,138-Speed 6334.10 samples/sec Loss 5.9043 LearningRate 0.0005 Epoch: 13 Global Step: 285110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:43,383-Speed 6312.08 samples/sec Loss 5.8308 LearningRate 0.0005 Epoch: 13 Global Step: 285120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:46,630-Speed 6308.79 samples/sec Loss 5.8574 
LearningRate 0.0005 Epoch: 13 Global Step: 285130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:49,875-Speed 6312.30 samples/sec Loss 5.8842 LearningRate 0.0005 Epoch: 13 Global Step: 285140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:53,121-Speed 6311.00 samples/sec Loss 5.8378 LearningRate 0.0005 Epoch: 13 Global Step: 285150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:56,366-Speed 6312.99 samples/sec Loss 5.7588 LearningRate 0.0005 Epoch: 13 Global Step: 285160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:39:59,612-Speed 6310.05 samples/sec Loss 5.8515 LearningRate 0.0005 Epoch: 13 Global Step: 285170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:02,860-Speed 6305.87 samples/sec Loss 5.8713 LearningRate 0.0005 Epoch: 13 Global Step: 285180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:06,108-Speed 6307.33 samples/sec Loss 5.8044 LearningRate 0.0005 Epoch: 13 Global Step: 285190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:09,357-Speed 6305.92 samples/sec Loss 5.8649 LearningRate 0.0005 Epoch: 13 Global Step: 285200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:12,589-Speed 6337.69 samples/sec Loss 5.8786 LearningRate 0.0005 Epoch: 13 Global Step: 285210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:15,837-Speed 6305.91 samples/sec Loss 5.8751 LearningRate 0.0005 Epoch: 13 Global Step: 285220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:19,085-Speed 6306.74 samples/sec Loss 5.8636 LearningRate 0.0005 Epoch: 13 Global Step: 285230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:22,333-Speed 6307.57 samples/sec Loss 5.9124 LearningRate 0.0005 Epoch: 13 Global Step: 285240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:25,585-Speed 6298.63 samples/sec Loss 5.8604 LearningRate 0.0005 Epoch: 13 
Global Step: 285250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:28,833-Speed 6306.61 samples/sec Loss 5.8092 LearningRate 0.0005 Epoch: 13 Global Step: 285260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:32,081-Speed 6307.61 samples/sec Loss 5.8201 LearningRate 0.0005 Epoch: 13 Global Step: 285270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:35,375-Speed 6218.70 samples/sec Loss 5.8755 LearningRate 0.0005 Epoch: 13 Global Step: 285280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:38,649-Speed 6258.04 samples/sec Loss 5.8079 LearningRate 0.0005 Epoch: 13 Global Step: 285290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:41,891-Speed 6318.48 samples/sec Loss 5.7669 LearningRate 0.0005 Epoch: 13 Global Step: 285300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:45,135-Speed 6314.01 samples/sec Loss 5.8322 LearningRate 0.0005 Epoch: 13 Global Step: 285310 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:40:48,379-Speed 6314.87 samples/sec Loss 5.8167 LearningRate 0.0005 Epoch: 13 Global Step: 285320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:51,624-Speed 6311.53 samples/sec Loss 5.9797 LearningRate 0.0005 Epoch: 13 Global Step: 285330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:54,867-Speed 6318.00 samples/sec Loss 5.8015 LearningRate 0.0005 Epoch: 13 Global Step: 285340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:40:58,113-Speed 6309.70 samples/sec Loss 5.9036 LearningRate 0.0005 Epoch: 13 Global Step: 285350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:01,361-Speed 6307.25 samples/sec Loss 5.8860 LearningRate 0.0005 Epoch: 13 Global Step: 285360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:04,610-Speed 6304.09 samples/sec Loss 5.7874 LearningRate 0.0005 Epoch: 13 Global Step: 285370 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:07,854-Speed 6315.81 samples/sec Loss 5.8916 LearningRate 0.0005 Epoch: 13 Global Step: 285380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:11,157-Speed 6201.86 samples/sec Loss 5.8905 LearningRate 0.0005 Epoch: 13 Global Step: 285390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:14,402-Speed 6312.79 samples/sec Loss 5.8304 LearningRate 0.0005 Epoch: 13 Global Step: 285400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:17,651-Speed 6304.86 samples/sec Loss 5.7650 LearningRate 0.0005 Epoch: 13 Global Step: 285410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:20,885-Speed 6333.95 samples/sec Loss 5.7744 LearningRate 0.0005 Epoch: 13 Global Step: 285420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:24,130-Speed 6312.75 samples/sec Loss 5.9071 LearningRate 0.0005 Epoch: 13 Global Step: 285430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:27,383-Speed 6296.62 samples/sec Loss 5.8430 LearningRate 0.0005 Epoch: 13 Global Step: 285440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:30,629-Speed 6310.80 samples/sec Loss 5.8509 LearningRate 0.0005 Epoch: 13 Global Step: 285450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:33,879-Speed 6302.10 samples/sec Loss 5.8388 LearningRate 0.0005 Epoch: 13 Global Step: 285460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:37,131-Speed 6299.08 samples/sec Loss 5.8152 LearningRate 0.0005 Epoch: 13 Global Step: 285470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:40,383-Speed 6300.53 samples/sec Loss 5.8665 LearningRate 0.0005 Epoch: 13 Global Step: 285480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:43,632-Speed 6306.02 samples/sec Loss 5.7977 LearningRate 0.0005 Epoch: 13 Global Step: 285490 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 17:41:46,876-Speed 6314.69 samples/sec Loss 5.9085 LearningRate 0.0005 Epoch: 13 Global Step: 285500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:50,124-Speed 6305.98 samples/sec Loss 5.8532 LearningRate 0.0005 Epoch: 13 Global Step: 285510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:53,356-Speed 6338.48 samples/sec Loss 5.8722 LearningRate 0.0005 Epoch: 13 Global Step: 285520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:56,604-Speed 6306.23 samples/sec Loss 5.7567 LearningRate 0.0005 Epoch: 13 Global Step: 285530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:41:59,851-Speed 6309.11 samples/sec Loss 5.8099 LearningRate 0.0005 Epoch: 13 Global Step: 285540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:03,098-Speed 6308.15 samples/sec Loss 5.8076 LearningRate 0.0005 Epoch: 13 Global Step: 285550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:06,351-Speed 6298.23 samples/sec Loss 5.9035 LearningRate 0.0005 Epoch: 13 Global Step: 285560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:09,598-Speed 6307.48 samples/sec Loss 5.9273 LearningRate 0.0005 Epoch: 13 Global Step: 285570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:12,843-Speed 6313.39 samples/sec Loss 5.9283 LearningRate 0.0005 Epoch: 13 Global Step: 285580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:16,087-Speed 6314.48 samples/sec Loss 5.8325 LearningRate 0.0005 Epoch: 13 Global Step: 285590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:19,340-Speed 6297.60 samples/sec Loss 5.8609 LearningRate 0.0005 Epoch: 13 Global Step: 285600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:22,587-Speed 6309.21 samples/sec Loss 5.8621 LearningRate 0.0005 Epoch: 13 Global Step: 285610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
17:42:25,818-Speed 6338.87 samples/sec Loss 5.8189 LearningRate 0.0005 Epoch: 13 Global Step: 285620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:29,072-Speed 6295.22 samples/sec Loss 5.9260 LearningRate 0.0005 Epoch: 13 Global Step: 285630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:32,318-Speed 6310.16 samples/sec Loss 5.8252 LearningRate 0.0005 Epoch: 13 Global Step: 285640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:35,565-Speed 6308.63 samples/sec Loss 5.8774 LearningRate 0.0005 Epoch: 13 Global Step: 285650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:38,816-Speed 6302.82 samples/sec Loss 5.8770 LearningRate 0.0005 Epoch: 13 Global Step: 285660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:42,058-Speed 6317.21 samples/sec Loss 5.9167 LearningRate 0.0005 Epoch: 13 Global Step: 285670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:45,306-Speed 6307.57 samples/sec Loss 5.9270 LearningRate 0.0005 Epoch: 13 Global Step: 285680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:48,553-Speed 6309.61 samples/sec Loss 5.8675 LearningRate 0.0005 Epoch: 13 Global Step: 285690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:51,804-Speed 6300.78 samples/sec Loss 5.8488 LearningRate 0.0005 Epoch: 13 Global Step: 285700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:55,054-Speed 6303.24 samples/sec Loss 5.8865 LearningRate 0.0005 Epoch: 13 Global Step: 285710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:42:58,283-Speed 6343.41 samples/sec Loss 5.9019 LearningRate 0.0005 Epoch: 13 Global Step: 285720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:01,529-Speed 6310.68 samples/sec Loss 5.8943 LearningRate 0.0005 Epoch: 13 Global Step: 285730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:04,775-Speed 6311.82 
samples/sec Loss 5.9055 LearningRate 0.0005 Epoch: 13 Global Step: 285740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:08,022-Speed 6307.37 samples/sec Loss 5.8781 LearningRate 0.0005 Epoch: 13 Global Step: 285750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:11,272-Speed 6302.75 samples/sec Loss 5.8801 LearningRate 0.0005 Epoch: 13 Global Step: 285760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:14,522-Speed 6303.85 samples/sec Loss 5.8811 LearningRate 0.0005 Epoch: 13 Global Step: 285770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:17,769-Speed 6308.76 samples/sec Loss 5.9539 LearningRate 0.0005 Epoch: 13 Global Step: 285780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:21,015-Speed 6311.56 samples/sec Loss 5.8144 LearningRate 0.0005 Epoch: 13 Global Step: 285790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:24,265-Speed 6302.28 samples/sec Loss 5.9821 LearningRate 0.0005 Epoch: 13 Global Step: 285800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:27,515-Speed 6302.22 samples/sec Loss 5.8457 LearningRate 0.0005 Epoch: 13 Global Step: 285810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:30,747-Speed 6339.50 samples/sec Loss 5.9029 LearningRate 0.0005 Epoch: 13 Global Step: 285820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:33,998-Speed 6300.34 samples/sec Loss 5.7994 LearningRate 0.0005 Epoch: 13 Global Step: 285830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:37,243-Speed 6312.46 samples/sec Loss 5.9259 LearningRate 0.0005 Epoch: 13 Global Step: 285840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:40,488-Speed 6312.87 samples/sec Loss 5.9250 LearningRate 0.0005 Epoch: 13 Global Step: 285850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:43,734-Speed 6310.84 samples/sec Loss 5.9252 
LearningRate 0.0005 Epoch: 13 Global Step: 285860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:46,981-Speed 6307.51 samples/sec Loss 5.7672 LearningRate 0.0005 Epoch: 13 Global Step: 285870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:50,230-Speed 6306.44 samples/sec Loss 5.8482 LearningRate 0.0005 Epoch: 13 Global Step: 285880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:53,476-Speed 6310.12 samples/sec Loss 5.9124 LearningRate 0.0005 Epoch: 13 Global Step: 285890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:56,729-Speed 6296.75 samples/sec Loss 5.8502 LearningRate 0.0005 Epoch: 13 Global Step: 285900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:43:59,980-Speed 6301.37 samples/sec Loss 5.8480 LearningRate 0.0005 Epoch: 13 Global Step: 285910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:03,227-Speed 6309.85 samples/sec Loss 5.8472 LearningRate 0.0005 Epoch: 13 Global Step: 285920 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:44:06,468-Speed 6320.65 samples/sec Loss 5.8649 LearningRate 0.0005 Epoch: 13 Global Step: 285930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:09,714-Speed 6309.74 samples/sec Loss 5.8966 LearningRate 0.0005 Epoch: 13 Global Step: 285940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:12,965-Speed 6301.17 samples/sec Loss 5.8621 LearningRate 0.0005 Epoch: 13 Global Step: 285950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:16,212-Speed 6308.35 samples/sec Loss 5.7959 LearningRate 0.0005 Epoch: 13 Global Step: 285960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:19,462-Speed 6304.86 samples/sec Loss 5.8837 LearningRate 0.0005 Epoch: 13 Global Step: 285970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:22,706-Speed 6313.55 samples/sec Loss 5.8670 LearningRate 0.0005 Epoch: 13 
Global Step: 285980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:25,955-Speed 6305.77 samples/sec Loss 5.8242 LearningRate 0.0005 Epoch: 13 Global Step: 285990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:29,203-Speed 6305.27 samples/sec Loss 5.8169 LearningRate 0.0005 Epoch: 13 Global Step: 286000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:32,451-Speed 6306.61 samples/sec Loss 5.8220 LearningRate 0.0005 Epoch: 13 Global Step: 286010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:35,703-Speed 6300.75 samples/sec Loss 5.8417 LearningRate 0.0005 Epoch: 13 Global Step: 286020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:38,938-Speed 6334.99 samples/sec Loss 5.9062 LearningRate 0.0005 Epoch: 13 Global Step: 286030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:44:42,177-Speed 6323.04 samples/sec Loss 5.8280 LearningRate 0.0005 Epoch: 13 Global Step: 286040 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:44:45,422-Speed 6312.52 samples/sec Loss 5.8578 LearningRate 0.0005 Epoch: 13 Global Step: 286050 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:44:48,673-Speed 6302.06 samples/sec Loss 5.8990 LearningRate 0.0005 Epoch: 13 Global Step: 286060 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:44:51,919-Speed 6309.68 samples/sec Loss 5.8594 LearningRate 0.0005 Epoch: 13 Global Step: 286070 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:44:55,168-Speed 6306.05 samples/sec Loss 5.8842 LearningRate 0.0005 Epoch: 13 Global Step: 286080 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:44:58,411-Speed 6315.96 samples/sec Loss 5.8725 LearningRate 0.0005 Epoch: 13 Global Step: 286090 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:45:01,656-Speed 6312.79 samples/sec Loss 5.8921 LearningRate 0.0005 Epoch: 13 Global Step: 286100 Fp16 Grad 
Scale: 16384 Required: 49 hours Training: 2022-04-01 17:45:04,901-Speed 6312.00 samples/sec Loss 5.7748 LearningRate 0.0005 Epoch: 13 Global Step: 286110 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:45:08,146-Speed 6314.80 samples/sec Loss 5.8296 LearningRate 0.0005 Epoch: 13 Global Step: 286120 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:45:11,389-Speed 6315.83 samples/sec Loss 5.8365 LearningRate 0.0005 Epoch: 13 Global Step: 286130 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 17:45:14,638-Speed 6305.88 samples/sec Loss 5.9021 LearningRate 0.0005 Epoch: 13 Global Step: 286140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:17,883-Speed 6312.24 samples/sec Loss 5.9047 LearningRate 0.0005 Epoch: 13 Global Step: 286150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:21,130-Speed 6308.04 samples/sec Loss 5.8750 LearningRate 0.0005 Epoch: 13 Global Step: 286160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:24,377-Speed 6310.37 samples/sec Loss 5.8146 LearningRate 0.0005 Epoch: 13 Global Step: 286170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:27,622-Speed 6311.77 samples/sec Loss 5.8389 LearningRate 0.0005 Epoch: 13 Global Step: 286180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:30,866-Speed 6314.24 samples/sec Loss 5.9066 LearningRate 0.0005 Epoch: 13 Global Step: 286190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:34,107-Speed 6321.06 samples/sec Loss 5.8081 LearningRate 0.0005 Epoch: 13 Global Step: 286200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:37,354-Speed 6309.13 samples/sec Loss 5.8288 LearningRate 0.0005 Epoch: 13 Global Step: 286210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:40,600-Speed 6309.55 samples/sec Loss 5.9030 LearningRate 0.0005 Epoch: 13 Global Step: 286220 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 17:45:43,845-Speed 6314.09 samples/sec Loss 5.8883 LearningRate 0.0005 Epoch: 13 Global Step: 286230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:47,091-Speed 6309.97 samples/sec Loss 5.7974 LearningRate 0.0005 Epoch: 13 Global Step: 286240 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:45:50,324-Speed 6336.73 samples/sec Loss 5.8331 LearningRate 0.0005 Epoch: 13 Global Step: 286250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:53,572-Speed 6306.57 samples/sec Loss 5.8616 LearningRate 0.0005 Epoch: 13 Global Step: 286260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:45:56,820-Speed 6306.24 samples/sec Loss 5.8947 LearningRate 0.0005 Epoch: 13 Global Step: 286270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:00,065-Speed 6313.04 samples/sec Loss 5.8164 LearningRate 0.0005 Epoch: 13 Global Step: 286280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:03,311-Speed 6310.99 samples/sec Loss 5.7890 LearningRate 0.0005 Epoch: 13 Global Step: 286290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:06,558-Speed 6308.77 samples/sec Loss 5.8899 LearningRate 0.0005 Epoch: 13 Global Step: 286300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:09,808-Speed 6302.22 samples/sec Loss 5.8776 LearningRate 0.0005 Epoch: 13 Global Step: 286310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:13,055-Speed 6309.54 samples/sec Loss 5.8851 LearningRate 0.0005 Epoch: 13 Global Step: 286320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:16,300-Speed 6311.84 samples/sec Loss 5.8415 LearningRate 0.0005 Epoch: 13 Global Step: 286330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:19,580-Speed 6245.74 samples/sec Loss 5.8433 LearningRate 0.0005 Epoch: 13 Global Step: 286340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
17:46:22,812-Speed 6338.09 samples/sec Loss 5.7763 LearningRate 0.0005 Epoch: 13 Global Step: 286350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:26,103-Speed 6225.11 samples/sec Loss 5.8434 LearningRate 0.0005 Epoch: 13 Global Step: 286360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:29,371-Speed 6268.87 samples/sec Loss 5.9201 LearningRate 0.0005 Epoch: 13 Global Step: 286370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:32,615-Speed 6315.00 samples/sec Loss 5.8416 LearningRate 0.0005 Epoch: 13 Global Step: 286380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:35,859-Speed 6312.94 samples/sec Loss 5.8436 LearningRate 0.0005 Epoch: 13 Global Step: 286390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:39,104-Speed 6312.56 samples/sec Loss 5.7939 LearningRate 0.0005 Epoch: 13 Global Step: 286400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:42,346-Speed 6318.94 samples/sec Loss 5.8293 LearningRate 0.0005 Epoch: 13 Global Step: 286410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:45,594-Speed 6308.10 samples/sec Loss 5.8518 LearningRate 0.0005 Epoch: 13 Global Step: 286420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:48,838-Speed 6312.75 samples/sec Loss 5.8296 LearningRate 0.0005 Epoch: 13 Global Step: 286430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:52,087-Speed 6305.44 samples/sec Loss 5.8472 LearningRate 0.0005 Epoch: 13 Global Step: 286440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:55,321-Speed 6337.02 samples/sec Loss 5.7773 LearningRate 0.0005 Epoch: 13 Global Step: 286450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:46:58,567-Speed 6310.04 samples/sec Loss 5.8529 LearningRate 0.0005 Epoch: 13 Global Step: 286460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:01,820-Speed 6297.78 
samples/sec Loss 5.8538 LearningRate 0.0005 Epoch: 13 Global Step: 286470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:05,064-Speed 6313.54 samples/sec Loss 5.8451 LearningRate 0.0005 Epoch: 13 Global Step: 286480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:08,310-Speed 6311.08 samples/sec Loss 5.8701 LearningRate 0.0005 Epoch: 13 Global Step: 286490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:11,559-Speed 6305.60 samples/sec Loss 5.8175 LearningRate 0.0005 Epoch: 13 Global Step: 286500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:14,806-Speed 6309.76 samples/sec Loss 5.9338 LearningRate 0.0005 Epoch: 13 Global Step: 286510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:18,054-Speed 6306.26 samples/sec Loss 5.8699 LearningRate 0.0005 Epoch: 13 Global Step: 286520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:21,299-Speed 6312.02 samples/sec Loss 5.8753 LearningRate 0.0005 Epoch: 13 Global Step: 286530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:24,549-Speed 6303.77 samples/sec Loss 5.7847 LearningRate 0.0005 Epoch: 13 Global Step: 286540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:27,791-Speed 6317.82 samples/sec Loss 5.8853 LearningRate 0.0005 Epoch: 13 Global Step: 286550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:31,036-Speed 6313.83 samples/sec Loss 5.8067 LearningRate 0.0005 Epoch: 13 Global Step: 286560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:34,282-Speed 6311.49 samples/sec Loss 5.8506 LearningRate 0.0005 Epoch: 13 Global Step: 286570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:37,527-Speed 6311.47 samples/sec Loss 5.7927 LearningRate 0.0005 Epoch: 13 Global Step: 286580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:40,777-Speed 6303.49 samples/sec Loss 5.8293 
LearningRate 0.0005 Epoch: 13 Global Step: 286590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:44,021-Speed 6313.41 samples/sec Loss 5.7548 LearningRate 0.0005 Epoch: 13 Global Step: 286600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:47,265-Speed 6316.20 samples/sec Loss 5.8962 LearningRate 0.0005 Epoch: 13 Global Step: 286610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:50,508-Speed 6316.12 samples/sec Loss 5.9085 LearningRate 0.0005 Epoch: 13 Global Step: 286620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:53,755-Speed 6307.89 samples/sec Loss 5.8526 LearningRate 0.0005 Epoch: 13 Global Step: 286630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:47:57,001-Speed 6311.37 samples/sec Loss 5.7934 LearningRate 0.0005 Epoch: 13 Global Step: 286640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:00,249-Speed 6307.05 samples/sec Loss 5.8910 LearningRate 0.0005 Epoch: 13 Global Step: 286650 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:48:03,476-Speed 6347.38 samples/sec Loss 5.8508 LearningRate 0.0005 Epoch: 13 Global Step: 286660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:06,721-Speed 6313.82 samples/sec Loss 5.8835 LearningRate 0.0005 Epoch: 13 Global Step: 286670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:09,966-Speed 6312.57 samples/sec Loss 5.8276 LearningRate 0.0005 Epoch: 13 Global Step: 286680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:13,215-Speed 6304.77 samples/sec Loss 5.8092 LearningRate 0.0005 Epoch: 13 Global Step: 286690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:16,463-Speed 6307.26 samples/sec Loss 5.8371 LearningRate 0.0005 Epoch: 13 Global Step: 286700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:19,714-Speed 6299.25 samples/sec Loss 5.7768 LearningRate 0.0005 Epoch: 13 
Global Step: 286710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:22,960-Speed 6311.52 samples/sec Loss 5.9192 LearningRate 0.0005 Epoch: 13 Global Step: 286720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:26,208-Speed 6307.62 samples/sec Loss 5.8091 LearningRate 0.0005 Epoch: 13 Global Step: 286730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:29,451-Speed 6316.27 samples/sec Loss 5.8429 LearningRate 0.0005 Epoch: 13 Global Step: 286740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:32,694-Speed 6315.21 samples/sec Loss 5.7994 LearningRate 0.0005 Epoch: 13 Global Step: 286750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:35,927-Speed 6337.71 samples/sec Loss 5.9368 LearningRate 0.0005 Epoch: 13 Global Step: 286760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:39,172-Speed 6312.70 samples/sec Loss 5.8656 LearningRate 0.0005 Epoch: 13 Global Step: 286770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:42,421-Speed 6304.57 samples/sec Loss 5.7852 LearningRate 0.0005 Epoch: 13 Global Step: 286780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:45,666-Speed 6312.30 samples/sec Loss 5.8654 LearningRate 0.0005 Epoch: 13 Global Step: 286790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:48,909-Speed 6316.63 samples/sec Loss 5.9135 LearningRate 0.0005 Epoch: 13 Global Step: 286800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:52,156-Speed 6309.52 samples/sec Loss 5.7951 LearningRate 0.0005 Epoch: 13 Global Step: 286810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:55,400-Speed 6314.81 samples/sec Loss 5.8155 LearningRate 0.0005 Epoch: 13 Global Step: 286820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:48:58,645-Speed 6312.47 samples/sec Loss 5.8186 LearningRate 0.0005 Epoch: 13 Global Step: 286830 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:01,898-Speed 6297.83 samples/sec Loss 5.8776 LearningRate 0.0005 Epoch: 13 Global Step: 286840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:05,144-Speed 6309.54 samples/sec Loss 5.8353 LearningRate 0.0005 Epoch: 13 Global Step: 286850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:08,379-Speed 6332.32 samples/sec Loss 5.8317 LearningRate 0.0005 Epoch: 13 Global Step: 286860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:11,623-Speed 6314.86 samples/sec Loss 5.8463 LearningRate 0.0005 Epoch: 13 Global Step: 286870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:14,868-Speed 6312.31 samples/sec Loss 5.8094 LearningRate 0.0005 Epoch: 13 Global Step: 286880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:18,116-Speed 6307.86 samples/sec Loss 5.8334 LearningRate 0.0005 Epoch: 13 Global Step: 286890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:21,360-Speed 6314.39 samples/sec Loss 5.8865 LearningRate 0.0005 Epoch: 13 Global Step: 286900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:24,612-Speed 6299.41 samples/sec Loss 5.8330 LearningRate 0.0005 Epoch: 13 Global Step: 286910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:27,854-Speed 6318.36 samples/sec Loss 5.8016 LearningRate 0.0005 Epoch: 13 Global Step: 286920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:31,097-Speed 6316.69 samples/sec Loss 5.9060 LearningRate 0.0005 Epoch: 13 Global Step: 286930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:34,340-Speed 6316.26 samples/sec Loss 5.7653 LearningRate 0.0005 Epoch: 13 Global Step: 286940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:37,587-Speed 6307.35 samples/sec Loss 5.9407 LearningRate 0.0005 Epoch: 13 Global Step: 286950 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 17:49:40,822-Speed 6333.12 samples/sec Loss 5.8156 LearningRate 0.0005 Epoch: 13 Global Step: 286960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:44,066-Speed 6315.04 samples/sec Loss 5.9137 LearningRate 0.0005 Epoch: 13 Global Step: 286970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:47,310-Speed 6314.88 samples/sec Loss 5.8683 LearningRate 0.0005 Epoch: 13 Global Step: 286980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:50,550-Speed 6321.32 samples/sec Loss 5.7877 LearningRate 0.0005 Epoch: 13 Global Step: 286990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:53,796-Speed 6312.05 samples/sec Loss 5.8138 LearningRate 0.0005 Epoch: 13 Global Step: 287000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:49:57,046-Speed 6302.77 samples/sec Loss 5.7993 LearningRate 0.0005 Epoch: 13 Global Step: 287010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:00,290-Speed 6315.40 samples/sec Loss 5.8524 LearningRate 0.0005 Epoch: 13 Global Step: 287020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:03,534-Speed 6313.65 samples/sec Loss 5.8399 LearningRate 0.0005 Epoch: 13 Global Step: 287030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:06,778-Speed 6314.98 samples/sec Loss 5.8409 LearningRate 0.0005 Epoch: 13 Global Step: 287040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:10,023-Speed 6312.28 samples/sec Loss 5.7934 LearningRate 0.0005 Epoch: 13 Global Step: 287050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:13,268-Speed 6312.00 samples/sec Loss 5.8125 LearningRate 0.0005 Epoch: 13 Global Step: 287060 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:50:16,504-Speed 6331.50 samples/sec Loss 5.8067 LearningRate 0.0005 Epoch: 13 Global Step: 287070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
17:50:19,751-Speed 6307.70 samples/sec Loss 5.8143 LearningRate 0.0005 Epoch: 13 Global Step: 287080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:22,997-Speed 6312.00 samples/sec Loss 5.8415 LearningRate 0.0005 Epoch: 13 Global Step: 287090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:26,251-Speed 6294.66 samples/sec Loss 5.8486 LearningRate 0.0005 Epoch: 13 Global Step: 287100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:29,496-Speed 6312.45 samples/sec Loss 5.8818 LearningRate 0.0005 Epoch: 13 Global Step: 287110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:32,744-Speed 6307.34 samples/sec Loss 5.8389 LearningRate 0.0005 Epoch: 13 Global Step: 287120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:35,992-Speed 6306.01 samples/sec Loss 5.8163 LearningRate 0.0005 Epoch: 13 Global Step: 287130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:39,237-Speed 6312.13 samples/sec Loss 5.8262 LearningRate 0.0005 Epoch: 13 Global Step: 287140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:42,480-Speed 6317.23 samples/sec Loss 5.8624 LearningRate 0.0005 Epoch: 13 Global Step: 287150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:45,726-Speed 6310.05 samples/sec Loss 5.8121 LearningRate 0.0005 Epoch: 13 Global Step: 287160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:48,953-Speed 6347.68 samples/sec Loss 5.9165 LearningRate 0.0005 Epoch: 13 Global Step: 287170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:52,199-Speed 6312.01 samples/sec Loss 5.7462 LearningRate 0.0005 Epoch: 13 Global Step: 287180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:55,447-Speed 6306.25 samples/sec Loss 5.8522 LearningRate 0.0005 Epoch: 13 Global Step: 287190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:50:58,691-Speed 6315.10 
samples/sec Loss 5.8703 LearningRate 0.0005 Epoch: 13 Global Step: 287200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:01,935-Speed 6315.41 samples/sec Loss 5.7885 LearningRate 0.0005 Epoch: 13 Global Step: 287210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:05,185-Speed 6302.41 samples/sec Loss 5.8101 LearningRate 0.0005 Epoch: 13 Global Step: 287220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:08,436-Speed 6302.80 samples/sec Loss 5.8297 LearningRate 0.0005 Epoch: 13 Global Step: 287230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:11,684-Speed 6306.56 samples/sec Loss 5.8996 LearningRate 0.0005 Epoch: 13 Global Step: 287240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:14,928-Speed 6314.91 samples/sec Loss 5.7730 LearningRate 0.0005 Epoch: 13 Global Step: 287250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:18,196-Speed 6266.61 samples/sec Loss 5.8431 LearningRate 0.0005 Epoch: 13 Global Step: 287260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:21,433-Speed 6328.08 samples/sec Loss 5.7884 LearningRate 0.0005 Epoch: 13 Global Step: 287270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:24,681-Speed 6308.04 samples/sec Loss 5.8499 LearningRate 0.0005 Epoch: 13 Global Step: 287280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:27,928-Speed 6308.54 samples/sec Loss 5.8279 LearningRate 0.0005 Epoch: 13 Global Step: 287290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:31,219-Speed 6224.13 samples/sec Loss 5.7579 LearningRate 0.0005 Epoch: 13 Global Step: 287300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:34,471-Speed 6299.81 samples/sec Loss 5.8361 LearningRate 0.0005 Epoch: 13 Global Step: 287310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:37,718-Speed 6307.95 samples/sec Loss 5.8449 
LearningRate 0.0005 Epoch: 13 Global Step: 287320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:40,961-Speed 6316.24 samples/sec Loss 5.7770 LearningRate 0.0005 Epoch: 13 Global Step: 287330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:44,210-Speed 6305.75 samples/sec Loss 5.7754 LearningRate 0.0005 Epoch: 13 Global Step: 287340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:47,463-Speed 6297.21 samples/sec Loss 5.8930 LearningRate 0.0005 Epoch: 13 Global Step: 287350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:50,711-Speed 6305.79 samples/sec Loss 5.7853 LearningRate 0.0005 Epoch: 13 Global Step: 287360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:51:53,957-Speed 6311.41 samples/sec Loss 5.8738 LearningRate 0.0005 Epoch: 13 Global Step: 287370 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:51:57,187-Speed 6341.78 samples/sec Loss 5.8158 LearningRate 0.0005 Epoch: 13 Global Step: 287380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:00,435-Speed 6307.14 samples/sec Loss 5.9435 LearningRate 0.0005 Epoch: 13 Global Step: 287390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:03,686-Speed 6299.93 samples/sec Loss 5.9074 LearningRate 0.0005 Epoch: 13 Global Step: 287400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:06,941-Speed 6294.51 samples/sec Loss 5.8885 LearningRate 0.0005 Epoch: 13 Global Step: 287410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:10,190-Speed 6304.84 samples/sec Loss 5.8609 LearningRate 0.0005 Epoch: 13 Global Step: 287420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:13,440-Speed 6304.53 samples/sec Loss 5.8303 LearningRate 0.0005 Epoch: 13 Global Step: 287430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:16,696-Speed 6290.43 samples/sec Loss 5.8934 LearningRate 0.0005 Epoch: 13 
Global Step: 287440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:19,941-Speed 6312.34 samples/sec Loss 5.8903 LearningRate 0.0005 Epoch: 13 Global Step: 287450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:23,198-Speed 6290.74 samples/sec Loss 5.8162 LearningRate 0.0005 Epoch: 13 Global Step: 287460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:26,460-Speed 6279.65 samples/sec Loss 5.8152 LearningRate 0.0005 Epoch: 13 Global Step: 287470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:29,694-Speed 6333.48 samples/sec Loss 5.8515 LearningRate 0.0005 Epoch: 13 Global Step: 287480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:32,942-Speed 6307.08 samples/sec Loss 5.7611 LearningRate 0.0005 Epoch: 13 Global Step: 287490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:36,189-Speed 6307.77 samples/sec Loss 5.7668 LearningRate 0.0005 Epoch: 13 Global Step: 287500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:39,437-Speed 6307.63 samples/sec Loss 5.8125 LearningRate 0.0005 Epoch: 13 Global Step: 287510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:42,685-Speed 6307.04 samples/sec Loss 5.8665 LearningRate 0.0005 Epoch: 13 Global Step: 287520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:45,932-Speed 6307.67 samples/sec Loss 5.8909 LearningRate 0.0005 Epoch: 13 Global Step: 287530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:49,176-Speed 6314.39 samples/sec Loss 5.8102 LearningRate 0.0005 Epoch: 13 Global Step: 287540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:52,423-Speed 6309.18 samples/sec Loss 5.8162 LearningRate 0.0005 Epoch: 13 Global Step: 287550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:55,677-Speed 6295.91 samples/sec Loss 5.8170 LearningRate 0.0005 Epoch: 13 Global Step: 287560 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 17:52:58,920-Speed 6315.34 samples/sec Loss 5.8226 LearningRate 0.0005 Epoch: 13 Global Step: 287570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:02,154-Speed 6334.73 samples/sec Loss 5.8034 LearningRate 0.0005 Epoch: 13 Global Step: 287580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:05,399-Speed 6313.59 samples/sec Loss 5.8050 LearningRate 0.0005 Epoch: 13 Global Step: 287590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:08,646-Speed 6308.17 samples/sec Loss 5.7876 LearningRate 0.0005 Epoch: 13 Global Step: 287600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:11,891-Speed 6312.64 samples/sec Loss 5.8492 LearningRate 0.0005 Epoch: 13 Global Step: 287610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:15,139-Speed 6308.12 samples/sec Loss 5.7993 LearningRate 0.0005 Epoch: 13 Global Step: 287620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:18,389-Speed 6301.14 samples/sec Loss 5.8964 LearningRate 0.0005 Epoch: 13 Global Step: 287630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:21,641-Speed 6300.88 samples/sec Loss 5.7717 LearningRate 0.0005 Epoch: 13 Global Step: 287640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:24,888-Speed 6306.94 samples/sec Loss 5.8577 LearningRate 0.0005 Epoch: 13 Global Step: 287650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:28,140-Speed 6300.64 samples/sec Loss 5.8704 LearningRate 0.0005 Epoch: 13 Global Step: 287660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:31,406-Speed 6271.77 samples/sec Loss 5.8284 LearningRate 0.0005 Epoch: 13 Global Step: 287670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:34,646-Speed 6321.82 samples/sec Loss 5.8209 LearningRate 0.0005 Epoch: 13 Global Step: 287680 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 17:53:37,891-Speed 6313.96 samples/sec Loss 5.8353 LearningRate 0.0005 Epoch: 13 Global Step: 287690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:41,149-Speed 6286.56 samples/sec Loss 5.8261 LearningRate 0.0005 Epoch: 13 Global Step: 287700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:44,393-Speed 6315.24 samples/sec Loss 5.8747 LearningRate 0.0005 Epoch: 13 Global Step: 287710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:47,646-Speed 6296.33 samples/sec Loss 5.7597 LearningRate 0.0005 Epoch: 13 Global Step: 287720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:50,896-Speed 6303.49 samples/sec Loss 5.9121 LearningRate 0.0005 Epoch: 13 Global Step: 287730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:54,141-Speed 6312.37 samples/sec Loss 5.8585 LearningRate 0.0005 Epoch: 13 Global Step: 287740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:53:57,392-Speed 6299.88 samples/sec Loss 5.7656 LearningRate 0.0005 Epoch: 13 Global Step: 287750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:00,635-Speed 6317.34 samples/sec Loss 5.7678 LearningRate 0.0005 Epoch: 13 Global Step: 287760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:03,887-Speed 6299.68 samples/sec Loss 5.8356 LearningRate 0.0005 Epoch: 13 Global Step: 287770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:07,147-Speed 6282.54 samples/sec Loss 5.7395 LearningRate 0.0005 Epoch: 13 Global Step: 287780 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 17:54:10,378-Speed 6341.15 samples/sec Loss 5.8541 LearningRate 0.0005 Epoch: 13 Global Step: 287790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:13,623-Speed 6311.08 samples/sec Loss 5.8192 LearningRate 0.0005 Epoch: 13 Global Step: 287800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
17:54:16,868-Speed 6313.62 samples/sec Loss 5.8886 LearningRate 0.0005 Epoch: 13 Global Step: 287810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:20,116-Speed 6306.66 samples/sec Loss 5.8279 LearningRate 0.0005 Epoch: 13 Global Step: 287820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:23,364-Speed 6306.52 samples/sec Loss 5.8359 LearningRate 0.0005 Epoch: 13 Global Step: 287830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:26,613-Speed 6305.56 samples/sec Loss 5.8233 LearningRate 0.0005 Epoch: 13 Global Step: 287840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:29,874-Speed 6282.91 samples/sec Loss 5.8103 LearningRate 0.0005 Epoch: 13 Global Step: 287850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:33,114-Speed 6321.48 samples/sec Loss 5.7272 LearningRate 0.0005 Epoch: 13 Global Step: 287860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:36,358-Speed 6314.34 samples/sec Loss 5.8063 LearningRate 0.0005 Epoch: 13 Global Step: 287870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:39,608-Speed 6304.16 samples/sec Loss 5.8553 LearningRate 0.0005 Epoch: 13 Global Step: 287880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:42,839-Speed 6338.37 samples/sec Loss 5.7461 LearningRate 0.0005 Epoch: 13 Global Step: 287890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:46,091-Speed 6300.30 samples/sec Loss 5.7594 LearningRate 0.0005 Epoch: 13 Global Step: 287900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:49,340-Speed 6304.92 samples/sec Loss 5.8840 LearningRate 0.0005 Epoch: 13 Global Step: 287910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:52,588-Speed 6305.81 samples/sec Loss 5.8479 LearningRate 0.0005 Epoch: 13 Global Step: 287920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:55,830-Speed 6318.59 
samples/sec Loss 5.8703 LearningRate 0.0005 Epoch: 13 Global Step: 287930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:54:59,078-Speed 6307.80 samples/sec Loss 5.8191 LearningRate 0.0005 Epoch: 13 Global Step: 287940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:02,326-Speed 6306.97 samples/sec Loss 5.9115 LearningRate 0.0005 Epoch: 13 Global Step: 287950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:05,570-Speed 6314.52 samples/sec Loss 5.8880 LearningRate 0.0005 Epoch: 13 Global Step: 287960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:08,817-Speed 6308.42 samples/sec Loss 5.8513 LearningRate 0.0005 Epoch: 13 Global Step: 287970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:12,066-Speed 6303.85 samples/sec Loss 5.7468 LearningRate 0.0005 Epoch: 13 Global Step: 287980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:15,300-Speed 6335.74 samples/sec Loss 5.8370 LearningRate 0.0005 Epoch: 13 Global Step: 287990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:18,541-Speed 6319.41 samples/sec Loss 5.9057 LearningRate 0.0005 Epoch: 13 Global Step: 288000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:21,791-Speed 6304.08 samples/sec Loss 5.8300 LearningRate 0.0005 Epoch: 13 Global Step: 288010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:25,033-Speed 6318.48 samples/sec Loss 5.8020 LearningRate 0.0005 Epoch: 13 Global Step: 288020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:28,278-Speed 6312.93 samples/sec Loss 5.8265 LearningRate 0.0005 Epoch: 13 Global Step: 288030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:31,526-Speed 6305.89 samples/sec Loss 5.9327 LearningRate 0.0005 Epoch: 13 Global Step: 288040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:34,771-Speed 6313.91 samples/sec Loss 5.8869 
LearningRate 0.0005 Epoch: 13 Global Step: 288050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:38,017-Speed 6310.73 samples/sec Loss 5.8932 LearningRate 0.0005 Epoch: 13 Global Step: 288060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:41,266-Speed 6305.10 samples/sec Loss 5.8773 LearningRate 0.0005 Epoch: 13 Global Step: 288070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:44,512-Speed 6309.97 samples/sec Loss 5.8968 LearningRate 0.0005 Epoch: 13 Global Step: 288080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:47,744-Speed 6337.55 samples/sec Loss 5.8122 LearningRate 0.0005 Epoch: 13 Global Step: 288090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:50,986-Speed 6318.79 samples/sec Loss 5.7793 LearningRate 0.0005 Epoch: 13 Global Step: 288100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:54,236-Speed 6303.59 samples/sec Loss 5.8080 LearningRate 0.0005 Epoch: 13 Global Step: 288110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:55:57,485-Speed 6303.76 samples/sec Loss 5.8255 LearningRate 0.0005 Epoch: 13 Global Step: 288120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:00,732-Speed 6309.36 samples/sec Loss 5.8133 LearningRate 0.0005 Epoch: 13 Global Step: 288130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:03,977-Speed 6311.83 samples/sec Loss 5.8489 LearningRate 0.0005 Epoch: 13 Global Step: 288140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:07,221-Speed 6314.85 samples/sec Loss 5.8063 LearningRate 0.0005 Epoch: 13 Global Step: 288150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:10,466-Speed 6314.05 samples/sec Loss 5.7302 LearningRate 0.0005 Epoch: 13 Global Step: 288160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:13,716-Speed 6301.45 samples/sec Loss 5.7195 LearningRate 0.0005 Epoch: 13 
Global Step: 288170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:16,964-Speed 6306.94 samples/sec Loss 5.8686 LearningRate 0.0005 Epoch: 13 Global Step: 288180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:20,195-Speed 6339.69 samples/sec Loss 5.7573 LearningRate 0.0005 Epoch: 13 Global Step: 288190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:23,442-Speed 6309.90 samples/sec Loss 5.8206 LearningRate 0.0005 Epoch: 13 Global Step: 288200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:26,691-Speed 6304.26 samples/sec Loss 5.8255 LearningRate 0.0005 Epoch: 13 Global Step: 288210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:29,939-Speed 6306.53 samples/sec Loss 5.7937 LearningRate 0.0005 Epoch: 13 Global Step: 288220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:33,193-Speed 6296.03 samples/sec Loss 5.8721 LearningRate 0.0005 Epoch: 13 Global Step: 288230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:36,436-Speed 6317.06 samples/sec Loss 5.8148 LearningRate 0.0005 Epoch: 13 Global Step: 288240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:39,684-Speed 6306.65 samples/sec Loss 5.8475 LearningRate 0.0005 Epoch: 13 Global Step: 288250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:42,929-Speed 6312.98 samples/sec Loss 5.8496 LearningRate 0.0005 Epoch: 13 Global Step: 288260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:46,173-Speed 6315.87 samples/sec Loss 5.8305 LearningRate 0.0005 Epoch: 13 Global Step: 288270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:49,420-Speed 6308.54 samples/sec Loss 5.7708 LearningRate 0.0005 Epoch: 13 Global Step: 288280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:52,697-Speed 6250.68 samples/sec Loss 5.8870 LearningRate 0.0005 Epoch: 13 Global Step: 288290 Fp16 Grad 
Scale: 65536 Required: 49 hours Training: 2022-04-01 17:56:55,929-Speed 6337.33 samples/sec Loss 5.8753 LearningRate 0.0005 Epoch: 13 Global Step: 288300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:56:59,177-Speed 6307.21 samples/sec Loss 5.8719 LearningRate 0.0005 Epoch: 13 Global Step: 288310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:02,425-Speed 6307.54 samples/sec Loss 5.8408 LearningRate 0.0005 Epoch: 13 Global Step: 288320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:05,672-Speed 6309.26 samples/sec Loss 5.8953 LearningRate 0.0005 Epoch: 13 Global Step: 288330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:08,923-Speed 6300.35 samples/sec Loss 5.8951 LearningRate 0.0005 Epoch: 13 Global Step: 288340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:12,173-Speed 6302.52 samples/sec Loss 5.8243 LearningRate 0.0005 Epoch: 13 Global Step: 288350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:15,422-Speed 6304.10 samples/sec Loss 5.7754 LearningRate 0.0005 Epoch: 13 Global Step: 288360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:18,674-Speed 6299.96 samples/sec Loss 5.7615 LearningRate 0.0005 Epoch: 13 Global Step: 288370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:21,921-Speed 6308.83 samples/sec Loss 5.7814 LearningRate 0.0005 Epoch: 13 Global Step: 288380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:25,174-Speed 6296.54 samples/sec Loss 5.8266 LearningRate 0.0005 Epoch: 13 Global Step: 288390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:28,407-Speed 6336.05 samples/sec Loss 5.7916 LearningRate 0.0005 Epoch: 13 Global Step: 288400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:31,653-Speed 6310.21 samples/sec Loss 5.7956 LearningRate 0.0005 Epoch: 13 Global Step: 288410 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 17:57:34,915-Speed 6281.37 samples/sec Loss 5.8421 LearningRate 0.0005 Epoch: 13 Global Step: 288420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:38,163-Speed 6305.37 samples/sec Loss 5.8429 LearningRate 0.0005 Epoch: 13 Global Step: 288430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:41,412-Speed 6305.98 samples/sec Loss 5.8955 LearningRate 0.0005 Epoch: 13 Global Step: 288440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:44,658-Speed 6310.99 samples/sec Loss 5.8352 LearningRate 0.0005 Epoch: 13 Global Step: 288450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:47,904-Speed 6309.85 samples/sec Loss 5.8541 LearningRate 0.0005 Epoch: 13 Global Step: 288460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:51,151-Speed 6309.10 samples/sec Loss 5.8771 LearningRate 0.0005 Epoch: 13 Global Step: 288470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:54,395-Speed 6314.59 samples/sec Loss 5.8989 LearningRate 0.0005 Epoch: 13 Global Step: 288480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:57:57,641-Speed 6311.96 samples/sec Loss 5.7595 LearningRate 0.0005 Epoch: 13 Global Step: 288490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:00,875-Speed 6333.59 samples/sec Loss 5.7854 LearningRate 0.0005 Epoch: 13 Global Step: 288500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:04,124-Speed 6305.87 samples/sec Loss 5.8657 LearningRate 0.0005 Epoch: 13 Global Step: 288510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:07,369-Speed 6311.70 samples/sec Loss 5.8632 LearningRate 0.0005 Epoch: 13 Global Step: 288520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:10,619-Speed 6302.77 samples/sec Loss 5.7949 LearningRate 0.0005 Epoch: 13 Global Step: 288530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
17:58:13,866-Speed 6308.82 samples/sec Loss 5.8643 LearningRate 0.0005 Epoch: 13 Global Step: 288540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:17,111-Speed 6312.36 samples/sec Loss 5.8300 LearningRate 0.0005 Epoch: 13 Global Step: 288550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:20,354-Speed 6316.36 samples/sec Loss 5.8679 LearningRate 0.0005 Epoch: 13 Global Step: 288560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:23,599-Speed 6312.95 samples/sec Loss 5.8703 LearningRate 0.0005 Epoch: 13 Global Step: 288570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:26,846-Speed 6308.95 samples/sec Loss 5.8179 LearningRate 0.0005 Epoch: 13 Global Step: 288580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:30,089-Speed 6316.60 samples/sec Loss 5.8072 LearningRate 0.0005 Epoch: 13 Global Step: 288590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:33,320-Speed 6339.69 samples/sec Loss 5.7611 LearningRate 0.0005 Epoch: 13 Global Step: 288600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:36,565-Speed 6312.21 samples/sec Loss 5.8380 LearningRate 0.0005 Epoch: 13 Global Step: 288610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:39,813-Speed 6308.36 samples/sec Loss 5.8586 LearningRate 0.0005 Epoch: 13 Global Step: 288620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:43,061-Speed 6306.58 samples/sec Loss 5.8320 LearningRate 0.0005 Epoch: 13 Global Step: 288630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:46,305-Speed 6314.39 samples/sec Loss 5.8630 LearningRate 0.0005 Epoch: 13 Global Step: 288640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:49,554-Speed 6304.37 samples/sec Loss 5.8365 LearningRate 0.0005 Epoch: 13 Global Step: 288650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:52,805-Speed 6301.54 
samples/sec Loss 5.8095 LearningRate 0.0005 Epoch: 13 Global Step: 288660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:56,051-Speed 6309.22 samples/sec Loss 5.8554 LearningRate 0.0005 Epoch: 13 Global Step: 288670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:58:59,297-Speed 6312.07 samples/sec Loss 5.7221 LearningRate 0.0005 Epoch: 13 Global Step: 288680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:02,541-Speed 6314.68 samples/sec Loss 5.9134 LearningRate 0.0005 Epoch: 13 Global Step: 288690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:05,771-Speed 6341.98 samples/sec Loss 5.8160 LearningRate 0.0005 Epoch: 13 Global Step: 288700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:09,019-Speed 6306.90 samples/sec Loss 5.8549 LearningRate 0.0005 Epoch: 13 Global Step: 288710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:12,265-Speed 6311.32 samples/sec Loss 5.7889 LearningRate 0.0005 Epoch: 13 Global Step: 288720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:15,531-Speed 6272.63 samples/sec Loss 5.8074 LearningRate 0.0005 Epoch: 13 Global Step: 288730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:18,782-Speed 6302.02 samples/sec Loss 5.8730 LearningRate 0.0005 Epoch: 13 Global Step: 288740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:22,032-Speed 6302.54 samples/sec Loss 5.8198 LearningRate 0.0005 Epoch: 13 Global Step: 288750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:25,288-Speed 6291.37 samples/sec Loss 5.8901 LearningRate 0.0005 Epoch: 13 Global Step: 288760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:28,531-Speed 6315.69 samples/sec Loss 5.8112 LearningRate 0.0005 Epoch: 13 Global Step: 288770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:31,782-Speed 6300.05 samples/sec Loss 5.8484 
LearningRate 0.0005 Epoch: 13 Global Step: 288780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:35,029-Speed 6309.11 samples/sec Loss 5.7763 LearningRate 0.0005 Epoch: 13 Global Step: 288790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:38,259-Speed 6341.59 samples/sec Loss 5.8554 LearningRate 0.0005 Epoch: 13 Global Step: 288800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:41,510-Speed 6300.99 samples/sec Loss 5.8237 LearningRate 0.0005 Epoch: 13 Global Step: 288810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:44,758-Speed 6307.55 samples/sec Loss 5.8473 LearningRate 0.0005 Epoch: 13 Global Step: 288820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:48,010-Speed 6298.63 samples/sec Loss 5.7622 LearningRate 0.0005 Epoch: 13 Global Step: 288830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:51,254-Speed 6315.70 samples/sec Loss 5.8647 LearningRate 0.0005 Epoch: 13 Global Step: 288840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:54,507-Speed 6297.18 samples/sec Loss 5.8584 LearningRate 0.0005 Epoch: 13 Global Step: 288850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 17:59:57,753-Speed 6309.58 samples/sec Loss 5.8030 LearningRate 0.0005 Epoch: 13 Global Step: 288860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:00,998-Speed 6313.16 samples/sec Loss 5.7531 LearningRate 0.0005 Epoch: 13 Global Step: 288870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:04,247-Speed 6305.69 samples/sec Loss 5.8355 LearningRate 0.0005 Epoch: 13 Global Step: 288880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:07,490-Speed 6316.49 samples/sec Loss 5.7962 LearningRate 0.0005 Epoch: 13 Global Step: 288890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:10,731-Speed 6323.14 samples/sec Loss 5.8028 LearningRate 0.0005 Epoch: 13 
Global Step: 288900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:13,982-Speed 6301.18 samples/sec Loss 5.8637 LearningRate 0.0005 Epoch: 13 Global Step: 288910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:17,227-Speed 6313.31 samples/sec Loss 5.7609 LearningRate 0.0005 Epoch: 13 Global Step: 288920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:20,474-Speed 6308.76 samples/sec Loss 5.8419 LearningRate 0.0005 Epoch: 13 Global Step: 288930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:23,722-Speed 6305.74 samples/sec Loss 5.7892 LearningRate 0.0005 Epoch: 13 Global Step: 288940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:26,965-Speed 6316.60 samples/sec Loss 5.8590 LearningRate 0.0005 Epoch: 13 Global Step: 288950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:30,212-Speed 6308.88 samples/sec Loss 5.8541 LearningRate 0.0005 Epoch: 13 Global Step: 288960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:33,459-Speed 6309.75 samples/sec Loss 5.7682 LearningRate 0.0005 Epoch: 13 Global Step: 288970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:36,703-Speed 6314.53 samples/sec Loss 5.8383 LearningRate 0.0005 Epoch: 13 Global Step: 288980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:39,952-Speed 6305.46 samples/sec Loss 5.8942 LearningRate 0.0005 Epoch: 13 Global Step: 288990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:43,187-Speed 6331.47 samples/sec Loss 5.9160 LearningRate 0.0005 Epoch: 13 Global Step: 289000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:46,429-Speed 6318.43 samples/sec Loss 5.8179 LearningRate 0.0005 Epoch: 13 Global Step: 289010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:49,677-Speed 6307.22 samples/sec Loss 5.8755 LearningRate 0.0005 Epoch: 13 Global Step: 289020 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:52,921-Speed 6314.26 samples/sec Loss 5.8568 LearningRate 0.0005 Epoch: 13 Global Step: 289030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:56,204-Speed 6239.07 samples/sec Loss 5.8746 LearningRate 0.0005 Epoch: 13 Global Step: 289040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:00:59,448-Speed 6314.86 samples/sec Loss 5.7972 LearningRate 0.0005 Epoch: 13 Global Step: 289050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:02,697-Speed 6304.71 samples/sec Loss 5.8648 LearningRate 0.0005 Epoch: 13 Global Step: 289060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:05,944-Speed 6309.48 samples/sec Loss 5.8345 LearningRate 0.0005 Epoch: 13 Global Step: 289070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:09,188-Speed 6314.37 samples/sec Loss 5.8177 LearningRate 0.0005 Epoch: 13 Global Step: 289080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:12,431-Speed 6316.71 samples/sec Loss 5.8603 LearningRate 0.0005 Epoch: 13 Global Step: 289090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:15,690-Speed 6286.44 samples/sec Loss 5.8663 LearningRate 0.0005 Epoch: 13 Global Step: 289100 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:01:18,925-Speed 6331.96 samples/sec Loss 5.7938 LearningRate 0.0005 Epoch: 13 Global Step: 289110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:22,174-Speed 6304.53 samples/sec Loss 5.8021 LearningRate 0.0005 Epoch: 13 Global Step: 289120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:25,423-Speed 6305.79 samples/sec Loss 5.9284 LearningRate 0.0005 Epoch: 13 Global Step: 289130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:28,671-Speed 6306.76 samples/sec Loss 5.8379 LearningRate 0.0005 Epoch: 13 Global Step: 289140 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:01:31,918-Speed 6308.40 samples/sec Loss 5.8536 LearningRate 0.0005 Epoch: 13 Global Step: 289150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:35,162-Speed 6314.81 samples/sec Loss 5.8197 LearningRate 0.0005 Epoch: 13 Global Step: 289160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:38,407-Speed 6311.19 samples/sec Loss 5.9005 LearningRate 0.0005 Epoch: 13 Global Step: 289170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:41,655-Speed 6308.20 samples/sec Loss 5.7792 LearningRate 0.0005 Epoch: 13 Global Step: 289180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:44,904-Speed 6305.05 samples/sec Loss 5.7964 LearningRate 0.0005 Epoch: 13 Global Step: 289190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:48,153-Speed 6304.59 samples/sec Loss 5.9111 LearningRate 0.0005 Epoch: 13 Global Step: 289200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:51,384-Speed 6339.30 samples/sec Loss 5.8322 LearningRate 0.0005 Epoch: 13 Global Step: 289210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:54,634-Speed 6302.76 samples/sec Loss 5.8148 LearningRate 0.0005 Epoch: 13 Global Step: 289220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:01:57,880-Speed 6310.55 samples/sec Loss 5.8020 LearningRate 0.0005 Epoch: 13 Global Step: 289230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:01,126-Speed 6310.42 samples/sec Loss 5.8722 LearningRate 0.0005 Epoch: 13 Global Step: 289240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:04,376-Speed 6303.10 samples/sec Loss 5.8380 LearningRate 0.0005 Epoch: 13 Global Step: 289250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:07,624-Speed 6307.37 samples/sec Loss 5.8737 LearningRate 0.0005 Epoch: 13 Global Step: 289260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:02:10,871-Speed 6309.03 samples/sec Loss 5.9098 LearningRate 0.0005 Epoch: 13 Global Step: 289270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:14,117-Speed 6311.00 samples/sec Loss 5.8326 LearningRate 0.0005 Epoch: 13 Global Step: 289280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:17,354-Speed 6327.91 samples/sec Loss 5.7564 LearningRate 0.0005 Epoch: 13 Global Step: 289290 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:20,599-Speed 6312.90 samples/sec Loss 5.8370 LearningRate 0.0005 Epoch: 13 Global Step: 289300 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:23,849-Speed 6304.32 samples/sec Loss 5.7978 LearningRate 0.0005 Epoch: 13 Global Step: 289310 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:27,094-Speed 6312.36 samples/sec Loss 5.7937 LearningRate 0.0005 Epoch: 13 Global Step: 289320 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:30,343-Speed 6305.20 samples/sec Loss 5.8463 LearningRate 0.0005 Epoch: 13 Global Step: 289330 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:33,590-Speed 6307.46 samples/sec Loss 5.8374 LearningRate 0.0005 Epoch: 13 Global Step: 289340 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:36,834-Speed 6315.23 samples/sec Loss 5.8166 LearningRate 0.0005 Epoch: 13 Global Step: 289350 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:40,082-Speed 6305.96 samples/sec Loss 5.7330 LearningRate 0.0005 Epoch: 13 Global Step: 289360 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:43,331-Speed 6305.58 samples/sec Loss 5.7868 LearningRate 0.0005 Epoch: 13 Global Step: 289370 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:46,589-Speed 6287.19 samples/sec Loss 5.8348 LearningRate 0.0005 Epoch: 13 Global Step: 289380 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:02:49,841-Speed 6299.17 
samples/sec Loss 5.8107 LearningRate 0.0005 Epoch: 13 Global Step: 289390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:53,089-Speed 6306.65 samples/sec Loss 5.8191 LearningRate 0.0005 Epoch: 13 Global Step: 289400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:56,336-Speed 6309.91 samples/sec Loss 5.7999 LearningRate 0.0005 Epoch: 13 Global Step: 289410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:02:59,582-Speed 6309.80 samples/sec Loss 5.8259 LearningRate 0.0005 Epoch: 13 Global Step: 289420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:02,834-Speed 6298.07 samples/sec Loss 5.7954 LearningRate 0.0005 Epoch: 13 Global Step: 289430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:06,084-Speed 6303.48 samples/sec Loss 5.9079 LearningRate 0.0005 Epoch: 13 Global Step: 289440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:09,329-Speed 6313.23 samples/sec Loss 5.7435 LearningRate 0.0005 Epoch: 13 Global Step: 289450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:12,590-Speed 6281.10 samples/sec Loss 5.8635 LearningRate 0.0005 Epoch: 13 Global Step: 289460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:15,835-Speed 6312.86 samples/sec Loss 5.8818 LearningRate 0.0005 Epoch: 13 Global Step: 289470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:19,079-Speed 6314.48 samples/sec Loss 5.8698 LearningRate 0.0005 Epoch: 13 Global Step: 289480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:22,313-Speed 6334.58 samples/sec Loss 5.7945 LearningRate 0.0005 Epoch: 13 Global Step: 289490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:25,566-Speed 6297.93 samples/sec Loss 5.8734 LearningRate 0.0005 Epoch: 13 Global Step: 289500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:28,812-Speed 6310.38 samples/sec Loss 5.8593 
LearningRate 0.0005 Epoch: 13 Global Step: 289510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:32,069-Speed 6289.49 samples/sec Loss 5.8372 LearningRate 0.0005 Epoch: 13 Global Step: 289520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:35,316-Speed 6308.41 samples/sec Loss 5.6915 LearningRate 0.0005 Epoch: 13 Global Step: 289530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:38,574-Speed 6287.36 samples/sec Loss 5.8683 LearningRate 0.0005 Epoch: 13 Global Step: 289540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:41,826-Speed 6299.54 samples/sec Loss 5.8278 LearningRate 0.0005 Epoch: 13 Global Step: 289550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:45,088-Speed 6279.23 samples/sec Loss 5.7731 LearningRate 0.0005 Epoch: 13 Global Step: 289560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:48,340-Speed 6298.81 samples/sec Loss 5.8182 LearningRate 0.0005 Epoch: 13 Global Step: 289570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:51,668-Speed 6155.79 samples/sec Loss 5.7696 LearningRate 0.0005 Epoch: 13 Global Step: 289580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:54,913-Speed 6311.54 samples/sec Loss 5.8299 LearningRate 0.0005 Epoch: 13 Global Step: 289590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:03:58,164-Speed 6300.95 samples/sec Loss 5.7974 LearningRate 0.0005 Epoch: 13 Global Step: 289600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:01,418-Speed 6296.27 samples/sec Loss 5.7952 LearningRate 0.0005 Epoch: 13 Global Step: 289610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:04,662-Speed 6314.32 samples/sec Loss 5.8331 LearningRate 0.0005 Epoch: 13 Global Step: 289620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:07,908-Speed 6310.42 samples/sec Loss 5.7806 LearningRate 0.0005 Epoch: 13 
Global Step: 289630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:11,155-Speed 6308.92 samples/sec Loss 5.8437 LearningRate 0.0005 Epoch: 13 Global Step: 289640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:14,403-Speed 6306.75 samples/sec Loss 5.6995 LearningRate 0.0005 Epoch: 13 Global Step: 289650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:17,660-Speed 6289.05 samples/sec Loss 5.8210 LearningRate 0.0005 Epoch: 13 Global Step: 289660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:20,909-Speed 6305.63 samples/sec Loss 5.8708 LearningRate 0.0005 Epoch: 13 Global Step: 289670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:24,165-Speed 6290.64 samples/sec Loss 5.8381 LearningRate 0.0005 Epoch: 13 Global Step: 289680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:27,403-Speed 6326.91 samples/sec Loss 5.8477 LearningRate 0.0005 Epoch: 13 Global Step: 289690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:30,649-Speed 6310.61 samples/sec Loss 5.8257 LearningRate 0.0005 Epoch: 13 Global Step: 289700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:33,898-Speed 6305.33 samples/sec Loss 5.8204 LearningRate 0.0005 Epoch: 13 Global Step: 289710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:37,153-Speed 6293.03 samples/sec Loss 5.8422 LearningRate 0.0005 Epoch: 13 Global Step: 289720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:40,407-Speed 6295.38 samples/sec Loss 5.8636 LearningRate 0.0005 Epoch: 13 Global Step: 289730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:43,655-Speed 6307.96 samples/sec Loss 5.7743 LearningRate 0.0005 Epoch: 13 Global Step: 289740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:46,897-Speed 6316.70 samples/sec Loss 5.8757 LearningRate 0.0005 Epoch: 13 Global Step: 289750 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:50,144-Speed 6309.49 samples/sec Loss 5.8218 LearningRate 0.0005 Epoch: 13 Global Step: 289760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:53,391-Speed 6308.36 samples/sec Loss 5.9428 LearningRate 0.0005 Epoch: 13 Global Step: 289770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:56,642-Speed 6302.26 samples/sec Loss 5.8715 LearningRate 0.0005 Epoch: 13 Global Step: 289780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:04:59,887-Speed 6311.06 samples/sec Loss 5.7840 LearningRate 0.0005 Epoch: 13 Global Step: 289790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:03,143-Speed 6292.21 samples/sec Loss 5.8641 LearningRate 0.0005 Epoch: 13 Global Step: 289800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:06,390-Speed 6308.20 samples/sec Loss 5.8016 LearningRate 0.0005 Epoch: 13 Global Step: 289810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:09,637-Speed 6309.06 samples/sec Loss 5.7736 LearningRate 0.0005 Epoch: 13 Global Step: 289820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:12,902-Speed 6274.96 samples/sec Loss 5.9265 LearningRate 0.0005 Epoch: 13 Global Step: 289830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:16,154-Speed 6297.57 samples/sec Loss 5.7551 LearningRate 0.0005 Epoch: 13 Global Step: 289840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:19,401-Speed 6310.15 samples/sec Loss 5.7301 LearningRate 0.0005 Epoch: 13 Global Step: 289850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:22,647-Speed 6309.01 samples/sec Loss 5.8282 LearningRate 0.0005 Epoch: 13 Global Step: 289860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:25,897-Speed 6304.57 samples/sec Loss 5.8537 LearningRate 0.0005 Epoch: 13 Global Step: 289870 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:05:29,144-Speed 6307.30 samples/sec Loss 5.8778 LearningRate 0.0005 Epoch: 13 Global Step: 289880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:32,399-Speed 6295.19 samples/sec Loss 5.8538 LearningRate 0.0005 Epoch: 13 Global Step: 289890 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:05:35,632-Speed 6334.97 samples/sec Loss 5.7009 LearningRate 0.0005 Epoch: 13 Global Step: 289900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:38,879-Speed 6309.92 samples/sec Loss 5.8411 LearningRate 0.0005 Epoch: 13 Global Step: 289910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:42,125-Speed 6309.95 samples/sec Loss 5.8773 LearningRate 0.0005 Epoch: 13 Global Step: 289920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:45,375-Speed 6303.64 samples/sec Loss 5.8435 LearningRate 0.0005 Epoch: 13 Global Step: 289930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:48,624-Speed 6306.05 samples/sec Loss 5.8303 LearningRate 0.0005 Epoch: 13 Global Step: 289940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:51,879-Speed 6291.48 samples/sec Loss 5.8162 LearningRate 0.0005 Epoch: 13 Global Step: 289950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:55,128-Speed 6306.28 samples/sec Loss 5.9441 LearningRate 0.0005 Epoch: 13 Global Step: 289960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:05:58,371-Speed 6315.97 samples/sec Loss 5.7706 LearningRate 0.0005 Epoch: 13 Global Step: 289970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:01,625-Speed 6295.21 samples/sec Loss 5.8218 LearningRate 0.0005 Epoch: 13 Global Step: 289980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:04,876-Speed 6301.10 samples/sec Loss 5.7675 LearningRate 0.0005 Epoch: 13 Global Step: 289990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:06:08,106-Speed 6342.11 samples/sec Loss 5.8657 LearningRate 0.0005 Epoch: 13 Global Step: 290000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:11,359-Speed 6296.61 samples/sec Loss 5.8415 LearningRate 0.0005 Epoch: 13 Global Step: 290010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:14,607-Speed 6307.95 samples/sec Loss 5.8076 LearningRate 0.0005 Epoch: 13 Global Step: 290020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:17,854-Speed 6308.32 samples/sec Loss 5.8975 LearningRate 0.0005 Epoch: 13 Global Step: 290030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:21,102-Speed 6306.83 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 13 Global Step: 290040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:24,361-Speed 6286.02 samples/sec Loss 5.8042 LearningRate 0.0005 Epoch: 13 Global Step: 290050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:27,607-Speed 6310.69 samples/sec Loss 5.8041 LearningRate 0.0005 Epoch: 13 Global Step: 290060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:30,853-Speed 6310.87 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 13 Global Step: 290070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:34,107-Speed 6295.05 samples/sec Loss 5.8284 LearningRate 0.0005 Epoch: 13 Global Step: 290080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:37,350-Speed 6314.68 samples/sec Loss 5.7921 LearningRate 0.0005 Epoch: 13 Global Step: 290090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:40,584-Speed 6334.57 samples/sec Loss 5.8404 LearningRate 0.0005 Epoch: 13 Global Step: 290100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:43,848-Speed 6276.22 samples/sec Loss 5.8060 LearningRate 0.0005 Epoch: 13 Global Step: 290110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:47,096-Speed 6307.20 
samples/sec Loss 5.8516 LearningRate 0.0005 Epoch: 13 Global Step: 290120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:50,343-Speed 6309.33 samples/sec Loss 5.8100 LearningRate 0.0005 Epoch: 13 Global Step: 290130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:53,594-Speed 6301.06 samples/sec Loss 5.8221 LearningRate 0.0005 Epoch: 13 Global Step: 290140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:06:56,848-Speed 6296.12 samples/sec Loss 5.8970 LearningRate 0.0005 Epoch: 13 Global Step: 290150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:00,095-Speed 6308.87 samples/sec Loss 5.8554 LearningRate 0.0005 Epoch: 13 Global Step: 290160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:03,347-Speed 6298.03 samples/sec Loss 5.8233 LearningRate 0.0005 Epoch: 13 Global Step: 290170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:06,598-Speed 6300.55 samples/sec Loss 5.7985 LearningRate 0.0005 Epoch: 13 Global Step: 290180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:09,849-Speed 6301.22 samples/sec Loss 5.8621 LearningRate 0.0005 Epoch: 13 Global Step: 290190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:13,083-Speed 6333.65 samples/sec Loss 5.8370 LearningRate 0.0005 Epoch: 13 Global Step: 290200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:16,327-Speed 6315.60 samples/sec Loss 5.8939 LearningRate 0.0005 Epoch: 13 Global Step: 290210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:19,592-Speed 6274.23 samples/sec Loss 5.8667 LearningRate 0.0005 Epoch: 13 Global Step: 290220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:22,838-Speed 6309.27 samples/sec Loss 5.8742 LearningRate 0.0005 Epoch: 13 Global Step: 290230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:26,081-Speed 6317.71 samples/sec Loss 5.8118 
LearningRate 0.0005 Epoch: 13 Global Step: 290240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:07:29,314-Speed 6335.93 samples/sec Loss 5.8059 LearningRate 0.0005 Epoch: 13 Global Step: 290250 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:32,560-Speed 6311.14 samples/sec Loss 5.8379 LearningRate 0.0005 Epoch: 13 Global Step: 290260 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:35,808-Speed 6305.86 samples/sec Loss 5.8544 LearningRate 0.0005 Epoch: 13 Global Step: 290270 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:39,054-Speed 6311.71 samples/sec Loss 5.8387 LearningRate 0.0005 Epoch: 13 Global Step: 290280 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:42,301-Speed 6307.93 samples/sec Loss 5.7999 LearningRate 0.0005 Epoch: 13 Global Step: 290290 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:45,544-Speed 6317.91 samples/sec Loss 5.8056 LearningRate 0.0005 Epoch: 13 Global Step: 290300 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:48,793-Speed 6303.59 samples/sec Loss 5.8560 LearningRate 0.0005 Epoch: 13 Global Step: 290310 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:52,044-Speed 6301.16 samples/sec Loss 5.8968 LearningRate 0.0005 Epoch: 13 Global Step: 290320 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:55,289-Speed 6312.34 samples/sec Loss 5.8476 LearningRate 0.0005 Epoch: 13 Global Step: 290330 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:07:58,539-Speed 6304.31 samples/sec Loss 5.7509 LearningRate 0.0005 Epoch: 13 Global Step: 290340 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:08:01,785-Speed 6310.12 samples/sec Loss 5.7712 LearningRate 0.0005 Epoch: 13 Global Step: 290350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:01,374-Speed 343.69 samples/sec Loss 5.8171 LearningRate 0.0005 Epoch: 14 
Global Step: 290360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:04,617-Speed 6315.89 samples/sec Loss 5.7729 LearningRate 0.0005 Epoch: 14 Global Step: 290370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:07,855-Speed 6327.50 samples/sec Loss 5.8831 LearningRate 0.0005 Epoch: 14 Global Step: 290380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:11,096-Speed 6320.48 samples/sec Loss 5.8490 LearningRate 0.0005 Epoch: 14 Global Step: 290390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:14,332-Speed 6329.73 samples/sec Loss 5.8550 LearningRate 0.0005 Epoch: 14 Global Step: 290400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:17,569-Speed 6328.20 samples/sec Loss 5.8229 LearningRate 0.0005 Epoch: 14 Global Step: 290410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:20,807-Speed 6326.06 samples/sec Loss 5.8280 LearningRate 0.0005 Epoch: 14 Global Step: 290420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:24,048-Speed 6320.64 samples/sec Loss 5.9279 LearningRate 0.0005 Epoch: 14 Global Step: 290430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:27,284-Speed 6329.59 samples/sec Loss 5.8175 LearningRate 0.0005 Epoch: 14 Global Step: 290440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:30,511-Speed 6348.79 samples/sec Loss 5.7978 LearningRate 0.0005 Epoch: 14 Global Step: 290450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:33,753-Speed 6317.42 samples/sec Loss 5.8563 LearningRate 0.0005 Epoch: 14 Global Step: 290460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:36,996-Speed 6317.65 samples/sec Loss 5.8311 LearningRate 0.0005 Epoch: 14 Global Step: 290470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:40,234-Speed 6327.32 samples/sec Loss 5.8740 LearningRate 0.0005 Epoch: 14 Global Step: 290480 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:43,477-Speed 6315.35 samples/sec Loss 5.6905 LearningRate 0.0005 Epoch: 14 Global Step: 290490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:46,723-Speed 6310.44 samples/sec Loss 5.7266 LearningRate 0.0005 Epoch: 14 Global Step: 290500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:49,963-Speed 6323.70 samples/sec Loss 5.7737 LearningRate 0.0005 Epoch: 14 Global Step: 290510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:53,210-Speed 6307.97 samples/sec Loss 5.8306 LearningRate 0.0005 Epoch: 14 Global Step: 290520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:56,452-Speed 6319.48 samples/sec Loss 5.8034 LearningRate 0.0005 Epoch: 14 Global Step: 290530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:09:59,691-Speed 6325.49 samples/sec Loss 5.7570 LearningRate 0.0005 Epoch: 14 Global Step: 290540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:02,932-Speed 6319.59 samples/sec Loss 5.8569 LearningRate 0.0005 Epoch: 14 Global Step: 290550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:06,183-Speed 6302.23 samples/sec Loss 5.8062 LearningRate 0.0005 Epoch: 14 Global Step: 290560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:09,422-Speed 6323.01 samples/sec Loss 5.7449 LearningRate 0.0005 Epoch: 14 Global Step: 290570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:12,662-Speed 6323.37 samples/sec Loss 5.8547 LearningRate 0.0005 Epoch: 14 Global Step: 290580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:15,901-Speed 6322.29 samples/sec Loss 5.8342 LearningRate 0.0005 Epoch: 14 Global Step: 290590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:19,141-Speed 6323.99 samples/sec Loss 5.6586 LearningRate 0.0005 Epoch: 14 Global Step: 290600 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:10:22,383-Speed 6318.19 samples/sec Loss 5.7243 LearningRate 0.0005 Epoch: 14 Global Step: 290610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:25,624-Speed 6319.41 samples/sec Loss 5.8001 LearningRate 0.0005 Epoch: 14 Global Step: 290620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:28,866-Speed 6318.28 samples/sec Loss 5.8598 LearningRate 0.0005 Epoch: 14 Global Step: 290630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:32,107-Speed 6322.36 samples/sec Loss 5.7901 LearningRate 0.0005 Epoch: 14 Global Step: 290640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:35,349-Speed 6317.32 samples/sec Loss 5.7667 LearningRate 0.0005 Epoch: 14 Global Step: 290650 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:10:38,578-Speed 6344.76 samples/sec Loss 5.8317 LearningRate 0.0005 Epoch: 14 Global Step: 290660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:41,823-Speed 6311.23 samples/sec Loss 5.7681 LearningRate 0.0005 Epoch: 14 Global Step: 290670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:45,067-Speed 6315.71 samples/sec Loss 5.8737 LearningRate 0.0005 Epoch: 14 Global Step: 290680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:48,307-Speed 6321.15 samples/sec Loss 5.8046 LearningRate 0.0005 Epoch: 14 Global Step: 290690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:51,559-Speed 6301.28 samples/sec Loss 5.8283 LearningRate 0.0005 Epoch: 14 Global Step: 290700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:54,800-Speed 6319.48 samples/sec Loss 5.8000 LearningRate 0.0005 Epoch: 14 Global Step: 290710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:10:58,086-Speed 6233.42 samples/sec Loss 5.8256 LearningRate 0.0005 Epoch: 14 Global Step: 290720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:11:01,343-Speed 6290.83 samples/sec Loss 5.7590 LearningRate 0.0005 Epoch: 14 Global Step: 290730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:04,585-Speed 6319.07 samples/sec Loss 5.7982 LearningRate 0.0005 Epoch: 14 Global Step: 290740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:07,827-Speed 6318.58 samples/sec Loss 5.8249 LearningRate 0.0005 Epoch: 14 Global Step: 290750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:11,066-Speed 6323.32 samples/sec Loss 5.8340 LearningRate 0.0005 Epoch: 14 Global Step: 290760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:14,322-Speed 6292.05 samples/sec Loss 5.7842 LearningRate 0.0005 Epoch: 14 Global Step: 290770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:17,566-Speed 6314.65 samples/sec Loss 5.7902 LearningRate 0.0005 Epoch: 14 Global Step: 290780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:20,809-Speed 6316.32 samples/sec Loss 5.8242 LearningRate 0.0005 Epoch: 14 Global Step: 290790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:24,050-Speed 6319.79 samples/sec Loss 5.8261 LearningRate 0.0005 Epoch: 14 Global Step: 290800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:27,298-Speed 6307.89 samples/sec Loss 5.8002 LearningRate 0.0005 Epoch: 14 Global Step: 290810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:30,539-Speed 6320.58 samples/sec Loss 5.8103 LearningRate 0.0005 Epoch: 14 Global Step: 290820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:33,781-Speed 6317.91 samples/sec Loss 5.8058 LearningRate 0.0005 Epoch: 14 Global Step: 290830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:37,022-Speed 6319.67 samples/sec Loss 5.7617 LearningRate 0.0005 Epoch: 14 Global Step: 290840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:40,265-Speed 6316.94 
samples/sec Loss 5.7973 LearningRate 0.0005 Epoch: 14 Global Step: 290850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:43,493-Speed 6345.85 samples/sec Loss 5.8434 LearningRate 0.0005 Epoch: 14 Global Step: 290860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:46,744-Speed 6301.53 samples/sec Loss 5.8312 LearningRate 0.0005 Epoch: 14 Global Step: 290870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:49,988-Speed 6314.21 samples/sec Loss 5.8261 LearningRate 0.0005 Epoch: 14 Global Step: 290880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:53,284-Speed 6215.61 samples/sec Loss 5.8360 LearningRate 0.0005 Epoch: 14 Global Step: 290890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:56,541-Speed 6289.08 samples/sec Loss 5.7866 LearningRate 0.0005 Epoch: 14 Global Step: 290900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:11:59,783-Speed 6319.18 samples/sec Loss 5.6972 LearningRate 0.0005 Epoch: 14 Global Step: 290910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:03,026-Speed 6316.28 samples/sec Loss 5.8302 LearningRate 0.0005 Epoch: 14 Global Step: 290920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:06,277-Speed 6300.32 samples/sec Loss 5.8133 LearningRate 0.0005 Epoch: 14 Global Step: 290930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:09,518-Speed 6320.97 samples/sec Loss 5.7393 LearningRate 0.0005 Epoch: 14 Global Step: 290940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:12,765-Speed 6308.72 samples/sec Loss 5.8777 LearningRate 0.0005 Epoch: 14 Global Step: 290950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:15,996-Speed 6340.24 samples/sec Loss 5.7890 LearningRate 0.0005 Epoch: 14 Global Step: 290960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:19,245-Speed 6306.12 samples/sec Loss 5.7948 
LearningRate 0.0005 Epoch: 14 Global Step: 290970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:22,490-Speed 6312.07 samples/sec Loss 5.7346 LearningRate 0.0005 Epoch: 14 Global Step: 290980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:25,743-Speed 6297.33 samples/sec Loss 5.8895 LearningRate 0.0005 Epoch: 14 Global Step: 290990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:28,985-Speed 6317.53 samples/sec Loss 5.6991 LearningRate 0.0005 Epoch: 14 Global Step: 291000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:32,226-Speed 6320.65 samples/sec Loss 5.7502 LearningRate 0.0005 Epoch: 14 Global Step: 291010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:35,470-Speed 6314.42 samples/sec Loss 5.7968 LearningRate 0.0005 Epoch: 14 Global Step: 291020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:38,713-Speed 6316.59 samples/sec Loss 5.8141 LearningRate 0.0005 Epoch: 14 Global Step: 291030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:41,954-Speed 6320.14 samples/sec Loss 5.8134 LearningRate 0.0005 Epoch: 14 Global Step: 291040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:45,200-Speed 6310.80 samples/sec Loss 5.8732 LearningRate 0.0005 Epoch: 14 Global Step: 291050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:48,425-Speed 6351.18 samples/sec Loss 5.7463 LearningRate 0.0005 Epoch: 14 Global Step: 291060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:51,669-Speed 6316.12 samples/sec Loss 5.7643 LearningRate 0.0005 Epoch: 14 Global Step: 291070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:54,912-Speed 6315.26 samples/sec Loss 5.8159 LearningRate 0.0005 Epoch: 14 Global Step: 291080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:12:58,157-Speed 6313.85 samples/sec Loss 5.8294 LearningRate 0.0005 Epoch: 14 
Global Step: 291090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:01,408-Speed 6301.82 samples/sec Loss 5.8853 LearningRate 0.0005 Epoch: 14 Global Step: 291100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:04,652-Speed 6314.11 samples/sec Loss 5.8581 LearningRate 0.0005 Epoch: 14 Global Step: 291110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:07,896-Speed 6313.95 samples/sec Loss 5.8168 LearningRate 0.0005 Epoch: 14 Global Step: 291120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:11,139-Speed 6317.68 samples/sec Loss 5.8152 LearningRate 0.0005 Epoch: 14 Global Step: 291130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:14,393-Speed 6293.89 samples/sec Loss 5.8307 LearningRate 0.0005 Epoch: 14 Global Step: 291140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:17,636-Speed 6316.68 samples/sec Loss 5.8594 LearningRate 0.0005 Epoch: 14 Global Step: 291150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:20,876-Speed 6321.88 samples/sec Loss 5.8511 LearningRate 0.0005 Epoch: 14 Global Step: 291160 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:13:24,107-Speed 6340.79 samples/sec Loss 5.8871 LearningRate 0.0005 Epoch: 14 Global Step: 291170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:27,352-Speed 6312.50 samples/sec Loss 5.8149 LearningRate 0.0005 Epoch: 14 Global Step: 291180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:30,601-Speed 6306.28 samples/sec Loss 5.8167 LearningRate 0.0005 Epoch: 14 Global Step: 291190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:33,846-Speed 6313.29 samples/sec Loss 5.8607 LearningRate 0.0005 Epoch: 14 Global Step: 291200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:37,087-Speed 6319.15 samples/sec Loss 5.8412 LearningRate 0.0005 Epoch: 14 Global Step: 291210 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:40,330-Speed 6316.48 samples/sec Loss 5.8597 LearningRate 0.0005 Epoch: 14 Global Step: 291220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:43,572-Speed 6318.54 samples/sec Loss 5.7953 LearningRate 0.0005 Epoch: 14 Global Step: 291230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:46,816-Speed 6314.60 samples/sec Loss 5.7999 LearningRate 0.0005 Epoch: 14 Global Step: 291240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:50,110-Speed 6218.77 samples/sec Loss 5.7871 LearningRate 0.0005 Epoch: 14 Global Step: 291250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:53,367-Speed 6289.89 samples/sec Loss 5.8455 LearningRate 0.0005 Epoch: 14 Global Step: 291260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:56,598-Speed 6340.21 samples/sec Loss 5.7590 LearningRate 0.0005 Epoch: 14 Global Step: 291270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:13:59,861-Speed 6278.69 samples/sec Loss 5.8022 LearningRate 0.0005 Epoch: 14 Global Step: 291280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:03,101-Speed 6321.99 samples/sec Loss 5.7860 LearningRate 0.0005 Epoch: 14 Global Step: 291290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:06,343-Speed 6317.96 samples/sec Loss 5.7583 LearningRate 0.0005 Epoch: 14 Global Step: 291300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:09,584-Speed 6319.90 samples/sec Loss 5.8522 LearningRate 0.0005 Epoch: 14 Global Step: 291310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:12,836-Speed 6298.88 samples/sec Loss 5.8046 LearningRate 0.0005 Epoch: 14 Global Step: 291320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:16,076-Speed 6322.73 samples/sec Loss 5.8128 LearningRate 0.0005 Epoch: 14 Global Step: 291330 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:14:19,326-Speed 6302.95 samples/sec Loss 5.7933 LearningRate 0.0005 Epoch: 14 Global Step: 291340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:22,586-Speed 6283.81 samples/sec Loss 5.8110 LearningRate 0.0005 Epoch: 14 Global Step: 291350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:25,831-Speed 6312.49 samples/sec Loss 5.8383 LearningRate 0.0005 Epoch: 14 Global Step: 291360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:29,063-Speed 6338.73 samples/sec Loss 5.7801 LearningRate 0.0005 Epoch: 14 Global Step: 291370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:32,303-Speed 6321.83 samples/sec Loss 5.8034 LearningRate 0.0005 Epoch: 14 Global Step: 291380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:35,550-Speed 6309.48 samples/sec Loss 5.7926 LearningRate 0.0005 Epoch: 14 Global Step: 291390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:38,790-Speed 6322.51 samples/sec Loss 5.7248 LearningRate 0.0005 Epoch: 14 Global Step: 291400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:42,036-Speed 6311.16 samples/sec Loss 5.8360 LearningRate 0.0005 Epoch: 14 Global Step: 291410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:45,277-Speed 6319.49 samples/sec Loss 5.8452 LearningRate 0.0005 Epoch: 14 Global Step: 291420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:48,521-Speed 6314.32 samples/sec Loss 5.8405 LearningRate 0.0005 Epoch: 14 Global Step: 291430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:51,763-Speed 6319.63 samples/sec Loss 5.8389 LearningRate 0.0005 Epoch: 14 Global Step: 291440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:14:55,003-Speed 6323.03 samples/sec Loss 5.8120 LearningRate 0.0005 Epoch: 14 Global Step: 291450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:14:58,249-Speed 6310.73 samples/sec Loss 5.7626 LearningRate 0.0005 Epoch: 14 Global Step: 291460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:01,490-Speed 6320.38 samples/sec Loss 5.8677 LearningRate 0.0005 Epoch: 14 Global Step: 291470 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:15:04,727-Speed 6327.55 samples/sec Loss 5.8071 LearningRate 0.0005 Epoch: 14 Global Step: 291480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:08,011-Speed 6238.79 samples/sec Loss 5.7864 LearningRate 0.0005 Epoch: 14 Global Step: 291490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:11,262-Speed 6299.30 samples/sec Loss 5.8529 LearningRate 0.0005 Epoch: 14 Global Step: 291500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:14,512-Speed 6304.32 samples/sec Loss 5.7658 LearningRate 0.0005 Epoch: 14 Global Step: 291510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:17,756-Speed 6313.66 samples/sec Loss 5.8493 LearningRate 0.0005 Epoch: 14 Global Step: 291520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:21,000-Speed 6314.25 samples/sec Loss 5.7628 LearningRate 0.0005 Epoch: 14 Global Step: 291530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:24,258-Speed 6287.93 samples/sec Loss 5.7907 LearningRate 0.0005 Epoch: 14 Global Step: 291540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:27,503-Speed 6312.46 samples/sec Loss 5.7426 LearningRate 0.0005 Epoch: 14 Global Step: 291550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:30,748-Speed 6312.60 samples/sec Loss 5.8618 LearningRate 0.0005 Epoch: 14 Global Step: 291560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:33,988-Speed 6321.93 samples/sec Loss 5.8269 LearningRate 0.0005 Epoch: 14 Global Step: 291570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:37,220-Speed 6339.23 
samples/sec Loss 5.7966 LearningRate 0.0005 Epoch: 14 Global Step: 291580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:40,465-Speed 6311.64 samples/sec Loss 5.8596 LearningRate 0.0005 Epoch: 14 Global Step: 291590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:43,711-Speed 6311.44 samples/sec Loss 5.8144 LearningRate 0.0005 Epoch: 14 Global Step: 291600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:46,958-Speed 6309.17 samples/sec Loss 5.8777 LearningRate 0.0005 Epoch: 14 Global Step: 291610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:50,200-Speed 6317.48 samples/sec Loss 5.8041 LearningRate 0.0005 Epoch: 14 Global Step: 291620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:53,444-Speed 6315.09 samples/sec Loss 5.7515 LearningRate 0.0005 Epoch: 14 Global Step: 291630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:56,687-Speed 6317.19 samples/sec Loss 5.6792 LearningRate 0.0005 Epoch: 14 Global Step: 291640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:15:59,933-Speed 6312.38 samples/sec Loss 5.7780 LearningRate 0.0005 Epoch: 14 Global Step: 291650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:03,177-Speed 6314.74 samples/sec Loss 5.7664 LearningRate 0.0005 Epoch: 14 Global Step: 291660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:06,427-Speed 6303.26 samples/sec Loss 5.7815 LearningRate 0.0005 Epoch: 14 Global Step: 291670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:09,656-Speed 6343.05 samples/sec Loss 5.8502 LearningRate 0.0005 Epoch: 14 Global Step: 291680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:12,897-Speed 6321.15 samples/sec Loss 5.9074 LearningRate 0.0005 Epoch: 14 Global Step: 291690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:16,141-Speed 6314.14 samples/sec Loss 5.8273 
LearningRate 0.0005 Epoch: 14 Global Step: 291700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:19,389-Speed 6306.12 samples/sec Loss 5.7042 LearningRate 0.0005 Epoch: 14 Global Step: 291710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:22,637-Speed 6307.66 samples/sec Loss 5.8009 LearningRate 0.0005 Epoch: 14 Global Step: 291720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:25,880-Speed 6316.19 samples/sec Loss 5.7874 LearningRate 0.0005 Epoch: 14 Global Step: 291730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:29,124-Speed 6314.54 samples/sec Loss 5.8109 LearningRate 0.0005 Epoch: 14 Global Step: 291740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:32,376-Speed 6298.82 samples/sec Loss 5.8446 LearningRate 0.0005 Epoch: 14 Global Step: 291750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:35,618-Speed 6318.22 samples/sec Loss 5.8076 LearningRate 0.0005 Epoch: 14 Global Step: 291760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:38,865-Speed 6308.15 samples/sec Loss 5.8261 LearningRate 0.0005 Epoch: 14 Global Step: 291770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:42,118-Speed 6298.48 samples/sec Loss 5.7027 LearningRate 0.0005 Epoch: 14 Global Step: 291780 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:16:45,346-Speed 6345.16 samples/sec Loss 5.7648 LearningRate 0.0005 Epoch: 14 Global Step: 291790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:48,592-Speed 6311.02 samples/sec Loss 5.8640 LearningRate 0.0005 Epoch: 14 Global Step: 291800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:51,836-Speed 6315.49 samples/sec Loss 5.7787 LearningRate 0.0005 Epoch: 14 Global Step: 291810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:55,078-Speed 6317.99 samples/sec Loss 5.8638 LearningRate 0.0005 Epoch: 14 
Global Step: 291820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:16:58,322-Speed 6314.91 samples/sec Loss 5.7651 LearningRate 0.0005 Epoch: 14 Global Step: 291830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:01,569-Speed 6307.55 samples/sec Loss 5.8201 LearningRate 0.0005 Epoch: 14 Global Step: 291840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:04,814-Speed 6313.40 samples/sec Loss 5.8307 LearningRate 0.0005 Epoch: 14 Global Step: 291850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:08,062-Speed 6305.82 samples/sec Loss 5.7851 LearningRate 0.0005 Epoch: 14 Global Step: 291860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:11,312-Speed 6304.64 samples/sec Loss 5.8496 LearningRate 0.0005 Epoch: 14 Global Step: 291870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:14,556-Speed 6314.22 samples/sec Loss 5.8299 LearningRate 0.0005 Epoch: 14 Global Step: 291880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:17,788-Speed 6339.60 samples/sec Loss 5.8000 LearningRate 0.0005 Epoch: 14 Global Step: 291890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:21,034-Speed 6309.88 samples/sec Loss 5.7691 LearningRate 0.0005 Epoch: 14 Global Step: 291900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:24,278-Speed 6314.34 samples/sec Loss 5.7123 LearningRate 0.0005 Epoch: 14 Global Step: 291910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:27,527-Speed 6304.44 samples/sec Loss 5.8321 LearningRate 0.0005 Epoch: 14 Global Step: 291920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:30,771-Speed 6315.77 samples/sec Loss 5.8203 LearningRate 0.0005 Epoch: 14 Global Step: 291930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:34,015-Speed 6313.98 samples/sec Loss 5.8313 LearningRate 0.0005 Epoch: 14 Global Step: 291940 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:37,263-Speed 6306.55 samples/sec Loss 5.8339 LearningRate 0.0005 Epoch: 14 Global Step: 291950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:40,508-Speed 6313.96 samples/sec Loss 5.7668 LearningRate 0.0005 Epoch: 14 Global Step: 291960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:43,754-Speed 6309.76 samples/sec Loss 5.8103 LearningRate 0.0005 Epoch: 14 Global Step: 291970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:47,009-Speed 6293.04 samples/sec Loss 5.8520 LearningRate 0.0005 Epoch: 14 Global Step: 291980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:50,238-Speed 6344.52 samples/sec Loss 5.8795 LearningRate 0.0005 Epoch: 14 Global Step: 291990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:53,482-Speed 6315.48 samples/sec Loss 5.8418 LearningRate 0.0005 Epoch: 14 Global Step: 292000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:56,726-Speed 6314.14 samples/sec Loss 5.7964 LearningRate 0.0005 Epoch: 14 Global Step: 292010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:17:59,972-Speed 6310.28 samples/sec Loss 5.8314 LearningRate 0.0005 Epoch: 14 Global Step: 292020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:03,221-Speed 6305.40 samples/sec Loss 5.8553 LearningRate 0.0005 Epoch: 14 Global Step: 292030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:06,467-Speed 6310.52 samples/sec Loss 5.8383 LearningRate 0.0005 Epoch: 14 Global Step: 292040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:09,711-Speed 6314.86 samples/sec Loss 5.7635 LearningRate 0.0005 Epoch: 14 Global Step: 292050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:12,955-Speed 6313.63 samples/sec Loss 5.8007 LearningRate 0.0005 Epoch: 14 Global Step: 292060 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:18:16,185-Speed 6342.67 samples/sec Loss 5.7701 LearningRate 0.0005 Epoch: 14 Global Step: 292070 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:19,427-Speed 6318.47 samples/sec Loss 5.7975 LearningRate 0.0005 Epoch: 14 Global Step: 292080 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:22,670-Speed 6316.59 samples/sec Loss 5.8568 LearningRate 0.0005 Epoch: 14 Global Step: 292090 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:25,918-Speed 6306.98 samples/sec Loss 5.7700 LearningRate 0.0005 Epoch: 14 Global Step: 292100 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:29,161-Speed 6316.80 samples/sec Loss 5.7217 LearningRate 0.0005 Epoch: 14 Global Step: 292110 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:32,404-Speed 6317.33 samples/sec Loss 5.7925 LearningRate 0.0005 Epoch: 14 Global Step: 292120 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:35,649-Speed 6312.24 samples/sec Loss 5.7985 LearningRate 0.0005 Epoch: 14 Global Step: 292130 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:38,896-Speed 6309.24 samples/sec Loss 5.7994 LearningRate 0.0005 Epoch: 14 Global Step: 292140 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:42,142-Speed 6310.47 samples/sec Loss 5.8095 LearningRate 0.0005 Epoch: 14 Global Step: 292150 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:45,386-Speed 6315.12 samples/sec Loss 5.7047 LearningRate 0.0005 Epoch: 14 Global Step: 292160 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:18:48,629-Speed 6316.89 samples/sec Loss 5.8573 LearningRate 0.0005 Epoch: 14 Global Step: 292170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:51,897-Speed 6267.44 samples/sec Loss 5.8316 LearningRate 0.0005 Epoch: 14 Global Step: 292180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:18:55,197-Speed 6206.35 samples/sec Loss 5.8143 LearningRate 0.0005 Epoch: 14 Global Step: 292190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:18:58,448-Speed 6302.36 samples/sec Loss 5.7866 LearningRate 0.0005 Epoch: 14 Global Step: 292200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:01,695-Speed 6307.32 samples/sec Loss 5.8164 LearningRate 0.0005 Epoch: 14 Global Step: 292210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:04,952-Speed 6289.99 samples/sec Loss 5.8369 LearningRate 0.0005 Epoch: 14 Global Step: 292220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:08,202-Speed 6302.82 samples/sec Loss 5.7995 LearningRate 0.0005 Epoch: 14 Global Step: 292230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:11,448-Speed 6310.59 samples/sec Loss 5.8600 LearningRate 0.0005 Epoch: 14 Global Step: 292240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:14,694-Speed 6310.21 samples/sec Loss 5.8095 LearningRate 0.0005 Epoch: 14 Global Step: 292250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:17,940-Speed 6312.66 samples/sec Loss 5.7966 LearningRate 0.0005 Epoch: 14 Global Step: 292260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:21,172-Speed 6336.98 samples/sec Loss 5.8094 LearningRate 0.0005 Epoch: 14 Global Step: 292270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:24,515-Speed 6129.15 samples/sec Loss 5.7869 LearningRate 0.0005 Epoch: 14 Global Step: 292280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:27,785-Speed 6263.56 samples/sec Loss 5.7851 LearningRate 0.0005 Epoch: 14 Global Step: 292290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:31,031-Speed 6311.69 samples/sec Loss 5.8394 LearningRate 0.0005 Epoch: 14 Global Step: 292300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:34,279-Speed 6306.40 
samples/sec Loss 5.8156 LearningRate 0.0005 Epoch: 14 Global Step: 292310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:37,525-Speed 6310.57 samples/sec Loss 5.7657 LearningRate 0.0005 Epoch: 14 Global Step: 292320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:40,770-Speed 6313.61 samples/sec Loss 5.7386 LearningRate 0.0005 Epoch: 14 Global Step: 292330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:44,015-Speed 6312.07 samples/sec Loss 5.8105 LearningRate 0.0005 Epoch: 14 Global Step: 292340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:47,260-Speed 6312.59 samples/sec Loss 5.7993 LearningRate 0.0005 Epoch: 14 Global Step: 292350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:50,503-Speed 6317.00 samples/sec Loss 5.8116 LearningRate 0.0005 Epoch: 14 Global Step: 292360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:53,734-Speed 6338.92 samples/sec Loss 5.8265 LearningRate 0.0005 Epoch: 14 Global Step: 292370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:19:56,980-Speed 6310.45 samples/sec Loss 5.8586 LearningRate 0.0005 Epoch: 14 Global Step: 292380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:00,226-Speed 6312.15 samples/sec Loss 5.7820 LearningRate 0.0005 Epoch: 14 Global Step: 292390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:03,475-Speed 6304.00 samples/sec Loss 5.7771 LearningRate 0.0005 Epoch: 14 Global Step: 292400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:06,721-Speed 6310.09 samples/sec Loss 5.7236 LearningRate 0.0005 Epoch: 14 Global Step: 292410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:09,963-Speed 6318.33 samples/sec Loss 5.8452 LearningRate 0.0005 Epoch: 14 Global Step: 292420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:13,213-Speed 6303.48 samples/sec Loss 5.8493 
LearningRate 0.0005 Epoch: 14 Global Step: 292430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:16,458-Speed 6312.21 samples/sec Loss 5.8891 LearningRate 0.0005 Epoch: 14 Global Step: 292440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:19,713-Speed 6294.61 samples/sec Loss 5.8181 LearningRate 0.0005 Epoch: 14 Global Step: 292450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:22,969-Speed 6290.58 samples/sec Loss 5.8121 LearningRate 0.0005 Epoch: 14 Global Step: 292460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:26,236-Speed 6269.26 samples/sec Loss 5.7428 LearningRate 0.0005 Epoch: 14 Global Step: 292470 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:20:29,471-Speed 6332.60 samples/sec Loss 5.7431 LearningRate 0.0005 Epoch: 14 Global Step: 292480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:32,720-Speed 6304.95 samples/sec Loss 5.8564 LearningRate 0.0005 Epoch: 14 Global Step: 292490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:35,970-Speed 6304.82 samples/sec Loss 5.8082 LearningRate 0.0005 Epoch: 14 Global Step: 292500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:39,215-Speed 6311.66 samples/sec Loss 5.8026 LearningRate 0.0005 Epoch: 14 Global Step: 292510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:42,465-Speed 6303.88 samples/sec Loss 5.7838 LearningRate 0.0005 Epoch: 14 Global Step: 292520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:45,710-Speed 6313.31 samples/sec Loss 5.7886 LearningRate 0.0005 Epoch: 14 Global Step: 292530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:48,956-Speed 6309.80 samples/sec Loss 5.7937 LearningRate 0.0005 Epoch: 14 Global Step: 292540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:52,199-Speed 6317.26 samples/sec Loss 5.7673 LearningRate 0.0005 Epoch: 14 
Global Step: 292550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:55,443-Speed 6314.05 samples/sec Loss 5.7894 LearningRate 0.0005 Epoch: 14 Global Step: 292560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:20:58,691-Speed 6306.62 samples/sec Loss 5.8230 LearningRate 0.0005 Epoch: 14 Global Step: 292570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:01,921-Speed 6342.29 samples/sec Loss 5.7902 LearningRate 0.0005 Epoch: 14 Global Step: 292580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:05,167-Speed 6310.71 samples/sec Loss 5.7991 LearningRate 0.0005 Epoch: 14 Global Step: 292590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:08,398-Speed 6339.66 samples/sec Loss 5.8578 LearningRate 0.0005 Epoch: 14 Global Step: 292600 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:11,643-Speed 6312.10 samples/sec Loss 5.7819 LearningRate 0.0005 Epoch: 14 Global Step: 292610 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:14,890-Speed 6309.09 samples/sec Loss 5.7664 LearningRate 0.0005 Epoch: 14 Global Step: 292620 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:18,138-Speed 6307.72 samples/sec Loss 5.8191 LearningRate 0.0005 Epoch: 14 Global Step: 292630 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:21,384-Speed 6311.13 samples/sec Loss 5.7762 LearningRate 0.0005 Epoch: 14 Global Step: 292640 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:24,629-Speed 6311.92 samples/sec Loss 5.6521 LearningRate 0.0005 Epoch: 14 Global Step: 292650 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:27,873-Speed 6313.69 samples/sec Loss 5.8036 LearningRate 0.0005 Epoch: 14 Global Step: 292660 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:31,118-Speed 6313.03 samples/sec Loss 5.7985 LearningRate 0.0005 Epoch: 14 Global Step: 292670 Fp16 Grad 
Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:34,366-Speed 6307.42 samples/sec Loss 5.8043 LearningRate 0.0005 Epoch: 14 Global Step: 292680 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:37,615-Speed 6303.70 samples/sec Loss 5.8621 LearningRate 0.0005 Epoch: 14 Global Step: 292690 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:21:40,863-Speed 6307.73 samples/sec Loss 5.7327 LearningRate 0.0005 Epoch: 14 Global Step: 292700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:44,113-Speed 6304.68 samples/sec Loss 5.7874 LearningRate 0.0005 Epoch: 14 Global Step: 292710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:47,357-Speed 6313.81 samples/sec Loss 5.8102 LearningRate 0.0005 Epoch: 14 Global Step: 292720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:50,604-Speed 6312.74 samples/sec Loss 5.7781 LearningRate 0.0005 Epoch: 14 Global Step: 292730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:53,847-Speed 6316.45 samples/sec Loss 5.7457 LearningRate 0.0005 Epoch: 14 Global Step: 292740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:21:57,094-Speed 6309.14 samples/sec Loss 5.8289 LearningRate 0.0005 Epoch: 14 Global Step: 292750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:00,370-Speed 6250.89 samples/sec Loss 5.8228 LearningRate 0.0005 Epoch: 14 Global Step: 292760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:03,616-Speed 6310.97 samples/sec Loss 5.7974 LearningRate 0.0005 Epoch: 14 Global Step: 292770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:06,862-Speed 6310.23 samples/sec Loss 5.7268 LearningRate 0.0005 Epoch: 14 Global Step: 292780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:10,115-Speed 6298.74 samples/sec Loss 5.7461 LearningRate 0.0005 Epoch: 14 Global Step: 292790 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:22:13,347-Speed 6337.93 samples/sec Loss 5.7888 LearningRate 0.0005 Epoch: 14 Global Step: 292800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:16,597-Speed 6301.43 samples/sec Loss 5.7704 LearningRate 0.0005 Epoch: 14 Global Step: 292810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:19,852-Speed 6293.35 samples/sec Loss 5.7877 LearningRate 0.0005 Epoch: 14 Global Step: 292820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:23,099-Speed 6309.08 samples/sec Loss 5.8448 LearningRate 0.0005 Epoch: 14 Global Step: 292830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:26,350-Speed 6301.74 samples/sec Loss 5.8208 LearningRate 0.0005 Epoch: 14 Global Step: 292840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:29,608-Speed 6287.85 samples/sec Loss 5.7382 LearningRate 0.0005 Epoch: 14 Global Step: 292850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:32,853-Speed 6312.23 samples/sec Loss 5.8474 LearningRate 0.0005 Epoch: 14 Global Step: 292860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:36,101-Speed 6306.64 samples/sec Loss 5.8330 LearningRate 0.0005 Epoch: 14 Global Step: 292870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:39,350-Speed 6304.46 samples/sec Loss 5.6499 LearningRate 0.0005 Epoch: 14 Global Step: 292880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:42,590-Speed 6323.55 samples/sec Loss 5.7959 LearningRate 0.0005 Epoch: 14 Global Step: 292890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:45,822-Speed 6338.74 samples/sec Loss 5.8226 LearningRate 0.0005 Epoch: 14 Global Step: 292900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:49,070-Speed 6306.20 samples/sec Loss 5.8049 LearningRate 0.0005 Epoch: 14 Global Step: 292910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:22:52,322-Speed 6299.02 samples/sec Loss 5.7406 LearningRate 0.0005 Epoch: 14 Global Step: 292920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:55,566-Speed 6315.18 samples/sec Loss 5.8277 LearningRate 0.0005 Epoch: 14 Global Step: 292930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:22:58,809-Speed 6317.16 samples/sec Loss 5.7985 LearningRate 0.0005 Epoch: 14 Global Step: 292940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:02,067-Speed 6287.45 samples/sec Loss 5.7444 LearningRate 0.0005 Epoch: 14 Global Step: 292950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:05,318-Speed 6300.43 samples/sec Loss 5.7982 LearningRate 0.0005 Epoch: 14 Global Step: 292960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:08,564-Speed 6309.95 samples/sec Loss 5.7751 LearningRate 0.0005 Epoch: 14 Global Step: 292970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:11,812-Speed 6308.35 samples/sec Loss 5.7907 LearningRate 0.0005 Epoch: 14 Global Step: 292980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:15,060-Speed 6306.07 samples/sec Loss 5.8094 LearningRate 0.0005 Epoch: 14 Global Step: 292990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:18,304-Speed 6313.59 samples/sec Loss 5.8083 LearningRate 0.0005 Epoch: 14 Global Step: 293000 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:23:21,536-Speed 6339.74 samples/sec Loss 5.7357 LearningRate 0.0005 Epoch: 14 Global Step: 293010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:24,785-Speed 6305.03 samples/sec Loss 5.8264 LearningRate 0.0005 Epoch: 14 Global Step: 293020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:28,030-Speed 6311.42 samples/sec Loss 5.8003 LearningRate 0.0005 Epoch: 14 Global Step: 293030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:31,278-Speed 6307.16 
samples/sec Loss 5.7785 LearningRate 0.0005 Epoch: 14 Global Step: 293040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:34,523-Speed 6313.71 samples/sec Loss 5.8176 LearningRate 0.0005 Epoch: 14 Global Step: 293050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:37,766-Speed 6315.40 samples/sec Loss 5.8052 LearningRate 0.0005 Epoch: 14 Global Step: 293060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:41,012-Speed 6311.47 samples/sec Loss 5.7717 LearningRate 0.0005 Epoch: 14 Global Step: 293070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:44,256-Speed 6314.16 samples/sec Loss 5.8106 LearningRate 0.0005 Epoch: 14 Global Step: 293080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:47,504-Speed 6307.28 samples/sec Loss 5.7408 LearningRate 0.0005 Epoch: 14 Global Step: 293090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:50,749-Speed 6312.40 samples/sec Loss 5.8182 LearningRate 0.0005 Epoch: 14 Global Step: 293100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:53,997-Speed 6307.26 samples/sec Loss 5.8163 LearningRate 0.0005 Epoch: 14 Global Step: 293110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:23:57,242-Speed 6312.12 samples/sec Loss 5.7950 LearningRate 0.0005 Epoch: 14 Global Step: 293120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:00,490-Speed 6310.17 samples/sec Loss 5.8518 LearningRate 0.0005 Epoch: 14 Global Step: 293130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:03,736-Speed 6311.19 samples/sec Loss 5.7884 LearningRate 0.0005 Epoch: 14 Global Step: 293140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:06,980-Speed 6313.69 samples/sec Loss 5.8015 LearningRate 0.0005 Epoch: 14 Global Step: 293150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:10,225-Speed 6312.93 samples/sec Loss 5.7253 
LearningRate 0.0005 Epoch: 14 Global Step: 293160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:13,469-Speed 6315.54 samples/sec Loss 5.8456 LearningRate 0.0005 Epoch: 14 Global Step: 293170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:16,712-Speed 6315.01 samples/sec Loss 5.8427 LearningRate 0.0005 Epoch: 14 Global Step: 293180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:19,959-Speed 6309.04 samples/sec Loss 5.8006 LearningRate 0.0005 Epoch: 14 Global Step: 293190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:23,207-Speed 6307.17 samples/sec Loss 5.7970 LearningRate 0.0005 Epoch: 14 Global Step: 293200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:26,449-Speed 6318.32 samples/sec Loss 5.8576 LearningRate 0.0005 Epoch: 14 Global Step: 293210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:29,696-Speed 6309.04 samples/sec Loss 5.8202 LearningRate 0.0005 Epoch: 14 Global Step: 293220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:32,944-Speed 6306.11 samples/sec Loss 5.8191 LearningRate 0.0005 Epoch: 14 Global Step: 293230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:36,189-Speed 6312.47 samples/sec Loss 5.7792 LearningRate 0.0005 Epoch: 14 Global Step: 293240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:39,442-Speed 6297.03 samples/sec Loss 5.7611 LearningRate 0.0005 Epoch: 14 Global Step: 293250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:42,687-Speed 6314.41 samples/sec Loss 5.7627 LearningRate 0.0005 Epoch: 14 Global Step: 293260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:45,929-Speed 6317.46 samples/sec Loss 5.8607 LearningRate 0.0005 Epoch: 14 Global Step: 293270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:49,174-Speed 6312.18 samples/sec Loss 5.7226 LearningRate 0.0005 Epoch: 14 
Global Step: 293280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:52,418-Speed 6314.77 samples/sec Loss 5.7720 LearningRate 0.0005 Epoch: 14 Global Step: 293290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:55,663-Speed 6312.96 samples/sec Loss 5.7803 LearningRate 0.0005 Epoch: 14 Global Step: 293300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:24:58,897-Speed 6334.07 samples/sec Loss 5.7745 LearningRate 0.0005 Epoch: 14 Global Step: 293310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:02,141-Speed 6315.51 samples/sec Loss 5.7163 LearningRate 0.0005 Epoch: 14 Global Step: 293320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:05,385-Speed 6315.44 samples/sec Loss 5.8278 LearningRate 0.0005 Epoch: 14 Global Step: 293330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:08,629-Speed 6314.03 samples/sec Loss 5.8010 LearningRate 0.0005 Epoch: 14 Global Step: 293340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:11,874-Speed 6312.21 samples/sec Loss 5.8411 LearningRate 0.0005 Epoch: 14 Global Step: 293350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:15,123-Speed 6304.94 samples/sec Loss 5.8959 LearningRate 0.0005 Epoch: 14 Global Step: 293360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:18,356-Speed 6336.95 samples/sec Loss 5.7495 LearningRate 0.0005 Epoch: 14 Global Step: 293370 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:21,600-Speed 6314.57 samples/sec Loss 5.8365 LearningRate 0.0005 Epoch: 14 Global Step: 293380 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:24,847-Speed 6309.39 samples/sec Loss 5.7828 LearningRate 0.0005 Epoch: 14 Global Step: 293390 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:28,090-Speed 6316.54 samples/sec Loss 5.8296 LearningRate 0.0005 Epoch: 14 Global Step: 293400 Fp16 Grad 
Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:31,338-Speed 6305.91 samples/sec Loss 5.8150 LearningRate 0.0005 Epoch: 14 Global Step: 293410 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:34,581-Speed 6317.44 samples/sec Loss 5.8327 LearningRate 0.0005 Epoch: 14 Global Step: 293420 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:37,826-Speed 6312.58 samples/sec Loss 5.8142 LearningRate 0.0005 Epoch: 14 Global Step: 293430 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:41,074-Speed 6305.58 samples/sec Loss 5.8107 LearningRate 0.0005 Epoch: 14 Global Step: 293440 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:44,323-Speed 6304.73 samples/sec Loss 5.7279 LearningRate 0.0005 Epoch: 14 Global Step: 293450 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:47,569-Speed 6311.27 samples/sec Loss 5.8076 LearningRate 0.0005 Epoch: 14 Global Step: 293460 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:25:50,815-Speed 6311.30 samples/sec Loss 5.8782 LearningRate 0.0005 Epoch: 14 Global Step: 293470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:54,063-Speed 6306.15 samples/sec Loss 5.8140 LearningRate 0.0005 Epoch: 14 Global Step: 293480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:25:57,306-Speed 6317.55 samples/sec Loss 5.8816 LearningRate 0.0005 Epoch: 14 Global Step: 293490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:00,552-Speed 6309.99 samples/sec Loss 5.7816 LearningRate 0.0005 Epoch: 14 Global Step: 293500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:03,799-Speed 6309.23 samples/sec Loss 5.7760 LearningRate 0.0005 Epoch: 14 Global Step: 293510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:07,044-Speed 6313.07 samples/sec Loss 5.8143 LearningRate 0.0005 Epoch: 14 Global Step: 293520 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:26:10,295-Speed 6300.51 samples/sec Loss 5.8419 LearningRate 0.0005 Epoch: 14 Global Step: 293530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:13,544-Speed 6305.04 samples/sec Loss 5.7487 LearningRate 0.0005 Epoch: 14 Global Step: 293540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:16,790-Speed 6311.54 samples/sec Loss 5.8082 LearningRate 0.0005 Epoch: 14 Global Step: 293550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:20,033-Speed 6316.56 samples/sec Loss 5.7984 LearningRate 0.0005 Epoch: 14 Global Step: 293560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:23,274-Speed 6320.67 samples/sec Loss 5.8369 LearningRate 0.0005 Epoch: 14 Global Step: 293570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:26,516-Speed 6317.20 samples/sec Loss 5.8275 LearningRate 0.0005 Epoch: 14 Global Step: 293580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:29,762-Speed 6310.98 samples/sec Loss 5.7163 LearningRate 0.0005 Epoch: 14 Global Step: 293590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:33,008-Speed 6312.31 samples/sec Loss 5.8432 LearningRate 0.0005 Epoch: 14 Global Step: 293600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:36,253-Speed 6311.38 samples/sec Loss 5.8054 LearningRate 0.0005 Epoch: 14 Global Step: 293610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:39,499-Speed 6310.90 samples/sec Loss 5.8154 LearningRate 0.0005 Epoch: 14 Global Step: 293620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:42,742-Speed 6317.12 samples/sec Loss 5.7315 LearningRate 0.0005 Epoch: 14 Global Step: 293630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:45,988-Speed 6309.58 samples/sec Loss 5.7580 LearningRate 0.0005 Epoch: 14 Global Step: 293640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:26:49,235-Speed 6309.94 samples/sec Loss 5.7911 LearningRate 0.0005 Epoch: 14 Global Step: 293650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:52,480-Speed 6311.90 samples/sec Loss 5.7880 LearningRate 0.0005 Epoch: 14 Global Step: 293660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:26:55,726-Speed 6310.77 samples/sec Loss 5.7060 LearningRate 0.0005 Epoch: 14 Global Step: 293670 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:26:58,966-Speed 6321.28 samples/sec Loss 5.8288 LearningRate 0.0005 Epoch: 14 Global Step: 293680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:02,216-Speed 6303.80 samples/sec Loss 5.7980 LearningRate 0.0005 Epoch: 14 Global Step: 293690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:05,465-Speed 6305.54 samples/sec Loss 5.7968 LearningRate 0.0005 Epoch: 14 Global Step: 293700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:08,710-Speed 6312.17 samples/sec Loss 5.7817 LearningRate 0.0005 Epoch: 14 Global Step: 293710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:11,957-Speed 6308.39 samples/sec Loss 5.7275 LearningRate 0.0005 Epoch: 14 Global Step: 293720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:15,200-Speed 6316.14 samples/sec Loss 5.7720 LearningRate 0.0005 Epoch: 14 Global Step: 293730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:18,443-Speed 6316.20 samples/sec Loss 5.7272 LearningRate 0.0005 Epoch: 14 Global Step: 293740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:21,691-Speed 6307.56 samples/sec Loss 5.8202 LearningRate 0.0005 Epoch: 14 Global Step: 293750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:24,933-Speed 6319.50 samples/sec Loss 5.7816 LearningRate 0.0005 Epoch: 14 Global Step: 293760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:28,182-Speed 6304.55 
samples/sec Loss 5.7943 LearningRate 0.0005 Epoch: 14 Global Step: 293770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:31,436-Speed 6296.09 samples/sec Loss 5.6850 LearningRate 0.0005 Epoch: 14 Global Step: 293780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:34,681-Speed 6313.30 samples/sec Loss 5.8048 LearningRate 0.0005 Epoch: 14 Global Step: 293790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:37,925-Speed 6314.10 samples/sec Loss 5.7610 LearningRate 0.0005 Epoch: 14 Global Step: 293800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:41,170-Speed 6311.35 samples/sec Loss 5.7269 LearningRate 0.0005 Epoch: 14 Global Step: 293810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:44,416-Speed 6311.73 samples/sec Loss 5.7018 LearningRate 0.0005 Epoch: 14 Global Step: 293820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:47,660-Speed 6314.44 samples/sec Loss 5.7552 LearningRate 0.0005 Epoch: 14 Global Step: 293830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:50,903-Speed 6316.08 samples/sec Loss 5.7354 LearningRate 0.0005 Epoch: 14 Global Step: 293840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:54,149-Speed 6311.30 samples/sec Loss 5.7985 LearningRate 0.0005 Epoch: 14 Global Step: 293850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:27:57,403-Speed 6295.02 samples/sec Loss 5.7466 LearningRate 0.0005 Epoch: 14 Global Step: 293860 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:00,646-Speed 6315.58 samples/sec Loss 5.7351 LearningRate 0.0005 Epoch: 14 Global Step: 293870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:03,884-Speed 6328.15 samples/sec Loss 5.7456 LearningRate 0.0005 Epoch: 14 Global Step: 293880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:07,131-Speed 6308.46 samples/sec Loss 5.7743 
LearningRate 0.0005 Epoch: 14 Global Step: 293890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:10,374-Speed 6315.25 samples/sec Loss 5.8100 LearningRate 0.0005 Epoch: 14 Global Step: 293900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:13,624-Speed 6303.18 samples/sec Loss 5.7747 LearningRate 0.0005 Epoch: 14 Global Step: 293910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:16,870-Speed 6311.31 samples/sec Loss 5.8144 LearningRate 0.0005 Epoch: 14 Global Step: 293920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:20,111-Speed 6318.99 samples/sec Loss 5.8006 LearningRate 0.0005 Epoch: 14 Global Step: 293930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:23,360-Speed 6305.69 samples/sec Loss 5.7724 LearningRate 0.0005 Epoch: 14 Global Step: 293940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:26,602-Speed 6318.96 samples/sec Loss 5.8167 LearningRate 0.0005 Epoch: 14 Global Step: 293950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:29,852-Speed 6303.76 samples/sec Loss 5.8380 LearningRate 0.0005 Epoch: 14 Global Step: 293960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:33,094-Speed 6318.84 samples/sec Loss 5.7607 LearningRate 0.0005 Epoch: 14 Global Step: 293970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:36,340-Speed 6309.76 samples/sec Loss 5.7569 LearningRate 0.0005 Epoch: 14 Global Step: 293980 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:28:39,572-Speed 6338.40 samples/sec Loss 5.8094 LearningRate 0.0005 Epoch: 14 Global Step: 293990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:42,821-Speed 6305.65 samples/sec Loss 5.7468 LearningRate 0.0005 Epoch: 14 Global Step: 294000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:46,070-Speed 6305.26 samples/sec Loss 5.7122 LearningRate 0.0005 Epoch: 14 
Global Step: 294010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:49,318-Speed 6306.85 samples/sec Loss 5.8806 LearningRate 0.0005 Epoch: 14 Global Step: 294020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:52,564-Speed 6310.54 samples/sec Loss 5.7961 LearningRate 0.0005 Epoch: 14 Global Step: 294030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:55,807-Speed 6315.56 samples/sec Loss 5.7371 LearningRate 0.0005 Epoch: 14 Global Step: 294040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:28:59,058-Speed 6301.71 samples/sec Loss 5.7960 LearningRate 0.0005 Epoch: 14 Global Step: 294050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:02,308-Speed 6303.53 samples/sec Loss 5.7695 LearningRate 0.0005 Epoch: 14 Global Step: 294060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:05,568-Speed 6283.05 samples/sec Loss 5.8114 LearningRate 0.0005 Epoch: 14 Global Step: 294070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:08,814-Speed 6311.00 samples/sec Loss 5.8490 LearningRate 0.0005 Epoch: 14 Global Step: 294080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:12,045-Speed 6339.84 samples/sec Loss 5.7516 LearningRate 0.0005 Epoch: 14 Global Step: 294090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:15,292-Speed 6308.46 samples/sec Loss 5.8574 LearningRate 0.0005 Epoch: 14 Global Step: 294100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:18,540-Speed 6307.10 samples/sec Loss 5.7852 LearningRate 0.0005 Epoch: 14 Global Step: 294110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:21,785-Speed 6312.37 samples/sec Loss 5.7017 LearningRate 0.0005 Epoch: 14 Global Step: 294120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:25,030-Speed 6312.12 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 14 Global Step: 294130 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:28,273-Speed 6316.52 samples/sec Loss 5.7679 LearningRate 0.0005 Epoch: 14 Global Step: 294140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:31,518-Speed 6313.54 samples/sec Loss 5.7060 LearningRate 0.0005 Epoch: 14 Global Step: 294150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:34,762-Speed 6314.53 samples/sec Loss 5.7830 LearningRate 0.0005 Epoch: 14 Global Step: 294160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:38,011-Speed 6305.74 samples/sec Loss 5.7820 LearningRate 0.0005 Epoch: 14 Global Step: 294170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:41,254-Speed 6315.46 samples/sec Loss 5.7202 LearningRate 0.0005 Epoch: 14 Global Step: 294180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:44,486-Speed 6339.86 samples/sec Loss 5.8099 LearningRate 0.0005 Epoch: 14 Global Step: 294190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:47,730-Speed 6313.77 samples/sec Loss 5.8108 LearningRate 0.0005 Epoch: 14 Global Step: 294200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:50,975-Speed 6312.08 samples/sec Loss 5.7683 LearningRate 0.0005 Epoch: 14 Global Step: 294210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:54,223-Speed 6308.17 samples/sec Loss 5.7766 LearningRate 0.0005 Epoch: 14 Global Step: 294220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:29:57,470-Speed 6306.96 samples/sec Loss 5.8000 LearningRate 0.0005 Epoch: 14 Global Step: 294230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:00,725-Speed 6293.77 samples/sec Loss 5.8233 LearningRate 0.0005 Epoch: 14 Global Step: 294240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:03,972-Speed 6307.81 samples/sec Loss 5.7554 LearningRate 0.0005 Epoch: 14 Global Step: 294250 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:30:07,229-Speed 6291.36 samples/sec Loss 5.7615 LearningRate 0.0005 Epoch: 14 Global Step: 294260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:10,477-Speed 6306.05 samples/sec Loss 5.8555 LearningRate 0.0005 Epoch: 14 Global Step: 294270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:13,720-Speed 6316.37 samples/sec Loss 5.7513 LearningRate 0.0005 Epoch: 14 Global Step: 294280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:16,961-Speed 6319.85 samples/sec Loss 5.7896 LearningRate 0.0005 Epoch: 14 Global Step: 294290 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:30:20,191-Speed 6342.39 samples/sec Loss 5.8026 LearningRate 0.0005 Epoch: 14 Global Step: 294300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:23,441-Speed 6302.71 samples/sec Loss 5.8342 LearningRate 0.0005 Epoch: 14 Global Step: 294310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:26,687-Speed 6311.31 samples/sec Loss 5.8257 LearningRate 0.0005 Epoch: 14 Global Step: 294320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:29,933-Speed 6310.45 samples/sec Loss 5.8330 LearningRate 0.0005 Epoch: 14 Global Step: 294330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:33,179-Speed 6310.01 samples/sec Loss 5.8152 LearningRate 0.0005 Epoch: 14 Global Step: 294340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:36,433-Speed 6296.83 samples/sec Loss 5.7676 LearningRate 0.0005 Epoch: 14 Global Step: 294350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:39,675-Speed 6317.52 samples/sec Loss 5.8019 LearningRate 0.0005 Epoch: 14 Global Step: 294360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:42,919-Speed 6315.53 samples/sec Loss 5.8086 LearningRate 0.0005 Epoch: 14 Global Step: 294370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:30:46,161-Speed 6318.03 samples/sec Loss 5.8442 LearningRate 0.0005 Epoch: 14 Global Step: 294380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:49,413-Speed 6300.44 samples/sec Loss 5.8412 LearningRate 0.0005 Epoch: 14 Global Step: 294390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:52,645-Speed 6337.10 samples/sec Loss 5.7769 LearningRate 0.0005 Epoch: 14 Global Step: 294400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:55,891-Speed 6310.72 samples/sec Loss 5.7362 LearningRate 0.0005 Epoch: 14 Global Step: 294410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:30:59,144-Speed 6296.69 samples/sec Loss 5.7770 LearningRate 0.0005 Epoch: 14 Global Step: 294420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:02,395-Speed 6301.34 samples/sec Loss 5.7346 LearningRate 0.0005 Epoch: 14 Global Step: 294430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:05,646-Speed 6301.35 samples/sec Loss 5.7652 LearningRate 0.0005 Epoch: 14 Global Step: 294440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:08,894-Speed 6307.95 samples/sec Loss 5.7134 LearningRate 0.0005 Epoch: 14 Global Step: 294450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:12,140-Speed 6309.33 samples/sec Loss 5.7791 LearningRate 0.0005 Epoch: 14 Global Step: 294460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:15,467-Speed 6157.71 samples/sec Loss 5.7664 LearningRate 0.0005 Epoch: 14 Global Step: 294470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:18,779-Speed 6185.95 samples/sec Loss 5.8016 LearningRate 0.0005 Epoch: 14 Global Step: 294480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:22,029-Speed 6301.42 samples/sec Loss 5.8899 LearningRate 0.0005 Epoch: 14 Global Step: 294490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:25,261-Speed 6339.39 
samples/sec Loss 5.8541 LearningRate 0.0005 Epoch: 14 Global Step: 294500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:28,516-Speed 6293.46 samples/sec Loss 5.7346 LearningRate 0.0005 Epoch: 14 Global Step: 294510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:31,765-Speed 6304.65 samples/sec Loss 5.7780 LearningRate 0.0005 Epoch: 14 Global Step: 294520 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:35,009-Speed 6312.80 samples/sec Loss 5.7817 LearningRate 0.0005 Epoch: 14 Global Step: 294530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:38,255-Speed 6311.10 samples/sec Loss 5.7860 LearningRate 0.0005 Epoch: 14 Global Step: 294540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:41,502-Speed 6309.36 samples/sec Loss 5.7519 LearningRate 0.0005 Epoch: 14 Global Step: 294550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:44,746-Speed 6314.75 samples/sec Loss 5.7226 LearningRate 0.0005 Epoch: 14 Global Step: 294560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:47,989-Speed 6316.83 samples/sec Loss 5.8023 LearningRate 0.0005 Epoch: 14 Global Step: 294570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:51,233-Speed 6314.79 samples/sec Loss 5.7732 LearningRate 0.0005 Epoch: 14 Global Step: 294580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:54,491-Speed 6288.47 samples/sec Loss 5.8008 LearningRate 0.0005 Epoch: 14 Global Step: 294590 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:31:57,727-Speed 6329.23 samples/sec Loss 5.7717 LearningRate 0.0005 Epoch: 14 Global Step: 294600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:00,975-Speed 6307.15 samples/sec Loss 5.7178 LearningRate 0.0005 Epoch: 14 Global Step: 294610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:04,232-Speed 6290.49 samples/sec Loss 5.7682 
LearningRate 0.0005 Epoch: 14 Global Step: 294620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:07,475-Speed 6316.24 samples/sec Loss 5.7042 LearningRate 0.0005 Epoch: 14 Global Step: 294630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:10,718-Speed 6315.74 samples/sec Loss 5.8313 LearningRate 0.0005 Epoch: 14 Global Step: 294640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:13,965-Speed 6308.69 samples/sec Loss 5.7791 LearningRate 0.0005 Epoch: 14 Global Step: 294650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:17,212-Speed 6309.77 samples/sec Loss 5.7272 LearningRate 0.0005 Epoch: 14 Global Step: 294660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:20,461-Speed 6305.34 samples/sec Loss 5.7920 LearningRate 0.0005 Epoch: 14 Global Step: 294670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:23,707-Speed 6310.73 samples/sec Loss 5.8312 LearningRate 0.0005 Epoch: 14 Global Step: 294680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:26,951-Speed 6314.84 samples/sec Loss 5.8446 LearningRate 0.0005 Epoch: 14 Global Step: 294690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:30,186-Speed 6331.90 samples/sec Loss 5.8402 LearningRate 0.0005 Epoch: 14 Global Step: 294700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:33,434-Speed 6306.71 samples/sec Loss 5.7268 LearningRate 0.0005 Epoch: 14 Global Step: 294710 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:36,683-Speed 6304.26 samples/sec Loss 5.7708 LearningRate 0.0005 Epoch: 14 Global Step: 294720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:39,933-Speed 6303.55 samples/sec Loss 5.8273 LearningRate 0.0005 Epoch: 14 Global Step: 294730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:43,177-Speed 6313.82 samples/sec Loss 5.7952 LearningRate 0.0005 Epoch: 14 
Global Step: 294740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:46,426-Speed 6304.66 samples/sec Loss 5.8100 LearningRate 0.0005 Epoch: 14 Global Step: 294750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:49,677-Speed 6301.49 samples/sec Loss 5.7844 LearningRate 0.0005 Epoch: 14 Global Step: 294760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:52,923-Speed 6311.04 samples/sec Loss 5.7621 LearningRate 0.0005 Epoch: 14 Global Step: 294770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:56,170-Speed 6307.96 samples/sec Loss 5.7363 LearningRate 0.0005 Epoch: 14 Global Step: 294780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:32:59,418-Speed 6307.93 samples/sec Loss 5.7202 LearningRate 0.0005 Epoch: 14 Global Step: 294790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:02,655-Speed 6328.54 samples/sec Loss 5.7981 LearningRate 0.0005 Epoch: 14 Global Step: 294800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:05,912-Speed 6289.06 samples/sec Loss 5.7451 LearningRate 0.0005 Epoch: 14 Global Step: 294810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:09,168-Speed 6291.44 samples/sec Loss 5.8099 LearningRate 0.0005 Epoch: 14 Global Step: 294820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:12,414-Speed 6310.97 samples/sec Loss 5.8204 LearningRate 0.0005 Epoch: 14 Global Step: 294830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:15,661-Speed 6307.95 samples/sec Loss 5.8482 LearningRate 0.0005 Epoch: 14 Global Step: 294840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:18,910-Speed 6306.93 samples/sec Loss 5.7499 LearningRate 0.0005 Epoch: 14 Global Step: 294850 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:22,155-Speed 6311.34 samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 294860 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:25,403-Speed 6307.71 samples/sec Loss 5.7599 LearningRate 0.0005 Epoch: 14 Global Step: 294870 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:28,649-Speed 6310.44 samples/sec Loss 5.8005 LearningRate 0.0005 Epoch: 14 Global Step: 294880 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:31,898-Speed 6303.80 samples/sec Loss 5.8147 LearningRate 0.0005 Epoch: 14 Global Step: 294890 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:35,134-Speed 6330.15 samples/sec Loss 5.7717 LearningRate 0.0005 Epoch: 14 Global Step: 294900 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:38,380-Speed 6311.29 samples/sec Loss 5.8078 LearningRate 0.0005 Epoch: 14 Global Step: 294910 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:41,632-Speed 6300.18 samples/sec Loss 5.8430 LearningRate 0.0005 Epoch: 14 Global Step: 294920 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:44,874-Speed 6317.86 samples/sec Loss 5.7459 LearningRate 0.0005 Epoch: 14 Global Step: 294930 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:48,131-Speed 6289.98 samples/sec Loss 5.7933 LearningRate 0.0005 Epoch: 14 Global Step: 294940 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:51,377-Speed 6310.18 samples/sec Loss 5.8157 LearningRate 0.0005 Epoch: 14 Global Step: 294950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:54,624-Speed 6309.68 samples/sec Loss 5.8337 LearningRate 0.0005 Epoch: 14 Global Step: 294960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:33:57,878-Speed 6294.13 samples/sec Loss 5.7347 LearningRate 0.0005 Epoch: 14 Global Step: 294970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:01,123-Speed 6311.58 samples/sec Loss 5.7791 LearningRate 0.0005 Epoch: 14 Global Step: 294980 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:34:04,377-Speed 6295.03 samples/sec Loss 5.7893 LearningRate 0.0005 Epoch: 14 Global Step: 294990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:07,625-Speed 6308.19 samples/sec Loss 5.7518 LearningRate 0.0005 Epoch: 14 Global Step: 295000 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:34:10,856-Speed 6341.01 samples/sec Loss 5.8170 LearningRate 0.0005 Epoch: 14 Global Step: 295010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:14,103-Speed 6309.14 samples/sec Loss 5.7734 LearningRate 0.0005 Epoch: 14 Global Step: 295020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:17,346-Speed 6315.72 samples/sec Loss 5.7077 LearningRate 0.0005 Epoch: 14 Global Step: 295030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:20,595-Speed 6305.94 samples/sec Loss 5.7479 LearningRate 0.0005 Epoch: 14 Global Step: 295040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:23,842-Speed 6307.35 samples/sec Loss 5.7111 LearningRate 0.0005 Epoch: 14 Global Step: 295050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:27,088-Speed 6311.90 samples/sec Loss 5.7593 LearningRate 0.0005 Epoch: 14 Global Step: 295060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:30,335-Speed 6308.58 samples/sec Loss 5.7165 LearningRate 0.0005 Epoch: 14 Global Step: 295070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:33,581-Speed 6310.66 samples/sec Loss 5.8137 LearningRate 0.0005 Epoch: 14 Global Step: 295080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:36,838-Speed 6288.35 samples/sec Loss 5.8156 LearningRate 0.0005 Epoch: 14 Global Step: 295090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:40,096-Speed 6287.73 samples/sec Loss 5.8021 LearningRate 0.0005 Epoch: 14 Global Step: 295100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:34:43,346-Speed 6302.52 samples/sec Loss 5.8339 LearningRate 0.0005 Epoch: 14 Global Step: 295110 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:34:46,580-Speed 6333.79 samples/sec Loss 5.7392 LearningRate 0.0005 Epoch: 14 Global Step: 295120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:49,824-Speed 6316.56 samples/sec Loss 5.7171 LearningRate 0.0005 Epoch: 14 Global Step: 295130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:53,073-Speed 6303.07 samples/sec Loss 5.8042 LearningRate 0.0005 Epoch: 14 Global Step: 295140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:56,324-Speed 6302.23 samples/sec Loss 5.7816 LearningRate 0.0005 Epoch: 14 Global Step: 295150 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:34:59,584-Speed 6282.28 samples/sec Loss 5.8041 LearningRate 0.0005 Epoch: 14 Global Step: 295160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:02,836-Speed 6299.29 samples/sec Loss 5.8151 LearningRate 0.0005 Epoch: 14 Global Step: 295170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:06,084-Speed 6308.16 samples/sec Loss 5.7664 LearningRate 0.0005 Epoch: 14 Global Step: 295180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:09,331-Speed 6307.74 samples/sec Loss 5.7743 LearningRate 0.0005 Epoch: 14 Global Step: 295190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:12,576-Speed 6313.88 samples/sec Loss 5.7621 LearningRate 0.0005 Epoch: 14 Global Step: 295200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:15,824-Speed 6306.19 samples/sec Loss 5.6810 LearningRate 0.0005 Epoch: 14 Global Step: 295210 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:19,058-Speed 6334.76 samples/sec Loss 5.7184 LearningRate 0.0005 Epoch: 14 Global Step: 295220 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:22,305-Speed 6309.62 
samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 295230 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:25,554-Speed 6304.47 samples/sec Loss 5.7400 LearningRate 0.0005 Epoch: 14 Global Step: 295240 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:28,812-Speed 6286.82 samples/sec Loss 5.7835 LearningRate 0.0005 Epoch: 14 Global Step: 295250 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:32,055-Speed 6316.52 samples/sec Loss 5.7841 LearningRate 0.0005 Epoch: 14 Global Step: 295260 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:35,298-Speed 6316.04 samples/sec Loss 5.7334 LearningRate 0.0005 Epoch: 14 Global Step: 295270 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:38,549-Speed 6300.84 samples/sec Loss 5.7764 LearningRate 0.0005 Epoch: 14 Global Step: 295280 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:41,794-Speed 6314.40 samples/sec Loss 5.8115 LearningRate 0.0005 Epoch: 14 Global Step: 295290 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:45,042-Speed 6307.00 samples/sec Loss 5.8247 LearningRate 0.0005 Epoch: 14 Global Step: 295300 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:48,288-Speed 6309.25 samples/sec Loss 5.8204 LearningRate 0.0005 Epoch: 14 Global Step: 295310 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:51,518-Speed 6343.18 samples/sec Loss 5.7334 LearningRate 0.0005 Epoch: 14 Global Step: 295320 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:54,769-Speed 6300.03 samples/sec Loss 5.8174 LearningRate 0.0005 Epoch: 14 Global Step: 295330 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:35:58,014-Speed 6312.90 samples/sec Loss 5.7377 LearningRate 0.0005 Epoch: 14 Global Step: 295340 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:01,264-Speed 6301.95 samples/sec Loss 5.7306 
LearningRate 0.0005 Epoch: 14 Global Step: 295350 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:04,513-Speed 6305.24 samples/sec Loss 5.7912 LearningRate 0.0005 Epoch: 14 Global Step: 295360 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:07,760-Speed 6310.15 samples/sec Loss 5.8188 LearningRate 0.0005 Epoch: 14 Global Step: 295370 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:11,007-Speed 6308.56 samples/sec Loss 5.8426 LearningRate 0.0005 Epoch: 14 Global Step: 295380 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:14,257-Speed 6302.60 samples/sec Loss 5.8427 LearningRate 0.0005 Epoch: 14 Global Step: 295390 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:17,509-Speed 6297.96 samples/sec Loss 5.8349 LearningRate 0.0005 Epoch: 14 Global Step: 295400 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:20,756-Speed 6310.03 samples/sec Loss 5.7413 LearningRate 0.0005 Epoch: 14 Global Step: 295410 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:24,007-Speed 6300.56 samples/sec Loss 5.7955 LearningRate 0.0005 Epoch: 14 Global Step: 295420 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:27,249-Speed 6319.82 samples/sec Loss 5.7463 LearningRate 0.0005 Epoch: 14 Global Step: 295430 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:30,493-Speed 6314.86 samples/sec Loss 5.8126 LearningRate 0.0005 Epoch: 14 Global Step: 295440 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:33,739-Speed 6310.46 samples/sec Loss 5.8000 LearningRate 0.0005 Epoch: 14 Global Step: 295450 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:36,984-Speed 6312.45 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 14 Global Step: 295460 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:40,229-Speed 6311.83 samples/sec Loss 5.8065 LearningRate 0.0005 Epoch: 14 
Global Step: 295470 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:43,477-Speed 6308.34 samples/sec Loss 5.7789 LearningRate 0.0005 Epoch: 14 Global Step: 295480 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:46,720-Speed 6315.29 samples/sec Loss 5.8119 LearningRate 0.0005 Epoch: 14 Global Step: 295490 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:49,971-Speed 6301.86 samples/sec Loss 5.8252 LearningRate 0.0005 Epoch: 14 Global Step: 295500 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:53,217-Speed 6309.64 samples/sec Loss 5.7713 LearningRate 0.0005 Epoch: 14 Global Step: 295510 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:36:56,462-Speed 6313.16 samples/sec Loss 5.7625 LearningRate 0.0005 Epoch: 14 Global Step: 295520 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:36:59,692-Speed 6342.37 samples/sec Loss 5.7999 LearningRate 0.0005 Epoch: 14 Global Step: 295530 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:02,945-Speed 6297.24 samples/sec Loss 5.7843 LearningRate 0.0005 Epoch: 14 Global Step: 295540 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:06,190-Speed 6312.02 samples/sec Loss 5.8520 LearningRate 0.0005 Epoch: 14 Global Step: 295550 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:09,437-Speed 6309.32 samples/sec Loss 5.8832 LearningRate 0.0005 Epoch: 14 Global Step: 295560 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:12,685-Speed 6306.16 samples/sec Loss 5.7745 LearningRate 0.0005 Epoch: 14 Global Step: 295570 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:15,935-Speed 6304.24 samples/sec Loss 5.7891 LearningRate 0.0005 Epoch: 14 Global Step: 295580 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:19,180-Speed 6312.38 samples/sec Loss 5.7983 LearningRate 0.0005 Epoch: 14 Global Step: 295590 Fp16 Grad 
Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:22,459-Speed 6246.92 samples/sec Loss 5.7276 LearningRate 0.0005 Epoch: 14 Global Step: 295600 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:25,703-Speed 6313.82 samples/sec Loss 5.6767 LearningRate 0.0005 Epoch: 14 Global Step: 295610 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:28,948-Speed 6312.71 samples/sec Loss 5.7887 LearningRate 0.0005 Epoch: 14 Global Step: 295620 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:32,177-Speed 6346.17 samples/sec Loss 5.7089 LearningRate 0.0005 Epoch: 14 Global Step: 295630 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:35,424-Speed 6308.83 samples/sec Loss 5.7731 LearningRate 0.0005 Epoch: 14 Global Step: 295640 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:38,666-Speed 6317.60 samples/sec Loss 5.8022 LearningRate 0.0005 Epoch: 14 Global Step: 295650 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:41,929-Speed 6277.82 samples/sec Loss 5.7410 LearningRate 0.0005 Epoch: 14 Global Step: 295660 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:45,175-Speed 6310.67 samples/sec Loss 5.8295 LearningRate 0.0005 Epoch: 14 Global Step: 295670 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:48,421-Speed 6310.38 samples/sec Loss 5.7371 LearningRate 0.0005 Epoch: 14 Global Step: 295680 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:51,667-Speed 6310.73 samples/sec Loss 5.7703 LearningRate 0.0005 Epoch: 14 Global Step: 295690 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:54,917-Speed 6303.19 samples/sec Loss 5.7575 LearningRate 0.0005 Epoch: 14 Global Step: 295700 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:37:58,164-Speed 6309.93 samples/sec Loss 5.8183 LearningRate 0.0005 Epoch: 14 Global Step: 295710 Fp16 Grad Scale: 32768 Required: 49 hours 
Training: 2022-04-01 18:38:01,414-Speed 6301.89 samples/sec Loss 5.7475 LearningRate 0.0005 Epoch: 14 Global Step: 295720 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:04,652-Speed 6327.72 samples/sec Loss 5.7488 LearningRate 0.0005 Epoch: 14 Global Step: 295730 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:07,902-Speed 6301.16 samples/sec Loss 5.7734 LearningRate 0.0005 Epoch: 14 Global Step: 295740 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:11,154-Speed 6300.61 samples/sec Loss 5.7162 LearningRate 0.0005 Epoch: 14 Global Step: 295750 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:14,406-Speed 6298.23 samples/sec Loss 5.7407 LearningRate 0.0005 Epoch: 14 Global Step: 295760 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:17,671-Speed 6274.59 samples/sec Loss 5.7064 LearningRate 0.0005 Epoch: 14 Global Step: 295770 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:20,953-Speed 6240.68 samples/sec Loss 5.7480 LearningRate 0.0005 Epoch: 14 Global Step: 295780 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:24,234-Speed 6243.20 samples/sec Loss 5.8003 LearningRate 0.0005 Epoch: 14 Global Step: 295790 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:27,478-Speed 6316.09 samples/sec Loss 5.7975 LearningRate 0.0005 Epoch: 14 Global Step: 295800 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:30,721-Speed 6314.62 samples/sec Loss 5.7343 LearningRate 0.0005 Epoch: 14 Global Step: 295810 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:33,972-Speed 6302.40 samples/sec Loss 5.7283 LearningRate 0.0005 Epoch: 14 Global Step: 295820 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:37,205-Speed 6336.24 samples/sec Loss 5.8064 LearningRate 0.0005 Epoch: 14 Global Step: 295830 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 
18:38:40,452-Speed 6309.48 samples/sec Loss 5.7293 LearningRate 0.0005 Epoch: 14 Global Step: 295840 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:38:43,681-Speed 6343.45 samples/sec Loss 5.7958 LearningRate 0.0005 Epoch: 14 Global Step: 295850 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:38:46,927-Speed 6310.67 samples/sec Loss 5.7703 LearningRate 0.0005 Epoch: 14 Global Step: 295860 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:38:50,173-Speed 6311.43 samples/sec Loss 5.8171 LearningRate 0.0005 Epoch: 14 Global Step: 295870 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:38:53,419-Speed 6309.93 samples/sec Loss 5.8197 LearningRate 0.0005 Epoch: 14 Global Step: 295880 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:38:56,664-Speed 6312.12 samples/sec Loss 5.7488 LearningRate 0.0005 Epoch: 14 Global Step: 295890 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:38:59,913-Speed 6306.54 samples/sec Loss 5.8471 LearningRate 0.0005 Epoch: 14 Global Step: 295900 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:39:03,159-Speed 6310.27 samples/sec Loss 5.7776 LearningRate 0.0005 Epoch: 14 Global Step: 295910 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:39:06,401-Speed 6317.86 samples/sec Loss 5.7568 LearningRate 0.0005 Epoch: 14 Global Step: 295920 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:39:09,646-Speed 6311.97 samples/sec Loss 5.7841 LearningRate 0.0005 Epoch: 14 Global Step: 295930 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:39:12,888-Speed 6319.94 samples/sec Loss 5.7122 LearningRate 0.0005 Epoch: 14 Global Step: 295940 Fp16 Grad Scale: 16384 Required: 49 hours Training: 2022-04-01 18:39:16,139-Speed 6300.36 samples/sec Loss 5.7743 LearningRate 0.0005 Epoch: 14 Global Step: 295950 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:19,386-Speed 6308.79 
samples/sec Loss 5.7810 LearningRate 0.0005 Epoch: 14 Global Step: 295960 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:22,637-Speed 6300.91 samples/sec Loss 5.8012 LearningRate 0.0005 Epoch: 14 Global Step: 295970 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:25,884-Speed 6308.92 samples/sec Loss 5.7520 LearningRate 0.0005 Epoch: 14 Global Step: 295980 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:29,131-Speed 6308.81 samples/sec Loss 5.8015 LearningRate 0.0005 Epoch: 14 Global Step: 295990 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:32,382-Speed 6301.90 samples/sec Loss 5.8321 LearningRate 0.0005 Epoch: 14 Global Step: 296000 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:35,633-Speed 6301.39 samples/sec Loss 5.7728 LearningRate 0.0005 Epoch: 14 Global Step: 296010 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:38,881-Speed 6305.39 samples/sec Loss 5.7558 LearningRate 0.0005 Epoch: 14 Global Step: 296020 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:42,128-Speed 6308.97 samples/sec Loss 5.7303 LearningRate 0.0005 Epoch: 14 Global Step: 296030 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:45,374-Speed 6310.84 samples/sec Loss 5.8807 LearningRate 0.0005 Epoch: 14 Global Step: 296040 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:48,607-Speed 6336.79 samples/sec Loss 5.8059 LearningRate 0.0005 Epoch: 14 Global Step: 296050 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:51,853-Speed 6309.95 samples/sec Loss 5.7477 LearningRate 0.0005 Epoch: 14 Global Step: 296060 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:55,100-Speed 6309.87 samples/sec Loss 5.7496 LearningRate 0.0005 Epoch: 14 Global Step: 296070 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:39:58,344-Speed 6313.69 samples/sec Loss 5.7899 
LearningRate 0.0005 Epoch: 14 Global Step: 296080 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:01,590-Speed 6311.95 samples/sec Loss 5.8073 LearningRate 0.0005 Epoch: 14 Global Step: 296090 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:04,838-Speed 6306.58 samples/sec Loss 5.6963 LearningRate 0.0005 Epoch: 14 Global Step: 296100 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:08,085-Speed 6309.72 samples/sec Loss 5.8035 LearningRate 0.0005 Epoch: 14 Global Step: 296110 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:11,330-Speed 6312.02 samples/sec Loss 5.8128 LearningRate 0.0005 Epoch: 14 Global Step: 296120 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:14,574-Speed 6314.27 samples/sec Loss 5.8042 LearningRate 0.0005 Epoch: 14 Global Step: 296130 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:17,820-Speed 6309.90 samples/sec Loss 5.7751 LearningRate 0.0005 Epoch: 14 Global Step: 296140 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:21,071-Speed 6301.22 samples/sec Loss 5.6992 LearningRate 0.0005 Epoch: 14 Global Step: 296150 Fp16 Grad Scale: 65536 Required: 49 hours Training: 2022-04-01 18:40:24,303-Speed 6339.36 samples/sec Loss 5.7574 LearningRate 0.0005 Epoch: 14 Global Step: 296160 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:27,547-Speed 6314.95 samples/sec Loss 5.8094 LearningRate 0.0005 Epoch: 14 Global Step: 296170 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:30,799-Speed 6298.42 samples/sec Loss 5.7736 LearningRate 0.0005 Epoch: 14 Global Step: 296180 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:34,040-Speed 6320.87 samples/sec Loss 5.7721 LearningRate 0.0005 Epoch: 14 Global Step: 296190 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:37,286-Speed 6309.88 samples/sec Loss 5.6672 LearningRate 0.0005 Epoch: 14 
Global Step: 296200 Fp16 Grad Scale: 32768 Required: 49 hours Training: 2022-04-01 18:40:40,539-Speed 6297.32 samples/sec Loss 5.7651 LearningRate 0.0005 Epoch: 14 Global Step: 296210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:43,784-Speed 6313.83 samples/sec Loss 5.7482 LearningRate 0.0005 Epoch: 14 Global Step: 296220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:47,027-Speed 6314.71 samples/sec Loss 5.6742 LearningRate 0.0005 Epoch: 14 Global Step: 296230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:50,272-Speed 6313.51 samples/sec Loss 5.7883 LearningRate 0.0005 Epoch: 14 Global Step: 296240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:53,519-Speed 6310.63 samples/sec Loss 5.8005 LearningRate 0.0005 Epoch: 14 Global Step: 296250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:56,748-Speed 6342.41 samples/sec Loss 5.7297 LearningRate 0.0005 Epoch: 14 Global Step: 296260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:40:59,992-Speed 6314.50 samples/sec Loss 5.7531 LearningRate 0.0005 Epoch: 14 Global Step: 296270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:03,250-Speed 6288.38 samples/sec Loss 5.7878 LearningRate 0.0005 Epoch: 14 Global Step: 296280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:06,495-Speed 6312.94 samples/sec Loss 5.7627 LearningRate 0.0005 Epoch: 14 Global Step: 296290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:09,745-Speed 6302.92 samples/sec Loss 5.7777 LearningRate 0.0005 Epoch: 14 Global Step: 296300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:12,991-Speed 6311.27 samples/sec Loss 5.8684 LearningRate 0.0005 Epoch: 14 Global Step: 296310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:16,237-Speed 6309.00 samples/sec Loss 5.6614 LearningRate 0.0005 Epoch: 14 Global Step: 296320 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:19,484-Speed 6309.25 samples/sec Loss 5.7384 LearningRate 0.0005 Epoch: 14 Global Step: 296330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:22,734-Speed 6302.27 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 14 Global Step: 296340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:25,985-Speed 6300.85 samples/sec Loss 5.7602 LearningRate 0.0005 Epoch: 14 Global Step: 296350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:29,222-Speed 6330.12 samples/sec Loss 5.8066 LearningRate 0.0005 Epoch: 14 Global Step: 296360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:32,468-Speed 6310.65 samples/sec Loss 5.6735 LearningRate 0.0005 Epoch: 14 Global Step: 296370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:35,720-Speed 6298.51 samples/sec Loss 5.7874 LearningRate 0.0005 Epoch: 14 Global Step: 296380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:38,966-Speed 6310.16 samples/sec Loss 5.7570 LearningRate 0.0005 Epoch: 14 Global Step: 296390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:42,212-Speed 6310.86 samples/sec Loss 5.7970 LearningRate 0.0005 Epoch: 14 Global Step: 296400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:45,460-Speed 6306.30 samples/sec Loss 5.8014 LearningRate 0.0005 Epoch: 14 Global Step: 296410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:48,708-Speed 6307.39 samples/sec Loss 5.7896 LearningRate 0.0005 Epoch: 14 Global Step: 296420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:51,954-Speed 6311.25 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 14 Global Step: 296430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:41:55,203-Speed 6304.40 samples/sec Loss 5.7844 LearningRate 0.0005 Epoch: 14 Global Step: 296440 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 18:41:58,444-Speed 6320.66 samples/sec Loss 5.7911 LearningRate 0.0005 Epoch: 14 Global Step: 296450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:01,725-Speed 6242.43 samples/sec Loss 5.7390 LearningRate 0.0005 Epoch: 14 Global Step: 296460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:04,976-Speed 6302.47 samples/sec Loss 5.7385 LearningRate 0.0005 Epoch: 14 Global Step: 296470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:08,223-Speed 6308.24 samples/sec Loss 5.7445 LearningRate 0.0005 Epoch: 14 Global Step: 296480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:11,470-Speed 6310.03 samples/sec Loss 5.7770 LearningRate 0.0005 Epoch: 14 Global Step: 296490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:14,715-Speed 6312.90 samples/sec Loss 5.7324 LearningRate 0.0005 Epoch: 14 Global Step: 296500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:17,966-Speed 6300.26 samples/sec Loss 5.8740 LearningRate 0.0005 Epoch: 14 Global Step: 296510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:21,210-Speed 6314.78 samples/sec Loss 5.8269 LearningRate 0.0005 Epoch: 14 Global Step: 296520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:24,459-Speed 6305.29 samples/sec Loss 5.7901 LearningRate 0.0005 Epoch: 14 Global Step: 296530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:27,710-Speed 6300.00 samples/sec Loss 5.7833 LearningRate 0.0005 Epoch: 14 Global Step: 296540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:30,956-Speed 6310.77 samples/sec Loss 5.8133 LearningRate 0.0005 Epoch: 14 Global Step: 296550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:34,201-Speed 6312.44 samples/sec Loss 5.8407 LearningRate 0.0005 Epoch: 14 Global Step: 296560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
18:42:37,446-Speed 6313.71 samples/sec Loss 5.7094 LearningRate 0.0005 Epoch: 14 Global Step: 296570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:40,702-Speed 6290.06 samples/sec Loss 5.7365 LearningRate 0.0005 Epoch: 14 Global Step: 296580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:43,948-Speed 6311.15 samples/sec Loss 5.8264 LearningRate 0.0005 Epoch: 14 Global Step: 296590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:47,199-Speed 6301.99 samples/sec Loss 5.6939 LearningRate 0.0005 Epoch: 14 Global Step: 296600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:50,448-Speed 6304.14 samples/sec Loss 5.7977 LearningRate 0.0005 Epoch: 14 Global Step: 296610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:53,695-Speed 6309.47 samples/sec Loss 5.6653 LearningRate 0.0005 Epoch: 14 Global Step: 296620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:42:56,940-Speed 6312.06 samples/sec Loss 5.8440 LearningRate 0.0005 Epoch: 14 Global Step: 296630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:00,188-Speed 6307.26 samples/sec Loss 5.8580 LearningRate 0.0005 Epoch: 14 Global Step: 296640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:03,446-Speed 6287.17 samples/sec Loss 5.7399 LearningRate 0.0005 Epoch: 14 Global Step: 296650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:06,690-Speed 6313.32 samples/sec Loss 5.8342 LearningRate 0.0005 Epoch: 14 Global Step: 296660 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 18:43:09,924-Speed 6335.11 samples/sec Loss 5.7220 LearningRate 0.0005 Epoch: 14 Global Step: 296670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:13,169-Speed 6313.04 samples/sec Loss 5.7770 LearningRate 0.0005 Epoch: 14 Global Step: 296680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:16,418-Speed 6306.10 
samples/sec Loss 5.7603 LearningRate 0.0005 Epoch: 14 Global Step: 296690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:19,666-Speed 6307.02 samples/sec Loss 5.7992 LearningRate 0.0005 Epoch: 14 Global Step: 296700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:22,916-Speed 6301.58 samples/sec Loss 5.7570 LearningRate 0.0005 Epoch: 14 Global Step: 296710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:26,163-Speed 6309.44 samples/sec Loss 5.7444 LearningRate 0.0005 Epoch: 14 Global Step: 296720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:29,412-Speed 6304.47 samples/sec Loss 5.7783 LearningRate 0.0005 Epoch: 14 Global Step: 296730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:32,655-Speed 6318.13 samples/sec Loss 5.7810 LearningRate 0.0005 Epoch: 14 Global Step: 296740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:35,902-Speed 6308.38 samples/sec Loss 5.8589 LearningRate 0.0005 Epoch: 14 Global Step: 296750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:39,150-Speed 6307.32 samples/sec Loss 5.7223 LearningRate 0.0005 Epoch: 14 Global Step: 296760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:42,379-Speed 6343.53 samples/sec Loss 5.8080 LearningRate 0.0005 Epoch: 14 Global Step: 296770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:45,623-Speed 6315.64 samples/sec Loss 5.7067 LearningRate 0.0005 Epoch: 14 Global Step: 296780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:48,878-Speed 6292.83 samples/sec Loss 5.7214 LearningRate 0.0005 Epoch: 14 Global Step: 296790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:52,126-Speed 6305.79 samples/sec Loss 5.6793 LearningRate 0.0005 Epoch: 14 Global Step: 296800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:55,374-Speed 6306.43 samples/sec Loss 5.7220 
LearningRate 0.0005 Epoch: 14 Global Step: 296810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:43:58,623-Speed 6306.05 samples/sec Loss 5.7534 LearningRate 0.0005 Epoch: 14 Global Step: 296820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:01,886-Speed 6276.39 samples/sec Loss 5.7829 LearningRate 0.0005 Epoch: 14 Global Step: 296830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:05,132-Speed 6310.66 samples/sec Loss 5.8024 LearningRate 0.0005 Epoch: 14 Global Step: 296840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:08,378-Speed 6311.29 samples/sec Loss 5.7884 LearningRate 0.0005 Epoch: 14 Global Step: 296850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:11,624-Speed 6311.28 samples/sec Loss 5.7407 LearningRate 0.0005 Epoch: 14 Global Step: 296860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:14,858-Speed 6334.77 samples/sec Loss 5.7541 LearningRate 0.0005 Epoch: 14 Global Step: 296870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:18,103-Speed 6311.38 samples/sec Loss 5.8418 LearningRate 0.0005 Epoch: 14 Global Step: 296880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:21,350-Speed 6310.05 samples/sec Loss 5.7305 LearningRate 0.0005 Epoch: 14 Global Step: 296890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:24,596-Speed 6311.08 samples/sec Loss 5.8398 LearningRate 0.0005 Epoch: 14 Global Step: 296900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:44:27,825-Speed 6343.35 samples/sec Loss 5.8341 LearningRate 0.0005 Epoch: 14 Global Step: 296910 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:31,072-Speed 6308.63 samples/sec Loss 5.8010 LearningRate 0.0005 Epoch: 14 Global Step: 296920 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:34,315-Speed 6316.50 samples/sec Loss 5.7162 LearningRate 0.0005 Epoch: 14 
Global Step: 296930 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:37,562-Speed 6309.86 samples/sec Loss 5.7180 LearningRate 0.0005 Epoch: 14 Global Step: 296940 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:40,812-Speed 6303.19 samples/sec Loss 5.7599 LearningRate 0.0005 Epoch: 14 Global Step: 296950 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:44,055-Speed 6316.41 samples/sec Loss 5.7734 LearningRate 0.0005 Epoch: 14 Global Step: 296960 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:47,298-Speed 6316.28 samples/sec Loss 5.8021 LearningRate 0.0005 Epoch: 14 Global Step: 296970 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:50,545-Speed 6308.30 samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 296980 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:53,794-Speed 6304.43 samples/sec Loss 5.7614 LearningRate 0.0005 Epoch: 14 Global Step: 296990 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:44:57,037-Speed 6317.31 samples/sec Loss 5.7028 LearningRate 0.0005 Epoch: 14 Global Step: 297000 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:45:00,282-Speed 6312.31 samples/sec Loss 5.8085 LearningRate 0.0005 Epoch: 14 Global Step: 297010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:03,526-Speed 6314.50 samples/sec Loss 5.8043 LearningRate 0.0005 Epoch: 14 Global Step: 297020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:06,776-Speed 6303.98 samples/sec Loss 5.7201 LearningRate 0.0005 Epoch: 14 Global Step: 297030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:10,021-Speed 6312.32 samples/sec Loss 5.8545 LearningRate 0.0005 Epoch: 14 Global Step: 297040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:13,266-Speed 6312.91 samples/sec Loss 5.7716 LearningRate 0.0005 Epoch: 14 Global Step: 297050 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:16,523-Speed 6289.21 samples/sec Loss 5.7593 LearningRate 0.0005 Epoch: 14 Global Step: 297060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:19,771-Speed 6307.32 samples/sec Loss 5.8533 LearningRate 0.0005 Epoch: 14 Global Step: 297070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:23,022-Speed 6300.33 samples/sec Loss 5.7689 LearningRate 0.0005 Epoch: 14 Global Step: 297080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:26,268-Speed 6310.69 samples/sec Loss 5.7174 LearningRate 0.0005 Epoch: 14 Global Step: 297090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:29,511-Speed 6316.83 samples/sec Loss 5.7195 LearningRate 0.0005 Epoch: 14 Global Step: 297100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:32,743-Speed 6338.14 samples/sec Loss 5.7485 LearningRate 0.0005 Epoch: 14 Global Step: 297110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:35,988-Speed 6312.80 samples/sec Loss 5.7073 LearningRate 0.0005 Epoch: 14 Global Step: 297120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:39,233-Speed 6312.90 samples/sec Loss 5.7804 LearningRate 0.0005 Epoch: 14 Global Step: 297130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:42,478-Speed 6313.74 samples/sec Loss 5.7618 LearningRate 0.0005 Epoch: 14 Global Step: 297140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:45,721-Speed 6316.57 samples/sec Loss 5.7547 LearningRate 0.0005 Epoch: 14 Global Step: 297150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:48,967-Speed 6310.16 samples/sec Loss 5.8257 LearningRate 0.0005 Epoch: 14 Global Step: 297160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:52,214-Speed 6310.93 samples/sec Loss 5.7521 LearningRate 0.0005 Epoch: 14 Global Step: 297170 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 18:45:55,460-Speed 6310.87 samples/sec Loss 5.7685 LearningRate 0.0005 Epoch: 14 Global Step: 297180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:45:58,708-Speed 6306.27 samples/sec Loss 5.7182 LearningRate 0.0005 Epoch: 14 Global Step: 297190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:01,953-Speed 6313.29 samples/sec Loss 5.6952 LearningRate 0.0005 Epoch: 14 Global Step: 297200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:05,183-Speed 6342.14 samples/sec Loss 5.7287 LearningRate 0.0005 Epoch: 14 Global Step: 297210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:08,431-Speed 6307.49 samples/sec Loss 5.7319 LearningRate 0.0005 Epoch: 14 Global Step: 297220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:11,678-Speed 6308.39 samples/sec Loss 5.8167 LearningRate 0.0005 Epoch: 14 Global Step: 297230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:14,922-Speed 6313.27 samples/sec Loss 5.7069 LearningRate 0.0005 Epoch: 14 Global Step: 297240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:18,168-Speed 6312.60 samples/sec Loss 5.7337 LearningRate 0.0005 Epoch: 14 Global Step: 297250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:21,412-Speed 6313.25 samples/sec Loss 5.7393 LearningRate 0.0005 Epoch: 14 Global Step: 297260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:24,663-Speed 6301.89 samples/sec Loss 5.7466 LearningRate 0.0005 Epoch: 14 Global Step: 297270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:27,911-Speed 6306.38 samples/sec Loss 5.7256 LearningRate 0.0005 Epoch: 14 Global Step: 297280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:31,154-Speed 6316.75 samples/sec Loss 5.7144 LearningRate 0.0005 Epoch: 14 Global Step: 297290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
18:46:34,401-Speed 6309.44 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 297300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:37,652-Speed 6301.00 samples/sec Loss 5.7122 LearningRate 0.0005 Epoch: 14 Global Step: 297310 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 18:46:40,885-Speed 6336.94 samples/sec Loss 5.7400 LearningRate 0.0005 Epoch: 14 Global Step: 297320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:44,146-Speed 6281.52 samples/sec Loss 5.7790 LearningRate 0.0005 Epoch: 14 Global Step: 297330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:47,394-Speed 6306.83 samples/sec Loss 5.8127 LearningRate 0.0005 Epoch: 14 Global Step: 297340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:50,641-Speed 6308.78 samples/sec Loss 5.6890 LearningRate 0.0005 Epoch: 14 Global Step: 297350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:53,892-Speed 6300.37 samples/sec Loss 5.7704 LearningRate 0.0005 Epoch: 14 Global Step: 297360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:46:57,140-Speed 6307.36 samples/sec Loss 5.7578 LearningRate 0.0005 Epoch: 14 Global Step: 297370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:00,384-Speed 6313.87 samples/sec Loss 5.7946 LearningRate 0.0005 Epoch: 14 Global Step: 297380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:03,636-Speed 6298.55 samples/sec Loss 5.7951 LearningRate 0.0005 Epoch: 14 Global Step: 297390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:06,883-Speed 6308.20 samples/sec Loss 5.7488 LearningRate 0.0005 Epoch: 14 Global Step: 297400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:10,129-Speed 6312.96 samples/sec Loss 5.7423 LearningRate 0.0005 Epoch: 14 Global Step: 297410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:13,360-Speed 6338.63 
samples/sec Loss 5.7536 LearningRate 0.0005 Epoch: 14 Global Step: 297420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:16,610-Speed 6303.90 samples/sec Loss 5.8172 LearningRate 0.0005 Epoch: 14 Global Step: 297430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:19,856-Speed 6310.86 samples/sec Loss 5.7564 LearningRate 0.0005 Epoch: 14 Global Step: 297440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:23,105-Speed 6304.01 samples/sec Loss 5.7929 LearningRate 0.0005 Epoch: 14 Global Step: 297450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:26,356-Speed 6301.95 samples/sec Loss 5.7641 LearningRate 0.0005 Epoch: 14 Global Step: 297460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:29,601-Speed 6312.05 samples/sec Loss 5.7429 LearningRate 0.0005 Epoch: 14 Global Step: 297470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:47:32,831-Speed 6341.41 samples/sec Loss 5.7893 LearningRate 0.0005 Epoch: 14 Global Step: 297480 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:36,099-Speed 6269.10 samples/sec Loss 5.7802 LearningRate 0.0005 Epoch: 14 Global Step: 297490 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:39,343-Speed 6315.38 samples/sec Loss 5.7902 LearningRate 0.0005 Epoch: 14 Global Step: 297500 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:42,588-Speed 6312.80 samples/sec Loss 5.7892 LearningRate 0.0005 Epoch: 14 Global Step: 297510 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:45,843-Speed 6293.65 samples/sec Loss 5.7460 LearningRate 0.0005 Epoch: 14 Global Step: 297520 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:49,088-Speed 6312.42 samples/sec Loss 5.7994 LearningRate 0.0005 Epoch: 14 Global Step: 297530 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:52,354-Speed 6271.58 samples/sec Loss 5.7722 
LearningRate 0.0005 Epoch: 14 Global Step: 297540 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:55,600-Speed 6311.82 samples/sec Loss 5.7316 LearningRate 0.0005 Epoch: 14 Global Step: 297550 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:47:58,842-Speed 6317.77 samples/sec Loss 5.7625 LearningRate 0.0005 Epoch: 14 Global Step: 297560 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:48:02,088-Speed 6310.11 samples/sec Loss 5.7039 LearningRate 0.0005 Epoch: 14 Global Step: 297570 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:48:05,333-Speed 6313.23 samples/sec Loss 5.7573 LearningRate 0.0005 Epoch: 14 Global Step: 297580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:08,582-Speed 6304.23 samples/sec Loss 5.8092 LearningRate 0.0005 Epoch: 14 Global Step: 297590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:11,836-Speed 6296.02 samples/sec Loss 5.7388 LearningRate 0.0005 Epoch: 14 Global Step: 297600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:15,082-Speed 6311.04 samples/sec Loss 5.7588 LearningRate 0.0005 Epoch: 14 Global Step: 297610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:18,351-Speed 6269.08 samples/sec Loss 5.8035 LearningRate 0.0005 Epoch: 14 Global Step: 297620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:21,596-Speed 6311.79 samples/sec Loss 5.8337 LearningRate 0.0005 Epoch: 14 Global Step: 297630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:24,844-Speed 6307.52 samples/sec Loss 5.6955 LearningRate 0.0005 Epoch: 14 Global Step: 297640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:28,093-Speed 6304.61 samples/sec Loss 5.6380 LearningRate 0.0005 Epoch: 14 Global Step: 297650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:31,336-Speed 6316.11 samples/sec Loss 5.6944 LearningRate 0.0005 Epoch: 14 
Global Step: 297660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:34,585-Speed 6305.94 samples/sec Loss 5.8199 LearningRate 0.0005 Epoch: 14 Global Step: 297670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:37,816-Speed 6338.82 samples/sec Loss 5.7320 LearningRate 0.0005 Epoch: 14 Global Step: 297680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:41,064-Speed 6307.84 samples/sec Loss 5.7549 LearningRate 0.0005 Epoch: 14 Global Step: 297690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:44,310-Speed 6310.11 samples/sec Loss 5.7402 LearningRate 0.0005 Epoch: 14 Global Step: 297700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:47,561-Speed 6302.18 samples/sec Loss 5.7870 LearningRate 0.0005 Epoch: 14 Global Step: 297710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:50,808-Speed 6308.74 samples/sec Loss 5.7610 LearningRate 0.0005 Epoch: 14 Global Step: 297720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:54,059-Speed 6300.76 samples/sec Loss 5.7452 LearningRate 0.0005 Epoch: 14 Global Step: 297730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:48:57,310-Speed 6302.35 samples/sec Loss 5.7470 LearningRate 0.0005 Epoch: 14 Global Step: 297740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:00,553-Speed 6316.07 samples/sec Loss 5.8683 LearningRate 0.0005 Epoch: 14 Global Step: 297750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:03,799-Speed 6309.33 samples/sec Loss 5.7597 LearningRate 0.0005 Epoch: 14 Global Step: 297760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:07,046-Speed 6309.94 samples/sec Loss 5.7601 LearningRate 0.0005 Epoch: 14 Global Step: 297770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:10,291-Speed 6312.29 samples/sec Loss 5.7490 LearningRate 0.0005 Epoch: 14 Global Step: 297780 Fp16 Grad 
Scale: 65536 Required: 48 hours Training: 2022-04-01 18:49:13,526-Speed 6332.04 samples/sec Loss 5.7688 LearningRate 0.0005 Epoch: 14 Global Step: 297790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:16,781-Speed 6293.45 samples/sec Loss 5.7553 LearningRate 0.0005 Epoch: 14 Global Step: 297800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:20,028-Speed 6308.44 samples/sec Loss 5.7670 LearningRate 0.0005 Epoch: 14 Global Step: 297810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:23,272-Speed 6315.45 samples/sec Loss 5.8364 LearningRate 0.0005 Epoch: 14 Global Step: 297820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:26,523-Speed 6303.69 samples/sec Loss 5.7577 LearningRate 0.0005 Epoch: 14 Global Step: 297830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:29,773-Speed 6302.87 samples/sec Loss 5.7246 LearningRate 0.0005 Epoch: 14 Global Step: 297840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:33,018-Speed 6313.68 samples/sec Loss 5.7823 LearningRate 0.0005 Epoch: 14 Global Step: 297850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:36,263-Speed 6311.13 samples/sec Loss 5.7599 LearningRate 0.0005 Epoch: 14 Global Step: 297860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:39,512-Speed 6306.97 samples/sec Loss 5.7247 LearningRate 0.0005 Epoch: 14 Global Step: 297870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:42,758-Speed 6310.25 samples/sec Loss 5.7972 LearningRate 0.0005 Epoch: 14 Global Step: 297880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:45,990-Speed 6338.11 samples/sec Loss 5.8558 LearningRate 0.0005 Epoch: 14 Global Step: 297890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:49,236-Speed 6309.53 samples/sec Loss 5.6961 LearningRate 0.0005 Epoch: 14 Global Step: 297900 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 18:49:52,483-Speed 6309.68 samples/sec Loss 5.7770 LearningRate 0.0005 Epoch: 14 Global Step: 297910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:55,728-Speed 6313.09 samples/sec Loss 5.8017 LearningRate 0.0005 Epoch: 14 Global Step: 297920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:49:58,975-Speed 6308.85 samples/sec Loss 5.7408 LearningRate 0.0005 Epoch: 14 Global Step: 297930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:02,225-Speed 6302.17 samples/sec Loss 5.7735 LearningRate 0.0005 Epoch: 14 Global Step: 297940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:05,473-Speed 6307.88 samples/sec Loss 5.8211 LearningRate 0.0005 Epoch: 14 Global Step: 297950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:08,721-Speed 6305.99 samples/sec Loss 5.7565 LearningRate 0.0005 Epoch: 14 Global Step: 297960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:11,964-Speed 6317.19 samples/sec Loss 5.7492 LearningRate 0.0005 Epoch: 14 Global Step: 297970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:15,212-Speed 6307.37 samples/sec Loss 5.7232 LearningRate 0.0005 Epoch: 14 Global Step: 297980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:18,441-Speed 6343.38 samples/sec Loss 5.7360 LearningRate 0.0005 Epoch: 14 Global Step: 297990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:21,687-Speed 6311.43 samples/sec Loss 5.6991 LearningRate 0.0005 Epoch: 14 Global Step: 298000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:24,935-Speed 6305.66 samples/sec Loss 5.6999 LearningRate 0.0005 Epoch: 14 Global Step: 298010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:28,209-Speed 6258.15 samples/sec Loss 5.8149 LearningRate 0.0005 Epoch: 14 Global Step: 298020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
18:50:31,463-Speed 6294.21 samples/sec Loss 5.8101 LearningRate 0.0005 Epoch: 14 Global Step: 298030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:34,709-Speed 6311.75 samples/sec Loss 5.8158 LearningRate 0.0005 Epoch: 14 Global Step: 298040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:37,959-Speed 6301.76 samples/sec Loss 5.7139 LearningRate 0.0005 Epoch: 14 Global Step: 298050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:41,212-Speed 6296.40 samples/sec Loss 5.7281 LearningRate 0.0005 Epoch: 14 Global Step: 298060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:44,458-Speed 6311.75 samples/sec Loss 5.7208 LearningRate 0.0005 Epoch: 14 Global Step: 298070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:47,707-Speed 6304.42 samples/sec Loss 5.6502 LearningRate 0.0005 Epoch: 14 Global Step: 298080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:50,938-Speed 6340.26 samples/sec Loss 5.7639 LearningRate 0.0005 Epoch: 14 Global Step: 298090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:54,183-Speed 6312.69 samples/sec Loss 5.7665 LearningRate 0.0005 Epoch: 14 Global Step: 298100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:50:57,430-Speed 6309.34 samples/sec Loss 5.7446 LearningRate 0.0005 Epoch: 14 Global Step: 298110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:00,679-Speed 6305.24 samples/sec Loss 5.8244 LearningRate 0.0005 Epoch: 14 Global Step: 298120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:03,926-Speed 6308.32 samples/sec Loss 5.7315 LearningRate 0.0005 Epoch: 14 Global Step: 298130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:07,176-Speed 6305.16 samples/sec Loss 5.8124 LearningRate 0.0005 Epoch: 14 Global Step: 298140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:10,422-Speed 6310.54 
samples/sec Loss 5.7655 LearningRate 0.0005 Epoch: 14 Global Step: 298150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:13,671-Speed 6304.75 samples/sec Loss 5.7646 LearningRate 0.0005 Epoch: 14 Global Step: 298160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:16,917-Speed 6309.37 samples/sec Loss 5.7183 LearningRate 0.0005 Epoch: 14 Global Step: 298170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:20,165-Speed 6307.88 samples/sec Loss 5.7197 LearningRate 0.0005 Epoch: 14 Global Step: 298180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:23,401-Speed 6329.32 samples/sec Loss 5.7439 LearningRate 0.0005 Epoch: 14 Global Step: 298190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:26,648-Speed 6309.56 samples/sec Loss 5.7586 LearningRate 0.0005 Epoch: 14 Global Step: 298200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:29,891-Speed 6316.91 samples/sec Loss 5.7250 LearningRate 0.0005 Epoch: 14 Global Step: 298210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:33,140-Speed 6304.94 samples/sec Loss 5.7725 LearningRate 0.0005 Epoch: 14 Global Step: 298220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:36,390-Speed 6306.03 samples/sec Loss 5.8023 LearningRate 0.0005 Epoch: 14 Global Step: 298230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:39,635-Speed 6311.91 samples/sec Loss 5.8207 LearningRate 0.0005 Epoch: 14 Global Step: 298240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:42,884-Speed 6305.50 samples/sec Loss 5.7684 LearningRate 0.0005 Epoch: 14 Global Step: 298250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:46,130-Speed 6309.65 samples/sec Loss 5.6764 LearningRate 0.0005 Epoch: 14 Global Step: 298260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:49,377-Speed 6309.64 samples/sec Loss 5.8262 
LearningRate 0.0005 Epoch: 14 Global Step: 298270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:52,623-Speed 6309.86 samples/sec Loss 5.7509 LearningRate 0.0005 Epoch: 14 Global Step: 298280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:51:55,872-Speed 6304.52 samples/sec Loss 5.6996 LearningRate 0.0005 Epoch: 14 Global Step: 298290 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 18:51:59,108-Speed 6330.60 samples/sec Loss 5.7940 LearningRate 0.0005 Epoch: 14 Global Step: 298300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:02,358-Speed 6302.50 samples/sec Loss 5.6816 LearningRate 0.0005 Epoch: 14 Global Step: 298310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:05,617-Speed 6286.46 samples/sec Loss 5.7538 LearningRate 0.0005 Epoch: 14 Global Step: 298320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:08,864-Speed 6308.68 samples/sec Loss 5.7158 LearningRate 0.0005 Epoch: 14 Global Step: 298330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:12,113-Speed 6305.60 samples/sec Loss 5.7103 LearningRate 0.0005 Epoch: 14 Global Step: 298340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:15,362-Speed 6305.46 samples/sec Loss 5.7448 LearningRate 0.0005 Epoch: 14 Global Step: 298350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:18,608-Speed 6310.46 samples/sec Loss 5.8695 LearningRate 0.0005 Epoch: 14 Global Step: 298360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:21,858-Speed 6303.21 samples/sec Loss 5.7878 LearningRate 0.0005 Epoch: 14 Global Step: 298370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:25,106-Speed 6307.18 samples/sec Loss 5.8117 LearningRate 0.0005 Epoch: 14 Global Step: 298380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:28,354-Speed 6306.61 samples/sec Loss 5.7731 LearningRate 0.0005 Epoch: 14 
Global Step: 298390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:31,604-Speed 6301.86 samples/sec Loss 5.7314 LearningRate 0.0005 Epoch: 14 Global Step: 298400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:34,850-Speed 6311.48 samples/sec Loss 5.7329 LearningRate 0.0005 Epoch: 14 Global Step: 298410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:38,100-Speed 6303.58 samples/sec Loss 5.7954 LearningRate 0.0005 Epoch: 14 Global Step: 298420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:41,343-Speed 6315.76 samples/sec Loss 5.6955 LearningRate 0.0005 Epoch: 14 Global Step: 298430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:44,589-Speed 6309.77 samples/sec Loss 5.7305 LearningRate 0.0005 Epoch: 14 Global Step: 298440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:47,834-Speed 6312.47 samples/sec Loss 5.6984 LearningRate 0.0005 Epoch: 14 Global Step: 298450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:51,080-Speed 6311.20 samples/sec Loss 5.8409 LearningRate 0.0005 Epoch: 14 Global Step: 298460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:54,327-Speed 6308.91 samples/sec Loss 5.7767 LearningRate 0.0005 Epoch: 14 Global Step: 298470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:52:57,575-Speed 6307.97 samples/sec Loss 5.7853 LearningRate 0.0005 Epoch: 14 Global Step: 298480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:00,822-Speed 6308.95 samples/sec Loss 5.7769 LearningRate 0.0005 Epoch: 14 Global Step: 298490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:04,057-Speed 6331.64 samples/sec Loss 5.8360 LearningRate 0.0005 Epoch: 14 Global Step: 298500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:07,305-Speed 6307.20 samples/sec Loss 5.7049 LearningRate 0.0005 Epoch: 14 Global Step: 298510 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:10,549-Speed 6313.72 samples/sec Loss 5.7442 LearningRate 0.0005 Epoch: 14 Global Step: 298520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:13,796-Speed 6309.54 samples/sec Loss 5.7283 LearningRate 0.0005 Epoch: 14 Global Step: 298530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:17,048-Speed 6298.86 samples/sec Loss 5.7990 LearningRate 0.0005 Epoch: 14 Global Step: 298540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:20,293-Speed 6312.20 samples/sec Loss 5.8000 LearningRate 0.0005 Epoch: 14 Global Step: 298550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:23,548-Speed 6293.13 samples/sec Loss 5.7735 LearningRate 0.0005 Epoch: 14 Global Step: 298560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:26,792-Speed 6314.79 samples/sec Loss 5.7714 LearningRate 0.0005 Epoch: 14 Global Step: 298570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:30,044-Speed 6300.25 samples/sec Loss 5.8057 LearningRate 0.0005 Epoch: 14 Global Step: 298580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:33,292-Speed 6306.92 samples/sec Loss 5.7792 LearningRate 0.0005 Epoch: 14 Global Step: 298590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:36,519-Speed 6348.21 samples/sec Loss 5.7201 LearningRate 0.0005 Epoch: 14 Global Step: 298600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:39,765-Speed 6310.75 samples/sec Loss 5.7972 LearningRate 0.0005 Epoch: 14 Global Step: 298610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:43,010-Speed 6311.99 samples/sec Loss 5.7625 LearningRate 0.0005 Epoch: 14 Global Step: 298620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:46,264-Speed 6296.27 samples/sec Loss 5.7229 LearningRate 0.0005 Epoch: 14 Global Step: 298630 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 18:53:49,508-Speed 6314.68 samples/sec Loss 5.7881 LearningRate 0.0005 Epoch: 14 Global Step: 298640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:52,754-Speed 6309.64 samples/sec Loss 5.8046 LearningRate 0.0005 Epoch: 14 Global Step: 298650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:55,997-Speed 6317.06 samples/sec Loss 5.7642 LearningRate 0.0005 Epoch: 14 Global Step: 298660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:53:59,246-Speed 6304.62 samples/sec Loss 5.7384 LearningRate 0.0005 Epoch: 14 Global Step: 298670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:02,492-Speed 6310.36 samples/sec Loss 5.7690 LearningRate 0.0005 Epoch: 14 Global Step: 298680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:05,735-Speed 6316.55 samples/sec Loss 5.7751 LearningRate 0.0005 Epoch: 14 Global Step: 298690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:08,968-Speed 6336.24 samples/sec Loss 5.6633 LearningRate 0.0005 Epoch: 14 Global Step: 298700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:12,213-Speed 6312.13 samples/sec Loss 5.7557 LearningRate 0.0005 Epoch: 14 Global Step: 298710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:15,460-Speed 6309.97 samples/sec Loss 5.7682 LearningRate 0.0005 Epoch: 14 Global Step: 298720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:18,706-Speed 6311.22 samples/sec Loss 5.7665 LearningRate 0.0005 Epoch: 14 Global Step: 298730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:21,952-Speed 6309.78 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 14 Global Step: 298740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:25,203-Speed 6301.74 samples/sec Loss 5.7327 LearningRate 0.0005 Epoch: 14 Global Step: 298750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
18:54:28,444-Speed 6319.55 samples/sec Loss 5.7676 LearningRate 0.0005 Epoch: 14 Global Step: 298760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:31,688-Speed 6316.05 samples/sec Loss 5.7592 LearningRate 0.0005 Epoch: 14 Global Step: 298770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:34,932-Speed 6314.11 samples/sec Loss 5.7519 LearningRate 0.0005 Epoch: 14 Global Step: 298780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:38,175-Speed 6317.64 samples/sec Loss 5.8533 LearningRate 0.0005 Epoch: 14 Global Step: 298790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:41,460-Speed 6235.36 samples/sec Loss 5.6852 LearningRate 0.0005 Epoch: 14 Global Step: 298800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:44,703-Speed 6317.15 samples/sec Loss 5.7318 LearningRate 0.0005 Epoch: 14 Global Step: 298810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:47,949-Speed 6310.72 samples/sec Loss 5.8083 LearningRate 0.0005 Epoch: 14 Global Step: 298820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:51,191-Speed 6317.06 samples/sec Loss 5.7066 LearningRate 0.0005 Epoch: 14 Global Step: 298830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:54,440-Speed 6305.54 samples/sec Loss 5.6808 LearningRate 0.0005 Epoch: 14 Global Step: 298840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:54:57,682-Speed 6317.49 samples/sec Loss 5.7322 LearningRate 0.0005 Epoch: 14 Global Step: 298850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:00,943-Speed 6282.51 samples/sec Loss 5.7370 LearningRate 0.0005 Epoch: 14 Global Step: 298860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:04,189-Speed 6311.75 samples/sec Loss 5.6754 LearningRate 0.0005 Epoch: 14 Global Step: 298870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:07,434-Speed 6311.45 
samples/sec Loss 5.7809 LearningRate 0.0005 Epoch: 14 Global Step: 298880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:10,679-Speed 6313.66 samples/sec Loss 5.6383 LearningRate 0.0005 Epoch: 14 Global Step: 298890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:13,911-Speed 6336.64 samples/sec Loss 5.7978 LearningRate 0.0005 Epoch: 14 Global Step: 298900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:17,161-Speed 6303.68 samples/sec Loss 5.7925 LearningRate 0.0005 Epoch: 14 Global Step: 298910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:20,408-Speed 6308.99 samples/sec Loss 5.7294 LearningRate 0.0005 Epoch: 14 Global Step: 298920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:23,668-Speed 6283.58 samples/sec Loss 5.7391 LearningRate 0.0005 Epoch: 14 Global Step: 298930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:26,912-Speed 6314.36 samples/sec Loss 5.7540 LearningRate 0.0005 Epoch: 14 Global Step: 298940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:30,155-Speed 6316.19 samples/sec Loss 5.7514 LearningRate 0.0005 Epoch: 14 Global Step: 298950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:33,406-Speed 6301.55 samples/sec Loss 5.6819 LearningRate 0.0005 Epoch: 14 Global Step: 298960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:36,654-Speed 6307.30 samples/sec Loss 5.7663 LearningRate 0.0005 Epoch: 14 Global Step: 298970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:39,908-Speed 6295.67 samples/sec Loss 5.7085 LearningRate 0.0005 Epoch: 14 Global Step: 298980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:43,157-Speed 6303.75 samples/sec Loss 5.6391 LearningRate 0.0005 Epoch: 14 Global Step: 298990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:46,406-Speed 6306.28 samples/sec Loss 5.7104 
LearningRate 0.0005 Epoch: 14 Global Step: 299000 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 18:55:49,653-Speed 6309.52 samples/sec Loss 5.6560 LearningRate 0.0005 Epoch: 14 Global Step: 299010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:52,897-Speed 6313.32 samples/sec Loss 5.7835 LearningRate 0.0005 Epoch: 14 Global Step: 299020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:56,142-Speed 6312.82 samples/sec Loss 5.7213 LearningRate 0.0005 Epoch: 14 Global Step: 299030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:55:59,391-Speed 6305.70 samples/sec Loss 5.7391 LearningRate 0.0005 Epoch: 14 Global Step: 299040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:02,634-Speed 6315.00 samples/sec Loss 5.7270 LearningRate 0.0005 Epoch: 14 Global Step: 299050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:05,880-Speed 6311.11 samples/sec Loss 5.7156 LearningRate 0.0005 Epoch: 14 Global Step: 299060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:09,125-Speed 6312.85 samples/sec Loss 5.6863 LearningRate 0.0005 Epoch: 14 Global Step: 299070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:12,377-Speed 6298.84 samples/sec Loss 5.7776 LearningRate 0.0005 Epoch: 14 Global Step: 299080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:15,624-Speed 6309.03 samples/sec Loss 5.7791 LearningRate 0.0005 Epoch: 14 Global Step: 299090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:18,871-Speed 6310.11 samples/sec Loss 5.8247 LearningRate 0.0005 Epoch: 14 Global Step: 299100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:22,103-Speed 6337.09 samples/sec Loss 5.8154 LearningRate 0.0005 Epoch: 14 Global Step: 299110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:25,351-Speed 6307.12 samples/sec Loss 5.7234 LearningRate 0.0005 Epoch: 14 
Global Step: 299120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:28,596-Speed 6312.86 samples/sec Loss 5.7001 LearningRate 0.0005 Epoch: 14 Global Step: 299130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:31,843-Speed 6307.98 samples/sec Loss 5.7371 LearningRate 0.0005 Epoch: 14 Global Step: 299140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:35,097-Speed 6296.12 samples/sec Loss 5.7916 LearningRate 0.0005 Epoch: 14 Global Step: 299150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:38,345-Speed 6306.08 samples/sec Loss 5.7816 LearningRate 0.0005 Epoch: 14 Global Step: 299160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:41,590-Speed 6312.87 samples/sec Loss 5.8239 LearningRate 0.0005 Epoch: 14 Global Step: 299170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:44,836-Speed 6310.46 samples/sec Loss 5.7784 LearningRate 0.0005 Epoch: 14 Global Step: 299180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:48,085-Speed 6305.83 samples/sec Loss 5.7333 LearningRate 0.0005 Epoch: 14 Global Step: 299190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:56:51,316-Speed 6340.28 samples/sec Loss 5.7524 LearningRate 0.0005 Epoch: 14 Global Step: 299200 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:56:54,561-Speed 6312.59 samples/sec Loss 5.7920 LearningRate 0.0005 Epoch: 14 Global Step: 299210 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:56:57,806-Speed 6311.92 samples/sec Loss 5.7446 LearningRate 0.0005 Epoch: 14 Global Step: 299220 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:01,058-Speed 6299.81 samples/sec Loss 5.7288 LearningRate 0.0005 Epoch: 14 Global Step: 299230 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:04,306-Speed 6307.89 samples/sec Loss 5.7894 LearningRate 0.0005 Epoch: 14 Global Step: 299240 Fp16 Grad 
Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:07,549-Speed 6316.72 samples/sec Loss 5.7304 LearningRate 0.0005 Epoch: 14 Global Step: 299250 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:10,796-Speed 6307.03 samples/sec Loss 5.7544 LearningRate 0.0005 Epoch: 14 Global Step: 299260 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:14,043-Speed 6309.21 samples/sec Loss 5.7177 LearningRate 0.0005 Epoch: 14 Global Step: 299270 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:17,304-Speed 6281.82 samples/sec Loss 5.6754 LearningRate 0.0005 Epoch: 14 Global Step: 299280 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:20,548-Speed 6314.85 samples/sec Loss 5.7094 LearningRate 0.0005 Epoch: 14 Global Step: 299290 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:57:23,795-Speed 6308.39 samples/sec Loss 5.6846 LearningRate 0.0005 Epoch: 14 Global Step: 299300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:27,040-Speed 6313.00 samples/sec Loss 5.7610 LearningRate 0.0005 Epoch: 14 Global Step: 299310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:30,286-Speed 6311.51 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 14 Global Step: 299320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:33,545-Speed 6285.03 samples/sec Loss 5.8036 LearningRate 0.0005 Epoch: 14 Global Step: 299330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:36,790-Speed 6314.94 samples/sec Loss 5.7159 LearningRate 0.0005 Epoch: 14 Global Step: 299340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:40,040-Speed 6302.91 samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 299350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:43,286-Speed 6310.02 samples/sec Loss 5.6821 LearningRate 0.0005 Epoch: 14 Global Step: 299360 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 18:57:46,531-Speed 6311.51 samples/sec Loss 5.7470 LearningRate 0.0005 Epoch: 14 Global Step: 299370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:49,885-Speed 6107.61 samples/sec Loss 5.7452 LearningRate 0.0005 Epoch: 14 Global Step: 299380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:53,152-Speed 6270.93 samples/sec Loss 5.7427 LearningRate 0.0005 Epoch: 14 Global Step: 299390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:56,386-Speed 6335.40 samples/sec Loss 5.7646 LearningRate 0.0005 Epoch: 14 Global Step: 299400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:57:59,632-Speed 6309.45 samples/sec Loss 5.7548 LearningRate 0.0005 Epoch: 14 Global Step: 299410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:02,880-Speed 6307.66 samples/sec Loss 5.7580 LearningRate 0.0005 Epoch: 14 Global Step: 299420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:06,127-Speed 6308.46 samples/sec Loss 5.7883 LearningRate 0.0005 Epoch: 14 Global Step: 299430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:09,380-Speed 6297.62 samples/sec Loss 5.6848 LearningRate 0.0005 Epoch: 14 Global Step: 299440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:12,625-Speed 6312.33 samples/sec Loss 5.7292 LearningRate 0.0005 Epoch: 14 Global Step: 299450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:15,871-Speed 6311.05 samples/sec Loss 5.7462 LearningRate 0.0005 Epoch: 14 Global Step: 299460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:19,123-Speed 6299.63 samples/sec Loss 5.6825 LearningRate 0.0005 Epoch: 14 Global Step: 299470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:58:22,369-Speed 6309.58 samples/sec Loss 5.7741 LearningRate 0.0005 Epoch: 14 Global Step: 299480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
18:58:25,599-Speed 6342.56 samples/sec Loss 5.7289 LearningRate 0.0005 Epoch: 14 Global Step: 299490 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:28,846-Speed 6309.26 samples/sec Loss 5.7106 LearningRate 0.0005 Epoch: 14 Global Step: 299500 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:32,091-Speed 6311.47 samples/sec Loss 5.7088 LearningRate 0.0005 Epoch: 14 Global Step: 299510 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:35,337-Speed 6310.71 samples/sec Loss 5.7293 LearningRate 0.0005 Epoch: 14 Global Step: 299520 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:38,590-Speed 6297.37 samples/sec Loss 5.6228 LearningRate 0.0005 Epoch: 14 Global Step: 299530 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:41,836-Speed 6310.92 samples/sec Loss 5.7927 LearningRate 0.0005 Epoch: 14 Global Step: 299540 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:45,095-Speed 6286.17 samples/sec Loss 5.7412 LearningRate 0.0005 Epoch: 14 Global Step: 299550 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:48,338-Speed 6315.58 samples/sec Loss 5.7763 LearningRate 0.0005 Epoch: 14 Global Step: 299560 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:51,582-Speed 6315.04 samples/sec Loss 5.8522 LearningRate 0.0005 Epoch: 14 Global Step: 299570 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:54,824-Speed 6318.47 samples/sec Loss 5.7537 LearningRate 0.0005 Epoch: 14 Global Step: 299580 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 18:58:58,069-Speed 6311.99 samples/sec Loss 5.6841 LearningRate 0.0005 Epoch: 14 Global Step: 299590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:01,319-Speed 6305.53 samples/sec Loss 5.7342 LearningRate 0.0005 Epoch: 14 Global Step: 299600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:04,563-Speed 6313.08 
samples/sec Loss 5.7432 LearningRate 0.0005 Epoch: 14 Global Step: 299610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:07,811-Speed 6308.76 samples/sec Loss 5.7324 LearningRate 0.0005 Epoch: 14 Global Step: 299620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:11,062-Speed 6300.95 samples/sec Loss 5.8079 LearningRate 0.0005 Epoch: 14 Global Step: 299630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:14,309-Speed 6308.16 samples/sec Loss 5.7838 LearningRate 0.0005 Epoch: 14 Global Step: 299640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:17,556-Speed 6308.96 samples/sec Loss 5.7965 LearningRate 0.0005 Epoch: 14 Global Step: 299650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:20,802-Speed 6311.26 samples/sec Loss 5.6741 LearningRate 0.0005 Epoch: 14 Global Step: 299660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:24,063-Speed 6280.96 samples/sec Loss 5.8025 LearningRate 0.0005 Epoch: 14 Global Step: 299670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:27,312-Speed 6303.82 samples/sec Loss 5.6814 LearningRate 0.0005 Epoch: 14 Global Step: 299680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:30,549-Speed 6330.15 samples/sec Loss 5.7245 LearningRate 0.0005 Epoch: 14 Global Step: 299690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:33,791-Speed 6317.11 samples/sec Loss 5.6767 LearningRate 0.0005 Epoch: 14 Global Step: 299700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:37,038-Speed 6309.90 samples/sec Loss 5.7303 LearningRate 0.0005 Epoch: 14 Global Step: 299710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:40,284-Speed 6309.88 samples/sec Loss 5.7210 LearningRate 0.0005 Epoch: 14 Global Step: 299720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:43,528-Speed 6314.23 samples/sec Loss 5.7292 
LearningRate 0.0005 Epoch: 14 Global Step: 299730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:46,775-Speed 6308.41 samples/sec Loss 5.8326 LearningRate 0.0005 Epoch: 14 Global Step: 299740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:50,023-Speed 6307.16 samples/sec Loss 5.7172 LearningRate 0.0005 Epoch: 14 Global Step: 299750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:53,269-Speed 6310.84 samples/sec Loss 5.6057 LearningRate 0.0005 Epoch: 14 Global Step: 299760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:56,513-Speed 6313.75 samples/sec Loss 5.7880 LearningRate 0.0005 Epoch: 14 Global Step: 299770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 18:59:59,756-Speed 6317.28 samples/sec Loss 5.7030 LearningRate 0.0005 Epoch: 14 Global Step: 299780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:03,007-Speed 6300.41 samples/sec Loss 5.7533 LearningRate 0.0005 Epoch: 14 Global Step: 299790 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:00:06,245-Speed 6327.94 samples/sec Loss 5.7075 LearningRate 0.0005 Epoch: 14 Global Step: 299800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:09,494-Speed 6304.19 samples/sec Loss 5.7064 LearningRate 0.0005 Epoch: 14 Global Step: 299810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:12,746-Speed 6299.87 samples/sec Loss 5.7133 LearningRate 0.0005 Epoch: 14 Global Step: 299820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:15,996-Speed 6301.80 samples/sec Loss 5.8016 LearningRate 0.0005 Epoch: 14 Global Step: 299830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:19,244-Speed 6308.40 samples/sec Loss 5.7574 LearningRate 0.0005 Epoch: 14 Global Step: 299840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:22,491-Speed 6309.14 samples/sec Loss 5.6773 LearningRate 0.0005 Epoch: 14 
Global Step: 299850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:25,737-Speed 6309.73 samples/sec Loss 5.7578 LearningRate 0.0005 Epoch: 14 Global Step: 299860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:28,985-Speed 6307.59 samples/sec Loss 5.6986 LearningRate 0.0005 Epoch: 14 Global Step: 299870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:32,236-Speed 6300.99 samples/sec Loss 5.7083 LearningRate 0.0005 Epoch: 14 Global Step: 299880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:35,481-Speed 6311.43 samples/sec Loss 5.7157 LearningRate 0.0005 Epoch: 14 Global Step: 299890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:38,717-Speed 6332.10 samples/sec Loss 5.7535 LearningRate 0.0005 Epoch: 14 Global Step: 299900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:41,960-Speed 6315.41 samples/sec Loss 5.8190 LearningRate 0.0005 Epoch: 14 Global Step: 299910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:45,207-Speed 6308.45 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 14 Global Step: 299920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:48,453-Speed 6311.17 samples/sec Loss 5.8016 LearningRate 0.0005 Epoch: 14 Global Step: 299930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:51,702-Speed 6304.17 samples/sec Loss 5.7417 LearningRate 0.0005 Epoch: 14 Global Step: 299940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:54,948-Speed 6310.70 samples/sec Loss 5.7573 LearningRate 0.0005 Epoch: 14 Global Step: 299950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:00:58,221-Speed 6258.23 samples/sec Loss 5.7479 LearningRate 0.0005 Epoch: 14 Global Step: 299960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:01,471-Speed 6304.43 samples/sec Loss 5.6806 LearningRate 0.0005 Epoch: 14 Global Step: 299970 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:04,716-Speed 6312.96 samples/sec Loss 5.7830 LearningRate 0.0005 Epoch: 14 Global Step: 299980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:07,962-Speed 6309.84 samples/sec Loss 5.7373 LearningRate 0.0005 Epoch: 14 Global Step: 299990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:11,200-Speed 6325.75 samples/sec Loss 5.7516 LearningRate 0.0005 Epoch: 14 Global Step: 300000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:14,447-Speed 6309.08 samples/sec Loss 5.7325 LearningRate 0.0005 Epoch: 14 Global Step: 300010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:17,694-Speed 6309.98 samples/sec Loss 5.7144 LearningRate 0.0005 Epoch: 14 Global Step: 300020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:20,945-Speed 6301.74 samples/sec Loss 5.7010 LearningRate 0.0005 Epoch: 14 Global Step: 300030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:24,187-Speed 6316.70 samples/sec Loss 5.6914 LearningRate 0.0005 Epoch: 14 Global Step: 300040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:27,433-Speed 6312.06 samples/sec Loss 5.7378 LearningRate 0.0005 Epoch: 14 Global Step: 300050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:30,690-Speed 6289.23 samples/sec Loss 5.6989 LearningRate 0.0005 Epoch: 14 Global Step: 300060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:33,937-Speed 6308.04 samples/sec Loss 5.6590 LearningRate 0.0005 Epoch: 14 Global Step: 300070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:37,182-Speed 6314.03 samples/sec Loss 5.6721 LearningRate 0.0005 Epoch: 14 Global Step: 300080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:40,425-Speed 6315.78 samples/sec Loss 5.7252 LearningRate 0.0005 Epoch: 14 Global Step: 300090 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:01:43,663-Speed 6327.09 samples/sec Loss 5.8092 LearningRate 0.0005 Epoch: 14 Global Step: 300100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:46,909-Speed 6309.10 samples/sec Loss 5.7401 LearningRate 0.0005 Epoch: 14 Global Step: 300110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:50,156-Speed 6309.75 samples/sec Loss 5.7311 LearningRate 0.0005 Epoch: 14 Global Step: 300120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:53,400-Speed 6313.78 samples/sec Loss 5.7410 LearningRate 0.0005 Epoch: 14 Global Step: 300130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:56,650-Speed 6303.65 samples/sec Loss 5.7841 LearningRate 0.0005 Epoch: 14 Global Step: 300140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:01:59,906-Speed 6290.75 samples/sec Loss 5.7326 LearningRate 0.0005 Epoch: 14 Global Step: 300150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:03,155-Speed 6305.26 samples/sec Loss 5.7944 LearningRate 0.0005 Epoch: 14 Global Step: 300160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:06,407-Speed 6300.34 samples/sec Loss 5.7131 LearningRate 0.0005 Epoch: 14 Global Step: 300170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:09,652-Speed 6312.45 samples/sec Loss 5.6909 LearningRate 0.0005 Epoch: 14 Global Step: 300180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:12,897-Speed 6312.22 samples/sec Loss 5.7276 LearningRate 0.0005 Epoch: 14 Global Step: 300190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:16,128-Speed 6339.97 samples/sec Loss 5.8045 LearningRate 0.0005 Epoch: 14 Global Step: 300200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:19,375-Speed 6308.69 samples/sec Loss 5.8155 LearningRate 0.0005 Epoch: 14 Global Step: 300210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:02:22,621-Speed 6310.82 samples/sec Loss 5.7831 LearningRate 0.0005 Epoch: 14 Global Step: 300220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:25,868-Speed 6309.11 samples/sec Loss 5.7656 LearningRate 0.0005 Epoch: 14 Global Step: 300230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:29,120-Speed 6299.17 samples/sec Loss 5.7312 LearningRate 0.0005 Epoch: 14 Global Step: 300240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:32,367-Speed 6309.88 samples/sec Loss 5.7410 LearningRate 0.0005 Epoch: 14 Global Step: 300250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:35,613-Speed 6309.69 samples/sec Loss 5.7378 LearningRate 0.0005 Epoch: 14 Global Step: 300260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:38,861-Speed 6309.94 samples/sec Loss 5.7704 LearningRate 0.0005 Epoch: 14 Global Step: 300270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:42,106-Speed 6312.21 samples/sec Loss 5.7063 LearningRate 0.0005 Epoch: 14 Global Step: 300280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:45,354-Speed 6307.25 samples/sec Loss 5.6717 LearningRate 0.0005 Epoch: 14 Global Step: 300290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:48,586-Speed 6336.79 samples/sec Loss 5.7860 LearningRate 0.0005 Epoch: 14 Global Step: 300300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:51,841-Speed 6294.36 samples/sec Loss 5.7542 LearningRate 0.0005 Epoch: 14 Global Step: 300310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:55,087-Speed 6309.99 samples/sec Loss 5.7257 LearningRate 0.0005 Epoch: 14 Global Step: 300320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:02:58,329-Speed 6318.16 samples/sec Loss 5.7080 LearningRate 0.0005 Epoch: 14 Global Step: 300330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:01,588-Speed 6285.36 
samples/sec Loss 5.6648 LearningRate 0.0005 Epoch: 14 Global Step: 300340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:04,834-Speed 6311.64 samples/sec Loss 5.7556 LearningRate 0.0005 Epoch: 14 Global Step: 300350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:08,084-Speed 6302.51 samples/sec Loss 5.7346 LearningRate 0.0005 Epoch: 14 Global Step: 300360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:11,336-Speed 6298.90 samples/sec Loss 5.7466 LearningRate 0.0005 Epoch: 14 Global Step: 300370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:14,582-Speed 6311.57 samples/sec Loss 5.6934 LearningRate 0.0005 Epoch: 14 Global Step: 300380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:17,832-Speed 6301.77 samples/sec Loss 5.7072 LearningRate 0.0005 Epoch: 14 Global Step: 300390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:21,062-Speed 6342.06 samples/sec Loss 5.6930 LearningRate 0.0005 Epoch: 14 Global Step: 300400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:24,311-Speed 6305.99 samples/sec Loss 5.7514 LearningRate 0.0005 Epoch: 14 Global Step: 300410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:27,556-Speed 6312.78 samples/sec Loss 5.7161 LearningRate 0.0005 Epoch: 14 Global Step: 300420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:30,799-Speed 6315.15 samples/sec Loss 5.6973 LearningRate 0.0005 Epoch: 14 Global Step: 300430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:34,044-Speed 6312.13 samples/sec Loss 5.6953 LearningRate 0.0005 Epoch: 14 Global Step: 300440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:37,288-Speed 6315.94 samples/sec Loss 5.8347 LearningRate 0.0005 Epoch: 14 Global Step: 300450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:40,534-Speed 6311.25 samples/sec Loss 5.7414 
LearningRate 0.0005 Epoch: 14 Global Step: 300460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:43,780-Speed 6310.42 samples/sec Loss 5.7033 LearningRate 0.0005 Epoch: 14 Global Step: 300470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:47,048-Speed 6268.47 samples/sec Loss 5.7260 LearningRate 0.0005 Epoch: 14 Global Step: 300480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:50,293-Speed 6311.89 samples/sec Loss 5.7221 LearningRate 0.0005 Epoch: 14 Global Step: 300490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:03:53,540-Speed 6308.78 samples/sec Loss 5.6810 LearningRate 0.0005 Epoch: 14 Global Step: 300500 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:03:56,772-Speed 6339.53 samples/sec Loss 5.7355 LearningRate 0.0005 Epoch: 14 Global Step: 300510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:00,014-Speed 6317.78 samples/sec Loss 5.7423 LearningRate 0.0005 Epoch: 14 Global Step: 300520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:03,261-Speed 6308.73 samples/sec Loss 5.7527 LearningRate 0.0005 Epoch: 14 Global Step: 300530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:06,508-Speed 6309.03 samples/sec Loss 5.7530 LearningRate 0.0005 Epoch: 14 Global Step: 300540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:09,751-Speed 6315.69 samples/sec Loss 5.7967 LearningRate 0.0005 Epoch: 14 Global Step: 300550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:12,981-Speed 6342.74 samples/sec Loss 5.7575 LearningRate 0.0005 Epoch: 14 Global Step: 300560 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:16,223-Speed 6318.49 samples/sec Loss 5.7294 LearningRate 0.0005 Epoch: 14 Global Step: 300570 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:19,484-Speed 6281.79 samples/sec Loss 5.7226 LearningRate 0.0005 Epoch: 14 
Global Step: 300580 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:22,732-Speed 6306.23 samples/sec Loss 5.6933 LearningRate 0.0005 Epoch: 14 Global Step: 300590 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:25,981-Speed 6305.80 samples/sec Loss 5.7109 LearningRate 0.0005 Epoch: 14 Global Step: 300600 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:29,230-Speed 6303.97 samples/sec Loss 5.7309 LearningRate 0.0005 Epoch: 14 Global Step: 300610 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:32,476-Speed 6311.60 samples/sec Loss 5.7395 LearningRate 0.0005 Epoch: 14 Global Step: 300620 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:35,723-Speed 6308.02 samples/sec Loss 5.6756 LearningRate 0.0005 Epoch: 14 Global Step: 300630 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:38,964-Speed 6320.20 samples/sec Loss 5.7390 LearningRate 0.0005 Epoch: 14 Global Step: 300640 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:42,209-Speed 6313.23 samples/sec Loss 5.7783 LearningRate 0.0005 Epoch: 14 Global Step: 300650 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:04:45,456-Speed 6309.38 samples/sec Loss 5.7562 LearningRate 0.0005 Epoch: 14 Global Step: 300660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:48,699-Speed 6316.62 samples/sec Loss 5.7493 LearningRate 0.0005 Epoch: 14 Global Step: 300670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:51,945-Speed 6310.82 samples/sec Loss 5.7583 LearningRate 0.0005 Epoch: 14 Global Step: 300680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:55,195-Speed 6303.05 samples/sec Loss 5.7016 LearningRate 0.0005 Epoch: 14 Global Step: 300690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:04:58,442-Speed 6309.32 samples/sec Loss 5.7260 LearningRate 0.0005 Epoch: 14 Global Step: 300700 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:01,688-Speed 6310.67 samples/sec Loss 5.6750 LearningRate 0.0005 Epoch: 14 Global Step: 300710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:04,936-Speed 6306.84 samples/sec Loss 5.7536 LearningRate 0.0005 Epoch: 14 Global Step: 300720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:08,183-Speed 6309.06 samples/sec Loss 5.7312 LearningRate 0.0005 Epoch: 14 Global Step: 300730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:11,426-Speed 6316.17 samples/sec Loss 5.7605 LearningRate 0.0005 Epoch: 14 Global Step: 300740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:14,676-Speed 6302.59 samples/sec Loss 5.7274 LearningRate 0.0005 Epoch: 14 Global Step: 300750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:17,916-Speed 6323.65 samples/sec Loss 5.7377 LearningRate 0.0005 Epoch: 14 Global Step: 300760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:21,162-Speed 6310.92 samples/sec Loss 5.7252 LearningRate 0.0005 Epoch: 14 Global Step: 300770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:24,412-Speed 6302.21 samples/sec Loss 5.7133 LearningRate 0.0005 Epoch: 14 Global Step: 300780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:27,660-Speed 6306.89 samples/sec Loss 5.7576 LearningRate 0.0005 Epoch: 14 Global Step: 300790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:30,909-Speed 6304.59 samples/sec Loss 5.7367 LearningRate 0.0005 Epoch: 14 Global Step: 300800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:34,157-Speed 6306.15 samples/sec Loss 5.7129 LearningRate 0.0005 Epoch: 14 Global Step: 300810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:37,402-Speed 6312.04 samples/sec Loss 5.7396 LearningRate 0.0005 Epoch: 14 Global Step: 300820 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:05:40,648-Speed 6311.75 samples/sec Loss 5.7034 LearningRate 0.0005 Epoch: 14 Global Step: 300830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:43,897-Speed 6304.17 samples/sec Loss 5.7035 LearningRate 0.0005 Epoch: 14 Global Step: 300840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:47,145-Speed 6308.19 samples/sec Loss 5.7289 LearningRate 0.0005 Epoch: 14 Global Step: 300850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:50,377-Speed 6337.13 samples/sec Loss 5.8190 LearningRate 0.0005 Epoch: 14 Global Step: 300860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:53,630-Speed 6297.01 samples/sec Loss 5.7290 LearningRate 0.0005 Epoch: 14 Global Step: 300870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:05:56,875-Speed 6313.10 samples/sec Loss 5.6405 LearningRate 0.0005 Epoch: 14 Global Step: 300880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:00,124-Speed 6305.93 samples/sec Loss 5.7439 LearningRate 0.0005 Epoch: 14 Global Step: 300890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:03,374-Speed 6303.28 samples/sec Loss 5.7351 LearningRate 0.0005 Epoch: 14 Global Step: 300900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:06,618-Speed 6314.29 samples/sec Loss 5.6618 LearningRate 0.0005 Epoch: 14 Global Step: 300910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:09,864-Speed 6311.64 samples/sec Loss 5.7194 LearningRate 0.0005 Epoch: 14 Global Step: 300920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:13,107-Speed 6315.54 samples/sec Loss 5.7104 LearningRate 0.0005 Epoch: 14 Global Step: 300930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:16,355-Speed 6307.33 samples/sec Loss 5.7403 LearningRate 0.0005 Epoch: 14 Global Step: 300940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:06:19,602-Speed 6308.51 samples/sec Loss 5.7039 LearningRate 0.0005 Epoch: 14 Global Step: 300950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:22,833-Speed 6340.29 samples/sec Loss 5.6679 LearningRate 0.0005 Epoch: 14 Global Step: 300960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:26,082-Speed 6305.01 samples/sec Loss 5.7063 LearningRate 0.0005 Epoch: 14 Global Step: 300970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:29,323-Speed 6319.79 samples/sec Loss 5.7168 LearningRate 0.0005 Epoch: 14 Global Step: 300980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:32,567-Speed 6316.16 samples/sec Loss 5.7811 LearningRate 0.0005 Epoch: 14 Global Step: 300990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:35,820-Speed 6296.73 samples/sec Loss 5.7798 LearningRate 0.0005 Epoch: 14 Global Step: 301000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:39,072-Speed 6298.59 samples/sec Loss 5.6894 LearningRate 0.0005 Epoch: 14 Global Step: 301010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:42,321-Speed 6304.99 samples/sec Loss 5.6865 LearningRate 0.0005 Epoch: 14 Global Step: 301020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:45,566-Speed 6311.36 samples/sec Loss 5.6930 LearningRate 0.0005 Epoch: 14 Global Step: 301030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:48,814-Speed 6308.72 samples/sec Loss 5.7865 LearningRate 0.0005 Epoch: 14 Global Step: 301040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:52,063-Speed 6304.71 samples/sec Loss 5.7361 LearningRate 0.0005 Epoch: 14 Global Step: 301050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:06:55,309-Speed 6310.38 samples/sec Loss 5.7212 LearningRate 0.0005 Epoch: 14 Global Step: 301060 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:06:58,547-Speed 6325.66 
samples/sec Loss 5.7290 LearningRate 0.0005 Epoch: 14 Global Step: 301070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:01,883-Speed 6139.99 samples/sec Loss 5.6822 LearningRate 0.0005 Epoch: 14 Global Step: 301080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:05,206-Speed 6166.07 samples/sec Loss 5.7594 LearningRate 0.0005 Epoch: 14 Global Step: 301090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:08,452-Speed 6310.90 samples/sec Loss 5.7698 LearningRate 0.0005 Epoch: 14 Global Step: 301100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:11,700-Speed 6307.49 samples/sec Loss 5.7482 LearningRate 0.0005 Epoch: 14 Global Step: 301110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:14,950-Speed 6302.79 samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 301120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:18,195-Speed 6311.85 samples/sec Loss 5.7147 LearningRate 0.0005 Epoch: 14 Global Step: 301130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:21,443-Speed 6308.13 samples/sec Loss 5.7076 LearningRate 0.0005 Epoch: 14 Global Step: 301140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:24,690-Speed 6308.36 samples/sec Loss 5.7705 LearningRate 0.0005 Epoch: 14 Global Step: 301150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:27,937-Speed 6308.77 samples/sec Loss 5.7003 LearningRate 0.0005 Epoch: 14 Global Step: 301160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:31,167-Speed 6341.61 samples/sec Loss 5.7542 LearningRate 0.0005 Epoch: 14 Global Step: 301170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:34,415-Speed 6307.40 samples/sec Loss 5.8544 LearningRate 0.0005 Epoch: 14 Global Step: 301180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:37,660-Speed 6312.24 samples/sec Loss 5.7524 
LearningRate 0.0005 Epoch: 14 Global Step: 301190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:40,911-Speed 6300.50 samples/sec Loss 5.7148 LearningRate 0.0005 Epoch: 14 Global Step: 301200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:44,159-Speed 6306.84 samples/sec Loss 5.7317 LearningRate 0.0005 Epoch: 14 Global Step: 301210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:47,415-Speed 6292.91 samples/sec Loss 5.7478 LearningRate 0.0005 Epoch: 14 Global Step: 301220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:50,662-Speed 6307.61 samples/sec Loss 5.7093 LearningRate 0.0005 Epoch: 14 Global Step: 301230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:53,909-Speed 6308.29 samples/sec Loss 5.7438 LearningRate 0.0005 Epoch: 14 Global Step: 301240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:07:57,157-Speed 6306.93 samples/sec Loss 5.7641 LearningRate 0.0005 Epoch: 14 Global Step: 301250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:00,403-Speed 6311.44 samples/sec Loss 5.8260 LearningRate 0.0005 Epoch: 14 Global Step: 301260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:03,636-Speed 6336.50 samples/sec Loss 5.7834 LearningRate 0.0005 Epoch: 14 Global Step: 301270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:06,884-Speed 6305.34 samples/sec Loss 5.7966 LearningRate 0.0005 Epoch: 14 Global Step: 301280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:10,129-Speed 6313.44 samples/sec Loss 5.7947 LearningRate 0.0005 Epoch: 14 Global Step: 301290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:13,377-Speed 6307.68 samples/sec Loss 5.7282 LearningRate 0.0005 Epoch: 14 Global Step: 301300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:16,625-Speed 6306.84 samples/sec Loss 5.7668 LearningRate 0.0005 Epoch: 14 
Global Step: 301310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:19,870-Speed 6312.75 samples/sec Loss 5.7509 LearningRate 0.0005 Epoch: 14 Global Step: 301320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:23,134-Speed 6275.84 samples/sec Loss 5.6701 LearningRate 0.0005 Epoch: 14 Global Step: 301330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:26,378-Speed 6313.73 samples/sec Loss 5.7035 LearningRate 0.0005 Epoch: 14 Global Step: 301340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:29,625-Speed 6310.47 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 14 Global Step: 301350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:32,870-Speed 6312.52 samples/sec Loss 5.7418 LearningRate 0.0005 Epoch: 14 Global Step: 301360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:36,102-Speed 6336.57 samples/sec Loss 5.6818 LearningRate 0.0005 Epoch: 14 Global Step: 301370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:39,345-Speed 6316.96 samples/sec Loss 5.7163 LearningRate 0.0005 Epoch: 14 Global Step: 301380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:42,591-Speed 6310.60 samples/sec Loss 5.7484 LearningRate 0.0005 Epoch: 14 Global Step: 301390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:45,836-Speed 6312.46 samples/sec Loss 5.7568 LearningRate 0.0005 Epoch: 14 Global Step: 301400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:49,086-Speed 6303.86 samples/sec Loss 5.7639 LearningRate 0.0005 Epoch: 14 Global Step: 301410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:52,333-Speed 6309.32 samples/sec Loss 5.7540 LearningRate 0.0005 Epoch: 14 Global Step: 301420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:55,575-Speed 6318.21 samples/sec Loss 5.7496 LearningRate 0.0005 Epoch: 14 Global Step: 301430 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:08:58,821-Speed 6310.06 samples/sec Loss 5.7491 LearningRate 0.0005 Epoch: 14 Global Step: 301440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:02,066-Speed 6312.14 samples/sec Loss 5.7714 LearningRate 0.0005 Epoch: 14 Global Step: 301450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:05,312-Speed 6310.92 samples/sec Loss 5.6807 LearningRate 0.0005 Epoch: 14 Global Step: 301460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:08,543-Speed 6339.63 samples/sec Loss 5.7671 LearningRate 0.0005 Epoch: 14 Global Step: 301470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:11,787-Speed 6315.61 samples/sec Loss 5.7108 LearningRate 0.0005 Epoch: 14 Global Step: 301480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:15,035-Speed 6306.36 samples/sec Loss 5.7487 LearningRate 0.0005 Epoch: 14 Global Step: 301490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:18,286-Speed 6304.96 samples/sec Loss 5.7609 LearningRate 0.0005 Epoch: 14 Global Step: 301500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:21,530-Speed 6315.01 samples/sec Loss 5.7626 LearningRate 0.0005 Epoch: 14 Global Step: 301510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:24,776-Speed 6309.64 samples/sec Loss 5.6878 LearningRate 0.0005 Epoch: 14 Global Step: 301520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:28,021-Speed 6313.50 samples/sec Loss 5.7987 LearningRate 0.0005 Epoch: 14 Global Step: 301530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:31,268-Speed 6308.60 samples/sec Loss 5.6895 LearningRate 0.0005 Epoch: 14 Global Step: 301540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:34,514-Speed 6310.77 samples/sec Loss 5.7472 LearningRate 0.0005 Epoch: 14 Global Step: 301550 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:09:37,770-Speed 6291.28 samples/sec Loss 5.7269 LearningRate 0.0005 Epoch: 14 Global Step: 301560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:41,017-Speed 6308.82 samples/sec Loss 5.8129 LearningRate 0.0005 Epoch: 14 Global Step: 301570 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:09:44,248-Speed 6341.43 samples/sec Loss 5.7384 LearningRate 0.0005 Epoch: 14 Global Step: 301580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:47,498-Speed 6302.90 samples/sec Loss 5.7275 LearningRate 0.0005 Epoch: 14 Global Step: 301590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:50,740-Speed 6318.58 samples/sec Loss 5.6966 LearningRate 0.0005 Epoch: 14 Global Step: 301600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:53,984-Speed 6313.63 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 14 Global Step: 301610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:09:57,229-Speed 6312.11 samples/sec Loss 5.7767 LearningRate 0.0005 Epoch: 14 Global Step: 301620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:00,474-Speed 6312.58 samples/sec Loss 5.7937 LearningRate 0.0005 Epoch: 14 Global Step: 301630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:03,722-Speed 6306.20 samples/sec Loss 5.7321 LearningRate 0.0005 Epoch: 14 Global Step: 301640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:06,966-Speed 6315.07 samples/sec Loss 5.7235 LearningRate 0.0005 Epoch: 14 Global Step: 301650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:10,216-Speed 6304.59 samples/sec Loss 5.8014 LearningRate 0.0005 Epoch: 14 Global Step: 301660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:13,461-Speed 6312.15 samples/sec Loss 5.7926 LearningRate 0.0005 Epoch: 14 Global Step: 301670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:10:16,691-Speed 6341.31 samples/sec Loss 5.7392 LearningRate 0.0005 Epoch: 14 Global Step: 301680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:19,937-Speed 6310.58 samples/sec Loss 5.7874 LearningRate 0.0005 Epoch: 14 Global Step: 301690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:23,181-Speed 6314.26 samples/sec Loss 5.6920 LearningRate 0.0005 Epoch: 14 Global Step: 301700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:26,429-Speed 6307.97 samples/sec Loss 5.6815 LearningRate 0.0005 Epoch: 14 Global Step: 301710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:29,674-Speed 6311.80 samples/sec Loss 5.7082 LearningRate 0.0005 Epoch: 14 Global Step: 301720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:32,922-Speed 6307.36 samples/sec Loss 5.7699 LearningRate 0.0005 Epoch: 14 Global Step: 301730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:36,171-Speed 6305.72 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 14 Global Step: 301740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:39,420-Speed 6305.57 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 14 Global Step: 301750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:42,665-Speed 6311.89 samples/sec Loss 5.6740 LearningRate 0.0005 Epoch: 14 Global Step: 301760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:45,914-Speed 6305.74 samples/sec Loss 5.7558 LearningRate 0.0005 Epoch: 14 Global Step: 301770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:49,148-Speed 6332.98 samples/sec Loss 5.7649 LearningRate 0.0005 Epoch: 14 Global Step: 301780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:52,400-Speed 6300.48 samples/sec Loss 5.7524 LearningRate 0.0005 Epoch: 14 Global Step: 301790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:55,645-Speed 6310.92 
samples/sec Loss 5.7117 LearningRate 0.0005 Epoch: 14 Global Step: 301800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:10:58,894-Speed 6306.27 samples/sec Loss 5.7504 LearningRate 0.0005 Epoch: 14 Global Step: 301810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:02,138-Speed 6314.73 samples/sec Loss 5.7220 LearningRate 0.0005 Epoch: 14 Global Step: 301820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:05,385-Speed 6307.10 samples/sec Loss 5.6874 LearningRate 0.0005 Epoch: 14 Global Step: 301830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:08,638-Speed 6297.49 samples/sec Loss 5.7530 LearningRate 0.0005 Epoch: 14 Global Step: 301840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:11,884-Speed 6311.39 samples/sec Loss 5.7482 LearningRate 0.0005 Epoch: 14 Global Step: 301850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:15,131-Speed 6308.87 samples/sec Loss 5.7080 LearningRate 0.0005 Epoch: 14 Global Step: 301860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:18,381-Speed 6302.54 samples/sec Loss 5.6880 LearningRate 0.0005 Epoch: 14 Global Step: 301870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:21,616-Speed 6332.38 samples/sec Loss 5.6765 LearningRate 0.0005 Epoch: 14 Global Step: 301880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:24,864-Speed 6306.87 samples/sec Loss 5.6915 LearningRate 0.0005 Epoch: 14 Global Step: 301890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:28,113-Speed 6305.29 samples/sec Loss 5.7032 LearningRate 0.0005 Epoch: 14 Global Step: 301900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:31,358-Speed 6311.95 samples/sec Loss 5.6934 LearningRate 0.0005 Epoch: 14 Global Step: 301910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:34,600-Speed 6319.49 samples/sec Loss 5.7282 
LearningRate 0.0005 Epoch: 14 Global Step: 301920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:37,845-Speed 6311.87 samples/sec Loss 5.7387 LearningRate 0.0005 Epoch: 14 Global Step: 301930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:41,094-Speed 6305.20 samples/sec Loss 5.7628 LearningRate 0.0005 Epoch: 14 Global Step: 301940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:44,340-Speed 6310.51 samples/sec Loss 5.7240 LearningRate 0.0005 Epoch: 14 Global Step: 301950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:47,588-Speed 6307.26 samples/sec Loss 5.7476 LearningRate 0.0005 Epoch: 14 Global Step: 301960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:50,837-Speed 6305.49 samples/sec Loss 5.7876 LearningRate 0.0005 Epoch: 14 Global Step: 301970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:54,073-Speed 6331.15 samples/sec Loss 5.7269 LearningRate 0.0005 Epoch: 14 Global Step: 301980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:11:57,317-Speed 6313.41 samples/sec Loss 5.6903 LearningRate 0.0005 Epoch: 14 Global Step: 301990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:00,563-Speed 6311.08 samples/sec Loss 5.7264 LearningRate 0.0005 Epoch: 14 Global Step: 302000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:03,810-Speed 6308.48 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 14 Global Step: 302010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:07,054-Speed 6315.69 samples/sec Loss 5.7135 LearningRate 0.0005 Epoch: 14 Global Step: 302020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:10,300-Speed 6308.73 samples/sec Loss 5.7266 LearningRate 0.0005 Epoch: 14 Global Step: 302030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:13,546-Speed 6312.32 samples/sec Loss 5.6402 LearningRate 0.0005 Epoch: 14 
Global Step: 302040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:16,793-Speed 6308.41 samples/sec Loss 5.8109 LearningRate 0.0005 Epoch: 14 Global Step: 302050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:20,040-Speed 6307.60 samples/sec Loss 5.7128 LearningRate 0.0005 Epoch: 14 Global Step: 302060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:23,287-Speed 6309.61 samples/sec Loss 5.6889 LearningRate 0.0005 Epoch: 14 Global Step: 302070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:26,537-Speed 6302.95 samples/sec Loss 5.7367 LearningRate 0.0005 Epoch: 14 Global Step: 302080 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:12:29,773-Speed 6330.58 samples/sec Loss 5.8195 LearningRate 0.0005 Epoch: 14 Global Step: 302090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:33,019-Speed 6309.71 samples/sec Loss 5.6964 LearningRate 0.0005 Epoch: 14 Global Step: 302100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:36,262-Speed 6317.21 samples/sec Loss 5.6964 LearningRate 0.0005 Epoch: 14 Global Step: 302110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:39,509-Speed 6308.33 samples/sec Loss 5.7250 LearningRate 0.0005 Epoch: 14 Global Step: 302120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:42,756-Speed 6310.04 samples/sec Loss 5.6191 LearningRate 0.0005 Epoch: 14 Global Step: 302130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:45,997-Speed 6319.68 samples/sec Loss 5.6356 LearningRate 0.0005 Epoch: 14 Global Step: 302140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:49,242-Speed 6313.09 samples/sec Loss 5.7292 LearningRate 0.0005 Epoch: 14 Global Step: 302150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:52,484-Speed 6319.61 samples/sec Loss 5.6453 LearningRate 0.0005 Epoch: 14 Global Step: 302160 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:55,733-Speed 6304.20 samples/sec Loss 5.6916 LearningRate 0.0005 Epoch: 14 Global Step: 302170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:12:58,978-Speed 6312.19 samples/sec Loss 5.6676 LearningRate 0.0005 Epoch: 14 Global Step: 302180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:02,210-Speed 6337.72 samples/sec Loss 5.7292 LearningRate 0.0005 Epoch: 14 Global Step: 302190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:05,461-Speed 6302.48 samples/sec Loss 5.7121 LearningRate 0.0005 Epoch: 14 Global Step: 302200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:08,706-Speed 6312.75 samples/sec Loss 5.7634 LearningRate 0.0005 Epoch: 14 Global Step: 302210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:11,955-Speed 6303.71 samples/sec Loss 5.7553 LearningRate 0.0005 Epoch: 14 Global Step: 302220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:15,200-Speed 6313.23 samples/sec Loss 5.6700 LearningRate 0.0005 Epoch: 14 Global Step: 302230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:18,447-Speed 6309.77 samples/sec Loss 5.7074 LearningRate 0.0005 Epoch: 14 Global Step: 302240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:21,690-Speed 6314.60 samples/sec Loss 5.7414 LearningRate 0.0005 Epoch: 14 Global Step: 302250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:24,939-Speed 6307.94 samples/sec Loss 5.7441 LearningRate 0.0005 Epoch: 14 Global Step: 302260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:28,184-Speed 6313.31 samples/sec Loss 5.7130 LearningRate 0.0005 Epoch: 14 Global Step: 302270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:31,435-Speed 6301.34 samples/sec Loss 5.7527 LearningRate 0.0005 Epoch: 14 Global Step: 302280 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:13:34,668-Speed 6335.47 samples/sec Loss 5.7076 LearningRate 0.0005 Epoch: 14 Global Step: 302290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:37,929-Speed 6282.12 samples/sec Loss 5.6871 LearningRate 0.0005 Epoch: 14 Global Step: 302300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:41,174-Speed 6312.34 samples/sec Loss 5.7386 LearningRate 0.0005 Epoch: 14 Global Step: 302310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:44,423-Speed 6304.27 samples/sec Loss 5.6781 LearningRate 0.0005 Epoch: 14 Global Step: 302320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:47,667-Speed 6316.00 samples/sec Loss 5.6953 LearningRate 0.0005 Epoch: 14 Global Step: 302330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:50,912-Speed 6312.56 samples/sec Loss 5.7358 LearningRate 0.0005 Epoch: 14 Global Step: 302340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:54,158-Speed 6310.98 samples/sec Loss 5.6808 LearningRate 0.0005 Epoch: 14 Global Step: 302350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:13:57,404-Speed 6311.90 samples/sec Loss 5.6818 LearningRate 0.0005 Epoch: 14 Global Step: 302360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:00,651-Speed 6309.09 samples/sec Loss 5.6367 LearningRate 0.0005 Epoch: 14 Global Step: 302370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:03,897-Speed 6310.48 samples/sec Loss 5.7502 LearningRate 0.0005 Epoch: 14 Global Step: 302380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:07,132-Speed 6332.07 samples/sec Loss 5.6923 LearningRate 0.0005 Epoch: 14 Global Step: 302390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:10,382-Speed 6301.87 samples/sec Loss 5.6756 LearningRate 0.0005 Epoch: 14 Global Step: 302400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:14:13,626-Speed 6314.85 samples/sec Loss 5.7229 LearningRate 0.0005 Epoch: 14 Global Step: 302410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:16,871-Speed 6312.82 samples/sec Loss 5.6856 LearningRate 0.0005 Epoch: 14 Global Step: 302420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:20,117-Speed 6310.75 samples/sec Loss 5.6054 LearningRate 0.0005 Epoch: 14 Global Step: 302430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:23,363-Speed 6310.15 samples/sec Loss 5.7576 LearningRate 0.0005 Epoch: 14 Global Step: 302440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:26,613-Speed 6303.77 samples/sec Loss 5.6571 LearningRate 0.0005 Epoch: 14 Global Step: 302450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:29,868-Speed 6292.32 samples/sec Loss 5.7586 LearningRate 0.0005 Epoch: 14 Global Step: 302460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:33,111-Speed 6316.42 samples/sec Loss 5.6883 LearningRate 0.0005 Epoch: 14 Global Step: 302470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:36,358-Speed 6309.68 samples/sec Loss 5.7389 LearningRate 0.0005 Epoch: 14 Global Step: 302480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:39,606-Speed 6307.49 samples/sec Loss 5.7574 LearningRate 0.0005 Epoch: 14 Global Step: 302490 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:14:42,840-Speed 6334.28 samples/sec Loss 5.7217 LearningRate 0.0005 Epoch: 14 Global Step: 302500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:46,085-Speed 6311.96 samples/sec Loss 5.7448 LearningRate 0.0005 Epoch: 14 Global Step: 302510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:49,330-Speed 6312.72 samples/sec Loss 5.6853 LearningRate 0.0005 Epoch: 14 Global Step: 302520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:52,575-Speed 6313.47 
samples/sec Loss 5.7585 LearningRate 0.0005 Epoch: 14 Global Step: 302530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:55,826-Speed 6299.82 samples/sec Loss 5.6731 LearningRate 0.0005 Epoch: 14 Global Step: 302540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:14:59,129-Speed 6202.35 samples/sec Loss 5.7627 LearningRate 0.0005 Epoch: 14 Global Step: 302550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:02,373-Speed 6314.18 samples/sec Loss 5.7754 LearningRate 0.0005 Epoch: 14 Global Step: 302560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:05,624-Speed 6302.85 samples/sec Loss 5.6655 LearningRate 0.0005 Epoch: 14 Global Step: 302570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:08,869-Speed 6311.66 samples/sec Loss 5.6793 LearningRate 0.0005 Epoch: 14 Global Step: 302580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:12,117-Speed 6307.77 samples/sec Loss 5.6828 LearningRate 0.0005 Epoch: 14 Global Step: 302590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:15,363-Speed 6309.38 samples/sec Loss 5.7482 LearningRate 0.0005 Epoch: 14 Global Step: 302600 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:15:18,596-Speed 6338.05 samples/sec Loss 5.7129 LearningRate 0.0005 Epoch: 14 Global Step: 302610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:21,830-Speed 6333.17 samples/sec Loss 5.7108 LearningRate 0.0005 Epoch: 14 Global Step: 302620 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:25,081-Speed 6301.79 samples/sec Loss 5.6960 LearningRate 0.0005 Epoch: 14 Global Step: 302630 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:28,325-Speed 6314.10 samples/sec Loss 5.7048 LearningRate 0.0005 Epoch: 14 Global Step: 302640 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:31,573-Speed 6306.14 samples/sec Loss 5.7510 
LearningRate 0.0005 Epoch: 14 Global Step: 302650 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:34,815-Speed 6318.04 samples/sec Loss 5.7361 LearningRate 0.0005 Epoch: 14 Global Step: 302660 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:38,063-Speed 6308.42 samples/sec Loss 5.7353 LearningRate 0.0005 Epoch: 14 Global Step: 302670 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:41,311-Speed 6307.01 samples/sec Loss 5.6821 LearningRate 0.0005 Epoch: 14 Global Step: 302680 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:44,552-Speed 6320.49 samples/sec Loss 5.7338 LearningRate 0.0005 Epoch: 14 Global Step: 302690 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:47,798-Speed 6310.14 samples/sec Loss 5.6782 LearningRate 0.0005 Epoch: 14 Global Step: 302700 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:51,044-Speed 6310.78 samples/sec Loss 5.7195 LearningRate 0.0005 Epoch: 14 Global Step: 302710 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:15:54,294-Speed 6301.93 samples/sec Loss 5.7065 LearningRate 0.0005 Epoch: 14 Global Step: 302720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:15:57,537-Speed 6316.49 samples/sec Loss 5.7448 LearningRate 0.0005 Epoch: 14 Global Step: 302730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:00,782-Speed 6312.37 samples/sec Loss 5.6397 LearningRate 0.0005 Epoch: 14 Global Step: 302740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:04,026-Speed 6314.60 samples/sec Loss 5.7849 LearningRate 0.0005 Epoch: 14 Global Step: 302750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:07,269-Speed 6317.84 samples/sec Loss 5.7637 LearningRate 0.0005 Epoch: 14 Global Step: 302760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:10,516-Speed 6309.40 samples/sec Loss 5.7077 LearningRate 0.0005 Epoch: 14 
Global Step: 302770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:13,766-Speed 6302.17 samples/sec Loss 5.6370 LearningRate 0.0005 Epoch: 14 Global Step: 302780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:17,020-Speed 6296.22 samples/sec Loss 5.6701 LearningRate 0.0005 Epoch: 14 Global Step: 302790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:20,265-Speed 6311.22 samples/sec Loss 5.7210 LearningRate 0.0005 Epoch: 14 Global Step: 302800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:23,514-Speed 6305.46 samples/sec Loss 5.7063 LearningRate 0.0005 Epoch: 14 Global Step: 302810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:26,748-Speed 6335.39 samples/sec Loss 5.5944 LearningRate 0.0005 Epoch: 14 Global Step: 302820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:29,992-Speed 6313.19 samples/sec Loss 5.7023 LearningRate 0.0005 Epoch: 14 Global Step: 302830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:33,237-Speed 6312.89 samples/sec Loss 5.7161 LearningRate 0.0005 Epoch: 14 Global Step: 302840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:36,481-Speed 6314.49 samples/sec Loss 5.6525 LearningRate 0.0005 Epoch: 14 Global Step: 302850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:39,724-Speed 6316.79 samples/sec Loss 5.6624 LearningRate 0.0005 Epoch: 14 Global Step: 302860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:42,969-Speed 6312.89 samples/sec Loss 5.7213 LearningRate 0.0005 Epoch: 14 Global Step: 302870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:46,213-Speed 6314.23 samples/sec Loss 5.7515 LearningRate 0.0005 Epoch: 14 Global Step: 302880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:49,462-Speed 6304.62 samples/sec Loss 5.6902 LearningRate 0.0005 Epoch: 14 Global Step: 302890 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:52,705-Speed 6316.46 samples/sec Loss 5.7197 LearningRate 0.0005 Epoch: 14 Global Step: 302900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:55,956-Speed 6302.26 samples/sec Loss 5.7421 LearningRate 0.0005 Epoch: 14 Global Step: 302910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:16:59,189-Speed 6334.49 samples/sec Loss 5.7457 LearningRate 0.0005 Epoch: 14 Global Step: 302920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:02,436-Speed 6312.87 samples/sec Loss 5.7675 LearningRate 0.0005 Epoch: 14 Global Step: 302930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:05,684-Speed 6307.28 samples/sec Loss 5.7578 LearningRate 0.0005 Epoch: 14 Global Step: 302940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:08,932-Speed 6306.75 samples/sec Loss 5.6919 LearningRate 0.0005 Epoch: 14 Global Step: 302950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:12,181-Speed 6303.36 samples/sec Loss 5.7220 LearningRate 0.0005 Epoch: 14 Global Step: 302960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:15,424-Speed 6316.76 samples/sec Loss 5.7273 LearningRate 0.0005 Epoch: 14 Global Step: 302970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:18,668-Speed 6316.60 samples/sec Loss 5.6496 LearningRate 0.0005 Epoch: 14 Global Step: 302980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:21,916-Speed 6306.93 samples/sec Loss 5.8026 LearningRate 0.0005 Epoch: 14 Global Step: 302990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:25,159-Speed 6315.78 samples/sec Loss 5.7924 LearningRate 0.0005 Epoch: 14 Global Step: 303000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:28,405-Speed 6311.72 samples/sec Loss 5.8024 LearningRate 0.0005 Epoch: 14 Global Step: 303010 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:17:31,635-Speed 6341.65 samples/sec Loss 5.7581 LearningRate 0.0005 Epoch: 14 Global Step: 303020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:34,881-Speed 6310.74 samples/sec Loss 5.7519 LearningRate 0.0005 Epoch: 14 Global Step: 303030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:38,128-Speed 6307.61 samples/sec Loss 5.6847 LearningRate 0.0005 Epoch: 14 Global Step: 303040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:41,373-Speed 6314.02 samples/sec Loss 5.7835 LearningRate 0.0005 Epoch: 14 Global Step: 303050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:44,616-Speed 6315.31 samples/sec Loss 5.6327 LearningRate 0.0005 Epoch: 14 Global Step: 303060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:47,863-Speed 6309.41 samples/sec Loss 5.7312 LearningRate 0.0005 Epoch: 14 Global Step: 303070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:51,108-Speed 6312.63 samples/sec Loss 5.6398 LearningRate 0.0005 Epoch: 14 Global Step: 303080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:54,401-Speed 6219.58 samples/sec Loss 5.7017 LearningRate 0.0005 Epoch: 14 Global Step: 303090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:17:57,647-Speed 6312.68 samples/sec Loss 5.7789 LearningRate 0.0005 Epoch: 14 Global Step: 303100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:00,890-Speed 6316.49 samples/sec Loss 5.6871 LearningRate 0.0005 Epoch: 14 Global Step: 303110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:04,118-Speed 6345.15 samples/sec Loss 5.8377 LearningRate 0.0005 Epoch: 14 Global Step: 303120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:07,365-Speed 6309.22 samples/sec Loss 5.7748 LearningRate 0.0005 Epoch: 14 Global Step: 303130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:18:10,614-Speed 6304.39 samples/sec Loss 5.7416 LearningRate 0.0005 Epoch: 14 Global Step: 303140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:13,857-Speed 6315.55 samples/sec Loss 5.8205 LearningRate 0.0005 Epoch: 14 Global Step: 303150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:17,103-Speed 6311.78 samples/sec Loss 5.7477 LearningRate 0.0005 Epoch: 14 Global Step: 303160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:20,348-Speed 6311.56 samples/sec Loss 5.6745 LearningRate 0.0005 Epoch: 14 Global Step: 303170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:23,595-Speed 6309.79 samples/sec Loss 5.7319 LearningRate 0.0005 Epoch: 14 Global Step: 303180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:26,845-Speed 6302.57 samples/sec Loss 5.6845 LearningRate 0.0005 Epoch: 14 Global Step: 303190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:30,095-Speed 6304.41 samples/sec Loss 5.6799 LearningRate 0.0005 Epoch: 14 Global Step: 303200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:33,340-Speed 6312.68 samples/sec Loss 5.6959 LearningRate 0.0005 Epoch: 14 Global Step: 303210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:36,584-Speed 6313.68 samples/sec Loss 5.6342 LearningRate 0.0005 Epoch: 14 Global Step: 303220 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:18:39,818-Speed 6335.07 samples/sec Loss 5.7040 LearningRate 0.0005 Epoch: 14 Global Step: 303230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:43,066-Speed 6306.64 samples/sec Loss 5.6817 LearningRate 0.0005 Epoch: 14 Global Step: 303240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:46,312-Speed 6310.77 samples/sec Loss 5.6971 LearningRate 0.0005 Epoch: 14 Global Step: 303250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:49,558-Speed 6311.00 
samples/sec Loss 5.6901 LearningRate 0.0005 Epoch: 14 Global Step: 303260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:52,801-Speed 6316.67 samples/sec Loss 5.6593 LearningRate 0.0005 Epoch: 14 Global Step: 303270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:56,049-Speed 6307.28 samples/sec Loss 5.7840 LearningRate 0.0005 Epoch: 14 Global Step: 303280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:18:59,291-Speed 6317.55 samples/sec Loss 5.7737 LearningRate 0.0005 Epoch: 14 Global Step: 303290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:02,539-Speed 6307.12 samples/sec Loss 5.7508 LearningRate 0.0005 Epoch: 14 Global Step: 303300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:05,782-Speed 6316.89 samples/sec Loss 5.6913 LearningRate 0.0005 Epoch: 14 Global Step: 303310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:09,027-Speed 6311.73 samples/sec Loss 5.7916 LearningRate 0.0005 Epoch: 14 Global Step: 303320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:12,280-Speed 6296.76 samples/sec Loss 5.6953 LearningRate 0.0005 Epoch: 14 Global Step: 303330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:15,590-Speed 6190.20 samples/sec Loss 5.6921 LearningRate 0.0005 Epoch: 14 Global Step: 303340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:18,840-Speed 6302.10 samples/sec Loss 5.8124 LearningRate 0.0005 Epoch: 14 Global Step: 303350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:22,090-Speed 6303.49 samples/sec Loss 5.7979 LearningRate 0.0005 Epoch: 14 Global Step: 303360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:25,339-Speed 6303.78 samples/sec Loss 5.6594 LearningRate 0.0005 Epoch: 14 Global Step: 303370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:28,616-Speed 6252.61 samples/sec Loss 5.7196 
LearningRate 0.0005 Epoch: 14 Global Step: 303380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:31,860-Speed 6312.94 samples/sec Loss 5.6973 LearningRate 0.0005 Epoch: 14 Global Step: 303390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:35,108-Speed 6307.21 samples/sec Loss 5.7200 LearningRate 0.0005 Epoch: 14 Global Step: 303400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:38,366-Speed 6287.60 samples/sec Loss 5.7384 LearningRate 0.0005 Epoch: 14 Global Step: 303410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:41,616-Speed 6303.87 samples/sec Loss 5.7323 LearningRate 0.0005 Epoch: 14 Global Step: 303420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:44,846-Speed 6342.80 samples/sec Loss 5.7186 LearningRate 0.0005 Epoch: 14 Global Step: 303430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:48,090-Speed 6314.32 samples/sec Loss 5.7010 LearningRate 0.0005 Epoch: 14 Global Step: 303440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:51,340-Speed 6303.59 samples/sec Loss 5.6418 LearningRate 0.0005 Epoch: 14 Global Step: 303450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:19:54,571-Speed 6338.56 samples/sec Loss 5.7143 LearningRate 0.0005 Epoch: 14 Global Step: 303460 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:19:57,816-Speed 6312.63 samples/sec Loss 5.6756 LearningRate 0.0005 Epoch: 14 Global Step: 303470 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:01,061-Speed 6313.27 samples/sec Loss 5.6565 LearningRate 0.0005 Epoch: 14 Global Step: 303480 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:04,307-Speed 6311.85 samples/sec Loss 5.7594 LearningRate 0.0005 Epoch: 14 Global Step: 303490 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:07,552-Speed 6310.93 samples/sec Loss 5.7612 LearningRate 0.0005 Epoch: 14 
Global Step: 303500 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:10,794-Speed 6318.83 samples/sec Loss 5.7175 LearningRate 0.0005 Epoch: 14 Global Step: 303510 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:14,038-Speed 6314.87 samples/sec Loss 5.6584 LearningRate 0.0005 Epoch: 14 Global Step: 303520 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:17,295-Speed 6289.35 samples/sec Loss 5.7028 LearningRate 0.0005 Epoch: 14 Global Step: 303530 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:20,541-Speed 6309.97 samples/sec Loss 5.7260 LearningRate 0.0005 Epoch: 14 Global Step: 303540 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:23,787-Speed 6311.53 samples/sec Loss 5.7428 LearningRate 0.0005 Epoch: 14 Global Step: 303550 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:20:27,033-Speed 6311.20 samples/sec Loss 5.7265 LearningRate 0.0005 Epoch: 14 Global Step: 303560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:30,281-Speed 6305.36 samples/sec Loss 5.7355 LearningRate 0.0005 Epoch: 14 Global Step: 303570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:33,528-Speed 6309.41 samples/sec Loss 5.7714 LearningRate 0.0005 Epoch: 14 Global Step: 303580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:36,770-Speed 6317.99 samples/sec Loss 5.6851 LearningRate 0.0005 Epoch: 14 Global Step: 303590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:40,022-Speed 6299.64 samples/sec Loss 5.7362 LearningRate 0.0005 Epoch: 14 Global Step: 303600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:43,273-Speed 6301.27 samples/sec Loss 5.7331 LearningRate 0.0005 Epoch: 14 Global Step: 303610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:46,522-Speed 6306.11 samples/sec Loss 5.6732 LearningRate 0.0005 Epoch: 14 Global Step: 303620 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:49,771-Speed 6305.02 samples/sec Loss 5.6455 LearningRate 0.0005 Epoch: 14 Global Step: 303630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:53,018-Speed 6308.27 samples/sec Loss 5.5887 LearningRate 0.0005 Epoch: 14 Global Step: 303640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:56,261-Speed 6316.74 samples/sec Loss 5.6379 LearningRate 0.0005 Epoch: 14 Global Step: 303650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:20:59,490-Speed 6342.64 samples/sec Loss 5.7472 LearningRate 0.0005 Epoch: 14 Global Step: 303660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:02,735-Speed 6313.73 samples/sec Loss 5.7238 LearningRate 0.0005 Epoch: 14 Global Step: 303670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:05,978-Speed 6317.18 samples/sec Loss 5.8092 LearningRate 0.0005 Epoch: 14 Global Step: 303680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:09,222-Speed 6314.79 samples/sec Loss 5.7732 LearningRate 0.0005 Epoch: 14 Global Step: 303690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:12,468-Speed 6309.60 samples/sec Loss 5.7728 LearningRate 0.0005 Epoch: 14 Global Step: 303700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:15,714-Speed 6312.02 samples/sec Loss 5.7390 LearningRate 0.0005 Epoch: 14 Global Step: 303710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:18,962-Speed 6305.98 samples/sec Loss 5.7151 LearningRate 0.0005 Epoch: 14 Global Step: 303720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:22,208-Speed 6310.43 samples/sec Loss 5.7589 LearningRate 0.0005 Epoch: 14 Global Step: 303730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:25,464-Speed 6291.08 samples/sec Loss 5.7586 LearningRate 0.0005 Epoch: 14 Global Step: 303740 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:21:28,712-Speed 6306.58 samples/sec Loss 5.6970 LearningRate 0.0005 Epoch: 14 Global Step: 303750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:31,944-Speed 6339.22 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 303760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:35,188-Speed 6313.10 samples/sec Loss 5.8165 LearningRate 0.0005 Epoch: 14 Global Step: 303770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:38,435-Speed 6309.67 samples/sec Loss 5.6578 LearningRate 0.0005 Epoch: 14 Global Step: 303780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:41,682-Speed 6309.50 samples/sec Loss 5.7135 LearningRate 0.0005 Epoch: 14 Global Step: 303790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:21:44,914-Speed 6336.47 samples/sec Loss 5.6975 LearningRate 0.0005 Epoch: 14 Global Step: 303800 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:21:48,163-Speed 6306.39 samples/sec Loss 5.8234 LearningRate 0.0005 Epoch: 14 Global Step: 303810 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:21:51,409-Speed 6309.19 samples/sec Loss 5.7596 LearningRate 0.0005 Epoch: 14 Global Step: 303820 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:21:54,659-Speed 6303.30 samples/sec Loss 5.6661 LearningRate 0.0005 Epoch: 14 Global Step: 303830 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:21:57,904-Speed 6313.36 samples/sec Loss 5.6883 LearningRate 0.0005 Epoch: 14 Global Step: 303840 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:01,156-Speed 6298.74 samples/sec Loss 5.7133 LearningRate 0.0005 Epoch: 14 Global Step: 303850 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:04,400-Speed 6314.25 samples/sec Loss 5.7176 LearningRate 0.0005 Epoch: 14 Global Step: 303860 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 
19:22:07,649-Speed 6305.02 samples/sec Loss 5.7177 LearningRate 0.0005 Epoch: 14 Global Step: 303870 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:10,894-Speed 6312.45 samples/sec Loss 5.7418 LearningRate 0.0005 Epoch: 14 Global Step: 303880 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:14,138-Speed 6314.83 samples/sec Loss 5.6958 LearningRate 0.0005 Epoch: 14 Global Step: 303890 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:17,383-Speed 6312.90 samples/sec Loss 5.7350 LearningRate 0.0005 Epoch: 14 Global Step: 303900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:20,629-Speed 6310.62 samples/sec Loss 5.7417 LearningRate 0.0005 Epoch: 14 Global Step: 303910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:23,876-Speed 6309.97 samples/sec Loss 5.6831 LearningRate 0.0005 Epoch: 14 Global Step: 303920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:27,124-Speed 6305.29 samples/sec Loss 5.7578 LearningRate 0.0005 Epoch: 14 Global Step: 303930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:30,371-Speed 6310.21 samples/sec Loss 5.7041 LearningRate 0.0005 Epoch: 14 Global Step: 303940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:33,615-Speed 6314.60 samples/sec Loss 5.7488 LearningRate 0.0005 Epoch: 14 Global Step: 303950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:36,858-Speed 6315.97 samples/sec Loss 5.7247 LearningRate 0.0005 Epoch: 14 Global Step: 303960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:40,104-Speed 6311.12 samples/sec Loss 5.6792 LearningRate 0.0005 Epoch: 14 Global Step: 303970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:43,357-Speed 6295.85 samples/sec Loss 5.7483 LearningRate 0.0005 Epoch: 14 Global Step: 303980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:46,602-Speed 6313.85 
samples/sec Loss 5.7309 LearningRate 0.0005 Epoch: 14 Global Step: 303990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:49,835-Speed 6336.53 samples/sec Loss 5.8117 LearningRate 0.0005 Epoch: 14 Global Step: 304000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:22:53,065-Speed 6341.46 samples/sec Loss 5.7293 LearningRate 0.0005 Epoch: 14 Global Step: 304010 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:56,316-Speed 6300.50 samples/sec Loss 5.6777 LearningRate 0.0005 Epoch: 14 Global Step: 304020 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:22:59,562-Speed 6309.92 samples/sec Loss 5.8233 LearningRate 0.0005 Epoch: 14 Global Step: 304030 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:02,809-Speed 6309.50 samples/sec Loss 5.6856 LearningRate 0.0005 Epoch: 14 Global Step: 304040 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:06,057-Speed 6306.52 samples/sec Loss 5.6909 LearningRate 0.0005 Epoch: 14 Global Step: 304050 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:09,302-Speed 6313.17 samples/sec Loss 5.7399 LearningRate 0.0005 Epoch: 14 Global Step: 304060 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:12,551-Speed 6308.44 samples/sec Loss 5.7337 LearningRate 0.0005 Epoch: 14 Global Step: 304070 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:15,794-Speed 6316.45 samples/sec Loss 5.7090 LearningRate 0.0005 Epoch: 14 Global Step: 304080 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:19,043-Speed 6305.16 samples/sec Loss 5.6805 LearningRate 0.0005 Epoch: 14 Global Step: 304090 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:22,288-Speed 6312.82 samples/sec Loss 5.6726 LearningRate 0.0005 Epoch: 14 Global Step: 304100 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:23:25,546-Speed 6287.43 samples/sec Loss 5.6354 
LearningRate 0.0005 Epoch: 14 Global Step: 304110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:28,794-Speed 6305.82 samples/sec Loss 5.7361 LearningRate 0.0005 Epoch: 14 Global Step: 304120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:32,039-Speed 6313.97 samples/sec Loss 5.7371 LearningRate 0.0005 Epoch: 14 Global Step: 304130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:35,286-Speed 6307.43 samples/sec Loss 5.7447 LearningRate 0.0005 Epoch: 14 Global Step: 304140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:38,534-Speed 6307.80 samples/sec Loss 5.7659 LearningRate 0.0005 Epoch: 14 Global Step: 304150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:41,780-Speed 6311.41 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 304160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:45,029-Speed 6303.08 samples/sec Loss 5.7204 LearningRate 0.0005 Epoch: 14 Global Step: 304170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:48,276-Speed 6310.77 samples/sec Loss 5.6853 LearningRate 0.0005 Epoch: 14 Global Step: 304180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:51,523-Speed 6307.42 samples/sec Loss 5.7009 LearningRate 0.0005 Epoch: 14 Global Step: 304190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:54,767-Speed 6315.79 samples/sec Loss 5.6790 LearningRate 0.0005 Epoch: 14 Global Step: 304200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:23:57,999-Speed 6336.38 samples/sec Loss 5.7682 LearningRate 0.0005 Epoch: 14 Global Step: 304210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:01,251-Speed 6300.01 samples/sec Loss 5.6953 LearningRate 0.0005 Epoch: 14 Global Step: 304220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:04,505-Speed 6295.66 samples/sec Loss 5.6800 LearningRate 0.0005 Epoch: 14 
Global Step: 304230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:07,748-Speed 6316.21 samples/sec Loss 5.7419 LearningRate 0.0005 Epoch: 14 Global Step: 304240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:10,993-Speed 6311.05 samples/sec Loss 5.6979 LearningRate 0.0005 Epoch: 14 Global Step: 304250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:14,239-Speed 6312.78 samples/sec Loss 5.7501 LearningRate 0.0005 Epoch: 14 Global Step: 304260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:17,483-Speed 6314.11 samples/sec Loss 5.7087 LearningRate 0.0005 Epoch: 14 Global Step: 304270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:20,731-Speed 6308.00 samples/sec Loss 5.7239 LearningRate 0.0005 Epoch: 14 Global Step: 304280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:23,974-Speed 6315.52 samples/sec Loss 5.6282 LearningRate 0.0005 Epoch: 14 Global Step: 304290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:27,220-Speed 6311.96 samples/sec Loss 5.6959 LearningRate 0.0005 Epoch: 14 Global Step: 304300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:30,467-Speed 6307.92 samples/sec Loss 5.6709 LearningRate 0.0005 Epoch: 14 Global Step: 304310 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:24:33,697-Speed 6341.65 samples/sec Loss 5.7326 LearningRate 0.0005 Epoch: 14 Global Step: 304320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:36,942-Speed 6313.23 samples/sec Loss 5.7599 LearningRate 0.0005 Epoch: 14 Global Step: 304330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:40,218-Speed 6253.13 samples/sec Loss 5.6812 LearningRate 0.0005 Epoch: 14 Global Step: 304340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:43,460-Speed 6317.40 samples/sec Loss 5.7734 LearningRate 0.0005 Epoch: 14 Global Step: 304350 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:46,709-Speed 6305.59 samples/sec Loss 5.6929 LearningRate 0.0005 Epoch: 14 Global Step: 304360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:49,954-Speed 6312.48 samples/sec Loss 5.6774 LearningRate 0.0005 Epoch: 14 Global Step: 304370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:53,198-Speed 6314.45 samples/sec Loss 5.7014 LearningRate 0.0005 Epoch: 14 Global Step: 304380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:56,448-Speed 6303.98 samples/sec Loss 5.7281 LearningRate 0.0005 Epoch: 14 Global Step: 304390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:24:59,697-Speed 6304.27 samples/sec Loss 5.7190 LearningRate 0.0005 Epoch: 14 Global Step: 304400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:02,944-Speed 6308.06 samples/sec Loss 5.7542 LearningRate 0.0005 Epoch: 14 Global Step: 304410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:06,175-Speed 6340.58 samples/sec Loss 5.7784 LearningRate 0.0005 Epoch: 14 Global Step: 304420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:09,432-Speed 6288.92 samples/sec Loss 5.7348 LearningRate 0.0005 Epoch: 14 Global Step: 304430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:12,726-Speed 6220.46 samples/sec Loss 5.6702 LearningRate 0.0005 Epoch: 14 Global Step: 304440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:16,021-Speed 6214.92 samples/sec Loss 5.6124 LearningRate 0.0005 Epoch: 14 Global Step: 304450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:19,283-Speed 6280.33 samples/sec Loss 5.6801 LearningRate 0.0005 Epoch: 14 Global Step: 304460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:22,530-Speed 6309.84 samples/sec Loss 5.7190 LearningRate 0.0005 Epoch: 14 Global Step: 304470 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:25:25,793-Speed 6277.90 samples/sec Loss 5.7625 LearningRate 0.0005 Epoch: 14 Global Step: 304480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:29,042-Speed 6304.28 samples/sec Loss 5.7764 LearningRate 0.0005 Epoch: 14 Global Step: 304490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:32,292-Speed 6303.84 samples/sec Loss 5.7131 LearningRate 0.0005 Epoch: 14 Global Step: 304500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:35,578-Speed 6233.21 samples/sec Loss 5.6883 LearningRate 0.0005 Epoch: 14 Global Step: 304510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:38,823-Speed 6312.77 samples/sec Loss 5.6615 LearningRate 0.0005 Epoch: 14 Global Step: 304520 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:25:42,056-Speed 6336.43 samples/sec Loss 5.7189 LearningRate 0.0005 Epoch: 14 Global Step: 304530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:45,301-Speed 6311.87 samples/sec Loss 5.7047 LearningRate 0.0005 Epoch: 14 Global Step: 304540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:48,547-Speed 6312.46 samples/sec Loss 5.7127 LearningRate 0.0005 Epoch: 14 Global Step: 304550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:51,792-Speed 6311.64 samples/sec Loss 5.7586 LearningRate 0.0005 Epoch: 14 Global Step: 304560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:55,055-Speed 6278.40 samples/sec Loss 5.7684 LearningRate 0.0005 Epoch: 14 Global Step: 304570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:25:58,306-Speed 6300.30 samples/sec Loss 5.6796 LearningRate 0.0005 Epoch: 14 Global Step: 304580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:01,550-Speed 6315.10 samples/sec Loss 5.7341 LearningRate 0.0005 Epoch: 14 Global Step: 304590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:26:04,799-Speed 6304.73 samples/sec Loss 5.6329 LearningRate 0.0005 Epoch: 14 Global Step: 304600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:08,042-Speed 6317.04 samples/sec Loss 5.7340 LearningRate 0.0005 Epoch: 14 Global Step: 304610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:11,290-Speed 6305.66 samples/sec Loss 5.6625 LearningRate 0.0005 Epoch: 14 Global Step: 304620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:14,521-Speed 6340.17 samples/sec Loss 5.7059 LearningRate 0.0005 Epoch: 14 Global Step: 304630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:17,773-Speed 6298.89 samples/sec Loss 5.6351 LearningRate 0.0005 Epoch: 14 Global Step: 304640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:21,024-Speed 6302.18 samples/sec Loss 5.7249 LearningRate 0.0005 Epoch: 14 Global Step: 304650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:24,269-Speed 6311.71 samples/sec Loss 5.6926 LearningRate 0.0005 Epoch: 14 Global Step: 304660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:27,517-Speed 6308.01 samples/sec Loss 5.6483 LearningRate 0.0005 Epoch: 14 Global Step: 304670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:30,762-Speed 6313.19 samples/sec Loss 5.6840 LearningRate 0.0005 Epoch: 14 Global Step: 304680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:34,002-Speed 6321.04 samples/sec Loss 5.6809 LearningRate 0.0005 Epoch: 14 Global Step: 304690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:37,250-Speed 6308.22 samples/sec Loss 5.7216 LearningRate 0.0005 Epoch: 14 Global Step: 304700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:40,499-Speed 6304.09 samples/sec Loss 5.6341 LearningRate 0.0005 Epoch: 14 Global Step: 304710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:43,750-Speed 6301.59 
samples/sec Loss 5.7560 LearningRate 0.0005 Epoch: 14 Global Step: 304720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:46,980-Speed 6342.45 samples/sec Loss 5.7441 LearningRate 0.0005 Epoch: 14 Global Step: 304730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:50,226-Speed 6310.36 samples/sec Loss 5.7182 LearningRate 0.0005 Epoch: 14 Global Step: 304740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:53,475-Speed 6305.09 samples/sec Loss 5.7091 LearningRate 0.0005 Epoch: 14 Global Step: 304750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:56,717-Speed 6318.42 samples/sec Loss 5.7015 LearningRate 0.0005 Epoch: 14 Global Step: 304760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:26:59,981-Speed 6276.20 samples/sec Loss 5.6660 LearningRate 0.0005 Epoch: 14 Global Step: 304770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:03,229-Speed 6307.14 samples/sec Loss 5.7106 LearningRate 0.0005 Epoch: 14 Global Step: 304780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:06,471-Speed 6316.57 samples/sec Loss 5.8078 LearningRate 0.0005 Epoch: 14 Global Step: 304790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:09,718-Speed 6309.55 samples/sec Loss 5.7517 LearningRate 0.0005 Epoch: 14 Global Step: 304800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:12,966-Speed 6306.69 samples/sec Loss 5.7529 LearningRate 0.0005 Epoch: 14 Global Step: 304810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:16,225-Speed 6286.12 samples/sec Loss 5.7359 LearningRate 0.0005 Epoch: 14 Global Step: 304820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:19,461-Speed 6329.10 samples/sec Loss 5.6988 LearningRate 0.0005 Epoch: 14 Global Step: 304830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:22,708-Speed 6309.29 samples/sec Loss 5.6185 
LearningRate 0.0005 Epoch: 14 Global Step: 304840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:25,989-Speed 6244.12 samples/sec Loss 5.7260 LearningRate 0.0005 Epoch: 14 Global Step: 304850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:29,239-Speed 6302.77 samples/sec Loss 5.6787 LearningRate 0.0005 Epoch: 14 Global Step: 304860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:32,485-Speed 6311.07 samples/sec Loss 5.7372 LearningRate 0.0005 Epoch: 14 Global Step: 304870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:35,728-Speed 6315.15 samples/sec Loss 5.6268 LearningRate 0.0005 Epoch: 14 Global Step: 304880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:38,970-Speed 6318.35 samples/sec Loss 5.7377 LearningRate 0.0005 Epoch: 14 Global Step: 304890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:42,222-Speed 6300.59 samples/sec Loss 5.6773 LearningRate 0.0005 Epoch: 14 Global Step: 304900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:45,469-Speed 6307.99 samples/sec Loss 5.6530 LearningRate 0.0005 Epoch: 14 Global Step: 304910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:48,714-Speed 6312.45 samples/sec Loss 5.6238 LearningRate 0.0005 Epoch: 14 Global Step: 304920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:51,949-Speed 6332.72 samples/sec Loss 5.6977 LearningRate 0.0005 Epoch: 14 Global Step: 304930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:55,194-Speed 6312.90 samples/sec Loss 5.7001 LearningRate 0.0005 Epoch: 14 Global Step: 304940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:27:58,435-Speed 6320.78 samples/sec Loss 5.6794 LearningRate 0.0005 Epoch: 14 Global Step: 304950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:01,682-Speed 6309.75 samples/sec Loss 5.6475 LearningRate 0.0005 Epoch: 14 
Global Step: 304960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:04,951-Speed 6265.70 samples/sec Loss 5.6648 LearningRate 0.0005 Epoch: 14 Global Step: 304970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:08,196-Speed 6311.66 samples/sec Loss 5.7535 LearningRate 0.0005 Epoch: 14 Global Step: 304980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:11,444-Speed 6308.06 samples/sec Loss 5.7261 LearningRate 0.0005 Epoch: 14 Global Step: 304990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:14,691-Speed 6307.78 samples/sec Loss 5.7787 LearningRate 0.0005 Epoch: 14 Global Step: 305000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:17,956-Speed 6273.93 samples/sec Loss 5.7137 LearningRate 0.0005 Epoch: 14 Global Step: 305010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:21,200-Speed 6313.66 samples/sec Loss 5.7372 LearningRate 0.0005 Epoch: 14 Global Step: 305020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:24,448-Speed 6307.65 samples/sec Loss 5.7132 LearningRate 0.0005 Epoch: 14 Global Step: 305030 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:28:27,679-Speed 6340.27 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 305040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:30,922-Speed 6316.13 samples/sec Loss 5.6849 LearningRate 0.0005 Epoch: 14 Global Step: 305050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:34,166-Speed 6315.82 samples/sec Loss 5.6446 LearningRate 0.0005 Epoch: 14 Global Step: 305060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:37,414-Speed 6306.66 samples/sec Loss 5.7281 LearningRate 0.0005 Epoch: 14 Global Step: 305070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:40,664-Speed 6302.84 samples/sec Loss 5.7350 LearningRate 0.0005 Epoch: 14 Global Step: 305080 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:43,918-Speed 6295.45 samples/sec Loss 5.6388 LearningRate 0.0005 Epoch: 14 Global Step: 305090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:47,164-Speed 6310.62 samples/sec Loss 5.7084 LearningRate 0.0005 Epoch: 14 Global Step: 305100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:28:50,398-Speed 6334.37 samples/sec Loss 5.7083 LearningRate 0.0005 Epoch: 14 Global Step: 305110 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:28:53,642-Speed 6313.23 samples/sec Loss 5.6823 LearningRate 0.0005 Epoch: 14 Global Step: 305120 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:28:56,892-Speed 6303.43 samples/sec Loss 5.7024 LearningRate 0.0005 Epoch: 14 Global Step: 305130 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:00,142-Speed 6304.47 samples/sec Loss 5.6410 LearningRate 0.0005 Epoch: 14 Global Step: 305140 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:03,391-Speed 6304.51 samples/sec Loss 5.6485 LearningRate 0.0005 Epoch: 14 Global Step: 305150 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:06,641-Speed 6303.23 samples/sec Loss 5.7013 LearningRate 0.0005 Epoch: 14 Global Step: 305160 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:09,885-Speed 6313.45 samples/sec Loss 5.7491 LearningRate 0.0005 Epoch: 14 Global Step: 305170 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:13,134-Speed 6306.55 samples/sec Loss 5.7300 LearningRate 0.0005 Epoch: 14 Global Step: 305180 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:16,382-Speed 6306.68 samples/sec Loss 5.6757 LearningRate 0.0005 Epoch: 14 Global Step: 305190 Fp16 Grad Scale: 16384 Required: 48 hours Training: 2022-04-01 19:29:19,632-Speed 6302.29 samples/sec Loss 5.7407 LearningRate 0.0005 Epoch: 14 Global Step: 305200 Fp16 Grad Scale: 16384 Required: 48 hours 
Training: 2022-04-01 19:29:22,877-Speed 6313.42 samples/sec Loss 5.6205 LearningRate 0.0005 Epoch: 14 Global Step: 305210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:26,123-Speed 6309.99 samples/sec Loss 5.6632 LearningRate 0.0005 Epoch: 14 Global Step: 305220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:29,370-Speed 6309.45 samples/sec Loss 5.7367 LearningRate 0.0005 Epoch: 14 Global Step: 305230 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:32,616-Speed 6310.13 samples/sec Loss 5.7150 LearningRate 0.0005 Epoch: 14 Global Step: 305240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:35,863-Speed 6308.84 samples/sec Loss 5.7354 LearningRate 0.0005 Epoch: 14 Global Step: 305250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:39,108-Speed 6311.64 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 14 Global Step: 305260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:42,354-Speed 6311.34 samples/sec Loss 5.7303 LearningRate 0.0005 Epoch: 14 Global Step: 305270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:45,602-Speed 6308.17 samples/sec Loss 5.7200 LearningRate 0.0005 Epoch: 14 Global Step: 305280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:48,845-Speed 6314.73 samples/sec Loss 5.7383 LearningRate 0.0005 Epoch: 14 Global Step: 305290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:52,090-Speed 6313.16 samples/sec Loss 5.7685 LearningRate 0.0005 Epoch: 14 Global Step: 305300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:55,324-Speed 6335.20 samples/sec Loss 5.7227 LearningRate 0.0005 Epoch: 14 Global Step: 305310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:29:58,573-Speed 6305.21 samples/sec Loss 5.7916 LearningRate 0.0005 Epoch: 14 Global Step: 305320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:30:01,817-Speed 6313.65 samples/sec Loss 5.7291 LearningRate 0.0005 Epoch: 14 Global Step: 305330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:05,062-Speed 6312.03 samples/sec Loss 5.7000 LearningRate 0.0005 Epoch: 14 Global Step: 305340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:08,308-Speed 6312.49 samples/sec Loss 5.7449 LearningRate 0.0005 Epoch: 14 Global Step: 305350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:11,553-Speed 6311.31 samples/sec Loss 5.7114 LearningRate 0.0005 Epoch: 14 Global Step: 305360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:14,799-Speed 6311.75 samples/sec Loss 5.6916 LearningRate 0.0005 Epoch: 14 Global Step: 305370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:18,059-Speed 6282.58 samples/sec Loss 5.7315 LearningRate 0.0005 Epoch: 14 Global Step: 305380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:21,311-Speed 6300.10 samples/sec Loss 5.7034 LearningRate 0.0005 Epoch: 14 Global Step: 305390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:24,562-Speed 6300.21 samples/sec Loss 5.7271 LearningRate 0.0005 Epoch: 14 Global Step: 305400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:27,806-Speed 6315.86 samples/sec Loss 5.7582 LearningRate 0.0005 Epoch: 14 Global Step: 305410 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:30:31,039-Speed 6336.05 samples/sec Loss 5.7051 LearningRate 0.0005 Epoch: 14 Global Step: 305420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:34,283-Speed 6314.72 samples/sec Loss 5.7007 LearningRate 0.0005 Epoch: 14 Global Step: 305430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:37,532-Speed 6304.29 samples/sec Loss 5.6720 LearningRate 0.0005 Epoch: 14 Global Step: 305440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:40,780-Speed 6306.92 
samples/sec Loss 5.6994 LearningRate 0.0005 Epoch: 14 Global Step: 305450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:44,026-Speed 6311.04 samples/sec Loss 5.6872 LearningRate 0.0005 Epoch: 14 Global Step: 305460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:47,273-Speed 6307.66 samples/sec Loss 5.7151 LearningRate 0.0005 Epoch: 14 Global Step: 305470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:50,521-Speed 6307.56 samples/sec Loss 5.7597 LearningRate 0.0005 Epoch: 14 Global Step: 305480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:53,771-Speed 6303.36 samples/sec Loss 5.6664 LearningRate 0.0005 Epoch: 14 Global Step: 305490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:30:57,016-Speed 6312.85 samples/sec Loss 5.7488 LearningRate 0.0005 Epoch: 14 Global Step: 305500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:00,267-Speed 6299.96 samples/sec Loss 5.7235 LearningRate 0.0005 Epoch: 14 Global Step: 305510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:03,502-Speed 6332.03 samples/sec Loss 5.7690 LearningRate 0.0005 Epoch: 14 Global Step: 305520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:06,749-Speed 6309.22 samples/sec Loss 5.7002 LearningRate 0.0005 Epoch: 14 Global Step: 305530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:09,997-Speed 6306.57 samples/sec Loss 5.6318 LearningRate 0.0005 Epoch: 14 Global Step: 305540 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:13,241-Speed 6314.99 samples/sec Loss 5.7526 LearningRate 0.0005 Epoch: 14 Global Step: 305550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:16,488-Speed 6309.63 samples/sec Loss 5.6272 LearningRate 0.0005 Epoch: 14 Global Step: 305560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:19,735-Speed 6309.03 samples/sec Loss 5.6952 
LearningRate 0.0005 Epoch: 14 Global Step: 305570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:22,988-Speed 6297.72 samples/sec Loss 5.6795 LearningRate 0.0005 Epoch: 14 Global Step: 305580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:26,235-Speed 6307.57 samples/sec Loss 5.6592 LearningRate 0.0005 Epoch: 14 Global Step: 305590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:29,478-Speed 6316.34 samples/sec Loss 5.7867 LearningRate 0.0005 Epoch: 14 Global Step: 305600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:32,730-Speed 6299.87 samples/sec Loss 5.6466 LearningRate 0.0005 Epoch: 14 Global Step: 305610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:35,964-Speed 6333.26 samples/sec Loss 5.6698 LearningRate 0.0005 Epoch: 14 Global Step: 305620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:39,211-Speed 6309.41 samples/sec Loss 5.6502 LearningRate 0.0005 Epoch: 14 Global Step: 305630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:42,460-Speed 6304.78 samples/sec Loss 5.6543 LearningRate 0.0005 Epoch: 14 Global Step: 305640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:45,704-Speed 6314.72 samples/sec Loss 5.7530 LearningRate 0.0005 Epoch: 14 Global Step: 305650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:48,951-Speed 6308.50 samples/sec Loss 5.7146 LearningRate 0.0005 Epoch: 14 Global Step: 305660 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:52,197-Speed 6311.29 samples/sec Loss 5.7181 LearningRate 0.0005 Epoch: 14 Global Step: 305670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:55,443-Speed 6311.19 samples/sec Loss 5.6897 LearningRate 0.0005 Epoch: 14 Global Step: 305680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:31:58,689-Speed 6309.12 samples/sec Loss 5.6975 LearningRate 0.0005 Epoch: 14 
Global Step: 305690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:01,936-Speed 6309.03 samples/sec Loss 5.7527 LearningRate 0.0005 Epoch: 14 Global Step: 305700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:05,183-Speed 6310.15 samples/sec Loss 5.7230 LearningRate 0.0005 Epoch: 14 Global Step: 305710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:08,412-Speed 6342.26 samples/sec Loss 5.6737 LearningRate 0.0005 Epoch: 14 Global Step: 305720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:11,657-Speed 6313.75 samples/sec Loss 5.6924 LearningRate 0.0005 Epoch: 14 Global Step: 305730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:14,900-Speed 6315.27 samples/sec Loss 5.7009 LearningRate 0.0005 Epoch: 14 Global Step: 305740 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:18,146-Speed 6310.74 samples/sec Loss 5.7599 LearningRate 0.0005 Epoch: 14 Global Step: 305750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:21,395-Speed 6304.55 samples/sec Loss 5.6724 LearningRate 0.0005 Epoch: 14 Global Step: 305760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:24,640-Speed 6313.36 samples/sec Loss 5.7162 LearningRate 0.0005 Epoch: 14 Global Step: 305770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:27,888-Speed 6307.76 samples/sec Loss 5.7142 LearningRate 0.0005 Epoch: 14 Global Step: 305780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:31,140-Speed 6300.41 samples/sec Loss 5.6725 LearningRate 0.0005 Epoch: 14 Global Step: 305790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:34,385-Speed 6311.34 samples/sec Loss 5.6822 LearningRate 0.0005 Epoch: 14 Global Step: 305800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:37,633-Speed 6307.31 samples/sec Loss 5.7037 LearningRate 0.0005 Epoch: 14 Global Step: 305810 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:40,879-Speed 6309.86 samples/sec Loss 5.6975 LearningRate 0.0005 Epoch: 14 Global Step: 305820 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:32:44,110-Speed 6340.04 samples/sec Loss 5.7795 LearningRate 0.0005 Epoch: 14 Global Step: 305830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:47,357-Speed 6310.28 samples/sec Loss 5.7412 LearningRate 0.0005 Epoch: 14 Global Step: 305840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:50,606-Speed 6304.50 samples/sec Loss 5.7337 LearningRate 0.0005 Epoch: 14 Global Step: 305850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:53,851-Speed 6314.90 samples/sec Loss 5.7048 LearningRate 0.0005 Epoch: 14 Global Step: 305860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:32:57,131-Speed 6244.01 samples/sec Loss 5.6496 LearningRate 0.0005 Epoch: 14 Global Step: 305870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:00,387-Speed 6292.79 samples/sec Loss 5.6522 LearningRate 0.0005 Epoch: 14 Global Step: 305880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:03,630-Speed 6314.87 samples/sec Loss 5.6818 LearningRate 0.0005 Epoch: 14 Global Step: 305890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:06,877-Speed 6309.58 samples/sec Loss 5.7100 LearningRate 0.0005 Epoch: 14 Global Step: 305900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:10,123-Speed 6309.94 samples/sec Loss 5.6850 LearningRate 0.0005 Epoch: 14 Global Step: 305910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:13,370-Speed 6309.32 samples/sec Loss 5.7184 LearningRate 0.0005 Epoch: 14 Global Step: 305920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:16,606-Speed 6330.86 samples/sec Loss 5.6705 LearningRate 0.0005 Epoch: 14 Global Step: 305930 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:33:19,851-Speed 6311.36 samples/sec Loss 5.6145 LearningRate 0.0005 Epoch: 14 Global Step: 305940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:23,098-Speed 6308.93 samples/sec Loss 5.6687 LearningRate 0.0005 Epoch: 14 Global Step: 305950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:26,343-Speed 6312.55 samples/sec Loss 5.6782 LearningRate 0.0005 Epoch: 14 Global Step: 305960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:29,593-Speed 6304.82 samples/sec Loss 5.7595 LearningRate 0.0005 Epoch: 14 Global Step: 305970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:32,841-Speed 6305.07 samples/sec Loss 5.6283 LearningRate 0.0005 Epoch: 14 Global Step: 305980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:36,087-Speed 6311.77 samples/sec Loss 5.6660 LearningRate 0.0005 Epoch: 14 Global Step: 305990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:39,333-Speed 6311.03 samples/sec Loss 5.7300 LearningRate 0.0005 Epoch: 14 Global Step: 306000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:42,582-Speed 6304.94 samples/sec Loss 5.6482 LearningRate 0.0005 Epoch: 14 Global Step: 306010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:45,826-Speed 6315.65 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 306020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:49,142-Speed 6176.74 samples/sec Loss 5.7242 LearningRate 0.0005 Epoch: 14 Global Step: 306030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:52,389-Speed 6308.03 samples/sec Loss 5.6017 LearningRate 0.0005 Epoch: 14 Global Step: 306040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:33:55,633-Speed 6314.51 samples/sec Loss 5.6544 LearningRate 0.0005 Epoch: 14 Global Step: 306050 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:33:58,882-Speed 6306.04 samples/sec Loss 5.7227 LearningRate 0.0005 Epoch: 14 Global Step: 306060 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:02,136-Speed 6295.18 samples/sec Loss 5.6760 LearningRate 0.0005 Epoch: 14 Global Step: 306070 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:05,400-Speed 6275.94 samples/sec Loss 5.6335 LearningRate 0.0005 Epoch: 14 Global Step: 306080 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:08,647-Speed 6308.56 samples/sec Loss 5.7872 LearningRate 0.0005 Epoch: 14 Global Step: 306090 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:11,892-Speed 6311.88 samples/sec Loss 5.6613 LearningRate 0.0005 Epoch: 14 Global Step: 306100 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:15,141-Speed 6305.94 samples/sec Loss 5.6897 LearningRate 0.0005 Epoch: 14 Global Step: 306110 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:18,385-Speed 6313.51 samples/sec Loss 5.6228 LearningRate 0.0005 Epoch: 14 Global Step: 306120 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:21,613-Speed 6346.57 samples/sec Loss 5.6645 LearningRate 0.0005 Epoch: 14 Global Step: 306130 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:24,864-Speed 6300.38 samples/sec Loss 5.6508 LearningRate 0.0005 Epoch: 14 Global Step: 306140 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:28,115-Speed 6302.16 samples/sec Loss 5.7156 LearningRate 0.0005 Epoch: 14 Global Step: 306150 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:31,361-Speed 6309.85 samples/sec Loss 5.6713 LearningRate 0.0005 Epoch: 14 Global Step: 306160 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:34,619-Speed 6287.50 samples/sec Loss 5.7296 LearningRate 0.0005 Epoch: 14 Global Step: 306170 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:37,864-Speed 6312.38 
samples/sec Loss 5.6931 LearningRate 0.0005 Epoch: 14 Global Step: 306180 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:41,109-Speed 6313.20 samples/sec Loss 5.7252 LearningRate 0.0005 Epoch: 14 Global Step: 306190 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:44,354-Speed 6312.86 samples/sec Loss 5.7150 LearningRate 0.0005 Epoch: 14 Global Step: 306200 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:47,603-Speed 6305.54 samples/sec Loss 5.6386 LearningRate 0.0005 Epoch: 14 Global Step: 306210 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:50,849-Speed 6312.01 samples/sec Loss 5.6672 LearningRate 0.0005 Epoch: 14 Global Step: 306220 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:34:54,093-Speed 6312.71 samples/sec Loss 5.6527 LearningRate 0.0005 Epoch: 14 Global Step: 306230 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:34:57,329-Speed 6331.37 samples/sec Loss 5.7135 LearningRate 0.0005 Epoch: 14 Global Step: 306240 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:00,572-Speed 6316.07 samples/sec Loss 5.6754 LearningRate 0.0005 Epoch: 14 Global Step: 306250 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:03,820-Speed 6307.15 samples/sec Loss 5.6998 LearningRate 0.0005 Epoch: 14 Global Step: 306260 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:07,062-Speed 6319.14 samples/sec Loss 5.5924 LearningRate 0.0005 Epoch: 14 Global Step: 306270 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:10,307-Speed 6312.51 samples/sec Loss 5.7325 LearningRate 0.0005 Epoch: 14 Global Step: 306280 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:13,550-Speed 6315.57 samples/sec Loss 5.6313 LearningRate 0.0005 Epoch: 14 Global Step: 306290 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:16,799-Speed 6305.40 samples/sec Loss 5.7271 
LearningRate 0.0005 Epoch: 14 Global Step: 306300 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:20,043-Speed 6314.14 samples/sec Loss 5.7119 LearningRate 0.0005 Epoch: 14 Global Step: 306310 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:23,290-Speed 6309.89 samples/sec Loss 5.7024 LearningRate 0.0005 Epoch: 14 Global Step: 306320 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:26,540-Speed 6302.61 samples/sec Loss 5.6622 LearningRate 0.0005 Epoch: 14 Global Step: 306330 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:29,771-Speed 6339.91 samples/sec Loss 5.7212 LearningRate 0.0005 Epoch: 14 Global Step: 306340 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:33,017-Speed 6309.35 samples/sec Loss 5.7251 LearningRate 0.0005 Epoch: 14 Global Step: 306350 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:36,263-Speed 6311.04 samples/sec Loss 5.6567 LearningRate 0.0005 Epoch: 14 Global Step: 306360 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:39,529-Speed 6272.91 samples/sec Loss 5.7427 LearningRate 0.0005 Epoch: 14 Global Step: 306370 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:42,776-Speed 6309.33 samples/sec Loss 5.7724 LearningRate 0.0005 Epoch: 14 Global Step: 306380 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:46,020-Speed 6313.75 samples/sec Loss 5.7124 LearningRate 0.0005 Epoch: 14 Global Step: 306390 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:49,266-Speed 6310.44 samples/sec Loss 5.7149 LearningRate 0.0005 Epoch: 14 Global Step: 306400 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:52,510-Speed 6314.23 samples/sec Loss 5.7529 LearningRate 0.0005 Epoch: 14 Global Step: 306410 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:55,758-Speed 6307.00 samples/sec Loss 5.7090 LearningRate 0.0005 Epoch: 14 
Global Step: 306420 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:35:59,006-Speed 6307.26 samples/sec Loss 5.7273 LearningRate 0.0005 Epoch: 14 Global Step: 306430 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:02,241-Speed 6333.38 samples/sec Loss 5.7077 LearningRate 0.0005 Epoch: 14 Global Step: 306440 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:05,486-Speed 6312.60 samples/sec Loss 5.6846 LearningRate 0.0005 Epoch: 14 Global Step: 306450 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:08,731-Speed 6311.95 samples/sec Loss 5.7324 LearningRate 0.0005 Epoch: 14 Global Step: 306460 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:11,979-Speed 6307.08 samples/sec Loss 5.6404 LearningRate 0.0005 Epoch: 14 Global Step: 306470 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:15,227-Speed 6307.05 samples/sec Loss 5.7020 LearningRate 0.0005 Epoch: 14 Global Step: 306480 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:18,472-Speed 6312.99 samples/sec Loss 5.7542 LearningRate 0.0005 Epoch: 14 Global Step: 306490 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:21,716-Speed 6314.61 samples/sec Loss 5.7135 LearningRate 0.0005 Epoch: 14 Global Step: 306500 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:24,964-Speed 6306.56 samples/sec Loss 5.7343 LearningRate 0.0005 Epoch: 14 Global Step: 306510 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:28,210-Speed 6309.92 samples/sec Loss 5.7504 LearningRate 0.0005 Epoch: 14 Global Step: 306520 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:31,458-Speed 6308.35 samples/sec Loss 5.6808 LearningRate 0.0005 Epoch: 14 Global Step: 306530 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:34,704-Speed 6310.41 samples/sec Loss 5.7817 LearningRate 0.0005 Epoch: 14 Global Step: 306540 Fp16 Grad 
Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:37,959-Speed 6292.94 samples/sec Loss 5.7258 LearningRate 0.0005 Epoch: 14 Global Step: 306550 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:41,207-Speed 6307.08 samples/sec Loss 5.7021 LearningRate 0.0005 Epoch: 14 Global Step: 306560 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:44,451-Speed 6313.46 samples/sec Loss 5.6542 LearningRate 0.0005 Epoch: 14 Global Step: 306570 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:47,697-Speed 6310.48 samples/sec Loss 5.5999 LearningRate 0.0005 Epoch: 14 Global Step: 306580 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:50,942-Speed 6314.39 samples/sec Loss 5.7302 LearningRate 0.0005 Epoch: 14 Global Step: 306590 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:54,188-Speed 6308.67 samples/sec Loss 5.7097 LearningRate 0.0005 Epoch: 14 Global Step: 306600 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:36:57,434-Speed 6311.59 samples/sec Loss 5.6515 LearningRate 0.0005 Epoch: 14 Global Step: 306610 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:00,683-Speed 6305.14 samples/sec Loss 5.7658 LearningRate 0.0005 Epoch: 14 Global Step: 306620 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:03,931-Speed 6306.42 samples/sec Loss 5.6536 LearningRate 0.0005 Epoch: 14 Global Step: 306630 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:07,165-Speed 6335.60 samples/sec Loss 5.7207 LearningRate 0.0005 Epoch: 14 Global Step: 306640 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:10,411-Speed 6310.12 samples/sec Loss 5.7240 LearningRate 0.0005 Epoch: 14 Global Step: 306650 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:13,658-Speed 6310.08 samples/sec Loss 5.7117 LearningRate 0.0005 Epoch: 14 Global Step: 306660 Fp16 Grad Scale: 32768 Required: 48 hours 
Training: 2022-04-01 19:37:16,908-Speed 6302.79 samples/sec Loss 5.7932 LearningRate 0.0005 Epoch: 14 Global Step: 306670 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:20,156-Speed 6305.72 samples/sec Loss 5.6892 LearningRate 0.0005 Epoch: 14 Global Step: 306680 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:23,405-Speed 6305.68 samples/sec Loss 5.7296 LearningRate 0.0005 Epoch: 14 Global Step: 306690 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:26,655-Speed 6302.37 samples/sec Loss 5.6457 LearningRate 0.0005 Epoch: 14 Global Step: 306700 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:29,899-Speed 6316.05 samples/sec Loss 5.6414 LearningRate 0.0005 Epoch: 14 Global Step: 306710 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:33,149-Speed 6302.24 samples/sec Loss 5.6810 LearningRate 0.0005 Epoch: 14 Global Step: 306720 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:36,392-Speed 6316.35 samples/sec Loss 5.6610 LearningRate 0.0005 Epoch: 14 Global Step: 306730 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:39,641-Speed 6305.64 samples/sec Loss 5.6806 LearningRate 0.0005 Epoch: 14 Global Step: 306740 Fp16 Grad Scale: 65536 Required: 48 hours Training: 2022-04-01 19:37:42,869-Speed 6344.29 samples/sec Loss 5.6497 LearningRate 0.0005 Epoch: 14 Global Step: 306750 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:46,118-Speed 6306.13 samples/sec Loss 5.7825 LearningRate 0.0005 Epoch: 14 Global Step: 306760 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:49,365-Speed 6308.49 samples/sec Loss 5.6933 LearningRate 0.0005 Epoch: 14 Global Step: 306770 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:52,609-Speed 6313.68 samples/sec Loss 5.6771 LearningRate 0.0005 Epoch: 14 Global Step: 306780 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 
19:37:55,858-Speed 6306.12 samples/sec Loss 5.7063 LearningRate 0.0005 Epoch: 14 Global Step: 306790 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:37:59,106-Speed 6306.87 samples/sec Loss 5.7331 LearningRate 0.0005 Epoch: 14 Global Step: 306800 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:02,352-Speed 6308.96 samples/sec Loss 5.6924 LearningRate 0.0005 Epoch: 14 Global Step: 306810 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:05,599-Speed 6309.40 samples/sec Loss 5.7644 LearningRate 0.0005 Epoch: 14 Global Step: 306820 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:08,845-Speed 6310.62 samples/sec Loss 5.6745 LearningRate 0.0005 Epoch: 14 Global Step: 306830 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:12,093-Speed 6306.96 samples/sec Loss 5.7429 LearningRate 0.0005 Epoch: 14 Global Step: 306840 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:15,322-Speed 6343.81 samples/sec Loss 5.7061 LearningRate 0.0005 Epoch: 14 Global Step: 306850 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:18,572-Speed 6304.70 samples/sec Loss 5.7081 LearningRate 0.0005 Epoch: 14 Global Step: 306860 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:21,822-Speed 6302.46 samples/sec Loss 5.7521 LearningRate 0.0005 Epoch: 14 Global Step: 306870 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:25,072-Speed 6303.36 samples/sec Loss 5.6404 LearningRate 0.0005 Epoch: 14 Global Step: 306880 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:28,320-Speed 6306.98 samples/sec Loss 5.7219 LearningRate 0.0005 Epoch: 14 Global Step: 306890 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:31,567-Speed 6309.77 samples/sec Loss 5.6696 LearningRate 0.0005 Epoch: 14 Global Step: 306900 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:34,817-Speed 6302.09 
samples/sec Loss 5.7143 LearningRate 0.0005 Epoch: 14 Global Step: 306910 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:38,063-Speed 6311.81 samples/sec Loss 5.6655 LearningRate 0.0005 Epoch: 14 Global Step: 306920 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:41,311-Speed 6305.91 samples/sec Loss 5.7254 LearningRate 0.0005 Epoch: 14 Global Step: 306930 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:44,555-Speed 6313.57 samples/sec Loss 5.6881 LearningRate 0.0005 Epoch: 14 Global Step: 306940 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:47,786-Speed 6340.17 samples/sec Loss 5.6947 LearningRate 0.0005 Epoch: 14 Global Step: 306950 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:51,031-Speed 6313.19 samples/sec Loss 5.6531 LearningRate 0.0005 Epoch: 14 Global Step: 306960 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:54,276-Speed 6313.13 samples/sec Loss 5.6520 LearningRate 0.0005 Epoch: 14 Global Step: 306970 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:38:57,525-Speed 6303.56 samples/sec Loss 5.6483 LearningRate 0.0005 Epoch: 14 Global Step: 306980 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:00,773-Speed 6307.86 samples/sec Loss 5.6932 LearningRate 0.0005 Epoch: 14 Global Step: 306990 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:04,021-Speed 6309.83 samples/sec Loss 5.6683 LearningRate 0.0005 Epoch: 14 Global Step: 307000 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:07,271-Speed 6303.04 samples/sec Loss 5.6831 LearningRate 0.0005 Epoch: 14 Global Step: 307010 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:10,517-Speed 6311.58 samples/sec Loss 5.6772 LearningRate 0.0005 Epoch: 14 Global Step: 307020 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:13,763-Speed 6310.48 samples/sec Loss 5.6273 
LearningRate 0.0005 Epoch: 14 Global Step: 307030 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:17,009-Speed 6310.76 samples/sec Loss 5.6705 LearningRate 0.0005 Epoch: 14 Global Step: 307040 Fp16 Grad Scale: 32768 Required: 48 hours Training: 2022-04-01 19:39:20,242-Speed 6336.53 samples/sec Loss 5.6726 LearningRate 0.0005 Epoch: 14 Global Step: 307050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:23,503-Speed 6280.74 samples/sec Loss 5.7011 LearningRate 0.0005 Epoch: 14 Global Step: 307060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:26,803-Speed 6207.88 samples/sec Loss 5.6658 LearningRate 0.0005 Epoch: 14 Global Step: 307070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:30,048-Speed 6311.81 samples/sec Loss 5.7316 LearningRate 0.0005 Epoch: 14 Global Step: 307080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:33,295-Speed 6309.30 samples/sec Loss 5.7206 LearningRate 0.0005 Epoch: 14 Global Step: 307090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:36,541-Speed 6311.75 samples/sec Loss 5.6781 LearningRate 0.0005 Epoch: 14 Global Step: 307100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:39,786-Speed 6311.56 samples/sec Loss 5.6294 LearningRate 0.0005 Epoch: 14 Global Step: 307110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:43,034-Speed 6307.92 samples/sec Loss 5.6829 LearningRate 0.0005 Epoch: 14 Global Step: 307120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:46,281-Speed 6308.80 samples/sec Loss 5.6336 LearningRate 0.0005 Epoch: 14 Global Step: 307130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:49,524-Speed 6315.80 samples/sec Loss 5.7031 LearningRate 0.0005 Epoch: 14 Global Step: 307140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:52,773-Speed 6306.07 samples/sec Loss 5.6920 LearningRate 0.0005 Epoch: 14 
Global Step: 307150 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:39:56,004-Speed 6339.11 samples/sec Loss 5.7389 LearningRate 0.0005 Epoch: 14 Global Step: 307160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:39:59,255-Speed 6300.51 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 14 Global Step: 307170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:02,506-Speed 6301.76 samples/sec Loss 5.7854 LearningRate 0.0005 Epoch: 14 Global Step: 307180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:05,752-Speed 6311.23 samples/sec Loss 5.6864 LearningRate 0.0005 Epoch: 14 Global Step: 307190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:08,996-Speed 6314.41 samples/sec Loss 5.6632 LearningRate 0.0005 Epoch: 14 Global Step: 307200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:12,239-Speed 6315.78 samples/sec Loss 5.6685 LearningRate 0.0005 Epoch: 14 Global Step: 307210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:15,481-Speed 6318.19 samples/sec Loss 5.6329 LearningRate 0.0005 Epoch: 14 Global Step: 307220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:18,735-Speed 6297.00 samples/sec Loss 5.6985 LearningRate 0.0005 Epoch: 14 Global Step: 307230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:21,978-Speed 6315.73 samples/sec Loss 5.7013 LearningRate 0.0005 Epoch: 14 Global Step: 307240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:25,225-Speed 6307.36 samples/sec Loss 5.6366 LearningRate 0.0005 Epoch: 14 Global Step: 307250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:28,464-Speed 6324.79 samples/sec Loss 5.6694 LearningRate 0.0005 Epoch: 14 Global Step: 307260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:31,719-Speed 6293.73 samples/sec Loss 5.6106 LearningRate 0.0005 Epoch: 14 Global Step: 307270 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:34,966-Speed 6308.89 samples/sec Loss 5.6300 LearningRate 0.0005 Epoch: 14 Global Step: 307280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:38,218-Speed 6299.59 samples/sec Loss 5.6121 LearningRate 0.0005 Epoch: 14 Global Step: 307290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:41,461-Speed 6317.34 samples/sec Loss 5.7009 LearningRate 0.0005 Epoch: 14 Global Step: 307300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:44,706-Speed 6311.52 samples/sec Loss 5.7259 LearningRate 0.0005 Epoch: 14 Global Step: 307310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:47,950-Speed 6315.88 samples/sec Loss 5.6654 LearningRate 0.0005 Epoch: 14 Global Step: 307320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:51,195-Speed 6312.18 samples/sec Loss 5.6822 LearningRate 0.0005 Epoch: 14 Global Step: 307330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:54,442-Speed 6308.21 samples/sec Loss 5.6956 LearningRate 0.0005 Epoch: 14 Global Step: 307340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:40:57,688-Speed 6311.53 samples/sec Loss 5.6890 LearningRate 0.0005 Epoch: 14 Global Step: 307350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:00,921-Speed 6335.45 samples/sec Loss 5.7071 LearningRate 0.0005 Epoch: 14 Global Step: 307360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:04,169-Speed 6307.89 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 14 Global Step: 307370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:07,421-Speed 6298.83 samples/sec Loss 5.7576 LearningRate 0.0005 Epoch: 14 Global Step: 307380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:10,669-Speed 6306.78 samples/sec Loss 5.5689 LearningRate 0.0005 Epoch: 14 Global Step: 307390 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 19:41:13,929-Speed 6284.18 samples/sec Loss 5.6842 LearningRate 0.0005 Epoch: 14 Global Step: 307400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:17,180-Speed 6300.12 samples/sec Loss 5.8185 LearningRate 0.0005 Epoch: 14 Global Step: 307410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:20,427-Speed 6308.78 samples/sec Loss 5.6415 LearningRate 0.0005 Epoch: 14 Global Step: 307420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:23,674-Speed 6309.27 samples/sec Loss 5.6335 LearningRate 0.0005 Epoch: 14 Global Step: 307430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:26,932-Speed 6287.44 samples/sec Loss 5.6136 LearningRate 0.0005 Epoch: 14 Global Step: 307440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:30,179-Speed 6307.88 samples/sec Loss 5.6289 LearningRate 0.0005 Epoch: 14 Global Step: 307450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:33,415-Speed 6331.47 samples/sec Loss 5.7301 LearningRate 0.0005 Epoch: 14 Global Step: 307460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:36,663-Speed 6307.28 samples/sec Loss 5.6579 LearningRate 0.0005 Epoch: 14 Global Step: 307470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:39,913-Speed 6302.92 samples/sec Loss 5.6525 LearningRate 0.0005 Epoch: 14 Global Step: 307480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:43,159-Speed 6309.52 samples/sec Loss 5.6840 LearningRate 0.0005 Epoch: 14 Global Step: 307490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:46,409-Speed 6303.54 samples/sec Loss 5.6651 LearningRate 0.0005 Epoch: 14 Global Step: 307500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:49,658-Speed 6304.26 samples/sec Loss 5.7202 LearningRate 0.0005 Epoch: 14 Global Step: 307510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
19:41:52,912-Speed 6296.22 samples/sec Loss 5.6910 LearningRate 0.0005 Epoch: 14 Global Step: 307520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:56,162-Speed 6302.51 samples/sec Loss 5.6804 LearningRate 0.0005 Epoch: 14 Global Step: 307530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:41:59,412-Speed 6302.71 samples/sec Loss 5.6733 LearningRate 0.0005 Epoch: 14 Global Step: 307540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:02,661-Speed 6305.72 samples/sec Loss 5.6918 LearningRate 0.0005 Epoch: 14 Global Step: 307550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:05,910-Speed 6304.17 samples/sec Loss 5.7257 LearningRate 0.0005 Epoch: 14 Global Step: 307560 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:42:09,144-Speed 6334.73 samples/sec Loss 5.6924 LearningRate 0.0005 Epoch: 14 Global Step: 307570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:12,398-Speed 6296.23 samples/sec Loss 5.6992 LearningRate 0.0005 Epoch: 14 Global Step: 307580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:15,648-Speed 6302.58 samples/sec Loss 5.7590 LearningRate 0.0005 Epoch: 14 Global Step: 307590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:18,896-Speed 6306.77 samples/sec Loss 5.6967 LearningRate 0.0005 Epoch: 14 Global Step: 307600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:22,141-Speed 6311.00 samples/sec Loss 5.6891 LearningRate 0.0005 Epoch: 14 Global Step: 307610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:25,387-Speed 6310.74 samples/sec Loss 5.7506 LearningRate 0.0005 Epoch: 14 Global Step: 307620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:28,634-Speed 6308.61 samples/sec Loss 5.6789 LearningRate 0.0005 Epoch: 14 Global Step: 307630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:31,879-Speed 6313.57 
samples/sec Loss 5.6439 LearningRate 0.0005 Epoch: 14 Global Step: 307640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:35,125-Speed 6310.21 samples/sec Loss 5.7330 LearningRate 0.0005 Epoch: 14 Global Step: 307650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:42:38,355-Speed 6342.83 samples/sec Loss 5.6352 LearningRate 0.0005 Epoch: 14 Global Step: 307660 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:41,600-Speed 6311.72 samples/sec Loss 5.6699 LearningRate 0.0005 Epoch: 14 Global Step: 307670 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:44,844-Speed 6314.63 samples/sec Loss 5.7027 LearningRate 0.0005 Epoch: 14 Global Step: 307680 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:48,085-Speed 6320.59 samples/sec Loss 5.6845 LearningRate 0.0005 Epoch: 14 Global Step: 307690 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:51,330-Speed 6312.76 samples/sec Loss 5.7402 LearningRate 0.0005 Epoch: 14 Global Step: 307700 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:54,576-Speed 6311.93 samples/sec Loss 5.7328 LearningRate 0.0005 Epoch: 14 Global Step: 307710 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:42:57,820-Speed 6313.55 samples/sec Loss 5.7317 LearningRate 0.0005 Epoch: 14 Global Step: 307720 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:43:01,063-Speed 6317.53 samples/sec Loss 5.7019 LearningRate 0.0005 Epoch: 14 Global Step: 307730 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:43:04,307-Speed 6315.51 samples/sec Loss 5.6163 LearningRate 0.0005 Epoch: 14 Global Step: 307740 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:43:07,553-Speed 6309.75 samples/sec Loss 5.6560 LearningRate 0.0005 Epoch: 14 Global Step: 307750 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:43:10,798-Speed 6313.71 samples/sec Loss 5.7314 
LearningRate 0.0005 Epoch: 14 Global Step: 307760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:14,042-Speed 6314.08 samples/sec Loss 5.6897 LearningRate 0.0005 Epoch: 14 Global Step: 307770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:17,293-Speed 6301.12 samples/sec Loss 5.7124 LearningRate 0.0005 Epoch: 14 Global Step: 307780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:20,541-Speed 6306.50 samples/sec Loss 5.6868 LearningRate 0.0005 Epoch: 14 Global Step: 307790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:23,797-Speed 6291.32 samples/sec Loss 5.6316 LearningRate 0.0005 Epoch: 14 Global Step: 307800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:27,047-Speed 6303.27 samples/sec Loss 5.5998 LearningRate 0.0005 Epoch: 14 Global Step: 307810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:30,293-Speed 6311.05 samples/sec Loss 5.6829 LearningRate 0.0005 Epoch: 14 Global Step: 307820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:33,539-Speed 6311.03 samples/sec Loss 5.6346 LearningRate 0.0005 Epoch: 14 Global Step: 307830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:36,787-Speed 6305.65 samples/sec Loss 5.6894 LearningRate 0.0005 Epoch: 14 Global Step: 307840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:40,036-Speed 6305.03 samples/sec Loss 5.6246 LearningRate 0.0005 Epoch: 14 Global Step: 307850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:43,267-Speed 6339.81 samples/sec Loss 5.6664 LearningRate 0.0005 Epoch: 14 Global Step: 307860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:46,515-Speed 6306.71 samples/sec Loss 5.7103 LearningRate 0.0005 Epoch: 14 Global Step: 307870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:49,768-Speed 6298.04 samples/sec Loss 5.6820 LearningRate 0.0005 Epoch: 14 
Global Step: 307880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:53,013-Speed 6312.40 samples/sec Loss 5.6776 LearningRate 0.0005 Epoch: 14 Global Step: 307890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:56,263-Speed 6302.29 samples/sec Loss 5.7024 LearningRate 0.0005 Epoch: 14 Global Step: 307900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:43:59,513-Speed 6302.58 samples/sec Loss 5.6799 LearningRate 0.0005 Epoch: 14 Global Step: 307910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:02,761-Speed 6307.89 samples/sec Loss 5.6929 LearningRate 0.0005 Epoch: 14 Global Step: 307920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:06,005-Speed 6313.63 samples/sec Loss 5.6585 LearningRate 0.0005 Epoch: 14 Global Step: 307930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:09,252-Speed 6309.07 samples/sec Loss 5.6223 LearningRate 0.0005 Epoch: 14 Global Step: 307940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:12,500-Speed 6307.57 samples/sec Loss 5.6251 LearningRate 0.0005 Epoch: 14 Global Step: 307950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:15,730-Speed 6342.19 samples/sec Loss 5.6630 LearningRate 0.0005 Epoch: 14 Global Step: 307960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:18,977-Speed 6308.57 samples/sec Loss 5.6511 LearningRate 0.0005 Epoch: 14 Global Step: 307970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:22,224-Speed 6309.64 samples/sec Loss 5.6154 LearningRate 0.0005 Epoch: 14 Global Step: 307980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:25,474-Speed 6302.14 samples/sec Loss 5.6684 LearningRate 0.0005 Epoch: 14 Global Step: 307990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:28,720-Speed 6310.79 samples/sec Loss 5.6090 LearningRate 0.0005 Epoch: 14 Global Step: 308000 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:31,966-Speed 6310.74 samples/sec Loss 5.6110 LearningRate 0.0005 Epoch: 14 Global Step: 308010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:35,213-Speed 6309.51 samples/sec Loss 5.5860 LearningRate 0.0005 Epoch: 14 Global Step: 308020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:38,455-Speed 6317.64 samples/sec Loss 5.6843 LearningRate 0.0005 Epoch: 14 Global Step: 308030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:41,703-Speed 6307.18 samples/sec Loss 5.5990 LearningRate 0.0005 Epoch: 14 Global Step: 308040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:44,948-Speed 6313.10 samples/sec Loss 5.7158 LearningRate 0.0005 Epoch: 14 Global Step: 308050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:48,179-Speed 6339.40 samples/sec Loss 5.6124 LearningRate 0.0005 Epoch: 14 Global Step: 308060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:51,430-Speed 6300.45 samples/sec Loss 5.6565 LearningRate 0.0005 Epoch: 14 Global Step: 308070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:54,675-Speed 6313.05 samples/sec Loss 5.7271 LearningRate 0.0005 Epoch: 14 Global Step: 308080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:44:57,920-Speed 6312.99 samples/sec Loss 5.7086 LearningRate 0.0005 Epoch: 14 Global Step: 308090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:01,165-Speed 6311.60 samples/sec Loss 5.6356 LearningRate 0.0005 Epoch: 14 Global Step: 308100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:04,418-Speed 6298.90 samples/sec Loss 5.6400 LearningRate 0.0005 Epoch: 14 Global Step: 308110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:07,667-Speed 6303.55 samples/sec Loss 5.6772 LearningRate 0.0005 Epoch: 14 Global Step: 308120 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 19:45:10,915-Speed 6309.34 samples/sec Loss 5.6307 LearningRate 0.0005 Epoch: 14 Global Step: 308130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:14,166-Speed 6299.41 samples/sec Loss 5.6745 LearningRate 0.0005 Epoch: 14 Global Step: 308140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:17,411-Speed 6313.37 samples/sec Loss 5.7620 LearningRate 0.0005 Epoch: 14 Global Step: 308150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:20,644-Speed 6336.47 samples/sec Loss 5.7238 LearningRate 0.0005 Epoch: 14 Global Step: 308160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:23,889-Speed 6312.17 samples/sec Loss 5.7083 LearningRate 0.0005 Epoch: 14 Global Step: 308170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:27,135-Speed 6310.80 samples/sec Loss 5.6283 LearningRate 0.0005 Epoch: 14 Global Step: 308180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:30,378-Speed 6316.01 samples/sec Loss 5.6382 LearningRate 0.0005 Epoch: 14 Global Step: 308190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:33,622-Speed 6314.76 samples/sec Loss 5.6770 LearningRate 0.0005 Epoch: 14 Global Step: 308200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:36,866-Speed 6314.35 samples/sec Loss 5.6619 LearningRate 0.0005 Epoch: 14 Global Step: 308210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:40,113-Speed 6309.83 samples/sec Loss 5.6983 LearningRate 0.0005 Epoch: 14 Global Step: 308220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:43,362-Speed 6305.41 samples/sec Loss 5.6553 LearningRate 0.0005 Epoch: 14 Global Step: 308230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:46,608-Speed 6310.16 samples/sec Loss 5.6259 LearningRate 0.0005 Epoch: 14 Global Step: 308240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
19:45:49,851-Speed 6316.25 samples/sec Loss 5.6812 LearningRate 0.0005 Epoch: 14 Global Step: 308250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:53,100-Speed 6305.90 samples/sec Loss 5.7049 LearningRate 0.0005 Epoch: 14 Global Step: 308260 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:45:56,327-Speed 6346.40 samples/sec Loss 5.6533 LearningRate 0.0005 Epoch: 14 Global Step: 308270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:45:59,580-Speed 6296.34 samples/sec Loss 5.7440 LearningRate 0.0005 Epoch: 14 Global Step: 308280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:02,826-Speed 6311.91 samples/sec Loss 5.6056 LearningRate 0.0005 Epoch: 14 Global Step: 308290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:06,071-Speed 6311.78 samples/sec Loss 5.6649 LearningRate 0.0005 Epoch: 14 Global Step: 308300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:09,320-Speed 6305.32 samples/sec Loss 5.6836 LearningRate 0.0005 Epoch: 14 Global Step: 308310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:12,571-Speed 6301.62 samples/sec Loss 5.6479 LearningRate 0.0005 Epoch: 14 Global Step: 308320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:15,820-Speed 6305.17 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 14 Global Step: 308330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:19,065-Speed 6311.36 samples/sec Loss 5.6202 LearningRate 0.0005 Epoch: 14 Global Step: 308340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:22,312-Speed 6309.46 samples/sec Loss 5.6445 LearningRate 0.0005 Epoch: 14 Global Step: 308350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:25,556-Speed 6316.25 samples/sec Loss 5.6488 LearningRate 0.0005 Epoch: 14 Global Step: 308360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:28,786-Speed 6341.42 
samples/sec Loss 5.7573 LearningRate 0.0005 Epoch: 14 Global Step: 308370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:32,032-Speed 6309.50 samples/sec Loss 5.6667 LearningRate 0.0005 Epoch: 14 Global Step: 308380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:35,278-Speed 6312.16 samples/sec Loss 5.6732 LearningRate 0.0005 Epoch: 14 Global Step: 308390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:38,527-Speed 6303.83 samples/sec Loss 5.7405 LearningRate 0.0005 Epoch: 14 Global Step: 308400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:41,778-Speed 6300.93 samples/sec Loss 5.7183 LearningRate 0.0005 Epoch: 14 Global Step: 308410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:45,024-Speed 6312.05 samples/sec Loss 5.6689 LearningRate 0.0005 Epoch: 14 Global Step: 308420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:48,271-Speed 6307.75 samples/sec Loss 5.6830 LearningRate 0.0005 Epoch: 14 Global Step: 308430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:51,520-Speed 6305.89 samples/sec Loss 5.6890 LearningRate 0.0005 Epoch: 14 Global Step: 308440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:54,764-Speed 6314.35 samples/sec Loss 5.6994 LearningRate 0.0005 Epoch: 14 Global Step: 308450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:46:58,013-Speed 6304.33 samples/sec Loss 5.7639 LearningRate 0.0005 Epoch: 14 Global Step: 308460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:01,247-Speed 6334.81 samples/sec Loss 5.6573 LearningRate 0.0005 Epoch: 14 Global Step: 308470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:04,495-Speed 6307.24 samples/sec Loss 5.6971 LearningRate 0.0005 Epoch: 14 Global Step: 308480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:07,739-Speed 6313.01 samples/sec Loss 5.6989 
LearningRate 0.0005 Epoch: 14 Global Step: 308490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:10,986-Speed 6309.83 samples/sec Loss 5.7164 LearningRate 0.0005 Epoch: 14 Global Step: 308500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:14,233-Speed 6308.16 samples/sec Loss 5.5773 LearningRate 0.0005 Epoch: 14 Global Step: 308510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:17,479-Speed 6311.62 samples/sec Loss 5.7418 LearningRate 0.0005 Epoch: 14 Global Step: 308520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:20,724-Speed 6311.77 samples/sec Loss 5.6506 LearningRate 0.0005 Epoch: 14 Global Step: 308530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:23,975-Speed 6301.77 samples/sec Loss 5.7586 LearningRate 0.0005 Epoch: 14 Global Step: 308540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:27,224-Speed 6304.07 samples/sec Loss 5.7270 LearningRate 0.0005 Epoch: 14 Global Step: 308550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:30,478-Speed 6296.03 samples/sec Loss 5.6397 LearningRate 0.0005 Epoch: 14 Global Step: 308560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:33,712-Speed 6332.95 samples/sec Loss 5.7000 LearningRate 0.0005 Epoch: 14 Global Step: 308570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:36,957-Speed 6314.62 samples/sec Loss 5.6480 LearningRate 0.0005 Epoch: 14 Global Step: 308580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:40,204-Speed 6308.18 samples/sec Loss 5.6657 LearningRate 0.0005 Epoch: 14 Global Step: 308590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:43,449-Speed 6313.50 samples/sec Loss 5.7545 LearningRate 0.0005 Epoch: 14 Global Step: 308600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:46,691-Speed 6316.98 samples/sec Loss 5.6592 LearningRate 0.0005 Epoch: 14 
Global Step: 308610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:49,937-Speed 6310.57 samples/sec Loss 5.6093 LearningRate 0.0005 Epoch: 14 Global Step: 308620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:53,206-Speed 6266.29 samples/sec Loss 5.6419 LearningRate 0.0005 Epoch: 14 Global Step: 308630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:56,454-Speed 6306.91 samples/sec Loss 5.6925 LearningRate 0.0005 Epoch: 14 Global Step: 308640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:47:59,699-Speed 6312.43 samples/sec Loss 5.6413 LearningRate 0.0005 Epoch: 14 Global Step: 308650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:02,944-Speed 6312.62 samples/sec Loss 5.7500 LearningRate 0.0005 Epoch: 14 Global Step: 308660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:06,176-Speed 6338.98 samples/sec Loss 5.6118 LearningRate 0.0005 Epoch: 14 Global Step: 308670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:09,420-Speed 6314.73 samples/sec Loss 5.5943 LearningRate 0.0005 Epoch: 14 Global Step: 308680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:12,663-Speed 6316.81 samples/sec Loss 5.7141 LearningRate 0.0005 Epoch: 14 Global Step: 308690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:15,908-Speed 6312.12 samples/sec Loss 5.6879 LearningRate 0.0005 Epoch: 14 Global Step: 308700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:19,155-Speed 6308.43 samples/sec Loss 5.6449 LearningRate 0.0005 Epoch: 14 Global Step: 308710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:22,400-Speed 6312.84 samples/sec Loss 5.6502 LearningRate 0.0005 Epoch: 14 Global Step: 308720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:25,646-Speed 6310.95 samples/sec Loss 5.6312 LearningRate 0.0005 Epoch: 14 Global Step: 308730 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:28,899-Speed 6297.13 samples/sec Loss 5.7111 LearningRate 0.0005 Epoch: 14 Global Step: 308740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:32,146-Speed 6308.90 samples/sec Loss 5.7133 LearningRate 0.0005 Epoch: 14 Global Step: 308750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:35,393-Speed 6308.77 samples/sec Loss 5.5290 LearningRate 0.0005 Epoch: 14 Global Step: 308760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:38,639-Speed 6310.95 samples/sec Loss 5.7379 LearningRate 0.0005 Epoch: 14 Global Step: 308770 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:48:41,872-Speed 6336.84 samples/sec Loss 5.6830 LearningRate 0.0005 Epoch: 14 Global Step: 308780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:45,124-Speed 6298.32 samples/sec Loss 5.6611 LearningRate 0.0005 Epoch: 14 Global Step: 308790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:48,371-Speed 6309.51 samples/sec Loss 5.5922 LearningRate 0.0005 Epoch: 14 Global Step: 308800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:51,616-Speed 6311.49 samples/sec Loss 5.7701 LearningRate 0.0005 Epoch: 14 Global Step: 308810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:54,863-Speed 6310.62 samples/sec Loss 5.6177 LearningRate 0.0005 Epoch: 14 Global Step: 308820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:48:58,113-Speed 6302.35 samples/sec Loss 5.7753 LearningRate 0.0005 Epoch: 14 Global Step: 308830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:01,361-Speed 6307.02 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 14 Global Step: 308840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:04,610-Speed 6305.04 samples/sec Loss 5.6615 LearningRate 0.0005 Epoch: 14 Global Step: 308850 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 19:49:07,855-Speed 6311.91 samples/sec Loss 5.6867 LearningRate 0.0005 Epoch: 14 Global Step: 308860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:11,100-Speed 6313.92 samples/sec Loss 5.6738 LearningRate 0.0005 Epoch: 14 Global Step: 308870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:14,333-Speed 6335.96 samples/sec Loss 5.6370 LearningRate 0.0005 Epoch: 14 Global Step: 308880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:17,581-Speed 6305.66 samples/sec Loss 5.5917 LearningRate 0.0005 Epoch: 14 Global Step: 308890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:20,824-Speed 6316.51 samples/sec Loss 5.6653 LearningRate 0.0005 Epoch: 14 Global Step: 308900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:24,073-Speed 6305.60 samples/sec Loss 5.6206 LearningRate 0.0005 Epoch: 14 Global Step: 308910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:27,317-Speed 6314.14 samples/sec Loss 5.6726 LearningRate 0.0005 Epoch: 14 Global Step: 308920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:30,566-Speed 6305.53 samples/sec Loss 5.6253 LearningRate 0.0005 Epoch: 14 Global Step: 308930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:33,807-Speed 6319.40 samples/sec Loss 5.6948 LearningRate 0.0005 Epoch: 14 Global Step: 308940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:37,054-Speed 6308.81 samples/sec Loss 5.6784 LearningRate 0.0005 Epoch: 14 Global Step: 308950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:40,305-Speed 6302.32 samples/sec Loss 5.6669 LearningRate 0.0005 Epoch: 14 Global Step: 308960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:43,552-Speed 6307.36 samples/sec Loss 5.6531 LearningRate 0.0005 Epoch: 14 Global Step: 308970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
19:49:46,789-Speed 6328.05 samples/sec Loss 5.7122 LearningRate 0.0005 Epoch: 14 Global Step: 308980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:50,034-Speed 6312.62 samples/sec Loss 5.6953 LearningRate 0.0005 Epoch: 14 Global Step: 308990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:53,280-Speed 6311.93 samples/sec Loss 5.6325 LearningRate 0.0005 Epoch: 14 Global Step: 309000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:56,527-Speed 6308.72 samples/sec Loss 5.6152 LearningRate 0.0005 Epoch: 14 Global Step: 309010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:49:59,773-Speed 6310.12 samples/sec Loss 5.6891 LearningRate 0.0005 Epoch: 14 Global Step: 309020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:03,020-Speed 6310.57 samples/sec Loss 5.6318 LearningRate 0.0005 Epoch: 14 Global Step: 309030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:06,262-Speed 6316.70 samples/sec Loss 5.6768 LearningRate 0.0005 Epoch: 14 Global Step: 309040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:09,508-Speed 6310.94 samples/sec Loss 5.7093 LearningRate 0.0005 Epoch: 14 Global Step: 309050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:12,750-Speed 6318.52 samples/sec Loss 5.7380 LearningRate 0.0005 Epoch: 14 Global Step: 309060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:15,997-Speed 6309.44 samples/sec Loss 5.6535 LearningRate 0.0005 Epoch: 14 Global Step: 309070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:19,244-Speed 6308.79 samples/sec Loss 5.6823 LearningRate 0.0005 Epoch: 14 Global Step: 309080 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:50:22,489-Speed 6313.36 samples/sec Loss 5.7105 LearningRate 0.0005 Epoch: 14 Global Step: 309090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:25,737-Speed 6305.40 
samples/sec Loss 5.6719 LearningRate 0.0005 Epoch: 14 Global Step: 309100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:28,984-Speed 6309.98 samples/sec Loss 5.6729 LearningRate 0.0005 Epoch: 14 Global Step: 309110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:32,231-Speed 6308.31 samples/sec Loss 5.6167 LearningRate 0.0005 Epoch: 14 Global Step: 309120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:35,484-Speed 6297.46 samples/sec Loss 5.6874 LearningRate 0.0005 Epoch: 14 Global Step: 309130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:38,755-Speed 6262.54 samples/sec Loss 5.6521 LearningRate 0.0005 Epoch: 14 Global Step: 309140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:42,102-Speed 6120.07 samples/sec Loss 5.6059 LearningRate 0.0005 Epoch: 14 Global Step: 309150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:45,346-Speed 6314.91 samples/sec Loss 5.6478 LearningRate 0.0005 Epoch: 14 Global Step: 309160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:48,611-Speed 6276.72 samples/sec Loss 5.6710 LearningRate 0.0005 Epoch: 14 Global Step: 309170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:51,854-Speed 6315.94 samples/sec Loss 5.6615 LearningRate 0.0005 Epoch: 14 Global Step: 309180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:55,085-Speed 6338.84 samples/sec Loss 5.7133 LearningRate 0.0005 Epoch: 14 Global Step: 309190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:50:58,336-Speed 6301.50 samples/sec Loss 5.6794 LearningRate 0.0005 Epoch: 14 Global Step: 309200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:01,581-Speed 6312.30 samples/sec Loss 5.6463 LearningRate 0.0005 Epoch: 14 Global Step: 309210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:04,905-Speed 6163.24 samples/sec Loss 5.6826 
LearningRate 0.0005 Epoch: 14 Global Step: 309220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:08,177-Speed 6261.52 samples/sec Loss 5.6809 LearningRate 0.0005 Epoch: 14 Global Step: 309230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:11,424-Speed 6309.85 samples/sec Loss 5.7088 LearningRate 0.0005 Epoch: 14 Global Step: 309240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:14,670-Speed 6310.36 samples/sec Loss 5.7366 LearningRate 0.0005 Epoch: 14 Global Step: 309250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:17,910-Speed 6321.24 samples/sec Loss 5.7173 LearningRate 0.0005 Epoch: 14 Global Step: 309260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:21,161-Speed 6301.53 samples/sec Loss 5.6272 LearningRate 0.0005 Epoch: 14 Global Step: 309270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:24,405-Speed 6313.65 samples/sec Loss 5.7071 LearningRate 0.0005 Epoch: 14 Global Step: 309280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:27,637-Speed 6337.70 samples/sec Loss 5.7188 LearningRate 0.0005 Epoch: 14 Global Step: 309290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:30,881-Speed 6314.96 samples/sec Loss 5.6992 LearningRate 0.0005 Epoch: 14 Global Step: 309300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:34,127-Speed 6310.99 samples/sec Loss 5.6700 LearningRate 0.0005 Epoch: 14 Global Step: 309310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:37,377-Speed 6303.71 samples/sec Loss 5.7084 LearningRate 0.0005 Epoch: 14 Global Step: 309320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:40,625-Speed 6308.12 samples/sec Loss 5.6578 LearningRate 0.0005 Epoch: 14 Global Step: 309330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:43,873-Speed 6306.73 samples/sec Loss 5.6581 LearningRate 0.0005 Epoch: 14 
Global Step: 309340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:47,113-Speed 6322.32 samples/sec Loss 5.6308 LearningRate 0.0005 Epoch: 14 Global Step: 309350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:50,360-Speed 6308.84 samples/sec Loss 5.6549 LearningRate 0.0005 Epoch: 14 Global Step: 309360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:53,607-Speed 6308.55 samples/sec Loss 5.6616 LearningRate 0.0005 Epoch: 14 Global Step: 309370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:51:56,849-Speed 6317.06 samples/sec Loss 5.6844 LearningRate 0.0005 Epoch: 14 Global Step: 309380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:00,091-Speed 6319.59 samples/sec Loss 5.7056 LearningRate 0.0005 Epoch: 14 Global Step: 309390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:03,338-Speed 6307.38 samples/sec Loss 5.6436 LearningRate 0.0005 Epoch: 14 Global Step: 309400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:06,584-Speed 6311.60 samples/sec Loss 5.6650 LearningRate 0.0005 Epoch: 14 Global Step: 309410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:09,832-Speed 6308.06 samples/sec Loss 5.6828 LearningRate 0.0005 Epoch: 14 Global Step: 309420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:13,069-Speed 6328.90 samples/sec Loss 5.7389 LearningRate 0.0005 Epoch: 14 Global Step: 309430 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:16,312-Speed 6315.51 samples/sec Loss 5.6432 LearningRate 0.0005 Epoch: 14 Global Step: 309440 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:19,559-Speed 6310.35 samples/sec Loss 5.6743 LearningRate 0.0005 Epoch: 14 Global Step: 309450 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:22,806-Speed 6308.11 samples/sec Loss 5.6873 LearningRate 0.0005 Epoch: 14 Global Step: 309460 Fp16 Grad 
Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:26,051-Speed 6311.68 samples/sec Loss 5.6619 LearningRate 0.0005 Epoch: 14 Global Step: 309470 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:29,296-Speed 6313.43 samples/sec Loss 5.6781 LearningRate 0.0005 Epoch: 14 Global Step: 309480 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:32,541-Speed 6311.97 samples/sec Loss 5.6359 LearningRate 0.0005 Epoch: 14 Global Step: 309490 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:35,785-Speed 6315.68 samples/sec Loss 5.6480 LearningRate 0.0005 Epoch: 14 Global Step: 309500 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:39,031-Speed 6309.39 samples/sec Loss 5.6288 LearningRate 0.0005 Epoch: 14 Global Step: 309510 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:42,277-Speed 6313.08 samples/sec Loss 5.6026 LearningRate 0.0005 Epoch: 14 Global Step: 309520 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 19:52:45,522-Speed 6311.82 samples/sec Loss 5.7256 LearningRate 0.0005 Epoch: 14 Global Step: 309530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:48,770-Speed 6307.99 samples/sec Loss 5.6599 LearningRate 0.0005 Epoch: 14 Global Step: 309540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:52,013-Speed 6314.90 samples/sec Loss 5.6258 LearningRate 0.0005 Epoch: 14 Global Step: 309550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:55,263-Speed 6303.20 samples/sec Loss 5.5846 LearningRate 0.0005 Epoch: 14 Global Step: 309560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:52:58,512-Speed 6305.26 samples/sec Loss 5.6945 LearningRate 0.0005 Epoch: 14 Global Step: 309570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:01,758-Speed 6310.60 samples/sec Loss 5.6981 LearningRate 0.0005 Epoch: 14 Global Step: 309580 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 19:53:05,005-Speed 6308.44 samples/sec Loss 5.7750 LearningRate 0.0005 Epoch: 14 Global Step: 309590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:08,254-Speed 6304.84 samples/sec Loss 5.6041 LearningRate 0.0005 Epoch: 14 Global Step: 309600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:11,500-Speed 6311.53 samples/sec Loss 5.7452 LearningRate 0.0005 Epoch: 14 Global Step: 309610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:14,760-Speed 6284.03 samples/sec Loss 5.6653 LearningRate 0.0005 Epoch: 14 Global Step: 309620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:18,007-Speed 6307.92 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 14 Global Step: 309630 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:53:21,241-Speed 6335.13 samples/sec Loss 5.6609 LearningRate 0.0005 Epoch: 14 Global Step: 309640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:24,486-Speed 6311.64 samples/sec Loss 5.6843 LearningRate 0.0005 Epoch: 14 Global Step: 309650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:27,731-Speed 6313.09 samples/sec Loss 5.6317 LearningRate 0.0005 Epoch: 14 Global Step: 309660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:30,977-Speed 6311.64 samples/sec Loss 5.7594 LearningRate 0.0005 Epoch: 14 Global Step: 309670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:34,223-Speed 6311.10 samples/sec Loss 5.6536 LearningRate 0.0005 Epoch: 14 Global Step: 309680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:37,468-Speed 6311.89 samples/sec Loss 5.7670 LearningRate 0.0005 Epoch: 14 Global Step: 309690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:40,715-Speed 6309.98 samples/sec Loss 5.7205 LearningRate 0.0005 Epoch: 14 Global Step: 309700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
19:53:43,963-Speed 6305.15 samples/sec Loss 5.6525 LearningRate 0.0005 Epoch: 14 Global Step: 309710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:47,212-Speed 6306.63 samples/sec Loss 5.6746 LearningRate 0.0005 Epoch: 14 Global Step: 309720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:50,459-Speed 6307.92 samples/sec Loss 5.7346 LearningRate 0.0005 Epoch: 14 Global Step: 309730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:53,686-Speed 6348.14 samples/sec Loss 5.7014 LearningRate 0.0005 Epoch: 14 Global Step: 309740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:53:56,931-Speed 6312.76 samples/sec Loss 5.7114 LearningRate 0.0005 Epoch: 14 Global Step: 309750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:00,178-Speed 6309.19 samples/sec Loss 5.6727 LearningRate 0.0005 Epoch: 14 Global Step: 309760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:03,426-Speed 6305.37 samples/sec Loss 5.6267 LearningRate 0.0005 Epoch: 14 Global Step: 309770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:06,672-Speed 6310.78 samples/sec Loss 5.6754 LearningRate 0.0005 Epoch: 14 Global Step: 309780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:09,920-Speed 6307.72 samples/sec Loss 5.6176 LearningRate 0.0005 Epoch: 14 Global Step: 309790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:13,167-Speed 6307.73 samples/sec Loss 5.7181 LearningRate 0.0005 Epoch: 14 Global Step: 309800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:16,420-Speed 6297.00 samples/sec Loss 5.6248 LearningRate 0.0005 Epoch: 14 Global Step: 309810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:19,669-Speed 6305.39 samples/sec Loss 5.6479 LearningRate 0.0005 Epoch: 14 Global Step: 309820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:22,915-Speed 6310.38 
samples/sec Loss 5.6124 LearningRate 0.0005 Epoch: 14 Global Step: 309830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:26,148-Speed 6339.75 samples/sec Loss 5.6940 LearningRate 0.0005 Epoch: 14 Global Step: 309840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:29,396-Speed 6307.28 samples/sec Loss 5.6647 LearningRate 0.0005 Epoch: 14 Global Step: 309850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:32,642-Speed 6310.49 samples/sec Loss 5.7382 LearningRate 0.0005 Epoch: 14 Global Step: 309860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:35,887-Speed 6312.00 samples/sec Loss 5.6091 LearningRate 0.0005 Epoch: 14 Global Step: 309870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:39,133-Speed 6311.48 samples/sec Loss 5.6431 LearningRate 0.0005 Epoch: 14 Global Step: 309880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:42,385-Speed 6299.47 samples/sec Loss 5.6854 LearningRate 0.0005 Epoch: 14 Global Step: 309890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:45,632-Speed 6308.61 samples/sec Loss 5.6161 LearningRate 0.0005 Epoch: 14 Global Step: 309900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:48,875-Speed 6317.17 samples/sec Loss 5.7230 LearningRate 0.0005 Epoch: 14 Global Step: 309910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:53,003-Speed 6310.48 samples/sec Loss 5.6494 LearningRate 0.0005 Epoch: 14 Global Step: 309920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:56,250-Speed 6308.28 samples/sec Loss 5.5975 LearningRate 0.0005 Epoch: 14 Global Step: 309930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:54:59,541-Speed 6224.92 samples/sec Loss 5.6982 LearningRate 0.0005 Epoch: 14 Global Step: 309940 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:55:02,778-Speed 6327.83 samples/sec Loss 5.6145 
LearningRate 0.0005 Epoch: 14 Global Step: 309950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:06,025-Speed 6309.97 samples/sec Loss 5.6483 LearningRate 0.0005 Epoch: 14 Global Step: 309960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:09,270-Speed 6312.03 samples/sec Loss 5.7633 LearningRate 0.0005 Epoch: 14 Global Step: 309970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:12,513-Speed 6315.99 samples/sec Loss 5.7303 LearningRate 0.0005 Epoch: 14 Global Step: 309980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:15,763-Speed 6302.96 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 14 Global Step: 309990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:19,016-Speed 6298.51 samples/sec Loss 5.6302 LearningRate 0.0005 Epoch: 14 Global Step: 310000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:22,266-Speed 6303.17 samples/sec Loss 5.6312 LearningRate 0.0005 Epoch: 14 Global Step: 310010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:25,514-Speed 6306.44 samples/sec Loss 5.6680 LearningRate 0.0005 Epoch: 14 Global Step: 310020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:28,761-Speed 6309.08 samples/sec Loss 5.6287 LearningRate 0.0005 Epoch: 14 Global Step: 310030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:32,008-Speed 6308.73 samples/sec Loss 5.5660 LearningRate 0.0005 Epoch: 14 Global Step: 310040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:35,244-Speed 6330.01 samples/sec Loss 5.7189 LearningRate 0.0005 Epoch: 14 Global Step: 310050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:38,487-Speed 6315.20 samples/sec Loss 5.6934 LearningRate 0.0005 Epoch: 14 Global Step: 310060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:41,732-Speed 6313.55 samples/sec Loss 5.6831 LearningRate 0.0005 Epoch: 14 
Global Step: 310070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:44,989-Speed 6288.98 samples/sec Loss 5.6292 LearningRate 0.0005 Epoch: 14 Global Step: 310080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:48,237-Speed 6306.91 samples/sec Loss 5.7074 LearningRate 0.0005 Epoch: 14 Global Step: 310090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:51,493-Speed 6292.90 samples/sec Loss 5.6586 LearningRate 0.0005 Epoch: 14 Global Step: 310100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:54,743-Speed 6302.77 samples/sec Loss 5.6319 LearningRate 0.0005 Epoch: 14 Global Step: 310110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:55:57,990-Speed 6308.33 samples/sec Loss 5.7375 LearningRate 0.0005 Epoch: 14 Global Step: 310120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:01,235-Speed 6312.52 samples/sec Loss 5.5751 LearningRate 0.0005 Epoch: 14 Global Step: 310130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:04,532-Speed 6212.73 samples/sec Loss 5.6940 LearningRate 0.0005 Epoch: 14 Global Step: 310140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:07,764-Speed 6337.80 samples/sec Loss 5.6044 LearningRate 0.0005 Epoch: 14 Global Step: 310150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:11,011-Speed 6309.43 samples/sec Loss 5.6612 LearningRate 0.0005 Epoch: 14 Global Step: 310160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:14,266-Speed 6294.44 samples/sec Loss 5.6955 LearningRate 0.0005 Epoch: 14 Global Step: 310170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:17,513-Speed 6307.28 samples/sec Loss 5.6299 LearningRate 0.0005 Epoch: 14 Global Step: 310180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:20,759-Speed 6312.44 samples/sec Loss 5.7160 LearningRate 0.0005 Epoch: 14 Global Step: 310190 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:24,006-Speed 6308.10 samples/sec Loss 5.6060 LearningRate 0.0005 Epoch: 14 Global Step: 310200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:27,256-Speed 6303.80 samples/sec Loss 5.6677 LearningRate 0.0005 Epoch: 14 Global Step: 310210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:30,505-Speed 6302.87 samples/sec Loss 5.7743 LearningRate 0.0005 Epoch: 14 Global Step: 310220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:33,755-Speed 6304.55 samples/sec Loss 5.6997 LearningRate 0.0005 Epoch: 14 Global Step: 310230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:37,002-Speed 6308.08 samples/sec Loss 5.6742 LearningRate 0.0005 Epoch: 14 Global Step: 310240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:40,248-Speed 6311.23 samples/sec Loss 5.6780 LearningRate 0.0005 Epoch: 14 Global Step: 310250 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 19:56:43,483-Speed 6330.93 samples/sec Loss 5.6903 LearningRate 0.0005 Epoch: 14 Global Step: 310260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:46,741-Speed 6287.48 samples/sec Loss 5.6769 LearningRate 0.0005 Epoch: 14 Global Step: 310270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:49,990-Speed 6308.02 samples/sec Loss 5.6116 LearningRate 0.0005 Epoch: 14 Global Step: 310280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:53,237-Speed 6309.14 samples/sec Loss 5.6799 LearningRate 0.0005 Epoch: 14 Global Step: 310290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:56,481-Speed 6317.11 samples/sec Loss 5.6347 LearningRate 0.0005 Epoch: 14 Global Step: 310300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:56:59,733-Speed 6298.94 samples/sec Loss 5.5793 LearningRate 0.0005 Epoch: 14 Global Step: 310310 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 19:57:02,975-Speed 6317.23 samples/sec Loss 5.6717 LearningRate 0.0005 Epoch: 14 Global Step: 310320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:06,218-Speed 6317.77 samples/sec Loss 5.5264 LearningRate 0.0005 Epoch: 14 Global Step: 310330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:09,463-Speed 6311.34 samples/sec Loss 5.7627 LearningRate 0.0005 Epoch: 14 Global Step: 310340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:12,716-Speed 6297.14 samples/sec Loss 5.6354 LearningRate 0.0005 Epoch: 14 Global Step: 310350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:15,947-Speed 6340.55 samples/sec Loss 5.7076 LearningRate 0.0005 Epoch: 14 Global Step: 310360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:19,191-Speed 6313.96 samples/sec Loss 5.6598 LearningRate 0.0005 Epoch: 14 Global Step: 310370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:22,437-Speed 6310.77 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 14 Global Step: 310380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:25,684-Speed 6308.34 samples/sec Loss 5.7070 LearningRate 0.0005 Epoch: 14 Global Step: 310390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:28,928-Speed 6315.06 samples/sec Loss 5.6572 LearningRate 0.0005 Epoch: 14 Global Step: 310400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:32,176-Speed 6307.25 samples/sec Loss 5.6860 LearningRate 0.0005 Epoch: 14 Global Step: 310410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:35,423-Speed 6309.43 samples/sec Loss 5.6450 LearningRate 0.0005 Epoch: 14 Global Step: 310420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:38,666-Speed 6315.47 samples/sec Loss 5.6413 LearningRate 0.0005 Epoch: 14 Global Step: 310430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
19:57:41,912-Speed 6311.76 samples/sec Loss 5.6964 LearningRate 0.0005 Epoch: 14 Global Step: 310440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:45,160-Speed 6306.42 samples/sec Loss 5.6656 LearningRate 0.0005 Epoch: 14 Global Step: 310450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:48,395-Speed 6331.07 samples/sec Loss 5.6470 LearningRate 0.0005 Epoch: 14 Global Step: 310460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:51,648-Speed 6298.31 samples/sec Loss 5.6589 LearningRate 0.0005 Epoch: 14 Global Step: 310470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:54,899-Speed 6301.27 samples/sec Loss 5.6414 LearningRate 0.0005 Epoch: 14 Global Step: 310480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:57:58,145-Speed 6309.68 samples/sec Loss 5.7412 LearningRate 0.0005 Epoch: 14 Global Step: 310490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:01,390-Speed 6312.46 samples/sec Loss 5.7043 LearningRate 0.0005 Epoch: 14 Global Step: 310500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:04,640-Speed 6304.30 samples/sec Loss 5.5856 LearningRate 0.0005 Epoch: 14 Global Step: 310510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:07,890-Speed 6302.16 samples/sec Loss 5.6250 LearningRate 0.0005 Epoch: 14 Global Step: 310520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:11,138-Speed 6308.79 samples/sec Loss 5.6301 LearningRate 0.0005 Epoch: 14 Global Step: 310530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:14,403-Speed 6273.57 samples/sec Loss 5.6549 LearningRate 0.0005 Epoch: 14 Global Step: 310540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:17,648-Speed 6311.55 samples/sec Loss 5.6944 LearningRate 0.0005 Epoch: 14 Global Step: 310550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:20,887-Speed 6324.68 
samples/sec Loss 5.6304 LearningRate 0.0005 Epoch: 14 Global Step: 310560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:24,136-Speed 6306.02 samples/sec Loss 5.6742 LearningRate 0.0005 Epoch: 14 Global Step: 310570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:27,381-Speed 6312.83 samples/sec Loss 5.6920 LearningRate 0.0005 Epoch: 14 Global Step: 310580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:30,635-Speed 6294.41 samples/sec Loss 5.6353 LearningRate 0.0005 Epoch: 14 Global Step: 310590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:33,883-Speed 6307.29 samples/sec Loss 5.7574 LearningRate 0.0005 Epoch: 14 Global Step: 310600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:37,129-Speed 6310.87 samples/sec Loss 5.6158 LearningRate 0.0005 Epoch: 14 Global Step: 310610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:40,373-Speed 6313.30 samples/sec Loss 5.6510 LearningRate 0.0005 Epoch: 14 Global Step: 310620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:43,625-Speed 6299.26 samples/sec Loss 5.6663 LearningRate 0.0005 Epoch: 14 Global Step: 310630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:46,873-Speed 6308.17 samples/sec Loss 5.6885 LearningRate 0.0005 Epoch: 14 Global Step: 310640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:50,118-Speed 6311.81 samples/sec Loss 5.6708 LearningRate 0.0005 Epoch: 14 Global Step: 310650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:53,346-Speed 6345.94 samples/sec Loss 5.7164 LearningRate 0.0005 Epoch: 14 Global Step: 310660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:56,595-Speed 6305.36 samples/sec Loss 5.7172 LearningRate 0.0005 Epoch: 14 Global Step: 310670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:58:59,839-Speed 6314.18 samples/sec Loss 5.6355 
LearningRate 0.0005 Epoch: 14 Global Step: 310680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:03,108-Speed 6266.27 samples/sec Loss 5.6162 LearningRate 0.0005 Epoch: 14 Global Step: 310690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:06,422-Speed 6181.82 samples/sec Loss 5.6458 LearningRate 0.0005 Epoch: 14 Global Step: 310700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:09,668-Speed 6310.44 samples/sec Loss 5.6523 LearningRate 0.0005 Epoch: 14 Global Step: 310710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:12,915-Speed 6309.90 samples/sec Loss 5.6818 LearningRate 0.0005 Epoch: 14 Global Step: 310720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:16,161-Speed 6310.26 samples/sec Loss 5.6791 LearningRate 0.0005 Epoch: 14 Global Step: 310730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:19,408-Speed 6309.71 samples/sec Loss 5.6095 LearningRate 0.0005 Epoch: 14 Global Step: 310740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:22,654-Speed 6311.36 samples/sec Loss 5.6600 LearningRate 0.0005 Epoch: 14 Global Step: 310750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:25,887-Speed 6336.00 samples/sec Loss 5.7169 LearningRate 0.0005 Epoch: 14 Global Step: 310760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:29,137-Speed 6302.26 samples/sec Loss 5.5915 LearningRate 0.0005 Epoch: 14 Global Step: 310770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:32,379-Speed 6318.93 samples/sec Loss 5.6323 LearningRate 0.0005 Epoch: 14 Global Step: 310780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:35,625-Speed 6309.69 samples/sec Loss 5.7711 LearningRate 0.0005 Epoch: 14 Global Step: 310790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:38,869-Speed 6314.36 samples/sec Loss 5.6147 LearningRate 0.0005 Epoch: 14 
Global Step: 310800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:42,113-Speed 6314.42 samples/sec Loss 5.7544 LearningRate 0.0005 Epoch: 14 Global Step: 310810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:45,358-Speed 6312.66 samples/sec Loss 5.6493 LearningRate 0.0005 Epoch: 14 Global Step: 310820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:48,607-Speed 6305.32 samples/sec Loss 5.6724 LearningRate 0.0005 Epoch: 14 Global Step: 310830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:51,856-Speed 6304.36 samples/sec Loss 5.6876 LearningRate 0.0005 Epoch: 14 Global Step: 310840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:55,101-Speed 6312.97 samples/sec Loss 5.6713 LearningRate 0.0005 Epoch: 14 Global Step: 310850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 19:59:58,332-Speed 6341.08 samples/sec Loss 5.6097 LearningRate 0.0005 Epoch: 14 Global Step: 310860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:01,579-Speed 6308.42 samples/sec Loss 5.6627 LearningRate 0.0005 Epoch: 14 Global Step: 310870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:04,825-Speed 6309.85 samples/sec Loss 5.6613 LearningRate 0.0005 Epoch: 14 Global Step: 310880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:08,070-Speed 6313.37 samples/sec Loss 5.6141 LearningRate 0.0005 Epoch: 14 Global Step: 310890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:11,313-Speed 6316.38 samples/sec Loss 5.6776 LearningRate 0.0005 Epoch: 14 Global Step: 310900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:14,559-Speed 6311.01 samples/sec Loss 5.7447 LearningRate 0.0005 Epoch: 14 Global Step: 310910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:17,806-Speed 6309.40 samples/sec Loss 5.6280 LearningRate 0.0005 Epoch: 14 Global Step: 310920 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:21,052-Speed 6311.17 samples/sec Loss 5.7373 LearningRate 0.0005 Epoch: 14 Global Step: 310930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:24,295-Speed 6315.73 samples/sec Loss 5.7695 LearningRate 0.0005 Epoch: 14 Global Step: 310940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:27,545-Speed 6303.20 samples/sec Loss 5.7034 LearningRate 0.0005 Epoch: 14 Global Step: 310950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:30,793-Speed 6307.73 samples/sec Loss 5.6055 LearningRate 0.0005 Epoch: 14 Global Step: 310960 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:00:34,022-Speed 6342.99 samples/sec Loss 5.6711 LearningRate 0.0005 Epoch: 14 Global Step: 310970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:37,267-Speed 6314.03 samples/sec Loss 5.6699 LearningRate 0.0005 Epoch: 14 Global Step: 310980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:40,516-Speed 6305.24 samples/sec Loss 5.5683 LearningRate 0.0005 Epoch: 14 Global Step: 310990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:43,764-Speed 6306.36 samples/sec Loss 5.6530 LearningRate 0.0005 Epoch: 14 Global Step: 311000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:47,011-Speed 6308.71 samples/sec Loss 5.6392 LearningRate 0.0005 Epoch: 14 Global Step: 311010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:50,262-Speed 6301.15 samples/sec Loss 5.6867 LearningRate 0.0005 Epoch: 14 Global Step: 311020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:53,509-Speed 6308.61 samples/sec Loss 5.6940 LearningRate 0.0005 Epoch: 14 Global Step: 311030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:00:56,753-Speed 6313.20 samples/sec Loss 5.6431 LearningRate 0.0005 Epoch: 14 Global Step: 311040 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:00:59,999-Speed 6310.91 samples/sec Loss 5.6677 LearningRate 0.0005 Epoch: 14 Global Step: 311050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:01:03,245-Speed 6312.15 samples/sec Loss 5.6894 LearningRate 0.0005 Epoch: 14 Global Step: 311060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:01:06,475-Speed 6340.51 samples/sec Loss 5.6495 LearningRate 0.0005 Epoch: 14 Global Step: 311070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:01:09,718-Speed 6316.50 samples/sec Loss 5.7247 LearningRate 0.0005 Epoch: 14 Global Step: 311080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:01:12,965-Speed 6310.03 samples/sec Loss 5.6554 LearningRate 0.0005 Epoch: 14 Global Step: 311090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:11,945-Speed 347.24 samples/sec Loss 5.6347 LearningRate 0.0005 Epoch: 15 Global Step: 311100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:15,183-Speed 6327.33 samples/sec Loss 5.6355 LearningRate 0.0005 Epoch: 15 Global Step: 311110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:18,418-Speed 6331.94 samples/sec Loss 5.6628 LearningRate 0.0005 Epoch: 15 Global Step: 311120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:21,682-Speed 6276.09 samples/sec Loss 5.6726 LearningRate 0.0005 Epoch: 15 Global Step: 311130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:24,919-Speed 6329.94 samples/sec Loss 5.6217 LearningRate 0.0005 Epoch: 15 Global Step: 311140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:28,152-Speed 6335.82 samples/sec Loss 5.7094 LearningRate 0.0005 Epoch: 15 Global Step: 311150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:31,392-Speed 6321.45 samples/sec Loss 5.6721 LearningRate 0.0005 Epoch: 15 Global Step: 311160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:02:34,612-Speed 6362.85 samples/sec Loss 5.6573 LearningRate 0.0005 Epoch: 15 Global Step: 311170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:37,855-Speed 6315.65 samples/sec Loss 5.6673 LearningRate 0.0005 Epoch: 15 Global Step: 311180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:41,090-Speed 6332.94 samples/sec Loss 5.6757 LearningRate 0.0005 Epoch: 15 Global Step: 311190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:44,331-Speed 6320.03 samples/sec Loss 5.6761 LearningRate 0.0005 Epoch: 15 Global Step: 311200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:47,568-Speed 6328.73 samples/sec Loss 5.6568 LearningRate 0.0005 Epoch: 15 Global Step: 311210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:50,804-Speed 6329.53 samples/sec Loss 5.6858 LearningRate 0.0005 Epoch: 15 Global Step: 311220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:54,050-Speed 6310.82 samples/sec Loss 5.7147 LearningRate 0.0005 Epoch: 15 Global Step: 311230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:02:57,291-Speed 6320.13 samples/sec Loss 5.6543 LearningRate 0.0005 Epoch: 15 Global Step: 311240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:00,531-Speed 6323.60 samples/sec Loss 5.6543 LearningRate 0.0005 Epoch: 15 Global Step: 311250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:03,774-Speed 6315.18 samples/sec Loss 5.6053 LearningRate 0.0005 Epoch: 15 Global Step: 311260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:07,002-Speed 6347.55 samples/sec Loss 5.6237 LearningRate 0.0005 Epoch: 15 Global Step: 311270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:10,241-Speed 6324.36 samples/sec Loss 5.6608 LearningRate 0.0005 Epoch: 15 Global Step: 311280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:13,494-Speed 6296.13 
samples/sec Loss 5.6427 LearningRate 0.0005 Epoch: 15 Global Step: 311290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:16,737-Speed 6316.67 samples/sec Loss 5.6590 LearningRate 0.0005 Epoch: 15 Global Step: 311300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:19,982-Speed 6311.57 samples/sec Loss 5.5929 LearningRate 0.0005 Epoch: 15 Global Step: 311310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:23,230-Speed 6307.55 samples/sec Loss 5.6734 LearningRate 0.0005 Epoch: 15 Global Step: 311320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:26,527-Speed 6213.09 samples/sec Loss 5.6603 LearningRate 0.0005 Epoch: 15 Global Step: 311330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:29,806-Speed 6248.82 samples/sec Loss 5.5573 LearningRate 0.0005 Epoch: 15 Global Step: 311340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:33,052-Speed 6310.10 samples/sec Loss 5.6397 LearningRate 0.0005 Epoch: 15 Global Step: 311350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:36,310-Speed 6287.29 samples/sec Loss 5.5911 LearningRate 0.0005 Epoch: 15 Global Step: 311360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:39,542-Speed 6339.18 samples/sec Loss 5.6206 LearningRate 0.0005 Epoch: 15 Global Step: 311370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:42,786-Speed 6313.83 samples/sec Loss 5.6265 LearningRate 0.0005 Epoch: 15 Global Step: 311380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:46,028-Speed 6317.72 samples/sec Loss 5.6325 LearningRate 0.0005 Epoch: 15 Global Step: 311390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:49,275-Speed 6308.54 samples/sec Loss 5.6384 LearningRate 0.0005 Epoch: 15 Global Step: 311400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:52,516-Speed 6321.37 samples/sec Loss 5.5987 
LearningRate 0.0005 Epoch: 15 Global Step: 311410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:55,758-Speed 6317.64 samples/sec Loss 5.6170 LearningRate 0.0005 Epoch: 15 Global Step: 311420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:03:58,989-Speed 6340.75 samples/sec Loss 5.6883 LearningRate 0.0005 Epoch: 15 Global Step: 311430 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:02,244-Speed 6293.66 samples/sec Loss 5.6396 LearningRate 0.0005 Epoch: 15 Global Step: 311440 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:05,486-Speed 6317.86 samples/sec Loss 5.6465 LearningRate 0.0005 Epoch: 15 Global Step: 311450 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:08,729-Speed 6316.50 samples/sec Loss 5.6447 LearningRate 0.0005 Epoch: 15 Global Step: 311460 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:11,978-Speed 6304.78 samples/sec Loss 5.6849 LearningRate 0.0005 Epoch: 15 Global Step: 311470 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:15,224-Speed 6311.54 samples/sec Loss 5.6300 LearningRate 0.0005 Epoch: 15 Global Step: 311480 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:18,471-Speed 6306.87 samples/sec Loss 5.6206 LearningRate 0.0005 Epoch: 15 Global Step: 311490 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:21,713-Speed 6319.40 samples/sec Loss 5.6089 LearningRate 0.0005 Epoch: 15 Global Step: 311500 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:24,956-Speed 6315.79 samples/sec Loss 5.6555 LearningRate 0.0005 Epoch: 15 Global Step: 311510 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:28,198-Speed 6319.54 samples/sec Loss 5.6644 LearningRate 0.0005 Epoch: 15 Global Step: 311520 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:04:31,438-Speed 6322.74 samples/sec Loss 5.6704 LearningRate 0.0005 Epoch: 15 
Global Step: 311530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:34,695-Speed 6289.37 samples/sec Loss 5.6521 LearningRate 0.0005 Epoch: 15 Global Step: 311540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:37,936-Speed 6320.31 samples/sec Loss 5.6108 LearningRate 0.0005 Epoch: 15 Global Step: 311550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:41,179-Speed 6317.39 samples/sec Loss 5.6875 LearningRate 0.0005 Epoch: 15 Global Step: 311560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:44,421-Speed 6318.87 samples/sec Loss 5.6939 LearningRate 0.0005 Epoch: 15 Global Step: 311570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:47,663-Speed 6317.70 samples/sec Loss 5.6777 LearningRate 0.0005 Epoch: 15 Global Step: 311580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:50,904-Speed 6321.23 samples/sec Loss 5.6613 LearningRate 0.0005 Epoch: 15 Global Step: 311590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:54,149-Speed 6312.63 samples/sec Loss 5.6781 LearningRate 0.0005 Epoch: 15 Global Step: 311600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:04:57,389-Speed 6321.32 samples/sec Loss 5.6557 LearningRate 0.0005 Epoch: 15 Global Step: 311610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:00,642-Speed 6298.36 samples/sec Loss 5.6476 LearningRate 0.0005 Epoch: 15 Global Step: 311620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:03,873-Speed 6339.34 samples/sec Loss 5.6724 LearningRate 0.0005 Epoch: 15 Global Step: 311630 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:07,114-Speed 6323.33 samples/sec Loss 5.7167 LearningRate 0.0005 Epoch: 15 Global Step: 311640 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:10,357-Speed 6316.47 samples/sec Loss 5.6829 LearningRate 0.0005 Epoch: 15 Global Step: 311650 Fp16 Grad 
Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:13,598-Speed 6320.84 samples/sec Loss 5.6863 LearningRate 0.0005 Epoch: 15 Global Step: 311660 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:16,849-Speed 6301.60 samples/sec Loss 5.6612 LearningRate 0.0005 Epoch: 15 Global Step: 311670 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:20,099-Speed 6305.12 samples/sec Loss 5.6456 LearningRate 0.0005 Epoch: 15 Global Step: 311680 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:23,396-Speed 6213.10 samples/sec Loss 5.5862 LearningRate 0.0005 Epoch: 15 Global Step: 311690 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:26,641-Speed 6313.22 samples/sec Loss 5.6362 LearningRate 0.0005 Epoch: 15 Global Step: 311700 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:29,884-Speed 6315.30 samples/sec Loss 5.6876 LearningRate 0.0005 Epoch: 15 Global Step: 311710 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:33,130-Speed 6311.96 samples/sec Loss 5.6252 LearningRate 0.0005 Epoch: 15 Global Step: 311720 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:05:36,370-Speed 6321.09 samples/sec Loss 5.6027 LearningRate 0.0005 Epoch: 15 Global Step: 311730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:39,617-Speed 6308.93 samples/sec Loss 5.6763 LearningRate 0.0005 Epoch: 15 Global Step: 311740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:42,865-Speed 6306.81 samples/sec Loss 5.6729 LearningRate 0.0005 Epoch: 15 Global Step: 311750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:46,112-Speed 6309.45 samples/sec Loss 5.6052 LearningRate 0.0005 Epoch: 15 Global Step: 311760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:49,356-Speed 6314.88 samples/sec Loss 5.7359 LearningRate 0.0005 Epoch: 15 Global Step: 311770 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:05:52,601-Speed 6312.21 samples/sec Loss 5.6519 LearningRate 0.0005 Epoch: 15 Global Step: 311780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:55,843-Speed 6319.47 samples/sec Loss 5.6805 LearningRate 0.0005 Epoch: 15 Global Step: 311790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:05:59,087-Speed 6314.62 samples/sec Loss 5.6589 LearningRate 0.0005 Epoch: 15 Global Step: 311800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:02,332-Speed 6312.66 samples/sec Loss 5.6248 LearningRate 0.0005 Epoch: 15 Global Step: 311810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:05,581-Speed 6304.70 samples/sec Loss 5.6815 LearningRate 0.0005 Epoch: 15 Global Step: 311820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:08,809-Speed 6346.16 samples/sec Loss 5.6295 LearningRate 0.0005 Epoch: 15 Global Step: 311830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:12,050-Speed 6319.99 samples/sec Loss 5.5513 LearningRate 0.0005 Epoch: 15 Global Step: 311840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:15,292-Speed 6319.39 samples/sec Loss 5.6014 LearningRate 0.0005 Epoch: 15 Global Step: 311850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:18,542-Speed 6301.93 samples/sec Loss 5.6341 LearningRate 0.0005 Epoch: 15 Global Step: 311860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:21,787-Speed 6313.66 samples/sec Loss 5.6737 LearningRate 0.0005 Epoch: 15 Global Step: 311870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:25,036-Speed 6304.86 samples/sec Loss 5.6449 LearningRate 0.0005 Epoch: 15 Global Step: 311880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:28,280-Speed 6314.26 samples/sec Loss 5.6627 LearningRate 0.0005 Epoch: 15 Global Step: 311890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:06:31,528-Speed 6306.60 samples/sec Loss 5.6928 LearningRate 0.0005 Epoch: 15 Global Step: 311900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:34,771-Speed 6315.41 samples/sec Loss 5.5985 LearningRate 0.0005 Epoch: 15 Global Step: 311910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:38,015-Speed 6314.61 samples/sec Loss 5.6457 LearningRate 0.0005 Epoch: 15 Global Step: 311920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:41,242-Speed 6348.03 samples/sec Loss 5.5852 LearningRate 0.0005 Epoch: 15 Global Step: 311930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:44,486-Speed 6315.71 samples/sec Loss 5.6079 LearningRate 0.0005 Epoch: 15 Global Step: 311940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:47,774-Speed 6230.03 samples/sec Loss 5.7122 LearningRate 0.0005 Epoch: 15 Global Step: 311950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:51,017-Speed 6316.28 samples/sec Loss 5.7172 LearningRate 0.0005 Epoch: 15 Global Step: 311960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:54,261-Speed 6314.38 samples/sec Loss 5.6666 LearningRate 0.0005 Epoch: 15 Global Step: 311970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:06:57,505-Speed 6315.49 samples/sec Loss 5.6646 LearningRate 0.0005 Epoch: 15 Global Step: 311980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:00,750-Speed 6313.67 samples/sec Loss 5.5910 LearningRate 0.0005 Epoch: 15 Global Step: 311990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:03,993-Speed 6314.75 samples/sec Loss 5.6167 LearningRate 0.0005 Epoch: 15 Global Step: 312000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:07,241-Speed 6307.89 samples/sec Loss 5.6703 LearningRate 0.0005 Epoch: 15 Global Step: 312010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:10,487-Speed 6311.23 
samples/sec Loss 5.6737 LearningRate 0.0005 Epoch: 15 Global Step: 312020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:13,729-Speed 6318.51 samples/sec Loss 5.6775 LearningRate 0.0005 Epoch: 15 Global Step: 312030 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:07:16,958-Speed 6342.69 samples/sec Loss 5.6925 LearningRate 0.0005 Epoch: 15 Global Step: 312040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:20,204-Speed 6312.36 samples/sec Loss 5.6142 LearningRate 0.0005 Epoch: 15 Global Step: 312050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:23,447-Speed 6315.21 samples/sec Loss 5.6595 LearningRate 0.0005 Epoch: 15 Global Step: 312060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:26,694-Speed 6308.18 samples/sec Loss 5.6050 LearningRate 0.0005 Epoch: 15 Global Step: 312070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:07:29,922-Speed 6346.95 samples/sec Loss 5.6675 LearningRate 0.0005 Epoch: 15 Global Step: 312080 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:33,166-Speed 6314.10 samples/sec Loss 5.6684 LearningRate 0.0005 Epoch: 15 Global Step: 312090 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:36,409-Speed 6317.72 samples/sec Loss 5.6881 LearningRate 0.0005 Epoch: 15 Global Step: 312100 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:39,659-Speed 6302.82 samples/sec Loss 5.6499 LearningRate 0.0005 Epoch: 15 Global Step: 312110 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:42,900-Speed 6320.53 samples/sec Loss 5.6493 LearningRate 0.0005 Epoch: 15 Global Step: 312120 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:46,140-Speed 6322.60 samples/sec Loss 5.6409 LearningRate 0.0005 Epoch: 15 Global Step: 312130 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:49,384-Speed 6313.12 samples/sec Loss 5.6696 
LearningRate 0.0005 Epoch: 15 Global Step: 312140 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:52,627-Speed 6316.39 samples/sec Loss 5.6232 LearningRate 0.0005 Epoch: 15 Global Step: 312150 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:55,871-Speed 6315.89 samples/sec Loss 5.6181 LearningRate 0.0005 Epoch: 15 Global Step: 312160 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:07:59,114-Speed 6316.84 samples/sec Loss 5.6910 LearningRate 0.0005 Epoch: 15 Global Step: 312170 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:08:02,362-Speed 6307.22 samples/sec Loss 5.6292 LearningRate 0.0005 Epoch: 15 Global Step: 312180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:05,604-Speed 6317.13 samples/sec Loss 5.7159 LearningRate 0.0005 Epoch: 15 Global Step: 312190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:08,850-Speed 6312.54 samples/sec Loss 5.6738 LearningRate 0.0005 Epoch: 15 Global Step: 312200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:12,098-Speed 6304.79 samples/sec Loss 5.6861 LearningRate 0.0005 Epoch: 15 Global Step: 312210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:15,338-Speed 6323.38 samples/sec Loss 5.6431 LearningRate 0.0005 Epoch: 15 Global Step: 312220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:18,584-Speed 6310.24 samples/sec Loss 5.6709 LearningRate 0.0005 Epoch: 15 Global Step: 312230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:21,831-Speed 6310.16 samples/sec Loss 5.5951 LearningRate 0.0005 Epoch: 15 Global Step: 312240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:25,075-Speed 6313.93 samples/sec Loss 5.6362 LearningRate 0.0005 Epoch: 15 Global Step: 312250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:28,321-Speed 6310.29 samples/sec Loss 5.7112 LearningRate 0.0005 Epoch: 15 
Global Step: 312260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:31,566-Speed 6311.89 samples/sec Loss 5.6929 LearningRate 0.0005 Epoch: 15 Global Step: 312270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:34,799-Speed 6337.19 samples/sec Loss 5.6871 LearningRate 0.0005 Epoch: 15 Global Step: 312280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:38,044-Speed 6311.80 samples/sec Loss 5.6233 LearningRate 0.0005 Epoch: 15 Global Step: 312290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:41,292-Speed 6307.66 samples/sec Loss 5.6768 LearningRate 0.0005 Epoch: 15 Global Step: 312300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:44,535-Speed 6315.76 samples/sec Loss 5.6328 LearningRate 0.0005 Epoch: 15 Global Step: 312310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:47,783-Speed 6307.79 samples/sec Loss 5.6479 LearningRate 0.0005 Epoch: 15 Global Step: 312320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:51,027-Speed 6313.62 samples/sec Loss 5.6578 LearningRate 0.0005 Epoch: 15 Global Step: 312330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:54,277-Speed 6303.96 samples/sec Loss 5.6590 LearningRate 0.0005 Epoch: 15 Global Step: 312340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:08:57,523-Speed 6310.45 samples/sec Loss 5.6576 LearningRate 0.0005 Epoch: 15 Global Step: 312350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:00,769-Speed 6311.07 samples/sec Loss 5.6742 LearningRate 0.0005 Epoch: 15 Global Step: 312360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:04,026-Speed 6290.77 samples/sec Loss 5.7052 LearningRate 0.0005 Epoch: 15 Global Step: 312370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:07,259-Speed 6335.62 samples/sec Loss 5.5848 LearningRate 0.0005 Epoch: 15 Global Step: 312380 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:10,501-Speed 6319.21 samples/sec Loss 5.6132 LearningRate 0.0005 Epoch: 15 Global Step: 312390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:13,748-Speed 6308.94 samples/sec Loss 5.6996 LearningRate 0.0005 Epoch: 15 Global Step: 312400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:16,993-Speed 6311.79 samples/sec Loss 5.6083 LearningRate 0.0005 Epoch: 15 Global Step: 312410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:20,239-Speed 6309.90 samples/sec Loss 5.7060 LearningRate 0.0005 Epoch: 15 Global Step: 312420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:23,486-Speed 6310.12 samples/sec Loss 5.6547 LearningRate 0.0005 Epoch: 15 Global Step: 312430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:26,732-Speed 6310.84 samples/sec Loss 5.6795 LearningRate 0.0005 Epoch: 15 Global Step: 312440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:29,975-Speed 6315.88 samples/sec Loss 5.6957 LearningRate 0.0005 Epoch: 15 Global Step: 312450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:33,221-Speed 6311.16 samples/sec Loss 5.6333 LearningRate 0.0005 Epoch: 15 Global Step: 312460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:36,466-Speed 6311.37 samples/sec Loss 5.5940 LearningRate 0.0005 Epoch: 15 Global Step: 312470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:39,724-Speed 6288.67 samples/sec Loss 5.6285 LearningRate 0.0005 Epoch: 15 Global Step: 312480 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:09:42,954-Speed 6342.55 samples/sec Loss 5.7111 LearningRate 0.0005 Epoch: 15 Global Step: 312490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:46,198-Speed 6312.91 samples/sec Loss 5.6616 LearningRate 0.0005 Epoch: 15 Global Step: 312500 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:09:49,448-Speed 6303.14 samples/sec Loss 5.5798 LearningRate 0.0005 Epoch: 15 Global Step: 312510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:52,691-Speed 6317.21 samples/sec Loss 5.6559 LearningRate 0.0005 Epoch: 15 Global Step: 312520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:55,934-Speed 6316.59 samples/sec Loss 5.6211 LearningRate 0.0005 Epoch: 15 Global Step: 312530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:09:59,176-Speed 6317.63 samples/sec Loss 5.6790 LearningRate 0.0005 Epoch: 15 Global Step: 312540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:02,425-Speed 6304.91 samples/sec Loss 5.6341 LearningRate 0.0005 Epoch: 15 Global Step: 312550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:05,671-Speed 6310.72 samples/sec Loss 5.6124 LearningRate 0.0005 Epoch: 15 Global Step: 312560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:08,917-Speed 6312.24 samples/sec Loss 5.5832 LearningRate 0.0005 Epoch: 15 Global Step: 312570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:12,171-Speed 6294.54 samples/sec Loss 5.7007 LearningRate 0.0005 Epoch: 15 Global Step: 312580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:15,408-Speed 6329.65 samples/sec Loss 5.7077 LearningRate 0.0005 Epoch: 15 Global Step: 312590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:18,651-Speed 6315.16 samples/sec Loss 5.6249 LearningRate 0.0005 Epoch: 15 Global Step: 312600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:21,907-Speed 6292.57 samples/sec Loss 5.6655 LearningRate 0.0005 Epoch: 15 Global Step: 312610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:25,271-Speed 6089.15 samples/sec Loss 5.6319 LearningRate 0.0005 Epoch: 15 Global Step: 312620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:10:28,568-Speed 6212.18 samples/sec Loss 5.6030 LearningRate 0.0005 Epoch: 15 Global Step: 312630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:31,813-Speed 6313.11 samples/sec Loss 5.6766 LearningRate 0.0005 Epoch: 15 Global Step: 312640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:35,056-Speed 6316.39 samples/sec Loss 5.6619 LearningRate 0.0005 Epoch: 15 Global Step: 312650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:38,302-Speed 6310.11 samples/sec Loss 5.6390 LearningRate 0.0005 Epoch: 15 Global Step: 312660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:41,543-Speed 6320.56 samples/sec Loss 5.6015 LearningRate 0.0005 Epoch: 15 Global Step: 312670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:44,785-Speed 6320.34 samples/sec Loss 5.6169 LearningRate 0.0005 Epoch: 15 Global Step: 312680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:48,012-Speed 6346.64 samples/sec Loss 5.6362 LearningRate 0.0005 Epoch: 15 Global Step: 312690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:51,255-Speed 6315.96 samples/sec Loss 5.6601 LearningRate 0.0005 Epoch: 15 Global Step: 312700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:54,499-Speed 6314.48 samples/sec Loss 5.6142 LearningRate 0.0005 Epoch: 15 Global Step: 312710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:10:57,745-Speed 6310.97 samples/sec Loss 5.6989 LearningRate 0.0005 Epoch: 15 Global Step: 312720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:00,989-Speed 6314.19 samples/sec Loss 5.5775 LearningRate 0.0005 Epoch: 15 Global Step: 312730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:04,233-Speed 6314.91 samples/sec Loss 5.6550 LearningRate 0.0005 Epoch: 15 Global Step: 312740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:07,478-Speed 6313.69 
samples/sec Loss 5.6549 LearningRate 0.0005 Epoch: 15 Global Step: 312750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:10,748-Speed 6263.77 samples/sec Loss 5.6761 LearningRate 0.0005 Epoch: 15 Global Step: 312760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:14,013-Speed 6275.29 samples/sec Loss 5.6494 LearningRate 0.0005 Epoch: 15 Global Step: 312770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:17,261-Speed 6306.48 samples/sec Loss 5.6728 LearningRate 0.0005 Epoch: 15 Global Step: 312780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:20,494-Speed 6335.55 samples/sec Loss 5.6413 LearningRate 0.0005 Epoch: 15 Global Step: 312790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:23,741-Speed 6309.06 samples/sec Loss 5.6459 LearningRate 0.0005 Epoch: 15 Global Step: 312800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:26,985-Speed 6314.98 samples/sec Loss 5.6454 LearningRate 0.0005 Epoch: 15 Global Step: 312810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:30,229-Speed 6314.28 samples/sec Loss 5.7098 LearningRate 0.0005 Epoch: 15 Global Step: 312820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:33,475-Speed 6310.89 samples/sec Loss 5.6151 LearningRate 0.0005 Epoch: 15 Global Step: 312830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:36,720-Speed 6312.77 samples/sec Loss 5.6323 LearningRate 0.0005 Epoch: 15 Global Step: 312840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:39,963-Speed 6316.15 samples/sec Loss 5.5908 LearningRate 0.0005 Epoch: 15 Global Step: 312850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:43,208-Speed 6313.18 samples/sec Loss 5.6680 LearningRate 0.0005 Epoch: 15 Global Step: 312860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:46,455-Speed 6309.85 samples/sec Loss 5.6455 
LearningRate 0.0005 Epoch: 15 Global Step: 312870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:49,702-Speed 6306.87 samples/sec Loss 5.6161 LearningRate 0.0005 Epoch: 15 Global Step: 312880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:52,946-Speed 6315.25 samples/sec Loss 5.6345 LearningRate 0.0005 Epoch: 15 Global Step: 312890 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:11:56,178-Speed 6337.90 samples/sec Loss 5.6879 LearningRate 0.0005 Epoch: 15 Global Step: 312900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:11:59,422-Speed 6315.30 samples/sec Loss 5.6411 LearningRate 0.0005 Epoch: 15 Global Step: 312910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:02,666-Speed 6314.34 samples/sec Loss 5.6443 LearningRate 0.0005 Epoch: 15 Global Step: 312920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:05,906-Speed 6321.56 samples/sec Loss 5.6306 LearningRate 0.0005 Epoch: 15 Global Step: 312930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:09,154-Speed 6306.87 samples/sec Loss 5.6283 LearningRate 0.0005 Epoch: 15 Global Step: 312940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:12,437-Speed 6239.34 samples/sec Loss 5.6358 LearningRate 0.0005 Epoch: 15 Global Step: 312950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:15,698-Speed 6283.87 samples/sec Loss 5.6916 LearningRate 0.0005 Epoch: 15 Global Step: 312960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:18,943-Speed 6312.45 samples/sec Loss 5.6056 LearningRate 0.0005 Epoch: 15 Global Step: 312970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:22,189-Speed 6310.59 samples/sec Loss 5.5928 LearningRate 0.0005 Epoch: 15 Global Step: 312980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:25,483-Speed 6219.17 samples/sec Loss 5.6308 LearningRate 0.0005 Epoch: 15 
Global Step: 312990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:28,721-Speed 6325.39 samples/sec Loss 5.5289 LearningRate 0.0005 Epoch: 15 Global Step: 313000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:31,969-Speed 6308.53 samples/sec Loss 5.6044 LearningRate 0.0005 Epoch: 15 Global Step: 313010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:35,212-Speed 6314.79 samples/sec Loss 5.6311 LearningRate 0.0005 Epoch: 15 Global Step: 313020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:38,459-Speed 6310.17 samples/sec Loss 5.6132 LearningRate 0.0005 Epoch: 15 Global Step: 313030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:41,704-Speed 6312.31 samples/sec Loss 5.6553 LearningRate 0.0005 Epoch: 15 Global Step: 313040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:44,946-Speed 6318.20 samples/sec Loss 5.6286 LearningRate 0.0005 Epoch: 15 Global Step: 313050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:48,195-Speed 6305.72 samples/sec Loss 5.6051 LearningRate 0.0005 Epoch: 15 Global Step: 313060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:51,438-Speed 6315.73 samples/sec Loss 5.6556 LearningRate 0.0005 Epoch: 15 Global Step: 313070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:54,688-Speed 6303.36 samples/sec Loss 5.6492 LearningRate 0.0005 Epoch: 15 Global Step: 313080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:12:57,935-Speed 6308.72 samples/sec Loss 5.6256 LearningRate 0.0005 Epoch: 15 Global Step: 313090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:01,168-Speed 6337.17 samples/sec Loss 5.6580 LearningRate 0.0005 Epoch: 15 Global Step: 313100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:04,416-Speed 6306.43 samples/sec Loss 5.6231 LearningRate 0.0005 Epoch: 15 Global Step: 313110 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:07,661-Speed 6312.06 samples/sec Loss 5.5508 LearningRate 0.0005 Epoch: 15 Global Step: 313120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:10,908-Speed 6310.08 samples/sec Loss 5.6291 LearningRate 0.0005 Epoch: 15 Global Step: 313130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:14,155-Speed 6311.18 samples/sec Loss 5.6449 LearningRate 0.0005 Epoch: 15 Global Step: 313140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:17,401-Speed 6310.08 samples/sec Loss 5.5731 LearningRate 0.0005 Epoch: 15 Global Step: 313150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:20,645-Speed 6313.48 samples/sec Loss 5.7219 LearningRate 0.0005 Epoch: 15 Global Step: 313160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:23,892-Speed 6309.62 samples/sec Loss 5.6931 LearningRate 0.0005 Epoch: 15 Global Step: 313170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:27,139-Speed 6310.77 samples/sec Loss 5.5525 LearningRate 0.0005 Epoch: 15 Global Step: 313180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:30,384-Speed 6312.85 samples/sec Loss 5.6618 LearningRate 0.0005 Epoch: 15 Global Step: 313190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:33,617-Speed 6335.57 samples/sec Loss 5.6587 LearningRate 0.0005 Epoch: 15 Global Step: 313200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:36,861-Speed 6315.59 samples/sec Loss 5.7155 LearningRate 0.0005 Epoch: 15 Global Step: 313210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:40,108-Speed 6308.44 samples/sec Loss 5.5700 LearningRate 0.0005 Epoch: 15 Global Step: 313220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:43,355-Speed 6307.86 samples/sec Loss 5.6458 LearningRate 0.0005 Epoch: 15 Global Step: 313230 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:13:46,603-Speed 6306.70 samples/sec Loss 5.6714 LearningRate 0.0005 Epoch: 15 Global Step: 313240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:49,852-Speed 6304.85 samples/sec Loss 5.7285 LearningRate 0.0005 Epoch: 15 Global Step: 313250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:53,096-Speed 6315.68 samples/sec Loss 5.6564 LearningRate 0.0005 Epoch: 15 Global Step: 313260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:56,340-Speed 6313.67 samples/sec Loss 5.7153 LearningRate 0.0005 Epoch: 15 Global Step: 313270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:13:59,584-Speed 6315.03 samples/sec Loss 5.6746 LearningRate 0.0005 Epoch: 15 Global Step: 313280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:02,838-Speed 6295.35 samples/sec Loss 5.6105 LearningRate 0.0005 Epoch: 15 Global Step: 313290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:06,073-Speed 6331.58 samples/sec Loss 5.6752 LearningRate 0.0005 Epoch: 15 Global Step: 313300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:09,315-Speed 6319.47 samples/sec Loss 5.5784 LearningRate 0.0005 Epoch: 15 Global Step: 313310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:12,564-Speed 6304.07 samples/sec Loss 5.6318 LearningRate 0.0005 Epoch: 15 Global Step: 313320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:15,810-Speed 6311.06 samples/sec Loss 5.6107 LearningRate 0.0005 Epoch: 15 Global Step: 313330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:19,061-Speed 6301.96 samples/sec Loss 5.6289 LearningRate 0.0005 Epoch: 15 Global Step: 313340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:22,306-Speed 6311.20 samples/sec Loss 5.6477 LearningRate 0.0005 Epoch: 15 Global Step: 313350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:14:25,549-Speed 6316.36 samples/sec Loss 5.6494 LearningRate 0.0005 Epoch: 15 Global Step: 313360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:28,793-Speed 6315.47 samples/sec Loss 5.6719 LearningRate 0.0005 Epoch: 15 Global Step: 313370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:32,045-Speed 6299.23 samples/sec Loss 5.5787 LearningRate 0.0005 Epoch: 15 Global Step: 313380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:35,291-Speed 6309.70 samples/sec Loss 5.6605 LearningRate 0.0005 Epoch: 15 Global Step: 313390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:38,522-Speed 6340.59 samples/sec Loss 5.6235 LearningRate 0.0005 Epoch: 15 Global Step: 313400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:41,771-Speed 6304.84 samples/sec Loss 5.6999 LearningRate 0.0005 Epoch: 15 Global Step: 313410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:45,016-Speed 6314.29 samples/sec Loss 5.5503 LearningRate 0.0005 Epoch: 15 Global Step: 313420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:48,257-Speed 6319.58 samples/sec Loss 5.5897 LearningRate 0.0005 Epoch: 15 Global Step: 313430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:51,502-Speed 6312.64 samples/sec Loss 5.6209 LearningRate 0.0005 Epoch: 15 Global Step: 313440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:54,743-Speed 6320.80 samples/sec Loss 5.6191 LearningRate 0.0005 Epoch: 15 Global Step: 313450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:14:57,988-Speed 6313.01 samples/sec Loss 5.6194 LearningRate 0.0005 Epoch: 15 Global Step: 313460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:01,231-Speed 6316.54 samples/sec Loss 5.6569 LearningRate 0.0005 Epoch: 15 Global Step: 313470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:04,480-Speed 6303.42 
samples/sec Loss 5.6870 LearningRate 0.0005 Epoch: 15 Global Step: 313480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:07,727-Speed 6309.84 samples/sec Loss 5.6180 LearningRate 0.0005 Epoch: 15 Global Step: 313490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:10,962-Speed 6333.12 samples/sec Loss 5.6265 LearningRate 0.0005 Epoch: 15 Global Step: 313500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:14,205-Speed 6315.96 samples/sec Loss 5.5902 LearningRate 0.0005 Epoch: 15 Global Step: 313510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:17,449-Speed 6314.40 samples/sec Loss 5.6824 LearningRate 0.0005 Epoch: 15 Global Step: 313520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:20,695-Speed 6311.21 samples/sec Loss 5.6461 LearningRate 0.0005 Epoch: 15 Global Step: 313530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:23,939-Speed 6313.41 samples/sec Loss 5.5568 LearningRate 0.0005 Epoch: 15 Global Step: 313540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:27,190-Speed 6301.66 samples/sec Loss 5.6273 LearningRate 0.0005 Epoch: 15 Global Step: 313550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:30,437-Speed 6309.16 samples/sec Loss 5.5874 LearningRate 0.0005 Epoch: 15 Global Step: 313560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:33,679-Speed 6318.32 samples/sec Loss 5.6037 LearningRate 0.0005 Epoch: 15 Global Step: 313570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:36,928-Speed 6305.67 samples/sec Loss 5.6670 LearningRate 0.0005 Epoch: 15 Global Step: 313580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:40,173-Speed 6311.28 samples/sec Loss 5.5486 LearningRate 0.0005 Epoch: 15 Global Step: 313590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:43,404-Speed 6340.65 samples/sec Loss 5.7019 
LearningRate 0.0005 Epoch: 15 Global Step: 313600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:46,664-Speed 6284.85 samples/sec Loss 5.6065 LearningRate 0.0005 Epoch: 15 Global Step: 313610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:49,909-Speed 6312.95 samples/sec Loss 5.6284 LearningRate 0.0005 Epoch: 15 Global Step: 313620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:53,152-Speed 6315.23 samples/sec Loss 5.6629 LearningRate 0.0005 Epoch: 15 Global Step: 313630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:56,397-Speed 6312.50 samples/sec Loss 5.6058 LearningRate 0.0005 Epoch: 15 Global Step: 313640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:15:59,647-Speed 6303.32 samples/sec Loss 5.6472 LearningRate 0.0005 Epoch: 15 Global Step: 313650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:02,889-Speed 6318.20 samples/sec Loss 5.5847 LearningRate 0.0005 Epoch: 15 Global Step: 313660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:06,133-Speed 6314.60 samples/sec Loss 5.5920 LearningRate 0.0005 Epoch: 15 Global Step: 313670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:09,379-Speed 6311.88 samples/sec Loss 5.6980 LearningRate 0.0005 Epoch: 15 Global Step: 313680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:12,623-Speed 6314.28 samples/sec Loss 5.6430 LearningRate 0.0005 Epoch: 15 Global Step: 313690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:15,870-Speed 6308.77 samples/sec Loss 5.6951 LearningRate 0.0005 Epoch: 15 Global Step: 313700 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:16:19,099-Speed 6344.59 samples/sec Loss 5.7005 LearningRate 0.0005 Epoch: 15 Global Step: 313710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:22,345-Speed 6309.55 samples/sec Loss 5.5997 LearningRate 0.0005 Epoch: 15 
Global Step: 313720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:25,593-Speed 6307.01 samples/sec Loss 5.5861 LearningRate 0.0005 Epoch: 15 Global Step: 313730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:28,836-Speed 6315.60 samples/sec Loss 5.6400 LearningRate 0.0005 Epoch: 15 Global Step: 313740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:32,083-Speed 6309.84 samples/sec Loss 5.6199 LearningRate 0.0005 Epoch: 15 Global Step: 313750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:35,374-Speed 6223.33 samples/sec Loss 5.6494 LearningRate 0.0005 Epoch: 15 Global Step: 313760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:16:38,608-Speed 6334.40 samples/sec Loss 5.6668 LearningRate 0.0005 Epoch: 15 Global Step: 313770 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:41,852-Speed 6315.13 samples/sec Loss 5.5926 LearningRate 0.0005 Epoch: 15 Global Step: 313780 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:45,097-Speed 6313.16 samples/sec Loss 5.6847 LearningRate 0.0005 Epoch: 15 Global Step: 313790 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:48,342-Speed 6313.34 samples/sec Loss 5.6194 LearningRate 0.0005 Epoch: 15 Global Step: 313800 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:51,584-Speed 6317.25 samples/sec Loss 5.6856 LearningRate 0.0005 Epoch: 15 Global Step: 313810 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:54,830-Speed 6312.68 samples/sec Loss 5.6838 LearningRate 0.0005 Epoch: 15 Global Step: 313820 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:16:58,072-Speed 6316.97 samples/sec Loss 5.6287 LearningRate 0.0005 Epoch: 15 Global Step: 313830 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:17:01,320-Speed 6307.42 samples/sec Loss 5.6983 LearningRate 0.0005 Epoch: 15 Global Step: 313840 Fp16 Grad 
Scale: 16384 Required: 47 hours Training: 2022-04-01 20:17:04,567-Speed 6309.80 samples/sec Loss 5.6612 LearningRate 0.0005 Epoch: 15 Global Step: 313850 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:17:07,812-Speed 6312.56 samples/sec Loss 5.6346 LearningRate 0.0005 Epoch: 15 Global Step: 313860 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:17:11,058-Speed 6309.06 samples/sec Loss 5.6377 LearningRate 0.0005 Epoch: 15 Global Step: 313870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:14,308-Speed 6304.76 samples/sec Loss 5.6009 LearningRate 0.0005 Epoch: 15 Global Step: 313880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:17,556-Speed 6305.13 samples/sec Loss 5.5693 LearningRate 0.0005 Epoch: 15 Global Step: 313890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:20,799-Speed 6317.60 samples/sec Loss 5.7289 LearningRate 0.0005 Epoch: 15 Global Step: 313900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:24,044-Speed 6311.69 samples/sec Loss 5.6433 LearningRate 0.0005 Epoch: 15 Global Step: 313910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:27,290-Speed 6311.15 samples/sec Loss 5.6662 LearningRate 0.0005 Epoch: 15 Global Step: 313920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:30,536-Speed 6310.33 samples/sec Loss 5.6135 LearningRate 0.0005 Epoch: 15 Global Step: 313930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:33,786-Speed 6304.05 samples/sec Loss 5.5730 LearningRate 0.0005 Epoch: 15 Global Step: 313940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:37,039-Speed 6297.31 samples/sec Loss 5.5961 LearningRate 0.0005 Epoch: 15 Global Step: 313950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:40,284-Speed 6310.83 samples/sec Loss 5.6008 LearningRate 0.0005 Epoch: 15 Global Step: 313960 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:17:43,516-Speed 6339.21 samples/sec Loss 5.6558 LearningRate 0.0005 Epoch: 15 Global Step: 313970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:46,762-Speed 6309.92 samples/sec Loss 5.7159 LearningRate 0.0005 Epoch: 15 Global Step: 313980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:50,010-Speed 6306.88 samples/sec Loss 5.5697 LearningRate 0.0005 Epoch: 15 Global Step: 313990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:53,257-Speed 6309.65 samples/sec Loss 5.6791 LearningRate 0.0005 Epoch: 15 Global Step: 314000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:56,507-Speed 6301.95 samples/sec Loss 5.6498 LearningRate 0.0005 Epoch: 15 Global Step: 314010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:17:59,756-Speed 6306.93 samples/sec Loss 5.6870 LearningRate 0.0005 Epoch: 15 Global Step: 314020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:03,000-Speed 6314.00 samples/sec Loss 5.5756 LearningRate 0.0005 Epoch: 15 Global Step: 314030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:06,247-Speed 6308.43 samples/sec Loss 5.6461 LearningRate 0.0005 Epoch: 15 Global Step: 314040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:09,493-Speed 6311.90 samples/sec Loss 5.6214 LearningRate 0.0005 Epoch: 15 Global Step: 314050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:12,741-Speed 6306.20 samples/sec Loss 5.6607 LearningRate 0.0005 Epoch: 15 Global Step: 314060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:15,985-Speed 6314.04 samples/sec Loss 5.6631 LearningRate 0.0005 Epoch: 15 Global Step: 314070 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:18:19,216-Speed 6340.31 samples/sec Loss 5.5967 LearningRate 0.0005 Epoch: 15 Global Step: 314080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:18:22,460-Speed 6315.17 samples/sec Loss 5.5906 LearningRate 0.0005 Epoch: 15 Global Step: 314090 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:25,714-Speed 6294.66 samples/sec Loss 5.6773 LearningRate 0.0005 Epoch: 15 Global Step: 314100 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:28,962-Speed 6306.28 samples/sec Loss 5.6606 LearningRate 0.0005 Epoch: 15 Global Step: 314110 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:32,204-Speed 6318.83 samples/sec Loss 5.6240 LearningRate 0.0005 Epoch: 15 Global Step: 314120 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:35,449-Speed 6312.74 samples/sec Loss 5.6501 LearningRate 0.0005 Epoch: 15 Global Step: 314130 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:38,695-Speed 6311.32 samples/sec Loss 5.6494 LearningRate 0.0005 Epoch: 15 Global Step: 314140 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:41,942-Speed 6308.91 samples/sec Loss 5.6243 LearningRate 0.0005 Epoch: 15 Global Step: 314150 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:45,192-Speed 6303.80 samples/sec Loss 5.6210 LearningRate 0.0005 Epoch: 15 Global Step: 314160 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:48,439-Speed 6307.98 samples/sec Loss 5.6440 LearningRate 0.0005 Epoch: 15 Global Step: 314170 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:51,687-Speed 6307.26 samples/sec Loss 5.5887 LearningRate 0.0005 Epoch: 15 Global Step: 314180 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:18:54,930-Speed 6315.96 samples/sec Loss 5.5958 LearningRate 0.0005 Epoch: 15 Global Step: 314190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:18:58,177-Speed 6309.49 samples/sec Loss 5.6405 LearningRate 0.0005 Epoch: 15 Global Step: 314200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:01,427-Speed 6302.61 
samples/sec Loss 5.6652 LearningRate 0.0005 Epoch: 15 Global Step: 314210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:04,668-Speed 6320.28 samples/sec Loss 5.6612 LearningRate 0.0005 Epoch: 15 Global Step: 314220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:07,915-Speed 6310.04 samples/sec Loss 5.6612 LearningRate 0.0005 Epoch: 15 Global Step: 314230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:11,162-Speed 6308.76 samples/sec Loss 5.7570 LearningRate 0.0005 Epoch: 15 Global Step: 314240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:14,410-Speed 6307.11 samples/sec Loss 5.6390 LearningRate 0.0005 Epoch: 15 Global Step: 314250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:17,658-Speed 6305.62 samples/sec Loss 5.6970 LearningRate 0.0005 Epoch: 15 Global Step: 314260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:20,909-Speed 6302.39 samples/sec Loss 5.5925 LearningRate 0.0005 Epoch: 15 Global Step: 314270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:24,160-Speed 6301.26 samples/sec Loss 5.6282 LearningRate 0.0005 Epoch: 15 Global Step: 314280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:27,416-Speed 6290.54 samples/sec Loss 5.6728 LearningRate 0.0005 Epoch: 15 Global Step: 314290 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:19:30,650-Speed 6333.59 samples/sec Loss 5.6049 LearningRate 0.0005 Epoch: 15 Global Step: 314300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:33,892-Speed 6319.59 samples/sec Loss 5.6284 LearningRate 0.0005 Epoch: 15 Global Step: 314310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:37,139-Speed 6307.40 samples/sec Loss 5.6656 LearningRate 0.0005 Epoch: 15 Global Step: 314320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:40,384-Speed 6313.99 samples/sec Loss 5.5684 
LearningRate 0.0005 Epoch: 15 Global Step: 314330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:43,626-Speed 6318.75 samples/sec Loss 5.5547 LearningRate 0.0005 Epoch: 15 Global Step: 314340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:46,874-Speed 6305.62 samples/sec Loss 5.5901 LearningRate 0.0005 Epoch: 15 Global Step: 314350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:50,121-Speed 6309.96 samples/sec Loss 5.7053 LearningRate 0.0005 Epoch: 15 Global Step: 314360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:53,371-Speed 6302.90 samples/sec Loss 5.6496 LearningRate 0.0005 Epoch: 15 Global Step: 314370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:56,618-Speed 6308.21 samples/sec Loss 5.6369 LearningRate 0.0005 Epoch: 15 Global Step: 314380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:19:59,861-Speed 6315.75 samples/sec Loss 5.5348 LearningRate 0.0005 Epoch: 15 Global Step: 314390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:03,094-Speed 6336.13 samples/sec Loss 5.6264 LearningRate 0.0005 Epoch: 15 Global Step: 314400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:06,344-Speed 6303.36 samples/sec Loss 5.5650 LearningRate 0.0005 Epoch: 15 Global Step: 314410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:09,592-Speed 6306.24 samples/sec Loss 5.6114 LearningRate 0.0005 Epoch: 15 Global Step: 314420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:12,838-Speed 6312.53 samples/sec Loss 5.6037 LearningRate 0.0005 Epoch: 15 Global Step: 314430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:16,082-Speed 6313.76 samples/sec Loss 5.5657 LearningRate 0.0005 Epoch: 15 Global Step: 314440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:19,329-Speed 6309.07 samples/sec Loss 5.5216 LearningRate 0.0005 Epoch: 15 
Global Step: 314450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:22,576-Speed 6309.51 samples/sec Loss 5.6564 LearningRate 0.0005 Epoch: 15 Global Step: 314460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:25,820-Speed 6313.45 samples/sec Loss 5.6313 LearningRate 0.0005 Epoch: 15 Global Step: 314470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:29,067-Speed 6309.28 samples/sec Loss 5.5699 LearningRate 0.0005 Epoch: 15 Global Step: 314480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:32,317-Speed 6303.22 samples/sec Loss 5.5265 LearningRate 0.0005 Epoch: 15 Global Step: 314490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:35,555-Speed 6326.30 samples/sec Loss 5.6465 LearningRate 0.0005 Epoch: 15 Global Step: 314500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:38,800-Speed 6312.58 samples/sec Loss 5.4706 LearningRate 0.0005 Epoch: 15 Global Step: 314510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:42,047-Speed 6309.50 samples/sec Loss 5.6371 LearningRate 0.0005 Epoch: 15 Global Step: 314520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:45,292-Speed 6312.14 samples/sec Loss 5.6406 LearningRate 0.0005 Epoch: 15 Global Step: 314530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:48,540-Speed 6307.54 samples/sec Loss 5.5923 LearningRate 0.0005 Epoch: 15 Global Step: 314540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:51,787-Speed 6308.70 samples/sec Loss 5.6532 LearningRate 0.0005 Epoch: 15 Global Step: 314550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:55,034-Speed 6308.17 samples/sec Loss 5.6264 LearningRate 0.0005 Epoch: 15 Global Step: 314560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:20:58,279-Speed 6312.01 samples/sec Loss 5.5835 LearningRate 0.0005 Epoch: 15 Global Step: 314570 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:01,528-Speed 6304.37 samples/sec Loss 5.6660 LearningRate 0.0005 Epoch: 15 Global Step: 314580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:04,774-Speed 6311.80 samples/sec Loss 5.5822 LearningRate 0.0005 Epoch: 15 Global Step: 314590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:08,019-Speed 6311.39 samples/sec Loss 5.6234 LearningRate 0.0005 Epoch: 15 Global Step: 314600 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:21:11,246-Speed 6349.13 samples/sec Loss 5.6114 LearningRate 0.0005 Epoch: 15 Global Step: 314610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:14,494-Speed 6306.78 samples/sec Loss 5.6441 LearningRate 0.0005 Epoch: 15 Global Step: 314620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:17,745-Speed 6300.10 samples/sec Loss 5.6432 LearningRate 0.0005 Epoch: 15 Global Step: 314630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:20,996-Speed 6300.88 samples/sec Loss 5.6126 LearningRate 0.0005 Epoch: 15 Global Step: 314640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:24,248-Speed 6301.30 samples/sec Loss 5.6482 LearningRate 0.0005 Epoch: 15 Global Step: 314650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:27,496-Speed 6305.98 samples/sec Loss 5.6458 LearningRate 0.0005 Epoch: 15 Global Step: 314660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:30,740-Speed 6315.44 samples/sec Loss 5.6399 LearningRate 0.0005 Epoch: 15 Global Step: 314670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:33,989-Speed 6305.15 samples/sec Loss 5.5228 LearningRate 0.0005 Epoch: 15 Global Step: 314680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:37,232-Speed 6315.35 samples/sec Loss 5.5312 LearningRate 0.0005 Epoch: 15 Global Step: 314690 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:21:40,489-Speed 6289.21 samples/sec Loss 5.6585 LearningRate 0.0005 Epoch: 15 Global Step: 314700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:43,725-Speed 6331.47 samples/sec Loss 5.5622 LearningRate 0.0005 Epoch: 15 Global Step: 314710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:46,968-Speed 6315.46 samples/sec Loss 5.6813 LearningRate 0.0005 Epoch: 15 Global Step: 314720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:50,211-Speed 6316.76 samples/sec Loss 5.6160 LearningRate 0.0005 Epoch: 15 Global Step: 314730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:53,458-Speed 6310.26 samples/sec Loss 5.6253 LearningRate 0.0005 Epoch: 15 Global Step: 314740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:56,702-Speed 6313.16 samples/sec Loss 5.6188 LearningRate 0.0005 Epoch: 15 Global Step: 314750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:21:59,951-Speed 6304.31 samples/sec Loss 5.6919 LearningRate 0.0005 Epoch: 15 Global Step: 314760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:03,202-Speed 6302.03 samples/sec Loss 5.6558 LearningRate 0.0005 Epoch: 15 Global Step: 314770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:06,447-Speed 6313.02 samples/sec Loss 5.6383 LearningRate 0.0005 Epoch: 15 Global Step: 314780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:09,689-Speed 6318.03 samples/sec Loss 5.6442 LearningRate 0.0005 Epoch: 15 Global Step: 314790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:12,934-Speed 6313.17 samples/sec Loss 5.6442 LearningRate 0.0005 Epoch: 15 Global Step: 314800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:16,174-Speed 6322.17 samples/sec Loss 5.6190 LearningRate 0.0005 Epoch: 15 Global Step: 314810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:22:19,420-Speed 6310.45 samples/sec Loss 5.6836 LearningRate 0.0005 Epoch: 15 Global Step: 314820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:22,666-Speed 6311.45 samples/sec Loss 5.6133 LearningRate 0.0005 Epoch: 15 Global Step: 314830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:25,918-Speed 6299.05 samples/sec Loss 5.6546 LearningRate 0.0005 Epoch: 15 Global Step: 314840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:29,165-Speed 6308.75 samples/sec Loss 5.6580 LearningRate 0.0005 Epoch: 15 Global Step: 314850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:32,419-Speed 6294.97 samples/sec Loss 5.6297 LearningRate 0.0005 Epoch: 15 Global Step: 314860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:35,666-Speed 6308.86 samples/sec Loss 5.6988 LearningRate 0.0005 Epoch: 15 Global Step: 314870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:38,914-Speed 6307.53 samples/sec Loss 5.6048 LearningRate 0.0005 Epoch: 15 Global Step: 314880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:42,162-Speed 6305.19 samples/sec Loss 5.6304 LearningRate 0.0005 Epoch: 15 Global Step: 314890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:45,410-Speed 6307.35 samples/sec Loss 5.6575 LearningRate 0.0005 Epoch: 15 Global Step: 314900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:48,645-Speed 6331.94 samples/sec Loss 5.6095 LearningRate 0.0005 Epoch: 15 Global Step: 314910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:51,906-Speed 6281.33 samples/sec Loss 5.6489 LearningRate 0.0005 Epoch: 15 Global Step: 314920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:55,151-Speed 6314.46 samples/sec Loss 5.7159 LearningRate 0.0005 Epoch: 15 Global Step: 314930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:22:58,406-Speed 6291.92 
samples/sec Loss 5.6524 LearningRate 0.0005 Epoch: 15 Global Step: 314940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:01,655-Speed 6304.99 samples/sec Loss 5.6439 LearningRate 0.0005 Epoch: 15 Global Step: 314950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:04,905-Speed 6302.64 samples/sec Loss 5.6007 LearningRate 0.0005 Epoch: 15 Global Step: 314960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:08,151-Speed 6311.20 samples/sec Loss 5.6376 LearningRate 0.0005 Epoch: 15 Global Step: 314970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:11,402-Speed 6301.49 samples/sec Loss 5.6410 LearningRate 0.0005 Epoch: 15 Global Step: 314980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:14,651-Speed 6304.79 samples/sec Loss 5.6423 LearningRate 0.0005 Epoch: 15 Global Step: 314990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:17,895-Speed 6315.33 samples/sec Loss 5.5379 LearningRate 0.0005 Epoch: 15 Global Step: 315000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:21,143-Speed 6305.96 samples/sec Loss 5.5782 LearningRate 0.0005 Epoch: 15 Global Step: 315010 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:23:24,379-Speed 6330.27 samples/sec Loss 5.6478 LearningRate 0.0005 Epoch: 15 Global Step: 315020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:27,630-Speed 6301.66 samples/sec Loss 5.6432 LearningRate 0.0005 Epoch: 15 Global Step: 315030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:30,879-Speed 6304.71 samples/sec Loss 5.6394 LearningRate 0.0005 Epoch: 15 Global Step: 315040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:34,124-Speed 6312.21 samples/sec Loss 5.6672 LearningRate 0.0005 Epoch: 15 Global Step: 315050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:37,367-Speed 6316.49 samples/sec Loss 5.6381 
LearningRate 0.0005 Epoch: 15 Global Step: 315060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:40,619-Speed 6298.54 samples/sec Loss 5.6456 LearningRate 0.0005 Epoch: 15 Global Step: 315070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:43,869-Speed 6304.84 samples/sec Loss 5.5972 LearningRate 0.0005 Epoch: 15 Global Step: 315080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:47,111-Speed 6317.31 samples/sec Loss 5.5928 LearningRate 0.0005 Epoch: 15 Global Step: 315090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:50,366-Speed 6294.16 samples/sec Loss 5.7024 LearningRate 0.0005 Epoch: 15 Global Step: 315100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:53,617-Speed 6301.23 samples/sec Loss 5.5566 LearningRate 0.0005 Epoch: 15 Global Step: 315110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:23:56,849-Speed 6338.40 samples/sec Loss 5.6520 LearningRate 0.0005 Epoch: 15 Global Step: 315120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:00,096-Speed 6307.64 samples/sec Loss 5.5880 LearningRate 0.0005 Epoch: 15 Global Step: 315130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:03,351-Speed 6293.01 samples/sec Loss 5.7395 LearningRate 0.0005 Epoch: 15 Global Step: 315140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:06,602-Speed 6302.11 samples/sec Loss 5.6660 LearningRate 0.0005 Epoch: 15 Global Step: 315150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:09,835-Speed 6335.58 samples/sec Loss 5.6399 LearningRate 0.0005 Epoch: 15 Global Step: 315160 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:13,087-Speed 6298.76 samples/sec Loss 5.6222 LearningRate 0.0005 Epoch: 15 Global Step: 315170 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:16,330-Speed 6317.67 samples/sec Loss 5.7127 LearningRate 0.0005 Epoch: 15 
Global Step: 315180 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:19,576-Speed 6309.61 samples/sec Loss 5.5983 LearningRate 0.0005 Epoch: 15 Global Step: 315190 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:22,823-Speed 6309.76 samples/sec Loss 5.6817 LearningRate 0.0005 Epoch: 15 Global Step: 315200 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:26,076-Speed 6301.15 samples/sec Loss 5.6155 LearningRate 0.0005 Epoch: 15 Global Step: 315210 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:29,323-Speed 6307.42 samples/sec Loss 5.6209 LearningRate 0.0005 Epoch: 15 Global Step: 315220 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:32,577-Speed 6294.83 samples/sec Loss 5.5726 LearningRate 0.0005 Epoch: 15 Global Step: 315230 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:35,824-Speed 6309.45 samples/sec Loss 5.5793 LearningRate 0.0005 Epoch: 15 Global Step: 315240 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:39,067-Speed 6316.91 samples/sec Loss 5.5856 LearningRate 0.0005 Epoch: 15 Global Step: 315250 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:24:42,312-Speed 6311.56 samples/sec Loss 5.5774 LearningRate 0.0005 Epoch: 15 Global Step: 315260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:45,562-Speed 6302.85 samples/sec Loss 5.5933 LearningRate 0.0005 Epoch: 15 Global Step: 315270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:48,811-Speed 6305.07 samples/sec Loss 5.5472 LearningRate 0.0005 Epoch: 15 Global Step: 315280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:52,062-Speed 6302.80 samples/sec Loss 5.6788 LearningRate 0.0005 Epoch: 15 Global Step: 315290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:55,306-Speed 6313.97 samples/sec Loss 5.5811 LearningRate 0.0005 Epoch: 15 Global Step: 315300 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:24:58,559-Speed 6297.71 samples/sec Loss 5.5951 LearningRate 0.0005 Epoch: 15 Global Step: 315310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:01,808-Speed 6305.94 samples/sec Loss 5.5618 LearningRate 0.0005 Epoch: 15 Global Step: 315320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:05,066-Speed 6286.66 samples/sec Loss 5.5964 LearningRate 0.0005 Epoch: 15 Global Step: 315330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:08,318-Speed 6299.57 samples/sec Loss 5.5606 LearningRate 0.0005 Epoch: 15 Global Step: 315340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:11,569-Speed 6301.33 samples/sec Loss 5.5434 LearningRate 0.0005 Epoch: 15 Global Step: 315350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:14,799-Speed 6340.97 samples/sec Loss 5.6110 LearningRate 0.0005 Epoch: 15 Global Step: 315360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:18,048-Speed 6305.84 samples/sec Loss 5.6365 LearningRate 0.0005 Epoch: 15 Global Step: 315370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:21,292-Speed 6314.30 samples/sec Loss 5.6990 LearningRate 0.0005 Epoch: 15 Global Step: 315380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:24,542-Speed 6303.03 samples/sec Loss 5.5897 LearningRate 0.0005 Epoch: 15 Global Step: 315390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:27,789-Speed 6307.81 samples/sec Loss 5.5908 LearningRate 0.0005 Epoch: 15 Global Step: 315400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:31,038-Speed 6305.04 samples/sec Loss 5.6093 LearningRate 0.0005 Epoch: 15 Global Step: 315410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:34,285-Speed 6309.28 samples/sec Loss 5.6631 LearningRate 0.0005 Epoch: 15 Global Step: 315420 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:25:37,534-Speed 6304.82 samples/sec Loss 5.6046 LearningRate 0.0005 Epoch: 15 Global Step: 315430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:40,786-Speed 6298.29 samples/sec Loss 5.6203 LearningRate 0.0005 Epoch: 15 Global Step: 315440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:44,035-Speed 6306.13 samples/sec Loss 5.6902 LearningRate 0.0005 Epoch: 15 Global Step: 315450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:47,270-Speed 6330.82 samples/sec Loss 5.6190 LearningRate 0.0005 Epoch: 15 Global Step: 315460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:50,520-Speed 6304.47 samples/sec Loss 5.6337 LearningRate 0.0005 Epoch: 15 Global Step: 315470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:53,775-Speed 6292.68 samples/sec Loss 5.6165 LearningRate 0.0005 Epoch: 15 Global Step: 315480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:25:57,024-Speed 6305.38 samples/sec Loss 5.6142 LearningRate 0.0005 Epoch: 15 Global Step: 315490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:00,267-Speed 6316.30 samples/sec Loss 5.5772 LearningRate 0.0005 Epoch: 15 Global Step: 315500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:03,515-Speed 6308.06 samples/sec Loss 5.5992 LearningRate 0.0005 Epoch: 15 Global Step: 315510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:06,761-Speed 6309.41 samples/sec Loss 5.5994 LearningRate 0.0005 Epoch: 15 Global Step: 315520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:10,006-Speed 6313.94 samples/sec Loss 5.5909 LearningRate 0.0005 Epoch: 15 Global Step: 315530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:13,253-Speed 6308.07 samples/sec Loss 5.6014 LearningRate 0.0005 Epoch: 15 Global Step: 315540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:26:16,496-Speed 6315.99 samples/sec Loss 5.7017 LearningRate 0.0005 Epoch: 15 Global Step: 315550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:19,738-Speed 6319.78 samples/sec Loss 5.6798 LearningRate 0.0005 Epoch: 15 Global Step: 315560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:22,984-Speed 6310.43 samples/sec Loss 5.6258 LearningRate 0.0005 Epoch: 15 Global Step: 315570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:26,232-Speed 6306.02 samples/sec Loss 5.6785 LearningRate 0.0005 Epoch: 15 Global Step: 315580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:29,485-Speed 6296.96 samples/sec Loss 5.6676 LearningRate 0.0005 Epoch: 15 Global Step: 315590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:32,730-Speed 6312.51 samples/sec Loss 5.6002 LearningRate 0.0005 Epoch: 15 Global Step: 315600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:35,982-Speed 6299.95 samples/sec Loss 5.6650 LearningRate 0.0005 Epoch: 15 Global Step: 315610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:39,231-Speed 6304.34 samples/sec Loss 5.5538 LearningRate 0.0005 Epoch: 15 Global Step: 315620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:42,480-Speed 6305.20 samples/sec Loss 5.5654 LearningRate 0.0005 Epoch: 15 Global Step: 315630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:45,726-Speed 6309.78 samples/sec Loss 5.6340 LearningRate 0.0005 Epoch: 15 Global Step: 315640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:48,972-Speed 6311.42 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 15 Global Step: 315650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:52,201-Speed 6344.60 samples/sec Loss 5.5742 LearningRate 0.0005 Epoch: 15 Global Step: 315660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:55,449-Speed 6305.92 
samples/sec Loss 5.5348 LearningRate 0.0005 Epoch: 15 Global Step: 315670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:26:58,697-Speed 6306.39 samples/sec Loss 5.6383 LearningRate 0.0005 Epoch: 15 Global Step: 315680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:01,946-Speed 6306.42 samples/sec Loss 5.5829 LearningRate 0.0005 Epoch: 15 Global Step: 315690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:05,192-Speed 6310.20 samples/sec Loss 5.5540 LearningRate 0.0005 Epoch: 15 Global Step: 315700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:08,439-Speed 6309.40 samples/sec Loss 5.6146 LearningRate 0.0005 Epoch: 15 Global Step: 315710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:11,684-Speed 6313.12 samples/sec Loss 5.6528 LearningRate 0.0005 Epoch: 15 Global Step: 315720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:14,930-Speed 6310.93 samples/sec Loss 5.6106 LearningRate 0.0005 Epoch: 15 Global Step: 315730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:18,176-Speed 6311.02 samples/sec Loss 5.5583 LearningRate 0.0005 Epoch: 15 Global Step: 315740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:21,421-Speed 6312.75 samples/sec Loss 5.5845 LearningRate 0.0005 Epoch: 15 Global Step: 315750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:24,655-Speed 6332.65 samples/sec Loss 5.6351 LearningRate 0.0005 Epoch: 15 Global Step: 315760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:27,905-Speed 6302.72 samples/sec Loss 5.5705 LearningRate 0.0005 Epoch: 15 Global Step: 315770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:31,150-Speed 6313.51 samples/sec Loss 5.6162 LearningRate 0.0005 Epoch: 15 Global Step: 315780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:34,394-Speed 6313.63 samples/sec Loss 5.6507 
LearningRate 0.0005 Epoch: 15 Global Step: 315790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:37,640-Speed 6312.61 samples/sec Loss 5.5800 LearningRate 0.0005 Epoch: 15 Global Step: 315800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:40,879-Speed 6322.96 samples/sec Loss 5.6299 LearningRate 0.0005 Epoch: 15 Global Step: 315810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:44,127-Speed 6306.22 samples/sec Loss 5.6754 LearningRate 0.0005 Epoch: 15 Global Step: 315820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:47,378-Speed 6301.71 samples/sec Loss 5.6530 LearningRate 0.0005 Epoch: 15 Global Step: 315830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:50,623-Speed 6312.59 samples/sec Loss 5.6561 LearningRate 0.0005 Epoch: 15 Global Step: 315840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:53,867-Speed 6314.31 samples/sec Loss 5.5918 LearningRate 0.0005 Epoch: 15 Global Step: 315850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:27:57,114-Speed 6309.88 samples/sec Loss 5.6150 LearningRate 0.0005 Epoch: 15 Global Step: 315860 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:28:00,343-Speed 6343.02 samples/sec Loss 5.5838 LearningRate 0.0005 Epoch: 15 Global Step: 315870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:03,589-Speed 6312.02 samples/sec Loss 5.7063 LearningRate 0.0005 Epoch: 15 Global Step: 315880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:06,845-Speed 6290.59 samples/sec Loss 5.5563 LearningRate 0.0005 Epoch: 15 Global Step: 315890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:10,087-Speed 6319.52 samples/sec Loss 5.5764 LearningRate 0.0005 Epoch: 15 Global Step: 315900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:13,334-Speed 6308.08 samples/sec Loss 5.6026 LearningRate 0.0005 Epoch: 15 
Global Step: 315910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:16,577-Speed 6316.77 samples/sec Loss 5.6339 LearningRate 0.0005 Epoch: 15 Global Step: 315920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:19,826-Speed 6307.12 samples/sec Loss 5.6026 LearningRate 0.0005 Epoch: 15 Global Step: 315930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:23,073-Speed 6306.88 samples/sec Loss 5.6087 LearningRate 0.0005 Epoch: 15 Global Step: 315940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:26,316-Speed 6316.88 samples/sec Loss 5.6723 LearningRate 0.0005 Epoch: 15 Global Step: 315950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:29,566-Speed 6303.86 samples/sec Loss 5.6359 LearningRate 0.0005 Epoch: 15 Global Step: 315960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:32,795-Speed 6344.06 samples/sec Loss 5.5890 LearningRate 0.0005 Epoch: 15 Global Step: 315970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:36,041-Speed 6310.91 samples/sec Loss 5.6402 LearningRate 0.0005 Epoch: 15 Global Step: 315980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:39,285-Speed 6314.07 samples/sec Loss 5.6253 LearningRate 0.0005 Epoch: 15 Global Step: 315990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:42,534-Speed 6304.97 samples/sec Loss 5.7272 LearningRate 0.0005 Epoch: 15 Global Step: 316000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:45,784-Speed 6302.41 samples/sec Loss 5.6515 LearningRate 0.0005 Epoch: 15 Global Step: 316010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:49,035-Speed 6300.70 samples/sec Loss 5.6878 LearningRate 0.0005 Epoch: 15 Global Step: 316020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:52,283-Speed 6307.61 samples/sec Loss 5.6460 LearningRate 0.0005 Epoch: 15 Global Step: 316030 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:55,530-Speed 6308.95 samples/sec Loss 5.6248 LearningRate 0.0005 Epoch: 15 Global Step: 316040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:28:58,778-Speed 6307.24 samples/sec Loss 5.6039 LearningRate 0.0005 Epoch: 15 Global Step: 316050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:02,030-Speed 6299.08 samples/sec Loss 5.5909 LearningRate 0.0005 Epoch: 15 Global Step: 316060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:05,261-Speed 6339.63 samples/sec Loss 5.6246 LearningRate 0.0005 Epoch: 15 Global Step: 316070 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:08,510-Speed 6304.73 samples/sec Loss 5.5911 LearningRate 0.0005 Epoch: 15 Global Step: 316080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:11,750-Speed 6321.54 samples/sec Loss 5.6120 LearningRate 0.0005 Epoch: 15 Global Step: 316090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:14,996-Speed 6310.93 samples/sec Loss 5.6488 LearningRate 0.0005 Epoch: 15 Global Step: 316100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:18,243-Speed 6309.25 samples/sec Loss 5.5817 LearningRate 0.0005 Epoch: 15 Global Step: 316110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:21,489-Speed 6311.38 samples/sec Loss 5.5581 LearningRate 0.0005 Epoch: 15 Global Step: 316120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:24,737-Speed 6306.41 samples/sec Loss 5.6120 LearningRate 0.0005 Epoch: 15 Global Step: 316130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:27,986-Speed 6306.20 samples/sec Loss 5.5950 LearningRate 0.0005 Epoch: 15 Global Step: 316140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:31,233-Speed 6307.37 samples/sec Loss 5.5954 LearningRate 0.0005 Epoch: 15 Global Step: 316150 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:29:34,478-Speed 6313.51 samples/sec Loss 5.5737 LearningRate 0.0005 Epoch: 15 Global Step: 316160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:37,711-Speed 6335.86 samples/sec Loss 5.6002 LearningRate 0.0005 Epoch: 15 Global Step: 316170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:40,958-Speed 6308.48 samples/sec Loss 5.5719 LearningRate 0.0005 Epoch: 15 Global Step: 316180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:44,202-Speed 6315.46 samples/sec Loss 5.5249 LearningRate 0.0005 Epoch: 15 Global Step: 316190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:47,452-Speed 6301.20 samples/sec Loss 5.7271 LearningRate 0.0005 Epoch: 15 Global Step: 316200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:29:50,681-Speed 6345.14 samples/sec Loss 5.7041 LearningRate 0.0005 Epoch: 15 Global Step: 316210 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:29:53,925-Speed 6314.95 samples/sec Loss 5.6413 LearningRate 0.0005 Epoch: 15 Global Step: 316220 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:29:57,176-Speed 6300.97 samples/sec Loss 5.6346 LearningRate 0.0005 Epoch: 15 Global Step: 316230 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:00,423-Speed 6308.12 samples/sec Loss 5.6077 LearningRate 0.0005 Epoch: 15 Global Step: 316240 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:03,667-Speed 6314.02 samples/sec Loss 5.6021 LearningRate 0.0005 Epoch: 15 Global Step: 316250 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:06,935-Speed 6269.82 samples/sec Loss 5.6163 LearningRate 0.0005 Epoch: 15 Global Step: 316260 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:10,188-Speed 6297.48 samples/sec Loss 5.6483 LearningRate 0.0005 Epoch: 15 Global Step: 316270 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 
20:30:13,433-Speed 6311.64 samples/sec Loss 5.6258 LearningRate 0.0005 Epoch: 15 Global Step: 316280 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:16,679-Speed 6310.97 samples/sec Loss 5.6554 LearningRate 0.0005 Epoch: 15 Global Step: 316290 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:19,928-Speed 6304.43 samples/sec Loss 5.6121 LearningRate 0.0005 Epoch: 15 Global Step: 316300 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:30:23,175-Speed 6309.82 samples/sec Loss 5.6418 LearningRate 0.0005 Epoch: 15 Global Step: 316310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:26,420-Speed 6311.63 samples/sec Loss 5.6043 LearningRate 0.0005 Epoch: 15 Global Step: 316320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:29,669-Speed 6305.40 samples/sec Loss 5.6582 LearningRate 0.0005 Epoch: 15 Global Step: 316330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:32,913-Speed 6315.14 samples/sec Loss 5.6433 LearningRate 0.0005 Epoch: 15 Global Step: 316340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:36,161-Speed 6307.52 samples/sec Loss 5.6458 LearningRate 0.0005 Epoch: 15 Global Step: 316350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:39,405-Speed 6313.56 samples/sec Loss 5.6624 LearningRate 0.0005 Epoch: 15 Global Step: 316360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:42,652-Speed 6310.19 samples/sec Loss 5.6362 LearningRate 0.0005 Epoch: 15 Global Step: 316370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:45,921-Speed 6264.85 samples/sec Loss 5.5964 LearningRate 0.0005 Epoch: 15 Global Step: 316380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:49,171-Speed 6303.94 samples/sec Loss 5.6303 LearningRate 0.0005 Epoch: 15 Global Step: 316390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:52,419-Speed 6306.52 
samples/sec Loss 5.5685 LearningRate 0.0005 Epoch: 15 Global Step: 316400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:55,650-Speed 6339.39 samples/sec Loss 5.6215 LearningRate 0.0005 Epoch: 15 Global Step: 316410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:30:58,896-Speed 6310.57 samples/sec Loss 5.6160 LearningRate 0.0005 Epoch: 15 Global Step: 316420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:02,143-Speed 6312.66 samples/sec Loss 5.6361 LearningRate 0.0005 Epoch: 15 Global Step: 316430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:05,389-Speed 6310.76 samples/sec Loss 5.6217 LearningRate 0.0005 Epoch: 15 Global Step: 316440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:08,635-Speed 6308.95 samples/sec Loss 5.7332 LearningRate 0.0005 Epoch: 15 Global Step: 316450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:11,881-Speed 6311.70 samples/sec Loss 5.5578 LearningRate 0.0005 Epoch: 15 Global Step: 316460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:15,130-Speed 6304.87 samples/sec Loss 5.6690 LearningRate 0.0005 Epoch: 15 Global Step: 316470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:18,375-Speed 6312.84 samples/sec Loss 5.5376 LearningRate 0.0005 Epoch: 15 Global Step: 316480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:21,625-Speed 6302.10 samples/sec Loss 5.6258 LearningRate 0.0005 Epoch: 15 Global Step: 316490 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:24,882-Speed 6290.63 samples/sec Loss 5.6075 LearningRate 0.0005 Epoch: 15 Global Step: 316500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:28,123-Speed 6320.53 samples/sec Loss 5.5583 LearningRate 0.0005 Epoch: 15 Global Step: 316510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:31,368-Speed 6312.79 samples/sec Loss 5.6125 
LearningRate 0.0005 Epoch: 15 Global Step: 316520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:34,651-Speed 6238.69 samples/sec Loss 5.6119 LearningRate 0.0005 Epoch: 15 Global Step: 316530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:37,895-Speed 6315.30 samples/sec Loss 5.6467 LearningRate 0.0005 Epoch: 15 Global Step: 316540 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:41,151-Speed 6291.27 samples/sec Loss 5.6210 LearningRate 0.0005 Epoch: 15 Global Step: 316550 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:44,402-Speed 6301.10 samples/sec Loss 5.6341 LearningRate 0.0005 Epoch: 15 Global Step: 316560 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:47,648-Speed 6310.78 samples/sec Loss 5.6382 LearningRate 0.0005 Epoch: 15 Global Step: 316570 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:50,896-Speed 6307.24 samples/sec Loss 5.6142 LearningRate 0.0005 Epoch: 15 Global Step: 316580 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:54,153-Speed 6288.51 samples/sec Loss 5.6813 LearningRate 0.0005 Epoch: 15 Global Step: 316590 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:31:57,400-Speed 6310.45 samples/sec Loss 5.5737 LearningRate 0.0005 Epoch: 15 Global Step: 316600 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:00,634-Speed 6333.72 samples/sec Loss 5.5821 LearningRate 0.0005 Epoch: 15 Global Step: 316610 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:03,879-Speed 6311.84 samples/sec Loss 5.6771 LearningRate 0.0005 Epoch: 15 Global Step: 316620 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:07,126-Speed 6308.44 samples/sec Loss 5.6488 LearningRate 0.0005 Epoch: 15 Global Step: 316630 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:10,372-Speed 6311.94 samples/sec Loss 5.6008 LearningRate 0.0005 Epoch: 15 
Global Step: 316640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:13,617-Speed 6312.46 samples/sec Loss 5.6329 LearningRate 0.0005 Epoch: 15 Global Step: 316650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:16,878-Speed 6280.78 samples/sec Loss 5.5773 LearningRate 0.0005 Epoch: 15 Global Step: 316660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:20,123-Speed 6312.51 samples/sec Loss 5.6044 LearningRate 0.0005 Epoch: 15 Global Step: 316670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:23,371-Speed 6307.61 samples/sec Loss 5.5738 LearningRate 0.0005 Epoch: 15 Global Step: 316680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:26,616-Speed 6312.09 samples/sec Loss 5.5358 LearningRate 0.0005 Epoch: 15 Global Step: 316690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:29,860-Speed 6314.63 samples/sec Loss 5.6183 LearningRate 0.0005 Epoch: 15 Global Step: 316700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:33,123-Speed 6278.02 samples/sec Loss 5.6275 LearningRate 0.0005 Epoch: 15 Global Step: 316710 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:32:36,354-Speed 6339.83 samples/sec Loss 5.6031 LearningRate 0.0005 Epoch: 15 Global Step: 316720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:39,602-Speed 6306.96 samples/sec Loss 5.6581 LearningRate 0.0005 Epoch: 15 Global Step: 316730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:42,859-Speed 6289.23 samples/sec Loss 5.6735 LearningRate 0.0005 Epoch: 15 Global Step: 316740 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:46,103-Speed 6314.69 samples/sec Loss 5.6691 LearningRate 0.0005 Epoch: 15 Global Step: 316750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:49,351-Speed 6308.28 samples/sec Loss 5.6506 LearningRate 0.0005 Epoch: 15 Global Step: 316760 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:32:52,580-Speed 6342.37 samples/sec Loss 5.6407 LearningRate 0.0005 Epoch: 15 Global Step: 316770 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:32:55,829-Speed 6305.11 samples/sec Loss 5.6104 LearningRate 0.0005 Epoch: 15 Global Step: 316780 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:32:59,073-Speed 6314.58 samples/sec Loss 5.6069 LearningRate 0.0005 Epoch: 15 Global Step: 316790 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:02,317-Speed 6315.15 samples/sec Loss 5.5675 LearningRate 0.0005 Epoch: 15 Global Step: 316800 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:05,563-Speed 6311.26 samples/sec Loss 5.6018 LearningRate 0.0005 Epoch: 15 Global Step: 316810 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:08,814-Speed 6301.23 samples/sec Loss 5.6260 LearningRate 0.0005 Epoch: 15 Global Step: 316820 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:12,060-Speed 6310.14 samples/sec Loss 5.6437 LearningRate 0.0005 Epoch: 15 Global Step: 316830 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:15,304-Speed 6314.41 samples/sec Loss 5.5646 LearningRate 0.0005 Epoch: 15 Global Step: 316840 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:18,551-Speed 6308.69 samples/sec Loss 5.6122 LearningRate 0.0005 Epoch: 15 Global Step: 316850 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:21,800-Speed 6304.82 samples/sec Loss 5.5816 LearningRate 0.0005 Epoch: 15 Global Step: 316860 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:33:25,049-Speed 6306.36 samples/sec Loss 5.5687 LearningRate 0.0005 Epoch: 15 Global Step: 316870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:28,293-Speed 6313.24 samples/sec Loss 5.5179 LearningRate 0.0005 Epoch: 15 Global Step: 316880 Fp16 Grad Scale: 32768 Required: 47 hours 
Training: 2022-04-01 20:33:31,537-Speed 6315.15 samples/sec Loss 5.6832 LearningRate 0.0005 Epoch: 15 Global Step: 316890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:34,781-Speed 6314.28 samples/sec Loss 5.6043 LearningRate 0.0005 Epoch: 15 Global Step: 316900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:38,026-Speed 6312.25 samples/sec Loss 5.6507 LearningRate 0.0005 Epoch: 15 Global Step: 316910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:41,272-Speed 6310.01 samples/sec Loss 5.5821 LearningRate 0.0005 Epoch: 15 Global Step: 316920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:44,516-Speed 6316.57 samples/sec Loss 5.6285 LearningRate 0.0005 Epoch: 15 Global Step: 316930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:47,762-Speed 6310.13 samples/sec Loss 5.5542 LearningRate 0.0005 Epoch: 15 Global Step: 316940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:51,010-Speed 6305.32 samples/sec Loss 5.6879 LearningRate 0.0005 Epoch: 15 Global Step: 316950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:54,260-Speed 6304.96 samples/sec Loss 5.7051 LearningRate 0.0005 Epoch: 15 Global Step: 316960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:33:57,494-Speed 6334.30 samples/sec Loss 5.6038 LearningRate 0.0005 Epoch: 15 Global Step: 316970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:00,738-Speed 6314.41 samples/sec Loss 5.5537 LearningRate 0.0005 Epoch: 15 Global Step: 316980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:03,986-Speed 6305.82 samples/sec Loss 5.6303 LearningRate 0.0005 Epoch: 15 Global Step: 316990 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:07,231-Speed 6313.04 samples/sec Loss 5.7052 LearningRate 0.0005 Epoch: 15 Global Step: 317000 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:34:10,474-Speed 6316.83 samples/sec Loss 5.5688 LearningRate 0.0005 Epoch: 15 Global Step: 317010 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:13,724-Speed 6302.71 samples/sec Loss 5.6319 LearningRate 0.0005 Epoch: 15 Global Step: 317020 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:16,969-Speed 6312.06 samples/sec Loss 5.6444 LearningRate 0.0005 Epoch: 15 Global Step: 317030 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:20,217-Speed 6307.64 samples/sec Loss 5.6786 LearningRate 0.0005 Epoch: 15 Global Step: 317040 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:23,465-Speed 6307.58 samples/sec Loss 5.5104 LearningRate 0.0005 Epoch: 15 Global Step: 317050 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:26,711-Speed 6310.36 samples/sec Loss 5.6024 LearningRate 0.0005 Epoch: 15 Global Step: 317060 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:29,952-Speed 6320.72 samples/sec Loss 5.6005 LearningRate 0.0005 Epoch: 15 Global Step: 317070 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:34:33,184-Speed 6337.70 samples/sec Loss 5.6218 LearningRate 0.0005 Epoch: 15 Global Step: 317080 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:36,429-Speed 6313.03 samples/sec Loss 5.6420 LearningRate 0.0005 Epoch: 15 Global Step: 317090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:39,680-Speed 6301.27 samples/sec Loss 5.6477 LearningRate 0.0005 Epoch: 15 Global Step: 317100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:42,921-Speed 6319.62 samples/sec Loss 5.6488 LearningRate 0.0005 Epoch: 15 Global Step: 317110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:46,169-Speed 6307.53 samples/sec Loss 5.5972 LearningRate 0.0005 Epoch: 15 Global Step: 317120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:49,417-Speed 6306.00 
samples/sec Loss 5.5700 LearningRate 0.0005 Epoch: 15 Global Step: 317130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:52,662-Speed 6312.30 samples/sec Loss 5.5507 LearningRate 0.0005 Epoch: 15 Global Step: 317140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:55,904-Speed 6319.01 samples/sec Loss 5.5697 LearningRate 0.0005 Epoch: 15 Global Step: 317150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:34:59,150-Speed 6310.79 samples/sec Loss 5.6110 LearningRate 0.0005 Epoch: 15 Global Step: 317160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:02,401-Speed 6302.67 samples/sec Loss 5.6215 LearningRate 0.0005 Epoch: 15 Global Step: 317170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:05,635-Speed 6334.30 samples/sec Loss 5.6708 LearningRate 0.0005 Epoch: 15 Global Step: 317180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:08,883-Speed 6305.52 samples/sec Loss 5.6288 LearningRate 0.0005 Epoch: 15 Global Step: 317190 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:12,137-Speed 6296.26 samples/sec Loss 5.5929 LearningRate 0.0005 Epoch: 15 Global Step: 317200 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:15,384-Speed 6309.47 samples/sec Loss 5.6343 LearningRate 0.0005 Epoch: 15 Global Step: 317210 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:18,629-Speed 6310.79 samples/sec Loss 5.6725 LearningRate 0.0005 Epoch: 15 Global Step: 317220 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:21,879-Speed 6303.99 samples/sec Loss 5.6524 LearningRate 0.0005 Epoch: 15 Global Step: 317230 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:25,135-Speed 6292.00 samples/sec Loss 5.6371 LearningRate 0.0005 Epoch: 15 Global Step: 317240 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:28,381-Speed 6309.52 samples/sec Loss 5.6835 
LearningRate 0.0005 Epoch: 15 Global Step: 317250 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:31,629-Speed 6306.90 samples/sec Loss 5.6870 LearningRate 0.0005 Epoch: 15 Global Step: 317260 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:34,875-Speed 6311.11 samples/sec Loss 5.6242 LearningRate 0.0005 Epoch: 15 Global Step: 317270 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:38,110-Speed 6332.16 samples/sec Loss 5.6330 LearningRate 0.0005 Epoch: 15 Global Step: 317280 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:41,358-Speed 6305.90 samples/sec Loss 5.6306 LearningRate 0.0005 Epoch: 15 Global Step: 317290 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:44,608-Speed 6304.11 samples/sec Loss 5.5444 LearningRate 0.0005 Epoch: 15 Global Step: 317300 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:47,856-Speed 6306.50 samples/sec Loss 5.5700 LearningRate 0.0005 Epoch: 15 Global Step: 317310 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:51,104-Speed 6307.11 samples/sec Loss 5.6315 LearningRate 0.0005 Epoch: 15 Global Step: 317320 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:54,356-Speed 6299.16 samples/sec Loss 5.6124 LearningRate 0.0005 Epoch: 15 Global Step: 317330 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:35:57,601-Speed 6312.15 samples/sec Loss 5.6573 LearningRate 0.0005 Epoch: 15 Global Step: 317340 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:00,848-Speed 6308.79 samples/sec Loss 5.6044 LearningRate 0.0005 Epoch: 15 Global Step: 317350 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:04,095-Speed 6308.18 samples/sec Loss 5.5857 LearningRate 0.0005 Epoch: 15 Global Step: 317360 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:07,346-Speed 6300.87 samples/sec Loss 5.6575 LearningRate 0.0005 Epoch: 15 
Global Step: 317370 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:10,581-Speed 6333.68 samples/sec Loss 5.5904 LearningRate 0.0005 Epoch: 15 Global Step: 317380 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:13,827-Speed 6311.54 samples/sec Loss 5.5456 LearningRate 0.0005 Epoch: 15 Global Step: 317390 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:17,075-Speed 6306.32 samples/sec Loss 5.6120 LearningRate 0.0005 Epoch: 15 Global Step: 317400 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:20,327-Speed 6299.90 samples/sec Loss 5.6255 LearningRate 0.0005 Epoch: 15 Global Step: 317410 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:23,570-Speed 6315.11 samples/sec Loss 5.6351 LearningRate 0.0005 Epoch: 15 Global Step: 317420 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:26,818-Speed 6308.26 samples/sec Loss 5.6368 LearningRate 0.0005 Epoch: 15 Global Step: 317430 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:30,062-Speed 6314.39 samples/sec Loss 5.6282 LearningRate 0.0005 Epoch: 15 Global Step: 317440 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:33,307-Speed 6312.84 samples/sec Loss 5.5207 LearningRate 0.0005 Epoch: 15 Global Step: 317450 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:36,551-Speed 6313.81 samples/sec Loss 5.6831 LearningRate 0.0005 Epoch: 15 Global Step: 317460 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:39,798-Speed 6309.24 samples/sec Loss 5.6550 LearningRate 0.0005 Epoch: 15 Global Step: 317470 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:43,030-Speed 6337.57 samples/sec Loss 5.5783 LearningRate 0.0005 Epoch: 15 Global Step: 317480 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:46,278-Speed 6308.21 samples/sec Loss 5.6521 LearningRate 0.0005 Epoch: 15 Global Step: 317490 Fp16 Grad 
Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:49,522-Speed 6313.21 samples/sec Loss 5.6436 LearningRate 0.0005 Epoch: 15 Global Step: 317500 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:52,765-Speed 6317.07 samples/sec Loss 5.6174 LearningRate 0.0005 Epoch: 15 Global Step: 317510 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:56,014-Speed 6304.47 samples/sec Loss 5.5752 LearningRate 0.0005 Epoch: 15 Global Step: 317520 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:36:59,264-Speed 6302.39 samples/sec Loss 5.5381 LearningRate 0.0005 Epoch: 15 Global Step: 317530 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:02,503-Speed 6324.89 samples/sec Loss 5.4653 LearningRate 0.0005 Epoch: 15 Global Step: 317540 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:05,750-Speed 6308.84 samples/sec Loss 5.5748 LearningRate 0.0005 Epoch: 15 Global Step: 317550 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:08,992-Speed 6318.87 samples/sec Loss 5.4850 LearningRate 0.0005 Epoch: 15 Global Step: 317560 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:12,237-Speed 6312.90 samples/sec Loss 5.6085 LearningRate 0.0005 Epoch: 15 Global Step: 317570 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:15,486-Speed 6303.89 samples/sec Loss 5.5260 LearningRate 0.0005 Epoch: 15 Global Step: 317580 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:18,736-Speed 6302.84 samples/sec Loss 5.6054 LearningRate 0.0005 Epoch: 15 Global Step: 317590 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:21,989-Speed 6297.56 samples/sec Loss 5.5984 LearningRate 0.0005 Epoch: 15 Global Step: 317600 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:25,236-Speed 6309.03 samples/sec Loss 5.6608 LearningRate 0.0005 Epoch: 15 Global Step: 317610 Fp16 Grad Scale: 16384 Required: 47 hours 
Training: 2022-04-01 20:37:28,487-Speed 6303.53 samples/sec Loss 5.6257 LearningRate 0.0005 Epoch: 15 Global Step: 317620 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:31,732-Speed 6313.38 samples/sec Loss 5.5621 LearningRate 0.0005 Epoch: 15 Global Step: 317630 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:37:34,977-Speed 6313.43 samples/sec Loss 5.5409 LearningRate 0.0005 Epoch: 15 Global Step: 317640 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:38,220-Speed 6315.65 samples/sec Loss 5.6087 LearningRate 0.0005 Epoch: 15 Global Step: 317650 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:41,464-Speed 6314.05 samples/sec Loss 5.6651 LearningRate 0.0005 Epoch: 15 Global Step: 317660 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:44,716-Speed 6299.94 samples/sec Loss 5.6987 LearningRate 0.0005 Epoch: 15 Global Step: 317670 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:47,962-Speed 6313.44 samples/sec Loss 5.6595 LearningRate 0.0005 Epoch: 15 Global Step: 317680 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:51,213-Speed 6300.09 samples/sec Loss 5.6196 LearningRate 0.0005 Epoch: 15 Global Step: 317690 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:54,458-Speed 6312.44 samples/sec Loss 5.5926 LearningRate 0.0005 Epoch: 15 Global Step: 317700 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:37:57,703-Speed 6313.03 samples/sec Loss 5.5232 LearningRate 0.0005 Epoch: 15 Global Step: 317710 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:00,961-Speed 6287.18 samples/sec Loss 5.6681 LearningRate 0.0005 Epoch: 15 Global Step: 317720 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:04,316-Speed 6105.41 samples/sec Loss 5.5838 LearningRate 0.0005 Epoch: 15 Global Step: 317730 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 
20:38:07,574-Speed 6288.02 samples/sec Loss 5.5814 LearningRate 0.0005 Epoch: 15 Global Step: 317740 Fp16 Grad Scale: 65536 Required: 47 hours Training: 2022-04-01 20:38:10,810-Speed 6329.60 samples/sec Loss 5.5949 LearningRate 0.0005 Epoch: 15 Global Step: 317750 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:14,056-Speed 6310.67 samples/sec Loss 5.5627 LearningRate 0.0005 Epoch: 15 Global Step: 317760 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:17,306-Speed 6303.37 samples/sec Loss 5.6015 LearningRate 0.0005 Epoch: 15 Global Step: 317770 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:20,553-Speed 6308.30 samples/sec Loss 5.5396 LearningRate 0.0005 Epoch: 15 Global Step: 317780 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:23,804-Speed 6302.41 samples/sec Loss 5.6748 LearningRate 0.0005 Epoch: 15 Global Step: 317790 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:27,063-Speed 6284.51 samples/sec Loss 5.6067 LearningRate 0.0005 Epoch: 15 Global Step: 317800 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:30,312-Speed 6306.19 samples/sec Loss 5.5978 LearningRate 0.0005 Epoch: 15 Global Step: 317810 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:33,561-Speed 6304.80 samples/sec Loss 5.5599 LearningRate 0.0005 Epoch: 15 Global Step: 317820 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:36,805-Speed 6314.12 samples/sec Loss 5.5800 LearningRate 0.0005 Epoch: 15 Global Step: 317830 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:40,060-Speed 6294.40 samples/sec Loss 5.5782 LearningRate 0.0005 Epoch: 15 Global Step: 317840 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:43,295-Speed 6332.24 samples/sec Loss 5.5750 LearningRate 0.0005 Epoch: 15 Global Step: 317850 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:46,541-Speed 6311.73 
samples/sec Loss 5.6422 LearningRate 0.0005 Epoch: 15 Global Step: 317860 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:49,804-Speed 6277.23 samples/sec Loss 5.6460 LearningRate 0.0005 Epoch: 15 Global Step: 317870 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:53,054-Speed 6302.75 samples/sec Loss 5.5813 LearningRate 0.0005 Epoch: 15 Global Step: 317880 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:56,299-Speed 6312.09 samples/sec Loss 5.6386 LearningRate 0.0005 Epoch: 15 Global Step: 317890 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:38:59,546-Speed 6309.72 samples/sec Loss 5.5688 LearningRate 0.0005 Epoch: 15 Global Step: 317900 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:02,792-Speed 6310.64 samples/sec Loss 5.5898 LearningRate 0.0005 Epoch: 15 Global Step: 317910 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:06,038-Speed 6310.42 samples/sec Loss 5.6360 LearningRate 0.0005 Epoch: 15 Global Step: 317920 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:09,280-Speed 6318.86 samples/sec Loss 5.6451 LearningRate 0.0005 Epoch: 15 Global Step: 317930 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:12,528-Speed 6306.31 samples/sec Loss 5.5868 LearningRate 0.0005 Epoch: 15 Global Step: 317940 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:15,757-Speed 6344.94 samples/sec Loss 5.6221 LearningRate 0.0005 Epoch: 15 Global Step: 317950 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:19,012-Speed 6292.75 samples/sec Loss 5.5774 LearningRate 0.0005 Epoch: 15 Global Step: 317960 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:22,272-Speed 6283.87 samples/sec Loss 5.5589 LearningRate 0.0005 Epoch: 15 Global Step: 317970 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:25,519-Speed 6308.22 samples/sec Loss 5.6092 
LearningRate 0.0005 Epoch: 15 Global Step: 317980 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:39:28,755-Speed 6331.56 samples/sec Loss 5.6645 LearningRate 0.0005 Epoch: 15 Global Step: 317990 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:32,001-Speed 6309.74 samples/sec Loss 5.5747 LearningRate 0.0005 Epoch: 15 Global Step: 318000 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:35,246-Speed 6312.24 samples/sec Loss 5.5720 LearningRate 0.0005 Epoch: 15 Global Step: 318010 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:38,490-Speed 6316.32 samples/sec Loss 5.5564 LearningRate 0.0005 Epoch: 15 Global Step: 318020 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:41,733-Speed 6315.68 samples/sec Loss 5.5915 LearningRate 0.0005 Epoch: 15 Global Step: 318030 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:44,979-Speed 6310.86 samples/sec Loss 5.6086 LearningRate 0.0005 Epoch: 15 Global Step: 318040 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:48,223-Speed 6314.19 samples/sec Loss 5.5799 LearningRate 0.0005 Epoch: 15 Global Step: 318050 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:51,471-Speed 6308.56 samples/sec Loss 5.5777 LearningRate 0.0005 Epoch: 15 Global Step: 318060 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:54,715-Speed 6314.95 samples/sec Loss 5.6118 LearningRate 0.0005 Epoch: 15 Global Step: 318070 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:39:57,964-Speed 6304.41 samples/sec Loss 5.5728 LearningRate 0.0005 Epoch: 15 Global Step: 318080 Fp16 Grad Scale: 16384 Required: 47 hours Training: 2022-04-01 20:40:01,208-Speed 6314.61 samples/sec Loss 5.6224 LearningRate 0.0005 Epoch: 15 Global Step: 318090 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:04,454-Speed 6310.65 samples/sec Loss 5.6568 LearningRate 0.0005 Epoch: 15 
Global Step: 318100 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:07,698-Speed 6313.53 samples/sec Loss 5.6215 LearningRate 0.0005 Epoch: 15 Global Step: 318110 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:10,942-Speed 6314.23 samples/sec Loss 5.5869 LearningRate 0.0005 Epoch: 15 Global Step: 318120 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:14,186-Speed 6315.66 samples/sec Loss 5.6147 LearningRate 0.0005 Epoch: 15 Global Step: 318130 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:17,430-Speed 6315.14 samples/sec Loss 5.5732 LearningRate 0.0005 Epoch: 15 Global Step: 318140 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:20,676-Speed 6309.31 samples/sec Loss 5.6357 LearningRate 0.0005 Epoch: 15 Global Step: 318150 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:23,918-Speed 6318.88 samples/sec Loss 5.5083 LearningRate 0.0005 Epoch: 15 Global Step: 318160 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:27,165-Speed 6310.18 samples/sec Loss 5.5951 LearningRate 0.0005 Epoch: 15 Global Step: 318170 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:30,409-Speed 6313.26 samples/sec Loss 5.5904 LearningRate 0.0005 Epoch: 15 Global Step: 318180 Fp16 Grad Scale: 32768 Required: 47 hours Training: 2022-04-01 20:40:33,645-Speed 6330.96 samples/sec Loss 5.5799 LearningRate 0.0005 Epoch: 15 Global Step: 318190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:36,896-Speed 6300.26 samples/sec Loss 5.6418 LearningRate 0.0005 Epoch: 15 Global Step: 318200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:40,139-Speed 6316.42 samples/sec Loss 5.6259 LearningRate 0.0005 Epoch: 15 Global Step: 318210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:43,387-Speed 6307.69 samples/sec Loss 5.6725 LearningRate 0.0005 Epoch: 15 Global Step: 318220 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:46,637-Speed 6303.78 samples/sec Loss 5.5920 LearningRate 0.0005 Epoch: 15 Global Step: 318230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:49,883-Speed 6311.07 samples/sec Loss 5.5771 LearningRate 0.0005 Epoch: 15 Global Step: 318240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:53,127-Speed 6313.59 samples/sec Loss 5.6100 LearningRate 0.0005 Epoch: 15 Global Step: 318250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:56,378-Speed 6301.56 samples/sec Loss 5.6007 LearningRate 0.0005 Epoch: 15 Global Step: 318260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:40:59,622-Speed 6313.97 samples/sec Loss 5.5521 LearningRate 0.0005 Epoch: 15 Global Step: 318270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:02,870-Speed 6306.48 samples/sec Loss 5.6453 LearningRate 0.0005 Epoch: 15 Global Step: 318280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:06,101-Speed 6340.24 samples/sec Loss 5.6256 LearningRate 0.0005 Epoch: 15 Global Step: 318290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:09,353-Speed 6298.97 samples/sec Loss 5.6111 LearningRate 0.0005 Epoch: 15 Global Step: 318300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:12,603-Speed 6304.09 samples/sec Loss 5.5520 LearningRate 0.0005 Epoch: 15 Global Step: 318310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:15,849-Speed 6309.44 samples/sec Loss 5.6687 LearningRate 0.0005 Epoch: 15 Global Step: 318320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:19,093-Speed 6315.10 samples/sec Loss 5.6365 LearningRate 0.0005 Epoch: 15 Global Step: 318330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:22,341-Speed 6306.31 samples/sec Loss 5.6784 LearningRate 0.0005 Epoch: 15 Global Step: 318340 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 20:41:25,597-Speed 6292.73 samples/sec Loss 5.6209 LearningRate 0.0005 Epoch: 15 Global Step: 318350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:28,856-Speed 6284.82 samples/sec Loss 5.6157 LearningRate 0.0005 Epoch: 15 Global Step: 318360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:41:32,095-Speed 6325.29 samples/sec Loss 5.6131 LearningRate 0.0005 Epoch: 15 Global Step: 318370 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:35,345-Speed 6302.12 samples/sec Loss 5.6655 LearningRate 0.0005 Epoch: 15 Global Step: 318380 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:38,622-Speed 6252.01 samples/sec Loss 5.5922 LearningRate 0.0005 Epoch: 15 Global Step: 318390 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:41,900-Speed 6249.09 samples/sec Loss 5.6538 LearningRate 0.0005 Epoch: 15 Global Step: 318400 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:45,145-Speed 6312.08 samples/sec Loss 5.5841 LearningRate 0.0005 Epoch: 15 Global Step: 318410 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:48,391-Speed 6310.63 samples/sec Loss 5.6321 LearningRate 0.0005 Epoch: 15 Global Step: 318420 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:51,634-Speed 6316.04 samples/sec Loss 5.6608 LearningRate 0.0005 Epoch: 15 Global Step: 318430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:54,880-Speed 6310.59 samples/sec Loss 5.5601 LearningRate 0.0005 Epoch: 15 Global Step: 318440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:41:58,128-Speed 6308.57 samples/sec Loss 5.6042 LearningRate 0.0005 Epoch: 15 Global Step: 318450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:42:01,376-Speed 6307.39 samples/sec Loss 5.6006 LearningRate 0.0005 Epoch: 15 Global Step: 318460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 
20:42:04,622-Speed 6310.06 samples/sec Loss 5.6064 LearningRate 0.0005 Epoch: 15 Global Step: 318470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:07,868-Speed 6310.55 samples/sec Loss 5.5945 LearningRate 0.0005 Epoch: 15 Global Step: 318480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:11,112-Speed 6315.11 samples/sec Loss 5.6401 LearningRate 0.0005 Epoch: 15 Global Step: 318490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:14,358-Speed 6310.90 samples/sec Loss 5.5689 LearningRate 0.0005 Epoch: 15 Global Step: 318500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:17,610-Speed 6297.86 samples/sec Loss 5.6165 LearningRate 0.0005 Epoch: 15 Global Step: 318510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:20,857-Speed 6309.59 samples/sec Loss 5.6189 LearningRate 0.0005 Epoch: 15 Global Step: 318520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:24,105-Speed 6306.44 samples/sec Loss 5.6702 LearningRate 0.0005 Epoch: 15 Global Step: 318530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:27,359-Speed 6294.91 samples/sec Loss 5.5873 LearningRate 0.0005 Epoch: 15 Global Step: 318540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:30,602-Speed 6316.57 samples/sec Loss 5.5765 LearningRate 0.0005 Epoch: 15 Global Step: 318550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:33,851-Speed 6305.28 samples/sec Loss 5.5896 LearningRate 0.0005 Epoch: 15 Global Step: 318560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:37,083-Speed 6338.82 samples/sec Loss 5.6591 LearningRate 0.0005 Epoch: 15 Global Step: 318570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:40,335-Speed 6298.06 samples/sec Loss 5.5753 LearningRate 0.0005 Epoch: 15 Global Step: 318580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:43,588-Speed 6296.80 
samples/sec Loss 5.6000 LearningRate 0.0005 Epoch: 15 Global Step: 318590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:46,832-Speed 6315.86 samples/sec Loss 5.6039 LearningRate 0.0005 Epoch: 15 Global Step: 318600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:50,077-Speed 6313.01 samples/sec Loss 5.5431 LearningRate 0.0005 Epoch: 15 Global Step: 318610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:53,323-Speed 6310.15 samples/sec Loss 5.5866 LearningRate 0.0005 Epoch: 15 Global Step: 318620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:56,571-Speed 6305.95 samples/sec Loss 5.7079 LearningRate 0.0005 Epoch: 15 Global Step: 318630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:42:59,815-Speed 6315.68 samples/sec Loss 5.5045 LearningRate 0.0005 Epoch: 15 Global Step: 318640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:03,062-Speed 6307.51 samples/sec Loss 5.5495 LearningRate 0.0005 Epoch: 15 Global Step: 318650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:06,305-Speed 6317.55 samples/sec Loss 5.5635 LearningRate 0.0005 Epoch: 15 Global Step: 318660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:09,548-Speed 6316.93 samples/sec Loss 5.5529 LearningRate 0.0005 Epoch: 15 Global Step: 318670 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 20:43:12,782-Speed 6334.10 samples/sec Loss 5.6065 LearningRate 0.0005 Epoch: 15 Global Step: 318680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:16,024-Speed 6319.30 samples/sec Loss 5.6070 LearningRate 0.0005 Epoch: 15 Global Step: 318690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:19,272-Speed 6306.37 samples/sec Loss 5.5771 LearningRate 0.0005 Epoch: 15 Global Step: 318700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:22,516-Speed 6314.61 samples/sec Loss 5.6063 
LearningRate 0.0005 Epoch: 15 Global Step: 318710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:25,761-Speed 6312.96 samples/sec Loss 5.6118 LearningRate 0.0005 Epoch: 15 Global Step: 318720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:29,008-Speed 6308.08 samples/sec Loss 5.5311 LearningRate 0.0005 Epoch: 15 Global Step: 318730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:32,252-Speed 6315.62 samples/sec Loss 5.5958 LearningRate 0.0005 Epoch: 15 Global Step: 318740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:35,497-Speed 6312.38 samples/sec Loss 5.5992 LearningRate 0.0005 Epoch: 15 Global Step: 318750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:38,742-Speed 6312.67 samples/sec Loss 5.5691 LearningRate 0.0005 Epoch: 15 Global Step: 318760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:41,988-Speed 6309.95 samples/sec Loss 5.5662 LearningRate 0.0005 Epoch: 15 Global Step: 318770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:45,220-Speed 6337.50 samples/sec Loss 5.5853 LearningRate 0.0005 Epoch: 15 Global Step: 318780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:48,464-Speed 6314.78 samples/sec Loss 5.6258 LearningRate 0.0005 Epoch: 15 Global Step: 318790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:51,711-Speed 6309.30 samples/sec Loss 5.5583 LearningRate 0.0005 Epoch: 15 Global Step: 318800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:54,956-Speed 6313.34 samples/sec Loss 5.6569 LearningRate 0.0005 Epoch: 15 Global Step: 318810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:43:58,205-Speed 6304.45 samples/sec Loss 5.5568 LearningRate 0.0005 Epoch: 15 Global Step: 318820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:01,450-Speed 6313.44 samples/sec Loss 5.5605 LearningRate 0.0005 Epoch: 15 
Global Step: 318830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:04,698-Speed 6305.11 samples/sec Loss 5.6476 LearningRate 0.0005 Epoch: 15 Global Step: 318840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:07,947-Speed 6305.93 samples/sec Loss 5.5566 LearningRate 0.0005 Epoch: 15 Global Step: 318850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:11,191-Speed 6314.80 samples/sec Loss 5.5970 LearningRate 0.0005 Epoch: 15 Global Step: 318860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:14,438-Speed 6308.45 samples/sec Loss 5.5517 LearningRate 0.0005 Epoch: 15 Global Step: 318870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:17,669-Speed 6339.27 samples/sec Loss 5.4975 LearningRate 0.0005 Epoch: 15 Global Step: 318880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:20,915-Speed 6311.15 samples/sec Loss 5.5280 LearningRate 0.0005 Epoch: 15 Global Step: 318890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:24,167-Speed 6298.99 samples/sec Loss 5.5721 LearningRate 0.0005 Epoch: 15 Global Step: 318900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:27,415-Speed 6308.74 samples/sec Loss 5.6025 LearningRate 0.0005 Epoch: 15 Global Step: 318910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:30,665-Speed 6301.34 samples/sec Loss 5.6600 LearningRate 0.0005 Epoch: 15 Global Step: 318920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:33,912-Speed 6309.87 samples/sec Loss 5.5761 LearningRate 0.0005 Epoch: 15 Global Step: 318930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:37,160-Speed 6307.02 samples/sec Loss 5.6054 LearningRate 0.0005 Epoch: 15 Global Step: 318940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:40,405-Speed 6311.45 samples/sec Loss 5.6636 LearningRate 0.0005 Epoch: 15 Global Step: 318950 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:43,650-Speed 6314.01 samples/sec Loss 5.5663 LearningRate 0.0005 Epoch: 15 Global Step: 318960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:46,896-Speed 6309.70 samples/sec Loss 5.5818 LearningRate 0.0005 Epoch: 15 Global Step: 318970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:50,129-Speed 6338.86 samples/sec Loss 5.5635 LearningRate 0.0005 Epoch: 15 Global Step: 318980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:53,377-Speed 6305.57 samples/sec Loss 5.5560 LearningRate 0.0005 Epoch: 15 Global Step: 318990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:56,625-Speed 6307.54 samples/sec Loss 5.5826 LearningRate 0.0005 Epoch: 15 Global Step: 319000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:44:59,870-Speed 6312.32 samples/sec Loss 5.5375 LearningRate 0.0005 Epoch: 15 Global Step: 319010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:03,117-Speed 6312.39 samples/sec Loss 5.6841 LearningRate 0.0005 Epoch: 15 Global Step: 319020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:06,370-Speed 6295.84 samples/sec Loss 5.5730 LearningRate 0.0005 Epoch: 15 Global Step: 319030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:09,615-Speed 6312.34 samples/sec Loss 5.7002 LearningRate 0.0005 Epoch: 15 Global Step: 319040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:12,861-Speed 6311.80 samples/sec Loss 5.5419 LearningRate 0.0005 Epoch: 15 Global Step: 319050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:16,109-Speed 6306.98 samples/sec Loss 5.5735 LearningRate 0.0005 Epoch: 15 Global Step: 319060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:19,352-Speed 6315.85 samples/sec Loss 5.5889 LearningRate 0.0005 Epoch: 15 Global Step: 319070 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 20:45:22,596-Speed 6315.47 samples/sec Loss 5.6103 LearningRate 0.0005 Epoch: 15 Global Step: 319080 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 20:45:25,830-Speed 6334.24 samples/sec Loss 5.5936 LearningRate 0.0005 Epoch: 15 Global Step: 319090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:29,075-Speed 6313.43 samples/sec Loss 5.6560 LearningRate 0.0005 Epoch: 15 Global Step: 319100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:32,325-Speed 6302.94 samples/sec Loss 5.5855 LearningRate 0.0005 Epoch: 15 Global Step: 319110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:35,567-Speed 6318.37 samples/sec Loss 5.6178 LearningRate 0.0005 Epoch: 15 Global Step: 319120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:38,812-Speed 6312.50 samples/sec Loss 5.6409 LearningRate 0.0005 Epoch: 15 Global Step: 319130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:42,058-Speed 6310.22 samples/sec Loss 5.6546 LearningRate 0.0005 Epoch: 15 Global Step: 319140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:45,303-Speed 6312.34 samples/sec Loss 5.5285 LearningRate 0.0005 Epoch: 15 Global Step: 319150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:48,548-Speed 6312.45 samples/sec Loss 5.6880 LearningRate 0.0005 Epoch: 15 Global Step: 319160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:51,793-Speed 6313.64 samples/sec Loss 5.5634 LearningRate 0.0005 Epoch: 15 Global Step: 319170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:55,042-Speed 6304.51 samples/sec Loss 5.6896 LearningRate 0.0005 Epoch: 15 Global Step: 319180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:45:58,275-Speed 6336.61 samples/sec Loss 5.6005 LearningRate 0.0005 Epoch: 15 Global Step: 319190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
20:46:01,520-Speed 6312.95 samples/sec Loss 5.5957 LearningRate 0.0005 Epoch: 15 Global Step: 319200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:04,765-Speed 6312.56 samples/sec Loss 5.6178 LearningRate 0.0005 Epoch: 15 Global Step: 319210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:08,009-Speed 6314.54 samples/sec Loss 5.6221 LearningRate 0.0005 Epoch: 15 Global Step: 319220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:11,255-Speed 6309.76 samples/sec Loss 5.6059 LearningRate 0.0005 Epoch: 15 Global Step: 319230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:14,503-Speed 6308.12 samples/sec Loss 5.6238 LearningRate 0.0005 Epoch: 15 Global Step: 319240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:17,762-Speed 6285.16 samples/sec Loss 5.5591 LearningRate 0.0005 Epoch: 15 Global Step: 319250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:21,009-Speed 6309.17 samples/sec Loss 5.6749 LearningRate 0.0005 Epoch: 15 Global Step: 319260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:24,258-Speed 6303.66 samples/sec Loss 5.6020 LearningRate 0.0005 Epoch: 15 Global Step: 319270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:27,505-Speed 6308.68 samples/sec Loss 5.6006 LearningRate 0.0005 Epoch: 15 Global Step: 319280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:30,736-Speed 6340.17 samples/sec Loss 5.5621 LearningRate 0.0005 Epoch: 15 Global Step: 319290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:33,984-Speed 6307.35 samples/sec Loss 5.6187 LearningRate 0.0005 Epoch: 15 Global Step: 319300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:37,230-Speed 6312.37 samples/sec Loss 5.5502 LearningRate 0.0005 Epoch: 15 Global Step: 319310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:40,476-Speed 6310.66 
samples/sec Loss 5.6069 LearningRate 0.0005 Epoch: 15 Global Step: 319320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:43,723-Speed 6308.04 samples/sec Loss 5.5333 LearningRate 0.0005 Epoch: 15 Global Step: 319330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:46,967-Speed 6314.33 samples/sec Loss 5.6557 LearningRate 0.0005 Epoch: 15 Global Step: 319340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:50,211-Speed 6314.65 samples/sec Loss 5.5596 LearningRate 0.0005 Epoch: 15 Global Step: 319350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:53,457-Speed 6310.19 samples/sec Loss 5.5415 LearningRate 0.0005 Epoch: 15 Global Step: 319360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:56,700-Speed 6317.30 samples/sec Loss 5.6414 LearningRate 0.0005 Epoch: 15 Global Step: 319370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:46:59,947-Speed 6311.31 samples/sec Loss 5.5910 LearningRate 0.0005 Epoch: 15 Global Step: 319380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:03,174-Speed 6348.75 samples/sec Loss 5.6404 LearningRate 0.0005 Epoch: 15 Global Step: 319390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:06,420-Speed 6310.28 samples/sec Loss 5.6435 LearningRate 0.0005 Epoch: 15 Global Step: 319400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:09,661-Speed 6319.26 samples/sec Loss 5.5838 LearningRate 0.0005 Epoch: 15 Global Step: 319410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:12,912-Speed 6301.11 samples/sec Loss 5.5587 LearningRate 0.0005 Epoch: 15 Global Step: 319420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:16,160-Speed 6307.39 samples/sec Loss 5.5688 LearningRate 0.0005 Epoch: 15 Global Step: 319430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:19,407-Speed 6309.69 samples/sec Loss 5.5327 
LearningRate 0.0005 Epoch: 15 Global Step: 319440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:22,661-Speed 6293.18 samples/sec Loss 5.5799 LearningRate 0.0005 Epoch: 15 Global Step: 319450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:25,907-Speed 6311.43 samples/sec Loss 5.5415 LearningRate 0.0005 Epoch: 15 Global Step: 319460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:29,159-Speed 6299.54 samples/sec Loss 5.5711 LearningRate 0.0005 Epoch: 15 Global Step: 319470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:32,402-Speed 6317.21 samples/sec Loss 5.6284 LearningRate 0.0005 Epoch: 15 Global Step: 319480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:35,633-Speed 6338.72 samples/sec Loss 5.5745 LearningRate 0.0005 Epoch: 15 Global Step: 319490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:47:38,867-Speed 6335.37 samples/sec Loss 5.6115 LearningRate 0.0005 Epoch: 15 Global Step: 319500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:42,118-Speed 6299.06 samples/sec Loss 5.5607 LearningRate 0.0005 Epoch: 15 Global Step: 319510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:45,367-Speed 6305.49 samples/sec Loss 5.5838 LearningRate 0.0005 Epoch: 15 Global Step: 319520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:48,680-Speed 6184.82 samples/sec Loss 5.5770 LearningRate 0.0005 Epoch: 15 Global Step: 319530 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:51,956-Speed 6252.67 samples/sec Loss 5.5905 LearningRate 0.0005 Epoch: 15 Global Step: 319540 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:55,202-Speed 6310.07 samples/sec Loss 5.5336 LearningRate 0.0005 Epoch: 15 Global Step: 319550 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:47:58,448-Speed 6311.83 samples/sec Loss 5.5921 LearningRate 0.0005 Epoch: 15 
Global Step: 319560 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:48:01,694-Speed 6309.63 samples/sec Loss 5.5755 LearningRate 0.0005 Epoch: 15 Global Step: 319570 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:48:04,952-Speed 6288.51 samples/sec Loss 5.6117 LearningRate 0.0005 Epoch: 15 Global Step: 319580 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:48:08,203-Speed 6301.64 samples/sec Loss 5.6152 LearningRate 0.0005 Epoch: 15 Global Step: 319590 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:48:11,449-Speed 6309.64 samples/sec Loss 5.5852 LearningRate 0.0005 Epoch: 15 Global Step: 319600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:14,698-Speed 6305.39 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 15 Global Step: 319610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:17,944-Speed 6310.26 samples/sec Loss 5.6259 LearningRate 0.0005 Epoch: 15 Global Step: 319620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:21,194-Speed 6302.88 samples/sec Loss 5.5292 LearningRate 0.0005 Epoch: 15 Global Step: 319630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:24,445-Speed 6301.19 samples/sec Loss 5.5676 LearningRate 0.0005 Epoch: 15 Global Step: 319640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:27,694-Speed 6303.93 samples/sec Loss 5.6647 LearningRate 0.0005 Epoch: 15 Global Step: 319650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:30,945-Speed 6302.36 samples/sec Loss 5.5795 LearningRate 0.0005 Epoch: 15 Global Step: 319660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:34,200-Speed 6293.58 samples/sec Loss 5.5204 LearningRate 0.0005 Epoch: 15 Global Step: 319670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:37,443-Speed 6315.97 samples/sec Loss 5.6115 LearningRate 0.0005 Epoch: 15 Global Step: 319680 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:40,690-Speed 6307.57 samples/sec Loss 5.5460 LearningRate 0.0005 Epoch: 15 Global Step: 319690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:43,919-Speed 6344.51 samples/sec Loss 5.5618 LearningRate 0.0005 Epoch: 15 Global Step: 319700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:47,165-Speed 6310.09 samples/sec Loss 5.5944 LearningRate 0.0005 Epoch: 15 Global Step: 319710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:50,414-Speed 6305.25 samples/sec Loss 5.6225 LearningRate 0.0005 Epoch: 15 Global Step: 319720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:53,660-Speed 6312.51 samples/sec Loss 5.5867 LearningRate 0.0005 Epoch: 15 Global Step: 319730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:48:56,906-Speed 6310.60 samples/sec Loss 5.5676 LearningRate 0.0005 Epoch: 15 Global Step: 319740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:00,171-Speed 6273.48 samples/sec Loss 5.5547 LearningRate 0.0005 Epoch: 15 Global Step: 319750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:03,420-Speed 6305.10 samples/sec Loss 5.6147 LearningRate 0.0005 Epoch: 15 Global Step: 319760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:06,670-Speed 6303.74 samples/sec Loss 5.5960 LearningRate 0.0005 Epoch: 15 Global Step: 319770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:09,918-Speed 6305.49 samples/sec Loss 5.6465 LearningRate 0.0005 Epoch: 15 Global Step: 319780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:13,165-Speed 6309.00 samples/sec Loss 5.6277 LearningRate 0.0005 Epoch: 15 Global Step: 319790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:16,397-Speed 6339.03 samples/sec Loss 5.5531 LearningRate 0.0005 Epoch: 15 Global Step: 319800 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 20:49:19,644-Speed 6308.19 samples/sec Loss 5.5601 LearningRate 0.0005 Epoch: 15 Global Step: 319810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:22,884-Speed 6322.46 samples/sec Loss 5.5482 LearningRate 0.0005 Epoch: 15 Global Step: 319820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:26,134-Speed 6303.34 samples/sec Loss 5.6163 LearningRate 0.0005 Epoch: 15 Global Step: 319830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:29,379-Speed 6311.37 samples/sec Loss 5.6242 LearningRate 0.0005 Epoch: 15 Global Step: 319840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:32,624-Speed 6313.05 samples/sec Loss 5.6399 LearningRate 0.0005 Epoch: 15 Global Step: 319850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:35,866-Speed 6318.19 samples/sec Loss 5.5254 LearningRate 0.0005 Epoch: 15 Global Step: 319860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:39,110-Speed 6316.84 samples/sec Loss 5.5708 LearningRate 0.0005 Epoch: 15 Global Step: 319870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:42,357-Speed 6308.57 samples/sec Loss 5.5982 LearningRate 0.0005 Epoch: 15 Global Step: 319880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:45,609-Speed 6298.91 samples/sec Loss 5.6107 LearningRate 0.0005 Epoch: 15 Global Step: 319890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:48,851-Speed 6318.26 samples/sec Loss 5.5331 LearningRate 0.0005 Epoch: 15 Global Step: 319900 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 20:49:52,084-Speed 6334.74 samples/sec Loss 5.6059 LearningRate 0.0005 Epoch: 15 Global Step: 319910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:49:55,329-Speed 6312.72 samples/sec Loss 5.5885 LearningRate 0.0005 Epoch: 15 Global Step: 319920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
20:49:58,575-Speed 6311.33 samples/sec Loss 5.5487 LearningRate 0.0005 Epoch: 15 Global Step: 319930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:01,822-Speed 6308.50 samples/sec Loss 5.6293 LearningRate 0.0005 Epoch: 15 Global Step: 319940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:05,069-Speed 6309.44 samples/sec Loss 5.5740 LearningRate 0.0005 Epoch: 15 Global Step: 319950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:08,315-Speed 6311.04 samples/sec Loss 5.5899 LearningRate 0.0005 Epoch: 15 Global Step: 319960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:11,559-Speed 6315.99 samples/sec Loss 5.5466 LearningRate 0.0005 Epoch: 15 Global Step: 319970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:14,808-Speed 6303.44 samples/sec Loss 5.5135 LearningRate 0.0005 Epoch: 15 Global Step: 319980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:18,053-Speed 6312.84 samples/sec Loss 5.5305 LearningRate 0.0005 Epoch: 15 Global Step: 319990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:21,305-Speed 6298.83 samples/sec Loss 5.5646 LearningRate 0.0005 Epoch: 15 Global Step: 320000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:24,539-Speed 6335.42 samples/sec Loss 5.5696 LearningRate 0.0005 Epoch: 15 Global Step: 320010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:27,781-Speed 6318.02 samples/sec Loss 5.6149 LearningRate 0.0005 Epoch: 15 Global Step: 320020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:31,032-Speed 6300.18 samples/sec Loss 5.5989 LearningRate 0.0005 Epoch: 15 Global Step: 320030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:34,279-Speed 6309.55 samples/sec Loss 5.6451 LearningRate 0.0005 Epoch: 15 Global Step: 320040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:37,527-Speed 6307.40 
samples/sec Loss 5.5657 LearningRate 0.0005 Epoch: 15 Global Step: 320050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:40,772-Speed 6312.43 samples/sec Loss 5.5546 LearningRate 0.0005 Epoch: 15 Global Step: 320060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:44,021-Speed 6304.62 samples/sec Loss 5.5933 LearningRate 0.0005 Epoch: 15 Global Step: 320070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:47,267-Speed 6309.57 samples/sec Loss 5.5997 LearningRate 0.0005 Epoch: 15 Global Step: 320080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:50,516-Speed 6306.35 samples/sec Loss 5.6289 LearningRate 0.0005 Epoch: 15 Global Step: 320090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:50:53,748-Speed 6337.47 samples/sec Loss 5.5748 LearningRate 0.0005 Epoch: 15 Global Step: 320100 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:50:56,997-Speed 6305.68 samples/sec Loss 5.5491 LearningRate 0.0005 Epoch: 15 Global Step: 320110 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:00,247-Speed 6302.69 samples/sec Loss 5.5306 LearningRate 0.0005 Epoch: 15 Global Step: 320120 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:03,494-Speed 6307.57 samples/sec Loss 5.5762 LearningRate 0.0005 Epoch: 15 Global Step: 320130 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:06,739-Speed 6314.59 samples/sec Loss 5.5830 LearningRate 0.0005 Epoch: 15 Global Step: 320140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:09,984-Speed 6311.08 samples/sec Loss 5.6118 LearningRate 0.0005 Epoch: 15 Global Step: 320150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:13,233-Speed 6306.11 samples/sec Loss 5.5692 LearningRate 0.0005 Epoch: 15 Global Step: 320160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:16,479-Speed 6309.75 samples/sec Loss 5.5722 
LearningRate 0.0005 Epoch: 15 Global Step: 320170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:19,730-Speed 6302.95 samples/sec Loss 5.6131 LearningRate 0.0005 Epoch: 15 Global Step: 320180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:22,977-Speed 6308.68 samples/sec Loss 5.5844 LearningRate 0.0005 Epoch: 15 Global Step: 320190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:51:26,231-Speed 6294.89 samples/sec Loss 5.5993 LearningRate 0.0005 Epoch: 15 Global Step: 320200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:29,479-Speed 6306.91 samples/sec Loss 5.5287 LearningRate 0.0005 Epoch: 15 Global Step: 320210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:32,728-Speed 6306.07 samples/sec Loss 5.5931 LearningRate 0.0005 Epoch: 15 Global Step: 320220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:35,972-Speed 6314.21 samples/sec Loss 5.6059 LearningRate 0.0005 Epoch: 15 Global Step: 320230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:39,219-Speed 6307.65 samples/sec Loss 5.6225 LearningRate 0.0005 Epoch: 15 Global Step: 320240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:42,464-Speed 6312.60 samples/sec Loss 5.6685 LearningRate 0.0005 Epoch: 15 Global Step: 320250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:45,724-Speed 6284.24 samples/sec Loss 5.5626 LearningRate 0.0005 Epoch: 15 Global Step: 320260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:48,982-Speed 6286.11 samples/sec Loss 5.6056 LearningRate 0.0005 Epoch: 15 Global Step: 320270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:52,243-Speed 6283.48 samples/sec Loss 5.6318 LearningRate 0.0005 Epoch: 15 Global Step: 320280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:55,489-Speed 6309.75 samples/sec Loss 5.6146 LearningRate 0.0005 Epoch: 15 
Global Step: 320290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:51:58,719-Speed 6341.70 samples/sec Loss 5.5521 LearningRate 0.0005 Epoch: 15 Global Step: 320300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:01,969-Speed 6304.52 samples/sec Loss 5.5893 LearningRate 0.0005 Epoch: 15 Global Step: 320310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:05,217-Speed 6305.82 samples/sec Loss 5.5336 LearningRate 0.0005 Epoch: 15 Global Step: 320320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:08,463-Speed 6311.86 samples/sec Loss 5.5442 LearningRate 0.0005 Epoch: 15 Global Step: 320330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:11,706-Speed 6315.16 samples/sec Loss 5.5808 LearningRate 0.0005 Epoch: 15 Global Step: 320340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:14,950-Speed 6314.42 samples/sec Loss 5.5850 LearningRate 0.0005 Epoch: 15 Global Step: 320350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:18,200-Speed 6303.40 samples/sec Loss 5.5546 LearningRate 0.0005 Epoch: 15 Global Step: 320360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:21,446-Speed 6311.99 samples/sec Loss 5.5701 LearningRate 0.0005 Epoch: 15 Global Step: 320370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:24,691-Speed 6312.37 samples/sec Loss 5.6386 LearningRate 0.0005 Epoch: 15 Global Step: 320380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:27,938-Speed 6308.58 samples/sec Loss 5.5837 LearningRate 0.0005 Epoch: 15 Global Step: 320390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:31,171-Speed 6337.31 samples/sec Loss 5.4560 LearningRate 0.0005 Epoch: 15 Global Step: 320400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:34,414-Speed 6314.72 samples/sec Loss 5.5887 LearningRate 0.0005 Epoch: 15 Global Step: 320410 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:37,660-Speed 6313.00 samples/sec Loss 5.5500 LearningRate 0.0005 Epoch: 15 Global Step: 320420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:40,912-Speed 6297.75 samples/sec Loss 5.5748 LearningRate 0.0005 Epoch: 15 Global Step: 320430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:44,159-Speed 6309.71 samples/sec Loss 5.6458 LearningRate 0.0005 Epoch: 15 Global Step: 320440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:47,408-Speed 6304.12 samples/sec Loss 5.5904 LearningRate 0.0005 Epoch: 15 Global Step: 320450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:50,657-Speed 6305.43 samples/sec Loss 5.6104 LearningRate 0.0005 Epoch: 15 Global Step: 320460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:53,901-Speed 6314.68 samples/sec Loss 5.6017 LearningRate 0.0005 Epoch: 15 Global Step: 320470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:52:57,145-Speed 6314.46 samples/sec Loss 5.5517 LearningRate 0.0005 Epoch: 15 Global Step: 320480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:00,390-Speed 6311.69 samples/sec Loss 5.5241 LearningRate 0.0005 Epoch: 15 Global Step: 320490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:03,638-Speed 6307.31 samples/sec Loss 5.5870 LearningRate 0.0005 Epoch: 15 Global Step: 320500 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 20:53:06,875-Speed 6328.09 samples/sec Loss 5.5817 LearningRate 0.0005 Epoch: 15 Global Step: 320510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:10,121-Speed 6309.91 samples/sec Loss 5.6273 LearningRate 0.0005 Epoch: 15 Global Step: 320520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:13,367-Speed 6311.10 samples/sec Loss 5.5854 LearningRate 0.0005 Epoch: 15 Global Step: 320530 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 20:53:16,617-Speed 6302.83 samples/sec Loss 5.6143 LearningRate 0.0005 Epoch: 15 Global Step: 320540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:19,864-Speed 6308.51 samples/sec Loss 5.6763 LearningRate 0.0005 Epoch: 15 Global Step: 320550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:23,116-Speed 6300.67 samples/sec Loss 5.5887 LearningRate 0.0005 Epoch: 15 Global Step: 320560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:26,364-Speed 6306.13 samples/sec Loss 5.5448 LearningRate 0.0005 Epoch: 15 Global Step: 320570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:29,611-Speed 6308.67 samples/sec Loss 5.5141 LearningRate 0.0005 Epoch: 15 Global Step: 320580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:32,862-Speed 6302.50 samples/sec Loss 5.6427 LearningRate 0.0005 Epoch: 15 Global Step: 320590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:36,118-Speed 6290.55 samples/sec Loss 5.5869 LearningRate 0.0005 Epoch: 15 Global Step: 320600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:39,353-Speed 6332.25 samples/sec Loss 5.5253 LearningRate 0.0005 Epoch: 15 Global Step: 320610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:42,599-Speed 6311.27 samples/sec Loss 5.5732 LearningRate 0.0005 Epoch: 15 Global Step: 320620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:45,845-Speed 6310.89 samples/sec Loss 5.5754 LearningRate 0.0005 Epoch: 15 Global Step: 320630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:49,091-Speed 6310.80 samples/sec Loss 5.5958 LearningRate 0.0005 Epoch: 15 Global Step: 320640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:52,334-Speed 6316.70 samples/sec Loss 5.5741 LearningRate 0.0005 Epoch: 15 Global Step: 320650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
20:53:55,579-Speed 6312.65 samples/sec Loss 5.5869 LearningRate 0.0005 Epoch: 15 Global Step: 320660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:53:58,825-Speed 6309.54 samples/sec Loss 5.5387 LearningRate 0.0005 Epoch: 15 Global Step: 320670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:02,077-Speed 6300.04 samples/sec Loss 5.5333 LearningRate 0.0005 Epoch: 15 Global Step: 320680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:05,320-Speed 6316.56 samples/sec Loss 5.6710 LearningRate 0.0005 Epoch: 15 Global Step: 320690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:08,565-Speed 6311.27 samples/sec Loss 5.4937 LearningRate 0.0005 Epoch: 15 Global Step: 320700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:11,799-Speed 6335.34 samples/sec Loss 5.5206 LearningRate 0.0005 Epoch: 15 Global Step: 320710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:15,045-Speed 6310.12 samples/sec Loss 5.5985 LearningRate 0.0005 Epoch: 15 Global Step: 320720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:18,289-Speed 6313.85 samples/sec Loss 5.5782 LearningRate 0.0005 Epoch: 15 Global Step: 320730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:21,531-Speed 6319.87 samples/sec Loss 5.6366 LearningRate 0.0005 Epoch: 15 Global Step: 320740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:24,792-Speed 6281.67 samples/sec Loss 5.5890 LearningRate 0.0005 Epoch: 15 Global Step: 320750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:28,046-Speed 6295.14 samples/sec Loss 5.6083 LearningRate 0.0005 Epoch: 15 Global Step: 320760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:31,320-Speed 6256.09 samples/sec Loss 5.5328 LearningRate 0.0005 Epoch: 15 Global Step: 320770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:34,568-Speed 6306.75 
samples/sec Loss 5.6515 LearningRate 0.0005 Epoch: 15 Global Step: 320780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:37,815-Speed 6309.49 samples/sec Loss 5.5663 LearningRate 0.0005 Epoch: 15 Global Step: 320790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:41,065-Speed 6304.08 samples/sec Loss 5.5780 LearningRate 0.0005 Epoch: 15 Global Step: 320800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:44,296-Speed 6338.98 samples/sec Loss 5.6363 LearningRate 0.0005 Epoch: 15 Global Step: 320810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:47,548-Speed 6300.43 samples/sec Loss 5.6321 LearningRate 0.0005 Epoch: 15 Global Step: 320820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:50,795-Speed 6307.19 samples/sec Loss 5.5311 LearningRate 0.0005 Epoch: 15 Global Step: 320830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:54,042-Speed 6308.49 samples/sec Loss 5.5732 LearningRate 0.0005 Epoch: 15 Global Step: 320840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:54:57,291-Speed 6306.17 samples/sec Loss 5.5934 LearningRate 0.0005 Epoch: 15 Global Step: 320850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:00,536-Speed 6312.91 samples/sec Loss 5.5964 LearningRate 0.0005 Epoch: 15 Global Step: 320860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:03,784-Speed 6306.48 samples/sec Loss 5.6015 LearningRate 0.0005 Epoch: 15 Global Step: 320870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:07,027-Speed 6317.47 samples/sec Loss 5.6255 LearningRate 0.0005 Epoch: 15 Global Step: 320880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:10,273-Speed 6309.89 samples/sec Loss 5.5717 LearningRate 0.0005 Epoch: 15 Global Step: 320890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:13,548-Speed 6256.02 samples/sec Loss 5.5604 
LearningRate 0.0005 Epoch: 15 Global Step: 320900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:16,797-Speed 6303.81 samples/sec Loss 5.5784 LearningRate 0.0005 Epoch: 15 Global Step: 320910 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 20:55:20,029-Speed 6337.16 samples/sec Loss 5.6201 LearningRate 0.0005 Epoch: 15 Global Step: 320920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:23,274-Speed 6313.32 samples/sec Loss 5.6397 LearningRate 0.0005 Epoch: 15 Global Step: 320930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:26,545-Speed 6263.11 samples/sec Loss 5.5436 LearningRate 0.0005 Epoch: 15 Global Step: 320940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:29,793-Speed 6305.68 samples/sec Loss 5.5906 LearningRate 0.0005 Epoch: 15 Global Step: 320950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:33,040-Speed 6309.60 samples/sec Loss 5.5850 LearningRate 0.0005 Epoch: 15 Global Step: 320960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:36,284-Speed 6314.86 samples/sec Loss 5.6176 LearningRate 0.0005 Epoch: 15 Global Step: 320970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:39,528-Speed 6314.83 samples/sec Loss 5.6102 LearningRate 0.0005 Epoch: 15 Global Step: 320980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:42,773-Speed 6312.30 samples/sec Loss 5.5410 LearningRate 0.0005 Epoch: 15 Global Step: 320990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:46,016-Speed 6315.81 samples/sec Loss 5.5690 LearningRate 0.0005 Epoch: 15 Global Step: 321000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:49,266-Speed 6303.07 samples/sec Loss 5.6450 LearningRate 0.0005 Epoch: 15 Global Step: 321010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:52,501-Speed 6332.15 samples/sec Loss 5.5644 LearningRate 0.0005 Epoch: 15 
Global Step: 321020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:55,752-Speed 6302.04 samples/sec Loss 5.5375 LearningRate 0.0005 Epoch: 15 Global Step: 321030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:55:58,998-Speed 6311.47 samples/sec Loss 5.5779 LearningRate 0.0005 Epoch: 15 Global Step: 321040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:02,242-Speed 6312.92 samples/sec Loss 5.6088 LearningRate 0.0005 Epoch: 15 Global Step: 321050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:05,492-Speed 6303.96 samples/sec Loss 5.4911 LearningRate 0.0005 Epoch: 15 Global Step: 321060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:08,735-Speed 6317.32 samples/sec Loss 5.5938 LearningRate 0.0005 Epoch: 15 Global Step: 321070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:11,979-Speed 6314.14 samples/sec Loss 5.4729 LearningRate 0.0005 Epoch: 15 Global Step: 321080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:15,225-Speed 6310.67 samples/sec Loss 5.6599 LearningRate 0.0005 Epoch: 15 Global Step: 321090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:18,467-Speed 6317.75 samples/sec Loss 5.6261 LearningRate 0.0005 Epoch: 15 Global Step: 321100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:21,712-Speed 6313.41 samples/sec Loss 5.5817 LearningRate 0.0005 Epoch: 15 Global Step: 321110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:24,944-Speed 6338.86 samples/sec Loss 5.6316 LearningRate 0.0005 Epoch: 15 Global Step: 321120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:28,192-Speed 6305.53 samples/sec Loss 5.4887 LearningRate 0.0005 Epoch: 15 Global Step: 321130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:31,439-Speed 6309.37 samples/sec Loss 5.6125 LearningRate 0.0005 Epoch: 15 Global Step: 321140 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:34,686-Speed 6307.55 samples/sec Loss 5.5682 LearningRate 0.0005 Epoch: 15 Global Step: 321150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:37,932-Speed 6311.95 samples/sec Loss 5.5633 LearningRate 0.0005 Epoch: 15 Global Step: 321160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:41,175-Speed 6315.32 samples/sec Loss 5.6492 LearningRate 0.0005 Epoch: 15 Global Step: 321170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:44,421-Speed 6312.02 samples/sec Loss 5.5374 LearningRate 0.0005 Epoch: 15 Global Step: 321180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:47,671-Speed 6302.20 samples/sec Loss 5.6411 LearningRate 0.0005 Epoch: 15 Global Step: 321190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:50,915-Speed 6314.13 samples/sec Loss 5.5075 LearningRate 0.0005 Epoch: 15 Global Step: 321200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:54,163-Speed 6307.15 samples/sec Loss 5.5892 LearningRate 0.0005 Epoch: 15 Global Step: 321210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:56:57,394-Speed 6341.83 samples/sec Loss 5.6074 LearningRate 0.0005 Epoch: 15 Global Step: 321220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:00,645-Speed 6301.27 samples/sec Loss 5.5805 LearningRate 0.0005 Epoch: 15 Global Step: 321230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:03,890-Speed 6310.99 samples/sec Loss 5.5460 LearningRate 0.0005 Epoch: 15 Global Step: 321240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:07,141-Speed 6301.33 samples/sec Loss 5.5674 LearningRate 0.0005 Epoch: 15 Global Step: 321250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:10,387-Speed 6310.27 samples/sec Loss 5.5292 LearningRate 0.0005 Epoch: 15 Global Step: 321260 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 20:57:13,631-Speed 6315.64 samples/sec Loss 5.4947 LearningRate 0.0005 Epoch: 15 Global Step: 321270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:16,877-Speed 6309.67 samples/sec Loss 5.5700 LearningRate 0.0005 Epoch: 15 Global Step: 321280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:20,133-Speed 6292.48 samples/sec Loss 5.5271 LearningRate 0.0005 Epoch: 15 Global Step: 321290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:23,380-Speed 6309.02 samples/sec Loss 5.5390 LearningRate 0.0005 Epoch: 15 Global Step: 321300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:26,626-Speed 6310.51 samples/sec Loss 5.5962 LearningRate 0.0005 Epoch: 15 Global Step: 321310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:29,858-Speed 6337.40 samples/sec Loss 5.5816 LearningRate 0.0005 Epoch: 15 Global Step: 321320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:33,104-Speed 6310.58 samples/sec Loss 5.5744 LearningRate 0.0005 Epoch: 15 Global Step: 321330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:36,347-Speed 6317.59 samples/sec Loss 5.6047 LearningRate 0.0005 Epoch: 15 Global Step: 321340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:39,596-Speed 6303.88 samples/sec Loss 5.5806 LearningRate 0.0005 Epoch: 15 Global Step: 321350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:42,838-Speed 6318.23 samples/sec Loss 5.6050 LearningRate 0.0005 Epoch: 15 Global Step: 321360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:46,083-Speed 6312.81 samples/sec Loss 5.5660 LearningRate 0.0005 Epoch: 15 Global Step: 321370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:49,331-Speed 6307.89 samples/sec Loss 5.6820 LearningRate 0.0005 Epoch: 15 Global Step: 321380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
20:57:52,581-Speed 6302.33 samples/sec Loss 5.4758 LearningRate 0.0005 Epoch: 15 Global Step: 321390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:55,830-Speed 6304.38 samples/sec Loss 5.5550 LearningRate 0.0005 Epoch: 15 Global Step: 321400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:57:59,076-Speed 6310.13 samples/sec Loss 5.6640 LearningRate 0.0005 Epoch: 15 Global Step: 321410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:02,313-Speed 6329.04 samples/sec Loss 5.4869 LearningRate 0.0005 Epoch: 15 Global Step: 321420 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:05,562-Speed 6305.97 samples/sec Loss 5.5967 LearningRate 0.0005 Epoch: 15 Global Step: 321430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:08,807-Speed 6312.48 samples/sec Loss 5.6225 LearningRate 0.0005 Epoch: 15 Global Step: 321440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:12,054-Speed 6309.68 samples/sec Loss 5.6161 LearningRate 0.0005 Epoch: 15 Global Step: 321450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:15,302-Speed 6306.03 samples/sec Loss 5.6072 LearningRate 0.0005 Epoch: 15 Global Step: 321460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:18,547-Speed 6313.18 samples/sec Loss 5.6299 LearningRate 0.0005 Epoch: 15 Global Step: 321470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:21,792-Speed 6313.48 samples/sec Loss 5.5536 LearningRate 0.0005 Epoch: 15 Global Step: 321480 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:25,037-Speed 6311.70 samples/sec Loss 5.5602 LearningRate 0.0005 Epoch: 15 Global Step: 321490 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:28,282-Speed 6312.03 samples/sec Loss 5.5924 LearningRate 0.0005 Epoch: 15 Global Step: 321500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:31,532-Speed 6304.29 
samples/sec Loss 5.5257 LearningRate 0.0005 Epoch: 15 Global Step: 321510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:58:34,785-Speed 6295.29 samples/sec Loss 5.5454 LearningRate 0.0005 Epoch: 15 Global Step: 321520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:38,039-Speed 6304.00 samples/sec Loss 5.5185 LearningRate 0.0005 Epoch: 15 Global Step: 321530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:41,285-Speed 6310.73 samples/sec Loss 5.5615 LearningRate 0.0005 Epoch: 15 Global Step: 321540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:44,527-Speed 6317.59 samples/sec Loss 5.5607 LearningRate 0.0005 Epoch: 15 Global Step: 321550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:47,771-Speed 6316.31 samples/sec Loss 5.5808 LearningRate 0.0005 Epoch: 15 Global Step: 321560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:51,016-Speed 6311.00 samples/sec Loss 5.5859 LearningRate 0.0005 Epoch: 15 Global Step: 321570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:54,265-Speed 6306.93 samples/sec Loss 5.6448 LearningRate 0.0005 Epoch: 15 Global Step: 321580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:58:57,511-Speed 6309.73 samples/sec Loss 5.5405 LearningRate 0.0005 Epoch: 15 Global Step: 321590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:00,757-Speed 6310.94 samples/sec Loss 5.5849 LearningRate 0.0005 Epoch: 15 Global Step: 321600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:04,009-Speed 6297.87 samples/sec Loss 5.5561 LearningRate 0.0005 Epoch: 15 Global Step: 321610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:07,240-Speed 6340.72 samples/sec Loss 5.5299 LearningRate 0.0005 Epoch: 15 Global Step: 321620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:10,491-Speed 6302.04 samples/sec Loss 5.5234 
LearningRate 0.0005 Epoch: 15 Global Step: 321630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:13,744-Speed 6298.15 samples/sec Loss 5.4777 LearningRate 0.0005 Epoch: 15 Global Step: 321640 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:16,989-Speed 6311.50 samples/sec Loss 5.7088 LearningRate 0.0005 Epoch: 15 Global Step: 321650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:20,237-Speed 6306.61 samples/sec Loss 5.5703 LearningRate 0.0005 Epoch: 15 Global Step: 321660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:23,482-Speed 6313.12 samples/sec Loss 5.6024 LearningRate 0.0005 Epoch: 15 Global Step: 321670 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:26,727-Speed 6313.35 samples/sec Loss 5.5253 LearningRate 0.0005 Epoch: 15 Global Step: 321680 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:29,973-Speed 6310.72 samples/sec Loss 5.5437 LearningRate 0.0005 Epoch: 15 Global Step: 321690 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:33,217-Speed 6314.08 samples/sec Loss 5.5282 LearningRate 0.0005 Epoch: 15 Global Step: 321700 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:36,462-Speed 6312.74 samples/sec Loss 5.5668 LearningRate 0.0005 Epoch: 15 Global Step: 321710 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:39,705-Speed 6317.10 samples/sec Loss 5.5653 LearningRate 0.0005 Epoch: 15 Global Step: 321720 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:42,950-Speed 6312.08 samples/sec Loss 5.6089 LearningRate 0.0005 Epoch: 15 Global Step: 321730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 20:59:46,197-Speed 6309.23 samples/sec Loss 5.5140 LearningRate 0.0005 Epoch: 15 Global Step: 321740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:49,444-Speed 6308.66 samples/sec Loss 5.6185 LearningRate 0.0005 Epoch: 15 
Global Step: 321750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:52,688-Speed 6313.94 samples/sec Loss 5.5858 LearningRate 0.0005 Epoch: 15 Global Step: 321760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:55,944-Speed 6292.49 samples/sec Loss 5.6565 LearningRate 0.0005 Epoch: 15 Global Step: 321770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 20:59:59,204-Speed 6283.52 samples/sec Loss 5.5126 LearningRate 0.0005 Epoch: 15 Global Step: 321780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:02,449-Speed 6312.65 samples/sec Loss 5.5578 LearningRate 0.0005 Epoch: 15 Global Step: 321790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:05,695-Speed 6310.64 samples/sec Loss 5.6322 LearningRate 0.0005 Epoch: 15 Global Step: 321800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:08,943-Speed 6306.07 samples/sec Loss 5.5695 LearningRate 0.0005 Epoch: 15 Global Step: 321810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:12,193-Speed 6303.11 samples/sec Loss 5.5985 LearningRate 0.0005 Epoch: 15 Global Step: 321820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:15,440-Speed 6309.27 samples/sec Loss 5.5681 LearningRate 0.0005 Epoch: 15 Global Step: 321830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:18,675-Speed 6332.34 samples/sec Loss 5.5835 LearningRate 0.0005 Epoch: 15 Global Step: 321840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:21,923-Speed 6307.06 samples/sec Loss 5.5784 LearningRate 0.0005 Epoch: 15 Global Step: 321850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:25,172-Speed 6305.00 samples/sec Loss 5.5224 LearningRate 0.0005 Epoch: 15 Global Step: 321860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:28,417-Speed 6312.46 samples/sec Loss 5.5968 LearningRate 0.0005 Epoch: 15 Global Step: 321870 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:31,666-Speed 6306.04 samples/sec Loss 5.6184 LearningRate 0.0005 Epoch: 15 Global Step: 321880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:34,909-Speed 6315.50 samples/sec Loss 5.5271 LearningRate 0.0005 Epoch: 15 Global Step: 321890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:38,160-Speed 6302.16 samples/sec Loss 5.5969 LearningRate 0.0005 Epoch: 15 Global Step: 321900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:41,407-Speed 6308.54 samples/sec Loss 5.5447 LearningRate 0.0005 Epoch: 15 Global Step: 321910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:44,666-Speed 6285.12 samples/sec Loss 5.6213 LearningRate 0.0005 Epoch: 15 Global Step: 321920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:47,970-Speed 6200.24 samples/sec Loss 5.6507 LearningRate 0.0005 Epoch: 15 Global Step: 321930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:51,201-Speed 6339.27 samples/sec Loss 5.6497 LearningRate 0.0005 Epoch: 15 Global Step: 321940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:54,449-Speed 6307.95 samples/sec Loss 5.5668 LearningRate 0.0005 Epoch: 15 Global Step: 321950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:00:57,697-Speed 6306.23 samples/sec Loss 5.5615 LearningRate 0.0005 Epoch: 15 Global Step: 321960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:00,944-Speed 6309.06 samples/sec Loss 5.5083 LearningRate 0.0005 Epoch: 15 Global Step: 321970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:04,192-Speed 6305.79 samples/sec Loss 5.5584 LearningRate 0.0005 Epoch: 15 Global Step: 321980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:07,440-Speed 6308.05 samples/sec Loss 5.6129 LearningRate 0.0005 Epoch: 15 Global Step: 321990 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:01:10,689-Speed 6305.18 samples/sec Loss 5.5451 LearningRate 0.0005 Epoch: 15 Global Step: 322000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:13,933-Speed 6314.58 samples/sec Loss 5.5689 LearningRate 0.0005 Epoch: 15 Global Step: 322010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:17,176-Speed 6315.56 samples/sec Loss 5.5917 LearningRate 0.0005 Epoch: 15 Global Step: 322020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:20,425-Speed 6304.52 samples/sec Loss 5.5754 LearningRate 0.0005 Epoch: 15 Global Step: 322030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:23,673-Speed 6307.29 samples/sec Loss 5.5742 LearningRate 0.0005 Epoch: 15 Global Step: 322040 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:01:26,909-Speed 6332.24 samples/sec Loss 5.5419 LearningRate 0.0005 Epoch: 15 Global Step: 322050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:30,155-Speed 6310.93 samples/sec Loss 5.5861 LearningRate 0.0005 Epoch: 15 Global Step: 322060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:33,402-Speed 6307.80 samples/sec Loss 5.5823 LearningRate 0.0005 Epoch: 15 Global Step: 322070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:36,651-Speed 6305.05 samples/sec Loss 5.6409 LearningRate 0.0005 Epoch: 15 Global Step: 322080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:39,899-Speed 6306.60 samples/sec Loss 5.5913 LearningRate 0.0005 Epoch: 15 Global Step: 322090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:43,144-Speed 6312.57 samples/sec Loss 5.5926 LearningRate 0.0005 Epoch: 15 Global Step: 322100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:01:46,389-Speed 6313.78 samples/sec Loss 5.6563 LearningRate 0.0005 Epoch: 15 Global Step: 322110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:01:49,620-Speed 6339.90 samples/sec Loss 5.5418 LearningRate 0.0005 Epoch: 15 Global Step: 322120 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:01:52,867-Speed 6308.30 samples/sec Loss 5.5514 LearningRate 0.0005 Epoch: 15 Global Step: 322130 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:01:56,113-Speed 6311.42 samples/sec Loss 5.6205 LearningRate 0.0005 Epoch: 15 Global Step: 322140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:01:59,360-Speed 6307.19 samples/sec Loss 5.5823 LearningRate 0.0005 Epoch: 15 Global Step: 322150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:02,606-Speed 6310.74 samples/sec Loss 5.6033 LearningRate 0.0005 Epoch: 15 Global Step: 322160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:05,856-Speed 6303.42 samples/sec Loss 5.6096 LearningRate 0.0005 Epoch: 15 Global Step: 322170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:09,102-Speed 6310.17 samples/sec Loss 5.5759 LearningRate 0.0005 Epoch: 15 Global Step: 322180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:12,348-Speed 6312.02 samples/sec Loss 5.5760 LearningRate 0.0005 Epoch: 15 Global Step: 322190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:15,592-Speed 6314.07 samples/sec Loss 5.5342 LearningRate 0.0005 Epoch: 15 Global Step: 322200 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:18,838-Speed 6309.60 samples/sec Loss 5.6594 LearningRate 0.0005 Epoch: 15 Global Step: 322210 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:02:22,094-Speed 6291.57 samples/sec Loss 5.5701 LearningRate 0.0005 Epoch: 15 Global Step: 322220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:25,343-Speed 6304.85 samples/sec Loss 5.6123 LearningRate 0.0005 Epoch: 15 Global Step: 322230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:28,584-Speed 6320.38 
samples/sec Loss 5.5090 LearningRate 0.0005 Epoch: 15 Global Step: 322240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:31,832-Speed 6308.23 samples/sec Loss 5.5649 LearningRate 0.0005 Epoch: 15 Global Step: 322250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:35,091-Speed 6286.22 samples/sec Loss 5.6284 LearningRate 0.0005 Epoch: 15 Global Step: 322260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:38,338-Speed 6308.69 samples/sec Loss 5.5715 LearningRate 0.0005 Epoch: 15 Global Step: 322270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:41,579-Speed 6319.35 samples/sec Loss 5.5671 LearningRate 0.0005 Epoch: 15 Global Step: 322280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:44,822-Speed 6317.60 samples/sec Loss 5.5793 LearningRate 0.0005 Epoch: 15 Global Step: 322290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:48,068-Speed 6310.68 samples/sec Loss 5.5651 LearningRate 0.0005 Epoch: 15 Global Step: 322300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:51,329-Speed 6280.78 samples/sec Loss 5.5812 LearningRate 0.0005 Epoch: 15 Global Step: 322310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:54,561-Speed 6339.04 samples/sec Loss 5.5331 LearningRate 0.0005 Epoch: 15 Global Step: 322320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:02:57,807-Speed 6309.99 samples/sec Loss 5.6102 LearningRate 0.0005 Epoch: 15 Global Step: 322330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:01,058-Speed 6302.09 samples/sec Loss 5.6235 LearningRate 0.0005 Epoch: 15 Global Step: 322340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:04,301-Speed 6316.28 samples/sec Loss 5.5468 LearningRate 0.0005 Epoch: 15 Global Step: 322350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:07,545-Speed 6314.76 samples/sec Loss 5.5664 
LearningRate 0.0005 Epoch: 15 Global Step: 322360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:10,788-Speed 6316.92 samples/sec Loss 5.5504 LearningRate 0.0005 Epoch: 15 Global Step: 322370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:14,028-Speed 6321.17 samples/sec Loss 5.5706 LearningRate 0.0005 Epoch: 15 Global Step: 322380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:17,280-Speed 6301.49 samples/sec Loss 5.5777 LearningRate 0.0005 Epoch: 15 Global Step: 322390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:20,526-Speed 6309.97 samples/sec Loss 5.6032 LearningRate 0.0005 Epoch: 15 Global Step: 322400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:23,770-Speed 6314.39 samples/sec Loss 5.5838 LearningRate 0.0005 Epoch: 15 Global Step: 322410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:27,001-Speed 6340.09 samples/sec Loss 5.6123 LearningRate 0.0005 Epoch: 15 Global Step: 322420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:30,250-Speed 6305.06 samples/sec Loss 5.5515 LearningRate 0.0005 Epoch: 15 Global Step: 322430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:33,495-Speed 6313.03 samples/sec Loss 5.5526 LearningRate 0.0005 Epoch: 15 Global Step: 322440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:36,741-Speed 6311.00 samples/sec Loss 5.5540 LearningRate 0.0005 Epoch: 15 Global Step: 322450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:39,987-Speed 6309.45 samples/sec Loss 5.5767 LearningRate 0.0005 Epoch: 15 Global Step: 322460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:43,231-Speed 6315.44 samples/sec Loss 5.5925 LearningRate 0.0005 Epoch: 15 Global Step: 322470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:46,476-Speed 6312.12 samples/sec Loss 5.6131 LearningRate 0.0005 Epoch: 15 
Global Step: 322480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:49,724-Speed 6307.40 samples/sec Loss 5.5457 LearningRate 0.0005 Epoch: 15 Global Step: 322490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:52,970-Speed 6310.30 samples/sec Loss 5.5618 LearningRate 0.0005 Epoch: 15 Global Step: 322500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:56,220-Speed 6304.60 samples/sec Loss 5.5654 LearningRate 0.0005 Epoch: 15 Global Step: 322510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:03:59,468-Speed 6306.30 samples/sec Loss 5.5662 LearningRate 0.0005 Epoch: 15 Global Step: 322520 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:04:02,718-Speed 6304.02 samples/sec Loss 5.5658 LearningRate 0.0005 Epoch: 15 Global Step: 322530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:05,959-Speed 6319.41 samples/sec Loss 5.5887 LearningRate 0.0005 Epoch: 15 Global Step: 322540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:09,204-Speed 6312.19 samples/sec Loss 5.5480 LearningRate 0.0005 Epoch: 15 Global Step: 322550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:12,445-Speed 6321.38 samples/sec Loss 5.5647 LearningRate 0.0005 Epoch: 15 Global Step: 322560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:15,686-Speed 6320.97 samples/sec Loss 5.5599 LearningRate 0.0005 Epoch: 15 Global Step: 322570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:18,927-Speed 6319.61 samples/sec Loss 5.5832 LearningRate 0.0005 Epoch: 15 Global Step: 322580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:22,194-Speed 6269.92 samples/sec Loss 5.5765 LearningRate 0.0005 Epoch: 15 Global Step: 322590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:25,475-Speed 6242.35 samples/sec Loss 5.6229 LearningRate 0.0005 Epoch: 15 Global Step: 322600 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:28,719-Speed 6315.86 samples/sec Loss 5.5987 LearningRate 0.0005 Epoch: 15 Global Step: 322610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:31,973-Speed 6294.78 samples/sec Loss 5.5523 LearningRate 0.0005 Epoch: 15 Global Step: 322620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:35,205-Speed 6338.57 samples/sec Loss 5.6131 LearningRate 0.0005 Epoch: 15 Global Step: 322630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:38,449-Speed 6315.28 samples/sec Loss 5.5538 LearningRate 0.0005 Epoch: 15 Global Step: 322640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:41,696-Speed 6307.70 samples/sec Loss 5.5137 LearningRate 0.0005 Epoch: 15 Global Step: 322650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:44,940-Speed 6314.33 samples/sec Loss 5.5475 LearningRate 0.0005 Epoch: 15 Global Step: 322660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:48,185-Speed 6313.06 samples/sec Loss 5.5748 LearningRate 0.0005 Epoch: 15 Global Step: 322670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:51,438-Speed 6298.17 samples/sec Loss 5.5285 LearningRate 0.0005 Epoch: 15 Global Step: 322680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:54,687-Speed 6305.13 samples/sec Loss 5.5810 LearningRate 0.0005 Epoch: 15 Global Step: 322690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:04:57,931-Speed 6314.24 samples/sec Loss 5.5660 LearningRate 0.0005 Epoch: 15 Global Step: 322700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:01,182-Speed 6301.23 samples/sec Loss 5.5672 LearningRate 0.0005 Epoch: 15 Global Step: 322710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:04,431-Speed 6305.01 samples/sec Loss 5.4878 LearningRate 0.0005 Epoch: 15 Global Step: 322720 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:05:07,662-Speed 6340.20 samples/sec Loss 5.4995 LearningRate 0.0005 Epoch: 15 Global Step: 322730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:10,907-Speed 6311.62 samples/sec Loss 5.5945 LearningRate 0.0005 Epoch: 15 Global Step: 322740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:14,138-Speed 6340.39 samples/sec Loss 5.6330 LearningRate 0.0005 Epoch: 15 Global Step: 322750 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:17,386-Speed 6307.59 samples/sec Loss 5.6119 LearningRate 0.0005 Epoch: 15 Global Step: 322760 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:20,632-Speed 6309.45 samples/sec Loss 5.5151 LearningRate 0.0005 Epoch: 15 Global Step: 322770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:23,881-Speed 6305.79 samples/sec Loss 5.5782 LearningRate 0.0005 Epoch: 15 Global Step: 322780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:27,128-Speed 6308.24 samples/sec Loss 5.5796 LearningRate 0.0005 Epoch: 15 Global Step: 322790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:30,375-Speed 6309.30 samples/sec Loss 5.5160 LearningRate 0.0005 Epoch: 15 Global Step: 322800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:33,623-Speed 6307.21 samples/sec Loss 5.5677 LearningRate 0.0005 Epoch: 15 Global Step: 322810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:36,866-Speed 6316.24 samples/sec Loss 5.6203 LearningRate 0.0005 Epoch: 15 Global Step: 322820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:40,120-Speed 6295.32 samples/sec Loss 5.5640 LearningRate 0.0005 Epoch: 15 Global Step: 322830 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:05:43,363-Speed 6315.84 samples/sec Loss 5.5582 LearningRate 0.0005 Epoch: 15 Global Step: 322840 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 
21:05:46,608-Speed 6313.27 samples/sec Loss 5.5240 LearningRate 0.0005 Epoch: 15 Global Step: 322850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:49,858-Speed 6302.80 samples/sec Loss 5.5186 LearningRate 0.0005 Epoch: 15 Global Step: 322860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:53,105-Speed 6308.41 samples/sec Loss 5.4938 LearningRate 0.0005 Epoch: 15 Global Step: 322870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:56,369-Speed 6275.49 samples/sec Loss 5.5703 LearningRate 0.0005 Epoch: 15 Global Step: 322880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:05:59,620-Speed 6301.98 samples/sec Loss 5.5740 LearningRate 0.0005 Epoch: 15 Global Step: 322890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:02,866-Speed 6309.84 samples/sec Loss 5.4889 LearningRate 0.0005 Epoch: 15 Global Step: 322900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:06,117-Speed 6303.06 samples/sec Loss 5.5695 LearningRate 0.0005 Epoch: 15 Global Step: 322910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:09,361-Speed 6314.98 samples/sec Loss 5.6126 LearningRate 0.0005 Epoch: 15 Global Step: 322920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:12,606-Speed 6311.76 samples/sec Loss 5.6123 LearningRate 0.0005 Epoch: 15 Global Step: 322930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:15,852-Speed 6311.37 samples/sec Loss 5.6157 LearningRate 0.0005 Epoch: 15 Global Step: 322940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:19,083-Speed 6339.52 samples/sec Loss 5.5281 LearningRate 0.0005 Epoch: 15 Global Step: 322950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:22,333-Speed 6303.03 samples/sec Loss 5.5747 LearningRate 0.0005 Epoch: 15 Global Step: 322960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:25,580-Speed 6308.28 
samples/sec Loss 5.5853 LearningRate 0.0005 Epoch: 15 Global Step: 322970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:28,827-Speed 6310.05 samples/sec Loss 5.4770 LearningRate 0.0005 Epoch: 15 Global Step: 322980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:32,075-Speed 6305.80 samples/sec Loss 5.5970 LearningRate 0.0005 Epoch: 15 Global Step: 322990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:35,320-Speed 6313.71 samples/sec Loss 5.5930 LearningRate 0.0005 Epoch: 15 Global Step: 323000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:38,564-Speed 6314.67 samples/sec Loss 5.5705 LearningRate 0.0005 Epoch: 15 Global Step: 323010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:41,810-Speed 6310.71 samples/sec Loss 5.5454 LearningRate 0.0005 Epoch: 15 Global Step: 323020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:45,056-Speed 6309.61 samples/sec Loss 5.5333 LearningRate 0.0005 Epoch: 15 Global Step: 323030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:48,305-Speed 6304.51 samples/sec Loss 5.5950 LearningRate 0.0005 Epoch: 15 Global Step: 323040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:51,534-Speed 6344.34 samples/sec Loss 5.6039 LearningRate 0.0005 Epoch: 15 Global Step: 323050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:54,796-Speed 6280.63 samples/sec Loss 5.5769 LearningRate 0.0005 Epoch: 15 Global Step: 323060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:06:58,040-Speed 6313.04 samples/sec Loss 5.5688 LearningRate 0.0005 Epoch: 15 Global Step: 323070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:01,290-Speed 6303.89 samples/sec Loss 5.5262 LearningRate 0.0005 Epoch: 15 Global Step: 323080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:04,544-Speed 6295.05 samples/sec Loss 5.6078 
LearningRate 0.0005 Epoch: 15 Global Step: 323090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:07,789-Speed 6312.38 samples/sec Loss 5.5261 LearningRate 0.0005 Epoch: 15 Global Step: 323100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:11,036-Speed 6309.75 samples/sec Loss 5.5495 LearningRate 0.0005 Epoch: 15 Global Step: 323110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:14,282-Speed 6310.55 samples/sec Loss 5.5710 LearningRate 0.0005 Epoch: 15 Global Step: 323120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:17,528-Speed 6311.53 samples/sec Loss 5.5841 LearningRate 0.0005 Epoch: 15 Global Step: 323130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:20,772-Speed 6315.22 samples/sec Loss 5.5856 LearningRate 0.0005 Epoch: 15 Global Step: 323140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:24,006-Speed 6333.47 samples/sec Loss 5.5167 LearningRate 0.0005 Epoch: 15 Global Step: 323150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:27,257-Speed 6301.10 samples/sec Loss 5.5417 LearningRate 0.0005 Epoch: 15 Global Step: 323160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:30,501-Speed 6313.92 samples/sec Loss 5.5172 LearningRate 0.0005 Epoch: 15 Global Step: 323170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:33,750-Speed 6305.06 samples/sec Loss 5.5589 LearningRate 0.0005 Epoch: 15 Global Step: 323180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:36,998-Speed 6306.37 samples/sec Loss 5.5381 LearningRate 0.0005 Epoch: 15 Global Step: 323190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:40,248-Speed 6304.41 samples/sec Loss 5.6483 LearningRate 0.0005 Epoch: 15 Global Step: 323200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:43,496-Speed 6306.55 samples/sec Loss 5.5755 LearningRate 0.0005 Epoch: 15 
Global Step: 323210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:46,740-Speed 6314.66 samples/sec Loss 5.5713 LearningRate 0.0005 Epoch: 15 Global Step: 323220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:49,986-Speed 6309.98 samples/sec Loss 5.4909 LearningRate 0.0005 Epoch: 15 Global Step: 323230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:53,236-Speed 6302.80 samples/sec Loss 5.6161 LearningRate 0.0005 Epoch: 15 Global Step: 323240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:56,469-Speed 6335.81 samples/sec Loss 5.5786 LearningRate 0.0005 Epoch: 15 Global Step: 323250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:07:59,717-Speed 6307.69 samples/sec Loss 5.5667 LearningRate 0.0005 Epoch: 15 Global Step: 323260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:02,980-Speed 6278.34 samples/sec Loss 5.5540 LearningRate 0.0005 Epoch: 15 Global Step: 323270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:06,225-Speed 6310.87 samples/sec Loss 5.4938 LearningRate 0.0005 Epoch: 15 Global Step: 323280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:09,473-Speed 6308.26 samples/sec Loss 5.4843 LearningRate 0.0005 Epoch: 15 Global Step: 323290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:12,723-Speed 6301.63 samples/sec Loss 5.6144 LearningRate 0.0005 Epoch: 15 Global Step: 323300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:15,986-Speed 6282.37 samples/sec Loss 5.5580 LearningRate 0.0005 Epoch: 15 Global Step: 323310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:19,218-Speed 6336.13 samples/sec Loss 5.5615 LearningRate 0.0005 Epoch: 15 Global Step: 323320 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:22,470-Speed 6299.07 samples/sec Loss 5.4722 LearningRate 0.0005 Epoch: 15 Global Step: 323330 Fp16 Grad 
Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:25,714-Speed 6315.35 samples/sec Loss 5.6007 LearningRate 0.0005 Epoch: 15 Global Step: 323340 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:28,961-Speed 6309.98 samples/sec Loss 5.5624 LearningRate 0.0005 Epoch: 15 Global Step: 323350 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:32,204-Speed 6316.24 samples/sec Loss 5.5513 LearningRate 0.0005 Epoch: 15 Global Step: 323360 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:35,452-Speed 6306.64 samples/sec Loss 5.5969 LearningRate 0.0005 Epoch: 15 Global Step: 323370 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:38,703-Speed 6300.72 samples/sec Loss 5.5492 LearningRate 0.0005 Epoch: 15 Global Step: 323380 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:41,952-Speed 6305.82 samples/sec Loss 5.5109 LearningRate 0.0005 Epoch: 15 Global Step: 323390 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:45,203-Speed 6300.22 samples/sec Loss 5.5621 LearningRate 0.0005 Epoch: 15 Global Step: 323400 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:48,449-Speed 6310.11 samples/sec Loss 5.5508 LearningRate 0.0005 Epoch: 15 Global Step: 323410 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:08:51,698-Speed 6306.22 samples/sec Loss 5.5527 LearningRate 0.0005 Epoch: 15 Global Step: 323420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:54,944-Speed 6309.10 samples/sec Loss 5.6126 LearningRate 0.0005 Epoch: 15 Global Step: 323430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:08:58,190-Speed 6311.53 samples/sec Loss 5.5704 LearningRate 0.0005 Epoch: 15 Global Step: 323440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:01,437-Speed 6309.88 samples/sec Loss 5.5552 LearningRate 0.0005 Epoch: 15 Global Step: 323450 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:09:04,687-Speed 6302.27 samples/sec Loss 5.6260 LearningRate 0.0005 Epoch: 15 Global Step: 323460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:07,936-Speed 6305.56 samples/sec Loss 5.5204 LearningRate 0.0005 Epoch: 15 Global Step: 323470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:11,180-Speed 6312.79 samples/sec Loss 5.5868 LearningRate 0.0005 Epoch: 15 Global Step: 323480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:14,429-Speed 6309.01 samples/sec Loss 5.5606 LearningRate 0.0005 Epoch: 15 Global Step: 323490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:17,676-Speed 6307.74 samples/sec Loss 5.5958 LearningRate 0.0005 Epoch: 15 Global Step: 323500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:20,927-Speed 6301.72 samples/sec Loss 5.5199 LearningRate 0.0005 Epoch: 15 Global Step: 323510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:24,163-Speed 6329.94 samples/sec Loss 5.5522 LearningRate 0.0005 Epoch: 15 Global Step: 323520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:27,412-Speed 6305.44 samples/sec Loss 5.5901 LearningRate 0.0005 Epoch: 15 Global Step: 323530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:30,662-Speed 6301.58 samples/sec Loss 5.5552 LearningRate 0.0005 Epoch: 15 Global Step: 323540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:33,908-Speed 6312.07 samples/sec Loss 5.5787 LearningRate 0.0005 Epoch: 15 Global Step: 323550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:37,157-Speed 6304.67 samples/sec Loss 5.5533 LearningRate 0.0005 Epoch: 15 Global Step: 323560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:40,403-Speed 6310.77 samples/sec Loss 5.5132 LearningRate 0.0005 Epoch: 15 Global Step: 323570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:09:43,648-Speed 6313.27 samples/sec Loss 5.4414 LearningRate 0.0005 Epoch: 15 Global Step: 323580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:46,894-Speed 6311.36 samples/sec Loss 5.5523 LearningRate 0.0005 Epoch: 15 Global Step: 323590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:50,138-Speed 6313.40 samples/sec Loss 5.6292 LearningRate 0.0005 Epoch: 15 Global Step: 323600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:53,384-Speed 6311.89 samples/sec Loss 5.5955 LearningRate 0.0005 Epoch: 15 Global Step: 323610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:09:56,628-Speed 6313.81 samples/sec Loss 5.4881 LearningRate 0.0005 Epoch: 15 Global Step: 323620 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:09:59,862-Speed 6334.83 samples/sec Loss 5.5539 LearningRate 0.0005 Epoch: 15 Global Step: 323630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:03,111-Speed 6303.84 samples/sec Loss 5.4720 LearningRate 0.0005 Epoch: 15 Global Step: 323640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:06,355-Speed 6314.05 samples/sec Loss 5.5595 LearningRate 0.0005 Epoch: 15 Global Step: 323650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:09,598-Speed 6317.56 samples/sec Loss 5.5632 LearningRate 0.0005 Epoch: 15 Global Step: 323660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:12,844-Speed 6311.18 samples/sec Loss 5.6211 LearningRate 0.0005 Epoch: 15 Global Step: 323670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:16,091-Speed 6307.96 samples/sec Loss 5.5518 LearningRate 0.0005 Epoch: 15 Global Step: 323680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:19,346-Speed 6293.68 samples/sec Loss 5.5898 LearningRate 0.0005 Epoch: 15 Global Step: 323690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:22,593-Speed 6308.42 
samples/sec Loss 5.5870 LearningRate 0.0005 Epoch: 15 Global Step: 323700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:25,837-Speed 6314.80 samples/sec Loss 5.4808 LearningRate 0.0005 Epoch: 15 Global Step: 323710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:29,084-Speed 6308.36 samples/sec Loss 5.5629 LearningRate 0.0005 Epoch: 15 Global Step: 323720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:10:32,298-Speed 6373.01 samples/sec Loss 5.5412 LearningRate 0.0005 Epoch: 15 Global Step: 323730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:35,540-Speed 6318.58 samples/sec Loss 5.5833 LearningRate 0.0005 Epoch: 15 Global Step: 323740 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:38,784-Speed 6315.00 samples/sec Loss 5.5901 LearningRate 0.0005 Epoch: 15 Global Step: 323750 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:42,031-Speed 6310.07 samples/sec Loss 5.5224 LearningRate 0.0005 Epoch: 15 Global Step: 323760 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:45,277-Speed 6310.75 samples/sec Loss 5.5730 LearningRate 0.0005 Epoch: 15 Global Step: 323770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:48,529-Speed 6299.88 samples/sec Loss 5.6057 LearningRate 0.0005 Epoch: 15 Global Step: 323780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:51,773-Speed 6314.11 samples/sec Loss 5.6052 LearningRate 0.0005 Epoch: 15 Global Step: 323790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:55,025-Speed 6299.87 samples/sec Loss 5.6082 LearningRate 0.0005 Epoch: 15 Global Step: 323800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:10:58,268-Speed 6315.63 samples/sec Loss 5.5760 LearningRate 0.0005 Epoch: 15 Global Step: 323810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:11:01,512-Speed 6315.02 samples/sec Loss 5.5421 
LearningRate 0.0005 Epoch: 15 Global Step: 323820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:11:04,757-Speed 6311.67 samples/sec Loss 5.5536 LearningRate 0.0005 Epoch: 15 Global Step: 323830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:08,011-Speed 6295.60 samples/sec Loss 5.6107 LearningRate 0.0005 Epoch: 15 Global Step: 323840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:11,260-Speed 6305.92 samples/sec Loss 5.5386 LearningRate 0.0005 Epoch: 15 Global Step: 323850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:14,507-Speed 6308.85 samples/sec Loss 5.5402 LearningRate 0.0005 Epoch: 15 Global Step: 323860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:17,754-Speed 6308.24 samples/sec Loss 5.5140 LearningRate 0.0005 Epoch: 15 Global Step: 323870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:21,028-Speed 6257.34 samples/sec Loss 5.5393 LearningRate 0.0005 Epoch: 15 Global Step: 323880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:24,272-Speed 6314.31 samples/sec Loss 5.6854 LearningRate 0.0005 Epoch: 15 Global Step: 323890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:27,515-Speed 6316.45 samples/sec Loss 5.5577 LearningRate 0.0005 Epoch: 15 Global Step: 323900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:30,763-Speed 6307.18 samples/sec Loss 5.5778 LearningRate 0.0005 Epoch: 15 Global Step: 323910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:34,005-Speed 6317.85 samples/sec Loss 5.6155 LearningRate 0.0005 Epoch: 15 Global Step: 323920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:37,254-Speed 6305.04 samples/sec Loss 5.5823 LearningRate 0.0005 Epoch: 15 Global Step: 323930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:40,495-Speed 6320.68 samples/sec Loss 5.5660 LearningRate 0.0005 Epoch: 15 
Global Step: 323940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:43,741-Speed 6311.08 samples/sec Loss 5.5299 LearningRate 0.0005 Epoch: 15 Global Step: 323950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:46,994-Speed 6296.61 samples/sec Loss 5.6329 LearningRate 0.0005 Epoch: 15 Global Step: 323960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:50,251-Speed 6288.72 samples/sec Loss 5.4760 LearningRate 0.0005 Epoch: 15 Global Step: 323970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:53,499-Speed 6308.00 samples/sec Loss 5.6002 LearningRate 0.0005 Epoch: 15 Global Step: 323980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:11:56,749-Speed 6304.03 samples/sec Loss 5.5283 LearningRate 0.0005 Epoch: 15 Global Step: 323990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:00,000-Speed 6300.61 samples/sec Loss 5.6198 LearningRate 0.0005 Epoch: 15 Global Step: 324000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:03,244-Speed 6314.36 samples/sec Loss 5.5492 LearningRate 0.0005 Epoch: 15 Global Step: 324010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:06,490-Speed 6309.73 samples/sec Loss 5.5677 LearningRate 0.0005 Epoch: 15 Global Step: 324020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:09,720-Speed 6342.35 samples/sec Loss 5.6341 LearningRate 0.0005 Epoch: 15 Global Step: 324030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:12,965-Speed 6313.43 samples/sec Loss 5.5754 LearningRate 0.0005 Epoch: 15 Global Step: 324040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:16,208-Speed 6316.95 samples/sec Loss 5.5506 LearningRate 0.0005 Epoch: 15 Global Step: 324050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:19,450-Speed 6316.73 samples/sec Loss 5.6060 LearningRate 0.0005 Epoch: 15 Global Step: 324060 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:22,694-Speed 6315.72 samples/sec Loss 5.5578 LearningRate 0.0005 Epoch: 15 Global Step: 324070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:25,941-Speed 6308.23 samples/sec Loss 5.5959 LearningRate 0.0005 Epoch: 15 Global Step: 324080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:29,193-Speed 6298.67 samples/sec Loss 5.5645 LearningRate 0.0005 Epoch: 15 Global Step: 324090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:32,438-Speed 6312.59 samples/sec Loss 5.5279 LearningRate 0.0005 Epoch: 15 Global Step: 324100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:35,706-Speed 6269.11 samples/sec Loss 5.5199 LearningRate 0.0005 Epoch: 15 Global Step: 324110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:38,974-Speed 6269.35 samples/sec Loss 5.5733 LearningRate 0.0005 Epoch: 15 Global Step: 324120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:42,217-Speed 6316.16 samples/sec Loss 5.5503 LearningRate 0.0005 Epoch: 15 Global Step: 324130 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:12:45,448-Speed 6341.79 samples/sec Loss 5.4898 LearningRate 0.0005 Epoch: 15 Global Step: 324140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:48,691-Speed 6316.11 samples/sec Loss 5.4815 LearningRate 0.0005 Epoch: 15 Global Step: 324150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:51,941-Speed 6302.96 samples/sec Loss 5.5259 LearningRate 0.0005 Epoch: 15 Global Step: 324160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:55,187-Speed 6310.30 samples/sec Loss 5.5402 LearningRate 0.0005 Epoch: 15 Global Step: 324170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:12:58,432-Speed 6313.36 samples/sec Loss 5.4969 LearningRate 0.0005 Epoch: 15 Global Step: 324180 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:13:01,677-Speed 6310.87 samples/sec Loss 5.5333 LearningRate 0.0005 Epoch: 15 Global Step: 324190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:04,927-Speed 6303.93 samples/sec Loss 5.6516 LearningRate 0.0005 Epoch: 15 Global Step: 324200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:08,171-Speed 6314.91 samples/sec Loss 5.5965 LearningRate 0.0005 Epoch: 15 Global Step: 324210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:11,418-Speed 6309.29 samples/sec Loss 5.5855 LearningRate 0.0005 Epoch: 15 Global Step: 324220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:14,663-Speed 6312.73 samples/sec Loss 5.5475 LearningRate 0.0005 Epoch: 15 Global Step: 324230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:17,893-Speed 6341.89 samples/sec Loss 5.5866 LearningRate 0.0005 Epoch: 15 Global Step: 324240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:21,137-Speed 6314.06 samples/sec Loss 5.5554 LearningRate 0.0005 Epoch: 15 Global Step: 324250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:24,382-Speed 6312.95 samples/sec Loss 5.5511 LearningRate 0.0005 Epoch: 15 Global Step: 324260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:27,633-Speed 6300.90 samples/sec Loss 5.6140 LearningRate 0.0005 Epoch: 15 Global Step: 324270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:30,880-Speed 6309.26 samples/sec Loss 5.6007 LearningRate 0.0005 Epoch: 15 Global Step: 324280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:34,125-Speed 6312.85 samples/sec Loss 5.6199 LearningRate 0.0005 Epoch: 15 Global Step: 324290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:37,375-Speed 6302.53 samples/sec Loss 5.6326 LearningRate 0.0005 Epoch: 15 Global Step: 324300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:13:40,622-Speed 6308.95 samples/sec Loss 5.5408 LearningRate 0.0005 Epoch: 15 Global Step: 324310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:43,867-Speed 6312.18 samples/sec Loss 5.5709 LearningRate 0.0005 Epoch: 15 Global Step: 324320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:47,114-Speed 6309.25 samples/sec Loss 5.5466 LearningRate 0.0005 Epoch: 15 Global Step: 324330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:50,362-Speed 6306.37 samples/sec Loss 5.5082 LearningRate 0.0005 Epoch: 15 Global Step: 324340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:53,622-Speed 6284.34 samples/sec Loss 5.5671 LearningRate 0.0005 Epoch: 15 Global Step: 324350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:13:56,865-Speed 6315.90 samples/sec Loss 5.5125 LearningRate 0.0005 Epoch: 15 Global Step: 324360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:00,116-Speed 6301.35 samples/sec Loss 5.5884 LearningRate 0.0005 Epoch: 15 Global Step: 324370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:03,370-Speed 6295.00 samples/sec Loss 5.4720 LearningRate 0.0005 Epoch: 15 Global Step: 324380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:06,617-Speed 6308.35 samples/sec Loss 5.6130 LearningRate 0.0005 Epoch: 15 Global Step: 324390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:09,862-Speed 6312.56 samples/sec Loss 5.5400 LearningRate 0.0005 Epoch: 15 Global Step: 324400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:13,106-Speed 6314.95 samples/sec Loss 5.5391 LearningRate 0.0005 Epoch: 15 Global Step: 324410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:16,360-Speed 6295.55 samples/sec Loss 5.5990 LearningRate 0.0005 Epoch: 15 Global Step: 324420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:19,607-Speed 6309.31 
samples/sec Loss 5.6057 LearningRate 0.0005 Epoch: 15 Global Step: 324430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:22,845-Speed 6326.03 samples/sec Loss 5.5862 LearningRate 0.0005 Epoch: 15 Global Step: 324440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:26,092-Speed 6308.74 samples/sec Loss 5.5686 LearningRate 0.0005 Epoch: 15 Global Step: 324450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:29,341-Speed 6305.58 samples/sec Loss 5.6076 LearningRate 0.0005 Epoch: 15 Global Step: 324460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:32,594-Speed 6297.74 samples/sec Loss 5.4905 LearningRate 0.0005 Epoch: 15 Global Step: 324470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:35,850-Speed 6290.32 samples/sec Loss 5.5531 LearningRate 0.0005 Epoch: 15 Global Step: 324480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:39,095-Speed 6312.60 samples/sec Loss 5.5261 LearningRate 0.0005 Epoch: 15 Global Step: 324490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:42,343-Speed 6306.76 samples/sec Loss 5.5904 LearningRate 0.0005 Epoch: 15 Global Step: 324500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:45,593-Speed 6302.92 samples/sec Loss 5.6158 LearningRate 0.0005 Epoch: 15 Global Step: 324510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:48,838-Speed 6313.13 samples/sec Loss 5.5663 LearningRate 0.0005 Epoch: 15 Global Step: 324520 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:52,083-Speed 6313.21 samples/sec Loss 5.6185 LearningRate 0.0005 Epoch: 15 Global Step: 324530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:55,321-Speed 6325.98 samples/sec Loss 5.5405 LearningRate 0.0005 Epoch: 15 Global Step: 324540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:14:58,568-Speed 6309.25 samples/sec Loss 5.5296 
LearningRate 0.0005 Epoch: 15 Global Step: 324550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:01,826-Speed 6286.12 samples/sec Loss 5.5391 LearningRate 0.0005 Epoch: 15 Global Step: 324560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:05,077-Speed 6300.58 samples/sec Loss 5.5656 LearningRate 0.0005 Epoch: 15 Global Step: 324570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:08,323-Speed 6311.90 samples/sec Loss 5.5339 LearningRate 0.0005 Epoch: 15 Global Step: 324580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:11,567-Speed 6313.15 samples/sec Loss 5.6013 LearningRate 0.0005 Epoch: 15 Global Step: 324590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:14,815-Speed 6307.14 samples/sec Loss 5.5818 LearningRate 0.0005 Epoch: 15 Global Step: 324600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:18,062-Speed 6309.46 samples/sec Loss 5.5272 LearningRate 0.0005 Epoch: 15 Global Step: 324610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:21,308-Speed 6312.03 samples/sec Loss 5.5772 LearningRate 0.0005 Epoch: 15 Global Step: 324620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:24,558-Speed 6301.64 samples/sec Loss 5.5589 LearningRate 0.0005 Epoch: 15 Global Step: 324630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:27,795-Speed 6331.00 samples/sec Loss 5.4729 LearningRate 0.0005 Epoch: 15 Global Step: 324640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:31,039-Speed 6313.01 samples/sec Loss 5.5196 LearningRate 0.0005 Epoch: 15 Global Step: 324650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:34,286-Speed 6308.52 samples/sec Loss 5.5704 LearningRate 0.0005 Epoch: 15 Global Step: 324660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:37,533-Speed 6308.73 samples/sec Loss 5.5743 LearningRate 0.0005 Epoch: 15 
Global Step: 324670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:40,779-Speed 6311.24 samples/sec Loss 5.5460 LearningRate 0.0005 Epoch: 15 Global Step: 324680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:44,024-Speed 6312.83 samples/sec Loss 5.5335 LearningRate 0.0005 Epoch: 15 Global Step: 324690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:47,266-Speed 6318.72 samples/sec Loss 5.5302 LearningRate 0.0005 Epoch: 15 Global Step: 324700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:50,525-Speed 6285.77 samples/sec Loss 5.5724 LearningRate 0.0005 Epoch: 15 Global Step: 324710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:53,771-Speed 6311.34 samples/sec Loss 5.6726 LearningRate 0.0005 Epoch: 15 Global Step: 324720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:15:57,018-Speed 6307.26 samples/sec Loss 5.6378 LearningRate 0.0005 Epoch: 15 Global Step: 324730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:00,272-Speed 6296.53 samples/sec Loss 5.5678 LearningRate 0.0005 Epoch: 15 Global Step: 324740 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:16:03,503-Speed 6339.68 samples/sec Loss 5.5875 LearningRate 0.0005 Epoch: 15 Global Step: 324750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:06,752-Speed 6304.47 samples/sec Loss 5.5434 LearningRate 0.0005 Epoch: 15 Global Step: 324760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:09,994-Speed 6317.38 samples/sec Loss 5.6021 LearningRate 0.0005 Epoch: 15 Global Step: 324770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:13,242-Speed 6307.89 samples/sec Loss 5.5254 LearningRate 0.0005 Epoch: 15 Global Step: 324780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:16,489-Speed 6308.27 samples/sec Loss 5.4984 LearningRate 0.0005 Epoch: 15 Global Step: 324790 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:19,736-Speed 6310.04 samples/sec Loss 5.5997 LearningRate 0.0005 Epoch: 15 Global Step: 324800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:22,978-Speed 6318.29 samples/sec Loss 5.5426 LearningRate 0.0005 Epoch: 15 Global Step: 324810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:16:26,218-Speed 6322.45 samples/sec Loss 5.5055 LearningRate 0.0005 Epoch: 15 Global Step: 324820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:29,461-Speed 6315.26 samples/sec Loss 5.5528 LearningRate 0.0005 Epoch: 15 Global Step: 324830 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:32,708-Speed 6310.94 samples/sec Loss 5.5253 LearningRate 0.0005 Epoch: 15 Global Step: 324840 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:35,951-Speed 6316.40 samples/sec Loss 5.6127 LearningRate 0.0005 Epoch: 15 Global Step: 324850 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:39,197-Speed 6308.93 samples/sec Loss 5.5911 LearningRate 0.0005 Epoch: 15 Global Step: 324860 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:42,445-Speed 6308.82 samples/sec Loss 5.5917 LearningRate 0.0005 Epoch: 15 Global Step: 324870 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:45,693-Speed 6306.67 samples/sec Loss 5.5282 LearningRate 0.0005 Epoch: 15 Global Step: 324880 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:48,939-Speed 6310.56 samples/sec Loss 5.4688 LearningRate 0.0005 Epoch: 15 Global Step: 324890 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:52,181-Speed 6317.32 samples/sec Loss 5.5285 LearningRate 0.0005 Epoch: 15 Global Step: 324900 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:16:55,430-Speed 6305.62 samples/sec Loss 5.6155 LearningRate 0.0005 Epoch: 15 Global Step: 324910 Fp16 Grad Scale: 16384 Required: 46 hours 
Training: 2022-04-01 21:16:58,676-Speed 6311.19 samples/sec Loss 5.5123 LearningRate 0.0005 Epoch: 15 Global Step: 324920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:01,926-Speed 6301.82 samples/sec Loss 5.5317 LearningRate 0.0005 Epoch: 15 Global Step: 324930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:05,173-Speed 6309.19 samples/sec Loss 5.5259 LearningRate 0.0005 Epoch: 15 Global Step: 324940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:08,424-Speed 6300.92 samples/sec Loss 5.5799 LearningRate 0.0005 Epoch: 15 Global Step: 324950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:11,670-Speed 6309.70 samples/sec Loss 5.5346 LearningRate 0.0005 Epoch: 15 Global Step: 324960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:14,920-Speed 6303.65 samples/sec Loss 5.5178 LearningRate 0.0005 Epoch: 15 Global Step: 324970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:18,170-Speed 6303.73 samples/sec Loss 5.5341 LearningRate 0.0005 Epoch: 15 Global Step: 324980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:21,414-Speed 6314.76 samples/sec Loss 5.5582 LearningRate 0.0005 Epoch: 15 Global Step: 324990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:24,660-Speed 6310.52 samples/sec Loss 5.5736 LearningRate 0.0005 Epoch: 15 Global Step: 325000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:27,908-Speed 6306.08 samples/sec Loss 5.4534 LearningRate 0.0005 Epoch: 15 Global Step: 325010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:31,141-Speed 6336.52 samples/sec Loss 5.4777 LearningRate 0.0005 Epoch: 15 Global Step: 325020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:34,388-Speed 6308.01 samples/sec Loss 5.5795 LearningRate 0.0005 Epoch: 15 Global Step: 325030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:17:37,637-Speed 6306.11 samples/sec Loss 5.5641 LearningRate 0.0005 Epoch: 15 Global Step: 325040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:40,885-Speed 6307.52 samples/sec Loss 5.6167 LearningRate 0.0005 Epoch: 15 Global Step: 325050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:44,133-Speed 6306.10 samples/sec Loss 5.4935 LearningRate 0.0005 Epoch: 15 Global Step: 325060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:47,380-Speed 6309.82 samples/sec Loss 5.6305 LearningRate 0.0005 Epoch: 15 Global Step: 325070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:50,625-Speed 6312.60 samples/sec Loss 5.5255 LearningRate 0.0005 Epoch: 15 Global Step: 325080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:53,905-Speed 6243.98 samples/sec Loss 5.5447 LearningRate 0.0005 Epoch: 15 Global Step: 325090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:17:57,153-Speed 6308.17 samples/sec Loss 5.5521 LearningRate 0.0005 Epoch: 15 Global Step: 325100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:00,401-Speed 6306.30 samples/sec Loss 5.5742 LearningRate 0.0005 Epoch: 15 Global Step: 325110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:03,635-Speed 6333.31 samples/sec Loss 5.5310 LearningRate 0.0005 Epoch: 15 Global Step: 325120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:06,879-Speed 6314.80 samples/sec Loss 5.6235 LearningRate 0.0005 Epoch: 15 Global Step: 325130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:10,125-Speed 6311.31 samples/sec Loss 5.5792 LearningRate 0.0005 Epoch: 15 Global Step: 325140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:13,370-Speed 6312.11 samples/sec Loss 5.4939 LearningRate 0.0005 Epoch: 15 Global Step: 325150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:16,617-Speed 6309.44 
samples/sec Loss 5.5064 LearningRate 0.0005 Epoch: 15 Global Step: 325160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:19,861-Speed 6314.84 samples/sec Loss 5.5525 LearningRate 0.0005 Epoch: 15 Global Step: 325170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:23,111-Speed 6303.43 samples/sec Loss 5.5425 LearningRate 0.0005 Epoch: 15 Global Step: 325180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:26,362-Speed 6299.26 samples/sec Loss 5.5362 LearningRate 0.0005 Epoch: 15 Global Step: 325190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:29,613-Speed 6301.56 samples/sec Loss 5.5987 LearningRate 0.0005 Epoch: 15 Global Step: 325200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:32,859-Speed 6310.47 samples/sec Loss 5.5651 LearningRate 0.0005 Epoch: 15 Global Step: 325210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:36,089-Speed 6341.45 samples/sec Loss 5.6030 LearningRate 0.0005 Epoch: 15 Global Step: 325220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:39,336-Speed 6309.30 samples/sec Loss 5.6088 LearningRate 0.0005 Epoch: 15 Global Step: 325230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:42,582-Speed 6311.29 samples/sec Loss 5.5627 LearningRate 0.0005 Epoch: 15 Global Step: 325240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:45,829-Speed 6308.56 samples/sec Loss 5.5528 LearningRate 0.0005 Epoch: 15 Global Step: 325250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:49,086-Speed 6290.86 samples/sec Loss 5.5933 LearningRate 0.0005 Epoch: 15 Global Step: 325260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:52,336-Speed 6303.52 samples/sec Loss 5.5858 LearningRate 0.0005 Epoch: 15 Global Step: 325270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:55,582-Speed 6309.83 samples/sec Loss 5.6127 
LearningRate 0.0005 Epoch: 15 Global Step: 325280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:18:58,827-Speed 6312.20 samples/sec Loss 5.6101 LearningRate 0.0005 Epoch: 15 Global Step: 325290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:02,079-Speed 6298.54 samples/sec Loss 5.5102 LearningRate 0.0005 Epoch: 15 Global Step: 325300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:05,325-Speed 6311.74 samples/sec Loss 5.4814 LearningRate 0.0005 Epoch: 15 Global Step: 325310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:08,558-Speed 6336.44 samples/sec Loss 5.5765 LearningRate 0.0005 Epoch: 15 Global Step: 325320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:11,806-Speed 6307.13 samples/sec Loss 5.5675 LearningRate 0.0005 Epoch: 15 Global Step: 325330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:15,050-Speed 6315.16 samples/sec Loss 5.5183 LearningRate 0.0005 Epoch: 15 Global Step: 325340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:18,299-Speed 6303.59 samples/sec Loss 5.4759 LearningRate 0.0005 Epoch: 15 Global Step: 325350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:21,526-Speed 6347.97 samples/sec Loss 5.5132 LearningRate 0.0005 Epoch: 15 Global Step: 325360 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:24,767-Speed 6320.07 samples/sec Loss 5.5701 LearningRate 0.0005 Epoch: 15 Global Step: 325370 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:28,015-Speed 6307.40 samples/sec Loss 5.5206 LearningRate 0.0005 Epoch: 15 Global Step: 325380 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:31,263-Speed 6305.97 samples/sec Loss 5.5411 LearningRate 0.0005 Epoch: 15 Global Step: 325390 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:34,509-Speed 6311.15 samples/sec Loss 5.4792 LearningRate 0.0005 Epoch: 15 
Global Step: 325400 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:37,755-Speed 6311.99 samples/sec Loss 5.5309 LearningRate 0.0005 Epoch: 15 Global Step: 325410 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:41,006-Speed 6300.45 samples/sec Loss 5.6094 LearningRate 0.0005 Epoch: 15 Global Step: 325420 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:44,255-Speed 6305.08 samples/sec Loss 5.5281 LearningRate 0.0005 Epoch: 15 Global Step: 325430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:47,506-Speed 6300.13 samples/sec Loss 5.5291 LearningRate 0.0005 Epoch: 15 Global Step: 325440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:50,750-Speed 6314.34 samples/sec Loss 5.5827 LearningRate 0.0005 Epoch: 15 Global Step: 325450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:19:53,999-Speed 6305.72 samples/sec Loss 5.4407 LearningRate 0.0005 Epoch: 15 Global Step: 325460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:19:57,251-Speed 6298.36 samples/sec Loss 5.5557 LearningRate 0.0005 Epoch: 15 Global Step: 325470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:00,500-Speed 6305.96 samples/sec Loss 5.4979 LearningRate 0.0005 Epoch: 15 Global Step: 325480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:03,753-Speed 6297.58 samples/sec Loss 5.5684 LearningRate 0.0005 Epoch: 15 Global Step: 325490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:06,999-Speed 6309.87 samples/sec Loss 5.5024 LearningRate 0.0005 Epoch: 15 Global Step: 325500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:10,245-Speed 6311.60 samples/sec Loss 5.5377 LearningRate 0.0005 Epoch: 15 Global Step: 325510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:13,495-Speed 6302.65 samples/sec Loss 5.5224 LearningRate 0.0005 Epoch: 15 Global Step: 325520 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:16,738-Speed 6317.44 samples/sec Loss 5.5488 LearningRate 0.0005 Epoch: 15 Global Step: 325530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:19,985-Speed 6308.16 samples/sec Loss 5.5957 LearningRate 0.0005 Epoch: 15 Global Step: 325540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:23,231-Speed 6310.37 samples/sec Loss 5.5101 LearningRate 0.0005 Epoch: 15 Global Step: 325550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:26,463-Speed 6338.02 samples/sec Loss 5.5280 LearningRate 0.0005 Epoch: 15 Global Step: 325560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:20:29,699-Speed 6330.82 samples/sec Loss 5.5468 LearningRate 0.0005 Epoch: 15 Global Step: 325570 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:32,945-Speed 6310.21 samples/sec Loss 5.5955 LearningRate 0.0005 Epoch: 15 Global Step: 325580 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:36,194-Speed 6305.84 samples/sec Loss 5.5508 LearningRate 0.0005 Epoch: 15 Global Step: 325590 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:39,441-Speed 6307.61 samples/sec Loss 5.6395 LearningRate 0.0005 Epoch: 15 Global Step: 325600 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:42,683-Speed 6318.00 samples/sec Loss 5.5753 LearningRate 0.0005 Epoch: 15 Global Step: 325610 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:45,937-Speed 6296.72 samples/sec Loss 5.5927 LearningRate 0.0005 Epoch: 15 Global Step: 325620 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:49,181-Speed 6314.85 samples/sec Loss 5.5410 LearningRate 0.0005 Epoch: 15 Global Step: 325630 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:52,430-Speed 6303.42 samples/sec Loss 5.4688 LearningRate 0.0005 Epoch: 15 Global Step: 325640 Fp16 Grad Scale: 16384 Required: 46 hours 
Training: 2022-04-01 21:20:55,678-Speed 6307.11 samples/sec Loss 5.5047 LearningRate 0.0005 Epoch: 15 Global Step: 325650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:20:58,925-Speed 6308.75 samples/sec Loss 5.5117 LearningRate 0.0005 Epoch: 15 Global Step: 325660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:21:02,172-Speed 6309.24 samples/sec Loss 5.5127 LearningRate 0.0005 Epoch: 15 Global Step: 325670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:05,419-Speed 6309.47 samples/sec Loss 5.4848 LearningRate 0.0005 Epoch: 15 Global Step: 325680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:08,663-Speed 6315.20 samples/sec Loss 5.5633 LearningRate 0.0005 Epoch: 15 Global Step: 325690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:11,909-Speed 6311.59 samples/sec Loss 5.6714 LearningRate 0.0005 Epoch: 15 Global Step: 325700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:15,157-Speed 6306.06 samples/sec Loss 5.5600 LearningRate 0.0005 Epoch: 15 Global Step: 325710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:18,405-Speed 6307.35 samples/sec Loss 5.4747 LearningRate 0.0005 Epoch: 15 Global Step: 325720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:21,648-Speed 6316.42 samples/sec Loss 5.6484 LearningRate 0.0005 Epoch: 15 Global Step: 325730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:24,889-Speed 6320.71 samples/sec Loss 5.5126 LearningRate 0.0005 Epoch: 15 Global Step: 325740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:28,135-Speed 6309.17 samples/sec Loss 5.5444 LearningRate 0.0005 Epoch: 15 Global Step: 325750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:31,386-Speed 6302.15 samples/sec Loss 5.6293 LearningRate 0.0005 Epoch: 15 Global Step: 325760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:21:34,617-Speed 6340.32 samples/sec Loss 5.5129 LearningRate 0.0005 Epoch: 15 Global Step: 325770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:37,864-Speed 6307.77 samples/sec Loss 5.5210 LearningRate 0.0005 Epoch: 15 Global Step: 325780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:41,109-Speed 6312.85 samples/sec Loss 5.5062 LearningRate 0.0005 Epoch: 15 Global Step: 325790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:44,353-Speed 6314.26 samples/sec Loss 5.5005 LearningRate 0.0005 Epoch: 15 Global Step: 325800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:47,596-Speed 6316.42 samples/sec Loss 5.6279 LearningRate 0.0005 Epoch: 15 Global Step: 325810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:50,837-Speed 6319.89 samples/sec Loss 5.5803 LearningRate 0.0005 Epoch: 15 Global Step: 325820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:54,086-Speed 6305.84 samples/sec Loss 5.5488 LearningRate 0.0005 Epoch: 15 Global Step: 325830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:21:57,329-Speed 6316.39 samples/sec Loss 5.4985 LearningRate 0.0005 Epoch: 15 Global Step: 325840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:00,576-Speed 6308.62 samples/sec Loss 5.4981 LearningRate 0.0005 Epoch: 15 Global Step: 325850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:03,826-Speed 6303.30 samples/sec Loss 5.5462 LearningRate 0.0005 Epoch: 15 Global Step: 325860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:07,063-Speed 6328.73 samples/sec Loss 5.5509 LearningRate 0.0005 Epoch: 15 Global Step: 325870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:10,310-Speed 6306.95 samples/sec Loss 5.5449 LearningRate 0.0005 Epoch: 15 Global Step: 325880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:13,559-Speed 6305.73 
samples/sec Loss 5.6313 LearningRate 0.0005 Epoch: 15 Global Step: 325890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:16,791-Speed 6338.51 samples/sec Loss 5.5475 LearningRate 0.0005 Epoch: 15 Global Step: 325900 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:20,038-Speed 6309.68 samples/sec Loss 5.5759 LearningRate 0.0005 Epoch: 15 Global Step: 325910 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:23,282-Speed 6314.32 samples/sec Loss 5.5839 LearningRate 0.0005 Epoch: 15 Global Step: 325920 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:26,525-Speed 6315.76 samples/sec Loss 5.4811 LearningRate 0.0005 Epoch: 15 Global Step: 325930 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:29,771-Speed 6312.21 samples/sec Loss 5.6158 LearningRate 0.0005 Epoch: 15 Global Step: 325940 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:33,013-Speed 6316.95 samples/sec Loss 5.5948 LearningRate 0.0005 Epoch: 15 Global Step: 325950 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:36,256-Speed 6316.27 samples/sec Loss 5.5832 LearningRate 0.0005 Epoch: 15 Global Step: 325960 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:39,502-Speed 6311.69 samples/sec Loss 5.5205 LearningRate 0.0005 Epoch: 15 Global Step: 325970 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:42,743-Speed 6320.85 samples/sec Loss 5.4944 LearningRate 0.0005 Epoch: 15 Global Step: 325980 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:45,990-Speed 6307.63 samples/sec Loss 5.5101 LearningRate 0.0005 Epoch: 15 Global Step: 325990 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:22:49,239-Speed 6305.38 samples/sec Loss 5.5280 LearningRate 0.0005 Epoch: 15 Global Step: 326000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:52,490-Speed 6301.57 samples/sec Loss 5.5683 
LearningRate 0.0005 Epoch: 15 Global Step: 326010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:55,735-Speed 6311.59 samples/sec Loss 5.5766 LearningRate 0.0005 Epoch: 15 Global Step: 326020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:22:58,991-Speed 6291.24 samples/sec Loss 5.4409 LearningRate 0.0005 Epoch: 15 Global Step: 326030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:02,238-Speed 6309.00 samples/sec Loss 5.5411 LearningRate 0.0005 Epoch: 15 Global Step: 326040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:05,480-Speed 6319.09 samples/sec Loss 5.5528 LearningRate 0.0005 Epoch: 15 Global Step: 326050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:08,728-Speed 6306.57 samples/sec Loss 5.5560 LearningRate 0.0005 Epoch: 15 Global Step: 326060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:11,977-Speed 6304.31 samples/sec Loss 5.5219 LearningRate 0.0005 Epoch: 15 Global Step: 326070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:15,221-Speed 6315.78 samples/sec Loss 5.5465 LearningRate 0.0005 Epoch: 15 Global Step: 326080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:18,477-Speed 6291.39 samples/sec Loss 5.5279 LearningRate 0.0005 Epoch: 15 Global Step: 326090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:21,708-Speed 6339.08 samples/sec Loss 5.5009 LearningRate 0.0005 Epoch: 15 Global Step: 326100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:24,960-Speed 6300.50 samples/sec Loss 5.4993 LearningRate 0.0005 Epoch: 15 Global Step: 326110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:28,204-Speed 6315.04 samples/sec Loss 5.6527 LearningRate 0.0005 Epoch: 15 Global Step: 326120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:31,451-Speed 6309.06 samples/sec Loss 5.5703 LearningRate 0.0005 Epoch: 15 
Global Step: 326130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:23:34,685-Speed 6333.59 samples/sec Loss 5.6166 LearningRate 0.0005 Epoch: 15 Global Step: 326140 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:37,929-Speed 6314.74 samples/sec Loss 5.5756 LearningRate 0.0005 Epoch: 15 Global Step: 326150 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:41,178-Speed 6304.15 samples/sec Loss 5.5828 LearningRate 0.0005 Epoch: 15 Global Step: 326160 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:44,423-Speed 6312.17 samples/sec Loss 5.4608 LearningRate 0.0005 Epoch: 15 Global Step: 326170 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:47,667-Speed 6315.42 samples/sec Loss 5.4706 LearningRate 0.0005 Epoch: 15 Global Step: 326180 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:50,912-Speed 6312.00 samples/sec Loss 5.6093 LearningRate 0.0005 Epoch: 15 Global Step: 326190 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:54,157-Speed 6313.30 samples/sec Loss 5.5829 LearningRate 0.0005 Epoch: 15 Global Step: 326200 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:23:57,408-Speed 6301.81 samples/sec Loss 5.5407 LearningRate 0.0005 Epoch: 15 Global Step: 326210 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:24:00,653-Speed 6311.22 samples/sec Loss 5.4543 LearningRate 0.0005 Epoch: 15 Global Step: 326220 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:24:03,901-Speed 6308.18 samples/sec Loss 5.5531 LearningRate 0.0005 Epoch: 15 Global Step: 326230 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:24:07,151-Speed 6301.46 samples/sec Loss 5.5228 LearningRate 0.0005 Epoch: 15 Global Step: 326240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:10,397-Speed 6311.12 samples/sec Loss 5.5749 LearningRate 0.0005 Epoch: 15 Global Step: 326250 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:13,656-Speed 6286.30 samples/sec Loss 5.5490 LearningRate 0.0005 Epoch: 15 Global Step: 326260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:16,905-Speed 6304.97 samples/sec Loss 5.5474 LearningRate 0.0005 Epoch: 15 Global Step: 326270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:20,150-Speed 6311.65 samples/sec Loss 5.5950 LearningRate 0.0005 Epoch: 15 Global Step: 326280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:23,396-Speed 6310.28 samples/sec Loss 5.6469 LearningRate 0.0005 Epoch: 15 Global Step: 326290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:26,647-Speed 6302.75 samples/sec Loss 5.5328 LearningRate 0.0005 Epoch: 15 Global Step: 326300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:29,895-Speed 6307.68 samples/sec Loss 5.5712 LearningRate 0.0005 Epoch: 15 Global Step: 326310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:33,139-Speed 6313.94 samples/sec Loss 5.4917 LearningRate 0.0005 Epoch: 15 Global Step: 326320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:36,388-Speed 6304.80 samples/sec Loss 5.5779 LearningRate 0.0005 Epoch: 15 Global Step: 326330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:39,621-Speed 6336.46 samples/sec Loss 5.6094 LearningRate 0.0005 Epoch: 15 Global Step: 326340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:42,864-Speed 6316.26 samples/sec Loss 5.5636 LearningRate 0.0005 Epoch: 15 Global Step: 326350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:46,115-Speed 6300.60 samples/sec Loss 5.5312 LearningRate 0.0005 Epoch: 15 Global Step: 326360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:49,359-Speed 6315.84 samples/sec Loss 5.5100 LearningRate 0.0005 Epoch: 15 Global Step: 326370 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:24:52,617-Speed 6286.51 samples/sec Loss 5.5215 LearningRate 0.0005 Epoch: 15 Global Step: 326380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:55,863-Speed 6311.47 samples/sec Loss 5.4886 LearningRate 0.0005 Epoch: 15 Global Step: 326390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:24:59,113-Speed 6308.23 samples/sec Loss 5.4834 LearningRate 0.0005 Epoch: 15 Global Step: 326400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:02,359-Speed 6310.37 samples/sec Loss 5.4992 LearningRate 0.0005 Epoch: 15 Global Step: 326410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:05,610-Speed 6300.75 samples/sec Loss 5.5287 LearningRate 0.0005 Epoch: 15 Global Step: 326420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:08,858-Speed 6307.84 samples/sec Loss 5.5659 LearningRate 0.0005 Epoch: 15 Global Step: 326430 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:12,104-Speed 6309.31 samples/sec Loss 5.5059 LearningRate 0.0005 Epoch: 15 Global Step: 326440 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:15,346-Speed 6318.75 samples/sec Loss 5.5991 LearningRate 0.0005 Epoch: 15 Global Step: 326450 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:18,595-Speed 6306.17 samples/sec Loss 5.5050 LearningRate 0.0005 Epoch: 15 Global Step: 326460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:21,840-Speed 6311.61 samples/sec Loss 5.5224 LearningRate 0.0005 Epoch: 15 Global Step: 326470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:25,092-Speed 6299.15 samples/sec Loss 5.5726 LearningRate 0.0005 Epoch: 15 Global Step: 326480 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:28,339-Speed 6308.40 samples/sec Loss 5.4602 LearningRate 0.0005 Epoch: 15 Global Step: 326490 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 
21:25:31,588-Speed 6305.57 samples/sec Loss 5.5670 LearningRate 0.0005 Epoch: 15 Global Step: 326500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:34,832-Speed 6314.97 samples/sec Loss 5.5210 LearningRate 0.0005 Epoch: 15 Global Step: 326510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:38,079-Speed 6309.39 samples/sec Loss 5.5384 LearningRate 0.0005 Epoch: 15 Global Step: 326520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:25:41,326-Speed 6309.55 samples/sec Loss 5.5474 LearningRate 0.0005 Epoch: 15 Global Step: 326530 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:44,570-Speed 6314.61 samples/sec Loss 5.5722 LearningRate 0.0005 Epoch: 15 Global Step: 326540 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:47,813-Speed 6315.36 samples/sec Loss 5.5482 LearningRate 0.0005 Epoch: 15 Global Step: 326550 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:51,057-Speed 6314.34 samples/sec Loss 5.5779 LearningRate 0.0005 Epoch: 15 Global Step: 326560 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:54,311-Speed 6295.31 samples/sec Loss 5.5674 LearningRate 0.0005 Epoch: 15 Global Step: 326570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:25:57,557-Speed 6310.81 samples/sec Loss 5.5330 LearningRate 0.0005 Epoch: 15 Global Step: 326580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:00,803-Speed 6310.39 samples/sec Loss 5.5811 LearningRate 0.0005 Epoch: 15 Global Step: 326590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:04,054-Speed 6302.06 samples/sec Loss 5.5206 LearningRate 0.0005 Epoch: 15 Global Step: 326600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:07,307-Speed 6297.29 samples/sec Loss 5.5453 LearningRate 0.0005 Epoch: 15 Global Step: 326610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:10,553-Speed 6310.82 
samples/sec Loss 5.5387 LearningRate 0.0005 Epoch: 15 Global Step: 326620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:13,790-Speed 6328.05 samples/sec Loss 5.5926 LearningRate 0.0005 Epoch: 15 Global Step: 326630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:17,033-Speed 6315.48 samples/sec Loss 5.6220 LearningRate 0.0005 Epoch: 15 Global Step: 326640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:20,279-Speed 6310.42 samples/sec Loss 5.5538 LearningRate 0.0005 Epoch: 15 Global Step: 326650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:23,535-Speed 6292.16 samples/sec Loss 5.5045 LearningRate 0.0005 Epoch: 15 Global Step: 326660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:26,795-Speed 6283.27 samples/sec Loss 5.6513 LearningRate 0.0005 Epoch: 15 Global Step: 326670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:30,043-Speed 6306.43 samples/sec Loss 5.5517 LearningRate 0.0005 Epoch: 15 Global Step: 326680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:33,289-Speed 6311.67 samples/sec Loss 5.5014 LearningRate 0.0005 Epoch: 15 Global Step: 326690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:36,539-Speed 6302.31 samples/sec Loss 5.4914 LearningRate 0.0005 Epoch: 15 Global Step: 326700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:39,794-Speed 6294.71 samples/sec Loss 5.4917 LearningRate 0.0005 Epoch: 15 Global Step: 326710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:43,037-Speed 6316.71 samples/sec Loss 5.5833 LearningRate 0.0005 Epoch: 15 Global Step: 326720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:46,280-Speed 6316.16 samples/sec Loss 5.5077 LearningRate 0.0005 Epoch: 15 Global Step: 326730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:49,528-Speed 6306.04 samples/sec Loss 5.5711 
LearningRate 0.0005 Epoch: 15 Global Step: 326740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:52,778-Speed 6303.90 samples/sec Loss 5.5452 LearningRate 0.0005 Epoch: 15 Global Step: 326750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:56,026-Speed 6307.75 samples/sec Loss 5.4910 LearningRate 0.0005 Epoch: 15 Global Step: 326760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:26:59,277-Speed 6299.98 samples/sec Loss 5.5509 LearningRate 0.0005 Epoch: 15 Global Step: 326770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:02,525-Speed 6307.63 samples/sec Loss 5.4565 LearningRate 0.0005 Epoch: 15 Global Step: 326780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:05,773-Speed 6306.92 samples/sec Loss 5.5438 LearningRate 0.0005 Epoch: 15 Global Step: 326790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:09,019-Speed 6310.27 samples/sec Loss 5.5454 LearningRate 0.0005 Epoch: 15 Global Step: 326800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:12,265-Speed 6310.19 samples/sec Loss 5.5728 LearningRate 0.0005 Epoch: 15 Global Step: 326810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:15,519-Speed 6295.24 samples/sec Loss 5.5263 LearningRate 0.0005 Epoch: 15 Global Step: 326820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:18,764-Speed 6313.33 samples/sec Loss 5.5342 LearningRate 0.0005 Epoch: 15 Global Step: 326830 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:27:21,994-Speed 6340.91 samples/sec Loss 5.5088 LearningRate 0.0005 Epoch: 15 Global Step: 326840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:25,242-Speed 6308.73 samples/sec Loss 5.6033 LearningRate 0.0005 Epoch: 15 Global Step: 326850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:28,488-Speed 6309.83 samples/sec Loss 5.5119 LearningRate 0.0005 Epoch: 15 
Global Step: 326860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:31,737-Speed 6303.65 samples/sec Loss 5.5291 LearningRate 0.0005 Epoch: 15 Global Step: 326870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:34,984-Speed 6309.75 samples/sec Loss 5.5547 LearningRate 0.0005 Epoch: 15 Global Step: 326880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:38,238-Speed 6294.67 samples/sec Loss 5.5146 LearningRate 0.0005 Epoch: 15 Global Step: 326890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:41,490-Speed 6300.46 samples/sec Loss 5.5693 LearningRate 0.0005 Epoch: 15 Global Step: 326900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:44,741-Speed 6299.63 samples/sec Loss 5.5568 LearningRate 0.0005 Epoch: 15 Global Step: 326910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:47,989-Speed 6307.54 samples/sec Loss 5.5117 LearningRate 0.0005 Epoch: 15 Global Step: 326920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:51,239-Speed 6302.63 samples/sec Loss 5.4918 LearningRate 0.0005 Epoch: 15 Global Step: 326930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:54,475-Speed 6331.43 samples/sec Loss 5.4886 LearningRate 0.0005 Epoch: 15 Global Step: 326940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:27:57,720-Speed 6312.03 samples/sec Loss 5.5179 LearningRate 0.0005 Epoch: 15 Global Step: 326950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:00,983-Speed 6279.03 samples/sec Loss 5.5358 LearningRate 0.0005 Epoch: 15 Global Step: 326960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:04,243-Speed 6283.86 samples/sec Loss 5.6126 LearningRate 0.0005 Epoch: 15 Global Step: 326970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:07,488-Speed 6310.85 samples/sec Loss 5.5640 LearningRate 0.0005 Epoch: 15 Global Step: 326980 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:10,738-Speed 6304.82 samples/sec Loss 5.5244 LearningRate 0.0005 Epoch: 15 Global Step: 326990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:13,984-Speed 6310.68 samples/sec Loss 5.5477 LearningRate 0.0005 Epoch: 15 Global Step: 327000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:17,228-Speed 6312.64 samples/sec Loss 5.5589 LearningRate 0.0005 Epoch: 15 Global Step: 327010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:20,475-Speed 6309.52 samples/sec Loss 5.6292 LearningRate 0.0005 Epoch: 15 Global Step: 327020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:23,725-Speed 6304.73 samples/sec Loss 5.5044 LearningRate 0.0005 Epoch: 15 Global Step: 327030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:26,957-Speed 6337.88 samples/sec Loss 5.5988 LearningRate 0.0005 Epoch: 15 Global Step: 327040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:30,204-Speed 6308.65 samples/sec Loss 5.5698 LearningRate 0.0005 Epoch: 15 Global Step: 327050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:33,455-Speed 6300.31 samples/sec Loss 5.4561 LearningRate 0.0005 Epoch: 15 Global Step: 327060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:36,704-Speed 6310.30 samples/sec Loss 5.5219 LearningRate 0.0005 Epoch: 15 Global Step: 327070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:39,954-Speed 6302.96 samples/sec Loss 5.6074 LearningRate 0.0005 Epoch: 15 Global Step: 327080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:43,203-Speed 6305.43 samples/sec Loss 5.4732 LearningRate 0.0005 Epoch: 15 Global Step: 327090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:46,464-Speed 6281.59 samples/sec Loss 5.5390 LearningRate 0.0005 Epoch: 15 Global Step: 327100 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:28:49,714-Speed 6301.56 samples/sec Loss 5.5221 LearningRate 0.0005 Epoch: 15 Global Step: 327110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:52,955-Speed 6320.66 samples/sec Loss 5.5159 LearningRate 0.0005 Epoch: 15 Global Step: 327120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:56,202-Speed 6309.70 samples/sec Loss 5.5322 LearningRate 0.0005 Epoch: 15 Global Step: 327130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:28:59,432-Speed 6340.77 samples/sec Loss 5.5748 LearningRate 0.0005 Epoch: 15 Global Step: 327140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:02,680-Speed 6307.80 samples/sec Loss 5.4825 LearningRate 0.0005 Epoch: 15 Global Step: 327150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:05,930-Speed 6302.89 samples/sec Loss 5.5671 LearningRate 0.0005 Epoch: 15 Global Step: 327160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:09,175-Speed 6312.15 samples/sec Loss 5.5722 LearningRate 0.0005 Epoch: 15 Global Step: 327170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:12,453-Speed 6250.85 samples/sec Loss 5.5931 LearningRate 0.0005 Epoch: 15 Global Step: 327180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:15,699-Speed 6309.63 samples/sec Loss 5.5023 LearningRate 0.0005 Epoch: 15 Global Step: 327190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:18,949-Speed 6302.98 samples/sec Loss 5.5305 LearningRate 0.0005 Epoch: 15 Global Step: 327200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:22,199-Speed 6302.78 samples/sec Loss 5.5802 LearningRate 0.0005 Epoch: 15 Global Step: 327210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:25,447-Speed 6307.95 samples/sec Loss 5.5434 LearningRate 0.0005 Epoch: 15 Global Step: 327220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:29:28,694-Speed 6307.44 samples/sec Loss 5.5011 LearningRate 0.0005 Epoch: 15 Global Step: 327230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:31,926-Speed 6339.02 samples/sec Loss 5.5818 LearningRate 0.0005 Epoch: 15 Global Step: 327240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:35,173-Speed 6309.16 samples/sec Loss 5.5431 LearningRate 0.0005 Epoch: 15 Global Step: 327250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:38,422-Speed 6303.58 samples/sec Loss 5.5811 LearningRate 0.0005 Epoch: 15 Global Step: 327260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:41,672-Speed 6304.23 samples/sec Loss 5.5299 LearningRate 0.0005 Epoch: 15 Global Step: 327270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:44,922-Speed 6301.87 samples/sec Loss 5.5435 LearningRate 0.0005 Epoch: 15 Global Step: 327280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:48,184-Speed 6279.70 samples/sec Loss 5.5013 LearningRate 0.0005 Epoch: 15 Global Step: 327290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:51,433-Speed 6305.99 samples/sec Loss 5.5843 LearningRate 0.0005 Epoch: 15 Global Step: 327300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:54,681-Speed 6306.82 samples/sec Loss 5.5113 LearningRate 0.0005 Epoch: 15 Global Step: 327310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:29:57,925-Speed 6314.63 samples/sec Loss 5.5014 LearningRate 0.0005 Epoch: 15 Global Step: 327320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:01,173-Speed 6305.17 samples/sec Loss 5.5687 LearningRate 0.0005 Epoch: 15 Global Step: 327330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:04,419-Speed 6313.04 samples/sec Loss 5.4640 LearningRate 0.0005 Epoch: 15 Global Step: 327340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:07,668-Speed 6304.52 
samples/sec Loss 5.5018 LearningRate 0.0005 Epoch: 15 Global Step: 327350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:10,916-Speed 6307.47 samples/sec Loss 5.5065 LearningRate 0.0005 Epoch: 15 Global Step: 327360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:14,164-Speed 6307.44 samples/sec Loss 5.6132 LearningRate 0.0005 Epoch: 15 Global Step: 327370 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:17,411-Speed 6308.57 samples/sec Loss 5.6132 LearningRate 0.0005 Epoch: 15 Global Step: 327380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:20,662-Speed 6299.56 samples/sec Loss 5.5738 LearningRate 0.0005 Epoch: 15 Global Step: 327390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:23,912-Speed 6307.27 samples/sec Loss 5.5483 LearningRate 0.0005 Epoch: 15 Global Step: 327400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:27,168-Speed 6290.98 samples/sec Loss 5.5664 LearningRate 0.0005 Epoch: 15 Global Step: 327410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:30,412-Speed 6315.21 samples/sec Loss 5.5308 LearningRate 0.0005 Epoch: 15 Global Step: 327420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:33,658-Speed 6310.06 samples/sec Loss 5.5517 LearningRate 0.0005 Epoch: 15 Global Step: 327430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:36,888-Speed 6341.43 samples/sec Loss 5.5232 LearningRate 0.0005 Epoch: 15 Global Step: 327440 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:40,138-Speed 6302.65 samples/sec Loss 5.5619 LearningRate 0.0005 Epoch: 15 Global Step: 327450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:43,385-Speed 6308.88 samples/sec Loss 5.5371 LearningRate 0.0005 Epoch: 15 Global Step: 327460 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:46,632-Speed 6308.43 samples/sec Loss 5.5484 
LearningRate 0.0005 Epoch: 15 Global Step: 327470 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:49,880-Speed 6307.83 samples/sec Loss 5.5641 LearningRate 0.0005 Epoch: 15 Global Step: 327480 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:53,127-Speed 6307.65 samples/sec Loss 5.5245 LearningRate 0.0005 Epoch: 15 Global Step: 327490 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:56,372-Speed 6312.92 samples/sec Loss 5.4970 LearningRate 0.0005 Epoch: 15 Global Step: 327500 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:30:59,622-Speed 6303.76 samples/sec Loss 5.5563 LearningRate 0.0005 Epoch: 15 Global Step: 327510 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:31:02,854-Speed 6336.88 samples/sec Loss 5.5465 LearningRate 0.0005 Epoch: 15 Global Step: 327520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:06,105-Speed 6301.24 samples/sec Loss 5.5318 LearningRate 0.0005 Epoch: 15 Global Step: 327530 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:09,350-Speed 6312.70 samples/sec Loss 5.5536 LearningRate 0.0005 Epoch: 15 Global Step: 327540 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:12,597-Speed 6309.75 samples/sec Loss 5.5658 LearningRate 0.0005 Epoch: 15 Global Step: 327550 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:15,844-Speed 6307.89 samples/sec Loss 5.5621 LearningRate 0.0005 Epoch: 15 Global Step: 327560 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:19,091-Speed 6309.60 samples/sec Loss 5.5220 LearningRate 0.0005 Epoch: 15 Global Step: 327570 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:22,335-Speed 6314.90 samples/sec Loss 5.4817 LearningRate 0.0005 Epoch: 15 Global Step: 327580 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:25,587-Speed 6298.88 samples/sec Loss 5.5725 LearningRate 0.0005 Epoch: 15 
Global Step: 327590 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:28,834-Speed 6310.35 samples/sec Loss 5.6286 LearningRate 0.0005 Epoch: 15 Global Step: 327600 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:32,080-Speed 6309.09 samples/sec Loss 5.5328 LearningRate 0.0005 Epoch: 15 Global Step: 327610 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:35,324-Speed 6315.81 samples/sec Loss 5.5795 LearningRate 0.0005 Epoch: 15 Global Step: 327620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:31:38,577-Speed 6296.03 samples/sec Loss 5.5411 LearningRate 0.0005 Epoch: 15 Global Step: 327630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:31:41,807-Speed 6342.18 samples/sec Loss 5.5178 LearningRate 0.0005 Epoch: 15 Global Step: 327640 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:45,057-Speed 6303.40 samples/sec Loss 5.4933 LearningRate 0.0005 Epoch: 15 Global Step: 327650 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:48,304-Speed 6308.28 samples/sec Loss 5.6281 LearningRate 0.0005 Epoch: 15 Global Step: 327660 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:51,548-Speed 6313.90 samples/sec Loss 5.4745 LearningRate 0.0005 Epoch: 15 Global Step: 327670 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:54,797-Speed 6306.01 samples/sec Loss 5.5388 LearningRate 0.0005 Epoch: 15 Global Step: 327680 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:31:58,055-Speed 6287.24 samples/sec Loss 5.5115 LearningRate 0.0005 Epoch: 15 Global Step: 327690 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:01,389-Speed 6143.94 samples/sec Loss 5.4599 LearningRate 0.0005 Epoch: 15 Global Step: 327700 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:04,637-Speed 6308.57 samples/sec Loss 5.4927 LearningRate 0.0005 Epoch: 15 Global Step: 327710 Fp16 Grad 
Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:07,883-Speed 6309.17 samples/sec Loss 5.4870 LearningRate 0.0005 Epoch: 15 Global Step: 327720 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:11,132-Speed 6306.08 samples/sec Loss 5.5386 LearningRate 0.0005 Epoch: 15 Global Step: 327730 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:14,375-Speed 6315.28 samples/sec Loss 5.4831 LearningRate 0.0005 Epoch: 15 Global Step: 327740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:32:17,621-Speed 6310.40 samples/sec Loss 5.5466 LearningRate 0.0005 Epoch: 15 Global Step: 327750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:32:20,866-Speed 6313.79 samples/sec Loss 5.4815 LearningRate 0.0005 Epoch: 15 Global Step: 327760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:32:24,097-Speed 6339.07 samples/sec Loss 5.5933 LearningRate 0.0005 Epoch: 15 Global Step: 327770 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:27,346-Speed 6306.27 samples/sec Loss 5.5615 LearningRate 0.0005 Epoch: 15 Global Step: 327780 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:30,593-Speed 6308.22 samples/sec Loss 5.4996 LearningRate 0.0005 Epoch: 15 Global Step: 327790 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:33,838-Speed 6312.74 samples/sec Loss 5.5886 LearningRate 0.0005 Epoch: 15 Global Step: 327800 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:37,083-Speed 6313.65 samples/sec Loss 5.5898 LearningRate 0.0005 Epoch: 15 Global Step: 327810 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:40,330-Speed 6307.52 samples/sec Loss 5.5325 LearningRate 0.0005 Epoch: 15 Global Step: 327820 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:43,580-Speed 6303.83 samples/sec Loss 5.5574 LearningRate 0.0005 Epoch: 15 Global Step: 327830 Fp16 Grad Scale: 16384 Required: 46 hours 
Training: 2022-04-01 21:32:46,823-Speed 6315.95 samples/sec Loss 5.5228 LearningRate 0.0005 Epoch: 15 Global Step: 327840 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:50,076-Speed 6297.04 samples/sec Loss 5.5432 LearningRate 0.0005 Epoch: 15 Global Step: 327850 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:53,322-Speed 6311.08 samples/sec Loss 5.5019 LearningRate 0.0005 Epoch: 15 Global Step: 327860 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:32:56,569-Speed 6308.21 samples/sec Loss 5.5176 LearningRate 0.0005 Epoch: 15 Global Step: 327870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:32:59,814-Speed 6313.35 samples/sec Loss 5.5211 LearningRate 0.0005 Epoch: 15 Global Step: 327880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:03,063-Speed 6304.50 samples/sec Loss 5.5651 LearningRate 0.0005 Epoch: 15 Global Step: 327890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:06,312-Speed 6305.05 samples/sec Loss 5.4482 LearningRate 0.0005 Epoch: 15 Global Step: 327900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:09,554-Speed 6317.89 samples/sec Loss 5.5395 LearningRate 0.0005 Epoch: 15 Global Step: 327910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:12,801-Speed 6309.24 samples/sec Loss 5.5463 LearningRate 0.0005 Epoch: 15 Global Step: 327920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:16,045-Speed 6314.24 samples/sec Loss 5.5690 LearningRate 0.0005 Epoch: 15 Global Step: 327930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:19,302-Speed 6289.87 samples/sec Loss 5.4725 LearningRate 0.0005 Epoch: 15 Global Step: 327940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:22,548-Speed 6311.16 samples/sec Loss 5.5121 LearningRate 0.0005 Epoch: 15 Global Step: 327950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:33:25,793-Speed 6312.89 samples/sec Loss 5.4967 LearningRate 0.0005 Epoch: 15 Global Step: 327960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:29,028-Speed 6331.87 samples/sec Loss 5.4901 LearningRate 0.0005 Epoch: 15 Global Step: 327970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:32,275-Speed 6309.22 samples/sec Loss 5.5473 LearningRate 0.0005 Epoch: 15 Global Step: 327980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:35,522-Speed 6308.77 samples/sec Loss 5.5390 LearningRate 0.0005 Epoch: 15 Global Step: 327990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:38,772-Speed 6302.19 samples/sec Loss 5.5272 LearningRate 0.0005 Epoch: 15 Global Step: 328000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:42,023-Speed 6302.40 samples/sec Loss 5.5733 LearningRate 0.0005 Epoch: 15 Global Step: 328010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:45,281-Speed 6286.82 samples/sec Loss 5.5841 LearningRate 0.0005 Epoch: 15 Global Step: 328020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:48,528-Speed 6307.93 samples/sec Loss 5.4949 LearningRate 0.0005 Epoch: 15 Global Step: 328030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:51,776-Speed 6306.72 samples/sec Loss 5.4788 LearningRate 0.0005 Epoch: 15 Global Step: 328040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:55,019-Speed 6318.72 samples/sec Loss 5.5343 LearningRate 0.0005 Epoch: 15 Global Step: 328050 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:33:58,263-Speed 6313.15 samples/sec Loss 5.5133 LearningRate 0.0005 Epoch: 15 Global Step: 328060 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:01,500-Speed 6327.84 samples/sec Loss 5.5352 LearningRate 0.0005 Epoch: 15 Global Step: 328070 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:04,749-Speed 6306.19 
samples/sec Loss 5.5113 LearningRate 0.0005 Epoch: 15 Global Step: 328080 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:07,992-Speed 6315.49 samples/sec Loss 5.5476 LearningRate 0.0005 Epoch: 15 Global Step: 328090 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:11,251-Speed 6285.70 samples/sec Loss 5.4182 LearningRate 0.0005 Epoch: 15 Global Step: 328100 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:14,498-Speed 6308.49 samples/sec Loss 5.5521 LearningRate 0.0005 Epoch: 15 Global Step: 328110 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:17,748-Speed 6302.78 samples/sec Loss 5.4995 LearningRate 0.0005 Epoch: 15 Global Step: 328120 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:20,993-Speed 6312.69 samples/sec Loss 5.5704 LearningRate 0.0005 Epoch: 15 Global Step: 328130 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:24,240-Speed 6309.25 samples/sec Loss 5.5799 LearningRate 0.0005 Epoch: 15 Global Step: 328140 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:27,483-Speed 6316.63 samples/sec Loss 5.4810 LearningRate 0.0005 Epoch: 15 Global Step: 328150 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:30,728-Speed 6313.02 samples/sec Loss 5.5972 LearningRate 0.0005 Epoch: 15 Global Step: 328160 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:33,960-Speed 6337.06 samples/sec Loss 5.5328 LearningRate 0.0005 Epoch: 15 Global Step: 328170 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:37,206-Speed 6311.41 samples/sec Loss 5.4782 LearningRate 0.0005 Epoch: 15 Global Step: 328180 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:40,455-Speed 6304.73 samples/sec Loss 5.5642 LearningRate 0.0005 Epoch: 15 Global Step: 328190 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:43,699-Speed 6314.70 samples/sec Loss 5.5062 
LearningRate 0.0005 Epoch: 15 Global Step: 328200 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:46,943-Speed 6314.25 samples/sec Loss 5.6456 LearningRate 0.0005 Epoch: 15 Global Step: 328210 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:50,190-Speed 6309.23 samples/sec Loss 5.4943 LearningRate 0.0005 Epoch: 15 Global Step: 328220 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:53,436-Speed 6311.58 samples/sec Loss 5.5725 LearningRate 0.0005 Epoch: 15 Global Step: 328230 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:56,679-Speed 6316.96 samples/sec Loss 5.6024 LearningRate 0.0005 Epoch: 15 Global Step: 328240 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:34:59,929-Speed 6302.13 samples/sec Loss 5.5361 LearningRate 0.0005 Epoch: 15 Global Step: 328250 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:03,183-Speed 6296.13 samples/sec Loss 5.5787 LearningRate 0.0005 Epoch: 15 Global Step: 328260 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:06,412-Speed 6342.81 samples/sec Loss 5.5360 LearningRate 0.0005 Epoch: 15 Global Step: 328270 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:09,660-Speed 6308.27 samples/sec Loss 5.5812 LearningRate 0.0005 Epoch: 15 Global Step: 328280 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:12,904-Speed 6314.81 samples/sec Loss 5.4939 LearningRate 0.0005 Epoch: 15 Global Step: 328290 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:16,149-Speed 6312.53 samples/sec Loss 5.5546 LearningRate 0.0005 Epoch: 15 Global Step: 328300 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:19,396-Speed 6308.46 samples/sec Loss 5.6666 LearningRate 0.0005 Epoch: 15 Global Step: 328310 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:22,646-Speed 6301.89 samples/sec Loss 5.4941 LearningRate 0.0005 Epoch: 15 
Global Step: 328320 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:25,894-Speed 6308.30 samples/sec Loss 5.5098 LearningRate 0.0005 Epoch: 15 Global Step: 328330 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:29,138-Speed 6314.18 samples/sec Loss 5.4934 LearningRate 0.0005 Epoch: 15 Global Step: 328340 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:32,385-Speed 6308.00 samples/sec Loss 5.5897 LearningRate 0.0005 Epoch: 15 Global Step: 328350 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:35,630-Speed 6313.74 samples/sec Loss 5.5021 LearningRate 0.0005 Epoch: 15 Global Step: 328360 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:38,875-Speed 6312.42 samples/sec Loss 5.5123 LearningRate 0.0005 Epoch: 15 Global Step: 328370 Fp16 Grad Scale: 65536 Required: 46 hours Training: 2022-04-01 21:35:42,107-Speed 6336.79 samples/sec Loss 5.5291 LearningRate 0.0005 Epoch: 15 Global Step: 328380 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:45,350-Speed 6316.80 samples/sec Loss 5.5066 LearningRate 0.0005 Epoch: 15 Global Step: 328390 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:48,595-Speed 6312.41 samples/sec Loss 5.4693 LearningRate 0.0005 Epoch: 15 Global Step: 328400 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:51,837-Speed 6318.62 samples/sec Loss 5.5091 LearningRate 0.0005 Epoch: 15 Global Step: 328410 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:55,087-Speed 6303.39 samples/sec Loss 5.4776 LearningRate 0.0005 Epoch: 15 Global Step: 328420 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:35:58,337-Speed 6303.31 samples/sec Loss 5.4862 LearningRate 0.0005 Epoch: 15 Global Step: 328430 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:01,583-Speed 6310.33 samples/sec Loss 5.4962 LearningRate 0.0005 Epoch: 15 Global Step: 328440 Fp16 Grad 
Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:04,831-Speed 6308.95 samples/sec Loss 5.4470 LearningRate 0.0005 Epoch: 15 Global Step: 328450 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:08,063-Speed 6336.74 samples/sec Loss 5.5960 LearningRate 0.0005 Epoch: 15 Global Step: 328460 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:11,307-Speed 6315.71 samples/sec Loss 5.4744 LearningRate 0.0005 Epoch: 15 Global Step: 328470 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:14,555-Speed 6305.69 samples/sec Loss 5.4762 LearningRate 0.0005 Epoch: 15 Global Step: 328480 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:17,799-Speed 6315.60 samples/sec Loss 5.4815 LearningRate 0.0005 Epoch: 15 Global Step: 328490 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:21,044-Speed 6311.37 samples/sec Loss 5.5144 LearningRate 0.0005 Epoch: 15 Global Step: 328500 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:24,295-Speed 6301.74 samples/sec Loss 5.5000 LearningRate 0.0005 Epoch: 15 Global Step: 328510 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:27,539-Speed 6314.06 samples/sec Loss 5.5131 LearningRate 0.0005 Epoch: 15 Global Step: 328520 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:30,783-Speed 6314.11 samples/sec Loss 5.5107 LearningRate 0.0005 Epoch: 15 Global Step: 328530 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:34,027-Speed 6315.32 samples/sec Loss 5.5562 LearningRate 0.0005 Epoch: 15 Global Step: 328540 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:37,272-Speed 6313.17 samples/sec Loss 5.5284 LearningRate 0.0005 Epoch: 15 Global Step: 328550 Fp16 Grad Scale: 16384 Required: 46 hours Training: 2022-04-01 21:36:40,518-Speed 6309.31 samples/sec Loss 5.5181 LearningRate 0.0005 Epoch: 15 Global Step: 328560 Fp16 Grad Scale: 32768 Required: 46 hours 
Training: 2022-04-01 21:36:43,766-Speed 6308.06 samples/sec Loss 5.5095 LearningRate 0.0005 Epoch: 15 Global Step: 328570 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:47,014-Speed 6307.37 samples/sec Loss 5.4933 LearningRate 0.0005 Epoch: 15 Global Step: 328580 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:50,261-Speed 6307.99 samples/sec Loss 5.4569 LearningRate 0.0005 Epoch: 15 Global Step: 328590 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:53,505-Speed 6314.84 samples/sec Loss 5.5275 LearningRate 0.0005 Epoch: 15 Global Step: 328600 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:56,748-Speed 6317.27 samples/sec Loss 5.5439 LearningRate 0.0005 Epoch: 15 Global Step: 328610 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:36:59,999-Speed 6300.92 samples/sec Loss 5.5806 LearningRate 0.0005 Epoch: 15 Global Step: 328620 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:03,253-Speed 6294.94 samples/sec Loss 5.4915 LearningRate 0.0005 Epoch: 15 Global Step: 328630 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:06,499-Speed 6310.02 samples/sec Loss 5.5246 LearningRate 0.0005 Epoch: 15 Global Step: 328640 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:09,742-Speed 6317.86 samples/sec Loss 5.5249 LearningRate 0.0005 Epoch: 15 Global Step: 328650 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:12,972-Speed 6341.95 samples/sec Loss 5.5415 LearningRate 0.0005 Epoch: 15 Global Step: 328660 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:16,221-Speed 6304.68 samples/sec Loss 5.5078 LearningRate 0.0005 Epoch: 15 Global Step: 328670 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:19,465-Speed 6315.18 samples/sec Loss 5.4753 LearningRate 0.0005 Epoch: 15 Global Step: 328680 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 
21:37:22,714-Speed 6303.78 samples/sec Loss 5.4910 LearningRate 0.0005 Epoch: 15 Global Step: 328690 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:25,962-Speed 6308.11 samples/sec Loss 5.4540 LearningRate 0.0005 Epoch: 15 Global Step: 328700 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:29,210-Speed 6306.63 samples/sec Loss 5.5054 LearningRate 0.0005 Epoch: 15 Global Step: 328710 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:32,459-Speed 6304.71 samples/sec Loss 5.6080 LearningRate 0.0005 Epoch: 15 Global Step: 328720 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:35,707-Speed 6307.12 samples/sec Loss 5.4784 LearningRate 0.0004 Epoch: 15 Global Step: 328730 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:38,955-Speed 6306.78 samples/sec Loss 5.5522 LearningRate 0.0004 Epoch: 15 Global Step: 328740 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:42,204-Speed 6304.16 samples/sec Loss 5.4504 LearningRate 0.0004 Epoch: 15 Global Step: 328750 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:45,436-Speed 6338.96 samples/sec Loss 5.5450 LearningRate 0.0004 Epoch: 15 Global Step: 328760 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:48,686-Speed 6302.35 samples/sec Loss 5.4981 LearningRate 0.0004 Epoch: 15 Global Step: 328770 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:51,930-Speed 6314.90 samples/sec Loss 5.4791 LearningRate 0.0004 Epoch: 15 Global Step: 328780 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:55,177-Speed 6308.52 samples/sec Loss 5.5163 LearningRate 0.0004 Epoch: 15 Global Step: 328790 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:37:58,434-Speed 6288.93 samples/sec Loss 5.4749 LearningRate 0.0004 Epoch: 15 Global Step: 328800 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:01,685-Speed 6300.46 
samples/sec Loss 5.5222 LearningRate 0.0004 Epoch: 15 Global Step: 328810 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:04,935-Speed 6303.83 samples/sec Loss 5.4857 LearningRate 0.0004 Epoch: 15 Global Step: 328820 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:08,181-Speed 6310.68 samples/sec Loss 5.4377 LearningRate 0.0004 Epoch: 15 Global Step: 328830 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:11,429-Speed 6306.90 samples/sec Loss 5.5469 LearningRate 0.0004 Epoch: 15 Global Step: 328840 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:14,676-Speed 6308.98 samples/sec Loss 5.5114 LearningRate 0.0004 Epoch: 15 Global Step: 328850 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:17,912-Speed 6329.19 samples/sec Loss 5.4412 LearningRate 0.0004 Epoch: 15 Global Step: 328860 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:21,161-Speed 6306.14 samples/sec Loss 5.5079 LearningRate 0.0004 Epoch: 15 Global Step: 328870 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:24,406-Speed 6313.58 samples/sec Loss 5.5513 LearningRate 0.0004 Epoch: 15 Global Step: 328880 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:27,652-Speed 6310.79 samples/sec Loss 5.5471 LearningRate 0.0004 Epoch: 15 Global Step: 328890 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:30,901-Speed 6305.76 samples/sec Loss 5.5724 LearningRate 0.0004 Epoch: 15 Global Step: 328900 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:34,147-Speed 6310.18 samples/sec Loss 5.5054 LearningRate 0.0004 Epoch: 15 Global Step: 328910 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:37,396-Speed 6304.06 samples/sec Loss 5.5200 LearningRate 0.0004 Epoch: 15 Global Step: 328920 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:40,647-Speed 6301.26 samples/sec Loss 5.5169 
LearningRate 0.0004 Epoch: 15 Global Step: 328930 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:43,891-Speed 6315.02 samples/sec Loss 5.5289 LearningRate 0.0004 Epoch: 15 Global Step: 328940 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:47,140-Speed 6304.98 samples/sec Loss 5.5105 LearningRate 0.0004 Epoch: 15 Global Step: 328950 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:50,370-Speed 6342.41 samples/sec Loss 5.4324 LearningRate 0.0004 Epoch: 15 Global Step: 328960 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:53,615-Speed 6311.95 samples/sec Loss 5.5623 LearningRate 0.0004 Epoch: 15 Global Step: 328970 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:38:56,861-Speed 6310.62 samples/sec Loss 5.5477 LearningRate 0.0004 Epoch: 15 Global Step: 328980 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:00,106-Speed 6312.18 samples/sec Loss 5.4562 LearningRate 0.0004 Epoch: 15 Global Step: 328990 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:03,353-Speed 6309.15 samples/sec Loss 5.5218 LearningRate 0.0004 Epoch: 15 Global Step: 329000 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:06,599-Speed 6311.25 samples/sec Loss 5.5195 LearningRate 0.0004 Epoch: 15 Global Step: 329010 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:09,844-Speed 6311.79 samples/sec Loss 5.4354 LearningRate 0.0004 Epoch: 15 Global Step: 329020 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:13,093-Speed 6308.17 samples/sec Loss 5.5171 LearningRate 0.0004 Epoch: 15 Global Step: 329030 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:16,354-Speed 6280.95 samples/sec Loss 5.5305 LearningRate 0.0004 Epoch: 15 Global Step: 329040 Fp16 Grad Scale: 32768 Required: 46 hours Training: 2022-04-01 21:39:19,584-Speed 6341.45 samples/sec Loss 5.5505 LearningRate 0.0004 Epoch: 15 
Global Step: 329050 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:22,831-Speed 6310.06 samples/sec Loss 5.4700 LearningRate 0.0004 Epoch: 15 Global Step: 329060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:26,081-Speed 6302.36 samples/sec Loss 5.5297 LearningRate 0.0004 Epoch: 15 Global Step: 329070 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:29,330-Speed 6305.32 samples/sec Loss 5.5370 LearningRate 0.0004 Epoch: 15 Global Step: 329080 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:32,572-Speed 6319.17 samples/sec Loss 5.4890 LearningRate 0.0004 Epoch: 15 Global Step: 329090 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:35,817-Speed 6312.23 samples/sec Loss 5.5939 LearningRate 0.0004 Epoch: 15 Global Step: 329100 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:39,063-Speed 6312.17 samples/sec Loss 5.5190 LearningRate 0.0004 Epoch: 15 Global Step: 329110 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:42,307-Speed 6313.92 samples/sec Loss 5.5881 LearningRate 0.0004 Epoch: 15 Global Step: 329120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:45,554-Speed 6308.44 samples/sec Loss 5.5398 LearningRate 0.0004 Epoch: 15 Global Step: 329130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:48,800-Speed 6311.33 samples/sec Loss 5.5028 LearningRate 0.0004 Epoch: 15 Global Step: 329140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:39:52,047-Speed 6308.43 samples/sec Loss 5.5061 LearningRate 0.0004 Epoch: 15 Global Step: 329150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:39:55,300-Speed 6297.92 samples/sec Loss 5.5321 LearningRate 0.0004 Epoch: 15 Global Step: 329160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:39:58,547-Speed 6308.37 samples/sec Loss 5.4994 LearningRate 0.0004 Epoch: 15 Global Step: 329170 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:01,791-Speed 6313.19 samples/sec Loss 5.5011 LearningRate 0.0004 Epoch: 15 Global Step: 329180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:05,049-Speed 6288.66 samples/sec Loss 5.5109 LearningRate 0.0004 Epoch: 15 Global Step: 329190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:08,294-Speed 6313.20 samples/sec Loss 5.5099 LearningRate 0.0004 Epoch: 15 Global Step: 329200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:11,542-Speed 6306.43 samples/sec Loss 5.5297 LearningRate 0.0004 Epoch: 15 Global Step: 329210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:14,788-Speed 6310.07 samples/sec Loss 5.5150 LearningRate 0.0004 Epoch: 15 Global Step: 329220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:18,041-Speed 6297.96 samples/sec Loss 5.5188 LearningRate 0.0004 Epoch: 15 Global Step: 329230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:21,285-Speed 6313.52 samples/sec Loss 5.4732 LearningRate 0.0004 Epoch: 15 Global Step: 329240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:24,523-Speed 6327.47 samples/sec Loss 5.5036 LearningRate 0.0004 Epoch: 15 Global Step: 329250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:40:27,755-Speed 6338.25 samples/sec Loss 5.4609 LearningRate 0.0004 Epoch: 15 Global Step: 329260 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:30,998-Speed 6315.95 samples/sec Loss 5.5160 LearningRate 0.0004 Epoch: 15 Global Step: 329270 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:34,244-Speed 6310.44 samples/sec Loss 5.5281 LearningRate 0.0004 Epoch: 15 Global Step: 329280 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:37,489-Speed 6311.48 samples/sec Loss 5.4732 LearningRate 0.0004 Epoch: 15 Global Step: 329290 Fp16 Grad Scale: 16384 Required: 45 hours 
Training: 2022-04-01 21:40:40,735-Speed 6312.85 samples/sec Loss 5.5363 LearningRate 0.0004 Epoch: 15 Global Step: 329300 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:43,983-Speed 6306.41 samples/sec Loss 5.4744 LearningRate 0.0004 Epoch: 15 Global Step: 329310 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:47,230-Speed 6309.67 samples/sec Loss 5.5166 LearningRate 0.0004 Epoch: 15 Global Step: 329320 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:50,474-Speed 6313.62 samples/sec Loss 5.5009 LearningRate 0.0004 Epoch: 15 Global Step: 329330 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:53,721-Speed 6310.22 samples/sec Loss 5.5385 LearningRate 0.0004 Epoch: 15 Global Step: 329340 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:40:56,969-Speed 6306.58 samples/sec Loss 5.6118 LearningRate 0.0004 Epoch: 15 Global Step: 329350 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:41:00,219-Speed 6302.30 samples/sec Loss 5.5522 LearningRate 0.0004 Epoch: 15 Global Step: 329360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:03,467-Speed 6306.22 samples/sec Loss 5.5301 LearningRate 0.0004 Epoch: 15 Global Step: 329370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:06,712-Speed 6313.14 samples/sec Loss 5.5597 LearningRate 0.0004 Epoch: 15 Global Step: 329380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:09,973-Speed 6282.51 samples/sec Loss 5.5116 LearningRate 0.0004 Epoch: 15 Global Step: 329390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:13,222-Speed 6304.41 samples/sec Loss 5.4690 LearningRate 0.0004 Epoch: 15 Global Step: 329400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:16,471-Speed 6304.93 samples/sec Loss 5.5531 LearningRate 0.0004 Epoch: 15 Global Step: 329410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
21:41:19,717-Speed 6309.79 samples/sec Loss 5.5165 LearningRate 0.0004 Epoch: 15 Global Step: 329420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:22,962-Speed 6313.64 samples/sec Loss 5.4079 LearningRate 0.0004 Epoch: 15 Global Step: 329430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:26,209-Speed 6307.26 samples/sec Loss 5.5605 LearningRate 0.0004 Epoch: 15 Global Step: 329440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:29,457-Speed 6307.57 samples/sec Loss 5.5520 LearningRate 0.0004 Epoch: 15 Global Step: 329450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:32,695-Speed 6326.83 samples/sec Loss 5.5305 LearningRate 0.0004 Epoch: 15 Global Step: 329460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:35,941-Speed 6310.61 samples/sec Loss 5.4757 LearningRate 0.0004 Epoch: 15 Global Step: 329470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:39,186-Speed 6312.35 samples/sec Loss 5.5610 LearningRate 0.0004 Epoch: 15 Global Step: 329480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:42,436-Speed 6303.65 samples/sec Loss 5.5088 LearningRate 0.0004 Epoch: 15 Global Step: 329490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:45,681-Speed 6312.28 samples/sec Loss 5.5412 LearningRate 0.0004 Epoch: 15 Global Step: 329500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:48,928-Speed 6309.19 samples/sec Loss 5.4669 LearningRate 0.0004 Epoch: 15 Global Step: 329510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:52,172-Speed 6314.28 samples/sec Loss 5.5311 LearningRate 0.0004 Epoch: 15 Global Step: 329520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:55,415-Speed 6316.18 samples/sec Loss 5.5459 LearningRate 0.0004 Epoch: 15 Global Step: 329530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:41:58,662-Speed 6310.31 
samples/sec Loss 5.4788 LearningRate 0.0004 Epoch: 15 Global Step: 329540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:01,907-Speed 6312.78 samples/sec Loss 5.5278 LearningRate 0.0004 Epoch: 15 Global Step: 329550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:05,138-Speed 6338.46 samples/sec Loss 5.5365 LearningRate 0.0004 Epoch: 15 Global Step: 329560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:08,384-Speed 6310.93 samples/sec Loss 5.4834 LearningRate 0.0004 Epoch: 15 Global Step: 329570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:11,635-Speed 6301.32 samples/sec Loss 5.5833 LearningRate 0.0004 Epoch: 15 Global Step: 329580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:14,881-Speed 6311.11 samples/sec Loss 5.5351 LearningRate 0.0004 Epoch: 15 Global Step: 329590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:18,130-Speed 6305.49 samples/sec Loss 5.5062 LearningRate 0.0004 Epoch: 15 Global Step: 329600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:21,377-Speed 6308.76 samples/sec Loss 5.5090 LearningRate 0.0004 Epoch: 15 Global Step: 329610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:24,623-Speed 6310.30 samples/sec Loss 5.6050 LearningRate 0.0004 Epoch: 15 Global Step: 329620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:42:27,855-Speed 6336.69 samples/sec Loss 5.4899 LearningRate 0.0004 Epoch: 15 Global Step: 329630 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:31,103-Speed 6307.82 samples/sec Loss 5.4772 LearningRate 0.0004 Epoch: 15 Global Step: 329640 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:34,348-Speed 6313.21 samples/sec Loss 5.4936 LearningRate 0.0004 Epoch: 15 Global Step: 329650 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:37,596-Speed 6305.86 samples/sec Loss 5.5183 
LearningRate 0.0004 Epoch: 15 Global Step: 329660 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:40,843-Speed 6309.03 samples/sec Loss 5.5702 LearningRate 0.0004 Epoch: 15 Global Step: 329670 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:44,092-Speed 6304.68 samples/sec Loss 5.4601 LearningRate 0.0004 Epoch: 15 Global Step: 329680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:47,337-Speed 6312.91 samples/sec Loss 5.4699 LearningRate 0.0004 Epoch: 15 Global Step: 329690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:50,585-Speed 6306.51 samples/sec Loss 5.5155 LearningRate 0.0004 Epoch: 15 Global Step: 329700 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:53,831-Speed 6312.16 samples/sec Loss 5.5394 LearningRate 0.0004 Epoch: 15 Global Step: 329710 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:42:57,081-Speed 6302.65 samples/sec Loss 5.4171 LearningRate 0.0004 Epoch: 15 Global Step: 329720 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:43:00,328-Speed 6310.30 samples/sec Loss 5.4860 LearningRate 0.0004 Epoch: 15 Global Step: 329730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:03,572-Speed 6313.38 samples/sec Loss 5.5739 LearningRate 0.0004 Epoch: 15 Global Step: 329740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:06,818-Speed 6310.41 samples/sec Loss 5.5507 LearningRate 0.0004 Epoch: 15 Global Step: 329750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:10,065-Speed 6309.84 samples/sec Loss 5.5813 LearningRate 0.0004 Epoch: 15 Global Step: 329760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:13,309-Speed 6313.58 samples/sec Loss 5.4767 LearningRate 0.0004 Epoch: 15 Global Step: 329770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:16,556-Speed 6310.38 samples/sec Loss 5.5879 LearningRate 0.0004 Epoch: 15 
Global Step: 329780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:19,818-Speed 6280.15 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 15 Global Step: 329790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:23,063-Speed 6311.49 samples/sec Loss 5.5556 LearningRate 0.0004 Epoch: 15 Global Step: 329800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:26,305-Speed 6319.27 samples/sec Loss 5.4493 LearningRate 0.0004 Epoch: 15 Global Step: 329810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:29,547-Speed 6318.20 samples/sec Loss 5.5003 LearningRate 0.0004 Epoch: 15 Global Step: 329820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:32,782-Speed 6332.55 samples/sec Loss 5.5366 LearningRate 0.0004 Epoch: 15 Global Step: 329830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:36,027-Speed 6312.01 samples/sec Loss 5.4935 LearningRate 0.0004 Epoch: 15 Global Step: 329840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:39,276-Speed 6304.98 samples/sec Loss 5.4835 LearningRate 0.0004 Epoch: 15 Global Step: 329850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:42,542-Speed 6272.48 samples/sec Loss 5.5608 LearningRate 0.0004 Epoch: 15 Global Step: 329860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:45,785-Speed 6316.32 samples/sec Loss 5.5276 LearningRate 0.0004 Epoch: 15 Global Step: 329870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:49,030-Speed 6312.86 samples/sec Loss 5.5514 LearningRate 0.0004 Epoch: 15 Global Step: 329880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:52,277-Speed 6308.88 samples/sec Loss 5.5378 LearningRate 0.0004 Epoch: 15 Global Step: 329890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:55,522-Speed 6311.89 samples/sec Loss 5.5247 LearningRate 0.0004 Epoch: 15 Global Step: 329900 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 21:43:58,768-Speed 6310.26 samples/sec Loss 5.5444 LearningRate 0.0004 Epoch: 15 Global Step: 329910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:02,011-Speed 6316.72 samples/sec Loss 5.5327 LearningRate 0.0004 Epoch: 15 Global Step: 329920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:05,245-Speed 6335.26 samples/sec Loss 5.5091 LearningRate 0.0004 Epoch: 15 Global Step: 329930 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:08,489-Speed 6314.87 samples/sec Loss 5.4635 LearningRate 0.0004 Epoch: 15 Global Step: 329940 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:11,741-Speed 6298.30 samples/sec Loss 5.5380 LearningRate 0.0004 Epoch: 15 Global Step: 329950 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:14,987-Speed 6311.60 samples/sec Loss 5.5540 LearningRate 0.0004 Epoch: 15 Global Step: 329960 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:18,231-Speed 6315.42 samples/sec Loss 5.5216 LearningRate 0.0004 Epoch: 15 Global Step: 329970 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:21,476-Speed 6310.94 samples/sec Loss 5.4329 LearningRate 0.0004 Epoch: 15 Global Step: 329980 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:24,728-Speed 6298.88 samples/sec Loss 5.5125 LearningRate 0.0004 Epoch: 15 Global Step: 329990 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:27,977-Speed 6305.79 samples/sec Loss 5.5633 LearningRate 0.0004 Epoch: 15 Global Step: 330000 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:31,228-Speed 6301.99 samples/sec Loss 5.5342 LearningRate 0.0004 Epoch: 15 Global Step: 330010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:34,473-Speed 6310.88 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 15 Global Step: 330020 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 21:44:37,702-Speed 6344.87 samples/sec Loss 5.5010 LearningRate 0.0004 Epoch: 15 Global Step: 330030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:40,949-Speed 6308.44 samples/sec Loss 5.5270 LearningRate 0.0004 Epoch: 15 Global Step: 330040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:44,193-Speed 6314.37 samples/sec Loss 5.5199 LearningRate 0.0004 Epoch: 15 Global Step: 330050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:47,441-Speed 6308.21 samples/sec Loss 5.4897 LearningRate 0.0004 Epoch: 15 Global Step: 330060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:50,686-Speed 6312.53 samples/sec Loss 5.5219 LearningRate 0.0004 Epoch: 15 Global Step: 330070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:53,933-Speed 6308.35 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 15 Global Step: 330080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:44:57,179-Speed 6309.55 samples/sec Loss 5.4731 LearningRate 0.0004 Epoch: 15 Global Step: 330090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:00,427-Speed 6306.97 samples/sec Loss 5.4631 LearningRate 0.0004 Epoch: 15 Global Step: 330100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:03,672-Speed 6313.32 samples/sec Loss 5.5202 LearningRate 0.0004 Epoch: 15 Global Step: 330110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:06,920-Speed 6306.82 samples/sec Loss 5.5193 LearningRate 0.0004 Epoch: 15 Global Step: 330120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:10,152-Speed 6338.23 samples/sec Loss 5.5791 LearningRate 0.0004 Epoch: 15 Global Step: 330130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:13,401-Speed 6304.04 samples/sec Loss 5.5542 LearningRate 0.0004 Epoch: 15 Global Step: 330140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
21:45:16,647-Speed 6311.82 samples/sec Loss 5.4991 LearningRate 0.0004 Epoch: 15 Global Step: 330150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:19,891-Speed 6315.38 samples/sec Loss 5.5212 LearningRate 0.0004 Epoch: 15 Global Step: 330160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:23,139-Speed 6306.64 samples/sec Loss 5.5216 LearningRate 0.0004 Epoch: 15 Global Step: 330170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:26,381-Speed 6317.94 samples/sec Loss 5.4163 LearningRate 0.0004 Epoch: 15 Global Step: 330180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:29,625-Speed 6315.81 samples/sec Loss 5.4809 LearningRate 0.0004 Epoch: 15 Global Step: 330190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:32,872-Speed 6308.91 samples/sec Loss 5.5059 LearningRate 0.0004 Epoch: 15 Global Step: 330200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:36,123-Speed 6299.41 samples/sec Loss 5.4808 LearningRate 0.0004 Epoch: 15 Global Step: 330210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:39,366-Speed 6317.58 samples/sec Loss 5.4818 LearningRate 0.0004 Epoch: 15 Global Step: 330220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:42,602-Speed 6330.67 samples/sec Loss 5.5337 LearningRate 0.0004 Epoch: 15 Global Step: 330230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:45,848-Speed 6310.16 samples/sec Loss 5.4857 LearningRate 0.0004 Epoch: 15 Global Step: 330240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:49,098-Speed 6302.88 samples/sec Loss 5.5662 LearningRate 0.0004 Epoch: 15 Global Step: 330250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:52,343-Speed 6313.48 samples/sec Loss 5.5261 LearningRate 0.0004 Epoch: 15 Global Step: 330260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:55,585-Speed 6318.07 
samples/sec Loss 5.5390 LearningRate 0.0004 Epoch: 15 Global Step: 330270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:45:58,832-Speed 6308.98 samples/sec Loss 5.4838 LearningRate 0.0004 Epoch: 15 Global Step: 330280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:02,077-Speed 6313.13 samples/sec Loss 5.5534 LearningRate 0.0004 Epoch: 15 Global Step: 330290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:05,322-Speed 6311.84 samples/sec Loss 5.5563 LearningRate 0.0004 Epoch: 15 Global Step: 330300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:08,564-Speed 6318.76 samples/sec Loss 5.4580 LearningRate 0.0004 Epoch: 15 Global Step: 330310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:11,808-Speed 6314.76 samples/sec Loss 5.4834 LearningRate 0.0004 Epoch: 15 Global Step: 330320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:15,038-Speed 6341.40 samples/sec Loss 5.5593 LearningRate 0.0004 Epoch: 15 Global Step: 330330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:18,287-Speed 6305.65 samples/sec Loss 5.5458 LearningRate 0.0004 Epoch: 15 Global Step: 330340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:21,533-Speed 6309.25 samples/sec Loss 5.5351 LearningRate 0.0004 Epoch: 15 Global Step: 330350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:24,776-Speed 6316.81 samples/sec Loss 5.5273 LearningRate 0.0004 Epoch: 15 Global Step: 330360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:28,024-Speed 6307.55 samples/sec Loss 5.5048 LearningRate 0.0004 Epoch: 15 Global Step: 330370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:31,273-Speed 6305.59 samples/sec Loss 5.5115 LearningRate 0.0004 Epoch: 15 Global Step: 330380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:34,519-Speed 6309.77 samples/sec Loss 5.5759 
LearningRate 0.0004 Epoch: 15 Global Step: 330390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:37,767-Speed 6307.96 samples/sec Loss 5.5272 LearningRate 0.0004 Epoch: 15 Global Step: 330400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:41,015-Speed 6307.36 samples/sec Loss 5.5286 LearningRate 0.0004 Epoch: 15 Global Step: 330410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:44,263-Speed 6305.37 samples/sec Loss 5.5207 LearningRate 0.0004 Epoch: 15 Global Step: 330420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:47,511-Speed 6306.94 samples/sec Loss 5.5874 LearningRate 0.0004 Epoch: 15 Global Step: 330430 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 21:46:50,744-Speed 6337.41 samples/sec Loss 5.5512 LearningRate 0.0004 Epoch: 15 Global Step: 330440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:53,990-Speed 6309.95 samples/sec Loss 5.5035 LearningRate 0.0004 Epoch: 15 Global Step: 330450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:46:57,234-Speed 6314.20 samples/sec Loss 5.5397 LearningRate 0.0004 Epoch: 15 Global Step: 330460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:00,483-Speed 6305.22 samples/sec Loss 5.5250 LearningRate 0.0004 Epoch: 15 Global Step: 330470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:03,726-Speed 6317.16 samples/sec Loss 5.5389 LearningRate 0.0004 Epoch: 15 Global Step: 330480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:06,957-Speed 6338.56 samples/sec Loss 5.5777 LearningRate 0.0004 Epoch: 15 Global Step: 330490 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:10,204-Speed 6308.89 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 15 Global Step: 330500 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:13,451-Speed 6310.13 samples/sec Loss 5.5129 LearningRate 0.0004 Epoch: 15 
Global Step: 330510 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:16,697-Speed 6310.58 samples/sec Loss 5.4332 LearningRate 0.0004 Epoch: 15 Global Step: 330520 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:19,942-Speed 6311.96 samples/sec Loss 5.5361 LearningRate 0.0004 Epoch: 15 Global Step: 330530 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:23,187-Speed 6312.87 samples/sec Loss 5.3715 LearningRate 0.0004 Epoch: 15 Global Step: 330540 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:26,431-Speed 6313.99 samples/sec Loss 5.4112 LearningRate 0.0004 Epoch: 15 Global Step: 330550 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:29,675-Speed 6315.29 samples/sec Loss 5.4898 LearningRate 0.0004 Epoch: 15 Global Step: 330560 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:32,919-Speed 6315.03 samples/sec Loss 5.4946 LearningRate 0.0004 Epoch: 15 Global Step: 330570 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:36,165-Speed 6310.20 samples/sec Loss 5.4621 LearningRate 0.0004 Epoch: 15 Global Step: 330580 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:47:39,410-Speed 6312.54 samples/sec Loss 5.5186 LearningRate 0.0004 Epoch: 15 Global Step: 330590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:42,658-Speed 6307.04 samples/sec Loss 5.4647 LearningRate 0.0004 Epoch: 15 Global Step: 330600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:45,907-Speed 6305.76 samples/sec Loss 5.5287 LearningRate 0.0004 Epoch: 15 Global Step: 330610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:49,149-Speed 6319.14 samples/sec Loss 5.4624 LearningRate 0.0004 Epoch: 15 Global Step: 330620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:52,393-Speed 6314.31 samples/sec Loss 5.4869 LearningRate 0.0004 Epoch: 15 Global Step: 330630 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:55,641-Speed 6306.84 samples/sec Loss 5.5215 LearningRate 0.0004 Epoch: 15 Global Step: 330640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:47:58,888-Speed 6307.23 samples/sec Loss 5.4510 LearningRate 0.0004 Epoch: 15 Global Step: 330650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:02,137-Speed 6306.58 samples/sec Loss 5.6126 LearningRate 0.0004 Epoch: 15 Global Step: 330660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:05,382-Speed 6311.75 samples/sec Loss 5.5026 LearningRate 0.0004 Epoch: 15 Global Step: 330670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:08,627-Speed 6313.07 samples/sec Loss 5.4976 LearningRate 0.0004 Epoch: 15 Global Step: 330680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:11,863-Speed 6332.65 samples/sec Loss 5.5003 LearningRate 0.0004 Epoch: 15 Global Step: 330690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:15,105-Speed 6318.47 samples/sec Loss 5.4759 LearningRate 0.0004 Epoch: 15 Global Step: 330700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:18,353-Speed 6308.43 samples/sec Loss 5.4580 LearningRate 0.0004 Epoch: 15 Global Step: 330710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:21,598-Speed 6310.72 samples/sec Loss 5.4368 LearningRate 0.0004 Epoch: 15 Global Step: 330720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:24,847-Speed 6308.69 samples/sec Loss 5.5722 LearningRate 0.0004 Epoch: 15 Global Step: 330730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:28,092-Speed 6313.61 samples/sec Loss 5.4849 LearningRate 0.0004 Epoch: 15 Global Step: 330740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:31,337-Speed 6311.80 samples/sec Loss 5.5634 LearningRate 0.0004 Epoch: 15 Global Step: 330750 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 21:48:34,583-Speed 6310.43 samples/sec Loss 5.5537 LearningRate 0.0004 Epoch: 15 Global Step: 330760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:37,827-Speed 6315.36 samples/sec Loss 5.5204 LearningRate 0.0004 Epoch: 15 Global Step: 330770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:41,082-Speed 6293.36 samples/sec Loss 5.5608 LearningRate 0.0004 Epoch: 15 Global Step: 330780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:44,313-Speed 6339.90 samples/sec Loss 5.6325 LearningRate 0.0004 Epoch: 15 Global Step: 330790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:47,559-Speed 6310.22 samples/sec Loss 5.5854 LearningRate 0.0004 Epoch: 15 Global Step: 330800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:50,805-Speed 6310.95 samples/sec Loss 5.4791 LearningRate 0.0004 Epoch: 15 Global Step: 330810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:54,053-Speed 6307.83 samples/sec Loss 5.5676 LearningRate 0.0004 Epoch: 15 Global Step: 330820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:48:57,306-Speed 6297.24 samples/sec Loss 5.4731 LearningRate 0.0004 Epoch: 15 Global Step: 330830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:00,556-Speed 6303.40 samples/sec Loss 5.6033 LearningRate 0.0004 Epoch: 15 Global Step: 330840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:03,801-Speed 6311.68 samples/sec Loss 5.5214 LearningRate 0.0004 Epoch: 15 Global Step: 330850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:07,048-Speed 6309.81 samples/sec Loss 5.4896 LearningRate 0.0004 Epoch: 15 Global Step: 330860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:10,290-Speed 6317.41 samples/sec Loss 5.4519 LearningRate 0.0004 Epoch: 15 Global Step: 330870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
21:49:13,539-Speed 6304.86 samples/sec Loss 5.4524 LearningRate 0.0004 Epoch: 15 Global Step: 330880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:16,772-Speed 6337.29 samples/sec Loss 5.4171 LearningRate 0.0004 Epoch: 15 Global Step: 330890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:20,015-Speed 6317.02 samples/sec Loss 5.5484 LearningRate 0.0004 Epoch: 15 Global Step: 330900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:23,240-Speed 6351.37 samples/sec Loss 5.4763 LearningRate 0.0004 Epoch: 15 Global Step: 330910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:26,486-Speed 6310.32 samples/sec Loss 5.5028 LearningRate 0.0004 Epoch: 15 Global Step: 330920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:29,731-Speed 6312.34 samples/sec Loss 5.5296 LearningRate 0.0004 Epoch: 15 Global Step: 330930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:32,978-Speed 6308.73 samples/sec Loss 5.5603 LearningRate 0.0004 Epoch: 15 Global Step: 330940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:36,224-Speed 6311.63 samples/sec Loss 5.5677 LearningRate 0.0004 Epoch: 15 Global Step: 330950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:39,465-Speed 6319.88 samples/sec Loss 5.5869 LearningRate 0.0004 Epoch: 15 Global Step: 330960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:42,713-Speed 6307.33 samples/sec Loss 5.4933 LearningRate 0.0004 Epoch: 15 Global Step: 330970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:45,957-Speed 6315.10 samples/sec Loss 5.5001 LearningRate 0.0004 Epoch: 15 Global Step: 330980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:49,202-Speed 6312.35 samples/sec Loss 5.5648 LearningRate 0.0004 Epoch: 15 Global Step: 330990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:52,462-Speed 6283.96 
samples/sec Loss 5.5883 LearningRate 0.0004 Epoch: 15 Global Step: 331000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:49:55,706-Speed 6314.49 samples/sec Loss 5.4769 LearningRate 0.0004 Epoch: 15 Global Step: 331010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:49:58,953-Speed 6309.34 samples/sec Loss 5.4873 LearningRate 0.0004 Epoch: 15 Global Step: 331020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:02,198-Speed 6312.23 samples/sec Loss 5.5022 LearningRate 0.0004 Epoch: 15 Global Step: 331030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:05,443-Speed 6313.92 samples/sec Loss 5.5509 LearningRate 0.0004 Epoch: 15 Global Step: 331040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:08,686-Speed 6315.65 samples/sec Loss 5.5598 LearningRate 0.0004 Epoch: 15 Global Step: 331050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:11,930-Speed 6314.31 samples/sec Loss 5.4081 LearningRate 0.0004 Epoch: 15 Global Step: 331060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:15,176-Speed 6311.12 samples/sec Loss 5.4719 LearningRate 0.0004 Epoch: 15 Global Step: 331070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:18,421-Speed 6313.48 samples/sec Loss 5.4366 LearningRate 0.0004 Epoch: 15 Global Step: 331080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:21,652-Speed 6338.14 samples/sec Loss 5.3393 LearningRate 0.0004 Epoch: 15 Global Step: 331090 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:24,896-Speed 6315.47 samples/sec Loss 5.5070 LearningRate 0.0004 Epoch: 15 Global Step: 331100 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:28,151-Speed 6292.44 samples/sec Loss 5.4963 LearningRate 0.0004 Epoch: 15 Global Step: 331110 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:31,401-Speed 6304.01 samples/sec Loss 5.5218 
LearningRate 0.0004 Epoch: 15 Global Step: 331120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:34,660-Speed 6285.57 samples/sec Loss 5.5432 LearningRate 0.0004 Epoch: 15 Global Step: 331130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:37,908-Speed 6306.92 samples/sec Loss 5.5184 LearningRate 0.0004 Epoch: 15 Global Step: 331140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:41,151-Speed 6315.51 samples/sec Loss 5.4627 LearningRate 0.0004 Epoch: 15 Global Step: 331150 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:44,392-Speed 6321.82 samples/sec Loss 5.4889 LearningRate 0.0004 Epoch: 15 Global Step: 331160 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:47,641-Speed 6303.95 samples/sec Loss 5.4977 LearningRate 0.0004 Epoch: 15 Global Step: 331170 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:50,883-Speed 6318.34 samples/sec Loss 5.5418 LearningRate 0.0004 Epoch: 15 Global Step: 331180 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:50:54,130-Speed 6310.28 samples/sec Loss 5.4154 LearningRate 0.0004 Epoch: 15 Global Step: 331190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:50:57,387-Speed 6289.25 samples/sec Loss 5.4971 LearningRate 0.0004 Epoch: 15 Global Step: 331200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:00,631-Speed 6315.07 samples/sec Loss 5.5000 LearningRate 0.0004 Epoch: 15 Global Step: 331210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:03,881-Speed 6301.59 samples/sec Loss 5.4959 LearningRate 0.0004 Epoch: 15 Global Step: 331220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:07,127-Speed 6311.33 samples/sec Loss 5.5293 LearningRate 0.0004 Epoch: 15 Global Step: 331230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:10,373-Speed 6310.60 samples/sec Loss 5.5156 LearningRate 0.0004 Epoch: 15 
Global Step: 331240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:13,618-Speed 6313.34 samples/sec Loss 5.5089 LearningRate 0.0004 Epoch: 15 Global Step: 331250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:16,875-Speed 6289.69 samples/sec Loss 5.4680 LearningRate 0.0004 Epoch: 15 Global Step: 331260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:20,118-Speed 6316.03 samples/sec Loss 5.5226 LearningRate 0.0004 Epoch: 15 Global Step: 331270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:23,366-Speed 6306.40 samples/sec Loss 5.4563 LearningRate 0.0004 Epoch: 15 Global Step: 331280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:26,597-Speed 6339.98 samples/sec Loss 5.4645 LearningRate 0.0004 Epoch: 15 Global Step: 331290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:29,842-Speed 6313.84 samples/sec Loss 5.4821 LearningRate 0.0004 Epoch: 15 Global Step: 331300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:33,087-Speed 6312.23 samples/sec Loss 5.5146 LearningRate 0.0004 Epoch: 15 Global Step: 331310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:36,355-Speed 6268.78 samples/sec Loss 5.4946 LearningRate 0.0004 Epoch: 15 Global Step: 331320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:39,598-Speed 6314.54 samples/sec Loss 5.5684 LearningRate 0.0004 Epoch: 15 Global Step: 331330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:42,847-Speed 6305.31 samples/sec Loss 5.6055 LearningRate 0.0004 Epoch: 15 Global Step: 331340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:46,097-Speed 6304.44 samples/sec Loss 5.5295 LearningRate 0.0004 Epoch: 15 Global Step: 331350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:49,340-Speed 6314.95 samples/sec Loss 5.5236 LearningRate 0.0004 Epoch: 15 Global Step: 331360 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:52,584-Speed 6316.15 samples/sec Loss 5.5615 LearningRate 0.0004 Epoch: 15 Global Step: 331370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:55,827-Speed 6315.90 samples/sec Loss 5.4948 LearningRate 0.0004 Epoch: 15 Global Step: 331380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:51:59,060-Speed 6334.61 samples/sec Loss 5.5032 LearningRate 0.0004 Epoch: 15 Global Step: 331390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:02,312-Speed 6300.40 samples/sec Loss 5.4995 LearningRate 0.0004 Epoch: 15 Global Step: 331400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:05,559-Speed 6309.14 samples/sec Loss 5.5159 LearningRate 0.0004 Epoch: 15 Global Step: 331410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:08,808-Speed 6304.77 samples/sec Loss 5.4727 LearningRate 0.0004 Epoch: 15 Global Step: 331420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:12,055-Speed 6309.09 samples/sec Loss 5.4649 LearningRate 0.0004 Epoch: 15 Global Step: 331430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:15,298-Speed 6316.53 samples/sec Loss 5.5392 LearningRate 0.0004 Epoch: 15 Global Step: 331440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:18,543-Speed 6312.82 samples/sec Loss 5.4616 LearningRate 0.0004 Epoch: 15 Global Step: 331450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:21,791-Speed 6306.92 samples/sec Loss 5.5097 LearningRate 0.0004 Epoch: 15 Global Step: 331460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:25,041-Speed 6302.23 samples/sec Loss 5.4898 LearningRate 0.0004 Epoch: 15 Global Step: 331470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:28,288-Speed 6310.19 samples/sec Loss 5.5526 LearningRate 0.0004 Epoch: 15 Global Step: 331480 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 21:52:31,534-Speed 6310.33 samples/sec Loss 5.5271 LearningRate 0.0004 Epoch: 15 Global Step: 331490 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 21:52:34,764-Speed 6340.26 samples/sec Loss 5.4445 LearningRate 0.0004 Epoch: 15 Global Step: 331500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:38,115-Speed 6113.91 samples/sec Loss 5.4339 LearningRate 0.0004 Epoch: 15 Global Step: 331510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:41,422-Speed 6193.43 samples/sec Loss 5.4358 LearningRate 0.0004 Epoch: 15 Global Step: 331520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:44,673-Speed 6302.26 samples/sec Loss 5.4495 LearningRate 0.0004 Epoch: 15 Global Step: 331530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:47,919-Speed 6310.39 samples/sec Loss 5.5572 LearningRate 0.0004 Epoch: 15 Global Step: 331540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:51,165-Speed 6310.10 samples/sec Loss 5.5209 LearningRate 0.0004 Epoch: 15 Global Step: 331550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:54,410-Speed 6313.80 samples/sec Loss 5.5302 LearningRate 0.0004 Epoch: 15 Global Step: 331560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:52:57,654-Speed 6313.82 samples/sec Loss 5.4639 LearningRate 0.0004 Epoch: 15 Global Step: 331570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:00,899-Speed 6311.80 samples/sec Loss 5.5385 LearningRate 0.0004 Epoch: 15 Global Step: 331580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:04,145-Speed 6311.68 samples/sec Loss 5.5463 LearningRate 0.0004 Epoch: 15 Global Step: 331590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:07,386-Speed 6321.16 samples/sec Loss 5.4997 LearningRate 0.0004 Epoch: 15 Global Step: 331600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
21:53:10,630-Speed 6315.26 samples/sec Loss 5.5813 LearningRate 0.0004 Epoch: 15 Global Step: 331610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:13,876-Speed 6309.17 samples/sec Loss 5.5448 LearningRate 0.0004 Epoch: 15 Global Step: 331620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:17,123-Speed 6309.03 samples/sec Loss 5.6073 LearningRate 0.0004 Epoch: 15 Global Step: 331630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:20,371-Speed 6308.20 samples/sec Loss 5.4875 LearningRate 0.0004 Epoch: 15 Global Step: 331640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:23,617-Speed 6310.37 samples/sec Loss 5.5473 LearningRate 0.0004 Epoch: 15 Global Step: 331650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:26,862-Speed 6311.65 samples/sec Loss 5.6385 LearningRate 0.0004 Epoch: 15 Global Step: 331660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:30,111-Speed 6305.79 samples/sec Loss 5.5689 LearningRate 0.0004 Epoch: 15 Global Step: 331670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:33,359-Speed 6306.79 samples/sec Loss 5.5137 LearningRate 0.0004 Epoch: 15 Global Step: 331680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:36,614-Speed 6292.29 samples/sec Loss 5.5558 LearningRate 0.0004 Epoch: 15 Global Step: 331690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:39,847-Speed 6337.46 samples/sec Loss 5.4596 LearningRate 0.0004 Epoch: 15 Global Step: 331700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:43,091-Speed 6312.66 samples/sec Loss 5.5042 LearningRate 0.0004 Epoch: 15 Global Step: 331710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:46,339-Speed 6307.50 samples/sec Loss 5.5128 LearningRate 0.0004 Epoch: 15 Global Step: 331720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:49,579-Speed 6322.53 
samples/sec Loss 5.4905 LearningRate 0.0004 Epoch: 15 Global Step: 331730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:52,827-Speed 6306.86 samples/sec Loss 5.5071 LearningRate 0.0004 Epoch: 15 Global Step: 331740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:53:56,061-Speed 6333.85 samples/sec Loss 5.5383 LearningRate 0.0004 Epoch: 15 Global Step: 331750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:53:59,308-Speed 6308.52 samples/sec Loss 5.4882 LearningRate 0.0004 Epoch: 15 Global Step: 331760 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:02,554-Speed 6311.78 samples/sec Loss 5.5201 LearningRate 0.0004 Epoch: 15 Global Step: 331770 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:05,797-Speed 6315.55 samples/sec Loss 5.5367 LearningRate 0.0004 Epoch: 15 Global Step: 331780 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:09,041-Speed 6314.20 samples/sec Loss 5.5379 LearningRate 0.0004 Epoch: 15 Global Step: 331790 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:12,284-Speed 6316.80 samples/sec Loss 5.4895 LearningRate 0.0004 Epoch: 15 Global Step: 331800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:15,529-Speed 6314.51 samples/sec Loss 5.5483 LearningRate 0.0004 Epoch: 15 Global Step: 331810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:18,775-Speed 6309.61 samples/sec Loss 5.5176 LearningRate 0.0004 Epoch: 15 Global Step: 331820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:54:22,021-Speed 6312.27 samples/sec Loss 5.5346 LearningRate 0.0004 Epoch: 15 Global Step: 331830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:21,598-Speed 343.76 samples/sec Loss 5.5041 LearningRate 0.0004 Epoch: 16 Global Step: 331840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:24,837-Speed 6326.00 samples/sec Loss 5.4881 
LearningRate 0.0004 Epoch: 16 Global Step: 331850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:55:28,071-Speed 6332.57 samples/sec Loss 5.5602 LearningRate 0.0004 Epoch: 16 Global Step: 331860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:55:31,314-Speed 6317.91 samples/sec Loss 5.4658 LearningRate 0.0004 Epoch: 16 Global Step: 331870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:55:34,547-Speed 6335.40 samples/sec Loss 5.4708 LearningRate 0.0004 Epoch: 16 Global Step: 331880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:55:37,765-Speed 6365.12 samples/sec Loss 5.4896 LearningRate 0.0004 Epoch: 16 Global Step: 331890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:41,002-Speed 6328.23 samples/sec Loss 5.5580 LearningRate 0.0004 Epoch: 16 Global Step: 331900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:44,244-Speed 6319.41 samples/sec Loss 5.4484 LearningRate 0.0004 Epoch: 16 Global Step: 331910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:47,478-Speed 6333.45 samples/sec Loss 5.4941 LearningRate 0.0004 Epoch: 16 Global Step: 331920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:50,717-Speed 6324.22 samples/sec Loss 5.4228 LearningRate 0.0004 Epoch: 16 Global Step: 331930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:53,957-Speed 6322.56 samples/sec Loss 5.4165 LearningRate 0.0004 Epoch: 16 Global Step: 331940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:55:57,192-Speed 6331.73 samples/sec Loss 5.5226 LearningRate 0.0004 Epoch: 16 Global Step: 331950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:00,429-Speed 6329.58 samples/sec Loss 5.5387 LearningRate 0.0004 Epoch: 16 Global Step: 331960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:03,669-Speed 6321.11 samples/sec Loss 5.5518 LearningRate 0.0004 Epoch: 16 
Global Step: 331970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:06,906-Speed 6328.01 samples/sec Loss 5.5170 LearningRate 0.0004 Epoch: 16 Global Step: 331980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:10,132-Speed 6350.13 samples/sec Loss 5.5422 LearningRate 0.0004 Epoch: 16 Global Step: 331990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:13,373-Speed 6321.95 samples/sec Loss 5.4870 LearningRate 0.0004 Epoch: 16 Global Step: 332000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:16,614-Speed 6319.81 samples/sec Loss 5.5116 LearningRate 0.0004 Epoch: 16 Global Step: 332010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:19,852-Speed 6326.35 samples/sec Loss 5.4944 LearningRate 0.0004 Epoch: 16 Global Step: 332020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:23,092-Speed 6322.97 samples/sec Loss 5.5152 LearningRate 0.0004 Epoch: 16 Global Step: 332030 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:26,331-Speed 6324.39 samples/sec Loss 5.3738 LearningRate 0.0004 Epoch: 16 Global Step: 332040 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:29,570-Speed 6324.96 samples/sec Loss 5.5245 LearningRate 0.0004 Epoch: 16 Global Step: 332050 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:32,810-Speed 6322.71 samples/sec Loss 5.4900 LearningRate 0.0004 Epoch: 16 Global Step: 332060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:36,044-Speed 6334.35 samples/sec Loss 5.4396 LearningRate 0.0004 Epoch: 16 Global Step: 332070 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:39,282-Speed 6325.92 samples/sec Loss 5.4734 LearningRate 0.0004 Epoch: 16 Global Step: 332080 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:56:42,516-Speed 6333.32 samples/sec Loss 5.4793 LearningRate 0.0004 Epoch: 16 Global Step: 332090 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 21:56:45,756-Speed 6322.09 samples/sec Loss 5.4910 LearningRate 0.0004 Epoch: 16 Global Step: 332100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:56:48,998-Speed 6319.18 samples/sec Loss 5.4346 LearningRate 0.0004 Epoch: 16 Global Step: 332110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:56:52,233-Speed 6332.09 samples/sec Loss 5.5300 LearningRate 0.0004 Epoch: 16 Global Step: 332120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:56:55,470-Speed 6329.04 samples/sec Loss 5.4989 LearningRate 0.0004 Epoch: 16 Global Step: 332130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:56:58,711-Speed 6319.12 samples/sec Loss 5.5689 LearningRate 0.0004 Epoch: 16 Global Step: 332140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:01,955-Speed 6315.53 samples/sec Loss 5.5091 LearningRate 0.0004 Epoch: 16 Global Step: 332150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:05,194-Speed 6324.26 samples/sec Loss 5.3925 LearningRate 0.0004 Epoch: 16 Global Step: 332160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:08,428-Speed 6335.01 samples/sec Loss 5.4489 LearningRate 0.0004 Epoch: 16 Global Step: 332170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:11,664-Speed 6329.22 samples/sec Loss 5.5208 LearningRate 0.0004 Epoch: 16 Global Step: 332180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:14,898-Speed 6334.48 samples/sec Loss 5.4841 LearningRate 0.0004 Epoch: 16 Global Step: 332190 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 21:57:18,123-Speed 6351.75 samples/sec Loss 5.4614 LearningRate 0.0004 Epoch: 16 Global Step: 332200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:21,359-Speed 6330.01 samples/sec Loss 5.5026 LearningRate 0.0004 Epoch: 16 Global Step: 332210 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 21:57:24,594-Speed 6332.05 samples/sec Loss 5.5548 LearningRate 0.0004 Epoch: 16 Global Step: 332220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:27,845-Speed 6302.10 samples/sec Loss 5.4581 LearningRate 0.0004 Epoch: 16 Global Step: 332230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:31,080-Speed 6331.53 samples/sec Loss 5.5123 LearningRate 0.0004 Epoch: 16 Global Step: 332240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:34,319-Speed 6324.02 samples/sec Loss 5.5571 LearningRate 0.0004 Epoch: 16 Global Step: 332250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:37,577-Speed 6288.25 samples/sec Loss 5.4659 LearningRate 0.0004 Epoch: 16 Global Step: 332260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:40,817-Speed 6322.72 samples/sec Loss 5.4733 LearningRate 0.0004 Epoch: 16 Global Step: 332270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:44,056-Speed 6324.70 samples/sec Loss 5.4386 LearningRate 0.0004 Epoch: 16 Global Step: 332280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:47,291-Speed 6331.30 samples/sec Loss 5.5092 LearningRate 0.0004 Epoch: 16 Global Step: 332290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:50,516-Speed 6351.10 samples/sec Loss 5.4936 LearningRate 0.0004 Epoch: 16 Global Step: 332300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:53,752-Speed 6330.27 samples/sec Loss 5.5069 LearningRate 0.0004 Epoch: 16 Global Step: 332310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:57:56,988-Speed 6330.22 samples/sec Loss 5.4543 LearningRate 0.0004 Epoch: 16 Global Step: 332320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:00,231-Speed 6316.20 samples/sec Loss 5.4712 LearningRate 0.0004 Epoch: 16 Global Step: 332330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
21:58:03,471-Speed 6324.00 samples/sec Loss 5.4609 LearningRate 0.0004 Epoch: 16 Global Step: 332340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:06,711-Speed 6321.62 samples/sec Loss 5.4179 LearningRate 0.0004 Epoch: 16 Global Step: 332350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:09,945-Speed 6333.55 samples/sec Loss 5.4581 LearningRate 0.0004 Epoch: 16 Global Step: 332360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:13,181-Speed 6330.60 samples/sec Loss 5.4885 LearningRate 0.0004 Epoch: 16 Global Step: 332370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:16,426-Speed 6312.44 samples/sec Loss 5.4369 LearningRate 0.0004 Epoch: 16 Global Step: 332380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:19,661-Speed 6331.61 samples/sec Loss 5.5058 LearningRate 0.0004 Epoch: 16 Global Step: 332390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:22,889-Speed 6346.54 samples/sec Loss 5.5003 LearningRate 0.0004 Epoch: 16 Global Step: 332400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:26,126-Speed 6328.82 samples/sec Loss 5.4461 LearningRate 0.0004 Epoch: 16 Global Step: 332410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:29,361-Speed 6331.02 samples/sec Loss 5.4568 LearningRate 0.0004 Epoch: 16 Global Step: 332420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:32,601-Speed 6323.00 samples/sec Loss 5.4859 LearningRate 0.0004 Epoch: 16 Global Step: 332430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:35,837-Speed 6331.70 samples/sec Loss 5.4905 LearningRate 0.0004 Epoch: 16 Global Step: 332440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:39,097-Speed 6283.10 samples/sec Loss 5.5125 LearningRate 0.0004 Epoch: 16 Global Step: 332450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:42,331-Speed 6334.04 
samples/sec Loss 5.4834 LearningRate 0.0004 Epoch: 16 Global Step: 332460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:45,571-Speed 6322.73 samples/sec Loss 5.4924 LearningRate 0.0004 Epoch: 16 Global Step: 332470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:48,808-Speed 6327.75 samples/sec Loss 5.6102 LearningRate 0.0004 Epoch: 16 Global Step: 332480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:52,049-Speed 6319.61 samples/sec Loss 5.5044 LearningRate 0.0004 Epoch: 16 Global Step: 332490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:55,271-Speed 6359.25 samples/sec Loss 5.5263 LearningRate 0.0004 Epoch: 16 Global Step: 332500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:58:58,510-Speed 6324.23 samples/sec Loss 5.4346 LearningRate 0.0004 Epoch: 16 Global Step: 332510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:01,752-Speed 6318.60 samples/sec Loss 5.5023 LearningRate 0.0004 Epoch: 16 Global Step: 332520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:04,990-Speed 6325.91 samples/sec Loss 5.5522 LearningRate 0.0004 Epoch: 16 Global Step: 332530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:08,226-Speed 6330.86 samples/sec Loss 5.5668 LearningRate 0.0004 Epoch: 16 Global Step: 332540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:11,462-Speed 6328.84 samples/sec Loss 5.4709 LearningRate 0.0004 Epoch: 16 Global Step: 332550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:14,699-Speed 6327.78 samples/sec Loss 5.5052 LearningRate 0.0004 Epoch: 16 Global Step: 332560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:17,938-Speed 6324.21 samples/sec Loss 5.5008 LearningRate 0.0004 Epoch: 16 Global Step: 332570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:21,176-Speed 6326.88 samples/sec Loss 5.5170 
LearningRate 0.0004 Epoch: 16 Global Step: 332580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:24,412-Speed 6330.75 samples/sec Loss 5.4615 LearningRate 0.0004 Epoch: 16 Global Step: 332590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:27,633-Speed 6359.70 samples/sec Loss 5.5283 LearningRate 0.0004 Epoch: 16 Global Step: 332600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:30,871-Speed 6324.97 samples/sec Loss 5.4283 LearningRate 0.0004 Epoch: 16 Global Step: 332610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:34,108-Speed 6329.46 samples/sec Loss 5.5504 LearningRate 0.0004 Epoch: 16 Global Step: 332620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:37,347-Speed 6324.62 samples/sec Loss 5.4849 LearningRate 0.0004 Epoch: 16 Global Step: 332630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:40,582-Speed 6332.12 samples/sec Loss 5.4568 LearningRate 0.0004 Epoch: 16 Global Step: 332640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:43,818-Speed 6330.22 samples/sec Loss 5.4403 LearningRate 0.0004 Epoch: 16 Global Step: 332650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:47,057-Speed 6325.17 samples/sec Loss 5.5402 LearningRate 0.0004 Epoch: 16 Global Step: 332660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 21:59:50,278-Speed 6359.22 samples/sec Loss 5.5662 LearningRate 0.0004 Epoch: 16 Global Step: 332670 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:59:53,518-Speed 6323.82 samples/sec Loss 5.5497 LearningRate 0.0004 Epoch: 16 Global Step: 332680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:59:56,758-Speed 6321.47 samples/sec Loss 5.5479 LearningRate 0.0004 Epoch: 16 Global Step: 332690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 21:59:59,992-Speed 6333.58 samples/sec Loss 5.4585 LearningRate 0.0004 Epoch: 16 
Global Step: 332700 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:03,230-Speed 6326.79 samples/sec Loss 5.5244 LearningRate 0.0004 Epoch: 16 Global Step: 332710 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:06,466-Speed 6330.13 samples/sec Loss 5.4799 LearningRate 0.0004 Epoch: 16 Global Step: 332720 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:09,711-Speed 6313.40 samples/sec Loss 5.5000 LearningRate 0.0004 Epoch: 16 Global Step: 332730 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:12,948-Speed 6326.75 samples/sec Loss 5.4264 LearningRate 0.0004 Epoch: 16 Global Step: 332740 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:16,182-Speed 6334.67 samples/sec Loss 5.5234 LearningRate 0.0004 Epoch: 16 Global Step: 332750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:19,417-Speed 6332.01 samples/sec Loss 5.4349 LearningRate 0.0004 Epoch: 16 Global Step: 332760 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:22,656-Speed 6324.66 samples/sec Loss 5.5141 LearningRate 0.0004 Epoch: 16 Global Step: 332770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:00:25,897-Speed 6320.61 samples/sec Loss 5.6063 LearningRate 0.0004 Epoch: 16 Global Step: 332780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:00:29,134-Speed 6328.79 samples/sec Loss 5.5373 LearningRate 0.0004 Epoch: 16 Global Step: 332790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:00:32,369-Speed 6330.81 samples/sec Loss 5.3866 LearningRate 0.0004 Epoch: 16 Global Step: 332800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:00:35,610-Speed 6321.55 samples/sec Loss 5.5637 LearningRate 0.0004 Epoch: 16 Global Step: 332810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:00:38,836-Speed 6352.62 samples/sec Loss 5.4983 LearningRate 0.0004 Epoch: 16 Global Step: 332820 Fp16 Grad 
Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:42,100-Speed 6276.05 samples/sec Loss 5.4601 LearningRate 0.0004 Epoch: 16 Global Step: 332830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:45,338-Speed 6327.33 samples/sec Loss 5.4923 LearningRate 0.0004 Epoch: 16 Global Step: 332840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:48,575-Speed 6328.34 samples/sec Loss 5.4932 LearningRate 0.0004 Epoch: 16 Global Step: 332850 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:51,812-Speed 6327.00 samples/sec Loss 5.4584 LearningRate 0.0004 Epoch: 16 Global Step: 332860 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:55,051-Speed 6325.62 samples/sec Loss 5.4309 LearningRate 0.0004 Epoch: 16 Global Step: 332870 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:00:58,286-Speed 6331.77 samples/sec Loss 5.4704 LearningRate 0.0004 Epoch: 16 Global Step: 332880 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:01:01,527-Speed 6320.25 samples/sec Loss 5.4896 LearningRate 0.0004 Epoch: 16 Global Step: 332890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:01:04,765-Speed 6326.25 samples/sec Loss 5.4511 LearningRate 0.0004 Epoch: 16 Global Step: 332900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:01:08,009-Speed 6314.65 samples/sec Loss 5.4257 LearningRate 0.0004 Epoch: 16 Global Step: 332910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:01:11,245-Speed 6330.19 samples/sec Loss 5.5005 LearningRate 0.0004 Epoch: 16 Global Step: 332920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:14,486-Speed 6319.77 samples/sec Loss 5.5118 LearningRate 0.0004 Epoch: 16 Global Step: 332930 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:17,723-Speed 6328.52 samples/sec Loss 5.4567 LearningRate 0.0004 Epoch: 16 Global Step: 332940 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:01:20,975-Speed 6299.29 samples/sec Loss 5.4710 LearningRate 0.0004 Epoch: 16 Global Step: 332950 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:24,211-Speed 6330.66 samples/sec Loss 5.4907 LearningRate 0.0004 Epoch: 16 Global Step: 332960 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:27,450-Speed 6324.34 samples/sec Loss 5.5900 LearningRate 0.0004 Epoch: 16 Global Step: 332970 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:30,692-Speed 6319.22 samples/sec Loss 5.4889 LearningRate 0.0004 Epoch: 16 Global Step: 332980 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:33,930-Speed 6326.39 samples/sec Loss 5.4982 LearningRate 0.0004 Epoch: 16 Global Step: 332990 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:37,183-Speed 6296.23 samples/sec Loss 5.4247 LearningRate 0.0004 Epoch: 16 Global Step: 333000 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:40,420-Speed 6329.66 samples/sec Loss 5.5011 LearningRate 0.0004 Epoch: 16 Global Step: 333010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:43,647-Speed 6346.04 samples/sec Loss 5.4967 LearningRate 0.0004 Epoch: 16 Global Step: 333020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:46,886-Speed 6325.16 samples/sec Loss 5.5163 LearningRate 0.0004 Epoch: 16 Global Step: 333030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:50,148-Speed 6280.25 samples/sec Loss 5.5484 LearningRate 0.0004 Epoch: 16 Global Step: 333040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:53,471-Speed 6163.90 samples/sec Loss 5.4713 LearningRate 0.0004 Epoch: 16 Global Step: 333050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:01:56,723-Speed 6300.24 samples/sec Loss 5.4758 LearningRate 0.0004 Epoch: 16 Global Step: 333060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:01:59,962-Speed 6324.67 samples/sec Loss 5.5267 LearningRate 0.0004 Epoch: 16 Global Step: 333070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:03,205-Speed 6314.81 samples/sec Loss 5.4154 LearningRate 0.0004 Epoch: 16 Global Step: 333080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:06,443-Speed 6327.80 samples/sec Loss 5.5023 LearningRate 0.0004 Epoch: 16 Global Step: 333090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:09,687-Speed 6312.81 samples/sec Loss 5.5724 LearningRate 0.0004 Epoch: 16 Global Step: 333100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:12,924-Speed 6328.68 samples/sec Loss 5.4613 LearningRate 0.0004 Epoch: 16 Global Step: 333110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:16,147-Speed 6356.15 samples/sec Loss 5.5205 LearningRate 0.0004 Epoch: 16 Global Step: 333120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:19,384-Speed 6327.56 samples/sec Loss 5.4612 LearningRate 0.0004 Epoch: 16 Global Step: 333130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:22,625-Speed 6322.46 samples/sec Loss 5.4722 LearningRate 0.0004 Epoch: 16 Global Step: 333140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:02:25,849-Speed 6352.57 samples/sec Loss 5.4190 LearningRate 0.0004 Epoch: 16 Global Step: 333150 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:29,091-Speed 6318.68 samples/sec Loss 5.3939 LearningRate 0.0004 Epoch: 16 Global Step: 333160 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:32,328-Speed 6327.76 samples/sec Loss 5.5385 LearningRate 0.0004 Epoch: 16 Global Step: 333170 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:35,568-Speed 6323.23 samples/sec Loss 5.5579 LearningRate 0.0004 Epoch: 16 Global Step: 333180 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:38,808-Speed 6321.35 
samples/sec Loss 5.4921 LearningRate 0.0004 Epoch: 16 Global Step: 333190 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:42,047-Speed 6324.56 samples/sec Loss 5.4928 LearningRate 0.0004 Epoch: 16 Global Step: 333200 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:45,287-Speed 6322.59 samples/sec Loss 5.4997 LearningRate 0.0004 Epoch: 16 Global Step: 333210 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:48,540-Speed 6297.35 samples/sec Loss 5.4998 LearningRate 0.0004 Epoch: 16 Global Step: 333220 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:51,786-Speed 6310.31 samples/sec Loss 5.5504 LearningRate 0.0004 Epoch: 16 Global Step: 333230 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:55,028-Speed 6319.84 samples/sec Loss 5.5296 LearningRate 0.0004 Epoch: 16 Global Step: 333240 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:02:58,264-Speed 6328.85 samples/sec Loss 5.4932 LearningRate 0.0004 Epoch: 16 Global Step: 333250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:01,508-Speed 6314.95 samples/sec Loss 5.5469 LearningRate 0.0004 Epoch: 16 Global Step: 333260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:04,750-Speed 6318.66 samples/sec Loss 5.4358 LearningRate 0.0004 Epoch: 16 Global Step: 333270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:07,991-Speed 6320.81 samples/sec Loss 5.4500 LearningRate 0.0004 Epoch: 16 Global Step: 333280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:11,232-Speed 6320.95 samples/sec Loss 5.4646 LearningRate 0.0004 Epoch: 16 Global Step: 333290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:14,470-Speed 6326.52 samples/sec Loss 5.4916 LearningRate 0.0004 Epoch: 16 Global Step: 333300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:17,711-Speed 6320.06 samples/sec Loss 5.5178 
LearningRate 0.0004 Epoch: 16 Global Step: 333310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:20,980-Speed 6270.64 samples/sec Loss 5.4903 LearningRate 0.0004 Epoch: 16 Global Step: 333320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:24,243-Speed 6277.84 samples/sec Loss 5.4356 LearningRate 0.0004 Epoch: 16 Global Step: 333330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:27,483-Speed 6322.43 samples/sec Loss 5.4148 LearningRate 0.0004 Epoch: 16 Global Step: 333340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:30,722-Speed 6324.77 samples/sec Loss 5.4480 LearningRate 0.0004 Epoch: 16 Global Step: 333350 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:03:33,949-Speed 6348.90 samples/sec Loss 5.4505 LearningRate 0.0004 Epoch: 16 Global Step: 333360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:37,186-Speed 6326.83 samples/sec Loss 5.4484 LearningRate 0.0004 Epoch: 16 Global Step: 333370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:40,425-Speed 6325.41 samples/sec Loss 5.4523 LearningRate 0.0004 Epoch: 16 Global Step: 333380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:43,665-Speed 6321.23 samples/sec Loss 5.5338 LearningRate 0.0004 Epoch: 16 Global Step: 333390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:46,910-Speed 6312.79 samples/sec Loss 5.4690 LearningRate 0.0004 Epoch: 16 Global Step: 333400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:50,152-Speed 6317.83 samples/sec Loss 5.5058 LearningRate 0.0004 Epoch: 16 Global Step: 333410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:53,391-Speed 6324.64 samples/sec Loss 5.5508 LearningRate 0.0004 Epoch: 16 Global Step: 333420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:56,628-Speed 6328.77 samples/sec Loss 5.5335 LearningRate 0.0004 Epoch: 16 
Global Step: 333430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:03:59,872-Speed 6314.88 samples/sec Loss 5.5169 LearningRate 0.0004 Epoch: 16 Global Step: 333440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:03,113-Speed 6320.89 samples/sec Loss 5.4330 LearningRate 0.0004 Epoch: 16 Global Step: 333450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:06,343-Speed 6341.54 samples/sec Loss 5.4824 LearningRate 0.0004 Epoch: 16 Global Step: 333460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:09,583-Speed 6321.10 samples/sec Loss 5.5223 LearningRate 0.0004 Epoch: 16 Global Step: 333470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:12,826-Speed 6317.35 samples/sec Loss 5.5272 LearningRate 0.0004 Epoch: 16 Global Step: 333480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:16,064-Speed 6327.36 samples/sec Loss 5.5121 LearningRate 0.0004 Epoch: 16 Global Step: 333490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:19,303-Speed 6324.05 samples/sec Loss 5.5558 LearningRate 0.0004 Epoch: 16 Global Step: 333500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:22,538-Speed 6332.11 samples/sec Loss 5.5407 LearningRate 0.0004 Epoch: 16 Global Step: 333510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:25,779-Speed 6321.51 samples/sec Loss 5.4617 LearningRate 0.0004 Epoch: 16 Global Step: 333520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:29,019-Speed 6322.19 samples/sec Loss 5.4687 LearningRate 0.0004 Epoch: 16 Global Step: 333530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:32,289-Speed 6264.43 samples/sec Loss 5.4967 LearningRate 0.0004 Epoch: 16 Global Step: 333540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:35,528-Speed 6325.24 samples/sec Loss 5.5103 LearningRate 0.0004 Epoch: 16 Global Step: 333550 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:38,753-Speed 6349.83 samples/sec Loss 5.4926 LearningRate 0.0004 Epoch: 16 Global Step: 333560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:42,003-Speed 6304.82 samples/sec Loss 5.4639 LearningRate 0.0004 Epoch: 16 Global Step: 333570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:45,245-Speed 6317.18 samples/sec Loss 5.4798 LearningRate 0.0004 Epoch: 16 Global Step: 333580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:48,488-Speed 6316.60 samples/sec Loss 5.4676 LearningRate 0.0004 Epoch: 16 Global Step: 333590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:04:51,718-Speed 6341.59 samples/sec Loss 5.4502 LearningRate 0.0004 Epoch: 16 Global Step: 333600 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:04:54,958-Speed 6322.75 samples/sec Loss 5.4889 LearningRate 0.0004 Epoch: 16 Global Step: 333610 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:04:58,200-Speed 6319.38 samples/sec Loss 5.4776 LearningRate 0.0004 Epoch: 16 Global Step: 333620 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:01,439-Speed 6323.07 samples/sec Loss 5.4807 LearningRate 0.0004 Epoch: 16 Global Step: 333630 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:04,684-Speed 6312.90 samples/sec Loss 5.4109 LearningRate 0.0004 Epoch: 16 Global Step: 333640 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:07,927-Speed 6317.37 samples/sec Loss 5.4382 LearningRate 0.0004 Epoch: 16 Global Step: 333650 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:11,163-Speed 6328.83 samples/sec Loss 5.4194 LearningRate 0.0004 Epoch: 16 Global Step: 333660 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:14,408-Speed 6314.69 samples/sec Loss 5.4804 LearningRate 0.0004 Epoch: 16 Global Step: 333670 Fp16 Grad Scale: 16384 Required: 45 hours 
Training: 2022-04-01 22:05:17,649-Speed 6319.70 samples/sec Loss 5.4589 LearningRate 0.0004 Epoch: 16 Global Step: 333680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:20,889-Speed 6322.80 samples/sec Loss 5.4803 LearningRate 0.0004 Epoch: 16 Global Step: 333690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:05:24,131-Speed 6319.41 samples/sec Loss 5.4902 LearningRate 0.0004 Epoch: 16 Global Step: 333700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:27,368-Speed 6326.87 samples/sec Loss 5.4985 LearningRate 0.0004 Epoch: 16 Global Step: 333710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:30,613-Speed 6314.42 samples/sec Loss 5.5207 LearningRate 0.0004 Epoch: 16 Global Step: 333720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:33,851-Speed 6326.08 samples/sec Loss 5.5302 LearningRate 0.0004 Epoch: 16 Global Step: 333730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:37,095-Speed 6314.04 samples/sec Loss 5.4953 LearningRate 0.0004 Epoch: 16 Global Step: 333740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:40,336-Speed 6320.82 samples/sec Loss 5.4152 LearningRate 0.0004 Epoch: 16 Global Step: 333750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:43,574-Speed 6326.26 samples/sec Loss 5.5194 LearningRate 0.0004 Epoch: 16 Global Step: 333760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:46,817-Speed 6315.12 samples/sec Loss 5.5555 LearningRate 0.0004 Epoch: 16 Global Step: 333770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:50,059-Speed 6319.97 samples/sec Loss 5.4456 LearningRate 0.0004 Epoch: 16 Global Step: 333780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:53,300-Speed 6320.56 samples/sec Loss 5.5225 LearningRate 0.0004 Epoch: 16 Global Step: 333790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:05:56,526-Speed 6348.66 samples/sec Loss 5.4652 LearningRate 0.0004 Epoch: 16 Global Step: 333800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:05:59,772-Speed 6311.76 samples/sec Loss 5.5789 LearningRate 0.0004 Epoch: 16 Global Step: 333810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:03,028-Speed 6290.72 samples/sec Loss 5.4305 LearningRate 0.0004 Epoch: 16 Global Step: 333820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:06,267-Speed 6323.89 samples/sec Loss 5.5178 LearningRate 0.0004 Epoch: 16 Global Step: 333830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:09,509-Speed 6318.08 samples/sec Loss 5.5054 LearningRate 0.0004 Epoch: 16 Global Step: 333840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:12,748-Speed 6325.00 samples/sec Loss 5.4449 LearningRate 0.0004 Epoch: 16 Global Step: 333850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:15,993-Speed 6311.75 samples/sec Loss 5.5019 LearningRate 0.0004 Epoch: 16 Global Step: 333860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:19,235-Speed 6318.32 samples/sec Loss 5.4504 LearningRate 0.0004 Epoch: 16 Global Step: 333870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:22,478-Speed 6317.58 samples/sec Loss 5.4323 LearningRate 0.0004 Epoch: 16 Global Step: 333880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:25,721-Speed 6317.23 samples/sec Loss 5.4552 LearningRate 0.0004 Epoch: 16 Global Step: 333890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:28,948-Speed 6348.07 samples/sec Loss 5.4487 LearningRate 0.0004 Epoch: 16 Global Step: 333900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:32,188-Speed 6322.43 samples/sec Loss 5.4318 LearningRate 0.0004 Epoch: 16 Global Step: 333910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:06:35,419-Speed 6340.75 
samples/sec Loss 5.4671 LearningRate 0.0004 Epoch: 16 Global Step: 333920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:38,663-Speed 6313.87 samples/sec Loss 5.4612 LearningRate 0.0004 Epoch: 16 Global Step: 333930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:41,908-Speed 6313.00 samples/sec Loss 5.5219 LearningRate 0.0004 Epoch: 16 Global Step: 333940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:45,152-Speed 6315.57 samples/sec Loss 5.5224 LearningRate 0.0004 Epoch: 16 Global Step: 333950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:48,395-Speed 6315.44 samples/sec Loss 5.5655 LearningRate 0.0004 Epoch: 16 Global Step: 333960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:51,639-Speed 6314.57 samples/sec Loss 5.4697 LearningRate 0.0004 Epoch: 16 Global Step: 333970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:54,885-Speed 6311.57 samples/sec Loss 5.4770 LearningRate 0.0004 Epoch: 16 Global Step: 333980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:06:58,130-Speed 6311.57 samples/sec Loss 5.5105 LearningRate 0.0004 Epoch: 16 Global Step: 333990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:07:01,373-Speed 6317.93 samples/sec Loss 5.4918 LearningRate 0.0004 Epoch: 16 Global Step: 334000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:07:04,617-Speed 6313.93 samples/sec Loss 5.4384 LearningRate 0.0004 Epoch: 16 Global Step: 334010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:07:07,859-Speed 6318.24 samples/sec Loss 5.4964 LearningRate 0.0004 Epoch: 16 Global Step: 334020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:11,116-Speed 6289.60 samples/sec Loss 5.4861 LearningRate 0.0004 Epoch: 16 Global Step: 334030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:14,355-Speed 6323.28 samples/sec Loss 5.5359 
LearningRate 0.0004 Epoch: 16 Global Step: 334040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:17,598-Speed 6316.80 samples/sec Loss 5.4491 LearningRate 0.0004 Epoch: 16 Global Step: 334050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:20,840-Speed 6317.70 samples/sec Loss 5.4769 LearningRate 0.0004 Epoch: 16 Global Step: 334060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:24,092-Speed 6300.88 samples/sec Loss 5.4346 LearningRate 0.0004 Epoch: 16 Global Step: 334070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:27,336-Speed 6313.72 samples/sec Loss 5.4244 LearningRate 0.0004 Epoch: 16 Global Step: 334080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:30,579-Speed 6316.22 samples/sec Loss 5.4816 LearningRate 0.0004 Epoch: 16 Global Step: 334090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:33,834-Speed 6292.98 samples/sec Loss 5.4982 LearningRate 0.0004 Epoch: 16 Global Step: 334100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:37,078-Speed 6315.04 samples/sec Loss 5.5382 LearningRate 0.0004 Epoch: 16 Global Step: 334110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:40,308-Speed 6342.04 samples/sec Loss 5.6101 LearningRate 0.0004 Epoch: 16 Global Step: 334120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:43,555-Speed 6308.74 samples/sec Loss 5.3760 LearningRate 0.0004 Epoch: 16 Global Step: 334130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:46,801-Speed 6311.74 samples/sec Loss 5.4423 LearningRate 0.0004 Epoch: 16 Global Step: 334140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:50,115-Speed 6182.13 samples/sec Loss 5.4939 LearningRate 0.0004 Epoch: 16 Global Step: 334150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:53,356-Speed 6319.26 samples/sec Loss 5.5178 LearningRate 0.0004 Epoch: 16 
Global Step: 334160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:56,602-Speed 6310.80 samples/sec Loss 5.5284 LearningRate 0.0004 Epoch: 16 Global Step: 334170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:07:59,850-Speed 6307.35 samples/sec Loss 5.4507 LearningRate 0.0004 Epoch: 16 Global Step: 334180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:03,096-Speed 6309.71 samples/sec Loss 5.4406 LearningRate 0.0004 Epoch: 16 Global Step: 334190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:06,343-Speed 6310.06 samples/sec Loss 5.4974 LearningRate 0.0004 Epoch: 16 Global Step: 334200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:09,585-Speed 6318.66 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 16 Global Step: 334210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:12,831-Speed 6310.63 samples/sec Loss 5.4754 LearningRate 0.0004 Epoch: 16 Global Step: 334220 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:08:16,063-Speed 6337.94 samples/sec Loss 5.5044 LearningRate 0.0004 Epoch: 16 Global Step: 334230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:19,304-Speed 6319.15 samples/sec Loss 5.4736 LearningRate 0.0004 Epoch: 16 Global Step: 334240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:22,551-Speed 6309.08 samples/sec Loss 5.5335 LearningRate 0.0004 Epoch: 16 Global Step: 334250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:25,793-Speed 6318.97 samples/sec Loss 5.5131 LearningRate 0.0004 Epoch: 16 Global Step: 334260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:29,042-Speed 6305.14 samples/sec Loss 5.4457 LearningRate 0.0004 Epoch: 16 Global Step: 334270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:32,283-Speed 6319.42 samples/sec Loss 5.4694 LearningRate 0.0004 Epoch: 16 Global Step: 334280 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:35,524-Speed 6320.73 samples/sec Loss 5.5731 LearningRate 0.0004 Epoch: 16 Global Step: 334290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:38,770-Speed 6310.83 samples/sec Loss 5.4784 LearningRate 0.0004 Epoch: 16 Global Step: 334300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:42,016-Speed 6310.94 samples/sec Loss 5.5397 LearningRate 0.0004 Epoch: 16 Global Step: 334310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:45,261-Speed 6313.05 samples/sec Loss 5.5998 LearningRate 0.0004 Epoch: 16 Global Step: 334320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:48,494-Speed 6336.45 samples/sec Loss 5.4457 LearningRate 0.0004 Epoch: 16 Global Step: 334330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:51,734-Speed 6322.12 samples/sec Loss 5.4962 LearningRate 0.0004 Epoch: 16 Global Step: 334340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:54,979-Speed 6314.01 samples/sec Loss 5.4422 LearningRate 0.0004 Epoch: 16 Global Step: 334350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:08:58,221-Speed 6316.77 samples/sec Loss 5.5574 LearningRate 0.0004 Epoch: 16 Global Step: 334360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:01,461-Speed 6323.29 samples/sec Loss 5.4863 LearningRate 0.0004 Epoch: 16 Global Step: 334370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:04,704-Speed 6317.40 samples/sec Loss 5.4923 LearningRate 0.0004 Epoch: 16 Global Step: 334380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:07,946-Speed 6318.47 samples/sec Loss 5.5019 LearningRate 0.0004 Epoch: 16 Global Step: 334390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:11,190-Speed 6314.66 samples/sec Loss 5.4574 LearningRate 0.0004 Epoch: 16 Global Step: 334400 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:09:14,435-Speed 6312.52 samples/sec Loss 5.3661 LearningRate 0.0004 Epoch: 16 Global Step: 334410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:17,680-Speed 6311.68 samples/sec Loss 5.4326 LearningRate 0.0004 Epoch: 16 Global Step: 334420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:20,910-Speed 6341.27 samples/sec Loss 5.4943 LearningRate 0.0004 Epoch: 16 Global Step: 334430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:24,152-Speed 6318.89 samples/sec Loss 5.4503 LearningRate 0.0004 Epoch: 16 Global Step: 334440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:27,398-Speed 6310.83 samples/sec Loss 5.4578 LearningRate 0.0004 Epoch: 16 Global Step: 334450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:30,645-Speed 6309.24 samples/sec Loss 5.4046 LearningRate 0.0004 Epoch: 16 Global Step: 334460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:33,889-Speed 6314.02 samples/sec Loss 5.4959 LearningRate 0.0004 Epoch: 16 Global Step: 334470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:37,133-Speed 6314.93 samples/sec Loss 5.4339 LearningRate 0.0004 Epoch: 16 Global Step: 334480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:09:40,363-Speed 6341.25 samples/sec Loss 5.4606 LearningRate 0.0004 Epoch: 16 Global Step: 334490 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:09:43,609-Speed 6310.92 samples/sec Loss 5.4823 LearningRate 0.0004 Epoch: 16 Global Step: 334500 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:09:46,856-Speed 6310.03 samples/sec Loss 5.4396 LearningRate 0.0004 Epoch: 16 Global Step: 334510 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:09:50,107-Speed 6300.76 samples/sec Loss 5.4968 LearningRate 0.0004 Epoch: 16 Global Step: 334520 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 
22:09:53,353-Speed 6311.31 samples/sec Loss 5.4657 LearningRate 0.0004 Epoch: 16 Global Step: 334530 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:09:56,598-Speed 6312.09 samples/sec Loss 5.5494 LearningRate 0.0004 Epoch: 16 Global Step: 334540 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:09:59,842-Speed 6314.21 samples/sec Loss 5.5599 LearningRate 0.0004 Epoch: 16 Global Step: 334550 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:10:03,086-Speed 6313.90 samples/sec Loss 5.5260 LearningRate 0.0004 Epoch: 16 Global Step: 334560 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:10:06,331-Speed 6313.36 samples/sec Loss 5.4917 LearningRate 0.0004 Epoch: 16 Global Step: 334570 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:10:09,577-Speed 6312.39 samples/sec Loss 5.4726 LearningRate 0.0004 Epoch: 16 Global Step: 334580 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:10:12,817-Speed 6321.37 samples/sec Loss 5.3915 LearningRate 0.0004 Epoch: 16 Global Step: 334590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:16,060-Speed 6317.79 samples/sec Loss 5.5010 LearningRate 0.0004 Epoch: 16 Global Step: 334600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:19,301-Speed 6319.57 samples/sec Loss 5.4084 LearningRate 0.0004 Epoch: 16 Global Step: 334610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:22,545-Speed 6314.43 samples/sec Loss 5.5346 LearningRate 0.0004 Epoch: 16 Global Step: 334620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:25,790-Speed 6315.42 samples/sec Loss 5.5157 LearningRate 0.0004 Epoch: 16 Global Step: 334630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:29,036-Speed 6310.13 samples/sec Loss 5.4792 LearningRate 0.0004 Epoch: 16 Global Step: 334640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:32,281-Speed 6312.49 
samples/sec Loss 5.4175 LearningRate 0.0004 Epoch: 16 Global Step: 334650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:35,526-Speed 6314.31 samples/sec Loss 5.4883 LearningRate 0.0004 Epoch: 16 Global Step: 334660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:38,773-Speed 6307.29 samples/sec Loss 5.4879 LearningRate 0.0004 Epoch: 16 Global Step: 334670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:42,017-Speed 6314.60 samples/sec Loss 5.5412 LearningRate 0.0004 Epoch: 16 Global Step: 334680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:45,262-Speed 6313.25 samples/sec Loss 5.4564 LearningRate 0.0004 Epoch: 16 Global Step: 334690 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:10:48,496-Speed 6334.04 samples/sec Loss 5.5163 LearningRate 0.0004 Epoch: 16 Global Step: 334700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:51,744-Speed 6306.63 samples/sec Loss 5.4580 LearningRate 0.0004 Epoch: 16 Global Step: 334710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:54,988-Speed 6315.39 samples/sec Loss 5.4850 LearningRate 0.0004 Epoch: 16 Global Step: 334720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:10:58,233-Speed 6312.24 samples/sec Loss 5.5258 LearningRate 0.0004 Epoch: 16 Global Step: 334730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:01,486-Speed 6297.58 samples/sec Loss 5.5403 LearningRate 0.0004 Epoch: 16 Global Step: 334740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:04,735-Speed 6303.35 samples/sec Loss 5.5138 LearningRate 0.0004 Epoch: 16 Global Step: 334750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:07,979-Speed 6314.88 samples/sec Loss 5.5276 LearningRate 0.0004 Epoch: 16 Global Step: 334760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:11,226-Speed 6309.71 samples/sec Loss 5.5038 
LearningRate 0.0004 Epoch: 16 Global Step: 334770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:14,488-Speed 6281.65 samples/sec Loss 5.5160 LearningRate 0.0004 Epoch: 16 Global Step: 334780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:17,735-Speed 6309.74 samples/sec Loss 5.4936 LearningRate 0.0004 Epoch: 16 Global Step: 334790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:20,970-Speed 6331.19 samples/sec Loss 5.4827 LearningRate 0.0004 Epoch: 16 Global Step: 334800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:24,217-Speed 6309.98 samples/sec Loss 5.4891 LearningRate 0.0004 Epoch: 16 Global Step: 334810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:27,462-Speed 6312.08 samples/sec Loss 5.4793 LearningRate 0.0004 Epoch: 16 Global Step: 334820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:30,713-Speed 6300.42 samples/sec Loss 5.4318 LearningRate 0.0004 Epoch: 16 Global Step: 334830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:33,958-Speed 6312.48 samples/sec Loss 5.5842 LearningRate 0.0004 Epoch: 16 Global Step: 334840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:37,206-Speed 6307.94 samples/sec Loss 5.5072 LearningRate 0.0004 Epoch: 16 Global Step: 334850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:40,446-Speed 6321.11 samples/sec Loss 5.3715 LearningRate 0.0004 Epoch: 16 Global Step: 334860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:43,693-Speed 6309.76 samples/sec Loss 5.4532 LearningRate 0.0004 Epoch: 16 Global Step: 334870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:46,939-Speed 6311.09 samples/sec Loss 5.4561 LearningRate 0.0004 Epoch: 16 Global Step: 334880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:50,186-Speed 6308.32 samples/sec Loss 5.4568 LearningRate 0.0004 Epoch: 16 
Global Step: 334890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:53,419-Speed 6336.92 samples/sec Loss 5.4393 LearningRate 0.0004 Epoch: 16 Global Step: 334900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:56,661-Speed 6317.32 samples/sec Loss 5.4936 LearningRate 0.0004 Epoch: 16 Global Step: 334910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:11:59,913-Speed 6299.52 samples/sec Loss 5.4505 LearningRate 0.0004 Epoch: 16 Global Step: 334920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:03,162-Speed 6304.59 samples/sec Loss 5.5556 LearningRate 0.0004 Epoch: 16 Global Step: 334930 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:06,407-Speed 6313.24 samples/sec Loss 5.4606 LearningRate 0.0004 Epoch: 16 Global Step: 334940 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:09,658-Speed 6299.49 samples/sec Loss 5.4561 LearningRate 0.0004 Epoch: 16 Global Step: 334950 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:12,905-Speed 6309.11 samples/sec Loss 5.5271 LearningRate 0.0004 Epoch: 16 Global Step: 334960 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:16,156-Speed 6302.91 samples/sec Loss 5.4884 LearningRate 0.0004 Epoch: 16 Global Step: 334970 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:19,399-Speed 6316.35 samples/sec Loss 5.4971 LearningRate 0.0004 Epoch: 16 Global Step: 334980 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:22,646-Speed 6308.30 samples/sec Loss 5.5117 LearningRate 0.0004 Epoch: 16 Global Step: 334990 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:25,877-Speed 6339.87 samples/sec Loss 5.5327 LearningRate 0.0004 Epoch: 16 Global Step: 335000 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:29,126-Speed 6305.65 samples/sec Loss 5.4651 LearningRate 0.0004 Epoch: 16 Global Step: 335010 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:32,373-Speed 6307.82 samples/sec Loss 5.4686 LearningRate 0.0004 Epoch: 16 Global Step: 335020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:35,617-Speed 6314.72 samples/sec Loss 5.5231 LearningRate 0.0004 Epoch: 16 Global Step: 335030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:38,860-Speed 6316.82 samples/sec Loss 5.4986 LearningRate 0.0004 Epoch: 16 Global Step: 335040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:42,102-Speed 6317.43 samples/sec Loss 5.5155 LearningRate 0.0004 Epoch: 16 Global Step: 335050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:45,341-Speed 6324.82 samples/sec Loss 5.5198 LearningRate 0.0004 Epoch: 16 Global Step: 335060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:48,586-Speed 6313.97 samples/sec Loss 5.5095 LearningRate 0.0004 Epoch: 16 Global Step: 335070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:51,831-Speed 6312.60 samples/sec Loss 5.4433 LearningRate 0.0004 Epoch: 16 Global Step: 335080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:55,074-Speed 6316.25 samples/sec Loss 5.4739 LearningRate 0.0004 Epoch: 16 Global Step: 335090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:12:58,302-Speed 6345.99 samples/sec Loss 5.5027 LearningRate 0.0004 Epoch: 16 Global Step: 335100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:01,546-Speed 6314.75 samples/sec Loss 5.5178 LearningRate 0.0004 Epoch: 16 Global Step: 335110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:04,789-Speed 6315.35 samples/sec Loss 5.4605 LearningRate 0.0004 Epoch: 16 Global Step: 335120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:08,033-Speed 6314.45 samples/sec Loss 5.4175 LearningRate 0.0004 Epoch: 16 Global Step: 335130 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:13:11,277-Speed 6315.37 samples/sec Loss 5.4165 LearningRate 0.0004 Epoch: 16 Global Step: 335140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:14,521-Speed 6315.37 samples/sec Loss 5.4952 LearningRate 0.0004 Epoch: 16 Global Step: 335150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:17,775-Speed 6293.89 samples/sec Loss 5.4860 LearningRate 0.0004 Epoch: 16 Global Step: 335160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:21,017-Speed 6320.59 samples/sec Loss 5.4239 LearningRate 0.0004 Epoch: 16 Global Step: 335170 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:24,261-Speed 6314.73 samples/sec Loss 5.3939 LearningRate 0.0004 Epoch: 16 Global Step: 335180 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:27,506-Speed 6311.71 samples/sec Loss 5.3650 LearningRate 0.0004 Epoch: 16 Global Step: 335190 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:30,748-Speed 6318.84 samples/sec Loss 5.5089 LearningRate 0.0004 Epoch: 16 Global Step: 335200 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:33,989-Speed 6320.57 samples/sec Loss 5.4383 LearningRate 0.0004 Epoch: 16 Global Step: 335210 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:37,229-Speed 6323.52 samples/sec Loss 5.5442 LearningRate 0.0004 Epoch: 16 Global Step: 335220 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:40,474-Speed 6312.43 samples/sec Loss 5.4330 LearningRate 0.0004 Epoch: 16 Global Step: 335230 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:43,718-Speed 6313.64 samples/sec Loss 5.4044 LearningRate 0.0004 Epoch: 16 Global Step: 335240 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:46,960-Speed 6318.14 samples/sec Loss 5.4067 LearningRate 0.0004 Epoch: 16 Global Step: 335250 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 
22:13:50,200-Speed 6322.50 samples/sec Loss 5.3973 LearningRate 0.0004 Epoch: 16 Global Step: 335260 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:13:53,447-Speed 6308.52 samples/sec Loss 5.4563 LearningRate 0.0004 Epoch: 16 Global Step: 335270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:56,692-Speed 6314.33 samples/sec Loss 5.4647 LearningRate 0.0004 Epoch: 16 Global Step: 335280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:13:59,940-Speed 6305.48 samples/sec Loss 5.4430 LearningRate 0.0004 Epoch: 16 Global Step: 335290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:03,190-Speed 6303.16 samples/sec Loss 5.4273 LearningRate 0.0004 Epoch: 16 Global Step: 335300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:06,438-Speed 6306.63 samples/sec Loss 5.4404 LearningRate 0.0004 Epoch: 16 Global Step: 335310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:09,682-Speed 6314.92 samples/sec Loss 5.4540 LearningRate 0.0004 Epoch: 16 Global Step: 335320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:12,928-Speed 6310.46 samples/sec Loss 5.4986 LearningRate 0.0004 Epoch: 16 Global Step: 335330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:16,173-Speed 6313.15 samples/sec Loss 5.4978 LearningRate 0.0004 Epoch: 16 Global Step: 335340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:19,408-Speed 6332.40 samples/sec Loss 5.4751 LearningRate 0.0004 Epoch: 16 Global Step: 335350 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:22,666-Speed 6286.29 samples/sec Loss 5.4709 LearningRate 0.0004 Epoch: 16 Global Step: 335360 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:25,907-Speed 6320.51 samples/sec Loss 5.5309 LearningRate 0.0004 Epoch: 16 Global Step: 335370 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:29,157-Speed 6304.94 
samples/sec Loss 5.4721 LearningRate 0.0004 Epoch: 16 Global Step: 335380 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:32,396-Speed 6323.62 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 16 Global Step: 335390 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:35,644-Speed 6307.78 samples/sec Loss 5.4894 LearningRate 0.0004 Epoch: 16 Global Step: 335400 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:38,889-Speed 6311.55 samples/sec Loss 5.5551 LearningRate 0.0004 Epoch: 16 Global Step: 335410 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:42,135-Speed 6311.48 samples/sec Loss 5.4534 LearningRate 0.0004 Epoch: 16 Global Step: 335420 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:45,380-Speed 6311.98 samples/sec Loss 5.4064 LearningRate 0.0004 Epoch: 16 Global Step: 335430 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:48,624-Speed 6315.14 samples/sec Loss 5.5008 LearningRate 0.0004 Epoch: 16 Global Step: 335440 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:14:51,869-Speed 6312.92 samples/sec Loss 5.4519 LearningRate 0.0004 Epoch: 16 Global Step: 335450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:55,112-Speed 6317.14 samples/sec Loss 5.4186 LearningRate 0.0004 Epoch: 16 Global Step: 335460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:14:58,358-Speed 6309.43 samples/sec Loss 5.4829 LearningRate 0.0004 Epoch: 16 Global Step: 335470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:01,606-Speed 6306.98 samples/sec Loss 5.4918 LearningRate 0.0004 Epoch: 16 Global Step: 335480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:04,851-Speed 6313.60 samples/sec Loss 5.4527 LearningRate 0.0004 Epoch: 16 Global Step: 335490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:08,093-Speed 6318.84 samples/sec Loss 5.5327 
LearningRate 0.0004 Epoch: 16 Global Step: 335500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:11,339-Speed 6309.93 samples/sec Loss 5.4068 LearningRate 0.0004 Epoch: 16 Global Step: 335510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:14,586-Speed 6309.14 samples/sec Loss 5.4729 LearningRate 0.0004 Epoch: 16 Global Step: 335520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:17,833-Speed 6309.11 samples/sec Loss 5.4263 LearningRate 0.0004 Epoch: 16 Global Step: 335530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:21,073-Speed 6321.04 samples/sec Loss 5.4455 LearningRate 0.0004 Epoch: 16 Global Step: 335540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:24,306-Speed 6336.77 samples/sec Loss 5.3693 LearningRate 0.0004 Epoch: 16 Global Step: 335550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:27,549-Speed 6316.12 samples/sec Loss 5.3551 LearningRate 0.0004 Epoch: 16 Global Step: 335560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:30,798-Speed 6305.13 samples/sec Loss 5.5235 LearningRate 0.0004 Epoch: 16 Global Step: 335570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:34,042-Speed 6315.31 samples/sec Loss 5.5182 LearningRate 0.0004 Epoch: 16 Global Step: 335580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:37,289-Speed 6308.87 samples/sec Loss 5.4433 LearningRate 0.0004 Epoch: 16 Global Step: 335590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:40,539-Speed 6302.90 samples/sec Loss 5.4613 LearningRate 0.0004 Epoch: 16 Global Step: 335600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:43,784-Speed 6312.35 samples/sec Loss 5.3978 LearningRate 0.0004 Epoch: 16 Global Step: 335610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:47,032-Speed 6308.32 samples/sec Loss 5.5281 LearningRate 0.0004 Epoch: 16 
Global Step: 335620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:50,273-Speed 6320.25 samples/sec Loss 5.4075 LearningRate 0.0004 Epoch: 16 Global Step: 335630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:53,517-Speed 6313.25 samples/sec Loss 5.5163 LearningRate 0.0004 Epoch: 16 Global Step: 335640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:15:56,765-Speed 6307.73 samples/sec Loss 5.4736 LearningRate 0.0004 Epoch: 16 Global Step: 335650 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:15:59,994-Speed 6344.33 samples/sec Loss 5.4590 LearningRate 0.0004 Epoch: 16 Global Step: 335660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:03,244-Speed 6302.82 samples/sec Loss 5.4434 LearningRate 0.0004 Epoch: 16 Global Step: 335670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:06,489-Speed 6312.71 samples/sec Loss 5.5241 LearningRate 0.0004 Epoch: 16 Global Step: 335680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:09,730-Speed 6319.87 samples/sec Loss 5.4428 LearningRate 0.0004 Epoch: 16 Global Step: 335690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:12,978-Speed 6306.76 samples/sec Loss 5.4856 LearningRate 0.0004 Epoch: 16 Global Step: 335700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:16,224-Speed 6310.97 samples/sec Loss 5.4660 LearningRate 0.0004 Epoch: 16 Global Step: 335710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:19,471-Speed 6308.74 samples/sec Loss 5.5232 LearningRate 0.0004 Epoch: 16 Global Step: 335720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:22,713-Speed 6317.53 samples/sec Loss 5.5041 LearningRate 0.0004 Epoch: 16 Global Step: 335730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:25,965-Speed 6301.61 samples/sec Loss 5.4755 LearningRate 0.0004 Epoch: 16 Global Step: 335740 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:29,204-Speed 6322.48 samples/sec Loss 5.4897 LearningRate 0.0004 Epoch: 16 Global Step: 335750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:32,434-Speed 6342.31 samples/sec Loss 5.4999 LearningRate 0.0004 Epoch: 16 Global Step: 335760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:16:35,673-Speed 6325.85 samples/sec Loss 5.4758 LearningRate 0.0004 Epoch: 16 Global Step: 335770 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:38,915-Speed 6318.41 samples/sec Loss 5.4484 LearningRate 0.0004 Epoch: 16 Global Step: 335780 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:42,160-Speed 6310.64 samples/sec Loss 5.4198 LearningRate 0.0004 Epoch: 16 Global Step: 335790 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:45,402-Speed 6319.90 samples/sec Loss 5.5320 LearningRate 0.0004 Epoch: 16 Global Step: 335800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:48,651-Speed 6305.30 samples/sec Loss 5.4607 LearningRate 0.0004 Epoch: 16 Global Step: 335810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:51,906-Speed 6293.00 samples/sec Loss 5.5176 LearningRate 0.0004 Epoch: 16 Global Step: 335820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:55,148-Speed 6318.35 samples/sec Loss 5.5154 LearningRate 0.0004 Epoch: 16 Global Step: 335830 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:16:58,396-Speed 6306.81 samples/sec Loss 5.5529 LearningRate 0.0004 Epoch: 16 Global Step: 335840 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:01,639-Speed 6317.40 samples/sec Loss 5.4405 LearningRate 0.0004 Epoch: 16 Global Step: 335850 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:04,883-Speed 6314.90 samples/sec Loss 5.4820 LearningRate 0.0004 Epoch: 16 Global Step: 335860 Fp16 Grad Scale: 16384 Required: 45 hours 
Training: 2022-04-01 22:17:08,127-Speed 6314.58 samples/sec Loss 5.5126 LearningRate 0.0004 Epoch: 16 Global Step: 335870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:11,372-Speed 6312.12 samples/sec Loss 5.4351 LearningRate 0.0004 Epoch: 16 Global Step: 335880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:14,615-Speed 6316.87 samples/sec Loss 5.5007 LearningRate 0.0004 Epoch: 16 Global Step: 335890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:17,871-Speed 6291.77 samples/sec Loss 5.4258 LearningRate 0.0004 Epoch: 16 Global Step: 335900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:21,115-Speed 6313.79 samples/sec Loss 5.5191 LearningRate 0.0004 Epoch: 16 Global Step: 335910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:24,408-Speed 6220.84 samples/sec Loss 5.4243 LearningRate 0.0004 Epoch: 16 Global Step: 335920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:17:27,644-Speed 6330.72 samples/sec Loss 5.4328 LearningRate 0.0004 Epoch: 16 Global Step: 335930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:30,888-Speed 6313.85 samples/sec Loss 5.4200 LearningRate 0.0004 Epoch: 16 Global Step: 335940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:34,130-Speed 6319.23 samples/sec Loss 5.4798 LearningRate 0.0004 Epoch: 16 Global Step: 335950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:37,376-Speed 6310.39 samples/sec Loss 5.4859 LearningRate 0.0004 Epoch: 16 Global Step: 335960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:40,617-Speed 6319.98 samples/sec Loss 5.5006 LearningRate 0.0004 Epoch: 16 Global Step: 335970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:43,861-Speed 6315.48 samples/sec Loss 5.4653 LearningRate 0.0004 Epoch: 16 Global Step: 335980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 
22:17:47,107-Speed 6309.97 samples/sec Loss 5.3589 LearningRate 0.0004 Epoch: 16 Global Step: 335990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:50,351-Speed 6314.85 samples/sec Loss 5.3886 LearningRate 0.0004 Epoch: 16 Global Step: 336000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:53,597-Speed 6310.39 samples/sec Loss 5.4949 LearningRate 0.0004 Epoch: 16 Global Step: 336010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:17:56,838-Speed 6320.12 samples/sec Loss 5.5012 LearningRate 0.0004 Epoch: 16 Global Step: 336020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:18:00,085-Speed 6309.06 samples/sec Loss 5.4644 LearningRate 0.0004 Epoch: 16 Global Step: 336030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:03,332-Speed 6308.80 samples/sec Loss 5.4923 LearningRate 0.0004 Epoch: 16 Global Step: 336040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:06,577-Speed 6313.54 samples/sec Loss 5.4906 LearningRate 0.0004 Epoch: 16 Global Step: 336050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:09,823-Speed 6311.20 samples/sec Loss 5.4489 LearningRate 0.0004 Epoch: 16 Global Step: 336060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:13,068-Speed 6312.53 samples/sec Loss 5.4143 LearningRate 0.0004 Epoch: 16 Global Step: 336070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:16,311-Speed 6315.22 samples/sec Loss 5.4419 LearningRate 0.0004 Epoch: 16 Global Step: 336080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:19,557-Speed 6312.33 samples/sec Loss 5.4598 LearningRate 0.0004 Epoch: 16 Global Step: 336090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:22,798-Speed 6319.47 samples/sec Loss 5.5350 LearningRate 0.0004 Epoch: 16 Global Step: 336100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:26,038-Speed 6322.47 
samples/sec Loss 5.4714 LearningRate 0.0004 Epoch: 16 Global Step: 336110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:29,284-Speed 6310.24 samples/sec Loss 5.4777 LearningRate 0.0004 Epoch: 16 Global Step: 336120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:32,514-Speed 6342.84 samples/sec Loss 5.4027 LearningRate 0.0004 Epoch: 16 Global Step: 336130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:35,759-Speed 6313.12 samples/sec Loss 5.5475 LearningRate 0.0004 Epoch: 16 Global Step: 336140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:39,001-Speed 6317.29 samples/sec Loss 5.5045 LearningRate 0.0004 Epoch: 16 Global Step: 336150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:42,244-Speed 6316.07 samples/sec Loss 5.4300 LearningRate 0.0004 Epoch: 16 Global Step: 336160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:45,492-Speed 6308.32 samples/sec Loss 5.4428 LearningRate 0.0004 Epoch: 16 Global Step: 336170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:48,734-Speed 6317.96 samples/sec Loss 5.4554 LearningRate 0.0004 Epoch: 16 Global Step: 336180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:51,979-Speed 6312.92 samples/sec Loss 5.4866 LearningRate 0.0004 Epoch: 16 Global Step: 336190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:55,233-Speed 6294.74 samples/sec Loss 5.3898 LearningRate 0.0004 Epoch: 16 Global Step: 336200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:18:58,476-Speed 6316.82 samples/sec Loss 5.4696 LearningRate 0.0004 Epoch: 16 Global Step: 336210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:01,717-Speed 6319.29 samples/sec Loss 5.4828 LearningRate 0.0004 Epoch: 16 Global Step: 336220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:04,952-Speed 6333.77 samples/sec Loss 5.4154 
LearningRate 0.0004 Epoch: 16 Global Step: 336230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:08,192-Speed 6322.81 samples/sec Loss 5.5255 LearningRate 0.0004 Epoch: 16 Global Step: 336240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:11,441-Speed 6305.56 samples/sec Loss 5.5357 LearningRate 0.0004 Epoch: 16 Global Step: 336250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:14,687-Speed 6309.06 samples/sec Loss 5.5072 LearningRate 0.0004 Epoch: 16 Global Step: 336260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:17,932-Speed 6314.06 samples/sec Loss 5.4598 LearningRate 0.0004 Epoch: 16 Global Step: 336270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:21,177-Speed 6312.79 samples/sec Loss 5.4505 LearningRate 0.0004 Epoch: 16 Global Step: 336280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:24,421-Speed 6313.35 samples/sec Loss 5.4414 LearningRate 0.0004 Epoch: 16 Global Step: 336290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:27,669-Speed 6308.13 samples/sec Loss 5.4477 LearningRate 0.0004 Epoch: 16 Global Step: 336300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:30,924-Speed 6293.42 samples/sec Loss 5.3397 LearningRate 0.0004 Epoch: 16 Global Step: 336310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:34,169-Speed 6311.13 samples/sec Loss 5.5379 LearningRate 0.0004 Epoch: 16 Global Step: 336320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:37,427-Speed 6287.47 samples/sec Loss 5.4750 LearningRate 0.0004 Epoch: 16 Global Step: 336330 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:19:40,653-Speed 6349.90 samples/sec Loss 5.4730 LearningRate 0.0004 Epoch: 16 Global Step: 336340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:43,897-Speed 6314.25 samples/sec Loss 5.4522 LearningRate 0.0004 Epoch: 16 
Global Step: 336350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:47,140-Speed 6317.00 samples/sec Loss 5.4211 LearningRate 0.0004 Epoch: 16 Global Step: 336360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:50,387-Speed 6308.99 samples/sec Loss 5.4653 LearningRate 0.0004 Epoch: 16 Global Step: 336370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:53,629-Speed 6318.94 samples/sec Loss 5.4827 LearningRate 0.0004 Epoch: 16 Global Step: 336380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:19:56,874-Speed 6312.37 samples/sec Loss 5.4411 LearningRate 0.0004 Epoch: 16 Global Step: 336390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:00,189-Speed 6179.83 samples/sec Loss 5.5188 LearningRate 0.0004 Epoch: 16 Global Step: 336400 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:03,435-Speed 6310.83 samples/sec Loss 5.4828 LearningRate 0.0004 Epoch: 16 Global Step: 336410 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:06,683-Speed 6307.41 samples/sec Loss 5.4403 LearningRate 0.0004 Epoch: 16 Global Step: 336420 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:09,923-Speed 6322.11 samples/sec Loss 5.4998 LearningRate 0.0004 Epoch: 16 Global Step: 336430 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:13,172-Speed 6304.03 samples/sec Loss 5.4554 LearningRate 0.0004 Epoch: 16 Global Step: 336440 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:16,416-Speed 6315.60 samples/sec Loss 5.3981 LearningRate 0.0004 Epoch: 16 Global Step: 336450 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:19,668-Speed 6299.77 samples/sec Loss 5.4581 LearningRate 0.0004 Epoch: 16 Global Step: 336460 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:22,913-Speed 6312.46 samples/sec Loss 5.4517 LearningRate 0.0004 Epoch: 16 Global Step: 336470 Fp16 Grad 
Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:26,159-Speed 6309.45 samples/sec Loss 5.5391 LearningRate 0.0004 Epoch: 16 Global Step: 336480 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:29,405-Speed 6311.93 samples/sec Loss 5.4872 LearningRate 0.0004 Epoch: 16 Global Step: 336490 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:20:32,652-Speed 6307.90 samples/sec Loss 5.3858 LearningRate 0.0004 Epoch: 16 Global Step: 336500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:35,904-Speed 6299.32 samples/sec Loss 5.4144 LearningRate 0.0004 Epoch: 16 Global Step: 336510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:39,151-Speed 6309.13 samples/sec Loss 5.4446 LearningRate 0.0004 Epoch: 16 Global Step: 336520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:42,393-Speed 6317.32 samples/sec Loss 5.4310 LearningRate 0.0004 Epoch: 16 Global Step: 336530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:45,643-Speed 6304.35 samples/sec Loss 5.4861 LearningRate 0.0004 Epoch: 16 Global Step: 336540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:48,886-Speed 6315.51 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 16 Global Step: 336550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:52,134-Speed 6307.17 samples/sec Loss 5.5055 LearningRate 0.0004 Epoch: 16 Global Step: 336560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:55,378-Speed 6315.30 samples/sec Loss 5.4727 LearningRate 0.0004 Epoch: 16 Global Step: 336570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:20:58,624-Speed 6310.60 samples/sec Loss 5.4737 LearningRate 0.0004 Epoch: 16 Global Step: 336580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:01,877-Speed 6296.48 samples/sec Loss 5.5325 LearningRate 0.0004 Epoch: 16 Global Step: 336590 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:21:05,119-Speed 6318.02 samples/sec Loss 5.4575 LearningRate 0.0004 Epoch: 16 Global Step: 336600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:08,379-Speed 6284.05 samples/sec Loss 5.4459 LearningRate 0.0004 Epoch: 16 Global Step: 336610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:11,626-Speed 6308.71 samples/sec Loss 5.5017 LearningRate 0.0004 Epoch: 16 Global Step: 336620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:14,874-Speed 6305.87 samples/sec Loss 5.4933 LearningRate 0.0004 Epoch: 16 Global Step: 336630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:18,122-Speed 6308.53 samples/sec Loss 5.4069 LearningRate 0.0004 Epoch: 16 Global Step: 336640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:21,366-Speed 6314.31 samples/sec Loss 5.4404 LearningRate 0.0004 Epoch: 16 Global Step: 336650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:24,618-Speed 6299.33 samples/sec Loss 5.4315 LearningRate 0.0004 Epoch: 16 Global Step: 336660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:27,864-Speed 6311.49 samples/sec Loss 5.4682 LearningRate 0.0004 Epoch: 16 Global Step: 336670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:31,108-Speed 6313.64 samples/sec Loss 5.5096 LearningRate 0.0004 Epoch: 16 Global Step: 336680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:34,352-Speed 6314.25 samples/sec Loss 5.5193 LearningRate 0.0004 Epoch: 16 Global Step: 336690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:37,585-Speed 6336.47 samples/sec Loss 5.4652 LearningRate 0.0004 Epoch: 16 Global Step: 336700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:40,830-Speed 6312.62 samples/sec Loss 5.4180 LearningRate 0.0004 Epoch: 16 Global Step: 336710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:21:44,071-Speed 6320.77 samples/sec Loss 5.4404 LearningRate 0.0004 Epoch: 16 Global Step: 336720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:47,316-Speed 6313.26 samples/sec Loss 5.5028 LearningRate 0.0004 Epoch: 16 Global Step: 336730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:50,559-Speed 6315.54 samples/sec Loss 5.5199 LearningRate 0.0004 Epoch: 16 Global Step: 336740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:53,805-Speed 6311.33 samples/sec Loss 5.4099 LearningRate 0.0004 Epoch: 16 Global Step: 336750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:21:57,054-Speed 6304.30 samples/sec Loss 5.4108 LearningRate 0.0004 Epoch: 16 Global Step: 336760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:00,346-Speed 6223.50 samples/sec Loss 5.5459 LearningRate 0.0004 Epoch: 16 Global Step: 336770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:03,587-Speed 6320.49 samples/sec Loss 5.5003 LearningRate 0.0004 Epoch: 16 Global Step: 336780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:06,833-Speed 6311.59 samples/sec Loss 5.4429 LearningRate 0.0004 Epoch: 16 Global Step: 336790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:10,065-Speed 6337.38 samples/sec Loss 5.4860 LearningRate 0.0004 Epoch: 16 Global Step: 336800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:13,307-Speed 6319.62 samples/sec Loss 5.3914 LearningRate 0.0004 Epoch: 16 Global Step: 336810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:16,549-Speed 6318.33 samples/sec Loss 5.4596 LearningRate 0.0004 Epoch: 16 Global Step: 336820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:19,794-Speed 6312.31 samples/sec Loss 5.4414 LearningRate 0.0004 Epoch: 16 Global Step: 336830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:23,038-Speed 6313.47 
samples/sec Loss 5.5492 LearningRate 0.0004 Epoch: 16 Global Step: 336840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:26,302-Speed 6277.32 samples/sec Loss 5.4760 LearningRate 0.0004 Epoch: 16 Global Step: 336850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:29,549-Speed 6308.25 samples/sec Loss 5.4704 LearningRate 0.0004 Epoch: 16 Global Step: 336860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:32,795-Speed 6311.55 samples/sec Loss 5.4190 LearningRate 0.0004 Epoch: 16 Global Step: 336870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:36,061-Speed 6271.78 samples/sec Loss 5.4888 LearningRate 0.0004 Epoch: 16 Global Step: 336880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:39,305-Speed 6314.63 samples/sec Loss 5.4658 LearningRate 0.0004 Epoch: 16 Global Step: 336890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:42,535-Speed 6341.75 samples/sec Loss 5.4800 LearningRate 0.0004 Epoch: 16 Global Step: 336900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:45,782-Speed 6310.15 samples/sec Loss 5.4065 LearningRate 0.0004 Epoch: 16 Global Step: 336910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:49,033-Speed 6300.27 samples/sec Loss 5.5345 LearningRate 0.0004 Epoch: 16 Global Step: 336920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:52,277-Speed 6315.26 samples/sec Loss 5.4575 LearningRate 0.0004 Epoch: 16 Global Step: 336930 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:55,520-Speed 6316.59 samples/sec Loss 5.4897 LearningRate 0.0004 Epoch: 16 Global Step: 336940 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:22:58,764-Speed 6313.37 samples/sec Loss 5.4155 LearningRate 0.0004 Epoch: 16 Global Step: 336950 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:02,016-Speed 6302.66 samples/sec Loss 5.4639 
LearningRate 0.0004 Epoch: 16 Global Step: 336960 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:05,263-Speed 6309.21 samples/sec Loss 5.4904 LearningRate 0.0004 Epoch: 16 Global Step: 336970 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:08,506-Speed 6316.94 samples/sec Loss 5.4838 LearningRate 0.0004 Epoch: 16 Global Step: 336980 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:11,753-Speed 6308.98 samples/sec Loss 5.4782 LearningRate 0.0004 Epoch: 16 Global Step: 336990 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:14,997-Speed 6313.62 samples/sec Loss 5.5078 LearningRate 0.0004 Epoch: 16 Global Step: 337000 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:23:18,226-Speed 6343.02 samples/sec Loss 5.4564 LearningRate 0.0004 Epoch: 16 Global Step: 337010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:21,466-Speed 6323.84 samples/sec Loss 5.4616 LearningRate 0.0004 Epoch: 16 Global Step: 337020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:24,713-Speed 6307.79 samples/sec Loss 5.5102 LearningRate 0.0004 Epoch: 16 Global Step: 337030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:27,958-Speed 6313.74 samples/sec Loss 5.4621 LearningRate 0.0004 Epoch: 16 Global Step: 337040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:31,205-Speed 6308.83 samples/sec Loss 5.5244 LearningRate 0.0004 Epoch: 16 Global Step: 337050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:34,448-Speed 6316.52 samples/sec Loss 5.4128 LearningRate 0.0004 Epoch: 16 Global Step: 337060 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:37,696-Speed 6307.04 samples/sec Loss 5.4743 LearningRate 0.0004 Epoch: 16 Global Step: 337070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:40,944-Speed 6307.57 samples/sec Loss 5.4366 LearningRate 0.0004 Epoch: 16 
Global Step: 337080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:44,188-Speed 6312.94 samples/sec Loss 5.4951 LearningRate 0.0004 Epoch: 16 Global Step: 337090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:47,432-Speed 6314.86 samples/sec Loss 5.4579 LearningRate 0.0004 Epoch: 16 Global Step: 337100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:50,663-Speed 6341.53 samples/sec Loss 5.3904 LearningRate 0.0004 Epoch: 16 Global Step: 337110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:53,909-Speed 6309.56 samples/sec Loss 5.4705 LearningRate 0.0004 Epoch: 16 Global Step: 337120 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:23:57,155-Speed 6311.48 samples/sec Loss 5.4854 LearningRate 0.0004 Epoch: 16 Global Step: 337130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:00,397-Speed 6317.95 samples/sec Loss 5.4985 LearningRate 0.0004 Epoch: 16 Global Step: 337140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:03,646-Speed 6305.94 samples/sec Loss 5.4752 LearningRate 0.0004 Epoch: 16 Global Step: 337150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:06,894-Speed 6306.51 samples/sec Loss 5.4683 LearningRate 0.0004 Epoch: 16 Global Step: 337160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:10,140-Speed 6309.84 samples/sec Loss 5.3901 LearningRate 0.0004 Epoch: 16 Global Step: 337170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:13,390-Speed 6303.99 samples/sec Loss 5.3991 LearningRate 0.0004 Epoch: 16 Global Step: 337180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:16,633-Speed 6315.53 samples/sec Loss 5.3849 LearningRate 0.0004 Epoch: 16 Global Step: 337190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:19,875-Speed 6318.95 samples/sec Loss 5.5096 LearningRate 0.0004 Epoch: 16 Global Step: 337200 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:23,107-Speed 6338.70 samples/sec Loss 5.4666 LearningRate 0.0004 Epoch: 16 Global Step: 337210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:26,352-Speed 6310.73 samples/sec Loss 5.4861 LearningRate 0.0004 Epoch: 16 Global Step: 337220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:29,600-Speed 6309.66 samples/sec Loss 5.4489 LearningRate 0.0004 Epoch: 16 Global Step: 337230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:32,848-Speed 6305.36 samples/sec Loss 5.3757 LearningRate 0.0004 Epoch: 16 Global Step: 337240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:36,092-Speed 6315.40 samples/sec Loss 5.4391 LearningRate 0.0004 Epoch: 16 Global Step: 337250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:39,333-Speed 6320.98 samples/sec Loss 5.4790 LearningRate 0.0004 Epoch: 16 Global Step: 337260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:42,580-Speed 6308.77 samples/sec Loss 5.4876 LearningRate 0.0004 Epoch: 16 Global Step: 337270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:45,823-Speed 6316.64 samples/sec Loss 5.4391 LearningRate 0.0004 Epoch: 16 Global Step: 337280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:49,068-Speed 6313.03 samples/sec Loss 5.4327 LearningRate 0.0004 Epoch: 16 Global Step: 337290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:52,313-Speed 6313.21 samples/sec Loss 5.5543 LearningRate 0.0004 Epoch: 16 Global Step: 337300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:55,544-Speed 6338.74 samples/sec Loss 5.4365 LearningRate 0.0004 Epoch: 16 Global Step: 337310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:24:58,789-Speed 6313.02 samples/sec Loss 5.4711 LearningRate 0.0004 Epoch: 16 Global Step: 337320 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:25:02,040-Speed 6301.04 samples/sec Loss 5.4845 LearningRate 0.0004 Epoch: 16 Global Step: 337330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:05,286-Speed 6311.04 samples/sec Loss 5.5199 LearningRate 0.0004 Epoch: 16 Global Step: 337340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:08,531-Speed 6312.23 samples/sec Loss 5.4374 LearningRate 0.0004 Epoch: 16 Global Step: 337350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:11,789-Speed 6288.72 samples/sec Loss 5.5011 LearningRate 0.0004 Epoch: 16 Global Step: 337360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:15,039-Speed 6303.13 samples/sec Loss 5.4951 LearningRate 0.0004 Epoch: 16 Global Step: 337370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:18,285-Speed 6310.33 samples/sec Loss 5.4415 LearningRate 0.0004 Epoch: 16 Global Step: 337380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:21,529-Speed 6313.71 samples/sec Loss 5.4267 LearningRate 0.0004 Epoch: 16 Global Step: 337390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:24,780-Speed 6302.07 samples/sec Loss 5.4128 LearningRate 0.0004 Epoch: 16 Global Step: 337400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:28,010-Speed 6341.79 samples/sec Loss 5.4846 LearningRate 0.0004 Epoch: 16 Global Step: 337410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:31,257-Speed 6307.55 samples/sec Loss 5.4975 LearningRate 0.0004 Epoch: 16 Global Step: 337420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:34,504-Speed 6309.53 samples/sec Loss 5.4860 LearningRate 0.0004 Epoch: 16 Global Step: 337430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:37,753-Speed 6305.48 samples/sec Loss 5.4699 LearningRate 0.0004 Epoch: 16 Global Step: 337440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:25:41,004-Speed 6300.10 samples/sec Loss 5.3969 LearningRate 0.0004 Epoch: 16 Global Step: 337450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:44,255-Speed 6302.41 samples/sec Loss 5.4507 LearningRate 0.0004 Epoch: 16 Global Step: 337460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:47,514-Speed 6284.62 samples/sec Loss 5.4071 LearningRate 0.0004 Epoch: 16 Global Step: 337470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:50,758-Speed 6314.18 samples/sec Loss 5.4584 LearningRate 0.0004 Epoch: 16 Global Step: 337480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:54,005-Speed 6309.65 samples/sec Loss 5.4955 LearningRate 0.0004 Epoch: 16 Global Step: 337490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:25:57,251-Speed 6310.11 samples/sec Loss 5.4325 LearningRate 0.0004 Epoch: 16 Global Step: 337500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:00,484-Speed 6337.24 samples/sec Loss 5.4756 LearningRate 0.0004 Epoch: 16 Global Step: 337510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:03,732-Speed 6307.30 samples/sec Loss 5.5165 LearningRate 0.0004 Epoch: 16 Global Step: 337520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:06,977-Speed 6311.69 samples/sec Loss 5.5316 LearningRate 0.0004 Epoch: 16 Global Step: 337530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:10,222-Speed 6313.98 samples/sec Loss 5.4485 LearningRate 0.0004 Epoch: 16 Global Step: 337540 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:13,464-Speed 6318.85 samples/sec Loss 5.5224 LearningRate 0.0004 Epoch: 16 Global Step: 337550 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:16,711-Speed 6308.48 samples/sec Loss 5.4511 LearningRate 0.0004 Epoch: 16 Global Step: 337560 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:19,958-Speed 6307.04 
samples/sec Loss 5.4459 LearningRate 0.0004 Epoch: 16 Global Step: 337570 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:23,204-Speed 6312.17 samples/sec Loss 5.4892 LearningRate 0.0004 Epoch: 16 Global Step: 337580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:26,447-Speed 6316.80 samples/sec Loss 5.3803 LearningRate 0.0004 Epoch: 16 Global Step: 337590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:29,692-Speed 6311.79 samples/sec Loss 5.4026 LearningRate 0.0004 Epoch: 16 Global Step: 337600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:32,923-Speed 6340.14 samples/sec Loss 5.4306 LearningRate 0.0004 Epoch: 16 Global Step: 337610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:36,171-Speed 6307.20 samples/sec Loss 5.4799 LearningRate 0.0004 Epoch: 16 Global Step: 337620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:39,419-Speed 6307.20 samples/sec Loss 5.4568 LearningRate 0.0004 Epoch: 16 Global Step: 337630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:42,664-Speed 6311.58 samples/sec Loss 5.5008 LearningRate 0.0004 Epoch: 16 Global Step: 337640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:45,910-Speed 6311.87 samples/sec Loss 5.4231 LearningRate 0.0004 Epoch: 16 Global Step: 337650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:49,153-Speed 6316.27 samples/sec Loss 5.4745 LearningRate 0.0004 Epoch: 16 Global Step: 337660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:52,399-Speed 6309.71 samples/sec Loss 5.4964 LearningRate 0.0004 Epoch: 16 Global Step: 337670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:55,640-Speed 6320.28 samples/sec Loss 5.4638 LearningRate 0.0004 Epoch: 16 Global Step: 337680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:26:58,883-Speed 6317.71 samples/sec Loss 5.4962 
LearningRate 0.0004 Epoch: 16 Global Step: 337690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:02,125-Speed 6318.47 samples/sec Loss 5.4625 LearningRate 0.0004 Epoch: 16 Global Step: 337700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:05,353-Speed 6347.00 samples/sec Loss 5.4576 LearningRate 0.0004 Epoch: 16 Global Step: 337710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:08,601-Speed 6306.59 samples/sec Loss 5.5132 LearningRate 0.0004 Epoch: 16 Global Step: 337720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:11,845-Speed 6318.51 samples/sec Loss 5.4456 LearningRate 0.0004 Epoch: 16 Global Step: 337730 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:15,088-Speed 6316.26 samples/sec Loss 5.4853 LearningRate 0.0004 Epoch: 16 Global Step: 337740 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:18,335-Speed 6307.86 samples/sec Loss 5.4251 LearningRate 0.0004 Epoch: 16 Global Step: 337750 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:21,581-Speed 6311.61 samples/sec Loss 5.4658 LearningRate 0.0004 Epoch: 16 Global Step: 337760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:24,825-Speed 6313.86 samples/sec Loss 5.3663 LearningRate 0.0004 Epoch: 16 Global Step: 337770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:28,064-Speed 6324.46 samples/sec Loss 5.4374 LearningRate 0.0004 Epoch: 16 Global Step: 337780 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:31,307-Speed 6317.25 samples/sec Loss 5.5260 LearningRate 0.0004 Epoch: 16 Global Step: 337790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:34,550-Speed 6316.93 samples/sec Loss 5.4201 LearningRate 0.0004 Epoch: 16 Global Step: 337800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:37,831-Speed 6242.05 samples/sec Loss 5.4341 LearningRate 0.0004 Epoch: 16 
Global Step: 337810 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:27:41,132-Speed 6206.74 samples/sec Loss 5.5046 LearningRate 0.0004 Epoch: 16 Global Step: 337820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:44,396-Speed 6275.23 samples/sec Loss 5.4601 LearningRate 0.0004 Epoch: 16 Global Step: 337830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:47,642-Speed 6310.41 samples/sec Loss 5.3617 LearningRate 0.0004 Epoch: 16 Global Step: 337840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:50,889-Speed 6309.01 samples/sec Loss 5.4405 LearningRate 0.0004 Epoch: 16 Global Step: 337850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:54,137-Speed 6307.94 samples/sec Loss 5.4134 LearningRate 0.0004 Epoch: 16 Global Step: 337860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:27:57,391-Speed 6295.36 samples/sec Loss 5.4212 LearningRate 0.0004 Epoch: 16 Global Step: 337870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:00,640-Speed 6303.94 samples/sec Loss 5.4870 LearningRate 0.0004 Epoch: 16 Global Step: 337880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:03,883-Speed 6316.25 samples/sec Loss 5.4022 LearningRate 0.0004 Epoch: 16 Global Step: 337890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:07,139-Speed 6291.10 samples/sec Loss 5.4644 LearningRate 0.0004 Epoch: 16 Global Step: 337900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:10,386-Speed 6308.98 samples/sec Loss 5.4446 LearningRate 0.0004 Epoch: 16 Global Step: 337910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:13,615-Speed 6344.84 samples/sec Loss 5.5267 LearningRate 0.0004 Epoch: 16 Global Step: 337920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:16,847-Speed 6338.33 samples/sec Loss 5.4616 LearningRate 0.0004 Epoch: 16 Global Step: 337930 Fp16 Grad 
Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:20,100-Speed 6297.11 samples/sec Loss 5.5073 LearningRate 0.0004 Epoch: 16 Global Step: 337940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:23,348-Speed 6307.94 samples/sec Loss 5.4211 LearningRate 0.0004 Epoch: 16 Global Step: 337950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:26,591-Speed 6315.42 samples/sec Loss 5.4333 LearningRate 0.0004 Epoch: 16 Global Step: 337960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:29,833-Speed 6319.06 samples/sec Loss 5.4760 LearningRate 0.0004 Epoch: 16 Global Step: 337970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:33,077-Speed 6313.69 samples/sec Loss 5.4623 LearningRate 0.0004 Epoch: 16 Global Step: 337980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:36,329-Speed 6299.45 samples/sec Loss 5.4764 LearningRate 0.0004 Epoch: 16 Global Step: 337990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:39,573-Speed 6314.22 samples/sec Loss 5.4156 LearningRate 0.0004 Epoch: 16 Global Step: 338000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:42,819-Speed 6312.16 samples/sec Loss 5.4358 LearningRate 0.0004 Epoch: 16 Global Step: 338010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:46,060-Speed 6319.56 samples/sec Loss 5.4647 LearningRate 0.0004 Epoch: 16 Global Step: 338020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:28:49,308-Speed 6306.61 samples/sec Loss 5.4909 LearningRate 0.0004 Epoch: 16 Global Step: 338030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:52,609-Speed 6204.92 samples/sec Loss 5.4312 LearningRate 0.0004 Epoch: 16 Global Step: 338040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:28:55,839-Speed 6342.26 samples/sec Loss 5.4878 LearningRate 0.0004 Epoch: 16 Global Step: 338050 Fp16 Grad Scale: 16384 Required: 45 hours 
Training: 2022-04-01 22:28:59,089-Speed 6304.34 samples/sec Loss 5.4769 LearningRate 0.0004 Epoch: 16 Global Step: 338060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:02,331-Speed 6316.89 samples/sec Loss 5.4643 LearningRate 0.0004 Epoch: 16 Global Step: 338070 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:05,588-Speed 6289.96 samples/sec Loss 5.5033 LearningRate 0.0004 Epoch: 16 Global Step: 338080 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:08,831-Speed 6317.67 samples/sec Loss 5.4590 LearningRate 0.0004 Epoch: 16 Global Step: 338090 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:12,079-Speed 6306.95 samples/sec Loss 5.4705 LearningRate 0.0004 Epoch: 16 Global Step: 338100 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:15,324-Speed 6311.51 samples/sec Loss 5.5124 LearningRate 0.0004 Epoch: 16 Global Step: 338110 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:18,573-Speed 6306.15 samples/sec Loss 5.4460 LearningRate 0.0004 Epoch: 16 Global Step: 338120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:21,818-Speed 6313.20 samples/sec Loss 5.4127 LearningRate 0.0004 Epoch: 16 Global Step: 338130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:25,065-Speed 6308.10 samples/sec Loss 5.5005 LearningRate 0.0004 Epoch: 16 Global Step: 338140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:29:28,310-Speed 6312.57 samples/sec Loss 5.4865 LearningRate 0.0004 Epoch: 16 Global Step: 338150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:31,555-Speed 6312.07 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 16 Global Step: 338160 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:34,803-Speed 6307.66 samples/sec Loss 5.5037 LearningRate 0.0004 Epoch: 16 Global Step: 338170 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:29:38,052-Speed 6304.31 samples/sec Loss 5.4810 LearningRate 0.0004 Epoch: 16 Global Step: 338180 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:41,321-Speed 6267.23 samples/sec Loss 5.4488 LearningRate 0.0004 Epoch: 16 Global Step: 338190 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:44,567-Speed 6309.59 samples/sec Loss 5.3787 LearningRate 0.0004 Epoch: 16 Global Step: 338200 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:47,874-Speed 6194.65 samples/sec Loss 5.4869 LearningRate 0.0004 Epoch: 16 Global Step: 338210 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:51,122-Speed 6307.84 samples/sec Loss 5.4273 LearningRate 0.0004 Epoch: 16 Global Step: 338220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:54,368-Speed 6309.95 samples/sec Loss 5.4291 LearningRate 0.0004 Epoch: 16 Global Step: 338230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:29:57,610-Speed 6318.69 samples/sec Loss 5.4616 LearningRate 0.0004 Epoch: 16 Global Step: 338240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:00,843-Speed 6335.94 samples/sec Loss 5.4356 LearningRate 0.0004 Epoch: 16 Global Step: 338250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:04,089-Speed 6309.94 samples/sec Loss 5.3529 LearningRate 0.0004 Epoch: 16 Global Step: 338260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:07,338-Speed 6304.77 samples/sec Loss 5.4455 LearningRate 0.0004 Epoch: 16 Global Step: 338270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:10,587-Speed 6304.74 samples/sec Loss 5.4929 LearningRate 0.0004 Epoch: 16 Global Step: 338280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:13,831-Speed 6315.84 samples/sec Loss 5.3969 LearningRate 0.0004 Epoch: 16 Global Step: 338290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:17,080-Speed 6303.71 
samples/sec Loss 5.4155 LearningRate 0.0004 Epoch: 16 Global Step: 338300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:20,327-Speed 6309.33 samples/sec Loss 5.4169 LearningRate 0.0004 Epoch: 16 Global Step: 338310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:23,573-Speed 6310.80 samples/sec Loss 5.4379 LearningRate 0.0004 Epoch: 16 Global Step: 338320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:26,821-Speed 6306.40 samples/sec Loss 5.4407 LearningRate 0.0004 Epoch: 16 Global Step: 338330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:30,066-Speed 6313.56 samples/sec Loss 5.3998 LearningRate 0.0004 Epoch: 16 Global Step: 338340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:33,296-Speed 6342.39 samples/sec Loss 5.5067 LearningRate 0.0004 Epoch: 16 Global Step: 338350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:36,542-Speed 6311.20 samples/sec Loss 5.5047 LearningRate 0.0004 Epoch: 16 Global Step: 338360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:39,786-Speed 6314.09 samples/sec Loss 5.4002 LearningRate 0.0004 Epoch: 16 Global Step: 338370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:43,027-Speed 6321.22 samples/sec Loss 5.4080 LearningRate 0.0004 Epoch: 16 Global Step: 338380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:46,273-Speed 6310.46 samples/sec Loss 5.4613 LearningRate 0.0004 Epoch: 16 Global Step: 338390 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:49,518-Speed 6312.15 samples/sec Loss 5.4452 LearningRate 0.0004 Epoch: 16 Global Step: 338400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:52,766-Speed 6306.65 samples/sec Loss 5.3541 LearningRate 0.0004 Epoch: 16 Global Step: 338410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:56,012-Speed 6311.16 samples/sec Loss 5.4776 
LearningRate 0.0004 Epoch: 16 Global Step: 338420 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:30:59,258-Speed 6310.49 samples/sec Loss 5.4887 LearningRate 0.0004 Epoch: 16 Global Step: 338430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:02,511-Speed 6298.02 samples/sec Loss 5.3999 LearningRate 0.0004 Epoch: 16 Global Step: 338440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:05,749-Speed 6325.36 samples/sec Loss 5.4815 LearningRate 0.0004 Epoch: 16 Global Step: 338450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:08,996-Speed 6309.36 samples/sec Loss 5.4445 LearningRate 0.0004 Epoch: 16 Global Step: 338460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:12,237-Speed 6321.03 samples/sec Loss 5.4320 LearningRate 0.0004 Epoch: 16 Global Step: 338470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:15,478-Speed 6319.67 samples/sec Loss 5.4229 LearningRate 0.0004 Epoch: 16 Global Step: 338480 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:18,725-Speed 6308.87 samples/sec Loss 5.4460 LearningRate 0.0004 Epoch: 16 Global Step: 338490 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:21,967-Speed 6318.29 samples/sec Loss 5.4471 LearningRate 0.0004 Epoch: 16 Global Step: 338500 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:25,214-Speed 6309.80 samples/sec Loss 5.4382 LearningRate 0.0004 Epoch: 16 Global Step: 338510 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:28,471-Speed 6289.25 samples/sec Loss 5.4955 LearningRate 0.0004 Epoch: 16 Global Step: 338520 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:31,715-Speed 6314.13 samples/sec Loss 5.4254 LearningRate 0.0004 Epoch: 16 Global Step: 338530 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:31:34,952-Speed 6329.54 samples/sec Loss 5.4991 LearningRate 0.0004 Epoch: 16 
Global Step: 338540 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:31:38,196-Speed 6314.59 samples/sec Loss 5.4236 LearningRate 0.0004 Epoch: 16 Global Step: 338550 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:31:41,426-Speed 6342.37 samples/sec Loss 5.5143 LearningRate 0.0004 Epoch: 16 Global Step: 338560 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:31:44,668-Speed 6317.71 samples/sec Loss 5.3820 LearningRate 0.0004 Epoch: 16 Global Step: 338570 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:31:47,915-Speed 6309.52 samples/sec Loss 5.4670 LearningRate 0.0004 Epoch: 16 Global Step: 338580 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:31:51,160-Speed 6312.46 samples/sec Loss 5.5453 LearningRate 0.0004 Epoch: 16 Global Step: 338590 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:31:54,404-Speed 6314.21 samples/sec Loss 5.4362 LearningRate 0.0004 Epoch: 16 Global Step: 338600 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:31:57,653-Speed 6304.17 samples/sec Loss 5.3944 LearningRate 0.0004 Epoch: 16 Global Step: 338610 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:32:00,895-Speed 6318.34 samples/sec Loss 5.4045 LearningRate 0.0004 Epoch: 16 Global Step: 338620 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:32:04,143-Speed 6308.37 samples/sec Loss 5.4072 LearningRate 0.0004 Epoch: 16 Global Step: 338630 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:32:07,387-Speed 6314.48 samples/sec Loss 5.4348 LearningRate 0.0004 Epoch: 16 Global Step: 338640 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:32:10,631-Speed 6313.30 samples/sec Loss 5.4633 LearningRate 0.0004 Epoch: 16 Global Step: 338650 Fp16 Grad Scale: 8192 Required: 45 hours Training: 2022-04-01 22:32:13,878-Speed 6309.51 samples/sec Loss 5.5092 LearningRate 0.0004 Epoch: 16 Global Step: 338660 Fp16 Grad Scale: 
16384 Required: 45 hours Training: 2022-04-01 22:32:17,137-Speed 6285.10 samples/sec Loss 5.4543 LearningRate 0.0004 Epoch: 16 Global Step: 338670 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:20,384-Speed 6308.36 samples/sec Loss 5.4410 LearningRate 0.0004 Epoch: 16 Global Step: 338680 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:23,629-Speed 6313.88 samples/sec Loss 5.5381 LearningRate 0.0004 Epoch: 16 Global Step: 338690 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:26,872-Speed 6315.70 samples/sec Loss 5.4198 LearningRate 0.0004 Epoch: 16 Global Step: 338700 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:30,116-Speed 6314.03 samples/sec Loss 5.4154 LearningRate 0.0004 Epoch: 16 Global Step: 338710 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:33,363-Speed 6310.12 samples/sec Loss 5.4439 LearningRate 0.0004 Epoch: 16 Global Step: 338720 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:36,609-Speed 6310.55 samples/sec Loss 5.4209 LearningRate 0.0004 Epoch: 16 Global Step: 338730 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:39,850-Speed 6319.20 samples/sec Loss 5.5177 LearningRate 0.0004 Epoch: 16 Global Step: 338740 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:43,100-Speed 6304.48 samples/sec Loss 5.3561 LearningRate 0.0004 Epoch: 16 Global Step: 338750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:32:46,344-Speed 6312.95 samples/sec Loss 5.4285 LearningRate 0.0004 Epoch: 16 Global Step: 338760 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:32:49,589-Speed 6314.06 samples/sec Loss 5.4948 LearningRate 0.0004 Epoch: 16 Global Step: 338770 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:32:52,834-Speed 6312.40 samples/sec Loss 5.5207 LearningRate 0.0004 Epoch: 16 Global Step: 338780 Fp16 Grad Scale: 32768 Required: 45 hours 
Training: 2022-04-01 22:32:56,078-Speed 6315.00 samples/sec Loss 5.5263 LearningRate 0.0004 Epoch: 16 Global Step: 338790 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:32:59,323-Speed 6312.35 samples/sec Loss 5.4165 LearningRate 0.0004 Epoch: 16 Global Step: 338800 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:02,570-Speed 6310.18 samples/sec Loss 5.4780 LearningRate 0.0004 Epoch: 16 Global Step: 338810 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:05,815-Speed 6311.39 samples/sec Loss 5.5019 LearningRate 0.0004 Epoch: 16 Global Step: 338820 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:09,068-Speed 6297.21 samples/sec Loss 5.3974 LearningRate 0.0004 Epoch: 16 Global Step: 338830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:12,312-Speed 6315.17 samples/sec Loss 5.4520 LearningRate 0.0004 Epoch: 16 Global Step: 338840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:15,560-Speed 6306.97 samples/sec Loss 5.4664 LearningRate 0.0004 Epoch: 16 Global Step: 338850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:18,791-Speed 6339.47 samples/sec Loss 5.4898 LearningRate 0.0004 Epoch: 16 Global Step: 338860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:22,038-Speed 6310.15 samples/sec Loss 5.4726 LearningRate 0.0004 Epoch: 16 Global Step: 338870 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:25,289-Speed 6299.23 samples/sec Loss 5.4063 LearningRate 0.0004 Epoch: 16 Global Step: 338880 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:28,534-Speed 6313.27 samples/sec Loss 5.4595 LearningRate 0.0004 Epoch: 16 Global Step: 338890 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:31,780-Speed 6310.46 samples/sec Loss 5.4394 LearningRate 0.0004 Epoch: 16 Global Step: 338900 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:33:35,029-Speed 6305.79 samples/sec Loss 5.4311 LearningRate 0.0004 Epoch: 16 Global Step: 338910 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:38,273-Speed 6313.70 samples/sec Loss 5.4136 LearningRate 0.0004 Epoch: 16 Global Step: 338920 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:41,518-Speed 6313.97 samples/sec Loss 5.4551 LearningRate 0.0004 Epoch: 16 Global Step: 338930 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:44,766-Speed 6306.31 samples/sec Loss 5.4689 LearningRate 0.0004 Epoch: 16 Global Step: 338940 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:48,010-Speed 6313.96 samples/sec Loss 5.3967 LearningRate 0.0004 Epoch: 16 Global Step: 338950 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:51,239-Speed 6344.51 samples/sec Loss 5.5019 LearningRate 0.0004 Epoch: 16 Global Step: 338960 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:54,486-Speed 6308.61 samples/sec Loss 5.3537 LearningRate 0.0004 Epoch: 16 Global Step: 338970 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:33:57,732-Speed 6311.09 samples/sec Loss 5.4758 LearningRate 0.0004 Epoch: 16 Global Step: 338980 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:00,980-Speed 6306.62 samples/sec Loss 5.4808 LearningRate 0.0004 Epoch: 16 Global Step: 338990 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:04,231-Speed 6301.21 samples/sec Loss 5.4250 LearningRate 0.0004 Epoch: 16 Global Step: 339000 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:07,483-Speed 6298.54 samples/sec Loss 5.4206 LearningRate 0.0004 Epoch: 16 Global Step: 339010 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:10,735-Speed 6300.38 samples/sec Loss 5.5025 LearningRate 0.0004 Epoch: 16 Global Step: 339020 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:13,978-Speed 6317.33 
samples/sec Loss 5.4542 LearningRate 0.0004 Epoch: 16 Global Step: 339030 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:17,222-Speed 6314.02 samples/sec Loss 5.4678 LearningRate 0.0004 Epoch: 16 Global Step: 339040 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:20,473-Speed 6300.12 samples/sec Loss 5.4618 LearningRate 0.0004 Epoch: 16 Global Step: 339050 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:23,717-Speed 6315.53 samples/sec Loss 5.4669 LearningRate 0.0004 Epoch: 16 Global Step: 339060 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:34:26,951-Speed 6332.85 samples/sec Loss 5.4771 LearningRate 0.0004 Epoch: 16 Global Step: 339070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:30,206-Speed 6294.86 samples/sec Loss 5.4313 LearningRate 0.0004 Epoch: 16 Global Step: 339080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:33,454-Speed 6305.43 samples/sec Loss 5.5198 LearningRate 0.0004 Epoch: 16 Global Step: 339090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:36,700-Speed 6310.48 samples/sec Loss 5.3577 LearningRate 0.0004 Epoch: 16 Global Step: 339100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:39,948-Speed 6308.08 samples/sec Loss 5.4492 LearningRate 0.0004 Epoch: 16 Global Step: 339110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:34:43,179-Speed 6338.85 samples/sec Loss 5.4814 LearningRate 0.0004 Epoch: 16 Global Step: 339120 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:34:46,422-Speed 6318.01 samples/sec Loss 5.5304 LearningRate 0.0004 Epoch: 16 Global Step: 339130 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:34:49,665-Speed 6315.29 samples/sec Loss 5.3718 LearningRate 0.0004 Epoch: 16 Global Step: 339140 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:34:52,912-Speed 6309.77 samples/sec Loss 5.4454 
LearningRate 0.0004 Epoch: 16 Global Step: 339150 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:34:56,157-Speed 6312.90 samples/sec Loss 5.3802 LearningRate 0.0004 Epoch: 16 Global Step: 339160 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:34:59,405-Speed 6305.52 samples/sec Loss 5.3414 LearningRate 0.0004 Epoch: 16 Global Step: 339170 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:35:02,651-Speed 6311.34 samples/sec Loss 5.4894 LearningRate 0.0004 Epoch: 16 Global Step: 339180 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:35:05,894-Speed 6317.37 samples/sec Loss 5.3572 LearningRate 0.0004 Epoch: 16 Global Step: 339190 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:35:09,135-Speed 6321.01 samples/sec Loss 5.3830 LearningRate 0.0004 Epoch: 16 Global Step: 339200 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:35:12,379-Speed 6313.75 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 16 Global Step: 339210 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:35:15,661-Speed 6242.88 samples/sec Loss 5.3963 LearningRate 0.0004 Epoch: 16 Global Step: 339220 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:18,928-Speed 6269.50 samples/sec Loss 5.3694 LearningRate 0.0004 Epoch: 16 Global Step: 339230 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:22,208-Speed 6245.68 samples/sec Loss 5.4244 LearningRate 0.0004 Epoch: 16 Global Step: 339240 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:25,463-Speed 6293.41 samples/sec Loss 5.4864 LearningRate 0.0004 Epoch: 16 Global Step: 339250 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:28,710-Speed 6307.97 samples/sec Loss 5.4543 LearningRate 0.0004 Epoch: 16 Global Step: 339260 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:31,956-Speed 6310.86 samples/sec Loss 5.4415 LearningRate 0.0004 Epoch: 16 
Global Step: 339270 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:35,201-Speed 6311.96 samples/sec Loss 5.5198 LearningRate 0.0004 Epoch: 16 Global Step: 339280 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:38,445-Speed 6315.35 samples/sec Loss 5.4591 LearningRate 0.0004 Epoch: 16 Global Step: 339290 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:41,695-Speed 6302.59 samples/sec Loss 5.4564 LearningRate 0.0004 Epoch: 16 Global Step: 339300 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:44,946-Speed 6301.78 samples/sec Loss 5.4871 LearningRate 0.0004 Epoch: 16 Global Step: 339310 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:48,175-Speed 6343.47 samples/sec Loss 5.4445 LearningRate 0.0004 Epoch: 16 Global Step: 339320 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:51,427-Speed 6299.37 samples/sec Loss 5.4574 LearningRate 0.0004 Epoch: 16 Global Step: 339330 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:54,669-Speed 6318.90 samples/sec Loss 5.4378 LearningRate 0.0004 Epoch: 16 Global Step: 339340 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:35:57,911-Speed 6317.63 samples/sec Loss 5.4649 LearningRate 0.0004 Epoch: 16 Global Step: 339350 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:01,162-Speed 6300.53 samples/sec Loss 5.4173 LearningRate 0.0004 Epoch: 16 Global Step: 339360 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:04,408-Speed 6311.76 samples/sec Loss 5.4127 LearningRate 0.0004 Epoch: 16 Global Step: 339370 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:07,657-Speed 6304.96 samples/sec Loss 5.3556 LearningRate 0.0004 Epoch: 16 Global Step: 339380 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:10,899-Speed 6317.91 samples/sec Loss 5.4662 LearningRate 0.0004 Epoch: 16 Global Step: 339390 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:14,145-Speed 6312.61 samples/sec Loss 5.4678 LearningRate 0.0004 Epoch: 16 Global Step: 339400 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:17,389-Speed 6313.73 samples/sec Loss 5.4121 LearningRate 0.0004 Epoch: 16 Global Step: 339410 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:20,632-Speed 6315.91 samples/sec Loss 5.4137 LearningRate 0.0004 Epoch: 16 Global Step: 339420 Fp16 Grad Scale: 65536 Required: 45 hours Training: 2022-04-01 22:36:23,865-Speed 6340.43 samples/sec Loss 5.3170 LearningRate 0.0004 Epoch: 16 Global Step: 339430 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:27,108-Speed 6317.12 samples/sec Loss 5.4121 LearningRate 0.0004 Epoch: 16 Global Step: 339440 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:30,351-Speed 6315.42 samples/sec Loss 5.4535 LearningRate 0.0004 Epoch: 16 Global Step: 339450 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:33,597-Speed 6311.34 samples/sec Loss 5.4238 LearningRate 0.0004 Epoch: 16 Global Step: 339460 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:36,841-Speed 6315.34 samples/sec Loss 5.4670 LearningRate 0.0004 Epoch: 16 Global Step: 339470 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:36:40,073-Speed 6336.46 samples/sec Loss 5.4232 LearningRate 0.0004 Epoch: 16 Global Step: 339480 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:36:43,321-Speed 6307.88 samples/sec Loss 5.4606 LearningRate 0.0004 Epoch: 16 Global Step: 339490 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:36:46,570-Speed 6305.39 samples/sec Loss 5.5074 LearningRate 0.0004 Epoch: 16 Global Step: 339500 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:36:49,811-Speed 6318.69 samples/sec Loss 5.4337 LearningRate 0.0004 Epoch: 16 Global Step: 339510 Fp16 Grad Scale: 16384 Required: 45 hours 
Training: 2022-04-01 22:36:53,062-Speed 6302.69 samples/sec Loss 5.4376 LearningRate 0.0004 Epoch: 16 Global Step: 339520 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:36:56,306-Speed 6314.76 samples/sec Loss 5.4312 LearningRate 0.0004 Epoch: 16 Global Step: 339530 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:36:59,564-Speed 6287.25 samples/sec Loss 5.5241 LearningRate 0.0004 Epoch: 16 Global Step: 339540 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:37:02,813-Speed 6304.20 samples/sec Loss 5.4321 LearningRate 0.0004 Epoch: 16 Global Step: 339550 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:37:06,056-Speed 6316.13 samples/sec Loss 5.4617 LearningRate 0.0004 Epoch: 16 Global Step: 339560 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:37:09,303-Speed 6309.88 samples/sec Loss 5.4306 LearningRate 0.0004 Epoch: 16 Global Step: 339570 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:37:12,548-Speed 6311.84 samples/sec Loss 5.4160 LearningRate 0.0004 Epoch: 16 Global Step: 339580 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:15,797-Speed 6306.56 samples/sec Loss 5.4459 LearningRate 0.0004 Epoch: 16 Global Step: 339590 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:19,041-Speed 6314.85 samples/sec Loss 5.4443 LearningRate 0.0004 Epoch: 16 Global Step: 339600 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:22,288-Speed 6309.11 samples/sec Loss 5.4363 LearningRate 0.0004 Epoch: 16 Global Step: 339610 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:25,536-Speed 6307.39 samples/sec Loss 5.4041 LearningRate 0.0004 Epoch: 16 Global Step: 339620 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:28,783-Speed 6307.42 samples/sec Loss 5.4927 LearningRate 0.0004 Epoch: 16 Global Step: 339630 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 
22:37:32,026-Speed 6318.53 samples/sec Loss 5.5013 LearningRate 0.0004 Epoch: 16 Global Step: 339640 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:35,272-Speed 6310.68 samples/sec Loss 5.3395 LearningRate 0.0004 Epoch: 16 Global Step: 339650 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:38,519-Speed 6307.54 samples/sec Loss 5.4596 LearningRate 0.0004 Epoch: 16 Global Step: 339660 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:41,799-Speed 6246.64 samples/sec Loss 5.4250 LearningRate 0.0004 Epoch: 16 Global Step: 339670 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:45,028-Speed 6344.34 samples/sec Loss 5.4462 LearningRate 0.0004 Epoch: 16 Global Step: 339680 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:48,270-Speed 6317.30 samples/sec Loss 5.4443 LearningRate 0.0004 Epoch: 16 Global Step: 339690 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:51,517-Speed 6308.98 samples/sec Loss 5.4394 LearningRate 0.0004 Epoch: 16 Global Step: 339700 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:54,760-Speed 6315.60 samples/sec Loss 5.4512 LearningRate 0.0004 Epoch: 16 Global Step: 339710 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:37:58,007-Speed 6310.41 samples/sec Loss 5.4727 LearningRate 0.0004 Epoch: 16 Global Step: 339720 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:38:01,238-Speed 6339.80 samples/sec Loss 5.4198 LearningRate 0.0004 Epoch: 16 Global Step: 339730 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:04,485-Speed 6308.62 samples/sec Loss 5.4106 LearningRate 0.0004 Epoch: 16 Global Step: 339740 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:07,728-Speed 6315.17 samples/sec Loss 5.4490 LearningRate 0.0004 Epoch: 16 Global Step: 339750 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:10,974-Speed 6310.98 
samples/sec Loss 5.4474 LearningRate 0.0004 Epoch: 16 Global Step: 339760 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:14,221-Speed 6309.02 samples/sec Loss 5.4925 LearningRate 0.0004 Epoch: 16 Global Step: 339770 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:17,464-Speed 6317.30 samples/sec Loss 5.3941 LearningRate 0.0004 Epoch: 16 Global Step: 339780 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:20,707-Speed 6314.96 samples/sec Loss 5.3754 LearningRate 0.0004 Epoch: 16 Global Step: 339790 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:23,953-Speed 6311.70 samples/sec Loss 5.4012 LearningRate 0.0004 Epoch: 16 Global Step: 339800 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:27,200-Speed 6310.80 samples/sec Loss 5.4245 LearningRate 0.0004 Epoch: 16 Global Step: 339810 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:30,450-Speed 6301.62 samples/sec Loss 5.4506 LearningRate 0.0004 Epoch: 16 Global Step: 339820 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:33,695-Speed 6314.44 samples/sec Loss 5.4146 LearningRate 0.0004 Epoch: 16 Global Step: 339830 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:38:36,940-Speed 6311.09 samples/sec Loss 5.4489 LearningRate 0.0004 Epoch: 16 Global Step: 339840 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:38:40,186-Speed 6310.80 samples/sec Loss 5.4216 LearningRate 0.0004 Epoch: 16 Global Step: 339850 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:38:43,429-Speed 6317.27 samples/sec Loss 5.3836 LearningRate 0.0004 Epoch: 16 Global Step: 339860 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:38:46,664-Speed 6331.46 samples/sec Loss 5.4360 LearningRate 0.0004 Epoch: 16 Global Step: 339870 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:49,907-Speed 6317.47 samples/sec Loss 5.4289 
LearningRate 0.0004 Epoch: 16 Global Step: 339880 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:53,151-Speed 6313.81 samples/sec Loss 5.4328 LearningRate 0.0004 Epoch: 16 Global Step: 339890 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:56,397-Speed 6311.31 samples/sec Loss 5.4854 LearningRate 0.0004 Epoch: 16 Global Step: 339900 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:38:59,642-Speed 6312.40 samples/sec Loss 5.3931 LearningRate 0.0004 Epoch: 16 Global Step: 339910 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:02,884-Speed 6320.17 samples/sec Loss 5.4706 LearningRate 0.0004 Epoch: 16 Global Step: 339920 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:06,136-Speed 6298.25 samples/sec Loss 5.4507 LearningRate 0.0004 Epoch: 16 Global Step: 339930 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:09,381-Speed 6313.28 samples/sec Loss 5.4267 LearningRate 0.0004 Epoch: 16 Global Step: 339940 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:12,625-Speed 6314.33 samples/sec Loss 5.4193 LearningRate 0.0004 Epoch: 16 Global Step: 339950 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:15,877-Speed 6299.31 samples/sec Loss 5.4815 LearningRate 0.0004 Epoch: 16 Global Step: 339960 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:19,110-Speed 6336.18 samples/sec Loss 5.4145 LearningRate 0.0004 Epoch: 16 Global Step: 339970 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:22,355-Speed 6312.20 samples/sec Loss 5.5420 LearningRate 0.0004 Epoch: 16 Global Step: 339980 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:25,599-Speed 6315.37 samples/sec Loss 5.5100 LearningRate 0.0004 Epoch: 16 Global Step: 339990 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:28,845-Speed 6309.39 samples/sec Loss 5.3999 LearningRate 0.0004 Epoch: 16 
Global Step: 340000 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:32,092-Speed 6309.80 samples/sec Loss 5.4558 LearningRate 0.0004 Epoch: 16 Global Step: 340010 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:35,343-Speed 6300.49 samples/sec Loss 5.4994 LearningRate 0.0004 Epoch: 16 Global Step: 340020 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:38,587-Speed 6314.18 samples/sec Loss 5.4439 LearningRate 0.0004 Epoch: 16 Global Step: 340030 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:41,834-Speed 6310.01 samples/sec Loss 5.4292 LearningRate 0.0004 Epoch: 16 Global Step: 340040 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:45,074-Speed 6323.84 samples/sec Loss 5.4985 LearningRate 0.0004 Epoch: 16 Global Step: 340050 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:48,321-Speed 6308.78 samples/sec Loss 5.4164 LearningRate 0.0004 Epoch: 16 Global Step: 340060 Fp16 Grad Scale: 16384 Required: 45 hours Training: 2022-04-01 22:39:51,564-Speed 6315.08 samples/sec Loss 5.5086 LearningRate 0.0004 Epoch: 16 Global Step: 340070 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:39:54,809-Speed 6313.66 samples/sec Loss 5.4149 LearningRate 0.0004 Epoch: 16 Global Step: 340080 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:39:58,055-Speed 6310.54 samples/sec Loss 5.4663 LearningRate 0.0004 Epoch: 16 Global Step: 340090 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:01,326-Speed 6263.03 samples/sec Loss 5.4438 LearningRate 0.0004 Epoch: 16 Global Step: 340100 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:04,571-Speed 6312.41 samples/sec Loss 5.5300 LearningRate 0.0004 Epoch: 16 Global Step: 340110 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:07,816-Speed 6312.90 samples/sec Loss 5.5363 LearningRate 0.0004 Epoch: 16 Global Step: 340120 Fp16 Grad 
Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:11,062-Speed 6309.68 samples/sec Loss 5.4344 LearningRate 0.0004 Epoch: 16 Global Step: 340130 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:14,307-Speed 6312.32 samples/sec Loss 5.3389 LearningRate 0.0004 Epoch: 16 Global Step: 340140 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:17,555-Speed 6306.75 samples/sec Loss 5.4494 LearningRate 0.0004 Epoch: 16 Global Step: 340150 Fp16 Grad Scale: 32768 Required: 45 hours Training: 2022-04-01 22:40:20,800-Speed 6313.50 samples/sec Loss 5.4111 LearningRate 0.0004 Epoch: 16 Global Step: 340160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:40:24,033-Speed 6336.71 samples/sec Loss 5.4545 LearningRate 0.0004 Epoch: 16 Global Step: 340170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:40:27,265-Speed 6338.40 samples/sec Loss 5.4696 LearningRate 0.0004 Epoch: 16 Global Step: 340180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:30,510-Speed 6311.42 samples/sec Loss 5.4634 LearningRate 0.0004 Epoch: 16 Global Step: 340190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:33,757-Speed 6308.95 samples/sec Loss 5.4028 LearningRate 0.0004 Epoch: 16 Global Step: 340200 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:36,999-Speed 6317.94 samples/sec Loss 5.3642 LearningRate 0.0004 Epoch: 16 Global Step: 340210 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:40,247-Speed 6306.57 samples/sec Loss 5.4245 LearningRate 0.0004 Epoch: 16 Global Step: 340220 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:43,493-Speed 6310.65 samples/sec Loss 5.4713 LearningRate 0.0004 Epoch: 16 Global Step: 340230 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:46,736-Speed 6317.07 samples/sec Loss 5.4258 LearningRate 0.0004 Epoch: 16 Global Step: 340240 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 22:40:49,983-Speed 6308.85 samples/sec Loss 5.5044 LearningRate 0.0004 Epoch: 16 Global Step: 340250 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:53,224-Speed 6321.79 samples/sec Loss 5.4419 LearningRate 0.0004 Epoch: 16 Global Step: 340260 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:56,473-Speed 6305.53 samples/sec Loss 5.5246 LearningRate 0.0004 Epoch: 16 Global Step: 340270 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:40:59,717-Speed 6314.01 samples/sec Loss 5.4523 LearningRate 0.0004 Epoch: 16 Global Step: 340280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:02,963-Speed 6310.03 samples/sec Loss 5.3939 LearningRate 0.0004 Epoch: 16 Global Step: 340290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:06,209-Speed 6312.41 samples/sec Loss 5.3908 LearningRate 0.0004 Epoch: 16 Global Step: 340300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:09,455-Speed 6309.39 samples/sec Loss 5.4551 LearningRate 0.0004 Epoch: 16 Global Step: 340310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:12,702-Speed 6309.25 samples/sec Loss 5.4561 LearningRate 0.0004 Epoch: 16 Global Step: 340320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:15,950-Speed 6305.70 samples/sec Loss 5.4311 LearningRate 0.0004 Epoch: 16 Global Step: 340330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:19,194-Speed 6314.49 samples/sec Loss 5.4262 LearningRate 0.0004 Epoch: 16 Global Step: 340340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:22,443-Speed 6305.80 samples/sec Loss 5.3835 LearningRate 0.0004 Epoch: 16 Global Step: 340350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:25,688-Speed 6312.94 samples/sec Loss 5.4122 LearningRate 0.0004 Epoch: 16 Global Step: 340360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
22:41:28,932-Speed 6313.48 samples/sec Loss 5.4434 LearningRate 0.0004 Epoch: 16 Global Step: 340370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:32,161-Speed 6344.43 samples/sec Loss 5.4601 LearningRate 0.0004 Epoch: 16 Global Step: 340380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:35,421-Speed 6283.55 samples/sec Loss 5.4721 LearningRate 0.0004 Epoch: 16 Global Step: 340390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:38,666-Speed 6312.47 samples/sec Loss 5.4340 LearningRate 0.0004 Epoch: 16 Global Step: 340400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:41,910-Speed 6314.81 samples/sec Loss 5.3160 LearningRate 0.0004 Epoch: 16 Global Step: 340410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:45,156-Speed 6309.83 samples/sec Loss 5.4706 LearningRate 0.0004 Epoch: 16 Global Step: 340420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:48,401-Speed 6314.05 samples/sec Loss 5.4728 LearningRate 0.0004 Epoch: 16 Global Step: 340430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:51,650-Speed 6303.87 samples/sec Loss 5.4329 LearningRate 0.0004 Epoch: 16 Global Step: 340440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:54,895-Speed 6313.29 samples/sec Loss 5.4608 LearningRate 0.0004 Epoch: 16 Global Step: 340450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:41:58,150-Speed 6293.06 samples/sec Loss 5.4170 LearningRate 0.0004 Epoch: 16 Global Step: 340460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:01,395-Speed 6313.88 samples/sec Loss 5.4354 LearningRate 0.0004 Epoch: 16 Global Step: 340470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:04,630-Speed 6332.76 samples/sec Loss 5.4706 LearningRate 0.0004 Epoch: 16 Global Step: 340480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:07,881-Speed 6300.78 
samples/sec Loss 5.4313 LearningRate 0.0004 Epoch: 16 Global Step: 340490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:11,129-Speed 6306.17 samples/sec Loss 5.4190 LearningRate 0.0004 Epoch: 16 Global Step: 340500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:14,374-Speed 6316.39 samples/sec Loss 5.4082 LearningRate 0.0004 Epoch: 16 Global Step: 340510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:17,624-Speed 6303.20 samples/sec Loss 5.4469 LearningRate 0.0004 Epoch: 16 Global Step: 340520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:20,869-Speed 6311.61 samples/sec Loss 5.4522 LearningRate 0.0004 Epoch: 16 Global Step: 340530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:24,122-Speed 6298.29 samples/sec Loss 5.4059 LearningRate 0.0004 Epoch: 16 Global Step: 340540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:27,366-Speed 6314.24 samples/sec Loss 5.4132 LearningRate 0.0004 Epoch: 16 Global Step: 340550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:30,613-Speed 6308.24 samples/sec Loss 5.4427 LearningRate 0.0004 Epoch: 16 Global Step: 340560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:33,859-Speed 6311.70 samples/sec Loss 5.4063 LearningRate 0.0004 Epoch: 16 Global Step: 340570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:37,093-Speed 6333.48 samples/sec Loss 5.4191 LearningRate 0.0004 Epoch: 16 Global Step: 340580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:40,398-Speed 6199.13 samples/sec Loss 5.3889 LearningRate 0.0004 Epoch: 16 Global Step: 340590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:43,646-Speed 6305.20 samples/sec Loss 5.3513 LearningRate 0.0004 Epoch: 16 Global Step: 340600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:46,895-Speed 6305.05 samples/sec Loss 5.4386 
LearningRate 0.0004 Epoch: 16 Global Step: 340610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:50,142-Speed 6308.55 samples/sec Loss 5.4016 LearningRate 0.0004 Epoch: 16 Global Step: 340620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:53,386-Speed 6314.53 samples/sec Loss 5.4435 LearningRate 0.0004 Epoch: 16 Global Step: 340630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:56,628-Speed 6318.88 samples/sec Loss 5.5214 LearningRate 0.0004 Epoch: 16 Global Step: 340640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:42:59,873-Speed 6313.30 samples/sec Loss 5.3830 LearningRate 0.0004 Epoch: 16 Global Step: 340650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:03,122-Speed 6305.11 samples/sec Loss 5.4913 LearningRate 0.0004 Epoch: 16 Global Step: 340660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:06,370-Speed 6307.13 samples/sec Loss 5.4396 LearningRate 0.0004 Epoch: 16 Global Step: 340670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:09,607-Speed 6328.89 samples/sec Loss 5.3654 LearningRate 0.0004 Epoch: 16 Global Step: 340680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:12,851-Speed 6313.83 samples/sec Loss 5.4913 LearningRate 0.0004 Epoch: 16 Global Step: 340690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:16,097-Speed 6312.16 samples/sec Loss 5.4191 LearningRate 0.0004 Epoch: 16 Global Step: 340700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:19,347-Speed 6302.68 samples/sec Loss 5.3836 LearningRate 0.0004 Epoch: 16 Global Step: 340710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:22,591-Speed 6313.46 samples/sec Loss 5.4350 LearningRate 0.0004 Epoch: 16 Global Step: 340720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:25,836-Speed 6314.05 samples/sec Loss 5.4652 LearningRate 0.0004 Epoch: 16 
Global Step: 340730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:29,080-Speed 6312.92 samples/sec Loss 5.4543 LearningRate 0.0004 Epoch: 16 Global Step: 340740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:32,326-Speed 6311.77 samples/sec Loss 5.4401 LearningRate 0.0004 Epoch: 16 Global Step: 340750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:35,572-Speed 6309.15 samples/sec Loss 5.5078 LearningRate 0.0004 Epoch: 16 Global Step: 340760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:38,817-Speed 6313.13 samples/sec Loss 5.4149 LearningRate 0.0004 Epoch: 16 Global Step: 340770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:42,065-Speed 6307.57 samples/sec Loss 5.5136 LearningRate 0.0004 Epoch: 16 Global Step: 340780 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 22:43:45,297-Speed 6338.46 samples/sec Loss 5.3620 LearningRate 0.0004 Epoch: 16 Global Step: 340790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:48,545-Speed 6305.62 samples/sec Loss 5.4552 LearningRate 0.0004 Epoch: 16 Global Step: 340800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:51,791-Speed 6311.98 samples/sec Loss 5.3887 LearningRate 0.0004 Epoch: 16 Global Step: 340810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:55,033-Speed 6318.44 samples/sec Loss 5.4456 LearningRate 0.0004 Epoch: 16 Global Step: 340820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:43:58,281-Speed 6306.31 samples/sec Loss 5.3425 LearningRate 0.0004 Epoch: 16 Global Step: 340830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:01,521-Speed 6321.10 samples/sec Loss 5.4505 LearningRate 0.0004 Epoch: 16 Global Step: 340840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:04,764-Speed 6316.90 samples/sec Loss 5.3571 LearningRate 0.0004 Epoch: 16 Global Step: 340850 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:08,024-Speed 6285.20 samples/sec Loss 5.3677 LearningRate 0.0004 Epoch: 16 Global Step: 340860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:11,267-Speed 6314.94 samples/sec Loss 5.4148 LearningRate 0.0004 Epoch: 16 Global Step: 340870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:14,515-Speed 6308.59 samples/sec Loss 5.3901 LearningRate 0.0004 Epoch: 16 Global Step: 340880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:17,766-Speed 6300.36 samples/sec Loss 5.5264 LearningRate 0.0004 Epoch: 16 Global Step: 340890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:21,013-Speed 6308.85 samples/sec Loss 5.4074 LearningRate 0.0004 Epoch: 16 Global Step: 340900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:24,256-Speed 6316.50 samples/sec Loss 5.5145 LearningRate 0.0004 Epoch: 16 Global Step: 340910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:27,502-Speed 6311.25 samples/sec Loss 5.4488 LearningRate 0.0004 Epoch: 16 Global Step: 340920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:30,743-Speed 6320.42 samples/sec Loss 5.4617 LearningRate 0.0004 Epoch: 16 Global Step: 340930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:33,990-Speed 6309.25 samples/sec Loss 5.4175 LearningRate 0.0004 Epoch: 16 Global Step: 340940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:37,237-Speed 6308.18 samples/sec Loss 5.4460 LearningRate 0.0004 Epoch: 16 Global Step: 340950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:40,487-Speed 6303.42 samples/sec Loss 5.3833 LearningRate 0.0004 Epoch: 16 Global Step: 340960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:43,733-Speed 6309.92 samples/sec Loss 5.4825 LearningRate 0.0004 Epoch: 16 Global Step: 340970 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 22:44:46,978-Speed 6313.85 samples/sec Loss 5.4319 LearningRate 0.0004 Epoch: 16 Global Step: 340980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:50,208-Speed 6341.83 samples/sec Loss 5.4794 LearningRate 0.0004 Epoch: 16 Global Step: 340990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:53,457-Speed 6304.85 samples/sec Loss 5.3833 LearningRate 0.0004 Epoch: 16 Global Step: 341000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:56,701-Speed 6314.78 samples/sec Loss 5.4221 LearningRate 0.0004 Epoch: 16 Global Step: 341010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:44:59,942-Speed 6319.36 samples/sec Loss 5.4076 LearningRate 0.0004 Epoch: 16 Global Step: 341020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:03,190-Speed 6308.02 samples/sec Loss 5.4383 LearningRate 0.0004 Epoch: 16 Global Step: 341030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:06,437-Speed 6308.83 samples/sec Loss 5.3332 LearningRate 0.0004 Epoch: 16 Global Step: 341040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:09,684-Speed 6307.76 samples/sec Loss 5.4420 LearningRate 0.0004 Epoch: 16 Global Step: 341050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:12,924-Speed 6322.45 samples/sec Loss 5.4544 LearningRate 0.0004 Epoch: 16 Global Step: 341060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:16,170-Speed 6310.62 samples/sec Loss 5.3637 LearningRate 0.0004 Epoch: 16 Global Step: 341070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:19,417-Speed 6309.31 samples/sec Loss 5.4113 LearningRate 0.0004 Epoch: 16 Global Step: 341080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:22,655-Speed 6326.46 samples/sec Loss 5.4806 LearningRate 0.0004 Epoch: 16 Global Step: 341090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
22:45:25,901-Speed 6312.00 samples/sec Loss 5.4343 LearningRate 0.0004 Epoch: 16 Global Step: 341100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:29,153-Speed 6299.37 samples/sec Loss 5.4213 LearningRate 0.0004 Epoch: 16 Global Step: 341110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:32,404-Speed 6300.08 samples/sec Loss 5.5222 LearningRate 0.0004 Epoch: 16 Global Step: 341120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:35,651-Speed 6308.82 samples/sec Loss 5.3302 LearningRate 0.0004 Epoch: 16 Global Step: 341130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:38,899-Speed 6306.94 samples/sec Loss 5.4098 LearningRate 0.0004 Epoch: 16 Global Step: 341140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:42,143-Speed 6314.42 samples/sec Loss 5.4269 LearningRate 0.0004 Epoch: 16 Global Step: 341150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:45,393-Speed 6303.00 samples/sec Loss 5.4278 LearningRate 0.0004 Epoch: 16 Global Step: 341160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:48,640-Speed 6308.90 samples/sec Loss 5.4106 LearningRate 0.0004 Epoch: 16 Global Step: 341170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:51,884-Speed 6315.33 samples/sec Loss 5.4124 LearningRate 0.0004 Epoch: 16 Global Step: 341180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:55,116-Speed 6336.53 samples/sec Loss 5.4492 LearningRate 0.0004 Epoch: 16 Global Step: 341190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:45:58,358-Speed 6318.72 samples/sec Loss 5.4255 LearningRate 0.0004 Epoch: 16 Global Step: 341200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:01,610-Speed 6299.59 samples/sec Loss 5.3612 LearningRate 0.0004 Epoch: 16 Global Step: 341210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:04,854-Speed 6313.48 
samples/sec Loss 5.4577 LearningRate 0.0004 Epoch: 16 Global Step: 341220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:08,104-Speed 6304.63 samples/sec Loss 5.3631 LearningRate 0.0004 Epoch: 16 Global Step: 341230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:11,350-Speed 6310.50 samples/sec Loss 5.4836 LearningRate 0.0004 Epoch: 16 Global Step: 341240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:14,591-Speed 6322.23 samples/sec Loss 5.4835 LearningRate 0.0004 Epoch: 16 Global Step: 341250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:17,846-Speed 6293.38 samples/sec Loss 5.3733 LearningRate 0.0004 Epoch: 16 Global Step: 341260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:21,090-Speed 6315.05 samples/sec Loss 5.3912 LearningRate 0.0004 Epoch: 16 Global Step: 341270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:24,336-Speed 6309.62 samples/sec Loss 5.4734 LearningRate 0.0004 Epoch: 16 Global Step: 341280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:27,570-Speed 6334.36 samples/sec Loss 5.4263 LearningRate 0.0004 Epoch: 16 Global Step: 341290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:30,816-Speed 6310.96 samples/sec Loss 5.4314 LearningRate 0.0004 Epoch: 16 Global Step: 341300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:34,060-Speed 6316.09 samples/sec Loss 5.3886 LearningRate 0.0004 Epoch: 16 Global Step: 341310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:37,302-Speed 6319.60 samples/sec Loss 5.4355 LearningRate 0.0004 Epoch: 16 Global Step: 341320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:40,549-Speed 6307.86 samples/sec Loss 5.3936 LearningRate 0.0004 Epoch: 16 Global Step: 341330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:43,791-Speed 6317.93 samples/sec Loss 5.4322 
LearningRate 0.0004 Epoch: 16 Global Step: 341340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:47,047-Speed 6292.69 samples/sec Loss 5.4180 LearningRate 0.0004 Epoch: 16 Global Step: 341350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:50,292-Speed 6311.31 samples/sec Loss 5.4532 LearningRate 0.0004 Epoch: 16 Global Step: 341360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:53,537-Speed 6312.34 samples/sec Loss 5.4672 LearningRate 0.0004 Epoch: 16 Global Step: 341370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:46:56,784-Speed 6310.48 samples/sec Loss 5.5862 LearningRate 0.0004 Epoch: 16 Global Step: 341380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:00,013-Speed 6342.12 samples/sec Loss 5.4337 LearningRate 0.0004 Epoch: 16 Global Step: 341390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:03,255-Speed 6319.82 samples/sec Loss 5.4136 LearningRate 0.0004 Epoch: 16 Global Step: 341400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:06,504-Speed 6306.17 samples/sec Loss 5.4468 LearningRate 0.0004 Epoch: 16 Global Step: 341410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:09,747-Speed 6317.94 samples/sec Loss 5.4759 LearningRate 0.0004 Epoch: 16 Global Step: 341420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:12,994-Speed 6308.26 samples/sec Loss 5.4182 LearningRate 0.0004 Epoch: 16 Global Step: 341430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:16,240-Speed 6310.56 samples/sec Loss 5.3796 LearningRate 0.0004 Epoch: 16 Global Step: 341440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:19,488-Speed 6307.86 samples/sec Loss 5.4328 LearningRate 0.0004 Epoch: 16 Global Step: 341450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:22,735-Speed 6307.08 samples/sec Loss 5.3854 LearningRate 0.0004 Epoch: 16 
Global Step: 341460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:25,979-Speed 6314.87 samples/sec Loss 5.3769 LearningRate 0.0004 Epoch: 16 Global Step: 341470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:29,223-Speed 6314.94 samples/sec Loss 5.4302 LearningRate 0.0004 Epoch: 16 Global Step: 341480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:32,464-Speed 6320.66 samples/sec Loss 5.3618 LearningRate 0.0004 Epoch: 16 Global Step: 341490 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 22:47:35,699-Speed 6331.92 samples/sec Loss 5.4439 LearningRate 0.0004 Epoch: 16 Global Step: 341500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:38,943-Speed 6315.08 samples/sec Loss 5.3863 LearningRate 0.0004 Epoch: 16 Global Step: 341510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:47:42,175-Speed 6338.60 samples/sec Loss 5.4747 LearningRate 0.0004 Epoch: 16 Global Step: 341520 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:47:45,419-Speed 6315.07 samples/sec Loss 5.3970 LearningRate 0.0004 Epoch: 16 Global Step: 341530 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:47:48,664-Speed 6312.73 samples/sec Loss 5.3799 LearningRate 0.0004 Epoch: 16 Global Step: 341540 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:47:51,910-Speed 6310.52 samples/sec Loss 5.4331 LearningRate 0.0004 Epoch: 16 Global Step: 341550 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:47:55,156-Speed 6311.63 samples/sec Loss 5.4124 LearningRate 0.0004 Epoch: 16 Global Step: 341560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:47:58,399-Speed 6317.20 samples/sec Loss 5.5828 LearningRate 0.0004 Epoch: 16 Global Step: 341570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:48:01,643-Speed 6313.70 samples/sec Loss 5.4299 LearningRate 0.0004 Epoch: 16 Global Step: 341580 Fp16 Grad 
Scale: 16384 Required: 44 hours Training: 2022-04-01 22:48:04,889-Speed 6309.87 samples/sec Loss 5.5038 LearningRate 0.0004 Epoch: 16 Global Step: 341590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:48:08,135-Speed 6311.42 samples/sec Loss 5.4927 LearningRate 0.0004 Epoch: 16 Global Step: 341600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:48:11,377-Speed 6319.82 samples/sec Loss 5.4636 LearningRate 0.0004 Epoch: 16 Global Step: 341610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:48:14,622-Speed 6313.22 samples/sec Loss 5.4168 LearningRate 0.0004 Epoch: 16 Global Step: 341620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:17,873-Speed 6300.14 samples/sec Loss 5.4383 LearningRate 0.0004 Epoch: 16 Global Step: 341630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:21,115-Speed 6317.43 samples/sec Loss 5.3490 LearningRate 0.0004 Epoch: 16 Global Step: 341640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:24,361-Speed 6310.21 samples/sec Loss 5.2875 LearningRate 0.0004 Epoch: 16 Global Step: 341650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:27,608-Speed 6309.57 samples/sec Loss 5.4486 LearningRate 0.0004 Epoch: 16 Global Step: 341660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:30,852-Speed 6315.60 samples/sec Loss 5.4112 LearningRate 0.0004 Epoch: 16 Global Step: 341670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:34,095-Speed 6315.43 samples/sec Loss 5.3831 LearningRate 0.0004 Epoch: 16 Global Step: 341680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:37,343-Speed 6308.42 samples/sec Loss 5.3291 LearningRate 0.0004 Epoch: 16 Global Step: 341690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:40,591-Speed 6305.45 samples/sec Loss 5.4388 LearningRate 0.0004 Epoch: 16 Global Step: 341700 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 22:48:43,835-Speed 6314.79 samples/sec Loss 5.3989 LearningRate 0.0004 Epoch: 16 Global Step: 341710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:47,067-Speed 6339.12 samples/sec Loss 5.4145 LearningRate 0.0004 Epoch: 16 Global Step: 341720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:50,319-Speed 6299.57 samples/sec Loss 5.4284 LearningRate 0.0004 Epoch: 16 Global Step: 341730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:53,564-Speed 6312.45 samples/sec Loss 5.4972 LearningRate 0.0004 Epoch: 16 Global Step: 341740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:48:56,806-Speed 6317.81 samples/sec Loss 5.3764 LearningRate 0.0004 Epoch: 16 Global Step: 341750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:00,053-Speed 6309.35 samples/sec Loss 5.4562 LearningRate 0.0004 Epoch: 16 Global Step: 341760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:03,296-Speed 6316.20 samples/sec Loss 5.4149 LearningRate 0.0004 Epoch: 16 Global Step: 341770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:06,541-Speed 6313.73 samples/sec Loss 5.4482 LearningRate 0.0004 Epoch: 16 Global Step: 341780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:09,782-Speed 6319.45 samples/sec Loss 5.4459 LearningRate 0.0004 Epoch: 16 Global Step: 341790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:13,035-Speed 6298.13 samples/sec Loss 5.4047 LearningRate 0.0004 Epoch: 16 Global Step: 341800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:16,280-Speed 6311.20 samples/sec Loss 5.4062 LearningRate 0.0004 Epoch: 16 Global Step: 341810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:19,530-Speed 6303.91 samples/sec Loss 5.3898 LearningRate 0.0004 Epoch: 16 Global Step: 341820 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 
22:49:22,759-Speed 6342.70 samples/sec Loss 5.4113 LearningRate 0.0004 Epoch: 16 Global Step: 341830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:26,003-Speed 6316.48 samples/sec Loss 5.4157 LearningRate 0.0004 Epoch: 16 Global Step: 341840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:29,249-Speed 6309.51 samples/sec Loss 5.4981 LearningRate 0.0004 Epoch: 16 Global Step: 341850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:32,494-Speed 6313.51 samples/sec Loss 5.4450 LearningRate 0.0004 Epoch: 16 Global Step: 341860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:35,738-Speed 6314.16 samples/sec Loss 5.4729 LearningRate 0.0004 Epoch: 16 Global Step: 341870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:38,985-Speed 6308.80 samples/sec Loss 5.4233 LearningRate 0.0004 Epoch: 16 Global Step: 341880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:42,228-Speed 6315.56 samples/sec Loss 5.4156 LearningRate 0.0004 Epoch: 16 Global Step: 341890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:45,472-Speed 6314.95 samples/sec Loss 5.3926 LearningRate 0.0004 Epoch: 16 Global Step: 341900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:48,722-Speed 6307.07 samples/sec Loss 5.4047 LearningRate 0.0004 Epoch: 16 Global Step: 341910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:51,968-Speed 6309.00 samples/sec Loss 5.3627 LearningRate 0.0004 Epoch: 16 Global Step: 341920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:55,198-Speed 6342.46 samples/sec Loss 5.3850 LearningRate 0.0004 Epoch: 16 Global Step: 341930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:49:58,446-Speed 6308.19 samples/sec Loss 5.5229 LearningRate 0.0004 Epoch: 16 Global Step: 341940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:01,690-Speed 6315.23 
samples/sec Loss 5.4074 LearningRate 0.0004 Epoch: 16 Global Step: 341950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:04,937-Speed 6307.16 samples/sec Loss 5.4114 LearningRate 0.0004 Epoch: 16 Global Step: 341960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:08,183-Speed 6310.38 samples/sec Loss 5.4613 LearningRate 0.0004 Epoch: 16 Global Step: 341970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:11,428-Speed 6312.84 samples/sec Loss 5.3939 LearningRate 0.0004 Epoch: 16 Global Step: 341980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:14,673-Speed 6314.19 samples/sec Loss 5.4363 LearningRate 0.0004 Epoch: 16 Global Step: 341990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:17,917-Speed 6313.49 samples/sec Loss 5.3770 LearningRate 0.0004 Epoch: 16 Global Step: 342000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:21,162-Speed 6312.68 samples/sec Loss 5.4271 LearningRate 0.0004 Epoch: 16 Global Step: 342010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:24,406-Speed 6314.39 samples/sec Loss 5.5101 LearningRate 0.0004 Epoch: 16 Global Step: 342020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:27,642-Speed 6330.14 samples/sec Loss 5.3696 LearningRate 0.0004 Epoch: 16 Global Step: 342030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:30,891-Speed 6305.39 samples/sec Loss 5.4268 LearningRate 0.0004 Epoch: 16 Global Step: 342040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:34,135-Speed 6314.83 samples/sec Loss 5.3972 LearningRate 0.0004 Epoch: 16 Global Step: 342050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:37,379-Speed 6314.48 samples/sec Loss 5.4071 LearningRate 0.0004 Epoch: 16 Global Step: 342060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:40,621-Speed 6319.20 samples/sec Loss 5.3651 
LearningRate 0.0004 Epoch: 16 Global Step: 342070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:43,864-Speed 6316.65 samples/sec Loss 5.4217 LearningRate 0.0004 Epoch: 16 Global Step: 342080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:47,112-Speed 6306.65 samples/sec Loss 5.4390 LearningRate 0.0004 Epoch: 16 Global Step: 342090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:50,357-Speed 6311.80 samples/sec Loss 5.4702 LearningRate 0.0004 Epoch: 16 Global Step: 342100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:53,599-Speed 6318.99 samples/sec Loss 5.3727 LearningRate 0.0004 Epoch: 16 Global Step: 342110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:50:56,840-Speed 6319.86 samples/sec Loss 5.3176 LearningRate 0.0004 Epoch: 16 Global Step: 342120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:00,102-Speed 6279.68 samples/sec Loss 5.4620 LearningRate 0.0004 Epoch: 16 Global Step: 342130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:03,390-Speed 6230.29 samples/sec Loss 5.4336 LearningRate 0.0004 Epoch: 16 Global Step: 342140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:06,636-Speed 6311.30 samples/sec Loss 5.4258 LearningRate 0.0004 Epoch: 16 Global Step: 342150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:09,882-Speed 6309.20 samples/sec Loss 5.4362 LearningRate 0.0004 Epoch: 16 Global Step: 342160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:13,126-Speed 6315.97 samples/sec Loss 5.4249 LearningRate 0.0004 Epoch: 16 Global Step: 342170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:16,370-Speed 6315.40 samples/sec Loss 5.4055 LearningRate 0.0004 Epoch: 16 Global Step: 342180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:19,615-Speed 6313.20 samples/sec Loss 5.4091 LearningRate 0.0004 Epoch: 16 
Global Step: 342190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:22,862-Speed 6308.55 samples/sec Loss 5.4573 LearningRate 0.0004 Epoch: 16 Global Step: 342200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:26,106-Speed 6313.86 samples/sec Loss 5.4462 LearningRate 0.0004 Epoch: 16 Global Step: 342210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:29,350-Speed 6314.35 samples/sec Loss 5.3847 LearningRate 0.0004 Epoch: 16 Global Step: 342220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:32,579-Speed 6345.10 samples/sec Loss 5.4516 LearningRate 0.0004 Epoch: 16 Global Step: 342230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:35,822-Speed 6317.39 samples/sec Loss 5.4146 LearningRate 0.0004 Epoch: 16 Global Step: 342240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:39,069-Speed 6307.17 samples/sec Loss 5.4239 LearningRate 0.0004 Epoch: 16 Global Step: 342250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:42,316-Speed 6309.05 samples/sec Loss 5.4625 LearningRate 0.0004 Epoch: 16 Global Step: 342260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:45,565-Speed 6306.29 samples/sec Loss 5.4302 LearningRate 0.0004 Epoch: 16 Global Step: 342270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:48,807-Speed 6317.90 samples/sec Loss 5.4268 LearningRate 0.0004 Epoch: 16 Global Step: 342280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:52,054-Speed 6309.38 samples/sec Loss 5.4365 LearningRate 0.0004 Epoch: 16 Global Step: 342290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:55,299-Speed 6312.47 samples/sec Loss 5.4011 LearningRate 0.0004 Epoch: 16 Global Step: 342300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:51:58,546-Speed 6308.56 samples/sec Loss 5.4106 LearningRate 0.0004 Epoch: 16 Global Step: 342310 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:01,799-Speed 6297.39 samples/sec Loss 5.5197 LearningRate 0.0004 Epoch: 16 Global Step: 342320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:05,030-Speed 6339.62 samples/sec Loss 5.4509 LearningRate 0.0004 Epoch: 16 Global Step: 342330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:08,274-Speed 6313.49 samples/sec Loss 5.4352 LearningRate 0.0004 Epoch: 16 Global Step: 342340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:11,516-Speed 6319.07 samples/sec Loss 5.4308 LearningRate 0.0004 Epoch: 16 Global Step: 342350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:14,762-Speed 6311.56 samples/sec Loss 5.4010 LearningRate 0.0004 Epoch: 16 Global Step: 342360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:18,008-Speed 6310.20 samples/sec Loss 5.3478 LearningRate 0.0004 Epoch: 16 Global Step: 342370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:21,250-Speed 6318.69 samples/sec Loss 5.3946 LearningRate 0.0004 Epoch: 16 Global Step: 342380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:24,497-Speed 6309.23 samples/sec Loss 5.3748 LearningRate 0.0004 Epoch: 16 Global Step: 342390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:27,743-Speed 6311.64 samples/sec Loss 5.3833 LearningRate 0.0004 Epoch: 16 Global Step: 342400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:30,985-Speed 6317.16 samples/sec Loss 5.3709 LearningRate 0.0004 Epoch: 16 Global Step: 342410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:34,236-Speed 6301.00 samples/sec Loss 5.3682 LearningRate 0.0004 Epoch: 16 Global Step: 342420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:37,465-Speed 6344.64 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 16 Global Step: 342430 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 22:52:40,718-Speed 6296.98 samples/sec Loss 5.4604 LearningRate 0.0004 Epoch: 16 Global Step: 342440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:43,962-Speed 6313.84 samples/sec Loss 5.4403 LearningRate 0.0004 Epoch: 16 Global Step: 342450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:47,210-Speed 6306.51 samples/sec Loss 5.3817 LearningRate 0.0004 Epoch: 16 Global Step: 342460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:50,458-Speed 6307.44 samples/sec Loss 5.4187 LearningRate 0.0004 Epoch: 16 Global Step: 342470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:53,704-Speed 6311.95 samples/sec Loss 5.3850 LearningRate 0.0004 Epoch: 16 Global Step: 342480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:52:56,951-Speed 6308.58 samples/sec Loss 5.4125 LearningRate 0.0004 Epoch: 16 Global Step: 342490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:00,192-Speed 6320.35 samples/sec Loss 5.3838 LearningRate 0.0004 Epoch: 16 Global Step: 342500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:03,440-Speed 6306.52 samples/sec Loss 5.4084 LearningRate 0.0004 Epoch: 16 Global Step: 342510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:06,687-Speed 6308.48 samples/sec Loss 5.3758 LearningRate 0.0004 Epoch: 16 Global Step: 342520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:09,920-Speed 6334.89 samples/sec Loss 5.5130 LearningRate 0.0004 Epoch: 16 Global Step: 342530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:13,165-Speed 6314.33 samples/sec Loss 5.4208 LearningRate 0.0004 Epoch: 16 Global Step: 342540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:16,412-Speed 6307.37 samples/sec Loss 5.4023 LearningRate 0.0004 Epoch: 16 Global Step: 342550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
22:53:19,656-Speed 6316.23 samples/sec Loss 5.4288 LearningRate 0.0004 Epoch: 16 Global Step: 342560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:22,901-Speed 6311.71 samples/sec Loss 5.3542 LearningRate 0.0004 Epoch: 16 Global Step: 342570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:26,156-Speed 6294.52 samples/sec Loss 5.4318 LearningRate 0.0004 Epoch: 16 Global Step: 342580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:53:29,381-Speed 6350.64 samples/sec Loss 5.4344 LearningRate 0.0004 Epoch: 16 Global Step: 342590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:32,627-Speed 6311.94 samples/sec Loss 5.4369 LearningRate 0.0004 Epoch: 16 Global Step: 342600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:35,885-Speed 6286.47 samples/sec Loss 5.4223 LearningRate 0.0004 Epoch: 16 Global Step: 342610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:39,126-Speed 6321.63 samples/sec Loss 5.3976 LearningRate 0.0004 Epoch: 16 Global Step: 342620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:42,371-Speed 6313.05 samples/sec Loss 5.4333 LearningRate 0.0004 Epoch: 16 Global Step: 342630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:45,615-Speed 6313.78 samples/sec Loss 5.4275 LearningRate 0.0004 Epoch: 16 Global Step: 342640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:48,862-Speed 6310.12 samples/sec Loss 5.4333 LearningRate 0.0004 Epoch: 16 Global Step: 342650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:52,115-Speed 6295.78 samples/sec Loss 5.4748 LearningRate 0.0004 Epoch: 16 Global Step: 342660 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:55,358-Speed 6316.60 samples/sec Loss 5.4140 LearningRate 0.0004 Epoch: 16 Global Step: 342670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:53:58,602-Speed 6313.91 
samples/sec Loss 5.3996 LearningRate 0.0004 Epoch: 16 Global Step: 342680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:54:01,847-Speed 6313.94 samples/sec Loss 5.3956 LearningRate 0.0004 Epoch: 16 Global Step: 342690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:05,090-Speed 6315.65 samples/sec Loss 5.4009 LearningRate 0.0004 Epoch: 16 Global Step: 342700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:08,344-Speed 6294.90 samples/sec Loss 5.3866 LearningRate 0.0004 Epoch: 16 Global Step: 342710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:11,593-Speed 6305.14 samples/sec Loss 5.4440 LearningRate 0.0004 Epoch: 16 Global Step: 342720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:14,914-Speed 6167.80 samples/sec Loss 5.3817 LearningRate 0.0004 Epoch: 16 Global Step: 342730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:18,222-Speed 6193.21 samples/sec Loss 5.4400 LearningRate 0.0004 Epoch: 16 Global Step: 342740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:21,467-Speed 6313.55 samples/sec Loss 5.4062 LearningRate 0.0004 Epoch: 16 Global Step: 342750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:24,711-Speed 6312.86 samples/sec Loss 5.4663 LearningRate 0.0004 Epoch: 16 Global Step: 342760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:27,955-Speed 6315.17 samples/sec Loss 5.4310 LearningRate 0.0004 Epoch: 16 Global Step: 342770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:31,200-Speed 6311.81 samples/sec Loss 5.4248 LearningRate 0.0004 Epoch: 16 Global Step: 342780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:34,432-Speed 6339.55 samples/sec Loss 5.4360 LearningRate 0.0004 Epoch: 16 Global Step: 342790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:37,677-Speed 6313.15 samples/sec Loss 5.4002 
LearningRate 0.0004 Epoch: 16 Global Step: 342800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:40,922-Speed 6311.55 samples/sec Loss 5.5199 LearningRate 0.0004 Epoch: 16 Global Step: 342810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:44,169-Speed 6311.01 samples/sec Loss 5.4160 LearningRate 0.0004 Epoch: 16 Global Step: 342820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:47,412-Speed 6316.52 samples/sec Loss 5.4455 LearningRate 0.0004 Epoch: 16 Global Step: 342830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:50,651-Speed 6323.90 samples/sec Loss 5.3614 LearningRate 0.0004 Epoch: 16 Global Step: 342840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:53,899-Speed 6307.21 samples/sec Loss 5.3639 LearningRate 0.0004 Epoch: 16 Global Step: 342850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:54:57,145-Speed 6310.01 samples/sec Loss 5.3706 LearningRate 0.0004 Epoch: 16 Global Step: 342860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:00,388-Speed 6317.03 samples/sec Loss 5.4999 LearningRate 0.0004 Epoch: 16 Global Step: 342870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:03,631-Speed 6315.91 samples/sec Loss 5.4221 LearningRate 0.0004 Epoch: 16 Global Step: 342880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:06,859-Speed 6345.82 samples/sec Loss 5.3948 LearningRate 0.0004 Epoch: 16 Global Step: 342890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:10,110-Speed 6301.28 samples/sec Loss 5.4472 LearningRate 0.0004 Epoch: 16 Global Step: 342900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:13,355-Speed 6312.33 samples/sec Loss 5.4116 LearningRate 0.0004 Epoch: 16 Global Step: 342910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:16,668-Speed 6183.32 samples/sec Loss 5.3333 LearningRate 0.0004 Epoch: 16 
Global Step: 342920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:19,954-Speed 6233.41 samples/sec Loss 5.4466 LearningRate 0.0004 Epoch: 16 Global Step: 342930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:23,204-Speed 6303.29 samples/sec Loss 5.5794 LearningRate 0.0004 Epoch: 16 Global Step: 342940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:26,452-Speed 6306.52 samples/sec Loss 5.4900 LearningRate 0.0004 Epoch: 16 Global Step: 342950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:29,697-Speed 6313.22 samples/sec Loss 5.3513 LearningRate 0.0004 Epoch: 16 Global Step: 342960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:32,944-Speed 6308.70 samples/sec Loss 5.4195 LearningRate 0.0004 Epoch: 16 Global Step: 342970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:55:36,182-Speed 6326.78 samples/sec Loss 5.3656 LearningRate 0.0004 Epoch: 16 Global Step: 342980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:39,426-Speed 6314.60 samples/sec Loss 5.4257 LearningRate 0.0004 Epoch: 16 Global Step: 342990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:42,674-Speed 6306.12 samples/sec Loss 5.4490 LearningRate 0.0004 Epoch: 16 Global Step: 343000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:45,914-Speed 6324.10 samples/sec Loss 5.4000 LearningRate 0.0004 Epoch: 16 Global Step: 343010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:49,173-Speed 6287.66 samples/sec Loss 5.4526 LearningRate 0.0004 Epoch: 16 Global Step: 343020 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:52,416-Speed 6318.00 samples/sec Loss 5.4952 LearningRate 0.0004 Epoch: 16 Global Step: 343030 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:55,659-Speed 6314.70 samples/sec Loss 5.3519 LearningRate 0.0004 Epoch: 16 Global Step: 343040 Fp16 Grad 
Scale: 16384 Required: 44 hours Training: 2022-04-01 22:55:58,906-Speed 6309.42 samples/sec Loss 5.4028 LearningRate 0.0004 Epoch: 16 Global Step: 343050 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:02,152-Speed 6310.70 samples/sec Loss 5.4759 LearningRate 0.0004 Epoch: 16 Global Step: 343060 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:05,401-Speed 6305.79 samples/sec Loss 5.3727 LearningRate 0.0004 Epoch: 16 Global Step: 343070 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:08,647-Speed 6310.65 samples/sec Loss 5.4721 LearningRate 0.0004 Epoch: 16 Global Step: 343080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:56:11,892-Speed 6312.31 samples/sec Loss 5.4679 LearningRate 0.0004 Epoch: 16 Global Step: 343090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:56:15,147-Speed 6296.44 samples/sec Loss 5.4789 LearningRate 0.0004 Epoch: 16 Global Step: 343100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:56:18,393-Speed 6310.18 samples/sec Loss 5.4348 LearningRate 0.0004 Epoch: 16 Global Step: 343110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:56:21,627-Speed 6333.91 samples/sec Loss 5.4444 LearningRate 0.0004 Epoch: 16 Global Step: 343120 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:24,872-Speed 6313.00 samples/sec Loss 5.3871 LearningRate 0.0004 Epoch: 16 Global Step: 343130 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:28,115-Speed 6315.40 samples/sec Loss 5.4277 LearningRate 0.0004 Epoch: 16 Global Step: 343140 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:31,359-Speed 6315.24 samples/sec Loss 5.4531 LearningRate 0.0004 Epoch: 16 Global Step: 343150 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:34,607-Speed 6306.09 samples/sec Loss 5.4367 LearningRate 0.0004 Epoch: 16 Global Step: 343160 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 22:56:37,854-Speed 6309.93 samples/sec Loss 5.4234 LearningRate 0.0004 Epoch: 16 Global Step: 343170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:41,096-Speed 6318.11 samples/sec Loss 5.4376 LearningRate 0.0004 Epoch: 16 Global Step: 343180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:44,342-Speed 6309.90 samples/sec Loss 5.3911 LearningRate 0.0004 Epoch: 16 Global Step: 343190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:47,587-Speed 6312.52 samples/sec Loss 5.4657 LearningRate 0.0004 Epoch: 16 Global Step: 343200 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:50,831-Speed 6315.60 samples/sec Loss 5.4330 LearningRate 0.0004 Epoch: 16 Global Step: 343210 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:56:54,080-Speed 6304.76 samples/sec Loss 5.4013 LearningRate 0.0004 Epoch: 16 Global Step: 343220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:56:57,325-Speed 6312.88 samples/sec Loss 5.3540 LearningRate 0.0004 Epoch: 16 Global Step: 343230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:00,584-Speed 6285.81 samples/sec Loss 5.3778 LearningRate 0.0004 Epoch: 16 Global Step: 343240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:03,836-Speed 6299.79 samples/sec Loss 5.4460 LearningRate 0.0004 Epoch: 16 Global Step: 343250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:07,083-Speed 6309.40 samples/sec Loss 5.4282 LearningRate 0.0004 Epoch: 16 Global Step: 343260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:10,331-Speed 6305.18 samples/sec Loss 5.3779 LearningRate 0.0004 Epoch: 16 Global Step: 343270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:13,573-Speed 6319.03 samples/sec Loss 5.4621 LearningRate 0.0004 Epoch: 16 Global Step: 343280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
22:57:16,819-Speed 6311.70 samples/sec Loss 5.3431 LearningRate 0.0004 Epoch: 16 Global Step: 343290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:20,062-Speed 6316.63 samples/sec Loss 5.3690 LearningRate 0.0004 Epoch: 16 Global Step: 343300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:23,305-Speed 6315.34 samples/sec Loss 5.4749 LearningRate 0.0004 Epoch: 16 Global Step: 343310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:26,534-Speed 6345.35 samples/sec Loss 5.4435 LearningRate 0.0004 Epoch: 16 Global Step: 343320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:29,773-Speed 6322.80 samples/sec Loss 5.3599 LearningRate 0.0004 Epoch: 16 Global Step: 343330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:33,019-Speed 6311.02 samples/sec Loss 5.4228 LearningRate 0.0004 Epoch: 16 Global Step: 343340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:36,262-Speed 6317.46 samples/sec Loss 5.3726 LearningRate 0.0004 Epoch: 16 Global Step: 343350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:39,541-Speed 6247.38 samples/sec Loss 5.4141 LearningRate 0.0004 Epoch: 16 Global Step: 343360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:42,785-Speed 6313.74 samples/sec Loss 5.3887 LearningRate 0.0004 Epoch: 16 Global Step: 343370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:46,029-Speed 6314.54 samples/sec Loss 5.3733 LearningRate 0.0004 Epoch: 16 Global Step: 343380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:49,286-Speed 6289.72 samples/sec Loss 5.4049 LearningRate 0.0004 Epoch: 16 Global Step: 343390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:52,529-Speed 6315.31 samples/sec Loss 5.3788 LearningRate 0.0004 Epoch: 16 Global Step: 343400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:55,773-Speed 6315.15 
samples/sec Loss 5.4201 LearningRate 0.0004 Epoch: 16 Global Step: 343410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:57:59,014-Speed 6320.85 samples/sec Loss 5.4632 LearningRate 0.0004 Epoch: 16 Global Step: 343420 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:02,275-Speed 6282.94 samples/sec Loss 5.4102 LearningRate 0.0004 Epoch: 16 Global Step: 343430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:05,606-Speed 6148.24 samples/sec Loss 5.3795 LearningRate 0.0004 Epoch: 16 Global Step: 343440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:08,854-Speed 6307.39 samples/sec Loss 5.3559 LearningRate 0.0004 Epoch: 16 Global Step: 343450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:12,194-Speed 6134.21 samples/sec Loss 5.4367 LearningRate 0.0004 Epoch: 16 Global Step: 343460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:15,547-Speed 6107.60 samples/sec Loss 5.4092 LearningRate 0.0004 Epoch: 16 Global Step: 343470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:18,790-Speed 6317.98 samples/sec Loss 5.4117 LearningRate 0.0004 Epoch: 16 Global Step: 343480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:22,035-Speed 6312.43 samples/sec Loss 5.3893 LearningRate 0.0004 Epoch: 16 Global Step: 343490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:25,282-Speed 6309.49 samples/sec Loss 5.3946 LearningRate 0.0004 Epoch: 16 Global Step: 343500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:28,531-Speed 6303.47 samples/sec Loss 5.4246 LearningRate 0.0004 Epoch: 16 Global Step: 343510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:31,758-Speed 6349.07 samples/sec Loss 5.3434 LearningRate 0.0004 Epoch: 16 Global Step: 343520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:35,002-Speed 6313.32 samples/sec Loss 5.3425 
LearningRate 0.0004 Epoch: 16 Global Step: 343530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:38,256-Speed 6295.47 samples/sec Loss 5.4244 LearningRate 0.0004 Epoch: 16 Global Step: 343540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:41,506-Speed 6302.72 samples/sec Loss 5.4289 LearningRate 0.0004 Epoch: 16 Global Step: 343550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:44,752-Speed 6310.49 samples/sec Loss 5.4196 LearningRate 0.0004 Epoch: 16 Global Step: 343560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:48,050-Speed 6211.27 samples/sec Loss 5.4058 LearningRate 0.0004 Epoch: 16 Global Step: 343570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:51,292-Speed 6318.83 samples/sec Loss 5.3850 LearningRate 0.0004 Epoch: 16 Global Step: 343580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:54,533-Speed 6319.84 samples/sec Loss 5.3970 LearningRate 0.0004 Epoch: 16 Global Step: 343590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:58:57,777-Speed 6314.39 samples/sec Loss 5.3799 LearningRate 0.0004 Epoch: 16 Global Step: 343600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:01,026-Speed 6304.81 samples/sec Loss 5.4417 LearningRate 0.0004 Epoch: 16 Global Step: 343610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:04,256-Speed 6342.55 samples/sec Loss 5.4454 LearningRate 0.0004 Epoch: 16 Global Step: 343620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:07,507-Speed 6302.72 samples/sec Loss 5.3784 LearningRate 0.0004 Epoch: 16 Global Step: 343630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:10,751-Speed 6313.90 samples/sec Loss 5.3469 LearningRate 0.0004 Epoch: 16 Global Step: 343640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:13,997-Speed 6310.44 samples/sec Loss 5.4439 LearningRate 0.0004 Epoch: 16 
Global Step: 343650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:17,244-Speed 6310.11 samples/sec Loss 5.4880 LearningRate 0.0004 Epoch: 16 Global Step: 343660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:20,489-Speed 6313.00 samples/sec Loss 5.3990 LearningRate 0.0004 Epoch: 16 Global Step: 343670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:23,755-Speed 6270.16 samples/sec Loss 5.4326 LearningRate 0.0004 Epoch: 16 Global Step: 343680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:27,021-Speed 6272.74 samples/sec Loss 5.3293 LearningRate 0.0004 Epoch: 16 Global Step: 343690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:30,268-Speed 6309.24 samples/sec Loss 5.4369 LearningRate 0.0004 Epoch: 16 Global Step: 343700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:33,515-Speed 6309.09 samples/sec Loss 5.3527 LearningRate 0.0004 Epoch: 16 Global Step: 343710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:36,749-Speed 6333.35 samples/sec Loss 5.3609 LearningRate 0.0004 Epoch: 16 Global Step: 343720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 22:59:39,982-Speed 6336.52 samples/sec Loss 5.4041 LearningRate 0.0004 Epoch: 16 Global Step: 343730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:43,247-Speed 6274.57 samples/sec Loss 5.4126 LearningRate 0.0004 Epoch: 16 Global Step: 343740 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:46,493-Speed 6309.38 samples/sec Loss 5.4322 LearningRate 0.0004 Epoch: 16 Global Step: 343750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:49,736-Speed 6317.66 samples/sec Loss 5.5268 LearningRate 0.0004 Epoch: 16 Global Step: 343760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:52,984-Speed 6307.13 samples/sec Loss 5.4344 LearningRate 0.0004 Epoch: 16 Global Step: 343770 Fp16 Grad 
Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:56,228-Speed 6313.19 samples/sec Loss 5.3789 LearningRate 0.0004 Epoch: 16 Global Step: 343780 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 22:59:59,475-Speed 6310.37 samples/sec Loss 5.3629 LearningRate 0.0004 Epoch: 16 Global Step: 343790 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:00:02,722-Speed 6308.26 samples/sec Loss 5.3513 LearningRate 0.0004 Epoch: 16 Global Step: 343800 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:00:05,970-Speed 6306.78 samples/sec Loss 5.4569 LearningRate 0.0004 Epoch: 16 Global Step: 343810 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:00:09,220-Speed 6303.33 samples/sec Loss 5.3003 LearningRate 0.0004 Epoch: 16 Global Step: 343820 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:00:12,466-Speed 6311.34 samples/sec Loss 5.3482 LearningRate 0.0004 Epoch: 16 Global Step: 343830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:15,714-Speed 6305.35 samples/sec Loss 5.4477 LearningRate 0.0004 Epoch: 16 Global Step: 343840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:18,963-Speed 6307.09 samples/sec Loss 5.4109 LearningRate 0.0004 Epoch: 16 Global Step: 343850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:22,209-Speed 6310.19 samples/sec Loss 5.3618 LearningRate 0.0004 Epoch: 16 Global Step: 343860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:25,454-Speed 6312.13 samples/sec Loss 5.4309 LearningRate 0.0004 Epoch: 16 Global Step: 343870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:28,696-Speed 6318.60 samples/sec Loss 5.3897 LearningRate 0.0004 Epoch: 16 Global Step: 343880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:31,941-Speed 6312.38 samples/sec Loss 5.4628 LearningRate 0.0004 Epoch: 16 Global Step: 343890 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:00:35,183-Speed 6318.10 samples/sec Loss 5.5078 LearningRate 0.0004 Epoch: 16 Global Step: 343900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:38,437-Speed 6295.93 samples/sec Loss 5.3562 LearningRate 0.0004 Epoch: 16 Global Step: 343910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:41,686-Speed 6304.62 samples/sec Loss 5.4161 LearningRate 0.0004 Epoch: 16 Global Step: 343920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:44,913-Speed 6347.72 samples/sec Loss 5.3901 LearningRate 0.0004 Epoch: 16 Global Step: 343930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:48,159-Speed 6311.67 samples/sec Loss 5.4142 LearningRate 0.0004 Epoch: 16 Global Step: 343940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:51,406-Speed 6306.83 samples/sec Loss 5.4575 LearningRate 0.0004 Epoch: 16 Global Step: 343950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:54,655-Speed 6305.69 samples/sec Loss 5.3927 LearningRate 0.0004 Epoch: 16 Global Step: 343960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:00:57,902-Speed 6309.46 samples/sec Loss 5.3612 LearningRate 0.0004 Epoch: 16 Global Step: 343970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:01,149-Speed 6308.03 samples/sec Loss 5.4302 LearningRate 0.0004 Epoch: 16 Global Step: 343980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:04,398-Speed 6305.08 samples/sec Loss 5.4673 LearningRate 0.0004 Epoch: 16 Global Step: 343990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:07,639-Speed 6320.81 samples/sec Loss 5.4539 LearningRate 0.0004 Epoch: 16 Global Step: 344000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:10,885-Speed 6309.24 samples/sec Loss 5.3774 LearningRate 0.0004 Epoch: 16 Global Step: 344010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:01:14,130-Speed 6313.77 samples/sec Loss 5.4199 LearningRate 0.0004 Epoch: 16 Global Step: 344020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:17,363-Speed 6336.17 samples/sec Loss 5.4084 LearningRate 0.0004 Epoch: 16 Global Step: 344030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:20,609-Speed 6311.00 samples/sec Loss 5.3949 LearningRate 0.0004 Epoch: 16 Global Step: 344040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:23,853-Speed 6315.52 samples/sec Loss 5.3827 LearningRate 0.0004 Epoch: 16 Global Step: 344050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:27,100-Speed 6308.58 samples/sec Loss 5.4783 LearningRate 0.0004 Epoch: 16 Global Step: 344060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:30,343-Speed 6315.92 samples/sec Loss 5.3509 LearningRate 0.0004 Epoch: 16 Global Step: 344070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:33,583-Speed 6322.61 samples/sec Loss 5.3600 LearningRate 0.0004 Epoch: 16 Global Step: 344080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:36,837-Speed 6294.72 samples/sec Loss 5.4153 LearningRate 0.0004 Epoch: 16 Global Step: 344090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:40,092-Speed 6292.87 samples/sec Loss 5.4342 LearningRate 0.0004 Epoch: 16 Global Step: 344100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:43,338-Speed 6311.54 samples/sec Loss 5.4536 LearningRate 0.0004 Epoch: 16 Global Step: 344110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:46,582-Speed 6313.59 samples/sec Loss 5.3896 LearningRate 0.0004 Epoch: 16 Global Step: 344120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:49,813-Speed 6341.26 samples/sec Loss 5.4047 LearningRate 0.0004 Epoch: 16 Global Step: 344130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:53,064-Speed 6300.84 
samples/sec Loss 5.4440 LearningRate 0.0004 Epoch: 16 Global Step: 344140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:56,309-Speed 6313.10 samples/sec Loss 5.3711 LearningRate 0.0004 Epoch: 16 Global Step: 344150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:01:59,553-Speed 6314.48 samples/sec Loss 5.3345 LearningRate 0.0004 Epoch: 16 Global Step: 344160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:02,803-Speed 6301.50 samples/sec Loss 5.4018 LearningRate 0.0004 Epoch: 16 Global Step: 344170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:06,053-Speed 6304.39 samples/sec Loss 5.3359 LearningRate 0.0004 Epoch: 16 Global Step: 344180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:09,301-Speed 6307.19 samples/sec Loss 5.3615 LearningRate 0.0004 Epoch: 16 Global Step: 344190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:12,544-Speed 6314.98 samples/sec Loss 5.3787 LearningRate 0.0004 Epoch: 16 Global Step: 344200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:15,790-Speed 6311.60 samples/sec Loss 5.3778 LearningRate 0.0004 Epoch: 16 Global Step: 344210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:19,034-Speed 6314.00 samples/sec Loss 5.3574 LearningRate 0.0004 Epoch: 16 Global Step: 344220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:22,270-Speed 6331.33 samples/sec Loss 5.3609 LearningRate 0.0004 Epoch: 16 Global Step: 344230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:25,514-Speed 6315.03 samples/sec Loss 5.3964 LearningRate 0.0004 Epoch: 16 Global Step: 344240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:28,761-Speed 6309.40 samples/sec Loss 5.3817 LearningRate 0.0004 Epoch: 16 Global Step: 344250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:32,010-Speed 6305.02 samples/sec Loss 5.4029 
LearningRate 0.0004 Epoch: 16 Global Step: 344260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:35,258-Speed 6306.19 samples/sec Loss 5.3943 LearningRate 0.0004 Epoch: 16 Global Step: 344270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:38,499-Speed 6319.60 samples/sec Loss 5.4618 LearningRate 0.0004 Epoch: 16 Global Step: 344280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:41,748-Speed 6305.50 samples/sec Loss 5.4653 LearningRate 0.0004 Epoch: 16 Global Step: 344290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:45,000-Speed 6299.50 samples/sec Loss 5.3752 LearningRate 0.0004 Epoch: 16 Global Step: 344300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:48,246-Speed 6310.58 samples/sec Loss 5.3241 LearningRate 0.0004 Epoch: 16 Global Step: 344310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:51,492-Speed 6311.28 samples/sec Loss 5.4238 LearningRate 0.0004 Epoch: 16 Global Step: 344320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:54,727-Speed 6331.61 samples/sec Loss 5.4621 LearningRate 0.0004 Epoch: 16 Global Step: 344330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:02:57,970-Speed 6316.87 samples/sec Loss 5.3097 LearningRate 0.0004 Epoch: 16 Global Step: 344340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:01,218-Speed 6306.57 samples/sec Loss 5.3794 LearningRate 0.0004 Epoch: 16 Global Step: 344350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:04,461-Speed 6315.91 samples/sec Loss 5.4156 LearningRate 0.0004 Epoch: 16 Global Step: 344360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:07,708-Speed 6308.71 samples/sec Loss 5.4395 LearningRate 0.0004 Epoch: 16 Global Step: 344370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:10,951-Speed 6316.97 samples/sec Loss 5.4370 LearningRate 0.0004 Epoch: 16 
Global Step: 344380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:14,196-Speed 6311.74 samples/sec Loss 5.4361 LearningRate 0.0004 Epoch: 16 Global Step: 344390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:17,444-Speed 6306.18 samples/sec Loss 5.3912 LearningRate 0.0004 Epoch: 16 Global Step: 344400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:20,674-Speed 6343.14 samples/sec Loss 5.3375 LearningRate 0.0004 Epoch: 16 Global Step: 344410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:23,920-Speed 6310.59 samples/sec Loss 5.3760 LearningRate 0.0004 Epoch: 16 Global Step: 344420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:27,164-Speed 6314.65 samples/sec Loss 5.3717 LearningRate 0.0004 Epoch: 16 Global Step: 344430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:30,406-Speed 6317.89 samples/sec Loss 5.4972 LearningRate 0.0004 Epoch: 16 Global Step: 344440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:33,657-Speed 6302.24 samples/sec Loss 5.4235 LearningRate 0.0004 Epoch: 16 Global Step: 344450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:36,898-Speed 6320.57 samples/sec Loss 5.4933 LearningRate 0.0004 Epoch: 16 Global Step: 344460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:40,143-Speed 6311.93 samples/sec Loss 5.4145 LearningRate 0.0004 Epoch: 16 Global Step: 344470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:43,389-Speed 6311.09 samples/sec Loss 5.4980 LearningRate 0.0004 Epoch: 16 Global Step: 344480 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:46,638-Speed 6306.00 samples/sec Loss 5.4496 LearningRate 0.0004 Epoch: 16 Global Step: 344490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:49,883-Speed 6311.50 samples/sec Loss 5.3871 LearningRate 0.0004 Epoch: 16 Global Step: 344500 Fp16 Grad 
Scale: 16384 Required: 44 hours Training: 2022-04-01 23:03:53,125-Speed 6318.20 samples/sec Loss 5.3979 LearningRate 0.0004 Epoch: 16 Global Step: 344510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:56,371-Speed 6312.04 samples/sec Loss 5.3330 LearningRate 0.0004 Epoch: 16 Global Step: 344520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:03:59,617-Speed 6311.14 samples/sec Loss 5.4893 LearningRate 0.0004 Epoch: 16 Global Step: 344530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:02,862-Speed 6311.51 samples/sec Loss 5.4728 LearningRate 0.0004 Epoch: 16 Global Step: 344540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:06,108-Speed 6311.34 samples/sec Loss 5.4093 LearningRate 0.0004 Epoch: 16 Global Step: 344550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:09,350-Speed 6317.96 samples/sec Loss 5.3881 LearningRate 0.0004 Epoch: 16 Global Step: 344560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:12,595-Speed 6312.53 samples/sec Loss 5.4297 LearningRate 0.0004 Epoch: 16 Global Step: 344570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:15,840-Speed 6313.30 samples/sec Loss 5.4601 LearningRate 0.0004 Epoch: 16 Global Step: 344580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:19,086-Speed 6310.27 samples/sec Loss 5.3845 LearningRate 0.0004 Epoch: 16 Global Step: 344590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:22,332-Speed 6311.94 samples/sec Loss 5.3776 LearningRate 0.0004 Epoch: 16 Global Step: 344600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:25,569-Speed 6330.85 samples/sec Loss 5.3612 LearningRate 0.0004 Epoch: 16 Global Step: 344610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:28,813-Speed 6314.41 samples/sec Loss 5.4464 LearningRate 0.0004 Epoch: 16 Global Step: 344620 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:04:32,063-Speed 6301.40 samples/sec Loss 5.4659 LearningRate 0.0004 Epoch: 16 Global Step: 344630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:04:35,292-Speed 6344.78 samples/sec Loss 5.3744 LearningRate 0.0004 Epoch: 16 Global Step: 344640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:38,537-Speed 6313.49 samples/sec Loss 5.4327 LearningRate 0.0004 Epoch: 16 Global Step: 344650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:41,786-Speed 6303.40 samples/sec Loss 5.4333 LearningRate 0.0004 Epoch: 16 Global Step: 344660 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:45,032-Speed 6311.88 samples/sec Loss 5.3882 LearningRate 0.0004 Epoch: 16 Global Step: 344670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:48,280-Speed 6307.66 samples/sec Loss 5.3231 LearningRate 0.0004 Epoch: 16 Global Step: 344680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:51,523-Speed 6315.70 samples/sec Loss 5.4436 LearningRate 0.0004 Epoch: 16 Global Step: 344690 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:54,768-Speed 6312.73 samples/sec Loss 5.3583 LearningRate 0.0004 Epoch: 16 Global Step: 344700 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:04:58,016-Speed 6307.74 samples/sec Loss 5.4353 LearningRate 0.0004 Epoch: 16 Global Step: 344710 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:05:01,261-Speed 6311.79 samples/sec Loss 5.4168 LearningRate 0.0004 Epoch: 16 Global Step: 344720 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:05:04,509-Speed 6306.65 samples/sec Loss 5.3643 LearningRate 0.0004 Epoch: 16 Global Step: 344730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:05:07,756-Speed 6308.74 samples/sec Loss 5.3720 LearningRate 0.0004 Epoch: 16 Global Step: 344740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:05:10,999-Speed 6316.48 samples/sec Loss 5.4201 LearningRate 0.0004 Epoch: 16 Global Step: 344750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:14,246-Speed 6309.25 samples/sec Loss 5.3767 LearningRate 0.0004 Epoch: 16 Global Step: 344760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:17,491-Speed 6311.99 samples/sec Loss 5.4209 LearningRate 0.0004 Epoch: 16 Global Step: 344770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:20,734-Speed 6316.44 samples/sec Loss 5.3902 LearningRate 0.0004 Epoch: 16 Global Step: 344780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:23,983-Speed 6306.79 samples/sec Loss 5.4587 LearningRate 0.0004 Epoch: 16 Global Step: 344790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:27,224-Speed 6320.56 samples/sec Loss 5.4596 LearningRate 0.0004 Epoch: 16 Global Step: 344800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:30,470-Speed 6310.84 samples/sec Loss 5.4914 LearningRate 0.0004 Epoch: 16 Global Step: 344810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:33,728-Speed 6287.09 samples/sec Loss 5.4215 LearningRate 0.0004 Epoch: 16 Global Step: 344820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:36,974-Speed 6310.56 samples/sec Loss 5.3652 LearningRate 0.0004 Epoch: 16 Global Step: 344830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:40,207-Speed 6334.55 samples/sec Loss 5.3889 LearningRate 0.0004 Epoch: 16 Global Step: 344840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:43,454-Speed 6309.59 samples/sec Loss 5.4579 LearningRate 0.0004 Epoch: 16 Global Step: 344850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:46,704-Speed 6303.73 samples/sec Loss 5.4470 LearningRate 0.0004 Epoch: 16 Global Step: 344860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:49,958-Speed 6293.70 
samples/sec Loss 5.4142 LearningRate 0.0004 Epoch: 16 Global Step: 344870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:53,204-Speed 6311.67 samples/sec Loss 5.4298 LearningRate 0.0004 Epoch: 16 Global Step: 344880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:56,447-Speed 6316.66 samples/sec Loss 5.4015 LearningRate 0.0004 Epoch: 16 Global Step: 344890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:05:59,696-Speed 6304.27 samples/sec Loss 5.4273 LearningRate 0.0004 Epoch: 16 Global Step: 344900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:02,947-Speed 6303.10 samples/sec Loss 5.4042 LearningRate 0.0004 Epoch: 16 Global Step: 344910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:06,194-Speed 6307.14 samples/sec Loss 5.4028 LearningRate 0.0004 Epoch: 16 Global Step: 344920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:09,439-Speed 6314.00 samples/sec Loss 5.4372 LearningRate 0.0004 Epoch: 16 Global Step: 344930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:12,671-Speed 6338.49 samples/sec Loss 5.3647 LearningRate 0.0004 Epoch: 16 Global Step: 344940 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:15,919-Speed 6306.52 samples/sec Loss 5.4383 LearningRate 0.0004 Epoch: 16 Global Step: 344950 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:19,168-Speed 6304.43 samples/sec Loss 5.3755 LearningRate 0.0004 Epoch: 16 Global Step: 344960 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:22,411-Speed 6316.20 samples/sec Loss 5.4229 LearningRate 0.0004 Epoch: 16 Global Step: 344970 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:25,661-Speed 6304.71 samples/sec Loss 5.4317 LearningRate 0.0004 Epoch: 16 Global Step: 344980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:28,906-Speed 6310.64 samples/sec Loss 5.3915 
LearningRate 0.0004 Epoch: 16 Global Step: 344990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:32,150-Speed 6315.47 samples/sec Loss 5.4509 LearningRate 0.0004 Epoch: 16 Global Step: 345000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:35,394-Speed 6314.43 samples/sec Loss 5.3657 LearningRate 0.0004 Epoch: 16 Global Step: 345010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:38,643-Speed 6304.49 samples/sec Loss 5.3865 LearningRate 0.0004 Epoch: 16 Global Step: 345020 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:41,887-Speed 6315.11 samples/sec Loss 5.3850 LearningRate 0.0004 Epoch: 16 Global Step: 345030 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:06:45,135-Speed 6307.70 samples/sec Loss 5.4344 LearningRate 0.0004 Epoch: 16 Global Step: 345040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:48,381-Speed 6310.48 samples/sec Loss 5.4120 LearningRate 0.0004 Epoch: 16 Global Step: 345050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:51,625-Speed 6313.34 samples/sec Loss 5.4628 LearningRate 0.0004 Epoch: 16 Global Step: 345060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:54,872-Speed 6310.12 samples/sec Loss 5.4217 LearningRate 0.0004 Epoch: 16 Global Step: 345070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:06:58,116-Speed 6314.47 samples/sec Loss 5.4527 LearningRate 0.0004 Epoch: 16 Global Step: 345080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:01,362-Speed 6309.90 samples/sec Loss 5.4147 LearningRate 0.0004 Epoch: 16 Global Step: 345090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:04,609-Speed 6309.62 samples/sec Loss 5.4204 LearningRate 0.0004 Epoch: 16 Global Step: 345100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:07,840-Speed 6338.92 samples/sec Loss 5.3622 LearningRate 0.0004 Epoch: 16 
Global Step: 345110 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:11,098-Speed 6287.35 samples/sec Loss 5.3875 LearningRate 0.0004 Epoch: 16 Global Step: 345120 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:14,340-Speed 6319.09 samples/sec Loss 5.3615 LearningRate 0.0004 Epoch: 16 Global Step: 345130 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:17,585-Speed 6314.12 samples/sec Loss 5.3531 LearningRate 0.0004 Epoch: 16 Global Step: 345140 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:20,833-Speed 6305.44 samples/sec Loss 5.3669 LearningRate 0.0004 Epoch: 16 Global Step: 345150 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:24,079-Speed 6312.21 samples/sec Loss 5.3860 LearningRate 0.0004 Epoch: 16 Global Step: 345160 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:27,327-Speed 6305.95 samples/sec Loss 5.4583 LearningRate 0.0004 Epoch: 16 Global Step: 345170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:30,573-Speed 6311.91 samples/sec Loss 5.4050 LearningRate 0.0004 Epoch: 16 Global Step: 345180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:33,818-Speed 6311.17 samples/sec Loss 5.3580 LearningRate 0.0004 Epoch: 16 Global Step: 345190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:37,066-Speed 6307.20 samples/sec Loss 5.3937 LearningRate 0.0004 Epoch: 16 Global Step: 345200 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:07:40,314-Speed 6307.11 samples/sec Loss 5.3836 LearningRate 0.0004 Epoch: 16 Global Step: 345210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:43,563-Speed 6305.41 samples/sec Loss 5.4582 LearningRate 0.0004 Epoch: 16 Global Step: 345220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:46,812-Speed 6304.53 samples/sec Loss 5.4506 LearningRate 0.0004 Epoch: 16 Global Step: 345230 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:50,054-Speed 6317.74 samples/sec Loss 5.4259 LearningRate 0.0004 Epoch: 16 Global Step: 345240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:53,312-Speed 6288.97 samples/sec Loss 5.4368 LearningRate 0.0004 Epoch: 16 Global Step: 345250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:56,556-Speed 6314.46 samples/sec Loss 5.4101 LearningRate 0.0004 Epoch: 16 Global Step: 345260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:07:59,804-Speed 6305.49 samples/sec Loss 5.3247 LearningRate 0.0004 Epoch: 16 Global Step: 345270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:03,051-Speed 6309.41 samples/sec Loss 5.4428 LearningRate 0.0004 Epoch: 16 Global Step: 345280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:06,307-Speed 6291.52 samples/sec Loss 5.3820 LearningRate 0.0004 Epoch: 16 Global Step: 345290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:09,550-Speed 6315.11 samples/sec Loss 5.4404 LearningRate 0.0004 Epoch: 16 Global Step: 345300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:12,790-Speed 6323.10 samples/sec Loss 5.3548 LearningRate 0.0004 Epoch: 16 Global Step: 345310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:16,038-Speed 6308.16 samples/sec Loss 5.3504 LearningRate 0.0004 Epoch: 16 Global Step: 345320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:19,272-Speed 6333.66 samples/sec Loss 5.3906 LearningRate 0.0004 Epoch: 16 Global Step: 345330 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:22,530-Speed 6287.49 samples/sec Loss 5.3384 LearningRate 0.0004 Epoch: 16 Global Step: 345340 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:25,776-Speed 6312.13 samples/sec Loss 5.4397 LearningRate 0.0004 Epoch: 16 Global Step: 345350 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 23:08:29,025-Speed 6304.52 samples/sec Loss 5.4152 LearningRate 0.0004 Epoch: 16 Global Step: 345360 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:32,274-Speed 6303.80 samples/sec Loss 5.4296 LearningRate 0.0004 Epoch: 16 Global Step: 345370 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:35,518-Speed 6314.81 samples/sec Loss 5.4001 LearningRate 0.0004 Epoch: 16 Global Step: 345380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:38,766-Speed 6308.07 samples/sec Loss 5.3419 LearningRate 0.0004 Epoch: 16 Global Step: 345390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:42,018-Speed 6297.11 samples/sec Loss 5.4494 LearningRate 0.0004 Epoch: 16 Global Step: 345400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:45,267-Speed 6306.85 samples/sec Loss 5.3507 LearningRate 0.0004 Epoch: 16 Global Step: 345410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:48,516-Speed 6304.38 samples/sec Loss 5.3589 LearningRate 0.0004 Epoch: 16 Global Step: 345420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:08:51,761-Speed 6313.01 samples/sec Loss 5.4206 LearningRate 0.0004 Epoch: 16 Global Step: 345430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:55,008-Speed 6307.53 samples/sec Loss 5.3738 LearningRate 0.0004 Epoch: 16 Global Step: 345440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:08:58,252-Speed 6314.56 samples/sec Loss 5.2809 LearningRate 0.0004 Epoch: 16 Global Step: 345450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:01,499-Speed 6308.47 samples/sec Loss 5.4271 LearningRate 0.0004 Epoch: 16 Global Step: 345460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:04,750-Speed 6301.67 samples/sec Loss 5.3961 LearningRate 0.0004 Epoch: 16 Global Step: 345470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:09:08,007-Speed 6289.25 samples/sec Loss 5.3586 LearningRate 0.0004 Epoch: 16 Global Step: 345480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:11,252-Speed 6312.23 samples/sec Loss 5.4093 LearningRate 0.0004 Epoch: 16 Global Step: 345490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:14,500-Speed 6308.47 samples/sec Loss 5.4389 LearningRate 0.0004 Epoch: 16 Global Step: 345500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:17,747-Speed 6308.14 samples/sec Loss 5.4563 LearningRate 0.0004 Epoch: 16 Global Step: 345510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:20,995-Speed 6307.01 samples/sec Loss 5.4018 LearningRate 0.0004 Epoch: 16 Global Step: 345520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:24,240-Speed 6311.79 samples/sec Loss 5.4125 LearningRate 0.0004 Epoch: 16 Global Step: 345530 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 23:09:27,471-Speed 6341.32 samples/sec Loss 5.3564 LearningRate 0.0004 Epoch: 16 Global Step: 345540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:30,716-Speed 6312.99 samples/sec Loss 5.4655 LearningRate 0.0004 Epoch: 16 Global Step: 345550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:09:33,948-Speed 6336.70 samples/sec Loss 5.4113 LearningRate 0.0004 Epoch: 16 Global Step: 345560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:37,202-Speed 6296.14 samples/sec Loss 5.3976 LearningRate 0.0004 Epoch: 16 Global Step: 345570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:40,454-Speed 6298.66 samples/sec Loss 5.3738 LearningRate 0.0004 Epoch: 16 Global Step: 345580 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:43,709-Speed 6293.90 samples/sec Loss 5.3850 LearningRate 0.0004 Epoch: 16 Global Step: 345590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:46,952-Speed 6315.42 
samples/sec Loss 5.4334 LearningRate 0.0004 Epoch: 16 Global Step: 345600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:50,194-Speed 6318.40 samples/sec Loss 5.4209 LearningRate 0.0004 Epoch: 16 Global Step: 345610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:53,441-Speed 6308.79 samples/sec Loss 5.3687 LearningRate 0.0004 Epoch: 16 Global Step: 345620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:56,690-Speed 6305.23 samples/sec Loss 5.4549 LearningRate 0.0004 Epoch: 16 Global Step: 345630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:09:59,936-Speed 6311.39 samples/sec Loss 5.3500 LearningRate 0.0004 Epoch: 16 Global Step: 345640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:03,195-Speed 6284.95 samples/sec Loss 5.4095 LearningRate 0.0004 Epoch: 16 Global Step: 345650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:06,439-Speed 6315.61 samples/sec Loss 5.4231 LearningRate 0.0004 Epoch: 16 Global Step: 345660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:10:09,673-Speed 6333.29 samples/sec Loss 5.3636 LearningRate 0.0004 Epoch: 16 Global Step: 345670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:12,921-Speed 6306.75 samples/sec Loss 5.3794 LearningRate 0.0004 Epoch: 16 Global Step: 345680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:16,165-Speed 6314.44 samples/sec Loss 5.4063 LearningRate 0.0004 Epoch: 16 Global Step: 345690 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:19,413-Speed 6306.29 samples/sec Loss 5.3492 LearningRate 0.0004 Epoch: 16 Global Step: 345700 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:22,670-Speed 6290.66 samples/sec Loss 5.4136 LearningRate 0.0004 Epoch: 16 Global Step: 345710 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:25,923-Speed 6297.10 samples/sec Loss 5.3945 
LearningRate 0.0004 Epoch: 16 Global Step: 345720 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:29,173-Speed 6301.98 samples/sec Loss 5.3412 LearningRate 0.0004 Epoch: 16 Global Step: 345730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:32,416-Speed 6316.82 samples/sec Loss 5.3902 LearningRate 0.0004 Epoch: 16 Global Step: 345740 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:35,664-Speed 6307.60 samples/sec Loss 5.4085 LearningRate 0.0004 Epoch: 16 Global Step: 345750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:38,909-Speed 6313.44 samples/sec Loss 5.3401 LearningRate 0.0004 Epoch: 16 Global Step: 345760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:42,154-Speed 6311.74 samples/sec Loss 5.4321 LearningRate 0.0004 Epoch: 16 Global Step: 345770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:10:45,400-Speed 6310.96 samples/sec Loss 5.4199 LearningRate 0.0004 Epoch: 16 Global Step: 345780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:10:48,646-Speed 6310.50 samples/sec Loss 5.3535 LearningRate 0.0004 Epoch: 16 Global Step: 345790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:10:51,893-Speed 6308.83 samples/sec Loss 5.4108 LearningRate 0.0004 Epoch: 16 Global Step: 345800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:10:55,127-Speed 6334.32 samples/sec Loss 5.4201 LearningRate 0.0004 Epoch: 16 Global Step: 345810 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:10:58,377-Speed 6303.28 samples/sec Loss 5.3992 LearningRate 0.0004 Epoch: 16 Global Step: 345820 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:01,618-Speed 6319.96 samples/sec Loss 5.4479 LearningRate 0.0004 Epoch: 16 Global Step: 345830 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:04,873-Speed 6294.28 samples/sec Loss 5.4135 LearningRate 0.0004 Epoch: 16 
Global Step: 345840 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:08,121-Speed 6305.31 samples/sec Loss 5.4022 LearningRate 0.0004 Epoch: 16 Global Step: 345850 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:11,367-Speed 6311.35 samples/sec Loss 5.3842 LearningRate 0.0004 Epoch: 16 Global Step: 345860 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:14,611-Speed 6315.60 samples/sec Loss 5.3899 LearningRate 0.0004 Epoch: 16 Global Step: 345870 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:17,856-Speed 6312.44 samples/sec Loss 5.3875 LearningRate 0.0004 Epoch: 16 Global Step: 345880 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:21,100-Speed 6313.53 samples/sec Loss 5.4306 LearningRate 0.0004 Epoch: 16 Global Step: 345890 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:24,345-Speed 6313.77 samples/sec Loss 5.4524 LearningRate 0.0004 Epoch: 16 Global Step: 345900 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:11:27,595-Speed 6302.48 samples/sec Loss 5.3694 LearningRate 0.0004 Epoch: 16 Global Step: 345910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:30,851-Speed 6292.42 samples/sec Loss 5.3459 LearningRate 0.0004 Epoch: 16 Global Step: 345920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:34,096-Speed 6310.82 samples/sec Loss 5.4632 LearningRate 0.0004 Epoch: 16 Global Step: 345930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:37,354-Speed 6287.75 samples/sec Loss 5.3963 LearningRate 0.0004 Epoch: 16 Global Step: 345940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:40,601-Speed 6308.20 samples/sec Loss 5.4447 LearningRate 0.0004 Epoch: 16 Global Step: 345950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:43,845-Speed 6314.79 samples/sec Loss 5.4254 LearningRate 0.0004 Epoch: 16 Global Step: 345960 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:47,096-Speed 6302.89 samples/sec Loss 5.4218 LearningRate 0.0004 Epoch: 16 Global Step: 345970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:50,354-Speed 6286.82 samples/sec Loss 5.4176 LearningRate 0.0004 Epoch: 16 Global Step: 345980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:53,610-Speed 6292.36 samples/sec Loss 5.4540 LearningRate 0.0004 Epoch: 16 Global Step: 345990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:11:56,855-Speed 6311.62 samples/sec Loss 5.4210 LearningRate 0.0004 Epoch: 16 Global Step: 346000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:00,114-Speed 6286.58 samples/sec Loss 5.4028 LearningRate 0.0004 Epoch: 16 Global Step: 346010 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 23:12:03,347-Speed 6334.60 samples/sec Loss 5.4174 LearningRate 0.0004 Epoch: 16 Global Step: 346020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:06,604-Speed 6290.80 samples/sec Loss 5.4165 LearningRate 0.0004 Epoch: 16 Global Step: 346030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:09,862-Speed 6285.72 samples/sec Loss 5.3216 LearningRate 0.0004 Epoch: 16 Global Step: 346040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:13,109-Speed 6309.62 samples/sec Loss 5.4123 LearningRate 0.0004 Epoch: 16 Global Step: 346050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:16,360-Speed 6302.01 samples/sec Loss 5.4292 LearningRate 0.0004 Epoch: 16 Global Step: 346060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:19,611-Speed 6299.54 samples/sec Loss 5.4336 LearningRate 0.0004 Epoch: 16 Global Step: 346070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:22,857-Speed 6311.07 samples/sec Loss 5.4299 LearningRate 0.0004 Epoch: 16 Global Step: 346080 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:12:26,102-Speed 6312.44 samples/sec Loss 5.3461 LearningRate 0.0004 Epoch: 16 Global Step: 346090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:29,358-Speed 6291.99 samples/sec Loss 5.3688 LearningRate 0.0004 Epoch: 16 Global Step: 346100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:12:32,590-Speed 6336.96 samples/sec Loss 5.3117 LearningRate 0.0004 Epoch: 16 Global Step: 346110 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:35,836-Speed 6311.12 samples/sec Loss 5.3619 LearningRate 0.0004 Epoch: 16 Global Step: 346120 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:39,086-Speed 6303.03 samples/sec Loss 5.4249 LearningRate 0.0004 Epoch: 16 Global Step: 346130 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:42,332-Speed 6311.39 samples/sec Loss 5.3319 LearningRate 0.0004 Epoch: 16 Global Step: 346140 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:45,580-Speed 6305.66 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 16 Global Step: 346150 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:48,834-Speed 6296.70 samples/sec Loss 5.4087 LearningRate 0.0004 Epoch: 16 Global Step: 346160 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:52,077-Speed 6316.40 samples/sec Loss 5.3429 LearningRate 0.0004 Epoch: 16 Global Step: 346170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:55,331-Speed 6294.88 samples/sec Loss 5.4753 LearningRate 0.0004 Epoch: 16 Global Step: 346180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:12:58,575-Speed 6314.98 samples/sec Loss 5.3933 LearningRate 0.0004 Epoch: 16 Global Step: 346190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:13:01,821-Speed 6311.74 samples/sec Loss 5.3187 LearningRate 0.0004 Epoch: 16 Global Step: 346200 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 
23:13:05,068-Speed 6307.30 samples/sec Loss 5.3962 LearningRate 0.0004 Epoch: 16 Global Step: 346210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:08,326-Speed 6288.37 samples/sec Loss 5.3287 LearningRate 0.0004 Epoch: 16 Global Step: 346220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:11,576-Speed 6303.19 samples/sec Loss 5.2840 LearningRate 0.0004 Epoch: 16 Global Step: 346230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:14,829-Speed 6299.84 samples/sec Loss 5.4144 LearningRate 0.0004 Epoch: 16 Global Step: 346240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:18,079-Speed 6304.63 samples/sec Loss 5.4146 LearningRate 0.0004 Epoch: 16 Global Step: 346250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:21,323-Speed 6313.12 samples/sec Loss 5.3674 LearningRate 0.0004 Epoch: 16 Global Step: 346260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:24,570-Speed 6308.52 samples/sec Loss 5.3993 LearningRate 0.0004 Epoch: 16 Global Step: 346270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:27,819-Speed 6305.50 samples/sec Loss 5.4252 LearningRate 0.0004 Epoch: 16 Global Step: 346280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:31,064-Speed 6313.36 samples/sec Loss 5.4312 LearningRate 0.0004 Epoch: 16 Global Step: 346290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:34,310-Speed 6311.25 samples/sec Loss 5.4172 LearningRate 0.0004 Epoch: 16 Global Step: 346300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:37,541-Speed 6339.52 samples/sec Loss 5.4256 LearningRate 0.0004 Epoch: 16 Global Step: 346310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:40,789-Speed 6305.70 samples/sec Loss 5.4094 LearningRate 0.0004 Epoch: 16 Global Step: 346320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:44,031-Speed 6318.84 
samples/sec Loss 5.3454 LearningRate 0.0004 Epoch: 16 Global Step: 346330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:47,277-Speed 6310.37 samples/sec Loss 5.4515 LearningRate 0.0004 Epoch: 16 Global Step: 346340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:50,529-Speed 6300.87 samples/sec Loss 5.4123 LearningRate 0.0004 Epoch: 16 Global Step: 346350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:53,774-Speed 6311.50 samples/sec Loss 5.3754 LearningRate 0.0004 Epoch: 16 Global Step: 346360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:13:57,026-Speed 6298.20 samples/sec Loss 5.3864 LearningRate 0.0004 Epoch: 16 Global Step: 346370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:00,258-Speed 6339.78 samples/sec Loss 5.3699 LearningRate 0.0004 Epoch: 16 Global Step: 346380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:03,504-Speed 6311.38 samples/sec Loss 5.4105 LearningRate 0.0004 Epoch: 16 Global Step: 346390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:06,749-Speed 6313.01 samples/sec Loss 5.4258 LearningRate 0.0004 Epoch: 16 Global Step: 346400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:09,992-Speed 6314.91 samples/sec Loss 5.3465 LearningRate 0.0004 Epoch: 16 Global Step: 346410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:13,236-Speed 6314.95 samples/sec Loss 5.3484 LearningRate 0.0004 Epoch: 16 Global Step: 346420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:16,480-Speed 6314.31 samples/sec Loss 5.4112 LearningRate 0.0004 Epoch: 16 Global Step: 346430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:19,729-Speed 6305.94 samples/sec Loss 5.3834 LearningRate 0.0004 Epoch: 16 Global Step: 346440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:22,977-Speed 6306.34 samples/sec Loss 5.4011 
LearningRate 0.0004 Epoch: 16 Global Step: 346450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:26,223-Speed 6310.32 samples/sec Loss 5.3797 LearningRate 0.0004 Epoch: 16 Global Step: 346460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:29,470-Speed 6309.90 samples/sec Loss 5.4318 LearningRate 0.0004 Epoch: 16 Global Step: 346470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:14:32,712-Speed 6316.99 samples/sec Loss 5.3462 LearningRate 0.0004 Epoch: 16 Global Step: 346480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:35,960-Speed 6308.38 samples/sec Loss 5.3277 LearningRate 0.0004 Epoch: 16 Global Step: 346490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:39,205-Speed 6311.27 samples/sec Loss 5.5037 LearningRate 0.0004 Epoch: 16 Global Step: 346500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:42,450-Speed 6312.56 samples/sec Loss 5.3565 LearningRate 0.0004 Epoch: 16 Global Step: 346510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:45,698-Speed 6307.49 samples/sec Loss 5.3891 LearningRate 0.0004 Epoch: 16 Global Step: 346520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:48,944-Speed 6311.17 samples/sec Loss 5.3440 LearningRate 0.0004 Epoch: 16 Global Step: 346530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:52,192-Speed 6306.05 samples/sec Loss 5.3804 LearningRate 0.0004 Epoch: 16 Global Step: 346540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:55,439-Speed 6309.54 samples/sec Loss 5.4499 LearningRate 0.0004 Epoch: 16 Global Step: 346550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:14:58,686-Speed 6307.54 samples/sec Loss 5.3697 LearningRate 0.0004 Epoch: 16 Global Step: 346560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:01,936-Speed 6304.69 samples/sec Loss 5.3798 LearningRate 0.0004 Epoch: 16 
Global Step: 346570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:05,169-Speed 6335.39 samples/sec Loss 5.4283 LearningRate 0.0004 Epoch: 16 Global Step: 346580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:08,404-Speed 6331.86 samples/sec Loss 5.3766 LearningRate 0.0004 Epoch: 16 Global Step: 346590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:11,650-Speed 6312.46 samples/sec Loss 5.3952 LearningRate 0.0004 Epoch: 16 Global Step: 346600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:14,899-Speed 6304.16 samples/sec Loss 5.3698 LearningRate 0.0004 Epoch: 16 Global Step: 346610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:18,149-Speed 6302.85 samples/sec Loss 5.4311 LearningRate 0.0004 Epoch: 16 Global Step: 346620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:21,394-Speed 6312.01 samples/sec Loss 5.3917 LearningRate 0.0004 Epoch: 16 Global Step: 346630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:24,643-Speed 6306.30 samples/sec Loss 5.3238 LearningRate 0.0004 Epoch: 16 Global Step: 346640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:27,888-Speed 6311.89 samples/sec Loss 5.4743 LearningRate 0.0004 Epoch: 16 Global Step: 346650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:31,136-Speed 6306.88 samples/sec Loss 5.5014 LearningRate 0.0004 Epoch: 16 Global Step: 346660 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:34,380-Speed 6313.88 samples/sec Loss 5.3899 LearningRate 0.0004 Epoch: 16 Global Step: 346670 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:37,626-Speed 6312.02 samples/sec Loss 5.4078 LearningRate 0.0004 Epoch: 16 Global Step: 346680 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:15:40,874-Speed 6306.51 samples/sec Loss 5.4294 LearningRate 0.0004 Epoch: 16 Global Step: 346690 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:44,122-Speed 6306.27 samples/sec Loss 5.3285 LearningRate 0.0004 Epoch: 16 Global Step: 346700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:47,369-Speed 6309.50 samples/sec Loss 5.4715 LearningRate 0.0004 Epoch: 16 Global Step: 346710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:50,626-Speed 6288.78 samples/sec Loss 5.3740 LearningRate 0.0004 Epoch: 16 Global Step: 346720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:53,874-Speed 6306.49 samples/sec Loss 5.3808 LearningRate 0.0004 Epoch: 16 Global Step: 346730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:15:57,121-Speed 6308.67 samples/sec Loss 5.4174 LearningRate 0.0004 Epoch: 16 Global Step: 346740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:00,367-Speed 6312.38 samples/sec Loss 5.3714 LearningRate 0.0004 Epoch: 16 Global Step: 346750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:03,657-Speed 6225.35 samples/sec Loss 5.4036 LearningRate 0.0004 Epoch: 16 Global Step: 346760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:06,909-Speed 6300.13 samples/sec Loss 5.4374 LearningRate 0.0004 Epoch: 16 Global Step: 346770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:10,167-Speed 6285.84 samples/sec Loss 5.3926 LearningRate 0.0004 Epoch: 16 Global Step: 346780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:13,404-Speed 6328.00 samples/sec Loss 5.3582 LearningRate 0.0004 Epoch: 16 Global Step: 346790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:16,651-Speed 6309.50 samples/sec Loss 5.3702 LearningRate 0.0004 Epoch: 16 Global Step: 346800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:19,895-Speed 6314.89 samples/sec Loss 5.4302 LearningRate 0.0004 Epoch: 16 Global Step: 346810 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:16:23,144-Speed 6306.50 samples/sec Loss 5.3729 LearningRate 0.0004 Epoch: 16 Global Step: 346820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:26,393-Speed 6303.87 samples/sec Loss 5.3233 LearningRate 0.0004 Epoch: 16 Global Step: 346830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:29,638-Speed 6313.15 samples/sec Loss 5.3424 LearningRate 0.0004 Epoch: 16 Global Step: 346840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:32,887-Speed 6304.50 samples/sec Loss 5.4229 LearningRate 0.0004 Epoch: 16 Global Step: 346850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:36,136-Speed 6306.11 samples/sec Loss 5.3436 LearningRate 0.0004 Epoch: 16 Global Step: 346860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:39,420-Speed 6235.93 samples/sec Loss 5.4554 LearningRate 0.0004 Epoch: 16 Global Step: 346870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:42,703-Speed 6239.85 samples/sec Loss 5.4246 LearningRate 0.0004 Epoch: 16 Global Step: 346880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:45,946-Speed 6317.92 samples/sec Loss 5.3708 LearningRate 0.0004 Epoch: 16 Global Step: 346890 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 23:16:49,179-Speed 6335.09 samples/sec Loss 5.4212 LearningRate 0.0004 Epoch: 16 Global Step: 346900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:52,427-Speed 6306.93 samples/sec Loss 5.4786 LearningRate 0.0004 Epoch: 16 Global Step: 346910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:55,672-Speed 6313.49 samples/sec Loss 5.3894 LearningRate 0.0004 Epoch: 16 Global Step: 346920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:16:58,929-Speed 6289.24 samples/sec Loss 5.4079 LearningRate 0.0004 Epoch: 16 Global Step: 346930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:17:02,174-Speed 6312.46 samples/sec Loss 5.3638 LearningRate 0.0004 Epoch: 16 Global Step: 346940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:05,417-Speed 6315.37 samples/sec Loss 5.3051 LearningRate 0.0004 Epoch: 16 Global Step: 346950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:08,649-Speed 6339.28 samples/sec Loss 5.3694 LearningRate 0.0004 Epoch: 16 Global Step: 346960 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:11,900-Speed 6300.04 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 16 Global Step: 346970 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:15,148-Speed 6307.42 samples/sec Loss 5.4147 LearningRate 0.0004 Epoch: 16 Global Step: 346980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:18,391-Speed 6316.05 samples/sec Loss 5.3451 LearningRate 0.0004 Epoch: 16 Global Step: 346990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:21,635-Speed 6314.26 samples/sec Loss 5.3446 LearningRate 0.0004 Epoch: 16 Global Step: 347000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:24,883-Speed 6307.04 samples/sec Loss 5.4551 LearningRate 0.0004 Epoch: 16 Global Step: 347010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:28,128-Speed 6312.99 samples/sec Loss 5.3488 LearningRate 0.0004 Epoch: 16 Global Step: 347020 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:31,376-Speed 6308.36 samples/sec Loss 5.3977 LearningRate 0.0004 Epoch: 16 Global Step: 347030 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:34,622-Speed 6310.15 samples/sec Loss 5.4161 LearningRate 0.0004 Epoch: 16 Global Step: 347040 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:37,865-Speed 6317.40 samples/sec Loss 5.3891 LearningRate 0.0004 Epoch: 16 Global Step: 347050 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:41,105-Speed 6322.62 
samples/sec Loss 5.4212 LearningRate 0.0004 Epoch: 16 Global Step: 347060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:44,353-Speed 6306.93 samples/sec Loss 5.3677 LearningRate 0.0004 Epoch: 16 Global Step: 347070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:47,600-Speed 6308.06 samples/sec Loss 5.4355 LearningRate 0.0004 Epoch: 16 Global Step: 347080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:50,847-Speed 6308.19 samples/sec Loss 5.3971 LearningRate 0.0004 Epoch: 16 Global Step: 347090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:17:54,082-Speed 6332.29 samples/sec Loss 5.2943 LearningRate 0.0004 Epoch: 16 Global Step: 347100 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:17:57,329-Speed 6309.25 samples/sec Loss 5.4411 LearningRate 0.0004 Epoch: 16 Global Step: 347110 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:00,576-Speed 6307.79 samples/sec Loss 5.4320 LearningRate 0.0004 Epoch: 16 Global Step: 347120 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:03,828-Speed 6299.13 samples/sec Loss 5.3735 LearningRate 0.0004 Epoch: 16 Global Step: 347130 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:07,076-Speed 6306.72 samples/sec Loss 5.3655 LearningRate 0.0004 Epoch: 16 Global Step: 347140 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:10,319-Speed 6317.18 samples/sec Loss 5.3383 LearningRate 0.0004 Epoch: 16 Global Step: 347150 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:13,566-Speed 6308.67 samples/sec Loss 5.3819 LearningRate 0.0004 Epoch: 16 Global Step: 347160 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:16,809-Speed 6317.43 samples/sec Loss 5.2805 LearningRate 0.0004 Epoch: 16 Global Step: 347170 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:20,055-Speed 6308.83 samples/sec Loss 5.4039 
LearningRate 0.0004 Epoch: 16 Global Step: 347180 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:23,298-Speed 6317.20 samples/sec Loss 5.2946 LearningRate 0.0004 Epoch: 16 Global Step: 347190 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:26,541-Speed 6317.65 samples/sec Loss 5.3212 LearningRate 0.0004 Epoch: 16 Global Step: 347200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:29,786-Speed 6311.97 samples/sec Loss 5.3796 LearningRate 0.0004 Epoch: 16 Global Step: 347210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:33,038-Speed 6298.89 samples/sec Loss 5.4012 LearningRate 0.0004 Epoch: 16 Global Step: 347220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:36,285-Speed 6309.43 samples/sec Loss 5.4468 LearningRate 0.0004 Epoch: 16 Global Step: 347230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:39,527-Speed 6318.44 samples/sec Loss 5.3562 LearningRate 0.0004 Epoch: 16 Global Step: 347240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:42,774-Speed 6309.75 samples/sec Loss 5.4395 LearningRate 0.0004 Epoch: 16 Global Step: 347250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:18:46,011-Speed 6327.72 samples/sec Loss 5.3971 LearningRate 0.0004 Epoch: 16 Global Step: 347260 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:49,257-Speed 6311.21 samples/sec Loss 5.4534 LearningRate 0.0004 Epoch: 16 Global Step: 347270 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:52,501-Speed 6315.51 samples/sec Loss 5.4167 LearningRate 0.0004 Epoch: 16 Global Step: 347280 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:55,745-Speed 6313.72 samples/sec Loss 5.4164 LearningRate 0.0004 Epoch: 16 Global Step: 347290 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:18:58,989-Speed 6314.76 samples/sec Loss 5.4005 LearningRate 0.0004 Epoch: 16 
Global Step: 347300 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:02,240-Speed 6301.34 samples/sec Loss 5.2776 LearningRate 0.0004 Epoch: 16 Global Step: 347310 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:05,487-Speed 6307.26 samples/sec Loss 5.4292 LearningRate 0.0004 Epoch: 16 Global Step: 347320 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:08,727-Speed 6322.16 samples/sec Loss 5.4612 LearningRate 0.0004 Epoch: 16 Global Step: 347330 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:11,974-Speed 6309.43 samples/sec Loss 5.3497 LearningRate 0.0004 Epoch: 16 Global Step: 347340 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:15,220-Speed 6311.96 samples/sec Loss 5.3565 LearningRate 0.0004 Epoch: 16 Global Step: 347350 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:19:18,467-Speed 6308.31 samples/sec Loss 5.3553 LearningRate 0.0004 Epoch: 16 Global Step: 347360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:21,715-Speed 6307.06 samples/sec Loss 5.4064 LearningRate 0.0004 Epoch: 16 Global Step: 347370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:24,957-Speed 6318.81 samples/sec Loss 5.4617 LearningRate 0.0004 Epoch: 16 Global Step: 347380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:28,204-Speed 6307.14 samples/sec Loss 5.3271 LearningRate 0.0004 Epoch: 16 Global Step: 347390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:31,445-Speed 6320.13 samples/sec Loss 5.3931 LearningRate 0.0004 Epoch: 16 Global Step: 347400 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:34,692-Speed 6310.46 samples/sec Loss 5.4398 LearningRate 0.0004 Epoch: 16 Global Step: 347410 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:37,934-Speed 6318.20 samples/sec Loss 5.3379 LearningRate 0.0004 Epoch: 16 Global Step: 347420 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:41,182-Speed 6306.82 samples/sec Loss 5.3664 LearningRate 0.0004 Epoch: 16 Global Step: 347430 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:44,427-Speed 6313.80 samples/sec Loss 5.3087 LearningRate 0.0004 Epoch: 16 Global Step: 347440 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:47,673-Speed 6309.02 samples/sec Loss 5.3862 LearningRate 0.0004 Epoch: 16 Global Step: 347450 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:50,906-Speed 6336.76 samples/sec Loss 5.3395 LearningRate 0.0004 Epoch: 16 Global Step: 347460 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:54,157-Speed 6301.69 samples/sec Loss 5.2874 LearningRate 0.0004 Epoch: 16 Global Step: 347470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:19:57,399-Speed 6317.75 samples/sec Loss 5.4460 LearningRate 0.0004 Epoch: 16 Global Step: 347480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:00,628-Speed 6344.01 samples/sec Loss 5.4912 LearningRate 0.0004 Epoch: 16 Global Step: 347490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:03,874-Speed 6311.08 samples/sec Loss 5.3968 LearningRate 0.0004 Epoch: 16 Global Step: 347500 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:07,120-Speed 6310.32 samples/sec Loss 5.3893 LearningRate 0.0004 Epoch: 16 Global Step: 347510 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:10,366-Speed 6311.51 samples/sec Loss 5.2714 LearningRate 0.0004 Epoch: 16 Global Step: 347520 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:13,614-Speed 6305.23 samples/sec Loss 5.3805 LearningRate 0.0004 Epoch: 16 Global Step: 347530 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:16,858-Speed 6314.64 samples/sec Loss 5.3635 LearningRate 0.0004 Epoch: 16 Global Step: 347540 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 23:20:20,104-Speed 6311.22 samples/sec Loss 5.3486 LearningRate 0.0004 Epoch: 16 Global Step: 347550 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:23,354-Speed 6303.82 samples/sec Loss 5.3813 LearningRate 0.0004 Epoch: 16 Global Step: 347560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:26,597-Speed 6315.25 samples/sec Loss 5.3442 LearningRate 0.0004 Epoch: 16 Global Step: 347570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:29,843-Speed 6311.18 samples/sec Loss 5.4364 LearningRate 0.0004 Epoch: 16 Global Step: 347580 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:20:33,090-Speed 6308.74 samples/sec Loss 5.2979 LearningRate 0.0004 Epoch: 16 Global Step: 347590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:36,332-Speed 6319.09 samples/sec Loss 5.3973 LearningRate 0.0004 Epoch: 16 Global Step: 347600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:39,578-Speed 6309.20 samples/sec Loss 5.2583 LearningRate 0.0004 Epoch: 16 Global Step: 347610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:42,822-Speed 6314.67 samples/sec Loss 5.4287 LearningRate 0.0004 Epoch: 16 Global Step: 347620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:46,071-Speed 6307.11 samples/sec Loss 5.3992 LearningRate 0.0004 Epoch: 16 Global Step: 347630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:49,320-Speed 6304.00 samples/sec Loss 5.3419 LearningRate 0.0004 Epoch: 16 Global Step: 347640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:52,566-Speed 6311.57 samples/sec Loss 5.4168 LearningRate 0.0004 Epoch: 16 Global Step: 347650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:20:55,813-Speed 6309.45 samples/sec Loss 5.3108 LearningRate 0.0004 Epoch: 16 Global Step: 347660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:20:59,062-Speed 6303.39 samples/sec Loss 5.3744 LearningRate 0.0004 Epoch: 16 Global Step: 347670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:02,313-Speed 6302.10 samples/sec Loss 5.4285 LearningRate 0.0004 Epoch: 16 Global Step: 347680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:05,549-Speed 6330.76 samples/sec Loss 5.3945 LearningRate 0.0004 Epoch: 16 Global Step: 347690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:08,796-Speed 6308.01 samples/sec Loss 5.3525 LearningRate 0.0004 Epoch: 16 Global Step: 347700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:12,043-Speed 6308.68 samples/sec Loss 5.4036 LearningRate 0.0004 Epoch: 16 Global Step: 347710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:15,286-Speed 6315.83 samples/sec Loss 5.3858 LearningRate 0.0004 Epoch: 16 Global Step: 347720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:18,533-Speed 6309.68 samples/sec Loss 5.4371 LearningRate 0.0004 Epoch: 16 Global Step: 347730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:21,778-Speed 6312.50 samples/sec Loss 5.3408 LearningRate 0.0004 Epoch: 16 Global Step: 347740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:21:25,015-Speed 6328.84 samples/sec Loss 5.4388 LearningRate 0.0004 Epoch: 16 Global Step: 347750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:28,262-Speed 6308.58 samples/sec Loss 5.4216 LearningRate 0.0004 Epoch: 16 Global Step: 347760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:31,511-Speed 6304.72 samples/sec Loss 5.3864 LearningRate 0.0004 Epoch: 16 Global Step: 347770 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:34,788-Speed 6250.77 samples/sec Loss 5.3387 LearningRate 0.0004 Epoch: 16 Global Step: 347780 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:38,032-Speed 6315.58 
samples/sec Loss 5.4452 LearningRate 0.0004 Epoch: 16 Global Step: 347790 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:41,283-Speed 6299.61 samples/sec Loss 5.3926 LearningRate 0.0004 Epoch: 16 Global Step: 347800 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:44,531-Speed 6307.26 samples/sec Loss 5.4593 LearningRate 0.0004 Epoch: 16 Global Step: 347810 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:47,776-Speed 6312.63 samples/sec Loss 5.3665 LearningRate 0.0004 Epoch: 16 Global Step: 347820 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:51,025-Speed 6305.50 samples/sec Loss 5.4331 LearningRate 0.0004 Epoch: 16 Global Step: 347830 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:54,269-Speed 6313.02 samples/sec Loss 5.4480 LearningRate 0.0004 Epoch: 16 Global Step: 347840 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:21:57,520-Speed 6301.73 samples/sec Loss 5.3530 LearningRate 0.0004 Epoch: 16 Global Step: 347850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:00,769-Speed 6306.43 samples/sec Loss 5.3660 LearningRate 0.0004 Epoch: 16 Global Step: 347860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:04,012-Speed 6315.62 samples/sec Loss 5.3919 LearningRate 0.0004 Epoch: 16 Global Step: 347870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:07,259-Speed 6309.78 samples/sec Loss 5.4520 LearningRate 0.0004 Epoch: 16 Global Step: 347880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:10,503-Speed 6313.55 samples/sec Loss 5.3423 LearningRate 0.0004 Epoch: 16 Global Step: 347890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:13,755-Speed 6300.46 samples/sec Loss 5.3546 LearningRate 0.0004 Epoch: 16 Global Step: 347900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:17,000-Speed 6312.06 samples/sec Loss 5.4282 
LearningRate 0.0004 Epoch: 16 Global Step: 347910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:20,249-Speed 6303.85 samples/sec Loss 5.3612 LearningRate 0.0004 Epoch: 16 Global Step: 347920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:23,497-Speed 6307.16 samples/sec Loss 5.3484 LearningRate 0.0004 Epoch: 16 Global Step: 347930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:26,746-Speed 6304.71 samples/sec Loss 5.3037 LearningRate 0.0004 Epoch: 16 Global Step: 347940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:29,979-Speed 6336.68 samples/sec Loss 5.4692 LearningRate 0.0004 Epoch: 16 Global Step: 347950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:33,228-Speed 6304.67 samples/sec Loss 5.4278 LearningRate 0.0004 Epoch: 16 Global Step: 347960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:36,474-Speed 6312.04 samples/sec Loss 5.4477 LearningRate 0.0004 Epoch: 16 Global Step: 347970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:39,719-Speed 6311.72 samples/sec Loss 5.3847 LearningRate 0.0004 Epoch: 16 Global Step: 347980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:42,966-Speed 6309.20 samples/sec Loss 5.3635 LearningRate 0.0004 Epoch: 16 Global Step: 347990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:46,209-Speed 6316.83 samples/sec Loss 5.3937 LearningRate 0.0004 Epoch: 16 Global Step: 348000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:49,454-Speed 6311.85 samples/sec Loss 5.3691 LearningRate 0.0004 Epoch: 16 Global Step: 348010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:52,703-Speed 6304.73 samples/sec Loss 5.4838 LearningRate 0.0004 Epoch: 16 Global Step: 348020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:55,944-Speed 6319.96 samples/sec Loss 5.4414 LearningRate 0.0004 Epoch: 16 
Global Step: 348030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:22:59,191-Speed 6308.80 samples/sec Loss 5.3946 LearningRate 0.0004 Epoch: 16 Global Step: 348040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:02,426-Speed 6332.72 samples/sec Loss 5.4188 LearningRate 0.0004 Epoch: 16 Global Step: 348050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:05,671-Speed 6313.43 samples/sec Loss 5.3441 LearningRate 0.0004 Epoch: 16 Global Step: 348060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:08,919-Speed 6306.43 samples/sec Loss 5.4137 LearningRate 0.0004 Epoch: 16 Global Step: 348070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:12,164-Speed 6313.66 samples/sec Loss 5.3711 LearningRate 0.0004 Epoch: 16 Global Step: 348080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:15,414-Speed 6302.94 samples/sec Loss 5.3875 LearningRate 0.0004 Epoch: 16 Global Step: 348090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:18,662-Speed 6306.95 samples/sec Loss 5.3774 LearningRate 0.0004 Epoch: 16 Global Step: 348100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:21,910-Speed 6306.99 samples/sec Loss 5.4352 LearningRate 0.0004 Epoch: 16 Global Step: 348110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:25,154-Speed 6314.07 samples/sec Loss 5.3676 LearningRate 0.0004 Epoch: 16 Global Step: 348120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:28,397-Speed 6317.28 samples/sec Loss 5.3519 LearningRate 0.0004 Epoch: 16 Global Step: 348130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:31,673-Speed 6252.89 samples/sec Loss 5.3164 LearningRate 0.0004 Epoch: 16 Global Step: 348140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:34,907-Speed 6333.43 samples/sec Loss 5.3644 LearningRate 0.0004 Epoch: 16 Global Step: 348150 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:38,154-Speed 6308.35 samples/sec Loss 5.3345 LearningRate 0.0004 Epoch: 16 Global Step: 348160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:41,401-Speed 6308.44 samples/sec Loss 5.3923 LearningRate 0.0004 Epoch: 16 Global Step: 348170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:44,648-Speed 6309.41 samples/sec Loss 5.3329 LearningRate 0.0004 Epoch: 16 Global Step: 348180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:47,891-Speed 6316.38 samples/sec Loss 5.4366 LearningRate 0.0004 Epoch: 16 Global Step: 348190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:51,136-Speed 6312.59 samples/sec Loss 5.2986 LearningRate 0.0004 Epoch: 16 Global Step: 348200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:54,385-Speed 6304.64 samples/sec Loss 5.4085 LearningRate 0.0004 Epoch: 16 Global Step: 348210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:23:57,636-Speed 6300.91 samples/sec Loss 5.3800 LearningRate 0.0004 Epoch: 16 Global Step: 348220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:00,898-Speed 6280.52 samples/sec Loss 5.3396 LearningRate 0.0004 Epoch: 16 Global Step: 348230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:04,132-Speed 6334.46 samples/sec Loss 5.4238 LearningRate 0.0004 Epoch: 16 Global Step: 348240 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:07,379-Speed 6309.00 samples/sec Loss 5.3975 LearningRate 0.0004 Epoch: 16 Global Step: 348250 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:10,626-Speed 6307.28 samples/sec Loss 5.4076 LearningRate 0.0004 Epoch: 16 Global Step: 348260 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:13,870-Speed 6315.66 samples/sec Loss 5.3825 LearningRate 0.0004 Epoch: 16 Global Step: 348270 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 23:24:17,116-Speed 6310.90 samples/sec Loss 5.3655 LearningRate 0.0004 Epoch: 16 Global Step: 348280 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:20,362-Speed 6311.08 samples/sec Loss 5.3524 LearningRate 0.0004 Epoch: 16 Global Step: 348290 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:23,610-Speed 6306.68 samples/sec Loss 5.4055 LearningRate 0.0004 Epoch: 16 Global Step: 348300 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:26,859-Speed 6304.92 samples/sec Loss 5.3666 LearningRate 0.0004 Epoch: 16 Global Step: 348310 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:30,106-Speed 6309.47 samples/sec Loss 5.3452 LearningRate 0.0004 Epoch: 16 Global Step: 348320 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:33,358-Speed 6299.22 samples/sec Loss 5.4535 LearningRate 0.0004 Epoch: 16 Global Step: 348330 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:36,598-Speed 6322.56 samples/sec Loss 5.2835 LearningRate 0.0004 Epoch: 16 Global Step: 348340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:39,844-Speed 6309.25 samples/sec Loss 5.3949 LearningRate 0.0004 Epoch: 16 Global Step: 348350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:43,091-Speed 6308.49 samples/sec Loss 5.3142 LearningRate 0.0004 Epoch: 16 Global Step: 348360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:46,342-Speed 6301.62 samples/sec Loss 5.3742 LearningRate 0.0004 Epoch: 16 Global Step: 348370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:49,588-Speed 6310.89 samples/sec Loss 5.3037 LearningRate 0.0004 Epoch: 16 Global Step: 348380 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:24:52,835-Speed 6308.89 samples/sec Loss 5.4000 LearningRate 0.0004 Epoch: 16 Global Step: 348390 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:24:56,068-Speed 6335.39 samples/sec Loss 5.3433 LearningRate 0.0004 Epoch: 16 Global Step: 348400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:24:59,314-Speed 6311.88 samples/sec Loss 5.3828 LearningRate 0.0004 Epoch: 16 Global Step: 348410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:02,558-Speed 6313.91 samples/sec Loss 5.3394 LearningRate 0.0004 Epoch: 16 Global Step: 348420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:05,804-Speed 6311.34 samples/sec Loss 5.2652 LearningRate 0.0004 Epoch: 16 Global Step: 348430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:09,052-Speed 6306.87 samples/sec Loss 5.4217 LearningRate 0.0004 Epoch: 16 Global Step: 348440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:12,302-Speed 6303.24 samples/sec Loss 5.4628 LearningRate 0.0004 Epoch: 16 Global Step: 348450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:15,545-Speed 6316.14 samples/sec Loss 5.4172 LearningRate 0.0004 Epoch: 16 Global Step: 348460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:18,789-Speed 6313.95 samples/sec Loss 5.2835 LearningRate 0.0004 Epoch: 16 Global Step: 348470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:22,038-Speed 6305.46 samples/sec Loss 5.3626 LearningRate 0.0004 Epoch: 16 Global Step: 348480 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:25,285-Speed 6308.34 samples/sec Loss 5.3568 LearningRate 0.0004 Epoch: 16 Global Step: 348490 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:25:28,532-Speed 6309.31 samples/sec Loss 5.4695 LearningRate 0.0004 Epoch: 16 Global Step: 348500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:31,779-Speed 6309.68 samples/sec Loss 5.3819 LearningRate 0.0004 Epoch: 16 Global Step: 348510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:35,023-Speed 6313.76 
samples/sec Loss 5.4085 LearningRate 0.0004 Epoch: 16 Global Step: 348520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:38,270-Speed 6309.38 samples/sec Loss 5.4477 LearningRate 0.0004 Epoch: 16 Global Step: 348530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:41,515-Speed 6312.76 samples/sec Loss 5.3787 LearningRate 0.0004 Epoch: 16 Global Step: 348540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:44,763-Speed 6307.27 samples/sec Loss 5.3941 LearningRate 0.0004 Epoch: 16 Global Step: 348550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:48,003-Speed 6321.72 samples/sec Loss 5.3998 LearningRate 0.0004 Epoch: 16 Global Step: 348560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:51,251-Speed 6306.13 samples/sec Loss 5.4108 LearningRate 0.0004 Epoch: 16 Global Step: 348570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:54,498-Speed 6309.86 samples/sec Loss 5.4038 LearningRate 0.0004 Epoch: 16 Global Step: 348580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:25:57,744-Speed 6310.08 samples/sec Loss 5.3375 LearningRate 0.0004 Epoch: 16 Global Step: 348590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:00,975-Speed 6339.88 samples/sec Loss 5.3923 LearningRate 0.0004 Epoch: 16 Global Step: 348600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:04,220-Speed 6314.02 samples/sec Loss 5.4062 LearningRate 0.0004 Epoch: 16 Global Step: 348610 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:07,470-Speed 6302.16 samples/sec Loss 5.2769 LearningRate 0.0004 Epoch: 16 Global Step: 348620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:10,713-Speed 6316.15 samples/sec Loss 5.3825 LearningRate 0.0004 Epoch: 16 Global Step: 348630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:13,959-Speed 6310.26 samples/sec Loss 5.4295 
LearningRate 0.0004 Epoch: 16 Global Step: 348640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:17,201-Speed 6318.06 samples/sec Loss 5.3339 LearningRate 0.0004 Epoch: 16 Global Step: 348650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:20,446-Speed 6313.08 samples/sec Loss 5.3918 LearningRate 0.0004 Epoch: 16 Global Step: 348660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:23,697-Speed 6301.37 samples/sec Loss 5.3892 LearningRate 0.0004 Epoch: 16 Global Step: 348670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:26,945-Speed 6307.03 samples/sec Loss 5.3714 LearningRate 0.0004 Epoch: 16 Global Step: 348680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:30,192-Speed 6307.76 samples/sec Loss 5.3686 LearningRate 0.0004 Epoch: 16 Global Step: 348690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:33,426-Speed 6334.62 samples/sec Loss 5.3954 LearningRate 0.0004 Epoch: 16 Global Step: 348700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:26:36,657-Speed 6342.05 samples/sec Loss 5.4047 LearningRate 0.0004 Epoch: 16 Global Step: 348710 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:39,907-Speed 6301.26 samples/sec Loss 5.3373 LearningRate 0.0004 Epoch: 16 Global Step: 348720 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:43,153-Speed 6311.10 samples/sec Loss 5.3703 LearningRate 0.0004 Epoch: 16 Global Step: 348730 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:46,400-Speed 6309.32 samples/sec Loss 5.3156 LearningRate 0.0004 Epoch: 16 Global Step: 348740 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:49,645-Speed 6313.23 samples/sec Loss 5.4037 LearningRate 0.0004 Epoch: 16 Global Step: 348750 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:52,891-Speed 6309.78 samples/sec Loss 5.3574 LearningRate 0.0004 Epoch: 16 
Global Step: 348760 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:56,135-Speed 6315.55 samples/sec Loss 5.3675 LearningRate 0.0004 Epoch: 16 Global Step: 348770 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:26:59,383-Speed 6305.43 samples/sec Loss 5.4176 LearningRate 0.0004 Epoch: 16 Global Step: 348780 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:27:02,633-Speed 6304.38 samples/sec Loss 5.3346 LearningRate 0.0004 Epoch: 16 Global Step: 348790 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:27:05,878-Speed 6312.12 samples/sec Loss 5.3822 LearningRate 0.0004 Epoch: 16 Global Step: 348800 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:27:09,125-Speed 6309.20 samples/sec Loss 5.3780 LearningRate 0.0004 Epoch: 16 Global Step: 348810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:12,369-Speed 6313.06 samples/sec Loss 5.3349 LearningRate 0.0004 Epoch: 16 Global Step: 348820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:15,625-Speed 6292.64 samples/sec Loss 5.3305 LearningRate 0.0004 Epoch: 16 Global Step: 348830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:18,873-Speed 6305.79 samples/sec Loss 5.3782 LearningRate 0.0004 Epoch: 16 Global Step: 348840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:22,114-Speed 6321.38 samples/sec Loss 5.3549 LearningRate 0.0004 Epoch: 16 Global Step: 348850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:25,363-Speed 6305.89 samples/sec Loss 5.3652 LearningRate 0.0004 Epoch: 16 Global Step: 348860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:28,612-Speed 6303.29 samples/sec Loss 5.3547 LearningRate 0.0004 Epoch: 16 Global Step: 348870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:31,861-Speed 6306.07 samples/sec Loss 5.3347 LearningRate 0.0004 Epoch: 16 Global Step: 348880 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:35,107-Speed 6310.76 samples/sec Loss 5.3670 LearningRate 0.0004 Epoch: 16 Global Step: 348890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:38,355-Speed 6307.59 samples/sec Loss 5.3513 LearningRate 0.0004 Epoch: 16 Global Step: 348900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:41,587-Speed 6338.04 samples/sec Loss 5.4383 LearningRate 0.0004 Epoch: 16 Global Step: 348910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:44,830-Speed 6317.06 samples/sec Loss 5.3696 LearningRate 0.0004 Epoch: 16 Global Step: 348920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:48,084-Speed 6294.54 samples/sec Loss 5.3879 LearningRate 0.0004 Epoch: 16 Global Step: 348930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:51,332-Speed 6307.26 samples/sec Loss 5.2955 LearningRate 0.0004 Epoch: 16 Global Step: 348940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:54,577-Speed 6311.22 samples/sec Loss 5.4397 LearningRate 0.0004 Epoch: 16 Global Step: 348950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:27:57,821-Speed 6315.29 samples/sec Loss 5.3487 LearningRate 0.0004 Epoch: 16 Global Step: 348960 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:01,077-Speed 6290.54 samples/sec Loss 5.3058 LearningRate 0.0004 Epoch: 16 Global Step: 348970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:04,326-Speed 6309.34 samples/sec Loss 5.3313 LearningRate 0.0004 Epoch: 16 Global Step: 348980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:07,568-Speed 6317.23 samples/sec Loss 5.4278 LearningRate 0.0004 Epoch: 16 Global Step: 348990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:10,812-Speed 6315.28 samples/sec Loss 5.4576 LearningRate 0.0004 Epoch: 16 Global Step: 349000 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:28:14,042-Speed 6340.64 samples/sec Loss 5.3524 LearningRate 0.0004 Epoch: 16 Global Step: 349010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:17,285-Speed 6317.98 samples/sec Loss 5.3719 LearningRate 0.0004 Epoch: 16 Global Step: 349020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:20,530-Speed 6312.34 samples/sec Loss 5.3851 LearningRate 0.0004 Epoch: 16 Global Step: 349030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:23,771-Speed 6320.54 samples/sec Loss 5.3880 LearningRate 0.0004 Epoch: 16 Global Step: 349040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:27,018-Speed 6309.00 samples/sec Loss 5.3961 LearningRate 0.0004 Epoch: 16 Global Step: 349050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:30,263-Speed 6311.45 samples/sec Loss 5.3625 LearningRate 0.0004 Epoch: 16 Global Step: 349060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:33,511-Speed 6307.05 samples/sec Loss 5.4232 LearningRate 0.0004 Epoch: 16 Global Step: 349070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:36,760-Speed 6304.71 samples/sec Loss 5.3223 LearningRate 0.0004 Epoch: 16 Global Step: 349080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:40,005-Speed 6312.63 samples/sec Loss 5.4086 LearningRate 0.0004 Epoch: 16 Global Step: 349090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:43,255-Speed 6302.36 samples/sec Loss 5.3274 LearningRate 0.0004 Epoch: 16 Global Step: 349100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:46,484-Speed 6344.08 samples/sec Loss 5.3918 LearningRate 0.0004 Epoch: 16 Global Step: 349110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:49,738-Speed 6296.77 samples/sec Loss 5.3806 LearningRate 0.0004 Epoch: 16 Global Step: 349120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:28:52,985-Speed 6307.71 samples/sec Loss 5.4073 LearningRate 0.0004 Epoch: 16 Global Step: 349130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:56,228-Speed 6318.25 samples/sec Loss 5.3659 LearningRate 0.0004 Epoch: 16 Global Step: 349140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:28:59,474-Speed 6309.45 samples/sec Loss 5.3274 LearningRate 0.0004 Epoch: 16 Global Step: 349150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:02,720-Speed 6310.98 samples/sec Loss 5.3272 LearningRate 0.0004 Epoch: 16 Global Step: 349160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:05,963-Speed 6317.78 samples/sec Loss 5.2983 LearningRate 0.0004 Epoch: 16 Global Step: 349170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:09,207-Speed 6313.34 samples/sec Loss 5.3421 LearningRate 0.0004 Epoch: 16 Global Step: 349180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:12,455-Speed 6307.29 samples/sec Loss 5.3362 LearningRate 0.0004 Epoch: 16 Global Step: 349190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:15,702-Speed 6309.08 samples/sec Loss 5.3292 LearningRate 0.0004 Epoch: 16 Global Step: 349200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:18,931-Speed 6343.01 samples/sec Loss 5.3876 LearningRate 0.0004 Epoch: 16 Global Step: 349210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:22,179-Speed 6306.36 samples/sec Loss 5.4193 LearningRate 0.0004 Epoch: 16 Global Step: 349220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:25,423-Speed 6315.51 samples/sec Loss 5.4147 LearningRate 0.0004 Epoch: 16 Global Step: 349230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:28,667-Speed 6314.76 samples/sec Loss 5.3685 LearningRate 0.0004 Epoch: 16 Global Step: 349240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:31,915-Speed 6307.18 
samples/sec Loss 5.3223 LearningRate 0.0004 Epoch: 16 Global Step: 349250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:35,161-Speed 6310.12 samples/sec Loss 5.4092 LearningRate 0.0004 Epoch: 16 Global Step: 349260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:38,411-Speed 6303.35 samples/sec Loss 5.3789 LearningRate 0.0004 Epoch: 16 Global Step: 349270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:41,652-Speed 6320.04 samples/sec Loss 5.3683 LearningRate 0.0004 Epoch: 16 Global Step: 349280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:44,901-Speed 6304.92 samples/sec Loss 5.4262 LearningRate 0.0004 Epoch: 16 Global Step: 349290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:48,148-Speed 6309.47 samples/sec Loss 5.3928 LearningRate 0.0004 Epoch: 16 Global Step: 349300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:51,394-Speed 6309.61 samples/sec Loss 5.4034 LearningRate 0.0004 Epoch: 16 Global Step: 349310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:54,647-Speed 6296.52 samples/sec Loss 5.3322 LearningRate 0.0004 Epoch: 16 Global Step: 349320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:29:57,905-Speed 6288.30 samples/sec Loss 5.3394 LearningRate 0.0004 Epoch: 16 Global Step: 349330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:01,152-Speed 6308.94 samples/sec Loss 5.3637 LearningRate 0.0004 Epoch: 16 Global Step: 349340 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:04,397-Speed 6312.70 samples/sec Loss 5.3507 LearningRate 0.0004 Epoch: 16 Global Step: 349350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:07,648-Speed 6301.80 samples/sec Loss 5.3924 LearningRate 0.0004 Epoch: 16 Global Step: 349360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:10,883-Speed 6332.83 samples/sec Loss 5.3419 
LearningRate 0.0004 Epoch: 16 Global Step: 349370 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:14,127-Speed 6314.10 samples/sec Loss 5.3186 LearningRate 0.0004 Epoch: 16 Global Step: 349380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:17,375-Speed 6305.26 samples/sec Loss 5.2885 LearningRate 0.0004 Epoch: 16 Global Step: 349390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:20,619-Speed 6316.08 samples/sec Loss 5.3857 LearningRate 0.0004 Epoch: 16 Global Step: 349400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:23,866-Speed 6308.51 samples/sec Loss 5.3438 LearningRate 0.0004 Epoch: 16 Global Step: 349410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:27,115-Speed 6304.75 samples/sec Loss 5.3508 LearningRate 0.0004 Epoch: 16 Global Step: 349420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:30,362-Speed 6308.80 samples/sec Loss 5.3338 LearningRate 0.0004 Epoch: 16 Global Step: 349430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:33,610-Speed 6306.79 samples/sec Loss 5.3849 LearningRate 0.0004 Epoch: 16 Global Step: 349440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:36,864-Speed 6294.51 samples/sec Loss 5.3557 LearningRate 0.0004 Epoch: 16 Global Step: 349450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:40,112-Speed 6306.97 samples/sec Loss 5.3865 LearningRate 0.0004 Epoch: 16 Global Step: 349460 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:30:43,357-Speed 6312.84 samples/sec Loss 5.2926 LearningRate 0.0004 Epoch: 16 Global Step: 349470 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:46,606-Speed 6304.03 samples/sec Loss 5.3736 LearningRate 0.0004 Epoch: 16 Global Step: 349480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:49,855-Speed 6305.57 samples/sec Loss 5.3718 LearningRate 0.0004 Epoch: 16 
Global Step: 349490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:53,101-Speed 6311.53 samples/sec Loss 5.2915 LearningRate 0.0004 Epoch: 16 Global Step: 349500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:56,348-Speed 6307.34 samples/sec Loss 5.3255 LearningRate 0.0004 Epoch: 16 Global Step: 349510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:30:59,591-Speed 6316.55 samples/sec Loss 5.4076 LearningRate 0.0004 Epoch: 16 Global Step: 349520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:02,839-Speed 6307.78 samples/sec Loss 5.3509 LearningRate 0.0004 Epoch: 16 Global Step: 349530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:06,083-Speed 6315.17 samples/sec Loss 5.3873 LearningRate 0.0004 Epoch: 16 Global Step: 349540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:09,331-Speed 6307.74 samples/sec Loss 5.3500 LearningRate 0.0004 Epoch: 16 Global Step: 349550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:12,575-Speed 6314.04 samples/sec Loss 5.3309 LearningRate 0.0004 Epoch: 16 Global Step: 349560 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:15,811-Speed 6330.65 samples/sec Loss 5.3772 LearningRate 0.0004 Epoch: 16 Global Step: 349570 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:19,104-Speed 6219.79 samples/sec Loss 5.2903 LearningRate 0.0004 Epoch: 16 Global Step: 349580 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:22,375-Speed 6263.94 samples/sec Loss 5.3776 LearningRate 0.0004 Epoch: 16 Global Step: 349590 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:25,621-Speed 6310.46 samples/sec Loss 5.4125 LearningRate 0.0004 Epoch: 16 Global Step: 349600 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:28,866-Speed 6311.86 samples/sec Loss 5.3281 LearningRate 0.0004 Epoch: 16 Global Step: 349610 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:32,109-Speed 6317.06 samples/sec Loss 5.3324 LearningRate 0.0004 Epoch: 16 Global Step: 349620 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:35,583-Speed 5896.24 samples/sec Loss 5.4634 LearningRate 0.0004 Epoch: 16 Global Step: 349630 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:38,831-Speed 6307.35 samples/sec Loss 5.3623 LearningRate 0.0004 Epoch: 16 Global Step: 349640 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:42,076-Speed 6311.48 samples/sec Loss 5.3294 LearningRate 0.0004 Epoch: 16 Global Step: 349650 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:45,325-Speed 6305.04 samples/sec Loss 5.3613 LearningRate 0.0004 Epoch: 16 Global Step: 349660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:48,555-Speed 6340.91 samples/sec Loss 5.3663 LearningRate 0.0004 Epoch: 16 Global Step: 349670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:51,802-Speed 6310.17 samples/sec Loss 5.3492 LearningRate 0.0004 Epoch: 16 Global Step: 349680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:55,055-Speed 6297.31 samples/sec Loss 5.3016 LearningRate 0.0004 Epoch: 16 Global Step: 349690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:31:58,302-Speed 6307.63 samples/sec Loss 5.3299 LearningRate 0.0004 Epoch: 16 Global Step: 349700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:01,550-Speed 6308.23 samples/sec Loss 5.3611 LearningRate 0.0004 Epoch: 16 Global Step: 349710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:04,793-Speed 6316.23 samples/sec Loss 5.3619 LearningRate 0.0004 Epoch: 16 Global Step: 349720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:08,041-Speed 6306.57 samples/sec Loss 5.4001 LearningRate 0.0004 Epoch: 16 Global Step: 349730 Fp16 Grad Scale: 32768 Required: 44 hours 
Training: 2022-04-01 23:32:11,287-Speed 6309.82 samples/sec Loss 5.2988 LearningRate 0.0004 Epoch: 16 Global Step: 349740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:14,541-Speed 6295.75 samples/sec Loss 5.4561 LearningRate 0.0004 Epoch: 16 Global Step: 349750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:17,785-Speed 6315.46 samples/sec Loss 5.2930 LearningRate 0.0004 Epoch: 16 Global Step: 349760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:21,028-Speed 6317.44 samples/sec Loss 5.3345 LearningRate 0.0004 Epoch: 16 Global Step: 349770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:24,274-Speed 6309.77 samples/sec Loss 5.3803 LearningRate 0.0004 Epoch: 16 Global Step: 349780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:27,616-Speed 6130.00 samples/sec Loss 5.2993 LearningRate 0.0004 Epoch: 16 Global Step: 349790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:30,921-Speed 6198.23 samples/sec Loss 5.3839 LearningRate 0.0004 Epoch: 16 Global Step: 349800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:34,166-Speed 6311.57 samples/sec Loss 5.4184 LearningRate 0.0004 Epoch: 16 Global Step: 349810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:37,419-Speed 6298.56 samples/sec Loss 5.3561 LearningRate 0.0004 Epoch: 16 Global Step: 349820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:40,668-Speed 6303.94 samples/sec Loss 5.3371 LearningRate 0.0004 Epoch: 16 Global Step: 349830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:43,916-Speed 6306.79 samples/sec Loss 5.3546 LearningRate 0.0004 Epoch: 16 Global Step: 349840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:47,161-Speed 6312.44 samples/sec Loss 5.3494 LearningRate 0.0004 Epoch: 16 Global Step: 349850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 
23:32:50,433-Speed 6259.95 samples/sec Loss 5.3796 LearningRate 0.0004 Epoch: 16 Global Step: 349860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:53,668-Speed 6333.05 samples/sec Loss 5.3853 LearningRate 0.0004 Epoch: 16 Global Step: 349870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:32:56,913-Speed 6312.58 samples/sec Loss 5.3777 LearningRate 0.0004 Epoch: 16 Global Step: 349880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:00,170-Speed 6288.32 samples/sec Loss 5.3155 LearningRate 0.0004 Epoch: 16 Global Step: 349890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:03,414-Speed 6315.11 samples/sec Loss 5.3785 LearningRate 0.0004 Epoch: 16 Global Step: 349900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:06,660-Speed 6310.52 samples/sec Loss 5.3485 LearningRate 0.0004 Epoch: 16 Global Step: 349910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:09,893-Speed 6336.05 samples/sec Loss 5.3939 LearningRate 0.0004 Epoch: 16 Global Step: 349920 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:13,140-Speed 6308.90 samples/sec Loss 5.4058 LearningRate 0.0004 Epoch: 16 Global Step: 349930 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:16,384-Speed 6314.95 samples/sec Loss 5.3925 LearningRate 0.0004 Epoch: 16 Global Step: 349940 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:19,638-Speed 6295.45 samples/sec Loss 5.3459 LearningRate 0.0004 Epoch: 16 Global Step: 349950 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:22,883-Speed 6312.78 samples/sec Loss 5.3384 LearningRate 0.0004 Epoch: 16 Global Step: 349960 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:26,129-Speed 6311.20 samples/sec Loss 5.3561 LearningRate 0.0004 Epoch: 16 Global Step: 349970 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:29,410-Speed 6244.32 
samples/sec Loss 5.3523 LearningRate 0.0004 Epoch: 16 Global Step: 349980 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:32,667-Speed 6289.30 samples/sec Loss 5.4002 LearningRate 0.0004 Epoch: 16 Global Step: 349990 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:35,934-Speed 6270.74 samples/sec Loss 5.3643 LearningRate 0.0004 Epoch: 16 Global Step: 350000 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:39,175-Speed 6319.98 samples/sec Loss 5.3740 LearningRate 0.0004 Epoch: 16 Global Step: 350010 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:33:42,424-Speed 6304.18 samples/sec Loss 5.3918 LearningRate 0.0004 Epoch: 16 Global Step: 350020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:45,673-Speed 6304.27 samples/sec Loss 5.3193 LearningRate 0.0004 Epoch: 16 Global Step: 350030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:48,922-Speed 6306.36 samples/sec Loss 5.2680 LearningRate 0.0004 Epoch: 16 Global Step: 350040 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:52,167-Speed 6311.60 samples/sec Loss 5.3331 LearningRate 0.0004 Epoch: 16 Global Step: 350050 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:55,415-Speed 6307.69 samples/sec Loss 5.4081 LearningRate 0.0004 Epoch: 16 Global Step: 350060 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:33:58,658-Speed 6315.68 samples/sec Loss 5.3097 LearningRate 0.0004 Epoch: 16 Global Step: 350070 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:01,911-Speed 6297.94 samples/sec Loss 5.3547 LearningRate 0.0004 Epoch: 16 Global Step: 350080 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:05,158-Speed 6308.10 samples/sec Loss 5.2891 LearningRate 0.0004 Epoch: 16 Global Step: 350090 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:08,406-Speed 6308.26 samples/sec Loss 5.3816 
LearningRate 0.0004 Epoch: 16 Global Step: 350100 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:11,654-Speed 6306.58 samples/sec Loss 5.4316 LearningRate 0.0004 Epoch: 16 Global Step: 350110 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:14,895-Speed 6320.17 samples/sec Loss 5.3739 LearningRate 0.0004 Epoch: 16 Global Step: 350120 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:18,142-Speed 6307.91 samples/sec Loss 5.3125 LearningRate 0.0004 Epoch: 16 Global Step: 350130 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:21,404-Speed 6280.21 samples/sec Loss 5.3762 LearningRate 0.0004 Epoch: 16 Global Step: 350140 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:24,647-Speed 6315.30 samples/sec Loss 5.4755 LearningRate 0.0004 Epoch: 16 Global Step: 350150 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:27,901-Speed 6294.97 samples/sec Loss 5.4063 LearningRate 0.0004 Epoch: 16 Global Step: 350160 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:31,150-Speed 6306.62 samples/sec Loss 5.3671 LearningRate 0.0004 Epoch: 16 Global Step: 350170 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:34,394-Speed 6313.41 samples/sec Loss 5.3593 LearningRate 0.0004 Epoch: 16 Global Step: 350180 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:37,643-Speed 6305.65 samples/sec Loss 5.3682 LearningRate 0.0004 Epoch: 16 Global Step: 350190 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:40,893-Speed 6303.53 samples/sec Loss 5.3887 LearningRate 0.0004 Epoch: 16 Global Step: 350200 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:44,142-Speed 6305.18 samples/sec Loss 5.3643 LearningRate 0.0004 Epoch: 16 Global Step: 350210 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:47,377-Speed 6331.29 samples/sec Loss 5.3428 LearningRate 0.0004 Epoch: 16 
Global Step: 350220 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:50,627-Speed 6304.05 samples/sec Loss 5.4072 LearningRate 0.0004 Epoch: 16 Global Step: 350230 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:53,877-Speed 6303.78 samples/sec Loss 5.3655 LearningRate 0.0004 Epoch: 16 Global Step: 350240 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:34:57,121-Speed 6314.65 samples/sec Loss 5.4545 LearningRate 0.0004 Epoch: 16 Global Step: 350250 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:00,375-Speed 6294.12 samples/sec Loss 5.3685 LearningRate 0.0004 Epoch: 16 Global Step: 350260 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:03,619-Speed 6314.50 samples/sec Loss 5.4091 LearningRate 0.0004 Epoch: 16 Global Step: 350270 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:06,869-Speed 6303.75 samples/sec Loss 5.3654 LearningRate 0.0004 Epoch: 16 Global Step: 350280 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:10,117-Speed 6306.07 samples/sec Loss 5.4238 LearningRate 0.0004 Epoch: 16 Global Step: 350290 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:13,365-Speed 6307.25 samples/sec Loss 5.3661 LearningRate 0.0004 Epoch: 16 Global Step: 350300 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:16,611-Speed 6309.80 samples/sec Loss 5.3289 LearningRate 0.0004 Epoch: 16 Global Step: 350310 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:19,841-Speed 6342.39 samples/sec Loss 5.3154 LearningRate 0.0004 Epoch: 16 Global Step: 350320 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:23,091-Speed 6304.46 samples/sec Loss 5.3681 LearningRate 0.0004 Epoch: 16 Global Step: 350330 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:26,337-Speed 6310.53 samples/sec Loss 5.3470 LearningRate 0.0004 Epoch: 16 Global Step: 350340 Fp16 Grad 
Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:29,583-Speed 6310.40 samples/sec Loss 5.3620 LearningRate 0.0004 Epoch: 16 Global Step: 350350 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:32,825-Speed 6317.37 samples/sec Loss 5.3304 LearningRate 0.0004 Epoch: 16 Global Step: 350360 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:36,072-Speed 6309.45 samples/sec Loss 5.3979 LearningRate 0.0004 Epoch: 16 Global Step: 350370 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:35:39,308-Speed 6330.87 samples/sec Loss 5.4217 LearningRate 0.0004 Epoch: 16 Global Step: 350380 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:42,554-Speed 6310.97 samples/sec Loss 5.3356 LearningRate 0.0004 Epoch: 16 Global Step: 350390 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:45,798-Speed 6314.35 samples/sec Loss 5.3794 LearningRate 0.0004 Epoch: 16 Global Step: 350400 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:49,043-Speed 6311.92 samples/sec Loss 5.4079 LearningRate 0.0004 Epoch: 16 Global Step: 350410 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:52,289-Speed 6311.13 samples/sec Loss 5.3332 LearningRate 0.0004 Epoch: 16 Global Step: 350420 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:55,533-Speed 6315.66 samples/sec Loss 5.2597 LearningRate 0.0004 Epoch: 16 Global Step: 350430 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:35:58,780-Speed 6308.18 samples/sec Loss 5.3285 LearningRate 0.0004 Epoch: 16 Global Step: 350440 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:02,046-Speed 6273.04 samples/sec Loss 5.3193 LearningRate 0.0004 Epoch: 16 Global Step: 350450 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:05,410-Speed 6088.10 samples/sec Loss 5.3702 LearningRate 0.0004 Epoch: 16 Global Step: 350460 Fp16 Grad Scale: 16384 Required: 44 hours 
Training: 2022-04-01 23:36:08,667-Speed 6290.29 samples/sec Loss 5.4290 LearningRate 0.0004 Epoch: 16 Global Step: 350470 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:11,912-Speed 6312.00 samples/sec Loss 5.2986 LearningRate 0.0004 Epoch: 16 Global Step: 350480 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:15,158-Speed 6310.97 samples/sec Loss 5.4253 LearningRate 0.0004 Epoch: 16 Global Step: 350490 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:18,403-Speed 6311.53 samples/sec Loss 5.3657 LearningRate 0.0004 Epoch: 16 Global Step: 350500 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:21,650-Speed 6308.95 samples/sec Loss 5.3372 LearningRate 0.0004 Epoch: 16 Global Step: 350510 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:24,897-Speed 6308.69 samples/sec Loss 5.2897 LearningRate 0.0004 Epoch: 16 Global Step: 350520 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:28,141-Speed 6315.50 samples/sec Loss 5.3866 LearningRate 0.0004 Epoch: 16 Global Step: 350530 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:31,400-Speed 6286.08 samples/sec Loss 5.3654 LearningRate 0.0004 Epoch: 16 Global Step: 350540 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:34,648-Speed 6306.11 samples/sec Loss 5.3177 LearningRate 0.0004 Epoch: 16 Global Step: 350550 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:36:37,880-Speed 6338.64 samples/sec Loss 5.3797 LearningRate 0.0004 Epoch: 16 Global Step: 350560 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:41,123-Speed 6315.39 samples/sec Loss 5.3704 LearningRate 0.0004 Epoch: 16 Global Step: 350570 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:44,367-Speed 6315.02 samples/sec Loss 5.3889 LearningRate 0.0004 Epoch: 16 Global Step: 350580 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 
23:36:47,611-Speed 6314.96 samples/sec Loss 5.3596 LearningRate 0.0004 Epoch: 16 Global Step: 350590 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:50,855-Speed 6315.59 samples/sec Loss 5.3747 LearningRate 0.0004 Epoch: 16 Global Step: 350600 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:54,098-Speed 6314.86 samples/sec Loss 5.4074 LearningRate 0.0004 Epoch: 16 Global Step: 350610 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:36:57,344-Speed 6310.88 samples/sec Loss 5.3678 LearningRate 0.0004 Epoch: 16 Global Step: 350620 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:37:00,595-Speed 6300.56 samples/sec Loss 5.2931 LearningRate 0.0004 Epoch: 16 Global Step: 350630 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:37:03,841-Speed 6312.10 samples/sec Loss 5.3073 LearningRate 0.0004 Epoch: 16 Global Step: 350640 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:37:07,085-Speed 6314.26 samples/sec Loss 5.4104 LearningRate 0.0004 Epoch: 16 Global Step: 350650 Fp16 Grad Scale: 16384 Required: 44 hours Training: 2022-04-01 23:37:10,331-Speed 6311.47 samples/sec Loss 5.3679 LearningRate 0.0004 Epoch: 16 Global Step: 350660 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:13,576-Speed 6312.74 samples/sec Loss 5.3886 LearningRate 0.0004 Epoch: 16 Global Step: 350670 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:16,820-Speed 6313.50 samples/sec Loss 5.3921 LearningRate 0.0004 Epoch: 16 Global Step: 350680 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:20,078-Speed 6288.64 samples/sec Loss 5.3475 LearningRate 0.0004 Epoch: 16 Global Step: 350690 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:23,325-Speed 6309.20 samples/sec Loss 5.3363 LearningRate 0.0004 Epoch: 16 Global Step: 350700 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:26,571-Speed 6309.10 
samples/sec Loss 5.3371 LearningRate 0.0004 Epoch: 16 Global Step: 350710 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:29,818-Speed 6309.62 samples/sec Loss 5.2821 LearningRate 0.0004 Epoch: 16 Global Step: 350720 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:33,058-Speed 6321.95 samples/sec Loss 5.3459 LearningRate 0.0004 Epoch: 16 Global Step: 350730 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:36,307-Speed 6306.74 samples/sec Loss 5.3221 LearningRate 0.0004 Epoch: 16 Global Step: 350740 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:39,553-Speed 6309.54 samples/sec Loss 5.2981 LearningRate 0.0004 Epoch: 16 Global Step: 350750 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:42,783-Speed 6343.38 samples/sec Loss 5.3288 LearningRate 0.0004 Epoch: 16 Global Step: 350760 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:46,032-Speed 6303.80 samples/sec Loss 5.3509 LearningRate 0.0004 Epoch: 16 Global Step: 350770 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:49,279-Speed 6309.06 samples/sec Loss 5.4137 LearningRate 0.0004 Epoch: 16 Global Step: 350780 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:52,526-Speed 6307.88 samples/sec Loss 5.3836 LearningRate 0.0004 Epoch: 16 Global Step: 350790 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:55,776-Speed 6304.34 samples/sec Loss 5.3919 LearningRate 0.0004 Epoch: 16 Global Step: 350800 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:37:59,021-Speed 6311.33 samples/sec Loss 5.3811 LearningRate 0.0004 Epoch: 16 Global Step: 350810 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:02,270-Speed 6305.67 samples/sec Loss 5.3693 LearningRate 0.0004 Epoch: 16 Global Step: 350820 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:05,525-Speed 6291.84 samples/sec Loss 5.4064 
LearningRate 0.0004 Epoch: 16 Global Step: 350830 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:08,775-Speed 6304.85 samples/sec Loss 5.2834 LearningRate 0.0004 Epoch: 16 Global Step: 350840 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:12,020-Speed 6312.44 samples/sec Loss 5.3552 LearningRate 0.0004 Epoch: 16 Global Step: 350850 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:15,258-Speed 6325.88 samples/sec Loss 5.3682 LearningRate 0.0004 Epoch: 16 Global Step: 350860 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:18,503-Speed 6312.78 samples/sec Loss 5.2801 LearningRate 0.0004 Epoch: 16 Global Step: 350870 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:21,745-Speed 6318.32 samples/sec Loss 5.3785 LearningRate 0.0004 Epoch: 16 Global Step: 350880 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:24,991-Speed 6311.63 samples/sec Loss 5.4174 LearningRate 0.0004 Epoch: 16 Global Step: 350890 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:28,237-Speed 6310.46 samples/sec Loss 5.3375 LearningRate 0.0004 Epoch: 16 Global Step: 350900 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:31,480-Speed 6315.66 samples/sec Loss 5.3958 LearningRate 0.0004 Epoch: 16 Global Step: 350910 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:34,725-Speed 6313.37 samples/sec Loss 5.4769 LearningRate 0.0004 Epoch: 16 Global Step: 350920 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:37,966-Speed 6320.02 samples/sec Loss 5.2439 LearningRate 0.0004 Epoch: 16 Global Step: 350930 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:41,211-Speed 6312.73 samples/sec Loss 5.3641 LearningRate 0.0004 Epoch: 16 Global Step: 350940 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:44,458-Speed 6310.00 samples/sec Loss 5.3050 LearningRate 0.0004 Epoch: 16 
Global Step: 350950 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:47,699-Speed 6319.32 samples/sec Loss 5.3516 LearningRate 0.0004 Epoch: 16 Global Step: 350960 Fp16 Grad Scale: 65536 Required: 44 hours Training: 2022-04-01 23:38:50,936-Speed 6328.90 samples/sec Loss 5.3182 LearningRate 0.0004 Epoch: 16 Global Step: 350970 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:54,183-Speed 6308.70 samples/sec Loss 5.3546 LearningRate 0.0004 Epoch: 16 Global Step: 350980 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:38:57,427-Speed 6314.92 samples/sec Loss 5.3750 LearningRate 0.0004 Epoch: 16 Global Step: 350990 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:39:00,673-Speed 6310.92 samples/sec Loss 5.3899 LearningRate 0.0004 Epoch: 16 Global Step: 351000 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:39:03,933-Speed 6283.45 samples/sec Loss 5.3370 LearningRate 0.0004 Epoch: 16 Global Step: 351010 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:39:07,178-Speed 6311.64 samples/sec Loss 5.3730 LearningRate 0.0004 Epoch: 16 Global Step: 351020 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:39:10,427-Speed 6305.15 samples/sec Loss 5.4046 LearningRate 0.0004 Epoch: 16 Global Step: 351030 Fp16 Grad Scale: 32768 Required: 44 hours Training: 2022-04-01 23:39:13,676-Speed 6305.72 samples/sec Loss 5.4099 LearningRate 0.0004 Epoch: 16 Global Step: 351040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:16,938-Speed 6279.85 samples/sec Loss 5.3317 LearningRate 0.0004 Epoch: 16 Global Step: 351050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:20,206-Speed 6267.64 samples/sec Loss 5.3680 LearningRate 0.0004 Epoch: 16 Global Step: 351060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:23,437-Speed 6339.47 samples/sec Loss 5.3560 LearningRate 0.0004 Epoch: 16 Global Step: 351070 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:26,684-Speed 6310.23 samples/sec Loss 5.4646 LearningRate 0.0004 Epoch: 16 Global Step: 351080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:29,931-Speed 6308.71 samples/sec Loss 5.3236 LearningRate 0.0004 Epoch: 16 Global Step: 351090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:33,174-Speed 6316.23 samples/sec Loss 5.3873 LearningRate 0.0004 Epoch: 16 Global Step: 351100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:36,419-Speed 6313.28 samples/sec Loss 5.3342 LearningRate 0.0004 Epoch: 16 Global Step: 351110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:39,666-Speed 6308.02 samples/sec Loss 5.2989 LearningRate 0.0004 Epoch: 16 Global Step: 351120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:42,912-Speed 6311.19 samples/sec Loss 5.3191 LearningRate 0.0004 Epoch: 16 Global Step: 351130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:46,161-Speed 6305.04 samples/sec Loss 5.4003 LearningRate 0.0004 Epoch: 16 Global Step: 351140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:49,406-Speed 6313.24 samples/sec Loss 5.3718 LearningRate 0.0004 Epoch: 16 Global Step: 351150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:52,663-Speed 6288.09 samples/sec Loss 5.3768 LearningRate 0.0004 Epoch: 16 Global Step: 351160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:55,896-Speed 6336.95 samples/sec Loss 5.3557 LearningRate 0.0004 Epoch: 16 Global Step: 351170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:39:59,141-Speed 6311.78 samples/sec Loss 5.3381 LearningRate 0.0004 Epoch: 16 Global Step: 351180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:02,386-Speed 6313.02 samples/sec Loss 5.3729 LearningRate 0.0004 Epoch: 16 Global Step: 351190 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-01 23:40:05,634-Speed 6307.59 samples/sec Loss 5.3700 LearningRate 0.0004 Epoch: 16 Global Step: 351200 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:08,883-Speed 6306.32 samples/sec Loss 5.3323 LearningRate 0.0004 Epoch: 16 Global Step: 351210 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:12,134-Speed 6300.71 samples/sec Loss 5.3737 LearningRate 0.0004 Epoch: 16 Global Step: 351220 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:15,382-Speed 6306.35 samples/sec Loss 5.3137 LearningRate 0.0004 Epoch: 16 Global Step: 351230 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:18,630-Speed 6306.49 samples/sec Loss 5.3886 LearningRate 0.0004 Epoch: 16 Global Step: 351240 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:21,874-Speed 6314.38 samples/sec Loss 5.3909 LearningRate 0.0004 Epoch: 16 Global Step: 351250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:25,121-Speed 6308.31 samples/sec Loss 5.3883 LearningRate 0.0004 Epoch: 16 Global Step: 351260 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:28,356-Speed 6332.69 samples/sec Loss 5.3812 LearningRate 0.0004 Epoch: 16 Global Step: 351270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:31,602-Speed 6310.54 samples/sec Loss 5.3831 LearningRate 0.0004 Epoch: 16 Global Step: 351280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:34,852-Speed 6303.24 samples/sec Loss 5.3297 LearningRate 0.0004 Epoch: 16 Global Step: 351290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:38,100-Speed 6307.97 samples/sec Loss 5.3910 LearningRate 0.0004 Epoch: 16 Global Step: 351300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:41,345-Speed 6313.02 samples/sec Loss 5.3564 LearningRate 0.0004 Epoch: 16 Global Step: 351310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 
23:40:44,607-Speed 6279.26 samples/sec Loss 5.3098 LearningRate 0.0004 Epoch: 16 Global Step: 351320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:47,856-Speed 6306.09 samples/sec Loss 5.4003 LearningRate 0.0004 Epoch: 16 Global Step: 351330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:51,100-Speed 6313.31 samples/sec Loss 5.3393 LearningRate 0.0004 Epoch: 16 Global Step: 351340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:54,385-Speed 6236.43 samples/sec Loss 5.3367 LearningRate 0.0004 Epoch: 16 Global Step: 351350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:40:57,632-Speed 6308.36 samples/sec Loss 5.3751 LearningRate 0.0004 Epoch: 16 Global Step: 351360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:00,866-Speed 6334.06 samples/sec Loss 5.4104 LearningRate 0.0004 Epoch: 16 Global Step: 351370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:04,096-Speed 6342.42 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 16 Global Step: 351380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:07,341-Speed 6311.82 samples/sec Loss 5.3389 LearningRate 0.0004 Epoch: 16 Global Step: 351390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:10,586-Speed 6311.77 samples/sec Loss 5.3812 LearningRate 0.0004 Epoch: 16 Global Step: 351400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:13,831-Speed 6313.29 samples/sec Loss 5.3941 LearningRate 0.0004 Epoch: 16 Global Step: 351410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:17,075-Speed 6314.24 samples/sec Loss 5.3097 LearningRate 0.0004 Epoch: 16 Global Step: 351420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:20,323-Speed 6306.68 samples/sec Loss 5.3535 LearningRate 0.0004 Epoch: 16 Global Step: 351430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:23,573-Speed 6304.62 
samples/sec Loss 5.3374 LearningRate 0.0004 Epoch: 16 Global Step: 351440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:26,816-Speed 6316.49 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 16 Global Step: 351450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:30,062-Speed 6309.45 samples/sec Loss 5.3686 LearningRate 0.0004 Epoch: 16 Global Step: 351460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:33,310-Speed 6306.72 samples/sec Loss 5.4107 LearningRate 0.0004 Epoch: 16 Global Step: 351470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:36,557-Speed 6309.04 samples/sec Loss 5.3438 LearningRate 0.0004 Epoch: 16 Global Step: 351480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:39,805-Speed 6306.95 samples/sec Loss 5.3681 LearningRate 0.0004 Epoch: 16 Global Step: 351490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:43,051-Speed 6312.62 samples/sec Loss 5.2777 LearningRate 0.0004 Epoch: 16 Global Step: 351500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:46,304-Speed 6296.75 samples/sec Loss 5.3990 LearningRate 0.0004 Epoch: 16 Global Step: 351510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:49,557-Speed 6297.52 samples/sec Loss 5.3575 LearningRate 0.0004 Epoch: 16 Global Step: 351520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:41:52,796-Speed 6323.59 samples/sec Loss 5.3062 LearningRate 0.0004 Epoch: 16 Global Step: 351530 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:56,046-Speed 6302.38 samples/sec Loss 5.3759 LearningRate 0.0004 Epoch: 16 Global Step: 351540 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:41:59,288-Speed 6319.61 samples/sec Loss 5.3253 LearningRate 0.0004 Epoch: 16 Global Step: 351550 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:02,535-Speed 6308.70 samples/sec Loss 5.3447 
LearningRate 0.0004 Epoch: 16 Global Step: 351560 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:05,778-Speed 6314.63 samples/sec Loss 5.3820 LearningRate 0.0004 Epoch: 16 Global Step: 351570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:09,031-Speed 6298.26 samples/sec Loss 5.3414 LearningRate 0.0004 Epoch: 16 Global Step: 351580 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:12,280-Speed 6304.88 samples/sec Loss 5.3079 LearningRate 0.0004 Epoch: 16 Global Step: 351590 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:15,526-Speed 6310.26 samples/sec Loss 5.3654 LearningRate 0.0004 Epoch: 16 Global Step: 351600 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:18,772-Speed 6311.96 samples/sec Loss 5.4454 LearningRate 0.0004 Epoch: 16 Global Step: 351610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:22,017-Speed 6311.40 samples/sec Loss 5.4029 LearningRate 0.0004 Epoch: 16 Global Step: 351620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:25,277-Speed 6283.03 samples/sec Loss 5.3378 LearningRate 0.0004 Epoch: 16 Global Step: 351630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:28,526-Speed 6306.10 samples/sec Loss 5.3455 LearningRate 0.0004 Epoch: 16 Global Step: 351640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:31,772-Speed 6309.76 samples/sec Loss 5.3859 LearningRate 0.0004 Epoch: 16 Global Step: 351650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:35,015-Speed 6316.34 samples/sec Loss 5.3349 LearningRate 0.0004 Epoch: 16 Global Step: 351660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:38,265-Speed 6303.82 samples/sec Loss 5.2879 LearningRate 0.0004 Epoch: 16 Global Step: 351670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:41,509-Speed 6315.27 samples/sec Loss 5.3453 LearningRate 0.0004 Epoch: 16 
Global Step: 351680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:44,755-Speed 6310.79 samples/sec Loss 5.3686 LearningRate 0.0004 Epoch: 16 Global Step: 351690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:47,999-Speed 6314.84 samples/sec Loss 5.4310 LearningRate 0.0004 Epoch: 16 Global Step: 351700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:51,248-Speed 6305.22 samples/sec Loss 5.3553 LearningRate 0.0004 Epoch: 16 Global Step: 351710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:42:54,479-Speed 6339.79 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 16 Global Step: 351720 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:42:57,729-Speed 6303.80 samples/sec Loss 5.3992 LearningRate 0.0004 Epoch: 16 Global Step: 351730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:00,974-Speed 6311.29 samples/sec Loss 5.3541 LearningRate 0.0004 Epoch: 16 Global Step: 351740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:04,220-Speed 6309.96 samples/sec Loss 5.3648 LearningRate 0.0004 Epoch: 16 Global Step: 351750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:07,468-Speed 6307.31 samples/sec Loss 5.3704 LearningRate 0.0004 Epoch: 16 Global Step: 351760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:10,711-Speed 6317.57 samples/sec Loss 5.3071 LearningRate 0.0004 Epoch: 16 Global Step: 351770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:13,963-Speed 6299.26 samples/sec Loss 5.3363 LearningRate 0.0004 Epoch: 16 Global Step: 351780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:17,210-Speed 6307.50 samples/sec Loss 5.2987 LearningRate 0.0004 Epoch: 16 Global Step: 351790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:20,456-Speed 6310.79 samples/sec Loss 5.4217 LearningRate 0.0004 Epoch: 16 Global Step: 351800 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:23,702-Speed 6311.31 samples/sec Loss 5.3485 LearningRate 0.0004 Epoch: 16 Global Step: 351810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:43:26,945-Speed 6316.52 samples/sec Loss 5.2727 LearningRate 0.0004 Epoch: 16 Global Step: 351820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:30,223-Speed 6249.46 samples/sec Loss 5.3836 LearningRate 0.0004 Epoch: 16 Global Step: 351830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:33,464-Speed 6319.70 samples/sec Loss 5.3689 LearningRate 0.0004 Epoch: 16 Global Step: 351840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:36,720-Speed 6290.80 samples/sec Loss 5.3371 LearningRate 0.0004 Epoch: 16 Global Step: 351850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:39,968-Speed 6307.28 samples/sec Loss 5.2815 LearningRate 0.0004 Epoch: 16 Global Step: 351860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:43,215-Speed 6309.23 samples/sec Loss 5.2304 LearningRate 0.0004 Epoch: 16 Global Step: 351870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:46,459-Speed 6313.33 samples/sec Loss 5.2500 LearningRate 0.0004 Epoch: 16 Global Step: 351880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:49,704-Speed 6314.32 samples/sec Loss 5.3573 LearningRate 0.0004 Epoch: 16 Global Step: 351890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:52,951-Speed 6307.41 samples/sec Loss 5.3921 LearningRate 0.0004 Epoch: 16 Global Step: 351900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:56,197-Speed 6311.85 samples/sec Loss 5.3962 LearningRate 0.0004 Epoch: 16 Global Step: 351910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:43:59,426-Speed 6344.55 samples/sec Loss 5.3093 LearningRate 0.0004 Epoch: 16 Global Step: 351920 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-01 23:44:02,672-Speed 6309.50 samples/sec Loss 5.3832 LearningRate 0.0004 Epoch: 16 Global Step: 351930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:05,917-Speed 6314.29 samples/sec Loss 5.3934 LearningRate 0.0004 Epoch: 16 Global Step: 351940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:09,166-Speed 6303.41 samples/sec Loss 5.3962 LearningRate 0.0004 Epoch: 16 Global Step: 351950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:12,412-Speed 6312.17 samples/sec Loss 5.3023 LearningRate 0.0004 Epoch: 16 Global Step: 351960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:15,657-Speed 6313.01 samples/sec Loss 5.3854 LearningRate 0.0004 Epoch: 16 Global Step: 351970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:18,903-Speed 6309.43 samples/sec Loss 5.4004 LearningRate 0.0004 Epoch: 16 Global Step: 351980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:22,148-Speed 6312.55 samples/sec Loss 5.3260 LearningRate 0.0004 Epoch: 16 Global Step: 351990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:25,395-Speed 6309.69 samples/sec Loss 5.2876 LearningRate 0.0004 Epoch: 16 Global Step: 352000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:28,642-Speed 6308.70 samples/sec Loss 5.3398 LearningRate 0.0004 Epoch: 16 Global Step: 352010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:31,874-Speed 6337.05 samples/sec Loss 5.3909 LearningRate 0.0004 Epoch: 16 Global Step: 352020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:35,150-Speed 6252.29 samples/sec Loss 5.2728 LearningRate 0.0004 Epoch: 16 Global Step: 352030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:38,393-Speed 6317.12 samples/sec Loss 5.2814 LearningRate 0.0004 Epoch: 16 Global Step: 352040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 
23:44:41,641-Speed 6306.72 samples/sec Loss 5.3711 LearningRate 0.0004 Epoch: 16 Global Step: 352050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:44,885-Speed 6315.78 samples/sec Loss 5.3356 LearningRate 0.0004 Epoch: 16 Global Step: 352060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:48,131-Speed 6310.20 samples/sec Loss 5.3391 LearningRate 0.0004 Epoch: 16 Global Step: 352070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:51,382-Speed 6300.36 samples/sec Loss 5.4153 LearningRate 0.0004 Epoch: 16 Global Step: 352080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:54,625-Speed 6316.79 samples/sec Loss 5.3814 LearningRate 0.0004 Epoch: 16 Global Step: 352090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:44:57,868-Speed 6315.58 samples/sec Loss 5.3512 LearningRate 0.0004 Epoch: 16 Global Step: 352100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:01,114-Speed 6312.26 samples/sec Loss 5.3369 LearningRate 0.0004 Epoch: 16 Global Step: 352110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:04,354-Speed 6322.15 samples/sec Loss 5.4074 LearningRate 0.0004 Epoch: 16 Global Step: 352120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:07,600-Speed 6311.40 samples/sec Loss 5.4438 LearningRate 0.0004 Epoch: 16 Global Step: 352130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:10,850-Speed 6301.55 samples/sec Loss 5.3534 LearningRate 0.0004 Epoch: 16 Global Step: 352140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:14,098-Speed 6308.92 samples/sec Loss 5.3964 LearningRate 0.0004 Epoch: 16 Global Step: 352150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:17,343-Speed 6311.32 samples/sec Loss 5.2971 LearningRate 0.0004 Epoch: 16 Global Step: 352160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:20,590-Speed 6309.97 
samples/sec Loss 5.3334 LearningRate 0.0004 Epoch: 16 Global Step: 352170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:23,833-Speed 6315.98 samples/sec Loss 5.3106 LearningRate 0.0004 Epoch: 16 Global Step: 352180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:27,083-Speed 6302.07 samples/sec Loss 5.2704 LearningRate 0.0004 Epoch: 16 Global Step: 352190 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:30,324-Speed 6320.77 samples/sec Loss 5.4390 LearningRate 0.0004 Epoch: 16 Global Step: 352200 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:33,570-Speed 6311.03 samples/sec Loss 5.3548 LearningRate 0.0004 Epoch: 16 Global Step: 352210 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:36,805-Speed 6331.42 samples/sec Loss 5.3898 LearningRate 0.0004 Epoch: 16 Global Step: 352220 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:40,055-Speed 6303.23 samples/sec Loss 5.3754 LearningRate 0.0004 Epoch: 16 Global Step: 352230 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:43,303-Speed 6306.93 samples/sec Loss 5.4427 LearningRate 0.0004 Epoch: 16 Global Step: 352240 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:46,548-Speed 6311.69 samples/sec Loss 5.3344 LearningRate 0.0004 Epoch: 16 Global Step: 352250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:49,798-Speed 6304.01 samples/sec Loss 5.4040 LearningRate 0.0004 Epoch: 16 Global Step: 352260 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:53,045-Speed 6308.81 samples/sec Loss 5.3117 LearningRate 0.0004 Epoch: 16 Global Step: 352270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:45:56,274-Speed 6343.20 samples/sec Loss 5.3862 LearningRate 0.0004 Epoch: 16 Global Step: 352280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:45:59,517-Speed 6316.31 samples/sec Loss 5.3222 
LearningRate 0.0004 Epoch: 16 Global Step: 352290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:02,761-Speed 6314.95 samples/sec Loss 5.3308 LearningRate 0.0004 Epoch: 16 Global Step: 352300 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:06,010-Speed 6304.12 samples/sec Loss 5.3845 LearningRate 0.0004 Epoch: 16 Global Step: 352310 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:09,254-Speed 6315.50 samples/sec Loss 5.3773 LearningRate 0.0004 Epoch: 16 Global Step: 352320 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:12,514-Speed 6282.82 samples/sec Loss 5.3950 LearningRate 0.0004 Epoch: 16 Global Step: 352330 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:15,762-Speed 6307.51 samples/sec Loss 5.4218 LearningRate 0.0004 Epoch: 16 Global Step: 352340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:19,004-Speed 6318.82 samples/sec Loss 5.3427 LearningRate 0.0004 Epoch: 16 Global Step: 352350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:22,247-Speed 6318.27 samples/sec Loss 5.3195 LearningRate 0.0004 Epoch: 16 Global Step: 352360 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:25,494-Speed 6307.70 samples/sec Loss 5.3591 LearningRate 0.0004 Epoch: 16 Global Step: 352370 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:46:28,738-Speed 6314.43 samples/sec Loss 5.3277 LearningRate 0.0004 Epoch: 16 Global Step: 352380 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:31,985-Speed 6309.41 samples/sec Loss 5.4429 LearningRate 0.0004 Epoch: 16 Global Step: 352390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:35,227-Speed 6318.36 samples/sec Loss 5.2945 LearningRate 0.0004 Epoch: 16 Global Step: 352400 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:38,474-Speed 6308.09 samples/sec Loss 5.3499 LearningRate 0.0004 Epoch: 16 
Global Step: 352410 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:41,723-Speed 6305.94 samples/sec Loss 5.3730 LearningRate 0.0004 Epoch: 16 Global Step: 352420 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:44,967-Speed 6314.58 samples/sec Loss 5.3704 LearningRate 0.0004 Epoch: 16 Global Step: 352430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:48,210-Speed 6314.98 samples/sec Loss 5.3561 LearningRate 0.0004 Epoch: 16 Global Step: 352440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:51,457-Speed 6309.48 samples/sec Loss 5.3190 LearningRate 0.0004 Epoch: 16 Global Step: 352450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:54,708-Speed 6301.61 samples/sec Loss 5.3449 LearningRate 0.0004 Epoch: 16 Global Step: 352460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:46:57,954-Speed 6311.00 samples/sec Loss 5.3307 LearningRate 0.0004 Epoch: 16 Global Step: 352470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:01,189-Speed 6331.87 samples/sec Loss 5.4052 LearningRate 0.0004 Epoch: 16 Global Step: 352480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:04,435-Speed 6310.59 samples/sec Loss 5.3461 LearningRate 0.0004 Epoch: 16 Global Step: 352490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:07,679-Speed 6313.86 samples/sec Loss 5.3328 LearningRate 0.0004 Epoch: 16 Global Step: 352500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:10,921-Speed 6319.00 samples/sec Loss 5.3593 LearningRate 0.0004 Epoch: 16 Global Step: 352510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:14,170-Speed 6304.76 samples/sec Loss 5.3548 LearningRate 0.0004 Epoch: 16 Global Step: 352520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:17,413-Speed 6315.37 samples/sec Loss 5.2963 LearningRate 0.0004 Epoch: 16 Global Step: 352530 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:20,663-Speed 6303.64 samples/sec Loss 5.3626 LearningRate 0.0004 Epoch: 16 Global Step: 352540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:23,910-Speed 6308.22 samples/sec Loss 5.4060 LearningRate 0.0004 Epoch: 16 Global Step: 352550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:27,155-Speed 6313.85 samples/sec Loss 5.3741 LearningRate 0.0004 Epoch: 16 Global Step: 352560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:47:30,398-Speed 6317.15 samples/sec Loss 5.4364 LearningRate 0.0004 Epoch: 16 Global Step: 352570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:48:30,604-Speed 340.17 samples/sec Loss 5.2970 LearningRate 0.0004 Epoch: 17 Global Step: 352580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:48:33,838-Speed 6334.27 samples/sec Loss 5.3796 LearningRate 0.0004 Epoch: 17 Global Step: 352590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:48:37,075-Speed 6327.66 samples/sec Loss 5.5153 LearningRate 0.0004 Epoch: 17 Global Step: 352600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:48:40,321-Speed 6311.48 samples/sec Loss 5.2834 LearningRate 0.0004 Epoch: 17 Global Step: 352610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:48:43,555-Speed 6333.13 samples/sec Loss 5.3488 LearningRate 0.0004 Epoch: 17 Global Step: 352620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:48:46,795-Speed 6321.86 samples/sec Loss 5.3031 LearningRate 0.0004 Epoch: 17 Global Step: 352630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:48:50,033-Speed 6326.74 samples/sec Loss 5.3552 LearningRate 0.0004 Epoch: 17 Global Step: 352640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:48:53,276-Speed 6317.62 samples/sec Loss 5.4116 LearningRate 0.0004 Epoch: 17 Global Step: 352650 Fp16 Grad Scale: 16384 Required: 43 hours 
Training: 2022-04-01 23:48:56,517-Speed 6320.30 samples/sec Loss 5.4450 LearningRate 0.0004 Epoch: 17 Global Step: 352660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:48:59,755-Speed 6326.24 samples/sec Loss 5.3635 LearningRate 0.0004 Epoch: 17 Global Step: 352670 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:49:02,995-Speed 6321.42 samples/sec Loss 5.3009 LearningRate 0.0004 Epoch: 17 Global Step: 352680 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:49:06,233-Speed 6327.03 samples/sec Loss 5.3493 LearningRate 0.0004 Epoch: 17 Global Step: 352690 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:49:09,477-Speed 6314.38 samples/sec Loss 5.3747 LearningRate 0.0004 Epoch: 17 Global Step: 352700 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:49:12,720-Speed 6315.33 samples/sec Loss 5.3432 LearningRate 0.0004 Epoch: 17 Global Step: 352710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:15,966-Speed 6312.59 samples/sec Loss 5.3229 LearningRate 0.0004 Epoch: 17 Global Step: 352720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:19,206-Speed 6321.14 samples/sec Loss 5.3535 LearningRate 0.0004 Epoch: 17 Global Step: 352730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:22,447-Speed 6320.86 samples/sec Loss 5.3168 LearningRate 0.0004 Epoch: 17 Global Step: 352740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:25,692-Speed 6312.25 samples/sec Loss 5.3741 LearningRate 0.0004 Epoch: 17 Global Step: 352750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:28,936-Speed 6316.65 samples/sec Loss 5.3656 LearningRate 0.0004 Epoch: 17 Global Step: 352760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:32,183-Speed 6308.88 samples/sec Loss 5.3302 LearningRate 0.0004 Epoch: 17 Global Step: 352770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 
23:49:35,429-Speed 6309.82 samples/sec Loss 5.3269 LearningRate 0.0004 Epoch: 17 Global Step: 352780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:38,674-Speed 6313.31 samples/sec Loss 5.3317 LearningRate 0.0004 Epoch: 17 Global Step: 352790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:41,918-Speed 6314.15 samples/sec Loss 5.3325 LearningRate 0.0004 Epoch: 17 Global Step: 352800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:45,152-Speed 6333.36 samples/sec Loss 5.2394 LearningRate 0.0004 Epoch: 17 Global Step: 352810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:48,406-Speed 6295.31 samples/sec Loss 5.2888 LearningRate 0.0004 Epoch: 17 Global Step: 352820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:51,659-Speed 6297.90 samples/sec Loss 5.2930 LearningRate 0.0004 Epoch: 17 Global Step: 352830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:54,913-Speed 6294.92 samples/sec Loss 5.3247 LearningRate 0.0004 Epoch: 17 Global Step: 352840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:49:58,188-Speed 6255.58 samples/sec Loss 5.2750 LearningRate 0.0004 Epoch: 17 Global Step: 352850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:01,435-Speed 6308.15 samples/sec Loss 5.3202 LearningRate 0.0004 Epoch: 17 Global Step: 352860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:04,688-Speed 6296.67 samples/sec Loss 5.2938 LearningRate 0.0004 Epoch: 17 Global Step: 352870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:07,938-Speed 6302.47 samples/sec Loss 5.2443 LearningRate 0.0004 Epoch: 17 Global Step: 352880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:11,191-Speed 6297.68 samples/sec Loss 5.2751 LearningRate 0.0004 Epoch: 17 Global Step: 352890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:14,446-Speed 6293.09 
samples/sec Loss 5.3627 LearningRate 0.0004 Epoch: 17 Global Step: 352900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:17,682-Speed 6330.64 samples/sec Loss 5.4005 LearningRate 0.0004 Epoch: 17 Global Step: 352910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:20,934-Speed 6298.32 samples/sec Loss 5.4065 LearningRate 0.0004 Epoch: 17 Global Step: 352920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:24,182-Speed 6306.47 samples/sec Loss 5.3038 LearningRate 0.0004 Epoch: 17 Global Step: 352930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:27,434-Speed 6300.18 samples/sec Loss 5.2949 LearningRate 0.0004 Epoch: 17 Global Step: 352940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:30,686-Speed 6297.84 samples/sec Loss 5.3345 LearningRate 0.0004 Epoch: 17 Global Step: 352950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:33,936-Speed 6304.15 samples/sec Loss 5.2933 LearningRate 0.0004 Epoch: 17 Global Step: 352960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:37,184-Speed 6307.57 samples/sec Loss 5.3096 LearningRate 0.0004 Epoch: 17 Global Step: 352970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:40,475-Speed 6224.03 samples/sec Loss 5.3982 LearningRate 0.0004 Epoch: 17 Global Step: 352980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:43,723-Speed 6307.73 samples/sec Loss 5.3055 LearningRate 0.0004 Epoch: 17 Global Step: 352990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:46,972-Speed 6304.04 samples/sec Loss 5.3680 LearningRate 0.0004 Epoch: 17 Global Step: 353000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:50,204-Speed 6337.84 samples/sec Loss 5.3631 LearningRate 0.0004 Epoch: 17 Global Step: 353010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:53,450-Speed 6310.21 samples/sec Loss 5.3848 
LearningRate 0.0004 Epoch: 17 Global Step: 353020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:56,695-Speed 6313.03 samples/sec Loss 5.2719 LearningRate 0.0004 Epoch: 17 Global Step: 353030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:50:59,960-Speed 6273.52 samples/sec Loss 5.3953 LearningRate 0.0004 Epoch: 17 Global Step: 353040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:03,207-Speed 6310.10 samples/sec Loss 5.3456 LearningRate 0.0004 Epoch: 17 Global Step: 353050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:06,450-Speed 6315.45 samples/sec Loss 5.3807 LearningRate 0.0004 Epoch: 17 Global Step: 353060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:09,694-Speed 6315.04 samples/sec Loss 5.3824 LearningRate 0.0004 Epoch: 17 Global Step: 353070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:12,937-Speed 6315.83 samples/sec Loss 5.3642 LearningRate 0.0004 Epoch: 17 Global Step: 353080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:16,186-Speed 6305.09 samples/sec Loss 5.3588 LearningRate 0.0004 Epoch: 17 Global Step: 353090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:19,434-Speed 6306.67 samples/sec Loss 5.3151 LearningRate 0.0004 Epoch: 17 Global Step: 353100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:22,669-Speed 6333.23 samples/sec Loss 5.3335 LearningRate 0.0004 Epoch: 17 Global Step: 353110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:25,918-Speed 6304.43 samples/sec Loss 5.2973 LearningRate 0.0004 Epoch: 17 Global Step: 353120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:29,168-Speed 6302.59 samples/sec Loss 5.3214 LearningRate 0.0004 Epoch: 17 Global Step: 353130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:51:32,399-Speed 6339.26 samples/sec Loss 5.3129 LearningRate 0.0004 Epoch: 17 
Global Step: 353140 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:35,647-Speed 6306.62 samples/sec Loss 5.2652 LearningRate 0.0004 Epoch: 17 Global Step: 353150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:38,894-Speed 6310.40 samples/sec Loss 5.2984 LearningRate 0.0004 Epoch: 17 Global Step: 353160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:42,140-Speed 6310.79 samples/sec Loss 5.3190 LearningRate 0.0004 Epoch: 17 Global Step: 353170 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:45,389-Speed 6304.86 samples/sec Loss 5.3776 LearningRate 0.0004 Epoch: 17 Global Step: 353180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:48,635-Speed 6310.48 samples/sec Loss 5.3243 LearningRate 0.0004 Epoch: 17 Global Step: 353190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:51,881-Speed 6311.17 samples/sec Loss 5.3742 LearningRate 0.0004 Epoch: 17 Global Step: 353200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:55,129-Speed 6305.92 samples/sec Loss 5.3267 LearningRate 0.0004 Epoch: 17 Global Step: 353210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:51:58,376-Speed 6309.79 samples/sec Loss 5.3450 LearningRate 0.0004 Epoch: 17 Global Step: 353220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:52:01,628-Speed 6299.03 samples/sec Loss 5.3236 LearningRate 0.0004 Epoch: 17 Global Step: 353230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:52:04,881-Speed 6297.46 samples/sec Loss 5.3146 LearningRate 0.0004 Epoch: 17 Global Step: 353240 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:08,130-Speed 6304.30 samples/sec Loss 5.3681 LearningRate 0.0004 Epoch: 17 Global Step: 353250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:11,387-Speed 6289.99 samples/sec Loss 5.3277 LearningRate 0.0004 Epoch: 17 Global Step: 353260 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:14,632-Speed 6311.60 samples/sec Loss 5.3189 LearningRate 0.0004 Epoch: 17 Global Step: 353270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:17,879-Speed 6309.93 samples/sec Loss 5.3563 LearningRate 0.0004 Epoch: 17 Global Step: 353280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:21,128-Speed 6303.54 samples/sec Loss 5.4554 LearningRate 0.0004 Epoch: 17 Global Step: 353290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:24,384-Speed 6292.11 samples/sec Loss 5.3421 LearningRate 0.0004 Epoch: 17 Global Step: 353300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:27,634-Speed 6303.36 samples/sec Loss 5.3292 LearningRate 0.0004 Epoch: 17 Global Step: 353310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:30,886-Speed 6299.25 samples/sec Loss 5.3189 LearningRate 0.0004 Epoch: 17 Global Step: 353320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:34,138-Speed 6297.88 samples/sec Loss 5.3023 LearningRate 0.0004 Epoch: 17 Global Step: 353330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:37,373-Speed 6332.33 samples/sec Loss 5.2501 LearningRate 0.0004 Epoch: 17 Global Step: 353340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:40,619-Speed 6310.50 samples/sec Loss 5.3272 LearningRate 0.0004 Epoch: 17 Global Step: 353350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:43,870-Speed 6301.05 samples/sec Loss 5.2684 LearningRate 0.0004 Epoch: 17 Global Step: 353360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:47,117-Speed 6309.38 samples/sec Loss 5.4270 LearningRate 0.0004 Epoch: 17 Global Step: 353370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:52:50,381-Speed 6276.09 samples/sec Loss 5.3301 LearningRate 0.0004 Epoch: 17 Global Step: 353380 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-01 23:52:53,615-Speed 6335.25 samples/sec Loss 5.3452 LearningRate 0.0004 Epoch: 17 Global Step: 353390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:52:56,861-Speed 6310.75 samples/sec Loss 5.3825 LearningRate 0.0004 Epoch: 17 Global Step: 353400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:00,108-Speed 6307.81 samples/sec Loss 5.3840 LearningRate 0.0004 Epoch: 17 Global Step: 353410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:03,358-Speed 6304.06 samples/sec Loss 5.3709 LearningRate 0.0004 Epoch: 17 Global Step: 353420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:06,607-Speed 6304.64 samples/sec Loss 5.3321 LearningRate 0.0004 Epoch: 17 Global Step: 353430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:09,849-Speed 6317.69 samples/sec Loss 5.4203 LearningRate 0.0004 Epoch: 17 Global Step: 353440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:13,097-Speed 6307.55 samples/sec Loss 5.3390 LearningRate 0.0004 Epoch: 17 Global Step: 353450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:16,376-Speed 6246.00 samples/sec Loss 5.3123 LearningRate 0.0004 Epoch: 17 Global Step: 353460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:19,629-Speed 6298.01 samples/sec Loss 5.3052 LearningRate 0.0004 Epoch: 17 Global Step: 353470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:22,876-Speed 6309.45 samples/sec Loss 5.3805 LearningRate 0.0004 Epoch: 17 Global Step: 353480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:26,121-Speed 6311.42 samples/sec Loss 5.3083 LearningRate 0.0004 Epoch: 17 Global Step: 353490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:29,371-Speed 6304.00 samples/sec Loss 5.3300 LearningRate 0.0004 Epoch: 17 Global Step: 353500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 
23:53:32,620-Speed 6304.42 samples/sec Loss 5.3792 LearningRate 0.0004 Epoch: 17 Global Step: 353510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:35,867-Speed 6309.13 samples/sec Loss 5.3461 LearningRate 0.0004 Epoch: 17 Global Step: 353520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:39,119-Speed 6298.95 samples/sec Loss 5.3218 LearningRate 0.0004 Epoch: 17 Global Step: 353530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:42,370-Speed 6299.90 samples/sec Loss 5.3421 LearningRate 0.0004 Epoch: 17 Global Step: 353540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:45,620-Speed 6303.53 samples/sec Loss 5.3367 LearningRate 0.0004 Epoch: 17 Global Step: 353550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:48,862-Speed 6319.06 samples/sec Loss 5.2704 LearningRate 0.0004 Epoch: 17 Global Step: 353560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:53:52,095-Speed 6334.35 samples/sec Loss 5.3429 LearningRate 0.0004 Epoch: 17 Global Step: 353570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:55,342-Speed 6310.99 samples/sec Loss 5.3471 LearningRate 0.0004 Epoch: 17 Global Step: 353580 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:53:58,590-Speed 6306.93 samples/sec Loss 5.3770 LearningRate 0.0004 Epoch: 17 Global Step: 353590 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:01,839-Speed 6304.64 samples/sec Loss 5.4098 LearningRate 0.0004 Epoch: 17 Global Step: 353600 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:05,085-Speed 6310.33 samples/sec Loss 5.3523 LearningRate 0.0004 Epoch: 17 Global Step: 353610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:08,332-Speed 6308.72 samples/sec Loss 5.2745 LearningRate 0.0004 Epoch: 17 Global Step: 353620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:11,581-Speed 6305.10 
samples/sec Loss 5.2997 LearningRate 0.0004 Epoch: 17 Global Step: 353630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:14,829-Speed 6307.47 samples/sec Loss 5.3005 LearningRate 0.0004 Epoch: 17 Global Step: 353640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:18,078-Speed 6304.42 samples/sec Loss 5.3394 LearningRate 0.0004 Epoch: 17 Global Step: 353650 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:21,325-Speed 6307.73 samples/sec Loss 5.2866 LearningRate 0.0004 Epoch: 17 Global Step: 353660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:24,582-Speed 6291.25 samples/sec Loss 5.3747 LearningRate 0.0004 Epoch: 17 Global Step: 353670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:27,832-Speed 6302.36 samples/sec Loss 5.3638 LearningRate 0.0004 Epoch: 17 Global Step: 353680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:31,081-Speed 6304.85 samples/sec Loss 5.3526 LearningRate 0.0004 Epoch: 17 Global Step: 353690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:34,331-Speed 6301.70 samples/sec Loss 5.3156 LearningRate 0.0004 Epoch: 17 Global Step: 353700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:37,580-Speed 6305.20 samples/sec Loss 5.3273 LearningRate 0.0004 Epoch: 17 Global Step: 353710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:40,832-Speed 6299.13 samples/sec Loss 5.3789 LearningRate 0.0004 Epoch: 17 Global Step: 353720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:54:44,065-Speed 6335.89 samples/sec Loss 5.3756 LearningRate 0.0004 Epoch: 17 Global Step: 353730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:47,314-Speed 6304.30 samples/sec Loss 5.3164 LearningRate 0.0004 Epoch: 17 Global Step: 353740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:50,557-Speed 6318.12 samples/sec Loss 5.4304 
LearningRate 0.0004 Epoch: 17 Global Step: 353750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:53,804-Speed 6308.35 samples/sec Loss 5.3656 LearningRate 0.0004 Epoch: 17 Global Step: 353760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:54:57,050-Speed 6310.34 samples/sec Loss 5.3755 LearningRate 0.0004 Epoch: 17 Global Step: 353770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:00,300-Speed 6302.59 samples/sec Loss 5.3367 LearningRate 0.0004 Epoch: 17 Global Step: 353780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:03,552-Speed 6300.17 samples/sec Loss 5.3706 LearningRate 0.0004 Epoch: 17 Global Step: 353790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:06,798-Speed 6309.96 samples/sec Loss 5.3343 LearningRate 0.0004 Epoch: 17 Global Step: 353800 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:10,044-Speed 6311.22 samples/sec Loss 5.3176 LearningRate 0.0004 Epoch: 17 Global Step: 353810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:13,295-Speed 6300.99 samples/sec Loss 5.3853 LearningRate 0.0004 Epoch: 17 Global Step: 353820 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:16,546-Speed 6302.41 samples/sec Loss 5.2743 LearningRate 0.0004 Epoch: 17 Global Step: 353830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:19,791-Speed 6311.78 samples/sec Loss 5.3911 LearningRate 0.0004 Epoch: 17 Global Step: 353840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:23,034-Speed 6315.78 samples/sec Loss 5.4358 LearningRate 0.0004 Epoch: 17 Global Step: 353850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:26,284-Speed 6304.16 samples/sec Loss 5.3073 LearningRate 0.0004 Epoch: 17 Global Step: 353860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:29,531-Speed 6309.17 samples/sec Loss 5.3316 LearningRate 0.0004 Epoch: 17 
Global Step: 353870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:32,777-Speed 6309.13 samples/sec Loss 5.2683 LearningRate 0.0004 Epoch: 17 Global Step: 353880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:36,021-Speed 6314.50 samples/sec Loss 5.3408 LearningRate 0.0004 Epoch: 17 Global Step: 353890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:39,277-Speed 6292.20 samples/sec Loss 5.2525 LearningRate 0.0004 Epoch: 17 Global Step: 353900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:42,526-Speed 6305.00 samples/sec Loss 5.3410 LearningRate 0.0004 Epoch: 17 Global Step: 353910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:45,772-Speed 6310.16 samples/sec Loss 5.3307 LearningRate 0.0004 Epoch: 17 Global Step: 353920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:49,009-Speed 6329.46 samples/sec Loss 5.3208 LearningRate 0.0004 Epoch: 17 Global Step: 353930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:52,258-Speed 6304.73 samples/sec Loss 5.3244 LearningRate 0.0004 Epoch: 17 Global Step: 353940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:55:55,496-Speed 6325.89 samples/sec Loss 5.3367 LearningRate 0.0004 Epoch: 17 Global Step: 353950 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:55:58,747-Speed 6300.44 samples/sec Loss 5.3305 LearningRate 0.0004 Epoch: 17 Global Step: 353960 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:01,992-Speed 6313.00 samples/sec Loss 5.3347 LearningRate 0.0004 Epoch: 17 Global Step: 353970 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:05,239-Speed 6309.19 samples/sec Loss 5.3446 LearningRate 0.0004 Epoch: 17 Global Step: 353980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:08,482-Speed 6315.48 samples/sec Loss 5.2846 LearningRate 0.0004 Epoch: 17 Global Step: 353990 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:11,731-Speed 6304.35 samples/sec Loss 5.3255 LearningRate 0.0004 Epoch: 17 Global Step: 354000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:14,983-Speed 6300.24 samples/sec Loss 5.4071 LearningRate 0.0004 Epoch: 17 Global Step: 354010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:18,228-Speed 6312.58 samples/sec Loss 5.3097 LearningRate 0.0004 Epoch: 17 Global Step: 354020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:21,479-Speed 6301.64 samples/sec Loss 5.2896 LearningRate 0.0004 Epoch: 17 Global Step: 354030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:24,726-Speed 6309.80 samples/sec Loss 5.3247 LearningRate 0.0004 Epoch: 17 Global Step: 354040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:56:27,975-Speed 6304.16 samples/sec Loss 5.2868 LearningRate 0.0004 Epoch: 17 Global Step: 354050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:31,228-Speed 6297.54 samples/sec Loss 5.3100 LearningRate 0.0004 Epoch: 17 Global Step: 354060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:34,474-Speed 6310.53 samples/sec Loss 5.3139 LearningRate 0.0004 Epoch: 17 Global Step: 354070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:37,720-Speed 6310.56 samples/sec Loss 5.3829 LearningRate 0.0004 Epoch: 17 Global Step: 354080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:40,965-Speed 6312.43 samples/sec Loss 5.4207 LearningRate 0.0004 Epoch: 17 Global Step: 354090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:44,214-Speed 6304.93 samples/sec Loss 5.3367 LearningRate 0.0004 Epoch: 17 Global Step: 354100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:47,459-Speed 6311.61 samples/sec Loss 5.3152 LearningRate 0.0004 Epoch: 17 Global Step: 354110 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-01 23:56:50,707-Speed 6307.90 samples/sec Loss 5.4219 LearningRate 0.0004 Epoch: 17 Global Step: 354120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:53,958-Speed 6300.87 samples/sec Loss 5.3372 LearningRate 0.0004 Epoch: 17 Global Step: 354130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:56:57,201-Speed 6316.28 samples/sec Loss 5.3175 LearningRate 0.0004 Epoch: 17 Global Step: 354140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:00,440-Speed 6324.36 samples/sec Loss 5.2903 LearningRate 0.0004 Epoch: 17 Global Step: 354150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:03,690-Speed 6302.42 samples/sec Loss 5.3817 LearningRate 0.0004 Epoch: 17 Global Step: 354160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:06,938-Speed 6306.50 samples/sec Loss 5.3320 LearningRate 0.0004 Epoch: 17 Global Step: 354170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:10,169-Speed 6340.75 samples/sec Loss 5.3453 LearningRate 0.0004 Epoch: 17 Global Step: 354180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:13,417-Speed 6306.83 samples/sec Loss 5.3046 LearningRate 0.0004 Epoch: 17 Global Step: 354190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:16,663-Speed 6310.52 samples/sec Loss 5.2947 LearningRate 0.0004 Epoch: 17 Global Step: 354200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:19,908-Speed 6313.89 samples/sec Loss 5.3504 LearningRate 0.0004 Epoch: 17 Global Step: 354210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:23,157-Speed 6304.01 samples/sec Loss 5.3633 LearningRate 0.0004 Epoch: 17 Global Step: 354220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:26,406-Speed 6306.62 samples/sec Loss 5.2693 LearningRate 0.0004 Epoch: 17 Global Step: 354230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 
23:57:29,651-Speed 6312.12 samples/sec Loss 5.3538 LearningRate 0.0004 Epoch: 17 Global Step: 354240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:32,914-Speed 6277.08 samples/sec Loss 5.3698 LearningRate 0.0004 Epoch: 17 Global Step: 354250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:36,159-Speed 6312.39 samples/sec Loss 5.3256 LearningRate 0.0004 Epoch: 17 Global Step: 354260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:39,409-Speed 6304.18 samples/sec Loss 5.3578 LearningRate 0.0004 Epoch: 17 Global Step: 354270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:57:42,658-Speed 6303.98 samples/sec Loss 5.3761 LearningRate 0.0004 Epoch: 17 Global Step: 354280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:45,908-Speed 6302.76 samples/sec Loss 5.3737 LearningRate 0.0004 Epoch: 17 Global Step: 354290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:49,155-Speed 6309.14 samples/sec Loss 5.2609 LearningRate 0.0004 Epoch: 17 Global Step: 354300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:52,418-Speed 6278.10 samples/sec Loss 5.3036 LearningRate 0.0004 Epoch: 17 Global Step: 354310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:55,666-Speed 6308.14 samples/sec Loss 5.3587 LearningRate 0.0004 Epoch: 17 Global Step: 354320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:57:58,912-Speed 6309.72 samples/sec Loss 5.4147 LearningRate 0.0004 Epoch: 17 Global Step: 354330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:02,161-Speed 6305.58 samples/sec Loss 5.3307 LearningRate 0.0004 Epoch: 17 Global Step: 354340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:05,412-Speed 6300.05 samples/sec Loss 5.3323 LearningRate 0.0004 Epoch: 17 Global Step: 354350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:08,660-Speed 6310.32 
samples/sec Loss 5.2737 LearningRate 0.0004 Epoch: 17 Global Step: 354360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:11,907-Speed 6308.11 samples/sec Loss 5.2955 LearningRate 0.0004 Epoch: 17 Global Step: 354370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:15,147-Speed 6322.92 samples/sec Loss 5.2858 LearningRate 0.0004 Epoch: 17 Global Step: 354380 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:18,397-Speed 6303.04 samples/sec Loss 5.3847 LearningRate 0.0004 Epoch: 17 Global Step: 354390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:21,643-Speed 6310.71 samples/sec Loss 5.3723 LearningRate 0.0004 Epoch: 17 Global Step: 354400 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:58:24,875-Speed 6337.81 samples/sec Loss 5.3423 LearningRate 0.0004 Epoch: 17 Global Step: 354410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:28,124-Speed 6303.90 samples/sec Loss 5.3206 LearningRate 0.0004 Epoch: 17 Global Step: 354420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:31,374-Speed 6304.20 samples/sec Loss 5.3332 LearningRate 0.0004 Epoch: 17 Global Step: 354430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:34,623-Speed 6306.03 samples/sec Loss 5.3537 LearningRate 0.0004 Epoch: 17 Global Step: 354440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:37,868-Speed 6311.68 samples/sec Loss 5.3162 LearningRate 0.0004 Epoch: 17 Global Step: 354450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:41,115-Speed 6309.20 samples/sec Loss 5.3098 LearningRate 0.0004 Epoch: 17 Global Step: 354460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:44,362-Speed 6309.98 samples/sec Loss 5.4009 LearningRate 0.0004 Epoch: 17 Global Step: 354470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:47,609-Speed 6308.72 samples/sec Loss 5.2718 
LearningRate 0.0004 Epoch: 17 Global Step: 354480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:50,852-Speed 6315.71 samples/sec Loss 5.3474 LearningRate 0.0004 Epoch: 17 Global Step: 354490 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:54,096-Speed 6314.13 samples/sec Loss 5.4056 LearningRate 0.0004 Epoch: 17 Global Step: 354500 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:58:57,348-Speed 6299.00 samples/sec Loss 5.2895 LearningRate 0.0004 Epoch: 17 Global Step: 354510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:00,596-Speed 6306.68 samples/sec Loss 5.3577 LearningRate 0.0004 Epoch: 17 Global Step: 354520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:03,843-Speed 6309.38 samples/sec Loss 5.3207 LearningRate 0.0004 Epoch: 17 Global Step: 354530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:07,094-Speed 6302.26 samples/sec Loss 5.2862 LearningRate 0.0004 Epoch: 17 Global Step: 354540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:10,342-Speed 6306.30 samples/sec Loss 5.2955 LearningRate 0.0004 Epoch: 17 Global Step: 354550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:13,586-Speed 6313.17 samples/sec Loss 5.2929 LearningRate 0.0004 Epoch: 17 Global Step: 354560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:16,835-Speed 6306.25 samples/sec Loss 5.4018 LearningRate 0.0004 Epoch: 17 Global Step: 354570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:20,086-Speed 6301.17 samples/sec Loss 5.3517 LearningRate 0.0004 Epoch: 17 Global Step: 354580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:23,335-Speed 6303.35 samples/sec Loss 5.2991 LearningRate 0.0004 Epoch: 17 Global Step: 354590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:26,580-Speed 6312.59 samples/sec Loss 5.3602 LearningRate 0.0004 Epoch: 17 
Global Step: 354600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:29,815-Speed 6333.84 samples/sec Loss 5.3381 LearningRate 0.0004 Epoch: 17 Global Step: 354610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:33,061-Speed 6309.13 samples/sec Loss 5.3517 LearningRate 0.0004 Epoch: 17 Global Step: 354620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:36,310-Speed 6306.60 samples/sec Loss 5.3057 LearningRate 0.0004 Epoch: 17 Global Step: 354630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:39,551-Speed 6319.89 samples/sec Loss 5.3649 LearningRate 0.0004 Epoch: 17 Global Step: 354640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:42,802-Speed 6301.57 samples/sec Loss 5.3357 LearningRate 0.0004 Epoch: 17 Global Step: 354650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:46,052-Speed 6302.94 samples/sec Loss 5.3720 LearningRate 0.0004 Epoch: 17 Global Step: 354660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:49,298-Speed 6310.52 samples/sec Loss 5.2893 LearningRate 0.0004 Epoch: 17 Global Step: 354670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-01 23:59:52,529-Speed 6340.24 samples/sec Loss 5.3010 LearningRate 0.0004 Epoch: 17 Global Step: 354680 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:59:55,796-Speed 6270.73 samples/sec Loss 5.3551 LearningRate 0.0004 Epoch: 17 Global Step: 354690 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-01 23:59:59,043-Speed 6308.31 samples/sec Loss 5.3493 LearningRate 0.0004 Epoch: 17 Global Step: 354700 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:02,360-Speed 6176.01 samples/sec Loss 5.3696 LearningRate 0.0004 Epoch: 17 Global Step: 354710 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:05,711-Speed 6113.30 samples/sec Loss 5.3200 LearningRate 0.0004 Epoch: 17 Global Step: 354720 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:08,963-Speed 6297.64 samples/sec Loss 5.3417 LearningRate 0.0004 Epoch: 17 Global Step: 354730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:12,211-Speed 6306.62 samples/sec Loss 5.2487 LearningRate 0.0004 Epoch: 17 Global Step: 354740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:15,463-Speed 6298.40 samples/sec Loss 5.2807 LearningRate 0.0004 Epoch: 17 Global Step: 354750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:18,710-Speed 6310.69 samples/sec Loss 5.2782 LearningRate 0.0004 Epoch: 17 Global Step: 354760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:21,954-Speed 6314.18 samples/sec Loss 5.3547 LearningRate 0.0004 Epoch: 17 Global Step: 354770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:00:25,204-Speed 6302.92 samples/sec Loss 5.3239 LearningRate 0.0004 Epoch: 17 Global Step: 354780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:28,450-Speed 6309.74 samples/sec Loss 5.2574 LearningRate 0.0004 Epoch: 17 Global Step: 354790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:31,703-Speed 6297.73 samples/sec Loss 5.3342 LearningRate 0.0004 Epoch: 17 Global Step: 354800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:34,946-Speed 6317.51 samples/sec Loss 5.3188 LearningRate 0.0004 Epoch: 17 Global Step: 354810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:38,194-Speed 6306.30 samples/sec Loss 5.3425 LearningRate 0.0004 Epoch: 17 Global Step: 354820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:41,447-Speed 6295.79 samples/sec Loss 5.3661 LearningRate 0.0004 Epoch: 17 Global Step: 354830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:44,695-Speed 6307.92 samples/sec Loss 5.3627 LearningRate 0.0004 Epoch: 17 Global Step: 354840 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:00:47,943-Speed 6307.46 samples/sec Loss 5.3147 LearningRate 0.0004 Epoch: 17 Global Step: 354850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:51,192-Speed 6306.02 samples/sec Loss 5.3672 LearningRate 0.0004 Epoch: 17 Global Step: 354860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:54,446-Speed 6293.29 samples/sec Loss 5.3620 LearningRate 0.0004 Epoch: 17 Global Step: 354870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:00:57,679-Speed 6336.47 samples/sec Loss 5.3797 LearningRate 0.0004 Epoch: 17 Global Step: 354880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:00,926-Speed 6309.20 samples/sec Loss 5.3070 LearningRate 0.0004 Epoch: 17 Global Step: 354890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:04,170-Speed 6314.80 samples/sec Loss 5.3764 LearningRate 0.0004 Epoch: 17 Global Step: 354900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:07,416-Speed 6311.07 samples/sec Loss 5.3989 LearningRate 0.0004 Epoch: 17 Global Step: 354910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:10,662-Speed 6310.65 samples/sec Loss 5.3015 LearningRate 0.0004 Epoch: 17 Global Step: 354920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:13,907-Speed 6312.25 samples/sec Loss 5.3922 LearningRate 0.0004 Epoch: 17 Global Step: 354930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:17,150-Speed 6315.97 samples/sec Loss 5.2929 LearningRate 0.0004 Epoch: 17 Global Step: 354940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:20,396-Speed 6311.51 samples/sec Loss 5.3378 LearningRate 0.0004 Epoch: 17 Global Step: 354950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:23,692-Speed 6215.86 samples/sec Loss 5.3084 LearningRate 0.0004 Epoch: 17 Global Step: 354960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:01:26,939-Speed 6308.40 samples/sec Loss 5.3020 LearningRate 0.0004 Epoch: 17 Global Step: 354970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:30,172-Speed 6336.11 samples/sec Loss 5.2791 LearningRate 0.0004 Epoch: 17 Global Step: 354980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:01:33,404-Speed 6337.97 samples/sec Loss 5.3278 LearningRate 0.0004 Epoch: 17 Global Step: 354990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:36,649-Speed 6312.72 samples/sec Loss 5.3315 LearningRate 0.0004 Epoch: 17 Global Step: 355000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:39,893-Speed 6315.20 samples/sec Loss 5.2662 LearningRate 0.0004 Epoch: 17 Global Step: 355010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:43,137-Speed 6314.50 samples/sec Loss 5.3179 LearningRate 0.0004 Epoch: 17 Global Step: 355020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:46,384-Speed 6307.57 samples/sec Loss 5.3532 LearningRate 0.0004 Epoch: 17 Global Step: 355030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:49,633-Speed 6305.48 samples/sec Loss 5.3177 LearningRate 0.0004 Epoch: 17 Global Step: 355040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:52,878-Speed 6311.76 samples/sec Loss 5.3265 LearningRate 0.0004 Epoch: 17 Global Step: 355050 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:56,124-Speed 6311.75 samples/sec Loss 5.3575 LearningRate 0.0004 Epoch: 17 Global Step: 355060 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:01:59,372-Speed 6306.75 samples/sec Loss 5.2963 LearningRate 0.0004 Epoch: 17 Global Step: 355070 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:02,619-Speed 6309.13 samples/sec Loss 5.3295 LearningRate 0.0004 Epoch: 17 Global Step: 355080 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:05,872-Speed 6298.61 
samples/sec Loss 5.3062 LearningRate 0.0004 Epoch: 17 Global Step: 355090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:09,119-Speed 6307.33 samples/sec Loss 5.3703 LearningRate 0.0004 Epoch: 17 Global Step: 355100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:12,370-Speed 6301.99 samples/sec Loss 5.3026 LearningRate 0.0004 Epoch: 17 Global Step: 355110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:15,616-Speed 6310.11 samples/sec Loss 5.3080 LearningRate 0.0004 Epoch: 17 Global Step: 355120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:18,864-Speed 6307.32 samples/sec Loss 5.3287 LearningRate 0.0004 Epoch: 17 Global Step: 355130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:22,110-Speed 6310.18 samples/sec Loss 5.2781 LearningRate 0.0004 Epoch: 17 Global Step: 355140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:25,358-Speed 6306.79 samples/sec Loss 5.2243 LearningRate 0.0004 Epoch: 17 Global Step: 355150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:28,609-Speed 6300.95 samples/sec Loss 5.2439 LearningRate 0.0004 Epoch: 17 Global Step: 355160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:31,854-Speed 6311.85 samples/sec Loss 5.3643 LearningRate 0.0004 Epoch: 17 Global Step: 355170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:35,104-Speed 6305.12 samples/sec Loss 5.2932 LearningRate 0.0004 Epoch: 17 Global Step: 355180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:02:38,340-Speed 6329.77 samples/sec Loss 5.4068 LearningRate 0.0004 Epoch: 17 Global Step: 355190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:41,585-Speed 6311.39 samples/sec Loss 5.3944 LearningRate 0.0004 Epoch: 17 Global Step: 355200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:44,832-Speed 6309.89 samples/sec Loss 5.2733 
LearningRate 0.0004 Epoch: 17 Global Step: 355210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:48,077-Speed 6311.32 samples/sec Loss 5.3092 LearningRate 0.0004 Epoch: 17 Global Step: 355220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:51,323-Speed 6310.57 samples/sec Loss 5.2556 LearningRate 0.0004 Epoch: 17 Global Step: 355230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:54,573-Speed 6304.29 samples/sec Loss 5.2995 LearningRate 0.0004 Epoch: 17 Global Step: 355240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:02:57,816-Speed 6316.00 samples/sec Loss 5.3010 LearningRate 0.0004 Epoch: 17 Global Step: 355250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:01,060-Speed 6313.62 samples/sec Loss 5.2582 LearningRate 0.0004 Epoch: 17 Global Step: 355260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:04,311-Speed 6301.23 samples/sec Loss 5.2848 LearningRate 0.0004 Epoch: 17 Global Step: 355270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:07,567-Speed 6291.17 samples/sec Loss 5.3487 LearningRate 0.0004 Epoch: 17 Global Step: 355280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:10,813-Speed 6312.25 samples/sec Loss 5.2902 LearningRate 0.0004 Epoch: 17 Global Step: 355290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:14,058-Speed 6312.58 samples/sec Loss 5.2765 LearningRate 0.0004 Epoch: 17 Global Step: 355300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:17,303-Speed 6312.08 samples/sec Loss 5.3811 LearningRate 0.0004 Epoch: 17 Global Step: 355310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:20,551-Speed 6307.88 samples/sec Loss 5.3576 LearningRate 0.0004 Epoch: 17 Global Step: 355320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:23,796-Speed 6313.21 samples/sec Loss 5.3165 LearningRate 0.0004 Epoch: 17 
Global Step: 355330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:27,048-Speed 6298.57 samples/sec Loss 5.3530 LearningRate 0.0004 Epoch: 17 Global Step: 355340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:30,292-Speed 6314.44 samples/sec Loss 5.3417 LearningRate 0.0004 Epoch: 17 Global Step: 355350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:33,539-Speed 6308.51 samples/sec Loss 5.3045 LearningRate 0.0004 Epoch: 17 Global Step: 355360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:36,789-Speed 6301.65 samples/sec Loss 5.4271 LearningRate 0.0004 Epoch: 17 Global Step: 355370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:40,037-Speed 6307.75 samples/sec Loss 5.3419 LearningRate 0.0004 Epoch: 17 Global Step: 355380 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:43,286-Speed 6304.34 samples/sec Loss 5.3906 LearningRate 0.0004 Epoch: 17 Global Step: 355390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:03:46,516-Speed 6343.19 samples/sec Loss 5.3195 LearningRate 0.0004 Epoch: 17 Global Step: 355400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:49,770-Speed 6294.36 samples/sec Loss 5.3323 LearningRate 0.0004 Epoch: 17 Global Step: 355410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:53,015-Speed 6312.15 samples/sec Loss 5.3871 LearningRate 0.0004 Epoch: 17 Global Step: 355420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:56,265-Speed 6303.53 samples/sec Loss 5.3223 LearningRate 0.0004 Epoch: 17 Global Step: 355430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:03:59,511-Speed 6310.96 samples/sec Loss 5.3247 LearningRate 0.0004 Epoch: 17 Global Step: 355440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:02,759-Speed 6307.15 samples/sec Loss 5.2960 LearningRate 0.0004 Epoch: 17 Global Step: 355450 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:06,003-Speed 6313.19 samples/sec Loss 5.3236 LearningRate 0.0004 Epoch: 17 Global Step: 355460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:09,248-Speed 6314.65 samples/sec Loss 5.3444 LearningRate 0.0004 Epoch: 17 Global Step: 355470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:12,495-Speed 6308.38 samples/sec Loss 5.3478 LearningRate 0.0004 Epoch: 17 Global Step: 355480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:15,741-Speed 6309.54 samples/sec Loss 5.3879 LearningRate 0.0004 Epoch: 17 Global Step: 355490 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:04:18,988-Speed 6308.76 samples/sec Loss 5.3198 LearningRate 0.0004 Epoch: 17 Global Step: 355500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:22,234-Speed 6312.24 samples/sec Loss 5.3793 LearningRate 0.0004 Epoch: 17 Global Step: 355510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:25,486-Speed 6299.49 samples/sec Loss 5.2669 LearningRate 0.0004 Epoch: 17 Global Step: 355520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:28,787-Speed 6205.63 samples/sec Loss 5.3060 LearningRate 0.0004 Epoch: 17 Global Step: 355530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:32,037-Speed 6303.60 samples/sec Loss 5.3079 LearningRate 0.0004 Epoch: 17 Global Step: 355540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:35,284-Speed 6307.90 samples/sec Loss 5.3228 LearningRate 0.0004 Epoch: 17 Global Step: 355550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:38,532-Speed 6306.55 samples/sec Loss 5.2598 LearningRate 0.0004 Epoch: 17 Global Step: 355560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:41,780-Speed 6308.48 samples/sec Loss 5.3943 LearningRate 0.0004 Epoch: 17 Global Step: 355570 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:04:45,025-Speed 6311.41 samples/sec Loss 5.3788 LearningRate 0.0004 Epoch: 17 Global Step: 355580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:48,274-Speed 6306.84 samples/sec Loss 5.2443 LearningRate 0.0004 Epoch: 17 Global Step: 355590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:51,505-Speed 6339.57 samples/sec Loss 5.3454 LearningRate 0.0004 Epoch: 17 Global Step: 355600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:54,755-Speed 6302.76 samples/sec Loss 5.2974 LearningRate 0.0004 Epoch: 17 Global Step: 355610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:04:58,004-Speed 6304.41 samples/sec Loss 5.3255 LearningRate 0.0004 Epoch: 17 Global Step: 355620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:01,250-Speed 6312.49 samples/sec Loss 5.2609 LearningRate 0.0004 Epoch: 17 Global Step: 355630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:04,501-Speed 6300.14 samples/sec Loss 5.3100 LearningRate 0.0004 Epoch: 17 Global Step: 355640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:07,746-Speed 6312.16 samples/sec Loss 5.3432 LearningRate 0.0004 Epoch: 17 Global Step: 355650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:10,995-Speed 6305.27 samples/sec Loss 5.3139 LearningRate 0.0004 Epoch: 17 Global Step: 355660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:14,246-Speed 6300.67 samples/sec Loss 5.3692 LearningRate 0.0004 Epoch: 17 Global Step: 355670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:17,494-Speed 6308.32 samples/sec Loss 5.2899 LearningRate 0.0004 Epoch: 17 Global Step: 355680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:20,738-Speed 6313.54 samples/sec Loss 5.2744 LearningRate 0.0004 Epoch: 17 Global Step: 355690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:05:23,974-Speed 6329.73 samples/sec Loss 5.2882 LearningRate 0.0004 Epoch: 17 Global Step: 355700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:27,219-Speed 6312.62 samples/sec Loss 5.3720 LearningRate 0.0004 Epoch: 17 Global Step: 355710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:30,470-Speed 6301.37 samples/sec Loss 5.3754 LearningRate 0.0004 Epoch: 17 Global Step: 355720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:33,717-Speed 6309.47 samples/sec Loss 5.2997 LearningRate 0.0004 Epoch: 17 Global Step: 355730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:36,966-Speed 6304.39 samples/sec Loss 5.2887 LearningRate 0.0004 Epoch: 17 Global Step: 355740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:40,208-Speed 6319.47 samples/sec Loss 5.3017 LearningRate 0.0004 Epoch: 17 Global Step: 355750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:43,457-Speed 6305.32 samples/sec Loss 5.3243 LearningRate 0.0004 Epoch: 17 Global Step: 355760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:46,702-Speed 6313.16 samples/sec Loss 5.3137 LearningRate 0.0004 Epoch: 17 Global Step: 355770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:49,945-Speed 6316.02 samples/sec Loss 5.2943 LearningRate 0.0004 Epoch: 17 Global Step: 355780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:53,212-Speed 6269.83 samples/sec Loss 5.3289 LearningRate 0.0004 Epoch: 17 Global Step: 355790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:56,441-Speed 6343.67 samples/sec Loss 5.2909 LearningRate 0.0004 Epoch: 17 Global Step: 355800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:05:59,676-Speed 6332.12 samples/sec Loss 5.3200 LearningRate 0.0004 Epoch: 17 Global Step: 355810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:02,922-Speed 6310.31 
samples/sec Loss 5.3022 LearningRate 0.0004 Epoch: 17 Global Step: 355820 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:06,175-Speed 6296.99 samples/sec Loss 5.2995 LearningRate 0.0004 Epoch: 17 Global Step: 355830 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:09,422-Speed 6309.41 samples/sec Loss 5.2623 LearningRate 0.0004 Epoch: 17 Global Step: 355840 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:12,675-Speed 6297.56 samples/sec Loss 5.3834 LearningRate 0.0004 Epoch: 17 Global Step: 355850 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:15,923-Speed 6307.63 samples/sec Loss 5.3046 LearningRate 0.0004 Epoch: 17 Global Step: 355860 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:19,176-Speed 6296.84 samples/sec Loss 5.3705 LearningRate 0.0004 Epoch: 17 Global Step: 355870 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:22,419-Speed 6315.68 samples/sec Loss 5.3440 LearningRate 0.0004 Epoch: 17 Global Step: 355880 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:25,665-Speed 6311.13 samples/sec Loss 5.3303 LearningRate 0.0004 Epoch: 17 Global Step: 355890 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:28,913-Speed 6306.86 samples/sec Loss 5.2703 LearningRate 0.0004 Epoch: 17 Global Step: 355900 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:32,160-Speed 6307.68 samples/sec Loss 5.3109 LearningRate 0.0004 Epoch: 17 Global Step: 355910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:06:35,404-Speed 6316.06 samples/sec Loss 5.3244 LearningRate 0.0004 Epoch: 17 Global Step: 355920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:06:38,634-Speed 6341.98 samples/sec Loss 5.2743 LearningRate 0.0004 Epoch: 17 Global Step: 355930 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:41,890-Speed 6291.97 samples/sec Loss 5.3083 
LearningRate 0.0004 Epoch: 17 Global Step: 355940 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:45,132-Speed 6317.21 samples/sec Loss 5.3006 LearningRate 0.0004 Epoch: 17 Global Step: 355950 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:48,390-Speed 6287.94 samples/sec Loss 5.3439 LearningRate 0.0004 Epoch: 17 Global Step: 355960 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:51,632-Speed 6318.62 samples/sec Loss 5.3059 LearningRate 0.0004 Epoch: 17 Global Step: 355970 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:54,875-Speed 6317.16 samples/sec Loss 5.3498 LearningRate 0.0004 Epoch: 17 Global Step: 355980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:06:58,119-Speed 6313.99 samples/sec Loss 5.2338 LearningRate 0.0004 Epoch: 17 Global Step: 355990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:01,366-Speed 6310.06 samples/sec Loss 5.2893 LearningRate 0.0004 Epoch: 17 Global Step: 356000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:04,614-Speed 6305.71 samples/sec Loss 5.2602 LearningRate 0.0004 Epoch: 17 Global Step: 356010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:07,859-Speed 6313.18 samples/sec Loss 5.3298 LearningRate 0.0004 Epoch: 17 Global Step: 356020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:11,104-Speed 6312.97 samples/sec Loss 5.2765 LearningRate 0.0004 Epoch: 17 Global Step: 356030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:14,351-Speed 6308.15 samples/sec Loss 5.3181 LearningRate 0.0004 Epoch: 17 Global Step: 356040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:17,598-Speed 6309.49 samples/sec Loss 5.3087 LearningRate 0.0004 Epoch: 17 Global Step: 356050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:20,844-Speed 6310.20 samples/sec Loss 5.3375 LearningRate 0.0004 Epoch: 17 
Global Step: 356060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:24,087-Speed 6315.72 samples/sec Loss 5.3737 LearningRate 0.0004 Epoch: 17 Global Step: 356070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:27,346-Speed 6286.38 samples/sec Loss 5.3576 LearningRate 0.0004 Epoch: 17 Global Step: 356080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:30,592-Speed 6310.14 samples/sec Loss 5.3473 LearningRate 0.0004 Epoch: 17 Global Step: 356090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:33,838-Speed 6310.48 samples/sec Loss 5.3594 LearningRate 0.0004 Epoch: 17 Global Step: 356100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:37,082-Speed 6314.77 samples/sec Loss 5.2553 LearningRate 0.0004 Epoch: 17 Global Step: 356110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:40,329-Speed 6309.13 samples/sec Loss 5.2251 LearningRate 0.0004 Epoch: 17 Global Step: 356120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:43,564-Speed 6331.84 samples/sec Loss 5.3214 LearningRate 0.0004 Epoch: 17 Global Step: 356130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:46,808-Speed 6316.00 samples/sec Loss 5.2658 LearningRate 0.0004 Epoch: 17 Global Step: 356140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:07:50,041-Speed 6335.95 samples/sec Loss 5.3687 LearningRate 0.0004 Epoch: 17 Global Step: 356150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:53,286-Speed 6312.93 samples/sec Loss 5.2966 LearningRate 0.0004 Epoch: 17 Global Step: 356160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:56,529-Speed 6316.13 samples/sec Loss 5.3826 LearningRate 0.0004 Epoch: 17 Global Step: 356170 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:07:59,774-Speed 6312.38 samples/sec Loss 5.3811 LearningRate 0.0004 Epoch: 17 Global Step: 356180 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:03,020-Speed 6311.49 samples/sec Loss 5.3000 LearningRate 0.0004 Epoch: 17 Global Step: 356190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:06,266-Speed 6309.58 samples/sec Loss 5.3247 LearningRate 0.0004 Epoch: 17 Global Step: 356200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:09,509-Speed 6317.02 samples/sec Loss 5.3142 LearningRate 0.0004 Epoch: 17 Global Step: 356210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:12,758-Speed 6305.30 samples/sec Loss 5.3312 LearningRate 0.0004 Epoch: 17 Global Step: 356220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:16,004-Speed 6311.21 samples/sec Loss 5.3756 LearningRate 0.0004 Epoch: 17 Global Step: 356230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:19,247-Speed 6316.05 samples/sec Loss 5.2395 LearningRate 0.0004 Epoch: 17 Global Step: 356240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:22,492-Speed 6312.22 samples/sec Loss 5.3285 LearningRate 0.0004 Epoch: 17 Global Step: 356250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:08:25,740-Speed 6306.38 samples/sec Loss 5.3007 LearningRate 0.0004 Epoch: 17 Global Step: 356260 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:08:28,985-Speed 6312.17 samples/sec Loss 5.3725 LearningRate 0.0004 Epoch: 17 Global Step: 356270 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:08:32,229-Speed 6315.37 samples/sec Loss 5.2524 LearningRate 0.0004 Epoch: 17 Global Step: 356280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:08:35,475-Speed 6310.89 samples/sec Loss 5.2339 LearningRate 0.0004 Epoch: 17 Global Step: 356290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:08:38,716-Speed 6319.58 samples/sec Loss 5.3246 LearningRate 0.0004 Epoch: 17 Global Step: 356300 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:08:41,947-Speed 6341.13 samples/sec Loss 5.4011 LearningRate 0.0004 Epoch: 17 Global Step: 356310 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:45,193-Speed 6309.27 samples/sec Loss 5.2990 LearningRate 0.0004 Epoch: 17 Global Step: 356320 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:48,471-Speed 6249.12 samples/sec Loss 5.3002 LearningRate 0.0004 Epoch: 17 Global Step: 356330 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:51,719-Speed 6307.81 samples/sec Loss 5.2997 LearningRate 0.0004 Epoch: 17 Global Step: 356340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:54,966-Speed 6309.67 samples/sec Loss 5.3047 LearningRate 0.0004 Epoch: 17 Global Step: 356350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:08:58,211-Speed 6311.18 samples/sec Loss 5.3053 LearningRate 0.0004 Epoch: 17 Global Step: 356360 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:09:01,458-Speed 6310.22 samples/sec Loss 5.3257 LearningRate 0.0004 Epoch: 17 Global Step: 356370 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:09:04,704-Speed 6310.68 samples/sec Loss 5.2288 LearningRate 0.0004 Epoch: 17 Global Step: 356380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:09:07,949-Speed 6311.31 samples/sec Loss 5.3384 LearningRate 0.0004 Epoch: 17 Global Step: 356390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:09:11,196-Speed 6310.81 samples/sec Loss 5.3053 LearningRate 0.0004 Epoch: 17 Global Step: 356400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:09:14,438-Speed 6317.57 samples/sec Loss 5.2878 LearningRate 0.0004 Epoch: 17 Global Step: 356410 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:17,683-Speed 6313.27 samples/sec Loss 5.3326 LearningRate 0.0004 Epoch: 17 Global Step: 356420 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:09:20,926-Speed 6315.85 samples/sec Loss 5.3047 LearningRate 0.0004 Epoch: 17 Global Step: 356430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:24,178-Speed 6299.18 samples/sec Loss 5.2513 LearningRate 0.0004 Epoch: 17 Global Step: 356440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:27,423-Speed 6311.91 samples/sec Loss 5.3417 LearningRate 0.0004 Epoch: 17 Global Step: 356450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:30,668-Speed 6313.79 samples/sec Loss 5.3103 LearningRate 0.0004 Epoch: 17 Global Step: 356460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:33,914-Speed 6310.94 samples/sec Loss 5.3717 LearningRate 0.0004 Epoch: 17 Global Step: 356470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:37,160-Speed 6310.19 samples/sec Loss 5.2332 LearningRate 0.0004 Epoch: 17 Global Step: 356480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:40,413-Speed 6297.58 samples/sec Loss 5.3166 LearningRate 0.0004 Epoch: 17 Global Step: 356490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:43,658-Speed 6311.89 samples/sec Loss 5.3294 LearningRate 0.0004 Epoch: 17 Global Step: 356500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:46,902-Speed 6315.16 samples/sec Loss 5.2556 LearningRate 0.0004 Epoch: 17 Global Step: 356510 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-04-02 00:09:50,134-Speed 6337.22 samples/sec Loss 5.3500 LearningRate 0.0004 Epoch: 17 Global Step: 356520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:53,384-Speed 6302.93 samples/sec Loss 5.3294 LearningRate 0.0004 Epoch: 17 Global Step: 356530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:56,630-Speed 6310.55 samples/sec Loss 5.3151 LearningRate 0.0004 Epoch: 17 Global Step: 356540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:09:59,874-Speed 6314.74 
samples/sec Loss 5.3579 LearningRate 0.0004 Epoch: 17 Global Step: 356550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:03,122-Speed 6306.64 samples/sec Loss 5.3086 LearningRate 0.0004 Epoch: 17 Global Step: 356560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:06,368-Speed 6311.71 samples/sec Loss 5.3870 LearningRate 0.0004 Epoch: 17 Global Step: 356570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:09,614-Speed 6310.59 samples/sec Loss 5.3411 LearningRate 0.0004 Epoch: 17 Global Step: 356580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:12,858-Speed 6314.21 samples/sec Loss 5.3518 LearningRate 0.0004 Epoch: 17 Global Step: 356590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:16,106-Speed 6308.11 samples/sec Loss 5.3479 LearningRate 0.0004 Epoch: 17 Global Step: 356600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:19,350-Speed 6314.11 samples/sec Loss 5.3246 LearningRate 0.0004 Epoch: 17 Global Step: 356610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:22,580-Speed 6341.55 samples/sec Loss 5.2818 LearningRate 0.0004 Epoch: 17 Global Step: 356620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:25,888-Speed 6192.70 samples/sec Loss 5.3107 LearningRate 0.0004 Epoch: 17 Global Step: 356630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:29,135-Speed 6309.03 samples/sec Loss 5.2686 LearningRate 0.0004 Epoch: 17 Global Step: 356640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:32,380-Speed 6312.32 samples/sec Loss 5.3192 LearningRate 0.0004 Epoch: 17 Global Step: 356650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:35,627-Speed 6309.33 samples/sec Loss 5.3532 LearningRate 0.0004 Epoch: 17 Global Step: 356660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:38,871-Speed 6314.09 samples/sec Loss 5.3482 
LearningRate 0.0004 Epoch: 17 Global Step: 356670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:42,118-Speed 6309.20 samples/sec Loss 5.3605 LearningRate 0.0004 Epoch: 17 Global Step: 356680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:45,361-Speed 6316.61 samples/sec Loss 5.2837 LearningRate 0.0004 Epoch: 17 Global Step: 356690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:48,609-Speed 6305.64 samples/sec Loss 5.3687 LearningRate 0.0004 Epoch: 17 Global Step: 356700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:51,856-Speed 6309.64 samples/sec Loss 5.2709 LearningRate 0.0004 Epoch: 17 Global Step: 356710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:10:55,075-Speed 6364.00 samples/sec Loss 5.3603 LearningRate 0.0004 Epoch: 17 Global Step: 356720 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:10:58,327-Speed 6297.17 samples/sec Loss 5.2991 LearningRate 0.0004 Epoch: 17 Global Step: 356730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:01,570-Speed 6317.10 samples/sec Loss 5.3528 LearningRate 0.0004 Epoch: 17 Global Step: 356740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:04,818-Speed 6306.47 samples/sec Loss 5.3414 LearningRate 0.0004 Epoch: 17 Global Step: 356750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:12,275-Speed 2747.00 samples/sec Loss 5.2951 LearningRate 0.0004 Epoch: 17 Global Step: 356760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:15,517-Speed 6318.49 samples/sec Loss 5.3034 LearningRate 0.0004 Epoch: 17 Global Step: 356770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:18,762-Speed 6311.65 samples/sec Loss 5.3134 LearningRate 0.0004 Epoch: 17 Global Step: 356780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:22,005-Speed 6317.22 samples/sec Loss 5.2876 LearningRate 0.0004 Epoch: 17 
Global Step: 356790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:25,251-Speed 6310.74 samples/sec Loss 5.2721 LearningRate 0.0004 Epoch: 17 Global Step: 356800 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:28,493-Speed 6318.25 samples/sec Loss 5.2964 LearningRate 0.0004 Epoch: 17 Global Step: 356810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:11:31,740-Speed 6308.95 samples/sec Loss 5.2788 LearningRate 0.0004 Epoch: 17 Global Step: 356820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:34,983-Speed 6316.31 samples/sec Loss 5.2728 LearningRate 0.0004 Epoch: 17 Global Step: 356830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:38,225-Speed 6319.68 samples/sec Loss 5.2909 LearningRate 0.0004 Epoch: 17 Global Step: 356840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:41,469-Speed 6313.68 samples/sec Loss 5.3574 LearningRate 0.0004 Epoch: 17 Global Step: 356850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:44,709-Speed 6322.34 samples/sec Loss 5.3636 LearningRate 0.0004 Epoch: 17 Global Step: 356860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:47,956-Speed 6308.85 samples/sec Loss 5.3748 LearningRate 0.0004 Epoch: 17 Global Step: 356870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:51,200-Speed 6315.89 samples/sec Loss 5.2976 LearningRate 0.0004 Epoch: 17 Global Step: 356880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:54,447-Speed 6307.35 samples/sec Loss 5.2722 LearningRate 0.0004 Epoch: 17 Global Step: 356890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:11:57,693-Speed 6310.51 samples/sec Loss 5.2959 LearningRate 0.0004 Epoch: 17 Global Step: 356900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:00,940-Speed 6309.38 samples/sec Loss 5.3065 LearningRate 0.0004 Epoch: 17 Global Step: 356910 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:04,186-Speed 6310.82 samples/sec Loss 5.3078 LearningRate 0.0004 Epoch: 17 Global Step: 356920 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-04-02 00:12:07,416-Speed 6341.04 samples/sec Loss 5.3234 LearningRate 0.0004 Epoch: 17 Global Step: 356930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:10,663-Speed 6309.35 samples/sec Loss 5.2840 LearningRate 0.0004 Epoch: 17 Global Step: 356940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:13,910-Speed 6308.15 samples/sec Loss 5.3163 LearningRate 0.0004 Epoch: 17 Global Step: 356950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:17,151-Speed 6320.63 samples/sec Loss 5.3259 LearningRate 0.0004 Epoch: 17 Global Step: 356960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:20,394-Speed 6316.99 samples/sec Loss 5.3070 LearningRate 0.0004 Epoch: 17 Global Step: 356970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:23,637-Speed 6317.39 samples/sec Loss 5.2582 LearningRate 0.0004 Epoch: 17 Global Step: 356980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:26,886-Speed 6304.43 samples/sec Loss 5.2929 LearningRate 0.0004 Epoch: 17 Global Step: 356990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:30,130-Speed 6315.49 samples/sec Loss 5.3044 LearningRate 0.0004 Epoch: 17 Global Step: 357000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:33,375-Speed 6312.59 samples/sec Loss 5.3831 LearningRate 0.0004 Epoch: 17 Global Step: 357010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:36,621-Speed 6311.57 samples/sec Loss 5.2609 LearningRate 0.0004 Epoch: 17 Global Step: 357020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:39,853-Speed 6337.75 samples/sec Loss 5.3227 LearningRate 0.0004 Epoch: 17 Global Step: 357030 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:12:43,098-Speed 6311.62 samples/sec Loss 5.3276 LearningRate 0.0004 Epoch: 17 Global Step: 357040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:46,339-Speed 6320.70 samples/sec Loss 5.3534 LearningRate 0.0004 Epoch: 17 Global Step: 357050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:49,583-Speed 6313.69 samples/sec Loss 5.3242 LearningRate 0.0004 Epoch: 17 Global Step: 357060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:52,828-Speed 6314.66 samples/sec Loss 5.3567 LearningRate 0.0004 Epoch: 17 Global Step: 357070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:56,077-Speed 6303.21 samples/sec Loss 5.3509 LearningRate 0.0004 Epoch: 17 Global Step: 357080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:12:59,325-Speed 6308.10 samples/sec Loss 5.2526 LearningRate 0.0004 Epoch: 17 Global Step: 357090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:02,571-Speed 6309.78 samples/sec Loss 5.2601 LearningRate 0.0004 Epoch: 17 Global Step: 357100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:05,820-Speed 6306.12 samples/sec Loss 5.3558 LearningRate 0.0004 Epoch: 17 Global Step: 357110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:09,067-Speed 6308.54 samples/sec Loss 5.3226 LearningRate 0.0004 Epoch: 17 Global Step: 357120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:12,300-Speed 6334.80 samples/sec Loss 5.3466 LearningRate 0.0004 Epoch: 17 Global Step: 357130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:15,544-Speed 6315.59 samples/sec Loss 5.3621 LearningRate 0.0004 Epoch: 17 Global Step: 357140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:18,783-Speed 6323.60 samples/sec Loss 5.3177 LearningRate 0.0004 Epoch: 17 Global Step: 357150 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 
00:13:22,029-Speed 6311.48 samples/sec Loss 5.2050 LearningRate 0.0004 Epoch: 17 Global Step: 357160 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:25,286-Speed 6289.71 samples/sec Loss 5.2854 LearningRate 0.0004 Epoch: 17 Global Step: 357170 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:28,532-Speed 6309.68 samples/sec Loss 5.2781 LearningRate 0.0004 Epoch: 17 Global Step: 357180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:31,777-Speed 6312.65 samples/sec Loss 5.3130 LearningRate 0.0004 Epoch: 17 Global Step: 357190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:35,022-Speed 6314.55 samples/sec Loss 5.4244 LearningRate 0.0004 Epoch: 17 Global Step: 357200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:38,271-Speed 6303.80 samples/sec Loss 5.2229 LearningRate 0.0004 Epoch: 17 Global Step: 357210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:41,518-Speed 6308.33 samples/sec Loss 5.2734 LearningRate 0.0004 Epoch: 17 Global Step: 357220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:44,766-Speed 6311.12 samples/sec Loss 5.3714 LearningRate 0.0004 Epoch: 17 Global Step: 357230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:48,012-Speed 6310.17 samples/sec Loss 5.2631 LearningRate 0.0004 Epoch: 17 Global Step: 357240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:51,260-Speed 6306.54 samples/sec Loss 5.3195 LearningRate 0.0004 Epoch: 17 Global Step: 357250 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:13:54,494-Speed 6334.42 samples/sec Loss 5.3039 LearningRate 0.0004 Epoch: 17 Global Step: 357260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:13:57,744-Speed 6302.45 samples/sec Loss 5.3241 LearningRate 0.0004 Epoch: 17 Global Step: 357270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:00,991-Speed 6308.47 
samples/sec Loss 5.2927 LearningRate 0.0004 Epoch: 17 Global Step: 357280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:04,238-Speed 6308.86 samples/sec Loss 5.3420 LearningRate 0.0004 Epoch: 17 Global Step: 357290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:07,485-Speed 6308.60 samples/sec Loss 5.3273 LearningRate 0.0004 Epoch: 17 Global Step: 357300 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:10,731-Speed 6311.48 samples/sec Loss 5.2756 LearningRate 0.0004 Epoch: 17 Global Step: 357310 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:13,995-Speed 6275.16 samples/sec Loss 5.2915 LearningRate 0.0004 Epoch: 17 Global Step: 357320 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:17,240-Speed 6315.40 samples/sec Loss 5.3508 LearningRate 0.0004 Epoch: 17 Global Step: 357330 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:20,488-Speed 6309.10 samples/sec Loss 5.2533 LearningRate 0.0004 Epoch: 17 Global Step: 357340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:23,736-Speed 6306.72 samples/sec Loss 5.3175 LearningRate 0.0004 Epoch: 17 Global Step: 357350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:14:26,981-Speed 6313.42 samples/sec Loss 5.3498 LearningRate 0.0004 Epoch: 17 Global Step: 357360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:30,229-Speed 6305.91 samples/sec Loss 5.2895 LearningRate 0.0004 Epoch: 17 Global Step: 357370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:33,473-Speed 6315.10 samples/sec Loss 5.3034 LearningRate 0.0004 Epoch: 17 Global Step: 357380 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:36,721-Speed 6308.56 samples/sec Loss 5.3523 LearningRate 0.0004 Epoch: 17 Global Step: 357390 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:39,967-Speed 6310.05 samples/sec Loss 5.2724 
LearningRate 0.0004 Epoch: 17 Global Step: 357400 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:43,219-Speed 6298.87 samples/sec Loss 5.2649 LearningRate 0.0004 Epoch: 17 Global Step: 357410 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:46,470-Speed 6301.62 samples/sec Loss 5.2839 LearningRate 0.0004 Epoch: 17 Global Step: 357420 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:49,715-Speed 6313.04 samples/sec Loss 5.3331 LearningRate 0.0004 Epoch: 17 Global Step: 357430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:52,966-Speed 6300.73 samples/sec Loss 5.3391 LearningRate 0.0004 Epoch: 17 Global Step: 357440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:56,211-Speed 6313.46 samples/sec Loss 5.3122 LearningRate 0.0004 Epoch: 17 Global Step: 357450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:14:59,445-Speed 6333.66 samples/sec Loss 5.3443 LearningRate 0.0004 Epoch: 17 Global Step: 357460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:02,696-Speed 6299.98 samples/sec Loss 5.3041 LearningRate 0.0004 Epoch: 17 Global Step: 357470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:05,946-Speed 6304.50 samples/sec Loss 5.3114 LearningRate 0.0004 Epoch: 17 Global Step: 357480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:09,191-Speed 6313.44 samples/sec Loss 5.2875 LearningRate 0.0004 Epoch: 17 Global Step: 357490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:12,436-Speed 6312.24 samples/sec Loss 5.2895 LearningRate 0.0004 Epoch: 17 Global Step: 357500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:15,683-Speed 6308.71 samples/sec Loss 5.2273 LearningRate 0.0004 Epoch: 17 Global Step: 357510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:18,933-Speed 6303.79 samples/sec Loss 5.3564 LearningRate 0.0004 Epoch: 17 
Global Step: 357520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:22,181-Speed 6305.86 samples/sec Loss 5.2645 LearningRate 0.0004 Epoch: 17 Global Step: 357530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:25,425-Speed 6314.88 samples/sec Loss 5.4163 LearningRate 0.0004 Epoch: 17 Global Step: 357540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:28,672-Speed 6308.67 samples/sec Loss 5.3843 LearningRate 0.0004 Epoch: 17 Global Step: 357550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:31,908-Speed 6330.37 samples/sec Loss 5.3529 LearningRate 0.0004 Epoch: 17 Global Step: 357560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:35,152-Speed 6314.70 samples/sec Loss 5.3057 LearningRate 0.0004 Epoch: 17 Global Step: 357570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:38,400-Speed 6306.28 samples/sec Loss 5.2795 LearningRate 0.0004 Epoch: 17 Global Step: 357580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:41,646-Speed 6310.88 samples/sec Loss 5.3439 LearningRate 0.0004 Epoch: 17 Global Step: 357590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:44,887-Speed 6320.05 samples/sec Loss 5.3070 LearningRate 0.0004 Epoch: 17 Global Step: 357600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:48,132-Speed 6313.72 samples/sec Loss 5.2965 LearningRate 0.0004 Epoch: 17 Global Step: 357610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:51,379-Speed 6308.52 samples/sec Loss 5.3798 LearningRate 0.0004 Epoch: 17 Global Step: 357620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:54,619-Speed 6322.95 samples/sec Loss 5.3507 LearningRate 0.0004 Epoch: 17 Global Step: 357630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:15:57,861-Speed 6318.54 samples/sec Loss 5.2876 LearningRate 0.0004 Epoch: 17 Global Step: 357640 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:01,121-Speed 6284.16 samples/sec Loss 5.2292 LearningRate 0.0004 Epoch: 17 Global Step: 357650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:04,352-Speed 6340.48 samples/sec Loss 5.3319 LearningRate 0.0004 Epoch: 17 Global Step: 357660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:07,599-Speed 6308.77 samples/sec Loss 5.3253 LearningRate 0.0004 Epoch: 17 Global Step: 357670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:10,855-Speed 6291.10 samples/sec Loss 5.3029 LearningRate 0.0004 Epoch: 17 Global Step: 357680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:14,096-Speed 6321.02 samples/sec Loss 5.1886 LearningRate 0.0004 Epoch: 17 Global Step: 357690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:17,342-Speed 6309.98 samples/sec Loss 5.3479 LearningRate 0.0004 Epoch: 17 Global Step: 357700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:20,587-Speed 6316.45 samples/sec Loss 5.2946 LearningRate 0.0004 Epoch: 17 Global Step: 357710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:23,818-Speed 6339.53 samples/sec Loss 5.3522 LearningRate 0.0004 Epoch: 17 Global Step: 357720 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:27,067-Speed 6306.22 samples/sec Loss 5.3041 LearningRate 0.0004 Epoch: 17 Global Step: 357730 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:30,310-Speed 6317.44 samples/sec Loss 5.3033 LearningRate 0.0004 Epoch: 17 Global Step: 357740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:33,554-Speed 6314.49 samples/sec Loss 5.2965 LearningRate 0.0004 Epoch: 17 Global Step: 357750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:36,799-Speed 6312.80 samples/sec Loss 5.2984 LearningRate 0.0004 Epoch: 17 Global Step: 357760 Fp16 Grad Scale: 16384 Required: 43 hours 
Training: 2022-04-02 00:16:40,043-Speed 6313.84 samples/sec Loss 5.2921 LearningRate 0.0004 Epoch: 17 Global Step: 357770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:43,286-Speed 6316.41 samples/sec Loss 5.2548 LearningRate 0.0004 Epoch: 17 Global Step: 357780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:46,529-Speed 6316.11 samples/sec Loss 5.2708 LearningRate 0.0004 Epoch: 17 Global Step: 357790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:49,778-Speed 6305.31 samples/sec Loss 5.3556 LearningRate 0.0004 Epoch: 17 Global Step: 357800 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:53,024-Speed 6310.89 samples/sec Loss 5.2977 LearningRate 0.0004 Epoch: 17 Global Step: 357810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:16:56,278-Speed 6296.13 samples/sec Loss 5.3514 LearningRate 0.0004 Epoch: 17 Global Step: 357820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:16:59,520-Speed 6316.62 samples/sec Loss 5.2983 LearningRate 0.0004 Epoch: 17 Global Step: 357830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:02,766-Speed 6311.88 samples/sec Loss 5.3440 LearningRate 0.0004 Epoch: 17 Global Step: 357840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:06,010-Speed 6313.98 samples/sec Loss 5.3220 LearningRate 0.0004 Epoch: 17 Global Step: 357850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:09,259-Speed 6306.25 samples/sec Loss 5.3063 LearningRate 0.0004 Epoch: 17 Global Step: 357860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:12,514-Speed 6292.93 samples/sec Loss 5.3337 LearningRate 0.0004 Epoch: 17 Global Step: 357870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:15,754-Speed 6322.08 samples/sec Loss 5.3180 LearningRate 0.0004 Epoch: 17 Global Step: 357880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:17:18,997-Speed 6316.06 samples/sec Loss 5.2981 LearningRate 0.0004 Epoch: 17 Global Step: 357890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:22,241-Speed 6315.58 samples/sec Loss 5.2769 LearningRate 0.0004 Epoch: 17 Global Step: 357900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:25,489-Speed 6307.10 samples/sec Loss 5.3503 LearningRate 0.0004 Epoch: 17 Global Step: 357910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:28,721-Speed 6338.55 samples/sec Loss 5.3368 LearningRate 0.0004 Epoch: 17 Global Step: 357920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:31,964-Speed 6316.21 samples/sec Loss 5.3606 LearningRate 0.0004 Epoch: 17 Global Step: 357930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:35,212-Speed 6305.61 samples/sec Loss 5.2844 LearningRate 0.0004 Epoch: 17 Global Step: 357940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:38,462-Speed 6304.31 samples/sec Loss 5.3342 LearningRate 0.0004 Epoch: 17 Global Step: 357950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:41,708-Speed 6310.75 samples/sec Loss 5.2430 LearningRate 0.0004 Epoch: 17 Global Step: 357960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:44,952-Speed 6314.80 samples/sec Loss 5.2470 LearningRate 0.0004 Epoch: 17 Global Step: 357970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:48,197-Speed 6311.26 samples/sec Loss 5.3511 LearningRate 0.0004 Epoch: 17 Global Step: 357980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:51,444-Speed 6308.28 samples/sec Loss 5.2953 LearningRate 0.0004 Epoch: 17 Global Step: 357990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:54,713-Speed 6265.93 samples/sec Loss 5.2958 LearningRate 0.0004 Epoch: 17 Global Step: 358000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:17:57,966-Speed 6298.95 
samples/sec Loss 5.2977 LearningRate 0.0004 Epoch: 17 Global Step: 358010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:01,201-Speed 6331.74 samples/sec Loss 5.2474 LearningRate 0.0004 Epoch: 17 Global Step: 358020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:04,476-Speed 6253.85 samples/sec Loss 5.2775 LearningRate 0.0004 Epoch: 17 Global Step: 358030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:07,720-Speed 6315.06 samples/sec Loss 5.3245 LearningRate 0.0004 Epoch: 17 Global Step: 358040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:10,968-Speed 6307.64 samples/sec Loss 5.3187 LearningRate 0.0004 Epoch: 17 Global Step: 358050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:14,230-Speed 6280.84 samples/sec Loss 5.2966 LearningRate 0.0004 Epoch: 17 Global Step: 358060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:17,478-Speed 6306.45 samples/sec Loss 5.3837 LearningRate 0.0004 Epoch: 17 Global Step: 358070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:20,721-Speed 6317.15 samples/sec Loss 5.2558 LearningRate 0.0004 Epoch: 17 Global Step: 358080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:23,972-Speed 6299.36 samples/sec Loss 5.2993 LearningRate 0.0004 Epoch: 17 Global Step: 358090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:27,229-Speed 6289.92 samples/sec Loss 5.2477 LearningRate 0.0004 Epoch: 17 Global Step: 358100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:30,472-Speed 6317.10 samples/sec Loss 5.2713 LearningRate 0.0004 Epoch: 17 Global Step: 358110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:33,703-Speed 6338.98 samples/sec Loss 5.3379 LearningRate 0.0004 Epoch: 17 Global Step: 358120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:36,947-Speed 6315.13 samples/sec Loss 5.3084 
LearningRate 0.0004 Epoch: 17 Global Step: 358130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:40,192-Speed 6312.06 samples/sec Loss 5.3394 LearningRate 0.0004 Epoch: 17 Global Step: 358140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:43,440-Speed 6307.54 samples/sec Loss 5.3198 LearningRate 0.0004 Epoch: 17 Global Step: 358150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:46,683-Speed 6315.97 samples/sec Loss 5.2635 LearningRate 0.0004 Epoch: 17 Global Step: 358160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:49,930-Speed 6308.51 samples/sec Loss 5.2851 LearningRate 0.0004 Epoch: 17 Global Step: 358170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:53,199-Speed 6267.82 samples/sec Loss 5.3314 LearningRate 0.0004 Epoch: 17 Global Step: 358180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:56,444-Speed 6313.04 samples/sec Loss 5.3173 LearningRate 0.0004 Epoch: 17 Global Step: 358190 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:18:59,676-Speed 6337.81 samples/sec Loss 5.3350 LearningRate 0.0004 Epoch: 17 Global Step: 358200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:02,925-Speed 6304.99 samples/sec Loss 5.2934 LearningRate 0.0004 Epoch: 17 Global Step: 358210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:06,170-Speed 6312.31 samples/sec Loss 5.3074 LearningRate 0.0004 Epoch: 17 Global Step: 358220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:09,412-Speed 6318.72 samples/sec Loss 5.3270 LearningRate 0.0004 Epoch: 17 Global Step: 358230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:12,661-Speed 6303.79 samples/sec Loss 5.3695 LearningRate 0.0004 Epoch: 17 Global Step: 358240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:15,903-Speed 6318.66 samples/sec Loss 5.3429 LearningRate 0.0004 Epoch: 17 
Global Step: 358250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:19,152-Speed 6305.43 samples/sec Loss 5.2928 LearningRate 0.0004 Epoch: 17 Global Step: 358260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:22,400-Speed 6307.98 samples/sec Loss 5.2720 LearningRate 0.0004 Epoch: 17 Global Step: 358270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:25,646-Speed 6310.94 samples/sec Loss 5.3193 LearningRate 0.0004 Epoch: 17 Global Step: 358280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:28,890-Speed 6313.40 samples/sec Loss 5.3446 LearningRate 0.0004 Epoch: 17 Global Step: 358290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:19:32,134-Speed 6314.94 samples/sec Loss 5.3130 LearningRate 0.0004 Epoch: 17 Global Step: 358300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:35,380-Speed 6311.62 samples/sec Loss 5.2856 LearningRate 0.0004 Epoch: 17 Global Step: 358310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:38,623-Speed 6314.80 samples/sec Loss 5.2157 LearningRate 0.0004 Epoch: 17 Global Step: 358320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:41,864-Speed 6320.53 samples/sec Loss 5.3782 LearningRate 0.0004 Epoch: 17 Global Step: 358330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:45,108-Speed 6316.22 samples/sec Loss 5.3051 LearningRate 0.0004 Epoch: 17 Global Step: 358340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:48,353-Speed 6311.00 samples/sec Loss 5.3125 LearningRate 0.0004 Epoch: 17 Global Step: 358350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:51,599-Speed 6311.19 samples/sec Loss 5.3536 LearningRate 0.0004 Epoch: 17 Global Step: 358360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:54,842-Speed 6315.83 samples/sec Loss 5.2907 LearningRate 0.0004 Epoch: 17 Global Step: 358370 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:19:58,075-Speed 6336.36 samples/sec Loss 5.3251 LearningRate 0.0004 Epoch: 17 Global Step: 358380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:01,324-Speed 6305.32 samples/sec Loss 5.3618 LearningRate 0.0004 Epoch: 17 Global Step: 358390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:04,566-Speed 6319.32 samples/sec Loss 5.2987 LearningRate 0.0004 Epoch: 17 Global Step: 358400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:07,812-Speed 6310.96 samples/sec Loss 5.2711 LearningRate 0.0004 Epoch: 17 Global Step: 358410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:11,054-Speed 6317.50 samples/sec Loss 5.3944 LearningRate 0.0004 Epoch: 17 Global Step: 358420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:14,301-Speed 6308.28 samples/sec Loss 5.3418 LearningRate 0.0004 Epoch: 17 Global Step: 358430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:17,550-Speed 6305.45 samples/sec Loss 5.3739 LearningRate 0.0004 Epoch: 17 Global Step: 358440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:20,798-Speed 6306.27 samples/sec Loss 5.3113 LearningRate 0.0004 Epoch: 17 Global Step: 358450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:24,042-Speed 6315.34 samples/sec Loss 5.3781 LearningRate 0.0004 Epoch: 17 Global Step: 358460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:27,284-Speed 6319.48 samples/sec Loss 5.3218 LearningRate 0.0004 Epoch: 17 Global Step: 358470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:20:30,534-Speed 6302.87 samples/sec Loss 5.2715 LearningRate 0.0004 Epoch: 17 Global Step: 358480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:33,781-Speed 6308.42 samples/sec Loss 5.3611 LearningRate 0.0004 Epoch: 17 Global Step: 358490 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:20:37,027-Speed 6310.61 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 358500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:40,274-Speed 6309.50 samples/sec Loss 5.2903 LearningRate 0.0004 Epoch: 17 Global Step: 358510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:43,517-Speed 6316.24 samples/sec Loss 5.3248 LearningRate 0.0004 Epoch: 17 Global Step: 358520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:46,764-Speed 6308.04 samples/sec Loss 5.2891 LearningRate 0.0004 Epoch: 17 Global Step: 358530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:50,011-Speed 6309.97 samples/sec Loss 5.2828 LearningRate 0.0004 Epoch: 17 Global Step: 358540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:53,255-Speed 6314.65 samples/sec Loss 5.3298 LearningRate 0.0004 Epoch: 17 Global Step: 358550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:56,501-Speed 6309.28 samples/sec Loss 5.2917 LearningRate 0.0004 Epoch: 17 Global Step: 358560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:20:59,787-Speed 6235.48 samples/sec Loss 5.3325 LearningRate 0.0004 Epoch: 17 Global Step: 358570 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:03,021-Speed 6333.79 samples/sec Loss 5.3066 LearningRate 0.0004 Epoch: 17 Global Step: 358580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:06,271-Speed 6302.02 samples/sec Loss 5.3722 LearningRate 0.0004 Epoch: 17 Global Step: 358590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:09,513-Speed 6318.65 samples/sec Loss 5.3026 LearningRate 0.0004 Epoch: 17 Global Step: 358600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:12,764-Speed 6301.34 samples/sec Loss 5.3222 LearningRate 0.0004 Epoch: 17 Global Step: 358610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:21:15,996-Speed 6337.95 samples/sec Loss 5.2950 LearningRate 0.0004 Epoch: 17 Global Step: 358620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:19,241-Speed 6311.93 samples/sec Loss 5.2193 LearningRate 0.0004 Epoch: 17 Global Step: 358630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:22,488-Speed 6308.18 samples/sec Loss 5.3250 LearningRate 0.0004 Epoch: 17 Global Step: 358640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:25,734-Speed 6312.86 samples/sec Loss 5.3351 LearningRate 0.0004 Epoch: 17 Global Step: 358650 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:28,974-Speed 6321.35 samples/sec Loss 5.2170 LearningRate 0.0004 Epoch: 17 Global Step: 358660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:32,219-Speed 6311.94 samples/sec Loss 5.3522 LearningRate 0.0004 Epoch: 17 Global Step: 358670 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:35,466-Speed 6309.48 samples/sec Loss 5.3280 LearningRate 0.0004 Epoch: 17 Global Step: 358680 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:38,711-Speed 6312.51 samples/sec Loss 5.3747 LearningRate 0.0004 Epoch: 17 Global Step: 358690 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:41,955-Speed 6315.32 samples/sec Loss 5.2924 LearningRate 0.0004 Epoch: 17 Global Step: 358700 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:45,199-Speed 6314.60 samples/sec Loss 5.2394 LearningRate 0.0004 Epoch: 17 Global Step: 358710 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:48,446-Speed 6309.77 samples/sec Loss 5.2823 LearningRate 0.0004 Epoch: 17 Global Step: 358720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:51,685-Speed 6323.73 samples/sec Loss 5.3308 LearningRate 0.0004 Epoch: 17 Global Step: 358730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:21:54,913-Speed 6344.97 
samples/sec Loss 5.3076 LearningRate 0.0004 Epoch: 17 Global Step: 358740 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:21:58,158-Speed 6313.53 samples/sec Loss 5.2450 LearningRate 0.0004 Epoch: 17 Global Step: 358750 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:01,407-Speed 6304.34 samples/sec Loss 5.2885 LearningRate 0.0004 Epoch: 17 Global Step: 358760 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:04,708-Speed 6205.33 samples/sec Loss 5.2788 LearningRate 0.0004 Epoch: 17 Global Step: 358770 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:07,952-Speed 6314.60 samples/sec Loss 5.2868 LearningRate 0.0004 Epoch: 17 Global Step: 358780 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:11,201-Speed 6306.28 samples/sec Loss 5.2574 LearningRate 0.0004 Epoch: 17 Global Step: 358790 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:14,456-Speed 6291.79 samples/sec Loss 5.2743 LearningRate 0.0004 Epoch: 17 Global Step: 358800 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:17,700-Speed 6314.58 samples/sec Loss 5.2413 LearningRate 0.0004 Epoch: 17 Global Step: 358810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:20,945-Speed 6314.15 samples/sec Loss 5.3097 LearningRate 0.0004 Epoch: 17 Global Step: 358820 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:24,189-Speed 6314.00 samples/sec Loss 5.2483 LearningRate 0.0004 Epoch: 17 Global Step: 358830 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:22:27,435-Speed 6311.48 samples/sec Loss 5.3021 LearningRate 0.0004 Epoch: 17 Global Step: 358840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:30,677-Speed 6316.49 samples/sec Loss 5.2672 LearningRate 0.0004 Epoch: 17 Global Step: 358850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:33,921-Speed 6316.54 samples/sec Loss 5.2613 
LearningRate 0.0004 Epoch: 17 Global Step: 358860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:37,163-Speed 6317.01 samples/sec Loss 5.3199 LearningRate 0.0004 Epoch: 17 Global Step: 358870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:40,407-Speed 6315.78 samples/sec Loss 5.3319 LearningRate 0.0004 Epoch: 17 Global Step: 358880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:43,650-Speed 6316.42 samples/sec Loss 5.2400 LearningRate 0.0004 Epoch: 17 Global Step: 358890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:46,895-Speed 6311.79 samples/sec Loss 5.2532 LearningRate 0.0004 Epoch: 17 Global Step: 358900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:50,138-Speed 6317.53 samples/sec Loss 5.3134 LearningRate 0.0004 Epoch: 17 Global Step: 358910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:53,385-Speed 6308.75 samples/sec Loss 5.3126 LearningRate 0.0004 Epoch: 17 Global Step: 358920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:56,629-Speed 6315.60 samples/sec Loss 5.2287 LearningRate 0.0004 Epoch: 17 Global Step: 358930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:22:59,862-Speed 6335.49 samples/sec Loss 5.2581 LearningRate 0.0004 Epoch: 17 Global Step: 358940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:03,108-Speed 6309.99 samples/sec Loss 5.3343 LearningRate 0.0004 Epoch: 17 Global Step: 358950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:06,338-Speed 6341.58 samples/sec Loss 5.3245 LearningRate 0.0004 Epoch: 17 Global Step: 358960 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:09,585-Speed 6309.41 samples/sec Loss 5.3463 LearningRate 0.0004 Epoch: 17 Global Step: 358970 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:12,826-Speed 6321.18 samples/sec Loss 5.3906 LearningRate 0.0004 Epoch: 17 
Global Step: 358980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:16,070-Speed 6314.54 samples/sec Loss 5.2779 LearningRate 0.0004 Epoch: 17 Global Step: 358990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:19,311-Speed 6318.81 samples/sec Loss 5.2648 LearningRate 0.0004 Epoch: 17 Global Step: 359000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:22,563-Speed 6300.51 samples/sec Loss 5.3128 LearningRate 0.0004 Epoch: 17 Global Step: 359010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:25,810-Speed 6308.30 samples/sec Loss 5.3467 LearningRate 0.0004 Epoch: 17 Global Step: 359020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:29,055-Speed 6312.01 samples/sec Loss 5.3405 LearningRate 0.0004 Epoch: 17 Global Step: 359030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:32,297-Speed 6319.63 samples/sec Loss 5.2447 LearningRate 0.0004 Epoch: 17 Global Step: 359040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:35,539-Speed 6318.44 samples/sec Loss 5.2840 LearningRate 0.0004 Epoch: 17 Global Step: 359050 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:23:38,785-Speed 6309.56 samples/sec Loss 5.2744 LearningRate 0.0004 Epoch: 17 Global Step: 359060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:42,026-Speed 6321.81 samples/sec Loss 5.3093 LearningRate 0.0004 Epoch: 17 Global Step: 359070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:45,269-Speed 6315.24 samples/sec Loss 5.2335 LearningRate 0.0004 Epoch: 17 Global Step: 359080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:48,514-Speed 6312.86 samples/sec Loss 5.2631 LearningRate 0.0004 Epoch: 17 Global Step: 359090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:51,758-Speed 6315.18 samples/sec Loss 5.2136 LearningRate 0.0004 Epoch: 17 Global Step: 359100 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:54,998-Speed 6321.12 samples/sec Loss 5.3285 LearningRate 0.0004 Epoch: 17 Global Step: 359110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:23:58,244-Speed 6312.32 samples/sec Loss 5.2076 LearningRate 0.0004 Epoch: 17 Global Step: 359120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:01,491-Speed 6309.23 samples/sec Loss 5.2867 LearningRate 0.0004 Epoch: 17 Global Step: 359130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:04,736-Speed 6312.10 samples/sec Loss 5.3710 LearningRate 0.0004 Epoch: 17 Global Step: 359140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:07,980-Speed 6314.85 samples/sec Loss 5.2988 LearningRate 0.0004 Epoch: 17 Global Step: 359150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:11,225-Speed 6313.07 samples/sec Loss 5.2696 LearningRate 0.0004 Epoch: 17 Global Step: 359160 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-04-02 00:24:14,455-Speed 6341.49 samples/sec Loss 5.3264 LearningRate 0.0004 Epoch: 17 Global Step: 359170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:17,706-Speed 6300.59 samples/sec Loss 5.2819 LearningRate 0.0004 Epoch: 17 Global Step: 359180 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:20,950-Speed 6315.81 samples/sec Loss 5.3391 LearningRate 0.0004 Epoch: 17 Global Step: 359190 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:24,201-Speed 6300.96 samples/sec Loss 5.3231 LearningRate 0.0004 Epoch: 17 Global Step: 359200 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:27,450-Speed 6305.00 samples/sec Loss 5.3321 LearningRate 0.0004 Epoch: 17 Global Step: 359210 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:24:30,683-Speed 6336.03 samples/sec Loss 5.3584 LearningRate 0.0004 Epoch: 17 Global Step: 359220 Fp16 Grad Scale: 16384 Required: 43 hours 
Training: 2022-04-02 00:24:33,927-Speed 6313.70 samples/sec Loss 5.2644 LearningRate 0.0004 Epoch: 17 Global Step: 359230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:37,173-Speed 6310.27 samples/sec Loss 5.2226 LearningRate 0.0004 Epoch: 17 Global Step: 359240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:40,417-Speed 6314.61 samples/sec Loss 5.2520 LearningRate 0.0004 Epoch: 17 Global Step: 359250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:43,661-Speed 6315.56 samples/sec Loss 5.3024 LearningRate 0.0004 Epoch: 17 Global Step: 359260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:46,909-Speed 6306.96 samples/sec Loss 5.2614 LearningRate 0.0004 Epoch: 17 Global Step: 359270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:50,153-Speed 6314.21 samples/sec Loss 5.2656 LearningRate 0.0004 Epoch: 17 Global Step: 359280 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:53,394-Speed 6320.10 samples/sec Loss 5.2384 LearningRate 0.0004 Epoch: 17 Global Step: 359290 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:56,641-Speed 6309.09 samples/sec Loss 5.3158 LearningRate 0.0004 Epoch: 17 Global Step: 359300 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:24:59,886-Speed 6311.72 samples/sec Loss 5.3323 LearningRate 0.0004 Epoch: 17 Global Step: 359310 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:03,136-Speed 6303.03 samples/sec Loss 5.3758 LearningRate 0.0004 Epoch: 17 Global Step: 359320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:06,381-Speed 6313.47 samples/sec Loss 5.3621 LearningRate 0.0004 Epoch: 17 Global Step: 359330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:09,612-Speed 6340.33 samples/sec Loss 5.3617 LearningRate 0.0004 Epoch: 17 Global Step: 359340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 
00:25:12,857-Speed 6312.27 samples/sec Loss 5.2168 LearningRate 0.0004 Epoch: 17 Global Step: 359350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:16,106-Speed 6304.62 samples/sec Loss 5.3226 LearningRate 0.0004 Epoch: 17 Global Step: 359360 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:19,352-Speed 6310.84 samples/sec Loss 5.2919 LearningRate 0.0004 Epoch: 17 Global Step: 359370 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:22,598-Speed 6310.83 samples/sec Loss 5.3393 LearningRate 0.0004 Epoch: 17 Global Step: 359380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:25,841-Speed 6317.93 samples/sec Loss 5.2643 LearningRate 0.0004 Epoch: 17 Global Step: 359390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:29,087-Speed 6310.22 samples/sec Loss 5.2898 LearningRate 0.0004 Epoch: 17 Global Step: 359400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:32,333-Speed 6311.48 samples/sec Loss 5.3420 LearningRate 0.0004 Epoch: 17 Global Step: 359410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:35,580-Speed 6307.75 samples/sec Loss 5.2744 LearningRate 0.0004 Epoch: 17 Global Step: 359420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:38,823-Speed 6316.58 samples/sec Loss 5.2775 LearningRate 0.0004 Epoch: 17 Global Step: 359430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:42,073-Speed 6302.03 samples/sec Loss 5.3656 LearningRate 0.0004 Epoch: 17 Global Step: 359440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:45,315-Speed 6318.25 samples/sec Loss 5.3783 LearningRate 0.0004 Epoch: 17 Global Step: 359450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:48,560-Speed 6312.51 samples/sec Loss 5.2062 LearningRate 0.0004 Epoch: 17 Global Step: 359460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:51,807-Speed 6309.25 
samples/sec Loss 5.2893 LearningRate 0.0004 Epoch: 17 Global Step: 359470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:25:55,035-Speed 6347.03 samples/sec Loss 5.2229 LearningRate 0.0004 Epoch: 17 Global Step: 359480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:25:58,282-Speed 6308.11 samples/sec Loss 5.3330 LearningRate 0.0004 Epoch: 17 Global Step: 359490 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:01,526-Speed 6315.19 samples/sec Loss 5.3207 LearningRate 0.0004 Epoch: 17 Global Step: 359500 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:04,773-Speed 6308.43 samples/sec Loss 5.2134 LearningRate 0.0004 Epoch: 17 Global Step: 359510 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:08,018-Speed 6312.23 samples/sec Loss 5.2681 LearningRate 0.0004 Epoch: 17 Global Step: 359520 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:11,265-Speed 6309.78 samples/sec Loss 5.3354 LearningRate 0.0004 Epoch: 17 Global Step: 359530 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:14,513-Speed 6306.45 samples/sec Loss 5.2632 LearningRate 0.0004 Epoch: 17 Global Step: 359540 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:17,755-Speed 6318.04 samples/sec Loss 5.2986 LearningRate 0.0004 Epoch: 17 Global Step: 359550 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:21,005-Speed 6303.48 samples/sec Loss 5.2804 LearningRate 0.0004 Epoch: 17 Global Step: 359560 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:24,251-Speed 6311.71 samples/sec Loss 5.3564 LearningRate 0.0004 Epoch: 17 Global Step: 359570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:26:27,497-Speed 6309.81 samples/sec Loss 5.2435 LearningRate 0.0004 Epoch: 17 Global Step: 359580 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:30,746-Speed 6304.57 samples/sec Loss 5.3152 
LearningRate 0.0004 Epoch: 17 Global Step: 359590 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:33,991-Speed 6313.43 samples/sec Loss 5.2804 LearningRate 0.0004 Epoch: 17 Global Step: 359600 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:37,236-Speed 6311.82 samples/sec Loss 5.2678 LearningRate 0.0004 Epoch: 17 Global Step: 359610 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:40,483-Speed 6309.65 samples/sec Loss 5.3253 LearningRate 0.0004 Epoch: 17 Global Step: 359620 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:43,726-Speed 6316.84 samples/sec Loss 5.2743 LearningRate 0.0004 Epoch: 17 Global Step: 359630 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:46,975-Speed 6305.10 samples/sec Loss 5.3289 LearningRate 0.0004 Epoch: 17 Global Step: 359640 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:50,220-Speed 6311.32 samples/sec Loss 5.2420 LearningRate 0.0004 Epoch: 17 Global Step: 359650 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:53,468-Speed 6306.82 samples/sec Loss 5.3296 LearningRate 0.0004 Epoch: 17 Global Step: 359660 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:56,714-Speed 6311.51 samples/sec Loss 5.2533 LearningRate 0.0004 Epoch: 17 Global Step: 359670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:26:59,946-Speed 6337.24 samples/sec Loss 5.2629 LearningRate 0.0004 Epoch: 17 Global Step: 359680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:03,195-Speed 6305.94 samples/sec Loss 5.2905 LearningRate 0.0004 Epoch: 17 Global Step: 359690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:06,470-Speed 6253.63 samples/sec Loss 5.3392 LearningRate 0.0004 Epoch: 17 Global Step: 359700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:09,725-Speed 6293.92 samples/sec Loss 5.3196 LearningRate 0.0004 Epoch: 17 
Global Step: 359710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:12,972-Speed 6308.53 samples/sec Loss 5.2356 LearningRate 0.0004 Epoch: 17 Global Step: 359720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:16,235-Speed 6278.85 samples/sec Loss 5.2658 LearningRate 0.0004 Epoch: 17 Global Step: 359730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:19,480-Speed 6311.43 samples/sec Loss 5.2593 LearningRate 0.0004 Epoch: 17 Global Step: 359740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:22,726-Speed 6311.53 samples/sec Loss 5.3446 LearningRate 0.0004 Epoch: 17 Global Step: 359750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:25,975-Speed 6304.45 samples/sec Loss 5.3579 LearningRate 0.0004 Epoch: 17 Global Step: 359760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:29,220-Speed 6312.93 samples/sec Loss 5.3209 LearningRate 0.0004 Epoch: 17 Global Step: 359770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:32,470-Speed 6303.02 samples/sec Loss 5.3001 LearningRate 0.0004 Epoch: 17 Global Step: 359780 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-04-02 00:27:35,701-Speed 6340.37 samples/sec Loss 5.2534 LearningRate 0.0004 Epoch: 17 Global Step: 359790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:38,949-Speed 6306.44 samples/sec Loss 5.3250 LearningRate 0.0004 Epoch: 17 Global Step: 359800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:27:42,182-Speed 6336.78 samples/sec Loss 5.2861 LearningRate 0.0004 Epoch: 17 Global Step: 359810 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:27:45,426-Speed 6315.54 samples/sec Loss 5.3063 LearningRate 0.0004 Epoch: 17 Global Step: 359820 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:27:48,696-Speed 6264.27 samples/sec Loss 5.3082 LearningRate 0.0004 Epoch: 17 Global Step: 359830 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-02 00:27:51,940-Speed 6313.38 samples/sec Loss 5.2999 LearningRate 0.0004 Epoch: 17 Global Step: 359840 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:27:55,185-Speed 6313.66 samples/sec Loss 5.2599 LearningRate 0.0004 Epoch: 17 Global Step: 359850 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:27:58,431-Speed 6310.56 samples/sec Loss 5.2774 LearningRate 0.0004 Epoch: 17 Global Step: 359860 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:28:01,677-Speed 6310.33 samples/sec Loss 5.3279 LearningRate 0.0004 Epoch: 17 Global Step: 359870 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:28:04,922-Speed 6312.80 samples/sec Loss 5.3654 LearningRate 0.0004 Epoch: 17 Global Step: 359880 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:28:08,166-Speed 6314.19 samples/sec Loss 5.3175 LearningRate 0.0004 Epoch: 17 Global Step: 359890 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:28:11,413-Speed 6309.20 samples/sec Loss 5.3226 LearningRate 0.0004 Epoch: 17 Global Step: 359900 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:28:14,664-Speed 6301.71 samples/sec Loss 5.2899 LearningRate 0.0004 Epoch: 17 Global Step: 359910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:17,911-Speed 6308.86 samples/sec Loss 5.2744 LearningRate 0.0004 Epoch: 17 Global Step: 359920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:21,157-Speed 6310.14 samples/sec Loss 5.2909 LearningRate 0.0004 Epoch: 17 Global Step: 359930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:24,406-Speed 6303.85 samples/sec Loss 5.2224 LearningRate 0.0004 Epoch: 17 Global Step: 359940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:27,650-Speed 6314.93 samples/sec Loss 5.2503 LearningRate 0.0004 Epoch: 17 Global Step: 359950 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:28:30,897-Speed 6308.81 samples/sec Loss 5.2682 LearningRate 0.0004 Epoch: 17 Global Step: 359960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:34,144-Speed 6309.13 samples/sec Loss 5.3002 LearningRate 0.0004 Epoch: 17 Global Step: 359970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:37,386-Speed 6318.48 samples/sec Loss 5.2930 LearningRate 0.0004 Epoch: 17 Global Step: 359980 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:40,633-Speed 6308.56 samples/sec Loss 5.3294 LearningRate 0.0004 Epoch: 17 Global Step: 359990 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:43,877-Speed 6314.25 samples/sec Loss 5.3187 LearningRate 0.0004 Epoch: 17 Global Step: 360000 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:47,123-Speed 6311.25 samples/sec Loss 5.3536 LearningRate 0.0004 Epoch: 17 Global Step: 360010 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:50,371-Speed 6307.47 samples/sec Loss 5.2927 LearningRate 0.0004 Epoch: 17 Global Step: 360020 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:53,616-Speed 6313.66 samples/sec Loss 5.2791 LearningRate 0.0004 Epoch: 17 Global Step: 360030 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:28:56,864-Speed 6306.26 samples/sec Loss 5.2960 LearningRate 0.0004 Epoch: 17 Global Step: 360040 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:00,109-Speed 6311.66 samples/sec Loss 5.2838 LearningRate 0.0004 Epoch: 17 Global Step: 360050 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:03,362-Speed 6298.94 samples/sec Loss 5.2994 LearningRate 0.0004 Epoch: 17 Global Step: 360060 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:06,609-Speed 6307.92 samples/sec Loss 5.2923 LearningRate 0.0004 Epoch: 17 Global Step: 360070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:29:09,858-Speed 6304.57 samples/sec Loss 5.3076 LearningRate 0.0004 Epoch: 17 Global Step: 360080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:13,104-Speed 6310.99 samples/sec Loss 5.3126 LearningRate 0.0004 Epoch: 17 Global Step: 360090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:16,347-Speed 6317.14 samples/sec Loss 5.3170 LearningRate 0.0004 Epoch: 17 Global Step: 360100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:19,574-Speed 6348.10 samples/sec Loss 5.2786 LearningRate 0.0004 Epoch: 17 Global Step: 360110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:22,818-Speed 6314.07 samples/sec Loss 5.2776 LearningRate 0.0004 Epoch: 17 Global Step: 360120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:26,061-Speed 6317.15 samples/sec Loss 5.2389 LearningRate 0.0004 Epoch: 17 Global Step: 360130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:29,308-Speed 6307.38 samples/sec Loss 5.2552 LearningRate 0.0004 Epoch: 17 Global Step: 360140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:32,555-Speed 6309.55 samples/sec Loss 5.2473 LearningRate 0.0004 Epoch: 17 Global Step: 360150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:35,806-Speed 6301.63 samples/sec Loss 5.3061 LearningRate 0.0004 Epoch: 17 Global Step: 360160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:39,064-Speed 6287.02 samples/sec Loss 5.2775 LearningRate 0.0004 Epoch: 17 Global Step: 360170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:29:42,299-Speed 6331.56 samples/sec Loss 5.3158 LearningRate 0.0004 Epoch: 17 Global Step: 360180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:29:45,553-Speed 6294.57 samples/sec Loss 5.2651 LearningRate 0.0004 Epoch: 17 Global Step: 360190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:29:48,796-Speed 6317.46 
samples/sec Loss 5.3021 LearningRate 0.0004 Epoch: 17 Global Step: 360200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:29:52,041-Speed 6312.74 samples/sec Loss 5.3484 LearningRate 0.0004 Epoch: 17 Global Step: 360210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:29:55,284-Speed 6316.17 samples/sec Loss 5.3781 LearningRate 0.0004 Epoch: 17 Global Step: 360220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:29:58,526-Speed 6319.05 samples/sec Loss 5.3046 LearningRate 0.0004 Epoch: 17 Global Step: 360230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:01,771-Speed 6312.89 samples/sec Loss 5.3590 LearningRate 0.0004 Epoch: 17 Global Step: 360240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:05,018-Speed 6307.78 samples/sec Loss 5.3094 LearningRate 0.0004 Epoch: 17 Global Step: 360250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:08,259-Speed 6321.75 samples/sec Loss 5.2314 LearningRate 0.0004 Epoch: 17 Global Step: 360260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:11,507-Speed 6306.46 samples/sec Loss 5.2765 LearningRate 0.0004 Epoch: 17 Global Step: 360270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:14,760-Speed 6297.36 samples/sec Loss 5.2597 LearningRate 0.0004 Epoch: 17 Global Step: 360280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:18,007-Speed 6307.93 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 360290 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:21,250-Speed 6316.98 samples/sec Loss 5.2245 LearningRate 0.0004 Epoch: 17 Global Step: 360300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:24,496-Speed 6311.53 samples/sec Loss 5.3528 LearningRate 0.0004 Epoch: 17 Global Step: 360310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:27,746-Speed 6302.37 samples/sec Loss 5.3612 
LearningRate 0.0004 Epoch: 17 Global Step: 360320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:30,989-Speed 6316.35 samples/sec Loss 5.2826 LearningRate 0.0004 Epoch: 17 Global Step: 360330 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:34,234-Speed 6312.34 samples/sec Loss 5.1962 LearningRate 0.0004 Epoch: 17 Global Step: 360340 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:37,477-Speed 6316.53 samples/sec Loss 5.3046 LearningRate 0.0004 Epoch: 17 Global Step: 360350 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:40,722-Speed 6314.10 samples/sec Loss 5.2389 LearningRate 0.0004 Epoch: 17 Global Step: 360360 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:43,975-Speed 6296.10 samples/sec Loss 5.3167 LearningRate 0.0004 Epoch: 17 Global Step: 360370 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:30:47,218-Speed 6317.19 samples/sec Loss 5.2154 LearningRate 0.0004 Epoch: 17 Global Step: 360380 Fp16 Grad Scale: 65536 Required: 43 hours Training: 2022-04-02 00:30:50,446-Speed 6345.24 samples/sec Loss 5.3468 LearningRate 0.0004 Epoch: 17 Global Step: 360390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:53,696-Speed 6302.86 samples/sec Loss 5.2529 LearningRate 0.0004 Epoch: 17 Global Step: 360400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:30:56,948-Speed 6300.24 samples/sec Loss 5.3003 LearningRate 0.0004 Epoch: 17 Global Step: 360410 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:00,188-Speed 6320.99 samples/sec Loss 5.2711 LearningRate 0.0004 Epoch: 17 Global Step: 360420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:03,435-Speed 6308.70 samples/sec Loss 5.2849 LearningRate 0.0004 Epoch: 17 Global Step: 360430 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:06,701-Speed 6272.03 samples/sec Loss 5.3107 LearningRate 0.0004 Epoch: 17 
Global Step: 360440 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:09,949-Speed 6307.71 samples/sec Loss 5.2666 LearningRate 0.0004 Epoch: 17 Global Step: 360450 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:13,187-Speed 6325.93 samples/sec Loss 5.3296 LearningRate 0.0004 Epoch: 17 Global Step: 360460 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:16,436-Speed 6305.70 samples/sec Loss 5.3125 LearningRate 0.0004 Epoch: 17 Global Step: 360470 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:19,683-Speed 6308.52 samples/sec Loss 5.2712 LearningRate 0.0004 Epoch: 17 Global Step: 360480 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:22,929-Speed 6312.36 samples/sec Loss 5.2307 LearningRate 0.0004 Epoch: 17 Global Step: 360490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:26,174-Speed 6312.08 samples/sec Loss 5.2817 LearningRate 0.0004 Epoch: 17 Global Step: 360500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:29,421-Speed 6308.79 samples/sec Loss 5.3049 LearningRate 0.0004 Epoch: 17 Global Step: 360510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:32,664-Speed 6316.15 samples/sec Loss 5.3061 LearningRate 0.0004 Epoch: 17 Global Step: 360520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:35,912-Speed 6306.04 samples/sec Loss 5.2787 LearningRate 0.0004 Epoch: 17 Global Step: 360530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:39,159-Speed 6308.42 samples/sec Loss 5.3332 LearningRate 0.0004 Epoch: 17 Global Step: 360540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:42,409-Speed 6303.37 samples/sec Loss 5.3461 LearningRate 0.0004 Epoch: 17 Global Step: 360550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:45,661-Speed 6298.79 samples/sec Loss 5.2482 LearningRate 0.0004 Epoch: 17 Global Step: 360560 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:31:48,899-Speed 6327.32 samples/sec Loss 5.3115 LearningRate 0.0004 Epoch: 17 Global Step: 360570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:52,146-Speed 6308.33 samples/sec Loss 5.2497 LearningRate 0.0004 Epoch: 17 Global Step: 360580 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:55,394-Speed 6306.45 samples/sec Loss 5.3153 LearningRate 0.0004 Epoch: 17 Global Step: 360590 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:31:58,639-Speed 6313.73 samples/sec Loss 5.2962 LearningRate 0.0004 Epoch: 17 Global Step: 360600 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:01,879-Speed 6320.66 samples/sec Loss 5.3165 LearningRate 0.0004 Epoch: 17 Global Step: 360610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:05,125-Speed 6311.00 samples/sec Loss 5.1948 LearningRate 0.0004 Epoch: 17 Global Step: 360620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:08,374-Speed 6304.63 samples/sec Loss 5.2802 LearningRate 0.0004 Epoch: 17 Global Step: 360630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:11,620-Speed 6311.30 samples/sec Loss 5.3301 LearningRate 0.0004 Epoch: 17 Global Step: 360640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:14,870-Speed 6302.53 samples/sec Loss 5.1997 LearningRate 0.0004 Epoch: 17 Global Step: 360650 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:18,113-Speed 6317.73 samples/sec Loss 5.3800 LearningRate 0.0004 Epoch: 17 Global Step: 360660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:32:21,360-Speed 6308.76 samples/sec Loss 5.3041 LearningRate 0.0004 Epoch: 17 Global Step: 360670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:24,607-Speed 6310.09 samples/sec Loss 5.3672 LearningRate 0.0004 Epoch: 17 Global Step: 360680 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:32:27,853-Speed 6309.35 samples/sec Loss 5.2241 LearningRate 0.0004 Epoch: 17 Global Step: 360690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:31,101-Speed 6307.66 samples/sec Loss 5.3099 LearningRate 0.0004 Epoch: 17 Global Step: 360700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:34,346-Speed 6311.52 samples/sec Loss 5.3230 LearningRate 0.0004 Epoch: 17 Global Step: 360710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:37,594-Speed 6307.66 samples/sec Loss 5.2656 LearningRate 0.0004 Epoch: 17 Global Step: 360720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:40,836-Speed 6318.03 samples/sec Loss 5.2697 LearningRate 0.0004 Epoch: 17 Global Step: 360730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:44,081-Speed 6312.99 samples/sec Loss 5.2729 LearningRate 0.0004 Epoch: 17 Global Step: 360740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:47,328-Speed 6309.02 samples/sec Loss 5.3346 LearningRate 0.0004 Epoch: 17 Global Step: 360750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:50,571-Speed 6315.47 samples/sec Loss 5.2894 LearningRate 0.0004 Epoch: 17 Global Step: 360760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:53,806-Speed 6333.30 samples/sec Loss 5.3107 LearningRate 0.0004 Epoch: 17 Global Step: 360770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:32:57,048-Speed 6318.15 samples/sec Loss 5.3753 LearningRate 0.0004 Epoch: 17 Global Step: 360780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:00,295-Speed 6307.53 samples/sec Loss 5.3373 LearningRate 0.0004 Epoch: 17 Global Step: 360790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:03,540-Speed 6313.63 samples/sec Loss 5.1675 LearningRate 0.0004 Epoch: 17 Global Step: 360800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:33:06,783-Speed 6316.53 samples/sec Loss 5.3348 LearningRate 0.0004 Epoch: 17 Global Step: 360810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:10,032-Speed 6304.23 samples/sec Loss 5.3258 LearningRate 0.0004 Epoch: 17 Global Step: 360820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:13,274-Speed 6319.17 samples/sec Loss 5.3272 LearningRate 0.0004 Epoch: 17 Global Step: 360830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:16,518-Speed 6313.64 samples/sec Loss 5.2413 LearningRate 0.0004 Epoch: 17 Global Step: 360840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:19,764-Speed 6312.23 samples/sec Loss 5.2849 LearningRate 0.0004 Epoch: 17 Global Step: 360850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:23,013-Speed 6303.98 samples/sec Loss 5.2795 LearningRate 0.0004 Epoch: 17 Global Step: 360860 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:26,249-Speed 6330.50 samples/sec Loss 5.2818 LearningRate 0.0004 Epoch: 17 Global Step: 360870 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:29,498-Speed 6306.39 samples/sec Loss 5.2183 LearningRate 0.0004 Epoch: 17 Global Step: 360880 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:32,745-Speed 6308.66 samples/sec Loss 5.2925 LearningRate 0.0004 Epoch: 17 Global Step: 360890 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:35,992-Speed 6308.73 samples/sec Loss 5.3443 LearningRate 0.0004 Epoch: 17 Global Step: 360900 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:39,238-Speed 6310.25 samples/sec Loss 5.2750 LearningRate 0.0004 Epoch: 17 Global Step: 360910 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:42,484-Speed 6310.38 samples/sec Loss 5.2261 LearningRate 0.0004 Epoch: 17 Global Step: 360920 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:45,732-Speed 6307.51 
samples/sec Loss 5.2508 LearningRate 0.0004 Epoch: 17 Global Step: 360930 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:48,977-Speed 6312.56 samples/sec Loss 5.2735 LearningRate 0.0004 Epoch: 17 Global Step: 360940 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:52,234-Speed 6288.02 samples/sec Loss 5.2163 LearningRate 0.0004 Epoch: 17 Global Step: 360950 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:55,482-Speed 6307.33 samples/sec Loss 5.2751 LearningRate 0.0004 Epoch: 17 Global Step: 360960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:33:58,716-Speed 6335.49 samples/sec Loss 5.2689 LearningRate 0.0004 Epoch: 17 Global Step: 360970 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:01,962-Speed 6309.14 samples/sec Loss 5.2418 LearningRate 0.0004 Epoch: 17 Global Step: 360980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:05,209-Speed 6309.47 samples/sec Loss 5.3573 LearningRate 0.0004 Epoch: 17 Global Step: 360990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:08,457-Speed 6305.84 samples/sec Loss 5.2898 LearningRate 0.0004 Epoch: 17 Global Step: 361000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:11,707-Speed 6303.62 samples/sec Loss 5.3292 LearningRate 0.0004 Epoch: 17 Global Step: 361010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:14,950-Speed 6316.06 samples/sec Loss 5.3397 LearningRate 0.0004 Epoch: 17 Global Step: 361020 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:18,207-Speed 6289.31 samples/sec Loss 5.2865 LearningRate 0.0004 Epoch: 17 Global Step: 361030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:21,453-Speed 6311.40 samples/sec Loss 5.2736 LearningRate 0.0004 Epoch: 17 Global Step: 361040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:24,711-Speed 6286.77 samples/sec Loss 5.3216 
LearningRate 0.0004 Epoch: 17 Global Step: 361050 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:27,959-Speed 6307.64 samples/sec Loss 5.2901 LearningRate 0.0004 Epoch: 17 Global Step: 361060 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:34:31,210-Speed 6300.99 samples/sec Loss 5.2551 LearningRate 0.0004 Epoch: 17 Global Step: 361070 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:34,457-Speed 6308.41 samples/sec Loss 5.2637 LearningRate 0.0004 Epoch: 17 Global Step: 361080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:37,704-Speed 6309.18 samples/sec Loss 5.2802 LearningRate 0.0004 Epoch: 17 Global Step: 361090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:40,952-Speed 6306.97 samples/sec Loss 5.1809 LearningRate 0.0004 Epoch: 17 Global Step: 361100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:44,211-Speed 6284.99 samples/sec Loss 5.2882 LearningRate 0.0004 Epoch: 17 Global Step: 361110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:47,463-Speed 6300.04 samples/sec Loss 5.2656 LearningRate 0.0004 Epoch: 17 Global Step: 361120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:50,716-Speed 6297.67 samples/sec Loss 5.2858 LearningRate 0.0004 Epoch: 17 Global Step: 361130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:53,961-Speed 6311.30 samples/sec Loss 5.3168 LearningRate 0.0004 Epoch: 17 Global Step: 361140 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:34:57,209-Speed 6307.68 samples/sec Loss 5.2543 LearningRate 0.0004 Epoch: 17 Global Step: 361150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:00,460-Speed 6301.59 samples/sec Loss 5.2895 LearningRate 0.0004 Epoch: 17 Global Step: 361160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:03,705-Speed 6311.36 samples/sec Loss 5.3319 LearningRate 0.0004 Epoch: 17 
Global Step: 361170 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:06,963-Speed 6288.61 samples/sec Loss 5.3154 LearningRate 0.0004 Epoch: 17 Global Step: 361180 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:10,209-Speed 6309.04 samples/sec Loss 5.2724 LearningRate 0.0004 Epoch: 17 Global Step: 361190 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:13,499-Speed 6227.54 samples/sec Loss 5.2456 LearningRate 0.0004 Epoch: 17 Global Step: 361200 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:16,747-Speed 6306.19 samples/sec Loss 5.2579 LearningRate 0.0004 Epoch: 17 Global Step: 361210 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:19,992-Speed 6313.06 samples/sec Loss 5.2874 LearningRate 0.0004 Epoch: 17 Global Step: 361220 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:23,240-Speed 6307.25 samples/sec Loss 5.1262 LearningRate 0.0004 Epoch: 17 Global Step: 361230 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:26,485-Speed 6311.30 samples/sec Loss 5.3156 LearningRate 0.0004 Epoch: 17 Global Step: 361240 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:29,737-Speed 6298.97 samples/sec Loss 5.3577 LearningRate 0.0004 Epoch: 17 Global Step: 361250 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:32,984-Speed 6310.11 samples/sec Loss 5.2964 LearningRate 0.0004 Epoch: 17 Global Step: 361260 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:36,230-Speed 6309.48 samples/sec Loss 5.2714 LearningRate 0.0004 Epoch: 17 Global Step: 361270 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:39,478-Speed 6307.09 samples/sec Loss 5.2476 LearningRate 0.0004 Epoch: 17 Global Step: 361280 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:42,719-Speed 6320.33 samples/sec Loss 5.2537 LearningRate 0.0004 Epoch: 17 Global Step: 361290 Fp16 Grad 
Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:45,964-Speed 6312.10 samples/sec Loss 5.2730 LearningRate 0.0004 Epoch: 17 Global Step: 361300 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:49,213-Speed 6306.74 samples/sec Loss 5.2816 LearningRate 0.0004 Epoch: 17 Global Step: 361310 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:52,461-Speed 6307.31 samples/sec Loss 5.2412 LearningRate 0.0004 Epoch: 17 Global Step: 361320 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:35:55,687-Speed 6348.37 samples/sec Loss 5.2922 LearningRate 0.0004 Epoch: 17 Global Step: 361330 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:35:58,933-Speed 6312.83 samples/sec Loss 5.2918 LearningRate 0.0004 Epoch: 17 Global Step: 361340 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:02,176-Speed 6314.96 samples/sec Loss 5.1986 LearningRate 0.0004 Epoch: 17 Global Step: 361350 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:05,424-Speed 6306.89 samples/sec Loss 5.2401 LearningRate 0.0004 Epoch: 17 Global Step: 361360 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:08,670-Speed 6311.77 samples/sec Loss 5.3080 LearningRate 0.0004 Epoch: 17 Global Step: 361370 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:11,913-Speed 6316.49 samples/sec Loss 5.2891 LearningRate 0.0004 Epoch: 17 Global Step: 361380 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:15,160-Speed 6308.92 samples/sec Loss 5.3600 LearningRate 0.0004 Epoch: 17 Global Step: 361390 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:18,404-Speed 6314.61 samples/sec Loss 5.3125 LearningRate 0.0004 Epoch: 17 Global Step: 361400 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:21,651-Speed 6309.86 samples/sec Loss 5.3197 LearningRate 0.0004 Epoch: 17 Global Step: 361410 Fp16 Grad Scale: 16384 Required: 43 hours 
Training: 2022-04-02 00:36:24,899-Speed 6306.06 samples/sec Loss 5.2621 LearningRate 0.0004 Epoch: 17 Global Step: 361420 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:36:28,145-Speed 6310.62 samples/sec Loss 5.2637 LearningRate 0.0004 Epoch: 17 Global Step: 361430 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:31,388-Speed 6317.13 samples/sec Loss 5.3109 LearningRate 0.0004 Epoch: 17 Global Step: 361440 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:34,636-Speed 6306.15 samples/sec Loss 5.3513 LearningRate 0.0004 Epoch: 17 Global Step: 361450 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:37,885-Speed 6305.87 samples/sec Loss 5.2776 LearningRate 0.0004 Epoch: 17 Global Step: 361460 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:41,134-Speed 6304.77 samples/sec Loss 5.2886 LearningRate 0.0004 Epoch: 17 Global Step: 361470 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:44,388-Speed 6294.12 samples/sec Loss 5.2492 LearningRate 0.0004 Epoch: 17 Global Step: 361480 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:47,644-Speed 6292.56 samples/sec Loss 5.2042 LearningRate 0.0004 Epoch: 17 Global Step: 361490 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:50,894-Speed 6302.41 samples/sec Loss 5.2393 LearningRate 0.0004 Epoch: 17 Global Step: 361500 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:54,142-Speed 6306.29 samples/sec Loss 5.3036 LearningRate 0.0004 Epoch: 17 Global Step: 361510 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:36:57,384-Speed 6319.66 samples/sec Loss 5.1780 LearningRate 0.0004 Epoch: 17 Global Step: 361520 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:00,617-Speed 6336.72 samples/sec Loss 5.3331 LearningRate 0.0004 Epoch: 17 Global Step: 361530 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 
00:37:03,863-Speed 6311.26 samples/sec Loss 5.3253 LearningRate 0.0004 Epoch: 17 Global Step: 361540 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:07,112-Speed 6304.46 samples/sec Loss 5.2431 LearningRate 0.0004 Epoch: 17 Global Step: 361550 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:10,354-Speed 6317.53 samples/sec Loss 5.2612 LearningRate 0.0004 Epoch: 17 Global Step: 361560 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:13,600-Speed 6310.25 samples/sec Loss 5.3157 LearningRate 0.0004 Epoch: 17 Global Step: 361570 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:16,845-Speed 6313.84 samples/sec Loss 5.2286 LearningRate 0.0004 Epoch: 17 Global Step: 361580 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:20,089-Speed 6314.40 samples/sec Loss 5.2905 LearningRate 0.0004 Epoch: 17 Global Step: 361590 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:23,339-Speed 6303.34 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 361600 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:26,584-Speed 6311.85 samples/sec Loss 5.3230 LearningRate 0.0004 Epoch: 17 Global Step: 361610 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:29,829-Speed 6312.40 samples/sec Loss 5.2275 LearningRate 0.0004 Epoch: 17 Global Step: 361620 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:33,076-Speed 6309.00 samples/sec Loss 5.2280 LearningRate 0.0004 Epoch: 17 Global Step: 361630 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:36,322-Speed 6310.63 samples/sec Loss 5.2207 LearningRate 0.0004 Epoch: 17 Global Step: 361640 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:39,569-Speed 6309.03 samples/sec Loss 5.2509 LearningRate 0.0004 Epoch: 17 Global Step: 361650 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:42,828-Speed 6285.96 
samples/sec Loss 5.2603 LearningRate 0.0004 Epoch: 17 Global Step: 361660 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:37:46,075-Speed 6307.62 samples/sec Loss 5.2697 LearningRate 0.0004 Epoch: 17 Global Step: 361670 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:49,321-Speed 6311.41 samples/sec Loss 5.2782 LearningRate 0.0004 Epoch: 17 Global Step: 361680 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:52,581-Speed 6283.37 samples/sec Loss 5.2492 LearningRate 0.0004 Epoch: 17 Global Step: 361690 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:55,828-Speed 6309.06 samples/sec Loss 5.3156 LearningRate 0.0004 Epoch: 17 Global Step: 361700 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:37:59,073-Speed 6312.62 samples/sec Loss 5.2583 LearningRate 0.0004 Epoch: 17 Global Step: 361710 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:02,321-Speed 6307.29 samples/sec Loss 5.2800 LearningRate 0.0004 Epoch: 17 Global Step: 361720 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:05,567-Speed 6311.89 samples/sec Loss 5.3135 LearningRate 0.0004 Epoch: 17 Global Step: 361730 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:08,813-Speed 6309.55 samples/sec Loss 5.2397 LearningRate 0.0004 Epoch: 17 Global Step: 361740 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:12,064-Speed 6300.64 samples/sec Loss 5.3395 LearningRate 0.0004 Epoch: 17 Global Step: 361750 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:15,311-Speed 6309.17 samples/sec Loss 5.3035 LearningRate 0.0004 Epoch: 17 Global Step: 361760 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:18,547-Speed 6330.84 samples/sec Loss 5.2274 LearningRate 0.0004 Epoch: 17 Global Step: 361770 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:21,791-Speed 6313.28 samples/sec Loss 5.2076 
LearningRate 0.0004 Epoch: 17 Global Step: 361780 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:25,073-Speed 6242.26 samples/sec Loss 5.2794 LearningRate 0.0004 Epoch: 17 Global Step: 361790 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:28,314-Speed 6319.79 samples/sec Loss 5.2723 LearningRate 0.0004 Epoch: 17 Global Step: 361800 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:31,557-Speed 6316.76 samples/sec Loss 5.2579 LearningRate 0.0004 Epoch: 17 Global Step: 361810 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:34,802-Speed 6314.21 samples/sec Loss 5.2723 LearningRate 0.0004 Epoch: 17 Global Step: 361820 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:38,050-Speed 6306.45 samples/sec Loss 5.2715 LearningRate 0.0004 Epoch: 17 Global Step: 361830 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:41,294-Speed 6313.20 samples/sec Loss 5.3321 LearningRate 0.0004 Epoch: 17 Global Step: 361840 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:44,541-Speed 6309.92 samples/sec Loss 5.2878 LearningRate 0.0004 Epoch: 17 Global Step: 361850 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:38:47,777-Speed 6330.37 samples/sec Loss 5.2491 LearningRate 0.0004 Epoch: 17 Global Step: 361860 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:38:51,031-Speed 6294.79 samples/sec Loss 5.3028 LearningRate 0.0004 Epoch: 17 Global Step: 361870 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:38:54,281-Speed 6302.05 samples/sec Loss 5.2223 LearningRate 0.0004 Epoch: 17 Global Step: 361880 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:38:57,529-Speed 6307.31 samples/sec Loss 5.2324 LearningRate 0.0004 Epoch: 17 Global Step: 361890 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:00,772-Speed 6316.07 samples/sec Loss 5.3147 LearningRate 0.0004 Epoch: 17 
Global Step: 361900 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:04,018-Speed 6311.68 samples/sec Loss 5.2658 LearningRate 0.0004 Epoch: 17 Global Step: 361910 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:07,260-Speed 6318.75 samples/sec Loss 5.2241 LearningRate 0.0004 Epoch: 17 Global Step: 361920 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:10,518-Speed 6288.05 samples/sec Loss 5.3121 LearningRate 0.0004 Epoch: 17 Global Step: 361930 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:13,757-Speed 6323.03 samples/sec Loss 5.2557 LearningRate 0.0004 Epoch: 17 Global Step: 361940 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:17,007-Speed 6304.02 samples/sec Loss 5.3277 LearningRate 0.0004 Epoch: 17 Global Step: 361950 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:20,253-Speed 6310.27 samples/sec Loss 5.2210 LearningRate 0.0004 Epoch: 17 Global Step: 361960 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:39:23,500-Speed 6309.85 samples/sec Loss 5.3635 LearningRate 0.0004 Epoch: 17 Global Step: 361970 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:39:26,743-Speed 6314.70 samples/sec Loss 5.3089 LearningRate 0.0004 Epoch: 17 Global Step: 361980 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:29,985-Speed 6318.96 samples/sec Loss 5.2979 LearningRate 0.0004 Epoch: 17 Global Step: 361990 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:33,228-Speed 6316.62 samples/sec Loss 5.2895 LearningRate 0.0004 Epoch: 17 Global Step: 362000 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:36,471-Speed 6316.61 samples/sec Loss 5.2820 LearningRate 0.0004 Epoch: 17 Global Step: 362010 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:39,721-Speed 6302.93 samples/sec Loss 5.2804 LearningRate 0.0004 Epoch: 17 Global Step: 362020 Fp16 Grad 
Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:42,962-Speed 6321.23 samples/sec Loss 5.2504 LearningRate 0.0004 Epoch: 17 Global Step: 362030 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:46,240-Speed 6248.13 samples/sec Loss 5.2943 LearningRate 0.0004 Epoch: 17 Global Step: 362040 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:49,597-Speed 6102.53 samples/sec Loss 5.2499 LearningRate 0.0004 Epoch: 17 Global Step: 362050 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:52,844-Speed 6308.92 samples/sec Loss 5.2967 LearningRate 0.0004 Epoch: 17 Global Step: 362060 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:56,087-Speed 6315.94 samples/sec Loss 5.3266 LearningRate 0.0004 Epoch: 17 Global Step: 362070 Fp16 Grad Scale: 16384 Required: 43 hours Training: 2022-04-02 00:39:59,338-Speed 6301.13 samples/sec Loss 5.2557 LearningRate 0.0004 Epoch: 17 Global Step: 362080 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:02,584-Speed 6310.78 samples/sec Loss 5.1783 LearningRate 0.0004 Epoch: 17 Global Step: 362090 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:05,829-Speed 6313.89 samples/sec Loss 5.2736 LearningRate 0.0004 Epoch: 17 Global Step: 362100 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:09,073-Speed 6314.02 samples/sec Loss 5.2803 LearningRate 0.0004 Epoch: 17 Global Step: 362110 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:12,318-Speed 6311.77 samples/sec Loss 5.2567 LearningRate 0.0004 Epoch: 17 Global Step: 362120 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:15,564-Speed 6311.89 samples/sec Loss 5.2304 LearningRate 0.0004 Epoch: 17 Global Step: 362130 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:18,807-Speed 6316.45 samples/sec Loss 5.3229 LearningRate 0.0004 Epoch: 17 Global Step: 362140 Fp16 Grad Scale: 32768 Required: 43 hours 
Training: 2022-04-02 00:40:22,052-Speed 6313.35 samples/sec Loss 5.2687 LearningRate 0.0004 Epoch: 17 Global Step: 362150 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:25,300-Speed 6305.85 samples/sec Loss 5.2709 LearningRate 0.0004 Epoch: 17 Global Step: 362160 Fp16 Grad Scale: 32768 Required: 43 hours Training: 2022-04-02 00:40:28,537-Speed 6328.89 samples/sec Loss 5.2784 LearningRate 0.0004 Epoch: 17 Global Step: 362170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:31,780-Speed 6316.55 samples/sec Loss 5.2328 LearningRate 0.0004 Epoch: 17 Global Step: 362180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:35,035-Speed 6293.93 samples/sec Loss 5.2199 LearningRate 0.0004 Epoch: 17 Global Step: 362190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:38,277-Speed 6318.25 samples/sec Loss 5.2357 LearningRate 0.0004 Epoch: 17 Global Step: 362200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:41,524-Speed 6308.68 samples/sec Loss 5.2370 LearningRate 0.0004 Epoch: 17 Global Step: 362210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:44,768-Speed 6314.76 samples/sec Loss 5.2893 LearningRate 0.0004 Epoch: 17 Global Step: 362220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:48,012-Speed 6313.67 samples/sec Loss 5.2754 LearningRate 0.0004 Epoch: 17 Global Step: 362230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:51,258-Speed 6310.71 samples/sec Loss 5.1971 LearningRate 0.0004 Epoch: 17 Global Step: 362240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:54,501-Speed 6316.99 samples/sec Loss 5.3023 LearningRate 0.0004 Epoch: 17 Global Step: 362250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:40:57,750-Speed 6305.92 samples/sec Loss 5.3178 LearningRate 0.0004 Epoch: 17 Global Step: 362260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
00:41:00,997-Speed 6307.29 samples/sec Loss 5.3910 LearningRate 0.0004 Epoch: 17 Global Step: 362270 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:04,245-Speed 6306.59 samples/sec Loss 5.2136 LearningRate 0.0004 Epoch: 17 Global Step: 362280 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:07,493-Speed 6307.40 samples/sec Loss 5.3021 LearningRate 0.0004 Epoch: 17 Global Step: 362290 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:10,748-Speed 6292.44 samples/sec Loss 5.2722 LearningRate 0.0004 Epoch: 17 Global Step: 362300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:13,994-Speed 6311.28 samples/sec Loss 5.2923 LearningRate 0.0004 Epoch: 17 Global Step: 362310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:17,252-Speed 6287.17 samples/sec Loss 5.2989 LearningRate 0.0004 Epoch: 17 Global Step: 362320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:20,499-Speed 6309.40 samples/sec Loss 5.3565 LearningRate 0.0004 Epoch: 17 Global Step: 362330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:23,732-Speed 6336.21 samples/sec Loss 5.2742 LearningRate 0.0004 Epoch: 17 Global Step: 362340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:26,983-Speed 6301.34 samples/sec Loss 5.2756 LearningRate 0.0004 Epoch: 17 Global Step: 362350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:30,242-Speed 6286.27 samples/sec Loss 5.3189 LearningRate 0.0004 Epoch: 17 Global Step: 362360 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:33,487-Speed 6312.28 samples/sec Loss 5.2643 LearningRate 0.0004 Epoch: 17 Global Step: 362370 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:36,732-Speed 6312.85 samples/sec Loss 5.2661 LearningRate 0.0004 Epoch: 17 Global Step: 362380 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:39,979-Speed 6307.87 
samples/sec Loss 5.2803 LearningRate 0.0004 Epoch: 17 Global Step: 362390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:43,223-Speed 6314.93 samples/sec Loss 5.2245 LearningRate 0.0004 Epoch: 17 Global Step: 362400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:46,472-Speed 6304.58 samples/sec Loss 5.2576 LearningRate 0.0004 Epoch: 17 Global Step: 362410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:49,719-Speed 6309.77 samples/sec Loss 5.3129 LearningRate 0.0004 Epoch: 17 Global Step: 362420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:52,969-Speed 6303.38 samples/sec Loss 5.3195 LearningRate 0.0004 Epoch: 17 Global Step: 362430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:41:56,214-Speed 6311.61 samples/sec Loss 5.3643 LearningRate 0.0004 Epoch: 17 Global Step: 362440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:41:59,461-Speed 6308.22 samples/sec Loss 5.2078 LearningRate 0.0004 Epoch: 17 Global Step: 362450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:02,743-Speed 6242.91 samples/sec Loss 5.2726 LearningRate 0.0004 Epoch: 17 Global Step: 362460 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:05,994-Speed 6299.94 samples/sec Loss 5.2973 LearningRate 0.0004 Epoch: 17 Global Step: 362470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:09,243-Speed 6305.76 samples/sec Loss 5.2489 LearningRate 0.0004 Epoch: 17 Global Step: 362480 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:12,494-Speed 6299.86 samples/sec Loss 5.2497 LearningRate 0.0004 Epoch: 17 Global Step: 362490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:15,740-Speed 6311.98 samples/sec Loss 5.2903 LearningRate 0.0004 Epoch: 17 Global Step: 362500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:18,990-Speed 6302.55 samples/sec Loss 5.2308 
LearningRate 0.0004 Epoch: 17 Global Step: 362510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:22,233-Speed 6316.26 samples/sec Loss 5.2647 LearningRate 0.0004 Epoch: 17 Global Step: 362520 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:42:25,468-Speed 6331.82 samples/sec Loss 5.2526 LearningRate 0.0004 Epoch: 17 Global Step: 362530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:28,718-Speed 6303.89 samples/sec Loss 5.2693 LearningRate 0.0004 Epoch: 17 Global Step: 362540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:31,964-Speed 6310.77 samples/sec Loss 5.2826 LearningRate 0.0004 Epoch: 17 Global Step: 362550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:35,208-Speed 6314.09 samples/sec Loss 5.2836 LearningRate 0.0004 Epoch: 17 Global Step: 362560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:38,454-Speed 6311.55 samples/sec Loss 5.2387 LearningRate 0.0004 Epoch: 17 Global Step: 362570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:41,701-Speed 6308.01 samples/sec Loss 5.3839 LearningRate 0.0004 Epoch: 17 Global Step: 362580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:44,944-Speed 6317.46 samples/sec Loss 5.2794 LearningRate 0.0004 Epoch: 17 Global Step: 362590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:48,192-Speed 6307.37 samples/sec Loss 5.3229 LearningRate 0.0004 Epoch: 17 Global Step: 362600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:51,436-Speed 6314.12 samples/sec Loss 5.2964 LearningRate 0.0004 Epoch: 17 Global Step: 362610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:54,679-Speed 6315.37 samples/sec Loss 5.2176 LearningRate 0.0004 Epoch: 17 Global Step: 362620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:42:57,926-Speed 6310.14 samples/sec Loss 5.2259 LearningRate 0.0004 Epoch: 17 
Global Step: 362630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:01,218-Speed 6221.17 samples/sec Loss 5.3059 LearningRate 0.0004 Epoch: 17 Global Step: 362640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:04,468-Speed 6304.24 samples/sec Loss 5.2453 LearningRate 0.0004 Epoch: 17 Global Step: 362650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:07,709-Speed 6319.06 samples/sec Loss 5.2445 LearningRate 0.0004 Epoch: 17 Global Step: 362660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:10,954-Speed 6313.52 samples/sec Loss 5.2762 LearningRate 0.0004 Epoch: 17 Global Step: 362670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:14,201-Speed 6307.90 samples/sec Loss 5.2446 LearningRate 0.0004 Epoch: 17 Global Step: 362680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:17,446-Speed 6313.09 samples/sec Loss 5.2884 LearningRate 0.0004 Epoch: 17 Global Step: 362690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:20,695-Speed 6305.81 samples/sec Loss 5.3832 LearningRate 0.0004 Epoch: 17 Global Step: 362700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:23,944-Speed 6304.58 samples/sec Loss 5.2803 LearningRate 0.0004 Epoch: 17 Global Step: 362710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:27,188-Speed 6313.86 samples/sec Loss 5.2960 LearningRate 0.0004 Epoch: 17 Global Step: 362720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:30,439-Speed 6300.18 samples/sec Loss 5.2791 LearningRate 0.0004 Epoch: 17 Global Step: 362730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:43:33,668-Speed 6345.01 samples/sec Loss 5.2460 LearningRate 0.0004 Epoch: 17 Global Step: 362740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:36,923-Speed 6293.65 samples/sec Loss 5.3175 LearningRate 0.0004 Epoch: 17 Global Step: 362750 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:40,163-Speed 6322.05 samples/sec Loss 5.2095 LearningRate 0.0004 Epoch: 17 Global Step: 362760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:43,408-Speed 6312.33 samples/sec Loss 5.2260 LearningRate 0.0004 Epoch: 17 Global Step: 362770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:46,655-Speed 6308.24 samples/sec Loss 5.2794 LearningRate 0.0004 Epoch: 17 Global Step: 362780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:49,900-Speed 6314.25 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 362790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:53,147-Speed 6308.56 samples/sec Loss 5.2295 LearningRate 0.0004 Epoch: 17 Global Step: 362800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:56,389-Speed 6318.06 samples/sec Loss 5.4080 LearningRate 0.0004 Epoch: 17 Global Step: 362810 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:43:59,644-Speed 6294.08 samples/sec Loss 5.3192 LearningRate 0.0004 Epoch: 17 Global Step: 362820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:02,890-Speed 6311.39 samples/sec Loss 5.3111 LearningRate 0.0004 Epoch: 17 Global Step: 362830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:06,139-Speed 6304.91 samples/sec Loss 5.2761 LearningRate 0.0004 Epoch: 17 Global Step: 362840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:44:09,370-Speed 6340.26 samples/sec Loss 5.2974 LearningRate 0.0004 Epoch: 17 Global Step: 362850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:12,624-Speed 6294.25 samples/sec Loss 5.2665 LearningRate 0.0004 Epoch: 17 Global Step: 362860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:15,966-Speed 6129.55 samples/sec Loss 5.2881 LearningRate 0.0004 Epoch: 17 Global Step: 362870 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 00:44:19,244-Speed 6249.67 samples/sec Loss 5.2264 LearningRate 0.0004 Epoch: 17 Global Step: 362880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:22,491-Speed 6308.31 samples/sec Loss 5.2816 LearningRate 0.0004 Epoch: 17 Global Step: 362890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:25,736-Speed 6312.88 samples/sec Loss 5.1991 LearningRate 0.0004 Epoch: 17 Global Step: 362900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:28,978-Speed 6317.43 samples/sec Loss 5.2426 LearningRate 0.0004 Epoch: 17 Global Step: 362910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:32,224-Speed 6310.43 samples/sec Loss 5.2311 LearningRate 0.0004 Epoch: 17 Global Step: 362920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:35,471-Speed 6310.12 samples/sec Loss 5.2159 LearningRate 0.0004 Epoch: 17 Global Step: 362930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:38,718-Speed 6307.18 samples/sec Loss 5.2296 LearningRate 0.0004 Epoch: 17 Global Step: 362940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:44:41,964-Speed 6312.63 samples/sec Loss 5.2516 LearningRate 0.0004 Epoch: 17 Global Step: 362950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:44:45,211-Speed 6308.61 samples/sec Loss 5.3393 LearningRate 0.0004 Epoch: 17 Global Step: 362960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:44:48,458-Speed 6308.39 samples/sec Loss 5.2421 LearningRate 0.0004 Epoch: 17 Global Step: 362970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:44:51,701-Speed 6315.18 samples/sec Loss 5.1736 LearningRate 0.0004 Epoch: 17 Global Step: 362980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:44:54,947-Speed 6311.89 samples/sec Loss 5.2651 LearningRate 0.0004 Epoch: 17 Global Step: 362990 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
00:44:58,189-Speed 6318.31 samples/sec Loss 5.2809 LearningRate 0.0004 Epoch: 17 Global Step: 363000 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:01,438-Speed 6306.37 samples/sec Loss 5.2639 LearningRate 0.0004 Epoch: 17 Global Step: 363010 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:04,682-Speed 6313.59 samples/sec Loss 5.2217 LearningRate 0.0004 Epoch: 17 Global Step: 363020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:07,930-Speed 6307.26 samples/sec Loss 5.2658 LearningRate 0.0004 Epoch: 17 Global Step: 363030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:11,161-Speed 6340.47 samples/sec Loss 5.2716 LearningRate 0.0004 Epoch: 17 Global Step: 363040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:14,402-Speed 6319.55 samples/sec Loss 5.2410 LearningRate 0.0004 Epoch: 17 Global Step: 363050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:17,647-Speed 6313.48 samples/sec Loss 5.3107 LearningRate 0.0004 Epoch: 17 Global Step: 363060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:20,890-Speed 6315.99 samples/sec Loss 5.2388 LearningRate 0.0004 Epoch: 17 Global Step: 363070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:24,139-Speed 6305.17 samples/sec Loss 5.2998 LearningRate 0.0004 Epoch: 17 Global Step: 363080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:27,385-Speed 6310.59 samples/sec Loss 5.2280 LearningRate 0.0004 Epoch: 17 Global Step: 363090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:30,630-Speed 6311.69 samples/sec Loss 5.2196 LearningRate 0.0004 Epoch: 17 Global Step: 363100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:33,877-Speed 6310.68 samples/sec Loss 5.1940 LearningRate 0.0004 Epoch: 17 Global Step: 363110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:37,121-Speed 6313.87 
samples/sec Loss 5.3210 LearningRate 0.0004 Epoch: 17 Global Step: 363120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:40,371-Speed 6303.44 samples/sec Loss 5.2388 LearningRate 0.0004 Epoch: 17 Global Step: 363130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:45:43,617-Speed 6311.06 samples/sec Loss 5.2997 LearningRate 0.0004 Epoch: 17 Global Step: 363140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:46,859-Speed 6316.92 samples/sec Loss 5.2413 LearningRate 0.0004 Epoch: 17 Global Step: 363150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:50,109-Speed 6303.59 samples/sec Loss 5.2925 LearningRate 0.0004 Epoch: 17 Global Step: 363160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:53,354-Speed 6313.44 samples/sec Loss 5.3026 LearningRate 0.0004 Epoch: 17 Global Step: 363170 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:56,597-Speed 6316.11 samples/sec Loss 5.2095 LearningRate 0.0004 Epoch: 17 Global Step: 363180 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:45:59,845-Speed 6305.65 samples/sec Loss 5.2961 LearningRate 0.0004 Epoch: 17 Global Step: 363190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:03,094-Speed 6305.55 samples/sec Loss 5.3122 LearningRate 0.0004 Epoch: 17 Global Step: 363200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:06,333-Speed 6324.65 samples/sec Loss 5.3150 LearningRate 0.0004 Epoch: 17 Global Step: 363210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:09,577-Speed 6314.62 samples/sec Loss 5.2059 LearningRate 0.0004 Epoch: 17 Global Step: 363220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:12,821-Speed 6315.66 samples/sec Loss 5.2997 LearningRate 0.0004 Epoch: 17 Global Step: 363230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:16,114-Speed 6220.19 samples/sec Loss 5.2587 
LearningRate 0.0004 Epoch: 17 Global Step: 363240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:19,358-Speed 6313.54 samples/sec Loss 5.2866 LearningRate 0.0004 Epoch: 17 Global Step: 363250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:22,602-Speed 6316.41 samples/sec Loss 5.2671 LearningRate 0.0004 Epoch: 17 Global Step: 363260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:25,846-Speed 6314.70 samples/sec Loss 5.2797 LearningRate 0.0004 Epoch: 17 Global Step: 363270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:29,089-Speed 6315.58 samples/sec Loss 5.1938 LearningRate 0.0004 Epoch: 17 Global Step: 363280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:32,331-Speed 6318.29 samples/sec Loss 5.3340 LearningRate 0.0004 Epoch: 17 Global Step: 363290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:35,580-Speed 6305.94 samples/sec Loss 5.2341 LearningRate 0.0004 Epoch: 17 Global Step: 363300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:46:38,823-Speed 6315.23 samples/sec Loss 5.2601 LearningRate 0.0004 Epoch: 17 Global Step: 363310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:42,066-Speed 6317.86 samples/sec Loss 5.2731 LearningRate 0.0004 Epoch: 17 Global Step: 363320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:45,310-Speed 6314.08 samples/sec Loss 5.2287 LearningRate 0.0004 Epoch: 17 Global Step: 363330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:48,555-Speed 6311.65 samples/sec Loss 5.2652 LearningRate 0.0004 Epoch: 17 Global Step: 363340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:51,803-Speed 6306.86 samples/sec Loss 5.3491 LearningRate 0.0004 Epoch: 17 Global Step: 363350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:55,047-Speed 6315.39 samples/sec Loss 5.2583 LearningRate 0.0004 Epoch: 17 
Global Step: 363360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:46:58,295-Speed 6307.62 samples/sec Loss 5.2679 LearningRate 0.0004 Epoch: 17 Global Step: 363370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:01,541-Speed 6310.49 samples/sec Loss 5.2327 LearningRate 0.0004 Epoch: 17 Global Step: 363380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:04,783-Speed 6317.14 samples/sec Loss 5.2743 LearningRate 0.0004 Epoch: 17 Global Step: 363390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:08,028-Speed 6312.74 samples/sec Loss 5.3143 LearningRate 0.0004 Epoch: 17 Global Step: 363400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:11,256-Speed 6345.71 samples/sec Loss 5.2692 LearningRate 0.0004 Epoch: 17 Global Step: 363410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:14,502-Speed 6312.21 samples/sec Loss 5.3593 LearningRate 0.0004 Epoch: 17 Global Step: 363420 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:17,747-Speed 6312.64 samples/sec Loss 5.2417 LearningRate 0.0004 Epoch: 17 Global Step: 363430 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:20,990-Speed 6316.30 samples/sec Loss 5.2440 LearningRate 0.0004 Epoch: 17 Global Step: 363440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:24,236-Speed 6311.12 samples/sec Loss 5.2374 LearningRate 0.0004 Epoch: 17 Global Step: 363450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:27,481-Speed 6312.58 samples/sec Loss 5.3508 LearningRate 0.0004 Epoch: 17 Global Step: 363460 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:30,730-Speed 6305.43 samples/sec Loss 5.2461 LearningRate 0.0004 Epoch: 17 Global Step: 363470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:33,976-Speed 6310.04 samples/sec Loss 5.2833 LearningRate 0.0004 Epoch: 17 Global Step: 363480 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:37,226-Speed 6303.04 samples/sec Loss 5.3377 LearningRate 0.0004 Epoch: 17 Global Step: 363490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:40,500-Speed 6257.53 samples/sec Loss 5.2925 LearningRate 0.0004 Epoch: 17 Global Step: 363500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:43,755-Speed 6292.37 samples/sec Loss 5.2240 LearningRate 0.0004 Epoch: 17 Global Step: 363510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:46,999-Speed 6314.30 samples/sec Loss 5.3174 LearningRate 0.0004 Epoch: 17 Global Step: 363520 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:47:50,230-Speed 6341.30 samples/sec Loss 5.2683 LearningRate 0.0004 Epoch: 17 Global Step: 363530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:47:53,475-Speed 6312.56 samples/sec Loss 5.2820 LearningRate 0.0004 Epoch: 17 Global Step: 363540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:47:56,723-Speed 6305.79 samples/sec Loss 5.2502 LearningRate 0.0004 Epoch: 17 Global Step: 363550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:47:59,968-Speed 6313.78 samples/sec Loss 5.2851 LearningRate 0.0004 Epoch: 17 Global Step: 363560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:03,215-Speed 6307.53 samples/sec Loss 5.2103 LearningRate 0.0004 Epoch: 17 Global Step: 363570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:06,464-Speed 6304.62 samples/sec Loss 5.3176 LearningRate 0.0004 Epoch: 17 Global Step: 363580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:09,708-Speed 6314.97 samples/sec Loss 5.2049 LearningRate 0.0004 Epoch: 17 Global Step: 363590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:12,962-Speed 6295.80 samples/sec Loss 5.2925 LearningRate 0.0004 Epoch: 17 Global Step: 363600 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 00:48:16,210-Speed 6306.41 samples/sec Loss 5.3241 LearningRate 0.0004 Epoch: 17 Global Step: 363610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:19,463-Speed 6297.65 samples/sec Loss 5.2917 LearningRate 0.0004 Epoch: 17 Global Step: 363620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:22,708-Speed 6312.50 samples/sec Loss 5.3361 LearningRate 0.0004 Epoch: 17 Global Step: 363630 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:48:25,958-Speed 6302.59 samples/sec Loss 5.2587 LearningRate 0.0004 Epoch: 17 Global Step: 363640 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:48:29,220-Speed 6281.23 samples/sec Loss 5.3171 LearningRate 0.0004 Epoch: 17 Global Step: 363650 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:48:32,450-Speed 6342.22 samples/sec Loss 5.2502 LearningRate 0.0004 Epoch: 17 Global Step: 363660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:35,699-Speed 6303.90 samples/sec Loss 5.2727 LearningRate 0.0004 Epoch: 17 Global Step: 363670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:38,941-Speed 6318.94 samples/sec Loss 5.2278 LearningRate 0.0004 Epoch: 17 Global Step: 363680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:42,188-Speed 6308.77 samples/sec Loss 5.2987 LearningRate 0.0004 Epoch: 17 Global Step: 363690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:45,430-Speed 6319.45 samples/sec Loss 5.2622 LearningRate 0.0004 Epoch: 17 Global Step: 363700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:48,674-Speed 6313.58 samples/sec Loss 5.2867 LearningRate 0.0004 Epoch: 17 Global Step: 363710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:51,917-Speed 6316.37 samples/sec Loss 5.2842 LearningRate 0.0004 Epoch: 17 Global Step: 363720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
00:48:55,163-Speed 6311.67 samples/sec Loss 5.3316 LearningRate 0.0004 Epoch: 17 Global Step: 363730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:48:58,408-Speed 6312.64 samples/sec Loss 5.2353 LearningRate 0.0004 Epoch: 17 Global Step: 363740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:49:01,651-Speed 6316.62 samples/sec Loss 5.3092 LearningRate 0.0004 Epoch: 17 Global Step: 363750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:49:04,897-Speed 6309.99 samples/sec Loss 5.3015 LearningRate 0.0004 Epoch: 17 Global Step: 363760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:08,141-Speed 6315.91 samples/sec Loss 5.2105 LearningRate 0.0004 Epoch: 17 Global Step: 363770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:11,388-Speed 6307.14 samples/sec Loss 5.2548 LearningRate 0.0004 Epoch: 17 Global Step: 363780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:14,647-Speed 6285.92 samples/sec Loss 5.1937 LearningRate 0.0004 Epoch: 17 Global Step: 363790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:17,892-Speed 6313.76 samples/sec Loss 5.2401 LearningRate 0.0004 Epoch: 17 Global Step: 363800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:21,135-Speed 6315.45 samples/sec Loss 5.1795 LearningRate 0.0004 Epoch: 17 Global Step: 363810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:24,391-Speed 6290.56 samples/sec Loss 5.3143 LearningRate 0.0004 Epoch: 17 Global Step: 363820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:27,633-Speed 6319.59 samples/sec Loss 5.2460 LearningRate 0.0004 Epoch: 17 Global Step: 363830 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:30,877-Speed 6313.29 samples/sec Loss 5.2250 LearningRate 0.0004 Epoch: 17 Global Step: 363840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:34,122-Speed 6313.63 
samples/sec Loss 5.2539 LearningRate 0.0004 Epoch: 17 Global Step: 363850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:37,353-Speed 6339.26 samples/sec Loss 5.2423 LearningRate 0.0004 Epoch: 17 Global Step: 363860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:40,600-Speed 6310.25 samples/sec Loss 5.2459 LearningRate 0.0004 Epoch: 17 Global Step: 363870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:43,842-Speed 6317.62 samples/sec Loss 5.2979 LearningRate 0.0004 Epoch: 17 Global Step: 363880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:47,087-Speed 6313.50 samples/sec Loss 5.2447 LearningRate 0.0004 Epoch: 17 Global Step: 363890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:50,330-Speed 6316.25 samples/sec Loss 5.3304 LearningRate 0.0004 Epoch: 17 Global Step: 363900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:53,593-Speed 6277.69 samples/sec Loss 5.2745 LearningRate 0.0004 Epoch: 17 Global Step: 363910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:49:56,836-Speed 6317.04 samples/sec Loss 5.2766 LearningRate 0.0004 Epoch: 17 Global Step: 363920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:00,092-Speed 6291.20 samples/sec Loss 5.2581 LearningRate 0.0004 Epoch: 17 Global Step: 363930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:03,338-Speed 6310.14 samples/sec Loss 5.3211 LearningRate 0.0004 Epoch: 17 Global Step: 363940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:06,586-Speed 6307.44 samples/sec Loss 5.3103 LearningRate 0.0004 Epoch: 17 Global Step: 363950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:09,814-Speed 6345.75 samples/sec Loss 5.2603 LearningRate 0.0004 Epoch: 17 Global Step: 363960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:13,060-Speed 6311.93 samples/sec Loss 5.2169 
LearningRate 0.0004 Epoch: 17 Global Step: 363970 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:16,312-Speed 6297.69 samples/sec Loss 5.2538 LearningRate 0.0004 Epoch: 17 Global Step: 363980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:19,556-Speed 6314.33 samples/sec Loss 5.2486 LearningRate 0.0004 Epoch: 17 Global Step: 363990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:22,802-Speed 6310.83 samples/sec Loss 5.2890 LearningRate 0.0004 Epoch: 17 Global Step: 364000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:26,052-Speed 6303.85 samples/sec Loss 5.2868 LearningRate 0.0004 Epoch: 17 Global Step: 364010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:29,295-Speed 6316.91 samples/sec Loss 5.2648 LearningRate 0.0004 Epoch: 17 Global Step: 364020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:32,540-Speed 6312.89 samples/sec Loss 5.2310 LearningRate 0.0004 Epoch: 17 Global Step: 364030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:35,782-Speed 6317.06 samples/sec Loss 5.2678 LearningRate 0.0004 Epoch: 17 Global Step: 364040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:39,029-Speed 6308.83 samples/sec Loss 5.2511 LearningRate 0.0004 Epoch: 17 Global Step: 364050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:50:42,275-Speed 6311.94 samples/sec Loss 5.2088 LearningRate 0.0004 Epoch: 17 Global Step: 364060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:45,522-Speed 6308.53 samples/sec Loss 5.2436 LearningRate 0.0004 Epoch: 17 Global Step: 364070 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:48,766-Speed 6314.07 samples/sec Loss 5.3053 LearningRate 0.0004 Epoch: 17 Global Step: 364080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:52,012-Speed 6310.99 samples/sec Loss 5.2618 LearningRate 0.0004 Epoch: 17 
Global Step: 364090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:55,260-Speed 6306.74 samples/sec Loss 5.2679 LearningRate 0.0004 Epoch: 17 Global Step: 364100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:50:58,508-Speed 6307.71 samples/sec Loss 5.2346 LearningRate 0.0004 Epoch: 17 Global Step: 364110 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:01,754-Speed 6310.78 samples/sec Loss 5.2470 LearningRate 0.0004 Epoch: 17 Global Step: 364120 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:05,000-Speed 6310.29 samples/sec Loss 5.3524 LearningRate 0.0004 Epoch: 17 Global Step: 364130 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:08,246-Speed 6311.81 samples/sec Loss 5.2983 LearningRate 0.0004 Epoch: 17 Global Step: 364140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:11,493-Speed 6308.16 samples/sec Loss 5.2906 LearningRate 0.0004 Epoch: 17 Global Step: 364150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:14,732-Speed 6325.51 samples/sec Loss 5.2294 LearningRate 0.0004 Epoch: 17 Global Step: 364160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:17,981-Speed 6303.86 samples/sec Loss 5.2473 LearningRate 0.0004 Epoch: 17 Global Step: 364170 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:21,226-Speed 6313.17 samples/sec Loss 5.2581 LearningRate 0.0004 Epoch: 17 Global Step: 364180 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:24,475-Speed 6303.58 samples/sec Loss 5.3352 LearningRate 0.0004 Epoch: 17 Global Step: 364190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:27,725-Speed 6304.07 samples/sec Loss 5.2475 LearningRate 0.0004 Epoch: 17 Global Step: 364200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:30,967-Speed 6317.41 samples/sec Loss 5.2420 LearningRate 0.0004 Epoch: 17 Global Step: 364210 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:34,215-Speed 6307.85 samples/sec Loss 5.1725 LearningRate 0.0004 Epoch: 17 Global Step: 364220 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:37,459-Speed 6313.27 samples/sec Loss 5.2359 LearningRate 0.0004 Epoch: 17 Global Step: 364230 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:41,709-Speed 4820.47 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 17 Global Step: 364240 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:44,953-Speed 6316.02 samples/sec Loss 5.2650 LearningRate 0.0004 Epoch: 17 Global Step: 364250 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:48,190-Speed 6328.63 samples/sec Loss 5.2965 LearningRate 0.0004 Epoch: 17 Global Step: 364260 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:51,439-Speed 6303.47 samples/sec Loss 5.2778 LearningRate 0.0004 Epoch: 17 Global Step: 364270 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:54,686-Speed 6309.59 samples/sec Loss 5.2060 LearningRate 0.0004 Epoch: 17 Global Step: 364280 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:51:57,943-Speed 6288.49 samples/sec Loss 5.2956 LearningRate 0.0004 Epoch: 17 Global Step: 364290 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:01,213-Speed 6264.79 samples/sec Loss 5.2331 LearningRate 0.0004 Epoch: 17 Global Step: 364300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:04,460-Speed 6309.08 samples/sec Loss 5.2275 LearningRate 0.0004 Epoch: 17 Global Step: 364310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:07,712-Speed 6299.52 samples/sec Loss 5.2489 LearningRate 0.0004 Epoch: 17 Global Step: 364320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:10,955-Speed 6317.30 samples/sec Loss 5.1994 LearningRate 0.0004 Epoch: 17 Global Step: 364330 Fp16 Grad Scale: 32768 Required: 42 hours 
Training: 2022-04-02 00:52:14,200-Speed 6312.98 samples/sec Loss 5.3107 LearningRate 0.0004 Epoch: 17 Global Step: 364340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:17,442-Speed 6318.71 samples/sec Loss 5.2702 LearningRate 0.0004 Epoch: 17 Global Step: 364350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:20,672-Speed 6342.34 samples/sec Loss 5.2055 LearningRate 0.0004 Epoch: 17 Global Step: 364360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:23,918-Speed 6310.36 samples/sec Loss 5.2481 LearningRate 0.0004 Epoch: 17 Global Step: 364370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:27,168-Speed 6302.77 samples/sec Loss 5.2222 LearningRate 0.0004 Epoch: 17 Global Step: 364380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:30,414-Speed 6310.51 samples/sec Loss 5.2452 LearningRate 0.0004 Epoch: 17 Global Step: 364390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:33,655-Speed 6319.43 samples/sec Loss 5.2503 LearningRate 0.0004 Epoch: 17 Global Step: 364400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:36,902-Speed 6309.76 samples/sec Loss 5.3342 LearningRate 0.0004 Epoch: 17 Global Step: 364410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:40,148-Speed 6311.05 samples/sec Loss 5.3424 LearningRate 0.0004 Epoch: 17 Global Step: 364420 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:43,389-Speed 6319.26 samples/sec Loss 5.2680 LearningRate 0.0004 Epoch: 17 Global Step: 364430 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:46,634-Speed 6312.86 samples/sec Loss 5.2099 LearningRate 0.0004 Epoch: 17 Global Step: 364440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:49,883-Speed 6306.12 samples/sec Loss 5.3030 LearningRate 0.0004 Epoch: 17 Global Step: 364450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
00:52:53,114-Speed 6339.03 samples/sec Loss 5.2681 LearningRate 0.0004 Epoch: 17 Global Step: 364460 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:56,359-Speed 6313.21 samples/sec Loss 5.1818 LearningRate 0.0004 Epoch: 17 Global Step: 364470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:52:59,608-Speed 6304.02 samples/sec Loss 5.2499 LearningRate 0.0004 Epoch: 17 Global Step: 364480 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:02,859-Speed 6302.06 samples/sec Loss 5.2059 LearningRate 0.0004 Epoch: 17 Global Step: 364490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:06,104-Speed 6312.64 samples/sec Loss 5.2501 LearningRate 0.0004 Epoch: 17 Global Step: 364500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:09,348-Speed 6313.98 samples/sec Loss 5.2616 LearningRate 0.0004 Epoch: 17 Global Step: 364510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:12,577-Speed 6344.25 samples/sec Loss 5.2482 LearningRate 0.0004 Epoch: 17 Global Step: 364520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:15,821-Speed 6316.37 samples/sec Loss 5.1985 LearningRate 0.0004 Epoch: 17 Global Step: 364530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:19,068-Speed 6307.93 samples/sec Loss 5.1587 LearningRate 0.0004 Epoch: 17 Global Step: 364540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:22,313-Speed 6313.28 samples/sec Loss 5.2354 LearningRate 0.0004 Epoch: 17 Global Step: 364550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:25,556-Speed 6314.73 samples/sec Loss 5.2323 LearningRate 0.0004 Epoch: 17 Global Step: 364560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:28,797-Speed 6320.90 samples/sec Loss 5.3229 LearningRate 0.0004 Epoch: 17 Global Step: 364570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:32,045-Speed 6307.71 
samples/sec Loss 5.2406 LearningRate 0.0004 Epoch: 17 Global Step: 364580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:35,288-Speed 6315.93 samples/sec Loss 5.2541 LearningRate 0.0004 Epoch: 17 Global Step: 364590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:38,536-Speed 6307.31 samples/sec Loss 5.2946 LearningRate 0.0004 Epoch: 17 Global Step: 364600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:41,776-Speed 6321.51 samples/sec Loss 5.2948 LearningRate 0.0004 Epoch: 17 Global Step: 364610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:45,020-Speed 6316.02 samples/sec Loss 5.1645 LearningRate 0.0004 Epoch: 17 Global Step: 364620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:48,265-Speed 6311.21 samples/sec Loss 5.3420 LearningRate 0.0004 Epoch: 17 Global Step: 364630 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:51,514-Speed 6305.88 samples/sec Loss 5.2293 LearningRate 0.0004 Epoch: 17 Global Step: 364640 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:53:54,744-Speed 6341.74 samples/sec Loss 5.1534 LearningRate 0.0004 Epoch: 17 Global Step: 364650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:53:57,994-Speed 6303.62 samples/sec Loss 5.2207 LearningRate 0.0004 Epoch: 17 Global Step: 364660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:01,241-Speed 6308.60 samples/sec Loss 5.2003 LearningRate 0.0004 Epoch: 17 Global Step: 364670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:04,487-Speed 6309.43 samples/sec Loss 5.2431 LearningRate 0.0004 Epoch: 17 Global Step: 364680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:07,748-Speed 6281.96 samples/sec Loss 5.2672 LearningRate 0.0004 Epoch: 17 Global Step: 364690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:10,997-Speed 6304.61 samples/sec Loss 5.2575 
LearningRate 0.0004 Epoch: 17 Global Step: 364700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:14,242-Speed 6313.08 samples/sec Loss 5.2534 LearningRate 0.0004 Epoch: 17 Global Step: 364710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:17,486-Speed 6314.09 samples/sec Loss 5.2044 LearningRate 0.0004 Epoch: 17 Global Step: 364720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:20,736-Speed 6304.59 samples/sec Loss 5.2604 LearningRate 0.0004 Epoch: 17 Global Step: 364730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:23,980-Speed 6315.13 samples/sec Loss 5.2682 LearningRate 0.0004 Epoch: 17 Global Step: 364740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:54:27,235-Speed 6293.74 samples/sec Loss 5.3163 LearningRate 0.0004 Epoch: 17 Global Step: 364750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:30,487-Speed 6298.59 samples/sec Loss 5.2140 LearningRate 0.0004 Epoch: 17 Global Step: 364760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:33,729-Speed 6318.48 samples/sec Loss 5.2905 LearningRate 0.0004 Epoch: 17 Global Step: 364770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:36,979-Speed 6303.40 samples/sec Loss 5.3043 LearningRate 0.0004 Epoch: 17 Global Step: 364780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:40,222-Speed 6316.84 samples/sec Loss 5.2838 LearningRate 0.0004 Epoch: 17 Global Step: 364790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:43,469-Speed 6307.89 samples/sec Loss 5.2394 LearningRate 0.0004 Epoch: 17 Global Step: 364800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:46,711-Speed 6317.71 samples/sec Loss 5.3142 LearningRate 0.0004 Epoch: 17 Global Step: 364810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:49,961-Speed 6307.85 samples/sec Loss 5.1758 LearningRate 0.0004 Epoch: 17 
Global Step: 364820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:53,206-Speed 6311.25 samples/sec Loss 5.2891 LearningRate 0.0004 Epoch: 17 Global Step: 364830 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:56,451-Speed 6313.83 samples/sec Loss 5.2133 LearningRate 0.0004 Epoch: 17 Global Step: 364840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:54:59,679-Speed 6345.24 samples/sec Loss 5.1829 LearningRate 0.0004 Epoch: 17 Global Step: 364850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:02,924-Speed 6313.56 samples/sec Loss 5.2205 LearningRate 0.0004 Epoch: 17 Global Step: 364860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:06,175-Speed 6299.37 samples/sec Loss 5.3294 LearningRate 0.0004 Epoch: 17 Global Step: 364870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:09,421-Speed 6311.73 samples/sec Loss 5.2545 LearningRate 0.0004 Epoch: 17 Global Step: 364880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:12,667-Speed 6309.83 samples/sec Loss 5.2671 LearningRate 0.0004 Epoch: 17 Global Step: 364890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:15,912-Speed 6312.27 samples/sec Loss 5.2663 LearningRate 0.0004 Epoch: 17 Global Step: 364900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:19,157-Speed 6314.62 samples/sec Loss 5.3652 LearningRate 0.0004 Epoch: 17 Global Step: 364910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:22,406-Speed 6304.27 samples/sec Loss 5.2505 LearningRate 0.0004 Epoch: 17 Global Step: 364920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:25,655-Speed 6305.74 samples/sec Loss 5.2186 LearningRate 0.0004 Epoch: 17 Global Step: 364930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:28,900-Speed 6311.58 samples/sec Loss 5.2999 LearningRate 0.0004 Epoch: 17 Global Step: 364940 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:32,132-Speed 6339.15 samples/sec Loss 5.2907 LearningRate 0.0004 Epoch: 17 Global Step: 364950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:35,379-Speed 6308.38 samples/sec Loss 5.2033 LearningRate 0.0004 Epoch: 17 Global Step: 364960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:38,625-Speed 6312.09 samples/sec Loss 5.2328 LearningRate 0.0004 Epoch: 17 Global Step: 364970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:55:41,853-Speed 6344.37 samples/sec Loss 5.3041 LearningRate 0.0004 Epoch: 17 Global Step: 364980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:55:45,097-Speed 6315.62 samples/sec Loss 5.1745 LearningRate 0.0004 Epoch: 17 Global Step: 364990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:55:48,344-Speed 6308.14 samples/sec Loss 5.2701 LearningRate 0.0004 Epoch: 17 Global Step: 365000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:55:51,588-Speed 6317.18 samples/sec Loss 5.2230 LearningRate 0.0004 Epoch: 17 Global Step: 365010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:55:54,836-Speed 6305.58 samples/sec Loss 5.2118 LearningRate 0.0004 Epoch: 17 Global Step: 365020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:55:58,091-Speed 6293.55 samples/sec Loss 5.2495 LearningRate 0.0004 Epoch: 17 Global Step: 365030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:01,339-Speed 6306.50 samples/sec Loss 5.2257 LearningRate 0.0004 Epoch: 17 Global Step: 365040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:04,586-Speed 6309.32 samples/sec Loss 5.1813 LearningRate 0.0004 Epoch: 17 Global Step: 365050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:07,832-Speed 6309.67 samples/sec Loss 5.2147 LearningRate 0.0004 Epoch: 17 Global Step: 365060 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 00:56:11,081-Speed 6306.38 samples/sec Loss 5.2246 LearningRate 0.0004 Epoch: 17 Global Step: 365070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:14,311-Speed 6341.15 samples/sec Loss 5.3228 LearningRate 0.0004 Epoch: 17 Global Step: 365080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:17,555-Speed 6315.03 samples/sec Loss 5.1921 LearningRate 0.0004 Epoch: 17 Global Step: 365090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:20,799-Speed 6315.20 samples/sec Loss 5.2468 LearningRate 0.0004 Epoch: 17 Global Step: 365100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:24,048-Speed 6304.82 samples/sec Loss 5.2724 LearningRate 0.0004 Epoch: 17 Global Step: 365110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:27,295-Speed 6307.94 samples/sec Loss 5.2414 LearningRate 0.0004 Epoch: 17 Global Step: 365120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:30,537-Speed 6319.11 samples/sec Loss 5.2941 LearningRate 0.0004 Epoch: 17 Global Step: 365130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:33,782-Speed 6312.77 samples/sec Loss 5.2786 LearningRate 0.0004 Epoch: 17 Global Step: 365140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:37,036-Speed 6294.02 samples/sec Loss 5.2522 LearningRate 0.0004 Epoch: 17 Global Step: 365150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:40,283-Speed 6310.15 samples/sec Loss 5.2625 LearningRate 0.0004 Epoch: 17 Global Step: 365160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:43,525-Speed 6318.52 samples/sec Loss 5.2868 LearningRate 0.0004 Epoch: 17 Global Step: 365170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:56:46,846-Speed 6168.30 samples/sec Loss 5.2615 LearningRate 0.0004 Epoch: 17 Global Step: 365180 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
00:56:50,088-Speed 6317.99 samples/sec Loss 5.2869 LearningRate 0.0004 Epoch: 17 Global Step: 365190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:56:53,334-Speed 6311.68 samples/sec Loss 5.2119 LearningRate 0.0004 Epoch: 17 Global Step: 365200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:56:56,592-Speed 6288.14 samples/sec Loss 5.2185 LearningRate 0.0004 Epoch: 17 Global Step: 365210 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:56:59,895-Speed 6200.29 samples/sec Loss 5.2612 LearningRate 0.0004 Epoch: 17 Global Step: 365220 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:03,139-Speed 6314.48 samples/sec Loss 5.3031 LearningRate 0.0004 Epoch: 17 Global Step: 365230 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:06,391-Speed 6298.82 samples/sec Loss 5.2797 LearningRate 0.0004 Epoch: 17 Global Step: 365240 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:09,638-Speed 6309.05 samples/sec Loss 5.2693 LearningRate 0.0004 Epoch: 17 Global Step: 365250 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:12,873-Speed 6333.55 samples/sec Loss 5.2202 LearningRate 0.0004 Epoch: 17 Global Step: 365260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:16,121-Speed 6306.87 samples/sec Loss 5.2526 LearningRate 0.0004 Epoch: 17 Global Step: 365270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:19,368-Speed 6307.63 samples/sec Loss 5.2121 LearningRate 0.0004 Epoch: 17 Global Step: 365280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:22,614-Speed 6311.55 samples/sec Loss 5.3034 LearningRate 0.0004 Epoch: 17 Global Step: 365290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:25,859-Speed 6311.83 samples/sec Loss 5.2844 LearningRate 0.0004 Epoch: 17 Global Step: 365300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:29,105-Speed 6310.59 
samples/sec Loss 5.1767 LearningRate 0.0004 Epoch: 17 Global Step: 365310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:32,349-Speed 6316.05 samples/sec Loss 5.1859 LearningRate 0.0004 Epoch: 17 Global Step: 365320 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:35,593-Speed 6314.42 samples/sec Loss 5.2689 LearningRate 0.0004 Epoch: 17 Global Step: 365330 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:38,836-Speed 6315.42 samples/sec Loss 5.3224 LearningRate 0.0004 Epoch: 17 Global Step: 365340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:42,084-Speed 6307.89 samples/sec Loss 5.3472 LearningRate 0.0004 Epoch: 17 Global Step: 365350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:45,327-Speed 6315.16 samples/sec Loss 5.2351 LearningRate 0.0004 Epoch: 17 Global Step: 365360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:48,571-Speed 6316.15 samples/sec Loss 5.2667 LearningRate 0.0004 Epoch: 17 Global Step: 365370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:51,819-Speed 6307.62 samples/sec Loss 5.2864 LearningRate 0.0004 Epoch: 17 Global Step: 365380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:57:55,050-Speed 6339.84 samples/sec Loss 5.3072 LearningRate 0.0004 Epoch: 17 Global Step: 365390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:57:58,298-Speed 6306.34 samples/sec Loss 5.3203 LearningRate 0.0004 Epoch: 17 Global Step: 365400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:01,551-Speed 6296.46 samples/sec Loss 5.2572 LearningRate 0.0004 Epoch: 17 Global Step: 365410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:04,796-Speed 6314.40 samples/sec Loss 5.2347 LearningRate 0.0004 Epoch: 17 Global Step: 365420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:08,039-Speed 6315.94 samples/sec Loss 5.1989 
LearningRate 0.0004 Epoch: 17 Global Step: 365430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:11,278-Speed 6324.69 samples/sec Loss 5.2458 LearningRate 0.0004 Epoch: 17 Global Step: 365440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:14,522-Speed 6314.74 samples/sec Loss 5.2130 LearningRate 0.0004 Epoch: 17 Global Step: 365450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:17,769-Speed 6308.51 samples/sec Loss 5.2912 LearningRate 0.0004 Epoch: 17 Global Step: 365460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:21,012-Speed 6314.97 samples/sec Loss 5.2370 LearningRate 0.0004 Epoch: 17 Global Step: 365470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:24,257-Speed 6313.29 samples/sec Loss 5.2874 LearningRate 0.0004 Epoch: 17 Global Step: 365480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:58:27,501-Speed 6315.77 samples/sec Loss 5.3098 LearningRate 0.0004 Epoch: 17 Global Step: 365490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:30,744-Speed 6316.17 samples/sec Loss 5.2365 LearningRate 0.0004 Epoch: 17 Global Step: 365500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:33,988-Speed 6315.21 samples/sec Loss 5.2661 LearningRate 0.0004 Epoch: 17 Global Step: 365510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:37,239-Speed 6299.59 samples/sec Loss 5.2192 LearningRate 0.0004 Epoch: 17 Global Step: 365520 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:40,494-Speed 6293.18 samples/sec Loss 5.2653 LearningRate 0.0004 Epoch: 17 Global Step: 365530 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:43,737-Speed 6317.10 samples/sec Loss 5.2704 LearningRate 0.0004 Epoch: 17 Global Step: 365540 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:46,985-Speed 6306.70 samples/sec Loss 5.2027 LearningRate 0.0004 Epoch: 17 
Global Step: 365550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:50,226-Speed 6321.04 samples/sec Loss 5.2903 LearningRate 0.0004 Epoch: 17 Global Step: 365560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:53,475-Speed 6303.25 samples/sec Loss 5.2800 LearningRate 0.0004 Epoch: 17 Global Step: 365570 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:56,718-Speed 6317.50 samples/sec Loss 5.2152 LearningRate 0.0004 Epoch: 17 Global Step: 365580 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:58:59,949-Speed 6340.94 samples/sec Loss 5.2442 LearningRate 0.0004 Epoch: 17 Global Step: 365590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:03,190-Speed 6319.54 samples/sec Loss 5.2791 LearningRate 0.0004 Epoch: 17 Global Step: 365600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:06,437-Speed 6310.09 samples/sec Loss 5.2375 LearningRate 0.0004 Epoch: 17 Global Step: 365610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:09,682-Speed 6312.77 samples/sec Loss 5.2175 LearningRate 0.0004 Epoch: 17 Global Step: 365620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:12,926-Speed 6313.74 samples/sec Loss 5.2625 LearningRate 0.0004 Epoch: 17 Global Step: 365630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:16,172-Speed 6310.08 samples/sec Loss 5.2038 LearningRate 0.0004 Epoch: 17 Global Step: 365640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:19,415-Speed 6316.71 samples/sec Loss 5.2576 LearningRate 0.0004 Epoch: 17 Global Step: 365650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:22,661-Speed 6311.16 samples/sec Loss 5.2987 LearningRate 0.0004 Epoch: 17 Global Step: 365660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:25,907-Speed 6310.84 samples/sec Loss 5.2916 LearningRate 0.0004 Epoch: 17 Global Step: 365670 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:29,152-Speed 6312.06 samples/sec Loss 5.2702 LearningRate 0.0004 Epoch: 17 Global Step: 365680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:32,396-Speed 6315.65 samples/sec Loss 5.3288 LearningRate 0.0004 Epoch: 17 Global Step: 365690 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 00:59:35,631-Speed 6332.72 samples/sec Loss 5.2793 LearningRate 0.0004 Epoch: 17 Global Step: 365700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:38,879-Speed 6305.39 samples/sec Loss 5.2637 LearningRate 0.0004 Epoch: 17 Global Step: 365710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:42,124-Speed 6313.68 samples/sec Loss 5.2546 LearningRate 0.0004 Epoch: 17 Global Step: 365720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:45,368-Speed 6313.70 samples/sec Loss 5.1790 LearningRate 0.0004 Epoch: 17 Global Step: 365730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:48,615-Speed 6308.96 samples/sec Loss 5.2449 LearningRate 0.0004 Epoch: 17 Global Step: 365740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:51,865-Speed 6303.09 samples/sec Loss 5.2605 LearningRate 0.0004 Epoch: 17 Global Step: 365750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:55,112-Speed 6309.69 samples/sec Loss 5.3509 LearningRate 0.0004 Epoch: 17 Global Step: 365760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 00:59:58,359-Speed 6307.79 samples/sec Loss 5.3343 LearningRate 0.0004 Epoch: 17 Global Step: 365770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:01,603-Speed 6314.36 samples/sec Loss 5.2260 LearningRate 0.0004 Epoch: 17 Global Step: 365780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:04,852-Speed 6305.63 samples/sec Loss 5.2432 LearningRate 0.0004 Epoch: 17 Global Step: 365790 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:00:08,093-Speed 6320.84 samples/sec Loss 5.2520 LearningRate 0.0004 Epoch: 17 Global Step: 365800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:11,339-Speed 6310.29 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 17 Global Step: 365810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:14,585-Speed 6311.39 samples/sec Loss 5.0940 LearningRate 0.0004 Epoch: 17 Global Step: 365820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:17,829-Speed 6315.46 samples/sec Loss 5.1416 LearningRate 0.0004 Epoch: 17 Global Step: 365830 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:21,084-Speed 6291.82 samples/sec Loss 5.2650 LearningRate 0.0004 Epoch: 17 Global Step: 365840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:24,331-Speed 6313.41 samples/sec Loss 5.2384 LearningRate 0.0004 Epoch: 17 Global Step: 365850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:27,577-Speed 6310.16 samples/sec Loss 5.2725 LearningRate 0.0004 Epoch: 17 Global Step: 365860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:00:30,806-Speed 6342.68 samples/sec Loss 5.2871 LearningRate 0.0004 Epoch: 17 Global Step: 365870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:34,050-Speed 6315.44 samples/sec Loss 5.2849 LearningRate 0.0004 Epoch: 17 Global Step: 365880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:37,296-Speed 6311.72 samples/sec Loss 5.2804 LearningRate 0.0004 Epoch: 17 Global Step: 365890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:40,541-Speed 6310.94 samples/sec Loss 5.3125 LearningRate 0.0004 Epoch: 17 Global Step: 365900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:43,786-Speed 6313.51 samples/sec Loss 5.3342 LearningRate 0.0004 Epoch: 17 Global Step: 365910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:00:47,029-Speed 6317.06 samples/sec Loss 5.2771 LearningRate 0.0004 Epoch: 17 Global Step: 365920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:50,276-Speed 6307.29 samples/sec Loss 5.3013 LearningRate 0.0004 Epoch: 17 Global Step: 365930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:53,532-Speed 6292.73 samples/sec Loss 5.2883 LearningRate 0.0004 Epoch: 17 Global Step: 365940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:00:56,782-Speed 6302.51 samples/sec Loss 5.2687 LearningRate 0.0004 Epoch: 17 Global Step: 365950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:00,025-Speed 6316.43 samples/sec Loss 5.3474 LearningRate 0.0004 Epoch: 17 Global Step: 365960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:03,269-Speed 6314.49 samples/sec Loss 5.2206 LearningRate 0.0004 Epoch: 17 Global Step: 365970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:06,516-Speed 6309.29 samples/sec Loss 5.2182 LearningRate 0.0004 Epoch: 17 Global Step: 365980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:09,761-Speed 6312.41 samples/sec Loss 5.2958 LearningRate 0.0004 Epoch: 17 Global Step: 365990 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:13,006-Speed 6313.67 samples/sec Loss 5.2715 LearningRate 0.0004 Epoch: 17 Global Step: 366000 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:16,253-Speed 6307.54 samples/sec Loss 5.2292 LearningRate 0.0004 Epoch: 17 Global Step: 366010 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:19,501-Speed 6308.42 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 17 Global Step: 366020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:22,746-Speed 6311.90 samples/sec Loss 5.3000 LearningRate 0.0004 Epoch: 17 Global Step: 366030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:01:25,984-Speed 6327.03 
samples/sec Loss 5.2110 LearningRate 0.0004 Epoch: 17 Global Step: 366040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:29,234-Speed 6302.13 samples/sec Loss 5.2403 LearningRate 0.0004 Epoch: 17 Global Step: 366050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:32,477-Speed 6317.02 samples/sec Loss 5.2637 LearningRate 0.0004 Epoch: 17 Global Step: 366060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:35,723-Speed 6309.95 samples/sec Loss 5.2587 LearningRate 0.0004 Epoch: 17 Global Step: 366070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:38,968-Speed 6313.12 samples/sec Loss 5.2061 LearningRate 0.0004 Epoch: 17 Global Step: 366080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:42,209-Speed 6321.27 samples/sec Loss 5.2455 LearningRate 0.0004 Epoch: 17 Global Step: 366090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:45,454-Speed 6312.60 samples/sec Loss 5.2808 LearningRate 0.0004 Epoch: 17 Global Step: 366100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:48,702-Speed 6306.12 samples/sec Loss 5.2257 LearningRate 0.0004 Epoch: 17 Global Step: 366110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:51,943-Speed 6320.02 samples/sec Loss 5.1994 LearningRate 0.0004 Epoch: 17 Global Step: 366120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:55,192-Speed 6305.17 samples/sec Loss 5.2545 LearningRate 0.0004 Epoch: 17 Global Step: 366130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:01:58,443-Speed 6300.33 samples/sec Loss 5.1836 LearningRate 0.0004 Epoch: 17 Global Step: 366140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:01,703-Speed 6284.02 samples/sec Loss 5.3140 LearningRate 0.0004 Epoch: 17 Global Step: 366150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:04,980-Speed 6251.25 samples/sec Loss 5.1891 
LearningRate 0.0004 Epoch: 17 Global Step: 366160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:08,231-Speed 6301.71 samples/sec Loss 5.2296 LearningRate 0.0004 Epoch: 17 Global Step: 366170 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:11,476-Speed 6313.38 samples/sec Loss 5.2159 LearningRate 0.0004 Epoch: 17 Global Step: 366180 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:14,721-Speed 6311.61 samples/sec Loss 5.2801 LearningRate 0.0004 Epoch: 17 Global Step: 366190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:17,955-Speed 6333.63 samples/sec Loss 5.1865 LearningRate 0.0004 Epoch: 17 Global Step: 366200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:21,200-Speed 6313.08 samples/sec Loss 5.2497 LearningRate 0.0004 Epoch: 17 Global Step: 366210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:24,446-Speed 6310.29 samples/sec Loss 5.2243 LearningRate 0.0004 Epoch: 17 Global Step: 366220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:27,693-Speed 6309.16 samples/sec Loss 5.2761 LearningRate 0.0004 Epoch: 17 Global Step: 366230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:30,938-Speed 6312.82 samples/sec Loss 5.2770 LearningRate 0.0004 Epoch: 17 Global Step: 366240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:34,182-Speed 6314.53 samples/sec Loss 5.2807 LearningRate 0.0004 Epoch: 17 Global Step: 366250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:37,425-Speed 6316.92 samples/sec Loss 5.1265 LearningRate 0.0004 Epoch: 17 Global Step: 366260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:40,674-Speed 6304.91 samples/sec Loss 5.2857 LearningRate 0.0004 Epoch: 17 Global Step: 366270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:43,917-Speed 6317.75 samples/sec Loss 5.2946 LearningRate 0.0004 Epoch: 17 
Global Step: 366280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:47,163-Speed 6311.51 samples/sec Loss 5.2416 LearningRate 0.0004 Epoch: 17 Global Step: 366290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:02:50,408-Speed 6310.81 samples/sec Loss 5.2506 LearningRate 0.0004 Epoch: 17 Global Step: 366300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:53,657-Speed 6305.09 samples/sec Loss 5.2740 LearningRate 0.0004 Epoch: 17 Global Step: 366310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:02:56,901-Speed 6316.20 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 17 Global Step: 366320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:00,148-Speed 6307.85 samples/sec Loss 5.2027 LearningRate 0.0004 Epoch: 17 Global Step: 366330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:03,400-Speed 6299.19 samples/sec Loss 5.2508 LearningRate 0.0004 Epoch: 17 Global Step: 366340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:06,648-Speed 6306.59 samples/sec Loss 5.2473 LearningRate 0.0004 Epoch: 17 Global Step: 366350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:09,893-Speed 6313.15 samples/sec Loss 5.2529 LearningRate 0.0004 Epoch: 17 Global Step: 366360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:13,127-Speed 6338.04 samples/sec Loss 5.2133 LearningRate 0.0004 Epoch: 17 Global Step: 366370 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:16,367-Speed 6321.72 samples/sec Loss 5.2610 LearningRate 0.0004 Epoch: 17 Global Step: 366380 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:19,617-Speed 6302.11 samples/sec Loss 5.2405 LearningRate 0.0004 Epoch: 17 Global Step: 366390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:22,858-Speed 6321.94 samples/sec Loss 5.2377 LearningRate 0.0004 Epoch: 17 Global Step: 366400 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:26,105-Speed 6311.58 samples/sec Loss 5.1624 LearningRate 0.0004 Epoch: 17 Global Step: 366410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:29,355-Speed 6303.14 samples/sec Loss 5.2262 LearningRate 0.0004 Epoch: 17 Global Step: 366420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:32,599-Speed 6314.09 samples/sec Loss 5.2432 LearningRate 0.0004 Epoch: 17 Global Step: 366430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:35,846-Speed 6310.51 samples/sec Loss 5.2219 LearningRate 0.0004 Epoch: 17 Global Step: 366440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:39,094-Speed 6306.02 samples/sec Loss 5.2351 LearningRate 0.0004 Epoch: 17 Global Step: 366450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:42,341-Speed 6310.34 samples/sec Loss 5.2734 LearningRate 0.0004 Epoch: 17 Global Step: 366460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:03:45,584-Speed 6316.64 samples/sec Loss 5.2708 LearningRate 0.0004 Epoch: 17 Global Step: 366470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:48,826-Speed 6317.69 samples/sec Loss 5.2699 LearningRate 0.0004 Epoch: 17 Global Step: 366480 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:52,075-Speed 6306.36 samples/sec Loss 5.2404 LearningRate 0.0004 Epoch: 17 Global Step: 366490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:55,322-Speed 6308.39 samples/sec Loss 5.2477 LearningRate 0.0004 Epoch: 17 Global Step: 366500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:03:58,566-Speed 6313.60 samples/sec Loss 5.2554 LearningRate 0.0004 Epoch: 17 Global Step: 366510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:01,815-Speed 6305.55 samples/sec Loss 5.2124 LearningRate 0.0004 Epoch: 17 Global Step: 366520 Fp16 Grad Scale: 32768 Required: 42 hours 
Training: 2022-04-02 01:04:05,058-Speed 6316.27 samples/sec Loss 5.2667 LearningRate 0.0004 Epoch: 17 Global Step: 366530 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:08,306-Speed 6306.35 samples/sec Loss 5.2249 LearningRate 0.0004 Epoch: 17 Global Step: 366540 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:11,552-Speed 6310.84 samples/sec Loss 5.2470 LearningRate 0.0004 Epoch: 17 Global Step: 366550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:14,802-Speed 6304.04 samples/sec Loss 5.1901 LearningRate 0.0004 Epoch: 17 Global Step: 366560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:18,037-Speed 6330.65 samples/sec Loss 5.2827 LearningRate 0.0004 Epoch: 17 Global Step: 366570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:21,282-Speed 6313.48 samples/sec Loss 5.2185 LearningRate 0.0004 Epoch: 17 Global Step: 366580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:24,529-Speed 6307.81 samples/sec Loss 5.1986 LearningRate 0.0004 Epoch: 17 Global Step: 366590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:27,772-Speed 6316.56 samples/sec Loss 5.2206 LearningRate 0.0004 Epoch: 17 Global Step: 366600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:31,028-Speed 6291.25 samples/sec Loss 5.2786 LearningRate 0.0004 Epoch: 17 Global Step: 366610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:34,272-Speed 6315.05 samples/sec Loss 5.2790 LearningRate 0.0004 Epoch: 17 Global Step: 366620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:37,515-Speed 6315.64 samples/sec Loss 5.2312 LearningRate 0.0004 Epoch: 17 Global Step: 366630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:40,764-Speed 6306.13 samples/sec Loss 5.2320 LearningRate 0.0004 Epoch: 17 Global Step: 366640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:04:44,013-Speed 6305.20 samples/sec Loss 5.2296 LearningRate 0.0004 Epoch: 17 Global Step: 366650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:47,265-Speed 6299.37 samples/sec Loss 5.1850 LearningRate 0.0004 Epoch: 17 Global Step: 366660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:04:50,509-Speed 6314.53 samples/sec Loss 5.2483 LearningRate 0.0004 Epoch: 17 Global Step: 366670 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:53,758-Speed 6304.45 samples/sec Loss 5.2415 LearningRate 0.0004 Epoch: 17 Global Step: 366680 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:04:57,000-Speed 6318.39 samples/sec Loss 5.3028 LearningRate 0.0004 Epoch: 17 Global Step: 366690 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:05:00,251-Speed 6301.99 samples/sec Loss 5.2908 LearningRate 0.0004 Epoch: 17 Global Step: 366700 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:05:03,490-Speed 6324.49 samples/sec Loss 5.2694 LearningRate 0.0004 Epoch: 17 Global Step: 366710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:06,736-Speed 6309.61 samples/sec Loss 5.2352 LearningRate 0.0004 Epoch: 17 Global Step: 366720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:09,984-Speed 6308.55 samples/sec Loss 5.2426 LearningRate 0.0004 Epoch: 17 Global Step: 366730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:13,232-Speed 6304.72 samples/sec Loss 5.2230 LearningRate 0.0004 Epoch: 17 Global Step: 366740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:16,479-Speed 6310.81 samples/sec Loss 5.2674 LearningRate 0.0004 Epoch: 17 Global Step: 366750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:19,725-Speed 6309.50 samples/sec Loss 5.2697 LearningRate 0.0004 Epoch: 17 Global Step: 366760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:22,983-Speed 6288.39 
samples/sec Loss 5.2261 LearningRate 0.0004 Epoch: 17 Global Step: 366770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:26,227-Speed 6313.46 samples/sec Loss 5.2329 LearningRate 0.0004 Epoch: 17 Global Step: 366780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:29,471-Speed 6315.18 samples/sec Loss 5.2541 LearningRate 0.0004 Epoch: 17 Global Step: 366790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:32,718-Speed 6307.83 samples/sec Loss 5.2801 LearningRate 0.0004 Epoch: 17 Global Step: 366800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:35,968-Speed 6303.27 samples/sec Loss 5.2444 LearningRate 0.0004 Epoch: 17 Global Step: 366810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:05:39,200-Speed 6338.80 samples/sec Loss 5.2984 LearningRate 0.0004 Epoch: 17 Global Step: 366820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:42,443-Speed 6316.45 samples/sec Loss 5.2902 LearningRate 0.0004 Epoch: 17 Global Step: 366830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:45,687-Speed 6314.04 samples/sec Loss 5.1598 LearningRate 0.0004 Epoch: 17 Global Step: 366840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:48,932-Speed 6312.50 samples/sec Loss 5.1980 LearningRate 0.0004 Epoch: 17 Global Step: 366850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:52,178-Speed 6311.33 samples/sec Loss 5.2487 LearningRate 0.0004 Epoch: 17 Global Step: 366860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:55,424-Speed 6311.41 samples/sec Loss 5.2329 LearningRate 0.0004 Epoch: 17 Global Step: 366870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:05:58,671-Speed 6308.08 samples/sec Loss 5.2393 LearningRate 0.0004 Epoch: 17 Global Step: 366880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:01,915-Speed 6315.25 samples/sec Loss 5.2567 
LearningRate 0.0004 Epoch: 17 Global Step: 366890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:05,173-Speed 6287.72 samples/sec Loss 5.2641 LearningRate 0.0004 Epoch: 17 Global Step: 366900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:08,418-Speed 6313.88 samples/sec Loss 5.2361 LearningRate 0.0004 Epoch: 17 Global Step: 366910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:11,662-Speed 6314.14 samples/sec Loss 5.2205 LearningRate 0.0004 Epoch: 17 Global Step: 366920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:14,908-Speed 6314.19 samples/sec Loss 5.2096 LearningRate 0.0004 Epoch: 17 Global Step: 366930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:18,158-Speed 6303.63 samples/sec Loss 5.1456 LearningRate 0.0004 Epoch: 17 Global Step: 366940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:21,406-Speed 6305.93 samples/sec Loss 5.2708 LearningRate 0.0004 Epoch: 17 Global Step: 366950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:24,650-Speed 6314.14 samples/sec Loss 5.1898 LearningRate 0.0004 Epoch: 17 Global Step: 366960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:27,901-Speed 6300.41 samples/sec Loss 5.2426 LearningRate 0.0004 Epoch: 17 Global Step: 366970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:31,161-Speed 6283.48 samples/sec Loss 5.2943 LearningRate 0.0004 Epoch: 17 Global Step: 366980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:06:34,389-Speed 6346.21 samples/sec Loss 5.2321 LearningRate 0.0004 Epoch: 17 Global Step: 366990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:37,640-Speed 6301.32 samples/sec Loss 5.2382 LearningRate 0.0004 Epoch: 17 Global Step: 367000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:40,884-Speed 6313.84 samples/sec Loss 5.1988 LearningRate 0.0004 Epoch: 17 
Global Step: 367010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:44,133-Speed 6305.97 samples/sec Loss 5.2375 LearningRate 0.0004 Epoch: 17 Global Step: 367020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:47,382-Speed 6304.19 samples/sec Loss 5.1629 LearningRate 0.0004 Epoch: 17 Global Step: 367030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:50,624-Speed 6318.98 samples/sec Loss 5.2322 LearningRate 0.0004 Epoch: 17 Global Step: 367040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:53,869-Speed 6313.20 samples/sec Loss 5.3232 LearningRate 0.0004 Epoch: 17 Global Step: 367050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:06:57,110-Speed 6320.21 samples/sec Loss 5.2294 LearningRate 0.0004 Epoch: 17 Global Step: 367060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:00,357-Speed 6308.55 samples/sec Loss 5.3159 LearningRate 0.0004 Epoch: 17 Global Step: 367070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:03,600-Speed 6316.42 samples/sec Loss 5.2323 LearningRate 0.0004 Epoch: 17 Global Step: 367080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:06,847-Speed 6309.13 samples/sec Loss 5.2557 LearningRate 0.0004 Epoch: 17 Global Step: 367090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:10,088-Speed 6320.10 samples/sec Loss 5.3080 LearningRate 0.0004 Epoch: 17 Global Step: 367100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:13,338-Speed 6304.01 samples/sec Loss 5.1038 LearningRate 0.0004 Epoch: 17 Global Step: 367110 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:21,333-Speed 2561.89 samples/sec Loss 5.2763 LearningRate 0.0004 Epoch: 17 Global Step: 367120 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:24,579-Speed 6310.86 samples/sec Loss 5.1885 LearningRate 0.0004 Epoch: 17 Global Step: 367130 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:27,822-Speed 6316.21 samples/sec Loss 5.2421 LearningRate 0.0004 Epoch: 17 Global Step: 367140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:31,066-Speed 6313.46 samples/sec Loss 5.1552 LearningRate 0.0004 Epoch: 17 Global Step: 367150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:34,312-Speed 6311.52 samples/sec Loss 5.2168 LearningRate 0.0004 Epoch: 17 Global Step: 367160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:37,565-Speed 6297.41 samples/sec Loss 5.2184 LearningRate 0.0004 Epoch: 17 Global Step: 367170 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:07:40,796-Speed 6339.77 samples/sec Loss 5.2369 LearningRate 0.0004 Epoch: 17 Global Step: 367180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:44,040-Speed 6314.30 samples/sec Loss 5.2897 LearningRate 0.0004 Epoch: 17 Global Step: 367190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:47,284-Speed 6315.56 samples/sec Loss 5.2019 LearningRate 0.0004 Epoch: 17 Global Step: 367200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:50,535-Speed 6300.33 samples/sec Loss 5.2259 LearningRate 0.0004 Epoch: 17 Global Step: 367210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:53,783-Speed 6306.23 samples/sec Loss 5.2352 LearningRate 0.0004 Epoch: 17 Global Step: 367220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:07:57,027-Speed 6315.32 samples/sec Loss 5.2605 LearningRate 0.0004 Epoch: 17 Global Step: 367230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:08:00,275-Speed 6307.37 samples/sec Loss 5.2820 LearningRate 0.0004 Epoch: 17 Global Step: 367240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:08:03,523-Speed 6306.13 samples/sec Loss 5.2364 LearningRate 0.0004 Epoch: 17 Global Step: 367250 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:08:06,765-Speed 6318.92 samples/sec Loss 5.2446 LearningRate 0.0004 Epoch: 17 Global Step: 367260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:08:10,023-Speed 6287.95 samples/sec Loss 5.1899 LearningRate 0.0004 Epoch: 17 Global Step: 367270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:08:13,272-Speed 6304.31 samples/sec Loss 5.1728 LearningRate 0.0004 Epoch: 17 Global Step: 367280 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:16,517-Speed 6313.19 samples/sec Loss 5.1700 LearningRate 0.0004 Epoch: 17 Global Step: 367290 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:19,803-Speed 6234.31 samples/sec Loss 5.3139 LearningRate 0.0004 Epoch: 17 Global Step: 367300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:23,049-Speed 6311.62 samples/sec Loss 5.2402 LearningRate 0.0004 Epoch: 17 Global Step: 367310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:26,294-Speed 6311.98 samples/sec Loss 5.2273 LearningRate 0.0004 Epoch: 17 Global Step: 367320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:29,537-Speed 6315.26 samples/sec Loss 5.2870 LearningRate 0.0004 Epoch: 17 Global Step: 367330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:32,781-Speed 6315.07 samples/sec Loss 5.2679 LearningRate 0.0004 Epoch: 17 Global Step: 367340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:36,028-Speed 6309.77 samples/sec Loss 5.2642 LearningRate 0.0004 Epoch: 17 Global Step: 367350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:39,275-Speed 6308.20 samples/sec Loss 5.2770 LearningRate 0.0004 Epoch: 17 Global Step: 367360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:42,521-Speed 6310.72 samples/sec Loss 5.2842 LearningRate 0.0004 Epoch: 17 Global Step: 367370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
01:08:45,758-Speed 6328.09 samples/sec Loss 5.2591 LearningRate 0.0004 Epoch: 17 Global Step: 367380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:49,002-Speed 6314.55 samples/sec Loss 5.1458 LearningRate 0.0004 Epoch: 17 Global Step: 367390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:52,249-Speed 6309.54 samples/sec Loss 5.2569 LearningRate 0.0004 Epoch: 17 Global Step: 367400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:55,497-Speed 6306.67 samples/sec Loss 5.3010 LearningRate 0.0004 Epoch: 17 Global Step: 367410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:08:58,741-Speed 6314.63 samples/sec Loss 5.2162 LearningRate 0.0004 Epoch: 17 Global Step: 367420 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:02,011-Speed 6263.85 samples/sec Loss 5.2342 LearningRate 0.0004 Epoch: 17 Global Step: 367430 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:05,372-Speed 6094.31 samples/sec Loss 5.2412 LearningRate 0.0004 Epoch: 17 Global Step: 367440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:08,644-Speed 6260.57 samples/sec Loss 5.2272 LearningRate 0.0004 Epoch: 17 Global Step: 367450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:11,888-Speed 6314.35 samples/sec Loss 5.3285 LearningRate 0.0004 Epoch: 17 Global Step: 367460 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:15,143-Speed 6293.27 samples/sec Loss 5.2624 LearningRate 0.0004 Epoch: 17 Global Step: 367470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:18,374-Speed 6340.51 samples/sec Loss 5.2279 LearningRate 0.0004 Epoch: 17 Global Step: 367480 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:21,608-Speed 6333.54 samples/sec Loss 5.2972 LearningRate 0.0004 Epoch: 17 Global Step: 367490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:24,856-Speed 6308.89 
samples/sec Loss 5.2763 LearningRate 0.0004 Epoch: 17 Global Step: 367500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:28,100-Speed 6313.43 samples/sec Loss 5.2320 LearningRate 0.0004 Epoch: 17 Global Step: 367510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:31,346-Speed 6310.79 samples/sec Loss 5.1731 LearningRate 0.0004 Epoch: 17 Global Step: 367520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:34,588-Speed 6319.33 samples/sec Loss 5.2960 LearningRate 0.0004 Epoch: 17 Global Step: 367530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:37,832-Speed 6313.77 samples/sec Loss 5.2849 LearningRate 0.0004 Epoch: 17 Global Step: 367540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:41,079-Speed 6309.99 samples/sec Loss 5.2684 LearningRate 0.0004 Epoch: 17 Global Step: 367550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:44,320-Speed 6320.08 samples/sec Loss 5.2535 LearningRate 0.0004 Epoch: 17 Global Step: 367560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:47,596-Speed 6252.73 samples/sec Loss 5.2095 LearningRate 0.0004 Epoch: 17 Global Step: 367570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:50,861-Speed 6272.90 samples/sec Loss 5.3351 LearningRate 0.0004 Epoch: 17 Global Step: 367580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:09:54,125-Speed 6276.29 samples/sec Loss 5.2184 LearningRate 0.0004 Epoch: 17 Global Step: 367590 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:09:57,371-Speed 6310.98 samples/sec Loss 5.2335 LearningRate 0.0004 Epoch: 17 Global Step: 367600 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:00,619-Speed 6305.90 samples/sec Loss 5.1888 LearningRate 0.0004 Epoch: 17 Global Step: 367610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:03,873-Speed 6296.27 samples/sec Loss 5.2661 
LearningRate 0.0004 Epoch: 17 Global Step: 367620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:07,120-Speed 6309.27 samples/sec Loss 5.2666 LearningRate 0.0004 Epoch: 17 Global Step: 367630 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:10,367-Speed 6308.02 samples/sec Loss 5.2834 LearningRate 0.0004 Epoch: 17 Global Step: 367640 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:13,628-Speed 6280.98 samples/sec Loss 5.2332 LearningRate 0.0004 Epoch: 17 Global Step: 367650 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:16,873-Speed 6313.79 samples/sec Loss 5.2864 LearningRate 0.0004 Epoch: 17 Global Step: 367660 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:20,106-Speed 6336.23 samples/sec Loss 5.3085 LearningRate 0.0004 Epoch: 17 Global Step: 367670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:23,348-Speed 6318.18 samples/sec Loss 5.2526 LearningRate 0.0004 Epoch: 17 Global Step: 367680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:26,595-Speed 6307.78 samples/sec Loss 5.2527 LearningRate 0.0004 Epoch: 17 Global Step: 367690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:29,840-Speed 6314.02 samples/sec Loss 5.2905 LearningRate 0.0004 Epoch: 17 Global Step: 367700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:33,086-Speed 6310.10 samples/sec Loss 5.2096 LearningRate 0.0004 Epoch: 17 Global Step: 367710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:36,330-Speed 6314.94 samples/sec Loss 5.2332 LearningRate 0.0004 Epoch: 17 Global Step: 367720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:39,570-Speed 6322.39 samples/sec Loss 5.2288 LearningRate 0.0004 Epoch: 17 Global Step: 367730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:42,815-Speed 6312.60 samples/sec Loss 5.1951 LearningRate 0.0004 Epoch: 17 
Global Step: 367740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:46,114-Speed 6210.13 samples/sec Loss 5.2491 LearningRate 0.0004 Epoch: 17 Global Step: 367750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:49,402-Speed 6230.73 samples/sec Loss 5.2153 LearningRate 0.0004 Epoch: 17 Global Step: 367760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:10:52,706-Speed 6200.43 samples/sec Loss 5.2444 LearningRate 0.0004 Epoch: 17 Global Step: 367770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:55,949-Speed 6314.90 samples/sec Loss 5.2280 LearningRate 0.0004 Epoch: 17 Global Step: 367780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:10:59,200-Speed 6301.61 samples/sec Loss 5.2308 LearningRate 0.0004 Epoch: 17 Global Step: 367790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:02,447-Speed 6308.19 samples/sec Loss 5.2030 LearningRate 0.0004 Epoch: 17 Global Step: 367800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:05,691-Speed 6314.48 samples/sec Loss 5.2609 LearningRate 0.0004 Epoch: 17 Global Step: 367810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:08,922-Speed 6341.25 samples/sec Loss 5.1777 LearningRate 0.0004 Epoch: 17 Global Step: 367820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:12,170-Speed 6306.99 samples/sec Loss 5.3139 LearningRate 0.0004 Epoch: 17 Global Step: 367830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:15,414-Speed 6313.83 samples/sec Loss 5.2246 LearningRate 0.0004 Epoch: 17 Global Step: 367840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:18,712-Speed 6210.82 samples/sec Loss 5.2063 LearningRate 0.0004 Epoch: 17 Global Step: 367850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:21,960-Speed 6308.02 samples/sec Loss 5.2525 LearningRate 0.0004 Epoch: 17 Global Step: 367860 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:25,204-Speed 6314.74 samples/sec Loss 5.1709 LearningRate 0.0004 Epoch: 17 Global Step: 367870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:28,543-Speed 6133.87 samples/sec Loss 5.2471 LearningRate 0.0004 Epoch: 17 Global Step: 367880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:31,787-Speed 6313.57 samples/sec Loss 5.2063 LearningRate 0.0004 Epoch: 17 Global Step: 367890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:35,033-Speed 6311.47 samples/sec Loss 5.1924 LearningRate 0.0004 Epoch: 17 Global Step: 367900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:38,278-Speed 6313.79 samples/sec Loss 5.2143 LearningRate 0.0004 Epoch: 17 Global Step: 367910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:11:41,526-Speed 6306.34 samples/sec Loss 5.1725 LearningRate 0.0004 Epoch: 17 Global Step: 367920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:44,781-Speed 6292.61 samples/sec Loss 5.2324 LearningRate 0.0004 Epoch: 17 Global Step: 367930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:48,060-Speed 6247.38 samples/sec Loss 5.2310 LearningRate 0.0004 Epoch: 17 Global Step: 367940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:51,308-Speed 6307.49 samples/sec Loss 5.3239 LearningRate 0.0004 Epoch: 17 Global Step: 367950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:54,563-Speed 6293.47 samples/sec Loss 5.2421 LearningRate 0.0004 Epoch: 17 Global Step: 367960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:11:57,808-Speed 6313.65 samples/sec Loss 5.2312 LearningRate 0.0004 Epoch: 17 Global Step: 367970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:12:01,042-Speed 6333.94 samples/sec Loss 5.2806 LearningRate 0.0004 Epoch: 17 Global Step: 367980 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:12:04,295-Speed 6296.18 samples/sec Loss 5.2146 LearningRate 0.0004 Epoch: 17 Global Step: 367990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:07,540-Speed 6311.91 samples/sec Loss 5.2941 LearningRate 0.0004 Epoch: 17 Global Step: 368000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:10,787-Speed 6309.09 samples/sec Loss 5.2533 LearningRate 0.0004 Epoch: 17 Global Step: 368010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:14,031-Speed 6316.10 samples/sec Loss 5.2680 LearningRate 0.0004 Epoch: 17 Global Step: 368020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:17,275-Speed 6312.95 samples/sec Loss 5.2285 LearningRate 0.0004 Epoch: 17 Global Step: 368030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:20,517-Speed 6319.07 samples/sec Loss 5.1991 LearningRate 0.0004 Epoch: 17 Global Step: 368040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:23,773-Speed 6291.78 samples/sec Loss 5.1646 LearningRate 0.0004 Epoch: 17 Global Step: 368050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:27,017-Speed 6315.27 samples/sec Loss 5.2149 LearningRate 0.0004 Epoch: 17 Global Step: 368060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:30,263-Speed 6310.26 samples/sec Loss 5.2041 LearningRate 0.0004 Epoch: 17 Global Step: 368070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:33,506-Speed 6315.32 samples/sec Loss 5.2380 LearningRate 0.0004 Epoch: 17 Global Step: 368080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:12:36,754-Speed 6307.88 samples/sec Loss 5.2177 LearningRate 0.0004 Epoch: 17 Global Step: 368090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:12:39,990-Speed 6330.59 samples/sec Loss 5.2406 LearningRate 0.0004 Epoch: 17 Global Step: 368100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:12:43,235-Speed 6311.59 samples/sec Loss 5.2678 LearningRate 0.0004 Epoch: 17 Global Step: 368110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:46,484-Speed 6304.60 samples/sec Loss 5.2196 LearningRate 0.0004 Epoch: 17 Global Step: 368120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:49,729-Speed 6311.89 samples/sec Loss 5.2491 LearningRate 0.0004 Epoch: 17 Global Step: 368130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:52,972-Speed 6318.19 samples/sec Loss 5.2899 LearningRate 0.0004 Epoch: 17 Global Step: 368140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:56,219-Speed 6308.34 samples/sec Loss 5.2603 LearningRate 0.0004 Epoch: 17 Global Step: 368150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:12:59,463-Speed 6315.09 samples/sec Loss 5.2415 LearningRate 0.0004 Epoch: 17 Global Step: 368160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:02,713-Speed 6303.03 samples/sec Loss 5.1852 LearningRate 0.0004 Epoch: 17 Global Step: 368170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:05,963-Speed 6303.33 samples/sec Loss 5.2137 LearningRate 0.0004 Epoch: 17 Global Step: 368180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:09,208-Speed 6312.97 samples/sec Loss 5.2466 LearningRate 0.0004 Epoch: 17 Global Step: 368190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:12,452-Speed 6313.63 samples/sec Loss 5.1985 LearningRate 0.0004 Epoch: 17 Global Step: 368200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:13:15,730-Speed 6249.03 samples/sec Loss 5.2114 LearningRate 0.0004 Epoch: 17 Global Step: 368210 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:13:19,028-Speed 6211.30 samples/sec Loss 5.2410 LearningRate 0.0004 Epoch: 17 Global Step: 368220 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:13:22,305-Speed 6251.55 
samples/sec Loss 5.2047 LearningRate 0.0004 Epoch: 17 Global Step: 368230 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:13:25,536-Speed 6339.52 samples/sec Loss 5.2889 LearningRate 0.0004 Epoch: 17 Global Step: 368240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:28,778-Speed 6319.03 samples/sec Loss 5.2511 LearningRate 0.0004 Epoch: 17 Global Step: 368250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:32,023-Speed 6312.68 samples/sec Loss 5.1789 LearningRate 0.0004 Epoch: 17 Global Step: 368260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:35,267-Speed 6314.53 samples/sec Loss 5.1998 LearningRate 0.0004 Epoch: 17 Global Step: 368270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:38,510-Speed 6316.00 samples/sec Loss 5.2845 LearningRate 0.0004 Epoch: 17 Global Step: 368280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:41,761-Speed 6301.04 samples/sec Loss 5.2618 LearningRate 0.0004 Epoch: 17 Global Step: 368290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:45,006-Speed 6312.03 samples/sec Loss 5.2036 LearningRate 0.0004 Epoch: 17 Global Step: 368300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:48,253-Speed 6308.90 samples/sec Loss 5.1867 LearningRate 0.0004 Epoch: 17 Global Step: 368310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:51,500-Speed 6310.07 samples/sec Loss 5.2455 LearningRate 0.0004 Epoch: 17 Global Step: 368320 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:54,750-Speed 6302.61 samples/sec Loss 5.1436 LearningRate 0.0004 Epoch: 17 Global Step: 368330 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:13:57,995-Speed 6312.78 samples/sec Loss 5.2492 LearningRate 0.0004 Epoch: 17 Global Step: 368340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:01,243-Speed 6306.67 samples/sec Loss 5.2317 
LearningRate 0.0004 Epoch: 17 Global Step: 368350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:04,494-Speed 6301.54 samples/sec Loss 5.1931 LearningRate 0.0004 Epoch: 17 Global Step: 368360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:07,745-Speed 6300.94 samples/sec Loss 5.2424 LearningRate 0.0004 Epoch: 17 Global Step: 368370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:10,988-Speed 6316.71 samples/sec Loss 5.2469 LearningRate 0.0004 Epoch: 17 Global Step: 368380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:14,248-Speed 6282.58 samples/sec Loss 5.3063 LearningRate 0.0004 Epoch: 17 Global Step: 368390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:17,494-Speed 6311.26 samples/sec Loss 5.3141 LearningRate 0.0004 Epoch: 17 Global Step: 368400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:20,740-Speed 6310.45 samples/sec Loss 5.2336 LearningRate 0.0004 Epoch: 17 Global Step: 368410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:23,986-Speed 6311.40 samples/sec Loss 5.2212 LearningRate 0.0004 Epoch: 17 Global Step: 368420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:27,231-Speed 6312.74 samples/sec Loss 5.2437 LearningRate 0.0004 Epoch: 17 Global Step: 368430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:30,475-Speed 6314.25 samples/sec Loss 5.2047 LearningRate 0.0004 Epoch: 17 Global Step: 368440 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:33,720-Speed 6311.42 samples/sec Loss 5.2426 LearningRate 0.0004 Epoch: 17 Global Step: 368450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:36,971-Speed 6302.13 samples/sec Loss 5.2198 LearningRate 0.0004 Epoch: 17 Global Step: 368460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:40,220-Speed 6304.89 samples/sec Loss 5.2101 LearningRate 0.0004 Epoch: 17 
Global Step: 368470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:43,471-Speed 6300.33 samples/sec Loss 5.2281 LearningRate 0.0004 Epoch: 17 Global Step: 368480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:46,720-Speed 6306.03 samples/sec Loss 5.2927 LearningRate 0.0004 Epoch: 17 Global Step: 368490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:14:49,961-Speed 6319.89 samples/sec Loss 5.2369 LearningRate 0.0004 Epoch: 17 Global Step: 368500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:53,209-Speed 6306.41 samples/sec Loss 5.1839 LearningRate 0.0004 Epoch: 17 Global Step: 368510 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:56,463-Speed 6295.21 samples/sec Loss 5.2258 LearningRate 0.0004 Epoch: 17 Global Step: 368520 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:14:59,713-Speed 6303.76 samples/sec Loss 5.2958 LearningRate 0.0004 Epoch: 17 Global Step: 368530 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:02,965-Speed 6298.49 samples/sec Loss 5.1757 LearningRate 0.0004 Epoch: 17 Global Step: 368540 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:06,216-Speed 6301.47 samples/sec Loss 5.1765 LearningRate 0.0004 Epoch: 17 Global Step: 368550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:09,463-Speed 6307.44 samples/sec Loss 5.2802 LearningRate 0.0004 Epoch: 17 Global Step: 368560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:12,711-Speed 6306.97 samples/sec Loss 5.2052 LearningRate 0.0004 Epoch: 17 Global Step: 368570 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:15,960-Speed 6306.90 samples/sec Loss 5.2246 LearningRate 0.0004 Epoch: 17 Global Step: 368580 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:19,191-Speed 6339.81 samples/sec Loss 5.1249 LearningRate 0.0004 Epoch: 17 Global Step: 368590 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:22,442-Speed 6300.26 samples/sec Loss 5.2270 LearningRate 0.0004 Epoch: 17 Global Step: 368600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:25,688-Speed 6311.30 samples/sec Loss 5.1707 LearningRate 0.0004 Epoch: 17 Global Step: 368610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:28,933-Speed 6313.18 samples/sec Loss 5.2887 LearningRate 0.0004 Epoch: 17 Global Step: 368620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:32,181-Speed 6307.17 samples/sec Loss 5.2671 LearningRate 0.0004 Epoch: 17 Global Step: 368630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:35,425-Speed 6313.27 samples/sec Loss 5.2589 LearningRate 0.0004 Epoch: 17 Global Step: 368640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:38,671-Speed 6310.57 samples/sec Loss 5.2379 LearningRate 0.0004 Epoch: 17 Global Step: 368650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:41,917-Speed 6311.58 samples/sec Loss 5.1994 LearningRate 0.0004 Epoch: 17 Global Step: 368660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:45,164-Speed 6309.03 samples/sec Loss 5.1910 LearningRate 0.0004 Epoch: 17 Global Step: 368670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:48,409-Speed 6311.33 samples/sec Loss 5.1904 LearningRate 0.0004 Epoch: 17 Global Step: 368680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:15:51,654-Speed 6313.96 samples/sec Loss 5.2371 LearningRate 0.0004 Epoch: 17 Global Step: 368690 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:54,900-Speed 6310.65 samples/sec Loss 5.2031 LearningRate 0.0004 Epoch: 17 Global Step: 368700 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:15:58,127-Speed 6346.87 samples/sec Loss 5.3195 LearningRate 0.0004 Epoch: 17 Global Step: 368710 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:16:01,378-Speed 6302.57 samples/sec Loss 5.2661 LearningRate 0.0004 Epoch: 17 Global Step: 368720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:04,624-Speed 6310.05 samples/sec Loss 5.2034 LearningRate 0.0004 Epoch: 17 Global Step: 368730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:07,867-Speed 6315.86 samples/sec Loss 5.2526 LearningRate 0.0004 Epoch: 17 Global Step: 368740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:11,111-Speed 6315.47 samples/sec Loss 5.2270 LearningRate 0.0004 Epoch: 17 Global Step: 368750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:14,365-Speed 6295.29 samples/sec Loss 5.2474 LearningRate 0.0004 Epoch: 17 Global Step: 368760 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:17,613-Speed 6306.54 samples/sec Loss 5.2474 LearningRate 0.0004 Epoch: 17 Global Step: 368770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:20,861-Speed 6307.52 samples/sec Loss 5.3012 LearningRate 0.0004 Epoch: 17 Global Step: 368780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:24,106-Speed 6311.30 samples/sec Loss 5.2845 LearningRate 0.0004 Epoch: 17 Global Step: 368790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:27,384-Speed 6250.16 samples/sec Loss 5.2326 LearningRate 0.0004 Epoch: 17 Global Step: 368800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:30,629-Speed 6313.16 samples/sec Loss 5.2167 LearningRate 0.0004 Epoch: 17 Global Step: 368810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:16:33,874-Speed 6313.38 samples/sec Loss 5.2158 LearningRate 0.0004 Epoch: 17 Global Step: 368820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:16:37,108-Speed 6333.01 samples/sec Loss 5.2018 LearningRate 0.0004 Epoch: 17 Global Step: 368830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:16:40,364-Speed 6291.11 samples/sec Loss 5.2715 LearningRate 0.0004 Epoch: 17 Global Step: 368840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:43,607-Speed 6316.01 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 17 Global Step: 368850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:46,855-Speed 6308.13 samples/sec Loss 5.1952 LearningRate 0.0004 Epoch: 17 Global Step: 368860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:50,148-Speed 6220.71 samples/sec Loss 5.2561 LearningRate 0.0004 Epoch: 17 Global Step: 368870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:53,441-Speed 6219.95 samples/sec Loss 5.2356 LearningRate 0.0004 Epoch: 17 Global Step: 368880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:56,692-Speed 6300.75 samples/sec Loss 5.2154 LearningRate 0.0004 Epoch: 17 Global Step: 368890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:16:59,938-Speed 6312.23 samples/sec Loss 5.2911 LearningRate 0.0004 Epoch: 17 Global Step: 368900 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:03,227-Speed 6227.19 samples/sec Loss 5.3066 LearningRate 0.0004 Epoch: 17 Global Step: 368910 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:06,469-Speed 6317.91 samples/sec Loss 5.3063 LearningRate 0.0004 Epoch: 17 Global Step: 368920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:09,715-Speed 6311.87 samples/sec Loss 5.2463 LearningRate 0.0004 Epoch: 17 Global Step: 368930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:17:12,953-Speed 6325.53 samples/sec Loss 5.2039 LearningRate 0.0004 Epoch: 17 Global Step: 368940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:16,196-Speed 6317.79 samples/sec Loss 5.2118 LearningRate 0.0004 Epoch: 17 Global Step: 368950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:19,440-Speed 6313.86 
samples/sec Loss 5.2101 LearningRate 0.0004 Epoch: 17 Global Step: 368960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:22,691-Speed 6301.21 samples/sec Loss 5.2725 LearningRate 0.0004 Epoch: 17 Global Step: 368970 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:25,935-Speed 6314.55 samples/sec Loss 5.2551 LearningRate 0.0004 Epoch: 17 Global Step: 368980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:29,179-Speed 6314.08 samples/sec Loss 5.2673 LearningRate 0.0004 Epoch: 17 Global Step: 368990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:32,429-Speed 6303.78 samples/sec Loss 5.2607 LearningRate 0.0004 Epoch: 17 Global Step: 369000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:35,686-Speed 6289.86 samples/sec Loss 5.2838 LearningRate 0.0004 Epoch: 17 Global Step: 369010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:38,932-Speed 6311.76 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 17 Global Step: 369020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:42,174-Speed 6317.94 samples/sec Loss 5.2902 LearningRate 0.0004 Epoch: 17 Global Step: 369030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:17:45,419-Speed 6312.33 samples/sec Loss 5.2049 LearningRate 0.0004 Epoch: 17 Global Step: 369040 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:17:48,665-Speed 6310.73 samples/sec Loss 5.1974 LearningRate 0.0004 Epoch: 17 Global Step: 369050 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:17:51,914-Speed 6305.77 samples/sec Loss 5.1693 LearningRate 0.0004 Epoch: 17 Global Step: 369060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:17:55,165-Speed 6300.24 samples/sec Loss 5.1926 LearningRate 0.0004 Epoch: 17 Global Step: 369070 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:17:58,434-Speed 6265.59 samples/sec Loss 5.2383 
LearningRate 0.0004 Epoch: 17 Global Step: 369080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:18:01,665-Speed 6340.45 samples/sec Loss 5.3495 LearningRate 0.0004 Epoch: 17 Global Step: 369090 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:04,912-Speed 6309.52 samples/sec Loss 5.2854 LearningRate 0.0004 Epoch: 17 Global Step: 369100 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:08,156-Speed 6313.78 samples/sec Loss 5.2657 LearningRate 0.0004 Epoch: 17 Global Step: 369110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:11,396-Speed 6321.62 samples/sec Loss 5.2300 LearningRate 0.0004 Epoch: 17 Global Step: 369120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:14,638-Speed 6320.06 samples/sec Loss 5.2062 LearningRate 0.0004 Epoch: 17 Global Step: 369130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:17,881-Speed 6316.13 samples/sec Loss 5.1759 LearningRate 0.0004 Epoch: 17 Global Step: 369140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:21,131-Speed 6302.82 samples/sec Loss 5.1451 LearningRate 0.0004 Epoch: 17 Global Step: 369150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:24,378-Speed 6308.47 samples/sec Loss 5.2818 LearningRate 0.0004 Epoch: 17 Global Step: 369160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:27,619-Speed 6319.54 samples/sec Loss 5.2851 LearningRate 0.0004 Epoch: 17 Global Step: 369170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:30,865-Speed 6311.40 samples/sec Loss 5.2053 LearningRate 0.0004 Epoch: 17 Global Step: 369180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:34,112-Speed 6308.87 samples/sec Loss 5.2441 LearningRate 0.0004 Epoch: 17 Global Step: 369190 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:18:37,354-Speed 6318.07 samples/sec Loss 5.1599 LearningRate 0.0004 Epoch: 17 
Global Step: 369200 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:18:40,601-Speed 6309.53 samples/sec Loss 5.2184 LearningRate 0.0004 Epoch: 17 Global Step: 369210 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:18:43,828-Speed 6347.95 samples/sec Loss 5.1838 LearningRate 0.0004 Epoch: 17 Global Step: 369220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:47,072-Speed 6315.52 samples/sec Loss 5.1868 LearningRate 0.0004 Epoch: 17 Global Step: 369230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:50,313-Speed 6319.64 samples/sec Loss 5.1724 LearningRate 0.0004 Epoch: 17 Global Step: 369240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:53,612-Speed 6209.38 samples/sec Loss 5.2349 LearningRate 0.0004 Epoch: 17 Global Step: 369250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:18:56,857-Speed 6312.85 samples/sec Loss 5.1715 LearningRate 0.0004 Epoch: 17 Global Step: 369260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:00,105-Speed 6307.16 samples/sec Loss 5.2570 LearningRate 0.0004 Epoch: 17 Global Step: 369270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:03,352-Speed 6308.08 samples/sec Loss 5.2190 LearningRate 0.0004 Epoch: 17 Global Step: 369280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:06,599-Speed 6309.72 samples/sec Loss 5.2492 LearningRate 0.0004 Epoch: 17 Global Step: 369290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:09,844-Speed 6313.32 samples/sec Loss 5.2094 LearningRate 0.0004 Epoch: 17 Global Step: 369300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:13,087-Speed 6315.79 samples/sec Loss 5.1996 LearningRate 0.0004 Epoch: 17 Global Step: 369310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:16,333-Speed 6309.70 samples/sec Loss 5.2238 LearningRate 0.0004 Epoch: 17 Global Step: 369320 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:19,587-Speed 6295.35 samples/sec Loss 5.2280 LearningRate 0.0004 Epoch: 17 Global Step: 369330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:22,834-Speed 6309.47 samples/sec Loss 5.2061 LearningRate 0.0004 Epoch: 17 Global Step: 369340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:26,081-Speed 6308.89 samples/sec Loss 5.1818 LearningRate 0.0004 Epoch: 17 Global Step: 369350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:29,329-Speed 6306.15 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 17 Global Step: 369360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:32,578-Speed 6305.93 samples/sec Loss 5.1715 LearningRate 0.0004 Epoch: 17 Global Step: 369370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:35,823-Speed 6312.88 samples/sec Loss 5.2654 LearningRate 0.0004 Epoch: 17 Global Step: 369380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:39,069-Speed 6310.97 samples/sec Loss 5.2350 LearningRate 0.0004 Epoch: 17 Global Step: 369390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:42,316-Speed 6308.84 samples/sec Loss 5.2390 LearningRate 0.0004 Epoch: 17 Global Step: 369400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:45,559-Speed 6315.29 samples/sec Loss 5.2931 LearningRate 0.0004 Epoch: 17 Global Step: 369410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:19:48,776-Speed 6366.84 samples/sec Loss 5.2350 LearningRate 0.0004 Epoch: 17 Global Step: 369420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:52,023-Speed 6309.03 samples/sec Loss 5.2472 LearningRate 0.0004 Epoch: 17 Global Step: 369430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:19:55,266-Speed 6317.13 samples/sec Loss 5.2560 LearningRate 0.0004 Epoch: 17 Global Step: 369440 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:19:58,511-Speed 6314.43 samples/sec Loss 5.1806 LearningRate 0.0004 Epoch: 17 Global Step: 369450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:01,760-Speed 6303.45 samples/sec Loss 5.2373 LearningRate 0.0004 Epoch: 17 Global Step: 369460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:05,007-Speed 6310.19 samples/sec Loss 5.1959 LearningRate 0.0004 Epoch: 17 Global Step: 369470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:08,251-Speed 6313.45 samples/sec Loss 5.2464 LearningRate 0.0004 Epoch: 17 Global Step: 369480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:11,499-Speed 6307.21 samples/sec Loss 5.2228 LearningRate 0.0004 Epoch: 17 Global Step: 369490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:14,750-Speed 6301.42 samples/sec Loss 5.2654 LearningRate 0.0004 Epoch: 17 Global Step: 369500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:18,029-Speed 6247.84 samples/sec Loss 5.1673 LearningRate 0.0004 Epoch: 17 Global Step: 369510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:21,275-Speed 6309.51 samples/sec Loss 5.1120 LearningRate 0.0004 Epoch: 17 Global Step: 369520 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:20:24,524-Speed 6304.76 samples/sec Loss 5.2225 LearningRate 0.0004 Epoch: 17 Global Step: 369530 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:20:27,775-Speed 6302.16 samples/sec Loss 5.1922 LearningRate 0.0004 Epoch: 17 Global Step: 369540 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:20:31,026-Speed 6301.21 samples/sec Loss 5.2111 LearningRate 0.0004 Epoch: 17 Global Step: 369550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:20:34,269-Speed 6315.65 samples/sec Loss 5.1673 LearningRate 0.0004 Epoch: 17 Global Step: 369560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
01:20:37,500-Speed 6339.52 samples/sec Loss 5.2628 LearningRate 0.0004 Epoch: 17 Global Step: 369570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:40,748-Speed 6308.42 samples/sec Loss 5.2195 LearningRate 0.0004 Epoch: 17 Global Step: 369580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:43,993-Speed 6312.40 samples/sec Loss 5.2122 LearningRate 0.0004 Epoch: 17 Global Step: 369590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:47,242-Speed 6304.21 samples/sec Loss 5.1350 LearningRate 0.0004 Epoch: 17 Global Step: 369600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:50,485-Speed 6316.36 samples/sec Loss 5.1825 LearningRate 0.0004 Epoch: 17 Global Step: 369610 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:53,731-Speed 6311.66 samples/sec Loss 5.2371 LearningRate 0.0004 Epoch: 17 Global Step: 369620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:20:56,973-Speed 6317.12 samples/sec Loss 5.2139 LearningRate 0.0004 Epoch: 17 Global Step: 369630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:21:00,233-Speed 6283.03 samples/sec Loss 5.2569 LearningRate 0.0004 Epoch: 17 Global Step: 369640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:21:03,491-Speed 6287.48 samples/sec Loss 5.2382 LearningRate 0.0004 Epoch: 17 Global Step: 369650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:21:06,744-Speed 6299.87 samples/sec Loss 5.2853 LearningRate 0.0004 Epoch: 17 Global Step: 369660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:21:10,000-Speed 6292.02 samples/sec Loss 5.2989 LearningRate 0.0004 Epoch: 17 Global Step: 369670 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:13,246-Speed 6312.02 samples/sec Loss 5.2834 LearningRate 0.0004 Epoch: 17 Global Step: 369680 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:16,494-Speed 6305.79 
samples/sec Loss 5.1886 LearningRate 0.0004 Epoch: 17 Global Step: 369690 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:19,744-Speed 6302.51 samples/sec Loss 5.2630 LearningRate 0.0004 Epoch: 17 Global Step: 369700 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:22,990-Speed 6311.21 samples/sec Loss 5.2295 LearningRate 0.0004 Epoch: 17 Global Step: 369710 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:26,237-Speed 6309.69 samples/sec Loss 5.2330 LearningRate 0.0004 Epoch: 17 Global Step: 369720 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:29,485-Speed 6305.45 samples/sec Loss 5.1814 LearningRate 0.0004 Epoch: 17 Global Step: 369730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:32,730-Speed 6313.39 samples/sec Loss 5.2532 LearningRate 0.0004 Epoch: 17 Global Step: 369740 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:35,976-Speed 6309.67 samples/sec Loss 5.2546 LearningRate 0.0004 Epoch: 17 Global Step: 369750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:39,223-Speed 6309.99 samples/sec Loss 5.2240 LearningRate 0.0004 Epoch: 17 Global Step: 369760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:42,456-Speed 6335.29 samples/sec Loss 5.2364 LearningRate 0.0004 Epoch: 17 Global Step: 369770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:45,700-Speed 6313.68 samples/sec Loss 5.2965 LearningRate 0.0004 Epoch: 17 Global Step: 369780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:48,947-Speed 6309.28 samples/sec Loss 5.2053 LearningRate 0.0004 Epoch: 17 Global Step: 369790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:52,196-Speed 6306.54 samples/sec Loss 5.2540 LearningRate 0.0004 Epoch: 17 Global Step: 369800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:55,441-Speed 6311.75 samples/sec Loss 5.2267 
LearningRate 0.0004 Epoch: 17 Global Step: 369810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:21:58,686-Speed 6312.69 samples/sec Loss 5.2542 LearningRate 0.0004 Epoch: 17 Global Step: 369820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:01,937-Speed 6301.52 samples/sec Loss 5.1753 LearningRate 0.0004 Epoch: 17 Global Step: 369830 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:05,182-Speed 6311.93 samples/sec Loss 5.1564 LearningRate 0.0004 Epoch: 17 Global Step: 369840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:08,426-Speed 6315.18 samples/sec Loss 5.1556 LearningRate 0.0004 Epoch: 17 Global Step: 369850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:11,672-Speed 6310.31 samples/sec Loss 5.2057 LearningRate 0.0004 Epoch: 17 Global Step: 369860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:14,902-Speed 6341.27 samples/sec Loss 5.2483 LearningRate 0.0004 Epoch: 17 Global Step: 369870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:18,151-Speed 6305.60 samples/sec Loss 5.1933 LearningRate 0.0004 Epoch: 17 Global Step: 369880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:21,397-Speed 6311.98 samples/sec Loss 5.1549 LearningRate 0.0004 Epoch: 17 Global Step: 369890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:24,643-Speed 6310.26 samples/sec Loss 5.3500 LearningRate 0.0004 Epoch: 17 Global Step: 369900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:27,894-Speed 6300.50 samples/sec Loss 5.2099 LearningRate 0.0004 Epoch: 17 Global Step: 369910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:31,165-Speed 6263.83 samples/sec Loss 5.2519 LearningRate 0.0004 Epoch: 17 Global Step: 369920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:34,409-Speed 6314.12 samples/sec Loss 5.2197 LearningRate 0.0004 Epoch: 17 
Global Step: 369930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:37,662-Speed 6298.03 samples/sec Loss 5.1718 LearningRate 0.0004 Epoch: 17 Global Step: 369940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:40,920-Speed 6286.98 samples/sec Loss 5.2462 LearningRate 0.0004 Epoch: 17 Global Step: 369950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:44,165-Speed 6312.15 samples/sec Loss 5.2233 LearningRate 0.0004 Epoch: 17 Global Step: 369960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:47,396-Speed 6340.88 samples/sec Loss 5.1739 LearningRate 0.0004 Epoch: 17 Global Step: 369970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:50,642-Speed 6309.77 samples/sec Loss 5.1955 LearningRate 0.0004 Epoch: 17 Global Step: 369980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:53,885-Speed 6316.09 samples/sec Loss 5.2021 LearningRate 0.0004 Epoch: 17 Global Step: 369990 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:22:57,132-Speed 6309.65 samples/sec Loss 5.3165 LearningRate 0.0004 Epoch: 17 Global Step: 370000 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:00,379-Speed 6307.69 samples/sec Loss 5.2558 LearningRate 0.0004 Epoch: 17 Global Step: 370010 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:03,628-Speed 6306.33 samples/sec Loss 5.2652 LearningRate 0.0004 Epoch: 17 Global Step: 370020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:06,884-Speed 6291.79 samples/sec Loss 5.1931 LearningRate 0.0004 Epoch: 17 Global Step: 370030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:10,126-Speed 6316.62 samples/sec Loss 5.1881 LearningRate 0.0004 Epoch: 17 Global Step: 370040 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:13,367-Speed 6320.15 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 370050 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:16,615-Speed 6308.16 samples/sec Loss 5.2432 LearningRate 0.0004 Epoch: 17 Global Step: 370060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:19,858-Speed 6316.24 samples/sec Loss 5.2190 LearningRate 0.0004 Epoch: 17 Global Step: 370070 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-04-02 01:23:23,089-Speed 6340.11 samples/sec Loss 5.2235 LearningRate 0.0004 Epoch: 17 Global Step: 370080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:26,338-Speed 6306.30 samples/sec Loss 5.1921 LearningRate 0.0004 Epoch: 17 Global Step: 370090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:29,582-Speed 6314.04 samples/sec Loss 5.3035 LearningRate 0.0004 Epoch: 17 Global Step: 370100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:32,825-Speed 6316.25 samples/sec Loss 5.1691 LearningRate 0.0004 Epoch: 17 Global Step: 370110 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:36,072-Speed 6308.77 samples/sec Loss 5.1509 LearningRate 0.0004 Epoch: 17 Global Step: 370120 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:39,321-Speed 6305.51 samples/sec Loss 5.1631 LearningRate 0.0004 Epoch: 17 Global Step: 370130 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:42,564-Speed 6316.29 samples/sec Loss 5.2170 LearningRate 0.0004 Epoch: 17 Global Step: 370140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:45,808-Speed 6314.00 samples/sec Loss 5.2270 LearningRate 0.0004 Epoch: 17 Global Step: 370150 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:49,052-Speed 6314.50 samples/sec Loss 5.2109 LearningRate 0.0004 Epoch: 17 Global Step: 370160 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:23:52,289-Speed 6328.91 samples/sec Loss 5.1823 LearningRate 0.0004 Epoch: 17 Global Step: 370170 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:23:55,540-Speed 6301.77 samples/sec Loss 5.1526 LearningRate 0.0004 Epoch: 17 Global Step: 370180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:23:58,786-Speed 6310.46 samples/sec Loss 5.1729 LearningRate 0.0004 Epoch: 17 Global Step: 370190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:02,035-Speed 6303.52 samples/sec Loss 5.2319 LearningRate 0.0004 Epoch: 17 Global Step: 370200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:05,285-Speed 6303.13 samples/sec Loss 5.2144 LearningRate 0.0004 Epoch: 17 Global Step: 370210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:08,532-Speed 6308.76 samples/sec Loss 5.2757 LearningRate 0.0004 Epoch: 17 Global Step: 370220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:11,782-Speed 6303.34 samples/sec Loss 5.2005 LearningRate 0.0004 Epoch: 17 Global Step: 370230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:15,029-Speed 6308.43 samples/sec Loss 5.2024 LearningRate 0.0004 Epoch: 17 Global Step: 370240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:18,276-Speed 6310.19 samples/sec Loss 5.1510 LearningRate 0.0004 Epoch: 17 Global Step: 370250 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:21,518-Speed 6318.19 samples/sec Loss 5.1901 LearningRate 0.0004 Epoch: 17 Global Step: 370260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:24:24,767-Speed 6304.95 samples/sec Loss 5.2093 LearningRate 0.0004 Epoch: 17 Global Step: 370270 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:28,010-Speed 6316.89 samples/sec Loss 5.1964 LearningRate 0.0004 Epoch: 17 Global Step: 370280 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:31,255-Speed 6311.63 samples/sec Loss 5.1443 LearningRate 0.0004 Epoch: 17 Global Step: 370290 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 
01:24:34,500-Speed 6313.72 samples/sec Loss 5.1770 LearningRate 0.0004 Epoch: 17 Global Step: 370300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:37,748-Speed 6309.88 samples/sec Loss 5.1233 LearningRate 0.0004 Epoch: 17 Global Step: 370310 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:40,995-Speed 6308.37 samples/sec Loss 5.1713 LearningRate 0.0004 Epoch: 17 Global Step: 370320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:44,240-Speed 6312.57 samples/sec Loss 5.2536 LearningRate 0.0004 Epoch: 17 Global Step: 370330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:47,487-Speed 6308.30 samples/sec Loss 5.2073 LearningRate 0.0004 Epoch: 17 Global Step: 370340 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:50,734-Speed 6309.89 samples/sec Loss 5.2421 LearningRate 0.0004 Epoch: 17 Global Step: 370350 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:53,983-Speed 6304.83 samples/sec Loss 5.1911 LearningRate 0.0004 Epoch: 17 Global Step: 370360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:24:57,216-Speed 6335.68 samples/sec Loss 5.2363 LearningRate 0.0004 Epoch: 17 Global Step: 370370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:00,465-Speed 6305.02 samples/sec Loss 5.2185 LearningRate 0.0004 Epoch: 17 Global Step: 370380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:03,712-Speed 6307.64 samples/sec Loss 5.2171 LearningRate 0.0004 Epoch: 17 Global Step: 370390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:06,959-Speed 6309.82 samples/sec Loss 5.2137 LearningRate 0.0004 Epoch: 17 Global Step: 370400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:10,206-Speed 6307.45 samples/sec Loss 5.2139 LearningRate 0.0004 Epoch: 17 Global Step: 370410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:13,448-Speed 6319.28 
samples/sec Loss 5.2542 LearningRate 0.0004 Epoch: 17 Global Step: 370420 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:16,759-Speed 6186.00 samples/sec Loss 5.2292 LearningRate 0.0004 Epoch: 17 Global Step: 370430 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:20,038-Speed 6247.03 samples/sec Loss 5.2692 LearningRate 0.0004 Epoch: 17 Global Step: 370440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:23,270-Speed 6338.35 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 17 Global Step: 370450 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:26,516-Speed 6311.37 samples/sec Loss 5.1800 LearningRate 0.0004 Epoch: 17 Global Step: 370460 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:29,767-Speed 6300.15 samples/sec Loss 5.1576 LearningRate 0.0004 Epoch: 17 Global Step: 370470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:33,010-Speed 6317.60 samples/sec Loss 5.2768 LearningRate 0.0004 Epoch: 17 Global Step: 370480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:36,270-Speed 6283.14 samples/sec Loss 5.2217 LearningRate 0.0004 Epoch: 17 Global Step: 370490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:39,517-Speed 6308.95 samples/sec Loss 5.2640 LearningRate 0.0004 Epoch: 17 Global Step: 370500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:42,762-Speed 6312.59 samples/sec Loss 5.2308 LearningRate 0.0004 Epoch: 17 Global Step: 370510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:46,006-Speed 6315.08 samples/sec Loss 5.2190 LearningRate 0.0004 Epoch: 17 Global Step: 370520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:49,252-Speed 6311.89 samples/sec Loss 5.2090 LearningRate 0.0004 Epoch: 17 Global Step: 370530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:52,494-Speed 6317.40 samples/sec Loss 5.1660 
LearningRate 0.0004 Epoch: 17 Global Step: 370540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:25:55,738-Speed 6314.09 samples/sec Loss 5.1765 LearningRate 0.0004 Epoch: 17 Global Step: 370550 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:25:58,983-Speed 6312.95 samples/sec Loss 5.2598 LearningRate 0.0004 Epoch: 17 Global Step: 370560 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:02,237-Speed 6295.50 samples/sec Loss 5.2138 LearningRate 0.0004 Epoch: 17 Global Step: 370570 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:05,486-Speed 6305.34 samples/sec Loss 5.2342 LearningRate 0.0004 Epoch: 17 Global Step: 370580 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:08,732-Speed 6310.19 samples/sec Loss 5.2370 LearningRate 0.0004 Epoch: 17 Global Step: 370590 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:11,979-Speed 6308.81 samples/sec Loss 5.2148 LearningRate 0.0004 Epoch: 17 Global Step: 370600 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:15,226-Speed 6308.18 samples/sec Loss 5.1271 LearningRate 0.0004 Epoch: 17 Global Step: 370610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:18,471-Speed 6313.33 samples/sec Loss 5.2014 LearningRate 0.0004 Epoch: 17 Global Step: 370620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:21,701-Speed 6341.76 samples/sec Loss 5.1577 LearningRate 0.0004 Epoch: 17 Global Step: 370630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:24,946-Speed 6312.26 samples/sec Loss 5.2457 LearningRate 0.0004 Epoch: 17 Global Step: 370640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:28,190-Speed 6315.80 samples/sec Loss 5.2058 LearningRate 0.0004 Epoch: 17 Global Step: 370650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:31,438-Speed 6306.49 samples/sec Loss 5.2437 LearningRate 0.0004 Epoch: 17 
Global Step: 370660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:34,708-Speed 6263.81 samples/sec Loss 5.2075 LearningRate 0.0004 Epoch: 17 Global Step: 370670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:38,022-Speed 6180.37 samples/sec Loss 5.2050 LearningRate 0.0004 Epoch: 17 Global Step: 370680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:41,264-Speed 6320.06 samples/sec Loss 5.2187 LearningRate 0.0004 Epoch: 17 Global Step: 370690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:44,512-Speed 6307.01 samples/sec Loss 5.1404 LearningRate 0.0004 Epoch: 17 Global Step: 370700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:47,758-Speed 6310.21 samples/sec Loss 5.1570 LearningRate 0.0004 Epoch: 17 Global Step: 370710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:51,004-Speed 6310.75 samples/sec Loss 5.1638 LearningRate 0.0004 Epoch: 17 Global Step: 370720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:26:54,251-Speed 6308.66 samples/sec Loss 5.2110 LearningRate 0.0004 Epoch: 17 Global Step: 370730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:26:57,497-Speed 6310.84 samples/sec Loss 5.1743 LearningRate 0.0004 Epoch: 17 Global Step: 370740 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:00,743-Speed 6311.36 samples/sec Loss 5.1958 LearningRate 0.0004 Epoch: 17 Global Step: 370750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:03,993-Speed 6302.29 samples/sec Loss 5.1950 LearningRate 0.0004 Epoch: 17 Global Step: 370760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:07,240-Speed 6310.16 samples/sec Loss 5.1776 LearningRate 0.0004 Epoch: 17 Global Step: 370770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:10,485-Speed 6311.13 samples/sec Loss 5.1682 LearningRate 0.0004 Epoch: 17 Global Step: 370780 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:13,732-Speed 6308.92 samples/sec Loss 5.2164 LearningRate 0.0004 Epoch: 17 Global Step: 370790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:16,981-Speed 6306.47 samples/sec Loss 5.3444 LearningRate 0.0004 Epoch: 17 Global Step: 370800 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:20,221-Speed 6320.71 samples/sec Loss 5.1882 LearningRate 0.0004 Epoch: 17 Global Step: 370810 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:23,467-Speed 6310.49 samples/sec Loss 5.2284 LearningRate 0.0004 Epoch: 17 Global Step: 370820 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:26,711-Speed 6314.22 samples/sec Loss 5.2104 LearningRate 0.0004 Epoch: 17 Global Step: 370830 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-04-02 01:27:30,001-Speed 6227.96 samples/sec Loss 5.2507 LearningRate 0.0004 Epoch: 17 Global Step: 370840 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:33,244-Speed 6314.93 samples/sec Loss 5.1534 LearningRate 0.0004 Epoch: 17 Global Step: 370850 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:36,492-Speed 6308.05 samples/sec Loss 5.2763 LearningRate 0.0004 Epoch: 17 Global Step: 370860 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:39,749-Speed 6289.71 samples/sec Loss 5.2359 LearningRate 0.0004 Epoch: 17 Global Step: 370870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:42,998-Speed 6303.44 samples/sec Loss 5.2577 LearningRate 0.0004 Epoch: 17 Global Step: 370880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:46,248-Speed 6302.61 samples/sec Loss 5.1926 LearningRate 0.0004 Epoch: 17 Global Step: 370890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:49,493-Speed 6313.20 samples/sec Loss 5.2556 LearningRate 0.0004 Epoch: 17 Global Step: 370900 Fp16 Grad Scale: 32768 Required: 42 hours 
Training: 2022-04-02 01:27:52,738-Speed 6313.57 samples/sec Loss 5.2305 LearningRate 0.0004 Epoch: 17 Global Step: 370910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:55,987-Speed 6304.63 samples/sec Loss 5.1608 LearningRate 0.0004 Epoch: 17 Global Step: 370920 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:27:59,235-Speed 6306.80 samples/sec Loss 5.2180 LearningRate 0.0004 Epoch: 17 Global Step: 370930 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:02,475-Speed 6322.23 samples/sec Loss 5.1705 LearningRate 0.0004 Epoch: 17 Global Step: 370940 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:05,727-Speed 6299.79 samples/sec Loss 5.2292 LearningRate 0.0004 Epoch: 17 Global Step: 370950 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:08,975-Speed 6307.23 samples/sec Loss 5.2480 LearningRate 0.0004 Epoch: 17 Global Step: 370960 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:12,224-Speed 6304.93 samples/sec Loss 5.2457 LearningRate 0.0004 Epoch: 17 Global Step: 370970 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:15,471-Speed 6309.05 samples/sec Loss 5.2387 LearningRate 0.0004 Epoch: 17 Global Step: 370980 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:18,706-Speed 6334.90 samples/sec Loss 5.3524 LearningRate 0.0004 Epoch: 17 Global Step: 370990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:21,952-Speed 6311.30 samples/sec Loss 5.1573 LearningRate 0.0004 Epoch: 17 Global Step: 371000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:25,200-Speed 6306.16 samples/sec Loss 5.2713 LearningRate 0.0004 Epoch: 17 Global Step: 371010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:28,449-Speed 6305.31 samples/sec Loss 5.2307 LearningRate 0.0004 Epoch: 17 Global Step: 371020 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:28:31,692-Speed 6315.31 samples/sec Loss 5.2003 LearningRate 0.0004 Epoch: 17 Global Step: 371030 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:34,938-Speed 6312.63 samples/sec Loss 5.2014 LearningRate 0.0004 Epoch: 17 Global Step: 371040 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:38,198-Speed 6282.10 samples/sec Loss 5.1501 LearningRate 0.0004 Epoch: 17 Global Step: 371050 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:41,447-Speed 6306.15 samples/sec Loss 5.2095 LearningRate 0.0004 Epoch: 17 Global Step: 371060 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:44,692-Speed 6311.03 samples/sec Loss 5.1540 LearningRate 0.0004 Epoch: 17 Global Step: 371070 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:47,935-Speed 6317.46 samples/sec Loss 5.2418 LearningRate 0.0004 Epoch: 17 Global Step: 371080 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:28:51,176-Speed 6320.42 samples/sec Loss 5.1945 LearningRate 0.0004 Epoch: 17 Global Step: 371090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:54,422-Speed 6309.72 samples/sec Loss 5.1761 LearningRate 0.0004 Epoch: 17 Global Step: 371100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:28:57,668-Speed 6312.25 samples/sec Loss 5.2381 LearningRate 0.0004 Epoch: 17 Global Step: 371110 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:29:00,911-Speed 6314.70 samples/sec Loss 5.1582 LearningRate 0.0004 Epoch: 17 Global Step: 371120 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:29:04,156-Speed 6313.62 samples/sec Loss 5.1705 LearningRate 0.0004 Epoch: 17 Global Step: 371130 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:29:07,410-Speed 6295.91 samples/sec Loss 5.2006 LearningRate 0.0004 Epoch: 17 Global Step: 371140 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:29:10,642-Speed 6337.03 
samples/sec Loss 5.2438 LearningRate 0.0004 Epoch: 17 Global Step: 371150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:13,892-Speed 6303.45 samples/sec Loss 5.2100 LearningRate 0.0004 Epoch: 17 Global Step: 371160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:17,137-Speed 6313.22 samples/sec Loss 5.1860 LearningRate 0.0004 Epoch: 17 Global Step: 371170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:20,380-Speed 6315.68 samples/sec Loss 5.2285 LearningRate 0.0004 Epoch: 17 Global Step: 371180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:23,628-Speed 6307.94 samples/sec Loss 5.1372 LearningRate 0.0004 Epoch: 17 Global Step: 371190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:26,872-Speed 6313.27 samples/sec Loss 5.2331 LearningRate 0.0004 Epoch: 17 Global Step: 371200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:30,124-Speed 6300.13 samples/sec Loss 5.2663 LearningRate 0.0004 Epoch: 17 Global Step: 371210 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:33,367-Speed 6317.02 samples/sec Loss 5.1820 LearningRate 0.0004 Epoch: 17 Global Step: 371220 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:36,615-Speed 6306.69 samples/sec Loss 5.1646 LearningRate 0.0004 Epoch: 17 Global Step: 371230 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:39,860-Speed 6312.34 samples/sec Loss 5.1983 LearningRate 0.0004 Epoch: 17 Global Step: 371240 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:43,107-Speed 6308.87 samples/sec Loss 5.2104 LearningRate 0.0004 Epoch: 17 Global Step: 371250 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:29:46,336-Speed 6344.32 samples/sec Loss 5.2177 LearningRate 0.0004 Epoch: 17 Global Step: 371260 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:49,579-Speed 6315.08 samples/sec Loss 5.2577 
LearningRate 0.0004 Epoch: 17 Global Step: 371270 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:52,825-Speed 6311.19 samples/sec Loss 5.2555 LearningRate 0.0004 Epoch: 17 Global Step: 371280 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:56,068-Speed 6316.58 samples/sec Loss 5.2519 LearningRate 0.0004 Epoch: 17 Global Step: 371290 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:29:59,313-Speed 6313.38 samples/sec Loss 5.1601 LearningRate 0.0004 Epoch: 17 Global Step: 371300 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:02,562-Speed 6304.40 samples/sec Loss 5.2711 LearningRate 0.0004 Epoch: 17 Global Step: 371310 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:05,815-Speed 6297.10 samples/sec Loss 5.1974 LearningRate 0.0004 Epoch: 17 Global Step: 371320 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:09,059-Speed 6314.23 samples/sec Loss 5.1697 LearningRate 0.0004 Epoch: 17 Global Step: 371330 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:12,299-Speed 6323.17 samples/sec Loss 5.2391 LearningRate 0.0004 Epoch: 17 Global Step: 371340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:15,553-Speed 6294.63 samples/sec Loss 5.1817 LearningRate 0.0004 Epoch: 17 Global Step: 371350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:30:18,801-Speed 6309.19 samples/sec Loss 5.2602 LearningRate 0.0004 Epoch: 17 Global Step: 371360 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:22,046-Speed 6312.39 samples/sec Loss 5.2473 LearningRate 0.0004 Epoch: 17 Global Step: 371370 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:25,294-Speed 6307.05 samples/sec Loss 5.1829 LearningRate 0.0004 Epoch: 17 Global Step: 371380 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:28,545-Speed 6301.43 samples/sec Loss 5.1669 LearningRate 0.0004 Epoch: 17 
Global Step: 371390 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:31,795-Speed 6303.57 samples/sec Loss 5.2316 LearningRate 0.0004 Epoch: 17 Global Step: 371400 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:35,045-Speed 6302.15 samples/sec Loss 5.1804 LearningRate 0.0004 Epoch: 17 Global Step: 371410 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:38,292-Speed 6309.90 samples/sec Loss 5.2564 LearningRate 0.0004 Epoch: 17 Global Step: 371420 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:41,539-Speed 6307.22 samples/sec Loss 5.1966 LearningRate 0.0004 Epoch: 17 Global Step: 371430 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:44,785-Speed 6310.67 samples/sec Loss 5.2301 LearningRate 0.0004 Epoch: 17 Global Step: 371440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:48,034-Speed 6305.04 samples/sec Loss 5.1879 LearningRate 0.0004 Epoch: 17 Global Step: 371450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:51,277-Speed 6317.25 samples/sec Loss 5.1815 LearningRate 0.0004 Epoch: 17 Global Step: 371460 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-04-02 01:30:54,510-Speed 6336.77 samples/sec Loss 5.2307 LearningRate 0.0004 Epoch: 17 Global Step: 371470 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:30:57,754-Speed 6313.45 samples/sec Loss 5.2144 LearningRate 0.0004 Epoch: 17 Global Step: 371480 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:01,001-Speed 6309.56 samples/sec Loss 5.2060 LearningRate 0.0004 Epoch: 17 Global Step: 371490 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:04,249-Speed 6306.99 samples/sec Loss 5.1575 LearningRate 0.0004 Epoch: 17 Global Step: 371500 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:07,479-Speed 6341.44 samples/sec Loss 5.1795 LearningRate 0.0004 Epoch: 17 Global Step: 371510 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:10,728-Speed 6305.89 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 17 Global Step: 371520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:13,974-Speed 6309.35 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 17 Global Step: 371530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:17,221-Speed 6309.70 samples/sec Loss 5.2005 LearningRate 0.0004 Epoch: 17 Global Step: 371540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:20,468-Speed 6308.74 samples/sec Loss 5.1722 LearningRate 0.0004 Epoch: 17 Global Step: 371550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:23,721-Speed 6296.26 samples/sec Loss 5.2019 LearningRate 0.0004 Epoch: 17 Global Step: 371560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:26,967-Speed 6310.48 samples/sec Loss 5.1801 LearningRate 0.0004 Epoch: 17 Global Step: 371570 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:30,214-Speed 6308.30 samples/sec Loss 5.1997 LearningRate 0.0004 Epoch: 17 Global Step: 371580 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:33,458-Speed 6316.11 samples/sec Loss 5.2512 LearningRate 0.0004 Epoch: 17 Global Step: 371590 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:36,707-Speed 6305.36 samples/sec Loss 5.2018 LearningRate 0.0004 Epoch: 17 Global Step: 371600 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:39,951-Speed 6314.39 samples/sec Loss 5.1961 LearningRate 0.0004 Epoch: 17 Global Step: 371610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:43,197-Speed 6310.17 samples/sec Loss 5.2475 LearningRate 0.0004 Epoch: 17 Global Step: 371620 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:46,483-Speed 6234.49 samples/sec Loss 5.2511 LearningRate 0.0004 Epoch: 17 Global Step: 371630 Fp16 Grad Scale: 32768 Required: 42 hours 
Training: 2022-04-02 01:31:49,732-Speed 6304.94 samples/sec Loss 5.2111 LearningRate 0.0004 Epoch: 17 Global Step: 371640 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:52,978-Speed 6310.49 samples/sec Loss 5.1879 LearningRate 0.0004 Epoch: 17 Global Step: 371650 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:31:56,210-Speed 6338.36 samples/sec Loss 5.1759 LearningRate 0.0004 Epoch: 17 Global Step: 371660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:31:59,451-Speed 6320.57 samples/sec Loss 5.1874 LearningRate 0.0004 Epoch: 17 Global Step: 371670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:02,701-Speed 6303.57 samples/sec Loss 5.2776 LearningRate 0.0004 Epoch: 17 Global Step: 371680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:05,949-Speed 6306.23 samples/sec Loss 5.2268 LearningRate 0.0004 Epoch: 17 Global Step: 371690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:09,197-Speed 6306.96 samples/sec Loss 5.1724 LearningRate 0.0004 Epoch: 17 Global Step: 371700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:12,458-Speed 6280.90 samples/sec Loss 5.2359 LearningRate 0.0004 Epoch: 17 Global Step: 371710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:15,701-Speed 6316.38 samples/sec Loss 5.1536 LearningRate 0.0004 Epoch: 17 Global Step: 371720 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:18,944-Speed 6316.57 samples/sec Loss 5.1876 LearningRate 0.0004 Epoch: 17 Global Step: 371730 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:22,188-Speed 6315.88 samples/sec Loss 5.1872 LearningRate 0.0004 Epoch: 17 Global Step: 371740 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:25,432-Speed 6314.19 samples/sec Loss 5.2099 LearningRate 0.0004 Epoch: 17 Global Step: 371750 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:32:28,677-Speed 6312.85 samples/sec Loss 5.2473 LearningRate 0.0004 Epoch: 17 Global Step: 371760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:32:31,907-Speed 6341.30 samples/sec Loss 5.2122 LearningRate 0.0004 Epoch: 17 Global Step: 371770 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:35,155-Speed 6307.76 samples/sec Loss 5.2212 LearningRate 0.0004 Epoch: 17 Global Step: 371780 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:38,401-Speed 6309.88 samples/sec Loss 5.2298 LearningRate 0.0004 Epoch: 17 Global Step: 371790 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:41,647-Speed 6311.29 samples/sec Loss 5.1776 LearningRate 0.0004 Epoch: 17 Global Step: 371800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:44,892-Speed 6312.27 samples/sec Loss 5.1833 LearningRate 0.0004 Epoch: 17 Global Step: 371810 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:48,143-Speed 6302.43 samples/sec Loss 5.1635 LearningRate 0.0004 Epoch: 17 Global Step: 371820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:51,386-Speed 6315.26 samples/sec Loss 5.2603 LearningRate 0.0004 Epoch: 17 Global Step: 371830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:54,630-Speed 6314.99 samples/sec Loss 5.2082 LearningRate 0.0004 Epoch: 17 Global Step: 371840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:32:57,876-Speed 6311.83 samples/sec Loss 5.1811 LearningRate 0.0004 Epoch: 17 Global Step: 371850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:01,118-Speed 6317.88 samples/sec Loss 5.1199 LearningRate 0.0004 Epoch: 17 Global Step: 371860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:04,363-Speed 6313.18 samples/sec Loss 5.1984 LearningRate 0.0004 Epoch: 17 Global Step: 371870 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:07,605-Speed 6319.20 
samples/sec Loss 5.2628 LearningRate 0.0004 Epoch: 17 Global Step: 371880 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:10,850-Speed 6311.04 samples/sec Loss 5.2999 LearningRate 0.0004 Epoch: 17 Global Step: 371890 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:14,092-Speed 6318.41 samples/sec Loss 5.2137 LearningRate 0.0004 Epoch: 17 Global Step: 371900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:17,339-Speed 6309.90 samples/sec Loss 5.2675 LearningRate 0.0004 Epoch: 17 Global Step: 371910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:20,571-Speed 6337.04 samples/sec Loss 5.3094 LearningRate 0.0004 Epoch: 17 Global Step: 371920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:23,816-Speed 6313.50 samples/sec Loss 5.2385 LearningRate 0.0004 Epoch: 17 Global Step: 371930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:27,061-Speed 6312.15 samples/sec Loss 5.2000 LearningRate 0.0004 Epoch: 17 Global Step: 371940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:30,304-Speed 6317.20 samples/sec Loss 5.1838 LearningRate 0.0004 Epoch: 17 Global Step: 371950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:33,544-Speed 6322.02 samples/sec Loss 5.2247 LearningRate 0.0004 Epoch: 17 Global Step: 371960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:36,786-Speed 6318.31 samples/sec Loss 5.2083 LearningRate 0.0004 Epoch: 17 Global Step: 371970 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:40,031-Speed 6312.68 samples/sec Loss 5.2239 LearningRate 0.0004 Epoch: 17 Global Step: 371980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:43,279-Speed 6306.08 samples/sec Loss 5.2097 LearningRate 0.0004 Epoch: 17 Global Step: 371990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:46,525-Speed 6311.11 samples/sec Loss 5.2553 
LearningRate 0.0004 Epoch: 17 Global Step: 372000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:49,768-Speed 6316.17 samples/sec Loss 5.2215 LearningRate 0.0004 Epoch: 17 Global Step: 372010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:33:53,026-Speed 6288.94 samples/sec Loss 5.1640 LearningRate 0.0004 Epoch: 17 Global Step: 372020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:56,274-Speed 6306.51 samples/sec Loss 5.2299 LearningRate 0.0004 Epoch: 17 Global Step: 372030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:33:59,518-Speed 6314.20 samples/sec Loss 5.1816 LearningRate 0.0004 Epoch: 17 Global Step: 372040 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:02,763-Speed 6314.15 samples/sec Loss 5.1827 LearningRate 0.0004 Epoch: 17 Global Step: 372050 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:06,009-Speed 6310.60 samples/sec Loss 5.2071 LearningRate 0.0004 Epoch: 17 Global Step: 372060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:09,256-Speed 6309.57 samples/sec Loss 5.2106 LearningRate 0.0004 Epoch: 17 Global Step: 372070 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:12,501-Speed 6311.59 samples/sec Loss 5.1740 LearningRate 0.0004 Epoch: 17 Global Step: 372080 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:15,743-Speed 6317.38 samples/sec Loss 5.1880 LearningRate 0.0004 Epoch: 17 Global Step: 372090 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:18,989-Speed 6311.10 samples/sec Loss 5.1490 LearningRate 0.0004 Epoch: 17 Global Step: 372100 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:22,225-Speed 6331.44 samples/sec Loss 5.2606 LearningRate 0.0004 Epoch: 17 Global Step: 372110 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:25,470-Speed 6312.31 samples/sec Loss 5.1350 LearningRate 0.0004 Epoch: 17 
Global Step: 372120 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:28,716-Speed 6310.66 samples/sec Loss 5.1906 LearningRate 0.0004 Epoch: 17 Global Step: 372130 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:31,958-Speed 6317.39 samples/sec Loss 5.1184 LearningRate 0.0004 Epoch: 17 Global Step: 372140 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:35,215-Speed 6290.45 samples/sec Loss 5.2361 LearningRate 0.0004 Epoch: 17 Global Step: 372150 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:38,462-Speed 6307.98 samples/sec Loss 5.1784 LearningRate 0.0004 Epoch: 17 Global Step: 372160 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:41,706-Speed 6315.12 samples/sec Loss 5.2014 LearningRate 0.0004 Epoch: 17 Global Step: 372170 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:44,954-Speed 6306.38 samples/sec Loss 5.2420 LearningRate 0.0004 Epoch: 17 Global Step: 372180 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:48,200-Speed 6310.28 samples/sec Loss 5.2434 LearningRate 0.0004 Epoch: 17 Global Step: 372190 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:51,442-Speed 6318.63 samples/sec Loss 5.2069 LearningRate 0.0004 Epoch: 17 Global Step: 372200 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:34:54,695-Speed 6298.47 samples/sec Loss 5.2547 LearningRate 0.0004 Epoch: 17 Global Step: 372210 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:34:57,940-Speed 6312.60 samples/sec Loss 5.2489 LearningRate 0.0004 Epoch: 17 Global Step: 372220 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:01,186-Speed 6310.26 samples/sec Loss 5.1742 LearningRate 0.0004 Epoch: 17 Global Step: 372230 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:04,432-Speed 6310.53 samples/sec Loss 5.2101 LearningRate 0.0004 Epoch: 17 Global Step: 372240 Fp16 Grad 
Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:07,679-Speed 6309.39 samples/sec Loss 5.2051 LearningRate 0.0004 Epoch: 17 Global Step: 372250 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:10,925-Speed 6310.40 samples/sec Loss 5.1432 LearningRate 0.0004 Epoch: 17 Global Step: 372260 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:14,177-Speed 6299.94 samples/sec Loss 5.1818 LearningRate 0.0004 Epoch: 17 Global Step: 372270 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:17,427-Speed 6301.95 samples/sec Loss 5.1874 LearningRate 0.0004 Epoch: 17 Global Step: 372280 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:20,672-Speed 6312.85 samples/sec Loss 5.2068 LearningRate 0.0004 Epoch: 17 Global Step: 372290 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:23,919-Speed 6308.42 samples/sec Loss 5.1463 LearningRate 0.0004 Epoch: 17 Global Step: 372300 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:27,164-Speed 6313.03 samples/sec Loss 5.2646 LearningRate 0.0004 Epoch: 17 Global Step: 372310 Fp16 Grad Scale: 65536 Required: 42 hours Training: 2022-04-02 01:35:30,395-Speed 6340.21 samples/sec Loss 5.1654 LearningRate 0.0004 Epoch: 17 Global Step: 372320 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:33,640-Speed 6312.29 samples/sec Loss 5.2298 LearningRate 0.0004 Epoch: 17 Global Step: 372330 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:35:36,874-Speed 6334.13 samples/sec Loss 5.2551 LearningRate 0.0004 Epoch: 17 Global Step: 372340 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:40,122-Speed 6308.04 samples/sec Loss 5.2011 LearningRate 0.0004 Epoch: 17 Global Step: 372350 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:43,370-Speed 6305.32 samples/sec Loss 5.1934 LearningRate 0.0004 Epoch: 17 Global Step: 372360 Fp16 Grad Scale: 16384 Required: 42 hours 
Training: 2022-04-02 01:35:46,619-Speed 6305.75 samples/sec Loss 5.2118 LearningRate 0.0004 Epoch: 17 Global Step: 372370 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:49,868-Speed 6304.50 samples/sec Loss 5.2053 LearningRate 0.0004 Epoch: 17 Global Step: 372380 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:53,111-Speed 6316.18 samples/sec Loss 5.2413 LearningRate 0.0004 Epoch: 17 Global Step: 372390 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:56,366-Speed 6293.81 samples/sec Loss 5.2372 LearningRate 0.0004 Epoch: 17 Global Step: 372400 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:35:59,610-Speed 6314.24 samples/sec Loss 5.2232 LearningRate 0.0004 Epoch: 17 Global Step: 372410 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:02,858-Speed 6306.86 samples/sec Loss 5.1857 LearningRate 0.0004 Epoch: 17 Global Step: 372420 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:06,106-Speed 6306.38 samples/sec Loss 5.2122 LearningRate 0.0004 Epoch: 17 Global Step: 372430 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:09,355-Speed 6306.45 samples/sec Loss 5.2266 LearningRate 0.0004 Epoch: 17 Global Step: 372440 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:36:12,602-Speed 6308.23 samples/sec Loss 5.1469 LearningRate 0.0004 Epoch: 17 Global Step: 372450 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:36:15,847-Speed 6312.88 samples/sec Loss 5.1939 LearningRate 0.0004 Epoch: 17 Global Step: 372460 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:36:19,077-Speed 6342.23 samples/sec Loss 5.1772 LearningRate 0.0004 Epoch: 17 Global Step: 372470 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:22,322-Speed 6313.48 samples/sec Loss 5.2973 LearningRate 0.0004 Epoch: 17 Global Step: 372480 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 
01:36:25,566-Speed 6314.64 samples/sec Loss 5.2188 LearningRate 0.0004 Epoch: 17 Global Step: 372490 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:28,810-Speed 6313.50 samples/sec Loss 5.1906 LearningRate 0.0004 Epoch: 17 Global Step: 372500 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:32,058-Speed 6306.84 samples/sec Loss 5.1512 LearningRate 0.0004 Epoch: 17 Global Step: 372510 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:35,307-Speed 6305.34 samples/sec Loss 5.2032 LearningRate 0.0004 Epoch: 17 Global Step: 372520 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:38,551-Speed 6314.03 samples/sec Loss 5.1948 LearningRate 0.0004 Epoch: 17 Global Step: 372530 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:41,796-Speed 6312.80 samples/sec Loss 5.1794 LearningRate 0.0004 Epoch: 17 Global Step: 372540 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:45,042-Speed 6310.58 samples/sec Loss 5.2058 LearningRate 0.0004 Epoch: 17 Global Step: 372550 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:48,288-Speed 6310.82 samples/sec Loss 5.2105 LearningRate 0.0004 Epoch: 17 Global Step: 372560 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:36:51,535-Speed 6308.70 samples/sec Loss 5.2028 LearningRate 0.0004 Epoch: 17 Global Step: 372570 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:36:54,781-Speed 6310.61 samples/sec Loss 5.1935 LearningRate 0.0004 Epoch: 17 Global Step: 372580 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:36:58,032-Speed 6301.34 samples/sec Loss 5.1566 LearningRate 0.0004 Epoch: 17 Global Step: 372590 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:01,290-Speed 6287.05 samples/sec Loss 5.2216 LearningRate 0.0004 Epoch: 17 Global Step: 372600 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:04,544-Speed 6296.45 
samples/sec Loss 5.1692 LearningRate 0.0004 Epoch: 17 Global Step: 372610 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:07,774-Speed 6343.31 samples/sec Loss 5.2305 LearningRate 0.0004 Epoch: 17 Global Step: 372620 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:11,020-Speed 6310.47 samples/sec Loss 5.1684 LearningRate 0.0004 Epoch: 17 Global Step: 372630 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:14,262-Speed 6318.56 samples/sec Loss 5.1812 LearningRate 0.0004 Epoch: 17 Global Step: 372640 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:17,509-Speed 6308.82 samples/sec Loss 5.3117 LearningRate 0.0004 Epoch: 17 Global Step: 372650 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:20,754-Speed 6312.91 samples/sec Loss 5.1922 LearningRate 0.0004 Epoch: 17 Global Step: 372660 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:24,010-Speed 6291.86 samples/sec Loss 5.1727 LearningRate 0.0004 Epoch: 17 Global Step: 372670 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:27,253-Speed 6316.01 samples/sec Loss 5.1872 LearningRate 0.0004 Epoch: 17 Global Step: 372680 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:30,497-Speed 6315.23 samples/sec Loss 5.2099 LearningRate 0.0004 Epoch: 17 Global Step: 372690 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:33,743-Speed 6311.22 samples/sec Loss 5.1438 LearningRate 0.0004 Epoch: 17 Global Step: 372700 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:36,989-Speed 6309.89 samples/sec Loss 5.2082 LearningRate 0.0004 Epoch: 17 Global Step: 372710 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:37:40,230-Speed 6320.49 samples/sec Loss 5.2068 LearningRate 0.0004 Epoch: 17 Global Step: 372720 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:43,475-Speed 6312.96 samples/sec Loss 5.2703 
LearningRate 0.0004 Epoch: 17 Global Step: 372730 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:46,763-Speed 6229.87 samples/sec Loss 5.1984 LearningRate 0.0004 Epoch: 17 Global Step: 372740 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:50,127-Speed 6089.42 samples/sec Loss 5.2352 LearningRate 0.0004 Epoch: 17 Global Step: 372750 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:53,384-Speed 6289.80 samples/sec Loss 5.1980 LearningRate 0.0004 Epoch: 17 Global Step: 372760 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:56,634-Speed 6301.19 samples/sec Loss 5.1651 LearningRate 0.0004 Epoch: 17 Global Step: 372770 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:37:59,879-Speed 6313.80 samples/sec Loss 5.2261 LearningRate 0.0004 Epoch: 17 Global Step: 372780 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:38:03,146-Speed 6270.57 samples/sec Loss 5.1618 LearningRate 0.0004 Epoch: 17 Global Step: 372790 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:38:06,380-Speed 6333.69 samples/sec Loss 5.2431 LearningRate 0.0004 Epoch: 17 Global Step: 372800 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:09,625-Speed 6313.02 samples/sec Loss 5.1452 LearningRate 0.0004 Epoch: 17 Global Step: 372810 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:12,866-Speed 6320.72 samples/sec Loss 5.1802 LearningRate 0.0004 Epoch: 17 Global Step: 372820 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:16,115-Speed 6303.84 samples/sec Loss 5.1766 LearningRate 0.0004 Epoch: 17 Global Step: 372830 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:19,397-Speed 6241.19 samples/sec Loss 5.1879 LearningRate 0.0004 Epoch: 17 Global Step: 372840 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:22,651-Speed 6296.24 samples/sec Loss 5.1437 LearningRate 0.0004 Epoch: 17 
Global Step: 372850 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:25,898-Speed 6307.63 samples/sec Loss 5.2520 LearningRate 0.0004 Epoch: 17 Global Step: 372860 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:29,148-Speed 6304.86 samples/sec Loss 5.1764 LearningRate 0.0004 Epoch: 17 Global Step: 372870 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:32,396-Speed 6305.78 samples/sec Loss 5.1935 LearningRate 0.0004 Epoch: 17 Global Step: 372880 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:35,641-Speed 6314.12 samples/sec Loss 5.2201 LearningRate 0.0004 Epoch: 17 Global Step: 372890 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:38,885-Speed 6313.19 samples/sec Loss 5.2307 LearningRate 0.0004 Epoch: 17 Global Step: 372900 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:38:42,127-Speed 6319.77 samples/sec Loss 5.1761 LearningRate 0.0004 Epoch: 17 Global Step: 372910 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:38:45,357-Speed 6340.75 samples/sec Loss 5.1443 LearningRate 0.0004 Epoch: 17 Global Step: 372920 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:48,600-Speed 6316.23 samples/sec Loss 5.2152 LearningRate 0.0004 Epoch: 17 Global Step: 372930 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:51,845-Speed 6313.85 samples/sec Loss 5.2206 LearningRate 0.0004 Epoch: 17 Global Step: 372940 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:55,090-Speed 6312.12 samples/sec Loss 5.1811 LearningRate 0.0004 Epoch: 17 Global Step: 372950 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:38:58,336-Speed 6310.72 samples/sec Loss 5.2600 LearningRate 0.0004 Epoch: 17 Global Step: 372960 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:01,580-Speed 6314.09 samples/sec Loss 5.1734 LearningRate 0.0004 Epoch: 17 Global Step: 372970 Fp16 Grad 
Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:04,825-Speed 6312.90 samples/sec Loss 5.1812 LearningRate 0.0004 Epoch: 17 Global Step: 372980 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:08,070-Speed 6312.78 samples/sec Loss 5.1764 LearningRate 0.0004 Epoch: 17 Global Step: 372990 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:11,314-Speed 6314.48 samples/sec Loss 5.2478 LearningRate 0.0004 Epoch: 17 Global Step: 373000 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:14,561-Speed 6308.53 samples/sec Loss 5.2500 LearningRate 0.0004 Epoch: 17 Global Step: 373010 Fp16 Grad Scale: 16384 Required: 42 hours Training: 2022-04-02 01:39:17,812-Speed 6300.73 samples/sec Loss 5.2035 LearningRate 0.0004 Epoch: 17 Global Step: 373020 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:21,058-Speed 6310.44 samples/sec Loss 5.1717 LearningRate 0.0004 Epoch: 17 Global Step: 373030 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:24,304-Speed 6310.98 samples/sec Loss 5.1106 LearningRate 0.0004 Epoch: 17 Global Step: 373040 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:27,547-Speed 6316.03 samples/sec Loss 5.1888 LearningRate 0.0004 Epoch: 17 Global Step: 373050 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:30,793-Speed 6311.96 samples/sec Loss 5.2296 LearningRate 0.0004 Epoch: 17 Global Step: 373060 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:34,039-Speed 6311.27 samples/sec Loss 5.2445 LearningRate 0.0004 Epoch: 17 Global Step: 373070 Fp16 Grad Scale: 32768 Required: 42 hours Training: 2022-04-02 01:39:37,284-Speed 6311.89 samples/sec Loss 5.2308 LearningRate 0.0004 Epoch: 17 Global Step: 373080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:39:40,529-Speed 6312.77 samples/sec Loss 5.1750 LearningRate 0.0004 Epoch: 17 Global Step: 373090 Fp16 Grad Scale: 32768 Required: 41 hours 
Training: 2022-04-02 01:39:43,780-Speed 6300.95 samples/sec Loss 5.1900 LearningRate 0.0004 Epoch: 17 Global Step: 373100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:39:47,030-Speed 6304.23 samples/sec Loss 5.2559 LearningRate 0.0004 Epoch: 17 Global Step: 373110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:39:50,263-Speed 6335.13 samples/sec Loss 5.2134 LearningRate 0.0004 Epoch: 17 Global Step: 373120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:39:53,508-Speed 6313.58 samples/sec Loss 5.2294 LearningRate 0.0004 Epoch: 17 Global Step: 373130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:39:56,738-Speed 6341.46 samples/sec Loss 5.2310 LearningRate 0.0004 Epoch: 17 Global Step: 373140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:39:59,989-Speed 6299.94 samples/sec Loss 5.2295 LearningRate 0.0004 Epoch: 17 Global Step: 373150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:03,235-Speed 6312.72 samples/sec Loss 5.2282 LearningRate 0.0004 Epoch: 17 Global Step: 373160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:06,480-Speed 6312.28 samples/sec Loss 5.1660 LearningRate 0.0004 Epoch: 17 Global Step: 373170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:09,724-Speed 6313.44 samples/sec Loss 5.1851 LearningRate 0.0004 Epoch: 17 Global Step: 373180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:12,975-Speed 6302.53 samples/sec Loss 5.2862 LearningRate 0.0004 Epoch: 17 Global Step: 373190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:16,216-Speed 6318.55 samples/sec Loss 5.2002 LearningRate 0.0004 Epoch: 17 Global Step: 373200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:19,462-Speed 6311.86 samples/sec Loss 5.2647 LearningRate 0.0004 Epoch: 17 Global Step: 373210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
01:40:22,734-Speed 6259.65 samples/sec Loss 5.1966 LearningRate 0.0004 Epoch: 17 Global Step: 373220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:25,978-Speed 6314.25 samples/sec Loss 5.1383 LearningRate 0.0004 Epoch: 17 Global Step: 373230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:29,223-Speed 6314.10 samples/sec Loss 5.1557 LearningRate 0.0004 Epoch: 17 Global Step: 373240 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:40:32,466-Speed 6315.21 samples/sec Loss 5.2404 LearningRate 0.0004 Epoch: 17 Global Step: 373250 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:40:35,710-Speed 6314.87 samples/sec Loss 5.1523 LearningRate 0.0004 Epoch: 17 Global Step: 373260 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:40:38,963-Speed 6296.51 samples/sec Loss 5.3004 LearningRate 0.0004 Epoch: 17 Global Step: 373270 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:40:42,217-Speed 6295.99 samples/sec Loss 5.2309 LearningRate 0.0004 Epoch: 17 Global Step: 373280 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:40:45,444-Speed 6347.51 samples/sec Loss 5.1596 LearningRate 0.0004 Epoch: 17 Global Step: 373290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:48,694-Speed 6303.44 samples/sec Loss 5.1761 LearningRate 0.0004 Epoch: 17 Global Step: 373300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:40:51,944-Speed 6303.86 samples/sec Loss 5.2610 LearningRate 0.0004 Epoch: 17 Global Step: 373310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:41:51,687-Speed 342.80 samples/sec Loss 5.2420 LearningRate 0.0004 Epoch: 18 Global Step: 373320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:41:54,925-Speed 6327.58 samples/sec Loss 5.2305 LearningRate 0.0004 Epoch: 18 Global Step: 373330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:41:58,161-Speed 6330.00 
samples/sec Loss 5.2031 LearningRate 0.0004 Epoch: 18 Global Step: 373340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:01,394-Speed 6336.89 samples/sec Loss 5.1957 LearningRate 0.0004 Epoch: 18 Global Step: 373350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:04,629-Speed 6330.72 samples/sec Loss 5.2181 LearningRate 0.0004 Epoch: 18 Global Step: 373360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:07,869-Speed 6323.21 samples/sec Loss 5.2026 LearningRate 0.0004 Epoch: 18 Global Step: 373370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:11,109-Speed 6321.26 samples/sec Loss 5.1837 LearningRate 0.0004 Epoch: 18 Global Step: 373380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:14,349-Speed 6323.74 samples/sec Loss 5.2241 LearningRate 0.0004 Epoch: 18 Global Step: 373390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:17,587-Speed 6325.74 samples/sec Loss 5.2251 LearningRate 0.0004 Epoch: 18 Global Step: 373400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:20,825-Speed 6325.07 samples/sec Loss 5.2828 LearningRate 0.0004 Epoch: 18 Global Step: 373410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:24,145-Speed 6170.62 samples/sec Loss 5.1274 LearningRate 0.0004 Epoch: 18 Global Step: 373420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:27,429-Speed 6237.51 samples/sec Loss 5.1766 LearningRate 0.0004 Epoch: 18 Global Step: 373430 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:30,671-Speed 6319.46 samples/sec Loss 5.1919 LearningRate 0.0004 Epoch: 18 Global Step: 373440 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:33,918-Speed 6307.74 samples/sec Loss 5.2303 LearningRate 0.0004 Epoch: 18 Global Step: 373450 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:42:37,151-Speed 6336.53 samples/sec Loss 5.1717 
LearningRate 0.0004 Epoch: 18 Global Step: 373460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:40,394-Speed 6317.20 samples/sec Loss 5.1763 LearningRate 0.0004 Epoch: 18 Global Step: 373470 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:43,635-Speed 6319.45 samples/sec Loss 5.1731 LearningRate 0.0004 Epoch: 18 Global Step: 373480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:46,876-Speed 6321.99 samples/sec Loss 5.1798 LearningRate 0.0004 Epoch: 18 Global Step: 373490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:50,115-Speed 6323.75 samples/sec Loss 5.1716 LearningRate 0.0004 Epoch: 18 Global Step: 373500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:53,359-Speed 6315.44 samples/sec Loss 5.1928 LearningRate 0.0004 Epoch: 18 Global Step: 373510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:42:58,540-Speed 3953.41 samples/sec Loss 5.1005 LearningRate 0.0004 Epoch: 18 Global Step: 373520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:01,775-Speed 6331.66 samples/sec Loss 5.1469 LearningRate 0.0004 Epoch: 18 Global Step: 373530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:05,014-Speed 6323.99 samples/sec Loss 5.2013 LearningRate 0.0004 Epoch: 18 Global Step: 373540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:08,253-Speed 6325.05 samples/sec Loss 5.1965 LearningRate 0.0004 Epoch: 18 Global Step: 373550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:11,497-Speed 6314.31 samples/sec Loss 5.2297 LearningRate 0.0004 Epoch: 18 Global Step: 373560 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:14,739-Speed 6319.31 samples/sec Loss 5.2102 LearningRate 0.0004 Epoch: 18 Global Step: 373570 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:17,978-Speed 6322.55 samples/sec Loss 5.1206 LearningRate 0.0004 Epoch: 18 
Global Step: 373580 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:21,215-Speed 6328.52 samples/sec Loss 5.2717 LearningRate 0.0004 Epoch: 18 Global Step: 373590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:24,452-Speed 6329.36 samples/sec Loss 5.1651 LearningRate 0.0004 Epoch: 18 Global Step: 373600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:27,698-Speed 6310.42 samples/sec Loss 5.2121 LearningRate 0.0004 Epoch: 18 Global Step: 373610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:43:30,919-Speed 6359.54 samples/sec Loss 5.1877 LearningRate 0.0004 Epoch: 18 Global Step: 373620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:34,156-Speed 6328.21 samples/sec Loss 5.1379 LearningRate 0.0004 Epoch: 18 Global Step: 373630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:37,399-Speed 6317.00 samples/sec Loss 5.1560 LearningRate 0.0004 Epoch: 18 Global Step: 373640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:40,641-Speed 6319.11 samples/sec Loss 5.1798 LearningRate 0.0004 Epoch: 18 Global Step: 373650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:43,877-Speed 6330.22 samples/sec Loss 5.1912 LearningRate 0.0004 Epoch: 18 Global Step: 373660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:47,117-Speed 6320.66 samples/sec Loss 5.2427 LearningRate 0.0004 Epoch: 18 Global Step: 373670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:43:50,341-Speed 6354.59 samples/sec Loss 5.1863 LearningRate 0.0004 Epoch: 18 Global Step: 373680 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:43:53,579-Speed 6327.02 samples/sec Loss 5.1662 LearningRate 0.0004 Epoch: 18 Global Step: 373690 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:43:56,813-Speed 6334.01 samples/sec Loss 5.1816 LearningRate 0.0004 Epoch: 18 Global Step: 373700 Fp16 Grad 
Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:00,056-Speed 6317.28 samples/sec Loss 5.1530 LearningRate 0.0004 Epoch: 18 Global Step: 373710 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:03,296-Speed 6324.05 samples/sec Loss 5.1972 LearningRate 0.0004 Epoch: 18 Global Step: 373720 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:06,534-Speed 6325.85 samples/sec Loss 5.1759 LearningRate 0.0004 Epoch: 18 Global Step: 373730 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:09,768-Speed 6333.50 samples/sec Loss 5.2219 LearningRate 0.0004 Epoch: 18 Global Step: 373740 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:13,005-Speed 6327.76 samples/sec Loss 5.1320 LearningRate 0.0004 Epoch: 18 Global Step: 373750 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:16,243-Speed 6327.93 samples/sec Loss 5.1265 LearningRate 0.0004 Epoch: 18 Global Step: 373760 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:19,481-Speed 6325.50 samples/sec Loss 5.1656 LearningRate 0.0004 Epoch: 18 Global Step: 373770 Fp16 Grad Scale: 8192 Required: 41 hours Training: 2022-04-02 01:44:22,719-Speed 6326.17 samples/sec Loss 5.2028 LearningRate 0.0004 Epoch: 18 Global Step: 373780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:25,962-Speed 6316.23 samples/sec Loss 5.1411 LearningRate 0.0004 Epoch: 18 Global Step: 373790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:29,207-Speed 6314.03 samples/sec Loss 5.2417 LearningRate 0.0004 Epoch: 18 Global Step: 373800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:32,445-Speed 6325.46 samples/sec Loss 5.1903 LearningRate 0.0004 Epoch: 18 Global Step: 373810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:35,684-Speed 6325.22 samples/sec Loss 5.2264 LearningRate 0.0004 Epoch: 18 Global Step: 373820 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 01:44:38,926-Speed 6317.75 samples/sec Loss 5.2174 LearningRate 0.0004 Epoch: 18 Global Step: 373830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:42,172-Speed 6310.60 samples/sec Loss 5.1597 LearningRate 0.0004 Epoch: 18 Global Step: 373840 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:45,413-Speed 6319.20 samples/sec Loss 5.1619 LearningRate 0.0004 Epoch: 18 Global Step: 373850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:48,656-Speed 6316.75 samples/sec Loss 5.2203 LearningRate 0.0004 Epoch: 18 Global Step: 373860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:51,997-Speed 6132.14 samples/sec Loss 5.2278 LearningRate 0.0004 Epoch: 18 Global Step: 373870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:44:55,293-Speed 6213.85 samples/sec Loss 5.2661 LearningRate 0.0004 Epoch: 18 Global Step: 373880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:44:58,533-Speed 6324.75 samples/sec Loss 5.1927 LearningRate 0.0004 Epoch: 18 Global Step: 373890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:01,809-Speed 6251.69 samples/sec Loss 5.1444 LearningRate 0.0004 Epoch: 18 Global Step: 373900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:05,054-Speed 6314.11 samples/sec Loss 5.1773 LearningRate 0.0004 Epoch: 18 Global Step: 373910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:08,293-Speed 6325.00 samples/sec Loss 5.1771 LearningRate 0.0004 Epoch: 18 Global Step: 373920 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:11,531-Speed 6324.53 samples/sec Loss 5.1312 LearningRate 0.0004 Epoch: 18 Global Step: 373930 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:14,770-Speed 6325.12 samples/sec Loss 5.2300 LearningRate 0.0004 Epoch: 18 Global Step: 373940 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
01:45:18,013-Speed 6316.58 samples/sec Loss 5.1988 LearningRate 0.0004 Epoch: 18 Global Step: 373950 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:21,253-Speed 6322.28 samples/sec Loss 5.1860 LearningRate 0.0004 Epoch: 18 Global Step: 373960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:24,512-Speed 6285.48 samples/sec Loss 5.1286 LearningRate 0.0004 Epoch: 18 Global Step: 373970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:45:27,736-Speed 6352.62 samples/sec Loss 5.1952 LearningRate 0.0004 Epoch: 18 Global Step: 373980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:30,976-Speed 6323.89 samples/sec Loss 5.1832 LearningRate 0.0004 Epoch: 18 Global Step: 373990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:34,218-Speed 6317.12 samples/sec Loss 5.2086 LearningRate 0.0004 Epoch: 18 Global Step: 374000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:37,454-Speed 6331.27 samples/sec Loss 5.2038 LearningRate 0.0004 Epoch: 18 Global Step: 374010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:40,693-Speed 6324.21 samples/sec Loss 5.2317 LearningRate 0.0004 Epoch: 18 Global Step: 374020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:43,932-Speed 6324.09 samples/sec Loss 5.2009 LearningRate 0.0004 Epoch: 18 Global Step: 374030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:47,174-Speed 6317.92 samples/sec Loss 5.2014 LearningRate 0.0004 Epoch: 18 Global Step: 374040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:50,416-Speed 6321.55 samples/sec Loss 5.1397 LearningRate 0.0004 Epoch: 18 Global Step: 374050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:53,653-Speed 6328.28 samples/sec Loss 5.2275 LearningRate 0.0004 Epoch: 18 Global Step: 374060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:45:56,893-Speed 6323.20 
samples/sec Loss 5.2146 LearningRate 0.0004 Epoch: 18 Global Step: 374070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:00,130-Speed 6327.61 samples/sec Loss 5.1303 LearningRate 0.0004 Epoch: 18 Global Step: 374080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:03,369-Speed 6323.34 samples/sec Loss 5.1921 LearningRate 0.0004 Epoch: 18 Global Step: 374090 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:06,609-Speed 6324.92 samples/sec Loss 5.1144 LearningRate 0.0004 Epoch: 18 Global Step: 374100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:09,846-Speed 6326.92 samples/sec Loss 5.2084 LearningRate 0.0004 Epoch: 18 Global Step: 374110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:13,087-Speed 6321.28 samples/sec Loss 5.1458 LearningRate 0.0004 Epoch: 18 Global Step: 374120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:16,333-Speed 6310.63 samples/sec Loss 5.2657 LearningRate 0.0004 Epoch: 18 Global Step: 374130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:19,570-Speed 6328.18 samples/sec Loss 5.1489 LearningRate 0.0004 Epoch: 18 Global Step: 374140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:22,808-Speed 6326.49 samples/sec Loss 5.1635 LearningRate 0.0004 Epoch: 18 Global Step: 374150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:46:26,039-Speed 6340.40 samples/sec Loss 5.2032 LearningRate 0.0004 Epoch: 18 Global Step: 374160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:29,276-Speed 6327.75 samples/sec Loss 5.2101 LearningRate 0.0004 Epoch: 18 Global Step: 374170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:32,515-Speed 6324.67 samples/sec Loss 5.1892 LearningRate 0.0004 Epoch: 18 Global Step: 374180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:35,758-Speed 6316.73 samples/sec Loss 5.2121 
LearningRate 0.0004 Epoch: 18 Global Step: 374190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:38,999-Speed 6320.13 samples/sec Loss 5.2052 LearningRate 0.0004 Epoch: 18 Global Step: 374200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:42,239-Speed 6322.29 samples/sec Loss 5.1664 LearningRate 0.0004 Epoch: 18 Global Step: 374210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:45,479-Speed 6322.62 samples/sec Loss 5.1758 LearningRate 0.0004 Epoch: 18 Global Step: 374220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:48,720-Speed 6320.75 samples/sec Loss 5.2110 LearningRate 0.0004 Epoch: 18 Global Step: 374230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:51,964-Speed 6313.53 samples/sec Loss 5.1971 LearningRate 0.0004 Epoch: 18 Global Step: 374240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:55,264-Speed 6207.65 samples/sec Loss 5.2389 LearningRate 0.0004 Epoch: 18 Global Step: 374250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:46:58,547-Speed 6240.86 samples/sec Loss 5.2237 LearningRate 0.0004 Epoch: 18 Global Step: 374260 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:01,812-Speed 6274.19 samples/sec Loss 5.1995 LearningRate 0.0004 Epoch: 18 Global Step: 374270 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:05,087-Speed 6253.66 samples/sec Loss 5.2224 LearningRate 0.0004 Epoch: 18 Global Step: 374280 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:08,327-Speed 6322.62 samples/sec Loss 5.2091 LearningRate 0.0004 Epoch: 18 Global Step: 374290 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:11,569-Speed 6317.43 samples/sec Loss 5.1595 LearningRate 0.0004 Epoch: 18 Global Step: 374300 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:14,795-Speed 6349.99 samples/sec Loss 5.2073 LearningRate 0.0004 Epoch: 18 
Global Step: 374310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:18,035-Speed 6323.24 samples/sec Loss 5.2285 LearningRate 0.0004 Epoch: 18 Global Step: 374320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:21,278-Speed 6317.70 samples/sec Loss 5.2099 LearningRate 0.0004 Epoch: 18 Global Step: 374330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:24,518-Speed 6321.34 samples/sec Loss 5.2456 LearningRate 0.0004 Epoch: 18 Global Step: 374340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:27,757-Speed 6324.89 samples/sec Loss 5.1293 LearningRate 0.0004 Epoch: 18 Global Step: 374350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:30,997-Speed 6321.75 samples/sec Loss 5.1279 LearningRate 0.0004 Epoch: 18 Global Step: 374360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:34,238-Speed 6320.44 samples/sec Loss 5.1429 LearningRate 0.0004 Epoch: 18 Global Step: 374370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:37,486-Speed 6307.99 samples/sec Loss 5.1959 LearningRate 0.0004 Epoch: 18 Global Step: 374380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:40,726-Speed 6321.05 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 18 Global Step: 374390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:43,967-Speed 6321.89 samples/sec Loss 5.1942 LearningRate 0.0004 Epoch: 18 Global Step: 374400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:47:47,210-Speed 6315.76 samples/sec Loss 5.1548 LearningRate 0.0004 Epoch: 18 Global Step: 374410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:50,450-Speed 6322.29 samples/sec Loss 5.1450 LearningRate 0.0004 Epoch: 18 Global Step: 374420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:53,690-Speed 6322.06 samples/sec Loss 5.0980 LearningRate 0.0004 Epoch: 18 Global Step: 374430 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 01:47:56,931-Speed 6320.52 samples/sec Loss 5.2022 LearningRate 0.0004 Epoch: 18 Global Step: 374440 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:00,175-Speed 6315.83 samples/sec Loss 5.2021 LearningRate 0.0004 Epoch: 18 Global Step: 374450 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:03,417-Speed 6318.64 samples/sec Loss 5.1673 LearningRate 0.0004 Epoch: 18 Global Step: 374460 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:06,656-Speed 6323.04 samples/sec Loss 5.2869 LearningRate 0.0004 Epoch: 18 Global Step: 374470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:09,886-Speed 6341.66 samples/sec Loss 5.1748 LearningRate 0.0004 Epoch: 18 Global Step: 374480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:13,125-Speed 6324.80 samples/sec Loss 5.1794 LearningRate 0.0004 Epoch: 18 Global Step: 374490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:16,365-Speed 6323.57 samples/sec Loss 5.2683 LearningRate 0.0004 Epoch: 18 Global Step: 374500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:19,605-Speed 6321.84 samples/sec Loss 5.1036 LearningRate 0.0004 Epoch: 18 Global Step: 374510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:22,847-Speed 6317.53 samples/sec Loss 5.2502 LearningRate 0.0004 Epoch: 18 Global Step: 374520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:26,093-Speed 6312.65 samples/sec Loss 5.2519 LearningRate 0.0004 Epoch: 18 Global Step: 374530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:29,334-Speed 6320.46 samples/sec Loss 5.2338 LearningRate 0.0004 Epoch: 18 Global Step: 374540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:32,576-Speed 6318.92 samples/sec Loss 5.1387 LearningRate 0.0004 Epoch: 18 Global Step: 374550 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 01:48:35,817-Speed 6320.77 samples/sec Loss 5.1602 LearningRate 0.0004 Epoch: 18 Global Step: 374560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:39,059-Speed 6317.49 samples/sec Loss 5.2238 LearningRate 0.0004 Epoch: 18 Global Step: 374570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:48:42,308-Speed 6303.88 samples/sec Loss 5.1356 LearningRate 0.0004 Epoch: 18 Global Step: 374580 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:45,550-Speed 6320.07 samples/sec Loss 5.1386 LearningRate 0.0004 Epoch: 18 Global Step: 374590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:48,795-Speed 6313.01 samples/sec Loss 5.2161 LearningRate 0.0004 Epoch: 18 Global Step: 374600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:52,036-Speed 6318.94 samples/sec Loss 5.1107 LearningRate 0.0004 Epoch: 18 Global Step: 374610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:55,279-Speed 6316.36 samples/sec Loss 5.1365 LearningRate 0.0004 Epoch: 18 Global Step: 374620 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:48:58,523-Speed 6315.52 samples/sec Loss 5.1491 LearningRate 0.0004 Epoch: 18 Global Step: 374630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:01,770-Speed 6308.83 samples/sec Loss 5.1830 LearningRate 0.0004 Epoch: 18 Global Step: 374640 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:05,016-Speed 6311.35 samples/sec Loss 5.2084 LearningRate 0.0004 Epoch: 18 Global Step: 374650 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:08,256-Speed 6321.35 samples/sec Loss 5.2043 LearningRate 0.0004 Epoch: 18 Global Step: 374660 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:11,500-Speed 6313.72 samples/sec Loss 5.1741 LearningRate 0.0004 Epoch: 18 Global Step: 374670 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
01:49:14,727-Speed 6350.24 samples/sec Loss 5.2155 LearningRate 0.0004 Epoch: 18 Global Step: 374680 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:17,971-Speed 6314.57 samples/sec Loss 5.1795 LearningRate 0.0004 Epoch: 18 Global Step: 374690 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:21,214-Speed 6317.08 samples/sec Loss 5.2687 LearningRate 0.0004 Epoch: 18 Global Step: 374700 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:49:24,443-Speed 6343.41 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 18 Global Step: 374710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:27,694-Speed 6301.60 samples/sec Loss 5.1940 LearningRate 0.0004 Epoch: 18 Global Step: 374720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:30,936-Speed 6319.07 samples/sec Loss 5.1181 LearningRate 0.0004 Epoch: 18 Global Step: 374730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:34,181-Speed 6312.85 samples/sec Loss 5.2767 LearningRate 0.0004 Epoch: 18 Global Step: 374740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:37,484-Speed 6201.61 samples/sec Loss 5.1881 LearningRate 0.0004 Epoch: 18 Global Step: 374750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:40,733-Speed 6304.38 samples/sec Loss 5.2464 LearningRate 0.0004 Epoch: 18 Global Step: 374760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:43,978-Speed 6313.26 samples/sec Loss 5.0943 LearningRate 0.0004 Epoch: 18 Global Step: 374770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:47,217-Speed 6323.85 samples/sec Loss 5.1503 LearningRate 0.0004 Epoch: 18 Global Step: 374780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:50,461-Speed 6315.69 samples/sec Loss 5.1579 LearningRate 0.0004 Epoch: 18 Global Step: 374790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:53,715-Speed 6294.68 
samples/sec Loss 5.0676 LearningRate 0.0004 Epoch: 18 Global Step: 374800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:49:56,960-Speed 6312.77 samples/sec Loss 5.2272 LearningRate 0.0004 Epoch: 18 Global Step: 374810 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:00,204-Speed 6314.50 samples/sec Loss 5.1958 LearningRate 0.0004 Epoch: 18 Global Step: 374820 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:03,455-Speed 6301.18 samples/sec Loss 5.2516 LearningRate 0.0004 Epoch: 18 Global Step: 374830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:06,702-Speed 6308.14 samples/sec Loss 5.1544 LearningRate 0.0004 Epoch: 18 Global Step: 374840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:09,948-Speed 6311.49 samples/sec Loss 5.1281 LearningRate 0.0004 Epoch: 18 Global Step: 374850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:13,181-Speed 6335.90 samples/sec Loss 5.1335 LearningRate 0.0004 Epoch: 18 Global Step: 374860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:16,431-Speed 6302.83 samples/sec Loss 5.1840 LearningRate 0.0004 Epoch: 18 Global Step: 374870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:19,673-Speed 6319.13 samples/sec Loss 5.1760 LearningRate 0.0004 Epoch: 18 Global Step: 374880 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:22,917-Speed 6313.47 samples/sec Loss 5.2053 LearningRate 0.0004 Epoch: 18 Global Step: 374890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:26,157-Speed 6322.47 samples/sec Loss 5.1535 LearningRate 0.0004 Epoch: 18 Global Step: 374900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:29,398-Speed 6320.28 samples/sec Loss 5.1708 LearningRate 0.0004 Epoch: 18 Global Step: 374910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:32,642-Speed 6314.68 samples/sec Loss 5.2296 
LearningRate 0.0004 Epoch: 18 Global Step: 374920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:35,883-Speed 6320.84 samples/sec Loss 5.1548 LearningRate 0.0004 Epoch: 18 Global Step: 374930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:39,125-Speed 6318.61 samples/sec Loss 5.1407 LearningRate 0.0004 Epoch: 18 Global Step: 374940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:42,371-Speed 6310.08 samples/sec Loss 5.2279 LearningRate 0.0004 Epoch: 18 Global Step: 374950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:50:45,614-Speed 6318.16 samples/sec Loss 5.1172 LearningRate 0.0004 Epoch: 18 Global Step: 374960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:48,859-Speed 6312.39 samples/sec Loss 5.2644 LearningRate 0.0004 Epoch: 18 Global Step: 374970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:52,103-Speed 6315.08 samples/sec Loss 5.2406 LearningRate 0.0004 Epoch: 18 Global Step: 374980 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:55,346-Speed 6315.40 samples/sec Loss 5.2067 LearningRate 0.0004 Epoch: 18 Global Step: 374990 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:50:58,588-Speed 6319.70 samples/sec Loss 5.1689 LearningRate 0.0004 Epoch: 18 Global Step: 375000 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:51:01,839-Speed 6301.02 samples/sec Loss 5.2322 LearningRate 0.0004 Epoch: 18 Global Step: 375010 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:51:05,069-Speed 6342.06 samples/sec Loss 5.2163 LearningRate 0.0004 Epoch: 18 Global Step: 375020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:08,314-Speed 6311.28 samples/sec Loss 5.2044 LearningRate 0.0004 Epoch: 18 Global Step: 375030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:11,556-Speed 6318.34 samples/sec Loss 5.2731 LearningRate 0.0004 Epoch: 18 
Global Step: 375040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:14,800-Speed 6315.29 samples/sec Loss 5.1598 LearningRate 0.0004 Epoch: 18 Global Step: 375050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:18,044-Speed 6314.81 samples/sec Loss 5.1683 LearningRate 0.0004 Epoch: 18 Global Step: 375060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:21,288-Speed 6314.31 samples/sec Loss 5.2113 LearningRate 0.0004 Epoch: 18 Global Step: 375070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:24,529-Speed 6321.62 samples/sec Loss 5.1191 LearningRate 0.0004 Epoch: 18 Global Step: 375080 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:27,773-Speed 6314.00 samples/sec Loss 5.1343 LearningRate 0.0004 Epoch: 18 Global Step: 375090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:31,019-Speed 6310.86 samples/sec Loss 5.1990 LearningRate 0.0004 Epoch: 18 Global Step: 375100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:34,267-Speed 6306.59 samples/sec Loss 5.1841 LearningRate 0.0004 Epoch: 18 Global Step: 375110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:37,510-Speed 6315.80 samples/sec Loss 5.1817 LearningRate 0.0004 Epoch: 18 Global Step: 375120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:51:40,756-Speed 6314.58 samples/sec Loss 5.1576 LearningRate 0.0004 Epoch: 18 Global Step: 375130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:51:43,983-Speed 6346.56 samples/sec Loss 5.1556 LearningRate 0.0004 Epoch: 18 Global Step: 375140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:47,231-Speed 6307.73 samples/sec Loss 5.2168 LearningRate 0.0004 Epoch: 18 Global Step: 375150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:50,474-Speed 6317.09 samples/sec Loss 5.1978 LearningRate 0.0004 Epoch: 18 Global Step: 375160 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:53,717-Speed 6316.29 samples/sec Loss 5.1425 LearningRate 0.0004 Epoch: 18 Global Step: 375170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:51:56,959-Speed 6320.01 samples/sec Loss 5.1822 LearningRate 0.0004 Epoch: 18 Global Step: 375180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:00,215-Speed 6290.11 samples/sec Loss 5.1805 LearningRate 0.0004 Epoch: 18 Global Step: 375190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:03,468-Speed 6298.79 samples/sec Loss 5.1698 LearningRate 0.0004 Epoch: 18 Global Step: 375200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:06,716-Speed 6306.37 samples/sec Loss 5.1728 LearningRate 0.0004 Epoch: 18 Global Step: 375210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:09,959-Speed 6315.70 samples/sec Loss 5.1209 LearningRate 0.0004 Epoch: 18 Global Step: 375220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:13,205-Speed 6311.28 samples/sec Loss 5.0804 LearningRate 0.0004 Epoch: 18 Global Step: 375230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:16,431-Speed 6348.77 samples/sec Loss 5.1885 LearningRate 0.0004 Epoch: 18 Global Step: 375240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:19,674-Speed 6317.15 samples/sec Loss 5.1098 LearningRate 0.0004 Epoch: 18 Global Step: 375250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:22,920-Speed 6311.51 samples/sec Loss 5.2249 LearningRate 0.0004 Epoch: 18 Global Step: 375260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:26,164-Speed 6314.03 samples/sec Loss 5.1026 LearningRate 0.0004 Epoch: 18 Global Step: 375270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:29,409-Speed 6311.45 samples/sec Loss 5.1211 LearningRate 0.0004 Epoch: 18 Global Step: 375280 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 01:52:32,651-Speed 6320.33 samples/sec Loss 5.2436 LearningRate 0.0004 Epoch: 18 Global Step: 375290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:35,903-Speed 6297.84 samples/sec Loss 5.2003 LearningRate 0.0004 Epoch: 18 Global Step: 375300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:39,149-Speed 6310.62 samples/sec Loss 5.1887 LearningRate 0.0004 Epoch: 18 Global Step: 375310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:42,390-Speed 6320.25 samples/sec Loss 5.1116 LearningRate 0.0004 Epoch: 18 Global Step: 375320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:45,636-Speed 6312.07 samples/sec Loss 5.2021 LearningRate 0.0004 Epoch: 18 Global Step: 375330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:48,878-Speed 6318.00 samples/sec Loss 5.1114 LearningRate 0.0004 Epoch: 18 Global Step: 375340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:52:52,123-Speed 6311.52 samples/sec Loss 5.1947 LearningRate 0.0004 Epoch: 18 Global Step: 375350 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:52:55,355-Speed 6338.97 samples/sec Loss 5.1309 LearningRate 0.0004 Epoch: 18 Global Step: 375360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:52:58,599-Speed 6315.58 samples/sec Loss 5.1986 LearningRate 0.0004 Epoch: 18 Global Step: 375370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:01,841-Speed 6319.34 samples/sec Loss 5.2458 LearningRate 0.0004 Epoch: 18 Global Step: 375380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:05,097-Speed 6291.10 samples/sec Loss 5.1648 LearningRate 0.0004 Epoch: 18 Global Step: 375390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:08,338-Speed 6319.27 samples/sec Loss 5.2171 LearningRate 0.0004 Epoch: 18 Global Step: 375400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
01:53:11,580-Speed 6318.98 samples/sec Loss 5.2127 LearningRate 0.0004 Epoch: 18 Global Step: 375410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:14,825-Speed 6312.28 samples/sec Loss 5.1805 LearningRate 0.0004 Epoch: 18 Global Step: 375420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:18,069-Speed 6314.51 samples/sec Loss 5.2182 LearningRate 0.0004 Epoch: 18 Global Step: 375430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:21,314-Speed 6312.13 samples/sec Loss 5.1813 LearningRate 0.0004 Epoch: 18 Global Step: 375440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:24,561-Speed 6310.58 samples/sec Loss 5.1612 LearningRate 0.0004 Epoch: 18 Global Step: 375450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:27,802-Speed 6320.13 samples/sec Loss 5.1660 LearningRate 0.0004 Epoch: 18 Global Step: 375460 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:53:31,045-Speed 6319.65 samples/sec Loss 5.1599 LearningRate 0.0004 Epoch: 18 Global Step: 375470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:53:34,285-Speed 6321.16 samples/sec Loss 5.2426 LearningRate 0.0004 Epoch: 18 Global Step: 375480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:53:37,517-Speed 6338.45 samples/sec Loss 5.1776 LearningRate 0.0004 Epoch: 18 Global Step: 375490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:40,761-Speed 6315.14 samples/sec Loss 5.2056 LearningRate 0.0004 Epoch: 18 Global Step: 375500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:44,006-Speed 6312.07 samples/sec Loss 5.1955 LearningRate 0.0004 Epoch: 18 Global Step: 375510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:47,249-Speed 6317.06 samples/sec Loss 5.1654 LearningRate 0.0004 Epoch: 18 Global Step: 375520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:50,490-Speed 6320.76 
samples/sec Loss 5.1788 LearningRate 0.0004 Epoch: 18 Global Step: 375530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:53,734-Speed 6313.31 samples/sec Loss 5.2499 LearningRate 0.0004 Epoch: 18 Global Step: 375540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:53:56,977-Speed 6316.32 samples/sec Loss 5.2024 LearningRate 0.0004 Epoch: 18 Global Step: 375550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:00,221-Speed 6314.80 samples/sec Loss 5.1635 LearningRate 0.0004 Epoch: 18 Global Step: 375560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:03,468-Speed 6312.96 samples/sec Loss 5.1910 LearningRate 0.0004 Epoch: 18 Global Step: 375570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:06,717-Speed 6304.09 samples/sec Loss 5.2286 LearningRate 0.0004 Epoch: 18 Global Step: 375580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:09,963-Speed 6311.07 samples/sec Loss 5.1763 LearningRate 0.0004 Epoch: 18 Global Step: 375590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:54:13,190-Speed 6348.31 samples/sec Loss 5.1613 LearningRate 0.0004 Epoch: 18 Global Step: 375600 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:16,435-Speed 6313.79 samples/sec Loss 5.1909 LearningRate 0.0004 Epoch: 18 Global Step: 375610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:19,678-Speed 6315.02 samples/sec Loss 5.1367 LearningRate 0.0004 Epoch: 18 Global Step: 375620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:22,936-Speed 6287.83 samples/sec Loss 5.1655 LearningRate 0.0004 Epoch: 18 Global Step: 375630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:26,185-Speed 6304.79 samples/sec Loss 5.2133 LearningRate 0.0004 Epoch: 18 Global Step: 375640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:29,429-Speed 6315.29 samples/sec Loss 5.1501 
LearningRate 0.0004 Epoch: 18 Global Step: 375650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:32,675-Speed 6310.27 samples/sec Loss 5.2069 LearningRate 0.0004 Epoch: 18 Global Step: 375660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:35,917-Speed 6319.09 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 18 Global Step: 375670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:39,161-Speed 6314.33 samples/sec Loss 5.1456 LearningRate 0.0004 Epoch: 18 Global Step: 375680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:42,408-Speed 6308.47 samples/sec Loss 5.1764 LearningRate 0.0004 Epoch: 18 Global Step: 375690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:45,658-Speed 6303.08 samples/sec Loss 5.0578 LearningRate 0.0004 Epoch: 18 Global Step: 375700 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:54:48,903-Speed 6313.39 samples/sec Loss 5.1018 LearningRate 0.0004 Epoch: 18 Global Step: 375710 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:54:52,148-Speed 6312.58 samples/sec Loss 5.1747 LearningRate 0.0004 Epoch: 18 Global Step: 375720 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:54:55,380-Speed 6337.78 samples/sec Loss 5.2534 LearningRate 0.0004 Epoch: 18 Global Step: 375730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:54:58,623-Speed 6316.54 samples/sec Loss 5.1693 LearningRate 0.0004 Epoch: 18 Global Step: 375740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:01,872-Speed 6304.72 samples/sec Loss 5.1271 LearningRate 0.0004 Epoch: 18 Global Step: 375750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:05,114-Speed 6318.21 samples/sec Loss 5.1404 LearningRate 0.0004 Epoch: 18 Global Step: 375760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:08,358-Speed 6315.09 samples/sec Loss 5.2062 LearningRate 0.0004 Epoch: 18 
Global Step: 375770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:11,601-Speed 6315.67 samples/sec Loss 5.1154 LearningRate 0.0004 Epoch: 18 Global Step: 375780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:14,847-Speed 6312.46 samples/sec Loss 5.1917 LearningRate 0.0004 Epoch: 18 Global Step: 375790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:18,096-Speed 6305.04 samples/sec Loss 5.2013 LearningRate 0.0004 Epoch: 18 Global Step: 375800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:21,338-Speed 6317.70 samples/sec Loss 5.1207 LearningRate 0.0004 Epoch: 18 Global Step: 375810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:24,583-Speed 6313.42 samples/sec Loss 5.2023 LearningRate 0.0004 Epoch: 18 Global Step: 375820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:55:27,833-Speed 6302.23 samples/sec Loss 5.1210 LearningRate 0.0004 Epoch: 18 Global Step: 375830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:31,084-Speed 6302.16 samples/sec Loss 5.2011 LearningRate 0.0004 Epoch: 18 Global Step: 375840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:34,332-Speed 6306.14 samples/sec Loss 5.1218 LearningRate 0.0004 Epoch: 18 Global Step: 375850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:37,574-Speed 6319.18 samples/sec Loss 5.1880 LearningRate 0.0004 Epoch: 18 Global Step: 375860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:40,818-Speed 6313.37 samples/sec Loss 5.2045 LearningRate 0.0004 Epoch: 18 Global Step: 375870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:44,062-Speed 6315.62 samples/sec Loss 5.1799 LearningRate 0.0004 Epoch: 18 Global Step: 375880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:47,322-Speed 6282.48 samples/sec Loss 5.1776 LearningRate 0.0004 Epoch: 18 Global Step: 375890 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:50,570-Speed 6308.38 samples/sec Loss 5.2544 LearningRate 0.0004 Epoch: 18 Global Step: 375900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:53,809-Speed 6322.89 samples/sec Loss 5.2108 LearningRate 0.0004 Epoch: 18 Global Step: 375910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:55:57,053-Speed 6315.75 samples/sec Loss 5.1389 LearningRate 0.0004 Epoch: 18 Global Step: 375920 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:00,288-Speed 6331.51 samples/sec Loss 5.1344 LearningRate 0.0004 Epoch: 18 Global Step: 375930 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:03,534-Speed 6311.36 samples/sec Loss 5.1071 LearningRate 0.0004 Epoch: 18 Global Step: 375940 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:06,781-Speed 6308.01 samples/sec Loss 5.1336 LearningRate 0.0004 Epoch: 18 Global Step: 375950 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:10,024-Speed 6317.04 samples/sec Loss 5.1798 LearningRate 0.0004 Epoch: 18 Global Step: 375960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:13,264-Speed 6321.06 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 375970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:16,514-Speed 6304.52 samples/sec Loss 5.1416 LearningRate 0.0004 Epoch: 18 Global Step: 375980 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:19,761-Speed 6308.05 samples/sec Loss 5.1337 LearningRate 0.0004 Epoch: 18 Global Step: 375990 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:22,997-Speed 6331.90 samples/sec Loss 5.1873 LearningRate 0.0004 Epoch: 18 Global Step: 376000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:26,239-Speed 6318.43 samples/sec Loss 5.1876 LearningRate 0.0004 Epoch: 18 Global Step: 376010 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 01:56:29,483-Speed 6314.45 samples/sec Loss 5.1603 LearningRate 0.0004 Epoch: 18 Global Step: 376020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:32,731-Speed 6310.59 samples/sec Loss 5.2300 LearningRate 0.0004 Epoch: 18 Global Step: 376030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:35,973-Speed 6317.93 samples/sec Loss 5.1755 LearningRate 0.0004 Epoch: 18 Global Step: 376040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:39,219-Speed 6309.88 samples/sec Loss 5.1930 LearningRate 0.0004 Epoch: 18 Global Step: 376050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:42,458-Speed 6324.50 samples/sec Loss 5.2095 LearningRate 0.0004 Epoch: 18 Global Step: 376060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:45,712-Speed 6296.97 samples/sec Loss 5.1824 LearningRate 0.0004 Epoch: 18 Global Step: 376070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:48,973-Speed 6279.91 samples/sec Loss 5.0659 LearningRate 0.0004 Epoch: 18 Global Step: 376080 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:52,220-Speed 6310.03 samples/sec Loss 5.1152 LearningRate 0.0004 Epoch: 18 Global Step: 376090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:56:55,475-Speed 6292.91 samples/sec Loss 5.1300 LearningRate 0.0004 Epoch: 18 Global Step: 376100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:56:58,721-Speed 6309.99 samples/sec Loss 5.1521 LearningRate 0.0004 Epoch: 18 Global Step: 376110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:57:01,969-Speed 6307.16 samples/sec Loss 5.2332 LearningRate 0.0004 Epoch: 18 Global Step: 376120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:57:05,217-Speed 6307.68 samples/sec Loss 5.3005 LearningRate 0.0004 Epoch: 18 Global Step: 376130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
01:57:08,475-Speed 6285.91 samples/sec Loss 5.1736 LearningRate 0.0004 Epoch: 18 Global Step: 376140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:57:11,708-Speed 6337.32 samples/sec Loss 5.1929 LearningRate 0.0004 Epoch: 18 Global Step: 376150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:14,952-Speed 6314.89 samples/sec Loss 5.1570 LearningRate 0.0004 Epoch: 18 Global Step: 376160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:18,201-Speed 6305.08 samples/sec Loss 5.1359 LearningRate 0.0004 Epoch: 18 Global Step: 376170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:21,456-Speed 6292.28 samples/sec Loss 5.1796 LearningRate 0.0004 Epoch: 18 Global Step: 376180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:24,697-Speed 6319.80 samples/sec Loss 5.1774 LearningRate 0.0004 Epoch: 18 Global Step: 376190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:27,986-Speed 6229.07 samples/sec Loss 5.2039 LearningRate 0.0004 Epoch: 18 Global Step: 376200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:31,262-Speed 6253.93 samples/sec Loss 5.1035 LearningRate 0.0004 Epoch: 18 Global Step: 376210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:34,517-Speed 6292.62 samples/sec Loss 5.1476 LearningRate 0.0004 Epoch: 18 Global Step: 376220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:37,763-Speed 6311.65 samples/sec Loss 5.1089 LearningRate 0.0004 Epoch: 18 Global Step: 376230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:41,005-Speed 6317.88 samples/sec Loss 5.2751 LearningRate 0.0004 Epoch: 18 Global Step: 376240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:44,249-Speed 6315.56 samples/sec Loss 5.2345 LearningRate 0.0004 Epoch: 18 Global Step: 376250 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:57:47,480-Speed 6338.91 
samples/sec Loss 5.1771 LearningRate 0.0004 Epoch: 18 Global Step: 376260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:50,729-Speed 6305.28 samples/sec Loss 5.2436 LearningRate 0.0004 Epoch: 18 Global Step: 376270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:53,974-Speed 6311.37 samples/sec Loss 5.1756 LearningRate 0.0004 Epoch: 18 Global Step: 376280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:57:57,220-Speed 6311.35 samples/sec Loss 5.1760 LearningRate 0.0004 Epoch: 18 Global Step: 376290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:00,466-Speed 6311.15 samples/sec Loss 5.1558 LearningRate 0.0004 Epoch: 18 Global Step: 376300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:03,707-Speed 6319.92 samples/sec Loss 5.1671 LearningRate 0.0004 Epoch: 18 Global Step: 376310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:06,951-Speed 6314.62 samples/sec Loss 5.1536 LearningRate 0.0004 Epoch: 18 Global Step: 376320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:10,193-Speed 6318.19 samples/sec Loss 5.2045 LearningRate 0.0004 Epoch: 18 Global Step: 376330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:13,444-Speed 6301.56 samples/sec Loss 5.2385 LearningRate 0.0004 Epoch: 18 Global Step: 376340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:16,687-Speed 6316.24 samples/sec Loss 5.1256 LearningRate 0.0004 Epoch: 18 Global Step: 376350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:19,935-Speed 6306.50 samples/sec Loss 5.1680 LearningRate 0.0004 Epoch: 18 Global Step: 376360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:23,180-Speed 6313.60 samples/sec Loss 5.1935 LearningRate 0.0004 Epoch: 18 Global Step: 376370 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:26,461-Speed 6243.57 samples/sec Loss 5.2221 
LearningRate 0.0004 Epoch: 18 Global Step: 376380 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:29,717-Speed 6291.74 samples/sec Loss 5.1903 LearningRate 0.0004 Epoch: 18 Global Step: 376390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:32,966-Speed 6304.16 samples/sec Loss 5.1745 LearningRate 0.0004 Epoch: 18 Global Step: 376400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:36,212-Speed 6310.54 samples/sec Loss 5.2031 LearningRate 0.0004 Epoch: 18 Global Step: 376410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:39,455-Speed 6317.44 samples/sec Loss 5.1874 LearningRate 0.0004 Epoch: 18 Global Step: 376420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:42,750-Speed 6216.39 samples/sec Loss 5.1572 LearningRate 0.0004 Epoch: 18 Global Step: 376430 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:58:45,981-Speed 6340.51 samples/sec Loss 5.1864 LearningRate 0.0004 Epoch: 18 Global Step: 376440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:49,228-Speed 6309.80 samples/sec Loss 5.1228 LearningRate 0.0004 Epoch: 18 Global Step: 376450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:52,469-Speed 6319.21 samples/sec Loss 5.1246 LearningRate 0.0004 Epoch: 18 Global Step: 376460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:55,713-Speed 6314.66 samples/sec Loss 5.2583 LearningRate 0.0004 Epoch: 18 Global Step: 376470 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:58:58,957-Speed 6314.98 samples/sec Loss 5.1512 LearningRate 0.0004 Epoch: 18 Global Step: 376480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:02,203-Speed 6312.15 samples/sec Loss 5.1812 LearningRate 0.0004 Epoch: 18 Global Step: 376490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:05,448-Speed 6312.39 samples/sec Loss 5.1375 LearningRate 0.0004 Epoch: 18 
Global Step: 376500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:08,690-Speed 6316.92 samples/sec Loss 5.1736 LearningRate 0.0004 Epoch: 18 Global Step: 376510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:11,937-Speed 6309.32 samples/sec Loss 5.1614 LearningRate 0.0004 Epoch: 18 Global Step: 376520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:15,184-Speed 6309.68 samples/sec Loss 5.1746 LearningRate 0.0004 Epoch: 18 Global Step: 376530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:18,430-Speed 6309.47 samples/sec Loss 5.2046 LearningRate 0.0004 Epoch: 18 Global Step: 376540 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:59:21,662-Speed 6338.33 samples/sec Loss 5.1632 LearningRate 0.0004 Epoch: 18 Global Step: 376550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:24,908-Speed 6312.07 samples/sec Loss 5.1878 LearningRate 0.0004 Epoch: 18 Global Step: 376560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:28,149-Speed 6319.63 samples/sec Loss 5.1735 LearningRate 0.0004 Epoch: 18 Global Step: 376570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:31,401-Speed 6299.56 samples/sec Loss 5.1651 LearningRate 0.0004 Epoch: 18 Global Step: 376580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:34,645-Speed 6313.41 samples/sec Loss 5.1948 LearningRate 0.0004 Epoch: 18 Global Step: 376590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:37,913-Speed 6269.32 samples/sec Loss 5.2506 LearningRate 0.0004 Epoch: 18 Global Step: 376600 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:41,158-Speed 6312.18 samples/sec Loss 5.1694 LearningRate 0.0004 Epoch: 18 Global Step: 376610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:44,414-Speed 6291.36 samples/sec Loss 5.2436 LearningRate 0.0004 Epoch: 18 Global Step: 376620 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:47,660-Speed 6311.16 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 376630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:50,918-Speed 6297.30 samples/sec Loss 5.1342 LearningRate 0.0004 Epoch: 18 Global Step: 376640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 01:59:54,164-Speed 6310.77 samples/sec Loss 5.1490 LearningRate 0.0004 Epoch: 18 Global Step: 376650 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 01:59:57,406-Speed 6319.41 samples/sec Loss 5.1863 LearningRate 0.0004 Epoch: 18 Global Step: 376660 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:00,653-Speed 6307.33 samples/sec Loss 5.1715 LearningRate 0.0004 Epoch: 18 Global Step: 376670 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:03,900-Speed 6309.34 samples/sec Loss 5.1298 LearningRate 0.0004 Epoch: 18 Global Step: 376680 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:07,140-Speed 6322.18 samples/sec Loss 5.1326 LearningRate 0.0004 Epoch: 18 Global Step: 376690 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:10,386-Speed 6311.64 samples/sec Loss 5.1444 LearningRate 0.0004 Epoch: 18 Global Step: 376700 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:13,618-Speed 6338.52 samples/sec Loss 5.0722 LearningRate 0.0004 Epoch: 18 Global Step: 376710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:16,867-Speed 6303.93 samples/sec Loss 5.1108 LearningRate 0.0004 Epoch: 18 Global Step: 376720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:20,108-Speed 6321.20 samples/sec Loss 5.1334 LearningRate 0.0004 Epoch: 18 Global Step: 376730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:23,361-Speed 6296.26 samples/sec Loss 5.1131 LearningRate 0.0004 Epoch: 18 Global Step: 376740 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:00:26,612-Speed 6300.24 samples/sec Loss 5.0871 LearningRate 0.0004 Epoch: 18 Global Step: 376750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:29,860-Speed 6308.23 samples/sec Loss 5.1170 LearningRate 0.0004 Epoch: 18 Global Step: 376760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:33,102-Speed 6318.32 samples/sec Loss 5.1370 LearningRate 0.0004 Epoch: 18 Global Step: 376770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:36,345-Speed 6316.24 samples/sec Loss 5.1674 LearningRate 0.0004 Epoch: 18 Global Step: 376780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:39,603-Speed 6287.37 samples/sec Loss 5.1971 LearningRate 0.0004 Epoch: 18 Global Step: 376790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:42,885-Speed 6240.69 samples/sec Loss 5.2638 LearningRate 0.0004 Epoch: 18 Global Step: 376800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:00:46,130-Speed 6312.99 samples/sec Loss 5.0216 LearningRate 0.0004 Epoch: 18 Global Step: 376810 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:49,377-Speed 6309.70 samples/sec Loss 5.1708 LearningRate 0.0004 Epoch: 18 Global Step: 376820 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:52,621-Speed 6314.79 samples/sec Loss 5.1307 LearningRate 0.0004 Epoch: 18 Global Step: 376830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:55,865-Speed 6315.30 samples/sec Loss 5.2350 LearningRate 0.0004 Epoch: 18 Global Step: 376840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:00:59,092-Speed 6347.45 samples/sec Loss 5.1341 LearningRate 0.0004 Epoch: 18 Global Step: 376850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:02,338-Speed 6311.06 samples/sec Loss 5.1542 LearningRate 0.0004 Epoch: 18 Global Step: 376860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:01:05,581-Speed 6316.71 samples/sec Loss 5.1429 LearningRate 0.0004 Epoch: 18 Global Step: 376870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:08,823-Speed 6319.05 samples/sec Loss 5.1564 LearningRate 0.0004 Epoch: 18 Global Step: 376880 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:12,070-Speed 6307.62 samples/sec Loss 5.1734 LearningRate 0.0004 Epoch: 18 Global Step: 376890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:15,314-Speed 6314.01 samples/sec Loss 5.1950 LearningRate 0.0004 Epoch: 18 Global Step: 376900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:18,556-Speed 6319.36 samples/sec Loss 5.1781 LearningRate 0.0004 Epoch: 18 Global Step: 376910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:21,798-Speed 6317.85 samples/sec Loss 5.2012 LearningRate 0.0004 Epoch: 18 Global Step: 376920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:25,042-Speed 6315.26 samples/sec Loss 5.1734 LearningRate 0.0004 Epoch: 18 Global Step: 376930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:28,301-Speed 6285.19 samples/sec Loss 5.1990 LearningRate 0.0004 Epoch: 18 Global Step: 376940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:31,543-Speed 6318.74 samples/sec Loss 5.1837 LearningRate 0.0004 Epoch: 18 Global Step: 376950 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:01:34,785-Speed 6317.66 samples/sec Loss 5.1256 LearningRate 0.0004 Epoch: 18 Global Step: 376960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:01:38,032-Speed 6308.72 samples/sec Loss 5.1494 LearningRate 0.0004 Epoch: 18 Global Step: 376970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:01:41,266-Speed 6334.10 samples/sec Loss 5.2264 LearningRate 0.0004 Epoch: 18 Global Step: 376980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:44,507-Speed 6321.65 
samples/sec Loss 5.2650 LearningRate 0.0004 Epoch: 18 Global Step: 376990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:47,758-Speed 6299.76 samples/sec Loss 5.1149 LearningRate 0.0004 Epoch: 18 Global Step: 377000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:51,006-Speed 6307.53 samples/sec Loss 5.1622 LearningRate 0.0004 Epoch: 18 Global Step: 377010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:54,250-Speed 6314.80 samples/sec Loss 5.1386 LearningRate 0.0004 Epoch: 18 Global Step: 377020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:01:57,493-Speed 6316.89 samples/sec Loss 5.1641 LearningRate 0.0004 Epoch: 18 Global Step: 377030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:00,739-Speed 6310.19 samples/sec Loss 5.1538 LearningRate 0.0004 Epoch: 18 Global Step: 377040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:03,982-Speed 6316.42 samples/sec Loss 5.1824 LearningRate 0.0004 Epoch: 18 Global Step: 377050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:07,232-Speed 6303.46 samples/sec Loss 5.1856 LearningRate 0.0004 Epoch: 18 Global Step: 377060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:10,475-Speed 6316.79 samples/sec Loss 5.1182 LearningRate 0.0004 Epoch: 18 Global Step: 377070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:13,721-Speed 6311.31 samples/sec Loss 5.2075 LearningRate 0.0004 Epoch: 18 Global Step: 377080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:02:16,955-Speed 6334.11 samples/sec Loss 5.1835 LearningRate 0.0004 Epoch: 18 Global Step: 377090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:20,198-Speed 6316.15 samples/sec Loss 5.1712 LearningRate 0.0004 Epoch: 18 Global Step: 377100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:23,444-Speed 6310.72 samples/sec Loss 5.1254 
LearningRate 0.0004 Epoch: 18 Global Step: 377110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:26,689-Speed 6312.39 samples/sec Loss 5.1350 LearningRate 0.0004 Epoch: 18 Global Step: 377120 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:29,933-Speed 6314.18 samples/sec Loss 5.1232 LearningRate 0.0004 Epoch: 18 Global Step: 377130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:33,184-Speed 6302.00 samples/sec Loss 5.2271 LearningRate 0.0004 Epoch: 18 Global Step: 377140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:36,434-Speed 6303.40 samples/sec Loss 5.1895 LearningRate 0.0004 Epoch: 18 Global Step: 377150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:39,680-Speed 6309.95 samples/sec Loss 5.2064 LearningRate 0.0004 Epoch: 18 Global Step: 377160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:42,922-Speed 6317.64 samples/sec Loss 5.1250 LearningRate 0.0004 Epoch: 18 Global Step: 377170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:46,170-Speed 6307.24 samples/sec Loss 5.1857 LearningRate 0.0004 Epoch: 18 Global Step: 377180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:02:49,411-Speed 6320.19 samples/sec Loss 5.1293 LearningRate 0.0004 Epoch: 18 Global Step: 377190 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:02:52,656-Speed 6312.39 samples/sec Loss 5.1288 LearningRate 0.0004 Epoch: 18 Global Step: 377200 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:02:55,901-Speed 6312.86 samples/sec Loss 5.1680 LearningRate 0.0004 Epoch: 18 Global Step: 377210 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:02:59,149-Speed 6308.11 samples/sec Loss 5.2030 LearningRate 0.0004 Epoch: 18 Global Step: 377220 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:02,396-Speed 6308.38 samples/sec Loss 5.1686 LearningRate 0.0004 Epoch: 18 
Global Step: 377230 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:05,639-Speed 6316.27 samples/sec Loss 5.1490 LearningRate 0.0004 Epoch: 18 Global Step: 377240 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:08,886-Speed 6309.54 samples/sec Loss 5.1535 LearningRate 0.0004 Epoch: 18 Global Step: 377250 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:12,128-Speed 6318.93 samples/sec Loss 5.2303 LearningRate 0.0004 Epoch: 18 Global Step: 377260 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:15,373-Speed 6313.42 samples/sec Loss 5.1867 LearningRate 0.0004 Epoch: 18 Global Step: 377270 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:18,618-Speed 6312.17 samples/sec Loss 5.1770 LearningRate 0.0004 Epoch: 18 Global Step: 377280 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:21,847-Speed 6344.01 samples/sec Loss 5.1549 LearningRate 0.0004 Epoch: 18 Global Step: 377290 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:25,095-Speed 6306.05 samples/sec Loss 5.0580 LearningRate 0.0004 Epoch: 18 Global Step: 377300 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:28,339-Speed 6314.79 samples/sec Loss 5.2116 LearningRate 0.0004 Epoch: 18 Global Step: 377310 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:31,583-Speed 6314.55 samples/sec Loss 5.1906 LearningRate 0.0004 Epoch: 18 Global Step: 377320 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:34,831-Speed 6307.84 samples/sec Loss 5.1939 LearningRate 0.0004 Epoch: 18 Global Step: 377330 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:38,073-Speed 6317.13 samples/sec Loss 5.2015 LearningRate 0.0004 Epoch: 18 Global Step: 377340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:41,340-Speed 6270.59 samples/sec Loss 5.1430 LearningRate 0.0004 Epoch: 18 Global Step: 377350 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:44,581-Speed 6319.72 samples/sec Loss 5.2040 LearningRate 0.0004 Epoch: 18 Global Step: 377360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:47,826-Speed 6313.19 samples/sec Loss 5.1188 LearningRate 0.0004 Epoch: 18 Global Step: 377370 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:51,075-Speed 6304.83 samples/sec Loss 5.1965 LearningRate 0.0004 Epoch: 18 Global Step: 377380 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:54,308-Speed 6336.16 samples/sec Loss 5.1353 LearningRate 0.0004 Epoch: 18 Global Step: 377390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:03:57,556-Speed 6305.77 samples/sec Loss 5.2574 LearningRate 0.0004 Epoch: 18 Global Step: 377400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:00,801-Speed 6314.58 samples/sec Loss 5.1800 LearningRate 0.0004 Epoch: 18 Global Step: 377410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:04,045-Speed 6313.17 samples/sec Loss 5.1168 LearningRate 0.0004 Epoch: 18 Global Step: 377420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:07,274-Speed 6344.65 samples/sec Loss 5.1803 LearningRate 0.0004 Epoch: 18 Global Step: 377430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:10,518-Speed 6314.96 samples/sec Loss 5.2392 LearningRate 0.0004 Epoch: 18 Global Step: 377440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:13,761-Speed 6316.81 samples/sec Loss 5.1629 LearningRate 0.0004 Epoch: 18 Global Step: 377450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:17,006-Speed 6312.56 samples/sec Loss 5.1437 LearningRate 0.0004 Epoch: 18 Global Step: 377460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:20,250-Speed 6314.51 samples/sec Loss 5.2409 LearningRate 0.0004 Epoch: 18 Global Step: 377470 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:04:23,493-Speed 6316.16 samples/sec Loss 5.1433 LearningRate 0.0004 Epoch: 18 Global Step: 377480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:26,744-Speed 6302.14 samples/sec Loss 5.0735 LearningRate 0.0004 Epoch: 18 Global Step: 377490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:29,988-Speed 6314.22 samples/sec Loss 5.0846 LearningRate 0.0004 Epoch: 18 Global Step: 377500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:33,233-Speed 6312.74 samples/sec Loss 5.1849 LearningRate 0.0004 Epoch: 18 Global Step: 377510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:36,478-Speed 6311.41 samples/sec Loss 5.1351 LearningRate 0.0004 Epoch: 18 Global Step: 377520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:04:39,722-Speed 6315.79 samples/sec Loss 5.1770 LearningRate 0.0004 Epoch: 18 Global Step: 377530 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:42,967-Speed 6312.64 samples/sec Loss 5.2084 LearningRate 0.0004 Epoch: 18 Global Step: 377540 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:46,210-Speed 6316.70 samples/sec Loss 5.1108 LearningRate 0.0004 Epoch: 18 Global Step: 377550 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:49,455-Speed 6312.27 samples/sec Loss 5.0727 LearningRate 0.0004 Epoch: 18 Global Step: 377560 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:52,698-Speed 6316.55 samples/sec Loss 5.2226 LearningRate 0.0004 Epoch: 18 Global Step: 377570 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:55,941-Speed 6316.78 samples/sec Loss 5.1473 LearningRate 0.0004 Epoch: 18 Global Step: 377580 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:04:59,185-Speed 6314.30 samples/sec Loss 5.0951 LearningRate 0.0004 Epoch: 18 Global Step: 377590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
02:05:02,434-Speed 6304.16 samples/sec Loss 5.1798 LearningRate 0.0004 Epoch: 18 Global Step: 377600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:05,680-Speed 6311.84 samples/sec Loss 5.1495 LearningRate 0.0004 Epoch: 18 Global Step: 377610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:08,925-Speed 6312.40 samples/sec Loss 5.2229 LearningRate 0.0004 Epoch: 18 Global Step: 377620 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:12,156-Speed 6340.56 samples/sec Loss 5.1675 LearningRate 0.0004 Epoch: 18 Global Step: 377630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:15,389-Speed 6335.58 samples/sec Loss 5.1403 LearningRate 0.0004 Epoch: 18 Global Step: 377640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:18,637-Speed 6307.19 samples/sec Loss 5.1319 LearningRate 0.0004 Epoch: 18 Global Step: 377650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:21,880-Speed 6318.46 samples/sec Loss 5.0982 LearningRate 0.0004 Epoch: 18 Global Step: 377660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:25,130-Speed 6303.06 samples/sec Loss 5.1443 LearningRate 0.0004 Epoch: 18 Global Step: 377670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:28,375-Speed 6311.29 samples/sec Loss 5.2049 LearningRate 0.0004 Epoch: 18 Global Step: 377680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:31,618-Speed 6318.52 samples/sec Loss 5.2029 LearningRate 0.0004 Epoch: 18 Global Step: 377690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:34,859-Speed 6320.57 samples/sec Loss 5.2864 LearningRate 0.0004 Epoch: 18 Global Step: 377700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:38,104-Speed 6312.18 samples/sec Loss 5.1461 LearningRate 0.0004 Epoch: 18 Global Step: 377710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:41,346-Speed 6319.31 
samples/sec Loss 5.1988 LearningRate 0.0004 Epoch: 18 Global Step: 377720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:44,590-Speed 6313.41 samples/sec Loss 5.1202 LearningRate 0.0004 Epoch: 18 Global Step: 377730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:05:47,837-Speed 6309.30 samples/sec Loss 5.1398 LearningRate 0.0004 Epoch: 18 Global Step: 377740 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:51,088-Speed 6301.73 samples/sec Loss 5.1588 LearningRate 0.0004 Epoch: 18 Global Step: 377750 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:54,334-Speed 6310.99 samples/sec Loss 5.2069 LearningRate 0.0004 Epoch: 18 Global Step: 377760 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:05:57,576-Speed 6316.62 samples/sec Loss 5.1212 LearningRate 0.0004 Epoch: 18 Global Step: 377770 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:00,820-Speed 6315.32 samples/sec Loss 5.1435 LearningRate 0.0004 Epoch: 18 Global Step: 377780 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:04,065-Speed 6311.93 samples/sec Loss 5.1646 LearningRate 0.0004 Epoch: 18 Global Step: 377790 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:07,316-Speed 6301.50 samples/sec Loss 5.1434 LearningRate 0.0004 Epoch: 18 Global Step: 377800 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:10,560-Speed 6315.76 samples/sec Loss 5.1547 LearningRate 0.0004 Epoch: 18 Global Step: 377810 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:13,803-Speed 6315.40 samples/sec Loss 5.1165 LearningRate 0.0004 Epoch: 18 Global Step: 377820 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:17,049-Speed 6310.32 samples/sec Loss 5.1363 LearningRate 0.0004 Epoch: 18 Global Step: 377830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:20,280-Speed 6341.43 samples/sec Loss 5.1770 
LearningRate 0.0004 Epoch: 18 Global Step: 377840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:23,527-Speed 6307.96 samples/sec Loss 5.2035 LearningRate 0.0004 Epoch: 18 Global Step: 377850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:26,772-Speed 6312.91 samples/sec Loss 5.1915 LearningRate 0.0004 Epoch: 18 Global Step: 377860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:30,016-Speed 6313.26 samples/sec Loss 5.0814 LearningRate 0.0004 Epoch: 18 Global Step: 377870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:33,262-Speed 6311.94 samples/sec Loss 5.0396 LearningRate 0.0004 Epoch: 18 Global Step: 377880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:36,506-Speed 6314.90 samples/sec Loss 5.1838 LearningRate 0.0004 Epoch: 18 Global Step: 377890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:39,752-Speed 6311.83 samples/sec Loss 5.1840 LearningRate 0.0004 Epoch: 18 Global Step: 377900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:42,996-Speed 6315.01 samples/sec Loss 5.1580 LearningRate 0.0004 Epoch: 18 Global Step: 377910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:46,243-Speed 6307.37 samples/sec Loss 5.1502 LearningRate 0.0004 Epoch: 18 Global Step: 377920 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:49,487-Speed 6314.19 samples/sec Loss 5.1544 LearningRate 0.0004 Epoch: 18 Global Step: 377930 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:06:52,709-Speed 6358.82 samples/sec Loss 5.2004 LearningRate 0.0004 Epoch: 18 Global Step: 377940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:06:55,955-Speed 6309.98 samples/sec Loss 5.1935 LearningRate 0.0004 Epoch: 18 Global Step: 377950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:06:59,207-Speed 6299.48 samples/sec Loss 5.1942 LearningRate 0.0004 Epoch: 18 
Global Step: 377960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:02,450-Speed 6316.77 samples/sec Loss 5.1624 LearningRate 0.0004 Epoch: 18 Global Step: 377970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:05,693-Speed 6316.44 samples/sec Loss 5.1776 LearningRate 0.0004 Epoch: 18 Global Step: 377980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:08,938-Speed 6312.10 samples/sec Loss 5.1558 LearningRate 0.0004 Epoch: 18 Global Step: 377990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:12,181-Speed 6318.70 samples/sec Loss 5.1100 LearningRate 0.0004 Epoch: 18 Global Step: 378000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:15,424-Speed 6315.41 samples/sec Loss 5.1480 LearningRate 0.0004 Epoch: 18 Global Step: 378010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:18,692-Speed 6267.65 samples/sec Loss 5.1851 LearningRate 0.0004 Epoch: 18 Global Step: 378020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:21,939-Speed 6309.63 samples/sec Loss 5.2107 LearningRate 0.0004 Epoch: 18 Global Step: 378030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:25,170-Speed 6338.92 samples/sec Loss 5.2398 LearningRate 0.0004 Epoch: 18 Global Step: 378040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:28,414-Speed 6314.86 samples/sec Loss 5.1485 LearningRate 0.0004 Epoch: 18 Global Step: 378050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:31,661-Speed 6309.72 samples/sec Loss 5.1605 LearningRate 0.0004 Epoch: 18 Global Step: 378060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:34,903-Speed 6317.50 samples/sec Loss 5.2156 LearningRate 0.0004 Epoch: 18 Global Step: 378070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:38,147-Speed 6314.31 samples/sec Loss 5.2029 LearningRate 0.0004 Epoch: 18 Global Step: 378080 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:41,391-Speed 6314.99 samples/sec Loss 5.1905 LearningRate 0.0004 Epoch: 18 Global Step: 378090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:44,640-Speed 6305.80 samples/sec Loss 5.1241 LearningRate 0.0004 Epoch: 18 Global Step: 378100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:47,899-Speed 6286.03 samples/sec Loss 5.1327 LearningRate 0.0004 Epoch: 18 Global Step: 378110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:51,211-Speed 6184.23 samples/sec Loss 5.1393 LearningRate 0.0004 Epoch: 18 Global Step: 378120 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:54,453-Speed 6318.32 samples/sec Loss 5.1910 LearningRate 0.0004 Epoch: 18 Global Step: 378130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:07:57,698-Speed 6313.18 samples/sec Loss 5.1687 LearningRate 0.0004 Epoch: 18 Global Step: 378140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:00,949-Speed 6300.73 samples/sec Loss 5.1417 LearningRate 0.0004 Epoch: 18 Global Step: 378150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:04,274-Speed 6160.72 samples/sec Loss 5.0676 LearningRate 0.0004 Epoch: 18 Global Step: 378160 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:07,564-Speed 6227.34 samples/sec Loss 5.0783 LearningRate 0.0004 Epoch: 18 Global Step: 378170 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:10,804-Speed 6321.86 samples/sec Loss 5.1343 LearningRate 0.0004 Epoch: 18 Global Step: 378180 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:14,050-Speed 6311.27 samples/sec Loss 5.2402 LearningRate 0.0004 Epoch: 18 Global Step: 378190 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:17,289-Speed 6323.41 samples/sec Loss 5.0777 LearningRate 0.0004 Epoch: 18 Global Step: 378200 Fp16 Grad Scale: 32768 Required: 41 hours 
Training: 2022-04-02 02:08:20,542-Speed 6297.58 samples/sec Loss 5.1741 LearningRate 0.0004 Epoch: 18 Global Step: 378210 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:23,787-Speed 6312.87 samples/sec Loss 5.1912 LearningRate 0.0004 Epoch: 18 Global Step: 378220 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:08:27,022-Speed 6331.77 samples/sec Loss 5.1943 LearningRate 0.0004 Epoch: 18 Global Step: 378230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:30,276-Speed 6293.93 samples/sec Loss 5.1568 LearningRate 0.0004 Epoch: 18 Global Step: 378240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:33,518-Speed 6318.56 samples/sec Loss 5.1734 LearningRate 0.0004 Epoch: 18 Global Step: 378250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:36,764-Speed 6312.21 samples/sec Loss 5.1165 LearningRate 0.0004 Epoch: 18 Global Step: 378260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:40,011-Speed 6307.23 samples/sec Loss 5.2744 LearningRate 0.0004 Epoch: 18 Global Step: 378270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:43,253-Speed 6319.23 samples/sec Loss 5.2130 LearningRate 0.0004 Epoch: 18 Global Step: 378280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:46,499-Speed 6311.27 samples/sec Loss 5.0694 LearningRate 0.0004 Epoch: 18 Global Step: 378290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:49,748-Speed 6305.56 samples/sec Loss 5.1826 LearningRate 0.0004 Epoch: 18 Global Step: 378300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:52,990-Speed 6318.32 samples/sec Loss 5.1592 LearningRate 0.0004 Epoch: 18 Global Step: 378310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:08:56,247-Speed 6289.39 samples/sec Loss 5.1494 LearningRate 0.0004 Epoch: 18 Global Step: 378320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:08:59,491-Speed 6314.44 samples/sec Loss 5.1957 LearningRate 0.0004 Epoch: 18 Global Step: 378330 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:02,742-Speed 6301.75 samples/sec Loss 5.1601 LearningRate 0.0004 Epoch: 18 Global Step: 378340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:05,984-Speed 6317.67 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 18 Global Step: 378350 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:09,232-Speed 6308.08 samples/sec Loss 5.2168 LearningRate 0.0004 Epoch: 18 Global Step: 378360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:12,473-Speed 6319.16 samples/sec Loss 5.2106 LearningRate 0.0004 Epoch: 18 Global Step: 378370 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:15,718-Speed 6313.48 samples/sec Loss 5.1171 LearningRate 0.0004 Epoch: 18 Global Step: 378380 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:18,963-Speed 6313.28 samples/sec Loss 5.1862 LearningRate 0.0004 Epoch: 18 Global Step: 378390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:22,204-Speed 6318.97 samples/sec Loss 5.2113 LearningRate 0.0004 Epoch: 18 Global Step: 378400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:25,448-Speed 6315.38 samples/sec Loss 5.1732 LearningRate 0.0004 Epoch: 18 Global Step: 378410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:28,691-Speed 6316.34 samples/sec Loss 5.1585 LearningRate 0.0004 Epoch: 18 Global Step: 378420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:31,936-Speed 6312.51 samples/sec Loss 5.1428 LearningRate 0.0004 Epoch: 18 Global Step: 378430 Fp16 Grad Scale: 65536 Required: 41 hours Training: 2022-04-02 02:09:35,168-Speed 6337.64 samples/sec Loss 5.1303 LearningRate 0.0004 Epoch: 18 Global Step: 378440 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:38,414-Speed 6311.10 
samples/sec Loss 5.1330 LearningRate 0.0004 Epoch: 18 Global Step: 378450 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:41,660-Speed 6310.78 samples/sec Loss 5.2270 LearningRate 0.0004 Epoch: 18 Global Step: 378460 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:44,910-Speed 6303.90 samples/sec Loss 5.1565 LearningRate 0.0004 Epoch: 18 Global Step: 378470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:48,152-Speed 6317.42 samples/sec Loss 5.1630 LearningRate 0.0004 Epoch: 18 Global Step: 378480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:51,400-Speed 6307.31 samples/sec Loss 5.1223 LearningRate 0.0004 Epoch: 18 Global Step: 378490 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:09:54,689-Speed 6227.39 samples/sec Loss 5.0847 LearningRate 0.0004 Epoch: 18 Global Step: 378500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:09:57,932-Speed 6316.89 samples/sec Loss 5.1881 LearningRate 0.0004 Epoch: 18 Global Step: 378510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:01,176-Speed 6315.66 samples/sec Loss 5.0789 LearningRate 0.0004 Epoch: 18 Global Step: 378520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:04,421-Speed 6311.81 samples/sec Loss 5.1615 LearningRate 0.0004 Epoch: 18 Global Step: 378530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:07,670-Speed 6306.16 samples/sec Loss 5.1475 LearningRate 0.0004 Epoch: 18 Global Step: 378540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:10,914-Speed 6314.60 samples/sec Loss 5.1880 LearningRate 0.0004 Epoch: 18 Global Step: 378550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:14,204-Speed 6226.62 samples/sec Loss 5.1508 LearningRate 0.0004 Epoch: 18 Global Step: 378560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:17,562-Speed 6098.89 samples/sec Loss 5.1568 
LearningRate 0.0004 Epoch: 18 Global Step: 378570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:20,806-Speed 6314.90 samples/sec Loss 5.1233 LearningRate 0.0004 Epoch: 18 Global Step: 378580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:24,048-Speed 6317.63 samples/sec Loss 5.1034 LearningRate 0.0004 Epoch: 18 Global Step: 378590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:27,292-Speed 6316.69 samples/sec Loss 5.1988 LearningRate 0.0004 Epoch: 18 Global Step: 378600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:10:30,535-Speed 6315.27 samples/sec Loss 5.2252 LearningRate 0.0004 Epoch: 18 Global Step: 378610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:10:33,778-Speed 6317.49 samples/sec Loss 5.1558 LearningRate 0.0004 Epoch: 18 Global Step: 378620 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:10:37,023-Speed 6311.27 samples/sec Loss 5.0920 LearningRate 0.0004 Epoch: 18 Global Step: 378630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:10:40,257-Speed 6335.63 samples/sec Loss 5.2148 LearningRate 0.0004 Epoch: 18 Global Step: 378640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:43,503-Speed 6309.60 samples/sec Loss 5.0994 LearningRate 0.0004 Epoch: 18 Global Step: 378650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:46,745-Speed 6318.31 samples/sec Loss 5.2446 LearningRate 0.0004 Epoch: 18 Global Step: 378660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:49,990-Speed 6313.04 samples/sec Loss 5.1569 LearningRate 0.0004 Epoch: 18 Global Step: 378670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:53,234-Speed 6315.53 samples/sec Loss 5.2141 LearningRate 0.0004 Epoch: 18 Global Step: 378680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:56,475-Speed 6319.61 samples/sec Loss 5.1432 LearningRate 0.0004 Epoch: 18 
Global Step: 378690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:10:59,724-Speed 6305.78 samples/sec Loss 5.2143 LearningRate 0.0004 Epoch: 18 Global Step: 378700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:03,010-Speed 6233.66 samples/sec Loss 5.1161 LearningRate 0.0004 Epoch: 18 Global Step: 378710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:06,256-Speed 6312.13 samples/sec Loss 5.2012 LearningRate 0.0004 Epoch: 18 Global Step: 378720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:09,502-Speed 6309.37 samples/sec Loss 5.1389 LearningRate 0.0004 Epoch: 18 Global Step: 378730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:12,749-Speed 6308.68 samples/sec Loss 5.1416 LearningRate 0.0004 Epoch: 18 Global Step: 378740 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:11:16,042-Speed 6221.54 samples/sec Loss 5.0468 LearningRate 0.0004 Epoch: 18 Global Step: 378750 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:11:19,272-Speed 6341.83 samples/sec Loss 5.1608 LearningRate 0.0004 Epoch: 18 Global Step: 378760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:22,517-Speed 6312.46 samples/sec Loss 5.1658 LearningRate 0.0004 Epoch: 18 Global Step: 378770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:25,771-Speed 6296.07 samples/sec Loss 5.2056 LearningRate 0.0004 Epoch: 18 Global Step: 378780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:29,022-Speed 6300.25 samples/sec Loss 5.1206 LearningRate 0.0004 Epoch: 18 Global Step: 378790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:32,272-Speed 6302.52 samples/sec Loss 5.1685 LearningRate 0.0004 Epoch: 18 Global Step: 378800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:35,516-Speed 6314.11 samples/sec Loss 5.0916 LearningRate 0.0004 Epoch: 18 Global Step: 378810 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:38,761-Speed 6313.06 samples/sec Loss 5.2146 LearningRate 0.0004 Epoch: 18 Global Step: 378820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:42,002-Speed 6320.40 samples/sec Loss 5.1649 LearningRate 0.0004 Epoch: 18 Global Step: 378830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:45,246-Speed 6315.11 samples/sec Loss 5.2246 LearningRate 0.0004 Epoch: 18 Global Step: 378840 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:48,496-Speed 6302.95 samples/sec Loss 5.1842 LearningRate 0.0004 Epoch: 18 Global Step: 378850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:11:51,751-Speed 6292.02 samples/sec Loss 5.1472 LearningRate 0.0004 Epoch: 18 Global Step: 378860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:11:54,998-Speed 6309.63 samples/sec Loss 5.1115 LearningRate 0.0004 Epoch: 18 Global Step: 378870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:11:58,241-Speed 6315.65 samples/sec Loss 5.2297 LearningRate 0.0004 Epoch: 18 Global Step: 378880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:01,475-Speed 6333.86 samples/sec Loss 5.1784 LearningRate 0.0004 Epoch: 18 Global Step: 378890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:04,717-Speed 6320.01 samples/sec Loss 5.2253 LearningRate 0.0004 Epoch: 18 Global Step: 378900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:07,964-Speed 6307.78 samples/sec Loss 5.1890 LearningRate 0.0004 Epoch: 18 Global Step: 378910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:11,208-Speed 6315.16 samples/sec Loss 5.1339 LearningRate 0.0004 Epoch: 18 Global Step: 378920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:14,450-Speed 6318.65 samples/sec Loss 5.1405 LearningRate 0.0004 Epoch: 18 Global Step: 378930 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:12:17,696-Speed 6312.08 samples/sec Loss 5.1480 LearningRate 0.0004 Epoch: 18 Global Step: 378940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:20,939-Speed 6316.54 samples/sec Loss 5.1726 LearningRate 0.0004 Epoch: 18 Global Step: 378950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:24,188-Speed 6304.79 samples/sec Loss 5.1242 LearningRate 0.0004 Epoch: 18 Global Step: 378960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:27,433-Speed 6312.94 samples/sec Loss 5.2224 LearningRate 0.0004 Epoch: 18 Global Step: 378970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:30,677-Speed 6313.44 samples/sec Loss 5.1122 LearningRate 0.0004 Epoch: 18 Global Step: 378980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:12:33,927-Speed 6303.59 samples/sec Loss 5.1429 LearningRate 0.0004 Epoch: 18 Global Step: 378990 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:37,170-Speed 6315.59 samples/sec Loss 5.1395 LearningRate 0.0004 Epoch: 18 Global Step: 379000 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:40,415-Speed 6313.68 samples/sec Loss 5.1499 LearningRate 0.0004 Epoch: 18 Global Step: 379010 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:43,662-Speed 6308.78 samples/sec Loss 5.2085 LearningRate 0.0004 Epoch: 18 Global Step: 379020 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:46,906-Speed 6314.35 samples/sec Loss 5.1384 LearningRate 0.0004 Epoch: 18 Global Step: 379030 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:50,223-Speed 6176.18 samples/sec Loss 5.1709 LearningRate 0.0004 Epoch: 18 Global Step: 379040 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:12:53,535-Speed 6183.84 samples/sec Loss 5.1454 LearningRate 0.0004 Epoch: 18 Global Step: 379050 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
02:12:56,785-Speed 6303.35 samples/sec Loss 5.1508 LearningRate 0.0004 Epoch: 18 Global Step: 379060 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:00,033-Speed 6307.66 samples/sec Loss 5.1037 LearningRate 0.0004 Epoch: 18 Global Step: 379070 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:03,282-Speed 6304.62 samples/sec Loss 5.1107 LearningRate 0.0004 Epoch: 18 Global Step: 379080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:06,512-Speed 6342.10 samples/sec Loss 5.0863 LearningRate 0.0004 Epoch: 18 Global Step: 379090 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:09,758-Speed 6310.52 samples/sec Loss 5.1818 LearningRate 0.0004 Epoch: 18 Global Step: 379100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:13,002-Speed 6314.24 samples/sec Loss 5.2149 LearningRate 0.0004 Epoch: 18 Global Step: 379110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:16,248-Speed 6311.12 samples/sec Loss 5.0749 LearningRate 0.0004 Epoch: 18 Global Step: 379120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:19,494-Speed 6309.39 samples/sec Loss 5.2324 LearningRate 0.0004 Epoch: 18 Global Step: 379130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:22,739-Speed 6312.83 samples/sec Loss 5.1269 LearningRate 0.0004 Epoch: 18 Global Step: 379140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:25,987-Speed 6307.26 samples/sec Loss 5.1991 LearningRate 0.0004 Epoch: 18 Global Step: 379150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:29,233-Speed 6311.39 samples/sec Loss 5.1070 LearningRate 0.0004 Epoch: 18 Global Step: 379160 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:32,477-Speed 6313.73 samples/sec Loss 5.1726 LearningRate 0.0004 Epoch: 18 Global Step: 379170 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:35,719-Speed 6318.94 
samples/sec Loss 5.2043 LearningRate 0.0004 Epoch: 18 Global Step: 379180 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:13:38,937-Speed 6366.05 samples/sec Loss 5.1688 LearningRate 0.0004 Epoch: 18 Global Step: 379190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:42,178-Speed 6321.40 samples/sec Loss 5.1271 LearningRate 0.0004 Epoch: 18 Global Step: 379200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:45,422-Speed 6315.58 samples/sec Loss 5.1382 LearningRate 0.0004 Epoch: 18 Global Step: 379210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:48,669-Speed 6307.88 samples/sec Loss 5.1859 LearningRate 0.0004 Epoch: 18 Global Step: 379220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:51,914-Speed 6313.22 samples/sec Loss 5.1698 LearningRate 0.0004 Epoch: 18 Global Step: 379230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:55,156-Speed 6317.02 samples/sec Loss 5.1180 LearningRate 0.0004 Epoch: 18 Global Step: 379240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:13:58,399-Speed 6318.30 samples/sec Loss 5.1545 LearningRate 0.0004 Epoch: 18 Global Step: 379250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:01,649-Speed 6302.11 samples/sec Loss 5.0948 LearningRate 0.0004 Epoch: 18 Global Step: 379260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:04,891-Speed 6318.97 samples/sec Loss 5.1564 LearningRate 0.0004 Epoch: 18 Global Step: 379270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:08,145-Speed 6295.56 samples/sec Loss 5.1643 LearningRate 0.0004 Epoch: 18 Global Step: 379280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:11,403-Speed 6285.81 samples/sec Loss 5.2345 LearningRate 0.0004 Epoch: 18 Global Step: 379290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:14,650-Speed 6310.67 samples/sec Loss 5.0972 
LearningRate 0.0004 Epoch: 18 Global Step: 379300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:17,895-Speed 6311.88 samples/sec Loss 5.1732 LearningRate 0.0004 Epoch: 18 Global Step: 379310 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:21,141-Speed 6311.26 samples/sec Loss 5.2241 LearningRate 0.0004 Epoch: 18 Global Step: 379320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:24,387-Speed 6310.56 samples/sec Loss 5.1806 LearningRate 0.0004 Epoch: 18 Global Step: 379330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:27,633-Speed 6310.53 samples/sec Loss 5.2004 LearningRate 0.0004 Epoch: 18 Global Step: 379340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:30,875-Speed 6318.79 samples/sec Loss 5.1375 LearningRate 0.0004 Epoch: 18 Global Step: 379350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:34,118-Speed 6315.77 samples/sec Loss 5.2440 LearningRate 0.0004 Epoch: 18 Global Step: 379360 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:37,366-Speed 6307.54 samples/sec Loss 5.1811 LearningRate 0.0004 Epoch: 18 Global Step: 379370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:40,664-Speed 6210.38 samples/sec Loss 5.1334 LearningRate 0.0004 Epoch: 18 Global Step: 379380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:14:43,905-Speed 6320.97 samples/sec Loss 5.1760 LearningRate 0.0004 Epoch: 18 Global Step: 379390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:14:47,154-Speed 6306.46 samples/sec Loss 5.1531 LearningRate 0.0004 Epoch: 18 Global Step: 379400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:14:50,456-Speed 6203.68 samples/sec Loss 5.1908 LearningRate 0.0004 Epoch: 18 Global Step: 379410 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:14:53,696-Speed 6321.75 samples/sec Loss 5.1126 LearningRate 0.0004 Epoch: 18 
Global Step: 379420 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:14:56,938-Speed 6318.05 samples/sec Loss 5.1047 LearningRate 0.0004 Epoch: 18 Global Step: 379430 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:00,181-Speed 6316.47 samples/sec Loss 5.1281 LearningRate 0.0004 Epoch: 18 Global Step: 379440 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:03,429-Speed 6307.16 samples/sec Loss 5.1144 LearningRate 0.0004 Epoch: 18 Global Step: 379450 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:06,671-Speed 6319.51 samples/sec Loss 5.1751 LearningRate 0.0004 Epoch: 18 Global Step: 379460 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:09,916-Speed 6311.92 samples/sec Loss 5.1721 LearningRate 0.0004 Epoch: 18 Global Step: 379470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:13,163-Speed 6307.90 samples/sec Loss 5.1422 LearningRate 0.0004 Epoch: 18 Global Step: 379480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:16,397-Speed 6335.45 samples/sec Loss 5.1957 LearningRate 0.0004 Epoch: 18 Global Step: 379490 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:19,629-Speed 6338.27 samples/sec Loss 5.2107 LearningRate 0.0004 Epoch: 18 Global Step: 379500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:22,870-Speed 6319.32 samples/sec Loss 5.1004 LearningRate 0.0004 Epoch: 18 Global Step: 379510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:26,113-Speed 6316.19 samples/sec Loss 5.1032 LearningRate 0.0004 Epoch: 18 Global Step: 379520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:29,360-Speed 6308.90 samples/sec Loss 5.1978 LearningRate 0.0004 Epoch: 18 Global Step: 379530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:32,605-Speed 6312.74 samples/sec Loss 5.1611 LearningRate 0.0004 Epoch: 18 Global Step: 379540 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:35,862-Speed 6289.54 samples/sec Loss 5.2309 LearningRate 0.0004 Epoch: 18 Global Step: 379550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:39,103-Speed 6319.91 samples/sec Loss 5.1365 LearningRate 0.0004 Epoch: 18 Global Step: 379560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:42,349-Speed 6312.02 samples/sec Loss 5.1846 LearningRate 0.0004 Epoch: 18 Global Step: 379570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:45,592-Speed 6314.98 samples/sec Loss 5.1276 LearningRate 0.0004 Epoch: 18 Global Step: 379580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:48,837-Speed 6314.06 samples/sec Loss 5.0891 LearningRate 0.0004 Epoch: 18 Global Step: 379590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:15:52,080-Speed 6315.42 samples/sec Loss 5.0997 LearningRate 0.0004 Epoch: 18 Global Step: 379600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:55,327-Speed 6309.78 samples/sec Loss 5.1490 LearningRate 0.0004 Epoch: 18 Global Step: 379610 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:15:58,575-Speed 6307.11 samples/sec Loss 5.0843 LearningRate 0.0004 Epoch: 18 Global Step: 379620 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:01,824-Speed 6304.90 samples/sec Loss 5.0777 LearningRate 0.0004 Epoch: 18 Global Step: 379630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:05,071-Speed 6308.63 samples/sec Loss 5.1179 LearningRate 0.0004 Epoch: 18 Global Step: 379640 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:08,325-Speed 6295.02 samples/sec Loss 5.0967 LearningRate 0.0004 Epoch: 18 Global Step: 379650 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:11,557-Speed 6339.10 samples/sec Loss 5.1519 LearningRate 0.0004 Epoch: 18 Global Step: 379660 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:16:14,801-Speed 6314.08 samples/sec Loss 5.1676 LearningRate 0.0004 Epoch: 18 Global Step: 379670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:18,045-Speed 6315.53 samples/sec Loss 5.1413 LearningRate 0.0004 Epoch: 18 Global Step: 379680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:21,287-Speed 6317.04 samples/sec Loss 5.1082 LearningRate 0.0004 Epoch: 18 Global Step: 379690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:24,530-Speed 6316.58 samples/sec Loss 5.1486 LearningRate 0.0004 Epoch: 18 Global Step: 379700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:27,774-Speed 6314.30 samples/sec Loss 5.1022 LearningRate 0.0004 Epoch: 18 Global Step: 379710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:31,016-Speed 6318.54 samples/sec Loss 5.1855 LearningRate 0.0004 Epoch: 18 Global Step: 379720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:34,258-Speed 6319.65 samples/sec Loss 5.1198 LearningRate 0.0004 Epoch: 18 Global Step: 379730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:37,503-Speed 6312.79 samples/sec Loss 5.1826 LearningRate 0.0004 Epoch: 18 Global Step: 379740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:40,749-Speed 6309.81 samples/sec Loss 5.1417 LearningRate 0.0004 Epoch: 18 Global Step: 379750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:43,994-Speed 6312.73 samples/sec Loss 5.0976 LearningRate 0.0004 Epoch: 18 Global Step: 379760 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:47,243-Speed 6305.80 samples/sec Loss 5.1556 LearningRate 0.0004 Epoch: 18 Global Step: 379770 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:16:50,472-Speed 6342.50 samples/sec Loss 5.1130 LearningRate 0.0004 Epoch: 18 Global Step: 379780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:16:53,718-Speed 6311.67 samples/sec Loss 5.1850 LearningRate 0.0004 Epoch: 18 Global Step: 379790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:16:56,961-Speed 6316.77 samples/sec Loss 5.0763 LearningRate 0.0004 Epoch: 18 Global Step: 379800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:00,205-Speed 6313.64 samples/sec Loss 5.1194 LearningRate 0.0004 Epoch: 18 Global Step: 379810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:03,449-Speed 6315.75 samples/sec Loss 5.1010 LearningRate 0.0004 Epoch: 18 Global Step: 379820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:06,700-Speed 6301.81 samples/sec Loss 5.1635 LearningRate 0.0004 Epoch: 18 Global Step: 379830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:09,944-Speed 6314.01 samples/sec Loss 5.1612 LearningRate 0.0004 Epoch: 18 Global Step: 379840 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:13,189-Speed 6312.99 samples/sec Loss 5.1650 LearningRate 0.0004 Epoch: 18 Global Step: 379850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:16,445-Speed 6291.75 samples/sec Loss 5.1730 LearningRate 0.0004 Epoch: 18 Global Step: 379860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:19,731-Speed 6233.03 samples/sec Loss 5.0976 LearningRate 0.0004 Epoch: 18 Global Step: 379870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:23,026-Speed 6217.75 samples/sec Loss 5.0848 LearningRate 0.0004 Epoch: 18 Global Step: 379880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:26,271-Speed 6312.16 samples/sec Loss 5.0760 LearningRate 0.0004 Epoch: 18 Global Step: 379890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:29,516-Speed 6312.39 samples/sec Loss 5.0891 LearningRate 0.0004 Epoch: 18 Global Step: 379900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:32,761-Speed 6312.71 
samples/sec Loss 5.2253 LearningRate 0.0004 Epoch: 18 Global Step: 379910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:36,002-Speed 6321.07 samples/sec Loss 5.0739 LearningRate 0.0004 Epoch: 18 Global Step: 379920 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:39,250-Speed 6306.27 samples/sec Loss 5.1478 LearningRate 0.0004 Epoch: 18 Global Step: 379930 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:42,494-Speed 6315.32 samples/sec Loss 5.1418 LearningRate 0.0004 Epoch: 18 Global Step: 379940 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:45,737-Speed 6315.44 samples/sec Loss 5.1425 LearningRate 0.0004 Epoch: 18 Global Step: 379950 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:49,026-Speed 6228.95 samples/sec Loss 5.1105 LearningRate 0.0004 Epoch: 18 Global Step: 379960 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:17:52,280-Speed 6295.22 samples/sec Loss 5.1597 LearningRate 0.0004 Epoch: 18 Global Step: 379970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:55,525-Speed 6311.47 samples/sec Loss 5.2461 LearningRate 0.0004 Epoch: 18 Global Step: 379980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:17:58,768-Speed 6316.48 samples/sec Loss 5.1364 LearningRate 0.0004 Epoch: 18 Global Step: 379990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:02,020-Speed 6300.27 samples/sec Loss 5.1214 LearningRate 0.0004 Epoch: 18 Global Step: 380000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:05,267-Speed 6308.61 samples/sec Loss 5.1736 LearningRate 0.0004 Epoch: 18 Global Step: 380010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:08,515-Speed 6307.23 samples/sec Loss 5.2051 LearningRate 0.0004 Epoch: 18 Global Step: 380020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:11,761-Speed 6310.11 samples/sec Loss 5.0583 
LearningRate 0.0004 Epoch: 18 Global Step: 380030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:15,066-Speed 6198.16 samples/sec Loss 5.1733 LearningRate 0.0004 Epoch: 18 Global Step: 380040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:18,355-Speed 6229.52 samples/sec Loss 5.2078 LearningRate 0.0004 Epoch: 18 Global Step: 380050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:21,599-Speed 6314.36 samples/sec Loss 5.1698 LearningRate 0.0004 Epoch: 18 Global Step: 380060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:24,845-Speed 6310.28 samples/sec Loss 5.1357 LearningRate 0.0004 Epoch: 18 Global Step: 380070 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:18:28,075-Speed 6342.62 samples/sec Loss 5.1965 LearningRate 0.0004 Epoch: 18 Global Step: 380080 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:31,319-Speed 6313.99 samples/sec Loss 5.1483 LearningRate 0.0004 Epoch: 18 Global Step: 380090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:34,561-Speed 6318.75 samples/sec Loss 5.1394 LearningRate 0.0004 Epoch: 18 Global Step: 380100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:37,806-Speed 6311.33 samples/sec Loss 5.0984 LearningRate 0.0004 Epoch: 18 Global Step: 380110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:41,049-Speed 6317.53 samples/sec Loss 5.1440 LearningRate 0.0004 Epoch: 18 Global Step: 380120 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:44,296-Speed 6308.43 samples/sec Loss 5.1578 LearningRate 0.0004 Epoch: 18 Global Step: 380130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:47,546-Speed 6303.57 samples/sec Loss 5.1616 LearningRate 0.0004 Epoch: 18 Global Step: 380140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:50,835-Speed 6227.31 samples/sec Loss 5.1458 LearningRate 0.0004 Epoch: 18 
Global Step: 380150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:54,079-Speed 6314.83 samples/sec Loss 5.1228 LearningRate 0.0004 Epoch: 18 Global Step: 380160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:18:57,328-Speed 6305.66 samples/sec Loss 5.1170 LearningRate 0.0004 Epoch: 18 Global Step: 380170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:00,617-Speed 6227.12 samples/sec Loss 5.2220 LearningRate 0.0004 Epoch: 18 Global Step: 380180 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:03,863-Speed 6311.29 samples/sec Loss 5.2513 LearningRate 0.0004 Epoch: 18 Global Step: 380190 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:07,095-Speed 6337.61 samples/sec Loss 5.1714 LearningRate 0.0004 Epoch: 18 Global Step: 380200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:10,342-Speed 6309.56 samples/sec Loss 5.1754 LearningRate 0.0004 Epoch: 18 Global Step: 380210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:13,589-Speed 6308.43 samples/sec Loss 5.1272 LearningRate 0.0004 Epoch: 18 Global Step: 380220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:16,836-Speed 6308.62 samples/sec Loss 5.0774 LearningRate 0.0004 Epoch: 18 Global Step: 380230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:20,084-Speed 6307.59 samples/sec Loss 5.1868 LearningRate 0.0004 Epoch: 18 Global Step: 380240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:23,378-Speed 6219.45 samples/sec Loss 5.1274 LearningRate 0.0004 Epoch: 18 Global Step: 380250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:26,622-Speed 6314.73 samples/sec Loss 5.0758 LearningRate 0.0004 Epoch: 18 Global Step: 380260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:29,916-Speed 6218.10 samples/sec Loss 5.1295 LearningRate 0.0004 Epoch: 18 Global Step: 380270 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:33,160-Speed 6314.59 samples/sec Loss 5.1683 LearningRate 0.0004 Epoch: 18 Global Step: 380280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:36,405-Speed 6311.95 samples/sec Loss 5.1503 LearningRate 0.0004 Epoch: 18 Global Step: 380290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:19:39,659-Speed 6295.54 samples/sec Loss 5.0791 LearningRate 0.0004 Epoch: 18 Global Step: 380300 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:42,903-Speed 6314.15 samples/sec Loss 5.1709 LearningRate 0.0004 Epoch: 18 Global Step: 380310 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:46,146-Speed 6316.59 samples/sec Loss 5.0862 LearningRate 0.0004 Epoch: 18 Global Step: 380320 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:49,388-Speed 6319.41 samples/sec Loss 5.1097 LearningRate 0.0004 Epoch: 18 Global Step: 380330 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:52,638-Speed 6305.67 samples/sec Loss 5.0518 LearningRate 0.0004 Epoch: 18 Global Step: 380340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:55,882-Speed 6314.20 samples/sec Loss 5.1474 LearningRate 0.0004 Epoch: 18 Global Step: 380350 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:19:59,129-Speed 6308.97 samples/sec Loss 5.2071 LearningRate 0.0004 Epoch: 18 Global Step: 380360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:20:02,378-Speed 6304.10 samples/sec Loss 5.1951 LearningRate 0.0004 Epoch: 18 Global Step: 380370 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:20:05,627-Speed 6306.24 samples/sec Loss 5.1657 LearningRate 0.0004 Epoch: 18 Global Step: 380380 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:20:08,872-Speed 6311.82 samples/sec Loss 5.1565 LearningRate 0.0004 Epoch: 18 Global Step: 380390 Fp16 Grad Scale: 32768 Required: 41 hours 
Training: 2022-04-02 02:20:12,102-Speed 6341.18 samples/sec Loss 5.1278 LearningRate 0.0004 Epoch: 18 Global Step: 380400 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:20:15,333-Speed 6340.84 samples/sec Loss 5.1137 LearningRate 0.0004 Epoch: 18 Global Step: 380410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:18,581-Speed 6307.41 samples/sec Loss 5.1614 LearningRate 0.0004 Epoch: 18 Global Step: 380420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:21,829-Speed 6307.74 samples/sec Loss 5.1029 LearningRate 0.0004 Epoch: 18 Global Step: 380430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:25,074-Speed 6311.06 samples/sec Loss 5.1163 LearningRate 0.0004 Epoch: 18 Global Step: 380440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:28,316-Speed 6319.13 samples/sec Loss 5.1710 LearningRate 0.0004 Epoch: 18 Global Step: 380450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:31,557-Speed 6320.48 samples/sec Loss 5.1657 LearningRate 0.0004 Epoch: 18 Global Step: 380460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:34,802-Speed 6313.79 samples/sec Loss 5.1488 LearningRate 0.0004 Epoch: 18 Global Step: 380470 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:38,048-Speed 6311.26 samples/sec Loss 5.2112 LearningRate 0.0004 Epoch: 18 Global Step: 380480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:41,292-Speed 6314.09 samples/sec Loss 5.1671 LearningRate 0.0004 Epoch: 18 Global Step: 380490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:47,927-Speed 3086.74 samples/sec Loss 5.1611 LearningRate 0.0004 Epoch: 18 Global Step: 380500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:20:51,168-Speed 6321.50 samples/sec Loss 5.1026 LearningRate 0.0004 Epoch: 18 Global Step: 380510 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
02:20:54,408-Speed 6322.37 samples/sec Loss 5.1756 LearningRate 0.0004 Epoch: 18 Global Step: 380520 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:20:57,637-Speed 6344.60 samples/sec Loss 5.2030 LearningRate 0.0004 Epoch: 18 Global Step: 380530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:00,881-Speed 6313.51 samples/sec Loss 5.2160 LearningRate 0.0004 Epoch: 18 Global Step: 380540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:04,161-Speed 6246.27 samples/sec Loss 5.2150 LearningRate 0.0004 Epoch: 18 Global Step: 380550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:07,415-Speed 6293.85 samples/sec Loss 5.1742 LearningRate 0.0004 Epoch: 18 Global Step: 380560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:10,663-Speed 6307.68 samples/sec Loss 5.0443 LearningRate 0.0004 Epoch: 18 Global Step: 380570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:13,956-Speed 6220.82 samples/sec Loss 5.1198 LearningRate 0.0004 Epoch: 18 Global Step: 380580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:17,200-Speed 6314.04 samples/sec Loss 5.0810 LearningRate 0.0004 Epoch: 18 Global Step: 380590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:20,447-Speed 6308.61 samples/sec Loss 5.1778 LearningRate 0.0004 Epoch: 18 Global Step: 380600 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:23,697-Speed 6303.16 samples/sec Loss 5.1636 LearningRate 0.0004 Epoch: 18 Global Step: 380610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:26,944-Speed 6309.09 samples/sec Loss 5.0945 LearningRate 0.0004 Epoch: 18 Global Step: 380620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:30,189-Speed 6312.99 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 380630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:21:33,438-Speed 6305.70 
samples/sec Loss 5.0982 LearningRate 0.0004 Epoch: 18 Global Step: 380640 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:21:36,691-Speed 6296.67 samples/sec Loss 5.0921 LearningRate 0.0004 Epoch: 18 Global Step: 380650 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:21:39,939-Speed 6306.25 samples/sec Loss 5.2037 LearningRate 0.0004 Epoch: 18 Global Step: 380660 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:21:43,173-Speed 6335.18 samples/sec Loss 5.1124 LearningRate 0.0004 Epoch: 18 Global Step: 380670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:46,424-Speed 6300.36 samples/sec Loss 5.1694 LearningRate 0.0004 Epoch: 18 Global Step: 380680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:49,672-Speed 6307.43 samples/sec Loss 5.1625 LearningRate 0.0004 Epoch: 18 Global Step: 380690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:52,918-Speed 6310.32 samples/sec Loss 5.1151 LearningRate 0.0004 Epoch: 18 Global Step: 380700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:56,165-Speed 6309.03 samples/sec Loss 5.1885 LearningRate 0.0004 Epoch: 18 Global Step: 380710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:21:59,413-Speed 6307.02 samples/sec Loss 5.0973 LearningRate 0.0004 Epoch: 18 Global Step: 380720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:22:02,661-Speed 6305.71 samples/sec Loss 5.1246 LearningRate 0.0004 Epoch: 18 Global Step: 380730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:22:05,911-Speed 6303.46 samples/sec Loss 5.1557 LearningRate 0.0004 Epoch: 18 Global Step: 380740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:22:09,164-Speed 6297.83 samples/sec Loss 5.0935 LearningRate 0.0004 Epoch: 18 Global Step: 380750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:22:12,411-Speed 6309.13 samples/sec Loss 5.1149 
LearningRate 0.0004 Epoch: 18 Global Step: 380760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:22:15,657-Speed 6310.08 samples/sec Loss 5.1101 LearningRate 0.0004 Epoch: 18 Global Step: 380770 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:18,906-Speed 6304.46 samples/sec Loss 5.1938 LearningRate 0.0004 Epoch: 18 Global Step: 380780 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:22,163-Speed 6288.91 samples/sec Loss 5.1725 LearningRate 0.0004 Epoch: 18 Global Step: 380790 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:25,409-Speed 6311.15 samples/sec Loss 5.1851 LearningRate 0.0004 Epoch: 18 Global Step: 380800 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:28,658-Speed 6305.49 samples/sec Loss 5.1716 LearningRate 0.0004 Epoch: 18 Global Step: 380810 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:31,905-Speed 6308.16 samples/sec Loss 5.1192 LearningRate 0.0004 Epoch: 18 Global Step: 380820 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:35,154-Speed 6305.27 samples/sec Loss 5.1555 LearningRate 0.0004 Epoch: 18 Global Step: 380830 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:38,400-Speed 6309.58 samples/sec Loss 5.1292 LearningRate 0.0004 Epoch: 18 Global Step: 380840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:41,651-Speed 6301.71 samples/sec Loss 5.1503 LearningRate 0.0004 Epoch: 18 Global Step: 380850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:44,902-Speed 6300.71 samples/sec Loss 5.1150 LearningRate 0.0004 Epoch: 18 Global Step: 380860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:48,133-Speed 6340.31 samples/sec Loss 5.2350 LearningRate 0.0004 Epoch: 18 Global Step: 380870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:51,378-Speed 6312.88 samples/sec Loss 5.1865 LearningRate 0.0004 Epoch: 18 
Global Step: 380880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:54,622-Speed 6314.32 samples/sec Loss 5.1470 LearningRate 0.0004 Epoch: 18 Global Step: 380890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:22:57,854-Speed 6340.30 samples/sec Loss 5.1730 LearningRate 0.0004 Epoch: 18 Global Step: 380900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:01,100-Speed 6310.06 samples/sec Loss 5.1391 LearningRate 0.0004 Epoch: 18 Global Step: 380910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:04,343-Speed 6316.18 samples/sec Loss 5.1265 LearningRate 0.0004 Epoch: 18 Global Step: 380920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:07,588-Speed 6313.28 samples/sec Loss 5.1172 LearningRate 0.0004 Epoch: 18 Global Step: 380930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:10,831-Speed 6315.22 samples/sec Loss 5.1103 LearningRate 0.0004 Epoch: 18 Global Step: 380940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:14,073-Speed 6318.16 samples/sec Loss 5.1212 LearningRate 0.0004 Epoch: 18 Global Step: 380950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:17,327-Speed 6295.69 samples/sec Loss 5.1430 LearningRate 0.0004 Epoch: 18 Global Step: 380960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:20,572-Speed 6313.25 samples/sec Loss 5.2204 LearningRate 0.0004 Epoch: 18 Global Step: 380970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:23,817-Speed 6312.17 samples/sec Loss 5.1312 LearningRate 0.0004 Epoch: 18 Global Step: 380980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:27,062-Speed 6312.02 samples/sec Loss 5.1452 LearningRate 0.0004 Epoch: 18 Global Step: 380990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:30,303-Speed 6322.13 samples/sec Loss 5.1134 LearningRate 0.0004 Epoch: 18 Global Step: 381000 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 02:23:33,547-Speed 6313.43 samples/sec Loss 5.1183 LearningRate 0.0004 Epoch: 18 Global Step: 381010 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:23:36,791-Speed 6314.34 samples/sec Loss 5.1966 LearningRate 0.0004 Epoch: 18 Global Step: 381020 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:23:40,039-Speed 6307.81 samples/sec Loss 5.0747 LearningRate 0.0004 Epoch: 18 Global Step: 381030 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:23:43,270-Speed 6340.35 samples/sec Loss 5.1088 LearningRate 0.0004 Epoch: 18 Global Step: 381040 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:46,513-Speed 6315.41 samples/sec Loss 5.1741 LearningRate 0.0004 Epoch: 18 Global Step: 381050 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:49,760-Speed 6308.68 samples/sec Loss 5.1016 LearningRate 0.0004 Epoch: 18 Global Step: 381060 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:53,008-Speed 6306.80 samples/sec Loss 5.1439 LearningRate 0.0004 Epoch: 18 Global Step: 381070 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:56,255-Speed 6309.50 samples/sec Loss 5.1315 LearningRate 0.0004 Epoch: 18 Global Step: 381080 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:23:59,498-Speed 6317.74 samples/sec Loss 5.0984 LearningRate 0.0004 Epoch: 18 Global Step: 381090 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:02,745-Speed 6309.11 samples/sec Loss 5.2059 LearningRate 0.0004 Epoch: 18 Global Step: 381100 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:05,995-Speed 6301.80 samples/sec Loss 5.1803 LearningRate 0.0004 Epoch: 18 Global Step: 381110 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:09,239-Speed 6314.71 samples/sec Loss 5.1632 LearningRate 0.0004 Epoch: 18 Global Step: 381120 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:24:12,487-Speed 6307.36 samples/sec Loss 5.1477 LearningRate 0.0004 Epoch: 18 Global Step: 381130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:15,736-Speed 6304.52 samples/sec Loss 5.0655 LearningRate 0.0004 Epoch: 18 Global Step: 381140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:24:18,982-Speed 6310.01 samples/sec Loss 5.0916 LearningRate 0.0004 Epoch: 18 Global Step: 381150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:24:22,233-Speed 6301.09 samples/sec Loss 5.1381 LearningRate 0.0004 Epoch: 18 Global Step: 381160 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:24:25,481-Speed 6307.25 samples/sec Loss 5.1584 LearningRate 0.0004 Epoch: 18 Global Step: 381170 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:24:28,734-Speed 6296.78 samples/sec Loss 5.0625 LearningRate 0.0004 Epoch: 18 Global Step: 381180 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:24:31,967-Speed 6335.57 samples/sec Loss 5.1503 LearningRate 0.0004 Epoch: 18 Global Step: 381190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:35,213-Speed 6311.00 samples/sec Loss 5.1970 LearningRate 0.0004 Epoch: 18 Global Step: 381200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:38,458-Speed 6313.38 samples/sec Loss 5.1352 LearningRate 0.0004 Epoch: 18 Global Step: 381210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:41,702-Speed 6314.86 samples/sec Loss 5.1152 LearningRate 0.0004 Epoch: 18 Global Step: 381220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:44,945-Speed 6315.36 samples/sec Loss 5.0731 LearningRate 0.0004 Epoch: 18 Global Step: 381230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:48,191-Speed 6311.82 samples/sec Loss 5.1514 LearningRate 0.0004 Epoch: 18 Global Step: 381240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:24:51,472-Speed 6242.67 samples/sec Loss 5.1365 LearningRate 0.0004 Epoch: 18 Global Step: 381250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:54,800-Speed 6155.78 samples/sec Loss 5.1151 LearningRate 0.0004 Epoch: 18 Global Step: 381260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:24:58,057-Speed 6289.63 samples/sec Loss 5.1466 LearningRate 0.0004 Epoch: 18 Global Step: 381270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:01,362-Speed 6197.97 samples/sec Loss 5.1292 LearningRate 0.0004 Epoch: 18 Global Step: 381280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:04,611-Speed 6305.49 samples/sec Loss 5.2113 LearningRate 0.0004 Epoch: 18 Global Step: 381290 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:07,857-Speed 6310.25 samples/sec Loss 5.1166 LearningRate 0.0004 Epoch: 18 Global Step: 381300 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:11,101-Speed 6313.84 samples/sec Loss 5.1055 LearningRate 0.0004 Epoch: 18 Global Step: 381310 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:14,345-Speed 6316.34 samples/sec Loss 5.1367 LearningRate 0.0004 Epoch: 18 Global Step: 381320 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:17,588-Speed 6315.99 samples/sec Loss 5.1345 LearningRate 0.0004 Epoch: 18 Global Step: 381330 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:20,840-Speed 6297.99 samples/sec Loss 5.2005 LearningRate 0.0004 Epoch: 18 Global Step: 381340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:24,087-Speed 6308.52 samples/sec Loss 5.1700 LearningRate 0.0004 Epoch: 18 Global Step: 381350 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:27,332-Speed 6313.77 samples/sec Loss 5.0839 LearningRate 0.0004 Epoch: 18 Global Step: 381360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:25:30,564-Speed 6337.96 
samples/sec Loss 5.1422 LearningRate 0.0004 Epoch: 18 Global Step: 381370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:33,806-Speed 6317.90 samples/sec Loss 5.0690 LearningRate 0.0004 Epoch: 18 Global Step: 381380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:37,057-Speed 6301.46 samples/sec Loss 5.1216 LearningRate 0.0004 Epoch: 18 Global Step: 381390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:40,304-Speed 6309.45 samples/sec Loss 5.1784 LearningRate 0.0004 Epoch: 18 Global Step: 381400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:43,549-Speed 6311.09 samples/sec Loss 5.0770 LearningRate 0.0004 Epoch: 18 Global Step: 381410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:46,794-Speed 6313.42 samples/sec Loss 5.1568 LearningRate 0.0004 Epoch: 18 Global Step: 381420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:50,042-Speed 6306.91 samples/sec Loss 5.1605 LearningRate 0.0004 Epoch: 18 Global Step: 381430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:53,290-Speed 6306.41 samples/sec Loss 5.1608 LearningRate 0.0004 Epoch: 18 Global Step: 381440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:56,533-Speed 6316.04 samples/sec Loss 5.2096 LearningRate 0.0004 Epoch: 18 Global Step: 381450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:25:59,779-Speed 6311.04 samples/sec Loss 5.1876 LearningRate 0.0004 Epoch: 18 Global Step: 381460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:03,029-Speed 6304.26 samples/sec Loss 5.2021 LearningRate 0.0004 Epoch: 18 Global Step: 381470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:26:06,278-Speed 6303.86 samples/sec Loss 5.1087 LearningRate 0.0004 Epoch: 18 Global Step: 381480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:26:09,511-Speed 6337.32 samples/sec Loss 5.0944 
LearningRate 0.0004 Epoch: 18 Global Step: 381490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:12,835-Speed 6162.86 samples/sec Loss 5.1121 LearningRate 0.0004 Epoch: 18 Global Step: 381500 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:16,082-Speed 6309.57 samples/sec Loss 5.1369 LearningRate 0.0004 Epoch: 18 Global Step: 381510 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:19,327-Speed 6312.47 samples/sec Loss 5.1805 LearningRate 0.0004 Epoch: 18 Global Step: 381520 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:22,569-Speed 6317.57 samples/sec Loss 5.1117 LearningRate 0.0004 Epoch: 18 Global Step: 381530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:25,814-Speed 6311.78 samples/sec Loss 5.1417 LearningRate 0.0004 Epoch: 18 Global Step: 381540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:29,057-Speed 6317.28 samples/sec Loss 5.1342 LearningRate 0.0004 Epoch: 18 Global Step: 381550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:32,301-Speed 6314.20 samples/sec Loss 5.1701 LearningRate 0.0004 Epoch: 18 Global Step: 381560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:35,544-Speed 6316.12 samples/sec Loss 5.1177 LearningRate 0.0004 Epoch: 18 Global Step: 381570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:38,786-Speed 6319.26 samples/sec Loss 5.1191 LearningRate 0.0004 Epoch: 18 Global Step: 381580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:42,044-Speed 6287.62 samples/sec Loss 5.1141 LearningRate 0.0004 Epoch: 18 Global Step: 381590 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:26:45,290-Speed 6311.08 samples/sec Loss 5.1909 LearningRate 0.0004 Epoch: 18 Global Step: 381600 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:26:48,523-Speed 6334.94 samples/sec Loss 5.1849 LearningRate 0.0004 Epoch: 18 
Global Step: 381610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:51,769-Speed 6311.61 samples/sec Loss 5.1976 LearningRate 0.0004 Epoch: 18 Global Step: 381620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:55,022-Speed 6296.65 samples/sec Loss 5.1189 LearningRate 0.0004 Epoch: 18 Global Step: 381630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:26:58,270-Speed 6306.42 samples/sec Loss 5.1119 LearningRate 0.0004 Epoch: 18 Global Step: 381640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:01,514-Speed 6315.95 samples/sec Loss 5.1140 LearningRate 0.0004 Epoch: 18 Global Step: 381650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:04,761-Speed 6308.87 samples/sec Loss 5.0563 LearningRate 0.0004 Epoch: 18 Global Step: 381660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:08,006-Speed 6311.69 samples/sec Loss 5.0780 LearningRate 0.0004 Epoch: 18 Global Step: 381670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:11,248-Speed 6319.49 samples/sec Loss 5.1422 LearningRate 0.0004 Epoch: 18 Global Step: 381680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:14,495-Speed 6309.02 samples/sec Loss 5.0892 LearningRate 0.0004 Epoch: 18 Global Step: 381690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:17,741-Speed 6311.17 samples/sec Loss 5.1166 LearningRate 0.0004 Epoch: 18 Global Step: 381700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:20,987-Speed 6310.84 samples/sec Loss 5.0871 LearningRate 0.0004 Epoch: 18 Global Step: 381710 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:27:24,242-Speed 6292.45 samples/sec Loss 5.1042 LearningRate 0.0004 Epoch: 18 Global Step: 381720 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:27:27,488-Speed 6310.61 samples/sec Loss 5.1195 LearningRate 0.0004 Epoch: 18 Global Step: 381730 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 02:27:30,718-Speed 6342.20 samples/sec Loss 5.1293 LearningRate 0.0004 Epoch: 18 Global Step: 381740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:33,961-Speed 6316.56 samples/sec Loss 5.1085 LearningRate 0.0004 Epoch: 18 Global Step: 381750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:37,209-Speed 6306.52 samples/sec Loss 5.1497 LearningRate 0.0004 Epoch: 18 Global Step: 381760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:40,460-Speed 6301.90 samples/sec Loss 5.1493 LearningRate 0.0004 Epoch: 18 Global Step: 381770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:43,703-Speed 6315.46 samples/sec Loss 5.1133 LearningRate 0.0004 Epoch: 18 Global Step: 381780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:46,952-Speed 6305.22 samples/sec Loss 5.0883 LearningRate 0.0004 Epoch: 18 Global Step: 381790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:50,202-Speed 6302.47 samples/sec Loss 5.1318 LearningRate 0.0004 Epoch: 18 Global Step: 381800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:53,443-Speed 6320.29 samples/sec Loss 5.1119 LearningRate 0.0004 Epoch: 18 Global Step: 381810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:56,690-Speed 6309.47 samples/sec Loss 5.0891 LearningRate 0.0004 Epoch: 18 Global Step: 381820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:27:59,933-Speed 6315.40 samples/sec Loss 5.1686 LearningRate 0.0004 Epoch: 18 Global Step: 381830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:03,183-Speed 6304.09 samples/sec Loss 5.0961 LearningRate 0.0004 Epoch: 18 Global Step: 381840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:28:06,433-Speed 6302.14 samples/sec Loss 5.0874 LearningRate 0.0004 Epoch: 18 Global Step: 381850 Fp16 Grad Scale: 32768 Required: 41 hours 
Training: 2022-04-02 02:28:09,677-Speed 6314.69 samples/sec Loss 5.1637 LearningRate 0.0004 Epoch: 18 Global Step: 381860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:28:12,913-Speed 6330.42 samples/sec Loss 5.1818 LearningRate 0.0004 Epoch: 18 Global Step: 381870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:16,158-Speed 6312.13 samples/sec Loss 5.1504 LearningRate 0.0004 Epoch: 18 Global Step: 381880 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:19,404-Speed 6310.48 samples/sec Loss 5.1326 LearningRate 0.0004 Epoch: 18 Global Step: 381890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:22,655-Speed 6302.92 samples/sec Loss 5.1321 LearningRate 0.0004 Epoch: 18 Global Step: 381900 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:25,899-Speed 6313.85 samples/sec Loss 5.1565 LearningRate 0.0004 Epoch: 18 Global Step: 381910 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:29,141-Speed 6319.64 samples/sec Loss 5.0696 LearningRate 0.0004 Epoch: 18 Global Step: 381920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:32,388-Speed 6307.72 samples/sec Loss 5.1135 LearningRate 0.0004 Epoch: 18 Global Step: 381930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:35,633-Speed 6313.02 samples/sec Loss 5.1984 LearningRate 0.0004 Epoch: 18 Global Step: 381940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:38,881-Speed 6307.83 samples/sec Loss 5.1221 LearningRate 0.0004 Epoch: 18 Global Step: 381950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:42,130-Speed 6304.34 samples/sec Loss 5.1249 LearningRate 0.0004 Epoch: 18 Global Step: 381960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:28:45,379-Speed 6303.96 samples/sec Loss 5.1063 LearningRate 0.0004 Epoch: 18 Global Step: 381970 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
02:28:48,625-Speed 6311.92 samples/sec Loss 5.1304 LearningRate 0.0004 Epoch: 18 Global Step: 381980 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:28:51,868-Speed 6315.59 samples/sec Loss 5.1652 LearningRate 0.0004 Epoch: 18 Global Step: 381990 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:28:55,115-Speed 6309.73 samples/sec Loss 5.1321 LearningRate 0.0004 Epoch: 18 Global Step: 382000 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:28:58,360-Speed 6312.77 samples/sec Loss 5.1719 LearningRate 0.0004 Epoch: 18 Global Step: 382010 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:01,607-Speed 6308.01 samples/sec Loss 5.1422 LearningRate 0.0004 Epoch: 18 Global Step: 382020 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:04,859-Speed 6297.93 samples/sec Loss 5.0450 LearningRate 0.0004 Epoch: 18 Global Step: 382030 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:08,107-Speed 6307.94 samples/sec Loss 5.0974 LearningRate 0.0004 Epoch: 18 Global Step: 382040 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:11,358-Speed 6300.22 samples/sec Loss 5.0966 LearningRate 0.0004 Epoch: 18 Global Step: 382050 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:14,598-Speed 6324.06 samples/sec Loss 5.1927 LearningRate 0.0004 Epoch: 18 Global Step: 382060 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:17,838-Speed 6321.49 samples/sec Loss 5.1606 LearningRate 0.0004 Epoch: 18 Global Step: 382070 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:21,082-Speed 6315.31 samples/sec Loss 5.1030 LearningRate 0.0004 Epoch: 18 Global Step: 382080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:24,332-Speed 6302.23 samples/sec Loss 5.1406 LearningRate 0.0004 Epoch: 18 Global Step: 382090 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:27,577-Speed 6313.14 
samples/sec Loss 5.0565 LearningRate 0.0004 Epoch: 18 Global Step: 382100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:30,821-Speed 6314.17 samples/sec Loss 5.1239 LearningRate 0.0004 Epoch: 18 Global Step: 382110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:34,067-Speed 6310.95 samples/sec Loss 5.1820 LearningRate 0.0004 Epoch: 18 Global Step: 382120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:37,313-Speed 6310.43 samples/sec Loss 5.1194 LearningRate 0.0004 Epoch: 18 Global Step: 382130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:40,566-Speed 6298.12 samples/sec Loss 5.1324 LearningRate 0.0004 Epoch: 18 Global Step: 382140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:43,815-Speed 6304.64 samples/sec Loss 5.1041 LearningRate 0.0004 Epoch: 18 Global Step: 382150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:47,059-Speed 6315.61 samples/sec Loss 5.1108 LearningRate 0.0004 Epoch: 18 Global Step: 382160 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:29:50,280-Speed 6359.71 samples/sec Loss 5.0950 LearningRate 0.0004 Epoch: 18 Global Step: 382170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:29:53,525-Speed 6311.76 samples/sec Loss 5.0940 LearningRate 0.0004 Epoch: 18 Global Step: 382180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:29:56,770-Speed 6311.58 samples/sec Loss 5.1250 LearningRate 0.0004 Epoch: 18 Global Step: 382190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:00,022-Speed 6300.18 samples/sec Loss 5.1547 LearningRate 0.0004 Epoch: 18 Global Step: 382200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:03,271-Speed 6305.36 samples/sec Loss 5.0930 LearningRate 0.0004 Epoch: 18 Global Step: 382210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:06,518-Speed 6307.73 samples/sec Loss 5.1286 
LearningRate 0.0004 Epoch: 18 Global Step: 382220 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:09,763-Speed 6312.61 samples/sec Loss 5.1307 LearningRate 0.0004 Epoch: 18 Global Step: 382230 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:13,013-Speed 6303.32 samples/sec Loss 5.1268 LearningRate 0.0004 Epoch: 18 Global Step: 382240 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:16,255-Speed 6318.24 samples/sec Loss 5.1543 LearningRate 0.0004 Epoch: 18 Global Step: 382250 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:19,513-Speed 6288.95 samples/sec Loss 5.1272 LearningRate 0.0004 Epoch: 18 Global Step: 382260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:30:22,759-Speed 6309.43 samples/sec Loss 5.1211 LearningRate 0.0004 Epoch: 18 Global Step: 382270 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:26,010-Speed 6301.43 samples/sec Loss 5.1356 LearningRate 0.0004 Epoch: 18 Global Step: 382280 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:29,253-Speed 6316.66 samples/sec Loss 5.1026 LearningRate 0.0004 Epoch: 18 Global Step: 382290 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:32,500-Speed 6309.36 samples/sec Loss 5.1076 LearningRate 0.0004 Epoch: 18 Global Step: 382300 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:35,745-Speed 6311.22 samples/sec Loss 5.0384 LearningRate 0.0004 Epoch: 18 Global Step: 382310 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:38,992-Speed 6308.70 samples/sec Loss 5.0945 LearningRate 0.0004 Epoch: 18 Global Step: 382320 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:42,248-Speed 6291.41 samples/sec Loss 5.0404 LearningRate 0.0004 Epoch: 18 Global Step: 382330 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:45,502-Speed 6296.83 samples/sec Loss 5.1759 LearningRate 0.0004 Epoch: 18 
Global Step: 382340 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:48,747-Speed 6311.74 samples/sec Loss 5.1105 LearningRate 0.0004 Epoch: 18 Global Step: 382350 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:51,996-Speed 6305.37 samples/sec Loss 5.1166 LearningRate 0.0004 Epoch: 18 Global Step: 382360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:55,230-Speed 6335.06 samples/sec Loss 5.1459 LearningRate 0.0004 Epoch: 18 Global Step: 382370 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:30:58,483-Speed 6297.62 samples/sec Loss 5.1515 LearningRate 0.0004 Epoch: 18 Global Step: 382380 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:31:01,730-Speed 6308.13 samples/sec Loss 5.1144 LearningRate 0.0004 Epoch: 18 Global Step: 382390 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:31:04,961-Speed 6340.11 samples/sec Loss 5.1165 LearningRate 0.0004 Epoch: 18 Global Step: 382400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:08,209-Speed 6306.16 samples/sec Loss 5.1198 LearningRate 0.0004 Epoch: 18 Global Step: 382410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:11,456-Speed 6309.03 samples/sec Loss 5.1488 LearningRate 0.0004 Epoch: 18 Global Step: 382420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:14,706-Speed 6302.96 samples/sec Loss 5.0621 LearningRate 0.0004 Epoch: 18 Global Step: 382430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:17,951-Speed 6311.59 samples/sec Loss 5.1464 LearningRate 0.0004 Epoch: 18 Global Step: 382440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:21,196-Speed 6312.42 samples/sec Loss 5.1106 LearningRate 0.0004 Epoch: 18 Global Step: 382450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:24,441-Speed 6313.00 samples/sec Loss 5.1147 LearningRate 0.0004 Epoch: 18 Global Step: 382460 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:27,690-Speed 6305.14 samples/sec Loss 5.1411 LearningRate 0.0004 Epoch: 18 Global Step: 382470 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:30,938-Speed 6306.57 samples/sec Loss 5.1505 LearningRate 0.0004 Epoch: 18 Global Step: 382480 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:34,178-Speed 6323.34 samples/sec Loss 5.0907 LearningRate 0.0004 Epoch: 18 Global Step: 382490 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:37,422-Speed 6313.03 samples/sec Loss 5.1968 LearningRate 0.0004 Epoch: 18 Global Step: 382500 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:31:40,701-Speed 6248.26 samples/sec Loss 5.1812 LearningRate 0.0004 Epoch: 18 Global Step: 382510 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:31:43,949-Speed 6306.89 samples/sec Loss 5.0979 LearningRate 0.0004 Epoch: 18 Global Step: 382520 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:31:47,182-Speed 6335.92 samples/sec Loss 5.1129 LearningRate 0.0004 Epoch: 18 Global Step: 382530 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:50,426-Speed 6314.43 samples/sec Loss 5.1515 LearningRate 0.0004 Epoch: 18 Global Step: 382540 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:53,672-Speed 6311.11 samples/sec Loss 5.1949 LearningRate 0.0004 Epoch: 18 Global Step: 382550 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:31:56,919-Speed 6309.77 samples/sec Loss 5.0983 LearningRate 0.0004 Epoch: 18 Global Step: 382560 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:00,163-Speed 6314.95 samples/sec Loss 5.1129 LearningRate 0.0004 Epoch: 18 Global Step: 382570 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:03,414-Speed 6301.07 samples/sec Loss 5.1748 LearningRate 0.0004 Epoch: 18 Global Step: 382580 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:32:06,701-Speed 6230.84 samples/sec Loss 5.0907 LearningRate 0.0004 Epoch: 18 Global Step: 382590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:09,944-Speed 6317.21 samples/sec Loss 5.1694 LearningRate 0.0004 Epoch: 18 Global Step: 382600 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:13,191-Speed 6308.67 samples/sec Loss 5.1539 LearningRate 0.0004 Epoch: 18 Global Step: 382610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:16,440-Speed 6305.09 samples/sec Loss 5.1196 LearningRate 0.0004 Epoch: 18 Global Step: 382620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:19,689-Speed 6304.75 samples/sec Loss 5.1162 LearningRate 0.0004 Epoch: 18 Global Step: 382630 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:32:22,935-Speed 6311.06 samples/sec Loss 5.0618 LearningRate 0.0004 Epoch: 18 Global Step: 382640 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:32:26,184-Speed 6305.34 samples/sec Loss 5.0516 LearningRate 0.0004 Epoch: 18 Global Step: 382650 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:32:29,429-Speed 6310.78 samples/sec Loss 5.0965 LearningRate 0.0004 Epoch: 18 Global Step: 382660 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:32:32,661-Speed 6339.40 samples/sec Loss 5.1359 LearningRate 0.0004 Epoch: 18 Global Step: 382670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:35,919-Speed 6286.46 samples/sec Loss 5.1114 LearningRate 0.0004 Epoch: 18 Global Step: 382680 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:39,224-Speed 6198.33 samples/sec Loss 5.2042 LearningRate 0.0004 Epoch: 18 Global Step: 382690 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:42,473-Speed 6305.37 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 18 Global Step: 382700 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:32:45,723-Speed 6302.98 samples/sec Loss 5.1260 LearningRate 0.0004 Epoch: 18 Global Step: 382710 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:48,968-Speed 6311.51 samples/sec Loss 5.1311 LearningRate 0.0004 Epoch: 18 Global Step: 382720 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:52,213-Speed 6313.85 samples/sec Loss 5.1834 LearningRate 0.0004 Epoch: 18 Global Step: 382730 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:55,460-Speed 6308.40 samples/sec Loss 5.1159 LearningRate 0.0004 Epoch: 18 Global Step: 382740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:32:58,708-Speed 6307.10 samples/sec Loss 5.1041 LearningRate 0.0004 Epoch: 18 Global Step: 382750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:01,953-Speed 6313.60 samples/sec Loss 5.1263 LearningRate 0.0004 Epoch: 18 Global Step: 382760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:05,202-Speed 6305.17 samples/sec Loss 5.1988 LearningRate 0.0004 Epoch: 18 Global Step: 382770 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:33:08,448-Speed 6309.83 samples/sec Loss 5.0842 LearningRate 0.0004 Epoch: 18 Global Step: 382780 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:33:11,711-Speed 6277.63 samples/sec Loss 5.1332 LearningRate 0.0004 Epoch: 18 Global Step: 382790 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:33:14,944-Speed 6337.56 samples/sec Loss 5.0594 LearningRate 0.0004 Epoch: 18 Global Step: 382800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:18,188-Speed 6315.62 samples/sec Loss 5.0819 LearningRate 0.0004 Epoch: 18 Global Step: 382810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:21,436-Speed 6305.95 samples/sec Loss 5.1492 LearningRate 0.0004 Epoch: 18 Global Step: 382820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:24,685-Speed 6304.06 
samples/sec Loss 5.1324 LearningRate 0.0004 Epoch: 18 Global Step: 382830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:27,933-Speed 6307.22 samples/sec Loss 5.1607 LearningRate 0.0004 Epoch: 18 Global Step: 382840 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:31,178-Speed 6313.42 samples/sec Loss 5.1793 LearningRate 0.0004 Epoch: 18 Global Step: 382850 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:34,426-Speed 6307.51 samples/sec Loss 5.0999 LearningRate 0.0004 Epoch: 18 Global Step: 382860 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:37,703-Speed 6249.82 samples/sec Loss 5.1031 LearningRate 0.0004 Epoch: 18 Global Step: 382870 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:40,945-Speed 6318.27 samples/sec Loss 5.0520 LearningRate 0.0004 Epoch: 18 Global Step: 382880 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:44,193-Speed 6307.07 samples/sec Loss 5.1702 LearningRate 0.0004 Epoch: 18 Global Step: 382890 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:47,460-Speed 6270.44 samples/sec Loss 5.1696 LearningRate 0.0004 Epoch: 18 Global Step: 382900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:33:50,736-Speed 6252.25 samples/sec Loss 5.0623 LearningRate 0.0004 Epoch: 18 Global Step: 382910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:33:54,008-Speed 6260.43 samples/sec Loss 5.1545 LearningRate 0.0004 Epoch: 18 Global Step: 382920 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:33:57,257-Speed 6304.47 samples/sec Loss 5.0905 LearningRate 0.0004 Epoch: 18 Global Step: 382930 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:00,502-Speed 6313.79 samples/sec Loss 5.2022 LearningRate 0.0004 Epoch: 18 Global Step: 382940 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:03,752-Speed 6301.71 samples/sec Loss 5.1667 
LearningRate 0.0004 Epoch: 18 Global Step: 382950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:07,001-Speed 6305.80 samples/sec Loss 5.1631 LearningRate 0.0004 Epoch: 18 Global Step: 382960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:10,246-Speed 6313.73 samples/sec Loss 5.0919 LearningRate 0.0004 Epoch: 18 Global Step: 382970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:13,488-Speed 6318.17 samples/sec Loss 5.1032 LearningRate 0.0004 Epoch: 18 Global Step: 382980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:16,735-Speed 6309.07 samples/sec Loss 5.1423 LearningRate 0.0004 Epoch: 18 Global Step: 382990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:19,982-Speed 6308.45 samples/sec Loss 5.0806 LearningRate 0.0004 Epoch: 18 Global Step: 383000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:23,232-Speed 6303.71 samples/sec Loss 5.1347 LearningRate 0.0004 Epoch: 18 Global Step: 383010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:34:26,545-Speed 6183.37 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 383020 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:29,821-Speed 6251.83 samples/sec Loss 5.0511 LearningRate 0.0004 Epoch: 18 Global Step: 383030 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:33,064-Speed 6316.98 samples/sec Loss 5.1086 LearningRate 0.0004 Epoch: 18 Global Step: 383040 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:36,310-Speed 6310.19 samples/sec Loss 5.1006 LearningRate 0.0004 Epoch: 18 Global Step: 383050 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:39,564-Speed 6295.49 samples/sec Loss 5.1274 LearningRate 0.0004 Epoch: 18 Global Step: 383060 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:42,811-Speed 6308.17 samples/sec Loss 5.2438 LearningRate 0.0004 Epoch: 18 
Global Step: 383070 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:46,058-Speed 6309.76 samples/sec Loss 5.1022 LearningRate 0.0004 Epoch: 18 Global Step: 383080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:49,306-Speed 6307.11 samples/sec Loss 5.0743 LearningRate 0.0004 Epoch: 18 Global Step: 383090 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:52,552-Speed 6309.85 samples/sec Loss 5.0871 LearningRate 0.0004 Epoch: 18 Global Step: 383100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:55,796-Speed 6315.60 samples/sec Loss 5.1329 LearningRate 0.0004 Epoch: 18 Global Step: 383110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:34:59,015-Speed 6362.67 samples/sec Loss 5.0992 LearningRate 0.0004 Epoch: 18 Global Step: 383120 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:02,262-Speed 6307.71 samples/sec Loss 5.2287 LearningRate 0.0004 Epoch: 18 Global Step: 383130 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:05,509-Speed 6309.19 samples/sec Loss 5.0949 LearningRate 0.0004 Epoch: 18 Global Step: 383140 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:08,759-Speed 6303.91 samples/sec Loss 5.0884 LearningRate 0.0004 Epoch: 18 Global Step: 383150 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:12,010-Speed 6300.15 samples/sec Loss 5.1383 LearningRate 0.0004 Epoch: 18 Global Step: 383160 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:15,259-Speed 6306.69 samples/sec Loss 5.1245 LearningRate 0.0004 Epoch: 18 Global Step: 383170 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:18,506-Speed 6307.38 samples/sec Loss 5.1728 LearningRate 0.0004 Epoch: 18 Global Step: 383180 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:21,757-Speed 6301.93 samples/sec Loss 5.0374 LearningRate 0.0004 Epoch: 18 Global Step: 383190 Fp16 Grad 
Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:25,007-Speed 6303.46 samples/sec Loss 5.1457 LearningRate 0.0004 Epoch: 18 Global Step: 383200 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:28,255-Speed 6306.09 samples/sec Loss 5.0827 LearningRate 0.0004 Epoch: 18 Global Step: 383210 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:31,502-Speed 6308.94 samples/sec Loss 5.0635 LearningRate 0.0004 Epoch: 18 Global Step: 383220 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:35:34,750-Speed 6307.19 samples/sec Loss 5.0480 LearningRate 0.0004 Epoch: 18 Global Step: 383230 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:35:37,994-Speed 6314.52 samples/sec Loss 5.1456 LearningRate 0.0004 Epoch: 18 Global Step: 383240 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:35:41,240-Speed 6309.54 samples/sec Loss 5.1375 LearningRate 0.0004 Epoch: 18 Global Step: 383250 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:35:44,472-Speed 6339.25 samples/sec Loss 5.0721 LearningRate 0.0004 Epoch: 18 Global Step: 383260 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:47,718-Speed 6310.47 samples/sec Loss 5.1209 LearningRate 0.0004 Epoch: 18 Global Step: 383270 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:50,963-Speed 6314.50 samples/sec Loss 5.1255 LearningRate 0.0004 Epoch: 18 Global Step: 383280 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:54,211-Speed 6307.01 samples/sec Loss 5.1074 LearningRate 0.0004 Epoch: 18 Global Step: 383290 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:35:57,457-Speed 6310.39 samples/sec Loss 5.1608 LearningRate 0.0004 Epoch: 18 Global Step: 383300 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:00,708-Speed 6299.99 samples/sec Loss 5.0981 LearningRate 0.0004 Epoch: 18 Global Step: 383310 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:36:03,955-Speed 6308.55 samples/sec Loss 5.1021 LearningRate 0.0004 Epoch: 18 Global Step: 383320 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:07,204-Speed 6305.01 samples/sec Loss 5.1335 LearningRate 0.0004 Epoch: 18 Global Step: 383330 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:10,458-Speed 6296.49 samples/sec Loss 5.0670 LearningRate 0.0004 Epoch: 18 Global Step: 383340 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:13,707-Speed 6304.70 samples/sec Loss 5.1580 LearningRate 0.0004 Epoch: 18 Global Step: 383350 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:16,951-Speed 6313.56 samples/sec Loss 5.0763 LearningRate 0.0004 Epoch: 18 Global Step: 383360 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:36:20,182-Speed 6341.09 samples/sec Loss 5.1074 LearningRate 0.0004 Epoch: 18 Global Step: 383370 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:23,429-Speed 6310.19 samples/sec Loss 5.1787 LearningRate 0.0004 Epoch: 18 Global Step: 383380 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:26,688-Speed 6285.28 samples/sec Loss 5.0889 LearningRate 0.0004 Epoch: 18 Global Step: 383390 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:29,943-Speed 6292.23 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 383400 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:33,183-Speed 6322.18 samples/sec Loss 5.1167 LearningRate 0.0004 Epoch: 18 Global Step: 383410 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:36,430-Speed 6310.24 samples/sec Loss 5.1489 LearningRate 0.0004 Epoch: 18 Global Step: 383420 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:39,676-Speed 6309.13 samples/sec Loss 5.1077 LearningRate 0.0004 Epoch: 18 Global Step: 383430 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 
02:36:42,924-Speed 6306.97 samples/sec Loss 5.0932 LearningRate 0.0004 Epoch: 18 Global Step: 383440 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:46,182-Speed 6288.54 samples/sec Loss 5.1431 LearningRate 0.0004 Epoch: 18 Global Step: 383450 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:49,429-Speed 6307.86 samples/sec Loss 5.1218 LearningRate 0.0004 Epoch: 18 Global Step: 383460 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:36:52,674-Speed 6313.07 samples/sec Loss 5.1495 LearningRate 0.0004 Epoch: 18 Global Step: 383470 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:36:55,920-Speed 6310.28 samples/sec Loss 5.1610 LearningRate 0.0004 Epoch: 18 Global Step: 383480 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:36:59,168-Speed 6307.20 samples/sec Loss 5.1488 LearningRate 0.0004 Epoch: 18 Global Step: 383490 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:02,413-Speed 6311.18 samples/sec Loss 5.1409 LearningRate 0.0004 Epoch: 18 Global Step: 383500 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:05,658-Speed 6313.37 samples/sec Loss 5.1497 LearningRate 0.0004 Epoch: 18 Global Step: 383510 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:08,904-Speed 6310.51 samples/sec Loss 5.1106 LearningRate 0.0004 Epoch: 18 Global Step: 383520 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:12,151-Speed 6310.01 samples/sec Loss 5.1311 LearningRate 0.0004 Epoch: 18 Global Step: 383530 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:15,397-Speed 6310.36 samples/sec Loss 5.0967 LearningRate 0.0004 Epoch: 18 Global Step: 383540 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:18,641-Speed 6314.43 samples/sec Loss 5.1460 LearningRate 0.0004 Epoch: 18 Global Step: 383550 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:21,885-Speed 6314.46 
samples/sec Loss 5.1444 LearningRate 0.0004 Epoch: 18 Global Step: 383560 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:25,121-Speed 6330.58 samples/sec Loss 5.0636 LearningRate 0.0004 Epoch: 18 Global Step: 383570 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:37:28,360-Speed 6324.17 samples/sec Loss 5.1826 LearningRate 0.0004 Epoch: 18 Global Step: 383580 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:31,605-Speed 6312.05 samples/sec Loss 5.1058 LearningRate 0.0004 Epoch: 18 Global Step: 383590 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:34,853-Speed 6308.26 samples/sec Loss 5.2096 LearningRate 0.0004 Epoch: 18 Global Step: 383600 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:38,097-Speed 6314.82 samples/sec Loss 5.1043 LearningRate 0.0004 Epoch: 18 Global Step: 383610 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:41,344-Speed 6309.13 samples/sec Loss 5.0963 LearningRate 0.0004 Epoch: 18 Global Step: 383620 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:44,587-Speed 6315.27 samples/sec Loss 5.0743 LearningRate 0.0004 Epoch: 18 Global Step: 383630 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:47,833-Speed 6310.39 samples/sec Loss 5.1332 LearningRate 0.0004 Epoch: 18 Global Step: 383640 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:51,082-Speed 6305.79 samples/sec Loss 5.0532 LearningRate 0.0004 Epoch: 18 Global Step: 383650 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:54,334-Speed 6299.53 samples/sec Loss 5.0665 LearningRate 0.0004 Epoch: 18 Global Step: 383660 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:37:57,574-Speed 6321.01 samples/sec Loss 5.1321 LearningRate 0.0004 Epoch: 18 Global Step: 383670 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:00,825-Speed 6301.05 samples/sec Loss 5.1066 
LearningRate 0.0004 Epoch: 18 Global Step: 383680 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:04,073-Speed 6306.69 samples/sec Loss 5.1362 LearningRate 0.0004 Epoch: 18 Global Step: 383690 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:07,319-Speed 6312.17 samples/sec Loss 5.0899 LearningRate 0.0004 Epoch: 18 Global Step: 383700 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:10,569-Speed 6302.96 samples/sec Loss 5.0894 LearningRate 0.0004 Epoch: 18 Global Step: 383710 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:13,815-Speed 6309.65 samples/sec Loss 5.0946 LearningRate 0.0004 Epoch: 18 Global Step: 383720 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:17,176-Speed 6094.42 samples/sec Loss 5.0427 LearningRate 0.0004 Epoch: 18 Global Step: 383730 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:20,409-Speed 6336.83 samples/sec Loss 5.1289 LearningRate 0.0004 Epoch: 18 Global Step: 383740 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:23,666-Speed 6289.80 samples/sec Loss 5.0487 LearningRate 0.0004 Epoch: 18 Global Step: 383750 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:26,912-Speed 6309.30 samples/sec Loss 5.1132 LearningRate 0.0004 Epoch: 18 Global Step: 383760 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:30,154-Speed 6319.66 samples/sec Loss 5.0935 LearningRate 0.0004 Epoch: 18 Global Step: 383770 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:33,401-Speed 6308.56 samples/sec Loss 5.0359 LearningRate 0.0004 Epoch: 18 Global Step: 383780 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:36,649-Speed 6308.50 samples/sec Loss 5.1425 LearningRate 0.0004 Epoch: 18 Global Step: 383790 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:39,896-Speed 6307.62 samples/sec Loss 5.1159 LearningRate 0.0004 Epoch: 18 
Global Step: 383800 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:43,141-Speed 6312.14 samples/sec Loss 5.1284 LearningRate 0.0004 Epoch: 18 Global Step: 383810 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:46,386-Speed 6312.81 samples/sec Loss 5.0882 LearningRate 0.0004 Epoch: 18 Global Step: 383820 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:49,633-Speed 6309.92 samples/sec Loss 5.1348 LearningRate 0.0004 Epoch: 18 Global Step: 383830 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:38:52,878-Speed 6311.69 samples/sec Loss 5.0922 LearningRate 0.0004 Epoch: 18 Global Step: 383840 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:56,127-Speed 6305.65 samples/sec Loss 5.1445 LearningRate 0.0004 Epoch: 18 Global Step: 383850 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:38:59,374-Speed 6307.66 samples/sec Loss 5.1548 LearningRate 0.0004 Epoch: 18 Global Step: 383860 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:02,617-Speed 6316.49 samples/sec Loss 5.1526 LearningRate 0.0004 Epoch: 18 Global Step: 383870 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:05,888-Speed 6262.68 samples/sec Loss 5.1047 LearningRate 0.0004 Epoch: 18 Global Step: 383880 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:09,130-Speed 6318.81 samples/sec Loss 5.1050 LearningRate 0.0004 Epoch: 18 Global Step: 383890 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:12,375-Speed 6313.49 samples/sec Loss 5.0927 LearningRate 0.0004 Epoch: 18 Global Step: 383900 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:15,630-Speed 6291.73 samples/sec Loss 5.1223 LearningRate 0.0004 Epoch: 18 Global Step: 383910 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:18,873-Speed 6316.79 samples/sec Loss 5.1370 LearningRate 0.0004 Epoch: 18 Global Step: 383920 Fp16 Grad 
Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:22,120-Speed 6308.72 samples/sec Loss 5.0957 LearningRate 0.0004 Epoch: 18 Global Step: 383930 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:25,351-Speed 6339.97 samples/sec Loss 5.1421 LearningRate 0.0004 Epoch: 18 Global Step: 383940 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:39:28,588-Speed 6329.05 samples/sec Loss 5.1000 LearningRate 0.0004 Epoch: 18 Global Step: 383950 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:31,829-Speed 6319.14 samples/sec Loss 5.1233 LearningRate 0.0004 Epoch: 18 Global Step: 383960 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:35,077-Speed 6308.22 samples/sec Loss 5.0825 LearningRate 0.0004 Epoch: 18 Global Step: 383970 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:38,320-Speed 6315.01 samples/sec Loss 5.1431 LearningRate 0.0004 Epoch: 18 Global Step: 383980 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:41,609-Speed 6229.51 samples/sec Loss 5.0945 LearningRate 0.0004 Epoch: 18 Global Step: 383990 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:44,855-Speed 6310.46 samples/sec Loss 5.1783 LearningRate 0.0004 Epoch: 18 Global Step: 384000 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:48,104-Speed 6304.66 samples/sec Loss 5.0733 LearningRate 0.0004 Epoch: 18 Global Step: 384010 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:51,350-Speed 6312.53 samples/sec Loss 5.0140 LearningRate 0.0004 Epoch: 18 Global Step: 384020 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:54,596-Speed 6309.79 samples/sec Loss 5.0793 LearningRate 0.0004 Epoch: 18 Global Step: 384030 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:39:57,842-Speed 6312.24 samples/sec Loss 5.0798 LearningRate 0.0004 Epoch: 18 Global Step: 384040 Fp16 Grad Scale: 16384 Required: 41 hours 
Training: 2022-04-02 02:40:01,088-Speed 6310.38 samples/sec Loss 5.1268 LearningRate 0.0004 Epoch: 18 Global Step: 384050 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:04,339-Speed 6300.79 samples/sec Loss 5.1663 LearningRate 0.0004 Epoch: 18 Global Step: 384060 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:07,587-Speed 6306.45 samples/sec Loss 5.1048 LearningRate 0.0004 Epoch: 18 Global Step: 384070 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:10,833-Speed 6311.14 samples/sec Loss 5.0301 LearningRate 0.0004 Epoch: 18 Global Step: 384080 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:14,078-Speed 6311.62 samples/sec Loss 5.1359 LearningRate 0.0004 Epoch: 18 Global Step: 384090 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:17,329-Speed 6300.17 samples/sec Loss 5.1191 LearningRate 0.0004 Epoch: 18 Global Step: 384100 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:20,573-Speed 6315.57 samples/sec Loss 5.1596 LearningRate 0.0004 Epoch: 18 Global Step: 384110 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:23,824-Speed 6301.85 samples/sec Loss 5.0941 LearningRate 0.0004 Epoch: 18 Global Step: 384120 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:27,069-Speed 6312.76 samples/sec Loss 5.1598 LearningRate 0.0004 Epoch: 18 Global Step: 384130 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:30,313-Speed 6313.84 samples/sec Loss 5.1495 LearningRate 0.0004 Epoch: 18 Global Step: 384140 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:33,547-Speed 6334.81 samples/sec Loss 5.0928 LearningRate 0.0004 Epoch: 18 Global Step: 384150 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:36,793-Speed 6310.35 samples/sec Loss 5.1298 LearningRate 0.0004 Epoch: 18 Global Step: 384160 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 
02:40:40,039-Speed 6310.90 samples/sec Loss 5.1141 LearningRate 0.0004 Epoch: 18 Global Step: 384170 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:43,284-Speed 6311.83 samples/sec Loss 5.1630 LearningRate 0.0004 Epoch: 18 Global Step: 384180 Fp16 Grad Scale: 32768 Required: 41 hours Training: 2022-04-02 02:40:46,528-Speed 6314.64 samples/sec Loss 5.1281 LearningRate 0.0004 Epoch: 18 Global Step: 384190 Fp16 Grad Scale: 16384 Required: 41 hours Training: 2022-04-02 02:40:49,773-Speed 6311.93 samples/sec Loss 5.1687 LearningRate 0.0004 Epoch: 18 Global Step: 384200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:40:53,033-Speed 6283.70 samples/sec Loss 5.1131 LearningRate 0.0004 Epoch: 18 Global Step: 384210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:40:56,279-Speed 6311.50 samples/sec Loss 5.1147 LearningRate 0.0004 Epoch: 18 Global Step: 384220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:40:59,523-Speed 6315.67 samples/sec Loss 5.1290 LearningRate 0.0004 Epoch: 18 Global Step: 384230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:02,767-Speed 6314.52 samples/sec Loss 5.1246 LearningRate 0.0004 Epoch: 18 Global Step: 384240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:06,010-Speed 6315.53 samples/sec Loss 5.1281 LearningRate 0.0004 Epoch: 18 Global Step: 384250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:09,259-Speed 6306.09 samples/sec Loss 5.0916 LearningRate 0.0004 Epoch: 18 Global Step: 384260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:12,501-Speed 6318.06 samples/sec Loss 5.1234 LearningRate 0.0004 Epoch: 18 Global Step: 384270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:15,744-Speed 6316.33 samples/sec Loss 5.1437 LearningRate 0.0004 Epoch: 18 Global Step: 384280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:18,991-Speed 6309.86 
samples/sec Loss 5.1262 LearningRate 0.0004 Epoch: 18 Global Step: 384290 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:41:22,264-Speed 6257.68 samples/sec Loss 5.1205 LearningRate 0.0004 Epoch: 18 Global Step: 384300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:41:25,523-Speed 6286.37 samples/sec Loss 5.1320 LearningRate 0.0004 Epoch: 18 Global Step: 384310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:41:28,769-Speed 6311.10 samples/sec Loss 4.9844 LearningRate 0.0004 Epoch: 18 Global Step: 384320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:41:32,003-Speed 6333.77 samples/sec Loss 5.1670 LearningRate 0.0004 Epoch: 18 Global Step: 384330 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:35,249-Speed 6309.90 samples/sec Loss 5.1438 LearningRate 0.0004 Epoch: 18 Global Step: 384340 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:38,493-Speed 6314.62 samples/sec Loss 5.1485 LearningRate 0.0004 Epoch: 18 Global Step: 384350 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:41,741-Speed 6307.36 samples/sec Loss 5.1545 LearningRate 0.0004 Epoch: 18 Global Step: 384360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:44,984-Speed 6315.36 samples/sec Loss 5.0862 LearningRate 0.0004 Epoch: 18 Global Step: 384370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:48,232-Speed 6306.95 samples/sec Loss 5.1699 LearningRate 0.0004 Epoch: 18 Global Step: 384380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:51,479-Speed 6310.08 samples/sec Loss 5.1207 LearningRate 0.0004 Epoch: 18 Global Step: 384390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:54,723-Speed 6314.38 samples/sec Loss 5.1220 LearningRate 0.0004 Epoch: 18 Global Step: 384400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:41:57,972-Speed 6303.62 samples/sec Loss 5.1601 
LearningRate 0.0004 Epoch: 18 Global Step: 384410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:01,214-Speed 6319.38 samples/sec Loss 5.1691 LearningRate 0.0004 Epoch: 18 Global Step: 384420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:04,463-Speed 6304.66 samples/sec Loss 5.1070 LearningRate 0.0004 Epoch: 18 Global Step: 384430 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:42:07,689-Speed 6350.40 samples/sec Loss 5.1023 LearningRate 0.0004 Epoch: 18 Global Step: 384440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:10,933-Speed 6315.58 samples/sec Loss 5.1121 LearningRate 0.0004 Epoch: 18 Global Step: 384450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:14,178-Speed 6311.96 samples/sec Loss 5.1272 LearningRate 0.0004 Epoch: 18 Global Step: 384460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:17,422-Speed 6315.32 samples/sec Loss 5.0778 LearningRate 0.0004 Epoch: 18 Global Step: 384470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:20,664-Speed 6317.85 samples/sec Loss 5.0601 LearningRate 0.0004 Epoch: 18 Global Step: 384480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:23,912-Speed 6306.82 samples/sec Loss 5.1341 LearningRate 0.0004 Epoch: 18 Global Step: 384490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:27,160-Speed 6308.38 samples/sec Loss 5.1531 LearningRate 0.0004 Epoch: 18 Global Step: 384500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:30,411-Speed 6299.84 samples/sec Loss 5.1168 LearningRate 0.0004 Epoch: 18 Global Step: 384510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:33,656-Speed 6313.02 samples/sec Loss 5.1454 LearningRate 0.0004 Epoch: 18 Global Step: 384520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:36,950-Speed 6217.57 samples/sec Loss 5.1209 LearningRate 0.0004 Epoch: 18 
Global Step: 384530 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:40,196-Speed 6312.04 samples/sec Loss 5.1222 LearningRate 0.0004 Epoch: 18 Global Step: 384540 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:42:43,442-Speed 6309.48 samples/sec Loss 5.0861 LearningRate 0.0004 Epoch: 18 Global Step: 384550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:42:46,694-Speed 6299.56 samples/sec Loss 5.0581 LearningRate 0.0004 Epoch: 18 Global Step: 384560 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:42:49,924-Speed 6343.04 samples/sec Loss 5.0588 LearningRate 0.0004 Epoch: 18 Global Step: 384570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:53,170-Speed 6308.92 samples/sec Loss 5.1963 LearningRate 0.0004 Epoch: 18 Global Step: 384580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:56,416-Speed 6311.43 samples/sec Loss 5.0752 LearningRate 0.0004 Epoch: 18 Global Step: 384590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:42:59,659-Speed 6317.01 samples/sec Loss 5.1226 LearningRate 0.0004 Epoch: 18 Global Step: 384600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:02,911-Speed 6297.76 samples/sec Loss 5.0857 LearningRate 0.0004 Epoch: 18 Global Step: 384610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:06,157-Speed 6311.48 samples/sec Loss 5.0636 LearningRate 0.0004 Epoch: 18 Global Step: 384620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:09,404-Speed 6310.25 samples/sec Loss 5.1351 LearningRate 0.0004 Epoch: 18 Global Step: 384630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:12,654-Speed 6301.92 samples/sec Loss 5.1657 LearningRate 0.0004 Epoch: 18 Global Step: 384640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:15,896-Speed 6318.26 samples/sec Loss 5.1315 LearningRate 0.0004 Epoch: 18 Global Step: 384650 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:19,141-Speed 6314.47 samples/sec Loss 5.1863 LearningRate 0.0004 Epoch: 18 Global Step: 384660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:22,372-Speed 6338.89 samples/sec Loss 5.1735 LearningRate 0.0004 Epoch: 18 Global Step: 384670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:25,623-Speed 6300.50 samples/sec Loss 5.1147 LearningRate 0.0004 Epoch: 18 Global Step: 384680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:28,865-Speed 6318.35 samples/sec Loss 5.1160 LearningRate 0.0004 Epoch: 18 Global Step: 384690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:32,115-Speed 6304.12 samples/sec Loss 5.0505 LearningRate 0.0004 Epoch: 18 Global Step: 384700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:35,359-Speed 6313.85 samples/sec Loss 5.0863 LearningRate 0.0004 Epoch: 18 Global Step: 384710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:38,628-Speed 6266.08 samples/sec Loss 5.0971 LearningRate 0.0004 Epoch: 18 Global Step: 384720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:41,902-Speed 6256.15 samples/sec Loss 5.1198 LearningRate 0.0004 Epoch: 18 Global Step: 384730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:45,152-Speed 6302.86 samples/sec Loss 5.1554 LearningRate 0.0004 Epoch: 18 Global Step: 384740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:48,413-Speed 6282.80 samples/sec Loss 5.2137 LearningRate 0.0004 Epoch: 18 Global Step: 384750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:51,772-Speed 6098.87 samples/sec Loss 5.0575 LearningRate 0.0004 Epoch: 18 Global Step: 384760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:43:55,014-Speed 6317.43 samples/sec Loss 5.1069 LearningRate 0.0004 Epoch: 18 Global Step: 384770 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 02:43:58,266-Speed 6298.85 samples/sec Loss 5.0925 LearningRate 0.0004 Epoch: 18 Global Step: 384780 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:44:01,495-Speed 6344.75 samples/sec Loss 5.0840 LearningRate 0.0004 Epoch: 18 Global Step: 384790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:04,743-Speed 6307.16 samples/sec Loss 5.1118 LearningRate 0.0004 Epoch: 18 Global Step: 384800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:07,987-Speed 6313.27 samples/sec Loss 5.0599 LearningRate 0.0004 Epoch: 18 Global Step: 384810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:11,252-Speed 6275.73 samples/sec Loss 5.0769 LearningRate 0.0004 Epoch: 18 Global Step: 384820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:14,498-Speed 6311.36 samples/sec Loss 5.0716 LearningRate 0.0004 Epoch: 18 Global Step: 384830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:17,748-Speed 6302.78 samples/sec Loss 5.1351 LearningRate 0.0004 Epoch: 18 Global Step: 384840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:21,006-Speed 6287.62 samples/sec Loss 5.0684 LearningRate 0.0004 Epoch: 18 Global Step: 384850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:24,251-Speed 6312.61 samples/sec Loss 5.1016 LearningRate 0.0004 Epoch: 18 Global Step: 384860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:27,510-Speed 6285.75 samples/sec Loss 5.1385 LearningRate 0.0004 Epoch: 18 Global Step: 384870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:30,756-Speed 6310.22 samples/sec Loss 5.1181 LearningRate 0.0004 Epoch: 18 Global Step: 384880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:34,001-Speed 6313.10 samples/sec Loss 5.1141 LearningRate 0.0004 Epoch: 18 Global Step: 384890 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
02:44:37,235-Speed 6332.78 samples/sec Loss 4.9882 LearningRate 0.0004 Epoch: 18 Global Step: 384900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:40,480-Speed 6314.49 samples/sec Loss 5.0900 LearningRate 0.0004 Epoch: 18 Global Step: 384910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:43,727-Speed 6307.92 samples/sec Loss 5.0728 LearningRate 0.0004 Epoch: 18 Global Step: 384920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:46,971-Speed 6315.14 samples/sec Loss 5.0556 LearningRate 0.0004 Epoch: 18 Global Step: 384930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:50,253-Speed 6240.40 samples/sec Loss 5.1053 LearningRate 0.0004 Epoch: 18 Global Step: 384940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:53,499-Speed 6310.98 samples/sec Loss 5.1002 LearningRate 0.0004 Epoch: 18 Global Step: 384950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:56,744-Speed 6313.03 samples/sec Loss 5.0568 LearningRate 0.0004 Epoch: 18 Global Step: 384960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:44:59,991-Speed 6308.07 samples/sec Loss 5.1762 LearningRate 0.0004 Epoch: 18 Global Step: 384970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:03,235-Speed 6314.22 samples/sec Loss 5.1943 LearningRate 0.0004 Epoch: 18 Global Step: 384980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:06,482-Speed 6308.72 samples/sec Loss 5.1427 LearningRate 0.0004 Epoch: 18 Global Step: 384990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:09,724-Speed 6319.57 samples/sec Loss 5.1336 LearningRate 0.0004 Epoch: 18 Global Step: 385000 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:12,975-Speed 6300.52 samples/sec Loss 5.0668 LearningRate 0.0004 Epoch: 18 Global Step: 385010 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:16,219-Speed 6315.63 
samples/sec Loss 5.1332 LearningRate 0.0004 Epoch: 18 Global Step: 385020 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:19,462-Speed 6315.60 samples/sec Loss 5.1996 LearningRate 0.0004 Epoch: 18 Global Step: 385030 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:22,707-Speed 6314.25 samples/sec Loss 5.0894 LearningRate 0.0004 Epoch: 18 Global Step: 385040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:25,955-Speed 6306.33 samples/sec Loss 5.1276 LearningRate 0.0004 Epoch: 18 Global Step: 385050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:29,201-Speed 6311.02 samples/sec Loss 5.1131 LearningRate 0.0004 Epoch: 18 Global Step: 385060 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:32,444-Speed 6315.45 samples/sec Loss 5.0502 LearningRate 0.0004 Epoch: 18 Global Step: 385070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:35,692-Speed 6307.49 samples/sec Loss 5.0631 LearningRate 0.0004 Epoch: 18 Global Step: 385080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:38,938-Speed 6311.12 samples/sec Loss 5.0745 LearningRate 0.0004 Epoch: 18 Global Step: 385090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:42,168-Speed 6342.19 samples/sec Loss 5.1645 LearningRate 0.0004 Epoch: 18 Global Step: 385100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:45:45,399-Speed 6338.56 samples/sec Loss 5.1519 LearningRate 0.0004 Epoch: 18 Global Step: 385110 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:48,646-Speed 6310.02 samples/sec Loss 5.1403 LearningRate 0.0004 Epoch: 18 Global Step: 385120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:51,893-Speed 6308.41 samples/sec Loss 5.1013 LearningRate 0.0004 Epoch: 18 Global Step: 385130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:55,138-Speed 6312.02 samples/sec Loss 4.9879 
LearningRate 0.0004 Epoch: 18 Global Step: 385140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:45:58,383-Speed 6313.08 samples/sec Loss 5.0461 LearningRate 0.0004 Epoch: 18 Global Step: 385150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:01,636-Speed 6297.40 samples/sec Loss 5.0894 LearningRate 0.0004 Epoch: 18 Global Step: 385160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:04,883-Speed 6307.78 samples/sec Loss 5.1195 LearningRate 0.0004 Epoch: 18 Global Step: 385170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:08,128-Speed 6313.72 samples/sec Loss 5.1052 LearningRate 0.0004 Epoch: 18 Global Step: 385180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:11,411-Speed 6239.91 samples/sec Loss 5.1661 LearningRate 0.0004 Epoch: 18 Global Step: 385190 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:14,662-Speed 6300.31 samples/sec Loss 5.1335 LearningRate 0.0004 Epoch: 18 Global Step: 385200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:17,909-Speed 6307.87 samples/sec Loss 5.1064 LearningRate 0.0004 Epoch: 18 Global Step: 385210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:46:21,154-Speed 6313.10 samples/sec Loss 5.0159 LearningRate 0.0004 Epoch: 18 Global Step: 385220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:46:24,401-Speed 6309.10 samples/sec Loss 5.1111 LearningRate 0.0004 Epoch: 18 Global Step: 385230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:46:27,634-Speed 6338.04 samples/sec Loss 5.0493 LearningRate 0.0004 Epoch: 18 Global Step: 385240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:30,879-Speed 6312.72 samples/sec Loss 5.1193 LearningRate 0.0004 Epoch: 18 Global Step: 385250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:34,134-Speed 6292.36 samples/sec Loss 5.1148 LearningRate 0.0004 Epoch: 18 
Global Step: 385260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:37,379-Speed 6311.64 samples/sec Loss 5.1543 LearningRate 0.0004 Epoch: 18 Global Step: 385270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:40,624-Speed 6314.82 samples/sec Loss 5.0908 LearningRate 0.0004 Epoch: 18 Global Step: 385280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:43,869-Speed 6312.12 samples/sec Loss 5.1172 LearningRate 0.0004 Epoch: 18 Global Step: 385290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:47,122-Speed 6295.94 samples/sec Loss 5.1123 LearningRate 0.0004 Epoch: 18 Global Step: 385300 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:50,363-Speed 6322.04 samples/sec Loss 5.0974 LearningRate 0.0004 Epoch: 18 Global Step: 385310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:53,610-Speed 6308.17 samples/sec Loss 5.0413 LearningRate 0.0004 Epoch: 18 Global Step: 385320 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:46:56,854-Speed 6314.20 samples/sec Loss 5.0791 LearningRate 0.0004 Epoch: 18 Global Step: 385330 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:00,104-Speed 6303.47 samples/sec Loss 5.1251 LearningRate 0.0004 Epoch: 18 Global Step: 385340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:03,356-Speed 6297.40 samples/sec Loss 5.0302 LearningRate 0.0004 Epoch: 18 Global Step: 385350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:06,604-Speed 6307.73 samples/sec Loss 5.1062 LearningRate 0.0004 Epoch: 18 Global Step: 385360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:09,849-Speed 6312.79 samples/sec Loss 5.1111 LearningRate 0.0004 Epoch: 18 Global Step: 385370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:13,112-Speed 6277.91 samples/sec Loss 5.1036 LearningRate 0.0004 Epoch: 18 Global Step: 385380 Fp16 Grad 
Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:16,371-Speed 6285.09 samples/sec Loss 5.1158 LearningRate 0.0004 Epoch: 18 Global Step: 385390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:19,619-Speed 6306.67 samples/sec Loss 5.0972 LearningRate 0.0004 Epoch: 18 Global Step: 385400 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:22,867-Speed 6307.87 samples/sec Loss 5.1066 LearningRate 0.0004 Epoch: 18 Global Step: 385410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:26,113-Speed 6310.28 samples/sec Loss 5.1463 LearningRate 0.0004 Epoch: 18 Global Step: 385420 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:47:29,349-Speed 6331.76 samples/sec Loss 5.1020 LearningRate 0.0004 Epoch: 18 Global Step: 385430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:32,592-Speed 6316.91 samples/sec Loss 5.2515 LearningRate 0.0004 Epoch: 18 Global Step: 385440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:35,844-Speed 6298.88 samples/sec Loss 5.1376 LearningRate 0.0004 Epoch: 18 Global Step: 385450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:39,087-Speed 6316.70 samples/sec Loss 5.0715 LearningRate 0.0004 Epoch: 18 Global Step: 385460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:42,335-Speed 6306.51 samples/sec Loss 5.1036 LearningRate 0.0004 Epoch: 18 Global Step: 385470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:45,583-Speed 6305.82 samples/sec Loss 5.0931 LearningRate 0.0004 Epoch: 18 Global Step: 385480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:48,844-Speed 6282.73 samples/sec Loss 5.1706 LearningRate 0.0004 Epoch: 18 Global Step: 385490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:52,094-Speed 6301.50 samples/sec Loss 5.0725 LearningRate 0.0004 Epoch: 18 Global Step: 385500 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 02:47:55,343-Speed 6306.60 samples/sec Loss 5.0999 LearningRate 0.0004 Epoch: 18 Global Step: 385510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:47:58,589-Speed 6310.02 samples/sec Loss 5.1187 LearningRate 0.0004 Epoch: 18 Global Step: 385520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:01,835-Speed 6311.79 samples/sec Loss 5.0264 LearningRate 0.0004 Epoch: 18 Global Step: 385530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:48:05,070-Speed 6331.07 samples/sec Loss 5.0708 LearningRate 0.0004 Epoch: 18 Global Step: 385540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:08,321-Speed 6300.86 samples/sec Loss 5.0493 LearningRate 0.0004 Epoch: 18 Global Step: 385550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:11,566-Speed 6312.12 samples/sec Loss 5.0475 LearningRate 0.0004 Epoch: 18 Global Step: 385560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:14,811-Speed 6314.00 samples/sec Loss 5.1981 LearningRate 0.0004 Epoch: 18 Global Step: 385570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:18,058-Speed 6308.76 samples/sec Loss 5.1530 LearningRate 0.0004 Epoch: 18 Global Step: 385580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:21,305-Speed 6307.91 samples/sec Loss 5.0443 LearningRate 0.0004 Epoch: 18 Global Step: 385590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:24,553-Speed 6307.01 samples/sec Loss 5.1189 LearningRate 0.0004 Epoch: 18 Global Step: 385600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:27,800-Speed 6309.72 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 18 Global Step: 385610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:31,044-Speed 6312.91 samples/sec Loss 5.0971 LearningRate 0.0004 Epoch: 18 Global Step: 385620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
02:48:34,289-Speed 6314.66 samples/sec Loss 5.1236 LearningRate 0.0004 Epoch: 18 Global Step: 385630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:37,536-Speed 6308.42 samples/sec Loss 5.0861 LearningRate 0.0004 Epoch: 18 Global Step: 385640 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:48:40,777-Speed 6320.89 samples/sec Loss 5.1076 LearningRate 0.0004 Epoch: 18 Global Step: 385650 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:48:44,020-Speed 6316.18 samples/sec Loss 5.0582 LearningRate 0.0004 Epoch: 18 Global Step: 385660 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:48:47,251-Speed 6340.72 samples/sec Loss 5.0273 LearningRate 0.0004 Epoch: 18 Global Step: 385670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:50,494-Speed 6316.47 samples/sec Loss 5.1343 LearningRate 0.0004 Epoch: 18 Global Step: 385680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:53,740-Speed 6308.90 samples/sec Loss 5.1755 LearningRate 0.0004 Epoch: 18 Global Step: 385690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:48:56,984-Speed 6314.81 samples/sec Loss 5.0584 LearningRate 0.0004 Epoch: 18 Global Step: 385700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:00,235-Speed 6302.46 samples/sec Loss 5.0560 LearningRate 0.0004 Epoch: 18 Global Step: 385710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:03,479-Speed 6313.56 samples/sec Loss 5.1184 LearningRate 0.0004 Epoch: 18 Global Step: 385720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:06,738-Speed 6286.59 samples/sec Loss 5.1230 LearningRate 0.0004 Epoch: 18 Global Step: 385730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:09,984-Speed 6310.48 samples/sec Loss 5.1317 LearningRate 0.0004 Epoch: 18 Global Step: 385740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:13,233-Speed 6304.04 
samples/sec Loss 5.1677 LearningRate 0.0004 Epoch: 18 Global Step: 385750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:16,476-Speed 6316.49 samples/sec Loss 5.0479 LearningRate 0.0004 Epoch: 18 Global Step: 385760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:19,722-Speed 6312.21 samples/sec Loss 5.1231 LearningRate 0.0004 Epoch: 18 Global Step: 385770 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:49:22,971-Speed 6303.91 samples/sec Loss 5.0026 LearningRate 0.0004 Epoch: 18 Global Step: 385780 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:49:26,225-Speed 6295.69 samples/sec Loss 5.0336 LearningRate 0.0004 Epoch: 18 Global Step: 385790 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:49:29,472-Speed 6308.80 samples/sec Loss 5.0964 LearningRate 0.0004 Epoch: 18 Global Step: 385800 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:49:32,702-Speed 6341.85 samples/sec Loss 5.0128 LearningRate 0.0004 Epoch: 18 Global Step: 385810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:35,950-Speed 6306.65 samples/sec Loss 5.0853 LearningRate 0.0004 Epoch: 18 Global Step: 385820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:39,195-Speed 6312.05 samples/sec Loss 5.0806 LearningRate 0.0004 Epoch: 18 Global Step: 385830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:42,443-Speed 6307.78 samples/sec Loss 5.0821 LearningRate 0.0004 Epoch: 18 Global Step: 385840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:45,688-Speed 6313.27 samples/sec Loss 5.0417 LearningRate 0.0004 Epoch: 18 Global Step: 385850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:48,963-Speed 6254.27 samples/sec Loss 5.1215 LearningRate 0.0004 Epoch: 18 Global Step: 385860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:52,208-Speed 6313.37 samples/sec Loss 5.1437 
LearningRate 0.0004 Epoch: 18 Global Step: 385870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:55,456-Speed 6307.34 samples/sec Loss 5.1431 LearningRate 0.0004 Epoch: 18 Global Step: 385880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:49:58,700-Speed 6313.74 samples/sec Loss 5.1020 LearningRate 0.0004 Epoch: 18 Global Step: 385890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:01,946-Speed 6310.66 samples/sec Loss 5.0440 LearningRate 0.0004 Epoch: 18 Global Step: 385900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:05,193-Speed 6308.48 samples/sec Loss 5.1055 LearningRate 0.0004 Epoch: 18 Global Step: 385910 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:08,440-Speed 6308.84 samples/sec Loss 5.1163 LearningRate 0.0004 Epoch: 18 Global Step: 385920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:11,674-Speed 6334.09 samples/sec Loss 5.1103 LearningRate 0.0004 Epoch: 18 Global Step: 385930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:14,921-Speed 6309.50 samples/sec Loss 5.0725 LearningRate 0.0004 Epoch: 18 Global Step: 385940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:18,169-Speed 6306.40 samples/sec Loss 5.1491 LearningRate 0.0004 Epoch: 18 Global Step: 385950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:21,414-Speed 6313.15 samples/sec Loss 5.1206 LearningRate 0.0004 Epoch: 18 Global Step: 385960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:24,659-Speed 6311.09 samples/sec Loss 5.1817 LearningRate 0.0004 Epoch: 18 Global Step: 385970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:27,910-Speed 6303.08 samples/sec Loss 5.1659 LearningRate 0.0004 Epoch: 18 Global Step: 385980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:31,160-Speed 6302.00 samples/sec Loss 5.0793 LearningRate 0.0004 Epoch: 18 
Global Step: 385990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:34,407-Speed 6308.39 samples/sec Loss 5.1268 LearningRate 0.0004 Epoch: 18 Global Step: 386000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:37,649-Speed 6317.88 samples/sec Loss 5.0965 LearningRate 0.0004 Epoch: 18 Global Step: 386010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:40,895-Speed 6312.23 samples/sec Loss 5.1105 LearningRate 0.0004 Epoch: 18 Global Step: 386020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:50:44,139-Speed 6313.97 samples/sec Loss 5.0679 LearningRate 0.0004 Epoch: 18 Global Step: 386030 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:47,387-Speed 6307.54 samples/sec Loss 5.1049 LearningRate 0.0004 Epoch: 18 Global Step: 386040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:50,634-Speed 6308.59 samples/sec Loss 5.0893 LearningRate 0.0004 Epoch: 18 Global Step: 386050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:53,876-Speed 6317.98 samples/sec Loss 5.0886 LearningRate 0.0004 Epoch: 18 Global Step: 386060 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:50:57,123-Speed 6310.08 samples/sec Loss 5.0675 LearningRate 0.0004 Epoch: 18 Global Step: 386070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:51:00,361-Speed 6325.70 samples/sec Loss 5.1562 LearningRate 0.0004 Epoch: 18 Global Step: 386080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:03,603-Speed 6319.19 samples/sec Loss 5.1513 LearningRate 0.0004 Epoch: 18 Global Step: 386090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:06,849-Speed 6309.86 samples/sec Loss 5.0659 LearningRate 0.0004 Epoch: 18 Global Step: 386100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:10,092-Speed 6316.86 samples/sec Loss 5.1636 LearningRate 0.0004 Epoch: 18 Global Step: 386110 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:13,347-Speed 6293.99 samples/sec Loss 5.1592 LearningRate 0.0004 Epoch: 18 Global Step: 386120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:16,591-Speed 6314.68 samples/sec Loss 5.0261 LearningRate 0.0004 Epoch: 18 Global Step: 386130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:19,835-Speed 6314.09 samples/sec Loss 5.1025 LearningRate 0.0004 Epoch: 18 Global Step: 386140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:23,081-Speed 6310.98 samples/sec Loss 5.0951 LearningRate 0.0004 Epoch: 18 Global Step: 386150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:26,328-Speed 6309.37 samples/sec Loss 5.0174 LearningRate 0.0004 Epoch: 18 Global Step: 386160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:29,577-Speed 6304.39 samples/sec Loss 5.0777 LearningRate 0.0004 Epoch: 18 Global Step: 386170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:32,843-Speed 6271.19 samples/sec Loss 5.0521 LearningRate 0.0004 Epoch: 18 Global Step: 386180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:51:36,087-Speed 6315.31 samples/sec Loss 5.1404 LearningRate 0.0004 Epoch: 18 Global Step: 386190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:51:39,334-Speed 6309.03 samples/sec Loss 5.1601 LearningRate 0.0004 Epoch: 18 Global Step: 386200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:51:42,575-Speed 6318.89 samples/sec Loss 5.1288 LearningRate 0.0004 Epoch: 18 Global Step: 386210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:51:45,804-Speed 6345.62 samples/sec Loss 5.0967 LearningRate 0.0004 Epoch: 18 Global Step: 386220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:49,050-Speed 6310.56 samples/sec Loss 5.0916 LearningRate 0.0004 Epoch: 18 Global Step: 386230 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 02:51:52,294-Speed 6313.66 samples/sec Loss 5.0896 LearningRate 0.0004 Epoch: 18 Global Step: 386240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:55,545-Speed 6301.24 samples/sec Loss 5.0955 LearningRate 0.0004 Epoch: 18 Global Step: 386250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:51:58,796-Speed 6300.91 samples/sec Loss 5.0609 LearningRate 0.0004 Epoch: 18 Global Step: 386260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:02,048-Speed 6300.83 samples/sec Loss 5.1290 LearningRate 0.0004 Epoch: 18 Global Step: 386270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:05,326-Speed 6248.86 samples/sec Loss 5.0492 LearningRate 0.0004 Epoch: 18 Global Step: 386280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:08,577-Speed 6300.98 samples/sec Loss 5.1018 LearningRate 0.0004 Epoch: 18 Global Step: 386290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:11,821-Speed 6314.98 samples/sec Loss 5.1782 LearningRate 0.0004 Epoch: 18 Global Step: 386300 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:15,072-Speed 6299.98 samples/sec Loss 5.1176 LearningRate 0.0004 Epoch: 18 Global Step: 386310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:18,319-Speed 6309.91 samples/sec Loss 5.0895 LearningRate 0.0004 Epoch: 18 Global Step: 386320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:21,564-Speed 6312.14 samples/sec Loss 5.0847 LearningRate 0.0004 Epoch: 18 Global Step: 386330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:24,811-Speed 6308.31 samples/sec Loss 5.1248 LearningRate 0.0004 Epoch: 18 Global Step: 386340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:28,056-Speed 6312.50 samples/sec Loss 5.1104 LearningRate 0.0004 Epoch: 18 Global Step: 386350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
02:52:31,300-Speed 6313.69 samples/sec Loss 5.1610 LearningRate 0.0004 Epoch: 18 Global Step: 386360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:34,548-Speed 6308.07 samples/sec Loss 5.1103 LearningRate 0.0004 Epoch: 18 Global Step: 386370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:37,795-Speed 6308.46 samples/sec Loss 5.1112 LearningRate 0.0004 Epoch: 18 Global Step: 386380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:41,040-Speed 6312.03 samples/sec Loss 5.0798 LearningRate 0.0004 Epoch: 18 Global Step: 386390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:52:44,272-Speed 6339.27 samples/sec Loss 5.0317 LearningRate 0.0004 Epoch: 18 Global Step: 386400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:47,521-Speed 6304.67 samples/sec Loss 5.1299 LearningRate 0.0004 Epoch: 18 Global Step: 386410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:50,768-Speed 6307.51 samples/sec Loss 5.0424 LearningRate 0.0004 Epoch: 18 Global Step: 386420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:54,015-Speed 6310.35 samples/sec Loss 5.0401 LearningRate 0.0004 Epoch: 18 Global Step: 386430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:52:57,265-Speed 6302.47 samples/sec Loss 5.1804 LearningRate 0.0004 Epoch: 18 Global Step: 386440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:00,508-Speed 6315.23 samples/sec Loss 5.1376 LearningRate 0.0004 Epoch: 18 Global Step: 386450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:03,757-Speed 6306.21 samples/sec Loss 5.1413 LearningRate 0.0004 Epoch: 18 Global Step: 386460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:07,002-Speed 6313.18 samples/sec Loss 5.0704 LearningRate 0.0004 Epoch: 18 Global Step: 386470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:10,249-Speed 6307.48 
samples/sec Loss 5.0498 LearningRate 0.0004 Epoch: 18 Global Step: 386480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:13,497-Speed 6307.97 samples/sec Loss 5.1116 LearningRate 0.0004 Epoch: 18 Global Step: 386490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:16,730-Speed 6336.54 samples/sec Loss 5.1264 LearningRate 0.0004 Epoch: 18 Global Step: 386500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:19,981-Speed 6300.57 samples/sec Loss 5.1098 LearningRate 0.0004 Epoch: 18 Global Step: 386510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:23,228-Speed 6308.37 samples/sec Loss 5.0980 LearningRate 0.0004 Epoch: 18 Global Step: 386520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:26,476-Speed 6307.33 samples/sec Loss 5.1017 LearningRate 0.0004 Epoch: 18 Global Step: 386530 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:29,721-Speed 6313.21 samples/sec Loss 5.0514 LearningRate 0.0004 Epoch: 18 Global Step: 386540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:32,973-Speed 6298.32 samples/sec Loss 5.1164 LearningRate 0.0004 Epoch: 18 Global Step: 386550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:36,224-Speed 6301.90 samples/sec Loss 5.1651 LearningRate 0.0004 Epoch: 18 Global Step: 386560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:39,468-Speed 6313.49 samples/sec Loss 5.1190 LearningRate 0.0004 Epoch: 18 Global Step: 386570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:42,717-Speed 6305.45 samples/sec Loss 5.0067 LearningRate 0.0004 Epoch: 18 Global Step: 386580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:45,962-Speed 6311.75 samples/sec Loss 5.0689 LearningRate 0.0004 Epoch: 18 Global Step: 386590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:53:49,210-Speed 6307.33 samples/sec Loss 5.0851 
LearningRate 0.0004 Epoch: 18 Global Step: 386600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:53:52,452-Speed 6318.70 samples/sec Loss 5.0945 LearningRate 0.0004 Epoch: 18 Global Step: 386610 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:53:55,700-Speed 6307.12 samples/sec Loss 5.0633 LearningRate 0.0004 Epoch: 18 Global Step: 386620 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:53:58,946-Speed 6310.33 samples/sec Loss 5.0799 LearningRate 0.0004 Epoch: 18 Global Step: 386630 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:02,192-Speed 6311.42 samples/sec Loss 5.1744 LearningRate 0.0004 Epoch: 18 Global Step: 386640 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:05,441-Speed 6304.34 samples/sec Loss 5.0982 LearningRate 0.0004 Epoch: 18 Global Step: 386650 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:08,699-Speed 6287.28 samples/sec Loss 5.1909 LearningRate 0.0004 Epoch: 18 Global Step: 386660 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:11,930-Speed 6340.09 samples/sec Loss 5.0646 LearningRate 0.0004 Epoch: 18 Global Step: 386670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:15,181-Speed 6301.78 samples/sec Loss 5.0850 LearningRate 0.0004 Epoch: 18 Global Step: 386680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:18,426-Speed 6312.57 samples/sec Loss 5.1241 LearningRate 0.0004 Epoch: 18 Global Step: 386690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:21,675-Speed 6305.35 samples/sec Loss 5.1261 LearningRate 0.0004 Epoch: 18 Global Step: 386700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:24,927-Speed 6299.50 samples/sec Loss 5.0885 LearningRate 0.0004 Epoch: 18 Global Step: 386710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:28,175-Speed 6307.29 samples/sec Loss 4.9804 LearningRate 0.0004 Epoch: 18 
Global Step: 386720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:31,419-Speed 6313.10 samples/sec Loss 5.0625 LearningRate 0.0004 Epoch: 18 Global Step: 386730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:34,665-Speed 6311.83 samples/sec Loss 5.0134 LearningRate 0.0004 Epoch: 18 Global Step: 386740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:37,914-Speed 6305.23 samples/sec Loss 5.0273 LearningRate 0.0004 Epoch: 18 Global Step: 386750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:41,159-Speed 6311.41 samples/sec Loss 5.1058 LearningRate 0.0004 Epoch: 18 Global Step: 386760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:44,410-Speed 6301.59 samples/sec Loss 5.0888 LearningRate 0.0004 Epoch: 18 Global Step: 386770 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:47,663-Speed 6297.41 samples/sec Loss 5.0894 LearningRate 0.0004 Epoch: 18 Global Step: 386780 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:50,904-Speed 6321.21 samples/sec Loss 5.1288 LearningRate 0.0004 Epoch: 18 Global Step: 386790 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:54:54,138-Speed 6333.41 samples/sec Loss 5.1046 LearningRate 0.0004 Epoch: 18 Global Step: 386800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:54:57,384-Speed 6310.95 samples/sec Loss 5.0740 LearningRate 0.0004 Epoch: 18 Global Step: 386810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:00,627-Speed 6315.19 samples/sec Loss 5.1332 LearningRate 0.0004 Epoch: 18 Global Step: 386820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:03,877-Speed 6303.33 samples/sec Loss 5.0906 LearningRate 0.0004 Epoch: 18 Global Step: 386830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:07,120-Speed 6316.53 samples/sec Loss 5.1472 LearningRate 0.0004 Epoch: 18 Global Step: 386840 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:10,366-Speed 6311.05 samples/sec Loss 5.1019 LearningRate 0.0004 Epoch: 18 Global Step: 386850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:13,620-Speed 6295.18 samples/sec Loss 5.0850 LearningRate 0.0004 Epoch: 18 Global Step: 386860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:16,863-Speed 6315.57 samples/sec Loss 5.0651 LearningRate 0.0004 Epoch: 18 Global Step: 386870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:20,107-Speed 6314.81 samples/sec Loss 5.1006 LearningRate 0.0004 Epoch: 18 Global Step: 386880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:23,354-Speed 6310.63 samples/sec Loss 5.1062 LearningRate 0.0004 Epoch: 18 Global Step: 386890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:26,599-Speed 6312.25 samples/sec Loss 5.0663 LearningRate 0.0004 Epoch: 18 Global Step: 386900 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:55:29,844-Speed 6313.39 samples/sec Loss 5.0903 LearningRate 0.0004 Epoch: 18 Global Step: 386910 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:55:33,095-Speed 6300.45 samples/sec Loss 5.1116 LearningRate 0.0004 Epoch: 18 Global Step: 386920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:55:36,341-Speed 6310.18 samples/sec Loss 5.1099 LearningRate 0.0004 Epoch: 18 Global Step: 386930 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:55:39,575-Speed 6334.28 samples/sec Loss 5.1207 LearningRate 0.0004 Epoch: 18 Global Step: 386940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:42,820-Speed 6312.84 samples/sec Loss 5.0792 LearningRate 0.0004 Epoch: 18 Global Step: 386950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:46,069-Speed 6305.96 samples/sec Loss 5.1299 LearningRate 0.0004 Epoch: 18 Global Step: 386960 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 02:55:49,309-Speed 6320.86 samples/sec Loss 5.0883 LearningRate 0.0004 Epoch: 18 Global Step: 386970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:52,551-Speed 6318.65 samples/sec Loss 5.0569 LearningRate 0.0004 Epoch: 18 Global Step: 386980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:55,798-Speed 6309.07 samples/sec Loss 5.0988 LearningRate 0.0004 Epoch: 18 Global Step: 386990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:55:59,044-Speed 6311.30 samples/sec Loss 5.0977 LearningRate 0.0004 Epoch: 18 Global Step: 387000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:02,292-Speed 6306.29 samples/sec Loss 5.1030 LearningRate 0.0004 Epoch: 18 Global Step: 387010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:05,538-Speed 6310.49 samples/sec Loss 5.1341 LearningRate 0.0004 Epoch: 18 Global Step: 387020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:08,785-Speed 6309.15 samples/sec Loss 5.0256 LearningRate 0.0004 Epoch: 18 Global Step: 387030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:12,071-Speed 6234.09 samples/sec Loss 5.1458 LearningRate 0.0004 Epoch: 18 Global Step: 387040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:56:15,312-Speed 6320.82 samples/sec Loss 5.0712 LearningRate 0.0004 Epoch: 18 Global Step: 387050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:56:18,557-Speed 6311.03 samples/sec Loss 5.0887 LearningRate 0.0004 Epoch: 18 Global Step: 387060 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:56:21,814-Speed 6289.87 samples/sec Loss 5.1336 LearningRate 0.0004 Epoch: 18 Global Step: 387070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:56:25,059-Speed 6313.29 samples/sec Loss 5.1685 LearningRate 0.0004 Epoch: 18 Global Step: 387080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
02:56:28,300-Speed 6320.58 samples/sec Loss 5.0541 LearningRate 0.0004 Epoch: 18 Global Step: 387090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:31,545-Speed 6312.85 samples/sec Loss 5.1035 LearningRate 0.0004 Epoch: 18 Global Step: 387100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:34,795-Speed 6304.04 samples/sec Loss 5.1636 LearningRate 0.0004 Epoch: 18 Global Step: 387110 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:38,042-Speed 6309.24 samples/sec Loss 5.0595 LearningRate 0.0004 Epoch: 18 Global Step: 387120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:41,289-Speed 6307.86 samples/sec Loss 5.0815 LearningRate 0.0004 Epoch: 18 Global Step: 387130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:44,532-Speed 6316.24 samples/sec Loss 5.0649 LearningRate 0.0004 Epoch: 18 Global Step: 387140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:47,774-Speed 6317.92 samples/sec Loss 5.1012 LearningRate 0.0004 Epoch: 18 Global Step: 387150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:51,019-Speed 6313.82 samples/sec Loss 5.0955 LearningRate 0.0004 Epoch: 18 Global Step: 387160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:54,265-Speed 6311.04 samples/sec Loss 5.1260 LearningRate 0.0004 Epoch: 18 Global Step: 387170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:56:57,510-Speed 6312.32 samples/sec Loss 5.1370 LearningRate 0.0004 Epoch: 18 Global Step: 387180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:00,758-Speed 6305.81 samples/sec Loss 5.0894 LearningRate 0.0004 Epoch: 18 Global Step: 387190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:57:03,989-Speed 6340.04 samples/sec Loss 5.1540 LearningRate 0.0004 Epoch: 18 Global Step: 387200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:07,234-Speed 6313.35 
samples/sec Loss 5.0793 LearningRate 0.0004 Epoch: 18 Global Step: 387210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:10,478-Speed 6313.59 samples/sec Loss 5.1098 LearningRate 0.0004 Epoch: 18 Global Step: 387220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:13,724-Speed 6311.14 samples/sec Loss 5.1389 LearningRate 0.0004 Epoch: 18 Global Step: 387230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:16,979-Speed 6292.56 samples/sec Loss 5.0223 LearningRate 0.0004 Epoch: 18 Global Step: 387240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:20,224-Speed 6314.13 samples/sec Loss 5.0965 LearningRate 0.0004 Epoch: 18 Global Step: 387250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:23,539-Speed 6178.53 samples/sec Loss 5.0633 LearningRate 0.0004 Epoch: 18 Global Step: 387260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:26,829-Speed 6227.06 samples/sec Loss 5.1122 LearningRate 0.0004 Epoch: 18 Global Step: 387270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:30,083-Speed 6294.42 samples/sec Loss 5.1880 LearningRate 0.0004 Epoch: 18 Global Step: 387280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:33,330-Speed 6309.39 samples/sec Loss 5.1561 LearningRate 0.0004 Epoch: 18 Global Step: 387290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:36,576-Speed 6311.45 samples/sec Loss 5.0938 LearningRate 0.0004 Epoch: 18 Global Step: 387300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:57:39,825-Speed 6304.38 samples/sec Loss 5.0711 LearningRate 0.0004 Epoch: 18 Global Step: 387310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:57:43,056-Speed 6339.89 samples/sec Loss 5.0511 LearningRate 0.0004 Epoch: 18 Global Step: 387320 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:46,298-Speed 6319.26 samples/sec Loss 5.1151 
LearningRate 0.0004 Epoch: 18 Global Step: 387330 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:49,544-Speed 6310.57 samples/sec Loss 5.0657 LearningRate 0.0004 Epoch: 18 Global Step: 387340 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:52,790-Speed 6310.10 samples/sec Loss 5.1655 LearningRate 0.0004 Epoch: 18 Global Step: 387350 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:56,033-Speed 6316.39 samples/sec Loss 5.0926 LearningRate 0.0004 Epoch: 18 Global Step: 387360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:57:59,278-Speed 6312.42 samples/sec Loss 5.0294 LearningRate 0.0004 Epoch: 18 Global Step: 387370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:02,523-Speed 6313.90 samples/sec Loss 5.0586 LearningRate 0.0004 Epoch: 18 Global Step: 387380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:05,769-Speed 6310.62 samples/sec Loss 5.0292 LearningRate 0.0004 Epoch: 18 Global Step: 387390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:09,013-Speed 6314.76 samples/sec Loss 5.0729 LearningRate 0.0004 Epoch: 18 Global Step: 387400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:12,261-Speed 6305.58 samples/sec Loss 5.1617 LearningRate 0.0004 Epoch: 18 Global Step: 387410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:15,493-Speed 6339.54 samples/sec Loss 5.0414 LearningRate 0.0004 Epoch: 18 Global Step: 387420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:18,740-Speed 6307.19 samples/sec Loss 5.1078 LearningRate 0.0004 Epoch: 18 Global Step: 387430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:21,985-Speed 6314.29 samples/sec Loss 5.0760 LearningRate 0.0004 Epoch: 18 Global Step: 387440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:25,236-Speed 6299.16 samples/sec Loss 5.1554 LearningRate 0.0004 Epoch: 18 
Global Step: 387450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:28,478-Speed 6319.53 samples/sec Loss 5.0697 LearningRate 0.0004 Epoch: 18 Global Step: 387460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:31,721-Speed 6316.32 samples/sec Loss 5.0869 LearningRate 0.0004 Epoch: 18 Global Step: 387470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:34,966-Speed 6313.33 samples/sec Loss 5.0175 LearningRate 0.0004 Epoch: 18 Global Step: 387480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:38,212-Speed 6310.03 samples/sec Loss 5.1830 LearningRate 0.0004 Epoch: 18 Global Step: 387490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:41,455-Speed 6316.57 samples/sec Loss 5.1037 LearningRate 0.0004 Epoch: 18 Global Step: 387500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:44,699-Speed 6313.74 samples/sec Loss 5.0784 LearningRate 0.0004 Epoch: 18 Global Step: 387510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:58:47,944-Speed 6313.89 samples/sec Loss 5.0183 LearningRate 0.0004 Epoch: 18 Global Step: 387520 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:58:51,191-Speed 6309.40 samples/sec Loss 5.0311 LearningRate 0.0004 Epoch: 18 Global Step: 387530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:58:54,434-Speed 6315.79 samples/sec Loss 5.0417 LearningRate 0.0004 Epoch: 18 Global Step: 387540 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:58:57,662-Speed 6346.42 samples/sec Loss 5.1272 LearningRate 0.0004 Epoch: 18 Global Step: 387550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:00,907-Speed 6311.92 samples/sec Loss 5.1576 LearningRate 0.0004 Epoch: 18 Global Step: 387560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:04,151-Speed 6315.48 samples/sec Loss 5.1612 LearningRate 0.0004 Epoch: 18 Global Step: 387570 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:07,395-Speed 6314.30 samples/sec Loss 5.1548 LearningRate 0.0004 Epoch: 18 Global Step: 387580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:10,645-Speed 6302.41 samples/sec Loss 4.9988 LearningRate 0.0004 Epoch: 18 Global Step: 387590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:13,891-Speed 6311.06 samples/sec Loss 5.1012 LearningRate 0.0004 Epoch: 18 Global Step: 387600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:17,133-Speed 6318.18 samples/sec Loss 5.0962 LearningRate 0.0004 Epoch: 18 Global Step: 387610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:20,381-Speed 6307.10 samples/sec Loss 5.0586 LearningRate 0.0004 Epoch: 18 Global Step: 387620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:23,627-Speed 6310.59 samples/sec Loss 5.0559 LearningRate 0.0004 Epoch: 18 Global Step: 387630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:26,873-Speed 6310.93 samples/sec Loss 5.0358 LearningRate 0.0004 Epoch: 18 Global Step: 387640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 02:59:30,121-Speed 6307.10 samples/sec Loss 5.0706 LearningRate 0.0004 Epoch: 18 Global Step: 387650 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:33,366-Speed 6313.06 samples/sec Loss 5.0465 LearningRate 0.0004 Epoch: 18 Global Step: 387660 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:36,609-Speed 6315.92 samples/sec Loss 5.1178 LearningRate 0.0004 Epoch: 18 Global Step: 387670 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:39,854-Speed 6312.60 samples/sec Loss 5.0491 LearningRate 0.0004 Epoch: 18 Global Step: 387680 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:43,099-Speed 6313.24 samples/sec Loss 5.0687 LearningRate 0.0004 Epoch: 18 Global Step: 387690 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 02:59:46,347-Speed 6307.14 samples/sec Loss 5.0417 LearningRate 0.0004 Epoch: 18 Global Step: 387700 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:49,594-Speed 6307.05 samples/sec Loss 5.1126 LearningRate 0.0004 Epoch: 18 Global Step: 387710 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:52,840-Speed 6311.70 samples/sec Loss 5.0579 LearningRate 0.0004 Epoch: 18 Global Step: 387720 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:56,085-Speed 6312.56 samples/sec Loss 5.0535 LearningRate 0.0004 Epoch: 18 Global Step: 387730 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 02:59:59,336-Speed 6301.00 samples/sec Loss 5.1507 LearningRate 0.0004 Epoch: 18 Global Step: 387740 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:00:02,566-Speed 6341.71 samples/sec Loss 5.1560 LearningRate 0.0004 Epoch: 18 Global Step: 387750 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:00:05,802-Speed 6331.25 samples/sec Loss 5.1216 LearningRate 0.0004 Epoch: 18 Global Step: 387760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:09,051-Speed 6304.34 samples/sec Loss 5.1279 LearningRate 0.0004 Epoch: 18 Global Step: 387770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:12,296-Speed 6313.68 samples/sec Loss 5.0621 LearningRate 0.0004 Epoch: 18 Global Step: 387780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:15,545-Speed 6305.29 samples/sec Loss 5.1431 LearningRate 0.0004 Epoch: 18 Global Step: 387790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:18,792-Speed 6308.34 samples/sec Loss 5.0776 LearningRate 0.0004 Epoch: 18 Global Step: 387800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:22,036-Speed 6314.81 samples/sec Loss 5.1025 LearningRate 0.0004 Epoch: 18 Global Step: 387810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:00:25,281-Speed 6312.80 samples/sec Loss 5.0472 LearningRate 0.0004 Epoch: 18 Global Step: 387820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:28,525-Speed 6314.04 samples/sec Loss 5.0699 LearningRate 0.0004 Epoch: 18 Global Step: 387830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:31,769-Speed 6314.22 samples/sec Loss 5.1410 LearningRate 0.0004 Epoch: 18 Global Step: 387840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:35,012-Speed 6317.58 samples/sec Loss 5.1381 LearningRate 0.0004 Epoch: 18 Global Step: 387850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:38,282-Speed 6262.98 samples/sec Loss 5.1106 LearningRate 0.0004 Epoch: 18 Global Step: 387860 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:00:41,536-Speed 6296.14 samples/sec Loss 5.0410 LearningRate 0.0003 Epoch: 18 Global Step: 387870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:44,783-Speed 6309.11 samples/sec Loss 5.0994 LearningRate 0.0003 Epoch: 18 Global Step: 387880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:48,030-Speed 6308.76 samples/sec Loss 5.1027 LearningRate 0.0003 Epoch: 18 Global Step: 387890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:51,272-Speed 6317.34 samples/sec Loss 5.1967 LearningRate 0.0003 Epoch: 18 Global Step: 387900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:54,532-Speed 6285.16 samples/sec Loss 5.0232 LearningRate 0.0003 Epoch: 18 Global Step: 387910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:00:57,777-Speed 6312.42 samples/sec Loss 5.0500 LearningRate 0.0003 Epoch: 18 Global Step: 387920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:01,026-Speed 6306.41 samples/sec Loss 5.0084 LearningRate 0.0003 Epoch: 18 Global Step: 387930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:04,278-Speed 6300.53 
samples/sec Loss 5.0745 LearningRate 0.0003 Epoch: 18 Global Step: 387940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:07,518-Speed 6321.34 samples/sec Loss 5.1167 LearningRate 0.0003 Epoch: 18 Global Step: 387950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:10,765-Speed 6309.27 samples/sec Loss 5.1033 LearningRate 0.0003 Epoch: 18 Global Step: 387960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:14,011-Speed 6310.27 samples/sec Loss 5.0580 LearningRate 0.0003 Epoch: 18 Global Step: 387970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:01:17,246-Speed 6333.73 samples/sec Loss 4.9990 LearningRate 0.0003 Epoch: 18 Global Step: 387980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:20,492-Speed 6310.72 samples/sec Loss 5.0814 LearningRate 0.0003 Epoch: 18 Global Step: 387990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:23,739-Speed 6308.19 samples/sec Loss 5.0403 LearningRate 0.0003 Epoch: 18 Global Step: 388000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:26,986-Speed 6308.96 samples/sec Loss 5.0737 LearningRate 0.0003 Epoch: 18 Global Step: 388010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:30,251-Speed 6274.02 samples/sec Loss 5.0699 LearningRate 0.0003 Epoch: 18 Global Step: 388020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:33,497-Speed 6311.65 samples/sec Loss 5.0721 LearningRate 0.0003 Epoch: 18 Global Step: 388030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:36,744-Speed 6308.30 samples/sec Loss 5.1303 LearningRate 0.0003 Epoch: 18 Global Step: 388040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:39,990-Speed 6309.64 samples/sec Loss 5.0433 LearningRate 0.0003 Epoch: 18 Global Step: 388050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:43,242-Speed 6300.42 samples/sec Loss 5.0997 
LearningRate 0.0003 Epoch: 18 Global Step: 388060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:46,486-Speed 6314.07 samples/sec Loss 5.1123 LearningRate 0.0003 Epoch: 18 Global Step: 388070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:01:49,733-Speed 6308.47 samples/sec Loss 5.1352 LearningRate 0.0003 Epoch: 18 Global Step: 388080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:01:52,979-Speed 6310.20 samples/sec Loss 5.1080 LearningRate 0.0003 Epoch: 18 Global Step: 388090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:01:56,227-Speed 6307.94 samples/sec Loss 5.1397 LearningRate 0.0003 Epoch: 18 Global Step: 388100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:01:59,470-Speed 6315.15 samples/sec Loss 5.1247 LearningRate 0.0003 Epoch: 18 Global Step: 388110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:02,717-Speed 6310.23 samples/sec Loss 5.0912 LearningRate 0.0003 Epoch: 18 Global Step: 388120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:05,965-Speed 6305.73 samples/sec Loss 5.1166 LearningRate 0.0003 Epoch: 18 Global Step: 388130 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:09,210-Speed 6312.52 samples/sec Loss 5.0876 LearningRate 0.0003 Epoch: 18 Global Step: 388140 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:12,458-Speed 6308.69 samples/sec Loss 5.1550 LearningRate 0.0003 Epoch: 18 Global Step: 388150 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:15,701-Speed 6316.24 samples/sec Loss 5.0578 LearningRate 0.0003 Epoch: 18 Global Step: 388160 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:18,950-Speed 6305.17 samples/sec Loss 5.1223 LearningRate 0.0003 Epoch: 18 Global Step: 388170 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:22,196-Speed 6309.91 samples/sec Loss 5.1126 LearningRate 0.0003 Epoch: 18 
Global Step: 388180 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-04-02 03:02:25,428-Speed 6339.34 samples/sec Loss 5.0722 LearningRate 0.0003 Epoch: 18 Global Step: 388190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:28,674-Speed 6310.80 samples/sec Loss 5.1396 LearningRate 0.0003 Epoch: 18 Global Step: 388200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:31,925-Speed 6300.67 samples/sec Loss 5.0480 LearningRate 0.0003 Epoch: 18 Global Step: 388210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:35,173-Speed 6307.13 samples/sec Loss 5.1169 LearningRate 0.0003 Epoch: 18 Global Step: 388220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:02:38,408-Speed 6331.22 samples/sec Loss 5.0914 LearningRate 0.0003 Epoch: 18 Global Step: 388230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:41,652-Speed 6314.78 samples/sec Loss 5.1108 LearningRate 0.0003 Epoch: 18 Global Step: 388240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:44,899-Speed 6308.98 samples/sec Loss 5.0143 LearningRate 0.0003 Epoch: 18 Global Step: 388250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:48,143-Speed 6314.46 samples/sec Loss 5.0528 LearningRate 0.0003 Epoch: 18 Global Step: 388260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:51,387-Speed 6315.08 samples/sec Loss 5.0899 LearningRate 0.0003 Epoch: 18 Global Step: 388270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:54,630-Speed 6316.47 samples/sec Loss 5.1683 LearningRate 0.0003 Epoch: 18 Global Step: 388280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:02:57,876-Speed 6310.84 samples/sec Loss 5.0951 LearningRate 0.0003 Epoch: 18 Global Step: 388290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:01,124-Speed 6306.76 samples/sec Loss 5.1452 LearningRate 0.0003 Epoch: 18 Global Step: 388300 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:04,383-Speed 6285.00 samples/sec Loss 5.0255 LearningRate 0.0003 Epoch: 18 Global Step: 388310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:07,628-Speed 6313.48 samples/sec Loss 5.1479 LearningRate 0.0003 Epoch: 18 Global Step: 388320 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:10,875-Speed 6307.89 samples/sec Loss 5.0698 LearningRate 0.0003 Epoch: 18 Global Step: 388330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:03:14,124-Speed 6306.16 samples/sec Loss 5.0612 LearningRate 0.0003 Epoch: 18 Global Step: 388340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:03:17,368-Speed 6312.92 samples/sec Loss 5.1239 LearningRate 0.0003 Epoch: 18 Global Step: 388350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:03:20,602-Speed 6336.34 samples/sec Loss 5.0897 LearningRate 0.0003 Epoch: 18 Global Step: 388360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:23,852-Speed 6302.24 samples/sec Loss 5.1459 LearningRate 0.0003 Epoch: 18 Global Step: 388370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:27,130-Speed 6250.30 samples/sec Loss 5.1293 LearningRate 0.0003 Epoch: 18 Global Step: 388380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:30,382-Speed 6297.50 samples/sec Loss 5.0726 LearningRate 0.0003 Epoch: 18 Global Step: 388390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:33,627-Speed 6313.53 samples/sec Loss 5.0609 LearningRate 0.0003 Epoch: 18 Global Step: 388400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:36,875-Speed 6306.41 samples/sec Loss 5.0626 LearningRate 0.0003 Epoch: 18 Global Step: 388410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:40,121-Speed 6311.70 samples/sec Loss 5.0727 LearningRate 0.0003 Epoch: 18 Global Step: 388420 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 03:03:43,364-Speed 6316.29 samples/sec Loss 5.0585 LearningRate 0.0003 Epoch: 18 Global Step: 388430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:46,612-Speed 6307.37 samples/sec Loss 5.0628 LearningRate 0.0003 Epoch: 18 Global Step: 388440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:49,857-Speed 6312.83 samples/sec Loss 5.0795 LearningRate 0.0003 Epoch: 18 Global Step: 388450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:03:53,106-Speed 6304.23 samples/sec Loss 5.0278 LearningRate 0.0003 Epoch: 18 Global Step: 388460 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:03:56,351-Speed 6311.92 samples/sec Loss 5.0561 LearningRate 0.0003 Epoch: 18 Global Step: 388470 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:03:59,598-Speed 6309.06 samples/sec Loss 5.1358 LearningRate 0.0003 Epoch: 18 Global Step: 388480 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:02,843-Speed 6312.63 samples/sec Loss 5.0964 LearningRate 0.0003 Epoch: 18 Global Step: 388490 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:06,166-Speed 6164.46 samples/sec Loss 5.0524 LearningRate 0.0003 Epoch: 18 Global Step: 388500 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:09,417-Speed 6301.24 samples/sec Loss 5.1576 LearningRate 0.0003 Epoch: 18 Global Step: 388510 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:12,664-Speed 6309.66 samples/sec Loss 5.1310 LearningRate 0.0003 Epoch: 18 Global Step: 388520 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:15,910-Speed 6309.99 samples/sec Loss 5.0891 LearningRate 0.0003 Epoch: 18 Global Step: 388530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:19,143-Speed 6335.16 samples/sec Loss 5.0823 LearningRate 0.0003 Epoch: 18 Global Step: 388540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:04:22,390-Speed 6310.12 samples/sec Loss 5.0928 LearningRate 0.0003 Epoch: 18 Global Step: 388550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:25,634-Speed 6314.20 samples/sec Loss 5.0131 LearningRate 0.0003 Epoch: 18 Global Step: 388560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:28,876-Speed 6319.47 samples/sec Loss 5.1111 LearningRate 0.0003 Epoch: 18 Global Step: 388570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:32,121-Speed 6312.37 samples/sec Loss 5.0507 LearningRate 0.0003 Epoch: 18 Global Step: 388580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:35,364-Speed 6316.99 samples/sec Loss 5.0669 LearningRate 0.0003 Epoch: 18 Global Step: 388590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:38,605-Speed 6320.92 samples/sec Loss 5.1271 LearningRate 0.0003 Epoch: 18 Global Step: 388600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:41,849-Speed 6313.62 samples/sec Loss 5.0981 LearningRate 0.0003 Epoch: 18 Global Step: 388610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:45,097-Speed 6307.02 samples/sec Loss 5.0759 LearningRate 0.0003 Epoch: 18 Global Step: 388620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:48,343-Speed 6311.00 samples/sec Loss 4.9776 LearningRate 0.0003 Epoch: 18 Global Step: 388630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:04:51,590-Speed 6307.75 samples/sec Loss 5.1166 LearningRate 0.0003 Epoch: 18 Global Step: 388640 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:54,832-Speed 6319.63 samples/sec Loss 5.1309 LearningRate 0.0003 Epoch: 18 Global Step: 388650 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:04:58,079-Speed 6308.93 samples/sec Loss 5.0574 LearningRate 0.0003 Epoch: 18 Global Step: 388660 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:01,327-Speed 6305.98 
samples/sec Loss 5.0452 LearningRate 0.0003 Epoch: 18 Global Step: 388670 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:04,577-Speed 6302.34 samples/sec Loss 5.1299 LearningRate 0.0003 Epoch: 18 Global Step: 388680 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:07,821-Speed 6314.74 samples/sec Loss 5.0522 LearningRate 0.0003 Epoch: 18 Global Step: 388690 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:11,061-Speed 6323.03 samples/sec Loss 5.0476 LearningRate 0.0003 Epoch: 18 Global Step: 388700 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:14,306-Speed 6312.72 samples/sec Loss 5.1467 LearningRate 0.0003 Epoch: 18 Global Step: 388710 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:17,553-Speed 6309.46 samples/sec Loss 5.0724 LearningRate 0.0003 Epoch: 18 Global Step: 388720 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:20,795-Speed 6316.49 samples/sec Loss 5.1517 LearningRate 0.0003 Epoch: 18 Global Step: 388730 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:24,028-Speed 6337.36 samples/sec Loss 5.1257 LearningRate 0.0003 Epoch: 18 Global Step: 388740 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:27,274-Speed 6309.59 samples/sec Loss 5.0701 LearningRate 0.0003 Epoch: 18 Global Step: 388750 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:30,522-Speed 6307.12 samples/sec Loss 5.0760 LearningRate 0.0003 Epoch: 18 Global Step: 388760 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:33,764-Speed 6319.15 samples/sec Loss 5.1033 LearningRate 0.0003 Epoch: 18 Global Step: 388770 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:05:36,992-Speed 6345.86 samples/sec Loss 5.1050 LearningRate 0.0003 Epoch: 18 Global Step: 388780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:40,236-Speed 6315.93 samples/sec Loss 5.1040 
LearningRate 0.0003 Epoch: 18 Global Step: 388790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:43,483-Speed 6309.13 samples/sec Loss 5.1395 LearningRate 0.0003 Epoch: 18 Global Step: 388800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:46,729-Speed 6309.73 samples/sec Loss 5.0364 LearningRate 0.0003 Epoch: 18 Global Step: 388810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:49,976-Speed 6309.30 samples/sec Loss 5.1205 LearningRate 0.0003 Epoch: 18 Global Step: 388820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:53,222-Speed 6310.24 samples/sec Loss 5.0777 LearningRate 0.0003 Epoch: 18 Global Step: 388830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:56,469-Speed 6308.36 samples/sec Loss 5.1095 LearningRate 0.0003 Epoch: 18 Global Step: 388840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:05:59,716-Speed 6310.19 samples/sec Loss 5.0964 LearningRate 0.0003 Epoch: 18 Global Step: 388850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:02,963-Speed 6307.52 samples/sec Loss 5.0964 LearningRate 0.0003 Epoch: 18 Global Step: 388860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:06,220-Speed 6290.23 samples/sec Loss 5.0999 LearningRate 0.0003 Epoch: 18 Global Step: 388870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:09,451-Speed 6339.51 samples/sec Loss 5.1340 LearningRate 0.0003 Epoch: 18 Global Step: 388880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:12,695-Speed 6315.04 samples/sec Loss 5.0701 LearningRate 0.0003 Epoch: 18 Global Step: 388890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:15,945-Speed 6303.00 samples/sec Loss 5.1339 LearningRate 0.0003 Epoch: 18 Global Step: 388900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:19,192-Speed 6309.10 samples/sec Loss 5.1003 LearningRate 0.0003 Epoch: 18 
Global Step: 388910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:22,438-Speed 6309.97 samples/sec Loss 5.1180 LearningRate 0.0003 Epoch: 18 Global Step: 388920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:25,687-Speed 6305.22 samples/sec Loss 5.0390 LearningRate 0.0003 Epoch: 18 Global Step: 388930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:28,932-Speed 6311.80 samples/sec Loss 5.0185 LearningRate 0.0003 Epoch: 18 Global Step: 388940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:32,189-Speed 6289.44 samples/sec Loss 5.0209 LearningRate 0.0003 Epoch: 18 Global Step: 388950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:35,435-Speed 6311.23 samples/sec Loss 5.0433 LearningRate 0.0003 Epoch: 18 Global Step: 388960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:38,678-Speed 6315.44 samples/sec Loss 5.0626 LearningRate 0.0003 Epoch: 18 Global Step: 388970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:41,924-Speed 6311.19 samples/sec Loss 5.1108 LearningRate 0.0003 Epoch: 18 Global Step: 388980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:06:45,171-Speed 6309.68 samples/sec Loss 5.0175 LearningRate 0.0003 Epoch: 18 Global Step: 388990 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:06:48,421-Speed 6302.04 samples/sec Loss 4.9946 LearningRate 0.0003 Epoch: 18 Global Step: 389000 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:06:51,652-Speed 6341.57 samples/sec Loss 4.9892 LearningRate 0.0003 Epoch: 18 Global Step: 389010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:54,894-Speed 6318.50 samples/sec Loss 5.0735 LearningRate 0.0003 Epoch: 18 Global Step: 389020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:06:58,137-Speed 6316.94 samples/sec Loss 5.0975 LearningRate 0.0003 Epoch: 18 Global Step: 389030 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:01,384-Speed 6308.81 samples/sec Loss 5.0818 LearningRate 0.0003 Epoch: 18 Global Step: 389040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:04,631-Speed 6308.67 samples/sec Loss 5.0844 LearningRate 0.0003 Epoch: 18 Global Step: 389050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:07,875-Speed 6313.70 samples/sec Loss 5.0586 LearningRate 0.0003 Epoch: 18 Global Step: 389060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:11,123-Speed 6307.60 samples/sec Loss 4.9769 LearningRate 0.0003 Epoch: 18 Global Step: 389070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:14,368-Speed 6311.60 samples/sec Loss 5.0962 LearningRate 0.0003 Epoch: 18 Global Step: 389080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:17,616-Speed 6308.07 samples/sec Loss 5.1047 LearningRate 0.0003 Epoch: 18 Global Step: 389090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:20,861-Speed 6312.05 samples/sec Loss 5.0939 LearningRate 0.0003 Epoch: 18 Global Step: 389100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:24,108-Speed 6308.86 samples/sec Loss 5.0781 LearningRate 0.0003 Epoch: 18 Global Step: 389110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:07:27,356-Speed 6306.78 samples/sec Loss 5.2173 LearningRate 0.0003 Epoch: 18 Global Step: 389120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:07:30,599-Speed 6315.99 samples/sec Loss 5.0577 LearningRate 0.0003 Epoch: 18 Global Step: 389130 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:07:33,844-Speed 6312.42 samples/sec Loss 5.0733 LearningRate 0.0003 Epoch: 18 Global Step: 389140 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:07:37,103-Speed 6286.71 samples/sec Loss 5.0670 LearningRate 0.0003 Epoch: 18 Global Step: 389150 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 03:07:40,344-Speed 6319.45 samples/sec Loss 5.0959 LearningRate 0.0003 Epoch: 18 Global Step: 389160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:43,589-Speed 6313.17 samples/sec Loss 5.0960 LearningRate 0.0003 Epoch: 18 Global Step: 389170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:46,834-Speed 6311.32 samples/sec Loss 5.0594 LearningRate 0.0003 Epoch: 18 Global Step: 389180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:50,081-Speed 6309.80 samples/sec Loss 5.0467 LearningRate 0.0003 Epoch: 18 Global Step: 389190 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:53,330-Speed 6305.60 samples/sec Loss 5.0727 LearningRate 0.0003 Epoch: 18 Global Step: 389200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:56,576-Speed 6310.87 samples/sec Loss 5.1213 LearningRate 0.0003 Epoch: 18 Global Step: 389210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:07:59,824-Speed 6307.65 samples/sec Loss 5.1775 LearningRate 0.0003 Epoch: 18 Global Step: 389220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:03,069-Speed 6312.21 samples/sec Loss 5.0882 LearningRate 0.0003 Epoch: 18 Global Step: 389230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:06,317-Speed 6305.72 samples/sec Loss 5.0614 LearningRate 0.0003 Epoch: 18 Global Step: 389240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:09,564-Speed 6309.68 samples/sec Loss 5.0674 LearningRate 0.0003 Epoch: 18 Global Step: 389250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:12,810-Speed 6312.73 samples/sec Loss 5.0567 LearningRate 0.0003 Epoch: 18 Global Step: 389260 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:16,056-Speed 6310.08 samples/sec Loss 5.0485 LearningRate 0.0003 Epoch: 18 Global Step: 389270 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
03:08:19,300-Speed 6314.42 samples/sec Loss 5.0150 LearningRate 0.0003 Epoch: 18 Global Step: 389280 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:22,550-Speed 6303.84 samples/sec Loss 5.1009 LearningRate 0.0003 Epoch: 18 Global Step: 389290 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:25,829-Speed 6246.66 samples/sec Loss 5.0922 LearningRate 0.0003 Epoch: 18 Global Step: 389300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:29,073-Speed 6315.50 samples/sec Loss 5.1177 LearningRate 0.0003 Epoch: 18 Global Step: 389310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:32,320-Speed 6308.94 samples/sec Loss 5.1491 LearningRate 0.0003 Epoch: 18 Global Step: 389320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:35,568-Speed 6308.24 samples/sec Loss 5.1146 LearningRate 0.0003 Epoch: 18 Global Step: 389330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:38,818-Speed 6302.20 samples/sec Loss 5.0388 LearningRate 0.0003 Epoch: 18 Global Step: 389340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:08:42,056-Speed 6326.26 samples/sec Loss 5.0696 LearningRate 0.0003 Epoch: 18 Global Step: 389350 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:45,303-Speed 6308.58 samples/sec Loss 5.0645 LearningRate 0.0003 Epoch: 18 Global Step: 389360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:48,546-Speed 6316.45 samples/sec Loss 5.0567 LearningRate 0.0003 Epoch: 18 Global Step: 389370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:51,793-Speed 6308.83 samples/sec Loss 5.0015 LearningRate 0.0003 Epoch: 18 Global Step: 389380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:55,038-Speed 6313.14 samples/sec Loss 5.0840 LearningRate 0.0003 Epoch: 18 Global Step: 389390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:08:58,285-Speed 6308.67 
samples/sec Loss 5.1253 LearningRate 0.0003 Epoch: 18 Global Step: 389400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:01,532-Speed 6309.89 samples/sec Loss 5.1635 LearningRate 0.0003 Epoch: 18 Global Step: 389410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:04,783-Speed 6299.59 samples/sec Loss 5.1435 LearningRate 0.0003 Epoch: 18 Global Step: 389420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:08,030-Speed 6310.33 samples/sec Loss 5.0705 LearningRate 0.0003 Epoch: 18 Global Step: 389430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:11,276-Speed 6312.17 samples/sec Loss 5.0690 LearningRate 0.0003 Epoch: 18 Global Step: 389440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:14,504-Speed 6346.92 samples/sec Loss 5.0862 LearningRate 0.0003 Epoch: 18 Global Step: 389450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:17,760-Speed 6290.69 samples/sec Loss 5.1249 LearningRate 0.0003 Epoch: 18 Global Step: 389460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:21,008-Speed 6306.82 samples/sec Loss 5.0819 LearningRate 0.0003 Epoch: 18 Global Step: 389470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:24,253-Speed 6313.28 samples/sec Loss 5.0821 LearningRate 0.0003 Epoch: 18 Global Step: 389480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:27,498-Speed 6311.89 samples/sec Loss 5.1453 LearningRate 0.0003 Epoch: 18 Global Step: 389490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:30,748-Speed 6303.02 samples/sec Loss 5.0726 LearningRate 0.0003 Epoch: 18 Global Step: 389500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:33,991-Speed 6316.17 samples/sec Loss 5.0199 LearningRate 0.0003 Epoch: 18 Global Step: 389510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:37,238-Speed 6308.71 samples/sec Loss 5.0848 
LearningRate 0.0003 Epoch: 18 Global Step: 389520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:40,487-Speed 6304.55 samples/sec Loss 5.1127 LearningRate 0.0003 Epoch: 18 Global Step: 389530 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:43,732-Speed 6314.32 samples/sec Loss 5.0914 LearningRate 0.0003 Epoch: 18 Global Step: 389540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:09:46,983-Speed 6300.77 samples/sec Loss 5.0612 LearningRate 0.0003 Epoch: 18 Global Step: 389550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:09:50,234-Speed 6301.34 samples/sec Loss 5.0843 LearningRate 0.0003 Epoch: 18 Global Step: 389560 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:09:53,478-Speed 6314.18 samples/sec Loss 5.0687 LearningRate 0.0003 Epoch: 18 Global Step: 389570 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:09:56,724-Speed 6311.38 samples/sec Loss 5.1109 LearningRate 0.0003 Epoch: 18 Global Step: 389580 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:09:59,969-Speed 6311.71 samples/sec Loss 4.9892 LearningRate 0.0003 Epoch: 18 Global Step: 389590 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:03,210-Speed 6319.88 samples/sec Loss 5.1607 LearningRate 0.0003 Epoch: 18 Global Step: 389600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:06,454-Speed 6315.25 samples/sec Loss 5.0512 LearningRate 0.0003 Epoch: 18 Global Step: 389610 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:09,706-Speed 6299.29 samples/sec Loss 5.0562 LearningRate 0.0003 Epoch: 18 Global Step: 389620 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:12,952-Speed 6312.28 samples/sec Loss 5.0401 LearningRate 0.0003 Epoch: 18 Global Step: 389630 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:16,197-Speed 6311.78 samples/sec Loss 5.1356 LearningRate 0.0003 Epoch: 18 
Global Step: 389640 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:19,429-Speed 6337.21 samples/sec Loss 5.0758 LearningRate 0.0003 Epoch: 18 Global Step: 389650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:22,673-Speed 6316.34 samples/sec Loss 5.0778 LearningRate 0.0003 Epoch: 18 Global Step: 389660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:25,934-Speed 6280.19 samples/sec Loss 5.0727 LearningRate 0.0003 Epoch: 18 Global Step: 389670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:29,181-Speed 6309.17 samples/sec Loss 5.0184 LearningRate 0.0003 Epoch: 18 Global Step: 389680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:32,424-Speed 6316.15 samples/sec Loss 5.1065 LearningRate 0.0003 Epoch: 18 Global Step: 389690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:35,671-Speed 6310.04 samples/sec Loss 5.0197 LearningRate 0.0003 Epoch: 18 Global Step: 389700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:38,919-Speed 6306.59 samples/sec Loss 5.1046 LearningRate 0.0003 Epoch: 18 Global Step: 389710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:42,164-Speed 6311.53 samples/sec Loss 5.0952 LearningRate 0.0003 Epoch: 18 Global Step: 389720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:45,408-Speed 6314.17 samples/sec Loss 5.1262 LearningRate 0.0003 Epoch: 18 Global Step: 389730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:48,653-Speed 6312.38 samples/sec Loss 5.0986 LearningRate 0.0003 Epoch: 18 Global Step: 389740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:51,902-Speed 6306.86 samples/sec Loss 5.2032 LearningRate 0.0003 Epoch: 18 Global Step: 389750 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:10:55,131-Speed 6342.85 samples/sec Loss 5.0816 LearningRate 0.0003 Epoch: 18 Global Step: 389760 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:10:58,378-Speed 6307.73 samples/sec Loss 5.0493 LearningRate 0.0003 Epoch: 18 Global Step: 389770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:01,665-Speed 6231.94 samples/sec Loss 5.0695 LearningRate 0.0003 Epoch: 18 Global Step: 389780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:04,956-Speed 6224.71 samples/sec Loss 5.0888 LearningRate 0.0003 Epoch: 18 Global Step: 389790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:08,209-Speed 6297.62 samples/sec Loss 5.0323 LearningRate 0.0003 Epoch: 18 Global Step: 389800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:11,470-Speed 6282.80 samples/sec Loss 5.0798 LearningRate 0.0003 Epoch: 18 Global Step: 389810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:14,718-Speed 6305.72 samples/sec Loss 5.0432 LearningRate 0.0003 Epoch: 18 Global Step: 389820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:17,964-Speed 6310.15 samples/sec Loss 5.0899 LearningRate 0.0003 Epoch: 18 Global Step: 389830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:21,210-Speed 6311.64 samples/sec Loss 5.0780 LearningRate 0.0003 Epoch: 18 Global Step: 389840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:24,456-Speed 6311.63 samples/sec Loss 5.0534 LearningRate 0.0003 Epoch: 18 Global Step: 389850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:11:27,704-Speed 6305.86 samples/sec Loss 5.0252 LearningRate 0.0003 Epoch: 18 Global Step: 389860 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:30,951-Speed 6310.41 samples/sec Loss 5.0459 LearningRate 0.0003 Epoch: 18 Global Step: 389870 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:34,197-Speed 6310.15 samples/sec Loss 5.0875 LearningRate 0.0003 Epoch: 18 Global Step: 389880 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 03:11:37,455-Speed 6286.48 samples/sec Loss 5.1259 LearningRate 0.0003 Epoch: 18 Global Step: 389890 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:40,811-Speed 6104.93 samples/sec Loss 5.1106 LearningRate 0.0003 Epoch: 18 Global Step: 389900 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:44,066-Speed 6292.88 samples/sec Loss 5.1515 LearningRate 0.0003 Epoch: 18 Global Step: 389910 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:47,322-Speed 6290.69 samples/sec Loss 5.1201 LearningRate 0.0003 Epoch: 18 Global Step: 389920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:50,584-Speed 6279.25 samples/sec Loss 5.0945 LearningRate 0.0003 Epoch: 18 Global Step: 389930 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:53,834-Speed 6304.72 samples/sec Loss 5.0757 LearningRate 0.0003 Epoch: 18 Global Step: 389940 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:11:57,085-Speed 6299.23 samples/sec Loss 5.0850 LearningRate 0.0003 Epoch: 18 Global Step: 389950 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:00,331-Speed 6311.97 samples/sec Loss 5.0491 LearningRate 0.0003 Epoch: 18 Global Step: 389960 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-04-02 03:12:03,566-Speed 6332.03 samples/sec Loss 5.0900 LearningRate 0.0003 Epoch: 18 Global Step: 389970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:06,816-Speed 6302.11 samples/sec Loss 5.0816 LearningRate 0.0003 Epoch: 18 Global Step: 389980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:10,073-Speed 6289.52 samples/sec Loss 4.9979 LearningRate 0.0003 Epoch: 18 Global Step: 389990 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:13,318-Speed 6312.74 samples/sec Loss 5.0431 LearningRate 0.0003 Epoch: 18 Global Step: 390000 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
03:12:16,565-Speed 6308.77 samples/sec Loss 5.0730 LearningRate 0.0003 Epoch: 18 Global Step: 390010 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:19,808-Speed 6316.85 samples/sec Loss 5.1293 LearningRate 0.0003 Epoch: 18 Global Step: 390020 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:23,055-Speed 6308.30 samples/sec Loss 5.0405 LearningRate 0.0003 Epoch: 18 Global Step: 390030 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:26,306-Speed 6303.07 samples/sec Loss 5.0616 LearningRate 0.0003 Epoch: 18 Global Step: 390040 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:29,549-Speed 6315.44 samples/sec Loss 5.0110 LearningRate 0.0003 Epoch: 18 Global Step: 390050 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:12:32,780-Speed 6341.17 samples/sec Loss 5.0690 LearningRate 0.0003 Epoch: 18 Global Step: 390060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:36,024-Speed 6313.95 samples/sec Loss 5.0552 LearningRate 0.0003 Epoch: 18 Global Step: 390070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:39,270-Speed 6310.75 samples/sec Loss 5.0810 LearningRate 0.0003 Epoch: 18 Global Step: 390080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:42,514-Speed 6315.65 samples/sec Loss 5.0571 LearningRate 0.0003 Epoch: 18 Global Step: 390090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:45,764-Speed 6302.52 samples/sec Loss 5.0390 LearningRate 0.0003 Epoch: 18 Global Step: 390100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:49,009-Speed 6311.88 samples/sec Loss 5.0592 LearningRate 0.0003 Epoch: 18 Global Step: 390110 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:52,255-Speed 6310.04 samples/sec Loss 5.0301 LearningRate 0.0003 Epoch: 18 Global Step: 390120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:55,502-Speed 6309.67 
samples/sec Loss 5.1437 LearningRate 0.0003 Epoch: 18 Global Step: 390130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:12:58,751-Speed 6304.57 samples/sec Loss 5.1156 LearningRate 0.0003 Epoch: 18 Global Step: 390140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:01,996-Speed 6313.32 samples/sec Loss 5.0225 LearningRate 0.0003 Epoch: 18 Global Step: 390150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:05,240-Speed 6314.00 samples/sec Loss 5.0374 LearningRate 0.0003 Epoch: 18 Global Step: 390160 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:08,484-Speed 6315.60 samples/sec Loss 5.0461 LearningRate 0.0003 Epoch: 18 Global Step: 390170 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:11,729-Speed 6311.62 samples/sec Loss 5.1283 LearningRate 0.0003 Epoch: 18 Global Step: 390180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:14,975-Speed 6310.96 samples/sec Loss 4.9855 LearningRate 0.0003 Epoch: 18 Global Step: 390190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:18,217-Speed 6317.71 samples/sec Loss 5.1144 LearningRate 0.0003 Epoch: 18 Global Step: 390200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:21,449-Speed 6338.74 samples/sec Loss 5.0118 LearningRate 0.0003 Epoch: 18 Global Step: 390210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:24,691-Speed 6318.76 samples/sec Loss 5.0153 LearningRate 0.0003 Epoch: 18 Global Step: 390220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:27,937-Speed 6310.62 samples/sec Loss 5.0615 LearningRate 0.0003 Epoch: 18 Global Step: 390230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:31,182-Speed 6312.24 samples/sec Loss 5.1302 LearningRate 0.0003 Epoch: 18 Global Step: 390240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:34,433-Speed 6301.91 samples/sec Loss 5.1525 
LearningRate 0.0003 Epoch: 18 Global Step: 390250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:37,677-Speed 6314.77 samples/sec Loss 5.1248 LearningRate 0.0003 Epoch: 18 Global Step: 390260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:40,920-Speed 6316.99 samples/sec Loss 5.0227 LearningRate 0.0003 Epoch: 18 Global Step: 390270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:44,162-Speed 6318.95 samples/sec Loss 5.1517 LearningRate 0.0003 Epoch: 18 Global Step: 390280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:47,405-Speed 6316.30 samples/sec Loss 5.0622 LearningRate 0.0003 Epoch: 18 Global Step: 390290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:50,649-Speed 6313.61 samples/sec Loss 5.0306 LearningRate 0.0003 Epoch: 18 Global Step: 390300 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:13:53,895-Speed 6311.86 samples/sec Loss 5.0101 LearningRate 0.0003 Epoch: 18 Global Step: 390310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:13:57,140-Speed 6311.18 samples/sec Loss 5.1244 LearningRate 0.0003 Epoch: 18 Global Step: 390320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:00,389-Speed 6305.25 samples/sec Loss 5.0897 LearningRate 0.0003 Epoch: 18 Global Step: 390330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:03,637-Speed 6306.67 samples/sec Loss 5.0221 LearningRate 0.0003 Epoch: 18 Global Step: 390340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:06,888-Speed 6301.40 samples/sec Loss 5.0827 LearningRate 0.0003 Epoch: 18 Global Step: 390350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:10,132-Speed 6314.80 samples/sec Loss 5.0021 LearningRate 0.0003 Epoch: 18 Global Step: 390360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:13,377-Speed 6312.27 samples/sec Loss 5.1471 LearningRate 0.0003 Epoch: 18 
Global Step: 390370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:16,620-Speed 6317.52 samples/sec Loss 5.1094 LearningRate 0.0003 Epoch: 18 Global Step: 390380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:19,862-Speed 6318.78 samples/sec Loss 5.0745 LearningRate 0.0003 Epoch: 18 Global Step: 390390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:23,094-Speed 6337.37 samples/sec Loss 5.0601 LearningRate 0.0003 Epoch: 18 Global Step: 390400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:26,342-Speed 6306.32 samples/sec Loss 5.0545 LearningRate 0.0003 Epoch: 18 Global Step: 390410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:29,589-Speed 6309.96 samples/sec Loss 5.0798 LearningRate 0.0003 Epoch: 18 Global Step: 390420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:32,835-Speed 6309.91 samples/sec Loss 5.1091 LearningRate 0.0003 Epoch: 18 Global Step: 390430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:36,080-Speed 6312.79 samples/sec Loss 5.0219 LearningRate 0.0003 Epoch: 18 Global Step: 390440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:39,326-Speed 6310.82 samples/sec Loss 5.0057 LearningRate 0.0003 Epoch: 18 Global Step: 390450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:42,574-Speed 6307.00 samples/sec Loss 5.0352 LearningRate 0.0003 Epoch: 18 Global Step: 390460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:45,821-Speed 6308.29 samples/sec Loss 5.0861 LearningRate 0.0003 Epoch: 18 Global Step: 390470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:49,066-Speed 6314.44 samples/sec Loss 5.1022 LearningRate 0.0003 Epoch: 18 Global Step: 390480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:52,315-Speed 6305.28 samples/sec Loss 5.0357 LearningRate 0.0003 Epoch: 18 Global Step: 390490 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:14:55,557-Speed 6316.96 samples/sec Loss 5.0930 LearningRate 0.0003 Epoch: 18 Global Step: 390500 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:14:58,802-Speed 6312.84 samples/sec Loss 5.0791 LearningRate 0.0003 Epoch: 18 Global Step: 390510 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:02,050-Speed 6307.58 samples/sec Loss 5.1379 LearningRate 0.0003 Epoch: 18 Global Step: 390520 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:05,297-Speed 6308.87 samples/sec Loss 5.0363 LearningRate 0.0003 Epoch: 18 Global Step: 390530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:08,542-Speed 6311.33 samples/sec Loss 5.0658 LearningRate 0.0003 Epoch: 18 Global Step: 390540 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:11,810-Speed 6268.71 samples/sec Loss 5.0690 LearningRate 0.0003 Epoch: 18 Global Step: 390550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:15,056-Speed 6311.71 samples/sec Loss 5.1744 LearningRate 0.0003 Epoch: 18 Global Step: 390560 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:18,306-Speed 6302.20 samples/sec Loss 5.0339 LearningRate 0.0003 Epoch: 18 Global Step: 390570 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:21,568-Speed 6279.49 samples/sec Loss 5.0358 LearningRate 0.0003 Epoch: 18 Global Step: 390580 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:24,827-Speed 6285.64 samples/sec Loss 5.0994 LearningRate 0.0003 Epoch: 18 Global Step: 390590 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:28,081-Speed 6296.20 samples/sec Loss 5.0038 LearningRate 0.0003 Epoch: 18 Global Step: 390600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:15:31,376-Speed 6215.49 samples/sec Loss 4.9845 LearningRate 0.0003 Epoch: 18 Global Step: 390610 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 03:15:34,611-Speed 6333.19 samples/sec Loss 5.0805 LearningRate 0.0003 Epoch: 18 Global Step: 390620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:37,863-Speed 6299.02 samples/sec Loss 5.0770 LearningRate 0.0003 Epoch: 18 Global Step: 390630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:41,109-Speed 6311.40 samples/sec Loss 5.0280 LearningRate 0.0003 Epoch: 18 Global Step: 390640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:44,353-Speed 6313.18 samples/sec Loss 5.0157 LearningRate 0.0003 Epoch: 18 Global Step: 390650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:47,598-Speed 6312.35 samples/sec Loss 5.0181 LearningRate 0.0003 Epoch: 18 Global Step: 390660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:50,844-Speed 6312.29 samples/sec Loss 5.0530 LearningRate 0.0003 Epoch: 18 Global Step: 390670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:54,090-Speed 6310.88 samples/sec Loss 5.0658 LearningRate 0.0003 Epoch: 18 Global Step: 390680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:15:57,335-Speed 6311.42 samples/sec Loss 4.9988 LearningRate 0.0003 Epoch: 18 Global Step: 390690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:00,582-Speed 6309.31 samples/sec Loss 5.0822 LearningRate 0.0003 Epoch: 18 Global Step: 390700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:03,878-Speed 6214.94 samples/sec Loss 5.0792 LearningRate 0.0003 Epoch: 18 Global Step: 390710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:07,182-Speed 6199.81 samples/sec Loss 5.0009 LearningRate 0.0003 Epoch: 18 Global Step: 390720 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:16:10,414-Speed 6339.12 samples/sec Loss 5.1060 LearningRate 0.0003 Epoch: 18 Global Step: 390730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:16:13,661-Speed 6308.66 samples/sec Loss 5.0484 LearningRate 0.0003 Epoch: 18 Global Step: 390740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:16,904-Speed 6315.90 samples/sec Loss 4.9630 LearningRate 0.0003 Epoch: 18 Global Step: 390750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:20,154-Speed 6302.73 samples/sec Loss 5.0930 LearningRate 0.0003 Epoch: 18 Global Step: 390760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:23,404-Speed 6303.74 samples/sec Loss 5.1013 LearningRate 0.0003 Epoch: 18 Global Step: 390770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:26,669-Speed 6274.47 samples/sec Loss 5.1521 LearningRate 0.0003 Epoch: 18 Global Step: 390780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:29,914-Speed 6312.33 samples/sec Loss 5.0910 LearningRate 0.0003 Epoch: 18 Global Step: 390790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:33,156-Speed 6318.13 samples/sec Loss 5.0477 LearningRate 0.0003 Epoch: 18 Global Step: 390800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:36,403-Speed 6308.22 samples/sec Loss 5.0268 LearningRate 0.0003 Epoch: 18 Global Step: 390810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:39,650-Speed 6312.67 samples/sec Loss 5.0695 LearningRate 0.0003 Epoch: 18 Global Step: 390820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:42,896-Speed 6312.02 samples/sec Loss 5.1110 LearningRate 0.0003 Epoch: 18 Global Step: 390830 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:16:46,136-Speed 6321.33 samples/sec Loss 5.1072 LearningRate 0.0003 Epoch: 18 Global Step: 390840 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:16:49,369-Speed 6335.47 samples/sec Loss 5.0829 LearningRate 0.0003 Epoch: 18 Global Step: 390850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:52,615-Speed 6311.30 
samples/sec Loss 5.0358 LearningRate 0.0003 Epoch: 18 Global Step: 390860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:55,855-Speed 6322.21 samples/sec Loss 5.0547 LearningRate 0.0003 Epoch: 18 Global Step: 390870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:16:59,099-Speed 6316.12 samples/sec Loss 5.0090 LearningRate 0.0003 Epoch: 18 Global Step: 390880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:02,352-Speed 6297.39 samples/sec Loss 5.0604 LearningRate 0.0003 Epoch: 18 Global Step: 390890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:05,611-Speed 6285.93 samples/sec Loss 5.0581 LearningRate 0.0003 Epoch: 18 Global Step: 390900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:08,854-Speed 6317.79 samples/sec Loss 5.0534 LearningRate 0.0003 Epoch: 18 Global Step: 390910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:12,103-Speed 6305.13 samples/sec Loss 5.1376 LearningRate 0.0003 Epoch: 18 Global Step: 390920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:15,351-Speed 6306.62 samples/sec Loss 5.1086 LearningRate 0.0003 Epoch: 18 Global Step: 390930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:18,596-Speed 6312.07 samples/sec Loss 5.0567 LearningRate 0.0003 Epoch: 18 Global Step: 390940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:21,846-Speed 6302.52 samples/sec Loss 5.0796 LearningRate 0.0003 Epoch: 18 Global Step: 390950 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:17:25,091-Speed 6311.93 samples/sec Loss 5.0174 LearningRate 0.0003 Epoch: 18 Global Step: 390960 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:17:28,343-Speed 6299.45 samples/sec Loss 5.0439 LearningRate 0.0003 Epoch: 18 Global Step: 390970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:17:31,594-Speed 6300.87 samples/sec Loss 5.0511 
LearningRate 0.0003 Epoch: 18 Global Step: 390980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:17:34,840-Speed 6310.11 samples/sec Loss 5.0862 LearningRate 0.0003 Epoch: 18 Global Step: 390990 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:17:38,073-Speed 6337.11 samples/sec Loss 5.0512 LearningRate 0.0003 Epoch: 18 Global Step: 391000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:41,318-Speed 6313.48 samples/sec Loss 5.0215 LearningRate 0.0003 Epoch: 18 Global Step: 391010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:44,566-Speed 6305.40 samples/sec Loss 5.0090 LearningRate 0.0003 Epoch: 18 Global Step: 391020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:47,814-Speed 6307.22 samples/sec Loss 5.0671 LearningRate 0.0003 Epoch: 18 Global Step: 391030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:51,060-Speed 6310.71 samples/sec Loss 5.1834 LearningRate 0.0003 Epoch: 18 Global Step: 391040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:54,304-Speed 6314.25 samples/sec Loss 5.1268 LearningRate 0.0003 Epoch: 18 Global Step: 391050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:17:57,543-Speed 6324.08 samples/sec Loss 5.0418 LearningRate 0.0003 Epoch: 18 Global Step: 391060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:00,791-Speed 6307.53 samples/sec Loss 5.0120 LearningRate 0.0003 Epoch: 18 Global Step: 391070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:04,049-Speed 6287.45 samples/sec Loss 5.1455 LearningRate 0.0003 Epoch: 18 Global Step: 391080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:07,296-Speed 6308.46 samples/sec Loss 5.1094 LearningRate 0.0003 Epoch: 18 Global Step: 391090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:10,544-Speed 6307.28 samples/sec Loss 5.0037 LearningRate 0.0003 Epoch: 18 
Global Step: 391100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:18:13,792-Speed 6306.62 samples/sec Loss 5.0726 LearningRate 0.0003 Epoch: 18 Global Step: 391110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:18:17,025-Speed 6337.37 samples/sec Loss 5.0687 LearningRate 0.0003 Epoch: 18 Global Step: 391120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:20,269-Speed 6314.55 samples/sec Loss 5.0134 LearningRate 0.0003 Epoch: 18 Global Step: 391130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:23,514-Speed 6312.66 samples/sec Loss 5.0678 LearningRate 0.0003 Epoch: 18 Global Step: 391140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:26,764-Speed 6303.79 samples/sec Loss 5.1346 LearningRate 0.0003 Epoch: 18 Global Step: 391150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:30,008-Speed 6314.58 samples/sec Loss 5.0399 LearningRate 0.0003 Epoch: 18 Global Step: 391160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:33,252-Speed 6313.27 samples/sec Loss 5.0715 LearningRate 0.0003 Epoch: 18 Global Step: 391170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:36,493-Speed 6320.40 samples/sec Loss 5.0668 LearningRate 0.0003 Epoch: 18 Global Step: 391180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:39,742-Speed 6306.41 samples/sec Loss 5.0270 LearningRate 0.0003 Epoch: 18 Global Step: 391190 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:42,987-Speed 6312.51 samples/sec Loss 5.0302 LearningRate 0.0003 Epoch: 18 Global Step: 391200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:46,230-Speed 6316.66 samples/sec Loss 5.0287 LearningRate 0.0003 Epoch: 18 Global Step: 391210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:49,472-Speed 6317.43 samples/sec Loss 5.0680 LearningRate 0.0003 Epoch: 18 Global Step: 391220 Fp16 Grad 
Scale: 32768 Required: 40 hours Training: 2022-04-02 03:18:52,719-Speed 6308.80 samples/sec Loss 5.0980 LearningRate 0.0003 Epoch: 18 Global Step: 391230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:18:55,952-Speed 6336.73 samples/sec Loss 5.1235 LearningRate 0.0003 Epoch: 18 Global Step: 391240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:18:59,191-Speed 6323.53 samples/sec Loss 5.0028 LearningRate 0.0003 Epoch: 18 Global Step: 391250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:02,437-Speed 6311.15 samples/sec Loss 5.0760 LearningRate 0.0003 Epoch: 18 Global Step: 391260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:05,679-Speed 6317.20 samples/sec Loss 5.1182 LearningRate 0.0003 Epoch: 18 Global Step: 391270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:08,923-Speed 6316.20 samples/sec Loss 5.0701 LearningRate 0.0003 Epoch: 18 Global Step: 391280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:12,165-Speed 6318.07 samples/sec Loss 5.0878 LearningRate 0.0003 Epoch: 18 Global Step: 391290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:15,412-Speed 6309.23 samples/sec Loss 5.0676 LearningRate 0.0003 Epoch: 18 Global Step: 391300 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:18,660-Speed 6306.29 samples/sec Loss 5.0072 LearningRate 0.0003 Epoch: 18 Global Step: 391310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:21,908-Speed 6307.58 samples/sec Loss 5.0452 LearningRate 0.0003 Epoch: 18 Global Step: 391320 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:25,156-Speed 6306.25 samples/sec Loss 5.0022 LearningRate 0.0003 Epoch: 18 Global Step: 391330 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:28,410-Speed 6299.36 samples/sec Loss 5.0453 LearningRate 0.0003 Epoch: 18 Global Step: 391340 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 03:19:31,663-Speed 6296.61 samples/sec Loss 5.0184 LearningRate 0.0003 Epoch: 18 Global Step: 391350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:19:34,921-Speed 6288.73 samples/sec Loss 5.0635 LearningRate 0.0003 Epoch: 18 Global Step: 391360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:19:38,153-Speed 6338.49 samples/sec Loss 4.9806 LearningRate 0.0003 Epoch: 18 Global Step: 391370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:41,402-Speed 6303.87 samples/sec Loss 5.0525 LearningRate 0.0003 Epoch: 18 Global Step: 391380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:44,652-Speed 6306.54 samples/sec Loss 5.0600 LearningRate 0.0003 Epoch: 18 Global Step: 391390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:47,894-Speed 6316.84 samples/sec Loss 5.0709 LearningRate 0.0003 Epoch: 18 Global Step: 391400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:51,140-Speed 6310.35 samples/sec Loss 5.0619 LearningRate 0.0003 Epoch: 18 Global Step: 391410 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:54,386-Speed 6311.71 samples/sec Loss 5.0665 LearningRate 0.0003 Epoch: 18 Global Step: 391420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:19:57,632-Speed 6311.11 samples/sec Loss 5.0970 LearningRate 0.0003 Epoch: 18 Global Step: 391430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:00,877-Speed 6312.81 samples/sec Loss 5.0762 LearningRate 0.0003 Epoch: 18 Global Step: 391440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:04,124-Speed 6308.75 samples/sec Loss 5.0353 LearningRate 0.0003 Epoch: 18 Global Step: 391450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:07,370-Speed 6310.58 samples/sec Loss 5.1472 LearningRate 0.0003 Epoch: 18 Global Step: 391460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:20:10,611-Speed 6320.31 samples/sec Loss 5.0617 LearningRate 0.0003 Epoch: 18 Global Step: 391470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:13,860-Speed 6303.91 samples/sec Loss 5.0548 LearningRate 0.0003 Epoch: 18 Global Step: 391480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:17,105-Speed 6312.92 samples/sec Loss 5.1221 LearningRate 0.0003 Epoch: 18 Global Step: 391490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:20,349-Speed 6314.13 samples/sec Loss 5.1063 LearningRate 0.0003 Epoch: 18 Global Step: 391500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:23,602-Speed 6298.29 samples/sec Loss 5.0942 LearningRate 0.0003 Epoch: 18 Global Step: 391510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:26,845-Speed 6315.24 samples/sec Loss 5.0283 LearningRate 0.0003 Epoch: 18 Global Step: 391520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:30,091-Speed 6312.00 samples/sec Loss 5.0000 LearningRate 0.0003 Epoch: 18 Global Step: 391530 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:33,333-Speed 6318.88 samples/sec Loss 5.0313 LearningRate 0.0003 Epoch: 18 Global Step: 391540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:36,580-Speed 6307.77 samples/sec Loss 5.0776 LearningRate 0.0003 Epoch: 18 Global Step: 391550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:39,826-Speed 6312.50 samples/sec Loss 5.0691 LearningRate 0.0003 Epoch: 18 Global Step: 391560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:20:43,069-Speed 6315.25 samples/sec Loss 5.0978 LearningRate 0.0003 Epoch: 18 Global Step: 391570 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:20:46,317-Speed 6307.02 samples/sec Loss 5.0350 LearningRate 0.0003 Epoch: 18 Global Step: 391580 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:20:49,566-Speed 6306.08 
samples/sec Loss 5.0322 LearningRate 0.0003 Epoch: 18 Global Step: 391590 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:20:52,812-Speed 6309.41 samples/sec Loss 5.0064 LearningRate 0.0003 Epoch: 18 Global Step: 391600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:20:56,057-Speed 6312.41 samples/sec Loss 5.0027 LearningRate 0.0003 Epoch: 18 Global Step: 391610 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:20:59,305-Speed 6307.05 samples/sec Loss 5.0339 LearningRate 0.0003 Epoch: 18 Global Step: 391620 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:21:02,555-Speed 6302.88 samples/sec Loss 5.0503 LearningRate 0.0003 Epoch: 18 Global Step: 391630 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:21:05,786-Speed 6339.39 samples/sec Loss 5.0628 LearningRate 0.0003 Epoch: 18 Global Step: 391640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:09,030-Speed 6316.53 samples/sec Loss 5.0710 LearningRate 0.0003 Epoch: 18 Global Step: 391650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:12,267-Speed 6327.46 samples/sec Loss 5.0006 LearningRate 0.0003 Epoch: 18 Global Step: 391660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:15,517-Speed 6302.60 samples/sec Loss 5.1296 LearningRate 0.0003 Epoch: 18 Global Step: 391670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:18,778-Speed 6281.86 samples/sec Loss 5.1000 LearningRate 0.0003 Epoch: 18 Global Step: 391680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:22,024-Speed 6311.12 samples/sec Loss 5.0406 LearningRate 0.0003 Epoch: 18 Global Step: 391690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:25,273-Speed 6304.96 samples/sec Loss 5.1323 LearningRate 0.0003 Epoch: 18 Global Step: 391700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:28,517-Speed 6313.55 samples/sec Loss 5.1363 
LearningRate 0.0003 Epoch: 18 Global Step: 391710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:31,761-Speed 6314.94 samples/sec Loss 5.0412 LearningRate 0.0003 Epoch: 18 Global Step: 391720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:35,004-Speed 6316.69 samples/sec Loss 5.0242 LearningRate 0.0003 Epoch: 18 Global Step: 391730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:38,245-Speed 6320.42 samples/sec Loss 5.0605 LearningRate 0.0003 Epoch: 18 Global Step: 391740 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:21:41,480-Speed 6333.22 samples/sec Loss 5.1373 LearningRate 0.0003 Epoch: 18 Global Step: 391750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:44,723-Speed 6315.94 samples/sec Loss 5.0141 LearningRate 0.0003 Epoch: 18 Global Step: 391760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:47,967-Speed 6314.55 samples/sec Loss 5.0939 LearningRate 0.0003 Epoch: 18 Global Step: 391770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:51,210-Speed 6317.56 samples/sec Loss 5.0829 LearningRate 0.0003 Epoch: 18 Global Step: 391780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:54,461-Speed 6300.16 samples/sec Loss 5.0755 LearningRate 0.0003 Epoch: 18 Global Step: 391790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:21:57,710-Speed 6305.71 samples/sec Loss 5.0579 LearningRate 0.0003 Epoch: 18 Global Step: 391800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:00,956-Speed 6310.51 samples/sec Loss 5.0381 LearningRate 0.0003 Epoch: 18 Global Step: 391810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:04,204-Speed 6307.52 samples/sec Loss 5.0362 LearningRate 0.0003 Epoch: 18 Global Step: 391820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:07,451-Speed 6306.92 samples/sec Loss 5.0591 LearningRate 0.0003 Epoch: 18 
Global Step: 391830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:10,696-Speed 6313.48 samples/sec Loss 5.0404 LearningRate 0.0003 Epoch: 18 Global Step: 391840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:14,026-Speed 6152.02 samples/sec Loss 5.0944 LearningRate 0.0003 Epoch: 18 Global Step: 391850 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:22:17,258-Speed 6338.38 samples/sec Loss 5.0516 LearningRate 0.0003 Epoch: 18 Global Step: 391860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:20,503-Speed 6311.55 samples/sec Loss 5.0414 LearningRate 0.0003 Epoch: 18 Global Step: 391870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:23,748-Speed 6312.63 samples/sec Loss 5.0581 LearningRate 0.0003 Epoch: 18 Global Step: 391880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:27,005-Speed 6289.69 samples/sec Loss 5.0604 LearningRate 0.0003 Epoch: 18 Global Step: 391890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:30,250-Speed 6313.61 samples/sec Loss 5.0673 LearningRate 0.0003 Epoch: 18 Global Step: 391900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:33,496-Speed 6309.16 samples/sec Loss 5.0203 LearningRate 0.0003 Epoch: 18 Global Step: 391910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:36,743-Speed 6308.28 samples/sec Loss 4.9981 LearningRate 0.0003 Epoch: 18 Global Step: 391920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:39,990-Speed 6309.56 samples/sec Loss 5.0469 LearningRate 0.0003 Epoch: 18 Global Step: 391930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:43,240-Speed 6303.38 samples/sec Loss 5.0880 LearningRate 0.0003 Epoch: 18 Global Step: 391940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:46,488-Speed 6308.12 samples/sec Loss 5.0722 LearningRate 0.0003 Epoch: 18 Global Step: 391950 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:22:49,732-Speed 6314.22 samples/sec Loss 5.1244 LearningRate 0.0003 Epoch: 18 Global Step: 391960 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:22:52,978-Speed 6310.57 samples/sec Loss 5.0865 LearningRate 0.0003 Epoch: 18 Global Step: 391970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:22:56,220-Speed 6318.09 samples/sec Loss 5.0718 LearningRate 0.0003 Epoch: 18 Global Step: 391980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:22:59,465-Speed 6312.63 samples/sec Loss 5.0857 LearningRate 0.0003 Epoch: 18 Global Step: 391990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:02,710-Speed 6313.07 samples/sec Loss 5.0019 LearningRate 0.0003 Epoch: 18 Global Step: 392000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:05,956-Speed 6311.44 samples/sec Loss 5.0335 LearningRate 0.0003 Epoch: 18 Global Step: 392010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:09,199-Speed 6317.07 samples/sec Loss 5.1013 LearningRate 0.0003 Epoch: 18 Global Step: 392020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:12,441-Speed 6317.02 samples/sec Loss 5.0156 LearningRate 0.0003 Epoch: 18 Global Step: 392030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:15,687-Speed 6311.34 samples/sec Loss 5.0788 LearningRate 0.0003 Epoch: 18 Global Step: 392040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:18,930-Speed 6316.94 samples/sec Loss 4.9977 LearningRate 0.0003 Epoch: 18 Global Step: 392050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:22,174-Speed 6313.34 samples/sec Loss 5.0576 LearningRate 0.0003 Epoch: 18 Global Step: 392060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:25,420-Speed 6311.25 samples/sec Loss 4.9914 LearningRate 0.0003 Epoch: 18 Global Step: 392070 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 03:23:28,671-Speed 6301.99 samples/sec Loss 5.0549 LearningRate 0.0003 Epoch: 18 Global Step: 392080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:31,916-Speed 6311.74 samples/sec Loss 5.0561 LearningRate 0.0003 Epoch: 18 Global Step: 392090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:23:35,160-Speed 6314.07 samples/sec Loss 5.0713 LearningRate 0.0003 Epoch: 18 Global Step: 392100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:23:38,413-Speed 6296.85 samples/sec Loss 5.0388 LearningRate 0.0003 Epoch: 18 Global Step: 392110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:23:41,640-Speed 6349.32 samples/sec Loss 5.0807 LearningRate 0.0003 Epoch: 18 Global Step: 392120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:44,886-Speed 6309.41 samples/sec Loss 4.9995 LearningRate 0.0003 Epoch: 18 Global Step: 392130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:48,130-Speed 6314.63 samples/sec Loss 5.1400 LearningRate 0.0003 Epoch: 18 Global Step: 392140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:51,374-Speed 6314.83 samples/sec Loss 5.0458 LearningRate 0.0003 Epoch: 18 Global Step: 392150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:54,623-Speed 6305.05 samples/sec Loss 5.0643 LearningRate 0.0003 Epoch: 18 Global Step: 392160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:23:57,870-Speed 6310.40 samples/sec Loss 5.0474 LearningRate 0.0003 Epoch: 18 Global Step: 392170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:01,114-Speed 6314.92 samples/sec Loss 5.0749 LearningRate 0.0003 Epoch: 18 Global Step: 392180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:04,358-Speed 6313.06 samples/sec Loss 5.1115 LearningRate 0.0003 Epoch: 18 Global Step: 392190 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:24:07,605-Speed 6309.40 samples/sec Loss 5.0584 LearningRate 0.0003 Epoch: 18 Global Step: 392200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:10,851-Speed 6311.53 samples/sec Loss 4.9974 LearningRate 0.0003 Epoch: 18 Global Step: 392210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:14,095-Speed 6314.63 samples/sec Loss 5.0642 LearningRate 0.0003 Epoch: 18 Global Step: 392220 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:17,342-Speed 6309.18 samples/sec Loss 5.0406 LearningRate 0.0003 Epoch: 18 Global Step: 392230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:20,589-Speed 6308.61 samples/sec Loss 5.0741 LearningRate 0.0003 Epoch: 18 Global Step: 392240 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:23,833-Speed 6314.03 samples/sec Loss 5.1078 LearningRate 0.0003 Epoch: 18 Global Step: 392250 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:27,080-Speed 6308.55 samples/sec Loss 5.0726 LearningRate 0.0003 Epoch: 18 Global Step: 392260 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:30,326-Speed 6311.20 samples/sec Loss 5.0425 LearningRate 0.0003 Epoch: 18 Global Step: 392270 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:33,605-Speed 6247.38 samples/sec Loss 5.0680 LearningRate 0.0003 Epoch: 18 Global Step: 392280 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:36,852-Speed 6307.34 samples/sec Loss 5.0081 LearningRate 0.0003 Epoch: 18 Global Step: 392290 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:40,098-Speed 6311.25 samples/sec Loss 5.1162 LearningRate 0.0003 Epoch: 18 Global Step: 392300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:24:43,334-Speed 6329.53 samples/sec Loss 4.9899 LearningRate 0.0003 Epoch: 18 Global Step: 392310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:46,577-Speed 6318.05 
samples/sec Loss 5.0668 LearningRate 0.0003 Epoch: 18 Global Step: 392320 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:49,820-Speed 6315.90 samples/sec Loss 5.0781 LearningRate 0.0003 Epoch: 18 Global Step: 392330 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:53,065-Speed 6311.42 samples/sec Loss 5.0666 LearningRate 0.0003 Epoch: 18 Global Step: 392340 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:56,309-Speed 6314.38 samples/sec Loss 5.0243 LearningRate 0.0003 Epoch: 18 Global Step: 392350 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:24:59,558-Speed 6305.21 samples/sec Loss 5.0545 LearningRate 0.0003 Epoch: 18 Global Step: 392360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:02,805-Speed 6309.87 samples/sec Loss 5.0730 LearningRate 0.0003 Epoch: 18 Global Step: 392370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:06,055-Speed 6302.36 samples/sec Loss 5.0696 LearningRate 0.0003 Epoch: 18 Global Step: 392380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:09,303-Speed 6308.15 samples/sec Loss 5.0532 LearningRate 0.0003 Epoch: 18 Global Step: 392390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:12,552-Speed 6305.15 samples/sec Loss 5.1177 LearningRate 0.0003 Epoch: 18 Global Step: 392400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:15,801-Speed 6303.66 samples/sec Loss 5.0699 LearningRate 0.0003 Epoch: 18 Global Step: 392410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:25:19,048-Speed 6308.52 samples/sec Loss 5.0722 LearningRate 0.0003 Epoch: 18 Global Step: 392420 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:25:22,283-Speed 6332.71 samples/sec Loss 5.0407 LearningRate 0.0003 Epoch: 18 Global Step: 392430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:25,529-Speed 6312.05 samples/sec Loss 5.0810 
LearningRate 0.0003 Epoch: 18 Global Step: 392440 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:28,786-Speed 6289.08 samples/sec Loss 5.0413 LearningRate 0.0003 Epoch: 18 Global Step: 392450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:32,034-Speed 6306.07 samples/sec Loss 4.9349 LearningRate 0.0003 Epoch: 18 Global Step: 392460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:35,281-Speed 6308.40 samples/sec Loss 5.1319 LearningRate 0.0003 Epoch: 18 Global Step: 392470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:38,526-Speed 6313.36 samples/sec Loss 5.0263 LearningRate 0.0003 Epoch: 18 Global Step: 392480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:41,778-Speed 6298.90 samples/sec Loss 5.0068 LearningRate 0.0003 Epoch: 18 Global Step: 392490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:45,025-Speed 6308.36 samples/sec Loss 5.0742 LearningRate 0.0003 Epoch: 18 Global Step: 392500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:48,271-Speed 6310.80 samples/sec Loss 5.0330 LearningRate 0.0003 Epoch: 18 Global Step: 392510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:51,521-Speed 6303.02 samples/sec Loss 5.0629 LearningRate 0.0003 Epoch: 18 Global Step: 392520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:25:54,768-Speed 6309.10 samples/sec Loss 5.0654 LearningRate 0.0003 Epoch: 18 Global Step: 392530 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:25:58,017-Speed 6303.88 samples/sec Loss 4.9984 LearningRate 0.0003 Epoch: 18 Global Step: 392540 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:26:01,283-Speed 6272.34 samples/sec Loss 5.0431 LearningRate 0.0003 Epoch: 18 Global Step: 392550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:26:04,525-Speed 6318.73 samples/sec Loss 5.0494 LearningRate 0.0003 Epoch: 18 
Global Step: 392560 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:26:07,774-Speed 6305.48 samples/sec Loss 5.0781 LearningRate 0.0003 Epoch: 18 Global Step: 392570 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:26:11,005-Speed 6339.11 samples/sec Loss 5.0619 LearningRate 0.0003 Epoch: 18 Global Step: 392580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:14,249-Speed 6315.82 samples/sec Loss 5.0310 LearningRate 0.0003 Epoch: 18 Global Step: 392590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:17,492-Speed 6315.92 samples/sec Loss 5.0675 LearningRate 0.0003 Epoch: 18 Global Step: 392600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:20,739-Speed 6309.37 samples/sec Loss 5.0508 LearningRate 0.0003 Epoch: 18 Global Step: 392610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:23,985-Speed 6310.39 samples/sec Loss 5.0547 LearningRate 0.0003 Epoch: 18 Global Step: 392620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:27,230-Speed 6312.62 samples/sec Loss 5.0298 LearningRate 0.0003 Epoch: 18 Global Step: 392630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:30,476-Speed 6311.63 samples/sec Loss 5.1019 LearningRate 0.0003 Epoch: 18 Global Step: 392640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:33,719-Speed 6315.62 samples/sec Loss 5.0635 LearningRate 0.0003 Epoch: 18 Global Step: 392650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:36,964-Speed 6313.13 samples/sec Loss 5.0937 LearningRate 0.0003 Epoch: 18 Global Step: 392660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:40,209-Speed 6312.31 samples/sec Loss 5.0822 LearningRate 0.0003 Epoch: 18 Global Step: 392670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:43,442-Speed 6336.97 samples/sec Loss 5.0683 LearningRate 0.0003 Epoch: 18 Global Step: 392680 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:46,691-Speed 6304.62 samples/sec Loss 5.0288 LearningRate 0.0003 Epoch: 18 Global Step: 392690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:49,936-Speed 6312.08 samples/sec Loss 5.1181 LearningRate 0.0003 Epoch: 18 Global Step: 392700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:53,182-Speed 6311.92 samples/sec Loss 5.0124 LearningRate 0.0003 Epoch: 18 Global Step: 392710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:56,426-Speed 6314.65 samples/sec Loss 5.0732 LearningRate 0.0003 Epoch: 18 Global Step: 392720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:26:59,669-Speed 6315.04 samples/sec Loss 5.0400 LearningRate 0.0003 Epoch: 18 Global Step: 392730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:02,924-Speed 6294.78 samples/sec Loss 5.0695 LearningRate 0.0003 Epoch: 18 Global Step: 392740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:06,172-Speed 6305.53 samples/sec Loss 5.0815 LearningRate 0.0003 Epoch: 18 Global Step: 392750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:09,419-Speed 6308.93 samples/sec Loss 5.0295 LearningRate 0.0003 Epoch: 18 Global Step: 392760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:12,662-Speed 6316.04 samples/sec Loss 5.0485 LearningRate 0.0003 Epoch: 18 Global Step: 392770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:15,899-Speed 6328.92 samples/sec Loss 5.0573 LearningRate 0.0003 Epoch: 18 Global Step: 392780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:19,147-Speed 6307.48 samples/sec Loss 5.0201 LearningRate 0.0003 Epoch: 18 Global Step: 392790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:22,391-Speed 6313.82 samples/sec Loss 5.0689 LearningRate 0.0003 Epoch: 18 Global Step: 392800 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 03:27:25,642-Speed 6300.82 samples/sec Loss 4.9676 LearningRate 0.0003 Epoch: 18 Global Step: 392810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:28,886-Speed 6316.05 samples/sec Loss 5.0794 LearningRate 0.0003 Epoch: 18 Global Step: 392820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:32,128-Speed 6317.47 samples/sec Loss 5.0324 LearningRate 0.0003 Epoch: 18 Global Step: 392830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:35,372-Speed 6316.03 samples/sec Loss 5.0803 LearningRate 0.0003 Epoch: 18 Global Step: 392840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:38,621-Speed 6304.96 samples/sec Loss 4.9759 LearningRate 0.0003 Epoch: 18 Global Step: 392850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:41,870-Speed 6304.52 samples/sec Loss 5.0946 LearningRate 0.0003 Epoch: 18 Global Step: 392860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:45,115-Speed 6312.58 samples/sec Loss 4.9965 LearningRate 0.0003 Epoch: 18 Global Step: 392870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:27:48,362-Speed 6308.25 samples/sec Loss 5.1170 LearningRate 0.0003 Epoch: 18 Global Step: 392880 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:27:51,606-Speed 6314.90 samples/sec Loss 4.9953 LearningRate 0.0003 Epoch: 18 Global Step: 392890 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:27:54,855-Speed 6305.70 samples/sec Loss 4.9820 LearningRate 0.0003 Epoch: 18 Global Step: 392900 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:27:58,096-Speed 6319.34 samples/sec Loss 5.0985 LearningRate 0.0003 Epoch: 18 Global Step: 392910 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:01,339-Speed 6317.52 samples/sec Loss 5.0772 LearningRate 0.0003 Epoch: 18 Global Step: 392920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
03:28:04,584-Speed 6312.84 samples/sec Loss 5.0348 LearningRate 0.0003 Epoch: 18 Global Step: 392930 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:07,832-Speed 6305.53 samples/sec Loss 5.0827 LearningRate 0.0003 Epoch: 18 Global Step: 392940 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:11,080-Speed 6306.33 samples/sec Loss 5.0636 LearningRate 0.0003 Epoch: 18 Global Step: 392950 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:14,326-Speed 6310.70 samples/sec Loss 5.0228 LearningRate 0.0003 Epoch: 18 Global Step: 392960 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:17,559-Speed 6337.62 samples/sec Loss 5.1149 LearningRate 0.0003 Epoch: 18 Global Step: 392970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:20,801-Speed 6318.90 samples/sec Loss 4.9910 LearningRate 0.0003 Epoch: 18 Global Step: 392980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:24,045-Speed 6312.67 samples/sec Loss 5.1202 LearningRate 0.0003 Epoch: 18 Global Step: 392990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:27,304-Speed 6287.02 samples/sec Loss 5.0945 LearningRate 0.0003 Epoch: 18 Global Step: 393000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:30,547-Speed 6315.31 samples/sec Loss 5.0857 LearningRate 0.0003 Epoch: 18 Global Step: 393010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:33,795-Speed 6307.53 samples/sec Loss 5.0425 LearningRate 0.0003 Epoch: 18 Global Step: 393020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:37,055-Speed 6283.76 samples/sec Loss 5.0527 LearningRate 0.0003 Epoch: 18 Global Step: 393030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:40,299-Speed 6314.52 samples/sec Loss 5.0237 LearningRate 0.0003 Epoch: 18 Global Step: 393040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:43,545-Speed 6311.77 
samples/sec Loss 4.9833 LearningRate 0.0003 Epoch: 18 Global Step: 393050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:46,786-Speed 6320.91 samples/sec Loss 5.0877 LearningRate 0.0003 Epoch: 18 Global Step: 393060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:50,033-Speed 6307.68 samples/sec Loss 5.0670 LearningRate 0.0003 Epoch: 18 Global Step: 393070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:28:53,266-Speed 6337.06 samples/sec Loss 5.0833 LearningRate 0.0003 Epoch: 18 Global Step: 393080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:56,510-Speed 6313.88 samples/sec Loss 5.0544 LearningRate 0.0003 Epoch: 18 Global Step: 393090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:28:59,756-Speed 6312.07 samples/sec Loss 5.0539 LearningRate 0.0003 Epoch: 18 Global Step: 393100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:03,002-Speed 6310.55 samples/sec Loss 4.9508 LearningRate 0.0003 Epoch: 18 Global Step: 393110 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:06,248-Speed 6310.60 samples/sec Loss 5.0205 LearningRate 0.0003 Epoch: 18 Global Step: 393120 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:09,495-Speed 6307.65 samples/sec Loss 5.0637 LearningRate 0.0003 Epoch: 18 Global Step: 393130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:12,738-Speed 6315.84 samples/sec Loss 5.0270 LearningRate 0.0003 Epoch: 18 Global Step: 393140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:15,987-Speed 6305.08 samples/sec Loss 5.0293 LearningRate 0.0003 Epoch: 18 Global Step: 393150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:19,231-Speed 6315.98 samples/sec Loss 5.1094 LearningRate 0.0003 Epoch: 18 Global Step: 393160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:22,499-Speed 6267.40 samples/sec Loss 5.0456 
LearningRate 0.0003 Epoch: 18 Global Step: 393170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:25,745-Speed 6310.52 samples/sec Loss 5.0858 LearningRate 0.0003 Epoch: 18 Global Step: 393180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:29:28,995-Speed 6302.90 samples/sec Loss 5.0596 LearningRate 0.0003 Epoch: 18 Global Step: 393190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:29:32,250-Speed 6293.13 samples/sec Loss 5.0110 LearningRate 0.0003 Epoch: 18 Global Step: 393200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:35,497-Speed 6308.98 samples/sec Loss 5.1120 LearningRate 0.0003 Epoch: 18 Global Step: 393210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:38,748-Speed 6301.41 samples/sec Loss 5.0793 LearningRate 0.0003 Epoch: 18 Global Step: 393220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:41,997-Speed 6304.64 samples/sec Loss 5.0238 LearningRate 0.0003 Epoch: 18 Global Step: 393230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:45,248-Speed 6301.73 samples/sec Loss 5.0064 LearningRate 0.0003 Epoch: 18 Global Step: 393240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:48,493-Speed 6312.28 samples/sec Loss 5.0578 LearningRate 0.0003 Epoch: 18 Global Step: 393250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:51,739-Speed 6311.63 samples/sec Loss 4.9881 LearningRate 0.0003 Epoch: 18 Global Step: 393260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:54,985-Speed 6311.56 samples/sec Loss 5.1328 LearningRate 0.0003 Epoch: 18 Global Step: 393270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:29:58,232-Speed 6308.17 samples/sec Loss 4.9530 LearningRate 0.0003 Epoch: 18 Global Step: 393280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:01,481-Speed 6305.32 samples/sec Loss 5.0600 LearningRate 0.0003 Epoch: 18 
Global Step: 393290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:04,727-Speed 6310.96 samples/sec Loss 5.0471 LearningRate 0.0003 Epoch: 18 Global Step: 393300 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:30:07,975-Speed 6306.63 samples/sec Loss 5.0544 LearningRate 0.0003 Epoch: 18 Global Step: 393310 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:30:11,218-Speed 6315.22 samples/sec Loss 4.9996 LearningRate 0.0003 Epoch: 18 Global Step: 393320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:30:14,462-Speed 6314.57 samples/sec Loss 4.9986 LearningRate 0.0003 Epoch: 18 Global Step: 393330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:30:17,694-Speed 6338.99 samples/sec Loss 5.0208 LearningRate 0.0003 Epoch: 18 Global Step: 393340 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:20,943-Speed 6304.32 samples/sec Loss 5.0574 LearningRate 0.0003 Epoch: 18 Global Step: 393350 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:24,193-Speed 6302.69 samples/sec Loss 5.0958 LearningRate 0.0003 Epoch: 18 Global Step: 393360 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:27,435-Speed 6319.80 samples/sec Loss 4.9984 LearningRate 0.0003 Epoch: 18 Global Step: 393370 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:30,683-Speed 6306.43 samples/sec Loss 5.0930 LearningRate 0.0003 Epoch: 18 Global Step: 393380 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:33,934-Speed 6301.10 samples/sec Loss 5.0750 LearningRate 0.0003 Epoch: 18 Global Step: 393390 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:37,180-Speed 6310.44 samples/sec Loss 4.9909 LearningRate 0.0003 Epoch: 18 Global Step: 393400 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:40,427-Speed 6309.11 samples/sec Loss 5.0869 LearningRate 0.0003 Epoch: 18 Global Step: 393410 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:43,671-Speed 6313.59 samples/sec Loss 4.9720 LearningRate 0.0003 Epoch: 18 Global Step: 393420 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:46,917-Speed 6310.99 samples/sec Loss 5.0669 LearningRate 0.0003 Epoch: 18 Global Step: 393430 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:50,181-Speed 6275.88 samples/sec Loss 5.1231 LearningRate 0.0003 Epoch: 18 Global Step: 393440 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:30:53,447-Speed 6272.02 samples/sec Loss 5.0592 LearningRate 0.0003 Epoch: 18 Global Step: 393450 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:56,698-Speed 6301.93 samples/sec Loss 5.0322 LearningRate 0.0003 Epoch: 18 Global Step: 393460 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:30:59,943-Speed 6312.59 samples/sec Loss 5.0056 LearningRate 0.0003 Epoch: 18 Global Step: 393470 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:03,189-Speed 6310.81 samples/sec Loss 5.0037 LearningRate 0.0003 Epoch: 18 Global Step: 393480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:06,434-Speed 6312.65 samples/sec Loss 5.0511 LearningRate 0.0003 Epoch: 18 Global Step: 393490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:09,677-Speed 6317.38 samples/sec Loss 5.0162 LearningRate 0.0003 Epoch: 18 Global Step: 393500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:12,921-Speed 6313.69 samples/sec Loss 5.0409 LearningRate 0.0003 Epoch: 18 Global Step: 393510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:16,164-Speed 6318.08 samples/sec Loss 4.9915 LearningRate 0.0003 Epoch: 18 Global Step: 393520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:19,408-Speed 6313.59 samples/sec Loss 5.0434 LearningRate 0.0003 Epoch: 18 Global Step: 393530 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 03:31:22,650-Speed 6319.33 samples/sec Loss 5.0388 LearningRate 0.0003 Epoch: 18 Global Step: 393540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:25,892-Speed 6318.24 samples/sec Loss 5.0974 LearningRate 0.0003 Epoch: 18 Global Step: 393550 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:31:29,128-Speed 6329.83 samples/sec Loss 5.0753 LearningRate 0.0003 Epoch: 18 Global Step: 393560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:32,372-Speed 6314.91 samples/sec Loss 5.0635 LearningRate 0.0003 Epoch: 18 Global Step: 393570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:35,644-Speed 6260.74 samples/sec Loss 5.0114 LearningRate 0.0003 Epoch: 18 Global Step: 393580 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:38,893-Speed 6304.20 samples/sec Loss 5.0194 LearningRate 0.0003 Epoch: 18 Global Step: 393590 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:42,139-Speed 6309.50 samples/sec Loss 4.9832 LearningRate 0.0003 Epoch: 18 Global Step: 393600 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:45,382-Speed 6317.87 samples/sec Loss 5.1019 LearningRate 0.0003 Epoch: 18 Global Step: 393610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:48,626-Speed 6314.66 samples/sec Loss 5.0797 LearningRate 0.0003 Epoch: 18 Global Step: 393620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:51,868-Speed 6318.11 samples/sec Loss 5.0923 LearningRate 0.0003 Epoch: 18 Global Step: 393630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:55,111-Speed 6317.04 samples/sec Loss 5.0745 LearningRate 0.0003 Epoch: 18 Global Step: 393640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:31:58,365-Speed 6294.98 samples/sec Loss 5.0619 LearningRate 0.0003 Epoch: 18 Global Step: 393650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 
03:32:01,623-Speed 6287.53 samples/sec Loss 5.0031 LearningRate 0.0003 Epoch: 18 Global Step: 393660 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:04,866-Speed 6315.34 samples/sec Loss 5.1291 LearningRate 0.0003 Epoch: 18 Global Step: 393670 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:08,114-Speed 6306.59 samples/sec Loss 5.1046 LearningRate 0.0003 Epoch: 18 Global Step: 393680 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:11,360-Speed 6311.30 samples/sec Loss 5.0537 LearningRate 0.0003 Epoch: 18 Global Step: 393690 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:14,606-Speed 6311.66 samples/sec Loss 5.1620 LearningRate 0.0003 Epoch: 18 Global Step: 393700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:17,853-Speed 6309.63 samples/sec Loss 5.0305 LearningRate 0.0003 Epoch: 18 Global Step: 393710 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:21,098-Speed 6311.88 samples/sec Loss 5.0758 LearningRate 0.0003 Epoch: 18 Global Step: 393720 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:24,346-Speed 6307.68 samples/sec Loss 4.9796 LearningRate 0.0003 Epoch: 18 Global Step: 393730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:27,590-Speed 6313.35 samples/sec Loss 5.1021 LearningRate 0.0003 Epoch: 18 Global Step: 393740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:30,835-Speed 6313.78 samples/sec Loss 5.0691 LearningRate 0.0003 Epoch: 18 Global Step: 393750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:34,080-Speed 6311.76 samples/sec Loss 5.1112 LearningRate 0.0003 Epoch: 18 Global Step: 393760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:37,325-Speed 6312.43 samples/sec Loss 5.1421 LearningRate 0.0003 Epoch: 18 Global Step: 393770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:40,569-Speed 6315.39 
samples/sec Loss 5.0432 LearningRate 0.0003 Epoch: 18 Global Step: 393780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:43,813-Speed 6314.72 samples/sec Loss 5.0331 LearningRate 0.0003 Epoch: 18 Global Step: 393790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:47,057-Speed 6314.16 samples/sec Loss 5.1049 LearningRate 0.0003 Epoch: 18 Global Step: 393800 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:50,299-Speed 6318.35 samples/sec Loss 5.0789 LearningRate 0.0003 Epoch: 18 Global Step: 393810 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:32:53,530-Speed 6340.83 samples/sec Loss 5.0116 LearningRate 0.0003 Epoch: 18 Global Step: 393820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:32:56,776-Speed 6311.04 samples/sec Loss 5.0532 LearningRate 0.0003 Epoch: 18 Global Step: 393830 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:00,023-Speed 6307.53 samples/sec Loss 5.0952 LearningRate 0.0003 Epoch: 18 Global Step: 393840 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:03,269-Speed 6311.33 samples/sec Loss 5.0826 LearningRate 0.0003 Epoch: 18 Global Step: 393850 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:06,514-Speed 6312.79 samples/sec Loss 5.0387 LearningRate 0.0003 Epoch: 18 Global Step: 393860 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:09,763-Speed 6303.97 samples/sec Loss 5.0578 LearningRate 0.0003 Epoch: 18 Global Step: 393870 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:13,006-Speed 6316.61 samples/sec Loss 5.0010 LearningRate 0.0003 Epoch: 18 Global Step: 393880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:16,253-Speed 6308.30 samples/sec Loss 5.0424 LearningRate 0.0003 Epoch: 18 Global Step: 393890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:19,498-Speed 6313.63 samples/sec Loss 5.0383 
LearningRate 0.0003 Epoch: 18 Global Step: 393900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:22,744-Speed 6312.01 samples/sec Loss 5.0777 LearningRate 0.0003 Epoch: 18 Global Step: 393910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:25,995-Speed 6301.00 samples/sec Loss 5.0692 LearningRate 0.0003 Epoch: 18 Global Step: 393920 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:33:29,238-Speed 6316.43 samples/sec Loss 5.0506 LearningRate 0.0003 Epoch: 18 Global Step: 393930 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:33:32,484-Speed 6311.44 samples/sec Loss 4.9858 LearningRate 0.0003 Epoch: 18 Global Step: 393940 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:33:35,727-Speed 6315.16 samples/sec Loss 5.0179 LearningRate 0.0003 Epoch: 18 Global Step: 393950 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:33:38,973-Speed 6312.02 samples/sec Loss 5.0076 LearningRate 0.0003 Epoch: 18 Global Step: 393960 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:33:42,208-Speed 6331.45 samples/sec Loss 5.0739 LearningRate 0.0003 Epoch: 18 Global Step: 393970 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:45,450-Speed 6318.85 samples/sec Loss 5.0672 LearningRate 0.0003 Epoch: 18 Global Step: 393980 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:48,695-Speed 6312.22 samples/sec Loss 5.0563 LearningRate 0.0003 Epoch: 18 Global Step: 393990 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:51,937-Speed 6319.21 samples/sec Loss 5.0240 LearningRate 0.0003 Epoch: 18 Global Step: 394000 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:55,180-Speed 6315.50 samples/sec Loss 5.0840 LearningRate 0.0003 Epoch: 18 Global Step: 394010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:33:58,425-Speed 6312.81 samples/sec Loss 5.0950 LearningRate 0.0003 Epoch: 18 
Global Step: 394020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:34:01,692-Speed 6270.93 samples/sec Loss 5.1245 LearningRate 0.0003 Epoch: 18 Global Step: 394030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:34:04,936-Speed 6314.75 samples/sec Loss 5.0022 LearningRate 0.0003 Epoch: 18 Global Step: 394040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:34:08,182-Speed 6310.48 samples/sec Loss 5.1413 LearningRate 0.0003 Epoch: 18 Global Step: 394050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:35:08,315-Speed 340.58 samples/sec Loss 5.0954 LearningRate 0.0003 Epoch: 19 Global Step: 394060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:35:11,554-Speed 6324.24 samples/sec Loss 5.1323 LearningRate 0.0003 Epoch: 19 Global Step: 394070 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:14,788-Speed 6333.51 samples/sec Loss 5.0968 LearningRate 0.0003 Epoch: 19 Global Step: 394080 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:18,047-Speed 6285.86 samples/sec Loss 5.0749 LearningRate 0.0003 Epoch: 19 Global Step: 394090 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:21,289-Speed 6320.43 samples/sec Loss 5.0957 LearningRate 0.0003 Epoch: 19 Global Step: 394100 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:24,531-Speed 6318.25 samples/sec Loss 5.0617 LearningRate 0.0003 Epoch: 19 Global Step: 394110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:27,782-Speed 6300.10 samples/sec Loss 5.0694 LearningRate 0.0003 Epoch: 19 Global Step: 394120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:31,030-Speed 6308.68 samples/sec Loss 5.0822 LearningRate 0.0003 Epoch: 19 Global Step: 394130 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:34,270-Speed 6322.22 samples/sec Loss 5.0616 LearningRate 0.0003 Epoch: 19 Global Step: 394140 Fp16 Grad 
Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:37,518-Speed 6306.08 samples/sec Loss 5.0684 LearningRate 0.0003 Epoch: 19 Global Step: 394150 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:40,760-Speed 6318.50 samples/sec Loss 4.9904 LearningRate 0.0003 Epoch: 19 Global Step: 394160 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:43,985-Speed 6351.56 samples/sec Loss 5.0765 LearningRate 0.0003 Epoch: 19 Global Step: 394170 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:47,231-Speed 6310.73 samples/sec Loss 5.0222 LearningRate 0.0003 Epoch: 19 Global Step: 394180 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:50,501-Speed 6264.69 samples/sec Loss 5.0249 LearningRate 0.0003 Epoch: 19 Global Step: 394190 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:53,776-Speed 6254.09 samples/sec Loss 5.0575 LearningRate 0.0003 Epoch: 19 Global Step: 394200 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:35:57,018-Speed 6319.95 samples/sec Loss 5.0064 LearningRate 0.0003 Epoch: 19 Global Step: 394210 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:00,251-Speed 6335.55 samples/sec Loss 4.9929 LearningRate 0.0003 Epoch: 19 Global Step: 394220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:03,503-Speed 6298.32 samples/sec Loss 5.0258 LearningRate 0.0003 Epoch: 19 Global Step: 394230 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:06,748-Speed 6313.22 samples/sec Loss 5.0285 LearningRate 0.0003 Epoch: 19 Global Step: 394240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:09,996-Speed 6306.29 samples/sec Loss 5.0599 LearningRate 0.0003 Epoch: 19 Global Step: 394250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:13,240-Speed 6314.86 samples/sec Loss 5.0073 LearningRate 0.0003 Epoch: 19 Global Step: 394260 Fp16 Grad Scale: 16384 Required: 40 hours 
Training: 2022-04-02 03:36:16,486-Speed 6311.33 samples/sec Loss 5.0326 LearningRate 0.0003 Epoch: 19 Global Step: 394270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:19,733-Speed 6308.44 samples/sec Loss 5.0139 LearningRate 0.0003 Epoch: 19 Global Step: 394280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:22,980-Speed 6307.98 samples/sec Loss 4.9847 LearningRate 0.0003 Epoch: 19 Global Step: 394290 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:26,229-Speed 6305.42 samples/sec Loss 5.0311 LearningRate 0.0003 Epoch: 19 Global Step: 394300 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:29,478-Speed 6305.04 samples/sec Loss 5.0542 LearningRate 0.0003 Epoch: 19 Global Step: 394310 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:36:32,729-Speed 6301.91 samples/sec Loss 5.0375 LearningRate 0.0003 Epoch: 19 Global Step: 394320 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:35,982-Speed 6297.71 samples/sec Loss 5.0257 LearningRate 0.0003 Epoch: 19 Global Step: 394330 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:39,240-Speed 6287.47 samples/sec Loss 5.0149 LearningRate 0.0003 Epoch: 19 Global Step: 394340 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:42,494-Speed 6296.02 samples/sec Loss 5.0494 LearningRate 0.0003 Epoch: 19 Global Step: 394350 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:45,742-Speed 6306.40 samples/sec Loss 5.0432 LearningRate 0.0003 Epoch: 19 Global Step: 394360 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:48,991-Speed 6303.46 samples/sec Loss 4.9921 LearningRate 0.0003 Epoch: 19 Global Step: 394370 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:52,249-Speed 6288.53 samples/sec Loss 4.9870 LearningRate 0.0003 Epoch: 19 Global Step: 394380 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
03:36:55,496-Speed 6307.76 samples/sec Loss 5.0011 LearningRate 0.0003 Epoch: 19 Global Step: 394390 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:36:58,745-Speed 6305.22 samples/sec Loss 5.0366 LearningRate 0.0003 Epoch: 19 Global Step: 394400 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:01,995-Speed 6303.00 samples/sec Loss 5.0398 LearningRate 0.0003 Epoch: 19 Global Step: 394410 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:05,239-Speed 6315.20 samples/sec Loss 5.0170 LearningRate 0.0003 Epoch: 19 Global Step: 394420 Fp16 Grad Scale: 65536 Required: 40 hours Training: 2022-04-02 03:37:08,477-Speed 6325.58 samples/sec Loss 4.9977 LearningRate 0.0003 Epoch: 19 Global Step: 394430 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:11,758-Speed 6244.08 samples/sec Loss 5.0604 LearningRate 0.0003 Epoch: 19 Global Step: 394440 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:15,007-Speed 6304.87 samples/sec Loss 4.9871 LearningRate 0.0003 Epoch: 19 Global Step: 394450 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:18,255-Speed 6305.92 samples/sec Loss 5.0641 LearningRate 0.0003 Epoch: 19 Global Step: 394460 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:21,502-Speed 6310.48 samples/sec Loss 5.0206 LearningRate 0.0003 Epoch: 19 Global Step: 394470 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:37:24,740-Speed 6326.04 samples/sec Loss 5.1004 LearningRate 0.0003 Epoch: 19 Global Step: 394480 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:28,008-Speed 6268.01 samples/sec Loss 5.0796 LearningRate 0.0003 Epoch: 19 Global Step: 394490 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:31,254-Speed 6311.52 samples/sec Loss 4.9639 LearningRate 0.0003 Epoch: 19 Global Step: 394500 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:34,504-Speed 6302.64 
samples/sec Loss 5.1138 LearningRate 0.0003 Epoch: 19 Global Step: 394510 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:37,747-Speed 6315.45 samples/sec Loss 5.0054 LearningRate 0.0003 Epoch: 19 Global Step: 394520 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:40,999-Speed 6299.61 samples/sec Loss 5.0357 LearningRate 0.0003 Epoch: 19 Global Step: 394530 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:44,240-Speed 6321.60 samples/sec Loss 5.0408 LearningRate 0.0003 Epoch: 19 Global Step: 394540 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:47,487-Speed 6309.49 samples/sec Loss 5.0842 LearningRate 0.0003 Epoch: 19 Global Step: 394550 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:50,732-Speed 6312.61 samples/sec Loss 4.9520 LearningRate 0.0003 Epoch: 19 Global Step: 394560 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:53,977-Speed 6311.70 samples/sec Loss 4.9738 LearningRate 0.0003 Epoch: 19 Global Step: 394570 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:37:57,224-Speed 6310.39 samples/sec Loss 5.0824 LearningRate 0.0003 Epoch: 19 Global Step: 394580 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:38:00,472-Speed 6304.92 samples/sec Loss 5.0406 LearningRate 0.0003 Epoch: 19 Global Step: 394590 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:38:03,716-Speed 6314.96 samples/sec Loss 4.9792 LearningRate 0.0003 Epoch: 19 Global Step: 394600 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:38:06,957-Speed 6321.51 samples/sec Loss 5.0520 LearningRate 0.0003 Epoch: 19 Global Step: 394610 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:10,204-Speed 6308.82 samples/sec Loss 5.0689 LearningRate 0.0003 Epoch: 19 Global Step: 394620 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:13,474-Speed 6264.47 samples/sec Loss 5.0759 
LearningRate 0.0003 Epoch: 19 Global Step: 394630 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:16,719-Speed 6312.19 samples/sec Loss 4.9875 LearningRate 0.0003 Epoch: 19 Global Step: 394640 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:19,962-Speed 6315.98 samples/sec Loss 5.0274 LearningRate 0.0003 Epoch: 19 Global Step: 394650 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:23,206-Speed 6314.99 samples/sec Loss 4.9958 LearningRate 0.0003 Epoch: 19 Global Step: 394660 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:26,454-Speed 6307.74 samples/sec Loss 4.9850 LearningRate 0.0003 Epoch: 19 Global Step: 394670 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:29,706-Speed 6298.41 samples/sec Loss 5.0308 LearningRate 0.0003 Epoch: 19 Global Step: 394680 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:32,966-Speed 6283.37 samples/sec Loss 4.9901 LearningRate 0.0003 Epoch: 19 Global Step: 394690 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:36,213-Speed 6308.83 samples/sec Loss 5.0852 LearningRate 0.0003 Epoch: 19 Global Step: 394700 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:39,459-Speed 6311.31 samples/sec Loss 5.0358 LearningRate 0.0003 Epoch: 19 Global Step: 394710 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:38:42,705-Speed 6309.70 samples/sec Loss 4.9992 LearningRate 0.0003 Epoch: 19 Global Step: 394720 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:38:45,942-Speed 6328.61 samples/sec Loss 5.0601 LearningRate 0.0003 Epoch: 19 Global Step: 394730 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:49,259-Speed 6175.52 samples/sec Loss 4.9834 LearningRate 0.0003 Epoch: 19 Global Step: 394740 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:52,554-Speed 6217.29 samples/sec Loss 5.0571 LearningRate 0.0003 Epoch: 19 
Global Step: 394750 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:55,799-Speed 6313.14 samples/sec Loss 5.1156 LearningRate 0.0003 Epoch: 19 Global Step: 394760 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:38:59,055-Speed 6292.60 samples/sec Loss 5.1085 LearningRate 0.0003 Epoch: 19 Global Step: 394770 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:02,299-Speed 6313.79 samples/sec Loss 5.0158 LearningRate 0.0003 Epoch: 19 Global Step: 394780 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:05,546-Speed 6308.85 samples/sec Loss 5.0593 LearningRate 0.0003 Epoch: 19 Global Step: 394790 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:08,792-Speed 6310.60 samples/sec Loss 5.0704 LearningRate 0.0003 Epoch: 19 Global Step: 394800 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:12,034-Speed 6319.01 samples/sec Loss 5.0713 LearningRate 0.0003 Epoch: 19 Global Step: 394810 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:15,285-Speed 6300.48 samples/sec Loss 5.0238 LearningRate 0.0003 Epoch: 19 Global Step: 394820 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:18,536-Speed 6299.93 samples/sec Loss 5.0365 LearningRate 0.0003 Epoch: 19 Global Step: 394830 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:39:21,781-Speed 6312.58 samples/sec Loss 5.1328 LearningRate 0.0003 Epoch: 19 Global Step: 394840 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:39:25,026-Speed 6314.40 samples/sec Loss 4.9454 LearningRate 0.0003 Epoch: 19 Global Step: 394850 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:39:28,276-Speed 6302.30 samples/sec Loss 5.0513 LearningRate 0.0003 Epoch: 19 Global Step: 394860 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:39:31,511-Speed 6333.53 samples/sec Loss 5.0370 LearningRate 0.0003 Epoch: 19 Global Step: 394870 Fp16 Grad 
Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:34,756-Speed 6312.28 samples/sec Loss 4.9667 LearningRate 0.0003 Epoch: 19 Global Step: 394880 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:38,002-Speed 6309.59 samples/sec Loss 4.9985 LearningRate 0.0003 Epoch: 19 Global Step: 394890 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:41,246-Speed 6314.46 samples/sec Loss 5.0478 LearningRate 0.0003 Epoch: 19 Global Step: 394900 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:44,491-Speed 6313.85 samples/sec Loss 4.9898 LearningRate 0.0003 Epoch: 19 Global Step: 394910 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:47,746-Speed 6293.18 samples/sec Loss 5.0866 LearningRate 0.0003 Epoch: 19 Global Step: 394920 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:50,993-Speed 6307.44 samples/sec Loss 4.9603 LearningRate 0.0003 Epoch: 19 Global Step: 394930 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:54,240-Speed 6310.04 samples/sec Loss 5.0492 LearningRate 0.0003 Epoch: 19 Global Step: 394940 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:39:57,489-Speed 6303.67 samples/sec Loss 5.0486 LearningRate 0.0003 Epoch: 19 Global Step: 394950 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:00,741-Speed 6299.57 samples/sec Loss 4.9691 LearningRate 0.0003 Epoch: 19 Global Step: 394960 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:03,993-Speed 6300.42 samples/sec Loss 5.0358 LearningRate 0.0003 Epoch: 19 Global Step: 394970 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:40:07,240-Speed 6309.28 samples/sec Loss 4.9711 LearningRate 0.0003 Epoch: 19 Global Step: 394980 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:40:10,494-Speed 6295.07 samples/sec Loss 5.0389 LearningRate 0.0003 Epoch: 19 Global Step: 394990 Fp16 Grad Scale: 32768 Required: 40 hours 
Training: 2022-04-02 03:40:13,735-Speed 6318.97 samples/sec Loss 5.0282 LearningRate 0.0003 Epoch: 19 Global Step: 395000 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:40:16,973-Speed 6326.91 samples/sec Loss 5.0046 LearningRate 0.0003 Epoch: 19 Global Step: 395010 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:20,222-Speed 6304.99 samples/sec Loss 5.0798 LearningRate 0.0003 Epoch: 19 Global Step: 395020 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:23,475-Speed 6298.18 samples/sec Loss 5.0485 LearningRate 0.0003 Epoch: 19 Global Step: 395030 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:26,722-Speed 6307.91 samples/sec Loss 5.0692 LearningRate 0.0003 Epoch: 19 Global Step: 395040 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:29,971-Speed 6304.43 samples/sec Loss 5.0733 LearningRate 0.0003 Epoch: 19 Global Step: 395050 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:33,225-Speed 6295.19 samples/sec Loss 5.0935 LearningRate 0.0003 Epoch: 19 Global Step: 395060 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:36,472-Speed 6308.71 samples/sec Loss 5.0473 LearningRate 0.0003 Epoch: 19 Global Step: 395070 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:39,718-Speed 6311.99 samples/sec Loss 4.9519 LearningRate 0.0003 Epoch: 19 Global Step: 395080 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:42,968-Speed 6303.11 samples/sec Loss 4.9983 LearningRate 0.0003 Epoch: 19 Global Step: 395090 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:46,217-Speed 6304.35 samples/sec Loss 5.0216 LearningRate 0.0003 Epoch: 19 Global Step: 395100 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:49,463-Speed 6311.21 samples/sec Loss 5.0264 LearningRate 0.0003 Epoch: 19 Global Step: 395110 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 
03:40:52,714-Speed 6300.40 samples/sec Loss 5.0901 LearningRate 0.0003 Epoch: 19 Global Step: 395120 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:40:55,946-Speed 6337.85 samples/sec Loss 5.0270 LearningRate 0.0003 Epoch: 19 Global Step: 395130 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:40:59,201-Speed 6293.16 samples/sec Loss 5.1681 LearningRate 0.0003 Epoch: 19 Global Step: 395140 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:02,451-Speed 6302.90 samples/sec Loss 5.0205 LearningRate 0.0003 Epoch: 19 Global Step: 395150 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:05,697-Speed 6311.10 samples/sec Loss 4.9906 LearningRate 0.0003 Epoch: 19 Global Step: 395160 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:08,944-Speed 6309.65 samples/sec Loss 5.0367 LearningRate 0.0003 Epoch: 19 Global Step: 395170 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:12,190-Speed 6309.65 samples/sec Loss 5.0763 LearningRate 0.0003 Epoch: 19 Global Step: 395180 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:15,435-Speed 6312.45 samples/sec Loss 5.0708 LearningRate 0.0003 Epoch: 19 Global Step: 395190 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:18,690-Speed 6294.24 samples/sec Loss 5.0220 LearningRate 0.0003 Epoch: 19 Global Step: 395200 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:21,940-Speed 6303.55 samples/sec Loss 4.9938 LearningRate 0.0003 Epoch: 19 Global Step: 395210 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:25,188-Speed 6307.73 samples/sec Loss 5.0297 LearningRate 0.0003 Epoch: 19 Global Step: 395220 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:28,440-Speed 6298.94 samples/sec Loss 5.0244 LearningRate 0.0003 Epoch: 19 Global Step: 395230 Fp16 Grad Scale: 32768 Required: 40 hours Training: 2022-04-02 03:41:31,674-Speed 6332.93 
samples/sec Loss 5.0631 LearningRate 0.0003 Epoch: 19 Global Step: 395240 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:34,921-Speed 6309.75 samples/sec Loss 5.0438 LearningRate 0.0003 Epoch: 19 Global Step: 395250 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:38,166-Speed 6312.28 samples/sec Loss 5.0432 LearningRate 0.0003 Epoch: 19 Global Step: 395260 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:41,413-Speed 6308.52 samples/sec Loss 5.0083 LearningRate 0.0003 Epoch: 19 Global Step: 395270 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:44,664-Speed 6300.82 samples/sec Loss 5.0868 LearningRate 0.0003 Epoch: 19 Global Step: 395280 Fp16 Grad Scale: 16384 Required: 40 hours Training: 2022-04-02 03:41:47,906-Speed 6317.79 samples/sec Loss 5.0615 LearningRate 0.0003 Epoch: 19 Global Step: 395290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:41:51,152-Speed 6310.90 samples/sec Loss 5.0386 LearningRate 0.0003 Epoch: 19 Global Step: 395300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:41:54,398-Speed 6310.74 samples/sec Loss 5.0680 LearningRate 0.0003 Epoch: 19 Global Step: 395310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:41:57,646-Speed 6306.43 samples/sec Loss 4.9615 LearningRate 0.0003 Epoch: 19 Global Step: 395320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:00,895-Speed 6305.52 samples/sec Loss 4.9832 LearningRate 0.0003 Epoch: 19 Global Step: 395330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:04,144-Speed 6305.32 samples/sec Loss 5.0145 LearningRate 0.0003 Epoch: 19 Global Step: 395340 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:07,393-Speed 6305.93 samples/sec Loss 5.0111 LearningRate 0.0003 Epoch: 19 Global Step: 395350 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:10,632-Speed 6323.91 samples/sec Loss 4.9629 
LearningRate 0.0003 Epoch: 19 Global Step: 395360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:13,878-Speed 6311.07 samples/sec Loss 4.9300 LearningRate 0.0003 Epoch: 19 Global Step: 395370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:17,125-Speed 6307.45 samples/sec Loss 4.9739 LearningRate 0.0003 Epoch: 19 Global Step: 395380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:20,375-Speed 6303.86 samples/sec Loss 5.0342 LearningRate 0.0003 Epoch: 19 Global Step: 395390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:23,624-Speed 6304.19 samples/sec Loss 5.1274 LearningRate 0.0003 Epoch: 19 Global Step: 395400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:26,868-Speed 6314.99 samples/sec Loss 4.9813 LearningRate 0.0003 Epoch: 19 Global Step: 395410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:30,121-Speed 6298.66 samples/sec Loss 5.0724 LearningRate 0.0003 Epoch: 19 Global Step: 395420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:33,366-Speed 6312.22 samples/sec Loss 5.0863 LearningRate 0.0003 Epoch: 19 Global Step: 395430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:36,614-Speed 6307.92 samples/sec Loss 5.0358 LearningRate 0.0003 Epoch: 19 Global Step: 395440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:39,862-Speed 6306.43 samples/sec Loss 5.0499 LearningRate 0.0003 Epoch: 19 Global Step: 395450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:42:43,107-Speed 6311.64 samples/sec Loss 5.0248 LearningRate 0.0003 Epoch: 19 Global Step: 395460 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:46,356-Speed 6304.77 samples/sec Loss 5.0142 LearningRate 0.0003 Epoch: 19 Global Step: 395470 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:49,603-Speed 6310.10 samples/sec Loss 5.0235 LearningRate 0.0003 Epoch: 19 
Global Step: 395480 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:52,857-Speed 6293.85 samples/sec Loss 5.0168 LearningRate 0.0003 Epoch: 19 Global Step: 395490 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:56,107-Speed 6304.14 samples/sec Loss 4.9681 LearningRate 0.0003 Epoch: 19 Global Step: 395500 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:42:59,350-Speed 6314.89 samples/sec Loss 5.0101 LearningRate 0.0003 Epoch: 19 Global Step: 395510 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:43:02,601-Speed 6302.14 samples/sec Loss 5.0441 LearningRate 0.0003 Epoch: 19 Global Step: 395520 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:43:05,847-Speed 6311.53 samples/sec Loss 5.0138 LearningRate 0.0003 Epoch: 19 Global Step: 395530 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:43:09,089-Speed 6317.17 samples/sec Loss 5.0977 LearningRate 0.0003 Epoch: 19 Global Step: 395540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:12,333-Speed 6314.94 samples/sec Loss 5.0542 LearningRate 0.0003 Epoch: 19 Global Step: 395550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:15,578-Speed 6312.35 samples/sec Loss 4.9974 LearningRate 0.0003 Epoch: 19 Global Step: 395560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:18,823-Speed 6312.88 samples/sec Loss 4.9834 LearningRate 0.0003 Epoch: 19 Global Step: 395570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:22,066-Speed 6317.65 samples/sec Loss 5.0424 LearningRate 0.0003 Epoch: 19 Global Step: 395580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:25,315-Speed 6303.97 samples/sec Loss 5.0356 LearningRate 0.0003 Epoch: 19 Global Step: 395590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:28,563-Speed 6307.41 samples/sec Loss 5.0755 LearningRate 0.0003 Epoch: 19 Global Step: 395600 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:31,807-Speed 6314.89 samples/sec Loss 5.0247 LearningRate 0.0003 Epoch: 19 Global Step: 395610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:35,059-Speed 6298.78 samples/sec Loss 5.0242 LearningRate 0.0003 Epoch: 19 Global Step: 395620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:38,340-Speed 6244.20 samples/sec Loss 5.0393 LearningRate 0.0003 Epoch: 19 Global Step: 395630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:41,618-Speed 6248.70 samples/sec Loss 5.0739 LearningRate 0.0003 Epoch: 19 Global Step: 395640 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:43:44,847-Speed 6344.09 samples/sec Loss 5.0485 LearningRate 0.0003 Epoch: 19 Global Step: 395650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:48,094-Speed 6308.43 samples/sec Loss 5.0003 LearningRate 0.0003 Epoch: 19 Global Step: 395660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:51,339-Speed 6313.62 samples/sec Loss 5.0143 LearningRate 0.0003 Epoch: 19 Global Step: 395670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:54,582-Speed 6316.35 samples/sec Loss 5.0708 LearningRate 0.0003 Epoch: 19 Global Step: 395680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:43:57,827-Speed 6311.76 samples/sec Loss 5.0085 LearningRate 0.0003 Epoch: 19 Global Step: 395690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:01,078-Speed 6301.25 samples/sec Loss 5.0526 LearningRate 0.0003 Epoch: 19 Global Step: 395700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:04,324-Speed 6312.02 samples/sec Loss 5.0362 LearningRate 0.0003 Epoch: 19 Global Step: 395710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:07,570-Speed 6311.07 samples/sec Loss 5.0987 LearningRate 0.0003 Epoch: 19 Global Step: 395720 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 03:44:10,835-Speed 6274.06 samples/sec Loss 5.0616 LearningRate 0.0003 Epoch: 19 Global Step: 395730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:14,077-Speed 6317.91 samples/sec Loss 5.0557 LearningRate 0.0003 Epoch: 19 Global Step: 395740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:17,323-Speed 6309.31 samples/sec Loss 5.0111 LearningRate 0.0003 Epoch: 19 Global Step: 395750 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:20,572-Speed 6305.13 samples/sec Loss 4.9955 LearningRate 0.0003 Epoch: 19 Global Step: 395760 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:23,835-Speed 6278.99 samples/sec Loss 5.0748 LearningRate 0.0003 Epoch: 19 Global Step: 395770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:27,089-Speed 6294.45 samples/sec Loss 4.9383 LearningRate 0.0003 Epoch: 19 Global Step: 395780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:30,335-Speed 6310.96 samples/sec Loss 5.1107 LearningRate 0.0003 Epoch: 19 Global Step: 395790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:33,581-Speed 6311.30 samples/sec Loss 4.9769 LearningRate 0.0003 Epoch: 19 Global Step: 395800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:44:36,814-Speed 6335.91 samples/sec Loss 5.0386 LearningRate 0.0003 Epoch: 19 Global Step: 395810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:40,096-Speed 6241.12 samples/sec Loss 5.0307 LearningRate 0.0003 Epoch: 19 Global Step: 395820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:43,347-Speed 6301.57 samples/sec Loss 5.0302 LearningRate 0.0003 Epoch: 19 Global Step: 395830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:46,591-Speed 6314.62 samples/sec Loss 5.0870 LearningRate 0.0003 Epoch: 19 Global Step: 395840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
03:44:49,835-Speed 6314.41 samples/sec Loss 5.0732 LearningRate 0.0003 Epoch: 19 Global Step: 395850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:53,184-Speed 6116.88 samples/sec Loss 5.0202 LearningRate 0.0003 Epoch: 19 Global Step: 395860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:56,429-Speed 6312.61 samples/sec Loss 5.0407 LearningRate 0.0003 Epoch: 19 Global Step: 395870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:44:59,691-Speed 6280.20 samples/sec Loss 5.0581 LearningRate 0.0003 Epoch: 19 Global Step: 395880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:02,935-Speed 6314.84 samples/sec Loss 5.0079 LearningRate 0.0003 Epoch: 19 Global Step: 395890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:06,181-Speed 6310.32 samples/sec Loss 5.0312 LearningRate 0.0003 Epoch: 19 Global Step: 395900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:09,413-Speed 6338.60 samples/sec Loss 5.0492 LearningRate 0.0003 Epoch: 19 Global Step: 395910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:12,659-Speed 6309.77 samples/sec Loss 4.9941 LearningRate 0.0003 Epoch: 19 Global Step: 395920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:15,915-Speed 6291.81 samples/sec Loss 5.0221 LearningRate 0.0003 Epoch: 19 Global Step: 395930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:19,159-Speed 6315.06 samples/sec Loss 5.0575 LearningRate 0.0003 Epoch: 19 Global Step: 395940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:22,407-Speed 6305.96 samples/sec Loss 5.1220 LearningRate 0.0003 Epoch: 19 Global Step: 395950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:25,677-Speed 6265.47 samples/sec Loss 5.1168 LearningRate 0.0003 Epoch: 19 Global Step: 395960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:28,920-Speed 6316.23 
samples/sec Loss 5.0133 LearningRate 0.0003 Epoch: 19 Global Step: 395970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:32,171-Speed 6301.51 samples/sec Loss 5.0240 LearningRate 0.0003 Epoch: 19 Global Step: 395980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:35,414-Speed 6315.12 samples/sec Loss 4.9919 LearningRate 0.0003 Epoch: 19 Global Step: 395990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:38,667-Speed 6299.22 samples/sec Loss 5.0435 LearningRate 0.0003 Epoch: 19 Global Step: 396000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:41,912-Speed 6311.89 samples/sec Loss 5.0119 LearningRate 0.0003 Epoch: 19 Global Step: 396010 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:45:45,192-Speed 6244.58 samples/sec Loss 5.0314 LearningRate 0.0003 Epoch: 19 Global Step: 396020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:48,470-Speed 6249.40 samples/sec Loss 5.0453 LearningRate 0.0003 Epoch: 19 Global Step: 396030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:51,709-Speed 6324.59 samples/sec Loss 5.0025 LearningRate 0.0003 Epoch: 19 Global Step: 396040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:54,964-Speed 6294.54 samples/sec Loss 5.0362 LearningRate 0.0003 Epoch: 19 Global Step: 396050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:45:58,210-Speed 6310.32 samples/sec Loss 5.1093 LearningRate 0.0003 Epoch: 19 Global Step: 396060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:01,459-Speed 6304.64 samples/sec Loss 5.0802 LearningRate 0.0003 Epoch: 19 Global Step: 396070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:04,703-Speed 6315.87 samples/sec Loss 5.0393 LearningRate 0.0003 Epoch: 19 Global Step: 396080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:07,947-Speed 6314.03 samples/sec Loss 4.9918 
LearningRate 0.0003 Epoch: 19 Global Step: 396090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:11,197-Speed 6303.85 samples/sec Loss 5.0742 LearningRate 0.0003 Epoch: 19 Global Step: 396100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:14,447-Speed 6301.70 samples/sec Loss 5.0659 LearningRate 0.0003 Epoch: 19 Global Step: 396110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:17,699-Speed 6300.20 samples/sec Loss 5.0202 LearningRate 0.0003 Epoch: 19 Global Step: 396120 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:20,941-Speed 6317.53 samples/sec Loss 5.0536 LearningRate 0.0003 Epoch: 19 Global Step: 396130 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:24,191-Speed 6303.94 samples/sec Loss 5.1119 LearningRate 0.0003 Epoch: 19 Global Step: 396140 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:27,441-Speed 6301.42 samples/sec Loss 5.0951 LearningRate 0.0003 Epoch: 19 Global Step: 396150 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:30,687-Speed 6312.09 samples/sec Loss 4.9811 LearningRate 0.0003 Epoch: 19 Global Step: 396160 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:33,935-Speed 6306.59 samples/sec Loss 5.0604 LearningRate 0.0003 Epoch: 19 Global Step: 396170 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:37,176-Speed 6320.74 samples/sec Loss 4.9979 LearningRate 0.0003 Epoch: 19 Global Step: 396180 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:40,429-Speed 6296.46 samples/sec Loss 4.9910 LearningRate 0.0003 Epoch: 19 Global Step: 396190 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:43,681-Speed 6299.14 samples/sec Loss 5.0768 LearningRate 0.0003 Epoch: 19 Global Step: 396200 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:46,930-Speed 6304.89 samples/sec Loss 4.9713 LearningRate 0.0003 Epoch: 19 
Global Step: 396210 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:46:50,162-Speed 6338.19 samples/sec Loss 5.0137 LearningRate 0.0003 Epoch: 19 Global Step: 396220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:53,410-Speed 6307.13 samples/sec Loss 4.9437 LearningRate 0.0003 Epoch: 19 Global Step: 396230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:56,655-Speed 6312.20 samples/sec Loss 4.9633 LearningRate 0.0003 Epoch: 19 Global Step: 396240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:46:59,900-Speed 6311.65 samples/sec Loss 5.0739 LearningRate 0.0003 Epoch: 19 Global Step: 396250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:03,148-Speed 6308.24 samples/sec Loss 5.0509 LearningRate 0.0003 Epoch: 19 Global Step: 396260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:06,397-Speed 6305.48 samples/sec Loss 5.0402 LearningRate 0.0003 Epoch: 19 Global Step: 396270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:09,648-Speed 6301.53 samples/sec Loss 4.9974 LearningRate 0.0003 Epoch: 19 Global Step: 396280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:12,897-Speed 6304.02 samples/sec Loss 5.0630 LearningRate 0.0003 Epoch: 19 Global Step: 396290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:16,145-Speed 6307.49 samples/sec Loss 4.9542 LearningRate 0.0003 Epoch: 19 Global Step: 396300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:19,394-Speed 6303.96 samples/sec Loss 5.0944 LearningRate 0.0003 Epoch: 19 Global Step: 396310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:22,641-Speed 6309.56 samples/sec Loss 5.0284 LearningRate 0.0003 Epoch: 19 Global Step: 396320 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:47:25,878-Speed 6329.26 samples/sec Loss 4.9889 LearningRate 0.0003 Epoch: 19 Global Step: 396330 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:29,124-Speed 6309.77 samples/sec Loss 5.0206 LearningRate 0.0003 Epoch: 19 Global Step: 396340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:32,372-Speed 6306.97 samples/sec Loss 5.0538 LearningRate 0.0003 Epoch: 19 Global Step: 396350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:35,616-Speed 6314.87 samples/sec Loss 5.1038 LearningRate 0.0003 Epoch: 19 Global Step: 396360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:38,861-Speed 6313.43 samples/sec Loss 5.0103 LearningRate 0.0003 Epoch: 19 Global Step: 396370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:42,111-Speed 6301.96 samples/sec Loss 4.9836 LearningRate 0.0003 Epoch: 19 Global Step: 396380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:45,355-Speed 6314.08 samples/sec Loss 5.0271 LearningRate 0.0003 Epoch: 19 Global Step: 396390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:48,602-Speed 6310.12 samples/sec Loss 4.9093 LearningRate 0.0003 Epoch: 19 Global Step: 396400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:51,846-Speed 6312.94 samples/sec Loss 5.0388 LearningRate 0.0003 Epoch: 19 Global Step: 396410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:55,093-Speed 6310.30 samples/sec Loss 4.9820 LearningRate 0.0003 Epoch: 19 Global Step: 396420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:47:58,323-Speed 6342.26 samples/sec Loss 4.9949 LearningRate 0.0003 Epoch: 19 Global Step: 396430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:01,567-Speed 6312.78 samples/sec Loss 4.9999 LearningRate 0.0003 Epoch: 19 Global Step: 396440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:04,817-Speed 6303.86 samples/sec Loss 5.0243 LearningRate 0.0003 Epoch: 19 Global Step: 396450 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 03:48:08,061-Speed 6315.36 samples/sec Loss 5.0303 LearningRate 0.0003 Epoch: 19 Global Step: 396460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:11,308-Speed 6309.43 samples/sec Loss 5.0554 LearningRate 0.0003 Epoch: 19 Global Step: 396470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:14,551-Speed 6316.31 samples/sec Loss 5.0063 LearningRate 0.0003 Epoch: 19 Global Step: 396480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:17,801-Speed 6302.22 samples/sec Loss 5.0446 LearningRate 0.0003 Epoch: 19 Global Step: 396490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:21,054-Speed 6298.05 samples/sec Loss 5.0836 LearningRate 0.0003 Epoch: 19 Global Step: 396500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:24,299-Speed 6311.32 samples/sec Loss 4.9857 LearningRate 0.0003 Epoch: 19 Global Step: 396510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:27,542-Speed 6317.52 samples/sec Loss 5.0434 LearningRate 0.0003 Epoch: 19 Global Step: 396520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:30,788-Speed 6311.50 samples/sec Loss 5.0346 LearningRate 0.0003 Epoch: 19 Global Step: 396530 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:34,034-Speed 6309.91 samples/sec Loss 5.0513 LearningRate 0.0003 Epoch: 19 Global Step: 396540 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:37,280-Speed 6310.35 samples/sec Loss 5.0646 LearningRate 0.0003 Epoch: 19 Global Step: 396550 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:40,530-Speed 6304.05 samples/sec Loss 5.0587 LearningRate 0.0003 Epoch: 19 Global Step: 396560 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:43,775-Speed 6312.12 samples/sec Loss 5.0251 LearningRate 0.0003 Epoch: 19 Global Step: 396570 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 
03:48:47,026-Speed 6301.96 samples/sec Loss 4.9931 LearningRate 0.0003 Epoch: 19 Global Step: 396580 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:50,270-Speed 6312.61 samples/sec Loss 4.9605 LearningRate 0.0003 Epoch: 19 Global Step: 396590 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:48:53,504-Speed 6335.67 samples/sec Loss 4.9861 LearningRate 0.0003 Epoch: 19 Global Step: 396600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:56,750-Speed 6309.96 samples/sec Loss 4.9881 LearningRate 0.0003 Epoch: 19 Global Step: 396610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:48:59,997-Speed 6309.76 samples/sec Loss 5.0582 LearningRate 0.0003 Epoch: 19 Global Step: 396620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:03,246-Speed 6305.11 samples/sec Loss 5.0289 LearningRate 0.0003 Epoch: 19 Global Step: 396630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:06,492-Speed 6309.71 samples/sec Loss 5.0509 LearningRate 0.0003 Epoch: 19 Global Step: 396640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:09,737-Speed 6313.00 samples/sec Loss 5.0393 LearningRate 0.0003 Epoch: 19 Global Step: 396650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:12,982-Speed 6312.32 samples/sec Loss 5.0123 LearningRate 0.0003 Epoch: 19 Global Step: 396660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:16,232-Speed 6303.51 samples/sec Loss 4.9737 LearningRate 0.0003 Epoch: 19 Global Step: 396670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:19,475-Speed 6316.73 samples/sec Loss 5.0581 LearningRate 0.0003 Epoch: 19 Global Step: 396680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:22,725-Speed 6304.27 samples/sec Loss 5.0967 LearningRate 0.0003 Epoch: 19 Global Step: 396690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:25,972-Speed 6309.17 
samples/sec Loss 4.9598 LearningRate 0.0003 Epoch: 19 Global Step: 396700 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:29,218-Speed 6311.01 samples/sec Loss 4.9891 LearningRate 0.0003 Epoch: 19 Global Step: 396710 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:32,463-Speed 6311.40 samples/sec Loss 5.0597 LearningRate 0.0003 Epoch: 19 Global Step: 396720 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:35,707-Speed 6314.89 samples/sec Loss 5.0222 LearningRate 0.0003 Epoch: 19 Global Step: 396730 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:38,952-Speed 6312.88 samples/sec Loss 5.0261 LearningRate 0.0003 Epoch: 19 Global Step: 396740 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:42,207-Speed 6293.41 samples/sec Loss 5.1001 LearningRate 0.0003 Epoch: 19 Global Step: 396750 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:49:45,434-Speed 6347.04 samples/sec Loss 5.0182 LearningRate 0.0003 Epoch: 19 Global Step: 396760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:48,686-Speed 6298.86 samples/sec Loss 4.9782 LearningRate 0.0003 Epoch: 19 Global Step: 396770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:51,935-Speed 6305.64 samples/sec Loss 5.0428 LearningRate 0.0003 Epoch: 19 Global Step: 396780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:55,184-Speed 6305.83 samples/sec Loss 5.0043 LearningRate 0.0003 Epoch: 19 Global Step: 396790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:49:58,428-Speed 6314.75 samples/sec Loss 4.9961 LearningRate 0.0003 Epoch: 19 Global Step: 396800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:01,676-Speed 6305.73 samples/sec Loss 5.0558 LearningRate 0.0003 Epoch: 19 Global Step: 396810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:04,928-Speed 6299.99 samples/sec Loss 5.0917 
LearningRate 0.0003 Epoch: 19 Global Step: 396820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:08,177-Speed 6303.52 samples/sec Loss 5.0763 LearningRate 0.0003 Epoch: 19 Global Step: 396830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:11,430-Speed 6297.03 samples/sec Loss 5.0094 LearningRate 0.0003 Epoch: 19 Global Step: 396840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:14,706-Speed 6253.88 samples/sec Loss 4.9906 LearningRate 0.0003 Epoch: 19 Global Step: 396850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:17,952-Speed 6309.84 samples/sec Loss 5.0096 LearningRate 0.0003 Epoch: 19 Global Step: 396860 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:50:21,202-Speed 6304.16 samples/sec Loss 5.0492 LearningRate 0.0003 Epoch: 19 Global Step: 396870 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:50:24,443-Speed 6319.61 samples/sec Loss 5.0775 LearningRate 0.0003 Epoch: 19 Global Step: 396880 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:50:27,688-Speed 6312.52 samples/sec Loss 5.0667 LearningRate 0.0003 Epoch: 19 Global Step: 396890 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:50:30,919-Speed 6340.67 samples/sec Loss 4.9501 LearningRate 0.0003 Epoch: 19 Global Step: 396900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:34,168-Speed 6306.17 samples/sec Loss 5.0161 LearningRate 0.0003 Epoch: 19 Global Step: 396910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:37,415-Speed 6309.43 samples/sec Loss 4.9856 LearningRate 0.0003 Epoch: 19 Global Step: 396920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:40,662-Speed 6307.85 samples/sec Loss 5.0083 LearningRate 0.0003 Epoch: 19 Global Step: 396930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:43,910-Speed 6306.52 samples/sec Loss 4.9888 LearningRate 0.0003 Epoch: 19 
Global Step: 396940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:47,157-Speed 6309.70 samples/sec Loss 5.0163 LearningRate 0.0003 Epoch: 19 Global Step: 396950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:50,404-Speed 6308.31 samples/sec Loss 5.0019 LearningRate 0.0003 Epoch: 19 Global Step: 396960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:53,649-Speed 6312.82 samples/sec Loss 4.9431 LearningRate 0.0003 Epoch: 19 Global Step: 396970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:50:56,892-Speed 6316.11 samples/sec Loss 4.9470 LearningRate 0.0003 Epoch: 19 Global Step: 396980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:00,142-Speed 6303.97 samples/sec Loss 5.0118 LearningRate 0.0003 Epoch: 19 Global Step: 396990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:03,386-Speed 6313.15 samples/sec Loss 4.9413 LearningRate 0.0003 Epoch: 19 Global Step: 397000 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:06,637-Speed 6302.42 samples/sec Loss 5.0311 LearningRate 0.0003 Epoch: 19 Global Step: 397010 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:09,883-Speed 6310.81 samples/sec Loss 5.0095 LearningRate 0.0003 Epoch: 19 Global Step: 397020 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:13,131-Speed 6305.91 samples/sec Loss 5.0477 LearningRate 0.0003 Epoch: 19 Global Step: 397030 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:16,378-Speed 6308.68 samples/sec Loss 5.0442 LearningRate 0.0003 Epoch: 19 Global Step: 397040 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:19,620-Speed 6319.11 samples/sec Loss 4.9895 LearningRate 0.0003 Epoch: 19 Global Step: 397050 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:22,865-Speed 6311.90 samples/sec Loss 4.9923 LearningRate 0.0003 Epoch: 19 Global Step: 397060 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:26,108-Speed 6317.04 samples/sec Loss 5.0355 LearningRate 0.0003 Epoch: 19 Global Step: 397070 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:29,356-Speed 6306.93 samples/sec Loss 5.0420 LearningRate 0.0003 Epoch: 19 Global Step: 397080 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:51:32,588-Speed 6337.70 samples/sec Loss 4.9923 LearningRate 0.0003 Epoch: 19 Global Step: 397090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:35,834-Speed 6311.13 samples/sec Loss 5.0280 LearningRate 0.0003 Epoch: 19 Global Step: 397100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:39,086-Speed 6298.43 samples/sec Loss 5.0297 LearningRate 0.0003 Epoch: 19 Global Step: 397110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:42,331-Speed 6314.47 samples/sec Loss 5.0188 LearningRate 0.0003 Epoch: 19 Global Step: 397120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:45,578-Speed 6309.33 samples/sec Loss 5.0262 LearningRate 0.0003 Epoch: 19 Global Step: 397130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:48,825-Speed 6307.61 samples/sec Loss 5.0992 LearningRate 0.0003 Epoch: 19 Global Step: 397140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:52,071-Speed 6311.77 samples/sec Loss 5.1129 LearningRate 0.0003 Epoch: 19 Global Step: 397150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:55,317-Speed 6311.16 samples/sec Loss 5.0794 LearningRate 0.0003 Epoch: 19 Global Step: 397160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:51:58,562-Speed 6311.15 samples/sec Loss 5.0791 LearningRate 0.0003 Epoch: 19 Global Step: 397170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:01,808-Speed 6312.40 samples/sec Loss 4.9831 LearningRate 0.0003 Epoch: 19 Global Step: 397180 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 03:52:05,045-Speed 6326.40 samples/sec Loss 5.0361 LearningRate 0.0003 Epoch: 19 Global Step: 397190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:08,294-Speed 6306.67 samples/sec Loss 5.0425 LearningRate 0.0003 Epoch: 19 Global Step: 397200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:11,540-Speed 6310.09 samples/sec Loss 5.0186 LearningRate 0.0003 Epoch: 19 Global Step: 397210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:14,783-Speed 6317.28 samples/sec Loss 5.0447 LearningRate 0.0003 Epoch: 19 Global Step: 397220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:18,032-Speed 6303.56 samples/sec Loss 5.0023 LearningRate 0.0003 Epoch: 19 Global Step: 397230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:21,277-Speed 6312.55 samples/sec Loss 5.0564 LearningRate 0.0003 Epoch: 19 Global Step: 397240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:24,525-Speed 6307.02 samples/sec Loss 5.0335 LearningRate 0.0003 Epoch: 19 Global Step: 397250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:27,775-Speed 6302.88 samples/sec Loss 4.9472 LearningRate 0.0003 Epoch: 19 Global Step: 397260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:31,018-Speed 6317.03 samples/sec Loss 5.0803 LearningRate 0.0003 Epoch: 19 Global Step: 397270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:34,266-Speed 6308.06 samples/sec Loss 5.0169 LearningRate 0.0003 Epoch: 19 Global Step: 397280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:37,518-Speed 6298.35 samples/sec Loss 5.0486 LearningRate 0.0003 Epoch: 19 Global Step: 397290 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:52:40,764-Speed 6310.04 samples/sec Loss 4.9194 LearningRate 0.0003 Epoch: 19 Global Step: 397300 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 
03:52:43,997-Speed 6337.77 samples/sec Loss 4.9899 LearningRate 0.0003 Epoch: 19 Global Step: 397310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:47,241-Speed 6314.15 samples/sec Loss 5.0825 LearningRate 0.0003 Epoch: 19 Global Step: 397320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:50,489-Speed 6307.28 samples/sec Loss 4.9828 LearningRate 0.0003 Epoch: 19 Global Step: 397330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:53,735-Speed 6311.16 samples/sec Loss 4.9786 LearningRate 0.0003 Epoch: 19 Global Step: 397340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:52:56,982-Speed 6309.80 samples/sec Loss 4.9493 LearningRate 0.0003 Epoch: 19 Global Step: 397350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:00,222-Speed 6321.23 samples/sec Loss 5.0141 LearningRate 0.0003 Epoch: 19 Global Step: 397360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:03,486-Speed 6276.91 samples/sec Loss 5.0789 LearningRate 0.0003 Epoch: 19 Global Step: 397370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:06,731-Speed 6311.93 samples/sec Loss 4.9635 LearningRate 0.0003 Epoch: 19 Global Step: 397380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:09,975-Speed 6315.10 samples/sec Loss 5.0546 LearningRate 0.0003 Epoch: 19 Global Step: 397390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:13,219-Speed 6315.05 samples/sec Loss 5.0218 LearningRate 0.0003 Epoch: 19 Global Step: 397400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:16,496-Speed 6249.84 samples/sec Loss 4.9669 LearningRate 0.0003 Epoch: 19 Global Step: 397410 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:53:19,741-Speed 6312.98 samples/sec Loss 4.9839 LearningRate 0.0003 Epoch: 19 Global Step: 397420 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:53:22,998-Speed 6289.29 
samples/sec Loss 4.9587 LearningRate 0.0003 Epoch: 19 Global Step: 397430 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:53:26,242-Speed 6315.11 samples/sec Loss 5.0346 LearningRate 0.0003 Epoch: 19 Global Step: 397440 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:53:29,474-Speed 6337.58 samples/sec Loss 4.9560 LearningRate 0.0003 Epoch: 19 Global Step: 397450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:32,723-Speed 6306.16 samples/sec Loss 4.9646 LearningRate 0.0003 Epoch: 19 Global Step: 397460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:35,993-Speed 6263.95 samples/sec Loss 5.0081 LearningRate 0.0003 Epoch: 19 Global Step: 397470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:39,237-Speed 6313.60 samples/sec Loss 5.0141 LearningRate 0.0003 Epoch: 19 Global Step: 397480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:42,481-Speed 6315.50 samples/sec Loss 5.0935 LearningRate 0.0003 Epoch: 19 Global Step: 397490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:45,750-Speed 6266.40 samples/sec Loss 5.0091 LearningRate 0.0003 Epoch: 19 Global Step: 397500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:48,992-Speed 6318.01 samples/sec Loss 5.0153 LearningRate 0.0003 Epoch: 19 Global Step: 397510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:52,237-Speed 6311.82 samples/sec Loss 5.1228 LearningRate 0.0003 Epoch: 19 Global Step: 397520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:55,481-Speed 6316.21 samples/sec Loss 4.9897 LearningRate 0.0003 Epoch: 19 Global Step: 397530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:53:58,728-Speed 6308.95 samples/sec Loss 5.0092 LearningRate 0.0003 Epoch: 19 Global Step: 397540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:01,974-Speed 6310.09 samples/sec Loss 4.9499 
LearningRate 0.0003 Epoch: 19 Global Step: 397550 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:54:05,267-Speed 6221.01 samples/sec Loss 4.9811 LearningRate 0.0003 Epoch: 19 Global Step: 397560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:08,515-Speed 6308.31 samples/sec Loss 5.0520 LearningRate 0.0003 Epoch: 19 Global Step: 397570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:11,764-Speed 6304.76 samples/sec Loss 4.9661 LearningRate 0.0003 Epoch: 19 Global Step: 397580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:15,015-Speed 6299.75 samples/sec Loss 5.0000 LearningRate 0.0003 Epoch: 19 Global Step: 397590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:18,258-Speed 6317.04 samples/sec Loss 4.9683 LearningRate 0.0003 Epoch: 19 Global Step: 397600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:21,504-Speed 6310.60 samples/sec Loss 5.0147 LearningRate 0.0003 Epoch: 19 Global Step: 397610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:24,751-Speed 6309.03 samples/sec Loss 5.0268 LearningRate 0.0003 Epoch: 19 Global Step: 397620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:27,997-Speed 6311.78 samples/sec Loss 5.0134 LearningRate 0.0003 Epoch: 19 Global Step: 397630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:31,238-Speed 6318.83 samples/sec Loss 4.9992 LearningRate 0.0003 Epoch: 19 Global Step: 397640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:34,483-Speed 6313.04 samples/sec Loss 4.9834 LearningRate 0.0003 Epoch: 19 Global Step: 397650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:37,730-Speed 6308.61 samples/sec Loss 5.0115 LearningRate 0.0003 Epoch: 19 Global Step: 397660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:54:40,979-Speed 6304.45 samples/sec Loss 5.0656 LearningRate 0.0003 Epoch: 19 
Global Step: 397670 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:54:44,218-Speed 6326.19 samples/sec Loss 5.0306 LearningRate 0.0003 Epoch: 19 Global Step: 397680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:47,467-Speed 6304.62 samples/sec Loss 5.0447 LearningRate 0.0003 Epoch: 19 Global Step: 397690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:50,718-Speed 6299.44 samples/sec Loss 4.9630 LearningRate 0.0003 Epoch: 19 Global Step: 397700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:53,966-Speed 6308.25 samples/sec Loss 5.0663 LearningRate 0.0003 Epoch: 19 Global Step: 397710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:54:57,210-Speed 6314.67 samples/sec Loss 5.0294 LearningRate 0.0003 Epoch: 19 Global Step: 397720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:00,460-Speed 6302.71 samples/sec Loss 5.0509 LearningRate 0.0003 Epoch: 19 Global Step: 397730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:03,709-Speed 6304.54 samples/sec Loss 5.0722 LearningRate 0.0003 Epoch: 19 Global Step: 397740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:06,957-Speed 6306.93 samples/sec Loss 4.9489 LearningRate 0.0003 Epoch: 19 Global Step: 397750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:10,207-Speed 6303.51 samples/sec Loss 5.0078 LearningRate 0.0003 Epoch: 19 Global Step: 397760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:13,451-Speed 6314.73 samples/sec Loss 4.9674 LearningRate 0.0003 Epoch: 19 Global Step: 397770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:16,709-Speed 6287.84 samples/sec Loss 5.0816 LearningRate 0.0003 Epoch: 19 Global Step: 397780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:19,967-Speed 6287.75 samples/sec Loss 4.9546 LearningRate 0.0003 Epoch: 19 Global Step: 397790 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:23,211-Speed 6314.91 samples/sec Loss 5.0286 LearningRate 0.0003 Epoch: 19 Global Step: 397800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:26,463-Speed 6298.97 samples/sec Loss 5.0191 LearningRate 0.0003 Epoch: 19 Global Step: 397810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:29,710-Speed 6309.57 samples/sec Loss 4.9692 LearningRate 0.0003 Epoch: 19 Global Step: 397820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:32,957-Speed 6308.40 samples/sec Loss 4.9824 LearningRate 0.0003 Epoch: 19 Global Step: 397830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:55:36,189-Speed 6337.81 samples/sec Loss 5.0520 LearningRate 0.0003 Epoch: 19 Global Step: 397840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:39,434-Speed 6313.35 samples/sec Loss 4.9835 LearningRate 0.0003 Epoch: 19 Global Step: 397850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:42,677-Speed 6315.64 samples/sec Loss 5.0604 LearningRate 0.0003 Epoch: 19 Global Step: 397860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:45,924-Speed 6309.91 samples/sec Loss 4.9844 LearningRate 0.0003 Epoch: 19 Global Step: 397870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:49,172-Speed 6307.11 samples/sec Loss 4.9747 LearningRate 0.0003 Epoch: 19 Global Step: 397880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:52,413-Speed 6320.12 samples/sec Loss 4.9751 LearningRate 0.0003 Epoch: 19 Global Step: 397890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:55,657-Speed 6314.23 samples/sec Loss 5.0058 LearningRate 0.0003 Epoch: 19 Global Step: 397900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:55:58,904-Speed 6308.46 samples/sec Loss 5.0419 LearningRate 0.0003 Epoch: 19 Global Step: 397910 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 03:56:02,151-Speed 6309.05 samples/sec Loss 5.0400 LearningRate 0.0003 Epoch: 19 Global Step: 397920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:05,406-Speed 6292.87 samples/sec Loss 5.0348 LearningRate 0.0003 Epoch: 19 Global Step: 397930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:08,652-Speed 6311.03 samples/sec Loss 4.9447 LearningRate 0.0003 Epoch: 19 Global Step: 397940 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:56:11,890-Speed 6326.82 samples/sec Loss 5.0238 LearningRate 0.0003 Epoch: 19 Global Step: 397950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:15,144-Speed 6294.14 samples/sec Loss 5.0548 LearningRate 0.0003 Epoch: 19 Global Step: 397960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:18,389-Speed 6312.53 samples/sec Loss 5.0140 LearningRate 0.0003 Epoch: 19 Global Step: 397970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:21,655-Speed 6273.80 samples/sec Loss 5.0230 LearningRate 0.0003 Epoch: 19 Global Step: 397980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:24,896-Speed 6319.97 samples/sec Loss 4.9950 LearningRate 0.0003 Epoch: 19 Global Step: 397990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:28,139-Speed 6318.04 samples/sec Loss 5.0517 LearningRate 0.0003 Epoch: 19 Global Step: 398000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:31,391-Speed 6298.15 samples/sec Loss 4.9611 LearningRate 0.0003 Epoch: 19 Global Step: 398010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:34,637-Speed 6310.71 samples/sec Loss 5.0278 LearningRate 0.0003 Epoch: 19 Global Step: 398020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:37,882-Speed 6313.79 samples/sec Loss 4.9981 LearningRate 0.0003 Epoch: 19 Global Step: 398030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
03:56:41,127-Speed 6311.26 samples/sec Loss 5.0874 LearningRate 0.0003 Epoch: 19 Global Step: 398040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:44,375-Speed 6310.59 samples/sec Loss 4.9518 LearningRate 0.0003 Epoch: 19 Global Step: 398050 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:56:47,634-Speed 6283.77 samples/sec Loss 5.0668 LearningRate 0.0003 Epoch: 19 Global Step: 398060 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:56:50,878-Speed 6314.52 samples/sec Loss 5.0272 LearningRate 0.0003 Epoch: 19 Global Step: 398070 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:56:54,110-Speed 6337.58 samples/sec Loss 5.0312 LearningRate 0.0003 Epoch: 19 Global Step: 398080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:56:57,358-Speed 6308.60 samples/sec Loss 4.9547 LearningRate 0.0003 Epoch: 19 Global Step: 398090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:00,607-Speed 6304.00 samples/sec Loss 4.9949 LearningRate 0.0003 Epoch: 19 Global Step: 398100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:03,857-Speed 6303.40 samples/sec Loss 4.9411 LearningRate 0.0003 Epoch: 19 Global Step: 398110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:07,099-Speed 6318.48 samples/sec Loss 5.0182 LearningRate 0.0003 Epoch: 19 Global Step: 398120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:10,342-Speed 6315.18 samples/sec Loss 5.0656 LearningRate 0.0003 Epoch: 19 Global Step: 398130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:13,588-Speed 6312.29 samples/sec Loss 5.1192 LearningRate 0.0003 Epoch: 19 Global Step: 398140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:16,836-Speed 6306.67 samples/sec Loss 5.0035 LearningRate 0.0003 Epoch: 19 Global Step: 398150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:20,085-Speed 6304.60 
samples/sec Loss 5.0136 LearningRate 0.0003 Epoch: 19 Global Step: 398160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:23,330-Speed 6311.62 samples/sec Loss 5.0308 LearningRate 0.0003 Epoch: 19 Global Step: 398170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:57:26,575-Speed 6314.94 samples/sec Loss 5.0830 LearningRate 0.0003 Epoch: 19 Global Step: 398180 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:29,818-Speed 6317.16 samples/sec Loss 5.0570 LearningRate 0.0003 Epoch: 19 Global Step: 398190 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:33,066-Speed 6306.97 samples/sec Loss 4.9578 LearningRate 0.0003 Epoch: 19 Global Step: 398200 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:36,315-Speed 6304.74 samples/sec Loss 4.9138 LearningRate 0.0003 Epoch: 19 Global Step: 398210 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:39,560-Speed 6312.01 samples/sec Loss 4.9340 LearningRate 0.0003 Epoch: 19 Global Step: 398220 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:42,813-Speed 6298.18 samples/sec Loss 5.0224 LearningRate 0.0003 Epoch: 19 Global Step: 398230 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:46,062-Speed 6303.80 samples/sec Loss 5.0490 LearningRate 0.0003 Epoch: 19 Global Step: 398240 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:49,319-Speed 6289.98 samples/sec Loss 5.0291 LearningRate 0.0003 Epoch: 19 Global Step: 398250 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:52,563-Speed 6314.70 samples/sec Loss 4.9664 LearningRate 0.0003 Epoch: 19 Global Step: 398260 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:55,811-Speed 6306.55 samples/sec Loss 5.0134 LearningRate 0.0003 Epoch: 19 Global Step: 398270 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:57:59,047-Speed 6330.18 samples/sec Loss 5.0160 
LearningRate 0.0003 Epoch: 19 Global Step: 398280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:02,300-Speed 6297.53 samples/sec Loss 5.0525 LearningRate 0.0003 Epoch: 19 Global Step: 398290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:05,544-Speed 6314.23 samples/sec Loss 5.0477 LearningRate 0.0003 Epoch: 19 Global Step: 398300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:08,788-Speed 6313.89 samples/sec Loss 5.0026 LearningRate 0.0003 Epoch: 19 Global Step: 398310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:12,037-Speed 6305.33 samples/sec Loss 4.9319 LearningRate 0.0003 Epoch: 19 Global Step: 398320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:15,284-Speed 6308.82 samples/sec Loss 4.9991 LearningRate 0.0003 Epoch: 19 Global Step: 398330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:18,533-Speed 6304.58 samples/sec Loss 5.0373 LearningRate 0.0003 Epoch: 19 Global Step: 398340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:21,779-Speed 6311.62 samples/sec Loss 5.0545 LearningRate 0.0003 Epoch: 19 Global Step: 398350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:25,029-Speed 6302.88 samples/sec Loss 4.9991 LearningRate 0.0003 Epoch: 19 Global Step: 398360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:28,272-Speed 6316.37 samples/sec Loss 5.0069 LearningRate 0.0003 Epoch: 19 Global Step: 398370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:31,523-Speed 6301.54 samples/sec Loss 5.0236 LearningRate 0.0003 Epoch: 19 Global Step: 398380 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:58:34,755-Speed 6337.80 samples/sec Loss 4.9842 LearningRate 0.0003 Epoch: 19 Global Step: 398390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:38,004-Speed 6306.11 samples/sec Loss 5.0146 LearningRate 0.0003 Epoch: 19 
Global Step: 398400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:41,249-Speed 6312.64 samples/sec Loss 5.0183 LearningRate 0.0003 Epoch: 19 Global Step: 398410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:44,498-Speed 6304.28 samples/sec Loss 4.9244 LearningRate 0.0003 Epoch: 19 Global Step: 398420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:47,743-Speed 6313.68 samples/sec Loss 5.0291 LearningRate 0.0003 Epoch: 19 Global Step: 398430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:50,990-Speed 6308.43 samples/sec Loss 4.9960 LearningRate 0.0003 Epoch: 19 Global Step: 398440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:54,234-Speed 6314.66 samples/sec Loss 5.0193 LearningRate 0.0003 Epoch: 19 Global Step: 398450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:58:57,486-Speed 6298.74 samples/sec Loss 5.1410 LearningRate 0.0003 Epoch: 19 Global Step: 398460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:00,736-Speed 6302.52 samples/sec Loss 4.9938 LearningRate 0.0003 Epoch: 19 Global Step: 398470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:03,989-Speed 6297.67 samples/sec Loss 4.9607 LearningRate 0.0003 Epoch: 19 Global Step: 398480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:07,234-Speed 6312.10 samples/sec Loss 4.8750 LearningRate 0.0003 Epoch: 19 Global Step: 398490 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:10,478-Speed 6315.98 samples/sec Loss 4.9977 LearningRate 0.0003 Epoch: 19 Global Step: 398500 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:13,744-Speed 6270.61 samples/sec Loss 4.9814 LearningRate 0.0003 Epoch: 19 Global Step: 398510 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:16,998-Speed 6296.20 samples/sec Loss 4.9768 LearningRate 0.0003 Epoch: 19 Global Step: 398520 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:20,244-Speed 6309.89 samples/sec Loss 5.0240 LearningRate 0.0003 Epoch: 19 Global Step: 398530 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:23,493-Speed 6305.80 samples/sec Loss 5.0697 LearningRate 0.0003 Epoch: 19 Global Step: 398540 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 03:59:26,731-Speed 6326.34 samples/sec Loss 4.9616 LearningRate 0.0003 Epoch: 19 Global Step: 398550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:29,974-Speed 6316.49 samples/sec Loss 4.9762 LearningRate 0.0003 Epoch: 19 Global Step: 398560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:33,218-Speed 6314.22 samples/sec Loss 5.0296 LearningRate 0.0003 Epoch: 19 Global Step: 398570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:36,473-Speed 6294.25 samples/sec Loss 4.9980 LearningRate 0.0003 Epoch: 19 Global Step: 398580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:39,721-Speed 6306.01 samples/sec Loss 5.0061 LearningRate 0.0003 Epoch: 19 Global Step: 398590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:42,965-Speed 6314.48 samples/sec Loss 4.9233 LearningRate 0.0003 Epoch: 19 Global Step: 398600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:46,215-Speed 6303.67 samples/sec Loss 5.0123 LearningRate 0.0003 Epoch: 19 Global Step: 398610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:49,470-Speed 6294.00 samples/sec Loss 5.0272 LearningRate 0.0003 Epoch: 19 Global Step: 398620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:52,816-Speed 6121.07 samples/sec Loss 5.0215 LearningRate 0.0003 Epoch: 19 Global Step: 398630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 03:59:56,091-Speed 6256.48 samples/sec Loss 5.0076 LearningRate 0.0003 Epoch: 19 Global Step: 398640 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 03:59:59,339-Speed 6306.12 samples/sec Loss 5.0440 LearningRate 0.0003 Epoch: 19 Global Step: 398650 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:02,585-Speed 6310.66 samples/sec Loss 4.9701 LearningRate 0.0003 Epoch: 19 Global Step: 398660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:05,818-Speed 6335.06 samples/sec Loss 5.0374 LearningRate 0.0003 Epoch: 19 Global Step: 398670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:09,066-Speed 6307.79 samples/sec Loss 5.0189 LearningRate 0.0003 Epoch: 19 Global Step: 398680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:12,315-Speed 6305.34 samples/sec Loss 5.0793 LearningRate 0.0003 Epoch: 19 Global Step: 398690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:15,559-Speed 6314.09 samples/sec Loss 4.9831 LearningRate 0.0003 Epoch: 19 Global Step: 398700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:18,804-Speed 6313.43 samples/sec Loss 5.0301 LearningRate 0.0003 Epoch: 19 Global Step: 398710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:22,048-Speed 6315.02 samples/sec Loss 4.9955 LearningRate 0.0003 Epoch: 19 Global Step: 398720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:25,293-Speed 6310.70 samples/sec Loss 5.0738 LearningRate 0.0003 Epoch: 19 Global Step: 398730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:28,538-Speed 6313.52 samples/sec Loss 4.9895 LearningRate 0.0003 Epoch: 19 Global Step: 398740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:31,788-Speed 6302.63 samples/sec Loss 4.9959 LearningRate 0.0003 Epoch: 19 Global Step: 398750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:00:35,030-Speed 6318.64 samples/sec Loss 4.9922 LearningRate 0.0003 Epoch: 19 Global Step: 398760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:00:38,280-Speed 6303.95 samples/sec Loss 4.9560 LearningRate 0.0003 Epoch: 19 Global Step: 398770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:41,525-Speed 6312.24 samples/sec Loss 5.0174 LearningRate 0.0003 Epoch: 19 Global Step: 398780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:44,774-Speed 6305.79 samples/sec Loss 4.9801 LearningRate 0.0003 Epoch: 19 Global Step: 398790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:48,019-Speed 6312.58 samples/sec Loss 5.0348 LearningRate 0.0003 Epoch: 19 Global Step: 398800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:51,261-Speed 6318.26 samples/sec Loss 5.0214 LearningRate 0.0003 Epoch: 19 Global Step: 398810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:54,505-Speed 6313.24 samples/sec Loss 5.0297 LearningRate 0.0003 Epoch: 19 Global Step: 398820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:00:57,735-Speed 6344.33 samples/sec Loss 4.9591 LearningRate 0.0003 Epoch: 19 Global Step: 398830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:00,982-Speed 6307.89 samples/sec Loss 5.0378 LearningRate 0.0003 Epoch: 19 Global Step: 398840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:04,228-Speed 6311.24 samples/sec Loss 4.9234 LearningRate 0.0003 Epoch: 19 Global Step: 398850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:07,474-Speed 6310.97 samples/sec Loss 4.9285 LearningRate 0.0003 Epoch: 19 Global Step: 398860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:10,720-Speed 6309.93 samples/sec Loss 5.0458 LearningRate 0.0003 Epoch: 19 Global Step: 398870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:13,971-Speed 6302.79 samples/sec Loss 5.0042 LearningRate 0.0003 Epoch: 19 Global Step: 398880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:17,217-Speed 6310.14 
samples/sec Loss 4.9971 LearningRate 0.0003 Epoch: 19 Global Step: 398890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:20,465-Speed 6305.25 samples/sec Loss 5.0553 LearningRate 0.0003 Epoch: 19 Global Step: 398900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:23,711-Speed 6311.29 samples/sec Loss 4.9965 LearningRate 0.0003 Epoch: 19 Global Step: 398910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:26,953-Speed 6318.63 samples/sec Loss 5.0185 LearningRate 0.0003 Epoch: 19 Global Step: 398920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:30,199-Speed 6310.89 samples/sec Loss 5.0747 LearningRate 0.0003 Epoch: 19 Global Step: 398930 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:01:33,445-Speed 6310.35 samples/sec Loss 4.9570 LearningRate 0.0003 Epoch: 19 Global Step: 398940 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:01:36,694-Speed 6306.40 samples/sec Loss 5.0099 LearningRate 0.0003 Epoch: 19 Global Step: 398950 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:01:39,937-Speed 6316.31 samples/sec Loss 4.9762 LearningRate 0.0003 Epoch: 19 Global Step: 398960 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:01:43,173-Speed 6341.38 samples/sec Loss 4.9957 LearningRate 0.0003 Epoch: 19 Global Step: 398970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:46,416-Speed 6316.32 samples/sec Loss 5.0452 LearningRate 0.0003 Epoch: 19 Global Step: 398980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:49,664-Speed 6307.00 samples/sec Loss 4.8996 LearningRate 0.0003 Epoch: 19 Global Step: 398990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:52,907-Speed 6314.99 samples/sec Loss 4.9886 LearningRate 0.0003 Epoch: 19 Global Step: 399000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:56,153-Speed 6311.21 samples/sec Loss 5.0833 
LearningRate 0.0003 Epoch: 19 Global Step: 399010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:01:59,409-Speed 6292.04 samples/sec Loss 5.0133 LearningRate 0.0003 Epoch: 19 Global Step: 399020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:02,654-Speed 6312.41 samples/sec Loss 4.9833 LearningRate 0.0003 Epoch: 19 Global Step: 399030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:05,902-Speed 6306.98 samples/sec Loss 5.0549 LearningRate 0.0003 Epoch: 19 Global Step: 399040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:09,155-Speed 6298.30 samples/sec Loss 5.0873 LearningRate 0.0003 Epoch: 19 Global Step: 399050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:12,409-Speed 6294.10 samples/sec Loss 4.9995 LearningRate 0.0003 Epoch: 19 Global Step: 399060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:15,655-Speed 6311.62 samples/sec Loss 5.0040 LearningRate 0.0003 Epoch: 19 Global Step: 399070 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:02:18,886-Speed 6339.85 samples/sec Loss 4.9718 LearningRate 0.0003 Epoch: 19 Global Step: 399080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:22,135-Speed 6304.43 samples/sec Loss 5.0036 LearningRate 0.0003 Epoch: 19 Global Step: 399090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:25,383-Speed 6308.12 samples/sec Loss 4.9657 LearningRate 0.0003 Epoch: 19 Global Step: 399100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:28,632-Speed 6303.55 samples/sec Loss 4.9579 LearningRate 0.0003 Epoch: 19 Global Step: 399110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:31,879-Speed 6309.55 samples/sec Loss 4.9251 LearningRate 0.0003 Epoch: 19 Global Step: 399120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:35,126-Speed 6309.23 samples/sec Loss 4.9799 LearningRate 0.0003 Epoch: 19 
Global Step: 399130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:38,370-Speed 6313.54 samples/sec Loss 4.9869 LearningRate 0.0003 Epoch: 19 Global Step: 399140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:41,612-Speed 6318.37 samples/sec Loss 4.9798 LearningRate 0.0003 Epoch: 19 Global Step: 399150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:44,859-Speed 6308.84 samples/sec Loss 5.0501 LearningRate 0.0003 Epoch: 19 Global Step: 399160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:48,104-Speed 6313.98 samples/sec Loss 5.0332 LearningRate 0.0003 Epoch: 19 Global Step: 399170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:02:51,350-Speed 6309.87 samples/sec Loss 5.0597 LearningRate 0.0003 Epoch: 19 Global Step: 399180 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:02:54,596-Speed 6311.71 samples/sec Loss 5.0250 LearningRate 0.0003 Epoch: 19 Global Step: 399190 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:02:57,841-Speed 6312.54 samples/sec Loss 5.0962 LearningRate 0.0003 Epoch: 19 Global Step: 399200 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:03:01,092-Speed 6299.55 samples/sec Loss 5.0467 LearningRate 0.0003 Epoch: 19 Global Step: 399210 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:03:04,330-Speed 6327.81 samples/sec Loss 5.0816 LearningRate 0.0003 Epoch: 19 Global Step: 399220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:07,583-Speed 6297.16 samples/sec Loss 5.0268 LearningRate 0.0003 Epoch: 19 Global Step: 399230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:10,831-Speed 6306.20 samples/sec Loss 5.0497 LearningRate 0.0003 Epoch: 19 Global Step: 399240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:14,090-Speed 6286.63 samples/sec Loss 5.0120 LearningRate 0.0003 Epoch: 19 Global Step: 399250 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:17,340-Speed 6303.48 samples/sec Loss 5.0713 LearningRate 0.0003 Epoch: 19 Global Step: 399260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:20,587-Speed 6308.91 samples/sec Loss 5.0643 LearningRate 0.0003 Epoch: 19 Global Step: 399270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:23,838-Speed 6300.52 samples/sec Loss 4.9869 LearningRate 0.0003 Epoch: 19 Global Step: 399280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:27,084-Speed 6310.65 samples/sec Loss 5.0382 LearningRate 0.0003 Epoch: 19 Global Step: 399290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:30,329-Speed 6313.19 samples/sec Loss 4.9624 LearningRate 0.0003 Epoch: 19 Global Step: 399300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:33,572-Speed 6316.71 samples/sec Loss 5.0134 LearningRate 0.0003 Epoch: 19 Global Step: 399310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:36,829-Speed 6289.46 samples/sec Loss 5.0476 LearningRate 0.0003 Epoch: 19 Global Step: 399320 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:03:40,078-Speed 6304.81 samples/sec Loss 4.9805 LearningRate 0.0003 Epoch: 19 Global Step: 399330 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:03:43,309-Speed 6340.57 samples/sec Loss 5.0361 LearningRate 0.0003 Epoch: 19 Global Step: 399340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:46,554-Speed 6311.67 samples/sec Loss 4.9917 LearningRate 0.0003 Epoch: 19 Global Step: 399350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:49,799-Speed 6312.77 samples/sec Loss 4.9946 LearningRate 0.0003 Epoch: 19 Global Step: 399360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:53,046-Speed 6309.28 samples/sec Loss 5.0337 LearningRate 0.0003 Epoch: 19 Global Step: 399370 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:03:56,290-Speed 6314.18 samples/sec Loss 4.9182 LearningRate 0.0003 Epoch: 19 Global Step: 399380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:03:59,533-Speed 6315.88 samples/sec Loss 5.0290 LearningRate 0.0003 Epoch: 19 Global Step: 399390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:02,781-Speed 6308.08 samples/sec Loss 4.9765 LearningRate 0.0003 Epoch: 19 Global Step: 399400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:06,028-Speed 6307.91 samples/sec Loss 4.9542 LearningRate 0.0003 Epoch: 19 Global Step: 399410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:09,275-Speed 6308.82 samples/sec Loss 4.9889 LearningRate 0.0003 Epoch: 19 Global Step: 399420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:12,523-Speed 6307.55 samples/sec Loss 5.0092 LearningRate 0.0003 Epoch: 19 Global Step: 399430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:15,770-Speed 6309.21 samples/sec Loss 5.0157 LearningRate 0.0003 Epoch: 19 Global Step: 399440 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:04:19,015-Speed 6311.34 samples/sec Loss 4.9968 LearningRate 0.0003 Epoch: 19 Global Step: 399450 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:04:22,257-Speed 6319.59 samples/sec Loss 5.0273 LearningRate 0.0003 Epoch: 19 Global Step: 399460 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:04:25,491-Speed 6334.61 samples/sec Loss 5.0369 LearningRate 0.0003 Epoch: 19 Global Step: 399470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:28,738-Speed 6309.38 samples/sec Loss 5.0056 LearningRate 0.0003 Epoch: 19 Global Step: 399480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:31,979-Speed 6320.16 samples/sec Loss 5.1432 LearningRate 0.0003 Epoch: 19 Global Step: 399490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:04:35,223-Speed 6313.91 samples/sec Loss 4.9830 LearningRate 0.0003 Epoch: 19 Global Step: 399500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:38,467-Speed 6315.33 samples/sec Loss 4.9842 LearningRate 0.0003 Epoch: 19 Global Step: 399510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:41,710-Speed 6315.62 samples/sec Loss 4.9981 LearningRate 0.0003 Epoch: 19 Global Step: 399520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:44,970-Speed 6285.28 samples/sec Loss 4.9991 LearningRate 0.0003 Epoch: 19 Global Step: 399530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:48,218-Speed 6306.65 samples/sec Loss 4.9723 LearningRate 0.0003 Epoch: 19 Global Step: 399540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:51,462-Speed 6314.05 samples/sec Loss 5.0179 LearningRate 0.0003 Epoch: 19 Global Step: 399550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:54,708-Speed 6310.61 samples/sec Loss 4.9996 LearningRate 0.0003 Epoch: 19 Global Step: 399560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:04:57,950-Speed 6318.21 samples/sec Loss 4.9653 LearningRate 0.0003 Epoch: 19 Global Step: 399570 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:05:01,179-Speed 6344.17 samples/sec Loss 5.0054 LearningRate 0.0003 Epoch: 19 Global Step: 399580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:04,427-Speed 6307.37 samples/sec Loss 5.0154 LearningRate 0.0003 Epoch: 19 Global Step: 399590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:07,676-Speed 6304.47 samples/sec Loss 4.9427 LearningRate 0.0003 Epoch: 19 Global Step: 399600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:10,919-Speed 6316.35 samples/sec Loss 5.0493 LearningRate 0.0003 Epoch: 19 Global Step: 399610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:14,169-Speed 6304.42 
samples/sec Loss 4.9743 LearningRate 0.0003 Epoch: 19 Global Step: 399620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:17,412-Speed 6314.95 samples/sec Loss 5.0102 LearningRate 0.0003 Epoch: 19 Global Step: 399630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:20,657-Speed 6313.94 samples/sec Loss 5.0012 LearningRate 0.0003 Epoch: 19 Global Step: 399640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:23,901-Speed 6314.16 samples/sec Loss 5.0238 LearningRate 0.0003 Epoch: 19 Global Step: 399650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:27,147-Speed 6310.33 samples/sec Loss 4.9910 LearningRate 0.0003 Epoch: 19 Global Step: 399660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:30,397-Speed 6303.17 samples/sec Loss 5.0417 LearningRate 0.0003 Epoch: 19 Global Step: 399670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:33,655-Speed 6288.43 samples/sec Loss 5.0139 LearningRate 0.0003 Epoch: 19 Global Step: 399680 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:05:36,980-Speed 6161.86 samples/sec Loss 4.9115 LearningRate 0.0003 Epoch: 19 Global Step: 399690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:40,259-Speed 6246.08 samples/sec Loss 5.0360 LearningRate 0.0003 Epoch: 19 Global Step: 399700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:43,504-Speed 6313.17 samples/sec Loss 5.0443 LearningRate 0.0003 Epoch: 19 Global Step: 399710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:46,761-Speed 6288.74 samples/sec Loss 5.0437 LearningRate 0.0003 Epoch: 19 Global Step: 399720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:50,005-Speed 6316.17 samples/sec Loss 4.9695 LearningRate 0.0003 Epoch: 19 Global Step: 399730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:53,251-Speed 6309.25 samples/sec Loss 5.0507 
LearningRate 0.0003 Epoch: 19 Global Step: 399740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:56,494-Speed 6317.41 samples/sec Loss 5.0435 LearningRate 0.0003 Epoch: 19 Global Step: 399750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:05:59,738-Speed 6314.38 samples/sec Loss 5.0199 LearningRate 0.0003 Epoch: 19 Global Step: 399760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:02,991-Speed 6296.91 samples/sec Loss 5.0181 LearningRate 0.0003 Epoch: 19 Global Step: 399770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:06,237-Speed 6311.01 samples/sec Loss 5.0221 LearningRate 0.0003 Epoch: 19 Global Step: 399780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:09,483-Speed 6310.48 samples/sec Loss 4.9983 LearningRate 0.0003 Epoch: 19 Global Step: 399790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:06:12,733-Speed 6302.76 samples/sec Loss 5.0042 LearningRate 0.0003 Epoch: 19 Global Step: 399800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:06:15,963-Speed 6343.04 samples/sec Loss 5.0743 LearningRate 0.0003 Epoch: 19 Global Step: 399810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:19,214-Speed 6301.72 samples/sec Loss 5.0293 LearningRate 0.0003 Epoch: 19 Global Step: 399820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:22,459-Speed 6311.06 samples/sec Loss 4.9354 LearningRate 0.0003 Epoch: 19 Global Step: 399830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:25,705-Speed 6311.74 samples/sec Loss 4.9953 LearningRate 0.0003 Epoch: 19 Global Step: 399840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:28,946-Speed 6319.26 samples/sec Loss 5.0040 LearningRate 0.0003 Epoch: 19 Global Step: 399850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:32,193-Speed 6310.49 samples/sec Loss 4.9803 LearningRate 0.0003 Epoch: 19 
Global Step: 399860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:35,437-Speed 6313.46 samples/sec Loss 5.0056 LearningRate 0.0003 Epoch: 19 Global Step: 399870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:38,692-Speed 6293.90 samples/sec Loss 5.0167 LearningRate 0.0003 Epoch: 19 Global Step: 399880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:41,935-Speed 6317.20 samples/sec Loss 5.0470 LearningRate 0.0003 Epoch: 19 Global Step: 399890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:45,174-Speed 6324.10 samples/sec Loss 5.0515 LearningRate 0.0003 Epoch: 19 Global Step: 399900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:48,423-Speed 6305.39 samples/sec Loss 4.9774 LearningRate 0.0003 Epoch: 19 Global Step: 399910 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:06:51,652-Speed 6344.03 samples/sec Loss 5.0346 LearningRate 0.0003 Epoch: 19 Global Step: 399920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:54,903-Speed 6301.60 samples/sec Loss 4.9181 LearningRate 0.0003 Epoch: 19 Global Step: 399930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:06:58,146-Speed 6314.83 samples/sec Loss 5.0310 LearningRate 0.0003 Epoch: 19 Global Step: 399940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:01,391-Speed 6313.23 samples/sec Loss 5.0031 LearningRate 0.0003 Epoch: 19 Global Step: 399950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:04,635-Speed 6314.85 samples/sec Loss 5.0072 LearningRate 0.0003 Epoch: 19 Global Step: 399960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:07,879-Speed 6314.25 samples/sec Loss 4.9633 LearningRate 0.0003 Epoch: 19 Global Step: 399970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:11,126-Speed 6308.37 samples/sec Loss 4.9610 LearningRate 0.0003 Epoch: 19 Global Step: 399980 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:14,378-Speed 6300.34 samples/sec Loss 4.9644 LearningRate 0.0003 Epoch: 19 Global Step: 399990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:17,620-Speed 6318.88 samples/sec Loss 5.0079 LearningRate 0.0003 Epoch: 19 Global Step: 400000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:20,866-Speed 6311.27 samples/sec Loss 5.0410 LearningRate 0.0003 Epoch: 19 Global Step: 400010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:24,100-Speed 6333.98 samples/sec Loss 4.9856 LearningRate 0.0003 Epoch: 19 Global Step: 400020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:27,393-Speed 6220.26 samples/sec Loss 5.0167 LearningRate 0.0003 Epoch: 19 Global Step: 400030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:30,671-Speed 6248.12 samples/sec Loss 4.9999 LearningRate 0.0003 Epoch: 19 Global Step: 400040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:33,918-Speed 6309.89 samples/sec Loss 4.9926 LearningRate 0.0003 Epoch: 19 Global Step: 400050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:37,161-Speed 6316.65 samples/sec Loss 5.0378 LearningRate 0.0003 Epoch: 19 Global Step: 400060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:40,402-Speed 6319.63 samples/sec Loss 5.1129 LearningRate 0.0003 Epoch: 19 Global Step: 400070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:43,649-Speed 6308.88 samples/sec Loss 4.9742 LearningRate 0.0003 Epoch: 19 Global Step: 400080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:46,891-Speed 6319.34 samples/sec Loss 5.0409 LearningRate 0.0003 Epoch: 19 Global Step: 400090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:50,137-Speed 6309.89 samples/sec Loss 5.0077 LearningRate 0.0003 Epoch: 19 Global Step: 400100 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:07:53,380-Speed 6316.94 samples/sec Loss 4.9697 LearningRate 0.0003 Epoch: 19 Global Step: 400110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:07:56,631-Speed 6301.04 samples/sec Loss 5.0649 LearningRate 0.0003 Epoch: 19 Global Step: 400120 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:07:59,879-Speed 6307.70 samples/sec Loss 5.0422 LearningRate 0.0003 Epoch: 19 Global Step: 400130 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:08:03,126-Speed 6308.45 samples/sec Loss 5.0149 LearningRate 0.0003 Epoch: 19 Global Step: 400140 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:08:06,362-Speed 6330.47 samples/sec Loss 5.0830 LearningRate 0.0003 Epoch: 19 Global Step: 400150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:09,604-Speed 6317.89 samples/sec Loss 4.9583 LearningRate 0.0003 Epoch: 19 Global Step: 400160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:12,861-Speed 6289.36 samples/sec Loss 5.0669 LearningRate 0.0003 Epoch: 19 Global Step: 400170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:16,107-Speed 6312.07 samples/sec Loss 4.9978 LearningRate 0.0003 Epoch: 19 Global Step: 400180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:19,356-Speed 6304.92 samples/sec Loss 4.9492 LearningRate 0.0003 Epoch: 19 Global Step: 400190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:22,607-Speed 6300.24 samples/sec Loss 4.9575 LearningRate 0.0003 Epoch: 19 Global Step: 400200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:25,853-Speed 6311.61 samples/sec Loss 5.0073 LearningRate 0.0003 Epoch: 19 Global Step: 400210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:29,112-Speed 6284.02 samples/sec Loss 5.0273 LearningRate 0.0003 Epoch: 19 Global Step: 400220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:08:32,355-Speed 6317.78 samples/sec Loss 4.9543 LearningRate 0.0003 Epoch: 19 Global Step: 400230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:35,597-Speed 6317.25 samples/sec Loss 4.9766 LearningRate 0.0003 Epoch: 19 Global Step: 400240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:38,841-Speed 6314.73 samples/sec Loss 5.0632 LearningRate 0.0003 Epoch: 19 Global Step: 400250 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:08:42,069-Speed 6346.24 samples/sec Loss 4.9521 LearningRate 0.0003 Epoch: 19 Global Step: 400260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:45,315-Speed 6311.01 samples/sec Loss 4.9597 LearningRate 0.0003 Epoch: 19 Global Step: 400270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:48,558-Speed 6315.54 samples/sec Loss 5.0733 LearningRate 0.0003 Epoch: 19 Global Step: 400280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:51,808-Speed 6305.13 samples/sec Loss 5.0156 LearningRate 0.0003 Epoch: 19 Global Step: 400290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:55,052-Speed 6314.18 samples/sec Loss 4.9625 LearningRate 0.0003 Epoch: 19 Global Step: 400300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:08:58,299-Speed 6309.01 samples/sec Loss 5.0519 LearningRate 0.0003 Epoch: 19 Global Step: 400310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:01,545-Speed 6310.35 samples/sec Loss 4.9722 LearningRate 0.0003 Epoch: 19 Global Step: 400320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:04,803-Speed 6288.95 samples/sec Loss 5.0597 LearningRate 0.0003 Epoch: 19 Global Step: 400330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:08,047-Speed 6314.50 samples/sec Loss 4.9784 LearningRate 0.0003 Epoch: 19 Global Step: 400340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:11,294-Speed 6310.07 
samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 400350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:14,539-Speed 6312.88 samples/sec Loss 4.9762 LearningRate 0.0003 Epoch: 19 Global Step: 400360 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:09:17,784-Speed 6312.30 samples/sec Loss 5.0210 LearningRate 0.0003 Epoch: 19 Global Step: 400370 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:09:21,029-Speed 6311.98 samples/sec Loss 5.0139 LearningRate 0.0003 Epoch: 19 Global Step: 400380 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:09:24,283-Speed 6296.12 samples/sec Loss 5.0425 LearningRate 0.0003 Epoch: 19 Global Step: 400390 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:09:27,544-Speed 6280.63 samples/sec Loss 4.9984 LearningRate 0.0003 Epoch: 19 Global Step: 400400 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:09:30,835-Speed 6224.32 samples/sec Loss 5.0005 LearningRate 0.0003 Epoch: 19 Global Step: 400410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:34,079-Speed 6315.23 samples/sec Loss 4.9109 LearningRate 0.0003 Epoch: 19 Global Step: 400420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:37,318-Speed 6323.69 samples/sec Loss 5.0008 LearningRate 0.0003 Epoch: 19 Global Step: 400430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:40,565-Speed 6310.07 samples/sec Loss 4.9669 LearningRate 0.0003 Epoch: 19 Global Step: 400440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:43,811-Speed 6309.34 samples/sec Loss 5.0375 LearningRate 0.0003 Epoch: 19 Global Step: 400450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:47,060-Speed 6305.26 samples/sec Loss 4.9427 LearningRate 0.0003 Epoch: 19 Global Step: 400460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:50,308-Speed 6307.48 samples/sec Loss 4.9880 
LearningRate 0.0003 Epoch: 19 Global Step: 400470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:53,549-Speed 6319.50 samples/sec Loss 4.9619 LearningRate 0.0003 Epoch: 19 Global Step: 400480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:09:56,795-Speed 6311.31 samples/sec Loss 5.0006 LearningRate 0.0003 Epoch: 19 Global Step: 400490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:00,040-Speed 6313.60 samples/sec Loss 4.9885 LearningRate 0.0003 Epoch: 19 Global Step: 400500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:03,285-Speed 6313.01 samples/sec Loss 4.9452 LearningRate 0.0003 Epoch: 19 Global Step: 400510 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:10:06,528-Speed 6314.64 samples/sec Loss 4.9670 LearningRate 0.0003 Epoch: 19 Global Step: 400520 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:10:09,761-Speed 6337.44 samples/sec Loss 4.9607 LearningRate 0.0003 Epoch: 19 Global Step: 400530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:13,022-Speed 6282.08 samples/sec Loss 4.9790 LearningRate 0.0003 Epoch: 19 Global Step: 400540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:16,265-Speed 6315.83 samples/sec Loss 4.9939 LearningRate 0.0003 Epoch: 19 Global Step: 400550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:19,517-Speed 6301.23 samples/sec Loss 5.0599 LearningRate 0.0003 Epoch: 19 Global Step: 400560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:22,762-Speed 6312.50 samples/sec Loss 4.9544 LearningRate 0.0003 Epoch: 19 Global Step: 400570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:26,004-Speed 6317.81 samples/sec Loss 4.9550 LearningRate 0.0003 Epoch: 19 Global Step: 400580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:29,250-Speed 6311.48 samples/sec Loss 5.0108 LearningRate 0.0003 Epoch: 19 
Global Step: 400590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:32,493-Speed 6315.46 samples/sec Loss 4.9269 LearningRate 0.0003 Epoch: 19 Global Step: 400600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:35,737-Speed 6314.74 samples/sec Loss 4.9060 LearningRate 0.0003 Epoch: 19 Global Step: 400610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:38,980-Speed 6317.27 samples/sec Loss 4.9840 LearningRate 0.0003 Epoch: 19 Global Step: 400620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:42,227-Speed 6308.15 samples/sec Loss 4.9909 LearningRate 0.0003 Epoch: 19 Global Step: 400630 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:10:45,470-Speed 6316.52 samples/sec Loss 4.9921 LearningRate 0.0003 Epoch: 19 Global Step: 400640 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:10:48,700-Speed 6342.90 samples/sec Loss 5.0036 LearningRate 0.0003 Epoch: 19 Global Step: 400650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:51,946-Speed 6310.28 samples/sec Loss 4.9908 LearningRate 0.0003 Epoch: 19 Global Step: 400660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:55,190-Speed 6314.19 samples/sec Loss 4.9932 LearningRate 0.0003 Epoch: 19 Global Step: 400670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:10:58,438-Speed 6306.25 samples/sec Loss 5.0333 LearningRate 0.0003 Epoch: 19 Global Step: 400680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:01,684-Speed 6312.08 samples/sec Loss 4.9693 LearningRate 0.0003 Epoch: 19 Global Step: 400690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:04,929-Speed 6312.73 samples/sec Loss 4.9951 LearningRate 0.0003 Epoch: 19 Global Step: 400700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:08,176-Speed 6307.64 samples/sec Loss 4.9448 LearningRate 0.0003 Epoch: 19 Global Step: 400710 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:11,420-Speed 6315.70 samples/sec Loss 4.9651 LearningRate 0.0003 Epoch: 19 Global Step: 400720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:14,668-Speed 6305.84 samples/sec Loss 5.0347 LearningRate 0.0003 Epoch: 19 Global Step: 400730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:17,916-Speed 6308.59 samples/sec Loss 4.9919 LearningRate 0.0003 Epoch: 19 Global Step: 400740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:21,162-Speed 6310.27 samples/sec Loss 5.0263 LearningRate 0.0003 Epoch: 19 Global Step: 400750 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:11:24,412-Speed 6303.91 samples/sec Loss 5.0525 LearningRate 0.0003 Epoch: 19 Global Step: 400760 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:11:27,657-Speed 6313.16 samples/sec Loss 5.0257 LearningRate 0.0003 Epoch: 19 Global Step: 400770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:11:30,889-Speed 6336.68 samples/sec Loss 4.9838 LearningRate 0.0003 Epoch: 19 Global Step: 400780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:34,135-Speed 6310.45 samples/sec Loss 4.9537 LearningRate 0.0003 Epoch: 19 Global Step: 400790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:37,381-Speed 6310.90 samples/sec Loss 5.0046 LearningRate 0.0003 Epoch: 19 Global Step: 400800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:40,627-Speed 6311.59 samples/sec Loss 4.9995 LearningRate 0.0003 Epoch: 19 Global Step: 400810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:43,871-Speed 6314.47 samples/sec Loss 5.0046 LearningRate 0.0003 Epoch: 19 Global Step: 400820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:47,139-Speed 6268.20 samples/sec Loss 4.8976 LearningRate 0.0003 Epoch: 19 Global Step: 400830 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:11:50,460-Speed 6168.39 samples/sec Loss 5.0437 LearningRate 0.0003 Epoch: 19 Global Step: 400840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:53,706-Speed 6309.80 samples/sec Loss 5.0063 LearningRate 0.0003 Epoch: 19 Global Step: 400850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:11:56,950-Speed 6314.38 samples/sec Loss 5.0016 LearningRate 0.0003 Epoch: 19 Global Step: 400860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:00,193-Speed 6316.21 samples/sec Loss 5.0057 LearningRate 0.0003 Epoch: 19 Global Step: 400870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:03,440-Speed 6309.25 samples/sec Loss 5.0241 LearningRate 0.0003 Epoch: 19 Global Step: 400880 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:06,684-Speed 6313.99 samples/sec Loss 5.0068 LearningRate 0.0003 Epoch: 19 Global Step: 400890 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:09,930-Speed 6312.01 samples/sec Loss 5.0657 LearningRate 0.0003 Epoch: 19 Global Step: 400900 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:13,181-Speed 6299.90 samples/sec Loss 4.9859 LearningRate 0.0003 Epoch: 19 Global Step: 400910 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:16,417-Speed 6331.38 samples/sec Loss 4.9793 LearningRate 0.0003 Epoch: 19 Global Step: 400920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:19,660-Speed 6316.28 samples/sec Loss 4.9368 LearningRate 0.0003 Epoch: 19 Global Step: 400930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:22,910-Speed 6302.81 samples/sec Loss 4.9711 LearningRate 0.0003 Epoch: 19 Global Step: 400940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:26,160-Speed 6302.77 samples/sec Loss 5.0139 LearningRate 0.0003 Epoch: 19 Global Step: 400950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:12:29,406-Speed 6312.18 samples/sec Loss 4.9544 LearningRate 0.0003 Epoch: 19 Global Step: 400960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:32,649-Speed 6316.01 samples/sec Loss 5.0017 LearningRate 0.0003 Epoch: 19 Global Step: 400970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:35,895-Speed 6309.92 samples/sec Loss 4.8932 LearningRate 0.0003 Epoch: 19 Global Step: 400980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:39,140-Speed 6314.09 samples/sec Loss 5.0455 LearningRate 0.0003 Epoch: 19 Global Step: 400990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:42,384-Speed 6312.88 samples/sec Loss 4.9750 LearningRate 0.0003 Epoch: 19 Global Step: 401000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:45,628-Speed 6315.00 samples/sec Loss 5.0227 LearningRate 0.0003 Epoch: 19 Global Step: 401010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:48,871-Speed 6317.96 samples/sec Loss 4.9523 LearningRate 0.0003 Epoch: 19 Global Step: 401020 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:52,121-Speed 6302.40 samples/sec Loss 4.9769 LearningRate 0.0003 Epoch: 19 Global Step: 401030 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:12:55,360-Speed 6324.83 samples/sec Loss 4.9383 LearningRate 0.0003 Epoch: 19 Global Step: 401040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:12:58,670-Speed 6188.18 samples/sec Loss 5.0046 LearningRate 0.0003 Epoch: 19 Global Step: 401050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:01,918-Speed 6306.70 samples/sec Loss 5.0225 LearningRate 0.0003 Epoch: 19 Global Step: 401060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:05,161-Speed 6316.86 samples/sec Loss 5.0329 LearningRate 0.0003 Epoch: 19 Global Step: 401070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:08,404-Speed 6315.29 
samples/sec Loss 4.9747 LearningRate 0.0003 Epoch: 19 Global Step: 401080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:11,650-Speed 6310.22 samples/sec Loss 4.9030 LearningRate 0.0003 Epoch: 19 Global Step: 401090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:14,904-Speed 6295.78 samples/sec Loss 4.9967 LearningRate 0.0003 Epoch: 19 Global Step: 401100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:18,159-Speed 6293.47 samples/sec Loss 5.0256 LearningRate 0.0003 Epoch: 19 Global Step: 401110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:21,405-Speed 6310.21 samples/sec Loss 4.9658 LearningRate 0.0003 Epoch: 19 Global Step: 401120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:24,658-Speed 6298.03 samples/sec Loss 5.0024 LearningRate 0.0003 Epoch: 19 Global Step: 401130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:27,905-Speed 6309.07 samples/sec Loss 5.0181 LearningRate 0.0003 Epoch: 19 Global Step: 401140 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:13:31,135-Speed 6341.47 samples/sec Loss 5.0869 LearningRate 0.0003 Epoch: 19 Global Step: 401150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:34,378-Speed 6315.98 samples/sec Loss 5.0208 LearningRate 0.0003 Epoch: 19 Global Step: 401160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:37,624-Speed 6311.81 samples/sec Loss 4.9761 LearningRate 0.0003 Epoch: 19 Global Step: 401170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:40,870-Speed 6310.63 samples/sec Loss 5.0025 LearningRate 0.0003 Epoch: 19 Global Step: 401180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:44,117-Speed 6309.63 samples/sec Loss 5.0254 LearningRate 0.0003 Epoch: 19 Global Step: 401190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:47,362-Speed 6312.29 samples/sec Loss 4.9850 
LearningRate 0.0003 Epoch: 19 Global Step: 401200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:50,606-Speed 6313.53 samples/sec Loss 5.0268 LearningRate 0.0003 Epoch: 19 Global Step: 401210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:53,849-Speed 6316.83 samples/sec Loss 5.0979 LearningRate 0.0003 Epoch: 19 Global Step: 401220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:13:57,101-Speed 6298.92 samples/sec Loss 5.0093 LearningRate 0.0003 Epoch: 19 Global Step: 401230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:00,348-Speed 6310.22 samples/sec Loss 5.0284 LearningRate 0.0003 Epoch: 19 Global Step: 401240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:03,581-Speed 6335.21 samples/sec Loss 4.9219 LearningRate 0.0003 Epoch: 19 Global Step: 401250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:06,830-Speed 6305.31 samples/sec Loss 4.9532 LearningRate 0.0003 Epoch: 19 Global Step: 401260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:10,075-Speed 6312.82 samples/sec Loss 5.0376 LearningRate 0.0003 Epoch: 19 Global Step: 401270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:13,335-Speed 6283.43 samples/sec Loss 4.9814 LearningRate 0.0003 Epoch: 19 Global Step: 401280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:16,692-Speed 6101.34 samples/sec Loss 4.8764 LearningRate 0.0003 Epoch: 19 Global Step: 401290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:19,937-Speed 6311.64 samples/sec Loss 5.0086 LearningRate 0.0003 Epoch: 19 Global Step: 401300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:23,180-Speed 6318.40 samples/sec Loss 5.0224 LearningRate 0.0003 Epoch: 19 Global Step: 401310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:26,422-Speed 6318.09 samples/sec Loss 4.9556 LearningRate 0.0003 Epoch: 19 
Global Step: 401320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:29,670-Speed 6307.19 samples/sec Loss 4.9749 LearningRate 0.0003 Epoch: 19 Global Step: 401330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:32,913-Speed 6316.75 samples/sec Loss 4.9822 LearningRate 0.0003 Epoch: 19 Global Step: 401340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:36,153-Speed 6321.84 samples/sec Loss 4.9948 LearningRate 0.0003 Epoch: 19 Global Step: 401350 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:14:39,397-Speed 6313.29 samples/sec Loss 5.0235 LearningRate 0.0003 Epoch: 19 Global Step: 401360 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:14:42,640-Speed 6317.40 samples/sec Loss 5.0070 LearningRate 0.0003 Epoch: 19 Global Step: 401370 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:14:45,886-Speed 6311.15 samples/sec Loss 5.0671 LearningRate 0.0003 Epoch: 19 Global Step: 401380 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:14:49,122-Speed 6329.71 samples/sec Loss 4.9779 LearningRate 0.0003 Epoch: 19 Global Step: 401390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:52,367-Speed 6313.93 samples/sec Loss 4.9801 LearningRate 0.0003 Epoch: 19 Global Step: 401400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:55,612-Speed 6312.41 samples/sec Loss 4.9437 LearningRate 0.0003 Epoch: 19 Global Step: 401410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:14:58,854-Speed 6319.23 samples/sec Loss 5.0319 LearningRate 0.0003 Epoch: 19 Global Step: 401420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:02,099-Speed 6312.10 samples/sec Loss 4.9890 LearningRate 0.0003 Epoch: 19 Global Step: 401430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:05,340-Speed 6319.87 samples/sec Loss 5.0647 LearningRate 0.0003 Epoch: 19 Global Step: 401440 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:08,590-Speed 6302.63 samples/sec Loss 4.9528 LearningRate 0.0003 Epoch: 19 Global Step: 401450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:11,836-Speed 6311.60 samples/sec Loss 4.9811 LearningRate 0.0003 Epoch: 19 Global Step: 401460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:15,081-Speed 6312.41 samples/sec Loss 5.0972 LearningRate 0.0003 Epoch: 19 Global Step: 401470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:18,326-Speed 6312.56 samples/sec Loss 4.9825 LearningRate 0.0003 Epoch: 19 Global Step: 401480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:21,569-Speed 6316.17 samples/sec Loss 4.9671 LearningRate 0.0003 Epoch: 19 Global Step: 401490 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:15:24,805-Speed 6329.93 samples/sec Loss 4.8796 LearningRate 0.0003 Epoch: 19 Global Step: 401500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:28,047-Speed 6319.16 samples/sec Loss 4.9515 LearningRate 0.0003 Epoch: 19 Global Step: 401510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:31,296-Speed 6304.40 samples/sec Loss 5.0201 LearningRate 0.0003 Epoch: 19 Global Step: 401520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:34,547-Speed 6302.22 samples/sec Loss 4.9048 LearningRate 0.0003 Epoch: 19 Global Step: 401530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:37,791-Speed 6313.01 samples/sec Loss 5.0141 LearningRate 0.0003 Epoch: 19 Global Step: 401540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:41,036-Speed 6313.41 samples/sec Loss 5.0147 LearningRate 0.0003 Epoch: 19 Global Step: 401550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:44,283-Speed 6308.92 samples/sec Loss 4.9414 LearningRate 0.0003 Epoch: 19 Global Step: 401560 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:15:47,529-Speed 6309.65 samples/sec Loss 4.9604 LearningRate 0.0003 Epoch: 19 Global Step: 401570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:50,775-Speed 6310.97 samples/sec Loss 4.9979 LearningRate 0.0003 Epoch: 19 Global Step: 401580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:54,018-Speed 6318.25 samples/sec Loss 4.9552 LearningRate 0.0003 Epoch: 19 Global Step: 401590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:15:57,250-Speed 6337.67 samples/sec Loss 5.0277 LearningRate 0.0003 Epoch: 19 Global Step: 401600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:00,518-Speed 6268.74 samples/sec Loss 4.9126 LearningRate 0.0003 Epoch: 19 Global Step: 401610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:03,765-Speed 6310.06 samples/sec Loss 4.9731 LearningRate 0.0003 Epoch: 19 Global Step: 401620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:07,010-Speed 6311.74 samples/sec Loss 4.9928 LearningRate 0.0003 Epoch: 19 Global Step: 401630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:10,252-Speed 6318.56 samples/sec Loss 4.9356 LearningRate 0.0003 Epoch: 19 Global Step: 401640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:13,512-Speed 6284.72 samples/sec Loss 5.0075 LearningRate 0.0003 Epoch: 19 Global Step: 401650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:16,754-Speed 6317.41 samples/sec Loss 5.0362 LearningRate 0.0003 Epoch: 19 Global Step: 401660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:20,001-Speed 6308.12 samples/sec Loss 5.0291 LearningRate 0.0003 Epoch: 19 Global Step: 401670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:23,246-Speed 6314.12 samples/sec Loss 4.9776 LearningRate 0.0003 Epoch: 19 Global Step: 401680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:16:26,492-Speed 6308.95 samples/sec Loss 5.0150 LearningRate 0.0003 Epoch: 19 Global Step: 401690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:29,736-Speed 6316.17 samples/sec Loss 4.9953 LearningRate 0.0003 Epoch: 19 Global Step: 401700 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:16:32,980-Speed 6313.41 samples/sec Loss 4.9809 LearningRate 0.0003 Epoch: 19 Global Step: 401710 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:16:36,307-Speed 6158.25 samples/sec Loss 5.0120 LearningRate 0.0003 Epoch: 19 Global Step: 401720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:39,556-Speed 6304.22 samples/sec Loss 5.0124 LearningRate 0.0003 Epoch: 19 Global Step: 401730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:42,804-Speed 6307.63 samples/sec Loss 4.9882 LearningRate 0.0003 Epoch: 19 Global Step: 401740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:46,048-Speed 6313.77 samples/sec Loss 4.9928 LearningRate 0.0003 Epoch: 19 Global Step: 401750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:49,292-Speed 6315.18 samples/sec Loss 4.9686 LearningRate 0.0003 Epoch: 19 Global Step: 401760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:52,559-Speed 6268.21 samples/sec Loss 4.9951 LearningRate 0.0003 Epoch: 19 Global Step: 401770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:55,802-Speed 6317.32 samples/sec Loss 4.9964 LearningRate 0.0003 Epoch: 19 Global Step: 401780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:16:59,043-Speed 6321.05 samples/sec Loss 4.9392 LearningRate 0.0003 Epoch: 19 Global Step: 401790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:02,288-Speed 6312.83 samples/sec Loss 5.0020 LearningRate 0.0003 Epoch: 19 Global Step: 401800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:05,534-Speed 6310.41 
samples/sec Loss 4.9846 LearningRate 0.0003 Epoch: 19 Global Step: 401810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:08,787-Speed 6298.09 samples/sec Loss 5.0241 LearningRate 0.0003 Epoch: 19 Global Step: 401820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:17:12,032-Speed 6312.68 samples/sec Loss 4.9540 LearningRate 0.0003 Epoch: 19 Global Step: 401830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:17:15,265-Speed 6335.03 samples/sec Loss 5.0130 LearningRate 0.0003 Epoch: 19 Global Step: 401840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:18,508-Speed 6317.63 samples/sec Loss 4.9561 LearningRate 0.0003 Epoch: 19 Global Step: 401850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:21,763-Speed 6293.57 samples/sec Loss 4.9961 LearningRate 0.0003 Epoch: 19 Global Step: 401860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:25,010-Speed 6307.77 samples/sec Loss 4.9984 LearningRate 0.0003 Epoch: 19 Global Step: 401870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:28,253-Speed 6316.82 samples/sec Loss 4.9753 LearningRate 0.0003 Epoch: 19 Global Step: 401880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:31,496-Speed 6316.14 samples/sec Loss 5.0595 LearningRate 0.0003 Epoch: 19 Global Step: 401890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:34,741-Speed 6313.25 samples/sec Loss 5.0646 LearningRate 0.0003 Epoch: 19 Global Step: 401900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:37,988-Speed 6309.36 samples/sec Loss 4.9752 LearningRate 0.0003 Epoch: 19 Global Step: 401910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:41,237-Speed 6304.19 samples/sec Loss 5.0078 LearningRate 0.0003 Epoch: 19 Global Step: 401920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:44,479-Speed 6319.69 samples/sec Loss 4.9133 
LearningRate 0.0003 Epoch: 19 Global Step: 401930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:17:47,726-Speed 6307.61 samples/sec Loss 4.9495 LearningRate 0.0003 Epoch: 19 Global Step: 401940 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:17:50,974-Speed 6306.22 samples/sec Loss 4.9584 LearningRate 0.0003 Epoch: 19 Global Step: 401950 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:17:54,224-Speed 6303.70 samples/sec Loss 4.9376 LearningRate 0.0003 Epoch: 19 Global Step: 401960 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:17:57,458-Speed 6334.97 samples/sec Loss 4.9719 LearningRate 0.0003 Epoch: 19 Global Step: 401970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:00,703-Speed 6312.04 samples/sec Loss 4.9875 LearningRate 0.0003 Epoch: 19 Global Step: 401980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:03,957-Speed 6295.65 samples/sec Loss 4.9731 LearningRate 0.0003 Epoch: 19 Global Step: 401990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:07,207-Speed 6303.03 samples/sec Loss 5.0430 LearningRate 0.0003 Epoch: 19 Global Step: 402000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:10,463-Speed 6290.82 samples/sec Loss 5.0569 LearningRate 0.0003 Epoch: 19 Global Step: 402010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:13,708-Speed 6312.60 samples/sec Loss 4.9481 LearningRate 0.0003 Epoch: 19 Global Step: 402020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:16,961-Speed 6297.64 samples/sec Loss 4.9389 LearningRate 0.0003 Epoch: 19 Global Step: 402030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:20,207-Speed 6310.60 samples/sec Loss 4.9830 LearningRate 0.0003 Epoch: 19 Global Step: 402040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:23,459-Speed 6300.09 samples/sec Loss 5.0636 LearningRate 0.0003 Epoch: 19 
Global Step: 402050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:26,708-Speed 6303.88 samples/sec Loss 4.9961 LearningRate 0.0003 Epoch: 19 Global Step: 402060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:30,021-Speed 6182.92 samples/sec Loss 4.9305 LearningRate 0.0003 Epoch: 19 Global Step: 402070 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:18:33,271-Speed 6302.92 samples/sec Loss 4.9878 LearningRate 0.0003 Epoch: 19 Global Step: 402080 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:18:36,518-Speed 6310.63 samples/sec Loss 5.0103 LearningRate 0.0003 Epoch: 19 Global Step: 402090 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:18:39,855-Speed 6138.25 samples/sec Loss 4.9285 LearningRate 0.0003 Epoch: 19 Global Step: 402100 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:18:43,129-Speed 6256.33 samples/sec Loss 4.9749 LearningRate 0.0003 Epoch: 19 Global Step: 402110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:46,374-Speed 6313.05 samples/sec Loss 4.9391 LearningRate 0.0003 Epoch: 19 Global Step: 402120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:49,620-Speed 6310.65 samples/sec Loss 4.9672 LearningRate 0.0003 Epoch: 19 Global Step: 402130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:52,865-Speed 6312.29 samples/sec Loss 5.0336 LearningRate 0.0003 Epoch: 19 Global Step: 402140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:56,108-Speed 6316.50 samples/sec Loss 5.0715 LearningRate 0.0003 Epoch: 19 Global Step: 402150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:18:59,354-Speed 6310.19 samples/sec Loss 5.0059 LearningRate 0.0003 Epoch: 19 Global Step: 402160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:02,612-Speed 6287.82 samples/sec Loss 5.0057 LearningRate 0.0003 Epoch: 19 Global Step: 402170 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:05,862-Speed 6302.83 samples/sec Loss 4.9879 LearningRate 0.0003 Epoch: 19 Global Step: 402180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:09,106-Speed 6313.70 samples/sec Loss 5.0383 LearningRate 0.0003 Epoch: 19 Global Step: 402190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:12,350-Speed 6315.27 samples/sec Loss 4.9504 LearningRate 0.0003 Epoch: 19 Global Step: 402200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:15,582-Speed 6338.84 samples/sec Loss 4.9972 LearningRate 0.0003 Epoch: 19 Global Step: 402210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:18,821-Speed 6323.34 samples/sec Loss 5.0617 LearningRate 0.0003 Epoch: 19 Global Step: 402220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:22,071-Speed 6302.68 samples/sec Loss 4.9867 LearningRate 0.0003 Epoch: 19 Global Step: 402230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:25,318-Speed 6309.43 samples/sec Loss 4.9861 LearningRate 0.0003 Epoch: 19 Global Step: 402240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:28,566-Speed 6307.74 samples/sec Loss 4.9379 LearningRate 0.0003 Epoch: 19 Global Step: 402250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:31,811-Speed 6313.23 samples/sec Loss 4.8777 LearningRate 0.0003 Epoch: 19 Global Step: 402260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:35,054-Speed 6316.46 samples/sec Loss 4.9819 LearningRate 0.0003 Epoch: 19 Global Step: 402270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:38,363-Speed 6189.33 samples/sec Loss 5.0103 LearningRate 0.0003 Epoch: 19 Global Step: 402280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:41,605-Speed 6320.36 samples/sec Loss 4.9051 LearningRate 0.0003 Epoch: 19 Global Step: 402290 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:19:44,848-Speed 6316.23 samples/sec Loss 4.9469 LearningRate 0.0003 Epoch: 19 Global Step: 402300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:48,089-Speed 6321.09 samples/sec Loss 5.0147 LearningRate 0.0003 Epoch: 19 Global Step: 402310 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:19:51,315-Speed 6348.85 samples/sec Loss 4.9325 LearningRate 0.0003 Epoch: 19 Global Step: 402320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:54,557-Speed 6318.12 samples/sec Loss 5.0189 LearningRate 0.0003 Epoch: 19 Global Step: 402330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:19:57,798-Speed 6320.67 samples/sec Loss 5.0148 LearningRate 0.0003 Epoch: 19 Global Step: 402340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:01,053-Speed 6292.53 samples/sec Loss 5.0273 LearningRate 0.0003 Epoch: 19 Global Step: 402350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:04,333-Speed 6245.52 samples/sec Loss 4.9906 LearningRate 0.0003 Epoch: 19 Global Step: 402360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:07,578-Speed 6312.75 samples/sec Loss 5.0435 LearningRate 0.0003 Epoch: 19 Global Step: 402370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:10,821-Speed 6316.09 samples/sec Loss 5.0032 LearningRate 0.0003 Epoch: 19 Global Step: 402380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:14,065-Speed 6316.33 samples/sec Loss 5.0228 LearningRate 0.0003 Epoch: 19 Global Step: 402390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:17,332-Speed 6269.00 samples/sec Loss 4.9943 LearningRate 0.0003 Epoch: 19 Global Step: 402400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:20,599-Speed 6270.13 samples/sec Loss 4.9640 LearningRate 0.0003 Epoch: 19 Global Step: 402410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:20:23,841-Speed 6319.41 samples/sec Loss 4.9210 LearningRate 0.0003 Epoch: 19 Global Step: 402420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:27,087-Speed 6309.24 samples/sec Loss 4.9354 LearningRate 0.0003 Epoch: 19 Global Step: 402430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:30,335-Speed 6308.08 samples/sec Loss 4.9843 LearningRate 0.0003 Epoch: 19 Global Step: 402440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:33,583-Speed 6307.19 samples/sec Loss 4.9665 LearningRate 0.0003 Epoch: 19 Global Step: 402450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:36,826-Speed 6315.70 samples/sec Loss 4.9209 LearningRate 0.0003 Epoch: 19 Global Step: 402460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:40,076-Speed 6304.19 samples/sec Loss 4.9454 LearningRate 0.0003 Epoch: 19 Global Step: 402470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:43,318-Speed 6318.57 samples/sec Loss 4.9795 LearningRate 0.0003 Epoch: 19 Global Step: 402480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:46,563-Speed 6312.34 samples/sec Loss 4.9214 LearningRate 0.0003 Epoch: 19 Global Step: 402490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:49,806-Speed 6316.84 samples/sec Loss 4.9479 LearningRate 0.0003 Epoch: 19 Global Step: 402500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:53,046-Speed 6321.49 samples/sec Loss 4.9976 LearningRate 0.0003 Epoch: 19 Global Step: 402510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:20:56,290-Speed 6315.51 samples/sec Loss 4.9761 LearningRate 0.0003 Epoch: 19 Global Step: 402520 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:20:59,532-Speed 6318.45 samples/sec Loss 4.9045 LearningRate 0.0003 Epoch: 19 Global Step: 402530 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:21:02,780-Speed 6305.58 
samples/sec Loss 4.9191 LearningRate 0.0003 Epoch: 19 Global Step: 402540 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:21:06,024-Speed 6314.94 samples/sec Loss 4.9761 LearningRate 0.0003 Epoch: 19 Global Step: 402550 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:21:09,256-Speed 6338.65 samples/sec Loss 4.9860 LearningRate 0.0003 Epoch: 19 Global Step: 402560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:12,501-Speed 6311.44 samples/sec Loss 4.9742 LearningRate 0.0003 Epoch: 19 Global Step: 402570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:15,745-Speed 6315.10 samples/sec Loss 4.9938 LearningRate 0.0003 Epoch: 19 Global Step: 402580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:18,991-Speed 6310.85 samples/sec Loss 5.0080 LearningRate 0.0003 Epoch: 19 Global Step: 402590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:22,238-Speed 6310.19 samples/sec Loss 5.0304 LearningRate 0.0003 Epoch: 19 Global Step: 402600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:25,487-Speed 6304.21 samples/sec Loss 5.0134 LearningRate 0.0003 Epoch: 19 Global Step: 402610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:28,731-Speed 6314.11 samples/sec Loss 4.9506 LearningRate 0.0003 Epoch: 19 Global Step: 402620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:31,972-Speed 6320.11 samples/sec Loss 5.0251 LearningRate 0.0003 Epoch: 19 Global Step: 402630 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:35,215-Speed 6316.62 samples/sec Loss 4.9725 LearningRate 0.0003 Epoch: 19 Global Step: 402640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:38,462-Speed 6309.44 samples/sec Loss 4.9543 LearningRate 0.0003 Epoch: 19 Global Step: 402650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:41,709-Speed 6307.50 samples/sec Loss 4.9445 
LearningRate 0.0003 Epoch: 19 Global Step: 402660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:21:44,943-Speed 6335.00 samples/sec Loss 4.8692 LearningRate 0.0003 Epoch: 19 Global Step: 402670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:48,222-Speed 6247.52 samples/sec Loss 4.9578 LearningRate 0.0003 Epoch: 19 Global Step: 402680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:51,468-Speed 6310.95 samples/sec Loss 5.0110 LearningRate 0.0003 Epoch: 19 Global Step: 402690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:54,718-Speed 6303.15 samples/sec Loss 5.0453 LearningRate 0.0003 Epoch: 19 Global Step: 402700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:21:57,962-Speed 6315.09 samples/sec Loss 5.0094 LearningRate 0.0003 Epoch: 19 Global Step: 402710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:01,210-Speed 6305.46 samples/sec Loss 4.9732 LearningRate 0.0003 Epoch: 19 Global Step: 402720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:04,454-Speed 6315.17 samples/sec Loss 5.0509 LearningRate 0.0003 Epoch: 19 Global Step: 402730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:07,698-Speed 6315.28 samples/sec Loss 4.9773 LearningRate 0.0003 Epoch: 19 Global Step: 402740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:10,944-Speed 6310.30 samples/sec Loss 4.9866 LearningRate 0.0003 Epoch: 19 Global Step: 402750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:14,188-Speed 6315.25 samples/sec Loss 4.9547 LearningRate 0.0003 Epoch: 19 Global Step: 402760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:17,418-Speed 6340.51 samples/sec Loss 4.9957 LearningRate 0.0003 Epoch: 19 Global Step: 402770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:20,666-Speed 6307.49 samples/sec Loss 4.9737 LearningRate 0.0003 Epoch: 19 
Global Step: 402780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:23,915-Speed 6305.80 samples/sec Loss 4.9920 LearningRate 0.0003 Epoch: 19 Global Step: 402790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:27,163-Speed 6305.63 samples/sec Loss 4.9203 LearningRate 0.0003 Epoch: 19 Global Step: 402800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:30,408-Speed 6311.95 samples/sec Loss 5.0447 LearningRate 0.0003 Epoch: 19 Global Step: 402810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:33,650-Speed 6320.33 samples/sec Loss 4.9552 LearningRate 0.0003 Epoch: 19 Global Step: 402820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:36,894-Speed 6313.33 samples/sec Loss 4.9317 LearningRate 0.0003 Epoch: 19 Global Step: 402830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:40,137-Speed 6317.39 samples/sec Loss 4.9632 LearningRate 0.0003 Epoch: 19 Global Step: 402840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:43,381-Speed 6314.42 samples/sec Loss 5.0501 LearningRate 0.0003 Epoch: 19 Global Step: 402850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:46,625-Speed 6314.45 samples/sec Loss 4.9716 LearningRate 0.0003 Epoch: 19 Global Step: 402860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:22:49,869-Speed 6315.03 samples/sec Loss 4.9492 LearningRate 0.0003 Epoch: 19 Global Step: 402870 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:22:53,115-Speed 6310.84 samples/sec Loss 4.9810 LearningRate 0.0003 Epoch: 19 Global Step: 402880 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:22:56,358-Speed 6316.08 samples/sec Loss 5.0189 LearningRate 0.0003 Epoch: 19 Global Step: 402890 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:22:59,605-Speed 6308.60 samples/sec Loss 4.9890 LearningRate 0.0003 Epoch: 19 Global Step: 402900 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 04:23:02,851-Speed 6312.34 samples/sec Loss 4.8855 LearningRate 0.0003 Epoch: 19 Global Step: 402910 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:23:06,080-Speed 6342.72 samples/sec Loss 5.0051 LearningRate 0.0003 Epoch: 19 Global Step: 402920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:09,342-Speed 6281.09 samples/sec Loss 4.9623 LearningRate 0.0003 Epoch: 19 Global Step: 402930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:12,592-Speed 6302.67 samples/sec Loss 5.0019 LearningRate 0.0003 Epoch: 19 Global Step: 402940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:15,837-Speed 6313.02 samples/sec Loss 5.0407 LearningRate 0.0003 Epoch: 19 Global Step: 402950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:19,083-Speed 6308.97 samples/sec Loss 4.9497 LearningRate 0.0003 Epoch: 19 Global Step: 402960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:22,331-Speed 6308.27 samples/sec Loss 4.9642 LearningRate 0.0003 Epoch: 19 Global Step: 402970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:25,586-Speed 6293.55 samples/sec Loss 4.9552 LearningRate 0.0003 Epoch: 19 Global Step: 402980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:28,838-Speed 6297.60 samples/sec Loss 5.0141 LearningRate 0.0003 Epoch: 19 Global Step: 402990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:32,083-Speed 6312.75 samples/sec Loss 4.9475 LearningRate 0.0003 Epoch: 19 Global Step: 403000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:35,327-Speed 6314.30 samples/sec Loss 5.0078 LearningRate 0.0003 Epoch: 19 Global Step: 403010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:38,560-Speed 6337.14 samples/sec Loss 4.9930 LearningRate 0.0003 Epoch: 19 Global Step: 403020 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:23:41,805-Speed 6312.33 samples/sec Loss 4.9787 LearningRate 0.0003 Epoch: 19 Global Step: 403030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:45,052-Speed 6308.17 samples/sec Loss 4.8963 LearningRate 0.0003 Epoch: 19 Global Step: 403040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:48,294-Speed 6318.61 samples/sec Loss 4.9462 LearningRate 0.0003 Epoch: 19 Global Step: 403050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:51,567-Speed 6260.01 samples/sec Loss 5.0060 LearningRate 0.0003 Epoch: 19 Global Step: 403060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:54,810-Speed 6315.47 samples/sec Loss 4.9635 LearningRate 0.0003 Epoch: 19 Global Step: 403070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:23:58,058-Speed 6306.32 samples/sec Loss 4.9872 LearningRate 0.0003 Epoch: 19 Global Step: 403080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:01,306-Speed 6306.40 samples/sec Loss 4.9425 LearningRate 0.0003 Epoch: 19 Global Step: 403090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:04,565-Speed 6286.44 samples/sec Loss 4.9572 LearningRate 0.0003 Epoch: 19 Global Step: 403100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:07,808-Speed 6317.55 samples/sec Loss 5.0045 LearningRate 0.0003 Epoch: 19 Global Step: 403110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:11,055-Speed 6309.26 samples/sec Loss 4.9755 LearningRate 0.0003 Epoch: 19 Global Step: 403120 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:24:14,298-Speed 6316.07 samples/sec Loss 4.9295 LearningRate 0.0003 Epoch: 19 Global Step: 403130 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:24:17,547-Speed 6305.42 samples/sec Loss 4.9993 LearningRate 0.0003 Epoch: 19 Global Step: 403140 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 
04:24:20,792-Speed 6313.20 samples/sec Loss 5.0228 LearningRate 0.0003 Epoch: 19 Global Step: 403150 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:24:24,037-Speed 6312.08 samples/sec Loss 4.9000 LearningRate 0.0003 Epoch: 19 Global Step: 403160 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:24:27,286-Speed 6303.83 samples/sec Loss 5.0005 LearningRate 0.0003 Epoch: 19 Global Step: 403170 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:24:30,516-Speed 6342.12 samples/sec Loss 4.9924 LearningRate 0.0003 Epoch: 19 Global Step: 403180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:33,764-Speed 6307.80 samples/sec Loss 4.9246 LearningRate 0.0003 Epoch: 19 Global Step: 403190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:37,012-Speed 6306.95 samples/sec Loss 4.9483 LearningRate 0.0003 Epoch: 19 Global Step: 403200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:40,255-Speed 6315.63 samples/sec Loss 5.0102 LearningRate 0.0003 Epoch: 19 Global Step: 403210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:43,502-Speed 6308.00 samples/sec Loss 4.9399 LearningRate 0.0003 Epoch: 19 Global Step: 403220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:46,746-Speed 6316.35 samples/sec Loss 4.9461 LearningRate 0.0003 Epoch: 19 Global Step: 403230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:49,993-Speed 6308.90 samples/sec Loss 5.0438 LearningRate 0.0003 Epoch: 19 Global Step: 403240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:53,242-Speed 6305.11 samples/sec Loss 5.0350 LearningRate 0.0003 Epoch: 19 Global Step: 403250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:56,487-Speed 6313.37 samples/sec Loss 4.9159 LearningRate 0.0003 Epoch: 19 Global Step: 403260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:24:59,743-Speed 6290.35 
samples/sec Loss 4.9269 LearningRate 0.0003 Epoch: 19 Global Step: 403270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:02,988-Speed 6312.31 samples/sec Loss 4.9540 LearningRate 0.0003 Epoch: 19 Global Step: 403280 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:25:06,235-Speed 6310.35 samples/sec Loss 4.9849 LearningRate 0.0003 Epoch: 19 Global Step: 403290 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:25:09,463-Speed 6344.80 samples/sec Loss 4.9335 LearningRate 0.0003 Epoch: 19 Global Step: 403300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:12,708-Speed 6313.83 samples/sec Loss 4.9843 LearningRate 0.0003 Epoch: 19 Global Step: 403310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:15,955-Speed 6308.34 samples/sec Loss 4.9766 LearningRate 0.0003 Epoch: 19 Global Step: 403320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:19,201-Speed 6312.00 samples/sec Loss 4.9018 LearningRate 0.0003 Epoch: 19 Global Step: 403330 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:22,450-Speed 6305.08 samples/sec Loss 4.9381 LearningRate 0.0003 Epoch: 19 Global Step: 403340 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:25,700-Speed 6303.17 samples/sec Loss 4.9488 LearningRate 0.0003 Epoch: 19 Global Step: 403350 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:28,945-Speed 6311.04 samples/sec Loss 4.9873 LearningRate 0.0003 Epoch: 19 Global Step: 403360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:32,190-Speed 6313.27 samples/sec Loss 4.9979 LearningRate 0.0003 Epoch: 19 Global Step: 403370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:35,437-Speed 6309.88 samples/sec Loss 4.9262 LearningRate 0.0003 Epoch: 19 Global Step: 403380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:38,683-Speed 6309.26 samples/sec Loss 5.0220 
LearningRate 0.0003 Epoch: 19 Global Step: 403390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:41,930-Speed 6309.43 samples/sec Loss 4.9323 LearningRate 0.0003 Epoch: 19 Global Step: 403400 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:25:45,162-Speed 6337.53 samples/sec Loss 5.0131 LearningRate 0.0003 Epoch: 19 Global Step: 403410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:48,407-Speed 6313.52 samples/sec Loss 4.9460 LearningRate 0.0003 Epoch: 19 Global Step: 403420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:51,667-Speed 6288.22 samples/sec Loss 5.0365 LearningRate 0.0003 Epoch: 19 Global Step: 403430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:54,914-Speed 6309.68 samples/sec Loss 4.9578 LearningRate 0.0003 Epoch: 19 Global Step: 403440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:25:58,169-Speed 6292.36 samples/sec Loss 5.0056 LearningRate 0.0003 Epoch: 19 Global Step: 403450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:01,419-Speed 6302.90 samples/sec Loss 5.0135 LearningRate 0.0003 Epoch: 19 Global Step: 403460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:04,666-Speed 6308.24 samples/sec Loss 4.9922 LearningRate 0.0003 Epoch: 19 Global Step: 403470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:07,915-Speed 6305.54 samples/sec Loss 4.9781 LearningRate 0.0003 Epoch: 19 Global Step: 403480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:11,158-Speed 6315.41 samples/sec Loss 4.9151 LearningRate 0.0003 Epoch: 19 Global Step: 403490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:14,406-Speed 6308.47 samples/sec Loss 4.9611 LearningRate 0.0003 Epoch: 19 Global Step: 403500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:17,637-Speed 6338.37 samples/sec Loss 4.9671 LearningRate 0.0003 Epoch: 19 
Global Step: 403510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:20,881-Speed 6315.72 samples/sec Loss 4.9529 LearningRate 0.0003 Epoch: 19 Global Step: 403520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:24,130-Speed 6306.03 samples/sec Loss 4.9768 LearningRate 0.0003 Epoch: 19 Global Step: 403530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:27,379-Speed 6304.06 samples/sec Loss 4.9529 LearningRate 0.0003 Epoch: 19 Global Step: 403540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:30,628-Speed 6304.41 samples/sec Loss 4.9835 LearningRate 0.0003 Epoch: 19 Global Step: 403550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:33,874-Speed 6312.56 samples/sec Loss 4.9873 LearningRate 0.0003 Epoch: 19 Global Step: 403560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:37,121-Speed 6308.69 samples/sec Loss 5.0191 LearningRate 0.0003 Epoch: 19 Global Step: 403570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:40,367-Speed 6310.50 samples/sec Loss 5.0479 LearningRate 0.0003 Epoch: 19 Global Step: 403580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:43,609-Speed 6316.80 samples/sec Loss 5.0357 LearningRate 0.0003 Epoch: 19 Global Step: 403590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:46,851-Speed 6318.83 samples/sec Loss 4.9992 LearningRate 0.0003 Epoch: 19 Global Step: 403600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:50,093-Speed 6318.16 samples/sec Loss 4.9727 LearningRate 0.0003 Epoch: 19 Global Step: 403610 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:26:53,328-Speed 6334.31 samples/sec Loss 4.9575 LearningRate 0.0003 Epoch: 19 Global Step: 403620 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:56,578-Speed 6302.88 samples/sec Loss 4.9213 LearningRate 0.0003 Epoch: 19 Global Step: 403630 Fp16 Grad 
Scale: 16384 Required: 39 hours Training: 2022-04-02 04:26:59,823-Speed 6310.63 samples/sec Loss 4.9853 LearningRate 0.0003 Epoch: 19 Global Step: 403640 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:03,071-Speed 6308.30 samples/sec Loss 4.9314 LearningRate 0.0003 Epoch: 19 Global Step: 403650 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:06,317-Speed 6310.22 samples/sec Loss 4.9708 LearningRate 0.0003 Epoch: 19 Global Step: 403660 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:09,575-Speed 6288.16 samples/sec Loss 4.9932 LearningRate 0.0003 Epoch: 19 Global Step: 403670 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:12,821-Speed 6308.87 samples/sec Loss 4.9514 LearningRate 0.0003 Epoch: 19 Global Step: 403680 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:16,068-Speed 6308.94 samples/sec Loss 4.9897 LearningRate 0.0003 Epoch: 19 Global Step: 403690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:19,319-Speed 6301.87 samples/sec Loss 4.9574 LearningRate 0.0003 Epoch: 19 Global Step: 403700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:22,563-Speed 6314.51 samples/sec Loss 4.9644 LearningRate 0.0003 Epoch: 19 Global Step: 403710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:25,808-Speed 6311.61 samples/sec Loss 4.9925 LearningRate 0.0003 Epoch: 19 Global Step: 403720 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:29,055-Speed 6310.00 samples/sec Loss 4.9569 LearningRate 0.0003 Epoch: 19 Global Step: 403730 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:32,297-Speed 6318.41 samples/sec Loss 4.8980 LearningRate 0.0003 Epoch: 19 Global Step: 403740 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:35,544-Speed 6309.79 samples/sec Loss 4.9247 LearningRate 0.0003 Epoch: 19 Global Step: 403750 Fp16 Grad Scale: 32768 Required: 39 hours 
Training: 2022-04-02 04:27:38,792-Speed 6306.19 samples/sec Loss 4.9469 LearningRate 0.0003 Epoch: 19 Global Step: 403760 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:42,039-Speed 6309.57 samples/sec Loss 5.0519 LearningRate 0.0003 Epoch: 19 Global Step: 403770 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:45,284-Speed 6311.78 samples/sec Loss 4.9833 LearningRate 0.0003 Epoch: 19 Global Step: 403780 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:27:48,517-Speed 6337.53 samples/sec Loss 4.9332 LearningRate 0.0003 Epoch: 19 Global Step: 403790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:51,759-Speed 6317.22 samples/sec Loss 5.0981 LearningRate 0.0003 Epoch: 19 Global Step: 403800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:55,002-Speed 6316.63 samples/sec Loss 4.9900 LearningRate 0.0003 Epoch: 19 Global Step: 403810 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:27:58,247-Speed 6314.29 samples/sec Loss 4.9152 LearningRate 0.0003 Epoch: 19 Global Step: 403820 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:01,498-Speed 6300.18 samples/sec Loss 5.0199 LearningRate 0.0003 Epoch: 19 Global Step: 403830 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:04,747-Speed 6303.72 samples/sec Loss 4.9727 LearningRate 0.0003 Epoch: 19 Global Step: 403840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:07,995-Speed 6307.99 samples/sec Loss 5.0160 LearningRate 0.0003 Epoch: 19 Global Step: 403850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:11,245-Speed 6302.12 samples/sec Loss 4.9980 LearningRate 0.0003 Epoch: 19 Global Step: 403860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:14,493-Speed 6307.82 samples/sec Loss 4.9827 LearningRate 0.0003 Epoch: 19 Global Step: 403870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:28:17,735-Speed 6317.59 samples/sec Loss 4.9604 LearningRate 0.0003 Epoch: 19 Global Step: 403880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:20,981-Speed 6311.81 samples/sec Loss 4.9514 LearningRate 0.0003 Epoch: 19 Global Step: 403890 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:28:24,210-Speed 6342.69 samples/sec Loss 4.9530 LearningRate 0.0003 Epoch: 19 Global Step: 403900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:27,460-Speed 6304.26 samples/sec Loss 4.9600 LearningRate 0.0003 Epoch: 19 Global Step: 403910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:30,712-Speed 6299.18 samples/sec Loss 4.9276 LearningRate 0.0003 Epoch: 19 Global Step: 403920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:33,961-Speed 6305.64 samples/sec Loss 4.9345 LearningRate 0.0003 Epoch: 19 Global Step: 403930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:37,209-Speed 6306.47 samples/sec Loss 4.9912 LearningRate 0.0003 Epoch: 19 Global Step: 403940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:40,452-Speed 6316.03 samples/sec Loss 4.9498 LearningRate 0.0003 Epoch: 19 Global Step: 403950 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:43,705-Speed 6298.11 samples/sec Loss 4.9967 LearningRate 0.0003 Epoch: 19 Global Step: 403960 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:46,952-Speed 6307.52 samples/sec Loss 4.9953 LearningRate 0.0003 Epoch: 19 Global Step: 403970 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:50,200-Speed 6307.18 samples/sec Loss 5.0010 LearningRate 0.0003 Epoch: 19 Global Step: 403980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:53,448-Speed 6307.00 samples/sec Loss 4.9162 LearningRate 0.0003 Epoch: 19 Global Step: 403990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:56,678-Speed 6342.57 
samples/sec Loss 4.9105 LearningRate 0.0003 Epoch: 19 Global Step: 404000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:28:59,929-Speed 6301.02 samples/sec Loss 4.9966 LearningRate 0.0003 Epoch: 19 Global Step: 404010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:03,175-Speed 6310.14 samples/sec Loss 5.0303 LearningRate 0.0003 Epoch: 19 Global Step: 404020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:06,420-Speed 6312.85 samples/sec Loss 4.9714 LearningRate 0.0003 Epoch: 19 Global Step: 404030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:09,668-Speed 6306.05 samples/sec Loss 4.9613 LearningRate 0.0003 Epoch: 19 Global Step: 404040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:12,918-Speed 6304.37 samples/sec Loss 4.9314 LearningRate 0.0003 Epoch: 19 Global Step: 404050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:16,167-Speed 6303.27 samples/sec Loss 5.0487 LearningRate 0.0003 Epoch: 19 Global Step: 404060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:19,412-Speed 6314.09 samples/sec Loss 5.0196 LearningRate 0.0003 Epoch: 19 Global Step: 404070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:22,659-Speed 6307.52 samples/sec Loss 4.9653 LearningRate 0.0003 Epoch: 19 Global Step: 404080 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:25,908-Speed 6305.57 samples/sec Loss 4.9427 LearningRate 0.0003 Epoch: 19 Global Step: 404090 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:29,156-Speed 6307.59 samples/sec Loss 4.9479 LearningRate 0.0003 Epoch: 19 Global Step: 404100 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:29:32,402-Speed 6310.34 samples/sec Loss 4.9389 LearningRate 0.0003 Epoch: 19 Global Step: 404110 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:29:35,633-Speed 6339.82 samples/sec Loss 4.9886 
LearningRate 0.0003 Epoch: 19 Global Step: 404120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:38,880-Speed 6309.69 samples/sec Loss 4.9921 LearningRate 0.0003 Epoch: 19 Global Step: 404130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:42,125-Speed 6312.84 samples/sec Loss 4.9144 LearningRate 0.0003 Epoch: 19 Global Step: 404140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:45,371-Speed 6309.96 samples/sec Loss 5.0769 LearningRate 0.0003 Epoch: 19 Global Step: 404150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:48,615-Speed 6315.23 samples/sec Loss 5.0367 LearningRate 0.0003 Epoch: 19 Global Step: 404160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:51,860-Speed 6313.65 samples/sec Loss 4.9336 LearningRate 0.0003 Epoch: 19 Global Step: 404170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:55,103-Speed 6314.82 samples/sec Loss 5.0396 LearningRate 0.0003 Epoch: 19 Global Step: 404180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:29:58,355-Speed 6299.58 samples/sec Loss 4.9661 LearningRate 0.0003 Epoch: 19 Global Step: 404190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:01,602-Speed 6309.28 samples/sec Loss 5.0214 LearningRate 0.0003 Epoch: 19 Global Step: 404200 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:04,859-Speed 6288.82 samples/sec Loss 4.9384 LearningRate 0.0003 Epoch: 19 Global Step: 404210 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:08,105-Speed 6310.22 samples/sec Loss 4.9755 LearningRate 0.0003 Epoch: 19 Global Step: 404220 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:30:11,336-Speed 6340.15 samples/sec Loss 4.9295 LearningRate 0.0003 Epoch: 19 Global Step: 404230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:14,588-Speed 6298.95 samples/sec Loss 4.8940 LearningRate 0.0003 Epoch: 19 
Global Step: 404240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:17,833-Speed 6313.28 samples/sec Loss 5.0474 LearningRate 0.0003 Epoch: 19 Global Step: 404250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:21,079-Speed 6311.51 samples/sec Loss 5.0359 LearningRate 0.0003 Epoch: 19 Global Step: 404260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:24,325-Speed 6309.52 samples/sec Loss 4.9873 LearningRate 0.0003 Epoch: 19 Global Step: 404270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:27,567-Speed 6318.66 samples/sec Loss 4.9410 LearningRate 0.0003 Epoch: 19 Global Step: 404280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:30,812-Speed 6314.38 samples/sec Loss 4.9343 LearningRate 0.0003 Epoch: 19 Global Step: 404290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:34,057-Speed 6310.74 samples/sec Loss 4.8969 LearningRate 0.0003 Epoch: 19 Global Step: 404300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:37,306-Speed 6305.51 samples/sec Loss 4.9589 LearningRate 0.0003 Epoch: 19 Global Step: 404310 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:40,549-Speed 6316.41 samples/sec Loss 4.9046 LearningRate 0.0003 Epoch: 19 Global Step: 404320 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:30:43,796-Speed 6309.43 samples/sec Loss 4.9815 LearningRate 0.0003 Epoch: 19 Global Step: 404330 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:30:47,041-Speed 6313.51 samples/sec Loss 4.9417 LearningRate 0.0003 Epoch: 19 Global Step: 404340 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:30:50,289-Speed 6307.37 samples/sec Loss 5.0115 LearningRate 0.0003 Epoch: 19 Global Step: 404350 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:30:53,533-Speed 6313.49 samples/sec Loss 4.9460 LearningRate 0.0003 Epoch: 19 Global Step: 404360 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 04:30:56,784-Speed 6301.67 samples/sec Loss 4.9534 LearningRate 0.0003 Epoch: 19 Global Step: 404370 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:31:00,019-Speed 6331.53 samples/sec Loss 4.9622 LearningRate 0.0003 Epoch: 19 Global Step: 404380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:03,264-Speed 6312.38 samples/sec Loss 4.9155 LearningRate 0.0003 Epoch: 19 Global Step: 404390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:06,508-Speed 6315.74 samples/sec Loss 4.9384 LearningRate 0.0003 Epoch: 19 Global Step: 404400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:09,751-Speed 6317.07 samples/sec Loss 4.9610 LearningRate 0.0003 Epoch: 19 Global Step: 404410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:13,000-Speed 6304.41 samples/sec Loss 4.9519 LearningRate 0.0003 Epoch: 19 Global Step: 404420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:16,241-Speed 6319.98 samples/sec Loss 4.9591 LearningRate 0.0003 Epoch: 19 Global Step: 404430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:19,490-Speed 6305.53 samples/sec Loss 4.9692 LearningRate 0.0003 Epoch: 19 Global Step: 404440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:22,734-Speed 6313.01 samples/sec Loss 4.9791 LearningRate 0.0003 Epoch: 19 Global Step: 404450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:25,978-Speed 6315.28 samples/sec Loss 5.0633 LearningRate 0.0003 Epoch: 19 Global Step: 404460 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:29,223-Speed 6312.36 samples/sec Loss 4.8616 LearningRate 0.0003 Epoch: 19 Global Step: 404470 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:32,465-Speed 6318.02 samples/sec Loss 4.9493 LearningRate 0.0003 Epoch: 19 Global Step: 404480 Fp16 Grad Scale: 32768 Required: 39 hours 
Training: 2022-04-02 04:31:35,714-Speed 6305.89 samples/sec Loss 4.9946 LearningRate 0.0003 Epoch: 19 Global Step: 404490 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:31:38,962-Speed 6305.57 samples/sec Loss 5.0143 LearningRate 0.0003 Epoch: 19 Global Step: 404500 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:31:42,213-Speed 6300.94 samples/sec Loss 4.9897 LearningRate 0.0003 Epoch: 19 Global Step: 404510 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:31:45,442-Speed 6344.21 samples/sec Loss 5.0007 LearningRate 0.0003 Epoch: 19 Global Step: 404520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:48,688-Speed 6310.88 samples/sec Loss 4.9860 LearningRate 0.0003 Epoch: 19 Global Step: 404530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:51,956-Speed 6268.65 samples/sec Loss 4.9743 LearningRate 0.0003 Epoch: 19 Global Step: 404540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:55,205-Speed 6305.55 samples/sec Loss 4.9600 LearningRate 0.0003 Epoch: 19 Global Step: 404550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:31:58,449-Speed 6315.05 samples/sec Loss 4.8535 LearningRate 0.0003 Epoch: 19 Global Step: 404560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:01,697-Speed 6306.40 samples/sec Loss 4.9958 LearningRate 0.0003 Epoch: 19 Global Step: 404570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:04,941-Speed 6314.07 samples/sec Loss 4.9846 LearningRate 0.0003 Epoch: 19 Global Step: 404580 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:08,190-Speed 6305.95 samples/sec Loss 4.9493 LearningRate 0.0003 Epoch: 19 Global Step: 404590 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:11,434-Speed 6313.37 samples/sec Loss 4.8603 LearningRate 0.0003 Epoch: 19 Global Step: 404600 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:32:14,691-Speed 6290.69 samples/sec Loss 4.9115 LearningRate 0.0003 Epoch: 19 Global Step: 404610 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:17,939-Speed 6307.11 samples/sec Loss 4.9668 LearningRate 0.0003 Epoch: 19 Global Step: 404620 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:21,190-Speed 6300.37 samples/sec Loss 4.9560 LearningRate 0.0003 Epoch: 19 Global Step: 404630 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:24,436-Speed 6309.68 samples/sec Loss 4.9717 LearningRate 0.0003 Epoch: 19 Global Step: 404640 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:27,681-Speed 6312.79 samples/sec Loss 4.9475 LearningRate 0.0003 Epoch: 19 Global Step: 404650 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:30,931-Speed 6303.56 samples/sec Loss 4.9306 LearningRate 0.0003 Epoch: 19 Global Step: 404660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:34,179-Speed 6306.48 samples/sec Loss 4.9524 LearningRate 0.0003 Epoch: 19 Global Step: 404670 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:37,424-Speed 6312.32 samples/sec Loss 4.9911 LearningRate 0.0003 Epoch: 19 Global Step: 404680 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:32:40,655-Speed 6339.97 samples/sec Loss 5.0040 LearningRate 0.0003 Epoch: 19 Global Step: 404690 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:43,899-Speed 6315.31 samples/sec Loss 5.0065 LearningRate 0.0003 Epoch: 19 Global Step: 404700 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:47,147-Speed 6307.18 samples/sec Loss 5.0173 LearningRate 0.0003 Epoch: 19 Global Step: 404710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:50,398-Speed 6299.88 samples/sec Loss 4.9069 LearningRate 0.0003 Epoch: 19 Global Step: 404720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:53,653-Speed 6294.55 
samples/sec Loss 4.9124 LearningRate 0.0003 Epoch: 19 Global Step: 404730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:32:56,901-Speed 6306.97 samples/sec Loss 4.9926 LearningRate 0.0003 Epoch: 19 Global Step: 404740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:00,149-Speed 6307.92 samples/sec Loss 4.9723 LearningRate 0.0003 Epoch: 19 Global Step: 404750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:03,399-Speed 6302.64 samples/sec Loss 5.1046 LearningRate 0.0003 Epoch: 19 Global Step: 404760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:06,644-Speed 6311.60 samples/sec Loss 4.9358 LearningRate 0.0003 Epoch: 19 Global Step: 404770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:09,888-Speed 6315.49 samples/sec Loss 4.8770 LearningRate 0.0003 Epoch: 19 Global Step: 404780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:13,134-Speed 6309.92 samples/sec Loss 4.9287 LearningRate 0.0003 Epoch: 19 Global Step: 404790 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:16,381-Speed 6308.44 samples/sec Loss 4.8956 LearningRate 0.0003 Epoch: 19 Global Step: 404800 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:19,628-Speed 6309.76 samples/sec Loss 4.9856 LearningRate 0.0003 Epoch: 19 Global Step: 404810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:22,875-Speed 6308.16 samples/sec Loss 4.9885 LearningRate 0.0003 Epoch: 19 Global Step: 404820 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:26,134-Speed 6284.86 samples/sec Loss 4.9689 LearningRate 0.0003 Epoch: 19 Global Step: 404830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:29,381-Speed 6309.93 samples/sec Loss 4.9305 LearningRate 0.0003 Epoch: 19 Global Step: 404840 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:33:32,613-Speed 6338.00 samples/sec Loss 4.9990 
LearningRate 0.0003 Epoch: 19 Global Step: 404850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:35,860-Speed 6308.66 samples/sec Loss 4.9166 LearningRate 0.0003 Epoch: 19 Global Step: 404860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:39,106-Speed 6310.21 samples/sec Loss 4.9929 LearningRate 0.0003 Epoch: 19 Global Step: 404870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:42,355-Speed 6306.51 samples/sec Loss 4.9239 LearningRate 0.0003 Epoch: 19 Global Step: 404880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:45,600-Speed 6311.94 samples/sec Loss 4.9548 LearningRate 0.0003 Epoch: 19 Global Step: 404890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:48,856-Speed 6292.36 samples/sec Loss 5.0591 LearningRate 0.0003 Epoch: 19 Global Step: 404900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:52,107-Speed 6301.43 samples/sec Loss 4.9560 LearningRate 0.0003 Epoch: 19 Global Step: 404910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:55,354-Speed 6308.36 samples/sec Loss 5.0112 LearningRate 0.0003 Epoch: 19 Global Step: 404920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:33:58,600-Speed 6309.73 samples/sec Loss 4.9363 LearningRate 0.0003 Epoch: 19 Global Step: 404930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:01,850-Speed 6304.50 samples/sec Loss 4.9989 LearningRate 0.0003 Epoch: 19 Global Step: 404940 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:05,097-Speed 6309.05 samples/sec Loss 4.9950 LearningRate 0.0003 Epoch: 19 Global Step: 404950 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:34:08,403-Speed 6195.00 samples/sec Loss 4.9837 LearningRate 0.0003 Epoch: 19 Global Step: 404960 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:34:11,706-Speed 6203.31 samples/sec Loss 4.9043 LearningRate 0.0003 Epoch: 19 
Global Step: 404970 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:34:14,938-Speed 6337.33 samples/sec Loss 4.9566 LearningRate 0.0003 Epoch: 19 Global Step: 404980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:18,185-Speed 6308.66 samples/sec Loss 4.9949 LearningRate 0.0003 Epoch: 19 Global Step: 404990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:21,431-Speed 6311.42 samples/sec Loss 5.0717 LearningRate 0.0003 Epoch: 19 Global Step: 405000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:24,676-Speed 6311.74 samples/sec Loss 5.0215 LearningRate 0.0003 Epoch: 19 Global Step: 405010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:27,925-Speed 6304.40 samples/sec Loss 4.9462 LearningRate 0.0003 Epoch: 19 Global Step: 405020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:31,166-Speed 6320.26 samples/sec Loss 4.9097 LearningRate 0.0003 Epoch: 19 Global Step: 405030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:34,438-Speed 6261.01 samples/sec Loss 4.9931 LearningRate 0.0003 Epoch: 19 Global Step: 405040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:37,688-Speed 6303.60 samples/sec Loss 4.9292 LearningRate 0.0003 Epoch: 19 Global Step: 405050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:40,932-Speed 6314.79 samples/sec Loss 4.9642 LearningRate 0.0003 Epoch: 19 Global Step: 405060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:44,175-Speed 6315.11 samples/sec Loss 4.9038 LearningRate 0.0003 Epoch: 19 Global Step: 405070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:47,424-Speed 6306.06 samples/sec Loss 4.9520 LearningRate 0.0003 Epoch: 19 Global Step: 405080 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:34:50,672-Speed 6306.05 samples/sec Loss 4.9400 LearningRate 0.0003 Epoch: 19 Global Step: 405090 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 04:34:53,904-Speed 6338.88 samples/sec Loss 5.0120 LearningRate 0.0003 Epoch: 19 Global Step: 405100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:34:57,147-Speed 6316.27 samples/sec Loss 5.0013 LearningRate 0.0003 Epoch: 19 Global Step: 405110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:00,394-Speed 6308.91 samples/sec Loss 4.9862 LearningRate 0.0003 Epoch: 19 Global Step: 405120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:03,641-Speed 6309.31 samples/sec Loss 4.9181 LearningRate 0.0003 Epoch: 19 Global Step: 405130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:06,891-Speed 6301.92 samples/sec Loss 5.0102 LearningRate 0.0003 Epoch: 19 Global Step: 405140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:10,133-Speed 6319.65 samples/sec Loss 5.0152 LearningRate 0.0003 Epoch: 19 Global Step: 405150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:13,374-Speed 6320.53 samples/sec Loss 4.9752 LearningRate 0.0003 Epoch: 19 Global Step: 405160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:16,619-Speed 6311.99 samples/sec Loss 5.0158 LearningRate 0.0003 Epoch: 19 Global Step: 405170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:19,862-Speed 6317.57 samples/sec Loss 5.0241 LearningRate 0.0003 Epoch: 19 Global Step: 405180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:23,110-Speed 6306.77 samples/sec Loss 4.9551 LearningRate 0.0003 Epoch: 19 Global Step: 405190 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:26,359-Speed 6303.78 samples/sec Loss 4.9110 LearningRate 0.0003 Epoch: 19 Global Step: 405200 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:35:29,593-Speed 6335.63 samples/sec Loss 5.0138 LearningRate 0.0003 Epoch: 19 Global Step: 405210 Fp16 Grad Scale: 16384 Required: 39 hours 
Training: 2022-04-02 04:35:32,836-Speed 6315.69 samples/sec Loss 4.9615 LearningRate 0.0003 Epoch: 19 Global Step: 405220 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:36,078-Speed 6317.61 samples/sec Loss 4.9535 LearningRate 0.0003 Epoch: 19 Global Step: 405230 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:39,322-Speed 6314.59 samples/sec Loss 4.9424 LearningRate 0.0003 Epoch: 19 Global Step: 405240 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:42,564-Speed 6318.24 samples/sec Loss 4.9412 LearningRate 0.0003 Epoch: 19 Global Step: 405250 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:45,807-Speed 6316.53 samples/sec Loss 5.0240 LearningRate 0.0003 Epoch: 19 Global Step: 405260 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:49,057-Speed 6303.96 samples/sec Loss 4.9619 LearningRate 0.0003 Epoch: 19 Global Step: 405270 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:52,300-Speed 6315.68 samples/sec Loss 4.9698 LearningRate 0.0003 Epoch: 19 Global Step: 405280 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:55,544-Speed 6314.71 samples/sec Loss 4.9410 LearningRate 0.0003 Epoch: 19 Global Step: 405290 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:35:58,794-Speed 6304.24 samples/sec Loss 4.9218 LearningRate 0.0003 Epoch: 19 Global Step: 405300 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:02,052-Speed 6286.82 samples/sec Loss 4.9730 LearningRate 0.0003 Epoch: 19 Global Step: 405310 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:05,299-Speed 6309.19 samples/sec Loss 4.9737 LearningRate 0.0003 Epoch: 19 Global Step: 405320 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:08,543-Speed 6313.36 samples/sec Loss 5.0439 LearningRate 0.0003 Epoch: 19 Global Step: 405330 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 
04:36:11,791-Speed 6308.30 samples/sec Loss 4.9848 LearningRate 0.0003 Epoch: 19 Global Step: 405340 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:15,038-Speed 6309.09 samples/sec Loss 4.9867 LearningRate 0.0003 Epoch: 19 Global Step: 405350 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:18,268-Speed 6341.32 samples/sec Loss 4.9848 LearningRate 0.0003 Epoch: 19 Global Step: 405360 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:21,521-Speed 6298.09 samples/sec Loss 4.9502 LearningRate 0.0003 Epoch: 19 Global Step: 405370 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:24,776-Speed 6291.91 samples/sec Loss 4.9115 LearningRate 0.0003 Epoch: 19 Global Step: 405380 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:28,024-Speed 6307.91 samples/sec Loss 4.9176 LearningRate 0.0003 Epoch: 19 Global Step: 405390 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:31,266-Speed 6319.15 samples/sec Loss 4.9623 LearningRate 0.0003 Epoch: 19 Global Step: 405400 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:34,512-Speed 6311.01 samples/sec Loss 4.9708 LearningRate 0.0003 Epoch: 19 Global Step: 405410 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:37,758-Speed 6310.10 samples/sec Loss 4.9979 LearningRate 0.0003 Epoch: 19 Global Step: 405420 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:41,002-Speed 6313.71 samples/sec Loss 4.9597 LearningRate 0.0003 Epoch: 19 Global Step: 405430 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:44,249-Speed 6309.89 samples/sec Loss 4.9565 LearningRate 0.0003 Epoch: 19 Global Step: 405440 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:47,491-Speed 6316.90 samples/sec Loss 4.9269 LearningRate 0.0003 Epoch: 19 Global Step: 405450 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:36:50,734-Speed 6317.84 
samples/sec Loss 4.9019 LearningRate 0.0003 Epoch: 19 Global Step: 405460 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:53,978-Speed 6313.30 samples/sec Loss 4.9454 LearningRate 0.0003 Epoch: 19 Global Step: 405470 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:36:57,218-Speed 6323.01 samples/sec Loss 4.9272 LearningRate 0.0003 Epoch: 19 Global Step: 405480 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:00,461-Speed 6316.24 samples/sec Loss 4.8765 LearningRate 0.0003 Epoch: 19 Global Step: 405490 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:03,709-Speed 6307.96 samples/sec Loss 4.9642 LearningRate 0.0003 Epoch: 19 Global Step: 405500 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:06,956-Speed 6307.33 samples/sec Loss 4.9456 LearningRate 0.0003 Epoch: 19 Global Step: 405510 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:10,201-Speed 6313.99 samples/sec Loss 4.9706 LearningRate 0.0003 Epoch: 19 Global Step: 405520 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:13,444-Speed 6315.07 samples/sec Loss 4.9530 LearningRate 0.0003 Epoch: 19 Global Step: 405530 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:16,695-Speed 6302.13 samples/sec Loss 4.9091 LearningRate 0.0003 Epoch: 19 Global Step: 405540 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:19,938-Speed 6315.95 samples/sec Loss 4.9549 LearningRate 0.0003 Epoch: 19 Global Step: 405550 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:23,189-Speed 6301.36 samples/sec Loss 5.0144 LearningRate 0.0003 Epoch: 19 Global Step: 405560 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:26,437-Speed 6307.00 samples/sec Loss 5.0406 LearningRate 0.0003 Epoch: 19 Global Step: 405570 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:37:29,688-Speed 6302.01 samples/sec Loss 4.9807 
LearningRate 0.0003 Epoch: 19 Global Step: 405580 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:32,936-Speed 6306.99 samples/sec Loss 4.9279 LearningRate 0.0003 Epoch: 19 Global Step: 405590 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:36,182-Speed 6311.01 samples/sec Loss 5.0001 LearningRate 0.0003 Epoch: 19 Global Step: 405600 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:39,427-Speed 6312.62 samples/sec Loss 4.9230 LearningRate 0.0003 Epoch: 19 Global Step: 405610 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:42,677-Speed 6301.73 samples/sec Loss 4.9757 LearningRate 0.0003 Epoch: 19 Global Step: 405620 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:45,922-Speed 6312.23 samples/sec Loss 5.0089 LearningRate 0.0003 Epoch: 19 Global Step: 405630 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:49,164-Speed 6319.74 samples/sec Loss 4.9563 LearningRate 0.0003 Epoch: 19 Global Step: 405640 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:52,410-Speed 6309.10 samples/sec Loss 4.8977 LearningRate 0.0003 Epoch: 19 Global Step: 405650 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:55,667-Speed 6289.86 samples/sec Loss 4.9081 LearningRate 0.0003 Epoch: 19 Global Step: 405660 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:37:58,914-Speed 6309.72 samples/sec Loss 4.9009 LearningRate 0.0003 Epoch: 19 Global Step: 405670 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:02,145-Speed 6340.08 samples/sec Loss 4.8812 LearningRate 0.0003 Epoch: 19 Global Step: 405680 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:05,393-Speed 6305.82 samples/sec Loss 4.9495 LearningRate 0.0003 Epoch: 19 Global Step: 405690 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:08,639-Speed 6310.88 samples/sec Loss 4.8994 LearningRate 0.0003 Epoch: 19 
Global Step: 405700 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:11,881-Speed 6318.54 samples/sec Loss 4.9790 LearningRate 0.0003 Epoch: 19 Global Step: 405710 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:15,127-Speed 6311.61 samples/sec Loss 4.9615 LearningRate 0.0003 Epoch: 19 Global Step: 405720 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:18,376-Speed 6304.36 samples/sec Loss 5.0093 LearningRate 0.0003 Epoch: 19 Global Step: 405730 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:21,617-Speed 6320.53 samples/sec Loss 4.9604 LearningRate 0.0003 Epoch: 19 Global Step: 405740 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:24,865-Speed 6307.21 samples/sec Loss 4.8258 LearningRate 0.0003 Epoch: 19 Global Step: 405750 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:28,109-Speed 6315.77 samples/sec Loss 4.9917 LearningRate 0.0003 Epoch: 19 Global Step: 405760 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:31,356-Speed 6308.41 samples/sec Loss 4.9154 LearningRate 0.0003 Epoch: 19 Global Step: 405770 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:34,603-Speed 6308.18 samples/sec Loss 5.0001 LearningRate 0.0003 Epoch: 19 Global Step: 405780 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:37,851-Speed 6307.52 samples/sec Loss 5.0047 LearningRate 0.0003 Epoch: 19 Global Step: 405790 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:41,095-Speed 6312.96 samples/sec Loss 4.9258 LearningRate 0.0003 Epoch: 19 Global Step: 405800 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:44,338-Speed 6317.39 samples/sec Loss 4.9735 LearningRate 0.0003 Epoch: 19 Global Step: 405810 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:47,584-Speed 6310.26 samples/sec Loss 4.8958 LearningRate 0.0003 Epoch: 19 Global Step: 405820 Fp16 Grad 
Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:50,831-Speed 6308.38 samples/sec Loss 4.9729 LearningRate 0.0003 Epoch: 19 Global Step: 405830 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:38:54,065-Speed 6335.16 samples/sec Loss 4.9671 LearningRate 0.0003 Epoch: 19 Global Step: 405840 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:38:57,360-Speed 6217.26 samples/sec Loss 5.0091 LearningRate 0.0003 Epoch: 19 Global Step: 405850 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:00,657-Speed 6213.14 samples/sec Loss 4.9060 LearningRate 0.0003 Epoch: 19 Global Step: 405860 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:03,962-Speed 6198.34 samples/sec Loss 5.0007 LearningRate 0.0003 Epoch: 19 Global Step: 405870 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:07,213-Speed 6300.30 samples/sec Loss 4.9962 LearningRate 0.0003 Epoch: 19 Global Step: 405880 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:10,458-Speed 6313.25 samples/sec Loss 4.9934 LearningRate 0.0003 Epoch: 19 Global Step: 405890 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:13,702-Speed 6313.01 samples/sec Loss 4.9695 LearningRate 0.0003 Epoch: 19 Global Step: 405900 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:16,946-Speed 6316.03 samples/sec Loss 4.9489 LearningRate 0.0003 Epoch: 19 Global Step: 405910 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:20,203-Speed 6288.18 samples/sec Loss 4.9514 LearningRate 0.0003 Epoch: 19 Global Step: 405920 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:23,451-Speed 6306.37 samples/sec Loss 4.9109 LearningRate 0.0003 Epoch: 19 Global Step: 405930 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:26,700-Speed 6307.11 samples/sec Loss 5.0049 LearningRate 0.0003 Epoch: 19 Global Step: 405940 Fp16 Grad Scale: 32768 Required: 39 hours 
Training: 2022-04-02 04:39:29,945-Speed 6312.69 samples/sec Loss 5.0096 LearningRate 0.0003 Epoch: 19 Global Step: 405950 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:39:33,195-Speed 6302.76 samples/sec Loss 4.9643 LearningRate 0.0003 Epoch: 19 Global Step: 405960 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:39:36,443-Speed 6306.33 samples/sec Loss 4.9993 LearningRate 0.0003 Epoch: 19 Global Step: 405970 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:39:39,679-Speed 6331.46 samples/sec Loss 4.9125 LearningRate 0.0003 Epoch: 19 Global Step: 405980 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:42,923-Speed 6312.82 samples/sec Loss 4.8763 LearningRate 0.0003 Epoch: 19 Global Step: 405990 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:46,169-Speed 6312.04 samples/sec Loss 4.9206 LearningRate 0.0003 Epoch: 19 Global Step: 406000 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:49,424-Speed 6292.93 samples/sec Loss 4.9040 LearningRate 0.0003 Epoch: 19 Global Step: 406010 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:52,669-Speed 6312.02 samples/sec Loss 4.9969 LearningRate 0.0003 Epoch: 19 Global Step: 406020 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:55,917-Speed 6308.15 samples/sec Loss 4.8838 LearningRate 0.0003 Epoch: 19 Global Step: 406030 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:39:59,162-Speed 6311.67 samples/sec Loss 4.9571 LearningRate 0.0003 Epoch: 19 Global Step: 406040 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:02,404-Speed 6318.32 samples/sec Loss 4.9179 LearningRate 0.0003 Epoch: 19 Global Step: 406050 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:05,656-Speed 6300.49 samples/sec Loss 5.0125 LearningRate 0.0003 Epoch: 19 Global Step: 406060 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 
04:40:08,901-Speed 6311.77 samples/sec Loss 4.9577 LearningRate 0.0003 Epoch: 19 Global Step: 406070 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:12,149-Speed 6307.53 samples/sec Loss 4.9603 LearningRate 0.0003 Epoch: 19 Global Step: 406080 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:40:15,392-Speed 6315.84 samples/sec Loss 4.9877 LearningRate 0.0003 Epoch: 19 Global Step: 406090 Fp16 Grad Scale: 32768 Required: 39 hours Training: 2022-04-02 04:40:18,629-Speed 6327.70 samples/sec Loss 4.8996 LearningRate 0.0003 Epoch: 19 Global Step: 406100 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:21,874-Speed 6312.32 samples/sec Loss 4.9007 LearningRate 0.0003 Epoch: 19 Global Step: 406110 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:25,119-Speed 6313.57 samples/sec Loss 4.9883 LearningRate 0.0003 Epoch: 19 Global Step: 406120 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:28,365-Speed 6310.47 samples/sec Loss 4.9410 LearningRate 0.0003 Epoch: 19 Global Step: 406130 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:31,611-Speed 6310.63 samples/sec Loss 4.9273 LearningRate 0.0003 Epoch: 19 Global Step: 406140 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:34,851-Speed 6321.79 samples/sec Loss 4.9817 LearningRate 0.0003 Epoch: 19 Global Step: 406150 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:38,109-Speed 6288.13 samples/sec Loss 4.9263 LearningRate 0.0003 Epoch: 19 Global Step: 406160 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:41,357-Speed 6307.13 samples/sec Loss 4.9636 LearningRate 0.0003 Epoch: 19 Global Step: 406170 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:44,600-Speed 6316.60 samples/sec Loss 4.9727 LearningRate 0.0003 Epoch: 19 Global Step: 406180 Fp16 Grad Scale: 16384 Required: 39 hours Training: 2022-04-02 04:40:47,847-Speed 6310.19 
samples/sec Loss 4.9053 LearningRate 0.0003 Epoch: 19 Global Step: 406190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:40:51,089-Speed 6318.09 samples/sec Loss 4.9893 LearningRate 0.0003 Epoch: 19 Global Step: 406200 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:40:54,335-Speed 6310.43 samples/sec Loss 4.9444 LearningRate 0.0003 Epoch: 19 Global Step: 406210 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:40:57,568-Speed 6336.20 samples/sec Loss 4.9264 LearningRate 0.0003 Epoch: 19 Global Step: 406220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:00,828-Speed 6284.27 samples/sec Loss 4.9134 LearningRate 0.0003 Epoch: 19 Global Step: 406230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:04,079-Speed 6300.51 samples/sec Loss 4.9936 LearningRate 0.0003 Epoch: 19 Global Step: 406240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:07,325-Speed 6311.36 samples/sec Loss 5.0364 LearningRate 0.0003 Epoch: 19 Global Step: 406250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:10,567-Speed 6317.27 samples/sec Loss 4.9246 LearningRate 0.0003 Epoch: 19 Global Step: 406260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:13,813-Speed 6311.35 samples/sec Loss 4.9599 LearningRate 0.0003 Epoch: 19 Global Step: 406270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:17,060-Speed 6308.31 samples/sec Loss 4.8988 LearningRate 0.0003 Epoch: 19 Global Step: 406280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:20,307-Speed 6308.56 samples/sec Loss 4.9866 LearningRate 0.0003 Epoch: 19 Global Step: 406290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:23,555-Speed 6307.02 samples/sec Loss 4.8932 LearningRate 0.0003 Epoch: 19 Global Step: 406300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:26,803-Speed 6307.56 samples/sec Loss 4.9189 
LearningRate 0.0003 Epoch: 19 Global Step: 406310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:30,056-Speed 6296.08 samples/sec Loss 4.9025 LearningRate 0.0003 Epoch: 19 Global Step: 406320 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:41:33,291-Speed 6334.19 samples/sec Loss 4.9558 LearningRate 0.0003 Epoch: 19 Global Step: 406330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:36,539-Speed 6305.43 samples/sec Loss 4.9032 LearningRate 0.0003 Epoch: 19 Global Step: 406340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:39,788-Speed 6304.16 samples/sec Loss 4.9597 LearningRate 0.0003 Epoch: 19 Global Step: 406350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:43,032-Speed 6316.10 samples/sec Loss 4.8832 LearningRate 0.0003 Epoch: 19 Global Step: 406360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:46,277-Speed 6312.56 samples/sec Loss 4.8361 LearningRate 0.0003 Epoch: 19 Global Step: 406370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:49,525-Speed 6306.37 samples/sec Loss 4.9368 LearningRate 0.0003 Epoch: 19 Global Step: 406380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:52,772-Speed 6310.21 samples/sec Loss 4.9558 LearningRate 0.0003 Epoch: 19 Global Step: 406390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:56,022-Speed 6303.21 samples/sec Loss 4.9710 LearningRate 0.0003 Epoch: 19 Global Step: 406400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:41:59,270-Speed 6306.45 samples/sec Loss 4.9694 LearningRate 0.0003 Epoch: 19 Global Step: 406410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:02,532-Speed 6279.40 samples/sec Loss 4.8790 LearningRate 0.0003 Epoch: 19 Global Step: 406420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:05,780-Speed 6307.42 samples/sec Loss 4.9346 LearningRate 0.0003 Epoch: 19 
Global Step: 406430 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:42:09,027-Speed 6309.04 samples/sec Loss 5.0131 LearningRate 0.0003 Epoch: 19 Global Step: 406440 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:42:12,274-Speed 6308.14 samples/sec Loss 4.9123 LearningRate 0.0003 Epoch: 19 Global Step: 406450 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:42:15,507-Speed 6336.67 samples/sec Loss 4.9590 LearningRate 0.0003 Epoch: 19 Global Step: 406460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:18,753-Speed 6309.31 samples/sec Loss 4.9486 LearningRate 0.0003 Epoch: 19 Global Step: 406470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:22,005-Speed 6300.46 samples/sec Loss 4.9802 LearningRate 0.0003 Epoch: 19 Global Step: 406480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:25,252-Speed 6307.54 samples/sec Loss 4.9129 LearningRate 0.0003 Epoch: 19 Global Step: 406490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:28,498-Speed 6310.51 samples/sec Loss 4.9686 LearningRate 0.0003 Epoch: 19 Global Step: 406500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:31,744-Speed 6310.73 samples/sec Loss 4.9844 LearningRate 0.0003 Epoch: 19 Global Step: 406510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:34,992-Speed 6306.90 samples/sec Loss 4.9359 LearningRate 0.0003 Epoch: 19 Global Step: 406520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:38,240-Speed 6308.28 samples/sec Loss 4.9853 LearningRate 0.0003 Epoch: 19 Global Step: 406530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:41,488-Speed 6305.67 samples/sec Loss 4.9497 LearningRate 0.0003 Epoch: 19 Global Step: 406540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:44,739-Speed 6301.33 samples/sec Loss 4.9748 LearningRate 0.0003 Epoch: 19 Global Step: 406550 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:47,979-Speed 6321.62 samples/sec Loss 4.9140 LearningRate 0.0003 Epoch: 19 Global Step: 406560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:42:51,226-Speed 6308.78 samples/sec Loss 4.9048 LearningRate 0.0003 Epoch: 19 Global Step: 406570 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:42:54,461-Speed 6333.15 samples/sec Loss 4.9561 LearningRate 0.0003 Epoch: 19 Global Step: 406580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:42:57,707-Speed 6311.85 samples/sec Loss 4.9945 LearningRate 0.0003 Epoch: 19 Global Step: 406590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:00,954-Speed 6307.88 samples/sec Loss 4.9985 LearningRate 0.0003 Epoch: 19 Global Step: 406600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:04,201-Speed 6309.59 samples/sec Loss 5.0020 LearningRate 0.0003 Epoch: 19 Global Step: 406610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:07,466-Speed 6274.10 samples/sec Loss 4.8955 LearningRate 0.0003 Epoch: 19 Global Step: 406620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:10,709-Speed 6315.29 samples/sec Loss 4.9770 LearningRate 0.0003 Epoch: 19 Global Step: 406630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:13,968-Speed 6285.53 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 19 Global Step: 406640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:17,221-Speed 6298.45 samples/sec Loss 4.9840 LearningRate 0.0003 Epoch: 19 Global Step: 406650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:20,479-Speed 6287.43 samples/sec Loss 5.0432 LearningRate 0.0003 Epoch: 19 Global Step: 406660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:23,725-Speed 6309.60 samples/sec Loss 5.0065 LearningRate 0.0003 Epoch: 19 Global Step: 406670 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 04:43:26,971-Speed 6310.18 samples/sec Loss 4.9187 LearningRate 0.0003 Epoch: 19 Global Step: 406680 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:43:30,211-Speed 6324.06 samples/sec Loss 4.9928 LearningRate 0.0003 Epoch: 19 Global Step: 406690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:33,456-Speed 6312.49 samples/sec Loss 4.9476 LearningRate 0.0003 Epoch: 19 Global Step: 406700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:36,705-Speed 6305.29 samples/sec Loss 4.9756 LearningRate 0.0003 Epoch: 19 Global Step: 406710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:39,950-Speed 6311.39 samples/sec Loss 4.9619 LearningRate 0.0003 Epoch: 19 Global Step: 406720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:43,196-Speed 6310.45 samples/sec Loss 4.9239 LearningRate 0.0003 Epoch: 19 Global Step: 406730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:46,440-Speed 6315.15 samples/sec Loss 4.9603 LearningRate 0.0003 Epoch: 19 Global Step: 406740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:49,754-Speed 6180.85 samples/sec Loss 4.9765 LearningRate 0.0003 Epoch: 19 Global Step: 406750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:53,010-Speed 6291.65 samples/sec Loss 5.0192 LearningRate 0.0003 Epoch: 19 Global Step: 406760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:56,274-Speed 6276.06 samples/sec Loss 4.9828 LearningRate 0.0003 Epoch: 19 Global Step: 406770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:43:59,515-Speed 6319.92 samples/sec Loss 4.9667 LearningRate 0.0003 Epoch: 19 Global Step: 406780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:02,761-Speed 6310.73 samples/sec Loss 4.9523 LearningRate 0.0003 Epoch: 19 Global Step: 406790 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 
04:44:06,003-Speed 6318.93 samples/sec Loss 4.9527 LearningRate 0.0003 Epoch: 19 Global Step: 406800 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:44:09,235-Speed 6338.54 samples/sec Loss 4.8938 LearningRate 0.0003 Epoch: 19 Global Step: 406810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:12,486-Speed 6301.67 samples/sec Loss 4.9485 LearningRate 0.0003 Epoch: 19 Global Step: 406820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:15,734-Speed 6306.27 samples/sec Loss 4.8783 LearningRate 0.0003 Epoch: 19 Global Step: 406830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:18,980-Speed 6310.66 samples/sec Loss 4.9772 LearningRate 0.0003 Epoch: 19 Global Step: 406840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:22,225-Speed 6313.88 samples/sec Loss 4.9151 LearningRate 0.0003 Epoch: 19 Global Step: 406850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:25,476-Speed 6300.58 samples/sec Loss 4.9231 LearningRate 0.0003 Epoch: 19 Global Step: 406860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:28,717-Speed 6320.09 samples/sec Loss 4.9887 LearningRate 0.0003 Epoch: 19 Global Step: 406870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:31,963-Speed 6309.49 samples/sec Loss 4.9846 LearningRate 0.0003 Epoch: 19 Global Step: 406880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:35,206-Speed 6317.24 samples/sec Loss 5.0511 LearningRate 0.0003 Epoch: 19 Global Step: 406890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:38,451-Speed 6313.66 samples/sec Loss 4.9887 LearningRate 0.0003 Epoch: 19 Global Step: 406900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:41,683-Speed 6337.72 samples/sec Loss 4.9564 LearningRate 0.0003 Epoch: 19 Global Step: 406910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:44,925-Speed 6317.29 
samples/sec Loss 4.8631 LearningRate 0.0003 Epoch: 19 Global Step: 406920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:48,176-Speed 6300.65 samples/sec Loss 4.9516 LearningRate 0.0003 Epoch: 19 Global Step: 406930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:51,419-Speed 6317.37 samples/sec Loss 4.9836 LearningRate 0.0003 Epoch: 19 Global Step: 406940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:54,666-Speed 6308.69 samples/sec Loss 4.9459 LearningRate 0.0003 Epoch: 19 Global Step: 406950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:44:57,911-Speed 6312.28 samples/sec Loss 4.9324 LearningRate 0.0003 Epoch: 19 Global Step: 406960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:01,158-Speed 6309.87 samples/sec Loss 5.0206 LearningRate 0.0003 Epoch: 19 Global Step: 406970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:04,404-Speed 6310.68 samples/sec Loss 5.0399 LearningRate 0.0003 Epoch: 19 Global Step: 406980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:07,650-Speed 6310.48 samples/sec Loss 4.9472 LearningRate 0.0003 Epoch: 19 Global Step: 406990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:10,892-Speed 6317.71 samples/sec Loss 4.9669 LearningRate 0.0003 Epoch: 19 Global Step: 407000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:14,140-Speed 6308.67 samples/sec Loss 4.9623 LearningRate 0.0003 Epoch: 19 Global Step: 407010 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:45:17,383-Speed 6315.46 samples/sec Loss 4.8876 LearningRate 0.0003 Epoch: 19 Global Step: 407020 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:45:20,630-Speed 6309.11 samples/sec Loss 4.9785 LearningRate 0.0003 Epoch: 19 Global Step: 407030 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:45:23,943-Speed 6182.53 samples/sec Loss 5.0004 
LearningRate 0.0003 Epoch: 19 Global Step: 407040 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:45:27,186-Speed 6317.23 samples/sec Loss 4.9900 LearningRate 0.0003 Epoch: 19 Global Step: 407050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:30,436-Speed 6303.02 samples/sec Loss 4.9461 LearningRate 0.0003 Epoch: 19 Global Step: 407060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:33,684-Speed 6307.59 samples/sec Loss 4.9188 LearningRate 0.0003 Epoch: 19 Global Step: 407070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:36,933-Speed 6304.95 samples/sec Loss 4.9181 LearningRate 0.0003 Epoch: 19 Global Step: 407080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:40,181-Speed 6305.97 samples/sec Loss 4.9735 LearningRate 0.0003 Epoch: 19 Global Step: 407090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:43,430-Speed 6304.22 samples/sec Loss 4.8736 LearningRate 0.0003 Epoch: 19 Global Step: 407100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:46,677-Speed 6309.80 samples/sec Loss 4.9413 LearningRate 0.0003 Epoch: 19 Global Step: 407110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:49,922-Speed 6312.68 samples/sec Loss 4.9517 LearningRate 0.0003 Epoch: 19 Global Step: 407120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:53,168-Speed 6309.97 samples/sec Loss 4.9151 LearningRate 0.0003 Epoch: 19 Global Step: 407130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:56,415-Speed 6308.43 samples/sec Loss 4.8655 LearningRate 0.0003 Epoch: 19 Global Step: 407140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:45:59,652-Speed 6328.31 samples/sec Loss 4.8882 LearningRate 0.0003 Epoch: 19 Global Step: 407150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:02,900-Speed 6307.79 samples/sec Loss 4.9169 LearningRate 0.0003 Epoch: 19 
Global Step: 407160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:06,148-Speed 6306.51 samples/sec Loss 4.8957 LearningRate 0.0003 Epoch: 19 Global Step: 407170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:09,390-Speed 6318.54 samples/sec Loss 4.9275 LearningRate 0.0003 Epoch: 19 Global Step: 407180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:12,639-Speed 6304.66 samples/sec Loss 4.9308 LearningRate 0.0003 Epoch: 19 Global Step: 407190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:15,883-Speed 6314.46 samples/sec Loss 4.9848 LearningRate 0.0003 Epoch: 19 Global Step: 407200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:19,129-Speed 6311.87 samples/sec Loss 4.9011 LearningRate 0.0003 Epoch: 19 Global Step: 407210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:22,382-Speed 6298.58 samples/sec Loss 5.0694 LearningRate 0.0003 Epoch: 19 Global Step: 407220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:25,630-Speed 6306.62 samples/sec Loss 4.9800 LearningRate 0.0003 Epoch: 19 Global Step: 407230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:28,873-Speed 6315.27 samples/sec Loss 4.9469 LearningRate 0.0003 Epoch: 19 Global Step: 407240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:32,123-Speed 6303.79 samples/sec Loss 4.9871 LearningRate 0.0003 Epoch: 19 Global Step: 407250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:35,371-Speed 6306.92 samples/sec Loss 4.8717 LearningRate 0.0003 Epoch: 19 Global Step: 407260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:38,619-Speed 6310.07 samples/sec Loss 4.9276 LearningRate 0.0003 Epoch: 19 Global Step: 407270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:41,868-Speed 6303.26 samples/sec Loss 5.0001 LearningRate 0.0003 Epoch: 19 Global Step: 407280 Fp16 Grad 
Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:45,117-Speed 6305.72 samples/sec Loss 4.9427 LearningRate 0.0003 Epoch: 19 Global Step: 407290 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:48,364-Speed 6308.73 samples/sec Loss 4.9575 LearningRate 0.0003 Epoch: 19 Global Step: 407300 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:46:51,596-Speed 6338.71 samples/sec Loss 4.9900 LearningRate 0.0003 Epoch: 19 Global Step: 407310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:54,852-Speed 6289.85 samples/sec Loss 4.9858 LearningRate 0.0003 Epoch: 19 Global Step: 407320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:46:58,097-Speed 6312.50 samples/sec Loss 4.9481 LearningRate 0.0003 Epoch: 19 Global Step: 407330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:01,345-Speed 6308.22 samples/sec Loss 4.9201 LearningRate 0.0003 Epoch: 19 Global Step: 407340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:04,592-Speed 6308.91 samples/sec Loss 4.9233 LearningRate 0.0003 Epoch: 19 Global Step: 407350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:07,838-Speed 6310.68 samples/sec Loss 4.9534 LearningRate 0.0003 Epoch: 19 Global Step: 407360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:11,086-Speed 6305.55 samples/sec Loss 4.9123 LearningRate 0.0003 Epoch: 19 Global Step: 407370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:14,328-Speed 6318.92 samples/sec Loss 5.0078 LearningRate 0.0003 Epoch: 19 Global Step: 407380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:17,573-Speed 6312.73 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 407390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:20,818-Speed 6313.14 samples/sec Loss 4.8776 LearningRate 0.0003 Epoch: 19 Global Step: 407400 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 04:47:24,100-Speed 6241.07 samples/sec Loss 5.0054 LearningRate 0.0003 Epoch: 19 Global Step: 407410 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:47:27,333-Speed 6335.57 samples/sec Loss 4.9484 LearningRate 0.0003 Epoch: 19 Global Step: 407420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:30,578-Speed 6312.64 samples/sec Loss 4.9842 LearningRate 0.0003 Epoch: 19 Global Step: 407430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:33,824-Speed 6310.61 samples/sec Loss 4.8426 LearningRate 0.0003 Epoch: 19 Global Step: 407440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:37,069-Speed 6313.05 samples/sec Loss 4.9909 LearningRate 0.0003 Epoch: 19 Global Step: 407450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:40,313-Speed 6316.25 samples/sec Loss 4.9463 LearningRate 0.0003 Epoch: 19 Global Step: 407460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:43,558-Speed 6312.43 samples/sec Loss 5.0348 LearningRate 0.0003 Epoch: 19 Global Step: 407470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:46,799-Speed 6320.22 samples/sec Loss 4.9441 LearningRate 0.0003 Epoch: 19 Global Step: 407480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:50,045-Speed 6309.33 samples/sec Loss 4.9553 LearningRate 0.0003 Epoch: 19 Global Step: 407490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:53,289-Speed 6315.76 samples/sec Loss 4.9519 LearningRate 0.0003 Epoch: 19 Global Step: 407500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:56,538-Speed 6305.55 samples/sec Loss 5.0060 LearningRate 0.0003 Epoch: 19 Global Step: 407510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:47:59,781-Speed 6314.67 samples/sec Loss 4.8947 LearningRate 0.0003 Epoch: 19 Global Step: 407520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 
04:48:03,016-Speed 6333.09 samples/sec Loss 4.9476 LearningRate 0.0003 Epoch: 19 Global Step: 407530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:06,260-Speed 6313.93 samples/sec Loss 5.0130 LearningRate 0.0003 Epoch: 19 Global Step: 407540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:09,505-Speed 6312.70 samples/sec Loss 4.9513 LearningRate 0.0003 Epoch: 19 Global Step: 407550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:12,746-Speed 6320.22 samples/sec Loss 4.9158 LearningRate 0.0003 Epoch: 19 Global Step: 407560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:16,011-Speed 6273.97 samples/sec Loss 4.9435 LearningRate 0.0003 Epoch: 19 Global Step: 407570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:19,259-Speed 6307.10 samples/sec Loss 4.9334 LearningRate 0.0003 Epoch: 19 Global Step: 407580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:22,505-Speed 6311.67 samples/sec Loss 4.9017 LearningRate 0.0003 Epoch: 19 Global Step: 407590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:25,758-Speed 6296.81 samples/sec Loss 4.9888 LearningRate 0.0003 Epoch: 19 Global Step: 407600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:29,012-Speed 6295.84 samples/sec Loss 4.9094 LearningRate 0.0003 Epoch: 19 Global Step: 407610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:32,254-Speed 6317.07 samples/sec Loss 4.9537 LearningRate 0.0003 Epoch: 19 Global Step: 407620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:35,484-Speed 6342.57 samples/sec Loss 4.9608 LearningRate 0.0003 Epoch: 19 Global Step: 407630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:38,727-Speed 6316.62 samples/sec Loss 4.8767 LearningRate 0.0003 Epoch: 19 Global Step: 407640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:41,973-Speed 6310.98 
samples/sec Loss 4.9380 LearningRate 0.0003 Epoch: 19 Global Step: 407650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:45,216-Speed 6316.40 samples/sec Loss 4.9707 LearningRate 0.0003 Epoch: 19 Global Step: 407660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:48,465-Speed 6306.80 samples/sec Loss 4.9948 LearningRate 0.0003 Epoch: 19 Global Step: 407670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:51,710-Speed 6312.46 samples/sec Loss 4.8814 LearningRate 0.0003 Epoch: 19 Global Step: 407680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:54,956-Speed 6309.64 samples/sec Loss 4.9638 LearningRate 0.0003 Epoch: 19 Global Step: 407690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:48:58,199-Speed 6316.53 samples/sec Loss 4.9741 LearningRate 0.0003 Epoch: 19 Global Step: 407700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:01,444-Speed 6312.99 samples/sec Loss 4.9957 LearningRate 0.0003 Epoch: 19 Global Step: 407710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:04,687-Speed 6317.45 samples/sec Loss 4.9269 LearningRate 0.0003 Epoch: 19 Global Step: 407720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:07,935-Speed 6306.31 samples/sec Loss 4.9511 LearningRate 0.0003 Epoch: 19 Global Step: 407730 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:49:11,204-Speed 6265.63 samples/sec Loss 4.9748 LearningRate 0.0003 Epoch: 19 Global Step: 407740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:14,447-Speed 6317.30 samples/sec Loss 4.9620 LearningRate 0.0003 Epoch: 19 Global Step: 407750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:17,704-Speed 6289.64 samples/sec Loss 4.9567 LearningRate 0.0003 Epoch: 19 Global Step: 407760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:20,949-Speed 6312.24 samples/sec Loss 4.9292 
LearningRate 0.0003 Epoch: 19 Global Step: 407770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:24,199-Speed 6303.30 samples/sec Loss 4.9241 LearningRate 0.0003 Epoch: 19 Global Step: 407780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:27,445-Speed 6308.90 samples/sec Loss 4.8890 LearningRate 0.0003 Epoch: 19 Global Step: 407790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:30,687-Speed 6319.53 samples/sec Loss 4.9532 LearningRate 0.0003 Epoch: 19 Global Step: 407800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:33,934-Speed 6309.32 samples/sec Loss 4.8796 LearningRate 0.0003 Epoch: 19 Global Step: 407810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:37,187-Speed 6296.05 samples/sec Loss 4.9099 LearningRate 0.0003 Epoch: 19 Global Step: 407820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:40,433-Speed 6311.42 samples/sec Loss 4.9931 LearningRate 0.0003 Epoch: 19 Global Step: 407830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:43,664-Speed 6339.45 samples/sec Loss 4.9565 LearningRate 0.0003 Epoch: 19 Global Step: 407840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:46,913-Speed 6305.77 samples/sec Loss 4.9376 LearningRate 0.0003 Epoch: 19 Global Step: 407850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:50,156-Speed 6316.18 samples/sec Loss 4.9563 LearningRate 0.0003 Epoch: 19 Global Step: 407860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:53,404-Speed 6306.99 samples/sec Loss 4.9246 LearningRate 0.0003 Epoch: 19 Global Step: 407870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:56,649-Speed 6313.17 samples/sec Loss 4.9504 LearningRate 0.0003 Epoch: 19 Global Step: 407880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:49:59,896-Speed 6309.17 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 
Global Step: 407890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:03,144-Speed 6306.33 samples/sec Loss 4.8980 LearningRate 0.0003 Epoch: 19 Global Step: 407900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:06,389-Speed 6312.69 samples/sec Loss 4.9288 LearningRate 0.0003 Epoch: 19 Global Step: 407910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:09,633-Speed 6313.60 samples/sec Loss 4.9127 LearningRate 0.0003 Epoch: 19 Global Step: 407920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:12,877-Speed 6315.45 samples/sec Loss 4.8726 LearningRate 0.0003 Epoch: 19 Global Step: 407930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:16,122-Speed 6313.80 samples/sec Loss 4.9565 LearningRate 0.0003 Epoch: 19 Global Step: 407940 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:50:19,351-Speed 6343.05 samples/sec Loss 4.9396 LearningRate 0.0003 Epoch: 19 Global Step: 407950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:22,603-Speed 6299.85 samples/sec Loss 4.9602 LearningRate 0.0003 Epoch: 19 Global Step: 407960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:25,850-Speed 6308.61 samples/sec Loss 4.9256 LearningRate 0.0003 Epoch: 19 Global Step: 407970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:29,092-Speed 6316.61 samples/sec Loss 4.9073 LearningRate 0.0003 Epoch: 19 Global Step: 407980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:32,347-Speed 6294.71 samples/sec Loss 4.9220 LearningRate 0.0003 Epoch: 19 Global Step: 407990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:35,588-Speed 6319.03 samples/sec Loss 4.8932 LearningRate 0.0003 Epoch: 19 Global Step: 408000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:38,832-Speed 6315.40 samples/sec Loss 4.9980 LearningRate 0.0003 Epoch: 19 Global Step: 408010 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:42,078-Speed 6311.78 samples/sec Loss 4.9894 LearningRate 0.0003 Epoch: 19 Global Step: 408020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:45,323-Speed 6312.10 samples/sec Loss 4.9207 LearningRate 0.0003 Epoch: 19 Global Step: 408030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:48,580-Speed 6289.58 samples/sec Loss 4.9507 LearningRate 0.0003 Epoch: 19 Global Step: 408040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:50:51,826-Speed 6310.65 samples/sec Loss 4.9218 LearningRate 0.0003 Epoch: 19 Global Step: 408050 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:50:55,070-Speed 6313.88 samples/sec Loss 4.9872 LearningRate 0.0003 Epoch: 19 Global Step: 408060 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:50:58,317-Speed 6309.49 samples/sec Loss 4.9842 LearningRate 0.0003 Epoch: 19 Global Step: 408070 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:01,564-Speed 6308.99 samples/sec Loss 5.0092 LearningRate 0.0003 Epoch: 19 Global Step: 408080 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:04,818-Speed 6294.15 samples/sec Loss 4.9459 LearningRate 0.0003 Epoch: 19 Global Step: 408090 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:08,065-Speed 6309.69 samples/sec Loss 4.9370 LearningRate 0.0003 Epoch: 19 Global Step: 408100 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:11,312-Speed 6307.70 samples/sec Loss 5.0493 LearningRate 0.0003 Epoch: 19 Global Step: 408110 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:14,553-Speed 6322.16 samples/sec Loss 5.0030 LearningRate 0.0003 Epoch: 19 Global Step: 408120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:17,852-Speed 6208.48 samples/sec Loss 4.9850 LearningRate 0.0003 Epoch: 19 Global Step: 408130 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 04:51:21,101-Speed 6306.29 samples/sec Loss 4.8424 LearningRate 0.0003 Epoch: 19 Global Step: 408140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:24,346-Speed 6312.00 samples/sec Loss 4.9416 LearningRate 0.0003 Epoch: 19 Global Step: 408150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:27,588-Speed 6316.89 samples/sec Loss 4.9949 LearningRate 0.0003 Epoch: 19 Global Step: 408160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:30,828-Speed 6322.95 samples/sec Loss 4.9625 LearningRate 0.0003 Epoch: 19 Global Step: 408170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:34,076-Speed 6306.83 samples/sec Loss 4.9335 LearningRate 0.0003 Epoch: 19 Global Step: 408180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:37,319-Speed 6316.34 samples/sec Loss 4.9935 LearningRate 0.0003 Epoch: 19 Global Step: 408190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:40,563-Speed 6315.89 samples/sec Loss 4.9564 LearningRate 0.0003 Epoch: 19 Global Step: 408200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:43,807-Speed 6314.24 samples/sec Loss 4.9897 LearningRate 0.0003 Epoch: 19 Global Step: 408210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:51:47,058-Speed 6300.18 samples/sec Loss 4.8867 LearningRate 0.0003 Epoch: 19 Global Step: 408220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:50,303-Speed 6312.67 samples/sec Loss 4.9116 LearningRate 0.0003 Epoch: 19 Global Step: 408230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:53,549-Speed 6311.70 samples/sec Loss 4.9777 LearningRate 0.0003 Epoch: 19 Global Step: 408240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:51:56,795-Speed 6310.22 samples/sec Loss 4.9640 LearningRate 0.0003 Epoch: 19 Global Step: 408250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 
04:52:00,046-Speed 6300.62 samples/sec Loss 5.0102 LearningRate 0.0003 Epoch: 19 Global Step: 408260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:03,296-Speed 6303.77 samples/sec Loss 4.9856 LearningRate 0.0003 Epoch: 19 Global Step: 408270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:06,543-Speed 6308.13 samples/sec Loss 4.9533 LearningRate 0.0003 Epoch: 19 Global Step: 408280 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:09,788-Speed 6313.32 samples/sec Loss 4.9262 LearningRate 0.0003 Epoch: 19 Global Step: 408290 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:13,036-Speed 6307.40 samples/sec Loss 4.8789 LearningRate 0.0003 Epoch: 19 Global Step: 408300 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:16,284-Speed 6307.55 samples/sec Loss 4.8985 LearningRate 0.0003 Epoch: 19 Global Step: 408310 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:19,518-Speed 6332.99 samples/sec Loss 4.9055 LearningRate 0.0003 Epoch: 19 Global Step: 408320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:22,773-Speed 6293.21 samples/sec Loss 4.9058 LearningRate 0.0003 Epoch: 19 Global Step: 408330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:26,019-Speed 6311.28 samples/sec Loss 4.8535 LearningRate 0.0003 Epoch: 19 Global Step: 408340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:29,267-Speed 6306.83 samples/sec Loss 4.9791 LearningRate 0.0003 Epoch: 19 Global Step: 408350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:32,513-Speed 6309.88 samples/sec Loss 4.9186 LearningRate 0.0003 Epoch: 19 Global Step: 408360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:35,755-Speed 6318.33 samples/sec Loss 4.9123 LearningRate 0.0003 Epoch: 19 Global Step: 408370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:38,999-Speed 6316.21 
samples/sec Loss 4.9226 LearningRate 0.0003 Epoch: 19 Global Step: 408380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:42,247-Speed 6305.32 samples/sec Loss 4.9352 LearningRate 0.0003 Epoch: 19 Global Step: 408390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:45,491-Speed 6315.29 samples/sec Loss 4.9571 LearningRate 0.0003 Epoch: 19 Global Step: 408400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:48,738-Speed 6309.56 samples/sec Loss 4.9433 LearningRate 0.0003 Epoch: 19 Global Step: 408410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:52:51,988-Speed 6305.74 samples/sec Loss 4.9508 LearningRate 0.0003 Epoch: 19 Global Step: 408420 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:55,235-Speed 6309.21 samples/sec Loss 4.9396 LearningRate 0.0003 Epoch: 19 Global Step: 408430 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:52:58,480-Speed 6313.01 samples/sec Loss 4.9285 LearningRate 0.0003 Epoch: 19 Global Step: 408440 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:53:01,724-Speed 6313.84 samples/sec Loss 4.9598 LearningRate 0.0003 Epoch: 19 Global Step: 408450 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:53:04,952-Speed 6345.36 samples/sec Loss 4.9669 LearningRate 0.0003 Epoch: 19 Global Step: 408460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:08,198-Speed 6310.68 samples/sec Loss 4.9306 LearningRate 0.0003 Epoch: 19 Global Step: 408470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:11,443-Speed 6313.84 samples/sec Loss 4.8684 LearningRate 0.0003 Epoch: 19 Global Step: 408480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:14,690-Speed 6307.71 samples/sec Loss 4.8905 LearningRate 0.0003 Epoch: 19 Global Step: 408490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:17,941-Speed 6301.75 samples/sec Loss 4.9259 
LearningRate 0.0003 Epoch: 19 Global Step: 408500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:21,187-Speed 6310.63 samples/sec Loss 4.9970 LearningRate 0.0003 Epoch: 19 Global Step: 408510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:24,435-Speed 6307.07 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 Global Step: 408520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:27,682-Speed 6309.79 samples/sec Loss 4.9107 LearningRate 0.0003 Epoch: 19 Global Step: 408530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:30,926-Speed 6314.65 samples/sec Loss 4.9774 LearningRate 0.0003 Epoch: 19 Global Step: 408540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:34,174-Speed 6307.54 samples/sec Loss 4.9231 LearningRate 0.0003 Epoch: 19 Global Step: 408550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:37,416-Speed 6316.84 samples/sec Loss 4.9757 LearningRate 0.0003 Epoch: 19 Global Step: 408560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:53:40,645-Speed 6343.35 samples/sec Loss 5.0038 LearningRate 0.0003 Epoch: 19 Global Step: 408570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:43,891-Speed 6312.11 samples/sec Loss 4.9965 LearningRate 0.0003 Epoch: 19 Global Step: 408580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:47,134-Speed 6316.01 samples/sec Loss 4.9715 LearningRate 0.0003 Epoch: 19 Global Step: 408590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:50,380-Speed 6309.97 samples/sec Loss 4.9211 LearningRate 0.0003 Epoch: 19 Global Step: 408600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:53,624-Speed 6315.29 samples/sec Loss 4.9081 LearningRate 0.0003 Epoch: 19 Global Step: 408610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:53:56,871-Speed 6309.60 samples/sec Loss 4.9266 LearningRate 0.0003 Epoch: 19 
Global Step: 408620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:00,113-Speed 6318.41 samples/sec Loss 4.8775 LearningRate 0.0003 Epoch: 19 Global Step: 408630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:03,358-Speed 6311.57 samples/sec Loss 4.9172 LearningRate 0.0003 Epoch: 19 Global Step: 408640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:06,602-Speed 6314.62 samples/sec Loss 4.9119 LearningRate 0.0003 Epoch: 19 Global Step: 408650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:09,852-Speed 6302.81 samples/sec Loss 4.9400 LearningRate 0.0003 Epoch: 19 Global Step: 408660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:13,096-Speed 6314.20 samples/sec Loss 4.9511 LearningRate 0.0003 Epoch: 19 Global Step: 408670 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:54:16,340-Speed 6316.06 samples/sec Loss 4.9089 LearningRate 0.0003 Epoch: 19 Global Step: 408680 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:54:19,582-Speed 6317.32 samples/sec Loss 4.9164 LearningRate 0.0003 Epoch: 19 Global Step: 408690 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:54:22,844-Speed 6280.58 samples/sec Loss 4.9186 LearningRate 0.0003 Epoch: 19 Global Step: 408700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:26,091-Speed 6308.08 samples/sec Loss 4.9013 LearningRate 0.0003 Epoch: 19 Global Step: 408710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:29,380-Speed 6229.74 samples/sec Loss 4.9836 LearningRate 0.0003 Epoch: 19 Global Step: 408720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:32,626-Speed 6310.46 samples/sec Loss 5.0089 LearningRate 0.0003 Epoch: 19 Global Step: 408730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:35,872-Speed 6310.25 samples/sec Loss 4.8989 LearningRate 0.0003 Epoch: 19 Global Step: 408740 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:39,116-Speed 6314.74 samples/sec Loss 4.9692 LearningRate 0.0003 Epoch: 19 Global Step: 408750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:42,363-Speed 6310.16 samples/sec Loss 4.8986 LearningRate 0.0003 Epoch: 19 Global Step: 408760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:45,606-Speed 6314.71 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 Global Step: 408770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:48,847-Speed 6321.63 samples/sec Loss 4.9413 LearningRate 0.0003 Epoch: 19 Global Step: 408780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:52,096-Speed 6305.62 samples/sec Loss 4.9248 LearningRate 0.0003 Epoch: 19 Global Step: 408790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:54:55,344-Speed 6305.59 samples/sec Loss 4.9709 LearningRate 0.0003 Epoch: 19 Global Step: 408800 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:54:58,576-Speed 6338.34 samples/sec Loss 4.9077 LearningRate 0.0003 Epoch: 19 Global Step: 408810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:01,821-Speed 6313.18 samples/sec Loss 4.9325 LearningRate 0.0003 Epoch: 19 Global Step: 408820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:05,071-Speed 6303.29 samples/sec Loss 4.9118 LearningRate 0.0003 Epoch: 19 Global Step: 408830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:08,314-Speed 6316.04 samples/sec Loss 4.9840 LearningRate 0.0003 Epoch: 19 Global Step: 408840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:11,557-Speed 6315.63 samples/sec Loss 5.0019 LearningRate 0.0003 Epoch: 19 Global Step: 408850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:14,803-Speed 6311.46 samples/sec Loss 4.9247 LearningRate 0.0003 Epoch: 19 Global Step: 408860 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 04:55:18,054-Speed 6300.30 samples/sec Loss 4.9632 LearningRate 0.0003 Epoch: 19 Global Step: 408870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:21,298-Speed 6315.54 samples/sec Loss 4.9657 LearningRate 0.0003 Epoch: 19 Global Step: 408880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:24,544-Speed 6310.82 samples/sec Loss 5.0139 LearningRate 0.0003 Epoch: 19 Global Step: 408890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:27,792-Speed 6305.99 samples/sec Loss 4.9076 LearningRate 0.0003 Epoch: 19 Global Step: 408900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:31,038-Speed 6310.79 samples/sec Loss 4.9825 LearningRate 0.0003 Epoch: 19 Global Step: 408910 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:55:34,270-Speed 6338.14 samples/sec Loss 4.9086 LearningRate 0.0003 Epoch: 19 Global Step: 408920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:37,529-Speed 6285.12 samples/sec Loss 4.9321 LearningRate 0.0003 Epoch: 19 Global Step: 408930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:40,776-Speed 6310.54 samples/sec Loss 4.9297 LearningRate 0.0003 Epoch: 19 Global Step: 408940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:44,024-Speed 6307.04 samples/sec Loss 4.8944 LearningRate 0.0003 Epoch: 19 Global Step: 408950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:47,274-Speed 6302.23 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 408960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:50,516-Speed 6319.13 samples/sec Loss 4.9801 LearningRate 0.0003 Epoch: 19 Global Step: 408970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:55:53,763-Speed 6308.79 samples/sec Loss 4.9703 LearningRate 0.0003 Epoch: 19 Global Step: 408980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
04:55:57,012-Speed 6304.10 samples/sec Loss 5.0029 LearningRate 0.0003 Epoch: 19 Global Step: 408990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:00,256-Speed 6315.95 samples/sec Loss 4.9338 LearningRate 0.0003 Epoch: 19 Global Step: 409000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:03,501-Speed 6312.47 samples/sec Loss 4.9477 LearningRate 0.0003 Epoch: 19 Global Step: 409010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:06,747-Speed 6309.48 samples/sec Loss 4.9051 LearningRate 0.0003 Epoch: 19 Global Step: 409020 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:56:09,998-Speed 6300.67 samples/sec Loss 4.9561 LearningRate 0.0003 Epoch: 19 Global Step: 409030 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:56:13,242-Speed 6315.50 samples/sec Loss 4.9475 LearningRate 0.0003 Epoch: 19 Global Step: 409040 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:56:16,473-Speed 6340.31 samples/sec Loss 4.9972 LearningRate 0.0003 Epoch: 19 Global Step: 409050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:19,717-Speed 6314.23 samples/sec Loss 4.9473 LearningRate 0.0003 Epoch: 19 Global Step: 409060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:22,961-Speed 6313.62 samples/sec Loss 4.9322 LearningRate 0.0003 Epoch: 19 Global Step: 409070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:26,208-Speed 6310.06 samples/sec Loss 4.8639 LearningRate 0.0003 Epoch: 19 Global Step: 409080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:29,463-Speed 6292.84 samples/sec Loss 4.9269 LearningRate 0.0003 Epoch: 19 Global Step: 409090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:32,706-Speed 6317.19 samples/sec Loss 5.0022 LearningRate 0.0003 Epoch: 19 Global Step: 409100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:35,951-Speed 6312.62 
samples/sec Loss 4.9253 LearningRate 0.0003 Epoch: 19 Global Step: 409110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:39,204-Speed 6295.54 samples/sec Loss 4.9158 LearningRate 0.0003 Epoch: 19 Global Step: 409120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:42,451-Speed 6310.28 samples/sec Loss 4.9001 LearningRate 0.0003 Epoch: 19 Global Step: 409130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:45,693-Speed 6319.02 samples/sec Loss 4.9301 LearningRate 0.0003 Epoch: 19 Global Step: 409140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:48,941-Speed 6306.12 samples/sec Loss 4.9092 LearningRate 0.0003 Epoch: 19 Global Step: 409150 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:56:52,172-Speed 6339.27 samples/sec Loss 4.9768 LearningRate 0.0003 Epoch: 19 Global Step: 409160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:55,459-Speed 6233.27 samples/sec Loss 4.8783 LearningRate 0.0003 Epoch: 19 Global Step: 409170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:56:58,702-Speed 6315.84 samples/sec Loss 4.9593 LearningRate 0.0003 Epoch: 19 Global Step: 409180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:01,948-Speed 6311.01 samples/sec Loss 4.9586 LearningRate 0.0003 Epoch: 19 Global Step: 409190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:05,193-Speed 6312.48 samples/sec Loss 4.9478 LearningRate 0.0003 Epoch: 19 Global Step: 409200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:08,440-Speed 6309.11 samples/sec Loss 4.8943 LearningRate 0.0003 Epoch: 19 Global Step: 409210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:11,686-Speed 6310.31 samples/sec Loss 4.9702 LearningRate 0.0003 Epoch: 19 Global Step: 409220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:14,930-Speed 6314.09 samples/sec Loss 4.9803 
LearningRate 0.0003 Epoch: 19 Global Step: 409230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:18,178-Speed 6308.19 samples/sec Loss 4.9323 LearningRate 0.0003 Epoch: 19 Global Step: 409240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:21,427-Speed 6305.06 samples/sec Loss 4.8374 LearningRate 0.0003 Epoch: 19 Global Step: 409250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:24,671-Speed 6314.08 samples/sec Loss 4.9008 LearningRate 0.0003 Epoch: 19 Global Step: 409260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:57:27,920-Speed 6304.52 samples/sec Loss 4.9624 LearningRate 0.0003 Epoch: 19 Global Step: 409270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:57:31,150-Speed 6342.29 samples/sec Loss 4.9783 LearningRate 0.0003 Epoch: 19 Global Step: 409280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:34,405-Speed 6292.08 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 409290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:37,653-Speed 6308.31 samples/sec Loss 4.9665 LearningRate 0.0003 Epoch: 19 Global Step: 409300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:40,897-Speed 6314.79 samples/sec Loss 4.9299 LearningRate 0.0003 Epoch: 19 Global Step: 409310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:44,141-Speed 6313.99 samples/sec Loss 4.8819 LearningRate 0.0003 Epoch: 19 Global Step: 409320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:47,386-Speed 6312.43 samples/sec Loss 4.8050 LearningRate 0.0003 Epoch: 19 Global Step: 409330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:50,633-Speed 6308.60 samples/sec Loss 4.8847 LearningRate 0.0003 Epoch: 19 Global Step: 409340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:53,879-Speed 6310.49 samples/sec Loss 4.9749 LearningRate 0.0003 Epoch: 19 
Global Step: 409350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:57:57,125-Speed 6310.14 samples/sec Loss 4.9390 LearningRate 0.0003 Epoch: 19 Global Step: 409360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:00,374-Speed 6306.93 samples/sec Loss 4.9350 LearningRate 0.0003 Epoch: 19 Global Step: 409370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:03,628-Speed 6293.78 samples/sec Loss 4.9600 LearningRate 0.0003 Epoch: 19 Global Step: 409380 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:58:06,917-Speed 6228.48 samples/sec Loss 4.9200 LearningRate 0.0003 Epoch: 19 Global Step: 409390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:10,163-Speed 6311.95 samples/sec Loss 4.9019 LearningRate 0.0003 Epoch: 19 Global Step: 409400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:13,410-Speed 6309.17 samples/sec Loss 4.9705 LearningRate 0.0003 Epoch: 19 Global Step: 409410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:16,655-Speed 6312.94 samples/sec Loss 4.9444 LearningRate 0.0003 Epoch: 19 Global Step: 409420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:19,897-Speed 6317.18 samples/sec Loss 4.9007 LearningRate 0.0003 Epoch: 19 Global Step: 409430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:23,156-Speed 6285.04 samples/sec Loss 4.9112 LearningRate 0.0003 Epoch: 19 Global Step: 409440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:26,396-Speed 6322.88 samples/sec Loss 4.9365 LearningRate 0.0003 Epoch: 19 Global Step: 409450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:29,642-Speed 6311.85 samples/sec Loss 4.9806 LearningRate 0.0003 Epoch: 19 Global Step: 409460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:32,889-Speed 6308.60 samples/sec Loss 4.8414 LearningRate 0.0003 Epoch: 19 Global Step: 409470 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:36,132-Speed 6315.88 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 Global Step: 409480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:39,364-Speed 6338.38 samples/sec Loss 4.9737 LearningRate 0.0003 Epoch: 19 Global Step: 409490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:42,661-Speed 6212.35 samples/sec Loss 4.9921 LearningRate 0.0003 Epoch: 19 Global Step: 409500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:45,905-Speed 6315.21 samples/sec Loss 4.9804 LearningRate 0.0003 Epoch: 19 Global Step: 409510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:49,152-Speed 6307.51 samples/sec Loss 4.9595 LearningRate 0.0003 Epoch: 19 Global Step: 409520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:52,407-Speed 6293.66 samples/sec Loss 4.8902 LearningRate 0.0003 Epoch: 19 Global Step: 409530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:55,654-Speed 6308.42 samples/sec Loss 4.8638 LearningRate 0.0003 Epoch: 19 Global Step: 409540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:58:58,897-Speed 6317.83 samples/sec Loss 4.9628 LearningRate 0.0003 Epoch: 19 Global Step: 409550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:02,245-Speed 6118.14 samples/sec Loss 4.9158 LearningRate 0.0003 Epoch: 19 Global Step: 409560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:05,484-Speed 6323.67 samples/sec Loss 4.9934 LearningRate 0.0003 Epoch: 19 Global Step: 409570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:08,727-Speed 6316.86 samples/sec Loss 4.9022 LearningRate 0.0003 Epoch: 19 Global Step: 409580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:11,988-Speed 6282.31 samples/sec Loss 4.9283 LearningRate 0.0003 Epoch: 19 Global Step: 409590 Fp16 Grad Scale: 32768 Required: 38 hours 
Training: 2022-04-02 04:59:15,235-Speed 6310.14 samples/sec Loss 4.9931 LearningRate 0.0003 Epoch: 19 Global Step: 409600 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:59:18,515-Speed 6244.10 samples/sec Loss 4.8566 LearningRate 0.0003 Epoch: 19 Global Step: 409610 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:59:21,759-Speed 6314.67 samples/sec Loss 4.9324 LearningRate 0.0003 Epoch: 19 Global Step: 409620 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 04:59:24,996-Speed 6329.06 samples/sec Loss 4.8941 LearningRate 0.0003 Epoch: 19 Global Step: 409630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:28,238-Speed 6318.71 samples/sec Loss 4.9422 LearningRate 0.0003 Epoch: 19 Global Step: 409640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:31,481-Speed 6315.78 samples/sec Loss 4.9103 LearningRate 0.0003 Epoch: 19 Global Step: 409650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:34,730-Speed 6304.25 samples/sec Loss 4.9181 LearningRate 0.0003 Epoch: 19 Global Step: 409660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:37,978-Speed 6308.11 samples/sec Loss 4.9353 LearningRate 0.0003 Epoch: 19 Global Step: 409670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:41,222-Speed 6313.99 samples/sec Loss 4.8357 LearningRate 0.0003 Epoch: 19 Global Step: 409680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:44,466-Speed 6313.79 samples/sec Loss 4.8480 LearningRate 0.0003 Epoch: 19 Global Step: 409690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:47,714-Speed 6308.26 samples/sec Loss 4.9019 LearningRate 0.0003 Epoch: 19 Global Step: 409700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:50,959-Speed 6311.08 samples/sec Loss 4.9978 LearningRate 0.0003 Epoch: 19 Global Step: 409710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
04:59:54,201-Speed 6318.31 samples/sec Loss 4.9636 LearningRate 0.0003 Epoch: 19 Global Step: 409720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 04:59:57,448-Speed 6309.73 samples/sec Loss 4.9712 LearningRate 0.0003 Epoch: 19 Global Step: 409730 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:00,703-Speed 6293.57 samples/sec Loss 4.9579 LearningRate 0.0003 Epoch: 19 Global Step: 409740 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:03,955-Speed 6298.43 samples/sec Loss 4.9128 LearningRate 0.0003 Epoch: 19 Global Step: 409750 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:07,201-Speed 6311.12 samples/sec Loss 4.9584 LearningRate 0.0003 Epoch: 19 Global Step: 409760 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:10,460-Speed 6286.07 samples/sec Loss 4.9043 LearningRate 0.0003 Epoch: 19 Global Step: 409770 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:13,694-Speed 6332.19 samples/sec Loss 4.9230 LearningRate 0.0003 Epoch: 19 Global Step: 409780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:16,948-Speed 6296.95 samples/sec Loss 4.8722 LearningRate 0.0003 Epoch: 19 Global Step: 409790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:20,196-Speed 6306.50 samples/sec Loss 4.9198 LearningRate 0.0003 Epoch: 19 Global Step: 409800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:23,442-Speed 6311.54 samples/sec Loss 4.9482 LearningRate 0.0003 Epoch: 19 Global Step: 409810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:26,686-Speed 6314.39 samples/sec Loss 4.9271 LearningRate 0.0003 Epoch: 19 Global Step: 409820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:29,935-Speed 6304.59 samples/sec Loss 4.8844 LearningRate 0.0003 Epoch: 19 Global Step: 409830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:33,182-Speed 6310.22 
samples/sec Loss 4.9856 LearningRate 0.0003 Epoch: 19 Global Step: 409840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:36,428-Speed 6308.91 samples/sec Loss 4.9020 LearningRate 0.0003 Epoch: 19 Global Step: 409850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:39,674-Speed 6311.66 samples/sec Loss 4.9155 LearningRate 0.0003 Epoch: 19 Global Step: 409860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:42,923-Speed 6305.83 samples/sec Loss 4.9536 LearningRate 0.0003 Epoch: 19 Global Step: 409870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:00:46,173-Speed 6301.68 samples/sec Loss 4.9963 LearningRate 0.0003 Epoch: 19 Global Step: 409880 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:49,418-Speed 6312.65 samples/sec Loss 4.9576 LearningRate 0.0003 Epoch: 19 Global Step: 409890 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:52,665-Speed 6309.84 samples/sec Loss 4.8697 LearningRate 0.0003 Epoch: 19 Global Step: 409900 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:55,909-Speed 6313.02 samples/sec Loss 4.9493 LearningRate 0.0003 Epoch: 19 Global Step: 409910 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:00:59,170-Speed 6283.53 samples/sec Loss 5.0403 LearningRate 0.0003 Epoch: 19 Global Step: 409920 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:01:02,405-Speed 6330.21 samples/sec Loss 4.8642 LearningRate 0.0003 Epoch: 19 Global Step: 409930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:05,652-Speed 6310.06 samples/sec Loss 4.9850 LearningRate 0.0003 Epoch: 19 Global Step: 409940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:08,899-Speed 6307.81 samples/sec Loss 4.8541 LearningRate 0.0003 Epoch: 19 Global Step: 409950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:12,144-Speed 6313.38 samples/sec Loss 4.9043 
LearningRate 0.0003 Epoch: 19 Global Step: 409960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:15,392-Speed 6307.11 samples/sec Loss 4.9907 LearningRate 0.0003 Epoch: 19 Global Step: 409970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:18,641-Speed 6304.92 samples/sec Loss 4.9160 LearningRate 0.0003 Epoch: 19 Global Step: 409980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:21,895-Speed 6295.52 samples/sec Loss 4.9164 LearningRate 0.0003 Epoch: 19 Global Step: 409990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:25,143-Speed 6306.56 samples/sec Loss 4.9175 LearningRate 0.0003 Epoch: 19 Global Step: 410000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:28,390-Speed 6309.07 samples/sec Loss 4.9068 LearningRate 0.0003 Epoch: 19 Global Step: 410010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:31,635-Speed 6312.81 samples/sec Loss 4.9678 LearningRate 0.0003 Epoch: 19 Global Step: 410020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:34,866-Speed 6339.50 samples/sec Loss 4.9316 LearningRate 0.0003 Epoch: 19 Global Step: 410030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:38,113-Speed 6313.00 samples/sec Loss 4.9563 LearningRate 0.0003 Epoch: 19 Global Step: 410040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:41,360-Speed 6308.44 samples/sec Loss 4.9873 LearningRate 0.0003 Epoch: 19 Global Step: 410050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:44,606-Speed 6311.60 samples/sec Loss 4.9899 LearningRate 0.0003 Epoch: 19 Global Step: 410060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:47,858-Speed 6297.72 samples/sec Loss 5.0333 LearningRate 0.0003 Epoch: 19 Global Step: 410070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:51,103-Speed 6312.55 samples/sec Loss 4.9860 LearningRate 0.0003 Epoch: 19 
Global Step: 410080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:54,350-Speed 6308.51 samples/sec Loss 4.9293 LearningRate 0.0003 Epoch: 19 Global Step: 410090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:01:57,594-Speed 6314.52 samples/sec Loss 4.9688 LearningRate 0.0003 Epoch: 19 Global Step: 410100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:00,844-Speed 6302.73 samples/sec Loss 4.8598 LearningRate 0.0003 Epoch: 19 Global Step: 410110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:04,091-Speed 6310.39 samples/sec Loss 4.9057 LearningRate 0.0003 Epoch: 19 Global Step: 410120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:07,324-Speed 6335.99 samples/sec Loss 4.9123 LearningRate 0.0003 Epoch: 19 Global Step: 410130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:10,569-Speed 6311.78 samples/sec Loss 4.9938 LearningRate 0.0003 Epoch: 19 Global Step: 410140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:13,810-Speed 6320.25 samples/sec Loss 4.9071 LearningRate 0.0003 Epoch: 19 Global Step: 410150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:17,059-Speed 6306.43 samples/sec Loss 4.9296 LearningRate 0.0003 Epoch: 19 Global Step: 410160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:20,305-Speed 6309.41 samples/sec Loss 4.9201 LearningRate 0.0003 Epoch: 19 Global Step: 410170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:23,549-Speed 6313.68 samples/sec Loss 4.9021 LearningRate 0.0003 Epoch: 19 Global Step: 410180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:26,794-Speed 6313.44 samples/sec Loss 4.9750 LearningRate 0.0003 Epoch: 19 Global Step: 410190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:30,041-Speed 6308.61 samples/sec Loss 4.9495 LearningRate 0.0003 Epoch: 19 Global Step: 410200 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:33,284-Speed 6316.20 samples/sec Loss 4.9888 LearningRate 0.0003 Epoch: 19 Global Step: 410210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:36,533-Speed 6306.65 samples/sec Loss 4.9654 LearningRate 0.0003 Epoch: 19 Global Step: 410220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:39,778-Speed 6311.88 samples/sec Loss 4.9586 LearningRate 0.0003 Epoch: 19 Global Step: 410230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:02:43,025-Speed 6308.57 samples/sec Loss 4.9457 LearningRate 0.0003 Epoch: 19 Global Step: 410240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:02:46,270-Speed 6314.32 samples/sec Loss 4.9443 LearningRate 0.0003 Epoch: 19 Global Step: 410250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:02:49,499-Speed 6342.51 samples/sec Loss 5.0021 LearningRate 0.0003 Epoch: 19 Global Step: 410260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:52,743-Speed 6314.58 samples/sec Loss 4.9830 LearningRate 0.0003 Epoch: 19 Global Step: 410270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:55,995-Speed 6300.17 samples/sec Loss 4.8780 LearningRate 0.0003 Epoch: 19 Global Step: 410280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:02:59,243-Speed 6307.41 samples/sec Loss 4.9579 LearningRate 0.0003 Epoch: 19 Global Step: 410290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:02,501-Speed 6286.66 samples/sec Loss 4.8854 LearningRate 0.0003 Epoch: 19 Global Step: 410300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:05,746-Speed 6312.52 samples/sec Loss 4.8823 LearningRate 0.0003 Epoch: 19 Global Step: 410310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:08,991-Speed 6312.53 samples/sec Loss 4.9537 LearningRate 0.0003 Epoch: 19 Global Step: 410320 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:03:12,238-Speed 6309.59 samples/sec Loss 4.9313 LearningRate 0.0003 Epoch: 19 Global Step: 410330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:15,483-Speed 6311.85 samples/sec Loss 4.8755 LearningRate 0.0003 Epoch: 19 Global Step: 410340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:18,729-Speed 6311.40 samples/sec Loss 4.8661 LearningRate 0.0003 Epoch: 19 Global Step: 410350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:21,996-Speed 6269.39 samples/sec Loss 5.0011 LearningRate 0.0003 Epoch: 19 Global Step: 410360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:03:25,246-Speed 6302.89 samples/sec Loss 4.9018 LearningRate 0.0003 Epoch: 19 Global Step: 410370 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:03:28,540-Speed 6220.71 samples/sec Loss 4.8619 LearningRate 0.0003 Epoch: 19 Global Step: 410380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:31,807-Speed 6269.76 samples/sec Loss 4.9641 LearningRate 0.0003 Epoch: 19 Global Step: 410390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:35,054-Speed 6309.60 samples/sec Loss 4.9442 LearningRate 0.0003 Epoch: 19 Global Step: 410400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:38,301-Speed 6308.36 samples/sec Loss 4.9325 LearningRate 0.0003 Epoch: 19 Global Step: 410410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:41,548-Speed 6307.49 samples/sec Loss 4.9415 LearningRate 0.0003 Epoch: 19 Global Step: 410420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:44,796-Speed 6308.36 samples/sec Loss 4.9550 LearningRate 0.0003 Epoch: 19 Global Step: 410430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:48,044-Speed 6306.56 samples/sec Loss 4.9830 LearningRate 0.0003 Epoch: 19 Global Step: 410440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:03:51,290-Speed 6310.07 samples/sec Loss 4.9541 LearningRate 0.0003 Epoch: 19 Global Step: 410450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:54,538-Speed 6307.97 samples/sec Loss 4.8794 LearningRate 0.0003 Epoch: 19 Global Step: 410460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:03:57,787-Speed 6304.88 samples/sec Loss 4.9326 LearningRate 0.0003 Epoch: 19 Global Step: 410470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:01,040-Speed 6297.16 samples/sec Loss 4.9292 LearningRate 0.0003 Epoch: 19 Global Step: 410480 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:04,291-Speed 6300.82 samples/sec Loss 4.9676 LearningRate 0.0003 Epoch: 19 Global Step: 410490 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:07,535-Speed 6313.91 samples/sec Loss 4.9646 LearningRate 0.0003 Epoch: 19 Global Step: 410500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:10,789-Speed 6296.74 samples/sec Loss 4.8796 LearningRate 0.0003 Epoch: 19 Global Step: 410510 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:14,034-Speed 6312.96 samples/sec Loss 4.8707 LearningRate 0.0003 Epoch: 19 Global Step: 410520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:17,285-Speed 6300.97 samples/sec Loss 4.9382 LearningRate 0.0003 Epoch: 19 Global Step: 410530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:20,519-Speed 6333.92 samples/sec Loss 4.9119 LearningRate 0.0003 Epoch: 19 Global Step: 410540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:23,762-Speed 6315.38 samples/sec Loss 4.8759 LearningRate 0.0003 Epoch: 19 Global Step: 410550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:27,011-Speed 6305.34 samples/sec Loss 4.8828 LearningRate 0.0003 Epoch: 19 Global Step: 410560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:30,256-Speed 6313.09 
samples/sec Loss 4.8790 LearningRate 0.0003 Epoch: 19 Global Step: 410570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:33,505-Speed 6303.85 samples/sec Loss 4.9211 LearningRate 0.0003 Epoch: 19 Global Step: 410580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:36,752-Speed 6309.06 samples/sec Loss 4.8494 LearningRate 0.0003 Epoch: 19 Global Step: 410590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:40,011-Speed 6285.25 samples/sec Loss 4.9280 LearningRate 0.0003 Epoch: 19 Global Step: 410600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:43,255-Speed 6314.86 samples/sec Loss 4.9296 LearningRate 0.0003 Epoch: 19 Global Step: 410610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:46,514-Speed 6287.21 samples/sec Loss 5.0005 LearningRate 0.0003 Epoch: 19 Global Step: 410620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:49,764-Speed 6301.75 samples/sec Loss 4.8740 LearningRate 0.0003 Epoch: 19 Global Step: 410630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:04:53,010-Speed 6311.31 samples/sec Loss 4.9495 LearningRate 0.0003 Epoch: 19 Global Step: 410640 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:56,257-Speed 6307.17 samples/sec Loss 4.9563 LearningRate 0.0003 Epoch: 19 Global Step: 410650 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:04:59,490-Speed 6337.87 samples/sec Loss 4.9048 LearningRate 0.0003 Epoch: 19 Global Step: 410660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:02,739-Speed 6304.95 samples/sec Loss 4.9171 LearningRate 0.0003 Epoch: 19 Global Step: 410670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:05,999-Speed 6282.84 samples/sec Loss 4.8915 LearningRate 0.0003 Epoch: 19 Global Step: 410680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:09,245-Speed 6312.21 samples/sec Loss 4.8855 
LearningRate 0.0003 Epoch: 19 Global Step: 410690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:12,491-Speed 6311.13 samples/sec Loss 4.9333 LearningRate 0.0003 Epoch: 19 Global Step: 410700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:15,736-Speed 6311.99 samples/sec Loss 4.9123 LearningRate 0.0003 Epoch: 19 Global Step: 410710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:18,982-Speed 6309.77 samples/sec Loss 4.9108 LearningRate 0.0003 Epoch: 19 Global Step: 410720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:22,226-Speed 6315.01 samples/sec Loss 4.9884 LearningRate 0.0003 Epoch: 19 Global Step: 410730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:25,471-Speed 6313.54 samples/sec Loss 4.9375 LearningRate 0.0003 Epoch: 19 Global Step: 410740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:28,719-Speed 6306.11 samples/sec Loss 4.8908 LearningRate 0.0003 Epoch: 19 Global Step: 410750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:31,947-Speed 6345.56 samples/sec Loss 4.9539 LearningRate 0.0003 Epoch: 19 Global Step: 410760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:35,193-Speed 6311.74 samples/sec Loss 4.9057 LearningRate 0.0003 Epoch: 19 Global Step: 410770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:38,447-Speed 6295.43 samples/sec Loss 4.9867 LearningRate 0.0003 Epoch: 19 Global Step: 410780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:41,697-Speed 6301.88 samples/sec Loss 4.9161 LearningRate 0.0003 Epoch: 19 Global Step: 410790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:44,943-Speed 6310.44 samples/sec Loss 4.9092 LearningRate 0.0003 Epoch: 19 Global Step: 410800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:48,189-Speed 6311.24 samples/sec Loss 4.8631 LearningRate 0.0003 Epoch: 19 
Global Step: 410810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:51,438-Speed 6304.23 samples/sec Loss 4.9461 LearningRate 0.0003 Epoch: 19 Global Step: 410820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:54,682-Speed 6315.23 samples/sec Loss 4.9027 LearningRate 0.0003 Epoch: 19 Global Step: 410830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:05:57,924-Speed 6317.23 samples/sec Loss 4.9545 LearningRate 0.0003 Epoch: 19 Global Step: 410840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:01,173-Speed 6305.70 samples/sec Loss 4.8891 LearningRate 0.0003 Epoch: 19 Global Step: 410850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:04,421-Speed 6307.47 samples/sec Loss 4.9223 LearningRate 0.0003 Epoch: 19 Global Step: 410860 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:06:07,669-Speed 6306.90 samples/sec Loss 4.9129 LearningRate 0.0003 Epoch: 19 Global Step: 410870 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:06:10,912-Speed 6316.55 samples/sec Loss 4.8923 LearningRate 0.0003 Epoch: 19 Global Step: 410880 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:06:14,147-Speed 6332.41 samples/sec Loss 4.9727 LearningRate 0.0003 Epoch: 19 Global Step: 410890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:17,394-Speed 6308.49 samples/sec Loss 4.8517 LearningRate 0.0003 Epoch: 19 Global Step: 410900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:20,640-Speed 6311.36 samples/sec Loss 5.0151 LearningRate 0.0003 Epoch: 19 Global Step: 410910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:23,887-Speed 6308.16 samples/sec Loss 4.9214 LearningRate 0.0003 Epoch: 19 Global Step: 410920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:27,137-Speed 6303.07 samples/sec Loss 4.9042 LearningRate 0.0003 Epoch: 19 Global Step: 410930 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:30,386-Speed 6305.20 samples/sec Loss 4.9685 LearningRate 0.0003 Epoch: 19 Global Step: 410940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:33,629-Speed 6316.97 samples/sec Loss 4.8819 LearningRate 0.0003 Epoch: 19 Global Step: 410950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:36,874-Speed 6312.68 samples/sec Loss 4.9388 LearningRate 0.0003 Epoch: 19 Global Step: 410960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:40,119-Speed 6311.28 samples/sec Loss 4.9013 LearningRate 0.0003 Epoch: 19 Global Step: 410970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:43,368-Speed 6315.69 samples/sec Loss 4.9157 LearningRate 0.0003 Epoch: 19 Global Step: 410980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:46,628-Speed 6284.14 samples/sec Loss 5.0006 LearningRate 0.0003 Epoch: 19 Global Step: 410990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:49,909-Speed 6242.17 samples/sec Loss 4.8695 LearningRate 0.0003 Epoch: 19 Global Step: 411000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:53,157-Speed 6307.23 samples/sec Loss 4.8793 LearningRate 0.0003 Epoch: 19 Global Step: 411010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:56,405-Speed 6306.35 samples/sec Loss 4.8628 LearningRate 0.0003 Epoch: 19 Global Step: 411020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:06:59,654-Speed 6305.99 samples/sec Loss 4.9459 LearningRate 0.0003 Epoch: 19 Global Step: 411030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:02,902-Speed 6307.30 samples/sec Loss 4.8498 LearningRate 0.0003 Epoch: 19 Global Step: 411040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:06,146-Speed 6313.47 samples/sec Loss 4.9129 LearningRate 0.0003 Epoch: 19 Global Step: 411050 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:07:09,393-Speed 6309.63 samples/sec Loss 4.8712 LearningRate 0.0003 Epoch: 19 Global Step: 411060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:12,638-Speed 6310.87 samples/sec Loss 4.9153 LearningRate 0.0003 Epoch: 19 Global Step: 411070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:15,899-Speed 6283.10 samples/sec Loss 5.0064 LearningRate 0.0003 Epoch: 19 Global Step: 411080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:19,147-Speed 6306.06 samples/sec Loss 4.8919 LearningRate 0.0003 Epoch: 19 Global Step: 411090 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:07:22,396-Speed 6305.39 samples/sec Loss 4.9581 LearningRate 0.0003 Epoch: 19 Global Step: 411100 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:07:25,644-Speed 6307.31 samples/sec Loss 4.9330 LearningRate 0.0003 Epoch: 19 Global Step: 411110 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:07:28,887-Speed 6315.99 samples/sec Loss 4.9422 LearningRate 0.0003 Epoch: 19 Global Step: 411120 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:07:32,121-Speed 6333.94 samples/sec Loss 4.8931 LearningRate 0.0003 Epoch: 19 Global Step: 411130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:35,371-Speed 6304.35 samples/sec Loss 4.8926 LearningRate 0.0003 Epoch: 19 Global Step: 411140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:38,613-Speed 6317.97 samples/sec Loss 4.8973 LearningRate 0.0003 Epoch: 19 Global Step: 411150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:41,858-Speed 6311.89 samples/sec Loss 4.9070 LearningRate 0.0003 Epoch: 19 Global Step: 411160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:45,103-Speed 6314.06 samples/sec Loss 5.0184 LearningRate 0.0003 Epoch: 19 Global Step: 411170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:07:48,347-Speed 6314.36 samples/sec Loss 5.0566 LearningRate 0.0003 Epoch: 19 Global Step: 411180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:51,592-Speed 6313.38 samples/sec Loss 4.8874 LearningRate 0.0003 Epoch: 19 Global Step: 411190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:54,839-Speed 6308.42 samples/sec Loss 4.9238 LearningRate 0.0003 Epoch: 19 Global Step: 411200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:07:58,083-Speed 6313.18 samples/sec Loss 4.9180 LearningRate 0.0003 Epoch: 19 Global Step: 411210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:01,327-Speed 6314.26 samples/sec Loss 4.9197 LearningRate 0.0003 Epoch: 19 Global Step: 411220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:04,576-Speed 6305.62 samples/sec Loss 4.9429 LearningRate 0.0003 Epoch: 19 Global Step: 411230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:08:07,821-Speed 6312.51 samples/sec Loss 4.9581 LearningRate 0.0003 Epoch: 19 Global Step: 411240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:08:11,066-Speed 6313.19 samples/sec Loss 4.8600 LearningRate 0.0003 Epoch: 19 Global Step: 411250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:08:14,300-Speed 6334.96 samples/sec Loss 4.9488 LearningRate 0.0003 Epoch: 19 Global Step: 411260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:17,546-Speed 6308.88 samples/sec Loss 4.9424 LearningRate 0.0003 Epoch: 19 Global Step: 411270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:20,792-Speed 6311.02 samples/sec Loss 4.9095 LearningRate 0.0003 Epoch: 19 Global Step: 411280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:24,040-Speed 6308.25 samples/sec Loss 4.9385 LearningRate 0.0003 Epoch: 19 Global Step: 411290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:27,286-Speed 6309.48 
samples/sec Loss 4.8729 LearningRate 0.0003 Epoch: 19 Global Step: 411300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:30,534-Speed 6308.13 samples/sec Loss 4.8558 LearningRate 0.0003 Epoch: 19 Global Step: 411310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:33,778-Speed 6314.98 samples/sec Loss 4.9072 LearningRate 0.0003 Epoch: 19 Global Step: 411320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:37,024-Speed 6311.14 samples/sec Loss 4.9728 LearningRate 0.0003 Epoch: 19 Global Step: 411330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:40,268-Speed 6313.14 samples/sec Loss 4.9303 LearningRate 0.0003 Epoch: 19 Global Step: 411340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:43,515-Speed 6309.03 samples/sec Loss 4.9141 LearningRate 0.0003 Epoch: 19 Global Step: 411350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:46,758-Speed 6315.86 samples/sec Loss 4.8879 LearningRate 0.0003 Epoch: 19 Global Step: 411360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:08:49,990-Speed 6339.55 samples/sec Loss 4.9171 LearningRate 0.0003 Epoch: 19 Global Step: 411370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:53,239-Speed 6305.24 samples/sec Loss 4.9387 LearningRate 0.0003 Epoch: 19 Global Step: 411380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:56,482-Speed 6316.06 samples/sec Loss 4.9192 LearningRate 0.0003 Epoch: 19 Global Step: 411390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:08:59,729-Speed 6308.26 samples/sec Loss 4.8935 LearningRate 0.0003 Epoch: 19 Global Step: 411400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:02,975-Speed 6310.92 samples/sec Loss 4.9259 LearningRate 0.0003 Epoch: 19 Global Step: 411410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:06,223-Speed 6306.00 samples/sec Loss 4.9084 
LearningRate 0.0003 Epoch: 19 Global Step: 411420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:09,468-Speed 6312.33 samples/sec Loss 4.8464 LearningRate 0.0003 Epoch: 19 Global Step: 411430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:12,716-Speed 6307.12 samples/sec Loss 4.9222 LearningRate 0.0003 Epoch: 19 Global Step: 411440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:15,963-Speed 6309.59 samples/sec Loss 4.9063 LearningRate 0.0003 Epoch: 19 Global Step: 411450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:19,209-Speed 6311.11 samples/sec Loss 4.8779 LearningRate 0.0003 Epoch: 19 Global Step: 411460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:22,455-Speed 6309.52 samples/sec Loss 4.9157 LearningRate 0.0003 Epoch: 19 Global Step: 411470 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:09:25,705-Speed 6304.33 samples/sec Loss 4.8184 LearningRate 0.0003 Epoch: 19 Global Step: 411480 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:09:28,939-Speed 6332.13 samples/sec Loss 4.9318 LearningRate 0.0003 Epoch: 19 Global Step: 411490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:32,183-Speed 6315.74 samples/sec Loss 4.9553 LearningRate 0.0003 Epoch: 19 Global Step: 411500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:35,433-Speed 6302.33 samples/sec Loss 4.9131 LearningRate 0.0003 Epoch: 19 Global Step: 411510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:38,684-Speed 6300.50 samples/sec Loss 4.9936 LearningRate 0.0003 Epoch: 19 Global Step: 411520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:41,926-Speed 6320.17 samples/sec Loss 4.9103 LearningRate 0.0003 Epoch: 19 Global Step: 411530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:45,172-Speed 6309.98 samples/sec Loss 4.9172 LearningRate 0.0003 Epoch: 19 
Global Step: 411540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:48,420-Speed 6308.54 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 411550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:51,668-Speed 6306.00 samples/sec Loss 4.8657 LearningRate 0.0003 Epoch: 19 Global Step: 411560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:54,917-Speed 6305.43 samples/sec Loss 4.8414 LearningRate 0.0003 Epoch: 19 Global Step: 411570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:09:58,161-Speed 6315.00 samples/sec Loss 4.8785 LearningRate 0.0003 Epoch: 19 Global Step: 411580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:01,409-Speed 6305.60 samples/sec Loss 4.8842 LearningRate 0.0003 Epoch: 19 Global Step: 411590 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:04,657-Speed 6306.24 samples/sec Loss 4.9301 LearningRate 0.0003 Epoch: 19 Global Step: 411600 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:07,901-Speed 6315.99 samples/sec Loss 4.8602 LearningRate 0.0003 Epoch: 19 Global Step: 411610 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:11,144-Speed 6315.53 samples/sec Loss 4.8669 LearningRate 0.0003 Epoch: 19 Global Step: 411620 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:14,374-Speed 6342.54 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 411630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:17,627-Speed 6297.55 samples/sec Loss 4.8939 LearningRate 0.0003 Epoch: 19 Global Step: 411640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:20,870-Speed 6315.81 samples/sec Loss 4.8855 LearningRate 0.0003 Epoch: 19 Global Step: 411650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:24,116-Speed 6310.86 samples/sec Loss 4.9422 LearningRate 0.0003 Epoch: 19 Global Step: 411660 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:27,371-Speed 6293.13 samples/sec Loss 4.8989 LearningRate 0.0003 Epoch: 19 Global Step: 411670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:30,619-Speed 6306.84 samples/sec Loss 4.9504 LearningRate 0.0003 Epoch: 19 Global Step: 411680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:33,862-Speed 6317.38 samples/sec Loss 4.9046 LearningRate 0.0003 Epoch: 19 Global Step: 411690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:37,109-Speed 6308.38 samples/sec Loss 4.8900 LearningRate 0.0003 Epoch: 19 Global Step: 411700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:40,380-Speed 6262.68 samples/sec Loss 4.8486 LearningRate 0.0003 Epoch: 19 Global Step: 411710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:43,635-Speed 6294.13 samples/sec Loss 4.8986 LearningRate 0.0003 Epoch: 19 Global Step: 411720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:10:46,921-Speed 6233.95 samples/sec Loss 4.9077 LearningRate 0.0003 Epoch: 19 Global Step: 411730 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:50,218-Speed 6212.03 samples/sec Loss 4.9111 LearningRate 0.0003 Epoch: 19 Global Step: 411740 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:53,497-Speed 6247.20 samples/sec Loss 4.9561 LearningRate 0.0003 Epoch: 19 Global Step: 411750 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:56,753-Speed 6291.56 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 Global Step: 411760 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:10:59,999-Speed 6311.39 samples/sec Loss 4.8917 LearningRate 0.0003 Epoch: 19 Global Step: 411770 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:11:03,228-Speed 6344.11 samples/sec Loss 4.9105 LearningRate 0.0003 Epoch: 19 Global Step: 411780 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:11:06,471-Speed 6316.99 samples/sec Loss 4.9599 LearningRate 0.0003 Epoch: 19 Global Step: 411790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:09,712-Speed 6319.85 samples/sec Loss 4.8306 LearningRate 0.0003 Epoch: 19 Global Step: 411800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:12,958-Speed 6311.23 samples/sec Loss 4.9160 LearningRate 0.0003 Epoch: 19 Global Step: 411810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:16,198-Speed 6321.36 samples/sec Loss 4.9254 LearningRate 0.0003 Epoch: 19 Global Step: 411820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:19,443-Speed 6313.05 samples/sec Loss 4.8929 LearningRate 0.0003 Epoch: 19 Global Step: 411830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:22,692-Speed 6305.25 samples/sec Loss 4.8806 LearningRate 0.0003 Epoch: 19 Global Step: 411840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:25,937-Speed 6311.06 samples/sec Loss 4.8648 LearningRate 0.0003 Epoch: 19 Global Step: 411850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:29,188-Speed 6302.17 samples/sec Loss 4.9102 LearningRate 0.0003 Epoch: 19 Global Step: 411860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:32,429-Speed 6319.40 samples/sec Loss 4.8151 LearningRate 0.0003 Epoch: 19 Global Step: 411870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:35,678-Speed 6305.52 samples/sec Loss 4.9085 LearningRate 0.0003 Epoch: 19 Global Step: 411880 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:11:38,927-Speed 6305.32 samples/sec Loss 4.9487 LearningRate 0.0003 Epoch: 19 Global Step: 411890 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:11:42,156-Speed 6344.23 samples/sec Loss 4.9780 LearningRate 0.0003 Epoch: 19 Global Step: 411900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:11:45,417-Speed 6280.30 samples/sec Loss 4.8783 LearningRate 0.0003 Epoch: 19 Global Step: 411910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:48,724-Speed 6195.92 samples/sec Loss 4.8811 LearningRate 0.0003 Epoch: 19 Global Step: 411920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:51,969-Speed 6312.14 samples/sec Loss 4.8618 LearningRate 0.0003 Epoch: 19 Global Step: 411930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:55,216-Speed 6308.45 samples/sec Loss 4.8949 LearningRate 0.0003 Epoch: 19 Global Step: 411940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:11:58,464-Speed 6308.94 samples/sec Loss 4.8648 LearningRate 0.0003 Epoch: 19 Global Step: 411950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:01,713-Speed 6303.73 samples/sec Loss 4.9259 LearningRate 0.0003 Epoch: 19 Global Step: 411960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:04,961-Speed 6307.91 samples/sec Loss 4.8242 LearningRate 0.0003 Epoch: 19 Global Step: 411970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:08,206-Speed 6310.83 samples/sec Loss 4.9361 LearningRate 0.0003 Epoch: 19 Global Step: 411980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:11,453-Speed 6310.07 samples/sec Loss 4.9099 LearningRate 0.0003 Epoch: 19 Global Step: 411990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:14,735-Speed 6240.26 samples/sec Loss 4.9057 LearningRate 0.0003 Epoch: 19 Global Step: 412000 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:12:18,003-Speed 6268.45 samples/sec Loss 4.8794 LearningRate 0.0003 Epoch: 19 Global Step: 412010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:21,246-Speed 6317.53 samples/sec Loss 4.8872 LearningRate 0.0003 Epoch: 19 Global Step: 412020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:24,491-Speed 6312.46 
samples/sec Loss 4.8711 LearningRate 0.0003 Epoch: 19 Global Step: 412030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:27,736-Speed 6311.70 samples/sec Loss 4.9913 LearningRate 0.0003 Epoch: 19 Global Step: 412040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:30,979-Speed 6317.66 samples/sec Loss 4.9498 LearningRate 0.0003 Epoch: 19 Global Step: 412050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:34,257-Speed 6248.45 samples/sec Loss 4.8593 LearningRate 0.0003 Epoch: 19 Global Step: 412060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:37,540-Speed 6240.13 samples/sec Loss 4.9239 LearningRate 0.0003 Epoch: 19 Global Step: 412070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:40,783-Speed 6316.26 samples/sec Loss 4.8889 LearningRate 0.0003 Epoch: 19 Global Step: 412080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:44,028-Speed 6311.38 samples/sec Loss 4.8404 LearningRate 0.0003 Epoch: 19 Global Step: 412090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:47,277-Speed 6306.31 samples/sec Loss 4.9422 LearningRate 0.0003 Epoch: 19 Global Step: 412100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:50,521-Speed 6314.24 samples/sec Loss 4.9215 LearningRate 0.0003 Epoch: 19 Global Step: 412110 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:12:53,753-Speed 6337.51 samples/sec Loss 4.9548 LearningRate 0.0003 Epoch: 19 Global Step: 412120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:12:57,036-Speed 6240.56 samples/sec Loss 4.9157 LearningRate 0.0003 Epoch: 19 Global Step: 412130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:00,289-Speed 6296.69 samples/sec Loss 4.8919 LearningRate 0.0003 Epoch: 19 Global Step: 412140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:03,537-Speed 6307.72 samples/sec Loss 4.8949 
LearningRate 0.0003 Epoch: 19 Global Step: 412150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:06,782-Speed 6312.73 samples/sec Loss 4.9864 LearningRate 0.0003 Epoch: 19 Global Step: 412160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:10,031-Speed 6305.05 samples/sec Loss 4.9181 LearningRate 0.0003 Epoch: 19 Global Step: 412170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:13,310-Speed 6246.45 samples/sec Loss 4.9214 LearningRate 0.0003 Epoch: 19 Global Step: 412180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:16,556-Speed 6311.61 samples/sec Loss 4.9197 LearningRate 0.0003 Epoch: 19 Global Step: 412190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:19,798-Speed 6317.63 samples/sec Loss 4.9549 LearningRate 0.0003 Epoch: 19 Global Step: 412200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:23,042-Speed 6314.72 samples/sec Loss 4.8538 LearningRate 0.0003 Epoch: 19 Global Step: 412210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:26,289-Speed 6308.05 samples/sec Loss 4.8950 LearningRate 0.0003 Epoch: 19 Global Step: 412220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:13:29,521-Speed 6337.90 samples/sec Loss 4.9438 LearningRate 0.0003 Epoch: 19 Global Step: 412230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:32,767-Speed 6311.16 samples/sec Loss 4.8362 LearningRate 0.0003 Epoch: 19 Global Step: 412240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:36,009-Speed 6319.61 samples/sec Loss 4.9383 LearningRate 0.0003 Epoch: 19 Global Step: 412250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:39,251-Speed 6317.39 samples/sec Loss 4.9022 LearningRate 0.0003 Epoch: 19 Global Step: 412260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:42,505-Speed 6296.16 samples/sec Loss 5.0138 LearningRate 0.0003 Epoch: 19 
Global Step: 412270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:45,753-Speed 6305.74 samples/sec Loss 4.8499 LearningRate 0.0003 Epoch: 19 Global Step: 412280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:48,998-Speed 6313.14 samples/sec Loss 4.8538 LearningRate 0.0003 Epoch: 19 Global Step: 412290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:52,254-Speed 6291.57 samples/sec Loss 4.8974 LearningRate 0.0003 Epoch: 19 Global Step: 412300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:55,529-Speed 6253.51 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 19 Global Step: 412310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:13:58,775-Speed 6311.26 samples/sec Loss 4.9061 LearningRate 0.0003 Epoch: 19 Global Step: 412320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:02,020-Speed 6313.49 samples/sec Loss 4.8775 LearningRate 0.0003 Epoch: 19 Global Step: 412330 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:14:05,258-Speed 6326.58 samples/sec Loss 4.8473 LearningRate 0.0003 Epoch: 19 Global Step: 412340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:08,514-Speed 6291.11 samples/sec Loss 4.9228 LearningRate 0.0003 Epoch: 19 Global Step: 412350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:11,761-Speed 6309.20 samples/sec Loss 4.9338 LearningRate 0.0003 Epoch: 19 Global Step: 412360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:15,012-Speed 6300.46 samples/sec Loss 5.0120 LearningRate 0.0003 Epoch: 19 Global Step: 412370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:18,262-Speed 6304.00 samples/sec Loss 4.9252 LearningRate 0.0003 Epoch: 19 Global Step: 412380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:21,510-Speed 6306.47 samples/sec Loss 4.9272 LearningRate 0.0003 Epoch: 19 Global Step: 412390 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:24,759-Speed 6304.00 samples/sec Loss 4.9566 LearningRate 0.0003 Epoch: 19 Global Step: 412400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:28,005-Speed 6310.97 samples/sec Loss 4.9838 LearningRate 0.0003 Epoch: 19 Global Step: 412410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:31,249-Speed 6315.45 samples/sec Loss 4.8635 LearningRate 0.0003 Epoch: 19 Global Step: 412420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:34,499-Speed 6301.42 samples/sec Loss 4.8915 LearningRate 0.0003 Epoch: 19 Global Step: 412430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:37,731-Speed 6341.07 samples/sec Loss 5.0215 LearningRate 0.0003 Epoch: 19 Global Step: 412440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:40,975-Speed 6314.74 samples/sec Loss 4.9568 LearningRate 0.0003 Epoch: 19 Global Step: 412450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:44,222-Speed 6309.53 samples/sec Loss 4.9374 LearningRate 0.0003 Epoch: 19 Global Step: 412460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:47,464-Speed 6317.66 samples/sec Loss 4.8372 LearningRate 0.0003 Epoch: 19 Global Step: 412470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:50,713-Speed 6305.60 samples/sec Loss 4.8324 LearningRate 0.0003 Epoch: 19 Global Step: 412480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:53,958-Speed 6311.59 samples/sec Loss 4.8630 LearningRate 0.0003 Epoch: 19 Global Step: 412490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:14:57,204-Speed 6311.45 samples/sec Loss 4.9648 LearningRate 0.0003 Epoch: 19 Global Step: 412500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:00,448-Speed 6314.26 samples/sec Loss 4.9673 LearningRate 0.0003 Epoch: 19 Global Step: 412510 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:15:03,697-Speed 6305.57 samples/sec Loss 4.9661 LearningRate 0.0003 Epoch: 19 Global Step: 412520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:06,943-Speed 6309.74 samples/sec Loss 4.8742 LearningRate 0.0003 Epoch: 19 Global Step: 412530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:10,190-Speed 6308.90 samples/sec Loss 4.9154 LearningRate 0.0003 Epoch: 19 Global Step: 412540 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:15:13,443-Speed 6299.26 samples/sec Loss 4.9347 LearningRate 0.0003 Epoch: 19 Global Step: 412550 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:15:16,677-Speed 6332.90 samples/sec Loss 4.8989 LearningRate 0.0003 Epoch: 19 Global Step: 412560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:19,924-Speed 6309.18 samples/sec Loss 4.8805 LearningRate 0.0003 Epoch: 19 Global Step: 412570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:23,175-Speed 6300.81 samples/sec Loss 4.8910 LearningRate 0.0003 Epoch: 19 Global Step: 412580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:26,420-Speed 6312.69 samples/sec Loss 4.9608 LearningRate 0.0003 Epoch: 19 Global Step: 412590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:29,666-Speed 6310.83 samples/sec Loss 4.9722 LearningRate 0.0003 Epoch: 19 Global Step: 412600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:32,911-Speed 6312.28 samples/sec Loss 4.7986 LearningRate 0.0003 Epoch: 19 Global Step: 412610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:36,158-Speed 6309.81 samples/sec Loss 4.8476 LearningRate 0.0003 Epoch: 19 Global Step: 412620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:39,486-Speed 6154.48 samples/sec Loss 4.9211 LearningRate 0.0003 Epoch: 19 Global Step: 412630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:15:42,805-Speed 6171.27 samples/sec Loss 4.9165 LearningRate 0.0003 Epoch: 19 Global Step: 412640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:46,052-Speed 6310.59 samples/sec Loss 4.9387 LearningRate 0.0003 Epoch: 19 Global Step: 412650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:49,285-Speed 6335.93 samples/sec Loss 4.8719 LearningRate 0.0003 Epoch: 19 Global Step: 412660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:52,564-Speed 6246.84 samples/sec Loss 4.8299 LearningRate 0.0003 Epoch: 19 Global Step: 412670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:55,810-Speed 6310.88 samples/sec Loss 4.8342 LearningRate 0.0003 Epoch: 19 Global Step: 412680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:15:59,055-Speed 6313.03 samples/sec Loss 4.9794 LearningRate 0.0003 Epoch: 19 Global Step: 412690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:02,305-Speed 6302.16 samples/sec Loss 4.9448 LearningRate 0.0003 Epoch: 19 Global Step: 412700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:05,556-Speed 6300.70 samples/sec Loss 4.9213 LearningRate 0.0003 Epoch: 19 Global Step: 412710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:08,800-Speed 6314.19 samples/sec Loss 4.8608 LearningRate 0.0003 Epoch: 19 Global Step: 412720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:12,048-Speed 6307.05 samples/sec Loss 4.9380 LearningRate 0.0003 Epoch: 19 Global Step: 412730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:15,295-Speed 6308.81 samples/sec Loss 4.9352 LearningRate 0.0003 Epoch: 19 Global Step: 412740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:18,543-Speed 6306.92 samples/sec Loss 4.8995 LearningRate 0.0003 Epoch: 19 Global Step: 412750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:21,779-Speed 6329.74 
samples/sec Loss 4.9284 LearningRate 0.0003 Epoch: 19 Global Step: 412760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:25,041-Speed 6281.77 samples/sec Loss 4.9631 LearningRate 0.0003 Epoch: 19 Global Step: 412770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:28,289-Speed 6306.44 samples/sec Loss 4.8930 LearningRate 0.0003 Epoch: 19 Global Step: 412780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:31,537-Speed 6306.49 samples/sec Loss 4.9125 LearningRate 0.0003 Epoch: 19 Global Step: 412790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:34,783-Speed 6311.36 samples/sec Loss 4.8828 LearningRate 0.0003 Epoch: 19 Global Step: 412800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:38,030-Speed 6308.70 samples/sec Loss 4.9132 LearningRate 0.0003 Epoch: 19 Global Step: 412810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:41,274-Speed 6314.26 samples/sec Loss 4.9954 LearningRate 0.0003 Epoch: 19 Global Step: 412820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:44,516-Speed 6318.08 samples/sec Loss 4.8366 LearningRate 0.0003 Epoch: 19 Global Step: 412830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:47,761-Speed 6312.58 samples/sec Loss 4.9340 LearningRate 0.0003 Epoch: 19 Global Step: 412840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:51,004-Speed 6315.94 samples/sec Loss 4.9038 LearningRate 0.0003 Epoch: 19 Global Step: 412850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:16:54,247-Speed 6317.24 samples/sec Loss 4.9106 LearningRate 0.0003 Epoch: 19 Global Step: 412860 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:16:57,493-Speed 6310.08 samples/sec Loss 4.9388 LearningRate 0.0003 Epoch: 19 Global Step: 412870 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:00,738-Speed 6314.02 samples/sec Loss 4.9632 
LearningRate 0.0003 Epoch: 19 Global Step: 412880 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:03,987-Speed 6303.95 samples/sec Loss 4.8548 LearningRate 0.0003 Epoch: 19 Global Step: 412890 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:07,280-Speed 6220.64 samples/sec Loss 4.9141 LearningRate 0.0003 Epoch: 19 Global Step: 412900 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:10,540-Speed 6283.53 samples/sec Loss 4.9313 LearningRate 0.0003 Epoch: 19 Global Step: 412910 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:13,786-Speed 6311.48 samples/sec Loss 4.8987 LearningRate 0.0003 Epoch: 19 Global Step: 412920 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:17,032-Speed 6310.59 samples/sec Loss 4.9150 LearningRate 0.0003 Epoch: 19 Global Step: 412930 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:20,265-Speed 6336.11 samples/sec Loss 4.9683 LearningRate 0.0003 Epoch: 19 Global Step: 412940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:23,517-Speed 6299.55 samples/sec Loss 4.9000 LearningRate 0.0003 Epoch: 19 Global Step: 412950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:26,763-Speed 6310.74 samples/sec Loss 4.9202 LearningRate 0.0003 Epoch: 19 Global Step: 412960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:30,012-Speed 6304.24 samples/sec Loss 4.8866 LearningRate 0.0003 Epoch: 19 Global Step: 412970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:33,258-Speed 6310.68 samples/sec Loss 4.9133 LearningRate 0.0003 Epoch: 19 Global Step: 412980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:36,503-Speed 6312.92 samples/sec Loss 4.8184 LearningRate 0.0003 Epoch: 19 Global Step: 412990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:39,753-Speed 6302.61 samples/sec Loss 4.8571 LearningRate 0.0003 Epoch: 19 
Global Step: 413000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:43,001-Speed 6308.48 samples/sec Loss 4.8986 LearningRate 0.0003 Epoch: 19 Global Step: 413010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:46,247-Speed 6311.15 samples/sec Loss 4.8602 LearningRate 0.0003 Epoch: 19 Global Step: 413020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:49,493-Speed 6310.91 samples/sec Loss 4.9587 LearningRate 0.0003 Epoch: 19 Global Step: 413030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:52,738-Speed 6310.80 samples/sec Loss 4.8724 LearningRate 0.0003 Epoch: 19 Global Step: 413040 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:17:55,976-Speed 6327.53 samples/sec Loss 4.9408 LearningRate 0.0003 Epoch: 19 Global Step: 413050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:17:59,221-Speed 6313.11 samples/sec Loss 4.8926 LearningRate 0.0003 Epoch: 19 Global Step: 413060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:02,468-Speed 6308.13 samples/sec Loss 4.9159 LearningRate 0.0003 Epoch: 19 Global Step: 413070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:05,718-Speed 6302.53 samples/sec Loss 4.8889 LearningRate 0.0003 Epoch: 19 Global Step: 413080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:08,963-Speed 6312.08 samples/sec Loss 4.8619 LearningRate 0.0003 Epoch: 19 Global Step: 413090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:12,212-Speed 6306.28 samples/sec Loss 4.8993 LearningRate 0.0003 Epoch: 19 Global Step: 413100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:15,460-Speed 6304.80 samples/sec Loss 4.9052 LearningRate 0.0003 Epoch: 19 Global Step: 413110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:18,714-Speed 6297.28 samples/sec Loss 4.8995 LearningRate 0.0003 Epoch: 19 Global Step: 413120 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:21,960-Speed 6309.90 samples/sec Loss 4.8531 LearningRate 0.0003 Epoch: 19 Global Step: 413130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:25,240-Speed 6244.55 samples/sec Loss 4.9051 LearningRate 0.0003 Epoch: 19 Global Step: 413140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:18:28,537-Speed 6212.94 samples/sec Loss 4.8861 LearningRate 0.0003 Epoch: 19 Global Step: 413150 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:31,786-Speed 6304.81 samples/sec Loss 4.8714 LearningRate 0.0003 Epoch: 19 Global Step: 413160 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:35,032-Speed 6312.25 samples/sec Loss 4.9275 LearningRate 0.0003 Epoch: 19 Global Step: 413170 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:38,283-Speed 6300.86 samples/sec Loss 4.8980 LearningRate 0.0003 Epoch: 19 Global Step: 413180 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:41,563-Speed 6246.29 samples/sec Loss 4.9312 LearningRate 0.0003 Epoch: 19 Global Step: 413190 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:44,815-Speed 6297.71 samples/sec Loss 4.9153 LearningRate 0.0003 Epoch: 19 Global Step: 413200 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:48,058-Speed 6316.95 samples/sec Loss 4.8811 LearningRate 0.0003 Epoch: 19 Global Step: 413210 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:51,305-Speed 6309.96 samples/sec Loss 4.8607 LearningRate 0.0003 Epoch: 19 Global Step: 413220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:54,553-Speed 6305.64 samples/sec Loss 4.9371 LearningRate 0.0003 Epoch: 19 Global Step: 413230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:18:57,798-Speed 6313.49 samples/sec Loss 4.9509 LearningRate 0.0003 Epoch: 19 Global Step: 413240 Fp16 Grad Scale: 32768 Required: 38 hours 
Training: 2022-04-02 05:19:01,030-Speed 6337.28 samples/sec Loss 4.9134 LearningRate 0.0003 Epoch: 19 Global Step: 413250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:19:04,261-Speed 6340.06 samples/sec Loss 4.9277 LearningRate 0.0003 Epoch: 19 Global Step: 413260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:07,506-Speed 6313.26 samples/sec Loss 4.8735 LearningRate 0.0003 Epoch: 19 Global Step: 413270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:10,749-Speed 6316.99 samples/sec Loss 4.9551 LearningRate 0.0003 Epoch: 19 Global Step: 413280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:13,992-Speed 6315.67 samples/sec Loss 4.9615 LearningRate 0.0003 Epoch: 19 Global Step: 413290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:17,238-Speed 6310.70 samples/sec Loss 4.9662 LearningRate 0.0003 Epoch: 19 Global Step: 413300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:20,482-Speed 6315.50 samples/sec Loss 4.9056 LearningRate 0.0003 Epoch: 19 Global Step: 413310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:23,723-Speed 6319.53 samples/sec Loss 4.8756 LearningRate 0.0003 Epoch: 19 Global Step: 413320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:26,969-Speed 6310.40 samples/sec Loss 4.8813 LearningRate 0.0003 Epoch: 19 Global Step: 413330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:30,213-Speed 6314.21 samples/sec Loss 4.9966 LearningRate 0.0003 Epoch: 19 Global Step: 413340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:33,460-Speed 6310.22 samples/sec Loss 5.0175 LearningRate 0.0003 Epoch: 19 Global Step: 413350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:36,707-Speed 6308.56 samples/sec Loss 4.9042 LearningRate 0.0003 Epoch: 19 Global Step: 413360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 
05:19:39,950-Speed 6315.37 samples/sec Loss 4.8752 LearningRate 0.0003 Epoch: 19 Global Step: 413370 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:19:43,198-Speed 6307.64 samples/sec Loss 4.9319 LearningRate 0.0003 Epoch: 19 Global Step: 413380 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:19:46,447-Speed 6305.59 samples/sec Loss 4.9077 LearningRate 0.0003 Epoch: 19 Global Step: 413390 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:19:49,687-Speed 6321.51 samples/sec Loss 5.0110 LearningRate 0.0003 Epoch: 19 Global Step: 413400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:52,940-Speed 6298.24 samples/sec Loss 4.9091 LearningRate 0.0003 Epoch: 19 Global Step: 413410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:56,187-Speed 6308.88 samples/sec Loss 4.8920 LearningRate 0.0003 Epoch: 19 Global Step: 413420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:19:59,437-Speed 6302.82 samples/sec Loss 4.9053 LearningRate 0.0003 Epoch: 19 Global Step: 413430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:02,683-Speed 6311.62 samples/sec Loss 4.9300 LearningRate 0.0003 Epoch: 19 Global Step: 413440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:05,933-Speed 6302.02 samples/sec Loss 4.9377 LearningRate 0.0003 Epoch: 19 Global Step: 413450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:09,182-Speed 6304.69 samples/sec Loss 4.8622 LearningRate 0.0003 Epoch: 19 Global Step: 413460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:12,436-Speed 6295.71 samples/sec Loss 4.9139 LearningRate 0.0003 Epoch: 19 Global Step: 413470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:15,682-Speed 6310.51 samples/sec Loss 4.8676 LearningRate 0.0003 Epoch: 19 Global Step: 413480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:18,935-Speed 6297.04 
samples/sec Loss 4.9918 LearningRate 0.0003 Epoch: 19 Global Step: 413490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:22,181-Speed 6310.42 samples/sec Loss 4.8003 LearningRate 0.0003 Epoch: 19 Global Step: 413500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:20:25,411-Speed 6341.49 samples/sec Loss 4.9225 LearningRate 0.0003 Epoch: 19 Global Step: 413510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:28,659-Speed 6306.81 samples/sec Loss 4.8880 LearningRate 0.0003 Epoch: 19 Global Step: 413520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:31,916-Speed 6289.58 samples/sec Loss 4.8813 LearningRate 0.0003 Epoch: 19 Global Step: 413530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:35,165-Speed 6306.30 samples/sec Loss 4.8758 LearningRate 0.0003 Epoch: 19 Global Step: 413540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:38,412-Speed 6307.32 samples/sec Loss 4.8530 LearningRate 0.0003 Epoch: 19 Global Step: 413550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:41,657-Speed 6313.26 samples/sec Loss 4.9298 LearningRate 0.0003 Epoch: 19 Global Step: 413560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:44,904-Speed 6309.05 samples/sec Loss 4.8633 LearningRate 0.0003 Epoch: 19 Global Step: 413570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:48,147-Speed 6316.74 samples/sec Loss 4.8772 LearningRate 0.0003 Epoch: 19 Global Step: 413580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:51,392-Speed 6311.57 samples/sec Loss 4.9284 LearningRate 0.0003 Epoch: 19 Global Step: 413590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:54,635-Speed 6316.64 samples/sec Loss 4.8626 LearningRate 0.0003 Epoch: 19 Global Step: 413600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:20:57,880-Speed 6313.36 samples/sec Loss 4.9980 
LearningRate 0.0003 Epoch: 19 Global Step: 413610 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:01,128-Speed 6306.84 samples/sec Loss 4.9058 LearningRate 0.0003 Epoch: 19 Global Step: 413620 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:04,482-Speed 6108.52 samples/sec Loss 4.9403 LearningRate 0.0003 Epoch: 19 Global Step: 413630 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:07,750-Speed 6267.31 samples/sec Loss 4.8606 LearningRate 0.0003 Epoch: 19 Global Step: 413640 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:11,001-Speed 6300.94 samples/sec Loss 4.8976 LearningRate 0.0003 Epoch: 19 Global Step: 413650 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:14,243-Speed 6319.01 samples/sec Loss 4.9364 LearningRate 0.0003 Epoch: 19 Global Step: 413660 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:17,498-Speed 6297.72 samples/sec Loss 4.9202 LearningRate 0.0003 Epoch: 19 Global Step: 413670 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:20,728-Speed 6341.18 samples/sec Loss 4.8346 LearningRate 0.0003 Epoch: 19 Global Step: 413680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:23,974-Speed 6310.57 samples/sec Loss 4.9036 LearningRate 0.0003 Epoch: 19 Global Step: 413690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:27,219-Speed 6313.40 samples/sec Loss 4.9044 LearningRate 0.0003 Epoch: 19 Global Step: 413700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:30,467-Speed 6306.09 samples/sec Loss 4.9260 LearningRate 0.0003 Epoch: 19 Global Step: 413710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:33,711-Speed 6314.35 samples/sec Loss 4.8834 LearningRate 0.0003 Epoch: 19 Global Step: 413720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:36,957-Speed 6310.74 samples/sec Loss 4.9351 LearningRate 0.0003 Epoch: 19 
Global Step: 413730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:40,206-Speed 6306.17 samples/sec Loss 4.8631 LearningRate 0.0003 Epoch: 19 Global Step: 413740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:43,462-Speed 6290.68 samples/sec Loss 4.8841 LearningRate 0.0003 Epoch: 19 Global Step: 413750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:46,711-Speed 6306.38 samples/sec Loss 4.8618 LearningRate 0.0003 Epoch: 19 Global Step: 413760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:49,958-Speed 6308.08 samples/sec Loss 4.8961 LearningRate 0.0003 Epoch: 19 Global Step: 413770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:53,201-Speed 6316.64 samples/sec Loss 4.8888 LearningRate 0.0003 Epoch: 19 Global Step: 413780 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:21:56,439-Speed 6324.96 samples/sec Loss 4.9589 LearningRate 0.0003 Epoch: 19 Global Step: 413790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:21:59,682-Speed 6316.90 samples/sec Loss 4.9710 LearningRate 0.0003 Epoch: 19 Global Step: 413800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:02,931-Speed 6304.81 samples/sec Loss 4.9355 LearningRate 0.0003 Epoch: 19 Global Step: 413810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:06,179-Speed 6307.79 samples/sec Loss 4.9231 LearningRate 0.0003 Epoch: 19 Global Step: 413820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:09,427-Speed 6307.31 samples/sec Loss 4.9133 LearningRate 0.0003 Epoch: 19 Global Step: 413830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:12,696-Speed 6265.93 samples/sec Loss 4.8761 LearningRate 0.0003 Epoch: 19 Global Step: 413840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:15,976-Speed 6246.82 samples/sec Loss 4.8217 LearningRate 0.0003 Epoch: 19 Global Step: 413850 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:19,223-Speed 6309.24 samples/sec Loss 4.9031 LearningRate 0.0003 Epoch: 19 Global Step: 413860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:22,467-Speed 6312.79 samples/sec Loss 4.9014 LearningRate 0.0003 Epoch: 19 Global Step: 413870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:25,717-Speed 6303.96 samples/sec Loss 4.9147 LearningRate 0.0003 Epoch: 19 Global Step: 413880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:28,960-Speed 6315.48 samples/sec Loss 4.9257 LearningRate 0.0003 Epoch: 19 Global Step: 413890 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:22:32,207-Speed 6308.68 samples/sec Loss 4.9390 LearningRate 0.0003 Epoch: 19 Global Step: 413900 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:22:35,458-Speed 6300.76 samples/sec Loss 4.8834 LearningRate 0.0003 Epoch: 19 Global Step: 413910 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:22:38,706-Speed 6307.78 samples/sec Loss 4.8763 LearningRate 0.0003 Epoch: 19 Global Step: 413920 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:22:41,940-Speed 6333.46 samples/sec Loss 4.9329 LearningRate 0.0003 Epoch: 19 Global Step: 413930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:45,185-Speed 6312.91 samples/sec Loss 4.9117 LearningRate 0.0003 Epoch: 19 Global Step: 413940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:48,431-Speed 6310.67 samples/sec Loss 4.9415 LearningRate 0.0003 Epoch: 19 Global Step: 413950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:51,679-Speed 6307.62 samples/sec Loss 4.8873 LearningRate 0.0003 Epoch: 19 Global Step: 413960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:22:54,922-Speed 6316.46 samples/sec Loss 4.8380 LearningRate 0.0003 Epoch: 19 Global Step: 413970 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:22:58,171-Speed 6305.60 samples/sec Loss 4.8879 LearningRate 0.0003 Epoch: 19 Global Step: 413980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:01,417-Speed 6310.22 samples/sec Loss 4.9746 LearningRate 0.0003 Epoch: 19 Global Step: 413990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:04,666-Speed 6304.97 samples/sec Loss 4.8170 LearningRate 0.0003 Epoch: 19 Global Step: 414000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:07,916-Speed 6301.69 samples/sec Loss 4.8868 LearningRate 0.0003 Epoch: 19 Global Step: 414010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:11,166-Speed 6304.36 samples/sec Loss 4.9439 LearningRate 0.0003 Epoch: 19 Global Step: 414020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:14,419-Speed 6295.77 samples/sec Loss 4.9000 LearningRate 0.0003 Epoch: 19 Global Step: 414030 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:23:17,656-Speed 6329.70 samples/sec Loss 4.9226 LearningRate 0.0003 Epoch: 19 Global Step: 414040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:20,909-Speed 6297.37 samples/sec Loss 4.8724 LearningRate 0.0003 Epoch: 19 Global Step: 414050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:24,156-Speed 6310.05 samples/sec Loss 4.9102 LearningRate 0.0003 Epoch: 19 Global Step: 414060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:27,404-Speed 6306.87 samples/sec Loss 4.8826 LearningRate 0.0003 Epoch: 19 Global Step: 414070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:30,651-Speed 6307.82 samples/sec Loss 4.8759 LearningRate 0.0003 Epoch: 19 Global Step: 414080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:33,901-Speed 6303.21 samples/sec Loss 4.9169 LearningRate 0.0003 Epoch: 19 Global Step: 414090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:23:37,148-Speed 6308.83 samples/sec Loss 4.8869 LearningRate 0.0003 Epoch: 19 Global Step: 414100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:40,392-Speed 6313.66 samples/sec Loss 4.8313 LearningRate 0.0003 Epoch: 19 Global Step: 414110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:43,639-Speed 6309.48 samples/sec Loss 4.9413 LearningRate 0.0003 Epoch: 19 Global Step: 414120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:46,882-Speed 6315.92 samples/sec Loss 4.8910 LearningRate 0.0003 Epoch: 19 Global Step: 414130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:50,125-Speed 6318.23 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 19 Global Step: 414140 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:23:53,357-Speed 6336.64 samples/sec Loss 4.8680 LearningRate 0.0003 Epoch: 19 Global Step: 414150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:56,604-Speed 6308.28 samples/sec Loss 4.9253 LearningRate 0.0003 Epoch: 19 Global Step: 414160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:23:59,852-Speed 6307.76 samples/sec Loss 4.9184 LearningRate 0.0003 Epoch: 19 Global Step: 414170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:03,101-Speed 6305.03 samples/sec Loss 4.9111 LearningRate 0.0003 Epoch: 19 Global Step: 414180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:06,348-Speed 6307.66 samples/sec Loss 4.8959 LearningRate 0.0003 Epoch: 19 Global Step: 414190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:09,600-Speed 6299.13 samples/sec Loss 4.9378 LearningRate 0.0003 Epoch: 19 Global Step: 414200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:12,878-Speed 6249.54 samples/sec Loss 4.9199 LearningRate 0.0003 Epoch: 19 Global Step: 414210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:16,125-Speed 6309.22 
samples/sec Loss 4.9937 LearningRate 0.0003 Epoch: 19 Global Step: 414220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:19,369-Speed 6313.34 samples/sec Loss 4.9573 LearningRate 0.0003 Epoch: 19 Global Step: 414230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:22,618-Speed 6306.70 samples/sec Loss 4.9478 LearningRate 0.0003 Epoch: 19 Global Step: 414240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:25,862-Speed 6313.31 samples/sec Loss 4.8890 LearningRate 0.0003 Epoch: 19 Global Step: 414250 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:24:29,106-Speed 6314.83 samples/sec Loss 4.8589 LearningRate 0.0003 Epoch: 19 Global Step: 414260 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:24:32,351-Speed 6312.82 samples/sec Loss 4.8878 LearningRate 0.0003 Epoch: 19 Global Step: 414270 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:24:35,599-Speed 6308.27 samples/sec Loss 4.8527 LearningRate 0.0003 Epoch: 19 Global Step: 414280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:38,847-Speed 6307.54 samples/sec Loss 4.8873 LearningRate 0.0003 Epoch: 19 Global Step: 414290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:42,094-Speed 6306.81 samples/sec Loss 4.9096 LearningRate 0.0003 Epoch: 19 Global Step: 414300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:45,338-Speed 6316.37 samples/sec Loss 4.8260 LearningRate 0.0003 Epoch: 19 Global Step: 414310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:48,586-Speed 6305.17 samples/sec Loss 4.8274 LearningRate 0.0003 Epoch: 19 Global Step: 414320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:51,835-Speed 6306.31 samples/sec Loss 4.9892 LearningRate 0.0003 Epoch: 19 Global Step: 414330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:55,082-Speed 6307.27 samples/sec Loss 4.8339 
LearningRate 0.0003 Epoch: 19 Global Step: 414340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:24:58,329-Speed 6309.20 samples/sec Loss 4.8554 LearningRate 0.0003 Epoch: 19 Global Step: 414350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:01,578-Speed 6305.95 samples/sec Loss 4.8126 LearningRate 0.0003 Epoch: 19 Global Step: 414360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:04,826-Speed 6306.65 samples/sec Loss 4.9114 LearningRate 0.0003 Epoch: 19 Global Step: 414370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:08,069-Speed 6315.76 samples/sec Loss 4.8585 LearningRate 0.0003 Epoch: 19 Global Step: 414380 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:11,318-Speed 6305.13 samples/sec Loss 4.8409 LearningRate 0.0003 Epoch: 19 Global Step: 414390 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:14,562-Speed 6315.28 samples/sec Loss 4.8153 LearningRate 0.0003 Epoch: 19 Global Step: 414400 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:17,811-Speed 6308.33 samples/sec Loss 4.9504 LearningRate 0.0003 Epoch: 19 Global Step: 414410 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:21,043-Speed 6338.11 samples/sec Loss 4.9316 LearningRate 0.0003 Epoch: 19 Global Step: 414420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:24,292-Speed 6303.41 samples/sec Loss 4.9071 LearningRate 0.0003 Epoch: 19 Global Step: 414430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:27,538-Speed 6312.33 samples/sec Loss 4.9699 LearningRate 0.0003 Epoch: 19 Global Step: 414440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:30,781-Speed 6314.90 samples/sec Loss 4.9297 LearningRate 0.0003 Epoch: 19 Global Step: 414450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:34,023-Speed 6318.57 samples/sec Loss 4.8745 LearningRate 0.0003 Epoch: 19 
Global Step: 414460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:37,268-Speed 6313.19 samples/sec Loss 4.9232 LearningRate 0.0003 Epoch: 19 Global Step: 414470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:40,513-Speed 6313.17 samples/sec Loss 4.9465 LearningRate 0.0003 Epoch: 19 Global Step: 414480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:43,757-Speed 6314.69 samples/sec Loss 4.9675 LearningRate 0.0003 Epoch: 19 Global Step: 414490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:47,002-Speed 6312.90 samples/sec Loss 4.9658 LearningRate 0.0003 Epoch: 19 Global Step: 414500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:50,247-Speed 6312.82 samples/sec Loss 4.8023 LearningRate 0.0003 Epoch: 19 Global Step: 414510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:25:53,493-Speed 6310.78 samples/sec Loss 4.9615 LearningRate 0.0003 Epoch: 19 Global Step: 414520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:56,739-Speed 6311.58 samples/sec Loss 4.9046 LearningRate 0.0003 Epoch: 19 Global Step: 414530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:25:59,971-Speed 6337.76 samples/sec Loss 4.8782 LearningRate 0.0003 Epoch: 19 Global Step: 414540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:03,217-Speed 6309.75 samples/sec Loss 4.9353 LearningRate 0.0003 Epoch: 19 Global Step: 414550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:06,464-Speed 6310.06 samples/sec Loss 4.7910 LearningRate 0.0003 Epoch: 19 Global Step: 414560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:09,714-Speed 6302.78 samples/sec Loss 4.8799 LearningRate 0.0003 Epoch: 19 Global Step: 414570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:12,957-Speed 6314.60 samples/sec Loss 4.9439 LearningRate 0.0003 Epoch: 19 Global Step: 414580 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:16,207-Speed 6303.78 samples/sec Loss 4.9264 LearningRate 0.0003 Epoch: 19 Global Step: 414590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:19,459-Speed 6299.38 samples/sec Loss 4.8257 LearningRate 0.0003 Epoch: 19 Global Step: 414600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:22,706-Speed 6308.90 samples/sec Loss 4.9344 LearningRate 0.0003 Epoch: 19 Global Step: 414610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:25,957-Speed 6301.46 samples/sec Loss 4.9333 LearningRate 0.0003 Epoch: 19 Global Step: 414620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:29,200-Speed 6315.51 samples/sec Loss 4.9381 LearningRate 0.0003 Epoch: 19 Global Step: 414630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:32,431-Speed 6339.89 samples/sec Loss 4.8376 LearningRate 0.0003 Epoch: 19 Global Step: 414640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:35,681-Speed 6302.93 samples/sec Loss 4.8724 LearningRate 0.0003 Epoch: 19 Global Step: 414650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:38,927-Speed 6311.85 samples/sec Loss 4.8852 LearningRate 0.0003 Epoch: 19 Global Step: 414660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:42,174-Speed 6308.88 samples/sec Loss 4.9194 LearningRate 0.0003 Epoch: 19 Global Step: 414670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:45,423-Speed 6304.94 samples/sec Loss 4.8516 LearningRate 0.0003 Epoch: 19 Global Step: 414680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:48,670-Speed 6308.13 samples/sec Loss 4.8891 LearningRate 0.0003 Epoch: 19 Global Step: 414690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:51,916-Speed 6310.94 samples/sec Loss 4.9154 LearningRate 0.0003 Epoch: 19 Global Step: 414700 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:26:55,166-Speed 6303.59 samples/sec Loss 4.8957 LearningRate 0.0003 Epoch: 19 Global Step: 414710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:26:58,412-Speed 6310.64 samples/sec Loss 4.9379 LearningRate 0.0003 Epoch: 19 Global Step: 414720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:01,664-Speed 6299.43 samples/sec Loss 4.9074 LearningRate 0.0003 Epoch: 19 Global Step: 414730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:04,915-Speed 6300.51 samples/sec Loss 4.9429 LearningRate 0.0003 Epoch: 19 Global Step: 414740 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:27:08,149-Speed 6334.68 samples/sec Loss 4.9303 LearningRate 0.0003 Epoch: 19 Global Step: 414750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:11,398-Speed 6303.70 samples/sec Loss 4.9155 LearningRate 0.0003 Epoch: 19 Global Step: 414760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:14,643-Speed 6313.45 samples/sec Loss 4.9001 LearningRate 0.0003 Epoch: 19 Global Step: 414770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:17,887-Speed 6314.75 samples/sec Loss 4.9165 LearningRate 0.0003 Epoch: 19 Global Step: 414780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:27:21,139-Speed 6298.73 samples/sec Loss 4.8529 LearningRate 0.0003 Epoch: 19 Global Step: 414790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:21,389-Speed 339.92 samples/sec Loss 4.8555 LearningRate 0.0003 Epoch: 20 Global Step: 414800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:24,626-Speed 6328.49 samples/sec Loss 4.9287 LearningRate 0.0003 Epoch: 20 Global Step: 414810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:27,861-Speed 6331.51 samples/sec Loss 4.8742 LearningRate 0.0003 Epoch: 20 Global Step: 414820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:28:31,100-Speed 6325.65 samples/sec Loss 4.8981 LearningRate 0.0003 Epoch: 20 Global Step: 414830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:34,332-Speed 6337.28 samples/sec Loss 4.9061 LearningRate 0.0003 Epoch: 20 Global Step: 414840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:37,564-Speed 6339.06 samples/sec Loss 4.9085 LearningRate 0.0003 Epoch: 20 Global Step: 414850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:40,803-Speed 6324.85 samples/sec Loss 5.0144 LearningRate 0.0003 Epoch: 20 Global Step: 414860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:44,038-Speed 6332.00 samples/sec Loss 4.9801 LearningRate 0.0003 Epoch: 20 Global Step: 414870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:47,292-Speed 6296.15 samples/sec Loss 4.8259 LearningRate 0.0003 Epoch: 20 Global Step: 414880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:50,529-Speed 6326.95 samples/sec Loss 4.9164 LearningRate 0.0003 Epoch: 20 Global Step: 414890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:53,770-Speed 6321.24 samples/sec Loss 4.8713 LearningRate 0.0003 Epoch: 20 Global Step: 414900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:28:57,008-Speed 6325.58 samples/sec Loss 4.9546 LearningRate 0.0003 Epoch: 20 Global Step: 414910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:29:00,243-Speed 6333.38 samples/sec Loss 4.9057 LearningRate 0.0003 Epoch: 20 Global Step: 414920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:29:03,489-Speed 6309.58 samples/sec Loss 4.8472 LearningRate 0.0003 Epoch: 20 Global Step: 414930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:29:06,728-Speed 6324.59 samples/sec Loss 4.8206 LearningRate 0.0003 Epoch: 20 Global Step: 414940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:29:09,965-Speed 6327.92 
samples/sec Loss 4.8692 LearningRate 0.0003 Epoch: 20 Global Step: 414950 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:13,200-Speed 6331.81 samples/sec Loss 4.9357 LearningRate 0.0003 Epoch: 20 Global Step: 414960 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:16,440-Speed 6324.34 samples/sec Loss 4.8832 LearningRate 0.0003 Epoch: 20 Global Step: 414970 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:19,679-Speed 6324.00 samples/sec Loss 4.8575 LearningRate 0.0003 Epoch: 20 Global Step: 414980 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:22,923-Speed 6314.07 samples/sec Loss 4.8994 LearningRate 0.0003 Epoch: 20 Global Step: 414990 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:26,163-Speed 6322.05 samples/sec Loss 4.8264 LearningRate 0.0003 Epoch: 20 Global Step: 415000 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:29,404-Speed 6320.13 samples/sec Loss 4.8403 LearningRate 0.0003 Epoch: 20 Global Step: 415010 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:32,644-Speed 6323.64 samples/sec Loss 4.8399 LearningRate 0.0003 Epoch: 20 Global Step: 415020 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:35,885-Speed 6319.54 samples/sec Loss 4.8404 LearningRate 0.0003 Epoch: 20 Global Step: 415030 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:39,122-Speed 6329.05 samples/sec Loss 4.8688 LearningRate 0.0003 Epoch: 20 Global Step: 415040 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:42,349-Speed 6347.97 samples/sec Loss 4.8351 LearningRate 0.0003 Epoch: 20 Global Step: 415050 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:45,591-Speed 6317.66 samples/sec Loss 4.8454 LearningRate 0.0003 Epoch: 20 Global Step: 415060 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:48,830-Speed 6325.05 samples/sec Loss 4.9135 
LearningRate 0.0003 Epoch: 20 Global Step: 415070 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:52,076-Speed 6312.05 samples/sec Loss 4.8490 LearningRate 0.0003 Epoch: 20 Global Step: 415080 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:29:55,301-Speed 6350.89 samples/sec Loss 4.8973 LearningRate 0.0003 Epoch: 20 Global Step: 415090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:29:58,537-Speed 6329.46 samples/sec Loss 4.8664 LearningRate 0.0003 Epoch: 20 Global Step: 415100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:01,780-Speed 6317.81 samples/sec Loss 4.8976 LearningRate 0.0003 Epoch: 20 Global Step: 415110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:05,017-Speed 6328.83 samples/sec Loss 4.9314 LearningRate 0.0003 Epoch: 20 Global Step: 415120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:08,255-Speed 6324.74 samples/sec Loss 4.8378 LearningRate 0.0003 Epoch: 20 Global Step: 415130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:11,493-Speed 6326.13 samples/sec Loss 4.8938 LearningRate 0.0003 Epoch: 20 Global Step: 415140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:14,734-Speed 6321.36 samples/sec Loss 4.8901 LearningRate 0.0003 Epoch: 20 Global Step: 415150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:17,970-Speed 6330.56 samples/sec Loss 4.9082 LearningRate 0.0003 Epoch: 20 Global Step: 415160 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:21,206-Speed 6330.40 samples/sec Loss 4.8713 LearningRate 0.0003 Epoch: 20 Global Step: 415170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:24,442-Speed 6330.49 samples/sec Loss 4.8321 LearningRate 0.0003 Epoch: 20 Global Step: 415180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:27,680-Speed 6324.73 samples/sec Loss 4.8971 LearningRate 0.0003 Epoch: 20 
Global Step: 415190 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:30:30,914-Speed 6335.23 samples/sec Loss 4.8452 LearningRate 0.0003 Epoch: 20 Global Step: 415200 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:30:34,156-Speed 6317.56 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 20 Global Step: 415210 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:30:37,395-Speed 6326.22 samples/sec Loss 4.9021 LearningRate 0.0003 Epoch: 20 Global Step: 415220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:30:40,635-Speed 6321.67 samples/sec Loss 4.8825 LearningRate 0.0003 Epoch: 20 Global Step: 415230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:30:43,858-Speed 6356.32 samples/sec Loss 4.9484 LearningRate 0.0003 Epoch: 20 Global Step: 415240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:47,095-Speed 6327.98 samples/sec Loss 4.8998 LearningRate 0.0003 Epoch: 20 Global Step: 415250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:50,330-Speed 6331.90 samples/sec Loss 4.8979 LearningRate 0.0003 Epoch: 20 Global Step: 415260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:53,610-Speed 6245.60 samples/sec Loss 4.7862 LearningRate 0.0003 Epoch: 20 Global Step: 415270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:30:56,846-Speed 6329.46 samples/sec Loss 4.8824 LearningRate 0.0003 Epoch: 20 Global Step: 415280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:00,084-Speed 6327.01 samples/sec Loss 4.9174 LearningRate 0.0003 Epoch: 20 Global Step: 415290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:03,320-Speed 6331.04 samples/sec Loss 4.8861 LearningRate 0.0003 Epoch: 20 Global Step: 415300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:06,594-Speed 6257.18 samples/sec Loss 4.9243 LearningRate 0.0003 Epoch: 20 Global Step: 415310 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:09,830-Speed 6329.14 samples/sec Loss 4.8390 LearningRate 0.0003 Epoch: 20 Global Step: 415320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:13,068-Speed 6326.14 samples/sec Loss 4.8741 LearningRate 0.0003 Epoch: 20 Global Step: 415330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:16,312-Speed 6316.11 samples/sec Loss 4.8240 LearningRate 0.0003 Epoch: 20 Global Step: 415340 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:31:19,546-Speed 6333.06 samples/sec Loss 4.8188 LearningRate 0.0003 Epoch: 20 Global Step: 415350 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:31:22,783-Speed 6329.19 samples/sec Loss 4.9376 LearningRate 0.0003 Epoch: 20 Global Step: 415360 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:31:26,021-Speed 6325.30 samples/sec Loss 4.8778 LearningRate 0.0003 Epoch: 20 Global Step: 415370 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:31:29,245-Speed 6353.48 samples/sec Loss 4.8783 LearningRate 0.0003 Epoch: 20 Global Step: 415380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:32,481-Speed 6329.80 samples/sec Loss 4.9258 LearningRate 0.0003 Epoch: 20 Global Step: 415390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:35,720-Speed 6325.12 samples/sec Loss 4.8566 LearningRate 0.0003 Epoch: 20 Global Step: 415400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:38,975-Speed 6293.01 samples/sec Loss 4.9040 LearningRate 0.0003 Epoch: 20 Global Step: 415410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:42,213-Speed 6327.44 samples/sec Loss 4.9019 LearningRate 0.0003 Epoch: 20 Global Step: 415420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:45,450-Speed 6327.34 samples/sec Loss 4.9235 LearningRate 0.0003 Epoch: 20 Global Step: 415430 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:31:48,687-Speed 6328.35 samples/sec Loss 4.9124 LearningRate 0.0003 Epoch: 20 Global Step: 415440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:51,925-Speed 6327.12 samples/sec Loss 4.9110 LearningRate 0.0003 Epoch: 20 Global Step: 415450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:55,166-Speed 6320.12 samples/sec Loss 4.8670 LearningRate 0.0003 Epoch: 20 Global Step: 415460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:31:58,407-Speed 6320.22 samples/sec Loss 4.8644 LearningRate 0.0003 Epoch: 20 Global Step: 415470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:01,651-Speed 6313.86 samples/sec Loss 4.8923 LearningRate 0.0003 Epoch: 20 Global Step: 415480 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:04,891-Speed 6323.02 samples/sec Loss 4.9204 LearningRate 0.0003 Epoch: 20 Global Step: 415490 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:08,134-Speed 6317.27 samples/sec Loss 4.8800 LearningRate 0.0003 Epoch: 20 Global Step: 415500 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:11,370-Speed 6329.51 samples/sec Loss 4.9640 LearningRate 0.0003 Epoch: 20 Global Step: 415510 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:14,605-Speed 6332.75 samples/sec Loss 4.9140 LearningRate 0.0003 Epoch: 20 Global Step: 415520 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:17,847-Speed 6318.55 samples/sec Loss 4.8663 LearningRate 0.0003 Epoch: 20 Global Step: 415530 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:32:21,071-Speed 6353.49 samples/sec Loss 4.9340 LearningRate 0.0003 Epoch: 20 Global Step: 415540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:24,311-Speed 6322.46 samples/sec Loss 4.8609 LearningRate 0.0003 Epoch: 20 Global Step: 415550 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:32:27,549-Speed 6326.20 samples/sec Loss 4.9294 LearningRate 0.0003 Epoch: 20 Global Step: 415560 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:30,788-Speed 6325.45 samples/sec Loss 4.8909 LearningRate 0.0003 Epoch: 20 Global Step: 415570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:34,023-Speed 6331.08 samples/sec Loss 4.8748 LearningRate 0.0003 Epoch: 20 Global Step: 415580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:37,263-Speed 6323.21 samples/sec Loss 4.8895 LearningRate 0.0003 Epoch: 20 Global Step: 415590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:40,503-Speed 6321.59 samples/sec Loss 4.8179 LearningRate 0.0003 Epoch: 20 Global Step: 415600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:43,740-Speed 6328.85 samples/sec Loss 4.8568 LearningRate 0.0003 Epoch: 20 Global Step: 415610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:46,980-Speed 6322.56 samples/sec Loss 4.9126 LearningRate 0.0003 Epoch: 20 Global Step: 415620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:50,219-Speed 6324.22 samples/sec Loss 4.9407 LearningRate 0.0003 Epoch: 20 Global Step: 415630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:53,445-Speed 6349.19 samples/sec Loss 4.9484 LearningRate 0.0003 Epoch: 20 Global Step: 415640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:56,684-Speed 6325.72 samples/sec Loss 4.8711 LearningRate 0.0003 Epoch: 20 Global Step: 415650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:32:59,923-Speed 6322.95 samples/sec Loss 4.8523 LearningRate 0.0003 Epoch: 20 Global Step: 415660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:03,166-Speed 6316.56 samples/sec Loss 4.9485 LearningRate 0.0003 Epoch: 20 Global Step: 415670 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:06,401-Speed 6331.87 
samples/sec Loss 4.9063 LearningRate 0.0003 Epoch: 20 Global Step: 415680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:09,639-Speed 6326.51 samples/sec Loss 4.8410 LearningRate 0.0003 Epoch: 20 Global Step: 415690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:12,877-Speed 6325.73 samples/sec Loss 4.9109 LearningRate 0.0003 Epoch: 20 Global Step: 415700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:16,113-Speed 6331.47 samples/sec Loss 4.8457 LearningRate 0.0003 Epoch: 20 Global Step: 415710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:19,351-Speed 6326.26 samples/sec Loss 4.9172 LearningRate 0.0003 Epoch: 20 Global Step: 415720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:22,590-Speed 6324.48 samples/sec Loss 4.8743 LearningRate 0.0003 Epoch: 20 Global Step: 415730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:25,829-Speed 6324.18 samples/sec Loss 4.8555 LearningRate 0.0003 Epoch: 20 Global Step: 415740 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:33:29,065-Speed 6330.05 samples/sec Loss 4.9693 LearningRate 0.0003 Epoch: 20 Global Step: 415750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:32,300-Speed 6332.37 samples/sec Loss 4.8777 LearningRate 0.0003 Epoch: 20 Global Step: 415760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:35,540-Speed 6322.84 samples/sec Loss 4.8805 LearningRate 0.0003 Epoch: 20 Global Step: 415770 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:38,777-Speed 6328.96 samples/sec Loss 4.9610 LearningRate 0.0003 Epoch: 20 Global Step: 415780 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:42,070-Speed 6219.24 samples/sec Loss 4.9346 LearningRate 0.0003 Epoch: 20 Global Step: 415790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:45,308-Speed 6326.62 samples/sec Loss 4.8775 
LearningRate 0.0003 Epoch: 20 Global Step: 415800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:48,549-Speed 6320.15 samples/sec Loss 4.9203 LearningRate 0.0003 Epoch: 20 Global Step: 415810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:51,797-Speed 6307.78 samples/sec Loss 4.8831 LearningRate 0.0003 Epoch: 20 Global Step: 415820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:55,070-Speed 6258.75 samples/sec Loss 4.9359 LearningRate 0.0003 Epoch: 20 Global Step: 415830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:33:58,318-Speed 6305.78 samples/sec Loss 4.8400 LearningRate 0.0003 Epoch: 20 Global Step: 415840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:01,554-Speed 6331.61 samples/sec Loss 4.8778 LearningRate 0.0003 Epoch: 20 Global Step: 415850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:04,793-Speed 6322.70 samples/sec Loss 4.8751 LearningRate 0.0003 Epoch: 20 Global Step: 415860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:08,032-Speed 6326.37 samples/sec Loss 4.8389 LearningRate 0.0003 Epoch: 20 Global Step: 415870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:11,269-Speed 6327.42 samples/sec Loss 4.8509 LearningRate 0.0003 Epoch: 20 Global Step: 415880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:14,510-Speed 6319.20 samples/sec Loss 4.8609 LearningRate 0.0003 Epoch: 20 Global Step: 415890 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:17,749-Speed 6325.16 samples/sec Loss 4.8380 LearningRate 0.0003 Epoch: 20 Global Step: 415900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:20,993-Speed 6314.41 samples/sec Loss 4.8924 LearningRate 0.0003 Epoch: 20 Global Step: 415910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:24,235-Speed 6317.96 samples/sec Loss 4.8868 LearningRate 0.0003 Epoch: 20 
Global Step: 415920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:27,478-Speed 6318.65 samples/sec Loss 4.9280 LearningRate 0.0003 Epoch: 20 Global Step: 415930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:30,719-Speed 6319.77 samples/sec Loss 4.8541 LearningRate 0.0003 Epoch: 20 Global Step: 415940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:33,957-Speed 6326.71 samples/sec Loss 4.9747 LearningRate 0.0003 Epoch: 20 Global Step: 415950 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:34:37,188-Speed 6338.86 samples/sec Loss 4.8184 LearningRate 0.0003 Epoch: 20 Global Step: 415960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:40,425-Speed 6329.17 samples/sec Loss 4.9142 LearningRate 0.0003 Epoch: 20 Global Step: 415970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:43,663-Speed 6326.69 samples/sec Loss 4.8429 LearningRate 0.0003 Epoch: 20 Global Step: 415980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:46,902-Speed 6323.07 samples/sec Loss 4.9098 LearningRate 0.0003 Epoch: 20 Global Step: 415990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:50,142-Speed 6322.59 samples/sec Loss 4.8044 LearningRate 0.0003 Epoch: 20 Global Step: 416000 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:53,382-Speed 6322.54 samples/sec Loss 4.9011 LearningRate 0.0003 Epoch: 20 Global Step: 416010 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:56,620-Speed 6325.81 samples/sec Loss 4.9566 LearningRate 0.0003 Epoch: 20 Global Step: 416020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:34:59,863-Speed 6317.82 samples/sec Loss 4.8478 LearningRate 0.0003 Epoch: 20 Global Step: 416030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:03,102-Speed 6324.65 samples/sec Loss 4.8642 LearningRate 0.0003 Epoch: 20 Global Step: 416040 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:06,345-Speed 6316.19 samples/sec Loss 4.9055 LearningRate 0.0003 Epoch: 20 Global Step: 416050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:09,587-Speed 6318.72 samples/sec Loss 4.8739 LearningRate 0.0003 Epoch: 20 Global Step: 416060 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:12,827-Speed 6322.20 samples/sec Loss 4.8506 LearningRate 0.0003 Epoch: 20 Global Step: 416070 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:16,071-Speed 6313.37 samples/sec Loss 4.8875 LearningRate 0.0003 Epoch: 20 Global Step: 416080 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:19,310-Speed 6325.25 samples/sec Loss 4.8407 LearningRate 0.0003 Epoch: 20 Global Step: 416090 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:22,553-Speed 6317.45 samples/sec Loss 4.8623 LearningRate 0.0003 Epoch: 20 Global Step: 416100 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:25,792-Speed 6323.94 samples/sec Loss 4.8726 LearningRate 0.0003 Epoch: 20 Global Step: 416110 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:35:29,012-Speed 6360.47 samples/sec Loss 4.8981 LearningRate 0.0003 Epoch: 20 Global Step: 416120 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:32,256-Speed 6315.30 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 20 Global Step: 416130 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:35,493-Speed 6329.38 samples/sec Loss 4.9423 LearningRate 0.0003 Epoch: 20 Global Step: 416140 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:38,732-Speed 6323.20 samples/sec Loss 4.8771 LearningRate 0.0003 Epoch: 20 Global Step: 416150 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:41,969-Speed 6329.07 samples/sec Loss 4.9071 LearningRate 0.0003 Epoch: 20 Global Step: 416160 Fp16 Grad Scale: 16384 Required: 38 hours 
Training: 2022-04-02 05:35:45,209-Speed 6322.61 samples/sec Loss 4.9006 LearningRate 0.0003 Epoch: 20 Global Step: 416170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:48,450-Speed 6319.76 samples/sec Loss 4.8028 LearningRate 0.0003 Epoch: 20 Global Step: 416180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:51,690-Speed 6323.70 samples/sec Loss 4.7911 LearningRate 0.0003 Epoch: 20 Global Step: 416190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:54,927-Speed 6328.05 samples/sec Loss 4.8434 LearningRate 0.0003 Epoch: 20 Global Step: 416200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:35:58,163-Speed 6330.14 samples/sec Loss 4.8854 LearningRate 0.0003 Epoch: 20 Global Step: 416210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:01,410-Speed 6309.20 samples/sec Loss 4.9049 LearningRate 0.0003 Epoch: 20 Global Step: 416220 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:36:04,651-Speed 6320.15 samples/sec Loss 4.8585 LearningRate 0.0003 Epoch: 20 Global Step: 416230 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:36:07,890-Speed 6323.57 samples/sec Loss 4.8828 LearningRate 0.0003 Epoch: 20 Global Step: 416240 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:36:11,118-Speed 6346.31 samples/sec Loss 4.8336 LearningRate 0.0003 Epoch: 20 Global Step: 416250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:14,360-Speed 6318.09 samples/sec Loss 4.9722 LearningRate 0.0003 Epoch: 20 Global Step: 416260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:17,602-Speed 6318.80 samples/sec Loss 4.8748 LearningRate 0.0003 Epoch: 20 Global Step: 416270 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:20,849-Speed 6308.41 samples/sec Loss 4.8023 LearningRate 0.0003 Epoch: 20 Global Step: 416280 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 
05:36:24,104-Speed 6294.03 samples/sec Loss 4.8832 LearningRate 0.0003 Epoch: 20 Global Step: 416290 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:27,341-Speed 6327.93 samples/sec Loss 4.9168 LearningRate 0.0003 Epoch: 20 Global Step: 416300 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:30,583-Speed 6318.30 samples/sec Loss 4.8411 LearningRate 0.0003 Epoch: 20 Global Step: 416310 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:33,824-Speed 6319.56 samples/sec Loss 4.9440 LearningRate 0.0003 Epoch: 20 Global Step: 416320 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:37,121-Speed 6212.96 samples/sec Loss 4.8743 LearningRate 0.0003 Epoch: 20 Global Step: 416330 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:40,398-Speed 6252.15 samples/sec Loss 4.8550 LearningRate 0.0003 Epoch: 20 Global Step: 416340 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:43,625-Speed 6348.77 samples/sec Loss 4.8671 LearningRate 0.0003 Epoch: 20 Global Step: 416350 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:46,866-Speed 6319.39 samples/sec Loss 4.8272 LearningRate 0.0003 Epoch: 20 Global Step: 416360 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:50,106-Speed 6322.47 samples/sec Loss 4.9545 LearningRate 0.0003 Epoch: 20 Global Step: 416370 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:53,344-Speed 6327.22 samples/sec Loss 4.8840 LearningRate 0.0003 Epoch: 20 Global Step: 416380 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:56,584-Speed 6321.89 samples/sec Loss 4.9501 LearningRate 0.0003 Epoch: 20 Global Step: 416390 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:36:59,825-Speed 6320.54 samples/sec Loss 4.8706 LearningRate 0.0003 Epoch: 20 Global Step: 416400 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:03,065-Speed 6323.19 
samples/sec Loss 4.7977 LearningRate 0.0003 Epoch: 20 Global Step: 416410 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:06,305-Speed 6322.04 samples/sec Loss 4.8648 LearningRate 0.0003 Epoch: 20 Global Step: 416420 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:09,549-Speed 6314.33 samples/sec Loss 4.9222 LearningRate 0.0003 Epoch: 20 Global Step: 416430 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:12,786-Speed 6327.96 samples/sec Loss 4.8291 LearningRate 0.0003 Epoch: 20 Global Step: 416440 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:16,015-Speed 6344.38 samples/sec Loss 4.8452 LearningRate 0.0003 Epoch: 20 Global Step: 416450 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:19,258-Speed 6315.78 samples/sec Loss 4.9428 LearningRate 0.0003 Epoch: 20 Global Step: 416460 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:22,499-Speed 6321.36 samples/sec Loss 4.8508 LearningRate 0.0003 Epoch: 20 Global Step: 416470 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:25,741-Speed 6319.34 samples/sec Loss 4.8499 LearningRate 0.0003 Epoch: 20 Global Step: 416480 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:29,015-Speed 6255.86 samples/sec Loss 4.8535 LearningRate 0.0003 Epoch: 20 Global Step: 416490 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:32,256-Speed 6320.57 samples/sec Loss 4.8768 LearningRate 0.0003 Epoch: 20 Global Step: 416500 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:35,498-Speed 6317.72 samples/sec Loss 5.0062 LearningRate 0.0003 Epoch: 20 Global Step: 416510 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:38,747-Speed 6304.96 samples/sec Loss 4.9186 LearningRate 0.0003 Epoch: 20 Global Step: 416520 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:41,992-Speed 6312.47 samples/sec Loss 4.8508 
LearningRate 0.0003 Epoch: 20 Global Step: 416530 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:45,233-Speed 6320.86 samples/sec Loss 4.9240 LearningRate 0.0003 Epoch: 20 Global Step: 416540 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:48,472-Speed 6324.34 samples/sec Loss 4.8774 LearningRate 0.0003 Epoch: 20 Global Step: 416550 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:37:51,715-Speed 6317.12 samples/sec Loss 4.8614 LearningRate 0.0003 Epoch: 20 Global Step: 416560 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:37:54,943-Speed 6346.62 samples/sec Loss 4.8717 LearningRate 0.0003 Epoch: 20 Global Step: 416570 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:37:58,182-Speed 6324.54 samples/sec Loss 4.9800 LearningRate 0.0003 Epoch: 20 Global Step: 416580 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:01,430-Speed 6306.45 samples/sec Loss 4.9158 LearningRate 0.0003 Epoch: 20 Global Step: 416590 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:04,678-Speed 6307.11 samples/sec Loss 4.9372 LearningRate 0.0003 Epoch: 20 Global Step: 416600 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:07,921-Speed 6316.49 samples/sec Loss 4.8111 LearningRate 0.0003 Epoch: 20 Global Step: 416610 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:11,161-Speed 6322.47 samples/sec Loss 4.8323 LearningRate 0.0003 Epoch: 20 Global Step: 416620 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:14,401-Speed 6322.48 samples/sec Loss 4.8709 LearningRate 0.0003 Epoch: 20 Global Step: 416630 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:17,653-Speed 6297.87 samples/sec Loss 4.8739 LearningRate 0.0003 Epoch: 20 Global Step: 416640 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:20,896-Speed 6316.27 samples/sec Loss 4.9212 LearningRate 0.0003 Epoch: 20 
Global Step: 416650 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:24,138-Speed 6318.74 samples/sec Loss 4.9175 LearningRate 0.0003 Epoch: 20 Global Step: 416660 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:27,381-Speed 6318.13 samples/sec Loss 4.8924 LearningRate 0.0003 Epoch: 20 Global Step: 416670 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:38:30,610-Speed 6343.28 samples/sec Loss 4.9023 LearningRate 0.0003 Epoch: 20 Global Step: 416680 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:33,854-Speed 6314.84 samples/sec Loss 4.9245 LearningRate 0.0003 Epoch: 20 Global Step: 416690 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:37,097-Speed 6315.59 samples/sec Loss 4.8364 LearningRate 0.0003 Epoch: 20 Global Step: 416700 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:40,337-Speed 6322.12 samples/sec Loss 4.8530 LearningRate 0.0003 Epoch: 20 Global Step: 416710 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:43,581-Speed 6316.08 samples/sec Loss 4.8798 LearningRate 0.0003 Epoch: 20 Global Step: 416720 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:46,824-Speed 6315.30 samples/sec Loss 4.8289 LearningRate 0.0003 Epoch: 20 Global Step: 416730 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:50,067-Speed 6317.63 samples/sec Loss 4.8940 LearningRate 0.0003 Epoch: 20 Global Step: 416740 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:53,308-Speed 6320.28 samples/sec Loss 4.8323 LearningRate 0.0003 Epoch: 20 Global Step: 416750 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:56,552-Speed 6313.92 samples/sec Loss 4.7396 LearningRate 0.0003 Epoch: 20 Global Step: 416760 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:38:59,800-Speed 6307.81 samples/sec Loss 4.8607 LearningRate 0.0003 Epoch: 20 Global Step: 416770 Fp16 Grad 
Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:03,040-Speed 6321.52 samples/sec Loss 4.9103 LearningRate 0.0003 Epoch: 20 Global Step: 416780 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:39:06,269-Speed 6344.01 samples/sec Loss 4.8805 LearningRate 0.0003 Epoch: 20 Global Step: 416790 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:09,511-Speed 6319.92 samples/sec Loss 4.9568 LearningRate 0.0003 Epoch: 20 Global Step: 416800 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:12,753-Speed 6318.50 samples/sec Loss 4.9055 LearningRate 0.0003 Epoch: 20 Global Step: 416810 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:15,998-Speed 6313.32 samples/sec Loss 4.8219 LearningRate 0.0003 Epoch: 20 Global Step: 416820 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:19,242-Speed 6314.36 samples/sec Loss 4.8464 LearningRate 0.0003 Epoch: 20 Global Step: 416830 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:22,482-Speed 6322.50 samples/sec Loss 4.8327 LearningRate 0.0003 Epoch: 20 Global Step: 416840 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:25,726-Speed 6313.23 samples/sec Loss 4.8982 LearningRate 0.0003 Epoch: 20 Global Step: 416850 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:28,971-Speed 6314.37 samples/sec Loss 4.8440 LearningRate 0.0003 Epoch: 20 Global Step: 416860 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:32,217-Speed 6310.02 samples/sec Loss 4.7972 LearningRate 0.0003 Epoch: 20 Global Step: 416870 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:35,458-Speed 6319.41 samples/sec Loss 4.8733 LearningRate 0.0003 Epoch: 20 Global Step: 416880 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:38,697-Speed 6324.82 samples/sec Loss 4.8398 LearningRate 0.0003 Epoch: 20 Global Step: 416890 Fp16 Grad Scale: 32768 Required: 38 hours 
Training: 2022-04-02 05:39:41,928-Speed 6341.01 samples/sec Loss 4.8769 LearningRate 0.0003 Epoch: 20 Global Step: 416900 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:45,166-Speed 6325.78 samples/sec Loss 4.8837 LearningRate 0.0003 Epoch: 20 Global Step: 416910 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:48,408-Speed 6318.25 samples/sec Loss 4.8755 LearningRate 0.0003 Epoch: 20 Global Step: 416920 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:51,649-Speed 6321.13 samples/sec Loss 4.8457 LearningRate 0.0003 Epoch: 20 Global Step: 416930 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:54,891-Speed 6318.28 samples/sec Loss 4.8270 LearningRate 0.0003 Epoch: 20 Global Step: 416940 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:39:58,133-Speed 6318.67 samples/sec Loss 4.8530 LearningRate 0.0003 Epoch: 20 Global Step: 416950 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:01,374-Speed 6319.07 samples/sec Loss 4.9007 LearningRate 0.0003 Epoch: 20 Global Step: 416960 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:04,619-Speed 6313.59 samples/sec Loss 4.8203 LearningRate 0.0003 Epoch: 20 Global Step: 416970 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:07,862-Speed 6316.20 samples/sec Loss 4.8895 LearningRate 0.0003 Epoch: 20 Global Step: 416980 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:11,107-Speed 6312.62 samples/sec Loss 4.8732 LearningRate 0.0003 Epoch: 20 Global Step: 416990 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:14,350-Speed 6317.24 samples/sec Loss 4.9066 LearningRate 0.0003 Epoch: 20 Global Step: 417000 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:40:17,592-Speed 6317.52 samples/sec Loss 4.8836 LearningRate 0.0003 Epoch: 20 Global Step: 417010 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 
05:40:20,828-Speed 6330.89 samples/sec Loss 4.8178 LearningRate 0.0003 Epoch: 20 Global Step: 417020 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:24,082-Speed 6296.20 samples/sec Loss 4.9051 LearningRate 0.0003 Epoch: 20 Global Step: 417030 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:27,324-Speed 6317.79 samples/sec Loss 4.8809 LearningRate 0.0003 Epoch: 20 Global Step: 417040 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:30,569-Speed 6316.72 samples/sec Loss 4.9219 LearningRate 0.0003 Epoch: 20 Global Step: 417050 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:33,810-Speed 6319.65 samples/sec Loss 4.9310 LearningRate 0.0003 Epoch: 20 Global Step: 417060 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:37,054-Speed 6314.73 samples/sec Loss 4.9041 LearningRate 0.0003 Epoch: 20 Global Step: 417070 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:40,298-Speed 6312.97 samples/sec Loss 4.8731 LearningRate 0.0003 Epoch: 20 Global Step: 417080 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:43,544-Speed 6310.87 samples/sec Loss 4.8945 LearningRate 0.0003 Epoch: 20 Global Step: 417090 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:46,795-Speed 6302.09 samples/sec Loss 4.9219 LearningRate 0.0003 Epoch: 20 Global Step: 417100 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:50,117-Speed 6165.57 samples/sec Loss 4.9303 LearningRate 0.0003 Epoch: 20 Global Step: 417110 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:40:53,478-Speed 6094.78 samples/sec Loss 4.8679 LearningRate 0.0003 Epoch: 20 Global Step: 417120 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:40:56,725-Speed 6308.41 samples/sec Loss 4.8638 LearningRate 0.0003 Epoch: 20 Global Step: 417130 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:40:59,978-Speed 6298.69 
samples/sec Loss 4.8428 LearningRate 0.0003 Epoch: 20 Global Step: 417140 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:41:03,258-Speed 6244.93 samples/sec Loss 4.8819 LearningRate 0.0003 Epoch: 20 Global Step: 417150 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:41:06,558-Speed 6206.22 samples/sec Loss 4.8373 LearningRate 0.0003 Epoch: 20 Global Step: 417160 Fp16 Grad Scale: 32768 Required: 38 hours Training: 2022-04-02 05:41:09,792-Speed 6334.02 samples/sec Loss 4.8723 LearningRate 0.0003 Epoch: 20 Global Step: 417170 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:13,052-Speed 6283.96 samples/sec Loss 4.8390 LearningRate 0.0003 Epoch: 20 Global Step: 417180 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:16,296-Speed 6315.09 samples/sec Loss 4.8641 LearningRate 0.0003 Epoch: 20 Global Step: 417190 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:19,542-Speed 6311.62 samples/sec Loss 4.8812 LearningRate 0.0003 Epoch: 20 Global Step: 417200 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:22,786-Speed 6313.64 samples/sec Loss 4.8644 LearningRate 0.0003 Epoch: 20 Global Step: 417210 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:26,035-Speed 6306.66 samples/sec Loss 4.9119 LearningRate 0.0003 Epoch: 20 Global Step: 417220 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:29,279-Speed 6314.66 samples/sec Loss 4.8703 LearningRate 0.0003 Epoch: 20 Global Step: 417230 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:32,525-Speed 6309.30 samples/sec Loss 4.8474 LearningRate 0.0003 Epoch: 20 Global Step: 417240 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:35,769-Speed 6315.26 samples/sec Loss 4.8532 LearningRate 0.0003 Epoch: 20 Global Step: 417250 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:39,011-Speed 6318.44 samples/sec Loss 4.8971 
LearningRate 0.0003 Epoch: 20 Global Step: 417260 Fp16 Grad Scale: 16384 Required: 38 hours Training: 2022-04-02 05:41:42,255-Speed 6314.19 samples/sec Loss 4.8707 LearningRate 0.0003 Epoch: 20 Global Step: 417270 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:41:45,499-Speed 6314.26 samples/sec Loss 4.8434 LearningRate 0.0003 Epoch: 20 Global Step: 417280 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:41:48,742-Speed 6316.62 samples/sec Loss 4.8309 LearningRate 0.0003 Epoch: 20 Global Step: 417290 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:41:51,989-Speed 6309.66 samples/sec Loss 4.8173 LearningRate 0.0003 Epoch: 20 Global Step: 417300 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:41:55,233-Speed 6313.88 samples/sec Loss 4.7759 LearningRate 0.0003 Epoch: 20 Global Step: 417310 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:41:58,478-Speed 6312.54 samples/sec Loss 4.8847 LearningRate 0.0003 Epoch: 20 Global Step: 417320 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:42:01,707-Speed 6344.66 samples/sec Loss 4.8509 LearningRate 0.0003 Epoch: 20 Global Step: 417330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:04,956-Speed 6305.16 samples/sec Loss 4.8752 LearningRate 0.0003 Epoch: 20 Global Step: 417340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:08,199-Speed 6315.90 samples/sec Loss 4.8796 LearningRate 0.0003 Epoch: 20 Global Step: 417350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:11,448-Speed 6304.31 samples/sec Loss 4.8162 LearningRate 0.0003 Epoch: 20 Global Step: 417360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:14,690-Speed 6319.07 samples/sec Loss 4.8843 LearningRate 0.0003 Epoch: 20 Global Step: 417370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:17,939-Speed 6304.61 samples/sec Loss 4.8751 LearningRate 0.0003 Epoch: 20 
Global Step: 417380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:21,185-Speed 6310.87 samples/sec Loss 4.8214 LearningRate 0.0003 Epoch: 20 Global Step: 417390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:24,431-Speed 6312.12 samples/sec Loss 4.8671 LearningRate 0.0003 Epoch: 20 Global Step: 417400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:27,672-Speed 6319.81 samples/sec Loss 4.9365 LearningRate 0.0003 Epoch: 20 Global Step: 417410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:30,914-Speed 6318.96 samples/sec Loss 4.8605 LearningRate 0.0003 Epoch: 20 Global Step: 417420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:34,158-Speed 6315.75 samples/sec Loss 4.8841 LearningRate 0.0003 Epoch: 20 Global Step: 417430 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:42:37,386-Speed 6345.44 samples/sec Loss 4.9394 LearningRate 0.0003 Epoch: 20 Global Step: 417440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:40,629-Speed 6315.38 samples/sec Loss 4.8750 LearningRate 0.0003 Epoch: 20 Global Step: 417450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:43,873-Speed 6315.49 samples/sec Loss 4.9386 LearningRate 0.0003 Epoch: 20 Global Step: 417460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:47,115-Speed 6318.39 samples/sec Loss 4.9197 LearningRate 0.0003 Epoch: 20 Global Step: 417470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:50,359-Speed 6315.02 samples/sec Loss 4.7906 LearningRate 0.0003 Epoch: 20 Global Step: 417480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:53,602-Speed 6316.21 samples/sec Loss 4.7962 LearningRate 0.0003 Epoch: 20 Global Step: 417490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:42:56,845-Speed 6316.77 samples/sec Loss 4.8874 LearningRate 0.0003 Epoch: 20 Global Step: 417500 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:00,090-Speed 6312.12 samples/sec Loss 4.8230 LearningRate 0.0003 Epoch: 20 Global Step: 417510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:03,331-Speed 6319.88 samples/sec Loss 4.8550 LearningRate 0.0003 Epoch: 20 Global Step: 417520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:06,572-Speed 6320.93 samples/sec Loss 4.8900 LearningRate 0.0003 Epoch: 20 Global Step: 417530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:09,815-Speed 6315.84 samples/sec Loss 4.8568 LearningRate 0.0003 Epoch: 20 Global Step: 417540 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:43:13,059-Speed 6315.05 samples/sec Loss 4.8888 LearningRate 0.0003 Epoch: 20 Global Step: 417550 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:43:16,314-Speed 6293.02 samples/sec Loss 4.8633 LearningRate 0.0003 Epoch: 20 Global Step: 417560 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:43:19,557-Speed 6317.34 samples/sec Loss 4.8799 LearningRate 0.0003 Epoch: 20 Global Step: 417570 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:43:22,786-Speed 6343.78 samples/sec Loss 4.8551 LearningRate 0.0003 Epoch: 20 Global Step: 417580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:26,040-Speed 6294.95 samples/sec Loss 4.8865 LearningRate 0.0003 Epoch: 20 Global Step: 417590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:29,289-Speed 6305.34 samples/sec Loss 4.8240 LearningRate 0.0003 Epoch: 20 Global Step: 417600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:32,541-Speed 6300.04 samples/sec Loss 4.8230 LearningRate 0.0003 Epoch: 20 Global Step: 417610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:35,782-Speed 6320.93 samples/sec Loss 4.9432 LearningRate 0.0003 Epoch: 20 Global Step: 417620 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 05:43:39,030-Speed 6307.01 samples/sec Loss 4.8214 LearningRate 0.0003 Epoch: 20 Global Step: 417630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:42,274-Speed 6313.33 samples/sec Loss 4.8691 LearningRate 0.0003 Epoch: 20 Global Step: 417640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:45,517-Speed 6316.12 samples/sec Loss 4.8829 LearningRate 0.0003 Epoch: 20 Global Step: 417650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:48,841-Speed 6165.89 samples/sec Loss 4.8857 LearningRate 0.0003 Epoch: 20 Global Step: 417660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:52,099-Speed 6286.71 samples/sec Loss 4.9333 LearningRate 0.0003 Epoch: 20 Global Step: 417670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:43:55,341-Speed 6318.41 samples/sec Loss 4.8771 LearningRate 0.0003 Epoch: 20 Global Step: 417680 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:43:58,584-Speed 6317.32 samples/sec Loss 4.9131 LearningRate 0.0003 Epoch: 20 Global Step: 417690 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:44:01,831-Speed 6308.60 samples/sec Loss 4.8787 LearningRate 0.0003 Epoch: 20 Global Step: 417700 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:44:05,073-Speed 6319.52 samples/sec Loss 4.8388 LearningRate 0.0003 Epoch: 20 Global Step: 417710 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:44:08,316-Speed 6315.49 samples/sec Loss 4.8618 LearningRate 0.0003 Epoch: 20 Global Step: 417720 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:44:11,558-Speed 6318.50 samples/sec Loss 4.8377 LearningRate 0.0003 Epoch: 20 Global Step: 417730 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:44:14,792-Speed 6332.89 samples/sec Loss 4.8410 LearningRate 0.0003 Epoch: 20 Global Step: 417740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
05:44:18,040-Speed 6307.26 samples/sec Loss 4.9518 LearningRate 0.0003 Epoch: 20 Global Step: 417750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:21,281-Speed 6321.95 samples/sec Loss 4.8891 LearningRate 0.0003 Epoch: 20 Global Step: 417760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:24,524-Speed 6315.90 samples/sec Loss 4.8398 LearningRate 0.0003 Epoch: 20 Global Step: 417770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:27,770-Speed 6313.85 samples/sec Loss 4.8446 LearningRate 0.0003 Epoch: 20 Global Step: 417780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:31,014-Speed 6313.81 samples/sec Loss 4.8960 LearningRate 0.0003 Epoch: 20 Global Step: 417790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:34,273-Speed 6285.89 samples/sec Loss 4.8490 LearningRate 0.0003 Epoch: 20 Global Step: 417800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:37,517-Speed 6314.81 samples/sec Loss 4.9046 LearningRate 0.0003 Epoch: 20 Global Step: 417810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:40,768-Speed 6301.19 samples/sec Loss 4.8568 LearningRate 0.0003 Epoch: 20 Global Step: 417820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:44,013-Speed 6312.88 samples/sec Loss 4.7858 LearningRate 0.0003 Epoch: 20 Global Step: 417830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:47,247-Speed 6336.19 samples/sec Loss 4.8688 LearningRate 0.0003 Epoch: 20 Global Step: 417840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:50,500-Speed 6297.48 samples/sec Loss 4.8889 LearningRate 0.0003 Epoch: 20 Global Step: 417850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:53,746-Speed 6310.51 samples/sec Loss 4.8878 LearningRate 0.0003 Epoch: 20 Global Step: 417860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:44:56,990-Speed 6314.65 
samples/sec Loss 4.8876 LearningRate 0.0003 Epoch: 20 Global Step: 417870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:00,236-Speed 6311.66 samples/sec Loss 4.9145 LearningRate 0.0003 Epoch: 20 Global Step: 417880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:03,481-Speed 6313.25 samples/sec Loss 4.7975 LearningRate 0.0003 Epoch: 20 Global Step: 417890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:06,797-Speed 6176.97 samples/sec Loss 4.9002 LearningRate 0.0003 Epoch: 20 Global Step: 417900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:10,048-Speed 6299.84 samples/sec Loss 4.9105 LearningRate 0.0003 Epoch: 20 Global Step: 417910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:13,306-Speed 6288.60 samples/sec Loss 4.8842 LearningRate 0.0003 Epoch: 20 Global Step: 417920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:16,553-Speed 6308.38 samples/sec Loss 4.8372 LearningRate 0.0003 Epoch: 20 Global Step: 417930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:19,803-Speed 6302.67 samples/sec Loss 4.8515 LearningRate 0.0003 Epoch: 20 Global Step: 417940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:45:23,050-Speed 6309.47 samples/sec Loss 4.8885 LearningRate 0.0003 Epoch: 20 Global Step: 417950 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:45:26,284-Speed 6332.21 samples/sec Loss 4.8361 LearningRate 0.0003 Epoch: 20 Global Step: 417960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:29,540-Speed 6292.76 samples/sec Loss 4.8938 LearningRate 0.0003 Epoch: 20 Global Step: 417970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:32,783-Speed 6315.73 samples/sec Loss 4.9305 LearningRate 0.0003 Epoch: 20 Global Step: 417980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:36,027-Speed 6315.32 samples/sec Loss 4.8858 
LearningRate 0.0003 Epoch: 20 Global Step: 417990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:39,272-Speed 6311.88 samples/sec Loss 4.8541 LearningRate 0.0003 Epoch: 20 Global Step: 418000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:42,516-Speed 6314.92 samples/sec Loss 4.8438 LearningRate 0.0003 Epoch: 20 Global Step: 418010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:45,760-Speed 6313.88 samples/sec Loss 4.8559 LearningRate 0.0003 Epoch: 20 Global Step: 418020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:49,009-Speed 6307.45 samples/sec Loss 4.8648 LearningRate 0.0003 Epoch: 20 Global Step: 418030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:52,258-Speed 6305.11 samples/sec Loss 4.9600 LearningRate 0.0003 Epoch: 20 Global Step: 418040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:55,501-Speed 6315.46 samples/sec Loss 4.7785 LearningRate 0.0003 Epoch: 20 Global Step: 418050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:45:58,748-Speed 6308.25 samples/sec Loss 4.8789 LearningRate 0.0003 Epoch: 20 Global Step: 418060 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:46:01,991-Speed 6317.76 samples/sec Loss 4.9028 LearningRate 0.0003 Epoch: 20 Global Step: 418070 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:46:05,228-Speed 6328.66 samples/sec Loss 4.8162 LearningRate 0.0003 Epoch: 20 Global Step: 418080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:08,470-Speed 6317.60 samples/sec Loss 4.8906 LearningRate 0.0003 Epoch: 20 Global Step: 418090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:11,713-Speed 6316.13 samples/sec Loss 4.8244 LearningRate 0.0003 Epoch: 20 Global Step: 418100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:14,957-Speed 6315.99 samples/sec Loss 4.9137 LearningRate 0.0003 Epoch: 20 
Global Step: 418110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:18,200-Speed 6316.41 samples/sec Loss 4.8766 LearningRate 0.0003 Epoch: 20 Global Step: 418120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:21,445-Speed 6312.36 samples/sec Loss 4.8893 LearningRate 0.0003 Epoch: 20 Global Step: 418130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:24,691-Speed 6311.06 samples/sec Loss 4.8509 LearningRate 0.0003 Epoch: 20 Global Step: 418140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:27,933-Speed 6317.22 samples/sec Loss 4.8431 LearningRate 0.0003 Epoch: 20 Global Step: 418150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:31,178-Speed 6311.68 samples/sec Loss 4.8660 LearningRate 0.0003 Epoch: 20 Global Step: 418160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:34,434-Speed 6292.35 samples/sec Loss 4.8737 LearningRate 0.0003 Epoch: 20 Global Step: 418170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:37,676-Speed 6318.08 samples/sec Loss 4.8884 LearningRate 0.0003 Epoch: 20 Global Step: 418180 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:46:40,923-Speed 6309.62 samples/sec Loss 4.9014 LearningRate 0.0003 Epoch: 20 Global Step: 418190 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:46:44,168-Speed 6312.73 samples/sec Loss 4.8400 LearningRate 0.0003 Epoch: 20 Global Step: 418200 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:46:47,393-Speed 6350.81 samples/sec Loss 4.8490 LearningRate 0.0003 Epoch: 20 Global Step: 418210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:50,637-Speed 6314.30 samples/sec Loss 4.8426 LearningRate 0.0003 Epoch: 20 Global Step: 418220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:53,882-Speed 6313.40 samples/sec Loss 4.8859 LearningRate 0.0003 Epoch: 20 Global Step: 418230 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 05:46:57,125-Speed 6318.09 samples/sec Loss 4.8302 LearningRate 0.0003 Epoch: 20 Global Step: 418240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:00,367-Speed 6317.21 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 20 Global Step: 418250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:03,609-Speed 6318.78 samples/sec Loss 4.7834 LearningRate 0.0003 Epoch: 20 Global Step: 418260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:06,887-Speed 6249.04 samples/sec Loss 4.8762 LearningRate 0.0003 Epoch: 20 Global Step: 418270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:10,131-Speed 6314.62 samples/sec Loss 4.8747 LearningRate 0.0003 Epoch: 20 Global Step: 418280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:13,374-Speed 6317.84 samples/sec Loss 4.8462 LearningRate 0.0003 Epoch: 20 Global Step: 418290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:16,621-Speed 6308.64 samples/sec Loss 4.8958 LearningRate 0.0003 Epoch: 20 Global Step: 418300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:19,866-Speed 6313.13 samples/sec Loss 4.9223 LearningRate 0.0003 Epoch: 20 Global Step: 418310 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:47:23,112-Speed 6309.59 samples/sec Loss 4.8389 LearningRate 0.0003 Epoch: 20 Global Step: 418320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:26,357-Speed 6312.34 samples/sec Loss 4.8382 LearningRate 0.0003 Epoch: 20 Global Step: 418330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:29,603-Speed 6310.65 samples/sec Loss 4.8282 LearningRate 0.0003 Epoch: 20 Global Step: 418340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:32,848-Speed 6314.13 samples/sec Loss 4.7871 LearningRate 0.0003 Epoch: 20 Global Step: 418350 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 05:47:36,087-Speed 6323.51 samples/sec Loss 4.8737 LearningRate 0.0003 Epoch: 20 Global Step: 418360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:39,330-Speed 6317.30 samples/sec Loss 4.8788 LearningRate 0.0003 Epoch: 20 Global Step: 418370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:42,574-Speed 6313.18 samples/sec Loss 4.9080 LearningRate 0.0003 Epoch: 20 Global Step: 418380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:45,905-Speed 6149.93 samples/sec Loss 4.8746 LearningRate 0.0003 Epoch: 20 Global Step: 418390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:49,269-Speed 6090.54 samples/sec Loss 4.9244 LearningRate 0.0003 Epoch: 20 Global Step: 418400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:52,619-Speed 6114.00 samples/sec Loss 4.9027 LearningRate 0.0003 Epoch: 20 Global Step: 418410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:47:55,862-Speed 6316.74 samples/sec Loss 4.9172 LearningRate 0.0003 Epoch: 20 Global Step: 418420 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:47:59,097-Speed 6332.09 samples/sec Loss 4.9190 LearningRate 0.0003 Epoch: 20 Global Step: 418430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:02,344-Speed 6308.48 samples/sec Loss 4.8787 LearningRate 0.0003 Epoch: 20 Global Step: 418440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:05,589-Speed 6313.20 samples/sec Loss 4.8022 LearningRate 0.0003 Epoch: 20 Global Step: 418450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:08,832-Speed 6316.60 samples/sec Loss 4.8550 LearningRate 0.0003 Epoch: 20 Global Step: 418460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:12,080-Speed 6307.23 samples/sec Loss 4.8408 LearningRate 0.0003 Epoch: 20 Global Step: 418470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
05:48:15,321-Speed 6319.64 samples/sec Loss 4.8508 LearningRate 0.0003 Epoch: 20 Global Step: 418480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:18,568-Speed 6310.46 samples/sec Loss 4.8054 LearningRate 0.0003 Epoch: 20 Global Step: 418490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:21,815-Speed 6308.74 samples/sec Loss 4.8429 LearningRate 0.0003 Epoch: 20 Global Step: 418500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:25,060-Speed 6311.18 samples/sec Loss 4.8217 LearningRate 0.0003 Epoch: 20 Global Step: 418510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:28,309-Speed 6306.44 samples/sec Loss 4.9009 LearningRate 0.0003 Epoch: 20 Global Step: 418520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:31,540-Speed 6339.70 samples/sec Loss 4.8697 LearningRate 0.0003 Epoch: 20 Global Step: 418530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:34,795-Speed 6293.07 samples/sec Loss 4.8735 LearningRate 0.0003 Epoch: 20 Global Step: 418540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:38,036-Speed 6320.04 samples/sec Loss 4.8160 LearningRate 0.0003 Epoch: 20 Global Step: 418550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:41,279-Speed 6315.80 samples/sec Loss 4.8182 LearningRate 0.0003 Epoch: 20 Global Step: 418560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:44,522-Speed 6317.78 samples/sec Loss 4.8357 LearningRate 0.0003 Epoch: 20 Global Step: 418570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:47,770-Speed 6307.21 samples/sec Loss 4.8611 LearningRate 0.0003 Epoch: 20 Global Step: 418580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:51,014-Speed 6312.97 samples/sec Loss 4.8791 LearningRate 0.0003 Epoch: 20 Global Step: 418590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:54,258-Speed 6315.97 
samples/sec Loss 4.8624 LearningRate 0.0003 Epoch: 20 Global Step: 418600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:48:57,501-Speed 6315.48 samples/sec Loss 4.8359 LearningRate 0.0003 Epoch: 20 Global Step: 418610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:00,748-Speed 6309.59 samples/sec Loss 4.8516 LearningRate 0.0003 Epoch: 20 Global Step: 418620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:03,993-Speed 6312.77 samples/sec Loss 4.8902 LearningRate 0.0003 Epoch: 20 Global Step: 418630 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:49:07,237-Speed 6314.47 samples/sec Loss 4.8390 LearningRate 0.0003 Epoch: 20 Global Step: 418640 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:49:10,465-Speed 6345.05 samples/sec Loss 4.9213 LearningRate 0.0003 Epoch: 20 Global Step: 418650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:13,714-Speed 6305.49 samples/sec Loss 4.8242 LearningRate 0.0003 Epoch: 20 Global Step: 418660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:16,961-Speed 6308.79 samples/sec Loss 4.8897 LearningRate 0.0003 Epoch: 20 Global Step: 418670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:20,207-Speed 6312.24 samples/sec Loss 4.8098 LearningRate 0.0003 Epoch: 20 Global Step: 418680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:23,453-Speed 6310.26 samples/sec Loss 4.8814 LearningRate 0.0003 Epoch: 20 Global Step: 418690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:26,696-Speed 6316.66 samples/sec Loss 4.8387 LearningRate 0.0003 Epoch: 20 Global Step: 418700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:29,945-Speed 6304.74 samples/sec Loss 4.8611 LearningRate 0.0003 Epoch: 20 Global Step: 418710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:33,190-Speed 6312.44 samples/sec Loss 4.8342 
LearningRate 0.0003 Epoch: 20 Global Step: 418720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:36,438-Speed 6307.42 samples/sec Loss 4.8699 LearningRate 0.0003 Epoch: 20 Global Step: 418730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:39,682-Speed 6313.30 samples/sec Loss 4.9013 LearningRate 0.0003 Epoch: 20 Global Step: 418740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:42,913-Speed 6340.05 samples/sec Loss 4.8488 LearningRate 0.0003 Epoch: 20 Global Step: 418750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:46,170-Speed 6290.33 samples/sec Loss 4.8632 LearningRate 0.0003 Epoch: 20 Global Step: 418760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:49,413-Speed 6316.79 samples/sec Loss 4.8172 LearningRate 0.0003 Epoch: 20 Global Step: 418770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:52,662-Speed 6303.72 samples/sec Loss 4.8057 LearningRate 0.0003 Epoch: 20 Global Step: 418780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:55,907-Speed 6312.89 samples/sec Loss 4.9021 LearningRate 0.0003 Epoch: 20 Global Step: 418790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:49:59,150-Speed 6316.90 samples/sec Loss 4.9550 LearningRate 0.0003 Epoch: 20 Global Step: 418800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:02,396-Speed 6310.11 samples/sec Loss 4.8879 LearningRate 0.0003 Epoch: 20 Global Step: 418810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:05,641-Speed 6312.20 samples/sec Loss 4.8313 LearningRate 0.0003 Epoch: 20 Global Step: 418820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:08,885-Speed 6315.46 samples/sec Loss 4.8323 LearningRate 0.0003 Epoch: 20 Global Step: 418830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:12,126-Speed 6319.16 samples/sec Loss 4.8923 LearningRate 0.0003 Epoch: 20 
Global Step: 418840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:15,372-Speed 6311.60 samples/sec Loss 4.9257 LearningRate 0.0003 Epoch: 20 Global Step: 418850 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:50:18,619-Speed 6309.56 samples/sec Loss 4.9044 LearningRate 0.0003 Epoch: 20 Global Step: 418860 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:50:21,865-Speed 6310.43 samples/sec Loss 4.8747 LearningRate 0.0003 Epoch: 20 Global Step: 418870 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:50:25,099-Speed 6335.23 samples/sec Loss 4.8752 LearningRate 0.0003 Epoch: 20 Global Step: 418880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:28,342-Speed 6316.22 samples/sec Loss 4.9047 LearningRate 0.0003 Epoch: 20 Global Step: 418890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:31,584-Speed 6317.51 samples/sec Loss 4.8691 LearningRate 0.0003 Epoch: 20 Global Step: 418900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:34,828-Speed 6316.67 samples/sec Loss 4.8492 LearningRate 0.0003 Epoch: 20 Global Step: 418910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:38,072-Speed 6313.03 samples/sec Loss 4.7839 LearningRate 0.0003 Epoch: 20 Global Step: 418920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:41,319-Speed 6308.71 samples/sec Loss 4.8351 LearningRate 0.0003 Epoch: 20 Global Step: 418930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:44,562-Speed 6316.47 samples/sec Loss 4.8372 LearningRate 0.0003 Epoch: 20 Global Step: 418940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:47,809-Speed 6309.62 samples/sec Loss 4.8774 LearningRate 0.0003 Epoch: 20 Global Step: 418950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:51,055-Speed 6310.50 samples/sec Loss 4.9056 LearningRate 0.0003 Epoch: 20 Global Step: 418960 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:54,297-Speed 6318.68 samples/sec Loss 4.8892 LearningRate 0.0003 Epoch: 20 Global Step: 418970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:50:57,541-Speed 6314.27 samples/sec Loss 4.8023 LearningRate 0.0003 Epoch: 20 Global Step: 418980 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:00,784-Speed 6315.77 samples/sec Loss 4.8666 LearningRate 0.0003 Epoch: 20 Global Step: 418990 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:04,031-Speed 6309.84 samples/sec Loss 4.7922 LearningRate 0.0003 Epoch: 20 Global Step: 419000 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:07,261-Speed 6341.85 samples/sec Loss 4.8729 LearningRate 0.0003 Epoch: 20 Global Step: 419010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:10,507-Speed 6310.42 samples/sec Loss 4.8334 LearningRate 0.0003 Epoch: 20 Global Step: 419020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:13,762-Speed 6293.13 samples/sec Loss 4.8623 LearningRate 0.0003 Epoch: 20 Global Step: 419030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:17,008-Speed 6311.73 samples/sec Loss 4.9126 LearningRate 0.0003 Epoch: 20 Global Step: 419040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:20,251-Speed 6316.98 samples/sec Loss 4.8347 LearningRate 0.0003 Epoch: 20 Global Step: 419050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:23,494-Speed 6315.64 samples/sec Loss 4.7825 LearningRate 0.0003 Epoch: 20 Global Step: 419060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:26,741-Speed 6309.11 samples/sec Loss 4.8756 LearningRate 0.0003 Epoch: 20 Global Step: 419070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:29,987-Speed 6309.22 samples/sec Loss 4.7956 LearningRate 0.0003 Epoch: 20 Global Step: 419080 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 05:51:33,234-Speed 6310.22 samples/sec Loss 4.8116 LearningRate 0.0003 Epoch: 20 Global Step: 419090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:36,480-Speed 6310.87 samples/sec Loss 4.8559 LearningRate 0.0003 Epoch: 20 Global Step: 419100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:51:39,727-Speed 6308.00 samples/sec Loss 4.8904 LearningRate 0.0003 Epoch: 20 Global Step: 419110 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:42,970-Speed 6317.49 samples/sec Loss 4.8672 LearningRate 0.0003 Epoch: 20 Global Step: 419120 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:46,213-Speed 6315.65 samples/sec Loss 4.8312 LearningRate 0.0003 Epoch: 20 Global Step: 419130 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:49,460-Speed 6309.97 samples/sec Loss 4.8148 LearningRate 0.0003 Epoch: 20 Global Step: 419140 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:52,704-Speed 6315.07 samples/sec Loss 4.8887 LearningRate 0.0003 Epoch: 20 Global Step: 419150 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:55,949-Speed 6311.69 samples/sec Loss 4.7821 LearningRate 0.0003 Epoch: 20 Global Step: 419160 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:51:59,195-Speed 6311.27 samples/sec Loss 4.7698 LearningRate 0.0003 Epoch: 20 Global Step: 419170 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:52:02,446-Speed 6300.31 samples/sec Loss 4.8289 LearningRate 0.0003 Epoch: 20 Global Step: 419180 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:52:05,679-Speed 6336.49 samples/sec Loss 4.9085 LearningRate 0.0003 Epoch: 20 Global Step: 419190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:08,925-Speed 6310.88 samples/sec Loss 4.8500 LearningRate 0.0003 Epoch: 20 Global Step: 419200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
05:52:12,167-Speed 6317.32 samples/sec Loss 4.8361 LearningRate 0.0003 Epoch: 20 Global Step: 419210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:15,413-Speed 6312.01 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 20 Global Step: 419220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:18,655-Speed 6317.09 samples/sec Loss 4.8389 LearningRate 0.0003 Epoch: 20 Global Step: 419230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:21,904-Speed 6304.94 samples/sec Loss 4.9069 LearningRate 0.0003 Epoch: 20 Global Step: 419240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:25,155-Speed 6302.24 samples/sec Loss 4.9007 LearningRate 0.0003 Epoch: 20 Global Step: 419250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:28,411-Speed 6290.09 samples/sec Loss 4.8467 LearningRate 0.0003 Epoch: 20 Global Step: 419260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:31,659-Speed 6307.42 samples/sec Loss 4.8427 LearningRate 0.0003 Epoch: 20 Global Step: 419270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:34,902-Speed 6316.25 samples/sec Loss 4.8861 LearningRate 0.0003 Epoch: 20 Global Step: 419280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:38,149-Speed 6309.10 samples/sec Loss 4.9077 LearningRate 0.0003 Epoch: 20 Global Step: 419290 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:52:41,395-Speed 6311.19 samples/sec Loss 4.9088 LearningRate 0.0003 Epoch: 20 Global Step: 419300 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:52:44,629-Speed 6334.21 samples/sec Loss 4.8001 LearningRate 0.0003 Epoch: 20 Global Step: 419310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:47,878-Speed 6305.24 samples/sec Loss 4.9251 LearningRate 0.0003 Epoch: 20 Global Step: 419320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:51,126-Speed 6306.75 
samples/sec Loss 4.8473 LearningRate 0.0003 Epoch: 20 Global Step: 419330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:54,373-Speed 6310.17 samples/sec Loss 4.8635 LearningRate 0.0003 Epoch: 20 Global Step: 419340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:52:57,616-Speed 6315.24 samples/sec Loss 4.8505 LearningRate 0.0003 Epoch: 20 Global Step: 419350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:00,864-Speed 6307.04 samples/sec Loss 4.8420 LearningRate 0.0003 Epoch: 20 Global Step: 419360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:04,108-Speed 6313.91 samples/sec Loss 4.8398 LearningRate 0.0003 Epoch: 20 Global Step: 419370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:07,356-Speed 6307.93 samples/sec Loss 4.8616 LearningRate 0.0003 Epoch: 20 Global Step: 419380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:10,606-Speed 6303.34 samples/sec Loss 4.8320 LearningRate 0.0003 Epoch: 20 Global Step: 419390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:13,858-Speed 6297.22 samples/sec Loss 4.8762 LearningRate 0.0003 Epoch: 20 Global Step: 419400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:17,108-Speed 6303.20 samples/sec Loss 4.8309 LearningRate 0.0003 Epoch: 20 Global Step: 419410 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:53:20,356-Speed 6306.33 samples/sec Loss 4.8726 LearningRate 0.0003 Epoch: 20 Global Step: 419420 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:53:23,591-Speed 6333.46 samples/sec Loss 4.8311 LearningRate 0.0003 Epoch: 20 Global Step: 419430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:26,846-Speed 6293.08 samples/sec Loss 4.8468 LearningRate 0.0003 Epoch: 20 Global Step: 419440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:30,090-Speed 6313.84 samples/sec Loss 4.8922 
LearningRate 0.0003 Epoch: 20 Global Step: 419450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:33,335-Speed 6313.52 samples/sec Loss 4.8590 LearningRate 0.0003 Epoch: 20 Global Step: 419460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:36,583-Speed 6306.99 samples/sec Loss 4.8152 LearningRate 0.0003 Epoch: 20 Global Step: 419470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:39,831-Speed 6305.98 samples/sec Loss 4.7886 LearningRate 0.0003 Epoch: 20 Global Step: 419480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:43,073-Speed 6317.96 samples/sec Loss 4.8589 LearningRate 0.0003 Epoch: 20 Global Step: 419490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:46,319-Speed 6311.09 samples/sec Loss 4.8466 LearningRate 0.0003 Epoch: 20 Global Step: 419500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:49,565-Speed 6310.82 samples/sec Loss 4.8781 LearningRate 0.0003 Epoch: 20 Global Step: 419510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:52,812-Speed 6308.76 samples/sec Loss 4.8499 LearningRate 0.0003 Epoch: 20 Global Step: 419520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:53:56,062-Speed 6304.61 samples/sec Loss 4.8574 LearningRate 0.0003 Epoch: 20 Global Step: 419530 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:53:59,290-Speed 6345.08 samples/sec Loss 4.8542 LearningRate 0.0003 Epoch: 20 Global Step: 419540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:02,594-Speed 6201.02 samples/sec Loss 4.8529 LearningRate 0.0003 Epoch: 20 Global Step: 419550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:05,839-Speed 6312.77 samples/sec Loss 4.8854 LearningRate 0.0003 Epoch: 20 Global Step: 419560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:09,081-Speed 6318.69 samples/sec Loss 4.8958 LearningRate 0.0003 Epoch: 20 
Global Step: 419570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:12,329-Speed 6305.34 samples/sec Loss 4.8538 LearningRate 0.0003 Epoch: 20 Global Step: 419580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:15,575-Speed 6310.88 samples/sec Loss 4.9380 LearningRate 0.0003 Epoch: 20 Global Step: 419590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:18,821-Speed 6310.37 samples/sec Loss 4.8147 LearningRate 0.0003 Epoch: 20 Global Step: 419600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:22,066-Speed 6313.37 samples/sec Loss 4.8642 LearningRate 0.0003 Epoch: 20 Global Step: 419610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:25,310-Speed 6315.20 samples/sec Loss 4.8060 LearningRate 0.0003 Epoch: 20 Global Step: 419620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:28,557-Speed 6308.98 samples/sec Loss 4.8773 LearningRate 0.0003 Epoch: 20 Global Step: 419630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:31,792-Speed 6330.32 samples/sec Loss 4.9188 LearningRate 0.0003 Epoch: 20 Global Step: 419640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:35,037-Speed 6314.25 samples/sec Loss 4.8267 LearningRate 0.0003 Epoch: 20 Global Step: 419650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:38,285-Speed 6306.35 samples/sec Loss 4.8535 LearningRate 0.0003 Epoch: 20 Global Step: 419660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:41,530-Speed 6311.42 samples/sec Loss 4.8333 LearningRate 0.0003 Epoch: 20 Global Step: 419670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:44,774-Speed 6314.77 samples/sec Loss 4.8266 LearningRate 0.0003 Epoch: 20 Global Step: 419680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:48,024-Speed 6304.25 samples/sec Loss 4.8603 LearningRate 0.0003 Epoch: 20 Global Step: 419690 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:51,277-Speed 6297.19 samples/sec Loss 4.8721 LearningRate 0.0003 Epoch: 20 Global Step: 419700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:54,525-Speed 6307.18 samples/sec Loss 4.8550 LearningRate 0.0003 Epoch: 20 Global Step: 419710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:54:57,770-Speed 6311.51 samples/sec Loss 4.8706 LearningRate 0.0003 Epoch: 20 Global Step: 419720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:01,023-Speed 6298.16 samples/sec Loss 4.8279 LearningRate 0.0003 Epoch: 20 Global Step: 419730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:04,268-Speed 6312.79 samples/sec Loss 4.8590 LearningRate 0.0003 Epoch: 20 Global Step: 419740 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:55:07,502-Speed 6333.53 samples/sec Loss 4.9101 LearningRate 0.0003 Epoch: 20 Global Step: 419750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:10,748-Speed 6311.23 samples/sec Loss 4.8378 LearningRate 0.0003 Epoch: 20 Global Step: 419760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:13,993-Speed 6311.62 samples/sec Loss 4.8547 LearningRate 0.0003 Epoch: 20 Global Step: 419770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:17,240-Speed 6308.50 samples/sec Loss 4.7992 LearningRate 0.0003 Epoch: 20 Global Step: 419780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:20,490-Speed 6304.53 samples/sec Loss 4.8340 LearningRate 0.0003 Epoch: 20 Global Step: 419790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:23,736-Speed 6311.04 samples/sec Loss 4.8146 LearningRate 0.0003 Epoch: 20 Global Step: 419800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:26,983-Speed 6309.05 samples/sec Loss 4.8555 LearningRate 0.0003 Epoch: 20 Global Step: 419810 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 05:55:30,228-Speed 6311.94 samples/sec Loss 4.8366 LearningRate 0.0003 Epoch: 20 Global Step: 419820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:33,468-Speed 6321.48 samples/sec Loss 4.8371 LearningRate 0.0003 Epoch: 20 Global Step: 419830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:36,714-Speed 6310.47 samples/sec Loss 4.8881 LearningRate 0.0003 Epoch: 20 Global Step: 419840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:39,943-Speed 6343.64 samples/sec Loss 4.8831 LearningRate 0.0003 Epoch: 20 Global Step: 419850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:43,188-Speed 6313.34 samples/sec Loss 4.8153 LearningRate 0.0003 Epoch: 20 Global Step: 419860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:46,435-Speed 6309.24 samples/sec Loss 4.8701 LearningRate 0.0003 Epoch: 20 Global Step: 419870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:49,680-Speed 6313.01 samples/sec Loss 4.8741 LearningRate 0.0003 Epoch: 20 Global Step: 419880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:52,929-Speed 6304.26 samples/sec Loss 4.8575 LearningRate 0.0003 Epoch: 20 Global Step: 419890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:56,173-Speed 6314.74 samples/sec Loss 4.8318 LearningRate 0.0003 Epoch: 20 Global Step: 419900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:55:59,415-Speed 6317.71 samples/sec Loss 4.8507 LearningRate 0.0003 Epoch: 20 Global Step: 419910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:02,663-Speed 6307.83 samples/sec Loss 4.8687 LearningRate 0.0003 Epoch: 20 Global Step: 419920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:05,904-Speed 6320.07 samples/sec Loss 4.8894 LearningRate 0.0003 Epoch: 20 Global Step: 419930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
05:56:09,153-Speed 6305.45 samples/sec Loss 4.9006 LearningRate 0.0003 Epoch: 20 Global Step: 419940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:12,385-Speed 6337.50 samples/sec Loss 4.9091 LearningRate 0.0003 Epoch: 20 Global Step: 419950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:15,620-Speed 6333.78 samples/sec Loss 4.8232 LearningRate 0.0003 Epoch: 20 Global Step: 419960 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:18,867-Speed 6308.40 samples/sec Loss 4.8534 LearningRate 0.0003 Epoch: 20 Global Step: 419970 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:22,111-Speed 6314.91 samples/sec Loss 4.8491 LearningRate 0.0003 Epoch: 20 Global Step: 419980 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:25,362-Speed 6301.62 samples/sec Loss 4.8124 LearningRate 0.0003 Epoch: 20 Global Step: 419990 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:28,607-Speed 6311.37 samples/sec Loss 4.9321 LearningRate 0.0003 Epoch: 20 Global Step: 420000 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:31,853-Speed 6310.38 samples/sec Loss 4.8320 LearningRate 0.0003 Epoch: 20 Global Step: 420010 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:35,099-Speed 6310.93 samples/sec Loss 4.8405 LearningRate 0.0003 Epoch: 20 Global Step: 420020 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:38,341-Speed 6318.19 samples/sec Loss 4.8634 LearningRate 0.0003 Epoch: 20 Global Step: 420030 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:41,587-Speed 6311.85 samples/sec Loss 4.9108 LearningRate 0.0003 Epoch: 20 Global Step: 420040 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:44,830-Speed 6315.31 samples/sec Loss 4.8263 LearningRate 0.0003 Epoch: 20 Global Step: 420050 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 05:56:48,080-Speed 6304.10 samples/sec 
Loss 4.8420 LearningRate 0.0003 Epoch: 20 Global Step: 420060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:51,329-Speed 6303.67 samples/sec Loss 4.8262 LearningRate 0.0003 Epoch: 20 Global Step: 420070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:54,572-Speed 6317.96 samples/sec Loss 4.8821 LearningRate 0.0003 Epoch: 20 Global Step: 420080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:56:57,823-Speed 6300.53 samples/sec Loss 4.7854 LearningRate 0.0003 Epoch: 20 Global Step: 420090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:01,065-Speed 6317.72 samples/sec Loss 4.8690 LearningRate 0.0003 Epoch: 20 Global Step: 420100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:04,311-Speed 6311.84 samples/sec Loss 4.8595 LearningRate 0.0003 Epoch: 20 Global Step: 420110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:07,553-Speed 6318.74 samples/sec Loss 4.8460 LearningRate 0.0003 Epoch: 20 Global Step: 420120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:10,804-Speed 6299.17 samples/sec Loss 4.8641 LearningRate 0.0003 Epoch: 20 Global Step: 420130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:14,048-Speed 6315.58 samples/sec Loss 4.7844 LearningRate 0.0003 Epoch: 20 Global Step: 420140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:17,293-Speed 6311.97 samples/sec Loss 4.7983 LearningRate 0.0003 Epoch: 20 Global Step: 420150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:20,542-Speed 6306.55 samples/sec Loss 4.8629 LearningRate 0.0003 Epoch: 20 Global Step: 420160 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:57:23,789-Speed 6309.09 samples/sec Loss 4.8872 LearningRate 0.0003 Epoch: 20 Global Step: 420170 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:57:27,035-Speed 6309.66 samples/sec Loss 4.8602 LearningRate 0.0003 
Epoch: 20 Global Step: 420180 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:57:30,268-Speed 6335.74 samples/sec Loss 4.9437 LearningRate 0.0003 Epoch: 20 Global Step: 420190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:33,513-Speed 6314.23 samples/sec Loss 4.8097 LearningRate 0.0003 Epoch: 20 Global Step: 420200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:36,757-Speed 6314.26 samples/sec Loss 4.8640 LearningRate 0.0003 Epoch: 20 Global Step: 420210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:40,001-Speed 6313.38 samples/sec Loss 4.8447 LearningRate 0.0003 Epoch: 20 Global Step: 420220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:43,246-Speed 6314.24 samples/sec Loss 4.8694 LearningRate 0.0003 Epoch: 20 Global Step: 420230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:46,490-Speed 6312.65 samples/sec Loss 4.9185 LearningRate 0.0003 Epoch: 20 Global Step: 420240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:49,733-Speed 6316.94 samples/sec Loss 4.8417 LearningRate 0.0003 Epoch: 20 Global Step: 420250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:52,979-Speed 6310.88 samples/sec Loss 4.8984 LearningRate 0.0003 Epoch: 20 Global Step: 420260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:56,231-Speed 6299.71 samples/sec Loss 4.8338 LearningRate 0.0003 Epoch: 20 Global Step: 420270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:57:59,488-Speed 6289.08 samples/sec Loss 4.8355 LearningRate 0.0003 Epoch: 20 Global Step: 420280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:02,736-Speed 6305.99 samples/sec Loss 4.8595 LearningRate 0.0003 Epoch: 20 Global Step: 420290 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:58:05,979-Speed 6316.78 samples/sec Loss 4.8994 LearningRate 0.0003 Epoch: 20 Global Step: 420300 
Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:09,219-Speed 6323.40 samples/sec Loss 4.9637 LearningRate 0.0003 Epoch: 20 Global Step: 420310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:12,465-Speed 6311.28 samples/sec Loss 4.8385 LearningRate 0.0003 Epoch: 20 Global Step: 420320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:15,709-Speed 6312.96 samples/sec Loss 4.7496 LearningRate 0.0003 Epoch: 20 Global Step: 420330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:18,952-Speed 6317.38 samples/sec Loss 4.7916 LearningRate 0.0003 Epoch: 20 Global Step: 420340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:22,199-Speed 6308.09 samples/sec Loss 4.9191 LearningRate 0.0003 Epoch: 20 Global Step: 420350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:25,444-Speed 6313.42 samples/sec Loss 4.8630 LearningRate 0.0003 Epoch: 20 Global Step: 420360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:28,695-Speed 6302.37 samples/sec Loss 4.9083 LearningRate 0.0003 Epoch: 20 Global Step: 420370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:31,939-Speed 6314.85 samples/sec Loss 4.8916 LearningRate 0.0003 Epoch: 20 Global Step: 420380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:35,181-Speed 6316.76 samples/sec Loss 4.7738 LearningRate 0.0003 Epoch: 20 Global Step: 420390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:38,414-Speed 6336.76 samples/sec Loss 4.8831 LearningRate 0.0003 Epoch: 20 Global Step: 420400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:41,660-Speed 6311.65 samples/sec Loss 4.7822 LearningRate 0.0003 Epoch: 20 Global Step: 420410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:44,907-Speed 6307.49 samples/sec Loss 4.8753 LearningRate 0.0003 Epoch: 20 Global Step: 420420 Fp16 Grad Scale: 16384 
Required: 37 hours Training: 2022-04-02 05:58:48,166-Speed 6286.50 samples/sec Loss 4.8137 LearningRate 0.0003 Epoch: 20 Global Step: 420430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:51,410-Speed 6314.98 samples/sec Loss 4.8708 LearningRate 0.0003 Epoch: 20 Global Step: 420440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:54,658-Speed 6305.56 samples/sec Loss 4.9234 LearningRate 0.0003 Epoch: 20 Global Step: 420450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:58:57,905-Speed 6310.34 samples/sec Loss 4.7725 LearningRate 0.0003 Epoch: 20 Global Step: 420460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:01,153-Speed 6305.47 samples/sec Loss 4.8221 LearningRate 0.0003 Epoch: 20 Global Step: 420470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:04,397-Speed 6314.42 samples/sec Loss 4.9400 LearningRate 0.0003 Epoch: 20 Global Step: 420480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:07,640-Speed 6316.95 samples/sec Loss 4.8314 LearningRate 0.0003 Epoch: 20 Global Step: 420490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:10,887-Speed 6309.49 samples/sec Loss 4.8935 LearningRate 0.0003 Epoch: 20 Global Step: 420500 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:59:14,132-Speed 6311.20 samples/sec Loss 4.8657 LearningRate 0.0003 Epoch: 20 Global Step: 420510 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:59:17,378-Speed 6310.34 samples/sec Loss 4.8638 LearningRate 0.0003 Epoch: 20 Global Step: 420520 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:59:20,610-Speed 6339.18 samples/sec Loss 4.8766 LearningRate 0.0003 Epoch: 20 Global Step: 420530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:23,854-Speed 6313.47 samples/sec Loss 4.7900 LearningRate 0.0003 Epoch: 20 Global Step: 420540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 
2022-04-02 05:59:27,106-Speed 6298.99 samples/sec Loss 4.8761 LearningRate 0.0003 Epoch: 20 Global Step: 420550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:30,351-Speed 6314.10 samples/sec Loss 4.9210 LearningRate 0.0003 Epoch: 20 Global Step: 420560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:33,598-Speed 6308.36 samples/sec Loss 4.8555 LearningRate 0.0003 Epoch: 20 Global Step: 420570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:36,864-Speed 6271.43 samples/sec Loss 4.7924 LearningRate 0.0003 Epoch: 20 Global Step: 420580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:40,130-Speed 6272.58 samples/sec Loss 4.8307 LearningRate 0.0003 Epoch: 20 Global Step: 420590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:43,398-Speed 6268.56 samples/sec Loss 4.8340 LearningRate 0.0003 Epoch: 20 Global Step: 420600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:46,640-Speed 6317.98 samples/sec Loss 4.9408 LearningRate 0.0003 Epoch: 20 Global Step: 420610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:49,893-Speed 6297.58 samples/sec Loss 4.7967 LearningRate 0.0003 Epoch: 20 Global Step: 420620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 05:59:53,136-Speed 6317.33 samples/sec Loss 4.8231 LearningRate 0.0003 Epoch: 20 Global Step: 420630 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:59:56,380-Speed 6314.11 samples/sec Loss 4.8611 LearningRate 0.0003 Epoch: 20 Global Step: 420640 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 05:59:59,630-Speed 6303.31 samples/sec Loss 4.7976 LearningRate 0.0003 Epoch: 20 Global Step: 420650 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:00:02,879-Speed 6304.00 samples/sec Loss 4.9002 LearningRate 0.0003 Epoch: 20 Global Step: 420660 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:00:06,119-Speed 
6322.78 samples/sec Loss 4.8637 LearningRate 0.0003 Epoch: 20 Global Step: 420670 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:00:09,356-Speed 6328.48 samples/sec Loss 4.8033 LearningRate 0.0003 Epoch: 20 Global Step: 420680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:12,599-Speed 6315.71 samples/sec Loss 4.9283 LearningRate 0.0003 Epoch: 20 Global Step: 420690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:15,846-Speed 6308.72 samples/sec Loss 4.8649 LearningRate 0.0003 Epoch: 20 Global Step: 420700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:19,091-Speed 6312.18 samples/sec Loss 4.8996 LearningRate 0.0003 Epoch: 20 Global Step: 420710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:22,338-Speed 6309.74 samples/sec Loss 4.8343 LearningRate 0.0003 Epoch: 20 Global Step: 420720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:25,596-Speed 6286.50 samples/sec Loss 4.9015 LearningRate 0.0003 Epoch: 20 Global Step: 420730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:28,841-Speed 6312.67 samples/sec Loss 4.8430 LearningRate 0.0003 Epoch: 20 Global Step: 420740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:32,086-Speed 6313.95 samples/sec Loss 4.8163 LearningRate 0.0003 Epoch: 20 Global Step: 420750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:35,329-Speed 6315.49 samples/sec Loss 4.8905 LearningRate 0.0003 Epoch: 20 Global Step: 420760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:38,578-Speed 6304.44 samples/sec Loss 4.8823 LearningRate 0.0003 Epoch: 20 Global Step: 420770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:41,823-Speed 6312.49 samples/sec Loss 4.7867 LearningRate 0.0003 Epoch: 20 Global Step: 420780 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:00:45,060-Speed 6329.70 samples/sec Loss 4.9201 
LearningRate 0.0003 Epoch: 20 Global Step: 420790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:48,303-Speed 6316.26 samples/sec Loss 4.8381 LearningRate 0.0003 Epoch: 20 Global Step: 420800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:51,547-Speed 6315.20 samples/sec Loss 4.8860 LearningRate 0.0003 Epoch: 20 Global Step: 420810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:54,789-Speed 6318.03 samples/sec Loss 4.8763 LearningRate 0.0003 Epoch: 20 Global Step: 420820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:00:58,036-Speed 6308.70 samples/sec Loss 4.8740 LearningRate 0.0003 Epoch: 20 Global Step: 420830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:01,277-Speed 6320.51 samples/sec Loss 4.8974 LearningRate 0.0003 Epoch: 20 Global Step: 420840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:04,527-Speed 6303.58 samples/sec Loss 4.8159 LearningRate 0.0003 Epoch: 20 Global Step: 420850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:07,772-Speed 6312.75 samples/sec Loss 4.8448 LearningRate 0.0003 Epoch: 20 Global Step: 420860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:11,025-Speed 6296.82 samples/sec Loss 4.8263 LearningRate 0.0003 Epoch: 20 Global Step: 420870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:14,290-Speed 6274.48 samples/sec Loss 4.9206 LearningRate 0.0003 Epoch: 20 Global Step: 420880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:17,521-Speed 6340.06 samples/sec Loss 4.7886 LearningRate 0.0003 Epoch: 20 Global Step: 420890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:20,767-Speed 6310.54 samples/sec Loss 4.8103 LearningRate 0.0003 Epoch: 20 Global Step: 420900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:24,020-Speed 6297.06 samples/sec Loss 4.8563 LearningRate 0.0003 Epoch: 20 
Global Step: 420910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:27,269-Speed 6305.23 samples/sec Loss 4.8171 LearningRate 0.0003 Epoch: 20 Global Step: 420920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:30,516-Speed 6308.86 samples/sec Loss 4.9012 LearningRate 0.0003 Epoch: 20 Global Step: 420930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:33,761-Speed 6312.42 samples/sec Loss 4.8171 LearningRate 0.0003 Epoch: 20 Global Step: 420940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:37,006-Speed 6311.71 samples/sec Loss 4.8038 LearningRate 0.0003 Epoch: 20 Global Step: 420950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:40,255-Speed 6304.27 samples/sec Loss 4.8529 LearningRate 0.0003 Epoch: 20 Global Step: 420960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:43,505-Speed 6303.61 samples/sec Loss 4.7804 LearningRate 0.0003 Epoch: 20 Global Step: 420970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:46,750-Speed 6313.60 samples/sec Loss 4.8311 LearningRate 0.0003 Epoch: 20 Global Step: 420980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:49,992-Speed 6317.50 samples/sec Loss 4.8520 LearningRate 0.0003 Epoch: 20 Global Step: 420990 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:01:53,240-Speed 6306.53 samples/sec Loss 4.8215 LearningRate 0.0003 Epoch: 20 Global Step: 421000 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:01:56,471-Speed 6340.80 samples/sec Loss 4.8615 LearningRate 0.0003 Epoch: 20 Global Step: 421010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:01:59,718-Speed 6310.02 samples/sec Loss 4.8600 LearningRate 0.0003 Epoch: 20 Global Step: 421020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:02,969-Speed 6299.65 samples/sec Loss 4.8088 LearningRate 0.0003 Epoch: 20 Global Step: 421030 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:06,214-Speed 6312.98 samples/sec Loss 4.9531 LearningRate 0.0003 Epoch: 20 Global Step: 421040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:09,457-Speed 6317.14 samples/sec Loss 4.8508 LearningRate 0.0003 Epoch: 20 Global Step: 421050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:12,699-Speed 6317.07 samples/sec Loss 4.8303 LearningRate 0.0003 Epoch: 20 Global Step: 421060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:15,946-Speed 6310.13 samples/sec Loss 4.8261 LearningRate 0.0003 Epoch: 20 Global Step: 421070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:19,189-Speed 6317.10 samples/sec Loss 4.8058 LearningRate 0.0003 Epoch: 20 Global Step: 421080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:22,435-Speed 6309.74 samples/sec Loss 4.8739 LearningRate 0.0003 Epoch: 20 Global Step: 421090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:25,689-Speed 6295.04 samples/sec Loss 4.8274 LearningRate 0.0003 Epoch: 20 Global Step: 421100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:28,931-Speed 6319.20 samples/sec Loss 4.9193 LearningRate 0.0003 Epoch: 20 Global Step: 421110 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:02:32,177-Speed 6311.15 samples/sec Loss 4.8054 LearningRate 0.0003 Epoch: 20 Global Step: 421120 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:02:35,421-Speed 6312.94 samples/sec Loss 4.8223 LearningRate 0.0003 Epoch: 20 Global Step: 421130 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:02:38,667-Speed 6311.72 samples/sec Loss 4.8507 LearningRate 0.0003 Epoch: 20 Global Step: 421140 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:02:41,914-Speed 6309.22 samples/sec Loss 4.8761 LearningRate 0.0003 Epoch: 20 Global Step: 421150 Fp16 Grad Scale: 32768 Required: 37 hours 
Training: 2022-04-02 06:02:45,146-Speed 6336.46 samples/sec Loss 4.8498 LearningRate 0.0003 Epoch: 20 Global Step: 421160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:48,392-Speed 6310.60 samples/sec Loss 4.8144 LearningRate 0.0003 Epoch: 20 Global Step: 421170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:51,637-Speed 6313.39 samples/sec Loss 4.8092 LearningRate 0.0003 Epoch: 20 Global Step: 421180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:54,882-Speed 6312.47 samples/sec Loss 4.9522 LearningRate 0.0003 Epoch: 20 Global Step: 421190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:02:58,128-Speed 6311.78 samples/sec Loss 4.8564 LearningRate 0.0003 Epoch: 20 Global Step: 421200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:01,372-Speed 6313.73 samples/sec Loss 4.8710 LearningRate 0.0003 Epoch: 20 Global Step: 421210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:04,624-Speed 6300.34 samples/sec Loss 4.7956 LearningRate 0.0003 Epoch: 20 Global Step: 421220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:07,870-Speed 6311.05 samples/sec Loss 4.8368 LearningRate 0.0003 Epoch: 20 Global Step: 421230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:11,112-Speed 6318.70 samples/sec Loss 4.8144 LearningRate 0.0003 Epoch: 20 Global Step: 421240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:14,363-Speed 6299.81 samples/sec Loss 4.7240 LearningRate 0.0003 Epoch: 20 Global Step: 421250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:17,611-Speed 6308.00 samples/sec Loss 4.8233 LearningRate 0.0003 Epoch: 20 Global Step: 421260 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:03:20,839-Speed 6345.18 samples/sec Loss 4.8495 LearningRate 0.0003 Epoch: 20 Global Step: 421270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:03:24,085-Speed 6311.80 samples/sec Loss 4.8086 LearningRate 0.0003 Epoch: 20 Global Step: 421280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:27,331-Speed 6310.25 samples/sec Loss 4.8083 LearningRate 0.0003 Epoch: 20 Global Step: 421290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:30,576-Speed 6311.86 samples/sec Loss 4.8550 LearningRate 0.0003 Epoch: 20 Global Step: 421300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:33,820-Speed 6315.06 samples/sec Loss 4.8673 LearningRate 0.0003 Epoch: 20 Global Step: 421310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:37,065-Speed 6313.24 samples/sec Loss 4.8511 LearningRate 0.0003 Epoch: 20 Global Step: 421320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:40,307-Speed 6318.35 samples/sec Loss 4.7900 LearningRate 0.0003 Epoch: 20 Global Step: 421330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:43,550-Speed 6315.42 samples/sec Loss 4.8673 LearningRate 0.0003 Epoch: 20 Global Step: 421340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:46,798-Speed 6307.05 samples/sec Loss 4.8121 LearningRate 0.0003 Epoch: 20 Global Step: 421350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:50,044-Speed 6310.39 samples/sec Loss 4.8180 LearningRate 0.0003 Epoch: 20 Global Step: 421360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:03:53,291-Speed 6309.10 samples/sec Loss 4.8749 LearningRate 0.0003 Epoch: 20 Global Step: 421370 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:03:56,537-Speed 6310.89 samples/sec Loss 4.8685 LearningRate 0.0003 Epoch: 20 Global Step: 421380 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:03:59,780-Speed 6315.86 samples/sec Loss 4.8727 LearningRate 0.0003 Epoch: 20 Global Step: 421390 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:04:03,010-Speed 6341.87 
samples/sec Loss 4.8203 LearningRate 0.0003 Epoch: 20 Global Step: 421400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:06,256-Speed 6311.24 samples/sec Loss 4.8588 LearningRate 0.0003 Epoch: 20 Global Step: 421410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:09,498-Speed 6319.08 samples/sec Loss 4.8721 LearningRate 0.0003 Epoch: 20 Global Step: 421420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:12,741-Speed 6316.21 samples/sec Loss 4.8212 LearningRate 0.0003 Epoch: 20 Global Step: 421430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:15,986-Speed 6312.95 samples/sec Loss 4.8534 LearningRate 0.0003 Epoch: 20 Global Step: 421440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:19,237-Speed 6301.25 samples/sec Loss 4.8448 LearningRate 0.0003 Epoch: 20 Global Step: 421450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:22,488-Speed 6300.53 samples/sec Loss 4.8629 LearningRate 0.0003 Epoch: 20 Global Step: 421460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:25,745-Speed 6289.83 samples/sec Loss 4.9480 LearningRate 0.0003 Epoch: 20 Global Step: 421470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:28,993-Speed 6307.39 samples/sec Loss 4.8243 LearningRate 0.0003 Epoch: 20 Global Step: 421480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:32,238-Speed 6313.58 samples/sec Loss 4.8625 LearningRate 0.0003 Epoch: 20 Global Step: 421490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:35,494-Speed 6291.34 samples/sec Loss 4.7773 LearningRate 0.0003 Epoch: 20 Global Step: 421500 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:04:38,740-Speed 6310.05 samples/sec Loss 4.8138 LearningRate 0.0003 Epoch: 20 Global Step: 421510 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:04:42,056-Speed 6176.32 samples/sec Loss 4.7951 
LearningRate 0.0003 Epoch: 20 Global Step: 421520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:45,302-Speed 6312.42 samples/sec Loss 4.7841 LearningRate 0.0003 Epoch: 20 Global Step: 421530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:48,547-Speed 6311.24 samples/sec Loss 4.8161 LearningRate 0.0003 Epoch: 20 Global Step: 421540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:51,792-Speed 6312.79 samples/sec Loss 4.7801 LearningRate 0.0003 Epoch: 20 Global Step: 421550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:55,037-Speed 6313.59 samples/sec Loss 4.8375 LearningRate 0.0003 Epoch: 20 Global Step: 421560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:04:58,283-Speed 6308.91 samples/sec Loss 4.8524 LearningRate 0.0003 Epoch: 20 Global Step: 421570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:01,528-Speed 6314.83 samples/sec Loss 4.8354 LearningRate 0.0003 Epoch: 20 Global Step: 421580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:04,776-Speed 6306.23 samples/sec Loss 4.8002 LearningRate 0.0003 Epoch: 20 Global Step: 421590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:08,024-Speed 6305.76 samples/sec Loss 4.8171 LearningRate 0.0003 Epoch: 20 Global Step: 421600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:11,273-Speed 6305.96 samples/sec Loss 4.8533 LearningRate 0.0003 Epoch: 20 Global Step: 421610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:14,515-Speed 6317.56 samples/sec Loss 4.7785 LearningRate 0.0003 Epoch: 20 Global Step: 421620 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:05:17,746-Speed 6341.03 samples/sec Loss 4.8320 LearningRate 0.0003 Epoch: 20 Global Step: 421630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:20,996-Speed 6302.75 samples/sec Loss 4.8346 LearningRate 0.0003 Epoch: 20 
Global Step: 421640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:24,242-Speed 6310.96 samples/sec Loss 4.8471 LearningRate 0.0003 Epoch: 20 Global Step: 421650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:27,489-Speed 6308.15 samples/sec Loss 4.8919 LearningRate 0.0003 Epoch: 20 Global Step: 421660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:30,733-Speed 6316.70 samples/sec Loss 4.7787 LearningRate 0.0003 Epoch: 20 Global Step: 421670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:33,980-Speed 6308.64 samples/sec Loss 4.8136 LearningRate 0.0003 Epoch: 20 Global Step: 421680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:37,225-Speed 6311.92 samples/sec Loss 4.8408 LearningRate 0.0003 Epoch: 20 Global Step: 421690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:40,469-Speed 6314.49 samples/sec Loss 4.8983 LearningRate 0.0003 Epoch: 20 Global Step: 421700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:43,713-Speed 6314.26 samples/sec Loss 4.9510 LearningRate 0.0003 Epoch: 20 Global Step: 421710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:46,969-Speed 6291.64 samples/sec Loss 4.8529 LearningRate 0.0003 Epoch: 20 Global Step: 421720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:50,200-Speed 6339.56 samples/sec Loss 4.8045 LearningRate 0.0003 Epoch: 20 Global Step: 421730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:53,442-Speed 6318.06 samples/sec Loss 4.8715 LearningRate 0.0003 Epoch: 20 Global Step: 421740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:56,689-Speed 6308.62 samples/sec Loss 4.8507 LearningRate 0.0003 Epoch: 20 Global Step: 421750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:05:59,941-Speed 6300.12 samples/sec Loss 4.8247 LearningRate 0.0003 Epoch: 20 Global Step: 421760 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:03,184-Speed 6316.04 samples/sec Loss 4.8206 LearningRate 0.0003 Epoch: 20 Global Step: 421770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:06,429-Speed 6313.06 samples/sec Loss 4.8450 LearningRate 0.0003 Epoch: 20 Global Step: 421780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:09,675-Speed 6309.86 samples/sec Loss 4.8620 LearningRate 0.0003 Epoch: 20 Global Step: 421790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:12,924-Speed 6306.31 samples/sec Loss 4.8326 LearningRate 0.0003 Epoch: 20 Global Step: 421800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:16,165-Speed 6319.86 samples/sec Loss 4.8157 LearningRate 0.0003 Epoch: 20 Global Step: 421810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:19,409-Speed 6313.54 samples/sec Loss 4.8770 LearningRate 0.0003 Epoch: 20 Global Step: 421820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:22,657-Speed 6307.94 samples/sec Loss 4.8609 LearningRate 0.0003 Epoch: 20 Global Step: 421830 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:06:25,886-Speed 6343.37 samples/sec Loss 4.7994 LearningRate 0.0003 Epoch: 20 Global Step: 421840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:29,140-Speed 6294.88 samples/sec Loss 4.8425 LearningRate 0.0003 Epoch: 20 Global Step: 421850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:32,388-Speed 6307.97 samples/sec Loss 4.7859 LearningRate 0.0003 Epoch: 20 Global Step: 421860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:35,633-Speed 6313.78 samples/sec Loss 4.8313 LearningRate 0.0003 Epoch: 20 Global Step: 421870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:38,877-Speed 6313.42 samples/sec Loss 4.8630 LearningRate 0.0003 Epoch: 20 Global Step: 421880 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:06:42,122-Speed 6312.27 samples/sec Loss 4.8805 LearningRate 0.0003 Epoch: 20 Global Step: 421890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:45,368-Speed 6312.95 samples/sec Loss 4.8801 LearningRate 0.0003 Epoch: 20 Global Step: 421900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:48,616-Speed 6306.20 samples/sec Loss 4.9106 LearningRate 0.0003 Epoch: 20 Global Step: 421910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:51,863-Speed 6307.70 samples/sec Loss 4.8833 LearningRate 0.0003 Epoch: 20 Global Step: 421920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:55,108-Speed 6313.17 samples/sec Loss 4.8478 LearningRate 0.0003 Epoch: 20 Global Step: 421930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:06:58,358-Speed 6302.31 samples/sec Loss 4.8284 LearningRate 0.0003 Epoch: 20 Global Step: 421940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:07:01,592-Speed 6335.43 samples/sec Loss 4.8930 LearningRate 0.0003 Epoch: 20 Global Step: 421950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:04,836-Speed 6313.65 samples/sec Loss 4.8158 LearningRate 0.0003 Epoch: 20 Global Step: 421960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:08,082-Speed 6311.33 samples/sec Loss 4.8754 LearningRate 0.0003 Epoch: 20 Global Step: 421970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:11,326-Speed 6313.77 samples/sec Loss 4.8114 LearningRate 0.0003 Epoch: 20 Global Step: 421980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:14,573-Speed 6308.45 samples/sec Loss 4.7915 LearningRate 0.0003 Epoch: 20 Global Step: 421990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:17,824-Speed 6301.40 samples/sec Loss 4.8604 LearningRate 0.0003 Epoch: 20 Global Step: 422000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:07:21,071-Speed 6308.50 samples/sec Loss 4.8519 LearningRate 0.0003 Epoch: 20 Global Step: 422010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:24,331-Speed 6284.38 samples/sec Loss 4.8586 LearningRate 0.0003 Epoch: 20 Global Step: 422020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:27,591-Speed 6283.50 samples/sec Loss 4.8583 LearningRate 0.0003 Epoch: 20 Global Step: 422030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:30,846-Speed 6294.15 samples/sec Loss 4.8229 LearningRate 0.0003 Epoch: 20 Global Step: 422040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:34,089-Speed 6316.26 samples/sec Loss 4.8728 LearningRate 0.0003 Epoch: 20 Global Step: 422050 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:07:37,327-Speed 6327.41 samples/sec Loss 4.8755 LearningRate 0.0003 Epoch: 20 Global Step: 422060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:40,570-Speed 6315.17 samples/sec Loss 4.9071 LearningRate 0.0003 Epoch: 20 Global Step: 422070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:43,818-Speed 6307.67 samples/sec Loss 4.8372 LearningRate 0.0003 Epoch: 20 Global Step: 422080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:47,065-Speed 6308.42 samples/sec Loss 4.8352 LearningRate 0.0003 Epoch: 20 Global Step: 422090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:50,307-Speed 6318.71 samples/sec Loss 4.8207 LearningRate 0.0003 Epoch: 20 Global Step: 422100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:53,554-Speed 6309.49 samples/sec Loss 4.8266 LearningRate 0.0003 Epoch: 20 Global Step: 422110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:07:56,800-Speed 6309.73 samples/sec Loss 4.7854 LearningRate 0.0003 Epoch: 20 Global Step: 422120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:00,049-Speed 6305.64 
samples/sec Loss 4.8319 LearningRate 0.0003 Epoch: 20 Global Step: 422130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:03,297-Speed 6306.66 samples/sec Loss 4.8378 LearningRate 0.0003 Epoch: 20 Global Step: 422140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:06,547-Speed 6302.64 samples/sec Loss 4.8033 LearningRate 0.0003 Epoch: 20 Global Step: 422150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:09,796-Speed 6304.12 samples/sec Loss 4.7638 LearningRate 0.0003 Epoch: 20 Global Step: 422160 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:08:13,040-Speed 6315.85 samples/sec Loss 4.8902 LearningRate 0.0003 Epoch: 20 Global Step: 422170 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:08:16,272-Speed 6336.56 samples/sec Loss 4.8440 LearningRate 0.0003 Epoch: 20 Global Step: 422180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:19,515-Speed 6318.36 samples/sec Loss 4.8316 LearningRate 0.0003 Epoch: 20 Global Step: 422190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:22,764-Speed 6304.78 samples/sec Loss 4.8496 LearningRate 0.0003 Epoch: 20 Global Step: 422200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:26,021-Speed 6289.45 samples/sec Loss 4.7951 LearningRate 0.0003 Epoch: 20 Global Step: 422210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:29,267-Speed 6310.03 samples/sec Loss 4.8768 LearningRate 0.0003 Epoch: 20 Global Step: 422220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:32,511-Speed 6314.51 samples/sec Loss 4.8226 LearningRate 0.0003 Epoch: 20 Global Step: 422230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:35,756-Speed 6313.76 samples/sec Loss 4.8073 LearningRate 0.0003 Epoch: 20 Global Step: 422240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:39,004-Speed 6305.36 samples/sec Loss 4.8500 
LearningRate 0.0003 Epoch: 20 Global Step: 422250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:42,247-Speed 6317.37 samples/sec Loss 4.7847 LearningRate 0.0003 Epoch: 20 Global Step: 422260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:45,490-Speed 6315.97 samples/sec Loss 4.9420 LearningRate 0.0003 Epoch: 20 Global Step: 422270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:48,723-Speed 6338.23 samples/sec Loss 4.7623 LearningRate 0.0003 Epoch: 20 Global Step: 422280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:51,965-Speed 6317.42 samples/sec Loss 4.8382 LearningRate 0.0003 Epoch: 20 Global Step: 422290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:55,212-Speed 6308.80 samples/sec Loss 4.7637 LearningRate 0.0003 Epoch: 20 Global Step: 422300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:08:58,460-Speed 6307.81 samples/sec Loss 4.7835 LearningRate 0.0003 Epoch: 20 Global Step: 422310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:01,706-Speed 6310.00 samples/sec Loss 4.8686 LearningRate 0.0003 Epoch: 20 Global Step: 422320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:04,949-Speed 6316.23 samples/sec Loss 4.8646 LearningRate 0.0003 Epoch: 20 Global Step: 422330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:08,195-Speed 6311.90 samples/sec Loss 4.8322 LearningRate 0.0003 Epoch: 20 Global Step: 422340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:11,444-Speed 6303.40 samples/sec Loss 4.7919 LearningRate 0.0003 Epoch: 20 Global Step: 422350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:14,685-Speed 6321.92 samples/sec Loss 4.7850 LearningRate 0.0003 Epoch: 20 Global Step: 422360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:17,927-Speed 6316.76 samples/sec Loss 4.8736 LearningRate 0.0003 Epoch: 20 
Global Step: 422370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:21,160-Speed 6338.03 samples/sec Loss 4.8297 LearningRate 0.0003 Epoch: 20 Global Step: 422380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:24,418-Speed 6287.30 samples/sec Loss 4.8177 LearningRate 0.0003 Epoch: 20 Global Step: 422390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:27,664-Speed 6310.24 samples/sec Loss 4.7907 LearningRate 0.0003 Epoch: 20 Global Step: 422400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:30,910-Speed 6309.78 samples/sec Loss 4.8807 LearningRate 0.0003 Epoch: 20 Global Step: 422410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:34,152-Speed 6317.80 samples/sec Loss 4.8632 LearningRate 0.0003 Epoch: 20 Global Step: 422420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:37,401-Speed 6305.07 samples/sec Loss 4.7962 LearningRate 0.0003 Epoch: 20 Global Step: 422430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:40,654-Speed 6298.38 samples/sec Loss 4.8715 LearningRate 0.0003 Epoch: 20 Global Step: 422440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:43,932-Speed 6248.52 samples/sec Loss 4.8572 LearningRate 0.0003 Epoch: 20 Global Step: 422450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:47,175-Speed 6317.32 samples/sec Loss 4.7878 LearningRate 0.0003 Epoch: 20 Global Step: 422460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:50,422-Speed 6307.53 samples/sec Loss 4.8304 LearningRate 0.0003 Epoch: 20 Global Step: 422470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:09:53,670-Speed 6308.16 samples/sec Loss 4.7617 LearningRate 0.0003 Epoch: 20 Global Step: 422480 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:09:56,917-Speed 6308.75 samples/sec Loss 4.8851 LearningRate 0.0003 Epoch: 20 Global Step: 422490 Fp16 Grad 
Scale: 32768 Required: 37 hours Training: 2022-04-02 06:10:00,149-Speed 6338.44 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 20 Global Step: 422500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:03,394-Speed 6312.12 samples/sec Loss 4.8111 LearningRate 0.0003 Epoch: 20 Global Step: 422510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:06,638-Speed 6314.99 samples/sec Loss 4.8410 LearningRate 0.0003 Epoch: 20 Global Step: 422520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:09,881-Speed 6316.04 samples/sec Loss 4.8166 LearningRate 0.0003 Epoch: 20 Global Step: 422530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:13,128-Speed 6310.02 samples/sec Loss 4.7995 LearningRate 0.0003 Epoch: 20 Global Step: 422540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:16,370-Speed 6317.72 samples/sec Loss 4.8516 LearningRate 0.0003 Epoch: 20 Global Step: 422550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:19,621-Speed 6301.76 samples/sec Loss 4.8050 LearningRate 0.0003 Epoch: 20 Global Step: 422560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:22,866-Speed 6312.38 samples/sec Loss 4.8185 LearningRate 0.0003 Epoch: 20 Global Step: 422570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:26,124-Speed 6287.09 samples/sec Loss 4.8729 LearningRate 0.0003 Epoch: 20 Global Step: 422580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:29,372-Speed 6306.96 samples/sec Loss 4.8059 LearningRate 0.0003 Epoch: 20 Global Step: 422590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:32,604-Speed 6337.86 samples/sec Loss 4.7831 LearningRate 0.0003 Epoch: 20 Global Step: 422600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:35,851-Speed 6308.85 samples/sec Loss 4.8715 LearningRate 0.0003 Epoch: 20 Global Step: 422610 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:10:39,102-Speed 6299.88 samples/sec Loss 4.7625 LearningRate 0.0003 Epoch: 20 Global Step: 422620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:42,351-Speed 6305.78 samples/sec Loss 4.8161 LearningRate 0.0003 Epoch: 20 Global Step: 422630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:45,609-Speed 6287.69 samples/sec Loss 4.8190 LearningRate 0.0003 Epoch: 20 Global Step: 422640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:48,893-Speed 6236.96 samples/sec Loss 4.8862 LearningRate 0.0003 Epoch: 20 Global Step: 422650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:52,143-Speed 6304.13 samples/sec Loss 4.8212 LearningRate 0.0003 Epoch: 20 Global Step: 422660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:55,388-Speed 6311.03 samples/sec Loss 4.8900 LearningRate 0.0003 Epoch: 20 Global Step: 422670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:10:58,633-Speed 6313.06 samples/sec Loss 4.8546 LearningRate 0.0003 Epoch: 20 Global Step: 422680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:01,880-Speed 6310.22 samples/sec Loss 4.8800 LearningRate 0.0003 Epoch: 20 Global Step: 422690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:05,123-Speed 6316.22 samples/sec Loss 4.8641 LearningRate 0.0003 Epoch: 20 Global Step: 422700 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:11:08,356-Speed 6337.11 samples/sec Loss 4.8569 LearningRate 0.0003 Epoch: 20 Global Step: 422710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:11,599-Speed 6316.36 samples/sec Loss 4.8533 LearningRate 0.0003 Epoch: 20 Global Step: 422720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:14,847-Speed 6306.96 samples/sec Loss 4.7588 LearningRate 0.0003 Epoch: 20 Global Step: 422730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:11:18,115-Speed 6267.98 samples/sec Loss 4.7963 LearningRate 0.0003 Epoch: 20 Global Step: 422740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:21,362-Speed 6309.23 samples/sec Loss 4.7994 LearningRate 0.0003 Epoch: 20 Global Step: 422750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:24,610-Speed 6305.39 samples/sec Loss 4.8407 LearningRate 0.0003 Epoch: 20 Global Step: 422760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:27,855-Speed 6313.62 samples/sec Loss 4.7895 LearningRate 0.0003 Epoch: 20 Global Step: 422770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:31,100-Speed 6311.57 samples/sec Loss 4.8590 LearningRate 0.0003 Epoch: 20 Global Step: 422780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:34,346-Speed 6310.61 samples/sec Loss 4.8218 LearningRate 0.0003 Epoch: 20 Global Step: 422790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:37,592-Speed 6311.90 samples/sec Loss 4.8509 LearningRate 0.0003 Epoch: 20 Global Step: 422800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:40,836-Speed 6313.56 samples/sec Loss 4.8386 LearningRate 0.0003 Epoch: 20 Global Step: 422810 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:11:44,079-Speed 6317.74 samples/sec Loss 4.8841 LearningRate 0.0003 Epoch: 20 Global Step: 422820 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:11:47,360-Speed 6243.07 samples/sec Loss 4.8534 LearningRate 0.0003 Epoch: 20 Global Step: 422830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:50,615-Speed 6292.27 samples/sec Loss 4.8650 LearningRate 0.0003 Epoch: 20 Global Step: 422840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:53,862-Speed 6309.86 samples/sec Loss 4.8552 LearningRate 0.0003 Epoch: 20 Global Step: 422850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:11:57,104-Speed 6317.19 
samples/sec Loss 4.8425 LearningRate 0.0003 Epoch: 20 Global Step: 422860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:00,353-Speed 6304.61 samples/sec Loss 4.8585 LearningRate 0.0003 Epoch: 20 Global Step: 422870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:03,599-Speed 6310.67 samples/sec Loss 4.8346 LearningRate 0.0003 Epoch: 20 Global Step: 422880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:06,846-Speed 6309.71 samples/sec Loss 4.8713 LearningRate 0.0003 Epoch: 20 Global Step: 422890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:10,096-Speed 6304.55 samples/sec Loss 4.8987 LearningRate 0.0003 Epoch: 20 Global Step: 422900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:13,360-Speed 6275.39 samples/sec Loss 4.8897 LearningRate 0.0003 Epoch: 20 Global Step: 422910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:16,610-Speed 6303.42 samples/sec Loss 4.9465 LearningRate 0.0003 Epoch: 20 Global Step: 422920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:19,854-Speed 6313.22 samples/sec Loss 4.9545 LearningRate 0.0003 Epoch: 20 Global Step: 422930 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:12:23,100-Speed 6311.59 samples/sec Loss 4.8902 LearningRate 0.0003 Epoch: 20 Global Step: 422940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:12:26,342-Speed 6318.58 samples/sec Loss 4.8530 LearningRate 0.0003 Epoch: 20 Global Step: 422950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:29,591-Speed 6305.47 samples/sec Loss 4.8165 LearningRate 0.0003 Epoch: 20 Global Step: 422960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:32,833-Speed 6317.99 samples/sec Loss 4.8197 LearningRate 0.0003 Epoch: 20 Global Step: 422970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:36,079-Speed 6310.71 samples/sec Loss 4.8422 
LearningRate 0.0003 Epoch: 20 Global Step: 422980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:39,325-Speed 6310.15 samples/sec Loss 4.8308 LearningRate 0.0003 Epoch: 20 Global Step: 422990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:42,568-Speed 6316.84 samples/sec Loss 4.8573 LearningRate 0.0003 Epoch: 20 Global Step: 423000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:45,816-Speed 6307.43 samples/sec Loss 4.8794 LearningRate 0.0003 Epoch: 20 Global Step: 423010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:49,060-Speed 6313.07 samples/sec Loss 4.8227 LearningRate 0.0003 Epoch: 20 Global Step: 423020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:52,306-Speed 6310.54 samples/sec Loss 4.8873 LearningRate 0.0003 Epoch: 20 Global Step: 423030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:55,548-Speed 6319.09 samples/sec Loss 4.8635 LearningRate 0.0003 Epoch: 20 Global Step: 423040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:12:58,799-Speed 6301.77 samples/sec Loss 4.8266 LearningRate 0.0003 Epoch: 20 Global Step: 423050 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:13:02,044-Speed 6311.03 samples/sec Loss 4.8106 LearningRate 0.0003 Epoch: 20 Global Step: 423060 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:13:05,275-Speed 6340.05 samples/sec Loss 4.8099 LearningRate 0.0003 Epoch: 20 Global Step: 423070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:08,522-Speed 6309.59 samples/sec Loss 4.8721 LearningRate 0.0003 Epoch: 20 Global Step: 423080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:11,772-Speed 6303.56 samples/sec Loss 4.7969 LearningRate 0.0003 Epoch: 20 Global Step: 423090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:15,016-Speed 6314.37 samples/sec Loss 4.8320 LearningRate 0.0003 Epoch: 20 
Global Step: 423100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:18,262-Speed 6311.69 samples/sec Loss 4.8495 LearningRate 0.0003 Epoch: 20 Global Step: 423110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:21,510-Speed 6306.71 samples/sec Loss 4.8644 LearningRate 0.0003 Epoch: 20 Global Step: 423120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:24,757-Speed 6307.98 samples/sec Loss 4.8995 LearningRate 0.0003 Epoch: 20 Global Step: 423130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:28,011-Speed 6294.84 samples/sec Loss 4.7681 LearningRate 0.0003 Epoch: 20 Global Step: 423140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:31,258-Speed 6309.27 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 423150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:34,504-Speed 6310.73 samples/sec Loss 4.7588 LearningRate 0.0003 Epoch: 20 Global Step: 423160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:37,755-Speed 6301.97 samples/sec Loss 4.7909 LearningRate 0.0003 Epoch: 20 Global Step: 423170 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:13:40,988-Speed 6335.12 samples/sec Loss 4.8137 LearningRate 0.0003 Epoch: 20 Global Step: 423180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:44,234-Speed 6311.79 samples/sec Loss 4.8509 LearningRate 0.0003 Epoch: 20 Global Step: 423190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:47,476-Speed 6317.61 samples/sec Loss 4.8294 LearningRate 0.0003 Epoch: 20 Global Step: 423200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:50,722-Speed 6310.54 samples/sec Loss 4.8908 LearningRate 0.0003 Epoch: 20 Global Step: 423210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:53,969-Speed 6308.22 samples/sec Loss 4.8037 LearningRate 0.0003 Epoch: 20 Global Step: 423220 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:13:57,217-Speed 6307.89 samples/sec Loss 4.8338 LearningRate 0.0003 Epoch: 20 Global Step: 423230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:00,473-Speed 6291.16 samples/sec Loss 4.8102 LearningRate 0.0003 Epoch: 20 Global Step: 423240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:03,718-Speed 6312.08 samples/sec Loss 4.8120 LearningRate 0.0003 Epoch: 20 Global Step: 423250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:06,954-Speed 6331.53 samples/sec Loss 4.8397 LearningRate 0.0003 Epoch: 20 Global Step: 423260 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:10,195-Speed 6319.44 samples/sec Loss 4.7670 LearningRate 0.0003 Epoch: 20 Global Step: 423270 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:13,442-Speed 6309.20 samples/sec Loss 4.8292 LearningRate 0.0003 Epoch: 20 Global Step: 423280 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:16,688-Speed 6310.05 samples/sec Loss 4.8542 LearningRate 0.0003 Epoch: 20 Global Step: 423290 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:19,930-Speed 6318.08 samples/sec Loss 4.8177 LearningRate 0.0003 Epoch: 20 Global Step: 423300 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:23,180-Speed 6305.05 samples/sec Loss 4.7589 LearningRate 0.0003 Epoch: 20 Global Step: 423310 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:26,439-Speed 6285.22 samples/sec Loss 4.7999 LearningRate 0.0003 Epoch: 20 Global Step: 423320 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:29,682-Speed 6316.80 samples/sec Loss 4.8443 LearningRate 0.0003 Epoch: 20 Global Step: 423330 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:32,928-Speed 6310.44 samples/sec Loss 4.8427 LearningRate 0.0003 Epoch: 20 Global Step: 423340 Fp16 Grad Scale: 8192 Required: 37 hours 
Training: 2022-04-02 06:14:36,175-Speed 6309.77 samples/sec Loss 4.8032 LearningRate 0.0003 Epoch: 20 Global Step: 423350 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:14:39,421-Speed 6310.70 samples/sec Loss 4.8534 LearningRate 0.0003 Epoch: 20 Global Step: 423360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:42,664-Speed 6315.66 samples/sec Loss 4.7121 LearningRate 0.0003 Epoch: 20 Global Step: 423370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:45,909-Speed 6313.27 samples/sec Loss 4.7933 LearningRate 0.0003 Epoch: 20 Global Step: 423380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:49,154-Speed 6312.83 samples/sec Loss 4.9031 LearningRate 0.0003 Epoch: 20 Global Step: 423390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:52,398-Speed 6314.12 samples/sec Loss 4.7563 LearningRate 0.0003 Epoch: 20 Global Step: 423400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:55,646-Speed 6307.52 samples/sec Loss 4.9244 LearningRate 0.0003 Epoch: 20 Global Step: 423410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:14:58,890-Speed 6314.13 samples/sec Loss 4.8004 LearningRate 0.0003 Epoch: 20 Global Step: 423420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:02,135-Speed 6312.72 samples/sec Loss 4.7802 LearningRate 0.0003 Epoch: 20 Global Step: 423430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:05,386-Speed 6301.55 samples/sec Loss 4.8478 LearningRate 0.0003 Epoch: 20 Global Step: 423440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:08,634-Speed 6306.00 samples/sec Loss 4.8314 LearningRate 0.0003 Epoch: 20 Global Step: 423450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:11,880-Speed 6312.41 samples/sec Loss 4.8370 LearningRate 0.0003 Epoch: 20 Global Step: 423460 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 
06:15:15,124-Speed 6314.06 samples/sec Loss 4.8504 LearningRate 0.0003 Epoch: 20 Global Step: 423470 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:15:18,354-Speed 6340.73 samples/sec Loss 4.8276 LearningRate 0.0003 Epoch: 20 Global Step: 423480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:21,597-Speed 6315.93 samples/sec Loss 4.7995 LearningRate 0.0003 Epoch: 20 Global Step: 423490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:24,851-Speed 6296.05 samples/sec Loss 4.8100 LearningRate 0.0003 Epoch: 20 Global Step: 423500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:28,098-Speed 6309.02 samples/sec Loss 4.8346 LearningRate 0.0003 Epoch: 20 Global Step: 423510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:31,341-Speed 6316.49 samples/sec Loss 4.8338 LearningRate 0.0003 Epoch: 20 Global Step: 423520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:34,588-Speed 6309.09 samples/sec Loss 4.8039 LearningRate 0.0003 Epoch: 20 Global Step: 423530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:37,835-Speed 6309.48 samples/sec Loss 4.8377 LearningRate 0.0003 Epoch: 20 Global Step: 423540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:41,079-Speed 6314.90 samples/sec Loss 4.8679 LearningRate 0.0003 Epoch: 20 Global Step: 423550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:44,323-Speed 6314.37 samples/sec Loss 4.7634 LearningRate 0.0003 Epoch: 20 Global Step: 423560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:47,566-Speed 6317.11 samples/sec Loss 4.7965 LearningRate 0.0003 Epoch: 20 Global Step: 423570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:15:50,810-Speed 6313.68 samples/sec Loss 4.8350 LearningRate 0.0003 Epoch: 20 Global Step: 423580 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:15:54,054-Speed 6314.15 
samples/sec Loss 4.8801 LearningRate 0.0003 Epoch: 20 Global Step: 423590 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:15:57,285-Speed 6340.50 samples/sec Loss 4.8048 LearningRate 0.0003 Epoch: 20 Global Step: 423600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:00,530-Speed 6311.93 samples/sec Loss 4.8241 LearningRate 0.0003 Epoch: 20 Global Step: 423610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:03,776-Speed 6311.94 samples/sec Loss 4.8121 LearningRate 0.0003 Epoch: 20 Global Step: 423620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:07,035-Speed 6286.09 samples/sec Loss 4.8290 LearningRate 0.0003 Epoch: 20 Global Step: 423630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:10,395-Speed 6095.43 samples/sec Loss 4.8571 LearningRate 0.0003 Epoch: 20 Global Step: 423640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:13,669-Speed 6257.95 samples/sec Loss 4.8413 LearningRate 0.0003 Epoch: 20 Global Step: 423650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:16,910-Speed 6319.07 samples/sec Loss 4.7745 LearningRate 0.0003 Epoch: 20 Global Step: 423660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:20,160-Speed 6303.24 samples/sec Loss 4.8724 LearningRate 0.0003 Epoch: 20 Global Step: 423670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:23,435-Speed 6254.33 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 20 Global Step: 423680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:26,704-Speed 6266.68 samples/sec Loss 4.8792 LearningRate 0.0003 Epoch: 20 Global Step: 423690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:29,975-Speed 6262.87 samples/sec Loss 4.8662 LearningRate 0.0003 Epoch: 20 Global Step: 423700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:33,220-Speed 6312.90 samples/sec Loss 4.8560 
LearningRate 0.0003 Epoch: 20 Global Step: 423710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:36,468-Speed 6306.33 samples/sec Loss 4.8284 LearningRate 0.0003 Epoch: 20 Global Step: 423720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:39,781-Speed 6182.04 samples/sec Loss 4.7427 LearningRate 0.0003 Epoch: 20 Global Step: 423730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:43,028-Speed 6311.30 samples/sec Loss 4.7959 LearningRate 0.0003 Epoch: 20 Global Step: 423740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:46,280-Speed 6298.11 samples/sec Loss 4.7806 LearningRate 0.0003 Epoch: 20 Global Step: 423750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:49,525-Speed 6312.98 samples/sec Loss 4.8910 LearningRate 0.0003 Epoch: 20 Global Step: 423760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:52,770-Speed 6312.68 samples/sec Loss 4.7769 LearningRate 0.0003 Epoch: 20 Global Step: 423770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:56,011-Speed 6320.50 samples/sec Loss 4.8669 LearningRate 0.0003 Epoch: 20 Global Step: 423780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:16:59,257-Speed 6310.96 samples/sec Loss 4.8304 LearningRate 0.0003 Epoch: 20 Global Step: 423790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:02,501-Speed 6313.98 samples/sec Loss 4.9059 LearningRate 0.0003 Epoch: 20 Global Step: 423800 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:17:05,734-Speed 6336.64 samples/sec Loss 4.8520 LearningRate 0.0003 Epoch: 20 Global Step: 423810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:08,976-Speed 6318.52 samples/sec Loss 4.7718 LearningRate 0.0003 Epoch: 20 Global Step: 423820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:12,221-Speed 6313.80 samples/sec Loss 4.8045 LearningRate 0.0003 Epoch: 20 
Global Step: 423830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:15,463-Speed 6318.50 samples/sec Loss 4.7729 LearningRate 0.0003 Epoch: 20 Global Step: 423840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:18,708-Speed 6310.57 samples/sec Loss 4.8782 LearningRate 0.0003 Epoch: 20 Global Step: 423850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:21,952-Speed 6315.61 samples/sec Loss 4.7766 LearningRate 0.0003 Epoch: 20 Global Step: 423860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:25,223-Speed 6262.43 samples/sec Loss 4.8396 LearningRate 0.0003 Epoch: 20 Global Step: 423870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:28,471-Speed 6307.18 samples/sec Loss 4.8310 LearningRate 0.0003 Epoch: 20 Global Step: 423880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:31,717-Speed 6310.71 samples/sec Loss 4.8761 LearningRate 0.0003 Epoch: 20 Global Step: 423890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:34,964-Speed 6309.01 samples/sec Loss 4.7366 LearningRate 0.0003 Epoch: 20 Global Step: 423900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:38,194-Speed 6340.35 samples/sec Loss 4.8592 LearningRate 0.0003 Epoch: 20 Global Step: 423910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:41,443-Speed 6305.44 samples/sec Loss 4.7991 LearningRate 0.0003 Epoch: 20 Global Step: 423920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:44,690-Speed 6310.26 samples/sec Loss 4.7494 LearningRate 0.0003 Epoch: 20 Global Step: 423930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:47,968-Speed 6247.84 samples/sec Loss 4.8332 LearningRate 0.0003 Epoch: 20 Global Step: 423940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:51,330-Speed 6093.42 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 423950 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:54,664-Speed 6144.26 samples/sec Loss 4.8761 LearningRate 0.0003 Epoch: 20 Global Step: 423960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:17:57,910-Speed 6311.32 samples/sec Loss 4.8009 LearningRate 0.0003 Epoch: 20 Global Step: 423970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:01,154-Speed 6315.30 samples/sec Loss 4.8585 LearningRate 0.0003 Epoch: 20 Global Step: 423980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:04,397-Speed 6315.56 samples/sec Loss 4.8398 LearningRate 0.0003 Epoch: 20 Global Step: 423990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:07,642-Speed 6313.96 samples/sec Loss 4.8239 LearningRate 0.0003 Epoch: 20 Global Step: 424000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:10,886-Speed 6314.78 samples/sec Loss 4.8643 LearningRate 0.0003 Epoch: 20 Global Step: 424010 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:18:14,132-Speed 6311.21 samples/sec Loss 4.7859 LearningRate 0.0003 Epoch: 20 Global Step: 424020 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:18:17,379-Speed 6308.39 samples/sec Loss 4.7997 LearningRate 0.0003 Epoch: 20 Global Step: 424030 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:18:20,624-Speed 6311.17 samples/sec Loss 4.8286 LearningRate 0.0003 Epoch: 20 Global Step: 424040 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:18:23,851-Speed 6347.33 samples/sec Loss 4.8439 LearningRate 0.0003 Epoch: 20 Global Step: 424050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:27,098-Speed 6310.27 samples/sec Loss 4.7739 LearningRate 0.0003 Epoch: 20 Global Step: 424060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:30,343-Speed 6311.44 samples/sec Loss 4.8830 LearningRate 0.0003 Epoch: 20 Global Step: 424070 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:18:33,589-Speed 6312.24 samples/sec Loss 4.8391 LearningRate 0.0003 Epoch: 20 Global Step: 424080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:36,834-Speed 6311.11 samples/sec Loss 4.8652 LearningRate 0.0003 Epoch: 20 Global Step: 424090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:40,083-Speed 6304.95 samples/sec Loss 4.8207 LearningRate 0.0003 Epoch: 20 Global Step: 424100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:43,329-Speed 6312.54 samples/sec Loss 4.7820 LearningRate 0.0003 Epoch: 20 Global Step: 424110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:46,578-Speed 6304.01 samples/sec Loss 4.8217 LearningRate 0.0003 Epoch: 20 Global Step: 424120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:49,825-Speed 6308.53 samples/sec Loss 4.8744 LearningRate 0.0003 Epoch: 20 Global Step: 424130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:53,074-Speed 6305.31 samples/sec Loss 4.7995 LearningRate 0.0003 Epoch: 20 Global Step: 424140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:18:56,322-Speed 6306.12 samples/sec Loss 4.8327 LearningRate 0.0003 Epoch: 20 Global Step: 424150 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:18:59,554-Speed 6338.59 samples/sec Loss 4.8095 LearningRate 0.0003 Epoch: 20 Global Step: 424160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:02,799-Speed 6313.46 samples/sec Loss 4.8438 LearningRate 0.0003 Epoch: 20 Global Step: 424170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:06,045-Speed 6311.79 samples/sec Loss 4.8179 LearningRate 0.0003 Epoch: 20 Global Step: 424180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:09,311-Speed 6270.37 samples/sec Loss 4.8856 LearningRate 0.0003 Epoch: 20 Global Step: 424190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:19:12,557-Speed 6315.26 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 20 Global Step: 424200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:15,803-Speed 6310.02 samples/sec Loss 4.7547 LearningRate 0.0003 Epoch: 20 Global Step: 424210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:19,049-Speed 6309.92 samples/sec Loss 4.8058 LearningRate 0.0003 Epoch: 20 Global Step: 424220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:22,315-Speed 6271.73 samples/sec Loss 4.8720 LearningRate 0.0003 Epoch: 20 Global Step: 424230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:25,562-Speed 6309.05 samples/sec Loss 4.7535 LearningRate 0.0003 Epoch: 20 Global Step: 424240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:28,808-Speed 6310.56 samples/sec Loss 4.8293 LearningRate 0.0003 Epoch: 20 Global Step: 424250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:32,059-Speed 6301.57 samples/sec Loss 4.8519 LearningRate 0.0003 Epoch: 20 Global Step: 424260 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:19:35,299-Speed 6323.29 samples/sec Loss 4.7858 LearningRate 0.0003 Epoch: 20 Global Step: 424270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:38,544-Speed 6312.85 samples/sec Loss 4.8838 LearningRate 0.0003 Epoch: 20 Global Step: 424280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:41,787-Speed 6316.70 samples/sec Loss 4.7764 LearningRate 0.0003 Epoch: 20 Global Step: 424290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:45,032-Speed 6311.83 samples/sec Loss 4.8083 LearningRate 0.0003 Epoch: 20 Global Step: 424300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:48,288-Speed 6291.71 samples/sec Loss 4.7159 LearningRate 0.0003 Epoch: 20 Global Step: 424310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:51,533-Speed 6312.88 
samples/sec Loss 4.7962 LearningRate 0.0003 Epoch: 20 Global Step: 424320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:54,780-Speed 6307.98 samples/sec Loss 4.7845 LearningRate 0.0003 Epoch: 20 Global Step: 424330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:19:58,025-Speed 6312.68 samples/sec Loss 4.7952 LearningRate 0.0003 Epoch: 20 Global Step: 424340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:01,271-Speed 6311.06 samples/sec Loss 4.8244 LearningRate 0.0003 Epoch: 20 Global Step: 424350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:04,516-Speed 6312.13 samples/sec Loss 4.8296 LearningRate 0.0003 Epoch: 20 Global Step: 424360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:07,747-Speed 6341.12 samples/sec Loss 4.9005 LearningRate 0.0003 Epoch: 20 Global Step: 424370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:10,992-Speed 6312.98 samples/sec Loss 4.8579 LearningRate 0.0003 Epoch: 20 Global Step: 424380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:14,234-Speed 6318.78 samples/sec Loss 4.7960 LearningRate 0.0003 Epoch: 20 Global Step: 424390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:17,478-Speed 6314.53 samples/sec Loss 4.8124 LearningRate 0.0003 Epoch: 20 Global Step: 424400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:20,721-Speed 6315.82 samples/sec Loss 4.8423 LearningRate 0.0003 Epoch: 20 Global Step: 424410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:23,970-Speed 6305.78 samples/sec Loss 4.8430 LearningRate 0.0003 Epoch: 20 Global Step: 424420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:27,219-Speed 6305.20 samples/sec Loss 4.8656 LearningRate 0.0003 Epoch: 20 Global Step: 424430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:30,460-Speed 6320.02 samples/sec Loss 4.7472 
LearningRate 0.0003 Epoch: 20 Global Step: 424440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:33,718-Speed 6287.36 samples/sec Loss 4.8218 LearningRate 0.0003 Epoch: 20 Global Step: 424450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:36,964-Speed 6311.41 samples/sec Loss 4.8406 LearningRate 0.0003 Epoch: 20 Global Step: 424460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:40,205-Speed 6319.09 samples/sec Loss 4.8309 LearningRate 0.0003 Epoch: 20 Global Step: 424470 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:20:43,440-Speed 6332.83 samples/sec Loss 4.7492 LearningRate 0.0003 Epoch: 20 Global Step: 424480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:46,686-Speed 6310.14 samples/sec Loss 4.7564 LearningRate 0.0003 Epoch: 20 Global Step: 424490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:49,935-Speed 6305.14 samples/sec Loss 4.8421 LearningRate 0.0003 Epoch: 20 Global Step: 424500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:53,179-Speed 6314.39 samples/sec Loss 4.8350 LearningRate 0.0003 Epoch: 20 Global Step: 424510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:56,425-Speed 6311.61 samples/sec Loss 4.7954 LearningRate 0.0003 Epoch: 20 Global Step: 424520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:20:59,671-Speed 6309.55 samples/sec Loss 4.7791 LearningRate 0.0003 Epoch: 20 Global Step: 424530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:02,916-Speed 6313.17 samples/sec Loss 4.7706 LearningRate 0.0003 Epoch: 20 Global Step: 424540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:06,162-Speed 6311.38 samples/sec Loss 4.9019 LearningRate 0.0003 Epoch: 20 Global Step: 424550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:09,402-Speed 6321.64 samples/sec Loss 4.7977 LearningRate 0.0003 Epoch: 20 
Global Step: 424560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:12,669-Speed 6269.98 samples/sec Loss 4.8033 LearningRate 0.0003 Epoch: 20 Global Step: 424570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:15,936-Speed 6270.77 samples/sec Loss 4.7625 LearningRate 0.0003 Epoch: 20 Global Step: 424580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:19,183-Speed 6309.38 samples/sec Loss 4.8242 LearningRate 0.0003 Epoch: 20 Global Step: 424590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:22,428-Speed 6313.01 samples/sec Loss 4.9069 LearningRate 0.0003 Epoch: 20 Global Step: 424600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:25,672-Speed 6314.82 samples/sec Loss 4.8341 LearningRate 0.0003 Epoch: 20 Global Step: 424610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:28,921-Speed 6305.69 samples/sec Loss 4.7985 LearningRate 0.0003 Epoch: 20 Global Step: 424620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:32,164-Speed 6316.17 samples/sec Loss 4.8264 LearningRate 0.0003 Epoch: 20 Global Step: 424630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:35,409-Speed 6312.09 samples/sec Loss 4.8975 LearningRate 0.0003 Epoch: 20 Global Step: 424640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:38,657-Speed 6306.83 samples/sec Loss 4.8310 LearningRate 0.0003 Epoch: 20 Global Step: 424650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:41,922-Speed 6275.70 samples/sec Loss 4.8450 LearningRate 0.0003 Epoch: 20 Global Step: 424660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:45,164-Speed 6318.20 samples/sec Loss 4.8185 LearningRate 0.0003 Epoch: 20 Global Step: 424670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:48,405-Speed 6319.00 samples/sec Loss 4.8369 LearningRate 0.0003 Epoch: 20 Global Step: 424680 Fp16 Grad 
Scale: 32768 Required: 37 hours Training: 2022-04-02 06:21:51,650-Speed 6312.24 samples/sec Loss 4.8368 LearningRate 0.0003 Epoch: 20 Global Step: 424690 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:21:54,881-Speed 6340.70 samples/sec Loss 4.7617 LearningRate 0.0003 Epoch: 20 Global Step: 424700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:21:58,132-Speed 6300.99 samples/sec Loss 4.8791 LearningRate 0.0003 Epoch: 20 Global Step: 424710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:01,379-Speed 6308.22 samples/sec Loss 4.8610 LearningRate 0.0003 Epoch: 20 Global Step: 424720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:04,622-Speed 6317.00 samples/sec Loss 4.8854 LearningRate 0.0003 Epoch: 20 Global Step: 424730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:07,865-Speed 6316.58 samples/sec Loss 4.8049 LearningRate 0.0003 Epoch: 20 Global Step: 424740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:11,113-Speed 6306.64 samples/sec Loss 4.8655 LearningRate 0.0003 Epoch: 20 Global Step: 424750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:14,359-Speed 6311.77 samples/sec Loss 4.8755 LearningRate 0.0003 Epoch: 20 Global Step: 424760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:17,603-Speed 6313.74 samples/sec Loss 4.7428 LearningRate 0.0003 Epoch: 20 Global Step: 424770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:20,849-Speed 6311.71 samples/sec Loss 4.7915 LearningRate 0.0003 Epoch: 20 Global Step: 424780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:24,094-Speed 6311.17 samples/sec Loss 4.7410 LearningRate 0.0003 Epoch: 20 Global Step: 424790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:27,338-Speed 6314.05 samples/sec Loss 4.8408 LearningRate 0.0003 Epoch: 20 Global Step: 424800 Fp16 Grad Scale: 32768 Required: 37 hours 
Training: 2022-04-02 06:22:30,573-Speed 6334.66 samples/sec Loss 4.8539 LearningRate 0.0003 Epoch: 20 Global Step: 424810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:33,818-Speed 6311.78 samples/sec Loss 4.8111 LearningRate 0.0003 Epoch: 20 Global Step: 424820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:37,068-Speed 6303.09 samples/sec Loss 4.7832 LearningRate 0.0003 Epoch: 20 Global Step: 424830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:40,314-Speed 6311.93 samples/sec Loss 4.7465 LearningRate 0.0003 Epoch: 20 Global Step: 424840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:43,560-Speed 6309.12 samples/sec Loss 4.8517 LearningRate 0.0003 Epoch: 20 Global Step: 424850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:46,805-Speed 6313.15 samples/sec Loss 4.8183 LearningRate 0.0003 Epoch: 20 Global Step: 424860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:50,050-Speed 6312.51 samples/sec Loss 4.8378 LearningRate 0.0003 Epoch: 20 Global Step: 424870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:53,298-Speed 6307.91 samples/sec Loss 4.8120 LearningRate 0.0003 Epoch: 20 Global Step: 424880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:56,556-Speed 6286.14 samples/sec Loss 4.8251 LearningRate 0.0003 Epoch: 20 Global Step: 424890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:22:59,814-Speed 6287.61 samples/sec Loss 4.8061 LearningRate 0.0003 Epoch: 20 Global Step: 424900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:03,104-Speed 6226.64 samples/sec Loss 4.7709 LearningRate 0.0003 Epoch: 20 Global Step: 424910 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:23:06,354-Speed 6302.19 samples/sec Loss 4.7658 LearningRate 0.0003 Epoch: 20 Global Step: 424920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:23:09,596-Speed 6320.16 samples/sec Loss 4.8401 LearningRate 0.0003 Epoch: 20 Global Step: 424930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:12,845-Speed 6304.91 samples/sec Loss 4.8078 LearningRate 0.0003 Epoch: 20 Global Step: 424940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:16,091-Speed 6310.66 samples/sec Loss 4.8502 LearningRate 0.0003 Epoch: 20 Global Step: 424950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:19,339-Speed 6305.70 samples/sec Loss 4.8306 LearningRate 0.0003 Epoch: 20 Global Step: 424960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:22,579-Speed 6322.52 samples/sec Loss 4.7835 LearningRate 0.0003 Epoch: 20 Global Step: 424970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:25,831-Speed 6299.36 samples/sec Loss 4.8377 LearningRate 0.0003 Epoch: 20 Global Step: 424980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:29,078-Speed 6308.66 samples/sec Loss 4.8232 LearningRate 0.0003 Epoch: 20 Global Step: 424990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:32,324-Speed 6311.78 samples/sec Loss 4.8205 LearningRate 0.0003 Epoch: 20 Global Step: 425000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:35,573-Speed 6304.53 samples/sec Loss 4.7716 LearningRate 0.0003 Epoch: 20 Global Step: 425010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:38,815-Speed 6318.55 samples/sec Loss 4.7780 LearningRate 0.0003 Epoch: 20 Global Step: 425020 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:23:42,044-Speed 6345.36 samples/sec Loss 4.7749 LearningRate 0.0003 Epoch: 20 Global Step: 425030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:45,293-Speed 6304.00 samples/sec Loss 4.8017 LearningRate 0.0003 Epoch: 20 Global Step: 425040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:48,539-Speed 6312.18 
samples/sec Loss 4.8285 LearningRate 0.0003 Epoch: 20 Global Step: 425050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:51,782-Speed 6315.87 samples/sec Loss 4.8446 LearningRate 0.0003 Epoch: 20 Global Step: 425060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:55,027-Speed 6312.72 samples/sec Loss 4.8202 LearningRate 0.0003 Epoch: 20 Global Step: 425070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:23:58,273-Speed 6310.59 samples/sec Loss 4.7667 LearningRate 0.0003 Epoch: 20 Global Step: 425080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:01,519-Speed 6310.28 samples/sec Loss 4.8506 LearningRate 0.0003 Epoch: 20 Global Step: 425090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:04,767-Speed 6306.00 samples/sec Loss 4.7724 LearningRate 0.0003 Epoch: 20 Global Step: 425100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:08,014-Speed 6309.00 samples/sec Loss 4.8282 LearningRate 0.0003 Epoch: 20 Global Step: 425110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:11,259-Speed 6314.04 samples/sec Loss 4.7785 LearningRate 0.0003 Epoch: 20 Global Step: 425120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:14,507-Speed 6305.70 samples/sec Loss 4.7609 LearningRate 0.0003 Epoch: 20 Global Step: 425130 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:24:17,759-Speed 6299.74 samples/sec Loss 4.8315 LearningRate 0.0003 Epoch: 20 Global Step: 425140 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:24:20,990-Speed 6340.61 samples/sec Loss 4.9046 LearningRate 0.0003 Epoch: 20 Global Step: 425150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:24,236-Speed 6310.13 samples/sec Loss 4.8140 LearningRate 0.0003 Epoch: 20 Global Step: 425160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:27,485-Speed 6304.91 samples/sec Loss 4.8577 
LearningRate 0.0003 Epoch: 20 Global Step: 425170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:30,741-Speed 6290.99 samples/sec Loss 4.8418 LearningRate 0.0003 Epoch: 20 Global Step: 425180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:33,986-Speed 6312.39 samples/sec Loss 4.7567 LearningRate 0.0003 Epoch: 20 Global Step: 425190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:37,237-Speed 6300.46 samples/sec Loss 4.8039 LearningRate 0.0003 Epoch: 20 Global Step: 425200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:40,482-Speed 6314.12 samples/sec Loss 4.8960 LearningRate 0.0003 Epoch: 20 Global Step: 425210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:43,726-Speed 6313.48 samples/sec Loss 4.7358 LearningRate 0.0003 Epoch: 20 Global Step: 425220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:47,022-Speed 6216.66 samples/sec Loss 4.7783 LearningRate 0.0003 Epoch: 20 Global Step: 425230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:50,301-Speed 6246.49 samples/sec Loss 4.8156 LearningRate 0.0003 Epoch: 20 Global Step: 425240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:24:53,561-Speed 6283.62 samples/sec Loss 4.7953 LearningRate 0.0003 Epoch: 20 Global Step: 425250 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:24:56,807-Speed 6311.46 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 425260 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:25:00,057-Speed 6303.71 samples/sec Loss 4.8416 LearningRate 0.0003 Epoch: 20 Global Step: 425270 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:25:03,304-Speed 6307.80 samples/sec Loss 4.8515 LearningRate 0.0003 Epoch: 20 Global Step: 425280 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:25:06,537-Speed 6336.46 samples/sec Loss 4.8168 LearningRate 0.0003 Epoch: 20 
Global Step: 425290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:09,788-Speed 6300.97 samples/sec Loss 4.8117 LearningRate 0.0003 Epoch: 20 Global Step: 425300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:13,033-Speed 6312.50 samples/sec Loss 4.8240 LearningRate 0.0003 Epoch: 20 Global Step: 425310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:16,278-Speed 6312.28 samples/sec Loss 4.7543 LearningRate 0.0003 Epoch: 20 Global Step: 425320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:19,533-Speed 6294.56 samples/sec Loss 4.7254 LearningRate 0.0003 Epoch: 20 Global Step: 425330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:22,778-Speed 6310.86 samples/sec Loss 4.8179 LearningRate 0.0003 Epoch: 20 Global Step: 425340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:26,026-Speed 6307.32 samples/sec Loss 4.7927 LearningRate 0.0003 Epoch: 20 Global Step: 425350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:29,272-Speed 6310.78 samples/sec Loss 4.8210 LearningRate 0.0003 Epoch: 20 Global Step: 425360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:32,522-Speed 6303.24 samples/sec Loss 4.7970 LearningRate 0.0003 Epoch: 20 Global Step: 425370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:35,775-Speed 6297.26 samples/sec Loss 4.8066 LearningRate 0.0003 Epoch: 20 Global Step: 425380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:39,025-Speed 6302.40 samples/sec Loss 4.8710 LearningRate 0.0003 Epoch: 20 Global Step: 425390 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:25:42,276-Speed 6302.30 samples/sec Loss 4.8374 LearningRate 0.0003 Epoch: 20 Global Step: 425400 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:25:45,504-Speed 6345.20 samples/sec Loss 4.8250 LearningRate 0.0003 Epoch: 20 Global Step: 425410 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:48,753-Speed 6304.95 samples/sec Loss 4.8309 LearningRate 0.0003 Epoch: 20 Global Step: 425420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:51,999-Speed 6311.23 samples/sec Loss 4.7810 LearningRate 0.0003 Epoch: 20 Global Step: 425430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:55,247-Speed 6306.07 samples/sec Loss 4.7305 LearningRate 0.0003 Epoch: 20 Global Step: 425440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:25:58,494-Speed 6309.36 samples/sec Loss 4.8386 LearningRate 0.0003 Epoch: 20 Global Step: 425450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:01,742-Speed 6306.42 samples/sec Loss 4.7720 LearningRate 0.0003 Epoch: 20 Global Step: 425460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:04,988-Speed 6310.62 samples/sec Loss 4.7892 LearningRate 0.0003 Epoch: 20 Global Step: 425470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:08,234-Speed 6311.21 samples/sec Loss 4.8131 LearningRate 0.0003 Epoch: 20 Global Step: 425480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:11,483-Speed 6304.50 samples/sec Loss 4.8388 LearningRate 0.0003 Epoch: 20 Global Step: 425490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:14,730-Speed 6309.59 samples/sec Loss 4.8793 LearningRate 0.0003 Epoch: 20 Global Step: 425500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:17,970-Speed 6321.81 samples/sec Loss 4.8201 LearningRate 0.0003 Epoch: 20 Global Step: 425510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:21,218-Speed 6307.49 samples/sec Loss 4.8993 LearningRate 0.0003 Epoch: 20 Global Step: 425520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:24,567-Speed 6117.22 samples/sec Loss 4.8213 LearningRate 0.0003 Epoch: 20 Global Step: 425530 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:26:27,812-Speed 6311.95 samples/sec Loss 4.8307 LearningRate 0.0003 Epoch: 20 Global Step: 425540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:31,056-Speed 6314.81 samples/sec Loss 4.8468 LearningRate 0.0003 Epoch: 20 Global Step: 425550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:34,307-Speed 6300.68 samples/sec Loss 4.8496 LearningRate 0.0003 Epoch: 20 Global Step: 425560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:37,551-Speed 6314.63 samples/sec Loss 4.7740 LearningRate 0.0003 Epoch: 20 Global Step: 425570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:40,795-Speed 6314.30 samples/sec Loss 4.8081 LearningRate 0.0003 Epoch: 20 Global Step: 425580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:44,047-Speed 6299.24 samples/sec Loss 4.8123 LearningRate 0.0003 Epoch: 20 Global Step: 425590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:47,293-Speed 6313.97 samples/sec Loss 4.7929 LearningRate 0.0003 Epoch: 20 Global Step: 425600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:26:50,541-Speed 6306.57 samples/sec Loss 4.8512 LearningRate 0.0003 Epoch: 20 Global Step: 425610 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:26:53,789-Speed 6308.10 samples/sec Loss 4.8583 LearningRate 0.0003 Epoch: 20 Global Step: 425620 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:26:57,033-Speed 6314.51 samples/sec Loss 4.8745 LearningRate 0.0003 Epoch: 20 Global Step: 425630 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:27:00,295-Speed 6279.04 samples/sec Loss 4.8263 LearningRate 0.0003 Epoch: 20 Global Step: 425640 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:27:03,548-Speed 6297.38 samples/sec Loss 4.8339 LearningRate 0.0003 Epoch: 20 Global Step: 425650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:27:06,794-Speed 6310.99 samples/sec Loss 4.8141 LearningRate 0.0003 Epoch: 20 Global Step: 425660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:10,042-Speed 6305.77 samples/sec Loss 4.9196 LearningRate 0.0003 Epoch: 20 Global Step: 425670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:13,289-Speed 6310.99 samples/sec Loss 4.8666 LearningRate 0.0003 Epoch: 20 Global Step: 425680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:16,537-Speed 6306.51 samples/sec Loss 4.8503 LearningRate 0.0003 Epoch: 20 Global Step: 425690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:19,816-Speed 6246.42 samples/sec Loss 4.8361 LearningRate 0.0003 Epoch: 20 Global Step: 425700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:23,059-Speed 6317.67 samples/sec Loss 4.7948 LearningRate 0.0003 Epoch: 20 Global Step: 425710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:26,304-Speed 6311.40 samples/sec Loss 4.8141 LearningRate 0.0003 Epoch: 20 Global Step: 425720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:29,552-Speed 6306.81 samples/sec Loss 4.7941 LearningRate 0.0003 Epoch: 20 Global Step: 425730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:32,796-Speed 6314.44 samples/sec Loss 4.8293 LearningRate 0.0003 Epoch: 20 Global Step: 425740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:36,038-Speed 6318.80 samples/sec Loss 4.8954 LearningRate 0.0003 Epoch: 20 Global Step: 425750 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:27:39,285-Speed 6308.18 samples/sec Loss 4.8431 LearningRate 0.0003 Epoch: 20 Global Step: 425760 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:27:42,527-Speed 6319.07 samples/sec Loss 4.7499 LearningRate 0.0003 Epoch: 20 Global Step: 425770 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:27:45,757-Speed 6341.14 
samples/sec Loss 4.8787 LearningRate 0.0003 Epoch: 20 Global Step: 425780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:49,002-Speed 6314.48 samples/sec Loss 4.8106 LearningRate 0.0003 Epoch: 20 Global Step: 425790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:52,247-Speed 6312.02 samples/sec Loss 4.7935 LearningRate 0.0003 Epoch: 20 Global Step: 425800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:55,492-Speed 6312.58 samples/sec Loss 4.7845 LearningRate 0.0003 Epoch: 20 Global Step: 425810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:27:58,743-Speed 6300.62 samples/sec Loss 4.8122 LearningRate 0.0003 Epoch: 20 Global Step: 425820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:01,990-Speed 6308.22 samples/sec Loss 4.7746 LearningRate 0.0003 Epoch: 20 Global Step: 425830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:05,278-Speed 6231.18 samples/sec Loss 4.7954 LearningRate 0.0003 Epoch: 20 Global Step: 425840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:08,534-Speed 6290.95 samples/sec Loss 4.8963 LearningRate 0.0003 Epoch: 20 Global Step: 425850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:11,788-Speed 6296.42 samples/sec Loss 4.7790 LearningRate 0.0003 Epoch: 20 Global Step: 425860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:15,058-Speed 6265.37 samples/sec Loss 4.8205 LearningRate 0.0003 Epoch: 20 Global Step: 425870 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:18,289-Speed 6339.31 samples/sec Loss 4.7734 LearningRate 0.0003 Epoch: 20 Global Step: 425880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:21,532-Speed 6316.23 samples/sec Loss 4.8100 LearningRate 0.0003 Epoch: 20 Global Step: 425890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:24,778-Speed 6311.02 samples/sec Loss 4.8762 
LearningRate 0.0003 Epoch: 20 Global Step: 425900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:28,026-Speed 6306.91 samples/sec Loss 4.8284 LearningRate 0.0003 Epoch: 20 Global Step: 425910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:31,270-Speed 6314.06 samples/sec Loss 4.7725 LearningRate 0.0003 Epoch: 20 Global Step: 425920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:34,518-Speed 6307.27 samples/sec Loss 4.7992 LearningRate 0.0003 Epoch: 20 Global Step: 425930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:37,770-Speed 6299.66 samples/sec Loss 4.7774 LearningRate 0.0003 Epoch: 20 Global Step: 425940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:41,015-Speed 6312.00 samples/sec Loss 4.8334 LearningRate 0.0003 Epoch: 20 Global Step: 425950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:44,260-Speed 6312.07 samples/sec Loss 4.8116 LearningRate 0.0003 Epoch: 20 Global Step: 425960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:47,507-Speed 6309.90 samples/sec Loss 4.8374 LearningRate 0.0003 Epoch: 20 Global Step: 425970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:50,753-Speed 6310.61 samples/sec Loss 4.8645 LearningRate 0.0003 Epoch: 20 Global Step: 425980 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:28:53,983-Speed 6340.55 samples/sec Loss 4.7992 LearningRate 0.0003 Epoch: 20 Global Step: 425990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:28:57,229-Speed 6310.76 samples/sec Loss 4.7325 LearningRate 0.0003 Epoch: 20 Global Step: 426000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:00,474-Speed 6313.39 samples/sec Loss 4.8024 LearningRate 0.0003 Epoch: 20 Global Step: 426010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:03,729-Speed 6292.97 samples/sec Loss 4.8233 LearningRate 0.0003 Epoch: 20 
Global Step: 426020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:06,973-Speed 6314.69 samples/sec Loss 4.7771 LearningRate 0.0003 Epoch: 20 Global Step: 426030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:10,221-Speed 6307.71 samples/sec Loss 4.8659 LearningRate 0.0003 Epoch: 20 Global Step: 426040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:13,465-Speed 6315.45 samples/sec Loss 4.7732 LearningRate 0.0003 Epoch: 20 Global Step: 426050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:16,710-Speed 6312.78 samples/sec Loss 4.7940 LearningRate 0.0003 Epoch: 20 Global Step: 426060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:19,957-Speed 6312.01 samples/sec Loss 4.8277 LearningRate 0.0003 Epoch: 20 Global Step: 426070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:23,200-Speed 6317.25 samples/sec Loss 4.7667 LearningRate 0.0003 Epoch: 20 Global Step: 426080 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:26,451-Speed 6300.30 samples/sec Loss 4.8336 LearningRate 0.0003 Epoch: 20 Global Step: 426090 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:29:29,699-Speed 6307.40 samples/sec Loss 4.7700 LearningRate 0.0003 Epoch: 20 Global Step: 426100 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:29:32,925-Speed 6349.68 samples/sec Loss 4.8204 LearningRate 0.0003 Epoch: 20 Global Step: 426110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:36,169-Speed 6316.15 samples/sec Loss 4.8100 LearningRate 0.0003 Epoch: 20 Global Step: 426120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:39,414-Speed 6310.84 samples/sec Loss 4.8259 LearningRate 0.0003 Epoch: 20 Global Step: 426130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:42,658-Speed 6316.37 samples/sec Loss 4.7574 LearningRate 0.0003 Epoch: 20 Global Step: 426140 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:45,901-Speed 6315.15 samples/sec Loss 4.7972 LearningRate 0.0003 Epoch: 20 Global Step: 426150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:49,149-Speed 6307.03 samples/sec Loss 4.8360 LearningRate 0.0003 Epoch: 20 Global Step: 426160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:52,395-Speed 6311.88 samples/sec Loss 4.8272 LearningRate 0.0003 Epoch: 20 Global Step: 426170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:55,638-Speed 6317.05 samples/sec Loss 4.8450 LearningRate 0.0003 Epoch: 20 Global Step: 426180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:29:58,885-Speed 6307.97 samples/sec Loss 4.7659 LearningRate 0.0003 Epoch: 20 Global Step: 426190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:02,128-Speed 6315.57 samples/sec Loss 4.8373 LearningRate 0.0003 Epoch: 20 Global Step: 426200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:05,355-Speed 6347.84 samples/sec Loss 4.7989 LearningRate 0.0003 Epoch: 20 Global Step: 426210 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:08,613-Speed 6288.14 samples/sec Loss 4.8101 LearningRate 0.0003 Epoch: 20 Global Step: 426220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:11,856-Speed 6317.07 samples/sec Loss 4.7937 LearningRate 0.0003 Epoch: 20 Global Step: 426230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:15,098-Speed 6317.78 samples/sec Loss 4.7557 LearningRate 0.0003 Epoch: 20 Global Step: 426240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:18,344-Speed 6310.38 samples/sec Loss 4.7888 LearningRate 0.0003 Epoch: 20 Global Step: 426250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:21,588-Speed 6314.84 samples/sec Loss 4.8071 LearningRate 0.0003 Epoch: 20 Global Step: 426260 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:30:24,837-Speed 6304.96 samples/sec Loss 4.8443 LearningRate 0.0003 Epoch: 20 Global Step: 426270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:28,086-Speed 6305.25 samples/sec Loss 4.8815 LearningRate 0.0003 Epoch: 20 Global Step: 426280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:31,333-Speed 6308.45 samples/sec Loss 4.8695 LearningRate 0.0003 Epoch: 20 Global Step: 426290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:34,581-Speed 6306.59 samples/sec Loss 4.7956 LearningRate 0.0003 Epoch: 20 Global Step: 426300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:30:37,816-Speed 6334.21 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 426310 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:41,058-Speed 6318.74 samples/sec Loss 4.7930 LearningRate 0.0003 Epoch: 20 Global Step: 426320 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:44,299-Speed 6318.45 samples/sec Loss 4.7859 LearningRate 0.0003 Epoch: 20 Global Step: 426330 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:47,544-Speed 6313.19 samples/sec Loss 4.8432 LearningRate 0.0003 Epoch: 20 Global Step: 426340 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:50,789-Speed 6313.50 samples/sec Loss 4.7979 LearningRate 0.0003 Epoch: 20 Global Step: 426350 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:54,035-Speed 6311.33 samples/sec Loss 4.7316 LearningRate 0.0003 Epoch: 20 Global Step: 426360 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:30:57,274-Speed 6324.10 samples/sec Loss 4.8533 LearningRate 0.0003 Epoch: 20 Global Step: 426370 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:31:00,520-Speed 6309.87 samples/sec Loss 4.7324 LearningRate 0.0003 Epoch: 20 Global Step: 426380 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:31:03,768-Speed 
6307.77 samples/sec Loss 4.8294 LearningRate 0.0003 Epoch: 20 Global Step: 426390 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:31:07,016-Speed 6306.41 samples/sec Loss 4.7486 LearningRate 0.0003 Epoch: 20 Global Step: 426400 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-02 06:31:10,263-Speed 6308.35 samples/sec Loss 4.8639 LearningRate 0.0003 Epoch: 20 Global Step: 426410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:13,508-Speed 6312.76 samples/sec Loss 4.8340 LearningRate 0.0003 Epoch: 20 Global Step: 426420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:16,756-Speed 6307.15 samples/sec Loss 4.7774 LearningRate 0.0003 Epoch: 20 Global Step: 426430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:19,998-Speed 6318.33 samples/sec Loss 4.8031 LearningRate 0.0003 Epoch: 20 Global Step: 426440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:23,240-Speed 6318.88 samples/sec Loss 4.8938 LearningRate 0.0003 Epoch: 20 Global Step: 426450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:26,488-Speed 6307.69 samples/sec Loss 4.7158 LearningRate 0.0003 Epoch: 20 Global Step: 426460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:29,739-Speed 6299.83 samples/sec Loss 4.7822 LearningRate 0.0003 Epoch: 20 Global Step: 426470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:32,985-Speed 6311.49 samples/sec Loss 4.7446 LearningRate 0.0003 Epoch: 20 Global Step: 426480 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:36,229-Speed 6314.00 samples/sec Loss 4.8642 LearningRate 0.0003 Epoch: 20 Global Step: 426490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:39,479-Speed 6302.10 samples/sec Loss 4.8082 LearningRate 0.0003 Epoch: 20 Global Step: 426500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:42,712-Speed 6336.58 samples/sec Loss 4.8666 
LearningRate 0.0003 Epoch: 20 Global Step: 426510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:45,956-Speed 6314.83 samples/sec Loss 4.7671 LearningRate 0.0003 Epoch: 20 Global Step: 426520 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:49,205-Speed 6305.75 samples/sec Loss 4.7987 LearningRate 0.0003 Epoch: 20 Global Step: 426530 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:52,451-Speed 6311.05 samples/sec Loss 4.8279 LearningRate 0.0003 Epoch: 20 Global Step: 426540 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:55,699-Speed 6307.65 samples/sec Loss 4.8073 LearningRate 0.0003 Epoch: 20 Global Step: 426550 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:31:58,951-Speed 6297.31 samples/sec Loss 4.8677 LearningRate 0.0003 Epoch: 20 Global Step: 426560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:02,198-Speed 6309.95 samples/sec Loss 4.8144 LearningRate 0.0003 Epoch: 20 Global Step: 426570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:05,446-Speed 6306.69 samples/sec Loss 4.8447 LearningRate 0.0003 Epoch: 20 Global Step: 426580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:08,687-Speed 6321.20 samples/sec Loss 4.8170 LearningRate 0.0003 Epoch: 20 Global Step: 426590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:11,935-Speed 6306.54 samples/sec Loss 4.8353 LearningRate 0.0003 Epoch: 20 Global Step: 426600 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:15,180-Speed 6312.54 samples/sec Loss 4.7534 LearningRate 0.0003 Epoch: 20 Global Step: 426610 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:32:18,421-Speed 6320.49 samples/sec Loss 4.8392 LearningRate 0.0003 Epoch: 20 Global Step: 426620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:21,670-Speed 6303.17 samples/sec Loss 4.7870 LearningRate 0.0003 Epoch: 20 
Global Step: 426630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:24,916-Speed 6311.19 samples/sec Loss 4.8305 LearningRate 0.0003 Epoch: 20 Global Step: 426640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:28,164-Speed 6308.03 samples/sec Loss 4.7484 LearningRate 0.0003 Epoch: 20 Global Step: 426650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:31,409-Speed 6311.84 samples/sec Loss 4.7619 LearningRate 0.0003 Epoch: 20 Global Step: 426660 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:34,653-Speed 6314.54 samples/sec Loss 4.8005 LearningRate 0.0003 Epoch: 20 Global Step: 426670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:37,899-Speed 6312.21 samples/sec Loss 4.7299 LearningRate 0.0003 Epoch: 20 Global Step: 426680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:41,142-Speed 6316.09 samples/sec Loss 4.7979 LearningRate 0.0003 Epoch: 20 Global Step: 426690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:44,388-Speed 6311.77 samples/sec Loss 4.8690 LearningRate 0.0003 Epoch: 20 Global Step: 426700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:47,633-Speed 6312.23 samples/sec Loss 4.7415 LearningRate 0.0003 Epoch: 20 Global Step: 426710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:50,891-Speed 6286.77 samples/sec Loss 4.7322 LearningRate 0.0003 Epoch: 20 Global Step: 426720 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:32:54,120-Speed 6344.87 samples/sec Loss 4.7573 LearningRate 0.0003 Epoch: 20 Global Step: 426730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:32:57,361-Speed 6321.12 samples/sec Loss 4.7450 LearningRate 0.0003 Epoch: 20 Global Step: 426740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:00,606-Speed 6312.09 samples/sec Loss 4.8663 LearningRate 0.0003 Epoch: 20 Global Step: 426750 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:03,847-Speed 6320.91 samples/sec Loss 4.8363 LearningRate 0.0003 Epoch: 20 Global Step: 426760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:07,092-Speed 6312.53 samples/sec Loss 4.8001 LearningRate 0.0003 Epoch: 20 Global Step: 426770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:10,335-Speed 6316.61 samples/sec Loss 4.7730 LearningRate 0.0003 Epoch: 20 Global Step: 426780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:13,578-Speed 6316.67 samples/sec Loss 4.7698 LearningRate 0.0003 Epoch: 20 Global Step: 426790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:16,823-Speed 6313.71 samples/sec Loss 4.7273 LearningRate 0.0003 Epoch: 20 Global Step: 426800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:20,071-Speed 6305.35 samples/sec Loss 4.7441 LearningRate 0.0003 Epoch: 20 Global Step: 426810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:23,318-Speed 6309.16 samples/sec Loss 4.8505 LearningRate 0.0003 Epoch: 20 Global Step: 426820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:26,557-Speed 6324.28 samples/sec Loss 4.8505 LearningRate 0.0003 Epoch: 20 Global Step: 426830 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:33:29,787-Speed 6342.68 samples/sec Loss 4.7650 LearningRate 0.0003 Epoch: 20 Global Step: 426840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:33,031-Speed 6313.59 samples/sec Loss 4.8436 LearningRate 0.0003 Epoch: 20 Global Step: 426850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:36,278-Speed 6308.51 samples/sec Loss 4.7939 LearningRate 0.0003 Epoch: 20 Global Step: 426860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:39,529-Speed 6302.23 samples/sec Loss 4.7786 LearningRate 0.0003 Epoch: 20 Global Step: 426870 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:33:42,771-Speed 6318.86 samples/sec Loss 4.8778 LearningRate 0.0003 Epoch: 20 Global Step: 426880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:46,013-Speed 6317.56 samples/sec Loss 4.8690 LearningRate 0.0003 Epoch: 20 Global Step: 426890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:49,255-Speed 6317.93 samples/sec Loss 4.7805 LearningRate 0.0003 Epoch: 20 Global Step: 426900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:52,502-Speed 6308.54 samples/sec Loss 4.8352 LearningRate 0.0003 Epoch: 20 Global Step: 426910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:55,749-Speed 6310.02 samples/sec Loss 4.7532 LearningRate 0.0003 Epoch: 20 Global Step: 426920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:33:58,991-Speed 6318.70 samples/sec Loss 4.8375 LearningRate 0.0003 Epoch: 20 Global Step: 426930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:02,238-Speed 6307.13 samples/sec Loss 4.7682 LearningRate 0.0003 Epoch: 20 Global Step: 426940 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:05,473-Speed 6333.77 samples/sec Loss 4.7292 LearningRate 0.0003 Epoch: 20 Global Step: 426950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:08,718-Speed 6312.44 samples/sec Loss 4.7792 LearningRate 0.0003 Epoch: 20 Global Step: 426960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:11,963-Speed 6312.81 samples/sec Loss 4.7518 LearningRate 0.0003 Epoch: 20 Global Step: 426970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:15,211-Speed 6308.01 samples/sec Loss 4.8644 LearningRate 0.0003 Epoch: 20 Global Step: 426980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:18,455-Speed 6313.28 samples/sec Loss 4.7730 LearningRate 0.0003 Epoch: 20 Global Step: 426990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:34:21,718-Speed 6279.39 samples/sec Loss 4.9000 LearningRate 0.0003 Epoch: 20 Global Step: 427000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:24,971-Speed 6297.13 samples/sec Loss 4.7744 LearningRate 0.0003 Epoch: 20 Global Step: 427010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:28,218-Speed 6308.04 samples/sec Loss 4.7821 LearningRate 0.0003 Epoch: 20 Global Step: 427020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:31,467-Speed 6304.01 samples/sec Loss 4.7923 LearningRate 0.0003 Epoch: 20 Global Step: 427030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:34,716-Speed 6305.87 samples/sec Loss 4.7808 LearningRate 0.0003 Epoch: 20 Global Step: 427040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:34:37,965-Speed 6305.67 samples/sec Loss 4.7682 LearningRate 0.0003 Epoch: 20 Global Step: 427050 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:41,212-Speed 6307.51 samples/sec Loss 4.8042 LearningRate 0.0003 Epoch: 20 Global Step: 427060 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:44,453-Speed 6319.81 samples/sec Loss 4.7487 LearningRate 0.0003 Epoch: 20 Global Step: 427070 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:47,758-Speed 6198.09 samples/sec Loss 4.7712 LearningRate 0.0003 Epoch: 20 Global Step: 427080 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:51,004-Speed 6311.72 samples/sec Loss 4.7992 LearningRate 0.0003 Epoch: 20 Global Step: 427090 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:54,250-Speed 6311.08 samples/sec Loss 4.8421 LearningRate 0.0003 Epoch: 20 Global Step: 427100 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:34:57,493-Speed 6315.95 samples/sec Loss 4.8421 LearningRate 0.0003 Epoch: 20 Global Step: 427110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:00,748-Speed 6292.89 
samples/sec Loss 4.7978 LearningRate 0.0003 Epoch: 20 Global Step: 427120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:04,125-Speed 6066.35 samples/sec Loss 4.7773 LearningRate 0.0003 Epoch: 20 Global Step: 427130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:07,367-Speed 6318.87 samples/sec Loss 4.7654 LearningRate 0.0003 Epoch: 20 Global Step: 427140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:10,618-Speed 6299.57 samples/sec Loss 4.7504 LearningRate 0.0003 Epoch: 20 Global Step: 427150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:13,866-Speed 6307.08 samples/sec Loss 4.8274 LearningRate 0.0003 Epoch: 20 Global Step: 427160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:17,113-Speed 6310.05 samples/sec Loss 4.8027 LearningRate 0.0003 Epoch: 20 Global Step: 427170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:20,360-Speed 6308.92 samples/sec Loss 4.8090 LearningRate 0.0003 Epoch: 20 Global Step: 427180 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:23,608-Speed 6307.20 samples/sec Loss 4.7759 LearningRate 0.0003 Epoch: 20 Global Step: 427190 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:26,856-Speed 6305.61 samples/sec Loss 4.7687 LearningRate 0.0003 Epoch: 20 Global Step: 427200 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:30,105-Speed 6306.47 samples/sec Loss 4.7707 LearningRate 0.0003 Epoch: 20 Global Step: 427210 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:35:33,333-Speed 6345.77 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 427220 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:36,623-Speed 6225.68 samples/sec Loss 4.8519 LearningRate 0.0003 Epoch: 20 Global Step: 427230 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:39,940-Speed 6175.22 samples/sec Loss 4.7579 
LearningRate 0.0003 Epoch: 20 Global Step: 427240 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:43,185-Speed 6312.27 samples/sec Loss 4.7848 LearningRate 0.0003 Epoch: 20 Global Step: 427250 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:46,431-Speed 6310.86 samples/sec Loss 4.8001 LearningRate 0.0003 Epoch: 20 Global Step: 427260 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:49,726-Speed 6216.56 samples/sec Loss 4.7702 LearningRate 0.0003 Epoch: 20 Global Step: 427270 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:52,973-Speed 6308.98 samples/sec Loss 4.7610 LearningRate 0.0003 Epoch: 20 Global Step: 427280 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:56,238-Speed 6274.73 samples/sec Loss 4.7717 LearningRate 0.0003 Epoch: 20 Global Step: 427290 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:35:59,487-Speed 6305.62 samples/sec Loss 4.7142 LearningRate 0.0003 Epoch: 20 Global Step: 427300 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:02,739-Speed 6299.22 samples/sec Loss 4.7806 LearningRate 0.0003 Epoch: 20 Global Step: 427310 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:05,974-Speed 6331.89 samples/sec Loss 4.8057 LearningRate 0.0003 Epoch: 20 Global Step: 427320 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:09,217-Speed 6315.37 samples/sec Loss 4.7883 LearningRate 0.0003 Epoch: 20 Global Step: 427330 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:12,469-Speed 6299.31 samples/sec Loss 4.7621 LearningRate 0.0003 Epoch: 20 Global Step: 427340 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:15,713-Speed 6315.51 samples/sec Loss 4.8210 LearningRate 0.0003 Epoch: 20 Global Step: 427350 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:18,960-Speed 6308.98 samples/sec Loss 4.7610 LearningRate 0.0003 Epoch: 20 
Global Step: 427360 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:22,206-Speed 6310.78 samples/sec Loss 4.8025 LearningRate 0.0003 Epoch: 20 Global Step: 427370 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:25,458-Speed 6298.06 samples/sec Loss 4.8037 LearningRate 0.0003 Epoch: 20 Global Step: 427380 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:28,706-Speed 6306.64 samples/sec Loss 4.7321 LearningRate 0.0003 Epoch: 20 Global Step: 427390 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:31,954-Speed 6308.64 samples/sec Loss 4.8420 LearningRate 0.0003 Epoch: 20 Global Step: 427400 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:35,196-Speed 6318.30 samples/sec Loss 4.8160 LearningRate 0.0003 Epoch: 20 Global Step: 427410 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:38,423-Speed 6346.85 samples/sec Loss 4.8464 LearningRate 0.0003 Epoch: 20 Global Step: 427420 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:41,671-Speed 6307.51 samples/sec Loss 4.8002 LearningRate 0.0003 Epoch: 20 Global Step: 427430 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:44,917-Speed 6310.27 samples/sec Loss 4.7986 LearningRate 0.0003 Epoch: 20 Global Step: 427440 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:48,170-Speed 6297.26 samples/sec Loss 4.7829 LearningRate 0.0003 Epoch: 20 Global Step: 427450 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:51,416-Speed 6310.27 samples/sec Loss 4.8867 LearningRate 0.0003 Epoch: 20 Global Step: 427460 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:54,672-Speed 6292.46 samples/sec Loss 4.8640 LearningRate 0.0003 Epoch: 20 Global Step: 427470 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:36:57,916-Speed 6313.75 samples/sec Loss 4.7629 LearningRate 0.0003 Epoch: 20 Global Step: 427480 Fp16 Grad 
Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:01,166-Speed 6303.06 samples/sec Loss 4.7848 LearningRate 0.0003 Epoch: 20 Global Step: 427490 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:04,412-Speed 6310.86 samples/sec Loss 4.8732 LearningRate 0.0003 Epoch: 20 Global Step: 427500 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:07,657-Speed 6313.41 samples/sec Loss 4.8474 LearningRate 0.0003 Epoch: 20 Global Step: 427510 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:10,913-Speed 6291.15 samples/sec Loss 4.8458 LearningRate 0.0003 Epoch: 20 Global Step: 427520 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:37:14,160-Speed 6308.11 samples/sec Loss 4.8476 LearningRate 0.0003 Epoch: 20 Global Step: 427530 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:37:17,407-Speed 6309.94 samples/sec Loss 4.8522 LearningRate 0.0003 Epoch: 20 Global Step: 427540 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:37:20,652-Speed 6311.10 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 20 Global Step: 427550 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:37:23,883-Speed 6340.20 samples/sec Loss 4.8033 LearningRate 0.0003 Epoch: 20 Global Step: 427560 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:27,126-Speed 6317.12 samples/sec Loss 4.7672 LearningRate 0.0003 Epoch: 20 Global Step: 427570 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:30,368-Speed 6319.41 samples/sec Loss 4.7914 LearningRate 0.0003 Epoch: 20 Global Step: 427580 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:33,615-Speed 6308.34 samples/sec Loss 4.8950 LearningRate 0.0003 Epoch: 20 Global Step: 427590 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:36,858-Speed 6315.21 samples/sec Loss 4.7565 LearningRate 0.0003 Epoch: 20 Global Step: 427600 Fp16 Grad Scale: 16384 Required: 37 hours 
Training: 2022-04-02 06:37:40,105-Speed 6309.70 samples/sec Loss 4.8370 LearningRate 0.0003 Epoch: 20 Global Step: 427610 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:43,350-Speed 6313.22 samples/sec Loss 4.8369 LearningRate 0.0003 Epoch: 20 Global Step: 427620 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:46,598-Speed 6307.53 samples/sec Loss 4.7847 LearningRate 0.0003 Epoch: 20 Global Step: 427630 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:49,875-Speed 6250.81 samples/sec Loss 4.8742 LearningRate 0.0003 Epoch: 20 Global Step: 427640 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:53,117-Speed 6317.81 samples/sec Loss 4.8279 LearningRate 0.0003 Epoch: 20 Global Step: 427650 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:37:56,362-Speed 6312.82 samples/sec Loss 4.8102 LearningRate 0.0003 Epoch: 20 Global Step: 427660 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:37:59,614-Speed 6299.56 samples/sec Loss 4.8259 LearningRate 0.0003 Epoch: 20 Global Step: 427670 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:02,863-Speed 6305.90 samples/sec Loss 4.7834 LearningRate 0.0003 Epoch: 20 Global Step: 427680 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:06,110-Speed 6308.45 samples/sec Loss 4.7745 LearningRate 0.0003 Epoch: 20 Global Step: 427690 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:09,353-Speed 6315.11 samples/sec Loss 4.7932 LearningRate 0.0003 Epoch: 20 Global Step: 427700 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:12,624-Speed 6263.22 samples/sec Loss 4.7867 LearningRate 0.0003 Epoch: 20 Global Step: 427710 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:15,868-Speed 6314.79 samples/sec Loss 4.8142 LearningRate 0.0003 Epoch: 20 Global Step: 427720 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 
06:38:19,115-Speed 6308.88 samples/sec Loss 4.8077 LearningRate 0.0003 Epoch: 20 Global Step: 427730 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:22,360-Speed 6313.28 samples/sec Loss 4.8190 LearningRate 0.0003 Epoch: 20 Global Step: 427740 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:25,606-Speed 6309.57 samples/sec Loss 4.8436 LearningRate 0.0003 Epoch: 20 Global Step: 427750 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:28,853-Speed 6309.75 samples/sec Loss 4.7804 LearningRate 0.0003 Epoch: 20 Global Step: 427760 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:32,079-Speed 6348.12 samples/sec Loss 4.8281 LearningRate 0.0003 Epoch: 20 Global Step: 427770 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:35,329-Speed 6303.68 samples/sec Loss 4.8200 LearningRate 0.0003 Epoch: 20 Global Step: 427780 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:38,575-Speed 6311.12 samples/sec Loss 4.8080 LearningRate 0.0003 Epoch: 20 Global Step: 427790 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:41,822-Speed 6308.43 samples/sec Loss 4.8281 LearningRate 0.0003 Epoch: 20 Global Step: 427800 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:45,070-Speed 6307.00 samples/sec Loss 4.7644 LearningRate 0.0003 Epoch: 20 Global Step: 427810 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:48,320-Speed 6302.83 samples/sec Loss 4.7915 LearningRate 0.0003 Epoch: 20 Global Step: 427820 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:51,569-Speed 6306.65 samples/sec Loss 4.8614 LearningRate 0.0003 Epoch: 20 Global Step: 427830 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:54,811-Speed 6317.41 samples/sec Loss 4.7337 LearningRate 0.0003 Epoch: 20 Global Step: 427840 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:38:58,058-Speed 6309.88 
samples/sec Loss 4.7459 LearningRate 0.0003 Epoch: 20 Global Step: 427850 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:01,307-Speed 6304.44 samples/sec Loss 4.7167 LearningRate 0.0003 Epoch: 20 Global Step: 427860 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:04,564-Speed 6288.60 samples/sec Loss 4.8392 LearningRate 0.0003 Epoch: 20 Global Step: 427870 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:39:07,798-Speed 6335.85 samples/sec Loss 4.8779 LearningRate 0.0003 Epoch: 20 Global Step: 427880 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:11,043-Speed 6311.30 samples/sec Loss 4.7424 LearningRate 0.0003 Epoch: 20 Global Step: 427890 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:14,290-Speed 6308.48 samples/sec Loss 4.7547 LearningRate 0.0003 Epoch: 20 Global Step: 427900 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:17,538-Speed 6307.70 samples/sec Loss 4.8332 LearningRate 0.0003 Epoch: 20 Global Step: 427910 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:20,783-Speed 6312.70 samples/sec Loss 4.8080 LearningRate 0.0003 Epoch: 20 Global Step: 427920 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:24,026-Speed 6316.11 samples/sec Loss 4.8790 LearningRate 0.0003 Epoch: 20 Global Step: 427930 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:27,270-Speed 6314.24 samples/sec Loss 4.8151 LearningRate 0.0003 Epoch: 20 Global Step: 427940 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:30,516-Speed 6312.40 samples/sec Loss 4.7981 LearningRate 0.0003 Epoch: 20 Global Step: 427950 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:33,761-Speed 6312.39 samples/sec Loss 4.7760 LearningRate 0.0003 Epoch: 20 Global Step: 427960 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:37,013-Speed 6297.66 samples/sec Loss 4.7936 
LearningRate 0.0003 Epoch: 20 Global Step: 427970 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:40,242-Speed 6343.78 samples/sec Loss 4.7348 LearningRate 0.0003 Epoch: 20 Global Step: 427980 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:43,486-Speed 6314.85 samples/sec Loss 4.7258 LearningRate 0.0003 Epoch: 20 Global Step: 427990 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:46,732-Speed 6310.35 samples/sec Loss 4.7900 LearningRate 0.0003 Epoch: 20 Global Step: 428000 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:49,978-Speed 6312.01 samples/sec Loss 4.7679 LearningRate 0.0003 Epoch: 20 Global Step: 428010 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:53,224-Speed 6311.67 samples/sec Loss 4.8017 LearningRate 0.0003 Epoch: 20 Global Step: 428020 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:56,471-Speed 6309.20 samples/sec Loss 4.7841 LearningRate 0.0003 Epoch: 20 Global Step: 428030 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:39:59,714-Speed 6316.55 samples/sec Loss 4.8298 LearningRate 0.0003 Epoch: 20 Global Step: 428040 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:02,959-Speed 6312.13 samples/sec Loss 4.7560 LearningRate 0.0003 Epoch: 20 Global Step: 428050 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:06,201-Speed 6317.31 samples/sec Loss 4.8184 LearningRate 0.0003 Epoch: 20 Global Step: 428060 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:09,452-Speed 6301.34 samples/sec Loss 4.8657 LearningRate 0.0003 Epoch: 20 Global Step: 428070 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:12,697-Speed 6312.43 samples/sec Loss 4.8371 LearningRate 0.0003 Epoch: 20 Global Step: 428080 Fp16 Grad Scale: 32768 Required: 37 hours Training: 2022-04-02 06:40:15,930-Speed 6337.06 samples/sec Loss 4.7724 LearningRate 0.0003 Epoch: 20 
Global Step: 428090 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:19,174-Speed 6313.98 samples/sec Loss 4.7861 LearningRate 0.0003 Epoch: 20 Global Step: 428100 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:22,417-Speed 6316.83 samples/sec Loss 4.8898 LearningRate 0.0003 Epoch: 20 Global Step: 428110 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:25,660-Speed 6316.03 samples/sec Loss 4.8413 LearningRate 0.0003 Epoch: 20 Global Step: 428120 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:28,925-Speed 6275.56 samples/sec Loss 4.8533 LearningRate 0.0003 Epoch: 20 Global Step: 428130 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:32,172-Speed 6307.21 samples/sec Loss 4.7990 LearningRate 0.0003 Epoch: 20 Global Step: 428140 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:35,423-Speed 6302.57 samples/sec Loss 4.7410 LearningRate 0.0003 Epoch: 20 Global Step: 428150 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:38,669-Speed 6308.75 samples/sec Loss 4.8097 LearningRate 0.0003 Epoch: 20 Global Step: 428160 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:41,917-Speed 6308.13 samples/sec Loss 4.7479 LearningRate 0.0003 Epoch: 20 Global Step: 428170 Fp16 Grad Scale: 16384 Required: 37 hours Training: 2022-04-02 06:40:45,161-Speed 6313.77 samples/sec Loss 4.7801 LearningRate 0.0003 Epoch: 20 Global Step: 428180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:40:48,405-Speed 6315.10 samples/sec Loss 4.7673 LearningRate 0.0003 Epoch: 20 Global Step: 428190 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:40:51,634-Speed 6343.29 samples/sec Loss 4.7996 LearningRate 0.0003 Epoch: 20 Global Step: 428200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:40:54,904-Speed 6265.86 samples/sec Loss 4.7711 LearningRate 0.0003 Epoch: 20 Global Step: 428210 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 06:40:58,151-Speed 6307.67 samples/sec Loss 4.8488 LearningRate 0.0003 Epoch: 20 Global Step: 428220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:01,397-Speed 6311.58 samples/sec Loss 4.8708 LearningRate 0.0003 Epoch: 20 Global Step: 428230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:04,644-Speed 6309.06 samples/sec Loss 4.7589 LearningRate 0.0003 Epoch: 20 Global Step: 428240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:07,885-Speed 6320.18 samples/sec Loss 4.8399 LearningRate 0.0003 Epoch: 20 Global Step: 428250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:11,128-Speed 6317.35 samples/sec Loss 4.7921 LearningRate 0.0003 Epoch: 20 Global Step: 428260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:14,374-Speed 6309.49 samples/sec Loss 4.7691 LearningRate 0.0003 Epoch: 20 Global Step: 428270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:17,618-Speed 6315.80 samples/sec Loss 4.7812 LearningRate 0.0003 Epoch: 20 Global Step: 428280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:20,865-Speed 6307.82 samples/sec Loss 4.7444 LearningRate 0.0003 Epoch: 20 Global Step: 428290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:24,095-Speed 6342.28 samples/sec Loss 4.7543 LearningRate 0.0003 Epoch: 20 Global Step: 428300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:27,342-Speed 6309.01 samples/sec Loss 4.8016 LearningRate 0.0003 Epoch: 20 Global Step: 428310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:30,586-Speed 6314.01 samples/sec Loss 4.8505 LearningRate 0.0003 Epoch: 20 Global Step: 428320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:33,831-Speed 6313.89 samples/sec Loss 4.8104 LearningRate 0.0003 Epoch: 20 Global Step: 428330 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 06:41:37,076-Speed 6311.95 samples/sec Loss 4.8040 LearningRate 0.0003 Epoch: 20 Global Step: 428340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:40,320-Speed 6313.88 samples/sec Loss 4.8107 LearningRate 0.0003 Epoch: 20 Global Step: 428350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:43,567-Speed 6309.31 samples/sec Loss 4.7427 LearningRate 0.0003 Epoch: 20 Global Step: 428360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:46,828-Speed 6282.79 samples/sec Loss 4.7953 LearningRate 0.0003 Epoch: 20 Global Step: 428370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:50,074-Speed 6309.94 samples/sec Loss 4.7364 LearningRate 0.0003 Epoch: 20 Global Step: 428380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:53,318-Speed 6313.78 samples/sec Loss 4.8097 LearningRate 0.0003 Epoch: 20 Global Step: 428390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:56,544-Speed 6350.25 samples/sec Loss 4.7603 LearningRate 0.0003 Epoch: 20 Global Step: 428400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:41:59,802-Speed 6287.72 samples/sec Loss 4.8386 LearningRate 0.0003 Epoch: 20 Global Step: 428410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:03,051-Speed 6304.68 samples/sec Loss 4.9086 LearningRate 0.0003 Epoch: 20 Global Step: 428420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:06,298-Speed 6308.76 samples/sec Loss 4.8073 LearningRate 0.0003 Epoch: 20 Global Step: 428430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:09,544-Speed 6311.39 samples/sec Loss 4.8181 LearningRate 0.0003 Epoch: 20 Global Step: 428440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:12,790-Speed 6309.80 samples/sec Loss 4.7659 LearningRate 0.0003 Epoch: 20 Global Step: 428450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
06:42:16,041-Speed 6302.36 samples/sec Loss 4.8011 LearningRate 0.0003 Epoch: 20 Global Step: 428460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:19,287-Speed 6309.47 samples/sec Loss 4.7803 LearningRate 0.0003 Epoch: 20 Global Step: 428470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:22,532-Speed 6313.57 samples/sec Loss 4.7250 LearningRate 0.0003 Epoch: 20 Global Step: 428480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:25,777-Speed 6313.75 samples/sec Loss 4.8042 LearningRate 0.0003 Epoch: 20 Global Step: 428490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:29,032-Speed 6292.35 samples/sec Loss 4.8376 LearningRate 0.0003 Epoch: 20 Global Step: 428500 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:42:32,269-Speed 6329.62 samples/sec Loss 4.7775 LearningRate 0.0003 Epoch: 20 Global Step: 428510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:35,516-Speed 6307.89 samples/sec Loss 4.8312 LearningRate 0.0003 Epoch: 20 Global Step: 428520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:38,761-Speed 6313.44 samples/sec Loss 4.8057 LearningRate 0.0003 Epoch: 20 Global Step: 428530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:42,009-Speed 6306.92 samples/sec Loss 4.8349 LearningRate 0.0003 Epoch: 20 Global Step: 428540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:45,255-Speed 6309.61 samples/sec Loss 4.7829 LearningRate 0.0003 Epoch: 20 Global Step: 428550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:48,499-Speed 6315.88 samples/sec Loss 4.7605 LearningRate 0.0003 Epoch: 20 Global Step: 428560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:51,741-Speed 6317.83 samples/sec Loss 4.7509 LearningRate 0.0003 Epoch: 20 Global Step: 428570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:54,990-Speed 6304.35 
samples/sec Loss 4.7775 LearningRate 0.0003 Epoch: 20 Global Step: 428580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:42:58,238-Speed 6306.99 samples/sec Loss 4.7942 LearningRate 0.0003 Epoch: 20 Global Step: 428590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:01,483-Speed 6311.96 samples/sec Loss 4.7317 LearningRate 0.0003 Epoch: 20 Global Step: 428600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:04,733-Speed 6303.85 samples/sec Loss 4.7796 LearningRate 0.0003 Epoch: 20 Global Step: 428610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:43:07,963-Speed 6342.68 samples/sec Loss 4.7211 LearningRate 0.0003 Epoch: 20 Global Step: 428620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:11,210-Speed 6306.93 samples/sec Loss 4.7761 LearningRate 0.0003 Epoch: 20 Global Step: 428630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:14,454-Speed 6316.50 samples/sec Loss 4.7797 LearningRate 0.0003 Epoch: 20 Global Step: 428640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:17,697-Speed 6316.39 samples/sec Loss 4.8512 LearningRate 0.0003 Epoch: 20 Global Step: 428650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:20,938-Speed 6320.06 samples/sec Loss 4.7806 LearningRate 0.0003 Epoch: 20 Global Step: 428660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:24,184-Speed 6311.72 samples/sec Loss 4.8505 LearningRate 0.0003 Epoch: 20 Global Step: 428670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:27,433-Speed 6305.05 samples/sec Loss 4.7944 LearningRate 0.0003 Epoch: 20 Global Step: 428680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:30,675-Speed 6317.63 samples/sec Loss 4.7950 LearningRate 0.0003 Epoch: 20 Global Step: 428690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:33,921-Speed 6311.66 samples/sec Loss 4.7370 
LearningRate 0.0003 Epoch: 20 Global Step: 428700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:37,164-Speed 6315.81 samples/sec Loss 4.7615 LearningRate 0.0003 Epoch: 20 Global Step: 428710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:40,413-Speed 6305.62 samples/sec Loss 4.8435 LearningRate 0.0003 Epoch: 20 Global Step: 428720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:43:43,660-Speed 6308.13 samples/sec Loss 4.7595 LearningRate 0.0003 Epoch: 20 Global Step: 428730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:43:46,894-Speed 6335.13 samples/sec Loss 4.7930 LearningRate 0.0003 Epoch: 20 Global Step: 428740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:50,137-Speed 6315.60 samples/sec Loss 4.8352 LearningRate 0.0003 Epoch: 20 Global Step: 428750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:53,380-Speed 6316.69 samples/sec Loss 4.7433 LearningRate 0.0003 Epoch: 20 Global Step: 428760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:56,625-Speed 6313.07 samples/sec Loss 4.7600 LearningRate 0.0003 Epoch: 20 Global Step: 428770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:43:59,870-Speed 6312.85 samples/sec Loss 4.8020 LearningRate 0.0003 Epoch: 20 Global Step: 428780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:03,117-Speed 6306.93 samples/sec Loss 4.8764 LearningRate 0.0003 Epoch: 20 Global Step: 428790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:06,363-Speed 6311.64 samples/sec Loss 4.7859 LearningRate 0.0003 Epoch: 20 Global Step: 428800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:09,608-Speed 6312.58 samples/sec Loss 4.7562 LearningRate 0.0003 Epoch: 20 Global Step: 428810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:12,854-Speed 6312.20 samples/sec Loss 4.8239 LearningRate 0.0003 Epoch: 20 
Global Step: 428820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:16,100-Speed 6310.23 samples/sec Loss 4.7519 LearningRate 0.0003 Epoch: 20 Global Step: 428830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:19,346-Speed 6311.09 samples/sec Loss 4.7757 LearningRate 0.0003 Epoch: 20 Global Step: 428840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:44:22,591-Speed 6312.28 samples/sec Loss 4.7388 LearningRate 0.0003 Epoch: 20 Global Step: 428850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:44:25,825-Speed 6334.05 samples/sec Loss 4.8345 LearningRate 0.0003 Epoch: 20 Global Step: 428860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:29,072-Speed 6308.65 samples/sec Loss 4.7803 LearningRate 0.0003 Epoch: 20 Global Step: 428870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:32,319-Speed 6308.43 samples/sec Loss 4.8015 LearningRate 0.0003 Epoch: 20 Global Step: 428880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:35,567-Speed 6307.93 samples/sec Loss 4.7953 LearningRate 0.0003 Epoch: 20 Global Step: 428890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:38,808-Speed 6321.15 samples/sec Loss 4.8484 LearningRate 0.0003 Epoch: 20 Global Step: 428900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:42,054-Speed 6310.25 samples/sec Loss 4.7569 LearningRate 0.0003 Epoch: 20 Global Step: 428910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:45,297-Speed 6317.31 samples/sec Loss 4.7973 LearningRate 0.0003 Epoch: 20 Global Step: 428920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:48,541-Speed 6313.74 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 428930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:51,786-Speed 6311.96 samples/sec Loss 4.8182 LearningRate 0.0003 Epoch: 20 Global Step: 428940 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:55,028-Speed 6318.83 samples/sec Loss 4.8208 LearningRate 0.0003 Epoch: 20 Global Step: 428950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:44:58,273-Speed 6313.04 samples/sec Loss 4.7547 LearningRate 0.0003 Epoch: 20 Global Step: 428960 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:45:01,505-Speed 6338.67 samples/sec Loss 4.8195 LearningRate 0.0003 Epoch: 20 Global Step: 428970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:04,746-Speed 6319.28 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 428980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:07,992-Speed 6312.09 samples/sec Loss 4.7735 LearningRate 0.0003 Epoch: 20 Global Step: 428990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:11,233-Speed 6320.01 samples/sec Loss 4.8020 LearningRate 0.0003 Epoch: 20 Global Step: 429000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:14,477-Speed 6314.65 samples/sec Loss 4.7546 LearningRate 0.0003 Epoch: 20 Global Step: 429010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:17,730-Speed 6297.83 samples/sec Loss 4.8135 LearningRate 0.0003 Epoch: 20 Global Step: 429020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:20,977-Speed 6307.70 samples/sec Loss 4.7882 LearningRate 0.0003 Epoch: 20 Global Step: 429030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:24,228-Speed 6301.83 samples/sec Loss 4.8120 LearningRate 0.0003 Epoch: 20 Global Step: 429040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:27,470-Speed 6317.69 samples/sec Loss 4.7032 LearningRate 0.0003 Epoch: 20 Global Step: 429050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:30,713-Speed 6316.67 samples/sec Loss 4.7642 LearningRate 0.0003 Epoch: 20 Global Step: 429060 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 06:45:33,956-Speed 6316.06 samples/sec Loss 4.8092 LearningRate 0.0003 Epoch: 20 Global Step: 429070 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:45:37,200-Speed 6316.27 samples/sec Loss 4.7279 LearningRate 0.0003 Epoch: 20 Global Step: 429080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:45:40,436-Speed 6331.77 samples/sec Loss 4.8222 LearningRate 0.0003 Epoch: 20 Global Step: 429090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:43,685-Speed 6304.25 samples/sec Loss 4.7416 LearningRate 0.0003 Epoch: 20 Global Step: 429100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:46,932-Speed 6309.57 samples/sec Loss 4.7403 LearningRate 0.0003 Epoch: 20 Global Step: 429110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:50,178-Speed 6311.66 samples/sec Loss 4.7898 LearningRate 0.0003 Epoch: 20 Global Step: 429120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:53,423-Speed 6312.16 samples/sec Loss 4.8076 LearningRate 0.0003 Epoch: 20 Global Step: 429130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:56,675-Speed 6299.06 samples/sec Loss 4.6787 LearningRate 0.0003 Epoch: 20 Global Step: 429140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:45:59,920-Speed 6312.72 samples/sec Loss 4.8154 LearningRate 0.0003 Epoch: 20 Global Step: 429150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:03,171-Speed 6301.71 samples/sec Loss 4.8213 LearningRate 0.0003 Epoch: 20 Global Step: 429160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:06,416-Speed 6310.80 samples/sec Loss 4.7586 LearningRate 0.0003 Epoch: 20 Global Step: 429170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:09,664-Speed 6307.95 samples/sec Loss 4.7482 LearningRate 0.0003 Epoch: 20 Global Step: 429180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
06:46:12,911-Speed 6308.00 samples/sec Loss 4.7303 LearningRate 0.0003 Epoch: 20 Global Step: 429190 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:46:16,145-Speed 6334.40 samples/sec Loss 4.7837 LearningRate 0.0003 Epoch: 20 Global Step: 429200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:19,388-Speed 6315.87 samples/sec Loss 4.7459 LearningRate 0.0003 Epoch: 20 Global Step: 429210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:22,636-Speed 6306.82 samples/sec Loss 4.7764 LearningRate 0.0003 Epoch: 20 Global Step: 429220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:25,968-Speed 6147.50 samples/sec Loss 4.7072 LearningRate 0.0003 Epoch: 20 Global Step: 429230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:29,266-Speed 6212.27 samples/sec Loss 4.8146 LearningRate 0.0003 Epoch: 20 Global Step: 429240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:32,521-Speed 6292.98 samples/sec Loss 4.7643 LearningRate 0.0003 Epoch: 20 Global Step: 429250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:35,768-Speed 6309.57 samples/sec Loss 4.8232 LearningRate 0.0003 Epoch: 20 Global Step: 429260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:39,012-Speed 6313.97 samples/sec Loss 4.8099 LearningRate 0.0003 Epoch: 20 Global Step: 429270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:42,258-Speed 6310.62 samples/sec Loss 4.8763 LearningRate 0.0003 Epoch: 20 Global Step: 429280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:45,505-Speed 6309.18 samples/sec Loss 4.8092 LearningRate 0.0003 Epoch: 20 Global Step: 429290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:48,748-Speed 6316.72 samples/sec Loss 4.8360 LearningRate 0.0003 Epoch: 20 Global Step: 429300 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:46:51,980-Speed 6339.19 
samples/sec Loss 4.8482 LearningRate 0.0003 Epoch: 20 Global Step: 429310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:55,223-Speed 6316.19 samples/sec Loss 4.8003 LearningRate 0.0003 Epoch: 20 Global Step: 429320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:46:58,463-Speed 6322.37 samples/sec Loss 4.7464 LearningRate 0.0003 Epoch: 20 Global Step: 429330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:01,705-Speed 6318.88 samples/sec Loss 4.7804 LearningRate 0.0003 Epoch: 20 Global Step: 429340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:04,948-Speed 6317.28 samples/sec Loss 4.7806 LearningRate 0.0003 Epoch: 20 Global Step: 429350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:08,192-Speed 6313.86 samples/sec Loss 4.7112 LearningRate 0.0003 Epoch: 20 Global Step: 429360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:11,441-Speed 6306.21 samples/sec Loss 4.7836 LearningRate 0.0003 Epoch: 20 Global Step: 429370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:14,688-Speed 6307.11 samples/sec Loss 4.7803 LearningRate 0.0003 Epoch: 20 Global Step: 429380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:17,935-Speed 6310.72 samples/sec Loss 4.7577 LearningRate 0.0003 Epoch: 20 Global Step: 429390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:21,180-Speed 6311.19 samples/sec Loss 4.8535 LearningRate 0.0003 Epoch: 20 Global Step: 429400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:24,424-Speed 6314.68 samples/sec Loss 4.8439 LearningRate 0.0003 Epoch: 20 Global Step: 429410 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:47:27,673-Speed 6306.28 samples/sec Loss 4.8241 LearningRate 0.0003 Epoch: 20 Global Step: 429420 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:47:30,905-Speed 6337.55 samples/sec Loss 4.7178 
LearningRate 0.0003 Epoch: 20 Global Step: 429430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:34,147-Speed 6318.76 samples/sec Loss 4.7829 LearningRate 0.0003 Epoch: 20 Global Step: 429440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:37,394-Speed 6309.10 samples/sec Loss 4.7904 LearningRate 0.0003 Epoch: 20 Global Step: 429450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:40,637-Speed 6316.75 samples/sec Loss 4.8194 LearningRate 0.0003 Epoch: 20 Global Step: 429460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:43,881-Speed 6314.35 samples/sec Loss 4.7950 LearningRate 0.0003 Epoch: 20 Global Step: 429470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:47,125-Speed 6314.43 samples/sec Loss 4.7909 LearningRate 0.0003 Epoch: 20 Global Step: 429480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:50,369-Speed 6315.58 samples/sec Loss 4.7949 LearningRate 0.0003 Epoch: 20 Global Step: 429490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:53,618-Speed 6305.89 samples/sec Loss 4.8402 LearningRate 0.0003 Epoch: 20 Global Step: 429500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:47:56,870-Speed 6299.32 samples/sec Loss 4.8446 LearningRate 0.0003 Epoch: 20 Global Step: 429510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:00,115-Speed 6312.91 samples/sec Loss 4.8071 LearningRate 0.0003 Epoch: 20 Global Step: 429520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:03,344-Speed 6342.47 samples/sec Loss 4.7903 LearningRate 0.0003 Epoch: 20 Global Step: 429530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:06,591-Speed 6309.28 samples/sec Loss 4.7456 LearningRate 0.0003 Epoch: 20 Global Step: 429540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:09,838-Speed 6309.77 samples/sec Loss 4.8129 LearningRate 0.0003 Epoch: 20 
Global Step: 429550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:13,085-Speed 6308.62 samples/sec Loss 4.7458 LearningRate 0.0003 Epoch: 20 Global Step: 429560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:16,337-Speed 6298.05 samples/sec Loss 4.8286 LearningRate 0.0003 Epoch: 20 Global Step: 429570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:19,582-Speed 6313.62 samples/sec Loss 4.7712 LearningRate 0.0003 Epoch: 20 Global Step: 429580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:22,829-Speed 6308.23 samples/sec Loss 4.7556 LearningRate 0.0003 Epoch: 20 Global Step: 429590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:26,074-Speed 6312.46 samples/sec Loss 4.7762 LearningRate 0.0003 Epoch: 20 Global Step: 429600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:29,319-Speed 6313.31 samples/sec Loss 4.8274 LearningRate 0.0003 Epoch: 20 Global Step: 429610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:32,561-Speed 6318.28 samples/sec Loss 4.7566 LearningRate 0.0003 Epoch: 20 Global Step: 429620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:35,795-Speed 6334.42 samples/sec Loss 4.7750 LearningRate 0.0003 Epoch: 20 Global Step: 429630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:39,051-Speed 6291.29 samples/sec Loss 4.8050 LearningRate 0.0003 Epoch: 20 Global Step: 429640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:42,295-Speed 6314.71 samples/sec Loss 4.7594 LearningRate 0.0003 Epoch: 20 Global Step: 429650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:45,539-Speed 6314.35 samples/sec Loss 4.8668 LearningRate 0.0003 Epoch: 20 Global Step: 429660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:48,782-Speed 6316.17 samples/sec Loss 4.7889 LearningRate 0.0003 Epoch: 20 Global Step: 429670 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:52,023-Speed 6321.25 samples/sec Loss 4.7986 LearningRate 0.0003 Epoch: 20 Global Step: 429680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:55,271-Speed 6307.44 samples/sec Loss 4.7585 LearningRate 0.0003 Epoch: 20 Global Step: 429690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:48:58,513-Speed 6318.36 samples/sec Loss 4.7290 LearningRate 0.0003 Epoch: 20 Global Step: 429700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:01,759-Speed 6308.88 samples/sec Loss 4.8050 LearningRate 0.0003 Epoch: 20 Global Step: 429710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:05,009-Speed 6303.42 samples/sec Loss 4.8119 LearningRate 0.0003 Epoch: 20 Global Step: 429720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:08,254-Speed 6314.88 samples/sec Loss 4.8185 LearningRate 0.0003 Epoch: 20 Global Step: 429730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:49:11,499-Speed 6310.94 samples/sec Loss 4.8842 LearningRate 0.0003 Epoch: 20 Global Step: 429740 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:49:14,742-Speed 6318.02 samples/sec Loss 4.7496 LearningRate 0.0003 Epoch: 20 Global Step: 429750 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:49:17,990-Speed 6305.45 samples/sec Loss 4.7723 LearningRate 0.0003 Epoch: 20 Global Step: 429760 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:49:21,221-Speed 6340.65 samples/sec Loss 4.8139 LearningRate 0.0003 Epoch: 20 Global Step: 429770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:24,468-Speed 6309.56 samples/sec Loss 4.6876 LearningRate 0.0003 Epoch: 20 Global Step: 429780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:27,715-Speed 6308.16 samples/sec Loss 4.7382 LearningRate 0.0003 Epoch: 20 Global Step: 429790 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 06:49:30,963-Speed 6306.75 samples/sec Loss 4.7677 LearningRate 0.0003 Epoch: 20 Global Step: 429800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:34,206-Speed 6317.20 samples/sec Loss 4.8210 LearningRate 0.0003 Epoch: 20 Global Step: 429810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:37,451-Speed 6313.01 samples/sec Loss 4.7152 LearningRate 0.0003 Epoch: 20 Global Step: 429820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:40,696-Speed 6312.04 samples/sec Loss 4.8321 LearningRate 0.0003 Epoch: 20 Global Step: 429830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:43,942-Speed 6310.41 samples/sec Loss 4.8075 LearningRate 0.0003 Epoch: 20 Global Step: 429840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:47,192-Speed 6302.10 samples/sec Loss 4.8374 LearningRate 0.0003 Epoch: 20 Global Step: 429850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:50,437-Speed 6312.61 samples/sec Loss 4.8027 LearningRate 0.0003 Epoch: 20 Global Step: 429860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:53,668-Speed 6341.45 samples/sec Loss 4.8549 LearningRate 0.0003 Epoch: 20 Global Step: 429870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:49:56,914-Speed 6310.37 samples/sec Loss 4.7249 LearningRate 0.0003 Epoch: 20 Global Step: 429880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:00,158-Speed 6313.47 samples/sec Loss 4.7745 LearningRate 0.0003 Epoch: 20 Global Step: 429890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:03,405-Speed 6308.89 samples/sec Loss 4.8170 LearningRate 0.0003 Epoch: 20 Global Step: 429900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:06,654-Speed 6305.52 samples/sec Loss 4.8175 LearningRate 0.0003 Epoch: 20 Global Step: 429910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
06:50:09,900-Speed 6309.99 samples/sec Loss 4.7258 LearningRate 0.0003 Epoch: 20 Global Step: 429920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:13,153-Speed 6298.75 samples/sec Loss 4.7982 LearningRate 0.0003 Epoch: 20 Global Step: 429930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:16,395-Speed 6318.59 samples/sec Loss 4.7519 LearningRate 0.0003 Epoch: 20 Global Step: 429940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:19,642-Speed 6309.55 samples/sec Loss 4.7389 LearningRate 0.0003 Epoch: 20 Global Step: 429950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:22,886-Speed 6314.06 samples/sec Loss 4.8452 LearningRate 0.0003 Epoch: 20 Global Step: 429960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:26,127-Speed 6320.28 samples/sec Loss 4.7811 LearningRate 0.0003 Epoch: 20 Global Step: 429970 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:50:29,362-Speed 6331.99 samples/sec Loss 4.7497 LearningRate 0.0003 Epoch: 20 Global Step: 429980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:32,606-Speed 6314.53 samples/sec Loss 4.7836 LearningRate 0.0003 Epoch: 20 Global Step: 429990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:35,853-Speed 6309.23 samples/sec Loss 4.8179 LearningRate 0.0003 Epoch: 20 Global Step: 430000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:39,104-Speed 6301.38 samples/sec Loss 4.7728 LearningRate 0.0003 Epoch: 20 Global Step: 430010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:42,353-Speed 6304.58 samples/sec Loss 4.7126 LearningRate 0.0003 Epoch: 20 Global Step: 430020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:45,597-Speed 6313.80 samples/sec Loss 4.7637 LearningRate 0.0003 Epoch: 20 Global Step: 430030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:48,840-Speed 6318.06 
samples/sec Loss 4.8035 LearningRate 0.0003 Epoch: 20 Global Step: 430040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:52,086-Speed 6310.15 samples/sec Loss 4.8031 LearningRate 0.0003 Epoch: 20 Global Step: 430050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:55,332-Speed 6309.54 samples/sec Loss 4.7751 LearningRate 0.0003 Epoch: 20 Global Step: 430060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:50:58,583-Speed 6302.15 samples/sec Loss 4.7792 LearningRate 0.0003 Epoch: 20 Global Step: 430070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:01,829-Speed 6310.81 samples/sec Loss 4.7958 LearningRate 0.0003 Epoch: 20 Global Step: 430080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:51:05,060-Speed 6338.23 samples/sec Loss 4.7191 LearningRate 0.0003 Epoch: 20 Global Step: 430090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:08,305-Speed 6314.22 samples/sec Loss 4.7884 LearningRate 0.0003 Epoch: 20 Global Step: 430100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:11,552-Speed 6308.89 samples/sec Loss 4.7674 LearningRate 0.0003 Epoch: 20 Global Step: 430110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:14,797-Speed 6311.39 samples/sec Loss 4.8055 LearningRate 0.0003 Epoch: 20 Global Step: 430120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:18,048-Speed 6303.40 samples/sec Loss 4.7487 LearningRate 0.0003 Epoch: 20 Global Step: 430130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:21,307-Speed 6283.91 samples/sec Loss 4.7496 LearningRate 0.0003 Epoch: 20 Global Step: 430140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:24,552-Speed 6313.60 samples/sec Loss 4.8045 LearningRate 0.0003 Epoch: 20 Global Step: 430150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:27,801-Speed 6305.40 samples/sec Loss 4.7156 
LearningRate 0.0003 Epoch: 20 Global Step: 430160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:31,043-Speed 6317.99 samples/sec Loss 4.8161 LearningRate 0.0003 Epoch: 20 Global Step: 430170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:34,285-Speed 6319.04 samples/sec Loss 4.8378 LearningRate 0.0003 Epoch: 20 Global Step: 430180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:37,522-Speed 6328.45 samples/sec Loss 4.8233 LearningRate 0.0003 Epoch: 20 Global Step: 430190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:40,763-Speed 6320.47 samples/sec Loss 4.7507 LearningRate 0.0003 Epoch: 20 Global Step: 430200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:44,007-Speed 6312.98 samples/sec Loss 4.7397 LearningRate 0.0003 Epoch: 20 Global Step: 430210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:47,261-Speed 6296.56 samples/sec Loss 4.7964 LearningRate 0.0003 Epoch: 20 Global Step: 430220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:50,503-Speed 6318.28 samples/sec Loss 4.8087 LearningRate 0.0003 Epoch: 20 Global Step: 430230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:53,754-Speed 6300.11 samples/sec Loss 4.8483 LearningRate 0.0003 Epoch: 20 Global Step: 430240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:51:57,020-Speed 6273.49 samples/sec Loss 4.7391 LearningRate 0.0003 Epoch: 20 Global Step: 430250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:00,268-Speed 6306.45 samples/sec Loss 4.6953 LearningRate 0.0003 Epoch: 20 Global Step: 430260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:03,514-Speed 6310.54 samples/sec Loss 4.8526 LearningRate 0.0003 Epoch: 20 Global Step: 430270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:06,762-Speed 6305.63 samples/sec Loss 4.8199 LearningRate 0.0003 Epoch: 20 
Global Step: 430280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:09,994-Speed 6339.42 samples/sec Loss 4.8404 LearningRate 0.0003 Epoch: 20 Global Step: 430290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:13,238-Speed 6314.84 samples/sec Loss 4.8859 LearningRate 0.0003 Epoch: 20 Global Step: 430300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:16,485-Speed 6307.93 samples/sec Loss 4.7503 LearningRate 0.0003 Epoch: 20 Global Step: 430310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:19,728-Speed 6315.75 samples/sec Loss 4.7536 LearningRate 0.0003 Epoch: 20 Global Step: 430320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:23,038-Speed 6190.19 samples/sec Loss 4.7966 LearningRate 0.0003 Epoch: 20 Global Step: 430330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:26,311-Speed 6259.20 samples/sec Loss 4.7351 LearningRate 0.0003 Epoch: 20 Global Step: 430340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:29,559-Speed 6306.22 samples/sec Loss 4.7502 LearningRate 0.0003 Epoch: 20 Global Step: 430350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:32,805-Speed 6311.43 samples/sec Loss 4.7582 LearningRate 0.0003 Epoch: 20 Global Step: 430360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:36,052-Speed 6307.63 samples/sec Loss 4.7665 LearningRate 0.0003 Epoch: 20 Global Step: 430370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:39,295-Speed 6316.90 samples/sec Loss 4.7914 LearningRate 0.0003 Epoch: 20 Global Step: 430380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:42,541-Speed 6309.97 samples/sec Loss 4.7443 LearningRate 0.0003 Epoch: 20 Global Step: 430390 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:52:45,787-Speed 6310.85 samples/sec Loss 4.7938 LearningRate 0.0003 Epoch: 20 Global Step: 430400 Fp16 Grad 
Scale: 32768 Required: 36 hours Training: 2022-04-02 06:52:49,022-Speed 6333.46 samples/sec Loss 4.7115 LearningRate 0.0003 Epoch: 20 Global Step: 430410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:52,270-Speed 6306.46 samples/sec Loss 4.7692 LearningRate 0.0003 Epoch: 20 Global Step: 430420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:55,518-Speed 6307.23 samples/sec Loss 4.7742 LearningRate 0.0003 Epoch: 20 Global Step: 430430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:52:58,763-Speed 6312.69 samples/sec Loss 4.7520 LearningRate 0.0003 Epoch: 20 Global Step: 430440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:02,013-Speed 6302.21 samples/sec Loss 4.8731 LearningRate 0.0003 Epoch: 20 Global Step: 430450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:05,260-Speed 6307.97 samples/sec Loss 4.8085 LearningRate 0.0003 Epoch: 20 Global Step: 430460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:08,507-Speed 6310.31 samples/sec Loss 4.8114 LearningRate 0.0003 Epoch: 20 Global Step: 430470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:11,751-Speed 6314.49 samples/sec Loss 4.7712 LearningRate 0.0003 Epoch: 20 Global Step: 430480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:14,992-Speed 6319.47 samples/sec Loss 4.7712 LearningRate 0.0003 Epoch: 20 Global Step: 430490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:18,241-Speed 6305.87 samples/sec Loss 4.7462 LearningRate 0.0003 Epoch: 20 Global Step: 430500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:21,475-Speed 6333.16 samples/sec Loss 4.7267 LearningRate 0.0003 Epoch: 20 Global Step: 430510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:24,716-Speed 6321.86 samples/sec Loss 4.7283 LearningRate 0.0003 Epoch: 20 Global Step: 430520 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 06:53:27,963-Speed 6308.88 samples/sec Loss 4.8020 LearningRate 0.0003 Epoch: 20 Global Step: 430530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:31,208-Speed 6311.99 samples/sec Loss 4.6729 LearningRate 0.0003 Epoch: 20 Global Step: 430540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:34,458-Speed 6304.17 samples/sec Loss 4.8022 LearningRate 0.0003 Epoch: 20 Global Step: 430550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:37,703-Speed 6313.21 samples/sec Loss 4.7527 LearningRate 0.0003 Epoch: 20 Global Step: 430560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:40,944-Speed 6319.45 samples/sec Loss 4.7718 LearningRate 0.0003 Epoch: 20 Global Step: 430570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:44,186-Speed 6318.48 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 430580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:47,429-Speed 6315.39 samples/sec Loss 4.7720 LearningRate 0.0003 Epoch: 20 Global Step: 430590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:50,673-Speed 6314.80 samples/sec Loss 4.8419 LearningRate 0.0003 Epoch: 20 Global Step: 430600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:53:53,919-Speed 6310.64 samples/sec Loss 4.7300 LearningRate 0.0003 Epoch: 20 Global Step: 430610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:53:57,164-Speed 6313.55 samples/sec Loss 4.8708 LearningRate 0.0003 Epoch: 20 Global Step: 430620 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:54:00,414-Speed 6303.56 samples/sec Loss 4.7892 LearningRate 0.0003 Epoch: 20 Global Step: 430630 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:54:03,647-Speed 6334.75 samples/sec Loss 4.8038 LearningRate 0.0003 Epoch: 20 Global Step: 430640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
06:54:06,890-Speed 6316.69 samples/sec Loss 4.7920 LearningRate 0.0003 Epoch: 20 Global Step: 430650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:10,135-Speed 6313.15 samples/sec Loss 4.7334 LearningRate 0.0003 Epoch: 20 Global Step: 430660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:13,380-Speed 6312.06 samples/sec Loss 4.8159 LearningRate 0.0003 Epoch: 20 Global Step: 430670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:16,624-Speed 6315.31 samples/sec Loss 4.7060 LearningRate 0.0003 Epoch: 20 Global Step: 430680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:19,873-Speed 6305.50 samples/sec Loss 4.7121 LearningRate 0.0003 Epoch: 20 Global Step: 430690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:23,120-Speed 6307.58 samples/sec Loss 4.8047 LearningRate 0.0003 Epoch: 20 Global Step: 430700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:26,374-Speed 6294.98 samples/sec Loss 4.7796 LearningRate 0.0003 Epoch: 20 Global Step: 430710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:29,622-Speed 6308.13 samples/sec Loss 4.7771 LearningRate 0.0003 Epoch: 20 Global Step: 430720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:32,866-Speed 6314.14 samples/sec Loss 4.8025 LearningRate 0.0003 Epoch: 20 Global Step: 430730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:36,101-Speed 6332.51 samples/sec Loss 4.7362 LearningRate 0.0003 Epoch: 20 Global Step: 430740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:39,351-Speed 6304.22 samples/sec Loss 4.8122 LearningRate 0.0003 Epoch: 20 Global Step: 430750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:42,594-Speed 6314.99 samples/sec Loss 4.8328 LearningRate 0.0003 Epoch: 20 Global Step: 430760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:45,837-Speed 6315.96 
samples/sec Loss 4.7345 LearningRate 0.0003 Epoch: 20 Global Step: 430770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:49,083-Speed 6311.96 samples/sec Loss 4.7839 LearningRate 0.0003 Epoch: 20 Global Step: 430780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:52,331-Speed 6306.60 samples/sec Loss 4.8437 LearningRate 0.0003 Epoch: 20 Global Step: 430790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:55,581-Speed 6302.47 samples/sec Loss 4.7730 LearningRate 0.0003 Epoch: 20 Global Step: 430800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:54:58,830-Speed 6305.21 samples/sec Loss 4.7814 LearningRate 0.0003 Epoch: 20 Global Step: 430810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:02,077-Speed 6308.56 samples/sec Loss 4.8186 LearningRate 0.0003 Epoch: 20 Global Step: 430820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:05,323-Speed 6311.02 samples/sec Loss 4.8104 LearningRate 0.0003 Epoch: 20 Global Step: 430830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:08,571-Speed 6307.51 samples/sec Loss 4.7689 LearningRate 0.0003 Epoch: 20 Global Step: 430840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:55:11,813-Speed 6318.52 samples/sec Loss 4.7639 LearningRate 0.0003 Epoch: 20 Global Step: 430850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:55:15,059-Speed 6310.19 samples/sec Loss 4.7678 LearningRate 0.0003 Epoch: 20 Global Step: 430860 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:55:18,310-Speed 6301.32 samples/sec Loss 4.8075 LearningRate 0.0003 Epoch: 20 Global Step: 430870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:21,556-Speed 6309.03 samples/sec Loss 4.7763 LearningRate 0.0003 Epoch: 20 Global Step: 430880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:24,810-Speed 6295.52 samples/sec Loss 4.7433 
LearningRate 0.0003 Epoch: 20 Global Step: 430890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:28,061-Speed 6301.90 samples/sec Loss 4.8024 LearningRate 0.0003 Epoch: 20 Global Step: 430900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:31,307-Speed 6310.14 samples/sec Loss 4.8293 LearningRate 0.0003 Epoch: 20 Global Step: 430910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:34,554-Speed 6308.61 samples/sec Loss 4.7550 LearningRate 0.0003 Epoch: 20 Global Step: 430920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:37,808-Speed 6296.51 samples/sec Loss 4.7981 LearningRate 0.0003 Epoch: 20 Global Step: 430930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:41,057-Speed 6305.34 samples/sec Loss 4.7831 LearningRate 0.0003 Epoch: 20 Global Step: 430940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:44,301-Speed 6314.09 samples/sec Loss 4.8002 LearningRate 0.0003 Epoch: 20 Global Step: 430950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:47,548-Speed 6308.37 samples/sec Loss 4.8012 LearningRate 0.0003 Epoch: 20 Global Step: 430960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:50,796-Speed 6307.70 samples/sec Loss 4.8106 LearningRate 0.0003 Epoch: 20 Global Step: 430970 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:55:54,028-Speed 6337.61 samples/sec Loss 4.7772 LearningRate 0.0003 Epoch: 20 Global Step: 430980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:55:57,271-Speed 6316.41 samples/sec Loss 4.7649 LearningRate 0.0003 Epoch: 20 Global Step: 430990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:00,515-Speed 6314.67 samples/sec Loss 4.7855 LearningRate 0.0003 Epoch: 20 Global Step: 431000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:03,759-Speed 6314.90 samples/sec Loss 4.7267 LearningRate 0.0003 Epoch: 20 
Global Step: 431010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:07,001-Speed 6318.15 samples/sec Loss 4.6821 LearningRate 0.0003 Epoch: 20 Global Step: 431020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:10,250-Speed 6305.81 samples/sec Loss 4.8031 LearningRate 0.0003 Epoch: 20 Global Step: 431030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:13,495-Speed 6312.55 samples/sec Loss 4.8110 LearningRate 0.0003 Epoch: 20 Global Step: 431040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:16,738-Speed 6316.26 samples/sec Loss 4.7629 LearningRate 0.0003 Epoch: 20 Global Step: 431050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:19,981-Speed 6316.60 samples/sec Loss 4.7564 LearningRate 0.0003 Epoch: 20 Global Step: 431060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:23,227-Speed 6309.84 samples/sec Loss 4.7782 LearningRate 0.0003 Epoch: 20 Global Step: 431070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:26,482-Speed 6294.73 samples/sec Loss 4.7766 LearningRate 0.0003 Epoch: 20 Global Step: 431080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:56:29,727-Speed 6311.40 samples/sec Loss 4.7291 LearningRate 0.0003 Epoch: 20 Global Step: 431090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:56:32,972-Speed 6312.58 samples/sec Loss 4.7871 LearningRate 0.0003 Epoch: 20 Global Step: 431100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:56:36,232-Speed 6284.36 samples/sec Loss 4.8223 LearningRate 0.0003 Epoch: 20 Global Step: 431110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:56:39,474-Speed 6318.64 samples/sec Loss 4.7779 LearningRate 0.0003 Epoch: 20 Global Step: 431120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:42,719-Speed 6311.29 samples/sec Loss 4.8209 LearningRate 0.0003 Epoch: 20 Global Step: 431130 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:45,966-Speed 6309.50 samples/sec Loss 4.7824 LearningRate 0.0003 Epoch: 20 Global Step: 431140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:49,212-Speed 6312.58 samples/sec Loss 4.8282 LearningRate 0.0003 Epoch: 20 Global Step: 431150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:52,457-Speed 6310.97 samples/sec Loss 4.7891 LearningRate 0.0003 Epoch: 20 Global Step: 431160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:55,704-Speed 6310.16 samples/sec Loss 4.7367 LearningRate 0.0003 Epoch: 20 Global Step: 431170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:56:58,950-Speed 6310.84 samples/sec Loss 4.7747 LearningRate 0.0003 Epoch: 20 Global Step: 431180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:02,193-Speed 6317.13 samples/sec Loss 4.7450 LearningRate 0.0003 Epoch: 20 Global Step: 431190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:05,439-Speed 6309.00 samples/sec Loss 4.7983 LearningRate 0.0003 Epoch: 20 Global Step: 431200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:08,684-Speed 6312.90 samples/sec Loss 4.7907 LearningRate 0.0003 Epoch: 20 Global Step: 431210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:11,933-Speed 6305.20 samples/sec Loss 4.7844 LearningRate 0.0003 Epoch: 20 Global Step: 431220 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:57:15,164-Speed 6341.69 samples/sec Loss 4.7637 LearningRate 0.0003 Epoch: 20 Global Step: 431230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:18,408-Speed 6313.38 samples/sec Loss 4.7524 LearningRate 0.0003 Epoch: 20 Global Step: 431240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:21,660-Speed 6299.29 samples/sec Loss 4.8055 LearningRate 0.0003 Epoch: 20 Global Step: 431250 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 06:57:24,906-Speed 6310.12 samples/sec Loss 4.7721 LearningRate 0.0003 Epoch: 20 Global Step: 431260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:28,151-Speed 6313.05 samples/sec Loss 4.8823 LearningRate 0.0003 Epoch: 20 Global Step: 431270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:31,399-Speed 6306.35 samples/sec Loss 4.8587 LearningRate 0.0003 Epoch: 20 Global Step: 431280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:34,646-Speed 6309.39 samples/sec Loss 4.8435 LearningRate 0.0003 Epoch: 20 Global Step: 431290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:37,892-Speed 6309.73 samples/sec Loss 4.7571 LearningRate 0.0003 Epoch: 20 Global Step: 431300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:41,138-Speed 6310.52 samples/sec Loss 4.8003 LearningRate 0.0003 Epoch: 20 Global Step: 431310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:44,385-Speed 6309.08 samples/sec Loss 4.7574 LearningRate 0.0003 Epoch: 20 Global Step: 431320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:47,630-Speed 6312.43 samples/sec Loss 4.7304 LearningRate 0.0003 Epoch: 20 Global Step: 431330 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:57:50,859-Speed 6344.07 samples/sec Loss 4.7222 LearningRate 0.0003 Epoch: 20 Global Step: 431340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:54,109-Speed 6304.83 samples/sec Loss 4.7688 LearningRate 0.0003 Epoch: 20 Global Step: 431350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:57:57,352-Speed 6315.33 samples/sec Loss 4.8047 LearningRate 0.0003 Epoch: 20 Global Step: 431360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:00,600-Speed 6308.72 samples/sec Loss 4.6646 LearningRate 0.0003 Epoch: 20 Global Step: 431370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
06:58:03,851-Speed 6299.48 samples/sec Loss 4.7439 LearningRate 0.0003 Epoch: 20 Global Step: 431380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:07,099-Speed 6307.80 samples/sec Loss 4.7821 LearningRate 0.0003 Epoch: 20 Global Step: 431390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:10,344-Speed 6312.11 samples/sec Loss 4.7977 LearningRate 0.0003 Epoch: 20 Global Step: 431400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:13,588-Speed 6314.37 samples/sec Loss 4.7842 LearningRate 0.0003 Epoch: 20 Global Step: 431410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:16,831-Speed 6316.17 samples/sec Loss 4.7778 LearningRate 0.0003 Epoch: 20 Global Step: 431420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:20,076-Speed 6313.58 samples/sec Loss 4.6981 LearningRate 0.0003 Epoch: 20 Global Step: 431430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:23,319-Speed 6315.49 samples/sec Loss 4.7405 LearningRate 0.0003 Epoch: 20 Global Step: 431440 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:58:26,565-Speed 6311.11 samples/sec Loss 4.7780 LearningRate 0.0003 Epoch: 20 Global Step: 431450 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:58:29,813-Speed 6307.13 samples/sec Loss 4.7543 LearningRate 0.0003 Epoch: 20 Global Step: 431460 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:58:33,043-Speed 6342.27 samples/sec Loss 4.7841 LearningRate 0.0003 Epoch: 20 Global Step: 431470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:36,292-Speed 6305.40 samples/sec Loss 4.7872 LearningRate 0.0003 Epoch: 20 Global Step: 431480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:39,536-Speed 6314.14 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 431490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:42,778-Speed 6318.85 
samples/sec Loss 4.8594 LearningRate 0.0003 Epoch: 20 Global Step: 431500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:46,030-Speed 6298.28 samples/sec Loss 4.7396 LearningRate 0.0003 Epoch: 20 Global Step: 431510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:49,277-Speed 6308.54 samples/sec Loss 4.7591 LearningRate 0.0003 Epoch: 20 Global Step: 431520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:52,519-Speed 6318.15 samples/sec Loss 4.7901 LearningRate 0.0003 Epoch: 20 Global Step: 431530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:55,764-Speed 6313.10 samples/sec Loss 4.8238 LearningRate 0.0003 Epoch: 20 Global Step: 431540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:58:59,007-Speed 6317.46 samples/sec Loss 4.7978 LearningRate 0.0003 Epoch: 20 Global Step: 431550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:02,250-Speed 6317.41 samples/sec Loss 4.7440 LearningRate 0.0003 Epoch: 20 Global Step: 431560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:05,499-Speed 6304.94 samples/sec Loss 4.7484 LearningRate 0.0003 Epoch: 20 Global Step: 431570 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:59:08,745-Speed 6310.83 samples/sec Loss 4.8004 LearningRate 0.0003 Epoch: 20 Global Step: 431580 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 06:59:11,974-Speed 6343.63 samples/sec Loss 4.8014 LearningRate 0.0003 Epoch: 20 Global Step: 431590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:15,232-Speed 6286.06 samples/sec Loss 4.8038 LearningRate 0.0003 Epoch: 20 Global Step: 431600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:18,482-Speed 6304.59 samples/sec Loss 4.7782 LearningRate 0.0003 Epoch: 20 Global Step: 431610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:21,725-Speed 6314.74 samples/sec Loss 4.7186 
LearningRate 0.0003 Epoch: 20 Global Step: 431620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:24,972-Speed 6310.48 samples/sec Loss 4.7502 LearningRate 0.0003 Epoch: 20 Global Step: 431630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:28,223-Speed 6299.69 samples/sec Loss 4.7625 LearningRate 0.0003 Epoch: 20 Global Step: 431640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:31,470-Speed 6309.82 samples/sec Loss 4.7752 LearningRate 0.0003 Epoch: 20 Global Step: 431650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:34,715-Speed 6311.88 samples/sec Loss 4.8780 LearningRate 0.0003 Epoch: 20 Global Step: 431660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:37,960-Speed 6312.46 samples/sec Loss 4.7548 LearningRate 0.0003 Epoch: 20 Global Step: 431670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:41,211-Speed 6300.46 samples/sec Loss 4.7252 LearningRate 0.0003 Epoch: 20 Global Step: 431680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:44,444-Speed 6337.98 samples/sec Loss 4.6728 LearningRate 0.0003 Epoch: 20 Global Step: 431690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:47,691-Speed 6307.33 samples/sec Loss 4.7001 LearningRate 0.0003 Epoch: 20 Global Step: 431700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:50,939-Speed 6306.87 samples/sec Loss 4.7431 LearningRate 0.0003 Epoch: 20 Global Step: 431710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:54,186-Speed 6309.61 samples/sec Loss 4.8337 LearningRate 0.0003 Epoch: 20 Global Step: 431720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 06:59:57,432-Speed 6310.28 samples/sec Loss 4.7747 LearningRate 0.0003 Epoch: 20 Global Step: 431730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:00,681-Speed 6304.23 samples/sec Loss 4.7059 LearningRate 0.0003 Epoch: 20 
Global Step: 431740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:03,929-Speed 6306.98 samples/sec Loss 4.8052 LearningRate 0.0003 Epoch: 20 Global Step: 431750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:07,177-Speed 6308.40 samples/sec Loss 4.7830 LearningRate 0.0003 Epoch: 20 Global Step: 431760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:10,420-Speed 6316.90 samples/sec Loss 4.8167 LearningRate 0.0003 Epoch: 20 Global Step: 431770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:13,667-Speed 6308.00 samples/sec Loss 4.7377 LearningRate 0.0003 Epoch: 20 Global Step: 431780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:16,914-Speed 6309.02 samples/sec Loss 4.7210 LearningRate 0.0003 Epoch: 20 Global Step: 431790 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:00:20,148-Speed 6334.51 samples/sec Loss 4.7921 LearningRate 0.0003 Epoch: 20 Global Step: 431800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:23,391-Speed 6315.57 samples/sec Loss 4.7523 LearningRate 0.0003 Epoch: 20 Global Step: 431810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:26,637-Speed 6310.88 samples/sec Loss 4.7354 LearningRate 0.0003 Epoch: 20 Global Step: 431820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:29,883-Speed 6310.68 samples/sec Loss 4.7761 LearningRate 0.0003 Epoch: 20 Global Step: 431830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:33,125-Speed 6318.68 samples/sec Loss 4.7337 LearningRate 0.0003 Epoch: 20 Global Step: 431840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:36,382-Speed 6288.87 samples/sec Loss 4.7228 LearningRate 0.0003 Epoch: 20 Global Step: 431850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:39,628-Speed 6311.55 samples/sec Loss 4.7671 LearningRate 0.0003 Epoch: 20 Global Step: 431860 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:42,874-Speed 6310.22 samples/sec Loss 4.8006 LearningRate 0.0003 Epoch: 20 Global Step: 431870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:46,122-Speed 6307.26 samples/sec Loss 4.6974 LearningRate 0.0003 Epoch: 20 Global Step: 431880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:49,367-Speed 6312.21 samples/sec Loss 4.7058 LearningRate 0.0003 Epoch: 20 Global Step: 431890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:52,605-Speed 6326.97 samples/sec Loss 4.7376 LearningRate 0.0003 Epoch: 20 Global Step: 431900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:55,857-Speed 6299.54 samples/sec Loss 4.8141 LearningRate 0.0003 Epoch: 20 Global Step: 431910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:00:59,102-Speed 6312.51 samples/sec Loss 4.8576 LearningRate 0.0003 Epoch: 20 Global Step: 431920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:02,346-Speed 6314.67 samples/sec Loss 4.7471 LearningRate 0.0003 Epoch: 20 Global Step: 431930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:05,589-Speed 6316.38 samples/sec Loss 4.7312 LearningRate 0.0003 Epoch: 20 Global Step: 431940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:08,832-Speed 6315.45 samples/sec Loss 4.7683 LearningRate 0.0003 Epoch: 20 Global Step: 431950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:12,075-Speed 6315.75 samples/sec Loss 4.8095 LearningRate 0.0003 Epoch: 20 Global Step: 431960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:15,322-Speed 6309.08 samples/sec Loss 4.6969 LearningRate 0.0003 Epoch: 20 Global Step: 431970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:18,568-Speed 6312.36 samples/sec Loss 4.7795 LearningRate 0.0003 Epoch: 20 Global Step: 431980 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:01:21,813-Speed 6313.38 samples/sec Loss 4.8075 LearningRate 0.0003 Epoch: 20 Global Step: 431990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:25,061-Speed 6306.25 samples/sec Loss 4.7117 LearningRate 0.0003 Epoch: 20 Global Step: 432000 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:01:28,311-Speed 6302.51 samples/sec Loss 4.7632 LearningRate 0.0003 Epoch: 20 Global Step: 432010 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:01:31,557-Speed 6311.58 samples/sec Loss 4.7854 LearningRate 0.0003 Epoch: 20 Global Step: 432020 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:01:34,786-Speed 6344.01 samples/sec Loss 4.7398 LearningRate 0.0003 Epoch: 20 Global Step: 432030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:38,032-Speed 6310.69 samples/sec Loss 4.7391 LearningRate 0.0003 Epoch: 20 Global Step: 432040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:01:41,267-Speed 6331.08 samples/sec Loss 4.7986 LearningRate 0.0003 Epoch: 20 Global Step: 432050 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:01:44,516-Speed 6305.86 samples/sec Loss 4.8126 LearningRate 0.0003 Epoch: 20 Global Step: 432060 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:01:47,761-Speed 6313.01 samples/sec Loss 4.7917 LearningRate 0.0003 Epoch: 20 Global Step: 432070 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:01:51,006-Speed 6310.80 samples/sec Loss 4.7751 LearningRate 0.0003 Epoch: 20 Global Step: 432080 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:01:54,251-Speed 6313.51 samples/sec Loss 4.7737 LearningRate 0.0003 Epoch: 20 Global Step: 432090 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:01:57,496-Speed 6312.49 samples/sec Loss 4.7827 LearningRate 0.0003 Epoch: 20 Global Step: 432100 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 
07:02:00,739-Speed 6316.82 samples/sec Loss 4.8019 LearningRate 0.0003 Epoch: 20 Global Step: 432110 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:02:03,988-Speed 6305.78 samples/sec Loss 4.7666 LearningRate 0.0003 Epoch: 20 Global Step: 432120 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:02:07,233-Speed 6311.70 samples/sec Loss 4.7608 LearningRate 0.0003 Epoch: 20 Global Step: 432130 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:02:10,480-Speed 6307.79 samples/sec Loss 4.8497 LearningRate 0.0003 Epoch: 20 Global Step: 432140 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:02:13,731-Speed 6301.38 samples/sec Loss 4.7909 LearningRate 0.0003 Epoch: 20 Global Step: 432150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:16,990-Speed 6285.80 samples/sec Loss 4.7252 LearningRate 0.0003 Epoch: 20 Global Step: 432160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:20,260-Speed 6265.46 samples/sec Loss 4.6737 LearningRate 0.0003 Epoch: 20 Global Step: 432170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:23,504-Speed 6314.46 samples/sec Loss 4.7487 LearningRate 0.0003 Epoch: 20 Global Step: 432180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:26,749-Speed 6311.04 samples/sec Loss 4.7281 LearningRate 0.0003 Epoch: 20 Global Step: 432190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:29,995-Speed 6311.70 samples/sec Loss 4.7861 LearningRate 0.0003 Epoch: 20 Global Step: 432200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:33,242-Speed 6309.88 samples/sec Loss 4.7529 LearningRate 0.0003 Epoch: 20 Global Step: 432210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:36,488-Speed 6311.65 samples/sec Loss 4.7782 LearningRate 0.0003 Epoch: 20 Global Step: 432220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:39,733-Speed 6311.04 
samples/sec Loss 4.7621 LearningRate 0.0003 Epoch: 20 Global Step: 432230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:42,976-Speed 6317.90 samples/sec Loss 4.7489 LearningRate 0.0003 Epoch: 20 Global Step: 432240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:46,221-Speed 6311.50 samples/sec Loss 4.7345 LearningRate 0.0003 Epoch: 20 Global Step: 432250 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:02:49,452-Speed 6339.98 samples/sec Loss 4.7340 LearningRate 0.0003 Epoch: 20 Global Step: 432260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:52,696-Speed 6314.25 samples/sec Loss 4.7528 LearningRate 0.0003 Epoch: 20 Global Step: 432270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:55,942-Speed 6312.17 samples/sec Loss 4.7005 LearningRate 0.0003 Epoch: 20 Global Step: 432280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:02:59,184-Speed 6317.95 samples/sec Loss 4.7801 LearningRate 0.0003 Epoch: 20 Global Step: 432290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:02,431-Speed 6309.24 samples/sec Loss 4.7420 LearningRate 0.0003 Epoch: 20 Global Step: 432300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:05,675-Speed 6314.38 samples/sec Loss 4.7352 LearningRate 0.0003 Epoch: 20 Global Step: 432310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:08,921-Speed 6311.38 samples/sec Loss 4.8065 LearningRate 0.0003 Epoch: 20 Global Step: 432320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:12,167-Speed 6308.86 samples/sec Loss 4.7439 LearningRate 0.0003 Epoch: 20 Global Step: 432330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:15,425-Speed 6287.71 samples/sec Loss 4.7321 LearningRate 0.0003 Epoch: 20 Global Step: 432340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:18,681-Speed 6291.55 samples/sec Loss 4.7213 
LearningRate 0.0003 Epoch: 20 Global Step: 432350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:21,924-Speed 6316.22 samples/sec Loss 4.7465 LearningRate 0.0003 Epoch: 20 Global Step: 432360 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:03:25,157-Speed 6337.21 samples/sec Loss 4.8164 LearningRate 0.0003 Epoch: 20 Global Step: 432370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:28,409-Speed 6297.98 samples/sec Loss 4.6977 LearningRate 0.0003 Epoch: 20 Global Step: 432380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:31,655-Speed 6310.87 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 432390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:34,906-Speed 6302.63 samples/sec Loss 4.7897 LearningRate 0.0003 Epoch: 20 Global Step: 432400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:38,153-Speed 6307.66 samples/sec Loss 4.7524 LearningRate 0.0003 Epoch: 20 Global Step: 432410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:41,396-Speed 6317.21 samples/sec Loss 4.7078 LearningRate 0.0003 Epoch: 20 Global Step: 432420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:44,641-Speed 6312.57 samples/sec Loss 4.7276 LearningRate 0.0003 Epoch: 20 Global Step: 432430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:47,889-Speed 6307.32 samples/sec Loss 4.7881 LearningRate 0.0003 Epoch: 20 Global Step: 432440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:51,134-Speed 6313.01 samples/sec Loss 4.8768 LearningRate 0.0003 Epoch: 20 Global Step: 432450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:54,378-Speed 6313.90 samples/sec Loss 4.7281 LearningRate 0.0003 Epoch: 20 Global Step: 432460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:03:58,142-Speed 5441.75 samples/sec Loss 4.7821 LearningRate 0.0003 Epoch: 20 
Global Step: 432470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:01,392-Speed 6306.63 samples/sec Loss 4.7161 LearningRate 0.0003 Epoch: 20 Global Step: 432480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:04,675-Speed 6238.60 samples/sec Loss 4.7647 LearningRate 0.0003 Epoch: 20 Global Step: 432490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:07,921-Speed 6311.76 samples/sec Loss 4.7860 LearningRate 0.0003 Epoch: 20 Global Step: 432500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:11,166-Speed 6313.09 samples/sec Loss 4.7256 LearningRate 0.0003 Epoch: 20 Global Step: 432510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:14,408-Speed 6316.83 samples/sec Loss 4.7683 LearningRate 0.0003 Epoch: 20 Global Step: 432520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:17,650-Speed 6318.16 samples/sec Loss 4.8234 LearningRate 0.0003 Epoch: 20 Global Step: 432530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:20,898-Speed 6307.88 samples/sec Loss 4.7643 LearningRate 0.0003 Epoch: 20 Global Step: 432540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:24,145-Speed 6308.39 samples/sec Loss 4.7304 LearningRate 0.0003 Epoch: 20 Global Step: 432550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:27,393-Speed 6306.00 samples/sec Loss 4.7189 LearningRate 0.0003 Epoch: 20 Global Step: 432560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:30,640-Speed 6309.80 samples/sec Loss 4.7505 LearningRate 0.0003 Epoch: 20 Global Step: 432570 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:04:33,888-Speed 6306.69 samples/sec Loss 4.7588 LearningRate 0.0003 Epoch: 20 Global Step: 432580 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:04:37,134-Speed 6310.37 samples/sec Loss 4.7831 LearningRate 0.0003 Epoch: 20 Global Step: 432590 Fp16 Grad 
Scale: 32768 Required: 36 hours Training: 2022-04-02 07:04:40,385-Speed 6301.72 samples/sec Loss 4.7276 LearningRate 0.0003 Epoch: 20 Global Step: 432600 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:04:43,622-Speed 6328.45 samples/sec Loss 4.7557 LearningRate 0.0003 Epoch: 20 Global Step: 432610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:46,877-Speed 6293.68 samples/sec Loss 4.8207 LearningRate 0.0003 Epoch: 20 Global Step: 432620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:50,126-Speed 6304.52 samples/sec Loss 4.7817 LearningRate 0.0003 Epoch: 20 Global Step: 432630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:53,376-Speed 6303.09 samples/sec Loss 4.7960 LearningRate 0.0003 Epoch: 20 Global Step: 432640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:56,622-Speed 6310.59 samples/sec Loss 4.7125 LearningRate 0.0003 Epoch: 20 Global Step: 432650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:04:59,864-Speed 6318.45 samples/sec Loss 4.7316 LearningRate 0.0003 Epoch: 20 Global Step: 432660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:03,109-Speed 6314.08 samples/sec Loss 4.7177 LearningRate 0.0003 Epoch: 20 Global Step: 432670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:06,353-Speed 6315.20 samples/sec Loss 4.6717 LearningRate 0.0003 Epoch: 20 Global Step: 432680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:09,594-Speed 6319.69 samples/sec Loss 4.8250 LearningRate 0.0003 Epoch: 20 Global Step: 432690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:12,839-Speed 6311.95 samples/sec Loss 4.7549 LearningRate 0.0003 Epoch: 20 Global Step: 432700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:16,082-Speed 6316.61 samples/sec Loss 4.7633 LearningRate 0.0003 Epoch: 20 Global Step: 432710 Fp16 Grad Scale: 32768 Required: 36 hours 
Training: 2022-04-02 07:05:19,317-Speed 6331.74 samples/sec Loss 4.7784 LearningRate 0.0003 Epoch: 20 Global Step: 432720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:22,562-Speed 6313.29 samples/sec Loss 4.6957 LearningRate 0.0003 Epoch: 20 Global Step: 432730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:25,807-Speed 6312.19 samples/sec Loss 4.7626 LearningRate 0.0003 Epoch: 20 Global Step: 432740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:29,059-Speed 6298.74 samples/sec Loss 4.7662 LearningRate 0.0003 Epoch: 20 Global Step: 432750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:32,305-Speed 6310.94 samples/sec Loss 4.7804 LearningRate 0.0003 Epoch: 20 Global Step: 432760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:35,552-Speed 6309.22 samples/sec Loss 4.7618 LearningRate 0.0003 Epoch: 20 Global Step: 432770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:38,800-Speed 6307.46 samples/sec Loss 4.8124 LearningRate 0.0003 Epoch: 20 Global Step: 432780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:42,044-Speed 6314.30 samples/sec Loss 4.7697 LearningRate 0.0003 Epoch: 20 Global Step: 432790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:45,291-Speed 6309.30 samples/sec Loss 4.7930 LearningRate 0.0003 Epoch: 20 Global Step: 432800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:48,534-Speed 6317.00 samples/sec Loss 4.7492 LearningRate 0.0003 Epoch: 20 Global Step: 432810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:05:54,590-Speed 3381.98 samples/sec Loss 4.7289 LearningRate 0.0003 Epoch: 20 Global Step: 432820 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:05:57,817-Speed 6348.36 samples/sec Loss 4.7510 LearningRate 0.0003 Epoch: 20 Global Step: 432830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:06:01,067-Speed 6303.47 samples/sec Loss 4.7167 LearningRate 0.0003 Epoch: 20 Global Step: 432840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:04,319-Speed 6299.73 samples/sec Loss 4.7196 LearningRate 0.0003 Epoch: 20 Global Step: 432850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:07,564-Speed 6312.87 samples/sec Loss 4.7295 LearningRate 0.0003 Epoch: 20 Global Step: 432860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:10,804-Speed 6321.36 samples/sec Loss 4.7457 LearningRate 0.0003 Epoch: 20 Global Step: 432870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:14,049-Speed 6312.24 samples/sec Loss 4.6884 LearningRate 0.0003 Epoch: 20 Global Step: 432880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:17,295-Speed 6311.48 samples/sec Loss 4.7783 LearningRate 0.0003 Epoch: 20 Global Step: 432890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:20,553-Speed 6287.43 samples/sec Loss 4.7683 LearningRate 0.0003 Epoch: 20 Global Step: 432900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:23,804-Speed 6300.69 samples/sec Loss 4.7610 LearningRate 0.0003 Epoch: 20 Global Step: 432910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:27,070-Speed 6271.69 samples/sec Loss 4.8107 LearningRate 0.0003 Epoch: 20 Global Step: 432920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:30,328-Speed 6288.09 samples/sec Loss 4.7428 LearningRate 0.0003 Epoch: 20 Global Step: 432930 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:06:33,578-Speed 6301.85 samples/sec Loss 4.8243 LearningRate 0.0003 Epoch: 20 Global Step: 432940 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:06:36,819-Speed 6322.03 samples/sec Loss 4.6581 LearningRate 0.0003 Epoch: 20 Global Step: 432950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:40,069-Speed 6302.76 
samples/sec Loss 4.7633 LearningRate 0.0003 Epoch: 20 Global Step: 432960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:43,312-Speed 6316.37 samples/sec Loss 4.7587 LearningRate 0.0003 Epoch: 20 Global Step: 432970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:46,564-Speed 6298.91 samples/sec Loss 4.8014 LearningRate 0.0003 Epoch: 20 Global Step: 432980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:49,814-Speed 6303.06 samples/sec Loss 4.7042 LearningRate 0.0003 Epoch: 20 Global Step: 432990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:53,060-Speed 6309.96 samples/sec Loss 4.8297 LearningRate 0.0003 Epoch: 20 Global Step: 433000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:56,306-Speed 6310.39 samples/sec Loss 4.7913 LearningRate 0.0003 Epoch: 20 Global Step: 433010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:06:59,556-Speed 6304.42 samples/sec Loss 4.7054 LearningRate 0.0003 Epoch: 20 Global Step: 433020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:02,806-Speed 6303.03 samples/sec Loss 4.6883 LearningRate 0.0003 Epoch: 20 Global Step: 433030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:06,050-Speed 6314.48 samples/sec Loss 4.7876 LearningRate 0.0003 Epoch: 20 Global Step: 433040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:09,295-Speed 6312.46 samples/sec Loss 4.7488 LearningRate 0.0003 Epoch: 20 Global Step: 433050 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:07:12,546-Speed 6300.69 samples/sec Loss 4.7178 LearningRate 0.0003 Epoch: 20 Global Step: 433060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:15,803-Speed 6289.27 samples/sec Loss 4.7205 LearningRate 0.0003 Epoch: 20 Global Step: 433070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:19,049-Speed 6310.95 samples/sec Loss 4.7489 
LearningRate 0.0003 Epoch: 20 Global Step: 433080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:22,296-Speed 6308.22 samples/sec Loss 4.7630 LearningRate 0.0003 Epoch: 20 Global Step: 433090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:25,545-Speed 6306.20 samples/sec Loss 4.7754 LearningRate 0.0003 Epoch: 20 Global Step: 433100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:28,788-Speed 6315.53 samples/sec Loss 4.7627 LearningRate 0.0003 Epoch: 20 Global Step: 433110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:32,039-Speed 6302.16 samples/sec Loss 4.7378 LearningRate 0.0003 Epoch: 20 Global Step: 433120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:35,286-Speed 6307.85 samples/sec Loss 4.7819 LearningRate 0.0003 Epoch: 20 Global Step: 433130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:38,538-Speed 6298.63 samples/sec Loss 4.7891 LearningRate 0.0003 Epoch: 20 Global Step: 433140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:41,785-Speed 6309.86 samples/sec Loss 4.7192 LearningRate 0.0003 Epoch: 20 Global Step: 433150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:45,016-Speed 6340.08 samples/sec Loss 4.7880 LearningRate 0.0003 Epoch: 20 Global Step: 433160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:48,365-Speed 6116.19 samples/sec Loss 4.8052 LearningRate 0.0003 Epoch: 20 Global Step: 433170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:51,689-Speed 6161.95 samples/sec Loss 4.7357 LearningRate 0.0003 Epoch: 20 Global Step: 433180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:07:57,672-Speed 3423.59 samples/sec Loss 4.7446 LearningRate 0.0003 Epoch: 20 Global Step: 433190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:00,917-Speed 6312.30 samples/sec Loss 4.7325 LearningRate 0.0003 Epoch: 20 
Global Step: 433200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:04,161-Speed 6315.19 samples/sec Loss 4.7508 LearningRate 0.0003 Epoch: 20 Global Step: 433210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:07,407-Speed 6311.50 samples/sec Loss 4.7248 LearningRate 0.0003 Epoch: 20 Global Step: 433220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:10,654-Speed 6308.46 samples/sec Loss 4.7727 LearningRate 0.0003 Epoch: 20 Global Step: 433230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:13,900-Speed 6311.63 samples/sec Loss 4.7461 LearningRate 0.0003 Epoch: 20 Global Step: 433240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:17,147-Speed 6308.71 samples/sec Loss 4.7867 LearningRate 0.0003 Epoch: 20 Global Step: 433250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:20,391-Speed 6314.58 samples/sec Loss 4.7968 LearningRate 0.0003 Epoch: 20 Global Step: 433260 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:08:23,645-Speed 6294.89 samples/sec Loss 4.8088 LearningRate 0.0003 Epoch: 20 Global Step: 433270 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:08:26,875-Speed 6342.37 samples/sec Loss 4.7456 LearningRate 0.0003 Epoch: 20 Global Step: 433280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:30,118-Speed 6316.28 samples/sec Loss 4.7524 LearningRate 0.0003 Epoch: 20 Global Step: 433290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:33,360-Speed 6318.15 samples/sec Loss 4.7058 LearningRate 0.0003 Epoch: 20 Global Step: 433300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:36,611-Speed 6301.81 samples/sec Loss 4.7930 LearningRate 0.0003 Epoch: 20 Global Step: 433310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:39,850-Speed 6323.06 samples/sec Loss 4.7189 LearningRate 0.0003 Epoch: 20 Global Step: 433320 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:43,093-Speed 6316.71 samples/sec Loss 4.7031 LearningRate 0.0003 Epoch: 20 Global Step: 433330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:46,337-Speed 6313.69 samples/sec Loss 4.7001 LearningRate 0.0003 Epoch: 20 Global Step: 433340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:49,581-Speed 6315.38 samples/sec Loss 4.7281 LearningRate 0.0003 Epoch: 20 Global Step: 433350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:52,836-Speed 6294.16 samples/sec Loss 4.7003 LearningRate 0.0003 Epoch: 20 Global Step: 433360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:56,083-Speed 6308.00 samples/sec Loss 4.7204 LearningRate 0.0003 Epoch: 20 Global Step: 433370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:08:59,330-Speed 6308.48 samples/sec Loss 4.7990 LearningRate 0.0003 Epoch: 20 Global Step: 433380 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:09:02,656-Speed 6159.69 samples/sec Loss 4.7303 LearningRate 0.0003 Epoch: 20 Global Step: 433390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:05,925-Speed 6265.92 samples/sec Loss 4.7136 LearningRate 0.0003 Epoch: 20 Global Step: 433400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:09,173-Speed 6307.34 samples/sec Loss 4.7381 LearningRate 0.0003 Epoch: 20 Global Step: 433410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:12,419-Speed 6309.96 samples/sec Loss 4.7308 LearningRate 0.0003 Epoch: 20 Global Step: 433420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:15,667-Speed 6307.11 samples/sec Loss 4.7811 LearningRate 0.0003 Epoch: 20 Global Step: 433430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:18,909-Speed 6318.30 samples/sec Loss 4.7735 LearningRate 0.0003 Epoch: 20 Global Step: 433440 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:09:22,154-Speed 6313.69 samples/sec Loss 4.8006 LearningRate 0.0003 Epoch: 20 Global Step: 433450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:25,400-Speed 6310.06 samples/sec Loss 4.7406 LearningRate 0.0003 Epoch: 20 Global Step: 433460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:28,647-Speed 6309.91 samples/sec Loss 4.7577 LearningRate 0.0003 Epoch: 20 Global Step: 433470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:31,892-Speed 6313.45 samples/sec Loss 4.7432 LearningRate 0.0003 Epoch: 20 Global Step: 433480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:35,138-Speed 6310.30 samples/sec Loss 4.7455 LearningRate 0.0003 Epoch: 20 Global Step: 433490 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:09:38,376-Speed 6326.89 samples/sec Loss 4.7967 LearningRate 0.0003 Epoch: 20 Global Step: 433500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:41,624-Speed 6306.17 samples/sec Loss 4.7534 LearningRate 0.0003 Epoch: 20 Global Step: 433510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:44,874-Speed 6302.06 samples/sec Loss 4.8057 LearningRate 0.0003 Epoch: 20 Global Step: 433520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:48,117-Speed 6316.09 samples/sec Loss 4.7072 LearningRate 0.0003 Epoch: 20 Global Step: 433530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:51,367-Speed 6304.39 samples/sec Loss 4.7633 LearningRate 0.0003 Epoch: 20 Global Step: 433540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:54,612-Speed 6312.85 samples/sec Loss 4.7179 LearningRate 0.0003 Epoch: 20 Global Step: 433550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:09:57,854-Speed 6317.70 samples/sec Loss 4.7538 LearningRate 0.0003 Epoch: 20 Global Step: 433560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:10:01,176-Speed 6166.07 samples/sec Loss 4.7916 LearningRate 0.0003 Epoch: 20 Global Step: 433570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:04,421-Speed 6313.26 samples/sec Loss 4.7594 LearningRate 0.0003 Epoch: 20 Global Step: 433580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:07,669-Speed 6306.84 samples/sec Loss 4.7885 LearningRate 0.0003 Epoch: 20 Global Step: 433590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:10,999-Speed 6151.90 samples/sec Loss 4.7750 LearningRate 0.0003 Epoch: 20 Global Step: 433600 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:10:14,299-Speed 6206.27 samples/sec Loss 4.7905 LearningRate 0.0003 Epoch: 20 Global Step: 433610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:10:17,545-Speed 6312.27 samples/sec Loss 4.7405 LearningRate 0.0003 Epoch: 20 Global Step: 433620 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:10:20,779-Speed 6332.06 samples/sec Loss 4.7851 LearningRate 0.0003 Epoch: 20 Global Step: 433630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:24,027-Speed 6307.31 samples/sec Loss 4.7636 LearningRate 0.0003 Epoch: 20 Global Step: 433640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:27,274-Speed 6310.43 samples/sec Loss 4.7421 LearningRate 0.0003 Epoch: 20 Global Step: 433650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:30,520-Speed 6311.03 samples/sec Loss 4.7695 LearningRate 0.0003 Epoch: 20 Global Step: 433660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:33,767-Speed 6307.32 samples/sec Loss 4.7810 LearningRate 0.0003 Epoch: 20 Global Step: 433670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:37,014-Speed 6310.24 samples/sec Loss 4.7935 LearningRate 0.0003 Epoch: 20 Global Step: 433680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:40,263-Speed 6305.11 
samples/sec Loss 4.6912 LearningRate 0.0003 Epoch: 20 Global Step: 433690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:43,521-Speed 6286.32 samples/sec Loss 4.7730 LearningRate 0.0003 Epoch: 20 Global Step: 433700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:46,771-Speed 6302.69 samples/sec Loss 4.7597 LearningRate 0.0003 Epoch: 20 Global Step: 433710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:50,026-Speed 6293.73 samples/sec Loss 4.7260 LearningRate 0.0003 Epoch: 20 Global Step: 433720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:10:53,325-Speed 6209.21 samples/sec Loss 4.7704 LearningRate 0.0003 Epoch: 20 Global Step: 433730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:10:56,571-Speed 6311.25 samples/sec Loss 4.7438 LearningRate 0.0003 Epoch: 20 Global Step: 433740 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:10:59,821-Speed 6302.52 samples/sec Loss 4.7680 LearningRate 0.0003 Epoch: 20 Global Step: 433750 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:11:03,131-Speed 6188.25 samples/sec Loss 4.7018 LearningRate 0.0003 Epoch: 20 Global Step: 433760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:06,388-Speed 6290.57 samples/sec Loss 4.6969 LearningRate 0.0003 Epoch: 20 Global Step: 433770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:09,631-Speed 6314.98 samples/sec Loss 4.8185 LearningRate 0.0003 Epoch: 20 Global Step: 433780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:12,882-Speed 6301.88 samples/sec Loss 4.7046 LearningRate 0.0003 Epoch: 20 Global Step: 433790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:16,125-Speed 6316.44 samples/sec Loss 4.6878 LearningRate 0.0003 Epoch: 20 Global Step: 433800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:19,370-Speed 6312.18 samples/sec Loss 4.7906 
LearningRate 0.0003 Epoch: 20 Global Step: 433810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:22,622-Speed 6299.93 samples/sec Loss 4.7293 LearningRate 0.0003 Epoch: 20 Global Step: 433820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:25,871-Speed 6303.99 samples/sec Loss 4.7536 LearningRate 0.0003 Epoch: 20 Global Step: 433830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:29,115-Speed 6315.21 samples/sec Loss 4.7254 LearningRate 0.0003 Epoch: 20 Global Step: 433840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:32,361-Speed 6310.02 samples/sec Loss 4.7121 LearningRate 0.0003 Epoch: 20 Global Step: 433850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:35,610-Speed 6306.58 samples/sec Loss 4.7086 LearningRate 0.0003 Epoch: 20 Global Step: 433860 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:11:38,855-Speed 6312.99 samples/sec Loss 4.7533 LearningRate 0.0003 Epoch: 20 Global Step: 433870 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:11:42,087-Speed 6338.44 samples/sec Loss 4.7227 LearningRate 0.0003 Epoch: 20 Global Step: 433880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:45,334-Speed 6308.59 samples/sec Loss 4.8078 LearningRate 0.0003 Epoch: 20 Global Step: 433890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:48,575-Speed 6320.58 samples/sec Loss 4.6972 LearningRate 0.0003 Epoch: 20 Global Step: 433900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:51,825-Speed 6301.38 samples/sec Loss 4.6900 LearningRate 0.0003 Epoch: 20 Global Step: 433910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:55,082-Speed 6290.13 samples/sec Loss 4.7078 LearningRate 0.0003 Epoch: 20 Global Step: 433920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:11:58,330-Speed 6307.16 samples/sec Loss 4.6845 LearningRate 0.0003 Epoch: 20 
Global Step: 433930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:01,576-Speed 6310.72 samples/sec Loss 4.7931 LearningRate 0.0003 Epoch: 20 Global Step: 433940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:04,825-Speed 6304.81 samples/sec Loss 4.7474 LearningRate 0.0003 Epoch: 20 Global Step: 433950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:08,070-Speed 6311.63 samples/sec Loss 4.8114 LearningRate 0.0003 Epoch: 20 Global Step: 433960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:11,313-Speed 6317.54 samples/sec Loss 4.7407 LearningRate 0.0003 Epoch: 20 Global Step: 433970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:14,543-Speed 6340.78 samples/sec Loss 4.8237 LearningRate 0.0003 Epoch: 20 Global Step: 433980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:17,789-Speed 6310.90 samples/sec Loss 4.6927 LearningRate 0.0003 Epoch: 20 Global Step: 433990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:21,040-Speed 6301.04 samples/sec Loss 4.7703 LearningRate 0.0003 Epoch: 20 Global Step: 434000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:24,297-Speed 6289.85 samples/sec Loss 4.7143 LearningRate 0.0003 Epoch: 20 Global Step: 434010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:27,543-Speed 6311.38 samples/sec Loss 4.8176 LearningRate 0.0003 Epoch: 20 Global Step: 434020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:30,788-Speed 6311.99 samples/sec Loss 4.7265 LearningRate 0.0003 Epoch: 20 Global Step: 434030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:34,039-Speed 6300.98 samples/sec Loss 4.8092 LearningRate 0.0003 Epoch: 20 Global Step: 434040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:37,294-Speed 6293.26 samples/sec Loss 4.7462 LearningRate 0.0003 Epoch: 20 Global Step: 434050 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:40,538-Speed 6315.09 samples/sec Loss 4.8165 LearningRate 0.0003 Epoch: 20 Global Step: 434060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:43,786-Speed 6306.62 samples/sec Loss 4.7627 LearningRate 0.0003 Epoch: 20 Global Step: 434070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:12:47,034-Speed 6307.25 samples/sec Loss 4.8074 LearningRate 0.0003 Epoch: 20 Global Step: 434080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:12:50,286-Speed 6299.12 samples/sec Loss 4.7535 LearningRate 0.0003 Epoch: 20 Global Step: 434090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:12:53,535-Speed 6304.30 samples/sec Loss 4.8767 LearningRate 0.0003 Epoch: 20 Global Step: 434100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:12:56,766-Speed 6341.51 samples/sec Loss 4.7147 LearningRate 0.0003 Epoch: 20 Global Step: 434110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:00,014-Speed 6306.56 samples/sec Loss 4.7297 LearningRate 0.0003 Epoch: 20 Global Step: 434120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:03,281-Speed 6270.03 samples/sec Loss 4.7373 LearningRate 0.0003 Epoch: 20 Global Step: 434130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:06,529-Speed 6307.12 samples/sec Loss 4.7351 LearningRate 0.0003 Epoch: 20 Global Step: 434140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:09,772-Speed 6315.53 samples/sec Loss 4.8076 LearningRate 0.0003 Epoch: 20 Global Step: 434150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:13,017-Speed 6312.36 samples/sec Loss 4.7868 LearningRate 0.0003 Epoch: 20 Global Step: 434160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:16,267-Speed 6303.79 samples/sec Loss 4.8250 LearningRate 0.0003 Epoch: 20 Global Step: 434170 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:13:19,509-Speed 6317.68 samples/sec Loss 4.7918 LearningRate 0.0003 Epoch: 20 Global Step: 434180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:22,756-Speed 6308.14 samples/sec Loss 4.7618 LearningRate 0.0003 Epoch: 20 Global Step: 434190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:26,003-Speed 6308.99 samples/sec Loss 4.7824 LearningRate 0.0003 Epoch: 20 Global Step: 434200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:29,253-Speed 6303.89 samples/sec Loss 4.7288 LearningRate 0.0003 Epoch: 20 Global Step: 434210 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:13:32,486-Speed 6336.43 samples/sec Loss 4.7533 LearningRate 0.0003 Epoch: 20 Global Step: 434220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:35,726-Speed 6321.29 samples/sec Loss 4.8010 LearningRate 0.0003 Epoch: 20 Global Step: 434230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:38,971-Speed 6313.89 samples/sec Loss 4.7550 LearningRate 0.0003 Epoch: 20 Global Step: 434240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:42,218-Speed 6308.68 samples/sec Loss 4.6998 LearningRate 0.0003 Epoch: 20 Global Step: 434250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:45,463-Speed 6311.88 samples/sec Loss 4.7321 LearningRate 0.0003 Epoch: 20 Global Step: 434260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:48,709-Speed 6309.76 samples/sec Loss 4.7727 LearningRate 0.0003 Epoch: 20 Global Step: 434270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:51,953-Speed 6316.45 samples/sec Loss 4.7527 LearningRate 0.0003 Epoch: 20 Global Step: 434280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:13:55,197-Speed 6313.42 samples/sec Loss 4.7269 LearningRate 0.0003 Epoch: 20 Global Step: 434290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:13:58,445-Speed 6307.36 samples/sec Loss 4.7318 LearningRate 0.0003 Epoch: 20 Global Step: 434300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:01,685-Speed 6322.60 samples/sec Loss 4.7311 LearningRate 0.0003 Epoch: 20 Global Step: 434310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:04,930-Speed 6311.73 samples/sec Loss 4.7371 LearningRate 0.0003 Epoch: 20 Global Step: 434320 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:14:08,177-Speed 6310.41 samples/sec Loss 4.7694 LearningRate 0.0003 Epoch: 20 Global Step: 434330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:11,423-Speed 6309.21 samples/sec Loss 4.6975 LearningRate 0.0003 Epoch: 20 Global Step: 434340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:14,671-Speed 6307.56 samples/sec Loss 4.7134 LearningRate 0.0003 Epoch: 20 Global Step: 434350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:17,924-Speed 6297.07 samples/sec Loss 4.7223 LearningRate 0.0003 Epoch: 20 Global Step: 434360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:21,191-Speed 6270.35 samples/sec Loss 4.7049 LearningRate 0.0003 Epoch: 20 Global Step: 434370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:24,446-Speed 6293.06 samples/sec Loss 4.7921 LearningRate 0.0003 Epoch: 20 Global Step: 434380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:27,690-Speed 6313.78 samples/sec Loss 4.7628 LearningRate 0.0003 Epoch: 20 Global Step: 434390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:30,935-Speed 6313.63 samples/sec Loss 4.7183 LearningRate 0.0003 Epoch: 20 Global Step: 434400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:34,181-Speed 6310.19 samples/sec Loss 4.7346 LearningRate 0.0003 Epoch: 20 Global Step: 434410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:37,431-Speed 6302.80 
samples/sec Loss 4.7634 LearningRate 0.0003 Epoch: 20 Global Step: 434420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:40,674-Speed 6316.70 samples/sec Loss 4.7744 LearningRate 0.0003 Epoch: 20 Global Step: 434430 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:14:43,908-Speed 6334.54 samples/sec Loss 4.7119 LearningRate 0.0003 Epoch: 20 Global Step: 434440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:47,151-Speed 6316.69 samples/sec Loss 4.6986 LearningRate 0.0003 Epoch: 20 Global Step: 434450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:50,395-Speed 6313.44 samples/sec Loss 4.7528 LearningRate 0.0003 Epoch: 20 Global Step: 434460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:53,640-Speed 6312.73 samples/sec Loss 4.7998 LearningRate 0.0003 Epoch: 20 Global Step: 434470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:14:56,887-Speed 6309.24 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 20 Global Step: 434480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:00,135-Speed 6308.22 samples/sec Loss 4.7744 LearningRate 0.0003 Epoch: 20 Global Step: 434490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:03,384-Speed 6305.34 samples/sec Loss 4.6897 LearningRate 0.0003 Epoch: 20 Global Step: 434500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:06,629-Speed 6311.27 samples/sec Loss 4.7610 LearningRate 0.0003 Epoch: 20 Global Step: 434510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:09,872-Speed 6316.95 samples/sec Loss 4.8106 LearningRate 0.0003 Epoch: 20 Global Step: 434520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:13,121-Speed 6306.31 samples/sec Loss 4.7755 LearningRate 0.0003 Epoch: 20 Global Step: 434530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:16,355-Speed 6333.88 samples/sec Loss 4.7207 
LearningRate 0.0003 Epoch: 20 Global Step: 434540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:19,601-Speed 6309.59 samples/sec Loss 4.8104 LearningRate 0.0003 Epoch: 20 Global Step: 434550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:22,852-Speed 6301.26 samples/sec Loss 4.8178 LearningRate 0.0003 Epoch: 20 Global Step: 434560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:26,094-Speed 6318.72 samples/sec Loss 4.7207 LearningRate 0.0003 Epoch: 20 Global Step: 434570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:29,337-Speed 6317.02 samples/sec Loss 4.7427 LearningRate 0.0003 Epoch: 20 Global Step: 434580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:32,583-Speed 6310.09 samples/sec Loss 4.6910 LearningRate 0.0003 Epoch: 20 Global Step: 434590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:35,832-Speed 6304.10 samples/sec Loss 4.7280 LearningRate 0.0003 Epoch: 20 Global Step: 434600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:39,078-Speed 6312.45 samples/sec Loss 4.7737 LearningRate 0.0003 Epoch: 20 Global Step: 434610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:42,321-Speed 6315.94 samples/sec Loss 4.7346 LearningRate 0.0003 Epoch: 20 Global Step: 434620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:45,568-Speed 6309.43 samples/sec Loss 4.7237 LearningRate 0.0003 Epoch: 20 Global Step: 434630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:48,800-Speed 6337.46 samples/sec Loss 4.7313 LearningRate 0.0003 Epoch: 20 Global Step: 434640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:52,048-Speed 6306.36 samples/sec Loss 4.7693 LearningRate 0.0003 Epoch: 20 Global Step: 434650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:55,295-Speed 6309.39 samples/sec Loss 4.7148 LearningRate 0.0003 Epoch: 20 
Global Step: 434660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:15:58,540-Speed 6312.37 samples/sec Loss 4.7673 LearningRate 0.0003 Epoch: 20 Global Step: 434670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:01,786-Speed 6311.20 samples/sec Loss 4.7528 LearningRate 0.0003 Epoch: 20 Global Step: 434680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:05,039-Speed 6296.53 samples/sec Loss 4.7285 LearningRate 0.0003 Epoch: 20 Global Step: 434690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:08,287-Speed 6307.40 samples/sec Loss 4.7535 LearningRate 0.0003 Epoch: 20 Global Step: 434700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:11,536-Speed 6304.59 samples/sec Loss 4.7590 LearningRate 0.0003 Epoch: 20 Global Step: 434710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:14,779-Speed 6317.30 samples/sec Loss 4.7897 LearningRate 0.0003 Epoch: 20 Global Step: 434720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:18,023-Speed 6313.73 samples/sec Loss 4.7732 LearningRate 0.0003 Epoch: 20 Global Step: 434730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:21,255-Speed 6339.38 samples/sec Loss 4.8300 LearningRate 0.0003 Epoch: 20 Global Step: 434740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:24,499-Speed 6313.98 samples/sec Loss 4.7953 LearningRate 0.0003 Epoch: 20 Global Step: 434750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:27,744-Speed 6312.53 samples/sec Loss 4.7917 LearningRate 0.0003 Epoch: 20 Global Step: 434760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:30,995-Speed 6301.05 samples/sec Loss 4.7143 LearningRate 0.0003 Epoch: 20 Global Step: 434770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:34,241-Speed 6310.44 samples/sec Loss 4.7271 LearningRate 0.0003 Epoch: 20 Global Step: 434780 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:37,483-Speed 6319.28 samples/sec Loss 4.7247 LearningRate 0.0003 Epoch: 20 Global Step: 434790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:40,728-Speed 6313.02 samples/sec Loss 4.6800 LearningRate 0.0003 Epoch: 20 Global Step: 434800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:43,969-Speed 6320.29 samples/sec Loss 4.7718 LearningRate 0.0003 Epoch: 20 Global Step: 434810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:47,213-Speed 6313.98 samples/sec Loss 4.7418 LearningRate 0.0003 Epoch: 20 Global Step: 434820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:50,458-Speed 6312.27 samples/sec Loss 4.7790 LearningRate 0.0003 Epoch: 20 Global Step: 434830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:16:53,701-Speed 6316.34 samples/sec Loss 4.7586 LearningRate 0.0003 Epoch: 20 Global Step: 434840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:16:56,949-Speed 6306.82 samples/sec Loss 4.7781 LearningRate 0.0003 Epoch: 20 Global Step: 434850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:17:00,195-Speed 6311.68 samples/sec Loss 4.7285 LearningRate 0.0003 Epoch: 20 Global Step: 434860 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:17:03,426-Speed 6338.90 samples/sec Loss 4.7821 LearningRate 0.0003 Epoch: 20 Global Step: 434870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:06,677-Speed 6302.02 samples/sec Loss 4.7235 LearningRate 0.0003 Epoch: 20 Global Step: 434880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:09,920-Speed 6315.59 samples/sec Loss 4.8464 LearningRate 0.0003 Epoch: 20 Global Step: 434890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:13,165-Speed 6313.13 samples/sec Loss 4.7562 LearningRate 0.0003 Epoch: 20 Global Step: 434900 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:17:16,410-Speed 6312.73 samples/sec Loss 4.7520 LearningRate 0.0003 Epoch: 20 Global Step: 434910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:19,657-Speed 6308.83 samples/sec Loss 4.7764 LearningRate 0.0003 Epoch: 20 Global Step: 434920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:22,901-Speed 6315.27 samples/sec Loss 4.7271 LearningRate 0.0003 Epoch: 20 Global Step: 434930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:26,149-Speed 6306.94 samples/sec Loss 4.7122 LearningRate 0.0003 Epoch: 20 Global Step: 434940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:29,397-Speed 6307.31 samples/sec Loss 4.6875 LearningRate 0.0003 Epoch: 20 Global Step: 434950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:32,640-Speed 6316.03 samples/sec Loss 4.7201 LearningRate 0.0003 Epoch: 20 Global Step: 434960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:35,888-Speed 6307.72 samples/sec Loss 4.7428 LearningRate 0.0003 Epoch: 20 Global Step: 434970 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:17:39,118-Speed 6342.13 samples/sec Loss 4.7236 LearningRate 0.0003 Epoch: 20 Global Step: 434980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:42,367-Speed 6304.71 samples/sec Loss 4.6292 LearningRate 0.0003 Epoch: 20 Global Step: 434990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:45,622-Speed 6292.12 samples/sec Loss 4.6647 LearningRate 0.0003 Epoch: 20 Global Step: 435000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:48,864-Speed 6320.40 samples/sec Loss 4.7381 LearningRate 0.0003 Epoch: 20 Global Step: 435010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:52,110-Speed 6310.28 samples/sec Loss 4.8291 LearningRate 0.0003 Epoch: 20 Global Step: 435020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:17:55,354-Speed 6313.60 samples/sec Loss 4.8036 LearningRate 0.0003 Epoch: 20 Global Step: 435030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:17:58,599-Speed 6312.29 samples/sec Loss 4.6851 LearningRate 0.0003 Epoch: 20 Global Step: 435040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:01,843-Speed 6315.85 samples/sec Loss 4.7386 LearningRate 0.0003 Epoch: 20 Global Step: 435050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:05,088-Speed 6312.05 samples/sec Loss 4.7179 LearningRate 0.0003 Epoch: 20 Global Step: 435060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:08,331-Speed 6315.66 samples/sec Loss 4.7677 LearningRate 0.0003 Epoch: 20 Global Step: 435070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:11,573-Speed 6320.08 samples/sec Loss 4.7727 LearningRate 0.0003 Epoch: 20 Global Step: 435080 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:18:14,821-Speed 6306.08 samples/sec Loss 4.7908 LearningRate 0.0003 Epoch: 20 Global Step: 435090 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:18:18,068-Speed 6308.29 samples/sec Loss 4.7117 LearningRate 0.0003 Epoch: 20 Global Step: 435100 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:18:21,313-Speed 6312.26 samples/sec Loss 4.6457 LearningRate 0.0003 Epoch: 20 Global Step: 435110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:18:24,545-Speed 6339.33 samples/sec Loss 4.7026 LearningRate 0.0003 Epoch: 20 Global Step: 435120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:27,794-Speed 6303.65 samples/sec Loss 4.7267 LearningRate 0.0003 Epoch: 20 Global Step: 435130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:31,038-Speed 6314.66 samples/sec Loss 4.6646 LearningRate 0.0003 Epoch: 20 Global Step: 435140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:34,288-Speed 6303.79 
samples/sec Loss 4.7103 LearningRate 0.0003 Epoch: 20 Global Step: 435150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:37,532-Speed 6315.17 samples/sec Loss 4.7139 LearningRate 0.0003 Epoch: 20 Global Step: 435160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:40,774-Speed 6318.90 samples/sec Loss 4.7418 LearningRate 0.0003 Epoch: 20 Global Step: 435170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:44,021-Speed 6309.65 samples/sec Loss 4.8455 LearningRate 0.0003 Epoch: 20 Global Step: 435180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:47,264-Speed 6315.72 samples/sec Loss 4.7201 LearningRate 0.0003 Epoch: 20 Global Step: 435190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:50,509-Speed 6312.91 samples/sec Loss 4.7389 LearningRate 0.0003 Epoch: 20 Global Step: 435200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:53,751-Speed 6317.85 samples/sec Loss 4.7637 LearningRate 0.0003 Epoch: 20 Global Step: 435210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:18:56,999-Speed 6307.24 samples/sec Loss 4.7362 LearningRate 0.0003 Epoch: 20 Global Step: 435220 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:19:00,232-Speed 6336.02 samples/sec Loss 4.7707 LearningRate 0.0003 Epoch: 20 Global Step: 435230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:03,477-Speed 6313.49 samples/sec Loss 4.7596 LearningRate 0.0003 Epoch: 20 Global Step: 435240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:06,722-Speed 6312.56 samples/sec Loss 4.7284 LearningRate 0.0003 Epoch: 20 Global Step: 435250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:09,971-Speed 6305.00 samples/sec Loss 4.8021 LearningRate 0.0003 Epoch: 20 Global Step: 435260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:13,217-Speed 6310.10 samples/sec Loss 4.7874 
LearningRate 0.0003 Epoch: 20 Global Step: 435270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:16,468-Speed 6301.31 samples/sec Loss 4.7638 LearningRate 0.0003 Epoch: 20 Global Step: 435280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:19,713-Speed 6312.31 samples/sec Loss 4.7368 LearningRate 0.0003 Epoch: 20 Global Step: 435290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:22,961-Speed 6306.20 samples/sec Loss 4.6966 LearningRate 0.0003 Epoch: 20 Global Step: 435300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:26,207-Speed 6310.68 samples/sec Loss 4.6962 LearningRate 0.0003 Epoch: 20 Global Step: 435310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:29,454-Speed 6309.35 samples/sec Loss 4.8056 LearningRate 0.0003 Epoch: 20 Global Step: 435320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:32,688-Speed 6334.84 samples/sec Loss 4.7996 LearningRate 0.0003 Epoch: 20 Global Step: 435330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:35,934-Speed 6310.36 samples/sec Loss 4.8516 LearningRate 0.0003 Epoch: 20 Global Step: 435340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:39,185-Speed 6301.17 samples/sec Loss 4.7162 LearningRate 0.0003 Epoch: 20 Global Step: 435350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:42,435-Speed 6303.58 samples/sec Loss 4.8039 LearningRate 0.0003 Epoch: 20 Global Step: 435360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:45,681-Speed 6310.94 samples/sec Loss 4.7581 LearningRate 0.0003 Epoch: 20 Global Step: 435370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:48,928-Speed 6308.03 samples/sec Loss 4.8337 LearningRate 0.0003 Epoch: 20 Global Step: 435380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:52,289-Speed 6095.23 samples/sec Loss 4.8155 LearningRate 0.0003 Epoch: 20 
Global Step: 435390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:55,576-Speed 6232.69 samples/sec Loss 4.6992 LearningRate 0.0003 Epoch: 20 Global Step: 435400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:19:58,826-Speed 6301.63 samples/sec Loss 4.7197 LearningRate 0.0003 Epoch: 20 Global Step: 435410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:02,074-Speed 6307.93 samples/sec Loss 4.7474 LearningRate 0.0003 Epoch: 20 Global Step: 435420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:05,320-Speed 6309.31 samples/sec Loss 4.8301 LearningRate 0.0003 Epoch: 20 Global Step: 435430 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:20:08,552-Speed 6338.77 samples/sec Loss 4.6975 LearningRate 0.0003 Epoch: 20 Global Step: 435440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:11,796-Speed 6314.11 samples/sec Loss 4.7168 LearningRate 0.0003 Epoch: 20 Global Step: 435450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:15,043-Speed 6309.63 samples/sec Loss 4.8266 LearningRate 0.0003 Epoch: 20 Global Step: 435460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:18,291-Speed 6306.63 samples/sec Loss 4.6906 LearningRate 0.0003 Epoch: 20 Global Step: 435470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:21,533-Speed 6317.85 samples/sec Loss 4.7541 LearningRate 0.0003 Epoch: 20 Global Step: 435480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:24,780-Speed 6309.59 samples/sec Loss 4.7645 LearningRate 0.0003 Epoch: 20 Global Step: 435490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:28,025-Speed 6312.31 samples/sec Loss 4.7938 LearningRate 0.0003 Epoch: 20 Global Step: 435500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:31,271-Speed 6310.33 samples/sec Loss 4.7249 LearningRate 0.0003 Epoch: 20 Global Step: 435510 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:34,519-Speed 6306.44 samples/sec Loss 4.8023 LearningRate 0.0003 Epoch: 20 Global Step: 435520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:20:37,764-Speed 6312.17 samples/sec Loss 4.7658 LearningRate 0.0003 Epoch: 20 Global Step: 435530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:21:37,407-Speed 343.38 samples/sec Loss 4.8051 LearningRate 0.0003 Epoch: 21 Global Step: 435540 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:21:40,648-Speed 6320.85 samples/sec Loss 4.7050 LearningRate 0.0003 Epoch: 21 Global Step: 435550 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:21:43,868-Speed 6361.88 samples/sec Loss 4.8167 LearningRate 0.0003 Epoch: 21 Global Step: 435560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:21:47,112-Speed 6316.05 samples/sec Loss 4.7437 LearningRate 0.0003 Epoch: 21 Global Step: 435570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:21:50,349-Speed 6327.01 samples/sec Loss 4.7228 LearningRate 0.0003 Epoch: 21 Global Step: 435580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:21:53,588-Speed 6324.92 samples/sec Loss 4.7344 LearningRate 0.0003 Epoch: 21 Global Step: 435590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:21:56,820-Speed 6337.76 samples/sec Loss 4.7556 LearningRate 0.0003 Epoch: 21 Global Step: 435600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:00,071-Speed 6300.50 samples/sec Loss 4.7877 LearningRate 0.0003 Epoch: 21 Global Step: 435610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:03,312-Speed 6321.69 samples/sec Loss 4.7780 LearningRate 0.0003 Epoch: 21 Global Step: 435620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:06,556-Speed 6314.08 samples/sec Loss 4.6921 LearningRate 0.0003 Epoch: 21 Global Step: 435630 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:22:09,806-Speed 6304.17 samples/sec Loss 4.7139 LearningRate 0.0003 Epoch: 21 Global Step: 435640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:13,078-Speed 6258.88 samples/sec Loss 4.7964 LearningRate 0.0003 Epoch: 21 Global Step: 435650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:16,314-Speed 6330.18 samples/sec Loss 4.8426 LearningRate 0.0003 Epoch: 21 Global Step: 435660 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:22:19,550-Speed 6331.02 samples/sec Loss 4.6743 LearningRate 0.0003 Epoch: 21 Global Step: 435670 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:22:22,792-Speed 6318.99 samples/sec Loss 4.7226 LearningRate 0.0003 Epoch: 21 Global Step: 435680 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:22:26,020-Speed 6344.90 samples/sec Loss 4.7077 LearningRate 0.0003 Epoch: 21 Global Step: 435690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:29,261-Speed 6320.12 samples/sec Loss 4.7085 LearningRate 0.0003 Epoch: 21 Global Step: 435700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:32,505-Speed 6315.43 samples/sec Loss 4.7085 LearningRate 0.0003 Epoch: 21 Global Step: 435710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:35,746-Speed 6320.28 samples/sec Loss 4.7209 LearningRate 0.0003 Epoch: 21 Global Step: 435720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:39,048-Speed 6202.86 samples/sec Loss 4.7623 LearningRate 0.0003 Epoch: 21 Global Step: 435730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:42,316-Speed 6269.25 samples/sec Loss 4.7525 LearningRate 0.0003 Epoch: 21 Global Step: 435740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:45,568-Speed 6298.59 samples/sec Loss 4.6374 LearningRate 0.0003 Epoch: 21 Global Step: 435750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:22:48,811-Speed 6318.03 samples/sec Loss 4.6468 LearningRate 0.0003 Epoch: 21 Global Step: 435760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:52,052-Speed 6319.81 samples/sec Loss 4.7243 LearningRate 0.0003 Epoch: 21 Global Step: 435770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:55,294-Speed 6318.35 samples/sec Loss 4.7180 LearningRate 0.0003 Epoch: 21 Global Step: 435780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:22:58,537-Speed 6315.80 samples/sec Loss 4.7134 LearningRate 0.0003 Epoch: 21 Global Step: 435790 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:23:01,773-Speed 6332.03 samples/sec Loss 4.7068 LearningRate 0.0003 Epoch: 21 Global Step: 435800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:05,016-Speed 6315.43 samples/sec Loss 4.7360 LearningRate 0.0003 Epoch: 21 Global Step: 435810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:08,263-Speed 6309.93 samples/sec Loss 4.6865 LearningRate 0.0003 Epoch: 21 Global Step: 435820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:11,506-Speed 6316.40 samples/sec Loss 4.6502 LearningRate 0.0003 Epoch: 21 Global Step: 435830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:14,750-Speed 6314.78 samples/sec Loss 4.7425 LearningRate 0.0003 Epoch: 21 Global Step: 435840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:17,998-Speed 6306.45 samples/sec Loss 4.7215 LearningRate 0.0003 Epoch: 21 Global Step: 435850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:21,239-Speed 6319.47 samples/sec Loss 4.7467 LearningRate 0.0003 Epoch: 21 Global Step: 435860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:24,482-Speed 6316.32 samples/sec Loss 4.7584 LearningRate 0.0003 Epoch: 21 Global Step: 435870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:27,790-Speed 6193.62 
samples/sec Loss 4.7680 LearningRate 0.0003 Epoch: 21 Global Step: 435880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:31,109-Speed 6170.62 samples/sec Loss 4.7288 LearningRate 0.0003 Epoch: 21 Global Step: 435890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:34,357-Speed 6308.31 samples/sec Loss 4.7396 LearningRate 0.0003 Epoch: 21 Global Step: 435900 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:23:37,587-Speed 6341.93 samples/sec Loss 4.7316 LearningRate 0.0003 Epoch: 21 Global Step: 435910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:40,830-Speed 6317.66 samples/sec Loss 4.7687 LearningRate 0.0003 Epoch: 21 Global Step: 435920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:44,070-Speed 6321.47 samples/sec Loss 4.7532 LearningRate 0.0003 Epoch: 21 Global Step: 435930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:47,311-Speed 6321.49 samples/sec Loss 4.7194 LearningRate 0.0003 Epoch: 21 Global Step: 435940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:50,554-Speed 6315.67 samples/sec Loss 4.6775 LearningRate 0.0003 Epoch: 21 Global Step: 435950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:53,796-Speed 6319.68 samples/sec Loss 4.7258 LearningRate 0.0003 Epoch: 21 Global Step: 435960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:23:57,038-Speed 6319.14 samples/sec Loss 4.6976 LearningRate 0.0003 Epoch: 21 Global Step: 435970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:00,287-Speed 6304.99 samples/sec Loss 4.7938 LearningRate 0.0003 Epoch: 21 Global Step: 435980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:03,534-Speed 6309.27 samples/sec Loss 4.6884 LearningRate 0.0003 Epoch: 21 Global Step: 435990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:06,774-Speed 6320.96 samples/sec Loss 4.7165 
LearningRate 0.0003 Epoch: 21 Global Step: 436000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:10,006-Speed 6337.92 samples/sec Loss 4.7175 LearningRate 0.0003 Epoch: 21 Global Step: 436010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:13,247-Speed 6320.86 samples/sec Loss 4.7207 LearningRate 0.0003 Epoch: 21 Global Step: 436020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:16,493-Speed 6311.08 samples/sec Loss 4.7399 LearningRate 0.0003 Epoch: 21 Global Step: 436030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:19,735-Speed 6318.63 samples/sec Loss 4.7007 LearningRate 0.0003 Epoch: 21 Global Step: 436040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:22,982-Speed 6309.13 samples/sec Loss 4.7287 LearningRate 0.0003 Epoch: 21 Global Step: 436050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:26,223-Speed 6320.67 samples/sec Loss 4.7809 LearningRate 0.0003 Epoch: 21 Global Step: 436060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:29,464-Speed 6318.69 samples/sec Loss 4.6754 LearningRate 0.0003 Epoch: 21 Global Step: 436070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:32,708-Speed 6315.20 samples/sec Loss 4.7551 LearningRate 0.0003 Epoch: 21 Global Step: 436080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:35,950-Speed 6319.02 samples/sec Loss 4.7207 LearningRate 0.0003 Epoch: 21 Global Step: 436090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:39,191-Speed 6320.70 samples/sec Loss 4.7205 LearningRate 0.0003 Epoch: 21 Global Step: 436100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:42,435-Speed 6314.39 samples/sec Loss 4.7424 LearningRate 0.0003 Epoch: 21 Global Step: 436110 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:24:45,675-Speed 6321.64 samples/sec Loss 4.7415 LearningRate 0.0003 Epoch: 21 
Global Step: 436120 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:24:48,906-Speed 6340.57 samples/sec Loss 4.7225 LearningRate 0.0003 Epoch: 21 Global Step: 436130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:52,153-Speed 6308.85 samples/sec Loss 4.7894 LearningRate 0.0003 Epoch: 21 Global Step: 436140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:55,394-Speed 6320.72 samples/sec Loss 4.7279 LearningRate 0.0003 Epoch: 21 Global Step: 436150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:24:58,637-Speed 6315.98 samples/sec Loss 4.6935 LearningRate 0.0003 Epoch: 21 Global Step: 436160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:01,879-Speed 6318.39 samples/sec Loss 4.7344 LearningRate 0.0003 Epoch: 21 Global Step: 436170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:05,123-Speed 6315.66 samples/sec Loss 4.7303 LearningRate 0.0003 Epoch: 21 Global Step: 436180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:08,366-Speed 6317.14 samples/sec Loss 4.6836 LearningRate 0.0003 Epoch: 21 Global Step: 436190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:11,611-Speed 6312.76 samples/sec Loss 4.7568 LearningRate 0.0003 Epoch: 21 Global Step: 436200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:14,858-Speed 6308.03 samples/sec Loss 4.7383 LearningRate 0.0003 Epoch: 21 Global Step: 436210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:18,116-Speed 6286.27 samples/sec Loss 4.7874 LearningRate 0.0003 Epoch: 21 Global Step: 436220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:21,366-Speed 6305.58 samples/sec Loss 4.7805 LearningRate 0.0003 Epoch: 21 Global Step: 436230 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:25:24,610-Speed 6313.96 samples/sec Loss 4.7155 LearningRate 0.0003 Epoch: 21 Global Step: 436240 Fp16 Grad 
Scale: 32768 Required: 36 hours Training: 2022-04-02 07:25:27,842-Speed 6337.63 samples/sec Loss 4.7128 LearningRate 0.0003 Epoch: 21 Global Step: 436250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:31,088-Speed 6311.20 samples/sec Loss 4.7074 LearningRate 0.0003 Epoch: 21 Global Step: 436260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:34,332-Speed 6315.70 samples/sec Loss 4.7127 LearningRate 0.0003 Epoch: 21 Global Step: 436270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:37,578-Speed 6309.89 samples/sec Loss 4.7521 LearningRate 0.0003 Epoch: 21 Global Step: 436280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:40,825-Speed 6308.10 samples/sec Loss 4.7449 LearningRate 0.0003 Epoch: 21 Global Step: 436290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:44,071-Speed 6311.50 samples/sec Loss 4.7064 LearningRate 0.0003 Epoch: 21 Global Step: 436300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:47,313-Speed 6318.08 samples/sec Loss 4.7769 LearningRate 0.0003 Epoch: 21 Global Step: 436310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:50,559-Speed 6311.29 samples/sec Loss 4.7375 LearningRate 0.0003 Epoch: 21 Global Step: 436320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:53,800-Speed 6320.66 samples/sec Loss 4.7510 LearningRate 0.0003 Epoch: 21 Global Step: 436330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:25:57,048-Speed 6305.94 samples/sec Loss 4.7299 LearningRate 0.0003 Epoch: 21 Global Step: 436340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:00,294-Speed 6310.71 samples/sec Loss 4.7233 LearningRate 0.0003 Epoch: 21 Global Step: 436350 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:26:03,525-Speed 6340.09 samples/sec Loss 4.7360 LearningRate 0.0003 Epoch: 21 Global Step: 436360 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:26:06,780-Speed 6294.24 samples/sec Loss 4.6838 LearningRate 0.0003 Epoch: 21 Global Step: 436370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:10,025-Speed 6312.31 samples/sec Loss 4.7314 LearningRate 0.0003 Epoch: 21 Global Step: 436380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:13,268-Speed 6315.94 samples/sec Loss 4.7438 LearningRate 0.0003 Epoch: 21 Global Step: 436390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:16,524-Speed 6292.75 samples/sec Loss 4.7846 LearningRate 0.0003 Epoch: 21 Global Step: 436400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:19,769-Speed 6311.51 samples/sec Loss 4.7745 LearningRate 0.0003 Epoch: 21 Global Step: 436410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:23,017-Speed 6307.43 samples/sec Loss 4.7600 LearningRate 0.0003 Epoch: 21 Global Step: 436420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:26,262-Speed 6312.72 samples/sec Loss 4.7825 LearningRate 0.0003 Epoch: 21 Global Step: 436430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:29,504-Speed 6319.19 samples/sec Loss 4.7998 LearningRate 0.0003 Epoch: 21 Global Step: 436440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:32,760-Speed 6289.81 samples/sec Loss 4.7289 LearningRate 0.0003 Epoch: 21 Global Step: 436450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:36,062-Speed 6204.49 samples/sec Loss 4.7886 LearningRate 0.0003 Epoch: 21 Global Step: 436460 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:26:39,399-Speed 6139.11 samples/sec Loss 4.7639 LearningRate 0.0003 Epoch: 21 Global Step: 436470 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:26:42,626-Speed 6346.85 samples/sec Loss 4.7424 LearningRate 0.0003 Epoch: 21 Global Step: 436480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:26:45,872-Speed 6310.59 samples/sec Loss 4.7593 LearningRate 0.0003 Epoch: 21 Global Step: 436490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:49,116-Speed 6314.11 samples/sec Loss 4.6882 LearningRate 0.0003 Epoch: 21 Global Step: 436500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:52,362-Speed 6312.09 samples/sec Loss 4.7198 LearningRate 0.0003 Epoch: 21 Global Step: 436510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:55,607-Speed 6312.49 samples/sec Loss 4.7917 LearningRate 0.0003 Epoch: 21 Global Step: 436520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:26:58,857-Speed 6302.45 samples/sec Loss 4.7186 LearningRate 0.0003 Epoch: 21 Global Step: 436530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:02,102-Speed 6313.18 samples/sec Loss 4.7299 LearningRate 0.0003 Epoch: 21 Global Step: 436540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:05,342-Speed 6320.62 samples/sec Loss 4.7427 LearningRate 0.0003 Epoch: 21 Global Step: 436550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:08,589-Speed 6309.19 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 436560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:11,833-Speed 6314.93 samples/sec Loss 4.6948 LearningRate 0.0003 Epoch: 21 Global Step: 436570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:15,074-Speed 6320.30 samples/sec Loss 4.7475 LearningRate 0.0003 Epoch: 21 Global Step: 436580 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:18,323-Speed 6306.32 samples/sec Loss 4.7469 LearningRate 0.0003 Epoch: 21 Global Step: 436590 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:21,569-Speed 6310.22 samples/sec Loss 4.6449 LearningRate 0.0003 Epoch: 21 Global Step: 436600 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:24,817-Speed 6306.99 
samples/sec Loss 4.7694 LearningRate 0.0003 Epoch: 21 Global Step: 436610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:28,061-Speed 6315.43 samples/sec Loss 4.7197 LearningRate 0.0003 Epoch: 21 Global Step: 436620 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:31,306-Speed 6312.34 samples/sec Loss 4.7231 LearningRate 0.0003 Epoch: 21 Global Step: 436630 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:34,551-Speed 6312.41 samples/sec Loss 4.7217 LearningRate 0.0003 Epoch: 21 Global Step: 436640 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:27:37,781-Speed 6341.57 samples/sec Loss 4.7291 LearningRate 0.0003 Epoch: 21 Global Step: 436650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:41,026-Speed 6312.22 samples/sec Loss 4.7509 LearningRate 0.0003 Epoch: 21 Global Step: 436660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:44,275-Speed 6306.48 samples/sec Loss 4.7248 LearningRate 0.0003 Epoch: 21 Global Step: 436670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:47,520-Speed 6311.88 samples/sec Loss 4.7060 LearningRate 0.0003 Epoch: 21 Global Step: 436680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:50,764-Speed 6313.47 samples/sec Loss 4.7097 LearningRate 0.0003 Epoch: 21 Global Step: 436690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:54,006-Speed 6319.34 samples/sec Loss 4.7482 LearningRate 0.0003 Epoch: 21 Global Step: 436700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:27:57,248-Speed 6319.04 samples/sec Loss 4.7692 LearningRate 0.0003 Epoch: 21 Global Step: 436710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:00,491-Speed 6315.26 samples/sec Loss 4.8067 LearningRate 0.0003 Epoch: 21 Global Step: 436720 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:03,736-Speed 6312.64 samples/sec Loss 4.7573 
LearningRate 0.0003 Epoch: 21 Global Step: 436730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:06,980-Speed 6315.54 samples/sec Loss 4.7256 LearningRate 0.0003 Epoch: 21 Global Step: 436740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:10,207-Speed 6348.44 samples/sec Loss 4.7436 LearningRate 0.0003 Epoch: 21 Global Step: 436750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:13,455-Speed 6306.93 samples/sec Loss 4.7646 LearningRate 0.0003 Epoch: 21 Global Step: 436760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:16,697-Speed 6318.14 samples/sec Loss 4.7479 LearningRate 0.0003 Epoch: 21 Global Step: 436770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:19,945-Speed 6305.72 samples/sec Loss 4.8153 LearningRate 0.0003 Epoch: 21 Global Step: 436780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:23,191-Speed 6311.79 samples/sec Loss 4.7670 LearningRate 0.0003 Epoch: 21 Global Step: 436790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:26,437-Speed 6309.05 samples/sec Loss 4.7279 LearningRate 0.0003 Epoch: 21 Global Step: 436800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:29,684-Speed 6309.91 samples/sec Loss 4.7704 LearningRate 0.0003 Epoch: 21 Global Step: 436810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:32,927-Speed 6316.93 samples/sec Loss 4.6571 LearningRate 0.0003 Epoch: 21 Global Step: 436820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:36,173-Speed 6311.00 samples/sec Loss 4.6823 LearningRate 0.0003 Epoch: 21 Global Step: 436830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:39,418-Speed 6313.59 samples/sec Loss 4.7691 LearningRate 0.0003 Epoch: 21 Global Step: 436840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:42,663-Speed 6311.79 samples/sec Loss 4.7842 LearningRate 0.0003 Epoch: 21 
Global Step: 436850 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:28:45,892-Speed 6343.59 samples/sec Loss 4.7515 LearningRate 0.0003 Epoch: 21 Global Step: 436860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:28:49,124-Speed 6338.96 samples/sec Loss 4.7134 LearningRate 0.0003 Epoch: 21 Global Step: 436870 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:28:52,370-Speed 6310.55 samples/sec Loss 4.7820 LearningRate 0.0003 Epoch: 21 Global Step: 436880 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:28:55,610-Speed 6321.91 samples/sec Loss 4.7208 LearningRate 0.0003 Epoch: 21 Global Step: 436890 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:28:58,854-Speed 6315.22 samples/sec Loss 4.7147 LearningRate 0.0003 Epoch: 21 Global Step: 436900 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:02,099-Speed 6313.01 samples/sec Loss 4.6976 LearningRate 0.0003 Epoch: 21 Global Step: 436910 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:05,345-Speed 6310.71 samples/sec Loss 4.7031 LearningRate 0.0003 Epoch: 21 Global Step: 436920 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:08,592-Speed 6308.88 samples/sec Loss 4.6871 LearningRate 0.0003 Epoch: 21 Global Step: 436930 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:11,834-Speed 6318.33 samples/sec Loss 4.7390 LearningRate 0.0003 Epoch: 21 Global Step: 436940 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:15,078-Speed 6312.81 samples/sec Loss 4.7748 LearningRate 0.0003 Epoch: 21 Global Step: 436950 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:18,328-Speed 6303.13 samples/sec Loss 4.7526 LearningRate 0.0003 Epoch: 21 Global Step: 436960 Fp16 Grad Scale: 8192 Required: 36 hours Training: 2022-04-02 07:29:21,575-Speed 6309.76 samples/sec Loss 4.7249 LearningRate 0.0003 Epoch: 21 Global Step: 436970 Fp16 Grad Scale: 
16384 Required: 36 hours Training: 2022-04-02 07:29:24,820-Speed 6312.90 samples/sec Loss 4.7556 LearningRate 0.0003 Epoch: 21 Global Step: 436980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:28,063-Speed 6315.59 samples/sec Loss 4.6784 LearningRate 0.0003 Epoch: 21 Global Step: 436990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:31,311-Speed 6306.88 samples/sec Loss 4.6511 LearningRate 0.0003 Epoch: 21 Global Step: 437000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:34,553-Speed 6318.32 samples/sec Loss 4.7826 LearningRate 0.0003 Epoch: 21 Global Step: 437010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:37,800-Speed 6309.09 samples/sec Loss 4.6994 LearningRate 0.0003 Epoch: 21 Global Step: 437020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:41,046-Speed 6312.74 samples/sec Loss 4.7376 LearningRate 0.0003 Epoch: 21 Global Step: 437030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:44,289-Speed 6315.45 samples/sec Loss 4.7203 LearningRate 0.0003 Epoch: 21 Global Step: 437040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:47,535-Speed 6309.91 samples/sec Loss 4.7037 LearningRate 0.0003 Epoch: 21 Global Step: 437050 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:50,780-Speed 6314.43 samples/sec Loss 4.7219 LearningRate 0.0003 Epoch: 21 Global Step: 437060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:29:54,026-Speed 6312.42 samples/sec Loss 4.7783 LearningRate 0.0003 Epoch: 21 Global Step: 437070 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:29:57,256-Speed 6342.68 samples/sec Loss 4.6639 LearningRate 0.0003 Epoch: 21 Global Step: 437080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:00,498-Speed 6318.10 samples/sec Loss 4.6243 LearningRate 0.0003 Epoch: 21 Global Step: 437090 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:30:03,743-Speed 6313.58 samples/sec Loss 4.7478 LearningRate 0.0003 Epoch: 21 Global Step: 437100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:07,009-Speed 6270.32 samples/sec Loss 4.7183 LearningRate 0.0003 Epoch: 21 Global Step: 437110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:10,314-Speed 6199.72 samples/sec Loss 4.7226 LearningRate 0.0003 Epoch: 21 Global Step: 437120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:13,555-Speed 6318.89 samples/sec Loss 4.7401 LearningRate 0.0003 Epoch: 21 Global Step: 437130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:16,799-Speed 6315.57 samples/sec Loss 4.7569 LearningRate 0.0003 Epoch: 21 Global Step: 437140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:20,041-Speed 6317.01 samples/sec Loss 4.7285 LearningRate 0.0003 Epoch: 21 Global Step: 437150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:23,286-Speed 6313.46 samples/sec Loss 4.7225 LearningRate 0.0003 Epoch: 21 Global Step: 437160 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:26,529-Speed 6317.11 samples/sec Loss 4.7499 LearningRate 0.0003 Epoch: 21 Global Step: 437170 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:29,770-Speed 6319.70 samples/sec Loss 4.7240 LearningRate 0.0003 Epoch: 21 Global Step: 437180 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:30:33,000-Speed 6342.64 samples/sec Loss 4.6890 LearningRate 0.0003 Epoch: 21 Global Step: 437190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:36,242-Speed 6317.66 samples/sec Loss 4.7432 LearningRate 0.0003 Epoch: 21 Global Step: 437200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:39,483-Speed 6321.58 samples/sec Loss 4.7620 LearningRate 0.0003 Epoch: 21 Global Step: 437210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:30:42,727-Speed 6314.62 samples/sec Loss 4.7100 LearningRate 0.0003 Epoch: 21 Global Step: 437220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:45,985-Speed 6288.55 samples/sec Loss 4.6894 LearningRate 0.0003 Epoch: 21 Global Step: 437230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:49,230-Speed 6311.40 samples/sec Loss 4.7240 LearningRate 0.0003 Epoch: 21 Global Step: 437240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:52,476-Speed 6310.85 samples/sec Loss 4.7375 LearningRate 0.0003 Epoch: 21 Global Step: 437250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:55,725-Speed 6304.81 samples/sec Loss 4.7364 LearningRate 0.0003 Epoch: 21 Global Step: 437260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:30:58,978-Speed 6297.51 samples/sec Loss 4.8111 LearningRate 0.0003 Epoch: 21 Global Step: 437270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:02,223-Speed 6313.78 samples/sec Loss 4.7671 LearningRate 0.0003 Epoch: 21 Global Step: 437280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:05,456-Speed 6335.97 samples/sec Loss 4.7596 LearningRate 0.0003 Epoch: 21 Global Step: 437290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:08,701-Speed 6312.56 samples/sec Loss 4.6972 LearningRate 0.0003 Epoch: 21 Global Step: 437300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:11,948-Speed 6309.18 samples/sec Loss 4.7648 LearningRate 0.0003 Epoch: 21 Global Step: 437310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:15,190-Speed 6318.11 samples/sec Loss 4.7925 LearningRate 0.0003 Epoch: 21 Global Step: 437320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:18,431-Speed 6320.54 samples/sec Loss 4.7002 LearningRate 0.0003 Epoch: 21 Global Step: 437330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:21,674-Speed 6315.13 
samples/sec Loss 4.6617 LearningRate 0.0003 Epoch: 21 Global Step: 437340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:24,919-Speed 6313.11 samples/sec Loss 4.8051 LearningRate 0.0003 Epoch: 21 Global Step: 437350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:28,167-Speed 6307.17 samples/sec Loss 4.7142 LearningRate 0.0003 Epoch: 21 Global Step: 437360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:31,411-Speed 6314.50 samples/sec Loss 4.7039 LearningRate 0.0003 Epoch: 21 Global Step: 437370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:34,657-Speed 6311.41 samples/sec Loss 4.6714 LearningRate 0.0003 Epoch: 21 Global Step: 437380 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:37,902-Speed 6312.60 samples/sec Loss 4.7249 LearningRate 0.0003 Epoch: 21 Global Step: 437390 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:31:41,143-Speed 6319.10 samples/sec Loss 4.7242 LearningRate 0.0003 Epoch: 21 Global Step: 437400 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:31:44,374-Speed 6340.38 samples/sec Loss 4.6781 LearningRate 0.0003 Epoch: 21 Global Step: 437410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:47,618-Speed 6315.64 samples/sec Loss 4.7314 LearningRate 0.0003 Epoch: 21 Global Step: 437420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:50,865-Speed 6307.18 samples/sec Loss 4.8035 LearningRate 0.0003 Epoch: 21 Global Step: 437430 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:54,115-Speed 6303.69 samples/sec Loss 4.7228 LearningRate 0.0003 Epoch: 21 Global Step: 437440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:31:57,362-Speed 6309.40 samples/sec Loss 4.8039 LearningRate 0.0003 Epoch: 21 Global Step: 437450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:00,607-Speed 6314.11 samples/sec Loss 4.7057 
LearningRate 0.0003 Epoch: 21 Global Step: 437460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:03,851-Speed 6314.66 samples/sec Loss 4.7623 LearningRate 0.0003 Epoch: 21 Global Step: 437470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:07,093-Speed 6317.29 samples/sec Loss 4.7115 LearningRate 0.0003 Epoch: 21 Global Step: 437480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:10,339-Speed 6311.37 samples/sec Loss 4.7041 LearningRate 0.0003 Epoch: 21 Global Step: 437490 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:13,586-Speed 6309.09 samples/sec Loss 4.7364 LearningRate 0.0003 Epoch: 21 Global Step: 437500 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:16,817-Speed 6339.64 samples/sec Loss 4.6675 LearningRate 0.0003 Epoch: 21 Global Step: 437510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:20,061-Speed 6314.38 samples/sec Loss 4.7664 LearningRate 0.0003 Epoch: 21 Global Step: 437520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:23,305-Speed 6313.91 samples/sec Loss 4.7110 LearningRate 0.0003 Epoch: 21 Global Step: 437530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:26,547-Speed 6319.59 samples/sec Loss 4.7404 LearningRate 0.0003 Epoch: 21 Global Step: 437540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:29,794-Speed 6308.23 samples/sec Loss 4.6952 LearningRate 0.0003 Epoch: 21 Global Step: 437550 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:33,044-Speed 6302.94 samples/sec Loss 4.6402 LearningRate 0.0003 Epoch: 21 Global Step: 437560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:36,289-Speed 6312.60 samples/sec Loss 4.8002 LearningRate 0.0003 Epoch: 21 Global Step: 437570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:39,537-Speed 6307.46 samples/sec Loss 4.7734 LearningRate 0.0003 Epoch: 21 
Global Step: 437580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:42,781-Speed 6313.51 samples/sec Loss 4.7528 LearningRate 0.0003 Epoch: 21 Global Step: 437590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:46,038-Speed 6289.92 samples/sec Loss 4.7089 LearningRate 0.0003 Epoch: 21 Global Step: 437600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:49,280-Speed 6318.91 samples/sec Loss 4.7510 LearningRate 0.0003 Epoch: 21 Global Step: 437610 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:32:52,506-Speed 6348.35 samples/sec Loss 4.8237 LearningRate 0.0003 Epoch: 21 Global Step: 437620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:55,765-Speed 6286.38 samples/sec Loss 4.7679 LearningRate 0.0003 Epoch: 21 Global Step: 437630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:32:59,016-Speed 6300.06 samples/sec Loss 4.6900 LearningRate 0.0003 Epoch: 21 Global Step: 437640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:02,260-Speed 6315.83 samples/sec Loss 4.6868 LearningRate 0.0003 Epoch: 21 Global Step: 437650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:05,507-Speed 6308.87 samples/sec Loss 4.7742 LearningRate 0.0003 Epoch: 21 Global Step: 437660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:08,752-Speed 6312.03 samples/sec Loss 4.7062 LearningRate 0.0003 Epoch: 21 Global Step: 437670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:11,997-Speed 6313.13 samples/sec Loss 4.7365 LearningRate 0.0003 Epoch: 21 Global Step: 437680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:15,241-Speed 6316.16 samples/sec Loss 4.7434 LearningRate 0.0003 Epoch: 21 Global Step: 437690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:18,486-Speed 6312.36 samples/sec Loss 4.6498 LearningRate 0.0003 Epoch: 21 Global Step: 437700 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:21,727-Speed 6319.78 samples/sec Loss 4.6723 LearningRate 0.0003 Epoch: 21 Global Step: 437710 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:24,971-Speed 6314.00 samples/sec Loss 4.7479 LearningRate 0.0003 Epoch: 21 Global Step: 437720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:33:28,218-Speed 6309.71 samples/sec Loss 4.6853 LearningRate 0.0003 Epoch: 21 Global Step: 437730 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:33:31,459-Speed 6320.02 samples/sec Loss 4.6212 LearningRate 0.0003 Epoch: 21 Global Step: 437740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:34,705-Speed 6310.56 samples/sec Loss 4.7023 LearningRate 0.0003 Epoch: 21 Global Step: 437750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:37,947-Speed 6319.57 samples/sec Loss 4.7380 LearningRate 0.0003 Epoch: 21 Global Step: 437760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:41,194-Speed 6308.33 samples/sec Loss 4.7859 LearningRate 0.0003 Epoch: 21 Global Step: 437770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:44,437-Speed 6316.50 samples/sec Loss 4.7019 LearningRate 0.0003 Epoch: 21 Global Step: 437780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:47,685-Speed 6306.70 samples/sec Loss 4.6343 LearningRate 0.0003 Epoch: 21 Global Step: 437790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:50,930-Speed 6311.87 samples/sec Loss 4.6850 LearningRate 0.0003 Epoch: 21 Global Step: 437800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:54,173-Speed 6316.01 samples/sec Loss 4.7622 LearningRate 0.0003 Epoch: 21 Global Step: 437810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:33:57,419-Speed 6312.19 samples/sec Loss 4.7296 LearningRate 0.0003 Epoch: 21 Global Step: 437820 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:34:00,663-Speed 6313.84 samples/sec Loss 4.7180 LearningRate 0.0003 Epoch: 21 Global Step: 437830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:03,911-Speed 6307.12 samples/sec Loss 4.7553 LearningRate 0.0003 Epoch: 21 Global Step: 437840 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:34:07,142-Speed 6338.95 samples/sec Loss 4.7655 LearningRate 0.0003 Epoch: 21 Global Step: 437850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:10,386-Speed 6316.47 samples/sec Loss 4.6871 LearningRate 0.0003 Epoch: 21 Global Step: 437860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:13,633-Speed 6307.37 samples/sec Loss 4.6773 LearningRate 0.0003 Epoch: 21 Global Step: 437870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:16,877-Speed 6315.93 samples/sec Loss 4.6571 LearningRate 0.0003 Epoch: 21 Global Step: 437880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:20,123-Speed 6310.01 samples/sec Loss 4.7206 LearningRate 0.0003 Epoch: 21 Global Step: 437890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:23,366-Speed 6317.32 samples/sec Loss 4.7061 LearningRate 0.0003 Epoch: 21 Global Step: 437900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:26,623-Speed 6288.91 samples/sec Loss 4.6567 LearningRate 0.0003 Epoch: 21 Global Step: 437910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:29,867-Speed 6315.32 samples/sec Loss 4.6869 LearningRate 0.0003 Epoch: 21 Global Step: 437920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:33,108-Speed 6319.91 samples/sec Loss 4.7349 LearningRate 0.0003 Epoch: 21 Global Step: 437930 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:36,358-Speed 6303.43 samples/sec Loss 4.7406 LearningRate 0.0003 Epoch: 21 Global Step: 437940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:34:39,588-Speed 6342.73 samples/sec Loss 4.7431 LearningRate 0.0003 Epoch: 21 Global Step: 437950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:42,831-Speed 6315.01 samples/sec Loss 4.7580 LearningRate 0.0003 Epoch: 21 Global Step: 437960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:46,078-Speed 6309.23 samples/sec Loss 4.7172 LearningRate 0.0003 Epoch: 21 Global Step: 437970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:49,334-Speed 6292.07 samples/sec Loss 4.7341 LearningRate 0.0003 Epoch: 21 Global Step: 437980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:52,586-Speed 6299.15 samples/sec Loss 4.6892 LearningRate 0.0003 Epoch: 21 Global Step: 437990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:55,833-Speed 6307.87 samples/sec Loss 4.8071 LearningRate 0.0003 Epoch: 21 Global Step: 438000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:34:59,080-Speed 6309.97 samples/sec Loss 4.7226 LearningRate 0.0003 Epoch: 21 Global Step: 438010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:02,330-Speed 6301.32 samples/sec Loss 4.7052 LearningRate 0.0003 Epoch: 21 Global Step: 438020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:05,580-Speed 6304.39 samples/sec Loss 4.6673 LearningRate 0.0003 Epoch: 21 Global Step: 438030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:08,825-Speed 6311.18 samples/sec Loss 4.8274 LearningRate 0.0003 Epoch: 21 Global Step: 438040 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:12,068-Speed 6318.24 samples/sec Loss 4.7018 LearningRate 0.0003 Epoch: 21 Global Step: 438050 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:35:15,301-Speed 6336.33 samples/sec Loss 4.7133 LearningRate 0.0003 Epoch: 21 Global Step: 438060 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:18,544-Speed 6315.47 
samples/sec Loss 4.6646 LearningRate 0.0003 Epoch: 21 Global Step: 438070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:21,788-Speed 6313.87 samples/sec Loss 4.7558 LearningRate 0.0003 Epoch: 21 Global Step: 438080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:25,056-Speed 6268.53 samples/sec Loss 4.7272 LearningRate 0.0003 Epoch: 21 Global Step: 438090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:28,301-Speed 6314.30 samples/sec Loss 4.6914 LearningRate 0.0003 Epoch: 21 Global Step: 438100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:31,550-Speed 6304.14 samples/sec Loss 4.7346 LearningRate 0.0003 Epoch: 21 Global Step: 438110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:34,856-Speed 6197.16 samples/sec Loss 4.6418 LearningRate 0.0003 Epoch: 21 Global Step: 438120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:38,135-Speed 6245.74 samples/sec Loss 4.7572 LearningRate 0.0003 Epoch: 21 Global Step: 438130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:41,392-Speed 6289.31 samples/sec Loss 4.7249 LearningRate 0.0003 Epoch: 21 Global Step: 438140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:44,639-Speed 6309.61 samples/sec Loss 4.6835 LearningRate 0.0003 Epoch: 21 Global Step: 438150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:47,941-Speed 6203.01 samples/sec Loss 4.6854 LearningRate 0.0003 Epoch: 21 Global Step: 438160 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:35:51,185-Speed 6314.68 samples/sec Loss 4.7182 LearningRate 0.0003 Epoch: 21 Global Step: 438170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:35:54,418-Speed 6337.26 samples/sec Loss 4.6774 LearningRate 0.0003 Epoch: 21 Global Step: 438180 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:35:57,669-Speed 6301.16 samples/sec Loss 4.7266 
LearningRate 0.0003 Epoch: 21 Global Step: 438190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:00,919-Speed 6302.62 samples/sec Loss 4.7311 LearningRate 0.0003 Epoch: 21 Global Step: 438200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:04,161-Speed 6317.49 samples/sec Loss 4.7407 LearningRate 0.0003 Epoch: 21 Global Step: 438210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:07,405-Speed 6314.69 samples/sec Loss 4.6801 LearningRate 0.0003 Epoch: 21 Global Step: 438220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:10,651-Speed 6311.54 samples/sec Loss 4.7223 LearningRate 0.0003 Epoch: 21 Global Step: 438230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:13,893-Speed 6318.74 samples/sec Loss 4.7237 LearningRate 0.0003 Epoch: 21 Global Step: 438240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:17,140-Speed 6307.91 samples/sec Loss 4.6579 LearningRate 0.0003 Epoch: 21 Global Step: 438250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:20,385-Speed 6312.69 samples/sec Loss 4.7054 LearningRate 0.0003 Epoch: 21 Global Step: 438260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:23,628-Speed 6315.96 samples/sec Loss 4.6916 LearningRate 0.0003 Epoch: 21 Global Step: 438270 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:26,859-Speed 6340.53 samples/sec Loss 4.7662 LearningRate 0.0003 Epoch: 21 Global Step: 438280 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:30,102-Speed 6316.56 samples/sec Loss 4.7274 LearningRate 0.0003 Epoch: 21 Global Step: 438290 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:33,347-Speed 6313.04 samples/sec Loss 4.7298 LearningRate 0.0003 Epoch: 21 Global Step: 438300 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:36,591-Speed 6314.58 samples/sec Loss 4.7142 LearningRate 0.0003 Epoch: 21 
Global Step: 438310 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:39,833-Speed 6319.62 samples/sec Loss 4.7258 LearningRate 0.0003 Epoch: 21 Global Step: 438320 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:43,087-Speed 6295.75 samples/sec Loss 4.7633 LearningRate 0.0003 Epoch: 21 Global Step: 438330 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:46,332-Speed 6311.47 samples/sec Loss 4.7226 LearningRate 0.0003 Epoch: 21 Global Step: 438340 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:49,580-Speed 6308.11 samples/sec Loss 4.7159 LearningRate 0.0003 Epoch: 21 Global Step: 438350 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:52,825-Speed 6311.83 samples/sec Loss 4.6680 LearningRate 0.0003 Epoch: 21 Global Step: 438360 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:56,072-Speed 6307.95 samples/sec Loss 4.7182 LearningRate 0.0003 Epoch: 21 Global Step: 438370 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:36:59,319-Speed 6309.40 samples/sec Loss 4.7637 LearningRate 0.0003 Epoch: 21 Global Step: 438380 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:37:02,555-Speed 6331.15 samples/sec Loss 4.7130 LearningRate 0.0003 Epoch: 21 Global Step: 438390 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:05,798-Speed 6315.27 samples/sec Loss 4.6604 LearningRate 0.0003 Epoch: 21 Global Step: 438400 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:09,044-Speed 6310.80 samples/sec Loss 4.7168 LearningRate 0.0003 Epoch: 21 Global Step: 438410 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:12,290-Speed 6310.98 samples/sec Loss 4.6724 LearningRate 0.0003 Epoch: 21 Global Step: 438420 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:15,536-Speed 6310.39 samples/sec Loss 4.7415 LearningRate 0.0003 Epoch: 21 Global Step: 438430 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:18,787-Speed 6300.84 samples/sec Loss 4.6799 LearningRate 0.0003 Epoch: 21 Global Step: 438440 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:22,034-Speed 6309.40 samples/sec Loss 4.7015 LearningRate 0.0003 Epoch: 21 Global Step: 438450 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:25,284-Speed 6302.94 samples/sec Loss 4.6226 LearningRate 0.0003 Epoch: 21 Global Step: 438460 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:28,528-Speed 6313.98 samples/sec Loss 4.6910 LearningRate 0.0003 Epoch: 21 Global Step: 438470 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:31,772-Speed 6315.89 samples/sec Loss 4.6822 LearningRate 0.0003 Epoch: 21 Global Step: 438480 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:35,018-Speed 6310.13 samples/sec Loss 4.7479 LearningRate 0.0003 Epoch: 21 Global Step: 438490 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:37:38,264-Speed 6311.66 samples/sec Loss 4.7011 LearningRate 0.0003 Epoch: 21 Global Step: 438500 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:37:41,496-Speed 6338.70 samples/sec Loss 4.6762 LearningRate 0.0003 Epoch: 21 Global Step: 438510 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:44,741-Speed 6311.49 samples/sec Loss 4.8142 LearningRate 0.0003 Epoch: 21 Global Step: 438520 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:47,988-Speed 6309.32 samples/sec Loss 4.7350 LearningRate 0.0003 Epoch: 21 Global Step: 438530 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:51,233-Speed 6311.94 samples/sec Loss 4.7510 LearningRate 0.0003 Epoch: 21 Global Step: 438540 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:37:54,481-Speed 6307.98 samples/sec Loss 4.7101 LearningRate 0.0003 Epoch: 21 Global Step: 438550 Fp16 Grad Scale: 16384 Required: 36 hours 
Training: 2022-04-02 07:37:57,728-Speed 6309.22 samples/sec Loss 4.7067 LearningRate 0.0003 Epoch: 21 Global Step: 438560 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:00,973-Speed 6312.03 samples/sec Loss 4.7079 LearningRate 0.0003 Epoch: 21 Global Step: 438570 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:04,217-Speed 6314.28 samples/sec Loss 4.6832 LearningRate 0.0003 Epoch: 21 Global Step: 438580 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:07,465-Speed 6306.26 samples/sec Loss 4.7360 LearningRate 0.0003 Epoch: 21 Global Step: 438590 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:10,710-Speed 6313.68 samples/sec Loss 4.7445 LearningRate 0.0003 Epoch: 21 Global Step: 438600 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:13,946-Speed 6329.17 samples/sec Loss 4.7314 LearningRate 0.0003 Epoch: 21 Global Step: 438610 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:17,194-Speed 6307.29 samples/sec Loss 4.7430 LearningRate 0.0003 Epoch: 21 Global Step: 438620 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:20,440-Speed 6311.36 samples/sec Loss 4.7396 LearningRate 0.0003 Epoch: 21 Global Step: 438630 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:23,688-Speed 6306.17 samples/sec Loss 4.6684 LearningRate 0.0003 Epoch: 21 Global Step: 438640 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:26,934-Speed 6309.63 samples/sec Loss 4.7564 LearningRate 0.0003 Epoch: 21 Global Step: 438650 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:30,180-Speed 6311.55 samples/sec Loss 4.7598 LearningRate 0.0003 Epoch: 21 Global Step: 438660 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:33,429-Speed 6304.43 samples/sec Loss 4.7587 LearningRate 0.0003 Epoch: 21 Global Step: 438670 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 
07:38:36,675-Speed 6310.95 samples/sec Loss 4.7010 LearningRate 0.0003 Epoch: 21 Global Step: 438680 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:39,917-Speed 6318.25 samples/sec Loss 4.7608 LearningRate 0.0003 Epoch: 21 Global Step: 438690 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:43,163-Speed 6311.93 samples/sec Loss 4.7594 LearningRate 0.0003 Epoch: 21 Global Step: 438700 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:46,407-Speed 6315.62 samples/sec Loss 4.7011 LearningRate 0.0003 Epoch: 21 Global Step: 438710 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:38:49,653-Speed 6310.26 samples/sec Loss 4.7486 LearningRate 0.0003 Epoch: 21 Global Step: 438720 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:38:52,886-Speed 6335.30 samples/sec Loss 4.6317 LearningRate 0.0003 Epoch: 21 Global Step: 438730 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:56,132-Speed 6310.50 samples/sec Loss 4.6973 LearningRate 0.0003 Epoch: 21 Global Step: 438740 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:38:59,382-Speed 6304.56 samples/sec Loss 4.7430 LearningRate 0.0003 Epoch: 21 Global Step: 438750 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:02,632-Speed 6302.87 samples/sec Loss 4.6820 LearningRate 0.0003 Epoch: 21 Global Step: 438760 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:05,877-Speed 6312.23 samples/sec Loss 4.7228 LearningRate 0.0003 Epoch: 21 Global Step: 438770 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:09,126-Speed 6304.46 samples/sec Loss 4.7553 LearningRate 0.0003 Epoch: 21 Global Step: 438780 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:12,372-Speed 6310.78 samples/sec Loss 4.6938 LearningRate 0.0003 Epoch: 21 Global Step: 438790 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:15,627-Speed 6305.38 
samples/sec Loss 4.7187 LearningRate 0.0003 Epoch: 21 Global Step: 438800 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:18,870-Speed 6315.48 samples/sec Loss 4.7346 LearningRate 0.0003 Epoch: 21 Global Step: 438810 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:22,117-Speed 6310.10 samples/sec Loss 4.7764 LearningRate 0.0003 Epoch: 21 Global Step: 438820 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:25,397-Speed 6245.27 samples/sec Loss 4.7161 LearningRate 0.0003 Epoch: 21 Global Step: 438830 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:28,673-Speed 6253.36 samples/sec Loss 4.7654 LearningRate 0.0003 Epoch: 21 Global Step: 438840 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:31,916-Speed 6315.96 samples/sec Loss 4.7255 LearningRate 0.0003 Epoch: 21 Global Step: 438850 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:35,164-Speed 6307.04 samples/sec Loss 4.7358 LearningRate 0.0003 Epoch: 21 Global Step: 438860 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:38,411-Speed 6309.08 samples/sec Loss 4.6870 LearningRate 0.0003 Epoch: 21 Global Step: 438870 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:41,659-Speed 6306.79 samples/sec Loss 4.7405 LearningRate 0.0003 Epoch: 21 Global Step: 438880 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:44,901-Speed 6319.06 samples/sec Loss 4.6898 LearningRate 0.0003 Epoch: 21 Global Step: 438890 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:48,146-Speed 6311.35 samples/sec Loss 4.5774 LearningRate 0.0003 Epoch: 21 Global Step: 438900 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:51,392-Speed 6311.98 samples/sec Loss 4.6673 LearningRate 0.0003 Epoch: 21 Global Step: 438910 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:54,645-Speed 6296.81 samples/sec Loss 4.6880 
LearningRate 0.0003 Epoch: 21 Global Step: 438920 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:39:57,890-Speed 6313.62 samples/sec Loss 4.7355 LearningRate 0.0003 Epoch: 21 Global Step: 438930 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:40:01,123-Speed 6335.17 samples/sec Loss 4.6793 LearningRate 0.0003 Epoch: 21 Global Step: 438940 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:04,367-Speed 6314.94 samples/sec Loss 4.6986 LearningRate 0.0003 Epoch: 21 Global Step: 438950 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:07,615-Speed 6307.46 samples/sec Loss 4.7200 LearningRate 0.0003 Epoch: 21 Global Step: 438960 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:10,861-Speed 6310.19 samples/sec Loss 4.6587 LearningRate 0.0003 Epoch: 21 Global Step: 438970 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:14,108-Speed 6307.90 samples/sec Loss 4.7599 LearningRate 0.0003 Epoch: 21 Global Step: 438980 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:17,352-Speed 6315.91 samples/sec Loss 4.6882 LearningRate 0.0003 Epoch: 21 Global Step: 438990 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:20,598-Speed 6309.80 samples/sec Loss 4.6188 LearningRate 0.0003 Epoch: 21 Global Step: 439000 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:23,856-Speed 6287.60 samples/sec Loss 4.7044 LearningRate 0.0003 Epoch: 21 Global Step: 439010 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:27,102-Speed 6311.72 samples/sec Loss 4.6817 LearningRate 0.0003 Epoch: 21 Global Step: 439020 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:30,369-Speed 6268.85 samples/sec Loss 4.6510 LearningRate 0.0003 Epoch: 21 Global Step: 439030 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:33,615-Speed 6310.51 samples/sec Loss 4.7246 LearningRate 0.0003 Epoch: 21 
Global Step: 439040 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:40:36,859-Speed 6314.59 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 439050 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:40:40,151-Speed 6222.38 samples/sec Loss 4.7195 LearningRate 0.0003 Epoch: 21 Global Step: 439060 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:40:43,435-Speed 6237.96 samples/sec Loss 4.6942 LearningRate 0.0003 Epoch: 21 Global Step: 439070 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:46,681-Speed 6310.49 samples/sec Loss 4.7352 LearningRate 0.0003 Epoch: 21 Global Step: 439080 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:49,927-Speed 6310.96 samples/sec Loss 4.7560 LearningRate 0.0003 Epoch: 21 Global Step: 439090 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:53,175-Speed 6308.11 samples/sec Loss 4.6924 LearningRate 0.0003 Epoch: 21 Global Step: 439100 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:56,422-Speed 6308.21 samples/sec Loss 4.6756 LearningRate 0.0003 Epoch: 21 Global Step: 439110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:40:59,669-Speed 6308.60 samples/sec Loss 4.7334 LearningRate 0.0003 Epoch: 21 Global Step: 439120 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:02,913-Speed 6315.52 samples/sec Loss 4.7289 LearningRate 0.0003 Epoch: 21 Global Step: 439130 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:06,159-Speed 6310.91 samples/sec Loss 4.7842 LearningRate 0.0003 Epoch: 21 Global Step: 439140 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:09,405-Speed 6311.08 samples/sec Loss 4.6982 LearningRate 0.0003 Epoch: 21 Global Step: 439150 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:12,648-Speed 6315.79 samples/sec Loss 4.6988 LearningRate 0.0003 Epoch: 21 Global Step: 439160 Fp16 Grad 
Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:15,898-Speed 6304.19 samples/sec Loss 4.6814 LearningRate 0.0003 Epoch: 21 Global Step: 439170 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:41:19,144-Speed 6309.00 samples/sec Loss 4.6563 LearningRate 0.0003 Epoch: 21 Global Step: 439180 Fp16 Grad Scale: 32768 Required: 36 hours Training: 2022-04-02 07:41:22,374-Speed 6342.56 samples/sec Loss 4.7386 LearningRate 0.0003 Epoch: 21 Global Step: 439190 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:25,656-Speed 6242.48 samples/sec Loss 4.6160 LearningRate 0.0003 Epoch: 21 Global Step: 439200 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:28,899-Speed 6316.47 samples/sec Loss 4.7201 LearningRate 0.0003 Epoch: 21 Global Step: 439210 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:32,145-Speed 6309.92 samples/sec Loss 4.6624 LearningRate 0.0003 Epoch: 21 Global Step: 439220 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:35,388-Speed 6316.67 samples/sec Loss 4.7207 LearningRate 0.0003 Epoch: 21 Global Step: 439230 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:38,636-Speed 6306.91 samples/sec Loss 4.7363 LearningRate 0.0003 Epoch: 21 Global Step: 439240 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:41,880-Speed 6313.32 samples/sec Loss 4.7021 LearningRate 0.0003 Epoch: 21 Global Step: 439250 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:45,124-Speed 6315.61 samples/sec Loss 4.7225 LearningRate 0.0003 Epoch: 21 Global Step: 439260 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-02 07:41:48,367-Speed 6315.77 samples/sec Loss 4.6796 LearningRate 0.0003 Epoch: 21 Global Step: 439270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:41:51,611-Speed 6316.25 samples/sec Loss 4.7737 LearningRate 0.0003 Epoch: 21 Global Step: 439280 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 07:41:54,857-Speed 6310.41 samples/sec Loss 4.7141 LearningRate 0.0003 Epoch: 21 Global Step: 439290 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:41:58,086-Speed 6344.07 samples/sec Loss 4.7177 LearningRate 0.0003 Epoch: 21 Global Step: 439300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:01,337-Speed 6299.89 samples/sec Loss 4.6587 LearningRate 0.0003 Epoch: 21 Global Step: 439310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:04,582-Speed 6313.56 samples/sec Loss 4.7023 LearningRate 0.0003 Epoch: 21 Global Step: 439320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:07,830-Speed 6306.99 samples/sec Loss 4.7576 LearningRate 0.0003 Epoch: 21 Global Step: 439330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:11,075-Speed 6313.01 samples/sec Loss 4.6921 LearningRate 0.0003 Epoch: 21 Global Step: 439340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:14,316-Speed 6321.27 samples/sec Loss 4.7888 LearningRate 0.0003 Epoch: 21 Global Step: 439350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:17,567-Speed 6300.85 samples/sec Loss 4.7351 LearningRate 0.0003 Epoch: 21 Global Step: 439360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:20,815-Speed 6307.47 samples/sec Loss 4.7407 LearningRate 0.0003 Epoch: 21 Global Step: 439370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:24,059-Speed 6314.22 samples/sec Loss 4.7589 LearningRate 0.0003 Epoch: 21 Global Step: 439380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:27,305-Speed 6310.29 samples/sec Loss 4.8318 LearningRate 0.0003 Epoch: 21 Global Step: 439390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:30,546-Speed 6319.01 samples/sec Loss 4.7311 LearningRate 0.0003 Epoch: 21 Global Step: 439400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
07:42:33,827-Speed 6244.52 samples/sec Loss 4.6535 LearningRate 0.0003 Epoch: 21 Global Step: 439410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:37,129-Speed 6203.58 samples/sec Loss 4.7274 LearningRate 0.0003 Epoch: 21 Global Step: 439420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:40,374-Speed 6311.78 samples/sec Loss 4.6283 LearningRate 0.0003 Epoch: 21 Global Step: 439430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:43,622-Speed 6307.17 samples/sec Loss 4.7082 LearningRate 0.0003 Epoch: 21 Global Step: 439440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:46,867-Speed 6314.30 samples/sec Loss 4.7159 LearningRate 0.0003 Epoch: 21 Global Step: 439450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:50,110-Speed 6316.78 samples/sec Loss 4.6554 LearningRate 0.0003 Epoch: 21 Global Step: 439460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:53,353-Speed 6315.98 samples/sec Loss 4.6703 LearningRate 0.0003 Epoch: 21 Global Step: 439470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:56,599-Speed 6310.82 samples/sec Loss 4.6401 LearningRate 0.0003 Epoch: 21 Global Step: 439480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:42:59,844-Speed 6311.25 samples/sec Loss 4.6518 LearningRate 0.0003 Epoch: 21 Global Step: 439490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:03,089-Speed 6312.73 samples/sec Loss 4.7596 LearningRate 0.0003 Epoch: 21 Global Step: 439500 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:43:06,336-Speed 6309.67 samples/sec Loss 4.6679 LearningRate 0.0003 Epoch: 21 Global Step: 439510 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:43:09,569-Speed 6336.06 samples/sec Loss 4.7395 LearningRate 0.0003 Epoch: 21 Global Step: 439520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:12,838-Speed 6266.66 
samples/sec Loss 4.7082 LearningRate 0.0003 Epoch: 21 Global Step: 439530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:16,108-Speed 6264.05 samples/sec Loss 4.7125 LearningRate 0.0003 Epoch: 21 Global Step: 439540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:19,355-Speed 6310.01 samples/sec Loss 4.7369 LearningRate 0.0003 Epoch: 21 Global Step: 439550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:22,603-Speed 6307.00 samples/sec Loss 4.7740 LearningRate 0.0003 Epoch: 21 Global Step: 439560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:25,847-Speed 6313.70 samples/sec Loss 4.6795 LearningRate 0.0003 Epoch: 21 Global Step: 439570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:29,091-Speed 6313.93 samples/sec Loss 4.7216 LearningRate 0.0003 Epoch: 21 Global Step: 439580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:32,335-Speed 6314.77 samples/sec Loss 4.6786 LearningRate 0.0003 Epoch: 21 Global Step: 439590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:35,580-Speed 6313.77 samples/sec Loss 4.6660 LearningRate 0.0003 Epoch: 21 Global Step: 439600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:38,837-Speed 6288.52 samples/sec Loss 4.7627 LearningRate 0.0003 Epoch: 21 Global Step: 439610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:42,084-Speed 6308.26 samples/sec Loss 4.7162 LearningRate 0.0003 Epoch: 21 Global Step: 439620 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:43:45,314-Speed 6343.34 samples/sec Loss 4.6798 LearningRate 0.0003 Epoch: 21 Global Step: 439630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:48,558-Speed 6314.09 samples/sec Loss 4.7623 LearningRate 0.0003 Epoch: 21 Global Step: 439640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:51,800-Speed 6321.27 samples/sec Loss 4.6510 
LearningRate 0.0003 Epoch: 21 Global Step: 439650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:55,048-Speed 6308.09 samples/sec Loss 4.7006 LearningRate 0.0003 Epoch: 21 Global Step: 439660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:43:58,290-Speed 6317.06 samples/sec Loss 4.7060 LearningRate 0.0003 Epoch: 21 Global Step: 439670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:01,536-Speed 6311.81 samples/sec Loss 4.7491 LearningRate 0.0003 Epoch: 21 Global Step: 439680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:04,784-Speed 6305.85 samples/sec Loss 4.6764 LearningRate 0.0003 Epoch: 21 Global Step: 439690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:08,028-Speed 6315.91 samples/sec Loss 4.6981 LearningRate 0.0003 Epoch: 21 Global Step: 439700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:11,273-Speed 6311.89 samples/sec Loss 4.6939 LearningRate 0.0003 Epoch: 21 Global Step: 439710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:14,517-Speed 6314.73 samples/sec Loss 4.7297 LearningRate 0.0003 Epoch: 21 Global Step: 439720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:17,765-Speed 6306.05 samples/sec Loss 4.7315 LearningRate 0.0003 Epoch: 21 Global Step: 439730 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:44:20,995-Speed 6342.36 samples/sec Loss 4.7650 LearningRate 0.0003 Epoch: 21 Global Step: 439740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:24,245-Speed 6303.10 samples/sec Loss 4.7540 LearningRate 0.0003 Epoch: 21 Global Step: 439750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:27,491-Speed 6311.53 samples/sec Loss 4.6934 LearningRate 0.0003 Epoch: 21 Global Step: 439760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:30,735-Speed 6314.44 samples/sec Loss 4.6954 LearningRate 0.0003 Epoch: 21 
Global Step: 439770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:33,982-Speed 6309.75 samples/sec Loss 4.6856 LearningRate 0.0003 Epoch: 21 Global Step: 439780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:37,226-Speed 6314.87 samples/sec Loss 4.6631 LearningRate 0.0003 Epoch: 21 Global Step: 439790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:40,469-Speed 6314.83 samples/sec Loss 4.6917 LearningRate 0.0003 Epoch: 21 Global Step: 439800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:43,716-Speed 6308.86 samples/sec Loss 4.6290 LearningRate 0.0003 Epoch: 21 Global Step: 439810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:46,975-Speed 6286.88 samples/sec Loss 4.6539 LearningRate 0.0003 Epoch: 21 Global Step: 439820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:50,234-Speed 6285.68 samples/sec Loss 4.7020 LearningRate 0.0003 Epoch: 21 Global Step: 439830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:53,479-Speed 6312.82 samples/sec Loss 4.6763 LearningRate 0.0003 Epoch: 21 Global Step: 439840 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:44:56,710-Speed 6338.00 samples/sec Loss 4.7267 LearningRate 0.0003 Epoch: 21 Global Step: 439850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:44:59,960-Speed 6302.96 samples/sec Loss 4.7085 LearningRate 0.0003 Epoch: 21 Global Step: 439860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:03,205-Speed 6313.90 samples/sec Loss 4.7543 LearningRate 0.0003 Epoch: 21 Global Step: 439870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:06,450-Speed 6313.05 samples/sec Loss 4.7160 LearningRate 0.0003 Epoch: 21 Global Step: 439880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:09,694-Speed 6314.29 samples/sec Loss 4.7334 LearningRate 0.0003 Epoch: 21 Global Step: 439890 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:12,940-Speed 6310.26 samples/sec Loss 4.7654 LearningRate 0.0003 Epoch: 21 Global Step: 439900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:16,185-Speed 6313.23 samples/sec Loss 4.7266 LearningRate 0.0003 Epoch: 21 Global Step: 439910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:19,431-Speed 6310.11 samples/sec Loss 4.6455 LearningRate 0.0003 Epoch: 21 Global Step: 439920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:22,675-Speed 6314.49 samples/sec Loss 4.7134 LearningRate 0.0003 Epoch: 21 Global Step: 439930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:25,920-Speed 6313.35 samples/sec Loss 4.7211 LearningRate 0.0003 Epoch: 21 Global Step: 439940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:29,167-Speed 6309.49 samples/sec Loss 4.6902 LearningRate 0.0003 Epoch: 21 Global Step: 439950 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:45:32,397-Speed 6341.04 samples/sec Loss 4.6396 LearningRate 0.0003 Epoch: 21 Global Step: 439960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:35,646-Speed 6305.41 samples/sec Loss 4.7357 LearningRate 0.0003 Epoch: 21 Global Step: 439970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:38,892-Speed 6311.24 samples/sec Loss 4.6810 LearningRate 0.0003 Epoch: 21 Global Step: 439980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:42,136-Speed 6314.08 samples/sec Loss 4.7230 LearningRate 0.0003 Epoch: 21 Global Step: 439990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:45,381-Speed 6312.81 samples/sec Loss 4.7257 LearningRate 0.0003 Epoch: 21 Global Step: 440000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:48,626-Speed 6313.05 samples/sec Loss 4.6592 LearningRate 0.0003 Epoch: 21 Global Step: 440010 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 07:45:51,870-Speed 6313.90 samples/sec Loss 4.7383 LearningRate 0.0003 Epoch: 21 Global Step: 440020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:55,112-Speed 6318.93 samples/sec Loss 4.6963 LearningRate 0.0003 Epoch: 21 Global Step: 440030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:45:58,355-Speed 6316.50 samples/sec Loss 4.7443 LearningRate 0.0003 Epoch: 21 Global Step: 440040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:01,605-Speed 6302.80 samples/sec Loss 4.6956 LearningRate 0.0003 Epoch: 21 Global Step: 440050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:04,843-Speed 6325.59 samples/sec Loss 4.6934 LearningRate 0.0003 Epoch: 21 Global Step: 440060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:08,087-Speed 6315.07 samples/sec Loss 4.6913 LearningRate 0.0003 Epoch: 21 Global Step: 440070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:11,329-Speed 6318.29 samples/sec Loss 4.6412 LearningRate 0.0003 Epoch: 21 Global Step: 440080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:14,575-Speed 6311.21 samples/sec Loss 4.6602 LearningRate 0.0003 Epoch: 21 Global Step: 440090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:17,820-Speed 6313.22 samples/sec Loss 4.6758 LearningRate 0.0003 Epoch: 21 Global Step: 440100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:21,066-Speed 6311.07 samples/sec Loss 4.7513 LearningRate 0.0003 Epoch: 21 Global Step: 440110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:24,310-Speed 6314.08 samples/sec Loss 4.6503 LearningRate 0.0003 Epoch: 21 Global Step: 440120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:27,554-Speed 6315.45 samples/sec Loss 4.6790 LearningRate 0.0003 Epoch: 21 Global Step: 440130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
07:46:30,801-Speed 6309.45 samples/sec Loss 4.7128 LearningRate 0.0003 Epoch: 21 Global Step: 440140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:34,047-Speed 6309.76 samples/sec Loss 4.6653 LearningRate 0.0003 Epoch: 21 Global Step: 440150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:37,283-Speed 6330.46 samples/sec Loss 4.7466 LearningRate 0.0003 Epoch: 21 Global Step: 440160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:40,528-Speed 6314.12 samples/sec Loss 4.6505 LearningRate 0.0003 Epoch: 21 Global Step: 440170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:43,773-Speed 6312.82 samples/sec Loss 4.6540 LearningRate 0.0003 Epoch: 21 Global Step: 440180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:47,018-Speed 6312.68 samples/sec Loss 4.7018 LearningRate 0.0003 Epoch: 21 Global Step: 440190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:50,266-Speed 6306.37 samples/sec Loss 4.6930 LearningRate 0.0003 Epoch: 21 Global Step: 440200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:53,509-Speed 6315.76 samples/sec Loss 4.7262 LearningRate 0.0003 Epoch: 21 Global Step: 440210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:56,754-Speed 6314.07 samples/sec Loss 4.6844 LearningRate 0.0003 Epoch: 21 Global Step: 440220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:46:59,998-Speed 6314.60 samples/sec Loss 4.6374 LearningRate 0.0003 Epoch: 21 Global Step: 440230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:03,246-Speed 6305.70 samples/sec Loss 4.6781 LearningRate 0.0003 Epoch: 21 Global Step: 440240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:06,479-Speed 6336.53 samples/sec Loss 4.7482 LearningRate 0.0003 Epoch: 21 Global Step: 440250 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:09,723-Speed 6313.86 
samples/sec Loss 4.7887 LearningRate 0.0003 Epoch: 21 Global Step: 440260 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:12,970-Speed 6309.10 samples/sec Loss 4.6863 LearningRate 0.0003 Epoch: 21 Global Step: 440270 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:16,217-Speed 6309.46 samples/sec Loss 4.6771 LearningRate 0.0003 Epoch: 21 Global Step: 440280 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:19,461-Speed 6314.23 samples/sec Loss 4.6582 LearningRate 0.0003 Epoch: 21 Global Step: 440290 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:22,705-Speed 6315.31 samples/sec Loss 4.7187 LearningRate 0.0003 Epoch: 21 Global Step: 440300 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:25,951-Speed 6310.79 samples/sec Loss 4.7132 LearningRate 0.0003 Epoch: 21 Global Step: 440310 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:29,195-Speed 6314.13 samples/sec Loss 4.7669 LearningRate 0.0003 Epoch: 21 Global Step: 440320 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:32,436-Speed 6320.95 samples/sec Loss 4.7231 LearningRate 0.0003 Epoch: 21 Global Step: 440330 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:35,676-Speed 6322.39 samples/sec Loss 4.7537 LearningRate 0.0003 Epoch: 21 Global Step: 440340 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:47:38,920-Speed 6315.62 samples/sec Loss 4.7029 LearningRate 0.0003 Epoch: 21 Global Step: 440350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:42,168-Speed 6306.97 samples/sec Loss 4.7223 LearningRate 0.0003 Epoch: 21 Global Step: 440360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:45,411-Speed 6317.41 samples/sec Loss 4.6875 LearningRate 0.0003 Epoch: 21 Global Step: 440370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:48,665-Speed 6295.07 samples/sec Loss 4.7347 LearningRate 
0.0003 Epoch: 21 Global Step: 440380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:51,914-Speed 6304.35 samples/sec Loss 4.7174 LearningRate 0.0003 Epoch: 21 Global Step: 440390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:55,161-Speed 6308.38 samples/sec Loss 4.7028 LearningRate 0.0003 Epoch: 21 Global Step: 440400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:47:58,406-Speed 6312.68 samples/sec Loss 4.7441 LearningRate 0.0003 Epoch: 21 Global Step: 440410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:01,653-Speed 6308.53 samples/sec Loss 4.7050 LearningRate 0.0003 Epoch: 21 Global Step: 440420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:04,899-Speed 6312.41 samples/sec Loss 4.6729 LearningRate 0.0003 Epoch: 21 Global Step: 440430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:08,143-Speed 6313.14 samples/sec Loss 4.7455 LearningRate 0.0003 Epoch: 21 Global Step: 440440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:11,393-Speed 6303.04 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 440450 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:48:14,637-Speed 6314.20 samples/sec Loss 4.7331 LearningRate 0.0003 Epoch: 21 Global Step: 440460 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:48:17,870-Speed 6336.26 samples/sec Loss 4.6830 LearningRate 0.0003 Epoch: 21 Global Step: 440470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:21,116-Speed 6310.94 samples/sec Loss 4.7295 LearningRate 0.0003 Epoch: 21 Global Step: 440480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:24,359-Speed 6317.58 samples/sec Loss 4.6703 LearningRate 0.0003 Epoch: 21 Global Step: 440490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:27,604-Speed 6311.36 samples/sec Loss 4.6921 LearningRate 0.0003 Epoch: 21 Global Step: 
440500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:30,848-Speed 6315.45 samples/sec Loss 4.7178 LearningRate 0.0003 Epoch: 21 Global Step: 440510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:34,091-Speed 6315.26 samples/sec Loss 4.7222 LearningRate 0.0003 Epoch: 21 Global Step: 440520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:37,336-Speed 6313.97 samples/sec Loss 4.6308 LearningRate 0.0003 Epoch: 21 Global Step: 440530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:40,582-Speed 6314.30 samples/sec Loss 4.6989 LearningRate 0.0003 Epoch: 21 Global Step: 440540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:43,830-Speed 6305.13 samples/sec Loss 4.7099 LearningRate 0.0003 Epoch: 21 Global Step: 440550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:47,075-Speed 6313.44 samples/sec Loss 4.6840 LearningRate 0.0003 Epoch: 21 Global Step: 440560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:50,309-Speed 6336.00 samples/sec Loss 4.6966 LearningRate 0.0003 Epoch: 21 Global Step: 440570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:53,556-Speed 6308.01 samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 Global Step: 440580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:48:56,802-Speed 6311.86 samples/sec Loss 4.7530 LearningRate 0.0003 Epoch: 21 Global Step: 440590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:00,046-Speed 6313.18 samples/sec Loss 4.7028 LearningRate 0.0003 Epoch: 21 Global Step: 440600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:03,288-Speed 6319.37 samples/sec Loss 4.7399 LearningRate 0.0003 Epoch: 21 Global Step: 440610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:06,534-Speed 6310.15 samples/sec Loss 4.6799 LearningRate 0.0003 Epoch: 21 Global Step: 440620 Fp16 Grad Scale: 16384 
Required: 35 hours Training: 2022-04-02 07:49:09,775-Speed 6319.58 samples/sec Loss 4.6637 LearningRate 0.0003 Epoch: 21 Global Step: 440630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:13,019-Speed 6315.13 samples/sec Loss 4.7673 LearningRate 0.0003 Epoch: 21 Global Step: 440640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:16,264-Speed 6313.04 samples/sec Loss 4.7316 LearningRate 0.0003 Epoch: 21 Global Step: 440650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:19,508-Speed 6315.29 samples/sec Loss 4.6908 LearningRate 0.0003 Epoch: 21 Global Step: 440660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:22,751-Speed 6315.89 samples/sec Loss 4.6915 LearningRate 0.0003 Epoch: 21 Global Step: 440670 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:49:25,982-Speed 6340.36 samples/sec Loss 4.7250 LearningRate 0.0003 Epoch: 21 Global Step: 440680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:29,237-Speed 6292.51 samples/sec Loss 4.7308 LearningRate 0.0003 Epoch: 21 Global Step: 440690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:32,484-Speed 6309.69 samples/sec Loss 4.7064 LearningRate 0.0003 Epoch: 21 Global Step: 440700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:35,731-Speed 6308.85 samples/sec Loss 4.6575 LearningRate 0.0003 Epoch: 21 Global Step: 440710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:38,975-Speed 6313.16 samples/sec Loss 4.6954 LearningRate 0.0003 Epoch: 21 Global Step: 440720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:42,217-Speed 6318.71 samples/sec Loss 4.6709 LearningRate 0.0003 Epoch: 21 Global Step: 440730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:45,460-Speed 6316.30 samples/sec Loss 4.7190 LearningRate 0.0003 Epoch: 21 Global Step: 440740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 
2022-04-02 07:49:48,703-Speed 6316.65 samples/sec Loss 4.7467 LearningRate 0.0003 Epoch: 21 Global Step: 440750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:51,952-Speed 6305.83 samples/sec Loss 4.7233 LearningRate 0.0003 Epoch: 21 Global Step: 440760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:55,198-Speed 6311.78 samples/sec Loss 4.7214 LearningRate 0.0003 Epoch: 21 Global Step: 440770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:49:58,447-Speed 6304.77 samples/sec Loss 4.6918 LearningRate 0.0003 Epoch: 21 Global Step: 440780 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:50:01,679-Speed 6337.10 samples/sec Loss 4.7134 LearningRate 0.0003 Epoch: 21 Global Step: 440790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:04,927-Speed 6308.15 samples/sec Loss 4.6822 LearningRate 0.0003 Epoch: 21 Global Step: 440800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:08,181-Speed 6295.13 samples/sec Loss 4.7230 LearningRate 0.0003 Epoch: 21 Global Step: 440810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:11,506-Speed 6161.00 samples/sec Loss 4.7674 LearningRate 0.0003 Epoch: 21 Global Step: 440820 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:14,765-Speed 6285.54 samples/sec Loss 4.7513 LearningRate 0.0003 Epoch: 21 Global Step: 440830 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:18,009-Speed 6314.62 samples/sec Loss 4.7533 LearningRate 0.0003 Epoch: 21 Global Step: 440840 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:21,273-Speed 6276.32 samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 Global Step: 440850 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:24,526-Speed 6297.00 samples/sec Loss 4.7050 LearningRate 0.0003 Epoch: 21 Global Step: 440860 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:27,846-Speed 
6169.71 samples/sec Loss 4.6752 LearningRate 0.0003 Epoch: 21 Global Step: 440870 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:31,117-Speed 6262.79 samples/sec Loss 4.7441 LearningRate 0.0003 Epoch: 21 Global Step: 440880 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:34,362-Speed 6311.64 samples/sec Loss 4.7558 LearningRate 0.0003 Epoch: 21 Global Step: 440890 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:37,608-Speed 6311.25 samples/sec Loss 4.6891 LearningRate 0.0003 Epoch: 21 Global Step: 440900 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:40,850-Speed 6318.18 samples/sec Loss 4.7802 LearningRate 0.0003 Epoch: 21 Global Step: 440910 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 07:50:44,092-Speed 6319.37 samples/sec Loss 4.6976 LearningRate 0.0003 Epoch: 21 Global Step: 440920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:47,338-Speed 6310.14 samples/sec Loss 4.6832 LearningRate 0.0003 Epoch: 21 Global Step: 440930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:50,580-Speed 6318.65 samples/sec Loss 4.6899 LearningRate 0.0003 Epoch: 21 Global Step: 440940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:53,823-Speed 6316.73 samples/sec Loss 4.6844 LearningRate 0.0003 Epoch: 21 Global Step: 440950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:50:57,063-Speed 6322.05 samples/sec Loss 4.7347 LearningRate 0.0003 Epoch: 21 Global Step: 440960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:00,312-Speed 6306.89 samples/sec Loss 4.6521 LearningRate 0.0003 Epoch: 21 Global Step: 440970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:03,555-Speed 6315.64 samples/sec Loss 4.7243 LearningRate 0.0003 Epoch: 21 Global Step: 440980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:06,798-Speed 6317.08 samples/sec Loss 4.6761 
LearningRate 0.0003 Epoch: 21 Global Step: 440990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:10,043-Speed 6313.13 samples/sec Loss 4.7030 LearningRate 0.0003 Epoch: 21 Global Step: 441000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:13,287-Speed 6314.01 samples/sec Loss 4.7092 LearningRate 0.0003 Epoch: 21 Global Step: 441010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:16,532-Speed 6312.96 samples/sec Loss 4.7058 LearningRate 0.0003 Epoch: 21 Global Step: 441020 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:51:19,763-Speed 6340.44 samples/sec Loss 4.7306 LearningRate 0.0003 Epoch: 21 Global Step: 441030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:23,014-Speed 6301.04 samples/sec Loss 4.6176 LearningRate 0.0003 Epoch: 21 Global Step: 441040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:26,259-Speed 6312.11 samples/sec Loss 4.6786 LearningRate 0.0003 Epoch: 21 Global Step: 441050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:29,583-Speed 6162.43 samples/sec Loss 4.6826 LearningRate 0.0003 Epoch: 21 Global Step: 441060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:32,827-Speed 6314.42 samples/sec Loss 4.7028 LearningRate 0.0003 Epoch: 21 Global Step: 441070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:36,081-Speed 6294.49 samples/sec Loss 4.6410 LearningRate 0.0003 Epoch: 21 Global Step: 441080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:39,327-Speed 6311.59 samples/sec Loss 4.7616 LearningRate 0.0003 Epoch: 21 Global Step: 441090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:42,569-Speed 6317.56 samples/sec Loss 4.7775 LearningRate 0.0003 Epoch: 21 Global Step: 441100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:45,819-Speed 6302.51 samples/sec Loss 4.6894 LearningRate 0.0003 Epoch: 21 
Global Step: 441110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:49,147-Speed 6156.26 samples/sec Loss 4.7311 LearningRate 0.0003 Epoch: 21 Global Step: 441120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:51:52,428-Speed 6243.00 samples/sec Loss 4.6660 LearningRate 0.0003 Epoch: 21 Global Step: 441130 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:51:55,714-Speed 6234.30 samples/sec Loss 4.7401 LearningRate 0.0003 Epoch: 21 Global Step: 441140 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:51:58,960-Speed 6311.19 samples/sec Loss 4.7127 LearningRate 0.0003 Epoch: 21 Global Step: 441150 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:52:02,216-Speed 6290.15 samples/sec Loss 4.7026 LearningRate 0.0003 Epoch: 21 Global Step: 441160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:05,467-Speed 6300.97 samples/sec Loss 4.6979 LearningRate 0.0003 Epoch: 21 Global Step: 441170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:08,713-Speed 6311.34 samples/sec Loss 4.6780 LearningRate 0.0003 Epoch: 21 Global Step: 441180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:11,970-Speed 6289.82 samples/sec Loss 4.6901 LearningRate 0.0003 Epoch: 21 Global Step: 441190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:15,221-Speed 6300.86 samples/sec Loss 4.7000 LearningRate 0.0003 Epoch: 21 Global Step: 441200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:18,471-Speed 6302.76 samples/sec Loss 4.7103 LearningRate 0.0003 Epoch: 21 Global Step: 441210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:21,718-Speed 6309.73 samples/sec Loss 4.7243 LearningRate 0.0003 Epoch: 21 Global Step: 441220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:24,970-Speed 6298.41 samples/sec Loss 4.7407 LearningRate 0.0003 Epoch: 21 Global Step: 441230 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:28,221-Speed 6305.90 samples/sec Loss 4.6464 LearningRate 0.0003 Epoch: 21 Global Step: 441240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:31,467-Speed 6309.32 samples/sec Loss 4.6638 LearningRate 0.0003 Epoch: 21 Global Step: 441250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:34,714-Speed 6308.83 samples/sec Loss 4.7385 LearningRate 0.0003 Epoch: 21 Global Step: 441260 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:52:37,944-Speed 6342.56 samples/sec Loss 4.6828 LearningRate 0.0003 Epoch: 21 Global Step: 441270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:41,189-Speed 6313.52 samples/sec Loss 4.7378 LearningRate 0.0003 Epoch: 21 Global Step: 441280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:44,436-Speed 6308.23 samples/sec Loss 4.7101 LearningRate 0.0003 Epoch: 21 Global Step: 441290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:47,685-Speed 6305.06 samples/sec Loss 4.6674 LearningRate 0.0003 Epoch: 21 Global Step: 441300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:50,931-Speed 6309.52 samples/sec Loss 4.7240 LearningRate 0.0003 Epoch: 21 Global Step: 441310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:54,177-Speed 6311.65 samples/sec Loss 4.6670 LearningRate 0.0003 Epoch: 21 Global Step: 441320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:52:57,419-Speed 6319.41 samples/sec Loss 4.6263 LearningRate 0.0003 Epoch: 21 Global Step: 441330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:00,665-Speed 6309.26 samples/sec Loss 4.6717 LearningRate 0.0003 Epoch: 21 Global Step: 441340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:03,909-Speed 6314.60 samples/sec Loss 4.7213 LearningRate 0.0003 Epoch: 21 Global Step: 441350 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 07:53:07,155-Speed 6310.76 samples/sec Loss 4.7399 LearningRate 0.0003 Epoch: 21 Global Step: 441360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:10,414-Speed 6286.25 samples/sec Loss 4.6758 LearningRate 0.0003 Epoch: 21 Global Step: 441370 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:53:13,645-Speed 6341.53 samples/sec Loss 4.6849 LearningRate 0.0003 Epoch: 21 Global Step: 441380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:16,889-Speed 6313.12 samples/sec Loss 4.7343 LearningRate 0.0003 Epoch: 21 Global Step: 441390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:20,136-Speed 6309.15 samples/sec Loss 4.6402 LearningRate 0.0003 Epoch: 21 Global Step: 441400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:23,388-Speed 6299.22 samples/sec Loss 4.6959 LearningRate 0.0003 Epoch: 21 Global Step: 441410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:26,636-Speed 6307.92 samples/sec Loss 4.7074 LearningRate 0.0003 Epoch: 21 Global Step: 441420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:29,880-Speed 6313.72 samples/sec Loss 4.6844 LearningRate 0.0003 Epoch: 21 Global Step: 441430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:33,127-Speed 6308.69 samples/sec Loss 4.6837 LearningRate 0.0003 Epoch: 21 Global Step: 441440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:36,373-Speed 6311.29 samples/sec Loss 4.6730 LearningRate 0.0003 Epoch: 21 Global Step: 441450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:39,624-Speed 6300.87 samples/sec Loss 4.6837 LearningRate 0.0003 Epoch: 21 Global Step: 441460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:42,866-Speed 6317.43 samples/sec Loss 4.6613 LearningRate 0.0003 Epoch: 21 Global Step: 441470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
07:53:46,111-Speed 6313.02 samples/sec Loss 4.7524 LearningRate 0.0003 Epoch: 21 Global Step: 441480 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:53:49,361-Speed 6304.38 samples/sec Loss 4.6293 LearningRate 0.0003 Epoch: 21 Global Step: 441490 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:53:52,593-Speed 6337.80 samples/sec Loss 4.7790 LearningRate 0.0003 Epoch: 21 Global Step: 441500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:55,841-Speed 6307.00 samples/sec Loss 4.6160 LearningRate 0.0003 Epoch: 21 Global Step: 441510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:53:59,084-Speed 6315.84 samples/sec Loss 4.7738 LearningRate 0.0003 Epoch: 21 Global Step: 441520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:02,334-Speed 6302.04 samples/sec Loss 4.7044 LearningRate 0.0003 Epoch: 21 Global Step: 441530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:05,583-Speed 6306.45 samples/sec Loss 4.6886 LearningRate 0.0003 Epoch: 21 Global Step: 441540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:08,830-Speed 6308.63 samples/sec Loss 4.6484 LearningRate 0.0003 Epoch: 21 Global Step: 441550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:12,081-Speed 6300.83 samples/sec Loss 4.7440 LearningRate 0.0003 Epoch: 21 Global Step: 441560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:15,330-Speed 6303.55 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 441570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:18,579-Speed 6304.89 samples/sec Loss 4.6530 LearningRate 0.0003 Epoch: 21 Global Step: 441580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:21,838-Speed 6285.73 samples/sec Loss 4.6558 LearningRate 0.0003 Epoch: 21 Global Step: 441590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:25,088-Speed 6303.98 
samples/sec Loss 4.7399 LearningRate 0.0003 Epoch: 21 Global Step: 441600 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:54:28,321-Speed 6335.94 samples/sec Loss 4.7531 LearningRate 0.0003 Epoch: 21 Global Step: 441610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:31,564-Speed 6316.90 samples/sec Loss 4.7352 LearningRate 0.0003 Epoch: 21 Global Step: 441620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:34,812-Speed 6307.17 samples/sec Loss 4.6133 LearningRate 0.0003 Epoch: 21 Global Step: 441630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:38,062-Speed 6303.95 samples/sec Loss 4.7285 LearningRate 0.0003 Epoch: 21 Global Step: 441640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:41,308-Speed 6310.50 samples/sec Loss 4.6959 LearningRate 0.0003 Epoch: 21 Global Step: 441650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:44,554-Speed 6310.37 samples/sec Loss 4.7268 LearningRate 0.0003 Epoch: 21 Global Step: 441660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:47,802-Speed 6305.85 samples/sec Loss 4.7099 LearningRate 0.0003 Epoch: 21 Global Step: 441670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:51,048-Speed 6312.08 samples/sec Loss 4.6897 LearningRate 0.0003 Epoch: 21 Global Step: 441680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:54,291-Speed 6315.27 samples/sec Loss 4.6935 LearningRate 0.0003 Epoch: 21 Global Step: 441690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:54:57,538-Speed 6309.05 samples/sec Loss 4.6972 LearningRate 0.0003 Epoch: 21 Global Step: 441700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:00,771-Speed 6336.98 samples/sec Loss 4.7491 LearningRate 0.0003 Epoch: 21 Global Step: 441710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:04,016-Speed 6311.74 samples/sec Loss 4.7475 
LearningRate 0.0003 Epoch: 21 Global Step: 441720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:07,263-Speed 6310.08 samples/sec Loss 4.7556 LearningRate 0.0003 Epoch: 21 Global Step: 441730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:10,508-Speed 6311.62 samples/sec Loss 4.7904 LearningRate 0.0003 Epoch: 21 Global Step: 441740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:13,755-Speed 6308.84 samples/sec Loss 4.6804 LearningRate 0.0003 Epoch: 21 Global Step: 441750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:17,003-Speed 6306.73 samples/sec Loss 4.7405 LearningRate 0.0003 Epoch: 21 Global Step: 441760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:20,244-Speed 6319.94 samples/sec Loss 4.6831 LearningRate 0.0003 Epoch: 21 Global Step: 441770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:23,496-Speed 6299.61 samples/sec Loss 4.6759 LearningRate 0.0003 Epoch: 21 Global Step: 441780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:26,742-Speed 6311.53 samples/sec Loss 4.6819 LearningRate 0.0003 Epoch: 21 Global Step: 441790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:29,987-Speed 6311.71 samples/sec Loss 4.6206 LearningRate 0.0003 Epoch: 21 Global Step: 441800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:33,230-Speed 6316.41 samples/sec Loss 4.6233 LearningRate 0.0003 Epoch: 21 Global Step: 441810 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:55:36,486-Speed 6291.37 samples/sec Loss 4.7271 LearningRate 0.0003 Epoch: 21 Global Step: 441820 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:55:39,732-Speed 6310.74 samples/sec Loss 4.6782 LearningRate 0.0003 Epoch: 21 Global Step: 441830 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:55:42,963-Speed 6340.23 samples/sec Loss 4.6934 LearningRate 0.0003 Epoch: 21 
Global Step: 441840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:46,212-Speed 6305.53 samples/sec Loss 4.6832 LearningRate 0.0003 Epoch: 21 Global Step: 441850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:49,456-Speed 6313.85 samples/sec Loss 4.6922 LearningRate 0.0003 Epoch: 21 Global Step: 441860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:52,706-Speed 6304.34 samples/sec Loss 4.6815 LearningRate 0.0003 Epoch: 21 Global Step: 441870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:55,949-Speed 6316.28 samples/sec Loss 4.6903 LearningRate 0.0003 Epoch: 21 Global Step: 441880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:55:59,196-Speed 6309.28 samples/sec Loss 4.7477 LearningRate 0.0003 Epoch: 21 Global Step: 441890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:02,440-Speed 6313.81 samples/sec Loss 4.7200 LearningRate 0.0003 Epoch: 21 Global Step: 441900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:05,683-Speed 6315.73 samples/sec Loss 4.6901 LearningRate 0.0003 Epoch: 21 Global Step: 441910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:08,932-Speed 6306.46 samples/sec Loss 4.6701 LearningRate 0.0003 Epoch: 21 Global Step: 441920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:12,175-Speed 6315.47 samples/sec Loss 4.7030 LearningRate 0.0003 Epoch: 21 Global Step: 441930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:15,407-Speed 6338.29 samples/sec Loss 4.6524 LearningRate 0.0003 Epoch: 21 Global Step: 441940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:18,653-Speed 6309.98 samples/sec Loss 4.6635 LearningRate 0.0003 Epoch: 21 Global Step: 441950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:21,898-Speed 6313.56 samples/sec Loss 4.6408 LearningRate 0.0003 Epoch: 21 Global Step: 441960 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:25,145-Speed 6309.57 samples/sec Loss 4.6700 LearningRate 0.0003 Epoch: 21 Global Step: 441970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:28,390-Speed 6312.00 samples/sec Loss 4.6951 LearningRate 0.0003 Epoch: 21 Global Step: 441980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:31,635-Speed 6312.69 samples/sec Loss 4.6797 LearningRate 0.0003 Epoch: 21 Global Step: 441990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:34,879-Speed 6314.53 samples/sec Loss 4.6284 LearningRate 0.0003 Epoch: 21 Global Step: 442000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:38,126-Speed 6308.40 samples/sec Loss 4.7311 LearningRate 0.0003 Epoch: 21 Global Step: 442010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:41,372-Speed 6311.36 samples/sec Loss 4.7035 LearningRate 0.0003 Epoch: 21 Global Step: 442020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:44,616-Speed 6313.89 samples/sec Loss 4.6722 LearningRate 0.0003 Epoch: 21 Global Step: 442030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:47,863-Speed 6309.51 samples/sec Loss 4.6507 LearningRate 0.0003 Epoch: 21 Global Step: 442040 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:56:51,095-Speed 6337.64 samples/sec Loss 4.6406 LearningRate 0.0003 Epoch: 21 Global Step: 442050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:54,343-Speed 6307.47 samples/sec Loss 4.6563 LearningRate 0.0003 Epoch: 21 Global Step: 442060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:56:57,588-Speed 6312.56 samples/sec Loss 4.6754 LearningRate 0.0003 Epoch: 21 Global Step: 442070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:00,834-Speed 6311.98 samples/sec Loss 4.6858 LearningRate 0.0003 Epoch: 21 Global Step: 442080 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 07:57:04,116-Speed 6240.82 samples/sec Loss 4.7570 LearningRate 0.0003 Epoch: 21 Global Step: 442090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:07,373-Speed 6288.80 samples/sec Loss 4.6734 LearningRate 0.0003 Epoch: 21 Global Step: 442100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:10,625-Speed 6300.37 samples/sec Loss 4.6699 LearningRate 0.0003 Epoch: 21 Global Step: 442110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:13,871-Speed 6309.68 samples/sec Loss 4.6770 LearningRate 0.0003 Epoch: 21 Global Step: 442120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:17,116-Speed 6313.27 samples/sec Loss 4.6377 LearningRate 0.0003 Epoch: 21 Global Step: 442130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:20,358-Speed 6317.73 samples/sec Loss 4.6659 LearningRate 0.0003 Epoch: 21 Global Step: 442140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:23,588-Speed 6341.35 samples/sec Loss 4.6587 LearningRate 0.0003 Epoch: 21 Global Step: 442150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:26,836-Speed 6307.40 samples/sec Loss 4.6834 LearningRate 0.0003 Epoch: 21 Global Step: 442160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:30,116-Speed 6245.62 samples/sec Loss 4.6961 LearningRate 0.0003 Epoch: 21 Global Step: 442170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:33,362-Speed 6309.96 samples/sec Loss 4.6820 LearningRate 0.0003 Epoch: 21 Global Step: 442180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:36,610-Speed 6306.56 samples/sec Loss 4.7192 LearningRate 0.0003 Epoch: 21 Global Step: 442190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:39,862-Speed 6300.61 samples/sec Loss 4.7446 LearningRate 0.0003 Epoch: 21 Global Step: 442200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
07:57:43,102-Speed 6322.08 samples/sec Loss 4.7204 LearningRate 0.0003 Epoch: 21 Global Step: 442210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:46,348-Speed 6310.00 samples/sec Loss 4.6772 LearningRate 0.0003 Epoch: 21 Global Step: 442220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:49,597-Speed 6304.68 samples/sec Loss 4.6537 LearningRate 0.0003 Epoch: 21 Global Step: 442230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:52,838-Speed 6321.25 samples/sec Loss 4.7216 LearningRate 0.0003 Epoch: 21 Global Step: 442240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:57:56,088-Speed 6304.12 samples/sec Loss 4.7202 LearningRate 0.0003 Epoch: 21 Global Step: 442250 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:57:59,321-Speed 6334.63 samples/sec Loss 4.6923 LearningRate 0.0003 Epoch: 21 Global Step: 442260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:02,570-Speed 6305.39 samples/sec Loss 4.6310 LearningRate 0.0003 Epoch: 21 Global Step: 442270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:05,823-Speed 6297.91 samples/sec Loss 4.7596 LearningRate 0.0003 Epoch: 21 Global Step: 442280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:09,066-Speed 6316.64 samples/sec Loss 4.6821 LearningRate 0.0003 Epoch: 21 Global Step: 442290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:12,309-Speed 6315.79 samples/sec Loss 4.6586 LearningRate 0.0003 Epoch: 21 Global Step: 442300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:15,662-Speed 6109.04 samples/sec Loss 4.6807 LearningRate 0.0003 Epoch: 21 Global Step: 442310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:18,975-Speed 6183.49 samples/sec Loss 4.6871 LearningRate 0.0003 Epoch: 21 Global Step: 442320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:22,219-Speed 6313.99 
samples/sec Loss 4.6697 LearningRate 0.0003 Epoch: 21 Global Step: 442330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:25,468-Speed 6305.87 samples/sec Loss 4.6833 LearningRate 0.0003 Epoch: 21 Global Step: 442340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:28,713-Speed 6312.04 samples/sec Loss 4.6984 LearningRate 0.0003 Epoch: 21 Global Step: 442350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:31,958-Speed 6313.48 samples/sec Loss 4.7517 LearningRate 0.0003 Epoch: 21 Global Step: 442360 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:58:35,191-Speed 6335.44 samples/sec Loss 4.7565 LearningRate 0.0003 Epoch: 21 Global Step: 442370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:38,437-Speed 6311.48 samples/sec Loss 4.7094 LearningRate 0.0003 Epoch: 21 Global Step: 442380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:41,684-Speed 6308.57 samples/sec Loss 4.6724 LearningRate 0.0003 Epoch: 21 Global Step: 442390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:44,934-Speed 6302.86 samples/sec Loss 4.6977 LearningRate 0.0003 Epoch: 21 Global Step: 442400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:48,179-Speed 6313.16 samples/sec Loss 4.7391 LearningRate 0.0003 Epoch: 21 Global Step: 442410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:51,425-Speed 6309.76 samples/sec Loss 4.6565 LearningRate 0.0003 Epoch: 21 Global Step: 442420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:54,669-Speed 6313.72 samples/sec Loss 4.7437 LearningRate 0.0003 Epoch: 21 Global Step: 442430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:58:57,914-Speed 6313.33 samples/sec Loss 4.6810 LearningRate 0.0003 Epoch: 21 Global Step: 442440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:01,163-Speed 6305.27 samples/sec Loss 4.6908 
LearningRate 0.0003 Epoch: 21 Global Step: 442450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:04,412-Speed 6305.15 samples/sec Loss 4.7039 LearningRate 0.0003 Epoch: 21 Global Step: 442460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:07,644-Speed 6337.78 samples/sec Loss 4.6831 LearningRate 0.0003 Epoch: 21 Global Step: 442470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:10,889-Speed 6313.44 samples/sec Loss 4.6529 LearningRate 0.0003 Epoch: 21 Global Step: 442480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:14,136-Speed 6309.25 samples/sec Loss 4.6706 LearningRate 0.0003 Epoch: 21 Global Step: 442490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:17,380-Speed 6315.14 samples/sec Loss 4.6698 LearningRate 0.0003 Epoch: 21 Global Step: 442500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:20,625-Speed 6310.72 samples/sec Loss 4.6370 LearningRate 0.0003 Epoch: 21 Global Step: 442510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:23,870-Speed 6312.74 samples/sec Loss 4.5872 LearningRate 0.0003 Epoch: 21 Global Step: 442520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:27,115-Speed 6313.98 samples/sec Loss 4.6733 LearningRate 0.0003 Epoch: 21 Global Step: 442530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:30,362-Speed 6307.86 samples/sec Loss 4.7298 LearningRate 0.0003 Epoch: 21 Global Step: 442540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:33,608-Speed 6311.40 samples/sec Loss 4.6811 LearningRate 0.0003 Epoch: 21 Global Step: 442550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:36,854-Speed 6310.25 samples/sec Loss 4.7071 LearningRate 0.0003 Epoch: 21 Global Step: 442560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:40,096-Speed 6319.26 samples/sec Loss 4.6653 LearningRate 0.0003 Epoch: 21 
Global Step: 442570 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:59:43,344-Speed 6306.58 samples/sec Loss 4.7518 LearningRate 0.0003 Epoch: 21 Global Step: 442580 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 07:59:46,579-Speed 6331.20 samples/sec Loss 4.6963 LearningRate 0.0003 Epoch: 21 Global Step: 442590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:49,825-Speed 6310.92 samples/sec Loss 4.7460 LearningRate 0.0003 Epoch: 21 Global Step: 442600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:53,069-Speed 6314.88 samples/sec Loss 4.6739 LearningRate 0.0003 Epoch: 21 Global Step: 442610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:56,316-Speed 6308.41 samples/sec Loss 4.6553 LearningRate 0.0003 Epoch: 21 Global Step: 442620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 07:59:59,558-Speed 6318.47 samples/sec Loss 4.6730 LearningRate 0.0003 Epoch: 21 Global Step: 442630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:02,810-Speed 6302.33 samples/sec Loss 4.7413 LearningRate 0.0003 Epoch: 21 Global Step: 442640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:06,055-Speed 6313.39 samples/sec Loss 4.6782 LearningRate 0.0003 Epoch: 21 Global Step: 442650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:09,303-Speed 6306.61 samples/sec Loss 4.6594 LearningRate 0.0003 Epoch: 21 Global Step: 442660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:12,551-Speed 6306.45 samples/sec Loss 4.7170 LearningRate 0.0003 Epoch: 21 Global Step: 442670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:15,832-Speed 6242.66 samples/sec Loss 4.6811 LearningRate 0.0003 Epoch: 21 Global Step: 442680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:19,124-Speed 6224.49 samples/sec Loss 4.6919 LearningRate 0.0003 Epoch: 21 Global Step: 442690 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:22,368-Speed 6314.89 samples/sec Loss 4.6619 LearningRate 0.0003 Epoch: 21 Global Step: 442700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:25,651-Speed 6239.48 samples/sec Loss 4.7360 LearningRate 0.0003 Epoch: 21 Global Step: 442710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:28,939-Speed 6230.13 samples/sec Loss 4.6825 LearningRate 0.0003 Epoch: 21 Global Step: 442720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:00:32,169-Speed 6341.78 samples/sec Loss 4.7030 LearningRate 0.0003 Epoch: 21 Global Step: 442730 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:35,416-Speed 6308.88 samples/sec Loss 4.7714 LearningRate 0.0003 Epoch: 21 Global Step: 442740 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:38,660-Speed 6314.83 samples/sec Loss 4.7085 LearningRate 0.0003 Epoch: 21 Global Step: 442750 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:41,907-Speed 6308.42 samples/sec Loss 4.7166 LearningRate 0.0003 Epoch: 21 Global Step: 442760 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:45,153-Speed 6310.83 samples/sec Loss 4.6817 LearningRate 0.0003 Epoch: 21 Global Step: 442770 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:48,395-Speed 6318.26 samples/sec Loss 4.6918 LearningRate 0.0003 Epoch: 21 Global Step: 442780 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:51,776-Speed 6059.10 samples/sec Loss 4.6881 LearningRate 0.0003 Epoch: 21 Global Step: 442790 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:55,046-Speed 6264.53 samples/sec Loss 4.6908 LearningRate 0.0003 Epoch: 21 Global Step: 442800 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:00:58,284-Speed 6325.66 samples/sec Loss 4.7500 LearningRate 0.0003 Epoch: 21 Global Step: 442810 Fp16 Grad Scale: 8192 Required: 35 hours 
Training: 2022-04-02 08:01:01,530-Speed 6309.83 samples/sec Loss 4.6304 LearningRate 0.0003 Epoch: 21 Global Step: 442820 Fp16 Grad Scale: 8192 Required: 35 hours Training: 2022-04-02 08:01:04,776-Speed 6310.45 samples/sec Loss 4.6491 LearningRate 0.0003 Epoch: 21 Global Step: 442830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:08,028-Speed 6300.46 samples/sec Loss 4.6948 LearningRate 0.0003 Epoch: 21 Global Step: 442840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:11,269-Speed 6319.14 samples/sec Loss 4.6969 LearningRate 0.0003 Epoch: 21 Global Step: 442850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:14,520-Speed 6301.41 samples/sec Loss 4.7044 LearningRate 0.0003 Epoch: 21 Global Step: 442860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:17,770-Speed 6303.41 samples/sec Loss 4.6182 LearningRate 0.0003 Epoch: 21 Global Step: 442870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:21,022-Speed 6299.06 samples/sec Loss 4.6332 LearningRate 0.0003 Epoch: 21 Global Step: 442880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:24,300-Speed 6247.77 samples/sec Loss 4.6632 LearningRate 0.0003 Epoch: 21 Global Step: 442890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:27,550-Speed 6305.00 samples/sec Loss 4.6648 LearningRate 0.0003 Epoch: 21 Global Step: 442900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:30,794-Speed 6314.71 samples/sec Loss 4.7439 LearningRate 0.0003 Epoch: 21 Global Step: 442910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:34,042-Speed 6306.77 samples/sec Loss 4.6635 LearningRate 0.0003 Epoch: 21 Global Step: 442920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:37,287-Speed 6313.67 samples/sec Loss 4.7206 LearningRate 0.0003 Epoch: 21 Global Step: 442930 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 
08:01:40,516-Speed 6343.36 samples/sec Loss 4.6580 LearningRate 0.0003 Epoch: 21 Global Step: 442940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:43,773-Speed 6290.06 samples/sec Loss 4.7411 LearningRate 0.0003 Epoch: 21 Global Step: 442950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:47,016-Speed 6315.15 samples/sec Loss 4.6967 LearningRate 0.0003 Epoch: 21 Global Step: 442960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:50,261-Speed 6314.29 samples/sec Loss 4.6196 LearningRate 0.0003 Epoch: 21 Global Step: 442970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:53,507-Speed 6310.08 samples/sec Loss 4.7580 LearningRate 0.0003 Epoch: 21 Global Step: 442980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:56,751-Speed 6313.88 samples/sec Loss 4.6375 LearningRate 0.0003 Epoch: 21 Global Step: 442990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:01:59,998-Speed 6308.91 samples/sec Loss 4.7182 LearningRate 0.0003 Epoch: 21 Global Step: 443000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:03,245-Speed 6308.87 samples/sec Loss 4.6891 LearningRate 0.0003 Epoch: 21 Global Step: 443010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:06,490-Speed 6313.49 samples/sec Loss 4.7561 LearningRate 0.0003 Epoch: 21 Global Step: 443020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:09,737-Speed 6307.90 samples/sec Loss 4.7570 LearningRate 0.0003 Epoch: 21 Global Step: 443030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:12,968-Speed 6339.35 samples/sec Loss 4.6882 LearningRate 0.0003 Epoch: 21 Global Step: 443040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:16,224-Speed 6293.02 samples/sec Loss 4.7575 LearningRate 0.0003 Epoch: 21 Global Step: 443050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:19,513-Speed 6228.06 
samples/sec Loss 4.6362 LearningRate 0.0003 Epoch: 21 Global Step: 443060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:22,754-Speed 6319.84 samples/sec Loss 4.7473 LearningRate 0.0003 Epoch: 21 Global Step: 443070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:25,999-Speed 6312.34 samples/sec Loss 4.6524 LearningRate 0.0003 Epoch: 21 Global Step: 443080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:29,249-Speed 6303.76 samples/sec Loss 4.7043 LearningRate 0.0003 Epoch: 21 Global Step: 443090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:32,497-Speed 6307.52 samples/sec Loss 4.6245 LearningRate 0.0003 Epoch: 21 Global Step: 443100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:35,747-Speed 6302.63 samples/sec Loss 4.6286 LearningRate 0.0003 Epoch: 21 Global Step: 443110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:38,991-Speed 6315.63 samples/sec Loss 4.7086 LearningRate 0.0003 Epoch: 21 Global Step: 443120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:42,236-Speed 6311.26 samples/sec Loss 4.6705 LearningRate 0.0003 Epoch: 21 Global Step: 443130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:45,467-Speed 6340.34 samples/sec Loss 4.6507 LearningRate 0.0003 Epoch: 21 Global Step: 443140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:48,712-Speed 6313.99 samples/sec Loss 4.6738 LearningRate 0.0003 Epoch: 21 Global Step: 443150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:51,988-Speed 6251.17 samples/sec Loss 4.7172 LearningRate 0.0003 Epoch: 21 Global Step: 443160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:55,234-Speed 6311.32 samples/sec Loss 4.7710 LearningRate 0.0003 Epoch: 21 Global Step: 443170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:02:58,482-Speed 6307.77 samples/sec Loss 4.6107 
LearningRate 0.0003 Epoch: 21 Global Step: 443180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:01,734-Speed 6299.48 samples/sec Loss 4.6913 LearningRate 0.0003 Epoch: 21 Global Step: 443190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:04,981-Speed 6308.31 samples/sec Loss 4.7372 LearningRate 0.0003 Epoch: 21 Global Step: 443200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:08,224-Speed 6316.65 samples/sec Loss 4.6704 LearningRate 0.0003 Epoch: 21 Global Step: 443210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:11,462-Speed 6325.60 samples/sec Loss 4.7108 LearningRate 0.0003 Epoch: 21 Global Step: 443220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:14,706-Speed 6314.41 samples/sec Loss 4.6429 LearningRate 0.0003 Epoch: 21 Global Step: 443230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:17,949-Speed 6316.62 samples/sec Loss 4.6866 LearningRate 0.0003 Epoch: 21 Global Step: 443240 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:03:21,196-Speed 6308.66 samples/sec Loss 4.6475 LearningRate 0.0003 Epoch: 21 Global Step: 443250 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:03:24,429-Speed 6336.36 samples/sec Loss 4.6508 LearningRate 0.0003 Epoch: 21 Global Step: 443260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:27,680-Speed 6301.85 samples/sec Loss 4.7443 LearningRate 0.0003 Epoch: 21 Global Step: 443270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:30,922-Speed 6316.67 samples/sec Loss 4.6624 LearningRate 0.0003 Epoch: 21 Global Step: 443280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:34,168-Speed 6312.08 samples/sec Loss 4.7439 LearningRate 0.0003 Epoch: 21 Global Step: 443290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:37,414-Speed 6310.86 samples/sec Loss 4.6529 LearningRate 0.0003 Epoch: 21 
Global Step: 443300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:40,661-Speed 6308.86 samples/sec Loss 4.7607 LearningRate 0.0003 Epoch: 21 Global Step: 443310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:43,906-Speed 6313.10 samples/sec Loss 4.6099 LearningRate 0.0003 Epoch: 21 Global Step: 443320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:47,150-Speed 6315.00 samples/sec Loss 4.7188 LearningRate 0.0003 Epoch: 21 Global Step: 443330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:50,396-Speed 6310.87 samples/sec Loss 4.7067 LearningRate 0.0003 Epoch: 21 Global Step: 443340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:53,638-Speed 6317.92 samples/sec Loss 4.7172 LearningRate 0.0003 Epoch: 21 Global Step: 443350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:03:56,869-Speed 6340.26 samples/sec Loss 4.6452 LearningRate 0.0003 Epoch: 21 Global Step: 443360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:00,113-Speed 6315.14 samples/sec Loss 4.6167 LearningRate 0.0003 Epoch: 21 Global Step: 443370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:03,358-Speed 6311.72 samples/sec Loss 4.6767 LearningRate 0.0003 Epoch: 21 Global Step: 443380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:06,606-Speed 6307.84 samples/sec Loss 4.6581 LearningRate 0.0003 Epoch: 21 Global Step: 443390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:09,852-Speed 6311.02 samples/sec Loss 4.7123 LearningRate 0.0003 Epoch: 21 Global Step: 443400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:13,097-Speed 6312.36 samples/sec Loss 4.7137 LearningRate 0.0003 Epoch: 21 Global Step: 443410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:16,350-Speed 6296.65 samples/sec Loss 4.6197 LearningRate 0.0003 Epoch: 21 Global Step: 443420 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:19,595-Speed 6312.50 samples/sec Loss 4.5807 LearningRate 0.0003 Epoch: 21 Global Step: 443430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:22,839-Speed 6313.53 samples/sec Loss 4.6851 LearningRate 0.0003 Epoch: 21 Global Step: 443440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:26,083-Speed 6316.32 samples/sec Loss 4.7190 LearningRate 0.0003 Epoch: 21 Global Step: 443450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:29,328-Speed 6312.58 samples/sec Loss 4.6734 LearningRate 0.0003 Epoch: 21 Global Step: 443460 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:04:32,576-Speed 6306.55 samples/sec Loss 4.7007 LearningRate 0.0003 Epoch: 21 Global Step: 443470 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:04:35,852-Speed 6252.70 samples/sec Loss 4.7005 LearningRate 0.0003 Epoch: 21 Global Step: 443480 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:04:39,083-Speed 6338.86 samples/sec Loss 4.6837 LearningRate 0.0003 Epoch: 21 Global Step: 443490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:42,330-Speed 6309.53 samples/sec Loss 4.6585 LearningRate 0.0003 Epoch: 21 Global Step: 443500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:45,574-Speed 6315.12 samples/sec Loss 4.6737 LearningRate 0.0003 Epoch: 21 Global Step: 443510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:48,819-Speed 6312.07 samples/sec Loss 4.6967 LearningRate 0.0003 Epoch: 21 Global Step: 443520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:52,066-Speed 6309.76 samples/sec Loss 4.7170 LearningRate 0.0003 Epoch: 21 Global Step: 443530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:04:55,314-Speed 6306.35 samples/sec Loss 4.7899 LearningRate 0.0003 Epoch: 21 Global Step: 443540 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:04:58,554-Speed 6322.53 samples/sec Loss 4.6658 LearningRate 0.0003 Epoch: 21 Global Step: 443550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:01,801-Speed 6308.23 samples/sec Loss 4.6564 LearningRate 0.0003 Epoch: 21 Global Step: 443560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:05,046-Speed 6313.45 samples/sec Loss 4.6895 LearningRate 0.0003 Epoch: 21 Global Step: 443570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:08,289-Speed 6316.82 samples/sec Loss 4.6814 LearningRate 0.0003 Epoch: 21 Global Step: 443580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:11,533-Speed 6314.62 samples/sec Loss 4.7315 LearningRate 0.0003 Epoch: 21 Global Step: 443590 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:05:14,763-Speed 6342.54 samples/sec Loss 4.7205 LearningRate 0.0003 Epoch: 21 Global Step: 443600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:18,008-Speed 6311.87 samples/sec Loss 4.6686 LearningRate 0.0003 Epoch: 21 Global Step: 443610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:21,253-Speed 6313.11 samples/sec Loss 4.6755 LearningRate 0.0003 Epoch: 21 Global Step: 443620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:24,497-Speed 6314.50 samples/sec Loss 4.6938 LearningRate 0.0003 Epoch: 21 Global Step: 443630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:27,744-Speed 6309.12 samples/sec Loss 4.6171 LearningRate 0.0003 Epoch: 21 Global Step: 443640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:30,986-Speed 6318.53 samples/sec Loss 4.7238 LearningRate 0.0003 Epoch: 21 Global Step: 443650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:34,241-Speed 6293.56 samples/sec Loss 4.6549 LearningRate 0.0003 Epoch: 21 Global Step: 443660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:05:37,491-Speed 6301.66 samples/sec Loss 4.6124 LearningRate 0.0003 Epoch: 21 Global Step: 443670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:40,746-Speed 6293.74 samples/sec Loss 4.6419 LearningRate 0.0003 Epoch: 21 Global Step: 443680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:44,004-Speed 6287.98 samples/sec Loss 4.6950 LearningRate 0.0003 Epoch: 21 Global Step: 443690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:47,257-Speed 6295.93 samples/sec Loss 4.6861 LearningRate 0.0003 Epoch: 21 Global Step: 443700 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:05:50,487-Speed 6341.48 samples/sec Loss 4.6816 LearningRate 0.0003 Epoch: 21 Global Step: 443710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:53,731-Speed 6314.68 samples/sec Loss 4.7257 LearningRate 0.0003 Epoch: 21 Global Step: 443720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:05:56,978-Speed 6309.85 samples/sec Loss 4.6914 LearningRate 0.0003 Epoch: 21 Global Step: 443730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:00,220-Speed 6318.58 samples/sec Loss 4.6602 LearningRate 0.0003 Epoch: 21 Global Step: 443740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:03,486-Speed 6271.62 samples/sec Loss 4.6605 LearningRate 0.0003 Epoch: 21 Global Step: 443750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:06,734-Speed 6307.51 samples/sec Loss 4.6949 LearningRate 0.0003 Epoch: 21 Global Step: 443760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:09,975-Speed 6320.30 samples/sec Loss 4.6716 LearningRate 0.0003 Epoch: 21 Global Step: 443770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:13,218-Speed 6316.00 samples/sec Loss 4.6932 LearningRate 0.0003 Epoch: 21 Global Step: 443780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:16,463-Speed 6316.89 
samples/sec Loss 4.7085 LearningRate 0.0003 Epoch: 21 Global Step: 443790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:19,711-Speed 6305.09 samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 Global Step: 443800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:22,952-Speed 6321.40 samples/sec Loss 4.6150 LearningRate 0.0003 Epoch: 21 Global Step: 443810 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:06:26,183-Speed 6339.99 samples/sec Loss 4.6512 LearningRate 0.0003 Epoch: 21 Global Step: 443820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:29,429-Speed 6310.52 samples/sec Loss 4.6512 LearningRate 0.0003 Epoch: 21 Global Step: 443830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:32,673-Speed 6315.10 samples/sec Loss 4.6797 LearningRate 0.0003 Epoch: 21 Global Step: 443840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:35,920-Speed 6308.09 samples/sec Loss 4.7599 LearningRate 0.0003 Epoch: 21 Global Step: 443850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:39,163-Speed 6316.31 samples/sec Loss 4.6316 LearningRate 0.0003 Epoch: 21 Global Step: 443860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:42,407-Speed 6314.92 samples/sec Loss 4.6431 LearningRate 0.0003 Epoch: 21 Global Step: 443870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:45,652-Speed 6312.22 samples/sec Loss 4.6399 LearningRate 0.0003 Epoch: 21 Global Step: 443880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:48,895-Speed 6317.60 samples/sec Loss 4.6631 LearningRate 0.0003 Epoch: 21 Global Step: 443890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:52,139-Speed 6313.46 samples/sec Loss 4.6342 LearningRate 0.0003 Epoch: 21 Global Step: 443900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:55,383-Speed 6315.35 samples/sec Loss 4.6698 
LearningRate 0.0003 Epoch: 21 Global Step: 443910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:06:58,615-Speed 6338.40 samples/sec Loss 4.6734 LearningRate 0.0003 Epoch: 21 Global Step: 443920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:01,863-Speed 6306.26 samples/sec Loss 4.6928 LearningRate 0.0003 Epoch: 21 Global Step: 443930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:05,114-Speed 6300.59 samples/sec Loss 4.6939 LearningRate 0.0003 Epoch: 21 Global Step: 443940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:08,358-Speed 6314.51 samples/sec Loss 4.6949 LearningRate 0.0003 Epoch: 21 Global Step: 443950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:11,607-Speed 6305.09 samples/sec Loss 4.6963 LearningRate 0.0003 Epoch: 21 Global Step: 443960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:14,850-Speed 6315.74 samples/sec Loss 4.6852 LearningRate 0.0003 Epoch: 21 Global Step: 443970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:18,101-Speed 6302.89 samples/sec Loss 4.7049 LearningRate 0.0003 Epoch: 21 Global Step: 443980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:21,347-Speed 6309.96 samples/sec Loss 4.7558 LearningRate 0.0003 Epoch: 21 Global Step: 443990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:24,592-Speed 6313.86 samples/sec Loss 4.6588 LearningRate 0.0003 Epoch: 21 Global Step: 444000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:27,928-Speed 6140.30 samples/sec Loss 4.6727 LearningRate 0.0003 Epoch: 21 Global Step: 444010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:31,174-Speed 6310.54 samples/sec Loss 4.6136 LearningRate 0.0003 Epoch: 21 Global Step: 444020 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:07:34,404-Speed 6342.32 samples/sec Loss 4.6908 LearningRate 0.0003 Epoch: 21 
Global Step: 444030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:37,654-Speed 6302.55 samples/sec Loss 4.6110 LearningRate 0.0003 Epoch: 21 Global Step: 444040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:40,901-Speed 6308.76 samples/sec Loss 4.6850 LearningRate 0.0003 Epoch: 21 Global Step: 444050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:44,144-Speed 6315.65 samples/sec Loss 4.6540 LearningRate 0.0003 Epoch: 21 Global Step: 444060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:47,390-Speed 6312.23 samples/sec Loss 4.6034 LearningRate 0.0003 Epoch: 21 Global Step: 444070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:50,635-Speed 6311.70 samples/sec Loss 4.7000 LearningRate 0.0003 Epoch: 21 Global Step: 444080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:53,887-Speed 6299.49 samples/sec Loss 4.6618 LearningRate 0.0003 Epoch: 21 Global Step: 444090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:07:57,128-Speed 6319.26 samples/sec Loss 4.6624 LearningRate 0.0003 Epoch: 21 Global Step: 444100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:00,375-Speed 6310.39 samples/sec Loss 4.6010 LearningRate 0.0003 Epoch: 21 Global Step: 444110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:03,619-Speed 6314.74 samples/sec Loss 4.6814 LearningRate 0.0003 Epoch: 21 Global Step: 444120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:06,852-Speed 6335.81 samples/sec Loss 4.6434 LearningRate 0.0003 Epoch: 21 Global Step: 444130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:10,095-Speed 6315.44 samples/sec Loss 4.6564 LearningRate 0.0003 Epoch: 21 Global Step: 444140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:13,341-Speed 6311.48 samples/sec Loss 4.7156 LearningRate 0.0003 Epoch: 21 Global Step: 444150 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:16,588-Speed 6307.83 samples/sec Loss 4.6979 LearningRate 0.0003 Epoch: 21 Global Step: 444160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:19,832-Speed 6314.54 samples/sec Loss 4.6373 LearningRate 0.0003 Epoch: 21 Global Step: 444170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:23,125-Speed 6221.58 samples/sec Loss 4.6545 LearningRate 0.0003 Epoch: 21 Global Step: 444180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:26,418-Speed 6221.96 samples/sec Loss 4.6184 LearningRate 0.0003 Epoch: 21 Global Step: 444190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:29,661-Speed 6315.03 samples/sec Loss 4.6910 LearningRate 0.0003 Epoch: 21 Global Step: 444200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:32,908-Speed 6310.77 samples/sec Loss 4.7319 LearningRate 0.0003 Epoch: 21 Global Step: 444210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:36,153-Speed 6311.61 samples/sec Loss 4.7365 LearningRate 0.0003 Epoch: 21 Global Step: 444220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:39,406-Speed 6297.16 samples/sec Loss 4.6719 LearningRate 0.0003 Epoch: 21 Global Step: 444230 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:08:42,654-Speed 6306.57 samples/sec Loss 4.6739 LearningRate 0.0003 Epoch: 21 Global Step: 444240 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:08:45,885-Speed 6340.62 samples/sec Loss 4.6814 LearningRate 0.0003 Epoch: 21 Global Step: 444250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:49,135-Speed 6302.85 samples/sec Loss 4.6228 LearningRate 0.0003 Epoch: 21 Global Step: 444260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:52,383-Speed 6305.50 samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 Global Step: 444270 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:08:55,645-Speed 6280.40 samples/sec Loss 4.5855 LearningRate 0.0003 Epoch: 21 Global Step: 444280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:08:58,891-Speed 6310.04 samples/sec Loss 4.6845 LearningRate 0.0003 Epoch: 21 Global Step: 444290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:02,132-Speed 6321.13 samples/sec Loss 4.7419 LearningRate 0.0003 Epoch: 21 Global Step: 444300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:05,390-Speed 6287.33 samples/sec Loss 4.6766 LearningRate 0.0003 Epoch: 21 Global Step: 444310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:08,634-Speed 6314.60 samples/sec Loss 4.6927 LearningRate 0.0003 Epoch: 21 Global Step: 444320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:11,898-Speed 6276.31 samples/sec Loss 4.6487 LearningRate 0.0003 Epoch: 21 Global Step: 444330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:15,145-Speed 6308.96 samples/sec Loss 4.7232 LearningRate 0.0003 Epoch: 21 Global Step: 444340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:18,378-Speed 6335.19 samples/sec Loss 4.5937 LearningRate 0.0003 Epoch: 21 Global Step: 444350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:21,628-Speed 6304.32 samples/sec Loss 4.6942 LearningRate 0.0003 Epoch: 21 Global Step: 444360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:24,872-Speed 6312.94 samples/sec Loss 4.6844 LearningRate 0.0003 Epoch: 21 Global Step: 444370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:28,120-Speed 6307.83 samples/sec Loss 4.6752 LearningRate 0.0003 Epoch: 21 Global Step: 444380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:31,371-Speed 6301.55 samples/sec Loss 4.6295 LearningRate 0.0003 Epoch: 21 Global Step: 444390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:09:34,624-Speed 6296.18 samples/sec Loss 4.7128 LearningRate 0.0003 Epoch: 21 Global Step: 444400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:37,940-Speed 6178.12 samples/sec Loss 4.6972 LearningRate 0.0003 Epoch: 21 Global Step: 444410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:41,185-Speed 6312.62 samples/sec Loss 4.6955 LearningRate 0.0003 Epoch: 21 Global Step: 444420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:44,429-Speed 6314.17 samples/sec Loss 4.6747 LearningRate 0.0003 Epoch: 21 Global Step: 444430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:47,675-Speed 6311.29 samples/sec Loss 4.7144 LearningRate 0.0003 Epoch: 21 Global Step: 444440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:50,909-Speed 6334.88 samples/sec Loss 4.7042 LearningRate 0.0003 Epoch: 21 Global Step: 444450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:54,249-Speed 6132.66 samples/sec Loss 4.6727 LearningRate 0.0003 Epoch: 21 Global Step: 444460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:09:57,496-Speed 6307.69 samples/sec Loss 4.7187 LearningRate 0.0003 Epoch: 21 Global Step: 444470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:00,744-Speed 6307.54 samples/sec Loss 4.5989 LearningRate 0.0003 Epoch: 21 Global Step: 444480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:03,990-Speed 6310.20 samples/sec Loss 4.6024 LearningRate 0.0003 Epoch: 21 Global Step: 444490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:07,253-Speed 6278.71 samples/sec Loss 4.6374 LearningRate 0.0003 Epoch: 21 Global Step: 444500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:10,496-Speed 6315.63 samples/sec Loss 4.6696 LearningRate 0.0003 Epoch: 21 Global Step: 444510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:13,740-Speed 6315.05 
samples/sec Loss 4.7002 LearningRate 0.0003 Epoch: 21 Global Step: 444520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:16,988-Speed 6307.09 samples/sec Loss 4.6865 LearningRate 0.0003 Epoch: 21 Global Step: 444530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:20,232-Speed 6314.36 samples/sec Loss 4.7035 LearningRate 0.0003 Epoch: 21 Global Step: 444540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:23,482-Speed 6304.06 samples/sec Loss 4.6344 LearningRate 0.0003 Epoch: 21 Global Step: 444550 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:10:26,730-Speed 6305.95 samples/sec Loss 4.6577 LearningRate 0.0003 Epoch: 21 Global Step: 444560 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:10:29,961-Speed 6340.35 samples/sec Loss 4.6742 LearningRate 0.0003 Epoch: 21 Global Step: 444570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:33,208-Speed 6308.76 samples/sec Loss 4.6037 LearningRate 0.0003 Epoch: 21 Global Step: 444580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:36,460-Speed 6299.45 samples/sec Loss 4.7295 LearningRate 0.0003 Epoch: 21 Global Step: 444590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:39,706-Speed 6310.17 samples/sec Loss 4.6898 LearningRate 0.0003 Epoch: 21 Global Step: 444600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:42,954-Speed 6306.39 samples/sec Loss 4.7214 LearningRate 0.0003 Epoch: 21 Global Step: 444610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:46,204-Speed 6303.01 samples/sec Loss 4.6770 LearningRate 0.0003 Epoch: 21 Global Step: 444620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:49,451-Speed 6310.20 samples/sec Loss 4.6296 LearningRate 0.0003 Epoch: 21 Global Step: 444630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:52,697-Speed 6311.17 samples/sec Loss 4.6560 
LearningRate 0.0003 Epoch: 21 Global Step: 444640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:55,941-Speed 6313.28 samples/sec Loss 4.7268 LearningRate 0.0003 Epoch: 21 Global Step: 444650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:10:59,185-Speed 6315.54 samples/sec Loss 4.7010 LearningRate 0.0003 Epoch: 21 Global Step: 444660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:02,416-Speed 6338.90 samples/sec Loss 4.6618 LearningRate 0.0003 Epoch: 21 Global Step: 444670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:05,673-Speed 6289.89 samples/sec Loss 4.6325 LearningRate 0.0003 Epoch: 21 Global Step: 444680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:08,920-Speed 6309.42 samples/sec Loss 4.6964 LearningRate 0.0003 Epoch: 21 Global Step: 444690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:12,166-Speed 6310.46 samples/sec Loss 4.6723 LearningRate 0.0003 Epoch: 21 Global Step: 444700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:15,408-Speed 6318.88 samples/sec Loss 4.6445 LearningRate 0.0003 Epoch: 21 Global Step: 444710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:18,653-Speed 6311.24 samples/sec Loss 4.6819 LearningRate 0.0003 Epoch: 21 Global Step: 444720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:21,896-Speed 6316.51 samples/sec Loss 4.7274 LearningRate 0.0003 Epoch: 21 Global Step: 444730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:25,147-Speed 6301.43 samples/sec Loss 4.6878 LearningRate 0.0003 Epoch: 21 Global Step: 444740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:28,393-Speed 6311.15 samples/sec Loss 4.7200 LearningRate 0.0003 Epoch: 21 Global Step: 444750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:31,638-Speed 6311.87 samples/sec Loss 4.5979 LearningRate 0.0003 Epoch: 21 
Global Step: 444760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:34,898-Speed 6285.12 samples/sec Loss 4.6640 LearningRate 0.0003 Epoch: 21 Global Step: 444770 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:11:38,133-Speed 6331.76 samples/sec Loss 4.6728 LearningRate 0.0003 Epoch: 21 Global Step: 444780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:41,383-Speed 6302.87 samples/sec Loss 4.6631 LearningRate 0.0003 Epoch: 21 Global Step: 444790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:44,635-Speed 6299.32 samples/sec Loss 4.6704 LearningRate 0.0003 Epoch: 21 Global Step: 444800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:47,877-Speed 6317.81 samples/sec Loss 4.6460 LearningRate 0.0003 Epoch: 21 Global Step: 444810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:51,143-Speed 6273.90 samples/sec Loss 4.6807 LearningRate 0.0003 Epoch: 21 Global Step: 444820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:54,386-Speed 6315.98 samples/sec Loss 4.6023 LearningRate 0.0003 Epoch: 21 Global Step: 444830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:11:57,632-Speed 6310.97 samples/sec Loss 4.6615 LearningRate 0.0003 Epoch: 21 Global Step: 444840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:00,879-Speed 6308.17 samples/sec Loss 4.7318 LearningRate 0.0003 Epoch: 21 Global Step: 444850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:04,126-Speed 6308.89 samples/sec Loss 4.6316 LearningRate 0.0003 Epoch: 21 Global Step: 444860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:07,371-Speed 6312.01 samples/sec Loss 4.6665 LearningRate 0.0003 Epoch: 21 Global Step: 444870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:10,638-Speed 6270.52 samples/sec Loss 4.7498 LearningRate 0.0003 Epoch: 21 Global Step: 444880 Fp16 Grad 
Scale: 32768 Required: 35 hours Training: 2022-04-02 08:12:13,872-Speed 6334.96 samples/sec Loss 4.7132 LearningRate 0.0003 Epoch: 21 Global Step: 444890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:17,117-Speed 6312.10 samples/sec Loss 4.6232 LearningRate 0.0003 Epoch: 21 Global Step: 444900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:20,363-Speed 6311.43 samples/sec Loss 4.7519 LearningRate 0.0003 Epoch: 21 Global Step: 444910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:23,613-Speed 6301.97 samples/sec Loss 4.6645 LearningRate 0.0003 Epoch: 21 Global Step: 444920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:26,862-Speed 6305.74 samples/sec Loss 4.6490 LearningRate 0.0003 Epoch: 21 Global Step: 444930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:30,108-Speed 6310.85 samples/sec Loss 4.6412 LearningRate 0.0003 Epoch: 21 Global Step: 444940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:33,362-Speed 6293.66 samples/sec Loss 4.6787 LearningRate 0.0003 Epoch: 21 Global Step: 444950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:36,611-Speed 6305.56 samples/sec Loss 4.6808 LearningRate 0.0003 Epoch: 21 Global Step: 444960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:39,854-Speed 6317.34 samples/sec Loss 4.6393 LearningRate 0.0003 Epoch: 21 Global Step: 444970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:43,097-Speed 6316.46 samples/sec Loss 4.6913 LearningRate 0.0003 Epoch: 21 Global Step: 444980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:46,343-Speed 6309.37 samples/sec Loss 4.6435 LearningRate 0.0003 Epoch: 21 Global Step: 444990 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:12:49,589-Speed 6311.31 samples/sec Loss 4.6883 LearningRate 0.0003 Epoch: 21 Global Step: 445000 Fp16 Grad Scale: 32768 Required: 35 hours 
Training: 2022-04-02 08:12:52,824-Speed 6335.22 samples/sec Loss 4.6882 LearningRate 0.0003 Epoch: 21 Global Step: 445010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:56,069-Speed 6314.35 samples/sec Loss 4.6562 LearningRate 0.0003 Epoch: 21 Global Step: 445020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:12:59,318-Speed 6304.95 samples/sec Loss 4.6370 LearningRate 0.0003 Epoch: 21 Global Step: 445030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:02,575-Speed 6290.13 samples/sec Loss 4.6034 LearningRate 0.0003 Epoch: 21 Global Step: 445040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:05,820-Speed 6311.79 samples/sec Loss 4.5848 LearningRate 0.0003 Epoch: 21 Global Step: 445050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:09,072-Speed 6299.04 samples/sec Loss 4.7125 LearningRate 0.0003 Epoch: 21 Global Step: 445060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:12,316-Speed 6315.43 samples/sec Loss 4.6710 LearningRate 0.0003 Epoch: 21 Global Step: 445070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:15,559-Speed 6315.98 samples/sec Loss 4.6932 LearningRate 0.0003 Epoch: 21 Global Step: 445080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:18,802-Speed 6315.74 samples/sec Loss 4.6797 LearningRate 0.0003 Epoch: 21 Global Step: 445090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:22,046-Speed 6315.52 samples/sec Loss 4.6236 LearningRate 0.0003 Epoch: 21 Global Step: 445100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:25,336-Speed 6225.54 samples/sec Loss 4.7087 LearningRate 0.0003 Epoch: 21 Global Step: 445110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:28,581-Speed 6313.36 samples/sec Loss 4.6974 LearningRate 0.0003 Epoch: 21 Global Step: 445120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:13:31,827-Speed 6310.70 samples/sec Loss 4.6432 LearningRate 0.0003 Epoch: 21 Global Step: 445130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:35,071-Speed 6314.92 samples/sec Loss 4.6688 LearningRate 0.0003 Epoch: 21 Global Step: 445140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:38,315-Speed 6314.73 samples/sec Loss 4.6278 LearningRate 0.0003 Epoch: 21 Global Step: 445150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:41,564-Speed 6304.13 samples/sec Loss 4.6883 LearningRate 0.0003 Epoch: 21 Global Step: 445160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:44,810-Speed 6310.68 samples/sec Loss 4.7212 LearningRate 0.0003 Epoch: 21 Global Step: 445170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:48,053-Speed 6315.75 samples/sec Loss 4.6671 LearningRate 0.0003 Epoch: 21 Global Step: 445180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:51,300-Speed 6310.40 samples/sec Loss 4.5661 LearningRate 0.0003 Epoch: 21 Global Step: 445190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:54,547-Speed 6308.40 samples/sec Loss 4.6423 LearningRate 0.0003 Epoch: 21 Global Step: 445200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:13:57,795-Speed 6305.95 samples/sec Loss 4.7204 LearningRate 0.0003 Epoch: 21 Global Step: 445210 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:14:01,024-Speed 6343.94 samples/sec Loss 4.6648 LearningRate 0.0003 Epoch: 21 Global Step: 445220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:04,273-Speed 6306.20 samples/sec Loss 4.6988 LearningRate 0.0003 Epoch: 21 Global Step: 445230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:07,515-Speed 6318.41 samples/sec Loss 4.6090 LearningRate 0.0003 Epoch: 21 Global Step: 445240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:10,760-Speed 6312.64 
samples/sec Loss 4.7123 LearningRate 0.0003 Epoch: 21 Global Step: 445250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:14,007-Speed 6309.70 samples/sec Loss 4.6608 LearningRate 0.0003 Epoch: 21 Global Step: 445260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:17,253-Speed 6309.95 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 445270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:20,497-Speed 6314.93 samples/sec Loss 4.6840 LearningRate 0.0003 Epoch: 21 Global Step: 445280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:23,740-Speed 6316.79 samples/sec Loss 4.7491 LearningRate 0.0003 Epoch: 21 Global Step: 445290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:26,984-Speed 6314.20 samples/sec Loss 4.7213 LearningRate 0.0003 Epoch: 21 Global Step: 445300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:30,230-Speed 6310.72 samples/sec Loss 4.6696 LearningRate 0.0003 Epoch: 21 Global Step: 445310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:33,474-Speed 6313.56 samples/sec Loss 4.6497 LearningRate 0.0003 Epoch: 21 Global Step: 445320 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:14:36,704-Speed 6343.06 samples/sec Loss 4.6636 LearningRate 0.0003 Epoch: 21 Global Step: 445330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:39,954-Speed 6302.54 samples/sec Loss 4.6939 LearningRate 0.0003 Epoch: 21 Global Step: 445340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:43,199-Speed 6312.77 samples/sec Loss 4.6984 LearningRate 0.0003 Epoch: 21 Global Step: 445350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:46,440-Speed 6320.76 samples/sec Loss 4.7078 LearningRate 0.0003 Epoch: 21 Global Step: 445360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:49,686-Speed 6310.76 samples/sec Loss 4.6364 
LearningRate 0.0003 Epoch: 21 Global Step: 445370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:52,936-Speed 6302.91 samples/sec Loss 4.6209 LearningRate 0.0003 Epoch: 21 Global Step: 445380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:56,181-Speed 6311.70 samples/sec Loss 4.6705 LearningRate 0.0003 Epoch: 21 Global Step: 445390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:14:59,429-Speed 6306.72 samples/sec Loss 4.5918 LearningRate 0.0003 Epoch: 21 Global Step: 445400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:02,673-Speed 6315.59 samples/sec Loss 4.6315 LearningRate 0.0003 Epoch: 21 Global Step: 445410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:05,917-Speed 6315.05 samples/sec Loss 4.6167 LearningRate 0.0003 Epoch: 21 Global Step: 445420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:09,162-Speed 6312.95 samples/sec Loss 4.6256 LearningRate 0.0003 Epoch: 21 Global Step: 445430 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:15:12,411-Speed 6305.43 samples/sec Loss 4.6219 LearningRate 0.0003 Epoch: 21 Global Step: 445440 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:15:15,641-Speed 6342.20 samples/sec Loss 4.6577 LearningRate 0.0003 Epoch: 21 Global Step: 445450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:18,884-Speed 6315.53 samples/sec Loss 4.7135 LearningRate 0.0003 Epoch: 21 Global Step: 445460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:22,133-Speed 6305.67 samples/sec Loss 4.6576 LearningRate 0.0003 Epoch: 21 Global Step: 445470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:25,380-Speed 6307.45 samples/sec Loss 4.6146 LearningRate 0.0003 Epoch: 21 Global Step: 445480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:28,623-Speed 6318.18 samples/sec Loss 4.6588 LearningRate 0.0003 Epoch: 21 
Global Step: 445490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:31,869-Speed 6309.76 samples/sec Loss 4.6356 LearningRate 0.0003 Epoch: 21 Global Step: 445500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:35,118-Speed 6305.86 samples/sec Loss 4.7213 LearningRate 0.0003 Epoch: 21 Global Step: 445510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:38,361-Speed 6316.47 samples/sec Loss 4.6541 LearningRate 0.0003 Epoch: 21 Global Step: 445520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:41,603-Speed 6318.04 samples/sec Loss 4.6609 LearningRate 0.0003 Epoch: 21 Global Step: 445530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:44,851-Speed 6306.33 samples/sec Loss 4.6936 LearningRate 0.0003 Epoch: 21 Global Step: 445540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:48,098-Speed 6309.60 samples/sec Loss 4.7009 LearningRate 0.0003 Epoch: 21 Global Step: 445550 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:15:51,332-Speed 6333.24 samples/sec Loss 4.6325 LearningRate 0.0003 Epoch: 21 Global Step: 445560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:54,577-Speed 6312.91 samples/sec Loss 4.6942 LearningRate 0.0003 Epoch: 21 Global Step: 445570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:15:57,825-Speed 6307.13 samples/sec Loss 4.6571 LearningRate 0.0003 Epoch: 21 Global Step: 445580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:01,070-Speed 6311.13 samples/sec Loss 4.6335 LearningRate 0.0003 Epoch: 21 Global Step: 445590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:04,317-Speed 6311.76 samples/sec Loss 4.6910 LearningRate 0.0003 Epoch: 21 Global Step: 445600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:07,565-Speed 6307.21 samples/sec Loss 4.6615 LearningRate 0.0003 Epoch: 21 Global Step: 445610 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:10,811-Speed 6309.50 samples/sec Loss 4.6766 LearningRate 0.0003 Epoch: 21 Global Step: 445620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:14,059-Speed 6307.61 samples/sec Loss 4.6265 LearningRate 0.0003 Epoch: 21 Global Step: 445630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:17,305-Speed 6310.96 samples/sec Loss 4.7069 LearningRate 0.0003 Epoch: 21 Global Step: 445640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:20,546-Speed 6319.79 samples/sec Loss 4.6891 LearningRate 0.0003 Epoch: 21 Global Step: 445650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:23,790-Speed 6315.86 samples/sec Loss 4.6199 LearningRate 0.0003 Epoch: 21 Global Step: 445660 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:16:27,019-Speed 6344.37 samples/sec Loss 4.6274 LearningRate 0.0003 Epoch: 21 Global Step: 445670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:30,265-Speed 6310.52 samples/sec Loss 4.6796 LearningRate 0.0003 Epoch: 21 Global Step: 445680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:33,516-Speed 6300.81 samples/sec Loss 4.6965 LearningRate 0.0003 Epoch: 21 Global Step: 445690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:36,761-Speed 6312.61 samples/sec Loss 4.6695 LearningRate 0.0003 Epoch: 21 Global Step: 445700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:40,010-Speed 6304.50 samples/sec Loss 4.6634 LearningRate 0.0003 Epoch: 21 Global Step: 445710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:43,255-Speed 6313.16 samples/sec Loss 4.6506 LearningRate 0.0003 Epoch: 21 Global Step: 445720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:46,512-Speed 6290.01 samples/sec Loss 4.6585 LearningRate 0.0003 Epoch: 21 Global Step: 445730 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:16:49,758-Speed 6310.09 samples/sec Loss 4.6819 LearningRate 0.0003 Epoch: 21 Global Step: 445740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:53,002-Speed 6313.95 samples/sec Loss 4.6312 LearningRate 0.0003 Epoch: 21 Global Step: 445750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:56,248-Speed 6312.23 samples/sec Loss 4.6769 LearningRate 0.0003 Epoch: 21 Global Step: 445760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:16:59,479-Speed 6338.89 samples/sec Loss 4.6122 LearningRate 0.0003 Epoch: 21 Global Step: 445770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:02,726-Speed 6309.45 samples/sec Loss 4.7347 LearningRate 0.0003 Epoch: 21 Global Step: 445780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:05,970-Speed 6312.80 samples/sec Loss 4.6373 LearningRate 0.0003 Epoch: 21 Global Step: 445790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:09,215-Speed 6313.74 samples/sec Loss 4.6589 LearningRate 0.0003 Epoch: 21 Global Step: 445800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:12,461-Speed 6310.86 samples/sec Loss 4.7520 LearningRate 0.0003 Epoch: 21 Global Step: 445810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:15,706-Speed 6312.38 samples/sec Loss 4.6273 LearningRate 0.0003 Epoch: 21 Global Step: 445820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:18,949-Speed 6315.94 samples/sec Loss 4.6847 LearningRate 0.0003 Epoch: 21 Global Step: 445830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:22,197-Speed 6308.26 samples/sec Loss 4.6659 LearningRate 0.0003 Epoch: 21 Global Step: 445840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:25,442-Speed 6311.34 samples/sec Loss 4.6440 LearningRate 0.0003 Epoch: 21 Global Step: 445850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:17:28,687-Speed 6313.23 samples/sec Loss 4.6406 LearningRate 0.0003 Epoch: 21 Global Step: 445860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:31,932-Speed 6313.86 samples/sec Loss 4.6747 LearningRate 0.0003 Epoch: 21 Global Step: 445870 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:17:35,159-Speed 6347.39 samples/sec Loss 4.6974 LearningRate 0.0003 Epoch: 21 Global Step: 445880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:38,405-Speed 6311.42 samples/sec Loss 4.7418 LearningRate 0.0003 Epoch: 21 Global Step: 445890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:41,650-Speed 6312.46 samples/sec Loss 4.5938 LearningRate 0.0003 Epoch: 21 Global Step: 445900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:44,900-Speed 6303.43 samples/sec Loss 4.5977 LearningRate 0.0003 Epoch: 21 Global Step: 445910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:48,145-Speed 6311.83 samples/sec Loss 4.6856 LearningRate 0.0003 Epoch: 21 Global Step: 445920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:51,390-Speed 6311.85 samples/sec Loss 4.6271 LearningRate 0.0003 Epoch: 21 Global Step: 445930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:54,636-Speed 6312.21 samples/sec Loss 4.6542 LearningRate 0.0003 Epoch: 21 Global Step: 445940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:17:57,881-Speed 6311.81 samples/sec Loss 4.6653 LearningRate 0.0003 Epoch: 21 Global Step: 445950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:01,121-Speed 6323.27 samples/sec Loss 4.6387 LearningRate 0.0003 Epoch: 21 Global Step: 445960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:04,377-Speed 6289.60 samples/sec Loss 4.6512 LearningRate 0.0003 Epoch: 21 Global Step: 445970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:07,608-Speed 6341.73 
samples/sec Loss 4.6964 LearningRate 0.0003 Epoch: 21 Global Step: 445980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:10,855-Speed 6308.75 samples/sec Loss 4.6460 LearningRate 0.0003 Epoch: 21 Global Step: 445990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:14,130-Speed 6254.58 samples/sec Loss 4.6396 LearningRate 0.0003 Epoch: 21 Global Step: 446000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:17,376-Speed 6310.97 samples/sec Loss 4.6250 LearningRate 0.0003 Epoch: 21 Global Step: 446010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:20,661-Speed 6234.85 samples/sec Loss 4.6620 LearningRate 0.0003 Epoch: 21 Global Step: 446020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:24,046-Speed 6050.97 samples/sec Loss 4.7127 LearningRate 0.0003 Epoch: 21 Global Step: 446030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:27,298-Speed 6300.45 samples/sec Loss 4.6358 LearningRate 0.0003 Epoch: 21 Global Step: 446040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:30,542-Speed 6314.07 samples/sec Loss 4.6742 LearningRate 0.0003 Epoch: 21 Global Step: 446050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:33,783-Speed 6319.03 samples/sec Loss 4.6215 LearningRate 0.0003 Epoch: 21 Global Step: 446060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:37,030-Speed 6309.60 samples/sec Loss 4.6419 LearningRate 0.0003 Epoch: 21 Global Step: 446070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:40,278-Speed 6308.81 samples/sec Loss 4.6197 LearningRate 0.0003 Epoch: 21 Global Step: 446080 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:18:43,506-Speed 6344.90 samples/sec Loss 4.6640 LearningRate 0.0003 Epoch: 21 Global Step: 446090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:46,754-Speed 6306.94 samples/sec Loss 4.6497 
LearningRate 0.0003 Epoch: 21 Global Step: 446100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:49,997-Speed 6316.05 samples/sec Loss 4.6436 LearningRate 0.0003 Epoch: 21 Global Step: 446110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:53,242-Speed 6313.63 samples/sec Loss 4.5944 LearningRate 0.0003 Epoch: 21 Global Step: 446120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:56,487-Speed 6313.39 samples/sec Loss 4.6615 LearningRate 0.0003 Epoch: 21 Global Step: 446130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:18:59,733-Speed 6309.62 samples/sec Loss 4.6892 LearningRate 0.0003 Epoch: 21 Global Step: 446140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:02,978-Speed 6312.60 samples/sec Loss 4.7211 LearningRate 0.0003 Epoch: 21 Global Step: 446150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:06,227-Speed 6303.98 samples/sec Loss 4.6112 LearningRate 0.0003 Epoch: 21 Global Step: 446160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:09,474-Speed 6310.39 samples/sec Loss 4.7778 LearningRate 0.0003 Epoch: 21 Global Step: 446170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:12,783-Speed 6190.37 samples/sec Loss 4.6103 LearningRate 0.0003 Epoch: 21 Global Step: 446180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:16,015-Speed 6338.32 samples/sec Loss 4.6331 LearningRate 0.0003 Epoch: 21 Global Step: 446190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:19,263-Speed 6306.99 samples/sec Loss 4.6650 LearningRate 0.0003 Epoch: 21 Global Step: 446200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:22,508-Speed 6312.59 samples/sec Loss 4.6124 LearningRate 0.0003 Epoch: 21 Global Step: 446210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:25,750-Speed 6317.06 samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 
Global Step: 446220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:28,997-Speed 6309.45 samples/sec Loss 4.6203 LearningRate 0.0003 Epoch: 21 Global Step: 446230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:32,245-Speed 6306.24 samples/sec Loss 4.6680 LearningRate 0.0003 Epoch: 21 Global Step: 446240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:35,492-Speed 6308.21 samples/sec Loss 4.6965 LearningRate 0.0003 Epoch: 21 Global Step: 446250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:38,740-Speed 6308.47 samples/sec Loss 4.6433 LearningRate 0.0003 Epoch: 21 Global Step: 446260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:41,989-Speed 6303.68 samples/sec Loss 4.6480 LearningRate 0.0003 Epoch: 21 Global Step: 446270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:45,241-Speed 6300.17 samples/sec Loss 4.6689 LearningRate 0.0003 Epoch: 21 Global Step: 446280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:48,487-Speed 6310.92 samples/sec Loss 4.5816 LearningRate 0.0003 Epoch: 21 Global Step: 446290 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:19:51,718-Speed 6340.49 samples/sec Loss 4.6874 LearningRate 0.0003 Epoch: 21 Global Step: 446300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:54,968-Speed 6302.67 samples/sec Loss 4.6701 LearningRate 0.0003 Epoch: 21 Global Step: 446310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:19:58,215-Speed 6309.14 samples/sec Loss 4.6004 LearningRate 0.0003 Epoch: 21 Global Step: 446320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:01,458-Speed 6315.62 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 446330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:04,705-Speed 6310.07 samples/sec Loss 4.6486 LearningRate 0.0003 Epoch: 21 Global Step: 446340 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:07,947-Speed 6317.37 samples/sec Loss 4.6621 LearningRate 0.0003 Epoch: 21 Global Step: 446350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:11,192-Speed 6313.39 samples/sec Loss 4.6637 LearningRate 0.0003 Epoch: 21 Global Step: 446360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:14,439-Speed 6308.59 samples/sec Loss 4.6453 LearningRate 0.0003 Epoch: 21 Global Step: 446370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:17,682-Speed 6316.63 samples/sec Loss 4.7156 LearningRate 0.0003 Epoch: 21 Global Step: 446380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:20,935-Speed 6296.77 samples/sec Loss 4.7154 LearningRate 0.0003 Epoch: 21 Global Step: 446390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:24,179-Speed 6314.20 samples/sec Loss 4.7018 LearningRate 0.0003 Epoch: 21 Global Step: 446400 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:20:27,411-Speed 6338.06 samples/sec Loss 4.6815 LearningRate 0.0003 Epoch: 21 Global Step: 446410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:30,657-Speed 6311.70 samples/sec Loss 4.5509 LearningRate 0.0003 Epoch: 21 Global Step: 446420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:33,903-Speed 6310.04 samples/sec Loss 4.6779 LearningRate 0.0003 Epoch: 21 Global Step: 446430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:37,152-Speed 6305.65 samples/sec Loss 4.6558 LearningRate 0.0003 Epoch: 21 Global Step: 446440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:40,397-Speed 6311.47 samples/sec Loss 4.6858 LearningRate 0.0003 Epoch: 21 Global Step: 446450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:43,643-Speed 6313.87 samples/sec Loss 4.6893 LearningRate 0.0003 Epoch: 21 Global Step: 446460 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:20:46,889-Speed 6311.66 samples/sec Loss 4.6001 LearningRate 0.0003 Epoch: 21 Global Step: 446470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:50,135-Speed 6309.86 samples/sec Loss 4.6715 LearningRate 0.0003 Epoch: 21 Global Step: 446480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:53,383-Speed 6307.51 samples/sec Loss 4.6738 LearningRate 0.0003 Epoch: 21 Global Step: 446490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:56,632-Speed 6305.72 samples/sec Loss 4.7291 LearningRate 0.0003 Epoch: 21 Global Step: 446500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:20:59,874-Speed 6317.57 samples/sec Loss 4.7920 LearningRate 0.0003 Epoch: 21 Global Step: 446510 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:21:03,108-Speed 6335.51 samples/sec Loss 4.6432 LearningRate 0.0003 Epoch: 21 Global Step: 446520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:06,355-Speed 6307.61 samples/sec Loss 4.6314 LearningRate 0.0003 Epoch: 21 Global Step: 446530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:09,596-Speed 6320.33 samples/sec Loss 4.5915 LearningRate 0.0003 Epoch: 21 Global Step: 446540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:12,841-Speed 6312.24 samples/sec Loss 4.6767 LearningRate 0.0003 Epoch: 21 Global Step: 446550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:16,087-Speed 6312.26 samples/sec Loss 4.6793 LearningRate 0.0003 Epoch: 21 Global Step: 446560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:19,334-Speed 6309.10 samples/sec Loss 4.6473 LearningRate 0.0003 Epoch: 21 Global Step: 446570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:22,581-Speed 6308.28 samples/sec Loss 4.6840 LearningRate 0.0003 Epoch: 21 Global Step: 446580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:21:25,824-Speed 6316.76 samples/sec Loss 4.6237 LearningRate 0.0003 Epoch: 21 Global Step: 446590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:29,071-Speed 6308.19 samples/sec Loss 4.6853 LearningRate 0.0003 Epoch: 21 Global Step: 446600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:32,331-Speed 6283.84 samples/sec Loss 4.5908 LearningRate 0.0003 Epoch: 21 Global Step: 446610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:35,578-Speed 6307.92 samples/sec Loss 4.5894 LearningRate 0.0003 Epoch: 21 Global Step: 446620 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:21:38,812-Speed 6335.02 samples/sec Loss 4.5985 LearningRate 0.0003 Epoch: 21 Global Step: 446630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:42,056-Speed 6313.57 samples/sec Loss 4.6440 LearningRate 0.0003 Epoch: 21 Global Step: 446640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:45,310-Speed 6296.01 samples/sec Loss 4.6346 LearningRate 0.0003 Epoch: 21 Global Step: 446650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:48,553-Speed 6315.51 samples/sec Loss 4.6677 LearningRate 0.0003 Epoch: 21 Global Step: 446660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:51,799-Speed 6310.49 samples/sec Loss 4.6596 LearningRate 0.0003 Epoch: 21 Global Step: 446670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:55,048-Speed 6306.28 samples/sec Loss 4.6440 LearningRate 0.0003 Epoch: 21 Global Step: 446680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:21:58,288-Speed 6321.33 samples/sec Loss 4.6870 LearningRate 0.0003 Epoch: 21 Global Step: 446690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:01,535-Speed 6309.36 samples/sec Loss 4.6617 LearningRate 0.0003 Epoch: 21 Global Step: 446700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:04,780-Speed 6311.28 
samples/sec Loss 4.6883 LearningRate 0.0003 Epoch: 21 Global Step: 446710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:08,024-Speed 6315.56 samples/sec Loss 4.7501 LearningRate 0.0003 Epoch: 21 Global Step: 446720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:11,252-Speed 6347.66 samples/sec Loss 4.5909 LearningRate 0.0003 Epoch: 21 Global Step: 446730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:14,500-Speed 6306.25 samples/sec Loss 4.6935 LearningRate 0.0003 Epoch: 21 Global Step: 446740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:17,747-Speed 6309.88 samples/sec Loss 4.6806 LearningRate 0.0003 Epoch: 21 Global Step: 446750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:20,992-Speed 6312.67 samples/sec Loss 4.6905 LearningRate 0.0003 Epoch: 21 Global Step: 446760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:24,236-Speed 6313.95 samples/sec Loss 4.7257 LearningRate 0.0003 Epoch: 21 Global Step: 446770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:27,485-Speed 6305.65 samples/sec Loss 4.7089 LearningRate 0.0003 Epoch: 21 Global Step: 446780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:30,737-Speed 6298.21 samples/sec Loss 4.7095 LearningRate 0.0003 Epoch: 21 Global Step: 446790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:33,984-Speed 6307.77 samples/sec Loss 4.6870 LearningRate 0.0003 Epoch: 21 Global Step: 446800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:37,262-Speed 6249.12 samples/sec Loss 4.7000 LearningRate 0.0003 Epoch: 21 Global Step: 446810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:40,526-Speed 6276.93 samples/sec Loss 4.7300 LearningRate 0.0003 Epoch: 21 Global Step: 446820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:43,753-Speed 6347.62 samples/sec Loss 4.6454 
LearningRate 0.0003 Epoch: 21 Global Step: 446830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:46,998-Speed 6313.36 samples/sec Loss 4.7288 LearningRate 0.0003 Epoch: 21 Global Step: 446840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:50,241-Speed 6315.24 samples/sec Loss 4.6405 LearningRate 0.0003 Epoch: 21 Global Step: 446850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:53,491-Speed 6304.34 samples/sec Loss 4.6023 LearningRate 0.0003 Epoch: 21 Global Step: 446860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:56,738-Speed 6307.43 samples/sec Loss 4.6906 LearningRate 0.0003 Epoch: 21 Global Step: 446870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:22:59,982-Speed 6314.42 samples/sec Loss 4.6923 LearningRate 0.0003 Epoch: 21 Global Step: 446880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:03,226-Speed 6315.47 samples/sec Loss 4.6691 LearningRate 0.0003 Epoch: 21 Global Step: 446890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:06,480-Speed 6294.89 samples/sec Loss 4.6779 LearningRate 0.0003 Epoch: 21 Global Step: 446900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:09,725-Speed 6311.90 samples/sec Loss 4.6654 LearningRate 0.0003 Epoch: 21 Global Step: 446910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:12,974-Speed 6305.73 samples/sec Loss 4.6226 LearningRate 0.0003 Epoch: 21 Global Step: 446920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:16,220-Speed 6311.48 samples/sec Loss 4.6811 LearningRate 0.0003 Epoch: 21 Global Step: 446930 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:23:19,458-Speed 6327.04 samples/sec Loss 4.6316 LearningRate 0.0003 Epoch: 21 Global Step: 446940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:22,708-Speed 6301.31 samples/sec Loss 4.7136 LearningRate 0.0003 Epoch: 21 
Global Step: 446950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:25,959-Speed 6302.07 samples/sec Loss 4.6391 LearningRate 0.0003 Epoch: 21 Global Step: 446960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:29,206-Speed 6308.77 samples/sec Loss 4.6637 LearningRate 0.0003 Epoch: 21 Global Step: 446970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:32,456-Speed 6303.49 samples/sec Loss 4.6822 LearningRate 0.0003 Epoch: 21 Global Step: 446980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:35,717-Speed 6280.72 samples/sec Loss 4.5884 LearningRate 0.0003 Epoch: 21 Global Step: 446990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:38,966-Speed 6305.82 samples/sec Loss 4.7192 LearningRate 0.0003 Epoch: 21 Global Step: 447000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:42,215-Speed 6303.77 samples/sec Loss 4.5986 LearningRate 0.0003 Epoch: 21 Global Step: 447010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:45,464-Speed 6305.35 samples/sec Loss 4.6341 LearningRate 0.0003 Epoch: 21 Global Step: 447020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:48,715-Speed 6301.01 samples/sec Loss 4.6241 LearningRate 0.0003 Epoch: 21 Global Step: 447030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:51,961-Speed 6311.42 samples/sec Loss 4.6110 LearningRate 0.0003 Epoch: 21 Global Step: 447040 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:23:55,191-Speed 6342.07 samples/sec Loss 4.6556 LearningRate 0.0003 Epoch: 21 Global Step: 447050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:23:58,444-Speed 6296.25 samples/sec Loss 4.7119 LearningRate 0.0003 Epoch: 21 Global Step: 447060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:01,693-Speed 6305.72 samples/sec Loss 4.6526 LearningRate 0.0003 Epoch: 21 Global Step: 447070 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:04,941-Speed 6306.24 samples/sec Loss 4.6402 LearningRate 0.0003 Epoch: 21 Global Step: 447080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:08,194-Speed 6297.24 samples/sec Loss 4.6458 LearningRate 0.0003 Epoch: 21 Global Step: 447090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:11,440-Speed 6310.26 samples/sec Loss 4.6444 LearningRate 0.0003 Epoch: 21 Global Step: 447100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:14,682-Speed 6318.55 samples/sec Loss 4.5757 LearningRate 0.0003 Epoch: 21 Global Step: 447110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:17,929-Speed 6309.88 samples/sec Loss 4.6213 LearningRate 0.0003 Epoch: 21 Global Step: 447120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:21,173-Speed 6314.52 samples/sec Loss 4.6461 LearningRate 0.0003 Epoch: 21 Global Step: 447130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:24,417-Speed 6314.09 samples/sec Loss 4.6156 LearningRate 0.0003 Epoch: 21 Global Step: 447140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:27,666-Speed 6306.08 samples/sec Loss 4.6631 LearningRate 0.0003 Epoch: 21 Global Step: 447150 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:24:30,912-Speed 6309.85 samples/sec Loss 4.6429 LearningRate 0.0003 Epoch: 21 Global Step: 447160 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:24:34,141-Speed 6344.36 samples/sec Loss 4.6024 LearningRate 0.0003 Epoch: 21 Global Step: 447170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:37,387-Speed 6310.02 samples/sec Loss 4.6572 LearningRate 0.0003 Epoch: 21 Global Step: 447180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:40,634-Speed 6310.30 samples/sec Loss 4.6385 LearningRate 0.0003 Epoch: 21 Global Step: 447190 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:24:43,883-Speed 6304.04 samples/sec Loss 4.5777 LearningRate 0.0003 Epoch: 21 Global Step: 447200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:47,128-Speed 6313.82 samples/sec Loss 4.6128 LearningRate 0.0003 Epoch: 21 Global Step: 447210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:50,370-Speed 6317.73 samples/sec Loss 4.6369 LearningRate 0.0003 Epoch: 21 Global Step: 447220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:53,617-Speed 6308.90 samples/sec Loss 4.6128 LearningRate 0.0003 Epoch: 21 Global Step: 447230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:24:56,861-Speed 6313.84 samples/sec Loss 4.6412 LearningRate 0.0003 Epoch: 21 Global Step: 447240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:00,105-Speed 6314.84 samples/sec Loss 4.6683 LearningRate 0.0003 Epoch: 21 Global Step: 447250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:03,357-Speed 6301.93 samples/sec Loss 4.6687 LearningRate 0.0003 Epoch: 21 Global Step: 447260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:06,598-Speed 6320.14 samples/sec Loss 4.5578 LearningRate 0.0003 Epoch: 21 Global Step: 447270 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:25:09,830-Speed 6338.12 samples/sec Loss 4.6749 LearningRate 0.0003 Epoch: 21 Global Step: 447280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:13,075-Speed 6313.26 samples/sec Loss 4.6758 LearningRate 0.0003 Epoch: 21 Global Step: 447290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:16,318-Speed 6316.26 samples/sec Loss 4.7270 LearningRate 0.0003 Epoch: 21 Global Step: 447300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:19,564-Speed 6310.85 samples/sec Loss 4.6387 LearningRate 0.0003 Epoch: 21 Global Step: 447310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:25:22,810-Speed 6311.31 samples/sec Loss 4.6813 LearningRate 0.0003 Epoch: 21 Global Step: 447320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:26,055-Speed 6311.90 samples/sec Loss 4.6827 LearningRate 0.0003 Epoch: 21 Global Step: 447330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:29,302-Speed 6308.35 samples/sec Loss 4.6243 LearningRate 0.0003 Epoch: 21 Global Step: 447340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:32,547-Speed 6314.40 samples/sec Loss 4.7313 LearningRate 0.0003 Epoch: 21 Global Step: 447350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:35,796-Speed 6303.80 samples/sec Loss 4.6907 LearningRate 0.0003 Epoch: 21 Global Step: 447360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:39,043-Speed 6309.92 samples/sec Loss 4.6112 LearningRate 0.0003 Epoch: 21 Global Step: 447370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:42,294-Speed 6300.61 samples/sec Loss 4.6219 LearningRate 0.0003 Epoch: 21 Global Step: 447380 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:25:45,522-Speed 6346.06 samples/sec Loss 4.5890 LearningRate 0.0003 Epoch: 21 Global Step: 447390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:48,771-Speed 6304.62 samples/sec Loss 4.6116 LearningRate 0.0003 Epoch: 21 Global Step: 447400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:52,018-Speed 6309.77 samples/sec Loss 4.5629 LearningRate 0.0003 Epoch: 21 Global Step: 447410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:55,266-Speed 6305.42 samples/sec Loss 4.6068 LearningRate 0.0003 Epoch: 21 Global Step: 447420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:25:58,513-Speed 6308.93 samples/sec Loss 4.7124 LearningRate 0.0003 Epoch: 21 Global Step: 447430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:01,762-Speed 6305.40 
samples/sec Loss 4.6196 LearningRate 0.0003 Epoch: 21 Global Step: 447440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:05,008-Speed 6311.20 samples/sec Loss 4.7117 LearningRate 0.0003 Epoch: 21 Global Step: 447450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:08,253-Speed 6311.45 samples/sec Loss 4.6121 LearningRate 0.0003 Epoch: 21 Global Step: 447460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:11,500-Speed 6309.38 samples/sec Loss 4.7344 LearningRate 0.0003 Epoch: 21 Global Step: 447470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:14,745-Speed 6312.43 samples/sec Loss 4.7091 LearningRate 0.0003 Epoch: 21 Global Step: 447480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:17,990-Speed 6312.95 samples/sec Loss 4.6393 LearningRate 0.0003 Epoch: 21 Global Step: 447490 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:26:21,222-Speed 6337.65 samples/sec Loss 4.6276 LearningRate 0.0003 Epoch: 21 Global Step: 447500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:24,470-Speed 6307.93 samples/sec Loss 4.6355 LearningRate 0.0003 Epoch: 21 Global Step: 447510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:27,715-Speed 6311.49 samples/sec Loss 4.6947 LearningRate 0.0003 Epoch: 21 Global Step: 447520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:30,960-Speed 6312.85 samples/sec Loss 4.6110 LearningRate 0.0003 Epoch: 21 Global Step: 447530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:34,207-Speed 6309.87 samples/sec Loss 4.6461 LearningRate 0.0003 Epoch: 21 Global Step: 447540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:37,450-Speed 6314.79 samples/sec Loss 4.6937 LearningRate 0.0003 Epoch: 21 Global Step: 447550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:40,697-Speed 6310.40 samples/sec Loss 4.6549 
LearningRate 0.0003 Epoch: 21 Global Step: 447560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:43,950-Speed 6297.30 samples/sec Loss 4.6597 LearningRate 0.0003 Epoch: 21 Global Step: 447570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:47,195-Speed 6312.80 samples/sec Loss 4.6200 LearningRate 0.0003 Epoch: 21 Global Step: 447580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:50,444-Speed 6304.89 samples/sec Loss 4.6308 LearningRate 0.0003 Epoch: 21 Global Step: 447590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:53,675-Speed 6340.61 samples/sec Loss 4.6529 LearningRate 0.0003 Epoch: 21 Global Step: 447600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:26:56,918-Speed 6314.89 samples/sec Loss 4.6107 LearningRate 0.0003 Epoch: 21 Global Step: 447610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:00,161-Speed 6317.33 samples/sec Loss 4.6062 LearningRate 0.0003 Epoch: 21 Global Step: 447620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:03,407-Speed 6311.99 samples/sec Loss 4.6303 LearningRate 0.0003 Epoch: 21 Global Step: 447630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:06,650-Speed 6315.08 samples/sec Loss 4.6420 LearningRate 0.0003 Epoch: 21 Global Step: 447640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:09,892-Speed 6318.90 samples/sec Loss 4.7116 LearningRate 0.0003 Epoch: 21 Global Step: 447650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:13,140-Speed 6307.34 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 447660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:16,383-Speed 6315.53 samples/sec Loss 4.6701 LearningRate 0.0003 Epoch: 21 Global Step: 447670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:19,632-Speed 6305.58 samples/sec Loss 4.6807 LearningRate 0.0003 Epoch: 21 
Global Step: 447680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:22,878-Speed 6311.14 samples/sec Loss 4.6804 LearningRate 0.0003 Epoch: 21 Global Step: 447690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:26,120-Speed 6317.44 samples/sec Loss 4.5893 LearningRate 0.0003 Epoch: 21 Global Step: 447700 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:27:29,353-Speed 6336.42 samples/sec Loss 4.6741 LearningRate 0.0003 Epoch: 21 Global Step: 447710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:32,595-Speed 6318.03 samples/sec Loss 4.6952 LearningRate 0.0003 Epoch: 21 Global Step: 447720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:35,841-Speed 6310.48 samples/sec Loss 4.6283 LearningRate 0.0003 Epoch: 21 Global Step: 447730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:39,084-Speed 6316.77 samples/sec Loss 4.5879 LearningRate 0.0003 Epoch: 21 Global Step: 447740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:42,330-Speed 6310.35 samples/sec Loss 4.5844 LearningRate 0.0003 Epoch: 21 Global Step: 447750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:45,578-Speed 6307.89 samples/sec Loss 4.6332 LearningRate 0.0003 Epoch: 21 Global Step: 447760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:48,823-Speed 6313.49 samples/sec Loss 4.6648 LearningRate 0.0003 Epoch: 21 Global Step: 447770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:52,065-Speed 6317.62 samples/sec Loss 4.6141 LearningRate 0.0003 Epoch: 21 Global Step: 447780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:55,307-Speed 6319.65 samples/sec Loss 4.6418 LearningRate 0.0003 Epoch: 21 Global Step: 447790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:27:58,554-Speed 6308.30 samples/sec Loss 4.7026 LearningRate 0.0003 Epoch: 21 Global Step: 447800 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:01,804-Speed 6303.38 samples/sec Loss 4.7047 LearningRate 0.0003 Epoch: 21 Global Step: 447810 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:28:05,035-Speed 6340.36 samples/sec Loss 4.6001 LearningRate 0.0003 Epoch: 21 Global Step: 447820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:08,278-Speed 6315.98 samples/sec Loss 4.6229 LearningRate 0.0003 Epoch: 21 Global Step: 447830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:11,520-Speed 6317.62 samples/sec Loss 4.6445 LearningRate 0.0003 Epoch: 21 Global Step: 447840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:14,764-Speed 6313.72 samples/sec Loss 4.6019 LearningRate 0.0003 Epoch: 21 Global Step: 447850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:18,011-Speed 6310.53 samples/sec Loss 4.6954 LearningRate 0.0003 Epoch: 21 Global Step: 447860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:21,257-Speed 6310.71 samples/sec Loss 4.6212 LearningRate 0.0003 Epoch: 21 Global Step: 447870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:24,502-Speed 6311.72 samples/sec Loss 4.6708 LearningRate 0.0003 Epoch: 21 Global Step: 447880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:27,742-Speed 6322.70 samples/sec Loss 4.6470 LearningRate 0.0003 Epoch: 21 Global Step: 447890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:30,989-Speed 6308.40 samples/sec Loss 4.6944 LearningRate 0.0003 Epoch: 21 Global Step: 447900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:34,235-Speed 6311.17 samples/sec Loss 4.6127 LearningRate 0.0003 Epoch: 21 Global Step: 447910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:37,466-Speed 6340.54 samples/sec Loss 4.6148 LearningRate 0.0003 Epoch: 21 Global Step: 447920 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:28:40,714-Speed 6305.50 samples/sec Loss 4.6364 LearningRate 0.0003 Epoch: 21 Global Step: 447930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:43,955-Speed 6321.67 samples/sec Loss 4.6441 LearningRate 0.0003 Epoch: 21 Global Step: 447940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:47,207-Speed 6297.55 samples/sec Loss 4.5983 LearningRate 0.0003 Epoch: 21 Global Step: 447950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:50,453-Speed 6310.97 samples/sec Loss 4.6043 LearningRate 0.0003 Epoch: 21 Global Step: 447960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:53,703-Speed 6303.39 samples/sec Loss 4.5944 LearningRate 0.0003 Epoch: 21 Global Step: 447970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:28:56,952-Speed 6305.96 samples/sec Loss 4.6916 LearningRate 0.0003 Epoch: 21 Global Step: 447980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:00,196-Speed 6315.24 samples/sec Loss 4.6334 LearningRate 0.0003 Epoch: 21 Global Step: 447990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:03,447-Speed 6300.87 samples/sec Loss 4.5861 LearningRate 0.0003 Epoch: 21 Global Step: 448000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:06,689-Speed 6317.08 samples/sec Loss 4.5813 LearningRate 0.0003 Epoch: 21 Global Step: 448010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:09,938-Speed 6305.57 samples/sec Loss 4.6832 LearningRate 0.0003 Epoch: 21 Global Step: 448020 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:29:13,185-Speed 6308.79 samples/sec Loss 4.6397 LearningRate 0.0003 Epoch: 21 Global Step: 448030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:16,463-Speed 6249.48 samples/sec Loss 4.6475 LearningRate 0.0003 Epoch: 21 Global Step: 448040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:29:19,710-Speed 6308.46 samples/sec Loss 4.6423 LearningRate 0.0003 Epoch: 21 Global Step: 448050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:22,958-Speed 6306.58 samples/sec Loss 4.5975 LearningRate 0.0003 Epoch: 21 Global Step: 448060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:26,205-Speed 6309.40 samples/sec Loss 4.6629 LearningRate 0.0003 Epoch: 21 Global Step: 448070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:29,453-Speed 6306.62 samples/sec Loss 4.5928 LearningRate 0.0003 Epoch: 21 Global Step: 448080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:32,699-Speed 6310.60 samples/sec Loss 4.5704 LearningRate 0.0003 Epoch: 21 Global Step: 448090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:35,951-Speed 6299.64 samples/sec Loss 4.6452 LearningRate 0.0003 Epoch: 21 Global Step: 448100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:39,199-Speed 6306.33 samples/sec Loss 4.6810 LearningRate 0.0003 Epoch: 21 Global Step: 448110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:42,448-Speed 6304.51 samples/sec Loss 4.6473 LearningRate 0.0003 Epoch: 21 Global Step: 448120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:45,698-Speed 6304.28 samples/sec Loss 4.7102 LearningRate 0.0003 Epoch: 21 Global Step: 448130 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:29:48,929-Speed 6339.60 samples/sec Loss 4.6671 LearningRate 0.0003 Epoch: 21 Global Step: 448140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:52,176-Speed 6307.99 samples/sec Loss 4.7006 LearningRate 0.0003 Epoch: 21 Global Step: 448150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:55,422-Speed 6310.44 samples/sec Loss 4.6512 LearningRate 0.0003 Epoch: 21 Global Step: 448160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:29:58,675-Speed 6297.68 
samples/sec Loss 4.6967 LearningRate 0.0003 Epoch: 21 Global Step: 448170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:01,921-Speed 6310.60 samples/sec Loss 4.6734 LearningRate 0.0003 Epoch: 21 Global Step: 448180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:05,168-Speed 6309.95 samples/sec Loss 4.6227 LearningRate 0.0003 Epoch: 21 Global Step: 448190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:08,422-Speed 6295.78 samples/sec Loss 4.6193 LearningRate 0.0003 Epoch: 21 Global Step: 448200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:11,671-Speed 6304.74 samples/sec Loss 4.6114 LearningRate 0.0003 Epoch: 21 Global Step: 448210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:14,917-Speed 6310.53 samples/sec Loss 4.6263 LearningRate 0.0003 Epoch: 21 Global Step: 448220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:18,162-Speed 6313.64 samples/sec Loss 4.6676 LearningRate 0.0003 Epoch: 21 Global Step: 448230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:21,394-Speed 6338.09 samples/sec Loss 4.6413 LearningRate 0.0003 Epoch: 21 Global Step: 448240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:24,642-Speed 6305.04 samples/sec Loss 4.6834 LearningRate 0.0003 Epoch: 21 Global Step: 448250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:27,886-Speed 6315.08 samples/sec Loss 4.6510 LearningRate 0.0003 Epoch: 21 Global Step: 448260 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:31,132-Speed 6310.42 samples/sec Loss 4.6450 LearningRate 0.0003 Epoch: 21 Global Step: 448270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:34,389-Speed 6290.73 samples/sec Loss 4.7097 LearningRate 0.0003 Epoch: 21 Global Step: 448280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:37,638-Speed 6304.61 samples/sec Loss 4.6549 
LearningRate 0.0003 Epoch: 21 Global Step: 448290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:40,882-Speed 6315.22 samples/sec Loss 4.7242 LearningRate 0.0003 Epoch: 21 Global Step: 448300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:44,128-Speed 6308.69 samples/sec Loss 4.6660 LearningRate 0.0003 Epoch: 21 Global Step: 448310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:47,378-Speed 6304.69 samples/sec Loss 4.6436 LearningRate 0.0003 Epoch: 21 Global Step: 448320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:50,625-Speed 6307.82 samples/sec Loss 4.6299 LearningRate 0.0003 Epoch: 21 Global Step: 448330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:30:53,874-Speed 6305.09 samples/sec Loss 4.6371 LearningRate 0.0003 Epoch: 21 Global Step: 448340 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:30:57,106-Speed 6338.18 samples/sec Loss 4.6097 LearningRate 0.0003 Epoch: 21 Global Step: 448350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:00,352-Speed 6311.38 samples/sec Loss 4.6697 LearningRate 0.0003 Epoch: 21 Global Step: 448360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:03,599-Speed 6307.93 samples/sec Loss 4.6930 LearningRate 0.0003 Epoch: 21 Global Step: 448370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:06,846-Speed 6311.77 samples/sec Loss 4.6699 LearningRate 0.0003 Epoch: 21 Global Step: 448380 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:10,094-Speed 6305.58 samples/sec Loss 4.6419 LearningRate 0.0003 Epoch: 21 Global Step: 448390 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:13,338-Speed 6314.98 samples/sec Loss 4.6091 LearningRate 0.0003 Epoch: 21 Global Step: 448400 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:16,584-Speed 6311.59 samples/sec Loss 4.7104 LearningRate 0.0003 Epoch: 21 
Global Step: 448410 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:19,828-Speed 6313.65 samples/sec Loss 4.6424 LearningRate 0.0003 Epoch: 21 Global Step: 448420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:23,076-Speed 6308.05 samples/sec Loss 4.6712 LearningRate 0.0003 Epoch: 21 Global Step: 448430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:26,323-Speed 6307.84 samples/sec Loss 4.6243 LearningRate 0.0003 Epoch: 21 Global Step: 448440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:29,554-Speed 6341.71 samples/sec Loss 4.5971 LearningRate 0.0003 Epoch: 21 Global Step: 448450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:32,801-Speed 6306.88 samples/sec Loss 4.5933 LearningRate 0.0003 Epoch: 21 Global Step: 448460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:36,046-Speed 6313.67 samples/sec Loss 4.6185 LearningRate 0.0003 Epoch: 21 Global Step: 448470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:39,297-Speed 6301.70 samples/sec Loss 4.6884 LearningRate 0.0003 Epoch: 21 Global Step: 448480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:42,544-Speed 6307.78 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 448490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:45,786-Speed 6318.40 samples/sec Loss 4.5805 LearningRate 0.0003 Epoch: 21 Global Step: 448500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:49,037-Speed 6301.62 samples/sec Loss 4.6565 LearningRate 0.0003 Epoch: 21 Global Step: 448510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:52,282-Speed 6311.51 samples/sec Loss 4.6544 LearningRate 0.0003 Epoch: 21 Global Step: 448520 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:55,533-Speed 6301.62 samples/sec Loss 4.6546 LearningRate 0.0003 Epoch: 21 Global Step: 448530 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:31:58,778-Speed 6312.26 samples/sec Loss 4.6917 LearningRate 0.0003 Epoch: 21 Global Step: 448540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:02,015-Speed 6328.75 samples/sec Loss 4.7070 LearningRate 0.0003 Epoch: 21 Global Step: 448550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:05,261-Speed 6310.40 samples/sec Loss 4.6780 LearningRate 0.0003 Epoch: 21 Global Step: 448560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:08,508-Speed 6308.66 samples/sec Loss 4.6255 LearningRate 0.0003 Epoch: 21 Global Step: 448570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:11,753-Speed 6312.27 samples/sec Loss 4.6057 LearningRate 0.0003 Epoch: 21 Global Step: 448580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:14,996-Speed 6315.94 samples/sec Loss 4.5579 LearningRate 0.0003 Epoch: 21 Global Step: 448590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:18,250-Speed 6299.20 samples/sec Loss 4.6650 LearningRate 0.0003 Epoch: 21 Global Step: 448600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:21,497-Speed 6308.41 samples/sec Loss 4.6200 LearningRate 0.0003 Epoch: 21 Global Step: 448610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:24,742-Speed 6313.59 samples/sec Loss 4.7205 LearningRate 0.0003 Epoch: 21 Global Step: 448620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:27,987-Speed 6312.73 samples/sec Loss 4.6721 LearningRate 0.0003 Epoch: 21 Global Step: 448630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:31,231-Speed 6314.97 samples/sec Loss 4.6629 LearningRate 0.0003 Epoch: 21 Global Step: 448640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:34,481-Speed 6301.72 samples/sec Loss 4.6002 LearningRate 0.0003 Epoch: 21 Global Step: 448650 Fp16 Grad Scale: 32768 Required: 35 hours 
Training: 2022-04-02 08:32:37,715-Speed 6333.98 samples/sec Loss 4.6542 LearningRate 0.0003 Epoch: 21 Global Step: 448660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:40,961-Speed 6310.61 samples/sec Loss 4.6451 LearningRate 0.0003 Epoch: 21 Global Step: 448670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:44,209-Speed 6308.21 samples/sec Loss 4.6718 LearningRate 0.0003 Epoch: 21 Global Step: 448680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:47,454-Speed 6313.07 samples/sec Loss 4.6505 LearningRate 0.0003 Epoch: 21 Global Step: 448690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:50,702-Speed 6305.81 samples/sec Loss 4.6978 LearningRate 0.0003 Epoch: 21 Global Step: 448700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:53,944-Speed 6318.29 samples/sec Loss 4.6827 LearningRate 0.0003 Epoch: 21 Global Step: 448710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:32:57,188-Speed 6313.52 samples/sec Loss 4.6355 LearningRate 0.0003 Epoch: 21 Global Step: 448720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:00,432-Speed 6315.93 samples/sec Loss 4.6154 LearningRate 0.0003 Epoch: 21 Global Step: 448730 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:03,678-Speed 6309.54 samples/sec Loss 4.6323 LearningRate 0.0003 Epoch: 21 Global Step: 448740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:06,925-Speed 6312.03 samples/sec Loss 4.6791 LearningRate 0.0003 Epoch: 21 Global Step: 448750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:10,157-Speed 6337.73 samples/sec Loss 4.6585 LearningRate 0.0003 Epoch: 21 Global Step: 448760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:13,405-Speed 6306.22 samples/sec Loss 4.6479 LearningRate 0.0003 Epoch: 21 Global Step: 448770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:33:16,650-Speed 6312.91 samples/sec Loss 4.6670 LearningRate 0.0003 Epoch: 21 Global Step: 448780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:19,902-Speed 6298.72 samples/sec Loss 4.6376 LearningRate 0.0003 Epoch: 21 Global Step: 448790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:23,147-Speed 6312.18 samples/sec Loss 4.6682 LearningRate 0.0003 Epoch: 21 Global Step: 448800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:26,394-Speed 6309.64 samples/sec Loss 4.7167 LearningRate 0.0003 Epoch: 21 Global Step: 448810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:29,645-Speed 6300.06 samples/sec Loss 4.6524 LearningRate 0.0003 Epoch: 21 Global Step: 448820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:32,888-Speed 6317.81 samples/sec Loss 4.6035 LearningRate 0.0003 Epoch: 21 Global Step: 448830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:36,136-Speed 6307.70 samples/sec Loss 4.6446 LearningRate 0.0003 Epoch: 21 Global Step: 448840 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:39,394-Speed 6286.62 samples/sec Loss 4.6411 LearningRate 0.0003 Epoch: 21 Global Step: 448850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:42,624-Speed 6342.66 samples/sec Loss 4.6159 LearningRate 0.0003 Epoch: 21 Global Step: 448860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:45,869-Speed 6312.45 samples/sec Loss 4.6776 LearningRate 0.0003 Epoch: 21 Global Step: 448870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:49,113-Speed 6315.50 samples/sec Loss 4.7008 LearningRate 0.0003 Epoch: 21 Global Step: 448880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:52,358-Speed 6312.19 samples/sec Loss 4.5992 LearningRate 0.0003 Epoch: 21 Global Step: 448890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:55,610-Speed 6298.57 
samples/sec Loss 4.6488 LearningRate 0.0003 Epoch: 21 Global Step: 448900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:33:58,856-Speed 6309.95 samples/sec Loss 4.6170 LearningRate 0.0003 Epoch: 21 Global Step: 448910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:02,103-Speed 6308.61 samples/sec Loss 4.6682 LearningRate 0.0003 Epoch: 21 Global Step: 448920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:05,349-Speed 6312.36 samples/sec Loss 4.6358 LearningRate 0.0003 Epoch: 21 Global Step: 448930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:08,595-Speed 6310.57 samples/sec Loss 4.6366 LearningRate 0.0003 Epoch: 21 Global Step: 448940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:11,842-Speed 6310.94 samples/sec Loss 4.7340 LearningRate 0.0003 Epoch: 21 Global Step: 448950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:15,089-Speed 6308.86 samples/sec Loss 4.6092 LearningRate 0.0003 Epoch: 21 Global Step: 448960 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:34:18,334-Speed 6311.54 samples/sec Loss 4.6392 LearningRate 0.0003 Epoch: 21 Global Step: 448970 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:34:21,566-Speed 6338.04 samples/sec Loss 4.6373 LearningRate 0.0003 Epoch: 21 Global Step: 448980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:24,815-Speed 6308.49 samples/sec Loss 4.6706 LearningRate 0.0003 Epoch: 21 Global Step: 448990 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:28,062-Speed 6306.79 samples/sec Loss 4.6533 LearningRate 0.0003 Epoch: 21 Global Step: 449000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:31,310-Speed 6308.53 samples/sec Loss 4.6265 LearningRate 0.0003 Epoch: 21 Global Step: 449010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:34,556-Speed 6310.32 samples/sec Loss 4.6473 
LearningRate 0.0003 Epoch: 21 Global Step: 449020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:37,806-Speed 6304.24 samples/sec Loss 4.6488 LearningRate 0.0003 Epoch: 21 Global Step: 449030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:41,054-Speed 6305.81 samples/sec Loss 4.6643 LearningRate 0.0003 Epoch: 21 Global Step: 449040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:44,297-Speed 6316.64 samples/sec Loss 4.6588 LearningRate 0.0003 Epoch: 21 Global Step: 449050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:47,543-Speed 6310.64 samples/sec Loss 4.5827 LearningRate 0.0003 Epoch: 21 Global Step: 449060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:50,789-Speed 6311.78 samples/sec Loss 4.6611 LearningRate 0.0003 Epoch: 21 Global Step: 449070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:34:54,037-Speed 6305.45 samples/sec Loss 4.6308 LearningRate 0.0003 Epoch: 21 Global Step: 449080 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:34:57,270-Speed 6337.20 samples/sec Loss 4.6319 LearningRate 0.0003 Epoch: 21 Global Step: 449090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:00,518-Speed 6307.27 samples/sec Loss 4.6740 LearningRate 0.0003 Epoch: 21 Global Step: 449100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:03,766-Speed 6306.47 samples/sec Loss 4.6031 LearningRate 0.0003 Epoch: 21 Global Step: 449110 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:07,008-Speed 6317.35 samples/sec Loss 4.5968 LearningRate 0.0003 Epoch: 21 Global Step: 449120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:10,252-Speed 6315.94 samples/sec Loss 4.7041 LearningRate 0.0003 Epoch: 21 Global Step: 449130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:13,498-Speed 6309.58 samples/sec Loss 4.6676 LearningRate 0.0003 Epoch: 21 
Global Step: 449140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:16,745-Speed 6309.17 samples/sec Loss 4.7329 LearningRate 0.0003 Epoch: 21 Global Step: 449150 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:19,990-Speed 6312.57 samples/sec Loss 4.6042 LearningRate 0.0003 Epoch: 21 Global Step: 449160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:23,240-Speed 6303.25 samples/sec Loss 4.6394 LearningRate 0.0003 Epoch: 21 Global Step: 449170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:26,486-Speed 6310.54 samples/sec Loss 4.6819 LearningRate 0.0003 Epoch: 21 Global Step: 449180 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:29,718-Speed 6337.81 samples/sec Loss 4.6437 LearningRate 0.0003 Epoch: 21 Global Step: 449190 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:32,968-Speed 6303.63 samples/sec Loss 4.6416 LearningRate 0.0003 Epoch: 21 Global Step: 449200 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:36,213-Speed 6311.06 samples/sec Loss 4.6771 LearningRate 0.0003 Epoch: 21 Global Step: 449210 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:39,457-Speed 6315.68 samples/sec Loss 4.6462 LearningRate 0.0003 Epoch: 21 Global Step: 449220 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:42,700-Speed 6316.37 samples/sec Loss 4.6129 LearningRate 0.0003 Epoch: 21 Global Step: 449230 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:45,950-Speed 6303.23 samples/sec Loss 4.6146 LearningRate 0.0003 Epoch: 21 Global Step: 449240 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:49,193-Speed 6316.86 samples/sec Loss 4.6217 LearningRate 0.0003 Epoch: 21 Global Step: 449250 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:52,441-Speed 6306.90 samples/sec Loss 4.6413 LearningRate 0.0003 Epoch: 21 Global Step: 449260 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:55,690-Speed 6306.11 samples/sec Loss 4.6784 LearningRate 0.0003 Epoch: 21 Global Step: 449270 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:35:58,937-Speed 6307.68 samples/sec Loss 4.6199 LearningRate 0.0003 Epoch: 21 Global Step: 449280 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:02,203-Speed 6272.02 samples/sec Loss 4.5979 LearningRate 0.0003 Epoch: 21 Global Step: 449290 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:05,517-Speed 6181.91 samples/sec Loss 4.6587 LearningRate 0.0003 Epoch: 21 Global Step: 449300 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:08,764-Speed 6308.59 samples/sec Loss 4.6137 LearningRate 0.0003 Epoch: 21 Global Step: 449310 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:12,014-Speed 6302.91 samples/sec Loss 4.6113 LearningRate 0.0003 Epoch: 21 Global Step: 449320 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:15,259-Speed 6311.59 samples/sec Loss 4.6342 LearningRate 0.0003 Epoch: 21 Global Step: 449330 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:18,506-Speed 6309.26 samples/sec Loss 4.7083 LearningRate 0.0003 Epoch: 21 Global Step: 449340 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:21,756-Speed 6303.23 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 449350 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:25,008-Speed 6298.38 samples/sec Loss 4.6820 LearningRate 0.0003 Epoch: 21 Global Step: 449360 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:28,256-Speed 6307.00 samples/sec Loss 4.6495 LearningRate 0.0003 Epoch: 21 Global Step: 449370 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:31,505-Speed 6304.90 samples/sec Loss 4.6119 LearningRate 0.0003 Epoch: 21 Global Step: 449380 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:36:34,761-Speed 6290.87 samples/sec Loss 4.6209 LearningRate 0.0003 Epoch: 21 Global Step: 449390 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:36:38,116-Speed 6105.51 samples/sec Loss 4.6012 LearningRate 0.0003 Epoch: 21 Global Step: 449400 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:36:41,359-Speed 6317.31 samples/sec Loss 4.6321 LearningRate 0.0003 Epoch: 21 Global Step: 449410 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:36:44,591-Speed 6338.55 samples/sec Loss 4.6053 LearningRate 0.0003 Epoch: 21 Global Step: 449420 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:47,834-Speed 6316.36 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 449430 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:51,089-Speed 6293.50 samples/sec Loss 4.6802 LearningRate 0.0003 Epoch: 21 Global Step: 449440 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:54,337-Speed 6306.15 samples/sec Loss 4.6448 LearningRate 0.0003 Epoch: 21 Global Step: 449450 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:36:57,592-Speed 6294.42 samples/sec Loss 4.6878 LearningRate 0.0003 Epoch: 21 Global Step: 449460 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:00,839-Speed 6309.23 samples/sec Loss 4.6063 LearningRate 0.0003 Epoch: 21 Global Step: 449470 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:04,087-Speed 6307.66 samples/sec Loss 4.6878 LearningRate 0.0003 Epoch: 21 Global Step: 449480 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:07,332-Speed 6311.46 samples/sec Loss 4.6795 LearningRate 0.0003 Epoch: 21 Global Step: 449490 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:10,593-Speed 6283.29 samples/sec Loss 4.6267 LearningRate 0.0003 Epoch: 21 Global Step: 449500 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 
08:37:13,837-Speed 6313.04 samples/sec Loss 4.6906 LearningRate 0.0003 Epoch: 21 Global Step: 449510 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:17,083-Speed 6311.92 samples/sec Loss 4.5936 LearningRate 0.0003 Epoch: 21 Global Step: 449520 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:37:20,320-Speed 6326.65 samples/sec Loss 4.6403 LearningRate 0.0003 Epoch: 21 Global Step: 449530 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:23,565-Speed 6313.52 samples/sec Loss 4.6030 LearningRate 0.0003 Epoch: 21 Global Step: 449540 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:26,812-Speed 6307.94 samples/sec Loss 4.6337 LearningRate 0.0003 Epoch: 21 Global Step: 449550 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:30,058-Speed 6311.25 samples/sec Loss 4.6512 LearningRate 0.0003 Epoch: 21 Global Step: 449560 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:33,305-Speed 6308.45 samples/sec Loss 4.6473 LearningRate 0.0003 Epoch: 21 Global Step: 449570 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:36,548-Speed 6316.71 samples/sec Loss 4.6616 LearningRate 0.0003 Epoch: 21 Global Step: 449580 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:39,794-Speed 6310.33 samples/sec Loss 4.6270 LearningRate 0.0003 Epoch: 21 Global Step: 449590 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:43,036-Speed 6318.36 samples/sec Loss 4.6892 LearningRate 0.0003 Epoch: 21 Global Step: 449600 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:46,284-Speed 6307.01 samples/sec Loss 4.6545 LearningRate 0.0003 Epoch: 21 Global Step: 449610 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:49,528-Speed 6314.67 samples/sec Loss 4.6502 LearningRate 0.0003 Epoch: 21 Global Step: 449620 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:52,757-Speed 6344.49 
samples/sec Loss 4.6071 LearningRate 0.0003 Epoch: 21 Global Step: 449630 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:56,002-Speed 6312.48 samples/sec Loss 4.6097 LearningRate 0.0003 Epoch: 21 Global Step: 449640 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:37:59,245-Speed 6316.67 samples/sec Loss 4.5785 LearningRate 0.0003 Epoch: 21 Global Step: 449650 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:02,494-Speed 6304.61 samples/sec Loss 4.6836 LearningRate 0.0003 Epoch: 21 Global Step: 449660 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:05,751-Speed 6290.81 samples/sec Loss 4.5883 LearningRate 0.0003 Epoch: 21 Global Step: 449670 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:08,996-Speed 6311.86 samples/sec Loss 4.6195 LearningRate 0.0003 Epoch: 21 Global Step: 449680 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:12,241-Speed 6314.10 samples/sec Loss 4.6798 LearningRate 0.0003 Epoch: 21 Global Step: 449690 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:15,482-Speed 6320.42 samples/sec Loss 4.6533 LearningRate 0.0003 Epoch: 21 Global Step: 449700 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:18,725-Speed 6314.81 samples/sec Loss 4.6860 LearningRate 0.0003 Epoch: 21 Global Step: 449710 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:21,972-Speed 6309.96 samples/sec Loss 4.7002 LearningRate 0.0003 Epoch: 21 Global Step: 449720 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:25,218-Speed 6310.35 samples/sec Loss 4.6568 LearningRate 0.0003 Epoch: 21 Global Step: 449730 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:38:28,449-Speed 6339.63 samples/sec Loss 4.6217 LearningRate 0.0003 Epoch: 21 Global Step: 449740 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:31,694-Speed 6313.02 samples/sec Loss 4.5882 
LearningRate 0.0003 Epoch: 21 Global Step: 449750 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:34,938-Speed 6314.65 samples/sec Loss 4.6248 LearningRate 0.0003 Epoch: 21 Global Step: 449760 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:38,183-Speed 6313.35 samples/sec Loss 4.6346 LearningRate 0.0003 Epoch: 21 Global Step: 449770 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:41,510-Speed 6156.30 samples/sec Loss 4.6680 LearningRate 0.0003 Epoch: 21 Global Step: 449780 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:44,764-Speed 6295.23 samples/sec Loss 4.6156 LearningRate 0.0003 Epoch: 21 Global Step: 449790 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:48,007-Speed 6316.45 samples/sec Loss 4.5727 LearningRate 0.0003 Epoch: 21 Global Step: 449800 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:51,267-Speed 6283.28 samples/sec Loss 4.5931 LearningRate 0.0003 Epoch: 21 Global Step: 449810 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:54,511-Speed 6315.43 samples/sec Loss 4.6093 LearningRate 0.0003 Epoch: 21 Global Step: 449820 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:38:57,759-Speed 6306.52 samples/sec Loss 4.6338 LearningRate 0.0003 Epoch: 21 Global Step: 449830 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:01,002-Speed 6317.41 samples/sec Loss 4.5499 LearningRate 0.0003 Epoch: 21 Global Step: 449840 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:39:04,235-Speed 6336.26 samples/sec Loss 4.6869 LearningRate 0.0003 Epoch: 21 Global Step: 449850 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:07,481-Speed 6310.49 samples/sec Loss 4.6463 LearningRate 0.0003 Epoch: 21 Global Step: 449860 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:10,726-Speed 6312.79 samples/sec Loss 4.6723 LearningRate 0.0003 Epoch: 21 
Global Step: 449870 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:13,969-Speed 6316.56 samples/sec Loss 4.6181 LearningRate 0.0003 Epoch: 21 Global Step: 449880 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:17,216-Speed 6308.76 samples/sec Loss 4.6155 LearningRate 0.0003 Epoch: 21 Global Step: 449890 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:20,466-Speed 6302.56 samples/sec Loss 4.6715 LearningRate 0.0003 Epoch: 21 Global Step: 449900 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:23,717-Speed 6301.16 samples/sec Loss 4.5742 LearningRate 0.0003 Epoch: 21 Global Step: 449910 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:26,961-Speed 6314.55 samples/sec Loss 4.5888 LearningRate 0.0003 Epoch: 21 Global Step: 449920 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:30,209-Speed 6307.61 samples/sec Loss 4.5949 LearningRate 0.0003 Epoch: 21 Global Step: 449930 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:33,452-Speed 6316.24 samples/sec Loss 4.5923 LearningRate 0.0003 Epoch: 21 Global Step: 449940 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:36,687-Speed 6333.02 samples/sec Loss 4.7176 LearningRate 0.0003 Epoch: 21 Global Step: 449950 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:39,933-Speed 6310.77 samples/sec Loss 4.6130 LearningRate 0.0003 Epoch: 21 Global Step: 449960 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:43,181-Speed 6306.32 samples/sec Loss 4.6544 LearningRate 0.0003 Epoch: 21 Global Step: 449970 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:46,424-Speed 6315.69 samples/sec Loss 4.6409 LearningRate 0.0003 Epoch: 21 Global Step: 449980 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:49,670-Speed 6316.08 samples/sec Loss 4.6314 LearningRate 0.0003 Epoch: 21 Global Step: 449990 Fp16 Grad 
Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:52,915-Speed 6313.49 samples/sec Loss 4.6113 LearningRate 0.0003 Epoch: 21 Global Step: 450000 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:56,161-Speed 6309.30 samples/sec Loss 4.6513 LearningRate 0.0003 Epoch: 21 Global Step: 450010 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:39:59,410-Speed 6306.38 samples/sec Loss 4.6694 LearningRate 0.0003 Epoch: 21 Global Step: 450020 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:02,667-Speed 6288.52 samples/sec Loss 4.5472 LearningRate 0.0003 Epoch: 21 Global Step: 450030 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:05,914-Speed 6308.83 samples/sec Loss 4.6476 LearningRate 0.0003 Epoch: 21 Global Step: 450040 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:09,147-Speed 6335.89 samples/sec Loss 4.6304 LearningRate 0.0003 Epoch: 21 Global Step: 450050 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:12,397-Speed 6306.31 samples/sec Loss 4.6501 LearningRate 0.0003 Epoch: 21 Global Step: 450060 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:15,642-Speed 6312.44 samples/sec Loss 4.6091 LearningRate 0.0003 Epoch: 21 Global Step: 450070 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:18,894-Speed 6298.30 samples/sec Loss 4.5305 LearningRate 0.0003 Epoch: 21 Global Step: 450080 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:22,139-Speed 6314.25 samples/sec Loss 4.6219 LearningRate 0.0003 Epoch: 21 Global Step: 450090 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:25,389-Speed 6305.11 samples/sec Loss 4.6294 LearningRate 0.0003 Epoch: 21 Global Step: 450100 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:28,635-Speed 6309.20 samples/sec Loss 4.5993 LearningRate 0.0003 Epoch: 21 Global Step: 450110 Fp16 Grad Scale: 16384 Required: 35 hours 
Training: 2022-04-02 08:40:31,881-Speed 6311.88 samples/sec Loss 4.7000 LearningRate 0.0003 Epoch: 21 Global Step: 450120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:35,132-Speed 6299.73 samples/sec Loss 4.5483 LearningRate 0.0003 Epoch: 21 Global Step: 450130 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:38,375-Speed 6316.51 samples/sec Loss 4.6398 LearningRate 0.0003 Epoch: 21 Global Step: 450140 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:41,625-Speed 6304.82 samples/sec Loss 4.6522 LearningRate 0.0003 Epoch: 21 Global Step: 450150 Fp16 Grad Scale: 32768 Required: 35 hours Training: 2022-04-02 08:40:44,858-Speed 6340.33 samples/sec Loss 4.6388 LearningRate 0.0003 Epoch: 21 Global Step: 450160 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:48,107-Speed 6303.59 samples/sec Loss 4.6639 LearningRate 0.0003 Epoch: 21 Global Step: 450170 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-02 08:40:51,357-Speed 6302.99 samples/sec Loss 4.6698 LearningRate 0.0003 Epoch: 21 Global Step: 450180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:40:54,600-Speed 6317.83 samples/sec Loss 4.6778 LearningRate 0.0003 Epoch: 21 Global Step: 450190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:40:57,842-Speed 6316.92 samples/sec Loss 4.6651 LearningRate 0.0003 Epoch: 21 Global Step: 450200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:01,088-Speed 6310.87 samples/sec Loss 4.6293 LearningRate 0.0003 Epoch: 21 Global Step: 450210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:04,336-Speed 6307.44 samples/sec Loss 4.6147 LearningRate 0.0003 Epoch: 21 Global Step: 450220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:07,588-Speed 6298.30 samples/sec Loss 4.6214 LearningRate 0.0003 Epoch: 21 Global Step: 450230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
08:41:10,838-Speed 6304.85 samples/sec Loss 4.6002 LearningRate 0.0003 Epoch: 21 Global Step: 450240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:14,085-Speed 6307.26 samples/sec Loss 4.6595 LearningRate 0.0003 Epoch: 21 Global Step: 450250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:17,331-Speed 6311.83 samples/sec Loss 4.5935 LearningRate 0.0003 Epoch: 21 Global Step: 450260 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:41:20,560-Speed 6342.47 samples/sec Loss 4.6823 LearningRate 0.0003 Epoch: 21 Global Step: 450270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:23,806-Speed 6312.00 samples/sec Loss 4.6001 LearningRate 0.0003 Epoch: 21 Global Step: 450280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:27,057-Speed 6301.00 samples/sec Loss 4.6082 LearningRate 0.0003 Epoch: 21 Global Step: 450290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:30,304-Speed 6308.28 samples/sec Loss 4.7086 LearningRate 0.0003 Epoch: 21 Global Step: 450300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:33,551-Speed 6309.05 samples/sec Loss 4.6056 LearningRate 0.0003 Epoch: 21 Global Step: 450310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:36,798-Speed 6309.76 samples/sec Loss 4.5955 LearningRate 0.0003 Epoch: 21 Global Step: 450320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:40,044-Speed 6309.88 samples/sec Loss 4.5981 LearningRate 0.0003 Epoch: 21 Global Step: 450330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:43,298-Speed 6294.51 samples/sec Loss 4.6820 LearningRate 0.0003 Epoch: 21 Global Step: 450340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:46,546-Speed 6307.76 samples/sec Loss 4.6435 LearningRate 0.0003 Epoch: 21 Global Step: 450350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:49,793-Speed 6309.02 
samples/sec Loss 4.6409 LearningRate 0.0003 Epoch: 21 Global Step: 450360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:53,036-Speed 6315.63 samples/sec Loss 4.6533 LearningRate 0.0003 Epoch: 21 Global Step: 450370 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:41:56,269-Speed 6337.55 samples/sec Loss 4.5740 LearningRate 0.0003 Epoch: 21 Global Step: 450380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:41:59,516-Speed 6307.59 samples/sec Loss 4.6358 LearningRate 0.0003 Epoch: 21 Global Step: 450390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:02,763-Speed 6309.02 samples/sec Loss 4.6564 LearningRate 0.0003 Epoch: 21 Global Step: 450400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:06,010-Speed 6308.03 samples/sec Loss 4.6518 LearningRate 0.0003 Epoch: 21 Global Step: 450410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:09,259-Speed 6305.98 samples/sec Loss 4.6402 LearningRate 0.0003 Epoch: 21 Global Step: 450420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:12,508-Speed 6304.19 samples/sec Loss 4.6460 LearningRate 0.0003 Epoch: 21 Global Step: 450430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:15,757-Speed 6305.54 samples/sec Loss 4.6057 LearningRate 0.0003 Epoch: 21 Global Step: 450440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:19,005-Speed 6307.14 samples/sec Loss 4.7129 LearningRate 0.0003 Epoch: 21 Global Step: 450450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:22,252-Speed 6307.40 samples/sec Loss 4.5972 LearningRate 0.0003 Epoch: 21 Global Step: 450460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:25,501-Speed 6305.34 samples/sec Loss 4.6358 LearningRate 0.0003 Epoch: 21 Global Step: 450470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:28,731-Speed 6342.07 samples/sec Loss 4.5680 
LearningRate 0.0003 Epoch: 21 Global Step: 450480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:31,974-Speed 6316.51 samples/sec Loss 4.6552 LearningRate 0.0003 Epoch: 21 Global Step: 450490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:35,223-Speed 6306.67 samples/sec Loss 4.6001 LearningRate 0.0003 Epoch: 21 Global Step: 450500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:38,473-Speed 6302.46 samples/sec Loss 4.6366 LearningRate 0.0003 Epoch: 21 Global Step: 450510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:41,720-Speed 6308.52 samples/sec Loss 4.6667 LearningRate 0.0003 Epoch: 21 Global Step: 450520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:44,965-Speed 6313.74 samples/sec Loss 4.6567 LearningRate 0.0003 Epoch: 21 Global Step: 450530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:48,219-Speed 6295.99 samples/sec Loss 4.6280 LearningRate 0.0003 Epoch: 21 Global Step: 450540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:51,468-Speed 6304.17 samples/sec Loss 4.5741 LearningRate 0.0003 Epoch: 21 Global Step: 450550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:54,728-Speed 6282.54 samples/sec Loss 4.6745 LearningRate 0.0003 Epoch: 21 Global Step: 450560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:42:57,976-Speed 6307.83 samples/sec Loss 4.6380 LearningRate 0.0003 Epoch: 21 Global Step: 450570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:01,225-Speed 6305.94 samples/sec Loss 4.6589 LearningRate 0.0003 Epoch: 21 Global Step: 450580 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:43:04,476-Speed 6300.62 samples/sec Loss 4.6041 LearningRate 0.0003 Epoch: 21 Global Step: 450590 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:43:07,722-Speed 6310.62 samples/sec Loss 4.5920 LearningRate 0.0003 Epoch: 21 
Global Step: 450600 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:43:10,952-Speed 6340.62 samples/sec Loss 4.6618 LearningRate 0.0003 Epoch: 21 Global Step: 450610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:14,201-Speed 6306.25 samples/sec Loss 4.6128 LearningRate 0.0003 Epoch: 21 Global Step: 450620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:17,455-Speed 6299.01 samples/sec Loss 4.6678 LearningRate 0.0003 Epoch: 21 Global Step: 450630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:20,707-Speed 6298.91 samples/sec Loss 4.6246 LearningRate 0.0003 Epoch: 21 Global Step: 450640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:23,954-Speed 6307.90 samples/sec Loss 4.5867 LearningRate 0.0003 Epoch: 21 Global Step: 450650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:27,199-Speed 6312.17 samples/sec Loss 4.6641 LearningRate 0.0003 Epoch: 21 Global Step: 450660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:30,445-Speed 6310.65 samples/sec Loss 4.5676 LearningRate 0.0003 Epoch: 21 Global Step: 450670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:33,691-Speed 6310.33 samples/sec Loss 4.6452 LearningRate 0.0003 Epoch: 21 Global Step: 450680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:36,935-Speed 6314.88 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 450690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:40,180-Speed 6313.00 samples/sec Loss 4.5607 LearningRate 0.0003 Epoch: 21 Global Step: 450700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:43,411-Speed 6340.10 samples/sec Loss 4.6560 LearningRate 0.0003 Epoch: 21 Global Step: 450710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:46,658-Speed 6309.35 samples/sec Loss 4.5947 LearningRate 0.0003 Epoch: 21 Global Step: 450720 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:49,969-Speed 6187.52 samples/sec Loss 4.6008 LearningRate 0.0003 Epoch: 21 Global Step: 450730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:53,326-Speed 6102.33 samples/sec Loss 4.6124 LearningRate 0.0003 Epoch: 21 Global Step: 450740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:56,573-Speed 6307.38 samples/sec Loss 4.5985 LearningRate 0.0003 Epoch: 21 Global Step: 450750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:43:59,815-Speed 6319.17 samples/sec Loss 4.6610 LearningRate 0.0003 Epoch: 21 Global Step: 450760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:03,060-Speed 6312.64 samples/sec Loss 4.6855 LearningRate 0.0003 Epoch: 21 Global Step: 450770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:06,303-Speed 6317.16 samples/sec Loss 4.6891 LearningRate 0.0003 Epoch: 21 Global Step: 450780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:09,552-Speed 6304.30 samples/sec Loss 4.6041 LearningRate 0.0003 Epoch: 21 Global Step: 450790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:12,796-Speed 6315.15 samples/sec Loss 4.6338 LearningRate 0.0003 Epoch: 21 Global Step: 450800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:16,027-Speed 6339.23 samples/sec Loss 4.6681 LearningRate 0.0003 Epoch: 21 Global Step: 450810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:19,275-Speed 6307.69 samples/sec Loss 4.5628 LearningRate 0.0003 Epoch: 21 Global Step: 450820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:22,521-Speed 6310.09 samples/sec Loss 4.6535 LearningRate 0.0003 Epoch: 21 Global Step: 450830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:25,767-Speed 6310.63 samples/sec Loss 4.6490 LearningRate 0.0003 Epoch: 21 Global Step: 450840 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 08:44:29,011-Speed 6315.49 samples/sec Loss 4.6392 LearningRate 0.0003 Epoch: 21 Global Step: 450850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:32,263-Speed 6298.95 samples/sec Loss 4.7202 LearningRate 0.0003 Epoch: 21 Global Step: 450860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:35,511-Speed 6305.36 samples/sec Loss 4.6377 LearningRate 0.0003 Epoch: 21 Global Step: 450870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:38,757-Speed 6310.81 samples/sec Loss 4.7004 LearningRate 0.0003 Epoch: 21 Global Step: 450880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:42,002-Speed 6313.78 samples/sec Loss 4.5809 LearningRate 0.0003 Epoch: 21 Global Step: 450890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:45,245-Speed 6315.50 samples/sec Loss 4.6364 LearningRate 0.0003 Epoch: 21 Global Step: 450900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:48,477-Speed 6338.42 samples/sec Loss 4.5911 LearningRate 0.0003 Epoch: 21 Global Step: 450910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:51,720-Speed 6316.16 samples/sec Loss 4.5736 LearningRate 0.0003 Epoch: 21 Global Step: 450920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:54,967-Speed 6310.45 samples/sec Loss 4.6288 LearningRate 0.0003 Epoch: 21 Global Step: 450930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:44:58,216-Speed 6304.97 samples/sec Loss 4.6346 LearningRate 0.0003 Epoch: 21 Global Step: 450940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:01,458-Speed 6317.49 samples/sec Loss 4.6264 LearningRate 0.0003 Epoch: 21 Global Step: 450950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:04,719-Speed 6282.41 samples/sec Loss 4.6980 LearningRate 0.0003 Epoch: 21 Global Step: 450960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
08:45:07,962-Speed 6316.91 samples/sec Loss 4.6848 LearningRate 0.0003 Epoch: 21 Global Step: 450970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:11,208-Speed 6310.70 samples/sec Loss 4.6807 LearningRate 0.0003 Epoch: 21 Global Step: 450980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:14,453-Speed 6312.59 samples/sec Loss 4.6450 LearningRate 0.0003 Epoch: 21 Global Step: 450990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:17,742-Speed 6228.40 samples/sec Loss 4.6337 LearningRate 0.0003 Epoch: 21 Global Step: 451000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:20,984-Speed 6317.77 samples/sec Loss 4.7098 LearningRate 0.0003 Epoch: 21 Global Step: 451010 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:45:24,228-Speed 6315.02 samples/sec Loss 4.6868 LearningRate 0.0003 Epoch: 21 Global Step: 451020 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:45:27,474-Speed 6309.86 samples/sec Loss 4.6912 LearningRate 0.0003 Epoch: 21 Global Step: 451030 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:45:30,722-Speed 6307.42 samples/sec Loss 4.6083 LearningRate 0.0003 Epoch: 21 Global Step: 451040 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:45:33,953-Speed 6338.92 samples/sec Loss 4.5809 LearningRate 0.0003 Epoch: 21 Global Step: 451050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:37,196-Speed 6316.62 samples/sec Loss 4.5854 LearningRate 0.0003 Epoch: 21 Global Step: 451060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:40,440-Speed 6314.63 samples/sec Loss 4.6266 LearningRate 0.0003 Epoch: 21 Global Step: 451070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:43,687-Speed 6309.32 samples/sec Loss 4.6303 LearningRate 0.0003 Epoch: 21 Global Step: 451080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:46,935-Speed 6306.88 
samples/sec Loss 4.6271 LearningRate 0.0003 Epoch: 21 Global Step: 451090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:50,180-Speed 6313.88 samples/sec Loss 4.6359 LearningRate 0.0003 Epoch: 21 Global Step: 451100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:53,425-Speed 6311.01 samples/sec Loss 4.6849 LearningRate 0.0003 Epoch: 21 Global Step: 451110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:56,670-Speed 6312.11 samples/sec Loss 4.5609 LearningRate 0.0003 Epoch: 21 Global Step: 451120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:45:59,913-Speed 6316.84 samples/sec Loss 4.6252 LearningRate 0.0003 Epoch: 21 Global Step: 451130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:03,157-Speed 6314.84 samples/sec Loss 4.6364 LearningRate 0.0003 Epoch: 21 Global Step: 451140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:06,409-Speed 6300.52 samples/sec Loss 4.6310 LearningRate 0.0003 Epoch: 21 Global Step: 451150 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:46:09,645-Speed 6330.99 samples/sec Loss 4.6448 LearningRate 0.0003 Epoch: 21 Global Step: 451160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:12,893-Speed 6305.80 samples/sec Loss 4.6080 LearningRate 0.0003 Epoch: 21 Global Step: 451170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:16,139-Speed 6312.25 samples/sec Loss 4.6113 LearningRate 0.0003 Epoch: 21 Global Step: 451180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:19,383-Speed 6313.54 samples/sec Loss 4.5742 LearningRate 0.0003 Epoch: 21 Global Step: 451190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:22,629-Speed 6311.50 samples/sec Loss 4.5914 LearningRate 0.0003 Epoch: 21 Global Step: 451200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:25,874-Speed 6312.38 samples/sec Loss 4.5947 
LearningRate 0.0003 Epoch: 21 Global Step: 451210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:29,157-Speed 6242.65 samples/sec Loss 4.6386 LearningRate 0.0003 Epoch: 21 Global Step: 451220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:32,422-Speed 6273.72 samples/sec Loss 4.6135 LearningRate 0.0003 Epoch: 21 Global Step: 451230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:35,693-Speed 6262.09 samples/sec Loss 4.5747 LearningRate 0.0003 Epoch: 21 Global Step: 451240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:38,985-Speed 6223.44 samples/sec Loss 4.6264 LearningRate 0.0003 Epoch: 21 Global Step: 451250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:42,220-Speed 6333.52 samples/sec Loss 4.5896 LearningRate 0.0003 Epoch: 21 Global Step: 451260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:45,463-Speed 6315.06 samples/sec Loss 4.5721 LearningRate 0.0003 Epoch: 21 Global Step: 451270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:48,709-Speed 6310.82 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 451280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:51,955-Speed 6311.77 samples/sec Loss 4.6888 LearningRate 0.0003 Epoch: 21 Global Step: 451290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:55,200-Speed 6311.03 samples/sec Loss 4.5870 LearningRate 0.0003 Epoch: 21 Global Step: 451300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:46:58,437-Speed 6329.72 samples/sec Loss 4.5948 LearningRate 0.0003 Epoch: 21 Global Step: 451310 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:01,682-Speed 6312.53 samples/sec Loss 4.6546 LearningRate 0.0003 Epoch: 21 Global Step: 451320 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:04,931-Speed 6305.32 samples/sec Loss 4.6784 LearningRate 0.0003 Epoch: 21 
Global Step: 451330 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:08,186-Speed 6292.13 samples/sec Loss 4.6778 LearningRate 0.0003 Epoch: 21 Global Step: 451340 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:11,432-Speed 6312.72 samples/sec Loss 4.6462 LearningRate 0.0003 Epoch: 21 Global Step: 451350 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:14,748-Speed 6176.52 samples/sec Loss 4.6539 LearningRate 0.0003 Epoch: 21 Global Step: 451360 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:17,995-Speed 6309.38 samples/sec Loss 4.5725 LearningRate 0.0003 Epoch: 21 Global Step: 451370 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:21,237-Speed 6318.08 samples/sec Loss 4.6088 LearningRate 0.0003 Epoch: 21 Global Step: 451380 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:24,486-Speed 6306.31 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 451390 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:27,732-Speed 6309.22 samples/sec Loss 4.6034 LearningRate 0.0003 Epoch: 21 Global Step: 451400 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 08:47:30,978-Speed 6311.61 samples/sec Loss 4.6961 LearningRate 0.0003 Epoch: 21 Global Step: 451410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:34,225-Speed 6308.55 samples/sec Loss 4.6179 LearningRate 0.0003 Epoch: 21 Global Step: 451420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:37,471-Speed 6309.50 samples/sec Loss 4.5759 LearningRate 0.0003 Epoch: 21 Global Step: 451430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:40,719-Speed 6307.89 samples/sec Loss 4.5879 LearningRate 0.0003 Epoch: 21 Global Step: 451440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:43,965-Speed 6310.51 samples/sec Loss 4.6161 LearningRate 0.0003 Epoch: 21 Global Step: 451450 Fp16 Grad Scale: 
16384 Required: 34 hours Training: 2022-04-02 08:47:47,213-Speed 6306.32 samples/sec Loss 4.6100 LearningRate 0.0003 Epoch: 21 Global Step: 451460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:50,458-Speed 6312.34 samples/sec Loss 4.6289 LearningRate 0.0003 Epoch: 21 Global Step: 451470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:53,701-Speed 6317.16 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 451480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:47:56,945-Speed 6313.81 samples/sec Loss 4.6907 LearningRate 0.0003 Epoch: 21 Global Step: 451490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:00,192-Speed 6310.61 samples/sec Loss 4.5969 LearningRate 0.0003 Epoch: 21 Global Step: 451500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:03,442-Speed 6301.50 samples/sec Loss 4.6497 LearningRate 0.0003 Epoch: 21 Global Step: 451510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:48:06,686-Speed 6315.55 samples/sec Loss 4.5030 LearningRate 0.0003 Epoch: 21 Global Step: 451520 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:48:09,934-Speed 6307.63 samples/sec Loss 4.5691 LearningRate 0.0003 Epoch: 21 Global Step: 451530 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:48:13,171-Speed 6328.09 samples/sec Loss 4.6168 LearningRate 0.0003 Epoch: 21 Global Step: 451540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:16,419-Speed 6307.48 samples/sec Loss 4.6104 LearningRate 0.0003 Epoch: 21 Global Step: 451550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:19,665-Speed 6310.73 samples/sec Loss 4.5585 LearningRate 0.0003 Epoch: 21 Global Step: 451560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:22,915-Speed 6303.94 samples/sec Loss 4.6537 LearningRate 0.0003 Epoch: 21 Global Step: 451570 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 08:48:26,161-Speed 6310.41 samples/sec Loss 4.7047 LearningRate 0.0003 Epoch: 21 Global Step: 451580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:29,412-Speed 6302.90 samples/sec Loss 4.7015 LearningRate 0.0003 Epoch: 21 Global Step: 451590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:32,657-Speed 6311.38 samples/sec Loss 4.6530 LearningRate 0.0003 Epoch: 21 Global Step: 451600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:35,904-Speed 6309.32 samples/sec Loss 4.6471 LearningRate 0.0003 Epoch: 21 Global Step: 451610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:39,153-Speed 6304.90 samples/sec Loss 4.6393 LearningRate 0.0003 Epoch: 21 Global Step: 451620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:42,395-Speed 6318.35 samples/sec Loss 4.6503 LearningRate 0.0003 Epoch: 21 Global Step: 451630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:45,628-Speed 6335.07 samples/sec Loss 4.6533 LearningRate 0.0003 Epoch: 21 Global Step: 451640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:48,871-Speed 6317.03 samples/sec Loss 4.5684 LearningRate 0.0003 Epoch: 21 Global Step: 451650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:52,115-Speed 6314.77 samples/sec Loss 4.6636 LearningRate 0.0003 Epoch: 21 Global Step: 451660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:55,357-Speed 6319.28 samples/sec Loss 4.6448 LearningRate 0.0003 Epoch: 21 Global Step: 451670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:48:58,603-Speed 6310.68 samples/sec Loss 4.6578 LearningRate 0.0003 Epoch: 21 Global Step: 451680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:01,853-Speed 6301.84 samples/sec Loss 4.6314 LearningRate 0.0003 Epoch: 21 Global Step: 451690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
08:49:05,099-Speed 6310.85 samples/sec Loss 4.6337 LearningRate 0.0003 Epoch: 21 Global Step: 451700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:08,343-Speed 6315.45 samples/sec Loss 4.6341 LearningRate 0.0003 Epoch: 21 Global Step: 451710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:11,588-Speed 6311.37 samples/sec Loss 4.5871 LearningRate 0.0003 Epoch: 21 Global Step: 451720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:14,835-Speed 6309.56 samples/sec Loss 4.7029 LearningRate 0.0003 Epoch: 21 Global Step: 451730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:18,069-Speed 6332.94 samples/sec Loss 4.6584 LearningRate 0.0003 Epoch: 21 Global Step: 451740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:21,317-Speed 6308.04 samples/sec Loss 4.6926 LearningRate 0.0003 Epoch: 21 Global Step: 451750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:24,561-Speed 6315.83 samples/sec Loss 4.6341 LearningRate 0.0003 Epoch: 21 Global Step: 451760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:27,806-Speed 6312.03 samples/sec Loss 4.6114 LearningRate 0.0003 Epoch: 21 Global Step: 451770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:31,050-Speed 6314.86 samples/sec Loss 4.5929 LearningRate 0.0003 Epoch: 21 Global Step: 451780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:34,296-Speed 6311.24 samples/sec Loss 4.6101 LearningRate 0.0003 Epoch: 21 Global Step: 451790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:37,546-Speed 6303.29 samples/sec Loss 4.6294 LearningRate 0.0003 Epoch: 21 Global Step: 451800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:40,799-Speed 6297.12 samples/sec Loss 4.6515 LearningRate 0.0003 Epoch: 21 Global Step: 451810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:44,045-Speed 6309.10 
samples/sec Loss 4.6434 LearningRate 0.0003 Epoch: 21 Global Step: 451820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:47,299-Speed 6295.25 samples/sec Loss 4.6324 LearningRate 0.0003 Epoch: 21 Global Step: 451830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:49:50,543-Speed 6314.75 samples/sec Loss 4.5979 LearningRate 0.0003 Epoch: 21 Global Step: 451840 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:49:53,790-Speed 6309.51 samples/sec Loss 4.6311 LearningRate 0.0003 Epoch: 21 Global Step: 451850 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:49:57,022-Speed 6337.76 samples/sec Loss 4.6527 LearningRate 0.0003 Epoch: 21 Global Step: 451860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:00,269-Speed 6309.18 samples/sec Loss 4.6687 LearningRate 0.0003 Epoch: 21 Global Step: 451870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:03,515-Speed 6309.83 samples/sec Loss 4.6331 LearningRate 0.0003 Epoch: 21 Global Step: 451880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:06,762-Speed 6309.53 samples/sec Loss 4.6615 LearningRate 0.0003 Epoch: 21 Global Step: 451890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:10,008-Speed 6309.92 samples/sec Loss 4.5993 LearningRate 0.0003 Epoch: 21 Global Step: 451900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:13,252-Speed 6314.15 samples/sec Loss 4.6128 LearningRate 0.0003 Epoch: 21 Global Step: 451910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:16,497-Speed 6314.55 samples/sec Loss 4.6102 LearningRate 0.0003 Epoch: 21 Global Step: 451920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:19,741-Speed 6314.17 samples/sec Loss 4.6134 LearningRate 0.0003 Epoch: 21 Global Step: 451930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:22,991-Speed 6303.32 samples/sec Loss 4.5923 
LearningRate 0.0003 Epoch: 21 Global Step: 451940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:26,242-Speed 6300.84 samples/sec Loss 4.6971 LearningRate 0.0003 Epoch: 21 Global Step: 451950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:29,476-Speed 6333.49 samples/sec Loss 4.6857 LearningRate 0.0003 Epoch: 21 Global Step: 451960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:32,721-Speed 6313.79 samples/sec Loss 4.5914 LearningRate 0.0003 Epoch: 21 Global Step: 451970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:35,967-Speed 6309.96 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 451980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:39,215-Speed 6306.32 samples/sec Loss 4.7042 LearningRate 0.0003 Epoch: 21 Global Step: 451990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:42,465-Speed 6303.86 samples/sec Loss 4.5479 LearningRate 0.0003 Epoch: 21 Global Step: 452000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:45,713-Speed 6306.09 samples/sec Loss 4.6377 LearningRate 0.0003 Epoch: 21 Global Step: 452010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:48,964-Speed 6302.43 samples/sec Loss 4.6089 LearningRate 0.0003 Epoch: 21 Global Step: 452020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:52,211-Speed 6308.61 samples/sec Loss 4.6395 LearningRate 0.0003 Epoch: 21 Global Step: 452030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:55,456-Speed 6311.84 samples/sec Loss 4.5997 LearningRate 0.0003 Epoch: 21 Global Step: 452040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:50:58,707-Speed 6300.53 samples/sec Loss 4.5576 LearningRate 0.0003 Epoch: 21 Global Step: 452050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:01,939-Speed 6338.64 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 
Global Step: 452060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:05,204-Speed 6274.28 samples/sec Loss 4.5790 LearningRate 0.0003 Epoch: 21 Global Step: 452070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:08,455-Speed 6301.43 samples/sec Loss 4.6264 LearningRate 0.0003 Epoch: 21 Global Step: 452080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:11,703-Speed 6305.32 samples/sec Loss 4.6113 LearningRate 0.0003 Epoch: 21 Global Step: 452090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:14,949-Speed 6311.44 samples/sec Loss 4.5958 LearningRate 0.0003 Epoch: 21 Global Step: 452100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:18,198-Speed 6305.26 samples/sec Loss 4.6034 LearningRate 0.0003 Epoch: 21 Global Step: 452110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:21,443-Speed 6312.24 samples/sec Loss 4.5487 LearningRate 0.0003 Epoch: 21 Global Step: 452120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:24,691-Speed 6306.56 samples/sec Loss 4.6076 LearningRate 0.0003 Epoch: 21 Global Step: 452130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:27,937-Speed 6310.58 samples/sec Loss 4.6186 LearningRate 0.0003 Epoch: 21 Global Step: 452140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:31,186-Speed 6305.95 samples/sec Loss 4.6509 LearningRate 0.0003 Epoch: 21 Global Step: 452150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:34,414-Speed 6345.70 samples/sec Loss 4.6179 LearningRate 0.0003 Epoch: 21 Global Step: 452160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:37,660-Speed 6311.01 samples/sec Loss 4.6249 LearningRate 0.0003 Epoch: 21 Global Step: 452170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:40,906-Speed 6311.01 samples/sec Loss 4.6396 LearningRate 0.0003 Epoch: 21 Global Step: 452180 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:44,153-Speed 6308.59 samples/sec Loss 4.6215 LearningRate 0.0003 Epoch: 21 Global Step: 452190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:47,399-Speed 6310.81 samples/sec Loss 4.5791 LearningRate 0.0003 Epoch: 21 Global Step: 452200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:50,650-Speed 6301.60 samples/sec Loss 4.6079 LearningRate 0.0003 Epoch: 21 Global Step: 452210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:53,897-Speed 6308.13 samples/sec Loss 4.5801 LearningRate 0.0003 Epoch: 21 Global Step: 452220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:51:57,143-Speed 6311.31 samples/sec Loss 4.5875 LearningRate 0.0003 Epoch: 21 Global Step: 452230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:00,394-Speed 6299.61 samples/sec Loss 4.5903 LearningRate 0.0003 Epoch: 21 Global Step: 452240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:03,651-Speed 6290.56 samples/sec Loss 4.5947 LearningRate 0.0003 Epoch: 21 Global Step: 452250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:06,898-Speed 6307.76 samples/sec Loss 4.6155 LearningRate 0.0003 Epoch: 21 Global Step: 452260 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:52:10,136-Speed 6327.92 samples/sec Loss 4.6523 LearningRate 0.0003 Epoch: 21 Global Step: 452270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:13,384-Speed 6305.66 samples/sec Loss 4.6208 LearningRate 0.0003 Epoch: 21 Global Step: 452280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:16,632-Speed 6307.64 samples/sec Loss 4.6776 LearningRate 0.0003 Epoch: 21 Global Step: 452290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:19,881-Speed 6304.81 samples/sec Loss 4.6143 LearningRate 0.0003 Epoch: 21 Global Step: 452300 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 08:52:23,131-Speed 6302.83 samples/sec Loss 4.6148 LearningRate 0.0003 Epoch: 21 Global Step: 452310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:26,381-Speed 6303.01 samples/sec Loss 4.6708 LearningRate 0.0003 Epoch: 21 Global Step: 452320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:29,623-Speed 6317.74 samples/sec Loss 4.5545 LearningRate 0.0003 Epoch: 21 Global Step: 452330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:32,873-Speed 6303.34 samples/sec Loss 4.5846 LearningRate 0.0003 Epoch: 21 Global Step: 452340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:36,120-Speed 6309.45 samples/sec Loss 4.5903 LearningRate 0.0003 Epoch: 21 Global Step: 452350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:39,368-Speed 6307.52 samples/sec Loss 4.6357 LearningRate 0.0003 Epoch: 21 Global Step: 452360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:42,605-Speed 6328.34 samples/sec Loss 4.6855 LearningRate 0.0003 Epoch: 21 Global Step: 452370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:45,851-Speed 6311.43 samples/sec Loss 4.5562 LearningRate 0.0003 Epoch: 21 Global Step: 452380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:49,099-Speed 6306.33 samples/sec Loss 4.6558 LearningRate 0.0003 Epoch: 21 Global Step: 452390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:52,342-Speed 6316.96 samples/sec Loss 4.5980 LearningRate 0.0003 Epoch: 21 Global Step: 452400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:55,587-Speed 6311.43 samples/sec Loss 4.6478 LearningRate 0.0003 Epoch: 21 Global Step: 452410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:52:58,837-Speed 6304.19 samples/sec Loss 4.6011 LearningRate 0.0003 Epoch: 21 Global Step: 452420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
08:53:02,089-Speed 6297.75 samples/sec Loss 4.5827 LearningRate 0.0003 Epoch: 21 Global Step: 452430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:05,333-Speed 6315.64 samples/sec Loss 4.6306 LearningRate 0.0003 Epoch: 21 Global Step: 452440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:08,573-Speed 6321.04 samples/sec Loss 4.6411 LearningRate 0.0003 Epoch: 21 Global Step: 452450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:11,828-Speed 6293.94 samples/sec Loss 4.6804 LearningRate 0.0003 Epoch: 21 Global Step: 452460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:15,063-Speed 6331.84 samples/sec Loss 4.6369 LearningRate 0.0003 Epoch: 21 Global Step: 452470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:18,333-Speed 6265.08 samples/sec Loss 4.5973 LearningRate 0.0003 Epoch: 21 Global Step: 452480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:21,582-Speed 6303.71 samples/sec Loss 4.6132 LearningRate 0.0003 Epoch: 21 Global Step: 452490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:24,831-Speed 6305.67 samples/sec Loss 4.6108 LearningRate 0.0003 Epoch: 21 Global Step: 452500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:28,080-Speed 6305.29 samples/sec Loss 4.6436 LearningRate 0.0003 Epoch: 21 Global Step: 452510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:31,321-Speed 6320.26 samples/sec Loss 4.5506 LearningRate 0.0003 Epoch: 21 Global Step: 452520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:34,568-Speed 6309.25 samples/sec Loss 4.5800 LearningRate 0.0003 Epoch: 21 Global Step: 452530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:37,880-Speed 6184.15 samples/sec Loss 4.5569 LearningRate 0.0003 Epoch: 21 Global Step: 452540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:41,183-Speed 6201.88 
samples/sec Loss 4.6812 LearningRate 0.0003 Epoch: 21 Global Step: 452550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:44,430-Speed 6309.12 samples/sec Loss 4.6497 LearningRate 0.0003 Epoch: 21 Global Step: 452560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:47,660-Speed 6341.81 samples/sec Loss 4.6614 LearningRate 0.0003 Epoch: 21 Global Step: 452570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:50,908-Speed 6307.99 samples/sec Loss 4.6263 LearningRate 0.0003 Epoch: 21 Global Step: 452580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:54,154-Speed 6310.44 samples/sec Loss 4.5203 LearningRate 0.0003 Epoch: 21 Global Step: 452590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:53:57,397-Speed 6316.19 samples/sec Loss 4.6119 LearningRate 0.0003 Epoch: 21 Global Step: 452600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:00,644-Speed 6308.11 samples/sec Loss 4.6094 LearningRate 0.0003 Epoch: 21 Global Step: 452610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:03,908-Speed 6275.61 samples/sec Loss 4.6104 LearningRate 0.0003 Epoch: 21 Global Step: 452620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:07,159-Speed 6302.79 samples/sec Loss 4.5708 LearningRate 0.0003 Epoch: 21 Global Step: 452630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:10,402-Speed 6314.90 samples/sec Loss 4.6696 LearningRate 0.0003 Epoch: 21 Global Step: 452640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:13,648-Speed 6311.88 samples/sec Loss 4.5715 LearningRate 0.0003 Epoch: 21 Global Step: 452650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:16,961-Speed 6183.15 samples/sec Loss 4.6428 LearningRate 0.0003 Epoch: 21 Global Step: 452660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:20,228-Speed 6268.87 samples/sec Loss 4.5917 
LearningRate 0.0003 Epoch: 21 Global Step: 452670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:54:23,464-Speed 6331.16 samples/sec Loss 4.6883 LearningRate 0.0003 Epoch: 21 Global Step: 452680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:26,713-Speed 6305.40 samples/sec Loss 4.6053 LearningRate 0.0003 Epoch: 21 Global Step: 452690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:29,961-Speed 6305.58 samples/sec Loss 4.6390 LearningRate 0.0003 Epoch: 21 Global Step: 452700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:33,202-Speed 6320.15 samples/sec Loss 4.5825 LearningRate 0.0003 Epoch: 21 Global Step: 452710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:36,448-Speed 6310.96 samples/sec Loss 4.6462 LearningRate 0.0003 Epoch: 21 Global Step: 452720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:39,696-Speed 6307.32 samples/sec Loss 4.5989 LearningRate 0.0003 Epoch: 21 Global Step: 452730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:42,947-Speed 6300.50 samples/sec Loss 4.6535 LearningRate 0.0003 Epoch: 21 Global Step: 452740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:46,194-Speed 6309.27 samples/sec Loss 4.6225 LearningRate 0.0003 Epoch: 21 Global Step: 452750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:49,442-Speed 6306.89 samples/sec Loss 4.6409 LearningRate 0.0003 Epoch: 21 Global Step: 452760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:52,689-Speed 6310.16 samples/sec Loss 4.5115 LearningRate 0.0003 Epoch: 21 Global Step: 452770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:55,919-Speed 6342.18 samples/sec Loss 4.5397 LearningRate 0.0003 Epoch: 21 Global Step: 452780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:54:59,165-Speed 6309.04 samples/sec Loss 4.5671 LearningRate 0.0003 Epoch: 21 
Global Step: 452790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:02,418-Speed 6298.57 samples/sec Loss 4.5787 LearningRate 0.0003 Epoch: 21 Global Step: 452800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:05,665-Speed 6308.14 samples/sec Loss 4.5482 LearningRate 0.0003 Epoch: 21 Global Step: 452810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:08,910-Speed 6312.97 samples/sec Loss 4.6402 LearningRate 0.0003 Epoch: 21 Global Step: 452820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:12,155-Speed 6311.72 samples/sec Loss 4.5828 LearningRate 0.0003 Epoch: 21 Global Step: 452830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:15,422-Speed 6270.16 samples/sec Loss 4.5793 LearningRate 0.0003 Epoch: 21 Global Step: 452840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:18,671-Speed 6306.13 samples/sec Loss 4.5869 LearningRate 0.0003 Epoch: 21 Global Step: 452850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:21,921-Speed 6302.97 samples/sec Loss 4.6818 LearningRate 0.0003 Epoch: 21 Global Step: 452860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:25,174-Speed 6297.25 samples/sec Loss 4.6186 LearningRate 0.0003 Epoch: 21 Global Step: 452870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:28,418-Speed 6313.77 samples/sec Loss 4.5784 LearningRate 0.0003 Epoch: 21 Global Step: 452880 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:55:31,653-Speed 6331.68 samples/sec Loss 4.5951 LearningRate 0.0003 Epoch: 21 Global Step: 452890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:34,903-Speed 6302.75 samples/sec Loss 4.6147 LearningRate 0.0003 Epoch: 21 Global Step: 452900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:38,149-Speed 6311.93 samples/sec Loss 4.6400 LearningRate 0.0003 Epoch: 21 Global Step: 452910 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:41,395-Speed 6309.01 samples/sec Loss 4.6273 LearningRate 0.0003 Epoch: 21 Global Step: 452920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:44,641-Speed 6312.14 samples/sec Loss 4.5687 LearningRate 0.0003 Epoch: 21 Global Step: 452930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:47,889-Speed 6307.11 samples/sec Loss 4.6549 LearningRate 0.0003 Epoch: 21 Global Step: 452940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:51,139-Speed 6302.59 samples/sec Loss 4.6089 LearningRate 0.0003 Epoch: 21 Global Step: 452950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:54,396-Speed 6289.92 samples/sec Loss 4.6372 LearningRate 0.0003 Epoch: 21 Global Step: 452960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:55:57,645-Speed 6304.86 samples/sec Loss 4.6353 LearningRate 0.0003 Epoch: 21 Global Step: 452970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:00,891-Speed 6311.08 samples/sec Loss 4.5690 LearningRate 0.0003 Epoch: 21 Global Step: 452980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:04,150-Speed 6284.43 samples/sec Loss 4.5653 LearningRate 0.0003 Epoch: 21 Global Step: 452990 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:56:07,417-Speed 6270.40 samples/sec Loss 4.5774 LearningRate 0.0003 Epoch: 21 Global Step: 453000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:10,662-Speed 6313.93 samples/sec Loss 4.6442 LearningRate 0.0003 Epoch: 21 Global Step: 453010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:13,911-Speed 6304.57 samples/sec Loss 4.6112 LearningRate 0.0003 Epoch: 21 Global Step: 453020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:17,156-Speed 6312.24 samples/sec Loss 4.5881 LearningRate 0.0003 Epoch: 21 Global Step: 453030 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 08:56:20,404-Speed 6305.71 samples/sec Loss 4.6085 LearningRate 0.0003 Epoch: 21 Global Step: 453040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:23,652-Speed 6307.97 samples/sec Loss 4.6669 LearningRate 0.0003 Epoch: 21 Global Step: 453050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:26,899-Speed 6307.74 samples/sec Loss 4.5530 LearningRate 0.0003 Epoch: 21 Global Step: 453060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:30,146-Speed 6308.60 samples/sec Loss 4.6276 LearningRate 0.0003 Epoch: 21 Global Step: 453070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:33,393-Speed 6310.48 samples/sec Loss 4.5176 LearningRate 0.0003 Epoch: 21 Global Step: 453080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:36,637-Speed 6314.50 samples/sec Loss 4.6585 LearningRate 0.0003 Epoch: 21 Global Step: 453090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:39,882-Speed 6312.35 samples/sec Loss 4.6166 LearningRate 0.0003 Epoch: 21 Global Step: 453100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:43,130-Speed 6305.72 samples/sec Loss 4.6906 LearningRate 0.0003 Epoch: 21 Global Step: 453110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:46,377-Speed 6309.16 samples/sec Loss 4.6139 LearningRate 0.0003 Epoch: 21 Global Step: 453120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:49,624-Speed 6309.05 samples/sec Loss 4.5765 LearningRate 0.0003 Epoch: 21 Global Step: 453130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:52,868-Speed 6314.89 samples/sec Loss 4.5488 LearningRate 0.0003 Epoch: 21 Global Step: 453140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:56:56,118-Speed 6302.22 samples/sec Loss 4.6815 LearningRate 0.0003 Epoch: 21 Global Step: 453150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
08:56:59,363-Speed 6313.29 samples/sec Loss 4.6303 LearningRate 0.0003 Epoch: 21 Global Step: 453160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:02,610-Speed 6308.23 samples/sec Loss 4.5422 LearningRate 0.0003 Epoch: 21 Global Step: 453170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:05,858-Speed 6307.47 samples/sec Loss 4.5871 LearningRate 0.0003 Epoch: 21 Global Step: 453180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:09,108-Speed 6302.23 samples/sec Loss 4.6011 LearningRate 0.0003 Epoch: 21 Global Step: 453190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:12,357-Speed 6306.80 samples/sec Loss 4.5544 LearningRate 0.0003 Epoch: 21 Global Step: 453200 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:57:15,596-Speed 6324.94 samples/sec Loss 4.5488 LearningRate 0.0003 Epoch: 21 Global Step: 453210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:18,844-Speed 6305.75 samples/sec Loss 4.5938 LearningRate 0.0003 Epoch: 21 Global Step: 453220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:22,095-Speed 6302.12 samples/sec Loss 4.5648 LearningRate 0.0003 Epoch: 21 Global Step: 453230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:25,342-Speed 6308.90 samples/sec Loss 4.5819 LearningRate 0.0003 Epoch: 21 Global Step: 453240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:28,588-Speed 6310.16 samples/sec Loss 4.5921 LearningRate 0.0003 Epoch: 21 Global Step: 453250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:31,837-Speed 6305.38 samples/sec Loss 4.6019 LearningRate 0.0003 Epoch: 21 Global Step: 453260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:35,091-Speed 6294.87 samples/sec Loss 4.6211 LearningRate 0.0003 Epoch: 21 Global Step: 453270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:38,339-Speed 6306.22 
samples/sec Loss 4.6322 LearningRate 0.0003 Epoch: 21 Global Step: 453280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:41,585-Speed 6310.41 samples/sec Loss 4.5311 LearningRate 0.0003 Epoch: 21 Global Step: 453290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:44,830-Speed 6312.60 samples/sec Loss 4.6153 LearningRate 0.0003 Epoch: 21 Global Step: 453300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:48,064-Speed 6335.08 samples/sec Loss 4.5929 LearningRate 0.0003 Epoch: 21 Global Step: 453310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:51,312-Speed 6306.76 samples/sec Loss 4.5150 LearningRate 0.0003 Epoch: 21 Global Step: 453320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:54,557-Speed 6311.36 samples/sec Loss 4.6568 LearningRate 0.0003 Epoch: 21 Global Step: 453330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:57:57,818-Speed 6282.68 samples/sec Loss 4.6305 LearningRate 0.0003 Epoch: 21 Global Step: 453340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:01,064-Speed 6311.15 samples/sec Loss 4.6585 LearningRate 0.0003 Epoch: 21 Global Step: 453350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:04,310-Speed 6309.99 samples/sec Loss 4.6127 LearningRate 0.0003 Epoch: 21 Global Step: 453360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:07,561-Speed 6299.95 samples/sec Loss 4.5998 LearningRate 0.0003 Epoch: 21 Global Step: 453370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:10,807-Speed 6311.08 samples/sec Loss 4.6086 LearningRate 0.0003 Epoch: 21 Global Step: 453380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:14,067-Speed 6283.52 samples/sec Loss 4.6009 LearningRate 0.0003 Epoch: 21 Global Step: 453390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:17,317-Speed 6304.22 samples/sec Loss 4.5616 
LearningRate 0.0003 Epoch: 21 Global Step: 453400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:20,549-Speed 6338.27 samples/sec Loss 4.5541 LearningRate 0.0003 Epoch: 21 Global Step: 453410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:23,796-Speed 6309.08 samples/sec Loss 4.6452 LearningRate 0.0003 Epoch: 21 Global Step: 453420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:27,042-Speed 6310.36 samples/sec Loss 4.6854 LearningRate 0.0003 Epoch: 21 Global Step: 453430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:30,289-Speed 6308.84 samples/sec Loss 4.6162 LearningRate 0.0003 Epoch: 21 Global Step: 453440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:33,533-Speed 6313.60 samples/sec Loss 4.5672 LearningRate 0.0003 Epoch: 21 Global Step: 453450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:36,780-Speed 6309.70 samples/sec Loss 4.6392 LearningRate 0.0003 Epoch: 21 Global Step: 453460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:40,029-Speed 6305.95 samples/sec Loss 4.6933 LearningRate 0.0003 Epoch: 21 Global Step: 453470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:43,274-Speed 6312.58 samples/sec Loss 4.5778 LearningRate 0.0003 Epoch: 21 Global Step: 453480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:46,517-Speed 6316.06 samples/sec Loss 4.6113 LearningRate 0.0003 Epoch: 21 Global Step: 453490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:49,762-Speed 6311.86 samples/sec Loss 4.6212 LearningRate 0.0003 Epoch: 21 Global Step: 453500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:58:53,008-Speed 6311.80 samples/sec Loss 4.5389 LearningRate 0.0003 Epoch: 21 Global Step: 453510 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:58:56,254-Speed 6308.93 samples/sec Loss 4.5683 LearningRate 0.0003 Epoch: 21 
Global Step: 453520 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 08:58:59,489-Speed 6333.79 samples/sec Loss 4.6269 LearningRate 0.0003 Epoch: 21 Global Step: 453530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:02,737-Speed 6306.94 samples/sec Loss 4.6502 LearningRate 0.0003 Epoch: 21 Global Step: 453540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:05,983-Speed 6310.15 samples/sec Loss 4.5416 LearningRate 0.0003 Epoch: 21 Global Step: 453550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:09,227-Speed 6314.03 samples/sec Loss 4.5995 LearningRate 0.0003 Epoch: 21 Global Step: 453560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:12,485-Speed 6287.37 samples/sec Loss 4.5845 LearningRate 0.0003 Epoch: 21 Global Step: 453570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:15,728-Speed 6316.17 samples/sec Loss 4.6148 LearningRate 0.0003 Epoch: 21 Global Step: 453580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:18,980-Speed 6298.63 samples/sec Loss 4.6168 LearningRate 0.0003 Epoch: 21 Global Step: 453590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:22,227-Speed 6310.90 samples/sec Loss 4.5546 LearningRate 0.0003 Epoch: 21 Global Step: 453600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:25,477-Speed 6303.28 samples/sec Loss 4.6388 LearningRate 0.0003 Epoch: 21 Global Step: 453610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:28,724-Speed 6309.00 samples/sec Loss 4.5539 LearningRate 0.0003 Epoch: 21 Global Step: 453620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:31,972-Speed 6305.83 samples/sec Loss 4.5395 LearningRate 0.0003 Epoch: 21 Global Step: 453630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:35,215-Speed 6316.63 samples/sec Loss 4.5211 LearningRate 0.0003 Epoch: 21 Global Step: 453640 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:38,458-Speed 6315.88 samples/sec Loss 4.6281 LearningRate 0.0003 Epoch: 21 Global Step: 453650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:41,704-Speed 6311.26 samples/sec Loss 4.5996 LearningRate 0.0003 Epoch: 21 Global Step: 453660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:44,953-Speed 6306.30 samples/sec Loss 4.6137 LearningRate 0.0003 Epoch: 21 Global Step: 453670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:48,203-Speed 6302.98 samples/sec Loss 4.6769 LearningRate 0.0003 Epoch: 21 Global Step: 453680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:51,449-Speed 6310.82 samples/sec Loss 4.5942 LearningRate 0.0003 Epoch: 21 Global Step: 453690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:54,693-Speed 6314.27 samples/sec Loss 4.5456 LearningRate 0.0003 Epoch: 21 Global Step: 453700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 08:59:57,940-Speed 6309.17 samples/sec Loss 4.6401 LearningRate 0.0003 Epoch: 21 Global Step: 453710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:01,185-Speed 6312.04 samples/sec Loss 4.6206 LearningRate 0.0003 Epoch: 21 Global Step: 453720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:04,415-Speed 6341.44 samples/sec Loss 4.5586 LearningRate 0.0003 Epoch: 21 Global Step: 453730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:07,661-Speed 6311.37 samples/sec Loss 4.7093 LearningRate 0.0003 Epoch: 21 Global Step: 453740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:10,910-Speed 6304.87 samples/sec Loss 4.6384 LearningRate 0.0003 Epoch: 21 Global Step: 453750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:14,168-Speed 6286.47 samples/sec Loss 4.5080 LearningRate 0.0003 Epoch: 21 Global Step: 453760 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:00:17,421-Speed 6297.34 samples/sec Loss 4.5939 LearningRate 0.0003 Epoch: 21 Global Step: 453770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:20,667-Speed 6311.96 samples/sec Loss 4.6281 LearningRate 0.0003 Epoch: 21 Global Step: 453780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:23,916-Speed 6304.63 samples/sec Loss 4.6289 LearningRate 0.0003 Epoch: 21 Global Step: 453790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:27,161-Speed 6313.74 samples/sec Loss 4.6090 LearningRate 0.0003 Epoch: 21 Global Step: 453800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:30,405-Speed 6313.52 samples/sec Loss 4.5262 LearningRate 0.0003 Epoch: 21 Global Step: 453810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:33,652-Speed 6309.69 samples/sec Loss 4.6296 LearningRate 0.0003 Epoch: 21 Global Step: 453820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:36,882-Speed 6342.64 samples/sec Loss 4.6343 LearningRate 0.0003 Epoch: 21 Global Step: 453830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:40,128-Speed 6311.59 samples/sec Loss 4.6698 LearningRate 0.0003 Epoch: 21 Global Step: 453840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:43,376-Speed 6306.35 samples/sec Loss 4.6313 LearningRate 0.0003 Epoch: 21 Global Step: 453850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:46,621-Speed 6312.72 samples/sec Loss 4.6558 LearningRate 0.0003 Epoch: 21 Global Step: 453860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:49,866-Speed 6311.39 samples/sec Loss 4.6320 LearningRate 0.0003 Epoch: 21 Global Step: 453870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:53,114-Speed 6307.66 samples/sec Loss 4.5720 LearningRate 0.0003 Epoch: 21 Global Step: 453880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:00:56,362-Speed 6306.22 samples/sec Loss 4.6232 LearningRate 0.0003 Epoch: 21 Global Step: 453890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:00:59,613-Speed 6302.24 samples/sec Loss 4.6076 LearningRate 0.0003 Epoch: 21 Global Step: 453900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:02,861-Speed 6306.72 samples/sec Loss 4.5719 LearningRate 0.0003 Epoch: 21 Global Step: 453910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:06,101-Speed 6321.35 samples/sec Loss 4.6601 LearningRate 0.0003 Epoch: 21 Global Step: 453920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:09,350-Speed 6304.69 samples/sec Loss 4.5778 LearningRate 0.0003 Epoch: 21 Global Step: 453930 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:01:12,578-Speed 6346.54 samples/sec Loss 4.6375 LearningRate 0.0003 Epoch: 21 Global Step: 453940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:15,827-Speed 6305.01 samples/sec Loss 4.5609 LearningRate 0.0003 Epoch: 21 Global Step: 453950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:19,079-Speed 6298.12 samples/sec Loss 4.6839 LearningRate 0.0003 Epoch: 21 Global Step: 453960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:22,323-Speed 6315.01 samples/sec Loss 4.5863 LearningRate 0.0003 Epoch: 21 Global Step: 453970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:25,570-Speed 6308.81 samples/sec Loss 4.6135 LearningRate 0.0003 Epoch: 21 Global Step: 453980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:28,815-Speed 6312.74 samples/sec Loss 4.5917 LearningRate 0.0003 Epoch: 21 Global Step: 453990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:32,061-Speed 6310.79 samples/sec Loss 4.6117 LearningRate 0.0003 Epoch: 21 Global Step: 454000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:35,313-Speed 6298.87 
samples/sec Loss 4.5968 LearningRate 0.0003 Epoch: 21 Global Step: 454010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:38,561-Speed 6307.88 samples/sec Loss 4.5596 LearningRate 0.0003 Epoch: 21 Global Step: 454020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:41,822-Speed 6282.67 samples/sec Loss 4.6506 LearningRate 0.0003 Epoch: 21 Global Step: 454030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:45,051-Speed 6343.04 samples/sec Loss 4.6433 LearningRate 0.0003 Epoch: 21 Global Step: 454040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:48,304-Speed 6297.90 samples/sec Loss 4.6245 LearningRate 0.0003 Epoch: 21 Global Step: 454050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:51,551-Speed 6307.80 samples/sec Loss 4.5569 LearningRate 0.0003 Epoch: 21 Global Step: 454060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:54,797-Speed 6310.89 samples/sec Loss 4.5743 LearningRate 0.0003 Epoch: 21 Global Step: 454070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:01:58,047-Speed 6302.86 samples/sec Loss 4.5984 LearningRate 0.0003 Epoch: 21 Global Step: 454080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:01,293-Speed 6310.93 samples/sec Loss 4.6486 LearningRate 0.0003 Epoch: 21 Global Step: 454090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:04,540-Speed 6308.37 samples/sec Loss 4.6614 LearningRate 0.0003 Epoch: 21 Global Step: 454100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:07,785-Speed 6313.46 samples/sec Loss 4.6313 LearningRate 0.0003 Epoch: 21 Global Step: 454110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:11,032-Speed 6309.73 samples/sec Loss 4.5828 LearningRate 0.0003 Epoch: 21 Global Step: 454120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:14,289-Speed 6289.18 samples/sec Loss 4.5763 
LearningRate 0.0003 Epoch: 21 Global Step: 454130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:17,536-Speed 6307.38 samples/sec Loss 4.5494 LearningRate 0.0003 Epoch: 21 Global Step: 454140 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:02:20,769-Speed 6335.79 samples/sec Loss 4.5687 LearningRate 0.0003 Epoch: 21 Global Step: 454150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:24,013-Speed 6315.20 samples/sec Loss 4.6239 LearningRate 0.0003 Epoch: 21 Global Step: 454160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:27,263-Speed 6304.01 samples/sec Loss 4.6249 LearningRate 0.0003 Epoch: 21 Global Step: 454170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:30,509-Speed 6310.65 samples/sec Loss 4.6063 LearningRate 0.0003 Epoch: 21 Global Step: 454180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:33,756-Speed 6308.87 samples/sec Loss 4.5839 LearningRate 0.0003 Epoch: 21 Global Step: 454190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:37,004-Speed 6305.81 samples/sec Loss 4.5742 LearningRate 0.0003 Epoch: 21 Global Step: 454200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:40,254-Speed 6302.85 samples/sec Loss 4.5983 LearningRate 0.0003 Epoch: 21 Global Step: 454210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:43,502-Speed 6307.29 samples/sec Loss 4.6323 LearningRate 0.0003 Epoch: 21 Global Step: 454220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:46,752-Speed 6303.98 samples/sec Loss 4.6503 LearningRate 0.0003 Epoch: 21 Global Step: 454230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:49,994-Speed 6318.27 samples/sec Loss 4.6338 LearningRate 0.0003 Epoch: 21 Global Step: 454240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:53,229-Speed 6332.54 samples/sec Loss 4.6272 LearningRate 0.0003 Epoch: 21 
Global Step: 454250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:56,472-Speed 6316.22 samples/sec Loss 4.6100 LearningRate 0.0003 Epoch: 21 Global Step: 454260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:02:59,716-Speed 6316.41 samples/sec Loss 4.5827 LearningRate 0.0003 Epoch: 21 Global Step: 454270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:02,959-Speed 6316.54 samples/sec Loss 4.6241 LearningRate 0.0003 Epoch: 21 Global Step: 454280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:06,203-Speed 6314.56 samples/sec Loss 4.6131 LearningRate 0.0003 Epoch: 21 Global Step: 454290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:09,448-Speed 6312.02 samples/sec Loss 4.6314 LearningRate 0.0003 Epoch: 21 Global Step: 454300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:12,694-Speed 6311.19 samples/sec Loss 4.5675 LearningRate 0.0003 Epoch: 21 Global Step: 454310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:15,945-Speed 6299.57 samples/sec Loss 4.5874 LearningRate 0.0003 Epoch: 21 Global Step: 454320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:19,192-Speed 6309.19 samples/sec Loss 4.6616 LearningRate 0.0003 Epoch: 21 Global Step: 454330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:22,436-Speed 6315.09 samples/sec Loss 4.6180 LearningRate 0.0003 Epoch: 21 Global Step: 454340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:25,675-Speed 6324.87 samples/sec Loss 4.5096 LearningRate 0.0003 Epoch: 21 Global Step: 454350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:28,922-Speed 6308.83 samples/sec Loss 4.5504 LearningRate 0.0003 Epoch: 21 Global Step: 454360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:32,169-Speed 6308.72 samples/sec Loss 4.5544 LearningRate 0.0003 Epoch: 21 Global Step: 454370 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:35,418-Speed 6305.25 samples/sec Loss 4.6139 LearningRate 0.0003 Epoch: 21 Global Step: 454380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:38,663-Speed 6311.50 samples/sec Loss 4.6522 LearningRate 0.0003 Epoch: 21 Global Step: 454390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:41,917-Speed 6294.89 samples/sec Loss 4.6268 LearningRate 0.0003 Epoch: 21 Global Step: 454400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:45,163-Speed 6310.56 samples/sec Loss 4.5649 LearningRate 0.0003 Epoch: 21 Global Step: 454410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:48,406-Speed 6317.75 samples/sec Loss 4.6496 LearningRate 0.0003 Epoch: 21 Global Step: 454420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:51,649-Speed 6317.22 samples/sec Loss 4.5874 LearningRate 0.0003 Epoch: 21 Global Step: 454430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:54,893-Speed 6315.16 samples/sec Loss 4.4983 LearningRate 0.0003 Epoch: 21 Global Step: 454440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:03:58,140-Speed 6308.88 samples/sec Loss 4.6358 LearningRate 0.0003 Epoch: 21 Global Step: 454450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:04:01,384-Speed 6314.60 samples/sec Loss 4.6256 LearningRate 0.0003 Epoch: 21 Global Step: 454460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:04,626-Speed 6318.56 samples/sec Loss 4.6106 LearningRate 0.0003 Epoch: 21 Global Step: 454470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:07,872-Speed 6309.54 samples/sec Loss 4.6145 LearningRate 0.0003 Epoch: 21 Global Step: 454480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:11,176-Speed 6200.12 samples/sec Loss 4.6236 LearningRate 0.0003 Epoch: 21 Global Step: 454490 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:04:14,423-Speed 6309.87 samples/sec Loss 4.5735 LearningRate 0.0003 Epoch: 21 Global Step: 454500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:17,675-Speed 6298.22 samples/sec Loss 4.6315 LearningRate 0.0003 Epoch: 21 Global Step: 454510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:20,925-Speed 6303.03 samples/sec Loss 4.5890 LearningRate 0.0003 Epoch: 21 Global Step: 454520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:24,157-Speed 6337.38 samples/sec Loss 4.5413 LearningRate 0.0003 Epoch: 21 Global Step: 454530 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:27,407-Speed 6303.42 samples/sec Loss 4.5712 LearningRate 0.0003 Epoch: 21 Global Step: 454540 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:30,661-Speed 6295.01 samples/sec Loss 4.6983 LearningRate 0.0003 Epoch: 21 Global Step: 454550 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:33,910-Speed 6305.65 samples/sec Loss 4.6754 LearningRate 0.0003 Epoch: 21 Global Step: 454560 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:37,157-Speed 6308.88 samples/sec Loss 4.6172 LearningRate 0.0003 Epoch: 21 Global Step: 454570 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:40,411-Speed 6295.94 samples/sec Loss 4.5536 LearningRate 0.0003 Epoch: 21 Global Step: 454580 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:43,655-Speed 6313.40 samples/sec Loss 4.6069 LearningRate 0.0003 Epoch: 21 Global Step: 454590 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:46,899-Speed 6314.63 samples/sec Loss 4.5796 LearningRate 0.0003 Epoch: 21 Global Step: 454600 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:50,144-Speed 6312.46 samples/sec Loss 4.5714 LearningRate 0.0003 Epoch: 21 Global Step: 454610 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:53,386-Speed 
6319.66 samples/sec Loss 4.5473 LearningRate 0.0003 Epoch: 21 Global Step: 454620 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-04-02 09:04:56,644-Speed 6286.45 samples/sec Loss 4.5665 LearningRate 0.0003 Epoch: 21 Global Step: 454630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:04:59,890-Speed 6311.86 samples/sec Loss 4.5749 LearningRate 0.0003 Epoch: 21 Global Step: 454640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:03,139-Speed 6305.23 samples/sec Loss 4.6474 LearningRate 0.0003 Epoch: 21 Global Step: 454650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:06,389-Speed 6304.10 samples/sec Loss 4.6266 LearningRate 0.0003 Epoch: 21 Global Step: 454660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:09,635-Speed 6309.77 samples/sec Loss 4.6287 LearningRate 0.0003 Epoch: 21 Global Step: 454670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:12,887-Speed 6299.32 samples/sec Loss 4.5802 LearningRate 0.0003 Epoch: 21 Global Step: 454680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:16,138-Speed 6302.89 samples/sec Loss 4.6147 LearningRate 0.0003 Epoch: 21 Global Step: 454690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:19,385-Speed 6308.42 samples/sec Loss 4.6087 LearningRate 0.0003 Epoch: 21 Global Step: 454700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:22,647-Speed 6279.30 samples/sec Loss 4.5584 LearningRate 0.0003 Epoch: 21 Global Step: 454710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:25,891-Speed 6314.30 samples/sec Loss 4.5699 LearningRate 0.0003 Epoch: 21 Global Step: 454720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:29,128-Speed 6329.81 samples/sec Loss 4.5456 LearningRate 0.0003 Epoch: 21 Global Step: 454730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:32,376-Speed 6306.58 samples/sec Loss 4.5391 
LearningRate 0.0003 Epoch: 21 Global Step: 454740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:35,620-Speed 6313.96 samples/sec Loss 4.6488 LearningRate 0.0003 Epoch: 21 Global Step: 454750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:38,868-Speed 6306.45 samples/sec Loss 4.5836 LearningRate 0.0003 Epoch: 21 Global Step: 454760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:42,111-Speed 6316.03 samples/sec Loss 4.6376 LearningRate 0.0003 Epoch: 21 Global Step: 454770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:45,360-Speed 6306.14 samples/sec Loss 4.6656 LearningRate 0.0003 Epoch: 21 Global Step: 454780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:48,607-Speed 6308.97 samples/sec Loss 4.5785 LearningRate 0.0003 Epoch: 21 Global Step: 454790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:51,852-Speed 6312.77 samples/sec Loss 4.6312 LearningRate 0.0003 Epoch: 21 Global Step: 454800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:55,095-Speed 6316.56 samples/sec Loss 4.6092 LearningRate 0.0003 Epoch: 21 Global Step: 454810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:05:58,342-Speed 6308.55 samples/sec Loss 4.6552 LearningRate 0.0003 Epoch: 21 Global Step: 454820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:01,588-Speed 6309.59 samples/sec Loss 4.5505 LearningRate 0.0003 Epoch: 21 Global Step: 454830 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:06:04,822-Speed 6334.19 samples/sec Loss 4.6075 LearningRate 0.0003 Epoch: 21 Global Step: 454840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:08,072-Speed 6304.35 samples/sec Loss 4.5431 LearningRate 0.0003 Epoch: 21 Global Step: 454850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:11,316-Speed 6315.43 samples/sec Loss 4.6183 LearningRate 0.0003 Epoch: 21 
Global Step: 454860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:14,562-Speed 6309.62 samples/sec Loss 4.5293 LearningRate 0.0003 Epoch: 21 Global Step: 454870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:17,805-Speed 6317.19 samples/sec Loss 4.6053 LearningRate 0.0003 Epoch: 21 Global Step: 454880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:21,051-Speed 6312.00 samples/sec Loss 4.6216 LearningRate 0.0003 Epoch: 21 Global Step: 454890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:24,298-Speed 6308.91 samples/sec Loss 4.6671 LearningRate 0.0003 Epoch: 21 Global Step: 454900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:27,543-Speed 6312.52 samples/sec Loss 4.6369 LearningRate 0.0003 Epoch: 21 Global Step: 454910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:30,789-Speed 6309.80 samples/sec Loss 4.5741 LearningRate 0.0003 Epoch: 21 Global Step: 454920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:34,034-Speed 6312.87 samples/sec Loss 4.6320 LearningRate 0.0003 Epoch: 21 Global Step: 454930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:37,272-Speed 6327.06 samples/sec Loss 4.5966 LearningRate 0.0003 Epoch: 21 Global Step: 454940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:40,525-Speed 6296.07 samples/sec Loss 4.6127 LearningRate 0.0003 Epoch: 21 Global Step: 454950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:43,769-Speed 6315.06 samples/sec Loss 4.6070 LearningRate 0.0003 Epoch: 21 Global Step: 454960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:47,014-Speed 6312.76 samples/sec Loss 4.5699 LearningRate 0.0003 Epoch: 21 Global Step: 454970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:50,262-Speed 6307.62 samples/sec Loss 4.6606 LearningRate 0.0003 Epoch: 21 Global Step: 454980 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:53,513-Speed 6299.92 samples/sec Loss 4.5825 LearningRate 0.0003 Epoch: 21 Global Step: 454990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:06:56,758-Speed 6313.91 samples/sec Loss 4.5880 LearningRate 0.0003 Epoch: 21 Global Step: 455000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:00,000-Speed 6317.12 samples/sec Loss 4.5858 LearningRate 0.0003 Epoch: 21 Global Step: 455010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:03,249-Speed 6305.49 samples/sec Loss 4.5600 LearningRate 0.0003 Epoch: 21 Global Step: 455020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:06,499-Speed 6303.18 samples/sec Loss 4.5453 LearningRate 0.0003 Epoch: 21 Global Step: 455030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:09,745-Speed 6309.81 samples/sec Loss 4.5448 LearningRate 0.0003 Epoch: 21 Global Step: 455040 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:07:12,977-Speed 6339.21 samples/sec Loss 4.5705 LearningRate 0.0003 Epoch: 21 Global Step: 455050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:16,225-Speed 6306.36 samples/sec Loss 4.5968 LearningRate 0.0003 Epoch: 21 Global Step: 455060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:19,469-Speed 6315.52 samples/sec Loss 4.5708 LearningRate 0.0003 Epoch: 21 Global Step: 455070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:22,717-Speed 6306.95 samples/sec Loss 4.6265 LearningRate 0.0003 Epoch: 21 Global Step: 455080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:25,963-Speed 6310.99 samples/sec Loss 4.5789 LearningRate 0.0003 Epoch: 21 Global Step: 455090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:29,210-Speed 6308.32 samples/sec Loss 4.5830 LearningRate 0.0003 Epoch: 21 Global Step: 455100 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:07:32,455-Speed 6312.94 samples/sec Loss 4.5686 LearningRate 0.0003 Epoch: 21 Global Step: 455110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:35,700-Speed 6313.13 samples/sec Loss 4.5565 LearningRate 0.0003 Epoch: 21 Global Step: 455120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:38,945-Speed 6312.01 samples/sec Loss 4.6252 LearningRate 0.0003 Epoch: 21 Global Step: 455130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:42,191-Speed 6311.53 samples/sec Loss 4.6205 LearningRate 0.0003 Epoch: 21 Global Step: 455140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:45,426-Speed 6331.88 samples/sec Loss 4.5821 LearningRate 0.0003 Epoch: 21 Global Step: 455150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:48,674-Speed 6306.98 samples/sec Loss 4.6270 LearningRate 0.0003 Epoch: 21 Global Step: 455160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:51,921-Speed 6309.04 samples/sec Loss 4.6814 LearningRate 0.0003 Epoch: 21 Global Step: 455170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:55,165-Speed 6314.00 samples/sec Loss 4.6065 LearningRate 0.0003 Epoch: 21 Global Step: 455180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:07:58,410-Speed 6313.56 samples/sec Loss 4.6049 LearningRate 0.0003 Epoch: 21 Global Step: 455190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:01,662-Speed 6297.72 samples/sec Loss 4.6709 LearningRate 0.0003 Epoch: 21 Global Step: 455200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:04,907-Speed 6313.42 samples/sec Loss 4.5644 LearningRate 0.0003 Epoch: 21 Global Step: 455210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:08,156-Speed 6305.21 samples/sec Loss 4.6221 LearningRate 0.0003 Epoch: 21 Global Step: 455220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:08:11,402-Speed 6309.66 samples/sec Loss 4.6284 LearningRate 0.0003 Epoch: 21 Global Step: 455230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:14,659-Speed 6289.99 samples/sec Loss 4.6537 LearningRate 0.0003 Epoch: 21 Global Step: 455240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:17,889-Speed 6341.49 samples/sec Loss 4.5601 LearningRate 0.0003 Epoch: 21 Global Step: 455250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:21,133-Speed 6315.16 samples/sec Loss 4.6870 LearningRate 0.0003 Epoch: 21 Global Step: 455260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:24,378-Speed 6311.62 samples/sec Loss 4.5758 LearningRate 0.0003 Epoch: 21 Global Step: 455270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:27,623-Speed 6313.39 samples/sec Loss 4.5905 LearningRate 0.0003 Epoch: 21 Global Step: 455280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:30,870-Speed 6309.38 samples/sec Loss 4.6302 LearningRate 0.0003 Epoch: 21 Global Step: 455290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:34,127-Speed 6290.78 samples/sec Loss 4.6237 LearningRate 0.0003 Epoch: 21 Global Step: 455300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:37,380-Speed 6297.08 samples/sec Loss 4.5580 LearningRate 0.0003 Epoch: 21 Global Step: 455310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:40,627-Speed 6308.12 samples/sec Loss 4.6134 LearningRate 0.0003 Epoch: 21 Global Step: 455320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:43,872-Speed 6312.38 samples/sec Loss 4.6161 LearningRate 0.0003 Epoch: 21 Global Step: 455330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:47,124-Speed 6300.33 samples/sec Loss 4.5787 LearningRate 0.0003 Epoch: 21 Global Step: 455340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:50,353-Speed 6343.68 
samples/sec Loss 4.5968 LearningRate 0.0003 Epoch: 21 Global Step: 455350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:53,599-Speed 6309.86 samples/sec Loss 4.5933 LearningRate 0.0003 Epoch: 21 Global Step: 455360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:08:56,846-Speed 6309.67 samples/sec Loss 4.6167 LearningRate 0.0003 Epoch: 21 Global Step: 455370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:00,092-Speed 6310.26 samples/sec Loss 4.6085 LearningRate 0.0003 Epoch: 21 Global Step: 455380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:03,341-Speed 6305.66 samples/sec Loss 4.6161 LearningRate 0.0003 Epoch: 21 Global Step: 455390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:06,587-Speed 6309.19 samples/sec Loss 4.5793 LearningRate 0.0003 Epoch: 21 Global Step: 455400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:09,830-Speed 6317.21 samples/sec Loss 4.5634 LearningRate 0.0003 Epoch: 21 Global Step: 455410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:13,078-Speed 6307.46 samples/sec Loss 4.5956 LearningRate 0.0003 Epoch: 21 Global Step: 455420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:16,337-Speed 6285.02 samples/sec Loss 4.5206 LearningRate 0.0003 Epoch: 21 Global Step: 455430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:19,594-Speed 6289.21 samples/sec Loss 4.6348 LearningRate 0.0003 Epoch: 21 Global Step: 455440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:22,844-Speed 6303.80 samples/sec Loss 4.6310 LearningRate 0.0003 Epoch: 21 Global Step: 455450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:09:26,086-Speed 6317.09 samples/sec Loss 4.5907 LearningRate 0.0003 Epoch: 21 Global Step: 455460 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:09:29,320-Speed 6334.99 samples/sec Loss 4.6417 
LearningRate 0.0003 Epoch: 21 Global Step: 455470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:32,567-Speed 6309.67 samples/sec Loss 4.6244 LearningRate 0.0003 Epoch: 21 Global Step: 455480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:35,813-Speed 6310.52 samples/sec Loss 4.6020 LearningRate 0.0003 Epoch: 21 Global Step: 455490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:39,059-Speed 6310.87 samples/sec Loss 4.5200 LearningRate 0.0003 Epoch: 21 Global Step: 455500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:42,306-Speed 6309.06 samples/sec Loss 4.6817 LearningRate 0.0003 Epoch: 21 Global Step: 455510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:45,552-Speed 6311.57 samples/sec Loss 4.5641 LearningRate 0.0003 Epoch: 21 Global Step: 455520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:48,840-Speed 6229.26 samples/sec Loss 4.5216 LearningRate 0.0003 Epoch: 21 Global Step: 455530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:52,097-Speed 6290.46 samples/sec Loss 4.5941 LearningRate 0.0003 Epoch: 21 Global Step: 455540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:55,344-Speed 6308.41 samples/sec Loss 4.5942 LearningRate 0.0003 Epoch: 21 Global Step: 455550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:09:58,589-Speed 6311.90 samples/sec Loss 4.5933 LearningRate 0.0003 Epoch: 21 Global Step: 455560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:01,820-Speed 6340.56 samples/sec Loss 4.5586 LearningRate 0.0003 Epoch: 21 Global Step: 455570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:05,066-Speed 6310.24 samples/sec Loss 4.5832 LearningRate 0.0003 Epoch: 21 Global Step: 455580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:08,310-Speed 6315.64 samples/sec Loss 4.5903 LearningRate 0.0003 Epoch: 21 
Global Step: 455590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:11,559-Speed 6304.13 samples/sec Loss 4.5612 LearningRate 0.0003 Epoch: 21 Global Step: 455600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:14,809-Speed 6301.92 samples/sec Loss 4.6124 LearningRate 0.0003 Epoch: 21 Global Step: 455610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:18,059-Speed 6304.54 samples/sec Loss 4.6136 LearningRate 0.0003 Epoch: 21 Global Step: 455620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:21,303-Speed 6313.74 samples/sec Loss 4.5752 LearningRate 0.0003 Epoch: 21 Global Step: 455630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:24,550-Speed 6308.07 samples/sec Loss 4.6545 LearningRate 0.0003 Epoch: 21 Global Step: 455640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:27,800-Speed 6304.39 samples/sec Loss 4.5996 LearningRate 0.0003 Epoch: 21 Global Step: 455650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:31,044-Speed 6313.84 samples/sec Loss 4.6289 LearningRate 0.0003 Epoch: 21 Global Step: 455660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:34,289-Speed 6314.06 samples/sec Loss 4.6126 LearningRate 0.0003 Epoch: 21 Global Step: 455670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:10:37,520-Speed 6339.85 samples/sec Loss 4.6056 LearningRate 0.0003 Epoch: 21 Global Step: 455680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:40,762-Speed 6317.95 samples/sec Loss 4.5724 LearningRate 0.0003 Epoch: 21 Global Step: 455690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:44,009-Speed 6309.08 samples/sec Loss 4.5584 LearningRate 0.0003 Epoch: 21 Global Step: 455700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:47,296-Speed 6232.81 samples/sec Loss 4.6529 LearningRate 0.0003 Epoch: 21 Global Step: 455710 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:50,611-Speed 6178.68 samples/sec Loss 4.5817 LearningRate 0.0003 Epoch: 21 Global Step: 455720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:53,872-Speed 6281.24 samples/sec Loss 4.6164 LearningRate 0.0003 Epoch: 21 Global Step: 455730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:10:57,117-Speed 6314.32 samples/sec Loss 4.5706 LearningRate 0.0003 Epoch: 21 Global Step: 455740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:00,374-Speed 6289.58 samples/sec Loss 4.5552 LearningRate 0.0003 Epoch: 21 Global Step: 455750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:03,624-Speed 6302.51 samples/sec Loss 4.5178 LearningRate 0.0003 Epoch: 21 Global Step: 455760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:06,865-Speed 6320.99 samples/sec Loss 4.6057 LearningRate 0.0003 Epoch: 21 Global Step: 455770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:10,094-Speed 6343.54 samples/sec Loss 4.5703 LearningRate 0.0003 Epoch: 21 Global Step: 455780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:13,361-Speed 6268.79 samples/sec Loss 4.6077 LearningRate 0.0003 Epoch: 21 Global Step: 455790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:16,637-Speed 6253.08 samples/sec Loss 4.5526 LearningRate 0.0003 Epoch: 21 Global Step: 455800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:19,886-Speed 6305.42 samples/sec Loss 4.5959 LearningRate 0.0003 Epoch: 21 Global Step: 455810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:23,130-Speed 6315.43 samples/sec Loss 4.4917 LearningRate 0.0003 Epoch: 21 Global Step: 455820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:26,377-Speed 6308.68 samples/sec Loss 4.6108 LearningRate 0.0003 Epoch: 21 Global Step: 455830 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:11:29,622-Speed 6312.83 samples/sec Loss 4.6453 LearningRate 0.0003 Epoch: 21 Global Step: 455840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:32,866-Speed 6313.53 samples/sec Loss 4.6528 LearningRate 0.0003 Epoch: 21 Global Step: 455850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:36,113-Speed 6308.32 samples/sec Loss 4.6041 LearningRate 0.0003 Epoch: 21 Global Step: 455860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:39,357-Speed 6314.54 samples/sec Loss 4.6028 LearningRate 0.0003 Epoch: 21 Global Step: 455870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:42,605-Speed 6307.90 samples/sec Loss 4.5654 LearningRate 0.0003 Epoch: 21 Global Step: 455880 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:11:45,849-Speed 6313.28 samples/sec Loss 4.5789 LearningRate 0.0003 Epoch: 21 Global Step: 455890 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:11:49,076-Speed 6349.37 samples/sec Loss 4.5623 LearningRate 0.0003 Epoch: 21 Global Step: 455900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:52,321-Speed 6312.78 samples/sec Loss 4.5189 LearningRate 0.0003 Epoch: 21 Global Step: 455910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:55,566-Speed 6312.36 samples/sec Loss 4.6665 LearningRate 0.0003 Epoch: 21 Global Step: 455920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:11:58,811-Speed 6313.24 samples/sec Loss 4.6073 LearningRate 0.0003 Epoch: 21 Global Step: 455930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:02,056-Speed 6312.36 samples/sec Loss 4.6498 LearningRate 0.0003 Epoch: 21 Global Step: 455940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:05,304-Speed 6307.23 samples/sec Loss 4.5716 LearningRate 0.0003 Epoch: 21 Global Step: 455950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:12:08,548-Speed 6315.28 samples/sec Loss 4.6256 LearningRate 0.0003 Epoch: 21 Global Step: 455960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:11,792-Speed 6314.53 samples/sec Loss 4.6271 LearningRate 0.0003 Epoch: 21 Global Step: 455970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:15,038-Speed 6309.57 samples/sec Loss 4.6467 LearningRate 0.0003 Epoch: 21 Global Step: 455980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:18,281-Speed 6316.81 samples/sec Loss 4.5999 LearningRate 0.0003 Epoch: 21 Global Step: 455990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:21,536-Speed 6293.67 samples/sec Loss 4.6728 LearningRate 0.0003 Epoch: 21 Global Step: 456000 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:12:24,771-Speed 6332.35 samples/sec Loss 4.5533 LearningRate 0.0003 Epoch: 21 Global Step: 456010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:28,012-Speed 6319.14 samples/sec Loss 4.5443 LearningRate 0.0003 Epoch: 21 Global Step: 456020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:31,257-Speed 6313.81 samples/sec Loss 4.6372 LearningRate 0.0003 Epoch: 21 Global Step: 456030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:34,497-Speed 6321.33 samples/sec Loss 4.5503 LearningRate 0.0003 Epoch: 21 Global Step: 456040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:37,743-Speed 6311.90 samples/sec Loss 4.6328 LearningRate 0.0003 Epoch: 21 Global Step: 456050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:41,102-Speed 6097.74 samples/sec Loss 4.5794 LearningRate 0.0003 Epoch: 21 Global Step: 456060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:44,360-Speed 6287.43 samples/sec Loss 4.6106 LearningRate 0.0003 Epoch: 21 Global Step: 456070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:47,608-Speed 6308.28 
samples/sec Loss 4.5490 LearningRate 0.0003 Epoch: 21 Global Step: 456080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:50,853-Speed 6312.07 samples/sec Loss 4.5832 LearningRate 0.0003 Epoch: 21 Global Step: 456090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:54,094-Speed 6319.68 samples/sec Loss 4.6297 LearningRate 0.0003 Epoch: 21 Global Step: 456100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:12:57,358-Speed 6276.34 samples/sec Loss 4.6873 LearningRate 0.0003 Epoch: 21 Global Step: 456110 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:13:00,606-Speed 6307.04 samples/sec Loss 4.5310 LearningRate 0.0003 Epoch: 21 Global Step: 456120 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:13:03,964-Speed 6101.08 samples/sec Loss 4.6136 LearningRate 0.0003 Epoch: 21 Global Step: 456130 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:13:07,240-Speed 6252.69 samples/sec Loss 4.5659 LearningRate 0.0003 Epoch: 21 Global Step: 456140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:10,485-Speed 6312.86 samples/sec Loss 4.6184 LearningRate 0.0003 Epoch: 21 Global Step: 456150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:13,731-Speed 6310.45 samples/sec Loss 4.6070 LearningRate 0.0003 Epoch: 21 Global Step: 456160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:16,975-Speed 6315.23 samples/sec Loss 4.5307 LearningRate 0.0003 Epoch: 21 Global Step: 456170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:20,227-Speed 6298.48 samples/sec Loss 4.5872 LearningRate 0.0003 Epoch: 21 Global Step: 456180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:23,473-Speed 6312.36 samples/sec Loss 4.6279 LearningRate 0.0003 Epoch: 21 Global Step: 456190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:26,716-Speed 6315.09 samples/sec Loss 4.5953 
LearningRate 0.0003 Epoch: 21 Global Step: 456200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:29,961-Speed 6313.25 samples/sec Loss 4.5286 LearningRate 0.0003 Epoch: 21 Global Step: 456210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:33,207-Speed 6311.45 samples/sec Loss 4.6271 LearningRate 0.0003 Epoch: 21 Global Step: 456220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:36,452-Speed 6312.59 samples/sec Loss 4.5537 LearningRate 0.0003 Epoch: 21 Global Step: 456230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:39,686-Speed 6333.26 samples/sec Loss 4.6483 LearningRate 0.0003 Epoch: 21 Global Step: 456240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:42,936-Speed 6303.68 samples/sec Loss 4.6159 LearningRate 0.0003 Epoch: 21 Global Step: 456250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:46,180-Speed 6313.25 samples/sec Loss 4.5430 LearningRate 0.0002 Epoch: 21 Global Step: 456260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:13:49,425-Speed 6313.81 samples/sec Loss 4.6219 LearningRate 0.0002 Epoch: 21 Global Step: 456270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:14:49,300-Speed 342.05 samples/sec Loss 4.6096 LearningRate 0.0002 Epoch: 22 Global Step: 456280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:14:52,536-Speed 6330.65 samples/sec Loss 4.6160 LearningRate 0.0002 Epoch: 22 Global Step: 456290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:14:55,773-Speed 6327.04 samples/sec Loss 4.5779 LearningRate 0.0002 Epoch: 22 Global Step: 456300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:14:59,005-Speed 6339.51 samples/sec Loss 4.6576 LearningRate 0.0002 Epoch: 22 Global Step: 456310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:02,249-Speed 6315.23 samples/sec Loss 4.6678 LearningRate 0.0002 Epoch: 22 
Global Step: 456320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:05,483-Speed 6333.99 samples/sec Loss 4.5963 LearningRate 0.0002 Epoch: 22 Global Step: 456330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:08,735-Speed 6299.67 samples/sec Loss 4.5443 LearningRate 0.0002 Epoch: 22 Global Step: 456340 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:15:11,957-Speed 6357.66 samples/sec Loss 4.5690 LearningRate 0.0002 Epoch: 22 Global Step: 456350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:15,198-Speed 6320.50 samples/sec Loss 4.6101 LearningRate 0.0002 Epoch: 22 Global Step: 456360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:18,435-Speed 6327.85 samples/sec Loss 4.6601 LearningRate 0.0002 Epoch: 22 Global Step: 456370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:21,682-Speed 6309.91 samples/sec Loss 4.5899 LearningRate 0.0002 Epoch: 22 Global Step: 456380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:24,916-Speed 6332.45 samples/sec Loss 4.6206 LearningRate 0.0002 Epoch: 22 Global Step: 456390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:28,155-Speed 6324.62 samples/sec Loss 4.5598 LearningRate 0.0002 Epoch: 22 Global Step: 456400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:31,391-Speed 6329.51 samples/sec Loss 4.5489 LearningRate 0.0002 Epoch: 22 Global Step: 456410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:34,625-Speed 6335.06 samples/sec Loss 4.5466 LearningRate 0.0002 Epoch: 22 Global Step: 456420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:37,862-Speed 6329.01 samples/sec Loss 4.5060 LearningRate 0.0002 Epoch: 22 Global Step: 456430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:41,099-Speed 6328.73 samples/sec Loss 4.5541 LearningRate 0.0002 Epoch: 22 Global Step: 456440 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:44,338-Speed 6322.57 samples/sec Loss 4.5963 LearningRate 0.0002 Epoch: 22 Global Step: 456450 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:15:47,564-Speed 6351.21 samples/sec Loss 4.5653 LearningRate 0.0002 Epoch: 22 Global Step: 456460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:50,799-Speed 6331.70 samples/sec Loss 4.5845 LearningRate 0.0002 Epoch: 22 Global Step: 456470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:54,042-Speed 6317.19 samples/sec Loss 4.6127 LearningRate 0.0002 Epoch: 22 Global Step: 456480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:15:57,282-Speed 6320.89 samples/sec Loss 4.5311 LearningRate 0.0002 Epoch: 22 Global Step: 456490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:00,518-Speed 6330.83 samples/sec Loss 4.5440 LearningRate 0.0002 Epoch: 22 Global Step: 456500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:03,757-Speed 6324.33 samples/sec Loss 4.5786 LearningRate 0.0002 Epoch: 22 Global Step: 456510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:06,997-Speed 6322.43 samples/sec Loss 4.6179 LearningRate 0.0002 Epoch: 22 Global Step: 456520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:10,237-Speed 6322.51 samples/sec Loss 4.5500 LearningRate 0.0002 Epoch: 22 Global Step: 456530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:13,481-Speed 6314.15 samples/sec Loss 4.5235 LearningRate 0.0002 Epoch: 22 Global Step: 456540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:16,719-Speed 6327.45 samples/sec Loss 4.6285 LearningRate 0.0002 Epoch: 22 Global Step: 456550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:19,943-Speed 6355.28 samples/sec Loss 4.6272 LearningRate 0.0002 Epoch: 22 Global Step: 456560 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:16:23,180-Speed 6327.50 samples/sec Loss 4.5370 LearningRate 0.0002 Epoch: 22 Global Step: 456570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:26,420-Speed 6323.24 samples/sec Loss 4.5907 LearningRate 0.0002 Epoch: 22 Global Step: 456580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:29,656-Speed 6330.06 samples/sec Loss 4.6341 LearningRate 0.0002 Epoch: 22 Global Step: 456590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:32,889-Speed 6336.08 samples/sec Loss 4.6071 LearningRate 0.0002 Epoch: 22 Global Step: 456600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:36,127-Speed 6326.59 samples/sec Loss 4.4959 LearningRate 0.0002 Epoch: 22 Global Step: 456610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:39,365-Speed 6324.85 samples/sec Loss 4.5839 LearningRate 0.0002 Epoch: 22 Global Step: 456620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:42,604-Speed 6325.83 samples/sec Loss 4.6316 LearningRate 0.0002 Epoch: 22 Global Step: 456630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:45,847-Speed 6316.05 samples/sec Loss 4.6437 LearningRate 0.0002 Epoch: 22 Global Step: 456640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:49,081-Speed 6334.68 samples/sec Loss 4.5699 LearningRate 0.0002 Epoch: 22 Global Step: 456650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:52,302-Speed 6359.55 samples/sec Loss 4.6105 LearningRate 0.0002 Epoch: 22 Global Step: 456660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:55,532-Speed 6340.17 samples/sec Loss 4.5655 LearningRate 0.0002 Epoch: 22 Global Step: 456670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:16:58,766-Speed 6335.26 samples/sec Loss 4.5676 LearningRate 0.0002 Epoch: 22 Global Step: 456680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:17:02,005-Speed 6323.29 samples/sec Loss 4.4886 LearningRate 0.0002 Epoch: 22 Global Step: 456690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:05,239-Speed 6334.96 samples/sec Loss 4.6349 LearningRate 0.0002 Epoch: 22 Global Step: 456700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:08,477-Speed 6326.59 samples/sec Loss 4.6104 LearningRate 0.0002 Epoch: 22 Global Step: 456710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:11,709-Speed 6336.52 samples/sec Loss 4.5400 LearningRate 0.0002 Epoch: 22 Global Step: 456720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:14,945-Speed 6331.81 samples/sec Loss 4.5409 LearningRate 0.0002 Epoch: 22 Global Step: 456730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:18,182-Speed 6327.39 samples/sec Loss 4.5202 LearningRate 0.0002 Epoch: 22 Global Step: 456740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:21,422-Speed 6323.67 samples/sec Loss 4.5888 LearningRate 0.0002 Epoch: 22 Global Step: 456750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:24,659-Speed 6328.49 samples/sec Loss 4.5975 LearningRate 0.0002 Epoch: 22 Global Step: 456760 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:17:27,884-Speed 6351.72 samples/sec Loss 4.5573 LearningRate 0.0002 Epoch: 22 Global Step: 456770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:31,117-Speed 6335.37 samples/sec Loss 4.5956 LearningRate 0.0002 Epoch: 22 Global Step: 456780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:34,356-Speed 6326.25 samples/sec Loss 4.5862 LearningRate 0.0002 Epoch: 22 Global Step: 456790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:37,589-Speed 6335.72 samples/sec Loss 4.5988 LearningRate 0.0002 Epoch: 22 Global Step: 456800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:40,828-Speed 6323.16 
samples/sec Loss 4.5627 LearningRate 0.0002 Epoch: 22 Global Step: 456810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:44,060-Speed 6338.15 samples/sec Loss 4.5706 LearningRate 0.0002 Epoch: 22 Global Step: 456820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:47,297-Speed 6328.66 samples/sec Loss 4.5954 LearningRate 0.0002 Epoch: 22 Global Step: 456830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:50,532-Speed 6332.99 samples/sec Loss 4.6271 LearningRate 0.0002 Epoch: 22 Global Step: 456840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:53,769-Speed 6328.16 samples/sec Loss 4.5672 LearningRate 0.0002 Epoch: 22 Global Step: 456850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:17:57,019-Speed 6301.43 samples/sec Loss 4.6278 LearningRate 0.0002 Epoch: 22 Global Step: 456860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:00,246-Speed 6348.93 samples/sec Loss 4.5586 LearningRate 0.0002 Epoch: 22 Global Step: 456870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:03,488-Speed 6318.45 samples/sec Loss 4.6390 LearningRate 0.0002 Epoch: 22 Global Step: 456880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:06,722-Speed 6334.35 samples/sec Loss 4.5717 LearningRate 0.0002 Epoch: 22 Global Step: 456890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:09,957-Speed 6332.12 samples/sec Loss 4.5866 LearningRate 0.0002 Epoch: 22 Global Step: 456900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:13,194-Speed 6330.19 samples/sec Loss 4.6104 LearningRate 0.0002 Epoch: 22 Global Step: 456910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:16,455-Speed 6282.93 samples/sec Loss 4.6365 LearningRate 0.0002 Epoch: 22 Global Step: 456920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:19,689-Speed 6333.61 samples/sec Loss 4.6057 
LearningRate 0.0002 Epoch: 22 Global Step: 456930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:22,939-Speed 6303.41 samples/sec Loss 4.5293 LearningRate 0.0002 Epoch: 22 Global Step: 456940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:26,175-Speed 6329.48 samples/sec Loss 4.5641 LearningRate 0.0002 Epoch: 22 Global Step: 456950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:29,421-Speed 6309.82 samples/sec Loss 4.5006 LearningRate 0.0002 Epoch: 22 Global Step: 456960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:32,658-Speed 6330.50 samples/sec Loss 4.5583 LearningRate 0.0002 Epoch: 22 Global Step: 456970 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:18:35,894-Speed 6330.55 samples/sec Loss 4.5907 LearningRate 0.0002 Epoch: 22 Global Step: 456980 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:18:39,119-Speed 6351.23 samples/sec Loss 4.5794 LearningRate 0.0002 Epoch: 22 Global Step: 456990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:42,359-Speed 6322.46 samples/sec Loss 4.5859 LearningRate 0.0002 Epoch: 22 Global Step: 457000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:45,593-Speed 6335.01 samples/sec Loss 4.5482 LearningRate 0.0002 Epoch: 22 Global Step: 457010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:48,829-Speed 6329.43 samples/sec Loss 4.5498 LearningRate 0.0002 Epoch: 22 Global Step: 457020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:52,063-Speed 6333.70 samples/sec Loss 4.5309 LearningRate 0.0002 Epoch: 22 Global Step: 457030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:55,299-Speed 6331.45 samples/sec Loss 4.5952 LearningRate 0.0002 Epoch: 22 Global Step: 457040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:18:58,538-Speed 6323.77 samples/sec Loss 4.5915 LearningRate 0.0002 Epoch: 22 
Global Step: 457050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:01,776-Speed 6326.38 samples/sec Loss 4.6298 LearningRate 0.0002 Epoch: 22 Global Step: 457060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:05,015-Speed 6323.83 samples/sec Loss 4.5756 LearningRate 0.0002 Epoch: 22 Global Step: 457070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:08,249-Speed 6333.86 samples/sec Loss 4.6251 LearningRate 0.0002 Epoch: 22 Global Step: 457080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:11,472-Speed 6356.50 samples/sec Loss 4.5253 LearningRate 0.0002 Epoch: 22 Global Step: 457090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:14,708-Speed 6330.66 samples/sec Loss 4.5697 LearningRate 0.0002 Epoch: 22 Global Step: 457100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:17,943-Speed 6331.42 samples/sec Loss 4.6077 LearningRate 0.0002 Epoch: 22 Global Step: 457110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:21,184-Speed 6321.39 samples/sec Loss 4.6182 LearningRate 0.0002 Epoch: 22 Global Step: 457120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:24,427-Speed 6316.29 samples/sec Loss 4.5413 LearningRate 0.0002 Epoch: 22 Global Step: 457130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:27,665-Speed 6325.61 samples/sec Loss 4.5456 LearningRate 0.0002 Epoch: 22 Global Step: 457140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:30,906-Speed 6320.00 samples/sec Loss 4.6048 LearningRate 0.0002 Epoch: 22 Global Step: 457150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:34,143-Speed 6328.81 samples/sec Loss 4.5523 LearningRate 0.0002 Epoch: 22 Global Step: 457160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:37,379-Speed 6329.49 samples/sec Loss 4.6196 LearningRate 0.0002 Epoch: 22 Global Step: 457170 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:40,617-Speed 6327.87 samples/sec Loss 4.5486 LearningRate 0.0002 Epoch: 22 Global Step: 457180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:43,853-Speed 6329.57 samples/sec Loss 4.6055 LearningRate 0.0002 Epoch: 22 Global Step: 457190 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:19:47,075-Speed 6358.67 samples/sec Loss 4.5221 LearningRate 0.0002 Epoch: 22 Global Step: 457200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:50,313-Speed 6325.38 samples/sec Loss 4.5006 LearningRate 0.0002 Epoch: 22 Global Step: 457210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:53,551-Speed 6327.75 samples/sec Loss 4.6248 LearningRate 0.0002 Epoch: 22 Global Step: 457220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:19:56,787-Speed 6330.34 samples/sec Loss 4.6019 LearningRate 0.0002 Epoch: 22 Global Step: 457230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:00,025-Speed 6324.81 samples/sec Loss 4.5791 LearningRate 0.0002 Epoch: 22 Global Step: 457240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:03,264-Speed 6325.59 samples/sec Loss 4.6155 LearningRate 0.0002 Epoch: 22 Global Step: 457250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:06,500-Speed 6329.07 samples/sec Loss 4.5707 LearningRate 0.0002 Epoch: 22 Global Step: 457260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:09,737-Speed 6329.17 samples/sec Loss 4.5601 LearningRate 0.0002 Epoch: 22 Global Step: 457270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:12,975-Speed 6326.21 samples/sec Loss 4.5601 LearningRate 0.0002 Epoch: 22 Global Step: 457280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:16,210-Speed 6332.35 samples/sec Loss 4.5754 LearningRate 0.0002 Epoch: 22 Global Step: 457290 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:20:19,435-Speed 6350.80 samples/sec Loss 4.5575 LearningRate 0.0002 Epoch: 22 Global Step: 457300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:22,675-Speed 6322.74 samples/sec Loss 4.5140 LearningRate 0.0002 Epoch: 22 Global Step: 457310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:25,913-Speed 6325.81 samples/sec Loss 4.6242 LearningRate 0.0002 Epoch: 22 Global Step: 457320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:29,152-Speed 6324.02 samples/sec Loss 4.5830 LearningRate 0.0002 Epoch: 22 Global Step: 457330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:32,385-Speed 6336.77 samples/sec Loss 4.5652 LearningRate 0.0002 Epoch: 22 Global Step: 457340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:35,623-Speed 6326.57 samples/sec Loss 4.5323 LearningRate 0.0002 Epoch: 22 Global Step: 457350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:38,861-Speed 6326.08 samples/sec Loss 4.5493 LearningRate 0.0002 Epoch: 22 Global Step: 457360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:42,097-Speed 6330.44 samples/sec Loss 4.5574 LearningRate 0.0002 Epoch: 22 Global Step: 457370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:45,335-Speed 6326.56 samples/sec Loss 4.5683 LearningRate 0.0002 Epoch: 22 Global Step: 457380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:48,573-Speed 6327.27 samples/sec Loss 4.5871 LearningRate 0.0002 Epoch: 22 Global Step: 457390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:51,796-Speed 6355.48 samples/sec Loss 4.5914 LearningRate 0.0002 Epoch: 22 Global Step: 457400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:20:55,034-Speed 6325.58 samples/sec Loss 4.6314 LearningRate 0.0002 Epoch: 22 Global Step: 457410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:20:58,283-Speed 6306.09 samples/sec Loss 4.6121 LearningRate 0.0002 Epoch: 22 Global Step: 457420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:01,522-Speed 6323.86 samples/sec Loss 4.6234 LearningRate 0.0002 Epoch: 22 Global Step: 457430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:04,764-Speed 6318.93 samples/sec Loss 4.6326 LearningRate 0.0002 Epoch: 22 Global Step: 457440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:08,000-Speed 6330.35 samples/sec Loss 4.6063 LearningRate 0.0002 Epoch: 22 Global Step: 457450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:11,238-Speed 6325.21 samples/sec Loss 4.5398 LearningRate 0.0002 Epoch: 22 Global Step: 457460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:14,477-Speed 6325.36 samples/sec Loss 4.5782 LearningRate 0.0002 Epoch: 22 Global Step: 457470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:17,718-Speed 6319.30 samples/sec Loss 4.5579 LearningRate 0.0002 Epoch: 22 Global Step: 457480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:20,954-Speed 6331.13 samples/sec Loss 4.5652 LearningRate 0.0002 Epoch: 22 Global Step: 457490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:24,179-Speed 6351.79 samples/sec Loss 4.6091 LearningRate 0.0002 Epoch: 22 Global Step: 457500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:27,417-Speed 6325.67 samples/sec Loss 4.6205 LearningRate 0.0002 Epoch: 22 Global Step: 457510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:30,667-Speed 6302.42 samples/sec Loss 4.5796 LearningRate 0.0002 Epoch: 22 Global Step: 457520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:33,907-Speed 6324.32 samples/sec Loss 4.6201 LearningRate 0.0002 Epoch: 22 Global Step: 457530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:37,230-Speed 6164.32 
samples/sec Loss 4.5499 LearningRate 0.0002 Epoch: 22 Global Step: 457540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:40,512-Speed 6240.17 samples/sec Loss 4.5533 LearningRate 0.0002 Epoch: 22 Global Step: 457550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:43,752-Speed 6322.60 samples/sec Loss 4.5797 LearningRate 0.0002 Epoch: 22 Global Step: 457560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:46,987-Speed 6331.87 samples/sec Loss 4.5992 LearningRate 0.0002 Epoch: 22 Global Step: 457570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:50,226-Speed 6324.78 samples/sec Loss 4.5705 LearningRate 0.0002 Epoch: 22 Global Step: 457580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:53,492-Speed 6271.69 samples/sec Loss 4.5535 LearningRate 0.0002 Epoch: 22 Global Step: 457590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:21:56,739-Speed 6308.41 samples/sec Loss 4.5756 LearningRate 0.0002 Epoch: 22 Global Step: 457600 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:21:59,964-Speed 6353.30 samples/sec Loss 4.5565 LearningRate 0.0002 Epoch: 22 Global Step: 457610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:03,209-Speed 6313.05 samples/sec Loss 4.5114 LearningRate 0.0002 Epoch: 22 Global Step: 457620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:06,446-Speed 6327.39 samples/sec Loss 4.5912 LearningRate 0.0002 Epoch: 22 Global Step: 457630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:09,699-Speed 6298.27 samples/sec Loss 4.5312 LearningRate 0.0002 Epoch: 22 Global Step: 457640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:12,939-Speed 6322.88 samples/sec Loss 4.5589 LearningRate 0.0002 Epoch: 22 Global Step: 457650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:16,177-Speed 6326.71 samples/sec Loss 4.5922 
LearningRate 0.0002 Epoch: 22 Global Step: 457660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:19,427-Speed 6302.47 samples/sec Loss 4.6426 LearningRate 0.0002 Epoch: 22 Global Step: 457670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:22,664-Speed 6327.50 samples/sec Loss 4.6360 LearningRate 0.0002 Epoch: 22 Global Step: 457680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:25,908-Speed 6314.20 samples/sec Loss 4.6174 LearningRate 0.0002 Epoch: 22 Global Step: 457690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:29,145-Speed 6328.16 samples/sec Loss 4.6073 LearningRate 0.0002 Epoch: 22 Global Step: 457700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:32,370-Speed 6353.03 samples/sec Loss 4.5102 LearningRate 0.0002 Epoch: 22 Global Step: 457710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:35,610-Speed 6321.53 samples/sec Loss 4.5615 LearningRate 0.0002 Epoch: 22 Global Step: 457720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:38,850-Speed 6323.41 samples/sec Loss 4.5444 LearningRate 0.0002 Epoch: 22 Global Step: 457730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:42,094-Speed 6313.79 samples/sec Loss 4.5439 LearningRate 0.0002 Epoch: 22 Global Step: 457740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:45,334-Speed 6323.24 samples/sec Loss 4.4770 LearningRate 0.0002 Epoch: 22 Global Step: 457750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:48,570-Speed 6329.60 samples/sec Loss 4.5708 LearningRate 0.0002 Epoch: 22 Global Step: 457760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:51,812-Speed 6318.67 samples/sec Loss 4.5990 LearningRate 0.0002 Epoch: 22 Global Step: 457770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:55,051-Speed 6324.64 samples/sec Loss 4.6182 LearningRate 0.0002 Epoch: 22 
Global Step: 457780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:22:58,292-Speed 6320.06 samples/sec Loss 4.5840 LearningRate 0.0002 Epoch: 22 Global Step: 457790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:01,531-Speed 6323.44 samples/sec Loss 4.6092 LearningRate 0.0002 Epoch: 22 Global Step: 457800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:04,774-Speed 6316.14 samples/sec Loss 4.5727 LearningRate 0.0002 Epoch: 22 Global Step: 457810 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:23:09,274-Speed 4551.88 samples/sec Loss 4.5558 LearningRate 0.0002 Epoch: 22 Global Step: 457820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:12,515-Speed 6320.54 samples/sec Loss 4.5815 LearningRate 0.0002 Epoch: 22 Global Step: 457830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:15,756-Speed 6321.65 samples/sec Loss 4.5398 LearningRate 0.0002 Epoch: 22 Global Step: 457840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:18,996-Speed 6322.40 samples/sec Loss 4.5398 LearningRate 0.0002 Epoch: 22 Global Step: 457850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:22,244-Speed 6308.59 samples/sec Loss 4.5738 LearningRate 0.0002 Epoch: 22 Global Step: 457860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:25,484-Speed 6320.69 samples/sec Loss 4.5200 LearningRate 0.0002 Epoch: 22 Global Step: 457870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:28,727-Speed 6317.69 samples/sec Loss 4.6327 LearningRate 0.0002 Epoch: 22 Global Step: 457880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:31,971-Speed 6313.16 samples/sec Loss 4.5456 LearningRate 0.0002 Epoch: 22 Global Step: 457890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:35,212-Speed 6321.41 samples/sec Loss 4.6083 LearningRate 0.0002 Epoch: 22 Global Step: 457900 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:38,454-Speed 6318.79 samples/sec Loss 4.5685 LearningRate 0.0002 Epoch: 22 Global Step: 457910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:41,682-Speed 6345.48 samples/sec Loss 4.5225 LearningRate 0.0002 Epoch: 22 Global Step: 457920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:44,922-Speed 6322.45 samples/sec Loss 4.4946 LearningRate 0.0002 Epoch: 22 Global Step: 457930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:48,164-Speed 6318.17 samples/sec Loss 4.6285 LearningRate 0.0002 Epoch: 22 Global Step: 457940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:51,403-Speed 6325.89 samples/sec Loss 4.5926 LearningRate 0.0002 Epoch: 22 Global Step: 457950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:54,645-Speed 6317.70 samples/sec Loss 4.5765 LearningRate 0.0002 Epoch: 22 Global Step: 457960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:23:57,884-Speed 6323.59 samples/sec Loss 4.5245 LearningRate 0.0002 Epoch: 22 Global Step: 457970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:01,136-Speed 6299.92 samples/sec Loss 4.6193 LearningRate 0.0002 Epoch: 22 Global Step: 457980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:04,379-Speed 6317.35 samples/sec Loss 4.5873 LearningRate 0.0002 Epoch: 22 Global Step: 457990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:07,620-Speed 6320.07 samples/sec Loss 4.5473 LearningRate 0.0002 Epoch: 22 Global Step: 458000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:10,861-Speed 6320.09 samples/sec Loss 4.6552 LearningRate 0.0002 Epoch: 22 Global Step: 458010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:14,089-Speed 6344.90 samples/sec Loss 4.5808 LearningRate 0.0002 Epoch: 22 Global Step: 458020 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:24:17,330-Speed 6321.55 samples/sec Loss 4.5545 LearningRate 0.0002 Epoch: 22 Global Step: 458030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:20,575-Speed 6311.65 samples/sec Loss 4.6055 LearningRate 0.0002 Epoch: 22 Global Step: 458040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:23,818-Speed 6318.05 samples/sec Loss 4.5468 LearningRate 0.0002 Epoch: 22 Global Step: 458050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:27,056-Speed 6326.34 samples/sec Loss 4.6626 LearningRate 0.0002 Epoch: 22 Global Step: 458060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:30,300-Speed 6314.64 samples/sec Loss 4.5462 LearningRate 0.0002 Epoch: 22 Global Step: 458070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:33,540-Speed 6322.91 samples/sec Loss 4.6245 LearningRate 0.0002 Epoch: 22 Global Step: 458080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:36,779-Speed 6324.18 samples/sec Loss 4.5120 LearningRate 0.0002 Epoch: 22 Global Step: 458090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:40,022-Speed 6315.91 samples/sec Loss 4.6022 LearningRate 0.0002 Epoch: 22 Global Step: 458100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:43,268-Speed 6311.51 samples/sec Loss 4.6162 LearningRate 0.0002 Epoch: 22 Global Step: 458110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:46,493-Speed 6350.94 samples/sec Loss 4.6285 LearningRate 0.0002 Epoch: 22 Global Step: 458120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:49,733-Speed 6323.85 samples/sec Loss 4.5613 LearningRate 0.0002 Epoch: 22 Global Step: 458130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:52,976-Speed 6316.28 samples/sec Loss 4.6003 LearningRate 0.0002 Epoch: 22 Global Step: 458140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:24:56,219-Speed 6315.88 samples/sec Loss 4.5284 LearningRate 0.0002 Epoch: 22 Global Step: 458150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:24:59,461-Speed 6319.39 samples/sec Loss 4.5597 LearningRate 0.0002 Epoch: 22 Global Step: 458160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:02,708-Speed 6308.44 samples/sec Loss 4.6005 LearningRate 0.0002 Epoch: 22 Global Step: 458170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:05,945-Speed 6326.66 samples/sec Loss 4.6145 LearningRate 0.0002 Epoch: 22 Global Step: 458180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:09,186-Speed 6321.91 samples/sec Loss 4.6055 LearningRate 0.0002 Epoch: 22 Global Step: 458190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:12,426-Speed 6322.43 samples/sec Loss 4.5968 LearningRate 0.0002 Epoch: 22 Global Step: 458200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:15,677-Speed 6299.82 samples/sec Loss 4.5683 LearningRate 0.0002 Epoch: 22 Global Step: 458210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:18,917-Speed 6323.35 samples/sec Loss 4.6041 LearningRate 0.0002 Epoch: 22 Global Step: 458220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:22,161-Speed 6313.17 samples/sec Loss 4.5652 LearningRate 0.0002 Epoch: 22 Global Step: 458230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:25,407-Speed 6312.44 samples/sec Loss 4.4550 LearningRate 0.0002 Epoch: 22 Global Step: 458240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:28,652-Speed 6311.92 samples/sec Loss 4.5965 LearningRate 0.0002 Epoch: 22 Global Step: 458250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:31,892-Speed 6322.39 samples/sec Loss 4.4956 LearningRate 0.0002 Epoch: 22 Global Step: 458260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:35,135-Speed 6316.58 
samples/sec Loss 4.6142 LearningRate 0.0002 Epoch: 22 Global Step: 458270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:38,375-Speed 6323.68 samples/sec Loss 4.5753 LearningRate 0.0002 Epoch: 22 Global Step: 458280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:41,612-Speed 6328.72 samples/sec Loss 4.6327 LearningRate 0.0002 Epoch: 22 Global Step: 458290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:44,856-Speed 6313.13 samples/sec Loss 4.5928 LearningRate 0.0002 Epoch: 22 Global Step: 458300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:48,097-Speed 6321.94 samples/sec Loss 4.6442 LearningRate 0.0002 Epoch: 22 Global Step: 458310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:51,323-Speed 6349.60 samples/sec Loss 4.6149 LearningRate 0.0002 Epoch: 22 Global Step: 458320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:54,568-Speed 6311.22 samples/sec Loss 4.5542 LearningRate 0.0002 Epoch: 22 Global Step: 458330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:25:57,810-Speed 6319.38 samples/sec Loss 4.5448 LearningRate 0.0002 Epoch: 22 Global Step: 458340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:01,050-Speed 6322.13 samples/sec Loss 4.5678 LearningRate 0.0002 Epoch: 22 Global Step: 458350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:04,293-Speed 6317.44 samples/sec Loss 4.5869 LearningRate 0.0002 Epoch: 22 Global Step: 458360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:07,535-Speed 6318.67 samples/sec Loss 4.6170 LearningRate 0.0002 Epoch: 22 Global Step: 458370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:10,775-Speed 6321.32 samples/sec Loss 4.5536 LearningRate 0.0002 Epoch: 22 Global Step: 458380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:14,014-Speed 6324.23 samples/sec Loss 4.6278 
LearningRate 0.0002 Epoch: 22 Global Step: 458390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:17,282-Speed 6268.99 samples/sec Loss 4.5931 LearningRate 0.0002 Epoch: 22 Global Step: 458400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:20,533-Speed 6301.93 samples/sec Loss 4.5742 LearningRate 0.0002 Epoch: 22 Global Step: 458410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:23,774-Speed 6318.86 samples/sec Loss 4.5360 LearningRate 0.0002 Epoch: 22 Global Step: 458420 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:26:27,008-Speed 6335.34 samples/sec Loss 4.5759 LearningRate 0.0002 Epoch: 22 Global Step: 458430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:30,251-Speed 6315.21 samples/sec Loss 4.5565 LearningRate 0.0002 Epoch: 22 Global Step: 458440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:33,493-Speed 6319.72 samples/sec Loss 4.5094 LearningRate 0.0002 Epoch: 22 Global Step: 458450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:36,736-Speed 6317.33 samples/sec Loss 4.6240 LearningRate 0.0002 Epoch: 22 Global Step: 458460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:39,976-Speed 6323.04 samples/sec Loss 4.5300 LearningRate 0.0002 Epoch: 22 Global Step: 458470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:43,216-Speed 6321.98 samples/sec Loss 4.6051 LearningRate 0.0002 Epoch: 22 Global Step: 458480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:46,461-Speed 6311.89 samples/sec Loss 4.5548 LearningRate 0.0002 Epoch: 22 Global Step: 458490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:49,704-Speed 6316.47 samples/sec Loss 4.6324 LearningRate 0.0002 Epoch: 22 Global Step: 458500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:52,949-Speed 6313.05 samples/sec Loss 4.5644 LearningRate 0.0002 Epoch: 22 
Global Step: 458510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:56,194-Speed 6313.78 samples/sec Loss 4.5898 LearningRate 0.0002 Epoch: 22 Global Step: 458520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:26:59,421-Speed 6347.67 samples/sec Loss 4.5150 LearningRate 0.0002 Epoch: 22 Global Step: 458530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:02,663-Speed 6317.10 samples/sec Loss 4.5625 LearningRate 0.0002 Epoch: 22 Global Step: 458540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:05,907-Speed 6314.19 samples/sec Loss 4.5432 LearningRate 0.0002 Epoch: 22 Global Step: 458550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:09,153-Speed 6311.59 samples/sec Loss 4.5413 LearningRate 0.0002 Epoch: 22 Global Step: 458560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:12,396-Speed 6317.53 samples/sec Loss 4.5280 LearningRate 0.0002 Epoch: 22 Global Step: 458570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:15,640-Speed 6314.41 samples/sec Loss 4.5552 LearningRate 0.0002 Epoch: 22 Global Step: 458580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:18,883-Speed 6316.01 samples/sec Loss 4.5083 LearningRate 0.0002 Epoch: 22 Global Step: 458590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:22,125-Speed 6318.61 samples/sec Loss 4.5660 LearningRate 0.0002 Epoch: 22 Global Step: 458600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:25,374-Speed 6305.68 samples/sec Loss 4.5352 LearningRate 0.0002 Epoch: 22 Global Step: 458610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:28,612-Speed 6324.88 samples/sec Loss 4.5494 LearningRate 0.0002 Epoch: 22 Global Step: 458620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:31,840-Speed 6347.00 samples/sec Loss 4.5089 LearningRate 0.0002 Epoch: 22 Global Step: 458630 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:35,081-Speed 6318.66 samples/sec Loss 4.5703 LearningRate 0.0002 Epoch: 22 Global Step: 458640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:38,320-Speed 6325.20 samples/sec Loss 4.5924 LearningRate 0.0002 Epoch: 22 Global Step: 458650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:41,561-Speed 6320.87 samples/sec Loss 4.6194 LearningRate 0.0002 Epoch: 22 Global Step: 458660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:44,807-Speed 6310.29 samples/sec Loss 4.6017 LearningRate 0.0002 Epoch: 22 Global Step: 458670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:48,048-Speed 6320.70 samples/sec Loss 4.5986 LearningRate 0.0002 Epoch: 22 Global Step: 458680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:51,292-Speed 6314.30 samples/sec Loss 4.5611 LearningRate 0.0002 Epoch: 22 Global Step: 458690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:54,535-Speed 6318.24 samples/sec Loss 4.5798 LearningRate 0.0002 Epoch: 22 Global Step: 458700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:27:57,779-Speed 6314.10 samples/sec Loss 4.5036 LearningRate 0.0002 Epoch: 22 Global Step: 458710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:01,019-Speed 6322.84 samples/sec Loss 4.5847 LearningRate 0.0002 Epoch: 22 Global Step: 458720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:04,263-Speed 6313.98 samples/sec Loss 4.5889 LearningRate 0.0002 Epoch: 22 Global Step: 458730 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:28:07,492-Speed 6343.15 samples/sec Loss 4.6457 LearningRate 0.0002 Epoch: 22 Global Step: 458740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:10,735-Speed 6317.56 samples/sec Loss 4.6344 LearningRate 0.0002 Epoch: 22 Global Step: 458750 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:28:13,977-Speed 6317.76 samples/sec Loss 4.6268 LearningRate 0.0002 Epoch: 22 Global Step: 458760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:17,222-Speed 6313.59 samples/sec Loss 4.5150 LearningRate 0.0002 Epoch: 22 Global Step: 458770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:20,464-Speed 6318.54 samples/sec Loss 4.5317 LearningRate 0.0002 Epoch: 22 Global Step: 458780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:23,704-Speed 6321.82 samples/sec Loss 4.5275 LearningRate 0.0002 Epoch: 22 Global Step: 458790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:26,948-Speed 6314.94 samples/sec Loss 4.5646 LearningRate 0.0002 Epoch: 22 Global Step: 458800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:30,189-Speed 6320.09 samples/sec Loss 4.5472 LearningRate 0.0002 Epoch: 22 Global Step: 458810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:33,431-Speed 6318.75 samples/sec Loss 4.4791 LearningRate 0.0002 Epoch: 22 Global Step: 458820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:36,672-Speed 6320.36 samples/sec Loss 4.5832 LearningRate 0.0002 Epoch: 22 Global Step: 458830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:39,914-Speed 6317.93 samples/sec Loss 4.5333 LearningRate 0.0002 Epoch: 22 Global Step: 458840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:43,184-Speed 6264.67 samples/sec Loss 4.4820 LearningRate 0.0002 Epoch: 22 Global Step: 458850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:46,424-Speed 6322.95 samples/sec Loss 4.6134 LearningRate 0.0002 Epoch: 22 Global Step: 458860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:49,688-Speed 6275.70 samples/sec Loss 4.5898 LearningRate 0.0002 Epoch: 22 Global Step: 458870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:28:52,932-Speed 6314.25 samples/sec Loss 4.5858 LearningRate 0.0002 Epoch: 22 Global Step: 458880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:56,173-Speed 6319.72 samples/sec Loss 4.5586 LearningRate 0.0002 Epoch: 22 Global Step: 458890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:28:59,418-Speed 6313.70 samples/sec Loss 4.5536 LearningRate 0.0002 Epoch: 22 Global Step: 458900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:02,671-Speed 6298.38 samples/sec Loss 4.5172 LearningRate 0.0002 Epoch: 22 Global Step: 458910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:05,912-Speed 6320.29 samples/sec Loss 4.5399 LearningRate 0.0002 Epoch: 22 Global Step: 458920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:09,153-Speed 6320.87 samples/sec Loss 4.5677 LearningRate 0.0002 Epoch: 22 Global Step: 458930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:12,395-Speed 6317.06 samples/sec Loss 4.6005 LearningRate 0.0002 Epoch: 22 Global Step: 458940 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:29:15,622-Speed 6348.72 samples/sec Loss 4.5640 LearningRate 0.0002 Epoch: 22 Global Step: 458950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:18,866-Speed 6314.13 samples/sec Loss 4.5956 LearningRate 0.0002 Epoch: 22 Global Step: 458960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:22,107-Speed 6321.06 samples/sec Loss 4.5584 LearningRate 0.0002 Epoch: 22 Global Step: 458970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:25,349-Speed 6317.55 samples/sec Loss 4.5070 LearningRate 0.0002 Epoch: 22 Global Step: 458980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:28,600-Speed 6301.19 samples/sec Loss 4.6065 LearningRate 0.0002 Epoch: 22 Global Step: 458990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:31,844-Speed 6315.03 
samples/sec Loss 4.6004 LearningRate 0.0002 Epoch: 22 Global Step: 459000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:35,085-Speed 6321.75 samples/sec Loss 4.5562 LearningRate 0.0002 Epoch: 22 Global Step: 459010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:38,331-Speed 6310.14 samples/sec Loss 4.5434 LearningRate 0.0002 Epoch: 22 Global Step: 459020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:41,573-Speed 6318.22 samples/sec Loss 4.5892 LearningRate 0.0002 Epoch: 22 Global Step: 459030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:44,817-Speed 6313.77 samples/sec Loss 4.5251 LearningRate 0.0002 Epoch: 22 Global Step: 459040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:48,048-Speed 6340.85 samples/sec Loss 4.5883 LearningRate 0.0002 Epoch: 22 Global Step: 459050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:51,288-Speed 6322.19 samples/sec Loss 4.4878 LearningRate 0.0002 Epoch: 22 Global Step: 459060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:54,538-Speed 6302.35 samples/sec Loss 4.5797 LearningRate 0.0002 Epoch: 22 Global Step: 459070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:29:57,779-Speed 6321.08 samples/sec Loss 4.5861 LearningRate 0.0002 Epoch: 22 Global Step: 459080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:01,023-Speed 6314.54 samples/sec Loss 4.5455 LearningRate 0.0002 Epoch: 22 Global Step: 459090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:04,265-Speed 6317.61 samples/sec Loss 4.5836 LearningRate 0.0002 Epoch: 22 Global Step: 459100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:07,511-Speed 6313.39 samples/sec Loss 4.5998 LearningRate 0.0002 Epoch: 22 Global Step: 459110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:10,754-Speed 6316.36 samples/sec Loss 4.6112 
LearningRate 0.0002 Epoch: 22 Global Step: 459120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:13,997-Speed 6317.13 samples/sec Loss 4.5640 LearningRate 0.0002 Epoch: 22 Global Step: 459130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:17,241-Speed 6312.97 samples/sec Loss 4.5671 LearningRate 0.0002 Epoch: 22 Global Step: 459140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:20,469-Speed 6346.70 samples/sec Loss 4.6644 LearningRate 0.0002 Epoch: 22 Global Step: 459150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:23,711-Speed 6318.84 samples/sec Loss 4.5709 LearningRate 0.0002 Epoch: 22 Global Step: 459160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:26,954-Speed 6315.35 samples/sec Loss 4.5518 LearningRate 0.0002 Epoch: 22 Global Step: 459170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:30,196-Speed 6318.78 samples/sec Loss 4.5925 LearningRate 0.0002 Epoch: 22 Global Step: 459180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:33,437-Speed 6321.14 samples/sec Loss 4.5557 LearningRate 0.0002 Epoch: 22 Global Step: 459190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:36,678-Speed 6321.21 samples/sec Loss 4.5196 LearningRate 0.0002 Epoch: 22 Global Step: 459200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:39,925-Speed 6307.58 samples/sec Loss 4.4666 LearningRate 0.0002 Epoch: 22 Global Step: 459210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:43,173-Speed 6307.35 samples/sec Loss 4.5441 LearningRate 0.0002 Epoch: 22 Global Step: 459220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:46,418-Speed 6312.69 samples/sec Loss 4.5185 LearningRate 0.0002 Epoch: 22 Global Step: 459230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:49,660-Speed 6318.07 samples/sec Loss 4.5150 LearningRate 0.0002 Epoch: 22 
Global Step: 459240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:52,890-Speed 6341.15 samples/sec Loss 4.5025 LearningRate 0.0002 Epoch: 22 Global Step: 459250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:56,134-Speed 6316.10 samples/sec Loss 4.5492 LearningRate 0.0002 Epoch: 22 Global Step: 459260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:30:59,380-Speed 6309.55 samples/sec Loss 4.5864 LearningRate 0.0002 Epoch: 22 Global Step: 459270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:02,632-Speed 6299.24 samples/sec Loss 4.5864 LearningRate 0.0002 Epoch: 22 Global Step: 459280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:05,890-Speed 6288.99 samples/sec Loss 4.5337 LearningRate 0.0002 Epoch: 22 Global Step: 459290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:09,139-Speed 6303.01 samples/sec Loss 4.5838 LearningRate 0.0002 Epoch: 22 Global Step: 459300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:12,387-Speed 6307.83 samples/sec Loss 4.5257 LearningRate 0.0002 Epoch: 22 Global Step: 459310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:15,630-Speed 6316.86 samples/sec Loss 4.6731 LearningRate 0.0002 Epoch: 22 Global Step: 459320 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:18,875-Speed 6312.32 samples/sec Loss 4.5659 LearningRate 0.0002 Epoch: 22 Global Step: 459330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:22,117-Speed 6319.80 samples/sec Loss 4.5761 LearningRate 0.0002 Epoch: 22 Global Step: 459340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:25,373-Speed 6290.49 samples/sec Loss 4.6085 LearningRate 0.0002 Epoch: 22 Global Step: 459350 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:31:28,607-Speed 6333.94 samples/sec Loss 4.5542 LearningRate 0.0002 Epoch: 22 Global Step: 459360 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:31,858-Speed 6302.10 samples/sec Loss 4.6045 LearningRate 0.0002 Epoch: 22 Global Step: 459370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:35,101-Speed 6316.64 samples/sec Loss 4.5551 LearningRate 0.0002 Epoch: 22 Global Step: 459380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:38,358-Speed 6288.60 samples/sec Loss 4.6078 LearningRate 0.0002 Epoch: 22 Global Step: 459390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:41,602-Speed 6315.55 samples/sec Loss 4.5942 LearningRate 0.0002 Epoch: 22 Global Step: 459400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:44,843-Speed 6318.94 samples/sec Loss 4.5713 LearningRate 0.0002 Epoch: 22 Global Step: 459410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:48,085-Speed 6317.96 samples/sec Loss 4.5464 LearningRate 0.0002 Epoch: 22 Global Step: 459420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:51,331-Speed 6313.87 samples/sec Loss 4.5380 LearningRate 0.0002 Epoch: 22 Global Step: 459430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:54,574-Speed 6315.55 samples/sec Loss 4.5755 LearningRate 0.0002 Epoch: 22 Global Step: 459440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:31:57,820-Speed 6309.93 samples/sec Loss 4.5724 LearningRate 0.0002 Epoch: 22 Global Step: 459450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:01,071-Speed 6302.17 samples/sec Loss 4.5743 LearningRate 0.0002 Epoch: 22 Global Step: 459460 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:32:04,305-Speed 6334.32 samples/sec Loss 4.5857 LearningRate 0.0002 Epoch: 22 Global Step: 459470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:07,547-Speed 6318.38 samples/sec Loss 4.5687 LearningRate 0.0002 Epoch: 22 Global Step: 459480 Fp16 Grad Scale: 16384 Required: 34 hours 
Training: 2022-04-02 09:32:10,788-Speed 6320.53 samples/sec Loss 4.5405 LearningRate 0.0002 Epoch: 22 Global Step: 459490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:14,032-Speed 6313.55 samples/sec Loss 4.6480 LearningRate 0.0002 Epoch: 22 Global Step: 459500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:17,278-Speed 6311.08 samples/sec Loss 4.5440 LearningRate 0.0002 Epoch: 22 Global Step: 459510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:20,522-Speed 6314.59 samples/sec Loss 4.4781 LearningRate 0.0002 Epoch: 22 Global Step: 459520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:23,766-Speed 6316.63 samples/sec Loss 4.5908 LearningRate 0.0002 Epoch: 22 Global Step: 459530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:27,012-Speed 6310.81 samples/sec Loss 4.5505 LearningRate 0.0002 Epoch: 22 Global Step: 459540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:30,255-Speed 6315.52 samples/sec Loss 4.5025 LearningRate 0.0002 Epoch: 22 Global Step: 459550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:33,497-Speed 6319.10 samples/sec Loss 4.4993 LearningRate 0.0002 Epoch: 22 Global Step: 459560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:36,735-Speed 6326.27 samples/sec Loss 4.6160 LearningRate 0.0002 Epoch: 22 Global Step: 459570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:39,989-Speed 6294.07 samples/sec Loss 4.5350 LearningRate 0.0002 Epoch: 22 Global Step: 459580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:43,237-Speed 6308.46 samples/sec Loss 4.4710 LearningRate 0.0002 Epoch: 22 Global Step: 459590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:46,486-Speed 6304.09 samples/sec Loss 4.5202 LearningRate 0.0002 Epoch: 22 Global Step: 459600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:32:49,731-Speed 6312.37 samples/sec Loss 4.5177 LearningRate 0.0002 Epoch: 22 Global Step: 459610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:52,978-Speed 6309.55 samples/sec Loss 4.5172 LearningRate 0.0002 Epoch: 22 Global Step: 459620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:56,218-Speed 6320.99 samples/sec Loss 4.5593 LearningRate 0.0002 Epoch: 22 Global Step: 459630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:32:59,464-Speed 6310.99 samples/sec Loss 4.5869 LearningRate 0.0002 Epoch: 22 Global Step: 459640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:02,709-Speed 6312.88 samples/sec Loss 4.5710 LearningRate 0.0002 Epoch: 22 Global Step: 459650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:05,953-Speed 6315.32 samples/sec Loss 4.5183 LearningRate 0.0002 Epoch: 22 Global Step: 459660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:09,199-Speed 6310.76 samples/sec Loss 4.6272 LearningRate 0.0002 Epoch: 22 Global Step: 459670 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:33:12,438-Speed 6324.72 samples/sec Loss 4.4804 LearningRate 0.0002 Epoch: 22 Global Step: 459680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:16,989-Speed 4500.42 samples/sec Loss 4.5195 LearningRate 0.0002 Epoch: 22 Global Step: 459690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:20,233-Speed 6313.86 samples/sec Loss 4.5077 LearningRate 0.0002 Epoch: 22 Global Step: 459700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:23,478-Speed 6314.02 samples/sec Loss 4.5545 LearningRate 0.0002 Epoch: 22 Global Step: 459710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:26,722-Speed 6314.92 samples/sec Loss 4.6463 LearningRate 0.0002 Epoch: 22 Global Step: 459720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:29,971-Speed 6303.89 
samples/sec Loss 4.5490 LearningRate 0.0002 Epoch: 22 Global Step: 459730 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:33,219-Speed 6307.35 samples/sec Loss 4.5141 LearningRate 0.0002 Epoch: 22 Global Step: 459740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:36,464-Speed 6314.24 samples/sec Loss 4.5741 LearningRate 0.0002 Epoch: 22 Global Step: 459750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:39,721-Speed 6288.24 samples/sec Loss 4.5431 LearningRate 0.0002 Epoch: 22 Global Step: 459760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:42,986-Speed 6274.82 samples/sec Loss 4.4691 LearningRate 0.0002 Epoch: 22 Global Step: 459770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:46,216-Speed 6342.32 samples/sec Loss 4.5094 LearningRate 0.0002 Epoch: 22 Global Step: 459780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:49,462-Speed 6309.48 samples/sec Loss 4.5360 LearningRate 0.0002 Epoch: 22 Global Step: 459790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:52,709-Speed 6308.20 samples/sec Loss 4.5674 LearningRate 0.0002 Epoch: 22 Global Step: 459800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:55,953-Speed 6315.48 samples/sec Loss 4.5276 LearningRate 0.0002 Epoch: 22 Global Step: 459810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:33:59,197-Speed 6315.35 samples/sec Loss 4.5246 LearningRate 0.0002 Epoch: 22 Global Step: 459820 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:02,439-Speed 6318.15 samples/sec Loss 4.5861 LearningRate 0.0002 Epoch: 22 Global Step: 459830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:05,683-Speed 6313.77 samples/sec Loss 4.6334 LearningRate 0.0002 Epoch: 22 Global Step: 459840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:08,928-Speed 6312.83 samples/sec Loss 4.5497 
LearningRate 0.0002 Epoch: 22 Global Step: 459850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:12,179-Speed 6302.11 samples/sec Loss 4.5315 LearningRate 0.0002 Epoch: 22 Global Step: 459860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:15,427-Speed 6305.51 samples/sec Loss 4.6025 LearningRate 0.0002 Epoch: 22 Global Step: 459870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:18,675-Speed 6308.06 samples/sec Loss 4.6120 LearningRate 0.0002 Epoch: 22 Global Step: 459880 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:34:21,918-Speed 6316.02 samples/sec Loss 4.5295 LearningRate 0.0002 Epoch: 22 Global Step: 459890 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:34:25,167-Speed 6305.92 samples/sec Loss 4.5606 LearningRate 0.0002 Epoch: 22 Global Step: 459900 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:34:28,402-Speed 6331.85 samples/sec Loss 4.5838 LearningRate 0.0002 Epoch: 22 Global Step: 459910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:31,644-Speed 6318.88 samples/sec Loss 4.5645 LearningRate 0.0002 Epoch: 22 Global Step: 459920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:34,889-Speed 6312.72 samples/sec Loss 4.6662 LearningRate 0.0002 Epoch: 22 Global Step: 459930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:38,135-Speed 6309.90 samples/sec Loss 4.5682 LearningRate 0.0002 Epoch: 22 Global Step: 459940 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:41,378-Speed 6316.94 samples/sec Loss 4.5264 LearningRate 0.0002 Epoch: 22 Global Step: 459950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:44,624-Speed 6312.06 samples/sec Loss 4.5840 LearningRate 0.0002 Epoch: 22 Global Step: 459960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:47,871-Speed 6307.49 samples/sec Loss 4.5789 LearningRate 0.0002 Epoch: 22 
Global Step: 459970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:51,113-Speed 6319.68 samples/sec Loss 4.5411 LearningRate 0.0002 Epoch: 22 Global Step: 459980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:54,358-Speed 6312.97 samples/sec Loss 4.6063 LearningRate 0.0002 Epoch: 22 Global Step: 459990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:34:57,606-Speed 6306.57 samples/sec Loss 4.6162 LearningRate 0.0002 Epoch: 22 Global Step: 460000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:00,836-Speed 6342.02 samples/sec Loss 4.5810 LearningRate 0.0002 Epoch: 22 Global Step: 460010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:04,081-Speed 6312.65 samples/sec Loss 4.6006 LearningRate 0.0002 Epoch: 22 Global Step: 460020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:07,330-Speed 6304.77 samples/sec Loss 4.5160 LearningRate 0.0002 Epoch: 22 Global Step: 460030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:10,574-Speed 6313.76 samples/sec Loss 4.5718 LearningRate 0.0002 Epoch: 22 Global Step: 460040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:13,815-Speed 6319.72 samples/sec Loss 4.5291 LearningRate 0.0002 Epoch: 22 Global Step: 460050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:17,082-Speed 6273.56 samples/sec Loss 4.5863 LearningRate 0.0002 Epoch: 22 Global Step: 460060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:20,348-Speed 6271.09 samples/sec Loss 4.6136 LearningRate 0.0002 Epoch: 22 Global Step: 460070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:23,598-Speed 6302.79 samples/sec Loss 4.5133 LearningRate 0.0002 Epoch: 22 Global Step: 460080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:26,840-Speed 6318.22 samples/sec Loss 4.5382 LearningRate 0.0002 Epoch: 22 Global Step: 460090 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:30,084-Speed 6315.29 samples/sec Loss 4.5411 LearningRate 0.0002 Epoch: 22 Global Step: 460100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:33,313-Speed 6344.77 samples/sec Loss 4.4602 LearningRate 0.0002 Epoch: 22 Global Step: 460110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:36,563-Speed 6302.38 samples/sec Loss 4.5592 LearningRate 0.0002 Epoch: 22 Global Step: 460120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:39,807-Speed 6314.70 samples/sec Loss 4.5584 LearningRate 0.0002 Epoch: 22 Global Step: 460130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:43,053-Speed 6310.70 samples/sec Loss 4.5619 LearningRate 0.0002 Epoch: 22 Global Step: 460140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:46,298-Speed 6313.64 samples/sec Loss 4.5519 LearningRate 0.0002 Epoch: 22 Global Step: 460150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:49,542-Speed 6315.65 samples/sec Loss 4.5772 LearningRate 0.0002 Epoch: 22 Global Step: 460160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:52,786-Speed 6314.34 samples/sec Loss 4.4999 LearningRate 0.0002 Epoch: 22 Global Step: 460170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:56,038-Speed 6297.73 samples/sec Loss 4.5436 LearningRate 0.0002 Epoch: 22 Global Step: 460180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:35:59,296-Speed 6288.58 samples/sec Loss 4.5404 LearningRate 0.0002 Epoch: 22 Global Step: 460190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:02,541-Speed 6312.34 samples/sec Loss 4.5169 LearningRate 0.0002 Epoch: 22 Global Step: 460200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:05,790-Speed 6304.99 samples/sec Loss 4.5092 LearningRate 0.0002 Epoch: 22 Global Step: 460210 Fp16 Grad Scale: 32768 Required: 34 hours 
Training: 2022-04-02 09:36:09,020-Speed 6341.49 samples/sec Loss 4.5447 LearningRate 0.0002 Epoch: 22 Global Step: 460220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:12,266-Speed 6311.14 samples/sec Loss 4.5783 LearningRate 0.0002 Epoch: 22 Global Step: 460230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:15,512-Speed 6309.24 samples/sec Loss 4.5205 LearningRate 0.0002 Epoch: 22 Global Step: 460240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:18,773-Speed 6281.90 samples/sec Loss 4.5988 LearningRate 0.0002 Epoch: 22 Global Step: 460250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:22,024-Speed 6301.51 samples/sec Loss 4.5101 LearningRate 0.0002 Epoch: 22 Global Step: 460260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:25,272-Speed 6307.11 samples/sec Loss 4.5336 LearningRate 0.0002 Epoch: 22 Global Step: 460270 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:28,519-Speed 6308.04 samples/sec Loss 4.6306 LearningRate 0.0002 Epoch: 22 Global Step: 460280 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:31,763-Speed 6314.95 samples/sec Loss 4.5821 LearningRate 0.0002 Epoch: 22 Global Step: 460290 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:35,008-Speed 6311.85 samples/sec Loss 4.5743 LearningRate 0.0002 Epoch: 22 Global Step: 460300 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:38,254-Speed 6312.33 samples/sec Loss 4.5617 LearningRate 0.0002 Epoch: 22 Global Step: 460310 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:41,500-Speed 6310.64 samples/sec Loss 4.5466 LearningRate 0.0002 Epoch: 22 Global Step: 460320 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:36:44,736-Speed 6331.39 samples/sec Loss 4.5516 LearningRate 0.0002 Epoch: 22 Global Step: 460330 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:36:47,978-Speed 6318.50 samples/sec Loss 4.5745 LearningRate 0.0002 Epoch: 22 Global Step: 460340 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:51,224-Speed 6310.13 samples/sec Loss 4.6089 LearningRate 0.0002 Epoch: 22 Global Step: 460350 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:54,468-Speed 6314.84 samples/sec Loss 4.5469 LearningRate 0.0002 Epoch: 22 Global Step: 460360 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:36:57,714-Speed 6311.26 samples/sec Loss 4.5627 LearningRate 0.0002 Epoch: 22 Global Step: 460370 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:00,960-Speed 6309.53 samples/sec Loss 4.5685 LearningRate 0.0002 Epoch: 22 Global Step: 460380 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:04,210-Speed 6303.46 samples/sec Loss 4.5787 LearningRate 0.0002 Epoch: 22 Global Step: 460390 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:07,454-Speed 6315.23 samples/sec Loss 4.5672 LearningRate 0.0002 Epoch: 22 Global Step: 460400 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:10,702-Speed 6305.27 samples/sec Loss 4.5495 LearningRate 0.0002 Epoch: 22 Global Step: 460410 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:13,973-Speed 6262.34 samples/sec Loss 4.5742 LearningRate 0.0002 Epoch: 22 Global Step: 460420 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:17,218-Speed 6313.79 samples/sec Loss 4.4969 LearningRate 0.0002 Epoch: 22 Global Step: 460430 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:20,462-Speed 6313.87 samples/sec Loss 4.5418 LearningRate 0.0002 Epoch: 22 Global Step: 460440 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:23,707-Speed 6313.87 samples/sec Loss 4.5452 LearningRate 0.0002 Epoch: 22 Global Step: 460450 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:26,954-Speed 6308.00 
samples/sec Loss 4.5936 LearningRate 0.0002 Epoch: 22 Global Step: 460460 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:30,203-Speed 6305.84 samples/sec Loss 4.5197 LearningRate 0.0002 Epoch: 22 Global Step: 460470 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:33,452-Speed 6303.50 samples/sec Loss 4.5270 LearningRate 0.0002 Epoch: 22 Global Step: 460480 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:36,698-Speed 6310.19 samples/sec Loss 4.5309 LearningRate 0.0002 Epoch: 22 Global Step: 460490 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:39,944-Speed 6311.65 samples/sec Loss 4.4650 LearningRate 0.0002 Epoch: 22 Global Step: 460500 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:43,186-Speed 6318.73 samples/sec Loss 4.5401 LearningRate 0.0002 Epoch: 22 Global Step: 460510 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:46,430-Speed 6314.44 samples/sec Loss 4.5544 LearningRate 0.0002 Epoch: 22 Global Step: 460520 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:49,658-Speed 6346.02 samples/sec Loss 4.6096 LearningRate 0.0002 Epoch: 22 Global Step: 460530 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:52,907-Speed 6305.82 samples/sec Loss 4.5671 LearningRate 0.0002 Epoch: 22 Global Step: 460540 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:56,155-Speed 6306.29 samples/sec Loss 4.5602 LearningRate 0.0002 Epoch: 22 Global Step: 460550 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:37:59,400-Speed 6312.96 samples/sec Loss 4.5118 LearningRate 0.0002 Epoch: 22 Global Step: 460560 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:02,647-Speed 6309.36 samples/sec Loss 4.4927 LearningRate 0.0002 Epoch: 22 Global Step: 460570 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:05,889-Speed 6318.61 samples/sec Loss 4.5426 
LearningRate 0.0002 Epoch: 22 Global Step: 460580 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:09,137-Speed 6306.45 samples/sec Loss 4.5695 LearningRate 0.0002 Epoch: 22 Global Step: 460590 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:12,484-Speed 6120.41 samples/sec Loss 4.4824 LearningRate 0.0002 Epoch: 22 Global Step: 460600 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:15,742-Speed 6287.74 samples/sec Loss 4.5594 LearningRate 0.0002 Epoch: 22 Global Step: 460610 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:18,985-Speed 6316.35 samples/sec Loss 4.5615 LearningRate 0.0002 Epoch: 22 Global Step: 460620 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:22,215-Speed 6341.97 samples/sec Loss 4.6123 LearningRate 0.0002 Epoch: 22 Global Step: 460630 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:25,459-Speed 6314.51 samples/sec Loss 4.5350 LearningRate 0.0002 Epoch: 22 Global Step: 460640 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:28,705-Speed 6310.68 samples/sec Loss 4.5357 LearningRate 0.0002 Epoch: 22 Global Step: 460650 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:31,953-Speed 6307.02 samples/sec Loss 4.5732 LearningRate 0.0002 Epoch: 22 Global Step: 460660 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:35,197-Speed 6314.71 samples/sec Loss 4.5717 LearningRate 0.0002 Epoch: 22 Global Step: 460670 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:38,442-Speed 6312.29 samples/sec Loss 4.5796 LearningRate 0.0002 Epoch: 22 Global Step: 460680 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:41,687-Speed 6312.86 samples/sec Loss 4.5143 LearningRate 0.0002 Epoch: 22 Global Step: 460690 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:44,934-Speed 6307.81 samples/sec Loss 4.4852 LearningRate 0.0002 Epoch: 22 
Global Step: 460700 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:48,184-Speed 6303.06 samples/sec Loss 4.5884 LearningRate 0.0002 Epoch: 22 Global Step: 460710 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:51,426-Speed 6318.39 samples/sec Loss 4.5627 LearningRate 0.0002 Epoch: 22 Global Step: 460720 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:38:54,672-Speed 6310.19 samples/sec Loss 4.6393 LearningRate 0.0002 Epoch: 22 Global Step: 460730 Fp16 Grad Scale: 32768 Required: 34 hours Training: 2022-04-02 09:38:57,905-Speed 6338.18 samples/sec Loss 4.5606 LearningRate 0.0002 Epoch: 22 Global Step: 460740 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:01,149-Speed 6313.62 samples/sec Loss 4.5587 LearningRate 0.0002 Epoch: 22 Global Step: 460750 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:04,396-Speed 6310.08 samples/sec Loss 4.5778 LearningRate 0.0002 Epoch: 22 Global Step: 460760 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:07,638-Speed 6317.37 samples/sec Loss 4.5588 LearningRate 0.0002 Epoch: 22 Global Step: 460770 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:10,888-Speed 6303.77 samples/sec Loss 4.4887 LearningRate 0.0002 Epoch: 22 Global Step: 460780 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:14,131-Speed 6316.74 samples/sec Loss 4.5338 LearningRate 0.0002 Epoch: 22 Global Step: 460790 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:17,374-Speed 6316.79 samples/sec Loss 4.5607 LearningRate 0.0002 Epoch: 22 Global Step: 460800 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:20,620-Speed 6310.63 samples/sec Loss 4.5122 LearningRate 0.0002 Epoch: 22 Global Step: 460810 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:23,867-Speed 6309.10 samples/sec Loss 4.5565 LearningRate 0.0002 Epoch: 22 Global Step: 460820 Fp16 Grad 
Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:27,112-Speed 6311.57 samples/sec Loss 4.5832 LearningRate 0.0002 Epoch: 22 Global Step: 460830 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:30,343-Speed 6341.26 samples/sec Loss 4.5685 LearningRate 0.0002 Epoch: 22 Global Step: 460840 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:33,588-Speed 6312.16 samples/sec Loss 4.5646 LearningRate 0.0002 Epoch: 22 Global Step: 460850 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:36,836-Speed 6306.70 samples/sec Loss 4.5227 LearningRate 0.0002 Epoch: 22 Global Step: 460860 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:40,081-Speed 6312.45 samples/sec Loss 4.5360 LearningRate 0.0002 Epoch: 22 Global Step: 460870 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:43,327-Speed 6310.44 samples/sec Loss 4.5128 LearningRate 0.0002 Epoch: 22 Global Step: 460880 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:46,584-Speed 6290.13 samples/sec Loss 4.4372 LearningRate 0.0002 Epoch: 22 Global Step: 460890 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:49,830-Speed 6310.40 samples/sec Loss 4.5066 LearningRate 0.0002 Epoch: 22 Global Step: 460900 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:53,075-Speed 6312.21 samples/sec Loss 4.5416 LearningRate 0.0002 Epoch: 22 Global Step: 460910 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:56,314-Speed 6324.71 samples/sec Loss 4.5296 LearningRate 0.0002 Epoch: 22 Global Step: 460920 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:39:59,559-Speed 6313.58 samples/sec Loss 4.6115 LearningRate 0.0002 Epoch: 22 Global Step: 460930 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:02,802-Speed 6315.12 samples/sec Loss 4.5157 LearningRate 0.0002 Epoch: 22 Global Step: 460940 Fp16 Grad Scale: 32768 Required: 34 hours 
Training: 2022-04-02 09:40:06,035-Speed 6336.08 samples/sec Loss 4.6018 LearningRate 0.0002 Epoch: 22 Global Step: 460950 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:09,283-Speed 6307.90 samples/sec Loss 4.5498 LearningRate 0.0002 Epoch: 22 Global Step: 460960 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:12,586-Speed 6202.53 samples/sec Loss 4.5479 LearningRate 0.0002 Epoch: 22 Global Step: 460970 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:15,848-Speed 6279.85 samples/sec Loss 4.5614 LearningRate 0.0002 Epoch: 22 Global Step: 460980 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:19,091-Speed 6316.83 samples/sec Loss 4.5323 LearningRate 0.0002 Epoch: 22 Global Step: 460990 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:22,348-Speed 6287.88 samples/sec Loss 4.5596 LearningRate 0.0002 Epoch: 22 Global Step: 461000 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:25,606-Speed 6288.56 samples/sec Loss 4.5723 LearningRate 0.0002 Epoch: 22 Global Step: 461010 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:28,853-Speed 6308.80 samples/sec Loss 4.5150 LearningRate 0.0002 Epoch: 22 Global Step: 461020 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:32,097-Speed 6313.16 samples/sec Loss 4.5741 LearningRate 0.0002 Epoch: 22 Global Step: 461030 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:35,337-Speed 6323.03 samples/sec Loss 4.5034 LearningRate 0.0002 Epoch: 22 Global Step: 461040 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:38,569-Speed 6338.72 samples/sec Loss 4.5723 LearningRate 0.0002 Epoch: 22 Global Step: 461050 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:41,847-Speed 6249.32 samples/sec Loss 4.5686 LearningRate 0.0002 Epoch: 22 Global Step: 461060 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 
09:40:45,090-Speed 6315.12 samples/sec Loss 4.5645 LearningRate 0.0002 Epoch: 22 Global Step: 461070 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:48,336-Speed 6312.55 samples/sec Loss 4.5540 LearningRate 0.0002 Epoch: 22 Global Step: 461080 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:51,580-Speed 6313.87 samples/sec Loss 4.6097 LearningRate 0.0002 Epoch: 22 Global Step: 461090 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:54,826-Speed 6311.40 samples/sec Loss 4.5385 LearningRate 0.0002 Epoch: 22 Global Step: 461100 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:40:58,070-Speed 6313.76 samples/sec Loss 4.5485 LearningRate 0.0002 Epoch: 22 Global Step: 461110 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:01,313-Speed 6317.01 samples/sec Loss 4.5602 LearningRate 0.0002 Epoch: 22 Global Step: 461120 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:04,614-Speed 6204.38 samples/sec Loss 4.5590 LearningRate 0.0002 Epoch: 22 Global Step: 461130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:07,858-Speed 6315.78 samples/sec Loss 4.6171 LearningRate 0.0002 Epoch: 22 Global Step: 461140 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:11,093-Speed 6331.99 samples/sec Loss 4.6097 LearningRate 0.0002 Epoch: 22 Global Step: 461150 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:14,341-Speed 6306.10 samples/sec Loss 4.5917 LearningRate 0.0002 Epoch: 22 Global Step: 461160 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:17,590-Speed 6306.17 samples/sec Loss 4.5393 LearningRate 0.0002 Epoch: 22 Global Step: 461170 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:20,839-Speed 6304.93 samples/sec Loss 4.5412 LearningRate 0.0002 Epoch: 22 Global Step: 461180 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:24,086-Speed 6309.15 
samples/sec Loss 4.4912 LearningRate 0.0002 Epoch: 22 Global Step: 461190 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:27,332-Speed 6310.26 samples/sec Loss 4.5594 LearningRate 0.0002 Epoch: 22 Global Step: 461200 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:30,575-Speed 6317.50 samples/sec Loss 4.5893 LearningRate 0.0002 Epoch: 22 Global Step: 461210 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:33,833-Speed 6286.32 samples/sec Loss 4.5594 LearningRate 0.0002 Epoch: 22 Global Step: 461220 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:37,083-Speed 6303.42 samples/sec Loss 4.5469 LearningRate 0.0002 Epoch: 22 Global Step: 461230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:40,330-Speed 6310.99 samples/sec Loss 4.5412 LearningRate 0.0002 Epoch: 22 Global Step: 461240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-02 09:41:43,560-Speed 6340.91 samples/sec Loss 4.5767 LearningRate 0.0002 Epoch: 22 Global Step: 461250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:41:46,803-Speed 6316.91 samples/sec Loss 4.5032 LearningRate 0.0002 Epoch: 22 Global Step: 461260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:41:50,049-Speed 6311.66 samples/sec Loss 4.5916 LearningRate 0.0002 Epoch: 22 Global Step: 461270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:41:53,289-Speed 6321.65 samples/sec Loss 4.5386 LearningRate 0.0002 Epoch: 22 Global Step: 461280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:41:56,545-Speed 6291.98 samples/sec Loss 4.5292 LearningRate 0.0002 Epoch: 22 Global Step: 461290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:41:59,831-Speed 6234.33 samples/sec Loss 4.5161 LearningRate 0.0002 Epoch: 22 Global Step: 461300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:03,080-Speed 6304.05 samples/sec Loss 4.5036 
LearningRate 0.0002 Epoch: 22 Global Step: 461310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:06,325-Speed 6311.89 samples/sec Loss 4.5783 LearningRate 0.0002 Epoch: 22 Global Step: 461320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:09,571-Speed 6310.55 samples/sec Loss 4.5796 LearningRate 0.0002 Epoch: 22 Global Step: 461330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:12,819-Speed 6307.37 samples/sec Loss 4.5557 LearningRate 0.0002 Epoch: 22 Global Step: 461340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:16,069-Speed 6303.90 samples/sec Loss 4.5037 LearningRate 0.0002 Epoch: 22 Global Step: 461350 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:42:19,303-Speed 6332.82 samples/sec Loss 4.6005 LearningRate 0.0002 Epoch: 22 Global Step: 461360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:22,550-Speed 6310.24 samples/sec Loss 4.5397 LearningRate 0.0002 Epoch: 22 Global Step: 461370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:25,794-Speed 6315.60 samples/sec Loss 4.5757 LearningRate 0.0002 Epoch: 22 Global Step: 461380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:29,038-Speed 6312.99 samples/sec Loss 4.5579 LearningRate 0.0002 Epoch: 22 Global Step: 461390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:32,286-Speed 6307.81 samples/sec Loss 4.5217 LearningRate 0.0002 Epoch: 22 Global Step: 461400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:35,534-Speed 6307.03 samples/sec Loss 4.5580 LearningRate 0.0002 Epoch: 22 Global Step: 461410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:38,777-Speed 6316.87 samples/sec Loss 4.5362 LearningRate 0.0002 Epoch: 22 Global Step: 461420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:42,037-Speed 6283.03 samples/sec Loss 4.5769 LearningRate 0.0002 Epoch: 22 
Global Step: 461430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:45,282-Speed 6312.90 samples/sec Loss 4.5638 LearningRate 0.0002 Epoch: 22 Global Step: 461440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:48,539-Speed 6288.71 samples/sec Loss 4.5517 LearningRate 0.0002 Epoch: 22 Global Step: 461450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:51,786-Speed 6313.72 samples/sec Loss 4.5869 LearningRate 0.0002 Epoch: 22 Global Step: 461460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:55,029-Speed 6315.50 samples/sec Loss 4.5142 LearningRate 0.0002 Epoch: 22 Global Step: 461470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:42:58,283-Speed 6294.45 samples/sec Loss 4.6118 LearningRate 0.0002 Epoch: 22 Global Step: 461480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:01,528-Speed 6313.23 samples/sec Loss 4.6069 LearningRate 0.0002 Epoch: 22 Global Step: 461490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:04,773-Speed 6311.93 samples/sec Loss 4.4898 LearningRate 0.0002 Epoch: 22 Global Step: 461500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:08,020-Speed 6309.96 samples/sec Loss 4.5227 LearningRate 0.0002 Epoch: 22 Global Step: 461510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:11,260-Speed 6321.18 samples/sec Loss 4.5214 LearningRate 0.0002 Epoch: 22 Global Step: 461520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:14,505-Speed 6313.79 samples/sec Loss 4.5253 LearningRate 0.0002 Epoch: 22 Global Step: 461530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:17,748-Speed 6315.17 samples/sec Loss 4.5275 LearningRate 0.0002 Epoch: 22 Global Step: 461540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:20,993-Speed 6313.19 samples/sec Loss 4.6130 LearningRate 0.0002 Epoch: 22 Global Step: 461550 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:24,222-Speed 6344.75 samples/sec Loss 4.5445 LearningRate 0.0002 Epoch: 22 Global Step: 461560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:27,477-Speed 6293.35 samples/sec Loss 4.5519 LearningRate 0.0002 Epoch: 22 Global Step: 461570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:30,721-Speed 6313.09 samples/sec Loss 4.5273 LearningRate 0.0002 Epoch: 22 Global Step: 461580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:33,967-Speed 6311.62 samples/sec Loss 4.5713 LearningRate 0.0002 Epoch: 22 Global Step: 461590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:37,212-Speed 6314.12 samples/sec Loss 4.5321 LearningRate 0.0002 Epoch: 22 Global Step: 461600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:40,456-Speed 6314.87 samples/sec Loss 4.4997 LearningRate 0.0002 Epoch: 22 Global Step: 461610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:43,700-Speed 6314.00 samples/sec Loss 4.4403 LearningRate 0.0002 Epoch: 22 Global Step: 461620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:46,945-Speed 6311.79 samples/sec Loss 4.5099 LearningRate 0.0002 Epoch: 22 Global Step: 461630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:50,224-Speed 6248.30 samples/sec Loss 4.5194 LearningRate 0.0002 Epoch: 22 Global Step: 461640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:53,511-Speed 6232.18 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 461650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:43:56,756-Speed 6311.30 samples/sec Loss 4.5376 LearningRate 0.0002 Epoch: 22 Global Step: 461660 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:43:59,986-Speed 6342.78 samples/sec Loss 4.6156 LearningRate 0.0002 Epoch: 22 Global Step: 461670 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 09:44:03,230-Speed 6314.33 samples/sec Loss 4.5655 LearningRate 0.0002 Epoch: 22 Global Step: 461680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:06,474-Speed 6313.99 samples/sec Loss 4.5560 LearningRate 0.0002 Epoch: 22 Global Step: 461690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:09,726-Speed 6300.36 samples/sec Loss 4.5477 LearningRate 0.0002 Epoch: 22 Global Step: 461700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:13,028-Speed 6202.81 samples/sec Loss 4.5509 LearningRate 0.0002 Epoch: 22 Global Step: 461710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:16,271-Speed 6317.08 samples/sec Loss 4.5377 LearningRate 0.0002 Epoch: 22 Global Step: 461720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:19,522-Speed 6301.23 samples/sec Loss 4.5404 LearningRate 0.0002 Epoch: 22 Global Step: 461730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:22,766-Speed 6314.29 samples/sec Loss 4.6299 LearningRate 0.0002 Epoch: 22 Global Step: 461740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:26,015-Speed 6305.43 samples/sec Loss 4.5983 LearningRate 0.0002 Epoch: 22 Global Step: 461750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:29,260-Speed 6310.89 samples/sec Loss 4.5502 LearningRate 0.0002 Epoch: 22 Global Step: 461760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:32,491-Speed 6340.34 samples/sec Loss 4.5425 LearningRate 0.0002 Epoch: 22 Global Step: 461770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:35,739-Speed 6308.23 samples/sec Loss 4.5405 LearningRate 0.0002 Epoch: 22 Global Step: 461780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:38,986-Speed 6308.97 samples/sec Loss 4.5880 LearningRate 0.0002 Epoch: 22 Global Step: 461790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
09:44:42,229-Speed 6316.59 samples/sec Loss 4.5517 LearningRate 0.0002 Epoch: 22 Global Step: 461800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:45,479-Speed 6302.41 samples/sec Loss 4.5700 LearningRate 0.0002 Epoch: 22 Global Step: 461810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:48,722-Speed 6316.58 samples/sec Loss 4.5229 LearningRate 0.0002 Epoch: 22 Global Step: 461820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:51,966-Speed 6315.08 samples/sec Loss 4.5128 LearningRate 0.0002 Epoch: 22 Global Step: 461830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:55,212-Speed 6311.93 samples/sec Loss 4.5768 LearningRate 0.0002 Epoch: 22 Global Step: 461840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:44:58,457-Speed 6312.04 samples/sec Loss 4.5182 LearningRate 0.0002 Epoch: 22 Global Step: 461850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:01,708-Speed 6300.21 samples/sec Loss 4.5304 LearningRate 0.0002 Epoch: 22 Global Step: 461860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:04,939-Speed 6340.74 samples/sec Loss 4.5861 LearningRate 0.0002 Epoch: 22 Global Step: 461870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:08,180-Speed 6319.09 samples/sec Loss 4.5509 LearningRate 0.0002 Epoch: 22 Global Step: 461880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:11,423-Speed 6317.26 samples/sec Loss 4.5764 LearningRate 0.0002 Epoch: 22 Global Step: 461890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:14,667-Speed 6314.11 samples/sec Loss 4.5550 LearningRate 0.0002 Epoch: 22 Global Step: 461900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:17,913-Speed 6311.48 samples/sec Loss 4.5192 LearningRate 0.0002 Epoch: 22 Global Step: 461910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:21,159-Speed 6310.62 
samples/sec Loss 4.5246 LearningRate 0.0002 Epoch: 22 Global Step: 461920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:24,405-Speed 6310.14 samples/sec Loss 4.5179 LearningRate 0.0002 Epoch: 22 Global Step: 461930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:27,651-Speed 6311.30 samples/sec Loss 4.5053 LearningRate 0.0002 Epoch: 22 Global Step: 461940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:30,896-Speed 6311.95 samples/sec Loss 4.5880 LearningRate 0.0002 Epoch: 22 Global Step: 461950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:34,140-Speed 6314.63 samples/sec Loss 4.5785 LearningRate 0.0002 Epoch: 22 Global Step: 461960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:37,412-Speed 6260.68 samples/sec Loss 4.4685 LearningRate 0.0002 Epoch: 22 Global Step: 461970 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:45:40,645-Speed 6336.59 samples/sec Loss 4.5511 LearningRate 0.0002 Epoch: 22 Global Step: 461980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:43,888-Speed 6316.13 samples/sec Loss 4.5375 LearningRate 0.0002 Epoch: 22 Global Step: 461990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:47,139-Speed 6302.14 samples/sec Loss 4.5479 LearningRate 0.0002 Epoch: 22 Global Step: 462000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:50,399-Speed 6283.72 samples/sec Loss 4.5391 LearningRate 0.0002 Epoch: 22 Global Step: 462010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:53,644-Speed 6312.85 samples/sec Loss 4.5320 LearningRate 0.0002 Epoch: 22 Global Step: 462020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:45:56,890-Speed 6309.61 samples/sec Loss 4.5637 LearningRate 0.0002 Epoch: 22 Global Step: 462030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:00,135-Speed 6313.03 samples/sec Loss 4.5342 
LearningRate 0.0002 Epoch: 22 Global Step: 462040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:03,381-Speed 6309.94 samples/sec Loss 4.5581 LearningRate 0.0002 Epoch: 22 Global Step: 462050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:06,626-Speed 6312.95 samples/sec Loss 4.5949 LearningRate 0.0002 Epoch: 22 Global Step: 462060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:09,872-Speed 6310.83 samples/sec Loss 4.5029 LearningRate 0.0002 Epoch: 22 Global Step: 462070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:13,114-Speed 6318.80 samples/sec Loss 4.5574 LearningRate 0.0002 Epoch: 22 Global Step: 462080 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:46:16,347-Speed 6337.31 samples/sec Loss 4.5204 LearningRate 0.0002 Epoch: 22 Global Step: 462090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:19,590-Speed 6316.56 samples/sec Loss 4.5903 LearningRate 0.0002 Epoch: 22 Global Step: 462100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:22,834-Speed 6313.43 samples/sec Loss 4.5156 LearningRate 0.0002 Epoch: 22 Global Step: 462110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:26,085-Speed 6302.51 samples/sec Loss 4.5880 LearningRate 0.0002 Epoch: 22 Global Step: 462120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:29,331-Speed 6309.98 samples/sec Loss 4.5863 LearningRate 0.0002 Epoch: 22 Global Step: 462130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:32,578-Speed 6308.84 samples/sec Loss 4.4530 LearningRate 0.0002 Epoch: 22 Global Step: 462140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:35,824-Speed 6310.46 samples/sec Loss 4.6079 LearningRate 0.0002 Epoch: 22 Global Step: 462150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:39,073-Speed 6305.18 samples/sec Loss 4.5863 LearningRate 0.0002 Epoch: 22 
Global Step: 462160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:42,318-Speed 6311.81 samples/sec Loss 4.6133 LearningRate 0.0002 Epoch: 22 Global Step: 462170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:45,564-Speed 6310.66 samples/sec Loss 4.5938 LearningRate 0.0002 Epoch: 22 Global Step: 462180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:48,795-Speed 6339.47 samples/sec Loss 4.5443 LearningRate 0.0002 Epoch: 22 Global Step: 462190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:52,039-Speed 6314.83 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 462200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:55,281-Speed 6319.87 samples/sec Loss 4.5427 LearningRate 0.0002 Epoch: 22 Global Step: 462210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:46:58,532-Speed 6300.46 samples/sec Loss 4.5525 LearningRate 0.0002 Epoch: 22 Global Step: 462220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:01,781-Speed 6305.58 samples/sec Loss 4.6133 LearningRate 0.0002 Epoch: 22 Global Step: 462230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:05,029-Speed 6307.08 samples/sec Loss 4.5532 LearningRate 0.0002 Epoch: 22 Global Step: 462240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:08,273-Speed 6313.91 samples/sec Loss 4.5524 LearningRate 0.0002 Epoch: 22 Global Step: 462250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:11,518-Speed 6313.98 samples/sec Loss 4.5519 LearningRate 0.0002 Epoch: 22 Global Step: 462260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:14,786-Speed 6266.89 samples/sec Loss 4.5624 LearningRate 0.0002 Epoch: 22 Global Step: 462270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:18,061-Speed 6256.03 samples/sec Loss 4.5640 LearningRate 0.0002 Epoch: 22 Global Step: 462280 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:21,305-Speed 6314.19 samples/sec Loss 4.6383 LearningRate 0.0002 Epoch: 22 Global Step: 462290 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:47:24,550-Speed 6313.24 samples/sec Loss 4.5232 LearningRate 0.0002 Epoch: 22 Global Step: 462300 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:47:27,789-Speed 6324.28 samples/sec Loss 4.5739 LearningRate 0.0002 Epoch: 22 Global Step: 462310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:31,035-Speed 6309.07 samples/sec Loss 4.5708 LearningRate 0.0002 Epoch: 22 Global Step: 462320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:34,279-Speed 6314.29 samples/sec Loss 4.5121 LearningRate 0.0002 Epoch: 22 Global Step: 462330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:37,530-Speed 6301.81 samples/sec Loss 4.5399 LearningRate 0.0002 Epoch: 22 Global Step: 462340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:40,776-Speed 6310.75 samples/sec Loss 4.5073 LearningRate 0.0002 Epoch: 22 Global Step: 462350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:44,022-Speed 6310.92 samples/sec Loss 4.5050 LearningRate 0.0002 Epoch: 22 Global Step: 462360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:47,270-Speed 6307.32 samples/sec Loss 4.5545 LearningRate 0.0002 Epoch: 22 Global Step: 462370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:50,517-Speed 6308.48 samples/sec Loss 4.4647 LearningRate 0.0002 Epoch: 22 Global Step: 462380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:53,766-Speed 6304.68 samples/sec Loss 4.5069 LearningRate 0.0002 Epoch: 22 Global Step: 462390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:47:57,014-Speed 6306.34 samples/sec Loss 4.5386 LearningRate 0.0002 Epoch: 22 Global Step: 462400 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 09:48:00,256-Speed 6317.94 samples/sec Loss 4.5309 LearningRate 0.0002 Epoch: 22 Global Step: 462410 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:48:03,489-Speed 6336.75 samples/sec Loss 4.4941 LearningRate 0.0002 Epoch: 22 Global Step: 462420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:06,734-Speed 6312.38 samples/sec Loss 4.5352 LearningRate 0.0002 Epoch: 22 Global Step: 462430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:09,978-Speed 6314.75 samples/sec Loss 4.4987 LearningRate 0.0002 Epoch: 22 Global Step: 462440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:13,223-Speed 6312.93 samples/sec Loss 4.5454 LearningRate 0.0002 Epoch: 22 Global Step: 462450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:16,482-Speed 6287.51 samples/sec Loss 4.6014 LearningRate 0.0002 Epoch: 22 Global Step: 462460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:19,726-Speed 6313.81 samples/sec Loss 4.5173 LearningRate 0.0002 Epoch: 22 Global Step: 462470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:22,978-Speed 6300.03 samples/sec Loss 4.5248 LearningRate 0.0002 Epoch: 22 Global Step: 462480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:26,219-Speed 6318.73 samples/sec Loss 4.5567 LearningRate 0.0002 Epoch: 22 Global Step: 462490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:29,466-Speed 6308.87 samples/sec Loss 4.5422 LearningRate 0.0002 Epoch: 22 Global Step: 462500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:32,709-Speed 6316.82 samples/sec Loss 4.4744 LearningRate 0.0002 Epoch: 22 Global Step: 462510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:35,956-Speed 6308.89 samples/sec Loss 4.5229 LearningRate 0.0002 Epoch: 22 Global Step: 462520 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 
09:48:39,188-Speed 6338.08 samples/sec Loss 4.5156 LearningRate 0.0002 Epoch: 22 Global Step: 462530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:42,458-Speed 6263.93 samples/sec Loss 4.5710 LearningRate 0.0002 Epoch: 22 Global Step: 462540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:45,704-Speed 6311.27 samples/sec Loss 4.5961 LearningRate 0.0002 Epoch: 22 Global Step: 462550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:48,949-Speed 6311.90 samples/sec Loss 4.5068 LearningRate 0.0002 Epoch: 22 Global Step: 462560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:52,195-Speed 6312.54 samples/sec Loss 4.5869 LearningRate 0.0002 Epoch: 22 Global Step: 462570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:55,439-Speed 6314.21 samples/sec Loss 4.5854 LearningRate 0.0002 Epoch: 22 Global Step: 462580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:48:58,685-Speed 6309.33 samples/sec Loss 4.5650 LearningRate 0.0002 Epoch: 22 Global Step: 462590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:01,932-Speed 6309.97 samples/sec Loss 4.5818 LearningRate 0.0002 Epoch: 22 Global Step: 462600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:05,178-Speed 6311.06 samples/sec Loss 4.5741 LearningRate 0.0002 Epoch: 22 Global Step: 462610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:08,423-Speed 6311.11 samples/sec Loss 4.5157 LearningRate 0.0002 Epoch: 22 Global Step: 462620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:11,656-Speed 6337.15 samples/sec Loss 4.4998 LearningRate 0.0002 Epoch: 22 Global Step: 462630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:14,903-Speed 6308.44 samples/sec Loss 4.5408 LearningRate 0.0002 Epoch: 22 Global Step: 462640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:18,154-Speed 6300.33 
samples/sec Loss 4.4968 LearningRate 0.0002 Epoch: 22 Global Step: 462650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:21,401-Speed 6310.26 samples/sec Loss 4.4844 LearningRate 0.0002 Epoch: 22 Global Step: 462660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:24,657-Speed 6291.33 samples/sec Loss 4.5719 LearningRate 0.0002 Epoch: 22 Global Step: 462670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:27,907-Speed 6303.94 samples/sec Loss 4.4985 LearningRate 0.0002 Epoch: 22 Global Step: 462680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:31,156-Speed 6304.34 samples/sec Loss 4.5490 LearningRate 0.0002 Epoch: 22 Global Step: 462690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:34,406-Speed 6302.59 samples/sec Loss 4.6033 LearningRate 0.0002 Epoch: 22 Global Step: 462700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:37,653-Speed 6309.33 samples/sec Loss 4.5194 LearningRate 0.0002 Epoch: 22 Global Step: 462710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:40,898-Speed 6312.31 samples/sec Loss 4.5245 LearningRate 0.0002 Epoch: 22 Global Step: 462720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:44,124-Speed 6349.46 samples/sec Loss 4.5792 LearningRate 0.0002 Epoch: 22 Global Step: 462730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:47,374-Speed 6302.75 samples/sec Loss 4.5017 LearningRate 0.0002 Epoch: 22 Global Step: 462740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:50,616-Speed 6318.74 samples/sec Loss 4.5452 LearningRate 0.0002 Epoch: 22 Global Step: 462750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:53,864-Speed 6306.85 samples/sec Loss 4.4919 LearningRate 0.0002 Epoch: 22 Global Step: 462760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:49:57,109-Speed 6313.69 samples/sec Loss 4.5245 
LearningRate 0.0002 Epoch: 22 Global Step: 462770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:00,362-Speed 6297.24 samples/sec Loss 4.5703 LearningRate 0.0002 Epoch: 22 Global Step: 462780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:03,612-Speed 6302.06 samples/sec Loss 4.4933 LearningRate 0.0002 Epoch: 22 Global Step: 462790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:06,859-Speed 6309.16 samples/sec Loss 4.5106 LearningRate 0.0002 Epoch: 22 Global Step: 462800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:10,170-Speed 6187.10 samples/sec Loss 4.5401 LearningRate 0.0002 Epoch: 22 Global Step: 462810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:13,434-Speed 6275.77 samples/sec Loss 4.4914 LearningRate 0.0002 Epoch: 22 Global Step: 462820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:16,667-Speed 6335.23 samples/sec Loss 4.5149 LearningRate 0.0002 Epoch: 22 Global Step: 462830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:19,912-Speed 6312.97 samples/sec Loss 4.5613 LearningRate 0.0002 Epoch: 22 Global Step: 462840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:23,155-Speed 6316.06 samples/sec Loss 4.5219 LearningRate 0.0002 Epoch: 22 Global Step: 462850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:26,402-Speed 6310.21 samples/sec Loss 4.5412 LearningRate 0.0002 Epoch: 22 Global Step: 462860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:29,660-Speed 6288.15 samples/sec Loss 4.4834 LearningRate 0.0002 Epoch: 22 Global Step: 462870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:32,906-Speed 6309.76 samples/sec Loss 4.5075 LearningRate 0.0002 Epoch: 22 Global Step: 462880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:36,149-Speed 6316.50 samples/sec Loss 4.4797 LearningRate 0.0002 Epoch: 22 
Global Step: 462890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:39,392-Speed 6317.49 samples/sec Loss 4.5518 LearningRate 0.0002 Epoch: 22 Global Step: 462900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:50:42,622-Speed 6340.99 samples/sec Loss 4.4804 LearningRate 0.0002 Epoch: 22 Global Step: 462910 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:50:45,868-Speed 6311.58 samples/sec Loss 4.5027 LearningRate 0.0002 Epoch: 22 Global Step: 462920 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:50:49,140-Speed 6258.92 samples/sec Loss 4.5106 LearningRate 0.0002 Epoch: 22 Global Step: 462930 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:50:52,406-Speed 6273.02 samples/sec Loss 4.5842 LearningRate 0.0002 Epoch: 22 Global Step: 462940 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:50:55,650-Speed 6313.91 samples/sec Loss 4.5728 LearningRate 0.0002 Epoch: 22 Global Step: 462950 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:50:58,898-Speed 6307.93 samples/sec Loss 4.5822 LearningRate 0.0002 Epoch: 22 Global Step: 462960 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:51:02,192-Speed 6217.92 samples/sec Loss 4.5310 LearningRate 0.0002 Epoch: 22 Global Step: 462970 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:51:05,436-Speed 6314.58 samples/sec Loss 4.5654 LearningRate 0.0002 Epoch: 22 Global Step: 462980 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:51:08,676-Speed 6322.77 samples/sec Loss 4.4948 LearningRate 0.0002 Epoch: 22 Global Step: 462990 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:51:11,918-Speed 6318.19 samples/sec Loss 4.5738 LearningRate 0.0002 Epoch: 22 Global Step: 463000 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:51:15,163-Speed 6313.02 samples/sec Loss 4.6108 LearningRate 0.0002 Epoch: 22 Global Step: 463010 Fp16 Grad Scale: 
16384 Required: 33 hours Training: 2022-04-02 09:51:18,408-Speed 6313.14 samples/sec Loss 4.5371 LearningRate 0.0002 Epoch: 22 Global Step: 463020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:21,673-Speed 6274.19 samples/sec Loss 4.5339 LearningRate 0.0002 Epoch: 22 Global Step: 463030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:24,927-Speed 6293.39 samples/sec Loss 4.5107 LearningRate 0.0002 Epoch: 22 Global Step: 463040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:28,173-Speed 6311.37 samples/sec Loss 4.5789 LearningRate 0.0002 Epoch: 22 Global Step: 463050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:31,419-Speed 6311.71 samples/sec Loss 4.5615 LearningRate 0.0002 Epoch: 22 Global Step: 463060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:34,687-Speed 6267.99 samples/sec Loss 4.4942 LearningRate 0.0002 Epoch: 22 Global Step: 463070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:37,953-Speed 6272.88 samples/sec Loss 4.4878 LearningRate 0.0002 Epoch: 22 Global Step: 463080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:41,194-Speed 6319.68 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 463090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:44,437-Speed 6318.09 samples/sec Loss 4.5392 LearningRate 0.0002 Epoch: 22 Global Step: 463100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:51:47,700-Speed 6276.74 samples/sec Loss 4.5855 LearningRate 0.0002 Epoch: 22 Global Step: 463110 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:51:51,013-Speed 6184.37 samples/sec Loss 4.5867 LearningRate 0.0002 Epoch: 22 Global Step: 463120 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:51:54,247-Speed 6333.75 samples/sec Loss 4.4892 LearningRate 0.0002 Epoch: 22 Global Step: 463130 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 09:51:57,488-Speed 6320.83 samples/sec Loss 4.5740 LearningRate 0.0002 Epoch: 22 Global Step: 463140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:00,740-Speed 6297.32 samples/sec Loss 4.5462 LearningRate 0.0002 Epoch: 22 Global Step: 463150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:03,987-Speed 6309.82 samples/sec Loss 4.6051 LearningRate 0.0002 Epoch: 22 Global Step: 463160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:07,236-Speed 6304.27 samples/sec Loss 4.5241 LearningRate 0.0002 Epoch: 22 Global Step: 463170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:10,480-Speed 6314.26 samples/sec Loss 4.5126 LearningRate 0.0002 Epoch: 22 Global Step: 463180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:13,739-Speed 6285.61 samples/sec Loss 4.6053 LearningRate 0.0002 Epoch: 22 Global Step: 463190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:16,984-Speed 6313.26 samples/sec Loss 4.5318 LearningRate 0.0002 Epoch: 22 Global Step: 463200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:20,230-Speed 6310.41 samples/sec Loss 4.5736 LearningRate 0.0002 Epoch: 22 Global Step: 463210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:23,476-Speed 6311.79 samples/sec Loss 4.4695 LearningRate 0.0002 Epoch: 22 Global Step: 463220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:26,708-Speed 6336.85 samples/sec Loss 4.5639 LearningRate 0.0002 Epoch: 22 Global Step: 463230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:29,955-Speed 6309.81 samples/sec Loss 4.5416 LearningRate 0.0002 Epoch: 22 Global Step: 463240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:33,198-Speed 6315.16 samples/sec Loss 4.5169 LearningRate 0.0002 Epoch: 22 Global Step: 463250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
09:52:36,439-Speed 6321.09 samples/sec Loss 4.4958 LearningRate 0.0002 Epoch: 22 Global Step: 463260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:39,684-Speed 6312.62 samples/sec Loss 4.5069 LearningRate 0.0002 Epoch: 22 Global Step: 463270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:42,933-Speed 6304.81 samples/sec Loss 4.5276 LearningRate 0.0002 Epoch: 22 Global Step: 463280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:46,180-Speed 6308.69 samples/sec Loss 4.5627 LearningRate 0.0002 Epoch: 22 Global Step: 463290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:49,427-Speed 6308.65 samples/sec Loss 4.6061 LearningRate 0.0002 Epoch: 22 Global Step: 463300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:52,673-Speed 6311.56 samples/sec Loss 4.4550 LearningRate 0.0002 Epoch: 22 Global Step: 463310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:55,921-Speed 6306.05 samples/sec Loss 4.5845 LearningRate 0.0002 Epoch: 22 Global Step: 463320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:52:59,165-Speed 6314.67 samples/sec Loss 4.5884 LearningRate 0.0002 Epoch: 22 Global Step: 463330 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:53:02,397-Speed 6339.79 samples/sec Loss 4.5820 LearningRate 0.0002 Epoch: 22 Global Step: 463340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:05,642-Speed 6312.17 samples/sec Loss 4.5294 LearningRate 0.0002 Epoch: 22 Global Step: 463350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:08,886-Speed 6314.22 samples/sec Loss 4.5711 LearningRate 0.0002 Epoch: 22 Global Step: 463360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:12,132-Speed 6311.00 samples/sec Loss 4.5180 LearningRate 0.0002 Epoch: 22 Global Step: 463370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:15,375-Speed 6316.73 
samples/sec Loss 4.5374 LearningRate 0.0002 Epoch: 22 Global Step: 463380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:18,626-Speed 6300.78 samples/sec Loss 4.5129 LearningRate 0.0002 Epoch: 22 Global Step: 463390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:21,872-Speed 6309.73 samples/sec Loss 4.5492 LearningRate 0.0002 Epoch: 22 Global Step: 463400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:25,122-Speed 6302.85 samples/sec Loss 4.5186 LearningRate 0.0002 Epoch: 22 Global Step: 463410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:28,366-Speed 6314.67 samples/sec Loss 4.4872 LearningRate 0.0002 Epoch: 22 Global Step: 463420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:31,633-Speed 6271.12 samples/sec Loss 4.5571 LearningRate 0.0002 Epoch: 22 Global Step: 463430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:34,866-Speed 6335.09 samples/sec Loss 4.5675 LearningRate 0.0002 Epoch: 22 Global Step: 463440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:38,117-Speed 6301.06 samples/sec Loss 4.5109 LearningRate 0.0002 Epoch: 22 Global Step: 463450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:41,365-Speed 6307.79 samples/sec Loss 4.5333 LearningRate 0.0002 Epoch: 22 Global Step: 463460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:44,609-Speed 6314.56 samples/sec Loss 4.6075 LearningRate 0.0002 Epoch: 22 Global Step: 463470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:47,855-Speed 6310.65 samples/sec Loss 4.5136 LearningRate 0.0002 Epoch: 22 Global Step: 463480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:51,100-Speed 6312.29 samples/sec Loss 4.5168 LearningRate 0.0002 Epoch: 22 Global Step: 463490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:54,345-Speed 6312.31 samples/sec Loss 4.6079 
LearningRate 0.0002 Epoch: 22 Global Step: 463500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:53:57,590-Speed 6313.12 samples/sec Loss 4.5747 LearningRate 0.0002 Epoch: 22 Global Step: 463510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:00,837-Speed 6310.50 samples/sec Loss 4.5727 LearningRate 0.0002 Epoch: 22 Global Step: 463520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:04,085-Speed 6304.94 samples/sec Loss 4.4842 LearningRate 0.0002 Epoch: 22 Global Step: 463530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:07,335-Speed 6304.19 samples/sec Loss 4.4702 LearningRate 0.0002 Epoch: 22 Global Step: 463540 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:54:10,579-Speed 6314.79 samples/sec Loss 4.5392 LearningRate 0.0002 Epoch: 22 Global Step: 463550 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:54:13,810-Speed 6340.10 samples/sec Loss 4.5445 LearningRate 0.0002 Epoch: 22 Global Step: 463560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:17,054-Speed 6314.26 samples/sec Loss 4.5442 LearningRate 0.0002 Epoch: 22 Global Step: 463570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:20,302-Speed 6307.29 samples/sec Loss 4.4727 LearningRate 0.0002 Epoch: 22 Global Step: 463580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:23,549-Speed 6308.17 samples/sec Loss 4.5529 LearningRate 0.0002 Epoch: 22 Global Step: 463590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:26,795-Speed 6310.11 samples/sec Loss 4.5724 LearningRate 0.0002 Epoch: 22 Global Step: 463600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:30,043-Speed 6307.38 samples/sec Loss 4.5145 LearningRate 0.0002 Epoch: 22 Global Step: 463610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:33,286-Speed 6316.99 samples/sec Loss 4.5877 LearningRate 0.0002 Epoch: 22 
Global Step: 463620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:36,534-Speed 6306.72 samples/sec Loss 4.5219 LearningRate 0.0002 Epoch: 22 Global Step: 463630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:39,778-Speed 6313.91 samples/sec Loss 4.5473 LearningRate 0.0002 Epoch: 22 Global Step: 463640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:43,033-Speed 6292.61 samples/sec Loss 4.5066 LearningRate 0.0002 Epoch: 22 Global Step: 463650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:46,267-Speed 6334.14 samples/sec Loss 4.5357 LearningRate 0.0002 Epoch: 22 Global Step: 463660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:49,510-Speed 6317.59 samples/sec Loss 4.5534 LearningRate 0.0002 Epoch: 22 Global Step: 463670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:52,757-Speed 6309.06 samples/sec Loss 4.4805 LearningRate 0.0002 Epoch: 22 Global Step: 463680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:56,001-Speed 6314.05 samples/sec Loss 4.4638 LearningRate 0.0002 Epoch: 22 Global Step: 463690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:54:59,250-Speed 6305.08 samples/sec Loss 4.5917 LearningRate 0.0002 Epoch: 22 Global Step: 463700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:02,495-Speed 6312.16 samples/sec Loss 4.4891 LearningRate 0.0002 Epoch: 22 Global Step: 463710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:05,740-Speed 6313.23 samples/sec Loss 4.5098 LearningRate 0.0002 Epoch: 22 Global Step: 463720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:08,984-Speed 6315.97 samples/sec Loss 4.5538 LearningRate 0.0002 Epoch: 22 Global Step: 463730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:12,228-Speed 6313.60 samples/sec Loss 4.5385 LearningRate 0.0002 Epoch: 22 Global Step: 463740 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:15,461-Speed 6335.87 samples/sec Loss 4.4763 LearningRate 0.0002 Epoch: 22 Global Step: 463750 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:18,707-Speed 6312.07 samples/sec Loss 4.6187 LearningRate 0.0002 Epoch: 22 Global Step: 463760 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:21,953-Speed 6309.79 samples/sec Loss 4.5053 LearningRate 0.0002 Epoch: 22 Global Step: 463770 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:25,202-Speed 6304.93 samples/sec Loss 4.4566 LearningRate 0.0002 Epoch: 22 Global Step: 463780 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:28,448-Speed 6310.96 samples/sec Loss 4.5354 LearningRate 0.0002 Epoch: 22 Global Step: 463790 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:31,695-Speed 6309.83 samples/sec Loss 4.5721 LearningRate 0.0002 Epoch: 22 Global Step: 463800 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:34,938-Speed 6316.38 samples/sec Loss 4.5009 LearningRate 0.0002 Epoch: 22 Global Step: 463810 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:38,185-Speed 6308.50 samples/sec Loss 4.5774 LearningRate 0.0002 Epoch: 22 Global Step: 463820 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:41,429-Speed 6314.72 samples/sec Loss 4.5186 LearningRate 0.0002 Epoch: 22 Global Step: 463830 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:44,670-Speed 6319.63 samples/sec Loss 4.5376 LearningRate 0.0002 Epoch: 22 Global Step: 463840 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 09:55:47,915-Speed 6313.54 samples/sec Loss 4.5076 LearningRate 0.0002 Epoch: 22 Global Step: 463850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:51,157-Speed 6317.82 samples/sec Loss 4.5396 LearningRate 0.0002 Epoch: 22 Global Step: 463860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-04-02 09:55:54,412-Speed 6293.30 samples/sec Loss 4.5006 LearningRate 0.0002 Epoch: 22 Global Step: 463870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:55:57,658-Speed 6311.23 samples/sec Loss 4.4627 LearningRate 0.0002 Epoch: 22 Global Step: 463880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:00,903-Speed 6313.03 samples/sec Loss 4.5087 LearningRate 0.0002 Epoch: 22 Global Step: 463890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:04,147-Speed 6313.33 samples/sec Loss 4.5362 LearningRate 0.0002 Epoch: 22 Global Step: 463900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:07,394-Speed 6308.13 samples/sec Loss 4.4933 LearningRate 0.0002 Epoch: 22 Global Step: 463910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:10,648-Speed 6296.06 samples/sec Loss 4.4820 LearningRate 0.0002 Epoch: 22 Global Step: 463920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:13,891-Speed 6317.55 samples/sec Loss 4.5260 LearningRate 0.0002 Epoch: 22 Global Step: 463930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:17,140-Speed 6304.28 samples/sec Loss 4.5993 LearningRate 0.0002 Epoch: 22 Global Step: 463940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:20,370-Speed 6341.76 samples/sec Loss 4.5369 LearningRate 0.0002 Epoch: 22 Global Step: 463950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:23,658-Speed 6230.38 samples/sec Loss 4.5423 LearningRate 0.0002 Epoch: 22 Global Step: 463960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:26,903-Speed 6312.91 samples/sec Loss 4.6079 LearningRate 0.0002 Epoch: 22 Global Step: 463970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:30,147-Speed 6315.57 samples/sec Loss 4.5330 LearningRate 0.0002 Epoch: 22 Global Step: 463980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:33,390-Speed 
6316.30 samples/sec Loss 4.5538 LearningRate 0.0002 Epoch: 22 Global Step: 463990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:36,635-Speed 6312.79 samples/sec Loss 4.4519 LearningRate 0.0002 Epoch: 22 Global Step: 464000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:39,886-Speed 6301.42 samples/sec Loss 4.5958 LearningRate 0.0002 Epoch: 22 Global Step: 464010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:43,135-Speed 6306.81 samples/sec Loss 4.5565 LearningRate 0.0002 Epoch: 22 Global Step: 464020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:46,384-Speed 6303.96 samples/sec Loss 4.5178 LearningRate 0.0002 Epoch: 22 Global Step: 464030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:49,627-Speed 6317.15 samples/sec Loss 4.4574 LearningRate 0.0002 Epoch: 22 Global Step: 464040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:52,876-Speed 6306.01 samples/sec Loss 4.4955 LearningRate 0.0002 Epoch: 22 Global Step: 464050 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:56:56,107-Speed 6338.67 samples/sec Loss 4.4687 LearningRate 0.0002 Epoch: 22 Global Step: 464060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:56:59,352-Speed 6312.62 samples/sec Loss 4.5321 LearningRate 0.0002 Epoch: 22 Global Step: 464070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:02,595-Speed 6317.86 samples/sec Loss 4.5566 LearningRate 0.0002 Epoch: 22 Global Step: 464080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:05,850-Speed 6292.91 samples/sec Loss 4.4848 LearningRate 0.0002 Epoch: 22 Global Step: 464090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:09,104-Speed 6295.71 samples/sec Loss 4.6336 LearningRate 0.0002 Epoch: 22 Global Step: 464100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:12,348-Speed 6313.51 samples/sec Loss 4.5856 
LearningRate 0.0002 Epoch: 22 Global Step: 464110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:15,594-Speed 6309.62 samples/sec Loss 4.5558 LearningRate 0.0002 Epoch: 22 Global Step: 464120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:18,838-Speed 6316.01 samples/sec Loss 4.5009 LearningRate 0.0002 Epoch: 22 Global Step: 464130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:22,081-Speed 6316.61 samples/sec Loss 4.5786 LearningRate 0.0002 Epoch: 22 Global Step: 464140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:25,338-Speed 6288.28 samples/sec Loss 4.5698 LearningRate 0.0002 Epoch: 22 Global Step: 464150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:28,570-Speed 6339.56 samples/sec Loss 4.5417 LearningRate 0.0002 Epoch: 22 Global Step: 464160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:31,815-Speed 6312.08 samples/sec Loss 4.5655 LearningRate 0.0002 Epoch: 22 Global Step: 464170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:35,060-Speed 6313.51 samples/sec Loss 4.4709 LearningRate 0.0002 Epoch: 22 Global Step: 464180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:38,307-Speed 6309.39 samples/sec Loss 4.5269 LearningRate 0.0002 Epoch: 22 Global Step: 464190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:41,550-Speed 6315.29 samples/sec Loss 4.4896 LearningRate 0.0002 Epoch: 22 Global Step: 464200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:44,801-Speed 6301.73 samples/sec Loss 4.5297 LearningRate 0.0002 Epoch: 22 Global Step: 464210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:48,044-Speed 6315.49 samples/sec Loss 4.5679 LearningRate 0.0002 Epoch: 22 Global Step: 464220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:51,294-Speed 6304.28 samples/sec Loss 4.5367 LearningRate 0.0002 Epoch: 22 
Global Step: 464230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:54,535-Speed 6318.69 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 464240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:57:57,776-Speed 6321.62 samples/sec Loss 4.5424 LearningRate 0.0002 Epoch: 22 Global Step: 464250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:01,007-Speed 6339.22 samples/sec Loss 4.5630 LearningRate 0.0002 Epoch: 22 Global Step: 464260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:04,253-Speed 6312.13 samples/sec Loss 4.6079 LearningRate 0.0002 Epoch: 22 Global Step: 464270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:07,504-Speed 6300.25 samples/sec Loss 4.5599 LearningRate 0.0002 Epoch: 22 Global Step: 464280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:10,749-Speed 6313.10 samples/sec Loss 4.5073 LearningRate 0.0002 Epoch: 22 Global Step: 464290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:13,997-Speed 6306.72 samples/sec Loss 4.4619 LearningRate 0.0002 Epoch: 22 Global Step: 464300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:17,244-Speed 6308.23 samples/sec Loss 4.4584 LearningRate 0.0002 Epoch: 22 Global Step: 464310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:20,493-Speed 6304.27 samples/sec Loss 4.4876 LearningRate 0.0002 Epoch: 22 Global Step: 464320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:23,740-Speed 6308.60 samples/sec Loss 4.5078 LearningRate 0.0002 Epoch: 22 Global Step: 464330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:26,985-Speed 6313.33 samples/sec Loss 4.5091 LearningRate 0.0002 Epoch: 22 Global Step: 464340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:30,229-Speed 6315.03 samples/sec Loss 4.5511 LearningRate 0.0002 Epoch: 22 Global Step: 464350 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:33,465-Speed 6330.59 samples/sec Loss 4.5095 LearningRate 0.0002 Epoch: 22 Global Step: 464360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:36,713-Speed 6307.35 samples/sec Loss 4.5054 LearningRate 0.0002 Epoch: 22 Global Step: 464370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:39,959-Speed 6310.03 samples/sec Loss 4.5101 LearningRate 0.0002 Epoch: 22 Global Step: 464380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:43,208-Speed 6305.79 samples/sec Loss 4.4582 LearningRate 0.0002 Epoch: 22 Global Step: 464390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:46,455-Speed 6308.64 samples/sec Loss 4.5905 LearningRate 0.0002 Epoch: 22 Global Step: 464400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:49,700-Speed 6311.70 samples/sec Loss 4.6284 LearningRate 0.0002 Epoch: 22 Global Step: 464410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:52,945-Speed 6312.60 samples/sec Loss 4.4873 LearningRate 0.0002 Epoch: 22 Global Step: 464420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:56,193-Speed 6307.27 samples/sec Loss 4.4884 LearningRate 0.0002 Epoch: 22 Global Step: 464430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:58:59,438-Speed 6313.59 samples/sec Loss 4.5554 LearningRate 0.0002 Epoch: 22 Global Step: 464440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:02,685-Speed 6307.91 samples/sec Loss 4.4763 LearningRate 0.0002 Epoch: 22 Global Step: 464450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:05,933-Speed 6306.64 samples/sec Loss 4.5227 LearningRate 0.0002 Epoch: 22 Global Step: 464460 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:59:09,163-Speed 6343.41 samples/sec Loss 4.6221 LearningRate 0.0002 Epoch: 22 Global Step: 464470 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 09:59:12,407-Speed 6313.35 samples/sec Loss 4.5242 LearningRate 0.0002 Epoch: 22 Global Step: 464480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:15,671-Speed 6275.06 samples/sec Loss 4.4579 LearningRate 0.0002 Epoch: 22 Global Step: 464490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:18,918-Speed 6310.37 samples/sec Loss 4.5312 LearningRate 0.0002 Epoch: 22 Global Step: 464500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:22,163-Speed 6312.18 samples/sec Loss 4.5419 LearningRate 0.0002 Epoch: 22 Global Step: 464510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:25,413-Speed 6303.23 samples/sec Loss 4.4996 LearningRate 0.0002 Epoch: 22 Global Step: 464520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:28,672-Speed 6285.00 samples/sec Loss 4.5365 LearningRate 0.0002 Epoch: 22 Global Step: 464530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:31,916-Speed 6315.38 samples/sec Loss 4.4824 LearningRate 0.0002 Epoch: 22 Global Step: 464540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:35,161-Speed 6311.31 samples/sec Loss 4.5384 LearningRate 0.0002 Epoch: 22 Global Step: 464550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:38,407-Speed 6310.79 samples/sec Loss 4.5160 LearningRate 0.0002 Epoch: 22 Global Step: 464560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:41,652-Speed 6313.69 samples/sec Loss 4.5764 LearningRate 0.0002 Epoch: 22 Global Step: 464570 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 09:59:44,881-Speed 6343.38 samples/sec Loss 4.5159 LearningRate 0.0002 Epoch: 22 Global Step: 464580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:48,125-Speed 6314.50 samples/sec Loss 4.4914 LearningRate 0.0002 Epoch: 22 Global Step: 464590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
09:59:51,372-Speed 6310.41 samples/sec Loss 4.5988 LearningRate 0.0002 Epoch: 22 Global Step: 464600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:54,617-Speed 6312.56 samples/sec Loss 4.5703 LearningRate 0.0002 Epoch: 22 Global Step: 464610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 09:59:57,862-Speed 6312.54 samples/sec Loss 4.5236 LearningRate 0.0002 Epoch: 22 Global Step: 464620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:01,104-Speed 6318.50 samples/sec Loss 4.5493 LearningRate 0.0002 Epoch: 22 Global Step: 464630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:04,387-Speed 6239.66 samples/sec Loss 4.5276 LearningRate 0.0002 Epoch: 22 Global Step: 464640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:07,632-Speed 6313.42 samples/sec Loss 4.4933 LearningRate 0.0002 Epoch: 22 Global Step: 464650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:10,879-Speed 6307.68 samples/sec Loss 4.5670 LearningRate 0.0002 Epoch: 22 Global Step: 464660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:14,126-Speed 6308.22 samples/sec Loss 4.5278 LearningRate 0.0002 Epoch: 22 Global Step: 464670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:17,357-Speed 6340.04 samples/sec Loss 4.5189 LearningRate 0.0002 Epoch: 22 Global Step: 464680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:20,604-Speed 6310.05 samples/sec Loss 4.4292 LearningRate 0.0002 Epoch: 22 Global Step: 464690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:23,849-Speed 6311.15 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 22 Global Step: 464700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:27,094-Speed 6314.46 samples/sec Loss 4.5267 LearningRate 0.0002 Epoch: 22 Global Step: 464710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:30,343-Speed 6304.77 
samples/sec Loss 4.5812 LearningRate 0.0002 Epoch: 22 Global Step: 464720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:33,587-Speed 6313.10 samples/sec Loss 4.5511 LearningRate 0.0002 Epoch: 22 Global Step: 464730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:36,835-Speed 6307.04 samples/sec Loss 4.5340 LearningRate 0.0002 Epoch: 22 Global Step: 464740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:40,126-Speed 6225.17 samples/sec Loss 4.4804 LearningRate 0.0002 Epoch: 22 Global Step: 464750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:43,429-Speed 6201.10 samples/sec Loss 4.5576 LearningRate 0.0002 Epoch: 22 Global Step: 464760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:46,674-Speed 6313.48 samples/sec Loss 4.5402 LearningRate 0.0002 Epoch: 22 Global Step: 464770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:49,915-Speed 6319.72 samples/sec Loss 4.5356 LearningRate 0.0002 Epoch: 22 Global Step: 464780 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:00:53,150-Speed 6331.57 samples/sec Loss 4.4803 LearningRate 0.0002 Epoch: 22 Global Step: 464790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:56,397-Speed 6309.42 samples/sec Loss 4.5905 LearningRate 0.0002 Epoch: 22 Global Step: 464800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:00:59,646-Speed 6305.87 samples/sec Loss 4.4922 LearningRate 0.0002 Epoch: 22 Global Step: 464810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:02,950-Speed 6200.38 samples/sec Loss 4.4699 LearningRate 0.0002 Epoch: 22 Global Step: 464820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:06,195-Speed 6312.68 samples/sec Loss 4.5379 LearningRate 0.0002 Epoch: 22 Global Step: 464830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:09,444-Speed 6304.43 samples/sec Loss 4.5781 
LearningRate 0.0002 Epoch: 22 Global Step: 464840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:12,688-Speed 6314.91 samples/sec Loss 4.6032 LearningRate 0.0002 Epoch: 22 Global Step: 464850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:15,934-Speed 6309.51 samples/sec Loss 4.4708 LearningRate 0.0002 Epoch: 22 Global Step: 464860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:19,180-Speed 6312.16 samples/sec Loss 4.5854 LearningRate 0.0002 Epoch: 22 Global Step: 464870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:22,424-Speed 6313.05 samples/sec Loss 4.5571 LearningRate 0.0002 Epoch: 22 Global Step: 464880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:25,671-Speed 6309.99 samples/sec Loss 4.4966 LearningRate 0.0002 Epoch: 22 Global Step: 464890 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:01:28,902-Speed 6340.05 samples/sec Loss 4.5129 LearningRate 0.0002 Epoch: 22 Global Step: 464900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:32,149-Speed 6308.03 samples/sec Loss 4.4885 LearningRate 0.0002 Epoch: 22 Global Step: 464910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:35,391-Speed 6318.56 samples/sec Loss 4.4324 LearningRate 0.0002 Epoch: 22 Global Step: 464920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:38,636-Speed 6312.59 samples/sec Loss 4.5007 LearningRate 0.0002 Epoch: 22 Global Step: 464930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:41,879-Speed 6316.71 samples/sec Loss 4.5036 LearningRate 0.0002 Epoch: 22 Global Step: 464940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:45,124-Speed 6312.50 samples/sec Loss 4.5317 LearningRate 0.0002 Epoch: 22 Global Step: 464950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:48,369-Speed 6313.83 samples/sec Loss 4.5010 LearningRate 0.0002 Epoch: 22 
Global Step: 464960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:51,612-Speed 6315.36 samples/sec Loss 4.4564 LearningRate 0.0002 Epoch: 22 Global Step: 464970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:54,860-Speed 6307.34 samples/sec Loss 4.5303 LearningRate 0.0002 Epoch: 22 Global Step: 464980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:01:58,112-Speed 6298.84 samples/sec Loss 4.5528 LearningRate 0.0002 Epoch: 22 Global Step: 464990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:01,349-Speed 6328.74 samples/sec Loss 4.5366 LearningRate 0.0002 Epoch: 22 Global Step: 465000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:04,595-Speed 6310.15 samples/sec Loss 4.5747 LearningRate 0.0002 Epoch: 22 Global Step: 465010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:07,848-Speed 6299.19 samples/sec Loss 4.5695 LearningRate 0.0002 Epoch: 22 Global Step: 465020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:11,126-Speed 6247.61 samples/sec Loss 4.4267 LearningRate 0.0002 Epoch: 22 Global Step: 465030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:14,388-Speed 6280.59 samples/sec Loss 4.5431 LearningRate 0.0002 Epoch: 22 Global Step: 465040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:17,633-Speed 6313.61 samples/sec Loss 4.5419 LearningRate 0.0002 Epoch: 22 Global Step: 465050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:20,875-Speed 6317.02 samples/sec Loss 4.5202 LearningRate 0.0002 Epoch: 22 Global Step: 465060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:24,118-Speed 6316.22 samples/sec Loss 4.5325 LearningRate 0.0002 Epoch: 22 Global Step: 465070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:27,362-Speed 6315.06 samples/sec Loss 4.5476 LearningRate 0.0002 Epoch: 22 Global Step: 465080 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:30,608-Speed 6310.90 samples/sec Loss 4.4976 LearningRate 0.0002 Epoch: 22 Global Step: 465090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:33,860-Speed 6299.16 samples/sec Loss 4.5430 LearningRate 0.0002 Epoch: 22 Global Step: 465100 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:02:37,091-Speed 6340.32 samples/sec Loss 4.5373 LearningRate 0.0002 Epoch: 22 Global Step: 465110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:40,335-Speed 6314.94 samples/sec Loss 4.5237 LearningRate 0.0002 Epoch: 22 Global Step: 465120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:43,580-Speed 6311.81 samples/sec Loss 4.4801 LearningRate 0.0002 Epoch: 22 Global Step: 465130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:46,825-Speed 6313.87 samples/sec Loss 4.5041 LearningRate 0.0002 Epoch: 22 Global Step: 465140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:50,083-Speed 6285.66 samples/sec Loss 4.5010 LearningRate 0.0002 Epoch: 22 Global Step: 465150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:53,325-Speed 6319.62 samples/sec Loss 4.4309 LearningRate 0.0002 Epoch: 22 Global Step: 465160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:56,567-Speed 6317.38 samples/sec Loss 4.5427 LearningRate 0.0002 Epoch: 22 Global Step: 465170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:02:59,813-Speed 6312.18 samples/sec Loss 4.5450 LearningRate 0.0002 Epoch: 22 Global Step: 465180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:03,056-Speed 6315.08 samples/sec Loss 4.5056 LearningRate 0.0002 Epoch: 22 Global Step: 465190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:06,307-Speed 6302.08 samples/sec Loss 4.5250 LearningRate 0.0002 Epoch: 22 Global Step: 465200 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:03:09,535-Speed 6345.40 samples/sec Loss 4.5680 LearningRate 0.0002 Epoch: 22 Global Step: 465210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:12,780-Speed 6312.47 samples/sec Loss 4.5303 LearningRate 0.0002 Epoch: 22 Global Step: 465220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:16,034-Speed 6295.65 samples/sec Loss 4.5703 LearningRate 0.0002 Epoch: 22 Global Step: 465230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:19,280-Speed 6312.28 samples/sec Loss 4.5731 LearningRate 0.0002 Epoch: 22 Global Step: 465240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:22,530-Speed 6301.67 samples/sec Loss 4.5518 LearningRate 0.0002 Epoch: 22 Global Step: 465250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:25,776-Speed 6310.90 samples/sec Loss 4.4983 LearningRate 0.0002 Epoch: 22 Global Step: 465260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:29,043-Speed 6270.60 samples/sec Loss 4.5573 LearningRate 0.0002 Epoch: 22 Global Step: 465270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:32,292-Speed 6305.54 samples/sec Loss 4.4980 LearningRate 0.0002 Epoch: 22 Global Step: 465280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:35,537-Speed 6311.68 samples/sec Loss 4.4868 LearningRate 0.0002 Epoch: 22 Global Step: 465290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:38,783-Speed 6311.46 samples/sec Loss 4.4965 LearningRate 0.0002 Epoch: 22 Global Step: 465300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:42,013-Speed 6341.76 samples/sec Loss 4.4787 LearningRate 0.0002 Epoch: 22 Global Step: 465310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:45,260-Speed 6308.89 samples/sec Loss 4.5336 LearningRate 0.0002 Epoch: 22 Global Step: 465320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:03:48,503-Speed 6317.69 samples/sec Loss 4.5243 LearningRate 0.0002 Epoch: 22 Global Step: 465330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:51,745-Speed 6317.15 samples/sec Loss 4.5430 LearningRate 0.0002 Epoch: 22 Global Step: 465340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:54,990-Speed 6312.78 samples/sec Loss 4.5042 LearningRate 0.0002 Epoch: 22 Global Step: 465350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:03:58,238-Speed 6306.52 samples/sec Loss 4.5221 LearningRate 0.0002 Epoch: 22 Global Step: 465360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:01,488-Speed 6302.53 samples/sec Loss 4.4632 LearningRate 0.0002 Epoch: 22 Global Step: 465370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:04,730-Speed 6318.98 samples/sec Loss 4.5138 LearningRate 0.0002 Epoch: 22 Global Step: 465380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:07,973-Speed 6316.23 samples/sec Loss 4.4981 LearningRate 0.0002 Epoch: 22 Global Step: 465390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:11,214-Speed 6320.07 samples/sec Loss 4.4668 LearningRate 0.0002 Epoch: 22 Global Step: 465400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:14,461-Speed 6309.70 samples/sec Loss 4.5406 LearningRate 0.0002 Epoch: 22 Global Step: 465410 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:04:17,691-Speed 6342.32 samples/sec Loss 4.4939 LearningRate 0.0002 Epoch: 22 Global Step: 465420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:20,935-Speed 6313.89 samples/sec Loss 4.5125 LearningRate 0.0002 Epoch: 22 Global Step: 465430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:24,181-Speed 6311.69 samples/sec Loss 4.5554 LearningRate 0.0002 Epoch: 22 Global Step: 465440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:27,428-Speed 6308.71 
samples/sec Loss 4.5460 LearningRate 0.0002 Epoch: 22 Global Step: 465450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:30,674-Speed 6311.78 samples/sec Loss 4.5193 LearningRate 0.0002 Epoch: 22 Global Step: 465460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:33,920-Speed 6310.61 samples/sec Loss 4.5013 LearningRate 0.0002 Epoch: 22 Global Step: 465470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:37,165-Speed 6311.68 samples/sec Loss 4.4152 LearningRate 0.0002 Epoch: 22 Global Step: 465480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:40,414-Speed 6305.89 samples/sec Loss 4.5113 LearningRate 0.0002 Epoch: 22 Global Step: 465490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:43,659-Speed 6312.64 samples/sec Loss 4.5643 LearningRate 0.0002 Epoch: 22 Global Step: 465500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:46,907-Speed 6306.33 samples/sec Loss 4.5224 LearningRate 0.0002 Epoch: 22 Global Step: 465510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:50,139-Speed 6338.33 samples/sec Loss 4.5221 LearningRate 0.0002 Epoch: 22 Global Step: 465520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:53,384-Speed 6312.69 samples/sec Loss 4.5697 LearningRate 0.0002 Epoch: 22 Global Step: 465530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:56,631-Speed 6309.91 samples/sec Loss 4.5181 LearningRate 0.0002 Epoch: 22 Global Step: 465540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:04:59,871-Speed 6320.73 samples/sec Loss 4.4830 LearningRate 0.0002 Epoch: 22 Global Step: 465550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:03,118-Speed 6309.41 samples/sec Loss 4.5083 LearningRate 0.0002 Epoch: 22 Global Step: 465560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:06,362-Speed 6314.67 samples/sec Loss 4.5250 
LearningRate 0.0002 Epoch: 22 Global Step: 465570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:09,610-Speed 6307.45 samples/sec Loss 4.4452 LearningRate 0.0002 Epoch: 22 Global Step: 465580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:12,860-Speed 6302.03 samples/sec Loss 4.5383 LearningRate 0.0002 Epoch: 22 Global Step: 465590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:16,109-Speed 6306.14 samples/sec Loss 4.5047 LearningRate 0.0002 Epoch: 22 Global Step: 465600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:19,357-Speed 6306.06 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 22 Global Step: 465610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:22,604-Speed 6308.00 samples/sec Loss 4.5277 LearningRate 0.0002 Epoch: 22 Global Step: 465620 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:05:25,844-Speed 6323.37 samples/sec Loss 4.4889 LearningRate 0.0002 Epoch: 22 Global Step: 465630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:29,093-Speed 6304.20 samples/sec Loss 4.4931 LearningRate 0.0002 Epoch: 22 Global Step: 465640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:32,340-Speed 6310.14 samples/sec Loss 4.5247 LearningRate 0.0002 Epoch: 22 Global Step: 465650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:35,593-Speed 6296.94 samples/sec Loss 4.5081 LearningRate 0.0002 Epoch: 22 Global Step: 465660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:38,840-Speed 6309.31 samples/sec Loss 4.5096 LearningRate 0.0002 Epoch: 22 Global Step: 465670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:42,086-Speed 6310.12 samples/sec Loss 4.5362 LearningRate 0.0002 Epoch: 22 Global Step: 465680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:45,337-Speed 6300.74 samples/sec Loss 4.4781 LearningRate 0.0002 Epoch: 22 
Global Step: 465690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:48,584-Speed 6309.60 samples/sec Loss 4.4491 LearningRate 0.0002 Epoch: 22 Global Step: 465700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:51,831-Speed 6308.49 samples/sec Loss 4.5578 LearningRate 0.0002 Epoch: 22 Global Step: 465710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:55,074-Speed 6315.48 samples/sec Loss 4.4998 LearningRate 0.0002 Epoch: 22 Global Step: 465720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:05:58,309-Speed 6332.51 samples/sec Loss 4.5404 LearningRate 0.0002 Epoch: 22 Global Step: 465730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:01,560-Speed 6301.92 samples/sec Loss 4.5381 LearningRate 0.0002 Epoch: 22 Global Step: 465740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:04,807-Speed 6308.41 samples/sec Loss 4.5401 LearningRate 0.0002 Epoch: 22 Global Step: 465750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:08,052-Speed 6313.01 samples/sec Loss 4.5295 LearningRate 0.0002 Epoch: 22 Global Step: 465760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:11,297-Speed 6313.23 samples/sec Loss 4.5493 LearningRate 0.0002 Epoch: 22 Global Step: 465770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:14,564-Speed 6269.26 samples/sec Loss 4.4814 LearningRate 0.0002 Epoch: 22 Global Step: 465780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:17,915-Speed 6113.16 samples/sec Loss 4.5787 LearningRate 0.0002 Epoch: 22 Global Step: 465790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:21,174-Speed 6285.43 samples/sec Loss 4.5379 LearningRate 0.0002 Epoch: 22 Global Step: 465800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:24,418-Speed 6314.39 samples/sec Loss 4.4951 LearningRate 0.0002 Epoch: 22 Global Step: 465810 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:27,661-Speed 6316.08 samples/sec Loss 4.5864 LearningRate 0.0002 Epoch: 22 Global Step: 465820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:30,907-Speed 6310.63 samples/sec Loss 4.4674 LearningRate 0.0002 Epoch: 22 Global Step: 465830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:34,150-Speed 6317.12 samples/sec Loss 4.4990 LearningRate 0.0002 Epoch: 22 Global Step: 465840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:37,447-Speed 6213.68 samples/sec Loss 4.5438 LearningRate 0.0002 Epoch: 22 Global Step: 465850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:40,796-Speed 6115.97 samples/sec Loss 4.5735 LearningRate 0.0002 Epoch: 22 Global Step: 465860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:44,045-Speed 6305.73 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 465870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:47,290-Speed 6311.91 samples/sec Loss 4.4652 LearningRate 0.0002 Epoch: 22 Global Step: 465880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:50,540-Speed 6304.19 samples/sec Loss 4.5418 LearningRate 0.0002 Epoch: 22 Global Step: 465890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:53,787-Speed 6308.75 samples/sec Loss 4.5327 LearningRate 0.0002 Epoch: 22 Global Step: 465900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:06:57,032-Speed 6312.60 samples/sec Loss 4.5257 LearningRate 0.0002 Epoch: 22 Global Step: 465910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:00,279-Speed 6309.24 samples/sec Loss 4.4720 LearningRate 0.0002 Epoch: 22 Global Step: 465920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:03,523-Speed 6314.13 samples/sec Loss 4.5204 LearningRate 0.0002 Epoch: 22 Global Step: 465930 Fp16 Grad Scale: 32768 Required: 33 hours 
Training: 2022-04-02 10:07:06,757-Speed 6334.83 samples/sec Loss 4.4914 LearningRate 0.0002 Epoch: 22 Global Step: 465940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:10,001-Speed 6313.95 samples/sec Loss 4.5172 LearningRate 0.0002 Epoch: 22 Global Step: 465950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:13,241-Speed 6321.76 samples/sec Loss 4.4398 LearningRate 0.0002 Epoch: 22 Global Step: 465960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:16,484-Speed 6317.07 samples/sec Loss 4.4940 LearningRate 0.0002 Epoch: 22 Global Step: 465970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:19,727-Speed 6316.11 samples/sec Loss 4.4645 LearningRate 0.0002 Epoch: 22 Global Step: 465980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:22,972-Speed 6312.65 samples/sec Loss 4.5704 LearningRate 0.0002 Epoch: 22 Global Step: 465990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:26,219-Speed 6308.98 samples/sec Loss 4.5436 LearningRate 0.0002 Epoch: 22 Global Step: 466000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:29,463-Speed 6314.63 samples/sec Loss 4.4739 LearningRate 0.0002 Epoch: 22 Global Step: 466010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:32,712-Speed 6306.30 samples/sec Loss 4.5797 LearningRate 0.0002 Epoch: 22 Global Step: 466020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:35,953-Speed 6318.73 samples/sec Loss 4.5129 LearningRate 0.0002 Epoch: 22 Global Step: 466030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:39,202-Speed 6306.72 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 22 Global Step: 466040 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:07:42,432-Speed 6341.50 samples/sec Loss 4.4621 LearningRate 0.0002 Epoch: 22 Global Step: 466050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:07:45,674-Speed 6318.76 samples/sec Loss 4.5569 LearningRate 0.0002 Epoch: 22 Global Step: 466060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:48,922-Speed 6306.21 samples/sec Loss 4.5096 LearningRate 0.0002 Epoch: 22 Global Step: 466070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:52,183-Speed 6282.46 samples/sec Loss 4.5362 LearningRate 0.0002 Epoch: 22 Global Step: 466080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:55,478-Speed 6216.73 samples/sec Loss 4.4411 LearningRate 0.0002 Epoch: 22 Global Step: 466090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:07:58,721-Speed 6316.52 samples/sec Loss 4.5027 LearningRate 0.0002 Epoch: 22 Global Step: 466100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:01,969-Speed 6307.89 samples/sec Loss 4.5306 LearningRate 0.0002 Epoch: 22 Global Step: 466110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:05,215-Speed 6311.05 samples/sec Loss 4.5566 LearningRate 0.0002 Epoch: 22 Global Step: 466120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:08,456-Speed 6319.60 samples/sec Loss 4.4507 LearningRate 0.0002 Epoch: 22 Global Step: 466130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:11,702-Speed 6310.79 samples/sec Loss 4.4998 LearningRate 0.0002 Epoch: 22 Global Step: 466140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:14,932-Speed 6342.00 samples/sec Loss 4.5673 LearningRate 0.0002 Epoch: 22 Global Step: 466150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:18,175-Speed 6316.80 samples/sec Loss 4.5448 LearningRate 0.0002 Epoch: 22 Global Step: 466160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:21,428-Speed 6297.26 samples/sec Loss 4.5493 LearningRate 0.0002 Epoch: 22 Global Step: 466170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:24,670-Speed 6318.19 
samples/sec Loss 4.5456 LearningRate 0.0002 Epoch: 22 Global Step: 466180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:27,917-Speed 6307.36 samples/sec Loss 4.4833 LearningRate 0.0002 Epoch: 22 Global Step: 466190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:31,172-Speed 6293.16 samples/sec Loss 4.5528 LearningRate 0.0002 Epoch: 22 Global Step: 466200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:34,470-Speed 6211.36 samples/sec Loss 4.5245 LearningRate 0.0002 Epoch: 22 Global Step: 466210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:37,713-Speed 6317.31 samples/sec Loss 4.5808 LearningRate 0.0002 Epoch: 22 Global Step: 466220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:40,960-Speed 6309.51 samples/sec Loss 4.4711 LearningRate 0.0002 Epoch: 22 Global Step: 466230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:44,204-Speed 6314.87 samples/sec Loss 4.5184 LearningRate 0.0002 Epoch: 22 Global Step: 466240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:47,459-Speed 6292.39 samples/sec Loss 4.5554 LearningRate 0.0002 Epoch: 22 Global Step: 466250 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:08:50,689-Speed 6342.79 samples/sec Loss 4.4600 LearningRate 0.0002 Epoch: 22 Global Step: 466260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:53,932-Speed 6316.90 samples/sec Loss 4.4793 LearningRate 0.0002 Epoch: 22 Global Step: 466270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:08:57,178-Speed 6311.37 samples/sec Loss 4.5132 LearningRate 0.0002 Epoch: 22 Global Step: 466280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:00,424-Speed 6309.03 samples/sec Loss 4.5420 LearningRate 0.0002 Epoch: 22 Global Step: 466290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:03,666-Speed 6320.38 samples/sec Loss 4.5263 
LearningRate 0.0002 Epoch: 22 Global Step: 466300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:06,913-Speed 6307.15 samples/sec Loss 4.5576 LearningRate 0.0002 Epoch: 22 Global Step: 466310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:10,156-Speed 6316.78 samples/sec Loss 4.5546 LearningRate 0.0002 Epoch: 22 Global Step: 466320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:13,402-Speed 6311.78 samples/sec Loss 4.4907 LearningRate 0.0002 Epoch: 22 Global Step: 466330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:16,647-Speed 6312.41 samples/sec Loss 4.5206 LearningRate 0.0002 Epoch: 22 Global Step: 466340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:19,893-Speed 6309.75 samples/sec Loss 4.4927 LearningRate 0.0002 Epoch: 22 Global Step: 466350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:23,124-Speed 6341.45 samples/sec Loss 4.4615 LearningRate 0.0002 Epoch: 22 Global Step: 466360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:26,368-Speed 6313.10 samples/sec Loss 4.5097 LearningRate 0.0002 Epoch: 22 Global Step: 466370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:29,613-Speed 6313.87 samples/sec Loss 4.4902 LearningRate 0.0002 Epoch: 22 Global Step: 466380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:32,857-Speed 6315.14 samples/sec Loss 4.4881 LearningRate 0.0002 Epoch: 22 Global Step: 466390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:36,100-Speed 6314.42 samples/sec Loss 4.5329 LearningRate 0.0002 Epoch: 22 Global Step: 466400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:39,385-Speed 6237.90 samples/sec Loss 4.5077 LearningRate 0.0002 Epoch: 22 Global Step: 466410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:42,628-Speed 6314.73 samples/sec Loss 4.4921 LearningRate 0.0002 Epoch: 22 
Global Step: 466420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:45,873-Speed 6313.57 samples/sec Loss 4.4707 LearningRate 0.0002 Epoch: 22 Global Step: 466430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:49,118-Speed 6311.42 samples/sec Loss 4.5165 LearningRate 0.0002 Epoch: 22 Global Step: 466440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:52,361-Speed 6317.83 samples/sec Loss 4.5084 LearningRate 0.0002 Epoch: 22 Global Step: 466450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:09:55,604-Speed 6316.03 samples/sec Loss 4.4944 LearningRate 0.0002 Epoch: 22 Global Step: 466460 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:09:58,837-Speed 6335.77 samples/sec Loss 4.5272 LearningRate 0.0002 Epoch: 22 Global Step: 466470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:02,085-Speed 6306.90 samples/sec Loss 4.4934 LearningRate 0.0002 Epoch: 22 Global Step: 466480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:05,332-Speed 6310.62 samples/sec Loss 4.5567 LearningRate 0.0002 Epoch: 22 Global Step: 466490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:08,577-Speed 6311.69 samples/sec Loss 4.5228 LearningRate 0.0002 Epoch: 22 Global Step: 466500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:11,825-Speed 6307.72 samples/sec Loss 4.5495 LearningRate 0.0002 Epoch: 22 Global Step: 466510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:15,152-Speed 6157.09 samples/sec Loss 4.4774 LearningRate 0.0002 Epoch: 22 Global Step: 466520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:18,482-Speed 6150.34 samples/sec Loss 4.4865 LearningRate 0.0002 Epoch: 22 Global Step: 466530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:21,724-Speed 6318.68 samples/sec Loss 4.4790 LearningRate 0.0002 Epoch: 22 Global Step: 466540 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:24,972-Speed 6307.43 samples/sec Loss 4.4935 LearningRate 0.0002 Epoch: 22 Global Step: 466550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:28,220-Speed 6307.37 samples/sec Loss 4.4891 LearningRate 0.0002 Epoch: 22 Global Step: 466560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:31,452-Speed 6337.24 samples/sec Loss 4.5143 LearningRate 0.0002 Epoch: 22 Global Step: 466570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:34,697-Speed 6313.91 samples/sec Loss 4.4630 LearningRate 0.0002 Epoch: 22 Global Step: 466580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:37,945-Speed 6306.80 samples/sec Loss 4.5090 LearningRate 0.0002 Epoch: 22 Global Step: 466590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:41,187-Speed 6317.72 samples/sec Loss 4.5164 LearningRate 0.0002 Epoch: 22 Global Step: 466600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:44,433-Speed 6310.10 samples/sec Loss 4.5576 LearningRate 0.0002 Epoch: 22 Global Step: 466610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:47,680-Speed 6308.40 samples/sec Loss 4.5814 LearningRate 0.0002 Epoch: 22 Global Step: 466620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:50,927-Speed 6310.42 samples/sec Loss 4.5157 LearningRate 0.0002 Epoch: 22 Global Step: 466630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:54,173-Speed 6309.54 samples/sec Loss 4.5226 LearningRate 0.0002 Epoch: 22 Global Step: 466640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:10:57,416-Speed 6317.27 samples/sec Loss 4.4449 LearningRate 0.0002 Epoch: 22 Global Step: 466650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:00,660-Speed 6315.01 samples/sec Loss 4.4557 LearningRate 0.0002 Epoch: 22 Global Step: 466660 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:11:03,906-Speed 6310.34 samples/sec Loss 4.4693 LearningRate 0.0002 Epoch: 22 Global Step: 466670 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:11:07,142-Speed 6330.33 samples/sec Loss 4.5174 LearningRate 0.0002 Epoch: 22 Global Step: 466680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:10,395-Speed 6296.92 samples/sec Loss 4.5567 LearningRate 0.0002 Epoch: 22 Global Step: 466690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:13,637-Speed 6317.69 samples/sec Loss 4.4858 LearningRate 0.0002 Epoch: 22 Global Step: 466700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:16,884-Speed 6310.54 samples/sec Loss 4.4761 LearningRate 0.0002 Epoch: 22 Global Step: 466710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:20,130-Speed 6311.18 samples/sec Loss 4.5580 LearningRate 0.0002 Epoch: 22 Global Step: 466720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:23,379-Speed 6303.99 samples/sec Loss 4.5689 LearningRate 0.0002 Epoch: 22 Global Step: 466730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:26,626-Speed 6309.30 samples/sec Loss 4.4943 LearningRate 0.0002 Epoch: 22 Global Step: 466740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:29,875-Speed 6305.52 samples/sec Loss 4.5604 LearningRate 0.0002 Epoch: 22 Global Step: 466750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:33,120-Speed 6312.97 samples/sec Loss 4.5024 LearningRate 0.0002 Epoch: 22 Global Step: 466760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:36,366-Speed 6309.81 samples/sec Loss 4.5063 LearningRate 0.0002 Epoch: 22 Global Step: 466770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:39,599-Speed 6337.15 samples/sec Loss 4.5373 LearningRate 0.0002 Epoch: 22 Global Step: 466780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:11:42,844-Speed 6310.91 samples/sec Loss 4.5670 LearningRate 0.0002 Epoch: 22 Global Step: 466790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:46,085-Speed 6321.27 samples/sec Loss 4.4022 LearningRate 0.0002 Epoch: 22 Global Step: 466800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:49,331-Speed 6310.89 samples/sec Loss 4.4675 LearningRate 0.0002 Epoch: 22 Global Step: 466810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:52,579-Speed 6306.80 samples/sec Loss 4.5104 LearningRate 0.0002 Epoch: 22 Global Step: 466820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:55,825-Speed 6309.89 samples/sec Loss 4.4893 LearningRate 0.0002 Epoch: 22 Global Step: 466830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:11:59,075-Speed 6303.20 samples/sec Loss 4.5319 LearningRate 0.0002 Epoch: 22 Global Step: 466840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:02,320-Speed 6312.83 samples/sec Loss 4.5147 LearningRate 0.0002 Epoch: 22 Global Step: 466850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:05,566-Speed 6310.51 samples/sec Loss 4.4729 LearningRate 0.0002 Epoch: 22 Global Step: 466860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:08,811-Speed 6312.58 samples/sec Loss 4.5003 LearningRate 0.0002 Epoch: 22 Global Step: 466870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:12,062-Speed 6302.34 samples/sec Loss 4.4519 LearningRate 0.0002 Epoch: 22 Global Step: 466880 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:12:15,298-Speed 6330.59 samples/sec Loss 4.5364 LearningRate 0.0002 Epoch: 22 Global Step: 466890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:18,543-Speed 6310.80 samples/sec Loss 4.5450 LearningRate 0.0002 Epoch: 22 Global Step: 466900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:21,793-Speed 6304.06 
samples/sec Loss 4.4706 LearningRate 0.0002 Epoch: 22 Global Step: 466910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:25,038-Speed 6313.09 samples/sec Loss 4.5727 LearningRate 0.0002 Epoch: 22 Global Step: 466920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:28,288-Speed 6303.62 samples/sec Loss 4.5452 LearningRate 0.0002 Epoch: 22 Global Step: 466930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:31,534-Speed 6309.55 samples/sec Loss 4.4596 LearningRate 0.0002 Epoch: 22 Global Step: 466940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:34,775-Speed 6322.21 samples/sec Loss 4.4625 LearningRate 0.0002 Epoch: 22 Global Step: 466950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:38,022-Speed 6307.89 samples/sec Loss 4.4923 LearningRate 0.0002 Epoch: 22 Global Step: 466960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:41,268-Speed 6310.31 samples/sec Loss 4.5495 LearningRate 0.0002 Epoch: 22 Global Step: 466970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:44,513-Speed 6313.19 samples/sec Loss 4.4645 LearningRate 0.0002 Epoch: 22 Global Step: 466980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:47,760-Speed 6309.12 samples/sec Loss 4.4803 LearningRate 0.0002 Epoch: 22 Global Step: 466990 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:12:50,992-Speed 6337.81 samples/sec Loss 4.5132 LearningRate 0.0002 Epoch: 22 Global Step: 467000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:54,240-Speed 6306.13 samples/sec Loss 4.5169 LearningRate 0.0002 Epoch: 22 Global Step: 467010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:12:57,492-Speed 6299.09 samples/sec Loss 4.4346 LearningRate 0.0002 Epoch: 22 Global Step: 467020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:00,738-Speed 6311.47 samples/sec Loss 4.5223 
LearningRate 0.0002 Epoch: 22 Global Step: 467030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:03,983-Speed 6313.47 samples/sec Loss 4.5760 LearningRate 0.0002 Epoch: 22 Global Step: 467040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:07,228-Speed 6311.63 samples/sec Loss 4.4886 LearningRate 0.0002 Epoch: 22 Global Step: 467050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:10,475-Speed 6308.11 samples/sec Loss 4.5233 LearningRate 0.0002 Epoch: 22 Global Step: 467060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:13,720-Speed 6313.93 samples/sec Loss 4.4433 LearningRate 0.0002 Epoch: 22 Global Step: 467070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:16,970-Speed 6301.68 samples/sec Loss 4.5216 LearningRate 0.0002 Epoch: 22 Global Step: 467080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:20,217-Speed 6308.61 samples/sec Loss 4.4928 LearningRate 0.0002 Epoch: 22 Global Step: 467090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:23,464-Speed 6309.34 samples/sec Loss 4.5226 LearningRate 0.0002 Epoch: 22 Global Step: 467100 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:13:26,698-Speed 6334.74 samples/sec Loss 4.5036 LearningRate 0.0002 Epoch: 22 Global Step: 467110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:29,944-Speed 6309.62 samples/sec Loss 4.5499 LearningRate 0.0002 Epoch: 22 Global Step: 467120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:33,197-Speed 6299.25 samples/sec Loss 4.5116 LearningRate 0.0002 Epoch: 22 Global Step: 467130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:36,441-Speed 6313.46 samples/sec Loss 4.5091 LearningRate 0.0002 Epoch: 22 Global Step: 467140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:39,689-Speed 6307.49 samples/sec Loss 4.5125 LearningRate 0.0002 Epoch: 22 
Global Step: 467150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:42,934-Speed 6312.72 samples/sec Loss 4.5620 LearningRate 0.0002 Epoch: 22 Global Step: 467160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:46,185-Speed 6301.92 samples/sec Loss 4.4925 LearningRate 0.0002 Epoch: 22 Global Step: 467170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:49,430-Speed 6312.66 samples/sec Loss 4.5220 LearningRate 0.0002 Epoch: 22 Global Step: 467180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:52,674-Speed 6313.88 samples/sec Loss 4.5331 LearningRate 0.0002 Epoch: 22 Global Step: 467190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:55,921-Speed 6308.38 samples/sec Loss 4.4698 LearningRate 0.0002 Epoch: 22 Global Step: 467200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:13:59,167-Speed 6309.99 samples/sec Loss 4.5322 LearningRate 0.0002 Epoch: 22 Global Step: 467210 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:14:02,400-Speed 6336.24 samples/sec Loss 4.4486 LearningRate 0.0002 Epoch: 22 Global Step: 467220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:05,646-Speed 6311.81 samples/sec Loss 4.5066 LearningRate 0.0002 Epoch: 22 Global Step: 467230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:08,893-Speed 6308.39 samples/sec Loss 4.5536 LearningRate 0.0002 Epoch: 22 Global Step: 467240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:12,193-Speed 6210.93 samples/sec Loss 4.4626 LearningRate 0.0002 Epoch: 22 Global Step: 467250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:15,442-Speed 6305.80 samples/sec Loss 4.5134 LearningRate 0.0002 Epoch: 22 Global Step: 467260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:18,691-Speed 6304.43 samples/sec Loss 4.4359 LearningRate 0.0002 Epoch: 22 Global Step: 467270 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:21,937-Speed 6310.57 samples/sec Loss 4.5298 LearningRate 0.0002 Epoch: 22 Global Step: 467280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:25,184-Speed 6309.10 samples/sec Loss 4.5410 LearningRate 0.0002 Epoch: 22 Global Step: 467290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:28,432-Speed 6306.81 samples/sec Loss 4.5408 LearningRate 0.0002 Epoch: 22 Global Step: 467300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:31,672-Speed 6321.71 samples/sec Loss 4.5110 LearningRate 0.0002 Epoch: 22 Global Step: 467310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:34,904-Speed 6338.14 samples/sec Loss 4.5401 LearningRate 0.0002 Epoch: 22 Global Step: 467320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:38,150-Speed 6309.42 samples/sec Loss 4.5357 LearningRate 0.0002 Epoch: 22 Global Step: 467330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:41,401-Speed 6301.96 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 467340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:44,650-Speed 6305.24 samples/sec Loss 4.4806 LearningRate 0.0002 Epoch: 22 Global Step: 467350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:47,897-Speed 6309.19 samples/sec Loss 4.5601 LearningRate 0.0002 Epoch: 22 Global Step: 467360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:51,140-Speed 6316.02 samples/sec Loss 4.5096 LearningRate 0.0002 Epoch: 22 Global Step: 467370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:54,385-Speed 6313.06 samples/sec Loss 4.5344 LearningRate 0.0002 Epoch: 22 Global Step: 467380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:14:57,632-Speed 6309.95 samples/sec Loss 4.5027 LearningRate 0.0002 Epoch: 22 Global Step: 467390 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:15:00,876-Speed 6312.87 samples/sec Loss 4.5042 LearningRate 0.0002 Epoch: 22 Global Step: 467400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:04,119-Speed 6318.23 samples/sec Loss 4.4648 LearningRate 0.0002 Epoch: 22 Global Step: 467410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:07,350-Speed 6340.31 samples/sec Loss 4.4511 LearningRate 0.0002 Epoch: 22 Global Step: 467420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:10,594-Speed 6313.29 samples/sec Loss 4.4458 LearningRate 0.0002 Epoch: 22 Global Step: 467430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:13,845-Speed 6302.42 samples/sec Loss 4.5070 LearningRate 0.0002 Epoch: 22 Global Step: 467440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:17,090-Speed 6312.34 samples/sec Loss 4.5797 LearningRate 0.0002 Epoch: 22 Global Step: 467450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:20,334-Speed 6313.15 samples/sec Loss 4.5098 LearningRate 0.0002 Epoch: 22 Global Step: 467460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:23,578-Speed 6316.34 samples/sec Loss 4.5385 LearningRate 0.0002 Epoch: 22 Global Step: 467470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:26,821-Speed 6314.76 samples/sec Loss 4.5050 LearningRate 0.0002 Epoch: 22 Global Step: 467480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:30,067-Speed 6311.21 samples/sec Loss 4.5297 LearningRate 0.0002 Epoch: 22 Global Step: 467490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:33,310-Speed 6316.78 samples/sec Loss 4.4703 LearningRate 0.0002 Epoch: 22 Global Step: 467500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:36,552-Speed 6318.97 samples/sec Loss 4.5636 LearningRate 0.0002 Epoch: 22 Global Step: 467510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:15:39,788-Speed 6329.77 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 22 Global Step: 467520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:43,031-Speed 6316.76 samples/sec Loss 4.4825 LearningRate 0.0002 Epoch: 22 Global Step: 467530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:46,281-Speed 6303.93 samples/sec Loss 4.5416 LearningRate 0.0002 Epoch: 22 Global Step: 467540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:49,524-Speed 6316.00 samples/sec Loss 4.5585 LearningRate 0.0002 Epoch: 22 Global Step: 467550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:52,771-Speed 6308.87 samples/sec Loss 4.4636 LearningRate 0.0002 Epoch: 22 Global Step: 467560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:56,014-Speed 6316.76 samples/sec Loss 4.5312 LearningRate 0.0002 Epoch: 22 Global Step: 467570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:15:59,258-Speed 6314.12 samples/sec Loss 4.4722 LearningRate 0.0002 Epoch: 22 Global Step: 467580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:02,504-Speed 6311.76 samples/sec Loss 4.5199 LearningRate 0.0002 Epoch: 22 Global Step: 467590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:05,749-Speed 6311.99 samples/sec Loss 4.5015 LearningRate 0.0002 Epoch: 22 Global Step: 467600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:08,994-Speed 6312.46 samples/sec Loss 4.5303 LearningRate 0.0002 Epoch: 22 Global Step: 467610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:12,227-Speed 6337.90 samples/sec Loss 4.5163 LearningRate 0.0002 Epoch: 22 Global Step: 467620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:15,473-Speed 6309.05 samples/sec Loss 4.5026 LearningRate 0.0002 Epoch: 22 Global Step: 467630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:18,722-Speed 6305.55 
samples/sec Loss 4.5289 LearningRate 0.0002 Epoch: 22 Global Step: 467640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:21,969-Speed 6307.47 samples/sec Loss 4.5154 LearningRate 0.0002 Epoch: 22 Global Step: 467650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:25,213-Speed 6314.91 samples/sec Loss 4.4854 LearningRate 0.0002 Epoch: 22 Global Step: 467660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:28,457-Speed 6315.52 samples/sec Loss 4.5504 LearningRate 0.0002 Epoch: 22 Global Step: 467670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:31,703-Speed 6309.87 samples/sec Loss 4.4967 LearningRate 0.0002 Epoch: 22 Global Step: 467680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:34,944-Speed 6320.35 samples/sec Loss 4.4991 LearningRate 0.0002 Epoch: 22 Global Step: 467690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:38,189-Speed 6313.18 samples/sec Loss 4.4829 LearningRate 0.0002 Epoch: 22 Global Step: 467700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:41,436-Speed 6310.25 samples/sec Loss 4.4782 LearningRate 0.0002 Epoch: 22 Global Step: 467710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:44,679-Speed 6316.17 samples/sec Loss 4.4826 LearningRate 0.0002 Epoch: 22 Global Step: 467720 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:16:47,907-Speed 6344.71 samples/sec Loss 4.5382 LearningRate 0.0002 Epoch: 22 Global Step: 467730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:51,150-Speed 6316.41 samples/sec Loss 4.4887 LearningRate 0.0002 Epoch: 22 Global Step: 467740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:54,398-Speed 6308.34 samples/sec Loss 4.5293 LearningRate 0.0002 Epoch: 22 Global Step: 467750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:16:57,644-Speed 6310.02 samples/sec Loss 4.5132 
LearningRate 0.0002 Epoch: 22 Global Step: 467760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:00,901-Speed 6289.80 samples/sec Loss 4.4853 LearningRate 0.0002 Epoch: 22 Global Step: 467770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:04,150-Speed 6306.30 samples/sec Loss 4.5152 LearningRate 0.0002 Epoch: 22 Global Step: 467780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:07,395-Speed 6310.93 samples/sec Loss 4.4893 LearningRate 0.0002 Epoch: 22 Global Step: 467790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:10,643-Speed 6308.06 samples/sec Loss 4.4841 LearningRate 0.0002 Epoch: 22 Global Step: 467800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:13,886-Speed 6316.18 samples/sec Loss 4.4776 LearningRate 0.0002 Epoch: 22 Global Step: 467810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:17,132-Speed 6311.04 samples/sec Loss 4.4077 LearningRate 0.0002 Epoch: 22 Global Step: 467820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:20,364-Speed 6337.39 samples/sec Loss 4.5497 LearningRate 0.0002 Epoch: 22 Global Step: 467830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:23,611-Speed 6308.18 samples/sec Loss 4.4834 LearningRate 0.0002 Epoch: 22 Global Step: 467840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:26,855-Speed 6315.95 samples/sec Loss 4.4566 LearningRate 0.0002 Epoch: 22 Global Step: 467850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:30,098-Speed 6316.08 samples/sec Loss 4.5030 LearningRate 0.0002 Epoch: 22 Global Step: 467860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:33,345-Speed 6309.61 samples/sec Loss 4.5051 LearningRate 0.0002 Epoch: 22 Global Step: 467870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:36,587-Speed 6317.69 samples/sec Loss 4.5241 LearningRate 0.0002 Epoch: 22 
Global Step: 467880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:39,843-Speed 6290.95 samples/sec Loss 4.5031 LearningRate 0.0002 Epoch: 22 Global Step: 467890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:43,093-Speed 6303.26 samples/sec Loss 4.5008 LearningRate 0.0002 Epoch: 22 Global Step: 467900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:46,337-Speed 6314.68 samples/sec Loss 4.4824 LearningRate 0.0002 Epoch: 22 Global Step: 467910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:49,593-Speed 6291.43 samples/sec Loss 4.5743 LearningRate 0.0002 Epoch: 22 Global Step: 467920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:17:52,869-Speed 6253.72 samples/sec Loss 4.4380 LearningRate 0.0002 Epoch: 22 Global Step: 467930 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:17:56,128-Speed 6285.04 samples/sec Loss 4.5612 LearningRate 0.0002 Epoch: 22 Global Step: 467940 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:17:59,361-Speed 6335.40 samples/sec Loss 4.4903 LearningRate 0.0002 Epoch: 22 Global Step: 467950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:02,634-Speed 6259.65 samples/sec Loss 4.4530 LearningRate 0.0002 Epoch: 22 Global Step: 467960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:05,882-Speed 6307.66 samples/sec Loss 4.4379 LearningRate 0.0002 Epoch: 22 Global Step: 467970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:09,129-Speed 6307.25 samples/sec Loss 4.5297 LearningRate 0.0002 Epoch: 22 Global Step: 467980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:12,379-Speed 6303.93 samples/sec Loss 4.5152 LearningRate 0.0002 Epoch: 22 Global Step: 467990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:15,678-Speed 6209.43 samples/sec Loss 4.4962 LearningRate 0.0002 Epoch: 22 Global Step: 468000 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:18,998-Speed 6169.66 samples/sec Loss 4.5216 LearningRate 0.0002 Epoch: 22 Global Step: 468010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:22,248-Speed 6303.59 samples/sec Loss 4.5046 LearningRate 0.0002 Epoch: 22 Global Step: 468020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:25,495-Speed 6308.67 samples/sec Loss 4.4510 LearningRate 0.0002 Epoch: 22 Global Step: 468030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:28,742-Speed 6309.67 samples/sec Loss 4.4788 LearningRate 0.0002 Epoch: 22 Global Step: 468040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:31,973-Speed 6338.35 samples/sec Loss 4.4778 LearningRate 0.0002 Epoch: 22 Global Step: 468050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:35,221-Speed 6307.91 samples/sec Loss 4.5177 LearningRate 0.0002 Epoch: 22 Global Step: 468060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:38,469-Speed 6306.58 samples/sec Loss 4.5314 LearningRate 0.0002 Epoch: 22 Global Step: 468070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:41,716-Speed 6310.03 samples/sec Loss 4.4727 LearningRate 0.0002 Epoch: 22 Global Step: 468080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:44,961-Speed 6312.51 samples/sec Loss 4.4759 LearningRate 0.0002 Epoch: 22 Global Step: 468090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:48,210-Speed 6303.90 samples/sec Loss 4.5114 LearningRate 0.0002 Epoch: 22 Global Step: 468100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:51,459-Speed 6304.71 samples/sec Loss 4.5054 LearningRate 0.0002 Epoch: 22 Global Step: 468110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:18:54,707-Speed 6306.40 samples/sec Loss 4.5339 LearningRate 0.0002 Epoch: 22 Global Step: 468120 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:18:57,977-Speed 6264.19 samples/sec Loss 4.5151 LearningRate 0.0002 Epoch: 22 Global Step: 468130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:01,221-Speed 6314.26 samples/sec Loss 4.4573 LearningRate 0.0002 Epoch: 22 Global Step: 468140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:04,480-Speed 6285.92 samples/sec Loss 4.4957 LearningRate 0.0002 Epoch: 22 Global Step: 468150 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:19:07,712-Speed 6339.76 samples/sec Loss 4.5105 LearningRate 0.0002 Epoch: 22 Global Step: 468160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:10,955-Speed 6315.85 samples/sec Loss 4.4997 LearningRate 0.0002 Epoch: 22 Global Step: 468170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:14,198-Speed 6316.19 samples/sec Loss 4.5474 LearningRate 0.0002 Epoch: 22 Global Step: 468180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:17,442-Speed 6314.88 samples/sec Loss 4.4487 LearningRate 0.0002 Epoch: 22 Global Step: 468190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:20,685-Speed 6316.90 samples/sec Loss 4.4963 LearningRate 0.0002 Epoch: 22 Global Step: 468200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:23,931-Speed 6310.54 samples/sec Loss 4.4536 LearningRate 0.0002 Epoch: 22 Global Step: 468210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:27,174-Speed 6316.67 samples/sec Loss 4.4814 LearningRate 0.0002 Epoch: 22 Global Step: 468220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:30,419-Speed 6313.41 samples/sec Loss 4.4931 LearningRate 0.0002 Epoch: 22 Global Step: 468230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:33,664-Speed 6312.45 samples/sec Loss 4.5030 LearningRate 0.0002 Epoch: 22 Global Step: 468240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:19:36,915-Speed 6301.10 samples/sec Loss 4.5453 LearningRate 0.0002 Epoch: 22 Global Step: 468250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:40,151-Speed 6330.53 samples/sec Loss 4.4381 LearningRate 0.0002 Epoch: 22 Global Step: 468260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:43,397-Speed 6309.99 samples/sec Loss 4.4812 LearningRate 0.0002 Epoch: 22 Global Step: 468270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:46,657-Speed 6285.09 samples/sec Loss 4.4585 LearningRate 0.0002 Epoch: 22 Global Step: 468280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:49,958-Speed 6205.39 samples/sec Loss 4.4907 LearningRate 0.0002 Epoch: 22 Global Step: 468290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:53,199-Speed 6319.36 samples/sec Loss 4.5289 LearningRate 0.0002 Epoch: 22 Global Step: 468300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:56,447-Speed 6306.09 samples/sec Loss 4.4689 LearningRate 0.0002 Epoch: 22 Global Step: 468310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:19:59,732-Speed 6236.00 samples/sec Loss 4.4900 LearningRate 0.0002 Epoch: 22 Global Step: 468320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:02,979-Speed 6309.81 samples/sec Loss 4.5124 LearningRate 0.0002 Epoch: 22 Global Step: 468330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:06,223-Speed 6313.80 samples/sec Loss 4.4285 LearningRate 0.0002 Epoch: 22 Global Step: 468340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:09,465-Speed 6318.83 samples/sec Loss 4.5613 LearningRate 0.0002 Epoch: 22 Global Step: 468350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:12,710-Speed 6312.81 samples/sec Loss 4.4749 LearningRate 0.0002 Epoch: 22 Global Step: 468360 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:20:15,938-Speed 6346.45 
samples/sec Loss 4.4944 LearningRate 0.0002 Epoch: 22 Global Step: 468370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:19,184-Speed 6311.19 samples/sec Loss 4.4941 LearningRate 0.0002 Epoch: 22 Global Step: 468380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:22,431-Speed 6309.84 samples/sec Loss 4.5014 LearningRate 0.0002 Epoch: 22 Global Step: 468390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:25,677-Speed 6310.48 samples/sec Loss 4.5265 LearningRate 0.0002 Epoch: 22 Global Step: 468400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:28,921-Speed 6313.71 samples/sec Loss 4.5208 LearningRate 0.0002 Epoch: 22 Global Step: 468410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:32,167-Speed 6310.23 samples/sec Loss 4.4589 LearningRate 0.0002 Epoch: 22 Global Step: 468420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:35,414-Speed 6309.98 samples/sec Loss 4.5606 LearningRate 0.0002 Epoch: 22 Global Step: 468430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:38,655-Speed 6319.67 samples/sec Loss 4.5227 LearningRate 0.0002 Epoch: 22 Global Step: 468440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:41,915-Speed 6284.31 samples/sec Loss 4.4066 LearningRate 0.0002 Epoch: 22 Global Step: 468450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:45,179-Speed 6275.32 samples/sec Loss 4.4912 LearningRate 0.0002 Epoch: 22 Global Step: 468460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:48,426-Speed 6309.85 samples/sec Loss 4.5207 LearningRate 0.0002 Epoch: 22 Global Step: 468470 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:20:51,656-Speed 6341.37 samples/sec Loss 4.4428 LearningRate 0.0002 Epoch: 22 Global Step: 468480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:54,899-Speed 6316.81 samples/sec Loss 4.5140 
LearningRate 0.0002 Epoch: 22 Global Step: 468490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:20:58,145-Speed 6310.58 samples/sec Loss 4.5048 LearningRate 0.0002 Epoch: 22 Global Step: 468500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:01,390-Speed 6312.26 samples/sec Loss 4.4730 LearningRate 0.0002 Epoch: 22 Global Step: 468510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:04,636-Speed 6309.75 samples/sec Loss 4.5124 LearningRate 0.0002 Epoch: 22 Global Step: 468520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:07,882-Speed 6311.00 samples/sec Loss 4.4400 LearningRate 0.0002 Epoch: 22 Global Step: 468530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:11,127-Speed 6313.95 samples/sec Loss 4.4536 LearningRate 0.0002 Epoch: 22 Global Step: 468540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:14,375-Speed 6306.33 samples/sec Loss 4.4767 LearningRate 0.0002 Epoch: 22 Global Step: 468550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:17,621-Speed 6311.18 samples/sec Loss 4.4694 LearningRate 0.0002 Epoch: 22 Global Step: 468560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:20,865-Speed 6314.53 samples/sec Loss 4.5480 LearningRate 0.0002 Epoch: 22 Global Step: 468570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:24,099-Speed 6333.17 samples/sec Loss 4.5687 LearningRate 0.0002 Epoch: 22 Global Step: 468580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:27,346-Speed 6308.54 samples/sec Loss 4.4946 LearningRate 0.0002 Epoch: 22 Global Step: 468590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:30,590-Speed 6315.42 samples/sec Loss 4.4745 LearningRate 0.0002 Epoch: 22 Global Step: 468600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:33,835-Speed 6313.37 samples/sec Loss 4.4494 LearningRate 0.0002 Epoch: 22 
Global Step: 468610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:37,080-Speed 6312.78 samples/sec Loss 4.4835 LearningRate 0.0002 Epoch: 22 Global Step: 468620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:40,325-Speed 6312.50 samples/sec Loss 4.5030 LearningRate 0.0002 Epoch: 22 Global Step: 468630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:43,571-Speed 6311.16 samples/sec Loss 4.5231 LearningRate 0.0002 Epoch: 22 Global Step: 468640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:46,824-Speed 6295.94 samples/sec Loss 4.5197 LearningRate 0.0002 Epoch: 22 Global Step: 468650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:50,162-Speed 6137.77 samples/sec Loss 4.4514 LearningRate 0.0002 Epoch: 22 Global Step: 468660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:21:53,463-Speed 6205.06 samples/sec Loss 4.5024 LearningRate 0.0002 Epoch: 22 Global Step: 468670 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:21:56,706-Speed 6315.83 samples/sec Loss 4.5209 LearningRate 0.0002 Epoch: 22 Global Step: 468680 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:21:59,952-Speed 6312.40 samples/sec Loss 4.4202 LearningRate 0.0002 Epoch: 22 Global Step: 468690 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:03,198-Speed 6308.95 samples/sec Loss 4.4981 LearningRate 0.0002 Epoch: 22 Global Step: 468700 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:06,445-Speed 6309.57 samples/sec Loss 4.4042 LearningRate 0.0002 Epoch: 22 Global Step: 468710 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:09,698-Speed 6297.40 samples/sec Loss 4.5001 LearningRate 0.0002 Epoch: 22 Global Step: 468720 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:12,954-Speed 6291.15 samples/sec Loss 4.5132 LearningRate 0.0002 Epoch: 22 Global Step: 468730 Fp16 Grad Scale: 
8192 Required: 33 hours Training: 2022-04-02 10:22:16,200-Speed 6311.14 samples/sec Loss 4.5027 LearningRate 0.0002 Epoch: 22 Global Step: 468740 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:19,445-Speed 6312.32 samples/sec Loss 4.4896 LearningRate 0.0002 Epoch: 22 Global Step: 468750 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:22,693-Speed 6305.82 samples/sec Loss 4.5235 LearningRate 0.0002 Epoch: 22 Global Step: 468760 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:22:25,943-Speed 6304.14 samples/sec Loss 4.4743 LearningRate 0.0002 Epoch: 22 Global Step: 468770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:29,187-Speed 6314.71 samples/sec Loss 4.4491 LearningRate 0.0002 Epoch: 22 Global Step: 468780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:32,435-Speed 6306.88 samples/sec Loss 4.3829 LearningRate 0.0002 Epoch: 22 Global Step: 468790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:35,676-Speed 6321.06 samples/sec Loss 4.5024 LearningRate 0.0002 Epoch: 22 Global Step: 468800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:38,924-Speed 6307.64 samples/sec Loss 4.5327 LearningRate 0.0002 Epoch: 22 Global Step: 468810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:42,169-Speed 6311.40 samples/sec Loss 4.4392 LearningRate 0.0002 Epoch: 22 Global Step: 468820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:45,415-Speed 6311.08 samples/sec Loss 4.4390 LearningRate 0.0002 Epoch: 22 Global Step: 468830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:48,664-Speed 6305.44 samples/sec Loss 4.4986 LearningRate 0.0002 Epoch: 22 Global Step: 468840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:51,907-Speed 6316.21 samples/sec Loss 4.4927 LearningRate 0.0002 Epoch: 22 Global Step: 468850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 
2022-04-02 10:22:55,150-Speed 6317.35 samples/sec Loss 4.5725 LearningRate 0.0002 Epoch: 22 Global Step: 468860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:22:58,381-Speed 6339.25 samples/sec Loss 4.5085 LearningRate 0.0002 Epoch: 22 Global Step: 468870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:01,623-Speed 6317.76 samples/sec Loss 4.5298 LearningRate 0.0002 Epoch: 22 Global Step: 468880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:04,871-Speed 6306.59 samples/sec Loss 4.5084 LearningRate 0.0002 Epoch: 22 Global Step: 468890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:08,125-Speed 6296.16 samples/sec Loss 4.5233 LearningRate 0.0002 Epoch: 22 Global Step: 468900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:11,368-Speed 6317.02 samples/sec Loss 4.4996 LearningRate 0.0002 Epoch: 22 Global Step: 468910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:14,612-Speed 6314.36 samples/sec Loss 4.4767 LearningRate 0.0002 Epoch: 22 Global Step: 468920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:17,859-Speed 6309.65 samples/sec Loss 4.5441 LearningRate 0.0002 Epoch: 22 Global Step: 468930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:21,107-Speed 6305.20 samples/sec Loss 4.4730 LearningRate 0.0002 Epoch: 22 Global Step: 468940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:24,352-Speed 6312.68 samples/sec Loss 4.5254 LearningRate 0.0002 Epoch: 22 Global Step: 468950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:27,599-Speed 6308.50 samples/sec Loss 4.5068 LearningRate 0.0002 Epoch: 22 Global Step: 468960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:30,833-Speed 6334.26 samples/sec Loss 4.5336 LearningRate 0.0002 Epoch: 22 Global Step: 468970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:34,093-Speed 
6284.65 samples/sec Loss 4.5380 LearningRate 0.0002 Epoch: 22 Global Step: 468980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:37,342-Speed 6304.68 samples/sec Loss 4.4741 LearningRate 0.0002 Epoch: 22 Global Step: 468990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:40,587-Speed 6313.54 samples/sec Loss 4.5226 LearningRate 0.0002 Epoch: 22 Global Step: 469000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:43,831-Speed 6314.80 samples/sec Loss 4.4912 LearningRate 0.0002 Epoch: 22 Global Step: 469010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:47,071-Speed 6323.30 samples/sec Loss 4.5562 LearningRate 0.0002 Epoch: 22 Global Step: 469020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:50,322-Speed 6300.26 samples/sec Loss 4.5038 LearningRate 0.0002 Epoch: 22 Global Step: 469030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:53,567-Speed 6313.02 samples/sec Loss 4.5276 LearningRate 0.0002 Epoch: 22 Global Step: 469040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:23:56,813-Speed 6309.08 samples/sec Loss 4.5197 LearningRate 0.0002 Epoch: 22 Global Step: 469050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:00,059-Speed 6311.00 samples/sec Loss 4.5245 LearningRate 0.0002 Epoch: 22 Global Step: 469060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:03,304-Speed 6312.57 samples/sec Loss 4.4454 LearningRate 0.0002 Epoch: 22 Global Step: 469070 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:24:06,541-Speed 6327.76 samples/sec Loss 4.4025 LearningRate 0.0002 Epoch: 22 Global Step: 469080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:09,787-Speed 6312.31 samples/sec Loss 4.4683 LearningRate 0.0002 Epoch: 22 Global Step: 469090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:13,031-Speed 6314.28 samples/sec Loss 4.5184 
LearningRate 0.0002 Epoch: 22 Global Step: 469100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:16,282-Speed 6300.64 samples/sec Loss 4.5221 LearningRate 0.0002 Epoch: 22 Global Step: 469110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:19,528-Speed 6310.11 samples/sec Loss 4.4730 LearningRate 0.0002 Epoch: 22 Global Step: 469120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:22,780-Speed 6299.85 samples/sec Loss 4.5018 LearningRate 0.0002 Epoch: 22 Global Step: 469130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:26,029-Speed 6304.42 samples/sec Loss 4.5010 LearningRate 0.0002 Epoch: 22 Global Step: 469140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:29,275-Speed 6312.05 samples/sec Loss 4.5072 LearningRate 0.0002 Epoch: 22 Global Step: 469150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:32,521-Speed 6310.16 samples/sec Loss 4.5030 LearningRate 0.0002 Epoch: 22 Global Step: 469160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:35,765-Speed 6314.34 samples/sec Loss 4.4534 LearningRate 0.0002 Epoch: 22 Global Step: 469170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:39,014-Speed 6305.64 samples/sec Loss 4.4803 LearningRate 0.0002 Epoch: 22 Global Step: 469180 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:24:42,245-Speed 6338.77 samples/sec Loss 4.4702 LearningRate 0.0002 Epoch: 22 Global Step: 469190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:45,490-Speed 6313.57 samples/sec Loss 4.5640 LearningRate 0.0002 Epoch: 22 Global Step: 469200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:48,734-Speed 6316.01 samples/sec Loss 4.5483 LearningRate 0.0002 Epoch: 22 Global Step: 469210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:51,977-Speed 6315.32 samples/sec Loss 4.5914 LearningRate 0.0002 Epoch: 22 
Global Step: 469220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:55,222-Speed 6312.42 samples/sec Loss 4.5604 LearningRate 0.0002 Epoch: 22 Global Step: 469230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:24:58,471-Speed 6306.44 samples/sec Loss 4.4550 LearningRate 0.0002 Epoch: 22 Global Step: 469240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:01,715-Speed 6314.91 samples/sec Loss 4.5683 LearningRate 0.0002 Epoch: 22 Global Step: 469250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:04,960-Speed 6311.14 samples/sec Loss 4.5682 LearningRate 0.0002 Epoch: 22 Global Step: 469260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:08,207-Speed 6309.86 samples/sec Loss 4.4906 LearningRate 0.0002 Epoch: 22 Global Step: 469270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:11,455-Speed 6305.89 samples/sec Loss 4.4662 LearningRate 0.0002 Epoch: 22 Global Step: 469280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:14,687-Speed 6338.94 samples/sec Loss 4.5246 LearningRate 0.0002 Epoch: 22 Global Step: 469290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:17,937-Speed 6303.35 samples/sec Loss 4.5005 LearningRate 0.0002 Epoch: 22 Global Step: 469300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:21,181-Speed 6314.20 samples/sec Loss 4.4974 LearningRate 0.0002 Epoch: 22 Global Step: 469310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:24,429-Speed 6306.18 samples/sec Loss 4.4927 LearningRate 0.0002 Epoch: 22 Global Step: 469320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:27,700-Speed 6262.51 samples/sec Loss 4.4999 LearningRate 0.0002 Epoch: 22 Global Step: 469330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:30,959-Speed 6285.43 samples/sec Loss 4.5083 LearningRate 0.0002 Epoch: 22 Global Step: 469340 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:34,205-Speed 6310.68 samples/sec Loss 4.4936 LearningRate 0.0002 Epoch: 22 Global Step: 469350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:37,449-Speed 6315.32 samples/sec Loss 4.5128 LearningRate 0.0002 Epoch: 22 Global Step: 469360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:40,692-Speed 6315.54 samples/sec Loss 4.4224 LearningRate 0.0002 Epoch: 22 Global Step: 469370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:43,936-Speed 6313.99 samples/sec Loss 4.4215 LearningRate 0.0002 Epoch: 22 Global Step: 469380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:47,169-Speed 6337.69 samples/sec Loss 4.4716 LearningRate 0.0002 Epoch: 22 Global Step: 469390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:50,422-Speed 6296.95 samples/sec Loss 4.4855 LearningRate 0.0002 Epoch: 22 Global Step: 469400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:53,668-Speed 6310.50 samples/sec Loss 4.4394 LearningRate 0.0002 Epoch: 22 Global Step: 469410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:25:56,913-Speed 6313.03 samples/sec Loss 4.5298 LearningRate 0.0002 Epoch: 22 Global Step: 469420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:00,159-Speed 6311.43 samples/sec Loss 4.5146 LearningRate 0.0002 Epoch: 22 Global Step: 469430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:03,406-Speed 6309.10 samples/sec Loss 4.5037 LearningRate 0.0002 Epoch: 22 Global Step: 469440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:06,649-Speed 6315.80 samples/sec Loss 4.5069 LearningRate 0.0002 Epoch: 22 Global Step: 469450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:09,892-Speed 6316.87 samples/sec Loss 4.5729 LearningRate 0.0002 Epoch: 22 Global Step: 469460 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:26:13,138-Speed 6310.57 samples/sec Loss 4.5069 LearningRate 0.0002 Epoch: 22 Global Step: 469470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:16,384-Speed 6309.77 samples/sec Loss 4.5212 LearningRate 0.0002 Epoch: 22 Global Step: 469480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:19,631-Speed 6309.87 samples/sec Loss 4.4364 LearningRate 0.0002 Epoch: 22 Global Step: 469490 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:26:22,858-Speed 6346.79 samples/sec Loss 4.5227 LearningRate 0.0002 Epoch: 22 Global Step: 469500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:26,118-Speed 6284.11 samples/sec Loss 4.4316 LearningRate 0.0002 Epoch: 22 Global Step: 469510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:29,362-Speed 6314.83 samples/sec Loss 4.4649 LearningRate 0.0002 Epoch: 22 Global Step: 469520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:32,608-Speed 6310.28 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 469530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:35,858-Speed 6303.73 samples/sec Loss 4.5066 LearningRate 0.0002 Epoch: 22 Global Step: 469540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:39,113-Speed 6292.14 samples/sec Loss 4.4762 LearningRate 0.0002 Epoch: 22 Global Step: 469550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:42,358-Speed 6314.02 samples/sec Loss 4.4154 LearningRate 0.0002 Epoch: 22 Global Step: 469560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:45,602-Speed 6313.82 samples/sec Loss 4.4866 LearningRate 0.0002 Epoch: 22 Global Step: 469570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:48,850-Speed 6306.44 samples/sec Loss 4.4835 LearningRate 0.0002 Epoch: 22 Global Step: 469580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:26:52,095-Speed 6313.59 samples/sec Loss 4.5825 LearningRate 0.0002 Epoch: 22 Global Step: 469590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:55,327-Speed 6336.74 samples/sec Loss 4.4843 LearningRate 0.0002 Epoch: 22 Global Step: 469600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:26:58,580-Speed 6297.62 samples/sec Loss 4.5443 LearningRate 0.0002 Epoch: 22 Global Step: 469610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:01,825-Speed 6313.05 samples/sec Loss 4.4690 LearningRate 0.0002 Epoch: 22 Global Step: 469620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:05,131-Speed 6196.89 samples/sec Loss 4.4017 LearningRate 0.0002 Epoch: 22 Global Step: 469630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:08,389-Speed 6286.64 samples/sec Loss 4.5214 LearningRate 0.0002 Epoch: 22 Global Step: 469640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:11,634-Speed 6314.18 samples/sec Loss 4.4652 LearningRate 0.0002 Epoch: 22 Global Step: 469650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:14,881-Speed 6307.31 samples/sec Loss 4.4807 LearningRate 0.0002 Epoch: 22 Global Step: 469660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:18,126-Speed 6313.46 samples/sec Loss 4.5456 LearningRate 0.0002 Epoch: 22 Global Step: 469670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:21,374-Speed 6307.87 samples/sec Loss 4.4189 LearningRate 0.0002 Epoch: 22 Global Step: 469680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:24,622-Speed 6306.60 samples/sec Loss 4.4817 LearningRate 0.0002 Epoch: 22 Global Step: 469690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:27,851-Speed 6343.52 samples/sec Loss 4.5353 LearningRate 0.0002 Epoch: 22 Global Step: 469700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:31,099-Speed 6305.63 
samples/sec Loss 4.5050 LearningRate 0.0002 Epoch: 22 Global Step: 469710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:34,349-Speed 6303.90 samples/sec Loss 4.4778 LearningRate 0.0002 Epoch: 22 Global Step: 469720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:37,600-Speed 6301.27 samples/sec Loss 4.5078 LearningRate 0.0002 Epoch: 22 Global Step: 469730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:40,845-Speed 6311.48 samples/sec Loss 4.5656 LearningRate 0.0002 Epoch: 22 Global Step: 469740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:44,093-Speed 6307.79 samples/sec Loss 4.4404 LearningRate 0.0002 Epoch: 22 Global Step: 469750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:47,337-Speed 6314.77 samples/sec Loss 4.4753 LearningRate 0.0002 Epoch: 22 Global Step: 469760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:50,591-Speed 6295.62 samples/sec Loss 4.4547 LearningRate 0.0002 Epoch: 22 Global Step: 469770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:53,838-Speed 6307.06 samples/sec Loss 4.5476 LearningRate 0.0002 Epoch: 22 Global Step: 469780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:27:57,081-Speed 6317.70 samples/sec Loss 4.4823 LearningRate 0.0002 Epoch: 22 Global Step: 469790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:00,317-Speed 6329.97 samples/sec Loss 4.4580 LearningRate 0.0002 Epoch: 22 Global Step: 469800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:03,566-Speed 6304.14 samples/sec Loss 4.4745 LearningRate 0.0002 Epoch: 22 Global Step: 469810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:06,812-Speed 6311.55 samples/sec Loss 4.4909 LearningRate 0.0002 Epoch: 22 Global Step: 469820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:10,057-Speed 6312.16 samples/sec Loss 4.5319 
LearningRate 0.0002 Epoch: 22 Global Step: 469830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:13,305-Speed 6307.34 samples/sec Loss 4.5847 LearningRate 0.0002 Epoch: 22 Global Step: 469840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:16,550-Speed 6312.00 samples/sec Loss 4.5052 LearningRate 0.0002 Epoch: 22 Global Step: 469850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:19,809-Speed 6287.28 samples/sec Loss 4.5016 LearningRate 0.0002 Epoch: 22 Global Step: 469860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:23,056-Speed 6309.06 samples/sec Loss 4.4336 LearningRate 0.0002 Epoch: 22 Global Step: 469870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:26,302-Speed 6309.66 samples/sec Loss 4.4604 LearningRate 0.0002 Epoch: 22 Global Step: 469880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:29,547-Speed 6312.97 samples/sec Loss 4.4967 LearningRate 0.0002 Epoch: 22 Global Step: 469890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:32,794-Speed 6309.53 samples/sec Loss 4.5128 LearningRate 0.0002 Epoch: 22 Global Step: 469900 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:28:36,040-Speed 6309.59 samples/sec Loss 4.4741 LearningRate 0.0002 Epoch: 22 Global Step: 469910 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:28:39,277-Speed 6328.26 samples/sec Loss 4.5641 LearningRate 0.0002 Epoch: 22 Global Step: 469920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:42,526-Speed 6305.94 samples/sec Loss 4.4891 LearningRate 0.0002 Epoch: 22 Global Step: 469930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:45,772-Speed 6310.80 samples/sec Loss 4.4280 LearningRate 0.0002 Epoch: 22 Global Step: 469940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:49,016-Speed 6314.88 samples/sec Loss 4.4870 LearningRate 0.0002 Epoch: 22 
Global Step: 469950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:52,259-Speed 6315.88 samples/sec Loss 4.5082 LearningRate 0.0002 Epoch: 22 Global Step: 469960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:55,502-Speed 6316.60 samples/sec Loss 4.4942 LearningRate 0.0002 Epoch: 22 Global Step: 469970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:28:58,748-Speed 6311.17 samples/sec Loss 4.5473 LearningRate 0.0002 Epoch: 22 Global Step: 469980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:02,003-Speed 6292.73 samples/sec Loss 4.4986 LearningRate 0.0002 Epoch: 22 Global Step: 469990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:05,251-Speed 6306.84 samples/sec Loss 4.4667 LearningRate 0.0002 Epoch: 22 Global Step: 470000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:08,494-Speed 6316.64 samples/sec Loss 4.4868 LearningRate 0.0002 Epoch: 22 Global Step: 470010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:11,729-Speed 6331.07 samples/sec Loss 4.4297 LearningRate 0.0002 Epoch: 22 Global Step: 470020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:14,980-Speed 6301.16 samples/sec Loss 4.4814 LearningRate 0.0002 Epoch: 22 Global Step: 470030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:18,238-Speed 6288.77 samples/sec Loss 4.4749 LearningRate 0.0002 Epoch: 22 Global Step: 470040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:21,513-Speed 6255.27 samples/sec Loss 4.4912 LearningRate 0.0002 Epoch: 22 Global Step: 470050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:24,776-Speed 6278.61 samples/sec Loss 4.4459 LearningRate 0.0002 Epoch: 22 Global Step: 470060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:28,024-Speed 6306.38 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 470070 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:31,273-Speed 6304.55 samples/sec Loss 4.4860 LearningRate 0.0002 Epoch: 22 Global Step: 470080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:34,520-Speed 6309.10 samples/sec Loss 4.4215 LearningRate 0.0002 Epoch: 22 Global Step: 470090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:37,768-Speed 6307.49 samples/sec Loss 4.5057 LearningRate 0.0002 Epoch: 22 Global Step: 470100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:41,013-Speed 6312.49 samples/sec Loss 4.4615 LearningRate 0.0002 Epoch: 22 Global Step: 470110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:44,245-Speed 6337.31 samples/sec Loss 4.5387 LearningRate 0.0002 Epoch: 22 Global Step: 470120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:47,489-Speed 6315.19 samples/sec Loss 4.5223 LearningRate 0.0002 Epoch: 22 Global Step: 470130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:50,732-Speed 6315.21 samples/sec Loss 4.4778 LearningRate 0.0002 Epoch: 22 Global Step: 470140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:53,978-Speed 6311.13 samples/sec Loss 4.5379 LearningRate 0.0002 Epoch: 22 Global Step: 470150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:29:57,224-Speed 6311.02 samples/sec Loss 4.5423 LearningRate 0.0002 Epoch: 22 Global Step: 470160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:00,474-Speed 6302.82 samples/sec Loss 4.5220 LearningRate 0.0002 Epoch: 22 Global Step: 470170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:03,721-Speed 6308.14 samples/sec Loss 4.4742 LearningRate 0.0002 Epoch: 22 Global Step: 470180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:06,965-Speed 6314.46 samples/sec Loss 4.4275 LearningRate 0.0002 Epoch: 22 Global Step: 470190 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:30:10,212-Speed 6308.91 samples/sec Loss 4.5110 LearningRate 0.0002 Epoch: 22 Global Step: 470200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:13,460-Speed 6307.84 samples/sec Loss 4.4430 LearningRate 0.0002 Epoch: 22 Global Step: 470210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:16,696-Speed 6330.43 samples/sec Loss 4.4997 LearningRate 0.0002 Epoch: 22 Global Step: 470220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:19,942-Speed 6309.99 samples/sec Loss 4.5132 LearningRate 0.0002 Epoch: 22 Global Step: 470230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:23,188-Speed 6311.07 samples/sec Loss 4.5287 LearningRate 0.0002 Epoch: 22 Global Step: 470240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:26,435-Speed 6308.46 samples/sec Loss 4.5019 LearningRate 0.0002 Epoch: 22 Global Step: 470250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:29,682-Speed 6310.21 samples/sec Loss 4.5028 LearningRate 0.0002 Epoch: 22 Global Step: 470260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:32,927-Speed 6311.96 samples/sec Loss 4.5378 LearningRate 0.0002 Epoch: 22 Global Step: 470270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:36,170-Speed 6316.32 samples/sec Loss 4.5207 LearningRate 0.0002 Epoch: 22 Global Step: 470280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:39,414-Speed 6314.49 samples/sec Loss 4.4697 LearningRate 0.0002 Epoch: 22 Global Step: 470290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:42,661-Speed 6309.54 samples/sec Loss 4.4976 LearningRate 0.0002 Epoch: 22 Global Step: 470300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:45,909-Speed 6306.43 samples/sec Loss 4.4843 LearningRate 0.0002 Epoch: 22 Global Step: 470310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:30:49,143-Speed 6334.77 samples/sec Loss 4.5027 LearningRate 0.0002 Epoch: 22 Global Step: 470320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:52,387-Speed 6314.78 samples/sec Loss 4.5330 LearningRate 0.0002 Epoch: 22 Global Step: 470330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:55,630-Speed 6315.30 samples/sec Loss 4.4440 LearningRate 0.0002 Epoch: 22 Global Step: 470340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:30:58,874-Speed 6315.26 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 22 Global Step: 470350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:02,120-Speed 6311.16 samples/sec Loss 4.3628 LearningRate 0.0002 Epoch: 22 Global Step: 470360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:05,362-Speed 6317.64 samples/sec Loss 4.5018 LearningRate 0.0002 Epoch: 22 Global Step: 470370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:08,607-Speed 6312.42 samples/sec Loss 4.4729 LearningRate 0.0002 Epoch: 22 Global Step: 470380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:11,861-Speed 6296.35 samples/sec Loss 4.4749 LearningRate 0.0002 Epoch: 22 Global Step: 470390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:15,109-Speed 6307.00 samples/sec Loss 4.5204 LearningRate 0.0002 Epoch: 22 Global Step: 470400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:18,355-Speed 6310.73 samples/sec Loss 4.5126 LearningRate 0.0002 Epoch: 22 Global Step: 470410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:21,606-Speed 6300.69 samples/sec Loss 4.4790 LearningRate 0.0002 Epoch: 22 Global Step: 470420 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:31:24,841-Speed 6332.34 samples/sec Loss 4.4514 LearningRate 0.0002 Epoch: 22 Global Step: 470430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:28,083-Speed 6316.98 
samples/sec Loss 4.4856 LearningRate 0.0002 Epoch: 22 Global Step: 470440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:31,330-Speed 6310.85 samples/sec Loss 4.4824 LearningRate 0.0002 Epoch: 22 Global Step: 470450 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:34,575-Speed 6312.82 samples/sec Loss 4.4817 LearningRate 0.0002 Epoch: 22 Global Step: 470460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:37,823-Speed 6305.68 samples/sec Loss 4.4467 LearningRate 0.0002 Epoch: 22 Global Step: 470470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:41,071-Speed 6307.31 samples/sec Loss 4.5260 LearningRate 0.0002 Epoch: 22 Global Step: 470480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:44,325-Speed 6294.42 samples/sec Loss 4.4432 LearningRate 0.0002 Epoch: 22 Global Step: 470490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:47,573-Speed 6308.12 samples/sec Loss 4.4468 LearningRate 0.0002 Epoch: 22 Global Step: 470500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:50,819-Speed 6310.90 samples/sec Loss 4.4767 LearningRate 0.0002 Epoch: 22 Global Step: 470510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:54,073-Speed 6295.00 samples/sec Loss 4.4490 LearningRate 0.0002 Epoch: 22 Global Step: 470520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:31:57,306-Speed 6335.95 samples/sec Loss 4.4312 LearningRate 0.0002 Epoch: 22 Global Step: 470530 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:00,551-Speed 6312.69 samples/sec Loss 4.4968 LearningRate 0.0002 Epoch: 22 Global Step: 470540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:03,800-Speed 6303.62 samples/sec Loss 4.4571 LearningRate 0.0002 Epoch: 22 Global Step: 470550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:07,051-Speed 6302.74 samples/sec Loss 4.4664 
LearningRate 0.0002 Epoch: 22 Global Step: 470560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:10,292-Speed 6320.48 samples/sec Loss 4.5520 LearningRate 0.0002 Epoch: 22 Global Step: 470570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:13,536-Speed 6312.68 samples/sec Loss 4.4789 LearningRate 0.0002 Epoch: 22 Global Step: 470580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:16,783-Speed 6310.30 samples/sec Loss 4.4757 LearningRate 0.0002 Epoch: 22 Global Step: 470590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:20,034-Speed 6299.90 samples/sec Loss 4.4099 LearningRate 0.0002 Epoch: 22 Global Step: 470600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:23,280-Speed 6312.84 samples/sec Loss 4.4703 LearningRate 0.0002 Epoch: 22 Global Step: 470610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:26,523-Speed 6316.02 samples/sec Loss 4.4893 LearningRate 0.0002 Epoch: 22 Global Step: 470620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:29,754-Speed 6340.42 samples/sec Loss 4.4784 LearningRate 0.0002 Epoch: 22 Global Step: 470630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:32,996-Speed 6318.88 samples/sec Loss 4.4774 LearningRate 0.0002 Epoch: 22 Global Step: 470640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:36,248-Speed 6300.20 samples/sec Loss 4.5016 LearningRate 0.0002 Epoch: 22 Global Step: 470650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:39,490-Speed 6318.09 samples/sec Loss 4.4634 LearningRate 0.0002 Epoch: 22 Global Step: 470660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:42,738-Speed 6307.61 samples/sec Loss 4.4811 LearningRate 0.0002 Epoch: 22 Global Step: 470670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:45,984-Speed 6310.36 samples/sec Loss 4.4527 LearningRate 0.0002 Epoch: 22 
Global Step: 470680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:49,227-Speed 6316.21 samples/sec Loss 4.4916 LearningRate 0.0002 Epoch: 22 Global Step: 470690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:52,476-Speed 6305.96 samples/sec Loss 4.4817 LearningRate 0.0002 Epoch: 22 Global Step: 470700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:55,723-Speed 6308.17 samples/sec Loss 4.4323 LearningRate 0.0002 Epoch: 22 Global Step: 470710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:32:58,976-Speed 6296.04 samples/sec Loss 4.4648 LearningRate 0.0002 Epoch: 22 Global Step: 470720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:02,208-Speed 6338.89 samples/sec Loss 4.4720 LearningRate 0.0002 Epoch: 22 Global Step: 470730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:05,455-Speed 6309.16 samples/sec Loss 4.4338 LearningRate 0.0002 Epoch: 22 Global Step: 470740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:08,701-Speed 6309.34 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 470750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:11,946-Speed 6312.80 samples/sec Loss 4.4979 LearningRate 0.0002 Epoch: 22 Global Step: 470760 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:15,190-Speed 6316.11 samples/sec Loss 4.3922 LearningRate 0.0002 Epoch: 22 Global Step: 470770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:18,433-Speed 6314.78 samples/sec Loss 4.4801 LearningRate 0.0002 Epoch: 22 Global Step: 470780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:21,683-Speed 6304.97 samples/sec Loss 4.5521 LearningRate 0.0002 Epoch: 22 Global Step: 470790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:24,929-Speed 6310.32 samples/sec Loss 4.4793 LearningRate 0.0002 Epoch: 22 Global Step: 470800 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:28,179-Speed 6302.41 samples/sec Loss 4.4747 LearningRate 0.0002 Epoch: 22 Global Step: 470810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:31,425-Speed 6309.73 samples/sec Loss 4.4477 LearningRate 0.0002 Epoch: 22 Global Step: 470820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:34,668-Speed 6317.14 samples/sec Loss 4.5275 LearningRate 0.0002 Epoch: 22 Global Step: 470830 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:33:37,905-Speed 6329.31 samples/sec Loss 4.4580 LearningRate 0.0002 Epoch: 22 Global Step: 470840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:41,146-Speed 6319.20 samples/sec Loss 4.4718 LearningRate 0.0002 Epoch: 22 Global Step: 470850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:44,393-Speed 6309.61 samples/sec Loss 4.4730 LearningRate 0.0002 Epoch: 22 Global Step: 470860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:47,641-Speed 6307.18 samples/sec Loss 4.4750 LearningRate 0.0002 Epoch: 22 Global Step: 470870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:50,885-Speed 6315.19 samples/sec Loss 4.5207 LearningRate 0.0002 Epoch: 22 Global Step: 470880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:54,132-Speed 6308.67 samples/sec Loss 4.4709 LearningRate 0.0002 Epoch: 22 Global Step: 470890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:33:57,384-Speed 6299.25 samples/sec Loss 4.5178 LearningRate 0.0002 Epoch: 22 Global Step: 470900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:00,632-Speed 6305.92 samples/sec Loss 4.4183 LearningRate 0.0002 Epoch: 22 Global Step: 470910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:03,877-Speed 6312.80 samples/sec Loss 4.5019 LearningRate 0.0002 Epoch: 22 Global Step: 470920 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:34:07,119-Speed 6317.76 samples/sec Loss 4.4691 LearningRate 0.0002 Epoch: 22 Global Step: 470930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:10,352-Speed 6337.82 samples/sec Loss 4.5382 LearningRate 0.0002 Epoch: 22 Global Step: 470940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:13,581-Speed 6342.14 samples/sec Loss 4.5002 LearningRate 0.0002 Epoch: 22 Global Step: 470950 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:16,827-Speed 6310.97 samples/sec Loss 4.4604 LearningRate 0.0002 Epoch: 22 Global Step: 470960 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:20,073-Speed 6311.14 samples/sec Loss 4.5422 LearningRate 0.0002 Epoch: 22 Global Step: 470970 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:23,319-Speed 6310.58 samples/sec Loss 4.5135 LearningRate 0.0002 Epoch: 22 Global Step: 470980 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:26,565-Speed 6311.77 samples/sec Loss 4.5168 LearningRate 0.0002 Epoch: 22 Global Step: 470990 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:29,808-Speed 6314.98 samples/sec Loss 4.5353 LearningRate 0.0002 Epoch: 22 Global Step: 471000 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:33,063-Speed 6294.52 samples/sec Loss 4.4463 LearningRate 0.0002 Epoch: 22 Global Step: 471010 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:36,308-Speed 6312.93 samples/sec Loss 4.4515 LearningRate 0.0002 Epoch: 22 Global Step: 471020 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:39,556-Speed 6306.72 samples/sec Loss 4.4623 LearningRate 0.0002 Epoch: 22 Global Step: 471030 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:42,806-Speed 6302.37 samples/sec Loss 4.5200 LearningRate 0.0002 Epoch: 22 Global Step: 471040 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:34:46,052-Speed 
6310.65 samples/sec Loss 4.5197 LearningRate 0.0002 Epoch: 22 Global Step: 471050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:49,295-Speed 6316.07 samples/sec Loss 4.4483 LearningRate 0.0002 Epoch: 22 Global Step: 471060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:52,546-Speed 6300.91 samples/sec Loss 4.5019 LearningRate 0.0002 Epoch: 22 Global Step: 471070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:55,795-Speed 6304.97 samples/sec Loss 4.5087 LearningRate 0.0002 Epoch: 22 Global Step: 471080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:34:59,041-Speed 6312.31 samples/sec Loss 4.4910 LearningRate 0.0002 Epoch: 22 Global Step: 471090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:02,287-Speed 6310.03 samples/sec Loss 4.4060 LearningRate 0.0002 Epoch: 22 Global Step: 471100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:05,531-Speed 6314.39 samples/sec Loss 4.4666 LearningRate 0.0002 Epoch: 22 Global Step: 471110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:08,777-Speed 6312.08 samples/sec Loss 4.4818 LearningRate 0.0002 Epoch: 22 Global Step: 471120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:12,020-Speed 6316.81 samples/sec Loss 4.5230 LearningRate 0.0002 Epoch: 22 Global Step: 471130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:15,263-Speed 6315.99 samples/sec Loss 4.4897 LearningRate 0.0002 Epoch: 22 Global Step: 471140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:18,494-Speed 6339.79 samples/sec Loss 4.4889 LearningRate 0.0002 Epoch: 22 Global Step: 471150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:21,740-Speed 6310.66 samples/sec Loss 4.5698 LearningRate 0.0002 Epoch: 22 Global Step: 471160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:25,005-Speed 6274.06 samples/sec Loss 4.4084 
LearningRate 0.0002 Epoch: 22 Global Step: 471170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:28,250-Speed 6312.10 samples/sec Loss 4.4320 LearningRate 0.0002 Epoch: 22 Global Step: 471180 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:31,493-Speed 6317.09 samples/sec Loss 4.5665 LearningRate 0.0002 Epoch: 22 Global Step: 471190 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:34,742-Speed 6305.32 samples/sec Loss 4.4658 LearningRate 0.0002 Epoch: 22 Global Step: 471200 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:37,995-Speed 6296.25 samples/sec Loss 4.4245 LearningRate 0.0002 Epoch: 22 Global Step: 471210 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:41,239-Speed 6314.34 samples/sec Loss 4.4917 LearningRate 0.0002 Epoch: 22 Global Step: 471220 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:44,489-Speed 6304.23 samples/sec Loss 4.4591 LearningRate 0.0002 Epoch: 22 Global Step: 471230 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:47,733-Speed 6312.87 samples/sec Loss 4.5039 LearningRate 0.0002 Epoch: 22 Global Step: 471240 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:50,964-Speed 6340.41 samples/sec Loss 4.4271 LearningRate 0.0002 Epoch: 22 Global Step: 471250 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:54,207-Speed 6316.67 samples/sec Loss 4.4916 LearningRate 0.0002 Epoch: 22 Global Step: 471260 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:35:57,455-Speed 6308.66 samples/sec Loss 4.5088 LearningRate 0.0002 Epoch: 22 Global Step: 471270 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:00,701-Speed 6310.40 samples/sec Loss 4.4649 LearningRate 0.0002 Epoch: 22 Global Step: 471280 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:03,948-Speed 6309.14 samples/sec Loss 4.4144 LearningRate 0.0002 Epoch: 22 
Global Step: 471290 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:07,198-Speed 6302.61 samples/sec Loss 4.4868 LearningRate 0.0002 Epoch: 22 Global Step: 471300 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:10,443-Speed 6312.69 samples/sec Loss 4.4367 LearningRate 0.0002 Epoch: 22 Global Step: 471310 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:13,722-Speed 6247.86 samples/sec Loss 4.4828 LearningRate 0.0002 Epoch: 22 Global Step: 471320 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:16,968-Speed 6310.86 samples/sec Loss 4.4828 LearningRate 0.0002 Epoch: 22 Global Step: 471330 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:20,211-Speed 6316.55 samples/sec Loss 4.5293 LearningRate 0.0002 Epoch: 22 Global Step: 471340 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:23,446-Speed 6331.33 samples/sec Loss 4.5037 LearningRate 0.0002 Epoch: 22 Global Step: 471350 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:26,694-Speed 6308.09 samples/sec Loss 4.5829 LearningRate 0.0002 Epoch: 22 Global Step: 471360 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:29,939-Speed 6312.38 samples/sec Loss 4.4497 LearningRate 0.0002 Epoch: 22 Global Step: 471370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:33,183-Speed 6314.11 samples/sec Loss 4.4566 LearningRate 0.0002 Epoch: 22 Global Step: 471380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:36,427-Speed 6314.95 samples/sec Loss 4.4712 LearningRate 0.0002 Epoch: 22 Global Step: 471390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:39,670-Speed 6316.00 samples/sec Loss 4.4842 LearningRate 0.0002 Epoch: 22 Global Step: 471400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:42,918-Speed 6307.74 samples/sec Loss 4.4880 LearningRate 0.0002 Epoch: 22 Global Step: 471410 Fp16 Grad 
Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:46,158-Speed 6321.98 samples/sec Loss 4.5117 LearningRate 0.0002 Epoch: 22 Global Step: 471420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:49,405-Speed 6307.45 samples/sec Loss 4.3918 LearningRate 0.0002 Epoch: 22 Global Step: 471430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:52,649-Speed 6314.83 samples/sec Loss 4.5053 LearningRate 0.0002 Epoch: 22 Global Step: 471440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:36:55,897-Speed 6308.40 samples/sec Loss 4.5050 LearningRate 0.0002 Epoch: 22 Global Step: 471450 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:36:59,169-Speed 6260.82 samples/sec Loss 4.4504 LearningRate 0.0002 Epoch: 22 Global Step: 471460 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:02,415-Speed 6309.98 samples/sec Loss 4.5005 LearningRate 0.0002 Epoch: 22 Global Step: 471470 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:05,662-Speed 6309.21 samples/sec Loss 4.5280 LearningRate 0.0002 Epoch: 22 Global Step: 471480 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:08,907-Speed 6312.03 samples/sec Loss 4.4388 LearningRate 0.0002 Epoch: 22 Global Step: 471490 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:12,150-Speed 6317.27 samples/sec Loss 4.4730 LearningRate 0.0002 Epoch: 22 Global Step: 471500 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:15,407-Speed 6290.03 samples/sec Loss 4.5208 LearningRate 0.0002 Epoch: 22 Global Step: 471510 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:18,724-Speed 6175.96 samples/sec Loss 4.4563 LearningRate 0.0002 Epoch: 22 Global Step: 471520 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:21,969-Speed 6312.14 samples/sec Loss 4.4806 LearningRate 0.0002 Epoch: 22 Global Step: 471530 Fp16 Grad Scale: 16384 Required: 33 hours 
Training: 2022-04-02 10:37:25,214-Speed 6311.96 samples/sec Loss 4.4918 LearningRate 0.0002 Epoch: 22 Global Step: 471540 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:28,460-Speed 6312.18 samples/sec Loss 4.4855 LearningRate 0.0002 Epoch: 22 Global Step: 471550 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:31,709-Speed 6304.18 samples/sec Loss 4.4616 LearningRate 0.0002 Epoch: 22 Global Step: 471560 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:34,953-Speed 6314.45 samples/sec Loss 4.4083 LearningRate 0.0002 Epoch: 22 Global Step: 471570 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:38,196-Speed 6315.36 samples/sec Loss 4.4951 LearningRate 0.0002 Epoch: 22 Global Step: 471580 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:41,445-Speed 6307.00 samples/sec Loss 4.4462 LearningRate 0.0002 Epoch: 22 Global Step: 471590 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:44,688-Speed 6315.64 samples/sec Loss 4.4539 LearningRate 0.0002 Epoch: 22 Global Step: 471600 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:47,940-Speed 6299.40 samples/sec Loss 4.5409 LearningRate 0.0002 Epoch: 22 Global Step: 471610 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:51,182-Speed 6318.15 samples/sec Loss 4.5176 LearningRate 0.0002 Epoch: 22 Global Step: 471620 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:54,426-Speed 6313.78 samples/sec Loss 4.4346 LearningRate 0.0002 Epoch: 22 Global Step: 471630 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:37:57,670-Speed 6315.98 samples/sec Loss 4.4832 LearningRate 0.0002 Epoch: 22 Global Step: 471640 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:00,914-Speed 6314.63 samples/sec Loss 4.5525 LearningRate 0.0002 Epoch: 22 Global Step: 471650 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 
10:38:04,143-Speed 6341.93 samples/sec Loss 4.4382 LearningRate 0.0002 Epoch: 22 Global Step: 471660 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:07,394-Speed 6302.08 samples/sec Loss 4.4734 LearningRate 0.0002 Epoch: 22 Global Step: 471670 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:10,654-Speed 6283.03 samples/sec Loss 4.4389 LearningRate 0.0002 Epoch: 22 Global Step: 471680 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:13,899-Speed 6313.40 samples/sec Loss 4.5279 LearningRate 0.0002 Epoch: 22 Global Step: 471690 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:17,143-Speed 6314.57 samples/sec Loss 4.4704 LearningRate 0.0002 Epoch: 22 Global Step: 471700 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:20,391-Speed 6307.19 samples/sec Loss 4.4710 LearningRate 0.0002 Epoch: 22 Global Step: 471710 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:23,637-Speed 6310.22 samples/sec Loss 4.4893 LearningRate 0.0002 Epoch: 22 Global Step: 471720 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:26,884-Speed 6309.15 samples/sec Loss 4.5461 LearningRate 0.0002 Epoch: 22 Global Step: 471730 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:30,128-Speed 6314.82 samples/sec Loss 4.5404 LearningRate 0.0002 Epoch: 22 Global Step: 471740 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:33,373-Speed 6312.46 samples/sec Loss 4.5358 LearningRate 0.0002 Epoch: 22 Global Step: 471750 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:36,616-Speed 6316.47 samples/sec Loss 4.4780 LearningRate 0.0002 Epoch: 22 Global Step: 471760 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-04-02 10:38:39,848-Speed 6339.39 samples/sec Loss 4.4527 LearningRate 0.0002 Epoch: 22 Global Step: 471770 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:43,093-Speed 6311.44 
samples/sec Loss 4.4498 LearningRate 0.0002 Epoch: 22 Global Step: 471780 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:46,337-Speed 6314.63 samples/sec Loss 4.4536 LearningRate 0.0002 Epoch: 22 Global Step: 471790 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:49,582-Speed 6312.13 samples/sec Loss 4.4005 LearningRate 0.0002 Epoch: 22 Global Step: 471800 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:52,832-Speed 6304.30 samples/sec Loss 4.4262 LearningRate 0.0002 Epoch: 22 Global Step: 471810 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:56,076-Speed 6313.88 samples/sec Loss 4.4688 LearningRate 0.0002 Epoch: 22 Global Step: 471820 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:38:59,325-Speed 6305.11 samples/sec Loss 4.3916 LearningRate 0.0002 Epoch: 22 Global Step: 471830 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:02,586-Speed 6281.11 samples/sec Loss 4.4275 LearningRate 0.0002 Epoch: 22 Global Step: 471840 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:05,839-Speed 6298.25 samples/sec Loss 4.4870 LearningRate 0.0002 Epoch: 22 Global Step: 471850 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:09,083-Speed 6313.73 samples/sec Loss 4.5042 LearningRate 0.0002 Epoch: 22 Global Step: 471860 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:12,316-Speed 6335.84 samples/sec Loss 4.4446 LearningRate 0.0002 Epoch: 22 Global Step: 471870 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:15,567-Speed 6301.36 samples/sec Loss 4.4553 LearningRate 0.0002 Epoch: 22 Global Step: 471880 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:18,814-Speed 6309.16 samples/sec Loss 4.4683 LearningRate 0.0002 Epoch: 22 Global Step: 471890 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:22,043-Speed 6343.87 samples/sec Loss 4.4335 
LearningRate 0.0002 Epoch: 22 Global Step: 471900 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:25,290-Speed 6308.84 samples/sec Loss 4.4734 LearningRate 0.0002 Epoch: 22 Global Step: 471910 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:28,539-Speed 6306.63 samples/sec Loss 4.4996 LearningRate 0.0002 Epoch: 22 Global Step: 471920 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:31,784-Speed 6312.32 samples/sec Loss 4.5086 LearningRate 0.0002 Epoch: 22 Global Step: 471930 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:35,028-Speed 6313.51 samples/sec Loss 4.4507 LearningRate 0.0002 Epoch: 22 Global Step: 471940 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:38,274-Speed 6310.77 samples/sec Loss 4.4908 LearningRate 0.0002 Epoch: 22 Global Step: 471950 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:41,527-Speed 6298.23 samples/sec Loss 4.4507 LearningRate 0.0002 Epoch: 22 Global Step: 471960 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:44,776-Speed 6304.82 samples/sec Loss 4.4951 LearningRate 0.0002 Epoch: 22 Global Step: 471970 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:48,026-Speed 6302.33 samples/sec Loss 4.5193 LearningRate 0.0002 Epoch: 22 Global Step: 471980 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:51,271-Speed 6313.96 samples/sec Loss 4.5022 LearningRate 0.0002 Epoch: 22 Global Step: 471990 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-04-02 10:39:54,517-Speed 6309.54 samples/sec Loss 4.4988 LearningRate 0.0002 Epoch: 22 Global Step: 472000 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:39:57,764-Speed 6309.66 samples/sec Loss 4.5233 LearningRate 0.0002 Epoch: 22 Global Step: 472010 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:01,012-Speed 6305.47 samples/sec Loss 4.4616 LearningRate 0.0002 Epoch: 22 Global 
Step: 472020 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:04,259-Speed 6309.69 samples/sec Loss 4.3946 LearningRate 0.0002 Epoch: 22 Global Step: 472030 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:07,507-Speed 6306.18 samples/sec Loss 4.4344 LearningRate 0.0002 Epoch: 22 Global Step: 472040 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:10,762-Speed 6298.06 samples/sec Loss 4.4587 LearningRate 0.0002 Epoch: 22 Global Step: 472050 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:14,007-Speed 6313.36 samples/sec Loss 4.4804 LearningRate 0.0002 Epoch: 22 Global Step: 472060 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:17,256-Speed 6304.44 samples/sec Loss 4.4345 LearningRate 0.0002 Epoch: 22 Global Step: 472070 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:20,501-Speed 6312.71 samples/sec Loss 4.4001 LearningRate 0.0002 Epoch: 22 Global Step: 472080 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:23,746-Speed 6312.27 samples/sec Loss 4.4848 LearningRate 0.0002 Epoch: 22 Global Step: 472090 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:26,983-Speed 6329.51 samples/sec Loss 4.4243 LearningRate 0.0002 Epoch: 22 Global Step: 472100 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:30,230-Speed 6308.23 samples/sec Loss 4.4113 LearningRate 0.0002 Epoch: 22 Global Step: 472110 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:33,479-Speed 6304.99 samples/sec Loss 4.4260 LearningRate 0.0002 Epoch: 22 Global Step: 472120 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:36,725-Speed 6311.02 samples/sec Loss 4.4404 LearningRate 0.0002 Epoch: 22 Global Step: 472130 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:39,973-Speed 6310.12 samples/sec Loss 4.4371 LearningRate 0.0002 Epoch: 22 Global Step: 472140 Fp16 Grad Scale: 
16384 Required: 33 hours Training: 2022-04-02 10:40:43,217-Speed 6315.14 samples/sec Loss 4.4914 LearningRate 0.0002 Epoch: 22 Global Step: 472150 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:46,467-Speed 6302.80 samples/sec Loss 4.4815 LearningRate 0.0002 Epoch: 22 Global Step: 472160 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:49,712-Speed 6312.51 samples/sec Loss 4.4622 LearningRate 0.0002 Epoch: 22 Global Step: 472170 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-02 10:40:52,965-Speed 6297.56 samples/sec Loss 4.5018 LearningRate 0.0002 Epoch: 22 Global Step: 472180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:40:56,216-Speed 6299.72 samples/sec Loss 4.5261 LearningRate 0.0002 Epoch: 22 Global Step: 472190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:40:59,447-Speed 6340.74 samples/sec Loss 4.4369 LearningRate 0.0002 Epoch: 22 Global Step: 472200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:02,690-Speed 6316.60 samples/sec Loss 4.4744 LearningRate 0.0002 Epoch: 22 Global Step: 472210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:05,935-Speed 6313.19 samples/sec Loss 4.4801 LearningRate 0.0002 Epoch: 22 Global Step: 472220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:09,190-Speed 6292.48 samples/sec Loss 4.5153 LearningRate 0.0002 Epoch: 22 Global Step: 472230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:12,437-Speed 6308.38 samples/sec Loss 4.5022 LearningRate 0.0002 Epoch: 22 Global Step: 472240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:15,682-Speed 6312.63 samples/sec Loss 4.4386 LearningRate 0.0002 Epoch: 22 Global Step: 472250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:18,933-Speed 6301.37 samples/sec Loss 4.4892 LearningRate 0.0002 Epoch: 22 Global Step: 472260 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 10:41:22,178-Speed 6312.24 samples/sec Loss 4.4101 LearningRate 0.0002 Epoch: 22 Global Step: 472270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:25,427-Speed 6305.47 samples/sec Loss 4.4266 LearningRate 0.0002 Epoch: 22 Global Step: 472280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:28,672-Speed 6313.64 samples/sec Loss 4.5751 LearningRate 0.0002 Epoch: 22 Global Step: 472290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:31,906-Speed 6332.93 samples/sec Loss 4.4453 LearningRate 0.0002 Epoch: 22 Global Step: 472300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:35,166-Speed 6283.05 samples/sec Loss 4.4509 LearningRate 0.0002 Epoch: 22 Global Step: 472310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:38,413-Speed 6309.66 samples/sec Loss 4.5106 LearningRate 0.0002 Epoch: 22 Global Step: 472320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:41,657-Speed 6315.63 samples/sec Loss 4.4895 LearningRate 0.0002 Epoch: 22 Global Step: 472330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:44,903-Speed 6309.57 samples/sec Loss 4.4906 LearningRate 0.0002 Epoch: 22 Global Step: 472340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:48,148-Speed 6313.53 samples/sec Loss 4.5313 LearningRate 0.0002 Epoch: 22 Global Step: 472350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:51,395-Speed 6308.75 samples/sec Loss 4.4690 LearningRate 0.0002 Epoch: 22 Global Step: 472360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:54,642-Speed 6308.56 samples/sec Loss 4.4705 LearningRate 0.0002 Epoch: 22 Global Step: 472370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:41:57,899-Speed 6290.42 samples/sec Loss 4.4171 LearningRate 0.0002 Epoch: 22 Global Step: 472380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
10:42:01,144-Speed 6311.68 samples/sec Loss 4.5092 LearningRate 0.0002 Epoch: 22 Global Step: 472390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:04,386-Speed 6318.85 samples/sec Loss 4.4529 LearningRate 0.0002 Epoch: 22 Global Step: 472400 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:42:07,622-Speed 6330.53 samples/sec Loss 4.5001 LearningRate 0.0002 Epoch: 22 Global Step: 472410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:10,864-Speed 6319.03 samples/sec Loss 4.5290 LearningRate 0.0002 Epoch: 22 Global Step: 472420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:14,110-Speed 6310.42 samples/sec Loss 4.5286 LearningRate 0.0002 Epoch: 22 Global Step: 472430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:17,356-Speed 6310.33 samples/sec Loss 4.4937 LearningRate 0.0002 Epoch: 22 Global Step: 472440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:20,598-Speed 6317.87 samples/sec Loss 4.4826 LearningRate 0.0002 Epoch: 22 Global Step: 472450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:23,845-Speed 6308.67 samples/sec Loss 4.4305 LearningRate 0.0002 Epoch: 22 Global Step: 472460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:27,099-Speed 6296.17 samples/sec Loss 4.4529 LearningRate 0.0002 Epoch: 22 Global Step: 472470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:30,342-Speed 6317.07 samples/sec Loss 4.4306 LearningRate 0.0002 Epoch: 22 Global Step: 472480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:33,603-Speed 6280.51 samples/sec Loss 4.4511 LearningRate 0.0002 Epoch: 22 Global Step: 472490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:36,854-Speed 6302.19 samples/sec Loss 4.4950 LearningRate 0.0002 Epoch: 22 Global Step: 472500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:40,088-Speed 6333.25 
samples/sec Loss 4.4544 LearningRate 0.0002 Epoch: 22 Global Step: 472510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:43,331-Speed 6315.79 samples/sec Loss 4.4747 LearningRate 0.0002 Epoch: 22 Global Step: 472520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:46,604-Speed 6263.29 samples/sec Loss 4.4607 LearningRate 0.0002 Epoch: 22 Global Step: 472530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:49,905-Speed 6205.39 samples/sec Loss 4.4534 LearningRate 0.0002 Epoch: 22 Global Step: 472540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:53,151-Speed 6311.33 samples/sec Loss 4.4432 LearningRate 0.0002 Epoch: 22 Global Step: 472550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:56,398-Speed 6308.49 samples/sec Loss 4.5318 LearningRate 0.0002 Epoch: 22 Global Step: 472560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:42:59,647-Speed 6304.26 samples/sec Loss 4.4466 LearningRate 0.0002 Epoch: 22 Global Step: 472570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:02,895-Speed 6306.88 samples/sec Loss 4.4812 LearningRate 0.0002 Epoch: 22 Global Step: 472580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:06,141-Speed 6310.45 samples/sec Loss 4.4599 LearningRate 0.0002 Epoch: 22 Global Step: 472590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:09,386-Speed 6313.67 samples/sec Loss 4.5512 LearningRate 0.0002 Epoch: 22 Global Step: 472600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:12,616-Speed 6340.74 samples/sec Loss 4.4572 LearningRate 0.0002 Epoch: 22 Global Step: 472610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:15,870-Speed 6296.66 samples/sec Loss 4.4968 LearningRate 0.0002 Epoch: 22 Global Step: 472620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:19,117-Speed 6308.38 samples/sec Loss 4.4440 
LearningRate 0.0002 Epoch: 22 Global Step: 472630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:22,360-Speed 6316.13 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 22 Global Step: 472640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:25,611-Speed 6301.75 samples/sec Loss 4.4888 LearningRate 0.0002 Epoch: 22 Global Step: 472650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:28,861-Speed 6301.11 samples/sec Loss 4.4552 LearningRate 0.0002 Epoch: 22 Global Step: 472660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:32,106-Speed 6312.85 samples/sec Loss 4.4114 LearningRate 0.0002 Epoch: 22 Global Step: 472670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:35,350-Speed 6314.80 samples/sec Loss 4.4016 LearningRate 0.0002 Epoch: 22 Global Step: 472680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:38,607-Speed 6290.82 samples/sec Loss 4.5775 LearningRate 0.0002 Epoch: 22 Global Step: 472690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:41,853-Speed 6310.26 samples/sec Loss 4.4944 LearningRate 0.0002 Epoch: 22 Global Step: 472700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:45,086-Speed 6336.57 samples/sec Loss 4.5614 LearningRate 0.0002 Epoch: 22 Global Step: 472710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:48,335-Speed 6304.24 samples/sec Loss 4.4243 LearningRate 0.0002 Epoch: 22 Global Step: 472720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:51,581-Speed 6309.29 samples/sec Loss 4.4750 LearningRate 0.0002 Epoch: 22 Global Step: 472730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:54,828-Speed 6308.54 samples/sec Loss 4.4842 LearningRate 0.0002 Epoch: 22 Global Step: 472740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:43:58,078-Speed 6304.44 samples/sec Loss 4.4510 LearningRate 0.0002 Epoch: 22 
Global Step: 472750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:01,321-Speed 6317.82 samples/sec Loss 4.4322 LearningRate 0.0002 Epoch: 22 Global Step: 472760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:04,567-Speed 6309.55 samples/sec Loss 4.4430 LearningRate 0.0002 Epoch: 22 Global Step: 472770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:07,809-Speed 6318.09 samples/sec Loss 4.5097 LearningRate 0.0002 Epoch: 22 Global Step: 472780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:11,052-Speed 6317.22 samples/sec Loss 4.3898 LearningRate 0.0002 Epoch: 22 Global Step: 472790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:14,298-Speed 6310.19 samples/sec Loss 4.4230 LearningRate 0.0002 Epoch: 22 Global Step: 472800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:17,543-Speed 6313.28 samples/sec Loss 4.4474 LearningRate 0.0002 Epoch: 22 Global Step: 472810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:20,789-Speed 6310.84 samples/sec Loss 4.4919 LearningRate 0.0002 Epoch: 22 Global Step: 472820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:24,034-Speed 6313.16 samples/sec Loss 4.4881 LearningRate 0.0002 Epoch: 22 Global Step: 472830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:27,282-Speed 6306.80 samples/sec Loss 4.4197 LearningRate 0.0002 Epoch: 22 Global Step: 472840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:30,537-Speed 6292.23 samples/sec Loss 4.4205 LearningRate 0.0002 Epoch: 22 Global Step: 472850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:33,784-Speed 6309.92 samples/sec Loss 4.5328 LearningRate 0.0002 Epoch: 22 Global Step: 472860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:37,036-Speed 6297.72 samples/sec Loss 4.5326 LearningRate 0.0002 Epoch: 22 Global Step: 472870 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:40,281-Speed 6312.17 samples/sec Loss 4.4280 LearningRate 0.0002 Epoch: 22 Global Step: 472880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:43,536-Speed 6294.68 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 472890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:46,783-Speed 6308.69 samples/sec Loss 4.3277 LearningRate 0.0002 Epoch: 22 Global Step: 472900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:50,017-Speed 6333.40 samples/sec Loss 4.4682 LearningRate 0.0002 Epoch: 22 Global Step: 472910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:53,270-Speed 6297.59 samples/sec Loss 4.4609 LearningRate 0.0002 Epoch: 22 Global Step: 472920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:56,519-Speed 6305.25 samples/sec Loss 4.4450 LearningRate 0.0002 Epoch: 22 Global Step: 472930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:44:59,771-Speed 6298.22 samples/sec Loss 4.5368 LearningRate 0.0002 Epoch: 22 Global Step: 472940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:03,021-Speed 6302.68 samples/sec Loss 4.5186 LearningRate 0.0002 Epoch: 22 Global Step: 472950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:06,268-Speed 6310.55 samples/sec Loss 4.4829 LearningRate 0.0002 Epoch: 22 Global Step: 472960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:09,517-Speed 6305.34 samples/sec Loss 4.4468 LearningRate 0.0002 Epoch: 22 Global Step: 472970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:12,764-Speed 6307.85 samples/sec Loss 4.4834 LearningRate 0.0002 Epoch: 22 Global Step: 472980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:16,012-Speed 6306.40 samples/sec Loss 4.4580 LearningRate 0.0002 Epoch: 22 Global Step: 472990 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 10:45:19,257-Speed 6312.52 samples/sec Loss 4.4953 LearningRate 0.0002 Epoch: 22 Global Step: 473000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:22,500-Speed 6316.92 samples/sec Loss 4.4750 LearningRate 0.0002 Epoch: 22 Global Step: 473010 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:45:25,733-Speed 6336.82 samples/sec Loss 4.4028 LearningRate 0.0002 Epoch: 22 Global Step: 473020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:28,980-Speed 6309.15 samples/sec Loss 4.4748 LearningRate 0.0002 Epoch: 22 Global Step: 473030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:32,228-Speed 6305.47 samples/sec Loss 4.4770 LearningRate 0.0002 Epoch: 22 Global Step: 473040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:35,473-Speed 6313.33 samples/sec Loss 4.4328 LearningRate 0.0002 Epoch: 22 Global Step: 473050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:38,719-Speed 6310.80 samples/sec Loss 4.4437 LearningRate 0.0002 Epoch: 22 Global Step: 473060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:41,969-Speed 6302.08 samples/sec Loss 4.4801 LearningRate 0.0002 Epoch: 22 Global Step: 473070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:45,216-Speed 6308.25 samples/sec Loss 4.5346 LearningRate 0.0002 Epoch: 22 Global Step: 473080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:48,465-Speed 6305.43 samples/sec Loss 4.4821 LearningRate 0.0002 Epoch: 22 Global Step: 473090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:51,710-Speed 6313.03 samples/sec Loss 4.4237 LearningRate 0.0002 Epoch: 22 Global Step: 473100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:45:54,960-Speed 6303.64 samples/sec Loss 4.4491 LearningRate 0.0002 Epoch: 22 Global Step: 473110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
10:45:58,193-Speed 6335.03 samples/sec Loss 4.4879 LearningRate 0.0002 Epoch: 22 Global Step: 473120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:01,437-Speed 6315.15 samples/sec Loss 4.4740 LearningRate 0.0002 Epoch: 22 Global Step: 473130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:04,682-Speed 6312.40 samples/sec Loss 4.4472 LearningRate 0.0002 Epoch: 22 Global Step: 473140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:07,932-Speed 6303.32 samples/sec Loss 4.5368 LearningRate 0.0002 Epoch: 22 Global Step: 473150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:11,195-Speed 6277.34 samples/sec Loss 4.4920 LearningRate 0.0002 Epoch: 22 Global Step: 473160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:14,443-Speed 6306.75 samples/sec Loss 4.4929 LearningRate 0.0002 Epoch: 22 Global Step: 473170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:17,692-Speed 6306.37 samples/sec Loss 4.4933 LearningRate 0.0002 Epoch: 22 Global Step: 473180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:20,940-Speed 6306.57 samples/sec Loss 4.4598 LearningRate 0.0002 Epoch: 22 Global Step: 473190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:24,221-Speed 6243.11 samples/sec Loss 4.4199 LearningRate 0.0002 Epoch: 22 Global Step: 473200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:27,474-Speed 6298.57 samples/sec Loss 4.4322 LearningRate 0.0002 Epoch: 22 Global Step: 473210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:30,703-Speed 6343.35 samples/sec Loss 4.3959 LearningRate 0.0002 Epoch: 22 Global Step: 473220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:33,949-Speed 6311.14 samples/sec Loss 4.4066 LearningRate 0.0002 Epoch: 22 Global Step: 473230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:37,191-Speed 6317.69 
samples/sec Loss 4.4218 LearningRate 0.0002 Epoch: 22 Global Step: 473240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:40,438-Speed 6309.60 samples/sec Loss 4.5245 LearningRate 0.0002 Epoch: 22 Global Step: 473250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:43,689-Speed 6299.62 samples/sec Loss 4.4384 LearningRate 0.0002 Epoch: 22 Global Step: 473260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:46,936-Speed 6310.37 samples/sec Loss 4.4560 LearningRate 0.0002 Epoch: 22 Global Step: 473270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:50,184-Speed 6305.15 samples/sec Loss 4.4069 LearningRate 0.0002 Epoch: 22 Global Step: 473280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:53,432-Speed 6307.91 samples/sec Loss 4.4710 LearningRate 0.0002 Epoch: 22 Global Step: 473290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:56,679-Speed 6309.46 samples/sec Loss 4.5062 LearningRate 0.0002 Epoch: 22 Global Step: 473300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:46:59,934-Speed 6292.15 samples/sec Loss 4.3756 LearningRate 0.0002 Epoch: 22 Global Step: 473310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:03,185-Speed 6301.02 samples/sec Loss 4.4497 LearningRate 0.0002 Epoch: 22 Global Step: 473320 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:47:06,437-Speed 6300.01 samples/sec Loss 4.4974 LearningRate 0.0002 Epoch: 22 Global Step: 473330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:09,682-Speed 6312.96 samples/sec Loss 4.4064 LearningRate 0.0002 Epoch: 22 Global Step: 473340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:12,928-Speed 6309.06 samples/sec Loss 4.4246 LearningRate 0.0002 Epoch: 22 Global Step: 473350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:16,176-Speed 6307.02 samples/sec Loss 4.4837 
LearningRate 0.0002 Epoch: 22 Global Step: 473360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:19,422-Speed 6311.55 samples/sec Loss 4.4934 LearningRate 0.0002 Epoch: 22 Global Step: 473370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:22,668-Speed 6310.94 samples/sec Loss 4.4303 LearningRate 0.0002 Epoch: 22 Global Step: 473380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:25,916-Speed 6305.79 samples/sec Loss 4.5115 LearningRate 0.0002 Epoch: 22 Global Step: 473390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:29,158-Speed 6318.75 samples/sec Loss 4.4925 LearningRate 0.0002 Epoch: 22 Global Step: 473400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:32,404-Speed 6312.84 samples/sec Loss 4.5255 LearningRate 0.0002 Epoch: 22 Global Step: 473410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:35,650-Speed 6309.82 samples/sec Loss 4.3995 LearningRate 0.0002 Epoch: 22 Global Step: 473420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:38,883-Speed 6336.11 samples/sec Loss 4.4717 LearningRate 0.0002 Epoch: 22 Global Step: 473430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:42,128-Speed 6312.16 samples/sec Loss 4.4470 LearningRate 0.0002 Epoch: 22 Global Step: 473440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:45,377-Speed 6305.97 samples/sec Loss 4.5034 LearningRate 0.0002 Epoch: 22 Global Step: 473450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:48,624-Speed 6308.29 samples/sec Loss 4.4922 LearningRate 0.0002 Epoch: 22 Global Step: 473460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:51,872-Speed 6306.23 samples/sec Loss 4.4723 LearningRate 0.0002 Epoch: 22 Global Step: 473470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:55,115-Speed 6316.24 samples/sec Loss 4.4964 LearningRate 0.0002 Epoch: 22 
Global Step: 473480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:47:58,366-Speed 6302.42 samples/sec Loss 4.5081 LearningRate 0.0002 Epoch: 22 Global Step: 473490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:01,615-Speed 6304.60 samples/sec Loss 4.4751 LearningRate 0.0002 Epoch: 22 Global Step: 473500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:04,864-Speed 6304.88 samples/sec Loss 4.5200 LearningRate 0.0002 Epoch: 22 Global Step: 473510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:08,113-Speed 6303.63 samples/sec Loss 4.4951 LearningRate 0.0002 Epoch: 22 Global Step: 473520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:11,344-Speed 6340.78 samples/sec Loss 4.5019 LearningRate 0.0002 Epoch: 22 Global Step: 473530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:14,598-Speed 6294.55 samples/sec Loss 4.4749 LearningRate 0.0002 Epoch: 22 Global Step: 473540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:17,844-Speed 6311.32 samples/sec Loss 4.4287 LearningRate 0.0002 Epoch: 22 Global Step: 473550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:21,092-Speed 6305.91 samples/sec Loss 4.4348 LearningRate 0.0002 Epoch: 22 Global Step: 473560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:24,336-Speed 6315.26 samples/sec Loss 4.4740 LearningRate 0.0002 Epoch: 22 Global Step: 473570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:27,577-Speed 6320.41 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 22 Global Step: 473580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:30,821-Speed 6314.27 samples/sec Loss 4.4728 LearningRate 0.0002 Epoch: 22 Global Step: 473590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:34,068-Speed 6308.63 samples/sec Loss 4.4939 LearningRate 0.0002 Epoch: 22 Global Step: 473600 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:37,328-Speed 6285.91 samples/sec Loss 4.4037 LearningRate 0.0002 Epoch: 22 Global Step: 473610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:40,571-Speed 6315.38 samples/sec Loss 4.4060 LearningRate 0.0002 Epoch: 22 Global Step: 473620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:43,821-Speed 6303.29 samples/sec Loss 4.4437 LearningRate 0.0002 Epoch: 22 Global Step: 473630 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:48:47,049-Speed 6345.55 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 22 Global Step: 473640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:50,298-Speed 6306.49 samples/sec Loss 4.4521 LearningRate 0.0002 Epoch: 22 Global Step: 473650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:53,548-Speed 6302.72 samples/sec Loss 4.4427 LearningRate 0.0002 Epoch: 22 Global Step: 473660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:48:56,798-Speed 6302.19 samples/sec Loss 4.4337 LearningRate 0.0002 Epoch: 22 Global Step: 473670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:00,047-Speed 6305.32 samples/sec Loss 4.4948 LearningRate 0.0002 Epoch: 22 Global Step: 473680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:03,293-Speed 6309.82 samples/sec Loss 4.4249 LearningRate 0.0002 Epoch: 22 Global Step: 473690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:06,538-Speed 6313.99 samples/sec Loss 4.4123 LearningRate 0.0002 Epoch: 22 Global Step: 473700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:09,787-Speed 6303.29 samples/sec Loss 4.3646 LearningRate 0.0002 Epoch: 22 Global Step: 473710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:13,031-Speed 6316.06 samples/sec Loss 4.4316 LearningRate 0.0002 Epoch: 22 Global Step: 473720 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 10:49:16,277-Speed 6310.53 samples/sec Loss 4.4666 LearningRate 0.0002 Epoch: 22 Global Step: 473730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:19,514-Speed 6328.17 samples/sec Loss 4.4351 LearningRate 0.0002 Epoch: 22 Global Step: 473740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:22,762-Speed 6306.13 samples/sec Loss 4.4330 LearningRate 0.0002 Epoch: 22 Global Step: 473750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:26,009-Speed 6308.73 samples/sec Loss 4.5019 LearningRate 0.0002 Epoch: 22 Global Step: 473760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:29,257-Speed 6307.49 samples/sec Loss 4.5055 LearningRate 0.0002 Epoch: 22 Global Step: 473770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:32,528-Speed 6261.46 samples/sec Loss 4.3994 LearningRate 0.0002 Epoch: 22 Global Step: 473780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:35,774-Speed 6311.12 samples/sec Loss 4.4696 LearningRate 0.0002 Epoch: 22 Global Step: 473790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:39,023-Speed 6305.28 samples/sec Loss 4.4558 LearningRate 0.0002 Epoch: 22 Global Step: 473800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:42,280-Speed 6288.08 samples/sec Loss 4.4282 LearningRate 0.0002 Epoch: 22 Global Step: 473810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:45,526-Speed 6311.02 samples/sec Loss 4.3820 LearningRate 0.0002 Epoch: 22 Global Step: 473820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:48,775-Speed 6306.06 samples/sec Loss 4.4255 LearningRate 0.0002 Epoch: 22 Global Step: 473830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:52,005-Speed 6343.17 samples/sec Loss 4.4723 LearningRate 0.0002 Epoch: 22 Global Step: 473840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
10:49:55,250-Speed 6311.68 samples/sec Loss 4.4238 LearningRate 0.0002 Epoch: 22 Global Step: 473850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:49:58,496-Speed 6310.13 samples/sec Loss 4.4517 LearningRate 0.0002 Epoch: 22 Global Step: 473860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:01,747-Speed 6300.97 samples/sec Loss 4.4376 LearningRate 0.0002 Epoch: 22 Global Step: 473870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:05,000-Speed 6297.81 samples/sec Loss 4.4500 LearningRate 0.0002 Epoch: 22 Global Step: 473880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:08,246-Speed 6310.95 samples/sec Loss 4.4319 LearningRate 0.0002 Epoch: 22 Global Step: 473890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:11,500-Speed 6295.09 samples/sec Loss 4.4277 LearningRate 0.0002 Epoch: 22 Global Step: 473900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:14,746-Speed 6311.06 samples/sec Loss 4.4660 LearningRate 0.0002 Epoch: 22 Global Step: 473910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:17,996-Speed 6302.37 samples/sec Loss 4.3578 LearningRate 0.0002 Epoch: 22 Global Step: 473920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:21,243-Speed 6308.98 samples/sec Loss 4.4800 LearningRate 0.0002 Epoch: 22 Global Step: 473930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:24,492-Speed 6304.12 samples/sec Loss 4.3710 LearningRate 0.0002 Epoch: 22 Global Step: 473940 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:50:27,725-Speed 6336.78 samples/sec Loss 4.4118 LearningRate 0.0002 Epoch: 22 Global Step: 473950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:30,973-Speed 6307.27 samples/sec Loss 4.4006 LearningRate 0.0002 Epoch: 22 Global Step: 473960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:34,218-Speed 6311.24 
samples/sec Loss 4.4279 LearningRate 0.0002 Epoch: 22 Global Step: 473970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:37,464-Speed 6311.44 samples/sec Loss 4.4227 LearningRate 0.0002 Epoch: 22 Global Step: 473980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:40,710-Speed 6311.02 samples/sec Loss 4.4451 LearningRate 0.0002 Epoch: 22 Global Step: 473990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:43,957-Speed 6308.70 samples/sec Loss 4.5253 LearningRate 0.0002 Epoch: 22 Global Step: 474000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:47,213-Speed 6290.96 samples/sec Loss 4.4739 LearningRate 0.0002 Epoch: 22 Global Step: 474010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:50,459-Speed 6310.13 samples/sec Loss 4.4727 LearningRate 0.0002 Epoch: 22 Global Step: 474020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:53,705-Speed 6311.63 samples/sec Loss 4.4681 LearningRate 0.0002 Epoch: 22 Global Step: 474030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:50:56,947-Speed 6318.35 samples/sec Loss 4.4284 LearningRate 0.0002 Epoch: 22 Global Step: 474040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:00,198-Speed 6302.36 samples/sec Loss 4.4394 LearningRate 0.0002 Epoch: 22 Global Step: 474050 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:51:03,431-Speed 6334.85 samples/sec Loss 4.4472 LearningRate 0.0002 Epoch: 22 Global Step: 474060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:06,689-Speed 6289.27 samples/sec Loss 4.4246 LearningRate 0.0002 Epoch: 22 Global Step: 474070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:09,933-Speed 6314.13 samples/sec Loss 4.4513 LearningRate 0.0002 Epoch: 22 Global Step: 474080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:13,181-Speed 6306.77 samples/sec Loss 4.5261 
LearningRate 0.0002 Epoch: 22 Global Step: 474090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:16,424-Speed 6316.35 samples/sec Loss 4.4736 LearningRate 0.0002 Epoch: 22 Global Step: 474100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:19,670-Speed 6311.01 samples/sec Loss 4.4438 LearningRate 0.0002 Epoch: 22 Global Step: 474110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:22,916-Speed 6310.19 samples/sec Loss 4.4366 LearningRate 0.0002 Epoch: 22 Global Step: 474120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:26,163-Speed 6308.31 samples/sec Loss 4.4438 LearningRate 0.0002 Epoch: 22 Global Step: 474130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:29,412-Speed 6304.86 samples/sec Loss 4.5044 LearningRate 0.0002 Epoch: 22 Global Step: 474140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:32,659-Speed 6310.00 samples/sec Loss 4.4572 LearningRate 0.0002 Epoch: 22 Global Step: 474150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:35,909-Speed 6301.75 samples/sec Loss 4.4565 LearningRate 0.0002 Epoch: 22 Global Step: 474160 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:51:39,140-Speed 6340.24 samples/sec Loss 4.4120 LearningRate 0.0002 Epoch: 22 Global Step: 474170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:42,387-Speed 6309.19 samples/sec Loss 4.3979 LearningRate 0.0002 Epoch: 22 Global Step: 474180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:45,635-Speed 6306.11 samples/sec Loss 4.4636 LearningRate 0.0002 Epoch: 22 Global Step: 474190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:48,883-Speed 6307.53 samples/sec Loss 4.4350 LearningRate 0.0002 Epoch: 22 Global Step: 474200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:52,128-Speed 6312.30 samples/sec Loss 4.4347 LearningRate 0.0002 Epoch: 22 
Global Step: 474210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:55,373-Speed 6312.54 samples/sec Loss 4.4062 LearningRate 0.0002 Epoch: 22 Global Step: 474220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:51:58,620-Speed 6309.53 samples/sec Loss 4.4578 LearningRate 0.0002 Epoch: 22 Global Step: 474230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:01,870-Speed 6302.61 samples/sec Loss 4.4889 LearningRate 0.0002 Epoch: 22 Global Step: 474240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:05,117-Speed 6308.74 samples/sec Loss 4.4330 LearningRate 0.0002 Epoch: 22 Global Step: 474250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:08,366-Speed 6306.09 samples/sec Loss 4.5058 LearningRate 0.0002 Epoch: 22 Global Step: 474260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:11,603-Speed 6328.36 samples/sec Loss 4.5756 LearningRate 0.0002 Epoch: 22 Global Step: 474270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:14,850-Speed 6309.27 samples/sec Loss 4.4370 LearningRate 0.0002 Epoch: 22 Global Step: 474280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:18,100-Speed 6302.93 samples/sec Loss 4.4808 LearningRate 0.0002 Epoch: 22 Global Step: 474290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:21,352-Speed 6299.21 samples/sec Loss 4.3727 LearningRate 0.0002 Epoch: 22 Global Step: 474300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:24,599-Speed 6308.54 samples/sec Loss 4.4111 LearningRate 0.0002 Epoch: 22 Global Step: 474310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:27,845-Speed 6309.81 samples/sec Loss 4.4261 LearningRate 0.0002 Epoch: 22 Global Step: 474320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:31,127-Speed 6241.30 samples/sec Loss 4.4355 LearningRate 0.0002 Epoch: 22 Global Step: 474330 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:34,374-Speed 6308.95 samples/sec Loss 4.5366 LearningRate 0.0002 Epoch: 22 Global Step: 474340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:37,618-Speed 6315.09 samples/sec Loss 4.4823 LearningRate 0.0002 Epoch: 22 Global Step: 474350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:40,863-Speed 6313.45 samples/sec Loss 4.4163 LearningRate 0.0002 Epoch: 22 Global Step: 474360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:44,108-Speed 6310.97 samples/sec Loss 4.4288 LearningRate 0.0002 Epoch: 22 Global Step: 474370 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:52:47,340-Speed 6340.18 samples/sec Loss 4.4995 LearningRate 0.0002 Epoch: 22 Global Step: 474380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:50,596-Speed 6291.29 samples/sec Loss 4.3847 LearningRate 0.0002 Epoch: 22 Global Step: 474390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:53,843-Speed 6309.06 samples/sec Loss 4.4085 LearningRate 0.0002 Epoch: 22 Global Step: 474400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:52:57,090-Speed 6308.31 samples/sec Loss 4.4605 LearningRate 0.0002 Epoch: 22 Global Step: 474410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:00,336-Speed 6309.80 samples/sec Loss 4.4552 LearningRate 0.0002 Epoch: 22 Global Step: 474420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:03,586-Speed 6304.02 samples/sec Loss 4.4697 LearningRate 0.0002 Epoch: 22 Global Step: 474430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:06,833-Speed 6308.03 samples/sec Loss 4.4434 LearningRate 0.0002 Epoch: 22 Global Step: 474440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:10,089-Speed 6291.49 samples/sec Loss 4.4422 LearningRate 0.0002 Epoch: 22 Global Step: 474450 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 10:53:13,338-Speed 6306.46 samples/sec Loss 4.4596 LearningRate 0.0002 Epoch: 22 Global Step: 474460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:16,583-Speed 6312.06 samples/sec Loss 4.4461 LearningRate 0.0002 Epoch: 22 Global Step: 474470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:19,814-Speed 6339.73 samples/sec Loss 4.4184 LearningRate 0.0002 Epoch: 22 Global Step: 474480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:23,059-Speed 6313.24 samples/sec Loss 4.4472 LearningRate 0.0002 Epoch: 22 Global Step: 474490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:26,306-Speed 6308.82 samples/sec Loss 4.4232 LearningRate 0.0002 Epoch: 22 Global Step: 474500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:29,552-Speed 6310.09 samples/sec Loss 4.4350 LearningRate 0.0002 Epoch: 22 Global Step: 474510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:32,797-Speed 6313.53 samples/sec Loss 4.4236 LearningRate 0.0002 Epoch: 22 Global Step: 474520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:36,040-Speed 6315.16 samples/sec Loss 4.4661 LearningRate 0.0002 Epoch: 22 Global Step: 474530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:39,282-Speed 6320.09 samples/sec Loss 4.4965 LearningRate 0.0002 Epoch: 22 Global Step: 474540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:42,528-Speed 6309.23 samples/sec Loss 4.4620 LearningRate 0.0002 Epoch: 22 Global Step: 474550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:45,774-Speed 6311.12 samples/sec Loss 4.4263 LearningRate 0.0002 Epoch: 22 Global Step: 474560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:49,016-Speed 6319.04 samples/sec Loss 4.4281 LearningRate 0.0002 Epoch: 22 Global Step: 474570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
10:53:52,248-Speed 6337.02 samples/sec Loss 4.4409 LearningRate 0.0002 Epoch: 22 Global Step: 474580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:55,491-Speed 6317.17 samples/sec Loss 4.4323 LearningRate 0.0002 Epoch: 22 Global Step: 474590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:53:58,738-Speed 6309.83 samples/sec Loss 4.4254 LearningRate 0.0002 Epoch: 22 Global Step: 474600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:01,984-Speed 6309.41 samples/sec Loss 4.4968 LearningRate 0.0002 Epoch: 22 Global Step: 474610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:05,254-Speed 6263.90 samples/sec Loss 4.5097 LearningRate 0.0002 Epoch: 22 Global Step: 474620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:08,501-Speed 6310.39 samples/sec Loss 4.4605 LearningRate 0.0002 Epoch: 22 Global Step: 474630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:11,751-Speed 6303.15 samples/sec Loss 4.4359 LearningRate 0.0002 Epoch: 22 Global Step: 474640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:14,997-Speed 6310.68 samples/sec Loss 4.3992 LearningRate 0.0002 Epoch: 22 Global Step: 474650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:18,239-Speed 6316.53 samples/sec Loss 4.4606 LearningRate 0.0002 Epoch: 22 Global Step: 474660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:21,483-Speed 6314.75 samples/sec Loss 4.5169 LearningRate 0.0002 Epoch: 22 Global Step: 474670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:24,733-Speed 6303.85 samples/sec Loss 4.5279 LearningRate 0.0002 Epoch: 22 Global Step: 474680 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:54:27,963-Speed 6342.57 samples/sec Loss 4.4304 LearningRate 0.0002 Epoch: 22 Global Step: 474690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:31,207-Speed 6314.99 
samples/sec Loss 4.4940 LearningRate 0.0002 Epoch: 22 Global Step: 474700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:34,454-Speed 6309.33 samples/sec Loss 4.4797 LearningRate 0.0002 Epoch: 22 Global Step: 474710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:37,698-Speed 6313.54 samples/sec Loss 4.4427 LearningRate 0.0002 Epoch: 22 Global Step: 474720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:40,945-Speed 6308.64 samples/sec Loss 4.4376 LearningRate 0.0002 Epoch: 22 Global Step: 474730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:44,191-Speed 6311.57 samples/sec Loss 4.4688 LearningRate 0.0002 Epoch: 22 Global Step: 474740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:47,435-Speed 6315.01 samples/sec Loss 4.3827 LearningRate 0.0002 Epoch: 22 Global Step: 474750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:50,679-Speed 6314.66 samples/sec Loss 4.4440 LearningRate 0.0002 Epoch: 22 Global Step: 474760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:53,928-Speed 6304.98 samples/sec Loss 4.4230 LearningRate 0.0002 Epoch: 22 Global Step: 474770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:54:57,177-Speed 6303.61 samples/sec Loss 4.3939 LearningRate 0.0002 Epoch: 22 Global Step: 474780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:00,410-Speed 6337.70 samples/sec Loss 4.4740 LearningRate 0.0002 Epoch: 22 Global Step: 474790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:03,667-Speed 6288.40 samples/sec Loss 4.4668 LearningRate 0.0002 Epoch: 22 Global Step: 474800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:06,910-Speed 6316.14 samples/sec Loss 4.3603 LearningRate 0.0002 Epoch: 22 Global Step: 474810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:10,168-Speed 6287.30 samples/sec Loss 4.4582 
LearningRate 0.0002 Epoch: 22 Global Step: 474820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:13,416-Speed 6308.26 samples/sec Loss 4.4421 LearningRate 0.0002 Epoch: 22 Global Step: 474830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:16,661-Speed 6311.51 samples/sec Loss 4.4343 LearningRate 0.0002 Epoch: 22 Global Step: 474840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:19,909-Speed 6307.69 samples/sec Loss 4.4275 LearningRate 0.0002 Epoch: 22 Global Step: 474850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:23,155-Speed 6309.67 samples/sec Loss 4.3620 LearningRate 0.0002 Epoch: 22 Global Step: 474860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:26,398-Speed 6316.31 samples/sec Loss 4.3974 LearningRate 0.0002 Epoch: 22 Global Step: 474870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:29,645-Speed 6309.79 samples/sec Loss 4.4820 LearningRate 0.0002 Epoch: 22 Global Step: 474880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:32,890-Speed 6313.16 samples/sec Loss 4.4922 LearningRate 0.0002 Epoch: 22 Global Step: 474890 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:55:36,122-Speed 6337.51 samples/sec Loss 4.5135 LearningRate 0.0002 Epoch: 22 Global Step: 474900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:39,369-Speed 6309.15 samples/sec Loss 4.4422 LearningRate 0.0002 Epoch: 22 Global Step: 474910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:42,613-Speed 6314.45 samples/sec Loss 4.4741 LearningRate 0.0002 Epoch: 22 Global Step: 474920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:45,859-Speed 6312.44 samples/sec Loss 4.4048 LearningRate 0.0002 Epoch: 22 Global Step: 474930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:49,109-Speed 6302.25 samples/sec Loss 4.5305 LearningRate 0.0002 Epoch: 22 
Global Step: 474940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:52,355-Speed 6310.75 samples/sec Loss 4.4487 LearningRate 0.0002 Epoch: 22 Global Step: 474950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:55,597-Speed 6317.24 samples/sec Loss 4.5430 LearningRate 0.0002 Epoch: 22 Global Step: 474960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:55:58,848-Speed 6303.10 samples/sec Loss 4.4523 LearningRate 0.0002 Epoch: 22 Global Step: 474970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:02,093-Speed 6310.90 samples/sec Loss 4.3838 LearningRate 0.0002 Epoch: 22 Global Step: 474980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:05,335-Speed 6318.31 samples/sec Loss 4.4582 LearningRate 0.0002 Epoch: 22 Global Step: 474990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:08,567-Speed 6338.80 samples/sec Loss 4.4263 LearningRate 0.0002 Epoch: 22 Global Step: 475000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:11,811-Speed 6314.85 samples/sec Loss 4.5296 LearningRate 0.0002 Epoch: 22 Global Step: 475010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:15,054-Speed 6316.06 samples/sec Loss 4.4830 LearningRate 0.0002 Epoch: 22 Global Step: 475020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:18,302-Speed 6306.77 samples/sec Loss 4.4261 LearningRate 0.0002 Epoch: 22 Global Step: 475030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:21,550-Speed 6307.02 samples/sec Loss 4.3904 LearningRate 0.0002 Epoch: 22 Global Step: 475040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:24,797-Speed 6308.66 samples/sec Loss 4.5674 LearningRate 0.0002 Epoch: 22 Global Step: 475050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:28,048-Speed 6301.07 samples/sec Loss 4.5018 LearningRate 0.0002 Epoch: 22 Global Step: 475060 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:31,295-Speed 6309.20 samples/sec Loss 4.4320 LearningRate 0.0002 Epoch: 22 Global Step: 475070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:34,545-Speed 6302.80 samples/sec Loss 4.4128 LearningRate 0.0002 Epoch: 22 Global Step: 475080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:37,788-Speed 6316.14 samples/sec Loss 4.4624 LearningRate 0.0002 Epoch: 22 Global Step: 475090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:41,035-Speed 6307.99 samples/sec Loss 4.4401 LearningRate 0.0002 Epoch: 22 Global Step: 475100 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:56:44,266-Speed 6340.12 samples/sec Loss 4.4437 LearningRate 0.0002 Epoch: 22 Global Step: 475110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:47,512-Speed 6311.90 samples/sec Loss 4.4990 LearningRate 0.0002 Epoch: 22 Global Step: 475120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:50,758-Speed 6311.43 samples/sec Loss 4.4750 LearningRate 0.0002 Epoch: 22 Global Step: 475130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:54,005-Speed 6307.42 samples/sec Loss 4.4386 LearningRate 0.0002 Epoch: 22 Global Step: 475140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:56:57,254-Speed 6306.47 samples/sec Loss 4.4692 LearningRate 0.0002 Epoch: 22 Global Step: 475150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:00,505-Speed 6299.51 samples/sec Loss 4.4280 LearningRate 0.0002 Epoch: 22 Global Step: 475160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:03,751-Speed 6312.77 samples/sec Loss 4.4574 LearningRate 0.0002 Epoch: 22 Global Step: 475170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:06,999-Speed 6306.44 samples/sec Loss 4.4644 LearningRate 0.0002 Epoch: 22 Global Step: 475180 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 10:57:10,246-Speed 6309.87 samples/sec Loss 4.3798 LearningRate 0.0002 Epoch: 22 Global Step: 475190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:13,498-Speed 6298.92 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 22 Global Step: 475200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:16,737-Speed 6323.75 samples/sec Loss 4.5135 LearningRate 0.0002 Epoch: 22 Global Step: 475210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:19,981-Speed 6315.48 samples/sec Loss 4.3962 LearningRate 0.0002 Epoch: 22 Global Step: 475220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:23,225-Speed 6313.45 samples/sec Loss 4.4100 LearningRate 0.0002 Epoch: 22 Global Step: 475230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:26,472-Speed 6309.06 samples/sec Loss 4.4473 LearningRate 0.0002 Epoch: 22 Global Step: 475240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:29,716-Speed 6313.87 samples/sec Loss 4.4574 LearningRate 0.0002 Epoch: 22 Global Step: 475250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:32,967-Speed 6301.93 samples/sec Loss 4.5286 LearningRate 0.0002 Epoch: 22 Global Step: 475260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:36,215-Speed 6307.80 samples/sec Loss 4.3840 LearningRate 0.0002 Epoch: 22 Global Step: 475270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:39,459-Speed 6313.92 samples/sec Loss 4.4618 LearningRate 0.0002 Epoch: 22 Global Step: 475280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:42,705-Speed 6310.50 samples/sec Loss 4.3976 LearningRate 0.0002 Epoch: 22 Global Step: 475290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:45,951-Speed 6310.83 samples/sec Loss 4.3884 LearningRate 0.0002 Epoch: 22 Global Step: 475300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
10:57:49,185-Speed 6334.06 samples/sec Loss 4.4384 LearningRate 0.0002 Epoch: 22 Global Step: 475310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:52,429-Speed 6315.60 samples/sec Loss 4.4372 LearningRate 0.0002 Epoch: 22 Global Step: 475320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:55,675-Speed 6311.30 samples/sec Loss 4.4511 LearningRate 0.0002 Epoch: 22 Global Step: 475330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:57:58,926-Speed 6299.53 samples/sec Loss 4.4297 LearningRate 0.0002 Epoch: 22 Global Step: 475340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:02,174-Speed 6307.29 samples/sec Loss 4.5108 LearningRate 0.0002 Epoch: 22 Global Step: 475350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:05,421-Speed 6309.91 samples/sec Loss 4.4647 LearningRate 0.0002 Epoch: 22 Global Step: 475360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:08,665-Speed 6313.83 samples/sec Loss 4.4715 LearningRate 0.0002 Epoch: 22 Global Step: 475370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:11,910-Speed 6312.75 samples/sec Loss 4.4310 LearningRate 0.0002 Epoch: 22 Global Step: 475380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:15,154-Speed 6315.32 samples/sec Loss 4.4152 LearningRate 0.0002 Epoch: 22 Global Step: 475390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:18,396-Speed 6317.73 samples/sec Loss 4.4881 LearningRate 0.0002 Epoch: 22 Global Step: 475400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:21,640-Speed 6314.37 samples/sec Loss 4.4233 LearningRate 0.0002 Epoch: 22 Global Step: 475410 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:58:24,877-Speed 6328.91 samples/sec Loss 4.4294 LearningRate 0.0002 Epoch: 22 Global Step: 475420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:28,130-Speed 6296.54 
samples/sec Loss 4.4425 LearningRate 0.0002 Epoch: 22 Global Step: 475430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:31,375-Speed 6311.85 samples/sec Loss 4.5366 LearningRate 0.0002 Epoch: 22 Global Step: 475440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:34,621-Speed 6312.20 samples/sec Loss 4.3708 LearningRate 0.0002 Epoch: 22 Global Step: 475450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:37,865-Speed 6314.38 samples/sec Loss 4.4287 LearningRate 0.0002 Epoch: 22 Global Step: 475460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:41,117-Speed 6298.75 samples/sec Loss 4.4172 LearningRate 0.0002 Epoch: 22 Global Step: 475470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:44,362-Speed 6312.91 samples/sec Loss 4.4662 LearningRate 0.0002 Epoch: 22 Global Step: 475480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:47,608-Speed 6310.45 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 22 Global Step: 475490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:50,858-Speed 6303.31 samples/sec Loss 4.4006 LearningRate 0.0002 Epoch: 22 Global Step: 475500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:54,109-Speed 6301.52 samples/sec Loss 4.4523 LearningRate 0.0002 Epoch: 22 Global Step: 475510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:58:57,340-Speed 6338.75 samples/sec Loss 4.4455 LearningRate 0.0002 Epoch: 22 Global Step: 475520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:00,584-Speed 6314.73 samples/sec Loss 4.4376 LearningRate 0.0002 Epoch: 22 Global Step: 475530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:03,829-Speed 6314.42 samples/sec Loss 4.4655 LearningRate 0.0002 Epoch: 22 Global Step: 475540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:07,075-Speed 6309.64 samples/sec Loss 4.4705 
LearningRate 0.0002 Epoch: 22 Global Step: 475550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:10,319-Speed 6315.79 samples/sec Loss 4.4611 LearningRate 0.0002 Epoch: 22 Global Step: 475560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:13,563-Speed 6313.76 samples/sec Loss 4.5035 LearningRate 0.0002 Epoch: 22 Global Step: 475570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:16,808-Speed 6311.94 samples/sec Loss 4.4958 LearningRate 0.0002 Epoch: 22 Global Step: 475580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:20,058-Speed 6304.51 samples/sec Loss 4.4790 LearningRate 0.0002 Epoch: 22 Global Step: 475590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:23,303-Speed 6311.19 samples/sec Loss 4.5409 LearningRate 0.0002 Epoch: 22 Global Step: 475600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:26,553-Speed 6303.46 samples/sec Loss 4.4181 LearningRate 0.0002 Epoch: 22 Global Step: 475610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:29,800-Speed 6309.25 samples/sec Loss 4.4424 LearningRate 0.0002 Epoch: 22 Global Step: 475620 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 10:59:33,030-Speed 6341.23 samples/sec Loss 4.4716 LearningRate 0.0002 Epoch: 22 Global Step: 475630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:36,277-Speed 6310.13 samples/sec Loss 4.4773 LearningRate 0.0002 Epoch: 22 Global Step: 475640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:39,521-Speed 6314.49 samples/sec Loss 4.4233 LearningRate 0.0002 Epoch: 22 Global Step: 475650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:42,771-Speed 6301.68 samples/sec Loss 4.4723 LearningRate 0.0002 Epoch: 22 Global Step: 475660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:46,019-Speed 6308.19 samples/sec Loss 4.4108 LearningRate 0.0002 Epoch: 22 
Global Step: 475670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:49,266-Speed 6307.70 samples/sec Loss 4.4309 LearningRate 0.0002 Epoch: 22 Global Step: 475680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:52,515-Speed 6305.53 samples/sec Loss 4.4601 LearningRate 0.0002 Epoch: 22 Global Step: 475690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:55,760-Speed 6311.21 samples/sec Loss 4.4237 LearningRate 0.0002 Epoch: 22 Global Step: 475700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 10:59:59,024-Speed 6276.14 samples/sec Loss 4.4691 LearningRate 0.0002 Epoch: 22 Global Step: 475710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:02,272-Speed 6307.17 samples/sec Loss 4.3923 LearningRate 0.0002 Epoch: 22 Global Step: 475720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:05,493-Speed 6361.03 samples/sec Loss 4.4523 LearningRate 0.0002 Epoch: 22 Global Step: 475730 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:08,740-Speed 6309.07 samples/sec Loss 4.4422 LearningRate 0.0002 Epoch: 22 Global Step: 475740 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:12,004-Speed 6275.34 samples/sec Loss 4.4394 LearningRate 0.0002 Epoch: 22 Global Step: 475750 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:15,252-Speed 6307.44 samples/sec Loss 4.4895 LearningRate 0.0002 Epoch: 22 Global Step: 475760 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:18,495-Speed 6317.02 samples/sec Loss 4.4842 LearningRate 0.0002 Epoch: 22 Global Step: 475770 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:21,737-Speed 6317.95 samples/sec Loss 4.4174 LearningRate 0.0002 Epoch: 22 Global Step: 475780 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:24,982-Speed 6313.48 samples/sec Loss 4.4077 LearningRate 0.0002 Epoch: 22 Global Step: 475790 Fp16 Grad Scale: 
8192 Required: 32 hours Training: 2022-04-02 11:00:28,237-Speed 6291.70 samples/sec Loss 4.4412 LearningRate 0.0002 Epoch: 22 Global Step: 475800 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:31,490-Speed 6298.32 samples/sec Loss 4.4514 LearningRate 0.0002 Epoch: 22 Global Step: 475810 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:34,734-Speed 6314.81 samples/sec Loss 4.4226 LearningRate 0.0002 Epoch: 22 Global Step: 475820 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:00:37,981-Speed 6307.20 samples/sec Loss 4.4924 LearningRate 0.0002 Epoch: 22 Global Step: 475830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:41,226-Speed 6312.94 samples/sec Loss 4.4034 LearningRate 0.0002 Epoch: 22 Global Step: 475840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:44,475-Speed 6306.17 samples/sec Loss 4.4134 LearningRate 0.0002 Epoch: 22 Global Step: 475850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:47,720-Speed 6312.18 samples/sec Loss 4.4989 LearningRate 0.0002 Epoch: 22 Global Step: 475860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:50,965-Speed 6312.11 samples/sec Loss 4.4482 LearningRate 0.0002 Epoch: 22 Global Step: 475870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:54,211-Speed 6311.34 samples/sec Loss 4.4711 LearningRate 0.0002 Epoch: 22 Global Step: 475880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:00:57,458-Speed 6308.13 samples/sec Loss 4.4211 LearningRate 0.0002 Epoch: 22 Global Step: 475890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:00,701-Speed 6315.44 samples/sec Loss 4.3867 LearningRate 0.0002 Epoch: 22 Global Step: 475900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:03,950-Speed 6305.63 samples/sec Loss 4.4384 LearningRate 0.0002 Epoch: 22 Global Step: 475910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 
2022-04-02 11:01:07,201-Speed 6302.30 samples/sec Loss 4.4031 LearningRate 0.0002 Epoch: 22 Global Step: 475920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:10,448-Speed 6307.32 samples/sec Loss 4.4169 LearningRate 0.0002 Epoch: 22 Global Step: 475930 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:01:13,684-Speed 6332.64 samples/sec Loss 4.4193 LearningRate 0.0002 Epoch: 22 Global Step: 475940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:16,933-Speed 6304.65 samples/sec Loss 4.4671 LearningRate 0.0002 Epoch: 22 Global Step: 475950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:20,175-Speed 6316.93 samples/sec Loss 4.4571 LearningRate 0.0002 Epoch: 22 Global Step: 475960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:23,419-Speed 6316.90 samples/sec Loss 4.5121 LearningRate 0.0002 Epoch: 22 Global Step: 475970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:26,663-Speed 6313.76 samples/sec Loss 4.4994 LearningRate 0.0002 Epoch: 22 Global Step: 475980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:29,908-Speed 6312.90 samples/sec Loss 4.5040 LearningRate 0.0002 Epoch: 22 Global Step: 475990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:33,154-Speed 6311.61 samples/sec Loss 4.4727 LearningRate 0.0002 Epoch: 22 Global Step: 476000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:36,403-Speed 6304.86 samples/sec Loss 4.4554 LearningRate 0.0002 Epoch: 22 Global Step: 476010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:39,653-Speed 6301.58 samples/sec Loss 4.5023 LearningRate 0.0002 Epoch: 22 Global Step: 476020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:42,902-Speed 6306.58 samples/sec Loss 4.3972 LearningRate 0.0002 Epoch: 22 Global Step: 476030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:46,131-Speed 
6342.71 samples/sec Loss 4.4707 LearningRate 0.0002 Epoch: 22 Global Step: 476040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:49,373-Speed 6319.21 samples/sec Loss 4.4697 LearningRate 0.0002 Epoch: 22 Global Step: 476050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:52,616-Speed 6316.29 samples/sec Loss 4.3841 LearningRate 0.0002 Epoch: 22 Global Step: 476060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:55,858-Speed 6317.64 samples/sec Loss 4.4074 LearningRate 0.0002 Epoch: 22 Global Step: 476070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:01:59,102-Speed 6314.97 samples/sec Loss 4.4415 LearningRate 0.0002 Epoch: 22 Global Step: 476080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:02,359-Speed 6289.08 samples/sec Loss 4.4720 LearningRate 0.0002 Epoch: 22 Global Step: 476090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:05,607-Speed 6307.58 samples/sec Loss 4.4354 LearningRate 0.0002 Epoch: 22 Global Step: 476100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:08,867-Speed 6283.42 samples/sec Loss 4.4001 LearningRate 0.0002 Epoch: 22 Global Step: 476110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:12,115-Speed 6306.75 samples/sec Loss 4.4317 LearningRate 0.0002 Epoch: 22 Global Step: 476120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:15,360-Speed 6313.41 samples/sec Loss 4.4629 LearningRate 0.0002 Epoch: 22 Global Step: 476130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:18,606-Speed 6309.09 samples/sec Loss 4.4551 LearningRate 0.0002 Epoch: 22 Global Step: 476140 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:02:21,851-Speed 6312.27 samples/sec Loss 4.4889 LearningRate 0.0002 Epoch: 22 Global Step: 476150 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:02:25,086-Speed 6334.83 samples/sec Loss 4.4753 
LearningRate 0.0002 Epoch: 22 Global Step: 476160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:28,331-Speed 6311.87 samples/sec Loss 4.4827 LearningRate 0.0002 Epoch: 22 Global Step: 476170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:31,574-Speed 6316.68 samples/sec Loss 4.3957 LearningRate 0.0002 Epoch: 22 Global Step: 476180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:34,821-Speed 6308.35 samples/sec Loss 4.4592 LearningRate 0.0002 Epoch: 22 Global Step: 476190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:38,064-Speed 6318.03 samples/sec Loss 4.4225 LearningRate 0.0002 Epoch: 22 Global Step: 476200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:41,313-Speed 6303.85 samples/sec Loss 4.3822 LearningRate 0.0002 Epoch: 22 Global Step: 476210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:44,558-Speed 6313.23 samples/sec Loss 4.4334 LearningRate 0.0002 Epoch: 22 Global Step: 476220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:47,806-Speed 6306.67 samples/sec Loss 4.4124 LearningRate 0.0002 Epoch: 22 Global Step: 476230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:51,056-Speed 6302.68 samples/sec Loss 4.5146 LearningRate 0.0002 Epoch: 22 Global Step: 476240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:54,309-Speed 6296.97 samples/sec Loss 4.4309 LearningRate 0.0002 Epoch: 22 Global Step: 476250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:02:57,541-Speed 6337.27 samples/sec Loss 4.4145 LearningRate 0.0002 Epoch: 22 Global Step: 476260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:00,791-Speed 6302.98 samples/sec Loss 4.4413 LearningRate 0.0002 Epoch: 22 Global Step: 476270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:04,038-Speed 6310.02 samples/sec Loss 4.4373 LearningRate 0.0002 Epoch: 22 
Global Step: 476280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:07,278-Speed 6321.50 samples/sec Loss 4.3935 LearningRate 0.0002 Epoch: 22 Global Step: 476290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:10,521-Speed 6317.79 samples/sec Loss 4.4225 LearningRate 0.0002 Epoch: 22 Global Step: 476300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:13,767-Speed 6308.96 samples/sec Loss 4.4559 LearningRate 0.0002 Epoch: 22 Global Step: 476310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:17,008-Speed 6320.96 samples/sec Loss 4.4701 LearningRate 0.0002 Epoch: 22 Global Step: 476320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:20,254-Speed 6310.78 samples/sec Loss 4.4061 LearningRate 0.0002 Epoch: 22 Global Step: 476330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:23,501-Speed 6307.98 samples/sec Loss 4.4688 LearningRate 0.0002 Epoch: 22 Global Step: 476340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:26,752-Speed 6302.51 samples/sec Loss 4.4365 LearningRate 0.0002 Epoch: 22 Global Step: 476350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:29,984-Speed 6336.25 samples/sec Loss 4.4381 LearningRate 0.0002 Epoch: 22 Global Step: 476360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:33,230-Speed 6312.26 samples/sec Loss 4.4283 LearningRate 0.0002 Epoch: 22 Global Step: 476370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:36,478-Speed 6306.17 samples/sec Loss 4.4528 LearningRate 0.0002 Epoch: 22 Global Step: 476380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:39,726-Speed 6307.23 samples/sec Loss 4.4333 LearningRate 0.0002 Epoch: 22 Global Step: 476390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:42,971-Speed 6314.35 samples/sec Loss 4.4481 LearningRate 0.0002 Epoch: 22 Global Step: 476400 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:46,214-Speed 6315.76 samples/sec Loss 4.4695 LearningRate 0.0002 Epoch: 22 Global Step: 476410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:49,460-Speed 6309.62 samples/sec Loss 4.4559 LearningRate 0.0002 Epoch: 22 Global Step: 476420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:52,707-Speed 6309.83 samples/sec Loss 4.4331 LearningRate 0.0002 Epoch: 22 Global Step: 476430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:55,953-Speed 6310.42 samples/sec Loss 4.4296 LearningRate 0.0002 Epoch: 22 Global Step: 476440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:03:59,197-Speed 6314.96 samples/sec Loss 4.4182 LearningRate 0.0002 Epoch: 22 Global Step: 476450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:02,447-Speed 6302.42 samples/sec Loss 4.4934 LearningRate 0.0002 Epoch: 22 Global Step: 476460 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:04:05,681-Speed 6333.93 samples/sec Loss 4.4464 LearningRate 0.0002 Epoch: 22 Global Step: 476470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:08,928-Speed 6308.32 samples/sec Loss 4.4437 LearningRate 0.0002 Epoch: 22 Global Step: 476480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:12,159-Speed 6340.23 samples/sec Loss 4.4147 LearningRate 0.0002 Epoch: 22 Global Step: 476490 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:15,407-Speed 6307.27 samples/sec Loss 4.3644 LearningRate 0.0002 Epoch: 22 Global Step: 476500 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:18,658-Speed 6301.10 samples/sec Loss 4.4264 LearningRate 0.0002 Epoch: 22 Global Step: 476510 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:21,906-Speed 6305.88 samples/sec Loss 4.4751 LearningRate 0.0002 Epoch: 22 Global Step: 476520 Fp16 Grad Scale: 8192 Required: 32 hours 
Training: 2022-04-02 11:04:25,148-Speed 6318.98 samples/sec Loss 4.4078 LearningRate 0.0002 Epoch: 22 Global Step: 476530 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:28,391-Speed 6316.91 samples/sec Loss 4.4457 LearningRate 0.0002 Epoch: 22 Global Step: 476540 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:31,632-Speed 6319.47 samples/sec Loss 4.4415 LearningRate 0.0002 Epoch: 22 Global Step: 476550 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:34,875-Speed 6316.28 samples/sec Loss 4.3495 LearningRate 0.0002 Epoch: 22 Global Step: 476560 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:38,121-Speed 6311.53 samples/sec Loss 4.5075 LearningRate 0.0002 Epoch: 22 Global Step: 476570 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:41,368-Speed 6310.45 samples/sec Loss 4.4925 LearningRate 0.0002 Epoch: 22 Global Step: 476580 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:04:44,617-Speed 6304.53 samples/sec Loss 4.4305 LearningRate 0.0002 Epoch: 22 Global Step: 476590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:47,861-Speed 6314.03 samples/sec Loss 4.4690 LearningRate 0.0002 Epoch: 22 Global Step: 476600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:51,106-Speed 6313.97 samples/sec Loss 4.3905 LearningRate 0.0002 Epoch: 22 Global Step: 476610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:54,352-Speed 6310.84 samples/sec Loss 4.4021 LearningRate 0.0002 Epoch: 22 Global Step: 476620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:04:57,597-Speed 6311.89 samples/sec Loss 4.4115 LearningRate 0.0002 Epoch: 22 Global Step: 476630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:00,844-Speed 6308.70 samples/sec Loss 4.4743 LearningRate 0.0002 Epoch: 22 Global Step: 476640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:05:04,090-Speed 6311.68 samples/sec Loss 4.4866 LearningRate 0.0002 Epoch: 22 Global Step: 476650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:07,337-Speed 6307.92 samples/sec Loss 4.4396 LearningRate 0.0002 Epoch: 22 Global Step: 476660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:10,584-Speed 6310.26 samples/sec Loss 4.4170 LearningRate 0.0002 Epoch: 22 Global Step: 476670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:13,833-Speed 6304.85 samples/sec Loss 4.5124 LearningRate 0.0002 Epoch: 22 Global Step: 476680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:17,091-Speed 6286.83 samples/sec Loss 4.4288 LearningRate 0.0002 Epoch: 22 Global Step: 476690 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:05:20,323-Speed 6338.35 samples/sec Loss 4.4479 LearningRate 0.0002 Epoch: 22 Global Step: 476700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:23,570-Speed 6307.74 samples/sec Loss 4.4456 LearningRate 0.0002 Epoch: 22 Global Step: 476710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:26,825-Speed 6294.24 samples/sec Loss 4.5164 LearningRate 0.0002 Epoch: 22 Global Step: 476720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:30,072-Speed 6309.16 samples/sec Loss 4.4677 LearningRate 0.0002 Epoch: 22 Global Step: 476730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:33,318-Speed 6309.73 samples/sec Loss 4.4394 LearningRate 0.0002 Epoch: 22 Global Step: 476740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:36,566-Speed 6307.19 samples/sec Loss 4.4346 LearningRate 0.0002 Epoch: 22 Global Step: 476750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:39,811-Speed 6311.87 samples/sec Loss 4.4417 LearningRate 0.0002 Epoch: 22 Global Step: 476760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:43,054-Speed 6316.99 
samples/sec Loss 4.3747 LearningRate 0.0002 Epoch: 22 Global Step: 476770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:46,303-Speed 6305.93 samples/sec Loss 4.4606 LearningRate 0.0002 Epoch: 22 Global Step: 476780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:49,550-Speed 6308.47 samples/sec Loss 4.4695 LearningRate 0.0002 Epoch: 22 Global Step: 476790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:52,781-Speed 6339.90 samples/sec Loss 4.4109 LearningRate 0.0002 Epoch: 22 Global Step: 476800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:56,029-Speed 6307.29 samples/sec Loss 4.5126 LearningRate 0.0002 Epoch: 22 Global Step: 476810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:05:59,275-Speed 6311.35 samples/sec Loss 4.5070 LearningRate 0.0002 Epoch: 22 Global Step: 476820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:02,519-Speed 6314.51 samples/sec Loss 4.4427 LearningRate 0.0002 Epoch: 22 Global Step: 476830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:05,766-Speed 6307.85 samples/sec Loss 4.3711 LearningRate 0.0002 Epoch: 22 Global Step: 476840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:09,007-Speed 6320.82 samples/sec Loss 4.4370 LearningRate 0.0002 Epoch: 22 Global Step: 476850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:12,258-Speed 6300.19 samples/sec Loss 4.4158 LearningRate 0.0002 Epoch: 22 Global Step: 476860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:15,505-Speed 6309.73 samples/sec Loss 4.4739 LearningRate 0.0002 Epoch: 22 Global Step: 476870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:18,753-Speed 6307.50 samples/sec Loss 4.4628 LearningRate 0.0002 Epoch: 22 Global Step: 476880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:21,998-Speed 6311.48 samples/sec Loss 4.4564 
LearningRate 0.0002 Epoch: 22 Global Step: 476890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:25,229-Speed 6339.30 samples/sec Loss 4.4420 LearningRate 0.0002 Epoch: 22 Global Step: 476900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:28,475-Speed 6311.58 samples/sec Loss 4.4840 LearningRate 0.0002 Epoch: 22 Global Step: 476910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:31,723-Speed 6305.89 samples/sec Loss 4.4423 LearningRate 0.0002 Epoch: 22 Global Step: 476920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:34,969-Speed 6312.12 samples/sec Loss 4.4280 LearningRate 0.0002 Epoch: 22 Global Step: 476930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:38,220-Speed 6299.99 samples/sec Loss 4.5049 LearningRate 0.0002 Epoch: 22 Global Step: 476940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:41,468-Speed 6308.51 samples/sec Loss 4.4823 LearningRate 0.0002 Epoch: 22 Global Step: 476950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:44,712-Speed 6313.95 samples/sec Loss 4.3961 LearningRate 0.0002 Epoch: 22 Global Step: 476960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:47,961-Speed 6304.39 samples/sec Loss 4.5048 LearningRate 0.0002 Epoch: 22 Global Step: 476970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:51,213-Speed 6298.26 samples/sec Loss 4.4359 LearningRate 0.0002 Epoch: 22 Global Step: 476980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:54,461-Speed 6306.97 samples/sec Loss 4.4939 LearningRate 0.0002 Epoch: 22 Global Step: 476990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:06:57,690-Speed 6346.36 samples/sec Loss 4.4726 LearningRate 0.0002 Epoch: 22 Global Step: 477000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:07:00,938-Speed 6305.76 samples/sec Loss 4.4445 LearningRate 0.0002 Epoch: 22 
Global Step: 477010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:01,182-Speed 339.95 samples/sec Loss 4.4713 LearningRate 0.0002 Epoch: 23 Global Step: 477020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:04,429-Speed 6310.66 samples/sec Loss 4.4750 LearningRate 0.0002 Epoch: 23 Global Step: 477030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:07,664-Speed 6331.90 samples/sec Loss 4.4249 LearningRate 0.0002 Epoch: 23 Global Step: 477040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:10,914-Speed 6301.44 samples/sec Loss 4.4354 LearningRate 0.0002 Epoch: 23 Global Step: 477050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:14,154-Speed 6323.02 samples/sec Loss 4.4392 LearningRate 0.0002 Epoch: 23 Global Step: 477060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:17,392-Speed 6326.96 samples/sec Loss 4.4866 LearningRate 0.0002 Epoch: 23 Global Step: 477070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:20,629-Speed 6327.54 samples/sec Loss 4.4803 LearningRate 0.0002 Epoch: 23 Global Step: 477080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:23,872-Speed 6317.88 samples/sec Loss 4.4856 LearningRate 0.0002 Epoch: 23 Global Step: 477090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:27,112-Speed 6321.38 samples/sec Loss 4.4078 LearningRate 0.0002 Epoch: 23 Global Step: 477100 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:08:30,337-Speed 6351.11 samples/sec Loss 4.4273 LearningRate 0.0002 Epoch: 23 Global Step: 477110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:33,574-Speed 6328.60 samples/sec Loss 4.3874 LearningRate 0.0002 Epoch: 23 Global Step: 477120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:36,814-Speed 6322.44 samples/sec Loss 4.4160 LearningRate 0.0002 Epoch: 23 Global Step: 477130 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:40,055-Speed 6319.63 samples/sec Loss 4.4232 LearningRate 0.0002 Epoch: 23 Global Step: 477140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:43,295-Speed 6322.87 samples/sec Loss 4.4374 LearningRate 0.0002 Epoch: 23 Global Step: 477150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:46,535-Speed 6323.54 samples/sec Loss 4.4550 LearningRate 0.0002 Epoch: 23 Global Step: 477160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:49,777-Speed 6318.10 samples/sec Loss 4.3890 LearningRate 0.0002 Epoch: 23 Global Step: 477170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:53,019-Speed 6317.34 samples/sec Loss 4.4293 LearningRate 0.0002 Epoch: 23 Global Step: 477180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:56,266-Speed 6309.47 samples/sec Loss 4.3852 LearningRate 0.0002 Epoch: 23 Global Step: 477190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:08:59,514-Speed 6307.12 samples/sec Loss 4.4341 LearningRate 0.0002 Epoch: 23 Global Step: 477200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:02,759-Speed 6312.45 samples/sec Loss 4.4292 LearningRate 0.0002 Epoch: 23 Global Step: 477210 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:09:05,988-Speed 6345.04 samples/sec Loss 4.4678 LearningRate 0.0002 Epoch: 23 Global Step: 477220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:09,230-Speed 6319.46 samples/sec Loss 4.4466 LearningRate 0.0002 Epoch: 23 Global Step: 477230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:12,469-Speed 6322.64 samples/sec Loss 4.3904 LearningRate 0.0002 Epoch: 23 Global Step: 477240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:15,723-Speed 6295.79 samples/sec Loss 4.4512 LearningRate 0.0002 Epoch: 23 Global Step: 477250 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:09:18,967-Speed 6314.27 samples/sec Loss 4.4301 LearningRate 0.0002 Epoch: 23 Global Step: 477260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:22,217-Speed 6303.75 samples/sec Loss 4.4239 LearningRate 0.0002 Epoch: 23 Global Step: 477270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:25,460-Speed 6315.41 samples/sec Loss 4.4134 LearningRate 0.0002 Epoch: 23 Global Step: 477280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:28,704-Speed 6315.90 samples/sec Loss 4.4280 LearningRate 0.0002 Epoch: 23 Global Step: 477290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:31,949-Speed 6312.02 samples/sec Loss 4.4707 LearningRate 0.0002 Epoch: 23 Global Step: 477300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:35,191-Speed 6318.14 samples/sec Loss 4.4106 LearningRate 0.0002 Epoch: 23 Global Step: 477310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:38,422-Speed 6340.52 samples/sec Loss 4.3977 LearningRate 0.0002 Epoch: 23 Global Step: 477320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:41,685-Speed 6278.15 samples/sec Loss 4.4364 LearningRate 0.0002 Epoch: 23 Global Step: 477330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:44,937-Speed 6300.17 samples/sec Loss 4.4048 LearningRate 0.0002 Epoch: 23 Global Step: 477340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:48,231-Speed 6217.13 samples/sec Loss 4.3848 LearningRate 0.0002 Epoch: 23 Global Step: 477350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:51,474-Speed 6316.95 samples/sec Loss 4.4358 LearningRate 0.0002 Epoch: 23 Global Step: 477360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:09:54,722-Speed 6306.34 samples/sec Loss 4.4177 LearningRate 0.0002 Epoch: 23 Global Step: 477370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:09:57,969-Speed 6309.50 samples/sec Loss 4.3956 LearningRate 0.0002 Epoch: 23 Global Step: 477380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:01,215-Speed 6310.97 samples/sec Loss 4.3939 LearningRate 0.0002 Epoch: 23 Global Step: 477390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:04,459-Speed 6314.07 samples/sec Loss 4.4559 LearningRate 0.0002 Epoch: 23 Global Step: 477400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:07,700-Speed 6321.34 samples/sec Loss 4.3935 LearningRate 0.0002 Epoch: 23 Global Step: 477410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:10,931-Speed 6340.01 samples/sec Loss 4.5012 LearningRate 0.0002 Epoch: 23 Global Step: 477420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:14,173-Speed 6317.71 samples/sec Loss 4.4207 LearningRate 0.0002 Epoch: 23 Global Step: 477430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:17,418-Speed 6314.09 samples/sec Loss 4.4803 LearningRate 0.0002 Epoch: 23 Global Step: 477440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:20,659-Speed 6319.58 samples/sec Loss 4.4234 LearningRate 0.0002 Epoch: 23 Global Step: 477450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:23,907-Speed 6306.92 samples/sec Loss 4.4834 LearningRate 0.0002 Epoch: 23 Global Step: 477460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:27,153-Speed 6311.37 samples/sec Loss 4.4892 LearningRate 0.0002 Epoch: 23 Global Step: 477470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:30,400-Speed 6309.39 samples/sec Loss 4.4403 LearningRate 0.0002 Epoch: 23 Global Step: 477480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:33,643-Speed 6315.71 samples/sec Loss 4.4048 LearningRate 0.0002 Epoch: 23 Global Step: 477490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:36,891-Speed 6307.32 
samples/sec Loss 4.4534 LearningRate 0.0002 Epoch: 23 Global Step: 477500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:40,135-Speed 6314.82 samples/sec Loss 4.4754 LearningRate 0.0002 Epoch: 23 Global Step: 477510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:43,363-Speed 6344.22 samples/sec Loss 4.4128 LearningRate 0.0002 Epoch: 23 Global Step: 477520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:46,605-Speed 6319.05 samples/sec Loss 4.3892 LearningRate 0.0002 Epoch: 23 Global Step: 477530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:49,852-Speed 6309.59 samples/sec Loss 4.3551 LearningRate 0.0002 Epoch: 23 Global Step: 477540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:53,098-Speed 6310.79 samples/sec Loss 4.3393 LearningRate 0.0002 Epoch: 23 Global Step: 477550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:56,339-Speed 6320.01 samples/sec Loss 4.4321 LearningRate 0.0002 Epoch: 23 Global Step: 477560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:10:59,582-Speed 6316.03 samples/sec Loss 4.4286 LearningRate 0.0002 Epoch: 23 Global Step: 477570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:02,829-Speed 6309.89 samples/sec Loss 4.4481 LearningRate 0.0002 Epoch: 23 Global Step: 477580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:06,070-Speed 6318.59 samples/sec Loss 4.3893 LearningRate 0.0002 Epoch: 23 Global Step: 477590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:09,317-Speed 6311.65 samples/sec Loss 4.3921 LearningRate 0.0002 Epoch: 23 Global Step: 477600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:12,570-Speed 6295.71 samples/sec Loss 4.4591 LearningRate 0.0002 Epoch: 23 Global Step: 477610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:15,805-Speed 6332.56 samples/sec Loss 4.3950 
LearningRate 0.0002 Epoch: 23 Global Step: 477620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:19,049-Speed 6314.72 samples/sec Loss 4.4680 LearningRate 0.0002 Epoch: 23 Global Step: 477630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:22,293-Speed 6315.60 samples/sec Loss 4.4930 LearningRate 0.0002 Epoch: 23 Global Step: 477640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:25,542-Speed 6303.97 samples/sec Loss 4.3888 LearningRate 0.0002 Epoch: 23 Global Step: 477650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:28,793-Speed 6301.03 samples/sec Loss 4.4422 LearningRate 0.0002 Epoch: 23 Global Step: 477660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:32,039-Speed 6312.38 samples/sec Loss 4.4316 LearningRate 0.0002 Epoch: 23 Global Step: 477670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:35,282-Speed 6315.04 samples/sec Loss 4.4088 LearningRate 0.0002 Epoch: 23 Global Step: 477680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:38,529-Speed 6310.30 samples/sec Loss 4.3974 LearningRate 0.0002 Epoch: 23 Global Step: 477690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:41,769-Speed 6322.47 samples/sec Loss 4.4098 LearningRate 0.0002 Epoch: 23 Global Step: 477700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:45,016-Speed 6308.39 samples/sec Loss 4.4774 LearningRate 0.0002 Epoch: 23 Global Step: 477710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:48,261-Speed 6311.89 samples/sec Loss 4.4384 LearningRate 0.0002 Epoch: 23 Global Step: 477720 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:11:51,505-Speed 6315.60 samples/sec Loss 4.4187 LearningRate 0.0002 Epoch: 23 Global Step: 477730 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:11:54,738-Speed 6335.87 samples/sec Loss 4.4281 LearningRate 0.0002 Epoch: 23 
Global Step: 477740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:11:57,983-Speed 6312.76 samples/sec Loss 4.4406 LearningRate 0.0002 Epoch: 23 Global Step: 477750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:01,229-Speed 6309.80 samples/sec Loss 4.4279 LearningRate 0.0002 Epoch: 23 Global Step: 477760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:04,475-Speed 6312.02 samples/sec Loss 4.4575 LearningRate 0.0002 Epoch: 23 Global Step: 477770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:07,723-Speed 6305.76 samples/sec Loss 4.4662 LearningRate 0.0002 Epoch: 23 Global Step: 477780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:10,972-Speed 6304.84 samples/sec Loss 4.3370 LearningRate 0.0002 Epoch: 23 Global Step: 477790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:14,214-Speed 6318.34 samples/sec Loss 4.4542 LearningRate 0.0002 Epoch: 23 Global Step: 477800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:17,459-Speed 6312.40 samples/sec Loss 4.4043 LearningRate 0.0002 Epoch: 23 Global Step: 477810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:20,702-Speed 6316.97 samples/sec Loss 4.5014 LearningRate 0.0002 Epoch: 23 Global Step: 477820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:23,948-Speed 6310.71 samples/sec Loss 4.4016 LearningRate 0.0002 Epoch: 23 Global Step: 477830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:27,181-Speed 6337.68 samples/sec Loss 4.4130 LearningRate 0.0002 Epoch: 23 Global Step: 477840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:30,425-Speed 6314.09 samples/sec Loss 4.4426 LearningRate 0.0002 Epoch: 23 Global Step: 477850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:33,670-Speed 6313.58 samples/sec Loss 4.5319 LearningRate 0.0002 Epoch: 23 Global Step: 477860 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:36,914-Speed 6314.80 samples/sec Loss 4.4850 LearningRate 0.0002 Epoch: 23 Global Step: 477870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:40,154-Speed 6320.99 samples/sec Loss 4.4404 LearningRate 0.0002 Epoch: 23 Global Step: 477880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:43,399-Speed 6312.68 samples/sec Loss 4.3295 LearningRate 0.0002 Epoch: 23 Global Step: 477890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:46,643-Speed 6315.85 samples/sec Loss 4.4240 LearningRate 0.0002 Epoch: 23 Global Step: 477900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:49,888-Speed 6311.46 samples/sec Loss 4.4252 LearningRate 0.0002 Epoch: 23 Global Step: 477910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:53,131-Speed 6316.39 samples/sec Loss 4.4362 LearningRate 0.0002 Epoch: 23 Global Step: 477920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:56,379-Speed 6307.09 samples/sec Loss 4.4121 LearningRate 0.0002 Epoch: 23 Global Step: 477930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:12:59,609-Speed 6342.86 samples/sec Loss 4.4391 LearningRate 0.0002 Epoch: 23 Global Step: 477940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:02,855-Speed 6309.83 samples/sec Loss 4.4067 LearningRate 0.0002 Epoch: 23 Global Step: 477950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:06,104-Speed 6304.42 samples/sec Loss 4.4461 LearningRate 0.0002 Epoch: 23 Global Step: 477960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:09,344-Speed 6322.04 samples/sec Loss 4.4289 LearningRate 0.0002 Epoch: 23 Global Step: 477970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:12,588-Speed 6315.30 samples/sec Loss 4.4900 LearningRate 0.0002 Epoch: 23 Global Step: 477980 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:13:15,836-Speed 6306.75 samples/sec Loss 4.4210 LearningRate 0.0002 Epoch: 23 Global Step: 477990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:19,079-Speed 6317.63 samples/sec Loss 4.4604 LearningRate 0.0002 Epoch: 23 Global Step: 478000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:22,321-Speed 6317.06 samples/sec Loss 4.4476 LearningRate 0.0002 Epoch: 23 Global Step: 478010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:25,564-Speed 6317.64 samples/sec Loss 4.4504 LearningRate 0.0002 Epoch: 23 Global Step: 478020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:28,811-Speed 6308.67 samples/sec Loss 4.4870 LearningRate 0.0002 Epoch: 23 Global Step: 478030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:32,058-Speed 6309.18 samples/sec Loss 4.3938 LearningRate 0.0002 Epoch: 23 Global Step: 478040 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:13:35,304-Speed 6310.90 samples/sec Loss 4.3927 LearningRate 0.0002 Epoch: 23 Global Step: 478050 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:13:38,533-Speed 6343.46 samples/sec Loss 4.4102 LearningRate 0.0002 Epoch: 23 Global Step: 478060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:41,781-Speed 6308.11 samples/sec Loss 4.4105 LearningRate 0.0002 Epoch: 23 Global Step: 478070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:45,023-Speed 6317.45 samples/sec Loss 4.4346 LearningRate 0.0002 Epoch: 23 Global Step: 478080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:48,266-Speed 6316.39 samples/sec Loss 4.4449 LearningRate 0.0002 Epoch: 23 Global Step: 478090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:51,511-Speed 6313.56 samples/sec Loss 4.4056 LearningRate 0.0002 Epoch: 23 Global Step: 478100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:13:54,754-Speed 6317.12 samples/sec Loss 4.3923 LearningRate 0.0002 Epoch: 23 Global Step: 478110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:13:57,998-Speed 6314.06 samples/sec Loss 4.4135 LearningRate 0.0002 Epoch: 23 Global Step: 478120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:01,241-Speed 6315.35 samples/sec Loss 4.4476 LearningRate 0.0002 Epoch: 23 Global Step: 478130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:04,495-Speed 6305.01 samples/sec Loss 4.4007 LearningRate 0.0002 Epoch: 23 Global Step: 478140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:07,745-Speed 6303.99 samples/sec Loss 4.4507 LearningRate 0.0002 Epoch: 23 Global Step: 478150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:11,007-Speed 6278.10 samples/sec Loss 4.4609 LearningRate 0.0002 Epoch: 23 Global Step: 478160 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:14:14,285-Speed 6250.05 samples/sec Loss 4.4301 LearningRate 0.0002 Epoch: 23 Global Step: 478170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:17,533-Speed 6307.20 samples/sec Loss 4.3797 LearningRate 0.0002 Epoch: 23 Global Step: 478180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:20,779-Speed 6309.77 samples/sec Loss 4.4598 LearningRate 0.0002 Epoch: 23 Global Step: 478190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:24,029-Speed 6303.52 samples/sec Loss 4.4368 LearningRate 0.0002 Epoch: 23 Global Step: 478200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:27,276-Speed 6309.43 samples/sec Loss 4.4270 LearningRate 0.0002 Epoch: 23 Global Step: 478210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:30,525-Speed 6303.71 samples/sec Loss 4.4440 LearningRate 0.0002 Epoch: 23 Global Step: 478220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:33,769-Speed 6315.17 
samples/sec Loss 4.4266 LearningRate 0.0002 Epoch: 23 Global Step: 478230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:37,011-Speed 6318.68 samples/sec Loss 4.4627 LearningRate 0.0002 Epoch: 23 Global Step: 478240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:40,256-Speed 6312.46 samples/sec Loss 4.4823 LearningRate 0.0002 Epoch: 23 Global Step: 478250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:43,503-Speed 6308.56 samples/sec Loss 4.3767 LearningRate 0.0002 Epoch: 23 Global Step: 478260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:46,748-Speed 6314.14 samples/sec Loss 4.4648 LearningRate 0.0002 Epoch: 23 Global Step: 478270 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:14:49,978-Speed 6341.58 samples/sec Loss 4.4226 LearningRate 0.0002 Epoch: 23 Global Step: 478280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:53,223-Speed 6312.54 samples/sec Loss 4.4582 LearningRate 0.0002 Epoch: 23 Global Step: 478290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:56,466-Speed 6316.73 samples/sec Loss 4.3758 LearningRate 0.0002 Epoch: 23 Global Step: 478300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:14:59,714-Speed 6307.62 samples/sec Loss 4.4264 LearningRate 0.0002 Epoch: 23 Global Step: 478310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:02,998-Speed 6236.62 samples/sec Loss 4.4552 LearningRate 0.0002 Epoch: 23 Global Step: 478320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:06,243-Speed 6312.64 samples/sec Loss 4.4237 LearningRate 0.0002 Epoch: 23 Global Step: 478330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:09,490-Speed 6310.02 samples/sec Loss 4.4052 LearningRate 0.0002 Epoch: 23 Global Step: 478340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:12,732-Speed 6316.79 samples/sec Loss 4.4086 
LearningRate 0.0002 Epoch: 23 Global Step: 478350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:15,978-Speed 6312.08 samples/sec Loss 4.4122 LearningRate 0.0002 Epoch: 23 Global Step: 478360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:19,225-Speed 6309.23 samples/sec Loss 4.4112 LearningRate 0.0002 Epoch: 23 Global Step: 478370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:22,461-Speed 6330.01 samples/sec Loss 4.4049 LearningRate 0.0002 Epoch: 23 Global Step: 478380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:25,706-Speed 6311.34 samples/sec Loss 4.4293 LearningRate 0.0002 Epoch: 23 Global Step: 478390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:28,963-Speed 6290.01 samples/sec Loss 4.3581 LearningRate 0.0002 Epoch: 23 Global Step: 478400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:32,210-Speed 6308.50 samples/sec Loss 4.4304 LearningRate 0.0002 Epoch: 23 Global Step: 478410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:35,455-Speed 6316.10 samples/sec Loss 4.4163 LearningRate 0.0002 Epoch: 23 Global Step: 478420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:38,698-Speed 6315.66 samples/sec Loss 4.4327 LearningRate 0.0002 Epoch: 23 Global Step: 478430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:41,943-Speed 6313.40 samples/sec Loss 4.3880 LearningRate 0.0002 Epoch: 23 Global Step: 478440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:45,188-Speed 6312.57 samples/sec Loss 4.4544 LearningRate 0.0002 Epoch: 23 Global Step: 478450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:48,435-Speed 6308.90 samples/sec Loss 4.4726 LearningRate 0.0002 Epoch: 23 Global Step: 478460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:51,678-Speed 6317.67 samples/sec Loss 4.3910 LearningRate 0.0002 Epoch: 23 
Global Step: 478470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:54,909-Speed 6338.73 samples/sec Loss 4.4282 LearningRate 0.0002 Epoch: 23 Global Step: 478480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:15:58,151-Speed 6318.29 samples/sec Loss 4.4454 LearningRate 0.0002 Epoch: 23 Global Step: 478490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:01,397-Speed 6310.66 samples/sec Loss 4.4458 LearningRate 0.0002 Epoch: 23 Global Step: 478500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:04,644-Speed 6309.46 samples/sec Loss 4.4526 LearningRate 0.0002 Epoch: 23 Global Step: 478510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:07,893-Speed 6305.43 samples/sec Loss 4.4427 LearningRate 0.0002 Epoch: 23 Global Step: 478520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:11,138-Speed 6312.38 samples/sec Loss 4.4898 LearningRate 0.0002 Epoch: 23 Global Step: 478530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:14,382-Speed 6315.38 samples/sec Loss 4.4630 LearningRate 0.0002 Epoch: 23 Global Step: 478540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:17,630-Speed 6306.35 samples/sec Loss 4.3820 LearningRate 0.0002 Epoch: 23 Global Step: 478550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:20,875-Speed 6311.92 samples/sec Loss 4.4193 LearningRate 0.0002 Epoch: 23 Global Step: 478560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:24,123-Speed 6307.53 samples/sec Loss 4.4262 LearningRate 0.0002 Epoch: 23 Global Step: 478570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:27,439-Speed 6177.03 samples/sec Loss 4.4322 LearningRate 0.0002 Epoch: 23 Global Step: 478580 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:16:30,681-Speed 6318.48 samples/sec Loss 4.4241 LearningRate 0.0002 Epoch: 23 Global Step: 478590 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:33,936-Speed 6293.49 samples/sec Loss 4.3644 LearningRate 0.0002 Epoch: 23 Global Step: 478600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:37,176-Speed 6321.86 samples/sec Loss 4.3619 LearningRate 0.0002 Epoch: 23 Global Step: 478610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:40,421-Speed 6312.49 samples/sec Loss 4.3865 LearningRate 0.0002 Epoch: 23 Global Step: 478620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:43,668-Speed 6308.39 samples/sec Loss 4.4435 LearningRate 0.0002 Epoch: 23 Global Step: 478630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:46,925-Speed 6290.44 samples/sec Loss 4.3900 LearningRate 0.0002 Epoch: 23 Global Step: 478640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:50,185-Speed 6283.08 samples/sec Loss 4.3987 LearningRate 0.0002 Epoch: 23 Global Step: 478650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:53,430-Speed 6313.93 samples/sec Loss 4.3936 LearningRate 0.0002 Epoch: 23 Global Step: 478660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:56,677-Speed 6308.47 samples/sec Loss 4.4490 LearningRate 0.0002 Epoch: 23 Global Step: 478670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:16:59,924-Speed 6309.55 samples/sec Loss 4.4273 LearningRate 0.0002 Epoch: 23 Global Step: 478680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:03,151-Speed 6346.45 samples/sec Loss 4.4554 LearningRate 0.0002 Epoch: 23 Global Step: 478690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:06,395-Speed 6314.90 samples/sec Loss 4.3713 LearningRate 0.0002 Epoch: 23 Global Step: 478700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:09,638-Speed 6316.12 samples/sec Loss 4.3932 LearningRate 0.0002 Epoch: 23 Global Step: 478710 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:17:12,881-Speed 6317.31 samples/sec Loss 4.4103 LearningRate 0.0002 Epoch: 23 Global Step: 478720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:16,126-Speed 6313.43 samples/sec Loss 4.3861 LearningRate 0.0002 Epoch: 23 Global Step: 478730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:19,374-Speed 6306.41 samples/sec Loss 4.4561 LearningRate 0.0002 Epoch: 23 Global Step: 478740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:22,623-Speed 6304.80 samples/sec Loss 4.4360 LearningRate 0.0002 Epoch: 23 Global Step: 478750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:25,869-Speed 6310.17 samples/sec Loss 4.3666 LearningRate 0.0002 Epoch: 23 Global Step: 478760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:29,178-Speed 6191.03 samples/sec Loss 4.4426 LearningRate 0.0002 Epoch: 23 Global Step: 478770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:32,440-Speed 6278.47 samples/sec Loss 4.4951 LearningRate 0.0002 Epoch: 23 Global Step: 478780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:35,675-Speed 6332.75 samples/sec Loss 4.4267 LearningRate 0.0002 Epoch: 23 Global Step: 478790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:38,922-Speed 6309.42 samples/sec Loss 4.4114 LearningRate 0.0002 Epoch: 23 Global Step: 478800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:42,167-Speed 6312.17 samples/sec Loss 4.4740 LearningRate 0.0002 Epoch: 23 Global Step: 478810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:45,410-Speed 6316.97 samples/sec Loss 4.4840 LearningRate 0.0002 Epoch: 23 Global Step: 478820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:48,654-Speed 6314.33 samples/sec Loss 4.4460 LearningRate 0.0002 Epoch: 23 Global Step: 478830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:17:51,898-Speed 6314.82 samples/sec Loss 4.3870 LearningRate 0.0002 Epoch: 23 Global Step: 478840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:55,144-Speed 6309.76 samples/sec Loss 4.4889 LearningRate 0.0002 Epoch: 23 Global Step: 478850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:17:58,390-Speed 6312.89 samples/sec Loss 4.4522 LearningRate 0.0002 Epoch: 23 Global Step: 478860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:01,641-Speed 6300.88 samples/sec Loss 4.3802 LearningRate 0.0002 Epoch: 23 Global Step: 478870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:04,888-Speed 6308.13 samples/sec Loss 4.4567 LearningRate 0.0002 Epoch: 23 Global Step: 478880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:08,119-Speed 6339.64 samples/sec Loss 4.4064 LearningRate 0.0002 Epoch: 23 Global Step: 478890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:11,363-Speed 6316.03 samples/sec Loss 4.4366 LearningRate 0.0002 Epoch: 23 Global Step: 478900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:14,612-Speed 6304.70 samples/sec Loss 4.4251 LearningRate 0.0002 Epoch: 23 Global Step: 478910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:17,857-Speed 6311.53 samples/sec Loss 4.3493 LearningRate 0.0002 Epoch: 23 Global Step: 478920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:21,100-Speed 6317.50 samples/sec Loss 4.4374 LearningRate 0.0002 Epoch: 23 Global Step: 478930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:24,346-Speed 6310.03 samples/sec Loss 4.3536 LearningRate 0.0002 Epoch: 23 Global Step: 478940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:27,592-Speed 6310.03 samples/sec Loss 4.4318 LearningRate 0.0002 Epoch: 23 Global Step: 478950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:30,839-Speed 6309.77 
samples/sec Loss 4.3765 LearningRate 0.0002 Epoch: 23 Global Step: 478960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:34,084-Speed 6311.98 samples/sec Loss 4.4696 LearningRate 0.0002 Epoch: 23 Global Step: 478970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:37,328-Speed 6315.50 samples/sec Loss 4.4013 LearningRate 0.0002 Epoch: 23 Global Step: 478980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:40,561-Speed 6335.54 samples/sec Loss 4.4301 LearningRate 0.0002 Epoch: 23 Global Step: 478990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:43,805-Speed 6313.77 samples/sec Loss 4.3898 LearningRate 0.0002 Epoch: 23 Global Step: 479000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:47,053-Speed 6312.16 samples/sec Loss 4.4133 LearningRate 0.0002 Epoch: 23 Global Step: 479010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:50,297-Speed 6313.59 samples/sec Loss 4.4245 LearningRate 0.0002 Epoch: 23 Global Step: 479020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:53,541-Speed 6315.98 samples/sec Loss 4.4613 LearningRate 0.0002 Epoch: 23 Global Step: 479030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:18:56,790-Speed 6304.93 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 479040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:00,032-Speed 6316.91 samples/sec Loss 4.3941 LearningRate 0.0002 Epoch: 23 Global Step: 479050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:03,281-Speed 6306.40 samples/sec Loss 4.4199 LearningRate 0.0002 Epoch: 23 Global Step: 479060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:06,527-Speed 6311.20 samples/sec Loss 4.4099 LearningRate 0.0002 Epoch: 23 Global Step: 479070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:09,771-Speed 6314.84 samples/sec Loss 4.4324 
LearningRate 0.0002 Epoch: 23 Global Step: 479080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:13,015-Speed 6313.47 samples/sec Loss 4.4533 LearningRate 0.0002 Epoch: 23 Global Step: 479090 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:19:16,247-Speed 6338.91 samples/sec Loss 4.4361 LearningRate 0.0002 Epoch: 23 Global Step: 479100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:19,494-Speed 6308.69 samples/sec Loss 4.3893 LearningRate 0.0002 Epoch: 23 Global Step: 479110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:22,740-Speed 6310.81 samples/sec Loss 4.4373 LearningRate 0.0002 Epoch: 23 Global Step: 479120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:25,987-Speed 6309.02 samples/sec Loss 4.4323 LearningRate 0.0002 Epoch: 23 Global Step: 479130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:29,232-Speed 6311.95 samples/sec Loss 4.4419 LearningRate 0.0002 Epoch: 23 Global Step: 479140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:32,482-Speed 6304.05 samples/sec Loss 4.4037 LearningRate 0.0002 Epoch: 23 Global Step: 479150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:35,727-Speed 6311.98 samples/sec Loss 4.4081 LearningRate 0.0002 Epoch: 23 Global Step: 479160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:38,972-Speed 6313.13 samples/sec Loss 4.4297 LearningRate 0.0002 Epoch: 23 Global Step: 479170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:42,217-Speed 6311.95 samples/sec Loss 4.4165 LearningRate 0.0002 Epoch: 23 Global Step: 479180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:45,460-Speed 6316.09 samples/sec Loss 4.3633 LearningRate 0.0002 Epoch: 23 Global Step: 479190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:48,691-Speed 6341.67 samples/sec Loss 4.3742 LearningRate 0.0002 Epoch: 23 
Global Step: 479200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:51,936-Speed 6312.07 samples/sec Loss 4.3841 LearningRate 0.0002 Epoch: 23 Global Step: 479210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:55,182-Speed 6310.42 samples/sec Loss 4.3417 LearningRate 0.0002 Epoch: 23 Global Step: 479220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:19:58,427-Speed 6314.21 samples/sec Loss 4.3733 LearningRate 0.0002 Epoch: 23 Global Step: 479230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:01,671-Speed 6313.81 samples/sec Loss 4.4326 LearningRate 0.0002 Epoch: 23 Global Step: 479240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:04,925-Speed 6295.34 samples/sec Loss 4.4781 LearningRate 0.0002 Epoch: 23 Global Step: 479250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:08,172-Speed 6308.61 samples/sec Loss 4.4023 LearningRate 0.0002 Epoch: 23 Global Step: 479260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:11,418-Speed 6310.93 samples/sec Loss 4.3938 LearningRate 0.0002 Epoch: 23 Global Step: 479270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:14,663-Speed 6312.54 samples/sec Loss 4.4140 LearningRate 0.0002 Epoch: 23 Global Step: 479280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:17,908-Speed 6313.58 samples/sec Loss 4.3834 LearningRate 0.0002 Epoch: 23 Global Step: 479290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:21,140-Speed 6337.86 samples/sec Loss 4.4207 LearningRate 0.0002 Epoch: 23 Global Step: 479300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:24,387-Speed 6309.06 samples/sec Loss 4.4566 LearningRate 0.0002 Epoch: 23 Global Step: 479310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:27,645-Speed 6288.07 samples/sec Loss 4.3992 LearningRate 0.0002 Epoch: 23 Global Step: 479320 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:30,890-Speed 6312.10 samples/sec Loss 4.4225 LearningRate 0.0002 Epoch: 23 Global Step: 479330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:34,138-Speed 6306.76 samples/sec Loss 4.3748 LearningRate 0.0002 Epoch: 23 Global Step: 479340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:37,382-Speed 6314.47 samples/sec Loss 4.4132 LearningRate 0.0002 Epoch: 23 Global Step: 479350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:40,626-Speed 6314.08 samples/sec Loss 4.4216 LearningRate 0.0002 Epoch: 23 Global Step: 479360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:43,868-Speed 6317.92 samples/sec Loss 4.4331 LearningRate 0.0002 Epoch: 23 Global Step: 479370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:47,111-Speed 6316.91 samples/sec Loss 4.4806 LearningRate 0.0002 Epoch: 23 Global Step: 479380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:50,354-Speed 6317.06 samples/sec Loss 4.4249 LearningRate 0.0002 Epoch: 23 Global Step: 479390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:53,582-Speed 6344.90 samples/sec Loss 4.4509 LearningRate 0.0002 Epoch: 23 Global Step: 479400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:20:56,831-Speed 6305.91 samples/sec Loss 4.4486 LearningRate 0.0002 Epoch: 23 Global Step: 479410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:00,096-Speed 6273.41 samples/sec Loss 4.3839 LearningRate 0.0002 Epoch: 23 Global Step: 479420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:03,340-Speed 6315.24 samples/sec Loss 4.4198 LearningRate 0.0002 Epoch: 23 Global Step: 479430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:06,584-Speed 6313.76 samples/sec Loss 4.4022 LearningRate 0.0002 Epoch: 23 Global Step: 479440 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:21:09,832-Speed 6307.77 samples/sec Loss 4.4625 LearningRate 0.0002 Epoch: 23 Global Step: 479450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:13,081-Speed 6304.91 samples/sec Loss 4.4425 LearningRate 0.0002 Epoch: 23 Global Step: 479460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:16,330-Speed 6303.52 samples/sec Loss 4.4053 LearningRate 0.0002 Epoch: 23 Global Step: 479470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:19,578-Speed 6309.12 samples/sec Loss 4.4511 LearningRate 0.0002 Epoch: 23 Global Step: 479480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:22,824-Speed 6310.88 samples/sec Loss 4.3496 LearningRate 0.0002 Epoch: 23 Global Step: 479490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:26,050-Speed 6348.76 samples/sec Loss 4.4654 LearningRate 0.0002 Epoch: 23 Global Step: 479500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:29,298-Speed 6308.20 samples/sec Loss 4.4039 LearningRate 0.0002 Epoch: 23 Global Step: 479510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:32,539-Speed 6319.05 samples/sec Loss 4.4136 LearningRate 0.0002 Epoch: 23 Global Step: 479520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:35,780-Speed 6320.53 samples/sec Loss 4.3768 LearningRate 0.0002 Epoch: 23 Global Step: 479530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:39,031-Speed 6302.15 samples/sec Loss 4.4070 LearningRate 0.0002 Epoch: 23 Global Step: 479540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:42,281-Speed 6302.31 samples/sec Loss 4.4257 LearningRate 0.0002 Epoch: 23 Global Step: 479550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:45,525-Speed 6314.99 samples/sec Loss 4.3928 LearningRate 0.0002 Epoch: 23 Global Step: 479560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:21:48,770-Speed 6311.80 samples/sec Loss 4.4823 LearningRate 0.0002 Epoch: 23 Global Step: 479570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:52,015-Speed 6312.50 samples/sec Loss 4.3368 LearningRate 0.0002 Epoch: 23 Global Step: 479580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:55,265-Speed 6302.32 samples/sec Loss 4.3779 LearningRate 0.0002 Epoch: 23 Global Step: 479590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:21:58,510-Speed 6314.04 samples/sec Loss 4.4049 LearningRate 0.0002 Epoch: 23 Global Step: 479600 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:22:01,746-Speed 6328.97 samples/sec Loss 4.4157 LearningRate 0.0002 Epoch: 23 Global Step: 479610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:04,990-Speed 6315.23 samples/sec Loss 4.4056 LearningRate 0.0002 Epoch: 23 Global Step: 479620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:08,245-Speed 6292.96 samples/sec Loss 4.4838 LearningRate 0.0002 Epoch: 23 Global Step: 479630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:11,490-Speed 6315.23 samples/sec Loss 4.3781 LearningRate 0.0002 Epoch: 23 Global Step: 479640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:14,736-Speed 6311.33 samples/sec Loss 4.3669 LearningRate 0.0002 Epoch: 23 Global Step: 479650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:17,982-Speed 6311.01 samples/sec Loss 4.3369 LearningRate 0.0002 Epoch: 23 Global Step: 479660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:21,233-Speed 6299.97 samples/sec Loss 4.4147 LearningRate 0.0002 Epoch: 23 Global Step: 479670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:24,488-Speed 6293.43 samples/sec Loss 4.4435 LearningRate 0.0002 Epoch: 23 Global Step: 479680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:27,740-Speed 6301.03 
samples/sec Loss 4.4006 LearningRate 0.0002 Epoch: 23 Global Step: 479690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:30,984-Speed 6314.67 samples/sec Loss 4.4953 LearningRate 0.0002 Epoch: 23 Global Step: 479700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:34,219-Speed 6331.43 samples/sec Loss 4.3623 LearningRate 0.0002 Epoch: 23 Global Step: 479710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:37,465-Speed 6310.71 samples/sec Loss 4.3667 LearningRate 0.0002 Epoch: 23 Global Step: 479720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:40,712-Speed 6309.09 samples/sec Loss 4.4569 LearningRate 0.0002 Epoch: 23 Global Step: 479730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:43,957-Speed 6311.56 samples/sec Loss 4.3897 LearningRate 0.0002 Epoch: 23 Global Step: 479740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:47,202-Speed 6313.21 samples/sec Loss 4.4440 LearningRate 0.0002 Epoch: 23 Global Step: 479750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:50,446-Speed 6314.52 samples/sec Loss 4.4561 LearningRate 0.0002 Epoch: 23 Global Step: 479760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:53,691-Speed 6313.27 samples/sec Loss 4.4134 LearningRate 0.0002 Epoch: 23 Global Step: 479770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:22:56,939-Speed 6307.55 samples/sec Loss 4.3981 LearningRate 0.0002 Epoch: 23 Global Step: 479780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:00,183-Speed 6313.17 samples/sec Loss 4.3737 LearningRate 0.0002 Epoch: 23 Global Step: 479790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:03,432-Speed 6305.05 samples/sec Loss 4.4056 LearningRate 0.0002 Epoch: 23 Global Step: 479800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:06,676-Speed 6315.12 samples/sec Loss 4.4080 
LearningRate 0.0002 Epoch: 23 Global Step: 479810 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:23:09,905-Speed 6343.99 samples/sec Loss 4.4641 LearningRate 0.0002 Epoch: 23 Global Step: 479820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:13,150-Speed 6312.72 samples/sec Loss 4.4013 LearningRate 0.0002 Epoch: 23 Global Step: 479830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:16,396-Speed 6310.22 samples/sec Loss 4.3828 LearningRate 0.0002 Epoch: 23 Global Step: 479840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:19,641-Speed 6313.87 samples/sec Loss 4.3841 LearningRate 0.0002 Epoch: 23 Global Step: 479850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:22,884-Speed 6314.70 samples/sec Loss 4.4261 LearningRate 0.0002 Epoch: 23 Global Step: 479860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:26,131-Speed 6310.57 samples/sec Loss 4.4219 LearningRate 0.0002 Epoch: 23 Global Step: 479870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:29,375-Speed 6312.99 samples/sec Loss 4.3748 LearningRate 0.0002 Epoch: 23 Global Step: 479880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:32,619-Speed 6315.78 samples/sec Loss 4.5101 LearningRate 0.0002 Epoch: 23 Global Step: 479890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:35,870-Speed 6301.13 samples/sec Loss 4.4465 LearningRate 0.0002 Epoch: 23 Global Step: 479900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:39,112-Speed 6318.69 samples/sec Loss 4.4782 LearningRate 0.0002 Epoch: 23 Global Step: 479910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:42,345-Speed 6335.70 samples/sec Loss 4.4105 LearningRate 0.0002 Epoch: 23 Global Step: 479920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:45,592-Speed 6309.36 samples/sec Loss 4.4190 LearningRate 0.0002 Epoch: 23 
Global Step: 479930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:48,836-Speed 6314.93 samples/sec Loss 4.4004 LearningRate 0.0002 Epoch: 23 Global Step: 479940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:52,090-Speed 6295.41 samples/sec Loss 4.3680 LearningRate 0.0002 Epoch: 23 Global Step: 479950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:55,333-Speed 6316.05 samples/sec Loss 4.3623 LearningRate 0.0002 Epoch: 23 Global Step: 479960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:23:58,579-Speed 6310.58 samples/sec Loss 4.4441 LearningRate 0.0002 Epoch: 23 Global Step: 479970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:01,824-Speed 6311.79 samples/sec Loss 4.4618 LearningRate 0.0002 Epoch: 23 Global Step: 479980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:05,071-Speed 6309.23 samples/sec Loss 4.4195 LearningRate 0.0002 Epoch: 23 Global Step: 479990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:08,318-Speed 6309.53 samples/sec Loss 4.4094 LearningRate 0.0002 Epoch: 23 Global Step: 480000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:11,564-Speed 6309.52 samples/sec Loss 4.3451 LearningRate 0.0002 Epoch: 23 Global Step: 480010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:14,796-Speed 6338.06 samples/sec Loss 4.3742 LearningRate 0.0002 Epoch: 23 Global Step: 480020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:18,039-Speed 6316.94 samples/sec Loss 4.4042 LearningRate 0.0002 Epoch: 23 Global Step: 480030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:21,295-Speed 6291.78 samples/sec Loss 4.3950 LearningRate 0.0002 Epoch: 23 Global Step: 480040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:24,538-Speed 6316.06 samples/sec Loss 4.4323 LearningRate 0.0002 Epoch: 23 Global Step: 480050 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:27,783-Speed 6313.78 samples/sec Loss 4.3997 LearningRate 0.0002 Epoch: 23 Global Step: 480060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:31,034-Speed 6300.45 samples/sec Loss 4.4374 LearningRate 0.0002 Epoch: 23 Global Step: 480070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:34,283-Speed 6304.16 samples/sec Loss 4.4635 LearningRate 0.0002 Epoch: 23 Global Step: 480080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:37,527-Speed 6314.95 samples/sec Loss 4.4494 LearningRate 0.0002 Epoch: 23 Global Step: 480090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:40,775-Speed 6306.65 samples/sec Loss 4.4075 LearningRate 0.0002 Epoch: 23 Global Step: 480100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:44,022-Speed 6310.32 samples/sec Loss 4.4176 LearningRate 0.0002 Epoch: 23 Global Step: 480110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:47,253-Speed 6340.30 samples/sec Loss 4.4390 LearningRate 0.0002 Epoch: 23 Global Step: 480120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:50,497-Speed 6313.89 samples/sec Loss 4.3591 LearningRate 0.0002 Epoch: 23 Global Step: 480130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:53,750-Speed 6296.87 samples/sec Loss 4.4164 LearningRate 0.0002 Epoch: 23 Global Step: 480140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:24:56,996-Speed 6310.81 samples/sec Loss 4.4200 LearningRate 0.0002 Epoch: 23 Global Step: 480150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:00,242-Speed 6311.30 samples/sec Loss 4.4188 LearningRate 0.0002 Epoch: 23 Global Step: 480160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:03,489-Speed 6308.81 samples/sec Loss 4.5141 LearningRate 0.0002 Epoch: 23 Global Step: 480170 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:25:06,732-Speed 6315.50 samples/sec Loss 4.4882 LearningRate 0.0002 Epoch: 23 Global Step: 480180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:09,980-Speed 6306.94 samples/sec Loss 4.3827 LearningRate 0.0002 Epoch: 23 Global Step: 480190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:13,224-Speed 6314.97 samples/sec Loss 4.4671 LearningRate 0.0002 Epoch: 23 Global Step: 480200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:16,470-Speed 6311.77 samples/sec Loss 4.4646 LearningRate 0.0002 Epoch: 23 Global Step: 480210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:19,703-Speed 6336.10 samples/sec Loss 4.4159 LearningRate 0.0002 Epoch: 23 Global Step: 480220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:22,950-Speed 6309.86 samples/sec Loss 4.3648 LearningRate 0.0002 Epoch: 23 Global Step: 480230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:26,194-Speed 6313.55 samples/sec Loss 4.3609 LearningRate 0.0002 Epoch: 23 Global Step: 480240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:29,439-Speed 6313.63 samples/sec Loss 4.4755 LearningRate 0.0002 Epoch: 23 Global Step: 480250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:32,681-Speed 6317.14 samples/sec Loss 4.3540 LearningRate 0.0002 Epoch: 23 Global Step: 480260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:35,926-Speed 6313.29 samples/sec Loss 4.4553 LearningRate 0.0002 Epoch: 23 Global Step: 480270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:39,169-Speed 6317.41 samples/sec Loss 4.3756 LearningRate 0.0002 Epoch: 23 Global Step: 480280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:42,414-Speed 6312.20 samples/sec Loss 4.4035 LearningRate 0.0002 Epoch: 23 Global Step: 480290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:25:45,658-Speed 6314.33 samples/sec Loss 4.3661 LearningRate 0.0002 Epoch: 23 Global Step: 480300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:48,911-Speed 6296.21 samples/sec Loss 4.3706 LearningRate 0.0002 Epoch: 23 Global Step: 480310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:52,158-Speed 6309.69 samples/sec Loss 4.4180 LearningRate 0.0002 Epoch: 23 Global Step: 480320 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:25:55,395-Speed 6329.26 samples/sec Loss 4.4217 LearningRate 0.0002 Epoch: 23 Global Step: 480330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:25:58,664-Speed 6265.86 samples/sec Loss 4.4050 LearningRate 0.0002 Epoch: 23 Global Step: 480340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:01,909-Speed 6313.55 samples/sec Loss 4.4272 LearningRate 0.0002 Epoch: 23 Global Step: 480350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:05,187-Speed 6248.27 samples/sec Loss 4.3807 LearningRate 0.0002 Epoch: 23 Global Step: 480360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:08,431-Speed 6315.34 samples/sec Loss 4.3931 LearningRate 0.0002 Epoch: 23 Global Step: 480370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:11,678-Speed 6308.81 samples/sec Loss 4.4431 LearningRate 0.0002 Epoch: 23 Global Step: 480380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:14,927-Speed 6305.34 samples/sec Loss 4.4432 LearningRate 0.0002 Epoch: 23 Global Step: 480390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:18,168-Speed 6319.09 samples/sec Loss 4.4016 LearningRate 0.0002 Epoch: 23 Global Step: 480400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:21,413-Speed 6313.40 samples/sec Loss 4.4600 LearningRate 0.0002 Epoch: 23 Global Step: 480410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:24,659-Speed 6309.67 
samples/sec Loss 4.3582 LearningRate 0.0002 Epoch: 23 Global Step: 480420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:27,894-Speed 6332.10 samples/sec Loss 4.4055 LearningRate 0.0002 Epoch: 23 Global Step: 480430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:31,140-Speed 6310.56 samples/sec Loss 4.3720 LearningRate 0.0002 Epoch: 23 Global Step: 480440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:34,388-Speed 6308.48 samples/sec Loss 4.4197 LearningRate 0.0002 Epoch: 23 Global Step: 480450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:37,632-Speed 6312.91 samples/sec Loss 4.3603 LearningRate 0.0002 Epoch: 23 Global Step: 480460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:40,880-Speed 6307.90 samples/sec Loss 4.4726 LearningRate 0.0002 Epoch: 23 Global Step: 480470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:44,126-Speed 6309.99 samples/sec Loss 4.3962 LearningRate 0.0002 Epoch: 23 Global Step: 480480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:47,376-Speed 6303.46 samples/sec Loss 4.3682 LearningRate 0.0002 Epoch: 23 Global Step: 480490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:50,623-Speed 6308.57 samples/sec Loss 4.4121 LearningRate 0.0002 Epoch: 23 Global Step: 480500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:53,871-Speed 6306.01 samples/sec Loss 4.4391 LearningRate 0.0002 Epoch: 23 Global Step: 480510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:26:57,115-Speed 6315.24 samples/sec Loss 4.3297 LearningRate 0.0002 Epoch: 23 Global Step: 480520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:00,354-Speed 6325.47 samples/sec Loss 4.3826 LearningRate 0.0002 Epoch: 23 Global Step: 480530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:03,603-Speed 6305.78 samples/sec Loss 4.3812 
LearningRate 0.0002 Epoch: 23 Global Step: 480540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:06,851-Speed 6305.55 samples/sec Loss 4.4006 LearningRate 0.0002 Epoch: 23 Global Step: 480550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:10,096-Speed 6313.23 samples/sec Loss 4.3336 LearningRate 0.0002 Epoch: 23 Global Step: 480560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:13,339-Speed 6316.96 samples/sec Loss 4.3963 LearningRate 0.0002 Epoch: 23 Global Step: 480570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:16,587-Speed 6305.87 samples/sec Loss 4.4064 LearningRate 0.0002 Epoch: 23 Global Step: 480580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:19,835-Speed 6308.19 samples/sec Loss 4.3646 LearningRate 0.0002 Epoch: 23 Global Step: 480590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:23,091-Speed 6291.13 samples/sec Loss 4.4217 LearningRate 0.0002 Epoch: 23 Global Step: 480600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:26,337-Speed 6310.09 samples/sec Loss 4.3446 LearningRate 0.0002 Epoch: 23 Global Step: 480610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:29,587-Speed 6303.75 samples/sec Loss 4.3816 LearningRate 0.0002 Epoch: 23 Global Step: 480620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:32,833-Speed 6311.06 samples/sec Loss 4.3928 LearningRate 0.0002 Epoch: 23 Global Step: 480630 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:27:36,063-Speed 6341.66 samples/sec Loss 4.4035 LearningRate 0.0002 Epoch: 23 Global Step: 480640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:39,311-Speed 6305.06 samples/sec Loss 4.4444 LearningRate 0.0002 Epoch: 23 Global Step: 480650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:42,555-Speed 6314.62 samples/sec Loss 4.4320 LearningRate 0.0002 Epoch: 23 
Global Step: 480660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:45,803-Speed 6308.31 samples/sec Loss 4.3827 LearningRate 0.0002 Epoch: 23 Global Step: 480670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:49,049-Speed 6310.18 samples/sec Loss 4.3765 LearningRate 0.0002 Epoch: 23 Global Step: 480680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:52,292-Speed 6315.64 samples/sec Loss 4.4161 LearningRate 0.0002 Epoch: 23 Global Step: 480690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:55,539-Speed 6308.88 samples/sec Loss 4.4432 LearningRate 0.0002 Epoch: 23 Global Step: 480700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:27:58,784-Speed 6312.55 samples/sec Loss 4.4328 LearningRate 0.0002 Epoch: 23 Global Step: 480710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:02,031-Speed 6309.81 samples/sec Loss 4.3761 LearningRate 0.0002 Epoch: 23 Global Step: 480720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:05,287-Speed 6290.91 samples/sec Loss 4.3887 LearningRate 0.0002 Epoch: 23 Global Step: 480730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:08,519-Speed 6339.72 samples/sec Loss 4.4319 LearningRate 0.0002 Epoch: 23 Global Step: 480740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:11,768-Speed 6303.72 samples/sec Loss 4.4488 LearningRate 0.0002 Epoch: 23 Global Step: 480750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:15,012-Speed 6314.50 samples/sec Loss 4.4371 LearningRate 0.0002 Epoch: 23 Global Step: 480760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:18,258-Speed 6311.22 samples/sec Loss 4.4298 LearningRate 0.0002 Epoch: 23 Global Step: 480770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:21,500-Speed 6319.41 samples/sec Loss 4.3973 LearningRate 0.0002 Epoch: 23 Global Step: 480780 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:24,751-Speed 6301.21 samples/sec Loss 4.4599 LearningRate 0.0002 Epoch: 23 Global Step: 480790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:27,995-Speed 6314.65 samples/sec Loss 4.4393 LearningRate 0.0002 Epoch: 23 Global Step: 480800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:31,239-Speed 6314.36 samples/sec Loss 4.3859 LearningRate 0.0002 Epoch: 23 Global Step: 480810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:34,484-Speed 6312.60 samples/sec Loss 4.3515 LearningRate 0.0002 Epoch: 23 Global Step: 480820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:37,730-Speed 6309.88 samples/sec Loss 4.4181 LearningRate 0.0002 Epoch: 23 Global Step: 480830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:40,975-Speed 6313.43 samples/sec Loss 4.3753 LearningRate 0.0002 Epoch: 23 Global Step: 480840 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:28:44,206-Speed 6339.62 samples/sec Loss 4.4284 LearningRate 0.0002 Epoch: 23 Global Step: 480850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:47,451-Speed 6311.85 samples/sec Loss 4.3831 LearningRate 0.0002 Epoch: 23 Global Step: 480860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:50,698-Speed 6309.53 samples/sec Loss 4.3647 LearningRate 0.0002 Epoch: 23 Global Step: 480870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:53,945-Speed 6309.48 samples/sec Loss 4.4749 LearningRate 0.0002 Epoch: 23 Global Step: 480880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:28:57,190-Speed 6311.79 samples/sec Loss 4.3970 LearningRate 0.0002 Epoch: 23 Global Step: 480890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:00,436-Speed 6311.40 samples/sec Loss 4.3692 LearningRate 0.0002 Epoch: 23 Global Step: 480900 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:29:03,683-Speed 6308.78 samples/sec Loss 4.3283 LearningRate 0.0002 Epoch: 23 Global Step: 480910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:06,927-Speed 6313.37 samples/sec Loss 4.4329 LearningRate 0.0002 Epoch: 23 Global Step: 480920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:10,179-Speed 6299.54 samples/sec Loss 4.3081 LearningRate 0.0002 Epoch: 23 Global Step: 480930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:13,423-Speed 6314.98 samples/sec Loss 4.3769 LearningRate 0.0002 Epoch: 23 Global Step: 480940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:16,656-Speed 6336.35 samples/sec Loss 4.4193 LearningRate 0.0002 Epoch: 23 Global Step: 480950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:19,904-Speed 6307.76 samples/sec Loss 4.4147 LearningRate 0.0002 Epoch: 23 Global Step: 480960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:23,152-Speed 6306.64 samples/sec Loss 4.4166 LearningRate 0.0002 Epoch: 23 Global Step: 480970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:26,399-Speed 6308.38 samples/sec Loss 4.4418 LearningRate 0.0002 Epoch: 23 Global Step: 480980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:29,646-Speed 6309.69 samples/sec Loss 4.4040 LearningRate 0.0002 Epoch: 23 Global Step: 480990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:32,893-Speed 6307.77 samples/sec Loss 4.4194 LearningRate 0.0002 Epoch: 23 Global Step: 481000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:36,158-Speed 6274.96 samples/sec Loss 4.3693 LearningRate 0.0002 Epoch: 23 Global Step: 481010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:39,403-Speed 6312.86 samples/sec Loss 4.4299 LearningRate 0.0002 Epoch: 23 Global Step: 481020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:29:42,648-Speed 6312.17 samples/sec Loss 4.3530 LearningRate 0.0002 Epoch: 23 Global Step: 481030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:45,896-Speed 6306.16 samples/sec Loss 4.4196 LearningRate 0.0002 Epoch: 23 Global Step: 481040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:49,146-Speed 6303.12 samples/sec Loss 4.3775 LearningRate 0.0002 Epoch: 23 Global Step: 481050 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:29:52,379-Speed 6336.86 samples/sec Loss 4.4189 LearningRate 0.0002 Epoch: 23 Global Step: 481060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:55,621-Speed 6318.86 samples/sec Loss 4.4365 LearningRate 0.0002 Epoch: 23 Global Step: 481070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:29:58,875-Speed 6294.32 samples/sec Loss 4.4272 LearningRate 0.0002 Epoch: 23 Global Step: 481080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:02,125-Speed 6303.61 samples/sec Loss 4.4269 LearningRate 0.0002 Epoch: 23 Global Step: 481090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:05,374-Speed 6303.42 samples/sec Loss 4.4362 LearningRate 0.0002 Epoch: 23 Global Step: 481100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:08,618-Speed 6314.52 samples/sec Loss 4.4013 LearningRate 0.0002 Epoch: 23 Global Step: 481110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:11,869-Speed 6302.13 samples/sec Loss 4.4441 LearningRate 0.0002 Epoch: 23 Global Step: 481120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:15,117-Speed 6306.51 samples/sec Loss 4.3873 LearningRate 0.0002 Epoch: 23 Global Step: 481130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:18,362-Speed 6311.87 samples/sec Loss 4.4576 LearningRate 0.0002 Epoch: 23 Global Step: 481140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:21,607-Speed 6313.98 
samples/sec Loss 4.4622 LearningRate 0.0002 Epoch: 23 Global Step: 481150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:24,835-Speed 6344.30 samples/sec Loss 4.4433 LearningRate 0.0002 Epoch: 23 Global Step: 481160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:28,081-Speed 6313.27 samples/sec Loss 4.3649 LearningRate 0.0002 Epoch: 23 Global Step: 481170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:31,326-Speed 6310.93 samples/sec Loss 4.3511 LearningRate 0.0002 Epoch: 23 Global Step: 481180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:34,570-Speed 6315.11 samples/sec Loss 4.3872 LearningRate 0.0002 Epoch: 23 Global Step: 481190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:37,827-Speed 6290.42 samples/sec Loss 4.4068 LearningRate 0.0002 Epoch: 23 Global Step: 481200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:41,075-Speed 6306.80 samples/sec Loss 4.3761 LearningRate 0.0002 Epoch: 23 Global Step: 481210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:44,322-Speed 6308.60 samples/sec Loss 4.4484 LearningRate 0.0002 Epoch: 23 Global Step: 481220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:47,568-Speed 6310.75 samples/sec Loss 4.3671 LearningRate 0.0002 Epoch: 23 Global Step: 481230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:50,813-Speed 6313.02 samples/sec Loss 4.2978 LearningRate 0.0002 Epoch: 23 Global Step: 481240 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:54,062-Speed 6303.36 samples/sec Loss 4.3891 LearningRate 0.0002 Epoch: 23 Global Step: 481250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:30:57,294-Speed 6338.99 samples/sec Loss 4.4029 LearningRate 0.0002 Epoch: 23 Global Step: 481260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:00,540-Speed 6311.40 samples/sec Loss 4.3321 
LearningRate 0.0002 Epoch: 23 Global Step: 481270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:03,788-Speed 6305.98 samples/sec Loss 4.4183 LearningRate 0.0002 Epoch: 23 Global Step: 481280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:07,054-Speed 6271.41 samples/sec Loss 4.3642 LearningRate 0.0002 Epoch: 23 Global Step: 481290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:10,303-Speed 6304.87 samples/sec Loss 4.3771 LearningRate 0.0002 Epoch: 23 Global Step: 481300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:13,550-Speed 6309.39 samples/sec Loss 4.3759 LearningRate 0.0002 Epoch: 23 Global Step: 481310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:16,796-Speed 6310.34 samples/sec Loss 4.4329 LearningRate 0.0002 Epoch: 23 Global Step: 481320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:20,050-Speed 6296.18 samples/sec Loss 4.3349 LearningRate 0.0002 Epoch: 23 Global Step: 481330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:23,295-Speed 6312.16 samples/sec Loss 4.3983 LearningRate 0.0002 Epoch: 23 Global Step: 481340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:26,543-Speed 6307.52 samples/sec Loss 4.4266 LearningRate 0.0002 Epoch: 23 Global Step: 481350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:29,778-Speed 6331.70 samples/sec Loss 4.4517 LearningRate 0.0002 Epoch: 23 Global Step: 481360 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:33,026-Speed 6307.13 samples/sec Loss 4.3365 LearningRate 0.0002 Epoch: 23 Global Step: 481370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:36,278-Speed 6297.47 samples/sec Loss 4.4107 LearningRate 0.0002 Epoch: 23 Global Step: 481380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:39,527-Speed 6306.48 samples/sec Loss 4.4192 LearningRate 0.0002 Epoch: 23 
Global Step: 481390 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:42,776-Speed 6305.11 samples/sec Loss 4.4019 LearningRate 0.0002 Epoch: 23 Global Step: 481400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:46,022-Speed 6309.95 samples/sec Loss 4.3685 LearningRate 0.0002 Epoch: 23 Global Step: 481410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:49,276-Speed 6296.07 samples/sec Loss 4.4768 LearningRate 0.0002 Epoch: 23 Global Step: 481420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:52,522-Speed 6310.17 samples/sec Loss 4.4387 LearningRate 0.0002 Epoch: 23 Global Step: 481430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:55,771-Speed 6304.85 samples/sec Loss 4.4041 LearningRate 0.0002 Epoch: 23 Global Step: 481440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:31:59,016-Speed 6312.54 samples/sec Loss 4.3850 LearningRate 0.0002 Epoch: 23 Global Step: 481450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:02,265-Speed 6305.95 samples/sec Loss 4.3612 LearningRate 0.0002 Epoch: 23 Global Step: 481460 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:32:05,521-Speed 6291.50 samples/sec Loss 4.3582 LearningRate 0.0002 Epoch: 23 Global Step: 481470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:08,763-Speed 6317.32 samples/sec Loss 4.3764 LearningRate 0.0002 Epoch: 23 Global Step: 481480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:12,007-Speed 6315.35 samples/sec Loss 4.3975 LearningRate 0.0002 Epoch: 23 Global Step: 481490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:15,253-Speed 6311.69 samples/sec Loss 4.3793 LearningRate 0.0002 Epoch: 23 Global Step: 481500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:18,495-Speed 6316.76 samples/sec Loss 4.3670 LearningRate 0.0002 Epoch: 23 Global Step: 481510 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:21,745-Speed 6303.01 samples/sec Loss 4.3994 LearningRate 0.0002 Epoch: 23 Global Step: 481520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:24,991-Speed 6311.87 samples/sec Loss 4.3256 LearningRate 0.0002 Epoch: 23 Global Step: 481530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:28,242-Speed 6301.62 samples/sec Loss 4.3787 LearningRate 0.0002 Epoch: 23 Global Step: 481540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:31,489-Speed 6307.09 samples/sec Loss 4.4307 LearningRate 0.0002 Epoch: 23 Global Step: 481550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:34,742-Speed 6298.46 samples/sec Loss 4.3201 LearningRate 0.0002 Epoch: 23 Global Step: 481560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:37,974-Speed 6338.05 samples/sec Loss 4.4179 LearningRate 0.0002 Epoch: 23 Global Step: 481570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:41,216-Speed 6318.11 samples/sec Loss 4.4019 LearningRate 0.0002 Epoch: 23 Global Step: 481580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:44,463-Speed 6309.11 samples/sec Loss 4.4506 LearningRate 0.0002 Epoch: 23 Global Step: 481590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:47,709-Speed 6310.29 samples/sec Loss 4.4167 LearningRate 0.0002 Epoch: 23 Global Step: 481600 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:50,975-Speed 6273.08 samples/sec Loss 4.3636 LearningRate 0.0002 Epoch: 23 Global Step: 481610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:54,233-Speed 6286.95 samples/sec Loss 4.3677 LearningRate 0.0002 Epoch: 23 Global Step: 481620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:32:57,483-Speed 6303.86 samples/sec Loss 4.4310 LearningRate 0.0002 Epoch: 23 Global Step: 481630 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:33:00,729-Speed 6310.35 samples/sec Loss 4.4281 LearningRate 0.0002 Epoch: 23 Global Step: 481640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:03,978-Speed 6304.86 samples/sec Loss 4.3585 LearningRate 0.0002 Epoch: 23 Global Step: 481650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:07,249-Speed 6262.50 samples/sec Loss 4.3893 LearningRate 0.0002 Epoch: 23 Global Step: 481660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:10,478-Speed 6342.87 samples/sec Loss 4.3533 LearningRate 0.0002 Epoch: 23 Global Step: 481670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:13,724-Speed 6312.27 samples/sec Loss 4.4185 LearningRate 0.0002 Epoch: 23 Global Step: 481680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:16,969-Speed 6311.34 samples/sec Loss 4.3761 LearningRate 0.0002 Epoch: 23 Global Step: 481690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:20,215-Speed 6310.73 samples/sec Loss 4.4103 LearningRate 0.0002 Epoch: 23 Global Step: 481700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:23,469-Speed 6295.59 samples/sec Loss 4.3873 LearningRate 0.0002 Epoch: 23 Global Step: 481710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:26,713-Speed 6314.28 samples/sec Loss 4.4236 LearningRate 0.0002 Epoch: 23 Global Step: 481720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:29,959-Speed 6310.61 samples/sec Loss 4.3745 LearningRate 0.0002 Epoch: 23 Global Step: 481730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:33,207-Speed 6308.52 samples/sec Loss 4.4300 LearningRate 0.0002 Epoch: 23 Global Step: 481740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:36,453-Speed 6310.83 samples/sec Loss 4.4149 LearningRate 0.0002 Epoch: 23 Global Step: 481750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:33:39,708-Speed 6292.91 samples/sec Loss 4.3821 LearningRate 0.0002 Epoch: 23 Global Step: 481760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:42,955-Speed 6308.46 samples/sec Loss 4.4117 LearningRate 0.0002 Epoch: 23 Global Step: 481770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:46,202-Speed 6307.94 samples/sec Loss 4.4448 LearningRate 0.0002 Epoch: 23 Global Step: 481780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:49,447-Speed 6312.93 samples/sec Loss 4.3921 LearningRate 0.0002 Epoch: 23 Global Step: 481790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:52,693-Speed 6310.70 samples/sec Loss 4.3872 LearningRate 0.0002 Epoch: 23 Global Step: 481800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:55,943-Speed 6304.28 samples/sec Loss 4.4012 LearningRate 0.0002 Epoch: 23 Global Step: 481810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:33:59,189-Speed 6309.69 samples/sec Loss 4.3101 LearningRate 0.0002 Epoch: 23 Global Step: 481820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:02,439-Speed 6307.73 samples/sec Loss 4.3952 LearningRate 0.0002 Epoch: 23 Global Step: 481830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:05,689-Speed 6302.35 samples/sec Loss 4.4003 LearningRate 0.0002 Epoch: 23 Global Step: 481840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:08,938-Speed 6304.23 samples/sec Loss 4.3708 LearningRate 0.0002 Epoch: 23 Global Step: 481850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:12,183-Speed 6312.72 samples/sec Loss 4.4365 LearningRate 0.0002 Epoch: 23 Global Step: 481860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:15,427-Speed 6315.46 samples/sec Loss 4.4252 LearningRate 0.0002 Epoch: 23 Global Step: 481870 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:34:18,659-Speed 6337.81 
samples/sec Loss 4.4490 LearningRate 0.0002 Epoch: 23 Global Step: 481880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:21,902-Speed 6316.33 samples/sec Loss 4.4076 LearningRate 0.0002 Epoch: 23 Global Step: 481890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:25,148-Speed 6311.33 samples/sec Loss 4.4374 LearningRate 0.0002 Epoch: 23 Global Step: 481900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:28,393-Speed 6312.48 samples/sec Loss 4.4111 LearningRate 0.0002 Epoch: 23 Global Step: 481910 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:31,636-Speed 6315.70 samples/sec Loss 4.4153 LearningRate 0.0002 Epoch: 23 Global Step: 481920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:34,884-Speed 6307.24 samples/sec Loss 4.3825 LearningRate 0.0002 Epoch: 23 Global Step: 481930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:38,209-Speed 6161.39 samples/sec Loss 4.4301 LearningRate 0.0002 Epoch: 23 Global Step: 481940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:41,470-Speed 6281.47 samples/sec Loss 4.4527 LearningRate 0.0002 Epoch: 23 Global Step: 481950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:44,714-Speed 6313.51 samples/sec Loss 4.4039 LearningRate 0.0002 Epoch: 23 Global Step: 481960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:47,963-Speed 6305.30 samples/sec Loss 4.4056 LearningRate 0.0002 Epoch: 23 Global Step: 481970 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:51,210-Speed 6308.43 samples/sec Loss 4.3655 LearningRate 0.0002 Epoch: 23 Global Step: 481980 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:34:54,443-Speed 6336.34 samples/sec Loss 4.4154 LearningRate 0.0002 Epoch: 23 Global Step: 481990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:34:57,693-Speed 6302.97 samples/sec Loss 4.3765 
LearningRate 0.0002 Epoch: 23 Global Step: 482000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:00,941-Speed 6307.09 samples/sec Loss 4.4198 LearningRate 0.0002 Epoch: 23 Global Step: 482010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:04,190-Speed 6305.56 samples/sec Loss 4.4213 LearningRate 0.0002 Epoch: 23 Global Step: 482020 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:07,436-Speed 6311.26 samples/sec Loss 4.4463 LearningRate 0.0002 Epoch: 23 Global Step: 482030 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:10,680-Speed 6314.01 samples/sec Loss 4.3709 LearningRate 0.0002 Epoch: 23 Global Step: 482040 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:13,921-Speed 6320.06 samples/sec Loss 4.3788 LearningRate 0.0002 Epoch: 23 Global Step: 482050 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:17,169-Speed 6308.30 samples/sec Loss 4.4449 LearningRate 0.0002 Epoch: 23 Global Step: 482060 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:20,418-Speed 6304.77 samples/sec Loss 4.3827 LearningRate 0.0002 Epoch: 23 Global Step: 482070 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:23,665-Speed 6308.75 samples/sec Loss 4.4014 LearningRate 0.0002 Epoch: 23 Global Step: 482080 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:26,899-Speed 6333.83 samples/sec Loss 4.3553 LearningRate 0.0002 Epoch: 23 Global Step: 482090 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:30,149-Speed 6303.24 samples/sec Loss 4.3691 LearningRate 0.0002 Epoch: 23 Global Step: 482100 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:33,393-Speed 6314.81 samples/sec Loss 4.4028 LearningRate 0.0002 Epoch: 23 Global Step: 482110 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:36,641-Speed 6305.39 samples/sec Loss 4.4485 LearningRate 0.0002 Epoch: 23 
Global Step: 482120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:39,885-Speed 6314.61 samples/sec Loss 4.3971 LearningRate 0.0002 Epoch: 23 Global Step: 482130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:43,137-Speed 6299.22 samples/sec Loss 4.3888 LearningRate 0.0002 Epoch: 23 Global Step: 482140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:46,388-Speed 6300.70 samples/sec Loss 4.4051 LearningRate 0.0002 Epoch: 23 Global Step: 482150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:49,636-Speed 6306.86 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 482160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:52,882-Speed 6311.17 samples/sec Loss 4.4077 LearningRate 0.0002 Epoch: 23 Global Step: 482170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:56,129-Speed 6308.99 samples/sec Loss 4.4076 LearningRate 0.0002 Epoch: 23 Global Step: 482180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:35:59,363-Speed 6334.69 samples/sec Loss 4.4166 LearningRate 0.0002 Epoch: 23 Global Step: 482190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:02,613-Speed 6303.24 samples/sec Loss 4.4393 LearningRate 0.0002 Epoch: 23 Global Step: 482200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:05,860-Speed 6308.17 samples/sec Loss 4.3924 LearningRate 0.0002 Epoch: 23 Global Step: 482210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:09,103-Speed 6317.23 samples/sec Loss 4.3793 LearningRate 0.0002 Epoch: 23 Global Step: 482220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:12,346-Speed 6317.03 samples/sec Loss 4.4131 LearningRate 0.0002 Epoch: 23 Global Step: 482230 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:15,596-Speed 6302.22 samples/sec Loss 4.3104 LearningRate 0.0002 Epoch: 23 Global Step: 482240 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:18,840-Speed 6314.19 samples/sec Loss 4.4071 LearningRate 0.0002 Epoch: 23 Global Step: 482250 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:22,086-Speed 6311.87 samples/sec Loss 4.3674 LearningRate 0.0002 Epoch: 23 Global Step: 482260 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:25,334-Speed 6306.29 samples/sec Loss 4.3823 LearningRate 0.0002 Epoch: 23 Global Step: 482270 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:28,579-Speed 6313.17 samples/sec Loss 4.4112 LearningRate 0.0002 Epoch: 23 Global Step: 482280 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:31,813-Speed 6333.15 samples/sec Loss 4.4331 LearningRate 0.0002 Epoch: 23 Global Step: 482290 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:35,062-Speed 6304.94 samples/sec Loss 4.3699 LearningRate 0.0002 Epoch: 23 Global Step: 482300 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:38,309-Speed 6310.11 samples/sec Loss 4.4570 LearningRate 0.0002 Epoch: 23 Global Step: 482310 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:41,558-Speed 6304.75 samples/sec Loss 4.3819 LearningRate 0.0002 Epoch: 23 Global Step: 482320 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:44,801-Speed 6315.27 samples/sec Loss 4.4428 LearningRate 0.0002 Epoch: 23 Global Step: 482330 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:48,046-Speed 6312.39 samples/sec Loss 4.4067 LearningRate 0.0002 Epoch: 23 Global Step: 482340 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:51,295-Speed 6306.56 samples/sec Loss 4.4120 LearningRate 0.0002 Epoch: 23 Global Step: 482350 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:36:54,542-Speed 6307.84 samples/sec Loss 4.2994 LearningRate 0.0002 Epoch: 23 Global Step: 482360 Fp16 Grad Scale: 16384 Required: 32 hours 
Training: 2022-04-02 11:36:57,789-Speed 6308.10 samples/sec Loss 4.4585 LearningRate 0.0002 Epoch: 23 Global Step: 482370 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:01,041-Speed 6300.44 samples/sec Loss 4.3978 LearningRate 0.0002 Epoch: 23 Global Step: 482380 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:04,300-Speed 6285.16 samples/sec Loss 4.4411 LearningRate 0.0002 Epoch: 23 Global Step: 482390 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:37:07,532-Speed 6337.75 samples/sec Loss 4.3813 LearningRate 0.0002 Epoch: 23 Global Step: 482400 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:10,779-Speed 6309.32 samples/sec Loss 4.3798 LearningRate 0.0002 Epoch: 23 Global Step: 482410 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:14,023-Speed 6314.85 samples/sec Loss 4.4195 LearningRate 0.0002 Epoch: 23 Global Step: 482420 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:17,270-Speed 6309.25 samples/sec Loss 4.3354 LearningRate 0.0002 Epoch: 23 Global Step: 482430 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:20,518-Speed 6307.60 samples/sec Loss 4.3540 LearningRate 0.0002 Epoch: 23 Global Step: 482440 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:23,767-Speed 6304.10 samples/sec Loss 4.3508 LearningRate 0.0002 Epoch: 23 Global Step: 482450 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:27,019-Speed 6299.94 samples/sec Loss 4.3984 LearningRate 0.0002 Epoch: 23 Global Step: 482460 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:30,262-Speed 6314.79 samples/sec Loss 4.4432 LearningRate 0.0002 Epoch: 23 Global Step: 482470 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:33,509-Speed 6310.56 samples/sec Loss 4.4515 LearningRate 0.0002 Epoch: 23 Global Step: 482480 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:37:36,756-Speed 6308.29 samples/sec Loss 4.4182 LearningRate 0.0002 Epoch: 23 Global Step: 482490 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:39,990-Speed 6333.42 samples/sec Loss 4.3549 LearningRate 0.0002 Epoch: 23 Global Step: 482500 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:43,240-Speed 6302.71 samples/sec Loss 4.3430 LearningRate 0.0002 Epoch: 23 Global Step: 482510 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:46,494-Speed 6295.85 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 482520 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:49,743-Speed 6304.79 samples/sec Loss 4.3733 LearningRate 0.0002 Epoch: 23 Global Step: 482530 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:52,994-Speed 6302.04 samples/sec Loss 4.4425 LearningRate 0.0002 Epoch: 23 Global Step: 482540 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:56,242-Speed 6306.58 samples/sec Loss 4.4241 LearningRate 0.0002 Epoch: 23 Global Step: 482550 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:37:59,493-Speed 6299.34 samples/sec Loss 4.3682 LearningRate 0.0002 Epoch: 23 Global Step: 482560 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:02,812-Speed 6173.12 samples/sec Loss 4.3874 LearningRate 0.0002 Epoch: 23 Global Step: 482570 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:06,060-Speed 6306.33 samples/sec Loss 4.3844 LearningRate 0.0002 Epoch: 23 Global Step: 482580 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:09,304-Speed 6315.03 samples/sec Loss 4.4105 LearningRate 0.0002 Epoch: 23 Global Step: 482590 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:12,642-Speed 6137.13 samples/sec Loss 4.3514 LearningRate 0.0002 Epoch: 23 Global Step: 482600 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:38:15,897-Speed 6292.63 
samples/sec Loss 4.4091 LearningRate 0.0002 Epoch: 23 Global Step: 482610 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:19,144-Speed 6308.82 samples/sec Loss 4.4065 LearningRate 0.0002 Epoch: 23 Global Step: 482620 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:22,394-Speed 6303.32 samples/sec Loss 4.3842 LearningRate 0.0002 Epoch: 23 Global Step: 482630 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:25,643-Speed 6304.91 samples/sec Loss 4.3565 LearningRate 0.0002 Epoch: 23 Global Step: 482640 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:28,889-Speed 6311.39 samples/sec Loss 4.3924 LearningRate 0.0002 Epoch: 23 Global Step: 482650 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:32,134-Speed 6313.34 samples/sec Loss 4.3764 LearningRate 0.0002 Epoch: 23 Global Step: 482660 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:35,381-Speed 6307.54 samples/sec Loss 4.4081 LearningRate 0.0002 Epoch: 23 Global Step: 482670 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:38,627-Speed 6312.27 samples/sec Loss 4.3991 LearningRate 0.0002 Epoch: 23 Global Step: 482680 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:41,871-Speed 6312.89 samples/sec Loss 4.4375 LearningRate 0.0002 Epoch: 23 Global Step: 482690 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:45,117-Speed 6310.33 samples/sec Loss 4.4287 LearningRate 0.0002 Epoch: 23 Global Step: 482700 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:48,357-Speed 6323.12 samples/sec Loss 4.3900 LearningRate 0.0002 Epoch: 23 Global Step: 482710 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:51,601-Speed 6315.50 samples/sec Loss 4.3839 LearningRate 0.0002 Epoch: 23 Global Step: 482720 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:54,845-Speed 6313.35 samples/sec Loss 4.3922 
LearningRate 0.0002 Epoch: 23 Global Step: 482730 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:38:58,093-Speed 6306.99 samples/sec Loss 4.4115 LearningRate 0.0002 Epoch: 23 Global Step: 482740 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:01,341-Speed 6306.34 samples/sec Loss 4.3596 LearningRate 0.0002 Epoch: 23 Global Step: 482750 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:04,587-Speed 6311.34 samples/sec Loss 4.3719 LearningRate 0.0002 Epoch: 23 Global Step: 482760 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:07,832-Speed 6315.11 samples/sec Loss 4.4502 LearningRate 0.0002 Epoch: 23 Global Step: 482770 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:11,077-Speed 6311.48 samples/sec Loss 4.4500 LearningRate 0.0002 Epoch: 23 Global Step: 482780 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:14,321-Speed 6315.35 samples/sec Loss 4.3325 LearningRate 0.0002 Epoch: 23 Global Step: 482790 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:17,564-Speed 6316.31 samples/sec Loss 4.4239 LearningRate 0.0002 Epoch: 23 Global Step: 482800 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:20,794-Speed 6342.87 samples/sec Loss 4.3379 LearningRate 0.0002 Epoch: 23 Global Step: 482810 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:24,042-Speed 6306.50 samples/sec Loss 4.3466 LearningRate 0.0002 Epoch: 23 Global Step: 482820 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:27,285-Speed 6315.77 samples/sec Loss 4.3740 LearningRate 0.0002 Epoch: 23 Global Step: 482830 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:30,540-Speed 6293.91 samples/sec Loss 4.3532 LearningRate 0.0002 Epoch: 23 Global Step: 482840 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:33,785-Speed 6313.16 samples/sec Loss 4.4609 LearningRate 0.0002 Epoch: 23 
Global Step: 482850 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:37,035-Speed 6301.96 samples/sec Loss 4.3776 LearningRate 0.0002 Epoch: 23 Global Step: 482860 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:40,282-Speed 6309.88 samples/sec Loss 4.4210 LearningRate 0.0002 Epoch: 23 Global Step: 482870 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:43,527-Speed 6312.14 samples/sec Loss 4.3913 LearningRate 0.0002 Epoch: 23 Global Step: 482880 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:46,774-Speed 6309.50 samples/sec Loss 4.3985 LearningRate 0.0002 Epoch: 23 Global Step: 482890 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:50,017-Speed 6315.61 samples/sec Loss 4.3428 LearningRate 0.0002 Epoch: 23 Global Step: 482900 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:53,263-Speed 6311.69 samples/sec Loss 4.4312 LearningRate 0.0002 Epoch: 23 Global Step: 482910 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-04-02 11:39:56,490-Speed 6346.88 samples/sec Loss 4.3551 LearningRate 0.0002 Epoch: 23 Global Step: 482920 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:39:59,733-Speed 6316.60 samples/sec Loss 4.3686 LearningRate 0.0002 Epoch: 23 Global Step: 482930 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:02,980-Speed 6310.07 samples/sec Loss 4.3762 LearningRate 0.0002 Epoch: 23 Global Step: 482940 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:06,226-Speed 6309.60 samples/sec Loss 4.4465 LearningRate 0.0002 Epoch: 23 Global Step: 482950 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:09,492-Speed 6272.48 samples/sec Loss 4.4086 LearningRate 0.0002 Epoch: 23 Global Step: 482960 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:12,740-Speed 6306.00 samples/sec Loss 4.3926 LearningRate 0.0002 Epoch: 23 Global Step: 482970 Fp16 Grad 
Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:15,989-Speed 6305.73 samples/sec Loss 4.4197 LearningRate 0.0002 Epoch: 23 Global Step: 482980 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:19,233-Speed 6313.90 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 482990 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:22,481-Speed 6307.25 samples/sec Loss 4.4059 LearningRate 0.0002 Epoch: 23 Global Step: 483000 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:25,726-Speed 6312.55 samples/sec Loss 4.4095 LearningRate 0.0002 Epoch: 23 Global Step: 483010 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:40:28,955-Speed 6343.32 samples/sec Loss 4.4625 LearningRate 0.0002 Epoch: 23 Global Step: 483020 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:32,197-Speed 6320.17 samples/sec Loss 4.3739 LearningRate 0.0002 Epoch: 23 Global Step: 483030 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:35,439-Speed 6318.21 samples/sec Loss 4.4111 LearningRate 0.0002 Epoch: 23 Global Step: 483040 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:38,685-Speed 6310.82 samples/sec Loss 4.3337 LearningRate 0.0002 Epoch: 23 Global Step: 483050 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:41,930-Speed 6311.35 samples/sec Loss 4.4094 LearningRate 0.0002 Epoch: 23 Global Step: 483060 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:45,183-Speed 6299.16 samples/sec Loss 4.4287 LearningRate 0.0002 Epoch: 23 Global Step: 483070 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:48,427-Speed 6314.76 samples/sec Loss 4.3378 LearningRate 0.0002 Epoch: 23 Global Step: 483080 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:51,677-Speed 6302.40 samples/sec Loss 4.3478 LearningRate 0.0002 Epoch: 23 Global Step: 483090 Fp16 Grad Scale: 8192 Required: 32 hours 
Training: 2022-04-02 11:40:54,920-Speed 6317.80 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 483100 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:40:58,165-Speed 6312.10 samples/sec Loss 4.4094 LearningRate 0.0002 Epoch: 23 Global Step: 483110 Fp16 Grad Scale: 8192 Required: 32 hours Training: 2022-04-02 11:41:01,412-Speed 6308.36 samples/sec Loss 4.4289 LearningRate 0.0002 Epoch: 23 Global Step: 483120 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:04,656-Speed 6313.87 samples/sec Loss 4.4555 LearningRate 0.0002 Epoch: 23 Global Step: 483130 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:07,905-Speed 6306.05 samples/sec Loss 4.4194 LearningRate 0.0002 Epoch: 23 Global Step: 483140 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:11,152-Speed 6308.72 samples/sec Loss 4.3777 LearningRate 0.0002 Epoch: 23 Global Step: 483150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:14,397-Speed 6312.11 samples/sec Loss 4.3398 LearningRate 0.0002 Epoch: 23 Global Step: 483160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:17,643-Speed 6309.88 samples/sec Loss 4.4051 LearningRate 0.0002 Epoch: 23 Global Step: 483170 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:20,889-Speed 6312.83 samples/sec Loss 4.4515 LearningRate 0.0002 Epoch: 23 Global Step: 483180 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:24,133-Speed 6313.24 samples/sec Loss 4.4429 LearningRate 0.0002 Epoch: 23 Global Step: 483190 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:27,379-Speed 6310.87 samples/sec Loss 4.3600 LearningRate 0.0002 Epoch: 23 Global Step: 483200 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:30,623-Speed 6314.50 samples/sec Loss 4.4216 LearningRate 0.0002 Epoch: 23 Global Step: 483210 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 
11:41:33,855-Speed 6337.70 samples/sec Loss 4.3036 LearningRate 0.0002 Epoch: 23 Global Step: 483220 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-02 11:41:37,106-Speed 6301.29 samples/sec Loss 4.4333 LearningRate 0.0002 Epoch: 23 Global Step: 483230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:40,349-Speed 6317.22 samples/sec Loss 4.4331 LearningRate 0.0002 Epoch: 23 Global Step: 483240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:43,593-Speed 6314.69 samples/sec Loss 4.4106 LearningRate 0.0002 Epoch: 23 Global Step: 483250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:46,846-Speed 6296.12 samples/sec Loss 4.3596 LearningRate 0.0002 Epoch: 23 Global Step: 483260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:50,089-Speed 6316.00 samples/sec Loss 4.3997 LearningRate 0.0002 Epoch: 23 Global Step: 483270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:53,341-Speed 6300.58 samples/sec Loss 4.3811 LearningRate 0.0002 Epoch: 23 Global Step: 483280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:56,583-Speed 6317.95 samples/sec Loss 4.4113 LearningRate 0.0002 Epoch: 23 Global Step: 483290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:41:59,831-Speed 6307.91 samples/sec Loss 4.4053 LearningRate 0.0002 Epoch: 23 Global Step: 483300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:03,076-Speed 6312.07 samples/sec Loss 4.3934 LearningRate 0.0002 Epoch: 23 Global Step: 483310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:06,309-Speed 6336.89 samples/sec Loss 4.3989 LearningRate 0.0002 Epoch: 23 Global Step: 483320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:09,553-Speed 6315.34 samples/sec Loss 4.3504 LearningRate 0.0002 Epoch: 23 Global Step: 483330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:12,797-Speed 6313.93 
samples/sec Loss 4.4055 LearningRate 0.0002 Epoch: 23 Global Step: 483340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:16,053-Speed 6289.92 samples/sec Loss 4.4353 LearningRate 0.0002 Epoch: 23 Global Step: 483350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:19,299-Speed 6311.73 samples/sec Loss 4.4117 LearningRate 0.0002 Epoch: 23 Global Step: 483360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:22,542-Speed 6316.71 samples/sec Loss 4.3235 LearningRate 0.0002 Epoch: 23 Global Step: 483370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:25,788-Speed 6309.73 samples/sec Loss 4.3924 LearningRate 0.0002 Epoch: 23 Global Step: 483380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:29,031-Speed 6317.03 samples/sec Loss 4.3480 LearningRate 0.0002 Epoch: 23 Global Step: 483390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:32,278-Speed 6309.75 samples/sec Loss 4.4500 LearningRate 0.0002 Epoch: 23 Global Step: 483400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:35,521-Speed 6314.94 samples/sec Loss 4.3865 LearningRate 0.0002 Epoch: 23 Global Step: 483410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:38,753-Speed 6338.25 samples/sec Loss 4.3315 LearningRate 0.0002 Epoch: 23 Global Step: 483420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:42,001-Speed 6307.20 samples/sec Loss 4.3563 LearningRate 0.0002 Epoch: 23 Global Step: 483430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:45,244-Speed 6317.44 samples/sec Loss 4.3751 LearningRate 0.0002 Epoch: 23 Global Step: 483440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:48,492-Speed 6305.90 samples/sec Loss 4.3674 LearningRate 0.0002 Epoch: 23 Global Step: 483450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:51,736-Speed 6313.96 samples/sec Loss 4.3984 
LearningRate 0.0002 Epoch: 23 Global Step: 483460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:54,982-Speed 6312.30 samples/sec Loss 4.4480 LearningRate 0.0002 Epoch: 23 Global Step: 483470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:42:58,225-Speed 6315.07 samples/sec Loss 4.3769 LearningRate 0.0002 Epoch: 23 Global Step: 483480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:01,474-Speed 6305.37 samples/sec Loss 4.3159 LearningRate 0.0002 Epoch: 23 Global Step: 483490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:04,724-Speed 6303.69 samples/sec Loss 4.3927 LearningRate 0.0002 Epoch: 23 Global Step: 483500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:07,967-Speed 6316.27 samples/sec Loss 4.3726 LearningRate 0.0002 Epoch: 23 Global Step: 483510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:11,198-Speed 6341.96 samples/sec Loss 4.4576 LearningRate 0.0002 Epoch: 23 Global Step: 483520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:14,442-Speed 6314.25 samples/sec Loss 4.3911 LearningRate 0.0002 Epoch: 23 Global Step: 483530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:17,690-Speed 6306.00 samples/sec Loss 4.4463 LearningRate 0.0002 Epoch: 23 Global Step: 483540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:20,935-Speed 6312.81 samples/sec Loss 4.4013 LearningRate 0.0002 Epoch: 23 Global Step: 483550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:24,178-Speed 6316.92 samples/sec Loss 4.4132 LearningRate 0.0002 Epoch: 23 Global Step: 483560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:27,428-Speed 6302.78 samples/sec Loss 4.3771 LearningRate 0.0002 Epoch: 23 Global Step: 483570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:30,669-Speed 6319.88 samples/sec Loss 4.3868 LearningRate 0.0002 Epoch: 23 
Global Step: 483580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:33,915-Speed 6311.42 samples/sec Loss 4.3753 LearningRate 0.0002 Epoch: 23 Global Step: 483590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:37,161-Speed 6309.82 samples/sec Loss 4.3779 LearningRate 0.0002 Epoch: 23 Global Step: 483600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:40,416-Speed 6294.01 samples/sec Loss 4.3828 LearningRate 0.0002 Epoch: 23 Global Step: 483610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:43,665-Speed 6304.00 samples/sec Loss 4.3890 LearningRate 0.0002 Epoch: 23 Global Step: 483620 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:43:46,895-Speed 6342.49 samples/sec Loss 4.3500 LearningRate 0.0002 Epoch: 23 Global Step: 483630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:50,147-Speed 6299.37 samples/sec Loss 4.3488 LearningRate 0.0002 Epoch: 23 Global Step: 483640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:53,394-Speed 6307.41 samples/sec Loss 4.3720 LearningRate 0.0002 Epoch: 23 Global Step: 483650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:56,640-Speed 6311.43 samples/sec Loss 4.2871 LearningRate 0.0002 Epoch: 23 Global Step: 483660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:43:59,887-Speed 6308.63 samples/sec Loss 4.3678 LearningRate 0.0002 Epoch: 23 Global Step: 483670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:03,135-Speed 6307.16 samples/sec Loss 4.4235 LearningRate 0.0002 Epoch: 23 Global Step: 483680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:06,381-Speed 6310.68 samples/sec Loss 4.4116 LearningRate 0.0002 Epoch: 23 Global Step: 483690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:09,622-Speed 6320.63 samples/sec Loss 4.4359 LearningRate 0.0002 Epoch: 23 Global Step: 483700 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:12,871-Speed 6304.83 samples/sec Loss 4.3864 LearningRate 0.0002 Epoch: 23 Global Step: 483710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:16,115-Speed 6315.34 samples/sec Loss 4.3925 LearningRate 0.0002 Epoch: 23 Global Step: 483720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:19,353-Speed 6326.67 samples/sec Loss 4.3409 LearningRate 0.0002 Epoch: 23 Global Step: 483730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:22,601-Speed 6307.22 samples/sec Loss 4.3140 LearningRate 0.0002 Epoch: 23 Global Step: 483740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:25,849-Speed 6305.48 samples/sec Loss 4.3519 LearningRate 0.0002 Epoch: 23 Global Step: 483750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:29,093-Speed 6315.45 samples/sec Loss 4.3867 LearningRate 0.0002 Epoch: 23 Global Step: 483760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:32,335-Speed 6318.86 samples/sec Loss 4.4047 LearningRate 0.0002 Epoch: 23 Global Step: 483770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:35,579-Speed 6313.40 samples/sec Loss 4.3807 LearningRate 0.0002 Epoch: 23 Global Step: 483780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:38,823-Speed 6315.45 samples/sec Loss 4.3418 LearningRate 0.0002 Epoch: 23 Global Step: 483790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:42,068-Speed 6312.63 samples/sec Loss 4.4103 LearningRate 0.0002 Epoch: 23 Global Step: 483800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:45,316-Speed 6306.61 samples/sec Loss 4.4338 LearningRate 0.0002 Epoch: 23 Global Step: 483810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:48,582-Speed 6272.39 samples/sec Loss 4.3714 LearningRate 0.0002 Epoch: 23 Global Step: 483820 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 11:44:51,816-Speed 6333.99 samples/sec Loss 4.3705 LearningRate 0.0002 Epoch: 23 Global Step: 483830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:55,059-Speed 6315.43 samples/sec Loss 4.3540 LearningRate 0.0002 Epoch: 23 Global Step: 483840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:44:58,308-Speed 6306.15 samples/sec Loss 4.4079 LearningRate 0.0002 Epoch: 23 Global Step: 483850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:01,556-Speed 6305.91 samples/sec Loss 4.3679 LearningRate 0.0002 Epoch: 23 Global Step: 483860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:04,811-Speed 6293.77 samples/sec Loss 4.4060 LearningRate 0.0002 Epoch: 23 Global Step: 483870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:08,057-Speed 6310.81 samples/sec Loss 4.3326 LearningRate 0.0002 Epoch: 23 Global Step: 483880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:11,305-Speed 6307.22 samples/sec Loss 4.4019 LearningRate 0.0002 Epoch: 23 Global Step: 483890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:14,551-Speed 6310.83 samples/sec Loss 4.3248 LearningRate 0.0002 Epoch: 23 Global Step: 483900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:17,804-Speed 6296.35 samples/sec Loss 4.3731 LearningRate 0.0002 Epoch: 23 Global Step: 483910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:21,048-Speed 6315.55 samples/sec Loss 4.4331 LearningRate 0.0002 Epoch: 23 Global Step: 483920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:24,279-Speed 6340.63 samples/sec Loss 4.3921 LearningRate 0.0002 Epoch: 23 Global Step: 483930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:27,524-Speed 6312.40 samples/sec Loss 4.3791 LearningRate 0.0002 Epoch: 23 Global Step: 483940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
11:45:30,771-Speed 6308.15 samples/sec Loss 4.3872 LearningRate 0.0002 Epoch: 23 Global Step: 483950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:34,021-Speed 6302.81 samples/sec Loss 4.3976 LearningRate 0.0002 Epoch: 23 Global Step: 483960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:37,355-Speed 6144.53 samples/sec Loss 4.3479 LearningRate 0.0002 Epoch: 23 Global Step: 483970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:40,603-Speed 6307.95 samples/sec Loss 4.3664 LearningRate 0.0002 Epoch: 23 Global Step: 483980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:43,845-Speed 6317.89 samples/sec Loss 4.3696 LearningRate 0.0002 Epoch: 23 Global Step: 483990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:47,090-Speed 6311.83 samples/sec Loss 4.3469 LearningRate 0.0002 Epoch: 23 Global Step: 484000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:50,339-Speed 6304.83 samples/sec Loss 4.3670 LearningRate 0.0002 Epoch: 23 Global Step: 484010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:53,586-Speed 6309.48 samples/sec Loss 4.3259 LearningRate 0.0002 Epoch: 23 Global Step: 484020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:45:56,818-Speed 6338.10 samples/sec Loss 4.3878 LearningRate 0.0002 Epoch: 23 Global Step: 484030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:00,064-Speed 6310.93 samples/sec Loss 4.4508 LearningRate 0.0002 Epoch: 23 Global Step: 484040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:03,330-Speed 6271.65 samples/sec Loss 4.4709 LearningRate 0.0002 Epoch: 23 Global Step: 484050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:06,578-Speed 6306.44 samples/sec Loss 4.4001 LearningRate 0.0002 Epoch: 23 Global Step: 484060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:09,823-Speed 6312.59 
samples/sec Loss 4.3659 LearningRate 0.0002 Epoch: 23 Global Step: 484070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:13,065-Speed 6317.92 samples/sec Loss 4.3759 LearningRate 0.0002 Epoch: 23 Global Step: 484080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:16,294-Speed 6344.82 samples/sec Loss 4.3774 LearningRate 0.0002 Epoch: 23 Global Step: 484090 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:19,541-Speed 6308.44 samples/sec Loss 4.3551 LearningRate 0.0002 Epoch: 23 Global Step: 484100 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:22,784-Speed 6316.20 samples/sec Loss 4.3825 LearningRate 0.0002 Epoch: 23 Global Step: 484110 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:26,028-Speed 6315.25 samples/sec Loss 4.5171 LearningRate 0.0002 Epoch: 23 Global Step: 484120 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:29,286-Speed 6288.71 samples/sec Loss 4.3362 LearningRate 0.0002 Epoch: 23 Global Step: 484130 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:32,531-Speed 6313.08 samples/sec Loss 4.3400 LearningRate 0.0002 Epoch: 23 Global Step: 484140 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:35,774-Speed 6315.60 samples/sec Loss 4.3951 LearningRate 0.0002 Epoch: 23 Global Step: 484150 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:39,018-Speed 6314.86 samples/sec Loss 4.4125 LearningRate 0.0002 Epoch: 23 Global Step: 484160 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:42,261-Speed 6317.10 samples/sec Loss 4.3903 LearningRate 0.0002 Epoch: 23 Global Step: 484170 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:45,505-Speed 6313.71 samples/sec Loss 4.3930 LearningRate 0.0002 Epoch: 23 Global Step: 484180 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 11:46:48,749-Speed 6315.02 samples/sec Loss 4.3678 LearningRate 
0.0002 Epoch: 23 Global Step: 484190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:51,995-Speed 6311.76 samples/sec Loss 4.4140 LearningRate 0.0002 Epoch: 23 Global Step: 484200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:55,241-Speed 6309.95 samples/sec Loss 4.3399 LearningRate 0.0002 Epoch: 23 Global Step: 484210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:46:58,483-Speed 6318.35 samples/sec Loss 4.4066 LearningRate 0.0002 Epoch: 23 Global Step: 484220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:01,728-Speed 6313.32 samples/sec Loss 4.4335 LearningRate 0.0002 Epoch: 23 Global Step: 484230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:04,974-Speed 6309.71 samples/sec Loss 4.3969 LearningRate 0.0002 Epoch: 23 Global Step: 484240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:08,222-Speed 6307.25 samples/sec Loss 4.4149 LearningRate 0.0002 Epoch: 23 Global Step: 484250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:11,467-Speed 6313.74 samples/sec Loss 4.3970 LearningRate 0.0002 Epoch: 23 Global Step: 484260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:14,710-Speed 6315.30 samples/sec Loss 4.3849 LearningRate 0.0002 Epoch: 23 Global Step: 484270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:17,958-Speed 6307.82 samples/sec Loss 4.3806 LearningRate 0.0002 Epoch: 23 Global Step: 484280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:21,186-Speed 6346.00 samples/sec Loss 4.3236 LearningRate 0.0002 Epoch: 23 Global Step: 484290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:24,429-Speed 6316.82 samples/sec Loss 4.3764 LearningRate 0.0002 Epoch: 23 Global Step: 484300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:27,672-Speed 6315.43 samples/sec Loss 4.3353 LearningRate 0.0002 Epoch: 23 Global Step: 
484310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:30,917-Speed 6312.24 samples/sec Loss 4.3857 LearningRate 0.0002 Epoch: 23 Global Step: 484320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:34,160-Speed 6317.17 samples/sec Loss 4.3568 LearningRate 0.0002 Epoch: 23 Global Step: 484330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:37,407-Speed 6309.26 samples/sec Loss 4.3539 LearningRate 0.0002 Epoch: 23 Global Step: 484340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:40,652-Speed 6313.47 samples/sec Loss 4.3886 LearningRate 0.0002 Epoch: 23 Global Step: 484350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:43,897-Speed 6312.48 samples/sec Loss 4.3381 LearningRate 0.0002 Epoch: 23 Global Step: 484360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:47,139-Speed 6318.25 samples/sec Loss 4.3857 LearningRate 0.0002 Epoch: 23 Global Step: 484370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:50,384-Speed 6313.10 samples/sec Loss 4.4341 LearningRate 0.0002 Epoch: 23 Global Step: 484380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:47:53,628-Speed 6315.20 samples/sec Loss 4.4020 LearningRate 0.0002 Epoch: 23 Global Step: 484390 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:47:56,859-Speed 6339.38 samples/sec Loss 4.3689 LearningRate 0.0002 Epoch: 23 Global Step: 484400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:00,104-Speed 6312.65 samples/sec Loss 4.3485 LearningRate 0.0002 Epoch: 23 Global Step: 484410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:03,348-Speed 6313.10 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 484420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:06,594-Speed 6312.01 samples/sec Loss 4.4178 LearningRate 0.0002 Epoch: 23 Global Step: 484430 Fp16 Grad Scale: 16384 
Required: 31 hours Training: 2022-04-02 11:48:09,841-Speed 6307.53 samples/sec Loss 4.3396 LearningRate 0.0002 Epoch: 23 Global Step: 484440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:13,081-Speed 6323.86 samples/sec Loss 4.3422 LearningRate 0.0002 Epoch: 23 Global Step: 484450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:16,327-Speed 6310.24 samples/sec Loss 4.3812 LearningRate 0.0002 Epoch: 23 Global Step: 484460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:19,571-Speed 6314.20 samples/sec Loss 4.4341 LearningRate 0.0002 Epoch: 23 Global Step: 484470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:22,816-Speed 6312.17 samples/sec Loss 4.3832 LearningRate 0.0002 Epoch: 23 Global Step: 484480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:26,062-Speed 6311.94 samples/sec Loss 4.3327 LearningRate 0.0002 Epoch: 23 Global Step: 484490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:29,294-Speed 6338.12 samples/sec Loss 4.3779 LearningRate 0.0002 Epoch: 23 Global Step: 484500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:32,537-Speed 6316.43 samples/sec Loss 4.3804 LearningRate 0.0002 Epoch: 23 Global Step: 484510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:35,781-Speed 6313.52 samples/sec Loss 4.4118 LearningRate 0.0002 Epoch: 23 Global Step: 484520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:39,034-Speed 6297.04 samples/sec Loss 4.3696 LearningRate 0.0002 Epoch: 23 Global Step: 484530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:42,301-Speed 6271.42 samples/sec Loss 4.3856 LearningRate 0.0002 Epoch: 23 Global Step: 484540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:45,549-Speed 6306.66 samples/sec Loss 4.3647 LearningRate 0.0002 Epoch: 23 Global Step: 484550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 
2022-04-02 11:48:48,794-Speed 6312.80 samples/sec Loss 4.3682 LearningRate 0.0002 Epoch: 23 Global Step: 484560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:52,039-Speed 6313.31 samples/sec Loss 4.4318 LearningRate 0.0002 Epoch: 23 Global Step: 484570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:55,281-Speed 6318.26 samples/sec Loss 4.3915 LearningRate 0.0002 Epoch: 23 Global Step: 484580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:48:58,523-Speed 6318.72 samples/sec Loss 4.3296 LearningRate 0.0002 Epoch: 23 Global Step: 484590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:01,770-Speed 6308.78 samples/sec Loss 4.3599 LearningRate 0.0002 Epoch: 23 Global Step: 484600 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:49:04,998-Speed 6345.93 samples/sec Loss 4.3290 LearningRate 0.0002 Epoch: 23 Global Step: 484610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:08,245-Speed 6309.72 samples/sec Loss 4.3520 LearningRate 0.0002 Epoch: 23 Global Step: 484620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:11,491-Speed 6309.61 samples/sec Loss 4.3175 LearningRate 0.0002 Epoch: 23 Global Step: 484630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:14,734-Speed 6316.28 samples/sec Loss 4.3388 LearningRate 0.0002 Epoch: 23 Global Step: 484640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:17,980-Speed 6311.70 samples/sec Loss 4.3465 LearningRate 0.0002 Epoch: 23 Global Step: 484650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:21,220-Speed 6322.79 samples/sec Loss 4.3483 LearningRate 0.0002 Epoch: 23 Global Step: 484660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:24,467-Speed 6307.79 samples/sec Loss 4.3479 LearningRate 0.0002 Epoch: 23 Global Step: 484670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:27,711-Speed 
6315.77 samples/sec Loss 4.4044 LearningRate 0.0002 Epoch: 23 Global Step: 484680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:30,953-Speed 6317.72 samples/sec Loss 4.3869 LearningRate 0.0002 Epoch: 23 Global Step: 484690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:34,209-Speed 6290.77 samples/sec Loss 4.3495 LearningRate 0.0002 Epoch: 23 Global Step: 484700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:37,444-Speed 6331.94 samples/sec Loss 4.4214 LearningRate 0.0002 Epoch: 23 Global Step: 484710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:40,689-Speed 6313.44 samples/sec Loss 4.3233 LearningRate 0.0002 Epoch: 23 Global Step: 484720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:43,933-Speed 6314.42 samples/sec Loss 4.3766 LearningRate 0.0002 Epoch: 23 Global Step: 484730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:47,179-Speed 6311.57 samples/sec Loss 4.3569 LearningRate 0.0002 Epoch: 23 Global Step: 484740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:50,430-Speed 6299.27 samples/sec Loss 4.3932 LearningRate 0.0002 Epoch: 23 Global Step: 484750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:53,675-Speed 6313.02 samples/sec Loss 4.4706 LearningRate 0.0002 Epoch: 23 Global Step: 484760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:49:56,921-Speed 6311.08 samples/sec Loss 4.4177 LearningRate 0.0002 Epoch: 23 Global Step: 484770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:00,161-Speed 6323.00 samples/sec Loss 4.4091 LearningRate 0.0002 Epoch: 23 Global Step: 484780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:03,466-Speed 6198.36 samples/sec Loss 4.3812 LearningRate 0.0002 Epoch: 23 Global Step: 484790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:06,712-Speed 6310.51 samples/sec Loss 4.3865 
LearningRate 0.0002 Epoch: 23 Global Step: 484800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:09,955-Speed 6317.51 samples/sec Loss 4.4463 LearningRate 0.0002 Epoch: 23 Global Step: 484810 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:50:13,188-Speed 6334.75 samples/sec Loss 4.4193 LearningRate 0.0002 Epoch: 23 Global Step: 484820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:16,429-Speed 6321.52 samples/sec Loss 4.3331 LearningRate 0.0002 Epoch: 23 Global Step: 484830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:19,672-Speed 6315.82 samples/sec Loss 4.3992 LearningRate 0.0002 Epoch: 23 Global Step: 484840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:22,919-Speed 6309.24 samples/sec Loss 4.3518 LearningRate 0.0002 Epoch: 23 Global Step: 484850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:26,161-Speed 6318.16 samples/sec Loss 4.3875 LearningRate 0.0002 Epoch: 23 Global Step: 484860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:29,409-Speed 6307.10 samples/sec Loss 4.3239 LearningRate 0.0002 Epoch: 23 Global Step: 484870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:32,651-Speed 6318.97 samples/sec Loss 4.3555 LearningRate 0.0002 Epoch: 23 Global Step: 484880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:35,896-Speed 6310.84 samples/sec Loss 4.3960 LearningRate 0.0002 Epoch: 23 Global Step: 484890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:39,138-Speed 6319.93 samples/sec Loss 4.3608 LearningRate 0.0002 Epoch: 23 Global Step: 484900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:42,381-Speed 6316.39 samples/sec Loss 4.3519 LearningRate 0.0002 Epoch: 23 Global Step: 484910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:45,612-Speed 6339.00 samples/sec Loss 4.4321 LearningRate 0.0002 Epoch: 23 
Global Step: 484920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:48,860-Speed 6307.83 samples/sec Loss 4.4095 LearningRate 0.0002 Epoch: 23 Global Step: 484930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:52,141-Speed 6242.99 samples/sec Loss 4.4152 LearningRate 0.0002 Epoch: 23 Global Step: 484940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:55,384-Speed 6317.07 samples/sec Loss 4.3982 LearningRate 0.0002 Epoch: 23 Global Step: 484950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:50:58,629-Speed 6312.53 samples/sec Loss 4.4278 LearningRate 0.0002 Epoch: 23 Global Step: 484960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:01,877-Speed 6307.10 samples/sec Loss 4.3902 LearningRate 0.0002 Epoch: 23 Global Step: 484970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:05,120-Speed 6315.55 samples/sec Loss 4.3989 LearningRate 0.0002 Epoch: 23 Global Step: 484980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:08,365-Speed 6313.08 samples/sec Loss 4.3789 LearningRate 0.0002 Epoch: 23 Global Step: 484990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:11,607-Speed 6319.48 samples/sec Loss 4.4103 LearningRate 0.0002 Epoch: 23 Global Step: 485000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:14,853-Speed 6310.97 samples/sec Loss 4.3976 LearningRate 0.0002 Epoch: 23 Global Step: 485010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:18,086-Speed 6335.09 samples/sec Loss 4.4120 LearningRate 0.0002 Epoch: 23 Global Step: 485020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:21,330-Speed 6314.21 samples/sec Loss 4.3368 LearningRate 0.0002 Epoch: 23 Global Step: 485030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:24,576-Speed 6312.08 samples/sec Loss 4.4089 LearningRate 0.0002 Epoch: 23 Global Step: 485040 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:27,825-Speed 6304.01 samples/sec Loss 4.4142 LearningRate 0.0002 Epoch: 23 Global Step: 485050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:31,071-Speed 6311.74 samples/sec Loss 4.4087 LearningRate 0.0002 Epoch: 23 Global Step: 485060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:34,314-Speed 6314.81 samples/sec Loss 4.4423 LearningRate 0.0002 Epoch: 23 Global Step: 485070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:37,560-Speed 6311.80 samples/sec Loss 4.3595 LearningRate 0.0002 Epoch: 23 Global Step: 485080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:40,804-Speed 6314.87 samples/sec Loss 4.3960 LearningRate 0.0002 Epoch: 23 Global Step: 485090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:44,048-Speed 6314.66 samples/sec Loss 4.4401 LearningRate 0.0002 Epoch: 23 Global Step: 485100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:47,295-Speed 6307.74 samples/sec Loss 4.3824 LearningRate 0.0002 Epoch: 23 Global Step: 485110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:50,536-Speed 6320.77 samples/sec Loss 4.3901 LearningRate 0.0002 Epoch: 23 Global Step: 485120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:53,783-Speed 6309.04 samples/sec Loss 4.3401 LearningRate 0.0002 Epoch: 23 Global Step: 485130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:51:57,029-Speed 6309.98 samples/sec Loss 4.3862 LearningRate 0.0002 Epoch: 23 Global Step: 485140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:00,277-Speed 6307.56 samples/sec Loss 4.3447 LearningRate 0.0002 Epoch: 23 Global Step: 485150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:03,526-Speed 6305.92 samples/sec Loss 4.3554 LearningRate 0.0002 Epoch: 23 Global Step: 485160 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 11:52:06,772-Speed 6309.34 samples/sec Loss 4.3605 LearningRate 0.0002 Epoch: 23 Global Step: 485170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:10,018-Speed 6311.00 samples/sec Loss 4.3760 LearningRate 0.0002 Epoch: 23 Global Step: 485180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:13,272-Speed 6295.73 samples/sec Loss 4.3942 LearningRate 0.0002 Epoch: 23 Global Step: 485190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:16,533-Speed 6282.48 samples/sec Loss 4.3675 LearningRate 0.0002 Epoch: 23 Global Step: 485200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:19,779-Speed 6310.35 samples/sec Loss 4.4191 LearningRate 0.0002 Epoch: 23 Global Step: 485210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:23,031-Speed 6299.53 samples/sec Loss 4.3228 LearningRate 0.0002 Epoch: 23 Global Step: 485220 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:52:26,266-Speed 6332.89 samples/sec Loss 4.3645 LearningRate 0.0002 Epoch: 23 Global Step: 485230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:29,511-Speed 6313.03 samples/sec Loss 4.3987 LearningRate 0.0002 Epoch: 23 Global Step: 485240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:32,754-Speed 6316.21 samples/sec Loss 4.3946 LearningRate 0.0002 Epoch: 23 Global Step: 485250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:35,998-Speed 6314.91 samples/sec Loss 4.3544 LearningRate 0.0002 Epoch: 23 Global Step: 485260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:39,246-Speed 6305.87 samples/sec Loss 4.3543 LearningRate 0.0002 Epoch: 23 Global Step: 485270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:42,492-Speed 6309.98 samples/sec Loss 4.3678 LearningRate 0.0002 Epoch: 23 Global Step: 485280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
11:52:45,736-Speed 6315.70 samples/sec Loss 4.3485 LearningRate 0.0002 Epoch: 23 Global Step: 485290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:48,983-Speed 6309.03 samples/sec Loss 4.3524 LearningRate 0.0002 Epoch: 23 Global Step: 485300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:52,229-Speed 6309.13 samples/sec Loss 4.4210 LearningRate 0.0002 Epoch: 23 Global Step: 485310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:55,477-Speed 6307.82 samples/sec Loss 4.3920 LearningRate 0.0002 Epoch: 23 Global Step: 485320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:52:58,708-Speed 6339.13 samples/sec Loss 4.3913 LearningRate 0.0002 Epoch: 23 Global Step: 485330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:01,953-Speed 6314.11 samples/sec Loss 4.3436 LearningRate 0.0002 Epoch: 23 Global Step: 485340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:05,202-Speed 6303.73 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 485350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:08,447-Speed 6312.28 samples/sec Loss 4.3816 LearningRate 0.0002 Epoch: 23 Global Step: 485360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:11,695-Speed 6307.38 samples/sec Loss 4.3565 LearningRate 0.0002 Epoch: 23 Global Step: 485370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:14,957-Speed 6279.96 samples/sec Loss 4.3622 LearningRate 0.0002 Epoch: 23 Global Step: 485380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:18,202-Speed 6312.54 samples/sec Loss 4.3864 LearningRate 0.0002 Epoch: 23 Global Step: 485390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:21,449-Speed 6309.25 samples/sec Loss 4.3893 LearningRate 0.0002 Epoch: 23 Global Step: 485400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:24,691-Speed 6317.92 
samples/sec Loss 4.3153 LearningRate 0.0002 Epoch: 23 Global Step: 485410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:27,940-Speed 6306.76 samples/sec Loss 4.4127 LearningRate 0.0002 Epoch: 23 Global Step: 485420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:31,170-Speed 6341.48 samples/sec Loss 4.3818 LearningRate 0.0002 Epoch: 23 Global Step: 485430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:34,416-Speed 6309.89 samples/sec Loss 4.3701 LearningRate 0.0002 Epoch: 23 Global Step: 485440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:37,659-Speed 6316.90 samples/sec Loss 4.3502 LearningRate 0.0002 Epoch: 23 Global Step: 485450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:40,901-Speed 6318.81 samples/sec Loss 4.3728 LearningRate 0.0002 Epoch: 23 Global Step: 485460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:44,147-Speed 6309.90 samples/sec Loss 4.3063 LearningRate 0.0002 Epoch: 23 Global Step: 485470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:47,391-Speed 6314.80 samples/sec Loss 4.3378 LearningRate 0.0002 Epoch: 23 Global Step: 485480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:50,651-Speed 6284.76 samples/sec Loss 4.2904 LearningRate 0.0002 Epoch: 23 Global Step: 485490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:53,897-Speed 6309.67 samples/sec Loss 4.3230 LearningRate 0.0002 Epoch: 23 Global Step: 485500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:53:57,141-Speed 6315.68 samples/sec Loss 4.3756 LearningRate 0.0002 Epoch: 23 Global Step: 485510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:00,387-Speed 6309.39 samples/sec Loss 4.3423 LearningRate 0.0002 Epoch: 23 Global Step: 485520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:03,615-Speed 6345.89 samples/sec Loss 4.3680 
LearningRate 0.0002 Epoch: 23 Global Step: 485530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:06,862-Speed 6310.48 samples/sec Loss 4.4146 LearningRate 0.0002 Epoch: 23 Global Step: 485540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:10,109-Speed 6307.38 samples/sec Loss 4.3528 LearningRate 0.0002 Epoch: 23 Global Step: 485550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:13,353-Speed 6315.50 samples/sec Loss 4.3694 LearningRate 0.0002 Epoch: 23 Global Step: 485560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:16,602-Speed 6305.27 samples/sec Loss 4.4190 LearningRate 0.0002 Epoch: 23 Global Step: 485570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:19,845-Speed 6315.23 samples/sec Loss 4.3727 LearningRate 0.0002 Epoch: 23 Global Step: 485580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:23,092-Speed 6309.96 samples/sec Loss 4.2971 LearningRate 0.0002 Epoch: 23 Global Step: 485590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:26,336-Speed 6314.16 samples/sec Loss 4.3483 LearningRate 0.0002 Epoch: 23 Global Step: 485600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:29,582-Speed 6310.86 samples/sec Loss 4.4002 LearningRate 0.0002 Epoch: 23 Global Step: 485610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:32,828-Speed 6310.64 samples/sec Loss 4.3685 LearningRate 0.0002 Epoch: 23 Global Step: 485620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:36,062-Speed 6333.54 samples/sec Loss 4.4415 LearningRate 0.0002 Epoch: 23 Global Step: 485630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:39,309-Speed 6309.80 samples/sec Loss 4.4308 LearningRate 0.0002 Epoch: 23 Global Step: 485640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:42,589-Speed 6245.94 samples/sec Loss 4.3883 LearningRate 0.0002 Epoch: 23 
Global Step: 485650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:45,871-Speed 6241.20 samples/sec Loss 4.3666 LearningRate 0.0002 Epoch: 23 Global Step: 485660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:49,119-Speed 6306.99 samples/sec Loss 4.3778 LearningRate 0.0002 Epoch: 23 Global Step: 485670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:52,374-Speed 6293.22 samples/sec Loss 4.3587 LearningRate 0.0002 Epoch: 23 Global Step: 485680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:55,619-Speed 6311.89 samples/sec Loss 4.3393 LearningRate 0.0002 Epoch: 23 Global Step: 485690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:54:58,865-Speed 6310.79 samples/sec Loss 4.3434 LearningRate 0.0002 Epoch: 23 Global Step: 485700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:02,112-Speed 6310.02 samples/sec Loss 4.3721 LearningRate 0.0002 Epoch: 23 Global Step: 485710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:05,357-Speed 6312.73 samples/sec Loss 4.4527 LearningRate 0.0002 Epoch: 23 Global Step: 485720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:08,592-Speed 6332.30 samples/sec Loss 4.3282 LearningRate 0.0002 Epoch: 23 Global Step: 485730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:11,835-Speed 6315.35 samples/sec Loss 4.4439 LearningRate 0.0002 Epoch: 23 Global Step: 485740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:15,081-Speed 6311.82 samples/sec Loss 4.4207 LearningRate 0.0002 Epoch: 23 Global Step: 485750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:18,330-Speed 6305.22 samples/sec Loss 4.3800 LearningRate 0.0002 Epoch: 23 Global Step: 485760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:21,571-Speed 6318.41 samples/sec Loss 4.3703 LearningRate 0.0002 Epoch: 23 Global Step: 485770 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:24,818-Speed 6308.69 samples/sec Loss 4.3430 LearningRate 0.0002 Epoch: 23 Global Step: 485780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:28,067-Speed 6305.94 samples/sec Loss 4.3491 LearningRate 0.0002 Epoch: 23 Global Step: 485790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:31,317-Speed 6303.45 samples/sec Loss 4.3576 LearningRate 0.0002 Epoch: 23 Global Step: 485800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:34,561-Speed 6313.44 samples/sec Loss 4.3017 LearningRate 0.0002 Epoch: 23 Global Step: 485810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:37,810-Speed 6306.10 samples/sec Loss 4.3656 LearningRate 0.0002 Epoch: 23 Global Step: 485820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:41,061-Speed 6300.67 samples/sec Loss 4.4154 LearningRate 0.0002 Epoch: 23 Global Step: 485830 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:55:44,294-Speed 6337.10 samples/sec Loss 4.3376 LearningRate 0.0002 Epoch: 23 Global Step: 485840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:47,540-Speed 6309.71 samples/sec Loss 4.3636 LearningRate 0.0002 Epoch: 23 Global Step: 485850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:50,786-Speed 6310.89 samples/sec Loss 4.3246 LearningRate 0.0002 Epoch: 23 Global Step: 485860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:54,026-Speed 6322.71 samples/sec Loss 4.3518 LearningRate 0.0002 Epoch: 23 Global Step: 485870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:55:57,271-Speed 6312.00 samples/sec Loss 4.3417 LearningRate 0.0002 Epoch: 23 Global Step: 485880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:00,519-Speed 6307.17 samples/sec Loss 4.3398 LearningRate 0.0002 Epoch: 23 Global Step: 485890 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 11:56:03,778-Speed 6285.69 samples/sec Loss 4.4425 LearningRate 0.0002 Epoch: 23 Global Step: 485900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:07,028-Speed 6302.79 samples/sec Loss 4.3617 LearningRate 0.0002 Epoch: 23 Global Step: 485910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:10,288-Speed 6283.11 samples/sec Loss 4.3761 LearningRate 0.0002 Epoch: 23 Global Step: 485920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:13,540-Speed 6299.02 samples/sec Loss 4.4018 LearningRate 0.0002 Epoch: 23 Global Step: 485930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:16,781-Speed 6321.93 samples/sec Loss 4.3573 LearningRate 0.0002 Epoch: 23 Global Step: 485940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:20,032-Speed 6299.62 samples/sec Loss 4.3457 LearningRate 0.0002 Epoch: 23 Global Step: 485950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:23,299-Speed 6271.15 samples/sec Loss 4.4120 LearningRate 0.0002 Epoch: 23 Global Step: 485960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:26,555-Speed 6290.81 samples/sec Loss 4.3742 LearningRate 0.0002 Epoch: 23 Global Step: 485970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:29,804-Speed 6305.80 samples/sec Loss 4.3506 LearningRate 0.0002 Epoch: 23 Global Step: 485980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:33,055-Speed 6300.54 samples/sec Loss 4.4544 LearningRate 0.0002 Epoch: 23 Global Step: 485990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:36,300-Speed 6311.98 samples/sec Loss 4.3178 LearningRate 0.0002 Epoch: 23 Global Step: 486000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:39,548-Speed 6307.90 samples/sec Loss 4.3652 LearningRate 0.0002 Epoch: 23 Global Step: 486010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
11:56:42,793-Speed 6312.60 samples/sec Loss 4.3618 LearningRate 0.0002 Epoch: 23 Global Step: 486020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:46,048-Speed 6292.87 samples/sec Loss 4.3416 LearningRate 0.0002 Epoch: 23 Global Step: 486030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:49,288-Speed 6322.98 samples/sec Loss 4.3368 LearningRate 0.0002 Epoch: 23 Global Step: 486040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:52,533-Speed 6312.07 samples/sec Loss 4.4172 LearningRate 0.0002 Epoch: 23 Global Step: 486050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:55,789-Speed 6292.53 samples/sec Loss 4.3725 LearningRate 0.0002 Epoch: 23 Global Step: 486060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:56:59,043-Speed 6294.35 samples/sec Loss 4.3588 LearningRate 0.0002 Epoch: 23 Global Step: 486070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:02,290-Speed 6310.06 samples/sec Loss 4.3581 LearningRate 0.0002 Epoch: 23 Global Step: 486080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:05,537-Speed 6308.34 samples/sec Loss 4.3055 LearningRate 0.0002 Epoch: 23 Global Step: 486090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:08,787-Speed 6303.75 samples/sec Loss 4.2903 LearningRate 0.0002 Epoch: 23 Global Step: 486100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:12,028-Speed 6318.84 samples/sec Loss 4.3661 LearningRate 0.0002 Epoch: 23 Global Step: 486110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:15,288-Speed 6283.94 samples/sec Loss 4.3439 LearningRate 0.0002 Epoch: 23 Global Step: 486120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:18,532-Speed 6314.72 samples/sec Loss 4.3406 LearningRate 0.0002 Epoch: 23 Global Step: 486130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:21,780-Speed 6307.47 
samples/sec Loss 4.3116 LearningRate 0.0002 Epoch: 23 Global Step: 486140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:25,027-Speed 6308.77 samples/sec Loss 4.3528 LearningRate 0.0002 Epoch: 23 Global Step: 486150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:28,276-Speed 6304.81 samples/sec Loss 4.3823 LearningRate 0.0002 Epoch: 23 Global Step: 486160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:31,547-Speed 6262.00 samples/sec Loss 4.3947 LearningRate 0.0002 Epoch: 23 Global Step: 486170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:34,821-Speed 6255.69 samples/sec Loss 4.3244 LearningRate 0.0002 Epoch: 23 Global Step: 486180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:38,071-Speed 6303.80 samples/sec Loss 4.3343 LearningRate 0.0002 Epoch: 23 Global Step: 486190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:41,340-Speed 6282.39 samples/sec Loss 4.3707 LearningRate 0.0002 Epoch: 23 Global Step: 486200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:44,586-Speed 6311.51 samples/sec Loss 4.3414 LearningRate 0.0002 Epoch: 23 Global Step: 486210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:47,837-Speed 6301.22 samples/sec Loss 4.4200 LearningRate 0.0002 Epoch: 23 Global Step: 486220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:51,077-Speed 6321.39 samples/sec Loss 4.3912 LearningRate 0.0002 Epoch: 23 Global Step: 486230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:54,314-Speed 6329.00 samples/sec Loss 4.3525 LearningRate 0.0002 Epoch: 23 Global Step: 486240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:57:57,557-Speed 6316.97 samples/sec Loss 4.3990 LearningRate 0.0002 Epoch: 23 Global Step: 486250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:00,810-Speed 6296.60 samples/sec Loss 4.4227 
LearningRate 0.0002 Epoch: 23 Global Step: 486260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:04,068-Speed 6287.81 samples/sec Loss 4.3359 LearningRate 0.0002 Epoch: 23 Global Step: 486270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:07,330-Speed 6279.66 samples/sec Loss 4.3082 LearningRate 0.0002 Epoch: 23 Global Step: 486280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:10,579-Speed 6305.95 samples/sec Loss 4.3413 LearningRate 0.0002 Epoch: 23 Global Step: 486290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:13,828-Speed 6305.51 samples/sec Loss 4.4707 LearningRate 0.0002 Epoch: 23 Global Step: 486300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:17,076-Speed 6304.95 samples/sec Loss 4.3710 LearningRate 0.0002 Epoch: 23 Global Step: 486310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:20,327-Speed 6301.33 samples/sec Loss 4.3446 LearningRate 0.0002 Epoch: 23 Global Step: 486320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:23,571-Speed 6315.07 samples/sec Loss 4.3277 LearningRate 0.0002 Epoch: 23 Global Step: 486330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:26,830-Speed 6285.91 samples/sec Loss 4.3550 LearningRate 0.0002 Epoch: 23 Global Step: 486340 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 11:58:30,077-Speed 6308.00 samples/sec Loss 4.3469 LearningRate 0.0002 Epoch: 23 Global Step: 486350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:33,328-Speed 6302.00 samples/sec Loss 4.4319 LearningRate 0.0002 Epoch: 23 Global Step: 486360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:36,575-Speed 6309.00 samples/sec Loss 4.3449 LearningRate 0.0002 Epoch: 23 Global Step: 486370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:39,822-Speed 6307.20 samples/sec Loss 4.3365 LearningRate 0.0002 Epoch: 23 
Global Step: 486380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:43,090-Speed 6268.43 samples/sec Loss 4.3742 LearningRate 0.0002 Epoch: 23 Global Step: 486390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:46,335-Speed 6312.61 samples/sec Loss 4.3614 LearningRate 0.0002 Epoch: 23 Global Step: 486400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:49,580-Speed 6312.76 samples/sec Loss 4.3896 LearningRate 0.0002 Epoch: 23 Global Step: 486410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:52,822-Speed 6318.14 samples/sec Loss 4.3719 LearningRate 0.0002 Epoch: 23 Global Step: 486420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:56,069-Speed 6309.15 samples/sec Loss 4.3624 LearningRate 0.0002 Epoch: 23 Global Step: 486430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:58:59,317-Speed 6306.82 samples/sec Loss 4.3391 LearningRate 0.0002 Epoch: 23 Global Step: 486440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:02,549-Speed 6339.23 samples/sec Loss 4.2996 LearningRate 0.0002 Epoch: 23 Global Step: 486450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:05,826-Speed 6249.72 samples/sec Loss 4.3544 LearningRate 0.0002 Epoch: 23 Global Step: 486460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:09,073-Speed 6310.33 samples/sec Loss 4.3262 LearningRate 0.0002 Epoch: 23 Global Step: 486470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:12,319-Speed 6311.19 samples/sec Loss 4.3519 LearningRate 0.0002 Epoch: 23 Global Step: 486480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:15,562-Speed 6315.18 samples/sec Loss 4.3408 LearningRate 0.0002 Epoch: 23 Global Step: 486490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:18,809-Speed 6309.41 samples/sec Loss 4.4221 LearningRate 0.0002 Epoch: 23 Global Step: 486500 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:22,064-Speed 6293.38 samples/sec Loss 4.4055 LearningRate 0.0002 Epoch: 23 Global Step: 486510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:25,312-Speed 6306.27 samples/sec Loss 4.3750 LearningRate 0.0002 Epoch: 23 Global Step: 486520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:28,564-Speed 6300.23 samples/sec Loss 4.3556 LearningRate 0.0002 Epoch: 23 Global Step: 486530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:31,813-Speed 6304.33 samples/sec Loss 4.3638 LearningRate 0.0002 Epoch: 23 Global Step: 486540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:35,043-Speed 6342.94 samples/sec Loss 4.4084 LearningRate 0.0002 Epoch: 23 Global Step: 486550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:38,288-Speed 6311.92 samples/sec Loss 4.4125 LearningRate 0.0002 Epoch: 23 Global Step: 486560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:41,530-Speed 6317.93 samples/sec Loss 4.3737 LearningRate 0.0002 Epoch: 23 Global Step: 486570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:44,777-Speed 6308.68 samples/sec Loss 4.3391 LearningRate 0.0002 Epoch: 23 Global Step: 486580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:48,055-Speed 6250.74 samples/sec Loss 4.3352 LearningRate 0.0002 Epoch: 23 Global Step: 486590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:51,333-Speed 6248.30 samples/sec Loss 4.3288 LearningRate 0.0002 Epoch: 23 Global Step: 486600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:54,579-Speed 6310.84 samples/sec Loss 4.4324 LearningRate 0.0002 Epoch: 23 Global Step: 486610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 11:59:57,825-Speed 6310.34 samples/sec Loss 4.3557 LearningRate 0.0002 Epoch: 23 Global Step: 486620 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:00:01,071-Speed 6312.03 samples/sec Loss 4.3261 LearningRate 0.0002 Epoch: 23 Global Step: 486630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:04,319-Speed 6306.39 samples/sec Loss 4.3220 LearningRate 0.0002 Epoch: 23 Global Step: 486640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:07,553-Speed 6333.59 samples/sec Loss 4.3649 LearningRate 0.0002 Epoch: 23 Global Step: 486650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:10,803-Speed 6303.70 samples/sec Loss 4.4131 LearningRate 0.0002 Epoch: 23 Global Step: 486660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:14,049-Speed 6309.29 samples/sec Loss 4.3727 LearningRate 0.0002 Epoch: 23 Global Step: 486670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:17,304-Speed 6295.37 samples/sec Loss 4.3818 LearningRate 0.0002 Epoch: 23 Global Step: 486680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:20,549-Speed 6312.48 samples/sec Loss 4.3536 LearningRate 0.0002 Epoch: 23 Global Step: 486690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:23,799-Speed 6302.74 samples/sec Loss 4.3351 LearningRate 0.0002 Epoch: 23 Global Step: 486700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:27,058-Speed 6285.72 samples/sec Loss 4.3559 LearningRate 0.0002 Epoch: 23 Global Step: 486710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:30,306-Speed 6307.60 samples/sec Loss 4.3516 LearningRate 0.0002 Epoch: 23 Global Step: 486720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:33,550-Speed 6313.48 samples/sec Loss 4.4339 LearningRate 0.0002 Epoch: 23 Global Step: 486730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:36,794-Speed 6315.56 samples/sec Loss 4.3190 LearningRate 0.0002 Epoch: 23 Global Step: 486740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:00:40,025-Speed 6339.44 samples/sec Loss 4.3512 LearningRate 0.0002 Epoch: 23 Global Step: 486750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:43,270-Speed 6313.03 samples/sec Loss 4.3449 LearningRate 0.0002 Epoch: 23 Global Step: 486760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:46,518-Speed 6306.52 samples/sec Loss 4.3827 LearningRate 0.0002 Epoch: 23 Global Step: 486770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:49,768-Speed 6301.80 samples/sec Loss 4.3424 LearningRate 0.0002 Epoch: 23 Global Step: 486780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:53,013-Speed 6313.86 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 486790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:56,259-Speed 6310.33 samples/sec Loss 4.3867 LearningRate 0.0002 Epoch: 23 Global Step: 486800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:00:59,505-Speed 6310.22 samples/sec Loss 4.3905 LearningRate 0.0002 Epoch: 23 Global Step: 486810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:02,752-Speed 6309.74 samples/sec Loss 4.3633 LearningRate 0.0002 Epoch: 23 Global Step: 486820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:05,998-Speed 6310.09 samples/sec Loss 4.4013 LearningRate 0.0002 Epoch: 23 Global Step: 486830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:09,244-Speed 6309.92 samples/sec Loss 4.3452 LearningRate 0.0002 Epoch: 23 Global Step: 486840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:12,493-Speed 6306.39 samples/sec Loss 4.3842 LearningRate 0.0002 Epoch: 23 Global Step: 486850 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:01:15,722-Speed 6343.14 samples/sec Loss 4.3586 LearningRate 0.0002 Epoch: 23 Global Step: 486860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:18,971-Speed 6304.56 
samples/sec Loss 4.3618 LearningRate 0.0002 Epoch: 23 Global Step: 486870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:22,217-Speed 6310.05 samples/sec Loss 4.4625 LearningRate 0.0002 Epoch: 23 Global Step: 486880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:25,464-Speed 6310.87 samples/sec Loss 4.3579 LearningRate 0.0002 Epoch: 23 Global Step: 486890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:28,723-Speed 6286.31 samples/sec Loss 4.4506 LearningRate 0.0002 Epoch: 23 Global Step: 486900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:31,970-Speed 6307.60 samples/sec Loss 4.3664 LearningRate 0.0002 Epoch: 23 Global Step: 486910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:35,221-Speed 6301.34 samples/sec Loss 4.3663 LearningRate 0.0002 Epoch: 23 Global Step: 486920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:38,466-Speed 6312.99 samples/sec Loss 4.3411 LearningRate 0.0002 Epoch: 23 Global Step: 486930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:41,710-Speed 6313.50 samples/sec Loss 4.4101 LearningRate 0.0002 Epoch: 23 Global Step: 486940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:44,958-Speed 6308.00 samples/sec Loss 4.2850 LearningRate 0.0002 Epoch: 23 Global Step: 486950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:48,193-Speed 6331.22 samples/sec Loss 4.3539 LearningRate 0.0002 Epoch: 23 Global Step: 486960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:51,445-Speed 6299.69 samples/sec Loss 4.3516 LearningRate 0.0002 Epoch: 23 Global Step: 486970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:54,690-Speed 6311.58 samples/sec Loss 4.4026 LearningRate 0.0002 Epoch: 23 Global Step: 486980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:01:57,946-Speed 6291.39 samples/sec Loss 4.3770 
LearningRate 0.0002 Epoch: 23 Global Step: 486990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:01,199-Speed 6298.83 samples/sec Loss 4.3690 LearningRate 0.0002 Epoch: 23 Global Step: 487000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:04,448-Speed 6303.29 samples/sec Loss 4.4057 LearningRate 0.0002 Epoch: 23 Global Step: 487010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:07,696-Speed 6307.66 samples/sec Loss 4.3340 LearningRate 0.0002 Epoch: 23 Global Step: 487020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:11,031-Speed 6141.39 samples/sec Loss 4.3829 LearningRate 0.0002 Epoch: 23 Global Step: 487030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:14,397-Speed 6086.24 samples/sec Loss 4.3477 LearningRate 0.0002 Epoch: 23 Global Step: 487040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:17,689-Speed 6222.43 samples/sec Loss 4.3591 LearningRate 0.0002 Epoch: 23 Global Step: 487050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:20,920-Speed 6339.23 samples/sec Loss 4.3649 LearningRate 0.0002 Epoch: 23 Global Step: 487060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:24,166-Speed 6310.92 samples/sec Loss 4.3717 LearningRate 0.0002 Epoch: 23 Global Step: 487070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:27,412-Speed 6310.78 samples/sec Loss 4.3637 LearningRate 0.0002 Epoch: 23 Global Step: 487080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:30,658-Speed 6311.98 samples/sec Loss 4.3770 LearningRate 0.0002 Epoch: 23 Global Step: 487090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:33,905-Speed 6308.10 samples/sec Loss 4.3696 LearningRate 0.0002 Epoch: 23 Global Step: 487100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:37,150-Speed 6312.90 samples/sec Loss 4.3489 LearningRate 0.0002 Epoch: 23 
Global Step: 487110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:40,395-Speed 6313.11 samples/sec Loss 4.3797 LearningRate 0.0002 Epoch: 23 Global Step: 487120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:43,642-Speed 6309.38 samples/sec Loss 4.3033 LearningRate 0.0002 Epoch: 23 Global Step: 487130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:46,884-Speed 6318.17 samples/sec Loss 4.3304 LearningRate 0.0002 Epoch: 23 Global Step: 487140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:50,131-Speed 6309.39 samples/sec Loss 4.3678 LearningRate 0.0002 Epoch: 23 Global Step: 487150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:53,361-Speed 6341.30 samples/sec Loss 4.3772 LearningRate 0.0002 Epoch: 23 Global Step: 487160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:56,626-Speed 6274.14 samples/sec Loss 4.3109 LearningRate 0.0002 Epoch: 23 Global Step: 487170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:02:59,871-Speed 6312.83 samples/sec Loss 4.3990 LearningRate 0.0002 Epoch: 23 Global Step: 487180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:03,119-Speed 6305.73 samples/sec Loss 4.3010 LearningRate 0.0002 Epoch: 23 Global Step: 487190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:06,364-Speed 6315.13 samples/sec Loss 4.2882 LearningRate 0.0002 Epoch: 23 Global Step: 487200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:09,608-Speed 6314.22 samples/sec Loss 4.3455 LearningRate 0.0002 Epoch: 23 Global Step: 487210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:12,848-Speed 6322.40 samples/sec Loss 4.4003 LearningRate 0.0002 Epoch: 23 Global Step: 487220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:16,098-Speed 6302.42 samples/sec Loss 4.3534 LearningRate 0.0002 Epoch: 23 Global Step: 487230 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:19,349-Speed 6300.59 samples/sec Loss 4.3111 LearningRate 0.0002 Epoch: 23 Global Step: 487240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:22,590-Speed 6320.27 samples/sec Loss 4.2921 LearningRate 0.0002 Epoch: 23 Global Step: 487250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:25,822-Speed 6338.24 samples/sec Loss 4.3955 LearningRate 0.0002 Epoch: 23 Global Step: 487260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:29,067-Speed 6312.86 samples/sec Loss 4.3255 LearningRate 0.0002 Epoch: 23 Global Step: 487270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:32,310-Speed 6316.78 samples/sec Loss 4.3672 LearningRate 0.0002 Epoch: 23 Global Step: 487280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:35,560-Speed 6303.54 samples/sec Loss 4.3584 LearningRate 0.0002 Epoch: 23 Global Step: 487290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:38,804-Speed 6313.61 samples/sec Loss 4.3268 LearningRate 0.0002 Epoch: 23 Global Step: 487300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:42,060-Speed 6292.72 samples/sec Loss 4.3206 LearningRate 0.0002 Epoch: 23 Global Step: 487310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:45,303-Speed 6315.94 samples/sec Loss 4.3680 LearningRate 0.0002 Epoch: 23 Global Step: 487320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:48,548-Speed 6313.63 samples/sec Loss 4.3735 LearningRate 0.0002 Epoch: 23 Global Step: 487330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:51,793-Speed 6312.28 samples/sec Loss 4.4443 LearningRate 0.0002 Epoch: 23 Global Step: 487340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:03:55,041-Speed 6306.07 samples/sec Loss 4.4111 LearningRate 0.0002 Epoch: 23 Global Step: 487350 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:03:58,274-Speed 6337.46 samples/sec Loss 4.4241 LearningRate 0.0002 Epoch: 23 Global Step: 487360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:01,527-Speed 6297.04 samples/sec Loss 4.3957 LearningRate 0.0002 Epoch: 23 Global Step: 487370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:04,776-Speed 6303.42 samples/sec Loss 4.3073 LearningRate 0.0002 Epoch: 23 Global Step: 487380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:08,021-Speed 6312.96 samples/sec Loss 4.3011 LearningRate 0.0002 Epoch: 23 Global Step: 487390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:11,266-Speed 6313.25 samples/sec Loss 4.3871 LearningRate 0.0002 Epoch: 23 Global Step: 487400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:14,514-Speed 6306.54 samples/sec Loss 4.3267 LearningRate 0.0002 Epoch: 23 Global Step: 487410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:17,758-Speed 6314.86 samples/sec Loss 4.3719 LearningRate 0.0002 Epoch: 23 Global Step: 487420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:21,001-Speed 6316.35 samples/sec Loss 4.2568 LearningRate 0.0002 Epoch: 23 Global Step: 487430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:24,244-Speed 6316.48 samples/sec Loss 4.3432 LearningRate 0.0002 Epoch: 23 Global Step: 487440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:27,490-Speed 6310.64 samples/sec Loss 4.3329 LearningRate 0.0002 Epoch: 23 Global Step: 487450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:30,735-Speed 6312.02 samples/sec Loss 4.4451 LearningRate 0.0002 Epoch: 23 Global Step: 487460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:33,986-Speed 6302.48 samples/sec Loss 4.2909 LearningRate 0.0002 Epoch: 23 Global Step: 487470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:04:37,236-Speed 6304.45 samples/sec Loss 4.3165 LearningRate 0.0002 Epoch: 23 Global Step: 487480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:40,483-Speed 6309.39 samples/sec Loss 4.3189 LearningRate 0.0002 Epoch: 23 Global Step: 487490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:43,729-Speed 6309.22 samples/sec Loss 4.3672 LearningRate 0.0002 Epoch: 23 Global Step: 487500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:46,975-Speed 6310.78 samples/sec Loss 4.3697 LearningRate 0.0002 Epoch: 23 Global Step: 487510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:50,224-Speed 6306.32 samples/sec Loss 4.2823 LearningRate 0.0002 Epoch: 23 Global Step: 487520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:53,470-Speed 6310.60 samples/sec Loss 4.3143 LearningRate 0.0002 Epoch: 23 Global Step: 487530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:56,715-Speed 6313.63 samples/sec Loss 4.3993 LearningRate 0.0002 Epoch: 23 Global Step: 487540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:04:59,980-Speed 6274.64 samples/sec Loss 4.3453 LearningRate 0.0002 Epoch: 23 Global Step: 487550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:03,227-Speed 6308.36 samples/sec Loss 4.3689 LearningRate 0.0002 Epoch: 23 Global Step: 487560 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:05:06,462-Speed 6336.20 samples/sec Loss 4.3708 LearningRate 0.0002 Epoch: 23 Global Step: 487570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:09,708-Speed 6311.34 samples/sec Loss 4.3027 LearningRate 0.0002 Epoch: 23 Global Step: 487580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:12,952-Speed 6313.55 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 487590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:16,197-Speed 6312.94 
samples/sec Loss 4.3859 LearningRate 0.0002 Epoch: 23 Global Step: 487600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:19,444-Speed 6307.76 samples/sec Loss 4.3503 LearningRate 0.0002 Epoch: 23 Global Step: 487610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:22,688-Speed 6315.89 samples/sec Loss 4.3087 LearningRate 0.0002 Epoch: 23 Global Step: 487620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:25,931-Speed 6315.50 samples/sec Loss 4.3868 LearningRate 0.0002 Epoch: 23 Global Step: 487630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:29,177-Speed 6312.28 samples/sec Loss 4.3292 LearningRate 0.0002 Epoch: 23 Global Step: 487640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:32,419-Speed 6318.03 samples/sec Loss 4.3236 LearningRate 0.0002 Epoch: 23 Global Step: 487650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:35,664-Speed 6312.91 samples/sec Loss 4.3098 LearningRate 0.0002 Epoch: 23 Global Step: 487660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:38,906-Speed 6318.81 samples/sec Loss 4.3172 LearningRate 0.0002 Epoch: 23 Global Step: 487670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:42,152-Speed 6309.08 samples/sec Loss 4.3574 LearningRate 0.0002 Epoch: 23 Global Step: 487680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:45,398-Speed 6311.17 samples/sec Loss 4.3927 LearningRate 0.0002 Epoch: 23 Global Step: 487690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:48,645-Speed 6309.02 samples/sec Loss 4.3162 LearningRate 0.0002 Epoch: 23 Global Step: 487700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:51,892-Speed 6308.52 samples/sec Loss 4.2941 LearningRate 0.0002 Epoch: 23 Global Step: 487710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:55,136-Speed 6314.01 samples/sec Loss 4.2911 
LearningRate 0.0002 Epoch: 23 Global Step: 487720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:05:58,384-Speed 6308.03 samples/sec Loss 4.4047 LearningRate 0.0002 Epoch: 23 Global Step: 487730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:01,631-Speed 6309.67 samples/sec Loss 4.3406 LearningRate 0.0002 Epoch: 23 Global Step: 487740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:04,878-Speed 6308.15 samples/sec Loss 4.3824 LearningRate 0.0002 Epoch: 23 Global Step: 487750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:08,135-Speed 6289.58 samples/sec Loss 4.3702 LearningRate 0.0002 Epoch: 23 Global Step: 487760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:11,366-Speed 6340.91 samples/sec Loss 4.3319 LearningRate 0.0002 Epoch: 23 Global Step: 487770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:14,633-Speed 6268.46 samples/sec Loss 4.2968 LearningRate 0.0002 Epoch: 23 Global Step: 487780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:17,880-Speed 6308.87 samples/sec Loss 4.3644 LearningRate 0.0002 Epoch: 23 Global Step: 487790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:21,125-Speed 6313.25 samples/sec Loss 4.3407 LearningRate 0.0002 Epoch: 23 Global Step: 487800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:24,367-Speed 6317.75 samples/sec Loss 4.3783 LearningRate 0.0002 Epoch: 23 Global Step: 487810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:27,622-Speed 6293.75 samples/sec Loss 4.3900 LearningRate 0.0002 Epoch: 23 Global Step: 487820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:30,869-Speed 6308.33 samples/sec Loss 4.3200 LearningRate 0.0002 Epoch: 23 Global Step: 487830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:34,114-Speed 6312.84 samples/sec Loss 4.3927 LearningRate 0.0002 Epoch: 23 
Global Step: 487840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:37,362-Speed 6307.78 samples/sec Loss 4.3200 LearningRate 0.0002 Epoch: 23 Global Step: 487850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:40,610-Speed 6306.28 samples/sec Loss 4.3830 LearningRate 0.0002 Epoch: 23 Global Step: 487860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:43,840-Speed 6340.99 samples/sec Loss 4.3529 LearningRate 0.0002 Epoch: 23 Global Step: 487870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:47,094-Speed 6295.64 samples/sec Loss 4.3630 LearningRate 0.0002 Epoch: 23 Global Step: 487880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:50,339-Speed 6313.23 samples/sec Loss 4.3471 LearningRate 0.0002 Epoch: 23 Global Step: 487890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:53,588-Speed 6305.24 samples/sec Loss 4.3731 LearningRate 0.0002 Epoch: 23 Global Step: 487900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:06:56,831-Speed 6316.07 samples/sec Loss 4.3453 LearningRate 0.0002 Epoch: 23 Global Step: 487910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:00,079-Speed 6306.56 samples/sec Loss 4.3169 LearningRate 0.0002 Epoch: 23 Global Step: 487920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:03,327-Speed 6307.18 samples/sec Loss 4.3168 LearningRate 0.0002 Epoch: 23 Global Step: 487930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:06,572-Speed 6313.86 samples/sec Loss 4.3405 LearningRate 0.0002 Epoch: 23 Global Step: 487940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:09,820-Speed 6307.29 samples/sec Loss 4.4184 LearningRate 0.0002 Epoch: 23 Global Step: 487950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:13,064-Speed 6314.49 samples/sec Loss 4.3293 LearningRate 0.0002 Epoch: 23 Global Step: 487960 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:16,312-Speed 6306.17 samples/sec Loss 4.4653 LearningRate 0.0002 Epoch: 23 Global Step: 487970 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:07:19,547-Speed 6331.87 samples/sec Loss 4.3151 LearningRate 0.0002 Epoch: 23 Global Step: 487980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:22,795-Speed 6307.99 samples/sec Loss 4.4014 LearningRate 0.0002 Epoch: 23 Global Step: 487990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:26,042-Speed 6308.21 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 488000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:29,291-Speed 6305.22 samples/sec Loss 4.3414 LearningRate 0.0002 Epoch: 23 Global Step: 488010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:32,546-Speed 6292.53 samples/sec Loss 4.4240 LearningRate 0.0002 Epoch: 23 Global Step: 488020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:35,793-Speed 6309.65 samples/sec Loss 4.3901 LearningRate 0.0002 Epoch: 23 Global Step: 488030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:39,036-Speed 6316.83 samples/sec Loss 4.3512 LearningRate 0.0002 Epoch: 23 Global Step: 488040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:42,284-Speed 6306.26 samples/sec Loss 4.4115 LearningRate 0.0002 Epoch: 23 Global Step: 488050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:45,533-Speed 6304.65 samples/sec Loss 4.3639 LearningRate 0.0002 Epoch: 23 Global Step: 488060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:48,791-Speed 6287.39 samples/sec Loss 4.3855 LearningRate 0.0002 Epoch: 23 Global Step: 488070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:52,027-Speed 6329.25 samples/sec Loss 4.3811 LearningRate 0.0002 Epoch: 23 Global Step: 488080 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:07:55,281-Speed 6296.27 samples/sec Loss 4.3880 LearningRate 0.0002 Epoch: 23 Global Step: 488090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:07:58,530-Speed 6304.51 samples/sec Loss 4.3580 LearningRate 0.0002 Epoch: 23 Global Step: 488100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:01,773-Speed 6316.85 samples/sec Loss 4.2956 LearningRate 0.0002 Epoch: 23 Global Step: 488110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:05,029-Speed 6291.42 samples/sec Loss 4.3567 LearningRate 0.0002 Epoch: 23 Global Step: 488120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:08,276-Speed 6309.01 samples/sec Loss 4.3298 LearningRate 0.0002 Epoch: 23 Global Step: 488130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:11,518-Speed 6317.55 samples/sec Loss 4.3651 LearningRate 0.0002 Epoch: 23 Global Step: 488140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:14,763-Speed 6312.71 samples/sec Loss 4.3534 LearningRate 0.0002 Epoch: 23 Global Step: 488150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:18,011-Speed 6308.50 samples/sec Loss 4.3284 LearningRate 0.0002 Epoch: 23 Global Step: 488160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:21,255-Speed 6315.19 samples/sec Loss 4.3776 LearningRate 0.0002 Epoch: 23 Global Step: 488170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:24,490-Speed 6332.18 samples/sec Loss 4.3136 LearningRate 0.0002 Epoch: 23 Global Step: 488180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:27,736-Speed 6310.00 samples/sec Loss 4.3247 LearningRate 0.0002 Epoch: 23 Global Step: 488190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:30,980-Speed 6315.16 samples/sec Loss 4.4192 LearningRate 0.0002 Epoch: 23 Global Step: 488200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:08:34,224-Speed 6314.74 samples/sec Loss 4.3460 LearningRate 0.0002 Epoch: 23 Global Step: 488210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:37,470-Speed 6311.18 samples/sec Loss 4.3969 LearningRate 0.0002 Epoch: 23 Global Step: 488220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:40,715-Speed 6311.48 samples/sec Loss 4.4415 LearningRate 0.0002 Epoch: 23 Global Step: 488230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:43,959-Speed 6314.91 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 488240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:47,206-Speed 6308.87 samples/sec Loss 4.3211 LearningRate 0.0002 Epoch: 23 Global Step: 488250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:50,451-Speed 6311.94 samples/sec Loss 4.3016 LearningRate 0.0002 Epoch: 23 Global Step: 488260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:53,696-Speed 6314.22 samples/sec Loss 4.3447 LearningRate 0.0002 Epoch: 23 Global Step: 488270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:08:56,940-Speed 6313.00 samples/sec Loss 4.2952 LearningRate 0.0002 Epoch: 23 Global Step: 488280 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:09:00,175-Speed 6333.21 samples/sec Loss 4.3356 LearningRate 0.0002 Epoch: 23 Global Step: 488290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:03,420-Speed 6312.66 samples/sec Loss 4.4583 LearningRate 0.0002 Epoch: 23 Global Step: 488300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:06,668-Speed 6307.41 samples/sec Loss 4.3390 LearningRate 0.0002 Epoch: 23 Global Step: 488310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:09,922-Speed 6293.89 samples/sec Loss 4.3059 LearningRate 0.0002 Epoch: 23 Global Step: 488320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:13,198-Speed 6254.51 
samples/sec Loss 4.3328 LearningRate 0.0002 Epoch: 23 Global Step: 488330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:16,445-Speed 6307.79 samples/sec Loss 4.3282 LearningRate 0.0002 Epoch: 23 Global Step: 488340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:19,693-Speed 6306.85 samples/sec Loss 4.4262 LearningRate 0.0002 Epoch: 23 Global Step: 488350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:22,939-Speed 6309.87 samples/sec Loss 4.3612 LearningRate 0.0002 Epoch: 23 Global Step: 488360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:26,189-Speed 6303.97 samples/sec Loss 4.3748 LearningRate 0.0002 Epoch: 23 Global Step: 488370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:29,457-Speed 6267.55 samples/sec Loss 4.3368 LearningRate 0.0002 Epoch: 23 Global Step: 488380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:32,703-Speed 6311.98 samples/sec Loss 4.3249 LearningRate 0.0002 Epoch: 23 Global Step: 488390 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:09:35,933-Speed 6341.34 samples/sec Loss 4.2904 LearningRate 0.0002 Epoch: 23 Global Step: 488400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:39,176-Speed 6317.84 samples/sec Loss 4.3437 LearningRate 0.0002 Epoch: 23 Global Step: 488410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:42,418-Speed 6318.50 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 488420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:45,664-Speed 6309.66 samples/sec Loss 4.3877 LearningRate 0.0002 Epoch: 23 Global Step: 488430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:48,979-Speed 6180.17 samples/sec Loss 4.3475 LearningRate 0.0002 Epoch: 23 Global Step: 488440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:52,290-Speed 6186.67 samples/sec Loss 4.3665 
LearningRate 0.0002 Epoch: 23 Global Step: 488450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:55,534-Speed 6314.92 samples/sec Loss 4.3881 LearningRate 0.0002 Epoch: 23 Global Step: 488460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:09:58,780-Speed 6310.51 samples/sec Loss 4.2983 LearningRate 0.0002 Epoch: 23 Global Step: 488470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:02,027-Speed 6307.24 samples/sec Loss 4.3496 LearningRate 0.0002 Epoch: 23 Global Step: 488480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:05,275-Speed 6307.22 samples/sec Loss 4.4172 LearningRate 0.0002 Epoch: 23 Global Step: 488490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:08,504-Speed 6344.46 samples/sec Loss 4.3329 LearningRate 0.0002 Epoch: 23 Global Step: 488500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:11,752-Speed 6306.75 samples/sec Loss 4.3161 LearningRate 0.0002 Epoch: 23 Global Step: 488510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:14,998-Speed 6310.23 samples/sec Loss 4.3143 LearningRate 0.0002 Epoch: 23 Global Step: 488520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:18,244-Speed 6310.65 samples/sec Loss 4.3892 LearningRate 0.0002 Epoch: 23 Global Step: 488530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:21,490-Speed 6310.58 samples/sec Loss 4.3769 LearningRate 0.0002 Epoch: 23 Global Step: 488540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:24,737-Speed 6310.28 samples/sec Loss 4.4316 LearningRate 0.0002 Epoch: 23 Global Step: 488550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:27,986-Speed 6304.87 samples/sec Loss 4.2899 LearningRate 0.0002 Epoch: 23 Global Step: 488560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:31,231-Speed 6311.55 samples/sec Loss 4.2892 LearningRate 0.0002 Epoch: 23 
Global Step: 488570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:34,473-Speed 6320.54 samples/sec Loss 4.3408 LearningRate 0.0002 Epoch: 23 Global Step: 488580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:37,720-Speed 6308.56 samples/sec Loss 4.3335 LearningRate 0.0002 Epoch: 23 Global Step: 488590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:40,953-Speed 6336.11 samples/sec Loss 4.2919 LearningRate 0.0002 Epoch: 23 Global Step: 488600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:44,200-Speed 6309.31 samples/sec Loss 4.3682 LearningRate 0.0002 Epoch: 23 Global Step: 488610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:47,449-Speed 6305.08 samples/sec Loss 4.3897 LearningRate 0.0002 Epoch: 23 Global Step: 488620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:50,693-Speed 6314.31 samples/sec Loss 4.3296 LearningRate 0.0002 Epoch: 23 Global Step: 488630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:53,939-Speed 6311.50 samples/sec Loss 4.3640 LearningRate 0.0002 Epoch: 23 Global Step: 488640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:10:57,183-Speed 6313.75 samples/sec Loss 4.3083 LearningRate 0.0002 Epoch: 23 Global Step: 488650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:00,435-Speed 6299.30 samples/sec Loss 4.3660 LearningRate 0.0002 Epoch: 23 Global Step: 488660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:03,677-Speed 6316.86 samples/sec Loss 4.3727 LearningRate 0.0002 Epoch: 23 Global Step: 488670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:06,925-Speed 6308.41 samples/sec Loss 4.2910 LearningRate 0.0002 Epoch: 23 Global Step: 488680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:10,168-Speed 6316.17 samples/sec Loss 4.2973 LearningRate 0.0002 Epoch: 23 Global Step: 488690 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:13,410-Speed 6317.71 samples/sec Loss 4.3240 LearningRate 0.0002 Epoch: 23 Global Step: 488700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:16,657-Speed 6309.37 samples/sec Loss 4.3138 LearningRate 0.0002 Epoch: 23 Global Step: 488710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:19,904-Speed 6308.84 samples/sec Loss 4.3025 LearningRate 0.0002 Epoch: 23 Global Step: 488720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:23,159-Speed 6293.11 samples/sec Loss 4.3234 LearningRate 0.0002 Epoch: 23 Global Step: 488730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:26,403-Speed 6314.98 samples/sec Loss 4.2827 LearningRate 0.0002 Epoch: 23 Global Step: 488740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:29,650-Speed 6307.26 samples/sec Loss 4.3352 LearningRate 0.0002 Epoch: 23 Global Step: 488750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:32,897-Speed 6310.20 samples/sec Loss 4.3358 LearningRate 0.0002 Epoch: 23 Global Step: 488760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:36,146-Speed 6303.51 samples/sec Loss 4.3729 LearningRate 0.0002 Epoch: 23 Global Step: 488770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:39,391-Speed 6312.59 samples/sec Loss 4.2809 LearningRate 0.0002 Epoch: 23 Global Step: 488780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:42,638-Speed 6309.79 samples/sec Loss 4.3488 LearningRate 0.0002 Epoch: 23 Global Step: 488790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:45,872-Speed 6335.25 samples/sec Loss 4.3099 LearningRate 0.0002 Epoch: 23 Global Step: 488800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:49,122-Speed 6302.31 samples/sec Loss 4.3844 LearningRate 0.0002 Epoch: 23 Global Step: 488810 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:11:52,392-Speed 6265.57 samples/sec Loss 4.3933 LearningRate 0.0002 Epoch: 23 Global Step: 488820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:55,640-Speed 6305.98 samples/sec Loss 4.3975 LearningRate 0.0002 Epoch: 23 Global Step: 488830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:11:58,927-Speed 6231.39 samples/sec Loss 4.3711 LearningRate 0.0002 Epoch: 23 Global Step: 488840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:02,176-Speed 6306.22 samples/sec Loss 4.3725 LearningRate 0.0002 Epoch: 23 Global Step: 488850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:05,425-Speed 6305.01 samples/sec Loss 4.2802 LearningRate 0.0002 Epoch: 23 Global Step: 488860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:08,669-Speed 6314.56 samples/sec Loss 4.3701 LearningRate 0.0002 Epoch: 23 Global Step: 488870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:11,913-Speed 6313.50 samples/sec Loss 4.3291 LearningRate 0.0002 Epoch: 23 Global Step: 488880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:15,175-Speed 6279.26 samples/sec Loss 4.3748 LearningRate 0.0002 Epoch: 23 Global Step: 488890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:18,423-Speed 6306.69 samples/sec Loss 4.3642 LearningRate 0.0002 Epoch: 23 Global Step: 488900 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:12:21,655-Speed 6338.73 samples/sec Loss 4.3235 LearningRate 0.0002 Epoch: 23 Global Step: 488910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:24,901-Speed 6310.51 samples/sec Loss 4.3105 LearningRate 0.0002 Epoch: 23 Global Step: 488920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:28,150-Speed 6304.60 samples/sec Loss 4.3519 LearningRate 0.0002 Epoch: 23 Global Step: 488930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:12:31,396-Speed 6312.04 samples/sec Loss 4.3764 LearningRate 0.0002 Epoch: 23 Global Step: 488940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:34,642-Speed 6310.11 samples/sec Loss 4.3218 LearningRate 0.0002 Epoch: 23 Global Step: 488950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:37,889-Speed 6308.92 samples/sec Loss 4.3681 LearningRate 0.0002 Epoch: 23 Global Step: 488960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:41,138-Speed 6304.73 samples/sec Loss 4.2915 LearningRate 0.0002 Epoch: 23 Global Step: 488970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:44,382-Speed 6314.58 samples/sec Loss 4.3821 LearningRate 0.0002 Epoch: 23 Global Step: 488980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:47,684-Speed 6202.67 samples/sec Loss 4.3075 LearningRate 0.0002 Epoch: 23 Global Step: 488990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:51,029-Speed 6123.69 samples/sec Loss 4.2870 LearningRate 0.0002 Epoch: 23 Global Step: 489000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:54,310-Speed 6245.25 samples/sec Loss 4.4031 LearningRate 0.0002 Epoch: 23 Global Step: 489010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:12:57,556-Speed 6309.80 samples/sec Loss 4.3266 LearningRate 0.0002 Epoch: 23 Global Step: 489020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:00,804-Speed 6307.33 samples/sec Loss 4.3428 LearningRate 0.0002 Epoch: 23 Global Step: 489030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:04,051-Speed 6309.13 samples/sec Loss 4.3485 LearningRate 0.0002 Epoch: 23 Global Step: 489040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:07,296-Speed 6312.42 samples/sec Loss 4.3841 LearningRate 0.0002 Epoch: 23 Global Step: 489050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:10,538-Speed 6318.30 
samples/sec Loss 4.3967 LearningRate 0.0002 Epoch: 23 Global Step: 489060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:13,786-Speed 6308.40 samples/sec Loss 4.4204 LearningRate 0.0002 Epoch: 23 Global Step: 489070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:17,029-Speed 6315.78 samples/sec Loss 4.2688 LearningRate 0.0002 Epoch: 23 Global Step: 489080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:20,280-Speed 6301.59 samples/sec Loss 4.3425 LearningRate 0.0002 Epoch: 23 Global Step: 489090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:23,528-Speed 6305.24 samples/sec Loss 4.2623 LearningRate 0.0002 Epoch: 23 Global Step: 489100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:26,785-Speed 6289.68 samples/sec Loss 4.3390 LearningRate 0.0002 Epoch: 23 Global Step: 489110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:30,038-Speed 6296.87 samples/sec Loss 4.3455 LearningRate 0.0002 Epoch: 23 Global Step: 489120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:33,288-Speed 6304.52 samples/sec Loss 4.3053 LearningRate 0.0002 Epoch: 23 Global Step: 489130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:36,534-Speed 6310.72 samples/sec Loss 4.3468 LearningRate 0.0002 Epoch: 23 Global Step: 489140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:39,782-Speed 6305.87 samples/sec Loss 4.3724 LearningRate 0.0002 Epoch: 23 Global Step: 489150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:43,024-Speed 6318.05 samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 23 Global Step: 489160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:46,270-Speed 6311.55 samples/sec Loss 4.3524 LearningRate 0.0002 Epoch: 23 Global Step: 489170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:49,518-Speed 6306.33 samples/sec Loss 4.3793 
LearningRate 0.0002 Epoch: 23 Global Step: 489180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:52,765-Speed 6308.66 samples/sec Loss 4.3373 LearningRate 0.0002 Epoch: 23 Global Step: 489190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:56,008-Speed 6316.63 samples/sec Loss 4.3405 LearningRate 0.0002 Epoch: 23 Global Step: 489200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:13:59,276-Speed 6267.59 samples/sec Loss 4.3042 LearningRate 0.0002 Epoch: 23 Global Step: 489210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:02,532-Speed 6291.48 samples/sec Loss 4.4088 LearningRate 0.0002 Epoch: 23 Global Step: 489220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:05,786-Speed 6295.62 samples/sec Loss 4.3496 LearningRate 0.0002 Epoch: 23 Global Step: 489230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:09,032-Speed 6311.24 samples/sec Loss 4.3842 LearningRate 0.0002 Epoch: 23 Global Step: 489240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:12,279-Speed 6309.42 samples/sec Loss 4.3200 LearningRate 0.0002 Epoch: 23 Global Step: 489250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:15,526-Speed 6309.31 samples/sec Loss 4.3376 LearningRate 0.0002 Epoch: 23 Global Step: 489260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:18,769-Speed 6314.76 samples/sec Loss 4.3547 LearningRate 0.0002 Epoch: 23 Global Step: 489270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:22,015-Speed 6312.31 samples/sec Loss 4.3378 LearningRate 0.0002 Epoch: 23 Global Step: 489280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:25,278-Speed 6277.55 samples/sec Loss 4.3843 LearningRate 0.0002 Epoch: 23 Global Step: 489290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:28,525-Speed 6309.17 samples/sec Loss 4.2677 LearningRate 0.0002 Epoch: 23 
Global Step: 489300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:31,759-Speed 6332.31 samples/sec Loss 4.2973 LearningRate 0.0002 Epoch: 23 Global Step: 489310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:35,004-Speed 6314.30 samples/sec Loss 4.3661 LearningRate 0.0002 Epoch: 23 Global Step: 489320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:38,302-Speed 6211.43 samples/sec Loss 4.3149 LearningRate 0.0002 Epoch: 23 Global Step: 489330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:41,596-Speed 6219.41 samples/sec Loss 4.2642 LearningRate 0.0002 Epoch: 23 Global Step: 489340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:44,842-Speed 6310.28 samples/sec Loss 4.3997 LearningRate 0.0002 Epoch: 23 Global Step: 489350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:48,088-Speed 6309.70 samples/sec Loss 4.2898 LearningRate 0.0002 Epoch: 23 Global Step: 489360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:51,332-Speed 6315.46 samples/sec Loss 4.3814 LearningRate 0.0002 Epoch: 23 Global Step: 489370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:54,571-Speed 6323.27 samples/sec Loss 4.3296 LearningRate 0.0002 Epoch: 23 Global Step: 489380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:14:57,816-Speed 6312.55 samples/sec Loss 4.4016 LearningRate 0.0002 Epoch: 23 Global Step: 489390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:01,065-Speed 6304.39 samples/sec Loss 4.2943 LearningRate 0.0002 Epoch: 23 Global Step: 489400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:04,297-Speed 6338.97 samples/sec Loss 4.3474 LearningRate 0.0002 Epoch: 23 Global Step: 489410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:07,544-Speed 6307.56 samples/sec Loss 4.3524 LearningRate 0.0002 Epoch: 23 Global Step: 489420 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:10,792-Speed 6309.02 samples/sec Loss 4.3274 LearningRate 0.0002 Epoch: 23 Global Step: 489430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:14,038-Speed 6309.91 samples/sec Loss 4.2660 LearningRate 0.0002 Epoch: 23 Global Step: 489440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:17,285-Speed 6309.04 samples/sec Loss 4.3510 LearningRate 0.0002 Epoch: 23 Global Step: 489450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:20,529-Speed 6314.81 samples/sec Loss 4.3073 LearningRate 0.0002 Epoch: 23 Global Step: 489460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:23,774-Speed 6313.01 samples/sec Loss 4.3253 LearningRate 0.0002 Epoch: 23 Global Step: 489470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:27,023-Speed 6304.38 samples/sec Loss 4.3196 LearningRate 0.0002 Epoch: 23 Global Step: 489480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:30,269-Speed 6311.10 samples/sec Loss 4.3012 LearningRate 0.0002 Epoch: 23 Global Step: 489490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:33,510-Speed 6320.45 samples/sec Loss 4.3058 LearningRate 0.0002 Epoch: 23 Global Step: 489500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:36,752-Speed 6317.54 samples/sec Loss 4.3394 LearningRate 0.0002 Epoch: 23 Global Step: 489510 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:15:39,990-Speed 6326.69 samples/sec Loss 4.3409 LearningRate 0.0002 Epoch: 23 Global Step: 489520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:43,233-Speed 6316.37 samples/sec Loss 4.3152 LearningRate 0.0002 Epoch: 23 Global Step: 489530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:46,479-Speed 6312.71 samples/sec Loss 4.3858 LearningRate 0.0002 Epoch: 23 Global Step: 489540 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:15:49,724-Speed 6312.38 samples/sec Loss 4.2866 LearningRate 0.0002 Epoch: 23 Global Step: 489550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:52,964-Speed 6321.38 samples/sec Loss 4.2491 LearningRate 0.0002 Epoch: 23 Global Step: 489560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:56,213-Speed 6304.36 samples/sec Loss 4.3169 LearningRate 0.0002 Epoch: 23 Global Step: 489570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:15:59,471-Speed 6288.20 samples/sec Loss 4.4015 LearningRate 0.0002 Epoch: 23 Global Step: 489580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:02,716-Speed 6312.80 samples/sec Loss 4.3367 LearningRate 0.0002 Epoch: 23 Global Step: 489590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:05,958-Speed 6318.24 samples/sec Loss 4.3536 LearningRate 0.0002 Epoch: 23 Global Step: 489600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:09,200-Speed 6317.68 samples/sec Loss 4.3947 LearningRate 0.0002 Epoch: 23 Global Step: 489610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:12,431-Speed 6339.94 samples/sec Loss 4.3906 LearningRate 0.0002 Epoch: 23 Global Step: 489620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:15,680-Speed 6305.52 samples/sec Loss 4.4278 LearningRate 0.0002 Epoch: 23 Global Step: 489630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:18,923-Speed 6316.87 samples/sec Loss 4.3100 LearningRate 0.0002 Epoch: 23 Global Step: 489640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:22,170-Speed 6310.45 samples/sec Loss 4.3592 LearningRate 0.0002 Epoch: 23 Global Step: 489650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:25,416-Speed 6310.39 samples/sec Loss 4.3577 LearningRate 0.0002 Epoch: 23 Global Step: 489660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:16:28,663-Speed 6308.91 samples/sec Loss 4.3291 LearningRate 0.0002 Epoch: 23 Global Step: 489670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:31,906-Speed 6317.03 samples/sec Loss 4.3477 LearningRate 0.0002 Epoch: 23 Global Step: 489680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:35,148-Speed 6316.75 samples/sec Loss 4.3316 LearningRate 0.0002 Epoch: 23 Global Step: 489690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:38,395-Speed 6308.94 samples/sec Loss 4.3520 LearningRate 0.0002 Epoch: 23 Global Step: 489700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:41,644-Speed 6305.19 samples/sec Loss 4.3239 LearningRate 0.0002 Epoch: 23 Global Step: 489710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:44,875-Speed 6339.73 samples/sec Loss 4.3750 LearningRate 0.0002 Epoch: 23 Global Step: 489720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:48,141-Speed 6272.20 samples/sec Loss 4.4172 LearningRate 0.0002 Epoch: 23 Global Step: 489730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:51,463-Speed 6167.76 samples/sec Loss 4.3097 LearningRate 0.0002 Epoch: 23 Global Step: 489740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:54,773-Speed 6187.27 samples/sec Loss 4.3577 LearningRate 0.0002 Epoch: 23 Global Step: 489750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:16:58,025-Speed 6299.42 samples/sec Loss 4.3918 LearningRate 0.0002 Epoch: 23 Global Step: 489760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:01,270-Speed 6312.61 samples/sec Loss 4.3736 LearningRate 0.0002 Epoch: 23 Global Step: 489770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:04,519-Speed 6305.17 samples/sec Loss 4.3212 LearningRate 0.0002 Epoch: 23 Global Step: 489780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:07,766-Speed 6309.54 
samples/sec Loss 4.3962 LearningRate 0.0002 Epoch: 23 Global Step: 489790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:11,014-Speed 6306.56 samples/sec Loss 4.3702 LearningRate 0.0002 Epoch: 23 Global Step: 489800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:14,261-Speed 6307.64 samples/sec Loss 4.3464 LearningRate 0.0002 Epoch: 23 Global Step: 489810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:17,494-Speed 6336.14 samples/sec Loss 4.3365 LearningRate 0.0002 Epoch: 23 Global Step: 489820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:20,741-Speed 6309.12 samples/sec Loss 4.3654 LearningRate 0.0002 Epoch: 23 Global Step: 489830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:23,995-Speed 6295.27 samples/sec Loss 4.2828 LearningRate 0.0002 Epoch: 23 Global Step: 489840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:27,240-Speed 6313.22 samples/sec Loss 4.3230 LearningRate 0.0002 Epoch: 23 Global Step: 489850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:30,489-Speed 6305.34 samples/sec Loss 4.3704 LearningRate 0.0002 Epoch: 23 Global Step: 489860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:33,732-Speed 6316.93 samples/sec Loss 4.3614 LearningRate 0.0002 Epoch: 23 Global Step: 489870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:36,977-Speed 6312.03 samples/sec Loss 4.3314 LearningRate 0.0002 Epoch: 23 Global Step: 489880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:40,220-Speed 6317.88 samples/sec Loss 4.3157 LearningRate 0.0002 Epoch: 23 Global Step: 489890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:43,465-Speed 6312.52 samples/sec Loss 4.3457 LearningRate 0.0002 Epoch: 23 Global Step: 489900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:46,709-Speed 6313.99 samples/sec Loss 4.3432 
LearningRate 0.0002 Epoch: 23 Global Step: 489910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:49,955-Speed 6311.03 samples/sec Loss 4.4080 LearningRate 0.0002 Epoch: 23 Global Step: 489920 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:17:53,187-Speed 6338.49 samples/sec Loss 4.3367 LearningRate 0.0002 Epoch: 23 Global Step: 489930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:56,432-Speed 6312.67 samples/sec Loss 4.2686 LearningRate 0.0002 Epoch: 23 Global Step: 489940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:17:59,677-Speed 6312.56 samples/sec Loss 4.3249 LearningRate 0.0002 Epoch: 23 Global Step: 489950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:02,927-Speed 6303.35 samples/sec Loss 4.3116 LearningRate 0.0002 Epoch: 23 Global Step: 489960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:06,203-Speed 6251.04 samples/sec Loss 4.3139 LearningRate 0.0002 Epoch: 23 Global Step: 489970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:09,459-Speed 6292.29 samples/sec Loss 4.3504 LearningRate 0.0002 Epoch: 23 Global Step: 489980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:12,705-Speed 6310.84 samples/sec Loss 4.3560 LearningRate 0.0002 Epoch: 23 Global Step: 489990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:15,955-Speed 6302.06 samples/sec Loss 4.3872 LearningRate 0.0002 Epoch: 23 Global Step: 490000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:19,202-Speed 6309.86 samples/sec Loss 4.3245 LearningRate 0.0002 Epoch: 23 Global Step: 490010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:22,449-Speed 6308.37 samples/sec Loss 4.3538 LearningRate 0.0002 Epoch: 23 Global Step: 490020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:25,681-Speed 6336.95 samples/sec Loss 4.3098 LearningRate 0.0002 Epoch: 23 
Global Step: 490030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:28,946-Speed 6275.42 samples/sec Loss 4.2972 LearningRate 0.0002 Epoch: 23 Global Step: 490040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:32,192-Speed 6310.73 samples/sec Loss 4.3212 LearningRate 0.0002 Epoch: 23 Global Step: 490050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:35,446-Speed 6295.87 samples/sec Loss 4.3608 LearningRate 0.0002 Epoch: 23 Global Step: 490060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:38,689-Speed 6316.57 samples/sec Loss 4.3014 LearningRate 0.0002 Epoch: 23 Global Step: 490070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:41,940-Speed 6300.73 samples/sec Loss 4.3474 LearningRate 0.0002 Epoch: 23 Global Step: 490080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:45,181-Speed 6320.34 samples/sec Loss 4.3802 LearningRate 0.0002 Epoch: 23 Global Step: 490090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:48,435-Speed 6295.88 samples/sec Loss 4.3105 LearningRate 0.0002 Epoch: 23 Global Step: 490100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:51,681-Speed 6311.14 samples/sec Loss 4.3808 LearningRate 0.0002 Epoch: 23 Global Step: 490110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:54,933-Speed 6298.33 samples/sec Loss 4.3409 LearningRate 0.0002 Epoch: 23 Global Step: 490120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:18:58,177-Speed 6313.59 samples/sec Loss 4.3169 LearningRate 0.0002 Epoch: 23 Global Step: 490130 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:19:01,411-Speed 6334.48 samples/sec Loss 4.3672 LearningRate 0.0002 Epoch: 23 Global Step: 490140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:04,701-Speed 6227.67 samples/sec Loss 4.3613 LearningRate 0.0002 Epoch: 23 Global Step: 490150 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:07,982-Speed 6242.97 samples/sec Loss 4.3793 LearningRate 0.0002 Epoch: 23 Global Step: 490160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:11,228-Speed 6310.73 samples/sec Loss 4.3498 LearningRate 0.0002 Epoch: 23 Global Step: 490170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:14,474-Speed 6309.79 samples/sec Loss 4.3230 LearningRate 0.0002 Epoch: 23 Global Step: 490180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:17,716-Speed 6318.69 samples/sec Loss 4.3290 LearningRate 0.0002 Epoch: 23 Global Step: 490190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:20,965-Speed 6305.41 samples/sec Loss 4.3088 LearningRate 0.0002 Epoch: 23 Global Step: 490200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:24,214-Speed 6304.20 samples/sec Loss 4.4075 LearningRate 0.0002 Epoch: 23 Global Step: 490210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:27,464-Speed 6304.40 samples/sec Loss 4.3559 LearningRate 0.0002 Epoch: 23 Global Step: 490220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:30,711-Speed 6307.82 samples/sec Loss 4.3350 LearningRate 0.0002 Epoch: 23 Global Step: 490230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:33,939-Speed 6346.27 samples/sec Loss 4.4066 LearningRate 0.0002 Epoch: 23 Global Step: 490240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:19:37,170-Speed 6339.27 samples/sec Loss 4.3268 LearningRate 0.0002 Epoch: 23 Global Step: 490250 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:40,415-Speed 6313.47 samples/sec Loss 4.3620 LearningRate 0.0002 Epoch: 23 Global Step: 490260 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:43,660-Speed 6311.97 samples/sec Loss 4.3629 LearningRate 0.0002 Epoch: 23 Global Step: 490270 Fp16 Grad Scale: 8192 Required: 31 hours 
Training: 2022-04-02 12:19:46,907-Speed 6309.85 samples/sec Loss 4.2716 LearningRate 0.0002 Epoch: 23 Global Step: 490280 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:50,150-Speed 6316.25 samples/sec Loss 4.3763 LearningRate 0.0002 Epoch: 23 Global Step: 490290 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:53,395-Speed 6313.17 samples/sec Loss 4.3323 LearningRate 0.0002 Epoch: 23 Global Step: 490300 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:56,645-Speed 6302.86 samples/sec Loss 4.3121 LearningRate 0.0002 Epoch: 23 Global Step: 490310 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:19:59,890-Speed 6313.33 samples/sec Loss 4.3679 LearningRate 0.0002 Epoch: 23 Global Step: 490320 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:20:03,136-Speed 6310.42 samples/sec Loss 4.3821 LearningRate 0.0002 Epoch: 23 Global Step: 490330 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:20:06,384-Speed 6306.13 samples/sec Loss 4.3178 LearningRate 0.0002 Epoch: 23 Global Step: 490340 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:20:09,628-Speed 6313.51 samples/sec Loss 4.3069 LearningRate 0.0002 Epoch: 23 Global Step: 490350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:12,872-Speed 6316.61 samples/sec Loss 4.3293 LearningRate 0.0002 Epoch: 23 Global Step: 490360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:16,122-Speed 6302.36 samples/sec Loss 4.3320 LearningRate 0.0002 Epoch: 23 Global Step: 490370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:19,370-Speed 6307.46 samples/sec Loss 4.3096 LearningRate 0.0002 Epoch: 23 Global Step: 490380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:22,615-Speed 6311.54 samples/sec Loss 4.3174 LearningRate 0.0002 Epoch: 23 Global Step: 490390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:20:25,863-Speed 6308.05 samples/sec Loss 4.2820 LearningRate 0.0002 Epoch: 23 Global Step: 490400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:29,113-Speed 6301.18 samples/sec Loss 4.3663 LearningRate 0.0002 Epoch: 23 Global Step: 490410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:32,359-Speed 6312.48 samples/sec Loss 4.3410 LearningRate 0.0002 Epoch: 23 Global Step: 490420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:35,606-Speed 6307.59 samples/sec Loss 4.3514 LearningRate 0.0002 Epoch: 23 Global Step: 490430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:38,853-Speed 6309.67 samples/sec Loss 4.3583 LearningRate 0.0002 Epoch: 23 Global Step: 490440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:42,086-Speed 6335.06 samples/sec Loss 4.2816 LearningRate 0.0002 Epoch: 23 Global Step: 490450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:45,333-Speed 6309.71 samples/sec Loss 4.2531 LearningRate 0.0002 Epoch: 23 Global Step: 490460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:48,577-Speed 6313.52 samples/sec Loss 4.3634 LearningRate 0.0002 Epoch: 23 Global Step: 490470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:51,822-Speed 6314.38 samples/sec Loss 4.3674 LearningRate 0.0002 Epoch: 23 Global Step: 490480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:55,068-Speed 6309.54 samples/sec Loss 4.3415 LearningRate 0.0002 Epoch: 23 Global Step: 490490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:20:58,311-Speed 6317.59 samples/sec Loss 4.3288 LearningRate 0.0002 Epoch: 23 Global Step: 490500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:01,564-Speed 6297.20 samples/sec Loss 4.3842 LearningRate 0.0002 Epoch: 23 Global Step: 490510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:04,808-Speed 6314.56 
samples/sec Loss 4.3211 LearningRate 0.0002 Epoch: 23 Global Step: 490520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:08,051-Speed 6316.44 samples/sec Loss 4.3875 LearningRate 0.0002 Epoch: 23 Global Step: 490530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:11,296-Speed 6313.46 samples/sec Loss 4.3501 LearningRate 0.0002 Epoch: 23 Global Step: 490540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:14,524-Speed 6344.55 samples/sec Loss 4.4395 LearningRate 0.0002 Epoch: 23 Global Step: 490550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:17,770-Speed 6312.56 samples/sec Loss 4.3093 LearningRate 0.0002 Epoch: 23 Global Step: 490560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:21,015-Speed 6310.85 samples/sec Loss 4.3670 LearningRate 0.0002 Epoch: 23 Global Step: 490570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:24,262-Speed 6309.79 samples/sec Loss 4.3807 LearningRate 0.0002 Epoch: 23 Global Step: 490580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:27,508-Speed 6310.01 samples/sec Loss 4.3139 LearningRate 0.0002 Epoch: 23 Global Step: 490590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:30,755-Speed 6309.98 samples/sec Loss 4.3875 LearningRate 0.0002 Epoch: 23 Global Step: 490600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:33,998-Speed 6315.73 samples/sec Loss 4.3550 LearningRate 0.0002 Epoch: 23 Global Step: 490610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:37,246-Speed 6306.79 samples/sec Loss 4.2998 LearningRate 0.0002 Epoch: 23 Global Step: 490620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:40,495-Speed 6304.69 samples/sec Loss 4.2868 LearningRate 0.0002 Epoch: 23 Global Step: 490630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:43,741-Speed 6310.76 samples/sec Loss 4.3190 
LearningRate 0.0002 Epoch: 23 Global Step: 490640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:46,971-Speed 6342.07 samples/sec Loss 4.3586 LearningRate 0.0002 Epoch: 23 Global Step: 490650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:50,215-Speed 6314.35 samples/sec Loss 4.3402 LearningRate 0.0002 Epoch: 23 Global Step: 490660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:53,460-Speed 6313.46 samples/sec Loss 4.3337 LearningRate 0.0002 Epoch: 23 Global Step: 490670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:56,702-Speed 6317.03 samples/sec Loss 4.3451 LearningRate 0.0002 Epoch: 23 Global Step: 490680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:21:59,949-Speed 6310.20 samples/sec Loss 4.3100 LearningRate 0.0002 Epoch: 23 Global Step: 490690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:03,197-Speed 6307.47 samples/sec Loss 4.2928 LearningRate 0.0002 Epoch: 23 Global Step: 490700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:06,452-Speed 6293.87 samples/sec Loss 4.3631 LearningRate 0.0002 Epoch: 23 Global Step: 490710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:09,697-Speed 6311.31 samples/sec Loss 4.3699 LearningRate 0.0002 Epoch: 23 Global Step: 490720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:12,941-Speed 6315.68 samples/sec Loss 4.2355 LearningRate 0.0002 Epoch: 23 Global Step: 490730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:16,188-Speed 6308.28 samples/sec Loss 4.3432 LearningRate 0.0002 Epoch: 23 Global Step: 490740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:19,419-Speed 6341.37 samples/sec Loss 4.3356 LearningRate 0.0002 Epoch: 23 Global Step: 490750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:22,664-Speed 6311.97 samples/sec Loss 4.4074 LearningRate 0.0002 Epoch: 23 
Global Step: 490760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:25,912-Speed 6306.61 samples/sec Loss 4.3319 LearningRate 0.0002 Epoch: 23 Global Step: 490770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:29,156-Speed 6314.30 samples/sec Loss 4.3075 LearningRate 0.0002 Epoch: 23 Global Step: 490780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:32,403-Speed 6309.64 samples/sec Loss 4.2721 LearningRate 0.0002 Epoch: 23 Global Step: 490790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:35,661-Speed 6286.53 samples/sec Loss 4.3358 LearningRate 0.0002 Epoch: 23 Global Step: 490800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:38,906-Speed 6313.39 samples/sec Loss 4.3611 LearningRate 0.0002 Epoch: 23 Global Step: 490810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:42,154-Speed 6306.35 samples/sec Loss 4.2724 LearningRate 0.0002 Epoch: 23 Global Step: 490820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:45,400-Speed 6310.80 samples/sec Loss 4.3939 LearningRate 0.0002 Epoch: 23 Global Step: 490830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:48,645-Speed 6312.18 samples/sec Loss 4.3671 LearningRate 0.0002 Epoch: 23 Global Step: 490840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:51,898-Speed 6297.47 samples/sec Loss 4.3462 LearningRate 0.0002 Epoch: 23 Global Step: 490850 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:22:55,133-Speed 6333.27 samples/sec Loss 4.2858 LearningRate 0.0002 Epoch: 23 Global Step: 490860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:22:58,381-Speed 6305.69 samples/sec Loss 4.3782 LearningRate 0.0002 Epoch: 23 Global Step: 490870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:01,629-Speed 6307.03 samples/sec Loss 4.3702 LearningRate 0.0002 Epoch: 23 Global Step: 490880 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:04,881-Speed 6300.82 samples/sec Loss 4.3753 LearningRate 0.0002 Epoch: 23 Global Step: 490890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:08,127-Speed 6310.40 samples/sec Loss 4.2898 LearningRate 0.0002 Epoch: 23 Global Step: 490900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:11,370-Speed 6315.77 samples/sec Loss 4.2864 LearningRate 0.0002 Epoch: 23 Global Step: 490910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:14,613-Speed 6316.61 samples/sec Loss 4.3249 LearningRate 0.0002 Epoch: 23 Global Step: 490920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:17,857-Speed 6314.94 samples/sec Loss 4.3611 LearningRate 0.0002 Epoch: 23 Global Step: 490930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:21,103-Speed 6310.85 samples/sec Loss 4.3999 LearningRate 0.0002 Epoch: 23 Global Step: 490940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:24,354-Speed 6301.32 samples/sec Loss 4.3080 LearningRate 0.0002 Epoch: 23 Global Step: 490950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:27,587-Speed 6337.39 samples/sec Loss 4.3290 LearningRate 0.0002 Epoch: 23 Global Step: 490960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:30,829-Speed 6317.34 samples/sec Loss 4.3399 LearningRate 0.0002 Epoch: 23 Global Step: 490970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:34,069-Speed 6321.71 samples/sec Loss 4.3637 LearningRate 0.0002 Epoch: 23 Global Step: 490980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:37,315-Speed 6311.40 samples/sec Loss 4.3046 LearningRate 0.0002 Epoch: 23 Global Step: 490990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:40,561-Speed 6310.35 samples/sec Loss 4.3147 LearningRate 0.0002 Epoch: 23 Global Step: 491000 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:23:43,804-Speed 6316.71 samples/sec Loss 4.3215 LearningRate 0.0002 Epoch: 23 Global Step: 491010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:47,048-Speed 6314.86 samples/sec Loss 4.3364 LearningRate 0.0002 Epoch: 23 Global Step: 491020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:50,293-Speed 6313.58 samples/sec Loss 4.3393 LearningRate 0.0002 Epoch: 23 Global Step: 491030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:53,546-Speed 6295.65 samples/sec Loss 4.3200 LearningRate 0.0002 Epoch: 23 Global Step: 491040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:23:56,780-Speed 6335.08 samples/sec Loss 4.3285 LearningRate 0.0002 Epoch: 23 Global Step: 491050 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:00,025-Speed 6312.65 samples/sec Loss 4.3837 LearningRate 0.0002 Epoch: 23 Global Step: 491060 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:03,275-Speed 6301.77 samples/sec Loss 4.3384 LearningRate 0.0002 Epoch: 23 Global Step: 491070 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:06,517-Speed 6319.08 samples/sec Loss 4.2824 LearningRate 0.0002 Epoch: 23 Global Step: 491080 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:09,763-Speed 6310.88 samples/sec Loss 4.2989 LearningRate 0.0002 Epoch: 23 Global Step: 491090 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:13,006-Speed 6317.34 samples/sec Loss 4.3421 LearningRate 0.0002 Epoch: 23 Global Step: 491100 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:16,254-Speed 6305.26 samples/sec Loss 4.3232 LearningRate 0.0002 Epoch: 23 Global Step: 491110 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:19,503-Speed 6307.30 samples/sec Loss 4.2957 LearningRate 0.0002 Epoch: 23 Global Step: 491120 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:22,750-Speed 
6308.46 samples/sec Loss 4.3293 LearningRate 0.0002 Epoch: 23 Global Step: 491130 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:25,995-Speed 6312.61 samples/sec Loss 4.3976 LearningRate 0.0002 Epoch: 23 Global Step: 491140 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:24:29,243-Speed 6305.99 samples/sec Loss 4.3556 LearningRate 0.0002 Epoch: 23 Global Step: 491150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:32,496-Speed 6297.98 samples/sec Loss 4.3500 LearningRate 0.0002 Epoch: 23 Global Step: 491160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:35,738-Speed 6318.60 samples/sec Loss 4.2979 LearningRate 0.0002 Epoch: 23 Global Step: 491170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:38,980-Speed 6317.75 samples/sec Loss 4.2881 LearningRate 0.0002 Epoch: 23 Global Step: 491180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:42,225-Speed 6312.64 samples/sec Loss 4.3969 LearningRate 0.0002 Epoch: 23 Global Step: 491190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:45,469-Speed 6315.55 samples/sec Loss 4.4049 LearningRate 0.0002 Epoch: 23 Global Step: 491200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:48,716-Speed 6307.69 samples/sec Loss 4.3366 LearningRate 0.0002 Epoch: 23 Global Step: 491210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:51,962-Speed 6312.26 samples/sec Loss 4.3099 LearningRate 0.0002 Epoch: 23 Global Step: 491220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:55,209-Speed 6308.62 samples/sec Loss 4.3974 LearningRate 0.0002 Epoch: 23 Global Step: 491230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:24:58,454-Speed 6312.57 samples/sec Loss 4.3412 LearningRate 0.0002 Epoch: 23 Global Step: 491240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:01,683-Speed 6342.33 samples/sec Loss 4.3356 
LearningRate 0.0002 Epoch: 23 Global Step: 491250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:04,929-Speed 6311.18 samples/sec Loss 4.3170 LearningRate 0.0002 Epoch: 23 Global Step: 491260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:08,177-Speed 6306.61 samples/sec Loss 4.3286 LearningRate 0.0002 Epoch: 23 Global Step: 491270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:11,425-Speed 6307.74 samples/sec Loss 4.3615 LearningRate 0.0002 Epoch: 23 Global Step: 491280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:14,685-Speed 6282.76 samples/sec Loss 4.2918 LearningRate 0.0002 Epoch: 23 Global Step: 491290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:17,932-Speed 6308.51 samples/sec Loss 4.2485 LearningRate 0.0002 Epoch: 23 Global Step: 491300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:21,178-Speed 6310.77 samples/sec Loss 4.3419 LearningRate 0.0002 Epoch: 23 Global Step: 491310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:24,427-Speed 6305.16 samples/sec Loss 4.3079 LearningRate 0.0002 Epoch: 23 Global Step: 491320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:27,680-Speed 6298.16 samples/sec Loss 4.3128 LearningRate 0.0002 Epoch: 23 Global Step: 491330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:30,939-Speed 6286.07 samples/sec Loss 4.2598 LearningRate 0.0002 Epoch: 23 Global Step: 491340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:34,166-Speed 6348.38 samples/sec Loss 4.3226 LearningRate 0.0002 Epoch: 23 Global Step: 491350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:37,414-Speed 6306.99 samples/sec Loss 4.3864 LearningRate 0.0002 Epoch: 23 Global Step: 491360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:40,660-Speed 6310.69 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 23 
Global Step: 491370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:43,904-Speed 6312.98 samples/sec Loss 4.2825 LearningRate 0.0002 Epoch: 23 Global Step: 491380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:47,147-Speed 6317.75 samples/sec Loss 4.3067 LearningRate 0.0002 Epoch: 23 Global Step: 491390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:50,390-Speed 6315.82 samples/sec Loss 4.3456 LearningRate 0.0002 Epoch: 23 Global Step: 491400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:53,645-Speed 6293.13 samples/sec Loss 4.3405 LearningRate 0.0002 Epoch: 23 Global Step: 491410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:25:56,890-Speed 6313.21 samples/sec Loss 4.3834 LearningRate 0.0002 Epoch: 23 Global Step: 491420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:00,135-Speed 6313.39 samples/sec Loss 4.3395 LearningRate 0.0002 Epoch: 23 Global Step: 491430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:03,377-Speed 6316.77 samples/sec Loss 4.2959 LearningRate 0.0002 Epoch: 23 Global Step: 491440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:06,623-Speed 6311.38 samples/sec Loss 4.3287 LearningRate 0.0002 Epoch: 23 Global Step: 491450 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:26:09,853-Speed 6341.94 samples/sec Loss 4.2698 LearningRate 0.0002 Epoch: 23 Global Step: 491460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:13,110-Speed 6289.66 samples/sec Loss 4.3413 LearningRate 0.0002 Epoch: 23 Global Step: 491470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:16,358-Speed 6306.47 samples/sec Loss 4.3996 LearningRate 0.0002 Epoch: 23 Global Step: 491480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:19,605-Speed 6309.71 samples/sec Loss 4.3330 LearningRate 0.0002 Epoch: 23 Global Step: 491490 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:22,850-Speed 6312.56 samples/sec Loss 4.3675 LearningRate 0.0002 Epoch: 23 Global Step: 491500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:26,096-Speed 6310.84 samples/sec Loss 4.3719 LearningRate 0.0002 Epoch: 23 Global Step: 491510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:29,345-Speed 6304.72 samples/sec Loss 4.3173 LearningRate 0.0002 Epoch: 23 Global Step: 491520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:32,584-Speed 6323.60 samples/sec Loss 4.3690 LearningRate 0.0002 Epoch: 23 Global Step: 491530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:35,834-Speed 6304.89 samples/sec Loss 4.2781 LearningRate 0.0002 Epoch: 23 Global Step: 491540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:39,080-Speed 6309.57 samples/sec Loss 4.3127 LearningRate 0.0002 Epoch: 23 Global Step: 491550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:42,309-Speed 6345.73 samples/sec Loss 4.3187 LearningRate 0.0002 Epoch: 23 Global Step: 491560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:45,554-Speed 6311.90 samples/sec Loss 4.3949 LearningRate 0.0002 Epoch: 23 Global Step: 491570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:48,803-Speed 6305.17 samples/sec Loss 4.3248 LearningRate 0.0002 Epoch: 23 Global Step: 491580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:52,048-Speed 6313.35 samples/sec Loss 4.3538 LearningRate 0.0002 Epoch: 23 Global Step: 491590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:55,295-Speed 6308.18 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 23 Global Step: 491600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:26:58,535-Speed 6321.39 samples/sec Loss 4.3136 LearningRate 0.0002 Epoch: 23 Global Step: 491610 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:27:01,784-Speed 6304.45 samples/sec Loss 4.3168 LearningRate 0.0002 Epoch: 23 Global Step: 491620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:05,029-Speed 6314.72 samples/sec Loss 4.3512 LearningRate 0.0002 Epoch: 23 Global Step: 491630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:08,275-Speed 6310.35 samples/sec Loss 4.2459 LearningRate 0.0002 Epoch: 23 Global Step: 491640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:11,519-Speed 6314.06 samples/sec Loss 4.3068 LearningRate 0.0002 Epoch: 23 Global Step: 491650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:14,746-Speed 6346.62 samples/sec Loss 4.3386 LearningRate 0.0002 Epoch: 23 Global Step: 491660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:17,994-Speed 6308.68 samples/sec Loss 4.3174 LearningRate 0.0002 Epoch: 23 Global Step: 491670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:21,238-Speed 6312.81 samples/sec Loss 4.2755 LearningRate 0.0002 Epoch: 23 Global Step: 491680 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:24,488-Speed 6305.07 samples/sec Loss 4.3349 LearningRate 0.0002 Epoch: 23 Global Step: 491690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:27,735-Speed 6308.37 samples/sec Loss 4.4092 LearningRate 0.0002 Epoch: 23 Global Step: 491700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:30,978-Speed 6316.65 samples/sec Loss 4.3487 LearningRate 0.0002 Epoch: 23 Global Step: 491710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:34,226-Speed 6306.36 samples/sec Loss 4.3316 LearningRate 0.0002 Epoch: 23 Global Step: 491720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:37,482-Speed 6290.52 samples/sec Loss 4.3527 LearningRate 0.0002 Epoch: 23 Global Step: 491730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:27:40,728-Speed 6310.26 samples/sec Loss 4.3163 LearningRate 0.0002 Epoch: 23 Global Step: 491740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:43,974-Speed 6311.79 samples/sec Loss 4.4000 LearningRate 0.0002 Epoch: 23 Global Step: 491750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:47,205-Speed 6341.23 samples/sec Loss 4.2861 LearningRate 0.0002 Epoch: 23 Global Step: 491760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:50,449-Speed 6314.30 samples/sec Loss 4.2927 LearningRate 0.0002 Epoch: 23 Global Step: 491770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:53,698-Speed 6304.72 samples/sec Loss 4.3173 LearningRate 0.0002 Epoch: 23 Global Step: 491780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:27:56,944-Speed 6310.92 samples/sec Loss 4.3801 LearningRate 0.0002 Epoch: 23 Global Step: 491790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:00,185-Speed 6319.48 samples/sec Loss 4.3150 LearningRate 0.0002 Epoch: 23 Global Step: 491800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:03,434-Speed 6306.47 samples/sec Loss 4.3180 LearningRate 0.0002 Epoch: 23 Global Step: 491810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:06,681-Speed 6308.78 samples/sec Loss 4.3164 LearningRate 0.0002 Epoch: 23 Global Step: 491820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:09,927-Speed 6309.88 samples/sec Loss 4.3165 LearningRate 0.0002 Epoch: 23 Global Step: 491830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:13,172-Speed 6313.02 samples/sec Loss 4.3467 LearningRate 0.0002 Epoch: 23 Global Step: 491840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:16,414-Speed 6319.36 samples/sec Loss 4.3449 LearningRate 0.0002 Epoch: 23 Global Step: 491850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:19,653-Speed 6322.94 
samples/sec Loss 4.3283 LearningRate 0.0002 Epoch: 23 Global Step: 491860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:22,904-Speed 6302.06 samples/sec Loss 4.3525 LearningRate 0.0002 Epoch: 23 Global Step: 491870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:26,147-Speed 6316.37 samples/sec Loss 4.3771 LearningRate 0.0002 Epoch: 23 Global Step: 491880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:29,391-Speed 6314.11 samples/sec Loss 4.4196 LearningRate 0.0002 Epoch: 23 Global Step: 491890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:32,635-Speed 6314.17 samples/sec Loss 4.3410 LearningRate 0.0002 Epoch: 23 Global Step: 491900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:35,880-Speed 6312.31 samples/sec Loss 4.2965 LearningRate 0.0002 Epoch: 23 Global Step: 491910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:39,123-Speed 6317.12 samples/sec Loss 4.2781 LearningRate 0.0002 Epoch: 23 Global Step: 491920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:42,383-Speed 6282.82 samples/sec Loss 4.3360 LearningRate 0.0002 Epoch: 23 Global Step: 491930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:45,626-Speed 6316.60 samples/sec Loss 4.3297 LearningRate 0.0002 Epoch: 23 Global Step: 491940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:48,872-Speed 6312.21 samples/sec Loss 4.3125 LearningRate 0.0002 Epoch: 23 Global Step: 491950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:52,104-Speed 6336.33 samples/sec Loss 4.3581 LearningRate 0.0002 Epoch: 23 Global Step: 491960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:55,349-Speed 6314.33 samples/sec Loss 4.2960 LearningRate 0.0002 Epoch: 23 Global Step: 491970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:28:58,595-Speed 6311.68 samples/sec Loss 4.3424 
LearningRate 0.0002 Epoch: 23 Global Step: 491980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:01,844-Speed 6303.96 samples/sec Loss 4.3388 LearningRate 0.0002 Epoch: 23 Global Step: 491990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:05,084-Speed 6321.95 samples/sec Loss 4.3436 LearningRate 0.0002 Epoch: 23 Global Step: 492000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:08,329-Speed 6312.66 samples/sec Loss 4.3751 LearningRate 0.0002 Epoch: 23 Global Step: 492010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:11,572-Speed 6316.46 samples/sec Loss 4.4211 LearningRate 0.0002 Epoch: 23 Global Step: 492020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:14,815-Speed 6316.86 samples/sec Loss 4.3569 LearningRate 0.0002 Epoch: 23 Global Step: 492030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:18,062-Speed 6309.43 samples/sec Loss 4.3887 LearningRate 0.0002 Epoch: 23 Global Step: 492040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:21,311-Speed 6304.21 samples/sec Loss 4.3102 LearningRate 0.0002 Epoch: 23 Global Step: 492050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:24,556-Speed 6313.40 samples/sec Loss 4.3033 LearningRate 0.0002 Epoch: 23 Global Step: 492060 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:29:27,793-Speed 6328.06 samples/sec Loss 4.3099 LearningRate 0.0002 Epoch: 23 Global Step: 492070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:31,045-Speed 6298.14 samples/sec Loss 4.2937 LearningRate 0.0002 Epoch: 23 Global Step: 492080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:34,289-Speed 6315.97 samples/sec Loss 4.3275 LearningRate 0.0002 Epoch: 23 Global Step: 492090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:37,539-Speed 6301.54 samples/sec Loss 4.2694 LearningRate 0.0002 Epoch: 23 
Global Step: 492100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:40,788-Speed 6306.47 samples/sec Loss 4.2722 LearningRate 0.0002 Epoch: 23 Global Step: 492110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:44,043-Speed 6291.99 samples/sec Loss 4.3138 LearningRate 0.0002 Epoch: 23 Global Step: 492120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:47,289-Speed 6311.79 samples/sec Loss 4.2544 LearningRate 0.0002 Epoch: 23 Global Step: 492130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:50,531-Speed 6317.11 samples/sec Loss 4.3595 LearningRate 0.0002 Epoch: 23 Global Step: 492140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:53,777-Speed 6311.00 samples/sec Loss 4.3438 LearningRate 0.0002 Epoch: 23 Global Step: 492150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:29:57,024-Speed 6310.11 samples/sec Loss 4.3447 LearningRate 0.0002 Epoch: 23 Global Step: 492160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:00,261-Speed 6327.03 samples/sec Loss 4.3001 LearningRate 0.0002 Epoch: 23 Global Step: 492170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:03,508-Speed 6308.67 samples/sec Loss 4.2810 LearningRate 0.0002 Epoch: 23 Global Step: 492180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:06,842-Speed 6144.70 samples/sec Loss 4.3839 LearningRate 0.0002 Epoch: 23 Global Step: 492190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:10,121-Speed 6248.11 samples/sec Loss 4.3822 LearningRate 0.0002 Epoch: 23 Global Step: 492200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:13,366-Speed 6311.81 samples/sec Loss 4.3587 LearningRate 0.0002 Epoch: 23 Global Step: 492210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:16,610-Speed 6315.21 samples/sec Loss 4.3155 LearningRate 0.0002 Epoch: 23 Global Step: 492220 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:19,859-Speed 6305.54 samples/sec Loss 4.3290 LearningRate 0.0002 Epoch: 23 Global Step: 492230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:23,102-Speed 6317.33 samples/sec Loss 4.3710 LearningRate 0.0002 Epoch: 23 Global Step: 492240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:26,348-Speed 6310.53 samples/sec Loss 4.2769 LearningRate 0.0002 Epoch: 23 Global Step: 492250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:29,596-Speed 6306.62 samples/sec Loss 4.2644 LearningRate 0.0002 Epoch: 23 Global Step: 492260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:32,833-Speed 6327.89 samples/sec Loss 4.2912 LearningRate 0.0002 Epoch: 23 Global Step: 492270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:36,074-Speed 6319.68 samples/sec Loss 4.3520 LearningRate 0.0002 Epoch: 23 Global Step: 492280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:39,322-Speed 6306.93 samples/sec Loss 4.3126 LearningRate 0.0002 Epoch: 23 Global Step: 492290 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:42,568-Speed 6310.19 samples/sec Loss 4.3082 LearningRate 0.0002 Epoch: 23 Global Step: 492300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:45,812-Speed 6314.40 samples/sec Loss 4.3523 LearningRate 0.0002 Epoch: 23 Global Step: 492310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:49,060-Speed 6307.26 samples/sec Loss 4.3654 LearningRate 0.0002 Epoch: 23 Global Step: 492320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:52,307-Speed 6309.12 samples/sec Loss 4.3349 LearningRate 0.0002 Epoch: 23 Global Step: 492330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:30:55,553-Speed 6310.51 samples/sec Loss 4.2823 LearningRate 0.0002 Epoch: 23 Global Step: 492340 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:30:58,819-Speed 6272.02 samples/sec Loss 4.2124 LearningRate 0.0002 Epoch: 23 Global Step: 492350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:02,078-Speed 6285.25 samples/sec Loss 4.3412 LearningRate 0.0002 Epoch: 23 Global Step: 492360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:05,318-Speed 6323.52 samples/sec Loss 4.2975 LearningRate 0.0002 Epoch: 23 Global Step: 492370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:08,562-Speed 6314.80 samples/sec Loss 4.3054 LearningRate 0.0002 Epoch: 23 Global Step: 492380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:11,823-Speed 6282.89 samples/sec Loss 4.3257 LearningRate 0.0002 Epoch: 23 Global Step: 492390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:15,067-Speed 6314.44 samples/sec Loss 4.2579 LearningRate 0.0002 Epoch: 23 Global Step: 492400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:18,314-Speed 6308.97 samples/sec Loss 4.3652 LearningRate 0.0002 Epoch: 23 Global Step: 492410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:21,559-Speed 6312.53 samples/sec Loss 4.3625 LearningRate 0.0002 Epoch: 23 Global Step: 492420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:24,802-Speed 6316.33 samples/sec Loss 4.3883 LearningRate 0.0002 Epoch: 23 Global Step: 492430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:28,047-Speed 6312.49 samples/sec Loss 4.3318 LearningRate 0.0002 Epoch: 23 Global Step: 492440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:31,292-Speed 6313.27 samples/sec Loss 4.3631 LearningRate 0.0002 Epoch: 23 Global Step: 492450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:34,535-Speed 6314.87 samples/sec Loss 4.2921 LearningRate 0.0002 Epoch: 23 Global Step: 492460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:31:37,771-Speed 6330.06 samples/sec Loss 4.3755 LearningRate 0.0002 Epoch: 23 Global Step: 492470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:41,017-Speed 6311.09 samples/sec Loss 4.3509 LearningRate 0.0002 Epoch: 23 Global Step: 492480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:44,262-Speed 6313.81 samples/sec Loss 4.3316 LearningRate 0.0002 Epoch: 23 Global Step: 492490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:47,503-Speed 6319.34 samples/sec Loss 4.2980 LearningRate 0.0002 Epoch: 23 Global Step: 492500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:50,748-Speed 6312.46 samples/sec Loss 4.3438 LearningRate 0.0002 Epoch: 23 Global Step: 492510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:53,992-Speed 6314.69 samples/sec Loss 4.2794 LearningRate 0.0002 Epoch: 23 Global Step: 492520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:31:57,239-Speed 6310.37 samples/sec Loss 4.2811 LearningRate 0.0002 Epoch: 23 Global Step: 492530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:00,489-Speed 6301.73 samples/sec Loss 4.3633 LearningRate 0.0002 Epoch: 23 Global Step: 492540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:03,736-Speed 6309.82 samples/sec Loss 4.3294 LearningRate 0.0002 Epoch: 23 Global Step: 492550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:06,981-Speed 6311.41 samples/sec Loss 4.2928 LearningRate 0.0002 Epoch: 23 Global Step: 492560 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:10,222-Speed 6321.18 samples/sec Loss 4.2943 LearningRate 0.0002 Epoch: 23 Global Step: 492570 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:32:13,460-Speed 6329.77 samples/sec Loss 4.3618 LearningRate 0.0002 Epoch: 23 Global Step: 492580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:16,726-Speed 6271.92 
samples/sec Loss 4.3136 LearningRate 0.0002 Epoch: 23 Global Step: 492590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:19,973-Speed 6309.50 samples/sec Loss 4.3140 LearningRate 0.0002 Epoch: 23 Global Step: 492600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:23,220-Speed 6308.34 samples/sec Loss 4.3125 LearningRate 0.0002 Epoch: 23 Global Step: 492610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:26,465-Speed 6312.56 samples/sec Loss 4.3463 LearningRate 0.0002 Epoch: 23 Global Step: 492620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:29,711-Speed 6312.00 samples/sec Loss 4.3249 LearningRate 0.0002 Epoch: 23 Global Step: 492630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:32,955-Speed 6313.65 samples/sec Loss 4.3897 LearningRate 0.0002 Epoch: 23 Global Step: 492640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:36,200-Speed 6312.83 samples/sec Loss 4.3321 LearningRate 0.0002 Epoch: 23 Global Step: 492650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:39,447-Speed 6308.33 samples/sec Loss 4.2678 LearningRate 0.0002 Epoch: 23 Global Step: 492660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:42,692-Speed 6313.05 samples/sec Loss 4.2564 LearningRate 0.0002 Epoch: 23 Global Step: 492670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:45,951-Speed 6286.17 samples/sec Loss 4.3503 LearningRate 0.0002 Epoch: 23 Global Step: 492680 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:32:49,181-Speed 6341.01 samples/sec Loss 4.3242 LearningRate 0.0002 Epoch: 23 Global Step: 492690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:52,425-Speed 6315.17 samples/sec Loss 4.3032 LearningRate 0.0002 Epoch: 23 Global Step: 492700 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:55,673-Speed 6306.55 samples/sec Loss 4.3694 
LearningRate 0.0002 Epoch: 23 Global Step: 492710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:32:58,937-Speed 6275.69 samples/sec Loss 4.2465 LearningRate 0.0002 Epoch: 23 Global Step: 492720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:02,248-Speed 6186.08 samples/sec Loss 4.2943 LearningRate 0.0002 Epoch: 23 Global Step: 492730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:05,493-Speed 6312.81 samples/sec Loss 4.3417 LearningRate 0.0002 Epoch: 23 Global Step: 492740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:08,741-Speed 6306.96 samples/sec Loss 4.2943 LearningRate 0.0002 Epoch: 23 Global Step: 492750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:11,984-Speed 6318.15 samples/sec Loss 4.3380 LearningRate 0.0002 Epoch: 23 Global Step: 492760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:15,229-Speed 6311.13 samples/sec Loss 4.3394 LearningRate 0.0002 Epoch: 23 Global Step: 492770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:18,475-Speed 6311.27 samples/sec Loss 4.2859 LearningRate 0.0002 Epoch: 23 Global Step: 492780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:21,705-Speed 6341.64 samples/sec Loss 4.2802 LearningRate 0.0002 Epoch: 23 Global Step: 492790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:24,947-Speed 6317.92 samples/sec Loss 4.2939 LearningRate 0.0002 Epoch: 23 Global Step: 492800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:28,192-Speed 6314.07 samples/sec Loss 4.3238 LearningRate 0.0002 Epoch: 23 Global Step: 492810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:31,438-Speed 6311.95 samples/sec Loss 4.3605 LearningRate 0.0002 Epoch: 23 Global Step: 492820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:34,683-Speed 6311.36 samples/sec Loss 4.3453 LearningRate 0.0002 Epoch: 23 
Global Step: 492830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:37,930-Speed 6309.15 samples/sec Loss 4.3028 LearningRate 0.0002 Epoch: 23 Global Step: 492840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:41,174-Speed 6315.16 samples/sec Loss 4.3025 LearningRate 0.0002 Epoch: 23 Global Step: 492850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:44,417-Speed 6316.46 samples/sec Loss 4.2524 LearningRate 0.0002 Epoch: 23 Global Step: 492860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:47,666-Speed 6305.42 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 23 Global Step: 492870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:50,913-Speed 6307.74 samples/sec Loss 4.3299 LearningRate 0.0002 Epoch: 23 Global Step: 492880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:33:54,147-Speed 6334.07 samples/sec Loss 4.2837 LearningRate 0.0002 Epoch: 23 Global Step: 492890 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:33:57,394-Speed 6310.23 samples/sec Loss 4.3112 LearningRate 0.0002 Epoch: 23 Global Step: 492900 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:00,642-Speed 6306.44 samples/sec Loss 4.3177 LearningRate 0.0002 Epoch: 23 Global Step: 492910 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:03,894-Speed 6298.15 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 492920 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:07,141-Speed 6309.51 samples/sec Loss 4.2903 LearningRate 0.0002 Epoch: 23 Global Step: 492930 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:10,391-Speed 6301.93 samples/sec Loss 4.3098 LearningRate 0.0002 Epoch: 23 Global Step: 492940 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:13,637-Speed 6312.06 samples/sec Loss 4.3886 LearningRate 0.0002 Epoch: 23 Global Step: 492950 Fp16 Grad Scale: 
8192 Required: 31 hours Training: 2022-04-02 12:34:16,882-Speed 6312.14 samples/sec Loss 4.3869 LearningRate 0.0002 Epoch: 23 Global Step: 492960 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:20,127-Speed 6312.28 samples/sec Loss 4.3585 LearningRate 0.0002 Epoch: 23 Global Step: 492970 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:23,375-Speed 6308.14 samples/sec Loss 4.2616 LearningRate 0.0002 Epoch: 23 Global Step: 492980 Fp16 Grad Scale: 8192 Required: 31 hours Training: 2022-04-02 12:34:26,620-Speed 6311.31 samples/sec Loss 4.3512 LearningRate 0.0002 Epoch: 23 Global Step: 492990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:29,866-Speed 6312.11 samples/sec Loss 4.2778 LearningRate 0.0002 Epoch: 23 Global Step: 493000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:33,111-Speed 6312.86 samples/sec Loss 4.3237 LearningRate 0.0002 Epoch: 23 Global Step: 493010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:36,355-Speed 6313.68 samples/sec Loss 4.3697 LearningRate 0.0002 Epoch: 23 Global Step: 493020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:39,604-Speed 6305.68 samples/sec Loss 4.3665 LearningRate 0.0002 Epoch: 23 Global Step: 493030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:42,849-Speed 6312.32 samples/sec Loss 4.2849 LearningRate 0.0002 Epoch: 23 Global Step: 493040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:46,107-Speed 6287.93 samples/sec Loss 4.3561 LearningRate 0.0002 Epoch: 23 Global Step: 493050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:49,355-Speed 6307.44 samples/sec Loss 4.3920 LearningRate 0.0002 Epoch: 23 Global Step: 493060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:52,602-Speed 6309.45 samples/sec Loss 4.3670 LearningRate 0.0002 Epoch: 23 Global Step: 493070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 
2022-04-02 12:34:55,847-Speed 6312.01 samples/sec Loss 4.3094 LearningRate 0.0002 Epoch: 23 Global Step: 493080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:34:59,080-Speed 6335.95 samples/sec Loss 4.2934 LearningRate 0.0002 Epoch: 23 Global Step: 493090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:02,327-Speed 6309.18 samples/sec Loss 4.3410 LearningRate 0.0002 Epoch: 23 Global Step: 493100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:05,575-Speed 6306.85 samples/sec Loss 4.3080 LearningRate 0.0002 Epoch: 23 Global Step: 493110 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:08,821-Speed 6310.03 samples/sec Loss 4.3774 LearningRate 0.0002 Epoch: 23 Global Step: 493120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:12,065-Speed 6314.81 samples/sec Loss 4.2892 LearningRate 0.0002 Epoch: 23 Global Step: 493130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:15,314-Speed 6306.13 samples/sec Loss 4.3466 LearningRate 0.0002 Epoch: 23 Global Step: 493140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:18,563-Speed 6304.25 samples/sec Loss 4.3645 LearningRate 0.0002 Epoch: 23 Global Step: 493150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:21,819-Speed 6291.08 samples/sec Loss 4.3387 LearningRate 0.0002 Epoch: 23 Global Step: 493160 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:25,064-Speed 6312.64 samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 23 Global Step: 493170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:28,312-Speed 6307.91 samples/sec Loss 4.3053 LearningRate 0.0002 Epoch: 23 Global Step: 493180 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:31,546-Speed 6333.13 samples/sec Loss 4.3601 LearningRate 0.0002 Epoch: 23 Global Step: 493190 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:34,792-Speed 
6311.45 samples/sec Loss 4.3250 LearningRate 0.0002 Epoch: 23 Global Step: 493200 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:38,085-Speed 6219.49 samples/sec Loss 4.3322 LearningRate 0.0002 Epoch: 23 Global Step: 493210 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:41,333-Speed 6307.41 samples/sec Loss 4.3968 LearningRate 0.0002 Epoch: 23 Global Step: 493220 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:44,580-Speed 6307.74 samples/sec Loss 4.2783 LearningRate 0.0002 Epoch: 23 Global Step: 493230 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:47,830-Speed 6304.96 samples/sec Loss 4.3965 LearningRate 0.0002 Epoch: 23 Global Step: 493240 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:51,124-Speed 6218.70 samples/sec Loss 4.2814 LearningRate 0.0002 Epoch: 23 Global Step: 493250 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:54,378-Speed 6295.16 samples/sec Loss 4.2883 LearningRate 0.0002 Epoch: 23 Global Step: 493260 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:35:57,623-Speed 6311.70 samples/sec Loss 4.3745 LearningRate 0.0002 Epoch: 23 Global Step: 493270 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:00,870-Speed 6309.17 samples/sec Loss 4.3525 LearningRate 0.0002 Epoch: 23 Global Step: 493280 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:04,117-Speed 6309.08 samples/sec Loss 4.2708 LearningRate 0.0002 Epoch: 23 Global Step: 493290 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:36:07,344-Speed 6347.11 samples/sec Loss 4.4246 LearningRate 0.0002 Epoch: 23 Global Step: 493300 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:10,588-Speed 6314.95 samples/sec Loss 4.3996 LearningRate 0.0002 Epoch: 23 Global Step: 493310 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:13,831-Speed 6317.21 samples/sec Loss 4.3762 
LearningRate 0.0002 Epoch: 23 Global Step: 493320 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:17,081-Speed 6303.46 samples/sec Loss 4.3363 LearningRate 0.0002 Epoch: 23 Global Step: 493330 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:20,325-Speed 6313.22 samples/sec Loss 4.3042 LearningRate 0.0002 Epoch: 23 Global Step: 493340 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:23,576-Speed 6302.88 samples/sec Loss 4.3199 LearningRate 0.0002 Epoch: 23 Global Step: 493350 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:26,828-Speed 6298.98 samples/sec Loss 4.2937 LearningRate 0.0002 Epoch: 23 Global Step: 493360 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:30,071-Speed 6315.52 samples/sec Loss 4.2707 LearningRate 0.0002 Epoch: 23 Global Step: 493370 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:33,319-Speed 6307.41 samples/sec Loss 4.3335 LearningRate 0.0002 Epoch: 23 Global Step: 493380 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:36,572-Speed 6296.76 samples/sec Loss 4.3216 LearningRate 0.0002 Epoch: 23 Global Step: 493390 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:39,827-Speed 6292.99 samples/sec Loss 4.3648 LearningRate 0.0002 Epoch: 23 Global Step: 493400 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:43,073-Speed 6312.41 samples/sec Loss 4.3173 LearningRate 0.0002 Epoch: 23 Global Step: 493410 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:46,317-Speed 6314.43 samples/sec Loss 4.2700 LearningRate 0.0002 Epoch: 23 Global Step: 493420 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:49,563-Speed 6310.58 samples/sec Loss 4.2898 LearningRate 0.0002 Epoch: 23 Global Step: 493430 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:52,808-Speed 6312.43 samples/sec Loss 4.2381 LearningRate 0.0002 Epoch: 23 
Global Step: 493440 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:56,054-Speed 6310.18 samples/sec Loss 4.3075 LearningRate 0.0002 Epoch: 23 Global Step: 493450 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:36:59,308-Speed 6296.85 samples/sec Loss 4.2921 LearningRate 0.0002 Epoch: 23 Global Step: 493460 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:02,554-Speed 6311.22 samples/sec Loss 4.3609 LearningRate 0.0002 Epoch: 23 Global Step: 493470 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:05,797-Speed 6315.41 samples/sec Loss 4.3209 LearningRate 0.0002 Epoch: 23 Global Step: 493480 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:09,040-Speed 6316.20 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 493490 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:12,272-Speed 6339.78 samples/sec Loss 4.2993 LearningRate 0.0002 Epoch: 23 Global Step: 493500 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:15,521-Speed 6304.62 samples/sec Loss 4.3374 LearningRate 0.0002 Epoch: 23 Global Step: 493510 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:18,769-Speed 6306.04 samples/sec Loss 4.2916 LearningRate 0.0002 Epoch: 23 Global Step: 493520 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:22,017-Speed 6306.71 samples/sec Loss 4.2514 LearningRate 0.0002 Epoch: 23 Global Step: 493530 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:25,264-Speed 6308.24 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 493540 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:28,511-Speed 6309.42 samples/sec Loss 4.2771 LearningRate 0.0002 Epoch: 23 Global Step: 493550 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:31,759-Speed 6307.49 samples/sec Loss 4.2889 LearningRate 0.0002 Epoch: 23 Global Step: 493560 Fp16 Grad 
Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:35,006-Speed 6308.02 samples/sec Loss 4.2289 LearningRate 0.0002 Epoch: 23 Global Step: 493570 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:38,256-Speed 6302.76 samples/sec Loss 4.3431 LearningRate 0.0002 Epoch: 23 Global Step: 493580 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:41,512-Speed 6291.36 samples/sec Loss 4.2997 LearningRate 0.0002 Epoch: 23 Global Step: 493590 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:44,744-Speed 6338.37 samples/sec Loss 4.2716 LearningRate 0.0002 Epoch: 23 Global Step: 493600 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:47,989-Speed 6312.88 samples/sec Loss 4.3052 LearningRate 0.0002 Epoch: 23 Global Step: 493610 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:51,247-Speed 6287.79 samples/sec Loss 4.3317 LearningRate 0.0002 Epoch: 23 Global Step: 493620 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:54,494-Speed 6307.86 samples/sec Loss 4.3927 LearningRate 0.0002 Epoch: 23 Global Step: 493630 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:37:57,740-Speed 6312.11 samples/sec Loss 4.3017 LearningRate 0.0002 Epoch: 23 Global Step: 493640 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:00,997-Speed 6288.86 samples/sec Loss 4.3285 LearningRate 0.0002 Epoch: 23 Global Step: 493650 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:04,290-Speed 6219.93 samples/sec Loss 4.2924 LearningRate 0.0002 Epoch: 23 Global Step: 493660 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:07,543-Speed 6297.82 samples/sec Loss 4.2670 LearningRate 0.0002 Epoch: 23 Global Step: 493670 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:10,792-Speed 6304.90 samples/sec Loss 4.3233 LearningRate 0.0002 Epoch: 23 Global Step: 493680 Fp16 Grad Scale: 16384 Required: 31 hours 
Training: 2022-04-02 12:38:14,043-Speed 6301.66 samples/sec Loss 4.3290 LearningRate 0.0002 Epoch: 23 Global Step: 493690 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:17,292-Speed 6304.79 samples/sec Loss 4.3599 LearningRate 0.0002 Epoch: 23 Global Step: 493700 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:38:20,529-Speed 6328.48 samples/sec Loss 4.3260 LearningRate 0.0002 Epoch: 23 Global Step: 493710 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:23,778-Speed 6304.88 samples/sec Loss 4.3456 LearningRate 0.0002 Epoch: 23 Global Step: 493720 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:27,023-Speed 6313.47 samples/sec Loss 4.3655 LearningRate 0.0002 Epoch: 23 Global Step: 493730 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:30,270-Speed 6308.28 samples/sec Loss 4.2983 LearningRate 0.0002 Epoch: 23 Global Step: 493740 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:33,518-Speed 6306.70 samples/sec Loss 4.2502 LearningRate 0.0002 Epoch: 23 Global Step: 493750 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:36,765-Speed 6308.20 samples/sec Loss 4.3123 LearningRate 0.0002 Epoch: 23 Global Step: 493760 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:40,015-Speed 6304.20 samples/sec Loss 4.2787 LearningRate 0.0002 Epoch: 23 Global Step: 493770 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:43,264-Speed 6304.61 samples/sec Loss 4.3469 LearningRate 0.0002 Epoch: 23 Global Step: 493780 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:46,508-Speed 6315.11 samples/sec Loss 4.2508 LearningRate 0.0002 Epoch: 23 Global Step: 493790 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:49,755-Speed 6307.86 samples/sec Loss 4.3058 LearningRate 0.0002 Epoch: 23 Global Step: 493800 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 
12:38:52,990-Speed 6332.05 samples/sec Loss 4.2899 LearningRate 0.0002 Epoch: 23 Global Step: 493810 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:56,237-Speed 6309.46 samples/sec Loss 4.3132 LearningRate 0.0002 Epoch: 23 Global Step: 493820 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:38:59,486-Speed 6303.88 samples/sec Loss 4.3724 LearningRate 0.0002 Epoch: 23 Global Step: 493830 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:02,738-Speed 6300.18 samples/sec Loss 4.2865 LearningRate 0.0002 Epoch: 23 Global Step: 493840 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:05,981-Speed 6316.63 samples/sec Loss 4.3016 LearningRate 0.0002 Epoch: 23 Global Step: 493850 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:09,236-Speed 6293.00 samples/sec Loss 4.2867 LearningRate 0.0002 Epoch: 23 Global Step: 493860 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:12,483-Speed 6309.64 samples/sec Loss 4.3172 LearningRate 0.0002 Epoch: 23 Global Step: 493870 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:15,728-Speed 6312.22 samples/sec Loss 4.3401 LearningRate 0.0002 Epoch: 23 Global Step: 493880 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:18,978-Speed 6303.93 samples/sec Loss 4.3132 LearningRate 0.0002 Epoch: 23 Global Step: 493890 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:22,223-Speed 6312.39 samples/sec Loss 4.3078 LearningRate 0.0002 Epoch: 23 Global Step: 493900 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:25,453-Speed 6340.82 samples/sec Loss 4.3219 LearningRate 0.0002 Epoch: 23 Global Step: 493910 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:28,700-Speed 6310.14 samples/sec Loss 4.2740 LearningRate 0.0002 Epoch: 23 Global Step: 493920 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:31,950-Speed 6303.23 
samples/sec Loss 4.2683 LearningRate 0.0002 Epoch: 23 Global Step: 493930 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:35,195-Speed 6312.05 samples/sec Loss 4.2515 LearningRate 0.0002 Epoch: 23 Global Step: 493940 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:38,441-Speed 6310.92 samples/sec Loss 4.3423 LearningRate 0.0002 Epoch: 23 Global Step: 493950 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:41,697-Speed 6290.76 samples/sec Loss 4.2828 LearningRate 0.0002 Epoch: 23 Global Step: 493960 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:44,943-Speed 6311.28 samples/sec Loss 4.3228 LearningRate 0.0002 Epoch: 23 Global Step: 493970 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:48,199-Speed 6292.19 samples/sec Loss 4.3076 LearningRate 0.0002 Epoch: 23 Global Step: 493980 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:51,445-Speed 6310.65 samples/sec Loss 4.3118 LearningRate 0.0002 Epoch: 23 Global Step: 493990 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:54,689-Speed 6313.25 samples/sec Loss 4.3127 LearningRate 0.0002 Epoch: 23 Global Step: 494000 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:39:57,922-Speed 6336.16 samples/sec Loss 4.2871 LearningRate 0.0002 Epoch: 23 Global Step: 494010 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:01,173-Speed 6301.84 samples/sec Loss 4.3690 LearningRate 0.0002 Epoch: 23 Global Step: 494020 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:04,454-Speed 6244.46 samples/sec Loss 4.2765 LearningRate 0.0002 Epoch: 23 Global Step: 494030 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:07,699-Speed 6312.55 samples/sec Loss 4.2978 LearningRate 0.0002 Epoch: 23 Global Step: 494040 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:10,947-Speed 6305.09 samples/sec Loss 4.3134 
LearningRate 0.0002 Epoch: 23 Global Step: 494050 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:14,194-Speed 6309.61 samples/sec Loss 4.2470 LearningRate 0.0002 Epoch: 23 Global Step: 494060 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:17,439-Speed 6312.43 samples/sec Loss 4.3059 LearningRate 0.0002 Epoch: 23 Global Step: 494070 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:20,683-Speed 6314.81 samples/sec Loss 4.2878 LearningRate 0.0002 Epoch: 23 Global Step: 494080 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:23,930-Speed 6308.41 samples/sec Loss 4.3462 LearningRate 0.0002 Epoch: 23 Global Step: 494090 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:27,182-Speed 6299.70 samples/sec Loss 4.3298 LearningRate 0.0002 Epoch: 23 Global Step: 494100 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:30,426-Speed 6315.88 samples/sec Loss 4.3535 LearningRate 0.0002 Epoch: 23 Global Step: 494110 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-04-02 12:40:33,656-Speed 6341.58 samples/sec Loss 4.3266 LearningRate 0.0002 Epoch: 23 Global Step: 494120 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:36,915-Speed 6287.11 samples/sec Loss 4.4132 LearningRate 0.0002 Epoch: 23 Global Step: 494130 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:40,160-Speed 6311.10 samples/sec Loss 4.3497 LearningRate 0.0002 Epoch: 23 Global Step: 494140 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:43,405-Speed 6312.66 samples/sec Loss 4.2987 LearningRate 0.0002 Epoch: 23 Global Step: 494150 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-02 12:40:46,651-Speed 6311.62 samples/sec Loss 4.2807 LearningRate 0.0002 Epoch: 23 Global Step: 494160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:40:49,896-Speed 6312.28 samples/sec Loss 4.2683 LearningRate 0.0002 Epoch: 23 
Global Step: 494170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:40:53,142-Speed 6310.89 samples/sec Loss 4.2827 LearningRate 0.0002 Epoch: 23 Global Step: 494180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:40:56,389-Speed 6307.76 samples/sec Loss 4.3218 LearningRate 0.0002 Epoch: 23 Global Step: 494190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:40:59,635-Speed 6310.87 samples/sec Loss 4.3876 LearningRate 0.0002 Epoch: 23 Global Step: 494200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:02,917-Speed 6241.67 samples/sec Loss 4.3734 LearningRate 0.0002 Epoch: 23 Global Step: 494210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:06,145-Speed 6347.07 samples/sec Loss 4.2368 LearningRate 0.0002 Epoch: 23 Global Step: 494220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:09,392-Speed 6308.12 samples/sec Loss 4.2964 LearningRate 0.0002 Epoch: 23 Global Step: 494230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:12,643-Speed 6300.90 samples/sec Loss 4.3084 LearningRate 0.0002 Epoch: 23 Global Step: 494240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:15,888-Speed 6312.96 samples/sec Loss 4.2924 LearningRate 0.0002 Epoch: 23 Global Step: 494250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:19,134-Speed 6312.09 samples/sec Loss 4.2959 LearningRate 0.0002 Epoch: 23 Global Step: 494260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:22,379-Speed 6312.02 samples/sec Loss 4.2918 LearningRate 0.0002 Epoch: 23 Global Step: 494270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:25,627-Speed 6307.18 samples/sec Loss 4.2784 LearningRate 0.0002 Epoch: 23 Global Step: 494280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:28,877-Speed 6302.99 samples/sec Loss 4.3401 LearningRate 0.0002 Epoch: 23 Global Step: 494290 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:32,127-Speed 6302.49 samples/sec Loss 4.2924 LearningRate 0.0002 Epoch: 23 Global Step: 494300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:35,370-Speed 6317.47 samples/sec Loss 4.2664 LearningRate 0.0002 Epoch: 23 Global Step: 494310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:38,615-Speed 6313.41 samples/sec Loss 4.2832 LearningRate 0.0002 Epoch: 23 Global Step: 494320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:41:41,849-Speed 6334.14 samples/sec Loss 4.3001 LearningRate 0.0002 Epoch: 23 Global Step: 494330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:45,096-Speed 6308.44 samples/sec Loss 4.3270 LearningRate 0.0002 Epoch: 23 Global Step: 494340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:48,342-Speed 6310.56 samples/sec Loss 4.3184 LearningRate 0.0002 Epoch: 23 Global Step: 494350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:51,583-Speed 6320.86 samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 23 Global Step: 494360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:54,829-Speed 6310.98 samples/sec Loss 4.3114 LearningRate 0.0002 Epoch: 23 Global Step: 494370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:41:58,070-Speed 6319.54 samples/sec Loss 4.3048 LearningRate 0.0002 Epoch: 23 Global Step: 494380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:01,317-Speed 6308.68 samples/sec Loss 4.2972 LearningRate 0.0002 Epoch: 23 Global Step: 494390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:04,568-Speed 6301.29 samples/sec Loss 4.3095 LearningRate 0.0002 Epoch: 23 Global Step: 494400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:07,812-Speed 6314.36 samples/sec Loss 4.3201 LearningRate 0.0002 Epoch: 23 Global Step: 494410 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 12:42:11,056-Speed 6315.67 samples/sec Loss 4.3709 LearningRate 0.0002 Epoch: 23 Global Step: 494420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:14,304-Speed 6306.64 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 23 Global Step: 494430 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:42:17,545-Speed 6319.14 samples/sec Loss 4.3286 LearningRate 0.0002 Epoch: 23 Global Step: 494440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:20,790-Speed 6314.35 samples/sec Loss 4.4024 LearningRate 0.0002 Epoch: 23 Global Step: 494450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:24,034-Speed 6314.13 samples/sec Loss 4.3714 LearningRate 0.0002 Epoch: 23 Global Step: 494460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:27,280-Speed 6310.48 samples/sec Loss 4.2565 LearningRate 0.0002 Epoch: 23 Global Step: 494470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:30,524-Speed 6315.64 samples/sec Loss 4.2612 LearningRate 0.0002 Epoch: 23 Global Step: 494480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:33,775-Speed 6300.30 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 23 Global Step: 494490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:37,020-Speed 6313.12 samples/sec Loss 4.2942 LearningRate 0.0002 Epoch: 23 Global Step: 494500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:40,267-Speed 6308.96 samples/sec Loss 4.3515 LearningRate 0.0002 Epoch: 23 Global Step: 494510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:43,514-Speed 6308.13 samples/sec Loss 4.2615 LearningRate 0.0002 Epoch: 23 Global Step: 494520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:46,759-Speed 6313.55 samples/sec Loss 4.3265 LearningRate 0.0002 Epoch: 23 Global Step: 494530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
12:42:49,988-Speed 6344.19 samples/sec Loss 4.2866 LearningRate 0.0002 Epoch: 23 Global Step: 494540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:53,248-Speed 6283.20 samples/sec Loss 4.2870 LearningRate 0.0002 Epoch: 23 Global Step: 494550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:56,492-Speed 6314.89 samples/sec Loss 4.2893 LearningRate 0.0002 Epoch: 23 Global Step: 494560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:42:59,735-Speed 6316.80 samples/sec Loss 4.2998 LearningRate 0.0002 Epoch: 23 Global Step: 494570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:02,982-Speed 6308.64 samples/sec Loss 4.2639 LearningRate 0.0002 Epoch: 23 Global Step: 494580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:06,229-Speed 6308.96 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 23 Global Step: 494590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:09,496-Speed 6269.76 samples/sec Loss 4.3556 LearningRate 0.0002 Epoch: 23 Global Step: 494600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:12,743-Speed 6308.94 samples/sec Loss 4.2524 LearningRate 0.0002 Epoch: 23 Global Step: 494610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:15,985-Speed 6317.86 samples/sec Loss 4.2842 LearningRate 0.0002 Epoch: 23 Global Step: 494620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:19,246-Speed 6282.82 samples/sec Loss 4.3528 LearningRate 0.0002 Epoch: 23 Global Step: 494630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:22,478-Speed 6338.03 samples/sec Loss 4.2869 LearningRate 0.0002 Epoch: 23 Global Step: 494640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:25,726-Speed 6307.38 samples/sec Loss 4.2932 LearningRate 0.0002 Epoch: 23 Global Step: 494650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:28,970-Speed 6314.35 
samples/sec Loss 4.3695 LearningRate 0.0002 Epoch: 23 Global Step: 494660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:32,211-Speed 6320.26 samples/sec Loss 4.2220 LearningRate 0.0002 Epoch: 23 Global Step: 494670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:35,458-Speed 6309.65 samples/sec Loss 4.3127 LearningRate 0.0002 Epoch: 23 Global Step: 494680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:38,705-Speed 6307.13 samples/sec Loss 4.2978 LearningRate 0.0002 Epoch: 23 Global Step: 494690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:42,001-Speed 6215.51 samples/sec Loss 4.2876 LearningRate 0.0002 Epoch: 23 Global Step: 494700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:45,245-Speed 6315.89 samples/sec Loss 4.3395 LearningRate 0.0002 Epoch: 23 Global Step: 494710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:48,488-Speed 6316.59 samples/sec Loss 4.3096 LearningRate 0.0002 Epoch: 23 Global Step: 494720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:51,737-Speed 6303.99 samples/sec Loss 4.3532 LearningRate 0.0002 Epoch: 23 Global Step: 494730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:54,968-Speed 6341.12 samples/sec Loss 4.3292 LearningRate 0.0002 Epoch: 23 Global Step: 494740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:43:58,214-Speed 6309.60 samples/sec Loss 4.2744 LearningRate 0.0002 Epoch: 23 Global Step: 494750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:01,460-Speed 6311.67 samples/sec Loss 4.2779 LearningRate 0.0002 Epoch: 23 Global Step: 494760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:04,707-Speed 6309.79 samples/sec Loss 4.3206 LearningRate 0.0002 Epoch: 23 Global Step: 494770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:07,951-Speed 6313.97 samples/sec Loss 4.3234 
LearningRate 0.0002 Epoch: 23 Global Step: 494780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:11,199-Speed 6305.70 samples/sec Loss 4.3023 LearningRate 0.0002 Epoch: 23 Global Step: 494790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:14,445-Speed 6311.90 samples/sec Loss 4.3051 LearningRate 0.0002 Epoch: 23 Global Step: 494800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:17,694-Speed 6304.91 samples/sec Loss 4.3211 LearningRate 0.0002 Epoch: 23 Global Step: 494810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:20,936-Speed 6318.77 samples/sec Loss 4.2762 LearningRate 0.0002 Epoch: 23 Global Step: 494820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:24,184-Speed 6306.08 samples/sec Loss 4.3694 LearningRate 0.0002 Epoch: 23 Global Step: 494830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:27,429-Speed 6313.24 samples/sec Loss 4.2906 LearningRate 0.0002 Epoch: 23 Global Step: 494840 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:44:30,666-Speed 6327.29 samples/sec Loss 4.3019 LearningRate 0.0002 Epoch: 23 Global Step: 494850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:33,908-Speed 6319.29 samples/sec Loss 4.2633 LearningRate 0.0002 Epoch: 23 Global Step: 494860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:37,155-Speed 6308.77 samples/sec Loss 4.3349 LearningRate 0.0002 Epoch: 23 Global Step: 494870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:40,403-Speed 6306.36 samples/sec Loss 4.2575 LearningRate 0.0002 Epoch: 23 Global Step: 494880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:43,659-Speed 6292.33 samples/sec Loss 4.3331 LearningRate 0.0002 Epoch: 23 Global Step: 494890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:46,948-Speed 6228.33 samples/sec Loss 4.3211 LearningRate 0.0002 Epoch: 23 
Global Step: 494900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:50,197-Speed 6304.94 samples/sec Loss 4.3600 LearningRate 0.0002 Epoch: 23 Global Step: 494910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:53,448-Speed 6301.51 samples/sec Loss 4.2963 LearningRate 0.0002 Epoch: 23 Global Step: 494920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:56,700-Speed 6298.98 samples/sec Loss 4.2504 LearningRate 0.0002 Epoch: 23 Global Step: 494930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:44:59,944-Speed 6314.37 samples/sec Loss 4.2737 LearningRate 0.0002 Epoch: 23 Global Step: 494940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:03,178-Speed 6335.38 samples/sec Loss 4.2520 LearningRate 0.0002 Epoch: 23 Global Step: 494950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:06,423-Speed 6311.55 samples/sec Loss 4.2562 LearningRate 0.0002 Epoch: 23 Global Step: 494960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:09,667-Speed 6316.00 samples/sec Loss 4.2956 LearningRate 0.0002 Epoch: 23 Global Step: 494970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:12,910-Speed 6314.98 samples/sec Loss 4.2908 LearningRate 0.0002 Epoch: 23 Global Step: 494980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:16,154-Speed 6314.45 samples/sec Loss 4.2712 LearningRate 0.0002 Epoch: 23 Global Step: 494990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:19,401-Speed 6309.73 samples/sec Loss 4.3135 LearningRate 0.0002 Epoch: 23 Global Step: 495000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:22,644-Speed 6317.01 samples/sec Loss 4.2366 LearningRate 0.0002 Epoch: 23 Global Step: 495010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:25,890-Speed 6310.37 samples/sec Loss 4.2923 LearningRate 0.0002 Epoch: 23 Global Step: 495020 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:29,137-Speed 6309.59 samples/sec Loss 4.3592 LearningRate 0.0002 Epoch: 23 Global Step: 495030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:32,394-Speed 6289.12 samples/sec Loss 4.2875 LearningRate 0.0002 Epoch: 23 Global Step: 495040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:35,629-Speed 6331.33 samples/sec Loss 4.2528 LearningRate 0.0002 Epoch: 23 Global Step: 495050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:38,871-Speed 6318.71 samples/sec Loss 4.2628 LearningRate 0.0002 Epoch: 23 Global Step: 495060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:42,117-Speed 6311.16 samples/sec Loss 4.3573 LearningRate 0.0002 Epoch: 23 Global Step: 495070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:45,374-Speed 6290.47 samples/sec Loss 4.3488 LearningRate 0.0002 Epoch: 23 Global Step: 495080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:48,618-Speed 6313.52 samples/sec Loss 4.3277 LearningRate 0.0002 Epoch: 23 Global Step: 495090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:51,862-Speed 6314.32 samples/sec Loss 4.3057 LearningRate 0.0002 Epoch: 23 Global Step: 495100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:55,107-Speed 6312.98 samples/sec Loss 4.3331 LearningRate 0.0002 Epoch: 23 Global Step: 495110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:45:58,353-Speed 6312.25 samples/sec Loss 4.2862 LearningRate 0.0002 Epoch: 23 Global Step: 495120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:01,596-Speed 6315.30 samples/sec Loss 4.3018 LearningRate 0.0002 Epoch: 23 Global Step: 495130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:04,845-Speed 6305.49 samples/sec Loss 4.3167 LearningRate 0.0002 Epoch: 23 Global Step: 495140 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 12:46:08,081-Speed 6330.24 samples/sec Loss 4.2583 LearningRate 0.0002 Epoch: 23 Global Step: 495150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:11,328-Speed 6308.49 samples/sec Loss 4.3333 LearningRate 0.0002 Epoch: 23 Global Step: 495160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:14,576-Speed 6308.46 samples/sec Loss 4.3103 LearningRate 0.0002 Epoch: 23 Global Step: 495170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:17,819-Speed 6315.67 samples/sec Loss 4.2847 LearningRate 0.0002 Epoch: 23 Global Step: 495180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:21,062-Speed 6316.86 samples/sec Loss 4.2906 LearningRate 0.0002 Epoch: 23 Global Step: 495190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:24,312-Speed 6303.09 samples/sec Loss 4.2623 LearningRate 0.0002 Epoch: 23 Global Step: 495200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:27,557-Speed 6312.23 samples/sec Loss 4.2873 LearningRate 0.0002 Epoch: 23 Global Step: 495210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:30,801-Speed 6314.93 samples/sec Loss 4.2085 LearningRate 0.0002 Epoch: 23 Global Step: 495220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:34,059-Speed 6288.43 samples/sec Loss 4.3031 LearningRate 0.0002 Epoch: 23 Global Step: 495230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:37,303-Speed 6313.29 samples/sec Loss 4.3192 LearningRate 0.0002 Epoch: 23 Global Step: 495240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:40,549-Speed 6310.65 samples/sec Loss 4.2769 LearningRate 0.0002 Epoch: 23 Global Step: 495250 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:46:43,781-Speed 6338.56 samples/sec Loss 4.3348 LearningRate 0.0002 Epoch: 23 Global Step: 495260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
12:46:47,028-Speed 6308.65 samples/sec Loss 4.2908 LearningRate 0.0002 Epoch: 23 Global Step: 495270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:50,275-Speed 6307.99 samples/sec Loss 4.2875 LearningRate 0.0002 Epoch: 23 Global Step: 495280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:53,518-Speed 6316.46 samples/sec Loss 4.3346 LearningRate 0.0002 Epoch: 23 Global Step: 495290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:46:56,763-Speed 6313.49 samples/sec Loss 4.2925 LearningRate 0.0002 Epoch: 23 Global Step: 495300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:00,012-Speed 6305.46 samples/sec Loss 4.2854 LearningRate 0.0002 Epoch: 23 Global Step: 495310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:03,257-Speed 6312.47 samples/sec Loss 4.3977 LearningRate 0.0002 Epoch: 23 Global Step: 495320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:06,501-Speed 6315.81 samples/sec Loss 4.2882 LearningRate 0.0002 Epoch: 23 Global Step: 495330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:09,749-Speed 6307.53 samples/sec Loss 4.3746 LearningRate 0.0002 Epoch: 23 Global Step: 495340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:13,004-Speed 6293.23 samples/sec Loss 4.2846 LearningRate 0.0002 Epoch: 23 Global Step: 495350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:16,235-Speed 6339.56 samples/sec Loss 4.3228 LearningRate 0.0002 Epoch: 23 Global Step: 495360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:19,479-Speed 6313.13 samples/sec Loss 4.2673 LearningRate 0.0002 Epoch: 23 Global Step: 495370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:22,725-Speed 6312.30 samples/sec Loss 4.2904 LearningRate 0.0002 Epoch: 23 Global Step: 495380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:25,970-Speed 6311.89 
samples/sec Loss 4.3267 LearningRate 0.0002 Epoch: 23 Global Step: 495390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:29,214-Speed 6314.54 samples/sec Loss 4.3378 LearningRate 0.0002 Epoch: 23 Global Step: 495400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:32,458-Speed 6313.70 samples/sec Loss 4.3180 LearningRate 0.0002 Epoch: 23 Global Step: 495410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:35,706-Speed 6308.45 samples/sec Loss 4.3228 LearningRate 0.0002 Epoch: 23 Global Step: 495420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:38,951-Speed 6311.39 samples/sec Loss 4.3167 LearningRate 0.0002 Epoch: 23 Global Step: 495430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:42,197-Speed 6312.50 samples/sec Loss 4.2919 LearningRate 0.0002 Epoch: 23 Global Step: 495440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:45,443-Speed 6309.70 samples/sec Loss 4.3253 LearningRate 0.0002 Epoch: 23 Global Step: 495450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:48,676-Speed 6335.46 samples/sec Loss 4.3143 LearningRate 0.0002 Epoch: 23 Global Step: 495460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:51,922-Speed 6311.63 samples/sec Loss 4.3291 LearningRate 0.0002 Epoch: 23 Global Step: 495470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:55,165-Speed 6317.42 samples/sec Loss 4.3173 LearningRate 0.0002 Epoch: 23 Global Step: 495480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:47:58,408-Speed 6315.97 samples/sec Loss 4.2894 LearningRate 0.0002 Epoch: 23 Global Step: 495490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:01,653-Speed 6312.65 samples/sec Loss 4.2584 LearningRate 0.0002 Epoch: 23 Global Step: 495500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:04,992-Speed 6135.51 samples/sec Loss 4.2888 
LearningRate 0.0002 Epoch: 23 Global Step: 495510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:08,265-Speed 6257.73 samples/sec Loss 4.2470 LearningRate 0.0002 Epoch: 23 Global Step: 495520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:11,543-Speed 6249.50 samples/sec Loss 4.2820 LearningRate 0.0002 Epoch: 23 Global Step: 495530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:14,843-Speed 6207.59 samples/sec Loss 4.3017 LearningRate 0.0002 Epoch: 23 Global Step: 495540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:18,094-Speed 6300.78 samples/sec Loss 4.3614 LearningRate 0.0002 Epoch: 23 Global Step: 495550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:21,324-Speed 6341.79 samples/sec Loss 4.3264 LearningRate 0.0002 Epoch: 23 Global Step: 495560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:24,570-Speed 6311.98 samples/sec Loss 4.2584 LearningRate 0.0002 Epoch: 23 Global Step: 495570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:27,812-Speed 6318.34 samples/sec Loss 4.3375 LearningRate 0.0002 Epoch: 23 Global Step: 495580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:31,081-Speed 6267.04 samples/sec Loss 4.2207 LearningRate 0.0002 Epoch: 23 Global Step: 495590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:34,326-Speed 6311.28 samples/sec Loss 4.3170 LearningRate 0.0002 Epoch: 23 Global Step: 495600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:37,571-Speed 6313.15 samples/sec Loss 4.2503 LearningRate 0.0002 Epoch: 23 Global Step: 495610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:40,816-Speed 6313.92 samples/sec Loss 4.3410 LearningRate 0.0002 Epoch: 23 Global Step: 495620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:44,062-Speed 6309.68 samples/sec Loss 4.3378 LearningRate 0.0002 Epoch: 23 
Global Step: 495630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:47,317-Speed 6293.90 samples/sec Loss 4.2883 LearningRate 0.0002 Epoch: 23 Global Step: 495640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:50,565-Speed 6307.09 samples/sec Loss 4.2296 LearningRate 0.0002 Epoch: 23 Global Step: 495650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:53,799-Speed 6333.30 samples/sec Loss 4.3360 LearningRate 0.0002 Epoch: 23 Global Step: 495660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:48:57,056-Speed 6289.13 samples/sec Loss 4.3192 LearningRate 0.0002 Epoch: 23 Global Step: 495670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:00,301-Speed 6312.95 samples/sec Loss 4.2531 LearningRate 0.0002 Epoch: 23 Global Step: 495680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:03,554-Speed 6296.79 samples/sec Loss 4.3836 LearningRate 0.0002 Epoch: 23 Global Step: 495690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:06,803-Speed 6305.86 samples/sec Loss 4.3204 LearningRate 0.0002 Epoch: 23 Global Step: 495700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:10,045-Speed 6319.03 samples/sec Loss 4.3090 LearningRate 0.0002 Epoch: 23 Global Step: 495710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:13,292-Speed 6309.18 samples/sec Loss 4.2989 LearningRate 0.0002 Epoch: 23 Global Step: 495720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:16,536-Speed 6313.99 samples/sec Loss 4.3186 LearningRate 0.0002 Epoch: 23 Global Step: 495730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:19,781-Speed 6313.84 samples/sec Loss 4.2725 LearningRate 0.0002 Epoch: 23 Global Step: 495740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:23,028-Speed 6308.18 samples/sec Loss 4.2689 LearningRate 0.0002 Epoch: 23 Global Step: 495750 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:26,259-Speed 6339.61 samples/sec Loss 4.2521 LearningRate 0.0002 Epoch: 23 Global Step: 495760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:29,506-Speed 6309.49 samples/sec Loss 4.3647 LearningRate 0.0002 Epoch: 23 Global Step: 495770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:32,750-Speed 6315.13 samples/sec Loss 4.3219 LearningRate 0.0002 Epoch: 23 Global Step: 495780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:36,005-Speed 6292.67 samples/sec Loss 4.2793 LearningRate 0.0002 Epoch: 23 Global Step: 495790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:39,261-Speed 6291.11 samples/sec Loss 4.2835 LearningRate 0.0002 Epoch: 23 Global Step: 495800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:42,506-Speed 6313.76 samples/sec Loss 4.2167 LearningRate 0.0002 Epoch: 23 Global Step: 495810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:45,754-Speed 6306.68 samples/sec Loss 4.3174 LearningRate 0.0002 Epoch: 23 Global Step: 495820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:49,001-Speed 6306.94 samples/sec Loss 4.2979 LearningRate 0.0002 Epoch: 23 Global Step: 495830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:52,244-Speed 6317.10 samples/sec Loss 4.2848 LearningRate 0.0002 Epoch: 23 Global Step: 495840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:55,488-Speed 6315.90 samples/sec Loss 4.2674 LearningRate 0.0002 Epoch: 23 Global Step: 495850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:49:58,735-Speed 6307.72 samples/sec Loss 4.2630 LearningRate 0.0002 Epoch: 23 Global Step: 495860 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:50:01,968-Speed 6336.87 samples/sec Loss 4.2977 LearningRate 0.0002 Epoch: 23 Global Step: 495870 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 12:50:05,227-Speed 6285.01 samples/sec Loss 4.3050 LearningRate 0.0002 Epoch: 23 Global Step: 495880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:08,475-Speed 6307.03 samples/sec Loss 4.3389 LearningRate 0.0002 Epoch: 23 Global Step: 495890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:11,720-Speed 6312.55 samples/sec Loss 4.3104 LearningRate 0.0002 Epoch: 23 Global Step: 495900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:14,965-Speed 6311.73 samples/sec Loss 4.3003 LearningRate 0.0002 Epoch: 23 Global Step: 495910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:18,210-Speed 6312.98 samples/sec Loss 4.2800 LearningRate 0.0002 Epoch: 23 Global Step: 495920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:21,454-Speed 6315.44 samples/sec Loss 4.2523 LearningRate 0.0002 Epoch: 23 Global Step: 495930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:24,699-Speed 6312.72 samples/sec Loss 4.2770 LearningRate 0.0002 Epoch: 23 Global Step: 495940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:27,947-Speed 6306.32 samples/sec Loss 4.3252 LearningRate 0.0002 Epoch: 23 Global Step: 495950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:31,200-Speed 6298.31 samples/sec Loss 4.2987 LearningRate 0.0002 Epoch: 23 Global Step: 495960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:34,430-Speed 6341.54 samples/sec Loss 4.2785 LearningRate 0.0002 Epoch: 23 Global Step: 495970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:37,678-Speed 6308.17 samples/sec Loss 4.2993 LearningRate 0.0002 Epoch: 23 Global Step: 495980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:40,929-Speed 6300.83 samples/sec Loss 4.2392 LearningRate 0.0002 Epoch: 23 Global Step: 495990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
12:50:44,172-Speed 6315.79 samples/sec Loss 4.2968 LearningRate 0.0002 Epoch: 23 Global Step: 496000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:47,419-Speed 6310.04 samples/sec Loss 4.2701 LearningRate 0.0002 Epoch: 23 Global Step: 496010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:50,664-Speed 6312.30 samples/sec Loss 4.3010 LearningRate 0.0002 Epoch: 23 Global Step: 496020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:53,912-Speed 6305.67 samples/sec Loss 4.2273 LearningRate 0.0002 Epoch: 23 Global Step: 496030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:50:57,168-Speed 6292.93 samples/sec Loss 4.3078 LearningRate 0.0002 Epoch: 23 Global Step: 496040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:00,411-Speed 6315.81 samples/sec Loss 4.3127 LearningRate 0.0002 Epoch: 23 Global Step: 496050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:03,661-Speed 6302.39 samples/sec Loss 4.3454 LearningRate 0.0002 Epoch: 23 Global Step: 496060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:06,892-Speed 6340.65 samples/sec Loss 4.3006 LearningRate 0.0002 Epoch: 23 Global Step: 496070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:10,137-Speed 6312.24 samples/sec Loss 4.3203 LearningRate 0.0002 Epoch: 23 Global Step: 496080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:13,386-Speed 6306.18 samples/sec Loss 4.2995 LearningRate 0.0002 Epoch: 23 Global Step: 496090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:16,632-Speed 6309.80 samples/sec Loss 4.2714 LearningRate 0.0002 Epoch: 23 Global Step: 496100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:19,887-Speed 6292.24 samples/sec Loss 4.3228 LearningRate 0.0002 Epoch: 23 Global Step: 496110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:23,132-Speed 6314.55 
samples/sec Loss 4.3210 LearningRate 0.0002 Epoch: 23 Global Step: 496120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:26,377-Speed 6312.63 samples/sec Loss 4.2712 LearningRate 0.0002 Epoch: 23 Global Step: 496130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:29,618-Speed 6319.26 samples/sec Loss 4.3686 LearningRate 0.0002 Epoch: 23 Global Step: 496140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:32,862-Speed 6314.80 samples/sec Loss 4.3911 LearningRate 0.0002 Epoch: 23 Global Step: 496150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:36,104-Speed 6317.98 samples/sec Loss 4.3448 LearningRate 0.0002 Epoch: 23 Global Step: 496160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:39,338-Speed 6335.47 samples/sec Loss 4.3282 LearningRate 0.0002 Epoch: 23 Global Step: 496170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:42,583-Speed 6313.24 samples/sec Loss 4.2385 LearningRate 0.0002 Epoch: 23 Global Step: 496180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:45,829-Speed 6309.83 samples/sec Loss 4.2515 LearningRate 0.0002 Epoch: 23 Global Step: 496190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:49,078-Speed 6305.07 samples/sec Loss 4.2903 LearningRate 0.0002 Epoch: 23 Global Step: 496200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:52,322-Speed 6314.97 samples/sec Loss 4.2399 LearningRate 0.0002 Epoch: 23 Global Step: 496210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:55,570-Speed 6308.28 samples/sec Loss 4.2690 LearningRate 0.0002 Epoch: 23 Global Step: 496220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:51:58,820-Speed 6301.87 samples/sec Loss 4.3351 LearningRate 0.0002 Epoch: 23 Global Step: 496230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:02,065-Speed 6312.07 samples/sec Loss 4.2453 
LearningRate 0.0002 Epoch: 23 Global Step: 496240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:05,315-Speed 6304.76 samples/sec Loss 4.2825 LearningRate 0.0002 Epoch: 23 Global Step: 496250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:08,563-Speed 6306.36 samples/sec Loss 4.2785 LearningRate 0.0002 Epoch: 23 Global Step: 496260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:11,797-Speed 6333.22 samples/sec Loss 4.2388 LearningRate 0.0002 Epoch: 23 Global Step: 496270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:15,039-Speed 6318.41 samples/sec Loss 4.2790 LearningRate 0.0002 Epoch: 23 Global Step: 496280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:18,282-Speed 6316.53 samples/sec Loss 4.3224 LearningRate 0.0002 Epoch: 23 Global Step: 496290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:21,527-Speed 6312.12 samples/sec Loss 4.2925 LearningRate 0.0002 Epoch: 23 Global Step: 496300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:24,776-Speed 6306.65 samples/sec Loss 4.2376 LearningRate 0.0002 Epoch: 23 Global Step: 496310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:28,022-Speed 6309.00 samples/sec Loss 4.3260 LearningRate 0.0002 Epoch: 23 Global Step: 496320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:31,270-Speed 6307.46 samples/sec Loss 4.3199 LearningRate 0.0002 Epoch: 23 Global Step: 496330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:34,516-Speed 6311.49 samples/sec Loss 4.3207 LearningRate 0.0002 Epoch: 23 Global Step: 496340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:37,766-Speed 6303.29 samples/sec Loss 4.2893 LearningRate 0.0002 Epoch: 23 Global Step: 496350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:41,014-Speed 6306.05 samples/sec Loss 4.3653 LearningRate 0.0002 Epoch: 23 
Global Step: 496360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:44,246-Speed 6338.25 samples/sec Loss 4.3388 LearningRate 0.0002 Epoch: 23 Global Step: 496370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:47,487-Speed 6320.12 samples/sec Loss 4.3318 LearningRate 0.0002 Epoch: 23 Global Step: 496380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:50,740-Speed 6297.85 samples/sec Loss 4.2942 LearningRate 0.0002 Epoch: 23 Global Step: 496390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:53,987-Speed 6309.89 samples/sec Loss 4.3088 LearningRate 0.0002 Epoch: 23 Global Step: 496400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:52:57,230-Speed 6315.82 samples/sec Loss 4.3483 LearningRate 0.0002 Epoch: 23 Global Step: 496410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:00,478-Speed 6306.09 samples/sec Loss 4.3481 LearningRate 0.0002 Epoch: 23 Global Step: 496420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:03,804-Speed 6159.93 samples/sec Loss 4.3393 LearningRate 0.0002 Epoch: 23 Global Step: 496430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:07,051-Speed 6309.28 samples/sec Loss 4.2787 LearningRate 0.0002 Epoch: 23 Global Step: 496440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:10,302-Speed 6299.78 samples/sec Loss 4.2838 LearningRate 0.0002 Epoch: 23 Global Step: 496450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:13,546-Speed 6315.63 samples/sec Loss 4.2755 LearningRate 0.0002 Epoch: 23 Global Step: 496460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:16,776-Speed 6341.88 samples/sec Loss 4.2864 LearningRate 0.0002 Epoch: 23 Global Step: 496470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:20,022-Speed 6310.96 samples/sec Loss 4.2696 LearningRate 0.0002 Epoch: 23 Global Step: 496480 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:23,267-Speed 6311.93 samples/sec Loss 4.2836 LearningRate 0.0002 Epoch: 23 Global Step: 496490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:26,511-Speed 6315.02 samples/sec Loss 4.3172 LearningRate 0.0002 Epoch: 23 Global Step: 496500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:29,765-Speed 6295.70 samples/sec Loss 4.3052 LearningRate 0.0002 Epoch: 23 Global Step: 496510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:33,013-Speed 6306.20 samples/sec Loss 4.2672 LearningRate 0.0002 Epoch: 23 Global Step: 496520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:36,302-Speed 6228.15 samples/sec Loss 4.3003 LearningRate 0.0002 Epoch: 23 Global Step: 496530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:39,551-Speed 6304.88 samples/sec Loss 4.2857 LearningRate 0.0002 Epoch: 23 Global Step: 496540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:42,796-Speed 6311.89 samples/sec Loss 4.3362 LearningRate 0.0002 Epoch: 23 Global Step: 496550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:46,039-Speed 6317.65 samples/sec Loss 4.2303 LearningRate 0.0002 Epoch: 23 Global Step: 496560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:49,285-Speed 6310.96 samples/sec Loss 4.2469 LearningRate 0.0002 Epoch: 23 Global Step: 496570 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:53:52,515-Speed 6340.91 samples/sec Loss 4.3260 LearningRate 0.0002 Epoch: 23 Global Step: 496580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:55,762-Speed 6309.30 samples/sec Loss 4.3121 LearningRate 0.0002 Epoch: 23 Global Step: 496590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:53:59,017-Speed 6294.34 samples/sec Loss 4.2676 LearningRate 0.0002 Epoch: 23 Global Step: 496600 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 12:54:02,263-Speed 6310.59 samples/sec Loss 4.3244 LearningRate 0.0002 Epoch: 23 Global Step: 496610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:05,513-Speed 6303.50 samples/sec Loss 4.2888 LearningRate 0.0002 Epoch: 23 Global Step: 496620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:08,757-Speed 6315.35 samples/sec Loss 4.3311 LearningRate 0.0002 Epoch: 23 Global Step: 496630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:12,001-Speed 6312.83 samples/sec Loss 4.3466 LearningRate 0.0002 Epoch: 23 Global Step: 496640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:15,246-Speed 6314.59 samples/sec Loss 4.3190 LearningRate 0.0002 Epoch: 23 Global Step: 496650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:18,495-Speed 6303.82 samples/sec Loss 4.3205 LearningRate 0.0002 Epoch: 23 Global Step: 496660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:21,739-Speed 6315.33 samples/sec Loss 4.2565 LearningRate 0.0002 Epoch: 23 Global Step: 496670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:24,973-Speed 6334.45 samples/sec Loss 4.2730 LearningRate 0.0002 Epoch: 23 Global Step: 496680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:28,218-Speed 6312.20 samples/sec Loss 4.3232 LearningRate 0.0002 Epoch: 23 Global Step: 496690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:31,467-Speed 6303.78 samples/sec Loss 4.2974 LearningRate 0.0002 Epoch: 23 Global Step: 496700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:34,714-Speed 6309.49 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 23 Global Step: 496710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:37,960-Speed 6310.83 samples/sec Loss 4.2464 LearningRate 0.0002 Epoch: 23 Global Step: 496720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
12:54:41,201-Speed 6320.21 samples/sec Loss 4.2865 LearningRate 0.0002 Epoch: 23 Global Step: 496730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:44,455-Speed 6296.46 samples/sec Loss 4.3239 LearningRate 0.0002 Epoch: 23 Global Step: 496740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:47,703-Speed 6306.56 samples/sec Loss 4.2633 LearningRate 0.0002 Epoch: 23 Global Step: 496750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:50,952-Speed 6304.76 samples/sec Loss 4.2453 LearningRate 0.0002 Epoch: 23 Global Step: 496760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:54,195-Speed 6316.56 samples/sec Loss 4.3335 LearningRate 0.0002 Epoch: 23 Global Step: 496770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:54:57,428-Speed 6335.99 samples/sec Loss 4.2997 LearningRate 0.0002 Epoch: 23 Global Step: 496780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:00,674-Speed 6309.66 samples/sec Loss 4.3456 LearningRate 0.0002 Epoch: 23 Global Step: 496790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:03,922-Speed 6307.76 samples/sec Loss 4.2187 LearningRate 0.0002 Epoch: 23 Global Step: 496800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:07,166-Speed 6314.65 samples/sec Loss 4.2293 LearningRate 0.0002 Epoch: 23 Global Step: 496810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:10,411-Speed 6312.56 samples/sec Loss 4.2778 LearningRate 0.0002 Epoch: 23 Global Step: 496820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:13,656-Speed 6313.52 samples/sec Loss 4.2620 LearningRate 0.0002 Epoch: 23 Global Step: 496830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:16,897-Speed 6320.43 samples/sec Loss 4.2980 LearningRate 0.0002 Epoch: 23 Global Step: 496840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:20,140-Speed 6316.71 
samples/sec Loss 4.2559 LearningRate 0.0002 Epoch: 23 Global Step: 496850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:23,382-Speed 6318.54 samples/sec Loss 4.2778 LearningRate 0.0002 Epoch: 23 Global Step: 496860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:26,628-Speed 6309.88 samples/sec Loss 4.3418 LearningRate 0.0002 Epoch: 23 Global Step: 496870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:29,858-Speed 6342.49 samples/sec Loss 4.2589 LearningRate 0.0002 Epoch: 23 Global Step: 496880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:33,102-Speed 6315.52 samples/sec Loss 4.3502 LearningRate 0.0002 Epoch: 23 Global Step: 496890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:36,349-Speed 6308.49 samples/sec Loss 4.3945 LearningRate 0.0002 Epoch: 23 Global Step: 496900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:39,595-Speed 6311.58 samples/sec Loss 4.3156 LearningRate 0.0002 Epoch: 23 Global Step: 496910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:42,840-Speed 6311.57 samples/sec Loss 4.3066 LearningRate 0.0002 Epoch: 23 Global Step: 496920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:46,086-Speed 6311.98 samples/sec Loss 4.2473 LearningRate 0.0002 Epoch: 23 Global Step: 496930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:49,330-Speed 6314.49 samples/sec Loss 4.3044 LearningRate 0.0002 Epoch: 23 Global Step: 496940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:52,578-Speed 6305.77 samples/sec Loss 4.2846 LearningRate 0.0002 Epoch: 23 Global Step: 496950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:55,821-Speed 6316.04 samples/sec Loss 4.2727 LearningRate 0.0002 Epoch: 23 Global Step: 496960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:55:59,067-Speed 6310.77 samples/sec Loss 4.3467 
LearningRate 0.0002 Epoch: 23 Global Step: 496970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:02,299-Speed 6337.84 samples/sec Loss 4.2808 LearningRate 0.0002 Epoch: 23 Global Step: 496980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:05,544-Speed 6313.50 samples/sec Loss 4.2335 LearningRate 0.0002 Epoch: 23 Global Step: 496990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:08,790-Speed 6310.96 samples/sec Loss 4.2720 LearningRate 0.0002 Epoch: 23 Global Step: 497000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:12,031-Speed 6321.14 samples/sec Loss 4.2399 LearningRate 0.0002 Epoch: 23 Global Step: 497010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:15,279-Speed 6305.86 samples/sec Loss 4.3385 LearningRate 0.0002 Epoch: 23 Global Step: 497020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:18,524-Speed 6312.74 samples/sec Loss 4.3281 LearningRate 0.0002 Epoch: 23 Global Step: 497030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:21,769-Speed 6313.51 samples/sec Loss 4.3016 LearningRate 0.0002 Epoch: 23 Global Step: 497040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:25,030-Speed 6281.55 samples/sec Loss 4.2701 LearningRate 0.0002 Epoch: 23 Global Step: 497050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:28,277-Speed 6309.12 samples/sec Loss 4.2931 LearningRate 0.0002 Epoch: 23 Global Step: 497060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:31,522-Speed 6312.83 samples/sec Loss 4.2671 LearningRate 0.0002 Epoch: 23 Global Step: 497070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:34,769-Speed 6307.97 samples/sec Loss 4.3543 LearningRate 0.0002 Epoch: 23 Global Step: 497080 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:56:38,001-Speed 6338.27 samples/sec Loss 4.2947 LearningRate 0.0002 Epoch: 23 
Global Step: 497090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:41,251-Speed 6304.42 samples/sec Loss 4.2775 LearningRate 0.0002 Epoch: 23 Global Step: 497100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:44,499-Speed 6305.16 samples/sec Loss 4.2536 LearningRate 0.0002 Epoch: 23 Global Step: 497110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:47,745-Speed 6312.43 samples/sec Loss 4.2889 LearningRate 0.0002 Epoch: 23 Global Step: 497120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:50,987-Speed 6317.70 samples/sec Loss 4.2890 LearningRate 0.0002 Epoch: 23 Global Step: 497130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:54,230-Speed 6315.33 samples/sec Loss 4.2455 LearningRate 0.0002 Epoch: 23 Global Step: 497140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:56:57,471-Speed 6322.13 samples/sec Loss 4.2244 LearningRate 0.0002 Epoch: 23 Global Step: 497150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:00,722-Speed 6300.80 samples/sec Loss 4.2777 LearningRate 0.0002 Epoch: 23 Global Step: 497160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:03,971-Speed 6305.19 samples/sec Loss 4.3214 LearningRate 0.0002 Epoch: 23 Global Step: 497170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:07,218-Speed 6306.98 samples/sec Loss 4.3011 LearningRate 0.0002 Epoch: 23 Global Step: 497180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:10,456-Speed 6327.87 samples/sec Loss 4.3071 LearningRate 0.0002 Epoch: 23 Global Step: 497190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:13,711-Speed 6293.27 samples/sec Loss 4.3428 LearningRate 0.0002 Epoch: 23 Global Step: 497200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:16,955-Speed 6313.93 samples/sec Loss 4.2868 LearningRate 0.0002 Epoch: 23 Global Step: 497210 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:20,202-Speed 6309.96 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 23 Global Step: 497220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:23,449-Speed 6308.18 samples/sec Loss 4.2761 LearningRate 0.0002 Epoch: 23 Global Step: 497230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:26,699-Speed 6302.20 samples/sec Loss 4.2708 LearningRate 0.0002 Epoch: 23 Global Step: 497240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:29,943-Speed 6315.48 samples/sec Loss 4.2913 LearningRate 0.0002 Epoch: 23 Global Step: 497250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:33,187-Speed 6315.31 samples/sec Loss 4.2325 LearningRate 0.0002 Epoch: 23 Global Step: 497260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:36,431-Speed 6313.87 samples/sec Loss 4.2973 LearningRate 0.0002 Epoch: 23 Global Step: 497270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:39,675-Speed 6315.07 samples/sec Loss 4.3370 LearningRate 0.0002 Epoch: 23 Global Step: 497280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:42,925-Speed 6304.12 samples/sec Loss 4.2924 LearningRate 0.0002 Epoch: 23 Global Step: 497290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:57:46,151-Speed 6350.51 samples/sec Loss 4.3019 LearningRate 0.0002 Epoch: 23 Global Step: 497300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:49,396-Speed 6312.64 samples/sec Loss 4.2822 LearningRate 0.0002 Epoch: 23 Global Step: 497310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:52,637-Speed 6320.16 samples/sec Loss 4.3069 LearningRate 0.0002 Epoch: 23 Global Step: 497320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:57:55,883-Speed 6310.58 samples/sec Loss 4.2709 LearningRate 0.0002 Epoch: 23 Global Step: 497330 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 12:57:59,127-Speed 6315.01 samples/sec Loss 4.3183 LearningRate 0.0002 Epoch: 23 Global Step: 497340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:02,371-Speed 6313.04 samples/sec Loss 4.2570 LearningRate 0.0002 Epoch: 23 Global Step: 497350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:05,614-Speed 6317.26 samples/sec Loss 4.2403 LearningRate 0.0002 Epoch: 23 Global Step: 497360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:08,858-Speed 6315.50 samples/sec Loss 4.3230 LearningRate 0.0002 Epoch: 23 Global Step: 497370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:12,104-Speed 6310.06 samples/sec Loss 4.2411 LearningRate 0.0002 Epoch: 23 Global Step: 497380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:15,347-Speed 6317.37 samples/sec Loss 4.2771 LearningRate 0.0002 Epoch: 23 Global Step: 497390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:18,578-Speed 6338.35 samples/sec Loss 4.2726 LearningRate 0.0002 Epoch: 23 Global Step: 497400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:21,824-Speed 6311.94 samples/sec Loss 4.2641 LearningRate 0.0002 Epoch: 23 Global Step: 497410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:25,065-Speed 6319.81 samples/sec Loss 4.2822 LearningRate 0.0002 Epoch: 23 Global Step: 497420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:28,312-Speed 6309.37 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 23 Global Step: 497430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:31,558-Speed 6309.83 samples/sec Loss 4.2615 LearningRate 0.0002 Epoch: 23 Global Step: 497440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:34,802-Speed 6315.21 samples/sec Loss 4.3192 LearningRate 0.0002 Epoch: 23 Global Step: 497450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
12:58:38,046-Speed 6313.82 samples/sec Loss 4.2462 LearningRate 0.0002 Epoch: 23 Global Step: 497460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:41,297-Speed 6302.47 samples/sec Loss 4.2851 LearningRate 0.0002 Epoch: 23 Global Step: 497470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:44,542-Speed 6313.35 samples/sec Loss 4.2728 LearningRate 0.0002 Epoch: 23 Global Step: 497480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:47,785-Speed 6316.65 samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 23 Global Step: 497490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:51,023-Speed 6326.55 samples/sec Loss 4.2857 LearningRate 0.0002 Epoch: 23 Global Step: 497500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:54,268-Speed 6312.73 samples/sec Loss 4.2846 LearningRate 0.0002 Epoch: 23 Global Step: 497510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:58:57,511-Speed 6315.94 samples/sec Loss 4.3090 LearningRate 0.0002 Epoch: 23 Global Step: 497520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:00,754-Speed 6317.24 samples/sec Loss 4.3709 LearningRate 0.0002 Epoch: 23 Global Step: 497530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:04,004-Speed 6302.48 samples/sec Loss 4.3243 LearningRate 0.0002 Epoch: 23 Global Step: 497540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:07,250-Speed 6310.12 samples/sec Loss 4.3363 LearningRate 0.0002 Epoch: 23 Global Step: 497550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:10,495-Speed 6314.26 samples/sec Loss 4.3014 LearningRate 0.0002 Epoch: 23 Global Step: 497560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:13,747-Speed 6298.49 samples/sec Loss 4.2213 LearningRate 0.0002 Epoch: 23 Global Step: 497570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:16,991-Speed 6315.33 
samples/sec Loss 4.3422 LearningRate 0.0002 Epoch: 23 Global Step: 497580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:20,238-Speed 6308.60 samples/sec Loss 4.3786 LearningRate 0.0002 Epoch: 23 Global Step: 497590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:23,482-Speed 6313.75 samples/sec Loss 4.3685 LearningRate 0.0002 Epoch: 23 Global Step: 497600 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 12:59:26,715-Speed 6336.25 samples/sec Loss 4.2741 LearningRate 0.0002 Epoch: 23 Global Step: 497610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:29,957-Speed 6319.33 samples/sec Loss 4.2588 LearningRate 0.0002 Epoch: 23 Global Step: 497620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:33,199-Speed 6317.73 samples/sec Loss 4.2809 LearningRate 0.0002 Epoch: 23 Global Step: 497630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:36,443-Speed 6314.12 samples/sec Loss 4.2383 LearningRate 0.0002 Epoch: 23 Global Step: 497640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:39,685-Speed 6319.53 samples/sec Loss 4.3198 LearningRate 0.0002 Epoch: 23 Global Step: 497650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:42,928-Speed 6316.99 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 23 Global Step: 497660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:46,174-Speed 6310.15 samples/sec Loss 4.3276 LearningRate 0.0002 Epoch: 23 Global Step: 497670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:49,419-Speed 6313.89 samples/sec Loss 4.2924 LearningRate 0.0002 Epoch: 23 Global Step: 497680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:52,663-Speed 6314.92 samples/sec Loss 4.2986 LearningRate 0.0002 Epoch: 23 Global Step: 497690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:55,908-Speed 6312.85 samples/sec Loss 4.3232 
LearningRate 0.0002 Epoch: 23 Global Step: 497700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 12:59:59,150-Speed 6318.94 samples/sec Loss 4.2246 LearningRate 0.0002 Epoch: 23 Global Step: 497710 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:00:02,385-Speed 6330.55 samples/sec Loss 4.2706 LearningRate 0.0002 Epoch: 23 Global Step: 497720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:00:05,638-Speed 6298.44 samples/sec Loss 4.3463 LearningRate 0.0002 Epoch: 23 Global Step: 497730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:00:08,882-Speed 6314.34 samples/sec Loss 4.3547 LearningRate 0.0002 Epoch: 23 Global Step: 497740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:09,955-Speed 335.34 samples/sec Loss 4.3508 LearningRate 0.0002 Epoch: 24 Global Step: 497750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:13,172-Speed 6367.72 samples/sec Loss 4.3372 LearningRate 0.0002 Epoch: 24 Global Step: 497760 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:16,423-Speed 6301.59 samples/sec Loss 4.2820 LearningRate 0.0002 Epoch: 24 Global Step: 497770 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:19,669-Speed 6310.45 samples/sec Loss 4.3178 LearningRate 0.0002 Epoch: 24 Global Step: 497780 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:22,906-Speed 6329.25 samples/sec Loss 4.3268 LearningRate 0.0002 Epoch: 24 Global Step: 497790 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:26,143-Speed 6327.37 samples/sec Loss 4.2836 LearningRate 0.0002 Epoch: 24 Global Step: 497800 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:29,378-Speed 6331.37 samples/sec Loss 4.2572 LearningRate 0.0002 Epoch: 24 Global Step: 497810 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:32,615-Speed 6329.91 samples/sec Loss 4.2935 LearningRate 0.0002 Epoch: 24 Global 
Step: 497820 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:35,853-Speed 6324.79 samples/sec Loss 4.3861 LearningRate 0.0002 Epoch: 24 Global Step: 497830 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:39,092-Speed 6324.66 samples/sec Loss 4.3095 LearningRate 0.0002 Epoch: 24 Global Step: 497840 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:42,332-Speed 6323.28 samples/sec Loss 4.2966 LearningRate 0.0002 Epoch: 24 Global Step: 497850 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:01:45,564-Speed 6337.11 samples/sec Loss 4.2388 LearningRate 0.0002 Epoch: 24 Global Step: 497860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:48,801-Speed 6329.16 samples/sec Loss 4.2826 LearningRate 0.0002 Epoch: 24 Global Step: 497870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:52,039-Speed 6327.43 samples/sec Loss 4.2661 LearningRate 0.0002 Epoch: 24 Global Step: 497880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:55,279-Speed 6320.81 samples/sec Loss 4.2514 LearningRate 0.0002 Epoch: 24 Global Step: 497890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:01:58,520-Speed 6320.92 samples/sec Loss 4.2631 LearningRate 0.0002 Epoch: 24 Global Step: 497900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:01,778-Speed 6288.11 samples/sec Loss 4.2466 LearningRate 0.0002 Epoch: 24 Global Step: 497910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:05,018-Speed 6322.86 samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 24 Global Step: 497920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:08,257-Speed 6324.20 samples/sec Loss 4.3029 LearningRate 0.0002 Epoch: 24 Global Step: 497930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:11,496-Speed 6323.81 samples/sec Loss 4.2691 LearningRate 0.0002 Epoch: 24 Global Step: 497940 Fp16 Grad Scale: 16384 
Required: 30 hours Training: 2022-04-02 13:02:14,740-Speed 6314.55 samples/sec Loss 4.2467 LearningRate 0.0002 Epoch: 24 Global Step: 497950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:17,972-Speed 6339.05 samples/sec Loss 4.2281 LearningRate 0.0002 Epoch: 24 Global Step: 497960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:21,223-Speed 6300.91 samples/sec Loss 4.2892 LearningRate 0.0002 Epoch: 24 Global Step: 497970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:24,465-Speed 6318.52 samples/sec Loss 4.2590 LearningRate 0.0002 Epoch: 24 Global Step: 497980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:27,719-Speed 6293.70 samples/sec Loss 4.2754 LearningRate 0.0002 Epoch: 24 Global Step: 497990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:30,963-Speed 6316.97 samples/sec Loss 4.2347 LearningRate 0.0002 Epoch: 24 Global Step: 498000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:34,213-Speed 6301.54 samples/sec Loss 4.3284 LearningRate 0.0002 Epoch: 24 Global Step: 498010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:37,466-Speed 6297.05 samples/sec Loss 4.3055 LearningRate 0.0002 Epoch: 24 Global Step: 498020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:40,712-Speed 6310.92 samples/sec Loss 4.2617 LearningRate 0.0002 Epoch: 24 Global Step: 498030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:43,950-Speed 6327.66 samples/sec Loss 4.2373 LearningRate 0.0002 Epoch: 24 Global Step: 498040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:47,187-Speed 6327.84 samples/sec Loss 4.2637 LearningRate 0.0002 Epoch: 24 Global Step: 498050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:50,412-Speed 6350.78 samples/sec Loss 4.2607 LearningRate 0.0002 Epoch: 24 Global Step: 498060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 
2022-04-02 13:02:53,647-Speed 6332.06 samples/sec Loss 4.2591 LearningRate 0.0002 Epoch: 24 Global Step: 498070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:02:56,887-Speed 6324.02 samples/sec Loss 4.2084 LearningRate 0.0002 Epoch: 24 Global Step: 498080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:00,124-Speed 6327.62 samples/sec Loss 4.2584 LearningRate 0.0002 Epoch: 24 Global Step: 498090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:03,357-Speed 6337.07 samples/sec Loss 4.2469 LearningRate 0.0002 Epoch: 24 Global Step: 498100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:06,595-Speed 6327.04 samples/sec Loss 4.3353 LearningRate 0.0002 Epoch: 24 Global Step: 498110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:09,829-Speed 6332.71 samples/sec Loss 4.2457 LearningRate 0.0002 Epoch: 24 Global Step: 498120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:13,074-Speed 6312.75 samples/sec Loss 4.2073 LearningRate 0.0002 Epoch: 24 Global Step: 498130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:16,311-Speed 6328.19 samples/sec Loss 4.2857 LearningRate 0.0002 Epoch: 24 Global Step: 498140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:19,547-Speed 6331.65 samples/sec Loss 4.2077 LearningRate 0.0002 Epoch: 24 Global Step: 498150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:22,786-Speed 6324.32 samples/sec Loss 4.3205 LearningRate 0.0002 Epoch: 24 Global Step: 498160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:03:26,011-Speed 6351.13 samples/sec Loss 4.2857 LearningRate 0.0002 Epoch: 24 Global Step: 498170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:29,249-Speed 6326.39 samples/sec Loss 4.2535 LearningRate 0.0002 Epoch: 24 Global Step: 498180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:32,484-Speed 
6331.22 samples/sec Loss 4.2512 LearningRate 0.0002 Epoch: 24 Global Step: 498190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:35,718-Speed 6334.88 samples/sec Loss 4.3280 LearningRate 0.0002 Epoch: 24 Global Step: 498200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:38,955-Speed 6327.56 samples/sec Loss 4.2489 LearningRate 0.0002 Epoch: 24 Global Step: 498210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:42,189-Speed 6334.33 samples/sec Loss 4.3320 LearningRate 0.0002 Epoch: 24 Global Step: 498220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:45,434-Speed 6312.53 samples/sec Loss 4.3090 LearningRate 0.0002 Epoch: 24 Global Step: 498230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:48,672-Speed 6327.68 samples/sec Loss 4.3190 LearningRate 0.0002 Epoch: 24 Global Step: 498240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:51,905-Speed 6335.25 samples/sec Loss 4.2229 LearningRate 0.0002 Epoch: 24 Global Step: 498250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:55,139-Speed 6334.14 samples/sec Loss 4.2990 LearningRate 0.0002 Epoch: 24 Global Step: 498260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:03:58,360-Speed 6358.67 samples/sec Loss 4.2336 LearningRate 0.0002 Epoch: 24 Global Step: 498270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:01,600-Speed 6324.04 samples/sec Loss 4.2932 LearningRate 0.0002 Epoch: 24 Global Step: 498280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:04,834-Speed 6333.36 samples/sec Loss 4.2626 LearningRate 0.0002 Epoch: 24 Global Step: 498290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:08,073-Speed 6324.87 samples/sec Loss 4.3078 LearningRate 0.0002 Epoch: 24 Global Step: 498300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:11,309-Speed 6331.26 samples/sec Loss 4.3104 
LearningRate 0.0002 Epoch: 24 Global Step: 498310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:14,545-Speed 6330.30 samples/sec Loss 4.3077 LearningRate 0.0002 Epoch: 24 Global Step: 498320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:17,784-Speed 6323.27 samples/sec Loss 4.3171 LearningRate 0.0002 Epoch: 24 Global Step: 498330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:21,021-Speed 6329.27 samples/sec Loss 4.2653 LearningRate 0.0002 Epoch: 24 Global Step: 498340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:24,261-Speed 6322.92 samples/sec Loss 4.3511 LearningRate 0.0002 Epoch: 24 Global Step: 498350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:27,506-Speed 6312.60 samples/sec Loss 4.3266 LearningRate 0.0002 Epoch: 24 Global Step: 498360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:30,728-Speed 6355.88 samples/sec Loss 4.3314 LearningRate 0.0002 Epoch: 24 Global Step: 498370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:33,966-Speed 6326.61 samples/sec Loss 4.2198 LearningRate 0.0002 Epoch: 24 Global Step: 498380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:37,205-Speed 6323.99 samples/sec Loss 4.2090 LearningRate 0.0002 Epoch: 24 Global Step: 498390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:40,440-Speed 6332.30 samples/sec Loss 4.2381 LearningRate 0.0002 Epoch: 24 Global Step: 498400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:43,677-Speed 6327.86 samples/sec Loss 4.2891 LearningRate 0.0002 Epoch: 24 Global Step: 498410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:46,916-Speed 6324.66 samples/sec Loss 4.2603 LearningRate 0.0002 Epoch: 24 Global Step: 498420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:50,156-Speed 6322.73 samples/sec Loss 4.2078 LearningRate 0.0002 Epoch: 24 
Global Step: 498430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:53,399-Speed 6316.89 samples/sec Loss 4.2016 LearningRate 0.0002 Epoch: 24 Global Step: 498440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:56,636-Speed 6328.47 samples/sec Loss 4.2415 LearningRate 0.0002 Epoch: 24 Global Step: 498450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:04:59,874-Speed 6325.88 samples/sec Loss 4.2932 LearningRate 0.0002 Epoch: 24 Global Step: 498460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:03,096-Speed 6357.45 samples/sec Loss 4.3010 LearningRate 0.0002 Epoch: 24 Global Step: 498470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:06,335-Speed 6325.52 samples/sec Loss 4.3075 LearningRate 0.0002 Epoch: 24 Global Step: 498480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:09,575-Speed 6321.74 samples/sec Loss 4.2547 LearningRate 0.0002 Epoch: 24 Global Step: 498490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:12,813-Speed 6327.47 samples/sec Loss 4.3102 LearningRate 0.0002 Epoch: 24 Global Step: 498500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:16,057-Speed 6313.68 samples/sec Loss 4.2265 LearningRate 0.0002 Epoch: 24 Global Step: 498510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:19,296-Speed 6325.85 samples/sec Loss 4.3180 LearningRate 0.0002 Epoch: 24 Global Step: 498520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:22,532-Speed 6329.87 samples/sec Loss 4.2808 LearningRate 0.0002 Epoch: 24 Global Step: 498530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:25,767-Speed 6332.48 samples/sec Loss 4.2487 LearningRate 0.0002 Epoch: 24 Global Step: 498540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:29,003-Speed 6329.63 samples/sec Loss 4.2942 LearningRate 0.0002 Epoch: 24 Global Step: 498550 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:32,250-Speed 6307.96 samples/sec Loss 4.2772 LearningRate 0.0002 Epoch: 24 Global Step: 498560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:35,477-Speed 6349.95 samples/sec Loss 4.3157 LearningRate 0.0002 Epoch: 24 Global Step: 498570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:38,714-Speed 6327.21 samples/sec Loss 4.2675 LearningRate 0.0002 Epoch: 24 Global Step: 498580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:41,952-Speed 6327.21 samples/sec Loss 4.3184 LearningRate 0.0002 Epoch: 24 Global Step: 498590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:45,188-Speed 6330.44 samples/sec Loss 4.3560 LearningRate 0.0002 Epoch: 24 Global Step: 498600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:48,442-Speed 6294.09 samples/sec Loss 4.2553 LearningRate 0.0002 Epoch: 24 Global Step: 498610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:51,681-Speed 6324.41 samples/sec Loss 4.2552 LearningRate 0.0002 Epoch: 24 Global Step: 498620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:54,921-Speed 6323.35 samples/sec Loss 4.2608 LearningRate 0.0002 Epoch: 24 Global Step: 498630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:05:58,157-Speed 6329.67 samples/sec Loss 4.2863 LearningRate 0.0002 Epoch: 24 Global Step: 498640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:01,395-Speed 6326.37 samples/sec Loss 4.2936 LearningRate 0.0002 Epoch: 24 Global Step: 498650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:04,638-Speed 6315.89 samples/sec Loss 4.2958 LearningRate 0.0002 Epoch: 24 Global Step: 498660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:07,864-Speed 6349.88 samples/sec Loss 4.2965 LearningRate 0.0002 Epoch: 24 Global Step: 498670 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:06:11,101-Speed 6328.40 samples/sec Loss 4.2441 LearningRate 0.0002 Epoch: 24 Global Step: 498680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:14,341-Speed 6321.90 samples/sec Loss 4.3207 LearningRate 0.0002 Epoch: 24 Global Step: 498690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:17,581-Speed 6322.88 samples/sec Loss 4.2646 LearningRate 0.0002 Epoch: 24 Global Step: 498700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:20,866-Speed 6235.49 samples/sec Loss 4.2413 LearningRate 0.0002 Epoch: 24 Global Step: 498710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:24,114-Speed 6307.42 samples/sec Loss 4.2937 LearningRate 0.0002 Epoch: 24 Global Step: 498720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:27,355-Speed 6320.70 samples/sec Loss 4.3162 LearningRate 0.0002 Epoch: 24 Global Step: 498730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:30,599-Speed 6314.30 samples/sec Loss 4.2392 LearningRate 0.0002 Epoch: 24 Global Step: 498740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:33,835-Speed 6330.92 samples/sec Loss 4.2911 LearningRate 0.0002 Epoch: 24 Global Step: 498750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:37,071-Speed 6329.50 samples/sec Loss 4.2372 LearningRate 0.0002 Epoch: 24 Global Step: 498760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:40,344-Speed 6258.17 samples/sec Loss 4.3109 LearningRate 0.0002 Epoch: 24 Global Step: 498770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:43,588-Speed 6315.69 samples/sec Loss 4.2719 LearningRate 0.0002 Epoch: 24 Global Step: 498780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:46,825-Speed 6328.98 samples/sec Loss 4.2822 LearningRate 0.0002 Epoch: 24 Global Step: 498790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:06:50,062-Speed 6327.32 samples/sec Loss 4.3009 LearningRate 0.0002 Epoch: 24 Global Step: 498800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:53,301-Speed 6325.70 samples/sec Loss 4.2036 LearningRate 0.0002 Epoch: 24 Global Step: 498810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:56,537-Speed 6328.25 samples/sec Loss 4.2626 LearningRate 0.0002 Epoch: 24 Global Step: 498820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:06:59,778-Speed 6320.88 samples/sec Loss 4.2482 LearningRate 0.0002 Epoch: 24 Global Step: 498830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:03,015-Speed 6327.64 samples/sec Loss 4.2229 LearningRate 0.0002 Epoch: 24 Global Step: 498840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:06,262-Speed 6310.37 samples/sec Loss 4.3326 LearningRate 0.0002 Epoch: 24 Global Step: 498850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:09,502-Speed 6321.41 samples/sec Loss 4.2597 LearningRate 0.0002 Epoch: 24 Global Step: 498860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:12,724-Speed 6357.59 samples/sec Loss 4.3132 LearningRate 0.0002 Epoch: 24 Global Step: 498870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:15,966-Speed 6319.75 samples/sec Loss 4.2583 LearningRate 0.0002 Epoch: 24 Global Step: 498880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:19,201-Speed 6331.02 samples/sec Loss 4.2672 LearningRate 0.0002 Epoch: 24 Global Step: 498890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:22,443-Speed 6319.89 samples/sec Loss 4.2737 LearningRate 0.0002 Epoch: 24 Global Step: 498900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:25,681-Speed 6325.03 samples/sec Loss 4.2504 LearningRate 0.0002 Epoch: 24 Global Step: 498910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:28,924-Speed 6316.49 
samples/sec Loss 4.2829 LearningRate 0.0002 Epoch: 24 Global Step: 498920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:32,163-Speed 6326.23 samples/sec Loss 4.2848 LearningRate 0.0002 Epoch: 24 Global Step: 498930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:35,401-Speed 6326.45 samples/sec Loss 4.2939 LearningRate 0.0002 Epoch: 24 Global Step: 498940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:38,644-Speed 6315.08 samples/sec Loss 4.3252 LearningRate 0.0002 Epoch: 24 Global Step: 498950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:41,882-Speed 6326.45 samples/sec Loss 4.3061 LearningRate 0.0002 Epoch: 24 Global Step: 498960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:45,123-Speed 6321.05 samples/sec Loss 4.2411 LearningRate 0.0002 Epoch: 24 Global Step: 498970 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:07:48,350-Speed 6347.44 samples/sec Loss 4.2774 LearningRate 0.0002 Epoch: 24 Global Step: 498980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:51,589-Speed 6325.19 samples/sec Loss 4.3437 LearningRate 0.0002 Epoch: 24 Global Step: 498990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:54,829-Speed 6322.85 samples/sec Loss 4.3256 LearningRate 0.0002 Epoch: 24 Global Step: 499000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:07:58,068-Speed 6324.12 samples/sec Loss 4.2785 LearningRate 0.0002 Epoch: 24 Global Step: 499010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:01,312-Speed 6313.54 samples/sec Loss 4.2553 LearningRate 0.0002 Epoch: 24 Global Step: 499020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:04,551-Speed 6325.31 samples/sec Loss 4.2917 LearningRate 0.0002 Epoch: 24 Global Step: 499030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:07,791-Speed 6323.36 samples/sec Loss 4.2715 
LearningRate 0.0002 Epoch: 24 Global Step: 499040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:11,029-Speed 6325.98 samples/sec Loss 4.2505 LearningRate 0.0002 Epoch: 24 Global Step: 499050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:14,273-Speed 6313.83 samples/sec Loss 4.2850 LearningRate 0.0002 Epoch: 24 Global Step: 499060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:17,519-Speed 6311.19 samples/sec Loss 4.2717 LearningRate 0.0002 Epoch: 24 Global Step: 499070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:20,746-Speed 6347.92 samples/sec Loss 4.2419 LearningRate 0.0002 Epoch: 24 Global Step: 499080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:23,994-Speed 6306.47 samples/sec Loss 4.3087 LearningRate 0.0002 Epoch: 24 Global Step: 499090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:27,236-Speed 6319.26 samples/sec Loss 4.2226 LearningRate 0.0002 Epoch: 24 Global Step: 499100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:30,474-Speed 6325.95 samples/sec Loss 4.2595 LearningRate 0.0002 Epoch: 24 Global Step: 499110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:33,717-Speed 6316.15 samples/sec Loss 4.3610 LearningRate 0.0002 Epoch: 24 Global Step: 499120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:36,963-Speed 6309.95 samples/sec Loss 4.3036 LearningRate 0.0002 Epoch: 24 Global Step: 499130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:40,210-Speed 6311.26 samples/sec Loss 4.2499 LearningRate 0.0002 Epoch: 24 Global Step: 499140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:43,449-Speed 6323.36 samples/sec Loss 4.2340 LearningRate 0.0002 Epoch: 24 Global Step: 499150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:46,687-Speed 6326.35 samples/sec Loss 4.3779 LearningRate 0.0002 Epoch: 24 
Global Step: 499160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:49,992-Speed 6199.24 samples/sec Loss 4.2986 LearningRate 0.0002 Epoch: 24 Global Step: 499170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:53,219-Speed 6346.63 samples/sec Loss 4.2427 LearningRate 0.0002 Epoch: 24 Global Step: 499180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:56,461-Speed 6319.88 samples/sec Loss 4.2421 LearningRate 0.0002 Epoch: 24 Global Step: 499190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:08:59,695-Speed 6333.47 samples/sec Loss 4.2191 LearningRate 0.0002 Epoch: 24 Global Step: 499200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:02,939-Speed 6314.34 samples/sec Loss 4.2855 LearningRate 0.0002 Epoch: 24 Global Step: 499210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:06,195-Speed 6291.18 samples/sec Loss 4.3076 LearningRate 0.0002 Epoch: 24 Global Step: 499220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:09,436-Speed 6320.53 samples/sec Loss 4.3202 LearningRate 0.0002 Epoch: 24 Global Step: 499230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:12,676-Speed 6322.61 samples/sec Loss 4.2406 LearningRate 0.0002 Epoch: 24 Global Step: 499240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:15,935-Speed 6285.91 samples/sec Loss 4.2291 LearningRate 0.0002 Epoch: 24 Global Step: 499250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:19,175-Speed 6321.47 samples/sec Loss 4.2522 LearningRate 0.0002 Epoch: 24 Global Step: 499260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:22,420-Speed 6314.24 samples/sec Loss 4.2449 LearningRate 0.0002 Epoch: 24 Global Step: 499270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:25,643-Speed 6355.09 samples/sec Loss 4.3151 LearningRate 0.0002 Epoch: 24 Global Step: 499280 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:28,884-Speed 6321.27 samples/sec Loss 4.3012 LearningRate 0.0002 Epoch: 24 Global Step: 499290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:32,123-Speed 6323.31 samples/sec Loss 4.2784 LearningRate 0.0002 Epoch: 24 Global Step: 499300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:35,360-Speed 6328.77 samples/sec Loss 4.2193 LearningRate 0.0002 Epoch: 24 Global Step: 499310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:38,600-Speed 6322.73 samples/sec Loss 4.2693 LearningRate 0.0002 Epoch: 24 Global Step: 499320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:41,841-Speed 6318.94 samples/sec Loss 4.2928 LearningRate 0.0002 Epoch: 24 Global Step: 499330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:45,091-Speed 6304.48 samples/sec Loss 4.2574 LearningRate 0.0002 Epoch: 24 Global Step: 499340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:48,340-Speed 6304.43 samples/sec Loss 4.2360 LearningRate 0.0002 Epoch: 24 Global Step: 499350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:51,604-Speed 6276.16 samples/sec Loss 4.3639 LearningRate 0.0002 Epoch: 24 Global Step: 499360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:54,848-Speed 6315.78 samples/sec Loss 4.2550 LearningRate 0.0002 Epoch: 24 Global Step: 499370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:09:58,079-Speed 6339.71 samples/sec Loss 4.3552 LearningRate 0.0002 Epoch: 24 Global Step: 499380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:01,323-Speed 6314.42 samples/sec Loss 4.3094 LearningRate 0.0002 Epoch: 24 Global Step: 499390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:04,566-Speed 6317.17 samples/sec Loss 4.2657 LearningRate 0.0002 Epoch: 24 Global Step: 499400 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:10:07,807-Speed 6320.32 samples/sec Loss 4.2779 LearningRate 0.0002 Epoch: 24 Global Step: 499410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:11,051-Speed 6313.76 samples/sec Loss 4.2600 LearningRate 0.0002 Epoch: 24 Global Step: 499420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:14,294-Speed 6319.76 samples/sec Loss 4.2735 LearningRate 0.0002 Epoch: 24 Global Step: 499430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:17,535-Speed 6320.32 samples/sec Loss 4.2592 LearningRate 0.0002 Epoch: 24 Global Step: 499440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:20,776-Speed 6321.15 samples/sec Loss 4.2849 LearningRate 0.0002 Epoch: 24 Global Step: 499450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:24,017-Speed 6320.29 samples/sec Loss 4.2507 LearningRate 0.0002 Epoch: 24 Global Step: 499460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:27,259-Speed 6318.31 samples/sec Loss 4.2360 LearningRate 0.0002 Epoch: 24 Global Step: 499470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:10:30,489-Speed 6340.77 samples/sec Loss 4.1926 LearningRate 0.0002 Epoch: 24 Global Step: 499480 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:33,732-Speed 6318.50 samples/sec Loss 4.2642 LearningRate 0.0002 Epoch: 24 Global Step: 499490 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:36,973-Speed 6319.36 samples/sec Loss 4.2455 LearningRate 0.0002 Epoch: 24 Global Step: 499500 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:40,212-Speed 6323.97 samples/sec Loss 4.2592 LearningRate 0.0002 Epoch: 24 Global Step: 499510 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:43,463-Speed 6301.89 samples/sec Loss 4.2595 LearningRate 0.0002 Epoch: 24 Global Step: 499520 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 
13:10:46,702-Speed 6323.99 samples/sec Loss 4.2055 LearningRate 0.0002 Epoch: 24 Global Step: 499530 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:49,943-Speed 6319.71 samples/sec Loss 4.2622 LearningRate 0.0002 Epoch: 24 Global Step: 499540 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:53,186-Speed 6316.86 samples/sec Loss 4.2772 LearningRate 0.0002 Epoch: 24 Global Step: 499550 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:56,426-Speed 6323.50 samples/sec Loss 4.2446 LearningRate 0.0002 Epoch: 24 Global Step: 499560 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:10:59,673-Speed 6309.44 samples/sec Loss 4.2947 LearningRate 0.0002 Epoch: 24 Global Step: 499570 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:11:02,914-Speed 6320.05 samples/sec Loss 4.2647 LearningRate 0.0002 Epoch: 24 Global Step: 499580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:06,158-Speed 6315.09 samples/sec Loss 4.2256 LearningRate 0.0002 Epoch: 24 Global Step: 499590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:09,398-Speed 6321.28 samples/sec Loss 4.2899 LearningRate 0.0002 Epoch: 24 Global Step: 499600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:12,643-Speed 6311.90 samples/sec Loss 4.3070 LearningRate 0.0002 Epoch: 24 Global Step: 499610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:15,886-Speed 6317.70 samples/sec Loss 4.2767 LearningRate 0.0002 Epoch: 24 Global Step: 499620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:19,129-Speed 6315.46 samples/sec Loss 4.2435 LearningRate 0.0002 Epoch: 24 Global Step: 499630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:22,373-Speed 6315.42 samples/sec Loss 4.2199 LearningRate 0.0002 Epoch: 24 Global Step: 499640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:25,617-Speed 6314.29 
samples/sec Loss 4.2912 LearningRate 0.0002 Epoch: 24 Global Step: 499650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:28,860-Speed 6317.17 samples/sec Loss 4.2680 LearningRate 0.0002 Epoch: 24 Global Step: 499660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:32,135-Speed 6254.47 samples/sec Loss 4.2482 LearningRate 0.0002 Epoch: 24 Global Step: 499670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:35,368-Speed 6336.02 samples/sec Loss 4.2420 LearningRate 0.0002 Epoch: 24 Global Step: 499680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:38,613-Speed 6312.57 samples/sec Loss 4.2289 LearningRate 0.0002 Epoch: 24 Global Step: 499690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:41,858-Speed 6312.50 samples/sec Loss 4.2654 LearningRate 0.0002 Epoch: 24 Global Step: 499700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:45,098-Speed 6323.38 samples/sec Loss 4.2657 LearningRate 0.0002 Epoch: 24 Global Step: 499710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:48,340-Speed 6317.28 samples/sec Loss 4.2654 LearningRate 0.0002 Epoch: 24 Global Step: 499720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:51,578-Speed 6327.26 samples/sec Loss 4.2764 LearningRate 0.0002 Epoch: 24 Global Step: 499730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:54,819-Speed 6319.79 samples/sec Loss 4.3220 LearningRate 0.0002 Epoch: 24 Global Step: 499740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:11:58,061-Speed 6318.08 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 24 Global Step: 499750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:01,307-Speed 6310.81 samples/sec Loss 4.2392 LearningRate 0.0002 Epoch: 24 Global Step: 499760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:04,552-Speed 6313.52 samples/sec Loss 4.2711 
LearningRate 0.0002 Epoch: 24 Global Step: 499770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:07,795-Speed 6317.44 samples/sec Loss 4.2950 LearningRate 0.0002 Epoch: 24 Global Step: 499780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:12:11,024-Speed 6345.13 samples/sec Loss 4.2434 LearningRate 0.0002 Epoch: 24 Global Step: 499790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:14,426-Speed 6020.58 samples/sec Loss 4.1982 LearningRate 0.0002 Epoch: 24 Global Step: 499800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:17,670-Speed 6314.46 samples/sec Loss 4.2786 LearningRate 0.0002 Epoch: 24 Global Step: 499810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:20,913-Speed 6317.28 samples/sec Loss 4.3131 LearningRate 0.0002 Epoch: 24 Global Step: 499820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:24,156-Speed 6314.82 samples/sec Loss 4.3149 LearningRate 0.0002 Epoch: 24 Global Step: 499830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:27,400-Speed 6315.21 samples/sec Loss 4.3080 LearningRate 0.0002 Epoch: 24 Global Step: 499840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:30,641-Speed 6320.27 samples/sec Loss 4.2723 LearningRate 0.0002 Epoch: 24 Global Step: 499850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:33,883-Speed 6318.55 samples/sec Loss 4.2533 LearningRate 0.0002 Epoch: 24 Global Step: 499860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:37,126-Speed 6317.30 samples/sec Loss 4.2069 LearningRate 0.0002 Epoch: 24 Global Step: 499870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:40,413-Speed 6231.16 samples/sec Loss 4.2794 LearningRate 0.0002 Epoch: 24 Global Step: 499880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:43,641-Speed 6346.09 samples/sec Loss 4.3551 LearningRate 0.0002 Epoch: 24 
Global Step: 499890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:46,881-Speed 6323.69 samples/sec Loss 4.3011 LearningRate 0.0002 Epoch: 24 Global Step: 499900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:50,127-Speed 6310.52 samples/sec Loss 4.3395 LearningRate 0.0002 Epoch: 24 Global Step: 499910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:53,369-Speed 6317.16 samples/sec Loss 4.2872 LearningRate 0.0002 Epoch: 24 Global Step: 499920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:56,611-Speed 6318.73 samples/sec Loss 4.2816 LearningRate 0.0002 Epoch: 24 Global Step: 499930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:12:59,855-Speed 6314.62 samples/sec Loss 4.2774 LearningRate 0.0002 Epoch: 24 Global Step: 499940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:03,104-Speed 6309.22 samples/sec Loss 4.3344 LearningRate 0.0002 Epoch: 24 Global Step: 499950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:06,350-Speed 6309.23 samples/sec Loss 4.2671 LearningRate 0.0002 Epoch: 24 Global Step: 499960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:09,595-Speed 6314.48 samples/sec Loss 4.2823 LearningRate 0.0002 Epoch: 24 Global Step: 499970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:12,840-Speed 6311.81 samples/sec Loss 4.2262 LearningRate 0.0002 Epoch: 24 Global Step: 499980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:16,079-Speed 6325.94 samples/sec Loss 4.2379 LearningRate 0.0002 Epoch: 24 Global Step: 499990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:19,321-Speed 6318.28 samples/sec Loss 4.2752 LearningRate 0.0002 Epoch: 24 Global Step: 500000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:22,566-Speed 6311.76 samples/sec Loss 4.2950 LearningRate 0.0002 Epoch: 24 Global Step: 500010 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:25,808-Speed 6318.53 samples/sec Loss 4.2018 LearningRate 0.0002 Epoch: 24 Global Step: 500020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:29,054-Speed 6311.24 samples/sec Loss 4.2911 LearningRate 0.0002 Epoch: 24 Global Step: 500030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:32,293-Speed 6325.47 samples/sec Loss 4.2971 LearningRate 0.0002 Epoch: 24 Global Step: 500040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:35,533-Speed 6321.46 samples/sec Loss 4.2221 LearningRate 0.0002 Epoch: 24 Global Step: 500050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:38,777-Speed 6315.30 samples/sec Loss 4.3061 LearningRate 0.0002 Epoch: 24 Global Step: 500060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:42,020-Speed 6316.65 samples/sec Loss 4.2671 LearningRate 0.0002 Epoch: 24 Global Step: 500070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:45,270-Speed 6301.39 samples/sec Loss 4.2573 LearningRate 0.0002 Epoch: 24 Global Step: 500080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:48,499-Speed 6344.91 samples/sec Loss 4.2828 LearningRate 0.0002 Epoch: 24 Global Step: 500090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:51,741-Speed 6317.92 samples/sec Loss 4.2480 LearningRate 0.0002 Epoch: 24 Global Step: 500100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:54,986-Speed 6313.95 samples/sec Loss 4.2326 LearningRate 0.0002 Epoch: 24 Global Step: 500110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:13:58,227-Speed 6319.08 samples/sec Loss 4.2674 LearningRate 0.0002 Epoch: 24 Global Step: 500120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:01,472-Speed 6314.25 samples/sec Loss 4.2985 LearningRate 0.0002 Epoch: 24 Global Step: 500130 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:14:04,718-Speed 6309.23 samples/sec Loss 4.2911 LearningRate 0.0002 Epoch: 24 Global Step: 500140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:07,967-Speed 6306.20 samples/sec Loss 4.2450 LearningRate 0.0002 Epoch: 24 Global Step: 500150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:11,212-Speed 6313.10 samples/sec Loss 4.2877 LearningRate 0.0002 Epoch: 24 Global Step: 500160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:14,457-Speed 6312.23 samples/sec Loss 4.2860 LearningRate 0.0002 Epoch: 24 Global Step: 500170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:17,699-Speed 6317.45 samples/sec Loss 4.2555 LearningRate 0.0002 Epoch: 24 Global Step: 500180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:20,940-Speed 6320.69 samples/sec Loss 4.2175 LearningRate 0.0002 Epoch: 24 Global Step: 500190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:24,185-Speed 6312.93 samples/sec Loss 4.2738 LearningRate 0.0002 Epoch: 24 Global Step: 500200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:27,469-Speed 6237.97 samples/sec Loss 4.2526 LearningRate 0.0002 Epoch: 24 Global Step: 500210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:30,711-Speed 6318.79 samples/sec Loss 4.2663 LearningRate 0.0002 Epoch: 24 Global Step: 500220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:33,955-Speed 6315.54 samples/sec Loss 4.3113 LearningRate 0.0002 Epoch: 24 Global Step: 500230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:37,195-Speed 6323.07 samples/sec Loss 4.2602 LearningRate 0.0002 Epoch: 24 Global Step: 500240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:40,448-Speed 6296.23 samples/sec Loss 4.2242 LearningRate 0.0002 Epoch: 24 Global Step: 500250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:14:43,689-Speed 6321.52 samples/sec Loss 4.2358 LearningRate 0.0002 Epoch: 24 Global Step: 500260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:46,937-Speed 6306.15 samples/sec Loss 4.3133 LearningRate 0.0002 Epoch: 24 Global Step: 500270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:50,185-Speed 6306.72 samples/sec Loss 4.2421 LearningRate 0.0002 Epoch: 24 Global Step: 500280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:53,439-Speed 6294.68 samples/sec Loss 4.2235 LearningRate 0.0002 Epoch: 24 Global Step: 500290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:14:56,666-Speed 6348.87 samples/sec Loss 4.2979 LearningRate 0.0002 Epoch: 24 Global Step: 500300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:14:59,911-Speed 6311.35 samples/sec Loss 4.1940 LearningRate 0.0002 Epoch: 24 Global Step: 500310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:03,159-Speed 6307.35 samples/sec Loss 4.2455 LearningRate 0.0002 Epoch: 24 Global Step: 500320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:06,406-Speed 6308.64 samples/sec Loss 4.2801 LearningRate 0.0002 Epoch: 24 Global Step: 500330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:09,651-Speed 6311.80 samples/sec Loss 4.2634 LearningRate 0.0002 Epoch: 24 Global Step: 500340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:12,899-Speed 6308.91 samples/sec Loss 4.3051 LearningRate 0.0002 Epoch: 24 Global Step: 500350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:16,139-Speed 6320.48 samples/sec Loss 4.2670 LearningRate 0.0002 Epoch: 24 Global Step: 500360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:19,383-Speed 6314.63 samples/sec Loss 4.3397 LearningRate 0.0002 Epoch: 24 Global Step: 500370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:22,628-Speed 6313.26 
samples/sec Loss 4.2817 LearningRate 0.0002 Epoch: 24 Global Step: 500380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:25,876-Speed 6308.14 samples/sec Loss 4.2777 LearningRate 0.0002 Epoch: 24 Global Step: 500390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:29,105-Speed 6344.19 samples/sec Loss 4.2476 LearningRate 0.0002 Epoch: 24 Global Step: 500400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:32,348-Speed 6315.25 samples/sec Loss 4.2541 LearningRate 0.0002 Epoch: 24 Global Step: 500410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:35,595-Speed 6310.50 samples/sec Loss 4.2925 LearningRate 0.0002 Epoch: 24 Global Step: 500420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:38,836-Speed 6320.89 samples/sec Loss 4.1995 LearningRate 0.0002 Epoch: 24 Global Step: 500430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:42,083-Speed 6307.62 samples/sec Loss 4.1939 LearningRate 0.0002 Epoch: 24 Global Step: 500440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:45,332-Speed 6304.79 samples/sec Loss 4.2752 LearningRate 0.0002 Epoch: 24 Global Step: 500450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:48,575-Speed 6317.72 samples/sec Loss 4.2219 LearningRate 0.0002 Epoch: 24 Global Step: 500460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:51,821-Speed 6310.73 samples/sec Loss 4.2466 LearningRate 0.0002 Epoch: 24 Global Step: 500470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:55,067-Speed 6310.60 samples/sec Loss 4.1969 LearningRate 0.0002 Epoch: 24 Global Step: 500480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:15:58,316-Speed 6304.43 samples/sec Loss 4.2561 LearningRate 0.0002 Epoch: 24 Global Step: 500490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:01,550-Speed 6334.55 samples/sec Loss 4.2959 
LearningRate 0.0002 Epoch: 24 Global Step: 500500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:04,793-Speed 6314.89 samples/sec Loss 4.2458 LearningRate 0.0002 Epoch: 24 Global Step: 500510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:08,039-Speed 6311.07 samples/sec Loss 4.1953 LearningRate 0.0002 Epoch: 24 Global Step: 500520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:11,281-Speed 6319.01 samples/sec Loss 4.1913 LearningRate 0.0002 Epoch: 24 Global Step: 500530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:14,523-Speed 6318.42 samples/sec Loss 4.2633 LearningRate 0.0002 Epoch: 24 Global Step: 500540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:17,771-Speed 6306.70 samples/sec Loss 4.2434 LearningRate 0.0002 Epoch: 24 Global Step: 500550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:21,016-Speed 6312.28 samples/sec Loss 4.2344 LearningRate 0.0002 Epoch: 24 Global Step: 500560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:24,263-Speed 6308.54 samples/sec Loss 4.2883 LearningRate 0.0002 Epoch: 24 Global Step: 500570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:27,507-Speed 6315.32 samples/sec Loss 4.2681 LearningRate 0.0002 Epoch: 24 Global Step: 500580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:30,750-Speed 6316.75 samples/sec Loss 4.2976 LearningRate 0.0002 Epoch: 24 Global Step: 500590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:33,981-Speed 6339.67 samples/sec Loss 4.2956 LearningRate 0.0002 Epoch: 24 Global Step: 500600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:37,226-Speed 6313.55 samples/sec Loss 4.2153 LearningRate 0.0002 Epoch: 24 Global Step: 500610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:40,471-Speed 6312.66 samples/sec Loss 4.2853 LearningRate 0.0002 Epoch: 24 
Global Step: 500620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:43,725-Speed 6295.47 samples/sec Loss 4.3402 LearningRate 0.0002 Epoch: 24 Global Step: 500630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:46,973-Speed 6307.43 samples/sec Loss 4.2331 LearningRate 0.0002 Epoch: 24 Global Step: 500640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:50,216-Speed 6316.09 samples/sec Loss 4.2629 LearningRate 0.0002 Epoch: 24 Global Step: 500650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:53,465-Speed 6305.16 samples/sec Loss 4.1783 LearningRate 0.0002 Epoch: 24 Global Step: 500660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:56,711-Speed 6310.13 samples/sec Loss 4.2215 LearningRate 0.0002 Epoch: 24 Global Step: 500670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:16:59,951-Speed 6322.34 samples/sec Loss 4.3236 LearningRate 0.0002 Epoch: 24 Global Step: 500680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:03,197-Speed 6311.04 samples/sec Loss 4.2942 LearningRate 0.0002 Epoch: 24 Global Step: 500690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:06,431-Speed 6333.85 samples/sec Loss 4.2416 LearningRate 0.0002 Epoch: 24 Global Step: 500700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:09,679-Speed 6307.10 samples/sec Loss 4.3085 LearningRate 0.0002 Epoch: 24 Global Step: 500710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:12,923-Speed 6315.12 samples/sec Loss 4.2871 LearningRate 0.0002 Epoch: 24 Global Step: 500720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:16,168-Speed 6312.76 samples/sec Loss 4.3126 LearningRate 0.0002 Epoch: 24 Global Step: 500730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:19,415-Speed 6307.79 samples/sec Loss 4.2882 LearningRate 0.0002 Epoch: 24 Global Step: 500740 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:22,662-Speed 6309.03 samples/sec Loss 4.1670 LearningRate 0.0002 Epoch: 24 Global Step: 500750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:25,910-Speed 6307.93 samples/sec Loss 4.3053 LearningRate 0.0002 Epoch: 24 Global Step: 500760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:29,150-Speed 6321.86 samples/sec Loss 4.2680 LearningRate 0.0002 Epoch: 24 Global Step: 500770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:32,402-Speed 6299.73 samples/sec Loss 4.2813 LearningRate 0.0002 Epoch: 24 Global Step: 500780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:35,646-Speed 6313.55 samples/sec Loss 4.2880 LearningRate 0.0002 Epoch: 24 Global Step: 500790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:38,886-Speed 6321.98 samples/sec Loss 4.2628 LearningRate 0.0002 Epoch: 24 Global Step: 500800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:17:42,121-Speed 6333.20 samples/sec Loss 4.2336 LearningRate 0.0002 Epoch: 24 Global Step: 500810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:45,364-Speed 6315.57 samples/sec Loss 4.3078 LearningRate 0.0002 Epoch: 24 Global Step: 500820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:48,613-Speed 6304.78 samples/sec Loss 4.1834 LearningRate 0.0002 Epoch: 24 Global Step: 500830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:51,861-Speed 6308.24 samples/sec Loss 4.2370 LearningRate 0.0002 Epoch: 24 Global Step: 500840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:55,108-Speed 6309.84 samples/sec Loss 4.3072 LearningRate 0.0002 Epoch: 24 Global Step: 500850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:17:58,351-Speed 6317.13 samples/sec Loss 4.2531 LearningRate 0.0002 Epoch: 24 Global Step: 500860 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:18:01,596-Speed 6312.89 samples/sec Loss 4.3261 LearningRate 0.0002 Epoch: 24 Global Step: 500870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:04,840-Speed 6313.42 samples/sec Loss 4.2654 LearningRate 0.0002 Epoch: 24 Global Step: 500880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:08,082-Speed 6320.20 samples/sec Loss 4.2799 LearningRate 0.0002 Epoch: 24 Global Step: 500890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:11,326-Speed 6314.76 samples/sec Loss 4.3338 LearningRate 0.0002 Epoch: 24 Global Step: 500900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:14,560-Speed 6333.39 samples/sec Loss 4.2962 LearningRate 0.0002 Epoch: 24 Global Step: 500910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:17,803-Speed 6316.17 samples/sec Loss 4.2562 LearningRate 0.0002 Epoch: 24 Global Step: 500920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:21,048-Speed 6311.95 samples/sec Loss 4.2592 LearningRate 0.0002 Epoch: 24 Global Step: 500930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:24,299-Speed 6301.54 samples/sec Loss 4.3440 LearningRate 0.0002 Epoch: 24 Global Step: 500940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:27,553-Speed 6296.10 samples/sec Loss 4.3585 LearningRate 0.0002 Epoch: 24 Global Step: 500950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:30,820-Speed 6270.04 samples/sec Loss 4.2084 LearningRate 0.0002 Epoch: 24 Global Step: 500960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:34,069-Speed 6305.21 samples/sec Loss 4.2636 LearningRate 0.0002 Epoch: 24 Global Step: 500970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:37,310-Speed 6320.19 samples/sec Loss 4.3014 LearningRate 0.0002 Epoch: 24 Global Step: 500980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:18:40,555-Speed 6311.71 samples/sec Loss 4.2319 LearningRate 0.0002 Epoch: 24 Global Step: 500990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:43,799-Speed 6315.79 samples/sec Loss 4.2603 LearningRate 0.0002 Epoch: 24 Global Step: 501000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:47,027-Speed 6344.75 samples/sec Loss 4.2393 LearningRate 0.0002 Epoch: 24 Global Step: 501010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:50,273-Speed 6310.29 samples/sec Loss 4.2092 LearningRate 0.0002 Epoch: 24 Global Step: 501020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:53,518-Speed 6312.40 samples/sec Loss 4.2680 LearningRate 0.0002 Epoch: 24 Global Step: 501030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:18:56,761-Speed 6318.21 samples/sec Loss 4.2614 LearningRate 0.0002 Epoch: 24 Global Step: 501040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:00,087-Speed 6158.93 samples/sec Loss 4.2278 LearningRate 0.0002 Epoch: 24 Global Step: 501050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:03,329-Speed 6317.81 samples/sec Loss 4.2039 LearningRate 0.0002 Epoch: 24 Global Step: 501060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:06,577-Speed 6311.09 samples/sec Loss 4.3033 LearningRate 0.0002 Epoch: 24 Global Step: 501070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:09,820-Speed 6317.22 samples/sec Loss 4.2669 LearningRate 0.0002 Epoch: 24 Global Step: 501080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:13,069-Speed 6305.61 samples/sec Loss 4.2640 LearningRate 0.0002 Epoch: 24 Global Step: 501090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:16,313-Speed 6314.66 samples/sec Loss 4.2714 LearningRate 0.0002 Epoch: 24 Global Step: 501100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:19,546-Speed 6335.84 
samples/sec Loss 4.2888 LearningRate 0.0002 Epoch: 24 Global Step: 501110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:22,793-Speed 6307.11 samples/sec Loss 4.2503 LearningRate 0.0002 Epoch: 24 Global Step: 501120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:26,042-Speed 6305.35 samples/sec Loss 4.2318 LearningRate 0.0002 Epoch: 24 Global Step: 501130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:29,286-Speed 6314.55 samples/sec Loss 4.2844 LearningRate 0.0002 Epoch: 24 Global Step: 501140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:32,543-Speed 6289.91 samples/sec Loss 4.2246 LearningRate 0.0002 Epoch: 24 Global Step: 501150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:35,788-Speed 6312.29 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 501160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:39,040-Speed 6298.66 samples/sec Loss 4.1851 LearningRate 0.0002 Epoch: 24 Global Step: 501170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:19:42,273-Speed 6336.63 samples/sec Loss 4.2255 LearningRate 0.0002 Epoch: 24 Global Step: 501180 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:19:45,520-Speed 6309.20 samples/sec Loss 4.2741 LearningRate 0.0002 Epoch: 24 Global Step: 501190 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:19:48,765-Speed 6312.03 samples/sec Loss 4.2618 LearningRate 0.0002 Epoch: 24 Global Step: 501200 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:19:52,011-Speed 6311.61 samples/sec Loss 4.2018 LearningRate 0.0002 Epoch: 24 Global Step: 501210 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:19:55,274-Speed 6277.30 samples/sec Loss 4.2600 LearningRate 0.0002 Epoch: 24 Global Step: 501220 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:19:58,524-Speed 6302.91 samples/sec Loss 4.2160 LearningRate 
0.0002 Epoch: 24 Global Step: 501230 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:20:01,770-Speed 6309.77 samples/sec Loss 4.2889 LearningRate 0.0002 Epoch: 24 Global Step: 501240 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:20:05,016-Speed 6311.90 samples/sec Loss 4.2061 LearningRate 0.0002 Epoch: 24 Global Step: 501250 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:20:08,259-Speed 6317.56 samples/sec Loss 4.2198 LearningRate 0.0002 Epoch: 24 Global Step: 501260 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:20:11,504-Speed 6311.14 samples/sec Loss 4.2372 LearningRate 0.0002 Epoch: 24 Global Step: 501270 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:20:14,753-Speed 6306.84 samples/sec Loss 4.2406 LearningRate 0.0002 Epoch: 24 Global Step: 501280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:17,998-Speed 6310.93 samples/sec Loss 4.2758 LearningRate 0.0002 Epoch: 24 Global Step: 501290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:21,239-Speed 6320.57 samples/sec Loss 4.2374 LearningRate 0.0002 Epoch: 24 Global Step: 501300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:24,482-Speed 6316.11 samples/sec Loss 4.2722 LearningRate 0.0002 Epoch: 24 Global Step: 501310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:27,727-Speed 6313.36 samples/sec Loss 4.2419 LearningRate 0.0002 Epoch: 24 Global Step: 501320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:30,973-Speed 6311.41 samples/sec Loss 4.1914 LearningRate 0.0002 Epoch: 24 Global Step: 501330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:34,215-Speed 6318.36 samples/sec Loss 4.3383 LearningRate 0.0002 Epoch: 24 Global Step: 501340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:37,457-Speed 6317.62 samples/sec Loss 4.2387 LearningRate 0.0002 Epoch: 24 Global Step: 501350 
Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:40,705-Speed 6308.03 samples/sec Loss 4.2494 LearningRate 0.0002 Epoch: 24 Global Step: 501360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:43,952-Speed 6308.48 samples/sec Loss 4.2607 LearningRate 0.0002 Epoch: 24 Global Step: 501370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:47,185-Speed 6336.78 samples/sec Loss 4.2154 LearningRate 0.0002 Epoch: 24 Global Step: 501380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:50,433-Speed 6306.85 samples/sec Loss 4.2298 LearningRate 0.0002 Epoch: 24 Global Step: 501390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:53,675-Speed 6318.08 samples/sec Loss 4.2692 LearningRate 0.0002 Epoch: 24 Global Step: 501400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:20:56,918-Speed 6316.22 samples/sec Loss 4.2366 LearningRate 0.0002 Epoch: 24 Global Step: 501410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:00,162-Speed 6313.71 samples/sec Loss 4.2229 LearningRate 0.0002 Epoch: 24 Global Step: 501420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:03,411-Speed 6304.85 samples/sec Loss 4.2710 LearningRate 0.0002 Epoch: 24 Global Step: 501430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:06,654-Speed 6316.55 samples/sec Loss 4.2977 LearningRate 0.0002 Epoch: 24 Global Step: 501440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:09,904-Speed 6304.02 samples/sec Loss 4.2046 LearningRate 0.0002 Epoch: 24 Global Step: 501450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:13,152-Speed 6307.92 samples/sec Loss 4.2430 LearningRate 0.0002 Epoch: 24 Global Step: 501460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:16,397-Speed 6311.93 samples/sec Loss 4.2451 LearningRate 0.0002 Epoch: 24 Global Step: 501470 Fp16 Grad Scale: 16384 
Required: 30 hours Training: 2022-04-02 13:21:19,645-Speed 6308.06 samples/sec Loss 4.2577 LearningRate 0.0002 Epoch: 24 Global Step: 501480 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:21:22,875-Speed 6340.09 samples/sec Loss 4.2281 LearningRate 0.0002 Epoch: 24 Global Step: 501490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:26,133-Speed 6290.16 samples/sec Loss 4.2784 LearningRate 0.0002 Epoch: 24 Global Step: 501500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:29,380-Speed 6309.40 samples/sec Loss 4.1825 LearningRate 0.0002 Epoch: 24 Global Step: 501510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:32,646-Speed 6271.73 samples/sec Loss 4.2278 LearningRate 0.0002 Epoch: 24 Global Step: 501520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:35,892-Speed 6311.41 samples/sec Loss 4.2960 LearningRate 0.0002 Epoch: 24 Global Step: 501530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:39,134-Speed 6318.05 samples/sec Loss 4.2396 LearningRate 0.0002 Epoch: 24 Global Step: 501540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:42,386-Speed 6299.14 samples/sec Loss 4.2860 LearningRate 0.0002 Epoch: 24 Global Step: 501550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:45,632-Speed 6309.66 samples/sec Loss 4.2347 LearningRate 0.0002 Epoch: 24 Global Step: 501560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:48,878-Speed 6311.28 samples/sec Loss 4.2433 LearningRate 0.0002 Epoch: 24 Global Step: 501570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:52,129-Speed 6302.21 samples/sec Loss 4.2372 LearningRate 0.0002 Epoch: 24 Global Step: 501580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:21:55,358-Speed 6342.29 samples/sec Loss 4.3317 LearningRate 0.0002 Epoch: 24 Global Step: 501590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 
2022-04-02 13:21:58,609-Speed 6300.86 samples/sec Loss 4.3009 LearningRate 0.0002 Epoch: 24 Global Step: 501600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:01,862-Speed 6297.96 samples/sec Loss 4.2360 LearningRate 0.0002 Epoch: 24 Global Step: 501610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:05,107-Speed 6312.75 samples/sec Loss 4.3054 LearningRate 0.0002 Epoch: 24 Global Step: 501620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:08,353-Speed 6310.16 samples/sec Loss 4.2326 LearningRate 0.0002 Epoch: 24 Global Step: 501630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:11,598-Speed 6313.21 samples/sec Loss 4.3082 LearningRate 0.0002 Epoch: 24 Global Step: 501640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:14,847-Speed 6305.99 samples/sec Loss 4.2949 LearningRate 0.0002 Epoch: 24 Global Step: 501650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:18,091-Speed 6315.18 samples/sec Loss 4.3105 LearningRate 0.0002 Epoch: 24 Global Step: 501660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:21,343-Speed 6298.17 samples/sec Loss 4.1997 LearningRate 0.0002 Epoch: 24 Global Step: 501670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:24,591-Speed 6307.16 samples/sec Loss 4.2824 LearningRate 0.0002 Epoch: 24 Global Step: 501680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:27,822-Speed 6340.37 samples/sec Loss 4.1968 LearningRate 0.0002 Epoch: 24 Global Step: 501690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:31,071-Speed 6304.86 samples/sec Loss 4.2581 LearningRate 0.0002 Epoch: 24 Global Step: 501700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:34,321-Speed 6302.43 samples/sec Loss 4.2957 LearningRate 0.0002 Epoch: 24 Global Step: 501710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:37,572-Speed 
6300.44 samples/sec Loss 4.2486 LearningRate 0.0002 Epoch: 24 Global Step: 501720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:40,818-Speed 6311.26 samples/sec Loss 4.2248 LearningRate 0.0002 Epoch: 24 Global Step: 501730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:44,065-Speed 6309.56 samples/sec Loss 4.2608 LearningRate 0.0002 Epoch: 24 Global Step: 501740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:47,309-Speed 6313.87 samples/sec Loss 4.2479 LearningRate 0.0002 Epoch: 24 Global Step: 501750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:22:50,537-Speed 6345.22 samples/sec Loss 4.2365 LearningRate 0.0002 Epoch: 24 Global Step: 501760 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:22:53,783-Speed 6312.22 samples/sec Loss 4.2745 LearningRate 0.0002 Epoch: 24 Global Step: 501770 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:22:57,027-Speed 6314.13 samples/sec Loss 4.2807 LearningRate 0.0002 Epoch: 24 Global Step: 501780 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:00,272-Speed 6312.05 samples/sec Loss 4.2248 LearningRate 0.0002 Epoch: 24 Global Step: 501790 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:03,523-Speed 6301.58 samples/sec Loss 4.2459 LearningRate 0.0002 Epoch: 24 Global Step: 501800 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:06,771-Speed 6306.91 samples/sec Loss 4.2063 LearningRate 0.0002 Epoch: 24 Global Step: 501810 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:10,016-Speed 6311.22 samples/sec Loss 4.3163 LearningRate 0.0002 Epoch: 24 Global Step: 501820 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:13,259-Speed 6316.96 samples/sec Loss 4.2388 LearningRate 0.0002 Epoch: 24 Global Step: 501830 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:16,503-Speed 6314.29 samples/sec Loss 4.2043 
LearningRate 0.0002 Epoch: 24 Global Step: 501840 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:19,747-Speed 6316.03 samples/sec Loss 4.2896 LearningRate 0.0002 Epoch: 24 Global Step: 501850 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:23:22,995-Speed 6306.14 samples/sec Loss 4.2913 LearningRate 0.0002 Epoch: 24 Global Step: 501860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:26,244-Speed 6305.22 samples/sec Loss 4.2324 LearningRate 0.0002 Epoch: 24 Global Step: 501870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:29,487-Speed 6317.65 samples/sec Loss 4.2685 LearningRate 0.0002 Epoch: 24 Global Step: 501880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:32,734-Speed 6309.45 samples/sec Loss 4.2435 LearningRate 0.0002 Epoch: 24 Global Step: 501890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:35,975-Speed 6320.33 samples/sec Loss 4.2544 LearningRate 0.0002 Epoch: 24 Global Step: 501900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:39,221-Speed 6310.92 samples/sec Loss 4.2045 LearningRate 0.0002 Epoch: 24 Global Step: 501910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:42,470-Speed 6304.62 samples/sec Loss 4.2988 LearningRate 0.0002 Epoch: 24 Global Step: 501920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:45,711-Speed 6320.51 samples/sec Loss 4.2847 LearningRate 0.0002 Epoch: 24 Global Step: 501930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:48,955-Speed 6314.94 samples/sec Loss 4.2719 LearningRate 0.0002 Epoch: 24 Global Step: 501940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:52,198-Speed 6316.35 samples/sec Loss 4.2392 LearningRate 0.0002 Epoch: 24 Global Step: 501950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:55,427-Speed 6342.44 samples/sec Loss 4.2243 LearningRate 0.0002 Epoch: 24 
Global Step: 501960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:23:58,675-Speed 6308.05 samples/sec Loss 4.2320 LearningRate 0.0002 Epoch: 24 Global Step: 501970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:01,920-Speed 6313.40 samples/sec Loss 4.3392 LearningRate 0.0002 Epoch: 24 Global Step: 501980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:05,168-Speed 6306.02 samples/sec Loss 4.2160 LearningRate 0.0002 Epoch: 24 Global Step: 501990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:08,412-Speed 6313.61 samples/sec Loss 4.2481 LearningRate 0.0002 Epoch: 24 Global Step: 502000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:11,661-Speed 6305.94 samples/sec Loss 4.2100 LearningRate 0.0002 Epoch: 24 Global Step: 502010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:14,906-Speed 6312.25 samples/sec Loss 4.2222 LearningRate 0.0002 Epoch: 24 Global Step: 502020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:18,154-Speed 6307.15 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 502030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:21,400-Speed 6310.42 samples/sec Loss 4.2245 LearningRate 0.0002 Epoch: 24 Global Step: 502040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:24,646-Speed 6311.16 samples/sec Loss 4.2677 LearningRate 0.0002 Epoch: 24 Global Step: 502050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:27,881-Speed 6330.64 samples/sec Loss 4.2472 LearningRate 0.0002 Epoch: 24 Global Step: 502060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:31,133-Speed 6303.45 samples/sec Loss 4.2366 LearningRate 0.0002 Epoch: 24 Global Step: 502070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:34,378-Speed 6313.28 samples/sec Loss 4.2617 LearningRate 0.0002 Epoch: 24 Global Step: 502080 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:37,621-Speed 6316.28 samples/sec Loss 4.2395 LearningRate 0.0002 Epoch: 24 Global Step: 502090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:40,867-Speed 6310.62 samples/sec Loss 4.2180 LearningRate 0.0002 Epoch: 24 Global Step: 502100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:44,112-Speed 6313.89 samples/sec Loss 4.2329 LearningRate 0.0002 Epoch: 24 Global Step: 502110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:47,357-Speed 6311.10 samples/sec Loss 4.2897 LearningRate 0.0002 Epoch: 24 Global Step: 502120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:50,604-Speed 6308.98 samples/sec Loss 4.2560 LearningRate 0.0002 Epoch: 24 Global Step: 502130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:53,852-Speed 6306.85 samples/sec Loss 4.1856 LearningRate 0.0002 Epoch: 24 Global Step: 502140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:24:57,103-Speed 6301.50 samples/sec Loss 4.2752 LearningRate 0.0002 Epoch: 24 Global Step: 502150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:00,352-Speed 6304.64 samples/sec Loss 4.2517 LearningRate 0.0002 Epoch: 24 Global Step: 502160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:25:03,584-Speed 6337.30 samples/sec Loss 4.2181 LearningRate 0.0002 Epoch: 24 Global Step: 502170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:06,831-Speed 6309.18 samples/sec Loss 4.2937 LearningRate 0.0002 Epoch: 24 Global Step: 502180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:10,078-Speed 6309.69 samples/sec Loss 4.2350 LearningRate 0.0002 Epoch: 24 Global Step: 502190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:13,327-Speed 6304.76 samples/sec Loss 4.2902 LearningRate 0.0002 Epoch: 24 Global Step: 502200 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:25:16,571-Speed 6313.39 samples/sec Loss 4.3261 LearningRate 0.0002 Epoch: 24 Global Step: 502210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:19,818-Speed 6309.35 samples/sec Loss 4.2395 LearningRate 0.0002 Epoch: 24 Global Step: 502220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:23,065-Speed 6308.52 samples/sec Loss 4.3108 LearningRate 0.0002 Epoch: 24 Global Step: 502230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:26,310-Speed 6312.62 samples/sec Loss 4.2918 LearningRate 0.0002 Epoch: 24 Global Step: 502240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:29,556-Speed 6311.78 samples/sec Loss 4.1825 LearningRate 0.0002 Epoch: 24 Global Step: 502250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:32,800-Speed 6313.55 samples/sec Loss 4.2795 LearningRate 0.0002 Epoch: 24 Global Step: 502260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:36,034-Speed 6334.30 samples/sec Loss 4.1947 LearningRate 0.0002 Epoch: 24 Global Step: 502270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:39,281-Speed 6308.20 samples/sec Loss 4.2293 LearningRate 0.0002 Epoch: 24 Global Step: 502280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:42,526-Speed 6314.24 samples/sec Loss 4.3438 LearningRate 0.0002 Epoch: 24 Global Step: 502290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:45,775-Speed 6304.64 samples/sec Loss 4.3054 LearningRate 0.0002 Epoch: 24 Global Step: 502300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:49,020-Speed 6313.82 samples/sec Loss 4.2784 LearningRate 0.0002 Epoch: 24 Global Step: 502310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:52,278-Speed 6287.15 samples/sec Loss 4.2234 LearningRate 0.0002 Epoch: 24 Global Step: 502320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:25:55,521-Speed 6316.84 samples/sec Loss 4.2522 LearningRate 0.0002 Epoch: 24 Global Step: 502330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:25:58,777-Speed 6289.78 samples/sec Loss 4.2699 LearningRate 0.0002 Epoch: 24 Global Step: 502340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:02,022-Speed 6313.26 samples/sec Loss 4.2382 LearningRate 0.0002 Epoch: 24 Global Step: 502350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:05,269-Speed 6309.67 samples/sec Loss 4.2551 LearningRate 0.0002 Epoch: 24 Global Step: 502360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:08,498-Speed 6343.26 samples/sec Loss 4.2508 LearningRate 0.0002 Epoch: 24 Global Step: 502370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:11,745-Speed 6309.03 samples/sec Loss 4.2297 LearningRate 0.0002 Epoch: 24 Global Step: 502380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:14,992-Speed 6308.60 samples/sec Loss 4.2304 LearningRate 0.0002 Epoch: 24 Global Step: 502390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:18,240-Speed 6307.11 samples/sec Loss 4.2531 LearningRate 0.0002 Epoch: 24 Global Step: 502400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:21,487-Speed 6309.12 samples/sec Loss 4.2097 LearningRate 0.0002 Epoch: 24 Global Step: 502410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:24,739-Speed 6298.25 samples/sec Loss 4.2282 LearningRate 0.0002 Epoch: 24 Global Step: 502420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:27,986-Speed 6308.61 samples/sec Loss 4.2872 LearningRate 0.0002 Epoch: 24 Global Step: 502430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:31,236-Speed 6304.29 samples/sec Loss 4.2329 LearningRate 0.0002 Epoch: 24 Global Step: 502440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:34,480-Speed 6313.82 
samples/sec Loss 4.3058 LearningRate 0.0002 Epoch: 24 Global Step: 502450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:37,726-Speed 6309.49 samples/sec Loss 4.2194 LearningRate 0.0002 Epoch: 24 Global Step: 502460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:40,960-Speed 6335.84 samples/sec Loss 4.2876 LearningRate 0.0002 Epoch: 24 Global Step: 502470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:44,206-Speed 6309.95 samples/sec Loss 4.2394 LearningRate 0.0002 Epoch: 24 Global Step: 502480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:47,453-Speed 6309.47 samples/sec Loss 4.2627 LearningRate 0.0002 Epoch: 24 Global Step: 502490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:50,706-Speed 6297.42 samples/sec Loss 4.3036 LearningRate 0.0002 Epoch: 24 Global Step: 502500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:53,948-Speed 6317.76 samples/sec Loss 4.2194 LearningRate 0.0002 Epoch: 24 Global Step: 502510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:26:57,192-Speed 6315.52 samples/sec Loss 4.2311 LearningRate 0.0002 Epoch: 24 Global Step: 502520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:00,436-Speed 6314.40 samples/sec Loss 4.3093 LearningRate 0.0002 Epoch: 24 Global Step: 502530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:03,684-Speed 6306.40 samples/sec Loss 4.2383 LearningRate 0.0002 Epoch: 24 Global Step: 502540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:06,930-Speed 6310.63 samples/sec Loss 4.2833 LearningRate 0.0002 Epoch: 24 Global Step: 502550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:10,176-Speed 6310.51 samples/sec Loss 4.2211 LearningRate 0.0002 Epoch: 24 Global Step: 502560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:13,408-Speed 6338.14 samples/sec Loss 4.1744 
LearningRate 0.0002 Epoch: 24 Global Step: 502570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:16,659-Speed 6301.49 samples/sec Loss 4.2236 LearningRate 0.0002 Epoch: 24 Global Step: 502580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:19,906-Speed 6310.27 samples/sec Loss 4.2130 LearningRate 0.0002 Epoch: 24 Global Step: 502590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:23,153-Speed 6309.28 samples/sec Loss 4.2219 LearningRate 0.0002 Epoch: 24 Global Step: 502600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:26,396-Speed 6315.78 samples/sec Loss 4.2394 LearningRate 0.0002 Epoch: 24 Global Step: 502610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:29,640-Speed 6314.41 samples/sec Loss 4.2296 LearningRate 0.0002 Epoch: 24 Global Step: 502620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:32,889-Speed 6304.73 samples/sec Loss 4.2208 LearningRate 0.0002 Epoch: 24 Global Step: 502630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:36,143-Speed 6296.45 samples/sec Loss 4.1761 LearningRate 0.0002 Epoch: 24 Global Step: 502640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:39,400-Speed 6289.46 samples/sec Loss 4.2098 LearningRate 0.0002 Epoch: 24 Global Step: 502650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:42,651-Speed 6300.02 samples/sec Loss 4.3106 LearningRate 0.0002 Epoch: 24 Global Step: 502660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:45,894-Speed 6316.93 samples/sec Loss 4.2662 LearningRate 0.0002 Epoch: 24 Global Step: 502670 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:27:49,173-Speed 6246.91 samples/sec Loss 4.2556 LearningRate 0.0002 Epoch: 24 Global Step: 502680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:52,418-Speed 6313.18 samples/sec Loss 4.2937 LearningRate 0.0002 Epoch: 24 
Global Step: 502690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:55,662-Speed 6313.46 samples/sec Loss 4.2703 LearningRate 0.0002 Epoch: 24 Global Step: 502700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:27:58,909-Speed 6309.45 samples/sec Loss 4.3040 LearningRate 0.0002 Epoch: 24 Global Step: 502710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:02,159-Speed 6308.40 samples/sec Loss 4.2458 LearningRate 0.0002 Epoch: 24 Global Step: 502720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:05,409-Speed 6302.89 samples/sec Loss 4.2907 LearningRate 0.0002 Epoch: 24 Global Step: 502730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:08,659-Speed 6302.92 samples/sec Loss 4.2424 LearningRate 0.0002 Epoch: 24 Global Step: 502740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:11,901-Speed 6318.30 samples/sec Loss 4.2394 LearningRate 0.0002 Epoch: 24 Global Step: 502750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:15,146-Speed 6313.92 samples/sec Loss 4.2373 LearningRate 0.0002 Epoch: 24 Global Step: 502760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:18,391-Speed 6312.48 samples/sec Loss 4.2311 LearningRate 0.0002 Epoch: 24 Global Step: 502770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:21,622-Speed 6340.16 samples/sec Loss 4.3042 LearningRate 0.0002 Epoch: 24 Global Step: 502780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:24,868-Speed 6310.16 samples/sec Loss 4.2268 LearningRate 0.0002 Epoch: 24 Global Step: 502790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:28,112-Speed 6315.44 samples/sec Loss 4.2292 LearningRate 0.0002 Epoch: 24 Global Step: 502800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:31,357-Speed 6311.24 samples/sec Loss 4.1902 LearningRate 0.0002 Epoch: 24 Global Step: 502810 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:34,600-Speed 6316.75 samples/sec Loss 4.2414 LearningRate 0.0002 Epoch: 24 Global Step: 502820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:37,851-Speed 6300.51 samples/sec Loss 4.2615 LearningRate 0.0002 Epoch: 24 Global Step: 502830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:41,102-Speed 6301.71 samples/sec Loss 4.2608 LearningRate 0.0002 Epoch: 24 Global Step: 502840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:44,347-Speed 6313.32 samples/sec Loss 4.2856 LearningRate 0.0002 Epoch: 24 Global Step: 502850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:47,604-Speed 6288.33 samples/sec Loss 4.2978 LearningRate 0.0002 Epoch: 24 Global Step: 502860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:50,849-Speed 6313.42 samples/sec Loss 4.2059 LearningRate 0.0002 Epoch: 24 Global Step: 502870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:54,088-Speed 6323.25 samples/sec Loss 4.2866 LearningRate 0.0002 Epoch: 24 Global Step: 502880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:28:57,332-Speed 6315.19 samples/sec Loss 4.2817 LearningRate 0.0002 Epoch: 24 Global Step: 502890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:00,581-Speed 6304.17 samples/sec Loss 4.2502 LearningRate 0.0002 Epoch: 24 Global Step: 502900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:03,826-Speed 6313.20 samples/sec Loss 4.2561 LearningRate 0.0002 Epoch: 24 Global Step: 502910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:07,073-Speed 6308.05 samples/sec Loss 4.2175 LearningRate 0.0002 Epoch: 24 Global Step: 502920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:10,321-Speed 6307.59 samples/sec Loss 4.2619 LearningRate 0.0002 Epoch: 24 Global Step: 502930 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:29:13,563-Speed 6319.90 samples/sec Loss 4.2580 LearningRate 0.0002 Epoch: 24 Global Step: 502940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:16,806-Speed 6315.85 samples/sec Loss 4.2856 LearningRate 0.0002 Epoch: 24 Global Step: 502950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:20,051-Speed 6312.84 samples/sec Loss 4.2103 LearningRate 0.0002 Epoch: 24 Global Step: 502960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:23,294-Speed 6318.19 samples/sec Loss 4.2872 LearningRate 0.0002 Epoch: 24 Global Step: 502970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:26,523-Speed 6343.94 samples/sec Loss 4.3057 LearningRate 0.0002 Epoch: 24 Global Step: 502980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:29,771-Speed 6305.39 samples/sec Loss 4.2474 LearningRate 0.0002 Epoch: 24 Global Step: 502990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:33,017-Speed 6311.91 samples/sec Loss 4.2505 LearningRate 0.0002 Epoch: 24 Global Step: 503000 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:36,264-Speed 6307.40 samples/sec Loss 4.2652 LearningRate 0.0002 Epoch: 24 Global Step: 503010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:39,509-Speed 6313.42 samples/sec Loss 4.3050 LearningRate 0.0002 Epoch: 24 Global Step: 503020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:42,758-Speed 6304.57 samples/sec Loss 4.2048 LearningRate 0.0002 Epoch: 24 Global Step: 503030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:46,008-Speed 6301.99 samples/sec Loss 4.2261 LearningRate 0.0002 Epoch: 24 Global Step: 503040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:49,254-Speed 6312.82 samples/sec Loss 4.2158 LearningRate 0.0002 Epoch: 24 Global Step: 503050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:29:52,498-Speed 6312.72 samples/sec Loss 4.2618 LearningRate 0.0002 Epoch: 24 Global Step: 503060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:55,747-Speed 6305.64 samples/sec Loss 4.2843 LearningRate 0.0002 Epoch: 24 Global Step: 503070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:29:58,978-Speed 6339.73 samples/sec Loss 4.2332 LearningRate 0.0002 Epoch: 24 Global Step: 503080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:02,227-Speed 6304.36 samples/sec Loss 4.1923 LearningRate 0.0002 Epoch: 24 Global Step: 503090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:05,471-Speed 6315.52 samples/sec Loss 4.2672 LearningRate 0.0002 Epoch: 24 Global Step: 503100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:08,719-Speed 6306.69 samples/sec Loss 4.2991 LearningRate 0.0002 Epoch: 24 Global Step: 503110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:11,979-Speed 6284.01 samples/sec Loss 4.2869 LearningRate 0.0002 Epoch: 24 Global Step: 503120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:15,245-Speed 6272.12 samples/sec Loss 4.1879 LearningRate 0.0002 Epoch: 24 Global Step: 503130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:18,494-Speed 6305.02 samples/sec Loss 4.2765 LearningRate 0.0002 Epoch: 24 Global Step: 503140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:21,738-Speed 6314.00 samples/sec Loss 4.2374 LearningRate 0.0002 Epoch: 24 Global Step: 503150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:24,983-Speed 6313.84 samples/sec Loss 4.2021 LearningRate 0.0002 Epoch: 24 Global Step: 503160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:28,234-Speed 6300.12 samples/sec Loss 4.2489 LearningRate 0.0002 Epoch: 24 Global Step: 503170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:31,464-Speed 6342.53 
samples/sec Loss 4.2728 LearningRate 0.0002 Epoch: 24 Global Step: 503180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:34,710-Speed 6309.96 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 503190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:37,958-Speed 6307.32 samples/sec Loss 4.2439 LearningRate 0.0002 Epoch: 24 Global Step: 503200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:41,203-Speed 6313.95 samples/sec Loss 4.2079 LearningRate 0.0002 Epoch: 24 Global Step: 503210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:44,447-Speed 6313.14 samples/sec Loss 4.2891 LearningRate 0.0002 Epoch: 24 Global Step: 503220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:47,695-Speed 6308.55 samples/sec Loss 4.2379 LearningRate 0.0002 Epoch: 24 Global Step: 503230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:50,940-Speed 6310.65 samples/sec Loss 4.1678 LearningRate 0.0002 Epoch: 24 Global Step: 503240 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:54,188-Speed 6308.59 samples/sec Loss 4.2661 LearningRate 0.0002 Epoch: 24 Global Step: 503250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:30:57,434-Speed 6309.21 samples/sec Loss 4.1915 LearningRate 0.0002 Epoch: 24 Global Step: 503260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:00,679-Speed 6313.62 samples/sec Loss 4.2480 LearningRate 0.0002 Epoch: 24 Global Step: 503270 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:03,919-Speed 6322.72 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 503280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:07,172-Speed 6296.33 samples/sec Loss 4.2572 LearningRate 0.0002 Epoch: 24 Global Step: 503290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:10,437-Speed 6274.09 samples/sec Loss 4.2001 
LearningRate 0.0002 Epoch: 24 Global Step: 503300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:13,682-Speed 6312.62 samples/sec Loss 4.2421 LearningRate 0.0002 Epoch: 24 Global Step: 503310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:16,926-Speed 6315.20 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 24 Global Step: 503320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:20,175-Speed 6303.40 samples/sec Loss 4.1970 LearningRate 0.0002 Epoch: 24 Global Step: 503330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:23,425-Speed 6303.45 samples/sec Loss 4.2689 LearningRate 0.0002 Epoch: 24 Global Step: 503340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:26,666-Speed 6320.88 samples/sec Loss 4.2626 LearningRate 0.0002 Epoch: 24 Global Step: 503350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:29,910-Speed 6314.09 samples/sec Loss 4.2097 LearningRate 0.0002 Epoch: 24 Global Step: 503360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:33,157-Speed 6310.00 samples/sec Loss 4.2428 LearningRate 0.0002 Epoch: 24 Global Step: 503370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:36,404-Speed 6308.11 samples/sec Loss 4.2082 LearningRate 0.0002 Epoch: 24 Global Step: 503380 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:31:39,633-Speed 6344.23 samples/sec Loss 4.2981 LearningRate 0.0002 Epoch: 24 Global Step: 503390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:42,881-Speed 6306.98 samples/sec Loss 4.2496 LearningRate 0.0002 Epoch: 24 Global Step: 503400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:46,123-Speed 6318.20 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 24 Global Step: 503410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:49,369-Speed 6311.64 samples/sec Loss 4.2007 LearningRate 0.0002 Epoch: 24 
Global Step: 503420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:52,616-Speed 6308.91 samples/sec Loss 4.2521 LearningRate 0.0002 Epoch: 24 Global Step: 503430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:55,860-Speed 6314.49 samples/sec Loss 4.2088 LearningRate 0.0002 Epoch: 24 Global Step: 503440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:31:59,107-Speed 6309.67 samples/sec Loss 4.2313 LearningRate 0.0002 Epoch: 24 Global Step: 503450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:02,355-Speed 6304.96 samples/sec Loss 4.1702 LearningRate 0.0002 Epoch: 24 Global Step: 503460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:05,601-Speed 6311.09 samples/sec Loss 4.2977 LearningRate 0.0002 Epoch: 24 Global Step: 503470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:08,849-Speed 6308.24 samples/sec Loss 4.3418 LearningRate 0.0002 Epoch: 24 Global Step: 503480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:12,081-Speed 6337.73 samples/sec Loss 4.2649 LearningRate 0.0002 Epoch: 24 Global Step: 503490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:15,329-Speed 6306.20 samples/sec Loss 4.3308 LearningRate 0.0002 Epoch: 24 Global Step: 503500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:18,573-Speed 6314.28 samples/sec Loss 4.2898 LearningRate 0.0002 Epoch: 24 Global Step: 503510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:21,821-Speed 6307.86 samples/sec Loss 4.2841 LearningRate 0.0002 Epoch: 24 Global Step: 503520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:25,068-Speed 6309.28 samples/sec Loss 4.2388 LearningRate 0.0002 Epoch: 24 Global Step: 503530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:28,322-Speed 6295.39 samples/sec Loss 4.2196 LearningRate 0.0002 Epoch: 24 Global Step: 503540 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:31,566-Speed 6314.64 samples/sec Loss 4.2342 LearningRate 0.0002 Epoch: 24 Global Step: 503550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:34,808-Speed 6318.33 samples/sec Loss 4.2395 LearningRate 0.0002 Epoch: 24 Global Step: 503560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:38,054-Speed 6310.92 samples/sec Loss 4.2026 LearningRate 0.0002 Epoch: 24 Global Step: 503570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:41,302-Speed 6305.77 samples/sec Loss 4.1727 LearningRate 0.0002 Epoch: 24 Global Step: 503580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:44,537-Speed 6333.00 samples/sec Loss 4.2479 LearningRate 0.0002 Epoch: 24 Global Step: 503590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:47,782-Speed 6312.80 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 503600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:51,028-Speed 6311.61 samples/sec Loss 4.2701 LearningRate 0.0002 Epoch: 24 Global Step: 503610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:54,278-Speed 6303.93 samples/sec Loss 4.3027 LearningRate 0.0002 Epoch: 24 Global Step: 503620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:32:57,523-Speed 6311.63 samples/sec Loss 4.1696 LearningRate 0.0002 Epoch: 24 Global Step: 503630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:00,767-Speed 6315.24 samples/sec Loss 4.2601 LearningRate 0.0002 Epoch: 24 Global Step: 503640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:04,015-Speed 6306.29 samples/sec Loss 4.2836 LearningRate 0.0002 Epoch: 24 Global Step: 503650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:07,261-Speed 6311.31 samples/sec Loss 4.2696 LearningRate 0.0002 Epoch: 24 Global Step: 503660 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:33:10,507-Speed 6311.52 samples/sec Loss 4.2167 LearningRate 0.0002 Epoch: 24 Global Step: 503670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:13,752-Speed 6312.57 samples/sec Loss 4.2516 LearningRate 0.0002 Epoch: 24 Global Step: 503680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:16,985-Speed 6334.87 samples/sec Loss 4.2596 LearningRate 0.0002 Epoch: 24 Global Step: 503690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:20,230-Speed 6313.08 samples/sec Loss 4.2137 LearningRate 0.0002 Epoch: 24 Global Step: 503700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:23,476-Speed 6310.46 samples/sec Loss 4.3511 LearningRate 0.0002 Epoch: 24 Global Step: 503710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:26,722-Speed 6311.14 samples/sec Loss 4.2121 LearningRate 0.0002 Epoch: 24 Global Step: 503720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:29,968-Speed 6311.81 samples/sec Loss 4.2534 LearningRate 0.0002 Epoch: 24 Global Step: 503730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:33,212-Speed 6312.65 samples/sec Loss 4.3116 LearningRate 0.0002 Epoch: 24 Global Step: 503740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:36,473-Speed 6283.19 samples/sec Loss 4.2084 LearningRate 0.0002 Epoch: 24 Global Step: 503750 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:39,726-Speed 6297.39 samples/sec Loss 4.2426 LearningRate 0.0002 Epoch: 24 Global Step: 503760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:42,974-Speed 6306.40 samples/sec Loss 4.1985 LearningRate 0.0002 Epoch: 24 Global Step: 503770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:46,216-Speed 6317.38 samples/sec Loss 4.1832 LearningRate 0.0002 Epoch: 24 Global Step: 503780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:33:49,462-Speed 6311.88 samples/sec Loss 4.2754 LearningRate 0.0002 Epoch: 24 Global Step: 503790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:33:52,689-Speed 6346.45 samples/sec Loss 4.2728 LearningRate 0.0002 Epoch: 24 Global Step: 503800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:55,939-Speed 6304.92 samples/sec Loss 4.2337 LearningRate 0.0002 Epoch: 24 Global Step: 503810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:33:59,184-Speed 6312.82 samples/sec Loss 4.2332 LearningRate 0.0002 Epoch: 24 Global Step: 503820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:02,433-Speed 6304.25 samples/sec Loss 4.2005 LearningRate 0.0002 Epoch: 24 Global Step: 503830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:05,688-Speed 6292.87 samples/sec Loss 4.2132 LearningRate 0.0002 Epoch: 24 Global Step: 503840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:08,934-Speed 6311.74 samples/sec Loss 4.2187 LearningRate 0.0002 Epoch: 24 Global Step: 503850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:12,178-Speed 6314.37 samples/sec Loss 4.2766 LearningRate 0.0002 Epoch: 24 Global Step: 503860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:15,427-Speed 6304.98 samples/sec Loss 4.2292 LearningRate 0.0002 Epoch: 24 Global Step: 503870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:18,673-Speed 6310.54 samples/sec Loss 4.2743 LearningRate 0.0002 Epoch: 24 Global Step: 503880 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:21,920-Speed 6308.09 samples/sec Loss 4.1832 LearningRate 0.0002 Epoch: 24 Global Step: 503890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:25,151-Speed 6341.90 samples/sec Loss 4.2010 LearningRate 0.0002 Epoch: 24 Global Step: 503900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:28,397-Speed 6310.58 
samples/sec Loss 4.2731 LearningRate 0.0002 Epoch: 24 Global Step: 503910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:31,638-Speed 6318.83 samples/sec Loss 4.2722 LearningRate 0.0002 Epoch: 24 Global Step: 503920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:34,886-Speed 6307.20 samples/sec Loss 4.2499 LearningRate 0.0002 Epoch: 24 Global Step: 503930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:34:38,120-Speed 6334.77 samples/sec Loss 4.2900 LearningRate 0.0002 Epoch: 24 Global Step: 503940 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:41,363-Speed 6315.41 samples/sec Loss 4.2398 LearningRate 0.0002 Epoch: 24 Global Step: 503950 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:44,606-Speed 6317.46 samples/sec Loss 4.2310 LearningRate 0.0002 Epoch: 24 Global Step: 503960 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:47,853-Speed 6308.21 samples/sec Loss 4.2518 LearningRate 0.0002 Epoch: 24 Global Step: 503970 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:51,102-Speed 6304.61 samples/sec Loss 4.2342 LearningRate 0.0002 Epoch: 24 Global Step: 503980 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:54,347-Speed 6313.03 samples/sec Loss 4.2015 LearningRate 0.0002 Epoch: 24 Global Step: 503990 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:34:57,590-Speed 6316.65 samples/sec Loss 4.2581 LearningRate 0.0002 Epoch: 24 Global Step: 504000 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:35:00,835-Speed 6313.64 samples/sec Loss 4.3278 LearningRate 0.0002 Epoch: 24 Global Step: 504010 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:35:04,079-Speed 6313.81 samples/sec Loss 4.3075 LearningRate 0.0002 Epoch: 24 Global Step: 504020 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:35:07,324-Speed 6312.34 samples/sec Loss 4.3203 LearningRate 
0.0002 Epoch: 24 Global Step: 504030 Fp16 Grad Scale: 8192 Required: 30 hours Training: 2022-04-02 13:35:10,567-Speed 6317.60 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 504040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:13,813-Speed 6309.67 samples/sec Loss 4.2363 LearningRate 0.0002 Epoch: 24 Global Step: 504050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:17,064-Speed 6302.60 samples/sec Loss 4.2145 LearningRate 0.0002 Epoch: 24 Global Step: 504060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:20,310-Speed 6311.65 samples/sec Loss 4.2601 LearningRate 0.0002 Epoch: 24 Global Step: 504070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:23,554-Speed 6313.23 samples/sec Loss 4.1977 LearningRate 0.0002 Epoch: 24 Global Step: 504080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:26,799-Speed 6313.87 samples/sec Loss 4.2116 LearningRate 0.0002 Epoch: 24 Global Step: 504090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:30,049-Speed 6302.14 samples/sec Loss 4.2699 LearningRate 0.0002 Epoch: 24 Global Step: 504100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:33,297-Speed 6307.27 samples/sec Loss 4.2191 LearningRate 0.0002 Epoch: 24 Global Step: 504110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:36,544-Speed 6307.77 samples/sec Loss 4.2845 LearningRate 0.0002 Epoch: 24 Global Step: 504120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:39,792-Speed 6307.00 samples/sec Loss 4.2223 LearningRate 0.0002 Epoch: 24 Global Step: 504130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:43,026-Speed 6334.13 samples/sec Loss 4.1974 LearningRate 0.0002 Epoch: 24 Global Step: 504140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:46,272-Speed 6311.82 samples/sec Loss 4.2443 LearningRate 0.0002 Epoch: 24 Global Step: 
504150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:49,517-Speed 6311.16 samples/sec Loss 4.2414 LearningRate 0.0002 Epoch: 24 Global Step: 504160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:52,768-Speed 6301.78 samples/sec Loss 4.1879 LearningRate 0.0002 Epoch: 24 Global Step: 504170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:56,015-Speed 6309.24 samples/sec Loss 4.2168 LearningRate 0.0002 Epoch: 24 Global Step: 504180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:35:59,261-Speed 6310.44 samples/sec Loss 4.2179 LearningRate 0.0002 Epoch: 24 Global Step: 504190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:02,508-Speed 6308.85 samples/sec Loss 4.2597 LearningRate 0.0002 Epoch: 24 Global Step: 504200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:05,756-Speed 6306.83 samples/sec Loss 4.2080 LearningRate 0.0002 Epoch: 24 Global Step: 504210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:09,002-Speed 6310.91 samples/sec Loss 4.1479 LearningRate 0.0002 Epoch: 24 Global Step: 504220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:12,250-Speed 6306.89 samples/sec Loss 4.2193 LearningRate 0.0002 Epoch: 24 Global Step: 504230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:15,497-Speed 6307.76 samples/sec Loss 4.2536 LearningRate 0.0002 Epoch: 24 Global Step: 504240 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:36:18,730-Speed 6337.86 samples/sec Loss 4.2342 LearningRate 0.0002 Epoch: 24 Global Step: 504250 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:21,975-Speed 6312.69 samples/sec Loss 4.2573 LearningRate 0.0002 Epoch: 24 Global Step: 504260 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:25,223-Speed 6309.13 samples/sec Loss 4.2010 LearningRate 0.0002 Epoch: 24 Global Step: 504270 Fp16 Grad Scale: 16384 
Required: 30 hours Training: 2022-04-02 13:36:28,471-Speed 6306.70 samples/sec Loss 4.2537 LearningRate 0.0002 Epoch: 24 Global Step: 504280 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:31,716-Speed 6312.35 samples/sec Loss 4.2167 LearningRate 0.0002 Epoch: 24 Global Step: 504290 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:34,966-Speed 6303.90 samples/sec Loss 4.2740 LearningRate 0.0002 Epoch: 24 Global Step: 504300 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:38,211-Speed 6312.27 samples/sec Loss 4.2542 LearningRate 0.0002 Epoch: 24 Global Step: 504310 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:41,464-Speed 6296.62 samples/sec Loss 4.2648 LearningRate 0.0002 Epoch: 24 Global Step: 504320 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:44,711-Speed 6308.56 samples/sec Loss 4.2075 LearningRate 0.0002 Epoch: 24 Global Step: 504330 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:47,954-Speed 6317.51 samples/sec Loss 4.2743 LearningRate 0.0002 Epoch: 24 Global Step: 504340 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:51,188-Speed 6333.88 samples/sec Loss 4.2809 LearningRate 0.0002 Epoch: 24 Global Step: 504350 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:54,433-Speed 6312.33 samples/sec Loss 4.2259 LearningRate 0.0002 Epoch: 24 Global Step: 504360 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:36:57,676-Speed 6317.01 samples/sec Loss 4.2369 LearningRate 0.0002 Epoch: 24 Global Step: 504370 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:00,921-Speed 6312.64 samples/sec Loss 4.1989 LearningRate 0.0002 Epoch: 24 Global Step: 504380 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:04,165-Speed 6314.78 samples/sec Loss 4.1883 LearningRate 0.0002 Epoch: 24 Global Step: 504390 Fp16 Grad Scale: 16384 Required: 30 hours Training: 
2022-04-02 13:37:07,418-Speed 6296.81 samples/sec Loss 4.1602 LearningRate 0.0002 Epoch: 24 Global Step: 504400 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:10,669-Speed 6301.25 samples/sec Loss 4.2207 LearningRate 0.0002 Epoch: 24 Global Step: 504410 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:13,917-Speed 6306.45 samples/sec Loss 4.2596 LearningRate 0.0002 Epoch: 24 Global Step: 504420 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:17,177-Speed 6283.95 samples/sec Loss 4.2517 LearningRate 0.0002 Epoch: 24 Global Step: 504430 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:20,434-Speed 6289.75 samples/sec Loss 4.3072 LearningRate 0.0002 Epoch: 24 Global Step: 504440 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:23,661-Speed 6347.38 samples/sec Loss 4.2396 LearningRate 0.0002 Epoch: 24 Global Step: 504450 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:26,906-Speed 6313.72 samples/sec Loss 4.1434 LearningRate 0.0002 Epoch: 24 Global Step: 504460 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:30,154-Speed 6306.76 samples/sec Loss 4.2604 LearningRate 0.0002 Epoch: 24 Global Step: 504470 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:33,398-Speed 6314.73 samples/sec Loss 4.3008 LearningRate 0.0002 Epoch: 24 Global Step: 504480 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:36,641-Speed 6317.25 samples/sec Loss 4.2334 LearningRate 0.0002 Epoch: 24 Global Step: 504490 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:39,887-Speed 6310.81 samples/sec Loss 4.1681 LearningRate 0.0002 Epoch: 24 Global Step: 504500 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:43,131-Speed 6314.11 samples/sec Loss 4.1976 LearningRate 0.0002 Epoch: 24 Global Step: 504510 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:46,374-Speed 
6316.18 samples/sec Loss 4.2247 LearningRate 0.0002 Epoch: 24 Global Step: 504520 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:49,619-Speed 6313.77 samples/sec Loss 4.2004 LearningRate 0.0002 Epoch: 24 Global Step: 504530 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:52,869-Speed 6302.76 samples/sec Loss 4.1849 LearningRate 0.0002 Epoch: 24 Global Step: 504540 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:56,099-Speed 6341.52 samples/sec Loss 4.2605 LearningRate 0.0002 Epoch: 24 Global Step: 504550 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:37:59,347-Speed 6307.03 samples/sec Loss 4.2518 LearningRate 0.0002 Epoch: 24 Global Step: 504560 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:02,591-Speed 6314.34 samples/sec Loss 4.1989 LearningRate 0.0002 Epoch: 24 Global Step: 504570 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:05,836-Speed 6311.78 samples/sec Loss 4.2358 LearningRate 0.0002 Epoch: 24 Global Step: 504580 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:09,086-Speed 6303.75 samples/sec Loss 4.1987 LearningRate 0.0002 Epoch: 24 Global Step: 504590 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:12,332-Speed 6310.36 samples/sec Loss 4.2483 LearningRate 0.0002 Epoch: 24 Global Step: 504600 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:15,579-Speed 6310.04 samples/sec Loss 4.2089 LearningRate 0.0002 Epoch: 24 Global Step: 504610 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:18,823-Speed 6312.73 samples/sec Loss 4.2169 LearningRate 0.0002 Epoch: 24 Global Step: 504620 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:22,068-Speed 6312.69 samples/sec Loss 4.1885 LearningRate 0.0002 Epoch: 24 Global Step: 504630 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:25,316-Speed 6307.31 samples/sec Loss 4.3089 
LearningRate 0.0002 Epoch: 24 Global Step: 504640 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:28,550-Speed 6334.50 samples/sec Loss 4.2480 LearningRate 0.0002 Epoch: 24 Global Step: 504650 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:31,793-Speed 6317.55 samples/sec Loss 4.2336 LearningRate 0.0002 Epoch: 24 Global Step: 504660 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:35,034-Speed 6319.84 samples/sec Loss 4.2494 LearningRate 0.0002 Epoch: 24 Global Step: 504670 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:38,284-Speed 6303.22 samples/sec Loss 4.2080 LearningRate 0.0002 Epoch: 24 Global Step: 504680 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:41,527-Speed 6317.34 samples/sec Loss 4.2265 LearningRate 0.0002 Epoch: 24 Global Step: 504690 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:44,772-Speed 6313.82 samples/sec Loss 4.2624 LearningRate 0.0002 Epoch: 24 Global Step: 504700 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:48,019-Speed 6307.52 samples/sec Loss 4.1760 LearningRate 0.0002 Epoch: 24 Global Step: 504710 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:51,265-Speed 6310.18 samples/sec Loss 4.2830 LearningRate 0.0002 Epoch: 24 Global Step: 504720 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:54,511-Speed 6311.44 samples/sec Loss 4.2500 LearningRate 0.0002 Epoch: 24 Global Step: 504730 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:38:57,755-Speed 6315.51 samples/sec Loss 4.2285 LearningRate 0.0002 Epoch: 24 Global Step: 504740 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:01,001-Speed 6309.81 samples/sec Loss 4.2048 LearningRate 0.0002 Epoch: 24 Global Step: 504750 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-04-02 13:39:04,236-Speed 6332.70 samples/sec Loss 4.1885 LearningRate 0.0002 Epoch: 24 
Global Step: 504760 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:07,479-Speed 6315.92 samples/sec Loss 4.1985 LearningRate 0.0002 Epoch: 24 Global Step: 504770 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:10,726-Speed 6308.72 samples/sec Loss 4.2355 LearningRate 0.0002 Epoch: 24 Global Step: 504780 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:13,973-Speed 6308.51 samples/sec Loss 4.3261 LearningRate 0.0002 Epoch: 24 Global Step: 504790 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:17,218-Speed 6312.68 samples/sec Loss 4.3194 LearningRate 0.0002 Epoch: 24 Global Step: 504800 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:20,459-Speed 6321.19 samples/sec Loss 4.2598 LearningRate 0.0002 Epoch: 24 Global Step: 504810 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:23,704-Speed 6312.90 samples/sec Loss 4.2358 LearningRate 0.0002 Epoch: 24 Global Step: 504820 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:26,967-Speed 6277.85 samples/sec Loss 4.1952 LearningRate 0.0002 Epoch: 24 Global Step: 504830 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:30,213-Speed 6310.84 samples/sec Loss 4.1906 LearningRate 0.0002 Epoch: 24 Global Step: 504840 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:33,456-Speed 6316.81 samples/sec Loss 4.1887 LearningRate 0.0002 Epoch: 24 Global Step: 504850 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:36,685-Speed 6343.42 samples/sec Loss 4.2983 LearningRate 0.0002 Epoch: 24 Global Step: 504860 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:39,931-Speed 6309.96 samples/sec Loss 4.2219 LearningRate 0.0002 Epoch: 24 Global Step: 504870 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:43,186-Speed 6294.24 samples/sec Loss 4.1913 LearningRate 0.0002 Epoch: 24 Global Step: 504880 Fp16 Grad 
Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:46,429-Speed 6316.44 samples/sec Loss 4.2699 LearningRate 0.0002 Epoch: 24 Global Step: 504890 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:49,674-Speed 6312.75 samples/sec Loss 4.1799 LearningRate 0.0002 Epoch: 24 Global Step: 504900 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:52,920-Speed 6312.43 samples/sec Loss 4.2703 LearningRate 0.0002 Epoch: 24 Global Step: 504910 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:56,162-Speed 6317.60 samples/sec Loss 4.2392 LearningRate 0.0002 Epoch: 24 Global Step: 504920 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:39:59,409-Speed 6308.82 samples/sec Loss 4.2497 LearningRate 0.0002 Epoch: 24 Global Step: 504930 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:02,659-Speed 6302.71 samples/sec Loss 4.1997 LearningRate 0.0002 Epoch: 24 Global Step: 504940 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:05,913-Speed 6296.42 samples/sec Loss 4.2811 LearningRate 0.0002 Epoch: 24 Global Step: 504950 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:09,143-Speed 6340.96 samples/sec Loss 4.1990 LearningRate 0.0002 Epoch: 24 Global Step: 504960 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:12,389-Speed 6310.65 samples/sec Loss 4.2086 LearningRate 0.0002 Epoch: 24 Global Step: 504970 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:15,631-Speed 6317.69 samples/sec Loss 4.2117 LearningRate 0.0002 Epoch: 24 Global Step: 504980 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:18,879-Speed 6307.73 samples/sec Loss 4.2786 LearningRate 0.0002 Epoch: 24 Global Step: 504990 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:22,122-Speed 6316.53 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 24 Global Step: 505000 Fp16 Grad Scale: 16384 Required: 30 hours 
Training: 2022-04-02 13:40:25,366-Speed 6315.56 samples/sec Loss 4.2422 LearningRate 0.0002 Epoch: 24 Global Step: 505010 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:28,656-Speed 6226.47 samples/sec Loss 4.2822 LearningRate 0.0002 Epoch: 24 Global Step: 505020 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:31,900-Speed 6316.40 samples/sec Loss 4.1954 LearningRate 0.0002 Epoch: 24 Global Step: 505030 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:35,147-Speed 6307.80 samples/sec Loss 4.1794 LearningRate 0.0002 Epoch: 24 Global Step: 505040 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:38,390-Speed 6316.91 samples/sec Loss 4.2102 LearningRate 0.0002 Epoch: 24 Global Step: 505050 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:41,620-Speed 6340.66 samples/sec Loss 4.1817 LearningRate 0.0002 Epoch: 24 Global Step: 505060 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:44,868-Speed 6308.57 samples/sec Loss 4.2480 LearningRate 0.0002 Epoch: 24 Global Step: 505070 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:48,116-Speed 6306.05 samples/sec Loss 4.2582 LearningRate 0.0002 Epoch: 24 Global Step: 505080 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:51,361-Speed 6312.57 samples/sec Loss 4.2057 LearningRate 0.0002 Epoch: 24 Global Step: 505090 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:54,604-Speed 6316.18 samples/sec Loss 4.1810 LearningRate 0.0002 Epoch: 24 Global Step: 505100 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:40:57,851-Speed 6308.44 samples/sec Loss 4.2545 LearningRate 0.0002 Epoch: 24 Global Step: 505110 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:01,098-Speed 6310.72 samples/sec Loss 4.2462 LearningRate 0.0002 Epoch: 24 Global Step: 505120 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 
13:41:04,342-Speed 6315.10 samples/sec Loss 4.2627 LearningRate 0.0002 Epoch: 24 Global Step: 505130 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:07,587-Speed 6311.18 samples/sec Loss 4.2355 LearningRate 0.0002 Epoch: 24 Global Step: 505140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:10,831-Speed 6316.09 samples/sec Loss 4.3026 LearningRate 0.0002 Epoch: 24 Global Step: 505150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:14,067-Speed 6330.17 samples/sec Loss 4.2711 LearningRate 0.0002 Epoch: 24 Global Step: 505160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:17,314-Speed 6307.96 samples/sec Loss 4.2261 LearningRate 0.0002 Epoch: 24 Global Step: 505170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:20,559-Speed 6312.60 samples/sec Loss 4.1673 LearningRate 0.0002 Epoch: 24 Global Step: 505180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:23,805-Speed 6311.23 samples/sec Loss 4.2576 LearningRate 0.0002 Epoch: 24 Global Step: 505190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:27,057-Speed 6298.53 samples/sec Loss 4.2457 LearningRate 0.0002 Epoch: 24 Global Step: 505200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-02 13:41:30,304-Speed 6308.46 samples/sec Loss 4.2943 LearningRate 0.0002 Epoch: 24 Global Step: 505210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:33,561-Speed 6289.52 samples/sec Loss 4.2940 LearningRate 0.0002 Epoch: 24 Global Step: 505220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:36,808-Speed 6308.98 samples/sec Loss 4.2249 LearningRate 0.0002 Epoch: 24 Global Step: 505230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:40,085-Speed 6250.98 samples/sec Loss 4.1787 LearningRate 0.0002 Epoch: 24 Global Step: 505240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:43,327-Speed 6318.11 
samples/sec Loss 4.2214 LearningRate 0.0002 Epoch: 24 Global Step: 505250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:46,563-Speed 6330.92 samples/sec Loss 4.2975 LearningRate 0.0002 Epoch: 24 Global Step: 505260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:49,812-Speed 6305.22 samples/sec Loss 4.2576 LearningRate 0.0002 Epoch: 24 Global Step: 505270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:53,061-Speed 6304.09 samples/sec Loss 4.2105 LearningRate 0.0002 Epoch: 24 Global Step: 505280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:56,317-Speed 6292.54 samples/sec Loss 4.2008 LearningRate 0.0002 Epoch: 24 Global Step: 505290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:41:59,560-Speed 6315.71 samples/sec Loss 4.1892 LearningRate 0.0002 Epoch: 24 Global Step: 505300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:02,812-Speed 6300.23 samples/sec Loss 4.2699 LearningRate 0.0002 Epoch: 24 Global Step: 505310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:06,055-Speed 6316.63 samples/sec Loss 4.1978 LearningRate 0.0002 Epoch: 24 Global Step: 505320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:09,300-Speed 6311.68 samples/sec Loss 4.2224 LearningRate 0.0002 Epoch: 24 Global Step: 505330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:12,546-Speed 6312.54 samples/sec Loss 4.2175 LearningRate 0.0002 Epoch: 24 Global Step: 505340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:15,792-Speed 6309.98 samples/sec Loss 4.2569 LearningRate 0.0002 Epoch: 24 Global Step: 505350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:19,035-Speed 6316.29 samples/sec Loss 4.2302 LearningRate 0.0002 Epoch: 24 Global Step: 505360 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 13:42:22,267-Speed 6337.76 samples/sec Loss 4.2525 
LearningRate 0.0002 Epoch: 24 Global Step: 505370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:25,517-Speed 6302.93 samples/sec Loss 4.1728 LearningRate 0.0002 Epoch: 24 Global Step: 505380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:28,764-Speed 6309.25 samples/sec Loss 4.2304 LearningRate 0.0002 Epoch: 24 Global Step: 505390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:32,010-Speed 6309.94 samples/sec Loss 4.3054 LearningRate 0.0002 Epoch: 24 Global Step: 505400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:35,266-Speed 6292.94 samples/sec Loss 4.1825 LearningRate 0.0002 Epoch: 24 Global Step: 505410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:38,510-Speed 6314.26 samples/sec Loss 4.2150 LearningRate 0.0002 Epoch: 24 Global Step: 505420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:41,757-Speed 6307.84 samples/sec Loss 4.2141 LearningRate 0.0002 Epoch: 24 Global Step: 505430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:45,005-Speed 6307.62 samples/sec Loss 4.1909 LearningRate 0.0002 Epoch: 24 Global Step: 505440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:48,251-Speed 6309.63 samples/sec Loss 4.2142 LearningRate 0.0002 Epoch: 24 Global Step: 505450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:51,494-Speed 6317.30 samples/sec Loss 4.1509 LearningRate 0.0002 Epoch: 24 Global Step: 505460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:54,722-Speed 6346.20 samples/sec Loss 4.2244 LearningRate 0.0002 Epoch: 24 Global Step: 505470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:42:57,981-Speed 6285.15 samples/sec Loss 4.2081 LearningRate 0.0002 Epoch: 24 Global Step: 505480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:01,270-Speed 6228.35 samples/sec Loss 4.3043 LearningRate 0.0002 Epoch: 24 
Global Step: 505490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:04,517-Speed 6309.68 samples/sec Loss 4.2182 LearningRate 0.0002 Epoch: 24 Global Step: 505500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:07,764-Speed 6309.24 samples/sec Loss 4.2271 LearningRate 0.0002 Epoch: 24 Global Step: 505510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:11,009-Speed 6311.63 samples/sec Loss 4.2021 LearningRate 0.0002 Epoch: 24 Global Step: 505520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:14,258-Speed 6305.55 samples/sec Loss 4.2286 LearningRate 0.0002 Epoch: 24 Global Step: 505530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:17,504-Speed 6311.75 samples/sec Loss 4.2365 LearningRate 0.0002 Epoch: 24 Global Step: 505540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:20,751-Speed 6308.03 samples/sec Loss 4.2068 LearningRate 0.0002 Epoch: 24 Global Step: 505550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:23,995-Speed 6315.53 samples/sec Loss 4.2564 LearningRate 0.0002 Epoch: 24 Global Step: 505560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:27,226-Speed 6339.71 samples/sec Loss 4.2253 LearningRate 0.0002 Epoch: 24 Global Step: 505570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:30,475-Speed 6306.56 samples/sec Loss 4.2020 LearningRate 0.0002 Epoch: 24 Global Step: 505580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:33,718-Speed 6314.89 samples/sec Loss 4.2095 LearningRate 0.0002 Epoch: 24 Global Step: 505590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:36,968-Speed 6303.82 samples/sec Loss 4.2057 LearningRate 0.0002 Epoch: 24 Global Step: 505600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:40,210-Speed 6317.34 samples/sec Loss 4.1909 LearningRate 0.0002 Epoch: 24 Global Step: 505610 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:43,454-Speed 6315.98 samples/sec Loss 4.2216 LearningRate 0.0002 Epoch: 24 Global Step: 505620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:46,698-Speed 6314.19 samples/sec Loss 4.2667 LearningRate 0.0002 Epoch: 24 Global Step: 505630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:49,943-Speed 6313.07 samples/sec Loss 4.3068 LearningRate 0.0002 Epoch: 24 Global Step: 505640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:53,189-Speed 6309.51 samples/sec Loss 4.2292 LearningRate 0.0002 Epoch: 24 Global Step: 505650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:56,430-Speed 6320.21 samples/sec Loss 4.2136 LearningRate 0.0002 Epoch: 24 Global Step: 505660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:43:59,669-Speed 6324.71 samples/sec Loss 4.3030 LearningRate 0.0002 Epoch: 24 Global Step: 505670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:02,912-Speed 6316.43 samples/sec Loss 4.2766 LearningRate 0.0002 Epoch: 24 Global Step: 505680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:06,160-Speed 6307.53 samples/sec Loss 4.1861 LearningRate 0.0002 Epoch: 24 Global Step: 505690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:09,406-Speed 6310.46 samples/sec Loss 4.1729 LearningRate 0.0002 Epoch: 24 Global Step: 505700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:12,652-Speed 6310.40 samples/sec Loss 4.2489 LearningRate 0.0002 Epoch: 24 Global Step: 505710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:15,896-Speed 6313.90 samples/sec Loss 4.1640 LearningRate 0.0002 Epoch: 24 Global Step: 505720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:19,142-Speed 6310.76 samples/sec Loss 4.1991 LearningRate 0.0002 Epoch: 24 Global Step: 505730 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 13:44:22,387-Speed 6313.73 samples/sec Loss 4.2330 LearningRate 0.0002 Epoch: 24 Global Step: 505740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:25,634-Speed 6308.07 samples/sec Loss 4.2387 LearningRate 0.0002 Epoch: 24 Global Step: 505750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:28,880-Speed 6312.67 samples/sec Loss 4.2043 LearningRate 0.0002 Epoch: 24 Global Step: 505760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:32,126-Speed 6310.90 samples/sec Loss 4.1696 LearningRate 0.0002 Epoch: 24 Global Step: 505770 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 13:44:35,350-Speed 6352.66 samples/sec Loss 4.2305 LearningRate 0.0002 Epoch: 24 Global Step: 505780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:38,598-Speed 6307.09 samples/sec Loss 4.2480 LearningRate 0.0002 Epoch: 24 Global Step: 505790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:41,844-Speed 6311.10 samples/sec Loss 4.2309 LearningRate 0.0002 Epoch: 24 Global Step: 505800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:45,090-Speed 6309.85 samples/sec Loss 4.2558 LearningRate 0.0002 Epoch: 24 Global Step: 505810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:48,337-Speed 6308.81 samples/sec Loss 4.2778 LearningRate 0.0002 Epoch: 24 Global Step: 505820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:51,583-Speed 6310.61 samples/sec Loss 4.2723 LearningRate 0.0002 Epoch: 24 Global Step: 505830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:54,828-Speed 6312.79 samples/sec Loss 4.2534 LearningRate 0.0002 Epoch: 24 Global Step: 505840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:44:58,072-Speed 6314.19 samples/sec Loss 4.2523 LearningRate 0.0002 Epoch: 24 Global Step: 505850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
13:45:01,317-Speed 6314.11 samples/sec Loss 4.1651 LearningRate 0.0002 Epoch: 24 Global Step: 505860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:04,562-Speed 6312.50 samples/sec Loss 4.2037 LearningRate 0.0002 Epoch: 24 Global Step: 505870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:07,783-Speed 6359.38 samples/sec Loss 4.2393 LearningRate 0.0002 Epoch: 24 Global Step: 505880 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:11,034-Speed 6301.76 samples/sec Loss 4.2746 LearningRate 0.0002 Epoch: 24 Global Step: 505890 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:14,275-Speed 6318.81 samples/sec Loss 4.2456 LearningRate 0.0002 Epoch: 24 Global Step: 505900 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:17,520-Speed 6312.92 samples/sec Loss 4.2249 LearningRate 0.0002 Epoch: 24 Global Step: 505910 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:20,764-Speed 6314.38 samples/sec Loss 4.2236 LearningRate 0.0002 Epoch: 24 Global Step: 505920 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:24,016-Speed 6299.28 samples/sec Loss 4.3094 LearningRate 0.0002 Epoch: 24 Global Step: 505930 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:27,260-Speed 6315.42 samples/sec Loss 4.2909 LearningRate 0.0002 Epoch: 24 Global Step: 505940 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:30,506-Speed 6310.72 samples/sec Loss 4.2294 LearningRate 0.0002 Epoch: 24 Global Step: 505950 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:33,753-Speed 6309.58 samples/sec Loss 4.2135 LearningRate 0.0002 Epoch: 24 Global Step: 505960 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:37,003-Speed 6303.92 samples/sec Loss 4.2463 LearningRate 0.0002 Epoch: 24 Global Step: 505970 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:45:40,249-Speed 6309.58 samples/sec 
Loss 4.2365 LearningRate 0.0002 Epoch: 24 Global Step: 505980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:43,492-Speed 6317.10 samples/sec Loss 4.1986 LearningRate 0.0002 Epoch: 24 Global Step: 505990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:46,740-Speed 6305.95 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 506000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:49,989-Speed 6305.32 samples/sec Loss 4.1763 LearningRate 0.0002 Epoch: 24 Global Step: 506010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:53,237-Speed 6307.61 samples/sec Loss 4.1859 LearningRate 0.0002 Epoch: 24 Global Step: 506020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:56,482-Speed 6310.96 samples/sec Loss 4.2389 LearningRate 0.0002 Epoch: 24 Global Step: 506030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:45:59,731-Speed 6305.32 samples/sec Loss 4.3107 LearningRate 0.0002 Epoch: 24 Global Step: 506040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:02,964-Speed 6336.96 samples/sec Loss 4.2966 LearningRate 0.0002 Epoch: 24 Global Step: 506050 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:06,209-Speed 6311.91 samples/sec Loss 4.2445 LearningRate 0.0002 Epoch: 24 Global Step: 506060 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:09,451-Speed 6318.94 samples/sec Loss 4.1954 LearningRate 0.0002 Epoch: 24 Global Step: 506070 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:12,694-Speed 6316.42 samples/sec Loss 4.2525 LearningRate 0.0002 Epoch: 24 Global Step: 506080 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:15,939-Speed 6313.55 samples/sec Loss 4.2519 LearningRate 0.0002 Epoch: 24 Global Step: 506090 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:19,186-Speed 6307.36 samples/sec Loss 4.2295 LearningRate 0.0002 
Epoch: 24 Global Step: 506100 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:22,429-Speed 6316.93 samples/sec Loss 4.1427 LearningRate 0.0002 Epoch: 24 Global Step: 506110 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:25,672-Speed 6316.92 samples/sec Loss 4.1655 LearningRate 0.0002 Epoch: 24 Global Step: 506120 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:28,917-Speed 6311.96 samples/sec Loss 4.2081 LearningRate 0.0002 Epoch: 24 Global Step: 506130 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:32,164-Speed 6308.74 samples/sec Loss 4.2307 LearningRate 0.0002 Epoch: 24 Global Step: 506140 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:46:35,412-Speed 6306.97 samples/sec Loss 4.2172 LearningRate 0.0002 Epoch: 24 Global Step: 506150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:38,657-Speed 6312.90 samples/sec Loss 4.2783 LearningRate 0.0002 Epoch: 24 Global Step: 506160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:41,908-Speed 6302.55 samples/sec Loss 4.1724 LearningRate 0.0002 Epoch: 24 Global Step: 506170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:45,155-Speed 6309.71 samples/sec Loss 4.2095 LearningRate 0.0002 Epoch: 24 Global Step: 506180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:48,399-Speed 6313.83 samples/sec Loss 4.2882 LearningRate 0.0002 Epoch: 24 Global Step: 506190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:51,646-Speed 6308.72 samples/sec Loss 4.2419 LearningRate 0.0002 Epoch: 24 Global Step: 506200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:54,893-Speed 6308.30 samples/sec Loss 4.2166 LearningRate 0.0002 Epoch: 24 Global Step: 506210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:46:58,141-Speed 6306.60 samples/sec Loss 4.2809 LearningRate 0.0002 Epoch: 24 Global Step: 506220 Fp16 
Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:47:01,395-Speed 6295.43 samples/sec Loss 4.2031 LearningRate 0.0002 Epoch: 24 Global Step: 506230 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:04,641-Speed 6311.14 samples/sec Loss 4.2507 LearningRate 0.0002 Epoch: 24 Global Step: 506240 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:07,884-Speed 6316.68 samples/sec Loss 4.2269 LearningRate 0.0002 Epoch: 24 Global Step: 506250 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:11,132-Speed 6306.91 samples/sec Loss 4.2357 LearningRate 0.0002 Epoch: 24 Global Step: 506260 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:14,380-Speed 6306.08 samples/sec Loss 4.2073 LearningRate 0.0002 Epoch: 24 Global Step: 506270 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:17,633-Speed 6297.17 samples/sec Loss 4.2209 LearningRate 0.0002 Epoch: 24 Global Step: 506280 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:20,878-Speed 6312.53 samples/sec Loss 4.2036 LearningRate 0.0002 Epoch: 24 Global Step: 506290 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:24,123-Speed 6313.44 samples/sec Loss 4.1696 LearningRate 0.0002 Epoch: 24 Global Step: 506300 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:27,367-Speed 6314.05 samples/sec Loss 4.2163 LearningRate 0.0002 Epoch: 24 Global Step: 506310 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:30,611-Speed 6314.53 samples/sec Loss 4.2546 LearningRate 0.0002 Epoch: 24 Global Step: 506320 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:33,856-Speed 6313.46 samples/sec Loss 4.2375 LearningRate 0.0002 Epoch: 24 Global Step: 506330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:47:37,090-Speed 6334.68 samples/sec Loss 4.2181 LearningRate 0.0002 Epoch: 24 Global Step: 506340 Fp16 Grad Scale: 8192 Required: 29 hours 
Training: 2022-04-02 13:47:40,335-Speed 6311.31 samples/sec Loss 4.2174 LearningRate 0.0002 Epoch: 24 Global Step: 506350 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:43,581-Speed 6310.50 samples/sec Loss 4.1864 LearningRate 0.0002 Epoch: 24 Global Step: 506360 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:46,826-Speed 6312.40 samples/sec Loss 4.2523 LearningRate 0.0002 Epoch: 24 Global Step: 506370 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:50,072-Speed 6312.30 samples/sec Loss 4.1862 LearningRate 0.0002 Epoch: 24 Global Step: 506380 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:53,321-Speed 6304.66 samples/sec Loss 4.2160 LearningRate 0.0002 Epoch: 24 Global Step: 506390 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:56,566-Speed 6312.80 samples/sec Loss 4.2440 LearningRate 0.0002 Epoch: 24 Global Step: 506400 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:47:59,810-Speed 6314.56 samples/sec Loss 4.1240 LearningRate 0.0002 Epoch: 24 Global Step: 506410 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:48:03,058-Speed 6306.08 samples/sec Loss 4.1909 LearningRate 0.0002 Epoch: 24 Global Step: 506420 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:48:06,302-Speed 6316.38 samples/sec Loss 4.1882 LearningRate 0.0002 Epoch: 24 Global Step: 506430 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:48:09,563-Speed 6281.68 samples/sec Loss 4.2403 LearningRate 0.0002 Epoch: 24 Global Step: 506440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:12,806-Speed 6315.44 samples/sec Loss 4.1555 LearningRate 0.0002 Epoch: 24 Global Step: 506450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:16,050-Speed 6315.36 samples/sec Loss 4.2095 LearningRate 0.0002 Epoch: 24 Global Step: 506460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:19,302-Speed 
6298.61 samples/sec Loss 4.2262 LearningRate 0.0002 Epoch: 24 Global Step: 506470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:22,545-Speed 6317.36 samples/sec Loss 4.2605 LearningRate 0.0002 Epoch: 24 Global Step: 506480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:25,794-Speed 6303.49 samples/sec Loss 4.2114 LearningRate 0.0002 Epoch: 24 Global Step: 506490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:29,037-Speed 6317.60 samples/sec Loss 4.2812 LearningRate 0.0002 Epoch: 24 Global Step: 506500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:32,282-Speed 6312.33 samples/sec Loss 4.2330 LearningRate 0.0002 Epoch: 24 Global Step: 506510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:35,525-Speed 6316.66 samples/sec Loss 4.1699 LearningRate 0.0002 Epoch: 24 Global Step: 506520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:38,772-Speed 6308.69 samples/sec Loss 4.2158 LearningRate 0.0002 Epoch: 24 Global Step: 506530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:42,010-Speed 6326.41 samples/sec Loss 4.1638 LearningRate 0.0002 Epoch: 24 Global Step: 506540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:45,258-Speed 6307.00 samples/sec Loss 4.2137 LearningRate 0.0002 Epoch: 24 Global Step: 506550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:48,507-Speed 6304.10 samples/sec Loss 4.2672 LearningRate 0.0002 Epoch: 24 Global Step: 506560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:51,753-Speed 6311.43 samples/sec Loss 4.1922 LearningRate 0.0002 Epoch: 24 Global Step: 506570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:48:54,987-Speed 6333.82 samples/sec Loss 4.2831 LearningRate 0.0002 Epoch: 24 Global Step: 506580 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:48:58,232-Speed 6313.41 samples/sec Loss 4.2375 
LearningRate 0.0002 Epoch: 24 Global Step: 506590 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:01,477-Speed 6312.46 samples/sec Loss 4.3013 LearningRate 0.0002 Epoch: 24 Global Step: 506600 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:04,723-Speed 6310.78 samples/sec Loss 4.2051 LearningRate 0.0002 Epoch: 24 Global Step: 506610 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:07,968-Speed 6314.03 samples/sec Loss 4.2894 LearningRate 0.0002 Epoch: 24 Global Step: 506620 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:11,219-Speed 6299.86 samples/sec Loss 4.1771 LearningRate 0.0002 Epoch: 24 Global Step: 506630 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:14,465-Speed 6311.19 samples/sec Loss 4.3005 LearningRate 0.0002 Epoch: 24 Global Step: 506640 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:17,709-Speed 6314.45 samples/sec Loss 4.2032 LearningRate 0.0002 Epoch: 24 Global Step: 506650 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:20,953-Speed 6313.92 samples/sec Loss 4.2555 LearningRate 0.0002 Epoch: 24 Global Step: 506660 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:24,196-Speed 6316.45 samples/sec Loss 4.2776 LearningRate 0.0002 Epoch: 24 Global Step: 506670 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:49:27,443-Speed 6308.93 samples/sec Loss 4.2550 LearningRate 0.0002 Epoch: 24 Global Step: 506680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:30,690-Speed 6310.17 samples/sec Loss 4.2648 LearningRate 0.0002 Epoch: 24 Global Step: 506690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:33,937-Speed 6307.73 samples/sec Loss 4.1907 LearningRate 0.0002 Epoch: 24 Global Step: 506700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:37,182-Speed 6312.79 samples/sec Loss 4.2267 LearningRate 0.0002 Epoch: 24 Global 
Step: 506710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:40,429-Speed 6308.34 samples/sec Loss 4.2320 LearningRate 0.0002 Epoch: 24 Global Step: 506720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:43,674-Speed 6312.45 samples/sec Loss 4.2424 LearningRate 0.0002 Epoch: 24 Global Step: 506730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:46,918-Speed 6315.10 samples/sec Loss 4.1662 LearningRate 0.0002 Epoch: 24 Global Step: 506740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:50,161-Speed 6317.55 samples/sec Loss 4.1799 LearningRate 0.0002 Epoch: 24 Global Step: 506750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:53,405-Speed 6312.87 samples/sec Loss 4.2640 LearningRate 0.0002 Epoch: 24 Global Step: 506760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:56,651-Speed 6312.08 samples/sec Loss 4.1858 LearningRate 0.0002 Epoch: 24 Global Step: 506770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:49:59,888-Speed 6328.90 samples/sec Loss 4.1905 LearningRate 0.0002 Epoch: 24 Global Step: 506780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:03,137-Speed 6304.84 samples/sec Loss 4.2303 LearningRate 0.0002 Epoch: 24 Global Step: 506790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:06,382-Speed 6312.56 samples/sec Loss 4.2350 LearningRate 0.0002 Epoch: 24 Global Step: 506800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:09,629-Speed 6308.38 samples/sec Loss 4.2120 LearningRate 0.0002 Epoch: 24 Global Step: 506810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:12,877-Speed 6306.62 samples/sec Loss 4.1209 LearningRate 0.0002 Epoch: 24 Global Step: 506820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:16,120-Speed 6316.47 samples/sec Loss 4.1496 LearningRate 0.0002 Epoch: 24 Global Step: 506830 Fp16 Grad Scale: 
16384 Required: 29 hours Training: 2022-04-02 13:50:19,366-Speed 6313.83 samples/sec Loss 4.2249 LearningRate 0.0002 Epoch: 24 Global Step: 506840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:22,615-Speed 6304.34 samples/sec Loss 4.2601 LearningRate 0.0002 Epoch: 24 Global Step: 506850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:25,860-Speed 6312.99 samples/sec Loss 4.2169 LearningRate 0.0002 Epoch: 24 Global Step: 506860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:29,109-Speed 6306.01 samples/sec Loss 4.1926 LearningRate 0.0002 Epoch: 24 Global Step: 506870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:32,343-Speed 6332.90 samples/sec Loss 4.2565 LearningRate 0.0002 Epoch: 24 Global Step: 506880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:35,586-Speed 6317.43 samples/sec Loss 4.2210 LearningRate 0.0002 Epoch: 24 Global Step: 506890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:38,835-Speed 6305.32 samples/sec Loss 4.2526 LearningRate 0.0002 Epoch: 24 Global Step: 506900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:42,081-Speed 6310.16 samples/sec Loss 4.1779 LearningRate 0.0002 Epoch: 24 Global Step: 506910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:45,326-Speed 6311.88 samples/sec Loss 4.1598 LearningRate 0.0002 Epoch: 24 Global Step: 506920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:48,573-Speed 6309.70 samples/sec Loss 4.2431 LearningRate 0.0002 Epoch: 24 Global Step: 506930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:51,824-Speed 6299.87 samples/sec Loss 4.2023 LearningRate 0.0002 Epoch: 24 Global Step: 506940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:50:55,069-Speed 6313.40 samples/sec Loss 4.1787 LearningRate 0.0002 Epoch: 24 Global Step: 506950 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 13:50:58,318-Speed 6305.11 samples/sec Loss 4.1953 LearningRate 0.0002 Epoch: 24 Global Step: 506960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:01,561-Speed 6316.14 samples/sec Loss 4.1732 LearningRate 0.0002 Epoch: 24 Global Step: 506970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:04,796-Speed 6332.90 samples/sec Loss 4.1843 LearningRate 0.0002 Epoch: 24 Global Step: 506980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:08,040-Speed 6315.13 samples/sec Loss 4.2192 LearningRate 0.0002 Epoch: 24 Global Step: 506990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:11,288-Speed 6305.90 samples/sec Loss 4.1550 LearningRate 0.0002 Epoch: 24 Global Step: 507000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:14,534-Speed 6311.21 samples/sec Loss 4.2763 LearningRate 0.0002 Epoch: 24 Global Step: 507010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:17,782-Speed 6307.20 samples/sec Loss 4.1592 LearningRate 0.0002 Epoch: 24 Global Step: 507020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:21,028-Speed 6311.33 samples/sec Loss 4.1414 LearningRate 0.0002 Epoch: 24 Global Step: 507030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:24,274-Speed 6311.20 samples/sec Loss 4.2217 LearningRate 0.0002 Epoch: 24 Global Step: 507040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:27,522-Speed 6306.56 samples/sec Loss 4.2403 LearningRate 0.0002 Epoch: 24 Global Step: 507050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:30,763-Speed 6319.28 samples/sec Loss 4.2469 LearningRate 0.0002 Epoch: 24 Global Step: 507060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:34,010-Speed 6309.39 samples/sec Loss 4.2360 LearningRate 0.0002 Epoch: 24 Global Step: 507070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
13:51:37,240-Speed 6341.40 samples/sec Loss 4.2401 LearningRate 0.0002 Epoch: 24 Global Step: 507080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:40,488-Speed 6307.44 samples/sec Loss 4.2342 LearningRate 0.0002 Epoch: 24 Global Step: 507090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:43,731-Speed 6316.26 samples/sec Loss 4.1947 LearningRate 0.0002 Epoch: 24 Global Step: 507100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:47,020-Speed 6228.23 samples/sec Loss 4.1700 LearningRate 0.0002 Epoch: 24 Global Step: 507110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:50,380-Speed 6096.78 samples/sec Loss 4.2465 LearningRate 0.0002 Epoch: 24 Global Step: 507120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:53,636-Speed 6290.42 samples/sec Loss 4.2423 LearningRate 0.0002 Epoch: 24 Global Step: 507130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:51:56,880-Speed 6315.82 samples/sec Loss 4.1973 LearningRate 0.0002 Epoch: 24 Global Step: 507140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:00,129-Speed 6304.35 samples/sec Loss 4.1796 LearningRate 0.0002 Epoch: 24 Global Step: 507150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:03,374-Speed 6313.45 samples/sec Loss 4.2676 LearningRate 0.0002 Epoch: 24 Global Step: 507160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:06,619-Speed 6312.31 samples/sec Loss 4.1889 LearningRate 0.0002 Epoch: 24 Global Step: 507170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:09,849-Speed 6341.15 samples/sec Loss 4.2009 LearningRate 0.0002 Epoch: 24 Global Step: 507180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:13,095-Speed 6312.40 samples/sec Loss 4.2285 LearningRate 0.0002 Epoch: 24 Global Step: 507190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:16,339-Speed 6312.81 
samples/sec Loss 4.2579 LearningRate 0.0002 Epoch: 24 Global Step: 507200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:19,585-Speed 6312.19 samples/sec Loss 4.2138 LearningRate 0.0002 Epoch: 24 Global Step: 507210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:22,831-Speed 6310.86 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 507220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:26,079-Speed 6307.11 samples/sec Loss 4.1877 LearningRate 0.0002 Epoch: 24 Global Step: 507230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:29,326-Speed 6309.13 samples/sec Loss 4.2153 LearningRate 0.0002 Epoch: 24 Global Step: 507240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:32,569-Speed 6316.31 samples/sec Loss 4.2389 LearningRate 0.0002 Epoch: 24 Global Step: 507250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:35,813-Speed 6314.36 samples/sec Loss 4.2343 LearningRate 0.0002 Epoch: 24 Global Step: 507260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:39,058-Speed 6312.49 samples/sec Loss 4.2366 LearningRate 0.0002 Epoch: 24 Global Step: 507270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:42,304-Speed 6310.94 samples/sec Loss 4.2341 LearningRate 0.0002 Epoch: 24 Global Step: 507280 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 13:52:45,535-Speed 6340.86 samples/sec Loss 4.2252 LearningRate 0.0002 Epoch: 24 Global Step: 507290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:48,781-Speed 6309.85 samples/sec Loss 4.1477 LearningRate 0.0002 Epoch: 24 Global Step: 507300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:52,025-Speed 6314.42 samples/sec Loss 4.2318 LearningRate 0.0002 Epoch: 24 Global Step: 507310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:55,272-Speed 6308.09 samples/sec Loss 4.2106 
LearningRate 0.0002 Epoch: 24 Global Step: 507320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:52:58,515-Speed 6316.75 samples/sec Loss 4.1915 LearningRate 0.0002 Epoch: 24 Global Step: 507330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:01,760-Speed 6313.33 samples/sec Loss 4.2428 LearningRate 0.0002 Epoch: 24 Global Step: 507340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:05,005-Speed 6312.44 samples/sec Loss 4.2045 LearningRate 0.0002 Epoch: 24 Global Step: 507350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:08,253-Speed 6306.76 samples/sec Loss 4.1488 LearningRate 0.0002 Epoch: 24 Global Step: 507360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:11,501-Speed 6307.54 samples/sec Loss 4.1818 LearningRate 0.0002 Epoch: 24 Global Step: 507370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:14,754-Speed 6295.81 samples/sec Loss 4.1552 LearningRate 0.0002 Epoch: 24 Global Step: 507380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:17,985-Speed 6340.62 samples/sec Loss 4.1386 LearningRate 0.0002 Epoch: 24 Global Step: 507390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:21,227-Speed 6317.54 samples/sec Loss 4.1778 LearningRate 0.0002 Epoch: 24 Global Step: 507400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:24,475-Speed 6308.39 samples/sec Loss 4.1950 LearningRate 0.0002 Epoch: 24 Global Step: 507410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:27,725-Speed 6304.03 samples/sec Loss 4.2309 LearningRate 0.0002 Epoch: 24 Global Step: 507420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:30,972-Speed 6308.06 samples/sec Loss 4.2343 LearningRate 0.0002 Epoch: 24 Global Step: 507430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:34,216-Speed 6314.95 samples/sec Loss 4.2602 LearningRate 0.0002 Epoch: 24 
Global Step: 507440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:37,459-Speed 6316.00 samples/sec Loss 4.1979 LearningRate 0.0002 Epoch: 24 Global Step: 507450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:40,704-Speed 6314.07 samples/sec Loss 4.2736 LearningRate 0.0002 Epoch: 24 Global Step: 507460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:43,950-Speed 6310.18 samples/sec Loss 4.1293 LearningRate 0.0002 Epoch: 24 Global Step: 507470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:47,194-Speed 6314.10 samples/sec Loss 4.2225 LearningRate 0.0002 Epoch: 24 Global Step: 507480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:50,424-Speed 6341.54 samples/sec Loss 4.2233 LearningRate 0.0002 Epoch: 24 Global Step: 507490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:53,668-Speed 6316.75 samples/sec Loss 4.2717 LearningRate 0.0002 Epoch: 24 Global Step: 507500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:53:56,909-Speed 6321.69 samples/sec Loss 4.2316 LearningRate 0.0002 Epoch: 24 Global Step: 507510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:00,154-Speed 6311.29 samples/sec Loss 4.1954 LearningRate 0.0002 Epoch: 24 Global Step: 507520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:03,400-Speed 6311.33 samples/sec Loss 4.1412 LearningRate 0.0002 Epoch: 24 Global Step: 507530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:06,648-Speed 6307.26 samples/sec Loss 4.2745 LearningRate 0.0002 Epoch: 24 Global Step: 507540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:09,895-Speed 6308.50 samples/sec Loss 4.2405 LearningRate 0.0002 Epoch: 24 Global Step: 507550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:13,138-Speed 6315.77 samples/sec Loss 4.1393 LearningRate 0.0002 Epoch: 24 Global Step: 507560 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:16,395-Speed 6290.74 samples/sec Loss 4.2242 LearningRate 0.0002 Epoch: 24 Global Step: 507570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:19,641-Speed 6309.91 samples/sec Loss 4.2101 LearningRate 0.0002 Epoch: 24 Global Step: 507580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:22,874-Speed 6335.65 samples/sec Loss 4.1946 LearningRate 0.0002 Epoch: 24 Global Step: 507590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:26,118-Speed 6314.51 samples/sec Loss 4.2221 LearningRate 0.0002 Epoch: 24 Global Step: 507600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:29,362-Speed 6315.26 samples/sec Loss 4.2376 LearningRate 0.0002 Epoch: 24 Global Step: 507610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:32,607-Speed 6311.30 samples/sec Loss 4.1922 LearningRate 0.0002 Epoch: 24 Global Step: 507620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:35,852-Speed 6313.24 samples/sec Loss 4.2066 LearningRate 0.0002 Epoch: 24 Global Step: 507630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:39,094-Speed 6319.64 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 24 Global Step: 507640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:42,339-Speed 6312.91 samples/sec Loss 4.1255 LearningRate 0.0002 Epoch: 24 Global Step: 507650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:45,587-Speed 6307.50 samples/sec Loss 4.1887 LearningRate 0.0002 Epoch: 24 Global Step: 507660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:48,830-Speed 6316.26 samples/sec Loss 4.2619 LearningRate 0.0002 Epoch: 24 Global Step: 507670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:54:52,076-Speed 6310.72 samples/sec Loss 4.2203 LearningRate 0.0002 Epoch: 24 Global Step: 507680 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 13:54:55,321-Speed 6312.66 samples/sec Loss 4.2463 LearningRate 0.0002 Epoch: 24 Global Step: 507690 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 13:54:58,555-Speed 6333.67 samples/sec Loss 4.2138 LearningRate 0.0002 Epoch: 24 Global Step: 507700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:01,801-Speed 6310.19 samples/sec Loss 4.2187 LearningRate 0.0002 Epoch: 24 Global Step: 507710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:05,043-Speed 6318.76 samples/sec Loss 4.1756 LearningRate 0.0002 Epoch: 24 Global Step: 507720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:08,284-Speed 6320.43 samples/sec Loss 4.2245 LearningRate 0.0002 Epoch: 24 Global Step: 507730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:11,531-Speed 6309.77 samples/sec Loss 4.2136 LearningRate 0.0002 Epoch: 24 Global Step: 507740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:14,782-Speed 6300.87 samples/sec Loss 4.1485 LearningRate 0.0002 Epoch: 24 Global Step: 507750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:18,025-Speed 6316.92 samples/sec Loss 4.2398 LearningRate 0.0002 Epoch: 24 Global Step: 507760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:21,272-Speed 6308.01 samples/sec Loss 4.1583 LearningRate 0.0002 Epoch: 24 Global Step: 507770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:24,514-Speed 6318.16 samples/sec Loss 4.1970 LearningRate 0.0002 Epoch: 24 Global Step: 507780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:27,762-Speed 6307.92 samples/sec Loss 4.1383 LearningRate 0.0002 Epoch: 24 Global Step: 507790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:30,993-Speed 6338.35 samples/sec Loss 4.1815 LearningRate 0.0002 Epoch: 24 Global Step: 507800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
13:55:34,238-Speed 6314.11 samples/sec Loss 4.2457 LearningRate 0.0002 Epoch: 24 Global Step: 507810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:37,486-Speed 6306.30 samples/sec Loss 4.1973 LearningRate 0.0002 Epoch: 24 Global Step: 507820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:40,785-Speed 6209.49 samples/sec Loss 4.1989 LearningRate 0.0002 Epoch: 24 Global Step: 507830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:55:44,019-Speed 6334.65 samples/sec Loss 4.2098 LearningRate 0.0002 Epoch: 24 Global Step: 507840 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:55:47,272-Speed 6297.42 samples/sec Loss 4.2515 LearningRate 0.0002 Epoch: 24 Global Step: 507850 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:55:50,520-Speed 6306.92 samples/sec Loss 4.2060 LearningRate 0.0002 Epoch: 24 Global Step: 507860 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:55:53,769-Speed 6304.36 samples/sec Loss 4.2546 LearningRate 0.0002 Epoch: 24 Global Step: 507870 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:55:57,011-Speed 6318.97 samples/sec Loss 4.1979 LearningRate 0.0002 Epoch: 24 Global Step: 507880 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:00,270-Speed 6286.10 samples/sec Loss 4.2460 LearningRate 0.0002 Epoch: 24 Global Step: 507890 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:03,511-Speed 6319.63 samples/sec Loss 4.2161 LearningRate 0.0002 Epoch: 24 Global Step: 507900 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:06,763-Speed 6298.25 samples/sec Loss 4.2040 LearningRate 0.0002 Epoch: 24 Global Step: 507910 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:10,014-Speed 6301.59 samples/sec Loss 4.2357 LearningRate 0.0002 Epoch: 24 Global Step: 507920 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:13,258-Speed 6314.11 samples/sec 
Loss 4.1335 LearningRate 0.0002 Epoch: 24 Global Step: 507930 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:56:16,501-Speed 6316.67 samples/sec Loss 4.2239 LearningRate 0.0002 Epoch: 24 Global Step: 507940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:19,747-Speed 6311.16 samples/sec Loss 4.2292 LearningRate 0.0002 Epoch: 24 Global Step: 507950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:22,993-Speed 6311.62 samples/sec Loss 4.2481 LearningRate 0.0002 Epoch: 24 Global Step: 507960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:26,241-Speed 6304.99 samples/sec Loss 4.1744 LearningRate 0.0002 Epoch: 24 Global Step: 507970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:29,488-Speed 6310.36 samples/sec Loss 4.2447 LearningRate 0.0002 Epoch: 24 Global Step: 507980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:32,734-Speed 6310.87 samples/sec Loss 4.1909 LearningRate 0.0002 Epoch: 24 Global Step: 507990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:35,977-Speed 6315.96 samples/sec Loss 4.1577 LearningRate 0.0002 Epoch: 24 Global Step: 508000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:39,302-Speed 6160.53 samples/sec Loss 4.3001 LearningRate 0.0002 Epoch: 24 Global Step: 508010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:42,554-Speed 6299.23 samples/sec Loss 4.2494 LearningRate 0.0002 Epoch: 24 Global Step: 508020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:45,808-Speed 6295.24 samples/sec Loss 4.1605 LearningRate 0.0002 Epoch: 24 Global Step: 508030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:49,041-Speed 6335.99 samples/sec Loss 4.1290 LearningRate 0.0002 Epoch: 24 Global Step: 508040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:52,285-Speed 6315.94 samples/sec Loss 4.2488 LearningRate 0.0002 
Epoch: 24 Global Step: 508050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:55,533-Speed 6306.11 samples/sec Loss 4.2485 LearningRate 0.0002 Epoch: 24 Global Step: 508060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:56:58,782-Speed 6304.15 samples/sec Loss 4.1953 LearningRate 0.0002 Epoch: 24 Global Step: 508070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:02,030-Speed 6306.56 samples/sec Loss 4.2135 LearningRate 0.0002 Epoch: 24 Global Step: 508080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:05,275-Speed 6313.10 samples/sec Loss 4.1876 LearningRate 0.0002 Epoch: 24 Global Step: 508090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:08,523-Speed 6308.14 samples/sec Loss 4.1401 LearningRate 0.0002 Epoch: 24 Global Step: 508100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:11,768-Speed 6311.55 samples/sec Loss 4.1811 LearningRate 0.0002 Epoch: 24 Global Step: 508110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:15,013-Speed 6313.60 samples/sec Loss 4.2152 LearningRate 0.0002 Epoch: 24 Global Step: 508120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:18,256-Speed 6316.81 samples/sec Loss 4.1936 LearningRate 0.0002 Epoch: 24 Global Step: 508130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:21,485-Speed 6343.59 samples/sec Loss 4.2412 LearningRate 0.0002 Epoch: 24 Global Step: 508140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:24,761-Speed 6252.03 samples/sec Loss 4.2096 LearningRate 0.0002 Epoch: 24 Global Step: 508150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:28,007-Speed 6310.78 samples/sec Loss 4.2129 LearningRate 0.0002 Epoch: 24 Global Step: 508160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:31,254-Speed 6309.75 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 24 Global Step: 508170 
Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:57:34,488-Speed 6333.19 samples/sec Loss 4.2660 LearningRate 0.0002 Epoch: 24 Global Step: 508180 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:37,735-Speed 6309.25 samples/sec Loss 4.3020 LearningRate 0.0002 Epoch: 24 Global Step: 508190 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:40,982-Speed 6309.57 samples/sec Loss 4.2328 LearningRate 0.0002 Epoch: 24 Global Step: 508200 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:44,226-Speed 6313.78 samples/sec Loss 4.2316 LearningRate 0.0002 Epoch: 24 Global Step: 508210 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:47,472-Speed 6309.56 samples/sec Loss 4.2378 LearningRate 0.0002 Epoch: 24 Global Step: 508220 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:50,718-Speed 6311.84 samples/sec Loss 4.2311 LearningRate 0.0002 Epoch: 24 Global Step: 508230 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:53,961-Speed 6315.26 samples/sec Loss 4.2613 LearningRate 0.0002 Epoch: 24 Global Step: 508240 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:57:57,208-Speed 6310.49 samples/sec Loss 4.2260 LearningRate 0.0002 Epoch: 24 Global Step: 508250 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:58:00,456-Speed 6308.02 samples/sec Loss 4.1567 LearningRate 0.0002 Epoch: 24 Global Step: 508260 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:58:03,703-Speed 6307.33 samples/sec Loss 4.1890 LearningRate 0.0002 Epoch: 24 Global Step: 508270 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 13:58:06,957-Speed 6296.60 samples/sec Loss 4.1564 LearningRate 0.0002 Epoch: 24 Global Step: 508280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:10,201-Speed 6314.43 samples/sec Loss 4.2147 LearningRate 0.0002 Epoch: 24 Global Step: 508290 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 13:58:13,446-Speed 6312.28 samples/sec Loss 4.1544 LearningRate 0.0002 Epoch: 24 Global Step: 508300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:16,693-Speed 6308.86 samples/sec Loss 4.1757 LearningRate 0.0002 Epoch: 24 Global Step: 508310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:19,941-Speed 6305.78 samples/sec Loss 4.2008 LearningRate 0.0002 Epoch: 24 Global Step: 508320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:23,187-Speed 6310.31 samples/sec Loss 4.2338 LearningRate 0.0002 Epoch: 24 Global Step: 508330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:26,431-Speed 6316.29 samples/sec Loss 4.2069 LearningRate 0.0002 Epoch: 24 Global Step: 508340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:29,677-Speed 6309.82 samples/sec Loss 4.1718 LearningRate 0.0002 Epoch: 24 Global Step: 508350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:32,923-Speed 6311.32 samples/sec Loss 4.2070 LearningRate 0.0002 Epoch: 24 Global Step: 508360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:36,166-Speed 6315.11 samples/sec Loss 4.2298 LearningRate 0.0002 Epoch: 24 Global Step: 508370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:39,409-Speed 6317.78 samples/sec Loss 4.1986 LearningRate 0.0002 Epoch: 24 Global Step: 508380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:42,656-Speed 6307.70 samples/sec Loss 4.1850 LearningRate 0.0002 Epoch: 24 Global Step: 508390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:45,900-Speed 6314.92 samples/sec Loss 4.2225 LearningRate 0.0002 Epoch: 24 Global Step: 508400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:49,147-Speed 6309.39 samples/sec Loss 4.2496 LearningRate 0.0002 Epoch: 24 Global Step: 508410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
13:58:52,394-Speed 6308.36 samples/sec Loss 4.1824 LearningRate 0.0002 Epoch: 24 Global Step: 508420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:55,637-Speed 6315.56 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 508430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:58:58,885-Speed 6307.14 samples/sec Loss 4.2076 LearningRate 0.0002 Epoch: 24 Global Step: 508440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:02,133-Speed 6306.91 samples/sec Loss 4.1885 LearningRate 0.0002 Epoch: 24 Global Step: 508450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:05,375-Speed 6318.85 samples/sec Loss 4.1539 LearningRate 0.0002 Epoch: 24 Global Step: 508460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:08,621-Speed 6311.00 samples/sec Loss 4.2416 LearningRate 0.0002 Epoch: 24 Global Step: 508470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:11,857-Speed 6331.34 samples/sec Loss 4.2242 LearningRate 0.0002 Epoch: 24 Global Step: 508480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:15,103-Speed 6309.45 samples/sec Loss 4.2183 LearningRate 0.0002 Epoch: 24 Global Step: 508490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:18,349-Speed 6311.61 samples/sec Loss 4.2673 LearningRate 0.0002 Epoch: 24 Global Step: 508500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:21,594-Speed 6313.08 samples/sec Loss 4.2161 LearningRate 0.0002 Epoch: 24 Global Step: 508510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:24,841-Speed 6307.76 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 508520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:28,084-Speed 6317.47 samples/sec Loss 4.2047 LearningRate 0.0002 Epoch: 24 Global Step: 508530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:31,343-Speed 6285.45 
samples/sec Loss 4.2307 LearningRate 0.0002 Epoch: 24 Global Step: 508540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:34,588-Speed 6313.10 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 508550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:37,835-Speed 6308.16 samples/sec Loss 4.2399 LearningRate 0.0002 Epoch: 24 Global Step: 508560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:41,083-Speed 6308.15 samples/sec Loss 4.2364 LearningRate 0.0002 Epoch: 24 Global Step: 508570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:44,310-Speed 6347.55 samples/sec Loss 4.1983 LearningRate 0.0002 Epoch: 24 Global Step: 508580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:47,560-Speed 6302.10 samples/sec Loss 4.2318 LearningRate 0.0002 Epoch: 24 Global Step: 508590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:50,806-Speed 6310.25 samples/sec Loss 4.2184 LearningRate 0.0002 Epoch: 24 Global Step: 508600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:54,052-Speed 6310.44 samples/sec Loss 4.1832 LearningRate 0.0002 Epoch: 24 Global Step: 508610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 13:59:57,293-Speed 6321.79 samples/sec Loss 4.2062 LearningRate 0.0002 Epoch: 24 Global Step: 508620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:00,536-Speed 6316.97 samples/sec Loss 4.1874 LearningRate 0.0002 Epoch: 24 Global Step: 508630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:03,778-Speed 6317.79 samples/sec Loss 4.1409 LearningRate 0.0002 Epoch: 24 Global Step: 508640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:07,022-Speed 6313.76 samples/sec Loss 4.1466 LearningRate 0.0002 Epoch: 24 Global Step: 508650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:10,269-Speed 6309.68 samples/sec Loss 4.2086 
LearningRate 0.0002 Epoch: 24 Global Step: 508660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:13,516-Speed 6308.13 samples/sec Loss 4.2210 LearningRate 0.0002 Epoch: 24 Global Step: 508670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:16,760-Speed 6315.08 samples/sec Loss 4.1575 LearningRate 0.0002 Epoch: 24 Global Step: 508680 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:00:19,992-Speed 6338.13 samples/sec Loss 4.1907 LearningRate 0.0002 Epoch: 24 Global Step: 508690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:23,282-Speed 6227.83 samples/sec Loss 4.2553 LearningRate 0.0002 Epoch: 24 Global Step: 508700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:26,527-Speed 6311.62 samples/sec Loss 4.2428 LearningRate 0.0002 Epoch: 24 Global Step: 508710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:29,775-Speed 6307.56 samples/sec Loss 4.2447 LearningRate 0.0002 Epoch: 24 Global Step: 508720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:33,027-Speed 6299.14 samples/sec Loss 4.1746 LearningRate 0.0002 Epoch: 24 Global Step: 508730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:36,272-Speed 6311.25 samples/sec Loss 4.1964 LearningRate 0.0002 Epoch: 24 Global Step: 508740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:39,515-Speed 6316.66 samples/sec Loss 4.2341 LearningRate 0.0002 Epoch: 24 Global Step: 508750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:42,758-Speed 6316.95 samples/sec Loss 4.1440 LearningRate 0.0002 Epoch: 24 Global Step: 508760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:46,003-Speed 6311.69 samples/sec Loss 4.2033 LearningRate 0.0002 Epoch: 24 Global Step: 508770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:49,245-Speed 6318.35 samples/sec Loss 4.1842 LearningRate 0.0002 Epoch: 24 
Global Step: 508780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:52,480-Speed 6332.34 samples/sec Loss 4.1297 LearningRate 0.0002 Epoch: 24 Global Step: 508790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:55,723-Speed 6317.06 samples/sec Loss 4.3004 LearningRate 0.0002 Epoch: 24 Global Step: 508800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:00:58,966-Speed 6316.51 samples/sec Loss 4.2361 LearningRate 0.0002 Epoch: 24 Global Step: 508810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:02,216-Speed 6303.37 samples/sec Loss 4.1391 LearningRate 0.0002 Epoch: 24 Global Step: 508820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:05,465-Speed 6304.52 samples/sec Loss 4.1286 LearningRate 0.0002 Epoch: 24 Global Step: 508830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:08,707-Speed 6318.11 samples/sec Loss 4.1953 LearningRate 0.0002 Epoch: 24 Global Step: 508840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:11,958-Speed 6301.66 samples/sec Loss 4.3095 LearningRate 0.0002 Epoch: 24 Global Step: 508850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:15,201-Speed 6316.56 samples/sec Loss 4.2226 LearningRate 0.0002 Epoch: 24 Global Step: 508860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:18,450-Speed 6304.96 samples/sec Loss 4.1414 LearningRate 0.0002 Epoch: 24 Global Step: 508870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:21,694-Speed 6314.27 samples/sec Loss 4.2096 LearningRate 0.0002 Epoch: 24 Global Step: 508880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:24,926-Speed 6337.74 samples/sec Loss 4.2320 LearningRate 0.0002 Epoch: 24 Global Step: 508890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:28,174-Speed 6308.83 samples/sec Loss 4.1831 LearningRate 0.0002 Epoch: 24 Global Step: 508900 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:31,421-Speed 6308.34 samples/sec Loss 4.2055 LearningRate 0.0002 Epoch: 24 Global Step: 508910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:34,664-Speed 6316.62 samples/sec Loss 4.2117 LearningRate 0.0002 Epoch: 24 Global Step: 508920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:37,912-Speed 6306.22 samples/sec Loss 4.2373 LearningRate 0.0002 Epoch: 24 Global Step: 508930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:41,161-Speed 6305.55 samples/sec Loss 4.1324 LearningRate 0.0002 Epoch: 24 Global Step: 508940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:44,409-Speed 6306.63 samples/sec Loss 4.1985 LearningRate 0.0002 Epoch: 24 Global Step: 508950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:47,653-Speed 6314.38 samples/sec Loss 4.1602 LearningRate 0.0002 Epoch: 24 Global Step: 508960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:50,898-Speed 6312.79 samples/sec Loss 4.1557 LearningRate 0.0002 Epoch: 24 Global Step: 508970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:54,142-Speed 6313.53 samples/sec Loss 4.1620 LearningRate 0.0002 Epoch: 24 Global Step: 508980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:01:57,375-Speed 6336.88 samples/sec Loss 4.2299 LearningRate 0.0002 Epoch: 24 Global Step: 508990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:00,617-Speed 6318.30 samples/sec Loss 4.1029 LearningRate 0.0002 Epoch: 24 Global Step: 509000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:03,860-Speed 6316.06 samples/sec Loss 4.1836 LearningRate 0.0002 Epoch: 24 Global Step: 509010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:07,105-Speed 6312.48 samples/sec Loss 4.2143 LearningRate 0.0002 Epoch: 24 Global Step: 509020 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:02:10,348-Speed 6317.07 samples/sec Loss 4.2805 LearningRate 0.0002 Epoch: 24 Global Step: 509030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:13,594-Speed 6310.52 samples/sec Loss 4.2159 LearningRate 0.0002 Epoch: 24 Global Step: 509040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:16,841-Speed 6308.38 samples/sec Loss 4.2240 LearningRate 0.0002 Epoch: 24 Global Step: 509050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:20,073-Speed 6339.40 samples/sec Loss 4.2550 LearningRate 0.0002 Epoch: 24 Global Step: 509060 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:23,319-Speed 6310.64 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 509070 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:26,596-Speed 6251.24 samples/sec Loss 4.2678 LearningRate 0.0002 Epoch: 24 Global Step: 509080 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:29,840-Speed 6314.80 samples/sec Loss 4.2762 LearningRate 0.0002 Epoch: 24 Global Step: 509090 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:33,085-Speed 6312.12 samples/sec Loss 4.2348 LearningRate 0.0002 Epoch: 24 Global Step: 509100 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:36,326-Speed 6322.05 samples/sec Loss 4.2090 LearningRate 0.0002 Epoch: 24 Global Step: 509110 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:39,585-Speed 6284.09 samples/sec Loss 4.2033 LearningRate 0.0002 Epoch: 24 Global Step: 509120 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:42,830-Speed 6314.39 samples/sec Loss 4.2114 LearningRate 0.0002 Epoch: 24 Global Step: 509130 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:46,073-Speed 6316.02 samples/sec Loss 4.1554 LearningRate 0.0002 Epoch: 24 Global Step: 509140 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:49,317-Speed 
6313.97 samples/sec Loss 4.1765 LearningRate 0.0002 Epoch: 24 Global Step: 509150 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:02:52,559-Speed 6318.03 samples/sec Loss 4.1913 LearningRate 0.0002 Epoch: 24 Global Step: 509160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:55,803-Speed 6315.17 samples/sec Loss 4.2353 LearningRate 0.0002 Epoch: 24 Global Step: 509170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:02:59,049-Speed 6309.80 samples/sec Loss 4.1767 LearningRate 0.0002 Epoch: 24 Global Step: 509180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:02,300-Speed 6301.96 samples/sec Loss 4.1783 LearningRate 0.0002 Epoch: 24 Global Step: 509190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:05,544-Speed 6314.26 samples/sec Loss 4.1832 LearningRate 0.0002 Epoch: 24 Global Step: 509200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:08,789-Speed 6312.88 samples/sec Loss 4.1903 LearningRate 0.0002 Epoch: 24 Global Step: 509210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:12,039-Speed 6304.18 samples/sec Loss 4.2224 LearningRate 0.0002 Epoch: 24 Global Step: 509220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:15,322-Speed 6239.24 samples/sec Loss 4.1734 LearningRate 0.0002 Epoch: 24 Global Step: 509230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:18,588-Speed 6270.65 samples/sec Loss 4.1962 LearningRate 0.0002 Epoch: 24 Global Step: 509240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:21,830-Speed 6318.83 samples/sec Loss 4.2028 LearningRate 0.0002 Epoch: 24 Global Step: 509250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:25,061-Speed 6340.13 samples/sec Loss 4.2048 LearningRate 0.0002 Epoch: 24 Global Step: 509260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:28,306-Speed 6313.48 samples/sec Loss 4.2343 
LearningRate 0.0002 Epoch: 24 Global Step: 509270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:31,546-Speed 6322.60 samples/sec Loss 4.1776 LearningRate 0.0002 Epoch: 24 Global Step: 509280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:34,793-Speed 6307.53 samples/sec Loss 4.2073 LearningRate 0.0002 Epoch: 24 Global Step: 509290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:38,036-Speed 6316.87 samples/sec Loss 4.1328 LearningRate 0.0002 Epoch: 24 Global Step: 509300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:41,282-Speed 6311.71 samples/sec Loss 4.2031 LearningRate 0.0002 Epoch: 24 Global Step: 509310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:44,524-Speed 6317.74 samples/sec Loss 4.2145 LearningRate 0.0002 Epoch: 24 Global Step: 509320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:47,769-Speed 6313.65 samples/sec Loss 4.2091 LearningRate 0.0002 Epoch: 24 Global Step: 509330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:51,015-Speed 6309.99 samples/sec Loss 4.1246 LearningRate 0.0002 Epoch: 24 Global Step: 509340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:54,262-Speed 6308.84 samples/sec Loss 4.2450 LearningRate 0.0002 Epoch: 24 Global Step: 509350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:03:57,494-Speed 6339.25 samples/sec Loss 4.2412 LearningRate 0.0002 Epoch: 24 Global Step: 509360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:00,736-Speed 6318.13 samples/sec Loss 4.1850 LearningRate 0.0002 Epoch: 24 Global Step: 509370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:03,983-Speed 6308.22 samples/sec Loss 4.1481 LearningRate 0.0002 Epoch: 24 Global Step: 509380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:07,236-Speed 6298.26 samples/sec Loss 4.2682 LearningRate 0.0002 Epoch: 24 
Global Step: 509390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:10,479-Speed 6316.01 samples/sec Loss 4.1728 LearningRate 0.0002 Epoch: 24 Global Step: 509400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:13,747-Speed 6266.69 samples/sec Loss 4.2051 LearningRate 0.0002 Epoch: 24 Global Step: 509410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:17,003-Speed 6291.58 samples/sec Loss 4.2472 LearningRate 0.0002 Epoch: 24 Global Step: 509420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:20,247-Speed 6315.39 samples/sec Loss 4.1978 LearningRate 0.0002 Epoch: 24 Global Step: 509430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:23,477-Speed 6341.69 samples/sec Loss 4.2270 LearningRate 0.0002 Epoch: 24 Global Step: 509440 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:26,738-Speed 6282.32 samples/sec Loss 4.2275 LearningRate 0.0002 Epoch: 24 Global Step: 509450 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:29,982-Speed 6313.89 samples/sec Loss 4.2225 LearningRate 0.0002 Epoch: 24 Global Step: 509460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:33,225-Speed 6316.56 samples/sec Loss 4.2261 LearningRate 0.0002 Epoch: 24 Global Step: 509470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:36,479-Speed 6295.22 samples/sec Loss 4.2222 LearningRate 0.0002 Epoch: 24 Global Step: 509480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:39,724-Speed 6313.60 samples/sec Loss 4.1315 LearningRate 0.0002 Epoch: 24 Global Step: 509490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:42,972-Speed 6305.13 samples/sec Loss 4.1586 LearningRate 0.0002 Epoch: 24 Global Step: 509500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:46,215-Speed 6316.71 samples/sec Loss 4.2041 LearningRate 0.0002 Epoch: 24 Global Step: 509510 Fp16 Grad Scale: 
8192 Required: 29 hours Training: 2022-04-02 14:04:49,463-Speed 6309.13 samples/sec Loss 4.1866 LearningRate 0.0002 Epoch: 24 Global Step: 509520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:52,710-Speed 6307.54 samples/sec Loss 4.1618 LearningRate 0.0002 Epoch: 24 Global Step: 509530 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:04:55,953-Speed 6316.91 samples/sec Loss 4.2373 LearningRate 0.0002 Epoch: 24 Global Step: 509540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:04:59,197-Speed 6314.28 samples/sec Loss 4.2351 LearningRate 0.0002 Epoch: 24 Global Step: 509550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:02,443-Speed 6311.09 samples/sec Loss 4.1442 LearningRate 0.0002 Epoch: 24 Global Step: 509560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:05,684-Speed 6319.67 samples/sec Loss 4.1504 LearningRate 0.0002 Epoch: 24 Global Step: 509570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:08,928-Speed 6316.33 samples/sec Loss 4.2122 LearningRate 0.0002 Epoch: 24 Global Step: 509580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:12,264-Speed 6139.60 samples/sec Loss 4.1942 LearningRate 0.0002 Epoch: 24 Global Step: 509590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:15,509-Speed 6312.61 samples/sec Loss 4.1984 LearningRate 0.0002 Epoch: 24 Global Step: 509600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:18,755-Speed 6310.71 samples/sec Loss 4.2006 LearningRate 0.0002 Epoch: 24 Global Step: 509610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:21,995-Speed 6322.77 samples/sec Loss 4.1278 LearningRate 0.0002 Epoch: 24 Global Step: 509620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:25,243-Speed 6306.94 samples/sec Loss 4.2334 LearningRate 0.0002 Epoch: 24 Global Step: 509630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 
2022-04-02 14:05:28,473-Speed 6342.32 samples/sec Loss 4.1708 LearningRate 0.0002 Epoch: 24 Global Step: 509640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:31,722-Speed 6304.41 samples/sec Loss 4.1351 LearningRate 0.0002 Epoch: 24 Global Step: 509650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:34,965-Speed 6316.35 samples/sec Loss 4.1861 LearningRate 0.0002 Epoch: 24 Global Step: 509660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:38,210-Speed 6312.53 samples/sec Loss 4.1247 LearningRate 0.0002 Epoch: 24 Global Step: 509670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:41,455-Speed 6313.80 samples/sec Loss 4.1750 LearningRate 0.0002 Epoch: 24 Global Step: 509680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:44,697-Speed 6318.41 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 24 Global Step: 509690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:47,940-Speed 6315.07 samples/sec Loss 4.1889 LearningRate 0.0002 Epoch: 24 Global Step: 509700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:51,183-Speed 6316.20 samples/sec Loss 4.1961 LearningRate 0.0002 Epoch: 24 Global Step: 509710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:54,431-Speed 6307.36 samples/sec Loss 4.1907 LearningRate 0.0002 Epoch: 24 Global Step: 509720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:05:57,678-Speed 6309.67 samples/sec Loss 4.2000 LearningRate 0.0002 Epoch: 24 Global Step: 509730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:00,911-Speed 6335.29 samples/sec Loss 4.1484 LearningRate 0.0002 Epoch: 24 Global Step: 509740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:04,160-Speed 6306.94 samples/sec Loss 4.1457 LearningRate 0.0002 Epoch: 24 Global Step: 509750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:07,405-Speed 
6313.15 samples/sec Loss 4.1757 LearningRate 0.0002 Epoch: 24 Global Step: 509760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:10,651-Speed 6309.41 samples/sec Loss 4.1804 LearningRate 0.0002 Epoch: 24 Global Step: 509770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:13,894-Speed 6316.63 samples/sec Loss 4.1856 LearningRate 0.0002 Epoch: 24 Global Step: 509780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:17,140-Speed 6311.62 samples/sec Loss 4.2687 LearningRate 0.0002 Epoch: 24 Global Step: 509790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:20,384-Speed 6313.48 samples/sec Loss 4.1879 LearningRate 0.0002 Epoch: 24 Global Step: 509800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:23,628-Speed 6314.78 samples/sec Loss 4.1924 LearningRate 0.0002 Epoch: 24 Global Step: 509810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:26,878-Speed 6302.36 samples/sec Loss 4.2621 LearningRate 0.0002 Epoch: 24 Global Step: 509820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:30,127-Speed 6304.55 samples/sec Loss 4.2510 LearningRate 0.0002 Epoch: 24 Global Step: 509830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:33,358-Speed 6341.74 samples/sec Loss 4.1511 LearningRate 0.0002 Epoch: 24 Global Step: 509840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:36,602-Speed 6313.48 samples/sec Loss 4.2497 LearningRate 0.0002 Epoch: 24 Global Step: 509850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:39,847-Speed 6313.85 samples/sec Loss 4.1410 LearningRate 0.0002 Epoch: 24 Global Step: 509860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:43,093-Speed 6310.78 samples/sec Loss 4.2575 LearningRate 0.0002 Epoch: 24 Global Step: 509870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:46,336-Speed 6316.22 samples/sec Loss 4.2162 
LearningRate 0.0002 Epoch: 24 Global Step: 509880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:49,583-Speed 6306.93 samples/sec Loss 4.2235 LearningRate 0.0002 Epoch: 24 Global Step: 509890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:52,827-Speed 6315.44 samples/sec Loss 4.1360 LearningRate 0.0002 Epoch: 24 Global Step: 509900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:56,075-Speed 6306.48 samples/sec Loss 4.1637 LearningRate 0.0002 Epoch: 24 Global Step: 509910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:06:59,320-Speed 6314.03 samples/sec Loss 4.1773 LearningRate 0.0002 Epoch: 24 Global Step: 509920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:02,566-Speed 6309.84 samples/sec Loss 4.2543 LearningRate 0.0002 Epoch: 24 Global Step: 509930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:05,802-Speed 6330.47 samples/sec Loss 4.1746 LearningRate 0.0002 Epoch: 24 Global Step: 509940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:09,045-Speed 6316.48 samples/sec Loss 4.2308 LearningRate 0.0002 Epoch: 24 Global Step: 509950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:12,291-Speed 6312.23 samples/sec Loss 4.1926 LearningRate 0.0002 Epoch: 24 Global Step: 509960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:15,537-Speed 6309.77 samples/sec Loss 4.2336 LearningRate 0.0002 Epoch: 24 Global Step: 509970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:18,781-Speed 6314.45 samples/sec Loss 4.2003 LearningRate 0.0002 Epoch: 24 Global Step: 509980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:22,027-Speed 6312.04 samples/sec Loss 4.1525 LearningRate 0.0002 Epoch: 24 Global Step: 509990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:25,272-Speed 6312.66 samples/sec Loss 4.1923 LearningRate 0.0002 Epoch: 24 
Global Step: 510000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:28,523-Speed 6300.87 samples/sec Loss 4.1352 LearningRate 0.0002 Epoch: 24 Global Step: 510010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:31,767-Speed 6313.91 samples/sec Loss 4.1423 LearningRate 0.0002 Epoch: 24 Global Step: 510020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:35,013-Speed 6310.76 samples/sec Loss 4.1836 LearningRate 0.0002 Epoch: 24 Global Step: 510030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:38,261-Speed 6307.13 samples/sec Loss 4.1866 LearningRate 0.0002 Epoch: 24 Global Step: 510040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:07:41,489-Speed 6346.73 samples/sec Loss 4.1707 LearningRate 0.0002 Epoch: 24 Global Step: 510050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:44,733-Speed 6314.30 samples/sec Loss 4.1913 LearningRate 0.0002 Epoch: 24 Global Step: 510060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:47,982-Speed 6303.10 samples/sec Loss 4.2338 LearningRate 0.0002 Epoch: 24 Global Step: 510070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:51,317-Speed 6143.96 samples/sec Loss 4.1284 LearningRate 0.0002 Epoch: 24 Global Step: 510080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:54,564-Speed 6308.74 samples/sec Loss 4.1871 LearningRate 0.0002 Epoch: 24 Global Step: 510090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:07:57,810-Speed 6309.06 samples/sec Loss 4.1132 LearningRate 0.0002 Epoch: 24 Global Step: 510100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:01,060-Speed 6304.57 samples/sec Loss 4.2828 LearningRate 0.0002 Epoch: 24 Global Step: 510110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:04,306-Speed 6309.61 samples/sec Loss 4.1839 LearningRate 0.0002 Epoch: 24 Global Step: 510120 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:07,552-Speed 6310.85 samples/sec Loss 4.2091 LearningRate 0.0002 Epoch: 24 Global Step: 510130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:10,802-Speed 6303.20 samples/sec Loss 4.2556 LearningRate 0.0002 Epoch: 24 Global Step: 510140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:14,034-Speed 6338.79 samples/sec Loss 4.1404 LearningRate 0.0002 Epoch: 24 Global Step: 510150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:17,284-Speed 6301.85 samples/sec Loss 4.1847 LearningRate 0.0002 Epoch: 24 Global Step: 510160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:20,531-Speed 6310.04 samples/sec Loss 4.1707 LearningRate 0.0002 Epoch: 24 Global Step: 510170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:23,782-Speed 6301.76 samples/sec Loss 4.2009 LearningRate 0.0002 Epoch: 24 Global Step: 510180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:27,029-Speed 6307.45 samples/sec Loss 4.1873 LearningRate 0.0002 Epoch: 24 Global Step: 510190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:30,273-Speed 6315.11 samples/sec Loss 4.1434 LearningRate 0.0002 Epoch: 24 Global Step: 510200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:33,517-Speed 6314.42 samples/sec Loss 4.1374 LearningRate 0.0002 Epoch: 24 Global Step: 510210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:36,759-Speed 6318.25 samples/sec Loss 4.2066 LearningRate 0.0002 Epoch: 24 Global Step: 510220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:40,004-Speed 6312.97 samples/sec Loss 4.1615 LearningRate 0.0002 Epoch: 24 Global Step: 510230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:43,245-Speed 6319.55 samples/sec Loss 4.1510 LearningRate 0.0002 Epoch: 24 Global Step: 510240 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:08:46,475-Speed 6342.73 samples/sec Loss 4.1594 LearningRate 0.0002 Epoch: 24 Global Step: 510250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:49,718-Speed 6316.82 samples/sec Loss 4.2443 LearningRate 0.0002 Epoch: 24 Global Step: 510260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:52,967-Speed 6305.11 samples/sec Loss 4.1583 LearningRate 0.0002 Epoch: 24 Global Step: 510270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:56,216-Speed 6304.02 samples/sec Loss 4.1524 LearningRate 0.0002 Epoch: 24 Global Step: 510280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:08:59,460-Speed 6315.56 samples/sec Loss 4.1962 LearningRate 0.0002 Epoch: 24 Global Step: 510290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:02,705-Speed 6312.48 samples/sec Loss 4.1967 LearningRate 0.0002 Epoch: 24 Global Step: 510300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:05,951-Speed 6309.70 samples/sec Loss 4.2332 LearningRate 0.0002 Epoch: 24 Global Step: 510310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:09,207-Speed 6292.46 samples/sec Loss 4.2542 LearningRate 0.0002 Epoch: 24 Global Step: 510320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:12,451-Speed 6314.49 samples/sec Loss 4.2253 LearningRate 0.0002 Epoch: 24 Global Step: 510330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:15,696-Speed 6312.40 samples/sec Loss 4.2421 LearningRate 0.0002 Epoch: 24 Global Step: 510340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:18,923-Speed 6348.17 samples/sec Loss 4.2251 LearningRate 0.0002 Epoch: 24 Global Step: 510350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:22,163-Speed 6322.97 samples/sec Loss 4.1947 LearningRate 0.0002 Epoch: 24 Global Step: 510360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
14:09:25,413-Speed 6302.22 samples/sec Loss 4.2734 LearningRate 0.0002 Epoch: 24 Global Step: 510370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:28,660-Speed 6310.36 samples/sec Loss 4.1978 LearningRate 0.0002 Epoch: 24 Global Step: 510380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:31,901-Speed 6320.73 samples/sec Loss 4.2470 LearningRate 0.0002 Epoch: 24 Global Step: 510390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:35,147-Speed 6308.84 samples/sec Loss 4.2308 LearningRate 0.0002 Epoch: 24 Global Step: 510400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:38,392-Speed 6314.63 samples/sec Loss 4.1890 LearningRate 0.0002 Epoch: 24 Global Step: 510410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:41,636-Speed 6313.90 samples/sec Loss 4.1555 LearningRate 0.0002 Epoch: 24 Global Step: 510420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:09:44,869-Speed 6336.56 samples/sec Loss 4.1835 LearningRate 0.0002 Epoch: 24 Global Step: 510430 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:09:48,113-Speed 6314.44 samples/sec Loss 4.2233 LearningRate 0.0002 Epoch: 24 Global Step: 510440 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:09:51,360-Speed 6309.02 samples/sec Loss 4.2103 LearningRate 0.0002 Epoch: 24 Global Step: 510450 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:09:54,601-Speed 6319.06 samples/sec Loss 4.1776 LearningRate 0.0002 Epoch: 24 Global Step: 510460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:09:57,848-Speed 6309.27 samples/sec Loss 4.1811 LearningRate 0.0002 Epoch: 24 Global Step: 510470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:01,090-Speed 6318.91 samples/sec Loss 4.1673 LearningRate 0.0002 Epoch: 24 Global Step: 510480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:04,335-Speed 6312.82 
samples/sec Loss 4.1743 LearningRate 0.0002 Epoch: 24 Global Step: 510490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:07,577-Speed 6318.25 samples/sec Loss 4.2223 LearningRate 0.0002 Epoch: 24 Global Step: 510500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:10,819-Speed 6317.08 samples/sec Loss 4.1737 LearningRate 0.0002 Epoch: 24 Global Step: 510510 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:14,066-Speed 6310.54 samples/sec Loss 4.1837 LearningRate 0.0002 Epoch: 24 Global Step: 510520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:17,308-Speed 6316.94 samples/sec Loss 4.2072 LearningRate 0.0002 Epoch: 24 Global Step: 510530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:10:20,555-Speed 6309.47 samples/sec Loss 4.2032 LearningRate 0.0002 Epoch: 24 Global Step: 510540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:10:23,785-Speed 6341.36 samples/sec Loss 4.2186 LearningRate 0.0002 Epoch: 24 Global Step: 510550 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:27,038-Speed 6298.43 samples/sec Loss 4.2238 LearningRate 0.0002 Epoch: 24 Global Step: 510560 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:30,285-Speed 6308.49 samples/sec Loss 4.1841 LearningRate 0.0002 Epoch: 24 Global Step: 510570 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:33,536-Speed 6301.71 samples/sec Loss 4.1536 LearningRate 0.0002 Epoch: 24 Global Step: 510580 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:36,798-Speed 6280.15 samples/sec Loss 4.2435 LearningRate 0.0002 Epoch: 24 Global Step: 510590 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:40,051-Speed 6297.24 samples/sec Loss 4.1871 LearningRate 0.0002 Epoch: 24 Global Step: 510600 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:43,301-Speed 6302.45 samples/sec Loss 4.2325 LearningRate 
0.0002 Epoch: 24 Global Step: 510610 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:46,544-Speed 6315.55 samples/sec Loss 4.1676 LearningRate 0.0002 Epoch: 24 Global Step: 510620 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:49,786-Speed 6319.71 samples/sec Loss 4.1150 LearningRate 0.0002 Epoch: 24 Global Step: 510630 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:53,030-Speed 6313.42 samples/sec Loss 4.1595 LearningRate 0.0002 Epoch: 24 Global Step: 510640 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:10:56,275-Speed 6314.28 samples/sec Loss 4.1879 LearningRate 0.0002 Epoch: 24 Global Step: 510650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:10:59,526-Speed 6300.09 samples/sec Loss 4.2062 LearningRate 0.0002 Epoch: 24 Global Step: 510660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:02,775-Speed 6305.04 samples/sec Loss 4.2310 LearningRate 0.0002 Epoch: 24 Global Step: 510670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:06,017-Speed 6318.94 samples/sec Loss 4.2051 LearningRate 0.0002 Epoch: 24 Global Step: 510680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:09,265-Speed 6305.93 samples/sec Loss 4.2141 LearningRate 0.0002 Epoch: 24 Global Step: 510690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:12,511-Speed 6311.96 samples/sec Loss 4.2030 LearningRate 0.0002 Epoch: 24 Global Step: 510700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:15,755-Speed 6314.25 samples/sec Loss 4.2155 LearningRate 0.0002 Epoch: 24 Global Step: 510710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:19,002-Speed 6308.62 samples/sec Loss 4.1542 LearningRate 0.0002 Epoch: 24 Global Step: 510720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:22,244-Speed 6316.91 samples/sec Loss 4.1973 LearningRate 0.0002 Epoch: 24 Global Step: 
510730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:25,495-Speed 6302.37 samples/sec Loss 4.2229 LearningRate 0.0002 Epoch: 24 Global Step: 510740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:28,731-Speed 6330.26 samples/sec Loss 4.2340 LearningRate 0.0002 Epoch: 24 Global Step: 510750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:31,976-Speed 6311.43 samples/sec Loss 4.1854 LearningRate 0.0002 Epoch: 24 Global Step: 510760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:35,221-Speed 6313.08 samples/sec Loss 4.1580 LearningRate 0.0002 Epoch: 24 Global Step: 510770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:38,472-Speed 6301.19 samples/sec Loss 4.2555 LearningRate 0.0002 Epoch: 24 Global Step: 510780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:41,719-Speed 6310.54 samples/sec Loss 4.1881 LearningRate 0.0002 Epoch: 24 Global Step: 510790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:44,963-Speed 6314.29 samples/sec Loss 4.1232 LearningRate 0.0002 Epoch: 24 Global Step: 510800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:48,211-Speed 6305.67 samples/sec Loss 4.1129 LearningRate 0.0002 Epoch: 24 Global Step: 510810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:51,458-Speed 6310.44 samples/sec Loss 4.1628 LearningRate 0.0002 Epoch: 24 Global Step: 510820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:54,701-Speed 6315.62 samples/sec Loss 4.1880 LearningRate 0.0002 Epoch: 24 Global Step: 510830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:11:57,947-Speed 6311.74 samples/sec Loss 4.2017 LearningRate 0.0002 Epoch: 24 Global Step: 510840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:01,184-Speed 6326.85 samples/sec Loss 4.2060 LearningRate 0.0002 Epoch: 24 Global Step: 510850 Fp16 Grad Scale: 16384 
Required: 29 hours Training: 2022-04-02 14:12:04,428-Speed 6315.31 samples/sec Loss 4.2036 LearningRate 0.0002 Epoch: 24 Global Step: 510860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:07,676-Speed 6306.40 samples/sec Loss 4.1791 LearningRate 0.0002 Epoch: 24 Global Step: 510870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:10,920-Speed 6314.29 samples/sec Loss 4.1844 LearningRate 0.0002 Epoch: 24 Global Step: 510880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:14,167-Speed 6308.82 samples/sec Loss 4.1981 LearningRate 0.0002 Epoch: 24 Global Step: 510890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:17,416-Speed 6305.50 samples/sec Loss 4.1873 LearningRate 0.0002 Epoch: 24 Global Step: 510900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:20,664-Speed 6306.51 samples/sec Loss 4.1885 LearningRate 0.0002 Epoch: 24 Global Step: 510910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:23,907-Speed 6315.78 samples/sec Loss 4.1258 LearningRate 0.0002 Epoch: 24 Global Step: 510920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:27,154-Speed 6310.34 samples/sec Loss 4.1655 LearningRate 0.0002 Epoch: 24 Global Step: 510930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:30,413-Speed 6284.35 samples/sec Loss 4.1959 LearningRate 0.0002 Epoch: 24 Global Step: 510940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:33,644-Speed 6340.69 samples/sec Loss 4.1494 LearningRate 0.0002 Epoch: 24 Global Step: 510950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:36,890-Speed 6310.40 samples/sec Loss 4.1676 LearningRate 0.0002 Epoch: 24 Global Step: 510960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:40,135-Speed 6313.25 samples/sec Loss 4.2404 LearningRate 0.0002 Epoch: 24 Global Step: 510970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 
2022-04-02 14:12:43,381-Speed 6309.60 samples/sec Loss 4.1763 LearningRate 0.0002 Epoch: 24 Global Step: 510980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:46,632-Speed 6301.86 samples/sec Loss 4.2111 LearningRate 0.0002 Epoch: 24 Global Step: 510990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:49,877-Speed 6313.87 samples/sec Loss 4.2728 LearningRate 0.0002 Epoch: 24 Global Step: 511000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:53,120-Speed 6315.04 samples/sec Loss 4.1553 LearningRate 0.0002 Epoch: 24 Global Step: 511010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:56,365-Speed 6314.17 samples/sec Loss 4.1881 LearningRate 0.0002 Epoch: 24 Global Step: 511020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:12:59,615-Speed 6302.91 samples/sec Loss 4.2517 LearningRate 0.0002 Epoch: 24 Global Step: 511030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:02,867-Speed 6297.41 samples/sec Loss 4.2341 LearningRate 0.0002 Epoch: 24 Global Step: 511040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:06,107-Speed 6324.13 samples/sec Loss 4.1925 LearningRate 0.0002 Epoch: 24 Global Step: 511050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:09,351-Speed 6314.28 samples/sec Loss 4.1970 LearningRate 0.0002 Epoch: 24 Global Step: 511060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:12,611-Speed 6284.51 samples/sec Loss 4.1056 LearningRate 0.0002 Epoch: 24 Global Step: 511070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:15,856-Speed 6312.71 samples/sec Loss 4.2421 LearningRate 0.0002 Epoch: 24 Global Step: 511080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:19,100-Speed 6313.43 samples/sec Loss 4.1155 LearningRate 0.0002 Epoch: 24 Global Step: 511090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:22,344-Speed 
6314.81 samples/sec Loss 4.1588 LearningRate 0.0002 Epoch: 24 Global Step: 511100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:13:25,577-Speed 6336.86 samples/sec Loss 4.2783 LearningRate 0.0002 Epoch: 24 Global Step: 511110 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:28,821-Speed 6313.86 samples/sec Loss 4.2174 LearningRate 0.0002 Epoch: 24 Global Step: 511120 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:32,062-Speed 6319.32 samples/sec Loss 4.1686 LearningRate 0.0002 Epoch: 24 Global Step: 511130 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:35,305-Speed 6317.98 samples/sec Loss 4.1764 LearningRate 0.0002 Epoch: 24 Global Step: 511140 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:38,548-Speed 6316.05 samples/sec Loss 4.2316 LearningRate 0.0002 Epoch: 24 Global Step: 511150 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:41,824-Speed 6253.21 samples/sec Loss 4.2330 LearningRate 0.0002 Epoch: 24 Global Step: 511160 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:45,073-Speed 6305.17 samples/sec Loss 4.1955 LearningRate 0.0002 Epoch: 24 Global Step: 511170 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:48,321-Speed 6306.48 samples/sec Loss 4.1798 LearningRate 0.0002 Epoch: 24 Global Step: 511180 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:51,572-Speed 6301.64 samples/sec Loss 4.1601 LearningRate 0.0002 Epoch: 24 Global Step: 511190 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:54,815-Speed 6315.64 samples/sec Loss 4.1270 LearningRate 0.0002 Epoch: 24 Global Step: 511200 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:13:58,057-Speed 6320.42 samples/sec Loss 4.1916 LearningRate 0.0002 Epoch: 24 Global Step: 511210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:01,304-Speed 6308.63 samples/sec Loss 4.2488 
LearningRate 0.0002 Epoch: 24 Global Step: 511220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:04,556-Speed 6297.79 samples/sec Loss 4.1470 LearningRate 0.0002 Epoch: 24 Global Step: 511230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:07,798-Speed 6318.74 samples/sec Loss 4.1842 LearningRate 0.0002 Epoch: 24 Global Step: 511240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:11,026-Speed 6345.33 samples/sec Loss 4.2090 LearningRate 0.0002 Epoch: 24 Global Step: 511250 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:14,272-Speed 6311.79 samples/sec Loss 4.1871 LearningRate 0.0002 Epoch: 24 Global Step: 511260 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:17,516-Speed 6314.54 samples/sec Loss 4.1317 LearningRate 0.0002 Epoch: 24 Global Step: 511270 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:20,761-Speed 6313.39 samples/sec Loss 4.1857 LearningRate 0.0002 Epoch: 24 Global Step: 511280 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:24,003-Speed 6317.05 samples/sec Loss 4.2162 LearningRate 0.0002 Epoch: 24 Global Step: 511290 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:27,246-Speed 6316.40 samples/sec Loss 4.2583 LearningRate 0.0002 Epoch: 24 Global Step: 511300 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:30,488-Speed 6319.49 samples/sec Loss 4.2678 LearningRate 0.0002 Epoch: 24 Global Step: 511310 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:33,732-Speed 6314.55 samples/sec Loss 4.2479 LearningRate 0.0002 Epoch: 24 Global Step: 511320 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:37,019-Speed 6231.48 samples/sec Loss 4.2423 LearningRate 0.0002 Epoch: 24 Global Step: 511330 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:40,361-Speed 6129.92 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 24 Global 
Step: 511340 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:14:43,608-Speed 6309.14 samples/sec Loss 4.2024 LearningRate 0.0002 Epoch: 24 Global Step: 511350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:46,853-Speed 6311.14 samples/sec Loss 4.1847 LearningRate 0.0002 Epoch: 24 Global Step: 511360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:50,098-Speed 6313.30 samples/sec Loss 4.1570 LearningRate 0.0002 Epoch: 24 Global Step: 511370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:53,341-Speed 6315.83 samples/sec Loss 4.1743 LearningRate 0.0002 Epoch: 24 Global Step: 511380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:56,591-Speed 6304.83 samples/sec Loss 4.2184 LearningRate 0.0002 Epoch: 24 Global Step: 511390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:14:59,833-Speed 6318.95 samples/sec Loss 4.2148 LearningRate 0.0002 Epoch: 24 Global Step: 511400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:03,077-Speed 6313.93 samples/sec Loss 4.2145 LearningRate 0.0002 Epoch: 24 Global Step: 511410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:06,321-Speed 6315.12 samples/sec Loss 4.1947 LearningRate 0.0002 Epoch: 24 Global Step: 511420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:09,567-Speed 6311.02 samples/sec Loss 4.2374 LearningRate 0.0002 Epoch: 24 Global Step: 511430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:12,810-Speed 6316.51 samples/sec Loss 4.2139 LearningRate 0.0002 Epoch: 24 Global Step: 511440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:16,043-Speed 6335.24 samples/sec Loss 4.1923 LearningRate 0.0002 Epoch: 24 Global Step: 511450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:19,291-Speed 6308.32 samples/sec Loss 4.1612 LearningRate 0.0002 Epoch: 24 Global Step: 511460 Fp16 Grad Scale: 
16384 Required: 29 hours Training: 2022-04-02 14:15:22,537-Speed 6309.01 samples/sec Loss 4.1866 LearningRate 0.0002 Epoch: 24 Global Step: 511470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:25,786-Speed 6305.64 samples/sec Loss 4.1734 LearningRate 0.0002 Epoch: 24 Global Step: 511480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:29,031-Speed 6311.96 samples/sec Loss 4.2086 LearningRate 0.0002 Epoch: 24 Global Step: 511490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:32,273-Speed 6318.22 samples/sec Loss 4.1853 LearningRate 0.0002 Epoch: 24 Global Step: 511500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:35,520-Speed 6308.83 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 24 Global Step: 511510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:38,764-Speed 6316.17 samples/sec Loss 4.2017 LearningRate 0.0002 Epoch: 24 Global Step: 511520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:42,011-Speed 6308.21 samples/sec Loss 4.1775 LearningRate 0.0002 Epoch: 24 Global Step: 511530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:45,258-Speed 6308.11 samples/sec Loss 4.1849 LearningRate 0.0002 Epoch: 24 Global Step: 511540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:48,493-Speed 6332.22 samples/sec Loss 4.1654 LearningRate 0.0002 Epoch: 24 Global Step: 511550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:51,741-Speed 6307.68 samples/sec Loss 4.1903 LearningRate 0.0002 Epoch: 24 Global Step: 511560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:54,986-Speed 6312.72 samples/sec Loss 4.2080 LearningRate 0.0002 Epoch: 24 Global Step: 511570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:15:58,230-Speed 6313.40 samples/sec Loss 4.1162 LearningRate 0.0002 Epoch: 24 Global Step: 511580 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:16:01,478-Speed 6308.18 samples/sec Loss 4.2193 LearningRate 0.0002 Epoch: 24 Global Step: 511590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:04,711-Speed 6335.71 samples/sec Loss 4.1073 LearningRate 0.0002 Epoch: 24 Global Step: 511600 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:07,962-Speed 6300.71 samples/sec Loss 4.2569 LearningRate 0.0002 Epoch: 24 Global Step: 511610 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:11,208-Speed 6312.24 samples/sec Loss 4.1841 LearningRate 0.0002 Epoch: 24 Global Step: 511620 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:14,454-Speed 6309.49 samples/sec Loss 4.2358 LearningRate 0.0002 Epoch: 24 Global Step: 511630 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:17,700-Speed 6312.06 samples/sec Loss 4.1785 LearningRate 0.0002 Epoch: 24 Global Step: 511640 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:20,945-Speed 6311.57 samples/sec Loss 4.1662 LearningRate 0.0002 Epoch: 24 Global Step: 511650 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:24,193-Speed 6307.42 samples/sec Loss 4.1583 LearningRate 0.0002 Epoch: 24 Global Step: 511660 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:27,462-Speed 6265.51 samples/sec Loss 4.2410 LearningRate 0.0002 Epoch: 24 Global Step: 511670 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:30,756-Speed 6219.62 samples/sec Loss 4.2207 LearningRate 0.0002 Epoch: 24 Global Step: 511680 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:34,000-Speed 6314.73 samples/sec Loss 4.2279 LearningRate 0.0002 Epoch: 24 Global Step: 511690 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:16:37,244-Speed 6313.15 samples/sec Loss 4.1170 LearningRate 0.0002 Epoch: 24 Global Step: 511700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:40,491-Speed 
6309.20 samples/sec Loss 4.1857 LearningRate 0.0002 Epoch: 24 Global Step: 511710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:43,733-Speed 6318.77 samples/sec Loss 4.1774 LearningRate 0.0002 Epoch: 24 Global Step: 511720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:46,978-Speed 6313.83 samples/sec Loss 4.1913 LearningRate 0.0002 Epoch: 24 Global Step: 511730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:50,227-Speed 6304.63 samples/sec Loss 4.2147 LearningRate 0.0002 Epoch: 24 Global Step: 511740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:53,476-Speed 6305.09 samples/sec Loss 4.1970 LearningRate 0.0002 Epoch: 24 Global Step: 511750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:56,721-Speed 6311.80 samples/sec Loss 4.1939 LearningRate 0.0002 Epoch: 24 Global Step: 511760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:16:59,974-Speed 6297.70 samples/sec Loss 4.2164 LearningRate 0.0002 Epoch: 24 Global Step: 511770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:03,232-Speed 6287.70 samples/sec Loss 4.1645 LearningRate 0.0002 Epoch: 24 Global Step: 511780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:06,477-Speed 6312.35 samples/sec Loss 4.1785 LearningRate 0.0002 Epoch: 24 Global Step: 511790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:09,709-Speed 6337.06 samples/sec Loss 4.1668 LearningRate 0.0002 Epoch: 24 Global Step: 511800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:12,956-Speed 6309.92 samples/sec Loss 4.1998 LearningRate 0.0002 Epoch: 24 Global Step: 511810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:16,204-Speed 6307.16 samples/sec Loss 4.3027 LearningRate 0.0002 Epoch: 24 Global Step: 511820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:19,451-Speed 6308.88 samples/sec Loss 4.2594 
LearningRate 0.0002 Epoch: 24 Global Step: 511830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:22,694-Speed 6316.71 samples/sec Loss 4.1697 LearningRate 0.0002 Epoch: 24 Global Step: 511840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:25,944-Speed 6302.31 samples/sec Loss 4.1664 LearningRate 0.0002 Epoch: 24 Global Step: 511850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:29,188-Speed 6314.47 samples/sec Loss 4.1417 LearningRate 0.0002 Epoch: 24 Global Step: 511860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:32,443-Speed 6295.04 samples/sec Loss 4.1516 LearningRate 0.0002 Epoch: 24 Global Step: 511870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:35,699-Speed 6291.19 samples/sec Loss 4.2449 LearningRate 0.0002 Epoch: 24 Global Step: 511880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:39,059-Speed 6095.90 samples/sec Loss 4.2103 LearningRate 0.0002 Epoch: 24 Global Step: 511890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:42,318-Speed 6286.16 samples/sec Loss 4.2077 LearningRate 0.0002 Epoch: 24 Global Step: 511900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:45,562-Speed 6314.47 samples/sec Loss 4.1832 LearningRate 0.0002 Epoch: 24 Global Step: 511910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:48,808-Speed 6309.76 samples/sec Loss 4.1904 LearningRate 0.0002 Epoch: 24 Global Step: 511920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:52,054-Speed 6310.21 samples/sec Loss 4.1849 LearningRate 0.0002 Epoch: 24 Global Step: 511930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:55,302-Speed 6308.19 samples/sec Loss 4.1975 LearningRate 0.0002 Epoch: 24 Global Step: 511940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:17:58,554-Speed 6297.74 samples/sec Loss 4.2033 LearningRate 0.0002 Epoch: 24 
Global Step: 511950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:01,802-Speed 6307.47 samples/sec Loss 4.1646 LearningRate 0.0002 Epoch: 24 Global Step: 511960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:05,095-Speed 6220.89 samples/sec Loss 4.1003 LearningRate 0.0002 Epoch: 24 Global Step: 511970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:08,350-Speed 6293.25 samples/sec Loss 4.1627 LearningRate 0.0002 Epoch: 24 Global Step: 511980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:11,597-Speed 6309.10 samples/sec Loss 4.1809 LearningRate 0.0002 Epoch: 24 Global Step: 511990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:14,829-Speed 6337.07 samples/sec Loss 4.1715 LearningRate 0.0002 Epoch: 24 Global Step: 512000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:18,073-Speed 6315.55 samples/sec Loss 4.1438 LearningRate 0.0002 Epoch: 24 Global Step: 512010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:21,319-Speed 6309.97 samples/sec Loss 4.1602 LearningRate 0.0002 Epoch: 24 Global Step: 512020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:24,620-Speed 6206.66 samples/sec Loss 4.1920 LearningRate 0.0002 Epoch: 24 Global Step: 512030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:27,866-Speed 6311.41 samples/sec Loss 4.1217 LearningRate 0.0002 Epoch: 24 Global Step: 512040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:31,121-Speed 6292.36 samples/sec Loss 4.1412 LearningRate 0.0002 Epoch: 24 Global Step: 512050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:34,364-Speed 6316.80 samples/sec Loss 4.2070 LearningRate 0.0002 Epoch: 24 Global Step: 512060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:37,611-Speed 6308.83 samples/sec Loss 4.2305 LearningRate 0.0002 Epoch: 24 Global Step: 512070 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:40,868-Speed 6289.16 samples/sec Loss 4.1622 LearningRate 0.0002 Epoch: 24 Global Step: 512080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:44,115-Speed 6308.57 samples/sec Loss 4.1344 LearningRate 0.0002 Epoch: 24 Global Step: 512090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:47,346-Speed 6339.69 samples/sec Loss 4.1379 LearningRate 0.0002 Epoch: 24 Global Step: 512100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:50,592-Speed 6311.99 samples/sec Loss 4.1588 LearningRate 0.0002 Epoch: 24 Global Step: 512110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:53,843-Speed 6299.59 samples/sec Loss 4.1554 LearningRate 0.0002 Epoch: 24 Global Step: 512120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:18:57,099-Speed 6291.38 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 512130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:00,346-Speed 6308.72 samples/sec Loss 4.1284 LearningRate 0.0002 Epoch: 24 Global Step: 512140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:03,590-Speed 6315.72 samples/sec Loss 4.0986 LearningRate 0.0002 Epoch: 24 Global Step: 512150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:06,845-Speed 6292.56 samples/sec Loss 4.2371 LearningRate 0.0002 Epoch: 24 Global Step: 512160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:10,100-Speed 6292.59 samples/sec Loss 4.2245 LearningRate 0.0002 Epoch: 24 Global Step: 512170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:13,355-Speed 6294.60 samples/sec Loss 4.1796 LearningRate 0.0002 Epoch: 24 Global Step: 512180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:16,600-Speed 6311.79 samples/sec Loss 4.2514 LearningRate 0.0002 Epoch: 24 Global Step: 512190 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:19:19,861-Speed 6283.37 samples/sec Loss 4.1926 LearningRate 0.0002 Epoch: 24 Global Step: 512200 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:19:23,092-Speed 6339.71 samples/sec Loss 4.1979 LearningRate 0.0002 Epoch: 24 Global Step: 512210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:26,338-Speed 6310.49 samples/sec Loss 4.2263 LearningRate 0.0002 Epoch: 24 Global Step: 512220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:29,583-Speed 6311.91 samples/sec Loss 4.1995 LearningRate 0.0002 Epoch: 24 Global Step: 512230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:32,829-Speed 6310.77 samples/sec Loss 4.1232 LearningRate 0.0002 Epoch: 24 Global Step: 512240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:36,075-Speed 6311.72 samples/sec Loss 4.2118 LearningRate 0.0002 Epoch: 24 Global Step: 512250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:39,321-Speed 6310.31 samples/sec Loss 4.1895 LearningRate 0.0002 Epoch: 24 Global Step: 512260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:42,567-Speed 6311.29 samples/sec Loss 4.1836 LearningRate 0.0002 Epoch: 24 Global Step: 512270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:45,813-Speed 6311.55 samples/sec Loss 4.1555 LearningRate 0.0002 Epoch: 24 Global Step: 512280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:49,055-Speed 6318.88 samples/sec Loss 4.1854 LearningRate 0.0002 Epoch: 24 Global Step: 512290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:52,301-Speed 6309.67 samples/sec Loss 4.1905 LearningRate 0.0002 Epoch: 24 Global Step: 512300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:19:55,534-Speed 6335.38 samples/sec Loss 4.1736 LearningRate 0.0002 Epoch: 24 Global Step: 512310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
14:19:58,778-Speed 6314.88 samples/sec Loss 4.1793 LearningRate 0.0002 Epoch: 24 Global Step: 512320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:02,034-Speed 6292.67 samples/sec Loss 4.1357 LearningRate 0.0002 Epoch: 24 Global Step: 512330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:05,280-Speed 6309.74 samples/sec Loss 4.1388 LearningRate 0.0002 Epoch: 24 Global Step: 512340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:08,528-Speed 6307.48 samples/sec Loss 4.2150 LearningRate 0.0002 Epoch: 24 Global Step: 512350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:11,772-Speed 6314.34 samples/sec Loss 4.1899 LearningRate 0.0002 Epoch: 24 Global Step: 512360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:15,014-Speed 6317.41 samples/sec Loss 4.2119 LearningRate 0.0002 Epoch: 24 Global Step: 512370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:18,264-Speed 6304.24 samples/sec Loss 4.1465 LearningRate 0.0002 Epoch: 24 Global Step: 512380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:21,511-Speed 6307.75 samples/sec Loss 4.1350 LearningRate 0.0002 Epoch: 24 Global Step: 512390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:24,754-Speed 6316.50 samples/sec Loss 4.2143 LearningRate 0.0002 Epoch: 24 Global Step: 512400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:27,989-Speed 6332.20 samples/sec Loss 4.1981 LearningRate 0.0002 Epoch: 24 Global Step: 512410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:31,235-Speed 6310.45 samples/sec Loss 4.2128 LearningRate 0.0002 Epoch: 24 Global Step: 512420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:34,484-Speed 6306.40 samples/sec Loss 4.0988 LearningRate 0.0002 Epoch: 24 Global Step: 512430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:20:37,712-Speed 6345.21 
samples/sec Loss 4.1520 LearningRate 0.0002 Epoch: 24 Global Step: 512440 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:40,958-Speed 6309.43 samples/sec Loss 4.2387 LearningRate 0.0002 Epoch: 24 Global Step: 512450 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:44,204-Speed 6312.11 samples/sec Loss 4.1722 LearningRate 0.0002 Epoch: 24 Global Step: 512460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:47,461-Speed 6288.79 samples/sec Loss 4.1893 LearningRate 0.0002 Epoch: 24 Global Step: 512470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:50,713-Speed 6300.54 samples/sec Loss 4.2238 LearningRate 0.0002 Epoch: 24 Global Step: 512480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:53,954-Speed 6320.28 samples/sec Loss 4.1523 LearningRate 0.0002 Epoch: 24 Global Step: 512490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:20:57,199-Speed 6312.46 samples/sec Loss 4.1843 LearningRate 0.0002 Epoch: 24 Global Step: 512500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:21:00,448-Speed 6305.04 samples/sec Loss 4.1112 LearningRate 0.0002 Epoch: 24 Global Step: 512510 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:21:03,694-Speed 6311.33 samples/sec Loss 4.1620 LearningRate 0.0002 Epoch: 24 Global Step: 512520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:21:06,936-Speed 6317.03 samples/sec Loss 4.1509 LearningRate 0.0002 Epoch: 24 Global Step: 512530 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:21:10,186-Speed 6303.57 samples/sec Loss 4.1461 LearningRate 0.0002 Epoch: 24 Global Step: 512540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:13,434-Speed 6306.59 samples/sec Loss 4.1843 LearningRate 0.0002 Epoch: 24 Global Step: 512550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:16,684-Speed 6302.10 samples/sec Loss 4.1894 LearningRate 
0.0002 Epoch: 24 Global Step: 512560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:19,932-Speed 6308.86 samples/sec Loss 4.2215 LearningRate 0.0002 Epoch: 24 Global Step: 512570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:23,176-Speed 6312.62 samples/sec Loss 4.1558 LearningRate 0.0002 Epoch: 24 Global Step: 512580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:26,420-Speed 6314.80 samples/sec Loss 4.2410 LearningRate 0.0002 Epoch: 24 Global Step: 512590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:29,666-Speed 6311.40 samples/sec Loss 4.2033 LearningRate 0.0002 Epoch: 24 Global Step: 512600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:32,906-Speed 6323.40 samples/sec Loss 4.1680 LearningRate 0.0002 Epoch: 24 Global Step: 512610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:36,151-Speed 6312.12 samples/sec Loss 4.2730 LearningRate 0.0002 Epoch: 24 Global Step: 512620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:39,406-Speed 6292.48 samples/sec Loss 4.1675 LearningRate 0.0002 Epoch: 24 Global Step: 512630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:42,635-Speed 6345.07 samples/sec Loss 4.1761 LearningRate 0.0002 Epoch: 24 Global Step: 512640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:45,886-Speed 6300.24 samples/sec Loss 4.1659 LearningRate 0.0002 Epoch: 24 Global Step: 512650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:49,132-Speed 6311.33 samples/sec Loss 4.2352 LearningRate 0.0002 Epoch: 24 Global Step: 512660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:52,374-Speed 6317.52 samples/sec Loss 4.2017 LearningRate 0.0002 Epoch: 24 Global Step: 512670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:55,615-Speed 6319.75 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 24 Global Step: 
512680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:21:58,861-Speed 6312.69 samples/sec Loss 4.2373 LearningRate 0.0002 Epoch: 24 Global Step: 512690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:02,108-Speed 6308.07 samples/sec Loss 4.1808 LearningRate 0.0002 Epoch: 24 Global Step: 512700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:05,358-Speed 6304.45 samples/sec Loss 4.2269 LearningRate 0.0002 Epoch: 24 Global Step: 512710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:08,603-Speed 6311.23 samples/sec Loss 4.1227 LearningRate 0.0002 Epoch: 24 Global Step: 512720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:11,847-Speed 6314.67 samples/sec Loss 4.1717 LearningRate 0.0002 Epoch: 24 Global Step: 512730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:15,078-Speed 6340.15 samples/sec Loss 4.1844 LearningRate 0.0002 Epoch: 24 Global Step: 512740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:18,324-Speed 6310.78 samples/sec Loss 4.1152 LearningRate 0.0002 Epoch: 24 Global Step: 512750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:21,569-Speed 6312.89 samples/sec Loss 4.1347 LearningRate 0.0002 Epoch: 24 Global Step: 512760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:24,817-Speed 6306.85 samples/sec Loss 4.1788 LearningRate 0.0002 Epoch: 24 Global Step: 512770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:28,059-Speed 6319.48 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 512780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:31,304-Speed 6312.68 samples/sec Loss 4.1948 LearningRate 0.0002 Epoch: 24 Global Step: 512790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:34,551-Speed 6307.62 samples/sec Loss 4.2108 LearningRate 0.0002 Epoch: 24 Global Step: 512800 Fp16 Grad Scale: 16384 
Required: 29 hours Training: 2022-04-02 14:22:37,795-Speed 6314.43 samples/sec Loss 4.1138 LearningRate 0.0002 Epoch: 24 Global Step: 512810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:41,040-Speed 6313.43 samples/sec Loss 4.2172 LearningRate 0.0002 Epoch: 24 Global Step: 512820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:44,286-Speed 6310.59 samples/sec Loss 4.2025 LearningRate 0.0002 Epoch: 24 Global Step: 512830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:47,519-Speed 6335.64 samples/sec Loss 4.1453 LearningRate 0.0002 Epoch: 24 Global Step: 512840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:50,851-Speed 6148.47 samples/sec Loss 4.1794 LearningRate 0.0002 Epoch: 24 Global Step: 512850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:54,117-Speed 6272.65 samples/sec Loss 4.2107 LearningRate 0.0002 Epoch: 24 Global Step: 512860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:22:57,361-Speed 6313.19 samples/sec Loss 4.2315 LearningRate 0.0002 Epoch: 24 Global Step: 512870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:00,603-Speed 6318.83 samples/sec Loss 4.1659 LearningRate 0.0002 Epoch: 24 Global Step: 512880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:03,864-Speed 6282.15 samples/sec Loss 4.1096 LearningRate 0.0002 Epoch: 24 Global Step: 512890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:07,108-Speed 6314.75 samples/sec Loss 4.1636 LearningRate 0.0002 Epoch: 24 Global Step: 512900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:10,354-Speed 6311.15 samples/sec Loss 4.0855 LearningRate 0.0002 Epoch: 24 Global Step: 512910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:13,606-Speed 6298.98 samples/sec Loss 4.1829 LearningRate 0.0002 Epoch: 24 Global Step: 512920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 
2022-04-02 14:23:16,851-Speed 6312.48 samples/sec Loss 4.1852 LearningRate 0.0002 Epoch: 24 Global Step: 512930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:20,101-Speed 6303.15 samples/sec Loss 4.2116 LearningRate 0.0002 Epoch: 24 Global Step: 512940 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:23:23,332-Speed 6341.05 samples/sec Loss 4.1830 LearningRate 0.0002 Epoch: 24 Global Step: 512950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:26,577-Speed 6312.64 samples/sec Loss 4.1790 LearningRate 0.0002 Epoch: 24 Global Step: 512960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:29,824-Speed 6307.74 samples/sec Loss 4.2106 LearningRate 0.0002 Epoch: 24 Global Step: 512970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:33,069-Speed 6312.74 samples/sec Loss 4.1691 LearningRate 0.0002 Epoch: 24 Global Step: 512980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:36,315-Speed 6311.15 samples/sec Loss 4.1968 LearningRate 0.0002 Epoch: 24 Global Step: 512990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:39,564-Speed 6305.21 samples/sec Loss 4.1790 LearningRate 0.0002 Epoch: 24 Global Step: 513000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:42,808-Speed 6314.13 samples/sec Loss 4.2074 LearningRate 0.0002 Epoch: 24 Global Step: 513010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:46,055-Speed 6309.20 samples/sec Loss 4.1431 LearningRate 0.0002 Epoch: 24 Global Step: 513020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:49,303-Speed 6305.97 samples/sec Loss 4.1453 LearningRate 0.0002 Epoch: 24 Global Step: 513030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:52,553-Speed 6303.18 samples/sec Loss 4.2017 LearningRate 0.0002 Epoch: 24 Global Step: 513040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:55,784-Speed 
6339.75 samples/sec Loss 4.2001 LearningRate 0.0002 Epoch: 24 Global Step: 513050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:23:59,030-Speed 6311.91 samples/sec Loss 4.1295 LearningRate 0.0002 Epoch: 24 Global Step: 513060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:02,278-Speed 6306.48 samples/sec Loss 4.1782 LearningRate 0.0002 Epoch: 24 Global Step: 513070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:05,527-Speed 6303.79 samples/sec Loss 4.1783 LearningRate 0.0002 Epoch: 24 Global Step: 513080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:08,774-Speed 6309.15 samples/sec Loss 4.2275 LearningRate 0.0002 Epoch: 24 Global Step: 513090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:12,016-Speed 6319.60 samples/sec Loss 4.1706 LearningRate 0.0002 Epoch: 24 Global Step: 513100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:15,263-Speed 6309.15 samples/sec Loss 4.1876 LearningRate 0.0002 Epoch: 24 Global Step: 513110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:18,510-Speed 6309.28 samples/sec Loss 4.2428 LearningRate 0.0002 Epoch: 24 Global Step: 513120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:21,754-Speed 6313.39 samples/sec Loss 4.1596 LearningRate 0.0002 Epoch: 24 Global Step: 513130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:24,996-Speed 6319.41 samples/sec Loss 4.1894 LearningRate 0.0002 Epoch: 24 Global Step: 513140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:28,234-Speed 6325.51 samples/sec Loss 4.1448 LearningRate 0.0002 Epoch: 24 Global Step: 513150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:31,481-Speed 6309.74 samples/sec Loss 4.1375 LearningRate 0.0002 Epoch: 24 Global Step: 513160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:34,725-Speed 6313.70 samples/sec Loss 4.1548 
LearningRate 0.0002 Epoch: 24 Global Step: 513170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:38,010-Speed 6236.00 samples/sec Loss 4.1564 LearningRate 0.0002 Epoch: 24 Global Step: 513180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:41,255-Speed 6313.18 samples/sec Loss 4.1234 LearningRate 0.0002 Epoch: 24 Global Step: 513190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:44,503-Speed 6306.08 samples/sec Loss 4.1501 LearningRate 0.0002 Epoch: 24 Global Step: 513200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:47,803-Speed 6208.55 samples/sec Loss 4.1450 LearningRate 0.0002 Epoch: 24 Global Step: 513210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:51,052-Speed 6304.03 samples/sec Loss 4.1639 LearningRate 0.0002 Epoch: 24 Global Step: 513220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:54,296-Speed 6315.32 samples/sec Loss 4.1611 LearningRate 0.0002 Epoch: 24 Global Step: 513230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:24:57,538-Speed 6318.05 samples/sec Loss 4.1603 LearningRate 0.0002 Epoch: 24 Global Step: 513240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:00,773-Speed 6332.72 samples/sec Loss 4.2464 LearningRate 0.0002 Epoch: 24 Global Step: 513250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:04,016-Speed 6316.12 samples/sec Loss 4.1918 LearningRate 0.0002 Epoch: 24 Global Step: 513260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:07,260-Speed 6314.01 samples/sec Loss 4.2202 LearningRate 0.0002 Epoch: 24 Global Step: 513270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:10,501-Speed 6319.98 samples/sec Loss 4.1933 LearningRate 0.0002 Epoch: 24 Global Step: 513280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:13,745-Speed 6314.84 samples/sec Loss 4.1672 LearningRate 0.0002 Epoch: 24 
Global Step: 513290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:16,988-Speed 6318.06 samples/sec Loss 4.1715 LearningRate 0.0002 Epoch: 24 Global Step: 513300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:20,235-Speed 6308.15 samples/sec Loss 4.2127 LearningRate 0.0002 Epoch: 24 Global Step: 513310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:23,480-Speed 6313.38 samples/sec Loss 4.1541 LearningRate 0.0002 Epoch: 24 Global Step: 513320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:26,730-Speed 6302.83 samples/sec Loss 4.1910 LearningRate 0.0002 Epoch: 24 Global Step: 513330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:29,973-Speed 6317.56 samples/sec Loss 4.1944 LearningRate 0.0002 Epoch: 24 Global Step: 513340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:33,202-Speed 6342.24 samples/sec Loss 4.1976 LearningRate 0.0002 Epoch: 24 Global Step: 513350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:36,446-Speed 6315.04 samples/sec Loss 4.2020 LearningRate 0.0002 Epoch: 24 Global Step: 513360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:39,693-Speed 6308.53 samples/sec Loss 4.1629 LearningRate 0.0002 Epoch: 24 Global Step: 513370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:42,937-Speed 6315.03 samples/sec Loss 4.1216 LearningRate 0.0002 Epoch: 24 Global Step: 513380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:46,180-Speed 6316.76 samples/sec Loss 4.1507 LearningRate 0.0002 Epoch: 24 Global Step: 513390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:49,426-Speed 6311.41 samples/sec Loss 4.1553 LearningRate 0.0002 Epoch: 24 Global Step: 513400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:52,667-Speed 6318.97 samples/sec Loss 4.2458 LearningRate 0.0002 Epoch: 24 Global Step: 513410 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:55,915-Speed 6308.28 samples/sec Loss 4.1749 LearningRate 0.0002 Epoch: 24 Global Step: 513420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:25:59,157-Speed 6317.78 samples/sec Loss 4.1834 LearningRate 0.0002 Epoch: 24 Global Step: 513430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:02,404-Speed 6308.50 samples/sec Loss 4.1204 LearningRate 0.0002 Epoch: 24 Global Step: 513440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:05,634-Speed 6341.72 samples/sec Loss 4.1481 LearningRate 0.0002 Epoch: 24 Global Step: 513450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:08,866-Speed 6337.98 samples/sec Loss 4.1411 LearningRate 0.0002 Epoch: 24 Global Step: 513460 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:12,114-Speed 6307.18 samples/sec Loss 4.1516 LearningRate 0.0002 Epoch: 24 Global Step: 513470 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:15,358-Speed 6315.30 samples/sec Loss 4.0651 LearningRate 0.0002 Epoch: 24 Global Step: 513480 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:18,604-Speed 6310.15 samples/sec Loss 4.1489 LearningRate 0.0002 Epoch: 24 Global Step: 513490 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:21,847-Speed 6316.70 samples/sec Loss 4.1235 LearningRate 0.0002 Epoch: 24 Global Step: 513500 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:25,092-Speed 6312.73 samples/sec Loss 4.2010 LearningRate 0.0002 Epoch: 24 Global Step: 513510 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:28,342-Speed 6301.88 samples/sec Loss 4.2090 LearningRate 0.0002 Epoch: 24 Global Step: 513520 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:31,586-Speed 6314.77 samples/sec Loss 4.2236 LearningRate 0.0002 Epoch: 24 Global Step: 513530 Fp16 Grad Scale: 8192 Required: 29 hours 
Training: 2022-04-02 14:26:34,836-Speed 6303.41 samples/sec Loss 4.1728 LearningRate 0.0002 Epoch: 24 Global Step: 513540 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:38,082-Speed 6312.37 samples/sec Loss 4.2071 LearningRate 0.0002 Epoch: 24 Global Step: 513550 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:26:41,326-Speed 6313.20 samples/sec Loss 4.1550 LearningRate 0.0002 Epoch: 24 Global Step: 513560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:44,572-Speed 6310.70 samples/sec Loss 4.1023 LearningRate 0.0002 Epoch: 24 Global Step: 513570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:47,821-Speed 6306.20 samples/sec Loss 4.1552 LearningRate 0.0002 Epoch: 24 Global Step: 513580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:51,067-Speed 6309.95 samples/sec Loss 4.0820 LearningRate 0.0002 Epoch: 24 Global Step: 513590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:54,313-Speed 6311.10 samples/sec Loss 4.1549 LearningRate 0.0002 Epoch: 24 Global Step: 513600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:26:57,561-Speed 6306.46 samples/sec Loss 4.1696 LearningRate 0.0002 Epoch: 24 Global Step: 513610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:00,803-Speed 6318.29 samples/sec Loss 4.1559 LearningRate 0.0002 Epoch: 24 Global Step: 513620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:04,050-Speed 6309.10 samples/sec Loss 4.2275 LearningRate 0.0002 Epoch: 24 Global Step: 513630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:07,297-Speed 6308.29 samples/sec Loss 4.2269 LearningRate 0.0002 Epoch: 24 Global Step: 513640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:10,554-Speed 6288.84 samples/sec Loss 4.1893 LearningRate 0.0002 Epoch: 24 Global Step: 513650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
14:27:13,787-Speed 6337.59 samples/sec Loss 4.1874 LearningRate 0.0002 Epoch: 24 Global Step: 513660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:17,032-Speed 6311.60 samples/sec Loss 4.1575 LearningRate 0.0002 Epoch: 24 Global Step: 513670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:20,276-Speed 6314.51 samples/sec Loss 4.1649 LearningRate 0.0002 Epoch: 24 Global Step: 513680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:23,519-Speed 6316.56 samples/sec Loss 4.1558 LearningRate 0.0002 Epoch: 24 Global Step: 513690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:26,763-Speed 6315.09 samples/sec Loss 4.2088 LearningRate 0.0002 Epoch: 24 Global Step: 513700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:30,020-Speed 6289.42 samples/sec Loss 4.1523 LearningRate 0.0002 Epoch: 24 Global Step: 513710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:33,262-Speed 6318.38 samples/sec Loss 4.1968 LearningRate 0.0002 Epoch: 24 Global Step: 513720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:36,505-Speed 6316.62 samples/sec Loss 4.1541 LearningRate 0.0002 Epoch: 24 Global Step: 513730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:39,749-Speed 6315.44 samples/sec Loss 4.1559 LearningRate 0.0002 Epoch: 24 Global Step: 513740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:42,993-Speed 6314.89 samples/sec Loss 4.1526 LearningRate 0.0002 Epoch: 24 Global Step: 513750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:46,234-Speed 6320.37 samples/sec Loss 4.2267 LearningRate 0.0002 Epoch: 24 Global Step: 513760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:49,477-Speed 6317.33 samples/sec Loss 4.2675 LearningRate 0.0002 Epoch: 24 Global Step: 513770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:52,719-Speed 6316.66 
samples/sec Loss 4.1541 LearningRate 0.0002 Epoch: 24 Global Step: 513780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:55,966-Speed 6310.50 samples/sec Loss 4.1815 LearningRate 0.0002 Epoch: 24 Global Step: 513790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:27:59,210-Speed 6314.13 samples/sec Loss 4.1975 LearningRate 0.0002 Epoch: 24 Global Step: 513800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:02,457-Speed 6308.39 samples/sec Loss 4.1596 LearningRate 0.0002 Epoch: 24 Global Step: 513810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:05,699-Speed 6317.44 samples/sec Loss 4.1659 LearningRate 0.0002 Epoch: 24 Global Step: 513820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:08,944-Speed 6313.04 samples/sec Loss 4.2354 LearningRate 0.0002 Epoch: 24 Global Step: 513830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:12,189-Speed 6313.45 samples/sec Loss 4.1712 LearningRate 0.0002 Epoch: 24 Global Step: 513840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:15,435-Speed 6310.95 samples/sec Loss 4.1951 LearningRate 0.0002 Epoch: 24 Global Step: 513850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:18,663-Speed 6345.19 samples/sec Loss 4.1656 LearningRate 0.0002 Epoch: 24 Global Step: 513860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:21,913-Speed 6303.14 samples/sec Loss 4.2036 LearningRate 0.0002 Epoch: 24 Global Step: 513870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:25,157-Speed 6314.02 samples/sec Loss 4.1472 LearningRate 0.0002 Epoch: 24 Global Step: 513880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:28,404-Speed 6309.55 samples/sec Loss 4.2492 LearningRate 0.0002 Epoch: 24 Global Step: 513890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:31,653-Speed 6304.18 samples/sec Loss 4.1814 
LearningRate 0.0002 Epoch: 24 Global Step: 513900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:34,896-Speed 6316.83 samples/sec Loss 4.0980 LearningRate 0.0002 Epoch: 24 Global Step: 513910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:38,140-Speed 6316.21 samples/sec Loss 4.1970 LearningRate 0.0002 Epoch: 24 Global Step: 513920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:41,384-Speed 6312.76 samples/sec Loss 4.1504 LearningRate 0.0002 Epoch: 24 Global Step: 513930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:44,625-Speed 6320.28 samples/sec Loss 4.1801 LearningRate 0.0002 Epoch: 24 Global Step: 513940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:47,872-Speed 6310.30 samples/sec Loss 4.2232 LearningRate 0.0002 Epoch: 24 Global Step: 513950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:51,118-Speed 6311.46 samples/sec Loss 4.1851 LearningRate 0.0002 Epoch: 24 Global Step: 513960 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:28:54,349-Speed 6339.63 samples/sec Loss 4.1443 LearningRate 0.0002 Epoch: 24 Global Step: 513970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:28:57,597-Speed 6307.32 samples/sec Loss 4.1291 LearningRate 0.0002 Epoch: 24 Global Step: 513980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:00,849-Speed 6299.00 samples/sec Loss 4.1537 LearningRate 0.0002 Epoch: 24 Global Step: 513990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:04,096-Speed 6307.54 samples/sec Loss 4.1760 LearningRate 0.0002 Epoch: 24 Global Step: 514000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:07,340-Speed 6316.03 samples/sec Loss 4.1359 LearningRate 0.0002 Epoch: 24 Global Step: 514010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:10,583-Speed 6315.66 samples/sec Loss 4.1516 LearningRate 0.0002 Epoch: 24 
Global Step: 514020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:13,833-Speed 6302.66 samples/sec Loss 4.1652 LearningRate 0.0002 Epoch: 24 Global Step: 514030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:17,081-Speed 6307.28 samples/sec Loss 4.1990 LearningRate 0.0002 Epoch: 24 Global Step: 514040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:20,327-Speed 6310.05 samples/sec Loss 4.1911 LearningRate 0.0002 Epoch: 24 Global Step: 514050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:23,570-Speed 6316.28 samples/sec Loss 4.1583 LearningRate 0.0002 Epoch: 24 Global Step: 514060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:26,802-Speed 6339.48 samples/sec Loss 4.1581 LearningRate 0.0002 Epoch: 24 Global Step: 514070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:30,047-Speed 6312.08 samples/sec Loss 4.1885 LearningRate 0.0002 Epoch: 24 Global Step: 514080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:33,294-Speed 6309.19 samples/sec Loss 4.2134 LearningRate 0.0002 Epoch: 24 Global Step: 514090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:36,539-Speed 6312.85 samples/sec Loss 4.2255 LearningRate 0.0002 Epoch: 24 Global Step: 514100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:39,790-Speed 6300.43 samples/sec Loss 4.1396 LearningRate 0.0002 Epoch: 24 Global Step: 514110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:43,035-Speed 6313.33 samples/sec Loss 4.1109 LearningRate 0.0002 Epoch: 24 Global Step: 514120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:46,280-Speed 6312.46 samples/sec Loss 4.1634 LearningRate 0.0002 Epoch: 24 Global Step: 514130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:49,526-Speed 6310.38 samples/sec Loss 4.1263 LearningRate 0.0002 Epoch: 24 Global Step: 514140 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:52,770-Speed 6314.57 samples/sec Loss 4.1178 LearningRate 0.0002 Epoch: 24 Global Step: 514150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:56,017-Speed 6308.07 samples/sec Loss 4.2167 LearningRate 0.0002 Epoch: 24 Global Step: 514160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:29:59,248-Speed 6341.66 samples/sec Loss 4.1742 LearningRate 0.0002 Epoch: 24 Global Step: 514170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:02,497-Speed 6304.75 samples/sec Loss 4.1366 LearningRate 0.0002 Epoch: 24 Global Step: 514180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:05,744-Speed 6308.96 samples/sec Loss 4.1680 LearningRate 0.0002 Epoch: 24 Global Step: 514190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:08,990-Speed 6309.41 samples/sec Loss 4.1410 LearningRate 0.0002 Epoch: 24 Global Step: 514200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:12,233-Speed 6318.03 samples/sec Loss 4.1587 LearningRate 0.0002 Epoch: 24 Global Step: 514210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:15,478-Speed 6312.38 samples/sec Loss 4.2014 LearningRate 0.0002 Epoch: 24 Global Step: 514220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:18,718-Speed 6322.75 samples/sec Loss 4.2176 LearningRate 0.0002 Epoch: 24 Global Step: 514230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:21,966-Speed 6305.68 samples/sec Loss 4.1965 LearningRate 0.0002 Epoch: 24 Global Step: 514240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:25,213-Speed 6308.49 samples/sec Loss 4.1496 LearningRate 0.0002 Epoch: 24 Global Step: 514250 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:28,462-Speed 6305.10 samples/sec Loss 4.2352 LearningRate 0.0002 Epoch: 24 Global Step: 514260 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:30:31,694-Speed 6338.47 samples/sec Loss 4.1617 LearningRate 0.0002 Epoch: 24 Global Step: 514270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:34,954-Speed 6284.35 samples/sec Loss 4.1168 LearningRate 0.0002 Epoch: 24 Global Step: 514280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:38,196-Speed 6316.91 samples/sec Loss 4.1866 LearningRate 0.0002 Epoch: 24 Global Step: 514290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:41,444-Speed 6307.64 samples/sec Loss 4.1877 LearningRate 0.0002 Epoch: 24 Global Step: 514300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:44,689-Speed 6312.33 samples/sec Loss 4.0888 LearningRate 0.0002 Epoch: 24 Global Step: 514310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:47,935-Speed 6310.72 samples/sec Loss 4.1814 LearningRate 0.0002 Epoch: 24 Global Step: 514320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:51,190-Speed 6293.17 samples/sec Loss 4.2016 LearningRate 0.0002 Epoch: 24 Global Step: 514330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:30:54,423-Speed 6336.48 samples/sec Loss 4.2114 LearningRate 0.0002 Epoch: 24 Global Step: 514340 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:30:57,665-Speed 6317.15 samples/sec Loss 4.1722 LearningRate 0.0002 Epoch: 24 Global Step: 514350 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:00,909-Speed 6315.00 samples/sec Loss 4.2017 LearningRate 0.0002 Epoch: 24 Global Step: 514360 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:04,154-Speed 6314.17 samples/sec Loss 4.1255 LearningRate 0.0002 Epoch: 24 Global Step: 514370 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:07,400-Speed 6310.87 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 514380 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 
14:31:10,644-Speed 6314.92 samples/sec Loss 4.1641 LearningRate 0.0002 Epoch: 24 Global Step: 514390 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:13,890-Speed 6310.63 samples/sec Loss 4.1684 LearningRate 0.0002 Epoch: 24 Global Step: 514400 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:17,137-Speed 6307.82 samples/sec Loss 4.1614 LearningRate 0.0002 Epoch: 24 Global Step: 514410 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:20,382-Speed 6313.10 samples/sec Loss 4.1986 LearningRate 0.0002 Epoch: 24 Global Step: 514420 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:23,627-Speed 6312.50 samples/sec Loss 4.1503 LearningRate 0.0002 Epoch: 24 Global Step: 514430 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:31:26,870-Speed 6317.38 samples/sec Loss 4.1780 LearningRate 0.0002 Epoch: 24 Global Step: 514440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:30,111-Speed 6319.48 samples/sec Loss 4.1988 LearningRate 0.0002 Epoch: 24 Global Step: 514450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:33,361-Speed 6304.49 samples/sec Loss 4.2249 LearningRate 0.0002 Epoch: 24 Global Step: 514460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:36,613-Speed 6297.47 samples/sec Loss 4.1427 LearningRate 0.0002 Epoch: 24 Global Step: 514470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:39,859-Speed 6311.56 samples/sec Loss 4.1320 LearningRate 0.0002 Epoch: 24 Global Step: 514480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:43,101-Speed 6318.25 samples/sec Loss 4.1664 LearningRate 0.0002 Epoch: 24 Global Step: 514490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:46,348-Speed 6307.76 samples/sec Loss 4.1810 LearningRate 0.0002 Epoch: 24 Global Step: 514500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:49,593-Speed 6313.93 
samples/sec Loss 4.2129 LearningRate 0.0002 Epoch: 24 Global Step: 514510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:52,838-Speed 6312.26 samples/sec Loss 4.1133 LearningRate 0.0002 Epoch: 24 Global Step: 514520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:56,081-Speed 6317.26 samples/sec Loss 4.1381 LearningRate 0.0002 Epoch: 24 Global Step: 514530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:31:59,320-Speed 6323.25 samples/sec Loss 4.1611 LearningRate 0.0002 Epoch: 24 Global Step: 514540 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:32:02,554-Speed 6334.37 samples/sec Loss 4.2045 LearningRate 0.0002 Epoch: 24 Global Step: 514550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:05,802-Speed 6307.77 samples/sec Loss 4.1896 LearningRate 0.0002 Epoch: 24 Global Step: 514560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:09,048-Speed 6310.45 samples/sec Loss 4.1609 LearningRate 0.0002 Epoch: 24 Global Step: 514570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:12,292-Speed 6313.47 samples/sec Loss 4.2096 LearningRate 0.0002 Epoch: 24 Global Step: 514580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:15,538-Speed 6312.06 samples/sec Loss 4.2054 LearningRate 0.0002 Epoch: 24 Global Step: 514590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:18,788-Speed 6302.45 samples/sec Loss 4.1511 LearningRate 0.0002 Epoch: 24 Global Step: 514600 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:22,037-Speed 6305.94 samples/sec Loss 4.1654 LearningRate 0.0002 Epoch: 24 Global Step: 514610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:25,296-Speed 6284.87 samples/sec Loss 4.1956 LearningRate 0.0002 Epoch: 24 Global Step: 514620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:28,550-Speed 6294.61 samples/sec Loss 4.2071 
LearningRate 0.0002 Epoch: 24 Global Step: 514630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:31,793-Speed 6316.75 samples/sec Loss 4.1607 LearningRate 0.0002 Epoch: 24 Global Step: 514640 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:35,027-Speed 6335.80 samples/sec Loss 4.1657 LearningRate 0.0002 Epoch: 24 Global Step: 514650 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:38,277-Speed 6302.77 samples/sec Loss 4.2082 LearningRate 0.0002 Epoch: 24 Global Step: 514660 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:41,523-Speed 6310.03 samples/sec Loss 4.1353 LearningRate 0.0002 Epoch: 24 Global Step: 514670 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:44,768-Speed 6312.51 samples/sec Loss 4.0902 LearningRate 0.0002 Epoch: 24 Global Step: 514680 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:48,015-Speed 6309.45 samples/sec Loss 4.1327 LearningRate 0.0002 Epoch: 24 Global Step: 514690 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:51,262-Speed 6307.33 samples/sec Loss 4.1489 LearningRate 0.0002 Epoch: 24 Global Step: 514700 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:54,512-Speed 6304.09 samples/sec Loss 4.1972 LearningRate 0.0002 Epoch: 24 Global Step: 514710 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:32:57,761-Speed 6304.69 samples/sec Loss 4.2181 LearningRate 0.0002 Epoch: 24 Global Step: 514720 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:01,016-Speed 6293.08 samples/sec Loss 4.1367 LearningRate 0.0002 Epoch: 24 Global Step: 514730 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:04,264-Speed 6307.02 samples/sec Loss 4.1679 LearningRate 0.0002 Epoch: 24 Global Step: 514740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:07,507-Speed 6317.31 samples/sec Loss 4.1376 LearningRate 0.0002 Epoch: 24 
Global Step: 514750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:10,753-Speed 6310.43 samples/sec Loss 4.1302 LearningRate 0.0002 Epoch: 24 Global Step: 514760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:14,000-Speed 6307.63 samples/sec Loss 4.1675 LearningRate 0.0002 Epoch: 24 Global Step: 514770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:17,249-Speed 6304.64 samples/sec Loss 4.1755 LearningRate 0.0002 Epoch: 24 Global Step: 514780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:20,497-Speed 6306.82 samples/sec Loss 4.0837 LearningRate 0.0002 Epoch: 24 Global Step: 514790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:23,743-Speed 6313.27 samples/sec Loss 4.1029 LearningRate 0.0002 Epoch: 24 Global Step: 514800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:26,986-Speed 6316.16 samples/sec Loss 4.0742 LearningRate 0.0002 Epoch: 24 Global Step: 514810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:30,236-Speed 6302.56 samples/sec Loss 4.1758 LearningRate 0.0002 Epoch: 24 Global Step: 514820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:33,481-Speed 6312.18 samples/sec Loss 4.2116 LearningRate 0.0002 Epoch: 24 Global Step: 514830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:36,729-Speed 6307.58 samples/sec Loss 4.1097 LearningRate 0.0002 Epoch: 24 Global Step: 514840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:39,958-Speed 6343.60 samples/sec Loss 4.1231 LearningRate 0.0002 Epoch: 24 Global Step: 514850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:43,202-Speed 6314.03 samples/sec Loss 4.1968 LearningRate 0.0002 Epoch: 24 Global Step: 514860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:46,450-Speed 6307.39 samples/sec Loss 4.1733 LearningRate 0.0002 Epoch: 24 Global Step: 514870 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:49,691-Speed 6319.96 samples/sec Loss 4.1865 LearningRate 0.0002 Epoch: 24 Global Step: 514880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:52,936-Speed 6313.88 samples/sec Loss 4.1153 LearningRate 0.0002 Epoch: 24 Global Step: 514890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:56,185-Speed 6304.50 samples/sec Loss 4.2195 LearningRate 0.0002 Epoch: 24 Global Step: 514900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:33:59,433-Speed 6306.44 samples/sec Loss 4.1695 LearningRate 0.0002 Epoch: 24 Global Step: 514910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:02,680-Speed 6308.12 samples/sec Loss 4.2253 LearningRate 0.0002 Epoch: 24 Global Step: 514920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:05,924-Speed 6314.54 samples/sec Loss 4.2056 LearningRate 0.0002 Epoch: 24 Global Step: 514930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:09,172-Speed 6307.91 samples/sec Loss 4.1770 LearningRate 0.0002 Epoch: 24 Global Step: 514940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:12,405-Speed 6336.16 samples/sec Loss 4.1897 LearningRate 0.0002 Epoch: 24 Global Step: 514950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:15,648-Speed 6316.92 samples/sec Loss 4.0997 LearningRate 0.0002 Epoch: 24 Global Step: 514960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:18,890-Speed 6317.19 samples/sec Loss 4.1257 LearningRate 0.0002 Epoch: 24 Global Step: 514970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:22,135-Speed 6312.58 samples/sec Loss 4.1640 LearningRate 0.0002 Epoch: 24 Global Step: 514980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:25,384-Speed 6306.51 samples/sec Loss 4.1122 LearningRate 0.0002 Epoch: 24 Global Step: 514990 Fp16 Grad Scale: 16384 Required: 29 hours 
Training: 2022-04-02 14:34:28,625-Speed 6319.71 samples/sec Loss 4.1620 LearningRate 0.0002 Epoch: 24 Global Step: 515000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:31,871-Speed 6311.38 samples/sec Loss 4.2093 LearningRate 0.0002 Epoch: 24 Global Step: 515010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:35,115-Speed 6315.13 samples/sec Loss 4.1753 LearningRate 0.0002 Epoch: 24 Global Step: 515020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:38,359-Speed 6313.60 samples/sec Loss 4.1827 LearningRate 0.0002 Epoch: 24 Global Step: 515030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:41,604-Speed 6313.39 samples/sec Loss 4.1001 LearningRate 0.0002 Epoch: 24 Global Step: 515040 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:44,834-Speed 6342.99 samples/sec Loss 4.1256 LearningRate 0.0002 Epoch: 24 Global Step: 515050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:48,079-Speed 6312.34 samples/sec Loss 4.1383 LearningRate 0.0002 Epoch: 24 Global Step: 515060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:51,321-Speed 6317.14 samples/sec Loss 4.1287 LearningRate 0.0002 Epoch: 24 Global Step: 515070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:54,577-Speed 6293.03 samples/sec Loss 4.2011 LearningRate 0.0002 Epoch: 24 Global Step: 515080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:34:57,821-Speed 6313.33 samples/sec Loss 4.1771 LearningRate 0.0002 Epoch: 24 Global Step: 515090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:01,067-Speed 6312.35 samples/sec Loss 4.1439 LearningRate 0.0002 Epoch: 24 Global Step: 515100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:04,310-Speed 6315.41 samples/sec Loss 4.1627 LearningRate 0.0002 Epoch: 24 Global Step: 515110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
14:35:07,558-Speed 6306.81 samples/sec Loss 4.1116 LearningRate 0.0002 Epoch: 24 Global Step: 515120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:10,813-Speed 6292.23 samples/sec Loss 4.1457 LearningRate 0.0002 Epoch: 24 Global Step: 515130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:14,056-Speed 6317.30 samples/sec Loss 4.2121 LearningRate 0.0002 Epoch: 24 Global Step: 515140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:17,286-Speed 6343.00 samples/sec Loss 4.1928 LearningRate 0.0002 Epoch: 24 Global Step: 515150 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:20,533-Speed 6307.99 samples/sec Loss 4.1605 LearningRate 0.0002 Epoch: 24 Global Step: 515160 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:23,775-Speed 6319.08 samples/sec Loss 4.2077 LearningRate 0.0002 Epoch: 24 Global Step: 515170 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:27,018-Speed 6315.11 samples/sec Loss 4.1647 LearningRate 0.0002 Epoch: 24 Global Step: 515180 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:30,260-Speed 6319.80 samples/sec Loss 4.1388 LearningRate 0.0002 Epoch: 24 Global Step: 515190 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:33,504-Speed 6314.43 samples/sec Loss 4.2054 LearningRate 0.0002 Epoch: 24 Global Step: 515200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:36,749-Speed 6312.93 samples/sec Loss 4.1196 LearningRate 0.0002 Epoch: 24 Global Step: 515210 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:39,995-Speed 6309.51 samples/sec Loss 4.1113 LearningRate 0.0002 Epoch: 24 Global Step: 515220 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:43,238-Speed 6316.73 samples/sec Loss 4.1636 LearningRate 0.0002 Epoch: 24 Global Step: 515230 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:46,485-Speed 6309.41 
samples/sec Loss 4.1153 LearningRate 0.0002 Epoch: 24 Global Step: 515240 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:49,725-Speed 6322.22 samples/sec Loss 4.1701 LearningRate 0.0002 Epoch: 24 Global Step: 515250 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:35:52,955-Speed 6343.14 samples/sec Loss 4.0980 LearningRate 0.0002 Epoch: 24 Global Step: 515260 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:56,202-Speed 6308.50 samples/sec Loss 4.1136 LearningRate 0.0002 Epoch: 24 Global Step: 515270 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:35:59,458-Speed 6291.61 samples/sec Loss 4.1840 LearningRate 0.0002 Epoch: 24 Global Step: 515280 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:02,702-Speed 6315.81 samples/sec Loss 4.1763 LearningRate 0.0002 Epoch: 24 Global Step: 515290 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:05,950-Speed 6304.73 samples/sec Loss 4.1238 LearningRate 0.0002 Epoch: 24 Global Step: 515300 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:09,195-Speed 6313.78 samples/sec Loss 4.1177 LearningRate 0.0002 Epoch: 24 Global Step: 515310 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:12,439-Speed 6314.58 samples/sec Loss 4.1721 LearningRate 0.0002 Epoch: 24 Global Step: 515320 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:15,687-Speed 6306.22 samples/sec Loss 4.1518 LearningRate 0.0002 Epoch: 24 Global Step: 515330 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:18,930-Speed 6317.26 samples/sec Loss 4.1557 LearningRate 0.0002 Epoch: 24 Global Step: 515340 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:22,185-Speed 6292.30 samples/sec Loss 4.1017 LearningRate 0.0002 Epoch: 24 Global Step: 515350 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:25,421-Speed 6330.72 samples/sec Loss 4.1699 
LearningRate 0.0002 Epoch: 24 Global Step: 515360 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:28,714-Speed 6220.83 samples/sec Loss 4.1654 LearningRate 0.0002 Epoch: 24 Global Step: 515370 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:31,961-Speed 6309.99 samples/sec Loss 4.1858 LearningRate 0.0002 Epoch: 24 Global Step: 515380 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:35,206-Speed 6312.20 samples/sec Loss 4.1369 LearningRate 0.0002 Epoch: 24 Global Step: 515390 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:38,450-Speed 6313.28 samples/sec Loss 4.1314 LearningRate 0.0002 Epoch: 24 Global Step: 515400 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:41,694-Speed 6315.30 samples/sec Loss 4.1590 LearningRate 0.0002 Epoch: 24 Global Step: 515410 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:44,939-Speed 6312.41 samples/sec Loss 4.1063 LearningRate 0.0002 Epoch: 24 Global Step: 515420 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:48,185-Speed 6311.33 samples/sec Loss 4.1272 LearningRate 0.0002 Epoch: 24 Global Step: 515430 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:51,436-Speed 6300.79 samples/sec Loss 4.1556 LearningRate 0.0002 Epoch: 24 Global Step: 515440 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:54,678-Speed 6319.16 samples/sec Loss 4.1209 LearningRate 0.0002 Epoch: 24 Global Step: 515450 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:36:57,917-Speed 6323.44 samples/sec Loss 4.1495 LearningRate 0.0002 Epoch: 24 Global Step: 515460 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:01,166-Speed 6306.96 samples/sec Loss 4.1391 LearningRate 0.0002 Epoch: 24 Global Step: 515470 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:04,417-Speed 6300.81 samples/sec Loss 4.1497 LearningRate 0.0002 Epoch: 24 
Global Step: 515480 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:07,663-Speed 6309.92 samples/sec Loss 4.2137 LearningRate 0.0002 Epoch: 24 Global Step: 515490 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:10,904-Speed 6320.56 samples/sec Loss 4.1995 LearningRate 0.0002 Epoch: 24 Global Step: 515500 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:14,149-Speed 6312.51 samples/sec Loss 4.1618 LearningRate 0.0002 Epoch: 24 Global Step: 515510 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:17,400-Speed 6300.38 samples/sec Loss 4.1718 LearningRate 0.0002 Epoch: 24 Global Step: 515520 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:20,641-Speed 6321.44 samples/sec Loss 4.1347 LearningRate 0.0002 Epoch: 24 Global Step: 515530 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:23,889-Speed 6306.55 samples/sec Loss 4.0932 LearningRate 0.0002 Epoch: 24 Global Step: 515540 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:27,135-Speed 6310.93 samples/sec Loss 4.1825 LearningRate 0.0002 Epoch: 24 Global Step: 515550 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:30,370-Speed 6333.00 samples/sec Loss 4.1657 LearningRate 0.0002 Epoch: 24 Global Step: 515560 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:33,615-Speed 6311.33 samples/sec Loss 4.1167 LearningRate 0.0002 Epoch: 24 Global Step: 515570 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:36,906-Speed 6225.40 samples/sec Loss 4.1761 LearningRate 0.0002 Epoch: 24 Global Step: 515580 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:40,154-Speed 6307.12 samples/sec Loss 4.1183 LearningRate 0.0002 Epoch: 24 Global Step: 515590 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:43,396-Speed 6318.26 samples/sec Loss 4.1009 LearningRate 0.0002 Epoch: 24 Global Step: 515600 Fp16 Grad 
Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:46,643-Speed 6308.10 samples/sec Loss 4.0906 LearningRate 0.0002 Epoch: 24 Global Step: 515610 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:49,891-Speed 6306.98 samples/sec Loss 4.1763 LearningRate 0.0002 Epoch: 24 Global Step: 515620 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:53,142-Speed 6300.71 samples/sec Loss 4.1105 LearningRate 0.0002 Epoch: 24 Global Step: 515630 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:37:56,377-Speed 6332.61 samples/sec Loss 4.1256 LearningRate 0.0002 Epoch: 24 Global Step: 515640 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:37:59,619-Speed 6318.44 samples/sec Loss 4.1165 LearningRate 0.0002 Epoch: 24 Global Step: 515650 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:02,871-Speed 6300.32 samples/sec Loss 4.2108 LearningRate 0.0002 Epoch: 24 Global Step: 515660 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:06,118-Speed 6308.15 samples/sec Loss 4.0932 LearningRate 0.0002 Epoch: 24 Global Step: 515670 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:09,361-Speed 6316.45 samples/sec Loss 4.1828 LearningRate 0.0002 Epoch: 24 Global Step: 515680 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:12,602-Speed 6320.33 samples/sec Loss 4.1921 LearningRate 0.0002 Epoch: 24 Global Step: 515690 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:15,845-Speed 6316.76 samples/sec Loss 4.1822 LearningRate 0.0002 Epoch: 24 Global Step: 515700 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:19,090-Speed 6312.84 samples/sec Loss 4.1978 LearningRate 0.0002 Epoch: 24 Global Step: 515710 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:22,333-Speed 6317.56 samples/sec Loss 4.1194 LearningRate 0.0002 Epoch: 24 Global Step: 515720 Fp16 Grad Scale: 8192 Required: 29 hours 
Training: 2022-04-02 14:38:25,581-Speed 6305.71 samples/sec Loss 4.1319 LearningRate 0.0002 Epoch: 24 Global Step: 515730 Fp16 Grad Scale: 8192 Required: 29 hours Training: 2022-04-02 14:38:28,827-Speed 6310.56 samples/sec Loss 4.1670 LearningRate 0.0002 Epoch: 24 Global Step: 515740 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:32,070-Speed 6317.48 samples/sec Loss 4.1242 LearningRate 0.0002 Epoch: 24 Global Step: 515750 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:35,318-Speed 6306.61 samples/sec Loss 4.1624 LearningRate 0.0002 Epoch: 24 Global Step: 515760 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:38,561-Speed 6316.22 samples/sec Loss 4.1398 LearningRate 0.0002 Epoch: 24 Global Step: 515770 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:41,805-Speed 6315.01 samples/sec Loss 4.0797 LearningRate 0.0002 Epoch: 24 Global Step: 515780 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:45,051-Speed 6310.69 samples/sec Loss 4.1683 LearningRate 0.0002 Epoch: 24 Global Step: 515790 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:48,296-Speed 6313.52 samples/sec Loss 4.1548 LearningRate 0.0002 Epoch: 24 Global Step: 515800 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:51,546-Speed 6301.82 samples/sec Loss 4.1470 LearningRate 0.0002 Epoch: 24 Global Step: 515810 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:54,790-Speed 6314.31 samples/sec Loss 4.0859 LearningRate 0.0002 Epoch: 24 Global Step: 515820 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:38:58,216-Speed 5978.73 samples/sec Loss 4.1515 LearningRate 0.0002 Epoch: 24 Global Step: 515830 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:01,450-Speed 6334.19 samples/sec Loss 4.1874 LearningRate 0.0002 Epoch: 24 Global Step: 515840 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 
14:39:04,762-Speed 6185.04 samples/sec Loss 4.1674 LearningRate 0.0002 Epoch: 24 Global Step: 515850 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:08,011-Speed 6306.64 samples/sec Loss 4.2013 LearningRate 0.0002 Epoch: 24 Global Step: 515860 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:11,255-Speed 6314.08 samples/sec Loss 4.1144 LearningRate 0.0002 Epoch: 24 Global Step: 515870 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:14,501-Speed 6311.55 samples/sec Loss 4.1957 LearningRate 0.0002 Epoch: 24 Global Step: 515880 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:17,745-Speed 6313.18 samples/sec Loss 4.1606 LearningRate 0.0002 Epoch: 24 Global Step: 515890 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:20,987-Speed 6320.18 samples/sec Loss 4.1696 LearningRate 0.0002 Epoch: 24 Global Step: 515900 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:24,232-Speed 6311.70 samples/sec Loss 4.1581 LearningRate 0.0002 Epoch: 24 Global Step: 515910 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:27,478-Speed 6310.61 samples/sec Loss 4.1463 LearningRate 0.0002 Epoch: 24 Global Step: 515920 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:30,721-Speed 6316.46 samples/sec Loss 4.1686 LearningRate 0.0002 Epoch: 24 Global Step: 515930 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:33,953-Speed 6338.86 samples/sec Loss 4.0885 LearningRate 0.0002 Epoch: 24 Global Step: 515940 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:37,198-Speed 6312.38 samples/sec Loss 4.1512 LearningRate 0.0002 Epoch: 24 Global Step: 515950 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:40,443-Speed 6312.78 samples/sec Loss 4.1612 LearningRate 0.0002 Epoch: 24 Global Step: 515960 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:43,685-Speed 6318.58 
samples/sec Loss 4.1540 LearningRate 0.0002 Epoch: 24 Global Step: 515970 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:46,926-Speed 6319.21 samples/sec Loss 4.1090 LearningRate 0.0002 Epoch: 24 Global Step: 515980 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:50,174-Speed 6308.40 samples/sec Loss 4.1978 LearningRate 0.0002 Epoch: 24 Global Step: 515990 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:53,420-Speed 6309.41 samples/sec Loss 4.1346 LearningRate 0.0002 Epoch: 24 Global Step: 516000 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:56,660-Speed 6323.43 samples/sec Loss 4.1102 LearningRate 0.0002 Epoch: 24 Global Step: 516010 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:39:59,906-Speed 6310.19 samples/sec Loss 4.1197 LearningRate 0.0002 Epoch: 24 Global Step: 516020 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:03,152-Speed 6310.05 samples/sec Loss 4.1686 LearningRate 0.0002 Epoch: 24 Global Step: 516030 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:06,396-Speed 6314.83 samples/sec Loss 4.1909 LearningRate 0.0002 Epoch: 24 Global Step: 516040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-02 14:40:09,628-Speed 6338.19 samples/sec Loss 4.1499 LearningRate 0.0002 Epoch: 24 Global Step: 516050 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:12,867-Speed 6323.53 samples/sec Loss 4.1627 LearningRate 0.0002 Epoch: 24 Global Step: 516060 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:16,110-Speed 6316.71 samples/sec Loss 4.1536 LearningRate 0.0002 Epoch: 24 Global Step: 516070 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:19,357-Speed 6308.67 samples/sec Loss 4.1525 LearningRate 0.0002 Epoch: 24 Global Step: 516080 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:22,601-Speed 6316.67 samples/sec Loss 4.1469 
LearningRate 0.0002 Epoch: 24 Global Step: 516090 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:25,843-Speed 6316.99 samples/sec Loss 4.1794 LearningRate 0.0002 Epoch: 24 Global Step: 516100 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:29,085-Speed 6319.17 samples/sec Loss 4.1766 LearningRate 0.0002 Epoch: 24 Global Step: 516110 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:32,327-Speed 6318.19 samples/sec Loss 4.1698 LearningRate 0.0002 Epoch: 24 Global Step: 516120 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:35,570-Speed 6316.58 samples/sec Loss 4.1609 LearningRate 0.0002 Epoch: 24 Global Step: 516130 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:38,814-Speed 6315.51 samples/sec Loss 4.1597 LearningRate 0.0002 Epoch: 24 Global Step: 516140 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-02 14:40:42,047-Speed 6335.98 samples/sec Loss 4.1907 LearningRate 0.0002 Epoch: 24 Global Step: 516150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:40:45,302-Speed 6293.92 samples/sec Loss 4.1518 LearningRate 0.0002 Epoch: 24 Global Step: 516160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:40:48,547-Speed 6311.27 samples/sec Loss 4.1094 LearningRate 0.0002 Epoch: 24 Global Step: 516170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:40:51,817-Speed 6264.88 samples/sec Loss 4.2293 LearningRate 0.0002 Epoch: 24 Global Step: 516180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:40:55,132-Speed 6178.83 samples/sec Loss 4.1242 LearningRate 0.0002 Epoch: 24 Global Step: 516190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:40:58,378-Speed 6311.77 samples/sec Loss 4.2170 LearningRate 0.0002 Epoch: 24 Global Step: 516200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:01,623-Speed 6312.00 samples/sec Loss 4.1778 LearningRate 0.0002 Epoch: 24 
Global Step: 516210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:04,867-Speed 6314.48 samples/sec Loss 4.1691 LearningRate 0.0002 Epoch: 24 Global Step: 516220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:08,110-Speed 6317.27 samples/sec Loss 4.1295 LearningRate 0.0002 Epoch: 24 Global Step: 516230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:11,355-Speed 6313.15 samples/sec Loss 4.1195 LearningRate 0.0002 Epoch: 24 Global Step: 516240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:14,583-Speed 6344.35 samples/sec Loss 4.1547 LearningRate 0.0002 Epoch: 24 Global Step: 516250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:17,827-Speed 6314.92 samples/sec Loss 4.1563 LearningRate 0.0002 Epoch: 24 Global Step: 516260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:21,071-Speed 6315.85 samples/sec Loss 4.1455 LearningRate 0.0002 Epoch: 24 Global Step: 516270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:24,329-Speed 6287.17 samples/sec Loss 4.0664 LearningRate 0.0002 Epoch: 24 Global Step: 516280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:27,576-Speed 6307.74 samples/sec Loss 4.1449 LearningRate 0.0002 Epoch: 24 Global Step: 516290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:30,821-Speed 6313.41 samples/sec Loss 4.1827 LearningRate 0.0002 Epoch: 24 Global Step: 516300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:34,065-Speed 6314.72 samples/sec Loss 4.1496 LearningRate 0.0002 Epoch: 24 Global Step: 516310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:37,334-Speed 6267.64 samples/sec Loss 4.1272 LearningRate 0.0002 Epoch: 24 Global Step: 516320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:40,581-Speed 6307.55 samples/sec Loss 4.1273 LearningRate 0.0002 Epoch: 24 Global Step: 516330 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:43,824-Speed 6317.41 samples/sec Loss 4.2333 LearningRate 0.0002 Epoch: 24 Global Step: 516340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:47,053-Speed 6343.06 samples/sec Loss 4.1442 LearningRate 0.0002 Epoch: 24 Global Step: 516350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:50,300-Speed 6309.92 samples/sec Loss 4.1423 LearningRate 0.0002 Epoch: 24 Global Step: 516360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:53,542-Speed 6317.10 samples/sec Loss 4.1872 LearningRate 0.0002 Epoch: 24 Global Step: 516370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:41:56,786-Speed 6315.83 samples/sec Loss 4.1935 LearningRate 0.0002 Epoch: 24 Global Step: 516380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:00,033-Speed 6308.90 samples/sec Loss 4.1117 LearningRate 0.0002 Epoch: 24 Global Step: 516390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:03,277-Speed 6312.85 samples/sec Loss 4.1622 LearningRate 0.0002 Epoch: 24 Global Step: 516400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:06,522-Speed 6314.06 samples/sec Loss 4.1048 LearningRate 0.0002 Epoch: 24 Global Step: 516410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:09,768-Speed 6309.87 samples/sec Loss 4.1669 LearningRate 0.0002 Epoch: 24 Global Step: 516420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:13,013-Speed 6311.82 samples/sec Loss 4.1544 LearningRate 0.0002 Epoch: 24 Global Step: 516430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:16,261-Speed 6308.10 samples/sec Loss 4.1751 LearningRate 0.0002 Epoch: 24 Global Step: 516440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:19,492-Speed 6340.18 samples/sec Loss 4.1181 LearningRate 0.0002 Epoch: 24 Global Step: 516450 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 14:42:22,740-Speed 6306.12 samples/sec Loss 4.1424 LearningRate 0.0002 Epoch: 24 Global Step: 516460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:25,988-Speed 6306.82 samples/sec Loss 4.1583 LearningRate 0.0002 Epoch: 24 Global Step: 516470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:29,232-Speed 6314.04 samples/sec Loss 4.1451 LearningRate 0.0002 Epoch: 24 Global Step: 516480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:32,477-Speed 6312.48 samples/sec Loss 4.1556 LearningRate 0.0002 Epoch: 24 Global Step: 516490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:35,723-Speed 6310.43 samples/sec Loss 4.1811 LearningRate 0.0002 Epoch: 24 Global Step: 516500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:38,970-Speed 6309.76 samples/sec Loss 4.1395 LearningRate 0.0002 Epoch: 24 Global Step: 516510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:42,212-Speed 6318.55 samples/sec Loss 4.0849 LearningRate 0.0002 Epoch: 24 Global Step: 516520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:45,461-Speed 6305.88 samples/sec Loss 4.1604 LearningRate 0.0002 Epoch: 24 Global Step: 516530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:48,763-Speed 6203.23 samples/sec Loss 4.1447 LearningRate 0.0002 Epoch: 24 Global Step: 516540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:52,027-Speed 6276.81 samples/sec Loss 4.1841 LearningRate 0.0002 Epoch: 24 Global Step: 516550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:55,274-Speed 6309.43 samples/sec Loss 4.1347 LearningRate 0.0002 Epoch: 24 Global Step: 516560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:42:58,520-Speed 6310.29 samples/sec Loss 4.1585 LearningRate 0.0002 Epoch: 24 Global Step: 516570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
14:43:01,766-Speed 6310.30 samples/sec Loss 4.1328 LearningRate 0.0002 Epoch: 24 Global Step: 516580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:05,018-Speed 6299.14 samples/sec Loss 4.1618 LearningRate 0.0002 Epoch: 24 Global Step: 516590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:08,266-Speed 6306.73 samples/sec Loss 4.1770 LearningRate 0.0002 Epoch: 24 Global Step: 516600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:11,510-Speed 6313.91 samples/sec Loss 4.1540 LearningRate 0.0002 Epoch: 24 Global Step: 516610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:14,754-Speed 6315.62 samples/sec Loss 4.1417 LearningRate 0.0002 Epoch: 24 Global Step: 516620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:18,005-Speed 6299.74 samples/sec Loss 4.1726 LearningRate 0.0002 Epoch: 24 Global Step: 516630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:21,250-Speed 6314.64 samples/sec Loss 4.1891 LearningRate 0.0002 Epoch: 24 Global Step: 516640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:24,479-Speed 6342.90 samples/sec Loss 4.1154 LearningRate 0.0002 Epoch: 24 Global Step: 516650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:27,726-Speed 6307.60 samples/sec Loss 4.1565 LearningRate 0.0002 Epoch: 24 Global Step: 516660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:30,972-Speed 6311.44 samples/sec Loss 4.1120 LearningRate 0.0002 Epoch: 24 Global Step: 516670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:34,222-Speed 6302.47 samples/sec Loss 4.1212 LearningRate 0.0002 Epoch: 24 Global Step: 516680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:37,471-Speed 6305.86 samples/sec Loss 4.1436 LearningRate 0.0002 Epoch: 24 Global Step: 516690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:40,715-Speed 6313.32 
samples/sec Loss 4.1745 LearningRate 0.0002 Epoch: 24 Global Step: 516700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:43,961-Speed 6312.63 samples/sec Loss 4.1975 LearningRate 0.0002 Epoch: 24 Global Step: 516710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:47,260-Speed 6209.71 samples/sec Loss 4.1261 LearningRate 0.0002 Epoch: 24 Global Step: 516720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:50,518-Speed 6287.81 samples/sec Loss 4.1898 LearningRate 0.0002 Epoch: 24 Global Step: 516730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:53,766-Speed 6306.18 samples/sec Loss 4.1628 LearningRate 0.0002 Epoch: 24 Global Step: 516740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:43:57,012-Speed 6311.42 samples/sec Loss 4.1446 LearningRate 0.0002 Epoch: 24 Global Step: 516750 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:44:00,326-Speed 6180.58 samples/sec Loss 4.1863 LearningRate 0.0002 Epoch: 24 Global Step: 516760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:03,678-Speed 6111.65 samples/sec Loss 4.1540 LearningRate 0.0002 Epoch: 24 Global Step: 516770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:06,932-Speed 6294.91 samples/sec Loss 4.1327 LearningRate 0.0002 Epoch: 24 Global Step: 516780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:10,194-Speed 6279.85 samples/sec Loss 4.1862 LearningRate 0.0002 Epoch: 24 Global Step: 516790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:13,483-Speed 6227.27 samples/sec Loss 4.1097 LearningRate 0.0002 Epoch: 24 Global Step: 516800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:16,728-Speed 6313.10 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 24 Global Step: 516810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:19,970-Speed 6317.14 samples/sec Loss 4.1043 
LearningRate 0.0002 Epoch: 24 Global Step: 516820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:23,216-Speed 6311.93 samples/sec Loss 4.0648 LearningRate 0.0002 Epoch: 24 Global Step: 516830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:26,473-Speed 6289.14 samples/sec Loss 4.1564 LearningRate 0.0002 Epoch: 24 Global Step: 516840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:44:29,702-Speed 6344.27 samples/sec Loss 4.1153 LearningRate 0.0002 Epoch: 24 Global Step: 516850 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:32,947-Speed 6311.67 samples/sec Loss 4.1303 LearningRate 0.0002 Epoch: 24 Global Step: 516860 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:36,190-Speed 6316.92 samples/sec Loss 4.1631 LearningRate 0.0002 Epoch: 24 Global Step: 516870 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:39,439-Speed 6304.58 samples/sec Loss 4.1973 LearningRate 0.0002 Epoch: 24 Global Step: 516880 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:42,684-Speed 6313.62 samples/sec Loss 4.1624 LearningRate 0.0002 Epoch: 24 Global Step: 516890 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:45,927-Speed 6315.68 samples/sec Loss 4.1061 LearningRate 0.0002 Epoch: 24 Global Step: 516900 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:49,170-Speed 6317.90 samples/sec Loss 4.1318 LearningRate 0.0002 Epoch: 24 Global Step: 516910 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:52,413-Speed 6315.67 samples/sec Loss 4.1774 LearningRate 0.0002 Epoch: 24 Global Step: 516920 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:55,655-Speed 6320.03 samples/sec Loss 4.1707 LearningRate 0.0002 Epoch: 24 Global Step: 516930 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:44:58,898-Speed 6316.86 samples/sec Loss 4.0858 LearningRate 0.0002 Epoch: 24 Global 
Step: 516940 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:45:02,145-Speed 6308.77 samples/sec Loss 4.1526 LearningRate 0.0002 Epoch: 24 Global Step: 516950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:05,388-Speed 6315.31 samples/sec Loss 4.1936 LearningRate 0.0002 Epoch: 24 Global Step: 516960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:08,633-Speed 6313.43 samples/sec Loss 4.2127 LearningRate 0.0002 Epoch: 24 Global Step: 516970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:11,877-Speed 6313.83 samples/sec Loss 4.1409 LearningRate 0.0002 Epoch: 24 Global Step: 516980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:15,126-Speed 6304.73 samples/sec Loss 4.1818 LearningRate 0.0002 Epoch: 24 Global Step: 516990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:18,369-Speed 6317.39 samples/sec Loss 4.1225 LearningRate 0.0002 Epoch: 24 Global Step: 517000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:21,614-Speed 6312.81 samples/sec Loss 4.2173 LearningRate 0.0002 Epoch: 24 Global Step: 517010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:24,862-Speed 6307.14 samples/sec Loss 4.1541 LearningRate 0.0002 Epoch: 24 Global Step: 517020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:28,107-Speed 6312.46 samples/sec Loss 4.1349 LearningRate 0.0002 Epoch: 24 Global Step: 517030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:31,353-Speed 6309.28 samples/sec Loss 4.1689 LearningRate 0.0002 Epoch: 24 Global Step: 517040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:34,596-Speed 6318.21 samples/sec Loss 4.1621 LearningRate 0.0002 Epoch: 24 Global Step: 517050 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:45:37,826-Speed 6340.26 samples/sec Loss 4.1695 LearningRate 0.0002 Epoch: 24 Global Step: 517060 Fp16 Grad Scale: 
16384 Required: 28 hours Training: 2022-04-02 14:45:41,075-Speed 6304.63 samples/sec Loss 4.2072 LearningRate 0.0002 Epoch: 24 Global Step: 517070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:44,319-Speed 6315.27 samples/sec Loss 4.1389 LearningRate 0.0002 Epoch: 24 Global Step: 517080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:47,562-Speed 6317.13 samples/sec Loss 4.1488 LearningRate 0.0002 Epoch: 24 Global Step: 517090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:50,804-Speed 6318.08 samples/sec Loss 4.2041 LearningRate 0.0002 Epoch: 24 Global Step: 517100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:54,048-Speed 6315.81 samples/sec Loss 4.1494 LearningRate 0.0002 Epoch: 24 Global Step: 517110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:45:57,297-Speed 6303.72 samples/sec Loss 4.1667 LearningRate 0.0002 Epoch: 24 Global Step: 517120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:00,542-Speed 6311.77 samples/sec Loss 4.1045 LearningRate 0.0002 Epoch: 24 Global Step: 517130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:03,790-Speed 6307.93 samples/sec Loss 4.1466 LearningRate 0.0002 Epoch: 24 Global Step: 517140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:07,031-Speed 6320.41 samples/sec Loss 4.1587 LearningRate 0.0002 Epoch: 24 Global Step: 517150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:10,265-Speed 6334.70 samples/sec Loss 4.1575 LearningRate 0.0002 Epoch: 24 Global Step: 517160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:13,512-Speed 6309.33 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 24 Global Step: 517170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:16,761-Speed 6304.76 samples/sec Loss 4.1454 LearningRate 0.0002 Epoch: 24 Global Step: 517180 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 14:46:20,005-Speed 6314.31 samples/sec Loss 4.1392 LearningRate 0.0002 Epoch: 24 Global Step: 517190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:23,249-Speed 6315.23 samples/sec Loss 4.0599 LearningRate 0.0002 Epoch: 24 Global Step: 517200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:26,500-Speed 6300.40 samples/sec Loss 4.1504 LearningRate 0.0002 Epoch: 24 Global Step: 517210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:29,755-Speed 6293.72 samples/sec Loss 4.0913 LearningRate 0.0002 Epoch: 24 Global Step: 517220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:32,999-Speed 6314.21 samples/sec Loss 4.1748 LearningRate 0.0002 Epoch: 24 Global Step: 517230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:36,243-Speed 6314.13 samples/sec Loss 4.1664 LearningRate 0.0002 Epoch: 24 Global Step: 517240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:39,486-Speed 6317.70 samples/sec Loss 4.1746 LearningRate 0.0002 Epoch: 24 Global Step: 517250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:42,717-Speed 6338.25 samples/sec Loss 4.1181 LearningRate 0.0002 Epoch: 24 Global Step: 517260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:45,959-Speed 6318.92 samples/sec Loss 4.1880 LearningRate 0.0002 Epoch: 24 Global Step: 517270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:49,207-Speed 6308.67 samples/sec Loss 4.1582 LearningRate 0.0002 Epoch: 24 Global Step: 517280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:52,455-Speed 6305.96 samples/sec Loss 4.0995 LearningRate 0.0002 Epoch: 24 Global Step: 517290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:46:55,695-Speed 6322.04 samples/sec Loss 4.1794 LearningRate 0.0002 Epoch: 24 Global Step: 517300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
14:46:58,945-Speed 6302.57 samples/sec Loss 4.0591 LearningRate 0.0002 Epoch: 24 Global Step: 517310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:02,204-Speed 6286.69 samples/sec Loss 4.1733 LearningRate 0.0002 Epoch: 24 Global Step: 517320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:05,482-Speed 6247.95 samples/sec Loss 4.1686 LearningRate 0.0002 Epoch: 24 Global Step: 517330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:08,727-Speed 6313.92 samples/sec Loss 4.1287 LearningRate 0.0002 Epoch: 24 Global Step: 517340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:11,979-Speed 6297.53 samples/sec Loss 4.1672 LearningRate 0.0002 Epoch: 24 Global Step: 517350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:15,230-Speed 6301.90 samples/sec Loss 4.1208 LearningRate 0.0002 Epoch: 24 Global Step: 517360 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:47:18,461-Speed 6341.17 samples/sec Loss 4.0899 LearningRate 0.0002 Epoch: 24 Global Step: 517370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:21,707-Speed 6311.19 samples/sec Loss 4.1614 LearningRate 0.0002 Epoch: 24 Global Step: 517380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:24,949-Speed 6317.87 samples/sec Loss 4.1067 LearningRate 0.0002 Epoch: 24 Global Step: 517390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:28,194-Speed 6311.48 samples/sec Loss 4.1431 LearningRate 0.0002 Epoch: 24 Global Step: 517400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:31,445-Speed 6302.43 samples/sec Loss 4.1846 LearningRate 0.0002 Epoch: 24 Global Step: 517410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:34,690-Speed 6313.32 samples/sec Loss 4.1340 LearningRate 0.0002 Epoch: 24 Global Step: 517420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:37,937-Speed 6308.48 
samples/sec Loss 4.1403 LearningRate 0.0002 Epoch: 24 Global Step: 517430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:41,183-Speed 6310.09 samples/sec Loss 4.1776 LearningRate 0.0002 Epoch: 24 Global Step: 517440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:44,426-Speed 6316.48 samples/sec Loss 4.1839 LearningRate 0.0002 Epoch: 24 Global Step: 517450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:47,668-Speed 6317.54 samples/sec Loss 4.1237 LearningRate 0.0002 Epoch: 24 Global Step: 517460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:50,899-Speed 6340.27 samples/sec Loss 4.1450 LearningRate 0.0002 Epoch: 24 Global Step: 517470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:54,145-Speed 6311.54 samples/sec Loss 4.1782 LearningRate 0.0002 Epoch: 24 Global Step: 517480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:47:57,388-Speed 6316.53 samples/sec Loss 4.1599 LearningRate 0.0002 Epoch: 24 Global Step: 517490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:00,635-Speed 6308.17 samples/sec Loss 4.1860 LearningRate 0.0002 Epoch: 24 Global Step: 517500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:03,883-Speed 6307.32 samples/sec Loss 4.0992 LearningRate 0.0002 Epoch: 24 Global Step: 517510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:07,128-Speed 6312.44 samples/sec Loss 4.1173 LearningRate 0.0002 Epoch: 24 Global Step: 517520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:10,377-Speed 6305.30 samples/sec Loss 4.1844 LearningRate 0.0002 Epoch: 24 Global Step: 517530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:13,621-Speed 6315.35 samples/sec Loss 4.1140 LearningRate 0.0002 Epoch: 24 Global Step: 517540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:16,880-Speed 6285.17 samples/sec Loss 4.1507 
LearningRate 0.0002 Epoch: 24 Global Step: 517550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:20,123-Speed 6316.54 samples/sec Loss 4.1512 LearningRate 0.0002 Epoch: 24 Global Step: 517560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:23,356-Speed 6335.96 samples/sec Loss 4.1382 LearningRate 0.0002 Epoch: 24 Global Step: 517570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:26,603-Speed 6309.42 samples/sec Loss 4.1596 LearningRate 0.0002 Epoch: 24 Global Step: 517580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:29,847-Speed 6313.88 samples/sec Loss 4.1783 LearningRate 0.0002 Epoch: 24 Global Step: 517590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:33,097-Speed 6303.61 samples/sec Loss 4.1040 LearningRate 0.0002 Epoch: 24 Global Step: 517600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:36,343-Speed 6311.19 samples/sec Loss 4.1058 LearningRate 0.0002 Epoch: 24 Global Step: 517610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:39,586-Speed 6316.70 samples/sec Loss 4.1178 LearningRate 0.0002 Epoch: 24 Global Step: 517620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:42,831-Speed 6311.27 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 24 Global Step: 517630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:46,076-Speed 6312.82 samples/sec Loss 4.1548 LearningRate 0.0002 Epoch: 24 Global Step: 517640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:49,320-Speed 6315.52 samples/sec Loss 4.1006 LearningRate 0.0002 Epoch: 24 Global Step: 517650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:52,565-Speed 6311.97 samples/sec Loss 4.1335 LearningRate 0.0002 Epoch: 24 Global Step: 517660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:55,799-Speed 6334.68 samples/sec Loss 4.1004 LearningRate 0.0002 Epoch: 24 
Global Step: 517670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:48:59,043-Speed 6313.25 samples/sec Loss 4.1063 LearningRate 0.0002 Epoch: 24 Global Step: 517680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:02,288-Speed 6312.76 samples/sec Loss 4.1795 LearningRate 0.0002 Epoch: 24 Global Step: 517690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:05,536-Speed 6306.87 samples/sec Loss 4.1572 LearningRate 0.0002 Epoch: 24 Global Step: 517700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:08,785-Speed 6305.26 samples/sec Loss 4.1018 LearningRate 0.0002 Epoch: 24 Global Step: 517710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:12,037-Speed 6299.75 samples/sec Loss 4.1742 LearningRate 0.0002 Epoch: 24 Global Step: 517720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:15,281-Speed 6313.91 samples/sec Loss 4.1037 LearningRate 0.0002 Epoch: 24 Global Step: 517730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:18,523-Speed 6318.66 samples/sec Loss 4.1395 LearningRate 0.0002 Epoch: 24 Global Step: 517740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:21,767-Speed 6314.53 samples/sec Loss 4.1354 LearningRate 0.0002 Epoch: 24 Global Step: 517750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:25,020-Speed 6297.25 samples/sec Loss 4.1203 LearningRate 0.0002 Epoch: 24 Global Step: 517760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:28,250-Speed 6340.88 samples/sec Loss 4.1787 LearningRate 0.0002 Epoch: 24 Global Step: 517770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:31,494-Speed 6315.86 samples/sec Loss 4.1984 LearningRate 0.0002 Epoch: 24 Global Step: 517780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:34,735-Speed 6321.39 samples/sec Loss 4.1356 LearningRate 0.0002 Epoch: 24 Global Step: 517790 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:37,978-Speed 6315.22 samples/sec Loss 4.1431 LearningRate 0.0002 Epoch: 24 Global Step: 517800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:41,240-Speed 6280.74 samples/sec Loss 4.0926 LearningRate 0.0002 Epoch: 24 Global Step: 517810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:44,480-Speed 6322.41 samples/sec Loss 4.0844 LearningRate 0.0002 Epoch: 24 Global Step: 517820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:47,725-Speed 6311.86 samples/sec Loss 4.1599 LearningRate 0.0002 Epoch: 24 Global Step: 517830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:50,975-Speed 6304.26 samples/sec Loss 4.1987 LearningRate 0.0002 Epoch: 24 Global Step: 517840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:54,221-Speed 6309.24 samples/sec Loss 4.1131 LearningRate 0.0002 Epoch: 24 Global Step: 517850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:49:57,480-Speed 6285.89 samples/sec Loss 4.2243 LearningRate 0.0002 Epoch: 24 Global Step: 517860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:00,710-Speed 6342.66 samples/sec Loss 4.1452 LearningRate 0.0002 Epoch: 24 Global Step: 517870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:03,955-Speed 6311.91 samples/sec Loss 4.2074 LearningRate 0.0002 Epoch: 24 Global Step: 517880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:07,205-Speed 6303.39 samples/sec Loss 4.1218 LearningRate 0.0002 Epoch: 24 Global Step: 517890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:10,449-Speed 6313.85 samples/sec Loss 4.1394 LearningRate 0.0002 Epoch: 24 Global Step: 517900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:13,693-Speed 6315.17 samples/sec Loss 4.0991 LearningRate 0.0002 Epoch: 24 Global Step: 517910 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 14:50:16,941-Speed 6306.76 samples/sec Loss 4.1918 LearningRate 0.0002 Epoch: 24 Global Step: 517920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:20,189-Speed 6307.46 samples/sec Loss 4.1228 LearningRate 0.0002 Epoch: 24 Global Step: 517930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:23,441-Speed 6298.54 samples/sec Loss 4.1878 LearningRate 0.0002 Epoch: 24 Global Step: 517940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:26,687-Speed 6311.51 samples/sec Loss 4.0803 LearningRate 0.0002 Epoch: 24 Global Step: 517950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:29,932-Speed 6311.48 samples/sec Loss 4.0593 LearningRate 0.0002 Epoch: 24 Global Step: 517960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:33,162-Speed 6343.51 samples/sec Loss 4.1684 LearningRate 0.0002 Epoch: 24 Global Step: 517970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:36,409-Speed 6307.98 samples/sec Loss 4.1323 LearningRate 0.0002 Epoch: 24 Global Step: 517980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:39,651-Speed 6318.32 samples/sec Loss 4.1162 LearningRate 0.0002 Epoch: 24 Global Step: 517990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:42,894-Speed 6317.01 samples/sec Loss 4.1510 LearningRate 0.0002 Epoch: 24 Global Step: 518000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:46,141-Speed 6308.56 samples/sec Loss 4.1037 LearningRate 0.0002 Epoch: 24 Global Step: 518010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:49,435-Speed 6219.22 samples/sec Loss 4.1001 LearningRate 0.0002 Epoch: 24 Global Step: 518020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:52,691-Speed 6291.79 samples/sec Loss 4.1495 LearningRate 0.0002 Epoch: 24 Global Step: 518030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
14:50:55,934-Speed 6315.93 samples/sec Loss 4.1652 LearningRate 0.0002 Epoch: 24 Global Step: 518040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:50:59,177-Speed 6316.47 samples/sec Loss 4.1431 LearningRate 0.0002 Epoch: 24 Global Step: 518050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:02,422-Speed 6312.71 samples/sec Loss 4.1502 LearningRate 0.0002 Epoch: 24 Global Step: 518060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:05,665-Speed 6317.08 samples/sec Loss 4.1247 LearningRate 0.0002 Epoch: 24 Global Step: 518070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:08,915-Speed 6302.36 samples/sec Loss 4.0958 LearningRate 0.0002 Epoch: 24 Global Step: 518080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:12,159-Speed 6315.35 samples/sec Loss 4.1630 LearningRate 0.0002 Epoch: 24 Global Step: 518090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:15,406-Speed 6309.09 samples/sec Loss 4.1587 LearningRate 0.0002 Epoch: 24 Global Step: 518100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:18,648-Speed 6318.11 samples/sec Loss 4.2107 LearningRate 0.0002 Epoch: 24 Global Step: 518110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:21,893-Speed 6312.85 samples/sec Loss 4.1941 LearningRate 0.0002 Epoch: 24 Global Step: 518120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:25,145-Speed 6298.47 samples/sec Loss 4.1278 LearningRate 0.0002 Epoch: 24 Global Step: 518130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:28,392-Speed 6308.09 samples/sec Loss 4.1598 LearningRate 0.0002 Epoch: 24 Global Step: 518140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:31,638-Speed 6312.10 samples/sec Loss 4.1622 LearningRate 0.0002 Epoch: 24 Global Step: 518150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:34,881-Speed 6316.33 
samples/sec Loss 4.2276 LearningRate 0.0002 Epoch: 24 Global Step: 518160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:38,125-Speed 6313.55 samples/sec Loss 4.1906 LearningRate 0.0002 Epoch: 24 Global Step: 518170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:51:41,361-Speed 6330.59 samples/sec Loss 4.2320 LearningRate 0.0002 Epoch: 24 Global Step: 518180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:44,605-Speed 6315.02 samples/sec Loss 4.0922 LearningRate 0.0002 Epoch: 24 Global Step: 518190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:47,851-Speed 6309.76 samples/sec Loss 4.1444 LearningRate 0.0002 Epoch: 24 Global Step: 518200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:51,095-Speed 6316.41 samples/sec Loss 4.1206 LearningRate 0.0002 Epoch: 24 Global Step: 518210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:54,343-Speed 6306.61 samples/sec Loss 4.1510 LearningRate 0.0002 Epoch: 24 Global Step: 518220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:51:57,583-Speed 6321.65 samples/sec Loss 4.2018 LearningRate 0.0002 Epoch: 24 Global Step: 518230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:00,830-Speed 6309.04 samples/sec Loss 4.2218 LearningRate 0.0002 Epoch: 24 Global Step: 518240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:04,073-Speed 6317.61 samples/sec Loss 4.0986 LearningRate 0.0002 Epoch: 24 Global Step: 518250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:07,317-Speed 6314.71 samples/sec Loss 4.0740 LearningRate 0.0002 Epoch: 24 Global Step: 518260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:10,571-Speed 6293.26 samples/sec Loss 4.1338 LearningRate 0.0002 Epoch: 24 Global Step: 518270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:13,804-Speed 6337.32 samples/sec Loss 4.1319 
LearningRate 0.0002 Epoch: 24 Global Step: 518280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:17,050-Speed 6310.08 samples/sec Loss 4.0720 LearningRate 0.0002 Epoch: 24 Global Step: 518290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:20,296-Speed 6311.16 samples/sec Loss 4.1329 LearningRate 0.0002 Epoch: 24 Global Step: 518300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:23,556-Speed 6284.18 samples/sec Loss 4.0897 LearningRate 0.0002 Epoch: 24 Global Step: 518310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:26,800-Speed 6314.67 samples/sec Loss 4.1220 LearningRate 0.0002 Epoch: 24 Global Step: 518320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:30,043-Speed 6315.83 samples/sec Loss 4.2071 LearningRate 0.0002 Epoch: 24 Global Step: 518330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:33,292-Speed 6304.93 samples/sec Loss 4.1643 LearningRate 0.0002 Epoch: 24 Global Step: 518340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:36,533-Speed 6319.45 samples/sec Loss 4.1803 LearningRate 0.0002 Epoch: 24 Global Step: 518350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:39,782-Speed 6304.90 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 24 Global Step: 518360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:43,031-Speed 6306.38 samples/sec Loss 4.0988 LearningRate 0.0002 Epoch: 24 Global Step: 518370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:46,261-Speed 6341.25 samples/sec Loss 4.2157 LearningRate 0.0002 Epoch: 24 Global Step: 518380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:49,505-Speed 6313.92 samples/sec Loss 4.1983 LearningRate 0.0002 Epoch: 24 Global Step: 518390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:52,748-Speed 6317.32 samples/sec Loss 4.1298 LearningRate 0.0002 Epoch: 24 
Global Step: 518400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:52:55,983-Speed 6331.67 samples/sec Loss 4.1507 LearningRate 0.0002 Epoch: 24 Global Step: 518410 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:52:59,228-Speed 6313.23 samples/sec Loss 4.1493 LearningRate 0.0002 Epoch: 24 Global Step: 518420 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:02,473-Speed 6312.60 samples/sec Loss 4.1406 LearningRate 0.0002 Epoch: 24 Global Step: 518430 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:05,719-Speed 6311.97 samples/sec Loss 4.1572 LearningRate 0.0002 Epoch: 24 Global Step: 518440 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:08,961-Speed 6318.46 samples/sec Loss 4.1776 LearningRate 0.0002 Epoch: 24 Global Step: 518450 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:12,201-Speed 6320.53 samples/sec Loss 4.1223 LearningRate 0.0002 Epoch: 24 Global Step: 518460 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:15,447-Speed 6312.72 samples/sec Loss 4.1680 LearningRate 0.0002 Epoch: 24 Global Step: 518470 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:53:18,688-Speed 6319.27 samples/sec Loss 4.1131 LearningRate 0.0002 Epoch: 24 Global Step: 518480 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:54:18,957-Speed 339.81 samples/sec Loss 4.0912 LearningRate 0.0002 Epoch: 25 Global Step: 518490 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:54:22,257-Speed 6208.05 samples/sec Loss 4.1268 LearningRate 0.0002 Epoch: 25 Global Step: 518500 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:54:25,497-Speed 6321.42 samples/sec Loss 4.1532 LearningRate 0.0002 Epoch: 25 Global Step: 518510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:28,733-Speed 6331.05 samples/sec Loss 4.1393 LearningRate 0.0002 Epoch: 25 Global Step: 518520 Fp16 Grad Scale: 16384 
Required: 28 hours Training: 2022-04-02 14:54:31,969-Speed 6329.91 samples/sec Loss 4.1759 LearningRate 0.0002 Epoch: 25 Global Step: 518530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:35,204-Speed 6331.51 samples/sec Loss 4.2168 LearningRate 0.0002 Epoch: 25 Global Step: 518540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:38,447-Speed 6316.29 samples/sec Loss 4.1905 LearningRate 0.0002 Epoch: 25 Global Step: 518550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:41,683-Speed 6330.07 samples/sec Loss 4.1532 LearningRate 0.0002 Epoch: 25 Global Step: 518560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:44,921-Speed 6326.71 samples/sec Loss 4.1410 LearningRate 0.0002 Epoch: 25 Global Step: 518570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:48,160-Speed 6324.82 samples/sec Loss 4.1472 LearningRate 0.0002 Epoch: 25 Global Step: 518580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:51,398-Speed 6325.43 samples/sec Loss 4.1412 LearningRate 0.0002 Epoch: 25 Global Step: 518590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:54,637-Speed 6325.27 samples/sec Loss 4.1538 LearningRate 0.0002 Epoch: 25 Global Step: 518600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:54:57,863-Speed 6350.88 samples/sec Loss 4.0920 LearningRate 0.0002 Epoch: 25 Global Step: 518610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:01,102-Speed 6323.57 samples/sec Loss 4.1624 LearningRate 0.0002 Epoch: 25 Global Step: 518620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:04,341-Speed 6325.58 samples/sec Loss 4.0929 LearningRate 0.0002 Epoch: 25 Global Step: 518630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:07,609-Speed 6267.24 samples/sec Loss 4.1002 LearningRate 0.0002 Epoch: 25 Global Step: 518640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-04-02 14:55:10,848-Speed 6324.96 samples/sec Loss 4.1216 LearningRate 0.0002 Epoch: 25 Global Step: 518650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:14,095-Speed 6308.63 samples/sec Loss 4.0669 LearningRate 0.0002 Epoch: 25 Global Step: 518660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:17,343-Speed 6305.82 samples/sec Loss 4.1574 LearningRate 0.0002 Epoch: 25 Global Step: 518670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:20,593-Speed 6304.80 samples/sec Loss 4.0636 LearningRate 0.0002 Epoch: 25 Global Step: 518680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:23,836-Speed 6315.62 samples/sec Loss 4.1150 LearningRate 0.0002 Epoch: 25 Global Step: 518690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:27,085-Speed 6304.70 samples/sec Loss 4.1461 LearningRate 0.0002 Epoch: 25 Global Step: 518700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:30,331-Speed 6310.78 samples/sec Loss 4.1128 LearningRate 0.0002 Epoch: 25 Global Step: 518710 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:55:33,563-Speed 6337.91 samples/sec Loss 4.1066 LearningRate 0.0002 Epoch: 25 Global Step: 518720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:36,810-Speed 6308.31 samples/sec Loss 4.1384 LearningRate 0.0002 Epoch: 25 Global Step: 518730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:40,060-Speed 6303.71 samples/sec Loss 4.1559 LearningRate 0.0002 Epoch: 25 Global Step: 518740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:43,307-Speed 6307.93 samples/sec Loss 4.2103 LearningRate 0.0002 Epoch: 25 Global Step: 518750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:46,554-Speed 6309.53 samples/sec Loss 4.1144 LearningRate 0.0002 Epoch: 25 Global Step: 518760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:49,797-Speed 
6315.39 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 25 Global Step: 518770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:53,043-Speed 6312.35 samples/sec Loss 4.1146 LearningRate 0.0002 Epoch: 25 Global Step: 518780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:56,289-Speed 6309.41 samples/sec Loss 4.1857 LearningRate 0.0002 Epoch: 25 Global Step: 518790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:55:59,536-Speed 6310.65 samples/sec Loss 4.0276 LearningRate 0.0002 Epoch: 25 Global Step: 518800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:02,784-Speed 6305.26 samples/sec Loss 4.1279 LearningRate 0.0002 Epoch: 25 Global Step: 518810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:06,019-Speed 6332.34 samples/sec Loss 4.2052 LearningRate 0.0002 Epoch: 25 Global Step: 518820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:09,268-Speed 6305.77 samples/sec Loss 4.1099 LearningRate 0.0002 Epoch: 25 Global Step: 518830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:12,513-Speed 6313.50 samples/sec Loss 4.1331 LearningRate 0.0002 Epoch: 25 Global Step: 518840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:15,765-Speed 6298.84 samples/sec Loss 4.1257 LearningRate 0.0002 Epoch: 25 Global Step: 518850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:19,012-Speed 6308.68 samples/sec Loss 4.1023 LearningRate 0.0002 Epoch: 25 Global Step: 518860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:22,269-Speed 6288.77 samples/sec Loss 4.0905 LearningRate 0.0002 Epoch: 25 Global Step: 518870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:25,517-Speed 6307.12 samples/sec Loss 4.1699 LearningRate 0.0002 Epoch: 25 Global Step: 518880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:28,761-Speed 6314.55 samples/sec Loss 4.1301 
LearningRate 0.0002 Epoch: 25 Global Step: 518890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:32,006-Speed 6312.71 samples/sec Loss 4.1424 LearningRate 0.0002 Epoch: 25 Global Step: 518900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:35,250-Speed 6315.35 samples/sec Loss 4.0782 LearningRate 0.0002 Epoch: 25 Global Step: 518910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:38,479-Speed 6343.60 samples/sec Loss 4.0800 LearningRate 0.0002 Epoch: 25 Global Step: 518920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:41,757-Speed 6248.39 samples/sec Loss 4.1744 LearningRate 0.0002 Epoch: 25 Global Step: 518930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:44,998-Speed 6320.96 samples/sec Loss 4.1038 LearningRate 0.0002 Epoch: 25 Global Step: 518940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:48,240-Speed 6317.37 samples/sec Loss 4.1252 LearningRate 0.0002 Epoch: 25 Global Step: 518950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:51,484-Speed 6316.43 samples/sec Loss 4.1051 LearningRate 0.0002 Epoch: 25 Global Step: 518960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:54,778-Speed 6218.13 samples/sec Loss 4.0928 LearningRate 0.0002 Epoch: 25 Global Step: 518970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:56:58,023-Speed 6312.26 samples/sec Loss 4.1481 LearningRate 0.0002 Epoch: 25 Global Step: 518980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:01,272-Speed 6304.85 samples/sec Loss 4.1448 LearningRate 0.0002 Epoch: 25 Global Step: 518990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:04,516-Speed 6314.90 samples/sec Loss 4.0941 LearningRate 0.0002 Epoch: 25 Global Step: 519000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:07,758-Speed 6318.55 samples/sec Loss 4.1893 LearningRate 0.0002 Epoch: 25 
Global Step: 519010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:11,002-Speed 6314.71 samples/sec Loss 4.0624 LearningRate 0.0002 Epoch: 25 Global Step: 519020 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 14:57:14,230-Speed 6346.66 samples/sec Loss 4.1640 LearningRate 0.0002 Epoch: 25 Global Step: 519030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:17,479-Speed 6303.95 samples/sec Loss 4.1396 LearningRate 0.0002 Epoch: 25 Global Step: 519040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:20,724-Speed 6314.71 samples/sec Loss 4.1305 LearningRate 0.0002 Epoch: 25 Global Step: 519050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:23,972-Speed 6307.48 samples/sec Loss 4.1241 LearningRate 0.0002 Epoch: 25 Global Step: 519060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:27,219-Speed 6310.52 samples/sec Loss 4.1188 LearningRate 0.0002 Epoch: 25 Global Step: 519070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:30,462-Speed 6316.42 samples/sec Loss 4.1387 LearningRate 0.0002 Epoch: 25 Global Step: 519080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:33,712-Speed 6302.80 samples/sec Loss 4.0804 LearningRate 0.0002 Epoch: 25 Global Step: 519090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:36,956-Speed 6312.98 samples/sec Loss 4.1190 LearningRate 0.0002 Epoch: 25 Global Step: 519100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:40,207-Speed 6302.47 samples/sec Loss 4.1205 LearningRate 0.0002 Epoch: 25 Global Step: 519110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:43,452-Speed 6311.98 samples/sec Loss 4.1657 LearningRate 0.0002 Epoch: 25 Global Step: 519120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:46,696-Speed 6315.07 samples/sec Loss 4.1039 LearningRate 0.0002 Epoch: 25 Global Step: 519130 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:49,944-Speed 6305.27 samples/sec Loss 4.1477 LearningRate 0.0002 Epoch: 25 Global Step: 519140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:53,195-Speed 6301.39 samples/sec Loss 4.1798 LearningRate 0.0002 Epoch: 25 Global Step: 519150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:56,437-Speed 6318.45 samples/sec Loss 4.1319 LearningRate 0.0002 Epoch: 25 Global Step: 519160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:57:59,684-Speed 6309.44 samples/sec Loss 4.1837 LearningRate 0.0002 Epoch: 25 Global Step: 519170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:02,926-Speed 6317.98 samples/sec Loss 4.0916 LearningRate 0.0002 Epoch: 25 Global Step: 519180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:06,176-Speed 6303.16 samples/sec Loss 4.1227 LearningRate 0.0002 Epoch: 25 Global Step: 519190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:09,430-Speed 6295.13 samples/sec Loss 4.1211 LearningRate 0.0002 Epoch: 25 Global Step: 519200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:12,675-Speed 6313.35 samples/sec Loss 4.2070 LearningRate 0.0002 Epoch: 25 Global Step: 519210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:15,955-Speed 6245.16 samples/sec Loss 4.1464 LearningRate 0.0002 Epoch: 25 Global Step: 519220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:19,186-Speed 6339.18 samples/sec Loss 4.1099 LearningRate 0.0002 Epoch: 25 Global Step: 519230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:22,437-Speed 6300.88 samples/sec Loss 4.1496 LearningRate 0.0002 Epoch: 25 Global Step: 519240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:25,682-Speed 6313.22 samples/sec Loss 4.0910 LearningRate 0.0002 Epoch: 25 Global Step: 519250 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 14:58:28,930-Speed 6308.68 samples/sec Loss 4.1170 LearningRate 0.0002 Epoch: 25 Global Step: 519260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:58:32,159-Speed 6342.36 samples/sec Loss 4.1521 LearningRate 0.0002 Epoch: 25 Global Step: 519270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:35,401-Speed 6318.39 samples/sec Loss 4.1077 LearningRate 0.0002 Epoch: 25 Global Step: 519280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:38,659-Speed 6288.38 samples/sec Loss 4.1517 LearningRate 0.0002 Epoch: 25 Global Step: 519290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:41,909-Speed 6303.65 samples/sec Loss 4.1608 LearningRate 0.0002 Epoch: 25 Global Step: 519300 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:45,155-Speed 6309.62 samples/sec Loss 4.0889 LearningRate 0.0002 Epoch: 25 Global Step: 519310 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:48,399-Speed 6315.43 samples/sec Loss 4.1167 LearningRate 0.0002 Epoch: 25 Global Step: 519320 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:51,643-Speed 6315.15 samples/sec Loss 4.0929 LearningRate 0.0002 Epoch: 25 Global Step: 519330 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:54,888-Speed 6311.45 samples/sec Loss 4.1096 LearningRate 0.0002 Epoch: 25 Global Step: 519340 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:58:58,131-Speed 6316.43 samples/sec Loss 4.1287 LearningRate 0.0002 Epoch: 25 Global Step: 519350 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:59:01,375-Speed 6315.32 samples/sec Loss 4.1731 LearningRate 0.0002 Epoch: 25 Global Step: 519360 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 14:59:04,618-Speed 6316.57 samples/sec Loss 4.1660 LearningRate 0.0002 Epoch: 25 Global Step: 519370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:07,864-Speed 
6309.56 samples/sec Loss 4.1362 LearningRate 0.0002 Epoch: 25 Global Step: 519380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:11,110-Speed 6311.18 samples/sec Loss 4.1379 LearningRate 0.0002 Epoch: 25 Global Step: 519390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:14,357-Speed 6309.97 samples/sec Loss 4.0239 LearningRate 0.0002 Epoch: 25 Global Step: 519400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:17,605-Speed 6306.55 samples/sec Loss 4.1515 LearningRate 0.0002 Epoch: 25 Global Step: 519410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:20,851-Speed 6310.57 samples/sec Loss 4.1158 LearningRate 0.0002 Epoch: 25 Global Step: 519420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:24,096-Speed 6311.75 samples/sec Loss 4.1034 LearningRate 0.0002 Epoch: 25 Global Step: 519430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:27,339-Speed 6317.48 samples/sec Loss 4.1422 LearningRate 0.0002 Epoch: 25 Global Step: 519440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:30,584-Speed 6312.99 samples/sec Loss 4.1473 LearningRate 0.0002 Epoch: 25 Global Step: 519450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:33,829-Speed 6311.58 samples/sec Loss 4.1362 LearningRate 0.0002 Epoch: 25 Global Step: 519460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:37,060-Speed 6341.32 samples/sec Loss 4.0796 LearningRate 0.0002 Epoch: 25 Global Step: 519470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:40,308-Speed 6307.18 samples/sec Loss 4.1532 LearningRate 0.0002 Epoch: 25 Global Step: 519480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:43,556-Speed 6306.29 samples/sec Loss 4.0791 LearningRate 0.0002 Epoch: 25 Global Step: 519490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:46,800-Speed 6314.05 samples/sec Loss 4.1967 
LearningRate 0.0002 Epoch: 25 Global Step: 519500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:50,045-Speed 6312.24 samples/sec Loss 4.1664 LearningRate 0.0002 Epoch: 25 Global Step: 519510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:53,291-Speed 6312.56 samples/sec Loss 4.1055 LearningRate 0.0002 Epoch: 25 Global Step: 519520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:56,537-Speed 6310.59 samples/sec Loss 4.1111 LearningRate 0.0002 Epoch: 25 Global Step: 519530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 14:59:59,782-Speed 6311.29 samples/sec Loss 4.1386 LearningRate 0.0002 Epoch: 25 Global Step: 519540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:03,029-Speed 6310.12 samples/sec Loss 4.1107 LearningRate 0.0002 Epoch: 25 Global Step: 519550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:06,275-Speed 6310.01 samples/sec Loss 4.0968 LearningRate 0.0002 Epoch: 25 Global Step: 519560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:09,506-Speed 6338.80 samples/sec Loss 4.1892 LearningRate 0.0002 Epoch: 25 Global Step: 519570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:12,777-Speed 6267.14 samples/sec Loss 4.1226 LearningRate 0.0002 Epoch: 25 Global Step: 519580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:16,019-Speed 6318.52 samples/sec Loss 4.0887 LearningRate 0.0002 Epoch: 25 Global Step: 519590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:19,272-Speed 6296.00 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 25 Global Step: 519600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:22,516-Speed 6315.36 samples/sec Loss 4.1120 LearningRate 0.0002 Epoch: 25 Global Step: 519610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:25,769-Speed 6297.37 samples/sec Loss 4.1778 LearningRate 0.0002 Epoch: 25 
Global Step: 519620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:29,016-Speed 6309.18 samples/sec Loss 4.1413 LearningRate 0.0002 Epoch: 25 Global Step: 519630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:32,259-Speed 6314.58 samples/sec Loss 4.1423 LearningRate 0.0002 Epoch: 25 Global Step: 519640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:35,503-Speed 6315.71 samples/sec Loss 4.1284 LearningRate 0.0002 Epoch: 25 Global Step: 519650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:38,743-Speed 6320.89 samples/sec Loss 4.1229 LearningRate 0.0002 Epoch: 25 Global Step: 519660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:41,976-Speed 6337.50 samples/sec Loss 4.1509 LearningRate 0.0002 Epoch: 25 Global Step: 519670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:45,221-Speed 6312.54 samples/sec Loss 4.0998 LearningRate 0.0002 Epoch: 25 Global Step: 519680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:48,465-Speed 6315.67 samples/sec Loss 4.0811 LearningRate 0.0002 Epoch: 25 Global Step: 519690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:51,711-Speed 6309.72 samples/sec Loss 4.1400 LearningRate 0.0002 Epoch: 25 Global Step: 519700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:54,953-Speed 6319.42 samples/sec Loss 4.1138 LearningRate 0.0002 Epoch: 25 Global Step: 519710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:00:58,200-Speed 6309.05 samples/sec Loss 4.1571 LearningRate 0.0002 Epoch: 25 Global Step: 519720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:01,450-Speed 6302.50 samples/sec Loss 4.0988 LearningRate 0.0002 Epoch: 25 Global Step: 519730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:04,694-Speed 6315.35 samples/sec Loss 4.1449 LearningRate 0.0002 Epoch: 25 Global Step: 519740 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:07,939-Speed 6312.07 samples/sec Loss 4.1042 LearningRate 0.0002 Epoch: 25 Global Step: 519750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:11,178-Speed 6323.69 samples/sec Loss 4.1088 LearningRate 0.0002 Epoch: 25 Global Step: 519760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:14,425-Speed 6308.44 samples/sec Loss 4.1429 LearningRate 0.0002 Epoch: 25 Global Step: 519770 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:01:17,661-Speed 6337.49 samples/sec Loss 4.1733 LearningRate 0.0002 Epoch: 25 Global Step: 519780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:20,905-Speed 6313.21 samples/sec Loss 4.0591 LearningRate 0.0002 Epoch: 25 Global Step: 519790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:24,150-Speed 6312.21 samples/sec Loss 4.1243 LearningRate 0.0002 Epoch: 25 Global Step: 519800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:27,392-Speed 6319.87 samples/sec Loss 4.1320 LearningRate 0.0002 Epoch: 25 Global Step: 519810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:30,639-Speed 6307.64 samples/sec Loss 4.1227 LearningRate 0.0002 Epoch: 25 Global Step: 519820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:33,883-Speed 6315.08 samples/sec Loss 4.1329 LearningRate 0.0002 Epoch: 25 Global Step: 519830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:37,130-Speed 6309.98 samples/sec Loss 4.0921 LearningRate 0.0002 Epoch: 25 Global Step: 519840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:40,382-Speed 6299.03 samples/sec Loss 4.0940 LearningRate 0.0002 Epoch: 25 Global Step: 519850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:43,622-Speed 6321.16 samples/sec Loss 4.1065 LearningRate 0.0002 Epoch: 25 Global Step: 519860 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:01:46,872-Speed 6304.22 samples/sec Loss 4.1700 LearningRate 0.0002 Epoch: 25 Global Step: 519870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:50,102-Speed 6340.48 samples/sec Loss 4.1101 LearningRate 0.0002 Epoch: 25 Global Step: 519880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:53,348-Speed 6310.87 samples/sec Loss 4.1109 LearningRate 0.0002 Epoch: 25 Global Step: 519890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:01:56,579-Speed 6341.54 samples/sec Loss 4.0970 LearningRate 0.0002 Epoch: 25 Global Step: 519900 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:01:59,826-Speed 6307.56 samples/sec Loss 4.1372 LearningRate 0.0002 Epoch: 25 Global Step: 519910 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:03,073-Speed 6309.30 samples/sec Loss 4.1287 LearningRate 0.0002 Epoch: 25 Global Step: 519920 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:06,331-Speed 6289.13 samples/sec Loss 4.0934 LearningRate 0.0002 Epoch: 25 Global Step: 519930 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:09,575-Speed 6312.95 samples/sec Loss 4.1103 LearningRate 0.0002 Epoch: 25 Global Step: 519940 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:12,819-Speed 6315.06 samples/sec Loss 4.1167 LearningRate 0.0002 Epoch: 25 Global Step: 519950 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:16,062-Speed 6316.98 samples/sec Loss 4.1279 LearningRate 0.0002 Epoch: 25 Global Step: 519960 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:19,306-Speed 6314.13 samples/sec Loss 4.1416 LearningRate 0.0002 Epoch: 25 Global Step: 519970 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:22,548-Speed 6317.62 samples/sec Loss 4.0772 LearningRate 0.0002 Epoch: 25 Global Step: 519980 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:25,805-Speed 
6290.94 samples/sec Loss 4.1240 LearningRate 0.0002 Epoch: 25 Global Step: 519990 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:02:29,047-Speed 6317.69 samples/sec Loss 4.1201 LearningRate 0.0002 Epoch: 25 Global Step: 520000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:32,291-Speed 6315.06 samples/sec Loss 4.1761 LearningRate 0.0002 Epoch: 25 Global Step: 520010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:35,560-Speed 6267.24 samples/sec Loss 4.1002 LearningRate 0.0002 Epoch: 25 Global Step: 520020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:38,805-Speed 6310.96 samples/sec Loss 4.1126 LearningRate 0.0002 Epoch: 25 Global Step: 520030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:42,049-Speed 6315.34 samples/sec Loss 4.0748 LearningRate 0.0002 Epoch: 25 Global Step: 520040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:45,291-Speed 6318.10 samples/sec Loss 4.1337 LearningRate 0.0002 Epoch: 25 Global Step: 520050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:48,542-Speed 6300.78 samples/sec Loss 4.1184 LearningRate 0.0002 Epoch: 25 Global Step: 520060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:51,786-Speed 6315.40 samples/sec Loss 4.1542 LearningRate 0.0002 Epoch: 25 Global Step: 520070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:55,030-Speed 6314.32 samples/sec Loss 4.1087 LearningRate 0.0002 Epoch: 25 Global Step: 520080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:02:58,273-Speed 6316.01 samples/sec Loss 4.1475 LearningRate 0.0002 Epoch: 25 Global Step: 520090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:01,532-Speed 6286.62 samples/sec Loss 4.1588 LearningRate 0.0002 Epoch: 25 Global Step: 520100 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:03:04,766-Speed 6334.85 samples/sec Loss 4.1048 
LearningRate 0.0002 Epoch: 25 Global Step: 520110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:08,011-Speed 6312.20 samples/sec Loss 4.0808 LearningRate 0.0002 Epoch: 25 Global Step: 520120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:11,256-Speed 6312.15 samples/sec Loss 4.1423 LearningRate 0.0002 Epoch: 25 Global Step: 520130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:14,507-Speed 6302.71 samples/sec Loss 4.1260 LearningRate 0.0002 Epoch: 25 Global Step: 520140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:17,751-Speed 6313.68 samples/sec Loss 4.1252 LearningRate 0.0002 Epoch: 25 Global Step: 520150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:20,998-Speed 6309.06 samples/sec Loss 4.1024 LearningRate 0.0002 Epoch: 25 Global Step: 520160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:24,238-Speed 6321.97 samples/sec Loss 4.1558 LearningRate 0.0002 Epoch: 25 Global Step: 520170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:27,486-Speed 6307.96 samples/sec Loss 4.2183 LearningRate 0.0002 Epoch: 25 Global Step: 520180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:30,729-Speed 6316.52 samples/sec Loss 4.1538 LearningRate 0.0002 Epoch: 25 Global Step: 520190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:33,972-Speed 6315.82 samples/sec Loss 4.1226 LearningRate 0.0002 Epoch: 25 Global Step: 520200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:37,205-Speed 6337.21 samples/sec Loss 4.1162 LearningRate 0.0002 Epoch: 25 Global Step: 520210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:40,452-Speed 6307.53 samples/sec Loss 4.1331 LearningRate 0.0002 Epoch: 25 Global Step: 520220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:43,696-Speed 6314.62 samples/sec Loss 4.1061 LearningRate 0.0002 Epoch: 25 
Global Step: 520230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:03:46,926-Speed 6341.81 samples/sec Loss 4.1568 LearningRate 0.0002 Epoch: 25 Global Step: 520240 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:03:50,172-Speed 6310.54 samples/sec Loss 4.1472 LearningRate 0.0002 Epoch: 25 Global Step: 520250 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:03:53,413-Speed 6320.62 samples/sec Loss 4.1584 LearningRate 0.0002 Epoch: 25 Global Step: 520260 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:03:56,658-Speed 6314.05 samples/sec Loss 4.0943 LearningRate 0.0002 Epoch: 25 Global Step: 520270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:03:59,903-Speed 6312.00 samples/sec Loss 4.1096 LearningRate 0.0002 Epoch: 25 Global Step: 520280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:03,148-Speed 6313.42 samples/sec Loss 4.1467 LearningRate 0.0002 Epoch: 25 Global Step: 520290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:06,396-Speed 6306.52 samples/sec Loss 4.1243 LearningRate 0.0002 Epoch: 25 Global Step: 520300 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:09,639-Speed 6316.12 samples/sec Loss 4.1199 LearningRate 0.0002 Epoch: 25 Global Step: 520310 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:12,887-Speed 6306.08 samples/sec Loss 4.1830 LearningRate 0.0002 Epoch: 25 Global Step: 520320 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:16,134-Speed 6309.00 samples/sec Loss 4.1022 LearningRate 0.0002 Epoch: 25 Global Step: 520330 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:04:19,379-Speed 6314.16 samples/sec Loss 4.1238 LearningRate 0.0002 Epoch: 25 Global Step: 520340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:22,626-Speed 6307.24 samples/sec Loss 4.1037 LearningRate 0.0002 Epoch: 25 Global Step: 520350 Fp16 Grad Scale: 
16384 Required: 28 hours Training: 2022-04-02 15:04:25,871-Speed 6314.26 samples/sec Loss 4.1168 LearningRate 0.0002 Epoch: 25 Global Step: 520360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:29,116-Speed 6312.70 samples/sec Loss 4.1637 LearningRate 0.0002 Epoch: 25 Global Step: 520370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:32,360-Speed 6313.49 samples/sec Loss 4.1462 LearningRate 0.0002 Epoch: 25 Global Step: 520380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:35,603-Speed 6316.48 samples/sec Loss 4.0650 LearningRate 0.0002 Epoch: 25 Global Step: 520390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:38,853-Speed 6304.30 samples/sec Loss 4.1478 LearningRate 0.0002 Epoch: 25 Global Step: 520400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:42,093-Speed 6320.94 samples/sec Loss 4.0778 LearningRate 0.0002 Epoch: 25 Global Step: 520410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:45,343-Speed 6303.84 samples/sec Loss 4.1076 LearningRate 0.0002 Epoch: 25 Global Step: 520420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:48,587-Speed 6315.67 samples/sec Loss 4.0963 LearningRate 0.0002 Epoch: 25 Global Step: 520430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:51,829-Speed 6316.94 samples/sec Loss 4.0722 LearningRate 0.0002 Epoch: 25 Global Step: 520440 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:04:55,058-Speed 6344.77 samples/sec Loss 4.1118 LearningRate 0.0002 Epoch: 25 Global Step: 520450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:04:58,299-Speed 6321.00 samples/sec Loss 4.1223 LearningRate 0.0002 Epoch: 25 Global Step: 520460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:01,549-Speed 6302.07 samples/sec Loss 4.1476 LearningRate 0.0002 Epoch: 25 Global Step: 520470 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:05:04,871-Speed 6167.05 samples/sec Loss 4.1158 LearningRate 0.0002 Epoch: 25 Global Step: 520480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:08,136-Speed 6274.10 samples/sec Loss 4.1742 LearningRate 0.0002 Epoch: 25 Global Step: 520490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:11,382-Speed 6309.47 samples/sec Loss 4.1394 LearningRate 0.0002 Epoch: 25 Global Step: 520500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:14,626-Speed 6315.43 samples/sec Loss 4.1237 LearningRate 0.0002 Epoch: 25 Global Step: 520510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:17,872-Speed 6310.55 samples/sec Loss 4.1466 LearningRate 0.0002 Epoch: 25 Global Step: 520520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:21,120-Speed 6306.27 samples/sec Loss 4.0283 LearningRate 0.0002 Epoch: 25 Global Step: 520530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:24,365-Speed 6312.98 samples/sec Loss 4.1854 LearningRate 0.0002 Epoch: 25 Global Step: 520540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:27,596-Speed 6341.43 samples/sec Loss 4.1244 LearningRate 0.0002 Epoch: 25 Global Step: 520550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:30,839-Speed 6315.34 samples/sec Loss 4.1225 LearningRate 0.0002 Epoch: 25 Global Step: 520560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:34,085-Speed 6311.88 samples/sec Loss 4.0850 LearningRate 0.0002 Epoch: 25 Global Step: 520570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:37,331-Speed 6311.20 samples/sec Loss 4.1694 LearningRate 0.0002 Epoch: 25 Global Step: 520580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:40,576-Speed 6312.48 samples/sec Loss 4.0378 LearningRate 0.0002 Epoch: 25 Global Step: 520590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
15:05:43,824-Speed 6305.30 samples/sec Loss 4.1654 LearningRate 0.0002 Epoch: 25 Global Step: 520600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:47,068-Speed 6316.03 samples/sec Loss 4.1127 LearningRate 0.0002 Epoch: 25 Global Step: 520610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:50,326-Speed 6286.71 samples/sec Loss 4.1535 LearningRate 0.0002 Epoch: 25 Global Step: 520620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:53,574-Speed 6307.42 samples/sec Loss 4.1445 LearningRate 0.0002 Epoch: 25 Global Step: 520630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:05:56,819-Speed 6312.46 samples/sec Loss 4.1596 LearningRate 0.0002 Epoch: 25 Global Step: 520640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:00,050-Speed 6339.31 samples/sec Loss 4.0663 LearningRate 0.0002 Epoch: 25 Global Step: 520650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:03,300-Speed 6303.48 samples/sec Loss 4.1219 LearningRate 0.0002 Epoch: 25 Global Step: 520660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:06,546-Speed 6310.34 samples/sec Loss 4.0882 LearningRate 0.0002 Epoch: 25 Global Step: 520670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:09,790-Speed 6315.15 samples/sec Loss 4.1141 LearningRate 0.0002 Epoch: 25 Global Step: 520680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:13,042-Speed 6298.76 samples/sec Loss 4.0946 LearningRate 0.0002 Epoch: 25 Global Step: 520690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:16,291-Speed 6304.02 samples/sec Loss 4.0694 LearningRate 0.0002 Epoch: 25 Global Step: 520700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:19,545-Speed 6295.25 samples/sec Loss 4.1147 LearningRate 0.0002 Epoch: 25 Global Step: 520710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:22,787-Speed 6319.48 
samples/sec Loss 4.1395 LearningRate 0.0002 Epoch: 25 Global Step: 520720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:26,034-Speed 6309.48 samples/sec Loss 4.0671 LearningRate 0.0002 Epoch: 25 Global Step: 520730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:29,279-Speed 6311.60 samples/sec Loss 4.1446 LearningRate 0.0002 Epoch: 25 Global Step: 520740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:32,509-Speed 6342.60 samples/sec Loss 4.1197 LearningRate 0.0002 Epoch: 25 Global Step: 520750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:35,759-Speed 6304.05 samples/sec Loss 4.1311 LearningRate 0.0002 Epoch: 25 Global Step: 520760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:39,003-Speed 6313.63 samples/sec Loss 4.1785 LearningRate 0.0002 Epoch: 25 Global Step: 520770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:42,255-Speed 6299.13 samples/sec Loss 4.0903 LearningRate 0.0002 Epoch: 25 Global Step: 520780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:45,501-Speed 6310.11 samples/sec Loss 4.1523 LearningRate 0.0002 Epoch: 25 Global Step: 520790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:48,748-Speed 6309.24 samples/sec Loss 4.1295 LearningRate 0.0002 Epoch: 25 Global Step: 520800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:51,996-Speed 6307.66 samples/sec Loss 4.1504 LearningRate 0.0002 Epoch: 25 Global Step: 520810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:55,241-Speed 6311.31 samples/sec Loss 4.0598 LearningRate 0.0002 Epoch: 25 Global Step: 520820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:06:58,488-Speed 6308.77 samples/sec Loss 4.1245 LearningRate 0.0002 Epoch: 25 Global Step: 520830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:01,736-Speed 6307.28 samples/sec Loss 4.1164 
LearningRate 0.0002 Epoch: 25 Global Step: 520840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:04,965-Speed 6343.79 samples/sec Loss 4.1642 LearningRate 0.0002 Epoch: 25 Global Step: 520850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:08,217-Speed 6299.17 samples/sec Loss 4.1399 LearningRate 0.0002 Epoch: 25 Global Step: 520860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:11,463-Speed 6311.14 samples/sec Loss 4.0553 LearningRate 0.0002 Epoch: 25 Global Step: 520870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:14,709-Speed 6309.37 samples/sec Loss 4.1157 LearningRate 0.0002 Epoch: 25 Global Step: 520880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:17,940-Speed 6341.23 samples/sec Loss 4.1067 LearningRate 0.0002 Epoch: 25 Global Step: 520890 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:21,187-Speed 6307.62 samples/sec Loss 4.1124 LearningRate 0.0002 Epoch: 25 Global Step: 520900 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:24,433-Speed 6310.83 samples/sec Loss 4.0834 LearningRate 0.0002 Epoch: 25 Global Step: 520910 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:27,682-Speed 6305.33 samples/sec Loss 4.0870 LearningRate 0.0002 Epoch: 25 Global Step: 520920 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:30,929-Speed 6309.52 samples/sec Loss 4.1361 LearningRate 0.0002 Epoch: 25 Global Step: 520930 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:34,173-Speed 6315.15 samples/sec Loss 4.1109 LearningRate 0.0002 Epoch: 25 Global Step: 520940 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:37,416-Speed 6316.91 samples/sec Loss 4.0929 LearningRate 0.0002 Epoch: 25 Global Step: 520950 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:40,661-Speed 6311.58 samples/sec Loss 4.1081 LearningRate 0.0002 Epoch: 25 Global 
Step: 520960 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:43,903-Speed 6318.47 samples/sec Loss 4.1251 LearningRate 0.0002 Epoch: 25 Global Step: 520970 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:47,149-Speed 6312.31 samples/sec Loss 4.0723 LearningRate 0.0002 Epoch: 25 Global Step: 520980 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:07:50,394-Speed 6312.53 samples/sec Loss 4.0843 LearningRate 0.0002 Epoch: 25 Global Step: 520990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:53,641-Speed 6308.54 samples/sec Loss 4.1137 LearningRate 0.0002 Epoch: 25 Global Step: 521000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:07:56,885-Speed 6313.59 samples/sec Loss 4.0448 LearningRate 0.0002 Epoch: 25 Global Step: 521010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:00,131-Speed 6310.89 samples/sec Loss 4.0789 LearningRate 0.0002 Epoch: 25 Global Step: 521020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:03,376-Speed 6313.65 samples/sec Loss 4.0658 LearningRate 0.0002 Epoch: 25 Global Step: 521030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:06,618-Speed 6317.11 samples/sec Loss 4.1086 LearningRate 0.0002 Epoch: 25 Global Step: 521040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:09,865-Speed 6310.12 samples/sec Loss 4.0830 LearningRate 0.0002 Epoch: 25 Global Step: 521050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:13,107-Speed 6318.18 samples/sec Loss 4.1505 LearningRate 0.0002 Epoch: 25 Global Step: 521060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:16,351-Speed 6313.64 samples/sec Loss 4.1304 LearningRate 0.0002 Epoch: 25 Global Step: 521070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:19,598-Speed 6309.85 samples/sec Loss 4.1567 LearningRate 0.0002 Epoch: 25 Global Step: 521080 Fp16 Grad Scale: 
16384 Required: 28 hours Training: 2022-04-02 15:08:22,830-Speed 6337.31 samples/sec Loss 4.0824 LearningRate 0.0002 Epoch: 25 Global Step: 521090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:26,076-Speed 6310.44 samples/sec Loss 4.1637 LearningRate 0.0002 Epoch: 25 Global Step: 521100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:29,321-Speed 6313.35 samples/sec Loss 4.1220 LearningRate 0.0002 Epoch: 25 Global Step: 521110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:32,569-Speed 6307.12 samples/sec Loss 4.1523 LearningRate 0.0002 Epoch: 25 Global Step: 521120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:35,814-Speed 6312.55 samples/sec Loss 4.0809 LearningRate 0.0002 Epoch: 25 Global Step: 521130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:39,055-Speed 6320.50 samples/sec Loss 4.1587 LearningRate 0.0002 Epoch: 25 Global Step: 521140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:42,298-Speed 6315.99 samples/sec Loss 4.1488 LearningRate 0.0002 Epoch: 25 Global Step: 521150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:45,541-Speed 6316.47 samples/sec Loss 4.1893 LearningRate 0.0002 Epoch: 25 Global Step: 521160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:48,788-Speed 6307.97 samples/sec Loss 4.0396 LearningRate 0.0002 Epoch: 25 Global Step: 521170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:52,032-Speed 6315.33 samples/sec Loss 4.0911 LearningRate 0.0002 Epoch: 25 Global Step: 521180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:55,262-Speed 6342.35 samples/sec Loss 4.1148 LearningRate 0.0002 Epoch: 25 Global Step: 521190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:08:58,492-Speed 6340.92 samples/sec Loss 4.1464 LearningRate 0.0002 Epoch: 25 Global Step: 521200 Fp16 Grad Scale: 8192 Required: 28 hours 
Training: 2022-04-02 15:09:01,737-Speed 6314.41 samples/sec Loss 4.1082 LearningRate 0.0002 Epoch: 25 Global Step: 521210 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:04,981-Speed 6314.55 samples/sec Loss 4.0647 LearningRate 0.0002 Epoch: 25 Global Step: 521220 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:08,227-Speed 6311.07 samples/sec Loss 4.0750 LearningRate 0.0002 Epoch: 25 Global Step: 521230 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:11,473-Speed 6310.27 samples/sec Loss 4.1278 LearningRate 0.0002 Epoch: 25 Global Step: 521240 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:14,714-Speed 6320.46 samples/sec Loss 4.0405 LearningRate 0.0002 Epoch: 25 Global Step: 521250 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:17,966-Speed 6298.40 samples/sec Loss 4.1916 LearningRate 0.0002 Epoch: 25 Global Step: 521260 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:21,209-Speed 6315.86 samples/sec Loss 4.0766 LearningRate 0.0002 Epoch: 25 Global Step: 521270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:24,457-Speed 6306.82 samples/sec Loss 4.1068 LearningRate 0.0002 Epoch: 25 Global Step: 521280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:27,704-Speed 6309.20 samples/sec Loss 4.1228 LearningRate 0.0002 Epoch: 25 Global Step: 521290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:09:30,948-Speed 6315.06 samples/sec Loss 4.1149 LearningRate 0.0002 Epoch: 25 Global Step: 521300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:34,190-Speed 6318.10 samples/sec Loss 4.0773 LearningRate 0.0002 Epoch: 25 Global Step: 521310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:37,436-Speed 6310.02 samples/sec Loss 4.1507 LearningRate 0.0002 Epoch: 25 Global Step: 521320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:40,689-Speed 
6297.49 samples/sec Loss 4.1304 LearningRate 0.0002 Epoch: 25 Global Step: 521330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:43,931-Speed 6319.05 samples/sec Loss 4.0429 LearningRate 0.0002 Epoch: 25 Global Step: 521340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:47,179-Speed 6307.16 samples/sec Loss 4.1229 LearningRate 0.0002 Epoch: 25 Global Step: 521350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:50,422-Speed 6315.63 samples/sec Loss 4.1983 LearningRate 0.0002 Epoch: 25 Global Step: 521360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:53,667-Speed 6313.42 samples/sec Loss 4.1732 LearningRate 0.0002 Epoch: 25 Global Step: 521370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:09:56,914-Speed 6308.86 samples/sec Loss 4.0924 LearningRate 0.0002 Epoch: 25 Global Step: 521380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:00,164-Speed 6302.84 samples/sec Loss 4.0977 LearningRate 0.0002 Epoch: 25 Global Step: 521390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:03,410-Speed 6310.27 samples/sec Loss 4.1155 LearningRate 0.0002 Epoch: 25 Global Step: 521400 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:10:06,642-Speed 6337.97 samples/sec Loss 4.0651 LearningRate 0.0002 Epoch: 25 Global Step: 521410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:09,885-Speed 6317.97 samples/sec Loss 4.0810 LearningRate 0.0002 Epoch: 25 Global Step: 521420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:13,132-Speed 6309.09 samples/sec Loss 4.0475 LearningRate 0.0002 Epoch: 25 Global Step: 521430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:16,375-Speed 6315.87 samples/sec Loss 4.1599 LearningRate 0.0002 Epoch: 25 Global Step: 521440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:19,620-Speed 6312.06 samples/sec Loss 4.1336 
LearningRate 0.0002 Epoch: 25 Global Step: 521450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:22,875-Speed 6294.58 samples/sec Loss 4.1007 LearningRate 0.0002 Epoch: 25 Global Step: 521460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:10:26,104-Speed 6343.20 samples/sec Loss 4.0716 LearningRate 0.0002 Epoch: 25 Global Step: 521470 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:29,349-Speed 6311.93 samples/sec Loss 4.1252 LearningRate 0.0002 Epoch: 25 Global Step: 521480 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:32,599-Speed 6303.98 samples/sec Loss 4.1369 LearningRate 0.0002 Epoch: 25 Global Step: 521490 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:35,866-Speed 6268.66 samples/sec Loss 4.1154 LearningRate 0.0002 Epoch: 25 Global Step: 521500 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:39,110-Speed 6315.21 samples/sec Loss 4.1361 LearningRate 0.0002 Epoch: 25 Global Step: 521510 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:42,352-Speed 6319.68 samples/sec Loss 4.1439 LearningRate 0.0002 Epoch: 25 Global Step: 521520 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:45,595-Speed 6315.04 samples/sec Loss 4.0545 LearningRate 0.0002 Epoch: 25 Global Step: 521530 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:48,836-Speed 6321.37 samples/sec Loss 4.1445 LearningRate 0.0002 Epoch: 25 Global Step: 521540 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:52,078-Speed 6317.29 samples/sec Loss 4.1265 LearningRate 0.0002 Epoch: 25 Global Step: 521550 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:55,322-Speed 6315.77 samples/sec Loss 4.1088 LearningRate 0.0002 Epoch: 25 Global Step: 521560 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:10:58,564-Speed 6317.94 samples/sec Loss 4.0779 LearningRate 0.0002 Epoch: 25 Global 
Step: 521570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:01,806-Speed 6319.70 samples/sec Loss 4.1692 LearningRate 0.0002 Epoch: 25 Global Step: 521580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:05,052-Speed 6311.36 samples/sec Loss 4.1463 LearningRate 0.0002 Epoch: 25 Global Step: 521590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:08,300-Speed 6307.26 samples/sec Loss 4.1348 LearningRate 0.0002 Epoch: 25 Global Step: 521600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:11,542-Speed 6318.13 samples/sec Loss 4.1509 LearningRate 0.0002 Epoch: 25 Global Step: 521610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:14,784-Speed 6317.51 samples/sec Loss 4.1166 LearningRate 0.0002 Epoch: 25 Global Step: 521620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:18,027-Speed 6316.94 samples/sec Loss 4.1456 LearningRate 0.0002 Epoch: 25 Global Step: 521630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:21,271-Speed 6314.54 samples/sec Loss 4.1577 LearningRate 0.0002 Epoch: 25 Global Step: 521640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:24,516-Speed 6313.06 samples/sec Loss 4.1090 LearningRate 0.0002 Epoch: 25 Global Step: 521650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:27,761-Speed 6312.53 samples/sec Loss 4.0924 LearningRate 0.0002 Epoch: 25 Global Step: 521660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:31,002-Speed 6320.14 samples/sec Loss 4.1397 LearningRate 0.0002 Epoch: 25 Global Step: 521670 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:11:34,233-Speed 6340.80 samples/sec Loss 4.1749 LearningRate 0.0002 Epoch: 25 Global Step: 521680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:37,476-Speed 6315.59 samples/sec Loss 4.1236 LearningRate 0.0002 Epoch: 25 Global Step: 521690 Fp16 Grad Scale: 
16384 Required: 28 hours Training: 2022-04-02 15:11:40,720-Speed 6315.17 samples/sec Loss 4.1298 LearningRate 0.0002 Epoch: 25 Global Step: 521700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:43,962-Speed 6317.65 samples/sec Loss 4.1537 LearningRate 0.0002 Epoch: 25 Global Step: 521710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:47,229-Speed 6269.93 samples/sec Loss 4.0509 LearningRate 0.0002 Epoch: 25 Global Step: 521720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:50,472-Speed 6316.12 samples/sec Loss 4.1435 LearningRate 0.0002 Epoch: 25 Global Step: 521730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:53,714-Speed 6318.53 samples/sec Loss 4.1525 LearningRate 0.0002 Epoch: 25 Global Step: 521740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:11:56,959-Speed 6314.15 samples/sec Loss 4.1252 LearningRate 0.0002 Epoch: 25 Global Step: 521750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:00,202-Speed 6314.86 samples/sec Loss 4.1532 LearningRate 0.0002 Epoch: 25 Global Step: 521760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:03,446-Speed 6315.37 samples/sec Loss 4.0972 LearningRate 0.0002 Epoch: 25 Global Step: 521770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:06,676-Speed 6342.88 samples/sec Loss 4.0996 LearningRate 0.0002 Epoch: 25 Global Step: 521780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:09,920-Speed 6314.77 samples/sec Loss 4.1082 LearningRate 0.0002 Epoch: 25 Global Step: 521790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:13,166-Speed 6310.91 samples/sec Loss 4.0804 LearningRate 0.0002 Epoch: 25 Global Step: 521800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:16,416-Speed 6302.44 samples/sec Loss 4.0941 LearningRate 0.0002 Epoch: 25 Global Step: 521810 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:12:19,658-Speed 6318.21 samples/sec Loss 4.0906 LearningRate 0.0002 Epoch: 25 Global Step: 521820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:22,904-Speed 6312.32 samples/sec Loss 4.0975 LearningRate 0.0002 Epoch: 25 Global Step: 521830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:26,158-Speed 6294.80 samples/sec Loss 4.0778 LearningRate 0.0002 Epoch: 25 Global Step: 521840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:29,404-Speed 6309.95 samples/sec Loss 4.1339 LearningRate 0.0002 Epoch: 25 Global Step: 521850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:32,646-Speed 6319.49 samples/sec Loss 4.1037 LearningRate 0.0002 Epoch: 25 Global Step: 521860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:35,893-Speed 6308.06 samples/sec Loss 4.0800 LearningRate 0.0002 Epoch: 25 Global Step: 521870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:39,124-Speed 6340.45 samples/sec Loss 4.0652 LearningRate 0.0002 Epoch: 25 Global Step: 521880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:42,370-Speed 6309.96 samples/sec Loss 4.0919 LearningRate 0.0002 Epoch: 25 Global Step: 521890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:45,612-Speed 6318.28 samples/sec Loss 4.1581 LearningRate 0.0002 Epoch: 25 Global Step: 521900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:48,858-Speed 6311.40 samples/sec Loss 4.0809 LearningRate 0.0002 Epoch: 25 Global Step: 521910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:52,115-Speed 6289.33 samples/sec Loss 4.1234 LearningRate 0.0002 Epoch: 25 Global Step: 521920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:12:55,356-Speed 6319.20 samples/sec Loss 4.0810 LearningRate 0.0002 Epoch: 25 Global Step: 521930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
15:12:58,619-Speed 6277.96 samples/sec Loss 4.0604 LearningRate 0.0002 Epoch: 25 Global Step: 521940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:01,900-Speed 6243.63 samples/sec Loss 4.1103 LearningRate 0.0002 Epoch: 25 Global Step: 521950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:05,145-Speed 6313.10 samples/sec Loss 4.0184 LearningRate 0.0002 Epoch: 25 Global Step: 521960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:08,394-Speed 6304.26 samples/sec Loss 4.0800 LearningRate 0.0002 Epoch: 25 Global Step: 521970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:11,624-Speed 6342.39 samples/sec Loss 4.0653 LearningRate 0.0002 Epoch: 25 Global Step: 521980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:14,871-Speed 6309.77 samples/sec Loss 4.1477 LearningRate 0.0002 Epoch: 25 Global Step: 521990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:18,099-Speed 6345.91 samples/sec Loss 4.1072 LearningRate 0.0002 Epoch: 25 Global Step: 522000 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:21,341-Speed 6318.49 samples/sec Loss 4.0787 LearningRate 0.0002 Epoch: 25 Global Step: 522010 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:24,596-Speed 6293.07 samples/sec Loss 4.0456 LearningRate 0.0002 Epoch: 25 Global Step: 522020 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:27,838-Speed 6319.87 samples/sec Loss 4.0303 LearningRate 0.0002 Epoch: 25 Global Step: 522030 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:31,083-Speed 6311.76 samples/sec Loss 4.1057 LearningRate 0.0002 Epoch: 25 Global Step: 522040 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:34,337-Speed 6295.01 samples/sec Loss 4.1198 LearningRate 0.0002 Epoch: 25 Global Step: 522050 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:37,641-Speed 6200.01 
samples/sec Loss 4.0760 LearningRate 0.0002 Epoch: 25 Global Step: 522060 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:40,884-Speed 6316.11 samples/sec Loss 4.1627 LearningRate 0.0002 Epoch: 25 Global Step: 522070 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:44,131-Speed 6309.34 samples/sec Loss 4.1116 LearningRate 0.0002 Epoch: 25 Global Step: 522080 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:47,374-Speed 6316.39 samples/sec Loss 4.1059 LearningRate 0.0002 Epoch: 25 Global Step: 522090 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:13:50,613-Speed 6323.27 samples/sec Loss 4.0541 LearningRate 0.0002 Epoch: 25 Global Step: 522100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:53,856-Speed 6317.24 samples/sec Loss 4.1258 LearningRate 0.0002 Epoch: 25 Global Step: 522110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:13:57,103-Speed 6309.90 samples/sec Loss 3.9970 LearningRate 0.0002 Epoch: 25 Global Step: 522120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:00,347-Speed 6313.95 samples/sec Loss 4.1262 LearningRate 0.0002 Epoch: 25 Global Step: 522130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:03,591-Speed 6315.03 samples/sec Loss 4.0836 LearningRate 0.0002 Epoch: 25 Global Step: 522140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:06,834-Speed 6315.83 samples/sec Loss 4.0987 LearningRate 0.0002 Epoch: 25 Global Step: 522150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:10,077-Speed 6316.09 samples/sec Loss 4.0813 LearningRate 0.0002 Epoch: 25 Global Step: 522160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:13,324-Speed 6308.72 samples/sec Loss 4.1412 LearningRate 0.0002 Epoch: 25 Global Step: 522170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:16,571-Speed 6309.03 samples/sec Loss 4.1708 
LearningRate 0.0002 Epoch: 25 Global Step: 522180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:19,816-Speed 6313.86 samples/sec Loss 4.0379 LearningRate 0.0002 Epoch: 25 Global Step: 522190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:23,163-Speed 6121.12 samples/sec Loss 4.0672 LearningRate 0.0002 Epoch: 25 Global Step: 522200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:26,514-Speed 6112.20 samples/sec Loss 4.1190 LearningRate 0.0002 Epoch: 25 Global Step: 522210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:29,865-Speed 6112.54 samples/sec Loss 4.0520 LearningRate 0.0002 Epoch: 25 Global Step: 522220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:33,110-Speed 6312.59 samples/sec Loss 4.1013 LearningRate 0.0002 Epoch: 25 Global Step: 522230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:36,354-Speed 6314.35 samples/sec Loss 4.1237 LearningRate 0.0002 Epoch: 25 Global Step: 522240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:39,600-Speed 6311.47 samples/sec Loss 4.0635 LearningRate 0.0002 Epoch: 25 Global Step: 522250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:42,847-Speed 6309.39 samples/sec Loss 4.1261 LearningRate 0.0002 Epoch: 25 Global Step: 522260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:46,090-Speed 6316.38 samples/sec Loss 4.1534 LearningRate 0.0002 Epoch: 25 Global Step: 522270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:49,334-Speed 6312.87 samples/sec Loss 4.1452 LearningRate 0.0002 Epoch: 25 Global Step: 522280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:52,575-Speed 6321.05 samples/sec Loss 4.0986 LearningRate 0.0002 Epoch: 25 Global Step: 522290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:14:55,825-Speed 6303.99 samples/sec Loss 4.0923 LearningRate 0.0002 Epoch: 25 
Global Step: 522300 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:14:59,054-Speed 6342.57 samples/sec Loss 4.0751 LearningRate 0.0002 Epoch: 25 Global Step: 522310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:02,299-Speed 6314.04 samples/sec Loss 4.1222 LearningRate 0.0002 Epoch: 25 Global Step: 522320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:05,550-Speed 6300.50 samples/sec Loss 4.1365 LearningRate 0.0002 Epoch: 25 Global Step: 522330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:08,791-Speed 6319.40 samples/sec Loss 4.0611 LearningRate 0.0002 Epoch: 25 Global Step: 522340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:12,040-Speed 6304.61 samples/sec Loss 4.1005 LearningRate 0.0002 Epoch: 25 Global Step: 522350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:15,286-Speed 6311.18 samples/sec Loss 4.0677 LearningRate 0.0002 Epoch: 25 Global Step: 522360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:18,540-Speed 6295.52 samples/sec Loss 4.0734 LearningRate 0.0002 Epoch: 25 Global Step: 522370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:21,788-Speed 6307.05 samples/sec Loss 4.0755 LearningRate 0.0002 Epoch: 25 Global Step: 522380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:25,030-Speed 6318.63 samples/sec Loss 4.0953 LearningRate 0.0002 Epoch: 25 Global Step: 522390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:28,275-Speed 6311.97 samples/sec Loss 4.1053 LearningRate 0.0002 Epoch: 25 Global Step: 522400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:31,516-Speed 6320.80 samples/sec Loss 4.1271 LearningRate 0.0002 Epoch: 25 Global Step: 522410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:34,761-Speed 6313.45 samples/sec Loss 4.1266 LearningRate 0.0002 Epoch: 25 Global Step: 522420 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:38,010-Speed 6305.59 samples/sec Loss 4.0949 LearningRate 0.0002 Epoch: 25 Global Step: 522430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:41,258-Speed 6306.09 samples/sec Loss 4.0552 LearningRate 0.0002 Epoch: 25 Global Step: 522440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:44,503-Speed 6312.69 samples/sec Loss 4.0874 LearningRate 0.0002 Epoch: 25 Global Step: 522450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:47,745-Speed 6317.50 samples/sec Loss 4.1176 LearningRate 0.0002 Epoch: 25 Global Step: 522460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:50,994-Speed 6305.04 samples/sec Loss 4.1057 LearningRate 0.0002 Epoch: 25 Global Step: 522470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:54,239-Speed 6313.14 samples/sec Loss 4.1379 LearningRate 0.0002 Epoch: 25 Global Step: 522480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:15:57,500-Speed 6281.85 samples/sec Loss 4.0746 LearningRate 0.0002 Epoch: 25 Global Step: 522490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:00,751-Speed 6300.04 samples/sec Loss 4.0803 LearningRate 0.0002 Epoch: 25 Global Step: 522500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:03,989-Speed 6327.85 samples/sec Loss 4.1838 LearningRate 0.0002 Epoch: 25 Global Step: 522510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:07,237-Speed 6307.10 samples/sec Loss 4.0546 LearningRate 0.0002 Epoch: 25 Global Step: 522520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:10,491-Speed 6294.85 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 25 Global Step: 522530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:13,735-Speed 6313.41 samples/sec Loss 4.0907 LearningRate 0.0002 Epoch: 25 Global Step: 522540 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:16:16,979-Speed 6315.30 samples/sec Loss 4.1110 LearningRate 0.0002 Epoch: 25 Global Step: 522550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:20,232-Speed 6296.72 samples/sec Loss 4.0798 LearningRate 0.0002 Epoch: 25 Global Step: 522560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:23,480-Speed 6307.81 samples/sec Loss 4.1307 LearningRate 0.0002 Epoch: 25 Global Step: 522570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:26,725-Speed 6312.77 samples/sec Loss 4.1193 LearningRate 0.0002 Epoch: 25 Global Step: 522580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:29,972-Speed 6307.17 samples/sec Loss 4.0974 LearningRate 0.0002 Epoch: 25 Global Step: 522590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:33,222-Speed 6304.12 samples/sec Loss 4.1689 LearningRate 0.0002 Epoch: 25 Global Step: 522600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:36,453-Speed 6339.43 samples/sec Loss 4.1228 LearningRate 0.0002 Epoch: 25 Global Step: 522610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:39,699-Speed 6311.49 samples/sec Loss 4.1145 LearningRate 0.0002 Epoch: 25 Global Step: 522620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:42,946-Speed 6309.36 samples/sec Loss 4.1771 LearningRate 0.0002 Epoch: 25 Global Step: 522630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:46,191-Speed 6313.53 samples/sec Loss 4.1386 LearningRate 0.0002 Epoch: 25 Global Step: 522640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:49,433-Speed 6317.44 samples/sec Loss 4.0371 LearningRate 0.0002 Epoch: 25 Global Step: 522650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:52,679-Speed 6311.03 samples/sec Loss 4.0782 LearningRate 0.0002 Epoch: 25 Global Step: 522660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
15:16:55,923-Speed 6314.09 samples/sec Loss 4.0738 LearningRate 0.0002 Epoch: 25 Global Step: 522670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:16:59,168-Speed 6313.09 samples/sec Loss 4.0563 LearningRate 0.0002 Epoch: 25 Global Step: 522680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:02,422-Speed 6295.90 samples/sec Loss 4.0977 LearningRate 0.0002 Epoch: 25 Global Step: 522690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:05,670-Speed 6306.78 samples/sec Loss 4.1004 LearningRate 0.0002 Epoch: 25 Global Step: 522700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:08,911-Speed 6320.93 samples/sec Loss 4.1091 LearningRate 0.0002 Epoch: 25 Global Step: 522710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:12,158-Speed 6307.50 samples/sec Loss 4.0892 LearningRate 0.0002 Epoch: 25 Global Step: 522720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:15,407-Speed 6305.29 samples/sec Loss 4.0852 LearningRate 0.0002 Epoch: 25 Global Step: 522730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:18,654-Speed 6308.72 samples/sec Loss 4.0468 LearningRate 0.0002 Epoch: 25 Global Step: 522740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:21,895-Speed 6321.50 samples/sec Loss 4.1195 LearningRate 0.0002 Epoch: 25 Global Step: 522750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:25,142-Speed 6307.88 samples/sec Loss 4.0804 LearningRate 0.0002 Epoch: 25 Global Step: 522760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:28,388-Speed 6310.38 samples/sec Loss 4.0890 LearningRate 0.0002 Epoch: 25 Global Step: 522770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:31,631-Speed 6316.70 samples/sec Loss 4.0768 LearningRate 0.0002 Epoch: 25 Global Step: 522780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:34,875-Speed 6315.51 
samples/sec Loss 4.1015 LearningRate 0.0002 Epoch: 25 Global Step: 522790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:38,124-Speed 6304.46 samples/sec Loss 4.0170 LearningRate 0.0002 Epoch: 25 Global Step: 522800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:41,353-Speed 6343.45 samples/sec Loss 4.1449 LearningRate 0.0002 Epoch: 25 Global Step: 522810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:44,594-Speed 6321.40 samples/sec Loss 4.1120 LearningRate 0.0002 Epoch: 25 Global Step: 522820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:47,837-Speed 6315.03 samples/sec Loss 4.0198 LearningRate 0.0002 Epoch: 25 Global Step: 522830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:51,084-Speed 6308.81 samples/sec Loss 4.0721 LearningRate 0.0002 Epoch: 25 Global Step: 522840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:54,332-Speed 6308.54 samples/sec Loss 4.1319 LearningRate 0.0002 Epoch: 25 Global Step: 522850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:17:57,575-Speed 6315.33 samples/sec Loss 4.1340 LearningRate 0.0002 Epoch: 25 Global Step: 522860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:00,818-Speed 6318.17 samples/sec Loss 4.1210 LearningRate 0.0002 Epoch: 25 Global Step: 522870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:04,063-Speed 6311.75 samples/sec Loss 4.1421 LearningRate 0.0002 Epoch: 25 Global Step: 522880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:07,305-Speed 6318.29 samples/sec Loss 4.1618 LearningRate 0.0002 Epoch: 25 Global Step: 522890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:10,553-Speed 6308.13 samples/sec Loss 4.1629 LearningRate 0.0002 Epoch: 25 Global Step: 522900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:13,782-Speed 6342.94 samples/sec Loss 4.1184 
LearningRate 0.0002 Epoch: 25 Global Step: 522910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:17,025-Speed 6316.40 samples/sec Loss 4.0933 LearningRate 0.0002 Epoch: 25 Global Step: 522920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:20,273-Speed 6307.30 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 25 Global Step: 522930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:23,517-Speed 6314.08 samples/sec Loss 4.1224 LearningRate 0.0002 Epoch: 25 Global Step: 522940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:26,768-Speed 6301.33 samples/sec Loss 4.1281 LearningRate 0.0002 Epoch: 25 Global Step: 522950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:30,024-Speed 6291.13 samples/sec Loss 4.1247 LearningRate 0.0002 Epoch: 25 Global Step: 522960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:33,268-Speed 6314.56 samples/sec Loss 4.1129 LearningRate 0.0002 Epoch: 25 Global Step: 522970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:36,514-Speed 6309.54 samples/sec Loss 4.1400 LearningRate 0.0002 Epoch: 25 Global Step: 522980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:39,765-Speed 6302.04 samples/sec Loss 4.1308 LearningRate 0.0002 Epoch: 25 Global Step: 522990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:43,008-Speed 6315.72 samples/sec Loss 4.0765 LearningRate 0.0002 Epoch: 25 Global Step: 523000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:46,239-Speed 6341.31 samples/sec Loss 4.1357 LearningRate 0.0002 Epoch: 25 Global Step: 523010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:49,482-Speed 6316.54 samples/sec Loss 4.0803 LearningRate 0.0002 Epoch: 25 Global Step: 523020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:52,727-Speed 6311.30 samples/sec Loss 4.0771 LearningRate 0.0002 Epoch: 25 
Global Step: 523030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:55,971-Speed 6314.88 samples/sec Loss 4.0930 LearningRate 0.0002 Epoch: 25 Global Step: 523040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:18:59,218-Speed 6309.04 samples/sec Loss 4.1127 LearningRate 0.0002 Epoch: 25 Global Step: 523050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:02,463-Speed 6313.82 samples/sec Loss 4.0171 LearningRate 0.0002 Epoch: 25 Global Step: 523060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:05,707-Speed 6314.66 samples/sec Loss 4.0789 LearningRate 0.0002 Epoch: 25 Global Step: 523070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:08,970-Speed 6277.23 samples/sec Loss 4.0685 LearningRate 0.0002 Epoch: 25 Global Step: 523080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:12,203-Speed 6336.91 samples/sec Loss 4.0805 LearningRate 0.0002 Epoch: 25 Global Step: 523090 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:15,464-Speed 6280.89 samples/sec Loss 4.1254 LearningRate 0.0002 Epoch: 25 Global Step: 523100 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:18,718-Speed 6296.75 samples/sec Loss 4.1134 LearningRate 0.0002 Epoch: 25 Global Step: 523110 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:21,973-Speed 6291.90 samples/sec Loss 4.0759 LearningRate 0.0002 Epoch: 25 Global Step: 523120 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:25,216-Speed 6317.20 samples/sec Loss 4.0557 LearningRate 0.0002 Epoch: 25 Global Step: 523130 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:28,461-Speed 6311.98 samples/sec Loss 4.0925 LearningRate 0.0002 Epoch: 25 Global Step: 523140 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:31,709-Speed 6307.45 samples/sec Loss 4.1014 LearningRate 0.0002 Epoch: 25 Global Step: 523150 Fp16 Grad Scale: 
8192 Required: 28 hours Training: 2022-04-02 15:19:34,949-Speed 6322.52 samples/sec Loss 4.1044 LearningRate 0.0002 Epoch: 25 Global Step: 523160 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:38,194-Speed 6312.18 samples/sec Loss 4.0601 LearningRate 0.0002 Epoch: 25 Global Step: 523170 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:41,440-Speed 6311.09 samples/sec Loss 4.0762 LearningRate 0.0002 Epoch: 25 Global Step: 523180 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:19:44,683-Speed 6317.09 samples/sec Loss 4.1085 LearningRate 0.0002 Epoch: 25 Global Step: 523190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:47,931-Speed 6306.23 samples/sec Loss 4.1450 LearningRate 0.0002 Epoch: 25 Global Step: 523200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:51,175-Speed 6315.50 samples/sec Loss 4.0828 LearningRate 0.0002 Epoch: 25 Global Step: 523210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:54,421-Speed 6309.70 samples/sec Loss 4.1112 LearningRate 0.0002 Epoch: 25 Global Step: 523220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:19:57,662-Speed 6320.33 samples/sec Loss 4.1139 LearningRate 0.0002 Epoch: 25 Global Step: 523230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:00,905-Speed 6316.57 samples/sec Loss 4.1182 LearningRate 0.0002 Epoch: 25 Global Step: 523240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:04,149-Speed 6315.68 samples/sec Loss 4.1111 LearningRate 0.0002 Epoch: 25 Global Step: 523250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:07,393-Speed 6314.29 samples/sec Loss 4.1096 LearningRate 0.0002 Epoch: 25 Global Step: 523260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:10,637-Speed 6315.42 samples/sec Loss 4.1648 LearningRate 0.0002 Epoch: 25 Global Step: 523270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-04-02 15:20:13,885-Speed 6307.36 samples/sec Loss 4.1438 LearningRate 0.0002 Epoch: 25 Global Step: 523280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:17,119-Speed 6333.66 samples/sec Loss 4.2041 LearningRate 0.0002 Epoch: 25 Global Step: 523290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:20,363-Speed 6314.51 samples/sec Loss 4.0731 LearningRate 0.0002 Epoch: 25 Global Step: 523300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:23,607-Speed 6314.87 samples/sec Loss 4.0717 LearningRate 0.0002 Epoch: 25 Global Step: 523310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:26,852-Speed 6311.52 samples/sec Loss 4.0988 LearningRate 0.0002 Epoch: 25 Global Step: 523320 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:30,097-Speed 6313.71 samples/sec Loss 4.1212 LearningRate 0.0002 Epoch: 25 Global Step: 523330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:33,342-Speed 6312.54 samples/sec Loss 4.1435 LearningRate 0.0002 Epoch: 25 Global Step: 523340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:36,586-Speed 6314.40 samples/sec Loss 4.0825 LearningRate 0.0002 Epoch: 25 Global Step: 523350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:39,835-Speed 6304.00 samples/sec Loss 4.0526 LearningRate 0.0002 Epoch: 25 Global Step: 523360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:43,084-Speed 6305.73 samples/sec Loss 4.1453 LearningRate 0.0002 Epoch: 25 Global Step: 523370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:46,326-Speed 6318.53 samples/sec Loss 4.0906 LearningRate 0.0002 Epoch: 25 Global Step: 523380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:49,555-Speed 6342.40 samples/sec Loss 4.1008 LearningRate 0.0002 Epoch: 25 Global Step: 523390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:52,796-Speed 
6320.61 samples/sec Loss 4.0961 LearningRate 0.0002 Epoch: 25 Global Step: 523400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:56,041-Speed 6313.99 samples/sec Loss 4.0565 LearningRate 0.0002 Epoch: 25 Global Step: 523410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:20:59,285-Speed 6313.44 samples/sec Loss 4.0476 LearningRate 0.0002 Epoch: 25 Global Step: 523420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:02,532-Speed 6309.41 samples/sec Loss 4.1121 LearningRate 0.0002 Epoch: 25 Global Step: 523430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:05,779-Speed 6307.42 samples/sec Loss 4.1248 LearningRate 0.0002 Epoch: 25 Global Step: 523440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:09,037-Speed 6289.48 samples/sec Loss 4.0688 LearningRate 0.0002 Epoch: 25 Global Step: 523450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:12,282-Speed 6312.04 samples/sec Loss 4.0945 LearningRate 0.0002 Epoch: 25 Global Step: 523460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:15,526-Speed 6313.48 samples/sec Loss 4.1202 LearningRate 0.0002 Epoch: 25 Global Step: 523470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:18,778-Speed 6300.30 samples/sec Loss 4.0928 LearningRate 0.0002 Epoch: 25 Global Step: 523480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:22,010-Speed 6337.20 samples/sec Loss 4.0584 LearningRate 0.0002 Epoch: 25 Global Step: 523490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:25,261-Speed 6301.80 samples/sec Loss 4.1998 LearningRate 0.0002 Epoch: 25 Global Step: 523500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:28,510-Speed 6304.87 samples/sec Loss 4.0737 LearningRate 0.0002 Epoch: 25 Global Step: 523510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:31,751-Speed 6321.54 samples/sec Loss 4.1313 
LearningRate 0.0002 Epoch: 25 Global Step: 523520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:34,996-Speed 6312.48 samples/sec Loss 4.0934 LearningRate 0.0002 Epoch: 25 Global Step: 523530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:38,243-Speed 6308.23 samples/sec Loss 4.0494 LearningRate 0.0002 Epoch: 25 Global Step: 523540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:41,487-Speed 6314.83 samples/sec Loss 4.0548 LearningRate 0.0002 Epoch: 25 Global Step: 523550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:44,728-Speed 6320.58 samples/sec Loss 4.0584 LearningRate 0.0002 Epoch: 25 Global Step: 523560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:48,020-Speed 6221.65 samples/sec Loss 4.1814 LearningRate 0.0002 Epoch: 25 Global Step: 523570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:51,300-Speed 6245.06 samples/sec Loss 4.0636 LearningRate 0.0002 Epoch: 25 Global Step: 523580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:54,531-Speed 6340.44 samples/sec Loss 4.0734 LearningRate 0.0002 Epoch: 25 Global Step: 523590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:21:57,780-Speed 6306.43 samples/sec Loss 4.1381 LearningRate 0.0002 Epoch: 25 Global Step: 523600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:01,081-Speed 6204.30 samples/sec Loss 4.1041 LearningRate 0.0002 Epoch: 25 Global Step: 523610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:04,326-Speed 6313.49 samples/sec Loss 4.1366 LearningRate 0.0002 Epoch: 25 Global Step: 523620 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:07,570-Speed 6313.65 samples/sec Loss 4.1007 LearningRate 0.0002 Epoch: 25 Global Step: 523630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:10,812-Speed 6318.45 samples/sec Loss 4.1402 LearningRate 0.0002 Epoch: 25 
Global Step: 523640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:14,056-Speed 6314.17 samples/sec Loss 4.1594 LearningRate 0.0002 Epoch: 25 Global Step: 523650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:17,298-Speed 6319.04 samples/sec Loss 4.0519 LearningRate 0.0002 Epoch: 25 Global Step: 523660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:20,543-Speed 6312.99 samples/sec Loss 4.0917 LearningRate 0.0002 Epoch: 25 Global Step: 523670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:23,788-Speed 6312.79 samples/sec Loss 4.1346 LearningRate 0.0002 Epoch: 25 Global Step: 523680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:27,019-Speed 6338.62 samples/sec Loss 4.1246 LearningRate 0.0002 Epoch: 25 Global Step: 523690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:30,270-Speed 6302.13 samples/sec Loss 4.1024 LearningRate 0.0002 Epoch: 25 Global Step: 523700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:33,515-Speed 6311.96 samples/sec Loss 4.1330 LearningRate 0.0002 Epoch: 25 Global Step: 523710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:36,759-Speed 6314.80 samples/sec Loss 4.0800 LearningRate 0.0002 Epoch: 25 Global Step: 523720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:40,007-Speed 6308.61 samples/sec Loss 4.0673 LearningRate 0.0002 Epoch: 25 Global Step: 523730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:43,250-Speed 6315.19 samples/sec Loss 4.1144 LearningRate 0.0002 Epoch: 25 Global Step: 523740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:46,496-Speed 6311.35 samples/sec Loss 4.1145 LearningRate 0.0002 Epoch: 25 Global Step: 523750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:49,740-Speed 6315.26 samples/sec Loss 4.1178 LearningRate 0.0002 Epoch: 25 Global Step: 523760 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:52,984-Speed 6314.31 samples/sec Loss 4.1242 LearningRate 0.0002 Epoch: 25 Global Step: 523770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:56,229-Speed 6312.00 samples/sec Loss 4.0896 LearningRate 0.0002 Epoch: 25 Global Step: 523780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:22:59,475-Speed 6314.94 samples/sec Loss 4.1058 LearningRate 0.0002 Epoch: 25 Global Step: 523790 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:23:02,688-Speed 6375.01 samples/sec Loss 4.1550 LearningRate 0.0002 Epoch: 25 Global Step: 523800 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:05,927-Speed 6324.68 samples/sec Loss 4.0461 LearningRate 0.0002 Epoch: 25 Global Step: 523810 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:09,168-Speed 6320.91 samples/sec Loss 4.0933 LearningRate 0.0002 Epoch: 25 Global Step: 523820 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:12,411-Speed 6315.20 samples/sec Loss 4.0761 LearningRate 0.0002 Epoch: 25 Global Step: 523830 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:15,655-Speed 6316.12 samples/sec Loss 4.1318 LearningRate 0.0002 Epoch: 25 Global Step: 523840 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:18,902-Speed 6308.05 samples/sec Loss 4.1087 LearningRate 0.0002 Epoch: 25 Global Step: 523850 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:22,142-Speed 6321.53 samples/sec Loss 4.0991 LearningRate 0.0002 Epoch: 25 Global Step: 523860 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:25,393-Speed 6301.45 samples/sec Loss 4.1019 LearningRate 0.0002 Epoch: 25 Global Step: 523870 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:28,638-Speed 6313.41 samples/sec Loss 4.1345 LearningRate 0.0002 Epoch: 25 Global Step: 523880 Fp16 Grad Scale: 8192 Required: 28 hours 
Training: 2022-04-02 15:23:31,881-Speed 6315.77 samples/sec Loss 4.1231 LearningRate 0.0002 Epoch: 25 Global Step: 523890 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:23:35,125-Speed 6314.08 samples/sec Loss 4.0389 LearningRate 0.0002 Epoch: 25 Global Step: 523900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:38,372-Speed 6310.91 samples/sec Loss 4.0538 LearningRate 0.0002 Epoch: 25 Global Step: 523910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:41,617-Speed 6312.06 samples/sec Loss 4.0535 LearningRate 0.0002 Epoch: 25 Global Step: 523920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:44,861-Speed 6315.47 samples/sec Loss 4.1133 LearningRate 0.0002 Epoch: 25 Global Step: 523930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:48,106-Speed 6312.91 samples/sec Loss 4.0600 LearningRate 0.0002 Epoch: 25 Global Step: 523940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:51,354-Speed 6305.61 samples/sec Loss 4.1506 LearningRate 0.0002 Epoch: 25 Global Step: 523950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:54,597-Speed 6316.83 samples/sec Loss 4.1035 LearningRate 0.0002 Epoch: 25 Global Step: 523960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:23:57,844-Speed 6309.64 samples/sec Loss 3.9934 LearningRate 0.0002 Epoch: 25 Global Step: 523970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:01,087-Speed 6315.74 samples/sec Loss 4.1180 LearningRate 0.0002 Epoch: 25 Global Step: 523980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:04,331-Speed 6314.31 samples/sec Loss 4.0171 LearningRate 0.0002 Epoch: 25 Global Step: 523990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:07,569-Speed 6326.75 samples/sec Loss 4.1258 LearningRate 0.0002 Epoch: 25 Global Step: 524000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
15:24:10,814-Speed 6312.51 samples/sec Loss 4.0927 LearningRate 0.0002 Epoch: 25 Global Step: 524010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:14,060-Speed 6310.77 samples/sec Loss 4.0725 LearningRate 0.0002 Epoch: 25 Global Step: 524020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:17,302-Speed 6319.20 samples/sec Loss 4.0878 LearningRate 0.0002 Epoch: 25 Global Step: 524030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:20,548-Speed 6310.53 samples/sec Loss 4.0535 LearningRate 0.0002 Epoch: 25 Global Step: 524040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:23,795-Speed 6308.97 samples/sec Loss 4.0654 LearningRate 0.0002 Epoch: 25 Global Step: 524050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:27,042-Speed 6308.49 samples/sec Loss 4.1029 LearningRate 0.0002 Epoch: 25 Global Step: 524060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:30,288-Speed 6310.05 samples/sec Loss 4.2066 LearningRate 0.0002 Epoch: 25 Global Step: 524070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:33,533-Speed 6312.99 samples/sec Loss 4.0767 LearningRate 0.0002 Epoch: 25 Global Step: 524080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:36,777-Speed 6314.45 samples/sec Loss 4.0659 LearningRate 0.0002 Epoch: 25 Global Step: 524090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:40,008-Speed 6340.84 samples/sec Loss 4.0748 LearningRate 0.0002 Epoch: 25 Global Step: 524100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:43,254-Speed 6311.13 samples/sec Loss 4.1406 LearningRate 0.0002 Epoch: 25 Global Step: 524110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:46,498-Speed 6314.22 samples/sec Loss 4.0851 LearningRate 0.0002 Epoch: 25 Global Step: 524120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:49,752-Speed 6294.80 
samples/sec Loss 4.0342 LearningRate 0.0002 Epoch: 25 Global Step: 524130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:52,997-Speed 6313.57 samples/sec Loss 4.0876 LearningRate 0.0002 Epoch: 25 Global Step: 524140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:56,245-Speed 6306.91 samples/sec Loss 4.1270 LearningRate 0.0002 Epoch: 25 Global Step: 524150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:24:59,491-Speed 6310.53 samples/sec Loss 4.1123 LearningRate 0.0002 Epoch: 25 Global Step: 524160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:02,736-Speed 6312.43 samples/sec Loss 4.1183 LearningRate 0.0002 Epoch: 25 Global Step: 524170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:05,980-Speed 6315.54 samples/sec Loss 4.1539 LearningRate 0.0002 Epoch: 25 Global Step: 524180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:09,224-Speed 6314.80 samples/sec Loss 4.0979 LearningRate 0.0002 Epoch: 25 Global Step: 524190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:12,468-Speed 6313.23 samples/sec Loss 4.1230 LearningRate 0.0002 Epoch: 25 Global Step: 524200 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:25:15,698-Speed 6342.99 samples/sec Loss 4.0952 LearningRate 0.0002 Epoch: 25 Global Step: 524210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:18,943-Speed 6311.64 samples/sec Loss 4.0743 LearningRate 0.0002 Epoch: 25 Global Step: 524220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:22,196-Speed 6297.86 samples/sec Loss 4.0545 LearningRate 0.0002 Epoch: 25 Global Step: 524230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:25,445-Speed 6304.73 samples/sec Loss 4.1027 LearningRate 0.0002 Epoch: 25 Global Step: 524240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:28,687-Speed 6318.42 samples/sec Loss 4.0919 
LearningRate 0.0002 Epoch: 25 Global Step: 524250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:31,937-Speed 6302.79 samples/sec Loss 4.1444 LearningRate 0.0002 Epoch: 25 Global Step: 524260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:35,179-Speed 6318.26 samples/sec Loss 4.0722 LearningRate 0.0002 Epoch: 25 Global Step: 524270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:38,430-Speed 6302.24 samples/sec Loss 4.1313 LearningRate 0.0002 Epoch: 25 Global Step: 524280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:41,671-Speed 6319.29 samples/sec Loss 4.1311 LearningRate 0.0002 Epoch: 25 Global Step: 524290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:44,915-Speed 6316.31 samples/sec Loss 4.1276 LearningRate 0.0002 Epoch: 25 Global Step: 524300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:48,149-Speed 6333.43 samples/sec Loss 4.0845 LearningRate 0.0002 Epoch: 25 Global Step: 524310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:25:51,381-Speed 6340.98 samples/sec Loss 4.0798 LearningRate 0.0002 Epoch: 25 Global Step: 524320 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:25:54,627-Speed 6310.76 samples/sec Loss 4.1286 LearningRate 0.0002 Epoch: 25 Global Step: 524330 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:25:57,870-Speed 6317.19 samples/sec Loss 4.0859 LearningRate 0.0002 Epoch: 25 Global Step: 524340 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:01,114-Speed 6312.90 samples/sec Loss 4.1087 LearningRate 0.0002 Epoch: 25 Global Step: 524350 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:04,363-Speed 6306.81 samples/sec Loss 4.0592 LearningRate 0.0002 Epoch: 25 Global Step: 524360 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:07,604-Speed 6319.18 samples/sec Loss 4.0695 LearningRate 0.0002 Epoch: 25 Global 
Step: 524370 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:10,849-Speed 6313.51 samples/sec Loss 4.0503 LearningRate 0.0002 Epoch: 25 Global Step: 524380 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:14,091-Speed 6320.75 samples/sec Loss 4.0913 LearningRate 0.0002 Epoch: 25 Global Step: 524390 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:17,338-Speed 6309.87 samples/sec Loss 4.0993 LearningRate 0.0002 Epoch: 25 Global Step: 524400 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:20,585-Speed 6307.91 samples/sec Loss 4.0507 LearningRate 0.0002 Epoch: 25 Global Step: 524410 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:26:23,828-Speed 6316.79 samples/sec Loss 4.0523 LearningRate 0.0002 Epoch: 25 Global Step: 524420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:27,076-Speed 6307.07 samples/sec Loss 4.0677 LearningRate 0.0002 Epoch: 25 Global Step: 524430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:30,319-Speed 6316.69 samples/sec Loss 4.0915 LearningRate 0.0002 Epoch: 25 Global Step: 524440 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:33,566-Speed 6309.17 samples/sec Loss 4.0863 LearningRate 0.0002 Epoch: 25 Global Step: 524450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:36,810-Speed 6313.98 samples/sec Loss 4.0737 LearningRate 0.0002 Epoch: 25 Global Step: 524460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:40,057-Speed 6307.94 samples/sec Loss 4.1083 LearningRate 0.0002 Epoch: 25 Global Step: 524470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:43,303-Speed 6310.60 samples/sec Loss 4.2118 LearningRate 0.0002 Epoch: 25 Global Step: 524480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:46,545-Speed 6318.60 samples/sec Loss 4.0959 LearningRate 0.0002 Epoch: 25 Global Step: 524490 Fp16 Grad Scale: 16384 
Required: 28 hours Training: 2022-04-02 15:26:49,792-Speed 6310.51 samples/sec Loss 4.1012 LearningRate 0.0002 Epoch: 25 Global Step: 524500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:53,035-Speed 6315.99 samples/sec Loss 4.1253 LearningRate 0.0002 Epoch: 25 Global Step: 524510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:56,270-Speed 6331.93 samples/sec Loss 4.1486 LearningRate 0.0002 Epoch: 25 Global Step: 524520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:26:59,519-Speed 6304.64 samples/sec Loss 4.0720 LearningRate 0.0002 Epoch: 25 Global Step: 524530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:02,763-Speed 6313.68 samples/sec Loss 4.0331 LearningRate 0.0002 Epoch: 25 Global Step: 524540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:06,014-Speed 6302.29 samples/sec Loss 4.1549 LearningRate 0.0002 Epoch: 25 Global Step: 524550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:09,259-Speed 6312.12 samples/sec Loss 4.0606 LearningRate 0.0002 Epoch: 25 Global Step: 524560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:12,518-Speed 6286.96 samples/sec Loss 4.0810 LearningRate 0.0002 Epoch: 25 Global Step: 524570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:15,767-Speed 6304.67 samples/sec Loss 4.0410 LearningRate 0.0002 Epoch: 25 Global Step: 524580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:19,014-Speed 6308.45 samples/sec Loss 4.1071 LearningRate 0.0002 Epoch: 25 Global Step: 524590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:22,268-Speed 6296.55 samples/sec Loss 4.0708 LearningRate 0.0002 Epoch: 25 Global Step: 524600 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:25,515-Speed 6308.22 samples/sec Loss 4.0896 LearningRate 0.0002 Epoch: 25 Global Step: 524610 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-04-02 15:27:28,763-Speed 6306.40 samples/sec Loss 4.1155 LearningRate 0.0002 Epoch: 25 Global Step: 524620 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:27:31,995-Speed 6338.55 samples/sec Loss 4.0619 LearningRate 0.0002 Epoch: 25 Global Step: 524630 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:35,245-Speed 6302.10 samples/sec Loss 4.1041 LearningRate 0.0002 Epoch: 25 Global Step: 524640 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:38,488-Speed 6316.55 samples/sec Loss 4.1194 LearningRate 0.0002 Epoch: 25 Global Step: 524650 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:41,744-Speed 6290.36 samples/sec Loss 4.0950 LearningRate 0.0002 Epoch: 25 Global Step: 524660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:44,988-Speed 6316.36 samples/sec Loss 4.1634 LearningRate 0.0002 Epoch: 25 Global Step: 524670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:48,261-Speed 6257.23 samples/sec Loss 4.0895 LearningRate 0.0002 Epoch: 25 Global Step: 524680 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:51,520-Speed 6287.13 samples/sec Loss 4.1147 LearningRate 0.0002 Epoch: 25 Global Step: 524690 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:54,774-Speed 6294.80 samples/sec Loss 4.0195 LearningRate 0.0002 Epoch: 25 Global Step: 524700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:27:58,018-Speed 6314.32 samples/sec Loss 4.0692 LearningRate 0.0002 Epoch: 25 Global Step: 524710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:01,265-Speed 6309.00 samples/sec Loss 4.1389 LearningRate 0.0002 Epoch: 25 Global Step: 524720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:04,480-Speed 6371.95 samples/sec Loss 4.1082 LearningRate 0.0002 Epoch: 25 Global Step: 524730 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:07,721-Speed 
6319.27 samples/sec Loss 4.0944 LearningRate 0.0002 Epoch: 25 Global Step: 524740 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:10,965-Speed 6314.38 samples/sec Loss 4.1159 LearningRate 0.0002 Epoch: 25 Global Step: 524750 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:14,210-Speed 6317.60 samples/sec Loss 4.0437 LearningRate 0.0002 Epoch: 25 Global Step: 524760 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:17,459-Speed 6303.83 samples/sec Loss 4.0708 LearningRate 0.0002 Epoch: 25 Global Step: 524770 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:20,702-Speed 6319.19 samples/sec Loss 4.0185 LearningRate 0.0002 Epoch: 25 Global Step: 524780 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:23,945-Speed 6315.62 samples/sec Loss 4.1011 LearningRate 0.0002 Epoch: 25 Global Step: 524790 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:27,186-Speed 6320.02 samples/sec Loss 4.0884 LearningRate 0.0002 Epoch: 25 Global Step: 524800 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:30,429-Speed 6316.37 samples/sec Loss 4.0969 LearningRate 0.0002 Epoch: 25 Global Step: 524810 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:33,675-Speed 6312.35 samples/sec Loss 4.1459 LearningRate 0.0002 Epoch: 25 Global Step: 524820 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:28:36,919-Speed 6313.78 samples/sec Loss 4.0642 LearningRate 0.0002 Epoch: 25 Global Step: 524830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:40,165-Speed 6310.83 samples/sec Loss 4.1556 LearningRate 0.0002 Epoch: 25 Global Step: 524840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:43,408-Speed 6316.32 samples/sec Loss 4.0503 LearningRate 0.0002 Epoch: 25 Global Step: 524850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:46,656-Speed 6305.88 samples/sec Loss 4.1200 
LearningRate 0.0002 Epoch: 25 Global Step: 524860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:49,903-Speed 6309.98 samples/sec Loss 4.0686 LearningRate 0.0002 Epoch: 25 Global Step: 524870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:53,149-Speed 6310.05 samples/sec Loss 4.1189 LearningRate 0.0002 Epoch: 25 Global Step: 524880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:56,393-Speed 6314.24 samples/sec Loss 4.1055 LearningRate 0.0002 Epoch: 25 Global Step: 524890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:28:59,639-Speed 6311.74 samples/sec Loss 4.0588 LearningRate 0.0002 Epoch: 25 Global Step: 524900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:02,885-Speed 6310.87 samples/sec Loss 4.0336 LearningRate 0.0002 Epoch: 25 Global Step: 524910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:06,133-Speed 6306.04 samples/sec Loss 4.0055 LearningRate 0.0002 Epoch: 25 Global Step: 524920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:09,364-Speed 6339.39 samples/sec Loss 4.0967 LearningRate 0.0002 Epoch: 25 Global Step: 524930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:12,609-Speed 6312.67 samples/sec Loss 4.1311 LearningRate 0.0002 Epoch: 25 Global Step: 524940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:15,862-Speed 6297.88 samples/sec Loss 4.0815 LearningRate 0.0002 Epoch: 25 Global Step: 524950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:19,095-Speed 6336.17 samples/sec Loss 4.1051 LearningRate 0.0002 Epoch: 25 Global Step: 524960 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:22,343-Speed 6306.00 samples/sec Loss 4.0523 LearningRate 0.0002 Epoch: 25 Global Step: 524970 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:25,626-Speed 6240.77 samples/sec Loss 4.1056 LearningRate 0.0002 Epoch: 25 
Global Step: 524980 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:28,874-Speed 6307.39 samples/sec Loss 4.1730 LearningRate 0.0002 Epoch: 25 Global Step: 524990 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:32,117-Speed 6316.13 samples/sec Loss 4.0706 LearningRate 0.0002 Epoch: 25 Global Step: 525000 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:35,363-Speed 6310.27 samples/sec Loss 4.1074 LearningRate 0.0002 Epoch: 25 Global Step: 525010 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:38,609-Speed 6310.19 samples/sec Loss 4.0382 LearningRate 0.0002 Epoch: 25 Global Step: 525020 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:41,854-Speed 6313.36 samples/sec Loss 4.0642 LearningRate 0.0002 Epoch: 25 Global Step: 525030 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:45,097-Speed 6315.91 samples/sec Loss 4.1209 LearningRate 0.0002 Epoch: 25 Global Step: 525040 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:48,340-Speed 6316.71 samples/sec Loss 4.0127 LearningRate 0.0002 Epoch: 25 Global Step: 525050 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:29:51,582-Speed 6319.14 samples/sec Loss 4.1048 LearningRate 0.0002 Epoch: 25 Global Step: 525060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:54,831-Speed 6304.39 samples/sec Loss 4.0879 LearningRate 0.0002 Epoch: 25 Global Step: 525070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:29:58,079-Speed 6306.56 samples/sec Loss 4.1214 LearningRate 0.0002 Epoch: 25 Global Step: 525080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:01,322-Speed 6316.76 samples/sec Loss 4.0711 LearningRate 0.0002 Epoch: 25 Global Step: 525090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:04,570-Speed 6307.13 samples/sec Loss 4.0826 LearningRate 0.0002 Epoch: 25 Global Step: 525100 Fp16 Grad Scale: 
16384 Required: 28 hours Training: 2022-04-02 15:30:07,815-Speed 6312.92 samples/sec Loss 4.0485 LearningRate 0.0002 Epoch: 25 Global Step: 525110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:11,060-Speed 6312.88 samples/sec Loss 4.0558 LearningRate 0.0002 Epoch: 25 Global Step: 525120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:14,311-Speed 6300.90 samples/sec Loss 4.0866 LearningRate 0.0002 Epoch: 25 Global Step: 525130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:17,555-Speed 6313.77 samples/sec Loss 4.0186 LearningRate 0.0002 Epoch: 25 Global Step: 525140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:20,802-Speed 6309.44 samples/sec Loss 4.0595 LearningRate 0.0002 Epoch: 25 Global Step: 525150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:24,037-Speed 6332.92 samples/sec Loss 4.1229 LearningRate 0.0002 Epoch: 25 Global Step: 525160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:27,282-Speed 6311.46 samples/sec Loss 4.0455 LearningRate 0.0002 Epoch: 25 Global Step: 525170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:30,529-Speed 6308.87 samples/sec Loss 4.1073 LearningRate 0.0002 Epoch: 25 Global Step: 525180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:33,776-Speed 6309.19 samples/sec Loss 4.1157 LearningRate 0.0002 Epoch: 25 Global Step: 525190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:37,024-Speed 6307.88 samples/sec Loss 4.1223 LearningRate 0.0002 Epoch: 25 Global Step: 525200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:40,272-Speed 6307.13 samples/sec Loss 4.1669 LearningRate 0.0002 Epoch: 25 Global Step: 525210 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:30:43,515-Speed 6315.90 samples/sec Loss 4.1102 LearningRate 0.0002 Epoch: 25 Global Step: 525220 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:30:46,746-Speed 6341.07 samples/sec Loss 4.1423 LearningRate 0.0002 Epoch: 25 Global Step: 525230 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:30:49,988-Speed 6319.65 samples/sec Loss 4.0900 LearningRate 0.0002 Epoch: 25 Global Step: 525240 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:30:53,232-Speed 6312.61 samples/sec Loss 4.0476 LearningRate 0.0002 Epoch: 25 Global Step: 525250 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:30:56,480-Speed 6306.86 samples/sec Loss 4.1263 LearningRate 0.0002 Epoch: 25 Global Step: 525260 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:30:59,758-Speed 6249.05 samples/sec Loss 4.1459 LearningRate 0.0002 Epoch: 25 Global Step: 525270 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:03,004-Speed 6312.35 samples/sec Loss 4.0824 LearningRate 0.0002 Epoch: 25 Global Step: 525280 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:06,296-Speed 6222.12 samples/sec Loss 4.0176 LearningRate 0.0002 Epoch: 25 Global Step: 525290 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:09,544-Speed 6305.93 samples/sec Loss 4.0576 LearningRate 0.0002 Epoch: 25 Global Step: 525300 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:12,790-Speed 6309.80 samples/sec Loss 4.1036 LearningRate 0.0002 Epoch: 25 Global Step: 525310 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:16,044-Speed 6295.41 samples/sec Loss 4.1864 LearningRate 0.0002 Epoch: 25 Global Step: 525320 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:19,292-Speed 6308.45 samples/sec Loss 4.0700 LearningRate 0.0002 Epoch: 25 Global Step: 525330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:22,543-Speed 6300.57 samples/sec Loss 4.0900 LearningRate 0.0002 Epoch: 25 Global Step: 525340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:25,795-Speed 
6299.13 samples/sec Loss 4.0558 LearningRate 0.0002 Epoch: 25 Global Step: 525350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:29,042-Speed 6309.13 samples/sec Loss 4.0452 LearningRate 0.0002 Epoch: 25 Global Step: 525360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:32,292-Speed 6302.74 samples/sec Loss 4.0808 LearningRate 0.0002 Epoch: 25 Global Step: 525370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:35,537-Speed 6311.00 samples/sec Loss 4.0294 LearningRate 0.0002 Epoch: 25 Global Step: 525380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:38,783-Speed 6315.49 samples/sec Loss 4.0874 LearningRate 0.0002 Epoch: 25 Global Step: 525390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:42,028-Speed 6311.78 samples/sec Loss 4.0386 LearningRate 0.0002 Epoch: 25 Global Step: 525400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:31:45,261-Speed 6336.50 samples/sec Loss 4.1001 LearningRate 0.0002 Epoch: 25 Global Step: 525410 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:48,508-Speed 6309.63 samples/sec Loss 4.0890 LearningRate 0.0002 Epoch: 25 Global Step: 525420 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:51,756-Speed 6307.06 samples/sec Loss 4.0573 LearningRate 0.0002 Epoch: 25 Global Step: 525430 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:55,003-Speed 6308.11 samples/sec Loss 4.1352 LearningRate 0.0002 Epoch: 25 Global Step: 525440 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:31:58,247-Speed 6314.81 samples/sec Loss 4.1279 LearningRate 0.0002 Epoch: 25 Global Step: 525450 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:01,493-Speed 6311.72 samples/sec Loss 4.0064 LearningRate 0.0002 Epoch: 25 Global Step: 525460 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:04,742-Speed 6303.92 samples/sec Loss 4.0813 
LearningRate 0.0002 Epoch: 25 Global Step: 525470 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:07,992-Speed 6302.78 samples/sec Loss 4.1016 LearningRate 0.0002 Epoch: 25 Global Step: 525480 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:11,238-Speed 6311.61 samples/sec Loss 4.0608 LearningRate 0.0002 Epoch: 25 Global Step: 525490 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:14,481-Speed 6317.25 samples/sec Loss 4.0441 LearningRate 0.0002 Epoch: 25 Global Step: 525500 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:17,727-Speed 6309.55 samples/sec Loss 4.0673 LearningRate 0.0002 Epoch: 25 Global Step: 525510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:32:20,974-Speed 6309.15 samples/sec Loss 4.0678 LearningRate 0.0002 Epoch: 25 Global Step: 525520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:32:24,224-Speed 6303.55 samples/sec Loss 4.1339 LearningRate 0.0002 Epoch: 25 Global Step: 525530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:32:27,470-Speed 6309.82 samples/sec Loss 4.0518 LearningRate 0.0002 Epoch: 25 Global Step: 525540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:32:30,718-Speed 6306.88 samples/sec Loss 4.0581 LearningRate 0.0002 Epoch: 25 Global Step: 525550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:32:33,949-Speed 6340.74 samples/sec Loss 4.0216 LearningRate 0.0002 Epoch: 25 Global Step: 525560 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:37,188-Speed 6322.95 samples/sec Loss 4.1232 LearningRate 0.0002 Epoch: 25 Global Step: 525570 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:40,438-Speed 6303.11 samples/sec Loss 4.1152 LearningRate 0.0002 Epoch: 25 Global Step: 525580 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:43,689-Speed 6301.69 samples/sec Loss 4.1036 LearningRate 0.0002 Epoch: 25 Global 
Step: 525590 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:46,947-Speed 6286.27 samples/sec Loss 4.0902 LearningRate 0.0002 Epoch: 25 Global Step: 525600 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:50,206-Speed 6286.20 samples/sec Loss 4.1298 LearningRate 0.0002 Epoch: 25 Global Step: 525610 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:53,452-Speed 6311.42 samples/sec Loss 4.0460 LearningRate 0.0002 Epoch: 25 Global Step: 525620 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:56,698-Speed 6309.86 samples/sec Loss 4.0779 LearningRate 0.0002 Epoch: 25 Global Step: 525630 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:32:59,945-Speed 6310.13 samples/sec Loss 4.0419 LearningRate 0.0002 Epoch: 25 Global Step: 525640 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:03,201-Speed 6290.58 samples/sec Loss 4.1623 LearningRate 0.0002 Epoch: 25 Global Step: 525650 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:06,450-Speed 6306.47 samples/sec Loss 4.1709 LearningRate 0.0002 Epoch: 25 Global Step: 525660 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:09,693-Speed 6316.55 samples/sec Loss 4.0467 LearningRate 0.0002 Epoch: 25 Global Step: 525670 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:12,925-Speed 6338.24 samples/sec Loss 4.0981 LearningRate 0.0002 Epoch: 25 Global Step: 525680 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:16,175-Speed 6301.23 samples/sec Loss 4.1037 LearningRate 0.0002 Epoch: 25 Global Step: 525690 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:19,417-Speed 6319.46 samples/sec Loss 4.1093 LearningRate 0.0002 Epoch: 25 Global Step: 525700 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:22,668-Speed 6300.79 samples/sec Loss 4.1017 LearningRate 0.0002 Epoch: 25 Global Step: 525710 Fp16 Grad Scale: 8192 
Required: 28 hours Training: 2022-04-02 15:33:25,966-Speed 6210.91 samples/sec Loss 4.0802 LearningRate 0.0002 Epoch: 25 Global Step: 525720 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:29,210-Speed 6315.14 samples/sec Loss 4.0799 LearningRate 0.0002 Epoch: 25 Global Step: 525730 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:32,458-Speed 6307.55 samples/sec Loss 4.1165 LearningRate 0.0002 Epoch: 25 Global Step: 525740 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:35,704-Speed 6310.84 samples/sec Loss 4.0330 LearningRate 0.0002 Epoch: 25 Global Step: 525750 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:38,947-Speed 6316.59 samples/sec Loss 4.0312 LearningRate 0.0002 Epoch: 25 Global Step: 525760 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:42,192-Speed 6311.62 samples/sec Loss 4.0771 LearningRate 0.0002 Epoch: 25 Global Step: 525770 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:33:45,439-Speed 6309.60 samples/sec Loss 4.0958 LearningRate 0.0002 Epoch: 25 Global Step: 525780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:48,688-Speed 6304.66 samples/sec Loss 4.1601 LearningRate 0.0002 Epoch: 25 Global Step: 525790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:51,931-Speed 6316.39 samples/sec Loss 4.1001 LearningRate 0.0002 Epoch: 25 Global Step: 525800 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:55,173-Speed 6319.41 samples/sec Loss 4.1101 LearningRate 0.0002 Epoch: 25 Global Step: 525810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:33:58,419-Speed 6310.39 samples/sec Loss 4.1266 LearningRate 0.0002 Epoch: 25 Global Step: 525820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:01,672-Speed 6296.06 samples/sec Loss 4.1409 LearningRate 0.0002 Epoch: 25 Global Step: 525830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-04-02 15:34:04,919-Speed 6309.77 samples/sec Loss 4.1254 LearningRate 0.0002 Epoch: 25 Global Step: 525840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:08,168-Speed 6305.26 samples/sec Loss 4.0832 LearningRate 0.0002 Epoch: 25 Global Step: 525850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:11,414-Speed 6310.95 samples/sec Loss 4.0495 LearningRate 0.0002 Epoch: 25 Global Step: 525860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:14,658-Speed 6315.19 samples/sec Loss 4.0943 LearningRate 0.0002 Epoch: 25 Global Step: 525870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:17,905-Speed 6308.30 samples/sec Loss 4.0573 LearningRate 0.0002 Epoch: 25 Global Step: 525880 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:34:21,132-Speed 6347.87 samples/sec Loss 4.0913 LearningRate 0.0002 Epoch: 25 Global Step: 525890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:24,373-Speed 6320.92 samples/sec Loss 4.0655 LearningRate 0.0002 Epoch: 25 Global Step: 525900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:34:27,602-Speed 6342.35 samples/sec Loss 4.0365 LearningRate 0.0002 Epoch: 25 Global Step: 525910 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:30,845-Speed 6318.25 samples/sec Loss 4.0377 LearningRate 0.0002 Epoch: 25 Global Step: 525920 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:34,087-Speed 6317.80 samples/sec Loss 4.0560 LearningRate 0.0002 Epoch: 25 Global Step: 525930 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:37,327-Speed 6321.81 samples/sec Loss 4.0581 LearningRate 0.0002 Epoch: 25 Global Step: 525940 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:40,571-Speed 6314.62 samples/sec Loss 4.1337 LearningRate 0.0002 Epoch: 25 Global Step: 525950 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:43,813-Speed 
6318.59 samples/sec Loss 4.1112 LearningRate 0.0002 Epoch: 25 Global Step: 525960 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:47,057-Speed 6313.87 samples/sec Loss 4.0612 LearningRate 0.0002 Epoch: 25 Global Step: 525970 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:50,300-Speed 6316.84 samples/sec Loss 4.1061 LearningRate 0.0002 Epoch: 25 Global Step: 525980 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:53,544-Speed 6315.47 samples/sec Loss 4.0963 LearningRate 0.0002 Epoch: 25 Global Step: 525990 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:34:56,790-Speed 6309.38 samples/sec Loss 4.0770 LearningRate 0.0002 Epoch: 25 Global Step: 526000 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:35:00,034-Speed 6316.17 samples/sec Loss 4.0226 LearningRate 0.0002 Epoch: 25 Global Step: 526010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:03,287-Speed 6296.00 samples/sec Loss 4.1148 LearningRate 0.0002 Epoch: 25 Global Step: 526020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:06,533-Speed 6310.09 samples/sec Loss 4.0231 LearningRate 0.0002 Epoch: 25 Global Step: 526030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:09,779-Speed 6312.13 samples/sec Loss 4.0785 LearningRate 0.0002 Epoch: 25 Global Step: 526040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:13,027-Speed 6305.57 samples/sec Loss 4.0986 LearningRate 0.0002 Epoch: 25 Global Step: 526050 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:16,271-Speed 6314.82 samples/sec Loss 4.0370 LearningRate 0.0002 Epoch: 25 Global Step: 526060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:19,517-Speed 6312.72 samples/sec Loss 4.0154 LearningRate 0.0002 Epoch: 25 Global Step: 526070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:22,761-Speed 6314.07 samples/sec Loss 4.0789 
LearningRate 0.0002 Epoch: 25 Global Step: 526080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:26,008-Speed 6309.88 samples/sec Loss 4.0456 LearningRate 0.0002 Epoch: 25 Global Step: 526090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:29,252-Speed 6314.75 samples/sec Loss 4.0555 LearningRate 0.0002 Epoch: 25 Global Step: 526100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:32,479-Speed 6347.02 samples/sec Loss 4.0492 LearningRate 0.0002 Epoch: 25 Global Step: 526110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:35,727-Speed 6305.89 samples/sec Loss 4.0713 LearningRate 0.0002 Epoch: 25 Global Step: 526120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:38,976-Speed 6306.17 samples/sec Loss 4.0697 LearningRate 0.0002 Epoch: 25 Global Step: 526130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:42,220-Speed 6313.41 samples/sec Loss 4.0984 LearningRate 0.0002 Epoch: 25 Global Step: 526140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:45,465-Speed 6312.44 samples/sec Loss 4.1221 LearningRate 0.0002 Epoch: 25 Global Step: 526150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:48,714-Speed 6306.35 samples/sec Loss 4.0581 LearningRate 0.0002 Epoch: 25 Global Step: 526160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:51,957-Speed 6316.22 samples/sec Loss 4.0380 LearningRate 0.0002 Epoch: 25 Global Step: 526170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:55,201-Speed 6314.41 samples/sec Loss 4.1377 LearningRate 0.0002 Epoch: 25 Global Step: 526180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:35:58,449-Speed 6307.18 samples/sec Loss 4.1125 LearningRate 0.0002 Epoch: 25 Global Step: 526190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:01,694-Speed 6313.40 samples/sec Loss 4.0929 LearningRate 0.0002 Epoch: 25 
Global Step: 526200 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:04,944-Speed 6302.05 samples/sec Loss 4.0567 LearningRate 0.0002 Epoch: 25 Global Step: 526210 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:36:08,175-Speed 6338.75 samples/sec Loss 4.1056 LearningRate 0.0002 Epoch: 25 Global Step: 526220 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:11,421-Speed 6311.35 samples/sec Loss 4.0882 LearningRate 0.0002 Epoch: 25 Global Step: 526230 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:14,669-Speed 6306.84 samples/sec Loss 4.0935 LearningRate 0.0002 Epoch: 25 Global Step: 526240 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:17,914-Speed 6313.11 samples/sec Loss 4.1179 LearningRate 0.0002 Epoch: 25 Global Step: 526250 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:21,161-Speed 6309.05 samples/sec Loss 4.1653 LearningRate 0.0002 Epoch: 25 Global Step: 526260 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:24,404-Speed 6315.42 samples/sec Loss 4.0814 LearningRate 0.0002 Epoch: 25 Global Step: 526270 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:27,657-Speed 6299.09 samples/sec Loss 4.0765 LearningRate 0.0002 Epoch: 25 Global Step: 526280 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:30,905-Speed 6306.23 samples/sec Loss 4.0755 LearningRate 0.0002 Epoch: 25 Global Step: 526290 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:34,148-Speed 6316.21 samples/sec Loss 4.0941 LearningRate 0.0002 Epoch: 25 Global Step: 526300 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:37,391-Speed 6316.37 samples/sec Loss 4.0647 LearningRate 0.0002 Epoch: 25 Global Step: 526310 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:40,625-Speed 6335.49 samples/sec Loss 4.0799 LearningRate 0.0002 Epoch: 25 Global Step: 526320 Fp16 Grad 
Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:43,872-Speed 6308.91 samples/sec Loss 4.1019 LearningRate 0.0002 Epoch: 25 Global Step: 526330 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:47,116-Speed 6314.20 samples/sec Loss 4.1047 LearningRate 0.0002 Epoch: 25 Global Step: 526340 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:50,364-Speed 6307.14 samples/sec Loss 4.0121 LearningRate 0.0002 Epoch: 25 Global Step: 526350 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:53,611-Speed 6308.63 samples/sec Loss 4.0994 LearningRate 0.0002 Epoch: 25 Global Step: 526360 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:36:56,854-Speed 6316.23 samples/sec Loss 4.0743 LearningRate 0.0002 Epoch: 25 Global Step: 526370 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:00,102-Speed 6307.19 samples/sec Loss 4.0463 LearningRate 0.0002 Epoch: 25 Global Step: 526380 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:03,345-Speed 6315.41 samples/sec Loss 4.0951 LearningRate 0.0002 Epoch: 25 Global Step: 526390 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:06,591-Speed 6311.02 samples/sec Loss 4.1166 LearningRate 0.0002 Epoch: 25 Global Step: 526400 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:09,838-Speed 6308.88 samples/sec Loss 4.1381 LearningRate 0.0002 Epoch: 25 Global Step: 526410 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:13,072-Speed 6334.86 samples/sec Loss 4.0594 LearningRate 0.0002 Epoch: 25 Global Step: 526420 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:16,319-Speed 6308.96 samples/sec Loss 4.1109 LearningRate 0.0002 Epoch: 25 Global Step: 526430 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:19,563-Speed 6314.87 samples/sec Loss 4.1069 LearningRate 0.0002 Epoch: 25 Global Step: 526440 Fp16 Grad Scale: 16384 Required: 28 hours 
Training: 2022-04-02 15:37:22,808-Speed 6311.50 samples/sec Loss 4.0607 LearningRate 0.0002 Epoch: 25 Global Step: 526450 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:26,052-Speed 6314.45 samples/sec Loss 4.0621 LearningRate 0.0002 Epoch: 25 Global Step: 526460 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:29,300-Speed 6306.98 samples/sec Loss 4.0224 LearningRate 0.0002 Epoch: 25 Global Step: 526470 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:32,547-Speed 6308.32 samples/sec Loss 4.0280 LearningRate 0.0002 Epoch: 25 Global Step: 526480 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:35,795-Speed 6307.62 samples/sec Loss 4.0650 LearningRate 0.0002 Epoch: 25 Global Step: 526490 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:39,079-Speed 6237.49 samples/sec Loss 4.1071 LearningRate 0.0002 Epoch: 25 Global Step: 526500 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:42,412-Speed 6145.97 samples/sec Loss 4.0310 LearningRate 0.0002 Epoch: 25 Global Step: 526510 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:45,646-Speed 6334.96 samples/sec Loss 4.1012 LearningRate 0.0002 Epoch: 25 Global Step: 526520 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:48,890-Speed 6314.64 samples/sec Loss 4.1001 LearningRate 0.0002 Epoch: 25 Global Step: 526530 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:52,137-Speed 6309.30 samples/sec Loss 4.0945 LearningRate 0.0002 Epoch: 25 Global Step: 526540 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:55,386-Speed 6304.00 samples/sec Loss 4.1081 LearningRate 0.0002 Epoch: 25 Global Step: 526550 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:37:58,631-Speed 6313.86 samples/sec Loss 4.0314 LearningRate 0.0002 Epoch: 25 Global Step: 526560 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 
15:38:01,874-Speed 6315.06 samples/sec Loss 4.1071 LearningRate 0.0002 Epoch: 25 Global Step: 526570 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:05,124-Speed 6303.90 samples/sec Loss 4.0825 LearningRate 0.0002 Epoch: 25 Global Step: 526580 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:08,368-Speed 6314.90 samples/sec Loss 4.0959 LearningRate 0.0002 Epoch: 25 Global Step: 526590 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:11,602-Speed 6332.79 samples/sec Loss 4.1291 LearningRate 0.0002 Epoch: 25 Global Step: 526600 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:14,847-Speed 6312.45 samples/sec Loss 4.0973 LearningRate 0.0002 Epoch: 25 Global Step: 526610 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:18,096-Speed 6305.63 samples/sec Loss 4.0982 LearningRate 0.0002 Epoch: 25 Global Step: 526620 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:21,341-Speed 6312.54 samples/sec Loss 4.1128 LearningRate 0.0002 Epoch: 25 Global Step: 526630 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:24,589-Speed 6307.51 samples/sec Loss 4.0813 LearningRate 0.0002 Epoch: 25 Global Step: 526640 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:27,834-Speed 6312.04 samples/sec Loss 4.1267 LearningRate 0.0002 Epoch: 25 Global Step: 526650 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:31,081-Speed 6308.00 samples/sec Loss 4.1049 LearningRate 0.0002 Epoch: 25 Global Step: 526660 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:34,326-Speed 6314.11 samples/sec Loss 4.1173 LearningRate 0.0002 Epoch: 25 Global Step: 526670 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:37,570-Speed 6313.20 samples/sec Loss 4.0688 LearningRate 0.0002 Epoch: 25 Global Step: 526680 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:40,819-Speed 6306.52 samples/sec 
Loss 4.0448 LearningRate 0.0002 Epoch: 25 Global Step: 526690 Fp16 Grad Scale: 8192 Required: 28 hours Training: 2022-04-02 15:38:44,064-Speed 6312.41 samples/sec Loss 4.1215 LearningRate 0.0002 Epoch: 25 Global Step: 526700 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:47,314-Speed 6303.33 samples/sec Loss 4.1050 LearningRate 0.0002 Epoch: 25 Global Step: 526710 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:50,559-Speed 6312.85 samples/sec Loss 4.0714 LearningRate 0.0002 Epoch: 25 Global Step: 526720 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:53,802-Speed 6316.72 samples/sec Loss 4.1017 LearningRate 0.0002 Epoch: 25 Global Step: 526730 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:38:57,047-Speed 6312.04 samples/sec Loss 4.0932 LearningRate 0.0002 Epoch: 25 Global Step: 526740 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:00,375-Speed 6154.73 samples/sec Loss 4.0517 LearningRate 0.0002 Epoch: 25 Global Step: 526750 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:03,660-Speed 6236.39 samples/sec Loss 4.0345 LearningRate 0.0002 Epoch: 25 Global Step: 526760 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:06,906-Speed 6311.03 samples/sec Loss 4.0644 LearningRate 0.0002 Epoch: 25 Global Step: 526770 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:10,152-Speed 6310.02 samples/sec Loss 4.0717 LearningRate 0.0002 Epoch: 25 Global Step: 526780 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:13,398-Speed 6310.23 samples/sec Loss 4.0404 LearningRate 0.0002 Epoch: 25 Global Step: 526790 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:16,642-Speed 6314.99 samples/sec Loss 4.0774 LearningRate 0.0002 Epoch: 25 Global Step: 526800 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-02 15:39:19,873-Speed 6340.47 samples/sec Loss 3.9767 LearningRate 0.0002 
Epoch: 25 Global Step: 526810 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:23,121-Speed 6306.39 samples/sec Loss 4.0725 LearningRate 0.0002 Epoch: 25 Global Step: 526820 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:26,373-Speed 6299.40 samples/sec Loss 4.0415 LearningRate 0.0002 Epoch: 25 Global Step: 526830 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:29,622-Speed 6305.26 samples/sec Loss 4.0413 LearningRate 0.0002 Epoch: 25 Global Step: 526840 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:32,871-Speed 6305.40 samples/sec Loss 4.0212 LearningRate 0.0002 Epoch: 25 Global Step: 526850 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:36,116-Speed 6311.94 samples/sec Loss 4.0681 LearningRate 0.0002 Epoch: 25 Global Step: 526860 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:39,363-Speed 6308.47 samples/sec Loss 4.0957 LearningRate 0.0002 Epoch: 25 Global Step: 526870 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:42,606-Speed 6315.98 samples/sec Loss 4.0676 LearningRate 0.0002 Epoch: 25 Global Step: 526880 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:45,848-Speed 6318.65 samples/sec Loss 4.0772 LearningRate 0.0002 Epoch: 25 Global Step: 526890 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:49,089-Speed 6321.13 samples/sec Loss 4.0828 LearningRate 0.0002 Epoch: 25 Global Step: 526900 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:52,321-Speed 6339.50 samples/sec Loss 4.0269 LearningRate 0.0002 Epoch: 25 Global Step: 526910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:55,567-Speed 6309.90 samples/sec Loss 4.0834 LearningRate 0.0002 Epoch: 25 Global Step: 526920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:39:58,812-Speed 6313.37 samples/sec Loss 4.0573 LearningRate 0.0002 Epoch: 25 Global Step: 526930 
Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:02,059-Speed 6308.80 samples/sec Loss 4.0633 LearningRate 0.0002 Epoch: 25 Global Step: 526940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:05,304-Speed 6312.64 samples/sec Loss 4.1431 LearningRate 0.0002 Epoch: 25 Global Step: 526950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:08,548-Speed 6314.28 samples/sec Loss 4.0313 LearningRate 0.0002 Epoch: 25 Global Step: 526960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:11,790-Speed 6318.44 samples/sec Loss 4.0049 LearningRate 0.0002 Epoch: 25 Global Step: 526970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:15,037-Speed 6309.95 samples/sec Loss 4.0963 LearningRate 0.0002 Epoch: 25 Global Step: 526980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:18,281-Speed 6313.24 samples/sec Loss 4.0549 LearningRate 0.0002 Epoch: 25 Global Step: 526990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:21,526-Speed 6313.71 samples/sec Loss 4.0974 LearningRate 0.0002 Epoch: 25 Global Step: 527000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:24,777-Speed 6301.47 samples/sec Loss 4.0621 LearningRate 0.0002 Epoch: 25 Global Step: 527010 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:28,022-Speed 6310.84 samples/sec Loss 3.9893 LearningRate 0.0002 Epoch: 25 Global Step: 527020 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:31,267-Speed 6314.21 samples/sec Loss 4.0781 LearningRate 0.0002 Epoch: 25 Global Step: 527030 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:34,509-Speed 6316.97 samples/sec Loss 4.1197 LearningRate 0.0002 Epoch: 25 Global Step: 527040 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:37,765-Speed 6291.35 samples/sec Loss 4.0742 LearningRate 0.0002 Epoch: 25 Global Step: 527050 Fp16 Grad Scale: 16384 
Required: 28 hours Training: 2022-04-02 15:40:41,011-Speed 6311.64 samples/sec Loss 4.1368 LearningRate 0.0002 Epoch: 25 Global Step: 527060 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:44,253-Speed 6317.68 samples/sec Loss 4.0786 LearningRate 0.0002 Epoch: 25 Global Step: 527070 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:47,498-Speed 6312.94 samples/sec Loss 4.0696 LearningRate 0.0002 Epoch: 25 Global Step: 527080 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:50,740-Speed 6318.38 samples/sec Loss 4.0800 LearningRate 0.0002 Epoch: 25 Global Step: 527090 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:53,983-Speed 6316.46 samples/sec Loss 4.0010 LearningRate 0.0002 Epoch: 25 Global Step: 527100 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:40:57,218-Speed 6332.76 samples/sec Loss 4.1285 LearningRate 0.0002 Epoch: 25 Global Step: 527110 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:00,461-Speed 6317.77 samples/sec Loss 4.1547 LearningRate 0.0002 Epoch: 25 Global Step: 527120 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:03,706-Speed 6312.92 samples/sec Loss 4.0783 LearningRate 0.0002 Epoch: 25 Global Step: 527130 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:06,947-Speed 6320.00 samples/sec Loss 4.0913 LearningRate 0.0002 Epoch: 25 Global Step: 527140 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:10,193-Speed 6310.76 samples/sec Loss 4.0387 LearningRate 0.0002 Epoch: 25 Global Step: 527150 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:13,438-Speed 6312.76 samples/sec Loss 4.1192 LearningRate 0.0002 Epoch: 25 Global Step: 527160 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:16,681-Speed 6315.81 samples/sec Loss 4.0004 LearningRate 0.0002 Epoch: 25 Global Step: 527170 Fp16 Grad Scale: 16384 Required: 28 hours Training: 
2022-04-02 15:41:19,925-Speed 6315.64 samples/sec Loss 4.0510 LearningRate 0.0002 Epoch: 25 Global Step: 527180 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:23,171-Speed 6309.24 samples/sec Loss 4.0360 LearningRate 0.0002 Epoch: 25 Global Step: 527190 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-04-02 15:41:26,416-Speed 6314.48 samples/sec Loss 4.0397 LearningRate 0.0002 Epoch: 25 Global Step: 527200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:41:29,649-Speed 6334.52 samples/sec Loss 4.0988 LearningRate 0.0002 Epoch: 25 Global Step: 527210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:41:32,897-Speed 6308.79 samples/sec Loss 4.0184 LearningRate 0.0002 Epoch: 25 Global Step: 527220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:41:36,194-Speed 6211.38 samples/sec Loss 4.0461 LearningRate 0.0002 Epoch: 25 Global Step: 527230 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:39,541-Speed 6120.09 samples/sec Loss 4.0048 LearningRate 0.0002 Epoch: 25 Global Step: 527240 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:42,851-Speed 6188.94 samples/sec Loss 4.0977 LearningRate 0.0002 Epoch: 25 Global Step: 527250 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:46,096-Speed 6313.84 samples/sec Loss 4.1005 LearningRate 0.0002 Epoch: 25 Global Step: 527260 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:49,343-Speed 6307.83 samples/sec Loss 4.0760 LearningRate 0.0002 Epoch: 25 Global Step: 527270 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:52,585-Speed 6317.96 samples/sec Loss 4.0786 LearningRate 0.0002 Epoch: 25 Global Step: 527280 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:55,832-Speed 6310.61 samples/sec Loss 4.0036 LearningRate 0.0002 Epoch: 25 Global Step: 527290 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:41:59,087-Speed 6294.29 
samples/sec Loss 4.1649 LearningRate 0.0002 Epoch: 25 Global Step: 527300 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:42:02,331-Speed 6313.41 samples/sec Loss 4.0923 LearningRate 0.0002 Epoch: 25 Global Step: 527310 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:42:05,572-Speed 6320.80 samples/sec Loss 4.0615 LearningRate 0.0002 Epoch: 25 Global Step: 527320 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:42:08,815-Speed 6316.86 samples/sec Loss 4.0985 LearningRate 0.0002 Epoch: 25 Global Step: 527330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:12,059-Speed 6314.70 samples/sec Loss 3.9892 LearningRate 0.0002 Epoch: 25 Global Step: 527340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:15,304-Speed 6313.03 samples/sec Loss 4.0243 LearningRate 0.0002 Epoch: 25 Global Step: 527350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:18,548-Speed 6314.79 samples/sec Loss 4.0658 LearningRate 0.0002 Epoch: 25 Global Step: 527360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:21,794-Speed 6311.29 samples/sec Loss 4.0410 LearningRate 0.0002 Epoch: 25 Global Step: 527370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:25,043-Speed 6303.66 samples/sec Loss 4.0999 LearningRate 0.0002 Epoch: 25 Global Step: 527380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:28,289-Speed 6312.44 samples/sec Loss 4.0662 LearningRate 0.0002 Epoch: 25 Global Step: 527390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:31,534-Speed 6311.66 samples/sec Loss 4.1304 LearningRate 0.0002 Epoch: 25 Global Step: 527400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:34,776-Speed 6319.42 samples/sec Loss 4.0989 LearningRate 0.0002 Epoch: 25 Global Step: 527410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:38,024-Speed 6305.27 samples/sec Loss 4.0506 
LearningRate 0.0002 Epoch: 25 Global Step: 527420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:41,276-Speed 6300.11 samples/sec Loss 4.0921 LearningRate 0.0002 Epoch: 25 Global Step: 527430 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 15:42:44,504-Speed 6345.49 samples/sec Loss 4.1463 LearningRate 0.0002 Epoch: 25 Global Step: 527440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:47,752-Speed 6307.63 samples/sec Loss 4.0552 LearningRate 0.0002 Epoch: 25 Global Step: 527450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:50,994-Speed 6317.65 samples/sec Loss 4.0878 LearningRate 0.0002 Epoch: 25 Global Step: 527460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:54,240-Speed 6309.76 samples/sec Loss 4.0690 LearningRate 0.0002 Epoch: 25 Global Step: 527470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:42:57,485-Speed 6312.98 samples/sec Loss 4.1004 LearningRate 0.0002 Epoch: 25 Global Step: 527480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:00,731-Speed 6310.17 samples/sec Loss 4.0806 LearningRate 0.0002 Epoch: 25 Global Step: 527490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:03,979-Speed 6307.69 samples/sec Loss 3.9925 LearningRate 0.0002 Epoch: 25 Global Step: 527500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:07,227-Speed 6307.18 samples/sec Loss 4.0671 LearningRate 0.0002 Epoch: 25 Global Step: 527510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:10,471-Speed 6313.24 samples/sec Loss 4.0633 LearningRate 0.0002 Epoch: 25 Global Step: 527520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:13,715-Speed 6315.04 samples/sec Loss 4.0960 LearningRate 0.0002 Epoch: 25 Global Step: 527530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:16,947-Speed 6338.45 samples/sec Loss 4.0200 LearningRate 0.0002 Epoch: 25 
Global Step: 527540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:20,196-Speed 6305.61 samples/sec Loss 4.0659 LearningRate 0.0002 Epoch: 25 Global Step: 527550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:23,448-Speed 6298.95 samples/sec Loss 4.1188 LearningRate 0.0002 Epoch: 25 Global Step: 527560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:26,694-Speed 6311.83 samples/sec Loss 4.1028 LearningRate 0.0002 Epoch: 25 Global Step: 527570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:29,941-Speed 6310.31 samples/sec Loss 4.1284 LearningRate 0.0002 Epoch: 25 Global Step: 527580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:33,185-Speed 6314.77 samples/sec Loss 4.0689 LearningRate 0.0002 Epoch: 25 Global Step: 527590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:36,429-Speed 6315.52 samples/sec Loss 4.0329 LearningRate 0.0002 Epoch: 25 Global Step: 527600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:39,674-Speed 6311.09 samples/sec Loss 4.0444 LearningRate 0.0002 Epoch: 25 Global Step: 527610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:42,922-Speed 6306.88 samples/sec Loss 4.0554 LearningRate 0.0002 Epoch: 25 Global Step: 527620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:46,166-Speed 6313.94 samples/sec Loss 4.1119 LearningRate 0.0002 Epoch: 25 Global Step: 527630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:49,403-Speed 6330.56 samples/sec Loss 4.0733 LearningRate 0.0002 Epoch: 25 Global Step: 527640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:52,647-Speed 6314.30 samples/sec Loss 4.0872 LearningRate 0.0002 Epoch: 25 Global Step: 527650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:55,899-Speed 6299.75 samples/sec Loss 4.0660 LearningRate 0.0002 Epoch: 25 Global Step: 527660 Fp16 Grad 
Scale: 16384 Required: 27 hours Training: 2022-04-02 15:43:59,145-Speed 6309.67 samples/sec Loss 4.0353 LearningRate 0.0002 Epoch: 25 Global Step: 527670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:02,391-Speed 6310.86 samples/sec Loss 3.9905 LearningRate 0.0002 Epoch: 25 Global Step: 527680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:05,637-Speed 6311.62 samples/sec Loss 4.0639 LearningRate 0.0002 Epoch: 25 Global Step: 527690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:08,882-Speed 6311.55 samples/sec Loss 4.0628 LearningRate 0.0002 Epoch: 25 Global Step: 527700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:12,127-Speed 6312.98 samples/sec Loss 4.0592 LearningRate 0.0002 Epoch: 25 Global Step: 527710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:15,377-Speed 6302.95 samples/sec Loss 4.0921 LearningRate 0.0002 Epoch: 25 Global Step: 527720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:18,644-Speed 6270.72 samples/sec Loss 4.0245 LearningRate 0.0002 Epoch: 25 Global Step: 527730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:21,876-Speed 6336.19 samples/sec Loss 4.0747 LearningRate 0.0002 Epoch: 25 Global Step: 527740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:25,128-Speed 6300.31 samples/sec Loss 4.0732 LearningRate 0.0002 Epoch: 25 Global Step: 527750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:28,381-Speed 6298.41 samples/sec Loss 4.1420 LearningRate 0.0002 Epoch: 25 Global Step: 527760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:31,628-Speed 6308.92 samples/sec Loss 4.0627 LearningRate 0.0002 Epoch: 25 Global Step: 527770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:34,874-Speed 6310.24 samples/sec Loss 4.0237 LearningRate 0.0002 Epoch: 25 Global Step: 527780 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 15:44:38,118-Speed 6313.46 samples/sec Loss 4.0411 LearningRate 0.0002 Epoch: 25 Global Step: 527790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:41,367-Speed 6307.15 samples/sec Loss 4.0464 LearningRate 0.0002 Epoch: 25 Global Step: 527800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:44,611-Speed 6313.64 samples/sec Loss 4.0463 LearningRate 0.0002 Epoch: 25 Global Step: 527810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:47,859-Speed 6306.13 samples/sec Loss 4.0950 LearningRate 0.0002 Epoch: 25 Global Step: 527820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:51,180-Speed 6168.76 samples/sec Loss 4.0796 LearningRate 0.0002 Epoch: 25 Global Step: 527830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:44:54,478-Speed 6214.03 samples/sec Loss 4.0273 LearningRate 0.0002 Epoch: 25 Global Step: 527840 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:44:57,723-Speed 6311.35 samples/sec Loss 3.9956 LearningRate 0.0002 Epoch: 25 Global Step: 527850 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:00,966-Speed 6316.47 samples/sec Loss 4.0817 LearningRate 0.0002 Epoch: 25 Global Step: 527860 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:04,207-Speed 6319.99 samples/sec Loss 4.0665 LearningRate 0.0002 Epoch: 25 Global Step: 527870 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:07,454-Speed 6309.90 samples/sec Loss 4.1040 LearningRate 0.0002 Epoch: 25 Global Step: 527880 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:10,699-Speed 6311.91 samples/sec Loss 3.9873 LearningRate 0.0002 Epoch: 25 Global Step: 527890 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:13,943-Speed 6314.89 samples/sec Loss 4.0539 LearningRate 0.0002 Epoch: 25 Global Step: 527900 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 
15:45:17,189-Speed 6311.82 samples/sec Loss 4.0833 LearningRate 0.0002 Epoch: 25 Global Step: 527910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:20,440-Speed 6299.71 samples/sec Loss 4.0841 LearningRate 0.0002 Epoch: 25 Global Step: 527920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:23,693-Speed 6297.95 samples/sec Loss 4.0915 LearningRate 0.0002 Epoch: 25 Global Step: 527930 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:45:26,936-Speed 6315.88 samples/sec Loss 4.0887 LearningRate 0.0002 Epoch: 25 Global Step: 527940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:30,187-Speed 6300.52 samples/sec Loss 4.0878 LearningRate 0.0002 Epoch: 25 Global Step: 527950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:33,429-Speed 6319.47 samples/sec Loss 4.0687 LearningRate 0.0002 Epoch: 25 Global Step: 527960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:36,674-Speed 6313.15 samples/sec Loss 4.0516 LearningRate 0.0002 Epoch: 25 Global Step: 527970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:39,918-Speed 6314.68 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 25 Global Step: 527980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:43,162-Speed 6314.89 samples/sec Loss 4.1259 LearningRate 0.0002 Epoch: 25 Global Step: 527990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:46,407-Speed 6312.54 samples/sec Loss 4.0699 LearningRate 0.0002 Epoch: 25 Global Step: 528000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:49,650-Speed 6317.41 samples/sec Loss 4.0140 LearningRate 0.0002 Epoch: 25 Global Step: 528010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:52,895-Speed 6311.93 samples/sec Loss 4.0531 LearningRate 0.0002 Epoch: 25 Global Step: 528020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:56,143-Speed 6307.47 
samples/sec Loss 4.0505 LearningRate 0.0002 Epoch: 25 Global Step: 528030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:45:59,379-Speed 6328.54 samples/sec Loss 4.0513 LearningRate 0.0002 Epoch: 25 Global Step: 528040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:02,625-Speed 6311.89 samples/sec Loss 4.1150 LearningRate 0.0002 Epoch: 25 Global Step: 528050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:05,866-Speed 6320.67 samples/sec Loss 4.1325 LearningRate 0.0002 Epoch: 25 Global Step: 528060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:09,109-Speed 6316.11 samples/sec Loss 4.0960 LearningRate 0.0002 Epoch: 25 Global Step: 528070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:12,351-Speed 6319.19 samples/sec Loss 4.0381 LearningRate 0.0002 Epoch: 25 Global Step: 528080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:15,598-Speed 6309.29 samples/sec Loss 4.0264 LearningRate 0.0002 Epoch: 25 Global Step: 528090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:18,846-Speed 6306.36 samples/sec Loss 4.0933 LearningRate 0.0002 Epoch: 25 Global Step: 528100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:22,093-Speed 6308.73 samples/sec Loss 4.1158 LearningRate 0.0002 Epoch: 25 Global Step: 528110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:25,338-Speed 6312.97 samples/sec Loss 3.9751 LearningRate 0.0002 Epoch: 25 Global Step: 528120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:28,579-Speed 6319.34 samples/sec Loss 4.0886 LearningRate 0.0002 Epoch: 25 Global Step: 528130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:31,810-Speed 6341.04 samples/sec Loss 3.9905 LearningRate 0.0002 Epoch: 25 Global Step: 528140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:35,064-Speed 6295.50 samples/sec Loss 4.0380 
LearningRate 0.0002 Epoch: 25 Global Step: 528150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:38,310-Speed 6311.03 samples/sec Loss 4.0749 LearningRate 0.0002 Epoch: 25 Global Step: 528160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:41,553-Speed 6316.61 samples/sec Loss 4.0767 LearningRate 0.0002 Epoch: 25 Global Step: 528170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:44,798-Speed 6312.16 samples/sec Loss 4.1024 LearningRate 0.0002 Epoch: 25 Global Step: 528180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:48,044-Speed 6310.58 samples/sec Loss 4.0523 LearningRate 0.0002 Epoch: 25 Global Step: 528190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:51,287-Speed 6316.16 samples/sec Loss 4.0101 LearningRate 0.0002 Epoch: 25 Global Step: 528200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:54,532-Speed 6314.35 samples/sec Loss 4.0888 LearningRate 0.0002 Epoch: 25 Global Step: 528210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:46:57,782-Speed 6302.56 samples/sec Loss 4.1103 LearningRate 0.0002 Epoch: 25 Global Step: 528220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:01,027-Speed 6312.92 samples/sec Loss 4.0865 LearningRate 0.0002 Epoch: 25 Global Step: 528230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:04,274-Speed 6307.66 samples/sec Loss 4.0541 LearningRate 0.0002 Epoch: 25 Global Step: 528240 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 15:47:07,502-Speed 6347.09 samples/sec Loss 4.0764 LearningRate 0.0002 Epoch: 25 Global Step: 528250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:10,746-Speed 6312.90 samples/sec Loss 4.0500 LearningRate 0.0002 Epoch: 25 Global Step: 528260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:13,992-Speed 6312.61 samples/sec Loss 4.0296 LearningRate 0.0002 Epoch: 25 
Global Step: 528270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:17,236-Speed 6312.85 samples/sec Loss 4.0429 LearningRate 0.0002 Epoch: 25 Global Step: 528280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:20,487-Speed 6301.01 samples/sec Loss 4.1030 LearningRate 0.0002 Epoch: 25 Global Step: 528290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:23,732-Speed 6313.06 samples/sec Loss 4.0278 LearningRate 0.0002 Epoch: 25 Global Step: 528300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:26,975-Speed 6317.34 samples/sec Loss 4.0025 LearningRate 0.0002 Epoch: 25 Global Step: 528310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:30,219-Speed 6314.89 samples/sec Loss 4.1021 LearningRate 0.0002 Epoch: 25 Global Step: 528320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:33,466-Speed 6307.68 samples/sec Loss 4.1059 LearningRate 0.0002 Epoch: 25 Global Step: 528330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:36,711-Speed 6314.18 samples/sec Loss 4.0870 LearningRate 0.0002 Epoch: 25 Global Step: 528340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:39,941-Speed 6341.51 samples/sec Loss 4.0210 LearningRate 0.0002 Epoch: 25 Global Step: 528350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:43,184-Speed 6316.73 samples/sec Loss 4.0449 LearningRate 0.0002 Epoch: 25 Global Step: 528360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:46,426-Speed 6317.67 samples/sec Loss 4.0334 LearningRate 0.0002 Epoch: 25 Global Step: 528370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:49,669-Speed 6315.54 samples/sec Loss 4.0478 LearningRate 0.0002 Epoch: 25 Global Step: 528380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:52,919-Speed 6303.53 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 25 Global Step: 528390 Fp16 Grad 
Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:56,163-Speed 6316.68 samples/sec Loss 4.0010 LearningRate 0.0002 Epoch: 25 Global Step: 528400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:47:59,408-Speed 6311.87 samples/sec Loss 4.0817 LearningRate 0.0002 Epoch: 25 Global Step: 528410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:02,650-Speed 6319.13 samples/sec Loss 4.1113 LearningRate 0.0002 Epoch: 25 Global Step: 528420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:05,894-Speed 6314.31 samples/sec Loss 4.0570 LearningRate 0.0002 Epoch: 25 Global Step: 528430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:09,138-Speed 6313.98 samples/sec Loss 4.1163 LearningRate 0.0002 Epoch: 25 Global Step: 528440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:12,368-Speed 6344.75 samples/sec Loss 4.0642 LearningRate 0.0002 Epoch: 25 Global Step: 528450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:15,611-Speed 6317.06 samples/sec Loss 4.0645 LearningRate 0.0002 Epoch: 25 Global Step: 528460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:18,857-Speed 6311.70 samples/sec Loss 4.0510 LearningRate 0.0002 Epoch: 25 Global Step: 528470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:22,105-Speed 6306.82 samples/sec Loss 4.0736 LearningRate 0.0002 Epoch: 25 Global Step: 528480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:25,346-Speed 6318.51 samples/sec Loss 4.0006 LearningRate 0.0002 Epoch: 25 Global Step: 528490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:28,594-Speed 6307.00 samples/sec Loss 4.0756 LearningRate 0.0002 Epoch: 25 Global Step: 528500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:31,839-Speed 6314.18 samples/sec Loss 4.0670 LearningRate 0.0002 Epoch: 25 Global Step: 528510 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 15:48:35,083-Speed 6313.71 samples/sec Loss 4.0366 LearningRate 0.0002 Epoch: 25 Global Step: 528520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:38,327-Speed 6315.33 samples/sec Loss 4.0884 LearningRate 0.0002 Epoch: 25 Global Step: 528530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:41,571-Speed 6313.82 samples/sec Loss 4.0742 LearningRate 0.0002 Epoch: 25 Global Step: 528540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:44,803-Speed 6338.83 samples/sec Loss 4.0099 LearningRate 0.0002 Epoch: 25 Global Step: 528550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:48,042-Speed 6324.13 samples/sec Loss 4.0325 LearningRate 0.0002 Epoch: 25 Global Step: 528560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:51,292-Speed 6302.91 samples/sec Loss 4.0767 LearningRate 0.0002 Epoch: 25 Global Step: 528570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:54,536-Speed 6316.13 samples/sec Loss 4.1314 LearningRate 0.0002 Epoch: 25 Global Step: 528580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:48:57,781-Speed 6310.98 samples/sec Loss 4.0183 LearningRate 0.0002 Epoch: 25 Global Step: 528590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:01,028-Speed 6309.52 samples/sec Loss 4.0548 LearningRate 0.0002 Epoch: 25 Global Step: 528600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:04,272-Speed 6314.12 samples/sec Loss 4.1081 LearningRate 0.0002 Epoch: 25 Global Step: 528610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:07,522-Speed 6304.38 samples/sec Loss 4.0620 LearningRate 0.0002 Epoch: 25 Global Step: 528620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:10,766-Speed 6314.07 samples/sec Loss 4.0362 LearningRate 0.0002 Epoch: 25 Global Step: 528630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 
15:49:14,007-Speed 6320.22 samples/sec Loss 4.0495 LearningRate 0.0002 Epoch: 25 Global Step: 528640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:17,239-Speed 6338.69 samples/sec Loss 4.0821 LearningRate 0.0002 Epoch: 25 Global Step: 528650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:20,479-Speed 6321.70 samples/sec Loss 4.0433 LearningRate 0.0002 Epoch: 25 Global Step: 528660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:23,714-Speed 6332.30 samples/sec Loss 4.0578 LearningRate 0.0002 Epoch: 25 Global Step: 528670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:26,960-Speed 6310.31 samples/sec Loss 4.1833 LearningRate 0.0002 Epoch: 25 Global Step: 528680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:30,205-Speed 6313.29 samples/sec Loss 4.0522 LearningRate 0.0002 Epoch: 25 Global Step: 528690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:33,452-Speed 6308.67 samples/sec Loss 4.0702 LearningRate 0.0002 Epoch: 25 Global Step: 528700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:36,698-Speed 6311.60 samples/sec Loss 4.0006 LearningRate 0.0002 Epoch: 25 Global Step: 528710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:39,943-Speed 6311.77 samples/sec Loss 4.0899 LearningRate 0.0002 Epoch: 25 Global Step: 528720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:43,189-Speed 6311.32 samples/sec Loss 4.0844 LearningRate 0.0002 Epoch: 25 Global Step: 528730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:46,434-Speed 6312.96 samples/sec Loss 4.0694 LearningRate 0.0002 Epoch: 25 Global Step: 528740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:49,678-Speed 6313.96 samples/sec Loss 4.0806 LearningRate 0.0002 Epoch: 25 Global Step: 528750 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:52,925-Speed 6309.48 samples/sec 
Loss 4.0040 LearningRate 0.0002 Epoch: 25 Global Step: 528760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:49:56,176-Speed 6299.48 samples/sec Loss 4.0421 LearningRate 0.0002 Epoch: 25 Global Step: 528770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:49:59,424-Speed 6306.85 samples/sec Loss 4.0530 LearningRate 0.0002 Epoch: 25 Global Step: 528780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:02,673-Speed 6304.34 samples/sec Loss 4.0138 LearningRate 0.0002 Epoch: 25 Global Step: 528790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:05,918-Speed 6313.68 samples/sec Loss 4.0884 LearningRate 0.0002 Epoch: 25 Global Step: 528800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:09,166-Speed 6307.49 samples/sec Loss 4.0793 LearningRate 0.0002 Epoch: 25 Global Step: 528810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:12,416-Speed 6302.06 samples/sec Loss 4.0875 LearningRate 0.0002 Epoch: 25 Global Step: 528820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:15,664-Speed 6306.90 samples/sec Loss 4.0874 LearningRate 0.0002 Epoch: 25 Global Step: 528830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:18,909-Speed 6313.12 samples/sec Loss 4.0693 LearningRate 0.0002 Epoch: 25 Global Step: 528840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:22,155-Speed 6311.38 samples/sec Loss 4.1125 LearningRate 0.0002 Epoch: 25 Global Step: 528850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:25,401-Speed 6311.81 samples/sec Loss 4.0603 LearningRate 0.0002 Epoch: 25 Global Step: 528860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:28,667-Speed 6271.87 samples/sec Loss 4.0595 LearningRate 0.0002 Epoch: 25 Global Step: 528870 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 15:50:31,900-Speed 6335.67 samples/sec Loss 4.0517 LearningRate 0.0002 
Epoch: 25 Global Step: 528880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:35,147-Speed 6309.00 samples/sec Loss 4.0700 LearningRate 0.0002 Epoch: 25 Global Step: 528890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:38,425-Speed 6247.63 samples/sec Loss 4.0224 LearningRate 0.0002 Epoch: 25 Global Step: 528900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:41,672-Speed 6310.13 samples/sec Loss 3.9756 LearningRate 0.0002 Epoch: 25 Global Step: 528910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:44,918-Speed 6309.34 samples/sec Loss 4.0130 LearningRate 0.0002 Epoch: 25 Global Step: 528920 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:48,162-Speed 6315.59 samples/sec Loss 4.0306 LearningRate 0.0002 Epoch: 25 Global Step: 528930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:51,439-Speed 6250.39 samples/sec Loss 4.0058 LearningRate 0.0002 Epoch: 25 Global Step: 528940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:54,680-Speed 6320.51 samples/sec Loss 4.0519 LearningRate 0.0002 Epoch: 25 Global Step: 528950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:50:57,925-Speed 6312.94 samples/sec Loss 4.1112 LearningRate 0.0002 Epoch: 25 Global Step: 528960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:01,168-Speed 6316.96 samples/sec Loss 4.0568 LearningRate 0.0002 Epoch: 25 Global Step: 528970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:04,398-Speed 6341.42 samples/sec Loss 4.0906 LearningRate 0.0002 Epoch: 25 Global Step: 528980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:07,646-Speed 6306.99 samples/sec Loss 4.0602 LearningRate 0.0002 Epoch: 25 Global Step: 528990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:10,900-Speed 6294.89 samples/sec Loss 4.0842 LearningRate 0.0002 Epoch: 25 Global Step: 529000 
Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:14,144-Speed 6314.39 samples/sec Loss 4.0221 LearningRate 0.0002 Epoch: 25 Global Step: 529010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:17,388-Speed 6315.28 samples/sec Loss 4.0435 LearningRate 0.0002 Epoch: 25 Global Step: 529020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:20,632-Speed 6314.95 samples/sec Loss 4.0668 LearningRate 0.0002 Epoch: 25 Global Step: 529030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:23,879-Speed 6308.63 samples/sec Loss 4.1155 LearningRate 0.0002 Epoch: 25 Global Step: 529040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:27,120-Speed 6320.05 samples/sec Loss 4.0039 LearningRate 0.0002 Epoch: 25 Global Step: 529050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:30,365-Speed 6312.44 samples/sec Loss 4.1005 LearningRate 0.0002 Epoch: 25 Global Step: 529060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:33,609-Speed 6315.72 samples/sec Loss 4.0455 LearningRate 0.0002 Epoch: 25 Global Step: 529070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:36,838-Speed 6344.51 samples/sec Loss 4.0516 LearningRate 0.0002 Epoch: 25 Global Step: 529080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:40,078-Speed 6321.45 samples/sec Loss 4.0571 LearningRate 0.0002 Epoch: 25 Global Step: 529090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:43,323-Speed 6313.03 samples/sec Loss 3.9212 LearningRate 0.0002 Epoch: 25 Global Step: 529100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:46,576-Speed 6296.99 samples/sec Loss 4.0127 LearningRate 0.0002 Epoch: 25 Global Step: 529110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:49,823-Speed 6308.48 samples/sec Loss 4.0568 LearningRate 0.0002 Epoch: 25 Global Step: 529120 Fp16 Grad Scale: 16384 
Required: 27 hours Training: 2022-04-02 15:51:53,070-Speed 6309.43 samples/sec Loss 4.0058 LearningRate 0.0002 Epoch: 25 Global Step: 529130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:56,313-Speed 6316.55 samples/sec Loss 4.0604 LearningRate 0.0002 Epoch: 25 Global Step: 529140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:51:59,559-Speed 6310.17 samples/sec Loss 4.0377 LearningRate 0.0002 Epoch: 25 Global Step: 529150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:02,811-Speed 6298.58 samples/sec Loss 4.0451 LearningRate 0.0002 Epoch: 25 Global Step: 529160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:06,069-Speed 6288.31 samples/sec Loss 4.0193 LearningRate 0.0002 Epoch: 25 Global Step: 529170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:09,303-Speed 6333.80 samples/sec Loss 4.0911 LearningRate 0.0002 Epoch: 25 Global Step: 529180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:12,546-Speed 6315.87 samples/sec Loss 3.9995 LearningRate 0.0002 Epoch: 25 Global Step: 529190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:15,791-Speed 6312.86 samples/sec Loss 4.0259 LearningRate 0.0002 Epoch: 25 Global Step: 529200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:19,038-Speed 6309.38 samples/sec Loss 4.0300 LearningRate 0.0002 Epoch: 25 Global Step: 529210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:22,283-Speed 6312.65 samples/sec Loss 4.0330 LearningRate 0.0002 Epoch: 25 Global Step: 529220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:25,528-Speed 6313.00 samples/sec Loss 4.1351 LearningRate 0.0002 Epoch: 25 Global Step: 529230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:28,770-Speed 6317.90 samples/sec Loss 4.0626 LearningRate 0.0002 Epoch: 25 Global Step: 529240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 15:52:32,014-Speed 6314.47 samples/sec Loss 4.0893 LearningRate 0.0002 Epoch: 25 Global Step: 529250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:35,259-Speed 6313.07 samples/sec Loss 4.0840 LearningRate 0.0002 Epoch: 25 Global Step: 529260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:38,507-Speed 6306.38 samples/sec Loss 4.0554 LearningRate 0.0002 Epoch: 25 Global Step: 529270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:41,735-Speed 6345.67 samples/sec Loss 4.0243 LearningRate 0.0002 Epoch: 25 Global Step: 529280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:44,974-Speed 6324.46 samples/sec Loss 4.0810 LearningRate 0.0002 Epoch: 25 Global Step: 529290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:48,219-Speed 6313.81 samples/sec Loss 4.0748 LearningRate 0.0002 Epoch: 25 Global Step: 529300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:51,464-Speed 6313.28 samples/sec Loss 4.0180 LearningRate 0.0002 Epoch: 25 Global Step: 529310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:54,718-Speed 6293.86 samples/sec Loss 4.0981 LearningRate 0.0002 Epoch: 25 Global Step: 529320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:52:57,971-Speed 6297.43 samples/sec Loss 4.0323 LearningRate 0.0002 Epoch: 25 Global Step: 529330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:01,216-Speed 6312.34 samples/sec Loss 4.1118 LearningRate 0.0002 Epoch: 25 Global Step: 529340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:04,458-Speed 6318.93 samples/sec Loss 4.0858 LearningRate 0.0002 Epoch: 25 Global Step: 529350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:07,703-Speed 6313.81 samples/sec Loss 4.0840 LearningRate 0.0002 Epoch: 25 Global Step: 529360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:10,944-Speed 
6319.07 samples/sec Loss 4.0533 LearningRate 0.0002 Epoch: 25 Global Step: 529370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:14,172-Speed 6346.53 samples/sec Loss 4.0236 LearningRate 0.0002 Epoch: 25 Global Step: 529380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:17,415-Speed 6316.09 samples/sec Loss 4.0891 LearningRate 0.0002 Epoch: 25 Global Step: 529390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:20,659-Speed 6315.53 samples/sec Loss 4.0532 LearningRate 0.0002 Epoch: 25 Global Step: 529400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:23,903-Speed 6313.41 samples/sec Loss 4.0896 LearningRate 0.0002 Epoch: 25 Global Step: 529410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:27,147-Speed 6315.48 samples/sec Loss 4.0896 LearningRate 0.0002 Epoch: 25 Global Step: 529420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:30,391-Speed 6314.82 samples/sec Loss 3.9639 LearningRate 0.0002 Epoch: 25 Global Step: 529430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:33,650-Speed 6285.42 samples/sec Loss 4.1315 LearningRate 0.0002 Epoch: 25 Global Step: 529440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:36,958-Speed 6195.65 samples/sec Loss 4.0904 LearningRate 0.0002 Epoch: 25 Global Step: 529450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:40,235-Speed 6250.08 samples/sec Loss 4.0097 LearningRate 0.0002 Epoch: 25 Global Step: 529460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:43,484-Speed 6305.50 samples/sec Loss 4.0561 LearningRate 0.0002 Epoch: 25 Global Step: 529470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:46,714-Speed 6343.52 samples/sec Loss 4.0646 LearningRate 0.0002 Epoch: 25 Global Step: 529480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:49,971-Speed 6289.02 samples/sec Loss 4.1332 
LearningRate 0.0002 Epoch: 25 Global Step: 529490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:53,333-Speed 6092.91 samples/sec Loss 4.0742 LearningRate 0.0002 Epoch: 25 Global Step: 529500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:56,603-Speed 6265.06 samples/sec Loss 4.1333 LearningRate 0.0002 Epoch: 25 Global Step: 529510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:53:59,845-Speed 6318.90 samples/sec Loss 4.0576 LearningRate 0.0002 Epoch: 25 Global Step: 529520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:03,097-Speed 6297.50 samples/sec Loss 4.0485 LearningRate 0.0002 Epoch: 25 Global Step: 529530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:06,345-Speed 6308.48 samples/sec Loss 4.0794 LearningRate 0.0002 Epoch: 25 Global Step: 529540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:09,591-Speed 6310.67 samples/sec Loss 4.0334 LearningRate 0.0002 Epoch: 25 Global Step: 529550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:12,837-Speed 6308.75 samples/sec Loss 4.0258 LearningRate 0.0002 Epoch: 25 Global Step: 529560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:16,083-Speed 6311.79 samples/sec Loss 4.0540 LearningRate 0.0002 Epoch: 25 Global Step: 529570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:19,325-Speed 6318.03 samples/sec Loss 4.0566 LearningRate 0.0002 Epoch: 25 Global Step: 529580 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 15:54:22,561-Speed 6330.64 samples/sec Loss 4.0302 LearningRate 0.0002 Epoch: 25 Global Step: 529590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:25,814-Speed 6297.96 samples/sec Loss 4.0607 LearningRate 0.0002 Epoch: 25 Global Step: 529600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:29,078-Speed 6275.76 samples/sec Loss 4.0041 LearningRate 0.0002 Epoch: 25 
Global Step: 529610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:32,322-Speed 6313.01 samples/sec Loss 4.1263 LearningRate 0.0002 Epoch: 25 Global Step: 529620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:35,566-Speed 6315.98 samples/sec Loss 4.0680 LearningRate 0.0002 Epoch: 25 Global Step: 529630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:38,809-Speed 6315.32 samples/sec Loss 4.0682 LearningRate 0.0002 Epoch: 25 Global Step: 529640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:42,055-Speed 6310.75 samples/sec Loss 4.0363 LearningRate 0.0002 Epoch: 25 Global Step: 529650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:45,302-Speed 6308.63 samples/sec Loss 4.0645 LearningRate 0.0002 Epoch: 25 Global Step: 529660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:48,552-Speed 6303.80 samples/sec Loss 4.0667 LearningRate 0.0002 Epoch: 25 Global Step: 529670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:54:51,784-Speed 6337.81 samples/sec Loss 4.1348 LearningRate 0.0002 Epoch: 25 Global Step: 529680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:54:55,030-Speed 6309.97 samples/sec Loss 4.0092 LearningRate 0.0002 Epoch: 25 Global Step: 529690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:54:58,277-Speed 6310.09 samples/sec Loss 4.0402 LearningRate 0.0002 Epoch: 25 Global Step: 529700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:01,524-Speed 6308.55 samples/sec Loss 4.0976 LearningRate 0.0002 Epoch: 25 Global Step: 529710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:04,780-Speed 6292.40 samples/sec Loss 4.0267 LearningRate 0.0002 Epoch: 25 Global Step: 529720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:08,026-Speed 6310.91 samples/sec Loss 4.0118 LearningRate 0.0002 Epoch: 25 Global Step: 529730 Fp16 Grad Scale: 
8192 Required: 27 hours Training: 2022-04-02 15:55:11,306-Speed 6245.19 samples/sec Loss 4.0885 LearningRate 0.0002 Epoch: 25 Global Step: 529740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:14,558-Speed 6297.29 samples/sec Loss 4.0245 LearningRate 0.0002 Epoch: 25 Global Step: 529750 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:17,812-Speed 6295.52 samples/sec Loss 4.0278 LearningRate 0.0002 Epoch: 25 Global Step: 529760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:21,058-Speed 6311.21 samples/sec Loss 4.0970 LearningRate 0.0002 Epoch: 25 Global Step: 529770 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:55:24,305-Speed 6309.30 samples/sec Loss 4.0376 LearningRate 0.0002 Epoch: 25 Global Step: 529780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:27,554-Speed 6305.24 samples/sec Loss 3.9911 LearningRate 0.0002 Epoch: 25 Global Step: 529790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:30,797-Speed 6316.74 samples/sec Loss 4.0446 LearningRate 0.0002 Epoch: 25 Global Step: 529800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:34,040-Speed 6314.71 samples/sec Loss 4.1539 LearningRate 0.0002 Epoch: 25 Global Step: 529810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:37,286-Speed 6312.43 samples/sec Loss 4.0750 LearningRate 0.0002 Epoch: 25 Global Step: 529820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:40,577-Speed 6222.75 samples/sec Loss 4.0925 LearningRate 0.0002 Epoch: 25 Global Step: 529830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:43,820-Speed 6317.03 samples/sec Loss 4.1141 LearningRate 0.0002 Epoch: 25 Global Step: 529840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:47,066-Speed 6311.88 samples/sec Loss 4.0509 LearningRate 0.0002 Epoch: 25 Global Step: 529850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 15:55:50,312-Speed 6310.08 samples/sec Loss 4.0831 LearningRate 0.0002 Epoch: 25 Global Step: 529860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:53,559-Speed 6308.73 samples/sec Loss 4.0816 LearningRate 0.0002 Epoch: 25 Global Step: 529870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:55:56,790-Speed 6339.01 samples/sec Loss 4.0461 LearningRate 0.0002 Epoch: 25 Global Step: 529880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:00,037-Speed 6308.73 samples/sec Loss 4.0407 LearningRate 0.0002 Epoch: 25 Global Step: 529890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:03,283-Speed 6311.11 samples/sec Loss 4.0681 LearningRate 0.0002 Epoch: 25 Global Step: 529900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:06,526-Speed 6317.09 samples/sec Loss 4.0670 LearningRate 0.0002 Epoch: 25 Global Step: 529910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:09,772-Speed 6312.09 samples/sec Loss 4.0088 LearningRate 0.0002 Epoch: 25 Global Step: 529920 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:13,014-Speed 6316.93 samples/sec Loss 3.9787 LearningRate 0.0002 Epoch: 25 Global Step: 529930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:16,255-Speed 6322.15 samples/sec Loss 4.0678 LearningRate 0.0002 Epoch: 25 Global Step: 529940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:19,501-Speed 6310.56 samples/sec Loss 4.0355 LearningRate 0.0002 Epoch: 25 Global Step: 529950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:22,745-Speed 6313.74 samples/sec Loss 4.0775 LearningRate 0.0002 Epoch: 25 Global Step: 529960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:25,992-Speed 6309.34 samples/sec Loss 4.1627 LearningRate 0.0002 Epoch: 25 Global Step: 529970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:29,221-Speed 
6344.72 samples/sec Loss 4.0701 LearningRate 0.0002 Epoch: 25 Global Step: 529980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:32,470-Speed 6304.55 samples/sec Loss 4.0512 LearningRate 0.0002 Epoch: 25 Global Step: 529990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:35,712-Speed 6317.60 samples/sec Loss 4.0700 LearningRate 0.0002 Epoch: 25 Global Step: 530000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:38,957-Speed 6312.65 samples/sec Loss 4.0345 LearningRate 0.0002 Epoch: 25 Global Step: 530010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:42,249-Speed 6222.69 samples/sec Loss 4.0361 LearningRate 0.0002 Epoch: 25 Global Step: 530020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:45,501-Speed 6298.69 samples/sec Loss 4.0485 LearningRate 0.0002 Epoch: 25 Global Step: 530030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:48,743-Speed 6319.40 samples/sec Loss 4.1033 LearningRate 0.0002 Epoch: 25 Global Step: 530040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:51,990-Speed 6309.01 samples/sec Loss 4.0311 LearningRate 0.0002 Epoch: 25 Global Step: 530050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:55,231-Speed 6318.94 samples/sec Loss 4.0647 LearningRate 0.0002 Epoch: 25 Global Step: 530060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:56:58,474-Speed 6316.39 samples/sec Loss 3.9386 LearningRate 0.0002 Epoch: 25 Global Step: 530070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:01,704-Speed 6343.17 samples/sec Loss 4.0371 LearningRate 0.0002 Epoch: 25 Global Step: 530080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:04,946-Speed 6317.62 samples/sec Loss 4.0356 LearningRate 0.0002 Epoch: 25 Global Step: 530090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:08,192-Speed 6310.72 samples/sec Loss 4.0162 
LearningRate 0.0002 Epoch: 25 Global Step: 530100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:11,434-Speed 6319.98 samples/sec Loss 4.0921 LearningRate 0.0002 Epoch: 25 Global Step: 530110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:14,689-Speed 6293.35 samples/sec Loss 4.1094 LearningRate 0.0002 Epoch: 25 Global Step: 530120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:17,933-Speed 6314.20 samples/sec Loss 4.0778 LearningRate 0.0002 Epoch: 25 Global Step: 530130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:21,179-Speed 6311.93 samples/sec Loss 4.0970 LearningRate 0.0002 Epoch: 25 Global Step: 530140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:24,425-Speed 6310.04 samples/sec Loss 3.9996 LearningRate 0.0002 Epoch: 25 Global Step: 530150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:27,671-Speed 6309.84 samples/sec Loss 4.0033 LearningRate 0.0002 Epoch: 25 Global Step: 530160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:30,918-Speed 6309.91 samples/sec Loss 4.0958 LearningRate 0.0002 Epoch: 25 Global Step: 530170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:57:34,146-Speed 6344.56 samples/sec Loss 4.0313 LearningRate 0.0002 Epoch: 25 Global Step: 530180 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:37,391-Speed 6314.11 samples/sec Loss 4.0816 LearningRate 0.0002 Epoch: 25 Global Step: 530190 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:40,646-Speed 6292.73 samples/sec Loss 4.0478 LearningRate 0.0002 Epoch: 25 Global Step: 530200 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:43,889-Speed 6316.94 samples/sec Loss 4.0151 LearningRate 0.0002 Epoch: 25 Global Step: 530210 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:47,133-Speed 6313.71 samples/sec Loss 4.0109 LearningRate 0.0002 Epoch: 25 
Global Step: 530220 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:50,403-Speed 6263.44 samples/sec Loss 4.0250 LearningRate 0.0002 Epoch: 25 Global Step: 530230 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:53,722-Speed 6172.31 samples/sec Loss 4.0422 LearningRate 0.0002 Epoch: 25 Global Step: 530240 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:57:56,968-Speed 6310.28 samples/sec Loss 4.0450 LearningRate 0.0002 Epoch: 25 Global Step: 530250 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:58:00,213-Speed 6313.36 samples/sec Loss 4.0234 LearningRate 0.0002 Epoch: 25 Global Step: 530260 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:58:03,460-Speed 6308.32 samples/sec Loss 4.0575 LearningRate 0.0002 Epoch: 25 Global Step: 530270 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 15:58:06,710-Speed 6302.71 samples/sec Loss 4.0004 LearningRate 0.0002 Epoch: 25 Global Step: 530280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:09,957-Speed 6308.72 samples/sec Loss 4.0498 LearningRate 0.0002 Epoch: 25 Global Step: 530290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:13,199-Speed 6319.76 samples/sec Loss 4.0298 LearningRate 0.0002 Epoch: 25 Global Step: 530300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:16,445-Speed 6311.28 samples/sec Loss 3.9795 LearningRate 0.0002 Epoch: 25 Global Step: 530310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:19,694-Speed 6304.71 samples/sec Loss 3.9957 LearningRate 0.0002 Epoch: 25 Global Step: 530320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:22,940-Speed 6311.56 samples/sec Loss 4.0146 LearningRate 0.0002 Epoch: 25 Global Step: 530330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:26,191-Speed 6300.61 samples/sec Loss 4.0433 LearningRate 0.0002 Epoch: 25 Global Step: 530340 Fp16 Grad Scale: 
16384 Required: 27 hours Training: 2022-04-02 15:58:29,436-Speed 6311.63 samples/sec Loss 4.0057 LearningRate 0.0002 Epoch: 25 Global Step: 530350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:32,686-Speed 6303.33 samples/sec Loss 4.0630 LearningRate 0.0002 Epoch: 25 Global Step: 530360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:35,929-Speed 6316.87 samples/sec Loss 4.0598 LearningRate 0.0002 Epoch: 25 Global Step: 530370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:39,168-Speed 6325.08 samples/sec Loss 4.0702 LearningRate 0.0002 Epoch: 25 Global Step: 530380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:42,483-Speed 6179.34 samples/sec Loss 4.1036 LearningRate 0.0002 Epoch: 25 Global Step: 530390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:45,725-Speed 6317.81 samples/sec Loss 3.9939 LearningRate 0.0002 Epoch: 25 Global Step: 530400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:48,977-Speed 6297.79 samples/sec Loss 4.0228 LearningRate 0.0002 Epoch: 25 Global Step: 530410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:52,227-Speed 6304.98 samples/sec Loss 4.1172 LearningRate 0.0002 Epoch: 25 Global Step: 530420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:55,480-Speed 6296.33 samples/sec Loss 4.0458 LearningRate 0.0002 Epoch: 25 Global Step: 530430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:58:58,724-Speed 6314.72 samples/sec Loss 3.9978 LearningRate 0.0002 Epoch: 25 Global Step: 530440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:01,972-Speed 6306.31 samples/sec Loss 4.0277 LearningRate 0.0002 Epoch: 25 Global Step: 530450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:05,219-Speed 6308.90 samples/sec Loss 4.0704 LearningRate 0.0002 Epoch: 25 Global Step: 530460 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 15:59:08,473-Speed 6294.29 samples/sec Loss 4.0608 LearningRate 0.0002 Epoch: 25 Global Step: 530470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:11,708-Speed 6333.22 samples/sec Loss 4.0517 LearningRate 0.0002 Epoch: 25 Global Step: 530480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:14,972-Speed 6276.40 samples/sec Loss 3.9838 LearningRate 0.0002 Epoch: 25 Global Step: 530490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:18,218-Speed 6310.33 samples/sec Loss 4.0152 LearningRate 0.0002 Epoch: 25 Global Step: 530500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:21,461-Speed 6316.95 samples/sec Loss 4.0944 LearningRate 0.0002 Epoch: 25 Global Step: 530510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:24,709-Speed 6307.14 samples/sec Loss 4.0120 LearningRate 0.0002 Epoch: 25 Global Step: 530520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:27,965-Speed 6290.23 samples/sec Loss 4.0245 LearningRate 0.0002 Epoch: 25 Global Step: 530530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:31,209-Speed 6315.26 samples/sec Loss 4.0648 LearningRate 0.0002 Epoch: 25 Global Step: 530540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:34,455-Speed 6311.18 samples/sec Loss 4.0457 LearningRate 0.0002 Epoch: 25 Global Step: 530550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:37,698-Speed 6315.65 samples/sec Loss 4.0120 LearningRate 0.0002 Epoch: 25 Global Step: 530560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:40,937-Speed 6325.16 samples/sec Loss 4.0566 LearningRate 0.0002 Epoch: 25 Global Step: 530570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:44,168-Speed 6340.54 samples/sec Loss 4.0229 LearningRate 0.0002 Epoch: 25 Global Step: 530580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 
15:59:47,424-Speed 6290.43 samples/sec Loss 4.0234 LearningRate 0.0002 Epoch: 25 Global Step: 530590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:50,666-Speed 6319.12 samples/sec Loss 4.1335 LearningRate 0.0002 Epoch: 25 Global Step: 530600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:53,909-Speed 6316.47 samples/sec Loss 4.0318 LearningRate 0.0002 Epoch: 25 Global Step: 530610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 15:59:57,150-Speed 6320.31 samples/sec Loss 4.0454 LearningRate 0.0002 Epoch: 25 Global Step: 530620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:00,395-Speed 6311.68 samples/sec Loss 4.0183 LearningRate 0.0002 Epoch: 25 Global Step: 530630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:03,640-Speed 6314.30 samples/sec Loss 4.0210 LearningRate 0.0002 Epoch: 25 Global Step: 530640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:06,870-Speed 6340.25 samples/sec Loss 4.0652 LearningRate 0.0002 Epoch: 25 Global Step: 530650 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:10,116-Speed 6311.43 samples/sec Loss 3.9886 LearningRate 0.0002 Epoch: 25 Global Step: 530660 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:13,369-Speed 6296.97 samples/sec Loss 4.0209 LearningRate 0.0002 Epoch: 25 Global Step: 530670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:16,623-Speed 6295.26 samples/sec Loss 4.1364 LearningRate 0.0002 Epoch: 25 Global Step: 530680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:19,865-Speed 6318.30 samples/sec Loss 4.0763 LearningRate 0.0002 Epoch: 25 Global Step: 530690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:23,124-Speed 6286.61 samples/sec Loss 4.0511 LearningRate 0.0002 Epoch: 25 Global Step: 530700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:26,367-Speed 6315.79 
samples/sec Loss 4.0886 LearningRate 0.0002 Epoch: 25 Global Step: 530710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:29,617-Speed 6304.72 samples/sec Loss 4.0358 LearningRate 0.0002 Epoch: 25 Global Step: 530720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:32,858-Speed 6320.56 samples/sec Loss 4.1434 LearningRate 0.0002 Epoch: 25 Global Step: 530730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:36,098-Speed 6322.13 samples/sec Loss 4.0351 LearningRate 0.0002 Epoch: 25 Global Step: 530740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:00:39,348-Speed 6302.79 samples/sec Loss 3.9752 LearningRate 0.0002 Epoch: 25 Global Step: 530750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:42,590-Speed 6317.21 samples/sec Loss 3.9942 LearningRate 0.0002 Epoch: 25 Global Step: 530760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:45,835-Speed 6314.23 samples/sec Loss 3.9931 LearningRate 0.0002 Epoch: 25 Global Step: 530770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:49,081-Speed 6310.21 samples/sec Loss 4.0380 LearningRate 0.0002 Epoch: 25 Global Step: 530780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:52,328-Speed 6309.18 samples/sec Loss 4.0735 LearningRate 0.0002 Epoch: 25 Global Step: 530790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:55,570-Speed 6317.00 samples/sec Loss 4.0654 LearningRate 0.0002 Epoch: 25 Global Step: 530800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:00:58,826-Speed 6292.21 samples/sec Loss 4.0293 LearningRate 0.0002 Epoch: 25 Global Step: 530810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:02,073-Speed 6308.03 samples/sec Loss 4.0690 LearningRate 0.0002 Epoch: 25 Global Step: 530820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:05,309-Speed 6330.20 samples/sec Loss 3.9796 
LearningRate 0.0002 Epoch: 25 Global Step: 530830 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:08,554-Speed 6311.90 samples/sec Loss 4.0253 LearningRate 0.0002 Epoch: 25 Global Step: 530840 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:11,811-Speed 6291.21 samples/sec Loss 4.0228 LearningRate 0.0002 Epoch: 25 Global Step: 530850 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:15,052-Speed 6318.63 samples/sec Loss 4.0078 LearningRate 0.0002 Epoch: 25 Global Step: 530860 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:18,298-Speed 6311.50 samples/sec Loss 4.0224 LearningRate 0.0002 Epoch: 25 Global Step: 530870 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:21,541-Speed 6316.58 samples/sec Loss 3.9551 LearningRate 0.0002 Epoch: 25 Global Step: 530880 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:24,800-Speed 6286.14 samples/sec Loss 4.0086 LearningRate 0.0002 Epoch: 25 Global Step: 530890 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:28,043-Speed 6315.97 samples/sec Loss 4.0772 LearningRate 0.0002 Epoch: 25 Global Step: 530900 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:31,290-Speed 6308.91 samples/sec Loss 4.0451 LearningRate 0.0002 Epoch: 25 Global Step: 530910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:34,534-Speed 6314.13 samples/sec Loss 4.0580 LearningRate 0.0002 Epoch: 25 Global Step: 530920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:01:37,782-Speed 6308.31 samples/sec Loss 4.0038 LearningRate 0.0002 Epoch: 25 Global Step: 530930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:41,032-Speed 6302.80 samples/sec Loss 3.9910 LearningRate 0.0002 Epoch: 25 Global Step: 530940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:44,276-Speed 6313.67 samples/sec Loss 4.0674 LearningRate 0.0002 Epoch: 25 Global 
Step: 530950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:47,523-Speed 6310.01 samples/sec Loss 3.9945 LearningRate 0.0002 Epoch: 25 Global Step: 530960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:50,767-Speed 6313.34 samples/sec Loss 4.0406 LearningRate 0.0002 Epoch: 25 Global Step: 530970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:54,014-Speed 6310.45 samples/sec Loss 4.0075 LearningRate 0.0002 Epoch: 25 Global Step: 530980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:01:57,260-Speed 6309.65 samples/sec Loss 4.0223 LearningRate 0.0002 Epoch: 25 Global Step: 530990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:00,503-Speed 6316.23 samples/sec Loss 4.0879 LearningRate 0.0002 Epoch: 25 Global Step: 531000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:03,749-Speed 6311.11 samples/sec Loss 4.0286 LearningRate 0.0002 Epoch: 25 Global Step: 531010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:06,994-Speed 6311.85 samples/sec Loss 4.0171 LearningRate 0.0002 Epoch: 25 Global Step: 531020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:10,226-Speed 6337.81 samples/sec Loss 4.0383 LearningRate 0.0002 Epoch: 25 Global Step: 531030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:13,476-Speed 6304.19 samples/sec Loss 4.0124 LearningRate 0.0002 Epoch: 25 Global Step: 531040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:16,729-Speed 6297.12 samples/sec Loss 4.1424 LearningRate 0.0002 Epoch: 25 Global Step: 531050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:19,977-Speed 6307.02 samples/sec Loss 3.9778 LearningRate 0.0002 Epoch: 25 Global Step: 531060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:23,223-Speed 6310.76 samples/sec Loss 4.0241 LearningRate 0.0002 Epoch: 25 Global Step: 531070 Fp16 Grad Scale: 
16384 Required: 27 hours Training: 2022-04-02 16:02:26,470-Speed 6308.47 samples/sec Loss 4.0875 LearningRate 0.0002 Epoch: 25 Global Step: 531080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:29,719-Speed 6305.31 samples/sec Loss 4.0087 LearningRate 0.0002 Epoch: 25 Global Step: 531090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:32,960-Speed 6319.55 samples/sec Loss 4.0336 LearningRate 0.0002 Epoch: 25 Global Step: 531100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:36,208-Speed 6307.47 samples/sec Loss 4.1106 LearningRate 0.0002 Epoch: 25 Global Step: 531110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:39,450-Speed 6317.69 samples/sec Loss 4.0587 LearningRate 0.0002 Epoch: 25 Global Step: 531120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:42,683-Speed 6336.73 samples/sec Loss 4.1205 LearningRate 0.0002 Epoch: 25 Global Step: 531130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:45,925-Speed 6318.61 samples/sec Loss 4.0309 LearningRate 0.0002 Epoch: 25 Global Step: 531140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:49,167-Speed 6317.86 samples/sec Loss 4.1030 LearningRate 0.0002 Epoch: 25 Global Step: 531150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:52,414-Speed 6309.91 samples/sec Loss 4.1140 LearningRate 0.0002 Epoch: 25 Global Step: 531160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:55,656-Speed 6318.52 samples/sec Loss 3.9871 LearningRate 0.0002 Epoch: 25 Global Step: 531170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:02:58,900-Speed 6314.15 samples/sec Loss 4.0361 LearningRate 0.0002 Epoch: 25 Global Step: 531180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:02,145-Speed 6313.27 samples/sec Loss 4.0676 LearningRate 0.0002 Epoch: 25 Global Step: 531190 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 16:03:05,388-Speed 6316.80 samples/sec Loss 4.1368 LearningRate 0.0002 Epoch: 25 Global Step: 531200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:08,634-Speed 6309.71 samples/sec Loss 4.0135 LearningRate 0.0002 Epoch: 25 Global Step: 531210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:11,887-Speed 6297.34 samples/sec Loss 4.0350 LearningRate 0.0002 Epoch: 25 Global Step: 531220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:15,130-Speed 6317.02 samples/sec Loss 4.1210 LearningRate 0.0002 Epoch: 25 Global Step: 531230 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 16:03:18,361-Speed 6339.61 samples/sec Loss 4.0836 LearningRate 0.0002 Epoch: 25 Global Step: 531240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:21,603-Speed 6317.73 samples/sec Loss 4.0502 LearningRate 0.0002 Epoch: 25 Global Step: 531250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:24,848-Speed 6313.19 samples/sec Loss 4.0373 LearningRate 0.0002 Epoch: 25 Global Step: 531260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:28,090-Speed 6318.95 samples/sec Loss 4.0044 LearningRate 0.0002 Epoch: 25 Global Step: 531270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:31,332-Speed 6317.57 samples/sec Loss 4.0507 LearningRate 0.0002 Epoch: 25 Global Step: 531280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:34,575-Speed 6317.37 samples/sec Loss 4.0538 LearningRate 0.0002 Epoch: 25 Global Step: 531290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:37,821-Speed 6310.79 samples/sec Loss 4.0458 LearningRate 0.0002 Epoch: 25 Global Step: 531300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:41,067-Speed 6309.63 samples/sec Loss 4.0681 LearningRate 0.0002 Epoch: 25 Global Step: 531310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 
16:03:44,308-Speed 6321.49 samples/sec Loss 4.0455 LearningRate 0.0002 Epoch: 25 Global Step: 531320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:47,554-Speed 6309.54 samples/sec Loss 4.0071 LearningRate 0.0002 Epoch: 25 Global Step: 531330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:50,782-Speed 6346.55 samples/sec Loss 4.0210 LearningRate 0.0002 Epoch: 25 Global Step: 531340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:54,028-Speed 6310.86 samples/sec Loss 4.0785 LearningRate 0.0002 Epoch: 25 Global Step: 531350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:03:57,271-Speed 6317.33 samples/sec Loss 4.1094 LearningRate 0.0002 Epoch: 25 Global Step: 531360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:00,516-Speed 6312.33 samples/sec Loss 4.0390 LearningRate 0.0002 Epoch: 25 Global Step: 531370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:03,759-Speed 6316.85 samples/sec Loss 4.0817 LearningRate 0.0002 Epoch: 25 Global Step: 531380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:07,004-Speed 6313.10 samples/sec Loss 4.0208 LearningRate 0.0002 Epoch: 25 Global Step: 531390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:10,293-Speed 6227.06 samples/sec Loss 4.0324 LearningRate 0.0002 Epoch: 25 Global Step: 531400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:13,536-Speed 6316.45 samples/sec Loss 4.1103 LearningRate 0.0002 Epoch: 25 Global Step: 531410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:16,785-Speed 6305.43 samples/sec Loss 4.1142 LearningRate 0.0002 Epoch: 25 Global Step: 531420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:20,034-Speed 6306.01 samples/sec Loss 4.0091 LearningRate 0.0002 Epoch: 25 Global Step: 531430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:23,265-Speed 6339.95 
samples/sec Loss 4.0289 LearningRate 0.0002 Epoch: 25 Global Step: 531440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:26,524-Speed 6284.59 samples/sec Loss 4.0972 LearningRate 0.0002 Epoch: 25 Global Step: 531450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:29,776-Speed 6301.57 samples/sec Loss 4.0360 LearningRate 0.0002 Epoch: 25 Global Step: 531460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:33,024-Speed 6305.92 samples/sec Loss 4.0575 LearningRate 0.0002 Epoch: 25 Global Step: 531470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:36,270-Speed 6310.64 samples/sec Loss 4.0697 LearningRate 0.0002 Epoch: 25 Global Step: 531480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:39,510-Speed 6321.83 samples/sec Loss 3.9775 LearningRate 0.0002 Epoch: 25 Global Step: 531490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:42,754-Speed 6314.61 samples/sec Loss 4.0129 LearningRate 0.0002 Epoch: 25 Global Step: 531500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:46,002-Speed 6306.72 samples/sec Loss 4.0847 LearningRate 0.0002 Epoch: 25 Global Step: 531510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:49,248-Speed 6311.87 samples/sec Loss 4.0654 LearningRate 0.0002 Epoch: 25 Global Step: 531520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:52,492-Speed 6314.89 samples/sec Loss 4.0302 LearningRate 0.0002 Epoch: 25 Global Step: 531530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:55,725-Speed 6335.15 samples/sec Loss 4.0380 LearningRate 0.0002 Epoch: 25 Global Step: 531540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:04:58,973-Speed 6307.29 samples/sec Loss 3.9823 LearningRate 0.0002 Epoch: 25 Global Step: 531550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:02,222-Speed 6305.06 samples/sec Loss 4.0863 
LearningRate 0.0002 Epoch: 25 Global Step: 531560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:05,464-Speed 6319.95 samples/sec Loss 4.0264 LearningRate 0.0002 Epoch: 25 Global Step: 531570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:08,706-Speed 6317.76 samples/sec Loss 3.9682 LearningRate 0.0002 Epoch: 25 Global Step: 531580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:11,955-Speed 6304.19 samples/sec Loss 4.0411 LearningRate 0.0002 Epoch: 25 Global Step: 531590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:15,200-Speed 6312.92 samples/sec Loss 4.0249 LearningRate 0.0002 Epoch: 25 Global Step: 531600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:18,449-Speed 6305.64 samples/sec Loss 3.9961 LearningRate 0.0002 Epoch: 25 Global Step: 531610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:21,720-Speed 6261.94 samples/sec Loss 4.0515 LearningRate 0.0002 Epoch: 25 Global Step: 531620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:24,977-Speed 6289.97 samples/sec Loss 4.0642 LearningRate 0.0002 Epoch: 25 Global Step: 531630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:28,208-Speed 6339.12 samples/sec Loss 4.0948 LearningRate 0.0002 Epoch: 25 Global Step: 531640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:31,454-Speed 6311.22 samples/sec Loss 4.0016 LearningRate 0.0002 Epoch: 25 Global Step: 531650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:34,741-Speed 6231.68 samples/sec Loss 4.0330 LearningRate 0.0002 Epoch: 25 Global Step: 531660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:38,012-Speed 6263.09 samples/sec Loss 4.0607 LearningRate 0.0002 Epoch: 25 Global Step: 531670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:41,255-Speed 6316.06 samples/sec Loss 4.0407 LearningRate 0.0002 Epoch: 25 
Global Step: 531680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:44,502-Speed 6308.13 samples/sec Loss 4.0845 LearningRate 0.0002 Epoch: 25 Global Step: 531690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:47,750-Speed 6308.52 samples/sec Loss 4.0347 LearningRate 0.0002 Epoch: 25 Global Step: 531700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:51,000-Speed 6303.64 samples/sec Loss 4.0510 LearningRate 0.0002 Epoch: 25 Global Step: 531710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:54,243-Speed 6314.88 samples/sec Loss 4.0795 LearningRate 0.0002 Epoch: 25 Global Step: 531720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:05:57,486-Speed 6317.74 samples/sec Loss 4.0360 LearningRate 0.0002 Epoch: 25 Global Step: 531730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:00,717-Speed 6339.23 samples/sec Loss 4.0627 LearningRate 0.0002 Epoch: 25 Global Step: 531740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:03,960-Speed 6316.41 samples/sec Loss 4.0199 LearningRate 0.0002 Epoch: 25 Global Step: 531750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:07,207-Speed 6309.53 samples/sec Loss 4.1101 LearningRate 0.0002 Epoch: 25 Global Step: 531760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:10,455-Speed 6306.49 samples/sec Loss 4.1188 LearningRate 0.0002 Epoch: 25 Global Step: 531770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:13,709-Speed 6295.86 samples/sec Loss 4.0676 LearningRate 0.0002 Epoch: 25 Global Step: 531780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:16,967-Speed 6287.83 samples/sec Loss 4.0276 LearningRate 0.0002 Epoch: 25 Global Step: 531790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:20,214-Speed 6308.88 samples/sec Loss 4.0704 LearningRate 0.0002 Epoch: 25 Global Step: 531800 Fp16 Grad 
Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:23,461-Speed 6307.98 samples/sec Loss 4.1025 LearningRate 0.0002 Epoch: 25 Global Step: 531810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:26,706-Speed 6314.17 samples/sec Loss 4.0452 LearningRate 0.0002 Epoch: 25 Global Step: 531820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:29,951-Speed 6313.18 samples/sec Loss 3.9891 LearningRate 0.0002 Epoch: 25 Global Step: 531830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:33,182-Speed 6339.06 samples/sec Loss 4.0501 LearningRate 0.0002 Epoch: 25 Global Step: 531840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:36,427-Speed 6311.97 samples/sec Loss 4.0229 LearningRate 0.0002 Epoch: 25 Global Step: 531850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:39,674-Speed 6309.97 samples/sec Loss 3.9577 LearningRate 0.0002 Epoch: 25 Global Step: 531860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:42,921-Speed 6308.27 samples/sec Loss 4.0769 LearningRate 0.0002 Epoch: 25 Global Step: 531870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:46,165-Speed 6313.22 samples/sec Loss 4.0971 LearningRate 0.0002 Epoch: 25 Global Step: 531880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:49,415-Speed 6305.05 samples/sec Loss 3.9980 LearningRate 0.0002 Epoch: 25 Global Step: 531890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:52,660-Speed 6311.14 samples/sec Loss 3.9751 LearningRate 0.0002 Epoch: 25 Global Step: 531900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:55,905-Speed 6313.56 samples/sec Loss 4.0624 LearningRate 0.0002 Epoch: 25 Global Step: 531910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:06:59,147-Speed 6318.22 samples/sec Loss 4.0397 LearningRate 0.0002 Epoch: 25 Global Step: 531920 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 16:07:02,391-Speed 6315.04 samples/sec Loss 4.0149 LearningRate 0.0002 Epoch: 25 Global Step: 531930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:05,621-Speed 6342.32 samples/sec Loss 3.9801 LearningRate 0.0002 Epoch: 25 Global Step: 531940 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:08,863-Speed 6318.16 samples/sec Loss 4.0539 LearningRate 0.0002 Epoch: 25 Global Step: 531950 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:12,109-Speed 6309.23 samples/sec Loss 4.0665 LearningRate 0.0002 Epoch: 25 Global Step: 531960 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:15,355-Speed 6311.83 samples/sec Loss 4.0207 LearningRate 0.0002 Epoch: 25 Global Step: 531970 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:18,602-Speed 6310.08 samples/sec Loss 4.0589 LearningRate 0.0002 Epoch: 25 Global Step: 531980 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:21,849-Speed 6307.79 samples/sec Loss 4.0553 LearningRate 0.0002 Epoch: 25 Global Step: 531990 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:25,097-Speed 6307.92 samples/sec Loss 4.0853 LearningRate 0.0002 Epoch: 25 Global Step: 532000 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:28,346-Speed 6304.90 samples/sec Loss 3.9818 LearningRate 0.0002 Epoch: 25 Global Step: 532010 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:31,597-Speed 6300.58 samples/sec Loss 4.0148 LearningRate 0.0002 Epoch: 25 Global Step: 532020 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:34,841-Speed 6315.30 samples/sec Loss 4.0398 LearningRate 0.0002 Epoch: 25 Global Step: 532030 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:07:38,084-Speed 6315.24 samples/sec Loss 4.1066 LearningRate 0.0002 Epoch: 25 Global Step: 532040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:41,328-Speed 
6314.55 samples/sec Loss 4.0263 LearningRate 0.0002 Epoch: 25 Global Step: 532050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:44,575-Speed 6308.80 samples/sec Loss 4.0494 LearningRate 0.0002 Epoch: 25 Global Step: 532060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:47,819-Speed 6315.06 samples/sec Loss 4.0409 LearningRate 0.0002 Epoch: 25 Global Step: 532070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:51,065-Speed 6310.76 samples/sec Loss 4.0766 LearningRate 0.0002 Epoch: 25 Global Step: 532080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:54,307-Speed 6318.18 samples/sec Loss 4.0543 LearningRate 0.0002 Epoch: 25 Global Step: 532090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:07:57,567-Speed 6284.74 samples/sec Loss 4.0157 LearningRate 0.0002 Epoch: 25 Global Step: 532100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:00,810-Speed 6315.09 samples/sec Loss 4.0315 LearningRate 0.0002 Epoch: 25 Global Step: 532110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:04,059-Speed 6306.09 samples/sec Loss 4.0898 LearningRate 0.0002 Epoch: 25 Global Step: 532120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:07,311-Speed 6299.96 samples/sec Loss 4.0132 LearningRate 0.0002 Epoch: 25 Global Step: 532130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:10,543-Speed 6337.57 samples/sec Loss 4.0435 LearningRate 0.0002 Epoch: 25 Global Step: 532140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:13,790-Speed 6307.66 samples/sec Loss 4.0311 LearningRate 0.0002 Epoch: 25 Global Step: 532150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:17,031-Speed 6321.70 samples/sec Loss 3.9647 LearningRate 0.0002 Epoch: 25 Global Step: 532160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:20,277-Speed 6310.75 samples/sec Loss 4.0714 
LearningRate 0.0002 Epoch: 25 Global Step: 532170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:23,522-Speed 6312.17 samples/sec Loss 4.0212 LearningRate 0.0002 Epoch: 25 Global Step: 532180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:26,772-Speed 6303.24 samples/sec Loss 3.9907 LearningRate 0.0002 Epoch: 25 Global Step: 532190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:30,040-Speed 6267.90 samples/sec Loss 3.9916 LearningRate 0.0002 Epoch: 25 Global Step: 532200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:33,281-Speed 6321.26 samples/sec Loss 4.0612 LearningRate 0.0002 Epoch: 25 Global Step: 532210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:36,528-Speed 6307.61 samples/sec Loss 4.0313 LearningRate 0.0002 Epoch: 25 Global Step: 532220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:39,773-Speed 6314.23 samples/sec Loss 4.0137 LearningRate 0.0002 Epoch: 25 Global Step: 532230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:43,004-Speed 6338.67 samples/sec Loss 4.0087 LearningRate 0.0002 Epoch: 25 Global Step: 532240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:46,248-Speed 6315.43 samples/sec Loss 4.0506 LearningRate 0.0002 Epoch: 25 Global Step: 532250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:49,493-Speed 6313.32 samples/sec Loss 4.0293 LearningRate 0.0002 Epoch: 25 Global Step: 532260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:52,752-Speed 6284.73 samples/sec Loss 4.0478 LearningRate 0.0002 Epoch: 25 Global Step: 532270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:55,995-Speed 6315.39 samples/sec Loss 4.0947 LearningRate 0.0002 Epoch: 25 Global Step: 532280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:08:59,249-Speed 6295.57 samples/sec Loss 4.0364 LearningRate 0.0002 Epoch: 25 
Global Step: 532290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:02,498-Speed 6305.34 samples/sec Loss 4.0434 LearningRate 0.0002 Epoch: 25 Global Step: 532300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:05,744-Speed 6310.07 samples/sec Loss 4.0028 LearningRate 0.0002 Epoch: 25 Global Step: 532310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:08,999-Speed 6294.48 samples/sec Loss 4.0327 LearningRate 0.0002 Epoch: 25 Global Step: 532320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:12,247-Speed 6306.05 samples/sec Loss 4.0329 LearningRate 0.0002 Epoch: 25 Global Step: 532330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:15,494-Speed 6309.40 samples/sec Loss 3.9977 LearningRate 0.0002 Epoch: 25 Global Step: 532340 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 16:09:18,733-Speed 6323.99 samples/sec Loss 4.0163 LearningRate 0.0002 Epoch: 25 Global Step: 532350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:21,978-Speed 6311.73 samples/sec Loss 4.0307 LearningRate 0.0002 Epoch: 25 Global Step: 532360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:25,226-Speed 6306.78 samples/sec Loss 4.0259 LearningRate 0.0002 Epoch: 25 Global Step: 532370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:28,471-Speed 6314.16 samples/sec Loss 3.9962 LearningRate 0.0002 Epoch: 25 Global Step: 532380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:31,715-Speed 6313.11 samples/sec Loss 4.0783 LearningRate 0.0002 Epoch: 25 Global Step: 532390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:34,956-Speed 6322.07 samples/sec Loss 4.0330 LearningRate 0.0002 Epoch: 25 Global Step: 532400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:38,200-Speed 6313.99 samples/sec Loss 4.0367 LearningRate 0.0002 Epoch: 25 Global Step: 532410 Fp16 Grad 
Scale: 16384 Required: 27 hours Training: 2022-04-02 16:09:41,427-Speed 6348.47 samples/sec Loss 4.0445 LearningRate 0.0002 Epoch: 25 Global Step: 532420 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:09:44,679-Speed 6298.63 samples/sec Loss 4.0987 LearningRate 0.0002 Epoch: 25 Global Step: 532430 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:09:47,928-Speed 6305.76 samples/sec Loss 4.0609 LearningRate 0.0002 Epoch: 25 Global Step: 532440 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:09:51,170-Speed 6318.07 samples/sec Loss 4.0573 LearningRate 0.0002 Epoch: 25 Global Step: 532450 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:09:54,416-Speed 6309.72 samples/sec Loss 4.0249 LearningRate 0.0002 Epoch: 25 Global Step: 532460 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:09:57,662-Speed 6312.65 samples/sec Loss 4.0728 LearningRate 0.0002 Epoch: 25 Global Step: 532470 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:10:00,927-Speed 6273.59 samples/sec Loss 4.0214 LearningRate 0.0002 Epoch: 25 Global Step: 532480 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:10:04,170-Speed 6315.44 samples/sec Loss 4.0051 LearningRate 0.0002 Epoch: 25 Global Step: 532490 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:10:07,414-Speed 6314.57 samples/sec Loss 4.0505 LearningRate 0.0002 Epoch: 25 Global Step: 532500 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:10:10,656-Speed 6318.66 samples/sec Loss 3.9896 LearningRate 0.0002 Epoch: 25 Global Step: 532510 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:10:13,896-Speed 6323.30 samples/sec Loss 4.0792 LearningRate 0.0002 Epoch: 25 Global Step: 532520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:17,140-Speed 6314.38 samples/sec Loss 4.0101 LearningRate 0.0002 Epoch: 25 Global Step: 532530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 16:10:20,396-Speed 6291.14 samples/sec Loss 4.0617 LearningRate 0.0002 Epoch: 25 Global Step: 532540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:23,640-Speed 6314.74 samples/sec Loss 4.0882 LearningRate 0.0002 Epoch: 25 Global Step: 532550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:26,899-Speed 6285.44 samples/sec Loss 4.0683 LearningRate 0.0002 Epoch: 25 Global Step: 532560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:30,138-Speed 6324.04 samples/sec Loss 3.9924 LearningRate 0.0002 Epoch: 25 Global Step: 532570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:33,385-Speed 6308.74 samples/sec Loss 4.0006 LearningRate 0.0002 Epoch: 25 Global Step: 532580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:36,719-Speed 6143.95 samples/sec Loss 4.0555 LearningRate 0.0002 Epoch: 25 Global Step: 532590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:39,991-Speed 6260.19 samples/sec Loss 3.9948 LearningRate 0.0002 Epoch: 25 Global Step: 532600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:43,237-Speed 6310.42 samples/sec Loss 4.1573 LearningRate 0.0002 Epoch: 25 Global Step: 532610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:46,466-Speed 6345.43 samples/sec Loss 4.0648 LearningRate 0.0002 Epoch: 25 Global Step: 532620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:49,714-Speed 6307.87 samples/sec Loss 4.0123 LearningRate 0.0002 Epoch: 25 Global Step: 532630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:52,959-Speed 6311.68 samples/sec Loss 4.0553 LearningRate 0.0002 Epoch: 25 Global Step: 532640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:56,201-Speed 6317.99 samples/sec Loss 4.0500 LearningRate 0.0002 Epoch: 25 Global Step: 532650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:10:59,444-Speed 
6317.24 samples/sec Loss 4.0378 LearningRate 0.0002 Epoch: 25 Global Step: 532660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:02,691-Speed 6308.31 samples/sec Loss 4.0947 LearningRate 0.0002 Epoch: 25 Global Step: 532670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:05,948-Speed 6290.52 samples/sec Loss 4.1112 LearningRate 0.0002 Epoch: 25 Global Step: 532680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:09,190-Speed 6317.25 samples/sec Loss 4.0129 LearningRate 0.0002 Epoch: 25 Global Step: 532690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:12,432-Speed 6318.04 samples/sec Loss 4.0789 LearningRate 0.0002 Epoch: 25 Global Step: 532700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:15,695-Speed 6279.46 samples/sec Loss 4.0053 LearningRate 0.0002 Epoch: 25 Global Step: 532710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:18,927-Speed 6337.41 samples/sec Loss 3.9321 LearningRate 0.0002 Epoch: 25 Global Step: 532720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:22,173-Speed 6309.90 samples/sec Loss 4.0619 LearningRate 0.0002 Epoch: 25 Global Step: 532730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:25,428-Speed 6293.51 samples/sec Loss 3.9910 LearningRate 0.0002 Epoch: 25 Global Step: 532740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:28,676-Speed 6308.78 samples/sec Loss 3.9919 LearningRate 0.0002 Epoch: 25 Global Step: 532750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:31,924-Speed 6306.53 samples/sec Loss 4.0392 LearningRate 0.0002 Epoch: 25 Global Step: 532760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:35,170-Speed 6311.57 samples/sec Loss 4.0886 LearningRate 0.0002 Epoch: 25 Global Step: 532770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:38,413-Speed 6316.29 samples/sec Loss 4.0179 
LearningRate 0.0002 Epoch: 25 Global Step: 532780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:41,654-Speed 6320.18 samples/sec Loss 3.9840 LearningRate 0.0002 Epoch: 25 Global Step: 532790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:44,896-Speed 6319.17 samples/sec Loss 3.9805 LearningRate 0.0002 Epoch: 25 Global Step: 532800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:48,139-Speed 6314.72 samples/sec Loss 4.0079 LearningRate 0.0002 Epoch: 25 Global Step: 532810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:51,371-Speed 6339.32 samples/sec Loss 4.0327 LearningRate 0.0002 Epoch: 25 Global Step: 532820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:54,613-Speed 6318.53 samples/sec Loss 4.0338 LearningRate 0.0002 Epoch: 25 Global Step: 532830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:11:57,856-Speed 6317.89 samples/sec Loss 4.0582 LearningRate 0.0002 Epoch: 25 Global Step: 532840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:01,101-Speed 6311.72 samples/sec Loss 3.9842 LearningRate 0.0002 Epoch: 25 Global Step: 532850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:04,346-Speed 6313.56 samples/sec Loss 4.0195 LearningRate 0.0002 Epoch: 25 Global Step: 532860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:07,591-Speed 6311.73 samples/sec Loss 4.0087 LearningRate 0.0002 Epoch: 25 Global Step: 532870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:10,836-Speed 6314.18 samples/sec Loss 3.9458 LearningRate 0.0002 Epoch: 25 Global Step: 532880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:14,089-Speed 6295.59 samples/sec Loss 3.9715 LearningRate 0.0002 Epoch: 25 Global Step: 532890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:17,336-Speed 6309.98 samples/sec Loss 4.0772 LearningRate 0.0002 Epoch: 25 
Global Step: 532900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:20,589-Speed 6296.15 samples/sec Loss 4.0396 LearningRate 0.0002 Epoch: 25 Global Step: 532910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:23,829-Speed 6323.46 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 25 Global Step: 532920 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:27,078-Speed 6304.79 samples/sec Loss 4.0400 LearningRate 0.0002 Epoch: 25 Global Step: 532930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:30,396-Speed 6173.85 samples/sec Loss 4.0195 LearningRate 0.0002 Epoch: 25 Global Step: 532940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:33,642-Speed 6310.82 samples/sec Loss 3.9919 LearningRate 0.0002 Epoch: 25 Global Step: 532950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:36,881-Speed 6323.08 samples/sec Loss 4.0213 LearningRate 0.0002 Epoch: 25 Global Step: 532960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:40,131-Speed 6303.05 samples/sec Loss 4.0707 LearningRate 0.0002 Epoch: 25 Global Step: 532970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:43,374-Speed 6317.35 samples/sec Loss 4.0771 LearningRate 0.0002 Epoch: 25 Global Step: 532980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:46,619-Speed 6313.18 samples/sec Loss 4.0586 LearningRate 0.0002 Epoch: 25 Global Step: 532990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:49,868-Speed 6303.89 samples/sec Loss 3.9968 LearningRate 0.0002 Epoch: 25 Global Step: 533000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:53,112-Speed 6314.42 samples/sec Loss 4.0316 LearningRate 0.0002 Epoch: 25 Global Step: 533010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:12:56,356-Speed 6314.62 samples/sec Loss 4.0134 LearningRate 0.0002 Epoch: 25 Global Step: 533020 Fp16 Grad 
Scale: 32768 Required: 27 hours Training: 2022-04-02 16:12:59,590-Speed 6334.36 samples/sec Loss 4.0455 LearningRate 0.0002 Epoch: 25 Global Step: 533030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:02,838-Speed 6307.95 samples/sec Loss 4.0306 LearningRate 0.0002 Epoch: 25 Global Step: 533040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:06,088-Speed 6302.68 samples/sec Loss 4.0593 LearningRate 0.0002 Epoch: 25 Global Step: 533050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:09,334-Speed 6310.98 samples/sec Loss 3.9787 LearningRate 0.0002 Epoch: 25 Global Step: 533060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:12,580-Speed 6311.25 samples/sec Loss 3.9907 LearningRate 0.0002 Epoch: 25 Global Step: 533070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:15,822-Speed 6318.61 samples/sec Loss 3.9713 LearningRate 0.0002 Epoch: 25 Global Step: 533080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:19,067-Speed 6311.99 samples/sec Loss 4.0426 LearningRate 0.0002 Epoch: 25 Global Step: 533090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:22,310-Speed 6316.58 samples/sec Loss 3.9437 LearningRate 0.0002 Epoch: 25 Global Step: 533100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:25,558-Speed 6306.67 samples/sec Loss 4.0510 LearningRate 0.0002 Epoch: 25 Global Step: 533110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:28,806-Speed 6306.45 samples/sec Loss 4.0590 LearningRate 0.0002 Epoch: 25 Global Step: 533120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:32,035-Speed 6343.70 samples/sec Loss 4.0248 LearningRate 0.0002 Epoch: 25 Global Step: 533130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:35,281-Speed 6311.08 samples/sec Loss 4.0621 LearningRate 0.0002 Epoch: 25 Global Step: 533140 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 16:13:38,525-Speed 6314.70 samples/sec Loss 4.0665 LearningRate 0.0002 Epoch: 25 Global Step: 533150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:41,774-Speed 6305.42 samples/sec Loss 4.0257 LearningRate 0.0002 Epoch: 25 Global Step: 533160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:45,019-Speed 6311.32 samples/sec Loss 4.0750 LearningRate 0.0002 Epoch: 25 Global Step: 533170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:48,261-Speed 6318.34 samples/sec Loss 4.0242 LearningRate 0.0002 Epoch: 25 Global Step: 533180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:51,509-Speed 6307.74 samples/sec Loss 4.0775 LearningRate 0.0002 Epoch: 25 Global Step: 533190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:13:54,737-Speed 6346.35 samples/sec Loss 3.9887 LearningRate 0.0002 Epoch: 25 Global Step: 533200 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:13:57,982-Speed 6311.75 samples/sec Loss 4.1083 LearningRate 0.0002 Epoch: 25 Global Step: 533210 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:01,224-Speed 6317.85 samples/sec Loss 4.0757 LearningRate 0.0002 Epoch: 25 Global Step: 533220 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:04,468-Speed 6315.68 samples/sec Loss 4.0816 LearningRate 0.0002 Epoch: 25 Global Step: 533230 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:07,718-Speed 6303.69 samples/sec Loss 4.0001 LearningRate 0.0002 Epoch: 25 Global Step: 533240 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:10,963-Speed 6314.31 samples/sec Loss 4.0477 LearningRate 0.0002 Epoch: 25 Global Step: 533250 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:14,209-Speed 6308.92 samples/sec Loss 4.0369 LearningRate 0.0002 Epoch: 25 Global Step: 533260 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 
16:14:17,454-Speed 6313.04 samples/sec Loss 4.0193 LearningRate 0.0002 Epoch: 25 Global Step: 533270 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:20,698-Speed 6314.93 samples/sec Loss 4.0300 LearningRate 0.0002 Epoch: 25 Global Step: 533280 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:23,943-Speed 6312.27 samples/sec Loss 4.0238 LearningRate 0.0002 Epoch: 25 Global Step: 533290 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:14:27,188-Speed 6312.49 samples/sec Loss 4.0358 LearningRate 0.0002 Epoch: 25 Global Step: 533300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:30,429-Speed 6320.49 samples/sec Loss 4.0016 LearningRate 0.0002 Epoch: 25 Global Step: 533310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:33,673-Speed 6314.75 samples/sec Loss 4.0282 LearningRate 0.0002 Epoch: 25 Global Step: 533320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:36,917-Speed 6315.82 samples/sec Loss 4.0402 LearningRate 0.0002 Epoch: 25 Global Step: 533330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:40,163-Speed 6311.12 samples/sec Loss 4.0352 LearningRate 0.0002 Epoch: 25 Global Step: 533340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:43,408-Speed 6311.18 samples/sec Loss 4.1028 LearningRate 0.0002 Epoch: 25 Global Step: 533350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:46,655-Speed 6309.74 samples/sec Loss 4.0631 LearningRate 0.0002 Epoch: 25 Global Step: 533360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:49,900-Speed 6312.55 samples/sec Loss 4.0914 LearningRate 0.0002 Epoch: 25 Global Step: 533370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:53,148-Speed 6306.76 samples/sec Loss 4.0274 LearningRate 0.0002 Epoch: 25 Global Step: 533380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:56,389-Speed 6320.64 
samples/sec Loss 4.0546 LearningRate 0.0002 Epoch: 25 Global Step: 533390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:14:59,634-Speed 6311.71 samples/sec Loss 3.9821 LearningRate 0.0002 Epoch: 25 Global Step: 533400 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 16:15:02,864-Speed 6343.45 samples/sec Loss 4.0502 LearningRate 0.0002 Epoch: 25 Global Step: 533410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:06,118-Speed 6294.25 samples/sec Loss 4.0777 LearningRate 0.0002 Epoch: 25 Global Step: 533420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:09,363-Speed 6314.27 samples/sec Loss 4.0431 LearningRate 0.0002 Epoch: 25 Global Step: 533430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:12,605-Speed 6317.81 samples/sec Loss 4.0214 LearningRate 0.0002 Epoch: 25 Global Step: 533440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:15,848-Speed 6317.52 samples/sec Loss 4.0133 LearningRate 0.0002 Epoch: 25 Global Step: 533450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:19,089-Speed 6319.25 samples/sec Loss 4.0502 LearningRate 0.0002 Epoch: 25 Global Step: 533460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:22,335-Speed 6312.09 samples/sec Loss 4.0503 LearningRate 0.0002 Epoch: 25 Global Step: 533470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:25,582-Speed 6308.35 samples/sec Loss 4.0163 LearningRate 0.0002 Epoch: 25 Global Step: 533480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:28,834-Speed 6298.83 samples/sec Loss 3.9764 LearningRate 0.0002 Epoch: 25 Global Step: 533490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:32,078-Speed 6313.05 samples/sec Loss 4.0286 LearningRate 0.0002 Epoch: 25 Global Step: 533500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:35,315-Speed 6330.27 samples/sec Loss 3.9844 
LearningRate 0.0002 Epoch: 25 Global Step: 533510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:38,558-Speed 6315.76 samples/sec Loss 4.0677 LearningRate 0.0002 Epoch: 25 Global Step: 533520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:41,806-Speed 6306.72 samples/sec Loss 4.0434 LearningRate 0.0002 Epoch: 25 Global Step: 533530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:45,051-Speed 6313.75 samples/sec Loss 4.0268 LearningRate 0.0002 Epoch: 25 Global Step: 533540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:48,296-Speed 6311.68 samples/sec Loss 4.0242 LearningRate 0.0002 Epoch: 25 Global Step: 533550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:51,543-Speed 6309.00 samples/sec Loss 3.9709 LearningRate 0.0002 Epoch: 25 Global Step: 533560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:54,790-Speed 6309.14 samples/sec Loss 4.0305 LearningRate 0.0002 Epoch: 25 Global Step: 533570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:15:58,035-Speed 6312.19 samples/sec Loss 4.0191 LearningRate 0.0002 Epoch: 25 Global Step: 533580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:16:01,265-Speed 6341.72 samples/sec Loss 4.0632 LearningRate 0.0002 Epoch: 25 Global Step: 533590 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:04,514-Speed 6304.74 samples/sec Loss 4.0263 LearningRate 0.0002 Epoch: 25 Global Step: 533600 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:07,759-Speed 6312.91 samples/sec Loss 4.1141 LearningRate 0.0002 Epoch: 25 Global Step: 533610 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:11,001-Speed 6317.43 samples/sec Loss 3.9993 LearningRate 0.0002 Epoch: 25 Global Step: 533620 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:14,246-Speed 6313.79 samples/sec Loss 4.0450 LearningRate 0.0002 Epoch: 25 
Global Step: 533630 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:17,495-Speed 6304.84 samples/sec Loss 4.0152 LearningRate 0.0002 Epoch: 25 Global Step: 533640 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:20,737-Speed 6318.87 samples/sec Loss 4.0460 LearningRate 0.0002 Epoch: 25 Global Step: 533650 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:23,983-Speed 6309.87 samples/sec Loss 3.9838 LearningRate 0.0002 Epoch: 25 Global Step: 533660 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:27,229-Speed 6311.91 samples/sec Loss 4.0037 LearningRate 0.0002 Epoch: 25 Global Step: 533670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:30,470-Speed 6320.45 samples/sec Loss 4.0111 LearningRate 0.0002 Epoch: 25 Global Step: 533680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:33,715-Speed 6313.03 samples/sec Loss 3.9992 LearningRate 0.0002 Epoch: 25 Global Step: 533690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:16:36,960-Speed 6312.08 samples/sec Loss 3.9994 LearningRate 0.0002 Epoch: 25 Global Step: 533700 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:16:40,206-Speed 6311.02 samples/sec Loss 4.0641 LearningRate 0.0002 Epoch: 25 Global Step: 533710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:16:43,450-Speed 6314.12 samples/sec Loss 4.0285 LearningRate 0.0002 Epoch: 25 Global Step: 533720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:16:46,678-Speed 6346.47 samples/sec Loss 4.0269 LearningRate 0.0002 Epoch: 25 Global Step: 533730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:49,930-Speed 6297.93 samples/sec Loss 4.0176 LearningRate 0.0002 Epoch: 25 Global Step: 533740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:53,175-Speed 6313.91 samples/sec Loss 4.1075 LearningRate 0.0002 Epoch: 25 Global Step: 533750 Fp16 Grad Scale: 
8192 Required: 27 hours Training: 2022-04-02 16:16:56,421-Speed 6310.18 samples/sec Loss 3.9331 LearningRate 0.0002 Epoch: 25 Global Step: 533760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:16:59,664-Speed 6316.79 samples/sec Loss 4.0360 LearningRate 0.0002 Epoch: 25 Global Step: 533770 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:02,926-Speed 6279.49 samples/sec Loss 4.0671 LearningRate 0.0002 Epoch: 25 Global Step: 533780 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:06,174-Speed 6306.99 samples/sec Loss 4.0850 LearningRate 0.0002 Epoch: 25 Global Step: 533790 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:09,429-Speed 6293.51 samples/sec Loss 4.0441 LearningRate 0.0002 Epoch: 25 Global Step: 533800 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:12,673-Speed 6313.54 samples/sec Loss 3.9937 LearningRate 0.0002 Epoch: 25 Global Step: 533810 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:15,928-Speed 6293.37 samples/sec Loss 3.9945 LearningRate 0.0002 Epoch: 25 Global Step: 533820 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:19,173-Speed 6312.95 samples/sec Loss 4.0036 LearningRate 0.0002 Epoch: 25 Global Step: 533830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:22,423-Speed 6303.66 samples/sec Loss 3.9987 LearningRate 0.0002 Epoch: 25 Global Step: 533840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:25,667-Speed 6313.63 samples/sec Loss 4.0325 LearningRate 0.0002 Epoch: 25 Global Step: 533850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:28,912-Speed 6313.15 samples/sec Loss 4.0313 LearningRate 0.0002 Epoch: 25 Global Step: 533860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:32,198-Speed 6235.19 samples/sec Loss 4.0578 LearningRate 0.0002 Epoch: 25 Global Step: 533870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 16:17:35,441-Speed 6316.16 samples/sec Loss 4.0935 LearningRate 0.0002 Epoch: 25 Global Step: 533880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:38,686-Speed 6312.65 samples/sec Loss 4.0592 LearningRate 0.0002 Epoch: 25 Global Step: 533890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:41,930-Speed 6314.40 samples/sec Loss 4.0178 LearningRate 0.0002 Epoch: 25 Global Step: 533900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:17:45,162-Speed 6337.92 samples/sec Loss 3.9640 LearningRate 0.0002 Epoch: 25 Global Step: 533910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:48,403-Speed 6320.14 samples/sec Loss 3.9828 LearningRate 0.0002 Epoch: 25 Global Step: 533920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:51,648-Speed 6312.87 samples/sec Loss 4.0094 LearningRate 0.0002 Epoch: 25 Global Step: 533930 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:54,893-Speed 6312.61 samples/sec Loss 4.0164 LearningRate 0.0002 Epoch: 25 Global Step: 533940 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:17:58,147-Speed 6295.40 samples/sec Loss 4.0282 LearningRate 0.0002 Epoch: 25 Global Step: 533950 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:01,392-Speed 6313.25 samples/sec Loss 4.0340 LearningRate 0.0002 Epoch: 25 Global Step: 533960 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:04,636-Speed 6314.95 samples/sec Loss 4.0080 LearningRate 0.0002 Epoch: 25 Global Step: 533970 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:07,884-Speed 6304.94 samples/sec Loss 4.0839 LearningRate 0.0002 Epoch: 25 Global Step: 533980 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:11,136-Speed 6301.15 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 25 Global Step: 533990 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:14,384-Speed 6306.10 
samples/sec Loss 4.0060 LearningRate 0.0002 Epoch: 25 Global Step: 534000 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:18:17,627-Speed 6316.09 samples/sec Loss 4.0532 LearningRate 0.0002 Epoch: 25 Global Step: 534010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:20,868-Speed 6321.20 samples/sec Loss 4.0728 LearningRate 0.0002 Epoch: 25 Global Step: 534020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:24,115-Speed 6308.56 samples/sec Loss 4.0536 LearningRate 0.0002 Epoch: 25 Global Step: 534030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:27,357-Speed 6318.13 samples/sec Loss 3.9629 LearningRate 0.0002 Epoch: 25 Global Step: 534040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:30,600-Speed 6315.41 samples/sec Loss 4.0280 LearningRate 0.0002 Epoch: 25 Global Step: 534050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:33,847-Speed 6310.43 samples/sec Loss 3.9425 LearningRate 0.0002 Epoch: 25 Global Step: 534060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:37,091-Speed 6315.77 samples/sec Loss 4.0190 LearningRate 0.0002 Epoch: 25 Global Step: 534070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:40,335-Speed 6313.53 samples/sec Loss 4.0355 LearningRate 0.0002 Epoch: 25 Global Step: 534080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:43,578-Speed 6316.35 samples/sec Loss 3.9951 LearningRate 0.0002 Epoch: 25 Global Step: 534090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:46,824-Speed 6310.94 samples/sec Loss 3.9826 LearningRate 0.0002 Epoch: 25 Global Step: 534100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:50,055-Speed 6340.95 samples/sec Loss 4.0146 LearningRate 0.0002 Epoch: 25 Global Step: 534110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:53,300-Speed 6311.70 samples/sec Loss 4.0162 
LearningRate 0.0002 Epoch: 25 Global Step: 534120 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:56,547-Speed 6309.00 samples/sec Loss 4.0407 LearningRate 0.0002 Epoch: 25 Global Step: 534130 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:18:59,780-Speed 6337.11 samples/sec Loss 4.0161 LearningRate 0.0002 Epoch: 25 Global Step: 534140 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:03,028-Speed 6306.13 samples/sec Loss 3.9553 LearningRate 0.0002 Epoch: 25 Global Step: 534150 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:06,271-Speed 6316.44 samples/sec Loss 4.0310 LearningRate 0.0002 Epoch: 25 Global Step: 534160 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:09,516-Speed 6313.82 samples/sec Loss 3.9857 LearningRate 0.0002 Epoch: 25 Global Step: 534170 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:12,761-Speed 6312.39 samples/sec Loss 4.0100 LearningRate 0.0002 Epoch: 25 Global Step: 534180 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:16,006-Speed 6312.27 samples/sec Loss 3.9949 LearningRate 0.0002 Epoch: 25 Global Step: 534190 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:19,250-Speed 6313.63 samples/sec Loss 4.0174 LearningRate 0.0002 Epoch: 25 Global Step: 534200 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:22,496-Speed 6311.91 samples/sec Loss 4.0056 LearningRate 0.0002 Epoch: 25 Global Step: 534210 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:25,740-Speed 6313.70 samples/sec Loss 4.0812 LearningRate 0.0002 Epoch: 25 Global Step: 534220 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:28,989-Speed 6305.19 samples/sec Loss 4.0554 LearningRate 0.0002 Epoch: 25 Global Step: 534230 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:19:32,242-Speed 6297.30 samples/sec Loss 3.9931 LearningRate 0.0002 Epoch: 25 Global 
Step: 534240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:35,490-Speed 6306.77 samples/sec Loss 4.0119 LearningRate 0.0002 Epoch: 25 Global Step: 534250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:38,732-Speed 6319.32 samples/sec Loss 3.9569 LearningRate 0.0002 Epoch: 25 Global Step: 534260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:41,973-Speed 6319.06 samples/sec Loss 3.9297 LearningRate 0.0002 Epoch: 25 Global Step: 534270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:45,219-Speed 6312.04 samples/sec Loss 4.0281 LearningRate 0.0002 Epoch: 25 Global Step: 534280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:48,464-Speed 6313.89 samples/sec Loss 4.0585 LearningRate 0.0002 Epoch: 25 Global Step: 534290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:51,715-Speed 6300.15 samples/sec Loss 4.0089 LearningRate 0.0002 Epoch: 25 Global Step: 534300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:54,956-Speed 6320.86 samples/sec Loss 4.0717 LearningRate 0.0002 Epoch: 25 Global Step: 534310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:19:58,203-Speed 6307.62 samples/sec Loss 3.9979 LearningRate 0.0002 Epoch: 25 Global Step: 534320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:01,454-Speed 6301.07 samples/sec Loss 4.0217 LearningRate 0.0002 Epoch: 25 Global Step: 534330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:04,685-Speed 6339.55 samples/sec Loss 4.0595 LearningRate 0.0002 Epoch: 25 Global Step: 534340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:07,923-Speed 6326.93 samples/sec Loss 3.9553 LearningRate 0.0002 Epoch: 25 Global Step: 534350 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:11,165-Speed 6319.16 samples/sec Loss 4.0103 LearningRate 0.0002 Epoch: 25 Global Step: 534360 Fp16 Grad Scale: 
8192 Required: 27 hours Training: 2022-04-02 16:20:14,408-Speed 6316.27 samples/sec Loss 4.0278 LearningRate 0.0002 Epoch: 25 Global Step: 534370 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:17,650-Speed 6318.57 samples/sec Loss 4.0553 LearningRate 0.0002 Epoch: 25 Global Step: 534380 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:20,890-Speed 6322.19 samples/sec Loss 4.0269 LearningRate 0.0002 Epoch: 25 Global Step: 534390 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:24,135-Speed 6311.56 samples/sec Loss 4.0186 LearningRate 0.0002 Epoch: 25 Global Step: 534400 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:27,382-Speed 6309.02 samples/sec Loss 4.0392 LearningRate 0.0002 Epoch: 25 Global Step: 534410 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:30,625-Speed 6317.78 samples/sec Loss 4.0565 LearningRate 0.0002 Epoch: 25 Global Step: 534420 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:33,868-Speed 6316.65 samples/sec Loss 4.0027 LearningRate 0.0002 Epoch: 25 Global Step: 534430 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:37,110-Speed 6318.33 samples/sec Loss 3.9807 LearningRate 0.0002 Epoch: 25 Global Step: 534440 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:20:40,398-Speed 6229.41 samples/sec Loss 4.0356 LearningRate 0.0002 Epoch: 25 Global Step: 534450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:43,681-Speed 6239.74 samples/sec Loss 4.0328 LearningRate 0.0002 Epoch: 25 Global Step: 534460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:46,928-Speed 6308.62 samples/sec Loss 3.9928 LearningRate 0.0002 Epoch: 25 Global Step: 534470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:50,172-Speed 6315.32 samples/sec Loss 4.0481 LearningRate 0.0002 Epoch: 25 Global Step: 534480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 16:20:53,416-Speed 6314.71 samples/sec Loss 4.0569 LearningRate 0.0002 Epoch: 25 Global Step: 534490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:56,662-Speed 6310.62 samples/sec Loss 4.0396 LearningRate 0.0002 Epoch: 25 Global Step: 534500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:20:59,906-Speed 6314.51 samples/sec Loss 4.0195 LearningRate 0.0002 Epoch: 25 Global Step: 534510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:03,150-Speed 6314.04 samples/sec Loss 4.0728 LearningRate 0.0002 Epoch: 25 Global Step: 534520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:06,394-Speed 6314.54 samples/sec Loss 4.0751 LearningRate 0.0002 Epoch: 25 Global Step: 534530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:09,642-Speed 6307.24 samples/sec Loss 4.0355 LearningRate 0.0002 Epoch: 25 Global Step: 534540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:12,877-Speed 6331.87 samples/sec Loss 3.9952 LearningRate 0.0002 Epoch: 25 Global Step: 534550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:16,119-Speed 6318.92 samples/sec Loss 4.0337 LearningRate 0.0002 Epoch: 25 Global Step: 534560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:19,364-Speed 6312.46 samples/sec Loss 4.0128 LearningRate 0.0002 Epoch: 25 Global Step: 534570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:22,608-Speed 6316.10 samples/sec Loss 4.0398 LearningRate 0.0002 Epoch: 25 Global Step: 534580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:25,857-Speed 6303.60 samples/sec Loss 4.0499 LearningRate 0.0002 Epoch: 25 Global Step: 534590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:29,105-Speed 6308.37 samples/sec Loss 4.0135 LearningRate 0.0002 Epoch: 25 Global Step: 534600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:32,356-Speed 
6300.07 samples/sec Loss 4.0092 LearningRate 0.0002 Epoch: 25 Global Step: 534610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:35,599-Speed 6317.21 samples/sec Loss 4.0177 LearningRate 0.0002 Epoch: 25 Global Step: 534620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:38,846-Speed 6308.38 samples/sec Loss 4.0162 LearningRate 0.0002 Epoch: 25 Global Step: 534630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:42,120-Speed 6257.24 samples/sec Loss 4.0606 LearningRate 0.0002 Epoch: 25 Global Step: 534640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:45,351-Speed 6338.43 samples/sec Loss 4.0319 LearningRate 0.0002 Epoch: 25 Global Step: 534650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:48,596-Speed 6313.62 samples/sec Loss 4.0027 LearningRate 0.0002 Epoch: 25 Global Step: 534660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:21:51,827-Speed 6338.85 samples/sec Loss 4.0338 LearningRate 0.0002 Epoch: 25 Global Step: 534670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:21:55,071-Speed 6315.98 samples/sec Loss 4.0377 LearningRate 0.0002 Epoch: 25 Global Step: 534680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:21:58,349-Speed 6248.30 samples/sec Loss 4.0743 LearningRate 0.0002 Epoch: 25 Global Step: 534690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:01,597-Speed 6307.41 samples/sec Loss 4.0310 LearningRate 0.0002 Epoch: 25 Global Step: 534700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:04,842-Speed 6312.71 samples/sec Loss 4.0652 LearningRate 0.0002 Epoch: 25 Global Step: 534710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:08,084-Speed 6318.48 samples/sec Loss 4.0581 LearningRate 0.0002 Epoch: 25 Global Step: 534720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:11,330-Speed 6311.45 samples/sec Loss 3.9926 
LearningRate 0.0002 Epoch: 25 Global Step: 534730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:14,574-Speed 6314.66 samples/sec Loss 4.0080 LearningRate 0.0002 Epoch: 25 Global Step: 534740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:17,825-Speed 6301.19 samples/sec Loss 4.0208 LearningRate 0.0002 Epoch: 25 Global Step: 534750 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:21,069-Speed 6313.45 samples/sec Loss 4.0358 LearningRate 0.0002 Epoch: 25 Global Step: 534760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:22:24,313-Speed 6315.44 samples/sec Loss 4.0270 LearningRate 0.0002 Epoch: 25 Global Step: 534770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:27,557-Speed 6314.54 samples/sec Loss 4.0055 LearningRate 0.0002 Epoch: 25 Global Step: 534780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:30,799-Speed 6319.29 samples/sec Loss 4.0147 LearningRate 0.0002 Epoch: 25 Global Step: 534790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:34,046-Speed 6308.54 samples/sec Loss 4.0498 LearningRate 0.0002 Epoch: 25 Global Step: 534800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:37,286-Speed 6320.93 samples/sec Loss 4.0422 LearningRate 0.0002 Epoch: 25 Global Step: 534810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:40,535-Speed 6306.30 samples/sec Loss 4.0548 LearningRate 0.0002 Epoch: 25 Global Step: 534820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:43,790-Speed 6292.09 samples/sec Loss 4.0550 LearningRate 0.0002 Epoch: 25 Global Step: 534830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:47,034-Speed 6315.89 samples/sec Loss 3.9922 LearningRate 0.0002 Epoch: 25 Global Step: 534840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:50,278-Speed 6313.72 samples/sec Loss 3.9955 LearningRate 0.0002 Epoch: 25 
Global Step: 534850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:53,522-Speed 6314.62 samples/sec Loss 4.0386 LearningRate 0.0002 Epoch: 25 Global Step: 534860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:56,753-Speed 6339.91 samples/sec Loss 3.9822 LearningRate 0.0002 Epoch: 25 Global Step: 534870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:22:59,998-Speed 6313.37 samples/sec Loss 3.9934 LearningRate 0.0002 Epoch: 25 Global Step: 534880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:03,243-Speed 6311.16 samples/sec Loss 4.0508 LearningRate 0.0002 Epoch: 25 Global Step: 534890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:06,497-Speed 6297.15 samples/sec Loss 3.9932 LearningRate 0.0002 Epoch: 25 Global Step: 534900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:09,745-Speed 6306.65 samples/sec Loss 4.0011 LearningRate 0.0002 Epoch: 25 Global Step: 534910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:12,990-Speed 6313.33 samples/sec Loss 4.0045 LearningRate 0.0002 Epoch: 25 Global Step: 534920 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:16,235-Speed 6312.27 samples/sec Loss 4.0438 LearningRate 0.0002 Epoch: 25 Global Step: 534930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:19,482-Speed 6308.85 samples/sec Loss 4.0924 LearningRate 0.0002 Epoch: 25 Global Step: 534940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:22,728-Speed 6309.92 samples/sec Loss 4.0611 LearningRate 0.0002 Epoch: 25 Global Step: 534950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:25,978-Speed 6304.01 samples/sec Loss 3.9976 LearningRate 0.0002 Epoch: 25 Global Step: 534960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:29,207-Speed 6343.59 samples/sec Loss 3.9883 LearningRate 0.0002 Epoch: 25 Global Step: 534970 Fp16 Grad 
Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:32,457-Speed 6302.94 samples/sec Loss 3.9092 LearningRate 0.0002 Epoch: 25 Global Step: 534980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:35,703-Speed 6311.03 samples/sec Loss 4.0529 LearningRate 0.0002 Epoch: 25 Global Step: 534990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:38,950-Speed 6308.23 samples/sec Loss 3.9403 LearningRate 0.0002 Epoch: 25 Global Step: 535000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:42,198-Speed 6306.87 samples/sec Loss 4.0225 LearningRate 0.0002 Epoch: 25 Global Step: 535010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:45,445-Speed 6308.21 samples/sec Loss 4.0009 LearningRate 0.0002 Epoch: 25 Global Step: 535020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:48,693-Speed 6308.28 samples/sec Loss 4.0075 LearningRate 0.0002 Epoch: 25 Global Step: 535030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:23:51,924-Speed 6338.47 samples/sec Loss 4.0232 LearningRate 0.0002 Epoch: 25 Global Step: 535040 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:23:55,169-Speed 6312.48 samples/sec Loss 4.0102 LearningRate 0.0002 Epoch: 25 Global Step: 535050 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:23:58,414-Speed 6313.92 samples/sec Loss 4.0587 LearningRate 0.0002 Epoch: 25 Global Step: 535060 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:01,676-Speed 6278.65 samples/sec Loss 4.0096 LearningRate 0.0002 Epoch: 25 Global Step: 535070 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:04,920-Speed 6314.59 samples/sec Loss 4.0241 LearningRate 0.0002 Epoch: 25 Global Step: 535080 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:08,166-Speed 6311.29 samples/sec Loss 4.0827 LearningRate 0.0002 Epoch: 25 Global Step: 535090 Fp16 Grad Scale: 8192 Required: 27 hours 
Training: 2022-04-02 16:24:11,411-Speed 6313.40 samples/sec Loss 3.9715 LearningRate 0.0002 Epoch: 25 Global Step: 535100 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:14,730-Speed 6172.43 samples/sec Loss 4.0590 LearningRate 0.0002 Epoch: 25 Global Step: 535110 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:18,078-Speed 6118.20 samples/sec Loss 4.0232 LearningRate 0.0002 Epoch: 25 Global Step: 535120 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:21,320-Speed 6318.91 samples/sec Loss 3.9747 LearningRate 0.0002 Epoch: 25 Global Step: 535130 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:24:24,567-Speed 6308.39 samples/sec Loss 4.0457 LearningRate 0.0002 Epoch: 25 Global Step: 535140 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:27,812-Speed 6311.10 samples/sec Loss 3.9646 LearningRate 0.0002 Epoch: 25 Global Step: 535150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:31,059-Speed 6310.73 samples/sec Loss 4.0520 LearningRate 0.0002 Epoch: 25 Global Step: 535160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:34,309-Speed 6301.42 samples/sec Loss 4.0451 LearningRate 0.0002 Epoch: 25 Global Step: 535170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:37,554-Speed 6313.18 samples/sec Loss 3.9622 LearningRate 0.0002 Epoch: 25 Global Step: 535180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:40,802-Speed 6307.50 samples/sec Loss 4.0296 LearningRate 0.0002 Epoch: 25 Global Step: 535190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:44,042-Speed 6321.63 samples/sec Loss 4.0238 LearningRate 0.0002 Epoch: 25 Global Step: 535200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:47,294-Speed 6300.55 samples/sec Loss 4.0711 LearningRate 0.0002 Epoch: 25 Global Step: 535210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 
16:24:50,538-Speed 6313.61 samples/sec Loss 4.0738 LearningRate 0.0002 Epoch: 25 Global Step: 535220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:53,783-Speed 6312.15 samples/sec Loss 3.9556 LearningRate 0.0002 Epoch: 25 Global Step: 535230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:24:57,023-Speed 6323.58 samples/sec Loss 3.9737 LearningRate 0.0002 Epoch: 25 Global Step: 535240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:00,267-Speed 6314.16 samples/sec Loss 3.9965 LearningRate 0.0002 Epoch: 25 Global Step: 535250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:03,519-Speed 6297.93 samples/sec Loss 4.0471 LearningRate 0.0002 Epoch: 25 Global Step: 535260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:06,765-Speed 6310.51 samples/sec Loss 4.0144 LearningRate 0.0002 Epoch: 25 Global Step: 535270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:10,007-Speed 6319.61 samples/sec Loss 4.0413 LearningRate 0.0002 Epoch: 25 Global Step: 535280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:13,253-Speed 6309.93 samples/sec Loss 4.0237 LearningRate 0.0002 Epoch: 25 Global Step: 535290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:16,497-Speed 6315.92 samples/sec Loss 3.9953 LearningRate 0.0002 Epoch: 25 Global Step: 535300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:19,745-Speed 6307.75 samples/sec Loss 3.9630 LearningRate 0.0002 Epoch: 25 Global Step: 535310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:22,989-Speed 6314.17 samples/sec Loss 3.9882 LearningRate 0.0002 Epoch: 25 Global Step: 535320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:26,239-Speed 6303.78 samples/sec Loss 4.0054 LearningRate 0.0002 Epoch: 25 Global Step: 535330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:29,471-Speed 6337.84 
samples/sec Loss 3.9549 LearningRate 0.0002 Epoch: 25 Global Step: 535340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:32,715-Speed 6314.45 samples/sec Loss 4.0717 LearningRate 0.0002 Epoch: 25 Global Step: 535350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:35,967-Speed 6297.95 samples/sec Loss 3.9993 LearningRate 0.0002 Epoch: 25 Global Step: 535360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:39,211-Speed 6315.96 samples/sec Loss 3.9956 LearningRate 0.0002 Epoch: 25 Global Step: 535370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:42,456-Speed 6312.25 samples/sec Loss 4.0250 LearningRate 0.0002 Epoch: 25 Global Step: 535380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:45,703-Speed 6309.34 samples/sec Loss 4.0805 LearningRate 0.0002 Epoch: 25 Global Step: 535390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:48,951-Speed 6305.69 samples/sec Loss 3.9728 LearningRate 0.0002 Epoch: 25 Global Step: 535400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:52,195-Speed 6314.06 samples/sec Loss 4.0280 LearningRate 0.0002 Epoch: 25 Global Step: 535410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:55,439-Speed 6314.91 samples/sec Loss 4.0518 LearningRate 0.0002 Epoch: 25 Global Step: 535420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:25:58,681-Speed 6318.95 samples/sec Loss 3.9939 LearningRate 0.0002 Epoch: 25 Global Step: 535430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:01,918-Speed 6328.55 samples/sec Loss 4.0005 LearningRate 0.0002 Epoch: 25 Global Step: 535440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:05,161-Speed 6316.42 samples/sec Loss 4.0356 LearningRate 0.0002 Epoch: 25 Global Step: 535450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:08,408-Speed 6307.52 samples/sec Loss 3.9882 
LearningRate 0.0002 Epoch: 25 Global Step: 535460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:11,654-Speed 6311.26 samples/sec Loss 3.9704 LearningRate 0.0002 Epoch: 25 Global Step: 535470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:14,895-Speed 6320.46 samples/sec Loss 3.9903 LearningRate 0.0002 Epoch: 25 Global Step: 535480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:18,137-Speed 6318.80 samples/sec Loss 3.9881 LearningRate 0.0002 Epoch: 25 Global Step: 535490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:21,385-Speed 6307.55 samples/sec Loss 3.9206 LearningRate 0.0002 Epoch: 25 Global Step: 535500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:26:24,667-Speed 6240.94 samples/sec Loss 4.0110 LearningRate 0.0002 Epoch: 25 Global Step: 535510 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:27,949-Speed 6240.91 samples/sec Loss 3.9975 LearningRate 0.0002 Epoch: 25 Global Step: 535520 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:31,198-Speed 6305.95 samples/sec Loss 4.0256 LearningRate 0.0002 Epoch: 25 Global Step: 535530 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:34,438-Speed 6321.73 samples/sec Loss 4.0230 LearningRate 0.0002 Epoch: 25 Global Step: 535540 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:37,682-Speed 6316.06 samples/sec Loss 3.9885 LearningRate 0.0002 Epoch: 25 Global Step: 535550 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:40,923-Speed 6319.55 samples/sec Loss 4.0007 LearningRate 0.0002 Epoch: 25 Global Step: 535560 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:44,166-Speed 6317.11 samples/sec Loss 3.9941 LearningRate 0.0002 Epoch: 25 Global Step: 535570 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:47,410-Speed 6313.61 samples/sec Loss 4.0913 LearningRate 0.0002 Epoch: 25 Global 
Step: 535580 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:50,655-Speed 6313.16 samples/sec Loss 3.9857 LearningRate 0.0002 Epoch: 25 Global Step: 535590 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:53,899-Speed 6313.86 samples/sec Loss 3.9840 LearningRate 0.0002 Epoch: 25 Global Step: 535600 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:26:57,147-Speed 6308.00 samples/sec Loss 4.0701 LearningRate 0.0002 Epoch: 25 Global Step: 535610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:00,391-Speed 6314.28 samples/sec Loss 3.9838 LearningRate 0.0002 Epoch: 25 Global Step: 535620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:03,639-Speed 6307.17 samples/sec Loss 4.0818 LearningRate 0.0002 Epoch: 25 Global Step: 535630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:06,885-Speed 6310.26 samples/sec Loss 4.0593 LearningRate 0.0002 Epoch: 25 Global Step: 535640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:10,130-Speed 6312.25 samples/sec Loss 4.0115 LearningRate 0.0002 Epoch: 25 Global Step: 535650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:13,371-Speed 6321.82 samples/sec Loss 4.0202 LearningRate 0.0002 Epoch: 25 Global Step: 535660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:16,619-Speed 6307.08 samples/sec Loss 4.0097 LearningRate 0.0002 Epoch: 25 Global Step: 535670 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:19,862-Speed 6314.87 samples/sec Loss 4.0486 LearningRate 0.0002 Epoch: 25 Global Step: 535680 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:23,110-Speed 6306.89 samples/sec Loss 4.0612 LearningRate 0.0002 Epoch: 25 Global Step: 535690 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:26,358-Speed 6307.71 samples/sec Loss 4.0240 LearningRate 0.0002 Epoch: 25 Global Step: 535700 Fp16 Grad Scale: 
16384 Required: 27 hours Training: 2022-04-02 16:27:29,593-Speed 6332.42 samples/sec Loss 3.9516 LearningRate 0.0002 Epoch: 25 Global Step: 535710 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:32,837-Speed 6313.16 samples/sec Loss 4.0429 LearningRate 0.0002 Epoch: 25 Global Step: 535720 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:36,081-Speed 6314.19 samples/sec Loss 4.0275 LearningRate 0.0002 Epoch: 25 Global Step: 535730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:39,324-Speed 6316.76 samples/sec Loss 4.0021 LearningRate 0.0002 Epoch: 25 Global Step: 535740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:42,574-Speed 6304.89 samples/sec Loss 3.9999 LearningRate 0.0002 Epoch: 25 Global Step: 535750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:45,818-Speed 6314.35 samples/sec Loss 3.9866 LearningRate 0.0002 Epoch: 25 Global Step: 535760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:49,064-Speed 6310.30 samples/sec Loss 4.0582 LearningRate 0.0002 Epoch: 25 Global Step: 535770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:52,309-Speed 6313.64 samples/sec Loss 3.9733 LearningRate 0.0002 Epoch: 25 Global Step: 535780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:55,559-Speed 6301.62 samples/sec Loss 3.9975 LearningRate 0.0002 Epoch: 25 Global Step: 535790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:27:58,803-Speed 6315.76 samples/sec Loss 3.9745 LearningRate 0.0002 Epoch: 25 Global Step: 535800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:02,033-Speed 6341.25 samples/sec Loss 3.9991 LearningRate 0.0002 Epoch: 25 Global Step: 535810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:05,279-Speed 6310.54 samples/sec Loss 3.9738 LearningRate 0.0002 Epoch: 25 Global Step: 535820 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 16:28:08,521-Speed 6318.58 samples/sec Loss 4.0039 LearningRate 0.0002 Epoch: 25 Global Step: 535830 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:11,770-Speed 6305.78 samples/sec Loss 4.0027 LearningRate 0.0002 Epoch: 25 Global Step: 535840 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:15,011-Speed 6318.90 samples/sec Loss 4.0443 LearningRate 0.0002 Epoch: 25 Global Step: 535850 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:18,255-Speed 6316.37 samples/sec Loss 4.0629 LearningRate 0.0002 Epoch: 25 Global Step: 535860 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:21,495-Speed 6320.93 samples/sec Loss 3.9414 LearningRate 0.0002 Epoch: 25 Global Step: 535870 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:24,744-Speed 6304.78 samples/sec Loss 3.9960 LearningRate 0.0002 Epoch: 25 Global Step: 535880 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:28:27,980-Speed 6330.71 samples/sec Loss 3.9857 LearningRate 0.0002 Epoch: 25 Global Step: 535890 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:31,226-Speed 6310.35 samples/sec Loss 3.9626 LearningRate 0.0002 Epoch: 25 Global Step: 535900 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:34,470-Speed 6314.07 samples/sec Loss 3.9943 LearningRate 0.0002 Epoch: 25 Global Step: 535910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:37,716-Speed 6311.06 samples/sec Loss 3.9741 LearningRate 0.0002 Epoch: 25 Global Step: 535920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:40,973-Speed 6290.01 samples/sec Loss 4.0584 LearningRate 0.0002 Epoch: 25 Global Step: 535930 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:44,219-Speed 6310.84 samples/sec Loss 4.0048 LearningRate 0.0002 Epoch: 25 Global Step: 535940 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 
16:28:47,462-Speed 6315.77 samples/sec Loss 3.9709 LearningRate 0.0002 Epoch: 25 Global Step: 535950 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:50,709-Speed 6309.53 samples/sec Loss 4.0087 LearningRate 0.0002 Epoch: 25 Global Step: 535960 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:53,951-Speed 6318.19 samples/sec Loss 3.9916 LearningRate 0.0002 Epoch: 25 Global Step: 535970 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:28:57,194-Speed 6316.95 samples/sec Loss 3.9418 LearningRate 0.0002 Epoch: 25 Global Step: 535980 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:00,443-Speed 6305.45 samples/sec Loss 4.0650 LearningRate 0.0002 Epoch: 25 Global Step: 535990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:03,686-Speed 6316.17 samples/sec Loss 4.0644 LearningRate 0.0002 Epoch: 25 Global Step: 536000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:06,931-Speed 6314.24 samples/sec Loss 3.9849 LearningRate 0.0002 Epoch: 25 Global Step: 536010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:10,172-Speed 6319.97 samples/sec Loss 4.0570 LearningRate 0.0002 Epoch: 25 Global Step: 536020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:13,417-Speed 6312.95 samples/sec Loss 4.0398 LearningRate 0.0002 Epoch: 25 Global Step: 536030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:16,662-Speed 6312.94 samples/sec Loss 4.0008 LearningRate 0.0002 Epoch: 25 Global Step: 536040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:19,892-Speed 6340.65 samples/sec Loss 4.1416 LearningRate 0.0002 Epoch: 25 Global Step: 536050 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:23,138-Speed 6310.77 samples/sec Loss 3.9170 LearningRate 0.0002 Epoch: 25 Global Step: 536060 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:26,386-Speed 6307.24 
samples/sec Loss 3.9964 LearningRate 0.0002 Epoch: 25 Global Step: 536070 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:29,630-Speed 6315.11 samples/sec Loss 3.9715 LearningRate 0.0002 Epoch: 25 Global Step: 536080 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:32,872-Speed 6317.90 samples/sec Loss 4.0577 LearningRate 0.0002 Epoch: 25 Global Step: 536090 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:36,126-Speed 6293.99 samples/sec Loss 4.0481 LearningRate 0.0002 Epoch: 25 Global Step: 536100 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:39,372-Speed 6311.32 samples/sec Loss 3.9655 LearningRate 0.0002 Epoch: 25 Global Step: 536110 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:42,618-Speed 6311.50 samples/sec Loss 3.9999 LearningRate 0.0002 Epoch: 25 Global Step: 536120 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:45,862-Speed 6313.82 samples/sec Loss 4.0098 LearningRate 0.0002 Epoch: 25 Global Step: 536130 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:49,105-Speed 6318.16 samples/sec Loss 3.9490 LearningRate 0.0002 Epoch: 25 Global Step: 536140 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:29:52,347-Speed 6317.71 samples/sec Loss 3.9751 LearningRate 0.0002 Epoch: 25 Global Step: 536150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:55,590-Speed 6316.33 samples/sec Loss 3.9735 LearningRate 0.0002 Epoch: 25 Global Step: 536160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:29:58,842-Speed 6299.85 samples/sec Loss 3.9723 LearningRate 0.0002 Epoch: 25 Global Step: 536170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:02,087-Speed 6312.88 samples/sec Loss 4.0334 LearningRate 0.0002 Epoch: 25 Global Step: 536180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:05,332-Speed 6311.52 samples/sec Loss 3.9984 LearningRate 
0.0002 Epoch: 25 Global Step: 536190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:08,573-Speed 6320.57 samples/sec Loss 3.9405 LearningRate 0.0002 Epoch: 25 Global Step: 536200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:11,824-Speed 6302.54 samples/sec Loss 4.0258 LearningRate 0.0002 Epoch: 25 Global Step: 536210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:15,069-Speed 6311.41 samples/sec Loss 3.9635 LearningRate 0.0002 Epoch: 25 Global Step: 536220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:18,314-Speed 6312.70 samples/sec Loss 4.0082 LearningRate 0.0002 Epoch: 25 Global Step: 536230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:21,561-Speed 6309.84 samples/sec Loss 4.0248 LearningRate 0.0002 Epoch: 25 Global Step: 536240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:24,810-Speed 6303.94 samples/sec Loss 4.0123 LearningRate 0.0002 Epoch: 25 Global Step: 536250 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-02 16:30:28,046-Speed 6331.19 samples/sec Loss 3.9914 LearningRate 0.0002 Epoch: 25 Global Step: 536260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:31,291-Speed 6312.01 samples/sec Loss 4.0499 LearningRate 0.0002 Epoch: 25 Global Step: 536270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:30:34,524-Speed 6335.68 samples/sec Loss 3.9952 LearningRate 0.0002 Epoch: 25 Global Step: 536280 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:37,770-Speed 6310.07 samples/sec Loss 3.9830 LearningRate 0.0002 Epoch: 25 Global Step: 536290 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:41,016-Speed 6313.83 samples/sec Loss 3.9643 LearningRate 0.0002 Epoch: 25 Global Step: 536300 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:44,257-Speed 6320.85 samples/sec Loss 3.9307 LearningRate 0.0002 Epoch: 25 Global Step: 
536310 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:47,506-Speed 6306.42 samples/sec Loss 3.9679 LearningRate 0.0002 Epoch: 25 Global Step: 536320 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:50,749-Speed 6315.50 samples/sec Loss 4.0228 LearningRate 0.0002 Epoch: 25 Global Step: 536330 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:53,990-Speed 6321.00 samples/sec Loss 3.9922 LearningRate 0.0002 Epoch: 25 Global Step: 536340 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:30:57,236-Speed 6310.22 samples/sec Loss 4.0073 LearningRate 0.0002 Epoch: 25 Global Step: 536350 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:31:00,497-Speed 6281.82 samples/sec Loss 3.9913 LearningRate 0.0002 Epoch: 25 Global Step: 536360 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:31:03,745-Speed 6306.68 samples/sec Loss 4.0497 LearningRate 0.0002 Epoch: 25 Global Step: 536370 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:31:06,991-Speed 6310.38 samples/sec Loss 3.9830 LearningRate 0.0002 Epoch: 25 Global Step: 536380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:10,239-Speed 6308.06 samples/sec Loss 4.0258 LearningRate 0.0002 Epoch: 25 Global Step: 536390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:13,485-Speed 6310.75 samples/sec Loss 4.0044 LearningRate 0.0002 Epoch: 25 Global Step: 536400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:16,728-Speed 6316.54 samples/sec Loss 3.9758 LearningRate 0.0002 Epoch: 25 Global Step: 536410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:19,974-Speed 6311.98 samples/sec Loss 4.0364 LearningRate 0.0002 Epoch: 25 Global Step: 536420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:23,219-Speed 6311.09 samples/sec Loss 3.9813 LearningRate 0.0002 Epoch: 25 Global Step: 536430 Fp16 Grad Scale: 16384 
Required: 27 hours Training: 2022-04-02 16:31:26,466-Speed 6309.04 samples/sec Loss 3.9981 LearningRate 0.0002 Epoch: 25 Global Step: 536440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:29,709-Speed 6316.62 samples/sec Loss 4.0353 LearningRate 0.0002 Epoch: 25 Global Step: 536450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:32,956-Speed 6308.81 samples/sec Loss 3.9765 LearningRate 0.0002 Epoch: 25 Global Step: 536460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:36,199-Speed 6315.52 samples/sec Loss 3.9935 LearningRate 0.0002 Epoch: 25 Global Step: 536470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:39,433-Speed 6335.55 samples/sec Loss 4.0196 LearningRate 0.0002 Epoch: 25 Global Step: 536480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:42,685-Speed 6299.12 samples/sec Loss 4.0014 LearningRate 0.0002 Epoch: 25 Global Step: 536490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:45,925-Speed 6321.84 samples/sec Loss 4.0200 LearningRate 0.0002 Epoch: 25 Global Step: 536500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:49,169-Speed 6314.76 samples/sec Loss 4.0151 LearningRate 0.0002 Epoch: 25 Global Step: 536510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:52,416-Speed 6309.24 samples/sec Loss 3.9947 LearningRate 0.0002 Epoch: 25 Global Step: 536520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:55,663-Speed 6307.47 samples/sec Loss 4.0018 LearningRate 0.0002 Epoch: 25 Global Step: 536530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:31:58,907-Speed 6315.85 samples/sec Loss 4.0537 LearningRate 0.0002 Epoch: 25 Global Step: 536540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:02,151-Speed 6314.69 samples/sec Loss 4.0705 LearningRate 0.0002 Epoch: 25 Global Step: 536550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 16:32:05,397-Speed 6310.30 samples/sec Loss 4.0007 LearningRate 0.0002 Epoch: 25 Global Step: 536560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:08,642-Speed 6311.93 samples/sec Loss 3.9906 LearningRate 0.0002 Epoch: 25 Global Step: 536570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:11,870-Speed 6345.51 samples/sec Loss 3.9759 LearningRate 0.0002 Epoch: 25 Global Step: 536580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:15,111-Speed 6321.03 samples/sec Loss 4.0264 LearningRate 0.0002 Epoch: 25 Global Step: 536590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:18,357-Speed 6312.40 samples/sec Loss 4.0029 LearningRate 0.0002 Epoch: 25 Global Step: 536600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:21,600-Speed 6316.45 samples/sec Loss 4.0016 LearningRate 0.0002 Epoch: 25 Global Step: 536610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:24,842-Speed 6318.34 samples/sec Loss 3.9376 LearningRate 0.0002 Epoch: 25 Global Step: 536620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:28,086-Speed 6314.14 samples/sec Loss 3.9474 LearningRate 0.0002 Epoch: 25 Global Step: 536630 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:31,330-Speed 6314.35 samples/sec Loss 3.9960 LearningRate 0.0002 Epoch: 25 Global Step: 536640 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:34,576-Speed 6312.40 samples/sec Loss 3.9378 LearningRate 0.0002 Epoch: 25 Global Step: 536650 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:37,822-Speed 6310.42 samples/sec Loss 4.0594 LearningRate 0.0002 Epoch: 25 Global Step: 536660 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:32:41,049-Speed 6347.44 samples/sec Loss 3.9430 LearningRate 0.0002 Epoch: 25 Global Step: 536670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:32:44,292-Speed 
6315.98 samples/sec Loss 3.9980 LearningRate 0.0002 Epoch: 25 Global Step: 536680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:32:47,534-Speed 6319.87 samples/sec Loss 3.9353 LearningRate 0.0002 Epoch: 25 Global Step: 536690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:32:50,778-Speed 6313.52 samples/sec Loss 4.0370 LearningRate 0.0002 Epoch: 25 Global Step: 536700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:32:54,022-Speed 6314.94 samples/sec Loss 3.9869 LearningRate 0.0002 Epoch: 25 Global Step: 536710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:32:57,265-Speed 6315.58 samples/sec Loss 3.9801 LearningRate 0.0002 Epoch: 25 Global Step: 536720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:00,513-Speed 6308.37 samples/sec Loss 4.0048 LearningRate 0.0002 Epoch: 25 Global Step: 536730 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:03,782-Speed 6265.70 samples/sec Loss 3.9758 LearningRate 0.0002 Epoch: 25 Global Step: 536740 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:07,039-Speed 6288.57 samples/sec Loss 3.9755 LearningRate 0.0002 Epoch: 25 Global Step: 536750 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:10,288-Speed 6305.23 samples/sec Loss 4.0155 LearningRate 0.0002 Epoch: 25 Global Step: 536760 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:13,604-Speed 6176.76 samples/sec Loss 3.9871 LearningRate 0.0002 Epoch: 25 Global Step: 536770 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:33:16,851-Speed 6310.12 samples/sec Loss 4.0113 LearningRate 0.0002 Epoch: 25 Global Step: 536780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:33:20,085-Speed 6333.79 samples/sec Loss 4.0095 LearningRate 0.0002 Epoch: 25 Global Step: 536790 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:23,328-Speed 6315.53 samples/sec Loss 4.0837 
LearningRate 0.0002 Epoch: 25 Global Step: 536800 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:26,572-Speed 6315.37 samples/sec Loss 4.0311 LearningRate 0.0002 Epoch: 25 Global Step: 536810 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:29,820-Speed 6307.60 samples/sec Loss 4.0421 LearningRate 0.0002 Epoch: 25 Global Step: 536820 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:33,065-Speed 6312.11 samples/sec Loss 4.0423 LearningRate 0.0002 Epoch: 25 Global Step: 536830 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:36,312-Speed 6309.01 samples/sec Loss 4.0749 LearningRate 0.0002 Epoch: 25 Global Step: 536840 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:39,560-Speed 6308.55 samples/sec Loss 4.0404 LearningRate 0.0002 Epoch: 25 Global Step: 536850 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:42,801-Speed 6319.36 samples/sec Loss 4.0206 LearningRate 0.0002 Epoch: 25 Global Step: 536860 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:46,047-Speed 6310.36 samples/sec Loss 3.9540 LearningRate 0.0002 Epoch: 25 Global Step: 536870 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:49,295-Speed 6308.05 samples/sec Loss 4.0749 LearningRate 0.0002 Epoch: 25 Global Step: 536880 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:33:52,539-Speed 6312.86 samples/sec Loss 3.9997 LearningRate 0.0002 Epoch: 25 Global Step: 536890 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:33:55,785-Speed 6311.98 samples/sec Loss 4.0007 LearningRate 0.0002 Epoch: 25 Global Step: 536900 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:33:59,034-Speed 6304.74 samples/sec Loss 4.1072 LearningRate 0.0002 Epoch: 25 Global Step: 536910 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:02,278-Speed 6314.24 samples/sec Loss 3.9911 LearningRate 0.0002 Epoch: 25 Global 
Step: 536920 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:05,522-Speed 6315.12 samples/sec Loss 3.9850 LearningRate 0.0002 Epoch: 25 Global Step: 536930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:08,766-Speed 6314.63 samples/sec Loss 4.0106 LearningRate 0.0002 Epoch: 25 Global Step: 536940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:12,011-Speed 6312.59 samples/sec Loss 4.0758 LearningRate 0.0002 Epoch: 25 Global Step: 536950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:15,253-Speed 6317.68 samples/sec Loss 3.9930 LearningRate 0.0002 Epoch: 25 Global Step: 536960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:18,500-Speed 6309.71 samples/sec Loss 4.0038 LearningRate 0.0002 Epoch: 25 Global Step: 536970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:21,750-Speed 6303.21 samples/sec Loss 3.9377 LearningRate 0.0002 Epoch: 25 Global Step: 536980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:24,979-Speed 6342.66 samples/sec Loss 3.9244 LearningRate 0.0002 Epoch: 25 Global Step: 536990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:28,222-Speed 6315.73 samples/sec Loss 4.0107 LearningRate 0.0002 Epoch: 25 Global Step: 537000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:31,465-Speed 6316.39 samples/sec Loss 3.9823 LearningRate 0.0002 Epoch: 25 Global Step: 537010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:34,707-Speed 6318.75 samples/sec Loss 4.0294 LearningRate 0.0002 Epoch: 25 Global Step: 537020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:37,957-Speed 6305.34 samples/sec Loss 3.9895 LearningRate 0.0002 Epoch: 25 Global Step: 537030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:34:41,202-Speed 6311.54 samples/sec Loss 4.0307 LearningRate 0.0002 Epoch: 25 Global Step: 537040 Fp16 Grad Scale: 
16384 Required: 27 hours Training: 2022-04-02 16:34:44,434-Speed 6338.90 samples/sec Loss 4.0117 LearningRate 0.0002 Epoch: 25 Global Step: 537050 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:34:47,679-Speed 6316.64 samples/sec Loss 3.9887 LearningRate 0.0002 Epoch: 25 Global Step: 537060 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:34:50,923-Speed 6315.59 samples/sec Loss 4.0000 LearningRate 0.0002 Epoch: 25 Global Step: 537070 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:34:54,166-Speed 6315.86 samples/sec Loss 3.9471 LearningRate 0.0002 Epoch: 25 Global Step: 537080 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:34:57,420-Speed 6295.00 samples/sec Loss 3.9937 LearningRate 0.0002 Epoch: 25 Global Step: 537090 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:00,665-Speed 6312.15 samples/sec Loss 4.0263 LearningRate 0.0002 Epoch: 25 Global Step: 537100 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:03,911-Speed 6310.82 samples/sec Loss 4.0026 LearningRate 0.0002 Epoch: 25 Global Step: 537110 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:07,155-Speed 6315.22 samples/sec Loss 3.9720 LearningRate 0.0002 Epoch: 25 Global Step: 537120 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:10,394-Speed 6323.26 samples/sec Loss 3.9751 LearningRate 0.0002 Epoch: 25 Global Step: 537130 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:13,639-Speed 6313.86 samples/sec Loss 4.0125 LearningRate 0.0002 Epoch: 25 Global Step: 537140 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:35:16,880-Speed 6319.71 samples/sec Loss 4.0632 LearningRate 0.0002 Epoch: 25 Global Step: 537150 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:20,123-Speed 6316.17 samples/sec Loss 4.0621 LearningRate 0.0002 Epoch: 25 Global Step: 537160 Fp16 Grad Scale: 16384 Required: 27 hours Training: 
2022-04-02 16:35:23,372-Speed 6306.09 samples/sec Loss 3.9636 LearningRate 0.0002 Epoch: 25 Global Step: 537170 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:26,619-Speed 6307.89 samples/sec Loss 3.9794 LearningRate 0.0002 Epoch: 25 Global Step: 537180 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:29,868-Speed 6305.53 samples/sec Loss 4.0420 LearningRate 0.0002 Epoch: 25 Global Step: 537190 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:33,111-Speed 6315.91 samples/sec Loss 4.0616 LearningRate 0.0002 Epoch: 25 Global Step: 537200 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:36,363-Speed 6300.61 samples/sec Loss 3.9816 LearningRate 0.0002 Epoch: 25 Global Step: 537210 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:39,607-Speed 6313.04 samples/sec Loss 4.0089 LearningRate 0.0002 Epoch: 25 Global Step: 537220 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:42,855-Speed 6307.76 samples/sec Loss 3.9947 LearningRate 0.0002 Epoch: 25 Global Step: 537230 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:46,100-Speed 6313.96 samples/sec Loss 3.9818 LearningRate 0.0002 Epoch: 25 Global Step: 537240 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:49,329-Speed 6343.14 samples/sec Loss 4.0677 LearningRate 0.0002 Epoch: 25 Global Step: 537250 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:52,583-Speed 6296.22 samples/sec Loss 3.9860 LearningRate 0.0002 Epoch: 25 Global Step: 537260 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:55,828-Speed 6312.19 samples/sec Loss 3.9805 LearningRate 0.0002 Epoch: 25 Global Step: 537270 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:35:59,075-Speed 6308.28 samples/sec Loss 4.0185 LearningRate 0.0002 Epoch: 25 Global Step: 537280 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:02,322-Speed 
6309.46 samples/sec Loss 3.9771 LearningRate 0.0002 Epoch: 25 Global Step: 537290 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:05,568-Speed 6309.84 samples/sec Loss 3.9701 LearningRate 0.0002 Epoch: 25 Global Step: 537300 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:08,814-Speed 6311.57 samples/sec Loss 4.0625 LearningRate 0.0002 Epoch: 25 Global Step: 537310 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:12,067-Speed 6295.90 samples/sec Loss 3.9900 LearningRate 0.0002 Epoch: 25 Global Step: 537320 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:15,389-Speed 6166.60 samples/sec Loss 4.0224 LearningRate 0.0002 Epoch: 25 Global Step: 537330 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:18,632-Speed 6316.45 samples/sec Loss 3.9979 LearningRate 0.0002 Epoch: 25 Global Step: 537340 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:21,867-Speed 6332.17 samples/sec Loss 4.0344 LearningRate 0.0002 Epoch: 25 Global Step: 537350 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:25,114-Speed 6309.03 samples/sec Loss 4.0231 LearningRate 0.0002 Epoch: 25 Global Step: 537360 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:28,362-Speed 6306.59 samples/sec Loss 4.0886 LearningRate 0.0002 Epoch: 25 Global Step: 537370 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:31,610-Speed 6307.49 samples/sec Loss 4.0349 LearningRate 0.0002 Epoch: 25 Global Step: 537380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:34,870-Speed 6282.46 samples/sec Loss 3.9861 LearningRate 0.0002 Epoch: 25 Global Step: 537390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:38,172-Speed 6204.72 samples/sec Loss 4.0508 LearningRate 0.0002 Epoch: 25 Global Step: 537400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:41,419-Speed 6308.31 samples/sec Loss 4.0403 
LearningRate 0.0002 Epoch: 25 Global Step: 537410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:44,664-Speed 6312.47 samples/sec Loss 3.9720 LearningRate 0.0002 Epoch: 25 Global Step: 537420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:47,914-Speed 6303.41 samples/sec Loss 3.9940 LearningRate 0.0002 Epoch: 25 Global Step: 537430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:51,158-Speed 6314.96 samples/sec Loss 4.0106 LearningRate 0.0002 Epoch: 25 Global Step: 537440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:54,389-Speed 6339.74 samples/sec Loss 3.9515 LearningRate 0.0002 Epoch: 25 Global Step: 537450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:36:57,635-Speed 6311.85 samples/sec Loss 3.9689 LearningRate 0.0002 Epoch: 25 Global Step: 537460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:00,906-Speed 6262.71 samples/sec Loss 3.9929 LearningRate 0.0002 Epoch: 25 Global Step: 537470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:04,156-Speed 6304.84 samples/sec Loss 3.9398 LearningRate 0.0002 Epoch: 25 Global Step: 537480 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:07,402-Speed 6309.97 samples/sec Loss 3.9983 LearningRate 0.0002 Epoch: 25 Global Step: 537490 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:10,644-Speed 6318.04 samples/sec Loss 3.9850 LearningRate 0.0002 Epoch: 25 Global Step: 537500 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:13,892-Speed 6307.24 samples/sec Loss 3.9534 LearningRate 0.0002 Epoch: 25 Global Step: 537510 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:17,139-Speed 6308.65 samples/sec Loss 3.9743 LearningRate 0.0002 Epoch: 25 Global Step: 537520 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:20,387-Speed 6306.91 samples/sec Loss 4.0520 LearningRate 0.0002 Epoch: 25 
Global Step: 537530 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:23,632-Speed 6313.21 samples/sec Loss 3.9557 LearningRate 0.0002 Epoch: 25 Global Step: 537540 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:26,865-Speed 6334.90 samples/sec Loss 3.9468 LearningRate 0.0002 Epoch: 25 Global Step: 537550 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:30,108-Speed 6316.85 samples/sec Loss 4.0653 LearningRate 0.0002 Epoch: 25 Global Step: 537560 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:33,355-Speed 6309.63 samples/sec Loss 3.9906 LearningRate 0.0002 Epoch: 25 Global Step: 537570 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:36,603-Speed 6307.16 samples/sec Loss 3.9936 LearningRate 0.0002 Epoch: 25 Global Step: 537580 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:39,847-Speed 6313.40 samples/sec Loss 3.9798 LearningRate 0.0002 Epoch: 25 Global Step: 537590 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:43,096-Speed 6303.95 samples/sec Loss 4.0162 LearningRate 0.0002 Epoch: 25 Global Step: 537600 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:46,352-Speed 6292.11 samples/sec Loss 3.9715 LearningRate 0.0002 Epoch: 25 Global Step: 537610 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:49,612-Speed 6283.07 samples/sec Loss 3.9205 LearningRate 0.0002 Epoch: 25 Global Step: 537620 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:37:52,841-Speed 6343.90 samples/sec Loss 3.9484 LearningRate 0.0002 Epoch: 25 Global Step: 537630 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:37:56,086-Speed 6314.34 samples/sec Loss 4.0225 LearningRate 0.0002 Epoch: 25 Global Step: 537640 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:37:59,344-Speed 6288.45 samples/sec Loss 3.9859 LearningRate 0.0002 Epoch: 25 Global Step: 537650 Fp16 Grad 
Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:02,588-Speed 6312.73 samples/sec Loss 3.9714 LearningRate 0.0002 Epoch: 25 Global Step: 537660 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:05,837-Speed 6305.91 samples/sec Loss 4.0059 LearningRate 0.0002 Epoch: 25 Global Step: 537670 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:09,079-Speed 6318.41 samples/sec Loss 3.9430 LearningRate 0.0002 Epoch: 25 Global Step: 537680 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:12,343-Speed 6275.57 samples/sec Loss 4.0891 LearningRate 0.0002 Epoch: 25 Global Step: 537690 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:15,656-Speed 6183.07 samples/sec Loss 4.0302 LearningRate 0.0002 Epoch: 25 Global Step: 537700 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:18,992-Speed 6140.45 samples/sec Loss 4.0176 LearningRate 0.0002 Epoch: 25 Global Step: 537710 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:22,237-Speed 6313.96 samples/sec Loss 4.0002 LearningRate 0.0002 Epoch: 25 Global Step: 537720 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:38:25,481-Speed 6313.80 samples/sec Loss 4.0040 LearningRate 0.0002 Epoch: 25 Global Step: 537730 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:28,730-Speed 6304.74 samples/sec Loss 3.9664 LearningRate 0.0002 Epoch: 25 Global Step: 537740 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:31,979-Speed 6304.19 samples/sec Loss 4.0190 LearningRate 0.0002 Epoch: 25 Global Step: 537750 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:35,223-Speed 6315.02 samples/sec Loss 4.0373 LearningRate 0.0002 Epoch: 25 Global Step: 537760 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:38,470-Speed 6309.47 samples/sec Loss 4.0034 LearningRate 0.0002 Epoch: 25 Global Step: 537770 Fp16 Grad Scale: 16384 Required: 27 hours 
Training: 2022-04-02 16:38:41,715-Speed 6312.72 samples/sec Loss 3.9683 LearningRate 0.0002 Epoch: 25 Global Step: 537780 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:44,960-Speed 6313.20 samples/sec Loss 3.9578 LearningRate 0.0002 Epoch: 25 Global Step: 537790 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:48,204-Speed 6313.99 samples/sec Loss 4.0249 LearningRate 0.0002 Epoch: 25 Global Step: 537800 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:51,449-Speed 6313.49 samples/sec Loss 3.9846 LearningRate 0.0002 Epoch: 25 Global Step: 537810 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:54,694-Speed 6310.83 samples/sec Loss 4.0170 LearningRate 0.0002 Epoch: 25 Global Step: 537820 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:38:57,925-Speed 6339.91 samples/sec Loss 4.0486 LearningRate 0.0002 Epoch: 25 Global Step: 537830 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:01,171-Speed 6312.00 samples/sec Loss 3.9806 LearningRate 0.0002 Epoch: 25 Global Step: 537840 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:04,412-Speed 6319.91 samples/sec Loss 4.0271 LearningRate 0.0002 Epoch: 25 Global Step: 537850 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:07,657-Speed 6313.89 samples/sec Loss 4.0141 LearningRate 0.0002 Epoch: 25 Global Step: 537860 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:10,900-Speed 6316.38 samples/sec Loss 4.0043 LearningRate 0.0002 Epoch: 25 Global Step: 537870 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:14,149-Speed 6304.66 samples/sec Loss 4.0122 LearningRate 0.0002 Epoch: 25 Global Step: 537880 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:17,396-Speed 6309.52 samples/sec Loss 3.9844 LearningRate 0.0002 Epoch: 25 Global Step: 537890 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 
16:39:20,641-Speed 6311.39 samples/sec Loss 4.0250 LearningRate 0.0002 Epoch: 25 Global Step: 537900 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:23,885-Speed 6316.54 samples/sec Loss 4.0134 LearningRate 0.0002 Epoch: 25 Global Step: 537910 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:27,130-Speed 6311.20 samples/sec Loss 4.0078 LearningRate 0.0002 Epoch: 25 Global Step: 537920 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:39:30,378-Speed 6307.90 samples/sec Loss 3.9590 LearningRate 0.0002 Epoch: 25 Global Step: 537930 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:33,636-Speed 6287.42 samples/sec Loss 4.0044 LearningRate 0.0002 Epoch: 25 Global Step: 537940 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:36,880-Speed 6314.07 samples/sec Loss 3.9895 LearningRate 0.0002 Epoch: 25 Global Step: 537950 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:40,130-Speed 6302.60 samples/sec Loss 3.9438 LearningRate 0.0002 Epoch: 25 Global Step: 537960 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:43,373-Speed 6316.71 samples/sec Loss 3.9529 LearningRate 0.0002 Epoch: 25 Global Step: 537970 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:46,620-Speed 6309.16 samples/sec Loss 4.0046 LearningRate 0.0002 Epoch: 25 Global Step: 537980 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:49,869-Speed 6303.82 samples/sec Loss 3.9758 LearningRate 0.0002 Epoch: 25 Global Step: 537990 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:53,115-Speed 6311.69 samples/sec Loss 3.9825 LearningRate 0.0002 Epoch: 25 Global Step: 538000 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:56,370-Speed 6293.56 samples/sec Loss 4.0010 LearningRate 0.0002 Epoch: 25 Global Step: 538010 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:39:59,613-Speed 6315.73 
samples/sec Loss 3.9846 LearningRate 0.0002 Epoch: 25 Global Step: 538020 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:02,848-Speed 6333.07 samples/sec Loss 3.9804 LearningRate 0.0002 Epoch: 25 Global Step: 538030 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:06,094-Speed 6310.22 samples/sec Loss 3.9966 LearningRate 0.0002 Epoch: 25 Global Step: 538040 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:09,347-Speed 6297.59 samples/sec Loss 3.9571 LearningRate 0.0002 Epoch: 25 Global Step: 538050 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:12,597-Speed 6303.08 samples/sec Loss 4.0387 LearningRate 0.0002 Epoch: 25 Global Step: 538060 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:15,842-Speed 6311.93 samples/sec Loss 4.0162 LearningRate 0.0002 Epoch: 25 Global Step: 538070 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:19,087-Speed 6313.19 samples/sec Loss 3.9859 LearningRate 0.0002 Epoch: 25 Global Step: 538080 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:22,334-Speed 6310.63 samples/sec Loss 4.0641 LearningRate 0.0002 Epoch: 25 Global Step: 538090 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:25,578-Speed 6312.98 samples/sec Loss 4.0340 LearningRate 0.0002 Epoch: 25 Global Step: 538100 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:28,827-Speed 6304.95 samples/sec Loss 4.0142 LearningRate 0.0002 Epoch: 25 Global Step: 538110 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-04-02 16:40:32,056-Speed 6344.53 samples/sec Loss 3.9415 LearningRate 0.0002 Epoch: 25 Global Step: 538120 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:40:35,300-Speed 6314.94 samples/sec Loss 3.9447 LearningRate 0.0002 Epoch: 25 Global Step: 538130 Fp16 Grad Scale: 8192 Required: 27 hours Training: 2022-04-02 16:40:38,556-Speed 6291.75 samples/sec Loss 3.9976 
LearningRate 0.0002 Epoch: 25 Global Step: 538140 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:41,804-Speed 6306.53 samples/sec Loss 4.0170 LearningRate 0.0002 Epoch: 25 Global Step: 538150 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:45,048-Speed 6314.54 samples/sec Loss 4.0433 LearningRate 0.0002 Epoch: 25 Global Step: 538160 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:48,294-Speed 6311.03 samples/sec Loss 3.9590 LearningRate 0.0002 Epoch: 25 Global Step: 538170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:51,538-Speed 6313.74 samples/sec Loss 4.0060 LearningRate 0.0002 Epoch: 25 Global Step: 538180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:54,780-Speed 6318.95 samples/sec Loss 4.0385 LearningRate 0.0002 Epoch: 25 Global Step: 538190 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:40:58,024-Speed 6314.02 samples/sec Loss 3.9661 LearningRate 0.0002 Epoch: 25 Global Step: 538200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:01,268-Speed 6314.50 samples/sec Loss 3.9862 LearningRate 0.0002 Epoch: 25 Global Step: 538210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:04,517-Speed 6305.05 samples/sec Loss 3.9604 LearningRate 0.0002 Epoch: 25 Global Step: 538220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:07,749-Speed 6338.82 samples/sec Loss 3.9677 LearningRate 0.0002 Epoch: 25 Global Step: 538230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:10,994-Speed 6312.33 samples/sec Loss 3.9814 LearningRate 0.0002 Epoch: 25 Global Step: 538240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:14,251-Speed 6289.85 samples/sec Loss 3.9854 LearningRate 0.0002 Epoch: 25 Global Step: 538250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:17,599-Speed 6117.60 samples/sec Loss 3.9706 LearningRate 0.0002 Epoch: 25 Global Step: 
538260 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:20,845-Speed 6311.07 samples/sec Loss 4.0028 LearningRate 0.0002 Epoch: 25 Global Step: 538270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:24,091-Speed 6310.21 samples/sec Loss 3.9827 LearningRate 0.0002 Epoch: 25 Global Step: 538280 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:27,336-Speed 6313.48 samples/sec Loss 4.0272 LearningRate 0.0002 Epoch: 25 Global Step: 538290 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:30,583-Speed 6309.01 samples/sec Loss 4.0405 LearningRate 0.0002 Epoch: 25 Global Step: 538300 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:33,825-Speed 6318.71 samples/sec Loss 3.9575 LearningRate 0.0002 Epoch: 25 Global Step: 538310 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:37,073-Speed 6307.21 samples/sec Loss 4.0201 LearningRate 0.0002 Epoch: 25 Global Step: 538320 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:41:40,322-Speed 6304.97 samples/sec Loss 3.9166 LearningRate 0.0002 Epoch: 25 Global Step: 538330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:43,566-Speed 6314.32 samples/sec Loss 4.0108 LearningRate 0.0002 Epoch: 25 Global Step: 538340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:46,808-Speed 6319.31 samples/sec Loss 4.0183 LearningRate 0.0002 Epoch: 25 Global Step: 538350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:50,056-Speed 6306.72 samples/sec Loss 4.0458 LearningRate 0.0002 Epoch: 25 Global Step: 538360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:53,302-Speed 6310.27 samples/sec Loss 3.9959 LearningRate 0.0002 Epoch: 25 Global Step: 538370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:41:56,546-Speed 6314.67 samples/sec Loss 3.9984 LearningRate 0.0002 Epoch: 25 Global Step: 538380 Fp16 Grad Scale: 16384 
Required: 26 hours Training: 2022-04-02 16:41:59,793-Speed 6309.18 samples/sec Loss 3.9536 LearningRate 0.0002 Epoch: 25 Global Step: 538390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:03,041-Speed 6307.01 samples/sec Loss 4.0294 LearningRate 0.0002 Epoch: 25 Global Step: 538400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:06,296-Speed 6292.26 samples/sec Loss 3.9728 LearningRate 0.0002 Epoch: 25 Global Step: 538410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:09,551-Speed 6294.29 samples/sec Loss 4.0182 LearningRate 0.0002 Epoch: 25 Global Step: 538420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:12,793-Speed 6318.78 samples/sec Loss 3.9670 LearningRate 0.0002 Epoch: 25 Global Step: 538430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:16,060-Speed 6269.28 samples/sec Loss 3.9554 LearningRate 0.0002 Epoch: 25 Global Step: 538440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:19,303-Speed 6316.48 samples/sec Loss 3.9656 LearningRate 0.0002 Epoch: 25 Global Step: 538450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:22,545-Speed 6318.57 samples/sec Loss 3.9995 LearningRate 0.0002 Epoch: 25 Global Step: 538460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:25,791-Speed 6309.61 samples/sec Loss 3.9514 LearningRate 0.0002 Epoch: 25 Global Step: 538470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:29,038-Speed 6309.15 samples/sec Loss 3.9979 LearningRate 0.0002 Epoch: 25 Global Step: 538480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:32,283-Speed 6313.06 samples/sec Loss 3.9757 LearningRate 0.0002 Epoch: 25 Global Step: 538490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:35,531-Speed 6307.91 samples/sec Loss 3.9981 LearningRate 0.0002 Epoch: 25 Global Step: 538500 Fp16 Grad Scale: 8192 Required: 26 hours Training: 
2022-04-02 16:42:38,780-Speed 6305.71 samples/sec Loss 3.9817 LearningRate 0.0002 Epoch: 25 Global Step: 538510 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:42,024-Speed 6314.84 samples/sec Loss 4.0065 LearningRate 0.0002 Epoch: 25 Global Step: 538520 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:45,267-Speed 6316.03 samples/sec Loss 3.9560 LearningRate 0.0002 Epoch: 25 Global Step: 538530 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:48,508-Speed 6319.29 samples/sec Loss 4.0139 LearningRate 0.0002 Epoch: 25 Global Step: 538540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:42:51,736-Speed 6346.35 samples/sec Loss 3.9723 LearningRate 0.0002 Epoch: 25 Global Step: 538550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:54,981-Speed 6313.02 samples/sec Loss 3.9534 LearningRate 0.0002 Epoch: 25 Global Step: 538560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:42:58,225-Speed 6315.43 samples/sec Loss 4.0126 LearningRate 0.0002 Epoch: 25 Global Step: 538570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:01,466-Speed 6320.13 samples/sec Loss 3.9684 LearningRate 0.0002 Epoch: 25 Global Step: 538580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:04,710-Speed 6313.59 samples/sec Loss 4.0079 LearningRate 0.0002 Epoch: 25 Global Step: 538590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:07,957-Speed 6308.38 samples/sec Loss 4.0492 LearningRate 0.0002 Epoch: 25 Global Step: 538600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:11,205-Speed 6308.29 samples/sec Loss 4.0249 LearningRate 0.0002 Epoch: 25 Global Step: 538610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:14,453-Speed 6306.44 samples/sec Loss 4.0897 LearningRate 0.0002 Epoch: 25 Global Step: 538620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:17,699-Speed 6310.87 
samples/sec Loss 3.9828 LearningRate 0.0002 Epoch: 25 Global Step: 538630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:20,943-Speed 6314.19 samples/sec Loss 4.0401 LearningRate 0.0002 Epoch: 25 Global Step: 538640 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:43:24,185-Speed 6319.08 samples/sec Loss 3.9675 LearningRate 0.0002 Epoch: 25 Global Step: 538650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:27,433-Speed 6306.50 samples/sec Loss 4.0007 LearningRate 0.0002 Epoch: 25 Global Step: 538660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:30,678-Speed 6312.23 samples/sec Loss 4.0506 LearningRate 0.0002 Epoch: 25 Global Step: 538670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:33,922-Speed 6315.57 samples/sec Loss 4.0032 LearningRate 0.0002 Epoch: 25 Global Step: 538680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:37,164-Speed 6316.96 samples/sec Loss 3.9476 LearningRate 0.0002 Epoch: 25 Global Step: 538690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:40,410-Speed 6312.39 samples/sec Loss 3.9773 LearningRate 0.0002 Epoch: 25 Global Step: 538700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:43,659-Speed 6305.40 samples/sec Loss 3.9793 LearningRate 0.0002 Epoch: 25 Global Step: 538710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:46,900-Speed 6319.33 samples/sec Loss 3.8992 LearningRate 0.0002 Epoch: 25 Global Step: 538720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:50,146-Speed 6310.58 samples/sec Loss 4.0049 LearningRate 0.0002 Epoch: 25 Global Step: 538730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:53,387-Speed 6321.74 samples/sec Loss 4.0425 LearningRate 0.0002 Epoch: 25 Global Step: 538740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:56,618-Speed 6338.44 samples/sec Loss 4.0156 
LearningRate 0.0002 Epoch: 25 Global Step: 538750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:43:59,863-Speed 6312.96 samples/sec Loss 3.9700 LearningRate 0.0002 Epoch: 25 Global Step: 538760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:03,111-Speed 6307.94 samples/sec Loss 3.9490 LearningRate 0.0002 Epoch: 25 Global Step: 538770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:06,356-Speed 6312.88 samples/sec Loss 3.9860 LearningRate 0.0002 Epoch: 25 Global Step: 538780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:09,598-Speed 6317.31 samples/sec Loss 4.0182 LearningRate 0.0002 Epoch: 25 Global Step: 538790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:12,845-Speed 6308.85 samples/sec Loss 3.9824 LearningRate 0.0002 Epoch: 25 Global Step: 538800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:16,090-Speed 6313.48 samples/sec Loss 4.0158 LearningRate 0.0002 Epoch: 25 Global Step: 538810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:19,331-Speed 6320.52 samples/sec Loss 3.9743 LearningRate 0.0002 Epoch: 25 Global Step: 538820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:22,603-Speed 6259.49 samples/sec Loss 3.9804 LearningRate 0.0002 Epoch: 25 Global Step: 538830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:25,850-Speed 6309.41 samples/sec Loss 3.9774 LearningRate 0.0002 Epoch: 25 Global Step: 538840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:29,089-Speed 6324.69 samples/sec Loss 3.9409 LearningRate 0.0002 Epoch: 25 Global Step: 538850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:32,335-Speed 6310.89 samples/sec Loss 4.0863 LearningRate 0.0002 Epoch: 25 Global Step: 538860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:35,579-Speed 6313.59 samples/sec Loss 3.9621 LearningRate 0.0002 Epoch: 25 
Global Step: 538870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:38,826-Speed 6309.26 samples/sec Loss 4.0397 LearningRate 0.0002 Epoch: 25 Global Step: 538880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:42,073-Speed 6307.70 samples/sec Loss 3.9793 LearningRate 0.0002 Epoch: 25 Global Step: 538890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:45,319-Speed 6310.47 samples/sec Loss 3.9773 LearningRate 0.0002 Epoch: 25 Global Step: 538900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:48,567-Speed 6308.83 samples/sec Loss 4.0368 LearningRate 0.0002 Epoch: 25 Global Step: 538910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:51,814-Speed 6308.55 samples/sec Loss 4.0055 LearningRate 0.0002 Epoch: 25 Global Step: 538920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:55,058-Speed 6313.71 samples/sec Loss 4.0420 LearningRate 0.0002 Epoch: 25 Global Step: 538930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:44:58,310-Speed 6299.77 samples/sec Loss 3.9660 LearningRate 0.0002 Epoch: 25 Global Step: 538940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:01,547-Speed 6328.89 samples/sec Loss 4.0227 LearningRate 0.0002 Epoch: 25 Global Step: 538950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:04,794-Speed 6307.92 samples/sec Loss 4.0368 LearningRate 0.0002 Epoch: 25 Global Step: 538960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:08,037-Speed 6316.23 samples/sec Loss 3.9970 LearningRate 0.0002 Epoch: 25 Global Step: 538970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:11,284-Speed 6309.10 samples/sec Loss 3.9964 LearningRate 0.0002 Epoch: 25 Global Step: 538980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:14,529-Speed 6312.54 samples/sec Loss 4.0148 LearningRate 0.0002 Epoch: 25 Global Step: 538990 Fp16 Grad 
Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:17,775-Speed 6311.52 samples/sec Loss 3.9713 LearningRate 0.0002 Epoch: 25 Global Step: 539000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:21,024-Speed 6305.81 samples/sec Loss 3.9317 LearningRate 0.0002 Epoch: 25 Global Step: 539010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:24,270-Speed 6310.14 samples/sec Loss 3.9394 LearningRate 0.0002 Epoch: 25 Global Step: 539020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:45:27,507-Speed 6328.75 samples/sec Loss 3.9594 LearningRate 0.0002 Epoch: 25 Global Step: 539030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:30,752-Speed 6311.64 samples/sec Loss 3.9999 LearningRate 0.0002 Epoch: 25 Global Step: 539040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:33,999-Speed 6308.62 samples/sec Loss 4.0206 LearningRate 0.0002 Epoch: 25 Global Step: 539050 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:37,244-Speed 6312.70 samples/sec Loss 4.0535 LearningRate 0.0002 Epoch: 25 Global Step: 539060 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:40,600-Speed 6104.15 samples/sec Loss 4.0599 LearningRate 0.0002 Epoch: 25 Global Step: 539070 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:43,886-Speed 6233.13 samples/sec Loss 3.9181 LearningRate 0.0002 Epoch: 25 Global Step: 539080 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:47,129-Speed 6316.52 samples/sec Loss 3.9796 LearningRate 0.0002 Epoch: 25 Global Step: 539090 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:50,372-Speed 6316.27 samples/sec Loss 4.0088 LearningRate 0.0002 Epoch: 25 Global Step: 539100 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:45:53,618-Speed 6312.62 samples/sec Loss 3.9565 LearningRate 0.0002 Epoch: 25 Global Step: 539110 Fp16 Grad Scale: 8192 Required: 26 hours 
Training: 2022-04-02 16:45:56,864-Speed 6310.91 samples/sec Loss 3.9952 LearningRate 0.0002 Epoch: 25 Global Step: 539120 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:46:00,106-Speed 6317.53 samples/sec Loss 4.0416 LearningRate 0.0002 Epoch: 25 Global Step: 539130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:03,351-Speed 6314.18 samples/sec Loss 4.0444 LearningRate 0.0002 Epoch: 25 Global Step: 539140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:06,595-Speed 6313.33 samples/sec Loss 3.9949 LearningRate 0.0002 Epoch: 25 Global Step: 539150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:09,835-Speed 6322.76 samples/sec Loss 4.0097 LearningRate 0.0002 Epoch: 25 Global Step: 539160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:13,082-Speed 6308.27 samples/sec Loss 3.9982 LearningRate 0.0002 Epoch: 25 Global Step: 539170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:16,322-Speed 6323.61 samples/sec Loss 3.9970 LearningRate 0.0002 Epoch: 25 Global Step: 539180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:19,568-Speed 6310.26 samples/sec Loss 3.9849 LearningRate 0.0002 Epoch: 25 Global Step: 539190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:22,812-Speed 6314.53 samples/sec Loss 4.0050 LearningRate 0.0002 Epoch: 25 Global Step: 539200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:26,068-Speed 6290.83 samples/sec Loss 4.0108 LearningRate 0.0002 Epoch: 25 Global Step: 539210 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:46:29,313-Speed 6313.60 samples/sec Loss 4.0191 LearningRate 0.0002 Epoch: 25 Global Step: 539220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:32,453-Speed 324.36 samples/sec Loss 4.0633 LearningRate 0.0002 Epoch: 26 Global Step: 539230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
16:47:35,690-Speed 6328.02 samples/sec Loss 3.9629 LearningRate 0.0002 Epoch: 26 Global Step: 539240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:38,974-Speed 6238.37 samples/sec Loss 4.0048 LearningRate 0.0002 Epoch: 26 Global Step: 539250 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:42,228-Speed 6294.11 samples/sec Loss 3.9433 LearningRate 0.0002 Epoch: 26 Global Step: 539260 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:45,466-Speed 6327.23 samples/sec Loss 4.0085 LearningRate 0.0002 Epoch: 26 Global Step: 539270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:48,697-Speed 6339.81 samples/sec Loss 3.9963 LearningRate 0.0002 Epoch: 26 Global Step: 539280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:51,968-Speed 6261.31 samples/sec Loss 4.0567 LearningRate 0.0002 Epoch: 26 Global Step: 539290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:55,204-Speed 6330.49 samples/sec Loss 4.0081 LearningRate 0.0002 Epoch: 26 Global Step: 539300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:47:58,446-Speed 6320.42 samples/sec Loss 4.0447 LearningRate 0.0002 Epoch: 26 Global Step: 539310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:01,684-Speed 6325.74 samples/sec Loss 3.9909 LearningRate 0.0002 Epoch: 26 Global Step: 539320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:04,983-Speed 6209.72 samples/sec Loss 4.0130 LearningRate 0.0002 Epoch: 26 Global Step: 539330 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-04-02 16:48:08,232-Speed 6303.86 samples/sec Loss 4.0093 LearningRate 0.0002 Epoch: 26 Global Step: 539340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:11,470-Speed 6326.90 samples/sec Loss 3.9971 LearningRate 0.0002 Epoch: 26 Global Step: 539350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:14,710-Speed 6322.51 
samples/sec Loss 3.9534 LearningRate 0.0002 Epoch: 26 Global Step: 539360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:17,952-Speed 6319.31 samples/sec Loss 3.9756 LearningRate 0.0002 Epoch: 26 Global Step: 539370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:21,194-Speed 6317.60 samples/sec Loss 3.9286 LearningRate 0.0002 Epoch: 26 Global Step: 539380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:24,450-Speed 6291.77 samples/sec Loss 3.9103 LearningRate 0.0002 Epoch: 26 Global Step: 539390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:27,703-Speed 6297.38 samples/sec Loss 3.9320 LearningRate 0.0002 Epoch: 26 Global Step: 539400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:30,968-Speed 6274.76 samples/sec Loss 3.9865 LearningRate 0.0002 Epoch: 26 Global Step: 539410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:34,206-Speed 6325.28 samples/sec Loss 3.9902 LearningRate 0.0002 Epoch: 26 Global Step: 539420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:37,448-Speed 6318.11 samples/sec Loss 3.9795 LearningRate 0.0002 Epoch: 26 Global Step: 539430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:40,675-Speed 6349.32 samples/sec Loss 4.0221 LearningRate 0.0002 Epoch: 26 Global Step: 539440 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:43,914-Speed 6324.13 samples/sec Loss 3.9672 LearningRate 0.0002 Epoch: 26 Global Step: 539450 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:47,155-Speed 6319.12 samples/sec Loss 3.9813 LearningRate 0.0002 Epoch: 26 Global Step: 539460 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:50,399-Speed 6315.09 samples/sec Loss 3.9733 LearningRate 0.0002 Epoch: 26 Global Step: 539470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:53,638-Speed 6324.00 samples/sec Loss 4.0104 
LearningRate 0.0002 Epoch: 26 Global Step: 539480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:48:56,875-Speed 6329.66 samples/sec Loss 3.9600 LearningRate 0.0002 Epoch: 26 Global Step: 539490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:00,111-Speed 6329.84 samples/sec Loss 3.9852 LearningRate 0.0002 Epoch: 26 Global Step: 539500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:03,351-Speed 6322.49 samples/sec Loss 3.9808 LearningRate 0.0002 Epoch: 26 Global Step: 539510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:06,590-Speed 6324.55 samples/sec Loss 4.0223 LearningRate 0.0002 Epoch: 26 Global Step: 539520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:09,826-Speed 6329.89 samples/sec Loss 3.9901 LearningRate 0.0002 Epoch: 26 Global Step: 539530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:13,050-Speed 6354.18 samples/sec Loss 4.0151 LearningRate 0.0002 Epoch: 26 Global Step: 539540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:16,291-Speed 6321.36 samples/sec Loss 3.8749 LearningRate 0.0002 Epoch: 26 Global Step: 539550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:19,527-Speed 6328.56 samples/sec Loss 3.9646 LearningRate 0.0002 Epoch: 26 Global Step: 539560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:22,766-Speed 6325.09 samples/sec Loss 3.9461 LearningRate 0.0002 Epoch: 26 Global Step: 539570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:26,004-Speed 6326.30 samples/sec Loss 3.9325 LearningRate 0.0002 Epoch: 26 Global Step: 539580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:29,248-Speed 6315.22 samples/sec Loss 3.9713 LearningRate 0.0002 Epoch: 26 Global Step: 539590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:32,485-Speed 6327.20 samples/sec Loss 3.9668 LearningRate 0.0002 Epoch: 26 Global 
Step: 539600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:35,746-Speed 6282.66 samples/sec Loss 3.9731 LearningRate 0.0002 Epoch: 26 Global Step: 539610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:39,039-Speed 6220.90 samples/sec Loss 3.9981 LearningRate 0.0002 Epoch: 26 Global Step: 539620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:42,276-Speed 6327.25 samples/sec Loss 3.9018 LearningRate 0.0002 Epoch: 26 Global Step: 539630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:49:45,515-Speed 6325.38 samples/sec Loss 4.0301 LearningRate 0.0002 Epoch: 26 Global Step: 539640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:48,756-Speed 6319.82 samples/sec Loss 3.9911 LearningRate 0.0002 Epoch: 26 Global Step: 539650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:51,996-Speed 6323.26 samples/sec Loss 3.9863 LearningRate 0.0002 Epoch: 26 Global Step: 539660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:55,232-Speed 6328.51 samples/sec Loss 3.9509 LearningRate 0.0002 Epoch: 26 Global Step: 539670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:49:58,472-Speed 6324.05 samples/sec Loss 3.9913 LearningRate 0.0002 Epoch: 26 Global Step: 539680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:01,714-Speed 6318.21 samples/sec Loss 3.9684 LearningRate 0.0002 Epoch: 26 Global Step: 539690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:04,954-Speed 6321.26 samples/sec Loss 3.9560 LearningRate 0.0002 Epoch: 26 Global Step: 539700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:08,196-Speed 6317.99 samples/sec Loss 3.9907 LearningRate 0.0002 Epoch: 26 Global Step: 539710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:11,443-Speed 6310.45 samples/sec Loss 3.9765 LearningRate 0.0002 Epoch: 26 Global Step: 539720 Fp16 Grad Scale: 16384 
Required: 26 hours Training: 2022-04-02 16:50:14,680-Speed 6328.23 samples/sec Loss 3.9989 LearningRate 0.0002 Epoch: 26 Global Step: 539730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:17,908-Speed 6346.33 samples/sec Loss 4.0167 LearningRate 0.0002 Epoch: 26 Global Step: 539740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:21,148-Speed 6323.21 samples/sec Loss 3.9654 LearningRate 0.0002 Epoch: 26 Global Step: 539750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:50:24,373-Speed 6351.79 samples/sec Loss 3.9953 LearningRate 0.0002 Epoch: 26 Global Step: 539760 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:27,613-Speed 6321.10 samples/sec Loss 4.0293 LearningRate 0.0002 Epoch: 26 Global Step: 539770 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:30,850-Speed 6329.14 samples/sec Loss 3.9757 LearningRate 0.0002 Epoch: 26 Global Step: 539780 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:34,088-Speed 6326.75 samples/sec Loss 3.9551 LearningRate 0.0002 Epoch: 26 Global Step: 539790 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:37,329-Speed 6319.61 samples/sec Loss 4.0245 LearningRate 0.0002 Epoch: 26 Global Step: 539800 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:40,569-Speed 6322.78 samples/sec Loss 4.0055 LearningRate 0.0002 Epoch: 26 Global Step: 539810 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:43,808-Speed 6324.38 samples/sec Loss 4.0388 LearningRate 0.0002 Epoch: 26 Global Step: 539820 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:47,048-Speed 6321.19 samples/sec Loss 3.9283 LearningRate 0.0002 Epoch: 26 Global Step: 539830 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:50,291-Speed 6317.06 samples/sec Loss 3.9961 LearningRate 0.0002 Epoch: 26 Global Step: 539840 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
16:50:53,528-Speed 6329.10 samples/sec Loss 4.0009 LearningRate 0.0002 Epoch: 26 Global Step: 539850 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:50:56,769-Speed 6320.00 samples/sec Loss 3.9905 LearningRate 0.0002 Epoch: 26 Global Step: 539860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:00,021-Speed 6299.21 samples/sec Loss 3.9574 LearningRate 0.0002 Epoch: 26 Global Step: 539870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:03,265-Speed 6315.01 samples/sec Loss 4.0067 LearningRate 0.0002 Epoch: 26 Global Step: 539880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:06,505-Speed 6320.85 samples/sec Loss 3.9634 LearningRate 0.0002 Epoch: 26 Global Step: 539890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:09,742-Speed 6327.70 samples/sec Loss 3.9556 LearningRate 0.0002 Epoch: 26 Global Step: 539900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:12,987-Speed 6314.61 samples/sec Loss 3.9983 LearningRate 0.0002 Epoch: 26 Global Step: 539910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:16,225-Speed 6325.62 samples/sec Loss 4.0019 LearningRate 0.0002 Epoch: 26 Global Step: 539920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:19,465-Speed 6322.19 samples/sec Loss 3.9732 LearningRate 0.0002 Epoch: 26 Global Step: 539930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:22,708-Speed 6316.49 samples/sec Loss 3.9599 LearningRate 0.0002 Epoch: 26 Global Step: 539940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:25,947-Speed 6324.84 samples/sec Loss 3.9902 LearningRate 0.0002 Epoch: 26 Global Step: 539950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:29,175-Speed 6346.93 samples/sec Loss 4.0040 LearningRate 0.0002 Epoch: 26 Global Step: 539960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:32,411-Speed 6329.84 
samples/sec Loss 4.0085 LearningRate 0.0002 Epoch: 26 Global Step: 539970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:35,682-Speed 6262.87 samples/sec Loss 4.0180 LearningRate 0.0002 Epoch: 26 Global Step: 539980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:38,920-Speed 6326.27 samples/sec Loss 4.0293 LearningRate 0.0002 Epoch: 26 Global Step: 539990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:42,159-Speed 6323.41 samples/sec Loss 3.9792 LearningRate 0.0002 Epoch: 26 Global Step: 540000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:51:45,387-Speed 6346.46 samples/sec Loss 3.9791 LearningRate 0.0002 Epoch: 26 Global Step: 540010 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:51:48,625-Speed 6326.55 samples/sec Loss 4.0047 LearningRate 0.0002 Epoch: 26 Global Step: 540020 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:51:51,867-Speed 6317.45 samples/sec Loss 4.0186 LearningRate 0.0002 Epoch: 26 Global Step: 540030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:51:55,108-Speed 6321.81 samples/sec Loss 3.9607 LearningRate 0.0002 Epoch: 26 Global Step: 540040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:51:58,404-Speed 6215.31 samples/sec Loss 3.9813 LearningRate 0.0002 Epoch: 26 Global Step: 540050 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:01,646-Speed 6317.64 samples/sec Loss 3.9876 LearningRate 0.0002 Epoch: 26 Global Step: 540060 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:04,925-Speed 6246.86 samples/sec Loss 3.9965 LearningRate 0.0002 Epoch: 26 Global Step: 540070 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:08,168-Speed 6317.22 samples/sec Loss 3.9859 LearningRate 0.0002 Epoch: 26 Global Step: 540080 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:11,408-Speed 6322.93 samples/sec Loss 3.9982 LearningRate 
0.0002 Epoch: 26 Global Step: 540090 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:14,652-Speed 6312.79 samples/sec Loss 3.9949 LearningRate 0.0002 Epoch: 26 Global Step: 540100 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:17,894-Speed 6319.05 samples/sec Loss 3.9364 LearningRate 0.0002 Epoch: 26 Global Step: 540110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:52:21,121-Speed 6347.21 samples/sec Loss 3.9750 LearningRate 0.0002 Epoch: 26 Global Step: 540120 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:24,375-Speed 6296.66 samples/sec Loss 3.9821 LearningRate 0.0002 Epoch: 26 Global Step: 540130 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:27,624-Speed 6304.30 samples/sec Loss 3.9107 LearningRate 0.0002 Epoch: 26 Global Step: 540140 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:30,867-Speed 6317.70 samples/sec Loss 3.9900 LearningRate 0.0002 Epoch: 26 Global Step: 540150 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:34,109-Speed 6317.85 samples/sec Loss 4.0182 LearningRate 0.0002 Epoch: 26 Global Step: 540160 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:37,348-Speed 6323.89 samples/sec Loss 3.9693 LearningRate 0.0002 Epoch: 26 Global Step: 540170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:40,590-Speed 6319.49 samples/sec Loss 4.0383 LearningRate 0.0002 Epoch: 26 Global Step: 540180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:43,857-Speed 6270.06 samples/sec Loss 4.0231 LearningRate 0.0002 Epoch: 26 Global Step: 540190 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:47,105-Speed 6307.20 samples/sec Loss 3.9865 LearningRate 0.0002 Epoch: 26 Global Step: 540200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:50,346-Speed 6319.55 samples/sec Loss 3.9777 LearningRate 0.0002 Epoch: 26 Global Step: 540210 Fp16 
Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:52:53,586-Speed 6322.65 samples/sec Loss 3.9424 LearningRate 0.0002 Epoch: 26 Global Step: 540220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:52:56,828-Speed 6319.57 samples/sec Loss 4.0270 LearningRate 0.0002 Epoch: 26 Global Step: 540230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:00,058-Speed 6341.15 samples/sec Loss 3.9981 LearningRate 0.0002 Epoch: 26 Global Step: 540240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:03,311-Speed 6297.54 samples/sec Loss 3.9915 LearningRate 0.0002 Epoch: 26 Global Step: 540250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:06,550-Speed 6323.84 samples/sec Loss 3.9295 LearningRate 0.0002 Epoch: 26 Global Step: 540260 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:09,790-Speed 6321.83 samples/sec Loss 3.9605 LearningRate 0.0002 Epoch: 26 Global Step: 540270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:13,032-Speed 6318.83 samples/sec Loss 3.9871 LearningRate 0.0002 Epoch: 26 Global Step: 540280 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:16,279-Speed 6309.16 samples/sec Loss 4.0193 LearningRate 0.0002 Epoch: 26 Global Step: 540290 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:19,519-Speed 6321.46 samples/sec Loss 3.9120 LearningRate 0.0002 Epoch: 26 Global Step: 540300 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:22,762-Speed 6318.06 samples/sec Loss 4.0522 LearningRate 0.0002 Epoch: 26 Global Step: 540310 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:26,005-Speed 6315.51 samples/sec Loss 4.0361 LearningRate 0.0002 Epoch: 26 Global Step: 540320 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:53:29,246-Speed 6320.43 samples/sec Loss 3.9875 LearningRate 0.0002 Epoch: 26 Global Step: 540330 Fp16 Grad Scale: 8192 Required: 26 hours 
Training: 2022-04-02 16:53:32,490-Speed 6315.10 samples/sec Loss 3.9250 LearningRate 0.0002 Epoch: 26 Global Step: 540340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:35,731-Speed 6320.80 samples/sec Loss 4.0016 LearningRate 0.0002 Epoch: 26 Global Step: 540350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:38,986-Speed 6293.91 samples/sec Loss 3.9800 LearningRate 0.0002 Epoch: 26 Global Step: 540360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:42,228-Speed 6318.85 samples/sec Loss 3.9253 LearningRate 0.0002 Epoch: 26 Global Step: 540370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:45,469-Speed 6319.74 samples/sec Loss 3.9549 LearningRate 0.0002 Epoch: 26 Global Step: 540380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:48,711-Speed 6319.87 samples/sec Loss 3.9357 LearningRate 0.0002 Epoch: 26 Global Step: 540390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:51,948-Speed 6326.35 samples/sec Loss 3.9192 LearningRate 0.0002 Epoch: 26 Global Step: 540400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:55,192-Speed 6315.52 samples/sec Loss 3.9900 LearningRate 0.0001 Epoch: 26 Global Step: 540410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:53:58,436-Speed 6314.17 samples/sec Loss 4.0078 LearningRate 0.0001 Epoch: 26 Global Step: 540420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:54:01,666-Speed 6342.12 samples/sec Loss 4.0075 LearningRate 0.0001 Epoch: 26 Global Step: 540430 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:04,915-Speed 6305.63 samples/sec Loss 3.9584 LearningRate 0.0001 Epoch: 26 Global Step: 540440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:08,160-Speed 6312.64 samples/sec Loss 3.9685 LearningRate 0.0001 Epoch: 26 Global Step: 540450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
16:54:11,399-Speed 6324.46 samples/sec Loss 3.9610 LearningRate 0.0001 Epoch: 26 Global Step: 540460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:14,638-Speed 6324.31 samples/sec Loss 4.0015 LearningRate 0.0001 Epoch: 26 Global Step: 540470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:17,879-Speed 6320.18 samples/sec Loss 3.9389 LearningRate 0.0001 Epoch: 26 Global Step: 540480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:21,127-Speed 6306.98 samples/sec Loss 3.9946 LearningRate 0.0001 Epoch: 26 Global Step: 540490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:24,369-Speed 6317.34 samples/sec Loss 3.9632 LearningRate 0.0001 Epoch: 26 Global Step: 540500 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:27,613-Speed 6316.19 samples/sec Loss 4.0254 LearningRate 0.0001 Epoch: 26 Global Step: 540510 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:30,854-Speed 6319.70 samples/sec Loss 4.0427 LearningRate 0.0001 Epoch: 26 Global Step: 540520 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:34,098-Speed 6315.21 samples/sec Loss 4.0099 LearningRate 0.0001 Epoch: 26 Global Step: 540530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:54:37,329-Speed 6340.15 samples/sec Loss 4.0211 LearningRate 0.0001 Epoch: 26 Global Step: 540540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:40,573-Speed 6314.21 samples/sec Loss 3.9475 LearningRate 0.0001 Epoch: 26 Global Step: 540550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:43,811-Speed 6326.31 samples/sec Loss 3.9804 LearningRate 0.0001 Epoch: 26 Global Step: 540560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:47,053-Speed 6319.75 samples/sec Loss 3.9517 LearningRate 0.0001 Epoch: 26 Global Step: 540570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:50,299-Speed 6309.84 samples/sec 
Loss 3.9600 LearningRate 0.0001 Epoch: 26 Global Step: 540580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:53,540-Speed 6321.12 samples/sec Loss 3.9410 LearningRate 0.0001 Epoch: 26 Global Step: 540590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:54:56,782-Speed 6317.79 samples/sec Loss 3.9976 LearningRate 0.0001 Epoch: 26 Global Step: 540600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:00,027-Speed 6312.62 samples/sec Loss 3.9868 LearningRate 0.0001 Epoch: 26 Global Step: 540610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:03,269-Speed 6318.48 samples/sec Loss 3.9872 LearningRate 0.0001 Epoch: 26 Global Step: 540620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:06,510-Speed 6320.65 samples/sec Loss 3.9910 LearningRate 0.0001 Epoch: 26 Global Step: 540630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:09,751-Speed 6321.10 samples/sec Loss 3.9229 LearningRate 0.0001 Epoch: 26 Global Step: 540640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:55:12,994-Speed 6316.30 samples/sec Loss 3.9695 LearningRate 0.0001 Epoch: 26 Global Step: 540650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:55:16,237-Speed 6316.56 samples/sec Loss 3.9807 LearningRate 0.0001 Epoch: 26 Global Step: 540660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:55:19,485-Speed 6307.42 samples/sec Loss 3.9450 LearningRate 0.0001 Epoch: 26 Global Step: 540670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:55:22,728-Speed 6316.07 samples/sec Loss 3.9635 LearningRate 0.0001 Epoch: 26 Global Step: 540680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:55:25,956-Speed 6345.75 samples/sec Loss 3.9142 LearningRate 0.0001 Epoch: 26 Global Step: 540690 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:29,202-Speed 6309.60 samples/sec Loss 4.0021 LearningRate 0.0001 Epoch: 
26 Global Step: 540700 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:32,445-Speed 6316.50 samples/sec Loss 3.9012 LearningRate 0.0001 Epoch: 26 Global Step: 540710 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:35,688-Speed 6316.66 samples/sec Loss 3.8954 LearningRate 0.0001 Epoch: 26 Global Step: 540720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:38,929-Speed 6320.99 samples/sec Loss 3.9809 LearningRate 0.0001 Epoch: 26 Global Step: 540730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:42,172-Speed 6316.82 samples/sec Loss 3.9973 LearningRate 0.0001 Epoch: 26 Global Step: 540740 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:45,413-Speed 6320.48 samples/sec Loss 3.9853 LearningRate 0.0001 Epoch: 26 Global Step: 540750 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:48,656-Speed 6316.46 samples/sec Loss 3.8996 LearningRate 0.0001 Epoch: 26 Global Step: 540760 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:51,901-Speed 6313.77 samples/sec Loss 3.9334 LearningRate 0.0001 Epoch: 26 Global Step: 540770 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:55,146-Speed 6311.39 samples/sec Loss 4.0121 LearningRate 0.0001 Epoch: 26 Global Step: 540780 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:55:58,388-Speed 6319.56 samples/sec Loss 3.9991 LearningRate 0.0001 Epoch: 26 Global Step: 540790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:01,633-Speed 6313.24 samples/sec Loss 3.9439 LearningRate 0.0001 Epoch: 26 Global Step: 540800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:04,874-Speed 6319.81 samples/sec Loss 3.9931 LearningRate 0.0001 Epoch: 26 Global Step: 540810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:08,116-Speed 6318.17 samples/sec Loss 3.9816 LearningRate 0.0001 Epoch: 26 Global Step: 540820 Fp16 Grad Scale: 
16384 Required: 26 hours Training: 2022-04-02 16:56:11,359-Speed 6316.66 samples/sec Loss 4.0124 LearningRate 0.0001 Epoch: 26 Global Step: 540830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:14,601-Speed 6319.65 samples/sec Loss 3.9714 LearningRate 0.0001 Epoch: 26 Global Step: 540840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:17,844-Speed 6315.95 samples/sec Loss 4.0480 LearningRate 0.0001 Epoch: 26 Global Step: 540850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:21,086-Speed 6317.99 samples/sec Loss 3.9811 LearningRate 0.0001 Epoch: 26 Global Step: 540860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:24,333-Speed 6308.63 samples/sec Loss 3.9967 LearningRate 0.0001 Epoch: 26 Global Step: 540870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:27,610-Speed 6252.45 samples/sec Loss 4.0148 LearningRate 0.0001 Epoch: 26 Global Step: 540880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:30,874-Speed 6275.69 samples/sec Loss 3.8849 LearningRate 0.0001 Epoch: 26 Global Step: 540890 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-04-02 16:56:34,107-Speed 6336.46 samples/sec Loss 3.8985 LearningRate 0.0001 Epoch: 26 Global Step: 540900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:37,350-Speed 6314.69 samples/sec Loss 3.9766 LearningRate 0.0001 Epoch: 26 Global Step: 540910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:40,594-Speed 6315.76 samples/sec Loss 3.9382 LearningRate 0.0001 Epoch: 26 Global Step: 540920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:43,833-Speed 6322.98 samples/sec Loss 3.9559 LearningRate 0.0001 Epoch: 26 Global Step: 540930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:47,078-Speed 6314.49 samples/sec Loss 3.9588 LearningRate 0.0001 Epoch: 26 Global Step: 540940 Fp16 Grad Scale: 16384 Required: 26 hours 
Training: 2022-04-02 16:56:50,318-Speed 6321.65 samples/sec Loss 4.0323 LearningRate 0.0001 Epoch: 26 Global Step: 540950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:53,562-Speed 6315.37 samples/sec Loss 4.0639 LearningRate 0.0001 Epoch: 26 Global Step: 540960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:56:56,844-Speed 6240.76 samples/sec Loss 3.9868 LearningRate 0.0001 Epoch: 26 Global Step: 540970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:00,088-Speed 6314.57 samples/sec Loss 3.9167 LearningRate 0.0001 Epoch: 26 Global Step: 540980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:03,333-Speed 6313.18 samples/sec Loss 4.0396 LearningRate 0.0001 Epoch: 26 Global Step: 540990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:06,563-Speed 6343.23 samples/sec Loss 3.9640 LearningRate 0.0001 Epoch: 26 Global Step: 541000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:09,802-Speed 6323.03 samples/sec Loss 3.9514 LearningRate 0.0001 Epoch: 26 Global Step: 541010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:13,036-Speed 6335.12 samples/sec Loss 3.9705 LearningRate 0.0001 Epoch: 26 Global Step: 541020 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:16,280-Speed 6314.01 samples/sec Loss 4.0174 LearningRate 0.0001 Epoch: 26 Global Step: 541030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:19,520-Speed 6322.50 samples/sec Loss 3.9779 LearningRate 0.0001 Epoch: 26 Global Step: 541040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:22,759-Speed 6323.76 samples/sec Loss 3.9660 LearningRate 0.0001 Epoch: 26 Global Step: 541050 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:26,006-Speed 6309.05 samples/sec Loss 3.9835 LearningRate 0.0001 Epoch: 26 Global Step: 541060 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
16:57:29,250-Speed 6314.93 samples/sec Loss 3.9448 LearningRate 0.0001 Epoch: 26 Global Step: 541070 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:32,492-Speed 6317.92 samples/sec Loss 3.9679 LearningRate 0.0001 Epoch: 26 Global Step: 541080 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:35,738-Speed 6311.29 samples/sec Loss 3.9823 LearningRate 0.0001 Epoch: 26 Global Step: 541090 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:38,976-Speed 6325.03 samples/sec Loss 3.9868 LearningRate 0.0001 Epoch: 26 Global Step: 541100 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:42,219-Speed 6317.78 samples/sec Loss 4.0349 LearningRate 0.0001 Epoch: 26 Global Step: 541110 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:57:45,460-Speed 6318.93 samples/sec Loss 3.9765 LearningRate 0.0001 Epoch: 26 Global Step: 541120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:48,709-Speed 6306.21 samples/sec Loss 3.9484 LearningRate 0.0001 Epoch: 26 Global Step: 541130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:51,950-Speed 6319.43 samples/sec Loss 4.0180 LearningRate 0.0001 Epoch: 26 Global Step: 541140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:55,191-Speed 6321.26 samples/sec Loss 3.9850 LearningRate 0.0001 Epoch: 26 Global Step: 541150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:57:58,434-Speed 6316.66 samples/sec Loss 3.9751 LearningRate 0.0001 Epoch: 26 Global Step: 541160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:01,675-Speed 6320.04 samples/sec Loss 3.9327 LearningRate 0.0001 Epoch: 26 Global Step: 541170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:04,922-Speed 6309.72 samples/sec Loss 3.9516 LearningRate 0.0001 Epoch: 26 Global Step: 541180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:08,168-Speed 6311.47 
samples/sec Loss 3.9104 LearningRate 0.0001 Epoch: 26 Global Step: 541190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:11,411-Speed 6316.98 samples/sec Loss 3.8673 LearningRate 0.0001 Epoch: 26 Global Step: 541200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:14,653-Speed 6317.70 samples/sec Loss 3.9456 LearningRate 0.0001 Epoch: 26 Global Step: 541210 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:17,887-Speed 6333.92 samples/sec Loss 4.0142 LearningRate 0.0001 Epoch: 26 Global Step: 541220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:21,128-Speed 6320.25 samples/sec Loss 3.9519 LearningRate 0.0001 Epoch: 26 Global Step: 541230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:24,370-Speed 6319.33 samples/sec Loss 3.9653 LearningRate 0.0001 Epoch: 26 Global Step: 541240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:27,615-Speed 6311.56 samples/sec Loss 3.9920 LearningRate 0.0001 Epoch: 26 Global Step: 541250 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:30,860-Speed 6313.56 samples/sec Loss 3.9656 LearningRate 0.0001 Epoch: 26 Global Step: 541260 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:34,101-Speed 6320.22 samples/sec Loss 4.0024 LearningRate 0.0001 Epoch: 26 Global Step: 541270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:37,344-Speed 6316.15 samples/sec Loss 4.0171 LearningRate 0.0001 Epoch: 26 Global Step: 541280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:40,593-Speed 6305.14 samples/sec Loss 4.0091 LearningRate 0.0001 Epoch: 26 Global Step: 541290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:43,838-Speed 6311.88 samples/sec Loss 3.9251 LearningRate 0.0001 Epoch: 26 Global Step: 541300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:47,083-Speed 6312.63 samples/sec Loss 3.9091 
LearningRate 0.0001 Epoch: 26 Global Step: 541310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:50,308-Speed 6353.10 samples/sec Loss 4.0197 LearningRate 0.0001 Epoch: 26 Global Step: 541320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:53,553-Speed 6312.64 samples/sec Loss 3.8992 LearningRate 0.0001 Epoch: 26 Global Step: 541330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:58:56,798-Speed 6312.60 samples/sec Loss 3.9735 LearningRate 0.0001 Epoch: 26 Global Step: 541340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:00,060-Speed 6279.48 samples/sec Loss 3.9403 LearningRate 0.0001 Epoch: 26 Global Step: 541350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:03,302-Speed 6318.07 samples/sec Loss 3.9553 LearningRate 0.0001 Epoch: 26 Global Step: 541360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:06,544-Speed 6319.49 samples/sec Loss 3.9613 LearningRate 0.0001 Epoch: 26 Global Step: 541370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:09,774-Speed 6341.81 samples/sec Loss 4.0116 LearningRate 0.0001 Epoch: 26 Global Step: 541380 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:13,014-Speed 6321.48 samples/sec Loss 4.0604 LearningRate 0.0001 Epoch: 26 Global Step: 541390 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:16,262-Speed 6307.41 samples/sec Loss 3.9397 LearningRate 0.0001 Epoch: 26 Global Step: 541400 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:19,505-Speed 6317.48 samples/sec Loss 3.9064 LearningRate 0.0001 Epoch: 26 Global Step: 541410 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:22,747-Speed 6317.78 samples/sec Loss 3.9386 LearningRate 0.0001 Epoch: 26 Global Step: 541420 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:25,989-Speed 6317.89 samples/sec Loss 3.9575 LearningRate 0.0001 Epoch: 26 Global 
Step: 541430 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:29,236-Speed 6312.67 samples/sec Loss 3.9590 LearningRate 0.0001 Epoch: 26 Global Step: 541440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:32,479-Speed 6315.43 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 26 Global Step: 541450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:35,720-Speed 6320.92 samples/sec Loss 3.9385 LearningRate 0.0001 Epoch: 26 Global Step: 541460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:38,963-Speed 6316.69 samples/sec Loss 3.9943 LearningRate 0.0001 Epoch: 26 Global Step: 541470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 16:59:42,203-Speed 6321.88 samples/sec Loss 4.0107 LearningRate 0.0001 Epoch: 26 Global Step: 541480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:45,445-Speed 6319.24 samples/sec Loss 3.9494 LearningRate 0.0001 Epoch: 26 Global Step: 541490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:48,691-Speed 6310.53 samples/sec Loss 4.0448 LearningRate 0.0001 Epoch: 26 Global Step: 541500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:51,935-Speed 6315.75 samples/sec Loss 4.0296 LearningRate 0.0001 Epoch: 26 Global Step: 541510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:55,178-Speed 6315.06 samples/sec Loss 3.9329 LearningRate 0.0001 Epoch: 26 Global Step: 541520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 16:59:58,427-Speed 6305.46 samples/sec Loss 3.9265 LearningRate 0.0001 Epoch: 26 Global Step: 541530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:01,674-Speed 6307.78 samples/sec Loss 3.9432 LearningRate 0.0001 Epoch: 26 Global Step: 541540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:04,925-Speed 6301.75 samples/sec Loss 3.9458 LearningRate 0.0001 Epoch: 26 Global Step: 541550 Fp16 Grad Scale: 16384 
Required: 26 hours Training: 2022-04-02 17:00:08,167-Speed 6319.43 samples/sec Loss 3.9957 LearningRate 0.0001 Epoch: 26 Global Step: 541560 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:11,410-Speed 6315.10 samples/sec Loss 3.9813 LearningRate 0.0001 Epoch: 26 Global Step: 541570 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:14,640-Speed 6341.95 samples/sec Loss 3.9645 LearningRate 0.0001 Epoch: 26 Global Step: 541580 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:17,898-Speed 6288.95 samples/sec Loss 3.9441 LearningRate 0.0001 Epoch: 26 Global Step: 541590 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:21,141-Speed 6316.07 samples/sec Loss 3.9721 LearningRate 0.0001 Epoch: 26 Global Step: 541600 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:24,400-Speed 6286.34 samples/sec Loss 3.9832 LearningRate 0.0001 Epoch: 26 Global Step: 541610 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:27,646-Speed 6310.69 samples/sec Loss 3.9793 LearningRate 0.0001 Epoch: 26 Global Step: 541620 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:30,889-Speed 6316.42 samples/sec Loss 4.0161 LearningRate 0.0001 Epoch: 26 Global Step: 541630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:34,132-Speed 6317.41 samples/sec Loss 4.0168 LearningRate 0.0001 Epoch: 26 Global Step: 541640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:37,375-Speed 6315.55 samples/sec Loss 3.9723 LearningRate 0.0001 Epoch: 26 Global Step: 541650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:40,620-Speed 6313.98 samples/sec Loss 3.9540 LearningRate 0.0001 Epoch: 26 Global Step: 541660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:43,864-Speed 6314.06 samples/sec Loss 4.0103 LearningRate 0.0001 Epoch: 26 Global Step: 541670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 
2022-04-02 17:00:47,095-Speed 6339.65 samples/sec Loss 3.9637 LearningRate 0.0001 Epoch: 26 Global Step: 541680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:50,340-Speed 6311.41 samples/sec Loss 3.9667 LearningRate 0.0001 Epoch: 26 Global Step: 541690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:53,584-Speed 6315.18 samples/sec Loss 3.8918 LearningRate 0.0001 Epoch: 26 Global Step: 541700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:00:56,827-Speed 6316.55 samples/sec Loss 3.9784 LearningRate 0.0001 Epoch: 26 Global Step: 541710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:00,070-Speed 6316.11 samples/sec Loss 3.9648 LearningRate 0.0001 Epoch: 26 Global Step: 541720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:03,317-Speed 6310.93 samples/sec Loss 3.9355 LearningRate 0.0001 Epoch: 26 Global Step: 541730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:06,562-Speed 6312.40 samples/sec Loss 3.9432 LearningRate 0.0001 Epoch: 26 Global Step: 541740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:09,807-Speed 6312.17 samples/sec Loss 3.9448 LearningRate 0.0001 Epoch: 26 Global Step: 541750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:13,062-Speed 6293.25 samples/sec Loss 3.9357 LearningRate 0.0001 Epoch: 26 Global Step: 541760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:16,314-Speed 6298.72 samples/sec Loss 3.9531 LearningRate 0.0001 Epoch: 26 Global Step: 541770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:19,547-Speed 6336.44 samples/sec Loss 3.9440 LearningRate 0.0001 Epoch: 26 Global Step: 541780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:22,794-Speed 6309.28 samples/sec Loss 3.9679 LearningRate 0.0001 Epoch: 26 Global Step: 541790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:26,046-Speed 
6299.02 samples/sec Loss 3.9310 LearningRate 0.0001 Epoch: 26 Global Step: 541800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:29,293-Speed 6308.94 samples/sec Loss 3.9461 LearningRate 0.0001 Epoch: 26 Global Step: 541810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:32,537-Speed 6315.48 samples/sec Loss 3.9906 LearningRate 0.0001 Epoch: 26 Global Step: 541820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:35,780-Speed 6316.08 samples/sec Loss 4.0268 LearningRate 0.0001 Epoch: 26 Global Step: 541830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:39,023-Speed 6317.18 samples/sec Loss 3.9346 LearningRate 0.0001 Epoch: 26 Global Step: 541840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:42,269-Speed 6310.80 samples/sec Loss 3.9904 LearningRate 0.0001 Epoch: 26 Global Step: 541850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:45,514-Speed 6312.37 samples/sec Loss 3.9974 LearningRate 0.0001 Epoch: 26 Global Step: 541860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:48,763-Speed 6304.95 samples/sec Loss 4.0312 LearningRate 0.0001 Epoch: 26 Global Step: 541870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:51,992-Speed 6344.01 samples/sec Loss 3.9665 LearningRate 0.0001 Epoch: 26 Global Step: 541880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:55,235-Speed 6315.99 samples/sec Loss 4.0112 LearningRate 0.0001 Epoch: 26 Global Step: 541890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:01:58,478-Speed 6316.00 samples/sec Loss 3.9460 LearningRate 0.0001 Epoch: 26 Global Step: 541900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:01,720-Speed 6318.91 samples/sec Loss 3.9593 LearningRate 0.0001 Epoch: 26 Global Step: 541910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:04,962-Speed 6318.73 samples/sec Loss 3.8767 
LearningRate 0.0001 Epoch: 26 Global Step: 541920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:08,204-Speed 6319.16 samples/sec Loss 3.9418 LearningRate 0.0001 Epoch: 26 Global Step: 541930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:11,456-Speed 6299.12 samples/sec Loss 4.0246 LearningRate 0.0001 Epoch: 26 Global Step: 541940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:14,687-Speed 6338.24 samples/sec Loss 3.9509 LearningRate 0.0001 Epoch: 26 Global Step: 541950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:17,937-Speed 6304.23 samples/sec Loss 3.9668 LearningRate 0.0001 Epoch: 26 Global Step: 541960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:21,182-Speed 6311.46 samples/sec Loss 3.9363 LearningRate 0.0001 Epoch: 26 Global Step: 541970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:24,427-Speed 6314.36 samples/sec Loss 4.0158 LearningRate 0.0001 Epoch: 26 Global Step: 541980 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:27,670-Speed 6315.34 samples/sec Loss 3.9543 LearningRate 0.0001 Epoch: 26 Global Step: 541990 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:30,919-Speed 6305.52 samples/sec Loss 3.9419 LearningRate 0.0001 Epoch: 26 Global Step: 542000 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:34,167-Speed 6306.93 samples/sec Loss 3.9528 LearningRate 0.0001 Epoch: 26 Global Step: 542010 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:37,417-Speed 6302.93 samples/sec Loss 3.9743 LearningRate 0.0001 Epoch: 26 Global Step: 542020 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:40,665-Speed 6306.30 samples/sec Loss 4.0066 LearningRate 0.0001 Epoch: 26 Global Step: 542030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:43,911-Speed 6310.82 samples/sec Loss 3.9619 LearningRate 0.0001 Epoch: 26 Global 
Step: 542040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:02:47,156-Speed 6313.67 samples/sec Loss 4.0223 LearningRate 0.0001 Epoch: 26 Global Step: 542050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:50,402-Speed 6310.45 samples/sec Loss 3.9935 LearningRate 0.0001 Epoch: 26 Global Step: 542060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:53,646-Speed 6314.84 samples/sec Loss 4.0169 LearningRate 0.0001 Epoch: 26 Global Step: 542070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:02:56,893-Speed 6307.95 samples/sec Loss 3.8980 LearningRate 0.0001 Epoch: 26 Global Step: 542080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:00,134-Speed 6320.81 samples/sec Loss 3.9240 LearningRate 0.0001 Epoch: 26 Global Step: 542090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:03,379-Speed 6312.38 samples/sec Loss 3.9487 LearningRate 0.0001 Epoch: 26 Global Step: 542100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:06,621-Speed 6318.94 samples/sec Loss 3.9705 LearningRate 0.0001 Epoch: 26 Global Step: 542110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:09,866-Speed 6313.67 samples/sec Loss 4.0296 LearningRate 0.0001 Epoch: 26 Global Step: 542120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:13,114-Speed 6306.96 samples/sec Loss 3.9427 LearningRate 0.0001 Epoch: 26 Global Step: 542130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:16,357-Speed 6315.15 samples/sec Loss 4.0048 LearningRate 0.0001 Epoch: 26 Global Step: 542140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:19,588-Speed 6339.73 samples/sec Loss 3.9737 LearningRate 0.0001 Epoch: 26 Global Step: 542150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:03:22,838-Speed 6303.54 samples/sec Loss 3.9730 LearningRate 0.0001 Epoch: 26 Global Step: 542160 Fp16 Grad Scale: 
16384 Required: 26 hours Training: 2022-04-02 17:03:26,069-Speed 6340.59 samples/sec Loss 3.8854 LearningRate 0.0001 Epoch: 26 Global Step: 542170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:29,362-Speed 6219.88 samples/sec Loss 3.9199 LearningRate 0.0001 Epoch: 26 Global Step: 542180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:32,609-Speed 6309.98 samples/sec Loss 3.9290 LearningRate 0.0001 Epoch: 26 Global Step: 542190 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:35,854-Speed 6312.53 samples/sec Loss 3.9375 LearningRate 0.0001 Epoch: 26 Global Step: 542200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:39,097-Speed 6315.26 samples/sec Loss 3.9428 LearningRate 0.0001 Epoch: 26 Global Step: 542210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:42,348-Speed 6302.54 samples/sec Loss 3.9811 LearningRate 0.0001 Epoch: 26 Global Step: 542220 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:45,595-Speed 6308.47 samples/sec Loss 3.9708 LearningRate 0.0001 Epoch: 26 Global Step: 542230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:48,838-Speed 6315.52 samples/sec Loss 3.9566 LearningRate 0.0001 Epoch: 26 Global Step: 542240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:52,082-Speed 6315.42 samples/sec Loss 3.9994 LearningRate 0.0001 Epoch: 26 Global Step: 542250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:55,325-Speed 6317.10 samples/sec Loss 3.9141 LearningRate 0.0001 Epoch: 26 Global Step: 542260 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:03:58,572-Speed 6308.54 samples/sec Loss 4.0276 LearningRate 0.0001 Epoch: 26 Global Step: 542270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:01,818-Speed 6309.77 samples/sec Loss 3.9685 LearningRate 0.0001 Epoch: 26 Global Step: 542280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 
2022-04-02 17:04:05,065-Speed 6309.55 samples/sec Loss 4.0422 LearningRate 0.0001 Epoch: 26 Global Step: 542290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:08,309-Speed 6314.55 samples/sec Loss 3.9681 LearningRate 0.0001 Epoch: 26 Global Step: 542300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:11,564-Speed 6293.99 samples/sec Loss 3.9880 LearningRate 0.0001 Epoch: 26 Global Step: 542310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:14,806-Speed 6318.64 samples/sec Loss 3.9255 LearningRate 0.0001 Epoch: 26 Global Step: 542320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:18,050-Speed 6313.90 samples/sec Loss 3.9631 LearningRate 0.0001 Epoch: 26 Global Step: 542330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:21,297-Speed 6308.47 samples/sec Loss 4.0200 LearningRate 0.0001 Epoch: 26 Global Step: 542340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:24,542-Speed 6312.99 samples/sec Loss 3.9957 LearningRate 0.0001 Epoch: 26 Global Step: 542350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:27,785-Speed 6315.57 samples/sec Loss 4.0290 LearningRate 0.0001 Epoch: 26 Global Step: 542360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:31,011-Speed 6350.39 samples/sec Loss 3.9204 LearningRate 0.0001 Epoch: 26 Global Step: 542370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:34,256-Speed 6315.82 samples/sec Loss 3.9747 LearningRate 0.0001 Epoch: 26 Global Step: 542380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:37,505-Speed 6304.02 samples/sec Loss 3.9979 LearningRate 0.0001 Epoch: 26 Global Step: 542390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:04:40,733-Speed 6347.19 samples/sec Loss 3.9599 LearningRate 0.0001 Epoch: 26 Global Step: 542400 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:04:43,979-Speed 
6310.30 samples/sec Loss 3.9912 LearningRate 0.0001 Epoch: 26 Global Step: 542410 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:04:47,222-Speed 6316.28 samples/sec Loss 3.9972 LearningRate 0.0001 Epoch: 26 Global Step: 542420 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:04:50,465-Speed 6317.12 samples/sec Loss 3.9537 LearningRate 0.0001 Epoch: 26 Global Step: 542430 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:04:53,707-Speed 6317.97 samples/sec Loss 3.9758 LearningRate 0.0001 Epoch: 26 Global Step: 542440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:04:56,951-Speed 6316.00 samples/sec Loss 3.9597 LearningRate 0.0001 Epoch: 26 Global Step: 542450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:00,208-Speed 6289.72 samples/sec Loss 3.9838 LearningRate 0.0001 Epoch: 26 Global Step: 542460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:03,505-Speed 6211.90 samples/sec Loss 4.0137 LearningRate 0.0001 Epoch: 26 Global Step: 542470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:06,748-Speed 6316.97 samples/sec Loss 3.9948 LearningRate 0.0001 Epoch: 26 Global Step: 542480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:09,993-Speed 6312.91 samples/sec Loss 3.9835 LearningRate 0.0001 Epoch: 26 Global Step: 542490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:13,236-Speed 6317.56 samples/sec Loss 3.9607 LearningRate 0.0001 Epoch: 26 Global Step: 542500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:05:16,480-Speed 6313.27 samples/sec Loss 3.9433 LearningRate 0.0001 Epoch: 26 Global Step: 542510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:05:19,722-Speed 6318.07 samples/sec Loss 3.9378 LearningRate 0.0001 Epoch: 26 Global Step: 542520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:05:22,952-Speed 6342.60 samples/sec Loss 3.8964 
LearningRate 0.0001 Epoch: 26 Global Step: 542530 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:26,202-Speed 6303.68 samples/sec Loss 3.9566 LearningRate 0.0001 Epoch: 26 Global Step: 542540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:29,447-Speed 6311.72 samples/sec Loss 3.9729 LearningRate 0.0001 Epoch: 26 Global Step: 542550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:32,690-Speed 6316.68 samples/sec Loss 3.9703 LearningRate 0.0001 Epoch: 26 Global Step: 542560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:35,934-Speed 6315.46 samples/sec Loss 3.9649 LearningRate 0.0001 Epoch: 26 Global Step: 542570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:39,183-Speed 6305.15 samples/sec Loss 3.9293 LearningRate 0.0001 Epoch: 26 Global Step: 542580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:42,433-Speed 6302.94 samples/sec Loss 3.9675 LearningRate 0.0001 Epoch: 26 Global Step: 542590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:45,678-Speed 6311.57 samples/sec Loss 3.9164 LearningRate 0.0001 Epoch: 26 Global Step: 542600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:48,922-Speed 6315.60 samples/sec Loss 3.8664 LearningRate 0.0001 Epoch: 26 Global Step: 542610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:52,166-Speed 6314.72 samples/sec Loss 4.0019 LearningRate 0.0001 Epoch: 26 Global Step: 542620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:05:55,413-Speed 6308.38 samples/sec Loss 3.9369 LearningRate 0.0001 Epoch: 26 Global Step: 542630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:05:58,642-Speed 6342.62 samples/sec Loss 3.9463 LearningRate 0.0001 Epoch: 26 Global Step: 542640 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:01,888-Speed 6310.65 samples/sec Loss 3.9688 LearningRate 0.0001 Epoch: 26 Global Step: 
542650 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:05,140-Speed 6301.01 samples/sec Loss 3.9123 LearningRate 0.0001 Epoch: 26 Global Step: 542660 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:08,384-Speed 6314.72 samples/sec Loss 3.9695 LearningRate 0.0001 Epoch: 26 Global Step: 542670 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:11,628-Speed 6314.00 samples/sec Loss 4.0117 LearningRate 0.0001 Epoch: 26 Global Step: 542680 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:14,917-Speed 6227.85 samples/sec Loss 3.9760 LearningRate 0.0001 Epoch: 26 Global Step: 542690 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:18,160-Speed 6317.38 samples/sec Loss 3.9542 LearningRate 0.0001 Epoch: 26 Global Step: 542700 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:21,405-Speed 6312.28 samples/sec Loss 3.9133 LearningRate 0.0001 Epoch: 26 Global Step: 542710 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:24,649-Speed 6315.65 samples/sec Loss 3.9568 LearningRate 0.0001 Epoch: 26 Global Step: 542720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:27,896-Speed 6308.43 samples/sec Loss 3.9251 LearningRate 0.0001 Epoch: 26 Global Step: 542730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:06:31,140-Speed 6314.37 samples/sec Loss 3.9096 LearningRate 0.0001 Epoch: 26 Global Step: 542740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:34,389-Speed 6305.38 samples/sec Loss 3.9554 LearningRate 0.0001 Epoch: 26 Global Step: 542750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:37,636-Speed 6308.55 samples/sec Loss 3.9378 LearningRate 0.0001 Epoch: 26 Global Step: 542760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:40,892-Speed 6292.01 samples/sec Loss 3.9439 LearningRate 0.0001 Epoch: 26 Global Step: 542770 Fp16 Grad Scale: 16384 Required: 
26 hours Training: 2022-04-02 17:06:44,135-Speed 6314.90 samples/sec Loss 3.9412 LearningRate 0.0001 Epoch: 26 Global Step: 542780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:47,382-Speed 6308.91 samples/sec Loss 4.0103 LearningRate 0.0001 Epoch: 26 Global Step: 542790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:50,632-Speed 6304.35 samples/sec Loss 4.0043 LearningRate 0.0001 Epoch: 26 Global Step: 542800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:53,874-Speed 6318.45 samples/sec Loss 3.9325 LearningRate 0.0001 Epoch: 26 Global Step: 542810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:06:57,123-Speed 6303.31 samples/sec Loss 3.9034 LearningRate 0.0001 Epoch: 26 Global Step: 542820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:00,372-Speed 6305.32 samples/sec Loss 3.9280 LearningRate 0.0001 Epoch: 26 Global Step: 542830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:03,602-Speed 6342.63 samples/sec Loss 3.9820 LearningRate 0.0001 Epoch: 26 Global Step: 542840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:06,846-Speed 6314.33 samples/sec Loss 4.0171 LearningRate 0.0001 Epoch: 26 Global Step: 542850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:10,091-Speed 6314.06 samples/sec Loss 3.8683 LearningRate 0.0001 Epoch: 26 Global Step: 542860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:13,337-Speed 6309.06 samples/sec Loss 3.9314 LearningRate 0.0001 Epoch: 26 Global Step: 542870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:16,570-Speed 6336.97 samples/sec Loss 3.9118 LearningRate 0.0001 Epoch: 26 Global Step: 542880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:19,813-Speed 6317.54 samples/sec Loss 3.9844 LearningRate 0.0001 Epoch: 26 Global Step: 542890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
17:07:23,055-Speed 6319.00 samples/sec Loss 3.9605 LearningRate 0.0001 Epoch: 26 Global Step: 542900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:26,299-Speed 6313.29 samples/sec Loss 3.9508 LearningRate 0.0001 Epoch: 26 Global Step: 542910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:29,542-Speed 6316.76 samples/sec Loss 3.9178 LearningRate 0.0001 Epoch: 26 Global Step: 542920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:32,790-Speed 6306.86 samples/sec Loss 3.9721 LearningRate 0.0001 Epoch: 26 Global Step: 542930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:36,036-Speed 6310.02 samples/sec Loss 3.9343 LearningRate 0.0001 Epoch: 26 Global Step: 542940 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:39,281-Speed 6313.48 samples/sec Loss 3.9060 LearningRate 0.0001 Epoch: 26 Global Step: 542950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:42,526-Speed 6312.19 samples/sec Loss 3.9437 LearningRate 0.0001 Epoch: 26 Global Step: 542960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:45,770-Speed 6316.00 samples/sec Loss 3.9894 LearningRate 0.0001 Epoch: 26 Global Step: 542970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:07:49,013-Speed 6314.68 samples/sec Loss 3.9545 LearningRate 0.0001 Epoch: 26 Global Step: 542980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:52,265-Speed 6300.91 samples/sec Loss 4.0387 LearningRate 0.0001 Epoch: 26 Global Step: 542990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:55,514-Speed 6303.39 samples/sec Loss 3.9382 LearningRate 0.0001 Epoch: 26 Global Step: 543000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:07:58,761-Speed 6309.09 samples/sec Loss 4.0285 LearningRate 0.0001 Epoch: 26 Global Step: 543010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:02,006-Speed 6313.54 samples/sec 
Loss 3.9599 LearningRate 0.0001 Epoch: 26 Global Step: 543020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:05,259-Speed 6295.71 samples/sec Loss 3.9001 LearningRate 0.0001 Epoch: 26 Global Step: 543030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:08,510-Speed 6302.30 samples/sec Loss 3.8976 LearningRate 0.0001 Epoch: 26 Global Step: 543040 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:11,756-Speed 6311.04 samples/sec Loss 3.9284 LearningRate 0.0001 Epoch: 26 Global Step: 543050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:15,006-Speed 6303.13 samples/sec Loss 3.9479 LearningRate 0.0001 Epoch: 26 Global Step: 543060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:18,261-Speed 6292.73 samples/sec Loss 3.9534 LearningRate 0.0001 Epoch: 26 Global Step: 543070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:21,500-Speed 6324.32 samples/sec Loss 4.0342 LearningRate 0.0001 Epoch: 26 Global Step: 543080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:24,748-Speed 6305.70 samples/sec Loss 3.9868 LearningRate 0.0001 Epoch: 26 Global Step: 543090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:27,995-Speed 6309.82 samples/sec Loss 3.9492 LearningRate 0.0001 Epoch: 26 Global Step: 543100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:31,252-Speed 6290.34 samples/sec Loss 3.9366 LearningRate 0.0001 Epoch: 26 Global Step: 543110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:34,501-Speed 6305.28 samples/sec Loss 3.9533 LearningRate 0.0001 Epoch: 26 Global Step: 543120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:37,760-Speed 6284.93 samples/sec Loss 4.0083 LearningRate 0.0001 Epoch: 26 Global Step: 543130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:41,010-Speed 6303.26 samples/sec Loss 3.9836 LearningRate 0.0001 
Epoch: 26 Global Step: 543140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:44,257-Speed 6308.42 samples/sec Loss 4.0105 LearningRate 0.0001 Epoch: 26 Global Step: 543150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:47,506-Speed 6305.22 samples/sec Loss 4.0237 LearningRate 0.0001 Epoch: 26 Global Step: 543160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:50,751-Speed 6312.55 samples/sec Loss 3.9229 LearningRate 0.0001 Epoch: 26 Global Step: 543170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:53,984-Speed 6336.56 samples/sec Loss 3.9527 LearningRate 0.0001 Epoch: 26 Global Step: 543180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:08:57,229-Speed 6311.84 samples/sec Loss 4.0139 LearningRate 0.0001 Epoch: 26 Global Step: 543190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:09:00,463-Speed 6335.00 samples/sec Loss 3.9551 LearningRate 0.0001 Epoch: 26 Global Step: 543200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:03,711-Speed 6305.91 samples/sec Loss 3.9738 LearningRate 0.0001 Epoch: 26 Global Step: 543210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:06,954-Speed 6317.04 samples/sec Loss 3.9211 LearningRate 0.0001 Epoch: 26 Global Step: 543220 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:10,199-Speed 6311.39 samples/sec Loss 3.9258 LearningRate 0.0001 Epoch: 26 Global Step: 543230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:13,444-Speed 6314.31 samples/sec Loss 3.9081 LearningRate 0.0001 Epoch: 26 Global Step: 543240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:16,688-Speed 6314.85 samples/sec Loss 3.9577 LearningRate 0.0001 Epoch: 26 Global Step: 543250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:19,932-Speed 6313.12 samples/sec Loss 3.9749 LearningRate 0.0001 Epoch: 26 Global Step: 543260 Fp16 
Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:23,177-Speed 6313.08 samples/sec Loss 3.9769 LearningRate 0.0001 Epoch: 26 Global Step: 543270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:26,420-Speed 6317.00 samples/sec Loss 3.9356 LearningRate 0.0001 Epoch: 26 Global Step: 543280 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:29,665-Speed 6311.77 samples/sec Loss 3.9542 LearningRate 0.0001 Epoch: 26 Global Step: 543290 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:32,912-Speed 6310.27 samples/sec Loss 3.9838 LearningRate 0.0001 Epoch: 26 Global Step: 543300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:09:36,156-Speed 6313.02 samples/sec Loss 3.9464 LearningRate 0.0001 Epoch: 26 Global Step: 543310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:09:39,394-Speed 6327.99 samples/sec Loss 3.9455 LearningRate 0.0001 Epoch: 26 Global Step: 543320 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:42,640-Speed 6310.02 samples/sec Loss 3.9011 LearningRate 0.0001 Epoch: 26 Global Step: 543330 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:45,897-Speed 6289.85 samples/sec Loss 3.9462 LearningRate 0.0001 Epoch: 26 Global Step: 543340 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:49,146-Speed 6304.79 samples/sec Loss 3.9977 LearningRate 0.0001 Epoch: 26 Global Step: 543350 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:52,393-Speed 6309.66 samples/sec Loss 3.9851 LearningRate 0.0001 Epoch: 26 Global Step: 543360 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:55,639-Speed 6310.72 samples/sec Loss 3.9826 LearningRate 0.0001 Epoch: 26 Global Step: 543370 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:09:58,885-Speed 6309.58 samples/sec Loss 3.9523 LearningRate 0.0001 Epoch: 26 Global Step: 543380 Fp16 Grad Scale: 8192 Required: 26 hours 
Training: 2022-04-02 17:10:02,131-Speed 6310.34 samples/sec Loss 3.9804 LearningRate 0.0001 Epoch: 26 Global Step: 543390 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:10:05,378-Speed 6310.63 samples/sec Loss 3.9499 LearningRate 0.0001 Epoch: 26 Global Step: 543400 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:10:08,619-Speed 6319.31 samples/sec Loss 3.9278 LearningRate 0.0001 Epoch: 26 Global Step: 543410 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:10:11,864-Speed 6314.03 samples/sec Loss 3.9258 LearningRate 0.0001 Epoch: 26 Global Step: 543420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:15,107-Speed 6316.29 samples/sec Loss 3.9089 LearningRate 0.0001 Epoch: 26 Global Step: 543430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:18,352-Speed 6311.84 samples/sec Loss 3.9553 LearningRate 0.0001 Epoch: 26 Global Step: 543440 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:21,601-Speed 6305.58 samples/sec Loss 3.9337 LearningRate 0.0001 Epoch: 26 Global Step: 543450 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:24,847-Speed 6310.56 samples/sec Loss 3.9530 LearningRate 0.0001 Epoch: 26 Global Step: 543460 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:28,099-Speed 6299.00 samples/sec Loss 3.9307 LearningRate 0.0001 Epoch: 26 Global Step: 543470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:31,344-Speed 6311.40 samples/sec Loss 3.9525 LearningRate 0.0001 Epoch: 26 Global Step: 543480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:34,584-Speed 6322.92 samples/sec Loss 3.8933 LearningRate 0.0001 Epoch: 26 Global Step: 543490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:37,832-Speed 6306.34 samples/sec Loss 3.9486 LearningRate 0.0001 Epoch: 26 Global Step: 543500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
17:10:41,074-Speed 6319.57 samples/sec Loss 3.9463 LearningRate 0.0001 Epoch: 26 Global Step: 543510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:44,307-Speed 6335.64 samples/sec Loss 3.9843 LearningRate 0.0001 Epoch: 26 Global Step: 543520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:47,549-Speed 6318.81 samples/sec Loss 4.0345 LearningRate 0.0001 Epoch: 26 Global Step: 543530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:50,793-Speed 6315.92 samples/sec Loss 4.0186 LearningRate 0.0001 Epoch: 26 Global Step: 543540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:10:54,028-Speed 6332.35 samples/sec Loss 3.9513 LearningRate 0.0001 Epoch: 26 Global Step: 543550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:10:57,270-Speed 6316.92 samples/sec Loss 3.9961 LearningRate 0.0001 Epoch: 26 Global Step: 543560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:00,515-Speed 6313.35 samples/sec Loss 3.9524 LearningRate 0.0001 Epoch: 26 Global Step: 543570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:03,770-Speed 6294.20 samples/sec Loss 3.9479 LearningRate 0.0001 Epoch: 26 Global Step: 543580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:07,011-Speed 6318.57 samples/sec Loss 3.9964 LearningRate 0.0001 Epoch: 26 Global Step: 543590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:10,256-Speed 6314.25 samples/sec Loss 3.9702 LearningRate 0.0001 Epoch: 26 Global Step: 543600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:13,503-Speed 6307.96 samples/sec Loss 3.9500 LearningRate 0.0001 Epoch: 26 Global Step: 543610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:16,749-Speed 6311.79 samples/sec Loss 3.9402 LearningRate 0.0001 Epoch: 26 Global Step: 543620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:19,990-Speed 6319.75 samples/sec 
Loss 4.0014 LearningRate 0.0001 Epoch: 26 Global Step: 543630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:23,236-Speed 6309.53 samples/sec Loss 3.9746 LearningRate 0.0001 Epoch: 26 Global Step: 543640 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:26,487-Speed 6301.71 samples/sec Loss 3.9600 LearningRate 0.0001 Epoch: 26 Global Step: 543650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:29,737-Speed 6302.69 samples/sec Loss 3.9473 LearningRate 0.0001 Epoch: 26 Global Step: 543660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:32,982-Speed 6313.30 samples/sec Loss 3.9608 LearningRate 0.0001 Epoch: 26 Global Step: 543670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:36,230-Speed 6306.79 samples/sec Loss 3.9592 LearningRate 0.0001 Epoch: 26 Global Step: 543680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:39,476-Speed 6310.87 samples/sec Loss 3.9291 LearningRate 0.0001 Epoch: 26 Global Step: 543690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:42,719-Speed 6316.78 samples/sec Loss 3.9607 LearningRate 0.0001 Epoch: 26 Global Step: 543700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:45,962-Speed 6315.13 samples/sec Loss 3.9421 LearningRate 0.0001 Epoch: 26 Global Step: 543710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:11:49,192-Speed 6343.38 samples/sec Loss 3.9629 LearningRate 0.0001 Epoch: 26 Global Step: 543720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:52,441-Speed 6305.51 samples/sec Loss 3.9495 LearningRate 0.0001 Epoch: 26 Global Step: 543730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:55,682-Speed 6319.79 samples/sec Loss 3.9741 LearningRate 0.0001 Epoch: 26 Global Step: 543740 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:11:58,926-Speed 6315.20 samples/sec Loss 3.9621 LearningRate 0.0001 
Epoch: 26 Global Step: 543750 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:02,176-Speed 6303.26 samples/sec Loss 3.9780 LearningRate 0.0001 Epoch: 26 Global Step: 543760 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:05,421-Speed 6311.39 samples/sec Loss 3.9000 LearningRate 0.0001 Epoch: 26 Global Step: 543770 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:08,665-Speed 6315.00 samples/sec Loss 3.9440 LearningRate 0.0001 Epoch: 26 Global Step: 543780 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:11,911-Speed 6310.96 samples/sec Loss 3.9537 LearningRate 0.0001 Epoch: 26 Global Step: 543790 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:15,156-Speed 6313.78 samples/sec Loss 3.9144 LearningRate 0.0001 Epoch: 26 Global Step: 543800 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:18,411-Speed 6292.02 samples/sec Loss 3.9348 LearningRate 0.0001 Epoch: 26 Global Step: 543810 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:21,662-Speed 6301.91 samples/sec Loss 3.9614 LearningRate 0.0001 Epoch: 26 Global Step: 543820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:24,908-Speed 6310.30 samples/sec Loss 3.9069 LearningRate 0.0001 Epoch: 26 Global Step: 543830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:28,153-Speed 6312.64 samples/sec Loss 3.9568 LearningRate 0.0001 Epoch: 26 Global Step: 543840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:31,398-Speed 6312.19 samples/sec Loss 3.9181 LearningRate 0.0001 Epoch: 26 Global Step: 543850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:34,642-Speed 6313.48 samples/sec Loss 3.9525 LearningRate 0.0001 Epoch: 26 Global Step: 543860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:37,893-Speed 6301.86 samples/sec Loss 3.9626 LearningRate 0.0001 Epoch: 26 Global Step: 543870 Fp16 
Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:12:41,124-Speed 6339.58 samples/sec Loss 3.9548 LearningRate 0.0001 Epoch: 26 Global Step: 543880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:44,367-Speed 6317.75 samples/sec Loss 3.9337 LearningRate 0.0001 Epoch: 26 Global Step: 543890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:47,615-Speed 6306.01 samples/sec Loss 3.9587 LearningRate 0.0001 Epoch: 26 Global Step: 543900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:50,874-Speed 6286.46 samples/sec Loss 3.9504 LearningRate 0.0001 Epoch: 26 Global Step: 543910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:54,121-Speed 6307.26 samples/sec Loss 3.9686 LearningRate 0.0001 Epoch: 26 Global Step: 543920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:12:57,366-Speed 6312.43 samples/sec Loss 3.9614 LearningRate 0.0001 Epoch: 26 Global Step: 543930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:13:00,614-Speed 6307.65 samples/sec Loss 3.9661 LearningRate 0.0001 Epoch: 26 Global Step: 543940 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:13:03,860-Speed 6311.62 samples/sec Loss 3.9208 LearningRate 0.0001 Epoch: 26 Global Step: 543950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:13:07,102-Speed 6318.24 samples/sec Loss 3.9020 LearningRate 0.0001 Epoch: 26 Global Step: 543960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:13:10,350-Speed 6306.61 samples/sec Loss 3.9435 LearningRate 0.0001 Epoch: 26 Global Step: 543970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:13:13,594-Speed 6315.30 samples/sec Loss 3.9375 LearningRate 0.0001 Epoch: 26 Global Step: 543980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:16,845-Speed 6301.04 samples/sec Loss 3.9918 LearningRate 0.0001 Epoch: 26 Global Step: 543990 Fp16 Grad Scale: 16384 Required: 26 hours 
Training: 2022-04-02 17:13:20,098-Speed 6297.13 samples/sec Loss 3.9459 LearningRate 0.0001 Epoch: 26 Global Step: 544000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:23,343-Speed 6313.57 samples/sec Loss 4.0061 LearningRate 0.0001 Epoch: 26 Global Step: 544010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:26,586-Speed 6314.93 samples/sec Loss 3.8892 LearningRate 0.0001 Epoch: 26 Global Step: 544020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:29,833-Speed 6309.86 samples/sec Loss 3.9194 LearningRate 0.0001 Epoch: 26 Global Step: 544030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:33,077-Speed 6314.82 samples/sec Loss 3.9338 LearningRate 0.0001 Epoch: 26 Global Step: 544040 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:36,325-Speed 6305.68 samples/sec Loss 4.0217 LearningRate 0.0001 Epoch: 26 Global Step: 544050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:39,567-Speed 6318.26 samples/sec Loss 4.0115 LearningRate 0.0001 Epoch: 26 Global Step: 544060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:42,834-Speed 6270.75 samples/sec Loss 3.9650 LearningRate 0.0001 Epoch: 26 Global Step: 544070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:46,066-Speed 6337.41 samples/sec Loss 3.9018 LearningRate 0.0001 Epoch: 26 Global Step: 544080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:49,309-Speed 6317.89 samples/sec Loss 3.9530 LearningRate 0.0001 Epoch: 26 Global Step: 544090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:52,555-Speed 6311.17 samples/sec Loss 4.0084 LearningRate 0.0001 Epoch: 26 Global Step: 544100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:13:55,795-Speed 6320.48 samples/sec Loss 3.9577 LearningRate 0.0001 Epoch: 26 Global Step: 544110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
17:13:59,039-Speed 6316.55 samples/sec Loss 3.9746 LearningRate 0.0001 Epoch: 26 Global Step: 544120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:02,281-Speed 6318.04 samples/sec Loss 3.8569 LearningRate 0.0001 Epoch: 26 Global Step: 544130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:05,526-Speed 6312.20 samples/sec Loss 3.9781 LearningRate 0.0001 Epoch: 26 Global Step: 544140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:08,771-Speed 6312.98 samples/sec Loss 3.9439 LearningRate 0.0001 Epoch: 26 Global Step: 544150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:12,012-Speed 6318.93 samples/sec Loss 3.9702 LearningRate 0.0001 Epoch: 26 Global Step: 544160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:15,259-Speed 6309.99 samples/sec Loss 3.9960 LearningRate 0.0001 Epoch: 26 Global Step: 544170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:18,517-Speed 6288.58 samples/sec Loss 3.8898 LearningRate 0.0001 Epoch: 26 Global Step: 544180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:21,762-Speed 6311.58 samples/sec Loss 4.0055 LearningRate 0.0001 Epoch: 26 Global Step: 544190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:25,023-Speed 6282.19 samples/sec Loss 3.9634 LearningRate 0.0001 Epoch: 26 Global Step: 544200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:28,265-Speed 6319.10 samples/sec Loss 3.9510 LearningRate 0.0001 Epoch: 26 Global Step: 544210 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:31,514-Speed 6304.52 samples/sec Loss 3.9428 LearningRate 0.0001 Epoch: 26 Global Step: 544220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:34,757-Speed 6316.77 samples/sec Loss 3.9338 LearningRate 0.0001 Epoch: 26 Global Step: 544230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:38,004-Speed 6308.58 
samples/sec Loss 3.9773 LearningRate 0.0001 Epoch: 26 Global Step: 544240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:41,246-Speed 6319.63 samples/sec Loss 3.9724 LearningRate 0.0001 Epoch: 26 Global Step: 544250 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:44,488-Speed 6316.73 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 26 Global Step: 544260 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:47,735-Speed 6310.07 samples/sec Loss 3.9299 LearningRate 0.0001 Epoch: 26 Global Step: 544270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:50,960-Speed 6350.86 samples/sec Loss 3.9569 LearningRate 0.0001 Epoch: 26 Global Step: 544280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:54,206-Speed 6310.31 samples/sec Loss 3.8962 LearningRate 0.0001 Epoch: 26 Global Step: 544290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:14:57,448-Speed 6318.57 samples/sec Loss 3.9246 LearningRate 0.0001 Epoch: 26 Global Step: 544300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:00,695-Speed 6309.55 samples/sec Loss 3.9558 LearningRate 0.0001 Epoch: 26 Global Step: 544310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:03,945-Speed 6303.20 samples/sec Loss 3.9575 LearningRate 0.0001 Epoch: 26 Global Step: 544320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:07,186-Speed 6319.94 samples/sec Loss 3.9994 LearningRate 0.0001 Epoch: 26 Global Step: 544330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:10,430-Speed 6314.58 samples/sec Loss 3.9949 LearningRate 0.0001 Epoch: 26 Global Step: 544340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:13,675-Speed 6312.28 samples/sec Loss 3.9421 LearningRate 0.0001 Epoch: 26 Global Step: 544350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:16,921-Speed 6312.14 samples/sec Loss 4.0708 
LearningRate 0.0001 Epoch: 26 Global Step: 544360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:20,169-Speed 6305.64 samples/sec Loss 3.9506 LearningRate 0.0001 Epoch: 26 Global Step: 544370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:23,402-Speed 6335.62 samples/sec Loss 3.9852 LearningRate 0.0001 Epoch: 26 Global Step: 544380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:26,649-Speed 6309.41 samples/sec Loss 3.9530 LearningRate 0.0001 Epoch: 26 Global Step: 544390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:29,946-Speed 6214.83 samples/sec Loss 3.9959 LearningRate 0.0001 Epoch: 26 Global Step: 544400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:33,193-Speed 6307.51 samples/sec Loss 3.9259 LearningRate 0.0001 Epoch: 26 Global Step: 544410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:36,433-Speed 6322.63 samples/sec Loss 3.8844 LearningRate 0.0001 Epoch: 26 Global Step: 544420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:39,677-Speed 6314.47 samples/sec Loss 3.9891 LearningRate 0.0001 Epoch: 26 Global Step: 544430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:42,920-Speed 6316.92 samples/sec Loss 3.9740 LearningRate 0.0001 Epoch: 26 Global Step: 544440 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:46,164-Speed 6314.09 samples/sec Loss 3.9474 LearningRate 0.0001 Epoch: 26 Global Step: 544450 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:49,412-Speed 6306.56 samples/sec Loss 3.9881 LearningRate 0.0001 Epoch: 26 Global Step: 544460 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:52,657-Speed 6314.26 samples/sec Loss 3.9638 LearningRate 0.0001 Epoch: 26 Global Step: 544470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:15:55,889-Speed 6337.74 samples/sec Loss 4.0224 LearningRate 0.0001 Epoch: 26 
Global Step: 544480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:15:59,129-Speed 6320.95 samples/sec Loss 3.9622 LearningRate 0.0001 Epoch: 26 Global Step: 544490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:02,374-Speed 6312.47 samples/sec Loss 3.9256 LearningRate 0.0001 Epoch: 26 Global Step: 544500 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:05,617-Speed 6316.89 samples/sec Loss 3.8604 LearningRate 0.0001 Epoch: 26 Global Step: 544510 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:08,859-Speed 6318.37 samples/sec Loss 3.9476 LearningRate 0.0001 Epoch: 26 Global Step: 544520 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:12,106-Speed 6310.04 samples/sec Loss 3.9072 LearningRate 0.0001 Epoch: 26 Global Step: 544530 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:15,349-Speed 6316.47 samples/sec Loss 3.9625 LearningRate 0.0001 Epoch: 26 Global Step: 544540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:18,592-Speed 6316.12 samples/sec Loss 3.9956 LearningRate 0.0001 Epoch: 26 Global Step: 544550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:21,840-Speed 6306.84 samples/sec Loss 3.9075 LearningRate 0.0001 Epoch: 26 Global Step: 544560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:25,142-Speed 6203.56 samples/sec Loss 3.9251 LearningRate 0.0001 Epoch: 26 Global Step: 544570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:28,397-Speed 6292.14 samples/sec Loss 4.0601 LearningRate 0.0001 Epoch: 26 Global Step: 544580 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:16:31,630-Speed 6337.16 samples/sec Loss 3.9336 LearningRate 0.0001 Epoch: 26 Global Step: 544590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:34,874-Speed 6315.68 samples/sec Loss 3.9341 LearningRate 0.0001 Epoch: 26 Global Step: 544600 Fp16 Grad Scale: 8192 
Required: 26 hours Training: 2022-04-02 17:16:38,114-Speed 6321.03 samples/sec Loss 3.9695 LearningRate 0.0001 Epoch: 26 Global Step: 544610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:41,358-Speed 6314.83 samples/sec Loss 3.9796 LearningRate 0.0001 Epoch: 26 Global Step: 544620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:44,599-Speed 6321.64 samples/sec Loss 3.9557 LearningRate 0.0001 Epoch: 26 Global Step: 544630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:47,848-Speed 6304.65 samples/sec Loss 3.9710 LearningRate 0.0001 Epoch: 26 Global Step: 544640 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:51,091-Speed 6317.27 samples/sec Loss 3.9875 LearningRate 0.0001 Epoch: 26 Global Step: 544650 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:54,334-Speed 6316.47 samples/sec Loss 3.9489 LearningRate 0.0001 Epoch: 26 Global Step: 544660 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:16:57,575-Speed 6320.11 samples/sec Loss 3.9581 LearningRate 0.0001 Epoch: 26 Global Step: 544670 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:00,911-Speed 6140.65 samples/sec Loss 3.9563 LearningRate 0.0001 Epoch: 26 Global Step: 544680 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:04,213-Speed 6203.06 samples/sec Loss 3.9951 LearningRate 0.0001 Epoch: 26 Global Step: 544690 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:07,493-Speed 6244.51 samples/sec Loss 3.9521 LearningRate 0.0001 Epoch: 26 Global Step: 544700 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:10,735-Speed 6318.33 samples/sec Loss 3.8886 LearningRate 0.0001 Epoch: 26 Global Step: 544710 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:13,979-Speed 6314.68 samples/sec Loss 3.9316 LearningRate 0.0001 Epoch: 26 Global Step: 544720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
17:17:17,226-Speed 6309.77 samples/sec Loss 3.9540 LearningRate 0.0001 Epoch: 26 Global Step: 544730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:20,468-Speed 6318.30 samples/sec Loss 3.9192 LearningRate 0.0001 Epoch: 26 Global Step: 544740 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:23,714-Speed 6310.26 samples/sec Loss 3.9495 LearningRate 0.0001 Epoch: 26 Global Step: 544750 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:26,958-Speed 6314.82 samples/sec Loss 3.9741 LearningRate 0.0001 Epoch: 26 Global Step: 544760 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:30,202-Speed 6314.26 samples/sec Loss 3.9103 LearningRate 0.0001 Epoch: 26 Global Step: 544770 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:33,453-Speed 6300.94 samples/sec Loss 3.9743 LearningRate 0.0001 Epoch: 26 Global Step: 544780 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:36,705-Speed 6299.58 samples/sec Loss 3.9789 LearningRate 0.0001 Epoch: 26 Global Step: 544790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:17:39,949-Speed 6315.33 samples/sec Loss 3.9251 LearningRate 0.0001 Epoch: 26 Global Step: 544800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:17:43,189-Speed 6322.20 samples/sec Loss 3.9177 LearningRate 0.0001 Epoch: 26 Global Step: 544810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:17:46,425-Speed 6330.60 samples/sec Loss 3.9134 LearningRate 0.0001 Epoch: 26 Global Step: 544820 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:49,672-Speed 6308.81 samples/sec Loss 3.9496 LearningRate 0.0001 Epoch: 26 Global Step: 544830 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:52,916-Speed 6315.27 samples/sec Loss 3.9830 LearningRate 0.0001 Epoch: 26 Global Step: 544840 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:56,156-Speed 6322.61 samples/sec 
Loss 3.9889 LearningRate 0.0001 Epoch: 26 Global Step: 544850 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:17:59,400-Speed 6313.58 samples/sec Loss 4.0522 LearningRate 0.0001 Epoch: 26 Global Step: 544860 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:02,644-Speed 6314.54 samples/sec Loss 3.8841 LearningRate 0.0001 Epoch: 26 Global Step: 544870 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:05,892-Speed 6306.81 samples/sec Loss 3.9650 LearningRate 0.0001 Epoch: 26 Global Step: 544880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:09,135-Speed 6317.65 samples/sec Loss 3.9558 LearningRate 0.0001 Epoch: 26 Global Step: 544890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:12,381-Speed 6310.24 samples/sec Loss 3.9831 LearningRate 0.0001 Epoch: 26 Global Step: 544900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:15,628-Speed 6308.53 samples/sec Loss 3.9206 LearningRate 0.0001 Epoch: 26 Global Step: 544910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:18,874-Speed 6309.63 samples/sec Loss 3.9684 LearningRate 0.0001 Epoch: 26 Global Step: 544920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:22,118-Speed 6316.44 samples/sec Loss 3.8926 LearningRate 0.0001 Epoch: 26 Global Step: 544930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:25,361-Speed 6315.86 samples/sec Loss 3.9837 LearningRate 0.0001 Epoch: 26 Global Step: 544940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:28,608-Speed 6307.69 samples/sec Loss 3.9724 LearningRate 0.0001 Epoch: 26 Global Step: 544950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:31,854-Speed 6311.80 samples/sec Loss 4.0245 LearningRate 0.0001 Epoch: 26 Global Step: 544960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:35,098-Speed 6314.42 samples/sec Loss 3.9892 LearningRate 0.0001 Epoch: 
26 Global Step: 544970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:18:38,327-Speed 6343.00 samples/sec Loss 3.9546 LearningRate 0.0001 Epoch: 26 Global Step: 544980 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:41,597-Speed 6265.35 samples/sec Loss 3.9015 LearningRate 0.0001 Epoch: 26 Global Step: 544990 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:44,837-Speed 6321.36 samples/sec Loss 3.9809 LearningRate 0.0001 Epoch: 26 Global Step: 545000 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:48,083-Speed 6311.55 samples/sec Loss 3.9550 LearningRate 0.0001 Epoch: 26 Global Step: 545010 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:51,342-Speed 6286.02 samples/sec Loss 3.9418 LearningRate 0.0001 Epoch: 26 Global Step: 545020 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:54,582-Speed 6322.29 samples/sec Loss 3.9437 LearningRate 0.0001 Epoch: 26 Global Step: 545030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:18:57,827-Speed 6313.44 samples/sec Loss 3.8646 LearningRate 0.0001 Epoch: 26 Global Step: 545040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:01,068-Speed 6319.91 samples/sec Loss 3.9978 LearningRate 0.0001 Epoch: 26 Global Step: 545050 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:04,311-Speed 6317.27 samples/sec Loss 3.9378 LearningRate 0.0001 Epoch: 26 Global Step: 545060 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:07,556-Speed 6311.07 samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 545070 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:10,801-Speed 6313.82 samples/sec Loss 3.9651 LearningRate 0.0001 Epoch: 26 Global Step: 545080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:19:14,036-Speed 6332.17 samples/sec Loss 4.0217 LearningRate 0.0001 Epoch: 26 Global Step: 545090 Fp16 Grad Scale: 
8192 Required: 26 hours Training: 2022-04-02 17:19:17,278-Speed 6318.08 samples/sec Loss 3.9030 LearningRate 0.0001 Epoch: 26 Global Step: 545100 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:20,519-Speed 6320.16 samples/sec Loss 3.9472 LearningRate 0.0001 Epoch: 26 Global Step: 545110 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:23,768-Speed 6306.28 samples/sec Loss 3.9121 LearningRate 0.0001 Epoch: 26 Global Step: 545120 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:27,009-Speed 6319.10 samples/sec Loss 3.9280 LearningRate 0.0001 Epoch: 26 Global Step: 545130 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:30,256-Speed 6309.28 samples/sec Loss 3.9905 LearningRate 0.0001 Epoch: 26 Global Step: 545140 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:33,512-Speed 6290.95 samples/sec Loss 3.9525 LearningRate 0.0001 Epoch: 26 Global Step: 545150 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:36,757-Speed 6312.46 samples/sec Loss 4.0051 LearningRate 0.0001 Epoch: 26 Global Step: 545160 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:40,003-Speed 6310.62 samples/sec Loss 3.9856 LearningRate 0.0001 Epoch: 26 Global Step: 545170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:43,250-Speed 6308.40 samples/sec Loss 3.9536 LearningRate 0.0001 Epoch: 26 Global Step: 545180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:19:46,499-Speed 6305.37 samples/sec Loss 4.0152 LearningRate 0.0001 Epoch: 26 Global Step: 545190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:19:49,746-Speed 6309.99 samples/sec Loss 3.9211 LearningRate 0.0001 Epoch: 26 Global Step: 545200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:19:53,009-Speed 6277.46 samples/sec Loss 3.9391 LearningRate 0.0001 Epoch: 26 Global Step: 545210 Fp16 Grad Scale: 16384 Required: 26 hours Training: 
2022-04-02 17:19:56,254-Speed 6311.79 samples/sec Loss 3.8603 LearningRate 0.0001 Epoch: 26 Global Step: 545220 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:19:59,516-Speed 6279.18 samples/sec Loss 3.8711 LearningRate 0.0001 Epoch: 26 Global Step: 545230 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:02,760-Speed 6314.90 samples/sec Loss 3.9441 LearningRate 0.0001 Epoch: 26 Global Step: 545240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:06,010-Speed 6304.07 samples/sec Loss 3.9386 LearningRate 0.0001 Epoch: 26 Global Step: 545250 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:09,257-Speed 6308.53 samples/sec Loss 3.9558 LearningRate 0.0001 Epoch: 26 Global Step: 545260 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:12,503-Speed 6311.45 samples/sec Loss 3.9345 LearningRate 0.0001 Epoch: 26 Global Step: 545270 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:15,751-Speed 6307.12 samples/sec Loss 3.9505 LearningRate 0.0001 Epoch: 26 Global Step: 545280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:18,981-Speed 6340.96 samples/sec Loss 3.9249 LearningRate 0.0001 Epoch: 26 Global Step: 545290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:22,232-Speed 6302.57 samples/sec Loss 3.9625 LearningRate 0.0001 Epoch: 26 Global Step: 545300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:25,480-Speed 6306.80 samples/sec Loss 3.9581 LearningRate 0.0001 Epoch: 26 Global Step: 545310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:28,725-Speed 6311.58 samples/sec Loss 3.9101 LearningRate 0.0001 Epoch: 26 Global Step: 545320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:31,970-Speed 6313.85 samples/sec Loss 3.9348 LearningRate 0.0001 Epoch: 26 Global Step: 545330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:35,211-Speed 
6320.06 samples/sec Loss 3.9221 LearningRate 0.0001 Epoch: 26 Global Step: 545340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:38,457-Speed 6311.02 samples/sec Loss 3.9087 LearningRate 0.0001 Epoch: 26 Global Step: 545350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:41,701-Speed 6314.47 samples/sec Loss 3.9549 LearningRate 0.0001 Epoch: 26 Global Step: 545360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:20:44,927-Speed 6349.41 samples/sec Loss 4.0178 LearningRate 0.0001 Epoch: 26 Global Step: 545370 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:20:48,173-Speed 6309.76 samples/sec Loss 3.9484 LearningRate 0.0001 Epoch: 26 Global Step: 545380 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:20:51,462-Speed 6228.37 samples/sec Loss 3.9199 LearningRate 0.0001 Epoch: 26 Global Step: 545390 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:20:54,790-Speed 6156.32 samples/sec Loss 3.9565 LearningRate 0.0001 Epoch: 26 Global Step: 545400 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:20:58,034-Speed 6313.50 samples/sec Loss 3.9357 LearningRate 0.0001 Epoch: 26 Global Step: 545410 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:01,279-Speed 6312.16 samples/sec Loss 3.9343 LearningRate 0.0001 Epoch: 26 Global Step: 545420 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:04,530-Speed 6301.36 samples/sec Loss 3.9148 LearningRate 0.0001 Epoch: 26 Global Step: 545430 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:07,773-Speed 6317.71 samples/sec Loss 3.9412 LearningRate 0.0001 Epoch: 26 Global Step: 545440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:11,018-Speed 6310.92 samples/sec Loss 4.0032 LearningRate 0.0001 Epoch: 26 Global Step: 545450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:14,268-Speed 6304.24 samples/sec Loss 3.9333 
LearningRate 0.0001 Epoch: 26 Global Step: 545460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:21:17,518-Speed 6304.32 samples/sec Loss 3.9542 LearningRate 0.0001 Epoch: 26 Global Step: 545470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:20,763-Speed 6312.34 samples/sec Loss 3.9554 LearningRate 0.0001 Epoch: 26 Global Step: 545480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:24,014-Speed 6301.15 samples/sec Loss 3.8981 LearningRate 0.0001 Epoch: 26 Global Step: 545490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:27,259-Speed 6311.66 samples/sec Loss 3.9795 LearningRate 0.0001 Epoch: 26 Global Step: 545500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:30,507-Speed 6306.66 samples/sec Loss 3.9291 LearningRate 0.0001 Epoch: 26 Global Step: 545510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:33,750-Speed 6318.02 samples/sec Loss 3.9757 LearningRate 0.0001 Epoch: 26 Global Step: 545520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:36,999-Speed 6304.99 samples/sec Loss 3.9535 LearningRate 0.0001 Epoch: 26 Global Step: 545530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:40,246-Speed 6307.61 samples/sec Loss 3.9616 LearningRate 0.0001 Epoch: 26 Global Step: 545540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:43,493-Speed 6309.95 samples/sec Loss 3.8665 LearningRate 0.0001 Epoch: 26 Global Step: 545550 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:46,740-Speed 6308.27 samples/sec Loss 3.9808 LearningRate 0.0001 Epoch: 26 Global Step: 545560 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:49,972-Speed 6338.16 samples/sec Loss 3.9823 LearningRate 0.0001 Epoch: 26 Global Step: 545570 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:53,212-Speed 6321.51 samples/sec Loss 3.9597 LearningRate 0.0001 Epoch: 26 
Global Step: 545580 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:56,461-Speed 6305.26 samples/sec Loss 3.9369 LearningRate 0.0001 Epoch: 26 Global Step: 545590 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:21:59,716-Speed 6293.05 samples/sec Loss 3.9558 LearningRate 0.0001 Epoch: 26 Global Step: 545600 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:02,963-Speed 6309.29 samples/sec Loss 3.9581 LearningRate 0.0001 Epoch: 26 Global Step: 545610 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:06,208-Speed 6312.04 samples/sec Loss 3.8689 LearningRate 0.0001 Epoch: 26 Global Step: 545620 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:09,453-Speed 6312.65 samples/sec Loss 3.9601 LearningRate 0.0001 Epoch: 26 Global Step: 545630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:12,701-Speed 6306.89 samples/sec Loss 3.8577 LearningRate 0.0001 Epoch: 26 Global Step: 545640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:15,948-Speed 6309.28 samples/sec Loss 3.8319 LearningRate 0.0001 Epoch: 26 Global Step: 545650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:19,195-Speed 6308.82 samples/sec Loss 3.9035 LearningRate 0.0001 Epoch: 26 Global Step: 545660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:22,427-Speed 6338.95 samples/sec Loss 3.9470 LearningRate 0.0001 Epoch: 26 Global Step: 545670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:25,671-Speed 6314.41 samples/sec Loss 3.8869 LearningRate 0.0001 Epoch: 26 Global Step: 545680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:28,917-Speed 6309.84 samples/sec Loss 3.9319 LearningRate 0.0001 Epoch: 26 Global Step: 545690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:32,164-Speed 6309.32 samples/sec Loss 3.9106 LearningRate 0.0001 Epoch: 26 Global Step: 545700 Fp16 Grad 
Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:35,409-Speed 6313.92 samples/sec Loss 3.9980 LearningRate 0.0001 Epoch: 26 Global Step: 545710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:38,653-Speed 6314.21 samples/sec Loss 3.9079 LearningRate 0.0001 Epoch: 26 Global Step: 545720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:41,895-Speed 6318.64 samples/sec Loss 3.9185 LearningRate 0.0001 Epoch: 26 Global Step: 545730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:45,141-Speed 6309.99 samples/sec Loss 3.9316 LearningRate 0.0001 Epoch: 26 Global Step: 545740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:48,443-Speed 6203.44 samples/sec Loss 3.9377 LearningRate 0.0001 Epoch: 26 Global Step: 545750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:51,690-Speed 6308.70 samples/sec Loss 3.9291 LearningRate 0.0001 Epoch: 26 Global Step: 545760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:22:54,933-Speed 6317.94 samples/sec Loss 3.9750 LearningRate 0.0001 Epoch: 26 Global Step: 545770 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-04-02 17:22:58,160-Speed 6346.13 samples/sec Loss 3.9635 LearningRate 0.0001 Epoch: 26 Global Step: 545780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:01,406-Speed 6310.53 samples/sec Loss 3.9126 LearningRate 0.0001 Epoch: 26 Global Step: 545790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:04,650-Speed 6314.88 samples/sec Loss 3.9173 LearningRate 0.0001 Epoch: 26 Global Step: 545800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:07,891-Speed 6320.68 samples/sec Loss 3.9122 LearningRate 0.0001 Epoch: 26 Global Step: 545810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:11,132-Speed 6320.88 samples/sec Loss 3.9240 LearningRate 0.0001 Epoch: 26 Global Step: 545820 Fp16 Grad Scale: 16384 Required: 26 hours 
Training: 2022-04-02 17:23:14,385-Speed 6297.67 samples/sec Loss 3.9862 LearningRate 0.0001 Epoch: 26 Global Step: 545830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:17,630-Speed 6312.44 samples/sec Loss 3.9257 LearningRate 0.0001 Epoch: 26 Global Step: 545840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:20,884-Speed 6294.46 samples/sec Loss 3.9623 LearningRate 0.0001 Epoch: 26 Global Step: 545850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:24,130-Speed 6310.65 samples/sec Loss 3.9058 LearningRate 0.0001 Epoch: 26 Global Step: 545860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:27,374-Speed 6314.71 samples/sec Loss 3.9606 LearningRate 0.0001 Epoch: 26 Global Step: 545870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:30,606-Speed 6338.86 samples/sec Loss 3.9412 LearningRate 0.0001 Epoch: 26 Global Step: 545880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:33,852-Speed 6310.90 samples/sec Loss 3.8944 LearningRate 0.0001 Epoch: 26 Global Step: 545890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:37,109-Speed 6289.70 samples/sec Loss 3.9659 LearningRate 0.0001 Epoch: 26 Global Step: 545900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:40,359-Speed 6302.64 samples/sec Loss 3.9051 LearningRate 0.0001 Epoch: 26 Global Step: 545910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:43,606-Speed 6308.56 samples/sec Loss 3.9669 LearningRate 0.0001 Epoch: 26 Global Step: 545920 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:46,850-Speed 6314.81 samples/sec Loss 3.9907 LearningRate 0.0001 Epoch: 26 Global Step: 545930 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:50,099-Speed 6304.96 samples/sec Loss 3.9858 LearningRate 0.0001 Epoch: 26 Global Step: 545940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
17:23:53,348-Speed 6304.64 samples/sec Loss 3.9497 LearningRate 0.0001 Epoch: 26 Global Step: 545950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:56,594-Speed 6312.06 samples/sec Loss 3.9417 LearningRate 0.0001 Epoch: 26 Global Step: 545960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:23:59,839-Speed 6311.94 samples/sec Loss 3.9180 LearningRate 0.0001 Epoch: 26 Global Step: 545970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:03,071-Speed 6338.65 samples/sec Loss 3.9490 LearningRate 0.0001 Epoch: 26 Global Step: 545980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:06,316-Speed 6312.25 samples/sec Loss 3.9133 LearningRate 0.0001 Epoch: 26 Global Step: 545990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:09,563-Speed 6309.03 samples/sec Loss 3.9363 LearningRate 0.0001 Epoch: 26 Global Step: 546000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:12,814-Speed 6300.07 samples/sec Loss 3.9232 LearningRate 0.0001 Epoch: 26 Global Step: 546010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:16,045-Speed 6340.65 samples/sec Loss 3.9607 LearningRate 0.0001 Epoch: 26 Global Step: 546020 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:19,289-Speed 6313.46 samples/sec Loss 3.9342 LearningRate 0.0001 Epoch: 26 Global Step: 546030 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:22,533-Speed 6315.67 samples/sec Loss 3.9174 LearningRate 0.0001 Epoch: 26 Global Step: 546040 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:25,781-Speed 6306.47 samples/sec Loss 3.9537 LearningRate 0.0001 Epoch: 26 Global Step: 546050 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:29,026-Speed 6312.04 samples/sec Loss 3.9671 LearningRate 0.0001 Epoch: 26 Global Step: 546060 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:32,271-Speed 6313.90 
samples/sec Loss 3.9320 LearningRate 0.0001 Epoch: 26 Global Step: 546070 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:35,544-Speed 6257.37 samples/sec Loss 3.9210 LearningRate 0.0001 Epoch: 26 Global Step: 546080 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:38,795-Speed 6302.35 samples/sec Loss 3.8871 LearningRate 0.0001 Epoch: 26 Global Step: 546090 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:42,042-Speed 6308.37 samples/sec Loss 3.9684 LearningRate 0.0001 Epoch: 26 Global Step: 546100 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:45,287-Speed 6312.54 samples/sec Loss 3.9917 LearningRate 0.0001 Epoch: 26 Global Step: 546110 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:24:48,532-Speed 6313.35 samples/sec Loss 3.9811 LearningRate 0.0001 Epoch: 26 Global Step: 546120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:51,778-Speed 6311.43 samples/sec Loss 3.8936 LearningRate 0.0001 Epoch: 26 Global Step: 546130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:55,026-Speed 6306.76 samples/sec Loss 3.9417 LearningRate 0.0001 Epoch: 26 Global Step: 546140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:24:58,272-Speed 6310.28 samples/sec Loss 3.8798 LearningRate 0.0001 Epoch: 26 Global Step: 546150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:01,516-Speed 6314.30 samples/sec Loss 3.9436 LearningRate 0.0001 Epoch: 26 Global Step: 546160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:04,826-Speed 6189.17 samples/sec Loss 3.8976 LearningRate 0.0001 Epoch: 26 Global Step: 546170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:08,075-Speed 6304.73 samples/sec Loss 3.9560 LearningRate 0.0001 Epoch: 26 Global Step: 546180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:11,320-Speed 6312.62 samples/sec Loss 3.9540 LearningRate 
0.0001 Epoch: 26 Global Step: 546190 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:14,563-Speed 6316.44 samples/sec Loss 3.9420 LearningRate 0.0001 Epoch: 26 Global Step: 546200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:17,803-Speed 6322.23 samples/sec Loss 3.9316 LearningRate 0.0001 Epoch: 26 Global Step: 546210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:21,065-Speed 6280.73 samples/sec Loss 3.8735 LearningRate 0.0001 Epoch: 26 Global Step: 546220 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:24,328-Speed 6276.84 samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 546230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:27,587-Speed 6286.36 samples/sec Loss 3.8869 LearningRate 0.0001 Epoch: 26 Global Step: 546240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:30,844-Speed 6288.75 samples/sec Loss 3.9237 LearningRate 0.0001 Epoch: 26 Global Step: 546250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:34,087-Speed 6315.79 samples/sec Loss 3.9136 LearningRate 0.0001 Epoch: 26 Global Step: 546260 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:37,333-Speed 6310.61 samples/sec Loss 3.9039 LearningRate 0.0001 Epoch: 26 Global Step: 546270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:25:40,586-Speed 6298.67 samples/sec Loss 3.9444 LearningRate 0.0001 Epoch: 26 Global Step: 546280 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:43,829-Speed 6315.02 samples/sec Loss 3.9317 LearningRate 0.0001 Epoch: 26 Global Step: 546290 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:47,074-Speed 6313.50 samples/sec Loss 3.9591 LearningRate 0.0001 Epoch: 26 Global Step: 546300 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:50,321-Speed 6309.23 samples/sec Loss 3.9829 LearningRate 0.0001 Epoch: 26 Global Step: 546310 
Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:53,565-Speed 6314.81 samples/sec Loss 3.9321 LearningRate 0.0001 Epoch: 26 Global Step: 546320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:25:56,808-Speed 6316.19 samples/sec Loss 3.9533 LearningRate 0.0001 Epoch: 26 Global Step: 546330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:00,079-Speed 6263.21 samples/sec Loss 3.9160 LearningRate 0.0001 Epoch: 26 Global Step: 546340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:03,390-Speed 6186.53 samples/sec Loss 3.9826 LearningRate 0.0001 Epoch: 26 Global Step: 546350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:06,635-Speed 6311.66 samples/sec Loss 3.9544 LearningRate 0.0001 Epoch: 26 Global Step: 546360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:09,890-Speed 6294.44 samples/sec Loss 3.9287 LearningRate 0.0001 Epoch: 26 Global Step: 546370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:13,121-Speed 6340.72 samples/sec Loss 3.9188 LearningRate 0.0001 Epoch: 26 Global Step: 546380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:16,354-Speed 6335.74 samples/sec Loss 3.9633 LearningRate 0.0001 Epoch: 26 Global Step: 546390 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:19,602-Speed 6305.93 samples/sec Loss 3.9000 LearningRate 0.0001 Epoch: 26 Global Step: 546400 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:22,846-Speed 6315.94 samples/sec Loss 3.9110 LearningRate 0.0001 Epoch: 26 Global Step: 546410 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:26,092-Speed 6311.16 samples/sec Loss 3.9636 LearningRate 0.0001 Epoch: 26 Global Step: 546420 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:29,340-Speed 6306.25 samples/sec Loss 3.9773 LearningRate 0.0001 Epoch: 26 Global Step: 546430 Fp16 Grad Scale: 8192 Required: 26 
hours Training: 2022-04-02 17:26:32,587-Speed 6308.42 samples/sec Loss 3.8710 LearningRate 0.0001 Epoch: 26 Global Step: 546440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:35,836-Speed 6304.07 samples/sec Loss 3.9548 LearningRate 0.0001 Epoch: 26 Global Step: 546450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:39,086-Speed 6303.51 samples/sec Loss 3.8873 LearningRate 0.0001 Epoch: 26 Global Step: 546460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:42,341-Speed 6294.28 samples/sec Loss 3.9637 LearningRate 0.0001 Epoch: 26 Global Step: 546470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:45,589-Speed 6306.32 samples/sec Loss 3.9232 LearningRate 0.0001 Epoch: 26 Global Step: 546480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:26:48,839-Speed 6302.67 samples/sec Loss 3.9675 LearningRate 0.0001 Epoch: 26 Global Step: 546490 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:52,083-Speed 6314.43 samples/sec Loss 3.9638 LearningRate 0.0001 Epoch: 26 Global Step: 546500 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:55,335-Speed 6298.05 samples/sec Loss 3.9448 LearningRate 0.0001 Epoch: 26 Global Step: 546510 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:26:58,581-Speed 6311.41 samples/sec Loss 3.9814 LearningRate 0.0001 Epoch: 26 Global Step: 546520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:01,828-Speed 6309.37 samples/sec Loss 3.9475 LearningRate 0.0001 Epoch: 26 Global Step: 546530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:05,075-Speed 6309.06 samples/sec Loss 3.9292 LearningRate 0.0001 Epoch: 26 Global Step: 546540 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:08,318-Speed 6316.95 samples/sec Loss 3.9130 LearningRate 0.0001 Epoch: 26 Global Step: 546550 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
17:27:11,562-Speed 6314.01 samples/sec Loss 3.9455 LearningRate 0.0001 Epoch: 26 Global Step: 546560 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:14,809-Speed 6309.12 samples/sec Loss 3.9378 LearningRate 0.0001 Epoch: 26 Global Step: 546570 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:18,052-Speed 6316.32 samples/sec Loss 3.9520 LearningRate 0.0001 Epoch: 26 Global Step: 546580 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:21,280-Speed 6346.21 samples/sec Loss 3.9023 LearningRate 0.0001 Epoch: 26 Global Step: 546590 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:24,527-Speed 6309.59 samples/sec Loss 3.8976 LearningRate 0.0001 Epoch: 26 Global Step: 546600 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:27,792-Speed 6273.46 samples/sec Loss 3.9202 LearningRate 0.0001 Epoch: 26 Global Step: 546610 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:31,041-Speed 6305.13 samples/sec Loss 3.9311 LearningRate 0.0001 Epoch: 26 Global Step: 546620 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:34,290-Speed 6304.59 samples/sec Loss 4.0398 LearningRate 0.0001 Epoch: 26 Global Step: 546630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:37,535-Speed 6311.77 samples/sec Loss 3.9733 LearningRate 0.0001 Epoch: 26 Global Step: 546640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:40,781-Speed 6312.02 samples/sec Loss 3.9068 LearningRate 0.0001 Epoch: 26 Global Step: 546650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:44,022-Speed 6320.27 samples/sec Loss 3.9118 LearningRate 0.0001 Epoch: 26 Global Step: 546660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:47,264-Speed 6318.46 samples/sec Loss 3.9119 LearningRate 0.0001 Epoch: 26 Global Step: 546670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:50,516-Speed 6298.35 
samples/sec Loss 3.9445 LearningRate 0.0001 Epoch: 26 Global Step: 546680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:53,744-Speed 6346.19 samples/sec Loss 3.9913 LearningRate 0.0001 Epoch: 26 Global Step: 546690 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:27:56,995-Speed 6301.23 samples/sec Loss 3.9302 LearningRate 0.0001 Epoch: 26 Global Step: 546700 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:00,239-Speed 6313.83 samples/sec Loss 3.9505 LearningRate 0.0001 Epoch: 26 Global Step: 546710 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:03,489-Speed 6303.12 samples/sec Loss 3.9795 LearningRate 0.0001 Epoch: 26 Global Step: 546720 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:06,735-Speed 6310.65 samples/sec Loss 3.9204 LearningRate 0.0001 Epoch: 26 Global Step: 546730 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:09,983-Speed 6306.87 samples/sec Loss 3.9162 LearningRate 0.0001 Epoch: 26 Global Step: 546740 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:13,229-Speed 6311.64 samples/sec Loss 3.8996 LearningRate 0.0001 Epoch: 26 Global Step: 546750 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:16,472-Speed 6317.14 samples/sec Loss 3.8613 LearningRate 0.0001 Epoch: 26 Global Step: 546760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:19,717-Speed 6312.69 samples/sec Loss 3.8819 LearningRate 0.0001 Epoch: 26 Global Step: 546770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:22,963-Speed 6310.78 samples/sec Loss 3.9598 LearningRate 0.0001 Epoch: 26 Global Step: 546780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:26,192-Speed 6345.01 samples/sec Loss 3.9220 LearningRate 0.0001 Epoch: 26 Global Step: 546790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:29,437-Speed 6310.77 samples/sec Loss 3.9262 
LearningRate 0.0001 Epoch: 26 Global Step: 546800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:32,682-Speed 6314.12 samples/sec Loss 3.9321 LearningRate 0.0001 Epoch: 26 Global Step: 546810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:35,930-Speed 6306.04 samples/sec Loss 3.9682 LearningRate 0.0001 Epoch: 26 Global Step: 546820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:39,172-Speed 6318.10 samples/sec Loss 3.8961 LearningRate 0.0001 Epoch: 26 Global Step: 546830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:28:42,403-Speed 6341.23 samples/sec Loss 3.8912 LearningRate 0.0001 Epoch: 26 Global Step: 546840 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:28:45,648-Speed 6312.33 samples/sec Loss 3.9141 LearningRate 0.0001 Epoch: 26 Global Step: 546850 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:28:48,892-Speed 6313.11 samples/sec Loss 3.9141 LearningRate 0.0001 Epoch: 26 Global Step: 546860 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:28:52,134-Speed 6319.87 samples/sec Loss 3.9663 LearningRate 0.0001 Epoch: 26 Global Step: 546870 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:28:55,377-Speed 6315.52 samples/sec Loss 3.9691 LearningRate 0.0001 Epoch: 26 Global Step: 546880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:28:58,620-Speed 6316.71 samples/sec Loss 3.8970 LearningRate 0.0001 Epoch: 26 Global Step: 546890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:29:01,864-Speed 6314.43 samples/sec Loss 3.9022 LearningRate 0.0001 Epoch: 26 Global Step: 546900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:29:05,133-Speed 6267.37 samples/sec Loss 3.9474 LearningRate 0.0001 Epoch: 26 Global Step: 546910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:29:08,378-Speed 6312.33 samples/sec Loss 3.9075 LearningRate 0.0001 Epoch: 26 Global 
Step: 546920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:29:11,623-Speed 6313.77 samples/sec Loss 3.9691 LearningRate 0.0001 Epoch: 26 Global Step: 546930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:29:14,871-Speed 6306.48 samples/sec Loss 3.9560 LearningRate 0.0001 Epoch: 26 Global Step: 546940 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:18,116-Speed 6312.83 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 26 Global Step: 546950 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:21,359-Speed 6316.42 samples/sec Loss 3.9008 LearningRate 0.0001 Epoch: 26 Global Step: 546960 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:24,609-Speed 6302.90 samples/sec Loss 3.9687 LearningRate 0.0001 Epoch: 26 Global Step: 546970 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:27,855-Speed 6311.80 samples/sec Loss 3.8896 LearningRate 0.0001 Epoch: 26 Global Step: 546980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:31,103-Speed 6307.55 samples/sec Loss 3.8911 LearningRate 0.0001 Epoch: 26 Global Step: 546990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:34,351-Speed 6306.63 samples/sec Loss 3.8998 LearningRate 0.0001 Epoch: 26 Global Step: 547000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:37,599-Speed 6305.49 samples/sec Loss 3.9091 LearningRate 0.0001 Epoch: 26 Global Step: 547010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:40,845-Speed 6310.87 samples/sec Loss 3.9069 LearningRate 0.0001 Epoch: 26 Global Step: 547020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:44,091-Speed 6311.10 samples/sec Loss 3.9633 LearningRate 0.0001 Epoch: 26 Global Step: 547030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:47,338-Speed 6309.04 samples/sec Loss 3.9414 LearningRate 0.0001 Epoch: 26 Global Step: 547040 Fp16 Grad Scale: 
32768 Required: 26 hours Training: 2022-04-02 17:29:50,568-Speed 6342.09 samples/sec Loss 3.9477 LearningRate 0.0001 Epoch: 26 Global Step: 547050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:53,814-Speed 6309.46 samples/sec Loss 3.9569 LearningRate 0.0001 Epoch: 26 Global Step: 547060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:29:57,058-Speed 6316.46 samples/sec Loss 3.9067 LearningRate 0.0001 Epoch: 26 Global Step: 547070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:00,303-Speed 6312.19 samples/sec Loss 3.8988 LearningRate 0.0001 Epoch: 26 Global Step: 547080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:03,553-Speed 6303.17 samples/sec Loss 3.9102 LearningRate 0.0001 Epoch: 26 Global Step: 547090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:06,796-Speed 6314.60 samples/sec Loss 3.9580 LearningRate 0.0001 Epoch: 26 Global Step: 547100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:10,046-Speed 6304.13 samples/sec Loss 3.9878 LearningRate 0.0001 Epoch: 26 Global Step: 547110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:13,289-Speed 6315.48 samples/sec Loss 3.9254 LearningRate 0.0001 Epoch: 26 Global Step: 547120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:16,536-Speed 6309.43 samples/sec Loss 3.9145 LearningRate 0.0001 Epoch: 26 Global Step: 547130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:19,766-Speed 6342.62 samples/sec Loss 3.9339 LearningRate 0.0001 Epoch: 26 Global Step: 547140 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:23,013-Speed 6307.76 samples/sec Loss 3.9117 LearningRate 0.0001 Epoch: 26 Global Step: 547150 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:26,260-Speed 6309.85 samples/sec Loss 3.9318 LearningRate 0.0001 Epoch: 26 Global Step: 547160 Fp16 Grad Scale: 8192 Required: 26 hours Training: 
2022-04-02 17:30:29,502-Speed 6318.87 samples/sec Loss 3.9048 LearningRate 0.0001 Epoch: 26 Global Step: 547170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:32,748-Speed 6310.59 samples/sec Loss 3.9395 LearningRate 0.0001 Epoch: 26 Global Step: 547180 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:36,030-Speed 6241.45 samples/sec Loss 3.9533 LearningRate 0.0001 Epoch: 26 Global Step: 547190 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:39,275-Speed 6312.41 samples/sec Loss 3.9747 LearningRate 0.0001 Epoch: 26 Global Step: 547200 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:42,517-Speed 6318.90 samples/sec Loss 3.9494 LearningRate 0.0001 Epoch: 26 Global Step: 547210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:45,763-Speed 6310.51 samples/sec Loss 3.9972 LearningRate 0.0001 Epoch: 26 Global Step: 547220 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:49,007-Speed 6314.72 samples/sec Loss 3.9352 LearningRate 0.0001 Epoch: 26 Global Step: 547230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:52,247-Speed 6321.60 samples/sec Loss 3.9477 LearningRate 0.0001 Epoch: 26 Global Step: 547240 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:30:55,481-Speed 6334.28 samples/sec Loss 4.0016 LearningRate 0.0001 Epoch: 26 Global Step: 547250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:30:58,726-Speed 6311.90 samples/sec Loss 3.9619 LearningRate 0.0001 Epoch: 26 Global Step: 547260 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:01,970-Speed 6315.49 samples/sec Loss 3.9428 LearningRate 0.0001 Epoch: 26 Global Step: 547270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:05,213-Speed 6316.84 samples/sec Loss 3.9891 LearningRate 0.0001 Epoch: 26 Global Step: 547280 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:08,462-Speed 6304.75 
samples/sec Loss 3.9050 LearningRate 0.0001 Epoch: 26 Global Step: 547290 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:11,707-Speed 6311.77 samples/sec Loss 3.9858 LearningRate 0.0001 Epoch: 26 Global Step: 547300 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:14,950-Speed 6317.07 samples/sec Loss 3.9607 LearningRate 0.0001 Epoch: 26 Global Step: 547310 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:18,198-Speed 6307.34 samples/sec Loss 3.9278 LearningRate 0.0001 Epoch: 26 Global Step: 547320 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:21,442-Speed 6314.52 samples/sec Loss 3.8803 LearningRate 0.0001 Epoch: 26 Global Step: 547330 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:24,688-Speed 6310.69 samples/sec Loss 3.9528 LearningRate 0.0001 Epoch: 26 Global Step: 547340 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:31:27,938-Speed 6302.00 samples/sec Loss 3.9241 LearningRate 0.0001 Epoch: 26 Global Step: 547350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:31,183-Speed 6313.89 samples/sec Loss 3.9711 LearningRate 0.0001 Epoch: 26 Global Step: 547360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:34,426-Speed 6315.49 samples/sec Loss 3.9238 LearningRate 0.0001 Epoch: 26 Global Step: 547370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:37,673-Speed 6310.57 samples/sec Loss 3.9149 LearningRate 0.0001 Epoch: 26 Global Step: 547380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:40,914-Speed 6320.19 samples/sec Loss 3.8669 LearningRate 0.0001 Epoch: 26 Global Step: 547390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:44,156-Speed 6317.34 samples/sec Loss 3.9026 LearningRate 0.0001 Epoch: 26 Global Step: 547400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:47,402-Speed 6311.06 samples/sec Loss 3.9070 LearningRate 
0.0001 Epoch: 26 Global Step: 547410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:50,649-Speed 6309.17 samples/sec Loss 3.8797 LearningRate 0.0001 Epoch: 26 Global Step: 547420 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:53,896-Speed 6309.15 samples/sec Loss 3.9358 LearningRate 0.0001 Epoch: 26 Global Step: 547430 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:31:57,140-Speed 6314.45 samples/sec Loss 3.9744 LearningRate 0.0001 Epoch: 26 Global Step: 547440 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:00,373-Speed 6336.81 samples/sec Loss 3.9856 LearningRate 0.0001 Epoch: 26 Global Step: 547450 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:03,620-Speed 6307.44 samples/sec Loss 3.9082 LearningRate 0.0001 Epoch: 26 Global Step: 547460 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:06,870-Speed 6304.07 samples/sec Loss 3.9894 LearningRate 0.0001 Epoch: 26 Global Step: 547470 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:10,112-Speed 6318.36 samples/sec Loss 3.8847 LearningRate 0.0001 Epoch: 26 Global Step: 547480 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:13,342-Speed 6341.75 samples/sec Loss 3.9544 LearningRate 0.0001 Epoch: 26 Global Step: 547490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:16,587-Speed 6313.43 samples/sec Loss 3.8731 LearningRate 0.0001 Epoch: 26 Global Step: 547500 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:19,830-Speed 6314.73 samples/sec Loss 3.9001 LearningRate 0.0001 Epoch: 26 Global Step: 547510 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:23,084-Speed 6295.98 samples/sec Loss 3.9657 LearningRate 0.0001 Epoch: 26 Global Step: 547520 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:26,349-Speed 6274.99 samples/sec Loss 3.9469 LearningRate 0.0001 Epoch: 26 Global Step: 
547530 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:29,599-Speed 6301.47 samples/sec Loss 3.8850 LearningRate 0.0001 Epoch: 26 Global Step: 547540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:32,849-Speed 6303.35 samples/sec Loss 3.9289 LearningRate 0.0001 Epoch: 26 Global Step: 547550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:36,091-Speed 6319.74 samples/sec Loss 3.8907 LearningRate 0.0001 Epoch: 26 Global Step: 547560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:39,334-Speed 6315.70 samples/sec Loss 3.9385 LearningRate 0.0001 Epoch: 26 Global Step: 547570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:42,578-Speed 6313.72 samples/sec Loss 3.8530 LearningRate 0.0001 Epoch: 26 Global Step: 547580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:32:45,827-Speed 6306.46 samples/sec Loss 3.9129 LearningRate 0.0001 Epoch: 26 Global Step: 547590 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:49,073-Speed 6311.02 samples/sec Loss 3.9295 LearningRate 0.0001 Epoch: 26 Global Step: 547600 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:52,317-Speed 6313.63 samples/sec Loss 3.9148 LearningRate 0.0001 Epoch: 26 Global Step: 547610 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:55,564-Speed 6309.71 samples/sec Loss 3.9543 LearningRate 0.0001 Epoch: 26 Global Step: 547620 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:32:58,810-Speed 6310.93 samples/sec Loss 3.8955 LearningRate 0.0001 Epoch: 26 Global Step: 547630 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:02,053-Speed 6316.13 samples/sec Loss 3.9313 LearningRate 0.0001 Epoch: 26 Global Step: 547640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:05,304-Speed 6301.95 samples/sec Loss 3.9259 LearningRate 0.0001 Epoch: 26 Global Step: 547650 Fp16 Grad Scale: 16384 
Required: 26 hours Training: 2022-04-02 17:33:08,533-Speed 6342.26 samples/sec Loss 3.9318 LearningRate 0.0001 Epoch: 26 Global Step: 547660 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:11,780-Speed 6308.71 samples/sec Loss 3.9500 LearningRate 0.0001 Epoch: 26 Global Step: 547670 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:15,026-Speed 6311.35 samples/sec Loss 3.9035 LearningRate 0.0001 Epoch: 26 Global Step: 547680 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:18,274-Speed 6306.95 samples/sec Loss 3.9359 LearningRate 0.0001 Epoch: 26 Global Step: 547690 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:21,517-Speed 6316.05 samples/sec Loss 3.8411 LearningRate 0.0001 Epoch: 26 Global Step: 547700 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:24,765-Speed 6307.11 samples/sec Loss 3.9521 LearningRate 0.0001 Epoch: 26 Global Step: 547710 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:28,010-Speed 6312.28 samples/sec Loss 3.9289 LearningRate 0.0001 Epoch: 26 Global Step: 547720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:31,260-Speed 6303.78 samples/sec Loss 3.8925 LearningRate 0.0001 Epoch: 26 Global Step: 547730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:34,508-Speed 6306.97 samples/sec Loss 3.9204 LearningRate 0.0001 Epoch: 26 Global Step: 547740 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:37,751-Speed 6317.07 samples/sec Loss 3.9282 LearningRate 0.0001 Epoch: 26 Global Step: 547750 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:33:40,997-Speed 6309.42 samples/sec Loss 3.9268 LearningRate 0.0001 Epoch: 26 Global Step: 547760 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:44,243-Speed 6311.71 samples/sec Loss 3.9686 LearningRate 0.0001 Epoch: 26 Global Step: 547770 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 
17:33:47,487-Speed 6314.10 samples/sec Loss 3.9565 LearningRate 0.0001 Epoch: 26 Global Step: 547780 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:50,736-Speed 6306.61 samples/sec Loss 4.0006 LearningRate 0.0001 Epoch: 26 Global Step: 547790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:53,979-Speed 6315.34 samples/sec Loss 3.9369 LearningRate 0.0001 Epoch: 26 Global Step: 547800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:33:57,228-Speed 6306.50 samples/sec Loss 3.9543 LearningRate 0.0001 Epoch: 26 Global Step: 547810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:00,474-Speed 6310.17 samples/sec Loss 3.9456 LearningRate 0.0001 Epoch: 26 Global Step: 547820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:03,722-Speed 6307.46 samples/sec Loss 3.9364 LearningRate 0.0001 Epoch: 26 Global Step: 547830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:06,968-Speed 6310.96 samples/sec Loss 3.9363 LearningRate 0.0001 Epoch: 26 Global Step: 547840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:10,214-Speed 6310.31 samples/sec Loss 3.8857 LearningRate 0.0001 Epoch: 26 Global Step: 547850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:13,446-Speed 6336.61 samples/sec Loss 3.8858 LearningRate 0.0001 Epoch: 26 Global Step: 547860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:16,692-Speed 6312.37 samples/sec Loss 3.9385 LearningRate 0.0001 Epoch: 26 Global Step: 547870 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:19,921-Speed 6342.68 samples/sec Loss 3.9007 LearningRate 0.0001 Epoch: 26 Global Step: 547880 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:23,166-Speed 6312.20 samples/sec Loss 3.9203 LearningRate 0.0001 Epoch: 26 Global Step: 547890 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:26,421-Speed 6293.42 
samples/sec Loss 3.8928 LearningRate 0.0001 Epoch: 26 Global Step: 547900 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:29,664-Speed 6316.14 samples/sec Loss 3.9668 LearningRate 0.0001 Epoch: 26 Global Step: 547910 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:32,912-Speed 6308.45 samples/sec Loss 3.9232 LearningRate 0.0001 Epoch: 26 Global Step: 547920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:36,162-Speed 6301.36 samples/sec Loss 3.9027 LearningRate 0.0001 Epoch: 26 Global Step: 547930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:39,406-Speed 6315.53 samples/sec Loss 3.9298 LearningRate 0.0001 Epoch: 26 Global Step: 547940 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:42,655-Speed 6305.23 samples/sec Loss 3.9508 LearningRate 0.0001 Epoch: 26 Global Step: 547950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:45,898-Speed 6315.88 samples/sec Loss 3.8843 LearningRate 0.0001 Epoch: 26 Global Step: 547960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:49,140-Speed 6318.66 samples/sec Loss 3.9588 LearningRate 0.0001 Epoch: 26 Global Step: 547970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:34:52,382-Speed 6318.03 samples/sec Loss 3.9091 LearningRate 0.0001 Epoch: 26 Global Step: 547980 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:55,632-Speed 6304.16 samples/sec Loss 3.9724 LearningRate 0.0001 Epoch: 26 Global Step: 547990 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:34:58,880-Speed 6306.09 samples/sec Loss 3.9233 LearningRate 0.0001 Epoch: 26 Global Step: 548000 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:02,130-Speed 6304.59 samples/sec Loss 3.9734 LearningRate 0.0001 Epoch: 26 Global Step: 548010 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:05,374-Speed 6313.21 samples/sec Loss 3.9670 LearningRate 
0.0001 Epoch: 26 Global Step: 548020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:08,621-Speed 6310.39 samples/sec Loss 3.8311 LearningRate 0.0001 Epoch: 26 Global Step: 548030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:11,865-Speed 6314.51 samples/sec Loss 3.9025 LearningRate 0.0001 Epoch: 26 Global Step: 548040 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:15,110-Speed 6312.62 samples/sec Loss 3.9287 LearningRate 0.0001 Epoch: 26 Global Step: 548050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:18,355-Speed 6311.84 samples/sec Loss 3.9667 LearningRate 0.0001 Epoch: 26 Global Step: 548060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:21,602-Speed 6308.67 samples/sec Loss 3.8900 LearningRate 0.0001 Epoch: 26 Global Step: 548070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:24,832-Speed 6341.92 samples/sec Loss 3.9257 LearningRate 0.0001 Epoch: 26 Global Step: 548080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:28,076-Speed 6314.83 samples/sec Loss 3.9091 LearningRate 0.0001 Epoch: 26 Global Step: 548090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:31,323-Speed 6310.03 samples/sec Loss 3.9202 LearningRate 0.0001 Epoch: 26 Global Step: 548100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:34,569-Speed 6309.59 samples/sec Loss 3.9524 LearningRate 0.0001 Epoch: 26 Global Step: 548110 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:37,813-Speed 6314.57 samples/sec Loss 3.9609 LearningRate 0.0001 Epoch: 26 Global Step: 548120 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:41,057-Speed 6314.21 samples/sec Loss 3.8975 LearningRate 0.0001 Epoch: 26 Global Step: 548130 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:44,298-Speed 6320.92 samples/sec Loss 3.9182 LearningRate 0.0001 Epoch: 26 Global Step: 
548140 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:47,548-Speed 6302.45 samples/sec Loss 3.8756 LearningRate 0.0001 Epoch: 26 Global Step: 548150 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:50,792-Speed 6315.62 samples/sec Loss 3.9516 LearningRate 0.0001 Epoch: 26 Global Step: 548160 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:54,039-Speed 6307.39 samples/sec Loss 3.9514 LearningRate 0.0001 Epoch: 26 Global Step: 548170 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:35:57,273-Speed 6334.94 samples/sec Loss 3.8907 LearningRate 0.0001 Epoch: 26 Global Step: 548180 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:00,521-Speed 6307.01 samples/sec Loss 3.9079 LearningRate 0.0001 Epoch: 26 Global Step: 548190 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:03,767-Speed 6310.72 samples/sec Loss 3.9737 LearningRate 0.0001 Epoch: 26 Global Step: 548200 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:06,996-Speed 6345.24 samples/sec Loss 3.9207 LearningRate 0.0001 Epoch: 26 Global Step: 548210 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:10,238-Speed 6317.55 samples/sec Loss 3.8169 LearningRate 0.0001 Epoch: 26 Global Step: 548220 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:13,482-Speed 6315.56 samples/sec Loss 3.8940 LearningRate 0.0001 Epoch: 26 Global Step: 548230 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:16,728-Speed 6310.09 samples/sec Loss 3.8928 LearningRate 0.0001 Epoch: 26 Global Step: 548240 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:19,974-Speed 6311.94 samples/sec Loss 3.9321 LearningRate 0.0001 Epoch: 26 Global Step: 548250 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:23,218-Speed 6314.55 samples/sec Loss 3.8742 LearningRate 0.0001 Epoch: 26 Global Step: 548260 Fp16 Grad Scale: 8192 
Required: 26 hours Training: 2022-04-02 17:36:26,465-Speed 6307.55 samples/sec Loss 3.9355 LearningRate 0.0001 Epoch: 26 Global Step: 548270 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:29,711-Speed 6311.50 samples/sec Loss 3.9334 LearningRate 0.0001 Epoch: 26 Global Step: 548280 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:32,970-Speed 6285.06 samples/sec Loss 3.9551 LearningRate 0.0001 Epoch: 26 Global Step: 548290 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:36,212-Speed 6317.65 samples/sec Loss 3.8998 LearningRate 0.0001 Epoch: 26 Global Step: 548300 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:36:39,456-Speed 6315.45 samples/sec Loss 3.9382 LearningRate 0.0001 Epoch: 26 Global Step: 548310 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:42,704-Speed 6307.67 samples/sec Loss 3.8863 LearningRate 0.0001 Epoch: 26 Global Step: 548320 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:45,950-Speed 6309.90 samples/sec Loss 3.9273 LearningRate 0.0001 Epoch: 26 Global Step: 548330 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:49,195-Speed 6313.18 samples/sec Loss 3.8838 LearningRate 0.0001 Epoch: 26 Global Step: 548340 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:52,441-Speed 6309.71 samples/sec Loss 3.8714 LearningRate 0.0001 Epoch: 26 Global Step: 548350 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:55,689-Speed 6306.63 samples/sec Loss 3.8975 LearningRate 0.0001 Epoch: 26 Global Step: 548360 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:36:58,936-Speed 6309.97 samples/sec Loss 3.9035 LearningRate 0.0001 Epoch: 26 Global Step: 548370 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:02,182-Speed 6310.97 samples/sec Loss 3.9380 LearningRate 0.0001 Epoch: 26 Global Step: 548380 Fp16 Grad Scale: 16384 Required: 26 hours Training: 
2022-04-02 17:37:05,428-Speed 6310.29 samples/sec Loss 3.9918 LearningRate 0.0001 Epoch: 26 Global Step: 548390 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:08,673-Speed 6311.91 samples/sec Loss 3.9266 LearningRate 0.0001 Epoch: 26 Global Step: 548400 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:11,905-Speed 6338.81 samples/sec Loss 3.9688 LearningRate 0.0001 Epoch: 26 Global Step: 548410 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:15,135-Speed 6342.31 samples/sec Loss 3.9623 LearningRate 0.0001 Epoch: 26 Global Step: 548420 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:18,383-Speed 6308.25 samples/sec Loss 3.9232 LearningRate 0.0001 Epoch: 26 Global Step: 548430 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:21,626-Speed 6315.39 samples/sec Loss 3.9214 LearningRate 0.0001 Epoch: 26 Global Step: 548440 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:24,868-Speed 6318.85 samples/sec Loss 3.9119 LearningRate 0.0001 Epoch: 26 Global Step: 548450 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:28,117-Speed 6305.26 samples/sec Loss 3.9098 LearningRate 0.0001 Epoch: 26 Global Step: 548460 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:31,359-Speed 6317.48 samples/sec Loss 3.9037 LearningRate 0.0001 Epoch: 26 Global Step: 548470 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:34,603-Speed 6315.12 samples/sec Loss 3.9242 LearningRate 0.0001 Epoch: 26 Global Step: 548480 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:37,845-Speed 6318.59 samples/sec Loss 3.8833 LearningRate 0.0001 Epoch: 26 Global Step: 548490 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:41,088-Speed 6316.00 samples/sec Loss 3.8285 LearningRate 0.0001 Epoch: 26 Global Step: 548500 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:44,333-Speed 6313.75 
samples/sec Loss 3.8713 LearningRate 0.0001 Epoch: 26 Global Step: 548510 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:47,575-Speed 6318.01 samples/sec Loss 3.8915 LearningRate 0.0001 Epoch: 26 Global Step: 548520 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:50,828-Speed 6297.80 samples/sec Loss 3.9209 LearningRate 0.0001 Epoch: 26 Global Step: 548530 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:37:54,058-Speed 6340.69 samples/sec Loss 3.8883 LearningRate 0.0001 Epoch: 26 Global Step: 548540 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:37:57,304-Speed 6310.56 samples/sec Loss 3.9355 LearningRate 0.0001 Epoch: 26 Global Step: 548550 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:00,548-Speed 6314.83 samples/sec Loss 3.8730 LearningRate 0.0001 Epoch: 26 Global Step: 548560 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:03,789-Speed 6321.49 samples/sec Loss 3.9414 LearningRate 0.0001 Epoch: 26 Global Step: 548570 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:07,035-Speed 6308.93 samples/sec Loss 4.0098 LearningRate 0.0001 Epoch: 26 Global Step: 548580 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:10,281-Speed 6311.90 samples/sec Loss 3.9738 LearningRate 0.0001 Epoch: 26 Global Step: 548590 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:13,526-Speed 6312.40 samples/sec Loss 3.9313 LearningRate 0.0001 Epoch: 26 Global Step: 548600 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:16,768-Speed 6317.36 samples/sec Loss 3.9463 LearningRate 0.0001 Epoch: 26 Global Step: 548610 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:20,029-Speed 6284.17 samples/sec Loss 3.9183 LearningRate 0.0001 Epoch: 26 Global Step: 548620 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:23,277-Speed 6305.99 samples/sec Loss 3.9245 LearningRate 
0.0001 Epoch: 26 Global Step: 548630 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:26,522-Speed 6312.83 samples/sec Loss 3.9129 LearningRate 0.0001 Epoch: 26 Global Step: 548640 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:38:29,769-Speed 6309.41 samples/sec Loss 3.9775 LearningRate 0.0001 Epoch: 26 Global Step: 548650 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:38:33,016-Speed 6309.28 samples/sec Loss 3.8942 LearningRate 0.0001 Epoch: 26 Global Step: 548660 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:38:36,259-Speed 6316.37 samples/sec Loss 3.9090 LearningRate 0.0001 Epoch: 26 Global Step: 548670 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:38:39,504-Speed 6312.05 samples/sec Loss 3.8963 LearningRate 0.0001 Epoch: 26 Global Step: 548680 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:38:42,736-Speed 6338.50 samples/sec Loss 3.9154 LearningRate 0.0001 Epoch: 26 Global Step: 548690 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:45,987-Speed 6300.57 samples/sec Loss 3.8949 LearningRate 0.0001 Epoch: 26 Global Step: 548700 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:49,230-Speed 6316.76 samples/sec Loss 3.9534 LearningRate 0.0001 Epoch: 26 Global Step: 548710 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:52,477-Speed 6309.23 samples/sec Loss 3.9384 LearningRate 0.0001 Epoch: 26 Global Step: 548720 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:55,724-Speed 6308.95 samples/sec Loss 3.9262 LearningRate 0.0001 Epoch: 26 Global Step: 548730 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:38:58,969-Speed 6311.70 samples/sec Loss 3.8914 LearningRate 0.0001 Epoch: 26 Global Step: 548740 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:39:02,220-Speed 6301.41 samples/sec Loss 3.8798 LearningRate 0.0001 Epoch: 26 Global Step: 548750 
Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:39:05,467-Speed 6308.88 samples/sec Loss 3.9727 LearningRate 0.0001 Epoch: 26 Global Step: 548760 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:39:08,714-Speed 6308.13 samples/sec Loss 3.9654 LearningRate 0.0001 Epoch: 26 Global Step: 548770 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:39:11,959-Speed 6312.48 samples/sec Loss 3.9253 LearningRate 0.0001 Epoch: 26 Global Step: 548780 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:39:15,207-Speed 6307.43 samples/sec Loss 3.9504 LearningRate 0.0001 Epoch: 26 Global Step: 548790 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:18,452-Speed 6312.66 samples/sec Loss 3.9487 LearningRate 0.0001 Epoch: 26 Global Step: 548800 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:21,701-Speed 6303.87 samples/sec Loss 3.9126 LearningRate 0.0001 Epoch: 26 Global Step: 548810 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:24,947-Speed 6312.26 samples/sec Loss 3.9159 LearningRate 0.0001 Epoch: 26 Global Step: 548820 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:28,193-Speed 6308.69 samples/sec Loss 3.8829 LearningRate 0.0001 Epoch: 26 Global Step: 548830 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:31,440-Speed 6310.73 samples/sec Loss 3.8922 LearningRate 0.0001 Epoch: 26 Global Step: 548840 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:34,685-Speed 6312.23 samples/sec Loss 3.8947 LearningRate 0.0001 Epoch: 26 Global Step: 548850 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:37,934-Speed 6306.30 samples/sec Loss 3.9056 LearningRate 0.0001 Epoch: 26 Global Step: 548860 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:41,183-Speed 6305.00 samples/sec Loss 3.8863 LearningRate 0.0001 Epoch: 26 Global Step: 548870 Fp16 Grad Scale: 16384 Required: 26 
hours Training: 2022-04-02 17:39:44,431-Speed 6306.51 samples/sec Loss 3.9735 LearningRate 0.0001 Epoch: 26 Global Step: 548880 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:47,663-Speed 6338.46 samples/sec Loss 3.8833 LearningRate 0.0001 Epoch: 26 Global Step: 548890 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:50,911-Speed 6305.41 samples/sec Loss 3.9412 LearningRate 0.0001 Epoch: 26 Global Step: 548900 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:54,157-Speed 6310.44 samples/sec Loss 3.9486 LearningRate 0.0001 Epoch: 26 Global Step: 548910 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:39:57,389-Speed 6338.49 samples/sec Loss 3.9063 LearningRate 0.0001 Epoch: 26 Global Step: 548920 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:00,635-Speed 6310.97 samples/sec Loss 3.9510 LearningRate 0.0001 Epoch: 26 Global Step: 548930 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:03,883-Speed 6306.83 samples/sec Loss 3.8773 LearningRate 0.0001 Epoch: 26 Global Step: 548940 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:07,124-Speed 6319.94 samples/sec Loss 3.8724 LearningRate 0.0001 Epoch: 26 Global Step: 548950 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:10,368-Speed 6315.83 samples/sec Loss 3.9109 LearningRate 0.0001 Epoch: 26 Global Step: 548960 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:13,609-Speed 6319.06 samples/sec Loss 3.9288 LearningRate 0.0001 Epoch: 26 Global Step: 548970 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:16,860-Speed 6301.92 samples/sec Loss 3.9067 LearningRate 0.0001 Epoch: 26 Global Step: 548980 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:20,116-Speed 6291.85 samples/sec Loss 3.8923 LearningRate 0.0001 Epoch: 26 Global Step: 548990 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 
17:40:23,358-Speed 6318.58 samples/sec Loss 3.8944 LearningRate 0.0001 Epoch: 26 Global Step: 549000 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:26,605-Speed 6308.47 samples/sec Loss 3.9792 LearningRate 0.0001 Epoch: 26 Global Step: 549010 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:40:29,855-Speed 6301.42 samples/sec Loss 3.9327 LearningRate 0.0001 Epoch: 26 Global Step: 549020 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:33,100-Speed 6314.30 samples/sec Loss 3.9367 LearningRate 0.0001 Epoch: 26 Global Step: 549030 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:36,343-Speed 6316.64 samples/sec Loss 3.9515 LearningRate 0.0001 Epoch: 26 Global Step: 549040 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:39,584-Speed 6318.71 samples/sec Loss 3.9348 LearningRate 0.0001 Epoch: 26 Global Step: 549050 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:42,837-Speed 6299.66 samples/sec Loss 3.9247 LearningRate 0.0001 Epoch: 26 Global Step: 549060 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:46,082-Speed 6312.32 samples/sec Loss 3.9484 LearningRate 0.0001 Epoch: 26 Global Step: 549070 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:49,328-Speed 6309.94 samples/sec Loss 3.8879 LearningRate 0.0001 Epoch: 26 Global Step: 549080 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:52,676-Speed 6118.88 samples/sec Loss 3.8772 LearningRate 0.0001 Epoch: 26 Global Step: 549090 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:55,921-Speed 6313.67 samples/sec Loss 3.8941 LearningRate 0.0001 Epoch: 26 Global Step: 549100 Fp16 Grad Scale: 16384 Required: 26 hours Training: 2022-04-02 17:40:59,164-Speed 6316.47 samples/sec Loss 3.8871 LearningRate 0.0001 Epoch: 26 Global Step: 549110 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:02,406-Speed 6317.04 
samples/sec Loss 3.9510 LearningRate 0.0001 Epoch: 26 Global Step: 549120 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:05,653-Speed 6309.70 samples/sec Loss 3.9642 LearningRate 0.0001 Epoch: 26 Global Step: 549130 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:08,898-Speed 6313.16 samples/sec Loss 3.9375 LearningRate 0.0001 Epoch: 26 Global Step: 549140 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:12,142-Speed 6314.41 samples/sec Loss 3.9321 LearningRate 0.0001 Epoch: 26 Global Step: 549150 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:15,387-Speed 6311.42 samples/sec Loss 3.9682 LearningRate 0.0001 Epoch: 26 Global Step: 549160 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:18,631-Speed 6315.96 samples/sec Loss 3.9618 LearningRate 0.0001 Epoch: 26 Global Step: 549170 Fp16 Grad Scale: 8192 Required: 26 hours Training: 2022-04-02 17:41:21,877-Speed 6310.70 samples/sec Loss 3.8700 LearningRate 0.0001 Epoch: 26 Global Step: 549180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:25,121-Speed 6313.57 samples/sec Loss 3.8863 LearningRate 0.0001 Epoch: 26 Global Step: 549190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:28,377-Speed 6291.05 samples/sec Loss 3.8497 LearningRate 0.0001 Epoch: 26 Global Step: 549200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:31,623-Speed 6311.71 samples/sec Loss 3.9414 LearningRate 0.0001 Epoch: 26 Global Step: 549210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:41:34,870-Speed 6308.85 samples/sec Loss 3.9335 LearningRate 0.0001 Epoch: 26 Global Step: 549220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:41:38,102-Speed 6338.15 samples/sec Loss 3.9776 LearningRate 0.0001 Epoch: 26 Global Step: 549230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:41,352-Speed 6301.55 samples/sec Loss 3.8809 LearningRate 
0.0001 Epoch: 26 Global Step: 549240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:44,600-Speed 6307.32 samples/sec Loss 3.8454 LearningRate 0.0001 Epoch: 26 Global Step: 549250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:47,904-Speed 6200.12 samples/sec Loss 3.9545 LearningRate 0.0001 Epoch: 26 Global Step: 549260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:51,153-Speed 6306.31 samples/sec Loss 3.9181 LearningRate 0.0001 Epoch: 26 Global Step: 549270 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:54,397-Speed 6315.04 samples/sec Loss 3.9486 LearningRate 0.0001 Epoch: 26 Global Step: 549280 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:41:57,651-Speed 6293.77 samples/sec Loss 3.8808 LearningRate 0.0001 Epoch: 26 Global Step: 549290 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:00,899-Speed 6306.93 samples/sec Loss 3.8948 LearningRate 0.0001 Epoch: 26 Global Step: 549300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:04,145-Speed 6312.00 samples/sec Loss 3.9181 LearningRate 0.0001 Epoch: 26 Global Step: 549310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:07,388-Speed 6315.69 samples/sec Loss 3.9103 LearningRate 0.0001 Epoch: 26 Global Step: 549320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:10,633-Speed 6312.06 samples/sec Loss 3.9150 LearningRate 0.0001 Epoch: 26 Global Step: 549330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:42:13,877-Speed 6314.75 samples/sec Loss 3.8664 LearningRate 0.0001 Epoch: 26 Global Step: 549340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:42:17,122-Speed 6313.19 samples/sec Loss 3.9004 LearningRate 0.0001 Epoch: 26 Global Step: 549350 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:42:20,364-Speed 6317.82 samples/sec Loss 3.9314 LearningRate 0.0001 Epoch: 26 Global Step: 549360 
Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:42:23,615-Speed 6301.84 samples/sec Loss 3.9391 LearningRate 0.0001 Epoch: 26 Global Step: 549370 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:42:26,851-Speed 6330.82 samples/sec Loss 3.8712 LearningRate 0.0001 Epoch: 26 Global Step: 549380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:30,094-Speed 6315.10 samples/sec Loss 3.9425 LearningRate 0.0001 Epoch: 26 Global Step: 549390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:33,346-Speed 6299.41 samples/sec Loss 3.8822 LearningRate 0.0001 Epoch: 26 Global Step: 549400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:36,594-Speed 6307.80 samples/sec Loss 3.9242 LearningRate 0.0001 Epoch: 26 Global Step: 549410 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:39,846-Speed 6299.12 samples/sec Loss 3.9509 LearningRate 0.0001 Epoch: 26 Global Step: 549420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:43,091-Speed 6312.13 samples/sec Loss 3.8922 LearningRate 0.0001 Epoch: 26 Global Step: 549430 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:46,331-Speed 6321.49 samples/sec Loss 3.9113 LearningRate 0.0001 Epoch: 26 Global Step: 549440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:49,580-Speed 6304.58 samples/sec Loss 3.9532 LearningRate 0.0001 Epoch: 26 Global Step: 549450 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:52,832-Speed 6300.52 samples/sec Loss 3.9600 LearningRate 0.0001 Epoch: 26 Global Step: 549460 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:56,081-Speed 6305.41 samples/sec Loss 3.9701 LearningRate 0.0001 Epoch: 26 Global Step: 549470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:42:59,325-Speed 6313.48 samples/sec Loss 3.9197 LearningRate 0.0001 Epoch: 26 Global Step: 549480 Fp16 Grad Scale: 16384 Required: 25 hours 
Training: 2022-04-02 17:43:02,571-Speed 6312.22 samples/sec Loss 3.9023 LearningRate 0.0001 Epoch: 26 Global Step: 549490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:05,815-Speed 6314.01 samples/sec Loss 3.9196 LearningRate 0.0001 Epoch: 26 Global Step: 549500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:09,061-Speed 6310.70 samples/sec Loss 3.9314 LearningRate 0.0001 Epoch: 26 Global Step: 549510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:12,309-Speed 6307.69 samples/sec Loss 3.8841 LearningRate 0.0001 Epoch: 26 Global Step: 549520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:15,554-Speed 6311.68 samples/sec Loss 3.8872 LearningRate 0.0001 Epoch: 26 Global Step: 549530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:18,800-Speed 6310.65 samples/sec Loss 3.8852 LearningRate 0.0001 Epoch: 26 Global Step: 549540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:22,047-Speed 6309.17 samples/sec Loss 3.8901 LearningRate 0.0001 Epoch: 26 Global Step: 549550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:25,295-Speed 6307.73 samples/sec Loss 3.8850 LearningRate 0.0001 Epoch: 26 Global Step: 549560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:28,544-Speed 6303.47 samples/sec Loss 3.9606 LearningRate 0.0001 Epoch: 26 Global Step: 549570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:31,775-Speed 6341.69 samples/sec Loss 3.9209 LearningRate 0.0001 Epoch: 26 Global Step: 549580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:35,021-Speed 6308.87 samples/sec Loss 3.9148 LearningRate 0.0001 Epoch: 26 Global Step: 549590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:38,271-Speed 6303.34 samples/sec Loss 3.9270 LearningRate 0.0001 Epoch: 26 Global Step: 549600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 
17:43:41,516-Speed 6313.10 samples/sec Loss 3.9552 LearningRate 0.0001 Epoch: 26 Global Step: 549610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:44,766-Speed 6303.71 samples/sec Loss 3.8055 LearningRate 0.0001 Epoch: 26 Global Step: 549620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:48,007-Speed 6320.34 samples/sec Loss 3.9196 LearningRate 0.0001 Epoch: 26 Global Step: 549630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:51,255-Speed 6306.88 samples/sec Loss 3.8849 LearningRate 0.0001 Epoch: 26 Global Step: 549640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:54,506-Speed 6301.21 samples/sec Loss 3.8837 LearningRate 0.0001 Epoch: 26 Global Step: 549650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:43:57,753-Speed 6308.09 samples/sec Loss 3.9072 LearningRate 0.0001 Epoch: 26 Global Step: 549660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:01,003-Speed 6303.45 samples/sec Loss 3.9216 LearningRate 0.0001 Epoch: 26 Global Step: 549670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:04,232-Speed 6343.36 samples/sec Loss 3.9465 LearningRate 0.0001 Epoch: 26 Global Step: 549680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:07,482-Speed 6303.87 samples/sec Loss 3.8451 LearningRate 0.0001 Epoch: 26 Global Step: 549690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:10,726-Speed 6314.31 samples/sec Loss 3.8743 LearningRate 0.0001 Epoch: 26 Global Step: 549700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:13,972-Speed 6311.47 samples/sec Loss 3.9308 LearningRate 0.0001 Epoch: 26 Global Step: 549710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:17,227-Speed 6293.58 samples/sec Loss 3.9127 LearningRate 0.0001 Epoch: 26 Global Step: 549720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:20,469-Speed 6317.74 
samples/sec Loss 3.9129 LearningRate 0.0001 Epoch: 26 Global Step: 549730 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:23,719-Speed 6303.57 samples/sec Loss 3.9776 LearningRate 0.0001 Epoch: 26 Global Step: 549740 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:26,964-Speed 6312.87 samples/sec Loss 3.9219 LearningRate 0.0001 Epoch: 26 Global Step: 549750 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:30,211-Speed 6307.97 samples/sec Loss 3.8273 LearningRate 0.0001 Epoch: 26 Global Step: 549760 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:33,452-Speed 6321.04 samples/sec Loss 3.9466 LearningRate 0.0001 Epoch: 26 Global Step: 549770 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:44:36,694-Speed 6318.44 samples/sec Loss 3.9093 LearningRate 0.0001 Epoch: 26 Global Step: 549780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:39,942-Speed 6306.38 samples/sec Loss 3.9303 LearningRate 0.0001 Epoch: 26 Global Step: 549790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:43,185-Speed 6315.15 samples/sec Loss 3.8313 LearningRate 0.0001 Epoch: 26 Global Step: 549800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:46,428-Speed 6317.07 samples/sec Loss 3.8150 LearningRate 0.0001 Epoch: 26 Global Step: 549810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:49,680-Speed 6299.68 samples/sec Loss 3.8867 LearningRate 0.0001 Epoch: 26 Global Step: 549820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:52,922-Speed 6318.17 samples/sec Loss 3.9438 LearningRate 0.0001 Epoch: 26 Global Step: 549830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:56,167-Speed 6313.17 samples/sec Loss 3.8810 LearningRate 0.0001 Epoch: 26 Global Step: 549840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:44:59,408-Speed 6320.34 samples/sec Loss 3.9136 LearningRate 
0.0001 Epoch: 26 Global Step: 549850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:02,652-Speed 6314.77 samples/sec Loss 3.9286 LearningRate 0.0001 Epoch: 26 Global Step: 549860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:05,894-Speed 6317.57 samples/sec Loss 3.8572 LearningRate 0.0001 Epoch: 26 Global Step: 549870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:09,125-Speed 6340.44 samples/sec Loss 3.8961 LearningRate 0.0001 Epoch: 26 Global Step: 549880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:12,369-Speed 6315.52 samples/sec Loss 3.8847 LearningRate 0.0001 Epoch: 26 Global Step: 549890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:15,615-Speed 6311.28 samples/sec Loss 3.8966 LearningRate 0.0001 Epoch: 26 Global Step: 549900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:18,863-Speed 6306.06 samples/sec Loss 3.9448 LearningRate 0.0001 Epoch: 26 Global Step: 549910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:22,108-Speed 6312.65 samples/sec Loss 3.8872 LearningRate 0.0001 Epoch: 26 Global Step: 549920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:25,353-Speed 6312.50 samples/sec Loss 3.8973 LearningRate 0.0001 Epoch: 26 Global Step: 549930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:28,601-Speed 6307.88 samples/sec Loss 3.9698 LearningRate 0.0001 Epoch: 26 Global Step: 549940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:31,844-Speed 6314.94 samples/sec Loss 3.8329 LearningRate 0.0001 Epoch: 26 Global Step: 549950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:35,092-Speed 6308.00 samples/sec Loss 3.9072 LearningRate 0.0001 Epoch: 26 Global Step: 549960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:38,338-Speed 6310.74 samples/sec Loss 3.9214 LearningRate 0.0001 Epoch: 26 Global Step: 
549970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:41,571-Speed 6334.78 samples/sec Loss 3.9137 LearningRate 0.0001 Epoch: 26 Global Step: 549980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:44,814-Speed 6317.46 samples/sec Loss 3.9465 LearningRate 0.0001 Epoch: 26 Global Step: 549990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:48,064-Speed 6302.92 samples/sec Loss 3.9740 LearningRate 0.0001 Epoch: 26 Global Step: 550000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:51,311-Speed 6307.65 samples/sec Loss 3.8886 LearningRate 0.0001 Epoch: 26 Global Step: 550010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:54,557-Speed 6312.68 samples/sec Loss 3.9048 LearningRate 0.0001 Epoch: 26 Global Step: 550020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:45:57,800-Speed 6314.92 samples/sec Loss 3.8558 LearningRate 0.0001 Epoch: 26 Global Step: 550030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:46:01,051-Speed 6301.47 samples/sec Loss 3.8940 LearningRate 0.0001 Epoch: 26 Global Step: 550040 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:46:04,290-Speed 6324.90 samples/sec Loss 3.9350 LearningRate 0.0001 Epoch: 26 Global Step: 550050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:07,532-Speed 6317.12 samples/sec Loss 3.8975 LearningRate 0.0001 Epoch: 26 Global Step: 550060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:10,776-Speed 6316.09 samples/sec Loss 3.9616 LearningRate 0.0001 Epoch: 26 Global Step: 550070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:14,041-Speed 6273.79 samples/sec Loss 3.9012 LearningRate 0.0001 Epoch: 26 Global Step: 550080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:17,306-Speed 6274.16 samples/sec Loss 3.8914 LearningRate 0.0001 Epoch: 26 Global Step: 550090 Fp16 Grad Scale: 8192 
Required: 25 hours Training: 2022-04-02 17:46:20,550-Speed 6315.11 samples/sec Loss 3.8855 LearningRate 0.0001 Epoch: 26 Global Step: 550100 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:23,796-Speed 6310.63 samples/sec Loss 3.9380 LearningRate 0.0001 Epoch: 26 Global Step: 550110 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:27,042-Speed 6310.00 samples/sec Loss 3.9508 LearningRate 0.0001 Epoch: 26 Global Step: 550120 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:30,289-Speed 6308.58 samples/sec Loss 3.8881 LearningRate 0.0001 Epoch: 26 Global Step: 550130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:33,534-Speed 6312.65 samples/sec Loss 3.9443 LearningRate 0.0001 Epoch: 26 Global Step: 550140 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:36,783-Speed 6308.74 samples/sec Loss 3.9233 LearningRate 0.0001 Epoch: 26 Global Step: 550150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:46:40,027-Speed 6313.39 samples/sec Loss 3.9750 LearningRate 0.0001 Epoch: 26 Global Step: 550160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:46:43,263-Speed 6329.79 samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 550170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:46,507-Speed 6315.03 samples/sec Loss 3.9691 LearningRate 0.0001 Epoch: 26 Global Step: 550180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:49,755-Speed 6307.36 samples/sec Loss 3.8580 LearningRate 0.0001 Epoch: 26 Global Step: 550190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:53,000-Speed 6313.68 samples/sec Loss 3.9295 LearningRate 0.0001 Epoch: 26 Global Step: 550200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:46:56,242-Speed 6316.87 samples/sec Loss 3.8705 LearningRate 0.0001 Epoch: 26 Global Step: 550210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
17:46:59,485-Speed 6316.28 samples/sec Loss 3.9502 LearningRate 0.0001 Epoch: 26 Global Step: 550220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:02,732-Speed 6309.15 samples/sec Loss 3.9198 LearningRate 0.0001 Epoch: 26 Global Step: 550230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:05,975-Speed 6317.31 samples/sec Loss 3.8794 LearningRate 0.0001 Epoch: 26 Global Step: 550240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:09,219-Speed 6314.35 samples/sec Loss 3.9683 LearningRate 0.0001 Epoch: 26 Global Step: 550250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:12,462-Speed 6315.88 samples/sec Loss 3.8723 LearningRate 0.0001 Epoch: 26 Global Step: 550260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:15,707-Speed 6313.95 samples/sec Loss 3.9361 LearningRate 0.0001 Epoch: 26 Global Step: 550270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:18,958-Speed 6299.95 samples/sec Loss 3.8815 LearningRate 0.0001 Epoch: 26 Global Step: 550280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:22,203-Speed 6313.34 samples/sec Loss 3.9008 LearningRate 0.0001 Epoch: 26 Global Step: 550290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:25,446-Speed 6317.22 samples/sec Loss 3.9712 LearningRate 0.0001 Epoch: 26 Global Step: 550300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:28,694-Speed 6306.03 samples/sec Loss 3.8639 LearningRate 0.0001 Epoch: 26 Global Step: 550310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:31,940-Speed 6311.74 samples/sec Loss 3.9636 LearningRate 0.0001 Epoch: 26 Global Step: 550320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:35,186-Speed 6311.01 samples/sec Loss 3.9117 LearningRate 0.0001 Epoch: 26 Global Step: 550330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:47:38,421-Speed 6331.06 
samples/sec Loss 3.9286 LearningRate 0.0001 Epoch: 26 Global Step: 550340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:41,664-Speed 6317.97 samples/sec Loss 3.9088 LearningRate 0.0001 Epoch: 26 Global Step: 550350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:44,911-Speed 6307.13 samples/sec Loss 3.8808 LearningRate 0.0001 Epoch: 26 Global Step: 550360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:48,159-Speed 6310.34 samples/sec Loss 3.8992 LearningRate 0.0001 Epoch: 26 Global Step: 550370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:51,401-Speed 6317.70 samples/sec Loss 3.8712 LearningRate 0.0001 Epoch: 26 Global Step: 550380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:54,645-Speed 6316.34 samples/sec Loss 3.9228 LearningRate 0.0001 Epoch: 26 Global Step: 550390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:47:57,889-Speed 6313.04 samples/sec Loss 3.9040 LearningRate 0.0001 Epoch: 26 Global Step: 550400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:48:01,134-Speed 6313.20 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 26 Global Step: 550410 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:48:04,381-Speed 6308.18 samples/sec Loss 3.8755 LearningRate 0.0001 Epoch: 26 Global Step: 550420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:48:07,630-Speed 6304.83 samples/sec Loss 3.9092 LearningRate 0.0001 Epoch: 26 Global Step: 550430 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:48:10,878-Speed 6307.00 samples/sec Loss 3.9208 LearningRate 0.0001 Epoch: 26 Global Step: 550440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:14,124-Speed 6311.67 samples/sec Loss 3.9016 LearningRate 0.0001 Epoch: 26 Global Step: 550450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:17,365-Speed 6319.68 samples/sec Loss 3.9655 LearningRate 
0.0001 Epoch: 26 Global Step: 550460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:20,609-Speed 6315.52 samples/sec Loss 3.8631 LearningRate 0.0001 Epoch: 26 Global Step: 550470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:23,860-Speed 6300.46 samples/sec Loss 3.9767 LearningRate 0.0001 Epoch: 26 Global Step: 550480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:27,101-Speed 6320.81 samples/sec Loss 3.9316 LearningRate 0.0001 Epoch: 26 Global Step: 550490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:30,346-Speed 6312.56 samples/sec Loss 3.9450 LearningRate 0.0001 Epoch: 26 Global Step: 550500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:33,594-Speed 6307.87 samples/sec Loss 3.9258 LearningRate 0.0001 Epoch: 26 Global Step: 550510 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:36,874-Speed 6244.76 samples/sec Loss 3.9228 LearningRate 0.0001 Epoch: 26 Global Step: 550520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:40,196-Speed 6166.29 samples/sec Loss 3.8442 LearningRate 0.0001 Epoch: 26 Global Step: 550530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:43,460-Speed 6277.45 samples/sec Loss 3.9911 LearningRate 0.0001 Epoch: 26 Global Step: 550540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:46,705-Speed 6311.63 samples/sec Loss 3.9741 LearningRate 0.0001 Epoch: 26 Global Step: 550550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:49,969-Speed 6276.70 samples/sec Loss 3.9222 LearningRate 0.0001 Epoch: 26 Global Step: 550560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:53,213-Speed 6313.73 samples/sec Loss 3.9070 LearningRate 0.0001 Epoch: 26 Global Step: 550570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:56,458-Speed 6312.42 samples/sec Loss 3.9413 LearningRate 0.0001 Epoch: 26 Global Step: 
550580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:48:59,709-Speed 6302.30 samples/sec Loss 3.9351 LearningRate 0.0001 Epoch: 26 Global Step: 550590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:02,952-Speed 6315.48 samples/sec Loss 3.8980 LearningRate 0.0001 Epoch: 26 Global Step: 550600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:06,198-Speed 6310.64 samples/sec Loss 3.8232 LearningRate 0.0001 Epoch: 26 Global Step: 550610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:09,443-Speed 6313.81 samples/sec Loss 3.8757 LearningRate 0.0001 Epoch: 26 Global Step: 550620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:12,690-Speed 6307.78 samples/sec Loss 3.8410 LearningRate 0.0001 Epoch: 26 Global Step: 550630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:15,919-Speed 6344.28 samples/sec Loss 3.8572 LearningRate 0.0001 Epoch: 26 Global Step: 550640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:19,168-Speed 6303.80 samples/sec Loss 3.9259 LearningRate 0.0001 Epoch: 26 Global Step: 550650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:22,413-Speed 6313.14 samples/sec Loss 3.9663 LearningRate 0.0001 Epoch: 26 Global Step: 550660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:25,658-Speed 6312.55 samples/sec Loss 3.8936 LearningRate 0.0001 Epoch: 26 Global Step: 550670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:28,920-Speed 6281.07 samples/sec Loss 3.8769 LearningRate 0.0001 Epoch: 26 Global Step: 550680 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:32,163-Speed 6316.08 samples/sec Loss 3.9468 LearningRate 0.0001 Epoch: 26 Global Step: 550690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:35,417-Speed 6295.93 samples/sec Loss 3.9210 LearningRate 0.0001 Epoch: 26 Global Step: 550700 Fp16 Grad Scale: 16384 
Required: 25 hours Training: 2022-04-02 17:49:38,660-Speed 6315.89 samples/sec Loss 3.8752 LearningRate 0.0001 Epoch: 26 Global Step: 550710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:41,906-Speed 6310.50 samples/sec Loss 3.9420 LearningRate 0.0001 Epoch: 26 Global Step: 550720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:45,160-Speed 6296.00 samples/sec Loss 3.8919 LearningRate 0.0001 Epoch: 26 Global Step: 550730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:48,390-Speed 6340.97 samples/sec Loss 3.9189 LearningRate 0.0001 Epoch: 26 Global Step: 550740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:51,631-Speed 6321.57 samples/sec Loss 3.9075 LearningRate 0.0001 Epoch: 26 Global Step: 550750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:54,875-Speed 6315.64 samples/sec Loss 3.9283 LearningRate 0.0001 Epoch: 26 Global Step: 550760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:49:58,120-Speed 6312.38 samples/sec Loss 3.8297 LearningRate 0.0001 Epoch: 26 Global Step: 550770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:01,363-Speed 6315.46 samples/sec Loss 3.8629 LearningRate 0.0001 Epoch: 26 Global Step: 550780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:04,608-Speed 6312.64 samples/sec Loss 3.9089 LearningRate 0.0001 Epoch: 26 Global Step: 550790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:07,858-Speed 6304.26 samples/sec Loss 3.8870 LearningRate 0.0001 Epoch: 26 Global Step: 550800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:11,111-Speed 6296.72 samples/sec Loss 3.8752 LearningRate 0.0001 Epoch: 26 Global Step: 550810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:14,354-Speed 6316.11 samples/sec Loss 3.9024 LearningRate 0.0001 Epoch: 26 Global Step: 550820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 
2022-04-02 17:50:17,601-Speed 6307.81 samples/sec Loss 3.8607 LearningRate 0.0001 Epoch: 26 Global Step: 550830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:20,836-Speed 6333.21 samples/sec Loss 3.8916 LearningRate 0.0001 Epoch: 26 Global Step: 550840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:24,085-Speed 6305.71 samples/sec Loss 3.9126 LearningRate 0.0001 Epoch: 26 Global Step: 550850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:27,337-Speed 6297.96 samples/sec Loss 3.9213 LearningRate 0.0001 Epoch: 26 Global Step: 550860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:30,585-Speed 6308.03 samples/sec Loss 3.9274 LearningRate 0.0001 Epoch: 26 Global Step: 550870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:33,836-Speed 6299.84 samples/sec Loss 3.9291 LearningRate 0.0001 Epoch: 26 Global Step: 550880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:50:37,073-Speed 6328.64 samples/sec Loss 3.9178 LearningRate 0.0001 Epoch: 26 Global Step: 550890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:40,318-Speed 6312.20 samples/sec Loss 3.9333 LearningRate 0.0001 Epoch: 26 Global Step: 550900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:43,562-Speed 6315.60 samples/sec Loss 3.9138 LearningRate 0.0001 Epoch: 26 Global Step: 550910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:46,814-Speed 6297.97 samples/sec Loss 3.9256 LearningRate 0.0001 Epoch: 26 Global Step: 550920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:50,058-Speed 6314.81 samples/sec Loss 3.9046 LearningRate 0.0001 Epoch: 26 Global Step: 550930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:53,309-Speed 6302.12 samples/sec Loss 3.9149 LearningRate 0.0001 Epoch: 26 Global Step: 550940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:56,556-Speed 6308.97 
samples/sec Loss 3.8837 LearningRate 0.0001 Epoch: 26 Global Step: 550950 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:50:59,802-Speed 6310.90 samples/sec Loss 3.9105 LearningRate 0.0001 Epoch: 26 Global Step: 550960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:51:03,051-Speed 6305.82 samples/sec Loss 3.9010 LearningRate 0.0001 Epoch: 26 Global Step: 550970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:51:06,297-Speed 6310.26 samples/sec Loss 3.8341 LearningRate 0.0001 Epoch: 26 Global Step: 550980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:51:09,541-Speed 6313.74 samples/sec Loss 3.9212 LearningRate 0.0001 Epoch: 26 Global Step: 550990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:12,789-Speed 6307.21 samples/sec Loss 3.8622 LearningRate 0.0001 Epoch: 26 Global Step: 551000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:16,117-Speed 6155.51 samples/sec Loss 3.8404 LearningRate 0.0001 Epoch: 26 Global Step: 551010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:19,369-Speed 6299.06 samples/sec Loss 3.9072 LearningRate 0.0001 Epoch: 26 Global Step: 551020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:22,612-Speed 6316.07 samples/sec Loss 3.8382 LearningRate 0.0001 Epoch: 26 Global Step: 551030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:25,858-Speed 6310.27 samples/sec Loss 3.8987 LearningRate 0.0001 Epoch: 26 Global Step: 551040 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:29,100-Speed 6318.96 samples/sec Loss 3.9294 LearningRate 0.0001 Epoch: 26 Global Step: 551050 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:32,349-Speed 6304.34 samples/sec Loss 3.8619 LearningRate 0.0001 Epoch: 26 Global Step: 551060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:35,594-Speed 6312.89 samples/sec Loss 3.8603 
LearningRate 0.0001 Epoch: 26 Global Step: 551070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:38,841-Speed 6309.34 samples/sec Loss 3.9425 LearningRate 0.0001 Epoch: 26 Global Step: 551080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:42,085-Speed 6313.85 samples/sec Loss 3.8500 LearningRate 0.0001 Epoch: 26 Global Step: 551090 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-04-02 17:51:45,313-Speed 6345.55 samples/sec Loss 3.9113 LearningRate 0.0001 Epoch: 26 Global Step: 551100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:48,563-Speed 6303.79 samples/sec Loss 3.9180 LearningRate 0.0001 Epoch: 26 Global Step: 551110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:51,807-Speed 6314.47 samples/sec Loss 3.9033 LearningRate 0.0001 Epoch: 26 Global Step: 551120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:55,055-Speed 6307.26 samples/sec Loss 3.9531 LearningRate 0.0001 Epoch: 26 Global Step: 551130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:51:58,304-Speed 6305.59 samples/sec Loss 3.8776 LearningRate 0.0001 Epoch: 26 Global Step: 551140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:52:01,551-Speed 6309.79 samples/sec Loss 3.9244 LearningRate 0.0001 Epoch: 26 Global Step: 551150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:52:04,810-Speed 6284.50 samples/sec Loss 3.8908 LearningRate 0.0001 Epoch: 26 Global Step: 551160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:52:08,042-Speed 6337.71 samples/sec Loss 3.8549 LearningRate 0.0001 Epoch: 26 Global Step: 551170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:11,291-Speed 6306.05 samples/sec Loss 3.8719 LearningRate 0.0001 Epoch: 26 Global Step: 551180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:14,535-Speed 6314.47 samples/sec Loss 3.9159 LearningRate 0.0001 Epoch: 26 
Global Step: 551190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:17,779-Speed 6314.28 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 26 Global Step: 551200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:21,027-Speed 6306.79 samples/sec Loss 3.8500 LearningRate 0.0001 Epoch: 26 Global Step: 551210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:24,274-Speed 6308.90 samples/sec Loss 3.8670 LearningRate 0.0001 Epoch: 26 Global Step: 551220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:27,522-Speed 6306.94 samples/sec Loss 3.9142 LearningRate 0.0001 Epoch: 26 Global Step: 551230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:30,768-Speed 6312.43 samples/sec Loss 3.8538 LearningRate 0.0001 Epoch: 26 Global Step: 551240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:34,019-Speed 6300.14 samples/sec Loss 3.9311 LearningRate 0.0001 Epoch: 26 Global Step: 551250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:37,266-Speed 6309.09 samples/sec Loss 3.9170 LearningRate 0.0001 Epoch: 26 Global Step: 551260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:40,513-Speed 6308.46 samples/sec Loss 3.9115 LearningRate 0.0001 Epoch: 26 Global Step: 551270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:52:43,785-Speed 6260.69 samples/sec Loss 3.8702 LearningRate 0.0001 Epoch: 26 Global Step: 551280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:52:47,016-Speed 6339.75 samples/sec Loss 3.9363 LearningRate 0.0001 Epoch: 26 Global Step: 551290 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:50,356-Speed 6132.17 samples/sec Loss 3.8907 LearningRate 0.0001 Epoch: 26 Global Step: 551300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:52:53,672-Speed 6178.66 samples/sec Loss 3.9289 LearningRate 0.0001 Epoch: 26 Global Step: 551310 Fp16 Grad Scale: 8192 
Required: 25 hours Training: 2022-04-02 17:52:56,917-Speed 6311.95 samples/sec Loss 3.9067 LearningRate 0.0001 Epoch: 26 Global Step: 551320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:00,162-Speed 6314.04 samples/sec Loss 3.9298 LearningRate 0.0001 Epoch: 26 Global Step: 551330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:03,410-Speed 6306.26 samples/sec Loss 3.8839 LearningRate 0.0001 Epoch: 26 Global Step: 551340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:06,659-Speed 6305.12 samples/sec Loss 3.8923 LearningRate 0.0001 Epoch: 26 Global Step: 551350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:09,910-Speed 6301.18 samples/sec Loss 3.8971 LearningRate 0.0001 Epoch: 26 Global Step: 551360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:13,152-Speed 6318.23 samples/sec Loss 3.8721 LearningRate 0.0001 Epoch: 26 Global Step: 551370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:16,397-Speed 6312.56 samples/sec Loss 3.8638 LearningRate 0.0001 Epoch: 26 Global Step: 551380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:19,644-Speed 6310.95 samples/sec Loss 3.8821 LearningRate 0.0001 Epoch: 26 Global Step: 551390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:53:22,899-Speed 6293.22 samples/sec Loss 3.9044 LearningRate 0.0001 Epoch: 26 Global Step: 551400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:53:26,150-Speed 6301.02 samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 551410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:53:29,396-Speed 6311.60 samples/sec Loss 3.9073 LearningRate 0.0001 Epoch: 26 Global Step: 551420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:53:32,643-Speed 6308.86 samples/sec Loss 3.9713 LearningRate 0.0001 Epoch: 26 Global Step: 551430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 
2022-04-02 17:53:35,877-Speed 6334.52 samples/sec Loss 3.8785 LearningRate 0.0001 Epoch: 26 Global Step: 551440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:39,127-Speed 6302.40 samples/sec Loss 3.9341 LearningRate 0.0001 Epoch: 26 Global Step: 551450 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:42,371-Speed 6315.61 samples/sec Loss 3.9035 LearningRate 0.0001 Epoch: 26 Global Step: 551460 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:45,619-Speed 6306.57 samples/sec Loss 3.8601 LearningRate 0.0001 Epoch: 26 Global Step: 551470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:48,865-Speed 6309.15 samples/sec Loss 3.9424 LearningRate 0.0001 Epoch: 26 Global Step: 551480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:52,112-Speed 6308.69 samples/sec Loss 3.9289 LearningRate 0.0001 Epoch: 26 Global Step: 551490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:55,360-Speed 6307.55 samples/sec Loss 3.8689 LearningRate 0.0001 Epoch: 26 Global Step: 551500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:53:58,604-Speed 6314.57 samples/sec Loss 3.9509 LearningRate 0.0001 Epoch: 26 Global Step: 551510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:01,851-Speed 6307.64 samples/sec Loss 3.8147 LearningRate 0.0001 Epoch: 26 Global Step: 551520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:05,096-Speed 6313.98 samples/sec Loss 3.9412 LearningRate 0.0001 Epoch: 26 Global Step: 551530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:08,344-Speed 6307.11 samples/sec Loss 3.8039 LearningRate 0.0001 Epoch: 26 Global Step: 551540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:54:11,588-Speed 6314.39 samples/sec Loss 3.8225 LearningRate 0.0001 Epoch: 26 Global Step: 551550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:54:14,838-Speed 6302.18 
samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 551560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:54:18,086-Speed 6307.46 samples/sec Loss 3.9071 LearningRate 0.0001 Epoch: 26 Global Step: 551570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:54:21,333-Speed 6309.56 samples/sec Loss 3.8831 LearningRate 0.0001 Epoch: 26 Global Step: 551580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:54:24,568-Speed 6332.40 samples/sec Loss 3.8734 LearningRate 0.0001 Epoch: 26 Global Step: 551590 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:27,815-Speed 6308.68 samples/sec Loss 3.9464 LearningRate 0.0001 Epoch: 26 Global Step: 551600 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:31,060-Speed 6312.78 samples/sec Loss 3.8705 LearningRate 0.0001 Epoch: 26 Global Step: 551610 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:34,304-Speed 6314.80 samples/sec Loss 3.8723 LearningRate 0.0001 Epoch: 26 Global Step: 551620 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:37,542-Speed 6324.88 samples/sec Loss 3.8658 LearningRate 0.0001 Epoch: 26 Global Step: 551630 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:40,790-Speed 6308.42 samples/sec Loss 3.8026 LearningRate 0.0001 Epoch: 26 Global Step: 551640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:44,031-Speed 6319.73 samples/sec Loss 3.8585 LearningRate 0.0001 Epoch: 26 Global Step: 551650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:47,277-Speed 6310.50 samples/sec Loss 3.9238 LearningRate 0.0001 Epoch: 26 Global Step: 551660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:50,532-Speed 6294.09 samples/sec Loss 3.9134 LearningRate 0.0001 Epoch: 26 Global Step: 551670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:53,778-Speed 6309.71 samples/sec Loss 3.8452 LearningRate 
0.0001 Epoch: 26 Global Step: 551680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:54:57,021-Speed 6316.09 samples/sec Loss 3.8627 LearningRate 0.0001 Epoch: 26 Global Step: 551690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:00,267-Speed 6310.47 samples/sec Loss 3.8870 LearningRate 0.0001 Epoch: 26 Global Step: 551700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:03,513-Speed 6312.02 samples/sec Loss 3.8403 LearningRate 0.0001 Epoch: 26 Global Step: 551710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:06,760-Speed 6307.80 samples/sec Loss 3.9102 LearningRate 0.0001 Epoch: 26 Global Step: 551720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:10,005-Speed 6313.64 samples/sec Loss 3.8927 LearningRate 0.0001 Epoch: 26 Global Step: 551730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:13,260-Speed 6293.48 samples/sec Loss 3.8198 LearningRate 0.0001 Epoch: 26 Global Step: 551740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:16,503-Speed 6315.19 samples/sec Loss 3.8977 LearningRate 0.0001 Epoch: 26 Global Step: 551750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:19,750-Speed 6308.85 samples/sec Loss 3.8776 LearningRate 0.0001 Epoch: 26 Global Step: 551760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:22,994-Speed 6315.88 samples/sec Loss 3.9358 LearningRate 0.0001 Epoch: 26 Global Step: 551770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:55:26,227-Speed 6335.16 samples/sec Loss 3.8841 LearningRate 0.0001 Epoch: 26 Global Step: 551780 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:29,483-Speed 6292.63 samples/sec Loss 3.9290 LearningRate 0.0001 Epoch: 26 Global Step: 551790 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:32,725-Speed 6318.61 samples/sec Loss 3.9034 LearningRate 0.0001 Epoch: 26 Global Step: 
551800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:35,974-Speed 6305.74 samples/sec Loss 3.9079 LearningRate 0.0001 Epoch: 26 Global Step: 551810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:39,219-Speed 6312.66 samples/sec Loss 3.9066 LearningRate 0.0001 Epoch: 26 Global Step: 551820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:42,461-Speed 6317.73 samples/sec Loss 3.9258 LearningRate 0.0001 Epoch: 26 Global Step: 551830 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:45,708-Speed 6308.70 samples/sec Loss 3.9222 LearningRate 0.0001 Epoch: 26 Global Step: 551840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:48,953-Speed 6312.59 samples/sec Loss 3.9160 LearningRate 0.0001 Epoch: 26 Global Step: 551850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:52,196-Speed 6317.16 samples/sec Loss 3.9169 LearningRate 0.0001 Epoch: 26 Global Step: 551860 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:55,441-Speed 6311.57 samples/sec Loss 3.9759 LearningRate 0.0001 Epoch: 26 Global Step: 551870 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:55:58,689-Speed 6307.68 samples/sec Loss 3.9236 LearningRate 0.0001 Epoch: 26 Global Step: 551880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:01,936-Speed 6307.47 samples/sec Loss 3.9459 LearningRate 0.0001 Epoch: 26 Global Step: 551890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:05,185-Speed 6305.73 samples/sec Loss 3.8833 LearningRate 0.0001 Epoch: 26 Global Step: 551900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:08,435-Speed 6303.11 samples/sec Loss 3.9626 LearningRate 0.0001 Epoch: 26 Global Step: 551910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:11,667-Speed 6338.96 samples/sec Loss 3.8837 LearningRate 0.0001 Epoch: 26 Global Step: 551920 Fp16 Grad Scale: 8192 Required: 
25 hours Training: 2022-04-02 17:56:14,913-Speed 6311.00 samples/sec Loss 3.8624 LearningRate 0.0001 Epoch: 26 Global Step: 551930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:18,158-Speed 6311.59 samples/sec Loss 3.9389 LearningRate 0.0001 Epoch: 26 Global Step: 551940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:21,402-Speed 6313.58 samples/sec Loss 3.9215 LearningRate 0.0001 Epoch: 26 Global Step: 551950 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:24,651-Speed 6305.88 samples/sec Loss 3.9006 LearningRate 0.0001 Epoch: 26 Global Step: 551960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:27,894-Speed 6317.36 samples/sec Loss 3.8554 LearningRate 0.0001 Epoch: 26 Global Step: 551970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:31,141-Speed 6307.09 samples/sec Loss 3.8978 LearningRate 0.0001 Epoch: 26 Global Step: 551980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:34,385-Speed 6314.59 samples/sec Loss 3.9009 LearningRate 0.0001 Epoch: 26 Global Step: 551990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:37,634-Speed 6306.48 samples/sec Loss 3.9108 LearningRate 0.0001 Epoch: 26 Global Step: 552000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:40,884-Speed 6303.19 samples/sec Loss 3.8656 LearningRate 0.0001 Epoch: 26 Global Step: 552010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:44,130-Speed 6311.08 samples/sec Loss 3.9374 LearningRate 0.0001 Epoch: 26 Global Step: 552020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:47,378-Speed 6306.11 samples/sec Loss 3.9579 LearningRate 0.0001 Epoch: 26 Global Step: 552030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:56:50,607-Speed 6343.90 samples/sec Loss 3.8483 LearningRate 0.0001 Epoch: 26 Global Step: 552040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
17:56:53,852-Speed 6314.29 samples/sec Loss 3.9239 LearningRate 0.0001 Epoch: 26 Global Step: 552050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:56:57,101-Speed 6304.09 samples/sec Loss 3.8901 LearningRate 0.0001 Epoch: 26 Global Step: 552060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:00,346-Speed 6311.51 samples/sec Loss 3.9236 LearningRate 0.0001 Epoch: 26 Global Step: 552070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:03,593-Speed 6309.60 samples/sec Loss 3.9002 LearningRate 0.0001 Epoch: 26 Global Step: 552080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:06,838-Speed 6311.67 samples/sec Loss 3.8463 LearningRate 0.0001 Epoch: 26 Global Step: 552090 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:10,086-Speed 6308.39 samples/sec Loss 3.8946 LearningRate 0.0001 Epoch: 26 Global Step: 552100 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:13,328-Speed 6317.41 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 26 Global Step: 552110 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:16,570-Speed 6318.57 samples/sec Loss 3.8745 LearningRate 0.0001 Epoch: 26 Global Step: 552120 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:19,816-Speed 6310.88 samples/sec Loss 3.8400 LearningRate 0.0001 Epoch: 26 Global Step: 552130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:23,067-Speed 6300.95 samples/sec Loss 3.8498 LearningRate 0.0001 Epoch: 26 Global Step: 552140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:57:26,314-Speed 6309.15 samples/sec Loss 3.8631 LearningRate 0.0001 Epoch: 26 Global Step: 552150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:57:29,567-Speed 6297.67 samples/sec Loss 3.8622 LearningRate 0.0001 Epoch: 26 Global Step: 552160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:57:32,807-Speed 6321.42 samples/sec 
Loss 3.8850 LearningRate 0.0001 Epoch: 26 Global Step: 552170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:57:36,052-Speed 6313.46 samples/sec Loss 3.8890 LearningRate 0.0001 Epoch: 26 Global Step: 552180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:57:39,279-Speed 6346.98 samples/sec Loss 3.9054 LearningRate 0.0001 Epoch: 26 Global Step: 552190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:42,527-Speed 6307.44 samples/sec Loss 3.8719 LearningRate 0.0001 Epoch: 26 Global Step: 552200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:45,771-Speed 6316.16 samples/sec Loss 3.8967 LearningRate 0.0001 Epoch: 26 Global Step: 552210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:49,018-Speed 6307.32 samples/sec Loss 3.8711 LearningRate 0.0001 Epoch: 26 Global Step: 552220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:52,265-Speed 6310.61 samples/sec Loss 3.9389 LearningRate 0.0001 Epoch: 26 Global Step: 552230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:55,509-Speed 6312.94 samples/sec Loss 3.8850 LearningRate 0.0001 Epoch: 26 Global Step: 552240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:57:58,759-Speed 6303.02 samples/sec Loss 3.8656 LearningRate 0.0001 Epoch: 26 Global Step: 552250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:02,008-Speed 6306.03 samples/sec Loss 3.9111 LearningRate 0.0001 Epoch: 26 Global Step: 552260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:05,253-Speed 6313.29 samples/sec Loss 3.8804 LearningRate 0.0001 Epoch: 26 Global Step: 552270 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:08,496-Speed 6315.48 samples/sec Loss 3.8786 LearningRate 0.0001 Epoch: 26 Global Step: 552280 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:11,743-Speed 6311.48 samples/sec Loss 3.8318 LearningRate 0.0001 Epoch: 26 
Global Step: 552290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:14,991-Speed 6307.04 samples/sec Loss 3.8995 LearningRate 0.0001 Epoch: 26 Global Step: 552300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:18,239-Speed 6307.37 samples/sec Loss 3.9113 LearningRate 0.0001 Epoch: 26 Global Step: 552310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:21,483-Speed 6312.82 samples/sec Loss 3.9484 LearningRate 0.0001 Epoch: 26 Global Step: 552320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:24,730-Speed 6310.28 samples/sec Loss 3.8539 LearningRate 0.0001 Epoch: 26 Global Step: 552330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:27,979-Speed 6304.38 samples/sec Loss 3.9020 LearningRate 0.0001 Epoch: 26 Global Step: 552340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:58:31,213-Speed 6333.82 samples/sec Loss 3.9304 LearningRate 0.0001 Epoch: 26 Global Step: 552350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:34,457-Speed 6314.93 samples/sec Loss 3.9607 LearningRate 0.0001 Epoch: 26 Global Step: 552360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:37,706-Speed 6303.72 samples/sec Loss 3.8851 LearningRate 0.0001 Epoch: 26 Global Step: 552370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:40,955-Speed 6306.51 samples/sec Loss 3.9409 LearningRate 0.0001 Epoch: 26 Global Step: 552380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:44,208-Speed 6296.05 samples/sec Loss 3.9446 LearningRate 0.0001 Epoch: 26 Global Step: 552390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:47,455-Speed 6309.67 samples/sec Loss 3.9258 LearningRate 0.0001 Epoch: 26 Global Step: 552400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:50,704-Speed 6305.92 samples/sec Loss 3.8849 LearningRate 0.0001 Epoch: 26 Global Step: 552410 Fp16 Grad Scale: 
8192 Required: 25 hours Training: 2022-04-02 17:58:53,952-Speed 6306.92 samples/sec Loss 3.8967 LearningRate 0.0001 Epoch: 26 Global Step: 552420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:58:57,200-Speed 6306.50 samples/sec Loss 3.8803 LearningRate 0.0001 Epoch: 26 Global Step: 552430 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:00,449-Speed 6304.08 samples/sec Loss 3.9074 LearningRate 0.0001 Epoch: 26 Global Step: 552440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:03,700-Speed 6301.19 samples/sec Loss 3.9044 LearningRate 0.0001 Epoch: 26 Global Step: 552450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:06,951-Speed 6301.65 samples/sec Loss 3.9231 LearningRate 0.0001 Epoch: 26 Global Step: 552460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:10,197-Speed 6310.72 samples/sec Loss 3.9128 LearningRate 0.0001 Epoch: 26 Global Step: 552470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:13,444-Speed 6308.61 samples/sec Loss 3.9454 LearningRate 0.0001 Epoch: 26 Global Step: 552480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:16,675-Speed 6340.80 samples/sec Loss 3.9085 LearningRate 0.0001 Epoch: 26 Global Step: 552490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:19,918-Speed 6316.43 samples/sec Loss 3.9130 LearningRate 0.0001 Epoch: 26 Global Step: 552500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:23,162-Speed 6313.17 samples/sec Loss 3.8906 LearningRate 0.0001 Epoch: 26 Global Step: 552510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:26,407-Speed 6313.23 samples/sec Loss 3.8982 LearningRate 0.0001 Epoch: 26 Global Step: 552520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:29,653-Speed 6311.00 samples/sec Loss 3.9692 LearningRate 0.0001 Epoch: 26 Global Step: 552530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 
2022-04-02 17:59:32,897-Speed 6314.09 samples/sec Loss 3.8895 LearningRate 0.0001 Epoch: 26 Global Step: 552540 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:36,142-Speed 6313.49 samples/sec Loss 3.9468 LearningRate 0.0001 Epoch: 26 Global Step: 552550 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:39,388-Speed 6311.15 samples/sec Loss 3.8749 LearningRate 0.0001 Epoch: 26 Global Step: 552560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:42,632-Speed 6313.84 samples/sec Loss 3.8881 LearningRate 0.0001 Epoch: 26 Global Step: 552570 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:45,880-Speed 6306.27 samples/sec Loss 3.9190 LearningRate 0.0001 Epoch: 26 Global Step: 552580 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 17:59:49,127-Speed 6309.38 samples/sec Loss 3.8825 LearningRate 0.0001 Epoch: 26 Global Step: 552590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:52,374-Speed 6310.63 samples/sec Loss 3.9032 LearningRate 0.0001 Epoch: 26 Global Step: 552600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:55,629-Speed 6291.42 samples/sec Loss 3.9099 LearningRate 0.0001 Epoch: 26 Global Step: 552610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 17:59:58,871-Speed 6318.70 samples/sec Loss 3.8585 LearningRate 0.0001 Epoch: 26 Global Step: 552620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:02,119-Speed 6307.94 samples/sec Loss 3.8251 LearningRate 0.0001 Epoch: 26 Global Step: 552630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:05,352-Speed 6336.65 samples/sec Loss 3.9103 LearningRate 0.0001 Epoch: 26 Global Step: 552640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:08,598-Speed 6309.93 samples/sec Loss 3.8316 LearningRate 0.0001 Epoch: 26 Global Step: 552650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:11,851-Speed 6296.69 
samples/sec Loss 3.9253 LearningRate 0.0001 Epoch: 26 Global Step: 552660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:15,096-Speed 6313.43 samples/sec Loss 3.9387 LearningRate 0.0001 Epoch: 26 Global Step: 552670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:18,344-Speed 6306.59 samples/sec Loss 3.9248 LearningRate 0.0001 Epoch: 26 Global Step: 552680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:21,589-Speed 6311.96 samples/sec Loss 3.9472 LearningRate 0.0001 Epoch: 26 Global Step: 552690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:24,839-Speed 6304.00 samples/sec Loss 3.8622 LearningRate 0.0001 Epoch: 26 Global Step: 552700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:28,084-Speed 6311.29 samples/sec Loss 3.8851 LearningRate 0.0001 Epoch: 26 Global Step: 552710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:31,389-Speed 6198.49 samples/sec Loss 3.9445 LearningRate 0.0001 Epoch: 26 Global Step: 552720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:34,633-Speed 6315.23 samples/sec Loss 3.8961 LearningRate 0.0001 Epoch: 26 Global Step: 552730 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:00:37,879-Speed 6309.82 samples/sec Loss 3.9462 LearningRate 0.0001 Epoch: 26 Global Step: 552740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:41,127-Speed 6308.09 samples/sec Loss 3.8779 LearningRate 0.0001 Epoch: 26 Global Step: 552750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:44,373-Speed 6309.82 samples/sec Loss 3.9323 LearningRate 0.0001 Epoch: 26 Global Step: 552760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:47,618-Speed 6312.25 samples/sec Loss 3.8462 LearningRate 0.0001 Epoch: 26 Global Step: 552770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:50,862-Speed 6315.69 samples/sec Loss 3.8830 LearningRate 
0.0001 Epoch: 26 Global Step: 552780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:54,109-Speed 6308.11 samples/sec Loss 3.8545 LearningRate 0.0001 Epoch: 26 Global Step: 552790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:00:57,351-Speed 6318.06 samples/sec Loss 3.9027 LearningRate 0.0001 Epoch: 26 Global Step: 552800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:00,592-Speed 6320.75 samples/sec Loss 3.9413 LearningRate 0.0001 Epoch: 26 Global Step: 552810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:03,837-Speed 6313.78 samples/sec Loss 3.8513 LearningRate 0.0001 Epoch: 26 Global Step: 552820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:07,081-Speed 6314.11 samples/sec Loss 3.8490 LearningRate 0.0001 Epoch: 26 Global Step: 552830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:10,311-Speed 6342.83 samples/sec Loss 3.9097 LearningRate 0.0001 Epoch: 26 Global Step: 552840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:13,561-Speed 6303.66 samples/sec Loss 3.9305 LearningRate 0.0001 Epoch: 26 Global Step: 552850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:16,806-Speed 6313.26 samples/sec Loss 3.8986 LearningRate 0.0001 Epoch: 26 Global Step: 552860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:20,049-Speed 6315.67 samples/sec Loss 3.8971 LearningRate 0.0001 Epoch: 26 Global Step: 552870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:23,280-Speed 6340.34 samples/sec Loss 3.9034 LearningRate 0.0001 Epoch: 26 Global Step: 552880 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:26,524-Speed 6313.40 samples/sec Loss 3.8502 LearningRate 0.0001 Epoch: 26 Global Step: 552890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:29,777-Speed 6297.80 samples/sec Loss 3.8794 LearningRate 0.0001 Epoch: 26 Global Step: 
552900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:33,024-Speed 6308.72 samples/sec Loss 3.9264 LearningRate 0.0001 Epoch: 26 Global Step: 552910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:36,266-Speed 6317.59 samples/sec Loss 3.8644 LearningRate 0.0001 Epoch: 26 Global Step: 552920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:39,515-Speed 6307.98 samples/sec Loss 3.9080 LearningRate 0.0001 Epoch: 26 Global Step: 552930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:42,759-Speed 6315.24 samples/sec Loss 3.8751 LearningRate 0.0001 Epoch: 26 Global Step: 552940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:46,006-Speed 6309.15 samples/sec Loss 3.9157 LearningRate 0.0001 Epoch: 26 Global Step: 552950 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:49,253-Speed 6308.82 samples/sec Loss 3.8049 LearningRate 0.0001 Epoch: 26 Global Step: 552960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:52,497-Speed 6313.13 samples/sec Loss 3.9587 LearningRate 0.0001 Epoch: 26 Global Step: 552970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:01:55,746-Speed 6305.55 samples/sec Loss 3.8249 LearningRate 0.0001 Epoch: 26 Global Step: 552980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:01:58,999-Speed 6296.88 samples/sec Loss 3.8703 LearningRate 0.0001 Epoch: 26 Global Step: 552990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:02:02,230-Speed 6340.09 samples/sec Loss 3.9055 LearningRate 0.0001 Epoch: 26 Global Step: 553000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:05,534-Speed 6199.18 samples/sec Loss 3.8912 LearningRate 0.0001 Epoch: 26 Global Step: 553010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:08,785-Speed 6302.18 samples/sec Loss 3.9813 LearningRate 0.0001 Epoch: 26 Global Step: 553020 Fp16 Grad Scale: 8192 Required: 25 
hours Training: 2022-04-02 18:02:12,037-Speed 6299.44 samples/sec Loss 3.9179 LearningRate 0.0001 Epoch: 26 Global Step: 553030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:15,286-Speed 6306.48 samples/sec Loss 3.8809 LearningRate 0.0001 Epoch: 26 Global Step: 553040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:18,532-Speed 6310.26 samples/sec Loss 3.9473 LearningRate 0.0001 Epoch: 26 Global Step: 553050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:21,783-Speed 6300.71 samples/sec Loss 3.9116 LearningRate 0.0001 Epoch: 26 Global Step: 553060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:25,033-Speed 6303.36 samples/sec Loss 3.8912 LearningRate 0.0001 Epoch: 26 Global Step: 553070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:28,279-Speed 6310.46 samples/sec Loss 3.8267 LearningRate 0.0001 Epoch: 26 Global Step: 553080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:31,527-Speed 6307.48 samples/sec Loss 3.7884 LearningRate 0.0001 Epoch: 26 Global Step: 553090 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:34,774-Speed 6308.55 samples/sec Loss 3.8813 LearningRate 0.0001 Epoch: 26 Global Step: 553100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:02:38,018-Speed 6313.80 samples/sec Loss 3.8749 LearningRate 0.0001 Epoch: 26 Global Step: 553110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:02:41,270-Speed 6299.09 samples/sec Loss 3.8583 LearningRate 0.0001 Epoch: 26 Global Step: 553120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:02:44,503-Speed 6337.38 samples/sec Loss 3.8832 LearningRate 0.0001 Epoch: 26 Global Step: 553130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:47,750-Speed 6308.26 samples/sec Loss 3.8904 LearningRate 0.0001 Epoch: 26 Global Step: 553140 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:02:51,000-Speed 6302.17 samples/sec Loss 3.8734 LearningRate 0.0001 Epoch: 26 Global Step: 553150 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:54,251-Speed 6300.72 samples/sec Loss 3.8958 LearningRate 0.0001 Epoch: 26 Global Step: 553160 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:02:57,496-Speed 6313.44 samples/sec Loss 3.9027 LearningRate 0.0001 Epoch: 26 Global Step: 553170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:00,744-Speed 6307.02 samples/sec Loss 3.8928 LearningRate 0.0001 Epoch: 26 Global Step: 553180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:03,995-Speed 6299.63 samples/sec Loss 3.8623 LearningRate 0.0001 Epoch: 26 Global Step: 553190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:07,251-Speed 6291.11 samples/sec Loss 3.9031 LearningRate 0.0001 Epoch: 26 Global Step: 553200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:10,495-Speed 6315.96 samples/sec Loss 3.9063 LearningRate 0.0001 Epoch: 26 Global Step: 553210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:13,745-Speed 6303.00 samples/sec Loss 3.9554 LearningRate 0.0001 Epoch: 26 Global Step: 553220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:16,991-Speed 6310.76 samples/sec Loss 3.8584 LearningRate 0.0001 Epoch: 26 Global Step: 553230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:20,234-Speed 6317.00 samples/sec Loss 3.8637 LearningRate 0.0001 Epoch: 26 Global Step: 553240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:23,479-Speed 6313.15 samples/sec Loss 3.8908 LearningRate 0.0001 Epoch: 26 Global Step: 553250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:26,726-Speed 6309.11 samples/sec Loss 3.8830 LearningRate 0.0001 Epoch: 26 Global Step: 553260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:29,972-Speed 6309.78 samples/sec 
Loss 3.9047 LearningRate 0.0001 Epoch: 26 Global Step: 553270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:33,217-Speed 6312.25 samples/sec Loss 3.7972 LearningRate 0.0001 Epoch: 26 Global Step: 553280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:36,461-Speed 6314.92 samples/sec Loss 3.8992 LearningRate 0.0001 Epoch: 26 Global Step: 553290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:03:39,695-Speed 6337.46 samples/sec Loss 3.9042 LearningRate 0.0001 Epoch: 26 Global Step: 553300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:42,945-Speed 6302.93 samples/sec Loss 3.8929 LearningRate 0.0001 Epoch: 26 Global Step: 553310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:46,190-Speed 6311.61 samples/sec Loss 3.9150 LearningRate 0.0001 Epoch: 26 Global Step: 553320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:49,438-Speed 6306.25 samples/sec Loss 3.8838 LearningRate 0.0001 Epoch: 26 Global Step: 553330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:52,680-Speed 6319.03 samples/sec Loss 3.8520 LearningRate 0.0001 Epoch: 26 Global Step: 553340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:55,926-Speed 6311.39 samples/sec Loss 3.9042 LearningRate 0.0001 Epoch: 26 Global Step: 553350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:03:59,168-Speed 6317.98 samples/sec Loss 3.9294 LearningRate 0.0001 Epoch: 26 Global Step: 553360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:04:02,412-Speed 6313.72 samples/sec Loss 3.9153 LearningRate 0.0001 Epoch: 26 Global Step: 553370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:04:05,659-Speed 6308.83 samples/sec Loss 3.9185 LearningRate 0.0001 Epoch: 26 Global Step: 553380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:04:08,909-Speed 6304.07 samples/sec Loss 3.8929 LearningRate 0.0001 Epoch: 
26 Global Step: 553390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:04:12,151-Speed 6318.02 samples/sec Loss 3.9328 LearningRate 0.0001 Epoch: 26 Global Step: 553400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:15,396-Speed 6314.18 samples/sec Loss 3.8805 LearningRate 0.0001 Epoch: 26 Global Step: 553410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:18,639-Speed 6314.98 samples/sec Loss 3.8341 LearningRate 0.0001 Epoch: 26 Global Step: 553420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:21,914-Speed 6254.43 samples/sec Loss 3.9225 LearningRate 0.0001 Epoch: 26 Global Step: 553430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:25,271-Speed 6102.12 samples/sec Loss 3.8547 LearningRate 0.0001 Epoch: 26 Global Step: 553440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:28,531-Speed 6284.22 samples/sec Loss 3.8692 LearningRate 0.0001 Epoch: 26 Global Step: 553450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:31,776-Speed 6312.84 samples/sec Loss 3.8815 LearningRate 0.0001 Epoch: 26 Global Step: 553460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:35,027-Speed 6302.69 samples/sec Loss 3.9349 LearningRate 0.0001 Epoch: 26 Global Step: 553470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:38,269-Speed 6318.80 samples/sec Loss 3.8866 LearningRate 0.0001 Epoch: 26 Global Step: 553480 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:41,514-Speed 6312.25 samples/sec Loss 3.8950 LearningRate 0.0001 Epoch: 26 Global Step: 553490 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:44,744-Speed 6341.93 samples/sec Loss 3.8331 LearningRate 0.0001 Epoch: 26 Global Step: 553500 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:47,991-Speed 6309.23 samples/sec Loss 3.8344 LearningRate 0.0001 Epoch: 26 Global Step: 553510 Fp16 Grad 
Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:51,240-Speed 6304.06 samples/sec Loss 3.9126 LearningRate 0.0001 Epoch: 26 Global Step: 553520 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:54,543-Speed 6201.92 samples/sec Loss 3.8450 LearningRate 0.0001 Epoch: 26 Global Step: 553530 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:04:57,786-Speed 6316.42 samples/sec Loss 3.8303 LearningRate 0.0001 Epoch: 26 Global Step: 553540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:01,037-Speed 6301.28 samples/sec Loss 3.8692 LearningRate 0.0001 Epoch: 26 Global Step: 553550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:04,282-Speed 6312.00 samples/sec Loss 3.8737 LearningRate 0.0001 Epoch: 26 Global Step: 553560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:07,532-Speed 6303.59 samples/sec Loss 3.9096 LearningRate 0.0001 Epoch: 26 Global Step: 553570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:10,778-Speed 6309.39 samples/sec Loss 3.8868 LearningRate 0.0001 Epoch: 26 Global Step: 553580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:14,031-Speed 6298.20 samples/sec Loss 3.9908 LearningRate 0.0001 Epoch: 26 Global Step: 553590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:17,278-Speed 6309.32 samples/sec Loss 3.8346 LearningRate 0.0001 Epoch: 26 Global Step: 553600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-04-02 18:05:20,509-Speed 6338.69 samples/sec Loss 3.8395 LearningRate 0.0001 Epoch: 26 Global Step: 553610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:23,757-Speed 6306.60 samples/sec Loss 3.8910 LearningRate 0.0001 Epoch: 26 Global Step: 553620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:27,003-Speed 6311.49 samples/sec Loss 3.9179 LearningRate 0.0001 Epoch: 26 Global Step: 553630 Fp16 Grad Scale: 16384 Required: 25 hours 
Training: 2022-04-02 18:05:30,247-Speed 6314.92 samples/sec Loss 3.9100 LearningRate 0.0001 Epoch: 26 Global Step: 553640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:33,494-Speed 6307.26 samples/sec Loss 3.8977 LearningRate 0.0001 Epoch: 26 Global Step: 553650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:36,741-Speed 6310.00 samples/sec Loss 3.8650 LearningRate 0.0001 Epoch: 26 Global Step: 553660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:39,989-Speed 6308.04 samples/sec Loss 3.9190 LearningRate 0.0001 Epoch: 26 Global Step: 553670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:43,233-Speed 6314.57 samples/sec Loss 3.8226 LearningRate 0.0001 Epoch: 26 Global Step: 553680 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:46,479-Speed 6310.07 samples/sec Loss 3.8224 LearningRate 0.0001 Epoch: 26 Global Step: 553690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:49,728-Speed 6304.37 samples/sec Loss 3.8470 LearningRate 0.0001 Epoch: 26 Global Step: 553700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:52,957-Speed 6344.12 samples/sec Loss 3.9149 LearningRate 0.0001 Epoch: 26 Global Step: 553710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:56,205-Speed 6306.33 samples/sec Loss 3.8707 LearningRate 0.0001 Epoch: 26 Global Step: 553720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:05:59,449-Speed 6315.34 samples/sec Loss 3.8653 LearningRate 0.0001 Epoch: 26 Global Step: 553730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:06:02,695-Speed 6310.82 samples/sec Loss 3.9461 LearningRate 0.0001 Epoch: 26 Global Step: 553740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:06:05,939-Speed 6314.78 samples/sec Loss 3.8769 LearningRate 0.0001 Epoch: 26 Global Step: 553750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 
18:06:09,185-Speed 6311.06 samples/sec Loss 3.8399 LearningRate 0.0001 Epoch: 26 Global Step: 553760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:06:12,419-Speed 6334.11 samples/sec Loss 3.8324 LearningRate 0.0001 Epoch: 26 Global Step: 553770 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:15,665-Speed 6310.78 samples/sec Loss 3.8909 LearningRate 0.0001 Epoch: 26 Global Step: 553780 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:18,911-Speed 6310.49 samples/sec Loss 3.8035 LearningRate 0.0001 Epoch: 26 Global Step: 553790 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:22,155-Speed 6314.63 samples/sec Loss 3.8437 LearningRate 0.0001 Epoch: 26 Global Step: 553800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:25,402-Speed 6309.14 samples/sec Loss 3.8734 LearningRate 0.0001 Epoch: 26 Global Step: 553810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:28,648-Speed 6310.68 samples/sec Loss 3.8856 LearningRate 0.0001 Epoch: 26 Global Step: 553820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:31,902-Speed 6294.05 samples/sec Loss 3.8771 LearningRate 0.0001 Epoch: 26 Global Step: 553830 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:35,161-Speed 6285.57 samples/sec Loss 3.8573 LearningRate 0.0001 Epoch: 26 Global Step: 553840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:38,405-Speed 6315.55 samples/sec Loss 3.8892 LearningRate 0.0001 Epoch: 26 Global Step: 553850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:41,649-Speed 6313.86 samples/sec Loss 3.9536 LearningRate 0.0001 Epoch: 26 Global Step: 553860 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:44,899-Speed 6303.28 samples/sec Loss 3.9182 LearningRate 0.0001 Epoch: 26 Global Step: 553870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:06:48,129-Speed 6343.77 samples/sec 
Loss 3.8239 LearningRate 0.0001 Epoch: 26 Global Step: 553880 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:51,376-Speed 6307.05 samples/sec Loss 3.9053 LearningRate 0.0001 Epoch: 26 Global Step: 553890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:54,620-Speed 6315.43 samples/sec Loss 3.9482 LearningRate 0.0001 Epoch: 26 Global Step: 553900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:06:57,867-Speed 6309.82 samples/sec Loss 3.9265 LearningRate 0.0001 Epoch: 26 Global Step: 553910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:01,113-Speed 6310.77 samples/sec Loss 3.8099 LearningRate 0.0001 Epoch: 26 Global Step: 553920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:04,367-Speed 6294.88 samples/sec Loss 3.9894 LearningRate 0.0001 Epoch: 26 Global Step: 553930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:07,607-Speed 6320.83 samples/sec Loss 3.8849 LearningRate 0.0001 Epoch: 26 Global Step: 553940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:10,851-Speed 6315.30 samples/sec Loss 3.9251 LearningRate 0.0001 Epoch: 26 Global Step: 553950 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:14,098-Speed 6309.88 samples/sec Loss 3.8816 LearningRate 0.0001 Epoch: 26 Global Step: 553960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:17,343-Speed 6312.28 samples/sec Loss 3.8679 LearningRate 0.0001 Epoch: 26 Global Step: 553970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:20,574-Speed 6339.91 samples/sec Loss 3.8450 LearningRate 0.0001 Epoch: 26 Global Step: 553980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:23,835-Speed 6280.79 samples/sec Loss 3.8987 LearningRate 0.0001 Epoch: 26 Global Step: 553990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:27,083-Speed 6307.41 samples/sec Loss 3.8476 LearningRate 0.0001 Epoch: 26 
Global Step: 554000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:30,332-Speed 6305.30 samples/sec Loss 3.9073 LearningRate 0.0001 Epoch: 26 Global Step: 554010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:33,576-Speed 6313.59 samples/sec Loss 3.8787 LearningRate 0.0001 Epoch: 26 Global Step: 554020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:36,822-Speed 6311.05 samples/sec Loss 3.8470 LearningRate 0.0001 Epoch: 26 Global Step: 554030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:40,066-Speed 6315.05 samples/sec Loss 3.8366 LearningRate 0.0001 Epoch: 26 Global Step: 554040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:43,312-Speed 6309.82 samples/sec Loss 3.9253 LearningRate 0.0001 Epoch: 26 Global Step: 554050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:46,557-Speed 6313.14 samples/sec Loss 3.8854 LearningRate 0.0001 Epoch: 26 Global Step: 554060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:49,804-Speed 6307.92 samples/sec Loss 3.8284 LearningRate 0.0001 Epoch: 26 Global Step: 554070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:07:53,051-Speed 6311.09 samples/sec Loss 3.8683 LearningRate 0.0001 Epoch: 26 Global Step: 554080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:07:56,291-Speed 6321.27 samples/sec Loss 3.8801 LearningRate 0.0001 Epoch: 26 Global Step: 554090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:07:59,536-Speed 6313.20 samples/sec Loss 3.9229 LearningRate 0.0001 Epoch: 26 Global Step: 554100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:02,783-Speed 6309.30 samples/sec Loss 3.8710 LearningRate 0.0001 Epoch: 26 Global Step: 554110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:06,027-Speed 6313.43 samples/sec Loss 3.8820 LearningRate 0.0001 Epoch: 26 Global Step: 554120 Fp16 Grad Scale: 
16384 Required: 25 hours Training: 2022-04-02 18:08:09,279-Speed 6299.30 samples/sec Loss 3.9447 LearningRate 0.0001 Epoch: 26 Global Step: 554130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:12,526-Speed 6309.49 samples/sec Loss 3.8914 LearningRate 0.0001 Epoch: 26 Global Step: 554140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:15,776-Speed 6302.57 samples/sec Loss 3.8708 LearningRate 0.0001 Epoch: 26 Global Step: 554150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:19,023-Speed 6309.64 samples/sec Loss 3.8900 LearningRate 0.0001 Epoch: 26 Global Step: 554160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:22,267-Speed 6313.66 samples/sec Loss 3.9016 LearningRate 0.0001 Epoch: 26 Global Step: 554170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:25,497-Speed 6341.65 samples/sec Loss 3.8470 LearningRate 0.0001 Epoch: 26 Global Step: 554180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:28,750-Speed 6296.90 samples/sec Loss 3.8316 LearningRate 0.0001 Epoch: 26 Global Step: 554190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:31,997-Speed 6309.32 samples/sec Loss 3.9470 LearningRate 0.0001 Epoch: 26 Global Step: 554200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:35,239-Speed 6317.54 samples/sec Loss 3.8260 LearningRate 0.0001 Epoch: 26 Global Step: 554210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:08:38,470-Speed 6340.52 samples/sec Loss 3.8697 LearningRate 0.0001 Epoch: 26 Global Step: 554220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:08:41,716-Speed 6310.12 samples/sec Loss 3.8973 LearningRate 0.0001 Epoch: 26 Global Step: 554230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:08:44,963-Speed 6310.09 samples/sec Loss 3.9026 LearningRate 0.0001 Epoch: 26 Global Step: 554240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 
2022-04-02 18:08:48,205-Speed 6317.05 samples/sec Loss 3.9348 LearningRate 0.0001 Epoch: 26 Global Step: 554250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:08:51,449-Speed 6315.45 samples/sec Loss 3.8858 LearningRate 0.0001 Epoch: 26 Global Step: 554260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:08:54,696-Speed 6307.98 samples/sec Loss 3.9019 LearningRate 0.0001 Epoch: 26 Global Step: 554270 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:08:57,938-Speed 6319.41 samples/sec Loss 3.8464 LearningRate 0.0001 Epoch: 26 Global Step: 554280 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:01,185-Speed 6308.84 samples/sec Loss 3.9080 LearningRate 0.0001 Epoch: 26 Global Step: 554290 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:04,432-Speed 6310.04 samples/sec Loss 3.9247 LearningRate 0.0001 Epoch: 26 Global Step: 554300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:07,675-Speed 6316.54 samples/sec Loss 3.8997 LearningRate 0.0001 Epoch: 26 Global Step: 554310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:10,921-Speed 6309.50 samples/sec Loss 3.8855 LearningRate 0.0001 Epoch: 26 Global Step: 554320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:09:14,168-Speed 6310.49 samples/sec Loss 3.8962 LearningRate 0.0001 Epoch: 26 Global Step: 554330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:09:17,412-Speed 6314.30 samples/sec Loss 3.9164 LearningRate 0.0001 Epoch: 26 Global Step: 554340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:09:20,642-Speed 6341.24 samples/sec Loss 3.8732 LearningRate 0.0001 Epoch: 26 Global Step: 554350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:23,888-Speed 6310.89 samples/sec Loss 3.8648 LearningRate 0.0001 Epoch: 26 Global Step: 554360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:27,131-Speed 6316.07 
samples/sec Loss 3.9052 LearningRate 0.0001 Epoch: 26 Global Step: 554370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:30,372-Speed 6320.33 samples/sec Loss 3.8268 LearningRate 0.0001 Epoch: 26 Global Step: 554380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:33,614-Speed 6319.00 samples/sec Loss 3.8592 LearningRate 0.0001 Epoch: 26 Global Step: 554390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:36,856-Speed 6318.48 samples/sec Loss 3.8738 LearningRate 0.0001 Epoch: 26 Global Step: 554400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:40,098-Speed 6318.42 samples/sec Loss 3.8854 LearningRate 0.0001 Epoch: 26 Global Step: 554410 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:43,351-Speed 6296.73 samples/sec Loss 3.8724 LearningRate 0.0001 Epoch: 26 Global Step: 554420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:46,590-Speed 6324.84 samples/sec Loss 3.8368 LearningRate 0.0001 Epoch: 26 Global Step: 554430 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:49,838-Speed 6307.19 samples/sec Loss 3.8799 LearningRate 0.0001 Epoch: 26 Global Step: 554440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:09:53,087-Speed 6303.77 samples/sec Loss 3.9438 LearningRate 0.0001 Epoch: 26 Global Step: 554450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:09:56,333-Speed 6312.09 samples/sec Loss 3.8136 LearningRate 0.0001 Epoch: 26 Global Step: 554460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:09:59,560-Speed 6347.68 samples/sec Loss 3.8202 LearningRate 0.0001 Epoch: 26 Global Step: 554470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:02,806-Speed 6309.74 samples/sec Loss 3.8626 LearningRate 0.0001 Epoch: 26 Global Step: 554480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:06,050-Speed 6315.03 samples/sec Loss 3.8890 LearningRate 
0.0001 Epoch: 26 Global Step: 554490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:09,290-Speed 6321.85 samples/sec Loss 3.8990 LearningRate 0.0001 Epoch: 26 Global Step: 554500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:12,537-Speed 6309.35 samples/sec Loss 3.8817 LearningRate 0.0001 Epoch: 26 Global Step: 554510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:15,787-Speed 6303.40 samples/sec Loss 3.8553 LearningRate 0.0001 Epoch: 26 Global Step: 554520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:19,037-Speed 6302.10 samples/sec Loss 3.8558 LearningRate 0.0001 Epoch: 26 Global Step: 554530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:22,281-Speed 6315.57 samples/sec Loss 3.8544 LearningRate 0.0001 Epoch: 26 Global Step: 554540 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:25,524-Speed 6316.91 samples/sec Loss 3.8759 LearningRate 0.0001 Epoch: 26 Global Step: 554550 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:28,765-Speed 6320.52 samples/sec Loss 3.8708 LearningRate 0.0001 Epoch: 26 Global Step: 554560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:10:32,007-Speed 6318.99 samples/sec Loss 3.9476 LearningRate 0.0001 Epoch: 26 Global Step: 554570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:35,255-Speed 6306.27 samples/sec Loss 3.8081 LearningRate 0.0001 Epoch: 26 Global Step: 554580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:38,512-Speed 6290.02 samples/sec Loss 3.8549 LearningRate 0.0001 Epoch: 26 Global Step: 554590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:41,761-Speed 6303.81 samples/sec Loss 3.9387 LearningRate 0.0001 Epoch: 26 Global Step: 554600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:45,008-Speed 6308.76 samples/sec Loss 3.8793 LearningRate 0.0001 Epoch: 26 Global Step: 554610 
Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:48,249-Speed 6319.80 samples/sec Loss 3.8678 LearningRate 0.0001 Epoch: 26 Global Step: 554620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:51,497-Speed 6308.72 samples/sec Loss 3.8099 LearningRate 0.0001 Epoch: 26 Global Step: 554630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:54,743-Speed 6309.27 samples/sec Loss 3.8900 LearningRate 0.0001 Epoch: 26 Global Step: 554640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:10:57,976-Speed 6336.59 samples/sec Loss 3.9108 LearningRate 0.0001 Epoch: 26 Global Step: 554650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:01,223-Speed 6308.12 samples/sec Loss 3.8680 LearningRate 0.0001 Epoch: 26 Global Step: 554660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:04,467-Speed 6315.45 samples/sec Loss 3.9462 LearningRate 0.0001 Epoch: 26 Global Step: 554670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:07,716-Speed 6303.91 samples/sec Loss 3.8924 LearningRate 0.0001 Epoch: 26 Global Step: 554680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:10,958-Speed 6317.95 samples/sec Loss 3.9009 LearningRate 0.0001 Epoch: 26 Global Step: 554690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:14,205-Speed 6309.33 samples/sec Loss 3.9359 LearningRate 0.0001 Epoch: 26 Global Step: 554700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:17,450-Speed 6312.54 samples/sec Loss 3.8480 LearningRate 0.0001 Epoch: 26 Global Step: 554710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:20,696-Speed 6310.70 samples/sec Loss 3.8750 LearningRate 0.0001 Epoch: 26 Global Step: 554720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:23,939-Speed 6317.71 samples/sec Loss 3.9116 LearningRate 0.0001 Epoch: 26 Global Step: 554730 Fp16 Grad Scale: 8192 Required: 25 
hours Training: 2022-04-02 18:11:27,217-Speed 6248.24 samples/sec Loss 3.9322 LearningRate 0.0001 Epoch: 26 Global Step: 554740 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:30,511-Speed 6220.46 samples/sec Loss 3.8577 LearningRate 0.0001 Epoch: 26 Global Step: 554750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:11:33,743-Speed 6337.93 samples/sec Loss 3.8620 LearningRate 0.0001 Epoch: 26 Global Step: 554760 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:36,989-Speed 6311.33 samples/sec Loss 3.8906 LearningRate 0.0001 Epoch: 26 Global Step: 554770 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:40,233-Speed 6313.90 samples/sec Loss 3.9121 LearningRate 0.0001 Epoch: 26 Global Step: 554780 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:43,477-Speed 6314.96 samples/sec Loss 3.8799 LearningRate 0.0001 Epoch: 26 Global Step: 554790 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:46,726-Speed 6304.05 samples/sec Loss 3.8671 LearningRate 0.0001 Epoch: 26 Global Step: 554800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:49,970-Speed 6315.46 samples/sec Loss 3.8553 LearningRate 0.0001 Epoch: 26 Global Step: 554810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:53,215-Speed 6311.98 samples/sec Loss 3.9188 LearningRate 0.0001 Epoch: 26 Global Step: 554820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:56,461-Speed 6310.74 samples/sec Loss 3.8944 LearningRate 0.0001 Epoch: 26 Global Step: 554830 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:11:59,705-Speed 6313.99 samples/sec Loss 3.8772 LearningRate 0.0001 Epoch: 26 Global Step: 554840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:02,948-Speed 6316.38 samples/sec Loss 3.8191 LearningRate 0.0001 Epoch: 26 Global Step: 554850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:12:06,207-Speed 6286.99 samples/sec Loss 3.9018 LearningRate 0.0001 Epoch: 26 Global Step: 554860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:09,457-Speed 6301.84 samples/sec Loss 3.9373 LearningRate 0.0001 Epoch: 26 Global Step: 554870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:12,701-Speed 6314.82 samples/sec Loss 3.8641 LearningRate 0.0001 Epoch: 26 Global Step: 554880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:15,944-Speed 6317.23 samples/sec Loss 3.9347 LearningRate 0.0001 Epoch: 26 Global Step: 554890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:19,191-Speed 6308.06 samples/sec Loss 3.8764 LearningRate 0.0001 Epoch: 26 Global Step: 554900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:22,447-Speed 6291.75 samples/sec Loss 3.7981 LearningRate 0.0001 Epoch: 26 Global Step: 554910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:25,697-Speed 6302.47 samples/sec Loss 3.8411 LearningRate 0.0001 Epoch: 26 Global Step: 554920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:28,941-Speed 6313.83 samples/sec Loss 3.9449 LearningRate 0.0001 Epoch: 26 Global Step: 554930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:32,189-Speed 6307.96 samples/sec Loss 3.8533 LearningRate 0.0001 Epoch: 26 Global Step: 554940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:35,440-Speed 6301.84 samples/sec Loss 3.9238 LearningRate 0.0001 Epoch: 26 Global Step: 554950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:38,672-Speed 6338.25 samples/sec Loss 3.8480 LearningRate 0.0001 Epoch: 26 Global Step: 554960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:12:41,904-Speed 6337.04 samples/sec Loss 3.8303 LearningRate 0.0001 Epoch: 26 Global Step: 554970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:45,149-Speed 6313.03 
samples/sec Loss 3.8399 LearningRate 0.0001 Epoch: 26 Global Step: 554980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:48,401-Speed 6298.96 samples/sec Loss 3.9307 LearningRate 0.0001 Epoch: 26 Global Step: 554990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:51,643-Speed 6319.69 samples/sec Loss 3.8706 LearningRate 0.0001 Epoch: 26 Global Step: 555000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:54,885-Speed 6317.40 samples/sec Loss 3.9202 LearningRate 0.0001 Epoch: 26 Global Step: 555010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:12:58,132-Speed 6307.98 samples/sec Loss 3.8794 LearningRate 0.0001 Epoch: 26 Global Step: 555020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:01,378-Speed 6310.80 samples/sec Loss 3.8845 LearningRate 0.0001 Epoch: 26 Global Step: 555030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:04,628-Speed 6303.77 samples/sec Loss 3.8096 LearningRate 0.0001 Epoch: 26 Global Step: 555040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:07,872-Speed 6314.69 samples/sec Loss 3.9220 LearningRate 0.0001 Epoch: 26 Global Step: 555050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:11,116-Speed 6315.11 samples/sec Loss 3.8948 LearningRate 0.0001 Epoch: 26 Global Step: 555060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:14,362-Speed 6310.25 samples/sec Loss 3.9078 LearningRate 0.0001 Epoch: 26 Global Step: 555070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:17,614-Speed 6299.26 samples/sec Loss 3.8442 LearningRate 0.0001 Epoch: 26 Global Step: 555080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:20,868-Speed 6295.19 samples/sec Loss 3.9193 LearningRate 0.0001 Epoch: 26 Global Step: 555090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:24,121-Speed 6296.29 samples/sec Loss 3.8224 LearningRate 
0.0001 Epoch: 26 Global Step: 555100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:27,369-Speed 6306.92 samples/sec Loss 3.9219 LearningRate 0.0001 Epoch: 26 Global Step: 555110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:30,614-Speed 6313.54 samples/sec Loss 3.8496 LearningRate 0.0001 Epoch: 26 Global Step: 555120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:33,874-Speed 6283.93 samples/sec Loss 3.8248 LearningRate 0.0001 Epoch: 26 Global Step: 555130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:37,137-Speed 6276.71 samples/sec Loss 3.8734 LearningRate 0.0001 Epoch: 26 Global Step: 555140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:40,389-Speed 6300.69 samples/sec Loss 3.9302 LearningRate 0.0001 Epoch: 26 Global Step: 555150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:13:43,619-Speed 6340.46 samples/sec Loss 3.8992 LearningRate 0.0001 Epoch: 26 Global Step: 555160 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:46,865-Speed 6311.23 samples/sec Loss 3.8552 LearningRate 0.0001 Epoch: 26 Global Step: 555170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:50,112-Speed 6310.18 samples/sec Loss 3.9155 LearningRate 0.0001 Epoch: 26 Global Step: 555180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:53,357-Speed 6311.73 samples/sec Loss 3.9085 LearningRate 0.0001 Epoch: 26 Global Step: 555190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:56,604-Speed 6309.21 samples/sec Loss 3.8626 LearningRate 0.0001 Epoch: 26 Global Step: 555200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:13:59,854-Speed 6303.08 samples/sec Loss 3.8228 LearningRate 0.0001 Epoch: 26 Global Step: 555210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:03,101-Speed 6307.90 samples/sec Loss 3.8822 LearningRate 0.0001 Epoch: 26 Global Step: 555220 
Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:06,352-Speed 6301.83 samples/sec Loss 3.9079 LearningRate 0.0001 Epoch: 26 Global Step: 555230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:09,598-Speed 6310.34 samples/sec Loss 3.8288 LearningRate 0.0001 Epoch: 26 Global Step: 555240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:12,843-Speed 6312.20 samples/sec Loss 3.9652 LearningRate 0.0001 Epoch: 26 Global Step: 555250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:16,090-Speed 6308.31 samples/sec Loss 3.8484 LearningRate 0.0001 Epoch: 26 Global Step: 555260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:19,339-Speed 6305.70 samples/sec Loss 3.8971 LearningRate 0.0001 Epoch: 26 Global Step: 555270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:22,583-Speed 6313.83 samples/sec Loss 3.8946 LearningRate 0.0001 Epoch: 26 Global Step: 555280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:25,834-Speed 6302.74 samples/sec Loss 3.8159 LearningRate 0.0001 Epoch: 26 Global Step: 555290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:29,080-Speed 6309.39 samples/sec Loss 3.9066 LearningRate 0.0001 Epoch: 26 Global Step: 555300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:32,329-Speed 6304.29 samples/sec Loss 3.9142 LearningRate 0.0001 Epoch: 26 Global Step: 555310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:35,574-Speed 6313.91 samples/sec Loss 3.8681 LearningRate 0.0001 Epoch: 26 Global Step: 555320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:14:38,804-Speed 6340.53 samples/sec Loss 3.8951 LearningRate 0.0001 Epoch: 26 Global Step: 555330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:42,061-Speed 6290.43 samples/sec Loss 3.8512 LearningRate 0.0001 Epoch: 26 Global Step: 555340 Fp16 Grad Scale: 8192 Required: 25 
hours Training: 2022-04-02 18:14:45,308-Speed 6310.21 samples/sec Loss 3.9219 LearningRate 0.0001 Epoch: 26 Global Step: 555350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:48,558-Speed 6301.87 samples/sec Loss 3.9531 LearningRate 0.0001 Epoch: 26 Global Step: 555360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:51,847-Speed 6229.61 samples/sec Loss 3.9127 LearningRate 0.0001 Epoch: 26 Global Step: 555370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:55,096-Speed 6303.93 samples/sec Loss 3.8844 LearningRate 0.0001 Epoch: 26 Global Step: 555380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:14:58,340-Speed 6314.90 samples/sec Loss 3.8358 LearningRate 0.0001 Epoch: 26 Global Step: 555390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:01,587-Speed 6309.94 samples/sec Loss 3.8872 LearningRate 0.0001 Epoch: 26 Global Step: 555400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:04,832-Speed 6311.60 samples/sec Loss 3.9379 LearningRate 0.0001 Epoch: 26 Global Step: 555410 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:08,080-Speed 6308.15 samples/sec Loss 3.8522 LearningRate 0.0001 Epoch: 26 Global Step: 555420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:11,327-Speed 6308.58 samples/sec Loss 3.8204 LearningRate 0.0001 Epoch: 26 Global Step: 555430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:15:14,559-Speed 6337.84 samples/sec Loss 3.8599 LearningRate 0.0001 Epoch: 26 Global Step: 555440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:17,813-Speed 6295.46 samples/sec Loss 3.8747 LearningRate 0.0001 Epoch: 26 Global Step: 555450 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:21,062-Speed 6304.93 samples/sec Loss 3.8635 LearningRate 0.0001 Epoch: 26 Global Step: 555460 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:15:24,313-Speed 6300.49 samples/sec Loss 3.9438 LearningRate 0.0001 Epoch: 26 Global Step: 555470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:27,560-Speed 6308.71 samples/sec Loss 3.8977 LearningRate 0.0001 Epoch: 26 Global Step: 555480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:30,806-Speed 6309.92 samples/sec Loss 3.8549 LearningRate 0.0001 Epoch: 26 Global Step: 555490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:34,054-Speed 6306.80 samples/sec Loss 3.8211 LearningRate 0.0001 Epoch: 26 Global Step: 555500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:37,304-Speed 6302.92 samples/sec Loss 3.8920 LearningRate 0.0001 Epoch: 26 Global Step: 555510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:40,549-Speed 6312.84 samples/sec Loss 3.9393 LearningRate 0.0001 Epoch: 26 Global Step: 555520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:43,798-Speed 6303.86 samples/sec Loss 3.8589 LearningRate 0.0001 Epoch: 26 Global Step: 555530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:15:47,048-Speed 6303.73 samples/sec Loss 3.8440 LearningRate 0.0001 Epoch: 26 Global Step: 555540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:15:50,297-Speed 6305.30 samples/sec Loss 3.8314 LearningRate 0.0001 Epoch: 26 Global Step: 555550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:15:53,544-Speed 6308.46 samples/sec Loss 3.8859 LearningRate 0.0001 Epoch: 26 Global Step: 555560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:15:56,790-Speed 6311.42 samples/sec Loss 3.8728 LearningRate 0.0001 Epoch: 26 Global Step: 555570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:00,038-Speed 6308.08 samples/sec Loss 3.8800 LearningRate 0.0001 Epoch: 26 Global Step: 555580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:03,285-Speed 6308.69 samples/sec 
Loss 3.8909 LearningRate 0.0001 Epoch: 26 Global Step: 555590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:06,530-Speed 6311.62 samples/sec Loss 3.8395 LearningRate 0.0001 Epoch: 26 Global Step: 555600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:09,777-Speed 6309.56 samples/sec Loss 3.8902 LearningRate 0.0001 Epoch: 26 Global Step: 555610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:13,021-Speed 6314.99 samples/sec Loss 3.8212 LearningRate 0.0001 Epoch: 26 Global Step: 555620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:16:16,254-Speed 6335.84 samples/sec Loss 3.8147 LearningRate 0.0001 Epoch: 26 Global Step: 555630 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:19,502-Speed 6305.80 samples/sec Loss 3.8167 LearningRate 0.0001 Epoch: 26 Global Step: 555640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:22,749-Speed 6308.71 samples/sec Loss 3.8971 LearningRate 0.0001 Epoch: 26 Global Step: 555650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:25,998-Speed 6304.78 samples/sec Loss 3.9108 LearningRate 0.0001 Epoch: 26 Global Step: 555660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:29,245-Speed 6309.73 samples/sec Loss 3.8738 LearningRate 0.0001 Epoch: 26 Global Step: 555670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:32,488-Speed 6315.91 samples/sec Loss 3.9003 LearningRate 0.0001 Epoch: 26 Global Step: 555680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:35,732-Speed 6313.87 samples/sec Loss 3.9409 LearningRate 0.0001 Epoch: 26 Global Step: 555690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:38,978-Speed 6312.33 samples/sec Loss 3.8896 LearningRate 0.0001 Epoch: 26 Global Step: 555700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:42,223-Speed 6312.40 samples/sec Loss 3.9124 LearningRate 0.0001 Epoch: 
26 Global Step: 555710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:45,472-Speed 6304.20 samples/sec Loss 3.8878 LearningRate 0.0001 Epoch: 26 Global Step: 555720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:48,701-Speed 6343.26 samples/sec Loss 3.8603 LearningRate 0.0001 Epoch: 26 Global Step: 555730 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:51,944-Speed 6317.20 samples/sec Loss 3.8522 LearningRate 0.0001 Epoch: 26 Global Step: 555740 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:55,185-Speed 6321.06 samples/sec Loss 3.9071 LearningRate 0.0001 Epoch: 26 Global Step: 555750 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:16:58,433-Speed 6305.67 samples/sec Loss 3.8750 LearningRate 0.0001 Epoch: 26 Global Step: 555760 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:01,682-Speed 6305.90 samples/sec Loss 3.8695 LearningRate 0.0001 Epoch: 26 Global Step: 555770 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:04,923-Speed 6321.05 samples/sec Loss 3.8464 LearningRate 0.0001 Epoch: 26 Global Step: 555780 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:08,168-Speed 6311.06 samples/sec Loss 3.8843 LearningRate 0.0001 Epoch: 26 Global Step: 555790 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:11,419-Speed 6302.98 samples/sec Loss 3.8923 LearningRate 0.0001 Epoch: 26 Global Step: 555800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:14,662-Speed 6316.12 samples/sec Loss 3.8756 LearningRate 0.0001 Epoch: 26 Global Step: 555810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:17,905-Speed 6315.78 samples/sec Loss 3.8960 LearningRate 0.0001 Epoch: 26 Global Step: 555820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:17:21,150-Speed 6313.25 samples/sec Loss 3.8572 LearningRate 0.0001 Epoch: 26 Global Step: 555830 Fp16 Grad Scale: 
16384 Required: 25 hours Training: 2022-04-02 18:17:24,400-Speed 6303.28 samples/sec Loss 3.8657 LearningRate 0.0001 Epoch: 26 Global Step: 555840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:27,642-Speed 6318.83 samples/sec Loss 3.8488 LearningRate 0.0001 Epoch: 26 Global Step: 555850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:30,885-Speed 6315.04 samples/sec Loss 3.8043 LearningRate 0.0001 Epoch: 26 Global Step: 555860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:34,130-Speed 6314.10 samples/sec Loss 3.8973 LearningRate 0.0001 Epoch: 26 Global Step: 555870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:37,373-Speed 6315.13 samples/sec Loss 3.8402 LearningRate 0.0001 Epoch: 26 Global Step: 555880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:40,621-Speed 6308.15 samples/sec Loss 3.9282 LearningRate 0.0001 Epoch: 26 Global Step: 555890 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:43,860-Speed 6323.07 samples/sec Loss 3.8919 LearningRate 0.0001 Epoch: 26 Global Step: 555900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:47,109-Speed 6304.69 samples/sec Loss 3.9080 LearningRate 0.0001 Epoch: 26 Global Step: 555910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:50,353-Speed 6315.60 samples/sec Loss 3.8864 LearningRate 0.0001 Epoch: 26 Global Step: 555920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:53,583-Speed 6340.85 samples/sec Loss 3.8491 LearningRate 0.0001 Epoch: 26 Global Step: 555930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:17:56,829-Speed 6311.57 samples/sec Loss 3.8687 LearningRate 0.0001 Epoch: 26 Global Step: 555940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:00,077-Speed 6307.82 samples/sec Loss 3.9393 LearningRate 0.0001 Epoch: 26 Global Step: 555950 Fp16 Grad Scale: 16384 Required: 25 hours 
Training: 2022-04-02 18:18:03,322-Speed 6312.49 samples/sec Loss 3.8399 LearningRate 0.0001 Epoch: 26 Global Step: 555960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:06,566-Speed 6313.88 samples/sec Loss 3.8375 LearningRate 0.0001 Epoch: 26 Global Step: 555970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:09,813-Speed 6309.10 samples/sec Loss 3.8261 LearningRate 0.0001 Epoch: 26 Global Step: 555980 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:13,057-Speed 6314.84 samples/sec Loss 3.8157 LearningRate 0.0001 Epoch: 26 Global Step: 555990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:16,287-Speed 6341.39 samples/sec Loss 3.8733 LearningRate 0.0001 Epoch: 26 Global Step: 556000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:19,533-Speed 6311.17 samples/sec Loss 3.8441 LearningRate 0.0001 Epoch: 26 Global Step: 556010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:22,777-Speed 6315.55 samples/sec Loss 3.8841 LearningRate 0.0001 Epoch: 26 Global Step: 556020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:26,026-Speed 6305.10 samples/sec Loss 3.9178 LearningRate 0.0001 Epoch: 26 Global Step: 556030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:29,271-Speed 6311.40 samples/sec Loss 3.8523 LearningRate 0.0001 Epoch: 26 Global Step: 556040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:32,518-Speed 6310.15 samples/sec Loss 3.9067 LearningRate 0.0001 Epoch: 26 Global Step: 556050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:35,762-Speed 6314.52 samples/sec Loss 3.8831 LearningRate 0.0001 Epoch: 26 Global Step: 556060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:39,009-Speed 6308.61 samples/sec Loss 3.8517 LearningRate 0.0001 Epoch: 26 Global Step: 556070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:42,252-Speed 
6316.19 samples/sec Loss 3.7996 LearningRate 0.0001 Epoch: 26 Global Step: 556080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:45,493-Speed 6319.55 samples/sec Loss 3.9013 LearningRate 0.0001 Epoch: 26 Global Step: 556090 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:18:48,738-Speed 6312.86 samples/sec Loss 3.8866 LearningRate 0.0001 Epoch: 26 Global Step: 556100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:51,981-Speed 6317.54 samples/sec Loss 3.8372 LearningRate 0.0001 Epoch: 26 Global Step: 556110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:55,230-Speed 6304.84 samples/sec Loss 3.7970 LearningRate 0.0001 Epoch: 26 Global Step: 556120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:18:58,477-Speed 6309.24 samples/sec Loss 3.8088 LearningRate 0.0001 Epoch: 26 Global Step: 556130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:01,726-Speed 6304.52 samples/sec Loss 3.7939 LearningRate 0.0001 Epoch: 26 Global Step: 556140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:04,972-Speed 6310.49 samples/sec Loss 3.8635 LearningRate 0.0001 Epoch: 26 Global Step: 556150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:08,221-Speed 6304.27 samples/sec Loss 3.8427 LearningRate 0.0001 Epoch: 26 Global Step: 556160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:11,468-Speed 6309.22 samples/sec Loss 3.8785 LearningRate 0.0001 Epoch: 26 Global Step: 556170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:14,720-Speed 6298.16 samples/sec Loss 3.8278 LearningRate 0.0001 Epoch: 26 Global Step: 556180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:17,962-Speed 6318.66 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 26 Global Step: 556190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:21,199-Speed 6330.00 samples/sec Loss 3.8823 
LearningRate 0.0001 Epoch: 26 Global Step: 556200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:24,445-Speed 6311.18 samples/sec Loss 3.8730 LearningRate 0.0001 Epoch: 26 Global Step: 556210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:27,688-Speed 6316.31 samples/sec Loss 3.8545 LearningRate 0.0001 Epoch: 26 Global Step: 556220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:30,936-Speed 6307.25 samples/sec Loss 3.8665 LearningRate 0.0001 Epoch: 26 Global Step: 556230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:19:34,169-Speed 6335.28 samples/sec Loss 3.9283 LearningRate 0.0001 Epoch: 26 Global Step: 556240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:37,414-Speed 6312.72 samples/sec Loss 3.8727 LearningRate 0.0001 Epoch: 26 Global Step: 556250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:40,664-Speed 6303.33 samples/sec Loss 3.8587 LearningRate 0.0001 Epoch: 26 Global Step: 556260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:43,917-Speed 6297.67 samples/sec Loss 3.8530 LearningRate 0.0001 Epoch: 26 Global Step: 556270 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:47,161-Speed 6313.87 samples/sec Loss 3.8232 LearningRate 0.0001 Epoch: 26 Global Step: 556280 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:50,405-Speed 6314.43 samples/sec Loss 3.8289 LearningRate 0.0001 Epoch: 26 Global Step: 556290 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:53,650-Speed 6312.22 samples/sec Loss 3.8735 LearningRate 0.0001 Epoch: 26 Global Step: 556300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:19:56,894-Speed 6316.02 samples/sec Loss 3.9255 LearningRate 0.0001 Epoch: 26 Global Step: 556310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:00,138-Speed 6314.34 samples/sec Loss 3.8330 LearningRate 0.0001 Epoch: 26 Global 
Step: 556320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:03,385-Speed 6307.60 samples/sec Loss 3.9375 LearningRate 0.0001 Epoch: 26 Global Step: 556330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:06,616-Speed 6339.35 samples/sec Loss 3.9395 LearningRate 0.0001 Epoch: 26 Global Step: 556340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:09,873-Speed 6290.62 samples/sec Loss 3.8756 LearningRate 0.0001 Epoch: 26 Global Step: 556350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:13,211-Speed 6136.36 samples/sec Loss 3.7916 LearningRate 0.0001 Epoch: 26 Global Step: 556360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:16,494-Speed 6239.64 samples/sec Loss 3.9052 LearningRate 0.0001 Epoch: 26 Global Step: 556370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:19,737-Speed 6316.20 samples/sec Loss 3.8690 LearningRate 0.0001 Epoch: 26 Global Step: 556380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:22,987-Speed 6303.53 samples/sec Loss 3.8274 LearningRate 0.0001 Epoch: 26 Global Step: 556390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:26,237-Speed 6304.00 samples/sec Loss 3.8936 LearningRate 0.0001 Epoch: 26 Global Step: 556400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:29,480-Speed 6315.88 samples/sec Loss 3.8598 LearningRate 0.0001 Epoch: 26 Global Step: 556410 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:32,725-Speed 6313.74 samples/sec Loss 3.9632 LearningRate 0.0001 Epoch: 26 Global Step: 556420 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:35,972-Speed 6308.25 samples/sec Loss 3.8778 LearningRate 0.0001 Epoch: 26 Global Step: 556430 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:39,217-Speed 6312.22 samples/sec Loss 3.8742 LearningRate 0.0001 Epoch: 26 Global Step: 556440 Fp16 Grad Scale: 16384 
Required: 25 hours Training: 2022-04-02 18:20:42,449-Speed 6338.20 samples/sec Loss 3.9175 LearningRate 0.0001 Epoch: 26 Global Step: 556450 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:45,695-Speed 6310.10 samples/sec Loss 3.8228 LearningRate 0.0001 Epoch: 26 Global Step: 556460 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:48,944-Speed 6305.32 samples/sec Loss 3.8872 LearningRate 0.0001 Epoch: 26 Global Step: 556470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:52,189-Speed 6311.99 samples/sec Loss 3.8373 LearningRate 0.0001 Epoch: 26 Global Step: 556480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:55,433-Speed 6315.33 samples/sec Loss 3.8892 LearningRate 0.0001 Epoch: 26 Global Step: 556490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:20:58,687-Speed 6294.81 samples/sec Loss 3.8669 LearningRate 0.0001 Epoch: 26 Global Step: 556500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:01,942-Speed 6294.66 samples/sec Loss 3.8570 LearningRate 0.0001 Epoch: 26 Global Step: 556510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:05,214-Speed 6260.37 samples/sec Loss 3.8241 LearningRate 0.0001 Epoch: 26 Global Step: 556520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:08,460-Speed 6309.07 samples/sec Loss 3.8866 LearningRate 0.0001 Epoch: 26 Global Step: 556530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:11,705-Speed 6313.90 samples/sec Loss 3.8328 LearningRate 0.0001 Epoch: 26 Global Step: 556540 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:14,951-Speed 6310.03 samples/sec Loss 3.9319 LearningRate 0.0001 Epoch: 26 Global Step: 556550 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:21:18,187-Speed 6330.91 samples/sec Loss 3.8609 LearningRate 0.0001 Epoch: 26 Global Step: 556560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:21:21,431-Speed 6313.90 samples/sec Loss 3.7959 LearningRate 0.0001 Epoch: 26 Global Step: 556570 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:24,682-Speed 6301.82 samples/sec Loss 3.8701 LearningRate 0.0001 Epoch: 26 Global Step: 556580 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:27,925-Speed 6316.76 samples/sec Loss 3.8741 LearningRate 0.0001 Epoch: 26 Global Step: 556590 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:31,180-Speed 6292.97 samples/sec Loss 3.8014 LearningRate 0.0001 Epoch: 26 Global Step: 556600 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:34,429-Speed 6304.68 samples/sec Loss 3.8226 LearningRate 0.0001 Epoch: 26 Global Step: 556610 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:37,678-Speed 6305.94 samples/sec Loss 3.7883 LearningRate 0.0001 Epoch: 26 Global Step: 556620 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:40,924-Speed 6311.04 samples/sec Loss 3.8263 LearningRate 0.0001 Epoch: 26 Global Step: 556630 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:44,174-Speed 6301.38 samples/sec Loss 3.8935 LearningRate 0.0001 Epoch: 26 Global Step: 556640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:47,419-Speed 6313.74 samples/sec Loss 3.8516 LearningRate 0.0001 Epoch: 26 Global Step: 556650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:21:50,669-Speed 6303.19 samples/sec Loss 3.8309 LearningRate 0.0001 Epoch: 26 Global Step: 556660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:21:53,914-Speed 6311.68 samples/sec Loss 3.8624 LearningRate 0.0001 Epoch: 26 Global Step: 556670 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:21:57,171-Speed 6290.14 samples/sec Loss 3.8796 LearningRate 0.0001 Epoch: 26 Global Step: 556680 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:00,422-Speed 6301.19 samples/sec 
Loss 3.8168 LearningRate 0.0001 Epoch: 26 Global Step: 556690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:03,668-Speed 6309.28 samples/sec Loss 3.8687 LearningRate 0.0001 Epoch: 26 Global Step: 556700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:06,916-Speed 6306.63 samples/sec Loss 3.8756 LearningRate 0.0001 Epoch: 26 Global Step: 556710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:10,167-Speed 6302.99 samples/sec Loss 3.8160 LearningRate 0.0001 Epoch: 26 Global Step: 556720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:13,412-Speed 6312.22 samples/sec Loss 3.9272 LearningRate 0.0001 Epoch: 26 Global Step: 556730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:16,673-Speed 6280.15 samples/sec Loss 3.8332 LearningRate 0.0001 Epoch: 26 Global Step: 556740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:19,941-Speed 6268.87 samples/sec Loss 3.9356 LearningRate 0.0001 Epoch: 26 Global Step: 556750 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:23,187-Speed 6310.43 samples/sec Loss 3.9001 LearningRate 0.0001 Epoch: 26 Global Step: 556760 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:26,435-Speed 6308.53 samples/sec Loss 3.9175 LearningRate 0.0001 Epoch: 26 Global Step: 556770 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:29,679-Speed 6313.84 samples/sec Loss 3.8273 LearningRate 0.0001 Epoch: 26 Global Step: 556780 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:32,927-Speed 6306.24 samples/sec Loss 3.7910 LearningRate 0.0001 Epoch: 26 Global Step: 556790 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:36,175-Speed 6306.50 samples/sec Loss 3.8365 LearningRate 0.0001 Epoch: 26 Global Step: 556800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:39,422-Speed 6310.23 samples/sec Loss 3.8839 LearningRate 0.0001 
Epoch: 26 Global Step: 556810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:42,667-Speed 6312.25 samples/sec Loss 3.8671 LearningRate 0.0001 Epoch: 26 Global Step: 556820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:45,914-Speed 6309.10 samples/sec Loss 3.8328 LearningRate 0.0001 Epoch: 26 Global Step: 556830 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:49,161-Speed 6309.84 samples/sec Loss 3.8270 LearningRate 0.0001 Epoch: 26 Global Step: 556840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:22:52,407-Speed 6309.32 samples/sec Loss 3.8368 LearningRate 0.0001 Epoch: 26 Global Step: 556850 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:55,652-Speed 6313.43 samples/sec Loss 3.8705 LearningRate 0.0001 Epoch: 26 Global Step: 556860 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:22:58,900-Speed 6306.90 samples/sec Loss 3.8769 LearningRate 0.0001 Epoch: 26 Global Step: 556870 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:02,149-Speed 6305.24 samples/sec Loss 3.8564 LearningRate 0.0001 Epoch: 26 Global Step: 556880 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:05,404-Speed 6291.87 samples/sec Loss 3.9023 LearningRate 0.0001 Epoch: 26 Global Step: 556890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:08,648-Speed 6315.55 samples/sec Loss 3.8587 LearningRate 0.0001 Epoch: 26 Global Step: 556900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:11,899-Speed 6300.96 samples/sec Loss 3.8270 LearningRate 0.0001 Epoch: 26 Global Step: 556910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:15,145-Speed 6310.06 samples/sec Loss 3.8218 LearningRate 0.0001 Epoch: 26 Global Step: 556920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:18,393-Speed 6306.80 samples/sec Loss 3.8362 LearningRate 0.0001 Epoch: 26 Global Step: 556930 Fp16 Grad 
Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:21,641-Speed 6307.65 samples/sec Loss 3.8533 LearningRate 0.0001 Epoch: 26 Global Step: 556940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:24,919-Speed 6249.07 samples/sec Loss 3.8361 LearningRate 0.0001 Epoch: 26 Global Step: 556950 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:28,165-Speed 6310.02 samples/sec Loss 3.8062 LearningRate 0.0001 Epoch: 26 Global Step: 556960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:31,410-Speed 6312.46 samples/sec Loss 3.8780 LearningRate 0.0001 Epoch: 26 Global Step: 556970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:34,687-Speed 6252.13 samples/sec Loss 3.8679 LearningRate 0.0001 Epoch: 26 Global Step: 556980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:37,937-Speed 6302.42 samples/sec Loss 3.8995 LearningRate 0.0001 Epoch: 26 Global Step: 556990 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:41,186-Speed 6304.79 samples/sec Loss 3.7297 LearningRate 0.0001 Epoch: 26 Global Step: 557000 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:44,431-Speed 6313.45 samples/sec Loss 3.9037 LearningRate 0.0001 Epoch: 26 Global Step: 557010 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:47,677-Speed 6309.51 samples/sec Loss 3.8461 LearningRate 0.0001 Epoch: 26 Global Step: 557020 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:50,922-Speed 6313.70 samples/sec Loss 3.8875 LearningRate 0.0001 Epoch: 26 Global Step: 557030 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:23:54,154-Speed 6337.55 samples/sec Loss 3.8025 LearningRate 0.0001 Epoch: 26 Global Step: 557040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:23:57,400-Speed 6311.11 samples/sec Loss 3.8892 LearningRate 0.0001 Epoch: 26 Global Step: 557050 Fp16 Grad Scale: 8192 Required: 25 hours 
Training: 2022-04-02 18:24:00,647-Speed 6308.54 samples/sec Loss 3.8443 LearningRate 0.0001 Epoch: 26 Global Step: 557060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:03,894-Speed 6309.75 samples/sec Loss 3.9185 LearningRate 0.0001 Epoch: 26 Global Step: 557070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:07,142-Speed 6307.75 samples/sec Loss 3.8086 LearningRate 0.0001 Epoch: 26 Global Step: 557080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:10,391-Speed 6304.41 samples/sec Loss 3.8188 LearningRate 0.0001 Epoch: 26 Global Step: 557090 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:13,637-Speed 6310.47 samples/sec Loss 3.8549 LearningRate 0.0001 Epoch: 26 Global Step: 557100 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:16,880-Speed 6315.36 samples/sec Loss 3.8696 LearningRate 0.0001 Epoch: 26 Global Step: 557110 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:20,127-Speed 6309.33 samples/sec Loss 3.8350 LearningRate 0.0001 Epoch: 26 Global Step: 557120 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:23,372-Speed 6312.41 samples/sec Loss 3.8169 LearningRate 0.0001 Epoch: 26 Global Step: 557130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:26,620-Speed 6306.55 samples/sec Loss 3.8606 LearningRate 0.0001 Epoch: 26 Global Step: 557140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:24:29,873-Speed 6298.40 samples/sec Loss 3.8666 LearningRate 0.0001 Epoch: 26 Global Step: 557150 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:24:33,117-Speed 6314.35 samples/sec Loss 3.7975 LearningRate 0.0001 Epoch: 26 Global Step: 557160 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:24:36,361-Speed 6315.43 samples/sec Loss 3.8759 LearningRate 0.0001 Epoch: 26 Global Step: 557170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:24:39,591-Speed 
6341.42 samples/sec Loss 3.8491 LearningRate 0.0001 Epoch: 26 Global Step: 557180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:42,835-Speed 6314.17 samples/sec Loss 3.8613 LearningRate 0.0001 Epoch: 26 Global Step: 557190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:46,085-Speed 6303.01 samples/sec Loss 3.8290 LearningRate 0.0001 Epoch: 26 Global Step: 557200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:49,333-Speed 6305.95 samples/sec Loss 3.8804 LearningRate 0.0001 Epoch: 26 Global Step: 557210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:52,580-Speed 6309.60 samples/sec Loss 3.8596 LearningRate 0.0001 Epoch: 26 Global Step: 557220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:55,824-Speed 6313.92 samples/sec Loss 3.8028 LearningRate 0.0001 Epoch: 26 Global Step: 557230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:24:59,071-Speed 6309.32 samples/sec Loss 3.8510 LearningRate 0.0001 Epoch: 26 Global Step: 557240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:25:02,314-Speed 6317.26 samples/sec Loss 3.8669 LearningRate 0.0001 Epoch: 26 Global Step: 557250 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:25:05,563-Speed 6305.00 samples/sec Loss 3.7722 LearningRate 0.0001 Epoch: 26 Global Step: 557260 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:25:08,809-Speed 6311.44 samples/sec Loss 3.9285 LearningRate 0.0001 Epoch: 26 Global Step: 557270 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:25:12,054-Speed 6312.83 samples/sec Loss 3.8419 LearningRate 0.0001 Epoch: 26 Global Step: 557280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:15,297-Speed 6316.26 samples/sec Loss 3.8832 LearningRate 0.0001 Epoch: 26 Global Step: 557290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:18,543-Speed 6310.64 samples/sec Loss 3.8227 
LearningRate 0.0001 Epoch: 26 Global Step: 557300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:21,788-Speed 6312.51 samples/sec Loss 3.8338 LearningRate 0.0001 Epoch: 26 Global Step: 557310 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:25,124-Speed 6140.81 samples/sec Loss 3.8617 LearningRate 0.0001 Epoch: 26 Global Step: 557320 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:28,388-Speed 6276.00 samples/sec Loss 3.8548 LearningRate 0.0001 Epoch: 26 Global Step: 557330 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:31,633-Speed 6311.98 samples/sec Loss 3.8694 LearningRate 0.0001 Epoch: 26 Global Step: 557340 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:34,883-Speed 6303.00 samples/sec Loss 3.8597 LearningRate 0.0001 Epoch: 26 Global Step: 557350 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:38,128-Speed 6312.50 samples/sec Loss 3.8175 LearningRate 0.0001 Epoch: 26 Global Step: 557360 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:41,377-Speed 6304.34 samples/sec Loss 3.8299 LearningRate 0.0001 Epoch: 26 Global Step: 557370 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:44,607-Speed 6342.45 samples/sec Loss 3.8463 LearningRate 0.0001 Epoch: 26 Global Step: 557380 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:47,848-Speed 6319.63 samples/sec Loss 3.8552 LearningRate 0.0001 Epoch: 26 Global Step: 557390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:51,094-Speed 6311.13 samples/sec Loss 3.8607 LearningRate 0.0001 Epoch: 26 Global Step: 557400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:54,340-Speed 6309.93 samples/sec Loss 3.8919 LearningRate 0.0001 Epoch: 26 Global Step: 557410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:25:57,587-Speed 6310.63 samples/sec Loss 3.8088 LearningRate 0.0001 Epoch: 26 
Global Step: 557420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:00,841-Speed 6293.24 samples/sec Loss 3.8194 LearningRate 0.0001 Epoch: 26 Global Step: 557430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:04,083-Speed 6319.65 samples/sec Loss 3.8696 LearningRate 0.0001 Epoch: 26 Global Step: 557440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:07,325-Speed 6318.22 samples/sec Loss 3.9201 LearningRate 0.0001 Epoch: 26 Global Step: 557450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:10,571-Speed 6310.85 samples/sec Loss 3.8722 LearningRate 0.0001 Epoch: 26 Global Step: 557460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:13,807-Speed 6330.93 samples/sec Loss 3.8685 LearningRate 0.0001 Epoch: 26 Global Step: 557470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:17,051-Speed 6315.19 samples/sec Loss 3.8241 LearningRate 0.0001 Epoch: 26 Global Step: 557480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:20,290-Speed 6323.74 samples/sec Loss 3.8293 LearningRate 0.0001 Epoch: 26 Global Step: 557490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:23,534-Speed 6313.94 samples/sec Loss 3.8767 LearningRate 0.0001 Epoch: 26 Global Step: 557500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:26,779-Speed 6313.97 samples/sec Loss 3.8454 LearningRate 0.0001 Epoch: 26 Global Step: 557510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:30,023-Speed 6315.23 samples/sec Loss 3.8141 LearningRate 0.0001 Epoch: 26 Global Step: 557520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:33,270-Speed 6308.18 samples/sec Loss 3.8870 LearningRate 0.0001 Epoch: 26 Global Step: 557530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:36,510-Speed 6321.70 samples/sec Loss 3.8357 LearningRate 0.0001 Epoch: 26 Global Step: 557540 Fp16 Grad Scale: 
8192 Required: 25 hours Training: 2022-04-02 18:26:39,759-Speed 6304.83 samples/sec Loss 3.8482 LearningRate 0.0001 Epoch: 26 Global Step: 557550 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:43,004-Speed 6312.59 samples/sec Loss 3.8275 LearningRate 0.0001 Epoch: 26 Global Step: 557560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:46,246-Speed 6319.76 samples/sec Loss 3.7838 LearningRate 0.0001 Epoch: 26 Global Step: 557570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:49,489-Speed 6315.05 samples/sec Loss 3.9062 LearningRate 0.0001 Epoch: 26 Global Step: 557580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:26:52,721-Speed 6338.07 samples/sec Loss 3.8465 LearningRate 0.0001 Epoch: 26 Global Step: 557590 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:55,967-Speed 6311.39 samples/sec Loss 3.9399 LearningRate 0.0001 Epoch: 26 Global Step: 557600 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:26:59,211-Speed 6315.03 samples/sec Loss 3.8505 LearningRate 0.0001 Epoch: 26 Global Step: 557610 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:02,465-Speed 6296.08 samples/sec Loss 3.9288 LearningRate 0.0001 Epoch: 26 Global Step: 557620 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:05,710-Speed 6312.48 samples/sec Loss 3.8391 LearningRate 0.0001 Epoch: 26 Global Step: 557630 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:08,954-Speed 6313.44 samples/sec Loss 3.8611 LearningRate 0.0001 Epoch: 26 Global Step: 557640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:12,199-Speed 6313.03 samples/sec Loss 3.8177 LearningRate 0.0001 Epoch: 26 Global Step: 557650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:15,443-Speed 6314.96 samples/sec Loss 3.8561 LearningRate 0.0001 Epoch: 26 Global Step: 557660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 
2022-04-02 18:27:18,688-Speed 6311.37 samples/sec Loss 3.8809 LearningRate 0.0001 Epoch: 26 Global Step: 557670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:21,935-Speed 6310.12 samples/sec Loss 3.9172 LearningRate 0.0001 Epoch: 26 Global Step: 557680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:27:25,183-Speed 6307.12 samples/sec Loss 3.8272 LearningRate 0.0001 Epoch: 26 Global Step: 557690 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:28,431-Speed 6306.72 samples/sec Loss 3.8822 LearningRate 0.0001 Epoch: 26 Global Step: 557700 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:31,677-Speed 6310.17 samples/sec Loss 3.8658 LearningRate 0.0001 Epoch: 26 Global Step: 557710 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:34,921-Speed 6315.61 samples/sec Loss 3.8951 LearningRate 0.0001 Epoch: 26 Global Step: 557720 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:38,166-Speed 6313.08 samples/sec Loss 3.8716 LearningRate 0.0001 Epoch: 26 Global Step: 557730 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:41,433-Speed 6270.17 samples/sec Loss 3.8698 LearningRate 0.0001 Epoch: 26 Global Step: 557740 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:44,690-Speed 6288.25 samples/sec Loss 3.8496 LearningRate 0.0001 Epoch: 26 Global Step: 557750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:47,936-Speed 6311.20 samples/sec Loss 3.8504 LearningRate 0.0001 Epoch: 26 Global Step: 557760 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:51,179-Speed 6317.17 samples/sec Loss 3.8654 LearningRate 0.0001 Epoch: 26 Global Step: 557770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:54,435-Speed 6290.15 samples/sec Loss 3.8768 LearningRate 0.0001 Epoch: 26 Global Step: 557780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:27:57,670-Speed 
6332.88 samples/sec Loss 3.8752 LearningRate 0.0001 Epoch: 26 Global Step: 557790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:00,919-Speed 6304.37 samples/sec Loss 3.8457 LearningRate 0.0001 Epoch: 26 Global Step: 557800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:04,168-Speed 6305.64 samples/sec Loss 3.7976 LearningRate 0.0001 Epoch: 26 Global Step: 557810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:07,417-Speed 6305.09 samples/sec Loss 3.7962 LearningRate 0.0001 Epoch: 26 Global Step: 557820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:10,674-Speed 6287.65 samples/sec Loss 3.7808 LearningRate 0.0001 Epoch: 26 Global Step: 557830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:13,923-Speed 6306.37 samples/sec Loss 3.8609 LearningRate 0.0001 Epoch: 26 Global Step: 557840 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:17,159-Speed 6329.99 samples/sec Loss 3.8550 LearningRate 0.0001 Epoch: 26 Global Step: 557850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:20,403-Speed 6314.31 samples/sec Loss 3.7749 LearningRate 0.0001 Epoch: 26 Global Step: 557860 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:23,648-Speed 6311.40 samples/sec Loss 3.8199 LearningRate 0.0001 Epoch: 26 Global Step: 557870 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:26,896-Speed 6309.16 samples/sec Loss 3.8436 LearningRate 0.0001 Epoch: 26 Global Step: 557880 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:30,138-Speed 6317.01 samples/sec Loss 3.8624 LearningRate 0.0001 Epoch: 26 Global Step: 557890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:33,383-Speed 6313.34 samples/sec Loss 3.8471 LearningRate 0.0001 Epoch: 26 Global Step: 557900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:36,635-Speed 6299.98 samples/sec Loss 3.8739 
LearningRate 0.0001 Epoch: 26 Global Step: 557910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:39,880-Speed 6311.79 samples/sec Loss 3.8237 LearningRate 0.0001 Epoch: 26 Global Step: 557920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:43,123-Speed 6316.02 samples/sec Loss 3.8126 LearningRate 0.0001 Epoch: 26 Global Step: 557930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:46,372-Speed 6306.71 samples/sec Loss 3.8496 LearningRate 0.0001 Epoch: 26 Global Step: 557940 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:28:49,620-Speed 6306.54 samples/sec Loss 3.8601 LearningRate 0.0001 Epoch: 26 Global Step: 557950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:52,865-Speed 6311.99 samples/sec Loss 3.9571 LearningRate 0.0001 Epoch: 26 Global Step: 557960 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:56,106-Speed 6319.66 samples/sec Loss 3.8595 LearningRate 0.0001 Epoch: 26 Global Step: 557970 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:28:59,340-Speed 6335.81 samples/sec Loss 3.8813 LearningRate 0.0001 Epoch: 26 Global Step: 557980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:02,589-Speed 6305.07 samples/sec Loss 3.7915 LearningRate 0.0001 Epoch: 26 Global Step: 557990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:05,836-Speed 6308.82 samples/sec Loss 3.8822 LearningRate 0.0001 Epoch: 26 Global Step: 558000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:09,085-Speed 6303.94 samples/sec Loss 3.8722 LearningRate 0.0001 Epoch: 26 Global Step: 558010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:12,329-Speed 6315.10 samples/sec Loss 3.8734 LearningRate 0.0001 Epoch: 26 Global Step: 558020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:15,577-Speed 6307.53 samples/sec Loss 3.8652 LearningRate 0.0001 Epoch: 26 Global 
Step: 558030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:18,828-Speed 6299.90 samples/sec Loss 3.8526 LearningRate 0.0001 Epoch: 26 Global Step: 558040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:22,086-Speed 6287.01 samples/sec Loss 3.8404 LearningRate 0.0001 Epoch: 26 Global Step: 558050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:25,332-Speed 6310.42 samples/sec Loss 3.8054 LearningRate 0.0001 Epoch: 26 Global Step: 558060 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:28,582-Speed 6304.76 samples/sec Loss 3.8698 LearningRate 0.0001 Epoch: 26 Global Step: 558070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:31,825-Speed 6315.95 samples/sec Loss 3.7730 LearningRate 0.0001 Epoch: 26 Global Step: 558080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:29:35,072-Speed 6309.12 samples/sec Loss 3.8504 LearningRate 0.0001 Epoch: 26 Global Step: 558090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:29:38,304-Speed 6338.49 samples/sec Loss 3.8732 LearningRate 0.0001 Epoch: 26 Global Step: 558100 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:41,549-Speed 6313.16 samples/sec Loss 3.8498 LearningRate 0.0001 Epoch: 26 Global Step: 558110 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:44,796-Speed 6307.28 samples/sec Loss 3.7999 LearningRate 0.0001 Epoch: 26 Global Step: 558120 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:48,040-Speed 6315.87 samples/sec Loss 3.8458 LearningRate 0.0001 Epoch: 26 Global Step: 558130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:51,283-Speed 6316.37 samples/sec Loss 3.8646 LearningRate 0.0001 Epoch: 26 Global Step: 558140 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:29:54,526-Speed 6315.43 samples/sec Loss 3.8547 LearningRate 0.0001 Epoch: 26 Global Step: 558150 Fp16 Grad Scale: 8192 
Required: 25 hours Training: 2022-04-02 18:29:57,773-Speed 6310.38 samples/sec Loss 3.8484 LearningRate 0.0001 Epoch: 26 Global Step: 558160 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:01,017-Speed 6312.93 samples/sec Loss 3.8883 LearningRate 0.0001 Epoch: 26 Global Step: 558170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:04,263-Speed 6311.89 samples/sec Loss 3.8204 LearningRate 0.0001 Epoch: 26 Global Step: 558180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:07,506-Speed 6316.06 samples/sec Loss 3.8730 LearningRate 0.0001 Epoch: 26 Global Step: 558190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:10,751-Speed 6312.12 samples/sec Loss 3.8432 LearningRate 0.0001 Epoch: 26 Global Step: 558200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:13,998-Speed 6309.75 samples/sec Loss 3.8219 LearningRate 0.0001 Epoch: 26 Global Step: 558210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:17,242-Speed 6315.20 samples/sec Loss 3.8401 LearningRate 0.0001 Epoch: 26 Global Step: 558220 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:20,490-Speed 6306.47 samples/sec Loss 3.8580 LearningRate 0.0001 Epoch: 26 Global Step: 558230 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:23,735-Speed 6312.06 samples/sec Loss 3.8304 LearningRate 0.0001 Epoch: 26 Global Step: 558240 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:26,988-Speed 6298.19 samples/sec Loss 3.8748 LearningRate 0.0001 Epoch: 26 Global Step: 558250 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:30,235-Speed 6307.80 samples/sec Loss 3.8815 LearningRate 0.0001 Epoch: 26 Global Step: 558260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:30:33,477-Speed 6318.47 samples/sec Loss 3.8242 LearningRate 0.0001 Epoch: 26 Global Step: 558270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 
2022-04-02 18:30:36,712-Speed 6333.80 samples/sec Loss 3.8662 LearningRate 0.0001 Epoch: 26 Global Step: 558280 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:39,958-Speed 6309.93 samples/sec Loss 3.8772 LearningRate 0.0001 Epoch: 26 Global Step: 558290 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:43,204-Speed 6311.62 samples/sec Loss 3.8592 LearningRate 0.0001 Epoch: 26 Global Step: 558300 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:46,463-Speed 6285.49 samples/sec Loss 3.8209 LearningRate 0.0001 Epoch: 26 Global Step: 558310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:49,705-Speed 6317.92 samples/sec Loss 3.8619 LearningRate 0.0001 Epoch: 26 Global Step: 558320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:52,947-Speed 6318.80 samples/sec Loss 3.8347 LearningRate 0.0001 Epoch: 26 Global Step: 558330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:56,191-Speed 6314.96 samples/sec Loss 3.9043 LearningRate 0.0001 Epoch: 26 Global Step: 558340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:30:59,438-Speed 6308.88 samples/sec Loss 3.8143 LearningRate 0.0001 Epoch: 26 Global Step: 558350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:02,681-Speed 6316.46 samples/sec Loss 3.8905 LearningRate 0.0001 Epoch: 26 Global Step: 558360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:05,924-Speed 6315.65 samples/sec Loss 3.8179 LearningRate 0.0001 Epoch: 26 Global Step: 558370 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:09,167-Speed 6316.33 samples/sec Loss 3.8858 LearningRate 0.0001 Epoch: 26 Global Step: 558380 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:12,410-Speed 6316.89 samples/sec Loss 3.8046 LearningRate 0.0001 Epoch: 26 Global Step: 558390 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:15,655-Speed 6313.58 
samples/sec Loss 3.8411 LearningRate 0.0001 Epoch: 26 Global Step: 558400 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:18,900-Speed 6312.78 samples/sec Loss 3.8858 LearningRate 0.0001 Epoch: 26 Global Step: 558410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:22,144-Speed 6315.41 samples/sec Loss 3.9112 LearningRate 0.0001 Epoch: 26 Global Step: 558420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:25,385-Speed 6320.83 samples/sec Loss 3.8721 LearningRate 0.0001 Epoch: 26 Global Step: 558430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:28,632-Speed 6307.68 samples/sec Loss 3.8648 LearningRate 0.0001 Epoch: 26 Global Step: 558440 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:31,879-Speed 6308.96 samples/sec Loss 3.8631 LearningRate 0.0001 Epoch: 26 Global Step: 558450 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:35,124-Speed 6311.90 samples/sec Loss 3.8759 LearningRate 0.0001 Epoch: 26 Global Step: 558460 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:38,370-Speed 6311.56 samples/sec Loss 3.8930 LearningRate 0.0001 Epoch: 26 Global Step: 558470 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:31:41,585-Speed 6372.46 samples/sec Loss 3.8320 LearningRate 0.0001 Epoch: 26 Global Step: 558480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:44,834-Speed 6303.61 samples/sec Loss 3.8678 LearningRate 0.0001 Epoch: 26 Global Step: 558490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:48,083-Speed 6305.89 samples/sec Loss 3.8248 LearningRate 0.0001 Epoch: 26 Global Step: 558500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:51,328-Speed 6313.16 samples/sec Loss 3.8697 LearningRate 0.0001 Epoch: 26 Global Step: 558510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:54,573-Speed 6312.03 samples/sec Loss 3.8825 
LearningRate 0.0001 Epoch: 26 Global Step: 558520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:31:57,819-Speed 6311.19 samples/sec Loss 3.8497 LearningRate 0.0001 Epoch: 26 Global Step: 558530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:01,068-Speed 6308.40 samples/sec Loss 3.8974 LearningRate 0.0001 Epoch: 26 Global Step: 558540 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:04,311-Speed 6315.98 samples/sec Loss 3.8916 LearningRate 0.0001 Epoch: 26 Global Step: 558550 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:07,557-Speed 6310.18 samples/sec Loss 3.8229 LearningRate 0.0001 Epoch: 26 Global Step: 558560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:10,802-Speed 6312.66 samples/sec Loss 3.8779 LearningRate 0.0001 Epoch: 26 Global Step: 558570 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:14,047-Speed 6312.95 samples/sec Loss 3.8529 LearningRate 0.0001 Epoch: 26 Global Step: 558580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:17,295-Speed 6307.26 samples/sec Loss 3.8437 LearningRate 0.0001 Epoch: 26 Global Step: 558590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:20,538-Speed 6315.76 samples/sec Loss 3.7944 LearningRate 0.0001 Epoch: 26 Global Step: 558600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:23,828-Speed 6226.66 samples/sec Loss 3.8765 LearningRate 0.0001 Epoch: 26 Global Step: 558610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:27,087-Speed 6284.50 samples/sec Loss 3.8922 LearningRate 0.0001 Epoch: 26 Global Step: 558620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:30,356-Speed 6267.68 samples/sec Loss 3.8305 LearningRate 0.0001 Epoch: 26 Global Step: 558630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:33,602-Speed 6310.87 samples/sec Loss 3.7856 LearningRate 0.0001 Epoch: 26 Global 
Step: 558640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:32:36,833-Speed 6338.70 samples/sec Loss 3.8539 LearningRate 0.0001 Epoch: 26 Global Step: 558650 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:40,090-Speed 6290.99 samples/sec Loss 3.8833 LearningRate 0.0001 Epoch: 26 Global Step: 558660 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:43,355-Speed 6273.46 samples/sec Loss 3.8620 LearningRate 0.0001 Epoch: 26 Global Step: 558670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:46,620-Speed 6273.76 samples/sec Loss 3.8050 LearningRate 0.0001 Epoch: 26 Global Step: 558680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:49,873-Speed 6296.21 samples/sec Loss 3.8627 LearningRate 0.0001 Epoch: 26 Global Step: 558690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:53,120-Speed 6308.89 samples/sec Loss 3.8417 LearningRate 0.0001 Epoch: 26 Global Step: 558700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:56,367-Speed 6309.27 samples/sec Loss 3.8775 LearningRate 0.0001 Epoch: 26 Global Step: 558710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:32:59,612-Speed 6311.89 samples/sec Loss 3.8291 LearningRate 0.0001 Epoch: 26 Global Step: 558720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:02,859-Speed 6310.40 samples/sec Loss 3.8668 LearningRate 0.0001 Epoch: 26 Global Step: 558730 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:06,103-Speed 6314.93 samples/sec Loss 3.8461 LearningRate 0.0001 Epoch: 26 Global Step: 558740 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:09,349-Speed 6310.48 samples/sec Loss 3.7974 LearningRate 0.0001 Epoch: 26 Global Step: 558750 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:33:12,594-Speed 6313.30 samples/sec Loss 3.8922 LearningRate 0.0001 Epoch: 26 Global Step: 558760 Fp16 Grad Scale: 16384 
Required: 25 hours Training: 2022-04-02 18:33:15,865-Speed 6262.66 samples/sec Loss 3.8455 LearningRate 0.0001 Epoch: 26 Global Step: 558770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:33:19,110-Speed 6311.79 samples/sec Loss 3.8153 LearningRate 0.0001 Epoch: 26 Global Step: 558780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:33:22,355-Speed 6313.11 samples/sec Loss 3.8540 LearningRate 0.0001 Epoch: 26 Global Step: 558790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:33:25,585-Speed 6341.59 samples/sec Loss 3.8618 LearningRate 0.0001 Epoch: 26 Global Step: 558800 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:28,827-Speed 6318.33 samples/sec Loss 3.8875 LearningRate 0.0001 Epoch: 26 Global Step: 558810 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:32,074-Speed 6309.67 samples/sec Loss 3.8725 LearningRate 0.0001 Epoch: 26 Global Step: 558820 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:35,319-Speed 6312.32 samples/sec Loss 3.8482 LearningRate 0.0001 Epoch: 26 Global Step: 558830 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:38,565-Speed 6310.39 samples/sec Loss 3.8068 LearningRate 0.0001 Epoch: 26 Global Step: 558840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:41,811-Speed 6311.14 samples/sec Loss 3.8934 LearningRate 0.0001 Epoch: 26 Global Step: 558850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:45,057-Speed 6309.14 samples/sec Loss 3.8621 LearningRate 0.0001 Epoch: 26 Global Step: 558860 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:48,303-Speed 6311.59 samples/sec Loss 3.9019 LearningRate 0.0001 Epoch: 26 Global Step: 558870 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:51,549-Speed 6310.33 samples/sec Loss 3.9035 LearningRate 0.0001 Epoch: 26 Global Step: 558880 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:33:54,793-Speed 6314.43 samples/sec Loss 3.8180 LearningRate 0.0001 Epoch: 26 Global Step: 558890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:33:58,038-Speed 6313.52 samples/sec Loss 3.8713 LearningRate 0.0001 Epoch: 26 Global Step: 558900 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:01,282-Speed 6314.73 samples/sec Loss 3.8768 LearningRate 0.0001 Epoch: 26 Global Step: 558910 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:04,533-Speed 6301.83 samples/sec Loss 3.8818 LearningRate 0.0001 Epoch: 26 Global Step: 558920 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:07,818-Speed 6235.10 samples/sec Loss 3.8856 LearningRate 0.0001 Epoch: 26 Global Step: 558930 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:11,063-Speed 6312.64 samples/sec Loss 3.8305 LearningRate 0.0001 Epoch: 26 Global Step: 558940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:14,325-Speed 6280.52 samples/sec Loss 3.8486 LearningRate 0.0001 Epoch: 26 Global Step: 558950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:17,559-Speed 6332.99 samples/sec Loss 3.8633 LearningRate 0.0001 Epoch: 26 Global Step: 558960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:20,809-Speed 6303.36 samples/sec Loss 3.9190 LearningRate 0.0001 Epoch: 26 Global Step: 558970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:24,052-Speed 6317.07 samples/sec Loss 3.8410 LearningRate 0.0001 Epoch: 26 Global Step: 558980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:27,298-Speed 6310.95 samples/sec Loss 3.8710 LearningRate 0.0001 Epoch: 26 Global Step: 558990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:30,542-Speed 6314.13 samples/sec Loss 3.8518 LearningRate 0.0001 Epoch: 26 Global Step: 559000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:33,785-Speed 6315.87 
samples/sec Loss 3.7955 LearningRate 0.0001 Epoch: 26 Global Step: 559010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:37,029-Speed 6315.21 samples/sec Loss 3.8748 LearningRate 0.0001 Epoch: 26 Global Step: 559020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:40,280-Speed 6301.10 samples/sec Loss 3.8689 LearningRate 0.0001 Epoch: 26 Global Step: 559030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:43,524-Speed 6314.02 samples/sec Loss 3.8624 LearningRate 0.0001 Epoch: 26 Global Step: 559040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:46,772-Speed 6306.46 samples/sec Loss 3.8424 LearningRate 0.0001 Epoch: 26 Global Step: 559050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:34:50,025-Speed 6296.79 samples/sec Loss 3.8882 LearningRate 0.0001 Epoch: 26 Global Step: 559060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:53,279-Speed 6296.99 samples/sec Loss 3.8305 LearningRate 0.0001 Epoch: 26 Global Step: 559070 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:56,524-Speed 6311.82 samples/sec Loss 3.8113 LearningRate 0.0001 Epoch: 26 Global Step: 559080 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:34:59,770-Speed 6310.97 samples/sec Loss 3.8681 LearningRate 0.0001 Epoch: 26 Global Step: 559090 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:03,014-Speed 6314.66 samples/sec Loss 3.8628 LearningRate 0.0001 Epoch: 26 Global Step: 559100 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:06,259-Speed 6311.63 samples/sec Loss 3.8485 LearningRate 0.0001 Epoch: 26 Global Step: 559110 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:09,504-Speed 6314.58 samples/sec Loss 3.8335 LearningRate 0.0001 Epoch: 26 Global Step: 559120 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:12,752-Speed 6306.86 samples/sec Loss 3.8825 LearningRate 
0.0001 Epoch: 26 Global Step: 559130 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:15,996-Speed 6314.63 samples/sec Loss 3.8015 LearningRate 0.0001 Epoch: 26 Global Step: 559140 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:19,228-Speed 6337.73 samples/sec Loss 3.8511 LearningRate 0.0001 Epoch: 26 Global Step: 559150 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:22,473-Speed 6313.62 samples/sec Loss 3.7976 LearningRate 0.0001 Epoch: 26 Global Step: 559160 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:25,718-Speed 6312.65 samples/sec Loss 3.7844 LearningRate 0.0001 Epoch: 26 Global Step: 559170 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:28,963-Speed 6311.94 samples/sec Loss 3.8244 LearningRate 0.0001 Epoch: 26 Global Step: 559180 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:32,209-Speed 6310.35 samples/sec Loss 3.8569 LearningRate 0.0001 Epoch: 26 Global Step: 559190 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:35,452-Speed 6317.61 samples/sec Loss 3.8954 LearningRate 0.0001 Epoch: 26 Global Step: 559200 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:38,698-Speed 6310.46 samples/sec Loss 3.8375 LearningRate 0.0001 Epoch: 26 Global Step: 559210 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:41,940-Speed 6317.56 samples/sec Loss 3.8130 LearningRate 0.0001 Epoch: 26 Global Step: 559220 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:45,185-Speed 6312.62 samples/sec Loss 3.8494 LearningRate 0.0001 Epoch: 26 Global Step: 559230 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:48,437-Speed 6299.73 samples/sec Loss 3.8219 LearningRate 0.0001 Epoch: 26 Global Step: 559240 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:35:51,676-Speed 6324.51 samples/sec Loss 3.7890 LearningRate 0.0001 Epoch: 26 Global Step: 559250 Fp16 
Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:54,920-Speed 6313.49 samples/sec Loss 3.8748 LearningRate 0.0001 Epoch: 26 Global Step: 559260 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:35:58,162-Speed 6319.00 samples/sec Loss 3.8838 LearningRate 0.0001 Epoch: 26 Global Step: 559270 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:01,408-Speed 6312.00 samples/sec Loss 3.8382 LearningRate 0.0001 Epoch: 26 Global Step: 559280 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:04,658-Speed 6302.74 samples/sec Loss 3.8306 LearningRate 0.0001 Epoch: 26 Global Step: 559290 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:07,901-Speed 6315.46 samples/sec Loss 3.9003 LearningRate 0.0001 Epoch: 26 Global Step: 559300 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:11,134-Speed 6337.37 samples/sec Loss 3.8536 LearningRate 0.0001 Epoch: 26 Global Step: 559310 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:14,380-Speed 6310.14 samples/sec Loss 3.7741 LearningRate 0.0001 Epoch: 26 Global Step: 559320 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:17,628-Speed 6307.49 samples/sec Loss 3.8348 LearningRate 0.0001 Epoch: 26 Global Step: 559330 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:20,871-Speed 6317.39 samples/sec Loss 3.8728 LearningRate 0.0001 Epoch: 26 Global Step: 559340 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:24,118-Speed 6307.53 samples/sec Loss 3.8871 LearningRate 0.0001 Epoch: 26 Global Step: 559350 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:27,363-Speed 6312.88 samples/sec Loss 3.8995 LearningRate 0.0001 Epoch: 26 Global Step: 559360 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:30,607-Speed 6315.43 samples/sec Loss 3.8697 LearningRate 0.0001 Epoch: 26 Global Step: 559370 Fp16 Grad Scale: 8192 Required: 25 hours 
Training: 2022-04-02 18:36:33,850-Speed 6316.07 samples/sec Loss 3.8582 LearningRate 0.0001 Epoch: 26 Global Step: 559380 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:37,100-Speed 6301.81 samples/sec Loss 3.8167 LearningRate 0.0001 Epoch: 26 Global Step: 559390 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:40,354-Speed 6296.91 samples/sec Loss 3.8655 LearningRate 0.0001 Epoch: 26 Global Step: 559400 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:43,597-Speed 6315.03 samples/sec Loss 3.8868 LearningRate 0.0001 Epoch: 26 Global Step: 559410 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:46,853-Speed 6291.73 samples/sec Loss 3.8653 LearningRate 0.0001 Epoch: 26 Global Step: 559420 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:50,099-Speed 6311.38 samples/sec Loss 3.8405 LearningRate 0.0001 Epoch: 26 Global Step: 559430 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:36:53,329-Speed 6340.76 samples/sec Loss 3.8005 LearningRate 0.0001 Epoch: 26 Global Step: 559440 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:56,573-Speed 6315.99 samples/sec Loss 3.8257 LearningRate 0.0001 Epoch: 26 Global Step: 559450 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:36:59,817-Speed 6312.81 samples/sec Loss 3.8558 LearningRate 0.0001 Epoch: 26 Global Step: 559460 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:03,070-Speed 6298.23 samples/sec Loss 3.8751 LearningRate 0.0001 Epoch: 26 Global Step: 559470 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:06,316-Speed 6309.63 samples/sec Loss 3.7541 LearningRate 0.0001 Epoch: 26 Global Step: 559480 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:09,562-Speed 6312.09 samples/sec Loss 3.8701 LearningRate 0.0001 Epoch: 26 Global Step: 559490 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:12,807-Speed 
6311.90 samples/sec Loss 3.8384 LearningRate 0.0001 Epoch: 26 Global Step: 559500 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:16,049-Speed 6318.11 samples/sec Loss 3.8573 LearningRate 0.0001 Epoch: 26 Global Step: 559510 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:19,295-Speed 6311.78 samples/sec Loss 3.8342 LearningRate 0.0001 Epoch: 26 Global Step: 559520 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:22,542-Speed 6308.43 samples/sec Loss 3.7981 LearningRate 0.0001 Epoch: 26 Global Step: 559530 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:25,791-Speed 6305.72 samples/sec Loss 3.8181 LearningRate 0.0001 Epoch: 26 Global Step: 559540 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:37:29,027-Speed 6330.27 samples/sec Loss 3.8684 LearningRate 0.0001 Epoch: 26 Global Step: 559550 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:32,271-Speed 6313.94 samples/sec Loss 3.8688 LearningRate 0.0001 Epoch: 26 Global Step: 559560 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:35,519-Speed 6308.50 samples/sec Loss 3.8275 LearningRate 0.0001 Epoch: 26 Global Step: 559570 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:38,764-Speed 6312.20 samples/sec Loss 3.8799 LearningRate 0.0001 Epoch: 26 Global Step: 559580 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:42,009-Speed 6311.46 samples/sec Loss 3.8412 LearningRate 0.0001 Epoch: 26 Global Step: 559590 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:45,254-Speed 6314.30 samples/sec Loss 3.8062 LearningRate 0.0001 Epoch: 26 Global Step: 559600 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:48,503-Speed 6304.15 samples/sec Loss 3.8743 LearningRate 0.0001 Epoch: 26 Global Step: 559610 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:51,750-Speed 6309.13 samples/sec Loss 3.8512 
LearningRate 0.0001 Epoch: 26 Global Step: 559620 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:54,997-Speed 6307.22 samples/sec Loss 3.8474 LearningRate 0.0001 Epoch: 26 Global Step: 559630 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:37:58,246-Speed 6305.48 samples/sec Loss 3.8122 LearningRate 0.0001 Epoch: 26 Global Step: 559640 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:01,493-Speed 6309.51 samples/sec Loss 3.8411 LearningRate 0.0001 Epoch: 26 Global Step: 559650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:04,737-Speed 6313.00 samples/sec Loss 3.8739 LearningRate 0.0001 Epoch: 26 Global Step: 559660 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:07,970-Speed 6337.47 samples/sec Loss 3.8161 LearningRate 0.0001 Epoch: 26 Global Step: 559670 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:11,213-Speed 6315.43 samples/sec Loss 3.8358 LearningRate 0.0001 Epoch: 26 Global Step: 559680 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:14,461-Speed 6308.27 samples/sec Loss 3.8704 LearningRate 0.0001 Epoch: 26 Global Step: 559690 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:17,715-Speed 6295.05 samples/sec Loss 3.8628 LearningRate 0.0001 Epoch: 26 Global Step: 559700 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:20,962-Speed 6309.05 samples/sec Loss 3.9000 LearningRate 0.0001 Epoch: 26 Global Step: 559710 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:24,241-Speed 6246.40 samples/sec Loss 3.8645 LearningRate 0.0001 Epoch: 26 Global Step: 559720 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:27,486-Speed 6313.09 samples/sec Loss 3.9122 LearningRate 0.0001 Epoch: 26 Global Step: 559730 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:30,737-Speed 6300.63 samples/sec Loss 3.7869 LearningRate 0.0001 Epoch: 26 Global 
Step: 559740 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:33,981-Speed 6314.62 samples/sec Loss 3.7888 LearningRate 0.0001 Epoch: 26 Global Step: 559750 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:37,227-Speed 6310.58 samples/sec Loss 3.8636 LearningRate 0.0001 Epoch: 26 Global Step: 559760 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:38:40,476-Speed 6305.32 samples/sec Loss 3.8081 LearningRate 0.0001 Epoch: 26 Global Step: 559770 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:43,720-Speed 6315.30 samples/sec Loss 3.8887 LearningRate 0.0001 Epoch: 26 Global Step: 559780 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:46,964-Speed 6313.57 samples/sec Loss 3.8954 LearningRate 0.0001 Epoch: 26 Global Step: 559790 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:50,210-Speed 6310.36 samples/sec Loss 3.8692 LearningRate 0.0001 Epoch: 26 Global Step: 559800 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:53,459-Speed 6306.51 samples/sec Loss 3.9343 LearningRate 0.0001 Epoch: 26 Global Step: 559810 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:56,706-Speed 6308.48 samples/sec Loss 3.8173 LearningRate 0.0001 Epoch: 26 Global Step: 559820 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:38:59,950-Speed 6313.96 samples/sec Loss 3.8836 LearningRate 0.0001 Epoch: 26 Global Step: 559830 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:39:03,191-Speed 6320.51 samples/sec Loss 3.8459 LearningRate 0.0001 Epoch: 26 Global Step: 559840 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:06,434-Speed 6317.82 samples/sec Loss 3.8315 LearningRate 0.0001 Epoch: 26 Global Step: 559850 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:09,680-Speed 6308.93 samples/sec Loss 3.8510 LearningRate 0.0001 Epoch: 26 Global Step: 559860 Fp16 Grad Scale: 8192 
Required: 25 hours Training: 2022-04-02 18:39:12,928-Speed 6308.12 samples/sec Loss 3.8913 LearningRate 0.0001 Epoch: 26 Global Step: 559870 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:16,171-Speed 6315.56 samples/sec Loss 3.8355 LearningRate 0.0001 Epoch: 26 Global Step: 559880 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:19,414-Speed 6315.83 samples/sec Loss 3.8670 LearningRate 0.0001 Epoch: 26 Global Step: 559890 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:22,654-Speed 6322.50 samples/sec Loss 3.8916 LearningRate 0.0001 Epoch: 26 Global Step: 559900 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:25,903-Speed 6306.91 samples/sec Loss 3.8621 LearningRate 0.0001 Epoch: 26 Global Step: 559910 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:29,147-Speed 6313.68 samples/sec Loss 3.9042 LearningRate 0.0001 Epoch: 26 Global Step: 559920 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:32,391-Speed 6315.20 samples/sec Loss 3.8011 LearningRate 0.0001 Epoch: 26 Global Step: 559930 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:39:35,633-Speed 6317.86 samples/sec Loss 3.8610 LearningRate 0.0001 Epoch: 26 Global Step: 559940 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:39:38,877-Speed 6314.49 samples/sec Loss 3.7917 LearningRate 0.0001 Epoch: 26 Global Step: 559950 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:39:42,108-Speed 6339.32 samples/sec Loss 3.8607 LearningRate 0.0001 Epoch: 26 Global Step: 559960 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:40:43,441-Speed 333.92 samples/sec Loss 3.8600 LearningRate 0.0001 Epoch: 27 Global Step: 559970 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:40:46,669-Speed 6346.11 samples/sec Loss 3.8393 LearningRate 0.0001 Epoch: 27 Global Step: 559980 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 
18:40:49,901-Speed 6336.96 samples/sec Loss 3.8121 LearningRate 0.0001 Epoch: 27 Global Step: 559990 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:40:53,136-Speed 6333.49 samples/sec Loss 3.8747 LearningRate 0.0001 Epoch: 27 Global Step: 560000 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:40:56,373-Speed 6327.91 samples/sec Loss 3.8825 LearningRate 0.0001 Epoch: 27 Global Step: 560010 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:40:59,613-Speed 6322.88 samples/sec Loss 3.8137 LearningRate 0.0001 Epoch: 27 Global Step: 560020 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:02,854-Speed 6320.01 samples/sec Loss 3.8627 LearningRate 0.0001 Epoch: 27 Global Step: 560030 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:06,099-Speed 6313.38 samples/sec Loss 3.8476 LearningRate 0.0001 Epoch: 27 Global Step: 560040 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:09,332-Speed 6334.33 samples/sec Loss 3.8388 LearningRate 0.0001 Epoch: 27 Global Step: 560050 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:12,569-Speed 6329.91 samples/sec Loss 3.8487 LearningRate 0.0001 Epoch: 27 Global Step: 560060 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:41:15,795-Speed 6348.84 samples/sec Loss 3.8086 LearningRate 0.0001 Epoch: 27 Global Step: 560070 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:19,040-Speed 6313.45 samples/sec Loss 3.9767 LearningRate 0.0001 Epoch: 27 Global Step: 560080 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:22,283-Speed 6315.75 samples/sec Loss 3.8134 LearningRate 0.0001 Epoch: 27 Global Step: 560090 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:25,517-Speed 6334.33 samples/sec Loss 3.8348 LearningRate 0.0001 Epoch: 27 Global Step: 560100 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:28,752-Speed 6332.03 samples/sec 
Loss 3.8139 LearningRate 0.0001 Epoch: 27 Global Step: 560110 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:31,992-Speed 6322.62 samples/sec Loss 3.8770 LearningRate 0.0001 Epoch: 27 Global Step: 560120 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:35,270-Speed 6249.27 samples/sec Loss 3.8027 LearningRate 0.0001 Epoch: 27 Global Step: 560130 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:38,514-Speed 6315.35 samples/sec Loss 3.8511 LearningRate 0.0001 Epoch: 27 Global Step: 560140 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:41,792-Speed 6249.36 samples/sec Loss 3.7866 LearningRate 0.0001 Epoch: 27 Global Step: 560150 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:45,032-Speed 6323.11 samples/sec Loss 3.7935 LearningRate 0.0001 Epoch: 27 Global Step: 560160 Fp16 Grad Scale: 8192 Required: 25 hours Training: 2022-04-02 18:41:48,271-Speed 6323.75 samples/sec Loss 3.8722 LearningRate 0.0001 Epoch: 27 Global Step: 560170 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:41:51,512-Speed 6321.25 samples/sec Loss 3.8062 LearningRate 0.0001 Epoch: 27 Global Step: 560180 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:41:54,753-Speed 6319.37 samples/sec Loss 3.7640 LearningRate 0.0001 Epoch: 27 Global Step: 560190 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:41:58,013-Speed 6282.74 samples/sec Loss 3.8253 LearningRate 0.0001 Epoch: 27 Global Step: 560200 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:42:01,257-Speed 6318.58 samples/sec Loss 3.8185 LearningRate 0.0001 Epoch: 27 Global Step: 560210 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-04-02 18:42:04,502-Speed 6313.22 samples/sec Loss 3.7556 LearningRate 0.0001 Epoch: 27 Global Step: 560220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:07,739-Speed 6328.02 samples/sec Loss 3.8686 LearningRate 0.0001 
Epoch: 27 Global Step: 560230 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:10,976-Speed 6327.82 samples/sec Loss 3.8470 LearningRate 0.0001 Epoch: 27 Global Step: 560240 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:14,215-Speed 6325.65 samples/sec Loss 3.7874 LearningRate 0.0001 Epoch: 27 Global Step: 560250 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:17,438-Speed 6354.59 samples/sec Loss 3.8371 LearningRate 0.0001 Epoch: 27 Global Step: 560260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:20,675-Speed 6329.35 samples/sec Loss 3.8047 LearningRate 0.0001 Epoch: 27 Global Step: 560270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:23,913-Speed 6326.25 samples/sec Loss 3.8288 LearningRate 0.0001 Epoch: 27 Global Step: 560280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:27,149-Speed 6328.91 samples/sec Loss 3.7906 LearningRate 0.0001 Epoch: 27 Global Step: 560290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:30,389-Speed 6322.94 samples/sec Loss 3.8210 LearningRate 0.0001 Epoch: 27 Global Step: 560300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:33,627-Speed 6326.12 samples/sec Loss 3.8705 LearningRate 0.0001 Epoch: 27 Global Step: 560310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:36,863-Speed 6331.71 samples/sec Loss 3.8465 LearningRate 0.0001 Epoch: 27 Global Step: 560320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:40,106-Speed 6314.50 samples/sec Loss 3.7997 LearningRate 0.0001 Epoch: 27 Global Step: 560330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:43,344-Speed 6328.00 samples/sec Loss 3.8718 LearningRate 0.0001 Epoch: 27 Global Step: 560340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:46,582-Speed 6325.78 samples/sec Loss 3.8212 LearningRate 0.0001 Epoch: 27 Global Step: 560350 Fp16 Grad 
Scale: 8192 Required: 24 hours Training: 2022-04-02 18:42:49,820-Speed 6326.40 samples/sec Loss 3.8193 LearningRate 0.0001 Epoch: 27 Global Step: 560360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:53,060-Speed 6322.57 samples/sec Loss 3.8459 LearningRate 0.0001 Epoch: 27 Global Step: 560370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:56,295-Speed 6332.97 samples/sec Loss 3.8461 LearningRate 0.0001 Epoch: 27 Global Step: 560380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:42:59,531-Speed 6330.81 samples/sec Loss 3.8804 LearningRate 0.0001 Epoch: 27 Global Step: 560390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:43:02,765-Speed 6332.98 samples/sec Loss 3.8022 LearningRate 0.0001 Epoch: 27 Global Step: 560400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:43:05,993-Speed 6346.19 samples/sec Loss 3.8247 LearningRate 0.0001 Epoch: 27 Global Step: 560410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:09,232-Speed 6324.54 samples/sec Loss 3.8080 LearningRate 0.0001 Epoch: 27 Global Step: 560420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:12,466-Speed 6335.07 samples/sec Loss 3.8507 LearningRate 0.0001 Epoch: 27 Global Step: 560430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:15,702-Speed 6329.83 samples/sec Loss 3.8640 LearningRate 0.0001 Epoch: 27 Global Step: 560440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:18,936-Speed 6334.80 samples/sec Loss 3.8658 LearningRate 0.0001 Epoch: 27 Global Step: 560450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:22,171-Speed 6330.78 samples/sec Loss 3.7754 LearningRate 0.0001 Epoch: 27 Global Step: 560460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:25,409-Speed 6327.07 samples/sec Loss 3.8037 LearningRate 0.0001 Epoch: 27 Global Step: 560470 Fp16 Grad Scale: 8192 Required: 24 hours 
Training: 2022-04-02 18:43:28,650-Speed 6319.62 samples/sec Loss 3.8136 LearningRate 0.0001 Epoch: 27 Global Step: 560480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:31,882-Speed 6338.53 samples/sec Loss 3.8970 LearningRate 0.0001 Epoch: 27 Global Step: 560490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:35,120-Speed 6325.58 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 27 Global Step: 560500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:38,342-Speed 6358.33 samples/sec Loss 3.8345 LearningRate 0.0001 Epoch: 27 Global Step: 560510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:41,575-Speed 6335.83 samples/sec Loss 3.8432 LearningRate 0.0001 Epoch: 27 Global Step: 560520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:44,811-Speed 6329.80 samples/sec Loss 3.8425 LearningRate 0.0001 Epoch: 27 Global Step: 560530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:48,052-Speed 6321.95 samples/sec Loss 3.8223 LearningRate 0.0001 Epoch: 27 Global Step: 560540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:51,291-Speed 6323.10 samples/sec Loss 3.8810 LearningRate 0.0001 Epoch: 27 Global Step: 560550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:54,525-Speed 6334.08 samples/sec Loss 3.8341 LearningRate 0.0001 Epoch: 27 Global Step: 560560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:43:57,765-Speed 6323.66 samples/sec Loss 3.7867 LearningRate 0.0001 Epoch: 27 Global Step: 560570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:01,000-Speed 6331.18 samples/sec Loss 3.7757 LearningRate 0.0001 Epoch: 27 Global Step: 560580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:04,245-Speed 6313.99 samples/sec Loss 3.8224 LearningRate 0.0001 Epoch: 27 Global Step: 560590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:07,484-Speed 
6323.69 samples/sec Loss 3.8474 LearningRate 0.0001 Epoch: 27 Global Step: 560600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:10,720-Speed 6331.28 samples/sec Loss 3.8465 LearningRate 0.0001 Epoch: 27 Global Step: 560610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:44:13,961-Speed 6319.75 samples/sec Loss 3.8331 LearningRate 0.0001 Epoch: 27 Global Step: 560620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:17,263-Speed 6203.76 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 27 Global Step: 560630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:20,498-Speed 6331.47 samples/sec Loss 3.8540 LearningRate 0.0001 Epoch: 27 Global Step: 560640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:23,737-Speed 6324.31 samples/sec Loss 3.7905 LearningRate 0.0001 Epoch: 27 Global Step: 560650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:26,978-Speed 6321.63 samples/sec Loss 3.8097 LearningRate 0.0001 Epoch: 27 Global Step: 560660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:30,214-Speed 6329.51 samples/sec Loss 3.8705 LearningRate 0.0001 Epoch: 27 Global Step: 560670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:33,450-Speed 6331.58 samples/sec Loss 3.8681 LearningRate 0.0001 Epoch: 27 Global Step: 560680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:36,689-Speed 6323.35 samples/sec Loss 3.9143 LearningRate 0.0001 Epoch: 27 Global Step: 560690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:39,930-Speed 6320.20 samples/sec Loss 3.8833 LearningRate 0.0001 Epoch: 27 Global Step: 560700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:43,169-Speed 6324.92 samples/sec Loss 3.8672 LearningRate 0.0001 Epoch: 27 Global Step: 560710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:44:46,405-Speed 6328.75 samples/sec Loss 3.8777 
LearningRate 0.0001 Epoch: 27 Global Step: 560720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:44:49,638-Speed 6336.81 samples/sec Loss 3.7914 LearningRate 0.0001 Epoch: 27 Global Step: 560730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:44:52,878-Speed 6322.54 samples/sec Loss 3.8395 LearningRate 0.0001 Epoch: 27 Global Step: 560740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:44:56,121-Speed 6317.41 samples/sec Loss 3.8330 LearningRate 0.0001 Epoch: 27 Global Step: 560750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:44:59,345-Speed 6352.05 samples/sec Loss 3.8787 LearningRate 0.0001 Epoch: 27 Global Step: 560760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:02,582-Speed 6330.10 samples/sec Loss 3.8115 LearningRate 0.0001 Epoch: 27 Global Step: 560770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:05,825-Speed 6315.83 samples/sec Loss 3.9010 LearningRate 0.0001 Epoch: 27 Global Step: 560780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:09,064-Speed 6325.95 samples/sec Loss 3.8122 LearningRate 0.0001 Epoch: 27 Global Step: 560790 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:12,299-Speed 6331.34 samples/sec Loss 3.8666 LearningRate 0.0001 Epoch: 27 Global Step: 560800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:15,538-Speed 6323.65 samples/sec Loss 3.8750 LearningRate 0.0001 Epoch: 27 Global Step: 560810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:18,779-Speed 6322.21 samples/sec Loss 3.8434 LearningRate 0.0001 Epoch: 27 Global Step: 560820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:22,018-Speed 6323.82 samples/sec Loss 3.8561 LearningRate 0.0001 Epoch: 27 Global Step: 560830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:25,258-Speed 6322.85 samples/sec Loss 3.8107 LearningRate 0.0001 Epoch: 27 Global 
Step: 560840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:28,529-Speed 6262.17 samples/sec Loss 3.8508 LearningRate 0.0001 Epoch: 27 Global Step: 560850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:31,765-Speed 6329.11 samples/sec Loss 3.7988 LearningRate 0.0001 Epoch: 27 Global Step: 560860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:45:34,991-Speed 6351.38 samples/sec Loss 3.8340 LearningRate 0.0001 Epoch: 27 Global Step: 560870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:38,235-Speed 6313.49 samples/sec Loss 3.8009 LearningRate 0.0001 Epoch: 27 Global Step: 560880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:41,475-Speed 6322.37 samples/sec Loss 3.7648 LearningRate 0.0001 Epoch: 27 Global Step: 560890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:44,722-Speed 6309.32 samples/sec Loss 3.8335 LearningRate 0.0001 Epoch: 27 Global Step: 560900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:47,959-Speed 6327.50 samples/sec Loss 3.8458 LearningRate 0.0001 Epoch: 27 Global Step: 560910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:51,201-Speed 6318.63 samples/sec Loss 3.8271 LearningRate 0.0001 Epoch: 27 Global Step: 560920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:54,441-Speed 6322.22 samples/sec Loss 3.8177 LearningRate 0.0001 Epoch: 27 Global Step: 560930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:45:57,684-Speed 6316.98 samples/sec Loss 3.8709 LearningRate 0.0001 Epoch: 27 Global Step: 560940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:00,925-Speed 6319.42 samples/sec Loss 3.8250 LearningRate 0.0001 Epoch: 27 Global Step: 560950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:04,167-Speed 6319.53 samples/sec Loss 3.8431 LearningRate 0.0001 Epoch: 27 Global Step: 560960 Fp16 Grad Scale: 8192 
Required: 24 hours Training: 2022-04-02 18:46:07,402-Speed 6331.11 samples/sec Loss 3.8659 LearningRate 0.0001 Epoch: 27 Global Step: 560970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:10,642-Speed 6323.17 samples/sec Loss 3.7694 LearningRate 0.0001 Epoch: 27 Global Step: 560980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:13,889-Speed 6308.54 samples/sec Loss 3.8021 LearningRate 0.0001 Epoch: 27 Global Step: 560990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:17,175-Speed 6234.40 samples/sec Loss 3.8273 LearningRate 0.0001 Epoch: 27 Global Step: 561000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:20,420-Speed 6313.83 samples/sec Loss 3.8548 LearningRate 0.0001 Epoch: 27 Global Step: 561010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:23,657-Speed 6326.88 samples/sec Loss 3.8326 LearningRate 0.0001 Epoch: 27 Global Step: 561020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:26,899-Speed 6320.07 samples/sec Loss 3.8392 LearningRate 0.0001 Epoch: 27 Global Step: 561030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:30,138-Speed 6322.90 samples/sec Loss 3.8339 LearningRate 0.0001 Epoch: 27 Global Step: 561040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:33,377-Speed 6326.03 samples/sec Loss 3.8341 LearningRate 0.0001 Epoch: 27 Global Step: 561050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:36,615-Speed 6325.25 samples/sec Loss 3.8073 LearningRate 0.0001 Epoch: 27 Global Step: 561060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:39,853-Speed 6325.51 samples/sec Loss 3.8029 LearningRate 0.0001 Epoch: 27 Global Step: 561070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:46:43,091-Speed 6327.92 samples/sec Loss 3.8063 LearningRate 0.0001 Epoch: 27 Global Step: 561080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
18:46:46,331-Speed 6321.60 samples/sec Loss 3.8668 LearningRate 0.0001 Epoch: 27 Global Step: 561090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:49,573-Speed 6318.17 samples/sec Loss 3.8309 LearningRate 0.0001 Epoch: 27 Global Step: 561100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:52,812-Speed 6324.19 samples/sec Loss 3.8724 LearningRate 0.0001 Epoch: 27 Global Step: 561110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:56,051-Speed 6324.67 samples/sec Loss 3.8281 LearningRate 0.0001 Epoch: 27 Global Step: 561120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:46:59,277-Speed 6349.45 samples/sec Loss 3.8481 LearningRate 0.0001 Epoch: 27 Global Step: 561130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:02,522-Speed 6313.30 samples/sec Loss 3.7486 LearningRate 0.0001 Epoch: 27 Global Step: 561140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:05,763-Speed 6321.10 samples/sec Loss 3.8653 LearningRate 0.0001 Epoch: 27 Global Step: 561150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:09,015-Speed 6298.40 samples/sec Loss 3.8947 LearningRate 0.0001 Epoch: 27 Global Step: 561160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:12,256-Speed 6320.03 samples/sec Loss 3.8565 LearningRate 0.0001 Epoch: 27 Global Step: 561170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:15,498-Speed 6319.14 samples/sec Loss 3.8843 LearningRate 0.0001 Epoch: 27 Global Step: 561180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:18,737-Speed 6323.97 samples/sec Loss 3.8429 LearningRate 0.0001 Epoch: 27 Global Step: 561190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:21,977-Speed 6321.93 samples/sec Loss 3.8303 LearningRate 0.0001 Epoch: 27 Global Step: 561200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:25,216-Speed 6325.62 samples/sec 
Loss 3.8092 LearningRate 0.0001 Epoch: 27 Global Step: 561210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:28,454-Speed 6326.13 samples/sec Loss 3.8449 LearningRate 0.0001 Epoch: 27 Global Step: 561220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:31,687-Speed 6335.25 samples/sec Loss 3.8228 LearningRate 0.0001 Epoch: 27 Global Step: 561230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:34,930-Speed 6318.24 samples/sec Loss 3.8513 LearningRate 0.0001 Epoch: 27 Global Step: 561240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:38,167-Speed 6326.68 samples/sec Loss 3.8290 LearningRate 0.0001 Epoch: 27 Global Step: 561250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:41,406-Speed 6326.16 samples/sec Loss 3.8492 LearningRate 0.0001 Epoch: 27 Global Step: 561260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:44,644-Speed 6324.84 samples/sec Loss 3.7800 LearningRate 0.0001 Epoch: 27 Global Step: 561270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:47,880-Speed 6331.21 samples/sec Loss 3.8586 LearningRate 0.0001 Epoch: 27 Global Step: 561280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:51,119-Speed 6323.20 samples/sec Loss 3.8431 LearningRate 0.0001 Epoch: 27 Global Step: 561290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:54,356-Speed 6329.16 samples/sec Loss 3.8166 LearningRate 0.0001 Epoch: 27 Global Step: 561300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:47:57,602-Speed 6311.38 samples/sec Loss 3.8401 LearningRate 0.0001 Epoch: 27 Global Step: 561310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:00,842-Speed 6323.13 samples/sec Loss 3.8187 LearningRate 0.0001 Epoch: 27 Global Step: 561320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:04,082-Speed 6321.99 samples/sec Loss 3.8741 LearningRate 0.0001 Epoch: 27 
Global Step: 561330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:07,320-Speed 6325.55 samples/sec Loss 3.8467 LearningRate 0.0001 Epoch: 27 Global Step: 561340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:10,569-Speed 6303.97 samples/sec Loss 3.8446 LearningRate 0.0001 Epoch: 27 Global Step: 561350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:13,807-Speed 6326.68 samples/sec Loss 3.8430 LearningRate 0.0001 Epoch: 27 Global Step: 561360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:17,044-Speed 6329.02 samples/sec Loss 3.8753 LearningRate 0.0001 Epoch: 27 Global Step: 561370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:20,285-Speed 6320.92 samples/sec Loss 3.8178 LearningRate 0.0001 Epoch: 27 Global Step: 561380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:23,527-Speed 6318.72 samples/sec Loss 3.7744 LearningRate 0.0001 Epoch: 27 Global Step: 561390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:26,771-Speed 6312.94 samples/sec Loss 3.7904 LearningRate 0.0001 Epoch: 27 Global Step: 561400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:48:29,995-Speed 6354.09 samples/sec Loss 3.8324 LearningRate 0.0001 Epoch: 27 Global Step: 561410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:33,233-Speed 6326.91 samples/sec Loss 3.7929 LearningRate 0.0001 Epoch: 27 Global Step: 561420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:36,478-Speed 6314.18 samples/sec Loss 3.8477 LearningRate 0.0001 Epoch: 27 Global Step: 561430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:39,720-Speed 6317.25 samples/sec Loss 3.7970 LearningRate 0.0001 Epoch: 27 Global Step: 561440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:42,963-Speed 6317.11 samples/sec Loss 3.7857 LearningRate 0.0001 Epoch: 27 Global Step: 561450 Fp16 Grad 
Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:46,204-Speed 6320.44 samples/sec Loss 3.8026 LearningRate 0.0001 Epoch: 27 Global Step: 561460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:49,443-Speed 6325.01 samples/sec Loss 3.7889 LearningRate 0.0001 Epoch: 27 Global Step: 561470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:52,681-Speed 6326.42 samples/sec Loss 3.8033 LearningRate 0.0001 Epoch: 27 Global Step: 561480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:55,924-Speed 6315.51 samples/sec Loss 3.7674 LearningRate 0.0001 Epoch: 27 Global Step: 561490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:48:59,211-Speed 6232.36 samples/sec Loss 3.8875 LearningRate 0.0001 Epoch: 27 Global Step: 561500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:02,451-Speed 6323.22 samples/sec Loss 3.8419 LearningRate 0.0001 Epoch: 27 Global Step: 561510 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:05,689-Speed 6324.81 samples/sec Loss 3.8431 LearningRate 0.0001 Epoch: 27 Global Step: 561520 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:08,934-Speed 6312.49 samples/sec Loss 3.8068 LearningRate 0.0001 Epoch: 27 Global Step: 561530 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:12,165-Speed 6341.90 samples/sec Loss 3.7941 LearningRate 0.0001 Epoch: 27 Global Step: 561540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:15,405-Speed 6320.99 samples/sec Loss 3.8038 LearningRate 0.0001 Epoch: 27 Global Step: 561550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:18,645-Speed 6322.37 samples/sec Loss 3.8503 LearningRate 0.0001 Epoch: 27 Global Step: 561560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:21,884-Speed 6325.35 samples/sec Loss 3.8448 LearningRate 0.0001 Epoch: 27 Global Step: 561570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 18:49:25,126-Speed 6317.19 samples/sec Loss 3.9257 LearningRate 0.0001 Epoch: 27 Global Step: 561580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:28,367-Speed 6321.45 samples/sec Loss 3.7912 LearningRate 0.0001 Epoch: 27 Global Step: 561590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:31,606-Speed 6324.13 samples/sec Loss 3.8313 LearningRate 0.0001 Epoch: 27 Global Step: 561600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:34,848-Speed 6318.14 samples/sec Loss 3.7977 LearningRate 0.0001 Epoch: 27 Global Step: 561610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:38,092-Speed 6315.06 samples/sec Loss 3.8144 LearningRate 0.0001 Epoch: 27 Global Step: 561620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:41,333-Speed 6319.75 samples/sec Loss 3.7945 LearningRate 0.0001 Epoch: 27 Global Step: 561630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:49:44,575-Speed 6320.04 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 27 Global Step: 561640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:47,817-Speed 6317.81 samples/sec Loss 3.8297 LearningRate 0.0001 Epoch: 27 Global Step: 561650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:51,126-Speed 6191.77 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 561660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:54,366-Speed 6322.71 samples/sec Loss 3.8449 LearningRate 0.0001 Epoch: 27 Global Step: 561670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:49:57,608-Speed 6318.64 samples/sec Loss 3.8416 LearningRate 0.0001 Epoch: 27 Global Step: 561680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:00,846-Speed 6326.20 samples/sec Loss 3.7893 LearningRate 0.0001 Epoch: 27 Global Step: 561690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:04,087-Speed 6320.36 
samples/sec Loss 3.8701 LearningRate 0.0001 Epoch: 27 Global Step: 561700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:07,330-Speed 6315.15 samples/sec Loss 3.8943 LearningRate 0.0001 Epoch: 27 Global Step: 561710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:10,569-Speed 6324.52 samples/sec Loss 3.7773 LearningRate 0.0001 Epoch: 27 Global Step: 561720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:13,799-Speed 6341.92 samples/sec Loss 3.8066 LearningRate 0.0001 Epoch: 27 Global Step: 561730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:17,038-Speed 6324.58 samples/sec Loss 3.8478 LearningRate 0.0001 Epoch: 27 Global Step: 561740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:20,279-Speed 6320.27 samples/sec Loss 3.9032 LearningRate 0.0001 Epoch: 27 Global Step: 561750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:23,519-Speed 6323.98 samples/sec Loss 3.8710 LearningRate 0.0001 Epoch: 27 Global Step: 561760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:26,758-Speed 6323.42 samples/sec Loss 3.8108 LearningRate 0.0001 Epoch: 27 Global Step: 561770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:30,000-Speed 6317.97 samples/sec Loss 3.8622 LearningRate 0.0001 Epoch: 27 Global Step: 561780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:33,237-Speed 6328.14 samples/sec Loss 3.8094 LearningRate 0.0001 Epoch: 27 Global Step: 561790 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:36,481-Speed 6315.29 samples/sec Loss 3.8359 LearningRate 0.0001 Epoch: 27 Global Step: 561800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:39,724-Speed 6316.89 samples/sec Loss 3.8023 LearningRate 0.0001 Epoch: 27 Global Step: 561810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:42,967-Speed 6316.93 samples/sec Loss 3.8257 LearningRate 
0.0001 Epoch: 27 Global Step: 561820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:50:46,206-Speed 6322.95 samples/sec Loss 3.7361 LearningRate 0.0001 Epoch: 27 Global Step: 561830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:49,448-Speed 6318.30 samples/sec Loss 3.8498 LearningRate 0.0001 Epoch: 27 Global Step: 561840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:52,750-Speed 6205.08 samples/sec Loss 3.9099 LearningRate 0.0001 Epoch: 27 Global Step: 561850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:56,026-Speed 6253.71 samples/sec Loss 3.8958 LearningRate 0.0001 Epoch: 27 Global Step: 561860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:50:59,266-Speed 6322.53 samples/sec Loss 3.8060 LearningRate 0.0001 Epoch: 27 Global Step: 561870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:51:02,497-Speed 6341.58 samples/sec Loss 3.8354 LearningRate 0.0001 Epoch: 27 Global Step: 561880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:05,738-Speed 6319.10 samples/sec Loss 3.7215 LearningRate 0.0001 Epoch: 27 Global Step: 561890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:08,991-Speed 6296.85 samples/sec Loss 3.8420 LearningRate 0.0001 Epoch: 27 Global Step: 561900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:12,247-Speed 6292.07 samples/sec Loss 3.8862 LearningRate 0.0001 Epoch: 27 Global Step: 561910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:15,491-Speed 6314.45 samples/sec Loss 3.8635 LearningRate 0.0001 Epoch: 27 Global Step: 561920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:18,733-Speed 6318.39 samples/sec Loss 3.8114 LearningRate 0.0001 Epoch: 27 Global Step: 561930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:21,974-Speed 6320.46 samples/sec Loss 3.8204 LearningRate 0.0001 Epoch: 27 Global Step: 561940 
Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:25,228-Speed 6295.75 samples/sec Loss 3.8435 LearningRate 0.0001 Epoch: 27 Global Step: 561950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:28,477-Speed 6304.40 samples/sec Loss 3.7894 LearningRate 0.0001 Epoch: 27 Global Step: 561960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:31,719-Speed 6319.62 samples/sec Loss 3.7932 LearningRate 0.0001 Epoch: 27 Global Step: 561970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:34,961-Speed 6317.22 samples/sec Loss 3.8704 LearningRate 0.0001 Epoch: 27 Global Step: 561980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:51:38,188-Speed 6347.24 samples/sec Loss 3.8012 LearningRate 0.0001 Epoch: 27 Global Step: 561990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:41,434-Speed 6312.07 samples/sec Loss 3.7617 LearningRate 0.0001 Epoch: 27 Global Step: 562000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:44,674-Speed 6321.56 samples/sec Loss 3.8893 LearningRate 0.0001 Epoch: 27 Global Step: 562010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:47,917-Speed 6316.62 samples/sec Loss 3.8614 LearningRate 0.0001 Epoch: 27 Global Step: 562020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:51,157-Speed 6322.54 samples/sec Loss 3.8266 LearningRate 0.0001 Epoch: 27 Global Step: 562030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:54,400-Speed 6320.51 samples/sec Loss 3.8272 LearningRate 0.0001 Epoch: 27 Global Step: 562040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:51:57,640-Speed 6320.97 samples/sec Loss 3.8238 LearningRate 0.0001 Epoch: 27 Global Step: 562050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:00,885-Speed 6313.06 samples/sec Loss 3.8974 LearningRate 0.0001 Epoch: 27 Global Step: 562060 Fp16 Grad Scale: 8192 Required: 24 hours 
Training: 2022-04-02 18:52:04,131-Speed 6311.30 samples/sec Loss 3.8598 LearningRate 0.0001 Epoch: 27 Global Step: 562070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:07,377-Speed 6312.55 samples/sec Loss 3.8023 LearningRate 0.0001 Epoch: 27 Global Step: 562080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:10,622-Speed 6313.28 samples/sec Loss 3.7450 LearningRate 0.0001 Epoch: 27 Global Step: 562090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:13,860-Speed 6326.93 samples/sec Loss 3.8236 LearningRate 0.0001 Epoch: 27 Global Step: 562100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:17,103-Speed 6315.36 samples/sec Loss 3.8335 LearningRate 0.0001 Epoch: 27 Global Step: 562110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:20,346-Speed 6317.10 samples/sec Loss 3.8541 LearningRate 0.0001 Epoch: 27 Global Step: 562120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:23,597-Speed 6301.48 samples/sec Loss 3.8158 LearningRate 0.0001 Epoch: 27 Global Step: 562130 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:26,839-Speed 6317.56 samples/sec Loss 3.8009 LearningRate 0.0001 Epoch: 27 Global Step: 562140 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:30,080-Speed 6320.64 samples/sec Loss 3.8284 LearningRate 0.0001 Epoch: 27 Global Step: 562150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:52:33,308-Speed 6346.61 samples/sec Loss 3.8558 LearningRate 0.0001 Epoch: 27 Global Step: 562160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:36,552-Speed 6314.02 samples/sec Loss 3.8405 LearningRate 0.0001 Epoch: 27 Global Step: 562170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:39,800-Speed 6307.32 samples/sec Loss 3.7876 LearningRate 0.0001 Epoch: 27 Global Step: 562180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
18:52:43,044-Speed 6313.57 samples/sec Loss 3.8596 LearningRate 0.0001 Epoch: 27 Global Step: 562190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:46,286-Speed 6318.69 samples/sec Loss 3.8003 LearningRate 0.0001 Epoch: 27 Global Step: 562200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:49,527-Speed 6321.04 samples/sec Loss 3.8753 LearningRate 0.0001 Epoch: 27 Global Step: 562210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:52,771-Speed 6315.22 samples/sec Loss 3.8610 LearningRate 0.0001 Epoch: 27 Global Step: 562220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:56,019-Speed 6304.96 samples/sec Loss 3.8088 LearningRate 0.0001 Epoch: 27 Global Step: 562230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:52:59,260-Speed 6320.46 samples/sec Loss 3.8592 LearningRate 0.0001 Epoch: 27 Global Step: 562240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:53:02,505-Speed 6312.98 samples/sec Loss 3.7671 LearningRate 0.0001 Epoch: 27 Global Step: 562250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:53:05,752-Speed 6308.85 samples/sec Loss 3.8210 LearningRate 0.0001 Epoch: 27 Global Step: 562260 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:08,995-Speed 6316.12 samples/sec Loss 3.8030 LearningRate 0.0001 Epoch: 27 Global Step: 562270 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:12,238-Speed 6318.17 samples/sec Loss 3.8785 LearningRate 0.0001 Epoch: 27 Global Step: 562280 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:15,482-Speed 6313.71 samples/sec Loss 3.7613 LearningRate 0.0001 Epoch: 27 Global Step: 562290 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:18,727-Speed 6314.36 samples/sec Loss 3.8078 LearningRate 0.0001 Epoch: 27 Global Step: 562300 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:21,968-Speed 6319.91 samples/sec 
Loss 3.7787 LearningRate 0.0001 Epoch: 27 Global Step: 562310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:25,211-Speed 6316.83 samples/sec Loss 3.8746 LearningRate 0.0001 Epoch: 27 Global Step: 562320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:28,456-Speed 6313.45 samples/sec Loss 3.8670 LearningRate 0.0001 Epoch: 27 Global Step: 562330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:31,699-Speed 6315.27 samples/sec Loss 3.7871 LearningRate 0.0001 Epoch: 27 Global Step: 562340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:34,946-Speed 6309.71 samples/sec Loss 3.8181 LearningRate 0.0001 Epoch: 27 Global Step: 562350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:38,183-Speed 6327.94 samples/sec Loss 3.8043 LearningRate 0.0001 Epoch: 27 Global Step: 562360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:41,425-Speed 6318.04 samples/sec Loss 3.8238 LearningRate 0.0001 Epoch: 27 Global Step: 562370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:44,669-Speed 6314.85 samples/sec Loss 3.8384 LearningRate 0.0001 Epoch: 27 Global Step: 562380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:47,910-Speed 6319.64 samples/sec Loss 3.7711 LearningRate 0.0001 Epoch: 27 Global Step: 562390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:51,152-Speed 6319.92 samples/sec Loss 3.8486 LearningRate 0.0001 Epoch: 27 Global Step: 562400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:54,393-Speed 6319.24 samples/sec Loss 3.8533 LearningRate 0.0001 Epoch: 27 Global Step: 562410 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:53:57,640-Speed 6309.76 samples/sec Loss 3.7947 LearningRate 0.0001 Epoch: 27 Global Step: 562420 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:00,888-Speed 6306.86 samples/sec Loss 3.8350 LearningRate 0.0001 
Epoch: 27 Global Step: 562430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:04,128-Speed 6322.50 samples/sec Loss 3.8247 LearningRate 0.0001 Epoch: 27 Global Step: 562440 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:07,356-Speed 6345.55 samples/sec Loss 3.8482 LearningRate 0.0001 Epoch: 27 Global Step: 562450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:10,598-Speed 6317.71 samples/sec Loss 3.8278 LearningRate 0.0001 Epoch: 27 Global Step: 562460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:13,840-Speed 6318.57 samples/sec Loss 3.8474 LearningRate 0.0001 Epoch: 27 Global Step: 562470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:17,087-Speed 6309.29 samples/sec Loss 3.8233 LearningRate 0.0001 Epoch: 27 Global Step: 562480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:20,337-Speed 6301.81 samples/sec Loss 3.8311 LearningRate 0.0001 Epoch: 27 Global Step: 562490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:23,578-Speed 6321.89 samples/sec Loss 3.8281 LearningRate 0.0001 Epoch: 27 Global Step: 562500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:26,822-Speed 6315.32 samples/sec Loss 3.8287 LearningRate 0.0001 Epoch: 27 Global Step: 562510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:30,086-Speed 6274.93 samples/sec Loss 3.7810 LearningRate 0.0001 Epoch: 27 Global Step: 562520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:33,326-Speed 6323.12 samples/sec Loss 3.8179 LearningRate 0.0001 Epoch: 27 Global Step: 562530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:36,569-Speed 6316.18 samples/sec Loss 3.7764 LearningRate 0.0001 Epoch: 27 Global Step: 562540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:54:39,816-Speed 6309.64 samples/sec Loss 3.7562 LearningRate 0.0001 Epoch: 27 Global Step: 562550 Fp16 Grad 
Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:43,057-Speed 6319.27 samples/sec Loss 3.8132 LearningRate 0.0001 Epoch: 27 Global Step: 562560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:46,297-Speed 6322.19 samples/sec Loss 3.7680 LearningRate 0.0001 Epoch: 27 Global Step: 562570 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:49,540-Speed 6317.48 samples/sec Loss 3.8202 LearningRate 0.0001 Epoch: 27 Global Step: 562580 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:52,784-Speed 6315.21 samples/sec Loss 3.8328 LearningRate 0.0001 Epoch: 27 Global Step: 562590 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:56,033-Speed 6303.96 samples/sec Loss 3.8494 LearningRate 0.0001 Epoch: 27 Global Step: 562600 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:54:59,276-Speed 6316.54 samples/sec Loss 3.8228 LearningRate 0.0001 Epoch: 27 Global Step: 562610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:02,515-Speed 6324.66 samples/sec Loss 3.8154 LearningRate 0.0001 Epoch: 27 Global Step: 562620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:05,741-Speed 6348.33 samples/sec Loss 3.7881 LearningRate 0.0001 Epoch: 27 Global Step: 562630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:08,983-Speed 6319.51 samples/sec Loss 3.8080 LearningRate 0.0001 Epoch: 27 Global Step: 562640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:12,225-Speed 6319.18 samples/sec Loss 3.7464 LearningRate 0.0001 Epoch: 27 Global Step: 562650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:15,467-Speed 6317.48 samples/sec Loss 3.8034 LearningRate 0.0001 Epoch: 27 Global Step: 562660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:18,713-Speed 6310.04 samples/sec Loss 3.8478 LearningRate 0.0001 Epoch: 27 Global Step: 562670 Fp16 Grad Scale: 8192 Required: 24 hours 
Training: 2022-04-02 18:55:21,952-Speed 6325.34 samples/sec Loss 3.8790 LearningRate 0.0001 Epoch: 27 Global Step: 562680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:25,194-Speed 6317.42 samples/sec Loss 3.8045 LearningRate 0.0001 Epoch: 27 Global Step: 562690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:28,456-Speed 6281.59 samples/sec Loss 3.7474 LearningRate 0.0001 Epoch: 27 Global Step: 562700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:31,701-Speed 6312.09 samples/sec Loss 3.7837 LearningRate 0.0001 Epoch: 27 Global Step: 562710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:34,950-Speed 6306.11 samples/sec Loss 3.8036 LearningRate 0.0001 Epoch: 27 Global Step: 562720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:55:38,195-Speed 6311.67 samples/sec Loss 3.8103 LearningRate 0.0001 Epoch: 27 Global Step: 562730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:41,438-Speed 6317.74 samples/sec Loss 3.8130 LearningRate 0.0001 Epoch: 27 Global Step: 562740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:44,685-Speed 6308.57 samples/sec Loss 3.8522 LearningRate 0.0001 Epoch: 27 Global Step: 562750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:47,929-Speed 6313.19 samples/sec Loss 3.8292 LearningRate 0.0001 Epoch: 27 Global Step: 562760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:51,176-Speed 6308.78 samples/sec Loss 3.7786 LearningRate 0.0001 Epoch: 27 Global Step: 562770 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:54,418-Speed 6319.67 samples/sec Loss 3.9047 LearningRate 0.0001 Epoch: 27 Global Step: 562780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:55:57,652-Speed 6332.80 samples/sec Loss 3.8960 LearningRate 0.0001 Epoch: 27 Global Step: 562790 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
18:56:00,894-Speed 6319.22 samples/sec Loss 3.7584 LearningRate 0.0001 Epoch: 27 Global Step: 562800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:04,137-Speed 6317.67 samples/sec Loss 3.8098 LearningRate 0.0001 Epoch: 27 Global Step: 562810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:07,376-Speed 6322.80 samples/sec Loss 3.7885 LearningRate 0.0001 Epoch: 27 Global Step: 562820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:10,622-Speed 6311.75 samples/sec Loss 3.8614 LearningRate 0.0001 Epoch: 27 Global Step: 562830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:13,864-Speed 6318.64 samples/sec Loss 3.8082 LearningRate 0.0001 Epoch: 27 Global Step: 562840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:17,111-Speed 6307.02 samples/sec Loss 3.8862 LearningRate 0.0001 Epoch: 27 Global Step: 562850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:20,356-Speed 6312.67 samples/sec Loss 3.8894 LearningRate 0.0001 Epoch: 27 Global Step: 562860 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:23,600-Speed 6315.78 samples/sec Loss 3.7864 LearningRate 0.0001 Epoch: 27 Global Step: 562870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:26,845-Speed 6313.75 samples/sec Loss 3.8109 LearningRate 0.0001 Epoch: 27 Global Step: 562880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:56:30,088-Speed 6315.50 samples/sec Loss 3.7651 LearningRate 0.0001 Epoch: 27 Global Step: 562890 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:33,332-Speed 6315.79 samples/sec Loss 3.7753 LearningRate 0.0001 Epoch: 27 Global Step: 562900 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:36,585-Speed 6296.66 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 27 Global Step: 562910 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:39,832-Speed 6310.63 samples/sec 
Loss 3.8616 LearningRate 0.0001 Epoch: 27 Global Step: 562920 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:43,074-Speed 6316.70 samples/sec Loss 3.7898 LearningRate 0.0001 Epoch: 27 Global Step: 562930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:46,316-Speed 6319.36 samples/sec Loss 3.7345 LearningRate 0.0001 Epoch: 27 Global Step: 562940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:49,561-Speed 6312.24 samples/sec Loss 3.8302 LearningRate 0.0001 Epoch: 27 Global Step: 562950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:52,806-Speed 6313.61 samples/sec Loss 3.7672 LearningRate 0.0001 Epoch: 27 Global Step: 562960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:56,048-Speed 6318.52 samples/sec Loss 3.7722 LearningRate 0.0001 Epoch: 27 Global Step: 562970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:56:59,312-Speed 6275.82 samples/sec Loss 3.8802 LearningRate 0.0001 Epoch: 27 Global Step: 562980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:57:02,550-Speed 6324.93 samples/sec Loss 3.8550 LearningRate 0.0001 Epoch: 27 Global Step: 562990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:05,794-Speed 6315.46 samples/sec Loss 3.7684 LearningRate 0.0001 Epoch: 27 Global Step: 563000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:09,038-Speed 6313.47 samples/sec Loss 3.8332 LearningRate 0.0001 Epoch: 27 Global Step: 563010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:12,288-Speed 6303.02 samples/sec Loss 3.7993 LearningRate 0.0001 Epoch: 27 Global Step: 563020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:15,532-Speed 6315.72 samples/sec Loss 3.8288 LearningRate 0.0001 Epoch: 27 Global Step: 563030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:18,835-Speed 6202.07 samples/sec Loss 3.8638 LearningRate 0.0001 
Epoch: 27 Global Step: 563040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:22,080-Speed 6310.88 samples/sec Loss 3.8316 LearningRate 0.0001 Epoch: 27 Global Step: 563050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:25,324-Speed 6315.13 samples/sec Loss 3.8297 LearningRate 0.0001 Epoch: 27 Global Step: 563060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:28,573-Speed 6306.45 samples/sec Loss 3.9287 LearningRate 0.0001 Epoch: 27 Global Step: 563070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:31,822-Speed 6304.84 samples/sec Loss 3.8584 LearningRate 0.0001 Epoch: 27 Global Step: 563080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:35,064-Speed 6317.60 samples/sec Loss 3.8134 LearningRate 0.0001 Epoch: 27 Global Step: 563090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:57:38,312-Speed 6307.29 samples/sec Loss 3.8083 LearningRate 0.0001 Epoch: 27 Global Step: 563100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:57:41,546-Speed 6334.79 samples/sec Loss 3.8784 LearningRate 0.0001 Epoch: 27 Global Step: 563110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:44,789-Speed 6317.01 samples/sec Loss 3.7918 LearningRate 0.0001 Epoch: 27 Global Step: 563120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:48,038-Speed 6305.28 samples/sec Loss 3.8581 LearningRate 0.0001 Epoch: 27 Global Step: 563130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:51,279-Speed 6320.36 samples/sec Loss 3.8185 LearningRate 0.0001 Epoch: 27 Global Step: 563140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:54,525-Speed 6310.19 samples/sec Loss 3.7811 LearningRate 0.0001 Epoch: 27 Global Step: 563150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:57:57,765-Speed 6322.50 samples/sec Loss 3.8397 LearningRate 0.0001 Epoch: 27 Global Step: 563160 Fp16 Grad 
Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:01,010-Speed 6312.65 samples/sec Loss 3.8405 LearningRate 0.0001 Epoch: 27 Global Step: 563170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:04,267-Speed 6289.29 samples/sec Loss 3.8455 LearningRate 0.0001 Epoch: 27 Global Step: 563180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:07,524-Speed 6290.41 samples/sec Loss 3.8165 LearningRate 0.0001 Epoch: 27 Global Step: 563190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:10,769-Speed 6312.53 samples/sec Loss 3.8511 LearningRate 0.0001 Epoch: 27 Global Step: 563200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:14,012-Speed 6316.50 samples/sec Loss 3.7709 LearningRate 0.0001 Epoch: 27 Global Step: 563210 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:58:17,260-Speed 6306.25 samples/sec Loss 3.8205 LearningRate 0.0001 Epoch: 27 Global Step: 563220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:58:20,525-Speed 6273.95 samples/sec Loss 3.8314 LearningRate 0.0001 Epoch: 27 Global Step: 563230 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:58:23,756-Speed 6340.44 samples/sec Loss 3.8376 LearningRate 0.0001 Epoch: 27 Global Step: 563240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:27,000-Speed 6314.28 samples/sec Loss 3.8382 LearningRate 0.0001 Epoch: 27 Global Step: 563250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:30,249-Speed 6304.61 samples/sec Loss 3.7862 LearningRate 0.0001 Epoch: 27 Global Step: 563260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:33,500-Speed 6301.84 samples/sec Loss 3.8287 LearningRate 0.0001 Epoch: 27 Global Step: 563270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:36,742-Speed 6318.45 samples/sec Loss 3.8726 LearningRate 0.0001 Epoch: 27 Global Step: 563280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 18:58:39,988-Speed 6311.42 samples/sec Loss 3.7654 LearningRate 0.0001 Epoch: 27 Global Step: 563290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:43,231-Speed 6315.19 samples/sec Loss 3.8121 LearningRate 0.0001 Epoch: 27 Global Step: 563300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:46,493-Speed 6280.17 samples/sec Loss 3.8289 LearningRate 0.0001 Epoch: 27 Global Step: 563310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:49,747-Speed 6296.83 samples/sec Loss 3.8005 LearningRate 0.0001 Epoch: 27 Global Step: 563320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:52,991-Speed 6313.15 samples/sec Loss 3.8026 LearningRate 0.0001 Epoch: 27 Global Step: 563330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:58:56,240-Speed 6304.92 samples/sec Loss 3.8148 LearningRate 0.0001 Epoch: 27 Global Step: 563340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:58:59,487-Speed 6310.45 samples/sec Loss 3.7844 LearningRate 0.0001 Epoch: 27 Global Step: 563350 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:02,737-Speed 6302.08 samples/sec Loss 3.8496 LearningRate 0.0001 Epoch: 27 Global Step: 563360 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:05,983-Speed 6309.94 samples/sec Loss 3.7668 LearningRate 0.0001 Epoch: 27 Global Step: 563370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:09,227-Speed 6315.58 samples/sec Loss 3.7978 LearningRate 0.0001 Epoch: 27 Global Step: 563380 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:12,470-Speed 6315.60 samples/sec Loss 3.8142 LearningRate 0.0001 Epoch: 27 Global Step: 563390 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:15,712-Speed 6320.07 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 27 Global Step: 563400 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:18,961-Speed 
6304.50 samples/sec Loss 3.8096 LearningRate 0.0001 Epoch: 27 Global Step: 563410 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:22,204-Speed 6315.78 samples/sec Loss 3.8289 LearningRate 0.0001 Epoch: 27 Global Step: 563420 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 18:59:25,435-Speed 6340.57 samples/sec Loss 3.7830 LearningRate 0.0001 Epoch: 27 Global Step: 563430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:28,682-Speed 6309.84 samples/sec Loss 3.8248 LearningRate 0.0001 Epoch: 27 Global Step: 563440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:31,925-Speed 6315.79 samples/sec Loss 3.8195 LearningRate 0.0001 Epoch: 27 Global Step: 563450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:35,169-Speed 6314.13 samples/sec Loss 3.7565 LearningRate 0.0001 Epoch: 27 Global Step: 563460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:38,416-Speed 6308.68 samples/sec Loss 3.8115 LearningRate 0.0001 Epoch: 27 Global Step: 563470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:41,680-Speed 6275.81 samples/sec Loss 3.8275 LearningRate 0.0001 Epoch: 27 Global Step: 563480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:44,925-Speed 6313.03 samples/sec Loss 3.7744 LearningRate 0.0001 Epoch: 27 Global Step: 563490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:48,170-Speed 6312.96 samples/sec Loss 3.7945 LearningRate 0.0001 Epoch: 27 Global Step: 563500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:51,414-Speed 6313.63 samples/sec Loss 3.7892 LearningRate 0.0001 Epoch: 27 Global Step: 563510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:54,656-Speed 6318.69 samples/sec Loss 3.7176 LearningRate 0.0001 Epoch: 27 Global Step: 563520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 18:59:57,890-Speed 6335.82 samples/sec Loss 3.8408 
LearningRate 0.0001 Epoch: 27 Global Step: 563530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:01,133-Speed 6316.67 samples/sec Loss 3.8620 LearningRate 0.0001 Epoch: 27 Global Step: 563540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:04,376-Speed 6318.03 samples/sec Loss 3.7835 LearningRate 0.0001 Epoch: 27 Global Step: 563550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:07,620-Speed 6313.00 samples/sec Loss 3.7720 LearningRate 0.0001 Epoch: 27 Global Step: 563560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:10,905-Speed 6235.46 samples/sec Loss 3.7662 LearningRate 0.0001 Epoch: 27 Global Step: 563570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:14,151-Speed 6312.77 samples/sec Loss 3.8601 LearningRate 0.0001 Epoch: 27 Global Step: 563580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:17,396-Speed 6311.75 samples/sec Loss 3.8145 LearningRate 0.0001 Epoch: 27 Global Step: 563590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:20,637-Speed 6319.30 samples/sec Loss 3.8163 LearningRate 0.0001 Epoch: 27 Global Step: 563600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:23,883-Speed 6311.88 samples/sec Loss 3.8464 LearningRate 0.0001 Epoch: 27 Global Step: 563610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:27,126-Speed 6315.80 samples/sec Loss 3.7494 LearningRate 0.0001 Epoch: 27 Global Step: 563620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:30,370-Speed 6315.10 samples/sec Loss 3.8338 LearningRate 0.0001 Epoch: 27 Global Step: 563630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:00:33,636-Speed 6273.97 samples/sec Loss 3.8222 LearningRate 0.0001 Epoch: 27 Global Step: 563640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:00:36,888-Speed 6298.24 samples/sec Loss 3.8096 LearningRate 0.0001 Epoch: 27 Global 
Step: 563650 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:00:40,131-Speed 6317.21 samples/sec Loss 3.7874 LearningRate 0.0001 Epoch: 27 Global Step: 563660 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:00:43,361-Speed 6341.36 samples/sec Loss 3.8716 LearningRate 0.0001 Epoch: 27 Global Step: 563670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:46,604-Speed 6315.74 samples/sec Loss 3.8347 LearningRate 0.0001 Epoch: 27 Global Step: 563680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:49,854-Speed 6304.22 samples/sec Loss 3.7934 LearningRate 0.0001 Epoch: 27 Global Step: 563690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:53,096-Speed 6317.01 samples/sec Loss 3.8027 LearningRate 0.0001 Epoch: 27 Global Step: 563700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:56,342-Speed 6312.22 samples/sec Loss 3.8252 LearningRate 0.0001 Epoch: 27 Global Step: 563710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:00:59,583-Speed 6319.48 samples/sec Loss 3.7904 LearningRate 0.0001 Epoch: 27 Global Step: 563720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:02,829-Speed 6310.47 samples/sec Loss 3.7911 LearningRate 0.0001 Epoch: 27 Global Step: 563730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:06,095-Speed 6273.24 samples/sec Loss 3.8084 LearningRate 0.0001 Epoch: 27 Global Step: 563740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:09,353-Speed 6288.56 samples/sec Loss 3.8322 LearningRate 0.0001 Epoch: 27 Global Step: 563750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:12,597-Speed 6313.47 samples/sec Loss 3.8141 LearningRate 0.0001 Epoch: 27 Global Step: 563760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:15,843-Speed 6311.35 samples/sec Loss 3.7614 LearningRate 0.0001 Epoch: 27 Global Step: 563770 Fp16 Grad Scale: 16384 
Required: 24 hours Training: 2022-04-02 19:01:19,090-Speed 6309.31 samples/sec Loss 3.8474 LearningRate 0.0001 Epoch: 27 Global Step: 563780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:01:22,335-Speed 6312.48 samples/sec Loss 3.8123 LearningRate 0.0001 Epoch: 27 Global Step: 563790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:01:25,572-Speed 6327.64 samples/sec Loss 3.7555 LearningRate 0.0001 Epoch: 27 Global Step: 563800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:28,870-Speed 6211.31 samples/sec Loss 3.8314 LearningRate 0.0001 Epoch: 27 Global Step: 563810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:32,116-Speed 6311.04 samples/sec Loss 3.7979 LearningRate 0.0001 Epoch: 27 Global Step: 563820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:35,365-Speed 6308.31 samples/sec Loss 3.7909 LearningRate 0.0001 Epoch: 27 Global Step: 563830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:38,609-Speed 6314.19 samples/sec Loss 3.7858 LearningRate 0.0001 Epoch: 27 Global Step: 563840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:41,856-Speed 6308.70 samples/sec Loss 3.8553 LearningRate 0.0001 Epoch: 27 Global Step: 563850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:45,099-Speed 6316.31 samples/sec Loss 3.8575 LearningRate 0.0001 Epoch: 27 Global Step: 563860 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:48,344-Speed 6312.92 samples/sec Loss 3.7711 LearningRate 0.0001 Epoch: 27 Global Step: 563870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:51,586-Speed 6318.40 samples/sec Loss 3.8638 LearningRate 0.0001 Epoch: 27 Global Step: 563880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:01:54,830-Speed 6314.84 samples/sec Loss 3.8524 LearningRate 0.0001 Epoch: 27 Global Step: 563890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:01:58,067-Speed 6328.37 samples/sec Loss 3.8725 LearningRate 0.0001 Epoch: 27 Global Step: 563900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:01,318-Speed 6299.48 samples/sec Loss 3.7987 LearningRate 0.0001 Epoch: 27 Global Step: 563910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:04,563-Speed 6313.62 samples/sec Loss 3.7664 LearningRate 0.0001 Epoch: 27 Global Step: 563920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:07,811-Speed 6306.68 samples/sec Loss 3.8496 LearningRate 0.0001 Epoch: 27 Global Step: 563930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:11,070-Speed 6285.61 samples/sec Loss 3.8186 LearningRate 0.0001 Epoch: 27 Global Step: 563940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:14,312-Speed 6318.34 samples/sec Loss 3.8150 LearningRate 0.0001 Epoch: 27 Global Step: 563950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:17,560-Speed 6307.29 samples/sec Loss 3.8269 LearningRate 0.0001 Epoch: 27 Global Step: 563960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:20,814-Speed 6295.08 samples/sec Loss 3.8164 LearningRate 0.0001 Epoch: 27 Global Step: 563970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:24,061-Speed 6310.99 samples/sec Loss 3.8332 LearningRate 0.0001 Epoch: 27 Global Step: 563980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:27,319-Speed 6287.21 samples/sec Loss 3.8840 LearningRate 0.0001 Epoch: 27 Global Step: 563990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:30,565-Speed 6309.69 samples/sec Loss 3.8553 LearningRate 0.0001 Epoch: 27 Global Step: 564000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:33,810-Speed 6314.10 samples/sec Loss 3.7872 LearningRate 0.0001 Epoch: 27 Global Step: 564010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:37,054-Speed 6313.25 samples/sec 
Loss 3.8128 LearningRate 0.0001 Epoch: 27 Global Step: 564020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:40,306-Speed 6300.02 samples/sec Loss 3.8158 LearningRate 0.0001 Epoch: 27 Global Step: 564030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:43,549-Speed 6316.15 samples/sec Loss 3.7998 LearningRate 0.0001 Epoch: 27 Global Step: 564040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:46,794-Speed 6312.66 samples/sec Loss 3.7617 LearningRate 0.0001 Epoch: 27 Global Step: 564050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:50,046-Speed 6299.09 samples/sec Loss 3.8067 LearningRate 0.0001 Epoch: 27 Global Step: 564060 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:02:53,276-Speed 6342.24 samples/sec Loss 3.8136 LearningRate 0.0001 Epoch: 27 Global Step: 564070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:56,526-Speed 6303.17 samples/sec Loss 3.8414 LearningRate 0.0001 Epoch: 27 Global Step: 564080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:02:59,771-Speed 6313.24 samples/sec Loss 3.7745 LearningRate 0.0001 Epoch: 27 Global Step: 564090 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:03,018-Speed 6307.86 samples/sec Loss 3.8266 LearningRate 0.0001 Epoch: 27 Global Step: 564100 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:06,264-Speed 6311.01 samples/sec Loss 3.7661 LearningRate 0.0001 Epoch: 27 Global Step: 564110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:09,509-Speed 6312.52 samples/sec Loss 3.7504 LearningRate 0.0001 Epoch: 27 Global Step: 564120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:12,803-Speed 6218.55 samples/sec Loss 3.8325 LearningRate 0.0001 Epoch: 27 Global Step: 564130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:16,072-Speed 6265.97 samples/sec Loss 3.7426 LearningRate 0.0001 Epoch: 
27 Global Step: 564140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:19,318-Speed 6311.09 samples/sec Loss 3.8284 LearningRate 0.0001 Epoch: 27 Global Step: 564150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:22,568-Speed 6302.65 samples/sec Loss 3.8377 LearningRate 0.0001 Epoch: 27 Global Step: 564160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:25,813-Speed 6313.98 samples/sec Loss 3.7458 LearningRate 0.0001 Epoch: 27 Global Step: 564170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:03:29,064-Speed 6301.10 samples/sec Loss 3.8427 LearningRate 0.0001 Epoch: 27 Global Step: 564180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:03:32,311-Speed 6308.81 samples/sec Loss 3.8440 LearningRate 0.0001 Epoch: 27 Global Step: 564190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:03:35,554-Speed 6316.73 samples/sec Loss 3.7632 LearningRate 0.0001 Epoch: 27 Global Step: 564200 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:03:38,783-Speed 6343.85 samples/sec Loss 3.8511 LearningRate 0.0001 Epoch: 27 Global Step: 564210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:42,031-Speed 6306.82 samples/sec Loss 3.7690 LearningRate 0.0001 Epoch: 27 Global Step: 564220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:45,277-Speed 6310.84 samples/sec Loss 3.8112 LearningRate 0.0001 Epoch: 27 Global Step: 564230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:48,523-Speed 6309.77 samples/sec Loss 3.7354 LearningRate 0.0001 Epoch: 27 Global Step: 564240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:51,772-Speed 6305.68 samples/sec Loss 3.7639 LearningRate 0.0001 Epoch: 27 Global Step: 564250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:03:55,019-Speed 6309.08 samples/sec Loss 3.7866 LearningRate 0.0001 Epoch: 27 Global Step: 564260 Fp16 Grad Scale: 
8192 Required: 24 hours Training: 2022-04-02 19:03:58,272-Speed 6296.51 samples/sec Loss 3.8370 LearningRate 0.0001 Epoch: 27 Global Step: 564270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:01,524-Speed 6299.48 samples/sec Loss 3.8112 LearningRate 0.0001 Epoch: 27 Global Step: 564280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:04,767-Speed 6316.53 samples/sec Loss 3.8202 LearningRate 0.0001 Epoch: 27 Global Step: 564290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:08,013-Speed 6310.04 samples/sec Loss 3.8005 LearningRate 0.0001 Epoch: 27 Global Step: 564300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:11,264-Speed 6300.49 samples/sec Loss 3.8231 LearningRate 0.0001 Epoch: 27 Global Step: 564310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:04:14,508-Speed 6315.17 samples/sec Loss 3.8713 LearningRate 0.0001 Epoch: 27 Global Step: 564320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:04:17,741-Speed 6336.85 samples/sec Loss 3.8412 LearningRate 0.0001 Epoch: 27 Global Step: 564330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:20,983-Speed 6317.80 samples/sec Loss 3.7982 LearningRate 0.0001 Epoch: 27 Global Step: 564340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:24,229-Speed 6310.36 samples/sec Loss 3.8085 LearningRate 0.0001 Epoch: 27 Global Step: 564350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:27,472-Speed 6316.33 samples/sec Loss 3.8286 LearningRate 0.0001 Epoch: 27 Global Step: 564360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:30,721-Speed 6305.27 samples/sec Loss 3.8274 LearningRate 0.0001 Epoch: 27 Global Step: 564370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:33,964-Speed 6317.08 samples/sec Loss 3.7577 LearningRate 0.0001 Epoch: 27 Global Step: 564380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 19:04:37,208-Speed 6315.15 samples/sec Loss 3.8436 LearningRate 0.0001 Epoch: 27 Global Step: 564390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:40,454-Speed 6311.39 samples/sec Loss 3.7736 LearningRate 0.0001 Epoch: 27 Global Step: 564400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:43,710-Speed 6290.77 samples/sec Loss 3.7836 LearningRate 0.0001 Epoch: 27 Global Step: 564410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:46,953-Speed 6316.90 samples/sec Loss 3.7840 LearningRate 0.0001 Epoch: 27 Global Step: 564420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:04:50,207-Speed 6295.03 samples/sec Loss 3.8011 LearningRate 0.0001 Epoch: 27 Global Step: 564430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:04:53,452-Speed 6311.86 samples/sec Loss 3.8041 LearningRate 0.0001 Epoch: 27 Global Step: 564440 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:04:56,697-Speed 6312.76 samples/sec Loss 3.7914 LearningRate 0.0001 Epoch: 27 Global Step: 564450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:04:59,926-Speed 6344.49 samples/sec Loss 3.8278 LearningRate 0.0001 Epoch: 27 Global Step: 564460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:03,170-Speed 6314.80 samples/sec Loss 3.8535 LearningRate 0.0001 Epoch: 27 Global Step: 564470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:06,415-Speed 6312.82 samples/sec Loss 3.8308 LearningRate 0.0001 Epoch: 27 Global Step: 564480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:09,660-Speed 6312.23 samples/sec Loss 3.7456 LearningRate 0.0001 Epoch: 27 Global Step: 564490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:12,907-Speed 6308.45 samples/sec Loss 3.8426 LearningRate 0.0001 Epoch: 27 Global Step: 564500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:16,152-Speed 6313.24 
samples/sec Loss 3.7082 LearningRate 0.0001 Epoch: 27 Global Step: 564510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:19,403-Speed 6300.81 samples/sec Loss 3.7755 LearningRate 0.0001 Epoch: 27 Global Step: 564520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:22,647-Speed 6314.52 samples/sec Loss 3.7824 LearningRate 0.0001 Epoch: 27 Global Step: 564530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:25,892-Speed 6312.38 samples/sec Loss 3.7929 LearningRate 0.0001 Epoch: 27 Global Step: 564540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:29,141-Speed 6305.14 samples/sec Loss 3.8518 LearningRate 0.0001 Epoch: 27 Global Step: 564550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:32,383-Speed 6317.35 samples/sec Loss 3.8371 LearningRate 0.0001 Epoch: 27 Global Step: 564560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:05:35,624-Speed 6321.43 samples/sec Loss 3.8235 LearningRate 0.0001 Epoch: 27 Global Step: 564570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:38,868-Speed 6315.18 samples/sec Loss 3.7660 LearningRate 0.0001 Epoch: 27 Global Step: 564580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:42,114-Speed 6310.33 samples/sec Loss 3.8697 LearningRate 0.0001 Epoch: 27 Global Step: 564590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:45,362-Speed 6308.59 samples/sec Loss 3.8359 LearningRate 0.0001 Epoch: 27 Global Step: 564600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:48,606-Speed 6314.54 samples/sec Loss 3.8108 LearningRate 0.0001 Epoch: 27 Global Step: 564610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:51,849-Speed 6316.03 samples/sec Loss 3.8030 LearningRate 0.0001 Epoch: 27 Global Step: 564620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:55,092-Speed 6315.64 samples/sec Loss 3.8096 LearningRate 
0.0001 Epoch: 27 Global Step: 564630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:05:58,339-Speed 6309.77 samples/sec Loss 3.8073 LearningRate 0.0001 Epoch: 27 Global Step: 564640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:01,586-Speed 6308.94 samples/sec Loss 3.8149 LearningRate 0.0001 Epoch: 27 Global Step: 564650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:04,830-Speed 6313.90 samples/sec Loss 3.8499 LearningRate 0.0001 Epoch: 27 Global Step: 564660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:08,082-Speed 6300.23 samples/sec Loss 3.7922 LearningRate 0.0001 Epoch: 27 Global Step: 564670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:06:11,313-Speed 6338.42 samples/sec Loss 3.8106 LearningRate 0.0001 Epoch: 27 Global Step: 564680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:14,557-Speed 6315.54 samples/sec Loss 3.8127 LearningRate 0.0001 Epoch: 27 Global Step: 564690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:17,800-Speed 6316.65 samples/sec Loss 3.8683 LearningRate 0.0001 Epoch: 27 Global Step: 564700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:21,046-Speed 6311.27 samples/sec Loss 3.7830 LearningRate 0.0001 Epoch: 27 Global Step: 564710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:24,294-Speed 6306.37 samples/sec Loss 3.8431 LearningRate 0.0001 Epoch: 27 Global Step: 564720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:27,539-Speed 6312.17 samples/sec Loss 3.8368 LearningRate 0.0001 Epoch: 27 Global Step: 564730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:30,789-Speed 6303.45 samples/sec Loss 3.7898 LearningRate 0.0001 Epoch: 27 Global Step: 564740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:34,034-Speed 6312.59 samples/sec Loss 3.8067 LearningRate 0.0001 Epoch: 27 Global Step: 564750 Fp16 
Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:37,280-Speed 6310.17 samples/sec Loss 3.7762 LearningRate 0.0001 Epoch: 27 Global Step: 564760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:40,522-Speed 6318.08 samples/sec Loss 3.7568 LearningRate 0.0001 Epoch: 27 Global Step: 564770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:06:43,765-Speed 6316.30 samples/sec Loss 3.8292 LearningRate 0.0001 Epoch: 27 Global Step: 564780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:06:47,014-Speed 6305.05 samples/sec Loss 3.7668 LearningRate 0.0001 Epoch: 27 Global Step: 564790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:06:50,263-Speed 6305.91 samples/sec Loss 3.7198 LearningRate 0.0001 Epoch: 27 Global Step: 564800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:06:53,511-Speed 6308.21 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 27 Global Step: 564810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:06:56,766-Speed 6292.89 samples/sec Loss 3.8324 LearningRate 0.0001 Epoch: 27 Global Step: 564820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:00,015-Speed 6303.99 samples/sec Loss 3.8008 LearningRate 0.0001 Epoch: 27 Global Step: 564830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:03,263-Speed 6307.34 samples/sec Loss 3.8336 LearningRate 0.0001 Epoch: 27 Global Step: 564840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:06,506-Speed 6316.59 samples/sec Loss 3.7931 LearningRate 0.0001 Epoch: 27 Global Step: 564850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:09,750-Speed 6314.97 samples/sec Loss 3.7452 LearningRate 0.0001 Epoch: 27 Global Step: 564860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:12,994-Speed 6314.12 samples/sec Loss 3.8057 LearningRate 0.0001 Epoch: 27 Global Step: 564870 Fp16 Grad Scale: 8192 Required: 24 
hours Training: 2022-04-02 19:07:16,240-Speed 6311.22 samples/sec Loss 3.8862 LearningRate 0.0001 Epoch: 27 Global Step: 564880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:19,486-Speed 6309.51 samples/sec Loss 3.7792 LearningRate 0.0001 Epoch: 27 Global Step: 564890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:22,729-Speed 6318.02 samples/sec Loss 3.8181 LearningRate 0.0001 Epoch: 27 Global Step: 564900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:25,975-Speed 6309.47 samples/sec Loss 3.7951 LearningRate 0.0001 Epoch: 27 Global Step: 564910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:29,218-Speed 6317.27 samples/sec Loss 3.8639 LearningRate 0.0001 Epoch: 27 Global Step: 564920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:32,464-Speed 6309.46 samples/sec Loss 3.7869 LearningRate 0.0001 Epoch: 27 Global Step: 564930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:35,709-Speed 6313.29 samples/sec Loss 3.8077 LearningRate 0.0001 Epoch: 27 Global Step: 564940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:38,962-Speed 6297.66 samples/sec Loss 3.8246 LearningRate 0.0001 Epoch: 27 Global Step: 564950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:42,210-Speed 6306.85 samples/sec Loss 3.8787 LearningRate 0.0001 Epoch: 27 Global Step: 564960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:07:45,452-Speed 6317.21 samples/sec Loss 3.7929 LearningRate 0.0001 Epoch: 27 Global Step: 564970 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:48,697-Speed 6312.46 samples/sec Loss 3.7731 LearningRate 0.0001 Epoch: 27 Global Step: 564980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:51,943-Speed 6312.68 samples/sec Loss 3.7719 LearningRate 0.0001 Epoch: 27 Global Step: 564990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 
19:07:55,191-Speed 6305.81 samples/sec Loss 3.7918 LearningRate 0.0001 Epoch: 27 Global Step: 565000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:07:58,439-Speed 6307.27 samples/sec Loss 3.7878 LearningRate 0.0001 Epoch: 27 Global Step: 565010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:08:01,682-Speed 6317.34 samples/sec Loss 3.8032 LearningRate 0.0001 Epoch: 27 Global Step: 565020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:08:04,916-Speed 6334.62 samples/sec Loss 3.7723 LearningRate 0.0001 Epoch: 27 Global Step: 565030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:08,161-Speed 6311.98 samples/sec Loss 3.7931 LearningRate 0.0001 Epoch: 27 Global Step: 565040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:11,413-Speed 6299.00 samples/sec Loss 3.8003 LearningRate 0.0001 Epoch: 27 Global Step: 565050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:14,658-Speed 6313.78 samples/sec Loss 3.7833 LearningRate 0.0001 Epoch: 27 Global Step: 565060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:17,904-Speed 6309.02 samples/sec Loss 3.8070 LearningRate 0.0001 Epoch: 27 Global Step: 565070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:21,152-Speed 6307.28 samples/sec Loss 3.8369 LearningRate 0.0001 Epoch: 27 Global Step: 565080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:24,395-Speed 6317.62 samples/sec Loss 3.8038 LearningRate 0.0001 Epoch: 27 Global Step: 565090 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:27,641-Speed 6309.38 samples/sec Loss 3.7724 LearningRate 0.0001 Epoch: 27 Global Step: 565100 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:30,889-Speed 6307.58 samples/sec Loss 3.8627 LearningRate 0.0001 Epoch: 27 Global Step: 565110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:34,138-Speed 6304.94 samples/sec 
Loss 3.7847 LearningRate 0.0001 Epoch: 27 Global Step: 565120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:37,381-Speed 6316.42 samples/sec Loss 3.8321 LearningRate 0.0001 Epoch: 27 Global Step: 565130 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:08:40,628-Speed 6309.61 samples/sec Loss 3.7698 LearningRate 0.0001 Epoch: 27 Global Step: 565140 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:08:43,873-Speed 6311.38 samples/sec Loss 3.8722 LearningRate 0.0001 Epoch: 27 Global Step: 565150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:08:47,104-Speed 6340.20 samples/sec Loss 3.8210 LearningRate 0.0001 Epoch: 27 Global Step: 565160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:50,356-Speed 6299.68 samples/sec Loss 3.7627 LearningRate 0.0001 Epoch: 27 Global Step: 565170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:53,598-Speed 6317.35 samples/sec Loss 3.6966 LearningRate 0.0001 Epoch: 27 Global Step: 565180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:08:56,844-Speed 6311.60 samples/sec Loss 3.8270 LearningRate 0.0001 Epoch: 27 Global Step: 565190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:00,095-Speed 6300.13 samples/sec Loss 3.8388 LearningRate 0.0001 Epoch: 27 Global Step: 565200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:03,337-Speed 6318.75 samples/sec Loss 3.8583 LearningRate 0.0001 Epoch: 27 Global Step: 565210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:06,584-Speed 6309.88 samples/sec Loss 3.8001 LearningRate 0.0001 Epoch: 27 Global Step: 565220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:09,828-Speed 6314.58 samples/sec Loss 3.8120 LearningRate 0.0001 Epoch: 27 Global Step: 565230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:13,094-Speed 6271.52 samples/sec Loss 3.8046 LearningRate 0.0001 Epoch: 
27 Global Step: 565240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:16,427-Speed 6147.04 samples/sec Loss 3.8451 LearningRate 0.0001 Epoch: 27 Global Step: 565250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:19,671-Speed 6314.18 samples/sec Loss 3.8186 LearningRate 0.0001 Epoch: 27 Global Step: 565260 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:09:22,918-Speed 6309.81 samples/sec Loss 3.7789 LearningRate 0.0001 Epoch: 27 Global Step: 565270 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:09:26,162-Speed 6313.50 samples/sec Loss 3.7718 LearningRate 0.0001 Epoch: 27 Global Step: 565280 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:09:29,394-Speed 6338.42 samples/sec Loss 3.8307 LearningRate 0.0001 Epoch: 27 Global Step: 565290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:32,640-Speed 6309.53 samples/sec Loss 3.8263 LearningRate 0.0001 Epoch: 27 Global Step: 565300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:35,883-Speed 6316.60 samples/sec Loss 3.7602 LearningRate 0.0001 Epoch: 27 Global Step: 565310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:39,130-Speed 6308.81 samples/sec Loss 3.8372 LearningRate 0.0001 Epoch: 27 Global Step: 565320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:42,378-Speed 6307.60 samples/sec Loss 3.7692 LearningRate 0.0001 Epoch: 27 Global Step: 565330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:45,622-Speed 6315.18 samples/sec Loss 3.8170 LearningRate 0.0001 Epoch: 27 Global Step: 565340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:48,866-Speed 6314.60 samples/sec Loss 3.7340 LearningRate 0.0001 Epoch: 27 Global Step: 565350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:52,114-Speed 6305.53 samples/sec Loss 3.8028 LearningRate 0.0001 Epoch: 27 Global Step: 565360 Fp16 Grad Scale: 
8192 Required: 24 hours Training: 2022-04-02 19:09:55,358-Speed 6316.33 samples/sec Loss 3.8201 LearningRate 0.0001 Epoch: 27 Global Step: 565370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:09:58,608-Speed 6301.97 samples/sec Loss 3.8325 LearningRate 0.0001 Epoch: 27 Global Step: 565380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:01,842-Speed 6334.03 samples/sec Loss 3.7889 LearningRate 0.0001 Epoch: 27 Global Step: 565390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:05,089-Speed 6308.95 samples/sec Loss 3.8732 LearningRate 0.0001 Epoch: 27 Global Step: 565400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:08,337-Speed 6306.66 samples/sec Loss 3.7652 LearningRate 0.0001 Epoch: 27 Global Step: 565410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:11,581-Speed 6314.98 samples/sec Loss 3.8558 LearningRate 0.0001 Epoch: 27 Global Step: 565420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:14,825-Speed 6315.16 samples/sec Loss 3.7799 LearningRate 0.0001 Epoch: 27 Global Step: 565430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:18,074-Speed 6305.24 samples/sec Loss 3.8324 LearningRate 0.0001 Epoch: 27 Global Step: 565440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:21,317-Speed 6317.02 samples/sec Loss 3.8169 LearningRate 0.0001 Epoch: 27 Global Step: 565450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:24,564-Speed 6308.57 samples/sec Loss 3.8008 LearningRate 0.0001 Epoch: 27 Global Step: 565460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:27,814-Speed 6302.34 samples/sec Loss 3.8117 LearningRate 0.0001 Epoch: 27 Global Step: 565470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:31,056-Speed 6319.14 samples/sec Loss 3.8218 LearningRate 0.0001 Epoch: 27 Global Step: 565480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 19:10:34,301-Speed 6312.95 samples/sec Loss 3.7282 LearningRate 0.0001 Epoch: 27 Global Step: 565490 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:10:37,546-Speed 6312.79 samples/sec Loss 3.8042 LearningRate 0.0001 Epoch: 27 Global Step: 565500 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:10:40,781-Speed 6332.67 samples/sec Loss 3.7885 LearningRate 0.0001 Epoch: 27 Global Step: 565510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:44,028-Speed 6307.71 samples/sec Loss 3.8068 LearningRate 0.0001 Epoch: 27 Global Step: 565520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:47,271-Speed 6317.25 samples/sec Loss 3.7787 LearningRate 0.0001 Epoch: 27 Global Step: 565530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:50,513-Speed 6318.46 samples/sec Loss 3.8087 LearningRate 0.0001 Epoch: 27 Global Step: 565540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:53,758-Speed 6311.58 samples/sec Loss 3.7160 LearningRate 0.0001 Epoch: 27 Global Step: 565550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:10:57,003-Speed 6312.38 samples/sec Loss 3.8126 LearningRate 0.0001 Epoch: 27 Global Step: 565560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:00,248-Speed 6313.38 samples/sec Loss 3.7189 LearningRate 0.0001 Epoch: 27 Global Step: 565570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:03,493-Speed 6312.91 samples/sec Loss 3.7655 LearningRate 0.0001 Epoch: 27 Global Step: 565580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:06,735-Speed 6318.07 samples/sec Loss 3.8227 LearningRate 0.0001 Epoch: 27 Global Step: 565590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:09,983-Speed 6308.06 samples/sec Loss 3.7893 LearningRate 0.0001 Epoch: 27 Global Step: 565600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:13,234-Speed 6300.21 
samples/sec Loss 3.8127 LearningRate 0.0001 Epoch: 27 Global Step: 565610 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:11:16,477-Speed 6317.62 samples/sec Loss 3.8329 LearningRate 0.0001 Epoch: 27 Global Step: 565620 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:11:19,724-Speed 6308.22 samples/sec Loss 3.8315 LearningRate 0.0001 Epoch: 27 Global Step: 565630 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:11:22,968-Speed 6315.88 samples/sec Loss 3.8386 LearningRate 0.0001 Epoch: 27 Global Step: 565640 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:11:26,203-Speed 6332.25 samples/sec Loss 3.7606 LearningRate 0.0001 Epoch: 27 Global Step: 565650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:29,446-Speed 6315.60 samples/sec Loss 3.7803 LearningRate 0.0001 Epoch: 27 Global Step: 565660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:32,692-Speed 6310.86 samples/sec Loss 3.7860 LearningRate 0.0001 Epoch: 27 Global Step: 565670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:35,938-Speed 6310.70 samples/sec Loss 3.8955 LearningRate 0.0001 Epoch: 27 Global Step: 565680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:39,184-Speed 6311.15 samples/sec Loss 3.8290 LearningRate 0.0001 Epoch: 27 Global Step: 565690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:42,428-Speed 6315.42 samples/sec Loss 3.7723 LearningRate 0.0001 Epoch: 27 Global Step: 565700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:45,671-Speed 6316.30 samples/sec Loss 3.8904 LearningRate 0.0001 Epoch: 27 Global Step: 565710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:48,915-Speed 6314.82 samples/sec Loss 3.7642 LearningRate 0.0001 Epoch: 27 Global Step: 565720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:52,159-Speed 6313.50 samples/sec Loss 3.8064 LearningRate 
0.0001 Epoch: 27 Global Step: 565730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:55,410-Speed 6301.85 samples/sec Loss 3.7029 LearningRate 0.0001 Epoch: 27 Global Step: 565740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:11:58,654-Speed 6313.29 samples/sec Loss 3.7925 LearningRate 0.0001 Epoch: 27 Global Step: 565750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:01,898-Speed 6315.12 samples/sec Loss 3.8370 LearningRate 0.0001 Epoch: 27 Global Step: 565760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:05,143-Speed 6313.43 samples/sec Loss 3.8183 LearningRate 0.0001 Epoch: 27 Global Step: 565770 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:08,388-Speed 6311.83 samples/sec Loss 3.7982 LearningRate 0.0001 Epoch: 27 Global Step: 565780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:11,633-Speed 6313.43 samples/sec Loss 3.7391 LearningRate 0.0001 Epoch: 27 Global Step: 565790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:14,881-Speed 6305.31 samples/sec Loss 3.8182 LearningRate 0.0001 Epoch: 27 Global Step: 565800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:18,128-Speed 6309.13 samples/sec Loss 3.8121 LearningRate 0.0001 Epoch: 27 Global Step: 565810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:21,372-Speed 6314.58 samples/sec Loss 3.7649 LearningRate 0.0001 Epoch: 27 Global Step: 565820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:24,616-Speed 6314.34 samples/sec Loss 3.8148 LearningRate 0.0001 Epoch: 27 Global Step: 565830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:27,859-Speed 6316.64 samples/sec Loss 3.7753 LearningRate 0.0001 Epoch: 27 Global Step: 565840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:31,094-Speed 6333.54 samples/sec Loss 3.7841 LearningRate 0.0001 Epoch: 27 Global Step: 
565850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:34,339-Speed 6312.11 samples/sec Loss 3.8108 LearningRate 0.0001 Epoch: 27 Global Step: 565860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:37,590-Speed 6300.74 samples/sec Loss 3.8166 LearningRate 0.0001 Epoch: 27 Global Step: 565870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:12:40,822-Speed 6339.67 samples/sec Loss 3.8475 LearningRate 0.0001 Epoch: 27 Global Step: 565880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:12:44,064-Speed 6317.18 samples/sec Loss 3.8499 LearningRate 0.0001 Epoch: 27 Global Step: 565890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:12:47,310-Speed 6310.60 samples/sec Loss 3.7856 LearningRate 0.0001 Epoch: 27 Global Step: 565900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:12:50,561-Speed 6300.89 samples/sec Loss 3.8353 LearningRate 0.0001 Epoch: 27 Global Step: 565910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:12:53,808-Speed 6309.53 samples/sec Loss 3.8493 LearningRate 0.0001 Epoch: 27 Global Step: 565920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:12:57,050-Speed 6317.14 samples/sec Loss 3.7794 LearningRate 0.0001 Epoch: 27 Global Step: 565930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:00,299-Speed 6306.01 samples/sec Loss 3.7618 LearningRate 0.0001 Epoch: 27 Global Step: 565940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:03,546-Speed 6309.02 samples/sec Loss 3.8730 LearningRate 0.0001 Epoch: 27 Global Step: 565950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:06,787-Speed 6319.61 samples/sec Loss 3.7881 LearningRate 0.0001 Epoch: 27 Global Step: 565960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:10,030-Speed 6316.76 samples/sec Loss 3.8278 LearningRate 0.0001 Epoch: 27 Global Step: 565970 Fp16 Grad Scale: 8192 Required: 
24 hours Training: 2022-04-02 19:13:13,272-Speed 6319.13 samples/sec Loss 3.8523 LearningRate 0.0001 Epoch: 27 Global Step: 565980 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:16,523-Speed 6300.49 samples/sec Loss 3.7889 LearningRate 0.0001 Epoch: 27 Global Step: 565990 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:19,768-Speed 6312.05 samples/sec Loss 3.8371 LearningRate 0.0001 Epoch: 27 Global Step: 566000 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:23,015-Speed 6308.98 samples/sec Loss 3.8190 LearningRate 0.0001 Epoch: 27 Global Step: 566010 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:26,258-Speed 6316.60 samples/sec Loss 3.7925 LearningRate 0.0001 Epoch: 27 Global Step: 566020 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:29,510-Speed 6298.67 samples/sec Loss 3.7376 LearningRate 0.0001 Epoch: 27 Global Step: 566030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:32,758-Speed 6307.53 samples/sec Loss 3.7762 LearningRate 0.0001 Epoch: 27 Global Step: 566040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:36,061-Speed 6202.22 samples/sec Loss 3.7952 LearningRate 0.0001 Epoch: 27 Global Step: 566050 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:13:39,309-Speed 6307.10 samples/sec Loss 3.8160 LearningRate 0.0001 Epoch: 27 Global Step: 566060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:42,554-Speed 6313.25 samples/sec Loss 3.7895 LearningRate 0.0001 Epoch: 27 Global Step: 566070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:45,795-Speed 6320.44 samples/sec Loss 3.8399 LearningRate 0.0001 Epoch: 27 Global Step: 566080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:49,048-Speed 6297.48 samples/sec Loss 3.7693 LearningRate 0.0001 Epoch: 27 Global Step: 566090 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:13:52,292-Speed 6313.93 samples/sec Loss 3.7541 LearningRate 0.0001 Epoch: 27 Global Step: 566100 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:55,538-Speed 6310.91 samples/sec Loss 3.7847 LearningRate 0.0001 Epoch: 27 Global Step: 566110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:13:58,781-Speed 6315.38 samples/sec Loss 3.8102 LearningRate 0.0001 Epoch: 27 Global Step: 566120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:02,026-Speed 6314.22 samples/sec Loss 3.8647 LearningRate 0.0001 Epoch: 27 Global Step: 566130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:05,274-Speed 6306.55 samples/sec Loss 3.8257 LearningRate 0.0001 Epoch: 27 Global Step: 566140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:08,518-Speed 6314.21 samples/sec Loss 3.7532 LearningRate 0.0001 Epoch: 27 Global Step: 566150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:11,763-Speed 6311.45 samples/sec Loss 3.8256 LearningRate 0.0001 Epoch: 27 Global Step: 566160 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:14:15,010-Speed 6309.58 samples/sec Loss 3.7866 LearningRate 0.0001 Epoch: 27 Global Step: 566170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:14:18,260-Speed 6303.72 samples/sec Loss 3.6933 LearningRate 0.0001 Epoch: 27 Global Step: 566180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:14:21,505-Speed 6312.13 samples/sec Loss 3.8397 LearningRate 0.0001 Epoch: 27 Global Step: 566190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:14:24,735-Speed 6340.81 samples/sec Loss 3.8530 LearningRate 0.0001 Epoch: 27 Global Step: 566200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:27,983-Speed 6306.79 samples/sec Loss 3.8157 LearningRate 0.0001 Epoch: 27 Global Step: 566210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:31,224-Speed 6321.79 samples/sec 
Loss 3.7994 LearningRate 0.0001 Epoch: 27 Global Step: 566220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:34,494-Speed 6264.02 samples/sec Loss 3.7531 LearningRate 0.0001 Epoch: 27 Global Step: 566230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:37,741-Speed 6307.93 samples/sec Loss 3.7899 LearningRate 0.0001 Epoch: 27 Global Step: 566240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:41,082-Speed 6132.82 samples/sec Loss 3.7805 LearningRate 0.0001 Epoch: 27 Global Step: 566250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:44,326-Speed 6314.35 samples/sec Loss 3.7264 LearningRate 0.0001 Epoch: 27 Global Step: 566260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:47,578-Speed 6298.97 samples/sec Loss 3.8194 LearningRate 0.0001 Epoch: 27 Global Step: 566270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:50,822-Speed 6313.98 samples/sec Loss 3.8162 LearningRate 0.0001 Epoch: 27 Global Step: 566280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:54,063-Speed 6320.77 samples/sec Loss 3.8181 LearningRate 0.0001 Epoch: 27 Global Step: 566290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:14:57,312-Speed 6305.53 samples/sec Loss 3.8404 LearningRate 0.0001 Epoch: 27 Global Step: 566300 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:15:00,555-Speed 6316.39 samples/sec Loss 3.7395 LearningRate 0.0001 Epoch: 27 Global Step: 566310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:15:03,803-Speed 6307.87 samples/sec Loss 3.8037 LearningRate 0.0001 Epoch: 27 Global Step: 566320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:15:07,043-Speed 6321.30 samples/sec Loss 3.7833 LearningRate 0.0001 Epoch: 27 Global Step: 566330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:10,284-Speed 6320.76 samples/sec Loss 3.8249 LearningRate 0.0001 Epoch: 
27 Global Step: 566340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:13,527-Speed 6316.27 samples/sec Loss 3.8609 LearningRate 0.0001 Epoch: 27 Global Step: 566350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:16,775-Speed 6307.16 samples/sec Loss 3.7827 LearningRate 0.0001 Epoch: 27 Global Step: 566360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:20,023-Speed 6307.41 samples/sec Loss 3.7690 LearningRate 0.0001 Epoch: 27 Global Step: 566370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:23,267-Speed 6313.66 samples/sec Loss 3.8162 LearningRate 0.0001 Epoch: 27 Global Step: 566380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:26,513-Speed 6310.17 samples/sec Loss 3.7925 LearningRate 0.0001 Epoch: 27 Global Step: 566390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:29,777-Speed 6276.92 samples/sec Loss 3.7358 LearningRate 0.0001 Epoch: 27 Global Step: 566400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:33,021-Speed 6313.38 samples/sec Loss 3.8276 LearningRate 0.0001 Epoch: 27 Global Step: 566410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:36,266-Speed 6313.15 samples/sec Loss 3.7274 LearningRate 0.0001 Epoch: 27 Global Step: 566420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:39,499-Speed 6335.81 samples/sec Loss 3.7705 LearningRate 0.0001 Epoch: 27 Global Step: 566430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:42,747-Speed 6307.86 samples/sec Loss 3.7751 LearningRate 0.0001 Epoch: 27 Global Step: 566440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:45,990-Speed 6316.46 samples/sec Loss 3.7401 LearningRate 0.0001 Epoch: 27 Global Step: 566450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:49,235-Speed 6312.42 samples/sec Loss 3.7164 LearningRate 0.0001 Epoch: 27 Global Step: 566460 Fp16 Grad Scale: 
8192 Required: 24 hours Training: 2022-04-02 19:15:52,481-Speed 6312.38 samples/sec Loss 3.8035 LearningRate 0.0001 Epoch: 27 Global Step: 566470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:55,729-Speed 6305.77 samples/sec Loss 3.8315 LearningRate 0.0001 Epoch: 27 Global Step: 566480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:15:58,974-Speed 6313.63 samples/sec Loss 3.8070 LearningRate 0.0001 Epoch: 27 Global Step: 566490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:02,220-Speed 6310.99 samples/sec Loss 3.8098 LearningRate 0.0001 Epoch: 27 Global Step: 566500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:05,467-Speed 6308.27 samples/sec Loss 3.7664 LearningRate 0.0001 Epoch: 27 Global Step: 566510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:08,711-Speed 6314.09 samples/sec Loss 3.8055 LearningRate 0.0001 Epoch: 27 Global Step: 566520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:11,956-Speed 6313.90 samples/sec Loss 3.7948 LearningRate 0.0001 Epoch: 27 Global Step: 566530 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:16:15,203-Speed 6307.04 samples/sec Loss 3.8279 LearningRate 0.0001 Epoch: 27 Global Step: 566540 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:16:18,450-Speed 6311.25 samples/sec Loss 3.7799 LearningRate 0.0001 Epoch: 27 Global Step: 566550 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:16:21,693-Speed 6316.62 samples/sec Loss 3.7964 LearningRate 0.0001 Epoch: 27 Global Step: 566560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:16:24,928-Speed 6331.90 samples/sec Loss 3.7313 LearningRate 0.0001 Epoch: 27 Global Step: 566570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:28,197-Speed 6265.78 samples/sec Loss 3.7924 LearningRate 0.0001 Epoch: 27 Global Step: 566580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 19:16:31,444-Speed 6310.51 samples/sec Loss 3.8062 LearningRate 0.0001 Epoch: 27 Global Step: 566590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:34,691-Speed 6307.32 samples/sec Loss 3.8012 LearningRate 0.0001 Epoch: 27 Global Step: 566600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:37,939-Speed 6307.39 samples/sec Loss 3.8485 LearningRate 0.0001 Epoch: 27 Global Step: 566610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:41,187-Speed 6306.23 samples/sec Loss 3.8099 LearningRate 0.0001 Epoch: 27 Global Step: 566620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:44,434-Speed 6309.50 samples/sec Loss 3.8119 LearningRate 0.0001 Epoch: 27 Global Step: 566630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:47,677-Speed 6316.22 samples/sec Loss 3.8398 LearningRate 0.0001 Epoch: 27 Global Step: 566640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:50,925-Speed 6306.41 samples/sec Loss 3.7856 LearningRate 0.0001 Epoch: 27 Global Step: 566650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:54,171-Speed 6312.23 samples/sec Loss 3.8102 LearningRate 0.0001 Epoch: 27 Global Step: 566660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:16:57,419-Speed 6306.57 samples/sec Loss 3.7671 LearningRate 0.0001 Epoch: 27 Global Step: 566670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:17:00,651-Speed 6339.48 samples/sec Loss 3.7445 LearningRate 0.0001 Epoch: 27 Global Step: 566680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:03,899-Speed 6306.82 samples/sec Loss 3.8382 LearningRate 0.0001 Epoch: 27 Global Step: 566690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:07,142-Speed 6314.64 samples/sec Loss 3.7546 LearningRate 0.0001 Epoch: 27 Global Step: 566700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:10,386-Speed 6315.71 
samples/sec Loss 3.8385 LearningRate 0.0001 Epoch: 27 Global Step: 566710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:13,631-Speed 6311.78 samples/sec Loss 3.7979 LearningRate 0.0001 Epoch: 27 Global Step: 566720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:16,876-Speed 6314.32 samples/sec Loss 3.8025 LearningRate 0.0001 Epoch: 27 Global Step: 566730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:20,119-Speed 6315.43 samples/sec Loss 3.7731 LearningRate 0.0001 Epoch: 27 Global Step: 566740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:23,363-Speed 6313.98 samples/sec Loss 3.7655 LearningRate 0.0001 Epoch: 27 Global Step: 566750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:26,611-Speed 6307.05 samples/sec Loss 3.8418 LearningRate 0.0001 Epoch: 27 Global Step: 566760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:29,855-Speed 6315.36 samples/sec Loss 3.7906 LearningRate 0.0001 Epoch: 27 Global Step: 566770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:33,102-Speed 6309.44 samples/sec Loss 3.7326 LearningRate 0.0001 Epoch: 27 Global Step: 566780 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:17:36,349-Speed 6308.53 samples/sec Loss 3.8075 LearningRate 0.0001 Epoch: 27 Global Step: 566790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:17:39,581-Speed 6337.25 samples/sec Loss 3.7761 LearningRate 0.0001 Epoch: 27 Global Step: 566800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:42,825-Speed 6313.37 samples/sec Loss 3.7678 LearningRate 0.0001 Epoch: 27 Global Step: 566810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:46,070-Speed 6313.41 samples/sec Loss 3.8073 LearningRate 0.0001 Epoch: 27 Global Step: 566820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:49,347-Speed 6251.59 samples/sec Loss 3.7627 LearningRate 
0.0001 Epoch: 27 Global Step: 566830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:52,594-Speed 6307.82 samples/sec Loss 3.7814 LearningRate 0.0001 Epoch: 27 Global Step: 566840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:55,839-Speed 6312.64 samples/sec Loss 3.7117 LearningRate 0.0001 Epoch: 27 Global Step: 566850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:17:59,090-Speed 6302.24 samples/sec Loss 3.8447 LearningRate 0.0001 Epoch: 27 Global Step: 566860 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:02,335-Speed 6311.61 samples/sec Loss 3.8369 LearningRate 0.0001 Epoch: 27 Global Step: 566870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:05,581-Speed 6312.22 samples/sec Loss 3.8333 LearningRate 0.0001 Epoch: 27 Global Step: 566880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:08,833-Speed 6297.76 samples/sec Loss 3.7100 LearningRate 0.0001 Epoch: 27 Global Step: 566890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:12,083-Speed 6303.42 samples/sec Loss 3.7332 LearningRate 0.0001 Epoch: 27 Global Step: 566900 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:18:15,343-Speed 6284.93 samples/sec Loss 3.8243 LearningRate 0.0001 Epoch: 27 Global Step: 566910 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:18:18,591-Speed 6307.29 samples/sec Loss 3.7500 LearningRate 0.0001 Epoch: 27 Global Step: 566920 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:18:21,822-Speed 6338.58 samples/sec Loss 3.8262 LearningRate 0.0001 Epoch: 27 Global Step: 566930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:25,067-Speed 6313.91 samples/sec Loss 3.7853 LearningRate 0.0001 Epoch: 27 Global Step: 566940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:28,316-Speed 6305.31 samples/sec Loss 3.7779 LearningRate 0.0001 Epoch: 27 Global Step: 566950 
Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:31,563-Speed 6308.00 samples/sec Loss 3.7627 LearningRate 0.0001 Epoch: 27 Global Step: 566960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:34,808-Speed 6313.43 samples/sec Loss 3.8048 LearningRate 0.0001 Epoch: 27 Global Step: 566970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:38,054-Speed 6310.62 samples/sec Loss 3.7845 LearningRate 0.0001 Epoch: 27 Global Step: 566980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:41,298-Speed 6312.63 samples/sec Loss 3.8122 LearningRate 0.0001 Epoch: 27 Global Step: 566990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:44,545-Speed 6310.25 samples/sec Loss 3.8139 LearningRate 0.0001 Epoch: 27 Global Step: 567000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:47,790-Speed 6312.98 samples/sec Loss 3.7809 LearningRate 0.0001 Epoch: 27 Global Step: 567010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:51,035-Speed 6311.18 samples/sec Loss 3.8377 LearningRate 0.0001 Epoch: 27 Global Step: 567020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:18:54,279-Speed 6314.60 samples/sec Loss 3.7898 LearningRate 0.0001 Epoch: 27 Global Step: 567030 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:18:57,524-Speed 6313.70 samples/sec Loss 3.8236 LearningRate 0.0001 Epoch: 27 Global Step: 567040 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:19:00,754-Speed 6342.26 samples/sec Loss 3.7661 LearningRate 0.0001 Epoch: 27 Global Step: 567050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:04,003-Speed 6303.71 samples/sec Loss 3.7829 LearningRate 0.0001 Epoch: 27 Global Step: 567060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:07,249-Speed 6312.64 samples/sec Loss 3.7462 LearningRate 0.0001 Epoch: 27 Global Step: 567070 Fp16 Grad Scale: 8192 Required: 24 hours 
Training: 2022-04-02 19:19:10,497-Speed 6306.98 samples/sec Loss 3.8446 LearningRate 0.0001 Epoch: 27 Global Step: 567080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:13,746-Speed 6305.13 samples/sec Loss 3.8335 LearningRate 0.0001 Epoch: 27 Global Step: 567090 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:16,993-Speed 6308.48 samples/sec Loss 3.7246 LearningRate 0.0001 Epoch: 27 Global Step: 567100 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:20,237-Speed 6314.98 samples/sec Loss 3.8044 LearningRate 0.0001 Epoch: 27 Global Step: 567110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:23,484-Speed 6309.20 samples/sec Loss 3.7580 LearningRate 0.0001 Epoch: 27 Global Step: 567120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:26,729-Speed 6312.34 samples/sec Loss 3.7456 LearningRate 0.0001 Epoch: 27 Global Step: 567130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:29,972-Speed 6317.11 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 27 Global Step: 567140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:33,217-Speed 6312.60 samples/sec Loss 3.8255 LearningRate 0.0001 Epoch: 27 Global Step: 567150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:19:36,459-Speed 6318.71 samples/sec Loss 3.8031 LearningRate 0.0001 Epoch: 27 Global Step: 567160 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:19:39,708-Speed 6304.15 samples/sec Loss 3.7643 LearningRate 0.0001 Epoch: 27 Global Step: 567170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:19:42,954-Speed 6310.09 samples/sec Loss 3.8049 LearningRate 0.0001 Epoch: 27 Global Step: 567180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:19:46,202-Speed 6306.74 samples/sec Loss 3.7692 LearningRate 0.0001 Epoch: 27 Global Step: 567190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 
19:19:49,434-Speed 6338.05 samples/sec Loss 3.7750 LearningRate 0.0001 Epoch: 27 Global Step: 567200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:52,677-Speed 6316.48 samples/sec Loss 3.7870 LearningRate 0.0001 Epoch: 27 Global Step: 567210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:55,921-Speed 6315.13 samples/sec Loss 3.8114 LearningRate 0.0001 Epoch: 27 Global Step: 567220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:19:59,170-Speed 6305.01 samples/sec Loss 3.7882 LearningRate 0.0001 Epoch: 27 Global Step: 567230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:02,426-Speed 6291.66 samples/sec Loss 3.7846 LearningRate 0.0001 Epoch: 27 Global Step: 567240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:05,671-Speed 6313.03 samples/sec Loss 3.8513 LearningRate 0.0001 Epoch: 27 Global Step: 567250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:08,913-Speed 6318.56 samples/sec Loss 3.8502 LearningRate 0.0001 Epoch: 27 Global Step: 567260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:12,166-Speed 6295.81 samples/sec Loss 3.8049 LearningRate 0.0001 Epoch: 27 Global Step: 567270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:15,414-Speed 6306.77 samples/sec Loss 3.7986 LearningRate 0.0001 Epoch: 27 Global Step: 567280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:18,658-Speed 6314.48 samples/sec Loss 3.7972 LearningRate 0.0001 Epoch: 27 Global Step: 567290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:21,906-Speed 6308.60 samples/sec Loss 3.7354 LearningRate 0.0001 Epoch: 27 Global Step: 567300 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:20:25,183-Speed 6251.32 samples/sec Loss 3.7741 LearningRate 0.0001 Epoch: 27 Global Step: 567310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:20:28,492-Speed 6190.46 samples/sec 
Loss 3.7597 LearningRate 0.0001 Epoch: 27 Global Step: 567320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:20:31,738-Speed 6309.75 samples/sec Loss 3.7305 LearningRate 0.0001 Epoch: 27 Global Step: 567330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:20:34,981-Speed 6315.77 samples/sec Loss 3.7685 LearningRate 0.0001 Epoch: 27 Global Step: 567340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:20:38,214-Speed 6337.44 samples/sec Loss 3.7984 LearningRate 0.0001 Epoch: 27 Global Step: 567350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:41,461-Speed 6308.31 samples/sec Loss 3.8135 LearningRate 0.0001 Epoch: 27 Global Step: 567360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:44,705-Speed 6314.75 samples/sec Loss 3.8599 LearningRate 0.0001 Epoch: 27 Global Step: 567370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:47,956-Speed 6301.21 samples/sec Loss 3.7994 LearningRate 0.0001 Epoch: 27 Global Step: 567380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:51,202-Speed 6310.48 samples/sec Loss 3.7690 LearningRate 0.0001 Epoch: 27 Global Step: 567390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:54,446-Speed 6313.79 samples/sec Loss 3.7843 LearningRate 0.0001 Epoch: 27 Global Step: 567400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:20:57,704-Speed 6288.76 samples/sec Loss 3.7169 LearningRate 0.0001 Epoch: 27 Global Step: 567410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:00,948-Speed 6314.45 samples/sec Loss 3.8044 LearningRate 0.0001 Epoch: 27 Global Step: 567420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:04,203-Speed 6293.17 samples/sec Loss 3.8056 LearningRate 0.0001 Epoch: 27 Global Step: 567430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:07,445-Speed 6318.74 samples/sec Loss 3.7569 LearningRate 0.0001 Epoch: 
27 Global Step: 567440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:10,692-Speed 6306.95 samples/sec Loss 3.7920 LearningRate 0.0001 Epoch: 27 Global Step: 567450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:21:13,939-Speed 6310.44 samples/sec Loss 3.8097 LearningRate 0.0001 Epoch: 27 Global Step: 567460 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:21:17,187-Speed 6305.88 samples/sec Loss 3.8036 LearningRate 0.0001 Epoch: 27 Global Step: 567470 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:21:20,421-Speed 6335.13 samples/sec Loss 3.7840 LearningRate 0.0001 Epoch: 27 Global Step: 567480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:23,688-Speed 6270.58 samples/sec Loss 3.7799 LearningRate 0.0001 Epoch: 27 Global Step: 567490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:26,933-Speed 6312.38 samples/sec Loss 3.7652 LearningRate 0.0001 Epoch: 27 Global Step: 567500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:30,177-Speed 6314.01 samples/sec Loss 3.7869 LearningRate 0.0001 Epoch: 27 Global Step: 567510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:33,422-Speed 6312.57 samples/sec Loss 3.7560 LearningRate 0.0001 Epoch: 27 Global Step: 567520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:36,696-Speed 6257.31 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 567530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:39,941-Speed 6313.16 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 567540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:43,187-Speed 6310.83 samples/sec Loss 3.7954 LearningRate 0.0001 Epoch: 27 Global Step: 567550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:46,433-Speed 6311.42 samples/sec Loss 3.8027 LearningRate 0.0001 Epoch: 27 Global Step: 567560 Fp16 Grad Scale: 
8192 Required: 24 hours Training: 2022-04-02 19:21:49,675-Speed 6317.92 samples/sec Loss 3.7646 LearningRate 0.0001 Epoch: 27 Global Step: 567570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:52,923-Speed 6306.79 samples/sec Loss 3.7899 LearningRate 0.0001 Epoch: 27 Global Step: 567580 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:21:56,157-Speed 6333.21 samples/sec Loss 3.8045 LearningRate 0.0001 Epoch: 27 Global Step: 567590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:21:59,403-Speed 6310.43 samples/sec Loss 3.8368 LearningRate 0.0001 Epoch: 27 Global Step: 567600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:02,650-Speed 6309.36 samples/sec Loss 3.8022 LearningRate 0.0001 Epoch: 27 Global Step: 567610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:05,894-Speed 6314.95 samples/sec Loss 3.7715 LearningRate 0.0001 Epoch: 27 Global Step: 567620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:09,139-Speed 6311.76 samples/sec Loss 3.7515 LearningRate 0.0001 Epoch: 27 Global Step: 567630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:12,386-Speed 6309.32 samples/sec Loss 3.7658 LearningRate 0.0001 Epoch: 27 Global Step: 567640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:15,632-Speed 6309.99 samples/sec Loss 3.8175 LearningRate 0.0001 Epoch: 27 Global Step: 567650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:18,904-Speed 6262.50 samples/sec Loss 3.7616 LearningRate 0.0001 Epoch: 27 Global Step: 567660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:22,151-Speed 6307.79 samples/sec Loss 3.7962 LearningRate 0.0001 Epoch: 27 Global Step: 567670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:25,395-Speed 6317.24 samples/sec Loss 3.7602 LearningRate 0.0001 Epoch: 27 Global Step: 567680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 
2022-04-02 19:22:28,625-Speed 6342.33 samples/sec Loss 3.7883 LearningRate 0.0001 Epoch: 27 Global Step: 567690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:31,871-Speed 6311.62 samples/sec Loss 3.7610 LearningRate 0.0001 Epoch: 27 Global Step: 567700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:35,116-Speed 6313.15 samples/sec Loss 3.8333 LearningRate 0.0001 Epoch: 27 Global Step: 567710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:38,358-Speed 6318.02 samples/sec Loss 3.7723 LearningRate 0.0001 Epoch: 27 Global Step: 567720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:41,609-Speed 6300.84 samples/sec Loss 3.7275 LearningRate 0.0001 Epoch: 27 Global Step: 567730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:44,851-Speed 6317.86 samples/sec Loss 3.7963 LearningRate 0.0001 Epoch: 27 Global Step: 567740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:48,101-Speed 6304.95 samples/sec Loss 3.8070 LearningRate 0.0001 Epoch: 27 Global Step: 567750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:51,350-Speed 6304.61 samples/sec Loss 3.7197 LearningRate 0.0001 Epoch: 27 Global Step: 567760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:54,594-Speed 6313.90 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 27 Global Step: 567770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:22:57,839-Speed 6312.43 samples/sec Loss 3.7852 LearningRate 0.0001 Epoch: 27 Global Step: 567780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:01,087-Speed 6306.52 samples/sec Loss 3.7651 LearningRate 0.0001 Epoch: 27 Global Step: 567790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:04,332-Speed 6313.02 samples/sec Loss 3.8501 LearningRate 0.0001 Epoch: 27 Global Step: 567800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:07,576-Speed 6315.46 
samples/sec Loss 3.7924 LearningRate 0.0001 Epoch: 27 Global Step: 567810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:10,818-Speed 6317.37 samples/sec Loss 3.7683 LearningRate 0.0001 Epoch: 27 Global Step: 567820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:14,046-Speed 6346.01 samples/sec Loss 3.7518 LearningRate 0.0001 Epoch: 27 Global Step: 567830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:17,288-Speed 6318.00 samples/sec Loss 3.7715 LearningRate 0.0001 Epoch: 27 Global Step: 567840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:20,535-Speed 6310.16 samples/sec Loss 3.8395 LearningRate 0.0001 Epoch: 27 Global Step: 567850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:23,788-Speed 6295.87 samples/sec Loss 3.7811 LearningRate 0.0001 Epoch: 27 Global Step: 567860 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:27,033-Speed 6312.52 samples/sec Loss 3.8021 LearningRate 0.0001 Epoch: 27 Global Step: 567870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:30,280-Speed 6308.90 samples/sec Loss 3.7964 LearningRate 0.0001 Epoch: 27 Global Step: 567880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:33,526-Speed 6311.47 samples/sec Loss 3.8477 LearningRate 0.0001 Epoch: 27 Global Step: 567890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:36,772-Speed 6310.49 samples/sec Loss 3.8142 LearningRate 0.0001 Epoch: 27 Global Step: 567900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:40,022-Speed 6302.67 samples/sec Loss 3.7710 LearningRate 0.0001 Epoch: 27 Global Step: 567910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:43,271-Speed 6305.14 samples/sec Loss 3.7730 LearningRate 0.0001 Epoch: 27 Global Step: 567920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:23:46,520-Speed 6305.79 samples/sec Loss 3.7771 LearningRate 
0.0001 Epoch: 27 Global Step: 567930 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:49,765-Speed 6311.65 samples/sec Loss 3.8195 LearningRate 0.0001 Epoch: 27 Global Step: 567940 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:53,013-Speed 6308.53 samples/sec Loss 3.7984 LearningRate 0.0001 Epoch: 27 Global Step: 567950 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:56,259-Speed 6309.40 samples/sec Loss 3.7684 LearningRate 0.0001 Epoch: 27 Global Step: 567960 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:23:59,503-Speed 6315.71 samples/sec Loss 3.8174 LearningRate 0.0001 Epoch: 27 Global Step: 567970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:02,750-Speed 6308.95 samples/sec Loss 3.7973 LearningRate 0.0001 Epoch: 27 Global Step: 567980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:05,993-Speed 6316.46 samples/sec Loss 3.7764 LearningRate 0.0001 Epoch: 27 Global Step: 567990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:09,234-Speed 6318.81 samples/sec Loss 3.8602 LearningRate 0.0001 Epoch: 27 Global Step: 568000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:12,479-Speed 6313.07 samples/sec Loss 3.8071 LearningRate 0.0001 Epoch: 27 Global Step: 568010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:15,730-Speed 6300.31 samples/sec Loss 3.8004 LearningRate 0.0001 Epoch: 27 Global Step: 568020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:18,977-Speed 6310.26 samples/sec Loss 3.7797 LearningRate 0.0001 Epoch: 27 Global Step: 568030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:22,225-Speed 6306.17 samples/sec Loss 3.8186 LearningRate 0.0001 Epoch: 27 Global Step: 568040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:25,471-Speed 6310.99 samples/sec Loss 3.8318 LearningRate 0.0001 Epoch: 27 Global Step: 568050 
Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:28,721-Speed 6303.54 samples/sec Loss 3.8160 LearningRate 0.0001 Epoch: 27 Global Step: 568060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:31,964-Speed 6315.13 samples/sec Loss 3.7642 LearningRate 0.0001 Epoch: 27 Global Step: 568070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:24:35,213-Speed 6305.88 samples/sec Loss 3.8354 LearningRate 0.0001 Epoch: 27 Global Step: 568080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:24:38,461-Speed 6305.94 samples/sec Loss 3.7893 LearningRate 0.0001 Epoch: 27 Global Step: 568090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:24:41,704-Speed 6316.90 samples/sec Loss 3.7353 LearningRate 0.0001 Epoch: 27 Global Step: 568100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:24:44,940-Speed 6329.12 samples/sec Loss 3.7839 LearningRate 0.0001 Epoch: 27 Global Step: 568110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:48,184-Speed 6314.68 samples/sec Loss 3.7887 LearningRate 0.0001 Epoch: 27 Global Step: 568120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:51,428-Speed 6316.10 samples/sec Loss 3.7413 LearningRate 0.0001 Epoch: 27 Global Step: 568130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:54,676-Speed 6305.51 samples/sec Loss 3.7956 LearningRate 0.0001 Epoch: 27 Global Step: 568140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:24:57,929-Speed 6297.92 samples/sec Loss 3.8207 LearningRate 0.0001 Epoch: 27 Global Step: 568150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:01,173-Speed 6314.83 samples/sec Loss 3.8208 LearningRate 0.0001 Epoch: 27 Global Step: 568160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:04,420-Speed 6309.52 samples/sec Loss 3.8152 LearningRate 0.0001 Epoch: 27 Global Step: 568170 Fp16 Grad Scale: 8192 Required: 24 
hours Training: 2022-04-02 19:25:07,668-Speed 6306.53 samples/sec Loss 3.7578 LearningRate 0.0001 Epoch: 27 Global Step: 568180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:10,913-Speed 6312.75 samples/sec Loss 3.7868 LearningRate 0.0001 Epoch: 27 Global Step: 568190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:14,157-Speed 6314.35 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 568200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:17,387-Speed 6342.84 samples/sec Loss 3.8303 LearningRate 0.0001 Epoch: 27 Global Step: 568210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:20,632-Speed 6312.57 samples/sec Loss 3.8042 LearningRate 0.0001 Epoch: 27 Global Step: 568220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:23,875-Speed 6315.75 samples/sec Loss 3.7840 LearningRate 0.0001 Epoch: 27 Global Step: 568230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:27,125-Speed 6303.45 samples/sec Loss 3.7708 LearningRate 0.0001 Epoch: 27 Global Step: 568240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:30,371-Speed 6311.34 samples/sec Loss 3.7219 LearningRate 0.0001 Epoch: 27 Global Step: 568250 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:33,620-Speed 6305.22 samples/sec Loss 3.8271 LearningRate 0.0001 Epoch: 27 Global Step: 568260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:36,868-Speed 6305.38 samples/sec Loss 3.7863 LearningRate 0.0001 Epoch: 27 Global Step: 568270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:40,115-Speed 6308.36 samples/sec Loss 3.8009 LearningRate 0.0001 Epoch: 27 Global Step: 568280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:43,399-Speed 6238.86 samples/sec Loss 3.7770 LearningRate 0.0001 Epoch: 27 Global Step: 568290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:25:46,646-Speed 6307.94 samples/sec Loss 3.7413 LearningRate 0.0001 Epoch: 27 Global Step: 568300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:25:49,990-Speed 6125.18 samples/sec Loss 3.7635 LearningRate 0.0001 Epoch: 27 Global Step: 568310 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:25:53,237-Speed 6309.10 samples/sec Loss 3.6947 LearningRate 0.0001 Epoch: 27 Global Step: 568320 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:25:56,485-Speed 6307.66 samples/sec Loss 3.6773 LearningRate 0.0001 Epoch: 27 Global Step: 568330 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:25:59,731-Speed 6310.67 samples/sec Loss 3.8063 LearningRate 0.0001 Epoch: 27 Global Step: 568340 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:26:02,974-Speed 6316.13 samples/sec Loss 3.7439 LearningRate 0.0001 Epoch: 27 Global Step: 568350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:06,220-Speed 6311.59 samples/sec Loss 3.7819 LearningRate 0.0001 Epoch: 27 Global Step: 568360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:09,469-Speed 6303.94 samples/sec Loss 3.7273 LearningRate 0.0001 Epoch: 27 Global Step: 568370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:12,716-Speed 6309.88 samples/sec Loss 3.7796 LearningRate 0.0001 Epoch: 27 Global Step: 568380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:15,967-Speed 6301.29 samples/sec Loss 3.8027 LearningRate 0.0001 Epoch: 27 Global Step: 568390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:19,228-Speed 6281.94 samples/sec Loss 3.8002 LearningRate 0.0001 Epoch: 27 Global Step: 568400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:22,472-Speed 6314.09 samples/sec Loss 3.7817 LearningRate 0.0001 Epoch: 27 Global Step: 568410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:25,718-Speed 6311.41 samples/sec 
Loss 3.8151 LearningRate 0.0001 Epoch: 27 Global Step: 568420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:28,964-Speed 6310.34 samples/sec Loss 3.7593 LearningRate 0.0001 Epoch: 27 Global Step: 568430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:32,216-Speed 6297.91 samples/sec Loss 3.7267 LearningRate 0.0001 Epoch: 27 Global Step: 568440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:35,462-Speed 6311.34 samples/sec Loss 3.7801 LearningRate 0.0001 Epoch: 27 Global Step: 568450 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:26:38,690-Speed 6344.69 samples/sec Loss 3.8080 LearningRate 0.0001 Epoch: 27 Global Step: 568460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:41,939-Speed 6305.67 samples/sec Loss 3.7216 LearningRate 0.0001 Epoch: 27 Global Step: 568470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:45,186-Speed 6309.89 samples/sec Loss 3.7143 LearningRate 0.0001 Epoch: 27 Global Step: 568480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:48,430-Speed 6312.72 samples/sec Loss 3.7525 LearningRate 0.0001 Epoch: 27 Global Step: 568490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:51,685-Speed 6293.99 samples/sec Loss 3.7725 LearningRate 0.0001 Epoch: 27 Global Step: 568500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:54,927-Speed 6318.98 samples/sec Loss 3.7958 LearningRate 0.0001 Epoch: 27 Global Step: 568510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:26:58,179-Speed 6300.09 samples/sec Loss 3.7864 LearningRate 0.0001 Epoch: 27 Global Step: 568520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:01,423-Speed 6313.27 samples/sec Loss 3.7378 LearningRate 0.0001 Epoch: 27 Global Step: 568530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:04,669-Speed 6311.57 samples/sec Loss 3.7282 LearningRate 0.0001 Epoch: 27 
Global Step: 568540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:07,916-Speed 6307.38 samples/sec Loss 3.7948 LearningRate 0.0001 Epoch: 27 Global Step: 568550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:11,161-Speed 6314.00 samples/sec Loss 3.7552 LearningRate 0.0001 Epoch: 27 Global Step: 568560 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:27:14,407-Speed 6310.48 samples/sec Loss 3.7882 LearningRate 0.0001 Epoch: 27 Global Step: 568570 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:27:17,639-Speed 6339.02 samples/sec Loss 3.7408 LearningRate 0.0001 Epoch: 27 Global Step: 568580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:20,884-Speed 6312.43 samples/sec Loss 3.7644 LearningRate 0.0001 Epoch: 27 Global Step: 568590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:24,130-Speed 6309.93 samples/sec Loss 3.8047 LearningRate 0.0001 Epoch: 27 Global Step: 568600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:27,377-Speed 6308.95 samples/sec Loss 3.7699 LearningRate 0.0001 Epoch: 27 Global Step: 568610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:30,624-Speed 6309.83 samples/sec Loss 3.7583 LearningRate 0.0001 Epoch: 27 Global Step: 568620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:33,870-Speed 6309.99 samples/sec Loss 3.8089 LearningRate 0.0001 Epoch: 27 Global Step: 568630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:37,117-Speed 6308.34 samples/sec Loss 3.8181 LearningRate 0.0001 Epoch: 27 Global Step: 568640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:40,365-Speed 6306.71 samples/sec Loss 3.8046 LearningRate 0.0001 Epoch: 27 Global Step: 568650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:43,614-Speed 6305.19 samples/sec Loss 3.7556 LearningRate 0.0001 Epoch: 27 Global Step: 568660 Fp16 Grad Scale: 8192 
Required: 24 hours Training: 2022-04-02 19:27:46,872-Speed 6287.32 samples/sec Loss 3.8304 LearningRate 0.0001 Epoch: 27 Global Step: 568670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:50,119-Speed 6309.55 samples/sec Loss 3.7909 LearningRate 0.0001 Epoch: 27 Global Step: 568680 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:27:53,355-Speed 6330.76 samples/sec Loss 3.7321 LearningRate 0.0001 Epoch: 27 Global Step: 568690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:56,614-Speed 6285.54 samples/sec Loss 3.7924 LearningRate 0.0001 Epoch: 27 Global Step: 568700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:27:59,872-Speed 6286.13 samples/sec Loss 3.7596 LearningRate 0.0001 Epoch: 27 Global Step: 568710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:03,119-Speed 6308.46 samples/sec Loss 3.7752 LearningRate 0.0001 Epoch: 27 Global Step: 568720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:06,368-Speed 6304.98 samples/sec Loss 3.7424 LearningRate 0.0001 Epoch: 27 Global Step: 568730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:09,614-Speed 6311.67 samples/sec Loss 3.7262 LearningRate 0.0001 Epoch: 27 Global Step: 568740 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:12,858-Speed 6313.67 samples/sec Loss 3.7948 LearningRate 0.0001 Epoch: 27 Global Step: 568750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:16,103-Speed 6312.44 samples/sec Loss 3.7982 LearningRate 0.0001 Epoch: 27 Global Step: 568760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:19,353-Speed 6303.43 samples/sec Loss 3.8197 LearningRate 0.0001 Epoch: 27 Global Step: 568770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:22,603-Speed 6304.06 samples/sec Loss 3.7778 LearningRate 0.0001 Epoch: 27 Global Step: 568780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:28:25,859-Speed 6292.25 samples/sec Loss 3.7589 LearningRate 0.0001 Epoch: 27 Global Step: 568790 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:29,106-Speed 6307.50 samples/sec Loss 3.8409 LearningRate 0.0001 Epoch: 27 Global Step: 568800 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:32,353-Speed 6310.27 samples/sec Loss 3.8751 LearningRate 0.0001 Epoch: 27 Global Step: 568810 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:35,597-Speed 6313.16 samples/sec Loss 3.7560 LearningRate 0.0001 Epoch: 27 Global Step: 568820 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:38,842-Speed 6313.18 samples/sec Loss 3.7562 LearningRate 0.0001 Epoch: 27 Global Step: 568830 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:42,091-Speed 6304.87 samples/sec Loss 3.7467 LearningRate 0.0001 Epoch: 27 Global Step: 568840 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:45,337-Speed 6311.19 samples/sec Loss 3.7771 LearningRate 0.0001 Epoch: 27 Global Step: 568850 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:48,586-Speed 6304.88 samples/sec Loss 3.7818 LearningRate 0.0001 Epoch: 27 Global Step: 568860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:28:51,818-Speed 6336.80 samples/sec Loss 3.7745 LearningRate 0.0001 Epoch: 27 Global Step: 568870 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:55,062-Speed 6315.86 samples/sec Loss 3.6535 LearningRate 0.0001 Epoch: 27 Global Step: 568880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:28:58,310-Speed 6306.70 samples/sec Loss 3.7874 LearningRate 0.0001 Epoch: 27 Global Step: 568890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:01,598-Speed 6228.98 samples/sec Loss 3.8092 LearningRate 0.0001 Epoch: 27 Global Step: 568900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:04,873-Speed 6255.02 
samples/sec Loss 3.7489 LearningRate 0.0001 Epoch: 27 Global Step: 568910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:08,122-Speed 6304.82 samples/sec Loss 3.8488 LearningRate 0.0001 Epoch: 27 Global Step: 568920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:11,373-Speed 6302.66 samples/sec Loss 3.7991 LearningRate 0.0001 Epoch: 27 Global Step: 568930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:14,634-Speed 6281.38 samples/sec Loss 3.7232 LearningRate 0.0001 Epoch: 27 Global Step: 568940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:17,882-Speed 6306.57 samples/sec Loss 3.7341 LearningRate 0.0001 Epoch: 27 Global Step: 568950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:21,130-Speed 6306.98 samples/sec Loss 3.8096 LearningRate 0.0001 Epoch: 27 Global Step: 568960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:24,363-Speed 6335.82 samples/sec Loss 3.7019 LearningRate 0.0001 Epoch: 27 Global Step: 568970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:27,610-Speed 6308.81 samples/sec Loss 3.7973 LearningRate 0.0001 Epoch: 27 Global Step: 568980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:30,867-Speed 6290.12 samples/sec Loss 3.7773 LearningRate 0.0001 Epoch: 27 Global Step: 568990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:34,117-Speed 6301.99 samples/sec Loss 3.7702 LearningRate 0.0001 Epoch: 27 Global Step: 569000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:37,367-Speed 6302.92 samples/sec Loss 3.7571 LearningRate 0.0001 Epoch: 27 Global Step: 569010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:40,622-Speed 6294.72 samples/sec Loss 3.8262 LearningRate 0.0001 Epoch: 27 Global Step: 569020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:43,897-Speed 6253.66 samples/sec Loss 3.8125 LearningRate 0.0001 
Epoch: 27 Global Step: 569030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:47,144-Speed 6308.69 samples/sec Loss 3.8085 LearningRate 0.0001 Epoch: 27 Global Step: 569040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:50,390-Speed 6310.55 samples/sec Loss 3.7152 LearningRate 0.0001 Epoch: 27 Global Step: 569050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:53,643-Speed 6297.50 samples/sec Loss 3.7778 LearningRate 0.0001 Epoch: 27 Global Step: 569060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:29:56,891-Speed 6307.25 samples/sec Loss 3.8070 LearningRate 0.0001 Epoch: 27 Global Step: 569070 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:00,138-Speed 6309.19 samples/sec Loss 3.7781 LearningRate 0.0001 Epoch: 27 Global Step: 569080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:03,386-Speed 6307.02 samples/sec Loss 3.7476 LearningRate 0.0001 Epoch: 27 Global Step: 569090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:06,632-Speed 6310.71 samples/sec Loss 3.7564 LearningRate 0.0001 Epoch: 27 Global Step: 569100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:09,884-Speed 6297.84 samples/sec Loss 3.7492 LearningRate 0.0001 Epoch: 27 Global Step: 569110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:13,134-Speed 6302.64 samples/sec Loss 3.7613 LearningRate 0.0001 Epoch: 27 Global Step: 569120 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:16,380-Speed 6312.08 samples/sec Loss 3.7881 LearningRate 0.0001 Epoch: 27 Global Step: 569130 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:19,611-Speed 6339.14 samples/sec Loss 3.8329 LearningRate 0.0001 Epoch: 27 Global Step: 569140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:22,862-Speed 6301.99 samples/sec Loss 3.7757 LearningRate 0.0001 Epoch: 27 Global Step: 569150 Fp16 
Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:26,109-Speed 6308.07 samples/sec Loss 3.7588 LearningRate 0.0001 Epoch: 27 Global Step: 569160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:29,353-Speed 6313.71 samples/sec Loss 3.7279 LearningRate 0.0001 Epoch: 27 Global Step: 569170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:32,600-Speed 6308.39 samples/sec Loss 3.7742 LearningRate 0.0001 Epoch: 27 Global Step: 569180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:35,845-Speed 6314.24 samples/sec Loss 3.7756 LearningRate 0.0001 Epoch: 27 Global Step: 569190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:39,094-Speed 6304.83 samples/sec Loss 3.7527 LearningRate 0.0001 Epoch: 27 Global Step: 569200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:42,343-Speed 6306.29 samples/sec Loss 3.7204 LearningRate 0.0001 Epoch: 27 Global Step: 569210 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:45,588-Speed 6312.65 samples/sec Loss 3.7474 LearningRate 0.0001 Epoch: 27 Global Step: 569220 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:48,843-Speed 6292.33 samples/sec Loss 3.7740 LearningRate 0.0001 Epoch: 27 Global Step: 569230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:30:52,090-Speed 6309.20 samples/sec Loss 3.8369 LearningRate 0.0001 Epoch: 27 Global Step: 569240 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:55,336-Speed 6310.94 samples/sec Loss 3.8123 LearningRate 0.0001 Epoch: 27 Global Step: 569250 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:30:58,582-Speed 6309.85 samples/sec Loss 3.7362 LearningRate 0.0001 Epoch: 27 Global Step: 569260 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:31:01,815-Speed 6336.66 samples/sec Loss 3.8000 LearningRate 0.0001 Epoch: 27 Global Step: 569270 Fp16 Grad Scale: 8192 Required: 24 hours 
Training: 2022-04-02 19:31:05,059-Speed 6314.63 samples/sec Loss 3.7063 LearningRate 0.0001 Epoch: 27 Global Step: 569280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:08,304-Speed 6312.08 samples/sec Loss 3.7825 LearningRate 0.0001 Epoch: 27 Global Step: 569290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:11,552-Speed 6307.08 samples/sec Loss 3.8168 LearningRate 0.0001 Epoch: 27 Global Step: 569300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:14,800-Speed 6306.78 samples/sec Loss 3.7725 LearningRate 0.0001 Epoch: 27 Global Step: 569310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:18,048-Speed 6306.45 samples/sec Loss 3.7980 LearningRate 0.0001 Epoch: 27 Global Step: 569320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:21,294-Speed 6311.19 samples/sec Loss 3.7277 LearningRate 0.0001 Epoch: 27 Global Step: 569330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:24,541-Speed 6309.00 samples/sec Loss 3.7870 LearningRate 0.0001 Epoch: 27 Global Step: 569340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:27,785-Speed 6313.33 samples/sec Loss 3.7438 LearningRate 0.0001 Epoch: 27 Global Step: 569350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:31,026-Speed 6322.01 samples/sec Loss 3.7746 LearningRate 0.0001 Epoch: 27 Global Step: 569360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:34,273-Speed 6308.78 samples/sec Loss 3.7683 LearningRate 0.0001 Epoch: 27 Global Step: 569370 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:31:37,502-Speed 6342.43 samples/sec Loss 3.8712 LearningRate 0.0001 Epoch: 27 Global Step: 569380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:40,747-Speed 6312.79 samples/sec Loss 3.8243 LearningRate 0.0001 Epoch: 27 Global Step: 569390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:44,000-Speed 
6298.72 samples/sec Loss 3.7652 LearningRate 0.0001 Epoch: 27 Global Step: 569400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:47,243-Speed 6317.12 samples/sec Loss 3.7903 LearningRate 0.0001 Epoch: 27 Global Step: 569410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:50,486-Speed 6315.07 samples/sec Loss 3.7571 LearningRate 0.0001 Epoch: 27 Global Step: 569420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:53,733-Speed 6310.50 samples/sec Loss 3.7366 LearningRate 0.0001 Epoch: 27 Global Step: 569430 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:31:56,978-Speed 6311.81 samples/sec Loss 3.8025 LearningRate 0.0001 Epoch: 27 Global Step: 569440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:00,237-Speed 6286.64 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 569450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:03,487-Speed 6301.66 samples/sec Loss 3.7616 LearningRate 0.0001 Epoch: 27 Global Step: 569460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:06,731-Speed 6313.71 samples/sec Loss 3.7020 LearningRate 0.0001 Epoch: 27 Global Step: 569470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:09,973-Speed 6318.82 samples/sec Loss 3.7676 LearningRate 0.0001 Epoch: 27 Global Step: 569480 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:32:13,206-Speed 6335.87 samples/sec Loss 3.7565 LearningRate 0.0001 Epoch: 27 Global Step: 569490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:16,455-Speed 6306.38 samples/sec Loss 3.7457 LearningRate 0.0001 Epoch: 27 Global Step: 569500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:19,699-Speed 6313.95 samples/sec Loss 3.7638 LearningRate 0.0001 Epoch: 27 Global Step: 569510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:22,944-Speed 6313.23 samples/sec Loss 3.7341 
LearningRate 0.0001 Epoch: 27 Global Step: 569520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:26,189-Speed 6312.21 samples/sec Loss 3.7532 LearningRate 0.0001 Epoch: 27 Global Step: 569530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:29,438-Speed 6305.22 samples/sec Loss 3.8086 LearningRate 0.0001 Epoch: 27 Global Step: 569540 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:32,686-Speed 6306.17 samples/sec Loss 3.7798 LearningRate 0.0001 Epoch: 27 Global Step: 569550 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:35,934-Speed 6305.88 samples/sec Loss 3.7497 LearningRate 0.0001 Epoch: 27 Global Step: 569560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:39,219-Speed 6237.12 samples/sec Loss 3.7746 LearningRate 0.0001 Epoch: 27 Global Step: 569570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:42,461-Speed 6319.12 samples/sec Loss 3.7596 LearningRate 0.0001 Epoch: 27 Global Step: 569580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:45,708-Speed 6308.35 samples/sec Loss 3.8217 LearningRate 0.0001 Epoch: 27 Global Step: 569590 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:32:48,951-Speed 6315.78 samples/sec Loss 3.7394 LearningRate 0.0001 Epoch: 27 Global Step: 569600 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:32:52,187-Speed 6330.78 samples/sec Loss 3.7728 LearningRate 0.0001 Epoch: 27 Global Step: 569610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:55,437-Speed 6302.86 samples/sec Loss 3.8054 LearningRate 0.0001 Epoch: 27 Global Step: 569620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:32:58,679-Speed 6319.05 samples/sec Loss 3.6960 LearningRate 0.0001 Epoch: 27 Global Step: 569630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:01,925-Speed 6310.82 samples/sec Loss 3.7776 LearningRate 0.0001 Epoch: 27 Global 
Step: 569640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:05,172-Speed 6308.77 samples/sec Loss 3.8333 LearningRate 0.0001 Epoch: 27 Global Step: 569650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:08,417-Speed 6313.93 samples/sec Loss 3.8033 LearningRate 0.0001 Epoch: 27 Global Step: 569660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:11,660-Speed 6316.03 samples/sec Loss 3.8282 LearningRate 0.0001 Epoch: 27 Global Step: 569670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:14,904-Speed 6314.12 samples/sec Loss 3.7373 LearningRate 0.0001 Epoch: 27 Global Step: 569680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:18,154-Speed 6303.33 samples/sec Loss 3.7419 LearningRate 0.0001 Epoch: 27 Global Step: 569690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:21,403-Speed 6304.79 samples/sec Loss 3.7287 LearningRate 0.0001 Epoch: 27 Global Step: 569700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:24,652-Speed 6303.43 samples/sec Loss 3.7756 LearningRate 0.0001 Epoch: 27 Global Step: 569710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:33:27,897-Speed 6313.34 samples/sec Loss 3.7189 LearningRate 0.0001 Epoch: 27 Global Step: 569720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:33:31,148-Speed 6300.95 samples/sec Loss 3.7716 LearningRate 0.0001 Epoch: 27 Global Step: 569730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:33:34,393-Speed 6312.73 samples/sec Loss 3.7082 LearningRate 0.0001 Epoch: 27 Global Step: 569740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:33:37,637-Speed 6314.30 samples/sec Loss 3.8575 LearningRate 0.0001 Epoch: 27 Global Step: 569750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:33:40,867-Speed 6341.79 samples/sec Loss 3.8322 LearningRate 0.0001 Epoch: 27 Global Step: 569760 Fp16 Grad Scale: 8192 
Required: 24 hours Training: 2022-04-02 19:33:44,112-Speed 6313.86 samples/sec Loss 3.7522 LearningRate 0.0001 Epoch: 27 Global Step: 569770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:47,359-Speed 6307.59 samples/sec Loss 3.7363 LearningRate 0.0001 Epoch: 27 Global Step: 569780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:50,606-Speed 6310.07 samples/sec Loss 3.8375 LearningRate 0.0001 Epoch: 27 Global Step: 569790 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:53,852-Speed 6309.21 samples/sec Loss 3.7716 LearningRate 0.0001 Epoch: 27 Global Step: 569800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:33:57,098-Speed 6312.46 samples/sec Loss 3.7653 LearningRate 0.0001 Epoch: 27 Global Step: 569810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:00,348-Speed 6302.32 samples/sec Loss 3.7476 LearningRate 0.0001 Epoch: 27 Global Step: 569820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:03,593-Speed 6313.25 samples/sec Loss 3.7940 LearningRate 0.0001 Epoch: 27 Global Step: 569830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:06,838-Speed 6311.71 samples/sec Loss 3.7484 LearningRate 0.0001 Epoch: 27 Global Step: 569840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:10,084-Speed 6311.11 samples/sec Loss 3.7592 LearningRate 0.0001 Epoch: 27 Global Step: 569850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:13,325-Speed 6321.47 samples/sec Loss 3.7377 LearningRate 0.0001 Epoch: 27 Global Step: 569860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:34:16,569-Speed 6314.79 samples/sec Loss 3.7565 LearningRate 0.0001 Epoch: 27 Global Step: 569870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:34:19,801-Speed 6336.78 samples/sec Loss 3.7718 LearningRate 0.0001 Epoch: 27 Global Step: 569880 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:34:23,046-Speed 6312.84 samples/sec Loss 3.7559 LearningRate 0.0001 Epoch: 27 Global Step: 569890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:26,292-Speed 6311.47 samples/sec Loss 3.7759 LearningRate 0.0001 Epoch: 27 Global Step: 569900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:29,617-Speed 6161.37 samples/sec Loss 3.8269 LearningRate 0.0001 Epoch: 27 Global Step: 569910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:32,864-Speed 6307.84 samples/sec Loss 3.7586 LearningRate 0.0001 Epoch: 27 Global Step: 569920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:36,110-Speed 6310.97 samples/sec Loss 3.8277 LearningRate 0.0001 Epoch: 27 Global Step: 569930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:39,360-Speed 6303.30 samples/sec Loss 3.8412 LearningRate 0.0001 Epoch: 27 Global Step: 569940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:42,607-Speed 6308.39 samples/sec Loss 3.8011 LearningRate 0.0001 Epoch: 27 Global Step: 569950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:45,851-Speed 6314.26 samples/sec Loss 3.7731 LearningRate 0.0001 Epoch: 27 Global Step: 569960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:49,099-Speed 6306.46 samples/sec Loss 3.7855 LearningRate 0.0001 Epoch: 27 Global Step: 569970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:52,334-Speed 6333.06 samples/sec Loss 3.8485 LearningRate 0.0001 Epoch: 27 Global Step: 569980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:55,578-Speed 6314.33 samples/sec Loss 3.7145 LearningRate 0.0001 Epoch: 27 Global Step: 569990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:34:58,826-Speed 6306.54 samples/sec Loss 3.7079 LearningRate 0.0001 Epoch: 27 Global Step: 570000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:02,073-Speed 6309.34 samples/sec Loss 
3.7456 LearningRate 0.0001 Epoch: 27 Global Step: 570010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:05,322-Speed 6303.89 samples/sec Loss 3.7527 LearningRate 0.0001 Epoch: 27 Global Step: 570020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:08,571-Speed 6305.01 samples/sec Loss 3.7558 LearningRate 0.0001 Epoch: 27 Global Step: 570030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:11,818-Speed 6308.83 samples/sec Loss 3.7820 LearningRate 0.0001 Epoch: 27 Global Step: 570040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:15,078-Speed 6285.03 samples/sec Loss 3.7776 LearningRate 0.0001 Epoch: 27 Global Step: 570050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:18,326-Speed 6306.50 samples/sec Loss 3.7570 LearningRate 0.0001 Epoch: 27 Global Step: 570060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:21,572-Speed 6310.70 samples/sec Loss 3.7051 LearningRate 0.0001 Epoch: 27 Global Step: 570070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:24,830-Speed 6288.59 samples/sec Loss 3.7703 LearningRate 0.0001 Epoch: 27 Global Step: 570080 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:35:28,078-Speed 6305.42 samples/sec Loss 3.7754 LearningRate 0.0001 Epoch: 27 Global Step: 570090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:35:31,323-Speed 6312.45 samples/sec Loss 3.8349 LearningRate 0.0001 Epoch: 27 Global Step: 570100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:35:34,560-Speed 6329.87 samples/sec Loss 3.8093 LearningRate 0.0001 Epoch: 27 Global Step: 570110 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:37,811-Speed 6300.56 samples/sec Loss 3.7172 LearningRate 0.0001 Epoch: 27 Global Step: 570120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:41,159-Speed 6118.75 samples/sec Loss 3.7563 LearningRate 0.0001 Epoch: 27 
Global Step: 570130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:44,407-Speed 6306.64 samples/sec Loss 3.8165 LearningRate 0.0001 Epoch: 27 Global Step: 570140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:47,656-Speed 6303.87 samples/sec Loss 3.7618 LearningRate 0.0001 Epoch: 27 Global Step: 570150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:50,900-Speed 6314.28 samples/sec Loss 3.7825 LearningRate 0.0001 Epoch: 27 Global Step: 570160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:54,145-Speed 6313.72 samples/sec Loss 3.8345 LearningRate 0.0001 Epoch: 27 Global Step: 570170 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:35:57,390-Speed 6313.41 samples/sec Loss 3.8045 LearningRate 0.0001 Epoch: 27 Global Step: 570180 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:00,639-Speed 6304.12 samples/sec Loss 3.7793 LearningRate 0.0001 Epoch: 27 Global Step: 570190 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:03,887-Speed 6307.10 samples/sec Loss 3.7954 LearningRate 0.0001 Epoch: 27 Global Step: 570200 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:07,135-Speed 6306.55 samples/sec Loss 3.7508 LearningRate 0.0001 Epoch: 27 Global Step: 570210 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:36:10,381-Speed 6311.26 samples/sec Loss 3.7958 LearningRate 0.0001 Epoch: 27 Global Step: 570220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:36:13,614-Speed 6334.79 samples/sec Loss 3.7701 LearningRate 0.0001 Epoch: 27 Global Step: 570230 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:16,866-Speed 6299.68 samples/sec Loss 3.7978 LearningRate 0.0001 Epoch: 27 Global Step: 570240 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:20,111-Speed 6314.58 samples/sec Loss 3.7822 LearningRate 0.0001 Epoch: 27 Global Step: 570250 Fp16 Grad Scale: 8192 
Required: 24 hours Training: 2022-04-02 19:36:23,356-Speed 6311.68 samples/sec Loss 3.7866 LearningRate 0.0001 Epoch: 27 Global Step: 570260 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:26,605-Speed 6305.22 samples/sec Loss 3.7423 LearningRate 0.0001 Epoch: 27 Global Step: 570270 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:29,846-Speed 6321.05 samples/sec Loss 3.7228 LearningRate 0.0001 Epoch: 27 Global Step: 570280 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:33,136-Speed 6224.95 samples/sec Loss 3.7541 LearningRate 0.0001 Epoch: 27 Global Step: 570290 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:36,458-Speed 6166.68 samples/sec Loss 3.7669 LearningRate 0.0001 Epoch: 27 Global Step: 570300 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:39,706-Speed 6308.07 samples/sec Loss 3.7046 LearningRate 0.0001 Epoch: 27 Global Step: 570310 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:42,952-Speed 6309.96 samples/sec Loss 3.6882 LearningRate 0.0001 Epoch: 27 Global Step: 570320 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:46,181-Speed 6343.98 samples/sec Loss 3.7153 LearningRate 0.0001 Epoch: 27 Global Step: 570330 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:49,443-Speed 6280.20 samples/sec Loss 3.7995 LearningRate 0.0001 Epoch: 27 Global Step: 570340 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:52,697-Speed 6295.36 samples/sec Loss 3.7220 LearningRate 0.0001 Epoch: 27 Global Step: 570350 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:55,954-Speed 6288.51 samples/sec Loss 3.7791 LearningRate 0.0001 Epoch: 27 Global Step: 570360 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:36:59,196-Speed 6318.38 samples/sec Loss 3.8035 LearningRate 0.0001 Epoch: 27 Global Step: 570370 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 
19:37:02,448-Speed 6299.94 samples/sec Loss 3.7893 LearningRate 0.0001 Epoch: 27 Global Step: 570380 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:05,705-Speed 6288.07 samples/sec Loss 3.7393 LearningRate 0.0001 Epoch: 27 Global Step: 570390 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:08,952-Speed 6309.29 samples/sec Loss 3.6750 LearningRate 0.0001 Epoch: 27 Global Step: 570400 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:12,210-Speed 6287.55 samples/sec Loss 3.7396 LearningRate 0.0001 Epoch: 27 Global Step: 570410 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:15,464-Speed 6295.30 samples/sec Loss 3.7169 LearningRate 0.0001 Epoch: 27 Global Step: 570420 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:18,710-Speed 6309.96 samples/sec Loss 3.7348 LearningRate 0.0001 Epoch: 27 Global Step: 570430 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:37:21,946-Speed 6329.37 samples/sec Loss 3.7280 LearningRate 0.0001 Epoch: 27 Global Step: 570440 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:25,199-Speed 6298.96 samples/sec Loss 3.7523 LearningRate 0.0001 Epoch: 27 Global Step: 570450 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:28,448-Speed 6304.72 samples/sec Loss 3.6765 LearningRate 0.0001 Epoch: 27 Global Step: 570460 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:31,692-Speed 6315.39 samples/sec Loss 3.7718 LearningRate 0.0001 Epoch: 27 Global Step: 570470 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:34,940-Speed 6307.08 samples/sec Loss 3.8655 LearningRate 0.0001 Epoch: 27 Global Step: 570480 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:38,182-Speed 6317.94 samples/sec Loss 3.7489 LearningRate 0.0001 Epoch: 27 Global Step: 570490 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:41,428-Speed 6309.51 samples/sec 
Loss 3.7829 LearningRate 0.0001 Epoch: 27 Global Step: 570500 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:44,676-Speed 6307.77 samples/sec Loss 3.7517 LearningRate 0.0001 Epoch: 27 Global Step: 570510 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:47,933-Speed 6288.95 samples/sec Loss 3.6943 LearningRate 0.0001 Epoch: 27 Global Step: 570520 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:51,179-Speed 6311.30 samples/sec Loss 3.8180 LearningRate 0.0001 Epoch: 27 Global Step: 570530 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:37:54,433-Speed 6295.40 samples/sec Loss 3.7180 LearningRate 0.0001 Epoch: 27 Global Step: 570540 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:37:57,678-Speed 6312.64 samples/sec Loss 3.7516 LearningRate 0.0001 Epoch: 27 Global Step: 570550 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:38:00,919-Speed 6319.26 samples/sec Loss 3.8095 LearningRate 0.0001 Epoch: 27 Global Step: 570560 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:04,165-Speed 6311.21 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 570570 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:07,407-Speed 6319.60 samples/sec Loss 3.7980 LearningRate 0.0001 Epoch: 27 Global Step: 570580 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:10,654-Speed 6308.04 samples/sec Loss 3.8175 LearningRate 0.0001 Epoch: 27 Global Step: 570590 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:13,897-Speed 6316.05 samples/sec Loss 3.7336 LearningRate 0.0001 Epoch: 27 Global Step: 570600 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:17,143-Speed 6311.34 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 570610 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:20,400-Speed 6288.07 samples/sec Loss 3.7165 LearningRate 0.0001 Epoch: 27 
Global Step: 570620 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:23,652-Speed 6299.68 samples/sec Loss 3.7477 LearningRate 0.0001 Epoch: 27 Global Step: 570630 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:26,903-Speed 6301.41 samples/sec Loss 3.7845 LearningRate 0.0001 Epoch: 27 Global Step: 570640 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:30,145-Speed 6318.59 samples/sec Loss 3.7261 LearningRate 0.0001 Epoch: 27 Global Step: 570650 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:33,375-Speed 6340.52 samples/sec Loss 3.7871 LearningRate 0.0001 Epoch: 27 Global Step: 570660 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:36,627-Speed 6300.72 samples/sec Loss 3.7646 LearningRate 0.0001 Epoch: 27 Global Step: 570670 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:39,873-Speed 6310.03 samples/sec Loss 3.7202 LearningRate 0.0001 Epoch: 27 Global Step: 570680 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:43,117-Speed 6314.85 samples/sec Loss 3.8258 LearningRate 0.0001 Epoch: 27 Global Step: 570690 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:46,360-Speed 6316.68 samples/sec Loss 3.8026 LearningRate 0.0001 Epoch: 27 Global Step: 570700 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:49,606-Speed 6311.33 samples/sec Loss 3.7382 LearningRate 0.0001 Epoch: 27 Global Step: 570710 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:52,854-Speed 6307.15 samples/sec Loss 3.8035 LearningRate 0.0001 Epoch: 27 Global Step: 570720 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:56,098-Speed 6313.17 samples/sec Loss 3.7603 LearningRate 0.0001 Epoch: 27 Global Step: 570730 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:38:59,350-Speed 6300.50 samples/sec Loss 3.8045 LearningRate 0.0001 Epoch: 27 Global Step: 570740 Fp16 Grad Scale: 8192 
Required: 24 hours Training: 2022-04-02 19:39:02,595-Speed 6311.70 samples/sec Loss 3.8236 LearningRate 0.0001 Epoch: 27 Global Step: 570750 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:05,824-Speed 6344.97 samples/sec Loss 3.7875 LearningRate 0.0001 Epoch: 27 Global Step: 570760 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:09,072-Speed 6306.93 samples/sec Loss 3.7381 LearningRate 0.0001 Epoch: 27 Global Step: 570770 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:12,320-Speed 6306.00 samples/sec Loss 3.7166 LearningRate 0.0001 Epoch: 27 Global Step: 570780 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:15,565-Speed 6313.09 samples/sec Loss 3.7687 LearningRate 0.0001 Epoch: 27 Global Step: 570790 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:18,815-Speed 6302.84 samples/sec Loss 3.7768 LearningRate 0.0001 Epoch: 27 Global Step: 570800 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:22,059-Speed 6315.56 samples/sec Loss 3.7327 LearningRate 0.0001 Epoch: 27 Global Step: 570810 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:25,302-Speed 6314.83 samples/sec Loss 3.7452 LearningRate 0.0001 Epoch: 27 Global Step: 570820 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:28,547-Speed 6313.91 samples/sec Loss 3.7985 LearningRate 0.0001 Epoch: 27 Global Step: 570830 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:31,797-Speed 6302.93 samples/sec Loss 3.8147 LearningRate 0.0001 Epoch: 27 Global Step: 570840 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:35,038-Speed 6319.72 samples/sec Loss 3.7712 LearningRate 0.0001 Epoch: 27 Global Step: 570850 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:38,289-Speed 6301.02 samples/sec Loss 3.7686 LearningRate 0.0001 Epoch: 27 Global Step: 570860 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 
19:39:41,537-Speed 6306.51 samples/sec Loss 3.7512 LearningRate 0.0001 Epoch: 27 Global Step: 570870 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:39:44,784-Speed 6309.69 samples/sec Loss 3.7964 LearningRate 0.0001 Epoch: 27 Global Step: 570880 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:39:48,017-Speed 6335.68 samples/sec Loss 3.7840 LearningRate 0.0001 Epoch: 27 Global Step: 570890 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:51,261-Speed 6316.38 samples/sec Loss 3.7529 LearningRate 0.0001 Epoch: 27 Global Step: 570900 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:54,506-Speed 6311.66 samples/sec Loss 3.7966 LearningRate 0.0001 Epoch: 27 Global Step: 570910 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:39:57,752-Speed 6310.58 samples/sec Loss 3.7407 LearningRate 0.0001 Epoch: 27 Global Step: 570920 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:00,994-Speed 6318.71 samples/sec Loss 3.7929 LearningRate 0.0001 Epoch: 27 Global Step: 570930 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:04,242-Speed 6307.18 samples/sec Loss 3.8183 LearningRate 0.0001 Epoch: 27 Global Step: 570940 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:07,487-Speed 6312.95 samples/sec Loss 3.7497 LearningRate 0.0001 Epoch: 27 Global Step: 570950 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:10,730-Speed 6316.06 samples/sec Loss 3.8112 LearningRate 0.0001 Epoch: 27 Global Step: 570960 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:13,983-Speed 6298.03 samples/sec Loss 3.7523 LearningRate 0.0001 Epoch: 27 Global Step: 570970 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:17,277-Speed 6217.63 samples/sec Loss 3.7724 LearningRate 0.0001 Epoch: 27 Global Step: 570980 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:20,514-Speed 6328.40 samples/sec 
Loss 3.7405 LearningRate 0.0001 Epoch: 27 Global Step: 570990 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:23,803-Speed 6228.09 samples/sec Loss 3.7928 LearningRate 0.0001 Epoch: 27 Global Step: 571000 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:27,153-Speed 6114.56 samples/sec Loss 3.7989 LearningRate 0.0001 Epoch: 27 Global Step: 571010 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:30,400-Speed 6309.26 samples/sec Loss 3.7540 LearningRate 0.0001 Epoch: 27 Global Step: 571020 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:33,645-Speed 6311.59 samples/sec Loss 3.7563 LearningRate 0.0001 Epoch: 27 Global Step: 571030 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:36,890-Speed 6313.32 samples/sec Loss 3.7687 LearningRate 0.0001 Epoch: 27 Global Step: 571040 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:40,135-Speed 6313.89 samples/sec Loss 3.7116 LearningRate 0.0001 Epoch: 27 Global Step: 571050 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:43,380-Speed 6312.51 samples/sec Loss 3.7480 LearningRate 0.0001 Epoch: 27 Global Step: 571060 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:46,622-Speed 6316.82 samples/sec Loss 3.8041 LearningRate 0.0001 Epoch: 27 Global Step: 571070 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:49,871-Speed 6306.22 samples/sec Loss 3.8667 LearningRate 0.0001 Epoch: 27 Global Step: 571080 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:40:53,117-Speed 6310.08 samples/sec Loss 3.7625 LearningRate 0.0001 Epoch: 27 Global Step: 571090 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:40:56,363-Speed 6312.04 samples/sec Loss 3.7756 LearningRate 0.0001 Epoch: 27 Global Step: 571100 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:40:59,609-Speed 6309.77 samples/sec Loss 3.6632 LearningRate 0.0001 Epoch: 27 
Global Step: 571110 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-04-02 19:41:02,839-Speed 6343.03 samples/sec Loss 3.7295 LearningRate 0.0001 Epoch: 27 Global Step: 571120 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:41:06,082-Speed 6316.11 samples/sec Loss 3.6704 LearningRate 0.0001 Epoch: 27 Global Step: 571130 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:41:09,328-Speed 6310.37 samples/sec Loss 3.7822 LearningRate 0.0001 Epoch: 27 Global Step: 571140 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:41:12,581-Speed 6297.73 samples/sec Loss 3.7442 LearningRate 0.0001 Epoch: 27 Global Step: 571150 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:41:15,828-Speed 6308.50 samples/sec Loss 3.8390 LearningRate 0.0001 Epoch: 27 Global Step: 571160 Fp16 Grad Scale: 8192 Required: 24 hours Training: 2022-04-02 19:41:19,072-Speed 6314.77 samples/sec Loss 3.7521 LearningRate 0.0001 Epoch: 27 Global Step: 571170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:22,317-Speed 6312.45 samples/sec Loss 3.7398 LearningRate 0.0001 Epoch: 27 Global Step: 571180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:25,562-Speed 6312.34 samples/sec Loss 3.7531 LearningRate 0.0001 Epoch: 27 Global Step: 571190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:28,813-Speed 6300.20 samples/sec Loss 3.7768 LearningRate 0.0001 Epoch: 27 Global Step: 571200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:32,062-Speed 6305.34 samples/sec Loss 3.7455 LearningRate 0.0001 Epoch: 27 Global Step: 571210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:35,290-Speed 6345.24 samples/sec Loss 3.7159 LearningRate 0.0001 Epoch: 27 Global Step: 571220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:38,538-Speed 6307.93 samples/sec Loss 3.8051 LearningRate 0.0001 Epoch: 27 Global Step: 571230 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 19:41:41,787-Speed 6305.08 samples/sec Loss 3.7537 LearningRate 0.0001 Epoch: 27 Global Step: 571240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:45,037-Speed 6303.06 samples/sec Loss 3.7533 LearningRate 0.0001 Epoch: 27 Global Step: 571250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:48,281-Speed 6314.63 samples/sec Loss 3.7351 LearningRate 0.0001 Epoch: 27 Global Step: 571260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:51,521-Speed 6320.57 samples/sec Loss 3.7500 LearningRate 0.0001 Epoch: 27 Global Step: 571270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:54,768-Speed 6309.47 samples/sec Loss 3.8068 LearningRate 0.0001 Epoch: 27 Global Step: 571280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:41:58,010-Speed 6318.07 samples/sec Loss 3.7584 LearningRate 0.0001 Epoch: 27 Global Step: 571290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:01,257-Speed 6310.34 samples/sec Loss 3.8187 LearningRate 0.0001 Epoch: 27 Global Step: 571300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:04,502-Speed 6311.62 samples/sec Loss 3.7226 LearningRate 0.0001 Epoch: 27 Global Step: 571310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:07,749-Speed 6309.91 samples/sec Loss 3.7623 LearningRate 0.0001 Epoch: 27 Global Step: 571320 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:42:10,996-Speed 6308.22 samples/sec Loss 3.7673 LearningRate 0.0001 Epoch: 27 Global Step: 571330 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:42:14,242-Speed 6311.85 samples/sec Loss 3.7656 LearningRate 0.0001 Epoch: 27 Global Step: 571340 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:42:17,489-Speed 6308.01 samples/sec Loss 3.7448 LearningRate 0.0001 Epoch: 27 Global Step: 571350 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 
19:42:20,721-Speed 6337.93 samples/sec Loss 3.7714 LearningRate 0.0001 Epoch: 27 Global Step: 571360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:23,966-Speed 6313.78 samples/sec Loss 3.7925 LearningRate 0.0001 Epoch: 27 Global Step: 571370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:27,210-Speed 6313.91 samples/sec Loss 3.7803 LearningRate 0.0001 Epoch: 27 Global Step: 571380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:30,454-Speed 6313.69 samples/sec Loss 3.7884 LearningRate 0.0001 Epoch: 27 Global Step: 571390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:33,703-Speed 6305.59 samples/sec Loss 3.7588 LearningRate 0.0001 Epoch: 27 Global Step: 571400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:36,948-Speed 6313.05 samples/sec Loss 3.7618 LearningRate 0.0001 Epoch: 27 Global Step: 571410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:40,197-Speed 6304.78 samples/sec Loss 3.7533 LearningRate 0.0001 Epoch: 27 Global Step: 571420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:43,446-Speed 6305.12 samples/sec Loss 3.7922 LearningRate 0.0001 Epoch: 27 Global Step: 571430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:46,696-Speed 6301.82 samples/sec Loss 3.7611 LearningRate 0.0001 Epoch: 27 Global Step: 571440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:49,943-Speed 6309.48 samples/sec Loss 3.7422 LearningRate 0.0001 Epoch: 27 Global Step: 571450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:53,194-Speed 6301.83 samples/sec Loss 3.7664 LearningRate 0.0001 Epoch: 27 Global Step: 571460 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:42:56,425-Speed 6338.20 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 571470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:42:59,673-Speed 6307.72 samples/sec 
Loss 3.8060 LearningRate 0.0001 Epoch: 27 Global Step: 571480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:02,926-Speed 6297.60 samples/sec Loss 3.7729 LearningRate 0.0001 Epoch: 27 Global Step: 571490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:06,171-Speed 6311.87 samples/sec Loss 3.7405 LearningRate 0.0001 Epoch: 27 Global Step: 571500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:09,420-Speed 6305.04 samples/sec Loss 3.7388 LearningRate 0.0001 Epoch: 27 Global Step: 571510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:12,665-Speed 6311.75 samples/sec Loss 3.6851 LearningRate 0.0001 Epoch: 27 Global Step: 571520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:15,909-Speed 6314.62 samples/sec Loss 3.7866 LearningRate 0.0001 Epoch: 27 Global Step: 571530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:19,155-Speed 6312.09 samples/sec Loss 3.7347 LearningRate 0.0001 Epoch: 27 Global Step: 571540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:22,409-Speed 6294.95 samples/sec Loss 3.7629 LearningRate 0.0001 Epoch: 27 Global Step: 571550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:25,653-Speed 6315.54 samples/sec Loss 3.7105 LearningRate 0.0001 Epoch: 27 Global Step: 571560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:28,897-Speed 6313.93 samples/sec Loss 3.7489 LearningRate 0.0001 Epoch: 27 Global Step: 571570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:43:32,162-Speed 6274.57 samples/sec Loss 3.7991 LearningRate 0.0001 Epoch: 27 Global Step: 571580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:43:35,393-Speed 6339.74 samples/sec Loss 3.7286 LearningRate 0.0001 Epoch: 27 Global Step: 571590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:38,639-Speed 6310.19 samples/sec Loss 3.7131 LearningRate 0.0001 Epoch: 27 
Global Step: 571600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:41,890-Speed 6301.15 samples/sec Loss 3.7683 LearningRate 0.0001 Epoch: 27 Global Step: 571610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:45,136-Speed 6312.23 samples/sec Loss 3.8256 LearningRate 0.0001 Epoch: 27 Global Step: 571620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:48,379-Speed 6315.10 samples/sec Loss 3.7767 LearningRate 0.0001 Epoch: 27 Global Step: 571630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:51,627-Speed 6307.45 samples/sec Loss 3.7349 LearningRate 0.0001 Epoch: 27 Global Step: 571640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:54,874-Speed 6309.07 samples/sec Loss 3.8082 LearningRate 0.0001 Epoch: 27 Global Step: 571650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:43:58,122-Speed 6306.19 samples/sec Loss 3.7495 LearningRate 0.0001 Epoch: 27 Global Step: 571660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:01,369-Speed 6309.45 samples/sec Loss 3.7729 LearningRate 0.0001 Epoch: 27 Global Step: 571670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:04,612-Speed 6315.68 samples/sec Loss 3.6983 LearningRate 0.0001 Epoch: 27 Global Step: 571680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:07,845-Speed 6336.60 samples/sec Loss 3.7505 LearningRate 0.0001 Epoch: 27 Global Step: 571690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:11,094-Speed 6304.25 samples/sec Loss 3.7364 LearningRate 0.0001 Epoch: 27 Global Step: 571700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:14,338-Speed 6315.96 samples/sec Loss 3.7993 LearningRate 0.0001 Epoch: 27 Global Step: 571710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:17,583-Speed 6312.12 samples/sec Loss 3.7195 LearningRate 0.0001 Epoch: 27 Global Step: 571720 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 19:44:20,829-Speed 6310.87 samples/sec Loss 3.7364 LearningRate 0.0001 Epoch: 27 Global Step: 571730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:24,077-Speed 6306.97 samples/sec Loss 3.7127 LearningRate 0.0001 Epoch: 27 Global Step: 571740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:27,330-Speed 6296.80 samples/sec Loss 3.7293 LearningRate 0.0001 Epoch: 27 Global Step: 571750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:30,609-Speed 6246.34 samples/sec Loss 3.7629 LearningRate 0.0001 Epoch: 27 Global Step: 571760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:33,848-Speed 6326.22 samples/sec Loss 3.7766 LearningRate 0.0001 Epoch: 27 Global Step: 571770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:37,090-Speed 6317.92 samples/sec Loss 3.7427 LearningRate 0.0001 Epoch: 27 Global Step: 571780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:44:40,382-Speed 6222.76 samples/sec Loss 3.7948 LearningRate 0.0001 Epoch: 27 Global Step: 571790 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:44:43,642-Speed 6283.37 samples/sec Loss 3.7676 LearningRate 0.0001 Epoch: 27 Global Step: 571800 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:44:46,887-Speed 6313.41 samples/sec Loss 3.7838 LearningRate 0.0001 Epoch: 27 Global Step: 571810 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:44:50,131-Speed 6314.69 samples/sec Loss 3.7204 LearningRate 0.0001 Epoch: 27 Global Step: 571820 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:44:53,380-Speed 6303.72 samples/sec Loss 3.7379 LearningRate 0.0001 Epoch: 27 Global Step: 571830 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:44:56,628-Speed 6307.55 samples/sec Loss 3.7320 LearningRate 0.0001 Epoch: 27 Global Step: 571840 Fp16 Grad Scale: 16384 Required: 23 hours Training: 
2022-04-02 19:44:59,875-Speed 6309.07 samples/sec Loss 3.7508 LearningRate 0.0001 Epoch: 27 Global Step: 571850 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:45:03,118-Speed 6315.37 samples/sec Loss 3.7994 LearningRate 0.0001 Epoch: 27 Global Step: 571860 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:45:06,378-Speed 6283.91 samples/sec Loss 3.7539 LearningRate 0.0001 Epoch: 27 Global Step: 571870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:09,622-Speed 6314.53 samples/sec Loss 3.7680 LearningRate 0.0001 Epoch: 27 Global Step: 571880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:12,870-Speed 6307.33 samples/sec Loss 3.7571 LearningRate 0.0001 Epoch: 27 Global Step: 571890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:16,161-Speed 6225.04 samples/sec Loss 3.7388 LearningRate 0.0001 Epoch: 27 Global Step: 571900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:19,407-Speed 6310.19 samples/sec Loss 3.7387 LearningRate 0.0001 Epoch: 27 Global Step: 571910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:22,649-Speed 6318.37 samples/sec Loss 3.6890 LearningRate 0.0001 Epoch: 27 Global Step: 571920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:25,891-Speed 6318.24 samples/sec Loss 3.7089 LearningRate 0.0001 Epoch: 27 Global Step: 571930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:29,133-Speed 6317.81 samples/sec Loss 3.7472 LearningRate 0.0001 Epoch: 27 Global Step: 571940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:32,380-Speed 6308.44 samples/sec Loss 3.7196 LearningRate 0.0001 Epoch: 27 Global Step: 571950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:35,621-Speed 6320.78 samples/sec Loss 3.7655 LearningRate 0.0001 Epoch: 27 Global Step: 571960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:38,867-Speed 6310.65 
samples/sec Loss 3.7795 LearningRate 0.0001 Epoch: 27 Global Step: 571970 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:45:42,099-Speed 6338.82 samples/sec Loss 3.7512 LearningRate 0.0001 Epoch: 27 Global Step: 571980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:45,347-Speed 6308.32 samples/sec Loss 3.7550 LearningRate 0.0001 Epoch: 27 Global Step: 571990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:48,595-Speed 6305.92 samples/sec Loss 3.7581 LearningRate 0.0001 Epoch: 27 Global Step: 572000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:51,847-Speed 6300.25 samples/sec Loss 3.7202 LearningRate 0.0001 Epoch: 27 Global Step: 572010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:55,092-Speed 6311.58 samples/sec Loss 3.8102 LearningRate 0.0001 Epoch: 27 Global Step: 572020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:45:58,338-Speed 6311.53 samples/sec Loss 3.6831 LearningRate 0.0001 Epoch: 27 Global Step: 572030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:01,584-Speed 6308.96 samples/sec Loss 3.7281 LearningRate 0.0001 Epoch: 27 Global Step: 572040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:04,826-Speed 6319.49 samples/sec Loss 3.7440 LearningRate 0.0001 Epoch: 27 Global Step: 572050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:08,071-Speed 6311.68 samples/sec Loss 3.7900 LearningRate 0.0001 Epoch: 27 Global Step: 572060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:11,319-Speed 6307.75 samples/sec Loss 3.7150 LearningRate 0.0001 Epoch: 27 Global Step: 572070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:14,564-Speed 6312.15 samples/sec Loss 3.7656 LearningRate 0.0001 Epoch: 27 Global Step: 572080 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:17,811-Speed 6310.09 samples/sec Loss 3.7041 LearningRate 
0.0001 Epoch: 27 Global Step: 572090 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:21,058-Speed 6306.78 samples/sec Loss 3.7783 LearningRate 0.0001 Epoch: 27 Global Step: 572100 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:24,297-Speed 6324.43 samples/sec Loss 3.7162 LearningRate 0.0001 Epoch: 27 Global Step: 572110 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:27,545-Speed 6307.79 samples/sec Loss 3.7509 LearningRate 0.0001 Epoch: 27 Global Step: 572120 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:30,791-Speed 6310.32 samples/sec Loss 3.7724 LearningRate 0.0001 Epoch: 27 Global Step: 572130 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:46:34,026-Speed 6332.51 samples/sec Loss 3.8040 LearningRate 0.0001 Epoch: 27 Global Step: 572140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:37,278-Speed 6298.45 samples/sec Loss 3.7678 LearningRate 0.0001 Epoch: 27 Global Step: 572150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:40,523-Speed 6312.50 samples/sec Loss 3.6927 LearningRate 0.0001 Epoch: 27 Global Step: 572160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:43,770-Speed 6309.38 samples/sec Loss 3.6929 LearningRate 0.0001 Epoch: 27 Global Step: 572170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:47,016-Speed 6311.26 samples/sec Loss 3.7242 LearningRate 0.0001 Epoch: 27 Global Step: 572180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:50,265-Speed 6304.37 samples/sec Loss 3.7620 LearningRate 0.0001 Epoch: 27 Global Step: 572190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:53,511-Speed 6312.37 samples/sec Loss 3.7561 LearningRate 0.0001 Epoch: 27 Global Step: 572200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:46:56,757-Speed 6309.94 samples/sec Loss 3.7688 LearningRate 0.0001 Epoch: 27 Global Step: 572210 
Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:00,005-Speed 6306.14 samples/sec Loss 3.7881 LearningRate 0.0001 Epoch: 27 Global Step: 572220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:03,262-Speed 6290.96 samples/sec Loss 3.7679 LearningRate 0.0001 Epoch: 27 Global Step: 572230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:06,509-Speed 6307.92 samples/sec Loss 3.6732 LearningRate 0.0001 Epoch: 27 Global Step: 572240 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:09,754-Speed 6312.42 samples/sec Loss 3.7020 LearningRate 0.0001 Epoch: 27 Global Step: 572250 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:12,998-Speed 6314.45 samples/sec Loss 3.7399 LearningRate 0.0001 Epoch: 27 Global Step: 572260 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:16,241-Speed 6316.38 samples/sec Loss 3.7148 LearningRate 0.0001 Epoch: 27 Global Step: 572270 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:19,487-Speed 6312.24 samples/sec Loss 3.7661 LearningRate 0.0001 Epoch: 27 Global Step: 572280 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:22,718-Speed 6338.78 samples/sec Loss 3.7540 LearningRate 0.0001 Epoch: 27 Global Step: 572290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:25,963-Speed 6311.83 samples/sec Loss 3.6820 LearningRate 0.0001 Epoch: 27 Global Step: 572300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:29,214-Speed 6302.50 samples/sec Loss 3.7539 LearningRate 0.0001 Epoch: 27 Global Step: 572310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:32,463-Speed 6304.17 samples/sec Loss 3.7751 LearningRate 0.0001 Epoch: 27 Global Step: 572320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:35,707-Speed 6314.56 samples/sec Loss 3.7644 LearningRate 0.0001 Epoch: 27 Global Step: 572330 Fp16 Grad Scale: 8192 Required: 23 
hours Training: 2022-04-02 19:47:38,955-Speed 6306.41 samples/sec Loss 3.7581 LearningRate 0.0001 Epoch: 27 Global Step: 572340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:42,202-Speed 6309.44 samples/sec Loss 3.7810 LearningRate 0.0001 Epoch: 27 Global Step: 572350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:45,452-Speed 6303.46 samples/sec Loss 3.7265 LearningRate 0.0001 Epoch: 27 Global Step: 572360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:48,695-Speed 6315.60 samples/sec Loss 3.7523 LearningRate 0.0001 Epoch: 27 Global Step: 572370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:51,939-Speed 6314.55 samples/sec Loss 3.7251 LearningRate 0.0001 Epoch: 27 Global Step: 572380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:47:55,189-Speed 6303.15 samples/sec Loss 3.7815 LearningRate 0.0001 Epoch: 27 Global Step: 572390 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:47:58,432-Speed 6316.17 samples/sec Loss 3.7476 LearningRate 0.0001 Epoch: 27 Global Step: 572400 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:01,681-Speed 6305.26 samples/sec Loss 3.7592 LearningRate 0.0001 Epoch: 27 Global Step: 572410 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:04,926-Speed 6313.28 samples/sec Loss 3.7069 LearningRate 0.0001 Epoch: 27 Global Step: 572420 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:08,161-Speed 6333.46 samples/sec Loss 3.6703 LearningRate 0.0001 Epoch: 27 Global Step: 572430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:11,433-Speed 6259.24 samples/sec Loss 3.7052 LearningRate 0.0001 Epoch: 27 Global Step: 572440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:14,701-Speed 6270.20 samples/sec Loss 3.7713 LearningRate 0.0001 Epoch: 27 Global Step: 572450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
19:48:17,944-Speed 6314.98 samples/sec Loss 3.6860 LearningRate 0.0001 Epoch: 27 Global Step: 572460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:21,277-Speed 6145.91 samples/sec Loss 3.7415 LearningRate 0.0001 Epoch: 27 Global Step: 572470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:24,583-Speed 6195.73 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 572480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:27,943-Speed 6097.81 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 572490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:31,230-Speed 6232.51 samples/sec Loss 3.7316 LearningRate 0.0001 Epoch: 27 Global Step: 572500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:34,475-Speed 6310.87 samples/sec Loss 3.7248 LearningRate 0.0001 Epoch: 27 Global Step: 572510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:37,718-Speed 6317.46 samples/sec Loss 3.7173 LearningRate 0.0001 Epoch: 27 Global Step: 572520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:48:40,967-Speed 6303.92 samples/sec Loss 3.6683 LearningRate 0.0001 Epoch: 27 Global Step: 572530 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:44,214-Speed 6308.88 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 572540 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:47,464-Speed 6304.61 samples/sec Loss 3.7547 LearningRate 0.0001 Epoch: 27 Global Step: 572550 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:50,713-Speed 6303.48 samples/sec Loss 3.7797 LearningRate 0.0001 Epoch: 27 Global Step: 572560 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:53,959-Speed 6310.90 samples/sec Loss 3.7567 LearningRate 0.0001 Epoch: 27 Global Step: 572570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:48:57,208-Speed 6306.00 samples/sec 
Loss 3.8660 LearningRate 0.0001 Epoch: 27 Global Step: 572580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:49:00,451-Speed 6316.73 samples/sec Loss 3.7923 LearningRate 0.0001 Epoch: 27 Global Step: 572590 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:49:03,688-Speed 6326.71 samples/sec Loss 3.8248 LearningRate 0.0001 Epoch: 27 Global Step: 572600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:07,017-Speed 6153.34 samples/sec Loss 3.7016 LearningRate 0.0001 Epoch: 27 Global Step: 572610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:10,270-Speed 6298.18 samples/sec Loss 3.7316 LearningRate 0.0001 Epoch: 27 Global Step: 572620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:13,517-Speed 6309.20 samples/sec Loss 3.8013 LearningRate 0.0001 Epoch: 27 Global Step: 572630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:16,764-Speed 6309.60 samples/sec Loss 3.7307 LearningRate 0.0001 Epoch: 27 Global Step: 572640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:20,010-Speed 6310.76 samples/sec Loss 3.8170 LearningRate 0.0001 Epoch: 27 Global Step: 572650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:23,255-Speed 6311.82 samples/sec Loss 3.7172 LearningRate 0.0001 Epoch: 27 Global Step: 572660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:26,504-Speed 6305.44 samples/sec Loss 3.8374 LearningRate 0.0001 Epoch: 27 Global Step: 572670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:29,748-Speed 6313.74 samples/sec Loss 3.7338 LearningRate 0.0001 Epoch: 27 Global Step: 572680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:32,992-Speed 6315.10 samples/sec Loss 3.7263 LearningRate 0.0001 Epoch: 27 Global Step: 572690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:36,224-Speed 6338.42 samples/sec Loss 3.7373 LearningRate 0.0001 Epoch: 27 
Global Step: 572700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:39,465-Speed 6319.26 samples/sec Loss 3.7842 LearningRate 0.0001 Epoch: 27 Global Step: 572710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:42,714-Speed 6305.84 samples/sec Loss 3.7472 LearningRate 0.0001 Epoch: 27 Global Step: 572720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:45,962-Speed 6306.66 samples/sec Loss 3.7858 LearningRate 0.0001 Epoch: 27 Global Step: 572730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:49,207-Speed 6312.31 samples/sec Loss 3.7563 LearningRate 0.0001 Epoch: 27 Global Step: 572740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:52,450-Speed 6316.80 samples/sec Loss 3.7374 LearningRate 0.0001 Epoch: 27 Global Step: 572750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:55,694-Speed 6314.76 samples/sec Loss 3.6685 LearningRate 0.0001 Epoch: 27 Global Step: 572760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:49:58,939-Speed 6313.18 samples/sec Loss 3.7832 LearningRate 0.0001 Epoch: 27 Global Step: 572770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:02,183-Speed 6313.52 samples/sec Loss 3.7579 LearningRate 0.0001 Epoch: 27 Global Step: 572780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:05,427-Speed 6315.24 samples/sec Loss 3.7503 LearningRate 0.0001 Epoch: 27 Global Step: 572790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:08,670-Speed 6315.23 samples/sec Loss 3.7708 LearningRate 0.0001 Epoch: 27 Global Step: 572800 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:50:11,915-Speed 6312.77 samples/sec Loss 3.7133 LearningRate 0.0001 Epoch: 27 Global Step: 572810 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:50:15,147-Speed 6339.05 samples/sec Loss 3.7306 LearningRate 0.0001 Epoch: 27 Global Step: 572820 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 19:50:18,393-Speed 6310.46 samples/sec Loss 3.7941 LearningRate 0.0001 Epoch: 27 Global Step: 572830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:21,642-Speed 6304.84 samples/sec Loss 3.7282 LearningRate 0.0001 Epoch: 27 Global Step: 572840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:24,887-Speed 6312.27 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 27 Global Step: 572850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:28,134-Speed 6310.13 samples/sec Loss 3.7156 LearningRate 0.0001 Epoch: 27 Global Step: 572860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:31,382-Speed 6307.75 samples/sec Loss 3.7519 LearningRate 0.0001 Epoch: 27 Global Step: 572870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:34,624-Speed 6317.78 samples/sec Loss 3.7208 LearningRate 0.0001 Epoch: 27 Global Step: 572880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:37,870-Speed 6311.28 samples/sec Loss 3.6944 LearningRate 0.0001 Epoch: 27 Global Step: 572890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:41,114-Speed 6314.11 samples/sec Loss 3.7469 LearningRate 0.0001 Epoch: 27 Global Step: 572900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:44,359-Speed 6312.34 samples/sec Loss 3.7720 LearningRate 0.0001 Epoch: 27 Global Step: 572910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:50:47,605-Speed 6311.94 samples/sec Loss 3.7217 LearningRate 0.0001 Epoch: 27 Global Step: 572920 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:50:50,851-Speed 6309.11 samples/sec Loss 3.7607 LearningRate 0.0001 Epoch: 27 Global Step: 572930 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:50:54,099-Speed 6307.86 samples/sec Loss 3.7612 LearningRate 0.0001 Epoch: 27 Global Step: 572940 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 
19:50:57,328-Speed 6342.70 samples/sec Loss 3.7889 LearningRate 0.0001 Epoch: 27 Global Step: 572950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:00,574-Speed 6311.55 samples/sec Loss 3.8002 LearningRate 0.0001 Epoch: 27 Global Step: 572960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:03,821-Speed 6307.92 samples/sec Loss 3.7797 LearningRate 0.0001 Epoch: 27 Global Step: 572970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:07,068-Speed 6309.74 samples/sec Loss 3.7501 LearningRate 0.0001 Epoch: 27 Global Step: 572980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:10,313-Speed 6312.29 samples/sec Loss 3.7528 LearningRate 0.0001 Epoch: 27 Global Step: 572990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:13,556-Speed 6316.90 samples/sec Loss 3.7283 LearningRate 0.0001 Epoch: 27 Global Step: 573000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:16,799-Speed 6316.09 samples/sec Loss 3.7448 LearningRate 0.0001 Epoch: 27 Global Step: 573010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:20,044-Speed 6313.17 samples/sec Loss 3.7187 LearningRate 0.0001 Epoch: 27 Global Step: 573020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:23,288-Speed 6314.08 samples/sec Loss 3.7029 LearningRate 0.0001 Epoch: 27 Global Step: 573030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:26,532-Speed 6314.13 samples/sec Loss 3.7731 LearningRate 0.0001 Epoch: 27 Global Step: 573040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:29,776-Speed 6315.75 samples/sec Loss 3.7759 LearningRate 0.0001 Epoch: 27 Global Step: 573050 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:51:33,011-Speed 6331.87 samples/sec Loss 3.7694 LearningRate 0.0001 Epoch: 27 Global Step: 573060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:36,259-Speed 6306.34 samples/sec 
Loss 3.7856 LearningRate 0.0001 Epoch: 27 Global Step: 573070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:39,503-Speed 6313.87 samples/sec Loss 3.7207 LearningRate 0.0001 Epoch: 27 Global Step: 573080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:42,746-Speed 6316.30 samples/sec Loss 3.7714 LearningRate 0.0001 Epoch: 27 Global Step: 573090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:45,998-Speed 6301.69 samples/sec Loss 3.7750 LearningRate 0.0001 Epoch: 27 Global Step: 573100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:49,245-Speed 6307.33 samples/sec Loss 3.7613 LearningRate 0.0001 Epoch: 27 Global Step: 573110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:52,490-Speed 6314.06 samples/sec Loss 3.6861 LearningRate 0.0001 Epoch: 27 Global Step: 573120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:55,738-Speed 6306.42 samples/sec Loss 3.8090 LearningRate 0.0001 Epoch: 27 Global Step: 573130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:51:58,983-Speed 6313.27 samples/sec Loss 3.6735 LearningRate 0.0001 Epoch: 27 Global Step: 573140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:02,231-Speed 6306.36 samples/sec Loss 3.7130 LearningRate 0.0001 Epoch: 27 Global Step: 573150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:05,484-Speed 6297.26 samples/sec Loss 3.7711 LearningRate 0.0001 Epoch: 27 Global Step: 573160 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:52:08,718-Speed 6333.31 samples/sec Loss 3.6902 LearningRate 0.0001 Epoch: 27 Global Step: 573170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:11,963-Speed 6313.86 samples/sec Loss 3.7678 LearningRate 0.0001 Epoch: 27 Global Step: 573180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:15,205-Speed 6317.96 samples/sec Loss 3.7302 LearningRate 0.0001 Epoch: 27 
Global Step: 573190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:18,450-Speed 6313.51 samples/sec Loss 3.7709 LearningRate 0.0001 Epoch: 27 Global Step: 573200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:21,695-Speed 6311.49 samples/sec Loss 3.7257 LearningRate 0.0001 Epoch: 27 Global Step: 573210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:24,945-Speed 6303.45 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 573220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:28,191-Speed 6310.51 samples/sec Loss 3.7492 LearningRate 0.0001 Epoch: 27 Global Step: 573230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:31,436-Speed 6312.49 samples/sec Loss 3.8424 LearningRate 0.0001 Epoch: 27 Global Step: 573240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:34,685-Speed 6305.57 samples/sec Loss 3.8226 LearningRate 0.0001 Epoch: 27 Global Step: 573250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:37,928-Speed 6315.42 samples/sec Loss 3.7180 LearningRate 0.0001 Epoch: 27 Global Step: 573260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:41,172-Speed 6314.25 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 573270 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:52:44,415-Speed 6316.33 samples/sec Loss 3.8046 LearningRate 0.0001 Epoch: 27 Global Step: 573280 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:52:47,659-Speed 6315.18 samples/sec Loss 3.8032 LearningRate 0.0001 Epoch: 27 Global Step: 573290 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:52:50,891-Speed 6338.27 samples/sec Loss 3.7463 LearningRate 0.0001 Epoch: 27 Global Step: 573300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:52:54,134-Speed 6317.16 samples/sec Loss 3.7707 LearningRate 0.0001 Epoch: 27 Global Step: 573310 Fp16 Grad Scale: 
8192 Required: 23 hours Training: 2022-04-02 19:52:57,381-Speed 6308.60 samples/sec Loss 3.6456 LearningRate 0.0001 Epoch: 27 Global Step: 573320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:00,626-Speed 6312.88 samples/sec Loss 3.8343 LearningRate 0.0001 Epoch: 27 Global Step: 573330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:03,875-Speed 6305.10 samples/sec Loss 3.7562 LearningRate 0.0001 Epoch: 27 Global Step: 573340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:07,124-Speed 6305.61 samples/sec Loss 3.7769 LearningRate 0.0001 Epoch: 27 Global Step: 573350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:10,367-Speed 6315.47 samples/sec Loss 3.7125 LearningRate 0.0001 Epoch: 27 Global Step: 573360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:13,612-Speed 6314.53 samples/sec Loss 3.7939 LearningRate 0.0001 Epoch: 27 Global Step: 573370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:16,857-Speed 6312.45 samples/sec Loss 3.7572 LearningRate 0.0001 Epoch: 27 Global Step: 573380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:20,100-Speed 6316.81 samples/sec Loss 3.7595 LearningRate 0.0001 Epoch: 27 Global Step: 573390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:23,348-Speed 6306.53 samples/sec Loss 3.7425 LearningRate 0.0001 Epoch: 27 Global Step: 573400 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:53:26,588-Speed 6322.57 samples/sec Loss 3.7417 LearningRate 0.0001 Epoch: 27 Global Step: 573410 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:53:29,820-Speed 6336.34 samples/sec Loss 3.7500 LearningRate 0.0001 Epoch: 27 Global Step: 573420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:33,068-Speed 6306.69 samples/sec Loss 3.7061 LearningRate 0.0001 Epoch: 27 Global Step: 573430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 
2022-04-02 19:53:36,313-Speed 6313.99 samples/sec Loss 3.7378 LearningRate 0.0001 Epoch: 27 Global Step: 573440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:39,560-Speed 6308.52 samples/sec Loss 3.7719 LearningRate 0.0001 Epoch: 27 Global Step: 573450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:42,805-Speed 6311.21 samples/sec Loss 3.7595 LearningRate 0.0001 Epoch: 27 Global Step: 573460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:46,052-Speed 6310.79 samples/sec Loss 3.7359 LearningRate 0.0001 Epoch: 27 Global Step: 573470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:49,299-Speed 6308.37 samples/sec Loss 3.8000 LearningRate 0.0001 Epoch: 27 Global Step: 573480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:52,546-Speed 6307.81 samples/sec Loss 3.8082 LearningRate 0.0001 Epoch: 27 Global Step: 573490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:55,795-Speed 6305.47 samples/sec Loss 3.6922 LearningRate 0.0001 Epoch: 27 Global Step: 573500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:53:59,041-Speed 6310.15 samples/sec Loss 3.7823 LearningRate 0.0001 Epoch: 27 Global Step: 573510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:02,288-Speed 6309.61 samples/sec Loss 3.7824 LearningRate 0.0001 Epoch: 27 Global Step: 573520 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:05,536-Speed 6307.27 samples/sec Loss 3.7307 LearningRate 0.0001 Epoch: 27 Global Step: 573530 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:08,780-Speed 6313.98 samples/sec Loss 3.7305 LearningRate 0.0001 Epoch: 27 Global Step: 573540 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:12,023-Speed 6317.12 samples/sec Loss 3.7665 LearningRate 0.0001 Epoch: 27 Global Step: 573550 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:15,269-Speed 6310.83 
samples/sec Loss 3.7361 LearningRate 0.0001 Epoch: 27 Global Step: 573560 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:18,498-Speed 6343.51 samples/sec Loss 3.7507 LearningRate 0.0001 Epoch: 27 Global Step: 573570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:21,741-Speed 6317.46 samples/sec Loss 3.7174 LearningRate 0.0001 Epoch: 27 Global Step: 573580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:24,993-Speed 6298.67 samples/sec Loss 3.7527 LearningRate 0.0001 Epoch: 27 Global Step: 573590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:28,236-Speed 6317.11 samples/sec Loss 3.7043 LearningRate 0.0001 Epoch: 27 Global Step: 573600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:31,486-Speed 6302.04 samples/sec Loss 3.7471 LearningRate 0.0001 Epoch: 27 Global Step: 573610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:34,731-Speed 6313.04 samples/sec Loss 3.7643 LearningRate 0.0001 Epoch: 27 Global Step: 573620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:37,972-Speed 6320.60 samples/sec Loss 3.6829 LearningRate 0.0001 Epoch: 27 Global Step: 573630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:41,216-Speed 6314.29 samples/sec Loss 3.6920 LearningRate 0.0001 Epoch: 27 Global Step: 573640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:44,459-Speed 6316.85 samples/sec Loss 3.7801 LearningRate 0.0001 Epoch: 27 Global Step: 573650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:47,702-Speed 6317.70 samples/sec Loss 3.7593 LearningRate 0.0001 Epoch: 27 Global Step: 573660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:54:50,950-Speed 6306.58 samples/sec Loss 3.7752 LearningRate 0.0001 Epoch: 27 Global Step: 573670 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:54,197-Speed 6308.82 samples/sec Loss 3.6935 LearningRate 
0.0001 Epoch: 27 Global Step: 573680 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:54:57,427-Speed 6340.95 samples/sec Loss 3.7193 LearningRate 0.0001 Epoch: 27 Global Step: 573690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:00,673-Speed 6311.05 samples/sec Loss 3.7285 LearningRate 0.0001 Epoch: 27 Global Step: 573700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:03,921-Speed 6307.50 samples/sec Loss 3.7422 LearningRate 0.0001 Epoch: 27 Global Step: 573710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:07,164-Speed 6315.87 samples/sec Loss 3.7677 LearningRate 0.0001 Epoch: 27 Global Step: 573720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:10,421-Speed 6289.05 samples/sec Loss 3.7640 LearningRate 0.0001 Epoch: 27 Global Step: 573730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:13,686-Speed 6274.70 samples/sec Loss 3.7027 LearningRate 0.0001 Epoch: 27 Global Step: 573740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:16,929-Speed 6316.79 samples/sec Loss 3.7246 LearningRate 0.0001 Epoch: 27 Global Step: 573750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:20,183-Speed 6294.81 samples/sec Loss 3.7399 LearningRate 0.0001 Epoch: 27 Global Step: 573760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:23,427-Speed 6315.88 samples/sec Loss 3.7449 LearningRate 0.0001 Epoch: 27 Global Step: 573770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:26,674-Speed 6308.73 samples/sec Loss 3.6936 LearningRate 0.0001 Epoch: 27 Global Step: 573780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:29,904-Speed 6341.04 samples/sec Loss 3.7682 LearningRate 0.0001 Epoch: 27 Global Step: 573790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:33,153-Speed 6305.33 samples/sec Loss 3.7423 LearningRate 0.0001 Epoch: 27 Global Step: 573800 Fp16 
Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:36,402-Speed 6306.15 samples/sec Loss 3.6997 LearningRate 0.0001 Epoch: 27 Global Step: 573810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:39,646-Speed 6313.83 samples/sec Loss 3.6679 LearningRate 0.0001 Epoch: 27 Global Step: 573820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:42,890-Speed 6314.46 samples/sec Loss 3.7207 LearningRate 0.0001 Epoch: 27 Global Step: 573830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:46,145-Speed 6294.59 samples/sec Loss 3.7739 LearningRate 0.0001 Epoch: 27 Global Step: 573840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:49,390-Speed 6311.11 samples/sec Loss 3.7456 LearningRate 0.0001 Epoch: 27 Global Step: 573850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:52,637-Speed 6310.19 samples/sec Loss 3.7160 LearningRate 0.0001 Epoch: 27 Global Step: 573860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:55,881-Speed 6314.04 samples/sec Loss 3.6914 LearningRate 0.0001 Epoch: 27 Global Step: 573870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:55:59,125-Speed 6313.32 samples/sec Loss 3.7458 LearningRate 0.0001 Epoch: 27 Global Step: 573880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:02,367-Speed 6319.08 samples/sec Loss 3.7259 LearningRate 0.0001 Epoch: 27 Global Step: 573890 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:05,613-Speed 6311.32 samples/sec Loss 3.7975 LearningRate 0.0001 Epoch: 27 Global Step: 573900 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:08,860-Speed 6307.74 samples/sec Loss 3.7645 LearningRate 0.0001 Epoch: 27 Global Step: 573910 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:12,105-Speed 6312.53 samples/sec Loss 3.7843 LearningRate 0.0001 Epoch: 27 Global Step: 573920 Fp16 Grad Scale: 16384 Required: 23 hours 
Training: 2022-04-02 19:56:15,351-Speed 6311.10 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 573930 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:18,596-Speed 6313.11 samples/sec Loss 3.7864 LearningRate 0.0001 Epoch: 27 Global Step: 573940 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:21,843-Speed 6308.41 samples/sec Loss 3.7065 LearningRate 0.0001 Epoch: 27 Global Step: 573950 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:25,087-Speed 6315.79 samples/sec Loss 3.7784 LearningRate 0.0001 Epoch: 27 Global Step: 573960 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:28,332-Speed 6311.49 samples/sec Loss 3.7635 LearningRate 0.0001 Epoch: 27 Global Step: 573970 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:56:31,562-Speed 6341.94 samples/sec Loss 3.7903 LearningRate 0.0001 Epoch: 27 Global Step: 573980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:34,804-Speed 6318.47 samples/sec Loss 3.7536 LearningRate 0.0001 Epoch: 27 Global Step: 573990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:38,052-Speed 6308.90 samples/sec Loss 3.7632 LearningRate 0.0001 Epoch: 27 Global Step: 574000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:41,296-Speed 6314.24 samples/sec Loss 3.6542 LearningRate 0.0001 Epoch: 27 Global Step: 574010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:44,546-Speed 6302.39 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 27 Global Step: 574020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:47,790-Speed 6314.39 samples/sec Loss 3.7483 LearningRate 0.0001 Epoch: 27 Global Step: 574030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:51,036-Speed 6312.88 samples/sec Loss 3.7181 LearningRate 0.0001 Epoch: 27 Global Step: 574040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
19:56:54,282-Speed 6310.42 samples/sec Loss 3.6982 LearningRate 0.0001 Epoch: 27 Global Step: 574050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:56:57,537-Speed 6293.61 samples/sec Loss 3.7957 LearningRate 0.0001 Epoch: 27 Global Step: 574060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:00,785-Speed 6306.27 samples/sec Loss 3.6915 LearningRate 0.0001 Epoch: 27 Global Step: 574070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:04,030-Speed 6312.93 samples/sec Loss 3.7451 LearningRate 0.0001 Epoch: 27 Global Step: 574080 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:57:07,266-Speed 6330.69 samples/sec Loss 3.8090 LearningRate 0.0001 Epoch: 27 Global Step: 574090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:10,513-Speed 6308.52 samples/sec Loss 3.7994 LearningRate 0.0001 Epoch: 27 Global Step: 574100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:13,767-Speed 6294.25 samples/sec Loss 3.7259 LearningRate 0.0001 Epoch: 27 Global Step: 574110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:17,017-Speed 6304.95 samples/sec Loss 3.7586 LearningRate 0.0001 Epoch: 27 Global Step: 574120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:20,263-Speed 6310.24 samples/sec Loss 3.7712 LearningRate 0.0001 Epoch: 27 Global Step: 574130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:23,511-Speed 6306.78 samples/sec Loss 3.7420 LearningRate 0.0001 Epoch: 27 Global Step: 574140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:26,754-Speed 6315.27 samples/sec Loss 3.7654 LearningRate 0.0001 Epoch: 27 Global Step: 574150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:29,999-Speed 6314.07 samples/sec Loss 3.7407 LearningRate 0.0001 Epoch: 27 Global Step: 574160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:33,241-Speed 6318.36 samples/sec 
Loss 3.7145 LearningRate 0.0001 Epoch: 27 Global Step: 574170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:36,481-Speed 6321.50 samples/sec Loss 3.7264 LearningRate 0.0001 Epoch: 27 Global Step: 574180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:39,718-Speed 6328.90 samples/sec Loss 3.7735 LearningRate 0.0001 Epoch: 27 Global Step: 574190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:42,961-Speed 6316.43 samples/sec Loss 3.7680 LearningRate 0.0001 Epoch: 27 Global Step: 574200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:46,208-Speed 6309.31 samples/sec Loss 3.6878 LearningRate 0.0001 Epoch: 27 Global Step: 574210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:49,457-Speed 6303.89 samples/sec Loss 3.7190 LearningRate 0.0001 Epoch: 27 Global Step: 574220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:52,709-Speed 6300.80 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 27 Global Step: 574230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:55,956-Speed 6309.12 samples/sec Loss 3.7301 LearningRate 0.0001 Epoch: 27 Global Step: 574240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:57:59,201-Speed 6312.08 samples/sec Loss 3.7152 LearningRate 0.0001 Epoch: 27 Global Step: 574250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:02,445-Speed 6313.36 samples/sec Loss 3.7326 LearningRate 0.0001 Epoch: 27 Global Step: 574260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:05,689-Speed 6315.87 samples/sec Loss 3.7608 LearningRate 0.0001 Epoch: 27 Global Step: 574270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:08,937-Speed 6305.96 samples/sec Loss 3.7165 LearningRate 0.0001 Epoch: 27 Global Step: 574280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:12,183-Speed 6312.33 samples/sec Loss 3.7883 LearningRate 0.0001 Epoch: 27 
Global Step: 574290 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:58:15,430-Speed 6308.64 samples/sec Loss 3.7069 LearningRate 0.0001 Epoch: 27 Global Step: 574300 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 19:58:18,662-Speed 6336.53 samples/sec Loss 3.6814 LearningRate 0.0001 Epoch: 27 Global Step: 574310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:21,916-Speed 6295.23 samples/sec Loss 3.7346 LearningRate 0.0001 Epoch: 27 Global Step: 574320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:25,161-Speed 6313.05 samples/sec Loss 3.7407 LearningRate 0.0001 Epoch: 27 Global Step: 574330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:28,406-Speed 6312.68 samples/sec Loss 3.7861 LearningRate 0.0001 Epoch: 27 Global Step: 574340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:31,649-Speed 6316.21 samples/sec Loss 3.6858 LearningRate 0.0001 Epoch: 27 Global Step: 574350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:34,894-Speed 6314.32 samples/sec Loss 3.6595 LearningRate 0.0001 Epoch: 27 Global Step: 574360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:38,142-Speed 6305.90 samples/sec Loss 3.7540 LearningRate 0.0001 Epoch: 27 Global Step: 574370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:41,392-Speed 6302.76 samples/sec Loss 3.7814 LearningRate 0.0001 Epoch: 27 Global Step: 574380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:44,649-Speed 6290.68 samples/sec Loss 3.7196 LearningRate 0.0001 Epoch: 27 Global Step: 574390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:47,897-Speed 6306.31 samples/sec Loss 3.6861 LearningRate 0.0001 Epoch: 27 Global Step: 574400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:51,145-Speed 6308.09 samples/sec Loss 3.7705 LearningRate 0.0001 Epoch: 27 Global Step: 574410 Fp16 Grad Scale: 
16384 Required: 23 hours Training: 2022-04-02 19:58:54,373-Speed 6345.35 samples/sec Loss 3.7033 LearningRate 0.0001 Epoch: 27 Global Step: 574420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:58:57,620-Speed 6308.50 samples/sec Loss 3.7040 LearningRate 0.0001 Epoch: 27 Global Step: 574430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:00,864-Speed 6314.57 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 574440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:04,123-Speed 6286.81 samples/sec Loss 3.7537 LearningRate 0.0001 Epoch: 27 Global Step: 574450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:07,365-Speed 6318.05 samples/sec Loss 3.8195 LearningRate 0.0001 Epoch: 27 Global Step: 574460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:10,606-Speed 6319.96 samples/sec Loss 3.7434 LearningRate 0.0001 Epoch: 27 Global Step: 574470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:13,852-Speed 6310.06 samples/sec Loss 3.7882 LearningRate 0.0001 Epoch: 27 Global Step: 574480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:17,097-Speed 6314.19 samples/sec Loss 3.7933 LearningRate 0.0001 Epoch: 27 Global Step: 574490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:20,344-Speed 6307.25 samples/sec Loss 3.8138 LearningRate 0.0001 Epoch: 27 Global Step: 574500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:23,591-Speed 6310.46 samples/sec Loss 3.6939 LearningRate 0.0001 Epoch: 27 Global Step: 574510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:26,823-Speed 6337.49 samples/sec Loss 3.7266 LearningRate 0.0001 Epoch: 27 Global Step: 574520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:30,117-Speed 6217.82 samples/sec Loss 3.7286 LearningRate 0.0001 Epoch: 27 Global Step: 574530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 
2022-04-02 19:59:33,372-Speed 6293.86 samples/sec Loss 3.6873 LearningRate 0.0001 Epoch: 27 Global Step: 574540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:36,616-Speed 6314.35 samples/sec Loss 3.7295 LearningRate 0.0001 Epoch: 27 Global Step: 574550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:39,860-Speed 6313.93 samples/sec Loss 3.7989 LearningRate 0.0001 Epoch: 27 Global Step: 574560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:43,100-Speed 6323.32 samples/sec Loss 3.7143 LearningRate 0.0001 Epoch: 27 Global Step: 574570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:46,344-Speed 6314.24 samples/sec Loss 3.7750 LearningRate 0.0001 Epoch: 27 Global Step: 574580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:49,589-Speed 6312.11 samples/sec Loss 3.6866 LearningRate 0.0001 Epoch: 27 Global Step: 574590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:52,836-Speed 6309.54 samples/sec Loss 3.7022 LearningRate 0.0001 Epoch: 27 Global Step: 574600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:56,078-Speed 6318.53 samples/sec Loss 3.7401 LearningRate 0.0001 Epoch: 27 Global Step: 574610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 19:59:59,324-Speed 6310.33 samples/sec Loss 3.7322 LearningRate 0.0001 Epoch: 27 Global Step: 574620 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:00:02,570-Speed 6311.66 samples/sec Loss 3.7365 LearningRate 0.0001 Epoch: 27 Global Step: 574630 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:00:05,814-Speed 6314.67 samples/sec Loss 3.6967 LearningRate 0.0001 Epoch: 27 Global Step: 574640 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:00:09,063-Speed 6303.66 samples/sec Loss 3.6819 LearningRate 0.0001 Epoch: 27 Global Step: 574650 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:00:12,305-Speed 6318.40 
samples/sec Loss 3.7261 LearningRate 0.0001 Epoch: 27 Global Step: 574660 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:00:15,536-Speed 6341.30 samples/sec Loss 3.7271 LearningRate 0.0001 Epoch: 27 Global Step: 574670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:18,781-Speed 6312.85 samples/sec Loss 3.7088 LearningRate 0.0001 Epoch: 27 Global Step: 574680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:22,026-Speed 6312.31 samples/sec Loss 3.7831 LearningRate 0.0001 Epoch: 27 Global Step: 574690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:25,272-Speed 6312.48 samples/sec Loss 3.7333 LearningRate 0.0001 Epoch: 27 Global Step: 574700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:28,515-Speed 6316.10 samples/sec Loss 3.7815 LearningRate 0.0001 Epoch: 27 Global Step: 574710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:31,758-Speed 6316.67 samples/sec Loss 3.7821 LearningRate 0.0001 Epoch: 27 Global Step: 574720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:35,001-Speed 6315.49 samples/sec Loss 3.8025 LearningRate 0.0001 Epoch: 27 Global Step: 574730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:38,251-Speed 6303.62 samples/sec Loss 3.7349 LearningRate 0.0001 Epoch: 27 Global Step: 574740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:41,497-Speed 6310.95 samples/sec Loss 3.7907 LearningRate 0.0001 Epoch: 27 Global Step: 574750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:44,740-Speed 6316.37 samples/sec Loss 3.7626 LearningRate 0.0001 Epoch: 27 Global Step: 574760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:47,974-Speed 6334.24 samples/sec Loss 3.7421 LearningRate 0.0001 Epoch: 27 Global Step: 574770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:51,218-Speed 6314.86 samples/sec Loss 3.7985 LearningRate 
0.0001 Epoch: 27 Global Step: 574780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:54,465-Speed 6308.50 samples/sec Loss 3.7155 LearningRate 0.0001 Epoch: 27 Global Step: 574790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:00:57,712-Speed 6308.87 samples/sec Loss 3.7666 LearningRate 0.0001 Epoch: 27 Global Step: 574800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:00,967-Speed 6292.05 samples/sec Loss 3.6828 LearningRate 0.0001 Epoch: 27 Global Step: 574810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:04,215-Speed 6307.90 samples/sec Loss 3.7283 LearningRate 0.0001 Epoch: 27 Global Step: 574820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:07,463-Speed 6305.77 samples/sec Loss 3.7501 LearningRate 0.0001 Epoch: 27 Global Step: 574830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:10,709-Speed 6311.75 samples/sec Loss 3.7261 LearningRate 0.0001 Epoch: 27 Global Step: 574840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:13,957-Speed 6307.19 samples/sec Loss 3.8060 LearningRate 0.0001 Epoch: 27 Global Step: 574850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:17,203-Speed 6309.03 samples/sec Loss 3.6405 LearningRate 0.0001 Epoch: 27 Global Step: 574860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:20,449-Speed 6312.14 samples/sec Loss 3.7173 LearningRate 0.0001 Epoch: 27 Global Step: 574870 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:01:23,678-Speed 6344.22 samples/sec Loss 3.7348 LearningRate 0.0001 Epoch: 27 Global Step: 574880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:26,924-Speed 6310.34 samples/sec Loss 3.7801 LearningRate 0.0001 Epoch: 27 Global Step: 574890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:30,171-Speed 6308.81 samples/sec Loss 3.6951 LearningRate 0.0001 Epoch: 27 Global Step: 574900 Fp16 
Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:33,441-Speed 6264.96 samples/sec Loss 3.7214 LearningRate 0.0001 Epoch: 27 Global Step: 574910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:36,686-Speed 6312.66 samples/sec Loss 3.7523 LearningRate 0.0001 Epoch: 27 Global Step: 574920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:39,934-Speed 6306.68 samples/sec Loss 3.7422 LearningRate 0.0001 Epoch: 27 Global Step: 574930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:43,178-Speed 6314.71 samples/sec Loss 3.7434 LearningRate 0.0001 Epoch: 27 Global Step: 574940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:46,423-Speed 6311.66 samples/sec Loss 3.7411 LearningRate 0.0001 Epoch: 27 Global Step: 574950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:49,668-Speed 6313.49 samples/sec Loss 3.7432 LearningRate 0.0001 Epoch: 27 Global Step: 574960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:52,912-Speed 6314.49 samples/sec Loss 3.6950 LearningRate 0.0001 Epoch: 27 Global Step: 574970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:01:56,159-Speed 6309.34 samples/sec Loss 3.8556 LearningRate 0.0001 Epoch: 27 Global Step: 574980 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:01:59,412-Speed 6296.00 samples/sec Loss 3.7679 LearningRate 0.0001 Epoch: 27 Global Step: 574990 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:02:02,642-Speed 6342.45 samples/sec Loss 3.6850 LearningRate 0.0001 Epoch: 27 Global Step: 575000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:05,891-Speed 6304.77 samples/sec Loss 3.7732 LearningRate 0.0001 Epoch: 27 Global Step: 575010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:09,137-Speed 6312.40 samples/sec Loss 3.7279 LearningRate 0.0001 Epoch: 27 Global Step: 575020 Fp16 Grad Scale: 8192 Required: 23 hours 
Training: 2022-04-02 20:02:12,384-Speed 6308.27 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 575030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:15,633-Speed 6305.72 samples/sec Loss 3.7675 LearningRate 0.0001 Epoch: 27 Global Step: 575040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:18,875-Speed 6317.02 samples/sec Loss 3.7861 LearningRate 0.0001 Epoch: 27 Global Step: 575050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:22,122-Speed 6309.19 samples/sec Loss 3.7509 LearningRate 0.0001 Epoch: 27 Global Step: 575060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:25,365-Speed 6315.93 samples/sec Loss 3.6577 LearningRate 0.0001 Epoch: 27 Global Step: 575070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:28,611-Speed 6312.56 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 575080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:31,854-Speed 6315.59 samples/sec Loss 3.7979 LearningRate 0.0001 Epoch: 27 Global Step: 575090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:35,102-Speed 6306.15 samples/sec Loss 3.6870 LearningRate 0.0001 Epoch: 27 Global Step: 575100 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:02:38,347-Speed 6312.67 samples/sec Loss 3.7974 LearningRate 0.0001 Epoch: 27 Global Step: 575110 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:02:41,594-Speed 6310.31 samples/sec Loss 3.7490 LearningRate 0.0001 Epoch: 27 Global Step: 575120 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:02:44,824-Speed 6342.45 samples/sec Loss 3.7812 LearningRate 0.0001 Epoch: 27 Global Step: 575130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:48,066-Speed 6317.58 samples/sec Loss 3.7109 LearningRate 0.0001 Epoch: 27 Global Step: 575140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:51,319-Speed 
6296.87 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 27 Global Step: 575150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:54,560-Speed 6321.73 samples/sec Loss 3.7287 LearningRate 0.0001 Epoch: 27 Global Step: 575160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:02:57,805-Speed 6312.78 samples/sec Loss 3.7798 LearningRate 0.0001 Epoch: 27 Global Step: 575170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:01,051-Speed 6310.21 samples/sec Loss 3.6349 LearningRate 0.0001 Epoch: 27 Global Step: 575180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:04,301-Speed 6303.25 samples/sec Loss 3.7472 LearningRate 0.0001 Epoch: 27 Global Step: 575190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:07,546-Speed 6311.26 samples/sec Loss 3.7421 LearningRate 0.0001 Epoch: 27 Global Step: 575200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:10,793-Speed 6309.24 samples/sec Loss 3.7010 LearningRate 0.0001 Epoch: 27 Global Step: 575210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:14,042-Speed 6305.17 samples/sec Loss 3.6690 LearningRate 0.0001 Epoch: 27 Global Step: 575220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:17,287-Speed 6312.05 samples/sec Loss 3.7461 LearningRate 0.0001 Epoch: 27 Global Step: 575230 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:03:20,529-Speed 6319.28 samples/sec Loss 3.7438 LearningRate 0.0001 Epoch: 27 Global Step: 575240 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:03:23,760-Speed 6340.28 samples/sec Loss 3.7576 LearningRate 0.0001 Epoch: 27 Global Step: 575250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:27,004-Speed 6315.27 samples/sec Loss 3.7753 LearningRate 0.0001 Epoch: 27 Global Step: 575260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:30,261-Speed 6289.01 samples/sec Loss 3.7406 
LearningRate 0.0001 Epoch: 27 Global Step: 575270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:33,507-Speed 6309.47 samples/sec Loss 3.7141 LearningRate 0.0001 Epoch: 27 Global Step: 575280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:36,753-Speed 6311.74 samples/sec Loss 3.7717 LearningRate 0.0001 Epoch: 27 Global Step: 575290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:40,004-Speed 6301.14 samples/sec Loss 3.7239 LearningRate 0.0001 Epoch: 27 Global Step: 575300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:43,249-Speed 6312.80 samples/sec Loss 3.7330 LearningRate 0.0001 Epoch: 27 Global Step: 575310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:46,496-Speed 6307.86 samples/sec Loss 3.7063 LearningRate 0.0001 Epoch: 27 Global Step: 575320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:49,741-Speed 6313.32 samples/sec Loss 3.7159 LearningRate 0.0001 Epoch: 27 Global Step: 575330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:52,985-Speed 6314.26 samples/sec Loss 3.7239 LearningRate 0.0001 Epoch: 27 Global Step: 575340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:03:56,234-Speed 6304.39 samples/sec Loss 3.7191 LearningRate 0.0001 Epoch: 27 Global Step: 575350 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:03:59,483-Speed 6305.28 samples/sec Loss 3.7329 LearningRate 0.0001 Epoch: 27 Global Step: 575360 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:02,761-Speed 6249.18 samples/sec Loss 3.7401 LearningRate 0.0001 Epoch: 27 Global Step: 575370 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:06,004-Speed 6316.49 samples/sec Loss 3.7182 LearningRate 0.0001 Epoch: 27 Global Step: 575380 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:09,251-Speed 6309.09 samples/sec Loss 3.7287 LearningRate 0.0001 Epoch: 27 Global 
Step: 575390 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:12,481-Speed 6342.67 samples/sec Loss 3.7616 LearningRate 0.0001 Epoch: 27 Global Step: 575400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:15,726-Speed 6313.09 samples/sec Loss 3.6972 LearningRate 0.0001 Epoch: 27 Global Step: 575410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:18,980-Speed 6294.81 samples/sec Loss 3.6951 LearningRate 0.0001 Epoch: 27 Global Step: 575420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:22,236-Speed 6291.59 samples/sec Loss 3.7283 LearningRate 0.0001 Epoch: 27 Global Step: 575430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:25,483-Speed 6308.63 samples/sec Loss 3.6890 LearningRate 0.0001 Epoch: 27 Global Step: 575440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:28,727-Speed 6314.28 samples/sec Loss 3.7903 LearningRate 0.0001 Epoch: 27 Global Step: 575450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:31,976-Speed 6305.56 samples/sec Loss 3.6791 LearningRate 0.0001 Epoch: 27 Global Step: 575460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:35,220-Speed 6314.86 samples/sec Loss 3.7187 LearningRate 0.0001 Epoch: 27 Global Step: 575470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:38,462-Speed 6317.54 samples/sec Loss 3.7505 LearningRate 0.0001 Epoch: 27 Global Step: 575480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:41,718-Speed 6292.32 samples/sec Loss 3.7411 LearningRate 0.0001 Epoch: 27 Global Step: 575490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:04:44,962-Speed 6312.77 samples/sec Loss 3.8001 LearningRate 0.0001 Epoch: 27 Global Step: 575500 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:48,208-Speed 6310.60 samples/sec Loss 3.7337 LearningRate 0.0001 Epoch: 27 Global Step: 575510 Fp16 Grad Scale: 16384 
Required: 23 hours Training: 2022-04-02 20:04:51,456-Speed 6307.61 samples/sec Loss 3.7433 LearningRate 0.0001 Epoch: 27 Global Step: 575520 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:54,699-Speed 6315.72 samples/sec Loss 3.7160 LearningRate 0.0001 Epoch: 27 Global Step: 575530 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:04:57,944-Speed 6313.30 samples/sec Loss 3.7407 LearningRate 0.0001 Epoch: 27 Global Step: 575540 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:05:01,173-Speed 6345.20 samples/sec Loss 3.7056 LearningRate 0.0001 Epoch: 27 Global Step: 575550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:04,420-Speed 6306.86 samples/sec Loss 3.6787 LearningRate 0.0001 Epoch: 27 Global Step: 575560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:07,664-Speed 6316.00 samples/sec Loss 3.7050 LearningRate 0.0001 Epoch: 27 Global Step: 575570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:10,912-Speed 6306.35 samples/sec Loss 3.6822 LearningRate 0.0001 Epoch: 27 Global Step: 575580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:14,157-Speed 6313.40 samples/sec Loss 3.7013 LearningRate 0.0001 Epoch: 27 Global Step: 575590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:17,401-Speed 6314.02 samples/sec Loss 3.7346 LearningRate 0.0001 Epoch: 27 Global Step: 575600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:20,659-Speed 6287.62 samples/sec Loss 3.8101 LearningRate 0.0001 Epoch: 27 Global Step: 575610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:23,902-Speed 6317.67 samples/sec Loss 3.7752 LearningRate 0.0001 Epoch: 27 Global Step: 575620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:27,147-Speed 6311.47 samples/sec Loss 3.7090 LearningRate 0.0001 Epoch: 27 Global Step: 575630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:05:30,393-Speed 6311.77 samples/sec Loss 3.7398 LearningRate 0.0001 Epoch: 27 Global Step: 575640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:33,634-Speed 6319.83 samples/sec Loss 3.6637 LearningRate 0.0001 Epoch: 27 Global Step: 575650 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:05:36,882-Speed 6306.36 samples/sec Loss 3.7114 LearningRate 0.0001 Epoch: 27 Global Step: 575660 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:05:40,127-Speed 6313.86 samples/sec Loss 3.7415 LearningRate 0.0001 Epoch: 27 Global Step: 575670 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:05:43,374-Speed 6307.54 samples/sec Loss 3.7314 LearningRate 0.0001 Epoch: 27 Global Step: 575680 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:05:46,609-Speed 6331.91 samples/sec Loss 3.7485 LearningRate 0.0001 Epoch: 27 Global Step: 575690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:49,854-Speed 6314.09 samples/sec Loss 3.7705 LearningRate 0.0001 Epoch: 27 Global Step: 575700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:53,098-Speed 6314.40 samples/sec Loss 3.7599 LearningRate 0.0001 Epoch: 27 Global Step: 575710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:56,342-Speed 6313.25 samples/sec Loss 3.7127 LearningRate 0.0001 Epoch: 27 Global Step: 575720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:05:59,590-Speed 6308.54 samples/sec Loss 3.7011 LearningRate 0.0001 Epoch: 27 Global Step: 575730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:02,835-Speed 6311.12 samples/sec Loss 3.6680 LearningRate 0.0001 Epoch: 27 Global Step: 575740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:06,082-Speed 6308.77 samples/sec Loss 3.6727 LearningRate 0.0001 Epoch: 27 Global Step: 575750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:09,327-Speed 6313.04 samples/sec 
Loss 3.7482 LearningRate 0.0001 Epoch: 27 Global Step: 575760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:12,575-Speed 6306.74 samples/sec Loss 3.6978 LearningRate 0.0001 Epoch: 27 Global Step: 575770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:15,831-Speed 6291.04 samples/sec Loss 3.7789 LearningRate 0.0001 Epoch: 27 Global Step: 575780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:19,073-Speed 6317.53 samples/sec Loss 3.7260 LearningRate 0.0001 Epoch: 27 Global Step: 575790 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:06:22,318-Speed 6313.47 samples/sec Loss 3.7289 LearningRate 0.0001 Epoch: 27 Global Step: 575800 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:06:25,570-Speed 6299.66 samples/sec Loss 3.6772 LearningRate 0.0001 Epoch: 27 Global Step: 575810 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:06:28,800-Speed 6343.66 samples/sec Loss 3.7067 LearningRate 0.0001 Epoch: 27 Global Step: 575820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:32,051-Speed 6299.70 samples/sec Loss 3.7104 LearningRate 0.0001 Epoch: 27 Global Step: 575830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:35,296-Speed 6313.24 samples/sec Loss 3.7545 LearningRate 0.0001 Epoch: 27 Global Step: 575840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:38,539-Speed 6316.65 samples/sec Loss 3.7599 LearningRate 0.0001 Epoch: 27 Global Step: 575850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:41,787-Speed 6307.74 samples/sec Loss 3.7186 LearningRate 0.0001 Epoch: 27 Global Step: 575860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:45,031-Speed 6314.09 samples/sec Loss 3.7453 LearningRate 0.0001 Epoch: 27 Global Step: 575870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:48,366-Speed 6141.95 samples/sec Loss 3.7368 LearningRate 0.0001 Epoch: 
27 Global Step: 575880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:51,609-Speed 6316.68 samples/sec Loss 3.7222 LearningRate 0.0001 Epoch: 27 Global Step: 575890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:54,855-Speed 6309.93 samples/sec Loss 3.7643 LearningRate 0.0001 Epoch: 27 Global Step: 575900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:06:58,097-Speed 6320.16 samples/sec Loss 3.7811 LearningRate 0.0001 Epoch: 27 Global Step: 575910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:01,345-Speed 6305.15 samples/sec Loss 3.7645 LearningRate 0.0001 Epoch: 27 Global Step: 575920 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:07:04,589-Speed 6314.35 samples/sec Loss 3.7774 LearningRate 0.0001 Epoch: 27 Global Step: 575930 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:07:07,819-Speed 6342.41 samples/sec Loss 3.7046 LearningRate 0.0001 Epoch: 27 Global Step: 575940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:11,076-Speed 6290.02 samples/sec Loss 3.7049 LearningRate 0.0001 Epoch: 27 Global Step: 575950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:14,327-Speed 6301.39 samples/sec Loss 3.7343 LearningRate 0.0001 Epoch: 27 Global Step: 575960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:17,570-Speed 6314.80 samples/sec Loss 3.6830 LearningRate 0.0001 Epoch: 27 Global Step: 575970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:20,821-Speed 6301.74 samples/sec Loss 3.7560 LearningRate 0.0001 Epoch: 27 Global Step: 575980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:24,070-Speed 6305.67 samples/sec Loss 3.6768 LearningRate 0.0001 Epoch: 27 Global Step: 575990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:27,313-Speed 6316.99 samples/sec Loss 3.7507 LearningRate 0.0001 Epoch: 27 Global Step: 576000 Fp16 Grad Scale: 
8192 Required: 23 hours Training: 2022-04-02 20:07:30,557-Speed 6313.10 samples/sec Loss 3.7219 LearningRate 0.0001 Epoch: 27 Global Step: 576010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:33,815-Speed 6288.80 samples/sec Loss 3.7016 LearningRate 0.0001 Epoch: 27 Global Step: 576020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:37,063-Speed 6306.19 samples/sec Loss 3.7233 LearningRate 0.0001 Epoch: 27 Global Step: 576030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:40,310-Speed 6308.41 samples/sec Loss 3.7523 LearningRate 0.0001 Epoch: 27 Global Step: 576040 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:07:43,543-Speed 6335.90 samples/sec Loss 3.7616 LearningRate 0.0001 Epoch: 27 Global Step: 576050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:46,789-Speed 6311.87 samples/sec Loss 3.8230 LearningRate 0.0001 Epoch: 27 Global Step: 576060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:50,036-Speed 6308.27 samples/sec Loss 3.6916 LearningRate 0.0001 Epoch: 27 Global Step: 576070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:53,280-Speed 6316.07 samples/sec Loss 3.7081 LearningRate 0.0001 Epoch: 27 Global Step: 576080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:56,524-Speed 6313.65 samples/sec Loss 3.7382 LearningRate 0.0001 Epoch: 27 Global Step: 576090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:07:59,783-Speed 6286.22 samples/sec Loss 3.7343 LearningRate 0.0001 Epoch: 27 Global Step: 576100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:03,030-Speed 6308.54 samples/sec Loss 3.7420 LearningRate 0.0001 Epoch: 27 Global Step: 576110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:06,275-Speed 6313.16 samples/sec Loss 3.7888 LearningRate 0.0001 Epoch: 27 Global Step: 576120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 
2022-04-02 20:08:09,520-Speed 6311.55 samples/sec Loss 3.7135 LearningRate 0.0001 Epoch: 27 Global Step: 576130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:12,766-Speed 6311.82 samples/sec Loss 3.7704 LearningRate 0.0001 Epoch: 27 Global Step: 576140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:16,012-Speed 6309.43 samples/sec Loss 3.8217 LearningRate 0.0001 Epoch: 27 Global Step: 576150 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:08:19,245-Speed 6336.32 samples/sec Loss 3.7909 LearningRate 0.0001 Epoch: 27 Global Step: 576160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:22,490-Speed 6312.85 samples/sec Loss 3.7514 LearningRate 0.0001 Epoch: 27 Global Step: 576170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:25,737-Speed 6309.49 samples/sec Loss 3.6855 LearningRate 0.0001 Epoch: 27 Global Step: 576180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:28,981-Speed 6314.03 samples/sec Loss 3.7520 LearningRate 0.0001 Epoch: 27 Global Step: 576190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:32,227-Speed 6310.39 samples/sec Loss 3.7690 LearningRate 0.0001 Epoch: 27 Global Step: 576200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:35,471-Speed 6314.15 samples/sec Loss 3.6895 LearningRate 0.0001 Epoch: 27 Global Step: 576210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:38,717-Speed 6312.39 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 576220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:41,963-Speed 6309.64 samples/sec Loss 3.7280 LearningRate 0.0001 Epoch: 27 Global Step: 576230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:45,210-Speed 6309.11 samples/sec Loss 3.6898 LearningRate 0.0001 Epoch: 27 Global Step: 576240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:48,454-Speed 6314.41 
samples/sec Loss 3.7532 LearningRate 0.0001 Epoch: 27 Global Step: 576250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:08:51,701-Speed 6308.96 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 27 Global Step: 576260 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:08:54,963-Speed 6280.04 samples/sec Loss 3.7590 LearningRate 0.0001 Epoch: 27 Global Step: 576270 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:08:58,210-Speed 6309.87 samples/sec Loss 3.6696 LearningRate 0.0001 Epoch: 27 Global Step: 576280 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:09:01,440-Speed 6341.87 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 576290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:04,684-Speed 6313.70 samples/sec Loss 3.7636 LearningRate 0.0001 Epoch: 27 Global Step: 576300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:07,931-Speed 6309.57 samples/sec Loss 3.7093 LearningRate 0.0001 Epoch: 27 Global Step: 576310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:11,175-Speed 6314.41 samples/sec Loss 3.7486 LearningRate 0.0001 Epoch: 27 Global Step: 576320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:14,420-Speed 6312.68 samples/sec Loss 3.7527 LearningRate 0.0001 Epoch: 27 Global Step: 576330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:17,663-Speed 6315.48 samples/sec Loss 3.7270 LearningRate 0.0001 Epoch: 27 Global Step: 576340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:20,908-Speed 6313.33 samples/sec Loss 3.7174 LearningRate 0.0001 Epoch: 27 Global Step: 576350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:24,156-Speed 6305.95 samples/sec Loss 3.7615 LearningRate 0.0001 Epoch: 27 Global Step: 576360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:27,399-Speed 6317.61 samples/sec Loss 3.7404 LearningRate 
0.0001 Epoch: 27 Global Step: 576370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:30,643-Speed 6313.61 samples/sec Loss 3.6952 LearningRate 0.0001 Epoch: 27 Global Step: 576380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:33,874-Speed 6340.68 samples/sec Loss 3.7241 LearningRate 0.0001 Epoch: 27 Global Step: 576390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:37,119-Speed 6312.80 samples/sec Loss 3.7095 LearningRate 0.0001 Epoch: 27 Global Step: 576400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:40,367-Speed 6307.03 samples/sec Loss 3.7722 LearningRate 0.0001 Epoch: 27 Global Step: 576410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:43,614-Speed 6307.43 samples/sec Loss 3.7824 LearningRate 0.0001 Epoch: 27 Global Step: 576420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:46,861-Speed 6309.12 samples/sec Loss 3.7316 LearningRate 0.0001 Epoch: 27 Global Step: 576430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:50,110-Speed 6305.40 samples/sec Loss 3.6699 LearningRate 0.0001 Epoch: 27 Global Step: 576440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:53,411-Speed 6205.31 samples/sec Loss 3.6959 LearningRate 0.0001 Epoch: 27 Global Step: 576450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:56,668-Speed 6290.71 samples/sec Loss 3.7288 LearningRate 0.0001 Epoch: 27 Global Step: 576460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:09:59,913-Speed 6311.50 samples/sec Loss 3.7631 LearningRate 0.0001 Epoch: 27 Global Step: 576470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:03,158-Speed 6312.45 samples/sec Loss 3.7244 LearningRate 0.0001 Epoch: 27 Global Step: 576480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:06,405-Speed 6308.41 samples/sec Loss 3.7889 LearningRate 0.0001 Epoch: 27 Global Step: 576490 Fp16 
Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:10:09,643-Speed 6328.03 samples/sec Loss 3.7037 LearningRate 0.0001 Epoch: 27 Global Step: 576500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:12,883-Speed 6322.03 samples/sec Loss 3.7134 LearningRate 0.0001 Epoch: 27 Global Step: 576510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:16,128-Speed 6312.39 samples/sec Loss 3.7550 LearningRate 0.0001 Epoch: 27 Global Step: 576520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:19,373-Speed 6312.51 samples/sec Loss 3.7494 LearningRate 0.0001 Epoch: 27 Global Step: 576530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:22,617-Speed 6315.00 samples/sec Loss 3.6973 LearningRate 0.0001 Epoch: 27 Global Step: 576540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:25,868-Speed 6301.12 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 576550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:29,114-Speed 6312.00 samples/sec Loss 3.6979 LearningRate 0.0001 Epoch: 27 Global Step: 576560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:32,358-Speed 6314.04 samples/sec Loss 3.7017 LearningRate 0.0001 Epoch: 27 Global Step: 576570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:35,603-Speed 6311.62 samples/sec Loss 3.6290 LearningRate 0.0001 Epoch: 27 Global Step: 576580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:38,847-Speed 6315.38 samples/sec Loss 3.6762 LearningRate 0.0001 Epoch: 27 Global Step: 576590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:42,090-Speed 6316.60 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 576600 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:10:45,335-Speed 6312.71 samples/sec Loss 3.7164 LearningRate 0.0001 Epoch: 27 Global Step: 576610 Fp16 Grad Scale: 16384 Required: 23 hours 
Training: 2022-04-02 20:10:48,579-Speed 6314.62 samples/sec Loss 3.7075 LearningRate 0.0001 Epoch: 27 Global Step: 576620 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:10:51,832-Speed 6296.62 samples/sec Loss 3.7351 LearningRate 0.0001 Epoch: 27 Global Step: 576630 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:10:55,068-Speed 6329.98 samples/sec Loss 3.7437 LearningRate 0.0001 Epoch: 27 Global Step: 576640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:10:58,320-Speed 6300.63 samples/sec Loss 3.7689 LearningRate 0.0001 Epoch: 27 Global Step: 576650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:01,563-Speed 6316.09 samples/sec Loss 3.7299 LearningRate 0.0001 Epoch: 27 Global Step: 576660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:04,810-Speed 6308.75 samples/sec Loss 3.6801 LearningRate 0.0001 Epoch: 27 Global Step: 576670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:08,054-Speed 6314.23 samples/sec Loss 3.7279 LearningRate 0.0001 Epoch: 27 Global Step: 576680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:11,303-Speed 6304.74 samples/sec Loss 3.7336 LearningRate 0.0001 Epoch: 27 Global Step: 576690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:14,549-Speed 6309.82 samples/sec Loss 3.7538 LearningRate 0.0001 Epoch: 27 Global Step: 576700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:17,799-Speed 6303.71 samples/sec Loss 3.7349 LearningRate 0.0001 Epoch: 27 Global Step: 576710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:21,048-Speed 6305.27 samples/sec Loss 3.7231 LearningRate 0.0001 Epoch: 27 Global Step: 576720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:24,292-Speed 6315.44 samples/sec Loss 3.7146 LearningRate 0.0001 Epoch: 27 Global Step: 576730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:27,548-Speed 
6290.43 samples/sec Loss 3.6663 LearningRate 0.0001 Epoch: 27 Global Step: 576740 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:11:30,826-Speed 6249.87 samples/sec Loss 3.7135 LearningRate 0.0001 Epoch: 27 Global Step: 576750 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:11:34,073-Speed 6309.36 samples/sec Loss 3.6700 LearningRate 0.0001 Epoch: 27 Global Step: 576760 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:11:37,304-Speed 6339.01 samples/sec Loss 3.7863 LearningRate 0.0001 Epoch: 27 Global Step: 576770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:40,553-Speed 6304.42 samples/sec Loss 3.7347 LearningRate 0.0001 Epoch: 27 Global Step: 576780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:43,802-Speed 6305.79 samples/sec Loss 3.7164 LearningRate 0.0001 Epoch: 27 Global Step: 576790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:47,060-Speed 6286.80 samples/sec Loss 3.7060 LearningRate 0.0001 Epoch: 27 Global Step: 576800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:50,303-Speed 6317.62 samples/sec Loss 3.7484 LearningRate 0.0001 Epoch: 27 Global Step: 576810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:53,549-Speed 6309.18 samples/sec Loss 3.6875 LearningRate 0.0001 Epoch: 27 Global Step: 576820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:11:56,796-Speed 6310.44 samples/sec Loss 3.6997 LearningRate 0.0001 Epoch: 27 Global Step: 576830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:00,043-Speed 6307.52 samples/sec Loss 3.7787 LearningRate 0.0001 Epoch: 27 Global Step: 576840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:03,289-Speed 6311.83 samples/sec Loss 3.6928 LearningRate 0.0001 Epoch: 27 Global Step: 576850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:06,533-Speed 6314.22 samples/sec Loss 3.7237 
LearningRate 0.0001 Epoch: 27 Global Step: 576860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:09,789-Speed 6290.96 samples/sec Loss 3.6868 LearningRate 0.0001 Epoch: 27 Global Step: 576870 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:13,034-Speed 6311.81 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 576880 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:16,279-Speed 6314.02 samples/sec Loss 3.7224 LearningRate 0.0001 Epoch: 27 Global Step: 576890 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:19,521-Speed 6318.16 samples/sec Loss 3.6792 LearningRate 0.0001 Epoch: 27 Global Step: 576900 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:22,766-Speed 6312.46 samples/sec Loss 3.7065 LearningRate 0.0001 Epoch: 27 Global Step: 576910 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:26,009-Speed 6316.69 samples/sec Loss 3.7051 LearningRate 0.0001 Epoch: 27 Global Step: 576920 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:29,252-Speed 6315.70 samples/sec Loss 3.6841 LearningRate 0.0001 Epoch: 27 Global Step: 576930 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:32,514-Speed 6280.83 samples/sec Loss 3.6456 LearningRate 0.0001 Epoch: 27 Global Step: 576940 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:12:35,746-Speed 6337.59 samples/sec Loss 3.7106 LearningRate 0.0001 Epoch: 27 Global Step: 576950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:38,990-Speed 6315.22 samples/sec Loss 3.7103 LearningRate 0.0001 Epoch: 27 Global Step: 576960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:42,238-Speed 6307.23 samples/sec Loss 3.6907 LearningRate 0.0001 Epoch: 27 Global Step: 576970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:45,482-Speed 6315.52 samples/sec Loss 3.7280 LearningRate 0.0001 Epoch: 27 
Global Step: 576980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:48,726-Speed 6313.16 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 576990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:51,981-Speed 6293.20 samples/sec Loss 3.7061 LearningRate 0.0001 Epoch: 27 Global Step: 577000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:55,226-Speed 6313.08 samples/sec Loss 3.6867 LearningRate 0.0001 Epoch: 27 Global Step: 577010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:12:58,471-Speed 6312.13 samples/sec Loss 3.7064 LearningRate 0.0001 Epoch: 27 Global Step: 577020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:01,717-Speed 6310.16 samples/sec Loss 3.7033 LearningRate 0.0001 Epoch: 27 Global Step: 577030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:04,961-Speed 6316.54 samples/sec Loss 3.6445 LearningRate 0.0001 Epoch: 27 Global Step: 577040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:08,218-Speed 6288.89 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 577050 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:13:11,458-Speed 6321.98 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 577060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:14,702-Speed 6313.79 samples/sec Loss 3.7519 LearningRate 0.0001 Epoch: 27 Global Step: 577070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:17,944-Speed 6319.93 samples/sec Loss 3.7210 LearningRate 0.0001 Epoch: 27 Global Step: 577080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:21,205-Speed 6281.00 samples/sec Loss 3.6877 LearningRate 0.0001 Epoch: 27 Global Step: 577090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:24,462-Speed 6288.78 samples/sec Loss 3.7365 LearningRate 0.0001 Epoch: 27 Global Step: 577100 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 20:13:27,723-Speed 6281.85 samples/sec Loss 3.7965 LearningRate 0.0001 Epoch: 27 Global Step: 577110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:30,964-Speed 6320.45 samples/sec Loss 3.7807 LearningRate 0.0001 Epoch: 27 Global Step: 577120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:34,209-Speed 6312.58 samples/sec Loss 3.7727 LearningRate 0.0001 Epoch: 27 Global Step: 577130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:37,454-Speed 6313.05 samples/sec Loss 3.7845 LearningRate 0.0001 Epoch: 27 Global Step: 577140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:40,701-Speed 6308.39 samples/sec Loss 3.7278 LearningRate 0.0001 Epoch: 27 Global Step: 577150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:13:43,943-Speed 6318.40 samples/sec Loss 3.6862 LearningRate 0.0001 Epoch: 27 Global Step: 577160 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:13:47,189-Speed 6311.38 samples/sec Loss 3.7257 LearningRate 0.0001 Epoch: 27 Global Step: 577170 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:13:50,438-Speed 6305.81 samples/sec Loss 3.7672 LearningRate 0.0001 Epoch: 27 Global Step: 577180 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:13:53,686-Speed 6306.75 samples/sec Loss 3.7051 LearningRate 0.0001 Epoch: 27 Global Step: 577190 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:13:56,931-Speed 6313.58 samples/sec Loss 3.6953 LearningRate 0.0001 Epoch: 27 Global Step: 577200 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:00,158-Speed 6346.92 samples/sec Loss 3.7673 LearningRate 0.0001 Epoch: 27 Global Step: 577210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:03,405-Speed 6308.63 samples/sec Loss 3.7297 LearningRate 0.0001 Epoch: 27 Global Step: 577220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 
2022-04-02 20:14:06,649-Speed 6314.85 samples/sec Loss 3.7063 LearningRate 0.0001 Epoch: 27 Global Step: 577230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:09,899-Speed 6302.64 samples/sec Loss 3.6991 LearningRate 0.0001 Epoch: 27 Global Step: 577240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:13,145-Speed 6311.18 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 27 Global Step: 577250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:16,388-Speed 6316.32 samples/sec Loss 3.7073 LearningRate 0.0001 Epoch: 27 Global Step: 577260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:19,634-Speed 6311.29 samples/sec Loss 3.7290 LearningRate 0.0001 Epoch: 27 Global Step: 577270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:22,880-Speed 6310.45 samples/sec Loss 3.7268 LearningRate 0.0001 Epoch: 27 Global Step: 577280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:26,126-Speed 6309.81 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 27 Global Step: 577290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:29,370-Speed 6315.80 samples/sec Loss 3.7189 LearningRate 0.0001 Epoch: 27 Global Step: 577300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:32,614-Speed 6314.07 samples/sec Loss 3.6547 LearningRate 0.0001 Epoch: 27 Global Step: 577310 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:35,865-Speed 6301.72 samples/sec Loss 3.7545 LearningRate 0.0001 Epoch: 27 Global Step: 577320 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:39,109-Speed 6313.40 samples/sec Loss 3.6775 LearningRate 0.0001 Epoch: 27 Global Step: 577330 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:42,356-Speed 6309.30 samples/sec Loss 3.6786 LearningRate 0.0001 Epoch: 27 Global Step: 577340 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:45,602-Speed 6310.48 
samples/sec Loss 3.7320 LearningRate 0.0001 Epoch: 27 Global Step: 577350 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:14:48,837-Speed 6332.51 samples/sec Loss 3.6738 LearningRate 0.0001 Epoch: 27 Global Step: 577360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:52,080-Speed 6315.83 samples/sec Loss 3.7219 LearningRate 0.0001 Epoch: 27 Global Step: 577370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:55,333-Speed 6297.82 samples/sec Loss 3.6685 LearningRate 0.0001 Epoch: 27 Global Step: 577380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:14:58,584-Speed 6299.54 samples/sec Loss 3.6559 LearningRate 0.0001 Epoch: 27 Global Step: 577390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:01,828-Speed 6315.36 samples/sec Loss 3.7396 LearningRate 0.0001 Epoch: 27 Global Step: 577400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:05,077-Speed 6305.29 samples/sec Loss 3.6553 LearningRate 0.0001 Epoch: 27 Global Step: 577410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:08,323-Speed 6311.47 samples/sec Loss 3.7107 LearningRate 0.0001 Epoch: 27 Global Step: 577420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:11,576-Speed 6297.55 samples/sec Loss 3.7694 LearningRate 0.0001 Epoch: 27 Global Step: 577430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:14,825-Speed 6304.54 samples/sec Loss 3.7482 LearningRate 0.0001 Epoch: 27 Global Step: 577440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:18,071-Speed 6309.66 samples/sec Loss 3.7309 LearningRate 0.0001 Epoch: 27 Global Step: 577450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:21,314-Speed 6317.38 samples/sec Loss 3.7085 LearningRate 0.0001 Epoch: 27 Global Step: 577460 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:15:24,543-Speed 6343.24 samples/sec Loss 3.6711 LearningRate 
0.0001 Epoch: 27 Global Step: 577470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:27,787-Speed 6316.29 samples/sec Loss 3.6912 LearningRate 0.0001 Epoch: 27 Global Step: 577480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:31,035-Speed 6305.46 samples/sec Loss 3.6951 LearningRate 0.0001 Epoch: 27 Global Step: 577490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:34,277-Speed 6319.49 samples/sec Loss 3.6484 LearningRate 0.0001 Epoch: 27 Global Step: 577500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:37,521-Speed 6313.23 samples/sec Loss 3.7483 LearningRate 0.0001 Epoch: 27 Global Step: 577510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:40,767-Speed 6310.46 samples/sec Loss 3.7527 LearningRate 0.0001 Epoch: 27 Global Step: 577520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:44,013-Speed 6310.55 samples/sec Loss 3.7928 LearningRate 0.0001 Epoch: 27 Global Step: 577530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:47,268-Speed 6293.80 samples/sec Loss 3.7239 LearningRate 0.0001 Epoch: 27 Global Step: 577540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:50,515-Speed 6308.75 samples/sec Loss 3.7134 LearningRate 0.0001 Epoch: 27 Global Step: 577550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:53,760-Speed 6312.01 samples/sec Loss 3.6943 LearningRate 0.0001 Epoch: 27 Global Step: 577560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:15:57,004-Speed 6314.80 samples/sec Loss 3.6802 LearningRate 0.0001 Epoch: 27 Global Step: 577570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:16:00,250-Speed 6310.91 samples/sec Loss 3.6648 LearningRate 0.0001 Epoch: 27 Global Step: 577580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:16:03,492-Speed 6318.50 samples/sec Loss 3.6985 LearningRate 0.0001 Epoch: 27 Global Step: 577590 Fp16 
Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:06,744-Speed 6299.35 samples/sec Loss 3.7114 LearningRate 0.0001 Epoch: 27 Global Step: 577600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:09,991-Speed 6309.66 samples/sec Loss 3.6764 LearningRate 0.0001 Epoch: 27 Global Step: 577610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:13,237-Speed 6309.24 samples/sec Loss 3.7000 LearningRate 0.0001 Epoch: 27 Global Step: 577620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:16,483-Speed 6312.06 samples/sec Loss 3.7047 LearningRate 0.0001 Epoch: 27 Global Step: 577630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:19,729-Speed 6311.27 samples/sec Loss 3.7002 LearningRate 0.0001 Epoch: 27 Global Step: 577640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:22,978-Speed 6305.14 samples/sec Loss 3.7099 LearningRate 0.0001 Epoch: 27 Global Step: 577650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:26,221-Speed 6316.30 samples/sec Loss 3.7115 LearningRate 0.0001 Epoch: 27 Global Step: 577660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:29,466-Speed 6313.17 samples/sec Loss 3.7558 LearningRate 0.0001 Epoch: 27 Global Step: 577670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:32,713-Speed 6308.24 samples/sec Loss 3.7503 LearningRate 0.0001 Epoch: 27 Global Step: 577680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:35,962-Speed 6305.57 samples/sec Loss 3.7556 LearningRate 0.0001 Epoch: 27 Global Step: 577690 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:16:39,205-Speed 6317.06 samples/sec Loss 3.6940 LearningRate 0.0001 Epoch: 27 Global Step: 577700 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:16:42,437-Speed 6338.88 samples/sec Loss 3.6735 LearningRate 0.0001 Epoch: 27 Global Step: 577710 Fp16 Grad Scale: 8192 Required: 23 hours 
Training: 2022-04-02 20:16:45,682-Speed 6311.67 samples/sec Loss 3.6859 LearningRate 0.0001 Epoch: 27 Global Step: 577720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:48,928-Speed 6310.76 samples/sec Loss 3.7110 LearningRate 0.0001 Epoch: 27 Global Step: 577730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:52,170-Speed 6319.03 samples/sec Loss 3.7770 LearningRate 0.0001 Epoch: 27 Global Step: 577740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:55,415-Speed 6313.31 samples/sec Loss 3.6929 LearningRate 0.0001 Epoch: 27 Global Step: 577750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:16:58,660-Speed 6312.20 samples/sec Loss 3.7992 LearningRate 0.0001 Epoch: 27 Global Step: 577760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:01,905-Speed 6313.54 samples/sec Loss 3.6728 LearningRate 0.0001 Epoch: 27 Global Step: 577770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:05,151-Speed 6310.69 samples/sec Loss 3.6944 LearningRate 0.0001 Epoch: 27 Global Step: 577780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:08,394-Speed 6316.34 samples/sec Loss 3.7239 LearningRate 0.0001 Epoch: 27 Global Step: 577790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:11,640-Speed 6309.54 samples/sec Loss 3.7139 LearningRate 0.0001 Epoch: 27 Global Step: 577800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:14,882-Speed 6318.35 samples/sec Loss 3.7287 LearningRate 0.0001 Epoch: 27 Global Step: 577810 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:17:18,129-Speed 6308.83 samples/sec Loss 3.7029 LearningRate 0.0001 Epoch: 27 Global Step: 577820 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:17:21,363-Speed 6334.40 samples/sec Loss 3.7086 LearningRate 0.0001 Epoch: 27 Global Step: 577830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:24,610-Speed 
6309.41 samples/sec Loss 3.7539 LearningRate 0.0001 Epoch: 27 Global Step: 577840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:27,854-Speed 6313.64 samples/sec Loss 3.7809 LearningRate 0.0001 Epoch: 27 Global Step: 577850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:31,102-Speed 6308.86 samples/sec Loss 3.6314 LearningRate 0.0001 Epoch: 27 Global Step: 577860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:34,349-Speed 6308.37 samples/sec Loss 3.6569 LearningRate 0.0001 Epoch: 27 Global Step: 577870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:37,595-Speed 6311.25 samples/sec Loss 3.6765 LearningRate 0.0001 Epoch: 27 Global Step: 577880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:40,837-Speed 6318.24 samples/sec Loss 3.7143 LearningRate 0.0001 Epoch: 27 Global Step: 577890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:44,078-Speed 6319.51 samples/sec Loss 3.6810 LearningRate 0.0001 Epoch: 27 Global Step: 577900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:47,322-Speed 6315.15 samples/sec Loss 3.6741 LearningRate 0.0001 Epoch: 27 Global Step: 577910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:50,572-Speed 6303.48 samples/sec Loss 3.7302 LearningRate 0.0001 Epoch: 27 Global Step: 577920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:17:53,820-Speed 6305.61 samples/sec Loss 3.7182 LearningRate 0.0001 Epoch: 27 Global Step: 577930 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:17:57,053-Speed 6337.00 samples/sec Loss 3.7345 LearningRate 0.0001 Epoch: 27 Global Step: 577940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:00,297-Speed 6314.27 samples/sec Loss 3.7094 LearningRate 0.0001 Epoch: 27 Global Step: 577950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:03,544-Speed 6308.25 samples/sec Loss 3.7160 
LearningRate 0.0001 Epoch: 27 Global Step: 577960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:06,791-Speed 6308.49 samples/sec Loss 3.7372 LearningRate 0.0001 Epoch: 27 Global Step: 577970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:10,036-Speed 6312.58 samples/sec Loss 3.7101 LearningRate 0.0001 Epoch: 27 Global Step: 577980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:13,283-Speed 6309.41 samples/sec Loss 3.7407 LearningRate 0.0001 Epoch: 27 Global Step: 577990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:16,532-Speed 6305.18 samples/sec Loss 3.6922 LearningRate 0.0001 Epoch: 27 Global Step: 578000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:19,777-Speed 6312.62 samples/sec Loss 3.6237 LearningRate 0.0001 Epoch: 27 Global Step: 578010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:23,026-Speed 6304.72 samples/sec Loss 3.6785 LearningRate 0.0001 Epoch: 27 Global Step: 578020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:26,267-Speed 6320.76 samples/sec Loss 3.6933 LearningRate 0.0001 Epoch: 27 Global Step: 578030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:29,518-Speed 6300.67 samples/sec Loss 3.7099 LearningRate 0.0001 Epoch: 27 Global Step: 578040 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:18:32,750-Speed 6338.50 samples/sec Loss 3.7124 LearningRate 0.0001 Epoch: 27 Global Step: 578050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:35,995-Speed 6311.11 samples/sec Loss 3.6948 LearningRate 0.0001 Epoch: 27 Global Step: 578060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:39,241-Speed 6310.99 samples/sec Loss 3.7110 LearningRate 0.0001 Epoch: 27 Global Step: 578070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:42,483-Speed 6318.19 samples/sec Loss 3.7084 LearningRate 0.0001 Epoch: 27 Global Step: 
578080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:45,731-Speed 6308.67 samples/sec Loss 3.7074 LearningRate 0.0001 Epoch: 27 Global Step: 578090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:48,975-Speed 6315.12 samples/sec Loss 3.6953 LearningRate 0.0001 Epoch: 27 Global Step: 578100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:52,221-Speed 6311.31 samples/sec Loss 3.7543 LearningRate 0.0001 Epoch: 27 Global Step: 578110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:55,463-Speed 6317.12 samples/sec Loss 3.7389 LearningRate 0.0001 Epoch: 27 Global Step: 578120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:18:58,706-Speed 6316.42 samples/sec Loss 3.6560 LearningRate 0.0001 Epoch: 27 Global Step: 578130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:01,951-Speed 6312.71 samples/sec Loss 3.7048 LearningRate 0.0001 Epoch: 27 Global Step: 578140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:05,181-Speed 6343.18 samples/sec Loss 3.6828 LearningRate 0.0001 Epoch: 27 Global Step: 578150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:08,426-Speed 6311.60 samples/sec Loss 3.6739 LearningRate 0.0001 Epoch: 27 Global Step: 578160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:11,677-Speed 6301.70 samples/sec Loss 3.6431 LearningRate 0.0001 Epoch: 27 Global Step: 578170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:14,921-Speed 6314.82 samples/sec Loss 3.7073 LearningRate 0.0001 Epoch: 27 Global Step: 578180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:18,168-Speed 6308.87 samples/sec Loss 3.6934 LearningRate 0.0001 Epoch: 27 Global Step: 578190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:21,411-Speed 6316.86 samples/sec Loss 3.6915 LearningRate 0.0001 Epoch: 27 Global Step: 578200 Fp16 Grad Scale: 8192 Required: 23 
hours Training: 2022-04-02 20:19:24,658-Speed 6308.05 samples/sec Loss 3.6679 LearningRate 0.0001 Epoch: 27 Global Step: 578210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:27,901-Speed 6314.99 samples/sec Loss 3.6893 LearningRate 0.0001 Epoch: 27 Global Step: 578220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:31,151-Speed 6303.36 samples/sec Loss 3.7513 LearningRate 0.0001 Epoch: 27 Global Step: 578230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:34,396-Speed 6313.64 samples/sec Loss 3.7395 LearningRate 0.0001 Epoch: 27 Global Step: 578240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:37,638-Speed 6317.81 samples/sec Loss 3.7067 LearningRate 0.0001 Epoch: 27 Global Step: 578250 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:19:40,887-Speed 6306.05 samples/sec Loss 3.6573 LearningRate 0.0001 Epoch: 27 Global Step: 578260 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:19:44,134-Speed 6308.35 samples/sec Loss 3.7382 LearningRate 0.0001 Epoch: 27 Global Step: 578270 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:19:47,365-Speed 6339.94 samples/sec Loss 3.7683 LearningRate 0.0001 Epoch: 27 Global Step: 578280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:50,613-Speed 6307.34 samples/sec Loss 3.7303 LearningRate 0.0001 Epoch: 27 Global Step: 578290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:53,857-Speed 6312.48 samples/sec Loss 3.7729 LearningRate 0.0001 Epoch: 27 Global Step: 578300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:19:57,105-Speed 6307.74 samples/sec Loss 3.7256 LearningRate 0.0001 Epoch: 27 Global Step: 578310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:00,350-Speed 6312.34 samples/sec Loss 3.6868 LearningRate 0.0001 Epoch: 27 Global Step: 578320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:20:03,596-Speed 6311.08 samples/sec Loss 3.7245 LearningRate 0.0001 Epoch: 27 Global Step: 578330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:06,841-Speed 6313.17 samples/sec Loss 3.7503 LearningRate 0.0001 Epoch: 27 Global Step: 578340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:10,089-Speed 6306.67 samples/sec Loss 3.7074 LearningRate 0.0001 Epoch: 27 Global Step: 578350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:13,334-Speed 6312.72 samples/sec Loss 3.7926 LearningRate 0.0001 Epoch: 27 Global Step: 578360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:16,581-Speed 6309.19 samples/sec Loss 3.7631 LearningRate 0.0001 Epoch: 27 Global Step: 578370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:19,825-Speed 6314.55 samples/sec Loss 3.7266 LearningRate 0.0001 Epoch: 27 Global Step: 578380 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:23,071-Speed 6310.03 samples/sec Loss 3.6948 LearningRate 0.0001 Epoch: 27 Global Step: 578390 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:26,318-Speed 6309.35 samples/sec Loss 3.7297 LearningRate 0.0001 Epoch: 27 Global Step: 578400 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:29,563-Speed 6313.06 samples/sec Loss 3.6775 LearningRate 0.0001 Epoch: 27 Global Step: 578410 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:32,807-Speed 6313.87 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 578420 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:36,052-Speed 6313.91 samples/sec Loss 3.7480 LearningRate 0.0001 Epoch: 27 Global Step: 578430 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:39,297-Speed 6312.45 samples/sec Loss 3.7223 LearningRate 0.0001 Epoch: 27 Global Step: 578440 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:42,551-Speed 6294.43 
samples/sec Loss 3.7003 LearningRate 0.0001 Epoch: 27 Global Step: 578450 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:45,794-Speed 6316.05 samples/sec Loss 3.6911 LearningRate 0.0001 Epoch: 27 Global Step: 578460 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:20:49,027-Speed 6336.57 samples/sec Loss 3.7154 LearningRate 0.0001 Epoch: 27 Global Step: 578470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:52,270-Speed 6316.23 samples/sec Loss 3.7106 LearningRate 0.0001 Epoch: 27 Global Step: 578480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:55,513-Speed 6317.26 samples/sec Loss 3.6782 LearningRate 0.0001 Epoch: 27 Global Step: 578490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:20:58,761-Speed 6307.22 samples/sec Loss 3.6355 LearningRate 0.0001 Epoch: 27 Global Step: 578500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:02,009-Speed 6306.50 samples/sec Loss 3.7244 LearningRate 0.0001 Epoch: 27 Global Step: 578510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:05,255-Speed 6309.45 samples/sec Loss 3.6883 LearningRate 0.0001 Epoch: 27 Global Step: 578520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:08,501-Speed 6312.40 samples/sec Loss 3.7123 LearningRate 0.0001 Epoch: 27 Global Step: 578530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:11,744-Speed 6316.18 samples/sec Loss 3.7129 LearningRate 0.0001 Epoch: 27 Global Step: 578540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:14,991-Speed 6309.60 samples/sec Loss 3.7424 LearningRate 0.0001 Epoch: 27 Global Step: 578550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:18,237-Speed 6309.61 samples/sec Loss 3.7197 LearningRate 0.0001 Epoch: 27 Global Step: 578560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:21,483-Speed 6312.39 samples/sec Loss 3.7209 LearningRate 
0.0001 Epoch: 27 Global Step: 578570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:21:24,728-Speed 6312.63 samples/sec Loss 3.6994 LearningRate 0.0001 Epoch: 27 Global Step: 578580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:21:27,972-Speed 6312.85 samples/sec Loss 3.7207 LearningRate 0.0001 Epoch: 27 Global Step: 578590 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:21:31,222-Speed 6302.90 samples/sec Loss 3.6885 LearningRate 0.0001 Epoch: 27 Global Step: 578600 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:21:34,459-Speed 6330.15 samples/sec Loss 3.7473 LearningRate 0.0001 Epoch: 27 Global Step: 578610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:37,704-Speed 6311.89 samples/sec Loss 3.7050 LearningRate 0.0001 Epoch: 27 Global Step: 578620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:40,949-Speed 6312.66 samples/sec Loss 3.6679 LearningRate 0.0001 Epoch: 27 Global Step: 578630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:44,194-Speed 6311.74 samples/sec Loss 3.6925 LearningRate 0.0001 Epoch: 27 Global Step: 578640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:47,441-Speed 6309.89 samples/sec Loss 3.7341 LearningRate 0.0001 Epoch: 27 Global Step: 578650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:50,689-Speed 6305.06 samples/sec Loss 3.7279 LearningRate 0.0001 Epoch: 27 Global Step: 578660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:53,935-Speed 6311.97 samples/sec Loss 3.6775 LearningRate 0.0001 Epoch: 27 Global Step: 578670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:21:57,183-Speed 6305.93 samples/sec Loss 3.7476 LearningRate 0.0001 Epoch: 27 Global Step: 578680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:00,428-Speed 6313.34 samples/sec Loss 3.6737 LearningRate 0.0001 Epoch: 27 Global Step: 578690 
Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:03,682-Speed 6306.40 samples/sec Loss 3.7249 LearningRate 0.0001 Epoch: 27 Global Step: 578700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:06,928-Speed 6311.32 samples/sec Loss 3.6852 LearningRate 0.0001 Epoch: 27 Global Step: 578710 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:10,173-Speed 6311.59 samples/sec Loss 3.7116 LearningRate 0.0001 Epoch: 27 Global Step: 578720 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:13,420-Speed 6309.21 samples/sec Loss 3.6946 LearningRate 0.0001 Epoch: 27 Global Step: 578730 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:16,666-Speed 6310.81 samples/sec Loss 3.6184 LearningRate 0.0001 Epoch: 27 Global Step: 578740 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:19,896-Speed 6341.44 samples/sec Loss 3.6533 LearningRate 0.0001 Epoch: 27 Global Step: 578750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:23,143-Speed 6309.59 samples/sec Loss 3.7086 LearningRate 0.0001 Epoch: 27 Global Step: 578760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:26,386-Speed 6315.31 samples/sec Loss 3.7921 LearningRate 0.0001 Epoch: 27 Global Step: 578770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:29,636-Speed 6303.93 samples/sec Loss 3.6650 LearningRate 0.0001 Epoch: 27 Global Step: 578780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:32,878-Speed 6317.99 samples/sec Loss 3.6990 LearningRate 0.0001 Epoch: 27 Global Step: 578790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:36,123-Speed 6312.67 samples/sec Loss 3.6877 LearningRate 0.0001 Epoch: 27 Global Step: 578800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:39,377-Speed 6296.83 samples/sec Loss 3.7070 LearningRate 0.0001 Epoch: 27 Global Step: 578810 Fp16 Grad Scale: 8192 Required: 23 
hours Training: 2022-04-02 20:22:42,622-Speed 6312.28 samples/sec Loss 3.7888 LearningRate 0.0001 Epoch: 27 Global Step: 578820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:45,869-Speed 6308.76 samples/sec Loss 3.7018 LearningRate 0.0001 Epoch: 27 Global Step: 578830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:49,113-Speed 6313.55 samples/sec Loss 3.6337 LearningRate 0.0001 Epoch: 27 Global Step: 578840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:22:52,357-Speed 6316.12 samples/sec Loss 3.7435 LearningRate 0.0001 Epoch: 27 Global Step: 578850 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:55,603-Speed 6310.80 samples/sec Loss 3.7373 LearningRate 0.0001 Epoch: 27 Global Step: 578860 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:22:58,853-Speed 6301.43 samples/sec Loss 3.7152 LearningRate 0.0001 Epoch: 27 Global Step: 578870 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:23:02,099-Speed 6311.39 samples/sec Loss 3.7378 LearningRate 0.0001 Epoch: 27 Global Step: 578880 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:23:05,332-Speed 6335.24 samples/sec Loss 3.7470 LearningRate 0.0001 Epoch: 27 Global Step: 578890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:08,578-Speed 6310.99 samples/sec Loss 3.6894 LearningRate 0.0001 Epoch: 27 Global Step: 578900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:11,824-Speed 6310.84 samples/sec Loss 3.7601 LearningRate 0.0001 Epoch: 27 Global Step: 578910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:15,073-Speed 6305.91 samples/sec Loss 3.7157 LearningRate 0.0001 Epoch: 27 Global Step: 578920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:18,320-Speed 6307.83 samples/sec Loss 3.6953 LearningRate 0.0001 Epoch: 27 Global Step: 578930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:23:21,563-Speed 6316.59 samples/sec Loss 3.7299 LearningRate 0.0001 Epoch: 27 Global Step: 578940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:24,808-Speed 6312.30 samples/sec Loss 3.6681 LearningRate 0.0001 Epoch: 27 Global Step: 578950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:28,053-Speed 6313.50 samples/sec Loss 3.6855 LearningRate 0.0001 Epoch: 27 Global Step: 578960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:31,311-Speed 6287.43 samples/sec Loss 3.6832 LearningRate 0.0001 Epoch: 27 Global Step: 578970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:34,580-Speed 6266.74 samples/sec Loss 3.7222 LearningRate 0.0001 Epoch: 27 Global Step: 578980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:37,822-Speed 6318.56 samples/sec Loss 3.7781 LearningRate 0.0001 Epoch: 27 Global Step: 578990 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:23:41,074-Speed 6298.26 samples/sec Loss 3.6923 LearningRate 0.0001 Epoch: 27 Global Step: 579000 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:23:44,313-Speed 6325.82 samples/sec Loss 3.6690 LearningRate 0.0001 Epoch: 27 Global Step: 579010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:47,557-Speed 6314.00 samples/sec Loss 3.6374 LearningRate 0.0001 Epoch: 27 Global Step: 579020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:50,806-Speed 6305.77 samples/sec Loss 3.7877 LearningRate 0.0001 Epoch: 27 Global Step: 579030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:54,050-Speed 6313.69 samples/sec Loss 3.6507 LearningRate 0.0001 Epoch: 27 Global Step: 579040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:23:57,296-Speed 6310.20 samples/sec Loss 3.6646 LearningRate 0.0001 Epoch: 27 Global Step: 579050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:00,544-Speed 6307.55 samples/sec 
Loss 3.6620 LearningRate 0.0001 Epoch: 27 Global Step: 579060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:03,790-Speed 6310.44 samples/sec Loss 3.7276 LearningRate 0.0001 Epoch: 27 Global Step: 579070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:07,038-Speed 6307.62 samples/sec Loss 3.6788 LearningRate 0.0001 Epoch: 27 Global Step: 579080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:10,281-Speed 6315.05 samples/sec Loss 3.7026 LearningRate 0.0001 Epoch: 27 Global Step: 579090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:13,527-Speed 6312.72 samples/sec Loss 3.7136 LearningRate 0.0001 Epoch: 27 Global Step: 579100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:16,771-Speed 6313.17 samples/sec Loss 3.6865 LearningRate 0.0001 Epoch: 27 Global Step: 579110 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:24:20,017-Speed 6311.71 samples/sec Loss 3.6835 LearningRate 0.0001 Epoch: 27 Global Step: 579120 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:24:23,249-Speed 6336.64 samples/sec Loss 3.6395 LearningRate 0.0001 Epoch: 27 Global Step: 579130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:26,495-Speed 6311.69 samples/sec Loss 3.7538 LearningRate 0.0001 Epoch: 27 Global Step: 579140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:29,748-Speed 6301.48 samples/sec Loss 3.7090 LearningRate 0.0001 Epoch: 27 Global Step: 579150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:32,995-Speed 6308.05 samples/sec Loss 3.7480 LearningRate 0.0001 Epoch: 27 Global Step: 579160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:36,243-Speed 6307.73 samples/sec Loss 3.6936 LearningRate 0.0001 Epoch: 27 Global Step: 579170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:39,490-Speed 6307.88 samples/sec Loss 3.6988 LearningRate 0.0001 Epoch: 27 
Global Step: 579180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:42,740-Speed 6303.75 samples/sec Loss 3.6651 LearningRate 0.0001 Epoch: 27 Global Step: 579190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:45,986-Speed 6309.64 samples/sec Loss 3.7146 LearningRate 0.0001 Epoch: 27 Global Step: 579200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:49,230-Speed 6314.42 samples/sec Loss 3.7179 LearningRate 0.0001 Epoch: 27 Global Step: 579210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:52,477-Speed 6309.21 samples/sec Loss 3.6867 LearningRate 0.0001 Epoch: 27 Global Step: 579220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:24:55,724-Speed 6308.89 samples/sec Loss 3.7327 LearningRate 0.0001 Epoch: 27 Global Step: 579230 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:24:58,974-Speed 6305.04 samples/sec Loss 3.6900 LearningRate 0.0001 Epoch: 27 Global Step: 579240 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:25:02,213-Speed 6323.75 samples/sec Loss 3.7286 LearningRate 0.0001 Epoch: 27 Global Step: 579250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:05,459-Speed 6309.63 samples/sec Loss 3.7280 LearningRate 0.0001 Epoch: 27 Global Step: 579260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:08,705-Speed 6311.63 samples/sec Loss 3.6816 LearningRate 0.0001 Epoch: 27 Global Step: 579270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:11,951-Speed 6310.36 samples/sec Loss 3.6446 LearningRate 0.0001 Epoch: 27 Global Step: 579280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:15,199-Speed 6307.87 samples/sec Loss 3.6970 LearningRate 0.0001 Epoch: 27 Global Step: 579290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:18,444-Speed 6312.93 samples/sec Loss 3.6925 LearningRate 0.0001 Epoch: 27 Global Step: 579300 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 20:25:21,695-Speed 6299.65 samples/sec Loss 3.7421 LearningRate 0.0001 Epoch: 27 Global Step: 579310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:24,940-Speed 6313.78 samples/sec Loss 3.7514 LearningRate 0.0001 Epoch: 27 Global Step: 579320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:28,187-Speed 6308.50 samples/sec Loss 3.7830 LearningRate 0.0001 Epoch: 27 Global Step: 579330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:31,438-Speed 6300.67 samples/sec Loss 3.7464 LearningRate 0.0001 Epoch: 27 Global Step: 579340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:34,670-Speed 6338.92 samples/sec Loss 3.7128 LearningRate 0.0001 Epoch: 27 Global Step: 579350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:37,915-Speed 6312.11 samples/sec Loss 3.7284 LearningRate 0.0001 Epoch: 27 Global Step: 579360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:41,163-Speed 6306.72 samples/sec Loss 3.7927 LearningRate 0.0001 Epoch: 27 Global Step: 579370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:44,407-Speed 6315.00 samples/sec Loss 3.7276 LearningRate 0.0001 Epoch: 27 Global Step: 579380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:47,660-Speed 6297.03 samples/sec Loss 3.6834 LearningRate 0.0001 Epoch: 27 Global Step: 579390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:50,909-Speed 6304.64 samples/sec Loss 3.6932 LearningRate 0.0001 Epoch: 27 Global Step: 579400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:54,161-Speed 6298.70 samples/sec Loss 3.6242 LearningRate 0.0001 Epoch: 27 Global Step: 579410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:25:57,407-Speed 6310.77 samples/sec Loss 3.7050 LearningRate 0.0001 Epoch: 27 Global Step: 579420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:26:00,654-Speed 6308.70 samples/sec Loss 3.6824 LearningRate 0.0001 Epoch: 27 Global Step: 579430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:03,904-Speed 6303.46 samples/sec Loss 3.7179 LearningRate 0.0001 Epoch: 27 Global Step: 579440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:07,150-Speed 6311.05 samples/sec Loss 3.7002 LearningRate 0.0001 Epoch: 27 Global Step: 579450 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:26:10,394-Speed 6314.69 samples/sec Loss 3.7639 LearningRate 0.0001 Epoch: 27 Global Step: 579460 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:26:13,622-Speed 6347.30 samples/sec Loss 3.6635 LearningRate 0.0001 Epoch: 27 Global Step: 579470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:16,868-Speed 6310.17 samples/sec Loss 3.6329 LearningRate 0.0001 Epoch: 27 Global Step: 579480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:20,113-Speed 6312.18 samples/sec Loss 3.6700 LearningRate 0.0001 Epoch: 27 Global Step: 579490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:23,363-Speed 6303.29 samples/sec Loss 3.7317 LearningRate 0.0001 Epoch: 27 Global Step: 579500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:26,611-Speed 6306.33 samples/sec Loss 3.6338 LearningRate 0.0001 Epoch: 27 Global Step: 579510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:29,858-Speed 6308.38 samples/sec Loss 3.7371 LearningRate 0.0001 Epoch: 27 Global Step: 579520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:33,105-Speed 6309.63 samples/sec Loss 3.6602 LearningRate 0.0001 Epoch: 27 Global Step: 579530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:36,347-Speed 6317.19 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 27 Global Step: 579540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:39,589-Speed 6319.43 samples/sec 
Loss 3.6429 LearningRate 0.0001 Epoch: 27 Global Step: 579550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:42,835-Speed 6311.36 samples/sec Loss 3.7588 LearningRate 0.0001 Epoch: 27 Global Step: 579560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:46,077-Speed 6318.08 samples/sec Loss 3.7493 LearningRate 0.0001 Epoch: 27 Global Step: 579570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:26:49,322-Speed 6311.48 samples/sec Loss 3.7568 LearningRate 0.0001 Epoch: 27 Global Step: 579580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:26:52,555-Speed 6337.48 samples/sec Loss 3.7542 LearningRate 0.0001 Epoch: 27 Global Step: 579590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:55,800-Speed 6310.81 samples/sec Loss 3.6350 LearningRate 0.0001 Epoch: 27 Global Step: 579600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:26:59,043-Speed 6318.64 samples/sec Loss 3.7157 LearningRate 0.0001 Epoch: 27 Global Step: 579610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:02,314-Speed 6261.25 samples/sec Loss 3.6827 LearningRate 0.0001 Epoch: 27 Global Step: 579620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:05,562-Speed 6307.42 samples/sec Loss 3.7228 LearningRate 0.0001 Epoch: 27 Global Step: 579630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:08,805-Speed 6316.82 samples/sec Loss 3.6872 LearningRate 0.0001 Epoch: 27 Global Step: 579640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:12,053-Speed 6306.17 samples/sec Loss 3.7271 LearningRate 0.0001 Epoch: 27 Global Step: 579650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:15,301-Speed 6306.93 samples/sec Loss 3.7681 LearningRate 0.0001 Epoch: 27 Global Step: 579660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:18,547-Speed 6311.40 samples/sec Loss 3.6674 LearningRate 0.0001 Epoch: 27 
Global Step: 579670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:21,793-Speed 6310.64 samples/sec Loss 3.7088 LearningRate 0.0001 Epoch: 27 Global Step: 579680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:25,045-Speed 6299.59 samples/sec Loss 3.7681 LearningRate 0.0001 Epoch: 27 Global Step: 579690 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:27:28,275-Speed 6342.38 samples/sec Loss 3.6495 LearningRate 0.0001 Epoch: 27 Global Step: 579700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:31,517-Speed 6317.10 samples/sec Loss 3.7166 LearningRate 0.0001 Epoch: 27 Global Step: 579710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:34,764-Speed 6309.42 samples/sec Loss 3.6764 LearningRate 0.0001 Epoch: 27 Global Step: 579720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:38,006-Speed 6318.10 samples/sec Loss 3.6328 LearningRate 0.0001 Epoch: 27 Global Step: 579730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:41,252-Speed 6311.23 samples/sec Loss 3.6746 LearningRate 0.0001 Epoch: 27 Global Step: 579740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:44,496-Speed 6314.56 samples/sec Loss 3.7386 LearningRate 0.0001 Epoch: 27 Global Step: 579750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:47,742-Speed 6311.17 samples/sec Loss 3.7310 LearningRate 0.0001 Epoch: 27 Global Step: 579760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:50,997-Speed 6292.86 samples/sec Loss 3.6514 LearningRate 0.0001 Epoch: 27 Global Step: 579770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:54,245-Speed 6306.87 samples/sec Loss 3.7012 LearningRate 0.0001 Epoch: 27 Global Step: 579780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:27:57,489-Speed 6314.75 samples/sec Loss 3.7200 LearningRate 0.0001 Epoch: 27 Global Step: 579790 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 20:28:00,733-Speed 6313.56 samples/sec Loss 3.7263 LearningRate 0.0001 Epoch: 27 Global Step: 579800 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:28:03,967-Speed 6334.01 samples/sec Loss 3.6743 LearningRate 0.0001 Epoch: 27 Global Step: 579810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:07,211-Speed 6314.91 samples/sec Loss 3.7180 LearningRate 0.0001 Epoch: 27 Global Step: 579820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:10,460-Speed 6305.46 samples/sec Loss 3.7011 LearningRate 0.0001 Epoch: 27 Global Step: 579830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:13,702-Speed 6318.17 samples/sec Loss 3.6737 LearningRate 0.0001 Epoch: 27 Global Step: 579840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:16,947-Speed 6312.35 samples/sec Loss 3.6797 LearningRate 0.0001 Epoch: 27 Global Step: 579850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:20,192-Speed 6312.38 samples/sec Loss 3.7838 LearningRate 0.0001 Epoch: 27 Global Step: 579860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:23,445-Speed 6296.97 samples/sec Loss 3.7207 LearningRate 0.0001 Epoch: 27 Global Step: 579870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:26,690-Speed 6313.41 samples/sec Loss 3.6791 LearningRate 0.0001 Epoch: 27 Global Step: 579880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:29,935-Speed 6311.99 samples/sec Loss 3.6723 LearningRate 0.0001 Epoch: 27 Global Step: 579890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:33,184-Speed 6306.06 samples/sec Loss 3.7319 LearningRate 0.0001 Epoch: 27 Global Step: 579900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:36,426-Speed 6317.56 samples/sec Loss 3.6827 LearningRate 0.0001 Epoch: 27 Global Step: 579910 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 
20:28:39,674-Speed 6307.61 samples/sec Loss 3.7141 LearningRate 0.0001 Epoch: 27 Global Step: 579920 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:28:42,906-Speed 6339.19 samples/sec Loss 3.6808 LearningRate 0.0001 Epoch: 27 Global Step: 579930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:46,156-Speed 6302.85 samples/sec Loss 3.6613 LearningRate 0.0001 Epoch: 27 Global Step: 579940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:49,399-Speed 6315.23 samples/sec Loss 3.7010 LearningRate 0.0001 Epoch: 27 Global Step: 579950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:52,643-Speed 6314.50 samples/sec Loss 3.6839 LearningRate 0.0001 Epoch: 27 Global Step: 579960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:55,888-Speed 6313.79 samples/sec Loss 3.7521 LearningRate 0.0001 Epoch: 27 Global Step: 579970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:28:59,132-Speed 6313.47 samples/sec Loss 3.6834 LearningRate 0.0001 Epoch: 27 Global Step: 579980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:02,384-Speed 6300.05 samples/sec Loss 3.6959 LearningRate 0.0001 Epoch: 27 Global Step: 579990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:05,640-Speed 6289.88 samples/sec Loss 3.6977 LearningRate 0.0001 Epoch: 27 Global Step: 580000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:08,883-Speed 6316.75 samples/sec Loss 3.7133 LearningRate 0.0001 Epoch: 27 Global Step: 580010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:12,131-Speed 6308.39 samples/sec Loss 3.6921 LearningRate 0.0001 Epoch: 27 Global Step: 580020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:15,372-Speed 6321.05 samples/sec Loss 3.7222 LearningRate 0.0001 Epoch: 27 Global Step: 580030 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:29:18,608-Speed 6328.99 samples/sec 
Loss 3.6828 LearningRate 0.0001 Epoch: 27 Global Step: 580040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:21,862-Speed 6295.23 samples/sec Loss 3.7547 LearningRate 0.0001 Epoch: 27 Global Step: 580050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:25,107-Speed 6311.99 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 27 Global Step: 580060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:28,355-Speed 6308.32 samples/sec Loss 3.7402 LearningRate 0.0001 Epoch: 27 Global Step: 580070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:31,602-Speed 6307.06 samples/sec Loss 3.6512 LearningRate 0.0001 Epoch: 27 Global Step: 580080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:34,853-Speed 6301.24 samples/sec Loss 3.7489 LearningRate 0.0001 Epoch: 27 Global Step: 580090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:38,100-Speed 6310.39 samples/sec Loss 3.6735 LearningRate 0.0001 Epoch: 27 Global Step: 580100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:41,344-Speed 6313.33 samples/sec Loss 3.7559 LearningRate 0.0001 Epoch: 27 Global Step: 580110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:44,589-Speed 6313.04 samples/sec Loss 3.7048 LearningRate 0.0001 Epoch: 27 Global Step: 580120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:47,847-Speed 6287.73 samples/sec Loss 3.6290 LearningRate 0.0001 Epoch: 27 Global Step: 580130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:51,102-Speed 6294.60 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 27 Global Step: 580140 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:29:54,335-Speed 6335.80 samples/sec Loss 3.7093 LearningRate 0.0001 Epoch: 27 Global Step: 580150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:29:57,581-Speed 6310.38 samples/sec Loss 3.7218 LearningRate 0.0001 Epoch: 27 
Global Step: 580160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:00,839-Speed 6288.01 samples/sec Loss 3.6812 LearningRate 0.0001 Epoch: 27 Global Step: 580170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:04,085-Speed 6310.92 samples/sec Loss 3.7041 LearningRate 0.0001 Epoch: 27 Global Step: 580180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:07,332-Speed 6309.07 samples/sec Loss 3.6949 LearningRate 0.0001 Epoch: 27 Global Step: 580190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:10,582-Speed 6302.76 samples/sec Loss 3.6854 LearningRate 0.0001 Epoch: 27 Global Step: 580200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:13,832-Speed 6303.32 samples/sec Loss 3.7089 LearningRate 0.0001 Epoch: 27 Global Step: 580210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:17,080-Speed 6305.87 samples/sec Loss 3.7273 LearningRate 0.0001 Epoch: 27 Global Step: 580220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:20,330-Speed 6302.82 samples/sec Loss 3.6522 LearningRate 0.0001 Epoch: 27 Global Step: 580230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:23,570-Speed 6321.25 samples/sec Loss 3.6799 LearningRate 0.0001 Epoch: 27 Global Step: 580240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:26,822-Speed 6300.36 samples/sec Loss 3.6599 LearningRate 0.0001 Epoch: 27 Global Step: 580250 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:30:30,071-Speed 6304.37 samples/sec Loss 3.6833 LearningRate 0.0001 Epoch: 27 Global Step: 580260 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:30:33,319-Speed 6306.17 samples/sec Loss 3.6989 LearningRate 0.0001 Epoch: 27 Global Step: 580270 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:30:36,565-Speed 6310.60 samples/sec Loss 3.6530 LearningRate 0.0001 Epoch: 27 Global Step: 580280 Fp16 Grad Scale: 
16384 Required: 23 hours Training: 2022-04-02 20:30:39,795-Speed 6342.67 samples/sec Loss 3.6792 LearningRate 0.0001 Epoch: 27 Global Step: 580290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:43,044-Speed 6305.75 samples/sec Loss 3.7493 LearningRate 0.0001 Epoch: 27 Global Step: 580300 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:46,291-Speed 6307.78 samples/sec Loss 3.7093 LearningRate 0.0001 Epoch: 27 Global Step: 580310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:49,535-Speed 6315.02 samples/sec Loss 3.6823 LearningRate 0.0001 Epoch: 27 Global Step: 580320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:52,796-Speed 6281.61 samples/sec Loss 3.6899 LearningRate 0.0001 Epoch: 27 Global Step: 580330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:56,037-Speed 6319.93 samples/sec Loss 3.6919 LearningRate 0.0001 Epoch: 27 Global Step: 580340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:30:59,281-Speed 6315.85 samples/sec Loss 3.7724 LearningRate 0.0001 Epoch: 27 Global Step: 580350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:02,526-Speed 6312.10 samples/sec Loss 3.6462 LearningRate 0.0001 Epoch: 27 Global Step: 580360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:05,775-Speed 6305.75 samples/sec Loss 3.7147 LearningRate 0.0001 Epoch: 27 Global Step: 580370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:09,020-Speed 6313.26 samples/sec Loss 3.7558 LearningRate 0.0001 Epoch: 27 Global Step: 580380 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:12,262-Speed 6318.92 samples/sec Loss 3.6959 LearningRate 0.0001 Epoch: 27 Global Step: 580390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:15,505-Speed 6314.88 samples/sec Loss 3.7215 LearningRate 0.0001 Epoch: 27 Global Step: 580400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 
2022-04-02 20:31:18,751-Speed 6313.02 samples/sec Loss 3.6560 LearningRate 0.0001 Epoch: 27 Global Step: 580410 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:21,998-Speed 6306.86 samples/sec Loss 3.8016 LearningRate 0.0001 Epoch: 27 Global Step: 580420 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:25,244-Speed 6310.76 samples/sec Loss 3.6527 LearningRate 0.0001 Epoch: 27 Global Step: 580430 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:28,506-Speed 6279.70 samples/sec Loss 3.7764 LearningRate 0.0001 Epoch: 27 Global Step: 580440 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:31,751-Speed 6313.89 samples/sec Loss 3.7541 LearningRate 0.0001 Epoch: 27 Global Step: 580450 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:34,994-Speed 6314.88 samples/sec Loss 3.7078 LearningRate 0.0001 Epoch: 27 Global Step: 580460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:38,244-Speed 6304.50 samples/sec Loss 3.7571 LearningRate 0.0001 Epoch: 27 Global Step: 580470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:41,489-Speed 6311.07 samples/sec Loss 3.6808 LearningRate 0.0001 Epoch: 27 Global Step: 580480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:44,735-Speed 6312.03 samples/sec Loss 3.6328 LearningRate 0.0001 Epoch: 27 Global Step: 580490 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:31:47,971-Speed 6330.31 samples/sec Loss 3.6682 LearningRate 0.0001 Epoch: 27 Global Step: 580500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:51,215-Speed 6314.71 samples/sec Loss 3.6860 LearningRate 0.0001 Epoch: 27 Global Step: 580510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:54,462-Speed 6307.12 samples/sec Loss 3.7464 LearningRate 0.0001 Epoch: 27 Global Step: 580520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:31:57,705-Speed 6316.97 
samples/sec Loss 3.7064 LearningRate 0.0001 Epoch: 27 Global Step: 580530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:00,957-Speed 6299.74 samples/sec Loss 3.7568 LearningRate 0.0001 Epoch: 27 Global Step: 580540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:04,202-Speed 6312.66 samples/sec Loss 3.7055 LearningRate 0.0001 Epoch: 27 Global Step: 580550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:07,449-Speed 6307.87 samples/sec Loss 3.6944 LearningRate 0.0001 Epoch: 27 Global Step: 580560 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:10,700-Speed 6301.18 samples/sec Loss 3.7180 LearningRate 0.0001 Epoch: 27 Global Step: 580570 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:13,945-Speed 6313.15 samples/sec Loss 3.6838 LearningRate 0.0001 Epoch: 27 Global Step: 580580 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:17,191-Speed 6311.93 samples/sec Loss 3.7263 LearningRate 0.0001 Epoch: 27 Global Step: 580590 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:20,434-Speed 6315.56 samples/sec Loss 3.6627 LearningRate 0.0001 Epoch: 27 Global Step: 580600 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:32:23,683-Speed 6305.63 samples/sec Loss 3.7259 LearningRate 0.0001 Epoch: 27 Global Step: 580610 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:32:26,930-Speed 6308.54 samples/sec Loss 3.6627 LearningRate 0.0001 Epoch: 27 Global Step: 580620 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:32:30,161-Speed 6340.72 samples/sec Loss 3.6769 LearningRate 0.0001 Epoch: 27 Global Step: 580630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:33,408-Speed 6309.05 samples/sec Loss 3.7663 LearningRate 0.0001 Epoch: 27 Global Step: 580640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:36,654-Speed 6310.08 samples/sec Loss 3.6873 LearningRate 
0.0001 Epoch: 27 Global Step: 580650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:39,898-Speed 6315.18 samples/sec Loss 3.6979 LearningRate 0.0001 Epoch: 27 Global Step: 580660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:43,142-Speed 6313.60 samples/sec Loss 3.7419 LearningRate 0.0001 Epoch: 27 Global Step: 580670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:46,384-Speed 6318.02 samples/sec Loss 3.7272 LearningRate 0.0001 Epoch: 27 Global Step: 580680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:49,644-Speed 6285.29 samples/sec Loss 3.6881 LearningRate 0.0001 Epoch: 27 Global Step: 580690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:32:52,905-Speed 6280.57 samples/sec Loss 3.6617 LearningRate 0.0001 Epoch: 27 Global Step: 580700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:33:52,076-Speed 346.12 samples/sec Loss 3.7799 LearningRate 0.0001 Epoch: 28 Global Step: 580710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:33:55,311-Speed 6331.31 samples/sec Loss 3.7173 LearningRate 0.0001 Epoch: 28 Global Step: 580720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:33:58,556-Speed 6313.30 samples/sec Loss 3.7299 LearningRate 0.0001 Epoch: 28 Global Step: 580730 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:34:01,784-Speed 6352.82 samples/sec Loss 3.7132 LearningRate 0.0001 Epoch: 28 Global Step: 580740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:05,024-Speed 6322.34 samples/sec Loss 3.7380 LearningRate 0.0001 Epoch: 28 Global Step: 580750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:08,261-Speed 6326.95 samples/sec Loss 3.6970 LearningRate 0.0001 Epoch: 28 Global Step: 580760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:11,499-Speed 6326.46 samples/sec Loss 3.7251 LearningRate 0.0001 Epoch: 28 Global Step: 580770 Fp16 
Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:14,737-Speed 6327.91 samples/sec Loss 3.7080 LearningRate 0.0001 Epoch: 28 Global Step: 580780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:17,977-Speed 6322.59 samples/sec Loss 3.7406 LearningRate 0.0001 Epoch: 28 Global Step: 580790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:21,214-Speed 6328.22 samples/sec Loss 3.6464 LearningRate 0.0001 Epoch: 28 Global Step: 580800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:24,455-Speed 6319.68 samples/sec Loss 3.6817 LearningRate 0.0001 Epoch: 28 Global Step: 580810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:27,697-Speed 6319.65 samples/sec Loss 3.6694 LearningRate 0.0001 Epoch: 28 Global Step: 580820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:30,938-Speed 6320.54 samples/sec Loss 3.7255 LearningRate 0.0001 Epoch: 28 Global Step: 580830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:34,176-Speed 6325.65 samples/sec Loss 3.6420 LearningRate 0.0001 Epoch: 28 Global Step: 580840 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:34:37,403-Speed 6348.80 samples/sec Loss 3.7169 LearningRate 0.0001 Epoch: 28 Global Step: 580850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:40,643-Speed 6320.92 samples/sec Loss 3.6658 LearningRate 0.0001 Epoch: 28 Global Step: 580860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:43,884-Speed 6320.93 samples/sec Loss 3.6824 LearningRate 0.0001 Epoch: 28 Global Step: 580870 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:47,127-Speed 6317.15 samples/sec Loss 3.6872 LearningRate 0.0001 Epoch: 28 Global Step: 580880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:50,367-Speed 6322.78 samples/sec Loss 3.6648 LearningRate 0.0001 Epoch: 28 Global Step: 580890 Fp16 Grad Scale: 8192 Required: 23 hours 
Training: 2022-04-02 20:34:53,611-Speed 6315.42 samples/sec Loss 3.6572 LearningRate 0.0001 Epoch: 28 Global Step: 580900 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:34:56,848-Speed 6327.62 samples/sec Loss 3.6701 LearningRate 0.0001 Epoch: 28 Global Step: 580910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:00,088-Speed 6321.20 samples/sec Loss 3.6776 LearningRate 0.0001 Epoch: 28 Global Step: 580920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:03,343-Speed 6293.77 samples/sec Loss 3.6937 LearningRate 0.0001 Epoch: 28 Global Step: 580930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:06,581-Speed 6326.46 samples/sec Loss 3.7345 LearningRate 0.0001 Epoch: 28 Global Step: 580940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:09,825-Speed 6313.79 samples/sec Loss 3.7568 LearningRate 0.0001 Epoch: 28 Global Step: 580950 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:35:13,066-Speed 6321.11 samples/sec Loss 3.7047 LearningRate 0.0001 Epoch: 28 Global Step: 580960 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:35:16,293-Speed 6347.62 samples/sec Loss 3.6635 LearningRate 0.0001 Epoch: 28 Global Step: 580970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:19,530-Speed 6329.51 samples/sec Loss 3.6827 LearningRate 0.0001 Epoch: 28 Global Step: 580980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:22,770-Speed 6322.20 samples/sec Loss 3.7101 LearningRate 0.0001 Epoch: 28 Global Step: 580990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:26,010-Speed 6322.63 samples/sec Loss 3.6367 LearningRate 0.0001 Epoch: 28 Global Step: 581000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:29,254-Speed 6314.72 samples/sec Loss 3.6506 LearningRate 0.0001 Epoch: 28 Global Step: 581010 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:32,490-Speed 
6331.50 samples/sec Loss 3.6979 LearningRate 0.0001 Epoch: 28 Global Step: 581020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:35,725-Speed 6330.18 samples/sec Loss 3.7207 LearningRate 0.0001 Epoch: 28 Global Step: 581030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:38,962-Speed 6328.25 samples/sec Loss 3.6028 LearningRate 0.0001 Epoch: 28 Global Step: 581040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:42,200-Speed 6327.33 samples/sec Loss 3.7675 LearningRate 0.0001 Epoch: 28 Global Step: 581050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:45,438-Speed 6325.54 samples/sec Loss 3.6883 LearningRate 0.0001 Epoch: 28 Global Step: 581060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:48,681-Speed 6316.28 samples/sec Loss 3.7095 LearningRate 0.0001 Epoch: 28 Global Step: 581070 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:35:51,925-Speed 6316.41 samples/sec Loss 3.7056 LearningRate 0.0001 Epoch: 28 Global Step: 581080 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:35:55,147-Speed 6357.22 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 28 Global Step: 581090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:35:58,388-Speed 6320.39 samples/sec Loss 3.6371 LearningRate 0.0001 Epoch: 28 Global Step: 581100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:01,630-Speed 6318.29 samples/sec Loss 3.5983 LearningRate 0.0001 Epoch: 28 Global Step: 581110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:04,869-Speed 6324.68 samples/sec Loss 3.6752 LearningRate 0.0001 Epoch: 28 Global Step: 581120 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:08,109-Speed 6322.46 samples/sec Loss 3.7016 LearningRate 0.0001 Epoch: 28 Global Step: 581130 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:11,350-Speed 6319.94 samples/sec Loss 3.6844 
LearningRate 0.0001 Epoch: 28 Global Step: 581140 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:14,593-Speed 6317.92 samples/sec Loss 3.6687 LearningRate 0.0001 Epoch: 28 Global Step: 581150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:17,830-Speed 6328.08 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 28 Global Step: 581160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:21,067-Speed 6328.51 samples/sec Loss 3.7299 LearningRate 0.0001 Epoch: 28 Global Step: 581170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:24,307-Speed 6320.81 samples/sec Loss 3.6746 LearningRate 0.0001 Epoch: 28 Global Step: 581180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:27,550-Speed 6316.69 samples/sec Loss 3.6843 LearningRate 0.0001 Epoch: 28 Global Step: 581190 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:36:30,777-Speed 6347.58 samples/sec Loss 3.6793 LearningRate 0.0001 Epoch: 28 Global Step: 581200 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:34,019-Speed 6319.20 samples/sec Loss 3.6832 LearningRate 0.0001 Epoch: 28 Global Step: 581210 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:37,261-Speed 6319.13 samples/sec Loss 3.6535 LearningRate 0.0001 Epoch: 28 Global Step: 581220 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:40,503-Speed 6319.79 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 28 Global Step: 581230 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:43,742-Speed 6324.01 samples/sec Loss 3.6176 LearningRate 0.0001 Epoch: 28 Global Step: 581240 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:46,978-Speed 6329.24 samples/sec Loss 3.6762 LearningRate 0.0001 Epoch: 28 Global Step: 581250 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:50,230-Speed 6299.47 samples/sec Loss 3.6934 LearningRate 0.0001 Epoch: 28 Global Step: 
581260 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:53,470-Speed 6323.46 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 28 Global Step: 581270 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:56,710-Speed 6321.05 samples/sec Loss 3.7116 LearningRate 0.0001 Epoch: 28 Global Step: 581280 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:36:59,954-Speed 6315.88 samples/sec Loss 3.6770 LearningRate 0.0001 Epoch: 28 Global Step: 581290 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:03,200-Speed 6311.01 samples/sec Loss 3.6754 LearningRate 0.0001 Epoch: 28 Global Step: 581300 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:06,425-Speed 6351.01 samples/sec Loss 3.7268 LearningRate 0.0001 Epoch: 28 Global Step: 581310 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:09,666-Speed 6320.95 samples/sec Loss 3.6903 LearningRate 0.0001 Epoch: 28 Global Step: 581320 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:12,903-Speed 6328.26 samples/sec Loss 3.6487 LearningRate 0.0001 Epoch: 28 Global Step: 581330 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:16,146-Speed 6315.41 samples/sec Loss 3.7130 LearningRate 0.0001 Epoch: 28 Global Step: 581340 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:19,385-Speed 6324.56 samples/sec Loss 3.7427 LearningRate 0.0001 Epoch: 28 Global Step: 581350 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:22,626-Speed 6320.01 samples/sec Loss 3.6637 LearningRate 0.0001 Epoch: 28 Global Step: 581360 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:25,865-Speed 6324.16 samples/sec Loss 3.6836 LearningRate 0.0001 Epoch: 28 Global Step: 581370 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:29,110-Speed 6314.18 samples/sec Loss 3.7225 LearningRate 0.0001 Epoch: 28 Global Step: 581380 Fp16 Grad Scale: 8192 Required: 23 
hours Training: 2022-04-02 20:37:32,350-Speed 6321.96 samples/sec Loss 3.7393 LearningRate 0.0001 Epoch: 28 Global Step: 581390 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:35,590-Speed 6322.84 samples/sec Loss 3.6880 LearningRate 0.0001 Epoch: 28 Global Step: 581400 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:38,832-Speed 6316.94 samples/sec Loss 3.7084 LearningRate 0.0001 Epoch: 28 Global Step: 581410 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:42,073-Speed 6320.80 samples/sec Loss 3.6317 LearningRate 0.0001 Epoch: 28 Global Step: 581420 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:45,314-Speed 6322.15 samples/sec Loss 3.6601 LearningRate 0.0001 Epoch: 28 Global Step: 581430 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:48,551-Speed 6326.87 samples/sec Loss 3.7444 LearningRate 0.0001 Epoch: 28 Global Step: 581440 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:51,796-Speed 6313.07 samples/sec Loss 3.7456 LearningRate 0.0001 Epoch: 28 Global Step: 581450 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:37:55,022-Speed 6351.15 samples/sec Loss 3.6277 LearningRate 0.0001 Epoch: 28 Global Step: 581460 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:37:58,265-Speed 6316.66 samples/sec Loss 3.6904 LearningRate 0.0001 Epoch: 28 Global Step: 581470 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:01,559-Speed 6218.62 samples/sec Loss 3.6730 LearningRate 0.0001 Epoch: 28 Global Step: 581480 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:04,801-Speed 6319.16 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 28 Global Step: 581490 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:08,042-Speed 6320.11 samples/sec Loss 3.7191 LearningRate 0.0001 Epoch: 28 Global Step: 581500 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:38:11,292-Speed 6302.54 samples/sec Loss 3.7449 LearningRate 0.0001 Epoch: 28 Global Step: 581510 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:14,532-Speed 6322.27 samples/sec Loss 3.7032 LearningRate 0.0001 Epoch: 28 Global Step: 581520 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:17,779-Speed 6310.19 samples/sec Loss 3.7231 LearningRate 0.0001 Epoch: 28 Global Step: 581530 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:21,020-Speed 6319.92 samples/sec Loss 3.6494 LearningRate 0.0001 Epoch: 28 Global Step: 581540 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:24,259-Speed 6324.52 samples/sec Loss 3.6900 LearningRate 0.0001 Epoch: 28 Global Step: 581550 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:27,501-Speed 6317.50 samples/sec Loss 3.6784 LearningRate 0.0001 Epoch: 28 Global Step: 581560 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:38:30,742-Speed 6320.65 samples/sec Loss 3.6337 LearningRate 0.0001 Epoch: 28 Global Step: 581570 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:38:33,987-Speed 6313.93 samples/sec Loss 3.6840 LearningRate 0.0001 Epoch: 28 Global Step: 581580 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:38:37,226-Speed 6322.39 samples/sec Loss 3.7011 LearningRate 0.0001 Epoch: 28 Global Step: 581590 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:38:40,455-Speed 6343.96 samples/sec Loss 3.6758 LearningRate 0.0001 Epoch: 28 Global Step: 581600 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:43,697-Speed 6318.57 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 28 Global Step: 581610 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:46,942-Speed 6313.72 samples/sec Loss 3.7249 LearningRate 0.0001 Epoch: 28 Global Step: 581620 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:50,185-Speed 6315.90 samples/sec 
Loss 3.6756 LearningRate 0.0001 Epoch: 28 Global Step: 581630 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:53,426-Speed 6320.00 samples/sec Loss 3.6334 LearningRate 0.0001 Epoch: 28 Global Step: 581640 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:56,665-Speed 6324.56 samples/sec Loss 3.7410 LearningRate 0.0001 Epoch: 28 Global Step: 581650 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:38:59,908-Speed 6317.36 samples/sec Loss 3.6544 LearningRate 0.0001 Epoch: 28 Global Step: 581660 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:03,152-Speed 6315.49 samples/sec Loss 3.5940 LearningRate 0.0001 Epoch: 28 Global Step: 581670 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:06,395-Speed 6316.29 samples/sec Loss 3.7278 LearningRate 0.0001 Epoch: 28 Global Step: 581680 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:09,636-Speed 6320.43 samples/sec Loss 3.6380 LearningRate 0.0001 Epoch: 28 Global Step: 581690 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:12,870-Speed 6334.00 samples/sec Loss 3.7096 LearningRate 0.0001 Epoch: 28 Global Step: 581700 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:16,111-Speed 6320.31 samples/sec Loss 3.7184 LearningRate 0.0001 Epoch: 28 Global Step: 581710 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:19,358-Speed 6310.09 samples/sec Loss 3.7530 LearningRate 0.0001 Epoch: 28 Global Step: 581720 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:22,600-Speed 6317.31 samples/sec Loss 3.7012 LearningRate 0.0001 Epoch: 28 Global Step: 581730 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:25,845-Speed 6312.63 samples/sec Loss 3.7111 LearningRate 0.0001 Epoch: 28 Global Step: 581740 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:29,087-Speed 6319.45 samples/sec Loss 3.6611 LearningRate 0.0001 Epoch: 28 
Global Step: 581750 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:32,330-Speed 6316.03 samples/sec Loss 3.7034 LearningRate 0.0001 Epoch: 28 Global Step: 581760 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:35,575-Speed 6312.84 samples/sec Loss 3.6928 LearningRate 0.0001 Epoch: 28 Global Step: 581770 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:38,821-Speed 6310.87 samples/sec Loss 3.7008 LearningRate 0.0001 Epoch: 28 Global Step: 581780 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:42,066-Speed 6312.65 samples/sec Loss 3.7136 LearningRate 0.0001 Epoch: 28 Global Step: 581790 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:45,295-Speed 6343.01 samples/sec Loss 3.6732 LearningRate 0.0001 Epoch: 28 Global Step: 581800 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:48,541-Speed 6311.31 samples/sec Loss 3.6707 LearningRate 0.0001 Epoch: 28 Global Step: 581810 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:51,783-Speed 6318.27 samples/sec Loss 3.7001 LearningRate 0.0001 Epoch: 28 Global Step: 581820 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:55,031-Speed 6308.22 samples/sec Loss 3.6200 LearningRate 0.0001 Epoch: 28 Global Step: 581830 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:39:58,275-Speed 6313.68 samples/sec Loss 3.6840 LearningRate 0.0001 Epoch: 28 Global Step: 581840 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:01,522-Speed 6309.87 samples/sec Loss 3.6395 LearningRate 0.0001 Epoch: 28 Global Step: 581850 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:04,762-Speed 6321.56 samples/sec Loss 3.6667 LearningRate 0.0001 Epoch: 28 Global Step: 581860 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:08,007-Speed 6312.50 samples/sec Loss 3.6805 LearningRate 0.0001 Epoch: 28 Global Step: 581870 Fp16 Grad Scale: 8192 
Required: 23 hours Training: 2022-04-02 20:40:11,248-Speed 6320.69 samples/sec Loss 3.7270 LearningRate 0.0001 Epoch: 28 Global Step: 581880 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:14,491-Speed 6316.79 samples/sec Loss 3.7120 LearningRate 0.0001 Epoch: 28 Global Step: 581890 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:17,732-Speed 6320.85 samples/sec Loss 3.6444 LearningRate 0.0001 Epoch: 28 Global Step: 581900 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:40:20,963-Speed 6341.16 samples/sec Loss 3.7119 LearningRate 0.0001 Epoch: 28 Global Step: 581910 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:24,209-Speed 6310.48 samples/sec Loss 3.6712 LearningRate 0.0001 Epoch: 28 Global Step: 581920 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:27,450-Speed 6321.51 samples/sec Loss 3.6982 LearningRate 0.0001 Epoch: 28 Global Step: 581930 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:30,693-Speed 6315.16 samples/sec Loss 3.6798 LearningRate 0.0001 Epoch: 28 Global Step: 581940 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:33,936-Speed 6318.27 samples/sec Loss 3.7300 LearningRate 0.0001 Epoch: 28 Global Step: 581950 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:37,198-Speed 6278.35 samples/sec Loss 3.7083 LearningRate 0.0001 Epoch: 28 Global Step: 581960 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:40,439-Speed 6320.80 samples/sec Loss 3.7092 LearningRate 0.0001 Epoch: 28 Global Step: 581970 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:43,678-Speed 6323.75 samples/sec Loss 3.7109 LearningRate 0.0001 Epoch: 28 Global Step: 581980 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:46,920-Speed 6318.37 samples/sec Loss 3.6555 LearningRate 0.0001 Epoch: 28 Global Step: 581990 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 
20:40:50,161-Speed 6320.67 samples/sec Loss 3.6853 LearningRate 0.0001 Epoch: 28 Global Step: 582000 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:53,406-Speed 6314.03 samples/sec Loss 3.6217 LearningRate 0.0001 Epoch: 28 Global Step: 582010 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:40:56,635-Speed 6342.16 samples/sec Loss 3.6138 LearningRate 0.0001 Epoch: 28 Global Step: 582020 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:40:59,880-Speed 6314.43 samples/sec Loss 3.6313 LearningRate 0.0001 Epoch: 28 Global Step: 582030 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:03,125-Speed 6312.43 samples/sec Loss 3.7083 LearningRate 0.0001 Epoch: 28 Global Step: 582040 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:06,368-Speed 6315.25 samples/sec Loss 3.7149 LearningRate 0.0001 Epoch: 28 Global Step: 582050 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:09,615-Speed 6309.38 samples/sec Loss 3.7154 LearningRate 0.0001 Epoch: 28 Global Step: 582060 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:12,858-Speed 6316.64 samples/sec Loss 3.7183 LearningRate 0.0001 Epoch: 28 Global Step: 582070 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:16,101-Speed 6316.90 samples/sec Loss 3.6408 LearningRate 0.0001 Epoch: 28 Global Step: 582080 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:19,340-Speed 6323.69 samples/sec Loss 3.6859 LearningRate 0.0001 Epoch: 28 Global Step: 582090 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:22,582-Speed 6319.47 samples/sec Loss 3.6678 LearningRate 0.0001 Epoch: 28 Global Step: 582100 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:25,828-Speed 6311.43 samples/sec Loss 3.6782 LearningRate 0.0001 Epoch: 28 Global Step: 582110 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:29,067-Speed 6323.90 samples/sec 
Loss 3.6523 LearningRate 0.0001 Epoch: 28 Global Step: 582120 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:41:32,307-Speed 6322.13 samples/sec Loss 3.6774 LearningRate 0.0001 Epoch: 28 Global Step: 582130 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:41:35,553-Speed 6311.14 samples/sec Loss 3.7118 LearningRate 0.0001 Epoch: 28 Global Step: 582140 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-02 20:41:38,790-Speed 6329.30 samples/sec Loss 3.6738 LearningRate 0.0001 Epoch: 28 Global Step: 582150 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:42,032-Speed 6318.33 samples/sec Loss 3.6754 LearningRate 0.0001 Epoch: 28 Global Step: 582160 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:45,272-Speed 6321.16 samples/sec Loss 3.6880 LearningRate 0.0001 Epoch: 28 Global Step: 582170 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:48,516-Speed 6315.67 samples/sec Loss 3.6976 LearningRate 0.0001 Epoch: 28 Global Step: 582180 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:51,761-Speed 6312.83 samples/sec Loss 3.6516 LearningRate 0.0001 Epoch: 28 Global Step: 582190 Fp16 Grad Scale: 8192 Required: 23 hours Training: 2022-04-02 20:41:55,004-Speed 6315.60 samples/sec Loss 3.6968 LearningRate 0.0001 Epoch: 28 Global Step: 582200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:41:58,250-Speed 6310.65 samples/sec Loss 3.6493 LearningRate 0.0001 Epoch: 28 Global Step: 582210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:01,495-Speed 6312.79 samples/sec Loss 3.6370 LearningRate 0.0001 Epoch: 28 Global Step: 582220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:04,742-Speed 6309.95 samples/sec Loss 3.6165 LearningRate 0.0001 Epoch: 28 Global Step: 582230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:07,986-Speed 6314.57 samples/sec Loss 3.7083 LearningRate 0.0001 Epoch: 
28 Global Step: 582240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:11,215-Speed 6343.61 samples/sec Loss 3.7400 LearningRate 0.0001 Epoch: 28 Global Step: 582250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:14,455-Speed 6321.75 samples/sec Loss 3.7511 LearningRate 0.0001 Epoch: 28 Global Step: 582260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:17,701-Speed 6312.01 samples/sec Loss 3.6683 LearningRate 0.0001 Epoch: 28 Global Step: 582270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:20,945-Speed 6313.65 samples/sec Loss 3.6526 LearningRate 0.0001 Epoch: 28 Global Step: 582280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:24,189-Speed 6314.19 samples/sec Loss 3.6496 LearningRate 0.0001 Epoch: 28 Global Step: 582290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:27,434-Speed 6313.00 samples/sec Loss 3.7147 LearningRate 0.0001 Epoch: 28 Global Step: 582300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:30,677-Speed 6316.46 samples/sec Loss 3.6314 LearningRate 0.0001 Epoch: 28 Global Step: 582310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:33,917-Speed 6322.74 samples/sec Loss 3.7562 LearningRate 0.0001 Epoch: 28 Global Step: 582320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:37,158-Speed 6321.03 samples/sec Loss 3.7193 LearningRate 0.0001 Epoch: 28 Global Step: 582330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:40,405-Speed 6309.14 samples/sec Loss 3.7245 LearningRate 0.0001 Epoch: 28 Global Step: 582340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:43,639-Speed 6334.70 samples/sec Loss 3.6781 LearningRate 0.0001 Epoch: 28 Global Step: 582350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:46,879-Speed 6321.61 samples/sec Loss 3.6435 LearningRate 0.0001 Epoch: 28 Global Step: 582360 Fp16 Grad Scale: 
8192 Required: 22 hours Training: 2022-04-02 20:42:50,127-Speed 6308.06 samples/sec Loss 3.6887 LearningRate 0.0001 Epoch: 28 Global Step: 582370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:53,370-Speed 6315.55 samples/sec Loss 3.6577 LearningRate 0.0001 Epoch: 28 Global Step: 582380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:56,612-Speed 6319.51 samples/sec Loss 3.6643 LearningRate 0.0001 Epoch: 28 Global Step: 582390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:42:59,855-Speed 6315.62 samples/sec Loss 3.6208 LearningRate 0.0001 Epoch: 28 Global Step: 582400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:03,097-Speed 6318.64 samples/sec Loss 3.6796 LearningRate 0.0001 Epoch: 28 Global Step: 582410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:06,342-Speed 6313.46 samples/sec Loss 3.7059 LearningRate 0.0001 Epoch: 28 Global Step: 582420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:09,585-Speed 6315.60 samples/sec Loss 3.6675 LearningRate 0.0001 Epoch: 28 Global Step: 582430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:12,832-Speed 6308.44 samples/sec Loss 3.6159 LearningRate 0.0001 Epoch: 28 Global Step: 582440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:16,075-Speed 6318.12 samples/sec Loss 3.6798 LearningRate 0.0001 Epoch: 28 Global Step: 582450 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:43:19,306-Speed 6338.35 samples/sec Loss 3.7056 LearningRate 0.0001 Epoch: 28 Global Step: 582460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:22,552-Speed 6311.11 samples/sec Loss 3.6571 LearningRate 0.0001 Epoch: 28 Global Step: 582470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:25,799-Speed 6309.66 samples/sec Loss 3.7361 LearningRate 0.0001 Epoch: 28 Global Step: 582480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 
2022-04-02 20:43:29,040-Speed 6319.98 samples/sec Loss 3.7601 LearningRate 0.0001 Epoch: 28 Global Step: 582490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:32,288-Speed 6306.89 samples/sec Loss 3.6933 LearningRate 0.0001 Epoch: 28 Global Step: 582500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:35,532-Speed 6313.71 samples/sec Loss 3.6787 LearningRate 0.0001 Epoch: 28 Global Step: 582510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:38,778-Speed 6311.80 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 28 Global Step: 582520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:42,019-Speed 6320.24 samples/sec Loss 3.7197 LearningRate 0.0001 Epoch: 28 Global Step: 582530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:45,265-Speed 6310.10 samples/sec Loss 3.6391 LearningRate 0.0001 Epoch: 28 Global Step: 582540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:48,512-Speed 6309.12 samples/sec Loss 3.7079 LearningRate 0.0001 Epoch: 28 Global Step: 582550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:51,755-Speed 6317.08 samples/sec Loss 3.6686 LearningRate 0.0001 Epoch: 28 Global Step: 582560 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:43:54,983-Speed 6345.67 samples/sec Loss 3.6312 LearningRate 0.0001 Epoch: 28 Global Step: 582570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:43:58,224-Speed 6322.20 samples/sec Loss 3.6840 LearningRate 0.0001 Epoch: 28 Global Step: 582580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:01,472-Speed 6306.78 samples/sec Loss 3.6739 LearningRate 0.0001 Epoch: 28 Global Step: 582590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:04,715-Speed 6316.31 samples/sec Loss 3.7160 LearningRate 0.0001 Epoch: 28 Global Step: 582600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:07,961-Speed 6310.52 
samples/sec Loss 3.6741 LearningRate 0.0001 Epoch: 28 Global Step: 582610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:11,205-Speed 6313.95 samples/sec Loss 3.6955 LearningRate 0.0001 Epoch: 28 Global Step: 582620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:14,451-Speed 6311.81 samples/sec Loss 3.6600 LearningRate 0.0001 Epoch: 28 Global Step: 582630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:17,693-Speed 6318.24 samples/sec Loss 3.7018 LearningRate 0.0001 Epoch: 28 Global Step: 582640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:20,938-Speed 6311.81 samples/sec Loss 3.6431 LearningRate 0.0001 Epoch: 28 Global Step: 582650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:24,182-Speed 6315.14 samples/sec Loss 3.6678 LearningRate 0.0001 Epoch: 28 Global Step: 582660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:27,503-Speed 6168.61 samples/sec Loss 3.6456 LearningRate 0.0001 Epoch: 28 Global Step: 582670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:44:30,735-Speed 6338.74 samples/sec Loss 3.6691 LearningRate 0.0001 Epoch: 28 Global Step: 582680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:33,980-Speed 6311.88 samples/sec Loss 3.6474 LearningRate 0.0001 Epoch: 28 Global Step: 582690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:37,222-Speed 6318.06 samples/sec Loss 3.6471 LearningRate 0.0001 Epoch: 28 Global Step: 582700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:40,469-Speed 6308.35 samples/sec Loss 3.6273 LearningRate 0.0001 Epoch: 28 Global Step: 582710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:43,715-Speed 6310.44 samples/sec Loss 3.6953 LearningRate 0.0001 Epoch: 28 Global Step: 582720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:46,960-Speed 6313.53 samples/sec Loss 3.7379 LearningRate 
0.0001 Epoch: 28 Global Step: 582730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:50,204-Speed 6313.83 samples/sec Loss 3.7403 LearningRate 0.0001 Epoch: 28 Global Step: 582740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:53,463-Speed 6285.85 samples/sec Loss 3.6814 LearningRate 0.0001 Epoch: 28 Global Step: 582750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:56,707-Speed 6314.79 samples/sec Loss 3.6745 LearningRate 0.0001 Epoch: 28 Global Step: 582760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:44:59,957-Speed 6304.44 samples/sec Loss 3.7191 LearningRate 0.0001 Epoch: 28 Global Step: 582770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:03,205-Speed 6306.62 samples/sec Loss 3.7123 LearningRate 0.0001 Epoch: 28 Global Step: 582780 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:45:06,435-Speed 6341.33 samples/sec Loss 3.6876 LearningRate 0.0001 Epoch: 28 Global Step: 582790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:09,693-Speed 6289.04 samples/sec Loss 3.7093 LearningRate 0.0001 Epoch: 28 Global Step: 582800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:12,936-Speed 6316.06 samples/sec Loss 3.6718 LearningRate 0.0001 Epoch: 28 Global Step: 582810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:16,181-Speed 6312.13 samples/sec Loss 3.6142 LearningRate 0.0001 Epoch: 28 Global Step: 582820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:19,428-Speed 6309.22 samples/sec Loss 3.6639 LearningRate 0.0001 Epoch: 28 Global Step: 582830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:22,674-Speed 6311.76 samples/sec Loss 3.7386 LearningRate 0.0001 Epoch: 28 Global Step: 582840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:25,931-Speed 6289.58 samples/sec Loss 3.6643 LearningRate 0.0001 Epoch: 28 Global Step: 582850 Fp16 
Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:29,175-Speed 6313.72 samples/sec Loss 3.6817 LearningRate 0.0001 Epoch: 28 Global Step: 582860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:32,416-Speed 6321.31 samples/sec Loss 3.6669 LearningRate 0.0001 Epoch: 28 Global Step: 582870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:35,658-Speed 6317.10 samples/sec Loss 3.6719 LearningRate 0.0001 Epoch: 28 Global Step: 582880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:38,899-Speed 6320.93 samples/sec Loss 3.6754 LearningRate 0.0001 Epoch: 28 Global Step: 582890 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:45:42,142-Speed 6316.48 samples/sec Loss 3.6705 LearningRate 0.0001 Epoch: 28 Global Step: 582900 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:45:45,384-Speed 6319.35 samples/sec Loss 3.7190 LearningRate 0.0001 Epoch: 28 Global Step: 582910 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:45:48,616-Speed 6337.97 samples/sec Loss 3.6756 LearningRate 0.0001 Epoch: 28 Global Step: 582920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:51,865-Speed 6305.10 samples/sec Loss 3.6998 LearningRate 0.0001 Epoch: 28 Global Step: 582930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:55,115-Speed 6301.73 samples/sec Loss 3.6618 LearningRate 0.0001 Epoch: 28 Global Step: 582940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:45:58,362-Speed 6310.05 samples/sec Loss 3.6542 LearningRate 0.0001 Epoch: 28 Global Step: 582950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:01,605-Speed 6316.06 samples/sec Loss 3.6554 LearningRate 0.0001 Epoch: 28 Global Step: 582960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:04,852-Speed 6309.67 samples/sec Loss 3.6721 LearningRate 0.0001 Epoch: 28 Global Step: 582970 Fp16 Grad Scale: 8192 Required: 22 hours 
Training: 2022-04-02 20:46:08,092-Speed 6320.82 samples/sec Loss 3.7309 LearningRate 0.0001 Epoch: 28 Global Step: 582980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:11,337-Speed 6313.01 samples/sec Loss 3.6907 LearningRate 0.0001 Epoch: 28 Global Step: 582990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:14,581-Speed 6316.07 samples/sec Loss 3.6706 LearningRate 0.0001 Epoch: 28 Global Step: 583000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:17,826-Speed 6312.11 samples/sec Loss 3.6908 LearningRate 0.0001 Epoch: 28 Global Step: 583010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:21,071-Speed 6312.84 samples/sec Loss 3.6392 LearningRate 0.0001 Epoch: 28 Global Step: 583020 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:46:24,326-Speed 6293.65 samples/sec Loss 3.6904 LearningRate 0.0001 Epoch: 28 Global Step: 583030 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:46:27,616-Speed 6225.53 samples/sec Loss 3.6505 LearningRate 0.0001 Epoch: 28 Global Step: 583040 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:46:30,864-Speed 6307.83 samples/sec Loss 3.6786 LearningRate 0.0001 Epoch: 28 Global Step: 583050 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:46:34,110-Speed 6310.90 samples/sec Loss 3.7089 LearningRate 0.0001 Epoch: 28 Global Step: 583060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:46:37,343-Speed 6335.79 samples/sec Loss 3.6608 LearningRate 0.0001 Epoch: 28 Global Step: 583070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:40,588-Speed 6311.88 samples/sec Loss 3.6867 LearningRate 0.0001 Epoch: 28 Global Step: 583080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:43,833-Speed 6314.17 samples/sec Loss 3.6530 LearningRate 0.0001 Epoch: 28 Global Step: 583090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
20:46:47,075-Speed 6318.69 samples/sec Loss 3.6446 LearningRate 0.0001 Epoch: 28 Global Step: 583100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:50,322-Speed 6308.20 samples/sec Loss 3.6666 LearningRate 0.0001 Epoch: 28 Global Step: 583110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:53,566-Speed 6314.44 samples/sec Loss 3.6367 LearningRate 0.0001 Epoch: 28 Global Step: 583120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:46:56,809-Speed 6315.90 samples/sec Loss 3.6709 LearningRate 0.0001 Epoch: 28 Global Step: 583130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:00,056-Speed 6309.58 samples/sec Loss 3.6883 LearningRate 0.0001 Epoch: 28 Global Step: 583140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:03,301-Speed 6314.09 samples/sec Loss 3.6586 LearningRate 0.0001 Epoch: 28 Global Step: 583150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:06,544-Speed 6315.89 samples/sec Loss 3.6691 LearningRate 0.0001 Epoch: 28 Global Step: 583160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:09,785-Speed 6319.50 samples/sec Loss 3.7269 LearningRate 0.0001 Epoch: 28 Global Step: 583170 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:47:13,029-Speed 6315.35 samples/sec Loss 3.6419 LearningRate 0.0001 Epoch: 28 Global Step: 583180 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:47:16,265-Speed 6330.49 samples/sec Loss 3.6680 LearningRate 0.0001 Epoch: 28 Global Step: 583190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:19,513-Speed 6306.62 samples/sec Loss 3.6114 LearningRate 0.0001 Epoch: 28 Global Step: 583200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:22,768-Speed 6293.40 samples/sec Loss 3.6531 LearningRate 0.0001 Epoch: 28 Global Step: 583210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:26,011-Speed 6316.45 samples/sec 
Loss 3.7212 LearningRate 0.0001 Epoch: 28 Global Step: 583220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:29,259-Speed 6308.12 samples/sec Loss 3.6614 LearningRate 0.0001 Epoch: 28 Global Step: 583230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:32,505-Speed 6310.36 samples/sec Loss 3.6559 LearningRate 0.0001 Epoch: 28 Global Step: 583240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:35,750-Speed 6313.56 samples/sec Loss 3.6732 LearningRate 0.0001 Epoch: 28 Global Step: 583250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:38,996-Speed 6311.10 samples/sec Loss 3.6576 LearningRate 0.0001 Epoch: 28 Global Step: 583260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:42,244-Speed 6304.98 samples/sec Loss 3.7153 LearningRate 0.0001 Epoch: 28 Global Step: 583270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:45,490-Speed 6311.47 samples/sec Loss 3.6276 LearningRate 0.0001 Epoch: 28 Global Step: 583280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:47:48,733-Speed 6315.97 samples/sec Loss 3.6563 LearningRate 0.0001 Epoch: 28 Global Step: 583290 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:47:51,981-Speed 6307.67 samples/sec Loss 3.6540 LearningRate 0.0001 Epoch: 28 Global Step: 583300 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:47:55,225-Speed 6314.67 samples/sec Loss 3.6673 LearningRate 0.0001 Epoch: 28 Global Step: 583310 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:47:58,459-Speed 6333.11 samples/sec Loss 3.6466 LearningRate 0.0001 Epoch: 28 Global Step: 583320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:01,707-Speed 6307.89 samples/sec Loss 3.6607 LearningRate 0.0001 Epoch: 28 Global Step: 583330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:04,951-Speed 6314.40 samples/sec Loss 3.6985 LearningRate 0.0001 Epoch: 
28 Global Step: 583340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:08,197-Speed 6311.17 samples/sec Loss 3.7132 LearningRate 0.0001 Epoch: 28 Global Step: 583350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:11,444-Speed 6307.45 samples/sec Loss 3.6536 LearningRate 0.0001 Epoch: 28 Global Step: 583360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:14,689-Speed 6313.70 samples/sec Loss 3.6685 LearningRate 0.0001 Epoch: 28 Global Step: 583370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:17,934-Speed 6313.28 samples/sec Loss 3.6963 LearningRate 0.0001 Epoch: 28 Global Step: 583380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:21,179-Speed 6312.30 samples/sec Loss 3.6988 LearningRate 0.0001 Epoch: 28 Global Step: 583390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:24,422-Speed 6316.24 samples/sec Loss 3.6867 LearningRate 0.0001 Epoch: 28 Global Step: 583400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:27,668-Speed 6310.72 samples/sec Loss 3.6957 LearningRate 0.0001 Epoch: 28 Global Step: 583410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:30,912-Speed 6314.91 samples/sec Loss 3.6262 LearningRate 0.0001 Epoch: 28 Global Step: 583420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:48:34,162-Speed 6303.60 samples/sec Loss 3.6740 LearningRate 0.0001 Epoch: 28 Global Step: 583430 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:48:37,408-Speed 6311.54 samples/sec Loss 3.6728 LearningRate 0.0001 Epoch: 28 Global Step: 583440 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:48:40,642-Speed 6333.85 samples/sec Loss 3.6783 LearningRate 0.0001 Epoch: 28 Global Step: 583450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:43,886-Speed 6313.38 samples/sec Loss 3.6553 LearningRate 0.0001 Epoch: 28 Global Step: 583460 Fp16 Grad Scale: 
8192 Required: 22 hours Training: 2022-04-02 20:48:47,136-Speed 6303.04 samples/sec Loss 3.6012 LearningRate 0.0001 Epoch: 28 Global Step: 583470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:50,384-Speed 6307.98 samples/sec Loss 3.6956 LearningRate 0.0001 Epoch: 28 Global Step: 583480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:53,630-Speed 6309.76 samples/sec Loss 3.6715 LearningRate 0.0001 Epoch: 28 Global Step: 583490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:48:56,876-Speed 6311.85 samples/sec Loss 3.7060 LearningRate 0.0001 Epoch: 28 Global Step: 583500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:00,119-Speed 6316.16 samples/sec Loss 3.7193 LearningRate 0.0001 Epoch: 28 Global Step: 583510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:03,367-Speed 6305.92 samples/sec Loss 3.6745 LearningRate 0.0001 Epoch: 28 Global Step: 583520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:06,612-Speed 6313.41 samples/sec Loss 3.6798 LearningRate 0.0001 Epoch: 28 Global Step: 583530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:09,854-Speed 6317.88 samples/sec Loss 3.7298 LearningRate 0.0001 Epoch: 28 Global Step: 583540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:13,112-Speed 6288.34 samples/sec Loss 3.6928 LearningRate 0.0001 Epoch: 28 Global Step: 583550 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:49:16,344-Speed 6337.39 samples/sec Loss 3.6099 LearningRate 0.0001 Epoch: 28 Global Step: 583560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:19,589-Speed 6312.91 samples/sec Loss 3.6403 LearningRate 0.0001 Epoch: 28 Global Step: 583570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:22,836-Speed 6309.97 samples/sec Loss 3.6982 LearningRate 0.0001 Epoch: 28 Global Step: 583580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 
2022-04-02 20:49:26,080-Speed 6313.35 samples/sec Loss 3.6691 LearningRate 0.0001 Epoch: 28 Global Step: 583590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:29,324-Speed 6315.40 samples/sec Loss 3.6268 LearningRate 0.0001 Epoch: 28 Global Step: 583600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:32,569-Speed 6312.01 samples/sec Loss 3.7299 LearningRate 0.0001 Epoch: 28 Global Step: 583610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:35,812-Speed 6317.68 samples/sec Loss 3.6644 LearningRate 0.0001 Epoch: 28 Global Step: 583620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:39,056-Speed 6313.96 samples/sec Loss 3.6677 LearningRate 0.0001 Epoch: 28 Global Step: 583630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:42,301-Speed 6311.84 samples/sec Loss 3.6449 LearningRate 0.0001 Epoch: 28 Global Step: 583640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:45,545-Speed 6315.00 samples/sec Loss 3.7160 LearningRate 0.0001 Epoch: 28 Global Step: 583650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:48,775-Speed 6342.49 samples/sec Loss 3.6914 LearningRate 0.0001 Epoch: 28 Global Step: 583660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:52,020-Speed 6313.00 samples/sec Loss 3.6655 LearningRate 0.0001 Epoch: 28 Global Step: 583670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:55,263-Speed 6317.62 samples/sec Loss 3.6813 LearningRate 0.0001 Epoch: 28 Global Step: 583680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:49:58,509-Speed 6309.51 samples/sec Loss 3.6951 LearningRate 0.0001 Epoch: 28 Global Step: 583690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:01,756-Speed 6310.22 samples/sec Loss 3.6461 LearningRate 0.0001 Epoch: 28 Global Step: 583700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:04,999-Speed 6316.47 
samples/sec Loss 3.6736 LearningRate 0.0001 Epoch: 28 Global Step: 583710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:08,244-Speed 6312.75 samples/sec Loss 3.7004 LearningRate 0.0001 Epoch: 28 Global Step: 583720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:11,489-Speed 6312.15 samples/sec Loss 3.7010 LearningRate 0.0001 Epoch: 28 Global Step: 583730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:14,731-Speed 6318.79 samples/sec Loss 3.6733 LearningRate 0.0001 Epoch: 28 Global Step: 583740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:17,977-Speed 6310.60 samples/sec Loss 3.7014 LearningRate 0.0001 Epoch: 28 Global Step: 583750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:21,219-Speed 6317.84 samples/sec Loss 3.6921 LearningRate 0.0001 Epoch: 28 Global Step: 583760 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:50:24,449-Speed 6343.22 samples/sec Loss 3.6088 LearningRate 0.0001 Epoch: 28 Global Step: 583770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:27,694-Speed 6311.62 samples/sec Loss 3.7132 LearningRate 0.0001 Epoch: 28 Global Step: 583780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:30,942-Speed 6307.04 samples/sec Loss 3.7019 LearningRate 0.0001 Epoch: 28 Global Step: 583790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:34,190-Speed 6307.10 samples/sec Loss 3.6347 LearningRate 0.0001 Epoch: 28 Global Step: 583800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:37,434-Speed 6315.31 samples/sec Loss 3.6849 LearningRate 0.0001 Epoch: 28 Global Step: 583810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:40,678-Speed 6312.89 samples/sec Loss 3.6513 LearningRate 0.0001 Epoch: 28 Global Step: 583820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:43,922-Speed 6314.99 samples/sec Loss 3.6713 LearningRate 
0.0001 Epoch: 28 Global Step: 583830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:47,168-Speed 6311.11 samples/sec Loss 3.6183 LearningRate 0.0001 Epoch: 28 Global Step: 583840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:50,411-Speed 6317.29 samples/sec Loss 3.5963 LearningRate 0.0001 Epoch: 28 Global Step: 583850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:53,668-Speed 6289.50 samples/sec Loss 3.6755 LearningRate 0.0001 Epoch: 28 Global Step: 583860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:50:56,897-Speed 6343.35 samples/sec Loss 3.7004 LearningRate 0.0001 Epoch: 28 Global Step: 583870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:00,164-Speed 6270.06 samples/sec Loss 3.6935 LearningRate 0.0001 Epoch: 28 Global Step: 583880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:03,408-Speed 6315.65 samples/sec Loss 3.6715 LearningRate 0.0001 Epoch: 28 Global Step: 583890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:06,656-Speed 6306.45 samples/sec Loss 3.6653 LearningRate 0.0001 Epoch: 28 Global Step: 583900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:09,902-Speed 6311.12 samples/sec Loss 3.6563 LearningRate 0.0001 Epoch: 28 Global Step: 583910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:13,147-Speed 6312.73 samples/sec Loss 3.7072 LearningRate 0.0001 Epoch: 28 Global Step: 583920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:16,394-Speed 6308.98 samples/sec Loss 3.6504 LearningRate 0.0001 Epoch: 28 Global Step: 583930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:19,640-Speed 6311.36 samples/sec Loss 3.6614 LearningRate 0.0001 Epoch: 28 Global Step: 583940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:22,886-Speed 6310.18 samples/sec Loss 3.6954 LearningRate 0.0001 Epoch: 28 Global Step: 583950 Fp16 
Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:26,142-Speed 6291.08 samples/sec Loss 3.6778 LearningRate 0.0001 Epoch: 28 Global Step: 583960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:29,374-Speed 6338.87 samples/sec Loss 3.6330 LearningRate 0.0001 Epoch: 28 Global Step: 583970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:32,623-Speed 6305.64 samples/sec Loss 3.6329 LearningRate 0.0001 Epoch: 28 Global Step: 583980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:35,865-Speed 6317.06 samples/sec Loss 3.7053 LearningRate 0.0001 Epoch: 28 Global Step: 583990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:39,112-Speed 6309.95 samples/sec Loss 3.7241 LearningRate 0.0001 Epoch: 28 Global Step: 584000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:42,357-Speed 6311.23 samples/sec Loss 3.6370 LearningRate 0.0001 Epoch: 28 Global Step: 584010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:45,601-Speed 6315.94 samples/sec Loss 3.7249 LearningRate 0.0001 Epoch: 28 Global Step: 584020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:48,845-Speed 6313.38 samples/sec Loss 3.6290 LearningRate 0.0001 Epoch: 28 Global Step: 584030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:52,096-Speed 6300.97 samples/sec Loss 3.7078 LearningRate 0.0001 Epoch: 28 Global Step: 584040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:55,341-Speed 6312.79 samples/sec Loss 3.6073 LearningRate 0.0001 Epoch: 28 Global Step: 584050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:51:58,586-Speed 6312.01 samples/sec Loss 3.6920 LearningRate 0.0001 Epoch: 28 Global Step: 584060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:01,835-Speed 6305.38 samples/sec Loss 3.6754 LearningRate 0.0001 Epoch: 28 Global Step: 584070 Fp16 Grad Scale: 16384 Required: 22 hours 
Training: 2022-04-02 20:52:05,081-Speed 6312.02 samples/sec Loss 3.6529 LearningRate 0.0001 Epoch: 28 Global Step: 584080 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:52:08,325-Speed 6314.63 samples/sec Loss 3.6579 LearningRate 0.0001 Epoch: 28 Global Step: 584090 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:52:11,569-Speed 6314.10 samples/sec Loss 3.6324 LearningRate 0.0001 Epoch: 28 Global Step: 584100 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:52:14,800-Speed 6339.80 samples/sec Loss 3.6526 LearningRate 0.0001 Epoch: 28 Global Step: 584110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:18,045-Speed 6313.88 samples/sec Loss 3.6649 LearningRate 0.0001 Epoch: 28 Global Step: 584120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:21,300-Speed 6292.63 samples/sec Loss 3.6791 LearningRate 0.0001 Epoch: 28 Global Step: 584130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:24,542-Speed 6318.15 samples/sec Loss 3.7199 LearningRate 0.0001 Epoch: 28 Global Step: 584140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:27,787-Speed 6313.46 samples/sec Loss 3.6924 LearningRate 0.0001 Epoch: 28 Global Step: 584150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:31,032-Speed 6313.42 samples/sec Loss 3.6143 LearningRate 0.0001 Epoch: 28 Global Step: 584160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:34,276-Speed 6313.02 samples/sec Loss 3.6910 LearningRate 0.0001 Epoch: 28 Global Step: 584170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:37,526-Speed 6305.22 samples/sec Loss 3.6871 LearningRate 0.0001 Epoch: 28 Global Step: 584180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:40,773-Speed 6307.66 samples/sec Loss 3.6438 LearningRate 0.0001 Epoch: 28 Global Step: 584190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:44,018-Speed 
6312.43 samples/sec Loss 3.6872 LearningRate 0.0001 Epoch: 28 Global Step: 584200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:47,274-Speed 6290.86 samples/sec Loss 3.6895 LearningRate 0.0001 Epoch: 28 Global Step: 584210 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:52:50,521-Speed 6309.51 samples/sec Loss 3.7051 LearningRate 0.0001 Epoch: 28 Global Step: 584220 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:52:53,752-Speed 6339.36 samples/sec Loss 3.6502 LearningRate 0.0001 Epoch: 28 Global Step: 584230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:52:56,999-Speed 6308.45 samples/sec Loss 3.6348 LearningRate 0.0001 Epoch: 28 Global Step: 584240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:00,245-Speed 6312.41 samples/sec Loss 3.6805 LearningRate 0.0001 Epoch: 28 Global Step: 584250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:03,493-Speed 6305.83 samples/sec Loss 3.6360 LearningRate 0.0001 Epoch: 28 Global Step: 584260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:06,739-Speed 6310.56 samples/sec Loss 3.6719 LearningRate 0.0001 Epoch: 28 Global Step: 584270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:10,001-Speed 6281.35 samples/sec Loss 3.6691 LearningRate 0.0001 Epoch: 28 Global Step: 584280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:13,240-Speed 6323.87 samples/sec Loss 3.6746 LearningRate 0.0001 Epoch: 28 Global Step: 584290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:16,487-Speed 6308.72 samples/sec Loss 3.7372 LearningRate 0.0001 Epoch: 28 Global Step: 584300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:19,732-Speed 6312.88 samples/sec Loss 3.6543 LearningRate 0.0001 Epoch: 28 Global Step: 584310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:22,979-Speed 6308.60 samples/sec Loss 3.6642 
LearningRate 0.0001 Epoch: 28 Global Step: 584320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:26,227-Speed 6306.52 samples/sec Loss 3.6249 LearningRate 0.0001 Epoch: 28 Global Step: 584330 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:53:29,461-Speed 6334.37 samples/sec Loss 3.6949 LearningRate 0.0001 Epoch: 28 Global Step: 584340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:32,705-Speed 6314.95 samples/sec Loss 3.6218 LearningRate 0.0001 Epoch: 28 Global Step: 584350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:35,951-Speed 6311.01 samples/sec Loss 3.6405 LearningRate 0.0001 Epoch: 28 Global Step: 584360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:39,199-Speed 6307.54 samples/sec Loss 3.6294 LearningRate 0.0001 Epoch: 28 Global Step: 584370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:42,445-Speed 6309.80 samples/sec Loss 3.6230 LearningRate 0.0001 Epoch: 28 Global Step: 584380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:45,695-Speed 6303.05 samples/sec Loss 3.6281 LearningRate 0.0001 Epoch: 28 Global Step: 584390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:48,940-Speed 6314.56 samples/sec Loss 3.6670 LearningRate 0.0001 Epoch: 28 Global Step: 584400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:52,188-Speed 6305.69 samples/sec Loss 3.6211 LearningRate 0.0001 Epoch: 28 Global Step: 584410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:55,434-Speed 6309.88 samples/sec Loss 3.6267 LearningRate 0.0001 Epoch: 28 Global Step: 584420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:53:58,681-Speed 6309.51 samples/sec Loss 3.7222 LearningRate 0.0001 Epoch: 28 Global Step: 584430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:01,914-Speed 6336.51 samples/sec Loss 3.7084 LearningRate 0.0001 Epoch: 28 Global Step: 
584440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:05,156-Speed 6317.42 samples/sec Loss 3.6596 LearningRate 0.0001 Epoch: 28 Global Step: 584450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:08,403-Speed 6310.89 samples/sec Loss 3.6566 LearningRate 0.0001 Epoch: 28 Global Step: 584460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:11,646-Speed 6315.44 samples/sec Loss 3.7306 LearningRate 0.0001 Epoch: 28 Global Step: 584470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:14,892-Speed 6310.44 samples/sec Loss 3.6286 LearningRate 0.0001 Epoch: 28 Global Step: 584480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:18,135-Speed 6315.81 samples/sec Loss 3.6658 LearningRate 0.0001 Epoch: 28 Global Step: 584490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:21,379-Speed 6315.02 samples/sec Loss 3.6032 LearningRate 0.0001 Epoch: 28 Global Step: 584500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:24,627-Speed 6306.82 samples/sec Loss 3.6762 LearningRate 0.0001 Epoch: 28 Global Step: 584510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:27,872-Speed 6313.91 samples/sec Loss 3.6422 LearningRate 0.0001 Epoch: 28 Global Step: 584520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:31,119-Speed 6308.16 samples/sec Loss 3.6819 LearningRate 0.0001 Epoch: 28 Global Step: 584530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:34,367-Speed 6306.82 samples/sec Loss 3.7541 LearningRate 0.0001 Epoch: 28 Global Step: 584540 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:54:37,613-Speed 6311.92 samples/sec Loss 3.6713 LearningRate 0.0001 Epoch: 28 Global Step: 584550 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:54:40,847-Speed 6333.69 samples/sec Loss 3.6775 LearningRate 0.0001 Epoch: 28 Global Step: 584560 Fp16 Grad Scale: 8192 Required: 22 
hours Training: 2022-04-02 20:54:44,091-Speed 6313.99 samples/sec Loss 3.6548 LearningRate 0.0001 Epoch: 28 Global Step: 584570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:47,342-Speed 6302.36 samples/sec Loss 3.6930 LearningRate 0.0001 Epoch: 28 Global Step: 584580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:50,584-Speed 6318.77 samples/sec Loss 3.6668 LearningRate 0.0001 Epoch: 28 Global Step: 584590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:53,837-Speed 6297.52 samples/sec Loss 3.6650 LearningRate 0.0001 Epoch: 28 Global Step: 584600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:54:57,082-Speed 6312.74 samples/sec Loss 3.5980 LearningRate 0.0001 Epoch: 28 Global Step: 584610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:00,325-Speed 6317.04 samples/sec Loss 3.6341 LearningRate 0.0001 Epoch: 28 Global Step: 584620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:03,577-Speed 6297.78 samples/sec Loss 3.6922 LearningRate 0.0001 Epoch: 28 Global Step: 584630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:06,823-Speed 6311.83 samples/sec Loss 3.7395 LearningRate 0.0001 Epoch: 28 Global Step: 584640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:10,070-Speed 6308.55 samples/sec Loss 3.6728 LearningRate 0.0001 Epoch: 28 Global Step: 584650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:13,317-Speed 6307.97 samples/sec Loss 3.7014 LearningRate 0.0001 Epoch: 28 Global Step: 584660 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:55:16,563-Speed 6311.73 samples/sec Loss 3.6419 LearningRate 0.0001 Epoch: 28 Global Step: 584670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:55:19,812-Speed 6305.58 samples/sec Loss 3.6272 LearningRate 0.0001 Epoch: 28 Global Step: 584680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 
20:55:23,042-Speed 6340.48 samples/sec Loss 3.6489 LearningRate 0.0001 Epoch: 28 Global Step: 584690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:26,347-Speed 6198.36 samples/sec Loss 3.6337 LearningRate 0.0001 Epoch: 28 Global Step: 584700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:29,632-Speed 6237.29 samples/sec Loss 3.6703 LearningRate 0.0001 Epoch: 28 Global Step: 584710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:32,877-Speed 6311.86 samples/sec Loss 3.6751 LearningRate 0.0001 Epoch: 28 Global Step: 584720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:36,131-Speed 6296.28 samples/sec Loss 3.6297 LearningRate 0.0001 Epoch: 28 Global Step: 584730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:39,372-Speed 6318.78 samples/sec Loss 3.7168 LearningRate 0.0001 Epoch: 28 Global Step: 584740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:42,617-Speed 6312.69 samples/sec Loss 3.6979 LearningRate 0.0001 Epoch: 28 Global Step: 584750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:45,860-Speed 6316.38 samples/sec Loss 3.7027 LearningRate 0.0001 Epoch: 28 Global Step: 584760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:49,110-Speed 6303.09 samples/sec Loss 3.6682 LearningRate 0.0001 Epoch: 28 Global Step: 584770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:52,357-Speed 6308.66 samples/sec Loss 3.6464 LearningRate 0.0001 Epoch: 28 Global Step: 584780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:55:55,600-Speed 6317.16 samples/sec Loss 3.5949 LearningRate 0.0001 Epoch: 28 Global Step: 584790 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:55:58,847-Speed 6309.08 samples/sec Loss 3.6469 LearningRate 0.0001 Epoch: 28 Global Step: 584800 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:56:02,091-Speed 6314.49 samples/sec 
Loss 3.7067 LearningRate 0.0001 Epoch: 28 Global Step: 584810 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:56:05,321-Speed 6343.76 samples/sec Loss 3.5997 LearningRate 0.0001 Epoch: 28 Global Step: 584820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:08,568-Speed 6308.23 samples/sec Loss 3.6741 LearningRate 0.0001 Epoch: 28 Global Step: 584830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:11,811-Speed 6316.84 samples/sec Loss 3.6430 LearningRate 0.0001 Epoch: 28 Global Step: 584840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:15,055-Speed 6314.55 samples/sec Loss 3.6620 LearningRate 0.0001 Epoch: 28 Global Step: 584850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:18,301-Speed 6311.58 samples/sec Loss 3.6989 LearningRate 0.0001 Epoch: 28 Global Step: 584860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:21,547-Speed 6309.55 samples/sec Loss 3.6858 LearningRate 0.0001 Epoch: 28 Global Step: 584870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:24,792-Speed 6317.33 samples/sec Loss 3.6248 LearningRate 0.0001 Epoch: 28 Global Step: 584880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:28,035-Speed 6315.62 samples/sec Loss 3.6693 LearningRate 0.0001 Epoch: 28 Global Step: 584890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:31,281-Speed 6310.22 samples/sec Loss 3.7041 LearningRate 0.0001 Epoch: 28 Global Step: 584900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:34,525-Speed 6314.23 samples/sec Loss 3.6951 LearningRate 0.0001 Epoch: 28 Global Step: 584910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:37,779-Speed 6297.30 samples/sec Loss 3.6794 LearningRate 0.0001 Epoch: 28 Global Step: 584920 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:56:41,006-Speed 6346.89 samples/sec Loss 3.6901 LearningRate 0.0001 Epoch: 28 
Global Step: 584930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:44,251-Speed 6313.39 samples/sec Loss 3.7029 LearningRate 0.0001 Epoch: 28 Global Step: 584940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:47,498-Speed 6308.21 samples/sec Loss 3.6991 LearningRate 0.0001 Epoch: 28 Global Step: 584950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:50,741-Speed 6316.77 samples/sec Loss 3.6190 LearningRate 0.0001 Epoch: 28 Global Step: 584960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:53,984-Speed 6315.25 samples/sec Loss 3.7061 LearningRate 0.0001 Epoch: 28 Global Step: 584970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:56:57,230-Speed 6311.81 samples/sec Loss 3.6575 LearningRate 0.0001 Epoch: 28 Global Step: 584980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:00,480-Speed 6302.22 samples/sec Loss 3.7021 LearningRate 0.0001 Epoch: 28 Global Step: 584990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:03,725-Speed 6312.72 samples/sec Loss 3.6480 LearningRate 0.0001 Epoch: 28 Global Step: 585000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:06,972-Speed 6308.71 samples/sec Loss 3.6661 LearningRate 0.0001 Epoch: 28 Global Step: 585010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:10,221-Speed 6306.43 samples/sec Loss 3.6603 LearningRate 0.0001 Epoch: 28 Global Step: 585020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:13,447-Speed 6349.86 samples/sec Loss 3.6594 LearningRate 0.0001 Epoch: 28 Global Step: 585030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:16,691-Speed 6314.16 samples/sec Loss 3.6798 LearningRate 0.0001 Epoch: 28 Global Step: 585040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:19,934-Speed 6317.43 samples/sec Loss 3.7059 LearningRate 0.0001 Epoch: 28 Global Step: 585050 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 20:57:23,177-Speed 6316.94 samples/sec Loss 3.6206 LearningRate 0.0001 Epoch: 28 Global Step: 585060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:26,421-Speed 6314.41 samples/sec Loss 3.6425 LearningRate 0.0001 Epoch: 28 Global Step: 585070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:29,667-Speed 6310.61 samples/sec Loss 3.6485 LearningRate 0.0001 Epoch: 28 Global Step: 585080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:32,915-Speed 6306.17 samples/sec Loss 3.6248 LearningRate 0.0001 Epoch: 28 Global Step: 585090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:36,163-Speed 6308.19 samples/sec Loss 3.6227 LearningRate 0.0001 Epoch: 28 Global Step: 585100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:57:39,393-Speed 6342.02 samples/sec Loss 3.6726 LearningRate 0.0001 Epoch: 28 Global Step: 585110 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:42,637-Speed 6314.53 samples/sec Loss 3.7395 LearningRate 0.0001 Epoch: 28 Global Step: 585120 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:45,882-Speed 6313.06 samples/sec Loss 3.6739 LearningRate 0.0001 Epoch: 28 Global Step: 585130 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:49,130-Speed 6306.81 samples/sec Loss 3.6463 LearningRate 0.0001 Epoch: 28 Global Step: 585140 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:52,374-Speed 6312.91 samples/sec Loss 3.6255 LearningRate 0.0001 Epoch: 28 Global Step: 585150 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:55,619-Speed 6314.20 samples/sec Loss 3.6641 LearningRate 0.0001 Epoch: 28 Global Step: 585160 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:57:58,863-Speed 6314.52 samples/sec Loss 3.6349 LearningRate 0.0001 Epoch: 28 Global Step: 585170 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 
20:58:02,108-Speed 6312.70 samples/sec Loss 3.6949 LearningRate 0.0001 Epoch: 28 Global Step: 585180 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:58:05,352-Speed 6313.27 samples/sec Loss 3.6535 LearningRate 0.0001 Epoch: 28 Global Step: 585190 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:58:08,593-Speed 6320.87 samples/sec Loss 3.6668 LearningRate 0.0001 Epoch: 28 Global Step: 585200 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-02 20:58:11,839-Speed 6311.08 samples/sec Loss 3.6375 LearningRate 0.0001 Epoch: 28 Global Step: 585210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:15,082-Speed 6316.19 samples/sec Loss 3.6980 LearningRate 0.0001 Epoch: 28 Global Step: 585220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:18,329-Speed 6309.44 samples/sec Loss 3.6235 LearningRate 0.0001 Epoch: 28 Global Step: 585230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:21,578-Speed 6305.34 samples/sec Loss 3.6604 LearningRate 0.0001 Epoch: 28 Global Step: 585240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:24,825-Speed 6309.66 samples/sec Loss 3.6477 LearningRate 0.0001 Epoch: 28 Global Step: 585250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:28,072-Speed 6308.69 samples/sec Loss 3.6648 LearningRate 0.0001 Epoch: 28 Global Step: 585260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:31,317-Speed 6312.90 samples/sec Loss 3.6445 LearningRate 0.0001 Epoch: 28 Global Step: 585270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:34,560-Speed 6315.22 samples/sec Loss 3.7000 LearningRate 0.0001 Epoch: 28 Global Step: 585280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:37,801-Speed 6321.48 samples/sec Loss 3.5822 LearningRate 0.0001 Epoch: 28 Global Step: 585290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:41,048-Speed 6308.19 samples/sec Loss 
3.6629 LearningRate 0.0001 Epoch: 28 Global Step: 585300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:44,279-Speed 6341.44 samples/sec Loss 3.6821 LearningRate 0.0001 Epoch: 28 Global Step: 585310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:47,524-Speed 6312.62 samples/sec Loss 3.6930 LearningRate 0.0001 Epoch: 28 Global Step: 585320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:50,767-Speed 6315.62 samples/sec Loss 3.6540 LearningRate 0.0001 Epoch: 28 Global Step: 585330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:54,013-Speed 6311.05 samples/sec Loss 3.6899 LearningRate 0.0001 Epoch: 28 Global Step: 585340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:58:57,259-Speed 6311.36 samples/sec Loss 3.5957 LearningRate 0.0001 Epoch: 28 Global Step: 585350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:00,504-Speed 6312.28 samples/sec Loss 3.6402 LearningRate 0.0001 Epoch: 28 Global Step: 585360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:03,749-Speed 6312.61 samples/sec Loss 3.6631 LearningRate 0.0001 Epoch: 28 Global Step: 585370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:06,992-Speed 6316.05 samples/sec Loss 3.6841 LearningRate 0.0001 Epoch: 28 Global Step: 585380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:10,240-Speed 6307.71 samples/sec Loss 3.5962 LearningRate 0.0001 Epoch: 28 Global Step: 585390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:13,485-Speed 6312.23 samples/sec Loss 3.7297 LearningRate 0.0001 Epoch: 28 Global Step: 585400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:16,728-Speed 6316.94 samples/sec Loss 3.6588 LearningRate 0.0001 Epoch: 28 Global Step: 585410 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:59:19,976-Speed 6306.25 samples/sec Loss 3.6467 LearningRate 0.0001 Epoch: 28 
Global Step: 585420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:59:23,223-Speed 6308.17 samples/sec Loss 3.6662 LearningRate 0.0001 Epoch: 28 Global Step: 585430 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 20:59:26,456-Speed 6337.48 samples/sec Loss 3.6458 LearningRate 0.0001 Epoch: 28 Global Step: 585440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:29,703-Speed 6307.93 samples/sec Loss 3.6392 LearningRate 0.0001 Epoch: 28 Global Step: 585450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:32,950-Speed 6308.84 samples/sec Loss 3.6463 LearningRate 0.0001 Epoch: 28 Global Step: 585460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:36,198-Speed 6308.11 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 28 Global Step: 585470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:39,443-Speed 6313.01 samples/sec Loss 3.6942 LearningRate 0.0001 Epoch: 28 Global Step: 585480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:42,687-Speed 6313.88 samples/sec Loss 3.6452 LearningRate 0.0001 Epoch: 28 Global Step: 585490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:45,930-Speed 6317.73 samples/sec Loss 3.6768 LearningRate 0.0001 Epoch: 28 Global Step: 585500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:49,176-Speed 6308.75 samples/sec Loss 3.6906 LearningRate 0.0001 Epoch: 28 Global Step: 585510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:52,422-Speed 6311.43 samples/sec Loss 3.6543 LearningRate 0.0001 Epoch: 28 Global Step: 585520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:55,667-Speed 6313.45 samples/sec Loss 3.6378 LearningRate 0.0001 Epoch: 28 Global Step: 585530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 20:59:58,898-Speed 6339.08 samples/sec Loss 3.6628 LearningRate 0.0001 Epoch: 28 Global Step: 585540 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:00:02,147-Speed 6306.05 samples/sec Loss 3.6456 LearningRate 0.0001 Epoch: 28 Global Step: 585550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:05,393-Speed 6311.42 samples/sec Loss 3.7342 LearningRate 0.0001 Epoch: 28 Global Step: 585560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:08,640-Speed 6308.74 samples/sec Loss 3.6255 LearningRate 0.0001 Epoch: 28 Global Step: 585570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:11,882-Speed 6317.34 samples/sec Loss 3.6614 LearningRate 0.0001 Epoch: 28 Global Step: 585580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:15,126-Speed 6315.44 samples/sec Loss 3.7507 LearningRate 0.0001 Epoch: 28 Global Step: 585590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:18,373-Speed 6309.37 samples/sec Loss 3.5849 LearningRate 0.0001 Epoch: 28 Global Step: 585600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:21,615-Speed 6318.42 samples/sec Loss 3.6332 LearningRate 0.0001 Epoch: 28 Global Step: 585610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:24,860-Speed 6311.77 samples/sec Loss 3.6933 LearningRate 0.0001 Epoch: 28 Global Step: 585620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:28,109-Speed 6304.94 samples/sec Loss 3.6393 LearningRate 0.0001 Epoch: 28 Global Step: 585630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:31,358-Speed 6306.19 samples/sec Loss 3.6747 LearningRate 0.0001 Epoch: 28 Global Step: 585640 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:00:34,649-Speed 6224.49 samples/sec Loss 3.6192 LearningRate 0.0001 Epoch: 28 Global Step: 585650 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:00:37,884-Speed 6330.74 samples/sec Loss 3.6061 LearningRate 0.0001 Epoch: 28 Global Step: 585660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:00:41,131-Speed 6309.72 samples/sec Loss 3.7392 LearningRate 0.0001 Epoch: 28 Global Step: 585670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:44,377-Speed 6310.57 samples/sec Loss 3.6960 LearningRate 0.0001 Epoch: 28 Global Step: 585680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:47,625-Speed 6308.32 samples/sec Loss 3.6969 LearningRate 0.0001 Epoch: 28 Global Step: 585690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:50,871-Speed 6309.62 samples/sec Loss 3.6459 LearningRate 0.0001 Epoch: 28 Global Step: 585700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:54,120-Speed 6304.93 samples/sec Loss 3.6384 LearningRate 0.0001 Epoch: 28 Global Step: 585710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:00:57,368-Speed 6307.88 samples/sec Loss 3.6750 LearningRate 0.0001 Epoch: 28 Global Step: 585720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:00,615-Speed 6308.96 samples/sec Loss 3.6235 LearningRate 0.0001 Epoch: 28 Global Step: 585730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:03,860-Speed 6312.88 samples/sec Loss 3.6195 LearningRate 0.0001 Epoch: 28 Global Step: 585740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:07,102-Speed 6317.88 samples/sec Loss 3.6330 LearningRate 0.0001 Epoch: 28 Global Step: 585750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:10,347-Speed 6313.29 samples/sec Loss 3.6334 LearningRate 0.0001 Epoch: 28 Global Step: 585760 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:01:13,576-Speed 6343.40 samples/sec Loss 3.7129 LearningRate 0.0001 Epoch: 28 Global Step: 585770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:16,826-Speed 6301.79 samples/sec Loss 3.6543 LearningRate 0.0001 Epoch: 28 Global Step: 585780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:20,075-Speed 6305.32 samples/sec 
Loss 3.6299 LearningRate 0.0001 Epoch: 28 Global Step: 585790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:23,321-Speed 6310.82 samples/sec Loss 3.6326 LearningRate 0.0001 Epoch: 28 Global Step: 585800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:26,565-Speed 6314.61 samples/sec Loss 3.6498 LearningRate 0.0001 Epoch: 28 Global Step: 585810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:29,814-Speed 6305.84 samples/sec Loss 3.6442 LearningRate 0.0001 Epoch: 28 Global Step: 585820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:33,060-Speed 6309.65 samples/sec Loss 3.6282 LearningRate 0.0001 Epoch: 28 Global Step: 585830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:36,309-Speed 6305.64 samples/sec Loss 3.6551 LearningRate 0.0001 Epoch: 28 Global Step: 585840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:39,555-Speed 6309.86 samples/sec Loss 3.6976 LearningRate 0.0001 Epoch: 28 Global Step: 585850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:42,801-Speed 6311.82 samples/sec Loss 3.6997 LearningRate 0.0001 Epoch: 28 Global Step: 585860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:46,048-Speed 6308.03 samples/sec Loss 3.6735 LearningRate 0.0001 Epoch: 28 Global Step: 585870 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:01:49,278-Speed 6342.15 samples/sec Loss 3.6940 LearningRate 0.0001 Epoch: 28 Global Step: 585880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:52,524-Speed 6311.50 samples/sec Loss 3.6922 LearningRate 0.0001 Epoch: 28 Global Step: 585890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:55,784-Speed 6284.61 samples/sec Loss 3.6725 LearningRate 0.0001 Epoch: 28 Global Step: 585900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:01:59,029-Speed 6312.40 samples/sec Loss 3.6516 LearningRate 0.0001 Epoch: 28 
Global Step: 585910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:02,275-Speed 6311.33 samples/sec Loss 3.6705 LearningRate 0.0001 Epoch: 28 Global Step: 585920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:05,522-Speed 6307.67 samples/sec Loss 3.7349 LearningRate 0.0001 Epoch: 28 Global Step: 585930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:08,768-Speed 6311.81 samples/sec Loss 3.6091 LearningRate 0.0001 Epoch: 28 Global Step: 585940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:12,017-Speed 6304.38 samples/sec Loss 3.6490 LearningRate 0.0001 Epoch: 28 Global Step: 585950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:15,261-Speed 6314.92 samples/sec Loss 3.6232 LearningRate 0.0001 Epoch: 28 Global Step: 585960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:18,508-Speed 6308.97 samples/sec Loss 3.6577 LearningRate 0.0001 Epoch: 28 Global Step: 585970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:21,752-Speed 6314.77 samples/sec Loss 3.6702 LearningRate 0.0001 Epoch: 28 Global Step: 585980 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:02:25,001-Speed 6304.94 samples/sec Loss 3.6504 LearningRate 0.0001 Epoch: 28 Global Step: 585990 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:02:28,235-Speed 6333.87 samples/sec Loss 3.6299 LearningRate 0.0001 Epoch: 28 Global Step: 586000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:31,479-Speed 6315.35 samples/sec Loss 3.6504 LearningRate 0.0001 Epoch: 28 Global Step: 586010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:34,732-Speed 6297.00 samples/sec Loss 3.6568 LearningRate 0.0001 Epoch: 28 Global Step: 586020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:37,981-Speed 6303.87 samples/sec Loss 3.6771 LearningRate 0.0001 Epoch: 28 Global Step: 586030 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:02:41,224-Speed 6316.92 samples/sec Loss 3.6892 LearningRate 0.0001 Epoch: 28 Global Step: 586040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:44,469-Speed 6313.84 samples/sec Loss 3.6050 LearningRate 0.0001 Epoch: 28 Global Step: 586050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:47,714-Speed 6311.11 samples/sec Loss 3.5743 LearningRate 0.0001 Epoch: 28 Global Step: 586060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:50,981-Speed 6270.42 samples/sec Loss 3.6955 LearningRate 0.0001 Epoch: 28 Global Step: 586070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:54,226-Speed 6312.81 samples/sec Loss 3.6273 LearningRate 0.0001 Epoch: 28 Global Step: 586080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:02:57,472-Speed 6310.99 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 28 Global Step: 586090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:00,715-Speed 6316.75 samples/sec Loss 3.6581 LearningRate 0.0001 Epoch: 28 Global Step: 586100 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:03:03,945-Speed 6342.81 samples/sec Loss 3.6261 LearningRate 0.0001 Epoch: 28 Global Step: 586110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:07,190-Speed 6312.82 samples/sec Loss 3.6795 LearningRate 0.0001 Epoch: 28 Global Step: 586120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:10,438-Speed 6306.87 samples/sec Loss 3.6771 LearningRate 0.0001 Epoch: 28 Global Step: 586130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:13,680-Speed 6317.89 samples/sec Loss 3.6812 LearningRate 0.0001 Epoch: 28 Global Step: 586140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:16,929-Speed 6305.47 samples/sec Loss 3.6410 LearningRate 0.0001 Epoch: 28 Global Step: 586150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:03:20,173-Speed 6314.09 samples/sec Loss 3.6658 LearningRate 0.0001 Epoch: 28 Global Step: 586160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:23,420-Speed 6309.51 samples/sec Loss 3.6641 LearningRate 0.0001 Epoch: 28 Global Step: 586170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:26,665-Speed 6311.65 samples/sec Loss 3.6592 LearningRate 0.0001 Epoch: 28 Global Step: 586180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:29,910-Speed 6314.15 samples/sec Loss 3.7930 LearningRate 0.0001 Epoch: 28 Global Step: 586190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:33,151-Speed 6320.24 samples/sec Loss 3.6404 LearningRate 0.0001 Epoch: 28 Global Step: 586200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:36,382-Speed 6340.35 samples/sec Loss 3.6616 LearningRate 0.0001 Epoch: 28 Global Step: 586210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:39,628-Speed 6309.60 samples/sec Loss 3.6600 LearningRate 0.0001 Epoch: 28 Global Step: 586220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:42,872-Speed 6315.01 samples/sec Loss 3.6502 LearningRate 0.0001 Epoch: 28 Global Step: 586230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:46,118-Speed 6311.44 samples/sec Loss 3.6281 LearningRate 0.0001 Epoch: 28 Global Step: 586240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:49,363-Speed 6311.44 samples/sec Loss 3.6516 LearningRate 0.0001 Epoch: 28 Global Step: 586250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:52,606-Speed 6317.03 samples/sec Loss 3.6880 LearningRate 0.0001 Epoch: 28 Global Step: 586260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:55,852-Speed 6310.76 samples/sec Loss 3.6329 LearningRate 0.0001 Epoch: 28 Global Step: 586270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:03:59,094-Speed 6319.78 samples/sec Loss 
3.6218 LearningRate 0.0001 Epoch: 28 Global Step: 586280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:02,339-Speed 6311.64 samples/sec Loss 3.6842 LearningRate 0.0001 Epoch: 28 Global Step: 586290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:05,583-Speed 6314.95 samples/sec Loss 3.6610 LearningRate 0.0001 Epoch: 28 Global Step: 586300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:08,830-Speed 6308.83 samples/sec Loss 3.6810 LearningRate 0.0001 Epoch: 28 Global Step: 586310 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:04:12,061-Speed 6339.37 samples/sec Loss 3.6460 LearningRate 0.0001 Epoch: 28 Global Step: 586320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:15,311-Speed 6303.89 samples/sec Loss 3.6872 LearningRate 0.0001 Epoch: 28 Global Step: 586330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:18,557-Speed 6311.69 samples/sec Loss 3.7323 LearningRate 0.0001 Epoch: 28 Global Step: 586340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:21,805-Speed 6305.58 samples/sec Loss 3.6551 LearningRate 0.0001 Epoch: 28 Global Step: 586350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:25,053-Speed 6307.37 samples/sec Loss 3.6350 LearningRate 0.0001 Epoch: 28 Global Step: 586360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:28,299-Speed 6311.56 samples/sec Loss 3.7070 LearningRate 0.0001 Epoch: 28 Global Step: 586370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:31,543-Speed 6313.56 samples/sec Loss 3.6058 LearningRate 0.0001 Epoch: 28 Global Step: 586380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:34,789-Speed 6311.41 samples/sec Loss 3.6983 LearningRate 0.0001 Epoch: 28 Global Step: 586390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:38,037-Speed 6306.47 samples/sec Loss 3.6225 LearningRate 0.0001 Epoch: 28 
Global Step: 586400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:41,283-Speed 6311.67 samples/sec Loss 3.6947 LearningRate 0.0001 Epoch: 28 Global Step: 586410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:44,515-Speed 6337.06 samples/sec Loss 3.6010 LearningRate 0.0001 Epoch: 28 Global Step: 586420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:47,761-Speed 6311.30 samples/sec Loss 3.6534 LearningRate 0.0001 Epoch: 28 Global Step: 586430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:51,008-Speed 6309.45 samples/sec Loss 3.6450 LearningRate 0.0001 Epoch: 28 Global Step: 586440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:54,256-Speed 6305.89 samples/sec Loss 3.6182 LearningRate 0.0001 Epoch: 28 Global Step: 586450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:04:57,502-Speed 6311.00 samples/sec Loss 3.6665 LearningRate 0.0001 Epoch: 28 Global Step: 586460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:00,754-Speed 6299.55 samples/sec Loss 3.6270 LearningRate 0.0001 Epoch: 28 Global Step: 586470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:03,998-Speed 6313.36 samples/sec Loss 3.5527 LearningRate 0.0001 Epoch: 28 Global Step: 586480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:07,242-Speed 6316.70 samples/sec Loss 3.6252 LearningRate 0.0001 Epoch: 28 Global Step: 586490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:10,484-Speed 6317.50 samples/sec Loss 3.6661 LearningRate 0.0001 Epoch: 28 Global Step: 586500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:13,732-Speed 6306.85 samples/sec Loss 3.7534 LearningRate 0.0001 Epoch: 28 Global Step: 586510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:16,978-Speed 6309.68 samples/sec Loss 3.6844 LearningRate 0.0001 Epoch: 28 Global Step: 586520 Fp16 Grad Scale: 16384 
Required: 22 hours Training: 2022-04-02 21:05:20,205-Speed 6349.46 samples/sec Loss 3.7228 LearningRate 0.0001 Epoch: 28 Global Step: 586530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:23,454-Speed 6304.37 samples/sec Loss 3.6440 LearningRate 0.0001 Epoch: 28 Global Step: 586540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:26,700-Speed 6310.59 samples/sec Loss 3.6359 LearningRate 0.0001 Epoch: 28 Global Step: 586550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:29,946-Speed 6310.52 samples/sec Loss 3.6444 LearningRate 0.0001 Epoch: 28 Global Step: 586560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:33,188-Speed 6319.26 samples/sec Loss 3.5878 LearningRate 0.0001 Epoch: 28 Global Step: 586570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:36,434-Speed 6311.95 samples/sec Loss 3.6595 LearningRate 0.0001 Epoch: 28 Global Step: 586580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:39,678-Speed 6314.86 samples/sec Loss 3.6455 LearningRate 0.0001 Epoch: 28 Global Step: 586590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:42,920-Speed 6318.49 samples/sec Loss 3.6492 LearningRate 0.0001 Epoch: 28 Global Step: 586600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:46,164-Speed 6314.02 samples/sec Loss 3.6532 LearningRate 0.0001 Epoch: 28 Global Step: 586610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:49,415-Speed 6301.05 samples/sec Loss 3.6860 LearningRate 0.0001 Epoch: 28 Global Step: 586620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:05:52,661-Speed 6311.49 samples/sec Loss 3.6263 LearningRate 0.0001 Epoch: 28 Global Step: 586630 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:05:55,906-Speed 6311.25 samples/sec Loss 3.6774 LearningRate 0.0001 Epoch: 28 Global Step: 586640 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 
21:05:59,142-Speed 6330.19 samples/sec Loss 3.6765 LearningRate 0.0001 Epoch: 28 Global Step: 586650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:02,391-Speed 6305.88 samples/sec Loss 3.6356 LearningRate 0.0001 Epoch: 28 Global Step: 586660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:05,638-Speed 6308.98 samples/sec Loss 3.6655 LearningRate 0.0001 Epoch: 28 Global Step: 586670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:08,885-Speed 6309.51 samples/sec Loss 3.6300 LearningRate 0.0001 Epoch: 28 Global Step: 586680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:12,128-Speed 6315.84 samples/sec Loss 3.6599 LearningRate 0.0001 Epoch: 28 Global Step: 586690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:15,371-Speed 6316.24 samples/sec Loss 3.6701 LearningRate 0.0001 Epoch: 28 Global Step: 586700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:18,623-Speed 6298.88 samples/sec Loss 3.6851 LearningRate 0.0001 Epoch: 28 Global Step: 586710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:21,870-Speed 6309.69 samples/sec Loss 3.6307 LearningRate 0.0001 Epoch: 28 Global Step: 586720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:25,114-Speed 6313.19 samples/sec Loss 3.6340 LearningRate 0.0001 Epoch: 28 Global Step: 586730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:28,358-Speed 6315.69 samples/sec Loss 3.6549 LearningRate 0.0001 Epoch: 28 Global Step: 586740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:31,605-Speed 6308.08 samples/sec Loss 3.6785 LearningRate 0.0001 Epoch: 28 Global Step: 586750 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:06:34,838-Speed 6337.18 samples/sec Loss 3.7187 LearningRate 0.0001 Epoch: 28 Global Step: 586760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:38,083-Speed 6311.56 samples/sec 
Loss 3.6824 LearningRate 0.0001 Epoch: 28 Global Step: 586770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:41,329-Speed 6312.17 samples/sec Loss 3.6200 LearningRate 0.0001 Epoch: 28 Global Step: 586780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:44,574-Speed 6313.47 samples/sec Loss 3.6430 LearningRate 0.0001 Epoch: 28 Global Step: 586790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:47,828-Speed 6294.97 samples/sec Loss 3.6694 LearningRate 0.0001 Epoch: 28 Global Step: 586800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:51,077-Speed 6304.36 samples/sec Loss 3.7134 LearningRate 0.0001 Epoch: 28 Global Step: 586810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:54,322-Speed 6313.81 samples/sec Loss 3.6542 LearningRate 0.0001 Epoch: 28 Global Step: 586820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:06:57,568-Speed 6309.78 samples/sec Loss 3.5996 LearningRate 0.0001 Epoch: 28 Global Step: 586830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:00,815-Speed 6309.00 samples/sec Loss 3.5949 LearningRate 0.0001 Epoch: 28 Global Step: 586840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:04,057-Speed 6318.34 samples/sec Loss 3.6690 LearningRate 0.0001 Epoch: 28 Global Step: 586850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:07,304-Speed 6308.55 samples/sec Loss 3.6485 LearningRate 0.0001 Epoch: 28 Global Step: 586860 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:07:10,552-Speed 6308.00 samples/sec Loss 3.6382 LearningRate 0.0001 Epoch: 28 Global Step: 586870 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:07:13,797-Speed 6310.79 samples/sec Loss 3.6578 LearningRate 0.0001 Epoch: 28 Global Step: 586880 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:07:17,048-Speed 6300.96 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 
28 Global Step: 586890 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:07:20,278-Speed 6342.76 samples/sec Loss 3.6230 LearningRate 0.0001 Epoch: 28 Global Step: 586900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:23,523-Speed 6312.65 samples/sec Loss 3.6292 LearningRate 0.0001 Epoch: 28 Global Step: 586910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:26,771-Speed 6307.33 samples/sec Loss 3.6838 LearningRate 0.0001 Epoch: 28 Global Step: 586920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:30,018-Speed 6308.23 samples/sec Loss 3.6785 LearningRate 0.0001 Epoch: 28 Global Step: 586930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:33,264-Speed 6310.65 samples/sec Loss 3.6305 LearningRate 0.0001 Epoch: 28 Global Step: 586940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:36,504-Speed 6322.45 samples/sec Loss 3.6804 LearningRate 0.0001 Epoch: 28 Global Step: 586950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:39,748-Speed 6314.27 samples/sec Loss 3.7130 LearningRate 0.0001 Epoch: 28 Global Step: 586960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:42,994-Speed 6311.75 samples/sec Loss 3.6478 LearningRate 0.0001 Epoch: 28 Global Step: 586970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:46,240-Speed 6311.28 samples/sec Loss 3.6484 LearningRate 0.0001 Epoch: 28 Global Step: 586980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:49,485-Speed 6312.04 samples/sec Loss 3.6483 LearningRate 0.0001 Epoch: 28 Global Step: 586990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:07:52,731-Speed 6310.99 samples/sec Loss 3.5730 LearningRate 0.0001 Epoch: 28 Global Step: 587000 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:07:55,978-Speed 6308.38 samples/sec Loss 3.6263 LearningRate 0.0001 Epoch: 28 Global Step: 587010 Fp16 Grad Scale: 
16384 Required: 22 hours Training: 2022-04-02 21:07:59,230-Speed 6299.71 samples/sec Loss 3.6690 LearningRate 0.0001 Epoch: 28 Global Step: 587020 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:08:02,463-Speed 6337.20 samples/sec Loss 3.6620 LearningRate 0.0001 Epoch: 28 Global Step: 587030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:05,710-Speed 6308.75 samples/sec Loss 3.6650 LearningRate 0.0001 Epoch: 28 Global Step: 587040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:08,955-Speed 6313.29 samples/sec Loss 3.6293 LearningRate 0.0001 Epoch: 28 Global Step: 587050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:12,200-Speed 6310.85 samples/sec Loss 3.6591 LearningRate 0.0001 Epoch: 28 Global Step: 587060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:15,447-Speed 6308.76 samples/sec Loss 3.7086 LearningRate 0.0001 Epoch: 28 Global Step: 587070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:18,693-Speed 6311.84 samples/sec Loss 3.6525 LearningRate 0.0001 Epoch: 28 Global Step: 587080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:21,938-Speed 6313.22 samples/sec Loss 3.5863 LearningRate 0.0001 Epoch: 28 Global Step: 587090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:25,188-Speed 6302.05 samples/sec Loss 3.6372 LearningRate 0.0001 Epoch: 28 Global Step: 587100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:28,431-Speed 6317.23 samples/sec Loss 3.6392 LearningRate 0.0001 Epoch: 28 Global Step: 587110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:31,673-Speed 6317.49 samples/sec Loss 3.6529 LearningRate 0.0001 Epoch: 28 Global Step: 587120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:34,917-Speed 6315.32 samples/sec Loss 3.6919 LearningRate 0.0001 Epoch: 28 Global Step: 587130 Fp16 Grad Scale: 16384 Required: 22 hours Training: 
2022-04-02 21:08:38,143-Speed 6349.82 samples/sec Loss 3.5864 LearningRate 0.0001 Epoch: 28 Global Step: 587140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:41,391-Speed 6306.88 samples/sec Loss 3.6271 LearningRate 0.0001 Epoch: 28 Global Step: 587150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:44,635-Speed 6313.73 samples/sec Loss 3.6013 LearningRate 0.0001 Epoch: 28 Global Step: 587160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:47,881-Speed 6310.84 samples/sec Loss 3.6210 LearningRate 0.0001 Epoch: 28 Global Step: 587170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:51,127-Speed 6312.10 samples/sec Loss 3.6575 LearningRate 0.0001 Epoch: 28 Global Step: 587180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:54,374-Speed 6308.42 samples/sec Loss 3.6376 LearningRate 0.0001 Epoch: 28 Global Step: 587190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:08:57,615-Speed 6320.48 samples/sec Loss 3.6791 LearningRate 0.0001 Epoch: 28 Global Step: 587200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:00,860-Speed 6311.77 samples/sec Loss 3.6656 LearningRate 0.0001 Epoch: 28 Global Step: 587210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:04,106-Speed 6311.41 samples/sec Loss 3.6252 LearningRate 0.0001 Epoch: 28 Global Step: 587220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:07,352-Speed 6310.05 samples/sec Loss 3.6826 LearningRate 0.0001 Epoch: 28 Global Step: 587230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:10,597-Speed 6314.03 samples/sec Loss 3.6562 LearningRate 0.0001 Epoch: 28 Global Step: 587240 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:09:13,831-Speed 6334.00 samples/sec Loss 3.6562 LearningRate 0.0001 Epoch: 28 Global Step: 587250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:17,078-Speed 6308.21 
samples/sec Loss 3.6904 LearningRate 0.0001 Epoch: 28 Global Step: 587260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:20,330-Speed 6298.80 samples/sec Loss 3.6757 LearningRate 0.0001 Epoch: 28 Global Step: 587270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:23,576-Speed 6311.36 samples/sec Loss 3.6717 LearningRate 0.0001 Epoch: 28 Global Step: 587280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:26,817-Speed 6321.30 samples/sec Loss 3.6518 LearningRate 0.0001 Epoch: 28 Global Step: 587290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:30,065-Speed 6305.94 samples/sec Loss 3.6416 LearningRate 0.0001 Epoch: 28 Global Step: 587300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:33,309-Speed 6315.69 samples/sec Loss 3.6367 LearningRate 0.0001 Epoch: 28 Global Step: 587310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:36,554-Speed 6311.70 samples/sec Loss 3.6266 LearningRate 0.0001 Epoch: 28 Global Step: 587320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:39,798-Speed 6315.58 samples/sec Loss 3.5918 LearningRate 0.0001 Epoch: 28 Global Step: 587330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:43,041-Speed 6316.61 samples/sec Loss 3.6594 LearningRate 0.0001 Epoch: 28 Global Step: 587340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:46,288-Speed 6308.95 samples/sec Loss 3.6472 LearningRate 0.0001 Epoch: 28 Global Step: 587350 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:09:49,532-Speed 6314.30 samples/sec Loss 3.6130 LearningRate 0.0001 Epoch: 28 Global Step: 587360 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:09:52,767-Speed 6332.56 samples/sec Loss 3.6840 LearningRate 0.0001 Epoch: 28 Global Step: 587370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:56,010-Speed 6315.76 samples/sec Loss 3.6110 LearningRate 
0.0001 Epoch: 28 Global Step: 587380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:09:59,256-Speed 6310.73 samples/sec Loss 3.6509 LearningRate 0.0001 Epoch: 28 Global Step: 587390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:02,509-Speed 6296.52 samples/sec Loss 3.6541 LearningRate 0.0001 Epoch: 28 Global Step: 587400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:05,763-Speed 6297.03 samples/sec Loss 3.6221 LearningRate 0.0001 Epoch: 28 Global Step: 587410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:09,009-Speed 6309.17 samples/sec Loss 3.6997 LearningRate 0.0001 Epoch: 28 Global Step: 587420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:12,267-Speed 6287.92 samples/sec Loss 3.6404 LearningRate 0.0001 Epoch: 28 Global Step: 587430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:15,526-Speed 6285.50 samples/sec Loss 3.5850 LearningRate 0.0001 Epoch: 28 Global Step: 587440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:18,773-Speed 6309.14 samples/sec Loss 3.6321 LearningRate 0.0001 Epoch: 28 Global Step: 587450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:22,021-Speed 6307.64 samples/sec Loss 3.6855 LearningRate 0.0001 Epoch: 28 Global Step: 587460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:25,273-Speed 6308.38 samples/sec Loss 3.5943 LearningRate 0.0001 Epoch: 28 Global Step: 587470 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:10:28,510-Speed 6328.93 samples/sec Loss 3.6513 LearningRate 0.0001 Epoch: 28 Global Step: 587480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:31,755-Speed 6313.42 samples/sec Loss 3.6711 LearningRate 0.0001 Epoch: 28 Global Step: 587490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:35,002-Speed 6308.16 samples/sec Loss 3.6499 LearningRate 0.0001 Epoch: 28 Global Step: 587500 Fp16 
Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:38,246-Speed 6315.13 samples/sec Loss 3.6473 LearningRate 0.0001 Epoch: 28 Global Step: 587510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:41,486-Speed 6322.33 samples/sec Loss 3.6974 LearningRate 0.0001 Epoch: 28 Global Step: 587520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:44,727-Speed 6319.96 samples/sec Loss 3.5930 LearningRate 0.0001 Epoch: 28 Global Step: 587530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:47,970-Speed 6316.88 samples/sec Loss 3.6461 LearningRate 0.0001 Epoch: 28 Global Step: 587540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:51,214-Speed 6314.26 samples/sec Loss 3.6416 LearningRate 0.0001 Epoch: 28 Global Step: 587550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:54,459-Speed 6312.98 samples/sec Loss 3.6330 LearningRate 0.0001 Epoch: 28 Global Step: 587560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:10:57,702-Speed 6316.82 samples/sec Loss 3.6306 LearningRate 0.0001 Epoch: 28 Global Step: 587570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:00,933-Speed 6338.99 samples/sec Loss 3.6625 LearningRate 0.0001 Epoch: 28 Global Step: 587580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:04,184-Speed 6301.61 samples/sec Loss 3.6587 LearningRate 0.0001 Epoch: 28 Global Step: 587590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:07,425-Speed 6319.96 samples/sec Loss 3.6258 LearningRate 0.0001 Epoch: 28 Global Step: 587600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:10,671-Speed 6311.42 samples/sec Loss 3.6110 LearningRate 0.0001 Epoch: 28 Global Step: 587610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:13,912-Speed 6319.83 samples/sec Loss 3.6074 LearningRate 0.0001 Epoch: 28 Global Step: 587620 Fp16 Grad Scale: 8192 Required: 22 hours 
Training: 2022-04-02 21:11:17,157-Speed 6312.67 samples/sec Loss 3.6370 LearningRate 0.0001 Epoch: 28 Global Step: 587630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:20,403-Speed 6311.63 samples/sec Loss 3.6652 LearningRate 0.0001 Epoch: 28 Global Step: 587640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:23,642-Speed 6324.28 samples/sec Loss 3.6124 LearningRate 0.0001 Epoch: 28 Global Step: 587650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:26,889-Speed 6309.28 samples/sec Loss 3.5876 LearningRate 0.0001 Epoch: 28 Global Step: 587660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:30,135-Speed 6309.89 samples/sec Loss 3.6431 LearningRate 0.0001 Epoch: 28 Global Step: 587670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:33,380-Speed 6314.50 samples/sec Loss 3.6463 LearningRate 0.0001 Epoch: 28 Global Step: 587680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:11:36,611-Speed 6339.01 samples/sec Loss 3.6909 LearningRate 0.0001 Epoch: 28 Global Step: 587690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:39,854-Speed 6317.65 samples/sec Loss 3.6971 LearningRate 0.0001 Epoch: 28 Global Step: 587700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:43,099-Speed 6313.40 samples/sec Loss 3.6256 LearningRate 0.0001 Epoch: 28 Global Step: 587710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:46,344-Speed 6311.16 samples/sec Loss 3.6487 LearningRate 0.0001 Epoch: 28 Global Step: 587720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:49,589-Speed 6312.41 samples/sec Loss 3.6634 LearningRate 0.0001 Epoch: 28 Global Step: 587730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:52,832-Speed 6316.64 samples/sec Loss 3.6727 LearningRate 0.0001 Epoch: 28 Global Step: 587740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:56,080-Speed 
6306.99 samples/sec Loss 3.6314 LearningRate 0.0001 Epoch: 28 Global Step: 587750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:11:59,328-Speed 6308.08 samples/sec Loss 3.7001 LearningRate 0.0001 Epoch: 28 Global Step: 587760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:02,573-Speed 6311.55 samples/sec Loss 3.6565 LearningRate 0.0001 Epoch: 28 Global Step: 587770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:05,819-Speed 6310.23 samples/sec Loss 3.6653 LearningRate 0.0001 Epoch: 28 Global Step: 587780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:09,050-Speed 6340.56 samples/sec Loss 3.6480 LearningRate 0.0001 Epoch: 28 Global Step: 587790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:12,293-Speed 6317.68 samples/sec Loss 3.6696 LearningRate 0.0001 Epoch: 28 Global Step: 587800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:15,539-Speed 6309.41 samples/sec Loss 3.6361 LearningRate 0.0001 Epoch: 28 Global Step: 587810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:18,783-Speed 6314.27 samples/sec Loss 3.6572 LearningRate 0.0001 Epoch: 28 Global Step: 587820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:22,029-Speed 6312.67 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 28 Global Step: 587830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:25,277-Speed 6306.67 samples/sec Loss 3.6312 LearningRate 0.0001 Epoch: 28 Global Step: 587840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:28,521-Speed 6312.90 samples/sec Loss 3.6566 LearningRate 0.0001 Epoch: 28 Global Step: 587850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:31,766-Speed 6313.07 samples/sec Loss 3.6473 LearningRate 0.0001 Epoch: 28 Global Step: 587860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:35,012-Speed 6312.04 samples/sec Loss 3.6718 
LearningRate 0.0001 Epoch: 28 Global Step: 587870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:38,259-Speed 6308.74 samples/sec Loss 3.6056 LearningRate 0.0001 Epoch: 28 Global Step: 587880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:41,491-Speed 6337.93 samples/sec Loss 3.6499 LearningRate 0.0001 Epoch: 28 Global Step: 587890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:44,736-Speed 6313.05 samples/sec Loss 3.6197 LearningRate 0.0001 Epoch: 28 Global Step: 587900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:47,979-Speed 6316.92 samples/sec Loss 3.6332 LearningRate 0.0001 Epoch: 28 Global Step: 587910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:51,223-Speed 6314.54 samples/sec Loss 3.6663 LearningRate 0.0001 Epoch: 28 Global Step: 587920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:54,464-Speed 6320.52 samples/sec Loss 3.6663 LearningRate 0.0001 Epoch: 28 Global Step: 587930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:12:57,711-Speed 6308.98 samples/sec Loss 3.6455 LearningRate 0.0001 Epoch: 28 Global Step: 587940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:00,957-Speed 6309.93 samples/sec Loss 3.6354 LearningRate 0.0001 Epoch: 28 Global Step: 587950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:04,205-Speed 6308.17 samples/sec Loss 3.6996 LearningRate 0.0001 Epoch: 28 Global Step: 587960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:07,464-Speed 6285.52 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 Global Step: 587970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:10,706-Speed 6318.25 samples/sec Loss 3.6364 LearningRate 0.0001 Epoch: 28 Global Step: 587980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:13,953-Speed 6308.89 samples/sec Loss 3.6779 LearningRate 0.0001 Epoch: 28 Global Step: 
587990 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:13:17,198-Speed 6311.82 samples/sec Loss 3.6313 LearningRate 0.0001 Epoch: 28 Global Step: 588000 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:13:20,429-Speed 6341.45 samples/sec Loss 3.6109 LearningRate 0.0001 Epoch: 28 Global Step: 588010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:23,673-Speed 6314.24 samples/sec Loss 3.6553 LearningRate 0.0001 Epoch: 28 Global Step: 588020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:26,916-Speed 6315.41 samples/sec Loss 3.6638 LearningRate 0.0001 Epoch: 28 Global Step: 588030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:30,159-Speed 6316.56 samples/sec Loss 3.6869 LearningRate 0.0001 Epoch: 28 Global Step: 588040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:33,403-Speed 6314.56 samples/sec Loss 3.6377 LearningRate 0.0001 Epoch: 28 Global Step: 588050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:36,654-Speed 6302.93 samples/sec Loss 3.6536 LearningRate 0.0001 Epoch: 28 Global Step: 588060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:39,899-Speed 6312.02 samples/sec Loss 3.6132 LearningRate 0.0001 Epoch: 28 Global Step: 588070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:43,142-Speed 6316.17 samples/sec Loss 3.6528 LearningRate 0.0001 Epoch: 28 Global Step: 588080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:46,386-Speed 6314.84 samples/sec Loss 3.6438 LearningRate 0.0001 Epoch: 28 Global Step: 588090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:49,630-Speed 6316.06 samples/sec Loss 3.6239 LearningRate 0.0001 Epoch: 28 Global Step: 588100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:13:52,875-Speed 6311.86 samples/sec Loss 3.6310 LearningRate 0.0001 Epoch: 28 Global Step: 588110 Fp16 Grad Scale: 16384 Required: 
22 hours Training: 2022-04-02 21:13:56,118-Speed 6316.91 samples/sec Loss 3.5953 LearningRate 0.0001 Epoch: 28 Global Step: 588120 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:13:59,363-Speed 6313.10 samples/sec Loss 3.6192 LearningRate 0.0001 Epoch: 28 Global Step: 588130 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:14:02,616-Speed 6296.79 samples/sec Loss 3.6232 LearningRate 0.0001 Epoch: 28 Global Step: 588140 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:14:05,845-Speed 6343.88 samples/sec Loss 3.6351 LearningRate 0.0001 Epoch: 28 Global Step: 588150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:09,092-Speed 6309.17 samples/sec Loss 3.6398 LearningRate 0.0001 Epoch: 28 Global Step: 588160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:12,333-Speed 6320.83 samples/sec Loss 3.7012 LearningRate 0.0001 Epoch: 28 Global Step: 588170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:15,578-Speed 6312.64 samples/sec Loss 3.6039 LearningRate 0.0001 Epoch: 28 Global Step: 588180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:18,820-Speed 6318.03 samples/sec Loss 3.6637 LearningRate 0.0001 Epoch: 28 Global Step: 588190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:22,063-Speed 6316.65 samples/sec Loss 3.6147 LearningRate 0.0001 Epoch: 28 Global Step: 588200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:25,307-Speed 6315.14 samples/sec Loss 3.6001 LearningRate 0.0001 Epoch: 28 Global Step: 588210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:28,553-Speed 6310.56 samples/sec Loss 3.6483 LearningRate 0.0001 Epoch: 28 Global Step: 588220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:31,801-Speed 6306.30 samples/sec Loss 3.6675 LearningRate 0.0001 Epoch: 28 Global Step: 588230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:14:35,044-Speed 6316.42 samples/sec Loss 3.6892 LearningRate 0.0001 Epoch: 28 Global Step: 588240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:38,292-Speed 6307.56 samples/sec Loss 3.6601 LearningRate 0.0001 Epoch: 28 Global Step: 588250 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:14:41,537-Speed 6311.61 samples/sec Loss 3.6161 LearningRate 0.0001 Epoch: 28 Global Step: 588260 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:14:44,765-Speed 6346.54 samples/sec Loss 3.6154 LearningRate 0.0001 Epoch: 28 Global Step: 588270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:48,013-Speed 6306.98 samples/sec Loss 3.6605 LearningRate 0.0001 Epoch: 28 Global Step: 588280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:51,255-Speed 6317.35 samples/sec Loss 3.6109 LearningRate 0.0001 Epoch: 28 Global Step: 588290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:54,526-Speed 6263.22 samples/sec Loss 3.5971 LearningRate 0.0001 Epoch: 28 Global Step: 588300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:14:57,772-Speed 6310.57 samples/sec Loss 3.5604 LearningRate 0.0001 Epoch: 28 Global Step: 588310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:01,022-Speed 6303.56 samples/sec Loss 3.6445 LearningRate 0.0001 Epoch: 28 Global Step: 588320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:04,268-Speed 6311.59 samples/sec Loss 3.6510 LearningRate 0.0001 Epoch: 28 Global Step: 588330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:07,509-Speed 6320.12 samples/sec Loss 3.6466 LearningRate 0.0001 Epoch: 28 Global Step: 588340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:10,758-Speed 6305.89 samples/sec Loss 3.5843 LearningRate 0.0001 Epoch: 28 Global Step: 588350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:14,006-Speed 6306.32 samples/sec 
Loss 3.6679 LearningRate 0.0001 Epoch: 28 Global Step: 588360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:17,247-Speed 6320.95 samples/sec Loss 3.6094 LearningRate 0.0001 Epoch: 28 Global Step: 588370 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:20,495-Speed 6307.15 samples/sec Loss 3.6637 LearningRate 0.0001 Epoch: 28 Global Step: 588380 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:23,741-Speed 6309.37 samples/sec Loss 3.6492 LearningRate 0.0001 Epoch: 28 Global Step: 588390 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:26,989-Speed 6308.04 samples/sec Loss 3.6763 LearningRate 0.0001 Epoch: 28 Global Step: 588400 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:30,243-Speed 6294.27 samples/sec Loss 3.6499 LearningRate 0.0001 Epoch: 28 Global Step: 588410 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:33,487-Speed 6314.50 samples/sec Loss 3.6281 LearningRate 0.0001 Epoch: 28 Global Step: 588420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:36,733-Speed 6311.18 samples/sec Loss 3.6550 LearningRate 0.0001 Epoch: 28 Global Step: 588430 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:15:39,964-Speed 6340.09 samples/sec Loss 3.6508 LearningRate 0.0001 Epoch: 28 Global Step: 588440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:43,212-Speed 6307.34 samples/sec Loss 3.6419 LearningRate 0.0001 Epoch: 28 Global Step: 588450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:46,458-Speed 6309.72 samples/sec Loss 3.6614 LearningRate 0.0001 Epoch: 28 Global Step: 588460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:49,709-Speed 6301.59 samples/sec Loss 3.5997 LearningRate 0.0001 Epoch: 28 Global Step: 588470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:52,956-Speed 6308.70 samples/sec Loss 3.7004 LearningRate 0.0001 
Epoch: 28 Global Step: 588480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:56,200-Speed 6314.85 samples/sec Loss 3.6576 LearningRate 0.0001 Epoch: 28 Global Step: 588490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:15:59,446-Speed 6310.84 samples/sec Loss 3.5992 LearningRate 0.0001 Epoch: 28 Global Step: 588500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:02,692-Speed 6310.85 samples/sec Loss 3.6285 LearningRate 0.0001 Epoch: 28 Global Step: 588510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:05,938-Speed 6310.30 samples/sec Loss 3.6485 LearningRate 0.0001 Epoch: 28 Global Step: 588520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:09,189-Speed 6300.60 samples/sec Loss 3.6546 LearningRate 0.0001 Epoch: 28 Global Step: 588530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:12,446-Speed 6291.35 samples/sec Loss 3.6213 LearningRate 0.0001 Epoch: 28 Global Step: 588540 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:16:15,694-Speed 6305.80 samples/sec Loss 3.6434 LearningRate 0.0001 Epoch: 28 Global Step: 588550 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:16:18,941-Speed 6308.93 samples/sec Loss 3.6111 LearningRate 0.0001 Epoch: 28 Global Step: 588560 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:16:22,178-Speed 6329.19 samples/sec Loss 3.5689 LearningRate 0.0001 Epoch: 28 Global Step: 588570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:25,424-Speed 6310.31 samples/sec Loss 3.5582 LearningRate 0.0001 Epoch: 28 Global Step: 588580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:28,677-Speed 6297.39 samples/sec Loss 3.6516 LearningRate 0.0001 Epoch: 28 Global Step: 588590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:31,923-Speed 6311.10 samples/sec Loss 3.6594 LearningRate 0.0001 Epoch: 28 Global Step: 588600 Fp16 Grad 
Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:35,167-Speed 6315.27 samples/sec Loss 3.6713 LearningRate 0.0001 Epoch: 28 Global Step: 588610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:38,415-Speed 6305.58 samples/sec Loss 3.6514 LearningRate 0.0001 Epoch: 28 Global Step: 588620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:41,660-Speed 6313.88 samples/sec Loss 3.6324 LearningRate 0.0001 Epoch: 28 Global Step: 588630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:44,905-Speed 6312.57 samples/sec Loss 3.6741 LearningRate 0.0001 Epoch: 28 Global Step: 588640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:48,155-Speed 6302.04 samples/sec Loss 3.6978 LearningRate 0.0001 Epoch: 28 Global Step: 588650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:51,403-Speed 6307.08 samples/sec Loss 3.6415 LearningRate 0.0001 Epoch: 28 Global Step: 588660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:16:54,649-Speed 6310.29 samples/sec Loss 3.6206 LearningRate 0.0001 Epoch: 28 Global Step: 588670 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:16:57,897-Speed 6308.44 samples/sec Loss 3.6553 LearningRate 0.0001 Epoch: 28 Global Step: 588680 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:17:01,131-Speed 6333.04 samples/sec Loss 3.6559 LearningRate 0.0001 Epoch: 28 Global Step: 588690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:04,382-Speed 6301.09 samples/sec Loss 3.6684 LearningRate 0.0001 Epoch: 28 Global Step: 588700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:07,628-Speed 6310.64 samples/sec Loss 3.5978 LearningRate 0.0001 Epoch: 28 Global Step: 588710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:10,872-Speed 6313.95 samples/sec Loss 3.6134 LearningRate 0.0001 Epoch: 28 Global Step: 588720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 
2022-04-02 21:17:14,118-Speed 6312.36 samples/sec Loss 3.6278 LearningRate 0.0001 Epoch: 28 Global Step: 588730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:17,365-Speed 6307.64 samples/sec Loss 3.5817 LearningRate 0.0001 Epoch: 28 Global Step: 588740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:20,612-Speed 6310.44 samples/sec Loss 3.6565 LearningRate 0.0001 Epoch: 28 Global Step: 588750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:23,859-Speed 6308.04 samples/sec Loss 3.6956 LearningRate 0.0001 Epoch: 28 Global Step: 588760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:27,110-Speed 6302.29 samples/sec Loss 3.5948 LearningRate 0.0001 Epoch: 28 Global Step: 588770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:30,355-Speed 6312.59 samples/sec Loss 3.6050 LearningRate 0.0001 Epoch: 28 Global Step: 588780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:33,599-Speed 6313.32 samples/sec Loss 3.6132 LearningRate 0.0001 Epoch: 28 Global Step: 588790 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:17:36,830-Speed 6341.68 samples/sec Loss 3.6700 LearningRate 0.0001 Epoch: 28 Global Step: 588800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:40,076-Speed 6310.14 samples/sec Loss 3.6476 LearningRate 0.0001 Epoch: 28 Global Step: 588810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:43,322-Speed 6311.65 samples/sec Loss 3.6465 LearningRate 0.0001 Epoch: 28 Global Step: 588820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:46,565-Speed 6315.47 samples/sec Loss 3.7108 LearningRate 0.0001 Epoch: 28 Global Step: 588830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:49,811-Speed 6310.85 samples/sec Loss 3.7020 LearningRate 0.0001 Epoch: 28 Global Step: 588840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:53,053-Speed 6319.06 
samples/sec Loss 3.5693 LearningRate 0.0001 Epoch: 28 Global Step: 588850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:56,298-Speed 6313.08 samples/sec Loss 3.6871 LearningRate 0.0001 Epoch: 28 Global Step: 588860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:17:59,541-Speed 6315.66 samples/sec Loss 3.6644 LearningRate 0.0001 Epoch: 28 Global Step: 588870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:02,792-Speed 6300.78 samples/sec Loss 3.6086 LearningRate 0.0001 Epoch: 28 Global Step: 588880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:06,039-Speed 6308.49 samples/sec Loss 3.7110 LearningRate 0.0001 Epoch: 28 Global Step: 588890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:09,284-Speed 6312.87 samples/sec Loss 3.6382 LearningRate 0.0001 Epoch: 28 Global Step: 588900 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:18:12,528-Speed 6314.14 samples/sec Loss 3.6663 LearningRate 0.0001 Epoch: 28 Global Step: 588910 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:18:15,771-Speed 6317.63 samples/sec Loss 3.6365 LearningRate 0.0001 Epoch: 28 Global Step: 588920 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:18:19,021-Speed 6303.21 samples/sec Loss 3.6643 LearningRate 0.0001 Epoch: 28 Global Step: 588930 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:18:22,270-Speed 6304.30 samples/sec Loss 3.6131 LearningRate 0.0001 Epoch: 28 Global Step: 588940 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:18:25,500-Speed 6341.05 samples/sec Loss 3.6450 LearningRate 0.0001 Epoch: 28 Global Step: 588950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:28,742-Speed 6319.15 samples/sec Loss 3.6280 LearningRate 0.0001 Epoch: 28 Global Step: 588960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:31,988-Speed 6310.19 samples/sec Loss 3.5723 LearningRate 
0.0001 Epoch: 28 Global Step: 588970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:35,231-Speed 6318.33 samples/sec Loss 3.5899 LearningRate 0.0001 Epoch: 28 Global Step: 588980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:38,476-Speed 6313.13 samples/sec Loss 3.6593 LearningRate 0.0001 Epoch: 28 Global Step: 588990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:41,718-Speed 6318.20 samples/sec Loss 3.6076 LearningRate 0.0001 Epoch: 28 Global Step: 589000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:44,962-Speed 6314.52 samples/sec Loss 3.6077 LearningRate 0.0001 Epoch: 28 Global Step: 589010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:48,205-Speed 6316.35 samples/sec Loss 3.6513 LearningRate 0.0001 Epoch: 28 Global Step: 589020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:51,452-Speed 6309.75 samples/sec Loss 3.6860 LearningRate 0.0001 Epoch: 28 Global Step: 589030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:54,696-Speed 6314.69 samples/sec Loss 3.6162 LearningRate 0.0001 Epoch: 28 Global Step: 589040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:18:57,928-Speed 6336.75 samples/sec Loss 3.6628 LearningRate 0.0001 Epoch: 28 Global Step: 589050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:01,173-Speed 6313.77 samples/sec Loss 3.6460 LearningRate 0.0001 Epoch: 28 Global Step: 589060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:04,418-Speed 6313.42 samples/sec Loss 3.6376 LearningRate 0.0001 Epoch: 28 Global Step: 589070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:07,660-Speed 6317.36 samples/sec Loss 3.6151 LearningRate 0.0001 Epoch: 28 Global Step: 589080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:10,905-Speed 6313.55 samples/sec Loss 3.6195 LearningRate 0.0001 Epoch: 28 Global Step: 589090 Fp16 
Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:14,151-Speed 6309.98 samples/sec Loss 3.6217 LearningRate 0.0001 Epoch: 28 Global Step: 589100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:17,395-Speed 6315.29 samples/sec Loss 3.6197 LearningRate 0.0001 Epoch: 28 Global Step: 589110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:20,640-Speed 6312.54 samples/sec Loss 3.6066 LearningRate 0.0001 Epoch: 28 Global Step: 589120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:23,887-Speed 6309.30 samples/sec Loss 3.6400 LearningRate 0.0001 Epoch: 28 Global Step: 589130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:27,133-Speed 6309.70 samples/sec Loss 3.6284 LearningRate 0.0001 Epoch: 28 Global Step: 589140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:30,380-Speed 6308.79 samples/sec Loss 3.5920 LearningRate 0.0001 Epoch: 28 Global Step: 589150 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:19:33,624-Speed 6315.17 samples/sec Loss 3.6017 LearningRate 0.0001 Epoch: 28 Global Step: 589160 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:19:36,852-Speed 6345.62 samples/sec Loss 3.6529 LearningRate 0.0001 Epoch: 28 Global Step: 589170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:40,099-Speed 6309.02 samples/sec Loss 3.6239 LearningRate 0.0001 Epoch: 28 Global Step: 589180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:43,346-Speed 6308.65 samples/sec Loss 3.6686 LearningRate 0.0001 Epoch: 28 Global Step: 589190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:46,590-Speed 6314.42 samples/sec Loss 3.6644 LearningRate 0.0001 Epoch: 28 Global Step: 589200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:49,841-Speed 6301.26 samples/sec Loss 3.6862 LearningRate 0.0001 Epoch: 28 Global Step: 589210 Fp16 Grad Scale: 8192 Required: 22 hours 
Training: 2022-04-02 21:19:53,084-Speed 6318.55 samples/sec Loss 3.6540 LearningRate 0.0001 Epoch: 28 Global Step: 589220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:56,336-Speed 6297.46 samples/sec Loss 3.6374 LearningRate 0.0001 Epoch: 28 Global Step: 589230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:19:59,582-Speed 6311.08 samples/sec Loss 3.6157 LearningRate 0.0001 Epoch: 28 Global Step: 589240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:02,828-Speed 6311.01 samples/sec Loss 3.6304 LearningRate 0.0001 Epoch: 28 Global Step: 589250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:06,096-Speed 6268.32 samples/sec Loss 3.6415 LearningRate 0.0001 Epoch: 28 Global Step: 589260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:09,345-Speed 6304.52 samples/sec Loss 3.6182 LearningRate 0.0001 Epoch: 28 Global Step: 589270 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:20:12,575-Speed 6342.03 samples/sec Loss 3.6230 LearningRate 0.0001 Epoch: 28 Global Step: 589280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:15,819-Speed 6316.20 samples/sec Loss 3.6508 LearningRate 0.0001 Epoch: 28 Global Step: 589290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:19,064-Speed 6311.36 samples/sec Loss 3.6288 LearningRate 0.0001 Epoch: 28 Global Step: 589300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:22,315-Speed 6300.80 samples/sec Loss 3.6797 LearningRate 0.0001 Epoch: 28 Global Step: 589310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:25,564-Speed 6304.59 samples/sec Loss 3.5973 LearningRate 0.0001 Epoch: 28 Global Step: 589320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:28,818-Speed 6295.10 samples/sec Loss 3.6482 LearningRate 0.0001 Epoch: 28 Global Step: 589330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:32,067-Speed 
6306.38 samples/sec Loss 3.6389 LearningRate 0.0001 Epoch: 28 Global Step: 589340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:35,312-Speed 6312.43 samples/sec Loss 3.5893 LearningRate 0.0001 Epoch: 28 Global Step: 589350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:38,561-Speed 6305.29 samples/sec Loss 3.5938 LearningRate 0.0001 Epoch: 28 Global Step: 589360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:41,805-Speed 6313.89 samples/sec Loss 3.6560 LearningRate 0.0001 Epoch: 28 Global Step: 589370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:45,054-Speed 6304.49 samples/sec Loss 3.6413 LearningRate 0.0001 Epoch: 28 Global Step: 589380 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:20:48,287-Speed 6337.44 samples/sec Loss 3.5627 LearningRate 0.0001 Epoch: 28 Global Step: 589390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:51,533-Speed 6310.17 samples/sec Loss 3.6195 LearningRate 0.0001 Epoch: 28 Global Step: 589400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:54,779-Speed 6311.03 samples/sec Loss 3.6103 LearningRate 0.0001 Epoch: 28 Global Step: 589410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:20:58,023-Speed 6313.64 samples/sec Loss 3.5646 LearningRate 0.0001 Epoch: 28 Global Step: 589420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:01,270-Speed 6308.19 samples/sec Loss 3.6605 LearningRate 0.0001 Epoch: 28 Global Step: 589430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:04,521-Speed 6302.34 samples/sec Loss 3.6505 LearningRate 0.0001 Epoch: 28 Global Step: 589440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:07,766-Speed 6313.89 samples/sec Loss 3.6342 LearningRate 0.0001 Epoch: 28 Global Step: 589450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:11,012-Speed 6311.04 samples/sec Loss 3.6338 
LearningRate 0.0001 Epoch: 28 Global Step: 589460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:14,255-Speed 6316.18 samples/sec Loss 3.6001 LearningRate 0.0001 Epoch: 28 Global Step: 589470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:17,502-Speed 6309.04 samples/sec Loss 3.6237 LearningRate 0.0001 Epoch: 28 Global Step: 589480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:20,746-Speed 6313.62 samples/sec Loss 3.7085 LearningRate 0.0001 Epoch: 28 Global Step: 589490 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:21:24,003-Speed 6289.69 samples/sec Loss 3.6617 LearningRate 0.0001 Epoch: 28 Global Step: 589500 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:21:27,238-Speed 6333.49 samples/sec Loss 3.6223 LearningRate 0.0001 Epoch: 28 Global Step: 589510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:30,485-Speed 6306.89 samples/sec Loss 3.6525 LearningRate 0.0001 Epoch: 28 Global Step: 589520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:33,732-Speed 6309.53 samples/sec Loss 3.6284 LearningRate 0.0001 Epoch: 28 Global Step: 589530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:36,976-Speed 6314.42 samples/sec Loss 3.6637 LearningRate 0.0001 Epoch: 28 Global Step: 589540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:40,221-Speed 6311.92 samples/sec Loss 3.5982 LearningRate 0.0001 Epoch: 28 Global Step: 589550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:43,470-Speed 6306.77 samples/sec Loss 3.5763 LearningRate 0.0001 Epoch: 28 Global Step: 589560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:46,715-Speed 6310.93 samples/sec Loss 3.5505 LearningRate 0.0001 Epoch: 28 Global Step: 589570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:49,964-Speed 6305.98 samples/sec Loss 3.6090 LearningRate 0.0001 Epoch: 28 Global 
Step: 589580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:53,210-Speed 6309.69 samples/sec Loss 3.6537 LearningRate 0.0001 Epoch: 28 Global Step: 589590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:56,461-Speed 6301.59 samples/sec Loss 3.6531 LearningRate 0.0001 Epoch: 28 Global Step: 589600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:21:59,698-Speed 6329.21 samples/sec Loss 3.6124 LearningRate 0.0001 Epoch: 28 Global Step: 589610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:02,949-Speed 6302.82 samples/sec Loss 3.6372 LearningRate 0.0001 Epoch: 28 Global Step: 589620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:06,194-Speed 6311.17 samples/sec Loss 3.6298 LearningRate 0.0001 Epoch: 28 Global Step: 589630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:09,440-Speed 6311.11 samples/sec Loss 3.6157 LearningRate 0.0001 Epoch: 28 Global Step: 589640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:12,753-Speed 6184.20 samples/sec Loss 3.5876 LearningRate 0.0001 Epoch: 28 Global Step: 589650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:16,118-Speed 6086.35 samples/sec Loss 3.6570 LearningRate 0.0001 Epoch: 28 Global Step: 589660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:19,377-Speed 6286.78 samples/sec Loss 3.6358 LearningRate 0.0001 Epoch: 28 Global Step: 589670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:22,622-Speed 6313.21 samples/sec Loss 3.6888 LearningRate 0.0001 Epoch: 28 Global Step: 589680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:25,871-Speed 6304.60 samples/sec Loss 3.6121 LearningRate 0.0001 Epoch: 28 Global Step: 589690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:29,119-Speed 6306.36 samples/sec Loss 3.6442 LearningRate 0.0001 Epoch: 28 Global Step: 589700 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:22:32,373-Speed 6296.61 samples/sec Loss 3.5656 LearningRate 0.0001 Epoch: 28 Global Step: 589710 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:22:35,603-Speed 6341.42 samples/sec Loss 3.6574 LearningRate 0.0001 Epoch: 28 Global Step: 589720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:38,847-Speed 6314.42 samples/sec Loss 3.6197 LearningRate 0.0001 Epoch: 28 Global Step: 589730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:42,095-Speed 6307.67 samples/sec Loss 3.6323 LearningRate 0.0001 Epoch: 28 Global Step: 589740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:45,346-Speed 6300.92 samples/sec Loss 3.6336 LearningRate 0.0001 Epoch: 28 Global Step: 589750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:48,592-Speed 6310.59 samples/sec Loss 3.6397 LearningRate 0.0001 Epoch: 28 Global Step: 589760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:51,833-Speed 6319.50 samples/sec Loss 3.6249 LearningRate 0.0001 Epoch: 28 Global Step: 589770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:55,079-Speed 6311.68 samples/sec Loss 3.6602 LearningRate 0.0001 Epoch: 28 Global Step: 589780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:22:58,322-Speed 6316.67 samples/sec Loss 3.6254 LearningRate 0.0001 Epoch: 28 Global Step: 589790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:01,566-Speed 6313.82 samples/sec Loss 3.6654 LearningRate 0.0001 Epoch: 28 Global Step: 589800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:04,811-Speed 6312.76 samples/sec Loss 3.6620 LearningRate 0.0001 Epoch: 28 Global Step: 589810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:08,060-Speed 6305.58 samples/sec Loss 3.6432 LearningRate 0.0001 Epoch: 28 Global Step: 589820 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 
21:23:11,292-Speed 6337.45 samples/sec Loss 3.5999 LearningRate 0.0001 Epoch: 28 Global Step: 589830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:14,535-Speed 6317.59 samples/sec Loss 3.5836 LearningRate 0.0001 Epoch: 28 Global Step: 589840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:17,778-Speed 6316.15 samples/sec Loss 3.6266 LearningRate 0.0001 Epoch: 28 Global Step: 589850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:21,022-Speed 6314.26 samples/sec Loss 3.6316 LearningRate 0.0001 Epoch: 28 Global Step: 589860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:24,264-Speed 6318.45 samples/sec Loss 3.6453 LearningRate 0.0001 Epoch: 28 Global Step: 589870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:27,511-Speed 6309.09 samples/sec Loss 3.6323 LearningRate 0.0001 Epoch: 28 Global Step: 589880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:30,754-Speed 6316.76 samples/sec Loss 3.6607 LearningRate 0.0001 Epoch: 28 Global Step: 589890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:33,998-Speed 6315.44 samples/sec Loss 3.6326 LearningRate 0.0001 Epoch: 28 Global Step: 589900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:37,245-Speed 6309.32 samples/sec Loss 3.5693 LearningRate 0.0001 Epoch: 28 Global Step: 589910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:40,524-Speed 6247.32 samples/sec Loss 3.6300 LearningRate 0.0001 Epoch: 28 Global Step: 589920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:43,766-Speed 6317.69 samples/sec Loss 3.5975 LearningRate 0.0001 Epoch: 28 Global Step: 589930 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:23:47,010-Speed 6314.65 samples/sec Loss 3.5569 LearningRate 0.0001 Epoch: 28 Global Step: 589940 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:23:50,264-Speed 6295.20 samples/sec 
Loss 3.6673 LearningRate 0.0001 Epoch: 28 Global Step: 589950 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:23:53,498-Speed 6335.39 samples/sec Loss 3.6438 LearningRate 0.0001 Epoch: 28 Global Step: 589960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:56,744-Speed 6309.34 samples/sec Loss 3.6331 LearningRate 0.0001 Epoch: 28 Global Step: 589970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:23:59,990-Speed 6311.55 samples/sec Loss 3.6376 LearningRate 0.0001 Epoch: 28 Global Step: 589980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:03,243-Speed 6297.29 samples/sec Loss 3.6332 LearningRate 0.0001 Epoch: 28 Global Step: 589990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:06,488-Speed 6312.43 samples/sec Loss 3.6282 LearningRate 0.0001 Epoch: 28 Global Step: 590000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:09,738-Speed 6301.79 samples/sec Loss 3.6704 LearningRate 0.0001 Epoch: 28 Global Step: 590010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:12,984-Speed 6311.28 samples/sec Loss 3.6318 LearningRate 0.0001 Epoch: 28 Global Step: 590020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:16,226-Speed 6318.15 samples/sec Loss 3.6730 LearningRate 0.0001 Epoch: 28 Global Step: 590030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:19,475-Speed 6306.09 samples/sec Loss 3.5493 LearningRate 0.0001 Epoch: 28 Global Step: 590040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:22,725-Speed 6301.94 samples/sec Loss 3.5874 LearningRate 0.0001 Epoch: 28 Global Step: 590050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:25,974-Speed 6306.29 samples/sec Loss 3.6057 LearningRate 0.0001 Epoch: 28 Global Step: 590060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:24:29,205-Speed 6338.94 samples/sec Loss 3.6273 LearningRate 0.0001 Epoch: 28 
Global Step: 590070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:32,448-Speed 6316.64 samples/sec Loss 3.5739 LearningRate 0.0001 Epoch: 28 Global Step: 590080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:35,695-Speed 6309.63 samples/sec Loss 3.6243 LearningRate 0.0001 Epoch: 28 Global Step: 590090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:38,940-Speed 6312.80 samples/sec Loss 3.6723 LearningRate 0.0001 Epoch: 28 Global Step: 590100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:42,185-Speed 6311.21 samples/sec Loss 3.6326 LearningRate 0.0001 Epoch: 28 Global Step: 590110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:45,430-Speed 6312.44 samples/sec Loss 3.6476 LearningRate 0.0001 Epoch: 28 Global Step: 590120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:48,679-Speed 6306.57 samples/sec Loss 3.6202 LearningRate 0.0001 Epoch: 28 Global Step: 590130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:51,923-Speed 6313.96 samples/sec Loss 3.6485 LearningRate 0.0001 Epoch: 28 Global Step: 590140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:55,167-Speed 6315.75 samples/sec Loss 3.6307 LearningRate 0.0001 Epoch: 28 Global Step: 590150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:24:58,411-Speed 6313.34 samples/sec Loss 3.6417 LearningRate 0.0001 Epoch: 28 Global Step: 590160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:01,653-Speed 6318.64 samples/sec Loss 3.5820 LearningRate 0.0001 Epoch: 28 Global Step: 590170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:04,904-Speed 6302.77 samples/sec Loss 3.6849 LearningRate 0.0001 Epoch: 28 Global Step: 590180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:08,154-Speed 6301.28 samples/sec Loss 3.6623 LearningRate 0.0001 Epoch: 28 Global Step: 590190 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:25:11,400-Speed 6311.84 samples/sec Loss 3.6126 LearningRate 0.0001 Epoch: 28 Global Step: 590200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:14,645-Speed 6312.15 samples/sec Loss 3.6384 LearningRate 0.0001 Epoch: 28 Global Step: 590210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:17,891-Speed 6311.10 samples/sec Loss 3.6595 LearningRate 0.0001 Epoch: 28 Global Step: 590220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:21,137-Speed 6311.96 samples/sec Loss 3.6329 LearningRate 0.0001 Epoch: 28 Global Step: 590230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:24,390-Speed 6295.52 samples/sec Loss 3.6394 LearningRate 0.0001 Epoch: 28 Global Step: 590240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:27,639-Speed 6306.01 samples/sec Loss 3.6571 LearningRate 0.0001 Epoch: 28 Global Step: 590250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:30,885-Speed 6310.12 samples/sec Loss 3.6617 LearningRate 0.0001 Epoch: 28 Global Step: 590260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:34,119-Speed 6333.87 samples/sec Loss 3.6297 LearningRate 0.0001 Epoch: 28 Global Step: 590270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:37,365-Speed 6310.25 samples/sec Loss 3.6705 LearningRate 0.0001 Epoch: 28 Global Step: 590280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:40,612-Speed 6308.86 samples/sec Loss 3.5610 LearningRate 0.0001 Epoch: 28 Global Step: 590290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:43,859-Speed 6309.84 samples/sec Loss 3.6416 LearningRate 0.0001 Epoch: 28 Global Step: 590300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:47,109-Speed 6302.80 samples/sec Loss 3.6124 LearningRate 0.0001 Epoch: 28 Global Step: 590310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:25:50,355-Speed 6310.59 samples/sec Loss 3.6211 LearningRate 0.0001 Epoch: 28 Global Step: 590320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:53,600-Speed 6312.75 samples/sec Loss 3.6974 LearningRate 0.0001 Epoch: 28 Global Step: 590330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:25:56,845-Speed 6312.64 samples/sec Loss 3.6427 LearningRate 0.0001 Epoch: 28 Global Step: 590340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:00,093-Speed 6307.76 samples/sec Loss 3.5908 LearningRate 0.0001 Epoch: 28 Global Step: 590350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:03,344-Speed 6302.12 samples/sec Loss 3.6059 LearningRate 0.0001 Epoch: 28 Global Step: 590360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:06,573-Speed 6343.02 samples/sec Loss 3.6762 LearningRate 0.0001 Epoch: 28 Global Step: 590370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:09,820-Speed 6308.97 samples/sec Loss 3.6634 LearningRate 0.0001 Epoch: 28 Global Step: 590380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:13,067-Speed 6309.76 samples/sec Loss 3.6307 LearningRate 0.0001 Epoch: 28 Global Step: 590390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:16,313-Speed 6310.47 samples/sec Loss 3.6410 LearningRate 0.0001 Epoch: 28 Global Step: 590400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:19,560-Speed 6309.13 samples/sec Loss 3.6185 LearningRate 0.0001 Epoch: 28 Global Step: 590410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:22,806-Speed 6308.99 samples/sec Loss 3.6254 LearningRate 0.0001 Epoch: 28 Global Step: 590420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:26,057-Speed 6300.91 samples/sec Loss 3.6137 LearningRate 0.0001 Epoch: 28 Global Step: 590430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:29,302-Speed 6312.53 samples/sec Loss 
3.6229 LearningRate 0.0001 Epoch: 28 Global Step: 590440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:32,562-Speed 6285.48 samples/sec Loss 3.6665 LearningRate 0.0001 Epoch: 28 Global Step: 590450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:35,806-Speed 6313.32 samples/sec Loss 3.6371 LearningRate 0.0001 Epoch: 28 Global Step: 590460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:39,055-Speed 6305.48 samples/sec Loss 3.5235 LearningRate 0.0001 Epoch: 28 Global Step: 590470 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:26:42,285-Speed 6341.48 samples/sec Loss 3.5471 LearningRate 0.0001 Epoch: 28 Global Step: 590480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:45,529-Speed 6315.26 samples/sec Loss 3.6352 LearningRate 0.0001 Epoch: 28 Global Step: 590490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:48,778-Speed 6303.39 samples/sec Loss 3.6199 LearningRate 0.0001 Epoch: 28 Global Step: 590500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:52,032-Speed 6296.11 samples/sec Loss 3.6527 LearningRate 0.0001 Epoch: 28 Global Step: 590510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:55,279-Speed 6308.27 samples/sec Loss 3.5729 LearningRate 0.0001 Epoch: 28 Global Step: 590520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:26:58,531-Speed 6299.81 samples/sec Loss 3.6439 LearningRate 0.0001 Epoch: 28 Global Step: 590530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:01,777-Speed 6310.09 samples/sec Loss 3.6491 LearningRate 0.0001 Epoch: 28 Global Step: 590540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:05,026-Speed 6305.90 samples/sec Loss 3.6278 LearningRate 0.0001 Epoch: 28 Global Step: 590550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:08,269-Speed 6316.17 samples/sec Loss 3.6392 LearningRate 0.0001 Epoch: 28 
Global Step: 590560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:11,520-Speed 6301.06 samples/sec Loss 3.6491 LearningRate 0.0001 Epoch: 28 Global Step: 590570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:14,754-Speed 6335.61 samples/sec Loss 3.6247 LearningRate 0.0001 Epoch: 28 Global Step: 590580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:18,001-Speed 6309.03 samples/sec Loss 3.6674 LearningRate 0.0001 Epoch: 28 Global Step: 590590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:21,248-Speed 6309.47 samples/sec Loss 3.6382 LearningRate 0.0001 Epoch: 28 Global Step: 590600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:24,493-Speed 6311.23 samples/sec Loss 3.6097 LearningRate 0.0001 Epoch: 28 Global Step: 590610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:27,737-Speed 6315.46 samples/sec Loss 3.6062 LearningRate 0.0001 Epoch: 28 Global Step: 590620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:30,980-Speed 6316.51 samples/sec Loss 3.5598 LearningRate 0.0001 Epoch: 28 Global Step: 590630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:34,225-Speed 6311.35 samples/sec Loss 3.6701 LearningRate 0.0001 Epoch: 28 Global Step: 590640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:37,475-Speed 6304.11 samples/sec Loss 3.5892 LearningRate 0.0001 Epoch: 28 Global Step: 590650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:40,720-Speed 6312.25 samples/sec Loss 3.6118 LearningRate 0.0001 Epoch: 28 Global Step: 590660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:43,967-Speed 6309.38 samples/sec Loss 3.6664 LearningRate 0.0001 Epoch: 28 Global Step: 590670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:47,211-Speed 6313.47 samples/sec Loss 3.6226 LearningRate 0.0001 Epoch: 28 Global Step: 590680 Fp16 Grad Scale: 16384 
Required: 22 hours Training: 2022-04-02 21:27:50,461-Speed 6304.24 samples/sec Loss 3.6566 LearningRate 0.0001 Epoch: 28 Global Step: 590690 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:27:53,690-Speed 6344.19 samples/sec Loss 3.6173 LearningRate 0.0001 Epoch: 28 Global Step: 590700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:27:56,936-Speed 6310.95 samples/sec Loss 3.6156 LearningRate 0.0001 Epoch: 28 Global Step: 590710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:00,184-Speed 6306.49 samples/sec Loss 3.6225 LearningRate 0.0001 Epoch: 28 Global Step: 590720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:03,427-Speed 6316.72 samples/sec Loss 3.6029 LearningRate 0.0001 Epoch: 28 Global Step: 590730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:06,672-Speed 6311.37 samples/sec Loss 3.5645 LearningRate 0.0001 Epoch: 28 Global Step: 590740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:09,927-Speed 6293.31 samples/sec Loss 3.6209 LearningRate 0.0001 Epoch: 28 Global Step: 590750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:13,174-Speed 6308.48 samples/sec Loss 3.6123 LearningRate 0.0001 Epoch: 28 Global Step: 590760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:16,477-Speed 6201.76 samples/sec Loss 3.6273 LearningRate 0.0001 Epoch: 28 Global Step: 590770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:19,723-Speed 6311.68 samples/sec Loss 3.5966 LearningRate 0.0001 Epoch: 28 Global Step: 590780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:22,981-Speed 6287.76 samples/sec Loss 3.6200 LearningRate 0.0001 Epoch: 28 Global Step: 590790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:26,224-Speed 6316.45 samples/sec Loss 3.6611 LearningRate 0.0001 Epoch: 28 Global Step: 590800 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 
21:28:29,456-Speed 6338.47 samples/sec Loss 3.5832 LearningRate 0.0001 Epoch: 28 Global Step: 590810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:32,701-Speed 6313.01 samples/sec Loss 3.6249 LearningRate 0.0001 Epoch: 28 Global Step: 590820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:35,951-Speed 6303.46 samples/sec Loss 3.6520 LearningRate 0.0001 Epoch: 28 Global Step: 590830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:39,198-Speed 6308.98 samples/sec Loss 3.6610 LearningRate 0.0001 Epoch: 28 Global Step: 590840 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:42,443-Speed 6312.01 samples/sec Loss 3.5772 LearningRate 0.0001 Epoch: 28 Global Step: 590850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:45,690-Speed 6309.48 samples/sec Loss 3.5810 LearningRate 0.0001 Epoch: 28 Global Step: 590860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:48,932-Speed 6317.32 samples/sec Loss 3.6768 LearningRate 0.0001 Epoch: 28 Global Step: 590870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:52,178-Speed 6312.08 samples/sec Loss 3.5927 LearningRate 0.0001 Epoch: 28 Global Step: 590880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:55,424-Speed 6310.61 samples/sec Loss 3.5674 LearningRate 0.0001 Epoch: 28 Global Step: 590890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:28:58,668-Speed 6314.73 samples/sec Loss 3.6511 LearningRate 0.0001 Epoch: 28 Global Step: 590900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:01,917-Speed 6303.44 samples/sec Loss 3.5961 LearningRate 0.0001 Epoch: 28 Global Step: 590910 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:29:05,161-Speed 6315.65 samples/sec Loss 3.6730 LearningRate 0.0001 Epoch: 28 Global Step: 590920 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:29:08,397-Speed 6330.14 samples/sec 
Loss 3.6095 LearningRate 0.0001 Epoch: 28 Global Step: 590930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:11,647-Speed 6303.47 samples/sec Loss 3.6333 LearningRate 0.0001 Epoch: 28 Global Step: 590940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:14,893-Speed 6310.06 samples/sec Loss 3.5598 LearningRate 0.0001 Epoch: 28 Global Step: 590950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:18,141-Speed 6306.94 samples/sec Loss 3.6193 LearningRate 0.0001 Epoch: 28 Global Step: 590960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:21,384-Speed 6316.56 samples/sec Loss 3.6563 LearningRate 0.0001 Epoch: 28 Global Step: 590970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:24,674-Speed 6226.77 samples/sec Loss 3.6243 LearningRate 0.0001 Epoch: 28 Global Step: 590980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:27,940-Speed 6270.99 samples/sec Loss 3.6190 LearningRate 0.0001 Epoch: 28 Global Step: 590990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:31,188-Speed 6307.34 samples/sec Loss 3.6366 LearningRate 0.0001 Epoch: 28 Global Step: 591000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:34,434-Speed 6311.46 samples/sec Loss 3.6156 LearningRate 0.0001 Epoch: 28 Global Step: 591010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:37,679-Speed 6312.23 samples/sec Loss 3.6654 LearningRate 0.0001 Epoch: 28 Global Step: 591020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:40,925-Speed 6312.63 samples/sec Loss 3.5647 LearningRate 0.0001 Epoch: 28 Global Step: 591030 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:29:44,168-Speed 6315.58 samples/sec Loss 3.6427 LearningRate 0.0001 Epoch: 28 Global Step: 591040 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:29:47,396-Speed 6346.07 samples/sec Loss 3.6426 LearningRate 0.0001 Epoch: 28 
Global Step: 591050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:50,641-Speed 6311.99 samples/sec Loss 3.6419 LearningRate 0.0001 Epoch: 28 Global Step: 591060 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:53,884-Speed 6317.52 samples/sec Loss 3.6063 LearningRate 0.0001 Epoch: 28 Global Step: 591070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:29:57,135-Speed 6300.46 samples/sec Loss 3.6428 LearningRate 0.0001 Epoch: 28 Global Step: 591080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:00,381-Speed 6311.35 samples/sec Loss 3.5921 LearningRate 0.0001 Epoch: 28 Global Step: 591090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:03,627-Speed 6310.03 samples/sec Loss 3.6063 LearningRate 0.0001 Epoch: 28 Global Step: 591100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:06,876-Speed 6305.52 samples/sec Loss 3.5798 LearningRate 0.0001 Epoch: 28 Global Step: 591110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:10,122-Speed 6310.66 samples/sec Loss 3.5888 LearningRate 0.0001 Epoch: 28 Global Step: 591120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:13,391-Speed 6267.22 samples/sec Loss 3.6268 LearningRate 0.0001 Epoch: 28 Global Step: 591130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:16,635-Speed 6314.14 samples/sec Loss 3.6440 LearningRate 0.0001 Epoch: 28 Global Step: 591140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:19,884-Speed 6305.24 samples/sec Loss 3.6307 LearningRate 0.0001 Epoch: 28 Global Step: 591150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:23,129-Speed 6311.68 samples/sec Loss 3.6629 LearningRate 0.0001 Epoch: 28 Global Step: 591160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:26,377-Speed 6307.66 samples/sec Loss 3.5661 LearningRate 0.0001 Epoch: 28 Global Step: 591170 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:30:29,622-Speed 6311.79 samples/sec Loss 3.6195 LearningRate 0.0001 Epoch: 28 Global Step: 591180 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:32,863-Speed 6320.84 samples/sec Loss 3.6327 LearningRate 0.0001 Epoch: 28 Global Step: 591190 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:36,168-Speed 6197.91 samples/sec Loss 3.6860 LearningRate 0.0001 Epoch: 28 Global Step: 591200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:39,528-Speed 6096.83 samples/sec Loss 3.6146 LearningRate 0.0001 Epoch: 28 Global Step: 591210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:42,851-Speed 6163.64 samples/sec Loss 3.5502 LearningRate 0.0001 Epoch: 28 Global Step: 591220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:46,100-Speed 6305.31 samples/sec Loss 3.6268 LearningRate 0.0001 Epoch: 28 Global Step: 591230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:49,445-Speed 6123.94 samples/sec Loss 3.6023 LearningRate 0.0001 Epoch: 28 Global Step: 591240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:30:52,758-Speed 6182.59 samples/sec Loss 3.6023 LearningRate 0.0001 Epoch: 28 Global Step: 591250 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:30:56,014-Speed 6292.39 samples/sec Loss 3.6453 LearningRate 0.0001 Epoch: 28 Global Step: 591260 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:30:59,247-Speed 6336.40 samples/sec Loss 3.5936 LearningRate 0.0001 Epoch: 28 Global Step: 591270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:02,491-Speed 6314.92 samples/sec Loss 3.6488 LearningRate 0.0001 Epoch: 28 Global Step: 591280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:05,737-Speed 6310.53 samples/sec Loss 3.6152 LearningRate 0.0001 Epoch: 28 Global Step: 591290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:31:08,985-Speed 6307.21 samples/sec Loss 3.6542 LearningRate 0.0001 Epoch: 28 Global Step: 591300 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:12,233-Speed 6307.81 samples/sec Loss 3.6231 LearningRate 0.0001 Epoch: 28 Global Step: 591310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:15,487-Speed 6294.96 samples/sec Loss 3.6639 LearningRate 0.0001 Epoch: 28 Global Step: 591320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:18,731-Speed 6313.78 samples/sec Loss 3.5771 LearningRate 0.0001 Epoch: 28 Global Step: 591330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:21,987-Speed 6292.29 samples/sec Loss 3.6199 LearningRate 0.0001 Epoch: 28 Global Step: 591340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:25,236-Speed 6305.60 samples/sec Loss 3.6305 LearningRate 0.0001 Epoch: 28 Global Step: 591350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:28,485-Speed 6304.86 samples/sec Loss 3.6153 LearningRate 0.0001 Epoch: 28 Global Step: 591360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:31,719-Speed 6333.37 samples/sec Loss 3.5626 LearningRate 0.0001 Epoch: 28 Global Step: 591370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:34,963-Speed 6314.09 samples/sec Loss 3.5780 LearningRate 0.0001 Epoch: 28 Global Step: 591380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:38,208-Speed 6313.23 samples/sec Loss 3.6270 LearningRate 0.0001 Epoch: 28 Global Step: 591390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:41,454-Speed 6310.58 samples/sec Loss 3.5619 LearningRate 0.0001 Epoch: 28 Global Step: 591400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:44,695-Speed 6319.48 samples/sec Loss 3.6308 LearningRate 0.0001 Epoch: 28 Global Step: 591410 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:47,943-Speed 6307.13 samples/sec Loss 
3.6360 LearningRate 0.0001 Epoch: 28 Global Step: 591420 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:51,190-Speed 6309.80 samples/sec Loss 3.6369 LearningRate 0.0001 Epoch: 28 Global Step: 591430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:54,431-Speed 6320.15 samples/sec Loss 3.6135 LearningRate 0.0001 Epoch: 28 Global Step: 591440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:31:57,675-Speed 6314.39 samples/sec Loss 3.6122 LearningRate 0.0001 Epoch: 28 Global Step: 591450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:00,919-Speed 6314.71 samples/sec Loss 3.5783 LearningRate 0.0001 Epoch: 28 Global Step: 591460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:04,164-Speed 6313.16 samples/sec Loss 3.5941 LearningRate 0.0001 Epoch: 28 Global Step: 591470 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:32:07,405-Speed 6320.71 samples/sec Loss 3.5939 LearningRate 0.0001 Epoch: 28 Global Step: 591480 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:32:10,651-Speed 6309.94 samples/sec Loss 3.5720 LearningRate 0.0001 Epoch: 28 Global Step: 591490 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:32:13,882-Speed 6341.04 samples/sec Loss 3.6336 LearningRate 0.0001 Epoch: 28 Global Step: 591500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:17,133-Speed 6301.09 samples/sec Loss 3.6122 LearningRate 0.0001 Epoch: 28 Global Step: 591510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:20,375-Speed 6318.21 samples/sec Loss 3.5587 LearningRate 0.0001 Epoch: 28 Global Step: 591520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:23,620-Speed 6313.14 samples/sec Loss 3.6780 LearningRate 0.0001 Epoch: 28 Global Step: 591530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:26,865-Speed 6312.44 samples/sec Loss 3.6200 LearningRate 0.0001 Epoch: 28 
Global Step: 591540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:30,110-Speed 6312.96 samples/sec Loss 3.5687 LearningRate 0.0001 Epoch: 28 Global Step: 591550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:33,356-Speed 6310.68 samples/sec Loss 3.7027 LearningRate 0.0001 Epoch: 28 Global Step: 591560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:36,603-Speed 6308.30 samples/sec Loss 3.6301 LearningRate 0.0001 Epoch: 28 Global Step: 591570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:39,851-Speed 6307.01 samples/sec Loss 3.6279 LearningRate 0.0001 Epoch: 28 Global Step: 591580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:43,099-Speed 6308.19 samples/sec Loss 3.6184 LearningRate 0.0001 Epoch: 28 Global Step: 591590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:46,341-Speed 6317.15 samples/sec Loss 3.6256 LearningRate 0.0001 Epoch: 28 Global Step: 591600 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:32:49,586-Speed 6312.69 samples/sec Loss 3.6730 LearningRate 0.0001 Epoch: 28 Global Step: 591610 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:32:52,816-Speed 6342.81 samples/sec Loss 3.6086 LearningRate 0.0001 Epoch: 28 Global Step: 591620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:56,065-Speed 6305.16 samples/sec Loss 3.6094 LearningRate 0.0001 Epoch: 28 Global Step: 591630 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:32:59,310-Speed 6312.54 samples/sec Loss 3.6582 LearningRate 0.0001 Epoch: 28 Global Step: 591640 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:02,552-Speed 6317.36 samples/sec Loss 3.6442 LearningRate 0.0001 Epoch: 28 Global Step: 591650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:05,798-Speed 6311.89 samples/sec Loss 3.6522 LearningRate 0.0001 Epoch: 28 Global Step: 591660 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:33:09,043-Speed 6312.83 samples/sec Loss 3.6511 LearningRate 0.0001 Epoch: 28 Global Step: 591670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:12,288-Speed 6312.16 samples/sec Loss 3.6289 LearningRate 0.0001 Epoch: 28 Global Step: 591680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:15,540-Speed 6298.43 samples/sec Loss 3.6239 LearningRate 0.0001 Epoch: 28 Global Step: 591690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:18,788-Speed 6307.77 samples/sec Loss 3.6363 LearningRate 0.0001 Epoch: 28 Global Step: 591700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:22,035-Speed 6307.55 samples/sec Loss 3.6441 LearningRate 0.0001 Epoch: 28 Global Step: 591710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:25,280-Speed 6314.98 samples/sec Loss 3.6188 LearningRate 0.0001 Epoch: 28 Global Step: 591720 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:33:28,522-Speed 6317.42 samples/sec Loss 3.6405 LearningRate 0.0001 Epoch: 28 Global Step: 591730 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:33:31,755-Speed 6336.82 samples/sec Loss 3.6246 LearningRate 0.0001 Epoch: 28 Global Step: 591740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:35,005-Speed 6303.43 samples/sec Loss 3.5969 LearningRate 0.0001 Epoch: 28 Global Step: 591750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:38,244-Speed 6323.61 samples/sec Loss 3.6874 LearningRate 0.0001 Epoch: 28 Global Step: 591760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:41,500-Speed 6291.20 samples/sec Loss 3.6184 LearningRate 0.0001 Epoch: 28 Global Step: 591770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:44,746-Speed 6312.16 samples/sec Loss 3.6722 LearningRate 0.0001 Epoch: 28 Global Step: 591780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:33:47,987-Speed 6319.82 samples/sec Loss 3.6298 LearningRate 0.0001 Epoch: 28 Global Step: 591790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:51,233-Speed 6310.98 samples/sec Loss 3.5897 LearningRate 0.0001 Epoch: 28 Global Step: 591800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:54,476-Speed 6317.12 samples/sec Loss 3.6079 LearningRate 0.0001 Epoch: 28 Global Step: 591810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:33:57,722-Speed 6310.08 samples/sec Loss 3.6254 LearningRate 0.0001 Epoch: 28 Global Step: 591820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:00,969-Speed 6308.19 samples/sec Loss 3.5766 LearningRate 0.0001 Epoch: 28 Global Step: 591830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:04,214-Speed 6313.43 samples/sec Loss 3.6062 LearningRate 0.0001 Epoch: 28 Global Step: 591840 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:34:07,444-Speed 6341.37 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 Global Step: 591850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:10,689-Speed 6312.45 samples/sec Loss 3.6649 LearningRate 0.0001 Epoch: 28 Global Step: 591860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:13,934-Speed 6313.80 samples/sec Loss 3.6383 LearningRate 0.0001 Epoch: 28 Global Step: 591870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:17,180-Speed 6309.15 samples/sec Loss 3.6324 LearningRate 0.0001 Epoch: 28 Global Step: 591880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:20,429-Speed 6305.71 samples/sec Loss 3.5746 LearningRate 0.0001 Epoch: 28 Global Step: 591890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:23,676-Speed 6308.67 samples/sec Loss 3.5756 LearningRate 0.0001 Epoch: 28 Global Step: 591900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:26,922-Speed 6310.76 samples/sec 
Loss 3.6099 LearningRate 0.0001 Epoch: 28 Global Step: 591910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:30,165-Speed 6316.13 samples/sec Loss 3.5944 LearningRate 0.0001 Epoch: 28 Global Step: 591920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:33,407-Speed 6319.59 samples/sec Loss 3.6018 LearningRate 0.0001 Epoch: 28 Global Step: 591930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:36,652-Speed 6312.80 samples/sec Loss 3.6702 LearningRate 0.0001 Epoch: 28 Global Step: 591940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:39,901-Speed 6304.98 samples/sec Loss 3.6040 LearningRate 0.0001 Epoch: 28 Global Step: 591950 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:34:43,131-Speed 6341.35 samples/sec Loss 3.6287 LearningRate 0.0001 Epoch: 28 Global Step: 591960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:46,378-Speed 6310.13 samples/sec Loss 3.6410 LearningRate 0.0001 Epoch: 28 Global Step: 591970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:49,623-Speed 6313.24 samples/sec Loss 3.6630 LearningRate 0.0001 Epoch: 28 Global Step: 591980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:52,882-Speed 6285.60 samples/sec Loss 3.6613 LearningRate 0.0001 Epoch: 28 Global Step: 591990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:56,131-Speed 6305.03 samples/sec Loss 3.6182 LearningRate 0.0001 Epoch: 28 Global Step: 592000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:34:59,380-Speed 6304.99 samples/sec Loss 3.6415 LearningRate 0.0001 Epoch: 28 Global Step: 592010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:02,625-Speed 6311.66 samples/sec Loss 3.6371 LearningRate 0.0001 Epoch: 28 Global Step: 592020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:05,871-Speed 6310.52 samples/sec Loss 3.6348 LearningRate 0.0001 Epoch: 28 
Global Step: 592030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:09,119-Speed 6307.19 samples/sec Loss 3.6763 LearningRate 0.0001 Epoch: 28 Global Step: 592040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:12,364-Speed 6312.96 samples/sec Loss 3.5995 LearningRate 0.0001 Epoch: 28 Global Step: 592050 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:15,611-Speed 6308.46 samples/sec Loss 3.5588 LearningRate 0.0001 Epoch: 28 Global Step: 592060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:35:18,858-Speed 6309.11 samples/sec Loss 3.6619 LearningRate 0.0001 Epoch: 28 Global Step: 592070 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:35:22,092-Speed 6333.46 samples/sec Loss 3.6245 LearningRate 0.0001 Epoch: 28 Global Step: 592080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:25,341-Speed 6305.65 samples/sec Loss 3.6291 LearningRate 0.0001 Epoch: 28 Global Step: 592090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:28,601-Speed 6283.19 samples/sec Loss 3.6210 LearningRate 0.0001 Epoch: 28 Global Step: 592100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:31,849-Speed 6306.31 samples/sec Loss 3.6137 LearningRate 0.0001 Epoch: 28 Global Step: 592110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:35,095-Speed 6311.49 samples/sec Loss 3.6166 LearningRate 0.0001 Epoch: 28 Global Step: 592120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:38,338-Speed 6316.72 samples/sec Loss 3.5961 LearningRate 0.0001 Epoch: 28 Global Step: 592130 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:41,596-Speed 6286.24 samples/sec Loss 3.6442 LearningRate 0.0001 Epoch: 28 Global Step: 592140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:44,844-Speed 6308.21 samples/sec Loss 3.6303 LearningRate 0.0001 Epoch: 28 Global Step: 592150 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:35:48,091-Speed 6307.93 samples/sec Loss 3.6129 LearningRate 0.0001 Epoch: 28 Global Step: 592160 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:51,373-Speed 6242.51 samples/sec Loss 3.5388 LearningRate 0.0001 Epoch: 28 Global Step: 592170 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:35:54,616-Speed 6317.31 samples/sec Loss 3.5809 LearningRate 0.0001 Epoch: 28 Global Step: 592180 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:35:57,864-Speed 6307.11 samples/sec Loss 3.5801 LearningRate 0.0001 Epoch: 28 Global Step: 592190 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:36:01,093-Speed 6343.41 samples/sec Loss 3.6402 LearningRate 0.0001 Epoch: 28 Global Step: 592200 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:04,337-Speed 6313.66 samples/sec Loss 3.6445 LearningRate 0.0001 Epoch: 28 Global Step: 592210 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:07,581-Speed 6315.89 samples/sec Loss 3.6125 LearningRate 0.0001 Epoch: 28 Global Step: 592220 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:10,826-Speed 6311.13 samples/sec Loss 3.6325 LearningRate 0.0001 Epoch: 28 Global Step: 592230 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:14,071-Speed 6313.34 samples/sec Loss 3.6437 LearningRate 0.0001 Epoch: 28 Global Step: 592240 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:17,317-Speed 6311.08 samples/sec Loss 3.6337 LearningRate 0.0001 Epoch: 28 Global Step: 592250 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:20,562-Speed 6312.32 samples/sec Loss 3.5429 LearningRate 0.0001 Epoch: 28 Global Step: 592260 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:23,810-Speed 6307.74 samples/sec Loss 3.6234 LearningRate 0.0001 Epoch: 28 Global Step: 592270 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:36:27,060-Speed 6301.94 samples/sec Loss 3.5915 LearningRate 0.0001 Epoch: 28 Global Step: 592280 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:30,315-Speed 6294.31 samples/sec Loss 3.6209 LearningRate 0.0001 Epoch: 28 Global Step: 592290 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:33,561-Speed 6309.34 samples/sec Loss 3.5429 LearningRate 0.0001 Epoch: 28 Global Step: 592300 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:36:36,822-Speed 6281.59 samples/sec Loss 3.5828 LearningRate 0.0001 Epoch: 28 Global Step: 592310 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:40,071-Speed 6305.44 samples/sec Loss 3.6297 LearningRate 0.0001 Epoch: 28 Global Step: 592320 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:43,319-Speed 6307.02 samples/sec Loss 3.6579 LearningRate 0.0001 Epoch: 28 Global Step: 592330 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:46,564-Speed 6312.38 samples/sec Loss 3.6636 LearningRate 0.0001 Epoch: 28 Global Step: 592340 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:49,812-Speed 6307.65 samples/sec Loss 3.6367 LearningRate 0.0001 Epoch: 28 Global Step: 592350 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:53,061-Speed 6304.73 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 Global Step: 592360 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:56,309-Speed 6305.76 samples/sec Loss 3.6202 LearningRate 0.0001 Epoch: 28 Global Step: 592370 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:36:59,555-Speed 6310.68 samples/sec Loss 3.5991 LearningRate 0.0001 Epoch: 28 Global Step: 592380 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:02,802-Speed 6310.69 samples/sec Loss 3.6537 LearningRate 0.0001 Epoch: 28 Global Step: 592390 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:06,050-Speed 6306.29 samples/sec 
Loss 3.6020 LearningRate 0.0001 Epoch: 28 Global Step: 592400 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:09,295-Speed 6312.36 samples/sec Loss 3.5864 LearningRate 0.0001 Epoch: 28 Global Step: 592410 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:37:12,538-Speed 6316.67 samples/sec Loss 3.6074 LearningRate 0.0001 Epoch: 28 Global Step: 592420 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:37:15,799-Speed 6281.41 samples/sec Loss 3.6105 LearningRate 0.0001 Epoch: 28 Global Step: 592430 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:19,055-Speed 6291.51 samples/sec Loss 3.5531 LearningRate 0.0001 Epoch: 28 Global Step: 592440 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:22,302-Speed 6309.45 samples/sec Loss 3.5744 LearningRate 0.0001 Epoch: 28 Global Step: 592450 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:25,548-Speed 6311.62 samples/sec Loss 3.5438 LearningRate 0.0001 Epoch: 28 Global Step: 592460 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:28,790-Speed 6317.03 samples/sec Loss 3.6115 LearningRate 0.0001 Epoch: 28 Global Step: 592470 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:32,036-Speed 6310.49 samples/sec Loss 3.6246 LearningRate 0.0001 Epoch: 28 Global Step: 592480 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:35,281-Speed 6314.08 samples/sec Loss 3.6508 LearningRate 0.0001 Epoch: 28 Global Step: 592490 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:38,522-Speed 6318.73 samples/sec Loss 3.5538 LearningRate 0.0001 Epoch: 28 Global Step: 592500 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:41,767-Speed 6313.09 samples/sec Loss 3.5769 LearningRate 0.0001 Epoch: 28 Global Step: 592510 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:45,015-Speed 6307.74 samples/sec Loss 3.5626 LearningRate 0.0001 Epoch: 28 
Global Step: 592520 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:48,246-Speed 6338.51 samples/sec Loss 3.5960 LearningRate 0.0001 Epoch: 28 Global Step: 592530 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:51,489-Speed 6316.79 samples/sec Loss 3.6130 LearningRate 0.0001 Epoch: 28 Global Step: 592540 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:54,735-Speed 6310.90 samples/sec Loss 3.5917 LearningRate 0.0001 Epoch: 28 Global Step: 592550 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:37:57,982-Speed 6308.84 samples/sec Loss 3.5751 LearningRate 0.0001 Epoch: 28 Global Step: 592560 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:01,235-Speed 6299.91 samples/sec Loss 3.6125 LearningRate 0.0001 Epoch: 28 Global Step: 592570 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:04,486-Speed 6300.09 samples/sec Loss 3.6232 LearningRate 0.0001 Epoch: 28 Global Step: 592580 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:07,729-Speed 6316.76 samples/sec Loss 3.5952 LearningRate 0.0001 Epoch: 28 Global Step: 592590 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:10,977-Speed 6307.54 samples/sec Loss 3.6509 LearningRate 0.0001 Epoch: 28 Global Step: 592600 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:14,220-Speed 6315.68 samples/sec Loss 3.6557 LearningRate 0.0001 Epoch: 28 Global Step: 592610 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:17,466-Speed 6311.55 samples/sec Loss 3.5990 LearningRate 0.0001 Epoch: 28 Global Step: 592620 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:20,711-Speed 6313.87 samples/sec Loss 3.6481 LearningRate 0.0001 Epoch: 28 Global Step: 592630 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:38:23,948-Speed 6327.64 samples/sec Loss 3.5962 LearningRate 0.0001 Epoch: 28 Global Step: 592640 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:38:27,195-Speed 6308.32 samples/sec Loss 3.6440 LearningRate 0.0001 Epoch: 28 Global Step: 592650 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:30,440-Speed 6314.80 samples/sec Loss 3.6375 LearningRate 0.0001 Epoch: 28 Global Step: 592660 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:33,690-Speed 6301.62 samples/sec Loss 3.6221 LearningRate 0.0001 Epoch: 28 Global Step: 592670 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:36,935-Speed 6312.91 samples/sec Loss 3.6206 LearningRate 0.0001 Epoch: 28 Global Step: 592680 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:40,180-Speed 6311.88 samples/sec Loss 3.6165 LearningRate 0.0001 Epoch: 28 Global Step: 592690 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:43,425-Speed 6312.76 samples/sec Loss 3.5768 LearningRate 0.0001 Epoch: 28 Global Step: 592700 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:46,673-Speed 6307.29 samples/sec Loss 3.6738 LearningRate 0.0001 Epoch: 28 Global Step: 592710 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:49,922-Speed 6305.01 samples/sec Loss 3.5998 LearningRate 0.0001 Epoch: 28 Global Step: 592720 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:53,167-Speed 6312.14 samples/sec Loss 3.5139 LearningRate 0.0001 Epoch: 28 Global Step: 592730 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:56,401-Speed 6335.20 samples/sec Loss 3.5549 LearningRate 0.0001 Epoch: 28 Global Step: 592740 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:38:59,646-Speed 6310.97 samples/sec Loss 3.6870 LearningRate 0.0001 Epoch: 28 Global Step: 592750 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:02,890-Speed 6314.73 samples/sec Loss 3.6902 LearningRate 0.0001 Epoch: 28 Global Step: 592760 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 
21:39:06,137-Speed 6309.39 samples/sec Loss 3.5705 LearningRate 0.0001 Epoch: 28 Global Step: 592770 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:09,378-Speed 6320.99 samples/sec Loss 3.5596 LearningRate 0.0001 Epoch: 28 Global Step: 592780 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:12,622-Speed 6314.10 samples/sec Loss 3.5914 LearningRate 0.0001 Epoch: 28 Global Step: 592790 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:15,868-Speed 6309.91 samples/sec Loss 3.5786 LearningRate 0.0001 Epoch: 28 Global Step: 592800 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:19,111-Speed 6316.84 samples/sec Loss 3.5995 LearningRate 0.0001 Epoch: 28 Global Step: 592810 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:22,358-Speed 6309.50 samples/sec Loss 3.5756 LearningRate 0.0001 Epoch: 28 Global Step: 592820 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:25,603-Speed 6312.71 samples/sec Loss 3.6352 LearningRate 0.0001 Epoch: 28 Global Step: 592830 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:28,850-Speed 6307.88 samples/sec Loss 3.5951 LearningRate 0.0001 Epoch: 28 Global Step: 592840 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:39:32,125-Speed 6256.05 samples/sec Loss 3.6207 LearningRate 0.0001 Epoch: 28 Global Step: 592850 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:35,369-Speed 6314.50 samples/sec Loss 3.6123 LearningRate 0.0001 Epoch: 28 Global Step: 592860 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:38,617-Speed 6306.57 samples/sec Loss 3.5806 LearningRate 0.0001 Epoch: 28 Global Step: 592870 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:41,862-Speed 6313.07 samples/sec Loss 3.5912 LearningRate 0.0001 Epoch: 28 Global Step: 592880 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:45,108-Speed 6310.69 samples/sec 
Loss 3.6222 LearningRate 0.0001 Epoch: 28 Global Step: 592890 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:48,353-Speed 6312.67 samples/sec Loss 3.5943 LearningRate 0.0001 Epoch: 28 Global Step: 592900 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:51,598-Speed 6312.43 samples/sec Loss 3.6908 LearningRate 0.0001 Epoch: 28 Global Step: 592910 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:54,846-Speed 6307.45 samples/sec Loss 3.5933 LearningRate 0.0001 Epoch: 28 Global Step: 592920 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:39:58,095-Speed 6304.84 samples/sec Loss 3.5905 LearningRate 0.0001 Epoch: 28 Global Step: 592930 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:01,342-Speed 6312.86 samples/sec Loss 3.6644 LearningRate 0.0001 Epoch: 28 Global Step: 592940 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:04,611-Speed 6266.20 samples/sec Loss 3.5541 LearningRate 0.0001 Epoch: 28 Global Step: 592950 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:07,903-Speed 6221.11 samples/sec Loss 3.6082 LearningRate 0.0001 Epoch: 28 Global Step: 592960 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:11,151-Speed 6307.73 samples/sec Loss 3.5782 LearningRate 0.0001 Epoch: 28 Global Step: 592970 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:14,397-Speed 6311.00 samples/sec Loss 3.6082 LearningRate 0.0001 Epoch: 28 Global Step: 592980 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:17,644-Speed 6309.17 samples/sec Loss 3.5855 LearningRate 0.0001 Epoch: 28 Global Step: 592990 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:20,892-Speed 6306.47 samples/sec Loss 3.5551 LearningRate 0.0001 Epoch: 28 Global Step: 593000 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:24,138-Speed 6309.58 samples/sec Loss 3.5945 LearningRate 0.0001 Epoch: 28 
Global Step: 593010 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:27,385-Speed 6309.69 samples/sec Loss 3.5404 LearningRate 0.0001 Epoch: 28 Global Step: 593020 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:30,630-Speed 6313.29 samples/sec Loss 3.6248 LearningRate 0.0001 Epoch: 28 Global Step: 593030 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:33,875-Speed 6311.35 samples/sec Loss 3.5874 LearningRate 0.0001 Epoch: 28 Global Step: 593040 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:37,124-Speed 6305.04 samples/sec Loss 3.5681 LearningRate 0.0001 Epoch: 28 Global Step: 593050 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:40:40,370-Speed 6310.07 samples/sec Loss 3.6472 LearningRate 0.0001 Epoch: 28 Global Step: 593060 Fp16 Grad Scale: 16384 Required: 22 hours Training: 2022-04-02 21:40:43,604-Speed 6334.87 samples/sec Loss 3.6302 LearningRate 0.0001 Epoch: 28 Global Step: 593070 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:46,849-Speed 6313.86 samples/sec Loss 3.6306 LearningRate 0.0001 Epoch: 28 Global Step: 593080 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:50,096-Speed 6309.30 samples/sec Loss 3.6079 LearningRate 0.0001 Epoch: 28 Global Step: 593090 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:53,339-Speed 6315.49 samples/sec Loss 3.6879 LearningRate 0.0001 Epoch: 28 Global Step: 593100 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:56,588-Speed 6306.00 samples/sec Loss 3.6252 LearningRate 0.0001 Epoch: 28 Global Step: 593110 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:40:59,836-Speed 6306.81 samples/sec Loss 3.6355 LearningRate 0.0001 Epoch: 28 Global Step: 593120 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:41:03,082-Speed 6310.34 samples/sec Loss 3.5730 LearningRate 0.0001 Epoch: 28 Global Step: 593130 Fp16 Grad Scale: 8192 
Required: 22 hours Training: 2022-04-02 21:41:06,326-Speed 6314.44 samples/sec Loss 3.6104 LearningRate 0.0001 Epoch: 28 Global Step: 593140 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:41:09,570-Speed 6313.76 samples/sec Loss 3.6430 LearningRate 0.0001 Epoch: 28 Global Step: 593150 Fp16 Grad Scale: 8192 Required: 22 hours Training: 2022-04-02 21:41:12,815-Speed 6313.82 samples/sec Loss 3.5870 LearningRate 0.0001 Epoch: 28 Global Step: 593160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:16,057-Speed 6317.96 samples/sec Loss 3.5635 LearningRate 0.0001 Epoch: 28 Global Step: 593170 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:41:19,305-Speed 6306.77 samples/sec Loss 3.6062 LearningRate 0.0001 Epoch: 28 Global Step: 593180 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:41:22,534-Speed 6344.61 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 28 Global Step: 593190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:25,779-Speed 6313.22 samples/sec Loss 3.5711 LearningRate 0.0001 Epoch: 28 Global Step: 593200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:29,028-Speed 6303.32 samples/sec Loss 3.6295 LearningRate 0.0001 Epoch: 28 Global Step: 593210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:32,274-Speed 6315.12 samples/sec Loss 3.6169 LearningRate 0.0001 Epoch: 28 Global Step: 593220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:35,518-Speed 6314.68 samples/sec Loss 3.5806 LearningRate 0.0001 Epoch: 28 Global Step: 593230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:38,761-Speed 6316.25 samples/sec Loss 3.5542 LearningRate 0.0001 Epoch: 28 Global Step: 593240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:42,004-Speed 6315.84 samples/sec Loss 3.5549 LearningRate 0.0001 Epoch: 28 Global Step: 593250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
21:41:45,248-Speed 6315.61 samples/sec Loss 3.5770 LearningRate 0.0001 Epoch: 28 Global Step: 593260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:48,495-Speed 6307.09 samples/sec Loss 3.5931 LearningRate 0.0001 Epoch: 28 Global Step: 593270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:51,740-Speed 6313.99 samples/sec Loss 3.6187 LearningRate 0.0001 Epoch: 28 Global Step: 593280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:54,972-Speed 6339.47 samples/sec Loss 3.5839 LearningRate 0.0001 Epoch: 28 Global Step: 593290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:41:58,218-Speed 6311.63 samples/sec Loss 3.6518 LearningRate 0.0001 Epoch: 28 Global Step: 593300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:01,463-Speed 6311.96 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 28 Global Step: 593310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:04,709-Speed 6311.80 samples/sec Loss 3.5977 LearningRate 0.0001 Epoch: 28 Global Step: 593320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:07,953-Speed 6314.32 samples/sec Loss 3.6372 LearningRate 0.0001 Epoch: 28 Global Step: 593330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:11,204-Speed 6300.81 samples/sec Loss 3.6226 LearningRate 0.0001 Epoch: 28 Global Step: 593340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:14,454-Speed 6303.14 samples/sec Loss 3.5810 LearningRate 0.0001 Epoch: 28 Global Step: 593350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:17,702-Speed 6307.35 samples/sec Loss 3.6222 LearningRate 0.0001 Epoch: 28 Global Step: 593360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:20,946-Speed 6314.33 samples/sec Loss 3.6573 LearningRate 0.0001 Epoch: 28 Global Step: 593370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:24,191-Speed 6311.84 samples/sec Loss 
3.6129 LearningRate 0.0001 Epoch: 28 Global Step: 593380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:27,436-Speed 6311.99 samples/sec Loss 3.6005 LearningRate 0.0001 Epoch: 28 Global Step: 593390 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:42:30,689-Speed 6298.08 samples/sec Loss 3.6198 LearningRate 0.0001 Epoch: 28 Global Step: 593400 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:42:33,949-Speed 6283.31 samples/sec Loss 3.6026 LearningRate 0.0001 Epoch: 28 Global Step: 593410 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:42:37,203-Speed 6295.78 samples/sec Loss 3.5860 LearningRate 0.0001 Epoch: 28 Global Step: 593420 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:42:40,433-Speed 6342.20 samples/sec Loss 3.6386 LearningRate 0.0001 Epoch: 28 Global Step: 593430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:43,678-Speed 6311.96 samples/sec Loss 3.6042 LearningRate 0.0001 Epoch: 28 Global Step: 593440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:46,929-Speed 6299.97 samples/sec Loss 3.6143 LearningRate 0.0001 Epoch: 28 Global Step: 593450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:50,190-Speed 6283.36 samples/sec Loss 3.6226 LearningRate 0.0001 Epoch: 28 Global Step: 593460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:53,434-Speed 6313.87 samples/sec Loss 3.6164 LearningRate 0.0001 Epoch: 28 Global Step: 593470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:56,677-Speed 6315.83 samples/sec Loss 3.5925 LearningRate 0.0001 Epoch: 28 Global Step: 593480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:42:59,931-Speed 6295.32 samples/sec Loss 3.6480 LearningRate 0.0001 Epoch: 28 Global Step: 593490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:03,176-Speed 6313.19 samples/sec Loss 3.6358 LearningRate 0.0001 Epoch: 28 
Global Step: 593500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:06,429-Speed 6297.08 samples/sec Loss 3.6426 LearningRate 0.0001 Epoch: 28 Global Step: 593510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:09,677-Speed 6307.39 samples/sec Loss 3.6583 LearningRate 0.0001 Epoch: 28 Global Step: 593520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:12,910-Speed 6336.06 samples/sec Loss 3.5917 LearningRate 0.0001 Epoch: 28 Global Step: 593530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:16,156-Speed 6310.80 samples/sec Loss 3.6100 LearningRate 0.0001 Epoch: 28 Global Step: 593540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:19,402-Speed 6311.99 samples/sec Loss 3.6364 LearningRate 0.0001 Epoch: 28 Global Step: 593550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:22,648-Speed 6309.19 samples/sec Loss 3.6072 LearningRate 0.0001 Epoch: 28 Global Step: 593560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:25,896-Speed 6307.96 samples/sec Loss 3.5809 LearningRate 0.0001 Epoch: 28 Global Step: 593570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:29,142-Speed 6309.49 samples/sec Loss 3.6049 LearningRate 0.0001 Epoch: 28 Global Step: 593580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:32,390-Speed 6306.59 samples/sec Loss 3.5929 LearningRate 0.0001 Epoch: 28 Global Step: 593590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:35,633-Speed 6317.76 samples/sec Loss 3.6548 LearningRate 0.0001 Epoch: 28 Global Step: 593600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:38,879-Speed 6310.98 samples/sec Loss 3.6234 LearningRate 0.0001 Epoch: 28 Global Step: 593610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:42,129-Speed 6301.15 samples/sec Loss 3.5862 LearningRate 0.0001 Epoch: 28 Global Step: 593620 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:43:45,359-Speed 6342.68 samples/sec Loss 3.5949 LearningRate 0.0001 Epoch: 28 Global Step: 593630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:48,606-Speed 6309.19 samples/sec Loss 3.6059 LearningRate 0.0001 Epoch: 28 Global Step: 593640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:51,853-Speed 6308.00 samples/sec Loss 3.6079 LearningRate 0.0001 Epoch: 28 Global Step: 593650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:55,097-Speed 6314.72 samples/sec Loss 3.6177 LearningRate 0.0001 Epoch: 28 Global Step: 593660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:43:58,344-Speed 6309.59 samples/sec Loss 3.5672 LearningRate 0.0001 Epoch: 28 Global Step: 593670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:01,590-Speed 6309.92 samples/sec Loss 3.6099 LearningRate 0.0001 Epoch: 28 Global Step: 593680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:04,837-Speed 6308.70 samples/sec Loss 3.6192 LearningRate 0.0001 Epoch: 28 Global Step: 593690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:08,090-Speed 6297.11 samples/sec Loss 3.6157 LearningRate 0.0001 Epoch: 28 Global Step: 593700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:11,335-Speed 6313.84 samples/sec Loss 3.5834 LearningRate 0.0001 Epoch: 28 Global Step: 593710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:14,608-Speed 6258.02 samples/sec Loss 3.6046 LearningRate 0.0001 Epoch: 28 Global Step: 593720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:17,859-Speed 6301.26 samples/sec Loss 3.6105 LearningRate 0.0001 Epoch: 28 Global Step: 593730 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:44:21,107-Speed 6308.21 samples/sec Loss 3.6330 LearningRate 0.0001 Epoch: 28 Global Step: 593740 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 
21:44:24,339-Speed 6337.15 samples/sec Loss 3.5719 LearningRate 0.0001 Epoch: 28 Global Step: 593750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:27,586-Speed 6308.41 samples/sec Loss 3.6568 LearningRate 0.0001 Epoch: 28 Global Step: 593760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:30,832-Speed 6311.14 samples/sec Loss 3.5882 LearningRate 0.0001 Epoch: 28 Global Step: 593770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:34,077-Speed 6313.44 samples/sec Loss 3.5706 LearningRate 0.0001 Epoch: 28 Global Step: 593780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:37,319-Speed 6318.80 samples/sec Loss 3.5866 LearningRate 0.0001 Epoch: 28 Global Step: 593790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:40,563-Speed 6313.63 samples/sec Loss 3.6471 LearningRate 0.0001 Epoch: 28 Global Step: 593800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:43,805-Speed 6319.31 samples/sec Loss 3.5955 LearningRate 0.0001 Epoch: 28 Global Step: 593810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:47,053-Speed 6305.69 samples/sec Loss 3.5598 LearningRate 0.0001 Epoch: 28 Global Step: 593820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:50,301-Speed 6307.93 samples/sec Loss 3.6241 LearningRate 0.0001 Epoch: 28 Global Step: 593830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:53,546-Speed 6311.94 samples/sec Loss 3.5602 LearningRate 0.0001 Epoch: 28 Global Step: 593840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:44:56,778-Speed 6337.37 samples/sec Loss 3.5922 LearningRate 0.0001 Epoch: 28 Global Step: 593850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:00,025-Speed 6309.52 samples/sec Loss 3.6736 LearningRate 0.0001 Epoch: 28 Global Step: 593860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:03,277-Speed 6298.06 samples/sec Loss 
3.6048 LearningRate 0.0001 Epoch: 28 Global Step: 593870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:06,524-Speed 6310.24 samples/sec Loss 3.6369 LearningRate 0.0001 Epoch: 28 Global Step: 593880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:09,767-Speed 6315.57 samples/sec Loss 3.6035 LearningRate 0.0001 Epoch: 28 Global Step: 593890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:13,011-Speed 6314.06 samples/sec Loss 3.5903 LearningRate 0.0001 Epoch: 28 Global Step: 593900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:16,256-Speed 6312.36 samples/sec Loss 3.6459 LearningRate 0.0001 Epoch: 28 Global Step: 593910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:19,502-Speed 6311.77 samples/sec Loss 3.5972 LearningRate 0.0001 Epoch: 28 Global Step: 593920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:22,744-Speed 6317.54 samples/sec Loss 3.6434 LearningRate 0.0001 Epoch: 28 Global Step: 593930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:25,992-Speed 6306.85 samples/sec Loss 3.5899 LearningRate 0.0001 Epoch: 28 Global Step: 593940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:29,223-Speed 6340.79 samples/sec Loss 3.6156 LearningRate 0.0001 Epoch: 28 Global Step: 593950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:32,474-Speed 6302.37 samples/sec Loss 3.5878 LearningRate 0.0001 Epoch: 28 Global Step: 593960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:35,722-Speed 6307.21 samples/sec Loss 3.5914 LearningRate 0.0001 Epoch: 28 Global Step: 593970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:38,964-Speed 6317.64 samples/sec Loss 3.6134 LearningRate 0.0001 Epoch: 28 Global Step: 593980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:42,315-Speed 6113.74 samples/sec Loss 3.5940 LearningRate 0.0001 Epoch: 28 Global 
Step: 593990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:45,559-Speed 6313.88 samples/sec Loss 3.6291 LearningRate 0.0001 Epoch: 28 Global Step: 594000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:48,804-Speed 6313.20 samples/sec Loss 3.6078 LearningRate 0.0001 Epoch: 28 Global Step: 594010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:52,049-Speed 6312.07 samples/sec Loss 3.6086 LearningRate 0.0001 Epoch: 28 Global Step: 594020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:55,295-Speed 6311.26 samples/sec Loss 3.6117 LearningRate 0.0001 Epoch: 28 Global Step: 594030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:45:58,546-Speed 6300.25 samples/sec Loss 3.6056 LearningRate 0.0001 Epoch: 28 Global Step: 594040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:01,790-Speed 6314.52 samples/sec Loss 3.5844 LearningRate 0.0001 Epoch: 28 Global Step: 594050 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:46:05,020-Speed 6341.39 samples/sec Loss 3.6216 LearningRate 0.0001 Epoch: 28 Global Step: 594060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:08,265-Speed 6313.08 samples/sec Loss 3.5716 LearningRate 0.0001 Epoch: 28 Global Step: 594070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:11,507-Speed 6319.00 samples/sec Loss 3.6480 LearningRate 0.0001 Epoch: 28 Global Step: 594080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:14,751-Speed 6314.61 samples/sec Loss 3.6169 LearningRate 0.0001 Epoch: 28 Global Step: 594090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:17,996-Speed 6312.79 samples/sec Loss 3.5720 LearningRate 0.0001 Epoch: 28 Global Step: 594100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:21,245-Speed 6305.22 samples/sec Loss 3.6508 LearningRate 0.0001 Epoch: 28 Global Step: 594110 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:46:24,493-Speed 6305.71 samples/sec Loss 3.5760 LearningRate 0.0001 Epoch: 28 Global Step: 594120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:27,736-Speed 6316.01 samples/sec Loss 3.5791 LearningRate 0.0001 Epoch: 28 Global Step: 594130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:30,987-Speed 6302.57 samples/sec Loss 3.5683 LearningRate 0.0001 Epoch: 28 Global Step: 594140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:34,234-Speed 6307.87 samples/sec Loss 3.5948 LearningRate 0.0001 Epoch: 28 Global Step: 594150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:37,479-Speed 6312.24 samples/sec Loss 3.6039 LearningRate 0.0001 Epoch: 28 Global Step: 594160 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:46:40,724-Speed 6312.82 samples/sec Loss 3.5758 LearningRate 0.0001 Epoch: 28 Global Step: 594170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:43,973-Speed 6306.51 samples/sec Loss 3.5844 LearningRate 0.0001 Epoch: 28 Global Step: 594180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:47,222-Speed 6305.37 samples/sec Loss 3.5710 LearningRate 0.0001 Epoch: 28 Global Step: 594190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:50,470-Speed 6307.04 samples/sec Loss 3.5415 LearningRate 0.0001 Epoch: 28 Global Step: 594200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:53,715-Speed 6311.21 samples/sec Loss 3.6171 LearningRate 0.0001 Epoch: 28 Global Step: 594210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:46:56,957-Speed 6319.63 samples/sec Loss 3.5238 LearningRate 0.0001 Epoch: 28 Global Step: 594220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:00,203-Speed 6310.47 samples/sec Loss 3.5953 LearningRate 0.0001 Epoch: 28 Global Step: 594230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
21:47:03,450-Speed 6308.44 samples/sec Loss 3.6127 LearningRate 0.0001 Epoch: 28 Global Step: 594240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:06,694-Speed 6315.13 samples/sec Loss 3.6695 LearningRate 0.0001 Epoch: 28 Global Step: 594250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:09,942-Speed 6307.41 samples/sec Loss 3.6067 LearningRate 0.0001 Epoch: 28 Global Step: 594260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:13,183-Speed 6319.81 samples/sec Loss 3.6368 LearningRate 0.0001 Epoch: 28 Global Step: 594270 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:47:16,416-Speed 6336.03 samples/sec Loss 3.5490 LearningRate 0.0001 Epoch: 28 Global Step: 594280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:19,661-Speed 6313.15 samples/sec Loss 3.5855 LearningRate 0.0001 Epoch: 28 Global Step: 594290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:22,904-Speed 6315.76 samples/sec Loss 3.5862 LearningRate 0.0001 Epoch: 28 Global Step: 594300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:26,152-Speed 6306.21 samples/sec Loss 3.6263 LearningRate 0.0001 Epoch: 28 Global Step: 594310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:29,400-Speed 6307.60 samples/sec Loss 3.6568 LearningRate 0.0001 Epoch: 28 Global Step: 594320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:32,650-Speed 6302.01 samples/sec Loss 3.6132 LearningRate 0.0001 Epoch: 28 Global Step: 594330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:35,894-Speed 6314.90 samples/sec Loss 3.5449 LearningRate 0.0001 Epoch: 28 Global Step: 594340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:39,142-Speed 6307.78 samples/sec Loss 3.5672 LearningRate 0.0001 Epoch: 28 Global Step: 594350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:42,388-Speed 6310.37 samples/sec 
Loss 3.5704 LearningRate 0.0001 Epoch: 28 Global Step: 594360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:45,643-Speed 6293.16 samples/sec Loss 3.5411 LearningRate 0.0001 Epoch: 28 Global Step: 594370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:48,877-Speed 6334.84 samples/sec Loss 3.5958 LearningRate 0.0001 Epoch: 28 Global Step: 594380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:52,117-Speed 6320.81 samples/sec Loss 3.5960 LearningRate 0.0001 Epoch: 28 Global Step: 594390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:55,363-Speed 6312.23 samples/sec Loss 3.6279 LearningRate 0.0001 Epoch: 28 Global Step: 594400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:47:58,610-Speed 6309.77 samples/sec Loss 3.5592 LearningRate 0.0001 Epoch: 28 Global Step: 594410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:01,855-Speed 6313.01 samples/sec Loss 3.5828 LearningRate 0.0001 Epoch: 28 Global Step: 594420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:05,103-Speed 6307.87 samples/sec Loss 3.6707 LearningRate 0.0001 Epoch: 28 Global Step: 594430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:08,350-Speed 6309.81 samples/sec Loss 3.6068 LearningRate 0.0001 Epoch: 28 Global Step: 594440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:11,596-Speed 6309.10 samples/sec Loss 3.6258 LearningRate 0.0001 Epoch: 28 Global Step: 594450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:14,844-Speed 6308.19 samples/sec Loss 3.5771 LearningRate 0.0001 Epoch: 28 Global Step: 594460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:18,089-Speed 6310.94 samples/sec Loss 3.6083 LearningRate 0.0001 Epoch: 28 Global Step: 594470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:21,336-Speed 6308.93 samples/sec Loss 3.5944 LearningRate 0.0001 Epoch: 28 
Global Step: 594480 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:48:24,567-Speed 6340.37 samples/sec Loss 3.5981 LearningRate 0.0001 Epoch: 28 Global Step: 594490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:27,815-Speed 6308.01 samples/sec Loss 3.5855 LearningRate 0.0001 Epoch: 28 Global Step: 594500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:31,061-Speed 6309.79 samples/sec Loss 3.5932 LearningRate 0.0001 Epoch: 28 Global Step: 594510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:34,308-Speed 6309.14 samples/sec Loss 3.5817 LearningRate 0.0001 Epoch: 28 Global Step: 594520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:37,555-Speed 6309.51 samples/sec Loss 3.6096 LearningRate 0.0001 Epoch: 28 Global Step: 594530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:40,798-Speed 6315.48 samples/sec Loss 3.5942 LearningRate 0.0001 Epoch: 28 Global Step: 594540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:44,042-Speed 6315.59 samples/sec Loss 3.6258 LearningRate 0.0001 Epoch: 28 Global Step: 594550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:47,289-Speed 6308.18 samples/sec Loss 3.5744 LearningRate 0.0001 Epoch: 28 Global Step: 594560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:50,550-Speed 6282.40 samples/sec Loss 3.5332 LearningRate 0.0001 Epoch: 28 Global Step: 594570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:53,793-Speed 6314.90 samples/sec Loss 3.6062 LearningRate 0.0001 Epoch: 28 Global Step: 594580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:48:57,041-Speed 6307.59 samples/sec Loss 3.6113 LearningRate 0.0001 Epoch: 28 Global Step: 594590 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:49:00,273-Speed 6337.63 samples/sec Loss 3.6056 LearningRate 0.0001 Epoch: 28 Global Step: 594600 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:49:03,519-Speed 6310.89 samples/sec Loss 3.6324 LearningRate 0.0001 Epoch: 28 Global Step: 594610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:06,767-Speed 6308.00 samples/sec Loss 3.6336 LearningRate 0.0001 Epoch: 28 Global Step: 594620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:10,013-Speed 6310.09 samples/sec Loss 3.5956 LearningRate 0.0001 Epoch: 28 Global Step: 594630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:13,259-Speed 6311.98 samples/sec Loss 3.5824 LearningRate 0.0001 Epoch: 28 Global Step: 594640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:16,566-Speed 6193.97 samples/sec Loss 3.5727 LearningRate 0.0001 Epoch: 28 Global Step: 594650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:19,853-Speed 6231.93 samples/sec Loss 3.6611 LearningRate 0.0001 Epoch: 28 Global Step: 594660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:23,103-Speed 6302.77 samples/sec Loss 3.5960 LearningRate 0.0001 Epoch: 28 Global Step: 594670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:26,345-Speed 6317.55 samples/sec Loss 3.6791 LearningRate 0.0001 Epoch: 28 Global Step: 594680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:29,591-Speed 6310.68 samples/sec Loss 3.6182 LearningRate 0.0001 Epoch: 28 Global Step: 594690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:32,838-Speed 6309.52 samples/sec Loss 3.5929 LearningRate 0.0001 Epoch: 28 Global Step: 594700 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:49:36,084-Speed 6310.09 samples/sec Loss 3.5934 LearningRate 0.0001 Epoch: 28 Global Step: 594710 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:49:39,329-Speed 6313.28 samples/sec Loss 3.5911 LearningRate 0.0001 Epoch: 28 Global Step: 594720 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 
21:49:42,561-Speed 6338.44 samples/sec Loss 3.6019 LearningRate 0.0001 Epoch: 28 Global Step: 594730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:45,806-Speed 6313.09 samples/sec Loss 3.5867 LearningRate 0.0001 Epoch: 28 Global Step: 594740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:49,052-Speed 6309.20 samples/sec Loss 3.6257 LearningRate 0.0001 Epoch: 28 Global Step: 594750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:52,301-Speed 6305.89 samples/sec Loss 3.6036 LearningRate 0.0001 Epoch: 28 Global Step: 594760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:55,547-Speed 6310.65 samples/sec Loss 3.6624 LearningRate 0.0001 Epoch: 28 Global Step: 594770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:49:58,793-Speed 6309.58 samples/sec Loss 3.5588 LearningRate 0.0001 Epoch: 28 Global Step: 594780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:02,039-Speed 6311.18 samples/sec Loss 3.6146 LearningRate 0.0001 Epoch: 28 Global Step: 594790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:05,283-Speed 6314.29 samples/sec Loss 3.5983 LearningRate 0.0001 Epoch: 28 Global Step: 594800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:08,529-Speed 6311.59 samples/sec Loss 3.6345 LearningRate 0.0001 Epoch: 28 Global Step: 594810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:11,773-Speed 6313.99 samples/sec Loss 3.6381 LearningRate 0.0001 Epoch: 28 Global Step: 594820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:15,019-Speed 6311.36 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 28 Global Step: 594830 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:50:18,254-Speed 6332.94 samples/sec Loss 3.5923 LearningRate 0.0001 Epoch: 28 Global Step: 594840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:21,499-Speed 6313.47 samples/sec 
Loss 3.6224 LearningRate 0.0001 Epoch: 28 Global Step: 594850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:24,748-Speed 6304.56 samples/sec Loss 3.6138 LearningRate 0.0001 Epoch: 28 Global Step: 594860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:27,990-Speed 6317.27 samples/sec Loss 3.5742 LearningRate 0.0001 Epoch: 28 Global Step: 594870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:31,235-Speed 6313.27 samples/sec Loss 3.6138 LearningRate 0.0001 Epoch: 28 Global Step: 594880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:34,481-Speed 6310.12 samples/sec Loss 3.6319 LearningRate 0.0001 Epoch: 28 Global Step: 594890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:37,728-Speed 6310.02 samples/sec Loss 3.5834 LearningRate 0.0001 Epoch: 28 Global Step: 594900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:40,973-Speed 6312.26 samples/sec Loss 3.5631 LearningRate 0.0001 Epoch: 28 Global Step: 594910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:44,215-Speed 6319.17 samples/sec Loss 3.5756 LearningRate 0.0001 Epoch: 28 Global Step: 594920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:47,460-Speed 6313.02 samples/sec Loss 3.5743 LearningRate 0.0001 Epoch: 28 Global Step: 594930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:50,698-Speed 6326.39 samples/sec Loss 3.5374 LearningRate 0.0001 Epoch: 28 Global Step: 594940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:53,941-Speed 6315.65 samples/sec Loss 3.5457 LearningRate 0.0001 Epoch: 28 Global Step: 594950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:50:57,190-Speed 6305.38 samples/sec Loss 3.5776 LearningRate 0.0001 Epoch: 28 Global Step: 594960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:00,434-Speed 6314.67 samples/sec Loss 3.5911 LearningRate 0.0001 Epoch: 28 
Global Step: 594970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:03,678-Speed 6313.18 samples/sec Loss 3.5596 LearningRate 0.0001 Epoch: 28 Global Step: 594980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:06,921-Speed 6318.58 samples/sec Loss 3.5448 LearningRate 0.0001 Epoch: 28 Global Step: 594990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:10,172-Speed 6299.41 samples/sec Loss 3.6347 LearningRate 0.0001 Epoch: 28 Global Step: 595000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:13,475-Speed 6202.61 samples/sec Loss 3.6107 LearningRate 0.0001 Epoch: 28 Global Step: 595010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:16,718-Speed 6315.63 samples/sec Loss 3.6350 LearningRate 0.0001 Epoch: 28 Global Step: 595020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:19,960-Speed 6318.77 samples/sec Loss 3.5229 LearningRate 0.0001 Epoch: 28 Global Step: 595030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:23,199-Speed 6324.31 samples/sec Loss 3.5979 LearningRate 0.0001 Epoch: 28 Global Step: 595040 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:51:26,430-Speed 6340.74 samples/sec Loss 3.5505 LearningRate 0.0001 Epoch: 28 Global Step: 595050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:29,676-Speed 6311.67 samples/sec Loss 3.6113 LearningRate 0.0001 Epoch: 28 Global Step: 595060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:32,928-Speed 6299.25 samples/sec Loss 3.5557 LearningRate 0.0001 Epoch: 28 Global Step: 595070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:36,177-Speed 6305.38 samples/sec Loss 3.6443 LearningRate 0.0001 Epoch: 28 Global Step: 595080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:39,418-Speed 6320.04 samples/sec Loss 3.5103 LearningRate 0.0001 Epoch: 28 Global Step: 595090 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:51:42,666-Speed 6307.69 samples/sec Loss 3.5971 LearningRate 0.0001 Epoch: 28 Global Step: 595100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:45,910-Speed 6313.83 samples/sec Loss 3.5786 LearningRate 0.0001 Epoch: 28 Global Step: 595110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:49,156-Speed 6310.38 samples/sec Loss 3.5788 LearningRate 0.0001 Epoch: 28 Global Step: 595120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:52,398-Speed 6319.00 samples/sec Loss 3.6054 LearningRate 0.0001 Epoch: 28 Global Step: 595130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:55,644-Speed 6310.55 samples/sec Loss 3.5817 LearningRate 0.0001 Epoch: 28 Global Step: 595140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:51:58,895-Speed 6300.01 samples/sec Loss 3.5964 LearningRate 0.0001 Epoch: 28 Global Step: 595150 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:52:02,135-Speed 6323.54 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 Global Step: 595160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:05,386-Speed 6304.68 samples/sec Loss 3.6227 LearningRate 0.0001 Epoch: 28 Global Step: 595170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:08,630-Speed 6312.96 samples/sec Loss 3.5512 LearningRate 0.0001 Epoch: 28 Global Step: 595180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:11,881-Speed 6302.13 samples/sec Loss 3.6221 LearningRate 0.0001 Epoch: 28 Global Step: 595190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:15,123-Speed 6318.98 samples/sec Loss 3.5702 LearningRate 0.0001 Epoch: 28 Global Step: 595200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:18,369-Speed 6310.96 samples/sec Loss 3.5963 LearningRate 0.0001 Epoch: 28 Global Step: 595210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
21:52:21,611-Speed 6316.58 samples/sec Loss 3.5559 LearningRate 0.0001 Epoch: 28 Global Step: 595220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:24,857-Speed 6311.62 samples/sec Loss 3.5773 LearningRate 0.0001 Epoch: 28 Global Step: 595230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:28,105-Speed 6306.38 samples/sec Loss 3.5979 LearningRate 0.0001 Epoch: 28 Global Step: 595240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:31,351-Speed 6311.28 samples/sec Loss 3.5868 LearningRate 0.0001 Epoch: 28 Global Step: 595250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:34,579-Speed 6345.52 samples/sec Loss 3.5564 LearningRate 0.0001 Epoch: 28 Global Step: 595260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:37,827-Speed 6306.49 samples/sec Loss 3.5988 LearningRate 0.0001 Epoch: 28 Global Step: 595270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:41,068-Speed 6321.10 samples/sec Loss 3.5545 LearningRate 0.0001 Epoch: 28 Global Step: 595280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:44,313-Speed 6313.65 samples/sec Loss 3.6160 LearningRate 0.0001 Epoch: 28 Global Step: 595290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:47,560-Speed 6309.09 samples/sec Loss 3.5863 LearningRate 0.0001 Epoch: 28 Global Step: 595300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:50,805-Speed 6312.34 samples/sec Loss 3.5622 LearningRate 0.0001 Epoch: 28 Global Step: 595310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:54,057-Speed 6299.25 samples/sec Loss 3.6088 LearningRate 0.0001 Epoch: 28 Global Step: 595320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:52:57,298-Speed 6320.69 samples/sec Loss 3.5807 LearningRate 0.0001 Epoch: 28 Global Step: 595330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:00,543-Speed 6313.34 samples/sec Loss 
3.6254 LearningRate 0.0001 Epoch: 28 Global Step: 595340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:03,793-Speed 6302.60 samples/sec Loss 3.5804 LearningRate 0.0001 Epoch: 28 Global Step: 595350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:07,045-Speed 6297.77 samples/sec Loss 3.5674 LearningRate 0.0001 Epoch: 28 Global Step: 595360 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:53:10,276-Speed 6339.93 samples/sec Loss 3.6285 LearningRate 0.0001 Epoch: 28 Global Step: 595370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:13,520-Speed 6315.69 samples/sec Loss 3.6052 LearningRate 0.0001 Epoch: 28 Global Step: 595380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:16,765-Speed 6313.12 samples/sec Loss 3.6696 LearningRate 0.0001 Epoch: 28 Global Step: 595390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:20,016-Speed 6299.88 samples/sec Loss 3.6490 LearningRate 0.0001 Epoch: 28 Global Step: 595400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:23,266-Speed 6303.01 samples/sec Loss 3.5806 LearningRate 0.0001 Epoch: 28 Global Step: 595410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:26,508-Speed 6318.20 samples/sec Loss 3.6309 LearningRate 0.0001 Epoch: 28 Global Step: 595420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:29,755-Speed 6309.47 samples/sec Loss 3.6034 LearningRate 0.0001 Epoch: 28 Global Step: 595430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:33,002-Speed 6308.91 samples/sec Loss 3.6127 LearningRate 0.0001 Epoch: 28 Global Step: 595440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:36,249-Speed 6312.97 samples/sec Loss 3.6072 LearningRate 0.0001 Epoch: 28 Global Step: 595450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:39,491-Speed 6317.27 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 
Global Step: 595460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:42,724-Speed 6336.01 samples/sec Loss 3.6654 LearningRate 0.0001 Epoch: 28 Global Step: 595470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:45,974-Speed 6302.76 samples/sec Loss 3.5567 LearningRate 0.0001 Epoch: 28 Global Step: 595480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:49,218-Speed 6314.53 samples/sec Loss 3.6689 LearningRate 0.0001 Epoch: 28 Global Step: 595490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:52,463-Speed 6312.44 samples/sec Loss 3.5634 LearningRate 0.0001 Epoch: 28 Global Step: 595500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:55,713-Speed 6304.45 samples/sec Loss 3.5605 LearningRate 0.0001 Epoch: 28 Global Step: 595510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:53:58,959-Speed 6311.01 samples/sec Loss 3.5903 LearningRate 0.0001 Epoch: 28 Global Step: 595520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:02,208-Speed 6305.11 samples/sec Loss 3.6031 LearningRate 0.0001 Epoch: 28 Global Step: 595530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:05,453-Speed 6312.49 samples/sec Loss 3.5756 LearningRate 0.0001 Epoch: 28 Global Step: 595540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:08,699-Speed 6310.14 samples/sec Loss 3.6182 LearningRate 0.0001 Epoch: 28 Global Step: 595550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:11,947-Speed 6306.94 samples/sec Loss 3.5922 LearningRate 0.0001 Epoch: 28 Global Step: 595560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:15,191-Speed 6314.45 samples/sec Loss 3.6399 LearningRate 0.0001 Epoch: 28 Global Step: 595570 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:54:18,423-Speed 6337.63 samples/sec Loss 3.6587 LearningRate 0.0001 Epoch: 28 Global Step: 595580 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:54:21,669-Speed 6310.67 samples/sec Loss 3.6105 LearningRate 0.0001 Epoch: 28 Global Step: 595590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:24,917-Speed 6307.88 samples/sec Loss 3.5859 LearningRate 0.0001 Epoch: 28 Global Step: 595600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:28,166-Speed 6304.02 samples/sec Loss 3.6041 LearningRate 0.0001 Epoch: 28 Global Step: 595610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:31,407-Speed 6320.45 samples/sec Loss 3.5573 LearningRate 0.0001 Epoch: 28 Global Step: 595620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:34,653-Speed 6310.43 samples/sec Loss 3.5423 LearningRate 0.0001 Epoch: 28 Global Step: 595630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:37,903-Speed 6304.04 samples/sec Loss 3.5617 LearningRate 0.0001 Epoch: 28 Global Step: 595640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:41,146-Speed 6316.72 samples/sec Loss 3.5933 LearningRate 0.0001 Epoch: 28 Global Step: 595650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:44,392-Speed 6310.71 samples/sec Loss 3.5969 LearningRate 0.0001 Epoch: 28 Global Step: 595660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:47,634-Speed 6316.85 samples/sec Loss 3.6542 LearningRate 0.0001 Epoch: 28 Global Step: 595670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:50,883-Speed 6306.46 samples/sec Loss 3.6183 LearningRate 0.0001 Epoch: 28 Global Step: 595680 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:54:54,112-Speed 6342.35 samples/sec Loss 3.5895 LearningRate 0.0001 Epoch: 28 Global Step: 595690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:54:57,356-Speed 6315.59 samples/sec Loss 3.5681 LearningRate 0.0001 Epoch: 28 Global Step: 595700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
21:55:00,601-Speed 6311.51 samples/sec Loss 3.5664 LearningRate 0.0001 Epoch: 28 Global Step: 595710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:03,850-Speed 6305.59 samples/sec Loss 3.5448 LearningRate 0.0001 Epoch: 28 Global Step: 595720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:07,096-Speed 6311.52 samples/sec Loss 3.5928 LearningRate 0.0001 Epoch: 28 Global Step: 595730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:10,339-Speed 6316.16 samples/sec Loss 3.6015 LearningRate 0.0001 Epoch: 28 Global Step: 595740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:13,629-Speed 6226.53 samples/sec Loss 3.5448 LearningRate 0.0001 Epoch: 28 Global Step: 595750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:16,887-Speed 6288.15 samples/sec Loss 3.5853 LearningRate 0.0001 Epoch: 28 Global Step: 595760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:20,131-Speed 6315.14 samples/sec Loss 3.6143 LearningRate 0.0001 Epoch: 28 Global Step: 595770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:23,375-Speed 6313.82 samples/sec Loss 3.6611 LearningRate 0.0001 Epoch: 28 Global Step: 595780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:26,624-Speed 6305.46 samples/sec Loss 3.5611 LearningRate 0.0001 Epoch: 28 Global Step: 595790 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:55:29,869-Speed 6312.61 samples/sec Loss 3.5977 LearningRate 0.0001 Epoch: 28 Global Step: 595800 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:55:33,097-Speed 6344.54 samples/sec Loss 3.6430 LearningRate 0.0001 Epoch: 28 Global Step: 595810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:36,343-Speed 6312.51 samples/sec Loss 3.6300 LearningRate 0.0001 Epoch: 28 Global Step: 595820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:39,586-Speed 6315.80 samples/sec 
Loss 3.5639 LearningRate 0.0001 Epoch: 28 Global Step: 595830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:42,832-Speed 6311.53 samples/sec Loss 3.6424 LearningRate 0.0001 Epoch: 28 Global Step: 595840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:46,075-Speed 6315.44 samples/sec Loss 3.5666 LearningRate 0.0001 Epoch: 28 Global Step: 595850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:49,318-Speed 6316.29 samples/sec Loss 3.5679 LearningRate 0.0001 Epoch: 28 Global Step: 595860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:52,563-Speed 6313.75 samples/sec Loss 3.5714 LearningRate 0.0001 Epoch: 28 Global Step: 595870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:55,806-Speed 6316.75 samples/sec Loss 3.5973 LearningRate 0.0001 Epoch: 28 Global Step: 595880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:55:59,056-Speed 6301.72 samples/sec Loss 3.5723 LearningRate 0.0001 Epoch: 28 Global Step: 595890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:02,303-Speed 6309.52 samples/sec Loss 3.5658 LearningRate 0.0001 Epoch: 28 Global Step: 595900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:05,553-Speed 6302.92 samples/sec Loss 3.6000 LearningRate 0.0001 Epoch: 28 Global Step: 595910 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:56:08,796-Speed 6316.81 samples/sec Loss 3.6016 LearningRate 0.0001 Epoch: 28 Global Step: 595920 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:56:12,031-Speed 6331.04 samples/sec Loss 3.5921 LearningRate 0.0001 Epoch: 28 Global Step: 595930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:15,277-Speed 6312.73 samples/sec Loss 3.5553 LearningRate 0.0001 Epoch: 28 Global Step: 595940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:18,523-Speed 6310.89 samples/sec Loss 3.6185 LearningRate 0.0001 Epoch: 28 
Global Step: 595950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:21,765-Speed 6317.35 samples/sec Loss 3.6036 LearningRate 0.0001 Epoch: 28 Global Step: 595960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:25,009-Speed 6315.22 samples/sec Loss 3.6400 LearningRate 0.0001 Epoch: 28 Global Step: 595970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:28,252-Speed 6316.00 samples/sec Loss 3.5751 LearningRate 0.0001 Epoch: 28 Global Step: 595980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:31,495-Speed 6317.35 samples/sec Loss 3.5497 LearningRate 0.0001 Epoch: 28 Global Step: 595990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:34,738-Speed 6316.14 samples/sec Loss 3.5779 LearningRate 0.0001 Epoch: 28 Global Step: 596000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:37,987-Speed 6305.40 samples/sec Loss 3.6168 LearningRate 0.0001 Epoch: 28 Global Step: 596010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:41,235-Speed 6306.10 samples/sec Loss 3.5757 LearningRate 0.0001 Epoch: 28 Global Step: 596020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:44,468-Speed 6336.99 samples/sec Loss 3.5972 LearningRate 0.0001 Epoch: 28 Global Step: 596030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:47,714-Speed 6309.84 samples/sec Loss 3.4984 LearningRate 0.0001 Epoch: 28 Global Step: 596040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:50,954-Speed 6321.98 samples/sec Loss 3.6127 LearningRate 0.0001 Epoch: 28 Global Step: 596050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:54,201-Speed 6309.70 samples/sec Loss 3.5521 LearningRate 0.0001 Epoch: 28 Global Step: 596060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:56:57,441-Speed 6321.37 samples/sec Loss 3.5706 LearningRate 0.0001 Epoch: 28 Global Step: 596070 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 21:57:00,684-Speed 6317.69 samples/sec Loss 3.5553 LearningRate 0.0001 Epoch: 28 Global Step: 596080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:03,983-Speed 6208.09 samples/sec Loss 3.5957 LearningRate 0.0001 Epoch: 28 Global Step: 596090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:07,232-Speed 6306.05 samples/sec Loss 3.5424 LearningRate 0.0001 Epoch: 28 Global Step: 596100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:10,474-Speed 6317.63 samples/sec Loss 3.5903 LearningRate 0.0001 Epoch: 28 Global Step: 596110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:13,717-Speed 6316.75 samples/sec Loss 3.5285 LearningRate 0.0001 Epoch: 28 Global Step: 596120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:16,976-Speed 6286.29 samples/sec Loss 3.5529 LearningRate 0.0001 Epoch: 28 Global Step: 596130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:20,229-Speed 6296.50 samples/sec Loss 3.6171 LearningRate 0.0001 Epoch: 28 Global Step: 596140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:23,473-Speed 6315.28 samples/sec Loss 3.6460 LearningRate 0.0001 Epoch: 28 Global Step: 596150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:26,766-Speed 6220.42 samples/sec Loss 3.5518 LearningRate 0.0001 Epoch: 28 Global Step: 596160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:30,010-Speed 6314.98 samples/sec Loss 3.5607 LearningRate 0.0001 Epoch: 28 Global Step: 596170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:33,260-Speed 6302.13 samples/sec Loss 3.6244 LearningRate 0.0001 Epoch: 28 Global Step: 596180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:36,510-Speed 6303.33 samples/sec Loss 3.5865 LearningRate 0.0001 Epoch: 28 Global Step: 596190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
21:57:39,753-Speed 6316.33 samples/sec Loss 3.6194 LearningRate 0.0001 Epoch: 28 Global Step: 596200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:42,998-Speed 6312.94 samples/sec Loss 3.6071 LearningRate 0.0001 Epoch: 28 Global Step: 596210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:46,240-Speed 6318.93 samples/sec Loss 3.6469 LearningRate 0.0001 Epoch: 28 Global Step: 596220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:49,485-Speed 6312.91 samples/sec Loss 3.6133 LearningRate 0.0001 Epoch: 28 Global Step: 596230 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:57:52,718-Speed 6334.76 samples/sec Loss 3.5913 LearningRate 0.0001 Epoch: 28 Global Step: 596240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:55,966-Speed 6308.14 samples/sec Loss 3.5397 LearningRate 0.0001 Epoch: 28 Global Step: 596250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:57:59,211-Speed 6313.34 samples/sec Loss 3.5897 LearningRate 0.0001 Epoch: 28 Global Step: 596260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:02,459-Speed 6305.93 samples/sec Loss 3.5750 LearningRate 0.0001 Epoch: 28 Global Step: 596270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:05,707-Speed 6306.85 samples/sec Loss 3.5555 LearningRate 0.0001 Epoch: 28 Global Step: 596280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:08,955-Speed 6307.70 samples/sec Loss 3.5857 LearningRate 0.0001 Epoch: 28 Global Step: 596290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:12,200-Speed 6312.42 samples/sec Loss 3.5655 LearningRate 0.0001 Epoch: 28 Global Step: 596300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:15,444-Speed 6314.87 samples/sec Loss 3.6122 LearningRate 0.0001 Epoch: 28 Global Step: 596310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:18,745-Speed 6205.03 samples/sec 
Loss 3.5890 LearningRate 0.0001 Epoch: 28 Global Step: 596320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:21,991-Speed 6310.27 samples/sec Loss 3.6122 LearningRate 0.0001 Epoch: 28 Global Step: 596330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:25,229-Speed 6326.91 samples/sec Loss 3.6132 LearningRate 0.0001 Epoch: 28 Global Step: 596340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:28,475-Speed 6310.40 samples/sec Loss 3.5796 LearningRate 0.0001 Epoch: 28 Global Step: 596350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:31,719-Speed 6314.48 samples/sec Loss 3.6271 LearningRate 0.0001 Epoch: 28 Global Step: 596360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:34,967-Speed 6306.95 samples/sec Loss 3.5810 LearningRate 0.0001 Epoch: 28 Global Step: 596370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:38,214-Speed 6309.11 samples/sec Loss 3.5640 LearningRate 0.0001 Epoch: 28 Global Step: 596380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:41,492-Speed 6250.48 samples/sec Loss 3.6174 LearningRate 0.0001 Epoch: 28 Global Step: 596390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:44,740-Speed 6305.64 samples/sec Loss 3.5901 LearningRate 0.0001 Epoch: 28 Global Step: 596400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:47,982-Speed 6319.88 samples/sec Loss 3.5881 LearningRate 0.0001 Epoch: 28 Global Step: 596410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:51,235-Speed 6296.90 samples/sec Loss 3.5774 LearningRate 0.0001 Epoch: 28 Global Step: 596420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:54,484-Speed 6303.99 samples/sec Loss 3.5716 LearningRate 0.0001 Epoch: 28 Global Step: 596430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:58:57,730-Speed 6310.72 samples/sec Loss 3.6190 LearningRate 0.0001 Epoch: 28 
Global Step: 596440 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:59:00,978-Speed 6307.04 samples/sec Loss 3.6015 LearningRate 0.0001 Epoch: 28 Global Step: 596450 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:59:04,223-Speed 6311.47 samples/sec Loss 3.5359 LearningRate 0.0001 Epoch: 28 Global Step: 596460 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:59:07,455-Speed 6339.27 samples/sec Loss 3.5438 LearningRate 0.0001 Epoch: 28 Global Step: 596470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:10,699-Speed 6313.29 samples/sec Loss 3.5998 LearningRate 0.0001 Epoch: 28 Global Step: 596480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:13,947-Speed 6308.04 samples/sec Loss 3.5407 LearningRate 0.0001 Epoch: 28 Global Step: 596490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:17,194-Speed 6309.18 samples/sec Loss 3.6092 LearningRate 0.0001 Epoch: 28 Global Step: 596500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:20,440-Speed 6310.37 samples/sec Loss 3.5062 LearningRate 0.0001 Epoch: 28 Global Step: 596510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:23,683-Speed 6316.10 samples/sec Loss 3.5880 LearningRate 0.0001 Epoch: 28 Global Step: 596520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:26,927-Speed 6313.75 samples/sec Loss 3.6094 LearningRate 0.0001 Epoch: 28 Global Step: 596530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:30,176-Speed 6306.01 samples/sec Loss 3.4953 LearningRate 0.0001 Epoch: 28 Global Step: 596540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:33,422-Speed 6310.85 samples/sec Loss 3.5642 LearningRate 0.0001 Epoch: 28 Global Step: 596550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:36,664-Speed 6317.37 samples/sec Loss 3.6011 LearningRate 0.0001 Epoch: 28 Global Step: 596560 Fp16 Grad Scale: 
8192 Required: 21 hours Training: 2022-04-02 21:59:39,910-Speed 6310.79 samples/sec Loss 3.5777 LearningRate 0.0001 Epoch: 28 Global Step: 596570 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 21:59:43,145-Speed 6333.49 samples/sec Loss 3.5863 LearningRate 0.0001 Epoch: 28 Global Step: 596580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:46,395-Speed 6302.78 samples/sec Loss 3.5775 LearningRate 0.0001 Epoch: 28 Global Step: 596590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:49,642-Speed 6309.14 samples/sec Loss 3.5582 LearningRate 0.0001 Epoch: 28 Global Step: 596600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:52,887-Speed 6312.35 samples/sec Loss 3.5894 LearningRate 0.0001 Epoch: 28 Global Step: 596610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:56,133-Speed 6311.42 samples/sec Loss 3.5927 LearningRate 0.0001 Epoch: 28 Global Step: 596620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 21:59:59,401-Speed 6268.55 samples/sec Loss 3.5329 LearningRate 0.0001 Epoch: 28 Global Step: 596630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:02,722-Speed 6168.20 samples/sec Loss 3.5778 LearningRate 0.0001 Epoch: 28 Global Step: 596640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:05,972-Speed 6301.74 samples/sec Loss 3.6528 LearningRate 0.0001 Epoch: 28 Global Step: 596650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:09,218-Speed 6311.05 samples/sec Loss 3.5575 LearningRate 0.0001 Epoch: 28 Global Step: 596660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:12,465-Speed 6309.55 samples/sec Loss 3.6955 LearningRate 0.0001 Epoch: 28 Global Step: 596670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:15,695-Speed 6341.76 samples/sec Loss 3.5698 LearningRate 0.0001 Epoch: 28 Global Step: 596680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 
2022-04-02 22:00:18,939-Speed 6314.88 samples/sec Loss 3.6081 LearningRate 0.0001 Epoch: 28 Global Step: 596690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:22,189-Speed 6302.54 samples/sec Loss 3.5964 LearningRate 0.0001 Epoch: 28 Global Step: 596700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:25,438-Speed 6304.03 samples/sec Loss 3.5876 LearningRate 0.0001 Epoch: 28 Global Step: 596710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:28,690-Speed 6300.15 samples/sec Loss 3.5434 LearningRate 0.0001 Epoch: 28 Global Step: 596720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:31,934-Speed 6314.75 samples/sec Loss 3.5920 LearningRate 0.0001 Epoch: 28 Global Step: 596730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:35,181-Speed 6307.83 samples/sec Loss 3.5535 LearningRate 0.0001 Epoch: 28 Global Step: 596740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:38,423-Speed 6318.21 samples/sec Loss 3.6403 LearningRate 0.0001 Epoch: 28 Global Step: 596750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:41,675-Speed 6299.33 samples/sec Loss 3.6421 LearningRate 0.0001 Epoch: 28 Global Step: 596760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:44,920-Speed 6312.17 samples/sec Loss 3.6070 LearningRate 0.0001 Epoch: 28 Global Step: 596770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:48,171-Speed 6301.11 samples/sec Loss 3.5776 LearningRate 0.0001 Epoch: 28 Global Step: 596780 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:00:51,404-Speed 6335.91 samples/sec Loss 3.6330 LearningRate 0.0001 Epoch: 28 Global Step: 596790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:54,646-Speed 6318.95 samples/sec Loss 3.5471 LearningRate 0.0001 Epoch: 28 Global Step: 596800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:00:57,899-Speed 6296.75 
samples/sec Loss 3.5800 LearningRate 0.0001 Epoch: 28 Global Step: 596810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:01,145-Speed 6311.84 samples/sec Loss 3.6044 LearningRate 0.0001 Epoch: 28 Global Step: 596820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:04,397-Speed 6298.58 samples/sec Loss 3.5173 LearningRate 0.0001 Epoch: 28 Global Step: 596830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:07,644-Speed 6309.83 samples/sec Loss 3.6286 LearningRate 0.0001 Epoch: 28 Global Step: 596840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:10,892-Speed 6307.31 samples/sec Loss 3.5918 LearningRate 0.0001 Epoch: 28 Global Step: 596850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:14,139-Speed 6308.53 samples/sec Loss 3.5923 LearningRate 0.0001 Epoch: 28 Global Step: 596860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:17,385-Speed 6309.39 samples/sec Loss 3.6423 LearningRate 0.0001 Epoch: 28 Global Step: 596870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:20,637-Speed 6299.41 samples/sec Loss 3.5392 LearningRate 0.0001 Epoch: 28 Global Step: 596880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:23,866-Speed 6343.76 samples/sec Loss 3.6228 LearningRate 0.0001 Epoch: 28 Global Step: 596890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:27,111-Speed 6313.05 samples/sec Loss 3.5742 LearningRate 0.0001 Epoch: 28 Global Step: 596900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:30,355-Speed 6314.77 samples/sec Loss 3.5750 LearningRate 0.0001 Epoch: 28 Global Step: 596910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:33,603-Speed 6306.47 samples/sec Loss 3.6059 LearningRate 0.0001 Epoch: 28 Global Step: 596920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:36,851-Speed 6307.60 samples/sec Loss 3.6185 LearningRate 0.0001 
Epoch: 28 Global Step: 596930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:40,095-Speed 6313.84 samples/sec Loss 3.5972 LearningRate 0.0001 Epoch: 28 Global Step: 596940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:43,344-Speed 6306.29 samples/sec Loss 3.5857 LearningRate 0.0001 Epoch: 28 Global Step: 596950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:46,588-Speed 6312.94 samples/sec Loss 3.5913 LearningRate 0.0001 Epoch: 28 Global Step: 596960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:49,839-Speed 6302.92 samples/sec Loss 3.6170 LearningRate 0.0001 Epoch: 28 Global Step: 596970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:53,082-Speed 6315.06 samples/sec Loss 3.5993 LearningRate 0.0001 Epoch: 28 Global Step: 596980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:01:56,326-Speed 6315.21 samples/sec Loss 3.5726 LearningRate 0.0001 Epoch: 28 Global Step: 596990 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:01:59,559-Speed 6334.83 samples/sec Loss 3.6069 LearningRate 0.0001 Epoch: 28 Global Step: 597000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:02,811-Speed 6300.22 samples/sec Loss 3.6109 LearningRate 0.0001 Epoch: 28 Global Step: 597010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:06,058-Speed 6309.22 samples/sec Loss 3.5989 LearningRate 0.0001 Epoch: 28 Global Step: 597020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:09,303-Speed 6310.96 samples/sec Loss 3.5503 LearningRate 0.0001 Epoch: 28 Global Step: 597030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:12,547-Speed 6316.56 samples/sec Loss 3.6108 LearningRate 0.0001 Epoch: 28 Global Step: 597040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:15,803-Speed 6290.37 samples/sec Loss 3.6122 LearningRate 0.0001 Epoch: 28 Global Step: 597050 Fp16 Grad 
Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:19,048-Speed 6313.71 samples/sec Loss 3.6044 LearningRate 0.0001 Epoch: 28 Global Step: 597060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:22,296-Speed 6306.09 samples/sec Loss 3.6144 LearningRate 0.0001 Epoch: 28 Global Step: 597070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:25,542-Speed 6311.48 samples/sec Loss 3.5280 LearningRate 0.0001 Epoch: 28 Global Step: 597080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:28,788-Speed 6310.51 samples/sec Loss 3.5690 LearningRate 0.0001 Epoch: 28 Global Step: 597090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:32,019-Speed 6343.21 samples/sec Loss 3.5904 LearningRate 0.0001 Epoch: 28 Global Step: 597100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:35,311-Speed 6221.12 samples/sec Loss 3.6538 LearningRate 0.0001 Epoch: 28 Global Step: 597110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:38,635-Speed 6163.29 samples/sec Loss 3.5788 LearningRate 0.0001 Epoch: 28 Global Step: 597120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:41,883-Speed 6307.01 samples/sec Loss 3.5734 LearningRate 0.0001 Epoch: 28 Global Step: 597130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:45,128-Speed 6312.38 samples/sec Loss 3.5738 LearningRate 0.0001 Epoch: 28 Global Step: 597140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:48,373-Speed 6313.52 samples/sec Loss 3.5798 LearningRate 0.0001 Epoch: 28 Global Step: 597150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:51,621-Speed 6304.87 samples/sec Loss 3.5570 LearningRate 0.0001 Epoch: 28 Global Step: 597160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:02:54,869-Speed 6306.57 samples/sec Loss 3.5829 LearningRate 0.0001 Epoch: 28 Global Step: 597170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 
2022-04-02 22:02:58,113-Speed 6314.50 samples/sec Loss 3.6069 LearningRate 0.0001 Epoch: 28 Global Step: 597180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:01,358-Speed 6313.25 samples/sec Loss 3.5543 LearningRate 0.0001 Epoch: 28 Global Step: 597190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:04,608-Speed 6303.27 samples/sec Loss 3.6114 LearningRate 0.0001 Epoch: 28 Global Step: 597200 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:03:07,841-Speed 6336.78 samples/sec Loss 3.5997 LearningRate 0.0001 Epoch: 28 Global Step: 597210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:11,085-Speed 6313.43 samples/sec Loss 3.5797 LearningRate 0.0001 Epoch: 28 Global Step: 597220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:14,328-Speed 6317.40 samples/sec Loss 3.6320 LearningRate 0.0001 Epoch: 28 Global Step: 597230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:17,574-Speed 6310.43 samples/sec Loss 3.5776 LearningRate 0.0001 Epoch: 28 Global Step: 597240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:20,816-Speed 6318.05 samples/sec Loss 3.5799 LearningRate 0.0001 Epoch: 28 Global Step: 597250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:24,073-Speed 6290.26 samples/sec Loss 3.5694 LearningRate 0.0001 Epoch: 28 Global Step: 597260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:27,318-Speed 6312.81 samples/sec Loss 3.6174 LearningRate 0.0001 Epoch: 28 Global Step: 597270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:30,560-Speed 6317.21 samples/sec Loss 3.5792 LearningRate 0.0001 Epoch: 28 Global Step: 597280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:33,806-Speed 6312.66 samples/sec Loss 3.5622 LearningRate 0.0001 Epoch: 28 Global Step: 597290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:37,051-Speed 6312.07 
samples/sec Loss 3.5703 LearningRate 0.0001 Epoch: 28 Global Step: 597300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:40,284-Speed 6336.87 samples/sec Loss 3.5984 LearningRate 0.0001 Epoch: 28 Global Step: 597310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:43,526-Speed 6319.23 samples/sec Loss 3.5326 LearningRate 0.0001 Epoch: 28 Global Step: 597320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:46,790-Speed 6274.80 samples/sec Loss 3.5280 LearningRate 0.0001 Epoch: 28 Global Step: 597330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:50,037-Speed 6309.92 samples/sec Loss 3.6275 LearningRate 0.0001 Epoch: 28 Global Step: 597340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:53,280-Speed 6316.59 samples/sec Loss 3.6486 LearningRate 0.0001 Epoch: 28 Global Step: 597350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:56,525-Speed 6312.57 samples/sec Loss 3.5576 LearningRate 0.0001 Epoch: 28 Global Step: 597360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:03:59,772-Speed 6306.85 samples/sec Loss 3.5875 LearningRate 0.0001 Epoch: 28 Global Step: 597370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:03,027-Speed 6293.95 samples/sec Loss 3.6045 LearningRate 0.0001 Epoch: 28 Global Step: 597380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:06,275-Speed 6306.64 samples/sec Loss 3.5637 LearningRate 0.0001 Epoch: 28 Global Step: 597390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:09,537-Speed 6280.39 samples/sec Loss 3.5598 LearningRate 0.0001 Epoch: 28 Global Step: 597400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:12,770-Speed 6336.20 samples/sec Loss 3.6199 LearningRate 0.0001 Epoch: 28 Global Step: 597410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:16,017-Speed 6308.84 samples/sec Loss 3.5526 LearningRate 0.0001 
Epoch: 28 Global Step: 597420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:19,263-Speed 6310.24 samples/sec Loss 3.5634 LearningRate 0.0001 Epoch: 28 Global Step: 597430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:22,511-Speed 6307.43 samples/sec Loss 3.5529 LearningRate 0.0001 Epoch: 28 Global Step: 597440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:25,756-Speed 6312.20 samples/sec Loss 3.6024 LearningRate 0.0001 Epoch: 28 Global Step: 597450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:29,001-Speed 6312.18 samples/sec Loss 3.6008 LearningRate 0.0001 Epoch: 28 Global Step: 597460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:32,251-Speed 6303.83 samples/sec Loss 3.5471 LearningRate 0.0001 Epoch: 28 Global Step: 597470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:35,498-Speed 6309.09 samples/sec Loss 3.4950 LearningRate 0.0001 Epoch: 28 Global Step: 597480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:38,745-Speed 6308.10 samples/sec Loss 3.5833 LearningRate 0.0001 Epoch: 28 Global Step: 597490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:42,024-Speed 6247.72 samples/sec Loss 3.5630 LearningRate 0.0001 Epoch: 28 Global Step: 597500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:45,271-Speed 6309.40 samples/sec Loss 3.5458 LearningRate 0.0001 Epoch: 28 Global Step: 597510 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:04:48,506-Speed 6333.36 samples/sec Loss 3.5569 LearningRate 0.0001 Epoch: 28 Global Step: 597520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:51,751-Speed 6311.48 samples/sec Loss 3.5866 LearningRate 0.0001 Epoch: 28 Global Step: 597530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:54,996-Speed 6312.72 samples/sec Loss 3.5612 LearningRate 0.0001 Epoch: 28 Global Step: 597540 Fp16 Grad 
Scale: 8192 Required: 21 hours Training: 2022-04-02 22:04:58,241-Speed 6313.90 samples/sec Loss 3.6056 LearningRate 0.0001 Epoch: 28 Global Step: 597550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:01,494-Speed 6297.63 samples/sec Loss 3.5979 LearningRate 0.0001 Epoch: 28 Global Step: 597560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:04,743-Speed 6302.92 samples/sec Loss 3.5206 LearningRate 0.0001 Epoch: 28 Global Step: 597570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:07,992-Speed 6306.15 samples/sec Loss 3.5887 LearningRate 0.0001 Epoch: 28 Global Step: 597580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:11,242-Speed 6302.13 samples/sec Loss 3.5337 LearningRate 0.0001 Epoch: 28 Global Step: 597590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:14,489-Speed 6308.32 samples/sec Loss 3.5554 LearningRate 0.0001 Epoch: 28 Global Step: 597600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:17,736-Speed 6309.73 samples/sec Loss 3.5591 LearningRate 0.0001 Epoch: 28 Global Step: 597610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:20,978-Speed 6317.66 samples/sec Loss 3.5111 LearningRate 0.0001 Epoch: 28 Global Step: 597620 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:05:24,207-Speed 6345.13 samples/sec Loss 3.5412 LearningRate 0.0001 Epoch: 28 Global Step: 597630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:27,468-Speed 6281.38 samples/sec Loss 3.5963 LearningRate 0.0001 Epoch: 28 Global Step: 597640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:30,734-Speed 6271.89 samples/sec Loss 3.6296 LearningRate 0.0001 Epoch: 28 Global Step: 597650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:33,985-Speed 6301.99 samples/sec Loss 3.6055 LearningRate 0.0001 Epoch: 28 Global Step: 597660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 
2022-04-02 22:05:37,232-Speed 6308.05 samples/sec Loss 3.5989 LearningRate 0.0001 Epoch: 28 Global Step: 597670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:40,482-Speed 6303.35 samples/sec Loss 3.5338 LearningRate 0.0001 Epoch: 28 Global Step: 597680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:43,727-Speed 6311.70 samples/sec Loss 3.5393 LearningRate 0.0001 Epoch: 28 Global Step: 597690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:46,977-Speed 6302.80 samples/sec Loss 3.5650 LearningRate 0.0001 Epoch: 28 Global Step: 597700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:50,225-Speed 6308.58 samples/sec Loss 3.5682 LearningRate 0.0001 Epoch: 28 Global Step: 597710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:53,473-Speed 6308.90 samples/sec Loss 3.5896 LearningRate 0.0001 Epoch: 28 Global Step: 597720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:56,703-Speed 6341.10 samples/sec Loss 3.6213 LearningRate 0.0001 Epoch: 28 Global Step: 597730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:05:59,949-Speed 6311.78 samples/sec Loss 3.5515 LearningRate 0.0001 Epoch: 28 Global Step: 597740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:03,197-Speed 6307.14 samples/sec Loss 3.5168 LearningRate 0.0001 Epoch: 28 Global Step: 597750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:06,442-Speed 6311.70 samples/sec Loss 3.6169 LearningRate 0.0001 Epoch: 28 Global Step: 597760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:09,699-Speed 6289.21 samples/sec Loss 3.6399 LearningRate 0.0001 Epoch: 28 Global Step: 597770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:12,979-Speed 6245.44 samples/sec Loss 3.5398 LearningRate 0.0001 Epoch: 28 Global Step: 597780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:16,266-Speed 6232.46 
samples/sec Loss 3.5352 LearningRate 0.0001 Epoch: 28 Global Step: 597790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:19,548-Speed 6241.46 samples/sec Loss 3.5626 LearningRate 0.0001 Epoch: 28 Global Step: 597800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:22,790-Speed 6318.42 samples/sec Loss 3.5830 LearningRate 0.0001 Epoch: 28 Global Step: 597810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:26,037-Speed 6309.15 samples/sec Loss 3.5376 LearningRate 0.0001 Epoch: 28 Global Step: 597820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:29,288-Speed 6300.74 samples/sec Loss 3.5837 LearningRate 0.0001 Epoch: 28 Global Step: 597830 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:06:32,521-Speed 6335.84 samples/sec Loss 3.5538 LearningRate 0.0001 Epoch: 28 Global Step: 597840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:35,767-Speed 6311.08 samples/sec Loss 3.5627 LearningRate 0.0001 Epoch: 28 Global Step: 597850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:39,015-Speed 6305.87 samples/sec Loss 3.6104 LearningRate 0.0001 Epoch: 28 Global Step: 597860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:42,268-Speed 6296.98 samples/sec Loss 3.5177 LearningRate 0.0001 Epoch: 28 Global Step: 597870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:45,513-Speed 6312.97 samples/sec Loss 3.5781 LearningRate 0.0001 Epoch: 28 Global Step: 597880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:48,759-Speed 6312.08 samples/sec Loss 3.5843 LearningRate 0.0001 Epoch: 28 Global Step: 597890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:52,006-Speed 6307.89 samples/sec Loss 3.5878 LearningRate 0.0001 Epoch: 28 Global Step: 597900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:55,251-Speed 6313.36 samples/sec Loss 3.6116 LearningRate 
0.0001 Epoch: 28 Global Step: 597910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:06:58,495-Speed 6314.73 samples/sec Loss 3.5745 LearningRate 0.0001 Epoch: 28 Global Step: 597920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:01,739-Speed 6314.48 samples/sec Loss 3.5495 LearningRate 0.0001 Epoch: 28 Global Step: 597930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:04,985-Speed 6310.43 samples/sec Loss 3.6034 LearningRate 0.0001 Epoch: 28 Global Step: 597940 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:07:08,221-Speed 6330.82 samples/sec Loss 3.5958 LearningRate 0.0001 Epoch: 28 Global Step: 597950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:11,469-Speed 6307.54 samples/sec Loss 3.5463 LearningRate 0.0001 Epoch: 28 Global Step: 597960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:14,716-Speed 6308.27 samples/sec Loss 3.5623 LearningRate 0.0001 Epoch: 28 Global Step: 597970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:17,962-Speed 6310.56 samples/sec Loss 3.6021 LearningRate 0.0001 Epoch: 28 Global Step: 597980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:21,211-Speed 6306.00 samples/sec Loss 3.5676 LearningRate 0.0001 Epoch: 28 Global Step: 597990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:24,458-Speed 6309.64 samples/sec Loss 3.6002 LearningRate 0.0001 Epoch: 28 Global Step: 598000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:27,701-Speed 6316.26 samples/sec Loss 3.5325 LearningRate 0.0001 Epoch: 28 Global Step: 598010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:31,001-Speed 6206.38 samples/sec Loss 3.5750 LearningRate 0.0001 Epoch: 28 Global Step: 598020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:34,285-Speed 6238.16 samples/sec Loss 3.5353 LearningRate 0.0001 Epoch: 28 Global Step: 598030 Fp16 
Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:37,533-Speed 6305.91 samples/sec Loss 3.6195 LearningRate 0.0001 Epoch: 28 Global Step: 598040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:40,781-Speed 6306.40 samples/sec Loss 3.5813 LearningRate 0.0001 Epoch: 28 Global Step: 598050 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:07:44,031-Speed 6303.39 samples/sec Loss 3.6004 LearningRate 0.0001 Epoch: 28 Global Step: 598060 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:07:47,261-Speed 6341.79 samples/sec Loss 3.5391 LearningRate 0.0001 Epoch: 28 Global Step: 598070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:50,512-Speed 6302.47 samples/sec Loss 3.5089 LearningRate 0.0001 Epoch: 28 Global Step: 598080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:53,767-Speed 6292.84 samples/sec Loss 3.6102 LearningRate 0.0001 Epoch: 28 Global Step: 598090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:07:57,017-Speed 6303.12 samples/sec Loss 3.5745 LearningRate 0.0001 Epoch: 28 Global Step: 598100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:00,264-Speed 6308.88 samples/sec Loss 3.5860 LearningRate 0.0001 Epoch: 28 Global Step: 598110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:03,508-Speed 6313.25 samples/sec Loss 3.6135 LearningRate 0.0001 Epoch: 28 Global Step: 598120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:06,754-Speed 6310.72 samples/sec Loss 3.5830 LearningRate 0.0001 Epoch: 28 Global Step: 598130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:10,022-Speed 6268.75 samples/sec Loss 3.5303 LearningRate 0.0001 Epoch: 28 Global Step: 598140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:13,297-Speed 6255.86 samples/sec Loss 3.5436 LearningRate 0.0001 Epoch: 28 Global Step: 598150 Fp16 Grad Scale: 8192 Required: 21 hours 
Training: 2022-04-02 22:08:16,544-Speed 6308.44 samples/sec Loss 3.5675 LearningRate 0.0001 Epoch: 28 Global Step: 598160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:19,788-Speed 6315.35 samples/sec Loss 3.5110 LearningRate 0.0001 Epoch: 28 Global Step: 598170 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:08:23,021-Speed 6335.82 samples/sec Loss 3.5718 LearningRate 0.0001 Epoch: 28 Global Step: 598180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:26,268-Speed 6308.42 samples/sec Loss 3.5344 LearningRate 0.0001 Epoch: 28 Global Step: 598190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:29,515-Speed 6309.27 samples/sec Loss 3.5220 LearningRate 0.0001 Epoch: 28 Global Step: 598200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:32,758-Speed 6315.59 samples/sec Loss 3.5737 LearningRate 0.0001 Epoch: 28 Global Step: 598210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:36,003-Speed 6312.72 samples/sec Loss 3.5416 LearningRate 0.0001 Epoch: 28 Global Step: 598220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:39,250-Speed 6309.94 samples/sec Loss 3.5735 LearningRate 0.0001 Epoch: 28 Global Step: 598230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:42,498-Speed 6305.50 samples/sec Loss 3.5466 LearningRate 0.0001 Epoch: 28 Global Step: 598240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:45,744-Speed 6311.31 samples/sec Loss 3.5301 LearningRate 0.0001 Epoch: 28 Global Step: 598250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:48,990-Speed 6311.98 samples/sec Loss 3.5961 LearningRate 0.0001 Epoch: 28 Global Step: 598260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:52,231-Speed 6320.19 samples/sec Loss 3.6005 LearningRate 0.0001 Epoch: 28 Global Step: 598270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:55,462-Speed 
6338.69 samples/sec Loss 3.5489 LearningRate 0.0001 Epoch: 28 Global Step: 598280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:08:58,711-Speed 6306.29 samples/sec Loss 3.5256 LearningRate 0.0001 Epoch: 28 Global Step: 598290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:01,957-Speed 6310.01 samples/sec Loss 3.5103 LearningRate 0.0001 Epoch: 28 Global Step: 598300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:05,206-Speed 6303.75 samples/sec Loss 3.6191 LearningRate 0.0001 Epoch: 28 Global Step: 598310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:08,460-Speed 6295.79 samples/sec Loss 3.5197 LearningRate 0.0001 Epoch: 28 Global Step: 598320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:11,702-Speed 6318.82 samples/sec Loss 3.6259 LearningRate 0.0001 Epoch: 28 Global Step: 598330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:14,945-Speed 6316.10 samples/sec Loss 3.5562 LearningRate 0.0001 Epoch: 28 Global Step: 598340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:18,193-Speed 6306.58 samples/sec Loss 3.5339 LearningRate 0.0001 Epoch: 28 Global Step: 598350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:21,437-Speed 6314.52 samples/sec Loss 3.5779 LearningRate 0.0001 Epoch: 28 Global Step: 598360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:24,690-Speed 6298.61 samples/sec Loss 3.5613 LearningRate 0.0001 Epoch: 28 Global Step: 598370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:27,926-Speed 6330.09 samples/sec Loss 3.5724 LearningRate 0.0001 Epoch: 28 Global Step: 598380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:31,170-Speed 6314.91 samples/sec Loss 3.6374 LearningRate 0.0001 Epoch: 28 Global Step: 598390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:34,414-Speed 6314.74 samples/sec Loss 3.5179 
LearningRate 0.0001 Epoch: 28 Global Step: 598400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:37,655-Speed 6320.20 samples/sec Loss 3.5291 LearningRate 0.0001 Epoch: 28 Global Step: 598410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:40,902-Speed 6309.67 samples/sec Loss 3.5729 LearningRate 0.0001 Epoch: 28 Global Step: 598420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:44,146-Speed 6314.00 samples/sec Loss 3.5448 LearningRate 0.0001 Epoch: 28 Global Step: 598430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:47,396-Speed 6303.56 samples/sec Loss 3.5438 LearningRate 0.0001 Epoch: 28 Global Step: 598440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:50,640-Speed 6314.54 samples/sec Loss 3.4710 LearningRate 0.0001 Epoch: 28 Global Step: 598450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:53,894-Speed 6295.15 samples/sec Loss 3.6079 LearningRate 0.0001 Epoch: 28 Global Step: 598460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:09:57,146-Speed 6299.41 samples/sec Loss 3.5476 LearningRate 0.0001 Epoch: 28 Global Step: 598470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:00,393-Speed 6307.11 samples/sec Loss 3.5617 LearningRate 0.0001 Epoch: 28 Global Step: 598480 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:10:03,624-Speed 6340.12 samples/sec Loss 3.5484 LearningRate 0.0001 Epoch: 28 Global Step: 598490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:06,874-Speed 6303.01 samples/sec Loss 3.6048 LearningRate 0.0001 Epoch: 28 Global Step: 598500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:10,117-Speed 6316.20 samples/sec Loss 3.5662 LearningRate 0.0001 Epoch: 28 Global Step: 598510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:13,387-Speed 6264.07 samples/sec Loss 3.5673 LearningRate 0.0001 Epoch: 28 Global Step: 
598520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:16,638-Speed 6301.10 samples/sec Loss 3.5790 LearningRate 0.0001 Epoch: 28 Global Step: 598530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:19,886-Speed 6308.23 samples/sec Loss 3.6004 LearningRate 0.0001 Epoch: 28 Global Step: 598540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:23,136-Speed 6301.41 samples/sec Loss 3.6092 LearningRate 0.0001 Epoch: 28 Global Step: 598550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:26,406-Speed 6265.87 samples/sec Loss 3.5130 LearningRate 0.0001 Epoch: 28 Global Step: 598560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:29,672-Speed 6271.54 samples/sec Loss 3.5992 LearningRate 0.0001 Epoch: 28 Global Step: 598570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:32,915-Speed 6317.10 samples/sec Loss 3.5657 LearningRate 0.0001 Epoch: 28 Global Step: 598580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:36,161-Speed 6310.65 samples/sec Loss 3.5077 LearningRate 0.0001 Epoch: 28 Global Step: 598590 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:10:39,398-Speed 6328.82 samples/sec Loss 3.5716 LearningRate 0.0001 Epoch: 28 Global Step: 598600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:42,640-Speed 6318.10 samples/sec Loss 3.6027 LearningRate 0.0001 Epoch: 28 Global Step: 598610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:45,884-Speed 6315.09 samples/sec Loss 3.5636 LearningRate 0.0001 Epoch: 28 Global Step: 598620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:49,129-Speed 6312.98 samples/sec Loss 3.6082 LearningRate 0.0001 Epoch: 28 Global Step: 598630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:52,371-Speed 6317.35 samples/sec Loss 3.5353 LearningRate 0.0001 Epoch: 28 Global Step: 598640 Fp16 Grad Scale: 8192 Required: 21 
hours Training: 2022-04-02 22:10:55,617-Speed 6311.19 samples/sec Loss 3.5529 LearningRate 0.0001 Epoch: 28 Global Step: 598650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:10:58,868-Speed 6301.78 samples/sec Loss 3.5778 LearningRate 0.0001 Epoch: 28 Global Step: 598660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:02,119-Speed 6300.48 samples/sec Loss 3.4975 LearningRate 0.0001 Epoch: 28 Global Step: 598670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:05,363-Speed 6313.93 samples/sec Loss 3.5802 LearningRate 0.0001 Epoch: 28 Global Step: 598680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:08,607-Speed 6314.40 samples/sec Loss 3.5649 LearningRate 0.0001 Epoch: 28 Global Step: 598690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:11,839-Speed 6338.35 samples/sec Loss 3.5967 LearningRate 0.0001 Epoch: 28 Global Step: 598700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:15,085-Speed 6310.06 samples/sec Loss 3.5907 LearningRate 0.0001 Epoch: 28 Global Step: 598710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:18,330-Speed 6313.65 samples/sec Loss 3.5903 LearningRate 0.0001 Epoch: 28 Global Step: 598720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:21,574-Speed 6315.24 samples/sec Loss 3.5630 LearningRate 0.0001 Epoch: 28 Global Step: 598730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:24,821-Speed 6311.10 samples/sec Loss 3.6082 LearningRate 0.0001 Epoch: 28 Global Step: 598740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:28,062-Speed 6320.29 samples/sec Loss 3.5683 LearningRate 0.0001 Epoch: 28 Global Step: 598750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:31,309-Speed 6307.97 samples/sec Loss 3.5232 LearningRate 0.0001 Epoch: 28 Global Step: 598760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:11:34,556-Speed 6309.84 samples/sec Loss 3.5465 LearningRate 0.0001 Epoch: 28 Global Step: 598770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:37,798-Speed 6320.13 samples/sec Loss 3.6069 LearningRate 0.0001 Epoch: 28 Global Step: 598780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:41,043-Speed 6314.12 samples/sec Loss 3.5952 LearningRate 0.0001 Epoch: 28 Global Step: 598790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:44,286-Speed 6315.51 samples/sec Loss 3.5897 LearningRate 0.0001 Epoch: 28 Global Step: 598800 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:11:47,530-Speed 6313.81 samples/sec Loss 3.6258 LearningRate 0.0001 Epoch: 28 Global Step: 598810 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:11:50,840-Speed 6189.63 samples/sec Loss 3.5508 LearningRate 0.0001 Epoch: 28 Global Step: 598820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:54,096-Speed 6290.65 samples/sec Loss 3.5164 LearningRate 0.0001 Epoch: 28 Global Step: 598830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:11:57,342-Speed 6311.96 samples/sec Loss 3.5440 LearningRate 0.0001 Epoch: 28 Global Step: 598840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:00,622-Speed 6245.44 samples/sec Loss 3.5481 LearningRate 0.0001 Epoch: 28 Global Step: 598850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:03,866-Speed 6315.10 samples/sec Loss 3.5427 LearningRate 0.0001 Epoch: 28 Global Step: 598860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:07,109-Speed 6317.35 samples/sec Loss 3.5385 LearningRate 0.0001 Epoch: 28 Global Step: 598870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:10,358-Speed 6303.69 samples/sec Loss 3.4807 LearningRate 0.0001 Epoch: 28 Global Step: 598880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:13,605-Speed 6309.82 samples/sec 
Loss 3.5556 LearningRate 0.0001 Epoch: 28 Global Step: 598890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:16,849-Speed 6313.06 samples/sec Loss 3.6004 LearningRate 0.0001 Epoch: 28 Global Step: 598900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:20,096-Speed 6308.81 samples/sec Loss 3.5870 LearningRate 0.0001 Epoch: 28 Global Step: 598910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:23,324-Speed 6345.91 samples/sec Loss 3.5430 LearningRate 0.0001 Epoch: 28 Global Step: 598920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:26,568-Speed 6314.61 samples/sec Loss 3.5528 LearningRate 0.0001 Epoch: 28 Global Step: 598930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:29,818-Speed 6303.81 samples/sec Loss 3.5596 LearningRate 0.0001 Epoch: 28 Global Step: 598940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:33,063-Speed 6312.80 samples/sec Loss 3.5264 LearningRate 0.0001 Epoch: 28 Global Step: 598950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:36,309-Speed 6310.70 samples/sec Loss 3.6131 LearningRate 0.0001 Epoch: 28 Global Step: 598960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:39,560-Speed 6300.90 samples/sec Loss 3.5831 LearningRate 0.0001 Epoch: 28 Global Step: 598970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:42,806-Speed 6310.30 samples/sec Loss 3.6036 LearningRate 0.0001 Epoch: 28 Global Step: 598980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:46,053-Speed 6308.22 samples/sec Loss 3.5916 LearningRate 0.0001 Epoch: 28 Global Step: 598990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:49,300-Speed 6310.29 samples/sec Loss 3.6077 LearningRate 0.0001 Epoch: 28 Global Step: 599000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:52,548-Speed 6307.05 samples/sec Loss 3.5150 LearningRate 0.0001 Epoch: 28 
Global Step: 599010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:12:55,795-Speed 6308.81 samples/sec Loss 3.5834 LearningRate 0.0001 Epoch: 28 Global Step: 599020 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:12:59,041-Speed 6309.10 samples/sec Loss 3.5923 LearningRate 0.0001 Epoch: 28 Global Step: 599030 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:13:02,271-Speed 6341.83 samples/sec Loss 3.5536 LearningRate 0.0001 Epoch: 28 Global Step: 599040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:05,518-Speed 6309.18 samples/sec Loss 3.5366 LearningRate 0.0001 Epoch: 28 Global Step: 599050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:08,768-Speed 6307.22 samples/sec Loss 3.6119 LearningRate 0.0001 Epoch: 28 Global Step: 599060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:12,018-Speed 6304.15 samples/sec Loss 3.5928 LearningRate 0.0001 Epoch: 28 Global Step: 599070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:15,268-Speed 6301.52 samples/sec Loss 3.5973 LearningRate 0.0001 Epoch: 28 Global Step: 599080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:18,514-Speed 6311.51 samples/sec Loss 3.6056 LearningRate 0.0001 Epoch: 28 Global Step: 599090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:21,759-Speed 6313.32 samples/sec Loss 3.5662 LearningRate 0.0001 Epoch: 28 Global Step: 599100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:25,005-Speed 6310.41 samples/sec Loss 3.5712 LearningRate 0.0001 Epoch: 28 Global Step: 599110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:28,251-Speed 6311.18 samples/sec Loss 3.5462 LearningRate 0.0001 Epoch: 28 Global Step: 599120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:31,502-Speed 6300.87 samples/sec Loss 3.6073 LearningRate 0.0001 Epoch: 28 Global Step: 599130 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:13:34,831-Speed 6153.13 samples/sec Loss 3.5797 LearningRate 0.0001 Epoch: 28 Global Step: 599140 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:13:38,073-Speed 6318.83 samples/sec Loss 3.4946 LearningRate 0.0001 Epoch: 28 Global Step: 599150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:41,316-Speed 6315.77 samples/sec Loss 3.5624 LearningRate 0.0001 Epoch: 28 Global Step: 599160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:44,561-Speed 6313.30 samples/sec Loss 3.5711 LearningRate 0.0001 Epoch: 28 Global Step: 599170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:47,805-Speed 6314.13 samples/sec Loss 3.5907 LearningRate 0.0001 Epoch: 28 Global Step: 599180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:51,052-Speed 6308.69 samples/sec Loss 3.5894 LearningRate 0.0001 Epoch: 28 Global Step: 599190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:54,296-Speed 6314.88 samples/sec Loss 3.5064 LearningRate 0.0001 Epoch: 28 Global Step: 599200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:13:57,539-Speed 6316.37 samples/sec Loss 3.5926 LearningRate 0.0001 Epoch: 28 Global Step: 599210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:00,792-Speed 6298.00 samples/sec Loss 3.5507 LearningRate 0.0001 Epoch: 28 Global Step: 599220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:04,044-Speed 6298.65 samples/sec Loss 3.5020 LearningRate 0.0001 Epoch: 28 Global Step: 599230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:07,290-Speed 6310.95 samples/sec Loss 3.6093 LearningRate 0.0001 Epoch: 28 Global Step: 599240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:10,518-Speed 6345.45 samples/sec Loss 3.5419 LearningRate 0.0001 Epoch: 28 Global Step: 599250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:14:13,763-Speed 6315.43 samples/sec Loss 3.5934 LearningRate 0.0001 Epoch: 28 Global Step: 599260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:17,010-Speed 6308.81 samples/sec Loss 3.5502 LearningRate 0.0001 Epoch: 28 Global Step: 599270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:20,255-Speed 6313.39 samples/sec Loss 3.5557 LearningRate 0.0001 Epoch: 28 Global Step: 599280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:23,505-Speed 6303.83 samples/sec Loss 3.5747 LearningRate 0.0001 Epoch: 28 Global Step: 599290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:26,751-Speed 6310.37 samples/sec Loss 3.5811 LearningRate 0.0001 Epoch: 28 Global Step: 599300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:29,997-Speed 6311.75 samples/sec Loss 3.5134 LearningRate 0.0001 Epoch: 28 Global Step: 599310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:33,256-Speed 6284.38 samples/sec Loss 3.5621 LearningRate 0.0001 Epoch: 28 Global Step: 599320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:36,617-Speed 6094.91 samples/sec Loss 3.5718 LearningRate 0.0001 Epoch: 28 Global Step: 599330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:39,896-Speed 6247.10 samples/sec Loss 3.5952 LearningRate 0.0001 Epoch: 28 Global Step: 599340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:43,141-Speed 6313.45 samples/sec Loss 3.4902 LearningRate 0.0001 Epoch: 28 Global Step: 599350 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:14:46,387-Speed 6310.94 samples/sec Loss 3.5556 LearningRate 0.0001 Epoch: 28 Global Step: 599360 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:14:49,618-Speed 6338.41 samples/sec Loss 3.5794 LearningRate 0.0001 Epoch: 28 Global Step: 599370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:52,861-Speed 6317.25 samples/sec 
Loss 3.5721 LearningRate 0.0001 Epoch: 28 Global Step: 599380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:56,102-Speed 6321.13 samples/sec Loss 3.5200 LearningRate 0.0001 Epoch: 28 Global Step: 599390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:14:59,344-Speed 6317.83 samples/sec Loss 3.5395 LearningRate 0.0001 Epoch: 28 Global Step: 599400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:02,591-Speed 6308.13 samples/sec Loss 3.6047 LearningRate 0.0001 Epoch: 28 Global Step: 599410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:05,865-Speed 6256.27 samples/sec Loss 3.5341 LearningRate 0.0001 Epoch: 28 Global Step: 599420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:09,109-Speed 6316.14 samples/sec Loss 3.5259 LearningRate 0.0001 Epoch: 28 Global Step: 599430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:12,350-Speed 6319.00 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 28 Global Step: 599440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:15,593-Speed 6316.61 samples/sec Loss 3.6190 LearningRate 0.0001 Epoch: 28 Global Step: 599450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:18,843-Speed 6303.98 samples/sec Loss 3.5346 LearningRate 0.0001 Epoch: 28 Global Step: 599460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:22,070-Speed 6347.13 samples/sec Loss 3.6019 LearningRate 0.0001 Epoch: 28 Global Step: 599470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:25,311-Speed 6320.15 samples/sec Loss 3.5960 LearningRate 0.0001 Epoch: 28 Global Step: 599480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:28,556-Speed 6312.58 samples/sec Loss 3.5263 LearningRate 0.0001 Epoch: 28 Global Step: 599490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:31,803-Speed 6310.18 samples/sec Loss 3.5960 LearningRate 0.0001 Epoch: 28 
Global Step: 599500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:35,050-Speed 6309.40 samples/sec Loss 3.5986 LearningRate 0.0001 Epoch: 28 Global Step: 599510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:38,291-Speed 6319.27 samples/sec Loss 3.5996 LearningRate 0.0001 Epoch: 28 Global Step: 599520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:41,628-Speed 6138.75 samples/sec Loss 3.5237 LearningRate 0.0001 Epoch: 28 Global Step: 599530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:44,873-Speed 6313.31 samples/sec Loss 3.5612 LearningRate 0.0001 Epoch: 28 Global Step: 599540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:48,120-Speed 6309.25 samples/sec Loss 3.5480 LearningRate 0.0001 Epoch: 28 Global Step: 599550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:51,366-Speed 6309.86 samples/sec Loss 3.5679 LearningRate 0.0001 Epoch: 28 Global Step: 599560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:15:54,614-Speed 6306.25 samples/sec Loss 3.5371 LearningRate 0.0001 Epoch: 28 Global Step: 599570 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:15:57,844-Speed 6342.31 samples/sec Loss 3.5126 LearningRate 0.0001 Epoch: 28 Global Step: 599580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:01,092-Speed 6306.73 samples/sec Loss 3.6126 LearningRate 0.0001 Epoch: 28 Global Step: 599590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:04,339-Speed 6308.47 samples/sec Loss 3.5413 LearningRate 0.0001 Epoch: 28 Global Step: 599600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:07,584-Speed 6312.73 samples/sec Loss 3.5026 LearningRate 0.0001 Epoch: 28 Global Step: 599610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:10,831-Speed 6308.83 samples/sec Loss 3.5123 LearningRate 0.0001 Epoch: 28 Global Step: 599620 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:16:14,080-Speed 6305.54 samples/sec Loss 3.5950 LearningRate 0.0001 Epoch: 28 Global Step: 599630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:17,326-Speed 6310.99 samples/sec Loss 3.5673 LearningRate 0.0001 Epoch: 28 Global Step: 599640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:20,572-Speed 6309.93 samples/sec Loss 3.5583 LearningRate 0.0001 Epoch: 28 Global Step: 599650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:23,829-Speed 6290.28 samples/sec Loss 3.5681 LearningRate 0.0001 Epoch: 28 Global Step: 599660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:27,072-Speed 6315.07 samples/sec Loss 3.6131 LearningRate 0.0001 Epoch: 28 Global Step: 599670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:30,337-Speed 6275.94 samples/sec Loss 3.5781 LearningRate 0.0001 Epoch: 28 Global Step: 599680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:33,584-Speed 6308.07 samples/sec Loss 3.5534 LearningRate 0.0001 Epoch: 28 Global Step: 599690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:36,828-Speed 6313.40 samples/sec Loss 3.5902 LearningRate 0.0001 Epoch: 28 Global Step: 599700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:40,122-Speed 6219.11 samples/sec Loss 3.5484 LearningRate 0.0001 Epoch: 28 Global Step: 599710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:43,376-Speed 6295.89 samples/sec Loss 3.5795 LearningRate 0.0001 Epoch: 28 Global Step: 599720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:46,622-Speed 6310.03 samples/sec Loss 3.5754 LearningRate 0.0001 Epoch: 28 Global Step: 599730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:49,867-Speed 6312.86 samples/sec Loss 3.5826 LearningRate 0.0001 Epoch: 28 Global Step: 599740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:16:53,112-Speed 6313.23 samples/sec Loss 3.5751 LearningRate 0.0001 Epoch: 28 Global Step: 599750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:56,361-Speed 6304.84 samples/sec Loss 3.5209 LearningRate 0.0001 Epoch: 28 Global Step: 599760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:16:59,608-Speed 6310.45 samples/sec Loss 3.5612 LearningRate 0.0001 Epoch: 28 Global Step: 599770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:02,849-Speed 6319.21 samples/sec Loss 3.5774 LearningRate 0.0001 Epoch: 28 Global Step: 599780 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:17:06,078-Speed 6344.21 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 28 Global Step: 599790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:09,322-Speed 6314.20 samples/sec Loss 3.5615 LearningRate 0.0001 Epoch: 28 Global Step: 599800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:12,565-Speed 6316.54 samples/sec Loss 3.5456 LearningRate 0.0001 Epoch: 28 Global Step: 599810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:15,824-Speed 6285.96 samples/sec Loss 3.5455 LearningRate 0.0001 Epoch: 28 Global Step: 599820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:19,070-Speed 6309.35 samples/sec Loss 3.5791 LearningRate 0.0001 Epoch: 28 Global Step: 599830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:22,315-Speed 6314.19 samples/sec Loss 3.5860 LearningRate 0.0001 Epoch: 28 Global Step: 599840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:25,558-Speed 6316.60 samples/sec Loss 3.6232 LearningRate 0.0001 Epoch: 28 Global Step: 599850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:28,800-Speed 6318.21 samples/sec Loss 3.5087 LearningRate 0.0001 Epoch: 28 Global Step: 599860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:32,042-Speed 6319.20 samples/sec 
Loss 3.5534 LearningRate 0.0001 Epoch: 28 Global Step: 599870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:35,286-Speed 6313.72 samples/sec Loss 3.5497 LearningRate 0.0001 Epoch: 28 Global Step: 599880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:38,514-Speed 6345.55 samples/sec Loss 3.5424 LearningRate 0.0001 Epoch: 28 Global Step: 599890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:41,767-Speed 6296.30 samples/sec Loss 3.5801 LearningRate 0.0001 Epoch: 28 Global Step: 599900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:45,009-Speed 6320.30 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 28 Global Step: 599910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:48,256-Speed 6307.84 samples/sec Loss 3.5658 LearningRate 0.0001 Epoch: 28 Global Step: 599920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:51,499-Speed 6316.24 samples/sec Loss 3.5458 LearningRate 0.0001 Epoch: 28 Global Step: 599930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:54,746-Speed 6309.80 samples/sec Loss 3.5956 LearningRate 0.0001 Epoch: 28 Global Step: 599940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:17:57,993-Speed 6307.24 samples/sec Loss 3.5778 LearningRate 0.0001 Epoch: 28 Global Step: 599950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:01,246-Speed 6298.22 samples/sec Loss 3.5813 LearningRate 0.0001 Epoch: 28 Global Step: 599960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:04,489-Speed 6317.53 samples/sec Loss 3.5917 LearningRate 0.0001 Epoch: 28 Global Step: 599970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:07,735-Speed 6311.51 samples/sec Loss 3.5806 LearningRate 0.0001 Epoch: 28 Global Step: 599980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:10,982-Speed 6307.15 samples/sec Loss 3.5662 LearningRate 0.0001 Epoch: 28 
Global Step: 599990 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:18:14,212-Speed 6342.21 samples/sec Loss 3.5929 LearningRate 0.0001 Epoch: 28 Global Step: 600000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:17,457-Speed 6312.46 samples/sec Loss 3.6115 LearningRate 0.0001 Epoch: 28 Global Step: 600010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:20,701-Speed 6314.37 samples/sec Loss 3.5154 LearningRate 0.0001 Epoch: 28 Global Step: 600020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:23,945-Speed 6315.90 samples/sec Loss 3.5470 LearningRate 0.0001 Epoch: 28 Global Step: 600030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:27,193-Speed 6307.23 samples/sec Loss 3.5760 LearningRate 0.0001 Epoch: 28 Global Step: 600040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:30,438-Speed 6311.31 samples/sec Loss 3.5615 LearningRate 0.0001 Epoch: 28 Global Step: 600050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:33,680-Speed 6320.00 samples/sec Loss 3.4878 LearningRate 0.0001 Epoch: 28 Global Step: 600060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:36,931-Speed 6300.13 samples/sec Loss 3.5705 LearningRate 0.0001 Epoch: 28 Global Step: 600070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:40,178-Speed 6308.03 samples/sec Loss 3.5482 LearningRate 0.0001 Epoch: 28 Global Step: 600080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:43,422-Speed 6314.64 samples/sec Loss 3.6417 LearningRate 0.0001 Epoch: 28 Global Step: 600090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:46,669-Speed 6308.69 samples/sec Loss 3.5910 LearningRate 0.0001 Epoch: 28 Global Step: 600100 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:18:49,905-Speed 6329.74 samples/sec Loss 3.5691 LearningRate 0.0001 Epoch: 28 Global Step: 600110 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:18:53,153-Speed 6308.30 samples/sec Loss 3.6254 LearningRate 0.0001 Epoch: 28 Global Step: 600120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:56,396-Speed 6315.52 samples/sec Loss 3.5727 LearningRate 0.0001 Epoch: 28 Global Step: 600130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:18:59,652-Speed 6291.08 samples/sec Loss 3.5813 LearningRate 0.0001 Epoch: 28 Global Step: 600140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:02,902-Speed 6304.12 samples/sec Loss 3.5446 LearningRate 0.0001 Epoch: 28 Global Step: 600150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:06,150-Speed 6305.81 samples/sec Loss 3.6128 LearningRate 0.0001 Epoch: 28 Global Step: 600160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:09,398-Speed 6306.55 samples/sec Loss 3.5296 LearningRate 0.0001 Epoch: 28 Global Step: 600170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:12,643-Speed 6313.57 samples/sec Loss 3.5913 LearningRate 0.0001 Epoch: 28 Global Step: 600180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:15,894-Speed 6302.62 samples/sec Loss 3.5277 LearningRate 0.0001 Epoch: 28 Global Step: 600190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:19,139-Speed 6312.69 samples/sec Loss 3.5709 LearningRate 0.0001 Epoch: 28 Global Step: 600200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:22,384-Speed 6311.26 samples/sec Loss 3.5105 LearningRate 0.0001 Epoch: 28 Global Step: 600210 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:19:25,631-Speed 6308.70 samples/sec Loss 3.5436 LearningRate 0.0001 Epoch: 28 Global Step: 600220 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:19:28,982-Speed 6114.26 samples/sec Loss 3.5342 LearningRate 0.0001 Epoch: 28 Global Step: 600230 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 
22:19:32,248-Speed 6271.85 samples/sec Loss 3.5910 LearningRate 0.0001 Epoch: 28 Global Step: 600240 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:19:35,496-Speed 6306.34 samples/sec Loss 3.4804 LearningRate 0.0001 Epoch: 28 Global Step: 600250 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:19:38,739-Speed 6316.66 samples/sec Loss 3.5232 LearningRate 0.0001 Epoch: 28 Global Step: 600260 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:19:41,972-Speed 6335.12 samples/sec Loss 3.4977 LearningRate 0.0001 Epoch: 28 Global Step: 600270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:45,219-Speed 6309.77 samples/sec Loss 3.5919 LearningRate 0.0001 Epoch: 28 Global Step: 600280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:48,470-Speed 6301.35 samples/sec Loss 3.6001 LearningRate 0.0001 Epoch: 28 Global Step: 600290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:51,719-Speed 6304.02 samples/sec Loss 3.5850 LearningRate 0.0001 Epoch: 28 Global Step: 600300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:54,965-Speed 6311.39 samples/sec Loss 3.5182 LearningRate 0.0001 Epoch: 28 Global Step: 600310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:19:58,212-Speed 6307.43 samples/sec Loss 3.5419 LearningRate 0.0001 Epoch: 28 Global Step: 600320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:01,461-Speed 6305.44 samples/sec Loss 3.5762 LearningRate 0.0001 Epoch: 28 Global Step: 600330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:04,712-Speed 6300.38 samples/sec Loss 3.6127 LearningRate 0.0001 Epoch: 28 Global Step: 600340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:07,956-Speed 6316.04 samples/sec Loss 3.6252 LearningRate 0.0001 Epoch: 28 Global Step: 600350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:11,201-Speed 6312.50 samples/sec 
Loss 3.5455 LearningRate 0.0001 Epoch: 28 Global Step: 600360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:14,428-Speed 6347.49 samples/sec Loss 3.5273 LearningRate 0.0001 Epoch: 28 Global Step: 600370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:17,683-Speed 6293.93 samples/sec Loss 3.4881 LearningRate 0.0001 Epoch: 28 Global Step: 600380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:20,930-Speed 6308.05 samples/sec Loss 3.5757 LearningRate 0.0001 Epoch: 28 Global Step: 600390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:24,175-Speed 6313.88 samples/sec Loss 3.5496 LearningRate 0.0001 Epoch: 28 Global Step: 600400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:27,426-Speed 6301.27 samples/sec Loss 3.6112 LearningRate 0.0001 Epoch: 28 Global Step: 600410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:30,672-Speed 6309.68 samples/sec Loss 3.5482 LearningRate 0.0001 Epoch: 28 Global Step: 600420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:33,920-Speed 6307.94 samples/sec Loss 3.5464 LearningRate 0.0001 Epoch: 28 Global Step: 600430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:37,168-Speed 6307.19 samples/sec Loss 3.5855 LearningRate 0.0001 Epoch: 28 Global Step: 600440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:40,414-Speed 6310.31 samples/sec Loss 3.6038 LearningRate 0.0001 Epoch: 28 Global Step: 600450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:43,662-Speed 6306.94 samples/sec Loss 3.5966 LearningRate 0.0001 Epoch: 28 Global Step: 600460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:46,905-Speed 6315.83 samples/sec Loss 3.5060 LearningRate 0.0001 Epoch: 28 Global Step: 600470 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:20:50,139-Speed 6333.91 samples/sec Loss 3.5559 LearningRate 0.0001 Epoch: 28 
Global Step: 600480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:53,385-Speed 6310.92 samples/sec Loss 3.5253 LearningRate 0.0001 Epoch: 28 Global Step: 600490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:56,628-Speed 6316.28 samples/sec Loss 3.4530 LearningRate 0.0001 Epoch: 28 Global Step: 600500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:20:59,875-Speed 6309.02 samples/sec Loss 3.5182 LearningRate 0.0001 Epoch: 28 Global Step: 600510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:03,122-Speed 6309.37 samples/sec Loss 3.5445 LearningRate 0.0001 Epoch: 28 Global Step: 600520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:06,367-Speed 6313.43 samples/sec Loss 3.5758 LearningRate 0.0001 Epoch: 28 Global Step: 600530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:09,635-Speed 6267.86 samples/sec Loss 3.5654 LearningRate 0.0001 Epoch: 28 Global Step: 600540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:12,899-Speed 6275.10 samples/sec Loss 3.5535 LearningRate 0.0001 Epoch: 28 Global Step: 600550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:16,141-Speed 6318.89 samples/sec Loss 3.5667 LearningRate 0.0001 Epoch: 28 Global Step: 600560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:19,384-Speed 6315.34 samples/sec Loss 3.5459 LearningRate 0.0001 Epoch: 28 Global Step: 600570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:22,646-Speed 6280.53 samples/sec Loss 3.5267 LearningRate 0.0001 Epoch: 28 Global Step: 600580 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:21:25,878-Speed 6337.03 samples/sec Loss 3.5394 LearningRate 0.0001 Epoch: 28 Global Step: 600590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:29,123-Speed 6313.85 samples/sec Loss 3.5591 LearningRate 0.0001 Epoch: 28 Global Step: 600600 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:21:32,369-Speed 6311.28 samples/sec Loss 3.5490 LearningRate 0.0001 Epoch: 28 Global Step: 600610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:35,613-Speed 6315.16 samples/sec Loss 3.5632 LearningRate 0.0001 Epoch: 28 Global Step: 600620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:38,961-Speed 6118.29 samples/sec Loss 3.5376 LearningRate 0.0001 Epoch: 28 Global Step: 600630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:42,202-Speed 6319.29 samples/sec Loss 3.5530 LearningRate 0.0001 Epoch: 28 Global Step: 600640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:45,446-Speed 6315.00 samples/sec Loss 3.5459 LearningRate 0.0001 Epoch: 28 Global Step: 600650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:48,689-Speed 6317.47 samples/sec Loss 3.6514 LearningRate 0.0001 Epoch: 28 Global Step: 600660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:51,937-Speed 6306.30 samples/sec Loss 3.5782 LearningRate 0.0001 Epoch: 28 Global Step: 600670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:55,184-Speed 6308.46 samples/sec Loss 3.5308 LearningRate 0.0001 Epoch: 28 Global Step: 600680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:21:58,417-Speed 6336.87 samples/sec Loss 3.5052 LearningRate 0.0001 Epoch: 28 Global Step: 600690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:01,665-Speed 6305.36 samples/sec Loss 3.6131 LearningRate 0.0001 Epoch: 28 Global Step: 600700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:04,910-Speed 6314.48 samples/sec Loss 3.5717 LearningRate 0.0001 Epoch: 28 Global Step: 600710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:08,156-Speed 6310.06 samples/sec Loss 3.5647 LearningRate 0.0001 Epoch: 28 Global Step: 600720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:22:11,400-Speed 6313.38 samples/sec Loss 3.5126 LearningRate 0.0001 Epoch: 28 Global Step: 600730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:14,646-Speed 6311.14 samples/sec Loss 3.5886 LearningRate 0.0001 Epoch: 28 Global Step: 600740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:17,889-Speed 6317.11 samples/sec Loss 3.6293 LearningRate 0.0001 Epoch: 28 Global Step: 600750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:21,136-Speed 6308.79 samples/sec Loss 3.5833 LearningRate 0.0001 Epoch: 28 Global Step: 600760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:24,398-Speed 6279.71 samples/sec Loss 3.5479 LearningRate 0.0001 Epoch: 28 Global Step: 600770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:27,643-Speed 6312.68 samples/sec Loss 3.5267 LearningRate 0.0001 Epoch: 28 Global Step: 600780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:30,876-Speed 6336.41 samples/sec Loss 3.5701 LearningRate 0.0001 Epoch: 28 Global Step: 600790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:34,122-Speed 6310.44 samples/sec Loss 3.5613 LearningRate 0.0001 Epoch: 28 Global Step: 600800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:37,368-Speed 6310.84 samples/sec Loss 3.5187 LearningRate 0.0001 Epoch: 28 Global Step: 600810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:40,613-Speed 6312.37 samples/sec Loss 3.5660 LearningRate 0.0001 Epoch: 28 Global Step: 600820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:43,859-Speed 6311.59 samples/sec Loss 3.5403 LearningRate 0.0001 Epoch: 28 Global Step: 600830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:47,113-Speed 6294.51 samples/sec Loss 3.5482 LearningRate 0.0001 Epoch: 28 Global Step: 600840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:50,363-Speed 6304.06 samples/sec Loss 
3.5815 LearningRate 0.0001 Epoch: 28 Global Step: 600850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:53,609-Speed 6310.67 samples/sec Loss 3.4916 LearningRate 0.0001 Epoch: 28 Global Step: 600860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:22:56,859-Speed 6303.20 samples/sec Loss 3.5053 LearningRate 0.0001 Epoch: 28 Global Step: 600870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:00,107-Speed 6307.41 samples/sec Loss 3.5518 LearningRate 0.0001 Epoch: 28 Global Step: 600880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:03,356-Speed 6303.96 samples/sec Loss 3.5012 LearningRate 0.0001 Epoch: 28 Global Step: 600890 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:23:06,598-Speed 6319.58 samples/sec Loss 3.5708 LearningRate 0.0001 Epoch: 28 Global Step: 600900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:09,846-Speed 6306.75 samples/sec Loss 3.5581 LearningRate 0.0001 Epoch: 28 Global Step: 600910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:13,097-Speed 6300.87 samples/sec Loss 3.5335 LearningRate 0.0001 Epoch: 28 Global Step: 600920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:16,342-Speed 6311.98 samples/sec Loss 3.5281 LearningRate 0.0001 Epoch: 28 Global Step: 600930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:19,588-Speed 6310.29 samples/sec Loss 3.5674 LearningRate 0.0001 Epoch: 28 Global Step: 600940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:22,835-Speed 6309.87 samples/sec Loss 3.5086 LearningRate 0.0001 Epoch: 28 Global Step: 600950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:26,085-Speed 6301.25 samples/sec Loss 3.5281 LearningRate 0.0001 Epoch: 28 Global Step: 600960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:29,331-Speed 6312.76 samples/sec Loss 3.5861 LearningRate 0.0001 Epoch: 28 
Global Step: 600970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:32,585-Speed 6294.96 samples/sec Loss 3.5152 LearningRate 0.0001 Epoch: 28 Global Step: 600980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:35,834-Speed 6305.04 samples/sec Loss 3.5576 LearningRate 0.0001 Epoch: 28 Global Step: 600990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:39,084-Speed 6302.99 samples/sec Loss 3.5384 LearningRate 0.0001 Epoch: 28 Global Step: 601000 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:23:42,317-Speed 6335.26 samples/sec Loss 3.5446 LearningRate 0.0001 Epoch: 28 Global Step: 601010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:45,566-Speed 6305.07 samples/sec Loss 3.6459 LearningRate 0.0001 Epoch: 28 Global Step: 601020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:48,829-Speed 6277.12 samples/sec Loss 3.5592 LearningRate 0.0001 Epoch: 28 Global Step: 601030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:52,076-Speed 6308.31 samples/sec Loss 3.6179 LearningRate 0.0001 Epoch: 28 Global Step: 601040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:55,324-Speed 6307.73 samples/sec Loss 3.5917 LearningRate 0.0001 Epoch: 28 Global Step: 601050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:23:58,572-Speed 6307.91 samples/sec Loss 3.5357 LearningRate 0.0001 Epoch: 28 Global Step: 601060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:01,820-Speed 6306.49 samples/sec Loss 3.6152 LearningRate 0.0001 Epoch: 28 Global Step: 601070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:05,066-Speed 6310.23 samples/sec Loss 3.5469 LearningRate 0.0001 Epoch: 28 Global Step: 601080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:08,312-Speed 6311.47 samples/sec Loss 3.5428 LearningRate 0.0001 Epoch: 28 Global Step: 601090 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:24:11,570-Speed 6287.17 samples/sec Loss 3.5949 LearningRate 0.0001 Epoch: 28 Global Step: 601100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:14,823-Speed 6296.45 samples/sec Loss 3.5491 LearningRate 0.0001 Epoch: 28 Global Step: 601110 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:24:18,057-Speed 6335.10 samples/sec Loss 3.5215 LearningRate 0.0001 Epoch: 28 Global Step: 601120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:21,302-Speed 6311.58 samples/sec Loss 3.5540 LearningRate 0.0001 Epoch: 28 Global Step: 601130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:24,547-Speed 6313.10 samples/sec Loss 3.5978 LearningRate 0.0001 Epoch: 28 Global Step: 601140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:27,796-Speed 6305.04 samples/sec Loss 3.5928 LearningRate 0.0001 Epoch: 28 Global Step: 601150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:31,041-Speed 6313.86 samples/sec Loss 3.5545 LearningRate 0.0001 Epoch: 28 Global Step: 601160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:34,289-Speed 6305.48 samples/sec Loss 3.5969 LearningRate 0.0001 Epoch: 28 Global Step: 601170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:37,535-Speed 6311.33 samples/sec Loss 3.5338 LearningRate 0.0001 Epoch: 28 Global Step: 601180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:40,788-Speed 6296.40 samples/sec Loss 3.5287 LearningRate 0.0001 Epoch: 28 Global Step: 601190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:44,032-Speed 6314.15 samples/sec Loss 3.5796 LearningRate 0.0001 Epoch: 28 Global Step: 601200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:47,280-Speed 6308.10 samples/sec Loss 3.5611 LearningRate 0.0001 Epoch: 28 Global Step: 601210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:24:50,511-Speed 6338.92 samples/sec Loss 3.5565 LearningRate 0.0001 Epoch: 28 Global Step: 601220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:53,762-Speed 6301.40 samples/sec Loss 3.5492 LearningRate 0.0001 Epoch: 28 Global Step: 601230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:24:57,007-Speed 6313.44 samples/sec Loss 3.5596 LearningRate 0.0001 Epoch: 28 Global Step: 601240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:00,252-Speed 6312.39 samples/sec Loss 3.5715 LearningRate 0.0001 Epoch: 28 Global Step: 601250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:03,502-Speed 6302.44 samples/sec Loss 3.5316 LearningRate 0.0001 Epoch: 28 Global Step: 601260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:06,750-Speed 6307.98 samples/sec Loss 3.6194 LearningRate 0.0001 Epoch: 28 Global Step: 601270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:09,995-Speed 6312.63 samples/sec Loss 3.5857 LearningRate 0.0001 Epoch: 28 Global Step: 601280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:13,240-Speed 6312.09 samples/sec Loss 3.5375 LearningRate 0.0001 Epoch: 28 Global Step: 601290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:16,488-Speed 6307.87 samples/sec Loss 3.5931 LearningRate 0.0001 Epoch: 28 Global Step: 601300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:19,732-Speed 6314.33 samples/sec Loss 3.6225 LearningRate 0.0001 Epoch: 28 Global Step: 601310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:22,977-Speed 6312.55 samples/sec Loss 3.6033 LearningRate 0.0001 Epoch: 28 Global Step: 601320 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:25:26,323-Speed 6121.92 samples/sec Loss 3.5644 LearningRate 0.0001 Epoch: 28 Global Step: 601330 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:25:29,614-Speed 6225.48 samples/sec 
Loss 3.5823 LearningRate 0.0001 Epoch: 28 Global Step: 601340 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:25:32,846-Speed 6337.70 samples/sec Loss 3.5556 LearningRate 0.0001 Epoch: 28 Global Step: 601350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:36,092-Speed 6311.17 samples/sec Loss 3.5184 LearningRate 0.0001 Epoch: 28 Global Step: 601360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:39,335-Speed 6315.79 samples/sec Loss 3.5022 LearningRate 0.0001 Epoch: 28 Global Step: 601370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:42,584-Speed 6304.86 samples/sec Loss 3.5520 LearningRate 0.0001 Epoch: 28 Global Step: 601380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:45,828-Speed 6314.27 samples/sec Loss 3.5528 LearningRate 0.0001 Epoch: 28 Global Step: 601390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:49,072-Speed 6315.48 samples/sec Loss 3.5314 LearningRate 0.0001 Epoch: 28 Global Step: 601400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:52,316-Speed 6314.07 samples/sec Loss 3.5906 LearningRate 0.0001 Epoch: 28 Global Step: 601410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:55,561-Speed 6311.42 samples/sec Loss 3.5655 LearningRate 0.0001 Epoch: 28 Global Step: 601420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:25:58,804-Speed 6318.60 samples/sec Loss 3.5412 LearningRate 0.0001 Epoch: 28 Global Step: 601430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:26:02,052-Speed 6305.08 samples/sec Loss 3.5461 LearningRate 0.0001 Epoch: 28 Global Step: 601440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:03,897-Speed 331.16 samples/sec Loss 3.5338 LearningRate 0.0001 Epoch: 29 Global Step: 601450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:07,154-Speed 6288.89 samples/sec Loss 3.6333 LearningRate 0.0001 Epoch: 29 
Global Step: 601460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:10,386-Speed 6337.84 samples/sec Loss 3.5713 LearningRate 0.0001 Epoch: 29 Global Step: 601470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:13,649-Speed 6278.14 samples/sec Loss 3.6022 LearningRate 0.0001 Epoch: 29 Global Step: 601480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:16,889-Speed 6323.14 samples/sec Loss 3.5617 LearningRate 0.0001 Epoch: 29 Global Step: 601490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:20,125-Speed 6329.85 samples/sec Loss 3.5514 LearningRate 0.0001 Epoch: 29 Global Step: 601500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:23,362-Speed 6329.44 samples/sec Loss 3.5278 LearningRate 0.0001 Epoch: 29 Global Step: 601510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:26,602-Speed 6322.12 samples/sec Loss 3.5163 LearningRate 0.0001 Epoch: 29 Global Step: 601520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:29,843-Speed 6320.11 samples/sec Loss 3.5425 LearningRate 0.0001 Epoch: 29 Global Step: 601530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:33,076-Speed 6335.87 samples/sec Loss 3.5961 LearningRate 0.0001 Epoch: 29 Global Step: 601540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:36,304-Speed 6346.60 samples/sec Loss 3.5375 LearningRate 0.0001 Epoch: 29 Global Step: 601550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:39,541-Speed 6327.87 samples/sec Loss 3.5482 LearningRate 0.0001 Epoch: 29 Global Step: 601560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:42,781-Speed 6322.94 samples/sec Loss 3.5061 LearningRate 0.0001 Epoch: 29 Global Step: 601570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:46,025-Speed 6314.99 samples/sec Loss 3.5920 LearningRate 0.0001 Epoch: 29 Global Step: 601580 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:27:49,265-Speed 6320.91 samples/sec Loss 3.5492 LearningRate 0.0001 Epoch: 29 Global Step: 601590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:52,501-Speed 6330.45 samples/sec Loss 3.5494 LearningRate 0.0001 Epoch: 29 Global Step: 601600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:55,736-Speed 6331.47 samples/sec Loss 3.5305 LearningRate 0.0001 Epoch: 29 Global Step: 601610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:27:58,984-Speed 6307.93 samples/sec Loss 3.5912 LearningRate 0.0001 Epoch: 29 Global Step: 601620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:02,229-Speed 6313.47 samples/sec Loss 3.5581 LearningRate 0.0001 Epoch: 29 Global Step: 601630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:05,474-Speed 6311.99 samples/sec Loss 3.5611 LearningRate 0.0001 Epoch: 29 Global Step: 601640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:08,712-Speed 6326.14 samples/sec Loss 3.5107 LearningRate 0.0001 Epoch: 29 Global Step: 601650 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:28:11,937-Speed 6351.60 samples/sec Loss 3.5493 LearningRate 0.0001 Epoch: 29 Global Step: 601660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:15,177-Speed 6322.24 samples/sec Loss 3.5838 LearningRate 0.0001 Epoch: 29 Global Step: 601670 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:18,416-Speed 6324.02 samples/sec Loss 3.5844 LearningRate 0.0001 Epoch: 29 Global Step: 601680 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:21,655-Speed 6324.18 samples/sec Loss 3.5197 LearningRate 0.0001 Epoch: 29 Global Step: 601690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:24,901-Speed 6312.15 samples/sec Loss 3.5238 LearningRate 0.0001 Epoch: 29 Global Step: 601700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:28:28,137-Speed 6329.48 samples/sec Loss 3.5186 LearningRate 0.0001 Epoch: 29 Global Step: 601710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:31,375-Speed 6327.44 samples/sec Loss 3.5144 LearningRate 0.0001 Epoch: 29 Global Step: 601720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:34,613-Speed 6326.77 samples/sec Loss 3.5324 LearningRate 0.0001 Epoch: 29 Global Step: 601730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:37,851-Speed 6325.99 samples/sec Loss 3.5276 LearningRate 0.0001 Epoch: 29 Global Step: 601740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:41,088-Speed 6328.50 samples/sec Loss 3.5166 LearningRate 0.0001 Epoch: 29 Global Step: 601750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:44,310-Speed 6356.51 samples/sec Loss 3.5344 LearningRate 0.0001 Epoch: 29 Global Step: 601760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:47,549-Speed 6325.82 samples/sec Loss 3.5594 LearningRate 0.0001 Epoch: 29 Global Step: 601770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:50,788-Speed 6322.84 samples/sec Loss 3.5457 LearningRate 0.0001 Epoch: 29 Global Step: 601780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:54,024-Speed 6330.72 samples/sec Loss 3.4938 LearningRate 0.0001 Epoch: 29 Global Step: 601790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:28:57,260-Speed 6330.93 samples/sec Loss 3.5148 LearningRate 0.0001 Epoch: 29 Global Step: 601800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:00,505-Speed 6312.21 samples/sec Loss 3.5495 LearningRate 0.0001 Epoch: 29 Global Step: 601810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:03,750-Speed 6312.36 samples/sec Loss 3.5543 LearningRate 0.0001 Epoch: 29 Global Step: 601820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:06,988-Speed 6326.67 samples/sec Loss 
3.5679 LearningRate 0.0001 Epoch: 29 Global Step: 601830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:10,223-Speed 6332.17 samples/sec Loss 3.5924 LearningRate 0.0001 Epoch: 29 Global Step: 601840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:13,456-Speed 6336.70 samples/sec Loss 3.5564 LearningRate 0.0001 Epoch: 29 Global Step: 601850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:16,693-Speed 6327.64 samples/sec Loss 3.5878 LearningRate 0.0001 Epoch: 29 Global Step: 601860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:19,931-Speed 6326.82 samples/sec Loss 3.5205 LearningRate 0.0001 Epoch: 29 Global Step: 601870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:23,169-Speed 6324.76 samples/sec Loss 3.4842 LearningRate 0.0001 Epoch: 29 Global Step: 601880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:26,407-Speed 6326.32 samples/sec Loss 3.5606 LearningRate 0.0001 Epoch: 29 Global Step: 601890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:29,641-Speed 6335.25 samples/sec Loss 3.5742 LearningRate 0.0001 Epoch: 29 Global Step: 601900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:32,877-Speed 6329.26 samples/sec Loss 3.5147 LearningRate 0.0001 Epoch: 29 Global Step: 601910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:36,113-Speed 6331.11 samples/sec Loss 3.5714 LearningRate 0.0001 Epoch: 29 Global Step: 601920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:39,346-Speed 6336.78 samples/sec Loss 3.5511 LearningRate 0.0001 Epoch: 29 Global Step: 601930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:42,584-Speed 6325.70 samples/sec Loss 3.5460 LearningRate 0.0001 Epoch: 29 Global Step: 601940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:45,819-Speed 6333.75 samples/sec Loss 3.6002 LearningRate 0.0001 Epoch: 29 Global 
Step: 601950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:49,055-Speed 6328.85 samples/sec Loss 3.5710 LearningRate 0.0001 Epoch: 29 Global Step: 601960 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:29:52,275-Speed 6361.82 samples/sec Loss 3.5897 LearningRate 0.0001 Epoch: 29 Global Step: 601970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:55,513-Speed 6327.14 samples/sec Loss 3.5084 LearningRate 0.0001 Epoch: 29 Global Step: 601980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:29:58,755-Speed 6319.04 samples/sec Loss 3.5217 LearningRate 0.0001 Epoch: 29 Global Step: 601990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:02,065-Speed 6187.51 samples/sec Loss 3.4996 LearningRate 0.0001 Epoch: 29 Global Step: 602000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:05,311-Speed 6311.79 samples/sec Loss 3.5611 LearningRate 0.0001 Epoch: 29 Global Step: 602010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:08,545-Speed 6332.34 samples/sec Loss 3.5270 LearningRate 0.0001 Epoch: 29 Global Step: 602020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:11,786-Speed 6321.86 samples/sec Loss 3.5292 LearningRate 0.0001 Epoch: 29 Global Step: 602030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:15,019-Speed 6335.16 samples/sec Loss 3.5268 LearningRate 0.0001 Epoch: 29 Global Step: 602040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:18,259-Speed 6323.63 samples/sec Loss 3.4863 LearningRate 0.0001 Epoch: 29 Global Step: 602050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:21,496-Speed 6329.79 samples/sec Loss 3.5646 LearningRate 0.0001 Epoch: 29 Global Step: 602060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:24,734-Speed 6325.81 samples/sec Loss 3.5207 LearningRate 0.0001 Epoch: 29 Global Step: 602070 Fp16 Grad Scale: 16384 
Required: 21 hours Training: 2022-04-02 22:30:27,959-Speed 6351.34 samples/sec Loss 3.6204 LearningRate 0.0001 Epoch: 29 Global Step: 602080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:31,207-Speed 6307.55 samples/sec Loss 3.5881 LearningRate 0.0001 Epoch: 29 Global Step: 602090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:34,452-Speed 6311.55 samples/sec Loss 3.5662 LearningRate 0.0001 Epoch: 29 Global Step: 602100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:37,690-Speed 6326.21 samples/sec Loss 3.5410 LearningRate 0.0001 Epoch: 29 Global Step: 602110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:40,925-Speed 6331.63 samples/sec Loss 3.5616 LearningRate 0.0001 Epoch: 29 Global Step: 602120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:44,168-Speed 6316.82 samples/sec Loss 3.5349 LearningRate 0.0001 Epoch: 29 Global Step: 602130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:47,404-Speed 6331.09 samples/sec Loss 3.6036 LearningRate 0.0001 Epoch: 29 Global Step: 602140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:50,642-Speed 6325.77 samples/sec Loss 3.5246 LearningRate 0.0001 Epoch: 29 Global Step: 602150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:53,879-Speed 6328.23 samples/sec Loss 3.5404 LearningRate 0.0001 Epoch: 29 Global Step: 602160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:30:57,131-Speed 6300.92 samples/sec Loss 3.5255 LearningRate 0.0001 Epoch: 29 Global Step: 602170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:00,353-Speed 6356.44 samples/sec Loss 3.5047 LearningRate 0.0001 Epoch: 29 Global Step: 602180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:03,626-Speed 6258.71 samples/sec Loss 3.5570 LearningRate 0.0001 Epoch: 29 Global Step: 602190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:31:06,866-Speed 6322.08 samples/sec Loss 3.5689 LearningRate 0.0001 Epoch: 29 Global Step: 602200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:10,103-Speed 6328.90 samples/sec Loss 3.5373 LearningRate 0.0001 Epoch: 29 Global Step: 602210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:13,344-Speed 6319.90 samples/sec Loss 3.5675 LearningRate 0.0001 Epoch: 29 Global Step: 602220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:16,584-Speed 6323.22 samples/sec Loss 3.5631 LearningRate 0.0001 Epoch: 29 Global Step: 602230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:19,819-Speed 6333.23 samples/sec Loss 3.5728 LearningRate 0.0001 Epoch: 29 Global Step: 602240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:23,056-Speed 6328.52 samples/sec Loss 3.5953 LearningRate 0.0001 Epoch: 29 Global Step: 602250 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:31:26,280-Speed 6353.15 samples/sec Loss 3.5521 LearningRate 0.0001 Epoch: 29 Global Step: 602260 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:29,517-Speed 6328.23 samples/sec Loss 3.5373 LearningRate 0.0001 Epoch: 29 Global Step: 602270 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:32,756-Speed 6323.45 samples/sec Loss 3.5736 LearningRate 0.0001 Epoch: 29 Global Step: 602280 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:35,991-Speed 6333.42 samples/sec Loss 3.6034 LearningRate 0.0001 Epoch: 29 Global Step: 602290 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:39,234-Speed 6317.16 samples/sec Loss 3.5634 LearningRate 0.0001 Epoch: 29 Global Step: 602300 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:42,483-Speed 6304.55 samples/sec Loss 3.5014 LearningRate 0.0001 Epoch: 29 Global Step: 602310 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:45,721-Speed 6326.33 samples/sec Loss 
3.5331 LearningRate 0.0001 Epoch: 29 Global Step: 602320 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:48,957-Speed 6329.13 samples/sec Loss 3.5380 LearningRate 0.0001 Epoch: 29 Global Step: 602330 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:52,196-Speed 6323.94 samples/sec Loss 3.5089 LearningRate 0.0001 Epoch: 29 Global Step: 602340 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:55,437-Speed 6320.91 samples/sec Loss 3.5458 LearningRate 0.0001 Epoch: 29 Global Step: 602350 Fp16 Grad Scale: 4096 Required: 21 hours Training: 2022-04-02 22:31:58,679-Speed 6318.01 samples/sec Loss 3.5330 LearningRate 0.0001 Epoch: 29 Global Step: 602360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:01,920-Speed 6321.62 samples/sec Loss 3.5873 LearningRate 0.0001 Epoch: 29 Global Step: 602370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:05,168-Speed 6306.19 samples/sec Loss 3.5497 LearningRate 0.0001 Epoch: 29 Global Step: 602380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:08,408-Speed 6322.27 samples/sec Loss 3.5439 LearningRate 0.0001 Epoch: 29 Global Step: 602390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:11,647-Speed 6325.20 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 29 Global Step: 602400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:14,884-Speed 6329.05 samples/sec Loss 3.5055 LearningRate 0.0001 Epoch: 29 Global Step: 602410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:18,124-Speed 6322.25 samples/sec Loss 3.5255 LearningRate 0.0001 Epoch: 29 Global Step: 602420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:21,362-Speed 6326.30 samples/sec Loss 3.5466 LearningRate 0.0001 Epoch: 29 Global Step: 602430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:24,599-Speed 6327.06 samples/sec Loss 3.5691 LearningRate 0.0001 Epoch: 29 Global 
Step: 602440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:27,835-Speed 6330.89 samples/sec Loss 3.5416 LearningRate 0.0001 Epoch: 29 Global Step: 602450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:31,062-Speed 6348.82 samples/sec Loss 3.5578 LearningRate 0.0001 Epoch: 29 Global Step: 602460 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:34,299-Speed 6328.24 samples/sec Loss 3.5501 LearningRate 0.0001 Epoch: 29 Global Step: 602470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:37,535-Speed 6329.86 samples/sec Loss 3.5176 LearningRate 0.0001 Epoch: 29 Global Step: 602480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:40,775-Speed 6321.49 samples/sec Loss 3.5585 LearningRate 0.0001 Epoch: 29 Global Step: 602490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:44,014-Speed 6324.28 samples/sec Loss 3.5184 LearningRate 0.0001 Epoch: 29 Global Step: 602500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:47,260-Speed 6312.04 samples/sec Loss 3.5514 LearningRate 0.0001 Epoch: 29 Global Step: 602510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:50,496-Speed 6330.02 samples/sec Loss 3.5963 LearningRate 0.0001 Epoch: 29 Global Step: 602520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:53,733-Speed 6326.99 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 29 Global Step: 602530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:32:56,973-Speed 6321.94 samples/sec Loss 3.5109 LearningRate 0.0001 Epoch: 29 Global Step: 602540 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:00,216-Speed 6317.32 samples/sec Loss 3.5608 LearningRate 0.0001 Epoch: 29 Global Step: 602550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:03,462-Speed 6311.83 samples/sec Loss 3.5997 LearningRate 0.0001 Epoch: 29 Global Step: 602560 Fp16 Grad Scale: 16384 
Required: 21 hours Training: 2022-04-02 22:33:06,753-Speed 6223.47 samples/sec Loss 3.5111 LearningRate 0.0001 Epoch: 29 Global Step: 602570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:09,997-Speed 6314.16 samples/sec Loss 3.5153 LearningRate 0.0001 Epoch: 29 Global Step: 602580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:13,239-Speed 6318.36 samples/sec Loss 3.5452 LearningRate 0.0001 Epoch: 29 Global Step: 602590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:16,476-Speed 6329.93 samples/sec Loss 3.5193 LearningRate 0.0001 Epoch: 29 Global Step: 602600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:19,717-Speed 6319.62 samples/sec Loss 3.5512 LearningRate 0.0001 Epoch: 29 Global Step: 602610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:22,963-Speed 6310.64 samples/sec Loss 3.5533 LearningRate 0.0001 Epoch: 29 Global Step: 602620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:26,201-Speed 6327.44 samples/sec Loss 3.5699 LearningRate 0.0001 Epoch: 29 Global Step: 602630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:29,440-Speed 6323.98 samples/sec Loss 3.5505 LearningRate 0.0001 Epoch: 29 Global Step: 602640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:32,683-Speed 6317.50 samples/sec Loss 3.5552 LearningRate 0.0001 Epoch: 29 Global Step: 602650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:35,927-Speed 6313.18 samples/sec Loss 3.5564 LearningRate 0.0001 Epoch: 29 Global Step: 602660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:39,166-Speed 6325.00 samples/sec Loss 3.5769 LearningRate 0.0001 Epoch: 29 Global Step: 602670 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:33:42,409-Speed 6317.38 samples/sec Loss 3.5268 LearningRate 0.0001 Epoch: 29 Global Step: 602680 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 
22:33:45,637-Speed 6346.30 samples/sec Loss 3.5198 LearningRate 0.0001 Epoch: 29 Global Step: 602690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:48,879-Speed 6318.93 samples/sec Loss 3.5282 LearningRate 0.0001 Epoch: 29 Global Step: 602700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:52,124-Speed 6313.05 samples/sec Loss 3.5249 LearningRate 0.0001 Epoch: 29 Global Step: 602710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:55,361-Speed 6326.91 samples/sec Loss 3.5528 LearningRate 0.0001 Epoch: 29 Global Step: 602720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:33:58,600-Speed 6325.90 samples/sec Loss 3.5066 LearningRate 0.0001 Epoch: 29 Global Step: 602730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:01,840-Speed 6322.01 samples/sec Loss 3.6043 LearningRate 0.0001 Epoch: 29 Global Step: 602740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:05,082-Speed 6318.44 samples/sec Loss 3.4554 LearningRate 0.0001 Epoch: 29 Global Step: 602750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:08,323-Speed 6319.48 samples/sec Loss 3.5377 LearningRate 0.0001 Epoch: 29 Global Step: 602760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:11,563-Speed 6323.25 samples/sec Loss 3.5403 LearningRate 0.0001 Epoch: 29 Global Step: 602770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:14,807-Speed 6314.69 samples/sec Loss 3.5801 LearningRate 0.0001 Epoch: 29 Global Step: 602780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:18,039-Speed 6337.78 samples/sec Loss 3.5576 LearningRate 0.0001 Epoch: 29 Global Step: 602790 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:21,277-Speed 6326.17 samples/sec Loss 3.4772 LearningRate 0.0001 Epoch: 29 Global Step: 602800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:24,517-Speed 6322.37 samples/sec Loss 
3.4837 LearningRate 0.0001 Epoch: 29 Global Step: 602810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:27,754-Speed 6328.53 samples/sec Loss 3.5648 LearningRate 0.0001 Epoch: 29 Global Step: 602820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:30,995-Speed 6319.11 samples/sec Loss 3.5613 LearningRate 0.0001 Epoch: 29 Global Step: 602830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:34,237-Speed 6321.07 samples/sec Loss 3.5114 LearningRate 0.0001 Epoch: 29 Global Step: 602840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:37,476-Speed 6322.97 samples/sec Loss 3.5068 LearningRate 0.0001 Epoch: 29 Global Step: 602850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:40,717-Speed 6321.04 samples/sec Loss 3.5305 LearningRate 0.0001 Epoch: 29 Global Step: 602860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:43,958-Speed 6319.78 samples/sec Loss 3.5025 LearningRate 0.0001 Epoch: 29 Global Step: 602870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:47,197-Speed 6324.61 samples/sec Loss 3.5619 LearningRate 0.0001 Epoch: 29 Global Step: 602880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:34:50,442-Speed 6313.39 samples/sec Loss 3.5713 LearningRate 0.0001 Epoch: 29 Global Step: 602890 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:34:53,686-Speed 6314.74 samples/sec Loss 3.5297 LearningRate 0.0001 Epoch: 29 Global Step: 602900 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:34:56,928-Speed 6318.61 samples/sec Loss 3.5479 LearningRate 0.0001 Epoch: 29 Global Step: 602910 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:35:00,171-Speed 6316.94 samples/sec Loss 3.5235 LearningRate 0.0001 Epoch: 29 Global Step: 602920 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:35:03,400-Speed 6344.08 samples/sec Loss 3.5385 LearningRate 0.0001 Epoch: 29 
Global Step: 602930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:06,643-Speed 6315.22 samples/sec Loss 3.5795 LearningRate 0.0001 Epoch: 29 Global Step: 602940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:09,886-Speed 6317.96 samples/sec Loss 3.5272 LearningRate 0.0001 Epoch: 29 Global Step: 602950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:13,125-Speed 6322.38 samples/sec Loss 3.5386 LearningRate 0.0001 Epoch: 29 Global Step: 602960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:16,368-Speed 6316.54 samples/sec Loss 3.4788 LearningRate 0.0001 Epoch: 29 Global Step: 602970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:19,608-Speed 6323.56 samples/sec Loss 3.4929 LearningRate 0.0001 Epoch: 29 Global Step: 602980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:22,853-Speed 6311.65 samples/sec Loss 3.5331 LearningRate 0.0001 Epoch: 29 Global Step: 602990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:26,094-Speed 6320.50 samples/sec Loss 3.5560 LearningRate 0.0001 Epoch: 29 Global Step: 603000 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:29,343-Speed 6305.36 samples/sec Loss 3.5297 LearningRate 0.0001 Epoch: 29 Global Step: 603010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:32,599-Speed 6292.71 samples/sec Loss 3.5047 LearningRate 0.0001 Epoch: 29 Global Step: 603020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:35,824-Speed 6350.89 samples/sec Loss 3.5605 LearningRate 0.0001 Epoch: 29 Global Step: 603030 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:39,067-Speed 6316.49 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 603040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:42,307-Speed 6323.48 samples/sec Loss 3.5163 LearningRate 0.0001 Epoch: 29 Global Step: 603050 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:35:45,545-Speed 6325.34 samples/sec Loss 3.5490 LearningRate 0.0001 Epoch: 29 Global Step: 603060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:48,787-Speed 6319.36 samples/sec Loss 3.5329 LearningRate 0.0001 Epoch: 29 Global Step: 603070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:52,029-Speed 6319.35 samples/sec Loss 3.5589 LearningRate 0.0001 Epoch: 29 Global Step: 603080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:55,271-Speed 6317.12 samples/sec Loss 3.5030 LearningRate 0.0001 Epoch: 29 Global Step: 603090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:35:58,514-Speed 6316.79 samples/sec Loss 3.5419 LearningRate 0.0001 Epoch: 29 Global Step: 603100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:01,764-Speed 6303.96 samples/sec Loss 3.5601 LearningRate 0.0001 Epoch: 29 Global Step: 603110 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:05,007-Speed 6316.49 samples/sec Loss 3.5390 LearningRate 0.0001 Epoch: 29 Global Step: 603120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:08,245-Speed 6325.40 samples/sec Loss 3.5594 LearningRate 0.0001 Epoch: 29 Global Step: 603130 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:36:11,490-Speed 6313.73 samples/sec Loss 3.5617 LearningRate 0.0001 Epoch: 29 Global Step: 603140 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:36:14,720-Speed 6341.79 samples/sec Loss 3.5517 LearningRate 0.0001 Epoch: 29 Global Step: 603150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:17,966-Speed 6310.01 samples/sec Loss 3.5505 LearningRate 0.0001 Epoch: 29 Global Step: 603160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:21,205-Speed 6323.93 samples/sec Loss 3.5202 LearningRate 0.0001 Epoch: 29 Global Step: 603170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:36:24,449-Speed 6316.03 samples/sec Loss 3.5621 LearningRate 0.0001 Epoch: 29 Global Step: 603180 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:27,692-Speed 6315.85 samples/sec Loss 3.4837 LearningRate 0.0001 Epoch: 29 Global Step: 603190 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:30,933-Speed 6320.27 samples/sec Loss 3.5247 LearningRate 0.0001 Epoch: 29 Global Step: 603200 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:34,175-Speed 6318.01 samples/sec Loss 3.5414 LearningRate 0.0001 Epoch: 29 Global Step: 603210 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:37,420-Speed 6313.90 samples/sec Loss 3.6151 LearningRate 0.0001 Epoch: 29 Global Step: 603220 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:40,669-Speed 6303.50 samples/sec Loss 3.5260 LearningRate 0.0001 Epoch: 29 Global Step: 603230 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:43,910-Speed 6320.65 samples/sec Loss 3.5936 LearningRate 0.0001 Epoch: 29 Global Step: 603240 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:47,148-Speed 6327.66 samples/sec Loss 3.5713 LearningRate 0.0001 Epoch: 29 Global Step: 603250 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:36:50,376-Speed 6344.26 samples/sec Loss 3.4606 LearningRate 0.0001 Epoch: 29 Global Step: 603260 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:53,619-Speed 6317.43 samples/sec Loss 3.4990 LearningRate 0.0001 Epoch: 29 Global Step: 603270 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:36:56,864-Speed 6313.64 samples/sec Loss 3.5648 LearningRate 0.0001 Epoch: 29 Global Step: 603280 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:00,106-Speed 6318.99 samples/sec Loss 3.5910 LearningRate 0.0001 Epoch: 29 Global Step: 603290 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:03,348-Speed 6318.26 samples/sec 
Loss 3.5116 LearningRate 0.0001 Epoch: 29 Global Step: 603300 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:06,591-Speed 6316.98 samples/sec Loss 3.5285 LearningRate 0.0001 Epoch: 29 Global Step: 603310 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:09,835-Speed 6314.21 samples/sec Loss 3.5084 LearningRate 0.0001 Epoch: 29 Global Step: 603320 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:13,073-Speed 6326.29 samples/sec Loss 3.4833 LearningRate 0.0001 Epoch: 29 Global Step: 603330 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:16,316-Speed 6316.80 samples/sec Loss 3.5545 LearningRate 0.0001 Epoch: 29 Global Step: 603340 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:19,567-Speed 6300.24 samples/sec Loss 3.5720 LearningRate 0.0001 Epoch: 29 Global Step: 603350 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:22,795-Speed 6345.64 samples/sec Loss 3.4957 LearningRate 0.0001 Epoch: 29 Global Step: 603360 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:26,041-Speed 6314.51 samples/sec Loss 3.5737 LearningRate 0.0001 Epoch: 29 Global Step: 603370 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:29,284-Speed 6316.55 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 603380 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:32,528-Speed 6314.40 samples/sec Loss 3.5027 LearningRate 0.0001 Epoch: 29 Global Step: 603390 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:35,767-Speed 6322.96 samples/sec Loss 3.5477 LearningRate 0.0001 Epoch: 29 Global Step: 603400 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:39,012-Speed 6312.99 samples/sec Loss 3.5571 LearningRate 0.0001 Epoch: 29 Global Step: 603410 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:42,251-Speed 6325.27 samples/sec Loss 3.5610 LearningRate 0.0001 Epoch: 29 
Global Step: 603420 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:45,515-Speed 6274.65 samples/sec Loss 3.5138 LearningRate 0.0001 Epoch: 29 Global Step: 603430 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:48,758-Speed 6316.48 samples/sec Loss 3.4811 LearningRate 0.0001 Epoch: 29 Global Step: 603440 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:51,998-Speed 6322.37 samples/sec Loss 3.5961 LearningRate 0.0001 Epoch: 29 Global Step: 603450 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:37:55,243-Speed 6313.97 samples/sec Loss 3.4944 LearningRate 0.0001 Epoch: 29 Global Step: 603460 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:37:58,472-Speed 6344.21 samples/sec Loss 3.5432 LearningRate 0.0001 Epoch: 29 Global Step: 603470 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:01,713-Speed 6319.88 samples/sec Loss 3.5600 LearningRate 0.0001 Epoch: 29 Global Step: 603480 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:04,962-Speed 6305.19 samples/sec Loss 3.5094 LearningRate 0.0001 Epoch: 29 Global Step: 603490 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:08,201-Speed 6323.23 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 29 Global Step: 603500 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:11,445-Speed 6315.97 samples/sec Loss 3.5656 LearningRate 0.0001 Epoch: 29 Global Step: 603510 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:14,694-Speed 6304.84 samples/sec Loss 3.5311 LearningRate 0.0001 Epoch: 29 Global Step: 603520 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:17,937-Speed 6316.47 samples/sec Loss 3.5409 LearningRate 0.0001 Epoch: 29 Global Step: 603530 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:21,181-Speed 6314.68 samples/sec Loss 3.5751 LearningRate 0.0001 Epoch: 29 Global Step: 603540 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:38:24,427-Speed 6310.78 samples/sec Loss 3.5551 LearningRate 0.0001 Epoch: 29 Global Step: 603550 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:27,674-Speed 6309.00 samples/sec Loss 3.5249 LearningRate 0.0001 Epoch: 29 Global Step: 603560 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:30,904-Speed 6342.51 samples/sec Loss 3.5246 LearningRate 0.0001 Epoch: 29 Global Step: 603570 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:34,147-Speed 6315.44 samples/sec Loss 3.4885 LearningRate 0.0001 Epoch: 29 Global Step: 603580 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:37,391-Speed 6315.50 samples/sec Loss 3.5649 LearningRate 0.0001 Epoch: 29 Global Step: 603590 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:40,633-Speed 6319.27 samples/sec Loss 3.5836 LearningRate 0.0001 Epoch: 29 Global Step: 603600 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:43,875-Speed 6318.55 samples/sec Loss 3.5395 LearningRate 0.0001 Epoch: 29 Global Step: 603610 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:47,118-Speed 6315.76 samples/sec Loss 3.5508 LearningRate 0.0001 Epoch: 29 Global Step: 603620 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:50,360-Speed 6318.89 samples/sec Loss 3.5169 LearningRate 0.0001 Epoch: 29 Global Step: 603630 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:53,602-Speed 6318.76 samples/sec Loss 3.5482 LearningRate 0.0001 Epoch: 29 Global Step: 603640 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:38:56,844-Speed 6318.25 samples/sec Loss 3.4533 LearningRate 0.0001 Epoch: 29 Global Step: 603650 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:00,089-Speed 6312.24 samples/sec Loss 3.5664 LearningRate 0.0001 Epoch: 29 Global Step: 603660 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:39:03,335-Speed 6311.94 samples/sec Loss 3.5213 LearningRate 0.0001 Epoch: 29 Global Step: 603670 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:39:06,578-Speed 6315.77 samples/sec Loss 3.5559 LearningRate 0.0001 Epoch: 29 Global Step: 603680 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:39:09,812-Speed 6334.28 samples/sec Loss 3.4998 LearningRate 0.0001 Epoch: 29 Global Step: 603690 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:13,058-Speed 6309.56 samples/sec Loss 3.5130 LearningRate 0.0001 Epoch: 29 Global Step: 603700 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:16,302-Speed 6316.44 samples/sec Loss 3.5701 LearningRate 0.0001 Epoch: 29 Global Step: 603710 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:19,544-Speed 6317.30 samples/sec Loss 3.5354 LearningRate 0.0001 Epoch: 29 Global Step: 603720 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:22,790-Speed 6312.21 samples/sec Loss 3.5188 LearningRate 0.0001 Epoch: 29 Global Step: 603730 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:26,101-Speed 6187.23 samples/sec Loss 3.5071 LearningRate 0.0001 Epoch: 29 Global Step: 603740 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:29,348-Speed 6308.69 samples/sec Loss 3.5181 LearningRate 0.0001 Epoch: 29 Global Step: 603750 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:32,613-Speed 6272.93 samples/sec Loss 3.5377 LearningRate 0.0001 Epoch: 29 Global Step: 603760 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:35,853-Speed 6323.21 samples/sec Loss 3.5139 LearningRate 0.0001 Epoch: 29 Global Step: 603770 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:39,108-Speed 6293.04 samples/sec Loss 3.5067 LearningRate 0.0001 Epoch: 29 Global Step: 603780 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:42,412-Speed 6200.41 samples/sec 
Loss 3.5274 LearningRate 0.0001 Epoch: 29 Global Step: 603790 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:39:45,650-Speed 6325.14 samples/sec Loss 3.5107 LearningRate 0.0001 Epoch: 29 Global Step: 603800 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:48,894-Speed 6314.39 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 29 Global Step: 603810 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:52,139-Speed 6313.06 samples/sec Loss 3.5416 LearningRate 0.0001 Epoch: 29 Global Step: 603820 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:55,386-Speed 6308.96 samples/sec Loss 3.5419 LearningRate 0.0001 Epoch: 29 Global Step: 603830 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:39:58,633-Speed 6308.51 samples/sec Loss 3.5367 LearningRate 0.0001 Epoch: 29 Global Step: 603840 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:01,876-Speed 6316.67 samples/sec Loss 3.5333 LearningRate 0.0001 Epoch: 29 Global Step: 603850 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:05,124-Speed 6307.95 samples/sec Loss 3.5180 LearningRate 0.0001 Epoch: 29 Global Step: 603860 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:08,368-Speed 6314.02 samples/sec Loss 3.5828 LearningRate 0.0001 Epoch: 29 Global Step: 603870 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:11,622-Speed 6297.49 samples/sec Loss 3.5191 LearningRate 0.0001 Epoch: 29 Global Step: 603880 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:14,863-Speed 6319.84 samples/sec Loss 3.5705 LearningRate 0.0001 Epoch: 29 Global Step: 603890 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:18,091-Speed 6346.68 samples/sec Loss 3.5716 LearningRate 0.0001 Epoch: 29 Global Step: 603900 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:21,334-Speed 6316.03 samples/sec Loss 3.4532 LearningRate 0.0001 Epoch: 29 
Global Step: 603910 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:24,575-Speed 6321.49 samples/sec Loss 3.5057 LearningRate 0.0001 Epoch: 29 Global Step: 603920 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:27,822-Speed 6308.57 samples/sec Loss 3.5418 LearningRate 0.0001 Epoch: 29 Global Step: 603930 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:31,070-Speed 6306.73 samples/sec Loss 3.5897 LearningRate 0.0001 Epoch: 29 Global Step: 603940 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:34,314-Speed 6313.70 samples/sec Loss 3.5329 LearningRate 0.0001 Epoch: 29 Global Step: 603950 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:37,556-Speed 6318.68 samples/sec Loss 3.5047 LearningRate 0.0001 Epoch: 29 Global Step: 603960 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:40,800-Speed 6314.90 samples/sec Loss 3.4984 LearningRate 0.0001 Epoch: 29 Global Step: 603970 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:44,043-Speed 6316.91 samples/sec Loss 3.5109 LearningRate 0.0001 Epoch: 29 Global Step: 603980 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:47,292-Speed 6305.33 samples/sec Loss 3.5150 LearningRate 0.0001 Epoch: 29 Global Step: 603990 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:50,533-Speed 6320.64 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 29 Global Step: 604000 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:40:53,761-Speed 6344.53 samples/sec Loss 3.5968 LearningRate 0.0001 Epoch: 29 Global Step: 604010 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:40:57,003-Speed 6318.95 samples/sec Loss 3.5189 LearningRate 0.0001 Epoch: 29 Global Step: 604020 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:00,249-Speed 6311.56 samples/sec Loss 3.4855 LearningRate 0.0001 Epoch: 29 Global Step: 604030 Fp16 Grad Scale: 8192 
Required: 21 hours Training: 2022-04-02 22:41:03,493-Speed 6314.04 samples/sec Loss 3.5122 LearningRate 0.0001 Epoch: 29 Global Step: 604040 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:06,742-Speed 6305.91 samples/sec Loss 3.5132 LearningRate 0.0001 Epoch: 29 Global Step: 604050 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:09,988-Speed 6309.25 samples/sec Loss 3.5249 LearningRate 0.0001 Epoch: 29 Global Step: 604060 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:13,233-Speed 6313.28 samples/sec Loss 3.5228 LearningRate 0.0001 Epoch: 29 Global Step: 604070 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:16,481-Speed 6307.50 samples/sec Loss 3.5421 LearningRate 0.0001 Epoch: 29 Global Step: 604080 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:19,730-Speed 6303.73 samples/sec Loss 3.4826 LearningRate 0.0001 Epoch: 29 Global Step: 604090 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:22,974-Speed 6314.06 samples/sec Loss 3.5295 LearningRate 0.0001 Epoch: 29 Global Step: 604100 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:26,221-Speed 6309.53 samples/sec Loss 3.5313 LearningRate 0.0001 Epoch: 29 Global Step: 604110 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-02 22:41:29,458-Speed 6329.19 samples/sec Loss 3.5552 LearningRate 0.0001 Epoch: 29 Global Step: 604120 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:32,710-Speed 6298.89 samples/sec Loss 3.5676 LearningRate 0.0001 Epoch: 29 Global Step: 604130 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:35,954-Speed 6313.99 samples/sec Loss 3.5213 LearningRate 0.0001 Epoch: 29 Global Step: 604140 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:39,200-Speed 6310.40 samples/sec Loss 3.5020 LearningRate 0.0001 Epoch: 29 Global Step: 604150 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 
22:41:42,444-Speed 6314.51 samples/sec Loss 3.5243 LearningRate 0.0001 Epoch: 29 Global Step: 604160 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:45,699-Speed 6294.09 samples/sec Loss 3.4768 LearningRate 0.0001 Epoch: 29 Global Step: 604170 Fp16 Grad Scale: 8192 Required: 21 hours Training: 2022-04-02 22:41:48,944-Speed 6313.70 samples/sec Loss 3.5422 LearningRate 0.0001 Epoch: 29 Global Step: 604180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:41:52,189-Speed 6311.62 samples/sec Loss 3.5316 LearningRate 0.0001 Epoch: 29 Global Step: 604190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:41:55,435-Speed 6312.44 samples/sec Loss 3.5336 LearningRate 0.0001 Epoch: 29 Global Step: 604200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:41:58,677-Speed 6317.54 samples/sec Loss 3.5233 LearningRate 0.0001 Epoch: 29 Global Step: 604210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:01,905-Speed 6345.01 samples/sec Loss 3.5499 LearningRate 0.0001 Epoch: 29 Global Step: 604220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:05,149-Speed 6315.24 samples/sec Loss 3.5457 LearningRate 0.0001 Epoch: 29 Global Step: 604230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:08,391-Speed 6317.71 samples/sec Loss 3.4918 LearningRate 0.0001 Epoch: 29 Global Step: 604240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:11,638-Speed 6310.59 samples/sec Loss 3.5143 LearningRate 0.0001 Epoch: 29 Global Step: 604250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:14,883-Speed 6310.93 samples/sec Loss 3.4965 LearningRate 0.0001 Epoch: 29 Global Step: 604260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:18,128-Speed 6312.54 samples/sec Loss 3.5518 LearningRate 0.0001 Epoch: 29 Global Step: 604270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:21,371-Speed 6317.36 samples/sec Loss 
3.4699 LearningRate 0.0001 Epoch: 29 Global Step: 604280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:24,622-Speed 6300.87 samples/sec Loss 3.5230 LearningRate 0.0001 Epoch: 29 Global Step: 604290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:27,863-Speed 6319.96 samples/sec Loss 3.5675 LearningRate 0.0001 Epoch: 29 Global Step: 604300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:31,108-Speed 6314.04 samples/sec Loss 3.5246 LearningRate 0.0001 Epoch: 29 Global Step: 604310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:34,355-Speed 6308.54 samples/sec Loss 3.5268 LearningRate 0.0001 Epoch: 29 Global Step: 604320 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:42:37,599-Speed 6314.60 samples/sec Loss 3.5443 LearningRate 0.0001 Epoch: 29 Global Step: 604330 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:42:40,828-Speed 6344.25 samples/sec Loss 3.4991 LearningRate 0.0001 Epoch: 29 Global Step: 604340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:44,071-Speed 6317.64 samples/sec Loss 3.4940 LearningRate 0.0001 Epoch: 29 Global Step: 604350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:47,318-Speed 6308.08 samples/sec Loss 3.5879 LearningRate 0.0001 Epoch: 29 Global Step: 604360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:50,573-Speed 6293.39 samples/sec Loss 3.4963 LearningRate 0.0001 Epoch: 29 Global Step: 604370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:53,818-Speed 6311.23 samples/sec Loss 3.5661 LearningRate 0.0001 Epoch: 29 Global Step: 604380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:42:57,061-Speed 6317.57 samples/sec Loss 3.5623 LearningRate 0.0001 Epoch: 29 Global Step: 604390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:00,306-Speed 6312.63 samples/sec Loss 3.5408 LearningRate 0.0001 Epoch: 29 
Global Step: 604400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:03,550-Speed 6315.25 samples/sec Loss 3.5085 LearningRate 0.0001 Epoch: 29 Global Step: 604410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:06,798-Speed 6308.50 samples/sec Loss 3.5308 LearningRate 0.0001 Epoch: 29 Global Step: 604420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:10,043-Speed 6311.64 samples/sec Loss 3.5515 LearningRate 0.0001 Epoch: 29 Global Step: 604430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:13,274-Speed 6339.55 samples/sec Loss 3.5539 LearningRate 0.0001 Epoch: 29 Global Step: 604440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:16,517-Speed 6317.32 samples/sec Loss 3.5647 LearningRate 0.0001 Epoch: 29 Global Step: 604450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:19,762-Speed 6311.70 samples/sec Loss 3.5874 LearningRate 0.0001 Epoch: 29 Global Step: 604460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:23,010-Speed 6306.89 samples/sec Loss 3.5198 LearningRate 0.0001 Epoch: 29 Global Step: 604470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:26,255-Speed 6313.87 samples/sec Loss 3.6108 LearningRate 0.0001 Epoch: 29 Global Step: 604480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:29,498-Speed 6315.39 samples/sec Loss 3.5568 LearningRate 0.0001 Epoch: 29 Global Step: 604490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:32,742-Speed 6314.94 samples/sec Loss 3.5230 LearningRate 0.0001 Epoch: 29 Global Step: 604500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:35,986-Speed 6314.36 samples/sec Loss 3.5532 LearningRate 0.0001 Epoch: 29 Global Step: 604510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:39,232-Speed 6311.55 samples/sec Loss 3.4959 LearningRate 0.0001 Epoch: 29 Global Step: 604520 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 22:43:42,479-Speed 6309.36 samples/sec Loss 3.5377 LearningRate 0.0001 Epoch: 29 Global Step: 604530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:45,726-Speed 6307.30 samples/sec Loss 3.5167 LearningRate 0.0001 Epoch: 29 Global Step: 604540 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:43:48,957-Speed 6340.01 samples/sec Loss 3.5350 LearningRate 0.0001 Epoch: 29 Global Step: 604550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:52,207-Speed 6302.75 samples/sec Loss 3.4633 LearningRate 0.0001 Epoch: 29 Global Step: 604560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:55,448-Speed 6321.02 samples/sec Loss 3.5707 LearningRate 0.0001 Epoch: 29 Global Step: 604570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:43:58,701-Speed 6296.59 samples/sec Loss 3.5051 LearningRate 0.0001 Epoch: 29 Global Step: 604580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:01,950-Speed 6305.70 samples/sec Loss 3.5089 LearningRate 0.0001 Epoch: 29 Global Step: 604590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:05,191-Speed 6320.70 samples/sec Loss 3.5693 LearningRate 0.0001 Epoch: 29 Global Step: 604600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:08,434-Speed 6316.34 samples/sec Loss 3.5666 LearningRate 0.0001 Epoch: 29 Global Step: 604610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:11,677-Speed 6318.23 samples/sec Loss 3.5199 LearningRate 0.0001 Epoch: 29 Global Step: 604620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:14,927-Speed 6301.23 samples/sec Loss 3.5580 LearningRate 0.0001 Epoch: 29 Global Step: 604630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:18,174-Speed 6309.30 samples/sec Loss 3.5199 LearningRate 0.0001 Epoch: 29 Global Step: 604640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:44:21,417-Speed 6317.92 samples/sec Loss 3.4891 LearningRate 0.0001 Epoch: 29 Global Step: 604650 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:44:24,647-Speed 6340.77 samples/sec Loss 3.5370 LearningRate 0.0001 Epoch: 29 Global Step: 604660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:27,890-Speed 6316.44 samples/sec Loss 3.5413 LearningRate 0.0001 Epoch: 29 Global Step: 604670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:31,135-Speed 6313.87 samples/sec Loss 3.5548 LearningRate 0.0001 Epoch: 29 Global Step: 604680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:34,385-Speed 6301.77 samples/sec Loss 3.6078 LearningRate 0.0001 Epoch: 29 Global Step: 604690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:37,632-Speed 6307.69 samples/sec Loss 3.4723 LearningRate 0.0001 Epoch: 29 Global Step: 604700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:40,878-Speed 6312.51 samples/sec Loss 3.4963 LearningRate 0.0001 Epoch: 29 Global Step: 604710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:44,121-Speed 6315.16 samples/sec Loss 3.5663 LearningRate 0.0001 Epoch: 29 Global Step: 604720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:47,366-Speed 6313.79 samples/sec Loss 3.5137 LearningRate 0.0001 Epoch: 29 Global Step: 604730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:50,612-Speed 6311.24 samples/sec Loss 3.5489 LearningRate 0.0001 Epoch: 29 Global Step: 604740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:53,857-Speed 6311.32 samples/sec Loss 3.4795 LearningRate 0.0001 Epoch: 29 Global Step: 604750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:44:57,088-Speed 6341.62 samples/sec Loss 3.5535 LearningRate 0.0001 Epoch: 29 Global Step: 604760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:00,330-Speed 6317.56 samples/sec 
Loss 3.4778 LearningRate 0.0001 Epoch: 29 Global Step: 604770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:03,575-Speed 6312.30 samples/sec Loss 3.4549 LearningRate 0.0001 Epoch: 29 Global Step: 604780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:06,818-Speed 6316.06 samples/sec Loss 3.5551 LearningRate 0.0001 Epoch: 29 Global Step: 604790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:10,066-Speed 6307.19 samples/sec Loss 3.4842 LearningRate 0.0001 Epoch: 29 Global Step: 604800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:13,310-Speed 6315.55 samples/sec Loss 3.5509 LearningRate 0.0001 Epoch: 29 Global Step: 604810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:16,554-Speed 6314.66 samples/sec Loss 3.5115 LearningRate 0.0001 Epoch: 29 Global Step: 604820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:19,802-Speed 6306.97 samples/sec Loss 3.5363 LearningRate 0.0001 Epoch: 29 Global Step: 604830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:23,043-Speed 6319.78 samples/sec Loss 3.4614 LearningRate 0.0001 Epoch: 29 Global Step: 604840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:26,292-Speed 6304.72 samples/sec Loss 3.5678 LearningRate 0.0001 Epoch: 29 Global Step: 604850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:29,523-Speed 6340.52 samples/sec Loss 3.5189 LearningRate 0.0001 Epoch: 29 Global Step: 604860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:32,772-Speed 6305.74 samples/sec Loss 3.5314 LearningRate 0.0001 Epoch: 29 Global Step: 604870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:36,020-Speed 6306.03 samples/sec Loss 3.5070 LearningRate 0.0001 Epoch: 29 Global Step: 604880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:39,266-Speed 6311.74 samples/sec Loss 3.4729 LearningRate 0.0001 Epoch: 29 
Global Step: 604890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:42,509-Speed 6316.10 samples/sec Loss 3.4829 LearningRate 0.0001 Epoch: 29 Global Step: 604900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:45,753-Speed 6314.50 samples/sec Loss 3.5251 LearningRate 0.0001 Epoch: 29 Global Step: 604910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:48,997-Speed 6315.12 samples/sec Loss 3.5243 LearningRate 0.0001 Epoch: 29 Global Step: 604920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:52,241-Speed 6314.70 samples/sec Loss 3.4912 LearningRate 0.0001 Epoch: 29 Global Step: 604930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:55,488-Speed 6308.65 samples/sec Loss 3.5277 LearningRate 0.0001 Epoch: 29 Global Step: 604940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:45:58,731-Speed 6316.73 samples/sec Loss 3.5400 LearningRate 0.0001 Epoch: 29 Global Step: 604950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:01,979-Speed 6306.71 samples/sec Loss 3.5147 LearningRate 0.0001 Epoch: 29 Global Step: 604960 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:46:05,222-Speed 6317.49 samples/sec Loss 3.5220 LearningRate 0.0001 Epoch: 29 Global Step: 604970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:08,466-Speed 6313.90 samples/sec Loss 3.4965 LearningRate 0.0001 Epoch: 29 Global Step: 604980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:11,709-Speed 6315.49 samples/sec Loss 3.5277 LearningRate 0.0001 Epoch: 29 Global Step: 604990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:14,961-Speed 6299.75 samples/sec Loss 3.5286 LearningRate 0.0001 Epoch: 29 Global Step: 605000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:18,203-Speed 6318.23 samples/sec Loss 3.5542 LearningRate 0.0001 Epoch: 29 Global Step: 605010 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 22:46:21,451-Speed 6307.55 samples/sec Loss 3.5318 LearningRate 0.0001 Epoch: 29 Global Step: 605020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:24,700-Speed 6305.17 samples/sec Loss 3.4859 LearningRate 0.0001 Epoch: 29 Global Step: 605030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:27,943-Speed 6315.29 samples/sec Loss 3.5715 LearningRate 0.0001 Epoch: 29 Global Step: 605040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:31,189-Speed 6311.28 samples/sec Loss 3.4926 LearningRate 0.0001 Epoch: 29 Global Step: 605050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:34,434-Speed 6312.78 samples/sec Loss 3.5117 LearningRate 0.0001 Epoch: 29 Global Step: 605060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:37,661-Speed 6348.25 samples/sec Loss 3.5028 LearningRate 0.0001 Epoch: 29 Global Step: 605070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:40,906-Speed 6311.89 samples/sec Loss 3.5574 LearningRate 0.0001 Epoch: 29 Global Step: 605080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:44,150-Speed 6315.33 samples/sec Loss 3.5457 LearningRate 0.0001 Epoch: 29 Global Step: 605090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:47,391-Speed 6320.38 samples/sec Loss 3.5428 LearningRate 0.0001 Epoch: 29 Global Step: 605100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:50,640-Speed 6304.91 samples/sec Loss 3.5065 LearningRate 0.0001 Epoch: 29 Global Step: 605110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:53,887-Speed 6310.61 samples/sec Loss 3.4999 LearningRate 0.0001 Epoch: 29 Global Step: 605120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:46:57,131-Speed 6313.74 samples/sec Loss 3.4824 LearningRate 0.0001 Epoch: 29 Global Step: 605130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:47:00,389-Speed 6287.86 samples/sec Loss 3.5171 LearningRate 0.0001 Epoch: 29 Global Step: 605140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:03,635-Speed 6311.05 samples/sec Loss 3.4753 LearningRate 0.0001 Epoch: 29 Global Step: 605150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:06,882-Speed 6309.54 samples/sec Loss 3.5005 LearningRate 0.0001 Epoch: 29 Global Step: 605160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:10,112-Speed 6341.70 samples/sec Loss 3.5340 LearningRate 0.0001 Epoch: 29 Global Step: 605170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:13,368-Speed 6289.55 samples/sec Loss 3.5217 LearningRate 0.0001 Epoch: 29 Global Step: 605180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:16,638-Speed 6265.44 samples/sec Loss 3.4847 LearningRate 0.0001 Epoch: 29 Global Step: 605190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:19,883-Speed 6313.70 samples/sec Loss 3.4873 LearningRate 0.0001 Epoch: 29 Global Step: 605200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:23,131-Speed 6306.97 samples/sec Loss 3.5134 LearningRate 0.0001 Epoch: 29 Global Step: 605210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:26,375-Speed 6314.27 samples/sec Loss 3.5400 LearningRate 0.0001 Epoch: 29 Global Step: 605220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:29,625-Speed 6302.37 samples/sec Loss 3.4670 LearningRate 0.0001 Epoch: 29 Global Step: 605230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:32,870-Speed 6311.49 samples/sec Loss 3.5144 LearningRate 0.0001 Epoch: 29 Global Step: 605240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:36,113-Speed 6317.61 samples/sec Loss 3.5013 LearningRate 0.0001 Epoch: 29 Global Step: 605250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:39,364-Speed 6300.52 samples/sec Loss 
3.5636 LearningRate 0.0001 Epoch: 29 Global Step: 605260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:42,611-Speed 6309.55 samples/sec Loss 3.4914 LearningRate 0.0001 Epoch: 29 Global Step: 605270 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:47:45,839-Speed 6345.83 samples/sec Loss 3.5275 LearningRate 0.0001 Epoch: 29 Global Step: 605280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:49,084-Speed 6311.67 samples/sec Loss 3.5776 LearningRate 0.0001 Epoch: 29 Global Step: 605290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:52,330-Speed 6312.54 samples/sec Loss 3.5182 LearningRate 0.0001 Epoch: 29 Global Step: 605300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:55,577-Speed 6308.70 samples/sec Loss 3.5107 LearningRate 0.0001 Epoch: 29 Global Step: 605310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:47:58,825-Speed 6307.44 samples/sec Loss 3.5977 LearningRate 0.0001 Epoch: 29 Global Step: 605320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:02,074-Speed 6304.73 samples/sec Loss 3.5406 LearningRate 0.0001 Epoch: 29 Global Step: 605330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:05,319-Speed 6312.07 samples/sec Loss 3.5635 LearningRate 0.0001 Epoch: 29 Global Step: 605340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:08,561-Speed 6318.17 samples/sec Loss 3.5560 LearningRate 0.0001 Epoch: 29 Global Step: 605350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:11,811-Speed 6302.48 samples/sec Loss 3.5690 LearningRate 0.0001 Epoch: 29 Global Step: 605360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:15,060-Speed 6304.76 samples/sec Loss 3.5294 LearningRate 0.0001 Epoch: 29 Global Step: 605370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:18,294-Speed 6334.54 samples/sec Loss 3.5336 LearningRate 0.0001 Epoch: 29 
Global Step: 605380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:21,537-Speed 6317.67 samples/sec Loss 3.5416 LearningRate 0.0001 Epoch: 29 Global Step: 605390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:24,784-Speed 6308.15 samples/sec Loss 3.5124 LearningRate 0.0001 Epoch: 29 Global Step: 605400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:28,028-Speed 6313.99 samples/sec Loss 3.5886 LearningRate 0.0001 Epoch: 29 Global Step: 605410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:31,273-Speed 6314.09 samples/sec Loss 3.5842 LearningRate 0.0001 Epoch: 29 Global Step: 605420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:34,518-Speed 6310.99 samples/sec Loss 3.5305 LearningRate 0.0001 Epoch: 29 Global Step: 605430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:37,764-Speed 6310.93 samples/sec Loss 3.5767 LearningRate 0.0001 Epoch: 29 Global Step: 605440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:41,013-Speed 6304.92 samples/sec Loss 3.5493 LearningRate 0.0001 Epoch: 29 Global Step: 605450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:44,257-Speed 6314.28 samples/sec Loss 3.5673 LearningRate 0.0001 Epoch: 29 Global Step: 605460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:47,502-Speed 6314.59 samples/sec Loss 3.4946 LearningRate 0.0001 Epoch: 29 Global Step: 605470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:50,744-Speed 6317.50 samples/sec Loss 3.5071 LearningRate 0.0001 Epoch: 29 Global Step: 605480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:48:53,975-Speed 6340.67 samples/sec Loss 3.5963 LearningRate 0.0001 Epoch: 29 Global Step: 605490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:48:57,222-Speed 6307.78 samples/sec Loss 3.5099 LearningRate 0.0001 Epoch: 29 Global Step: 605500 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 22:49:00,468-Speed 6310.89 samples/sec Loss 3.5172 LearningRate 0.0001 Epoch: 29 Global Step: 605510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:03,720-Speed 6299.57 samples/sec Loss 3.4705 LearningRate 0.0001 Epoch: 29 Global Step: 605520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:06,963-Speed 6316.66 samples/sec Loss 3.5631 LearningRate 0.0001 Epoch: 29 Global Step: 605530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:10,213-Speed 6303.22 samples/sec Loss 3.5740 LearningRate 0.0001 Epoch: 29 Global Step: 605540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:13,459-Speed 6311.82 samples/sec Loss 3.5071 LearningRate 0.0001 Epoch: 29 Global Step: 605550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:16,701-Speed 6317.06 samples/sec Loss 3.5434 LearningRate 0.0001 Epoch: 29 Global Step: 605560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:19,946-Speed 6313.44 samples/sec Loss 3.4793 LearningRate 0.0001 Epoch: 29 Global Step: 605570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:23,188-Speed 6318.00 samples/sec Loss 3.4777 LearningRate 0.0001 Epoch: 29 Global Step: 605580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:26,423-Speed 6331.96 samples/sec Loss 3.5202 LearningRate 0.0001 Epoch: 29 Global Step: 605590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:29,671-Speed 6307.15 samples/sec Loss 3.5576 LearningRate 0.0001 Epoch: 29 Global Step: 605600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:32,918-Speed 6309.47 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 29 Global Step: 605610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:36,166-Speed 6306.92 samples/sec Loss 3.5629 LearningRate 0.0001 Epoch: 29 Global Step: 605620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:49:39,409-Speed 6316.47 samples/sec Loss 3.5187 LearningRate 0.0001 Epoch: 29 Global Step: 605630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:42,655-Speed 6309.15 samples/sec Loss 3.4679 LearningRate 0.0001 Epoch: 29 Global Step: 605640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:45,897-Speed 6319.85 samples/sec Loss 3.5265 LearningRate 0.0001 Epoch: 29 Global Step: 605650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:49,143-Speed 6309.89 samples/sec Loss 3.4828 LearningRate 0.0001 Epoch: 29 Global Step: 605660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:52,393-Speed 6303.90 samples/sec Loss 3.5105 LearningRate 0.0001 Epoch: 29 Global Step: 605670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:55,637-Speed 6313.78 samples/sec Loss 3.4910 LearningRate 0.0001 Epoch: 29 Global Step: 605680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:49:58,884-Speed 6309.53 samples/sec Loss 3.4715 LearningRate 0.0001 Epoch: 29 Global Step: 605690 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:50:02,124-Speed 6320.60 samples/sec Loss 3.5179 LearningRate 0.0001 Epoch: 29 Global Step: 605700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:50:05,370-Speed 6311.52 samples/sec Loss 3.4405 LearningRate 0.0001 Epoch: 29 Global Step: 605710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:08,617-Speed 6308.22 samples/sec Loss 3.5157 LearningRate 0.0001 Epoch: 29 Global Step: 605720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:11,861-Speed 6315.56 samples/sec Loss 3.4755 LearningRate 0.0001 Epoch: 29 Global Step: 605730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:15,109-Speed 6307.58 samples/sec Loss 3.4859 LearningRate 0.0001 Epoch: 29 Global Step: 605740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:18,355-Speed 6311.69 samples/sec 
Loss 3.5013 LearningRate 0.0001 Epoch: 29 Global Step: 605750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:21,596-Speed 6318.83 samples/sec Loss 3.4955 LearningRate 0.0001 Epoch: 29 Global Step: 605760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:24,845-Speed 6305.19 samples/sec Loss 3.5712 LearningRate 0.0001 Epoch: 29 Global Step: 605770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:28,090-Speed 6312.50 samples/sec Loss 3.5471 LearningRate 0.0001 Epoch: 29 Global Step: 605780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:31,334-Speed 6316.12 samples/sec Loss 3.5316 LearningRate 0.0001 Epoch: 29 Global Step: 605790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:34,577-Speed 6315.20 samples/sec Loss 3.5056 LearningRate 0.0001 Epoch: 29 Global Step: 605800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:37,809-Speed 6339.22 samples/sec Loss 3.4743 LearningRate 0.0001 Epoch: 29 Global Step: 605810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:41,055-Speed 6310.83 samples/sec Loss 3.4765 LearningRate 0.0001 Epoch: 29 Global Step: 605820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:44,302-Speed 6308.00 samples/sec Loss 3.5448 LearningRate 0.0001 Epoch: 29 Global Step: 605830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:47,544-Speed 6318.72 samples/sec Loss 3.5216 LearningRate 0.0001 Epoch: 29 Global Step: 605840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:50,787-Speed 6315.84 samples/sec Loss 3.5794 LearningRate 0.0001 Epoch: 29 Global Step: 605850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:54,032-Speed 6312.53 samples/sec Loss 3.5348 LearningRate 0.0001 Epoch: 29 Global Step: 605860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:50:57,278-Speed 6311.86 samples/sec Loss 3.4985 LearningRate 0.0001 Epoch: 29 
Global Step: 605870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:00,529-Speed 6300.71 samples/sec Loss 3.5676 LearningRate 0.0001 Epoch: 29 Global Step: 605880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:03,776-Speed 6308.97 samples/sec Loss 3.4839 LearningRate 0.0001 Epoch: 29 Global Step: 605890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:07,022-Speed 6310.50 samples/sec Loss 3.5229 LearningRate 0.0001 Epoch: 29 Global Step: 605900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:10,255-Speed 6335.64 samples/sec Loss 3.5180 LearningRate 0.0001 Epoch: 29 Global Step: 605910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:13,501-Speed 6310.11 samples/sec Loss 3.4921 LearningRate 0.0001 Epoch: 29 Global Step: 605920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:16,747-Speed 6310.60 samples/sec Loss 3.5354 LearningRate 0.0001 Epoch: 29 Global Step: 605930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:19,991-Speed 6315.04 samples/sec Loss 3.4673 LearningRate 0.0001 Epoch: 29 Global Step: 605940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:23,246-Speed 6293.90 samples/sec Loss 3.5553 LearningRate 0.0001 Epoch: 29 Global Step: 605950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:26,494-Speed 6307.10 samples/sec Loss 3.5141 LearningRate 0.0001 Epoch: 29 Global Step: 605960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:29,734-Speed 6323.15 samples/sec Loss 3.5312 LearningRate 0.0001 Epoch: 29 Global Step: 605970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:32,982-Speed 6305.82 samples/sec Loss 3.5323 LearningRate 0.0001 Epoch: 29 Global Step: 605980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:36,224-Speed 6319.64 samples/sec Loss 3.4658 LearningRate 0.0001 Epoch: 29 Global Step: 605990 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 22:51:39,473-Speed 6303.75 samples/sec Loss 3.5632 LearningRate 0.0001 Epoch: 29 Global Step: 606000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:42,721-Speed 6307.30 samples/sec Loss 3.5438 LearningRate 0.0001 Epoch: 29 Global Step: 606010 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:51:45,967-Speed 6310.16 samples/sec Loss 3.4893 LearningRate 0.0001 Epoch: 29 Global Step: 606020 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:51:49,198-Speed 6340.18 samples/sec Loss 3.5253 LearningRate 0.0001 Epoch: 29 Global Step: 606030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:52,441-Speed 6318.33 samples/sec Loss 3.4395 LearningRate 0.0001 Epoch: 29 Global Step: 606040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:55,687-Speed 6309.13 samples/sec Loss 3.5359 LearningRate 0.0001 Epoch: 29 Global Step: 606050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:51:58,930-Speed 6317.68 samples/sec Loss 3.5798 LearningRate 0.0001 Epoch: 29 Global Step: 606060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:02,177-Speed 6308.45 samples/sec Loss 3.4962 LearningRate 0.0001 Epoch: 29 Global Step: 606070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:05,424-Speed 6307.72 samples/sec Loss 3.5196 LearningRate 0.0001 Epoch: 29 Global Step: 606080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:08,667-Speed 6316.35 samples/sec Loss 3.4831 LearningRate 0.0001 Epoch: 29 Global Step: 606090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:11,914-Speed 6308.39 samples/sec Loss 3.4941 LearningRate 0.0001 Epoch: 29 Global Step: 606100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:15,164-Speed 6303.50 samples/sec Loss 3.5614 LearningRate 0.0001 Epoch: 29 Global Step: 606110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:52:18,410-Speed 6310.70 samples/sec Loss 3.4681 LearningRate 0.0001 Epoch: 29 Global Step: 606120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:21,656-Speed 6310.24 samples/sec Loss 3.4878 LearningRate 0.0001 Epoch: 29 Global Step: 606130 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:52:24,887-Speed 6340.73 samples/sec Loss 3.5524 LearningRate 0.0001 Epoch: 29 Global Step: 606140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:28,133-Speed 6310.16 samples/sec Loss 3.4950 LearningRate 0.0001 Epoch: 29 Global Step: 606150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:31,378-Speed 6313.80 samples/sec Loss 3.5569 LearningRate 0.0001 Epoch: 29 Global Step: 606160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:34,627-Speed 6303.77 samples/sec Loss 3.5249 LearningRate 0.0001 Epoch: 29 Global Step: 606170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:37,869-Speed 6319.51 samples/sec Loss 3.5159 LearningRate 0.0001 Epoch: 29 Global Step: 606180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:41,117-Speed 6307.41 samples/sec Loss 3.5306 LearningRate 0.0001 Epoch: 29 Global Step: 606190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:44,370-Speed 6296.84 samples/sec Loss 3.4706 LearningRate 0.0001 Epoch: 29 Global Step: 606200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:47,612-Speed 6319.33 samples/sec Loss 3.5333 LearningRate 0.0001 Epoch: 29 Global Step: 606210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:50,858-Speed 6309.54 samples/sec Loss 3.5010 LearningRate 0.0001 Epoch: 29 Global Step: 606220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:54,105-Speed 6309.93 samples/sec Loss 3.5119 LearningRate 0.0001 Epoch: 29 Global Step: 606230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:52:57,349-Speed 6313.60 samples/sec 
Loss 3.5519 LearningRate 0.0001 Epoch: 29 Global Step: 606240 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:53:00,582-Speed 6335.74 samples/sec Loss 3.5485 LearningRate 0.0001 Epoch: 29 Global Step: 606250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:03,829-Speed 6308.75 samples/sec Loss 3.5333 LearningRate 0.0001 Epoch: 29 Global Step: 606260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:07,075-Speed 6310.61 samples/sec Loss 3.4640 LearningRate 0.0001 Epoch: 29 Global Step: 606270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:10,322-Speed 6309.71 samples/sec Loss 3.4727 LearningRate 0.0001 Epoch: 29 Global Step: 606280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:13,568-Speed 6310.48 samples/sec Loss 3.4843 LearningRate 0.0001 Epoch: 29 Global Step: 606290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:16,827-Speed 6284.34 samples/sec Loss 3.4941 LearningRate 0.0001 Epoch: 29 Global Step: 606300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:20,078-Speed 6302.08 samples/sec Loss 3.6053 LearningRate 0.0001 Epoch: 29 Global Step: 606310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:23,322-Speed 6315.13 samples/sec Loss 3.4828 LearningRate 0.0001 Epoch: 29 Global Step: 606320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:26,566-Speed 6315.39 samples/sec Loss 3.5203 LearningRate 0.0001 Epoch: 29 Global Step: 606330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:29,814-Speed 6306.12 samples/sec Loss 3.5511 LearningRate 0.0001 Epoch: 29 Global Step: 606340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:33,064-Speed 6303.30 samples/sec Loss 3.5047 LearningRate 0.0001 Epoch: 29 Global Step: 606350 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:53:36,329-Speed 6272.93 samples/sec Loss 3.4457 LearningRate 0.0001 Epoch: 29 
Global Step: 606360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:39,581-Speed 6298.76 samples/sec Loss 3.5368 LearningRate 0.0001 Epoch: 29 Global Step: 606370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:42,828-Speed 6309.57 samples/sec Loss 3.5084 LearningRate 0.0001 Epoch: 29 Global Step: 606380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:46,083-Speed 6293.98 samples/sec Loss 3.5683 LearningRate 0.0001 Epoch: 29 Global Step: 606390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:49,328-Speed 6313.30 samples/sec Loss 3.4960 LearningRate 0.0001 Epoch: 29 Global Step: 606400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:52,576-Speed 6306.16 samples/sec Loss 3.5290 LearningRate 0.0001 Epoch: 29 Global Step: 606410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:53:55,804-Speed 6346.74 samples/sec Loss 3.5576 LearningRate 0.0001 Epoch: 29 Global Step: 606420 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:53:59,050-Speed 6311.38 samples/sec Loss 3.5207 LearningRate 0.0001 Epoch: 29 Global Step: 606430 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:02,308-Speed 6287.33 samples/sec Loss 3.5370 LearningRate 0.0001 Epoch: 29 Global Step: 606440 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:05,549-Speed 6319.80 samples/sec Loss 3.4462 LearningRate 0.0001 Epoch: 29 Global Step: 606450 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:08,795-Speed 6310.19 samples/sec Loss 3.4943 LearningRate 0.0001 Epoch: 29 Global Step: 606460 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:12,037-Speed 6319.85 samples/sec Loss 3.4684 LearningRate 0.0001 Epoch: 29 Global Step: 606470 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:15,278-Speed 6319.01 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 606480 Fp16 Grad Scale: 4096 
Required: 20 hours Training: 2022-04-02 22:54:18,524-Speed 6310.40 samples/sec Loss 3.5522 LearningRate 0.0001 Epoch: 29 Global Step: 606490 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:21,766-Speed 6318.35 samples/sec Loss 3.5392 LearningRate 0.0001 Epoch: 29 Global Step: 606500 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:25,024-Speed 6288.72 samples/sec Loss 3.5405 LearningRate 0.0001 Epoch: 29 Global Step: 606510 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 22:54:28,271-Speed 6309.09 samples/sec Loss 3.5653 LearningRate 0.0001 Epoch: 29 Global Step: 606520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:31,513-Speed 6317.69 samples/sec Loss 3.5021 LearningRate 0.0001 Epoch: 29 Global Step: 606530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:34,764-Speed 6301.56 samples/sec Loss 3.5397 LearningRate 0.0001 Epoch: 29 Global Step: 606540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:38,011-Speed 6308.73 samples/sec Loss 3.5276 LearningRate 0.0001 Epoch: 29 Global Step: 606550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:41,264-Speed 6296.48 samples/sec Loss 3.4868 LearningRate 0.0001 Epoch: 29 Global Step: 606560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:44,507-Speed 6316.54 samples/sec Loss 3.5254 LearningRate 0.0001 Epoch: 29 Global Step: 606570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:47,749-Speed 6318.21 samples/sec Loss 3.5055 LearningRate 0.0001 Epoch: 29 Global Step: 606580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:50,994-Speed 6312.67 samples/sec Loss 3.5121 LearningRate 0.0001 Epoch: 29 Global Step: 606590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:54:54,237-Speed 6315.81 samples/sec Loss 3.5356 LearningRate 0.0001 Epoch: 29 Global Step: 606600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:54:57,482-Speed 6313.43 samples/sec Loss 3.5142 LearningRate 0.0001 Epoch: 29 Global Step: 606610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:00,711-Speed 6345.07 samples/sec Loss 3.5571 LearningRate 0.0001 Epoch: 29 Global Step: 606620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:03,957-Speed 6311.12 samples/sec Loss 3.4983 LearningRate 0.0001 Epoch: 29 Global Step: 606630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:07,214-Speed 6288.65 samples/sec Loss 3.5160 LearningRate 0.0001 Epoch: 29 Global Step: 606640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:10,471-Speed 6289.54 samples/sec Loss 3.5234 LearningRate 0.0001 Epoch: 29 Global Step: 606650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:13,716-Speed 6312.42 samples/sec Loss 3.5365 LearningRate 0.0001 Epoch: 29 Global Step: 606660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:16,962-Speed 6310.74 samples/sec Loss 3.4867 LearningRate 0.0001 Epoch: 29 Global Step: 606670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:20,212-Speed 6302.68 samples/sec Loss 3.5530 LearningRate 0.0001 Epoch: 29 Global Step: 606680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:23,458-Speed 6311.04 samples/sec Loss 3.4959 LearningRate 0.0001 Epoch: 29 Global Step: 606690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:26,778-Speed 6170.45 samples/sec Loss 3.5135 LearningRate 0.0001 Epoch: 29 Global Step: 606700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:30,120-Speed 6128.67 samples/sec Loss 3.5242 LearningRate 0.0001 Epoch: 29 Global Step: 606710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:33,367-Speed 6310.05 samples/sec Loss 3.5481 LearningRate 0.0001 Epoch: 29 Global Step: 606720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:55:36,599-Speed 6336.81 samples/sec 
Loss 3.5481 LearningRate 0.0001 Epoch: 29 Global Step: 606730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:39,842-Speed 6317.79 samples/sec Loss 3.5768 LearningRate 0.0001 Epoch: 29 Global Step: 606740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:43,091-Speed 6304.41 samples/sec Loss 3.5026 LearningRate 0.0001 Epoch: 29 Global Step: 606750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:46,337-Speed 6310.21 samples/sec Loss 3.4727 LearningRate 0.0001 Epoch: 29 Global Step: 606760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:49,583-Speed 6310.99 samples/sec Loss 3.4781 LearningRate 0.0001 Epoch: 29 Global Step: 606770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:52,831-Speed 6307.25 samples/sec Loss 3.5410 LearningRate 0.0001 Epoch: 29 Global Step: 606780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:56,075-Speed 6313.08 samples/sec Loss 3.4338 LearningRate 0.0001 Epoch: 29 Global Step: 606790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:55:59,322-Speed 6309.95 samples/sec Loss 3.4799 LearningRate 0.0001 Epoch: 29 Global Step: 606800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:02,571-Speed 6305.38 samples/sec Loss 3.4992 LearningRate 0.0001 Epoch: 29 Global Step: 606810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:05,817-Speed 6309.33 samples/sec Loss 3.5217 LearningRate 0.0001 Epoch: 29 Global Step: 606820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:09,048-Speed 6341.58 samples/sec Loss 3.5105 LearningRate 0.0001 Epoch: 29 Global Step: 606830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:12,294-Speed 6310.70 samples/sec Loss 3.5008 LearningRate 0.0001 Epoch: 29 Global Step: 606840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:15,540-Speed 6310.67 samples/sec Loss 3.5916 LearningRate 0.0001 Epoch: 29 
Global Step: 606850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:18,785-Speed 6312.94 samples/sec Loss 3.5433 LearningRate 0.0001 Epoch: 29 Global Step: 606860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:22,033-Speed 6306.12 samples/sec Loss 3.5077 LearningRate 0.0001 Epoch: 29 Global Step: 606870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:25,277-Speed 6314.81 samples/sec Loss 3.5184 LearningRate 0.0001 Epoch: 29 Global Step: 606880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:28,521-Speed 6315.84 samples/sec Loss 3.5512 LearningRate 0.0001 Epoch: 29 Global Step: 606890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:31,768-Speed 6308.59 samples/sec Loss 3.4869 LearningRate 0.0001 Epoch: 29 Global Step: 606900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:35,010-Speed 6317.31 samples/sec Loss 3.5005 LearningRate 0.0001 Epoch: 29 Global Step: 606910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:38,255-Speed 6312.87 samples/sec Loss 3.5355 LearningRate 0.0001 Epoch: 29 Global Step: 606920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:41,499-Speed 6315.30 samples/sec Loss 3.4839 LearningRate 0.0001 Epoch: 29 Global Step: 606930 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:56:44,739-Speed 6322.18 samples/sec Loss 3.5616 LearningRate 0.0001 Epoch: 29 Global Step: 606940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:47,981-Speed 6318.48 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 29 Global Step: 606950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:51,225-Speed 6313.72 samples/sec Loss 3.6050 LearningRate 0.0001 Epoch: 29 Global Step: 606960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:56:54,471-Speed 6311.74 samples/sec Loss 3.5014 LearningRate 0.0001 Epoch: 29 Global Step: 606970 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 22:56:57,730-Speed 6283.97 samples/sec Loss 3.4247 LearningRate 0.0001 Epoch: 29 Global Step: 606980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:00,978-Speed 6310.17 samples/sec Loss 3.5146 LearningRate 0.0001 Epoch: 29 Global Step: 606990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:04,227-Speed 6305.36 samples/sec Loss 3.5234 LearningRate 0.0001 Epoch: 29 Global Step: 607000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:07,474-Speed 6307.75 samples/sec Loss 3.4886 LearningRate 0.0001 Epoch: 29 Global Step: 607010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:10,723-Speed 6303.86 samples/sec Loss 3.5591 LearningRate 0.0001 Epoch: 29 Global Step: 607020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:13,967-Speed 6314.95 samples/sec Loss 3.4579 LearningRate 0.0001 Epoch: 29 Global Step: 607030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:17,249-Speed 6242.57 samples/sec Loss 3.4860 LearningRate 0.0001 Epoch: 29 Global Step: 607040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:57:20,476-Speed 6346.65 samples/sec Loss 3.5250 LearningRate 0.0001 Epoch: 29 Global Step: 607050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:23,755-Speed 6250.64 samples/sec Loss 3.4903 LearningRate 0.0001 Epoch: 29 Global Step: 607060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:27,111-Speed 6103.24 samples/sec Loss 3.4356 LearningRate 0.0001 Epoch: 29 Global Step: 607070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:30,458-Speed 6121.88 samples/sec Loss 3.5222 LearningRate 0.0001 Epoch: 29 Global Step: 607080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:33,705-Speed 6308.49 samples/sec Loss 3.4792 LearningRate 0.0001 Epoch: 29 Global Step: 607090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
22:57:36,952-Speed 6309.15 samples/sec Loss 3.5066 LearningRate 0.0001 Epoch: 29 Global Step: 607100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:40,200-Speed 6307.51 samples/sec Loss 3.4924 LearningRate 0.0001 Epoch: 29 Global Step: 607110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:43,442-Speed 6317.80 samples/sec Loss 3.5424 LearningRate 0.0001 Epoch: 29 Global Step: 607120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:46,687-Speed 6311.90 samples/sec Loss 3.4982 LearningRate 0.0001 Epoch: 29 Global Step: 607130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:49,930-Speed 6316.60 samples/sec Loss 3.5118 LearningRate 0.0001 Epoch: 29 Global Step: 607140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:53,173-Speed 6318.36 samples/sec Loss 3.5496 LearningRate 0.0001 Epoch: 29 Global Step: 607150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:57:56,407-Speed 6333.55 samples/sec Loss 3.6083 LearningRate 0.0001 Epoch: 29 Global Step: 607160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:57:59,665-Speed 6287.05 samples/sec Loss 3.4792 LearningRate 0.0001 Epoch: 29 Global Step: 607170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:02,912-Speed 6308.44 samples/sec Loss 3.5838 LearningRate 0.0001 Epoch: 29 Global Step: 607180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:06,162-Speed 6304.02 samples/sec Loss 3.5502 LearningRate 0.0001 Epoch: 29 Global Step: 607190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:09,412-Speed 6302.67 samples/sec Loss 3.5429 LearningRate 0.0001 Epoch: 29 Global Step: 607200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:12,654-Speed 6318.43 samples/sec Loss 3.4431 LearningRate 0.0001 Epoch: 29 Global Step: 607210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:15,900-Speed 6309.64 samples/sec 
Loss 3.4670 LearningRate 0.0001 Epoch: 29 Global Step: 607220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:19,146-Speed 6310.97 samples/sec Loss 3.5315 LearningRate 0.0001 Epoch: 29 Global Step: 607230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:22,392-Speed 6312.26 samples/sec Loss 3.5031 LearningRate 0.0001 Epoch: 29 Global Step: 607240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:25,641-Speed 6303.45 samples/sec Loss 3.5315 LearningRate 0.0001 Epoch: 29 Global Step: 607250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:28,872-Speed 6340.19 samples/sec Loss 3.5301 LearningRate 0.0001 Epoch: 29 Global Step: 607260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:32,118-Speed 6310.67 samples/sec Loss 3.4831 LearningRate 0.0001 Epoch: 29 Global Step: 607270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:35,367-Speed 6305.27 samples/sec Loss 3.4961 LearningRate 0.0001 Epoch: 29 Global Step: 607280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:38,616-Speed 6304.89 samples/sec Loss 3.5273 LearningRate 0.0001 Epoch: 29 Global Step: 607290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:41,868-Speed 6299.49 samples/sec Loss 3.4954 LearningRate 0.0001 Epoch: 29 Global Step: 607300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:45,114-Speed 6310.55 samples/sec Loss 3.5166 LearningRate 0.0001 Epoch: 29 Global Step: 607310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:48,358-Speed 6314.40 samples/sec Loss 3.4683 LearningRate 0.0001 Epoch: 29 Global Step: 607320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:51,605-Speed 6309.79 samples/sec Loss 3.5353 LearningRate 0.0001 Epoch: 29 Global Step: 607330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:54,855-Speed 6303.41 samples/sec Loss 3.5568 LearningRate 0.0001 Epoch: 29 
Global Step: 607340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:58:58,096-Speed 6319.90 samples/sec Loss 3.5193 LearningRate 0.0001 Epoch: 29 Global Step: 607350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:01,345-Speed 6305.26 samples/sec Loss 3.5215 LearningRate 0.0001 Epoch: 29 Global Step: 607360 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:59:04,591-Speed 6310.22 samples/sec Loss 3.4826 LearningRate 0.0001 Epoch: 29 Global Step: 607370 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:59:07,837-Speed 6311.30 samples/sec Loss 3.5216 LearningRate 0.0001 Epoch: 29 Global Step: 607380 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:59:11,081-Speed 6314.89 samples/sec Loss 3.5257 LearningRate 0.0001 Epoch: 29 Global Step: 607390 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:59:14,328-Speed 6308.89 samples/sec Loss 3.5206 LearningRate 0.0001 Epoch: 29 Global Step: 607400 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 22:59:17,562-Speed 6332.77 samples/sec Loss 3.5366 LearningRate 0.0001 Epoch: 29 Global Step: 607410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:20,804-Speed 6318.04 samples/sec Loss 3.5016 LearningRate 0.0001 Epoch: 29 Global Step: 607420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:24,049-Speed 6313.11 samples/sec Loss 3.5605 LearningRate 0.0001 Epoch: 29 Global Step: 607430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:27,292-Speed 6316.52 samples/sec Loss 3.5297 LearningRate 0.0001 Epoch: 29 Global Step: 607440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:30,537-Speed 6313.57 samples/sec Loss 3.5531 LearningRate 0.0001 Epoch: 29 Global Step: 607450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:33,779-Speed 6317.80 samples/sec Loss 3.5163 LearningRate 0.0001 Epoch: 29 Global Step: 607460 Fp16 Grad Scale: 
8192 Required: 20 hours Training: 2022-04-02 22:59:37,026-Speed 6307.91 samples/sec Loss 3.5361 LearningRate 0.0001 Epoch: 29 Global Step: 607470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:40,267-Speed 6320.18 samples/sec Loss 3.5262 LearningRate 0.0001 Epoch: 29 Global Step: 607480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:43,514-Speed 6309.81 samples/sec Loss 3.4393 LearningRate 0.0001 Epoch: 29 Global Step: 607490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:46,761-Speed 6308.73 samples/sec Loss 3.5017 LearningRate 0.0001 Epoch: 29 Global Step: 607500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:49,989-Speed 6345.42 samples/sec Loss 3.5128 LearningRate 0.0001 Epoch: 29 Global Step: 607510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:53,232-Speed 6317.19 samples/sec Loss 3.4693 LearningRate 0.0001 Epoch: 29 Global Step: 607520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:56,477-Speed 6311.33 samples/sec Loss 3.5412 LearningRate 0.0001 Epoch: 29 Global Step: 607530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 22:59:59,725-Speed 6308.75 samples/sec Loss 3.5183 LearningRate 0.0001 Epoch: 29 Global Step: 607540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:02,973-Speed 6307.86 samples/sec Loss 3.5093 LearningRate 0.0001 Epoch: 29 Global Step: 607550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:06,220-Speed 6309.08 samples/sec Loss 3.4524 LearningRate 0.0001 Epoch: 29 Global Step: 607560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:09,467-Speed 6308.30 samples/sec Loss 3.4354 LearningRate 0.0001 Epoch: 29 Global Step: 607570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:12,710-Speed 6315.30 samples/sec Loss 3.4936 LearningRate 0.0001 Epoch: 29 Global Step: 607580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 
2022-04-02 23:00:15,957-Speed 6308.61 samples/sec Loss 3.5146 LearningRate 0.0001 Epoch: 29 Global Step: 607590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:19,202-Speed 6312.84 samples/sec Loss 3.5556 LearningRate 0.0001 Epoch: 29 Global Step: 607600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:22,435-Speed 6336.65 samples/sec Loss 3.5636 LearningRate 0.0001 Epoch: 29 Global Step: 607610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:25,682-Speed 6309.35 samples/sec Loss 3.4919 LearningRate 0.0001 Epoch: 29 Global Step: 607620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:28,926-Speed 6314.02 samples/sec Loss 3.5363 LearningRate 0.0001 Epoch: 29 Global Step: 607630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:32,173-Speed 6308.38 samples/sec Loss 3.4419 LearningRate 0.0001 Epoch: 29 Global Step: 607640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:35,415-Speed 6319.62 samples/sec Loss 3.5447 LearningRate 0.0001 Epoch: 29 Global Step: 607650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:38,658-Speed 6314.89 samples/sec Loss 3.5152 LearningRate 0.0001 Epoch: 29 Global Step: 607660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:41,907-Speed 6305.61 samples/sec Loss 3.6062 LearningRate 0.0001 Epoch: 29 Global Step: 607670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:45,150-Speed 6316.63 samples/sec Loss 3.4724 LearningRate 0.0001 Epoch: 29 Global Step: 607680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:48,397-Speed 6309.49 samples/sec Loss 3.5077 LearningRate 0.0001 Epoch: 29 Global Step: 607690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:51,641-Speed 6314.18 samples/sec Loss 3.5364 LearningRate 0.0001 Epoch: 29 Global Step: 607700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:00:54,894-Speed 6295.92 
samples/sec Loss 3.5022 LearningRate 0.0001 Epoch: 29 Global Step: 607710 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:00:58,150-Speed 6292.48 samples/sec Loss 3.4450 LearningRate 0.0001 Epoch: 29 Global Step: 607720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:01:01,385-Speed 6332.79 samples/sec Loss 3.4531 LearningRate 0.0001 Epoch: 29 Global Step: 607730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:04,633-Speed 6306.59 samples/sec Loss 3.5023 LearningRate 0.0001 Epoch: 29 Global Step: 607740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:07,877-Speed 6313.89 samples/sec Loss 3.5273 LearningRate 0.0001 Epoch: 29 Global Step: 607750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:11,127-Speed 6303.76 samples/sec Loss 3.5145 LearningRate 0.0001 Epoch: 29 Global Step: 607760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:14,370-Speed 6315.92 samples/sec Loss 3.5644 LearningRate 0.0001 Epoch: 29 Global Step: 607770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:17,616-Speed 6311.25 samples/sec Loss 3.4958 LearningRate 0.0001 Epoch: 29 Global Step: 607780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:20,866-Speed 6302.11 samples/sec Loss 3.5235 LearningRate 0.0001 Epoch: 29 Global Step: 607790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:24,111-Speed 6312.77 samples/sec Loss 3.4867 LearningRate 0.0001 Epoch: 29 Global Step: 607800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:27,357-Speed 6312.67 samples/sec Loss 3.5299 LearningRate 0.0001 Epoch: 29 Global Step: 607810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:30,608-Speed 6300.65 samples/sec Loss 3.5643 LearningRate 0.0001 Epoch: 29 Global Step: 607820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:33,844-Speed 6330.68 samples/sec Loss 3.4892 LearningRate 
0.0001 Epoch: 29 Global Step: 607830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:37,087-Speed 6315.73 samples/sec Loss 3.4803 LearningRate 0.0001 Epoch: 29 Global Step: 607840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:40,335-Speed 6307.21 samples/sec Loss 3.5654 LearningRate 0.0001 Epoch: 29 Global Step: 607850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:43,581-Speed 6310.07 samples/sec Loss 3.5279 LearningRate 0.0001 Epoch: 29 Global Step: 607860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:46,826-Speed 6312.34 samples/sec Loss 3.4627 LearningRate 0.0001 Epoch: 29 Global Step: 607870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:50,074-Speed 6308.00 samples/sec Loss 3.5092 LearningRate 0.0001 Epoch: 29 Global Step: 607880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:53,316-Speed 6317.16 samples/sec Loss 3.4947 LearningRate 0.0001 Epoch: 29 Global Step: 607890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:56,591-Speed 6254.43 samples/sec Loss 3.4650 LearningRate 0.0001 Epoch: 29 Global Step: 607900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:01:59,844-Speed 6297.62 samples/sec Loss 3.5339 LearningRate 0.0001 Epoch: 29 Global Step: 607910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:03,095-Speed 6302.07 samples/sec Loss 3.5367 LearningRate 0.0001 Epoch: 29 Global Step: 607920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:06,327-Speed 6337.64 samples/sec Loss 3.4969 LearningRate 0.0001 Epoch: 29 Global Step: 607930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:09,581-Speed 6295.55 samples/sec Loss 3.5001 LearningRate 0.0001 Epoch: 29 Global Step: 607940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:12,834-Speed 6295.95 samples/sec Loss 3.4888 LearningRate 0.0001 Epoch: 29 Global Step: 607950 Fp16 
Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:16,099-Speed 6274.43 samples/sec Loss 3.4853 LearningRate 0.0001 Epoch: 29 Global Step: 607960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:19,342-Speed 6316.51 samples/sec Loss 3.4963 LearningRate 0.0001 Epoch: 29 Global Step: 607970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:22,588-Speed 6310.53 samples/sec Loss 3.4651 LearningRate 0.0001 Epoch: 29 Global Step: 607980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:25,831-Speed 6317.26 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 29 Global Step: 607990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:29,085-Speed 6295.60 samples/sec Loss 3.4967 LearningRate 0.0001 Epoch: 29 Global Step: 608000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:32,333-Speed 6306.27 samples/sec Loss 3.4921 LearningRate 0.0001 Epoch: 29 Global Step: 608010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:35,579-Speed 6312.00 samples/sec Loss 3.5809 LearningRate 0.0001 Epoch: 29 Global Step: 608020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:38,822-Speed 6316.30 samples/sec Loss 3.4573 LearningRate 0.0001 Epoch: 29 Global Step: 608030 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:02:42,076-Speed 6294.05 samples/sec Loss 3.4890 LearningRate 0.0001 Epoch: 29 Global Step: 608040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:02:45,310-Speed 6334.17 samples/sec Loss 3.5089 LearningRate 0.0001 Epoch: 29 Global Step: 608050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:48,554-Speed 6315.18 samples/sec Loss 3.4818 LearningRate 0.0001 Epoch: 29 Global Step: 608060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:51,802-Speed 6307.26 samples/sec Loss 3.5397 LearningRate 0.0001 Epoch: 29 Global Step: 608070 Fp16 Grad Scale: 8192 Required: 20 hours 
Training: 2022-04-02 23:02:55,045-Speed 6316.72 samples/sec Loss 3.4901 LearningRate 0.0001 Epoch: 29 Global Step: 608080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:02:58,285-Speed 6321.89 samples/sec Loss 3.4417 LearningRate 0.0001 Epoch: 29 Global Step: 608090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:01,535-Speed 6302.84 samples/sec Loss 3.5009 LearningRate 0.0001 Epoch: 29 Global Step: 608100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:04,792-Speed 6290.81 samples/sec Loss 3.5255 LearningRate 0.0001 Epoch: 29 Global Step: 608110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:08,036-Speed 6313.94 samples/sec Loss 3.4897 LearningRate 0.0001 Epoch: 29 Global Step: 608120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:11,279-Speed 6315.44 samples/sec Loss 3.5293 LearningRate 0.0001 Epoch: 29 Global Step: 608130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:14,529-Speed 6303.64 samples/sec Loss 3.4979 LearningRate 0.0001 Epoch: 29 Global Step: 608140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:17,773-Speed 6313.79 samples/sec Loss 3.5417 LearningRate 0.0001 Epoch: 29 Global Step: 608150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:03:21,004-Speed 6339.91 samples/sec Loss 3.4889 LearningRate 0.0001 Epoch: 29 Global Step: 608160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:24,249-Speed 6313.52 samples/sec Loss 3.5076 LearningRate 0.0001 Epoch: 29 Global Step: 608170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:27,493-Speed 6314.08 samples/sec Loss 3.4805 LearningRate 0.0001 Epoch: 29 Global Step: 608180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:30,737-Speed 6314.56 samples/sec Loss 3.5155 LearningRate 0.0001 Epoch: 29 Global Step: 608190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:33,982-Speed 
6313.22 samples/sec Loss 3.4976 LearningRate 0.0001 Epoch: 29 Global Step: 608200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:37,227-Speed 6313.03 samples/sec Loss 3.4264 LearningRate 0.0001 Epoch: 29 Global Step: 608210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:40,470-Speed 6315.59 samples/sec Loss 3.4722 LearningRate 0.0001 Epoch: 29 Global Step: 608220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:43,716-Speed 6312.67 samples/sec Loss 3.4844 LearningRate 0.0001 Epoch: 29 Global Step: 608230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:46,960-Speed 6315.33 samples/sec Loss 3.5421 LearningRate 0.0001 Epoch: 29 Global Step: 608240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:50,204-Speed 6315.39 samples/sec Loss 3.5190 LearningRate 0.0001 Epoch: 29 Global Step: 608250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:53,450-Speed 6310.85 samples/sec Loss 3.5272 LearningRate 0.0001 Epoch: 29 Global Step: 608260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:56,694-Speed 6313.82 samples/sec Loss 3.4960 LearningRate 0.0001 Epoch: 29 Global Step: 608270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:03:59,939-Speed 6312.95 samples/sec Loss 3.5236 LearningRate 0.0001 Epoch: 29 Global Step: 608280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:03,233-Speed 6219.04 samples/sec Loss 3.4927 LearningRate 0.0001 Epoch: 29 Global Step: 608290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:06,484-Speed 6301.29 samples/sec Loss 3.5215 LearningRate 0.0001 Epoch: 29 Global Step: 608300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:09,735-Speed 6300.51 samples/sec Loss 3.5472 LearningRate 0.0001 Epoch: 29 Global Step: 608310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:12,979-Speed 6314.32 samples/sec Loss 3.5027 
LearningRate 0.0001 Epoch: 29 Global Step: 608320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:16,221-Speed 6319.43 samples/sec Loss 3.5356 LearningRate 0.0001 Epoch: 29 Global Step: 608330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:19,472-Speed 6299.91 samples/sec Loss 3.5634 LearningRate 0.0001 Epoch: 29 Global Step: 608340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:22,719-Speed 6309.41 samples/sec Loss 3.5518 LearningRate 0.0001 Epoch: 29 Global Step: 608350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:25,970-Speed 6301.17 samples/sec Loss 3.4947 LearningRate 0.0001 Epoch: 29 Global Step: 608360 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:04:29,206-Speed 6329.23 samples/sec Loss 3.4830 LearningRate 0.0001 Epoch: 29 Global Step: 608370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:32,453-Speed 6309.08 samples/sec Loss 3.4910 LearningRate 0.0001 Epoch: 29 Global Step: 608380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:35,699-Speed 6310.86 samples/sec Loss 3.5109 LearningRate 0.0001 Epoch: 29 Global Step: 608390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:38,942-Speed 6316.96 samples/sec Loss 3.5251 LearningRate 0.0001 Epoch: 29 Global Step: 608400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:42,190-Speed 6307.15 samples/sec Loss 3.4750 LearningRate 0.0001 Epoch: 29 Global Step: 608410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:45,431-Speed 6320.13 samples/sec Loss 3.4877 LearningRate 0.0001 Epoch: 29 Global Step: 608420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:48,674-Speed 6316.67 samples/sec Loss 3.4603 LearningRate 0.0001 Epoch: 29 Global Step: 608430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:51,918-Speed 6314.65 samples/sec Loss 3.5494 LearningRate 0.0001 Epoch: 29 Global Step: 
608440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:55,165-Speed 6309.32 samples/sec Loss 3.4799 LearningRate 0.0001 Epoch: 29 Global Step: 608450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:04:58,417-Speed 6298.67 samples/sec Loss 3.4501 LearningRate 0.0001 Epoch: 29 Global Step: 608460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:01,663-Speed 6311.29 samples/sec Loss 3.5376 LearningRate 0.0001 Epoch: 29 Global Step: 608470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:05:04,908-Speed 6311.45 samples/sec Loss 3.5043 LearningRate 0.0001 Epoch: 29 Global Step: 608480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:05:08,141-Speed 6337.95 samples/sec Loss 3.5602 LearningRate 0.0001 Epoch: 29 Global Step: 608490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:11,396-Speed 6291.44 samples/sec Loss 3.5000 LearningRate 0.0001 Epoch: 29 Global Step: 608500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:14,643-Speed 6309.68 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 29 Global Step: 608510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:17,885-Speed 6317.91 samples/sec Loss 3.4980 LearningRate 0.0001 Epoch: 29 Global Step: 608520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:21,131-Speed 6311.96 samples/sec Loss 3.4622 LearningRate 0.0001 Epoch: 29 Global Step: 608530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:24,372-Speed 6319.90 samples/sec Loss 3.4509 LearningRate 0.0001 Epoch: 29 Global Step: 608540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:27,619-Speed 6309.55 samples/sec Loss 3.5302 LearningRate 0.0001 Epoch: 29 Global Step: 608550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:30,863-Speed 6313.64 samples/sec Loss 3.4720 LearningRate 0.0001 Epoch: 29 Global Step: 608560 Fp16 Grad Scale: 8192 Required: 20 
hours Training: 2022-04-02 23:05:34,109-Speed 6310.84 samples/sec Loss 3.5289 LearningRate 0.0001 Epoch: 29 Global Step: 608570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:37,353-Speed 6313.91 samples/sec Loss 3.4730 LearningRate 0.0001 Epoch: 29 Global Step: 608580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:40,586-Speed 6336.13 samples/sec Loss 3.4774 LearningRate 0.0001 Epoch: 29 Global Step: 608590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:43,835-Speed 6305.55 samples/sec Loss 3.5181 LearningRate 0.0001 Epoch: 29 Global Step: 608600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:47,079-Speed 6314.93 samples/sec Loss 3.4816 LearningRate 0.0001 Epoch: 29 Global Step: 608610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:50,324-Speed 6312.01 samples/sec Loss 3.5220 LearningRate 0.0001 Epoch: 29 Global Step: 608620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:53,568-Speed 6313.64 samples/sec Loss 3.4966 LearningRate 0.0001 Epoch: 29 Global Step: 608630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:05:56,813-Speed 6313.08 samples/sec Loss 3.4885 LearningRate 0.0001 Epoch: 29 Global Step: 608640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:00,059-Speed 6310.56 samples/sec Loss 3.5290 LearningRate 0.0001 Epoch: 29 Global Step: 608650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:03,303-Speed 6314.63 samples/sec Loss 3.5528 LearningRate 0.0001 Epoch: 29 Global Step: 608660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:06,549-Speed 6311.69 samples/sec Loss 3.5076 LearningRate 0.0001 Epoch: 29 Global Step: 608670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:09,796-Speed 6309.66 samples/sec Loss 3.4593 LearningRate 0.0001 Epoch: 29 Global Step: 608680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:06:13,041-Speed 6312.40 samples/sec Loss 3.4709 LearningRate 0.0001 Epoch: 29 Global Step: 608690 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:06:16,276-Speed 6332.75 samples/sec Loss 3.5546 LearningRate 0.0001 Epoch: 29 Global Step: 608700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:19,521-Speed 6312.28 samples/sec Loss 3.5231 LearningRate 0.0001 Epoch: 29 Global Step: 608710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:22,768-Speed 6309.19 samples/sec Loss 3.4719 LearningRate 0.0001 Epoch: 29 Global Step: 608720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:26,012-Speed 6314.64 samples/sec Loss 3.5050 LearningRate 0.0001 Epoch: 29 Global Step: 608730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:29,261-Speed 6305.41 samples/sec Loss 3.5034 LearningRate 0.0001 Epoch: 29 Global Step: 608740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:32,508-Speed 6307.95 samples/sec Loss 3.4607 LearningRate 0.0001 Epoch: 29 Global Step: 608750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:35,750-Speed 6317.71 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 608760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:38,995-Speed 6313.30 samples/sec Loss 3.4998 LearningRate 0.0001 Epoch: 29 Global Step: 608770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:42,239-Speed 6315.24 samples/sec Loss 3.5143 LearningRate 0.0001 Epoch: 29 Global Step: 608780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:45,483-Speed 6314.00 samples/sec Loss 3.4627 LearningRate 0.0001 Epoch: 29 Global Step: 608790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:48,810-Speed 6157.46 samples/sec Loss 3.4670 LearningRate 0.0001 Epoch: 29 Global Step: 608800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:52,060-Speed 6303.03 samples/sec 
Loss 3.5314 LearningRate 0.0001 Epoch: 29 Global Step: 608810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:55,304-Speed 6314.70 samples/sec Loss 3.4725 LearningRate 0.0001 Epoch: 29 Global Step: 608820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:06:58,548-Speed 6314.27 samples/sec Loss 3.4480 LearningRate 0.0001 Epoch: 29 Global Step: 608830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:01,803-Speed 6293.64 samples/sec Loss 3.4969 LearningRate 0.0001 Epoch: 29 Global Step: 608840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:05,049-Speed 6309.75 samples/sec Loss 3.5330 LearningRate 0.0001 Epoch: 29 Global Step: 608850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:08,297-Speed 6306.77 samples/sec Loss 3.5136 LearningRate 0.0001 Epoch: 29 Global Step: 608860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:11,542-Speed 6313.19 samples/sec Loss 3.4863 LearningRate 0.0001 Epoch: 29 Global Step: 608870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:14,787-Speed 6312.10 samples/sec Loss 3.4593 LearningRate 0.0001 Epoch: 29 Global Step: 608880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:18,029-Speed 6318.42 samples/sec Loss 3.5288 LearningRate 0.0001 Epoch: 29 Global Step: 608890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:21,263-Speed 6334.86 samples/sec Loss 3.5937 LearningRate 0.0001 Epoch: 29 Global Step: 608900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:24,509-Speed 6311.29 samples/sec Loss 3.4281 LearningRate 0.0001 Epoch: 29 Global Step: 608910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:27,758-Speed 6304.14 samples/sec Loss 3.4977 LearningRate 0.0001 Epoch: 29 Global Step: 608920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:31,002-Speed 6315.23 samples/sec Loss 3.4980 LearningRate 0.0001 Epoch: 29 
Global Step: 608930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:34,249-Speed 6308.61 samples/sec Loss 3.5433 LearningRate 0.0001 Epoch: 29 Global Step: 608940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:37,490-Speed 6320.40 samples/sec Loss 3.5056 LearningRate 0.0001 Epoch: 29 Global Step: 608950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:40,737-Speed 6308.05 samples/sec Loss 3.5118 LearningRate 0.0001 Epoch: 29 Global Step: 608960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:43,983-Speed 6311.61 samples/sec Loss 3.5393 LearningRate 0.0001 Epoch: 29 Global Step: 608970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:47,229-Speed 6311.30 samples/sec Loss 3.5076 LearningRate 0.0001 Epoch: 29 Global Step: 608980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:50,479-Speed 6302.64 samples/sec Loss 3.4525 LearningRate 0.0001 Epoch: 29 Global Step: 608990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:07:53,722-Speed 6316.76 samples/sec Loss 3.5366 LearningRate 0.0001 Epoch: 29 Global Step: 609000 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:07:56,955-Speed 6335.05 samples/sec Loss 3.4735 LearningRate 0.0001 Epoch: 29 Global Step: 609010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:00,200-Speed 6313.57 samples/sec Loss 3.5126 LearningRate 0.0001 Epoch: 29 Global Step: 609020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:03,443-Speed 6315.83 samples/sec Loss 3.5432 LearningRate 0.0001 Epoch: 29 Global Step: 609030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:06,691-Speed 6307.41 samples/sec Loss 3.4527 LearningRate 0.0001 Epoch: 29 Global Step: 609040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:09,934-Speed 6316.06 samples/sec Loss 3.4454 LearningRate 0.0001 Epoch: 29 Global Step: 609050 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:08:13,179-Speed 6313.74 samples/sec Loss 3.4738 LearningRate 0.0001 Epoch: 29 Global Step: 609060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:16,423-Speed 6313.92 samples/sec Loss 3.4769 LearningRate 0.0001 Epoch: 29 Global Step: 609070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:19,666-Speed 6316.51 samples/sec Loss 3.4670 LearningRate 0.0001 Epoch: 29 Global Step: 609080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:22,912-Speed 6310.70 samples/sec Loss 3.5234 LearningRate 0.0001 Epoch: 29 Global Step: 609090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:26,155-Speed 6315.35 samples/sec Loss 3.5025 LearningRate 0.0001 Epoch: 29 Global Step: 609100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:29,388-Speed 6338.92 samples/sec Loss 3.5410 LearningRate 0.0001 Epoch: 29 Global Step: 609110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:32,636-Speed 6305.84 samples/sec Loss 3.4934 LearningRate 0.0001 Epoch: 29 Global Step: 609120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:35,882-Speed 6311.38 samples/sec Loss 3.4983 LearningRate 0.0001 Epoch: 29 Global Step: 609130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:39,127-Speed 6312.31 samples/sec Loss 3.4764 LearningRate 0.0001 Epoch: 29 Global Step: 609140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:42,377-Speed 6304.15 samples/sec Loss 3.4827 LearningRate 0.0001 Epoch: 29 Global Step: 609150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:45,623-Speed 6311.04 samples/sec Loss 3.4881 LearningRate 0.0001 Epoch: 29 Global Step: 609160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:48,868-Speed 6311.99 samples/sec Loss 3.5313 LearningRate 0.0001 Epoch: 29 Global Step: 609170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:08:52,115-Speed 6308.67 samples/sec Loss 3.5599 LearningRate 0.0001 Epoch: 29 Global Step: 609180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:55,360-Speed 6312.86 samples/sec Loss 3.4559 LearningRate 0.0001 Epoch: 29 Global Step: 609190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:08:58,605-Speed 6311.72 samples/sec Loss 3.4900 LearningRate 0.0001 Epoch: 29 Global Step: 609200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:01,852-Speed 6309.03 samples/sec Loss 3.5107 LearningRate 0.0001 Epoch: 29 Global Step: 609210 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:09:05,081-Speed 6344.22 samples/sec Loss 3.4680 LearningRate 0.0001 Epoch: 29 Global Step: 609220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:08,325-Speed 6315.21 samples/sec Loss 3.4759 LearningRate 0.0001 Epoch: 29 Global Step: 609230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:11,569-Speed 6313.76 samples/sec Loss 3.5135 LearningRate 0.0001 Epoch: 29 Global Step: 609240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:14,812-Speed 6316.37 samples/sec Loss 3.5608 LearningRate 0.0001 Epoch: 29 Global Step: 609250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:18,058-Speed 6311.90 samples/sec Loss 3.4923 LearningRate 0.0001 Epoch: 29 Global Step: 609260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:21,301-Speed 6316.31 samples/sec Loss 3.4864 LearningRate 0.0001 Epoch: 29 Global Step: 609270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:24,552-Speed 6300.54 samples/sec Loss 3.5185 LearningRate 0.0001 Epoch: 29 Global Step: 609280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:27,797-Speed 6313.37 samples/sec Loss 3.4908 LearningRate 0.0001 Epoch: 29 Global Step: 609290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:31,044-Speed 6308.88 samples/sec 
Loss 3.4911 LearningRate 0.0001 Epoch: 29 Global Step: 609300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:34,288-Speed 6312.88 samples/sec Loss 3.4696 LearningRate 0.0001 Epoch: 29 Global Step: 609310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:37,535-Speed 6308.77 samples/sec Loss 3.5337 LearningRate 0.0001 Epoch: 29 Global Step: 609320 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:09:40,762-Speed 6348.08 samples/sec Loss 3.5285 LearningRate 0.0001 Epoch: 29 Global Step: 609330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:44,002-Speed 6322.94 samples/sec Loss 3.5053 LearningRate 0.0001 Epoch: 29 Global Step: 609340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:47,246-Speed 6313.88 samples/sec Loss 3.5064 LearningRate 0.0001 Epoch: 29 Global Step: 609350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:50,493-Speed 6310.41 samples/sec Loss 3.4704 LearningRate 0.0001 Epoch: 29 Global Step: 609360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:53,734-Speed 6320.36 samples/sec Loss 3.5696 LearningRate 0.0001 Epoch: 29 Global Step: 609370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:09:56,979-Speed 6313.32 samples/sec Loss 3.4665 LearningRate 0.0001 Epoch: 29 Global Step: 609380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:00,222-Speed 6316.13 samples/sec Loss 3.5339 LearningRate 0.0001 Epoch: 29 Global Step: 609390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:03,466-Speed 6315.11 samples/sec Loss 3.5164 LearningRate 0.0001 Epoch: 29 Global Step: 609400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:06,710-Speed 6314.27 samples/sec Loss 3.5340 LearningRate 0.0001 Epoch: 29 Global Step: 609410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:09,959-Speed 6304.32 samples/sec Loss 3.5313 LearningRate 0.0001 Epoch: 29 
Global Step: 609420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:13,186-Speed 6347.30 samples/sec Loss 3.5101 LearningRate 0.0001 Epoch: 29 Global Step: 609430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:16,430-Speed 6314.95 samples/sec Loss 3.4919 LearningRate 0.0001 Epoch: 29 Global Step: 609440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:19,712-Speed 6242.20 samples/sec Loss 3.4643 LearningRate 0.0001 Epoch: 29 Global Step: 609450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:22,965-Speed 6297.34 samples/sec Loss 3.4791 LearningRate 0.0001 Epoch: 29 Global Step: 609460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:26,210-Speed 6311.23 samples/sec Loss 3.5016 LearningRate 0.0001 Epoch: 29 Global Step: 609470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:29,457-Speed 6309.55 samples/sec Loss 3.5157 LearningRate 0.0001 Epoch: 29 Global Step: 609480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:32,704-Speed 6307.89 samples/sec Loss 3.4749 LearningRate 0.0001 Epoch: 29 Global Step: 609490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:35,949-Speed 6312.92 samples/sec Loss 3.5345 LearningRate 0.0001 Epoch: 29 Global Step: 609500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:39,195-Speed 6310.89 samples/sec Loss 3.5427 LearningRate 0.0001 Epoch: 29 Global Step: 609510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:42,437-Speed 6318.37 samples/sec Loss 3.4918 LearningRate 0.0001 Epoch: 29 Global Step: 609520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:45,687-Speed 6303.80 samples/sec Loss 3.4837 LearningRate 0.0001 Epoch: 29 Global Step: 609530 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:10:48,923-Speed 6330.62 samples/sec Loss 3.4913 LearningRate 0.0001 Epoch: 29 Global Step: 609540 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:10:52,168-Speed 6312.27 samples/sec Loss 3.5184 LearningRate 0.0001 Epoch: 29 Global Step: 609550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:55,410-Speed 6318.73 samples/sec Loss 3.4790 LearningRate 0.0001 Epoch: 29 Global Step: 609560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:10:58,657-Speed 6309.41 samples/sec Loss 3.4824 LearningRate 0.0001 Epoch: 29 Global Step: 609570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:01,901-Speed 6313.92 samples/sec Loss 3.5072 LearningRate 0.0001 Epoch: 29 Global Step: 609580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:05,150-Speed 6303.97 samples/sec Loss 3.5344 LearningRate 0.0001 Epoch: 29 Global Step: 609590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:08,391-Speed 6320.97 samples/sec Loss 3.4784 LearningRate 0.0001 Epoch: 29 Global Step: 609600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:11,640-Speed 6305.22 samples/sec Loss 3.5138 LearningRate 0.0001 Epoch: 29 Global Step: 609610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:14,883-Speed 6317.14 samples/sec Loss 3.5185 LearningRate 0.0001 Epoch: 29 Global Step: 609620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:18,128-Speed 6312.49 samples/sec Loss 3.5217 LearningRate 0.0001 Epoch: 29 Global Step: 609630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:21,372-Speed 6314.33 samples/sec Loss 3.4988 LearningRate 0.0001 Epoch: 29 Global Step: 609640 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:11:24,603-Speed 6339.33 samples/sec Loss 3.5302 LearningRate 0.0001 Epoch: 29 Global Step: 609650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:27,850-Speed 6310.19 samples/sec Loss 3.5017 LearningRate 0.0001 Epoch: 29 Global Step: 609660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:11:31,093-Speed 6317.15 samples/sec Loss 3.4895 LearningRate 0.0001 Epoch: 29 Global Step: 609670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:34,343-Speed 6301.80 samples/sec Loss 3.5097 LearningRate 0.0001 Epoch: 29 Global Step: 609680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:37,587-Speed 6314.37 samples/sec Loss 3.4709 LearningRate 0.0001 Epoch: 29 Global Step: 609690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:40,829-Speed 6318.97 samples/sec Loss 3.5395 LearningRate 0.0001 Epoch: 29 Global Step: 609700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:44,073-Speed 6314.99 samples/sec Loss 3.5088 LearningRate 0.0001 Epoch: 29 Global Step: 609710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:47,318-Speed 6312.13 samples/sec Loss 3.4945 LearningRate 0.0001 Epoch: 29 Global Step: 609720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:50,583-Speed 6274.78 samples/sec Loss 3.5053 LearningRate 0.0001 Epoch: 29 Global Step: 609730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:53,832-Speed 6303.12 samples/sec Loss 3.5150 LearningRate 0.0001 Epoch: 29 Global Step: 609740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:11:57,076-Speed 6315.58 samples/sec Loss 3.4949 LearningRate 0.0001 Epoch: 29 Global Step: 609750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:12:00,305-Speed 6342.91 samples/sec Loss 3.4739 LearningRate 0.0001 Epoch: 29 Global Step: 609760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:03,552-Speed 6308.90 samples/sec Loss 3.4118 LearningRate 0.0001 Epoch: 29 Global Step: 609770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:06,800-Speed 6307.60 samples/sec Loss 3.5316 LearningRate 0.0001 Epoch: 29 Global Step: 609780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:10,045-Speed 6312.93 samples/sec 
Loss 3.4995 LearningRate 0.0001 Epoch: 29 Global Step: 609790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:13,289-Speed 6313.51 samples/sec Loss 3.5068 LearningRate 0.0001 Epoch: 29 Global Step: 609800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:16,533-Speed 6315.25 samples/sec Loss 3.5322 LearningRate 0.0001 Epoch: 29 Global Step: 609810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:19,780-Speed 6310.10 samples/sec Loss 3.5751 LearningRate 0.0001 Epoch: 29 Global Step: 609820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:23,025-Speed 6312.98 samples/sec Loss 3.4489 LearningRate 0.0001 Epoch: 29 Global Step: 609830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:26,269-Speed 6314.48 samples/sec Loss 3.5115 LearningRate 0.0001 Epoch: 29 Global Step: 609840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:29,559-Speed 6226.60 samples/sec Loss 3.4628 LearningRate 0.0001 Epoch: 29 Global Step: 609850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:32,790-Speed 6338.65 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 609860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:36,035-Speed 6312.28 samples/sec Loss 3.4621 LearningRate 0.0001 Epoch: 29 Global Step: 609870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:39,280-Speed 6314.53 samples/sec Loss 3.4820 LearningRate 0.0001 Epoch: 29 Global Step: 609880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:42,527-Speed 6307.27 samples/sec Loss 3.4707 LearningRate 0.0001 Epoch: 29 Global Step: 609890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:45,774-Speed 6309.56 samples/sec Loss 3.5332 LearningRate 0.0001 Epoch: 29 Global Step: 609900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:49,020-Speed 6310.41 samples/sec Loss 3.4473 LearningRate 0.0001 Epoch: 29 
Global Step: 609910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:52,267-Speed 6308.98 samples/sec Loss 3.4471 LearningRate 0.0001 Epoch: 29 Global Step: 609920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:55,513-Speed 6311.44 samples/sec Loss 3.4890 LearningRate 0.0001 Epoch: 29 Global Step: 609930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:12:58,756-Speed 6315.11 samples/sec Loss 3.5154 LearningRate 0.0001 Epoch: 29 Global Step: 609940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:02,004-Speed 6306.22 samples/sec Loss 3.4819 LearningRate 0.0001 Epoch: 29 Global Step: 609950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:05,245-Speed 6321.27 samples/sec Loss 3.4811 LearningRate 0.0001 Epoch: 29 Global Step: 609960 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:13:08,477-Speed 6337.36 samples/sec Loss 3.5217 LearningRate 0.0001 Epoch: 29 Global Step: 609970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:11,727-Speed 6303.67 samples/sec Loss 3.4766 LearningRate 0.0001 Epoch: 29 Global Step: 609980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:14,970-Speed 6316.62 samples/sec Loss 3.5154 LearningRate 0.0001 Epoch: 29 Global Step: 609990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:18,213-Speed 6316.60 samples/sec Loss 3.5001 LearningRate 0.0001 Epoch: 29 Global Step: 610000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:21,456-Speed 6317.22 samples/sec Loss 3.5522 LearningRate 0.0001 Epoch: 29 Global Step: 610010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:24,702-Speed 6311.06 samples/sec Loss 3.5266 LearningRate 0.0001 Epoch: 29 Global Step: 610020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:27,947-Speed 6311.62 samples/sec Loss 3.4941 LearningRate 0.0001 Epoch: 29 Global Step: 610030 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:13:31,191-Speed 6314.44 samples/sec Loss 3.4993 LearningRate 0.0001 Epoch: 29 Global Step: 610040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:34,439-Speed 6308.84 samples/sec Loss 3.4766 LearningRate 0.0001 Epoch: 29 Global Step: 610050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:37,683-Speed 6313.71 samples/sec Loss 3.4267 LearningRate 0.0001 Epoch: 29 Global Step: 610060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:40,921-Speed 6326.60 samples/sec Loss 3.5402 LearningRate 0.0001 Epoch: 29 Global Step: 610070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:44,163-Speed 6319.48 samples/sec Loss 3.4889 LearningRate 0.0001 Epoch: 29 Global Step: 610080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:47,410-Speed 6308.27 samples/sec Loss 3.4801 LearningRate 0.0001 Epoch: 29 Global Step: 610090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:50,651-Speed 6320.66 samples/sec Loss 3.5297 LearningRate 0.0001 Epoch: 29 Global Step: 610100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:53,895-Speed 6313.95 samples/sec Loss 3.5133 LearningRate 0.0001 Epoch: 29 Global Step: 610110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:13:57,143-Speed 6305.81 samples/sec Loss 3.5153 LearningRate 0.0001 Epoch: 29 Global Step: 610120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:00,387-Speed 6315.58 samples/sec Loss 3.4543 LearningRate 0.0001 Epoch: 29 Global Step: 610130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:03,632-Speed 6313.12 samples/sec Loss 3.4742 LearningRate 0.0001 Epoch: 29 Global Step: 610140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:06,877-Speed 6311.07 samples/sec Loss 3.4920 LearningRate 0.0001 Epoch: 29 Global Step: 610150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:14:10,122-Speed 6313.75 samples/sec Loss 3.4679 LearningRate 0.0001 Epoch: 29 Global Step: 610160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:13,354-Speed 6337.97 samples/sec Loss 3.4639 LearningRate 0.0001 Epoch: 29 Global Step: 610170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:16,599-Speed 6313.12 samples/sec Loss 3.4653 LearningRate 0.0001 Epoch: 29 Global Step: 610180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:19,844-Speed 6313.21 samples/sec Loss 3.4505 LearningRate 0.0001 Epoch: 29 Global Step: 610190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:23,092-Speed 6306.14 samples/sec Loss 3.4336 LearningRate 0.0001 Epoch: 29 Global Step: 610200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:14:26,325-Speed 6335.42 samples/sec Loss 3.4854 LearningRate 0.0001 Epoch: 29 Global Step: 610210 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:29,573-Speed 6307.89 samples/sec Loss 3.4954 LearningRate 0.0001 Epoch: 29 Global Step: 610220 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:32,817-Speed 6313.62 samples/sec Loss 3.4849 LearningRate 0.0001 Epoch: 29 Global Step: 610230 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:36,065-Speed 6306.34 samples/sec Loss 3.4951 LearningRate 0.0001 Epoch: 29 Global Step: 610240 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:39,307-Speed 6318.72 samples/sec Loss 3.5452 LearningRate 0.0001 Epoch: 29 Global Step: 610250 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:42,557-Speed 6304.72 samples/sec Loss 3.4706 LearningRate 0.0001 Epoch: 29 Global Step: 610260 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:45,800-Speed 6316.82 samples/sec Loss 3.4719 LearningRate 0.0001 Epoch: 29 Global Step: 610270 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:49,044-Speed 6314.26 samples/sec Loss 
3.5375 LearningRate 0.0001 Epoch: 29 Global Step: 610280 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:52,285-Speed 6319.77 samples/sec Loss 3.5200 LearningRate 0.0001 Epoch: 29 Global Step: 610290 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:55,531-Speed 6312.17 samples/sec Loss 3.4940 LearningRate 0.0001 Epoch: 29 Global Step: 610300 Fp16 Grad Scale: 4096 Required: 20 hours Training: 2022-04-02 23:14:58,774-Speed 6315.06 samples/sec Loss 3.5058 LearningRate 0.0001 Epoch: 29 Global Step: 610310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:02,022-Speed 6308.10 samples/sec Loss 3.4824 LearningRate 0.0001 Epoch: 29 Global Step: 610320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:05,265-Speed 6315.20 samples/sec Loss 3.5109 LearningRate 0.0001 Epoch: 29 Global Step: 610330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:08,512-Speed 6310.26 samples/sec Loss 3.4970 LearningRate 0.0001 Epoch: 29 Global Step: 610340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:11,760-Speed 6306.31 samples/sec Loss 3.5025 LearningRate 0.0001 Epoch: 29 Global Step: 610350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:15,008-Speed 6306.36 samples/sec Loss 3.4854 LearningRate 0.0001 Epoch: 29 Global Step: 610360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:18,247-Speed 6324.61 samples/sec Loss 3.4838 LearningRate 0.0001 Epoch: 29 Global Step: 610370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:21,492-Speed 6312.23 samples/sec Loss 3.4665 LearningRate 0.0001 Epoch: 29 Global Step: 610380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:24,740-Speed 6306.91 samples/sec Loss 3.4575 LearningRate 0.0001 Epoch: 29 Global Step: 610390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:27,988-Speed 6307.40 samples/sec Loss 3.4896 LearningRate 0.0001 Epoch: 29 Global 
Step: 610400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:31,234-Speed 6309.90 samples/sec Loss 3.4403 LearningRate 0.0001 Epoch: 29 Global Step: 610410 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:15:34,462-Speed 6347.32 samples/sec Loss 3.5445 LearningRate 0.0001 Epoch: 29 Global Step: 610420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:37,708-Speed 6308.92 samples/sec Loss 3.4756 LearningRate 0.0001 Epoch: 29 Global Step: 610430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:40,950-Speed 6318.34 samples/sec Loss 3.5081 LearningRate 0.0001 Epoch: 29 Global Step: 610440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:44,193-Speed 6318.02 samples/sec Loss 3.5780 LearningRate 0.0001 Epoch: 29 Global Step: 610450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:47,437-Speed 6314.30 samples/sec Loss 3.4507 LearningRate 0.0001 Epoch: 29 Global Step: 610460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:50,681-Speed 6314.35 samples/sec Loss 3.5258 LearningRate 0.0001 Epoch: 29 Global Step: 610470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:53,925-Speed 6314.87 samples/sec Loss 3.4399 LearningRate 0.0001 Epoch: 29 Global Step: 610480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:15:57,169-Speed 6315.00 samples/sec Loss 3.5398 LearningRate 0.0001 Epoch: 29 Global Step: 610490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:00,417-Speed 6306.85 samples/sec Loss 3.4720 LearningRate 0.0001 Epoch: 29 Global Step: 610500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:03,662-Speed 6312.99 samples/sec Loss 3.4014 LearningRate 0.0001 Epoch: 29 Global Step: 610510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:06,892-Speed 6341.38 samples/sec Loss 3.4318 LearningRate 0.0001 Epoch: 29 Global Step: 610520 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:16:10,143-Speed 6301.41 samples/sec Loss 3.4831 LearningRate 0.0001 Epoch: 29 Global Step: 610530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:13,388-Speed 6312.70 samples/sec Loss 3.4609 LearningRate 0.0001 Epoch: 29 Global Step: 610540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:16,632-Speed 6313.68 samples/sec Loss 3.4645 LearningRate 0.0001 Epoch: 29 Global Step: 610550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:19,880-Speed 6308.83 samples/sec Loss 3.4723 LearningRate 0.0001 Epoch: 29 Global Step: 610560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:23,126-Speed 6309.50 samples/sec Loss 3.5109 LearningRate 0.0001 Epoch: 29 Global Step: 610570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:26,377-Speed 6301.92 samples/sec Loss 3.5121 LearningRate 0.0001 Epoch: 29 Global Step: 610580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:29,622-Speed 6311.97 samples/sec Loss 3.5283 LearningRate 0.0001 Epoch: 29 Global Step: 610590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:32,869-Speed 6308.88 samples/sec Loss 3.4948 LearningRate 0.0001 Epoch: 29 Global Step: 610600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:36,114-Speed 6312.25 samples/sec Loss 3.4333 LearningRate 0.0001 Epoch: 29 Global Step: 610610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:39,352-Speed 6327.13 samples/sec Loss 3.4861 LearningRate 0.0001 Epoch: 29 Global Step: 610620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:42,596-Speed 6313.28 samples/sec Loss 3.5297 LearningRate 0.0001 Epoch: 29 Global Step: 610630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:45,842-Speed 6311.58 samples/sec Loss 3.5008 LearningRate 0.0001 Epoch: 29 Global Step: 610640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:16:49,090-Speed 6306.77 samples/sec Loss 3.5300 LearningRate 0.0001 Epoch: 29 Global Step: 610650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:52,347-Speed 6290.41 samples/sec Loss 3.5076 LearningRate 0.0001 Epoch: 29 Global Step: 610660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:55,589-Speed 6318.25 samples/sec Loss 3.5142 LearningRate 0.0001 Epoch: 29 Global Step: 610670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:16:58,834-Speed 6311.94 samples/sec Loss 3.4749 LearningRate 0.0001 Epoch: 29 Global Step: 610680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:02,083-Speed 6304.61 samples/sec Loss 3.5115 LearningRate 0.0001 Epoch: 29 Global Step: 610690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:05,327-Speed 6315.73 samples/sec Loss 3.5377 LearningRate 0.0001 Epoch: 29 Global Step: 610700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:08,574-Speed 6307.65 samples/sec Loss 3.5312 LearningRate 0.0001 Epoch: 29 Global Step: 610710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:11,822-Speed 6307.52 samples/sec Loss 3.4608 LearningRate 0.0001 Epoch: 29 Global Step: 610720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:17:15,051-Speed 6343.72 samples/sec Loss 3.4774 LearningRate 0.0001 Epoch: 29 Global Step: 610730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:18,300-Speed 6306.17 samples/sec Loss 3.4356 LearningRate 0.0001 Epoch: 29 Global Step: 610740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:21,550-Speed 6302.61 samples/sec Loss 3.4816 LearningRate 0.0001 Epoch: 29 Global Step: 610750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:24,795-Speed 6313.28 samples/sec Loss 3.5549 LearningRate 0.0001 Epoch: 29 Global Step: 610760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:28,043-Speed 6306.37 samples/sec 
Loss 3.5401 LearningRate 0.0001 Epoch: 29 Global Step: 610770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:31,300-Speed 6288.72 samples/sec Loss 3.4740 LearningRate 0.0001 Epoch: 29 Global Step: 610780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:34,551-Speed 6301.55 samples/sec Loss 3.4561 LearningRate 0.0001 Epoch: 29 Global Step: 610790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:37,795-Speed 6314.26 samples/sec Loss 3.4307 LearningRate 0.0001 Epoch: 29 Global Step: 610800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:41,040-Speed 6313.27 samples/sec Loss 3.4900 LearningRate 0.0001 Epoch: 29 Global Step: 610810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:44,283-Speed 6316.76 samples/sec Loss 3.5479 LearningRate 0.0001 Epoch: 29 Global Step: 610820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:47,512-Speed 6342.82 samples/sec Loss 3.4424 LearningRate 0.0001 Epoch: 29 Global Step: 610830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:50,760-Speed 6306.97 samples/sec Loss 3.5043 LearningRate 0.0001 Epoch: 29 Global Step: 610840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:54,001-Speed 6321.26 samples/sec Loss 3.4711 LearningRate 0.0001 Epoch: 29 Global Step: 610850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:17:57,255-Speed 6295.72 samples/sec Loss 3.4223 LearningRate 0.0001 Epoch: 29 Global Step: 610860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:00,503-Speed 6305.51 samples/sec Loss 3.4754 LearningRate 0.0001 Epoch: 29 Global Step: 610870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:03,785-Speed 6241.39 samples/sec Loss 3.5238 LearningRate 0.0001 Epoch: 29 Global Step: 610880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:07,037-Speed 6299.14 samples/sec Loss 3.5471 LearningRate 0.0001 Epoch: 29 
Global Step: 610890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:10,283-Speed 6310.28 samples/sec Loss 3.5262 LearningRate 0.0001 Epoch: 29 Global Step: 610900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:13,529-Speed 6311.55 samples/sec Loss 3.4660 LearningRate 0.0001 Epoch: 29 Global Step: 610910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:16,777-Speed 6306.75 samples/sec Loss 3.4981 LearningRate 0.0001 Epoch: 29 Global Step: 610920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:20,011-Speed 6334.57 samples/sec Loss 3.4336 LearningRate 0.0001 Epoch: 29 Global Step: 610930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:23,263-Speed 6298.95 samples/sec Loss 3.4521 LearningRate 0.0001 Epoch: 29 Global Step: 610940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:26,509-Speed 6312.08 samples/sec Loss 3.4344 LearningRate 0.0001 Epoch: 29 Global Step: 610950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:29,759-Speed 6301.77 samples/sec Loss 3.4958 LearningRate 0.0001 Epoch: 29 Global Step: 610960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:33,015-Speed 6291.87 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 29 Global Step: 610970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:36,259-Speed 6314.21 samples/sec Loss 3.5110 LearningRate 0.0001 Epoch: 29 Global Step: 610980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:39,504-Speed 6312.89 samples/sec Loss 3.4810 LearningRate 0.0001 Epoch: 29 Global Step: 610990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:42,752-Speed 6307.06 samples/sec Loss 3.4710 LearningRate 0.0001 Epoch: 29 Global Step: 611000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:45,997-Speed 6312.51 samples/sec Loss 3.4873 LearningRate 0.0001 Epoch: 29 Global Step: 611010 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:18:49,241-Speed 6315.35 samples/sec Loss 3.4853 LearningRate 0.0001 Epoch: 29 Global Step: 611020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:18:52,484-Speed 6315.78 samples/sec Loss 3.4693 LearningRate 0.0001 Epoch: 29 Global Step: 611030 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:18:55,729-Speed 6312.16 samples/sec Loss 3.3622 LearningRate 0.0001 Epoch: 29 Global Step: 611040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:18:58,984-Speed 6292.84 samples/sec Loss 3.5211 LearningRate 0.0001 Epoch: 29 Global Step: 611050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:02,238-Speed 6295.44 samples/sec Loss 3.5201 LearningRate 0.0001 Epoch: 29 Global Step: 611060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:05,485-Speed 6310.16 samples/sec Loss 3.4683 LearningRate 0.0001 Epoch: 29 Global Step: 611070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:08,729-Speed 6313.52 samples/sec Loss 3.4956 LearningRate 0.0001 Epoch: 29 Global Step: 611080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:11,972-Speed 6317.12 samples/sec Loss 3.5007 LearningRate 0.0001 Epoch: 29 Global Step: 611090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:15,217-Speed 6312.07 samples/sec Loss 3.4600 LearningRate 0.0001 Epoch: 29 Global Step: 611100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:18,464-Speed 6308.32 samples/sec Loss 3.4113 LearningRate 0.0001 Epoch: 29 Global Step: 611110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:21,711-Speed 6310.11 samples/sec Loss 3.4663 LearningRate 0.0001 Epoch: 29 Global Step: 611120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:24,956-Speed 6312.03 samples/sec Loss 3.4732 LearningRate 0.0001 Epoch: 29 Global Step: 611130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:19:28,203-Speed 6309.15 samples/sec Loss 3.5030 LearningRate 0.0001 Epoch: 29 Global Step: 611140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:31,435-Speed 6337.98 samples/sec Loss 3.5057 LearningRate 0.0001 Epoch: 29 Global Step: 611150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:34,684-Speed 6304.71 samples/sec Loss 3.4930 LearningRate 0.0001 Epoch: 29 Global Step: 611160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:37,929-Speed 6313.26 samples/sec Loss 3.4745 LearningRate 0.0001 Epoch: 29 Global Step: 611170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:41,175-Speed 6311.54 samples/sec Loss 3.4291 LearningRate 0.0001 Epoch: 29 Global Step: 611180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:44,417-Speed 6317.92 samples/sec Loss 3.5264 LearningRate 0.0001 Epoch: 29 Global Step: 611190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:47,663-Speed 6309.98 samples/sec Loss 3.5802 LearningRate 0.0001 Epoch: 29 Global Step: 611200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:50,908-Speed 6313.34 samples/sec Loss 3.5076 LearningRate 0.0001 Epoch: 29 Global Step: 611210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:54,152-Speed 6314.89 samples/sec Loss 3.4609 LearningRate 0.0001 Epoch: 29 Global Step: 611220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:19:57,400-Speed 6307.31 samples/sec Loss 3.5068 LearningRate 0.0001 Epoch: 29 Global Step: 611230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:00,643-Speed 6315.89 samples/sec Loss 3.4964 LearningRate 0.0001 Epoch: 29 Global Step: 611240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:03,890-Speed 6308.83 samples/sec Loss 3.4660 LearningRate 0.0001 Epoch: 29 Global Step: 611250 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:20:07,125-Speed 6332.90 samples/sec 
Loss 3.4529 LearningRate 0.0001 Epoch: 29 Global Step: 611260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:10,369-Speed 6312.81 samples/sec Loss 3.5467 LearningRate 0.0001 Epoch: 29 Global Step: 611270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:13,616-Speed 6309.74 samples/sec Loss 3.5122 LearningRate 0.0001 Epoch: 29 Global Step: 611280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:16,860-Speed 6314.97 samples/sec Loss 3.4751 LearningRate 0.0001 Epoch: 29 Global Step: 611290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:20,110-Speed 6304.35 samples/sec Loss 3.4824 LearningRate 0.0001 Epoch: 29 Global Step: 611300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:23,356-Speed 6310.07 samples/sec Loss 3.4972 LearningRate 0.0001 Epoch: 29 Global Step: 611310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:26,602-Speed 6311.80 samples/sec Loss 3.4865 LearningRate 0.0001 Epoch: 29 Global Step: 611320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:29,861-Speed 6285.29 samples/sec Loss 3.5170 LearningRate 0.0001 Epoch: 29 Global Step: 611330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:33,111-Speed 6302.23 samples/sec Loss 3.4496 LearningRate 0.0001 Epoch: 29 Global Step: 611340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:36,354-Speed 6315.57 samples/sec Loss 3.5030 LearningRate 0.0001 Epoch: 29 Global Step: 611350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:39,604-Speed 6304.84 samples/sec Loss 3.4550 LearningRate 0.0001 Epoch: 29 Global Step: 611360 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:20:42,835-Speed 6339.73 samples/sec Loss 3.5315 LearningRate 0.0001 Epoch: 29 Global Step: 611370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:46,081-Speed 6310.82 samples/sec Loss 3.4797 LearningRate 0.0001 Epoch: 29 
Global Step: 611380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:49,325-Speed 6315.37 samples/sec Loss 3.4252 LearningRate 0.0001 Epoch: 29 Global Step: 611390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:52,570-Speed 6311.94 samples/sec Loss 3.4288 LearningRate 0.0001 Epoch: 29 Global Step: 611400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:55,812-Speed 6318.84 samples/sec Loss 3.4957 LearningRate 0.0001 Epoch: 29 Global Step: 611410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:20:59,057-Speed 6312.81 samples/sec Loss 3.4544 LearningRate 0.0001 Epoch: 29 Global Step: 611420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:02,398-Speed 6131.61 samples/sec Loss 3.4550 LearningRate 0.0001 Epoch: 29 Global Step: 611430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:05,639-Speed 6318.66 samples/sec Loss 3.5140 LearningRate 0.0001 Epoch: 29 Global Step: 611440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:08,885-Speed 6310.86 samples/sec Loss 3.4600 LearningRate 0.0001 Epoch: 29 Global Step: 611450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:12,136-Speed 6302.18 samples/sec Loss 3.5005 LearningRate 0.0001 Epoch: 29 Global Step: 611460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:15,381-Speed 6311.76 samples/sec Loss 3.4617 LearningRate 0.0001 Epoch: 29 Global Step: 611470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:21:18,612-Speed 6340.19 samples/sec Loss 3.5389 LearningRate 0.0001 Epoch: 29 Global Step: 611480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:21,856-Speed 6314.37 samples/sec Loss 3.4681 LearningRate 0.0001 Epoch: 29 Global Step: 611490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:25,103-Speed 6309.13 samples/sec Loss 3.4374 LearningRate 0.0001 Epoch: 29 Global Step: 611500 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:21:28,346-Speed 6316.08 samples/sec Loss 3.4080 LearningRate 0.0001 Epoch: 29 Global Step: 611510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:31,597-Speed 6301.66 samples/sec Loss 3.4797 LearningRate 0.0001 Epoch: 29 Global Step: 611520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:34,841-Speed 6314.26 samples/sec Loss 3.5391 LearningRate 0.0001 Epoch: 29 Global Step: 611530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:38,085-Speed 6314.71 samples/sec Loss 3.4952 LearningRate 0.0001 Epoch: 29 Global Step: 611540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:41,404-Speed 6172.96 samples/sec Loss 3.5240 LearningRate 0.0001 Epoch: 29 Global Step: 611550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:44,661-Speed 6287.52 samples/sec Loss 3.4929 LearningRate 0.0001 Epoch: 29 Global Step: 611560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:47,903-Speed 6319.33 samples/sec Loss 3.4230 LearningRate 0.0001 Epoch: 29 Global Step: 611570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:51,136-Speed 6336.35 samples/sec Loss 3.5003 LearningRate 0.0001 Epoch: 29 Global Step: 611580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:54,384-Speed 6307.35 samples/sec Loss 3.5313 LearningRate 0.0001 Epoch: 29 Global Step: 611590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:21:57,628-Speed 6313.13 samples/sec Loss 3.5064 LearningRate 0.0001 Epoch: 29 Global Step: 611600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:00,877-Speed 6306.59 samples/sec Loss 3.4844 LearningRate 0.0001 Epoch: 29 Global Step: 611610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:04,123-Speed 6309.72 samples/sec Loss 3.4221 LearningRate 0.0001 Epoch: 29 Global Step: 611620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:22:07,371-Speed 6308.18 samples/sec Loss 3.4871 LearningRate 0.0001 Epoch: 29 Global Step: 611630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:10,615-Speed 6314.11 samples/sec Loss 3.4459 LearningRate 0.0001 Epoch: 29 Global Step: 611640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:13,862-Speed 6309.25 samples/sec Loss 3.4501 LearningRate 0.0001 Epoch: 29 Global Step: 611650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:17,108-Speed 6309.89 samples/sec Loss 3.4623 LearningRate 0.0001 Epoch: 29 Global Step: 611660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:20,350-Speed 6318.86 samples/sec Loss 3.5261 LearningRate 0.0001 Epoch: 29 Global Step: 611670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:23,581-Speed 6339.70 samples/sec Loss 3.4736 LearningRate 0.0001 Epoch: 29 Global Step: 611680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:26,826-Speed 6313.32 samples/sec Loss 3.4894 LearningRate 0.0001 Epoch: 29 Global Step: 611690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:30,071-Speed 6311.29 samples/sec Loss 3.4963 LearningRate 0.0001 Epoch: 29 Global Step: 611700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:33,318-Speed 6309.70 samples/sec Loss 3.4502 LearningRate 0.0001 Epoch: 29 Global Step: 611710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:36,561-Speed 6316.12 samples/sec Loss 3.4856 LearningRate 0.0001 Epoch: 29 Global Step: 611720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:39,803-Speed 6318.51 samples/sec Loss 3.4932 LearningRate 0.0001 Epoch: 29 Global Step: 611730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:43,048-Speed 6312.38 samples/sec Loss 3.3982 LearningRate 0.0001 Epoch: 29 Global Step: 611740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:46,307-Speed 6285.47 samples/sec Loss 
3.4812 LearningRate 0.0001 Epoch: 29 Global Step: 611750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:49,552-Speed 6315.42 samples/sec Loss 3.4762 LearningRate 0.0001 Epoch: 29 Global Step: 611760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:52,794-Speed 6318.03 samples/sec Loss 3.5499 LearningRate 0.0001 Epoch: 29 Global Step: 611770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:56,027-Speed 6336.88 samples/sec Loss 3.5306 LearningRate 0.0001 Epoch: 29 Global Step: 611780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:22:59,272-Speed 6312.44 samples/sec Loss 3.4877 LearningRate 0.0001 Epoch: 29 Global Step: 611790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:02,516-Speed 6314.98 samples/sec Loss 3.4625 LearningRate 0.0001 Epoch: 29 Global Step: 611800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:05,783-Speed 6270.37 samples/sec Loss 3.4851 LearningRate 0.0001 Epoch: 29 Global Step: 611810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:09,029-Speed 6310.08 samples/sec Loss 3.4265 LearningRate 0.0001 Epoch: 29 Global Step: 611820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:12,273-Speed 6314.12 samples/sec Loss 3.5127 LearningRate 0.0001 Epoch: 29 Global Step: 611830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:15,518-Speed 6313.05 samples/sec Loss 3.4332 LearningRate 0.0001 Epoch: 29 Global Step: 611840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:18,762-Speed 6315.44 samples/sec Loss 3.4697 LearningRate 0.0001 Epoch: 29 Global Step: 611850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:22,004-Speed 6318.89 samples/sec Loss 3.4871 LearningRate 0.0001 Epoch: 29 Global Step: 611860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:25,248-Speed 6313.67 samples/sec Loss 3.4159 LearningRate 0.0001 Epoch: 29 Global 
Step: 611870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:28,478-Speed 6343.35 samples/sec Loss 3.5361 LearningRate 0.0001 Epoch: 29 Global Step: 611880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:31,722-Speed 6314.68 samples/sec Loss 3.5051 LearningRate 0.0001 Epoch: 29 Global Step: 611890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:34,964-Speed 6317.76 samples/sec Loss 3.4533 LearningRate 0.0001 Epoch: 29 Global Step: 611900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:38,203-Speed 6324.76 samples/sec Loss 3.4277 LearningRate 0.0001 Epoch: 29 Global Step: 611910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:41,453-Speed 6302.46 samples/sec Loss 3.4721 LearningRate 0.0001 Epoch: 29 Global Step: 611920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:44,707-Speed 6294.57 samples/sec Loss 3.4988 LearningRate 0.0001 Epoch: 29 Global Step: 611930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:47,954-Speed 6310.36 samples/sec Loss 3.4452 LearningRate 0.0001 Epoch: 29 Global Step: 611940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:51,202-Speed 6306.81 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 611950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:54,448-Speed 6309.58 samples/sec Loss 3.5269 LearningRate 0.0001 Epoch: 29 Global Step: 611960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:23:57,694-Speed 6311.23 samples/sec Loss 3.5571 LearningRate 0.0001 Epoch: 29 Global Step: 611970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:00,942-Speed 6306.78 samples/sec Loss 3.4761 LearningRate 0.0001 Epoch: 29 Global Step: 611980 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:24:04,174-Speed 6337.08 samples/sec Loss 3.4223 LearningRate 0.0001 Epoch: 29 Global Step: 611990 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:24:07,420-Speed 6310.87 samples/sec Loss 3.4916 LearningRate 0.0001 Epoch: 29 Global Step: 612000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:10,670-Speed 6303.67 samples/sec Loss 3.5056 LearningRate 0.0001 Epoch: 29 Global Step: 612010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:13,915-Speed 6312.38 samples/sec Loss 3.4705 LearningRate 0.0001 Epoch: 29 Global Step: 612020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:17,164-Speed 6304.09 samples/sec Loss 3.4645 LearningRate 0.0001 Epoch: 29 Global Step: 612030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:20,415-Speed 6301.07 samples/sec Loss 3.5542 LearningRate 0.0001 Epoch: 29 Global Step: 612040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:23,662-Speed 6309.99 samples/sec Loss 3.4677 LearningRate 0.0001 Epoch: 29 Global Step: 612050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:26,915-Speed 6296.44 samples/sec Loss 3.4737 LearningRate 0.0001 Epoch: 29 Global Step: 612060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:30,161-Speed 6312.27 samples/sec Loss 3.4826 LearningRate 0.0001 Epoch: 29 Global Step: 612070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:33,406-Speed 6311.46 samples/sec Loss 3.4521 LearningRate 0.0001 Epoch: 29 Global Step: 612080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:36,638-Speed 6338.81 samples/sec Loss 3.4820 LearningRate 0.0001 Epoch: 29 Global Step: 612090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:39,883-Speed 6314.53 samples/sec Loss 3.4863 LearningRate 0.0001 Epoch: 29 Global Step: 612100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:43,129-Speed 6310.22 samples/sec Loss 3.4898 LearningRate 0.0001 Epoch: 29 Global Step: 612110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:24:46,379-Speed 6303.79 samples/sec Loss 3.4348 LearningRate 0.0001 Epoch: 29 Global Step: 612120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:49,626-Speed 6308.57 samples/sec Loss 3.4184 LearningRate 0.0001 Epoch: 29 Global Step: 612130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:52,870-Speed 6315.41 samples/sec Loss 3.4460 LearningRate 0.0001 Epoch: 29 Global Step: 612140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:56,117-Speed 6308.81 samples/sec Loss 3.5047 LearningRate 0.0001 Epoch: 29 Global Step: 612150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:24:59,366-Speed 6303.52 samples/sec Loss 3.4941 LearningRate 0.0001 Epoch: 29 Global Step: 612160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:02,613-Speed 6310.70 samples/sec Loss 3.4485 LearningRate 0.0001 Epoch: 29 Global Step: 612170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:05,859-Speed 6310.04 samples/sec Loss 3.4485 LearningRate 0.0001 Epoch: 29 Global Step: 612180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:09,092-Speed 6336.81 samples/sec Loss 3.4791 LearningRate 0.0001 Epoch: 29 Global Step: 612190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:12,346-Speed 6294.41 samples/sec Loss 3.4677 LearningRate 0.0001 Epoch: 29 Global Step: 612200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:15,593-Speed 6308.61 samples/sec Loss 3.4959 LearningRate 0.0001 Epoch: 29 Global Step: 612210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:18,840-Speed 6307.64 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 612220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:22,089-Speed 6305.38 samples/sec Loss 3.4783 LearningRate 0.0001 Epoch: 29 Global Step: 612230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:25,337-Speed 6306.74 samples/sec Loss 
3.4750 LearningRate 0.0001 Epoch: 29 Global Step: 612240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:28,579-Speed 6320.13 samples/sec Loss 3.4469 LearningRate 0.0001 Epoch: 29 Global Step: 612250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:31,822-Speed 6316.30 samples/sec Loss 3.4973 LearningRate 0.0001 Epoch: 29 Global Step: 612260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:35,071-Speed 6303.71 samples/sec Loss 3.5138 LearningRate 0.0001 Epoch: 29 Global Step: 612270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:38,316-Speed 6313.89 samples/sec Loss 3.4553 LearningRate 0.0001 Epoch: 29 Global Step: 612280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:41,560-Speed 6313.10 samples/sec Loss 3.4829 LearningRate 0.0001 Epoch: 29 Global Step: 612290 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:25:44,795-Speed 6332.85 samples/sec Loss 3.5157 LearningRate 0.0001 Epoch: 29 Global Step: 612300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:48,038-Speed 6317.41 samples/sec Loss 3.5029 LearningRate 0.0001 Epoch: 29 Global Step: 612310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:51,285-Speed 6309.47 samples/sec Loss 3.4666 LearningRate 0.0001 Epoch: 29 Global Step: 612320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:54,526-Speed 6319.25 samples/sec Loss 3.4827 LearningRate 0.0001 Epoch: 29 Global Step: 612330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:25:57,776-Speed 6303.44 samples/sec Loss 3.4473 LearningRate 0.0001 Epoch: 29 Global Step: 612340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:01,024-Speed 6306.97 samples/sec Loss 3.5207 LearningRate 0.0001 Epoch: 29 Global Step: 612350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:04,270-Speed 6310.90 samples/sec Loss 3.5057 LearningRate 0.0001 Epoch: 29 
Global Step: 612360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:07,514-Speed 6314.53 samples/sec Loss 3.4917 LearningRate 0.0001 Epoch: 29 Global Step: 612370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:10,757-Speed 6317.81 samples/sec Loss 3.4540 LearningRate 0.0001 Epoch: 29 Global Step: 612380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:13,999-Speed 6317.20 samples/sec Loss 3.4984 LearningRate 0.0001 Epoch: 29 Global Step: 612390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:17,244-Speed 6313.33 samples/sec Loss 3.5026 LearningRate 0.0001 Epoch: 29 Global Step: 612400 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:26:20,476-Speed 6337.14 samples/sec Loss 3.4722 LearningRate 0.0001 Epoch: 29 Global Step: 612410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:23,717-Speed 6321.52 samples/sec Loss 3.5236 LearningRate 0.0001 Epoch: 29 Global Step: 612420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:26,963-Speed 6310.66 samples/sec Loss 3.5134 LearningRate 0.0001 Epoch: 29 Global Step: 612430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:30,205-Speed 6318.54 samples/sec Loss 3.4860 LearningRate 0.0001 Epoch: 29 Global Step: 612440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:33,447-Speed 6317.66 samples/sec Loss 3.5066 LearningRate 0.0001 Epoch: 29 Global Step: 612450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:36,691-Speed 6314.39 samples/sec Loss 3.4394 LearningRate 0.0001 Epoch: 29 Global Step: 612460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:39,939-Speed 6307.56 samples/sec Loss 3.4635 LearningRate 0.0001 Epoch: 29 Global Step: 612470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:43,184-Speed 6312.46 samples/sec Loss 3.4839 LearningRate 0.0001 Epoch: 29 Global Step: 612480 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:26:46,429-Speed 6311.67 samples/sec Loss 3.4962 LearningRate 0.0001 Epoch: 29 Global Step: 612490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:49,675-Speed 6310.69 samples/sec Loss 3.5458 LearningRate 0.0001 Epoch: 29 Global Step: 612500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:52,907-Speed 6339.38 samples/sec Loss 3.4668 LearningRate 0.0001 Epoch: 29 Global Step: 612510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:56,156-Speed 6305.26 samples/sec Loss 3.4618 LearningRate 0.0001 Epoch: 29 Global Step: 612520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:26:59,436-Speed 6245.87 samples/sec Loss 3.5049 LearningRate 0.0001 Epoch: 29 Global Step: 612530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:02,679-Speed 6316.01 samples/sec Loss 3.4311 LearningRate 0.0001 Epoch: 29 Global Step: 612540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:05,923-Speed 6313.88 samples/sec Loss 3.4754 LearningRate 0.0001 Epoch: 29 Global Step: 612550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:09,170-Speed 6310.58 samples/sec Loss 3.5104 LearningRate 0.0001 Epoch: 29 Global Step: 612560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:12,415-Speed 6311.32 samples/sec Loss 3.4834 LearningRate 0.0001 Epoch: 29 Global Step: 612570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:15,662-Speed 6308.54 samples/sec Loss 3.4751 LearningRate 0.0001 Epoch: 29 Global Step: 612580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:18,905-Speed 6315.92 samples/sec Loss 3.4660 LearningRate 0.0001 Epoch: 29 Global Step: 612590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:22,152-Speed 6309.69 samples/sec Loss 3.3980 LearningRate 0.0001 Epoch: 29 Global Step: 612600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:27:25,399-Speed 6309.13 samples/sec Loss 3.4734 LearningRate 0.0001 Epoch: 29 Global Step: 612610 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:27:28,629-Speed 6342.67 samples/sec Loss 3.4559 LearningRate 0.0001 Epoch: 29 Global Step: 612620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:31,886-Speed 6287.72 samples/sec Loss 3.4816 LearningRate 0.0001 Epoch: 29 Global Step: 612630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:35,145-Speed 6286.21 samples/sec Loss 3.4959 LearningRate 0.0001 Epoch: 29 Global Step: 612640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:38,390-Speed 6312.46 samples/sec Loss 3.4570 LearningRate 0.0001 Epoch: 29 Global Step: 612650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:41,645-Speed 6294.08 samples/sec Loss 3.3959 LearningRate 0.0001 Epoch: 29 Global Step: 612660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:44,892-Speed 6308.06 samples/sec Loss 3.5178 LearningRate 0.0001 Epoch: 29 Global Step: 612670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:48,139-Speed 6308.30 samples/sec Loss 3.4612 LearningRate 0.0001 Epoch: 29 Global Step: 612680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:51,385-Speed 6311.04 samples/sec Loss 3.4976 LearningRate 0.0001 Epoch: 29 Global Step: 612690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:54,631-Speed 6310.54 samples/sec Loss 3.4985 LearningRate 0.0001 Epoch: 29 Global Step: 612700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:27:57,879-Speed 6306.23 samples/sec Loss 3.5039 LearningRate 0.0001 Epoch: 29 Global Step: 612710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:01,111-Speed 6338.73 samples/sec Loss 3.4889 LearningRate 0.0001 Epoch: 29 Global Step: 612720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:04,358-Speed 6308.16 samples/sec 
Loss 3.4673 LearningRate 0.0001 Epoch: 29 Global Step: 612730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:07,602-Speed 6315.55 samples/sec Loss 3.4312 LearningRate 0.0001 Epoch: 29 Global Step: 612740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:10,863-Speed 6282.20 samples/sec Loss 3.4961 LearningRate 0.0001 Epoch: 29 Global Step: 612750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:14,105-Speed 6318.08 samples/sec Loss 3.4556 LearningRate 0.0001 Epoch: 29 Global Step: 612760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:17,348-Speed 6317.73 samples/sec Loss 3.4825 LearningRate 0.0001 Epoch: 29 Global Step: 612770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:20,621-Speed 6257.60 samples/sec Loss 3.5006 LearningRate 0.0001 Epoch: 29 Global Step: 612780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:23,866-Speed 6312.55 samples/sec Loss 3.4248 LearningRate 0.0001 Epoch: 29 Global Step: 612790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:27,110-Speed 6315.93 samples/sec Loss 3.4430 LearningRate 0.0001 Epoch: 29 Global Step: 612800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:30,357-Speed 6308.24 samples/sec Loss 3.4343 LearningRate 0.0001 Epoch: 29 Global Step: 612810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:33,591-Speed 6334.51 samples/sec Loss 3.5162 LearningRate 0.0001 Epoch: 29 Global Step: 612820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:36,834-Speed 6316.00 samples/sec Loss 3.4335 LearningRate 0.0001 Epoch: 29 Global Step: 612830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:40,080-Speed 6309.82 samples/sec Loss 3.4348 LearningRate 0.0001 Epoch: 29 Global Step: 612840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:43,327-Speed 6308.78 samples/sec Loss 3.5557 LearningRate 0.0001 Epoch: 29 
Global Step: 612850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:46,573-Speed 6311.77 samples/sec Loss 3.5209 LearningRate 0.0001 Epoch: 29 Global Step: 612860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:49,820-Speed 6308.45 samples/sec Loss 3.4713 LearningRate 0.0001 Epoch: 29 Global Step: 612870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:53,066-Speed 6310.89 samples/sec Loss 3.4717 LearningRate 0.0001 Epoch: 29 Global Step: 612880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:56,312-Speed 6310.42 samples/sec Loss 3.4523 LearningRate 0.0001 Epoch: 29 Global Step: 612890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:28:59,555-Speed 6316.02 samples/sec Loss 3.4491 LearningRate 0.0001 Epoch: 29 Global Step: 612900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:02,801-Speed 6312.00 samples/sec Loss 3.4793 LearningRate 0.0001 Epoch: 29 Global Step: 612910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:06,034-Speed 6335.06 samples/sec Loss 3.4433 LearningRate 0.0001 Epoch: 29 Global Step: 612920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:09,279-Speed 6312.21 samples/sec Loss 3.4785 LearningRate 0.0001 Epoch: 29 Global Step: 612930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:12,524-Speed 6313.54 samples/sec Loss 3.4431 LearningRate 0.0001 Epoch: 29 Global Step: 612940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:15,765-Speed 6319.19 samples/sec Loss 3.4473 LearningRate 0.0001 Epoch: 29 Global Step: 612950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:19,007-Speed 6318.35 samples/sec Loss 3.5106 LearningRate 0.0001 Epoch: 29 Global Step: 612960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:22,253-Speed 6310.62 samples/sec Loss 3.4700 LearningRate 0.0001 Epoch: 29 Global Step: 612970 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:29:25,499-Speed 6312.34 samples/sec Loss 3.4905 LearningRate 0.0001 Epoch: 29 Global Step: 612980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:28,744-Speed 6313.55 samples/sec Loss 3.4673 LearningRate 0.0001 Epoch: 29 Global Step: 612990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:31,991-Speed 6308.33 samples/sec Loss 3.4999 LearningRate 0.0001 Epoch: 29 Global Step: 613000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:35,247-Speed 6291.65 samples/sec Loss 3.4890 LearningRate 0.0001 Epoch: 29 Global Step: 613010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:38,494-Speed 6309.80 samples/sec Loss 3.4159 LearningRate 0.0001 Epoch: 29 Global Step: 613020 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:29:41,731-Speed 6327.30 samples/sec Loss 3.4446 LearningRate 0.0001 Epoch: 29 Global Step: 613030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:44,976-Speed 6312.06 samples/sec Loss 3.4591 LearningRate 0.0001 Epoch: 29 Global Step: 613040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:48,222-Speed 6310.67 samples/sec Loss 3.4821 LearningRate 0.0001 Epoch: 29 Global Step: 613050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:51,464-Speed 6318.40 samples/sec Loss 3.4500 LearningRate 0.0001 Epoch: 29 Global Step: 613060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:54,707-Speed 6317.51 samples/sec Loss 3.5064 LearningRate 0.0001 Epoch: 29 Global Step: 613070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:29:57,954-Speed 6309.50 samples/sec Loss 3.4472 LearningRate 0.0001 Epoch: 29 Global Step: 613080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:01,200-Speed 6311.26 samples/sec Loss 3.5027 LearningRate 0.0001 Epoch: 29 Global Step: 613090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:30:04,448-Speed 6305.84 samples/sec Loss 3.5201 LearningRate 0.0001 Epoch: 29 Global Step: 613100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:07,700-Speed 6298.58 samples/sec Loss 3.4894 LearningRate 0.0001 Epoch: 29 Global Step: 613110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:10,948-Speed 6306.68 samples/sec Loss 3.4775 LearningRate 0.0001 Epoch: 29 Global Step: 613120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:14,181-Speed 6336.55 samples/sec Loss 3.4268 LearningRate 0.0001 Epoch: 29 Global Step: 613130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:17,427-Speed 6310.43 samples/sec Loss 3.5111 LearningRate 0.0001 Epoch: 29 Global Step: 613140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:20,680-Speed 6296.87 samples/sec Loss 3.4886 LearningRate 0.0001 Epoch: 29 Global Step: 613150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:23,923-Speed 6315.88 samples/sec Loss 3.4977 LearningRate 0.0001 Epoch: 29 Global Step: 613160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:27,168-Speed 6313.87 samples/sec Loss 3.4445 LearningRate 0.0001 Epoch: 29 Global Step: 613170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:30,412-Speed 6313.39 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 613180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:33,655-Speed 6317.33 samples/sec Loss 3.4345 LearningRate 0.0001 Epoch: 29 Global Step: 613190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:36,903-Speed 6306.94 samples/sec Loss 3.4483 LearningRate 0.0001 Epoch: 29 Global Step: 613200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:40,150-Speed 6310.40 samples/sec Loss 3.4294 LearningRate 0.0001 Epoch: 29 Global Step: 613210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:43,394-Speed 6314.31 samples/sec Loss 
3.4550 LearningRate 0.0001 Epoch: 29 Global Step: 613220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:46,652-Speed 6286.71 samples/sec Loss 3.4196 LearningRate 0.0001 Epoch: 29 Global Step: 613230 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:30:49,899-Speed 6309.37 samples/sec Loss 3.5070 LearningRate 0.0001 Epoch: 29 Global Step: 613240 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:30:53,132-Speed 6336.65 samples/sec Loss 3.4685 LearningRate 0.0001 Epoch: 29 Global Step: 613250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:56,376-Speed 6314.42 samples/sec Loss 3.4637 LearningRate 0.0001 Epoch: 29 Global Step: 613260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:30:59,618-Speed 6317.87 samples/sec Loss 3.4699 LearningRate 0.0001 Epoch: 29 Global Step: 613270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:02,867-Speed 6306.11 samples/sec Loss 3.4518 LearningRate 0.0001 Epoch: 29 Global Step: 613280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:06,117-Speed 6301.31 samples/sec Loss 3.4484 LearningRate 0.0001 Epoch: 29 Global Step: 613290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:09,362-Speed 6312.81 samples/sec Loss 3.4743 LearningRate 0.0001 Epoch: 29 Global Step: 613300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:12,608-Speed 6310.37 samples/sec Loss 3.4925 LearningRate 0.0001 Epoch: 29 Global Step: 613310 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:15,854-Speed 6312.23 samples/sec Loss 3.5160 LearningRate 0.0001 Epoch: 29 Global Step: 613320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:19,102-Speed 6306.38 samples/sec Loss 3.4333 LearningRate 0.0001 Epoch: 29 Global Step: 613330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:22,350-Speed 6306.85 samples/sec Loss 3.5139 LearningRate 0.0001 Epoch: 29 
Global Step: 613340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:25,580-Speed 6342.05 samples/sec Loss 3.4464 LearningRate 0.0001 Epoch: 29 Global Step: 613350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:28,823-Speed 6316.19 samples/sec Loss 3.4334 LearningRate 0.0001 Epoch: 29 Global Step: 613360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:32,078-Speed 6294.23 samples/sec Loss 3.4129 LearningRate 0.0001 Epoch: 29 Global Step: 613370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:35,326-Speed 6305.55 samples/sec Loss 3.4824 LearningRate 0.0001 Epoch: 29 Global Step: 613380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:38,574-Speed 6307.59 samples/sec Loss 3.4585 LearningRate 0.0001 Epoch: 29 Global Step: 613390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:41,819-Speed 6311.13 samples/sec Loss 3.4667 LearningRate 0.0001 Epoch: 29 Global Step: 613400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:45,066-Speed 6310.70 samples/sec Loss 3.4472 LearningRate 0.0001 Epoch: 29 Global Step: 613410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:48,314-Speed 6305.45 samples/sec Loss 3.4788 LearningRate 0.0001 Epoch: 29 Global Step: 613420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:51,556-Speed 6319.54 samples/sec Loss 3.4505 LearningRate 0.0001 Epoch: 29 Global Step: 613430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:54,804-Speed 6306.74 samples/sec Loss 3.4849 LearningRate 0.0001 Epoch: 29 Global Step: 613440 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:31:58,049-Speed 6313.60 samples/sec Loss 3.4395 LearningRate 0.0001 Epoch: 29 Global Step: 613450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:32:01,286-Speed 6328.66 samples/sec Loss 3.4954 LearningRate 0.0001 Epoch: 29 Global Step: 613460 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:32:04,535-Speed 6304.58 samples/sec Loss 3.5058 LearningRate 0.0001 Epoch: 29 Global Step: 613470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:07,778-Speed 6315.40 samples/sec Loss 3.5021 LearningRate 0.0001 Epoch: 29 Global Step: 613480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:11,032-Speed 6295.41 samples/sec Loss 3.5104 LearningRate 0.0001 Epoch: 29 Global Step: 613490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:14,276-Speed 6315.39 samples/sec Loss 3.4887 LearningRate 0.0001 Epoch: 29 Global Step: 613500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:17,523-Speed 6309.03 samples/sec Loss 3.4708 LearningRate 0.0001 Epoch: 29 Global Step: 613510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:20,766-Speed 6315.96 samples/sec Loss 3.4719 LearningRate 0.0001 Epoch: 29 Global Step: 613520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:24,009-Speed 6315.07 samples/sec Loss 3.4342 LearningRate 0.0001 Epoch: 29 Global Step: 613530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:27,279-Speed 6266.18 samples/sec Loss 3.4704 LearningRate 0.0001 Epoch: 29 Global Step: 613540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:30,524-Speed 6311.26 samples/sec Loss 3.4896 LearningRate 0.0001 Epoch: 29 Global Step: 613550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:33,770-Speed 6310.76 samples/sec Loss 3.5346 LearningRate 0.0001 Epoch: 29 Global Step: 613560 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:32:37,002-Speed 6339.13 samples/sec Loss 3.4524 LearningRate 0.0001 Epoch: 29 Global Step: 613570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:40,247-Speed 6312.46 samples/sec Loss 3.4914 LearningRate 0.0001 Epoch: 29 Global Step: 613580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:32:43,490-Speed 6315.35 samples/sec Loss 3.4847 LearningRate 0.0001 Epoch: 29 Global Step: 613590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:46,735-Speed 6314.57 samples/sec Loss 3.5086 LearningRate 0.0001 Epoch: 29 Global Step: 613600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:49,979-Speed 6314.52 samples/sec Loss 3.4339 LearningRate 0.0001 Epoch: 29 Global Step: 613610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:53,222-Speed 6315.40 samples/sec Loss 3.3898 LearningRate 0.0001 Epoch: 29 Global Step: 613620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:56,468-Speed 6311.43 samples/sec Loss 3.4832 LearningRate 0.0001 Epoch: 29 Global Step: 613630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:32:59,734-Speed 6271.95 samples/sec Loss 3.4640 LearningRate 0.0001 Epoch: 29 Global Step: 613640 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:02,978-Speed 6314.32 samples/sec Loss 3.4908 LearningRate 0.0001 Epoch: 29 Global Step: 613650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:06,225-Speed 6309.46 samples/sec Loss 3.4258 LearningRate 0.0001 Epoch: 29 Global Step: 613660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:09,470-Speed 6312.08 samples/sec Loss 3.4243 LearningRate 0.0001 Epoch: 29 Global Step: 613670 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:33:12,700-Speed 6342.60 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 29 Global Step: 613680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:15,956-Speed 6292.72 samples/sec Loss 3.4338 LearningRate 0.0001 Epoch: 29 Global Step: 613690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:19,257-Speed 6203.75 samples/sec Loss 3.4734 LearningRate 0.0001 Epoch: 29 Global Step: 613700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:22,505-Speed 6307.85 samples/sec 
Loss 3.4253 LearningRate 0.0001 Epoch: 29 Global Step: 613710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:25,751-Speed 6313.52 samples/sec Loss 3.4389 LearningRate 0.0001 Epoch: 29 Global Step: 613720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:28,994-Speed 6316.12 samples/sec Loss 3.4270 LearningRate 0.0001 Epoch: 29 Global Step: 613730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:32,239-Speed 6313.40 samples/sec Loss 3.4768 LearningRate 0.0001 Epoch: 29 Global Step: 613740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:35,489-Speed 6303.55 samples/sec Loss 3.4926 LearningRate 0.0001 Epoch: 29 Global Step: 613750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:38,734-Speed 6312.82 samples/sec Loss 3.4938 LearningRate 0.0001 Epoch: 29 Global Step: 613760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:41,978-Speed 6313.96 samples/sec Loss 3.4180 LearningRate 0.0001 Epoch: 29 Global Step: 613770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:45,209-Speed 6340.40 samples/sec Loss 3.3777 LearningRate 0.0001 Epoch: 29 Global Step: 613780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:48,453-Speed 6314.28 samples/sec Loss 3.4550 LearningRate 0.0001 Epoch: 29 Global Step: 613790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:51,696-Speed 6316.97 samples/sec Loss 3.4393 LearningRate 0.0001 Epoch: 29 Global Step: 613800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:54,941-Speed 6313.11 samples/sec Loss 3.4320 LearningRate 0.0001 Epoch: 29 Global Step: 613810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:33:58,188-Speed 6308.64 samples/sec Loss 3.4879 LearningRate 0.0001 Epoch: 29 Global Step: 613820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:01,432-Speed 6313.25 samples/sec Loss 3.5183 LearningRate 0.0001 Epoch: 29 
Global Step: 613830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:04,687-Speed 6294.61 samples/sec Loss 3.4569 LearningRate 0.0001 Epoch: 29 Global Step: 613840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:07,930-Speed 6315.48 samples/sec Loss 3.4463 LearningRate 0.0001 Epoch: 29 Global Step: 613850 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:11,174-Speed 6314.67 samples/sec Loss 3.5000 LearningRate 0.0001 Epoch: 29 Global Step: 613860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:14,449-Speed 6255.71 samples/sec Loss 3.4579 LearningRate 0.0001 Epoch: 29 Global Step: 613870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:17,696-Speed 6309.78 samples/sec Loss 3.4635 LearningRate 0.0001 Epoch: 29 Global Step: 613880 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:34:20,939-Speed 6316.04 samples/sec Loss 3.4559 LearningRate 0.0001 Epoch: 29 Global Step: 613890 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:34:24,171-Speed 6337.60 samples/sec Loss 3.4153 LearningRate 0.0001 Epoch: 29 Global Step: 613900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:27,420-Speed 6305.59 samples/sec Loss 3.5140 LearningRate 0.0001 Epoch: 29 Global Step: 613910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:30,666-Speed 6309.80 samples/sec Loss 3.4808 LearningRate 0.0001 Epoch: 29 Global Step: 613920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:33,913-Speed 6308.92 samples/sec Loss 3.4218 LearningRate 0.0001 Epoch: 29 Global Step: 613930 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:37,160-Speed 6309.53 samples/sec Loss 3.3890 LearningRate 0.0001 Epoch: 29 Global Step: 613940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:40,408-Speed 6306.76 samples/sec Loss 3.4001 LearningRate 0.0001 Epoch: 29 Global Step: 613950 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:34:43,652-Speed 6314.43 samples/sec Loss 3.4594 LearningRate 0.0001 Epoch: 29 Global Step: 613960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:46,899-Speed 6308.87 samples/sec Loss 3.3855 LearningRate 0.0001 Epoch: 29 Global Step: 613970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:50,142-Speed 6316.44 samples/sec Loss 3.3794 LearningRate 0.0001 Epoch: 29 Global Step: 613980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:53,388-Speed 6309.31 samples/sec Loss 3.4938 LearningRate 0.0001 Epoch: 29 Global Step: 613990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:34:56,649-Speed 6282.36 samples/sec Loss 3.4016 LearningRate 0.0001 Epoch: 29 Global Step: 614000 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:34:59,891-Speed 6319.82 samples/sec Loss 3.4938 LearningRate 0.0001 Epoch: 29 Global Step: 614010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:03,139-Speed 6306.45 samples/sec Loss 3.4568 LearningRate 0.0001 Epoch: 29 Global Step: 614020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:06,385-Speed 6309.57 samples/sec Loss 3.4872 LearningRate 0.0001 Epoch: 29 Global Step: 614030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:09,632-Speed 6310.06 samples/sec Loss 3.5012 LearningRate 0.0001 Epoch: 29 Global Step: 614040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:12,873-Speed 6318.62 samples/sec Loss 3.5075 LearningRate 0.0001 Epoch: 29 Global Step: 614050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:16,119-Speed 6311.83 samples/sec Loss 3.5275 LearningRate 0.0001 Epoch: 29 Global Step: 614060 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:19,364-Speed 6312.22 samples/sec Loss 3.4868 LearningRate 0.0001 Epoch: 29 Global Step: 614070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:35:22,610-Speed 6312.26 samples/sec Loss 3.4982 LearningRate 0.0001 Epoch: 29 Global Step: 614080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:25,854-Speed 6315.49 samples/sec Loss 3.4922 LearningRate 0.0001 Epoch: 29 Global Step: 614090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:29,098-Speed 6313.44 samples/sec Loss 3.4937 LearningRate 0.0001 Epoch: 29 Global Step: 614100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:32,331-Speed 6336.12 samples/sec Loss 3.5215 LearningRate 0.0001 Epoch: 29 Global Step: 614110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:35,575-Speed 6314.91 samples/sec Loss 3.4864 LearningRate 0.0001 Epoch: 29 Global Step: 614120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:38,817-Speed 6319.01 samples/sec Loss 3.4449 LearningRate 0.0001 Epoch: 29 Global Step: 614130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:42,063-Speed 6310.48 samples/sec Loss 3.4645 LearningRate 0.0001 Epoch: 29 Global Step: 614140 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:45,308-Speed 6311.67 samples/sec Loss 3.4658 LearningRate 0.0001 Epoch: 29 Global Step: 614150 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:48,553-Speed 6314.16 samples/sec Loss 3.4488 LearningRate 0.0001 Epoch: 29 Global Step: 614160 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:51,801-Speed 6307.05 samples/sec Loss 3.4321 LearningRate 0.0001 Epoch: 29 Global Step: 614170 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:55,046-Speed 6311.90 samples/sec Loss 3.4337 LearningRate 0.0001 Epoch: 29 Global Step: 614180 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:35:58,292-Speed 6311.33 samples/sec Loss 3.4231 LearningRate 0.0001 Epoch: 29 Global Step: 614190 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:01,535-Speed 6315.40 samples/sec Loss 
3.4559 LearningRate 0.0001 Epoch: 29 Global Step: 614200 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:04,768-Speed 6336.15 samples/sec Loss 3.5177 LearningRate 0.0001 Epoch: 29 Global Step: 614210 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:08,012-Speed 6315.95 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 614220 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:11,257-Speed 6312.01 samples/sec Loss 3.4682 LearningRate 0.0001 Epoch: 29 Global Step: 614230 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:14,503-Speed 6310.85 samples/sec Loss 3.4835 LearningRate 0.0001 Epoch: 29 Global Step: 614240 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:17,755-Speed 6298.78 samples/sec Loss 3.4768 LearningRate 0.0001 Epoch: 29 Global Step: 614250 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:21,002-Speed 6309.53 samples/sec Loss 3.5054 LearningRate 0.0001 Epoch: 29 Global Step: 614260 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:24,246-Speed 6313.13 samples/sec Loss 3.4858 LearningRate 0.0001 Epoch: 29 Global Step: 614270 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:27,499-Speed 6297.70 samples/sec Loss 3.4765 LearningRate 0.0001 Epoch: 29 Global Step: 614280 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:30,790-Speed 6223.65 samples/sec Loss 3.5106 LearningRate 0.0001 Epoch: 29 Global Step: 614290 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:34,034-Speed 6315.27 samples/sec Loss 3.4708 LearningRate 0.0001 Epoch: 29 Global Step: 614300 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:37,281-Speed 6309.25 samples/sec Loss 3.4676 LearningRate 0.0001 Epoch: 29 Global Step: 614310 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:36:40,513-Speed 6338.76 samples/sec Loss 3.4239 LearningRate 0.0001 Epoch: 29 
Global Step: 614320 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:43,758-Speed 6313.20 samples/sec Loss 3.4972 LearningRate 0.0001 Epoch: 29 Global Step: 614330 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:47,008-Speed 6302.70 samples/sec Loss 3.5194 LearningRate 0.0001 Epoch: 29 Global Step: 614340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:50,286-Speed 6248.93 samples/sec Loss 3.4429 LearningRate 0.0001 Epoch: 29 Global Step: 614350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:53,569-Speed 6238.72 samples/sec Loss 3.5236 LearningRate 0.0001 Epoch: 29 Global Step: 614360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:36:56,823-Speed 6296.42 samples/sec Loss 3.4867 LearningRate 0.0001 Epoch: 29 Global Step: 614370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:00,074-Speed 6300.33 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 614380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:03,318-Speed 6315.74 samples/sec Loss 3.5015 LearningRate 0.0001 Epoch: 29 Global Step: 614390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:06,560-Speed 6317.93 samples/sec Loss 3.4430 LearningRate 0.0001 Epoch: 29 Global Step: 614400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:09,822-Speed 6279.65 samples/sec Loss 3.4597 LearningRate 0.0001 Epoch: 29 Global Step: 614410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:13,068-Speed 6310.44 samples/sec Loss 3.4813 LearningRate 0.0001 Epoch: 29 Global Step: 614420 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:37:16,315-Speed 6308.48 samples/sec Loss 3.4529 LearningRate 0.0001 Epoch: 29 Global Step: 614430 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:37:19,546-Speed 6340.53 samples/sec Loss 3.4542 LearningRate 0.0001 Epoch: 29 Global Step: 614440 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:37:22,805-Speed 6285.63 samples/sec Loss 3.4599 LearningRate 0.0001 Epoch: 29 Global Step: 614450 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:26,095-Speed 6226.34 samples/sec Loss 3.4140 LearningRate 0.0001 Epoch: 29 Global Step: 614460 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:29,338-Speed 6315.52 samples/sec Loss 3.4407 LearningRate 0.0001 Epoch: 29 Global Step: 614470 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:32,585-Speed 6309.52 samples/sec Loss 3.4524 LearningRate 0.0001 Epoch: 29 Global Step: 614480 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:35,841-Speed 6291.66 samples/sec Loss 3.4597 LearningRate 0.0001 Epoch: 29 Global Step: 614490 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:39,101-Speed 6282.43 samples/sec Loss 3.4500 LearningRate 0.0001 Epoch: 29 Global Step: 614500 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:42,352-Speed 6302.06 samples/sec Loss 3.4297 LearningRate 0.0001 Epoch: 29 Global Step: 614510 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:45,612-Speed 6282.64 samples/sec Loss 3.4441 LearningRate 0.0001 Epoch: 29 Global Step: 614520 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:48,857-Speed 6313.00 samples/sec Loss 3.4053 LearningRate 0.0001 Epoch: 29 Global Step: 614530 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:52,085-Speed 6346.15 samples/sec Loss 3.4507 LearningRate 0.0001 Epoch: 29 Global Step: 614540 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:55,333-Speed 6307.29 samples/sec Loss 3.4564 LearningRate 0.0001 Epoch: 29 Global Step: 614550 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:37:58,576-Speed 6316.28 samples/sec Loss 3.5079 LearningRate 0.0001 Epoch: 29 Global Step: 614560 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:38:01,820-Speed 6314.92 samples/sec Loss 3.4557 LearningRate 0.0001 Epoch: 29 Global Step: 614570 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:05,174-Speed 6108.34 samples/sec Loss 3.4608 LearningRate 0.0001 Epoch: 29 Global Step: 614580 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:08,439-Speed 6272.82 samples/sec Loss 3.4520 LearningRate 0.0001 Epoch: 29 Global Step: 614590 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:11,686-Speed 6309.93 samples/sec Loss 3.4361 LearningRate 0.0001 Epoch: 29 Global Step: 614600 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:14,926-Speed 6321.84 samples/sec Loss 3.5200 LearningRate 0.0001 Epoch: 29 Global Step: 614610 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:18,171-Speed 6313.08 samples/sec Loss 3.4099 LearningRate 0.0001 Epoch: 29 Global Step: 614620 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:21,418-Speed 6308.69 samples/sec Loss 3.4750 LearningRate 0.0001 Epoch: 29 Global Step: 614630 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:24,665-Speed 6307.76 samples/sec Loss 3.4900 LearningRate 0.0001 Epoch: 29 Global Step: 614640 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:38:27,896-Speed 6339.54 samples/sec Loss 3.4563 LearningRate 0.0001 Epoch: 29 Global Step: 614650 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:31,141-Speed 6313.83 samples/sec Loss 3.4879 LearningRate 0.0001 Epoch: 29 Global Step: 614660 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:34,387-Speed 6310.80 samples/sec Loss 3.4592 LearningRate 0.0001 Epoch: 29 Global Step: 614670 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:37,633-Speed 6310.67 samples/sec Loss 3.4585 LearningRate 0.0001 Epoch: 29 Global Step: 614680 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:40,879-Speed 6310.35 samples/sec 
Loss 3.4993 LearningRate 0.0001 Epoch: 29 Global Step: 614690 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:44,125-Speed 6311.58 samples/sec Loss 3.4140 LearningRate 0.0001 Epoch: 29 Global Step: 614700 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:47,370-Speed 6311.65 samples/sec Loss 3.4604 LearningRate 0.0001 Epoch: 29 Global Step: 614710 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:50,617-Speed 6309.04 samples/sec Loss 3.4497 LearningRate 0.0001 Epoch: 29 Global Step: 614720 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:53,869-Speed 6298.57 samples/sec Loss 3.4716 LearningRate 0.0001 Epoch: 29 Global Step: 614730 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:38:57,114-Speed 6313.28 samples/sec Loss 3.4176 LearningRate 0.0001 Epoch: 29 Global Step: 614740 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:00,346-Speed 6338.31 samples/sec Loss 3.4371 LearningRate 0.0001 Epoch: 29 Global Step: 614750 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:03,594-Speed 6307.82 samples/sec Loss 3.4129 LearningRate 0.0001 Epoch: 29 Global Step: 614760 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:06,838-Speed 6314.55 samples/sec Loss 3.4877 LearningRate 0.0001 Epoch: 29 Global Step: 614770 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:10,081-Speed 6316.14 samples/sec Loss 3.4384 LearningRate 0.0001 Epoch: 29 Global Step: 614780 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:13,324-Speed 6316.34 samples/sec Loss 3.4904 LearningRate 0.0001 Epoch: 29 Global Step: 614790 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:16,567-Speed 6317.80 samples/sec Loss 3.4603 LearningRate 0.0001 Epoch: 29 Global Step: 614800 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:19,814-Speed 6308.25 samples/sec Loss 3.4954 LearningRate 0.0001 Epoch: 29 
Global Step: 614810 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:23,060-Speed 6311.37 samples/sec Loss 3.4802 LearningRate 0.0001 Epoch: 29 Global Step: 614820 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:26,305-Speed 6311.34 samples/sec Loss 3.4579 LearningRate 0.0001 Epoch: 29 Global Step: 614830 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:29,552-Speed 6309.37 samples/sec Loss 3.4179 LearningRate 0.0001 Epoch: 29 Global Step: 614840 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:32,798-Speed 6309.94 samples/sec Loss 3.5250 LearningRate 0.0001 Epoch: 29 Global Step: 614850 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:39:36,027-Speed 6344.22 samples/sec Loss 3.5314 LearningRate 0.0001 Epoch: 29 Global Step: 614860 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:39,273-Speed 6312.20 samples/sec Loss 3.5074 LearningRate 0.0001 Epoch: 29 Global Step: 614870 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:42,520-Speed 6308.53 samples/sec Loss 3.4957 LearningRate 0.0001 Epoch: 29 Global Step: 614880 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:45,768-Speed 6306.65 samples/sec Loss 3.4406 LearningRate 0.0001 Epoch: 29 Global Step: 614890 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:49,013-Speed 6311.24 samples/sec Loss 3.4483 LearningRate 0.0001 Epoch: 29 Global Step: 614900 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:52,260-Speed 6309.48 samples/sec Loss 3.4778 LearningRate 0.0001 Epoch: 29 Global Step: 614910 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:55,527-Speed 6270.60 samples/sec Loss 3.4888 LearningRate 0.0001 Epoch: 29 Global Step: 614920 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:39:58,775-Speed 6306.08 samples/sec Loss 3.5031 LearningRate 0.0001 Epoch: 29 Global Step: 614930 Fp16 Grad Scale: 8192 
Required: 20 hours Training: 2022-04-02 23:40:02,046-Speed 6263.12 samples/sec Loss 3.4808 LearningRate 0.0001 Epoch: 29 Global Step: 614940 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:05,292-Speed 6310.57 samples/sec Loss 3.4332 LearningRate 0.0001 Epoch: 29 Global Step: 614950 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:08,527-Speed 6332.69 samples/sec Loss 3.4827 LearningRate 0.0001 Epoch: 29 Global Step: 614960 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:11,771-Speed 6313.71 samples/sec Loss 3.3732 LearningRate 0.0001 Epoch: 29 Global Step: 614970 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:15,015-Speed 6314.54 samples/sec Loss 3.4833 LearningRate 0.0001 Epoch: 29 Global Step: 614980 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:18,259-Speed 6315.44 samples/sec Loss 3.3967 LearningRate 0.0001 Epoch: 29 Global Step: 614990 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:21,508-Speed 6304.90 samples/sec Loss 3.4154 LearningRate 0.0001 Epoch: 29 Global Step: 615000 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:24,756-Speed 6307.82 samples/sec Loss 3.4534 LearningRate 0.0001 Epoch: 29 Global Step: 615010 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:28,002-Speed 6309.21 samples/sec Loss 3.5171 LearningRate 0.0001 Epoch: 29 Global Step: 615020 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:31,245-Speed 6317.88 samples/sec Loss 3.4648 LearningRate 0.0001 Epoch: 29 Global Step: 615030 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:34,523-Speed 6249.18 samples/sec Loss 3.4247 LearningRate 0.0001 Epoch: 29 Global Step: 615040 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:37,771-Speed 6306.82 samples/sec Loss 3.4632 LearningRate 0.0001 Epoch: 29 Global Step: 615050 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 
23:40:41,027-Speed 6291.75 samples/sec Loss 3.4507 LearningRate 0.0001 Epoch: 29 Global Step: 615060 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-02 23:40:44,254-Speed 6346.17 samples/sec Loss 3.4992 LearningRate 0.0001 Epoch: 29 Global Step: 615070 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:47,501-Speed 6308.83 samples/sec Loss 3.5191 LearningRate 0.0001 Epoch: 29 Global Step: 615080 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:50,747-Speed 6312.35 samples/sec Loss 3.4539 LearningRate 0.0001 Epoch: 29 Global Step: 615090 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:53,993-Speed 6308.90 samples/sec Loss 3.4612 LearningRate 0.0001 Epoch: 29 Global Step: 615100 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:40:57,239-Speed 6310.86 samples/sec Loss 3.4740 LearningRate 0.0001 Epoch: 29 Global Step: 615110 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:41:00,487-Speed 6306.80 samples/sec Loss 3.4557 LearningRate 0.0001 Epoch: 29 Global Step: 615120 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:41:03,733-Speed 6310.82 samples/sec Loss 3.4480 LearningRate 0.0001 Epoch: 29 Global Step: 615130 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-04-02 23:41:06,981-Speed 6307.94 samples/sec Loss 3.5120 LearningRate 0.0001 Epoch: 29 Global Step: 615140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:10,231-Speed 6302.05 samples/sec Loss 3.4484 LearningRate 0.0001 Epoch: 29 Global Step: 615150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:13,477-Speed 6310.72 samples/sec Loss 3.4200 LearningRate 0.0001 Epoch: 29 Global Step: 615160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:16,721-Speed 6315.83 samples/sec Loss 3.3391 LearningRate 0.0001 Epoch: 29 Global Step: 615170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:41:19,950-Speed 6343.00 samples/sec 
Loss 3.4629 LearningRate 0.0001 Epoch: 29 Global Step: 615180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:23,196-Speed 6311.50 samples/sec Loss 3.4660 LearningRate 0.0001 Epoch: 29 Global Step: 615190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:26,437-Speed 6319.06 samples/sec Loss 3.4120 LearningRate 0.0001 Epoch: 29 Global Step: 615200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:29,684-Speed 6308.72 samples/sec Loss 3.4318 LearningRate 0.0001 Epoch: 29 Global Step: 615210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:32,933-Speed 6307.02 samples/sec Loss 3.4791 LearningRate 0.0001 Epoch: 29 Global Step: 615220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:36,177-Speed 6314.20 samples/sec Loss 3.4902 LearningRate 0.0001 Epoch: 29 Global Step: 615230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:39,422-Speed 6312.25 samples/sec Loss 3.4261 LearningRate 0.0001 Epoch: 29 Global Step: 615240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:42,680-Speed 6288.42 samples/sec Loss 3.3988 LearningRate 0.0001 Epoch: 29 Global Step: 615250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:45,922-Speed 6317.29 samples/sec Loss 3.4207 LearningRate 0.0001 Epoch: 29 Global Step: 615260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:49,180-Speed 6288.34 samples/sec Loss 3.4528 LearningRate 0.0001 Epoch: 29 Global Step: 615270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:52,414-Speed 6334.32 samples/sec Loss 3.4306 LearningRate 0.0001 Epoch: 29 Global Step: 615280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:55,659-Speed 6312.77 samples/sec Loss 3.4704 LearningRate 0.0001 Epoch: 29 Global Step: 615290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:41:58,909-Speed 6301.85 samples/sec Loss 3.4644 LearningRate 0.0001 Epoch: 29 
Global Step: 615300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:02,152-Speed 6316.43 samples/sec Loss 3.4684 LearningRate 0.0001 Epoch: 29 Global Step: 615310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:05,396-Speed 6314.80 samples/sec Loss 3.4321 LearningRate 0.0001 Epoch: 29 Global Step: 615320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:08,652-Speed 6291.15 samples/sec Loss 3.5100 LearningRate 0.0001 Epoch: 29 Global Step: 615330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:11,923-Speed 6262.01 samples/sec Loss 3.4737 LearningRate 0.0001 Epoch: 29 Global Step: 615340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:15,168-Speed 6313.52 samples/sec Loss 3.4344 LearningRate 0.0001 Epoch: 29 Global Step: 615350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:18,430-Speed 6279.35 samples/sec Loss 3.4915 LearningRate 0.0001 Epoch: 29 Global Step: 615360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:21,675-Speed 6313.64 samples/sec Loss 3.4268 LearningRate 0.0001 Epoch: 29 Global Step: 615370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:24,922-Speed 6307.14 samples/sec Loss 3.4553 LearningRate 0.0001 Epoch: 29 Global Step: 615380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:42:28,155-Speed 6337.46 samples/sec Loss 3.4397 LearningRate 0.0001 Epoch: 29 Global Step: 615390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:31,395-Speed 6322.86 samples/sec Loss 3.4735 LearningRate 0.0001 Epoch: 29 Global Step: 615400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:34,645-Speed 6301.97 samples/sec Loss 3.4513 LearningRate 0.0001 Epoch: 29 Global Step: 615410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:37,888-Speed 6317.32 samples/sec Loss 3.4631 LearningRate 0.0001 Epoch: 29 Global Step: 615420 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:42:41,130-Speed 6318.06 samples/sec Loss 3.4381 LearningRate 0.0001 Epoch: 29 Global Step: 615430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:44,378-Speed 6307.08 samples/sec Loss 3.3685 LearningRate 0.0001 Epoch: 29 Global Step: 615440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:47,625-Speed 6309.37 samples/sec Loss 3.4611 LearningRate 0.0001 Epoch: 29 Global Step: 615450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:50,870-Speed 6312.24 samples/sec Loss 3.4451 LearningRate 0.0001 Epoch: 29 Global Step: 615460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:54,113-Speed 6315.92 samples/sec Loss 3.4683 LearningRate 0.0001 Epoch: 29 Global Step: 615470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:42:57,355-Speed 6319.31 samples/sec Loss 3.4266 LearningRate 0.0001 Epoch: 29 Global Step: 615480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:00,585-Speed 6341.58 samples/sec Loss 3.4846 LearningRate 0.0001 Epoch: 29 Global Step: 615490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:03,828-Speed 6316.69 samples/sec Loss 3.4810 LearningRate 0.0001 Epoch: 29 Global Step: 615500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:07,070-Speed 6318.09 samples/sec Loss 3.4531 LearningRate 0.0001 Epoch: 29 Global Step: 615510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:10,318-Speed 6306.72 samples/sec Loss 3.4548 LearningRate 0.0001 Epoch: 29 Global Step: 615520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:13,562-Speed 6315.88 samples/sec Loss 3.4779 LearningRate 0.0001 Epoch: 29 Global Step: 615530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:16,809-Speed 6308.59 samples/sec Loss 3.4665 LearningRate 0.0001 Epoch: 29 Global Step: 615540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:43:20,058-Speed 6304.49 samples/sec Loss 3.4904 LearningRate 0.0001 Epoch: 29 Global Step: 615550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:23,304-Speed 6310.28 samples/sec Loss 3.4831 LearningRate 0.0001 Epoch: 29 Global Step: 615560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:26,549-Speed 6313.49 samples/sec Loss 3.4840 LearningRate 0.0001 Epoch: 29 Global Step: 615570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:29,796-Speed 6308.84 samples/sec Loss 3.4775 LearningRate 0.0001 Epoch: 29 Global Step: 615580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:33,040-Speed 6313.66 samples/sec Loss 3.4373 LearningRate 0.0001 Epoch: 29 Global Step: 615590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:43:36,270-Speed 6342.72 samples/sec Loss 3.5295 LearningRate 0.0001 Epoch: 29 Global Step: 615600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:39,515-Speed 6311.56 samples/sec Loss 3.4835 LearningRate 0.0001 Epoch: 29 Global Step: 615610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:42,756-Speed 6320.41 samples/sec Loss 3.4224 LearningRate 0.0001 Epoch: 29 Global Step: 615620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:46,002-Speed 6311.84 samples/sec Loss 3.5104 LearningRate 0.0001 Epoch: 29 Global Step: 615630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:49,252-Speed 6302.66 samples/sec Loss 3.4362 LearningRate 0.0001 Epoch: 29 Global Step: 615640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:52,498-Speed 6310.14 samples/sec Loss 3.4558 LearningRate 0.0001 Epoch: 29 Global Step: 615650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:55,741-Speed 6317.53 samples/sec Loss 3.4602 LearningRate 0.0001 Epoch: 29 Global Step: 615660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:43:58,986-Speed 6311.77 samples/sec 
Loss 3.4308 LearningRate 0.0001 Epoch: 29 Global Step: 615670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:02,234-Speed 6308.22 samples/sec Loss 3.4338 LearningRate 0.0001 Epoch: 29 Global Step: 615680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:05,474-Speed 6321.66 samples/sec Loss 3.4545 LearningRate 0.0001 Epoch: 29 Global Step: 615690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:08,724-Speed 6303.95 samples/sec Loss 3.4594 LearningRate 0.0001 Epoch: 29 Global Step: 615700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:44:11,953-Speed 6343.13 samples/sec Loss 3.4392 LearningRate 0.0001 Epoch: 29 Global Step: 615710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:15,196-Speed 6317.20 samples/sec Loss 3.4749 LearningRate 0.0001 Epoch: 29 Global Step: 615720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:18,442-Speed 6310.42 samples/sec Loss 3.4244 LearningRate 0.0001 Epoch: 29 Global Step: 615730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:21,682-Speed 6321.53 samples/sec Loss 3.4113 LearningRate 0.0001 Epoch: 29 Global Step: 615740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:24,927-Speed 6312.36 samples/sec Loss 3.3618 LearningRate 0.0001 Epoch: 29 Global Step: 615750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:28,172-Speed 6313.21 samples/sec Loss 3.4643 LearningRate 0.0001 Epoch: 29 Global Step: 615760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:31,418-Speed 6309.80 samples/sec Loss 3.4980 LearningRate 0.0001 Epoch: 29 Global Step: 615770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:34,665-Speed 6308.70 samples/sec Loss 3.4256 LearningRate 0.0001 Epoch: 29 Global Step: 615780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:37,909-Speed 6315.17 samples/sec Loss 3.5151 LearningRate 0.0001 Epoch: 29 
Global Step: 615790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:41,150-Speed 6320.82 samples/sec Loss 3.4668 LearningRate 0.0001 Epoch: 29 Global Step: 615800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:44,381-Speed 6339.74 samples/sec Loss 3.4556 LearningRate 0.0001 Epoch: 29 Global Step: 615810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:47,628-Speed 6309.44 samples/sec Loss 3.4370 LearningRate 0.0001 Epoch: 29 Global Step: 615820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:50,875-Speed 6309.22 samples/sec Loss 3.4457 LearningRate 0.0001 Epoch: 29 Global Step: 615830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:54,119-Speed 6312.82 samples/sec Loss 3.4164 LearningRate 0.0001 Epoch: 29 Global Step: 615840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:44:57,366-Speed 6309.10 samples/sec Loss 3.4640 LearningRate 0.0001 Epoch: 29 Global Step: 615850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:00,626-Speed 6285.11 samples/sec Loss 3.4945 LearningRate 0.0001 Epoch: 29 Global Step: 615860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:03,874-Speed 6307.13 samples/sec Loss 3.4616 LearningRate 0.0001 Epoch: 29 Global Step: 615870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:07,120-Speed 6311.33 samples/sec Loss 3.4415 LearningRate 0.0001 Epoch: 29 Global Step: 615880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:10,361-Speed 6320.36 samples/sec Loss 3.4073 LearningRate 0.0001 Epoch: 29 Global Step: 615890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:13,605-Speed 6314.78 samples/sec Loss 3.4074 LearningRate 0.0001 Epoch: 29 Global Step: 615900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:16,846-Speed 6319.97 samples/sec Loss 3.5040 LearningRate 0.0001 Epoch: 29 Global Step: 615910 Fp16 Grad Scale: 16384 
Required: 19 hours Training: 2022-04-02 23:45:20,079-Speed 6335.98 samples/sec Loss 3.4800 LearningRate 0.0001 Epoch: 29 Global Step: 615920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:23,327-Speed 6305.76 samples/sec Loss 3.4871 LearningRate 0.0001 Epoch: 29 Global Step: 615930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:26,575-Speed 6306.86 samples/sec Loss 3.4389 LearningRate 0.0001 Epoch: 29 Global Step: 615940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:29,819-Speed 6315.64 samples/sec Loss 3.4390 LearningRate 0.0001 Epoch: 29 Global Step: 615950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:33,061-Speed 6318.21 samples/sec Loss 3.4291 LearningRate 0.0001 Epoch: 29 Global Step: 615960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:36,306-Speed 6312.39 samples/sec Loss 3.4601 LearningRate 0.0001 Epoch: 29 Global Step: 615970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:39,548-Speed 6318.63 samples/sec Loss 3.4562 LearningRate 0.0001 Epoch: 29 Global Step: 615980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:42,791-Speed 6316.64 samples/sec Loss 3.3967 LearningRate 0.0001 Epoch: 29 Global Step: 615990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:46,037-Speed 6310.32 samples/sec Loss 3.4525 LearningRate 0.0001 Epoch: 29 Global Step: 616000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:49,285-Speed 6306.70 samples/sec Loss 3.4358 LearningRate 0.0001 Epoch: 29 Global Step: 616010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:52,519-Speed 6333.43 samples/sec Loss 3.4509 LearningRate 0.0001 Epoch: 29 Global Step: 616020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:45:55,765-Speed 6312.74 samples/sec Loss 3.4356 LearningRate 0.0001 Epoch: 29 Global Step: 616030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:45:59,012-Speed 6308.33 samples/sec Loss 3.4736 LearningRate 0.0001 Epoch: 29 Global Step: 616040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:02,255-Speed 6314.78 samples/sec Loss 3.3823 LearningRate 0.0001 Epoch: 29 Global Step: 616050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:05,500-Speed 6313.49 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 29 Global Step: 616060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:08,749-Speed 6305.16 samples/sec Loss 3.4407 LearningRate 0.0001 Epoch: 29 Global Step: 616070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:11,994-Speed 6312.31 samples/sec Loss 3.4465 LearningRate 0.0001 Epoch: 29 Global Step: 616080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:15,244-Speed 6303.61 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 29 Global Step: 616090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:18,486-Speed 6318.33 samples/sec Loss 3.4688 LearningRate 0.0001 Epoch: 29 Global Step: 616100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:21,734-Speed 6306.56 samples/sec Loss 3.4729 LearningRate 0.0001 Epoch: 29 Global Step: 616110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:24,967-Speed 6336.89 samples/sec Loss 3.4707 LearningRate 0.0001 Epoch: 29 Global Step: 616120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:28,212-Speed 6312.99 samples/sec Loss 3.4523 LearningRate 0.0001 Epoch: 29 Global Step: 616130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:31,460-Speed 6305.56 samples/sec Loss 3.4697 LearningRate 0.0001 Epoch: 29 Global Step: 616140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:34,704-Speed 6315.18 samples/sec Loss 3.4962 LearningRate 0.0001 Epoch: 29 Global Step: 616150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:37,954-Speed 6304.00 samples/sec Loss 
3.5052 LearningRate 0.0001 Epoch: 29 Global Step: 616160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:41,201-Speed 6308.63 samples/sec Loss 3.4427 LearningRate 0.0001 Epoch: 29 Global Step: 616170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:44,448-Speed 6307.52 samples/sec Loss 3.4334 LearningRate 0.0001 Epoch: 29 Global Step: 616180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:47,700-Speed 6299.45 samples/sec Loss 3.4452 LearningRate 0.0001 Epoch: 29 Global Step: 616190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:50,946-Speed 6311.41 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 29 Global Step: 616200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:54,190-Speed 6313.13 samples/sec Loss 3.4490 LearningRate 0.0001 Epoch: 29 Global Step: 616210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:46:57,436-Speed 6311.06 samples/sec Loss 3.4032 LearningRate 0.0001 Epoch: 29 Global Step: 616220 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:47:00,668-Speed 6338.14 samples/sec Loss 3.4037 LearningRate 0.0001 Epoch: 29 Global Step: 616230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:03,913-Speed 6313.39 samples/sec Loss 3.4315 LearningRate 0.0001 Epoch: 29 Global Step: 616240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:07,158-Speed 6311.75 samples/sec Loss 3.4803 LearningRate 0.0001 Epoch: 29 Global Step: 616250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:10,408-Speed 6304.25 samples/sec Loss 3.4782 LearningRate 0.0001 Epoch: 29 Global Step: 616260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:13,651-Speed 6315.84 samples/sec Loss 3.4906 LearningRate 0.0001 Epoch: 29 Global Step: 616270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:16,898-Speed 6308.44 samples/sec Loss 3.4923 LearningRate 0.0001 Epoch: 29 
Global Step: 616280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:20,151-Speed 6298.24 samples/sec Loss 3.4548 LearningRate 0.0001 Epoch: 29 Global Step: 616290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:23,395-Speed 6314.26 samples/sec Loss 3.4695 LearningRate 0.0001 Epoch: 29 Global Step: 616300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:26,643-Speed 6307.77 samples/sec Loss 3.4925 LearningRate 0.0001 Epoch: 29 Global Step: 616310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:29,888-Speed 6312.32 samples/sec Loss 3.4261 LearningRate 0.0001 Epoch: 29 Global Step: 616320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:33,142-Speed 6294.98 samples/sec Loss 3.4126 LearningRate 0.0001 Epoch: 29 Global Step: 616330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:47:36,373-Speed 6340.62 samples/sec Loss 3.4729 LearningRate 0.0001 Epoch: 29 Global Step: 616340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:39,622-Speed 6304.59 samples/sec Loss 3.5205 LearningRate 0.0001 Epoch: 29 Global Step: 616350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:42,871-Speed 6304.83 samples/sec Loss 3.4250 LearningRate 0.0001 Epoch: 29 Global Step: 616360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:46,120-Speed 6305.36 samples/sec Loss 3.4783 LearningRate 0.0001 Epoch: 29 Global Step: 616370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:49,372-Speed 6300.07 samples/sec Loss 3.4724 LearningRate 0.0001 Epoch: 29 Global Step: 616380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:52,616-Speed 6313.79 samples/sec Loss 3.4694 LearningRate 0.0001 Epoch: 29 Global Step: 616390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:47:55,864-Speed 6305.90 samples/sec Loss 3.5089 LearningRate 0.0001 Epoch: 29 Global Step: 616400 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:47:59,110-Speed 6311.44 samples/sec Loss 3.4268 LearningRate 0.0001 Epoch: 29 Global Step: 616410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:02,359-Speed 6305.41 samples/sec Loss 3.4417 LearningRate 0.0001 Epoch: 29 Global Step: 616420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:05,606-Speed 6308.76 samples/sec Loss 3.4445 LearningRate 0.0001 Epoch: 29 Global Step: 616430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:08,849-Speed 6317.31 samples/sec Loss 3.4181 LearningRate 0.0001 Epoch: 29 Global Step: 616440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:48:12,083-Speed 6333.65 samples/sec Loss 3.4408 LearningRate 0.0001 Epoch: 29 Global Step: 616450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:15,330-Speed 6308.46 samples/sec Loss 3.4275 LearningRate 0.0001 Epoch: 29 Global Step: 616460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:18,578-Speed 6306.67 samples/sec Loss 3.4688 LearningRate 0.0001 Epoch: 29 Global Step: 616470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:21,825-Speed 6308.92 samples/sec Loss 3.4266 LearningRate 0.0001 Epoch: 29 Global Step: 616480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:25,070-Speed 6312.60 samples/sec Loss 3.4590 LearningRate 0.0001 Epoch: 29 Global Step: 616490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:28,317-Speed 6308.59 samples/sec Loss 3.4765 LearningRate 0.0001 Epoch: 29 Global Step: 616500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:31,561-Speed 6313.93 samples/sec Loss 3.4690 LearningRate 0.0001 Epoch: 29 Global Step: 616510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:34,807-Speed 6311.56 samples/sec Loss 3.4981 LearningRate 0.0001 Epoch: 29 Global Step: 616520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:48:38,053-Speed 6310.86 samples/sec Loss 3.4913 LearningRate 0.0001 Epoch: 29 Global Step: 616530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:41,302-Speed 6304.88 samples/sec Loss 3.4463 LearningRate 0.0001 Epoch: 29 Global Step: 616540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:44,546-Speed 6316.26 samples/sec Loss 3.5120 LearningRate 0.0001 Epoch: 29 Global Step: 616550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:48:47,775-Speed 6343.31 samples/sec Loss 3.4538 LearningRate 0.0001 Epoch: 29 Global Step: 616560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:51,025-Speed 6302.23 samples/sec Loss 3.4488 LearningRate 0.0001 Epoch: 29 Global Step: 616570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:54,275-Speed 6303.01 samples/sec Loss 3.4692 LearningRate 0.0001 Epoch: 29 Global Step: 616580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:48:57,521-Speed 6311.50 samples/sec Loss 3.4807 LearningRate 0.0001 Epoch: 29 Global Step: 616590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:00,780-Speed 6285.65 samples/sec Loss 3.4699 LearningRate 0.0001 Epoch: 29 Global Step: 616600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:04,047-Speed 6269.47 samples/sec Loss 3.4487 LearningRate 0.0001 Epoch: 29 Global Step: 616610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:07,292-Speed 6312.37 samples/sec Loss 3.4691 LearningRate 0.0001 Epoch: 29 Global Step: 616620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:10,537-Speed 6312.79 samples/sec Loss 3.5094 LearningRate 0.0001 Epoch: 29 Global Step: 616630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:13,785-Speed 6307.45 samples/sec Loss 3.4764 LearningRate 0.0001 Epoch: 29 Global Step: 616640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:17,027-Speed 6318.86 samples/sec 
Loss 3.4620 LearningRate 0.0001 Epoch: 29 Global Step: 616650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:20,257-Speed 6341.30 samples/sec Loss 3.5259 LearningRate 0.0001 Epoch: 29 Global Step: 616660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:23,502-Speed 6313.38 samples/sec Loss 3.3888 LearningRate 0.0001 Epoch: 29 Global Step: 616670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:26,743-Speed 6320.56 samples/sec Loss 3.4694 LearningRate 0.0001 Epoch: 29 Global Step: 616680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:29,989-Speed 6309.94 samples/sec Loss 3.4471 LearningRate 0.0001 Epoch: 29 Global Step: 616690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:33,231-Speed 6318.69 samples/sec Loss 3.3975 LearningRate 0.0001 Epoch: 29 Global Step: 616700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:36,482-Speed 6301.58 samples/sec Loss 3.4439 LearningRate 0.0001 Epoch: 29 Global Step: 616710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:39,725-Speed 6315.24 samples/sec Loss 3.4305 LearningRate 0.0001 Epoch: 29 Global Step: 616720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:42,966-Speed 6320.98 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 29 Global Step: 616730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:46,210-Speed 6313.90 samples/sec Loss 3.4272 LearningRate 0.0001 Epoch: 29 Global Step: 616740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:49,453-Speed 6316.65 samples/sec Loss 3.4025 LearningRate 0.0001 Epoch: 29 Global Step: 616750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:52,686-Speed 6337.77 samples/sec Loss 3.3575 LearningRate 0.0001 Epoch: 29 Global Step: 616760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:55,929-Speed 6315.69 samples/sec Loss 3.4029 LearningRate 0.0001 Epoch: 29 
Global Step: 616770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:49:59,175-Speed 6311.33 samples/sec Loss 3.4628 LearningRate 0.0001 Epoch: 29 Global Step: 616780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:02,422-Speed 6309.38 samples/sec Loss 3.4713 LearningRate 0.0001 Epoch: 29 Global Step: 616790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:05,666-Speed 6314.27 samples/sec Loss 3.4401 LearningRate 0.0001 Epoch: 29 Global Step: 616800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:08,916-Speed 6302.96 samples/sec Loss 3.4217 LearningRate 0.0001 Epoch: 29 Global Step: 616810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:12,163-Speed 6309.26 samples/sec Loss 3.4703 LearningRate 0.0001 Epoch: 29 Global Step: 616820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:15,413-Speed 6303.57 samples/sec Loss 3.4445 LearningRate 0.0001 Epoch: 29 Global Step: 616830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:18,662-Speed 6303.44 samples/sec Loss 3.4820 LearningRate 0.0001 Epoch: 29 Global Step: 616840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:21,908-Speed 6311.64 samples/sec Loss 3.4584 LearningRate 0.0001 Epoch: 29 Global Step: 616850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:25,157-Speed 6303.66 samples/sec Loss 3.4939 LearningRate 0.0001 Epoch: 29 Global Step: 616860 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:50:28,385-Speed 6347.03 samples/sec Loss 3.4300 LearningRate 0.0001 Epoch: 29 Global Step: 616870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:31,630-Speed 6312.48 samples/sec Loss 3.4103 LearningRate 0.0001 Epoch: 29 Global Step: 616880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:34,873-Speed 6315.76 samples/sec Loss 3.4767 LearningRate 0.0001 Epoch: 29 Global Step: 616890 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:50:38,119-Speed 6311.53 samples/sec Loss 3.4701 LearningRate 0.0001 Epoch: 29 Global Step: 616900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:41,368-Speed 6303.87 samples/sec Loss 3.4919 LearningRate 0.0001 Epoch: 29 Global Step: 616910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:44,623-Speed 6293.99 samples/sec Loss 3.4445 LearningRate 0.0001 Epoch: 29 Global Step: 616920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:47,867-Speed 6314.49 samples/sec Loss 3.4267 LearningRate 0.0001 Epoch: 29 Global Step: 616930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:51,111-Speed 6315.67 samples/sec Loss 3.4571 LearningRate 0.0001 Epoch: 29 Global Step: 616940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:54,357-Speed 6310.25 samples/sec Loss 3.4403 LearningRate 0.0001 Epoch: 29 Global Step: 616950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:50:57,602-Speed 6312.40 samples/sec Loss 3.4725 LearningRate 0.0001 Epoch: 29 Global Step: 616960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:00,831-Speed 6342.64 samples/sec Loss 3.4193 LearningRate 0.0001 Epoch: 29 Global Step: 616970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:04,079-Speed 6308.41 samples/sec Loss 3.4335 LearningRate 0.0001 Epoch: 29 Global Step: 616980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:07,324-Speed 6312.49 samples/sec Loss 3.4508 LearningRate 0.0001 Epoch: 29 Global Step: 616990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:10,572-Speed 6307.64 samples/sec Loss 3.4555 LearningRate 0.0001 Epoch: 29 Global Step: 617000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:13,821-Speed 6305.09 samples/sec Loss 3.3984 LearningRate 0.0001 Epoch: 29 Global Step: 617010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:51:17,066-Speed 6312.55 samples/sec Loss 3.4150 LearningRate 0.0001 Epoch: 29 Global Step: 617020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:20,312-Speed 6310.31 samples/sec Loss 3.4325 LearningRate 0.0001 Epoch: 29 Global Step: 617030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:23,556-Speed 6314.55 samples/sec Loss 3.4272 LearningRate 0.0001 Epoch: 29 Global Step: 617040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:26,804-Speed 6307.09 samples/sec Loss 3.4101 LearningRate 0.0001 Epoch: 29 Global Step: 617050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:30,051-Speed 6309.17 samples/sec Loss 3.5092 LearningRate 0.0001 Epoch: 29 Global Step: 617060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:33,300-Speed 6305.29 samples/sec Loss 3.3570 LearningRate 0.0001 Epoch: 29 Global Step: 617070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:51:36,531-Speed 6339.89 samples/sec Loss 3.3795 LearningRate 0.0001 Epoch: 29 Global Step: 617080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:39,782-Speed 6300.90 samples/sec Loss 3.5123 LearningRate 0.0001 Epoch: 29 Global Step: 617090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:43,028-Speed 6311.18 samples/sec Loss 3.4175 LearningRate 0.0001 Epoch: 29 Global Step: 617100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:46,275-Speed 6308.12 samples/sec Loss 3.4607 LearningRate 0.0001 Epoch: 29 Global Step: 617110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:49,518-Speed 6316.47 samples/sec Loss 3.3767 LearningRate 0.0001 Epoch: 29 Global Step: 617120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:52,764-Speed 6311.18 samples/sec Loss 3.5192 LearningRate 0.0001 Epoch: 29 Global Step: 617130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:56,006-Speed 6317.56 samples/sec 
Loss 3.4568 LearningRate 0.0001 Epoch: 29 Global Step: 617140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:51:59,253-Speed 6308.25 samples/sec Loss 3.4133 LearningRate 0.0001 Epoch: 29 Global Step: 617150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:02,500-Speed 6310.20 samples/sec Loss 3.4269 LearningRate 0.0001 Epoch: 29 Global Step: 617160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:05,750-Speed 6303.45 samples/sec Loss 3.4569 LearningRate 0.0001 Epoch: 29 Global Step: 617170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:08,979-Speed 6342.56 samples/sec Loss 3.4275 LearningRate 0.0001 Epoch: 29 Global Step: 617180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:12,232-Speed 6298.72 samples/sec Loss 3.4465 LearningRate 0.0001 Epoch: 29 Global Step: 617190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:15,476-Speed 6313.22 samples/sec Loss 3.4051 LearningRate 0.0001 Epoch: 29 Global Step: 617200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:18,722-Speed 6312.00 samples/sec Loss 3.4686 LearningRate 0.0001 Epoch: 29 Global Step: 617210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:21,968-Speed 6311.85 samples/sec Loss 3.4267 LearningRate 0.0001 Epoch: 29 Global Step: 617220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:25,214-Speed 6309.22 samples/sec Loss 3.4492 LearningRate 0.0001 Epoch: 29 Global Step: 617230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:28,460-Speed 6310.57 samples/sec Loss 3.4771 LearningRate 0.0001 Epoch: 29 Global Step: 617240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:31,740-Speed 6247.14 samples/sec Loss 3.4787 LearningRate 0.0001 Epoch: 29 Global Step: 617250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:34,985-Speed 6312.01 samples/sec Loss 3.4656 LearningRate 0.0001 Epoch: 29 
Global Step: 617260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:38,236-Speed 6299.61 samples/sec Loss 3.4815 LearningRate 0.0001 Epoch: 29 Global Step: 617270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:41,479-Speed 6317.14 samples/sec Loss 3.4509 LearningRate 0.0001 Epoch: 29 Global Step: 617280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:52:44,713-Speed 6333.62 samples/sec Loss 3.4244 LearningRate 0.0001 Epoch: 29 Global Step: 617290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:47,975-Speed 6279.76 samples/sec Loss 3.4238 LearningRate 0.0001 Epoch: 29 Global Step: 617300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:51,224-Speed 6306.46 samples/sec Loss 3.4507 LearningRate 0.0001 Epoch: 29 Global Step: 617310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:54,471-Speed 6307.79 samples/sec Loss 3.4673 LearningRate 0.0001 Epoch: 29 Global Step: 617320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:52:57,717-Speed 6311.18 samples/sec Loss 3.3811 LearningRate 0.0001 Epoch: 29 Global Step: 617330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:00,964-Speed 6307.58 samples/sec Loss 3.5034 LearningRate 0.0001 Epoch: 29 Global Step: 617340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:04,213-Speed 6305.77 samples/sec Loss 3.3768 LearningRate 0.0001 Epoch: 29 Global Step: 617350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:07,459-Speed 6310.10 samples/sec Loss 3.4375 LearningRate 0.0001 Epoch: 29 Global Step: 617360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:10,707-Speed 6307.31 samples/sec Loss 3.4569 LearningRate 0.0001 Epoch: 29 Global Step: 617370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:13,952-Speed 6313.85 samples/sec Loss 3.4904 LearningRate 0.0001 Epoch: 29 Global Step: 617380 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:53:17,187-Speed 6331.59 samples/sec Loss 3.4596 LearningRate 0.0001 Epoch: 29 Global Step: 617390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:20,430-Speed 6316.92 samples/sec Loss 3.4522 LearningRate 0.0001 Epoch: 29 Global Step: 617400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:23,680-Speed 6302.57 samples/sec Loss 3.4595 LearningRate 0.0001 Epoch: 29 Global Step: 617410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:26,927-Speed 6308.65 samples/sec Loss 3.4812 LearningRate 0.0001 Epoch: 29 Global Step: 617420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:30,174-Speed 6309.12 samples/sec Loss 3.4486 LearningRate 0.0001 Epoch: 29 Global Step: 617430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:33,448-Speed 6257.85 samples/sec Loss 3.4488 LearningRate 0.0001 Epoch: 29 Global Step: 617440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:36,699-Speed 6299.90 samples/sec Loss 3.3877 LearningRate 0.0001 Epoch: 29 Global Step: 617450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:39,946-Speed 6310.14 samples/sec Loss 3.5022 LearningRate 0.0001 Epoch: 29 Global Step: 617460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:43,197-Speed 6301.02 samples/sec Loss 3.4813 LearningRate 0.0001 Epoch: 29 Global Step: 617470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:46,438-Speed 6320.14 samples/sec Loss 3.4740 LearningRate 0.0001 Epoch: 29 Global Step: 617480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:49,669-Speed 6339.79 samples/sec Loss 3.4023 LearningRate 0.0001 Epoch: 29 Global Step: 617490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:52,913-Speed 6314.48 samples/sec Loss 3.3899 LearningRate 0.0001 Epoch: 29 Global Step: 617500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:53:56,160-Speed 6307.99 samples/sec Loss 3.4143 LearningRate 0.0001 Epoch: 29 Global Step: 617510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:53:59,405-Speed 6313.52 samples/sec Loss 3.4518 LearningRate 0.0001 Epoch: 29 Global Step: 617520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:02,652-Speed 6309.23 samples/sec Loss 3.4518 LearningRate 0.0001 Epoch: 29 Global Step: 617530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:05,900-Speed 6306.74 samples/sec Loss 3.4178 LearningRate 0.0001 Epoch: 29 Global Step: 617540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:09,142-Speed 6316.98 samples/sec Loss 3.4223 LearningRate 0.0001 Epoch: 29 Global Step: 617550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:12,389-Speed 6309.48 samples/sec Loss 3.4195 LearningRate 0.0001 Epoch: 29 Global Step: 617560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:15,636-Speed 6309.11 samples/sec Loss 3.4243 LearningRate 0.0001 Epoch: 29 Global Step: 617570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:18,882-Speed 6309.89 samples/sec Loss 3.5538 LearningRate 0.0001 Epoch: 29 Global Step: 617580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:22,113-Speed 6341.28 samples/sec Loss 3.4511 LearningRate 0.0001 Epoch: 29 Global Step: 617590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:25,360-Speed 6308.38 samples/sec Loss 3.4649 LearningRate 0.0001 Epoch: 29 Global Step: 617600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:28,613-Speed 6296.41 samples/sec Loss 3.4436 LearningRate 0.0001 Epoch: 29 Global Step: 617610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:31,854-Speed 6320.49 samples/sec Loss 3.4645 LearningRate 0.0001 Epoch: 29 Global Step: 617620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:35,103-Speed 6305.43 samples/sec Loss 
3.4868 LearningRate 0.0001 Epoch: 29 Global Step: 617630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:38,346-Speed 6315.16 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 29 Global Step: 617640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:41,595-Speed 6306.07 samples/sec Loss 3.4304 LearningRate 0.0001 Epoch: 29 Global Step: 617650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:44,838-Speed 6316.59 samples/sec Loss 3.4352 LearningRate 0.0001 Epoch: 29 Global Step: 617660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:48,085-Speed 6308.66 samples/sec Loss 3.4077 LearningRate 0.0001 Epoch: 29 Global Step: 617670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:51,331-Speed 6311.15 samples/sec Loss 3.4135 LearningRate 0.0001 Epoch: 29 Global Step: 617680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:54:54,583-Speed 6298.77 samples/sec Loss 3.4293 LearningRate 0.0001 Epoch: 29 Global Step: 617690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:54:57,814-Speed 6339.85 samples/sec Loss 3.4995 LearningRate 0.0001 Epoch: 29 Global Step: 617700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:01,075-Speed 6281.83 samples/sec Loss 3.4470 LearningRate 0.0001 Epoch: 29 Global Step: 617710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:04,324-Speed 6316.26 samples/sec Loss 3.4970 LearningRate 0.0001 Epoch: 29 Global Step: 617720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:07,568-Speed 6313.23 samples/sec Loss 3.4780 LearningRate 0.0001 Epoch: 29 Global Step: 617730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:10,812-Speed 6314.88 samples/sec Loss 3.4187 LearningRate 0.0001 Epoch: 29 Global Step: 617740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:14,055-Speed 6316.62 samples/sec Loss 3.4987 LearningRate 0.0001 Epoch: 29 
Global Step: 617750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:17,296-Speed 6320.41 samples/sec Loss 3.4269 LearningRate 0.0001 Epoch: 29 Global Step: 617760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:20,539-Speed 6317.29 samples/sec Loss 3.4857 LearningRate 0.0001 Epoch: 29 Global Step: 617770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:23,783-Speed 6313.44 samples/sec Loss 3.4430 LearningRate 0.0001 Epoch: 29 Global Step: 617780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:27,029-Speed 6312.30 samples/sec Loss 3.4891 LearningRate 0.0001 Epoch: 29 Global Step: 617790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:30,261-Speed 6336.27 samples/sec Loss 3.4357 LearningRate 0.0001 Epoch: 29 Global Step: 617800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:33,503-Speed 6318.57 samples/sec Loss 3.4476 LearningRate 0.0001 Epoch: 29 Global Step: 617810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:36,750-Speed 6310.15 samples/sec Loss 3.4249 LearningRate 0.0001 Epoch: 29 Global Step: 617820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:39,998-Speed 6305.43 samples/sec Loss 3.4335 LearningRate 0.0001 Epoch: 29 Global Step: 617830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:43,245-Speed 6308.91 samples/sec Loss 3.4281 LearningRate 0.0001 Epoch: 29 Global Step: 617840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:46,495-Speed 6303.49 samples/sec Loss 3.4212 LearningRate 0.0001 Epoch: 29 Global Step: 617850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:49,739-Speed 6314.37 samples/sec Loss 3.4658 LearningRate 0.0001 Epoch: 29 Global Step: 617860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:52,982-Speed 6316.62 samples/sec Loss 3.4161 LearningRate 0.0001 Epoch: 29 Global Step: 617870 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:55:56,227-Speed 6313.47 samples/sec Loss 3.5117 LearningRate 0.0001 Epoch: 29 Global Step: 617880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:55:59,477-Speed 6303.35 samples/sec Loss 3.4330 LearningRate 0.0001 Epoch: 29 Global Step: 617890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:02,725-Speed 6306.35 samples/sec Loss 3.4536 LearningRate 0.0001 Epoch: 29 Global Step: 617900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:56:05,958-Speed 6336.02 samples/sec Loss 3.4295 LearningRate 0.0001 Epoch: 29 Global Step: 617910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:09,203-Speed 6312.76 samples/sec Loss 3.4971 LearningRate 0.0001 Epoch: 29 Global Step: 617920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:12,456-Speed 6297.65 samples/sec Loss 3.4355 LearningRate 0.0001 Epoch: 29 Global Step: 617930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:15,700-Speed 6314.38 samples/sec Loss 3.4088 LearningRate 0.0001 Epoch: 29 Global Step: 617940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:18,947-Speed 6308.68 samples/sec Loss 3.4381 LearningRate 0.0001 Epoch: 29 Global Step: 617950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:22,192-Speed 6312.84 samples/sec Loss 3.4501 LearningRate 0.0001 Epoch: 29 Global Step: 617960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:25,447-Speed 6293.61 samples/sec Loss 3.4863 LearningRate 0.0001 Epoch: 29 Global Step: 617970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:28,693-Speed 6310.80 samples/sec Loss 3.4467 LearningRate 0.0001 Epoch: 29 Global Step: 617980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:31,934-Speed 6319.56 samples/sec Loss 3.3856 LearningRate 0.0001 Epoch: 29 Global Step: 617990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:56:35,179-Speed 6313.51 samples/sec Loss 3.4962 LearningRate 0.0001 Epoch: 29 Global Step: 618000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:38,426-Speed 6306.77 samples/sec Loss 3.3621 LearningRate 0.0001 Epoch: 29 Global Step: 618010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:56:41,659-Speed 6337.40 samples/sec Loss 3.4812 LearningRate 0.0001 Epoch: 29 Global Step: 618020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:44,901-Speed 6318.96 samples/sec Loss 3.5140 LearningRate 0.0001 Epoch: 29 Global Step: 618030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:48,183-Speed 6240.71 samples/sec Loss 3.4336 LearningRate 0.0001 Epoch: 29 Global Step: 618040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:51,439-Speed 6291.92 samples/sec Loss 3.3648 LearningRate 0.0001 Epoch: 29 Global Step: 618050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:54,684-Speed 6311.57 samples/sec Loss 3.4754 LearningRate 0.0001 Epoch: 29 Global Step: 618060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:56:57,930-Speed 6311.27 samples/sec Loss 3.4241 LearningRate 0.0001 Epoch: 29 Global Step: 618070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:01,176-Speed 6309.87 samples/sec Loss 3.4113 LearningRate 0.0001 Epoch: 29 Global Step: 618080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:04,424-Speed 6308.89 samples/sec Loss 3.4681 LearningRate 0.0001 Epoch: 29 Global Step: 618090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:07,664-Speed 6322.26 samples/sec Loss 3.4433 LearningRate 0.0001 Epoch: 29 Global Step: 618100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:10,957-Speed 6221.25 samples/sec Loss 3.4261 LearningRate 0.0001 Epoch: 29 Global Step: 618110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:14,203-Speed 6310.19 samples/sec 
Loss 3.4208 LearningRate 0.0001 Epoch: 29 Global Step: 618120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:57:17,458-Speed 6292.51 samples/sec Loss 3.4342 LearningRate 0.0001 Epoch: 29 Global Step: 618130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:20,700-Speed 6318.80 samples/sec Loss 3.4350 LearningRate 0.0001 Epoch: 29 Global Step: 618140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:23,945-Speed 6313.43 samples/sec Loss 3.4163 LearningRate 0.0001 Epoch: 29 Global Step: 618150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:27,189-Speed 6313.52 samples/sec Loss 3.3844 LearningRate 0.0001 Epoch: 29 Global Step: 618160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:30,433-Speed 6314.80 samples/sec Loss 3.3793 LearningRate 0.0001 Epoch: 29 Global Step: 618170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:33,679-Speed 6311.36 samples/sec Loss 3.4539 LearningRate 0.0001 Epoch: 29 Global Step: 618180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:36,923-Speed 6314.07 samples/sec Loss 3.4124 LearningRate 0.0001 Epoch: 29 Global Step: 618190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:40,169-Speed 6310.10 samples/sec Loss 3.4196 LearningRate 0.0001 Epoch: 29 Global Step: 618200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:43,422-Speed 6296.93 samples/sec Loss 3.4287 LearningRate 0.0001 Epoch: 29 Global Step: 618210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:46,663-Speed 6322.08 samples/sec Loss 3.3868 LearningRate 0.0001 Epoch: 29 Global Step: 618220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:49,897-Speed 6333.22 samples/sec Loss 3.4791 LearningRate 0.0001 Epoch: 29 Global Step: 618230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:53,144-Speed 6308.14 samples/sec Loss 3.4183 LearningRate 0.0001 Epoch: 29 
Global Step: 618240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:56,386-Speed 6318.90 samples/sec Loss 3.4339 LearningRate 0.0001 Epoch: 29 Global Step: 618250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:57:59,632-Speed 6311.13 samples/sec Loss 3.4347 LearningRate 0.0001 Epoch: 29 Global Step: 618260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:02,878-Speed 6309.84 samples/sec Loss 3.3714 LearningRate 0.0001 Epoch: 29 Global Step: 618270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:06,125-Speed 6309.65 samples/sec Loss 3.4066 LearningRate 0.0001 Epoch: 29 Global Step: 618280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:09,372-Speed 6307.48 samples/sec Loss 3.3836 LearningRate 0.0001 Epoch: 29 Global Step: 618290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:12,618-Speed 6312.42 samples/sec Loss 3.5177 LearningRate 0.0001 Epoch: 29 Global Step: 618300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:15,862-Speed 6313.50 samples/sec Loss 3.4851 LearningRate 0.0001 Epoch: 29 Global Step: 618310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:19,106-Speed 6315.07 samples/sec Loss 3.4484 LearningRate 0.0001 Epoch: 29 Global Step: 618320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:22,340-Speed 6333.97 samples/sec Loss 3.4282 LearningRate 0.0001 Epoch: 29 Global Step: 618330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:25,588-Speed 6307.69 samples/sec Loss 3.3917 LearningRate 0.0001 Epoch: 29 Global Step: 618340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:28,839-Speed 6300.52 samples/sec Loss 3.4567 LearningRate 0.0001 Epoch: 29 Global Step: 618350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:32,083-Speed 6314.51 samples/sec Loss 3.4627 LearningRate 0.0001 Epoch: 29 Global Step: 618360 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-02 23:58:35,329-Speed 6310.23 samples/sec Loss 3.4805 LearningRate 0.0001 Epoch: 29 Global Step: 618370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:38,578-Speed 6305.56 samples/sec Loss 3.4278 LearningRate 0.0001 Epoch: 29 Global Step: 618380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:41,828-Speed 6302.21 samples/sec Loss 3.4537 LearningRate 0.0001 Epoch: 29 Global Step: 618390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:45,076-Speed 6308.16 samples/sec Loss 3.3929 LearningRate 0.0001 Epoch: 29 Global Step: 618400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:48,320-Speed 6314.32 samples/sec Loss 3.4152 LearningRate 0.0001 Epoch: 29 Global Step: 618410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:51,567-Speed 6308.27 samples/sec Loss 3.4572 LearningRate 0.0001 Epoch: 29 Global Step: 618420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:54,800-Speed 6335.43 samples/sec Loss 3.4003 LearningRate 0.0001 Epoch: 29 Global Step: 618430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:58:58,047-Speed 6309.62 samples/sec Loss 3.4208 LearningRate 0.0001 Epoch: 29 Global Step: 618440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:01,295-Speed 6306.13 samples/sec Loss 3.4283 LearningRate 0.0001 Epoch: 29 Global Step: 618450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:04,554-Speed 6286.06 samples/sec Loss 3.4033 LearningRate 0.0001 Epoch: 29 Global Step: 618460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:07,799-Speed 6312.05 samples/sec Loss 3.4564 LearningRate 0.0001 Epoch: 29 Global Step: 618470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:11,045-Speed 6311.21 samples/sec Loss 3.4004 LearningRate 0.0001 Epoch: 29 Global Step: 618480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 
23:59:14,286-Speed 6320.62 samples/sec Loss 3.4204 LearningRate 0.0001 Epoch: 29 Global Step: 618490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:17,537-Speed 6300.86 samples/sec Loss 3.5193 LearningRate 0.0001 Epoch: 29 Global Step: 618500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:20,794-Speed 6289.18 samples/sec Loss 3.4986 LearningRate 0.0001 Epoch: 29 Global Step: 618510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:24,041-Speed 6308.93 samples/sec Loss 3.4479 LearningRate 0.0001 Epoch: 29 Global Step: 618520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:27,291-Speed 6302.91 samples/sec Loss 3.3849 LearningRate 0.0001 Epoch: 29 Global Step: 618530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-02 23:59:30,521-Speed 6341.96 samples/sec Loss 3.4216 LearningRate 0.0001 Epoch: 29 Global Step: 618540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:33,766-Speed 6314.42 samples/sec Loss 3.4183 LearningRate 0.0001 Epoch: 29 Global Step: 618550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:37,008-Speed 6318.03 samples/sec Loss 3.5081 LearningRate 0.0001 Epoch: 29 Global Step: 618560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:40,255-Speed 6308.11 samples/sec Loss 3.4099 LearningRate 0.0001 Epoch: 29 Global Step: 618570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:43,502-Speed 6309.44 samples/sec Loss 3.4833 LearningRate 0.0001 Epoch: 29 Global Step: 618580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:46,750-Speed 6306.34 samples/sec Loss 3.4260 LearningRate 0.0001 Epoch: 29 Global Step: 618590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:49,995-Speed 6312.47 samples/sec Loss 3.3795 LearningRate 0.0001 Epoch: 29 Global Step: 618600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:53,235-Speed 6322.12 samples/sec 
Loss 3.3885 LearningRate 0.0001 Epoch: 29 Global Step: 618610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:56,481-Speed 6312.11 samples/sec Loss 3.4378 LearningRate 0.0001 Epoch: 29 Global Step: 618620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-02 23:59:59,727-Speed 6309.85 samples/sec Loss 3.4919 LearningRate 0.0001 Epoch: 29 Global Step: 618630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:02,958-Speed 6339.10 samples/sec Loss 3.4797 LearningRate 0.0001 Epoch: 29 Global Step: 618640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:06,205-Speed 6309.18 samples/sec Loss 3.4242 LearningRate 0.0001 Epoch: 29 Global Step: 618650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:09,451-Speed 6311.64 samples/sec Loss 3.4394 LearningRate 0.0001 Epoch: 29 Global Step: 618660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:12,701-Speed 6302.11 samples/sec Loss 3.3954 LearningRate 0.0001 Epoch: 29 Global Step: 618670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:15,945-Speed 6314.45 samples/sec Loss 3.4157 LearningRate 0.0001 Epoch: 29 Global Step: 618680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:19,194-Speed 6305.97 samples/sec Loss 3.4071 LearningRate 0.0001 Epoch: 29 Global Step: 618690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:22,439-Speed 6311.36 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 29 Global Step: 618700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:25,684-Speed 6313.85 samples/sec Loss 3.3693 LearningRate 0.0001 Epoch: 29 Global Step: 618710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:28,934-Speed 6302.57 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 29 Global Step: 618720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:32,182-Speed 6307.31 samples/sec Loss 3.4343 LearningRate 0.0001 Epoch: 29 
Global Step: 618730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:35,425-Speed 6317.31 samples/sec Loss 3.3662 LearningRate 0.0001 Epoch: 29 Global Step: 618740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:00:38,658-Speed 6336.34 samples/sec Loss 3.4683 LearningRate 0.0001 Epoch: 29 Global Step: 618750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:41,904-Speed 6311.57 samples/sec Loss 3.4315 LearningRate 0.0001 Epoch: 29 Global Step: 618760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:45,150-Speed 6311.64 samples/sec Loss 3.3966 LearningRate 0.0001 Epoch: 29 Global Step: 618770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:48,394-Speed 6315.43 samples/sec Loss 3.4433 LearningRate 0.0001 Epoch: 29 Global Step: 618780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:51,643-Speed 6304.63 samples/sec Loss 3.4587 LearningRate 0.0001 Epoch: 29 Global Step: 618790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:54,893-Speed 6303.18 samples/sec Loss 3.4247 LearningRate 0.0001 Epoch: 29 Global Step: 618800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:00:58,142-Speed 6303.73 samples/sec Loss 3.4800 LearningRate 0.0001 Epoch: 29 Global Step: 618810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:01,386-Speed 6315.25 samples/sec Loss 3.4065 LearningRate 0.0001 Epoch: 29 Global Step: 618820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:04,632-Speed 6310.76 samples/sec Loss 3.4098 LearningRate 0.0001 Epoch: 29 Global Step: 618830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:07,880-Speed 6306.60 samples/sec Loss 3.5021 LearningRate 0.0001 Epoch: 29 Global Step: 618840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:11,112-Speed 6337.51 samples/sec Loss 3.4255 LearningRate 0.0001 Epoch: 29 Global Step: 618850 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:01:14,357-Speed 6313.19 samples/sec Loss 3.4276 LearningRate 0.0001 Epoch: 29 Global Step: 618860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:17,603-Speed 6310.81 samples/sec Loss 3.4616 LearningRate 0.0001 Epoch: 29 Global Step: 618870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:20,849-Speed 6310.23 samples/sec Loss 3.4373 LearningRate 0.0001 Epoch: 29 Global Step: 618880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:24,096-Speed 6309.05 samples/sec Loss 3.4395 LearningRate 0.0001 Epoch: 29 Global Step: 618890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:27,342-Speed 6313.09 samples/sec Loss 3.4470 LearningRate 0.0001 Epoch: 29 Global Step: 618900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:30,589-Speed 6308.41 samples/sec Loss 3.3931 LearningRate 0.0001 Epoch: 29 Global Step: 618910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:33,836-Speed 6308.60 samples/sec Loss 3.3697 LearningRate 0.0001 Epoch: 29 Global Step: 618920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:37,085-Speed 6303.31 samples/sec Loss 3.3999 LearningRate 0.0001 Epoch: 29 Global Step: 618930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:40,332-Speed 6310.56 samples/sec Loss 3.4660 LearningRate 0.0001 Epoch: 29 Global Step: 618940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:43,578-Speed 6309.13 samples/sec Loss 3.4171 LearningRate 0.0001 Epoch: 29 Global Step: 618950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:01:46,812-Speed 6336.19 samples/sec Loss 3.4651 LearningRate 0.0001 Epoch: 29 Global Step: 618960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:50,056-Speed 6313.17 samples/sec Loss 3.4185 LearningRate 0.0001 Epoch: 29 Global Step: 618970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:01:53,304-Speed 6308.00 samples/sec Loss 3.4020 LearningRate 0.0001 Epoch: 29 Global Step: 618980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:56,550-Speed 6309.43 samples/sec Loss 3.4703 LearningRate 0.0001 Epoch: 29 Global Step: 618990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:01:59,799-Speed 6306.82 samples/sec Loss 3.5196 LearningRate 0.0001 Epoch: 29 Global Step: 619000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:03,048-Speed 6303.60 samples/sec Loss 3.4524 LearningRate 0.0001 Epoch: 29 Global Step: 619010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:06,298-Speed 6304.24 samples/sec Loss 3.3561 LearningRate 0.0001 Epoch: 29 Global Step: 619020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:09,545-Speed 6308.01 samples/sec Loss 3.4049 LearningRate 0.0001 Epoch: 29 Global Step: 619030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:12,795-Speed 6303.18 samples/sec Loss 3.4378 LearningRate 0.0001 Epoch: 29 Global Step: 619040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:16,040-Speed 6313.45 samples/sec Loss 3.4394 LearningRate 0.0001 Epoch: 29 Global Step: 619050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:19,273-Speed 6335.04 samples/sec Loss 3.3849 LearningRate 0.0001 Epoch: 29 Global Step: 619060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:22,519-Speed 6310.98 samples/sec Loss 3.4363 LearningRate 0.0001 Epoch: 29 Global Step: 619070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:25,767-Speed 6307.34 samples/sec Loss 3.4093 LearningRate 0.0001 Epoch: 29 Global Step: 619080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:29,013-Speed 6309.60 samples/sec Loss 3.3924 LearningRate 0.0001 Epoch: 29 Global Step: 619090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:32,271-Speed 6287.10 samples/sec Loss 
3.4331 LearningRate 0.0001 Epoch: 29 Global Step: 619100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:35,551-Speed 6245.77 samples/sec Loss 3.4240 LearningRate 0.0001 Epoch: 29 Global Step: 619110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:38,822-Speed 6262.40 samples/sec Loss 3.4486 LearningRate 0.0001 Epoch: 29 Global Step: 619120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:42,068-Speed 6311.59 samples/sec Loss 3.3563 LearningRate 0.0001 Epoch: 29 Global Step: 619130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:45,313-Speed 6312.95 samples/sec Loss 3.4654 LearningRate 0.0001 Epoch: 29 Global Step: 619140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:48,559-Speed 6310.03 samples/sec Loss 3.4840 LearningRate 0.0001 Epoch: 29 Global Step: 619150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:51,792-Speed 6336.78 samples/sec Loss 3.3886 LearningRate 0.0001 Epoch: 29 Global Step: 619160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:55,036-Speed 6313.02 samples/sec Loss 3.3903 LearningRate 0.0001 Epoch: 29 Global Step: 619170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:02:58,282-Speed 6311.94 samples/sec Loss 3.4710 LearningRate 0.0001 Epoch: 29 Global Step: 619180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:01,528-Speed 6309.30 samples/sec Loss 3.5284 LearningRate 0.0001 Epoch: 29 Global Step: 619190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:04,777-Speed 6305.54 samples/sec Loss 3.3695 LearningRate 0.0001 Epoch: 29 Global Step: 619200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:08,022-Speed 6312.90 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 619210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:11,277-Speed 6293.21 samples/sec Loss 3.4149 LearningRate 0.0001 Epoch: 29 Global 
Step: 619220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:14,522-Speed 6314.17 samples/sec Loss 3.4809 LearningRate 0.0001 Epoch: 29 Global Step: 619230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:17,767-Speed 6312.60 samples/sec Loss 3.3853 LearningRate 0.0001 Epoch: 29 Global Step: 619240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:21,009-Speed 6318.59 samples/sec Loss 3.4237 LearningRate 0.0001 Epoch: 29 Global Step: 619250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:24,257-Speed 6305.90 samples/sec Loss 3.4540 LearningRate 0.0001 Epoch: 29 Global Step: 619260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:03:27,490-Speed 6337.00 samples/sec Loss 3.3890 LearningRate 0.0001 Epoch: 29 Global Step: 619270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:30,734-Speed 6314.63 samples/sec Loss 3.4064 LearningRate 0.0001 Epoch: 29 Global Step: 619280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:33,980-Speed 6309.26 samples/sec Loss 3.4193 LearningRate 0.0001 Epoch: 29 Global Step: 619290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:37,228-Speed 6307.70 samples/sec Loss 3.4516 LearningRate 0.0001 Epoch: 29 Global Step: 619300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:40,476-Speed 6307.68 samples/sec Loss 3.4413 LearningRate 0.0001 Epoch: 29 Global Step: 619310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:43,731-Speed 6292.86 samples/sec Loss 3.4184 LearningRate 0.0001 Epoch: 29 Global Step: 619320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:46,969-Speed 6325.85 samples/sec Loss 3.4067 LearningRate 0.0001 Epoch: 29 Global Step: 619330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:50,215-Speed 6311.23 samples/sec Loss 3.4092 LearningRate 0.0001 Epoch: 29 Global Step: 619340 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:03:53,561-Speed 6120.69 samples/sec Loss 3.3971 LearningRate 0.0001 Epoch: 29 Global Step: 619350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:03:56,807-Speed 6311.72 samples/sec Loss 3.3650 LearningRate 0.0001 Epoch: 29 Global Step: 619360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:00,057-Speed 6302.90 samples/sec Loss 3.3651 LearningRate 0.0001 Epoch: 29 Global Step: 619370 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:04:03,287-Speed 6341.74 samples/sec Loss 3.4250 LearningRate 0.0001 Epoch: 29 Global Step: 619380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:06,533-Speed 6311.10 samples/sec Loss 3.4385 LearningRate 0.0001 Epoch: 29 Global Step: 619390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:09,776-Speed 6316.27 samples/sec Loss 3.4363 LearningRate 0.0001 Epoch: 29 Global Step: 619400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:13,022-Speed 6310.14 samples/sec Loss 3.4251 LearningRate 0.0001 Epoch: 29 Global Step: 619410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:16,272-Speed 6302.92 samples/sec Loss 3.4819 LearningRate 0.0001 Epoch: 29 Global Step: 619420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:19,518-Speed 6312.08 samples/sec Loss 3.4528 LearningRate 0.0001 Epoch: 29 Global Step: 619430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:22,771-Speed 6297.20 samples/sec Loss 3.4123 LearningRate 0.0001 Epoch: 29 Global Step: 619440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:26,016-Speed 6312.13 samples/sec Loss 3.4449 LearningRate 0.0001 Epoch: 29 Global Step: 619450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:29,270-Speed 6295.24 samples/sec Loss 3.4062 LearningRate 0.0001 Epoch: 29 Global Step: 619460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:04:32,535-Speed 6275.18 samples/sec Loss 3.4366 LearningRate 0.0001 Epoch: 29 Global Step: 619470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:35,768-Speed 6336.17 samples/sec Loss 3.4469 LearningRate 0.0001 Epoch: 29 Global Step: 619480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:39,013-Speed 6312.87 samples/sec Loss 3.3850 LearningRate 0.0001 Epoch: 29 Global Step: 619490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:42,260-Speed 6307.94 samples/sec Loss 3.4014 LearningRate 0.0001 Epoch: 29 Global Step: 619500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:45,508-Speed 6306.08 samples/sec Loss 3.4272 LearningRate 0.0001 Epoch: 29 Global Step: 619510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:48,755-Speed 6310.06 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 29 Global Step: 619520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:52,002-Speed 6308.57 samples/sec Loss 3.5041 LearningRate 0.0001 Epoch: 29 Global Step: 619530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:55,245-Speed 6315.81 samples/sec Loss 3.3615 LearningRate 0.0001 Epoch: 29 Global Step: 619540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:04:58,491-Speed 6314.31 samples/sec Loss 3.4290 LearningRate 0.0001 Epoch: 29 Global Step: 619550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:01,740-Speed 6304.50 samples/sec Loss 3.4077 LearningRate 0.0001 Epoch: 29 Global Step: 619560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:04,979-Speed 6323.35 samples/sec Loss 3.4907 LearningRate 0.0001 Epoch: 29 Global Step: 619570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:08,209-Speed 6341.99 samples/sec Loss 3.4421 LearningRate 0.0001 Epoch: 29 Global Step: 619580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:11,457-Speed 6306.65 samples/sec Loss 
3.4414 LearningRate 0.0001 Epoch: 29 Global Step: 619590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:14,702-Speed 6313.79 samples/sec Loss 3.3931 LearningRate 0.0001 Epoch: 29 Global Step: 619600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:17,949-Speed 6308.50 samples/sec Loss 3.3998 LearningRate 0.0001 Epoch: 29 Global Step: 619610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:21,192-Speed 6317.12 samples/sec Loss 3.4486 LearningRate 0.0001 Epoch: 29 Global Step: 619620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:24,437-Speed 6311.63 samples/sec Loss 3.4080 LearningRate 0.0001 Epoch: 29 Global Step: 619630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:27,680-Speed 6316.77 samples/sec Loss 3.4431 LearningRate 0.0001 Epoch: 29 Global Step: 619640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:30,926-Speed 6310.81 samples/sec Loss 3.4510 LearningRate 0.0001 Epoch: 29 Global Step: 619650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:34,171-Speed 6311.85 samples/sec Loss 3.4800 LearningRate 0.0001 Epoch: 29 Global Step: 619660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:37,416-Speed 6314.08 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 29 Global Step: 619670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:40,648-Speed 6337.44 samples/sec Loss 3.4147 LearningRate 0.0001 Epoch: 29 Global Step: 619680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:43,893-Speed 6312.67 samples/sec Loss 3.4449 LearningRate 0.0001 Epoch: 29 Global Step: 619690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:47,139-Speed 6311.55 samples/sec Loss 3.4231 LearningRate 0.0001 Epoch: 29 Global Step: 619700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:50,385-Speed 6310.87 samples/sec Loss 3.4066 LearningRate 0.0001 Epoch: 29 Global 
Step: 619710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:53,630-Speed 6312.99 samples/sec Loss 3.4418 LearningRate 0.0001 Epoch: 29 Global Step: 619720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:05:56,876-Speed 6311.04 samples/sec Loss 3.4710 LearningRate 0.0001 Epoch: 29 Global Step: 619730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:00,117-Speed 6319.97 samples/sec Loss 3.4566 LearningRate 0.0001 Epoch: 29 Global Step: 619740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:03,363-Speed 6310.40 samples/sec Loss 3.4915 LearningRate 0.0001 Epoch: 29 Global Step: 619750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:06,611-Speed 6307.59 samples/sec Loss 3.3903 LearningRate 0.0001 Epoch: 29 Global Step: 619760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:09,855-Speed 6313.81 samples/sec Loss 3.4850 LearningRate 0.0001 Epoch: 29 Global Step: 619770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:13,100-Speed 6313.18 samples/sec Loss 3.4275 LearningRate 0.0001 Epoch: 29 Global Step: 619780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:06:16,352-Speed 6298.82 samples/sec Loss 3.4403 LearningRate 0.0001 Epoch: 29 Global Step: 619790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:06:19,584-Speed 6341.15 samples/sec Loss 3.4422 LearningRate 0.0001 Epoch: 29 Global Step: 619800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:22,826-Speed 6317.78 samples/sec Loss 3.4778 LearningRate 0.0001 Epoch: 29 Global Step: 619810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:26,072-Speed 6311.32 samples/sec Loss 3.3985 LearningRate 0.0001 Epoch: 29 Global Step: 619820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:29,316-Speed 6313.70 samples/sec Loss 3.3995 LearningRate 0.0001 Epoch: 29 Global Step: 619830 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:06:32,563-Speed 6309.09 samples/sec Loss 3.4522 LearningRate 0.0001 Epoch: 29 Global Step: 619840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:35,812-Speed 6305.82 samples/sec Loss 3.4001 LearningRate 0.0001 Epoch: 29 Global Step: 619850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:39,060-Speed 6306.94 samples/sec Loss 3.3547 LearningRate 0.0001 Epoch: 29 Global Step: 619860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:42,306-Speed 6310.67 samples/sec Loss 3.4764 LearningRate 0.0001 Epoch: 29 Global Step: 619870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:45,550-Speed 6314.24 samples/sec Loss 3.4428 LearningRate 0.0001 Epoch: 29 Global Step: 619880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:48,796-Speed 6311.80 samples/sec Loss 3.4316 LearningRate 0.0001 Epoch: 29 Global Step: 619890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:52,023-Speed 6348.32 samples/sec Loss 3.3810 LearningRate 0.0001 Epoch: 29 Global Step: 619900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:55,268-Speed 6312.39 samples/sec Loss 3.4590 LearningRate 0.0001 Epoch: 29 Global Step: 619910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:06:58,512-Speed 6313.36 samples/sec Loss 3.4731 LearningRate 0.0001 Epoch: 29 Global Step: 619920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:01,761-Speed 6305.25 samples/sec Loss 3.4032 LearningRate 0.0001 Epoch: 29 Global Step: 619930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:05,012-Speed 6301.80 samples/sec Loss 3.3906 LearningRate 0.0001 Epoch: 29 Global Step: 619940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:08,257-Speed 6313.27 samples/sec Loss 3.4676 LearningRate 0.0001 Epoch: 29 Global Step: 619950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:07:11,585-Speed 6154.75 samples/sec Loss 3.3823 LearningRate 0.0001 Epoch: 29 Global Step: 619960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:14,830-Speed 6311.48 samples/sec Loss 3.4295 LearningRate 0.0001 Epoch: 29 Global Step: 619970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:18,073-Speed 6316.10 samples/sec Loss 3.4443 LearningRate 0.0001 Epoch: 29 Global Step: 619980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:21,319-Speed 6312.87 samples/sec Loss 3.4118 LearningRate 0.0001 Epoch: 29 Global Step: 619990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:24,549-Speed 6340.56 samples/sec Loss 3.4265 LearningRate 0.0001 Epoch: 29 Global Step: 620000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:27,794-Speed 6313.50 samples/sec Loss 3.4270 LearningRate 0.0001 Epoch: 29 Global Step: 620010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:31,040-Speed 6309.58 samples/sec Loss 3.4034 LearningRate 0.0001 Epoch: 29 Global Step: 620020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:34,282-Speed 6319.84 samples/sec Loss 3.3867 LearningRate 0.0001 Epoch: 29 Global Step: 620030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:37,530-Speed 6306.84 samples/sec Loss 3.4344 LearningRate 0.0001 Epoch: 29 Global Step: 620040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:40,774-Speed 6313.39 samples/sec Loss 3.3892 LearningRate 0.0001 Epoch: 29 Global Step: 620050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:44,020-Speed 6310.99 samples/sec Loss 3.4388 LearningRate 0.0001 Epoch: 29 Global Step: 620060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:47,269-Speed 6305.25 samples/sec Loss 3.4520 LearningRate 0.0001 Epoch: 29 Global Step: 620070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:50,520-Speed 6300.89 samples/sec Loss 
3.4006 LearningRate 0.0001 Epoch: 29 Global Step: 620080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:53,767-Speed 6308.62 samples/sec Loss 3.3905 LearningRate 0.0001 Epoch: 29 Global Step: 620090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:07:56,995-Speed 6346.54 samples/sec Loss 3.4667 LearningRate 0.0001 Epoch: 29 Global Step: 620100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:00,241-Speed 6310.19 samples/sec Loss 3.3843 LearningRate 0.0001 Epoch: 29 Global Step: 620110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:03,485-Speed 6314.76 samples/sec Loss 3.3865 LearningRate 0.0001 Epoch: 29 Global Step: 620120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:06,737-Speed 6298.82 samples/sec Loss 3.3995 LearningRate 0.0001 Epoch: 29 Global Step: 620130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:09,984-Speed 6309.88 samples/sec Loss 3.4098 LearningRate 0.0001 Epoch: 29 Global Step: 620140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:13,228-Speed 6313.80 samples/sec Loss 3.4528 LearningRate 0.0001 Epoch: 29 Global Step: 620150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:16,477-Speed 6306.35 samples/sec Loss 3.4888 LearningRate 0.0001 Epoch: 29 Global Step: 620160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:19,723-Speed 6309.93 samples/sec Loss 3.4211 LearningRate 0.0001 Epoch: 29 Global Step: 620170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:22,970-Speed 6309.06 samples/sec Loss 3.4509 LearningRate 0.0001 Epoch: 29 Global Step: 620180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:26,218-Speed 6307.55 samples/sec Loss 3.3872 LearningRate 0.0001 Epoch: 29 Global Step: 620190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:29,461-Speed 6315.23 samples/sec Loss 3.4181 LearningRate 0.0001 Epoch: 29 Global 
Step: 620200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:08:32,702-Speed 6319.84 samples/sec Loss 3.3699 LearningRate 0.0001 Epoch: 29 Global Step: 620210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:35,954-Speed 6299.91 samples/sec Loss 3.4743 LearningRate 0.0001 Epoch: 29 Global Step: 620220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:39,200-Speed 6311.57 samples/sec Loss 3.3992 LearningRate 0.0001 Epoch: 29 Global Step: 620230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:42,443-Speed 6315.59 samples/sec Loss 3.3914 LearningRate 0.0001 Epoch: 29 Global Step: 620240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:45,690-Speed 6308.41 samples/sec Loss 3.4583 LearningRate 0.0001 Epoch: 29 Global Step: 620250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:48,937-Speed 6310.21 samples/sec Loss 3.3609 LearningRate 0.0001 Epoch: 29 Global Step: 620260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:52,188-Speed 6300.07 samples/sec Loss 3.4496 LearningRate 0.0001 Epoch: 29 Global Step: 620270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:55,436-Speed 6308.02 samples/sec Loss 3.4520 LearningRate 0.0001 Epoch: 29 Global Step: 620280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:08:58,678-Speed 6317.95 samples/sec Loss 3.3929 LearningRate 0.0001 Epoch: 29 Global Step: 620290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:01,923-Speed 6312.64 samples/sec Loss 3.4470 LearningRate 0.0001 Epoch: 29 Global Step: 620300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:05,169-Speed 6309.73 samples/sec Loss 3.4386 LearningRate 0.0001 Epoch: 29 Global Step: 620310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:09:08,400-Speed 6341.58 samples/sec Loss 3.4221 LearningRate 0.0001 Epoch: 29 Global Step: 620320 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:09:11,648-Speed 6307.13 samples/sec Loss 3.4296 LearningRate 0.0001 Epoch: 29 Global Step: 620330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:14,890-Speed 6318.31 samples/sec Loss 3.3641 LearningRate 0.0001 Epoch: 29 Global Step: 620340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:18,142-Speed 6298.26 samples/sec Loss 3.3958 LearningRate 0.0001 Epoch: 29 Global Step: 620350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:21,389-Speed 6309.36 samples/sec Loss 3.4129 LearningRate 0.0001 Epoch: 29 Global Step: 620360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:24,641-Speed 6298.72 samples/sec Loss 3.3989 LearningRate 0.0001 Epoch: 29 Global Step: 620370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:27,886-Speed 6314.16 samples/sec Loss 3.4120 LearningRate 0.0001 Epoch: 29 Global Step: 620380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:31,129-Speed 6315.34 samples/sec Loss 3.4202 LearningRate 0.0001 Epoch: 29 Global Step: 620390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:34,375-Speed 6309.93 samples/sec Loss 3.4284 LearningRate 0.0001 Epoch: 29 Global Step: 620400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:37,619-Speed 6314.86 samples/sec Loss 3.4540 LearningRate 0.0001 Epoch: 29 Global Step: 620410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:40,850-Speed 6340.71 samples/sec Loss 3.4331 LearningRate 0.0001 Epoch: 29 Global Step: 620420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:44,095-Speed 6311.69 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 29 Global Step: 620430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:47,340-Speed 6312.41 samples/sec Loss 3.4842 LearningRate 0.0001 Epoch: 29 Global Step: 620440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:09:50,584-Speed 6316.14 samples/sec Loss 3.4570 LearningRate 0.0001 Epoch: 29 Global Step: 620450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:53,829-Speed 6311.99 samples/sec Loss 3.4331 LearningRate 0.0001 Epoch: 29 Global Step: 620460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:09:57,076-Speed 6309.19 samples/sec Loss 3.3907 LearningRate 0.0001 Epoch: 29 Global Step: 620470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:00,330-Speed 6294.83 samples/sec Loss 3.3945 LearningRate 0.0001 Epoch: 29 Global Step: 620480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:03,572-Speed 6317.69 samples/sec Loss 3.4027 LearningRate 0.0001 Epoch: 29 Global Step: 620490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:06,817-Speed 6312.28 samples/sec Loss 3.4022 LearningRate 0.0001 Epoch: 29 Global Step: 620500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:10,063-Speed 6312.52 samples/sec Loss 3.4560 LearningRate 0.0001 Epoch: 29 Global Step: 620510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:13,308-Speed 6311.09 samples/sec Loss 3.4044 LearningRate 0.0001 Epoch: 29 Global Step: 620520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:10:16,536-Speed 6346.41 samples/sec Loss 3.3724 LearningRate 0.0001 Epoch: 29 Global Step: 620530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:19,784-Speed 6307.03 samples/sec Loss 3.4460 LearningRate 0.0001 Epoch: 29 Global Step: 620540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:23,031-Speed 6309.09 samples/sec Loss 3.4483 LearningRate 0.0001 Epoch: 29 Global Step: 620550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:26,276-Speed 6313.88 samples/sec Loss 3.3818 LearningRate 0.0001 Epoch: 29 Global Step: 620560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:29,554-Speed 6248.71 samples/sec 
Loss 3.4070 LearningRate 0.0001 Epoch: 29 Global Step: 620570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:32,798-Speed 6315.13 samples/sec Loss 3.4555 LearningRate 0.0001 Epoch: 29 Global Step: 620580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:36,055-Speed 6289.15 samples/sec Loss 3.4151 LearningRate 0.0001 Epoch: 29 Global Step: 620590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:39,354-Speed 6209.61 samples/sec Loss 3.3828 LearningRate 0.0001 Epoch: 29 Global Step: 620600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:42,594-Speed 6321.18 samples/sec Loss 3.3748 LearningRate 0.0001 Epoch: 29 Global Step: 620610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:45,840-Speed 6311.96 samples/sec Loss 3.4361 LearningRate 0.0001 Epoch: 29 Global Step: 620620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:49,081-Speed 6319.37 samples/sec Loss 3.4187 LearningRate 0.0001 Epoch: 29 Global Step: 620630 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:10:52,313-Speed 6337.60 samples/sec Loss 3.4520 LearningRate 0.0001 Epoch: 29 Global Step: 620640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:55,556-Speed 6317.31 samples/sec Loss 3.4346 LearningRate 0.0001 Epoch: 29 Global Step: 620650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:10:58,800-Speed 6314.12 samples/sec Loss 3.4458 LearningRate 0.0001 Epoch: 29 Global Step: 620660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:02,043-Speed 6316.58 samples/sec Loss 3.4253 LearningRate 0.0001 Epoch: 29 Global Step: 620670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:05,289-Speed 6312.08 samples/sec Loss 3.4236 LearningRate 0.0001 Epoch: 29 Global Step: 620680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:08,533-Speed 6313.09 samples/sec Loss 3.5028 LearningRate 0.0001 Epoch: 29 
Global Step: 620690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:11,779-Speed 6310.82 samples/sec Loss 3.3932 LearningRate 0.0001 Epoch: 29 Global Step: 620700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:15,022-Speed 6316.24 samples/sec Loss 3.3839 LearningRate 0.0001 Epoch: 29 Global Step: 620710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:18,271-Speed 6307.76 samples/sec Loss 3.3433 LearningRate 0.0001 Epoch: 29 Global Step: 620720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:21,514-Speed 6315.20 samples/sec Loss 3.4199 LearningRate 0.0001 Epoch: 29 Global Step: 620730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:24,757-Speed 6317.66 samples/sec Loss 3.4490 LearningRate 0.0001 Epoch: 29 Global Step: 620740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:28,002-Speed 6311.91 samples/sec Loss 3.4408 LearningRate 0.0001 Epoch: 29 Global Step: 620750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:31,247-Speed 6313.52 samples/sec Loss 3.4534 LearningRate 0.0001 Epoch: 29 Global Step: 620760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:34,491-Speed 6314.13 samples/sec Loss 3.4648 LearningRate 0.0001 Epoch: 29 Global Step: 620770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:37,736-Speed 6312.86 samples/sec Loss 3.4257 LearningRate 0.0001 Epoch: 29 Global Step: 620780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:40,985-Speed 6305.52 samples/sec Loss 3.3775 LearningRate 0.0001 Epoch: 29 Global Step: 620790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:44,229-Speed 6313.59 samples/sec Loss 3.4137 LearningRate 0.0001 Epoch: 29 Global Step: 620800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:47,473-Speed 6315.39 samples/sec Loss 3.4365 LearningRate 0.0001 Epoch: 29 Global Step: 620810 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:11:50,716-Speed 6316.35 samples/sec Loss 3.4333 LearningRate 0.0001 Epoch: 29 Global Step: 620820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:53,961-Speed 6313.52 samples/sec Loss 3.4538 LearningRate 0.0001 Epoch: 29 Global Step: 620830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:11:57,192-Speed 6339.71 samples/sec Loss 3.4601 LearningRate 0.0001 Epoch: 29 Global Step: 620840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:00,434-Speed 6318.64 samples/sec Loss 3.3773 LearningRate 0.0001 Epoch: 29 Global Step: 620850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:03,682-Speed 6306.55 samples/sec Loss 3.4162 LearningRate 0.0001 Epoch: 29 Global Step: 620860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:06,925-Speed 6316.32 samples/sec Loss 3.4271 LearningRate 0.0001 Epoch: 29 Global Step: 620870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:10,169-Speed 6314.27 samples/sec Loss 3.4238 LearningRate 0.0001 Epoch: 29 Global Step: 620880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:13,420-Speed 6301.38 samples/sec Loss 3.4122 LearningRate 0.0001 Epoch: 29 Global Step: 620890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:16,668-Speed 6306.75 samples/sec Loss 3.4423 LearningRate 0.0001 Epoch: 29 Global Step: 620900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:19,912-Speed 6314.24 samples/sec Loss 3.3817 LearningRate 0.0001 Epoch: 29 Global Step: 620910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:23,155-Speed 6316.52 samples/sec Loss 3.3886 LearningRate 0.0001 Epoch: 29 Global Step: 620920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:26,404-Speed 6305.72 samples/sec Loss 3.4149 LearningRate 0.0001 Epoch: 29 Global Step: 620930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:12:29,647-Speed 6316.13 samples/sec Loss 3.4217 LearningRate 0.0001 Epoch: 29 Global Step: 620940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:12:32,893-Speed 6309.93 samples/sec Loss 3.4155 LearningRate 0.0001 Epoch: 29 Global Step: 620950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:12:36,125-Speed 6338.97 samples/sec Loss 3.4457 LearningRate 0.0001 Epoch: 29 Global Step: 620960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:39,363-Speed 6325.73 samples/sec Loss 3.3848 LearningRate 0.0001 Epoch: 29 Global Step: 620970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:42,614-Speed 6301.97 samples/sec Loss 3.3373 LearningRate 0.0001 Epoch: 29 Global Step: 620980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:45,863-Speed 6305.15 samples/sec Loss 3.3836 LearningRate 0.0001 Epoch: 29 Global Step: 620990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:49,104-Speed 6319.83 samples/sec Loss 3.3655 LearningRate 0.0001 Epoch: 29 Global Step: 621000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:52,350-Speed 6310.64 samples/sec Loss 3.4287 LearningRate 0.0001 Epoch: 29 Global Step: 621010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:55,595-Speed 6313.55 samples/sec Loss 3.4137 LearningRate 0.0001 Epoch: 29 Global Step: 621020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:12:58,842-Speed 6308.74 samples/sec Loss 3.4299 LearningRate 0.0001 Epoch: 29 Global Step: 621030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:02,089-Speed 6309.25 samples/sec Loss 3.4194 LearningRate 0.0001 Epoch: 29 Global Step: 621040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:05,336-Speed 6308.12 samples/sec Loss 3.4786 LearningRate 0.0001 Epoch: 29 Global Step: 621050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:08,568-Speed 6339.28 samples/sec 
Loss 3.3860 LearningRate 0.0001 Epoch: 29 Global Step: 621060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:11,812-Speed 6313.53 samples/sec Loss 3.4309 LearningRate 0.0001 Epoch: 29 Global Step: 621070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:15,059-Speed 6308.29 samples/sec Loss 3.3850 LearningRate 0.0001 Epoch: 29 Global Step: 621080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:18,304-Speed 6314.35 samples/sec Loss 3.3441 LearningRate 0.0001 Epoch: 29 Global Step: 621090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:21,549-Speed 6312.06 samples/sec Loss 3.4722 LearningRate 0.0001 Epoch: 29 Global Step: 621100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:24,792-Speed 6315.86 samples/sec Loss 3.3809 LearningRate 0.0001 Epoch: 29 Global Step: 621110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:28,034-Speed 6317.69 samples/sec Loss 3.4323 LearningRate 0.0001 Epoch: 29 Global Step: 621120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:31,284-Speed 6303.75 samples/sec Loss 3.4404 LearningRate 0.0001 Epoch: 29 Global Step: 621130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:34,533-Speed 6304.95 samples/sec Loss 3.3863 LearningRate 0.0001 Epoch: 29 Global Step: 621140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:37,778-Speed 6314.98 samples/sec Loss 3.3669 LearningRate 0.0001 Epoch: 29 Global Step: 621150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:41,020-Speed 6318.69 samples/sec Loss 3.4493 LearningRate 0.0001 Epoch: 29 Global Step: 621160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:44,265-Speed 6312.59 samples/sec Loss 3.4436 LearningRate 0.0001 Epoch: 29 Global Step: 621170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:47,509-Speed 6315.37 samples/sec Loss 3.4416 LearningRate 0.0001 Epoch: 29 
Global Step: 621180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:50,756-Speed 6308.06 samples/sec Loss 3.4364 LearningRate 0.0001 Epoch: 29 Global Step: 621190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:53,997-Speed 6321.20 samples/sec Loss 3.3898 LearningRate 0.0001 Epoch: 29 Global Step: 621200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:13:57,239-Speed 6317.15 samples/sec Loss 3.4028 LearningRate 0.0001 Epoch: 29 Global Step: 621210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:00,485-Speed 6311.26 samples/sec Loss 3.3925 LearningRate 0.0001 Epoch: 29 Global Step: 621220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:03,727-Speed 6319.02 samples/sec Loss 3.4212 LearningRate 0.0001 Epoch: 29 Global Step: 621230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:06,974-Speed 6309.19 samples/sec Loss 3.4273 LearningRate 0.0001 Epoch: 29 Global Step: 621240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:10,218-Speed 6314.63 samples/sec Loss 3.4280 LearningRate 0.0001 Epoch: 29 Global Step: 621250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:13,451-Speed 6337.45 samples/sec Loss 3.4149 LearningRate 0.0001 Epoch: 29 Global Step: 621260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:16,699-Speed 6305.35 samples/sec Loss 3.4464 LearningRate 0.0001 Epoch: 29 Global Step: 621270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:19,942-Speed 6317.69 samples/sec Loss 3.3858 LearningRate 0.0001 Epoch: 29 Global Step: 621280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:23,204-Speed 6278.96 samples/sec Loss 3.4544 LearningRate 0.0001 Epoch: 29 Global Step: 621290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:26,449-Speed 6313.26 samples/sec Loss 3.3795 LearningRate 0.0001 Epoch: 29 Global Step: 621300 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:14:29,690-Speed 6320.72 samples/sec Loss 3.4369 LearningRate 0.0001 Epoch: 29 Global Step: 621310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:32,940-Speed 6303.08 samples/sec Loss 3.4072 LearningRate 0.0001 Epoch: 29 Global Step: 621320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:36,181-Speed 6319.80 samples/sec Loss 3.4773 LearningRate 0.0001 Epoch: 29 Global Step: 621330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:39,431-Speed 6303.35 samples/sec Loss 3.4399 LearningRate 0.0001 Epoch: 29 Global Step: 621340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:42,678-Speed 6308.44 samples/sec Loss 3.4649 LearningRate 0.0001 Epoch: 29 Global Step: 621350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:45,921-Speed 6315.75 samples/sec Loss 3.4287 LearningRate 0.0001 Epoch: 29 Global Step: 621360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:14:49,153-Speed 6338.77 samples/sec Loss 3.3967 LearningRate 0.0001 Epoch: 29 Global Step: 621370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:52,398-Speed 6311.96 samples/sec Loss 3.3944 LearningRate 0.0001 Epoch: 29 Global Step: 621380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:55,648-Speed 6304.10 samples/sec Loss 3.3751 LearningRate 0.0001 Epoch: 29 Global Step: 621390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:14:58,888-Speed 6322.46 samples/sec Loss 3.4010 LearningRate 0.0001 Epoch: 29 Global Step: 621400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:02,130-Speed 6318.67 samples/sec Loss 3.3797 LearningRate 0.0001 Epoch: 29 Global Step: 621410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:05,373-Speed 6315.89 samples/sec Loss 3.3603 LearningRate 0.0001 Epoch: 29 Global Step: 621420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:15:08,619-Speed 6310.79 samples/sec Loss 3.4449 LearningRate 0.0001 Epoch: 29 Global Step: 621430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:11,862-Speed 6316.28 samples/sec Loss 3.4381 LearningRate 0.0001 Epoch: 29 Global Step: 621440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:15,112-Speed 6304.38 samples/sec Loss 3.4546 LearningRate 0.0001 Epoch: 29 Global Step: 621450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:18,354-Speed 6318.96 samples/sec Loss 3.3684 LearningRate 0.0001 Epoch: 29 Global Step: 621460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:21,591-Speed 6327.24 samples/sec Loss 3.4429 LearningRate 0.0001 Epoch: 29 Global Step: 621470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:24,836-Speed 6312.52 samples/sec Loss 3.4415 LearningRate 0.0001 Epoch: 29 Global Step: 621480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:28,084-Speed 6306.53 samples/sec Loss 3.4268 LearningRate 0.0001 Epoch: 29 Global Step: 621490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:31,330-Speed 6311.87 samples/sec Loss 3.4792 LearningRate 0.0001 Epoch: 29 Global Step: 621500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:34,575-Speed 6312.45 samples/sec Loss 3.3532 LearningRate 0.0001 Epoch: 29 Global Step: 621510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:37,826-Speed 6301.60 samples/sec Loss 3.3958 LearningRate 0.0001 Epoch: 29 Global Step: 621520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:41,074-Speed 6305.05 samples/sec Loss 3.3981 LearningRate 0.0001 Epoch: 29 Global Step: 621530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:44,319-Speed 6313.44 samples/sec Loss 3.4272 LearningRate 0.0001 Epoch: 29 Global Step: 621540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:47,562-Speed 6315.49 samples/sec Loss 
3.4401 LearningRate 0.0001 Epoch: 29 Global Step: 621550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:50,804-Speed 6318.69 samples/sec Loss 3.4297 LearningRate 0.0001 Epoch: 29 Global Step: 621560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:15:54,049-Speed 6313.16 samples/sec Loss 3.4190 LearningRate 0.0001 Epoch: 29 Global Step: 621570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:15:57,279-Speed 6342.56 samples/sec Loss 3.4626 LearningRate 0.0001 Epoch: 29 Global Step: 621580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:00,526-Speed 6310.65 samples/sec Loss 3.4151 LearningRate 0.0001 Epoch: 29 Global Step: 621590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:03,777-Speed 6300.70 samples/sec Loss 3.4441 LearningRate 0.0001 Epoch: 29 Global Step: 621600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:07,031-Speed 6293.99 samples/sec Loss 3.3562 LearningRate 0.0001 Epoch: 29 Global Step: 621610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:10,275-Speed 6316.18 samples/sec Loss 3.3569 LearningRate 0.0001 Epoch: 29 Global Step: 621620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:13,520-Speed 6311.50 samples/sec Loss 3.3859 LearningRate 0.0001 Epoch: 29 Global Step: 621630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:16,763-Speed 6315.76 samples/sec Loss 3.4573 LearningRate 0.0001 Epoch: 29 Global Step: 621640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:20,009-Speed 6312.08 samples/sec Loss 3.3999 LearningRate 0.0001 Epoch: 29 Global Step: 621650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:23,257-Speed 6305.32 samples/sec Loss 3.4399 LearningRate 0.0001 Epoch: 29 Global Step: 621660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:26,508-Speed 6302.35 samples/sec Loss 3.4485 LearningRate 0.0001 Epoch: 29 
Global Step: 621670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:29,742-Speed 6333.56 samples/sec Loss 3.4426 LearningRate 0.0001 Epoch: 29 Global Step: 621680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:32,982-Speed 6323.09 samples/sec Loss 3.4203 LearningRate 0.0001 Epoch: 29 Global Step: 621690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:36,233-Speed 6300.90 samples/sec Loss 3.3858 LearningRate 0.0001 Epoch: 29 Global Step: 621700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:39,477-Speed 6315.59 samples/sec Loss 3.3904 LearningRate 0.0001 Epoch: 29 Global Step: 621710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:42,719-Speed 6317.93 samples/sec Loss 3.4570 LearningRate 0.0001 Epoch: 29 Global Step: 621720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:45,962-Speed 6316.79 samples/sec Loss 3.4293 LearningRate 0.0001 Epoch: 29 Global Step: 621730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:49,208-Speed 6310.48 samples/sec Loss 3.4294 LearningRate 0.0001 Epoch: 29 Global Step: 621740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:52,454-Speed 6310.93 samples/sec Loss 3.4616 LearningRate 0.0001 Epoch: 29 Global Step: 621750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:55,702-Speed 6305.93 samples/sec Loss 3.3708 LearningRate 0.0001 Epoch: 29 Global Step: 621760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:16:58,949-Speed 6308.54 samples/sec Loss 3.4238 LearningRate 0.0001 Epoch: 29 Global Step: 621770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:02,185-Speed 6330.59 samples/sec Loss 3.4426 LearningRate 0.0001 Epoch: 29 Global Step: 621780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:05,435-Speed 6303.43 samples/sec Loss 3.4027 LearningRate 0.0001 Epoch: 29 Global Step: 621790 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:17:08,683-Speed 6307.70 samples/sec Loss 3.4308 LearningRate 0.0001 Epoch: 29 Global Step: 621800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:11,925-Speed 6317.13 samples/sec Loss 3.3901 LearningRate 0.0001 Epoch: 29 Global Step: 621810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:15,169-Speed 6314.68 samples/sec Loss 3.4419 LearningRate 0.0001 Epoch: 29 Global Step: 621820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:18,416-Speed 6309.17 samples/sec Loss 3.3496 LearningRate 0.0001 Epoch: 29 Global Step: 621830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:21,673-Speed 6288.73 samples/sec Loss 3.3747 LearningRate 0.0001 Epoch: 29 Global Step: 621840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:24,917-Speed 6314.71 samples/sec Loss 3.4401 LearningRate 0.0001 Epoch: 29 Global Step: 621850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:28,163-Speed 6311.49 samples/sec Loss 3.4025 LearningRate 0.0001 Epoch: 29 Global Step: 621860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:31,411-Speed 6306.32 samples/sec Loss 3.4175 LearningRate 0.0001 Epoch: 29 Global Step: 621870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:34,661-Speed 6302.48 samples/sec Loss 3.4140 LearningRate 0.0001 Epoch: 29 Global Step: 621880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:37,917-Speed 6291.16 samples/sec Loss 3.4353 LearningRate 0.0001 Epoch: 29 Global Step: 621890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:41,168-Speed 6303.17 samples/sec Loss 3.4475 LearningRate 0.0001 Epoch: 29 Global Step: 621900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:44,412-Speed 6314.18 samples/sec Loss 3.4462 LearningRate 0.0001 Epoch: 29 Global Step: 621910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:17:47,656-Speed 6313.57 samples/sec Loss 3.4517 LearningRate 0.0001 Epoch: 29 Global Step: 621920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:50,899-Speed 6318.42 samples/sec Loss 3.3858 LearningRate 0.0001 Epoch: 29 Global Step: 621930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:54,138-Speed 6323.23 samples/sec Loss 3.3978 LearningRate 0.0001 Epoch: 29 Global Step: 621940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:17:57,383-Speed 6311.65 samples/sec Loss 3.4331 LearningRate 0.0001 Epoch: 29 Global Step: 621950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:00,631-Speed 6307.65 samples/sec Loss 3.4017 LearningRate 0.0001 Epoch: 29 Global Step: 621960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:03,883-Speed 6298.90 samples/sec Loss 3.3890 LearningRate 0.0001 Epoch: 29 Global Step: 621970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:07,134-Speed 6301.11 samples/sec Loss 3.3793 LearningRate 0.0001 Epoch: 29 Global Step: 621980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:18:10,385-Speed 6301.98 samples/sec Loss 3.4132 LearningRate 0.0001 Epoch: 29 Global Step: 621990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:18:13,614-Speed 6343.17 samples/sec Loss 3.4331 LearningRate 0.0001 Epoch: 29 Global Step: 622000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:16,859-Speed 6311.75 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 29 Global Step: 622010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:20,105-Speed 6311.20 samples/sec Loss 3.4699 LearningRate 0.0001 Epoch: 29 Global Step: 622020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:23,357-Speed 6300.33 samples/sec Loss 3.4344 LearningRate 0.0001 Epoch: 29 Global Step: 622030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:26,605-Speed 6306.55 samples/sec 
Loss 3.4839 LearningRate 0.0001 Epoch: 29 Global Step: 622040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:29,846-Speed 6321.02 samples/sec Loss 3.4576 LearningRate 0.0001 Epoch: 29 Global Step: 622050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:33,092-Speed 6308.92 samples/sec Loss 3.3911 LearningRate 0.0001 Epoch: 29 Global Step: 622060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:36,335-Speed 6317.07 samples/sec Loss 3.4200 LearningRate 0.0001 Epoch: 29 Global Step: 622070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:39,587-Speed 6298.91 samples/sec Loss 3.3832 LearningRate 0.0001 Epoch: 29 Global Step: 622080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:42,830-Speed 6316.42 samples/sec Loss 3.3977 LearningRate 0.0001 Epoch: 29 Global Step: 622090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:46,081-Speed 6300.91 samples/sec Loss 3.4298 LearningRate 0.0001 Epoch: 29 Global Step: 622100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:49,326-Speed 6312.91 samples/sec Loss 3.4100 LearningRate 0.0001 Epoch: 29 Global Step: 622110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:52,575-Speed 6305.52 samples/sec Loss 3.4255 LearningRate 0.0001 Epoch: 29 Global Step: 622120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:55,822-Speed 6310.05 samples/sec Loss 3.3694 LearningRate 0.0001 Epoch: 29 Global Step: 622130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:18:59,069-Speed 6308.76 samples/sec Loss 3.4353 LearningRate 0.0001 Epoch: 29 Global Step: 622140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:19:02,334-Speed 6273.49 samples/sec Loss 3.4298 LearningRate 0.0001 Epoch: 29 Global Step: 622150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:19:05,607-Speed 6259.38 samples/sec Loss 3.4288 LearningRate 0.0001 Epoch: 29 
Global Step: 622160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:19:08,849-Speed 6317.96 samples/sec Loss 3.4417 LearningRate 0.0001 Epoch: 29 Global Step: 622170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:19:12,093-Speed 6315.60 samples/sec Loss 3.4438 LearningRate 0.0001 Epoch: 29 Global Step: 622180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:13,431-Speed 333.89 samples/sec Loss 3.4298 LearningRate 0.0001 Epoch: 30 Global Step: 622190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:16,669-Speed 6327.27 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 30 Global Step: 622200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:20:19,895-Speed 6349.36 samples/sec Loss 3.4214 LearningRate 0.0001 Epoch: 30 Global Step: 622210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:23,131-Speed 6330.09 samples/sec Loss 3.4559 LearningRate 0.0001 Epoch: 30 Global Step: 622220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:26,373-Speed 6319.73 samples/sec Loss 3.4045 LearningRate 0.0001 Epoch: 30 Global Step: 622230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:29,612-Speed 6322.70 samples/sec Loss 3.4162 LearningRate 0.0001 Epoch: 30 Global Step: 622240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:32,847-Speed 6332.74 samples/sec Loss 3.4086 LearningRate 0.0001 Epoch: 30 Global Step: 622250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:36,086-Speed 6323.94 samples/sec Loss 3.3886 LearningRate 0.0001 Epoch: 30 Global Step: 622260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:39,323-Speed 6329.32 samples/sec Loss 3.4516 LearningRate 0.0001 Epoch: 30 Global Step: 622270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:42,567-Speed 6314.71 samples/sec Loss 3.3707 LearningRate 0.0001 Epoch: 30 Global Step: 622280 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:20:45,812-Speed 6312.42 samples/sec Loss 3.3807 LearningRate 0.0001 Epoch: 30 Global Step: 622290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:49,052-Speed 6322.25 samples/sec Loss 3.4520 LearningRate 0.0001 Epoch: 30 Global Step: 622300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:52,283-Speed 6340.90 samples/sec Loss 3.4145 LearningRate 0.0001 Epoch: 30 Global Step: 622310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:55,521-Speed 6327.26 samples/sec Loss 3.4867 LearningRate 0.0001 Epoch: 30 Global Step: 622320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:20:58,761-Speed 6322.27 samples/sec Loss 3.4365 LearningRate 0.0001 Epoch: 30 Global Step: 622330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:02,006-Speed 6311.79 samples/sec Loss 3.4559 LearningRate 0.0001 Epoch: 30 Global Step: 622340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:05,251-Speed 6312.33 samples/sec Loss 3.3558 LearningRate 0.0001 Epoch: 30 Global Step: 622350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:08,492-Speed 6321.55 samples/sec Loss 3.4142 LearningRate 0.0001 Epoch: 30 Global Step: 622360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:11,731-Speed 6322.96 samples/sec Loss 3.4479 LearningRate 0.0001 Epoch: 30 Global Step: 622370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:14,976-Speed 6312.77 samples/sec Loss 3.3804 LearningRate 0.0001 Epoch: 30 Global Step: 622380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:18,229-Speed 6298.48 samples/sec Loss 3.3574 LearningRate 0.0001 Epoch: 30 Global Step: 622390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:21,472-Speed 6317.06 samples/sec Loss 3.4168 LearningRate 0.0001 Epoch: 30 Global Step: 622400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:21:24,698-Speed 6349.76 samples/sec Loss 3.3368 LearningRate 0.0001 Epoch: 30 Global Step: 622410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:27,949-Speed 6300.36 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 30 Global Step: 622420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:31,190-Speed 6319.21 samples/sec Loss 3.4277 LearningRate 0.0001 Epoch: 30 Global Step: 622430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:34,435-Speed 6314.36 samples/sec Loss 3.3917 LearningRate 0.0001 Epoch: 30 Global Step: 622440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:37,678-Speed 6315.57 samples/sec Loss 3.4323 LearningRate 0.0001 Epoch: 30 Global Step: 622450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:40,920-Speed 6317.62 samples/sec Loss 3.3604 LearningRate 0.0001 Epoch: 30 Global Step: 622460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:44,162-Speed 6318.08 samples/sec Loss 3.4116 LearningRate 0.0001 Epoch: 30 Global Step: 622470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:47,411-Speed 6306.55 samples/sec Loss 3.4213 LearningRate 0.0001 Epoch: 30 Global Step: 622480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:50,654-Speed 6316.84 samples/sec Loss 3.3830 LearningRate 0.0001 Epoch: 30 Global Step: 622490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:53,899-Speed 6311.86 samples/sec Loss 3.4550 LearningRate 0.0001 Epoch: 30 Global Step: 622500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:21:57,131-Speed 6338.34 samples/sec Loss 3.3752 LearningRate 0.0001 Epoch: 30 Global Step: 622510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:00,376-Speed 6312.15 samples/sec Loss 3.3956 LearningRate 0.0001 Epoch: 30 Global Step: 622520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:03,623-Speed 6309.67 samples/sec Loss 
3.3996 LearningRate 0.0001 Epoch: 30 Global Step: 622530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:06,865-Speed 6318.06 samples/sec Loss 3.3690 LearningRate 0.0001 Epoch: 30 Global Step: 622540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:10,112-Speed 6309.47 samples/sec Loss 3.3646 LearningRate 0.0001 Epoch: 30 Global Step: 622550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:13,359-Speed 6309.18 samples/sec Loss 3.3752 LearningRate 0.0001 Epoch: 30 Global Step: 622560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:16,600-Speed 6319.84 samples/sec Loss 3.3900 LearningRate 0.0001 Epoch: 30 Global Step: 622570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:19,844-Speed 6315.23 samples/sec Loss 3.3801 LearningRate 0.0001 Epoch: 30 Global Step: 622580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:23,092-Speed 6307.28 samples/sec Loss 3.3880 LearningRate 0.0001 Epoch: 30 Global Step: 622590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:26,334-Speed 6317.60 samples/sec Loss 3.4269 LearningRate 0.0001 Epoch: 30 Global Step: 622600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:29,566-Speed 6337.55 samples/sec Loss 3.3916 LearningRate 0.0001 Epoch: 30 Global Step: 622610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:32,839-Speed 6259.65 samples/sec Loss 3.3355 LearningRate 0.0001 Epoch: 30 Global Step: 622620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:36,145-Speed 6196.07 samples/sec Loss 3.4860 LearningRate 0.0001 Epoch: 30 Global Step: 622630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:39,403-Speed 6287.51 samples/sec Loss 3.4326 LearningRate 0.0001 Epoch: 30 Global Step: 622640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:42,725-Speed 6165.50 samples/sec Loss 3.4485 LearningRate 0.0001 Epoch: 30 Global 
Step: 622650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:45,971-Speed 6310.87 samples/sec Loss 3.4013 LearningRate 0.0001 Epoch: 30 Global Step: 622660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:49,218-Speed 6308.97 samples/sec Loss 3.4222 LearningRate 0.0001 Epoch: 30 Global Step: 622670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:52,465-Speed 6309.15 samples/sec Loss 3.3791 LearningRate 0.0001 Epoch: 30 Global Step: 622680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:55,711-Speed 6310.45 samples/sec Loss 3.4104 LearningRate 0.0001 Epoch: 30 Global Step: 622690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:22:58,968-Speed 6289.76 samples/sec Loss 3.3456 LearningRate 0.0001 Epoch: 30 Global Step: 622700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:02,210-Speed 6317.26 samples/sec Loss 3.3851 LearningRate 0.0001 Epoch: 30 Global Step: 622710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:23:05,443-Speed 6336.98 samples/sec Loss 3.4116 LearningRate 0.0001 Epoch: 30 Global Step: 622720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:08,687-Speed 6314.27 samples/sec Loss 3.3879 LearningRate 0.0001 Epoch: 30 Global Step: 622730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:11,936-Speed 6305.60 samples/sec Loss 3.3613 LearningRate 0.0001 Epoch: 30 Global Step: 622740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:15,182-Speed 6311.12 samples/sec Loss 3.4246 LearningRate 0.0001 Epoch: 30 Global Step: 622750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:18,422-Speed 6321.64 samples/sec Loss 3.4158 LearningRate 0.0001 Epoch: 30 Global Step: 622760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:21,671-Speed 6306.54 samples/sec Loss 3.3960 LearningRate 0.0001 Epoch: 30 Global Step: 622770 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:23:24,914-Speed 6316.80 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 30 Global Step: 622780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:28,161-Speed 6308.70 samples/sec Loss 3.4100 LearningRate 0.0001 Epoch: 30 Global Step: 622790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:31,407-Speed 6309.28 samples/sec Loss 3.3559 LearningRate 0.0001 Epoch: 30 Global Step: 622800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:34,648-Speed 6321.17 samples/sec Loss 3.4210 LearningRate 0.0001 Epoch: 30 Global Step: 622810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:37,881-Speed 6336.50 samples/sec Loss 3.4029 LearningRate 0.0001 Epoch: 30 Global Step: 622820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:41,238-Speed 6102.21 samples/sec Loss 3.4361 LearningRate 0.0001 Epoch: 30 Global Step: 622830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:44,503-Speed 6273.65 samples/sec Loss 3.4433 LearningRate 0.0001 Epoch: 30 Global Step: 622840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:47,744-Speed 6320.59 samples/sec Loss 3.4626 LearningRate 0.0001 Epoch: 30 Global Step: 622850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:50,986-Speed 6319.16 samples/sec Loss 3.4040 LearningRate 0.0001 Epoch: 30 Global Step: 622860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:54,272-Speed 6233.71 samples/sec Loss 3.3731 LearningRate 0.0001 Epoch: 30 Global Step: 622870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:23:57,516-Speed 6314.28 samples/sec Loss 3.4136 LearningRate 0.0001 Epoch: 30 Global Step: 622880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:00,762-Speed 6310.18 samples/sec Loss 3.4225 LearningRate 0.0001 Epoch: 30 Global Step: 622890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:24:04,004-Speed 6318.81 samples/sec Loss 3.4063 LearningRate 0.0001 Epoch: 30 Global Step: 622900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:07,251-Speed 6309.14 samples/sec Loss 3.4255 LearningRate 0.0001 Epoch: 30 Global Step: 622910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:10,494-Speed 6315.88 samples/sec Loss 3.3842 LearningRate 0.0001 Epoch: 30 Global Step: 622920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:24:13,734-Speed 6323.13 samples/sec Loss 3.4071 LearningRate 0.0001 Epoch: 30 Global Step: 622930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:16,979-Speed 6312.18 samples/sec Loss 3.3472 LearningRate 0.0001 Epoch: 30 Global Step: 622940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:20,220-Speed 6321.79 samples/sec Loss 3.4080 LearningRate 0.0001 Epoch: 30 Global Step: 622950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:23,465-Speed 6311.84 samples/sec Loss 3.4711 LearningRate 0.0001 Epoch: 30 Global Step: 622960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:26,709-Speed 6315.46 samples/sec Loss 3.3615 LearningRate 0.0001 Epoch: 30 Global Step: 622970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:29,956-Speed 6308.70 samples/sec Loss 3.4609 LearningRate 0.0001 Epoch: 30 Global Step: 622980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:33,203-Speed 6309.46 samples/sec Loss 3.4143 LearningRate 0.0001 Epoch: 30 Global Step: 622990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:36,443-Speed 6321.88 samples/sec Loss 3.3448 LearningRate 0.0001 Epoch: 30 Global Step: 623000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:39,694-Speed 6302.31 samples/sec Loss 3.4214 LearningRate 0.0001 Epoch: 30 Global Step: 623010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:42,937-Speed 6315.56 samples/sec 
Loss 3.3942 LearningRate 0.0001 Epoch: 30 Global Step: 623020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:46,165-Speed 6346.24 samples/sec Loss 3.3968 LearningRate 0.0001 Epoch: 30 Global Step: 623030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:49,407-Speed 6318.29 samples/sec Loss 3.3887 LearningRate 0.0001 Epoch: 30 Global Step: 623040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:52,646-Speed 6323.58 samples/sec Loss 3.3328 LearningRate 0.0001 Epoch: 30 Global Step: 623050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:55,893-Speed 6310.41 samples/sec Loss 3.4156 LearningRate 0.0001 Epoch: 30 Global Step: 623060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:24:59,147-Speed 6294.89 samples/sec Loss 3.3379 LearningRate 0.0001 Epoch: 30 Global Step: 623070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:02,390-Speed 6316.71 samples/sec Loss 3.3591 LearningRate 0.0001 Epoch: 30 Global Step: 623080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:05,629-Speed 6323.28 samples/sec Loss 3.3863 LearningRate 0.0001 Epoch: 30 Global Step: 623090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:08,872-Speed 6318.08 samples/sec Loss 3.4066 LearningRate 0.0001 Epoch: 30 Global Step: 623100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:12,119-Speed 6307.82 samples/sec Loss 3.3775 LearningRate 0.0001 Epoch: 30 Global Step: 623110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:15,366-Speed 6308.16 samples/sec Loss 3.4159 LearningRate 0.0001 Epoch: 30 Global Step: 623120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:18,596-Speed 6342.87 samples/sec Loss 3.4309 LearningRate 0.0001 Epoch: 30 Global Step: 623130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:21,843-Speed 6309.52 samples/sec Loss 3.4558 LearningRate 0.0001 Epoch: 30 
Global Step: 623140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:25,086-Speed 6316.08 samples/sec Loss 3.4233 LearningRate 0.0001 Epoch: 30 Global Step: 623150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:28,331-Speed 6312.70 samples/sec Loss 3.4230 LearningRate 0.0001 Epoch: 30 Global Step: 623160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:31,573-Speed 6318.97 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 30 Global Step: 623170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:34,815-Speed 6317.13 samples/sec Loss 3.4544 LearningRate 0.0001 Epoch: 30 Global Step: 623180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:38,078-Speed 6279.71 samples/sec Loss 3.4286 LearningRate 0.0001 Epoch: 30 Global Step: 623190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:41,322-Speed 6314.35 samples/sec Loss 3.3482 LearningRate 0.0001 Epoch: 30 Global Step: 623200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:44,563-Speed 6320.36 samples/sec Loss 3.3590 LearningRate 0.0001 Epoch: 30 Global Step: 623210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:47,805-Speed 6319.08 samples/sec Loss 3.3536 LearningRate 0.0001 Epoch: 30 Global Step: 623220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:51,038-Speed 6336.24 samples/sec Loss 3.4122 LearningRate 0.0001 Epoch: 30 Global Step: 623230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:54,281-Speed 6316.08 samples/sec Loss 3.4448 LearningRate 0.0001 Epoch: 30 Global Step: 623240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:25:57,524-Speed 6317.04 samples/sec Loss 3.3358 LearningRate 0.0001 Epoch: 30 Global Step: 623250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:00,770-Speed 6310.31 samples/sec Loss 3.3461 LearningRate 0.0001 Epoch: 30 Global Step: 623260 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:26:04,083-Speed 6182.06 samples/sec Loss 3.4045 LearningRate 0.0001 Epoch: 30 Global Step: 623270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:07,384-Speed 6206.71 samples/sec Loss 3.3394 LearningRate 0.0001 Epoch: 30 Global Step: 623280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:10,658-Speed 6257.18 samples/sec Loss 3.4548 LearningRate 0.0001 Epoch: 30 Global Step: 623290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:13,961-Speed 6201.48 samples/sec Loss 3.4514 LearningRate 0.0001 Epoch: 30 Global Step: 623300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:17,247-Speed 6232.15 samples/sec Loss 3.4547 LearningRate 0.0001 Epoch: 30 Global Step: 623310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:20,491-Speed 6315.59 samples/sec Loss 3.4447 LearningRate 0.0001 Epoch: 30 Global Step: 623320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:23,736-Speed 6312.37 samples/sec Loss 3.4248 LearningRate 0.0001 Epoch: 30 Global Step: 623330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:26:26,968-Speed 6339.01 samples/sec Loss 3.4085 LearningRate 0.0001 Epoch: 30 Global Step: 623340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:30,211-Speed 6316.72 samples/sec Loss 3.4298 LearningRate 0.0001 Epoch: 30 Global Step: 623350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:33,458-Speed 6308.85 samples/sec Loss 3.4159 LearningRate 0.0001 Epoch: 30 Global Step: 623360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:36,703-Speed 6311.69 samples/sec Loss 3.4843 LearningRate 0.0001 Epoch: 30 Global Step: 623370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:39,949-Speed 6311.91 samples/sec Loss 3.3993 LearningRate 0.0001 Epoch: 30 Global Step: 623380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:26:43,193-Speed 6313.28 samples/sec Loss 3.4298 LearningRate 0.0001 Epoch: 30 Global Step: 623390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:46,442-Speed 6305.97 samples/sec Loss 3.4654 LearningRate 0.0001 Epoch: 30 Global Step: 623400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:49,689-Speed 6308.62 samples/sec Loss 3.3961 LearningRate 0.0001 Epoch: 30 Global Step: 623410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:52,936-Speed 6309.11 samples/sec Loss 3.3810 LearningRate 0.0001 Epoch: 30 Global Step: 623420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:56,181-Speed 6312.29 samples/sec Loss 3.3761 LearningRate 0.0001 Epoch: 30 Global Step: 623430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:26:59,414-Speed 6336.73 samples/sec Loss 3.4089 LearningRate 0.0001 Epoch: 30 Global Step: 623440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:02,657-Speed 6315.67 samples/sec Loss 3.4453 LearningRate 0.0001 Epoch: 30 Global Step: 623450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:05,899-Speed 6319.48 samples/sec Loss 3.3612 LearningRate 0.0001 Epoch: 30 Global Step: 623460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:09,147-Speed 6307.50 samples/sec Loss 3.3387 LearningRate 0.0001 Epoch: 30 Global Step: 623470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:12,391-Speed 6313.68 samples/sec Loss 3.4228 LearningRate 0.0001 Epoch: 30 Global Step: 623480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:15,632-Speed 6320.20 samples/sec Loss 3.3636 LearningRate 0.0001 Epoch: 30 Global Step: 623490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:18,873-Speed 6321.58 samples/sec Loss 3.4892 LearningRate 0.0001 Epoch: 30 Global Step: 623500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:22,118-Speed 6312.21 samples/sec Loss 
3.3180 LearningRate 0.0001 Epoch: 30 Global Step: 623510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:25,364-Speed 6311.78 samples/sec Loss 3.3454 LearningRate 0.0001 Epoch: 30 Global Step: 623520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:28,702-Speed 6134.93 samples/sec Loss 3.3653 LearningRate 0.0001 Epoch: 30 Global Step: 623530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:31,935-Speed 6337.61 samples/sec Loss 3.4085 LearningRate 0.0001 Epoch: 30 Global Step: 623540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:35,184-Speed 6304.22 samples/sec Loss 3.4454 LearningRate 0.0001 Epoch: 30 Global Step: 623550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:38,430-Speed 6311.00 samples/sec Loss 3.3530 LearningRate 0.0001 Epoch: 30 Global Step: 623560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:41,674-Speed 6315.34 samples/sec Loss 3.4581 LearningRate 0.0001 Epoch: 30 Global Step: 623570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:44,924-Speed 6301.72 samples/sec Loss 3.4517 LearningRate 0.0001 Epoch: 30 Global Step: 623580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:48,170-Speed 6310.92 samples/sec Loss 3.3944 LearningRate 0.0001 Epoch: 30 Global Step: 623590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:51,416-Speed 6312.02 samples/sec Loss 3.4380 LearningRate 0.0001 Epoch: 30 Global Step: 623600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:54,657-Speed 6318.43 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 30 Global Step: 623610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:27:57,904-Speed 6309.66 samples/sec Loss 3.3679 LearningRate 0.0001 Epoch: 30 Global Step: 623620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:01,151-Speed 6310.31 samples/sec Loss 3.4196 LearningRate 0.0001 Epoch: 30 Global 
Step: 623630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:04,412-Speed 6281.35 samples/sec Loss 3.3637 LearningRate 0.0001 Epoch: 30 Global Step: 623640 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:28:07,638-Speed 6349.94 samples/sec Loss 3.4049 LearningRate 0.0001 Epoch: 30 Global Step: 623650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:10,882-Speed 6315.15 samples/sec Loss 3.3822 LearningRate 0.0001 Epoch: 30 Global Step: 623660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:14,131-Speed 6303.80 samples/sec Loss 3.4469 LearningRate 0.0001 Epoch: 30 Global Step: 623670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:17,377-Speed 6310.29 samples/sec Loss 3.3473 LearningRate 0.0001 Epoch: 30 Global Step: 623680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:20,624-Speed 6309.65 samples/sec Loss 3.4079 LearningRate 0.0001 Epoch: 30 Global Step: 623690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:23,869-Speed 6313.46 samples/sec Loss 3.4333 LearningRate 0.0001 Epoch: 30 Global Step: 623700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:27,111-Speed 6318.05 samples/sec Loss 3.3638 LearningRate 0.0001 Epoch: 30 Global Step: 623710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:30,356-Speed 6313.14 samples/sec Loss 3.3492 LearningRate 0.0001 Epoch: 30 Global Step: 623720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:33,602-Speed 6310.06 samples/sec Loss 3.3002 LearningRate 0.0001 Epoch: 30 Global Step: 623730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:36,845-Speed 6316.49 samples/sec Loss 3.3982 LearningRate 0.0001 Epoch: 30 Global Step: 623740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:40,075-Speed 6342.89 samples/sec Loss 3.3859 LearningRate 0.0001 Epoch: 30 Global Step: 623750 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:28:43,321-Speed 6310.07 samples/sec Loss 3.3904 LearningRate 0.0001 Epoch: 30 Global Step: 623760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:46,567-Speed 6310.67 samples/sec Loss 3.4443 LearningRate 0.0001 Epoch: 30 Global Step: 623770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:49,813-Speed 6311.48 samples/sec Loss 3.4125 LearningRate 0.0001 Epoch: 30 Global Step: 623780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:53,058-Speed 6311.60 samples/sec Loss 3.4027 LearningRate 0.0001 Epoch: 30 Global Step: 623790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:56,305-Speed 6313.01 samples/sec Loss 3.4617 LearningRate 0.0001 Epoch: 30 Global Step: 623800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:28:59,546-Speed 6320.22 samples/sec Loss 3.3684 LearningRate 0.0001 Epoch: 30 Global Step: 623810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:02,789-Speed 6316.00 samples/sec Loss 3.4437 LearningRate 0.0001 Epoch: 30 Global Step: 623820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:06,037-Speed 6306.46 samples/sec Loss 3.3887 LearningRate 0.0001 Epoch: 30 Global Step: 623830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:09,280-Speed 6317.88 samples/sec Loss 3.3830 LearningRate 0.0001 Epoch: 30 Global Step: 623840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:12,526-Speed 6310.02 samples/sec Loss 3.3267 LearningRate 0.0001 Epoch: 30 Global Step: 623850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:15,773-Speed 6309.57 samples/sec Loss 3.3843 LearningRate 0.0001 Epoch: 30 Global Step: 623860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:19,019-Speed 6310.37 samples/sec Loss 3.4435 LearningRate 0.0001 Epoch: 30 Global Step: 623870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:29:22,268-Speed 6305.19 samples/sec Loss 3.3503 LearningRate 0.0001 Epoch: 30 Global Step: 623880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:25,516-Speed 6307.26 samples/sec Loss 3.4096 LearningRate 0.0001 Epoch: 30 Global Step: 623890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:28,766-Speed 6302.89 samples/sec Loss 3.3642 LearningRate 0.0001 Epoch: 30 Global Step: 623900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:32,010-Speed 6314.64 samples/sec Loss 3.3874 LearningRate 0.0001 Epoch: 30 Global Step: 623910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:35,253-Speed 6317.45 samples/sec Loss 3.3674 LearningRate 0.0001 Epoch: 30 Global Step: 623920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:38,497-Speed 6313.51 samples/sec Loss 3.4265 LearningRate 0.0001 Epoch: 30 Global Step: 623930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:41,741-Speed 6316.55 samples/sec Loss 3.4048 LearningRate 0.0001 Epoch: 30 Global Step: 623940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:45,003-Speed 6278.32 samples/sec Loss 3.3786 LearningRate 0.0001 Epoch: 30 Global Step: 623950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:29:48,237-Speed 6335.14 samples/sec Loss 3.3746 LearningRate 0.0001 Epoch: 30 Global Step: 623960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:51,480-Speed 6315.31 samples/sec Loss 3.4333 LearningRate 0.0001 Epoch: 30 Global Step: 623970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:54,730-Speed 6302.55 samples/sec Loss 3.4571 LearningRate 0.0001 Epoch: 30 Global Step: 623980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:29:57,975-Speed 6314.25 samples/sec Loss 3.3972 LearningRate 0.0001 Epoch: 30 Global Step: 623990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:01,223-Speed 6305.71 samples/sec 
Loss 3.3960 LearningRate 0.0001 Epoch: 30 Global Step: 624000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:04,540-Speed 6175.39 samples/sec Loss 3.4269 LearningRate 0.0001 Epoch: 30 Global Step: 624010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:07,785-Speed 6313.18 samples/sec Loss 3.4078 LearningRate 0.0001 Epoch: 30 Global Step: 624020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:11,049-Speed 6275.87 samples/sec Loss 3.4053 LearningRate 0.0001 Epoch: 30 Global Step: 624030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:14,297-Speed 6306.28 samples/sec Loss 3.4202 LearningRate 0.0001 Epoch: 30 Global Step: 624040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:17,546-Speed 6306.01 samples/sec Loss 3.4252 LearningRate 0.0001 Epoch: 30 Global Step: 624050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:20,776-Speed 6341.57 samples/sec Loss 3.4285 LearningRate 0.0001 Epoch: 30 Global Step: 624060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:24,022-Speed 6312.63 samples/sec Loss 3.3579 LearningRate 0.0001 Epoch: 30 Global Step: 624070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:27,268-Speed 6309.18 samples/sec Loss 3.3922 LearningRate 0.0001 Epoch: 30 Global Step: 624080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:30,515-Speed 6310.41 samples/sec Loss 3.3897 LearningRate 0.0001 Epoch: 30 Global Step: 624090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:33,790-Speed 6254.52 samples/sec Loss 3.3369 LearningRate 0.0001 Epoch: 30 Global Step: 624100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:37,046-Speed 6291.13 samples/sec Loss 3.4316 LearningRate 0.0001 Epoch: 30 Global Step: 624110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:40,291-Speed 6313.35 samples/sec Loss 3.4606 LearningRate 0.0001 Epoch: 30 
Global Step: 624120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:43,540-Speed 6304.58 samples/sec Loss 3.3688 LearningRate 0.0001 Epoch: 30 Global Step: 624130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:46,829-Speed 6229.13 samples/sec Loss 3.3566 LearningRate 0.0001 Epoch: 30 Global Step: 624140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:50,192-Speed 6089.51 samples/sec Loss 3.3798 LearningRate 0.0001 Epoch: 30 Global Step: 624150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:53,427-Speed 6333.68 samples/sec Loss 3.3840 LearningRate 0.0001 Epoch: 30 Global Step: 624160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:56,672-Speed 6312.11 samples/sec Loss 3.3826 LearningRate 0.0001 Epoch: 30 Global Step: 624170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:30:59,915-Speed 6317.16 samples/sec Loss 3.3853 LearningRate 0.0001 Epoch: 30 Global Step: 624180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:03,204-Speed 6226.59 samples/sec Loss 3.4080 LearningRate 0.0001 Epoch: 30 Global Step: 624190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:06,450-Speed 6310.67 samples/sec Loss 3.3610 LearningRate 0.0001 Epoch: 30 Global Step: 624200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:09,722-Speed 6261.04 samples/sec Loss 3.3475 LearningRate 0.0001 Epoch: 30 Global Step: 624210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:13,077-Speed 6106.79 samples/sec Loss 3.4055 LearningRate 0.0001 Epoch: 30 Global Step: 624220 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:16,372-Speed 6215.46 samples/sec Loss 3.3639 LearningRate 0.0001 Epoch: 30 Global Step: 624230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:19,665-Speed 6221.51 samples/sec Loss 3.4699 LearningRate 0.0001 Epoch: 30 Global Step: 624240 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:31:22,911-Speed 6309.77 samples/sec Loss 3.4009 LearningRate 0.0001 Epoch: 30 Global Step: 624250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:26,146-Speed 6333.55 samples/sec Loss 3.4155 LearningRate 0.0001 Epoch: 30 Global Step: 624260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:29,392-Speed 6309.77 samples/sec Loss 3.3674 LearningRate 0.0001 Epoch: 30 Global Step: 624270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:32,636-Speed 6315.11 samples/sec Loss 3.4006 LearningRate 0.0001 Epoch: 30 Global Step: 624280 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:35,892-Speed 6291.45 samples/sec Loss 3.3865 LearningRate 0.0001 Epoch: 30 Global Step: 624290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:39,139-Speed 6309.78 samples/sec Loss 3.4347 LearningRate 0.0001 Epoch: 30 Global Step: 624300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:42,387-Speed 6307.16 samples/sec Loss 3.3822 LearningRate 0.0001 Epoch: 30 Global Step: 624310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:45,629-Speed 6319.17 samples/sec Loss 3.3877 LearningRate 0.0001 Epoch: 30 Global Step: 624320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:48,873-Speed 6314.13 samples/sec Loss 3.3851 LearningRate 0.0001 Epoch: 30 Global Step: 624330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:52,117-Speed 6313.81 samples/sec Loss 3.4119 LearningRate 0.0001 Epoch: 30 Global Step: 624340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:55,365-Speed 6308.19 samples/sec Loss 3.4220 LearningRate 0.0001 Epoch: 30 Global Step: 624350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:31:58,610-Speed 6311.14 samples/sec Loss 3.3965 LearningRate 0.0001 Epoch: 30 Global Step: 624360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 
00:32:01,848-Speed 6326.19 samples/sec Loss 3.4040 LearningRate 0.0001 Epoch: 30 Global Step: 624370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:05,100-Speed 6298.97 samples/sec Loss 3.3391 LearningRate 0.0001 Epoch: 30 Global Step: 624380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:08,344-Speed 6315.07 samples/sec Loss 3.4211 LearningRate 0.0001 Epoch: 30 Global Step: 624390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:11,593-Speed 6305.38 samples/sec Loss 3.3271 LearningRate 0.0001 Epoch: 30 Global Step: 624400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:14,840-Speed 6308.47 samples/sec Loss 3.3535 LearningRate 0.0001 Epoch: 30 Global Step: 624410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:18,085-Speed 6313.52 samples/sec Loss 3.4283 LearningRate 0.0001 Epoch: 30 Global Step: 624420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:21,335-Speed 6302.48 samples/sec Loss 3.4554 LearningRate 0.0001 Epoch: 30 Global Step: 624430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:24,614-Speed 6247.51 samples/sec Loss 3.3907 LearningRate 0.0001 Epoch: 30 Global Step: 624440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:27,863-Speed 6305.10 samples/sec Loss 3.3723 LearningRate 0.0001 Epoch: 30 Global Step: 624450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:31,134-Speed 6261.97 samples/sec Loss 3.3398 LearningRate 0.0001 Epoch: 30 Global Step: 624460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:34,406-Speed 6259.28 samples/sec Loss 3.3605 LearningRate 0.0001 Epoch: 30 Global Step: 624470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:37,652-Speed 6310.53 samples/sec Loss 3.3988 LearningRate 0.0001 Epoch: 30 Global Step: 624480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:40,897-Speed 6314.18 samples/sec Loss 
3.3864 LearningRate 0.0001 Epoch: 30 Global Step: 624490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:44,143-Speed 6310.79 samples/sec Loss 3.3801 LearningRate 0.0001 Epoch: 30 Global Step: 624500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:47,387-Speed 6315.46 samples/sec Loss 3.4425 LearningRate 0.0001 Epoch: 30 Global Step: 624510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:50,631-Speed 6315.13 samples/sec Loss 3.3592 LearningRate 0.0001 Epoch: 30 Global Step: 624520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:53,872-Speed 6319.47 samples/sec Loss 3.3854 LearningRate 0.0001 Epoch: 30 Global Step: 624530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:32:57,119-Speed 6308.68 samples/sec Loss 3.4168 LearningRate 0.0001 Epoch: 30 Global Step: 624540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:00,370-Speed 6300.99 samples/sec Loss 3.4066 LearningRate 0.0001 Epoch: 30 Global Step: 624550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:03,615-Speed 6312.34 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 624560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:06,844-Speed 6344.23 samples/sec Loss 3.3631 LearningRate 0.0001 Epoch: 30 Global Step: 624570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:10,090-Speed 6312.22 samples/sec Loss 3.3628 LearningRate 0.0001 Epoch: 30 Global Step: 624580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:13,336-Speed 6309.58 samples/sec Loss 3.3689 LearningRate 0.0001 Epoch: 30 Global Step: 624590 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:16,582-Speed 6310.65 samples/sec Loss 3.3843 LearningRate 0.0001 Epoch: 30 Global Step: 624600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:19,824-Speed 6318.72 samples/sec Loss 3.3548 LearningRate 0.0001 Epoch: 30 Global 
Step: 624610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:23,068-Speed 6315.42 samples/sec Loss 3.4096 LearningRate 0.0001 Epoch: 30 Global Step: 624620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:26,313-Speed 6312.87 samples/sec Loss 3.3577 LearningRate 0.0001 Epoch: 30 Global Step: 624630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:29,565-Speed 6297.64 samples/sec Loss 3.4228 LearningRate 0.0001 Epoch: 30 Global Step: 624640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:32,813-Speed 6306.70 samples/sec Loss 3.4115 LearningRate 0.0001 Epoch: 30 Global Step: 624650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:36,077-Speed 6276.50 samples/sec Loss 3.3776 LearningRate 0.0001 Epoch: 30 Global Step: 624660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:39,323-Speed 6310.09 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 30 Global Step: 624670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:33:42,556-Speed 6336.94 samples/sec Loss 3.3735 LearningRate 0.0001 Epoch: 30 Global Step: 624680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:45,803-Speed 6308.93 samples/sec Loss 3.3663 LearningRate 0.0001 Epoch: 30 Global Step: 624690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:49,051-Speed 6307.03 samples/sec Loss 3.4315 LearningRate 0.0001 Epoch: 30 Global Step: 624700 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:52,331-Speed 6245.83 samples/sec Loss 3.3834 LearningRate 0.0001 Epoch: 30 Global Step: 624710 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:55,576-Speed 6312.83 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 30 Global Step: 624720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:33:58,926-Speed 6114.35 samples/sec Loss 3.3787 LearningRate 0.0001 Epoch: 30 Global Step: 624730 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:34:02,287-Speed 6096.33 samples/sec Loss 3.3792 LearningRate 0.0001 Epoch: 30 Global Step: 624740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:05,621-Speed 6143.24 samples/sec Loss 3.4164 LearningRate 0.0001 Epoch: 30 Global Step: 624750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:08,865-Speed 6314.08 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 624760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:12,110-Speed 6313.70 samples/sec Loss 3.3693 LearningRate 0.0001 Epoch: 30 Global Step: 624770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:15,341-Speed 6339.62 samples/sec Loss 3.4395 LearningRate 0.0001 Epoch: 30 Global Step: 624780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:18,673-Speed 6147.61 samples/sec Loss 3.4082 LearningRate 0.0001 Epoch: 30 Global Step: 624790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:22,029-Speed 6104.44 samples/sec Loss 3.3357 LearningRate 0.0001 Epoch: 30 Global Step: 624800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:25,273-Speed 6314.46 samples/sec Loss 3.4391 LearningRate 0.0001 Epoch: 30 Global Step: 624810 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:28,518-Speed 6312.98 samples/sec Loss 3.4207 LearningRate 0.0001 Epoch: 30 Global Step: 624820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:31,762-Speed 6312.85 samples/sec Loss 3.3657 LearningRate 0.0001 Epoch: 30 Global Step: 624830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:35,005-Speed 6318.25 samples/sec Loss 3.3671 LearningRate 0.0001 Epoch: 30 Global Step: 624840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:38,249-Speed 6313.76 samples/sec Loss 3.3827 LearningRate 0.0001 Epoch: 30 Global Step: 624850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:34:41,493-Speed 6315.18 samples/sec Loss 3.3912 LearningRate 0.0001 Epoch: 30 Global Step: 624860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:34:44,722-Speed 6343.63 samples/sec Loss 3.4157 LearningRate 0.0001 Epoch: 30 Global Step: 624870 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:34:47,964-Speed 6317.88 samples/sec Loss 3.3784 LearningRate 0.0001 Epoch: 30 Global Step: 624880 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:34:51,209-Speed 6312.46 samples/sec Loss 3.3648 LearningRate 0.0001 Epoch: 30 Global Step: 624890 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:34:54,454-Speed 6313.97 samples/sec Loss 3.3376 LearningRate 0.0001 Epoch: 30 Global Step: 624900 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:34:57,703-Speed 6303.14 samples/sec Loss 3.4224 LearningRate 0.0001 Epoch: 30 Global Step: 624910 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:00,948-Speed 6314.12 samples/sec Loss 3.3867 LearningRate 0.0001 Epoch: 30 Global Step: 624920 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:04,195-Speed 6308.22 samples/sec Loss 3.3669 LearningRate 0.0001 Epoch: 30 Global Step: 624930 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:07,442-Speed 6308.77 samples/sec Loss 3.4197 LearningRate 0.0001 Epoch: 30 Global Step: 624940 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:10,687-Speed 6313.01 samples/sec Loss 3.3937 LearningRate 0.0001 Epoch: 30 Global Step: 624950 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:13,933-Speed 6310.46 samples/sec Loss 3.4005 LearningRate 0.0001 Epoch: 30 Global Step: 624960 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-03 00:35:17,178-Speed 6313.97 samples/sec Loss 3.4248 LearningRate 0.0001 Epoch: 30 Global Step: 624970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:20,423-Speed 6312.63 samples/sec Loss 
3.4225 LearningRate 0.0001 Epoch: 30 Global Step: 624980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:23,672-Speed 6304.42 samples/sec Loss 3.4725 LearningRate 0.0001 Epoch: 30 Global Step: 624990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:26,913-Speed 6320.20 samples/sec Loss 3.3773 LearningRate 0.0001 Epoch: 30 Global Step: 625000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:30,161-Speed 6307.12 samples/sec Loss 3.3621 LearningRate 0.0001 Epoch: 30 Global Step: 625010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:33,405-Speed 6314.64 samples/sec Loss 3.3412 LearningRate 0.0001 Epoch: 30 Global Step: 625020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:36,652-Speed 6309.26 samples/sec Loss 3.3558 LearningRate 0.0001 Epoch: 30 Global Step: 625030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:39,894-Speed 6317.72 samples/sec Loss 3.4251 LearningRate 0.0001 Epoch: 30 Global Step: 625040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:43,137-Speed 6315.95 samples/sec Loss 3.4197 LearningRate 0.0001 Epoch: 30 Global Step: 625050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:46,383-Speed 6311.94 samples/sec Loss 3.3727 LearningRate 0.0001 Epoch: 30 Global Step: 625060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:49,626-Speed 6315.75 samples/sec Loss 3.3161 LearningRate 0.0001 Epoch: 30 Global Step: 625070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:35:52,858-Speed 6338.86 samples/sec Loss 3.4083 LearningRate 0.0001 Epoch: 30 Global Step: 625080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:56,104-Speed 6310.20 samples/sec Loss 3.3891 LearningRate 0.0001 Epoch: 30 Global Step: 625090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:35:59,349-Speed 6313.92 samples/sec Loss 3.3924 LearningRate 0.0001 Epoch: 30 
Global Step: 625100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:02,604-Speed 6296.46 samples/sec Loss 3.3776 LearningRate 0.0001 Epoch: 30 Global Step: 625110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:05,849-Speed 6312.87 samples/sec Loss 3.4508 LearningRate 0.0001 Epoch: 30 Global Step: 625120 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:09,094-Speed 6312.06 samples/sec Loss 3.3562 LearningRate 0.0001 Epoch: 30 Global Step: 625130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:12,343-Speed 6305.56 samples/sec Loss 3.3884 LearningRate 0.0001 Epoch: 30 Global Step: 625140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:15,609-Speed 6270.79 samples/sec Loss 3.3685 LearningRate 0.0001 Epoch: 30 Global Step: 625150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:18,939-Speed 6151.56 samples/sec Loss 3.3773 LearningRate 0.0001 Epoch: 30 Global Step: 625160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:22,191-Speed 6300.84 samples/sec Loss 3.3884 LearningRate 0.0001 Epoch: 30 Global Step: 625170 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:25,424-Speed 6336.21 samples/sec Loss 3.3427 LearningRate 0.0001 Epoch: 30 Global Step: 625180 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:28,674-Speed 6301.61 samples/sec Loss 3.4279 LearningRate 0.0001 Epoch: 30 Global Step: 625190 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:31,925-Speed 6301.76 samples/sec Loss 3.3602 LearningRate 0.0001 Epoch: 30 Global Step: 625200 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:35,172-Speed 6307.73 samples/sec Loss 3.3752 LearningRate 0.0001 Epoch: 30 Global Step: 625210 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:38,418-Speed 6311.49 samples/sec Loss 3.4491 LearningRate 0.0001 Epoch: 30 Global Step: 625220 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:36:41,669-Speed 6300.57 samples/sec Loss 3.4183 LearningRate 0.0001 Epoch: 30 Global Step: 625230 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:44,918-Speed 6306.49 samples/sec Loss 3.4225 LearningRate 0.0001 Epoch: 30 Global Step: 625240 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:48,163-Speed 6312.56 samples/sec Loss 3.4385 LearningRate 0.0001 Epoch: 30 Global Step: 625250 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:51,409-Speed 6311.03 samples/sec Loss 3.4332 LearningRate 0.0001 Epoch: 30 Global Step: 625260 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:54,652-Speed 6315.65 samples/sec Loss 3.4007 LearningRate 0.0001 Epoch: 30 Global Step: 625270 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:36:57,902-Speed 6303.72 samples/sec Loss 3.4180 LearningRate 0.0001 Epoch: 30 Global Step: 625280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:37:01,134-Speed 6337.72 samples/sec Loss 3.4077 LearningRate 0.0001 Epoch: 30 Global Step: 625290 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:04,381-Speed 6308.35 samples/sec Loss 3.3949 LearningRate 0.0001 Epoch: 30 Global Step: 625300 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:07,635-Speed 6294.38 samples/sec Loss 3.3307 LearningRate 0.0001 Epoch: 30 Global Step: 625310 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:10,888-Speed 6301.47 samples/sec Loss 3.3682 LearningRate 0.0001 Epoch: 30 Global Step: 625320 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:14,135-Speed 6308.99 samples/sec Loss 3.3690 LearningRate 0.0001 Epoch: 30 Global Step: 625330 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:17,377-Speed 6316.95 samples/sec Loss 3.4203 LearningRate 0.0001 Epoch: 30 Global Step: 625340 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:37:20,628-Speed 6300.75 samples/sec Loss 3.3780 LearningRate 0.0001 Epoch: 30 Global Step: 625350 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:23,879-Speed 6302.14 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 30 Global Step: 625360 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:27,125-Speed 6309.82 samples/sec Loss 3.3819 LearningRate 0.0001 Epoch: 30 Global Step: 625370 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:30,371-Speed 6312.55 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 30 Global Step: 625380 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:33,599-Speed 6345.02 samples/sec Loss 3.4429 LearningRate 0.0001 Epoch: 30 Global Step: 625390 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:36,856-Speed 6289.79 samples/sec Loss 3.4481 LearningRate 0.0001 Epoch: 30 Global Step: 625400 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:40,120-Speed 6276.22 samples/sec Loss 3.4136 LearningRate 0.0001 Epoch: 30 Global Step: 625410 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:43,365-Speed 6312.90 samples/sec Loss 3.4312 LearningRate 0.0001 Epoch: 30 Global Step: 625420 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:46,612-Speed 6307.75 samples/sec Loss 3.3588 LearningRate 0.0001 Epoch: 30 Global Step: 625430 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:49,867-Speed 6294.13 samples/sec Loss 3.3997 LearningRate 0.0001 Epoch: 30 Global Step: 625440 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:53,117-Speed 6303.45 samples/sec Loss 3.3845 LearningRate 0.0001 Epoch: 30 Global Step: 625450 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:56,359-Speed 6317.40 samples/sec Loss 3.3773 LearningRate 0.0001 Epoch: 30 Global Step: 625460 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:37:59,614-Speed 6293.44 samples/sec Loss 
3.4020 LearningRate 0.0001 Epoch: 30 Global Step: 625470 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:02,857-Speed 6315.98 samples/sec Loss 3.3875 LearningRate 0.0001 Epoch: 30 Global Step: 625480 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:06,089-Speed 6339.58 samples/sec Loss 3.3532 LearningRate 0.0001 Epoch: 30 Global Step: 625490 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:09,336-Speed 6308.19 samples/sec Loss 3.3815 LearningRate 0.0001 Epoch: 30 Global Step: 625500 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:12,581-Speed 6311.78 samples/sec Loss 3.3569 LearningRate 0.0001 Epoch: 30 Global Step: 625510 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:15,826-Speed 6313.97 samples/sec Loss 3.3814 LearningRate 0.0001 Epoch: 30 Global Step: 625520 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:19,074-Speed 6306.25 samples/sec Loss 3.3752 LearningRate 0.0001 Epoch: 30 Global Step: 625530 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:22,319-Speed 6312.14 samples/sec Loss 3.3885 LearningRate 0.0001 Epoch: 30 Global Step: 625540 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:25,565-Speed 6310.87 samples/sec Loss 3.3704 LearningRate 0.0001 Epoch: 30 Global Step: 625550 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:28,810-Speed 6313.98 samples/sec Loss 3.3161 LearningRate 0.0001 Epoch: 30 Global Step: 625560 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:32,055-Speed 6312.59 samples/sec Loss 3.3860 LearningRate 0.0001 Epoch: 30 Global Step: 625570 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:35,297-Speed 6317.59 samples/sec Loss 3.3441 LearningRate 0.0001 Epoch: 30 Global Step: 625580 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:38,539-Speed 6318.24 samples/sec Loss 3.4264 LearningRate 0.0001 Epoch: 30 Global 
Step: 625590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:38:41,771-Speed 6339.71 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 30 Global Step: 625600 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:45,024-Speed 6296.96 samples/sec Loss 3.3673 LearningRate 0.0001 Epoch: 30 Global Step: 625610 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:48,270-Speed 6309.73 samples/sec Loss 3.3409 LearningRate 0.0001 Epoch: 30 Global Step: 625620 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:51,519-Speed 6306.03 samples/sec Loss 3.3301 LearningRate 0.0001 Epoch: 30 Global Step: 625630 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:54,765-Speed 6311.08 samples/sec Loss 3.3967 LearningRate 0.0001 Epoch: 30 Global Step: 625640 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:38:58,008-Speed 6315.32 samples/sec Loss 3.4222 LearningRate 0.0001 Epoch: 30 Global Step: 625650 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:01,253-Speed 6312.65 samples/sec Loss 3.4334 LearningRate 0.0001 Epoch: 30 Global Step: 625660 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:04,530-Speed 6250.70 samples/sec Loss 3.3993 LearningRate 0.0001 Epoch: 30 Global Step: 625670 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:07,778-Speed 6308.06 samples/sec Loss 3.3665 LearningRate 0.0001 Epoch: 30 Global Step: 625680 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:11,025-Speed 6309.31 samples/sec Loss 3.3017 LearningRate 0.0001 Epoch: 30 Global Step: 625690 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:14,282-Speed 6288.51 samples/sec Loss 3.3716 LearningRate 0.0001 Epoch: 30 Global Step: 625700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:39:17,548-Speed 6271.80 samples/sec Loss 3.3288 LearningRate 0.0001 Epoch: 30 Global Step: 625710 Fp16 Grad Scale: 8192 
Required: 19 hours Training: 2022-04-03 00:39:20,796-Speed 6306.61 samples/sec Loss 3.3782 LearningRate 0.0001 Epoch: 30 Global Step: 625720 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:24,040-Speed 6314.24 samples/sec Loss 3.3819 LearningRate 0.0001 Epoch: 30 Global Step: 625730 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:27,287-Speed 6309.82 samples/sec Loss 3.3534 LearningRate 0.0001 Epoch: 30 Global Step: 625740 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:30,531-Speed 6313.93 samples/sec Loss 3.3947 LearningRate 0.0001 Epoch: 30 Global Step: 625750 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:33,778-Speed 6309.44 samples/sec Loss 3.3899 LearningRate 0.0001 Epoch: 30 Global Step: 625760 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:37,022-Speed 6313.67 samples/sec Loss 3.3622 LearningRate 0.0001 Epoch: 30 Global Step: 625770 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:40,327-Speed 6199.16 samples/sec Loss 3.3429 LearningRate 0.0001 Epoch: 30 Global Step: 625780 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:43,628-Speed 6205.50 samples/sec Loss 3.4499 LearningRate 0.0001 Epoch: 30 Global Step: 625790 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:46,870-Speed 6316.76 samples/sec Loss 3.4042 LearningRate 0.0001 Epoch: 30 Global Step: 625800 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:50,115-Speed 6313.31 samples/sec Loss 3.3775 LearningRate 0.0001 Epoch: 30 Global Step: 625810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:39:53,351-Speed 6331.63 samples/sec Loss 3.3863 LearningRate 0.0001 Epoch: 30 Global Step: 625820 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:39:56,596-Speed 6311.55 samples/sec Loss 3.4025 LearningRate 0.0001 Epoch: 30 Global Step: 625830 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 
00:39:59,841-Speed 6314.48 samples/sec Loss 3.3815 LearningRate 0.0001 Epoch: 30 Global Step: 625840 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:03,098-Speed 6288.65 samples/sec Loss 3.3971 LearningRate 0.0001 Epoch: 30 Global Step: 625850 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:06,344-Speed 6311.46 samples/sec Loss 3.3961 LearningRate 0.0001 Epoch: 30 Global Step: 625860 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:09,586-Speed 6317.24 samples/sec Loss 3.4485 LearningRate 0.0001 Epoch: 30 Global Step: 625870 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:12,836-Speed 6305.03 samples/sec Loss 3.4219 LearningRate 0.0001 Epoch: 30 Global Step: 625880 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:16,083-Speed 6309.22 samples/sec Loss 3.3823 LearningRate 0.0001 Epoch: 30 Global Step: 625890 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:19,327-Speed 6313.26 samples/sec Loss 3.3767 LearningRate 0.0001 Epoch: 30 Global Step: 625900 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:22,574-Speed 6309.16 samples/sec Loss 3.4435 LearningRate 0.0001 Epoch: 30 Global Step: 625910 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:25,805-Speed 6341.46 samples/sec Loss 3.4288 LearningRate 0.0001 Epoch: 30 Global Step: 625920 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:29,055-Speed 6301.92 samples/sec Loss 3.3749 LearningRate 0.0001 Epoch: 30 Global Step: 625930 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:32,298-Speed 6317.28 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 625940 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:35,556-Speed 6287.86 samples/sec Loss 3.3617 LearningRate 0.0001 Epoch: 30 Global Step: 625950 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:38,801-Speed 6311.07 samples/sec Loss 
3.4301 LearningRate 0.0001 Epoch: 30 Global Step: 625960 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:42,045-Speed 6316.09 samples/sec Loss 3.3072 LearningRate 0.0001 Epoch: 30 Global Step: 625970 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:45,291-Speed 6309.36 samples/sec Loss 3.3850 LearningRate 0.0001 Epoch: 30 Global Step: 625980 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:48,538-Speed 6309.93 samples/sec Loss 3.3417 LearningRate 0.0001 Epoch: 30 Global Step: 625990 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:51,785-Speed 6307.33 samples/sec Loss 3.4248 LearningRate 0.0001 Epoch: 30 Global Step: 626000 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:55,031-Speed 6310.50 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 30 Global Step: 626010 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:40:58,261-Speed 6343.20 samples/sec Loss 3.3672 LearningRate 0.0001 Epoch: 30 Global Step: 626020 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:01,508-Speed 6307.71 samples/sec Loss 3.3484 LearningRate 0.0001 Epoch: 30 Global Step: 626030 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:04,752-Speed 6314.57 samples/sec Loss 3.4328 LearningRate 0.0001 Epoch: 30 Global Step: 626040 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:07,998-Speed 6311.11 samples/sec Loss 3.4548 LearningRate 0.0001 Epoch: 30 Global Step: 626050 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:11,244-Speed 6311.13 samples/sec Loss 3.3617 LearningRate 0.0001 Epoch: 30 Global Step: 626060 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:14,491-Speed 6309.46 samples/sec Loss 3.3709 LearningRate 0.0001 Epoch: 30 Global Step: 626070 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:17,737-Speed 6310.71 samples/sec Loss 3.3971 LearningRate 0.0001 Epoch: 30 Global 
Step: 626080 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:20,980-Speed 6315.85 samples/sec Loss 3.4211 LearningRate 0.0001 Epoch: 30 Global Step: 626090 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:24,227-Speed 6309.12 samples/sec Loss 3.3040 LearningRate 0.0001 Epoch: 30 Global Step: 626100 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:27,468-Speed 6320.25 samples/sec Loss 3.3717 LearningRate 0.0001 Epoch: 30 Global Step: 626110 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:30,717-Speed 6305.13 samples/sec Loss 3.4434 LearningRate 0.0001 Epoch: 30 Global Step: 626120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-03 00:41:33,949-Speed 6339.11 samples/sec Loss 3.4270 LearningRate 0.0001 Epoch: 30 Global Step: 626130 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:37,194-Speed 6312.74 samples/sec Loss 3.3216 LearningRate 0.0001 Epoch: 30 Global Step: 626140 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:40,441-Speed 6308.68 samples/sec Loss 3.3592 LearningRate 0.0001 Epoch: 30 Global Step: 626150 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:43,687-Speed 6309.50 samples/sec Loss 3.3502 LearningRate 0.0001 Epoch: 30 Global Step: 626160 Fp16 Grad Scale: 8192 Required: 19 hours Training: 2022-04-03 00:41:46,933-Speed 6311.31 samples/sec Loss 3.4169 LearningRate 0.0001 Epoch: 30 Global Step: 626170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:41:50,178-Speed 6311.75 samples/sec Loss 3.3848 LearningRate 0.0001 Epoch: 30 Global Step: 626180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:41:53,421-Speed 6316.88 samples/sec Loss 3.3842 LearningRate 0.0001 Epoch: 30 Global Step: 626190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:41:56,667-Speed 6311.00 samples/sec Loss 3.3974 LearningRate 0.0001 Epoch: 30 Global Step: 626200 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:41:59,936-Speed 6266.52 samples/sec Loss 3.4436 LearningRate 0.0001 Epoch: 30 Global Step: 626210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:03,183-Speed 6308.99 samples/sec Loss 3.4181 LearningRate 0.0001 Epoch: 30 Global Step: 626220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:06,415-Speed 6337.73 samples/sec Loss 3.4089 LearningRate 0.0001 Epoch: 30 Global Step: 626230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:09,664-Speed 6305.85 samples/sec Loss 3.3664 LearningRate 0.0001 Epoch: 30 Global Step: 626240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:12,914-Speed 6301.72 samples/sec Loss 3.3531 LearningRate 0.0001 Epoch: 30 Global Step: 626250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:16,164-Speed 6303.51 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 30 Global Step: 626260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:19,415-Speed 6300.91 samples/sec Loss 3.3799 LearningRate 0.0001 Epoch: 30 Global Step: 626270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:22,663-Speed 6307.35 samples/sec Loss 3.3955 LearningRate 0.0001 Epoch: 30 Global Step: 626280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:25,909-Speed 6311.35 samples/sec Loss 3.3517 LearningRate 0.0001 Epoch: 30 Global Step: 626290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:29,156-Speed 6309.57 samples/sec Loss 3.3648 LearningRate 0.0001 Epoch: 30 Global Step: 626300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:32,399-Speed 6316.69 samples/sec Loss 3.3365 LearningRate 0.0001 Epoch: 30 Global Step: 626310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:35,645-Speed 6309.40 samples/sec Loss 3.4012 LearningRate 0.0001 Epoch: 30 Global Step: 626320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:42:38,877-Speed 6338.28 samples/sec Loss 3.3458 LearningRate 0.0001 Epoch: 30 Global Step: 626330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:42,126-Speed 6304.91 samples/sec Loss 3.4686 LearningRate 0.0001 Epoch: 30 Global Step: 626340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:45,372-Speed 6310.32 samples/sec Loss 3.3715 LearningRate 0.0001 Epoch: 30 Global Step: 626350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:48,619-Speed 6308.39 samples/sec Loss 3.4083 LearningRate 0.0001 Epoch: 30 Global Step: 626360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:51,869-Speed 6304.78 samples/sec Loss 3.4198 LearningRate 0.0001 Epoch: 30 Global Step: 626370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:55,120-Speed 6300.25 samples/sec Loss 3.3731 LearningRate 0.0001 Epoch: 30 Global Step: 626380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:42:58,370-Speed 6303.54 samples/sec Loss 3.4279 LearningRate 0.0001 Epoch: 30 Global Step: 626390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:01,614-Speed 6314.24 samples/sec Loss 3.3817 LearningRate 0.0001 Epoch: 30 Global Step: 626400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:04,864-Speed 6302.03 samples/sec Loss 3.3819 LearningRate 0.0001 Epoch: 30 Global Step: 626410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:08,106-Speed 6318.26 samples/sec Loss 3.3859 LearningRate 0.0001 Epoch: 30 Global Step: 626420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:11,361-Speed 6294.60 samples/sec Loss 3.3530 LearningRate 0.0001 Epoch: 30 Global Step: 626430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:43:14,591-Speed 6341.71 samples/sec Loss 3.3264 LearningRate 0.0001 Epoch: 30 Global Step: 626440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:17,836-Speed 6312.08 samples/sec 
Loss 3.3814 LearningRate 0.0001 Epoch: 30 Global Step: 626450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:21,080-Speed 6314.26 samples/sec Loss 3.3536 LearningRate 0.0001 Epoch: 30 Global Step: 626460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:24,329-Speed 6305.99 samples/sec Loss 3.3937 LearningRate 0.0001 Epoch: 30 Global Step: 626470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:27,578-Speed 6304.37 samples/sec Loss 3.3473 LearningRate 0.0001 Epoch: 30 Global Step: 626480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:30,821-Speed 6316.69 samples/sec Loss 3.3744 LearningRate 0.0001 Epoch: 30 Global Step: 626490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:34,067-Speed 6310.11 samples/sec Loss 3.3274 LearningRate 0.0001 Epoch: 30 Global Step: 626500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:37,311-Speed 6315.46 samples/sec Loss 3.3396 LearningRate 0.0001 Epoch: 30 Global Step: 626510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:40,552-Speed 6319.71 samples/sec Loss 3.4487 LearningRate 0.0001 Epoch: 30 Global Step: 626520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:43,798-Speed 6311.54 samples/sec Loss 3.3524 LearningRate 0.0001 Epoch: 30 Global Step: 626530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:47,027-Speed 6344.25 samples/sec Loss 3.3860 LearningRate 0.0001 Epoch: 30 Global Step: 626540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:50,276-Speed 6305.58 samples/sec Loss 3.4206 LearningRate 0.0001 Epoch: 30 Global Step: 626550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:53,522-Speed 6310.59 samples/sec Loss 3.4218 LearningRate 0.0001 Epoch: 30 Global Step: 626560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:43:56,765-Speed 6317.01 samples/sec Loss 3.4111 LearningRate 0.0001 Epoch: 30 
Global Step: 626570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:00,009-Speed 6313.03 samples/sec Loss 3.4274 LearningRate 0.0001 Epoch: 30 Global Step: 626580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:03,254-Speed 6313.57 samples/sec Loss 3.3367 LearningRate 0.0001 Epoch: 30 Global Step: 626590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:06,502-Speed 6306.04 samples/sec Loss 3.3572 LearningRate 0.0001 Epoch: 30 Global Step: 626600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:09,751-Speed 6305.17 samples/sec Loss 3.3550 LearningRate 0.0001 Epoch: 30 Global Step: 626610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:13,000-Speed 6305.98 samples/sec Loss 3.3528 LearningRate 0.0001 Epoch: 30 Global Step: 626620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:16,245-Speed 6311.35 samples/sec Loss 3.3702 LearningRate 0.0001 Epoch: 30 Global Step: 626630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:19,489-Speed 6315.40 samples/sec Loss 3.4011 LearningRate 0.0001 Epoch: 30 Global Step: 626640 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:44:22,719-Speed 6341.07 samples/sec Loss 3.3759 LearningRate 0.0001 Epoch: 30 Global Step: 626650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:25,977-Speed 6288.65 samples/sec Loss 3.3986 LearningRate 0.0001 Epoch: 30 Global Step: 626660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:29,219-Speed 6318.43 samples/sec Loss 3.3944 LearningRate 0.0001 Epoch: 30 Global Step: 626670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:32,465-Speed 6310.36 samples/sec Loss 3.3475 LearningRate 0.0001 Epoch: 30 Global Step: 626680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:35,706-Speed 6320.19 samples/sec Loss 3.3879 LearningRate 0.0001 Epoch: 30 Global Step: 626690 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:44:38,951-Speed 6312.65 samples/sec Loss 3.3540 LearningRate 0.0001 Epoch: 30 Global Step: 626700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:42,204-Speed 6296.95 samples/sec Loss 3.3147 LearningRate 0.0001 Epoch: 30 Global Step: 626710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:45,451-Speed 6309.50 samples/sec Loss 3.3764 LearningRate 0.0001 Epoch: 30 Global Step: 626720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:48,696-Speed 6312.07 samples/sec Loss 3.3780 LearningRate 0.0001 Epoch: 30 Global Step: 626730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:51,941-Speed 6313.94 samples/sec Loss 3.3886 LearningRate 0.0001 Epoch: 30 Global Step: 626740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:55,173-Speed 6337.53 samples/sec Loss 3.3320 LearningRate 0.0001 Epoch: 30 Global Step: 626750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:44:58,422-Speed 6304.85 samples/sec Loss 3.4092 LearningRate 0.0001 Epoch: 30 Global Step: 626760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:01,666-Speed 6314.64 samples/sec Loss 3.3383 LearningRate 0.0001 Epoch: 30 Global Step: 626770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:04,915-Speed 6304.20 samples/sec Loss 3.3783 LearningRate 0.0001 Epoch: 30 Global Step: 626780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:08,159-Speed 6315.96 samples/sec Loss 3.3241 LearningRate 0.0001 Epoch: 30 Global Step: 626790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:11,407-Speed 6306.21 samples/sec Loss 3.3059 LearningRate 0.0001 Epoch: 30 Global Step: 626800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:14,649-Speed 6318.18 samples/sec Loss 3.3828 LearningRate 0.0001 Epoch: 30 Global Step: 626810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:45:17,898-Speed 6305.67 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 30 Global Step: 626820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:21,141-Speed 6316.35 samples/sec Loss 3.3494 LearningRate 0.0001 Epoch: 30 Global Step: 626830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:24,389-Speed 6307.07 samples/sec Loss 3.3673 LearningRate 0.0001 Epoch: 30 Global Step: 626840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:27,635-Speed 6310.63 samples/sec Loss 3.3915 LearningRate 0.0001 Epoch: 30 Global Step: 626850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:45:30,867-Speed 6338.13 samples/sec Loss 3.3623 LearningRate 0.0001 Epoch: 30 Global Step: 626860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:34,119-Speed 6299.38 samples/sec Loss 3.4237 LearningRate 0.0001 Epoch: 30 Global Step: 626870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:37,366-Speed 6307.67 samples/sec Loss 3.4181 LearningRate 0.0001 Epoch: 30 Global Step: 626880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:40,616-Speed 6303.40 samples/sec Loss 3.4348 LearningRate 0.0001 Epoch: 30 Global Step: 626890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:43,860-Speed 6315.42 samples/sec Loss 3.3981 LearningRate 0.0001 Epoch: 30 Global Step: 626900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:47,105-Speed 6312.28 samples/sec Loss 3.3438 LearningRate 0.0001 Epoch: 30 Global Step: 626910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:50,352-Speed 6308.66 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 30 Global Step: 626920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:53,599-Speed 6307.60 samples/sec Loss 3.3618 LearningRate 0.0001 Epoch: 30 Global Step: 626930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:45:56,847-Speed 6306.99 samples/sec 
Loss 3.3850 LearningRate 0.0001 Epoch: 30 Global Step: 626940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:00,145-Speed 6211.38 samples/sec Loss 3.3547 LearningRate 0.0001 Epoch: 30 Global Step: 626950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:03,382-Speed 6329.47 samples/sec Loss 3.3644 LearningRate 0.0001 Epoch: 30 Global Step: 626960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:06,634-Speed 6298.10 samples/sec Loss 3.3642 LearningRate 0.0001 Epoch: 30 Global Step: 626970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:09,883-Speed 6306.82 samples/sec Loss 3.4182 LearningRate 0.0001 Epoch: 30 Global Step: 626980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:13,128-Speed 6311.47 samples/sec Loss 3.4259 LearningRate 0.0001 Epoch: 30 Global Step: 626990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:16,380-Speed 6299.66 samples/sec Loss 3.3372 LearningRate 0.0001 Epoch: 30 Global Step: 627000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:19,623-Speed 6317.01 samples/sec Loss 3.3670 LearningRate 0.0001 Epoch: 30 Global Step: 627010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:22,874-Speed 6300.36 samples/sec Loss 3.3943 LearningRate 0.0001 Epoch: 30 Global Step: 627020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:26,117-Speed 6315.78 samples/sec Loss 3.4170 LearningRate 0.0001 Epoch: 30 Global Step: 627030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:29,365-Speed 6306.50 samples/sec Loss 3.3509 LearningRate 0.0001 Epoch: 30 Global Step: 627040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:32,610-Speed 6313.31 samples/sec Loss 3.4061 LearningRate 0.0001 Epoch: 30 Global Step: 627050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:35,844-Speed 6334.67 samples/sec Loss 3.3621 LearningRate 0.0001 Epoch: 30 
Global Step: 627060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:39,088-Speed 6314.28 samples/sec Loss 3.4090 LearningRate 0.0001 Epoch: 30 Global Step: 627070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:42,333-Speed 6312.33 samples/sec Loss 3.3621 LearningRate 0.0001 Epoch: 30 Global Step: 627080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:45,578-Speed 6313.84 samples/sec Loss 3.3884 LearningRate 0.0001 Epoch: 30 Global Step: 627090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:48,820-Speed 6317.61 samples/sec Loss 3.3503 LearningRate 0.0001 Epoch: 30 Global Step: 627100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:52,072-Speed 6299.16 samples/sec Loss 3.4121 LearningRate 0.0001 Epoch: 30 Global Step: 627110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:55,322-Speed 6302.60 samples/sec Loss 3.3755 LearningRate 0.0001 Epoch: 30 Global Step: 627120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:46:58,570-Speed 6306.41 samples/sec Loss 3.3724 LearningRate 0.0001 Epoch: 30 Global Step: 627130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:01,822-Speed 6299.51 samples/sec Loss 3.2930 LearningRate 0.0001 Epoch: 30 Global Step: 627140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:05,068-Speed 6309.92 samples/sec Loss 3.3510 LearningRate 0.0001 Epoch: 30 Global Step: 627150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:08,301-Speed 6337.37 samples/sec Loss 3.3437 LearningRate 0.0001 Epoch: 30 Global Step: 627160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:11,542-Speed 6319.74 samples/sec Loss 3.3775 LearningRate 0.0001 Epoch: 30 Global Step: 627170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:14,788-Speed 6311.21 samples/sec Loss 3.4730 LearningRate 0.0001 Epoch: 30 Global Step: 627180 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:47:18,033-Speed 6313.13 samples/sec Loss 3.4018 LearningRate 0.0001 Epoch: 30 Global Step: 627190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:21,274-Speed 6321.26 samples/sec Loss 3.3905 LearningRate 0.0001 Epoch: 30 Global Step: 627200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:24,521-Speed 6308.82 samples/sec Loss 3.4196 LearningRate 0.0001 Epoch: 30 Global Step: 627210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:27,762-Speed 6320.30 samples/sec Loss 3.3140 LearningRate 0.0001 Epoch: 30 Global Step: 627220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:31,009-Speed 6308.16 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 627230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:34,255-Speed 6311.79 samples/sec Loss 3.4014 LearningRate 0.0001 Epoch: 30 Global Step: 627240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:37,503-Speed 6305.90 samples/sec Loss 3.3913 LearningRate 0.0001 Epoch: 30 Global Step: 627250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:40,737-Speed 6334.00 samples/sec Loss 3.3868 LearningRate 0.0001 Epoch: 30 Global Step: 627260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:43,979-Speed 6320.61 samples/sec Loss 3.3943 LearningRate 0.0001 Epoch: 30 Global Step: 627270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:47,224-Speed 6312.07 samples/sec Loss 3.3536 LearningRate 0.0001 Epoch: 30 Global Step: 627280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:50,467-Speed 6316.37 samples/sec Loss 3.3428 LearningRate 0.0001 Epoch: 30 Global Step: 627290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:47:53,708-Speed 6319.22 samples/sec Loss 3.3894 LearningRate 0.0001 Epoch: 30 Global Step: 627300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:47:56,954-Speed 6312.01 samples/sec Loss 3.3359 LearningRate 0.0001 Epoch: 30 Global Step: 627310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:00,207-Speed 6295.96 samples/sec Loss 3.4052 LearningRate 0.0001 Epoch: 30 Global Step: 627320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:03,468-Speed 6282.24 samples/sec Loss 3.4039 LearningRate 0.0001 Epoch: 30 Global Step: 627330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:06,713-Speed 6311.21 samples/sec Loss 3.3594 LearningRate 0.0001 Epoch: 30 Global Step: 627340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:09,958-Speed 6312.92 samples/sec Loss 3.4705 LearningRate 0.0001 Epoch: 30 Global Step: 627350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:13,206-Speed 6308.36 samples/sec Loss 3.3765 LearningRate 0.0001 Epoch: 30 Global Step: 627360 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:48:16,440-Speed 6334.56 samples/sec Loss 3.3907 LearningRate 0.0001 Epoch: 30 Global Step: 627370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:19,685-Speed 6311.57 samples/sec Loss 3.3688 LearningRate 0.0001 Epoch: 30 Global Step: 627380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:22,928-Speed 6315.66 samples/sec Loss 3.4046 LearningRate 0.0001 Epoch: 30 Global Step: 627390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:26,172-Speed 6314.49 samples/sec Loss 3.3441 LearningRate 0.0001 Epoch: 30 Global Step: 627400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:29,417-Speed 6314.23 samples/sec Loss 3.3340 LearningRate 0.0001 Epoch: 30 Global Step: 627410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:32,658-Speed 6320.98 samples/sec Loss 3.4544 LearningRate 0.0001 Epoch: 30 Global Step: 627420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:35,902-Speed 6314.31 samples/sec 
Loss 3.3806 LearningRate 0.0001 Epoch: 30 Global Step: 627430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:39,150-Speed 6309.66 samples/sec Loss 3.3669 LearningRate 0.0001 Epoch: 30 Global Step: 627440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:42,400-Speed 6302.65 samples/sec Loss 3.4062 LearningRate 0.0001 Epoch: 30 Global Step: 627450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:45,647-Speed 6307.92 samples/sec Loss 3.3945 LearningRate 0.0001 Epoch: 30 Global Step: 627460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:48,917-Speed 6264.53 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 30 Global Step: 627470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:52,160-Speed 6316.64 samples/sec Loss 3.4110 LearningRate 0.0001 Epoch: 30 Global Step: 627480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:55,409-Speed 6305.78 samples/sec Loss 3.4055 LearningRate 0.0001 Epoch: 30 Global Step: 627490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:48:58,652-Speed 6315.21 samples/sec Loss 3.3639 LearningRate 0.0001 Epoch: 30 Global Step: 627500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:01,897-Speed 6312.30 samples/sec Loss 3.3405 LearningRate 0.0001 Epoch: 30 Global Step: 627510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:05,141-Speed 6316.14 samples/sec Loss 3.3260 LearningRate 0.0001 Epoch: 30 Global Step: 627520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:08,387-Speed 6310.90 samples/sec Loss 3.3814 LearningRate 0.0001 Epoch: 30 Global Step: 627530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:11,633-Speed 6310.77 samples/sec Loss 3.4066 LearningRate 0.0001 Epoch: 30 Global Step: 627540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:14,878-Speed 6312.60 samples/sec Loss 3.2721 LearningRate 0.0001 Epoch: 30 
Global Step: 627550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:18,125-Speed 6307.40 samples/sec Loss 3.2882 LearningRate 0.0001 Epoch: 30 Global Step: 627560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:21,353-Speed 6347.03 samples/sec Loss 3.3314 LearningRate 0.0001 Epoch: 30 Global Step: 627570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:24,601-Speed 6305.30 samples/sec Loss 3.3540 LearningRate 0.0001 Epoch: 30 Global Step: 627580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:27,844-Speed 6316.87 samples/sec Loss 3.3794 LearningRate 0.0001 Epoch: 30 Global Step: 627590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:31,087-Speed 6316.62 samples/sec Loss 3.3664 LearningRate 0.0001 Epoch: 30 Global Step: 627600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:34,333-Speed 6311.98 samples/sec Loss 3.3639 LearningRate 0.0001 Epoch: 30 Global Step: 627610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:37,590-Speed 6288.76 samples/sec Loss 3.4256 LearningRate 0.0001 Epoch: 30 Global Step: 627620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:40,832-Speed 6318.30 samples/sec Loss 3.3738 LearningRate 0.0001 Epoch: 30 Global Step: 627630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:44,079-Speed 6309.41 samples/sec Loss 3.3750 LearningRate 0.0001 Epoch: 30 Global Step: 627640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:47,323-Speed 6315.30 samples/sec Loss 3.4166 LearningRate 0.0001 Epoch: 30 Global Step: 627650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:50,577-Speed 6295.93 samples/sec Loss 3.3404 LearningRate 0.0001 Epoch: 30 Global Step: 627660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:49:53,820-Speed 6315.22 samples/sec Loss 3.4671 LearningRate 0.0001 Epoch: 30 Global Step: 627670 Fp16 Grad Scale: 16384 
Required: 18 hours Training: 2022-04-03 00:49:57,049-Speed 6345.29 samples/sec Loss 3.3966 LearningRate 0.0001 Epoch: 30 Global Step: 627680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:00,293-Speed 6312.95 samples/sec Loss 3.3507 LearningRate 0.0001 Epoch: 30 Global Step: 627690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:03,536-Speed 6317.59 samples/sec Loss 3.4104 LearningRate 0.0001 Epoch: 30 Global Step: 627700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:06,779-Speed 6316.21 samples/sec Loss 3.3789 LearningRate 0.0001 Epoch: 30 Global Step: 627710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:10,022-Speed 6316.60 samples/sec Loss 3.3560 LearningRate 0.0001 Epoch: 30 Global Step: 627720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:13,268-Speed 6311.22 samples/sec Loss 3.3677 LearningRate 0.0001 Epoch: 30 Global Step: 627730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:16,525-Speed 6287.60 samples/sec Loss 3.3814 LearningRate 0.0001 Epoch: 30 Global Step: 627740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:19,769-Speed 6315.22 samples/sec Loss 3.3878 LearningRate 0.0001 Epoch: 30 Global Step: 627750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:23,015-Speed 6310.49 samples/sec Loss 3.3237 LearningRate 0.0001 Epoch: 30 Global Step: 627760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:26,258-Speed 6317.46 samples/sec Loss 3.3878 LearningRate 0.0001 Epoch: 30 Global Step: 627770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:29,488-Speed 6342.86 samples/sec Loss 3.3660 LearningRate 0.0001 Epoch: 30 Global Step: 627780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:32,736-Speed 6305.96 samples/sec Loss 3.4118 LearningRate 0.0001 Epoch: 30 Global Step: 627790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:50:35,977-Speed 6319.41 samples/sec Loss 3.3858 LearningRate 0.0001 Epoch: 30 Global Step: 627800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:39,223-Speed 6311.02 samples/sec Loss 3.3656 LearningRate 0.0001 Epoch: 30 Global Step: 627810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:42,463-Speed 6323.65 samples/sec Loss 3.3693 LearningRate 0.0001 Epoch: 30 Global Step: 627820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:45,715-Speed 6298.21 samples/sec Loss 3.3635 LearningRate 0.0001 Epoch: 30 Global Step: 627830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:48,960-Speed 6312.78 samples/sec Loss 3.3891 LearningRate 0.0001 Epoch: 30 Global Step: 627840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:52,202-Speed 6318.95 samples/sec Loss 3.4082 LearningRate 0.0001 Epoch: 30 Global Step: 627850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:55,449-Speed 6307.83 samples/sec Loss 3.3842 LearningRate 0.0001 Epoch: 30 Global Step: 627860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:50:58,693-Speed 6315.15 samples/sec Loss 3.3291 LearningRate 0.0001 Epoch: 30 Global Step: 627870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:01,925-Speed 6338.79 samples/sec Loss 3.3608 LearningRate 0.0001 Epoch: 30 Global Step: 627880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:05,176-Speed 6300.17 samples/sec Loss 3.3997 LearningRate 0.0001 Epoch: 30 Global Step: 627890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:08,427-Speed 6302.22 samples/sec Loss 3.3705 LearningRate 0.0001 Epoch: 30 Global Step: 627900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:11,676-Speed 6304.30 samples/sec Loss 3.4061 LearningRate 0.0001 Epoch: 30 Global Step: 627910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:14,923-Speed 6308.35 samples/sec Loss 
3.3446 LearningRate 0.0001 Epoch: 30 Global Step: 627920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:18,173-Speed 6303.63 samples/sec Loss 3.3898 LearningRate 0.0001 Epoch: 30 Global Step: 627930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:21,417-Speed 6313.96 samples/sec Loss 3.3625 LearningRate 0.0001 Epoch: 30 Global Step: 627940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:24,681-Speed 6276.98 samples/sec Loss 3.4019 LearningRate 0.0001 Epoch: 30 Global Step: 627950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:27,927-Speed 6311.02 samples/sec Loss 3.3878 LearningRate 0.0001 Epoch: 30 Global Step: 627960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:31,168-Speed 6319.62 samples/sec Loss 3.3487 LearningRate 0.0001 Epoch: 30 Global Step: 627970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:34,413-Speed 6313.28 samples/sec Loss 3.3517 LearningRate 0.0001 Epoch: 30 Global Step: 627980 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:51:37,640-Speed 6347.10 samples/sec Loss 3.3350 LearningRate 0.0001 Epoch: 30 Global Step: 627990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:40,884-Speed 6314.36 samples/sec Loss 3.3410 LearningRate 0.0001 Epoch: 30 Global Step: 628000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:44,129-Speed 6312.22 samples/sec Loss 3.3661 LearningRate 0.0001 Epoch: 30 Global Step: 628010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:47,373-Speed 6315.84 samples/sec Loss 3.3853 LearningRate 0.0001 Epoch: 30 Global Step: 628020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:50,626-Speed 6296.73 samples/sec Loss 3.3041 LearningRate 0.0001 Epoch: 30 Global Step: 628030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:53,877-Speed 6300.14 samples/sec Loss 3.3655 LearningRate 0.0001 Epoch: 30 
Global Step: 628040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:51:57,122-Speed 6313.96 samples/sec Loss 3.3476 LearningRate 0.0001 Epoch: 30 Global Step: 628050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:00,373-Speed 6300.36 samples/sec Loss 3.3494 LearningRate 0.0001 Epoch: 30 Global Step: 628060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:03,624-Speed 6301.65 samples/sec Loss 3.3248 LearningRate 0.0001 Epoch: 30 Global Step: 628070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:06,867-Speed 6315.13 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 30 Global Step: 628080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:10,097-Speed 6342.92 samples/sec Loss 3.3647 LearningRate 0.0001 Epoch: 30 Global Step: 628090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:13,338-Speed 6320.23 samples/sec Loss 3.3943 LearningRate 0.0001 Epoch: 30 Global Step: 628100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:16,580-Speed 6320.56 samples/sec Loss 3.3494 LearningRate 0.0001 Epoch: 30 Global Step: 628110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:19,825-Speed 6311.07 samples/sec Loss 3.4110 LearningRate 0.0001 Epoch: 30 Global Step: 628120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:23,073-Speed 6307.65 samples/sec Loss 3.3855 LearningRate 0.0001 Epoch: 30 Global Step: 628130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:26,323-Speed 6303.21 samples/sec Loss 3.3642 LearningRate 0.0001 Epoch: 30 Global Step: 628140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:29,571-Speed 6307.84 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 30 Global Step: 628150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:32,816-Speed 6310.61 samples/sec Loss 3.4381 LearningRate 0.0001 Epoch: 30 Global Step: 628160 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:52:36,062-Speed 6310.50 samples/sec Loss 3.4487 LearningRate 0.0001 Epoch: 30 Global Step: 628170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:39,307-Speed 6313.85 samples/sec Loss 3.3436 LearningRate 0.0001 Epoch: 30 Global Step: 628180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:42,559-Speed 6299.23 samples/sec Loss 3.3618 LearningRate 0.0001 Epoch: 30 Global Step: 628190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:45,801-Speed 6317.14 samples/sec Loss 3.4204 LearningRate 0.0001 Epoch: 30 Global Step: 628200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:49,046-Speed 6314.13 samples/sec Loss 3.3886 LearningRate 0.0001 Epoch: 30 Global Step: 628210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:52,291-Speed 6312.09 samples/sec Loss 3.4306 LearningRate 0.0001 Epoch: 30 Global Step: 628220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:55,532-Speed 6320.12 samples/sec Loss 3.3610 LearningRate 0.0001 Epoch: 30 Global Step: 628230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:52:58,775-Speed 6317.37 samples/sec Loss 3.3380 LearningRate 0.0001 Epoch: 30 Global Step: 628240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:02,020-Speed 6311.04 samples/sec Loss 3.3392 LearningRate 0.0001 Epoch: 30 Global Step: 628250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:05,263-Speed 6316.79 samples/sec Loss 3.3702 LearningRate 0.0001 Epoch: 30 Global Step: 628260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:08,512-Speed 6304.88 samples/sec Loss 3.4032 LearningRate 0.0001 Epoch: 30 Global Step: 628270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:11,757-Speed 6314.16 samples/sec Loss 3.3950 LearningRate 0.0001 Epoch: 30 Global Step: 628280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:53:14,989-Speed 6336.24 samples/sec Loss 3.3581 LearningRate 0.0001 Epoch: 30 Global Step: 628290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:18,231-Speed 6319.92 samples/sec Loss 3.4444 LearningRate 0.0001 Epoch: 30 Global Step: 628300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:21,477-Speed 6311.03 samples/sec Loss 3.3436 LearningRate 0.0001 Epoch: 30 Global Step: 628310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:24,725-Speed 6307.06 samples/sec Loss 3.3307 LearningRate 0.0001 Epoch: 30 Global Step: 628320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:27,980-Speed 6292.44 samples/sec Loss 3.4073 LearningRate 0.0001 Epoch: 30 Global Step: 628330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:31,287-Speed 6195.19 samples/sec Loss 3.2794 LearningRate 0.0001 Epoch: 30 Global Step: 628340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:34,534-Speed 6307.99 samples/sec Loss 3.3511 LearningRate 0.0001 Epoch: 30 Global Step: 628350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:37,776-Speed 6318.91 samples/sec Loss 3.3911 LearningRate 0.0001 Epoch: 30 Global Step: 628360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:41,020-Speed 6315.65 samples/sec Loss 3.3538 LearningRate 0.0001 Epoch: 30 Global Step: 628370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:44,261-Speed 6319.78 samples/sec Loss 3.3975 LearningRate 0.0001 Epoch: 30 Global Step: 628380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:47,491-Speed 6341.06 samples/sec Loss 3.3565 LearningRate 0.0001 Epoch: 30 Global Step: 628390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:50,734-Speed 6316.67 samples/sec Loss 3.3936 LearningRate 0.0001 Epoch: 30 Global Step: 628400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:53,978-Speed 6315.65 samples/sec Loss 
3.3405 LearningRate 0.0001 Epoch: 30 Global Step: 628410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:53:57,222-Speed 6313.38 samples/sec Loss 3.3144 LearningRate 0.0001 Epoch: 30 Global Step: 628420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:00,469-Speed 6309.58 samples/sec Loss 3.3768 LearningRate 0.0001 Epoch: 30 Global Step: 628430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:03,718-Speed 6303.96 samples/sec Loss 3.3916 LearningRate 0.0001 Epoch: 30 Global Step: 628440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:06,961-Speed 6316.67 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 30 Global Step: 628450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:10,208-Speed 6310.28 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 628460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:13,451-Speed 6315.26 samples/sec Loss 3.3472 LearningRate 0.0001 Epoch: 30 Global Step: 628470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:16,700-Speed 6305.96 samples/sec Loss 3.3964 LearningRate 0.0001 Epoch: 30 Global Step: 628480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:19,948-Speed 6305.71 samples/sec Loss 3.3639 LearningRate 0.0001 Epoch: 30 Global Step: 628490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:54:23,175-Speed 6347.78 samples/sec Loss 3.4175 LearningRate 0.0001 Epoch: 30 Global Step: 628500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:26,423-Speed 6307.96 samples/sec Loss 3.3841 LearningRate 0.0001 Epoch: 30 Global Step: 628510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:29,665-Speed 6317.68 samples/sec Loss 3.3455 LearningRate 0.0001 Epoch: 30 Global Step: 628520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:32,913-Speed 6307.73 samples/sec Loss 3.3660 LearningRate 0.0001 Epoch: 30 
Global Step: 628530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:36,160-Speed 6307.39 samples/sec Loss 3.3777 LearningRate 0.0001 Epoch: 30 Global Step: 628540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:39,418-Speed 6287.93 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 30 Global Step: 628550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:42,681-Speed 6277.97 samples/sec Loss 3.3322 LearningRate 0.0001 Epoch: 30 Global Step: 628560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:45,928-Speed 6309.09 samples/sec Loss 3.3597 LearningRate 0.0001 Epoch: 30 Global Step: 628570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:49,171-Speed 6316.58 samples/sec Loss 3.3994 LearningRate 0.0001 Epoch: 30 Global Step: 628580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:52,414-Speed 6316.78 samples/sec Loss 3.4071 LearningRate 0.0001 Epoch: 30 Global Step: 628590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:55,643-Speed 6345.41 samples/sec Loss 3.3422 LearningRate 0.0001 Epoch: 30 Global Step: 628600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:54:58,886-Speed 6314.98 samples/sec Loss 3.3493 LearningRate 0.0001 Epoch: 30 Global Step: 628610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:02,128-Speed 6318.40 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 30 Global Step: 628620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:05,389-Speed 6282.66 samples/sec Loss 3.3190 LearningRate 0.0001 Epoch: 30 Global Step: 628630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:08,636-Speed 6309.00 samples/sec Loss 3.3395 LearningRate 0.0001 Epoch: 30 Global Step: 628640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:11,886-Speed 6302.51 samples/sec Loss 3.3957 LearningRate 0.0001 Epoch: 30 Global Step: 628650 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:55:15,143-Speed 6288.17 samples/sec Loss 3.3599 LearningRate 0.0001 Epoch: 30 Global Step: 628660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:18,388-Speed 6313.97 samples/sec Loss 3.3519 LearningRate 0.0001 Epoch: 30 Global Step: 628670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:21,629-Speed 6320.15 samples/sec Loss 3.3553 LearningRate 0.0001 Epoch: 30 Global Step: 628680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:24,875-Speed 6310.04 samples/sec Loss 3.3351 LearningRate 0.0001 Epoch: 30 Global Step: 628690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:28,107-Speed 6339.03 samples/sec Loss 3.3703 LearningRate 0.0001 Epoch: 30 Global Step: 628700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:31,355-Speed 6307.42 samples/sec Loss 3.3599 LearningRate 0.0001 Epoch: 30 Global Step: 628710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:34,603-Speed 6306.44 samples/sec Loss 3.3404 LearningRate 0.0001 Epoch: 30 Global Step: 628720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:37,847-Speed 6314.64 samples/sec Loss 3.3383 LearningRate 0.0001 Epoch: 30 Global Step: 628730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:41,089-Speed 6318.09 samples/sec Loss 3.4104 LearningRate 0.0001 Epoch: 30 Global Step: 628740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:44,336-Speed 6307.42 samples/sec Loss 3.3829 LearningRate 0.0001 Epoch: 30 Global Step: 628750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:47,584-Speed 6308.12 samples/sec Loss 3.3692 LearningRate 0.0001 Epoch: 30 Global Step: 628760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:50,827-Speed 6316.86 samples/sec Loss 3.3607 LearningRate 0.0001 Epoch: 30 Global Step: 628770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:55:54,071-Speed 6315.50 samples/sec Loss 3.3880 LearningRate 0.0001 Epoch: 30 Global Step: 628780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:55:57,317-Speed 6310.88 samples/sec Loss 3.4068 LearningRate 0.0001 Epoch: 30 Global Step: 628790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:00,549-Speed 6337.89 samples/sec Loss 3.3372 LearningRate 0.0001 Epoch: 30 Global Step: 628800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:03,798-Speed 6304.87 samples/sec Loss 3.3792 LearningRate 0.0001 Epoch: 30 Global Step: 628810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:07,046-Speed 6306.30 samples/sec Loss 3.3563 LearningRate 0.0001 Epoch: 30 Global Step: 628820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:10,290-Speed 6315.58 samples/sec Loss 3.4191 LearningRate 0.0001 Epoch: 30 Global Step: 628830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:13,531-Speed 6318.49 samples/sec Loss 3.3397 LearningRate 0.0001 Epoch: 30 Global Step: 628840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:16,778-Speed 6310.05 samples/sec Loss 3.3982 LearningRate 0.0001 Epoch: 30 Global Step: 628850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:20,030-Speed 6298.88 samples/sec Loss 3.3663 LearningRate 0.0001 Epoch: 30 Global Step: 628860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:23,278-Speed 6306.53 samples/sec Loss 3.3348 LearningRate 0.0001 Epoch: 30 Global Step: 628870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:26,523-Speed 6313.80 samples/sec Loss 3.4097 LearningRate 0.0001 Epoch: 30 Global Step: 628880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:29,767-Speed 6313.50 samples/sec Loss 3.3531 LearningRate 0.0001 Epoch: 30 Global Step: 628890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:33,000-Speed 6336.82 samples/sec Loss 
3.3462 LearningRate 0.0001 Epoch: 30 Global Step: 628900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:36,247-Speed 6308.04 samples/sec Loss 3.3451 LearningRate 0.0001 Epoch: 30 Global Step: 628910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:39,490-Speed 6319.52 samples/sec Loss 3.3825 LearningRate 0.0001 Epoch: 30 Global Step: 628920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:42,732-Speed 6317.90 samples/sec Loss 3.2880 LearningRate 0.0001 Epoch: 30 Global Step: 628930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:45,974-Speed 6318.62 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 628940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:49,222-Speed 6307.30 samples/sec Loss 3.3995 LearningRate 0.0001 Epoch: 30 Global Step: 628950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:52,468-Speed 6310.00 samples/sec Loss 3.3112 LearningRate 0.0001 Epoch: 30 Global Step: 628960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:55,715-Speed 6309.57 samples/sec Loss 3.3000 LearningRate 0.0001 Epoch: 30 Global Step: 628970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:56:58,957-Speed 6318.58 samples/sec Loss 3.3617 LearningRate 0.0001 Epoch: 30 Global Step: 628980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:02,211-Speed 6294.28 samples/sec Loss 3.3456 LearningRate 0.0001 Epoch: 30 Global Step: 628990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:05,495-Speed 6238.01 samples/sec Loss 3.3006 LearningRate 0.0001 Epoch: 30 Global Step: 629000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:08,752-Speed 6290.34 samples/sec Loss 3.3738 LearningRate 0.0001 Epoch: 30 Global Step: 629010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:12,044-Speed 6222.49 samples/sec Loss 3.3194 LearningRate 0.0001 Epoch: 30 Global 
Step: 629020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:15,285-Speed 6319.47 samples/sec Loss 3.3987 LearningRate 0.0001 Epoch: 30 Global Step: 629030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:18,534-Speed 6309.80 samples/sec Loss 3.3678 LearningRate 0.0001 Epoch: 30 Global Step: 629040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:21,778-Speed 6312.89 samples/sec Loss 3.3575 LearningRate 0.0001 Epoch: 30 Global Step: 629050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:25,027-Speed 6307.14 samples/sec Loss 3.3848 LearningRate 0.0001 Epoch: 30 Global Step: 629060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:28,276-Speed 6303.59 samples/sec Loss 3.3381 LearningRate 0.0001 Epoch: 30 Global Step: 629070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:31,521-Speed 6312.39 samples/sec Loss 3.4590 LearningRate 0.0001 Epoch: 30 Global Step: 629080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:34,769-Speed 6306.67 samples/sec Loss 3.4012 LearningRate 0.0001 Epoch: 30 Global Step: 629090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:37,999-Speed 6341.97 samples/sec Loss 3.3411 LearningRate 0.0001 Epoch: 30 Global Step: 629100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:41,243-Speed 6314.50 samples/sec Loss 3.3286 LearningRate 0.0001 Epoch: 30 Global Step: 629110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:44,493-Speed 6303.61 samples/sec Loss 3.3351 LearningRate 0.0001 Epoch: 30 Global Step: 629120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:47,735-Speed 6318.17 samples/sec Loss 3.3649 LearningRate 0.0001 Epoch: 30 Global Step: 629130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:51,071-Speed 6140.36 samples/sec Loss 3.3868 LearningRate 0.0001 Epoch: 30 Global Step: 629140 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 00:57:54,366-Speed 6220.28 samples/sec Loss 3.4262 LearningRate 0.0001 Epoch: 30 Global Step: 629150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:57:57,612-Speed 6309.95 samples/sec Loss 3.3890 LearningRate 0.0001 Epoch: 30 Global Step: 629160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:00,856-Speed 6315.18 samples/sec Loss 3.3652 LearningRate 0.0001 Epoch: 30 Global Step: 629170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:04,103-Speed 6309.67 samples/sec Loss 3.3710 LearningRate 0.0001 Epoch: 30 Global Step: 629180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:07,343-Speed 6321.52 samples/sec Loss 3.3582 LearningRate 0.0001 Epoch: 30 Global Step: 629190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:10,593-Speed 6303.75 samples/sec Loss 3.3303 LearningRate 0.0001 Epoch: 30 Global Step: 629200 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:58:13,828-Speed 6333.22 samples/sec Loss 3.4289 LearningRate 0.0001 Epoch: 30 Global Step: 629210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:17,072-Speed 6314.59 samples/sec Loss 3.3284 LearningRate 0.0001 Epoch: 30 Global Step: 629220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:20,319-Speed 6308.67 samples/sec Loss 3.3330 LearningRate 0.0001 Epoch: 30 Global Step: 629230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:23,565-Speed 6311.01 samples/sec Loss 3.4057 LearningRate 0.0001 Epoch: 30 Global Step: 629240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:26,810-Speed 6311.56 samples/sec Loss 3.3472 LearningRate 0.0001 Epoch: 30 Global Step: 629250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:30,076-Speed 6272.99 samples/sec Loss 3.3713 LearningRate 0.0001 Epoch: 30 Global Step: 629260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
00:58:33,322-Speed 6310.07 samples/sec Loss 3.3236 LearningRate 0.0001 Epoch: 30 Global Step: 629270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:36,566-Speed 6314.52 samples/sec Loss 3.3894 LearningRate 0.0001 Epoch: 30 Global Step: 629280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:39,812-Speed 6310.06 samples/sec Loss 3.3716 LearningRate 0.0001 Epoch: 30 Global Step: 629290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:43,062-Speed 6304.48 samples/sec Loss 3.3594 LearningRate 0.0001 Epoch: 30 Global Step: 629300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:46,301-Speed 6324.45 samples/sec Loss 3.3805 LearningRate 0.0001 Epoch: 30 Global Step: 629310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:49,542-Speed 6319.96 samples/sec Loss 3.4226 LearningRate 0.0001 Epoch: 30 Global Step: 629320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:52,785-Speed 6314.99 samples/sec Loss 3.3072 LearningRate 0.0001 Epoch: 30 Global Step: 629330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:56,031-Speed 6311.03 samples/sec Loss 3.3653 LearningRate 0.0001 Epoch: 30 Global Step: 629340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:58:59,277-Speed 6311.68 samples/sec Loss 3.3466 LearningRate 0.0001 Epoch: 30 Global Step: 629350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:02,525-Speed 6306.78 samples/sec Loss 3.3828 LearningRate 0.0001 Epoch: 30 Global Step: 629360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:05,769-Speed 6313.95 samples/sec Loss 3.3346 LearningRate 0.0001 Epoch: 30 Global Step: 629370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:09,013-Speed 6314.43 samples/sec Loss 3.4053 LearningRate 0.0001 Epoch: 30 Global Step: 629380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:12,255-Speed 6318.58 samples/sec Loss 
3.3487 LearningRate 0.0001 Epoch: 30 Global Step: 629390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:15,511-Speed 6291.05 samples/sec Loss 3.3682 LearningRate 0.0001 Epoch: 30 Global Step: 629400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:18,771-Speed 6284.92 samples/sec Loss 3.3313 LearningRate 0.0001 Epoch: 30 Global Step: 629410 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 00:59:22,004-Speed 6336.92 samples/sec Loss 3.3417 LearningRate 0.0001 Epoch: 30 Global Step: 629420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:25,249-Speed 6312.57 samples/sec Loss 3.3864 LearningRate 0.0001 Epoch: 30 Global Step: 629430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:28,489-Speed 6321.95 samples/sec Loss 3.3223 LearningRate 0.0001 Epoch: 30 Global Step: 629440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:31,737-Speed 6307.48 samples/sec Loss 3.4161 LearningRate 0.0001 Epoch: 30 Global Step: 629450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:34,981-Speed 6315.06 samples/sec Loss 3.3740 LearningRate 0.0001 Epoch: 30 Global Step: 629460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:38,227-Speed 6309.85 samples/sec Loss 3.3645 LearningRate 0.0001 Epoch: 30 Global Step: 629470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:41,468-Speed 6320.76 samples/sec Loss 3.3352 LearningRate 0.0001 Epoch: 30 Global Step: 629480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:44,718-Speed 6302.62 samples/sec Loss 3.3561 LearningRate 0.0001 Epoch: 30 Global Step: 629490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:47,965-Speed 6308.94 samples/sec Loss 3.3819 LearningRate 0.0001 Epoch: 30 Global Step: 629500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:51,207-Speed 6316.86 samples/sec Loss 3.3802 LearningRate 0.0001 Epoch: 30 
Global Step: 629510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:54,439-Speed 6339.28 samples/sec Loss 3.3780 LearningRate 0.0001 Epoch: 30 Global Step: 629520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 00:59:57,681-Speed 6317.50 samples/sec Loss 3.3900 LearningRate 0.0001 Epoch: 30 Global Step: 629530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:00,927-Speed 6311.89 samples/sec Loss 3.4177 LearningRate 0.0001 Epoch: 30 Global Step: 629540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:04,178-Speed 6301.78 samples/sec Loss 3.3739 LearningRate 0.0001 Epoch: 30 Global Step: 629550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:07,425-Speed 6307.85 samples/sec Loss 3.3861 LearningRate 0.0001 Epoch: 30 Global Step: 629560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:10,671-Speed 6310.84 samples/sec Loss 3.3675 LearningRate 0.0001 Epoch: 30 Global Step: 629570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:13,912-Speed 6320.48 samples/sec Loss 3.3611 LearningRate 0.0001 Epoch: 30 Global Step: 629580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:17,156-Speed 6313.99 samples/sec Loss 3.3500 LearningRate 0.0001 Epoch: 30 Global Step: 629590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:20,400-Speed 6315.44 samples/sec Loss 3.3510 LearningRate 0.0001 Epoch: 30 Global Step: 629600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:23,647-Speed 6308.59 samples/sec Loss 3.3414 LearningRate 0.0001 Epoch: 30 Global Step: 629610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:26,882-Speed 6332.33 samples/sec Loss 3.3070 LearningRate 0.0001 Epoch: 30 Global Step: 629620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:30,129-Speed 6309.45 samples/sec Loss 3.3592 LearningRate 0.0001 Epoch: 30 Global Step: 629630 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:00:33,373-Speed 6314.59 samples/sec Loss 3.3533 LearningRate 0.0001 Epoch: 30 Global Step: 629640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:36,619-Speed 6311.42 samples/sec Loss 3.4211 LearningRate 0.0001 Epoch: 30 Global Step: 629650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:39,865-Speed 6309.58 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 30 Global Step: 629660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:43,111-Speed 6311.38 samples/sec Loss 3.3321 LearningRate 0.0001 Epoch: 30 Global Step: 629670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:46,362-Speed 6301.23 samples/sec Loss 3.3310 LearningRate 0.0001 Epoch: 30 Global Step: 629680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:49,605-Speed 6316.33 samples/sec Loss 3.4015 LearningRate 0.0001 Epoch: 30 Global Step: 629690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:52,845-Speed 6321.37 samples/sec Loss 3.3624 LearningRate 0.0001 Epoch: 30 Global Step: 629700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:56,088-Speed 6318.07 samples/sec Loss 3.3583 LearningRate 0.0001 Epoch: 30 Global Step: 629710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:00:59,317-Speed 6343.83 samples/sec Loss 3.3792 LearningRate 0.0001 Epoch: 30 Global Step: 629720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:02,561-Speed 6314.81 samples/sec Loss 3.3552 LearningRate 0.0001 Epoch: 30 Global Step: 629730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:05,805-Speed 6314.89 samples/sec Loss 3.3318 LearningRate 0.0001 Epoch: 30 Global Step: 629740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:09,075-Speed 6263.07 samples/sec Loss 3.3774 LearningRate 0.0001 Epoch: 30 Global Step: 629750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:01:12,437-Speed 6093.20 samples/sec Loss 3.3511 LearningRate 0.0001 Epoch: 30 Global Step: 629760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:15,677-Speed 6322.17 samples/sec Loss 3.3592 LearningRate 0.0001 Epoch: 30 Global Step: 629770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:18,922-Speed 6312.31 samples/sec Loss 3.3110 LearningRate 0.0001 Epoch: 30 Global Step: 629780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:22,168-Speed 6310.10 samples/sec Loss 3.3155 LearningRate 0.0001 Epoch: 30 Global Step: 629790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:25,409-Speed 6321.43 samples/sec Loss 3.2882 LearningRate 0.0001 Epoch: 30 Global Step: 629800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:28,653-Speed 6314.63 samples/sec Loss 3.3437 LearningRate 0.0001 Epoch: 30 Global Step: 629810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:31,896-Speed 6315.82 samples/sec Loss 3.3950 LearningRate 0.0001 Epoch: 30 Global Step: 629820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:35,146-Speed 6304.51 samples/sec Loss 3.3115 LearningRate 0.0001 Epoch: 30 Global Step: 629830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:38,391-Speed 6311.30 samples/sec Loss 3.3254 LearningRate 0.0001 Epoch: 30 Global Step: 629840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:41,639-Speed 6306.92 samples/sec Loss 3.3553 LearningRate 0.0001 Epoch: 30 Global Step: 629850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:44,887-Speed 6308.83 samples/sec Loss 3.3793 LearningRate 0.0001 Epoch: 30 Global Step: 629860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:48,127-Speed 6321.37 samples/sec Loss 3.3664 LearningRate 0.0001 Epoch: 30 Global Step: 629870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:51,370-Speed 6316.68 samples/sec Loss 
3.3769 LearningRate 0.0001 Epoch: 30 Global Step: 629880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:54,613-Speed 6316.35 samples/sec Loss 3.4115 LearningRate 0.0001 Epoch: 30 Global Step: 629890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:01:57,856-Speed 6316.40 samples/sec Loss 3.3767 LearningRate 0.0001 Epoch: 30 Global Step: 629900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:01,105-Speed 6305.49 samples/sec Loss 3.2753 LearningRate 0.0001 Epoch: 30 Global Step: 629910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:04,334-Speed 6345.17 samples/sec Loss 3.3306 LearningRate 0.0001 Epoch: 30 Global Step: 629920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:07,578-Speed 6313.69 samples/sec Loss 3.3535 LearningRate 0.0001 Epoch: 30 Global Step: 629930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:10,824-Speed 6310.95 samples/sec Loss 3.3443 LearningRate 0.0001 Epoch: 30 Global Step: 629940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:14,074-Speed 6303.00 samples/sec Loss 3.3977 LearningRate 0.0001 Epoch: 30 Global Step: 629950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:17,319-Speed 6311.77 samples/sec Loss 3.3721 LearningRate 0.0001 Epoch: 30 Global Step: 629960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:20,563-Speed 6314.67 samples/sec Loss 3.3564 LearningRate 0.0001 Epoch: 30 Global Step: 629970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:23,810-Speed 6310.15 samples/sec Loss 3.3802 LearningRate 0.0001 Epoch: 30 Global Step: 629980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:27,052-Speed 6316.61 samples/sec Loss 3.3423 LearningRate 0.0001 Epoch: 30 Global Step: 629990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:30,298-Speed 6312.37 samples/sec Loss 3.3584 LearningRate 0.0001 Epoch: 30 Global 
Step: 630000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:33,542-Speed 6313.76 samples/sec Loss 3.3571 LearningRate 0.0001 Epoch: 30 Global Step: 630010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:36,795-Speed 6298.26 samples/sec Loss 3.3351 LearningRate 0.0001 Epoch: 30 Global Step: 630020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:02:40,023-Speed 6344.41 samples/sec Loss 3.3894 LearningRate 0.0001 Epoch: 30 Global Step: 630030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:43,268-Speed 6314.32 samples/sec Loss 3.3679 LearningRate 0.0001 Epoch: 30 Global Step: 630040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:46,514-Speed 6310.38 samples/sec Loss 3.3643 LearningRate 0.0001 Epoch: 30 Global Step: 630050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:49,758-Speed 6314.69 samples/sec Loss 3.3466 LearningRate 0.0001 Epoch: 30 Global Step: 630060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:53,014-Speed 6290.44 samples/sec Loss 3.2791 LearningRate 0.0001 Epoch: 30 Global Step: 630070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:56,259-Speed 6314.14 samples/sec Loss 3.4171 LearningRate 0.0001 Epoch: 30 Global Step: 630080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:02:59,502-Speed 6315.63 samples/sec Loss 3.3238 LearningRate 0.0001 Epoch: 30 Global Step: 630090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:02,747-Speed 6313.35 samples/sec Loss 3.3486 LearningRate 0.0001 Epoch: 30 Global Step: 630100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:05,991-Speed 6313.81 samples/sec Loss 3.3654 LearningRate 0.0001 Epoch: 30 Global Step: 630110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:09,241-Speed 6304.10 samples/sec Loss 3.3926 LearningRate 0.0001 Epoch: 30 Global Step: 630120 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:03:12,468-Speed 6346.72 samples/sec Loss 3.4319 LearningRate 0.0001 Epoch: 30 Global Step: 630130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:15,709-Speed 6320.44 samples/sec Loss 3.3653 LearningRate 0.0001 Epoch: 30 Global Step: 630140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:18,958-Speed 6304.58 samples/sec Loss 3.3256 LearningRate 0.0001 Epoch: 30 Global Step: 630150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:22,204-Speed 6311.01 samples/sec Loss 3.3633 LearningRate 0.0001 Epoch: 30 Global Step: 630160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:25,452-Speed 6307.76 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 630170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:28,692-Speed 6322.50 samples/sec Loss 3.3609 LearningRate 0.0001 Epoch: 30 Global Step: 630180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:31,939-Speed 6308.34 samples/sec Loss 3.3385 LearningRate 0.0001 Epoch: 30 Global Step: 630190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:35,187-Speed 6307.16 samples/sec Loss 3.3600 LearningRate 0.0001 Epoch: 30 Global Step: 630200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:38,432-Speed 6313.16 samples/sec Loss 3.3906 LearningRate 0.0001 Epoch: 30 Global Step: 630210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:41,677-Speed 6312.11 samples/sec Loss 3.3622 LearningRate 0.0001 Epoch: 30 Global Step: 630220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:44,902-Speed 6351.10 samples/sec Loss 3.3426 LearningRate 0.0001 Epoch: 30 Global Step: 630230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:48,149-Speed 6309.50 samples/sec Loss 3.3332 LearningRate 0.0001 Epoch: 30 Global Step: 630240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:03:51,391-Speed 6318.50 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 630250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:54,637-Speed 6309.73 samples/sec Loss 3.3991 LearningRate 0.0001 Epoch: 30 Global Step: 630260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:03:57,885-Speed 6307.25 samples/sec Loss 3.3644 LearningRate 0.0001 Epoch: 30 Global Step: 630270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:01,130-Speed 6312.46 samples/sec Loss 3.4154 LearningRate 0.0001 Epoch: 30 Global Step: 630280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:04,375-Speed 6313.65 samples/sec Loss 3.3726 LearningRate 0.0001 Epoch: 30 Global Step: 630290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:07,615-Speed 6322.40 samples/sec Loss 3.4057 LearningRate 0.0001 Epoch: 30 Global Step: 630300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:10,864-Speed 6306.20 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 30 Global Step: 630310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:14,112-Speed 6310.54 samples/sec Loss 3.3175 LearningRate 0.0001 Epoch: 30 Global Step: 630320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:17,356-Speed 6312.89 samples/sec Loss 3.3741 LearningRate 0.0001 Epoch: 30 Global Step: 630330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:04:20,589-Speed 6336.64 samples/sec Loss 3.4095 LearningRate 0.0001 Epoch: 30 Global Step: 630340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:23,833-Speed 6316.19 samples/sec Loss 3.3182 LearningRate 0.0001 Epoch: 30 Global Step: 630350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:27,083-Speed 6301.26 samples/sec Loss 3.3684 LearningRate 0.0001 Epoch: 30 Global Step: 630360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:30,326-Speed 6318.27 samples/sec 
Loss 3.4080 LearningRate 0.0001 Epoch: 30 Global Step: 630370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:33,568-Speed 6316.78 samples/sec Loss 3.3925 LearningRate 0.0001 Epoch: 30 Global Step: 630380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:36,812-Speed 6314.51 samples/sec Loss 3.4220 LearningRate 0.0001 Epoch: 30 Global Step: 630390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:40,057-Speed 6312.36 samples/sec Loss 3.3211 LearningRate 0.0001 Epoch: 30 Global Step: 630400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:43,302-Speed 6313.14 samples/sec Loss 3.3285 LearningRate 0.0001 Epoch: 30 Global Step: 630410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:46,557-Speed 6293.21 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 30 Global Step: 630420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:49,824-Speed 6270.31 samples/sec Loss 3.3442 LearningRate 0.0001 Epoch: 30 Global Step: 630430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:53,057-Speed 6335.36 samples/sec Loss 3.3163 LearningRate 0.0001 Epoch: 30 Global Step: 630440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:56,298-Speed 6321.18 samples/sec Loss 3.3443 LearningRate 0.0001 Epoch: 30 Global Step: 630450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:04:59,544-Speed 6311.00 samples/sec Loss 3.3766 LearningRate 0.0001 Epoch: 30 Global Step: 630460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:02,793-Speed 6304.72 samples/sec Loss 3.3479 LearningRate 0.0001 Epoch: 30 Global Step: 630470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:06,054-Speed 6282.06 samples/sec Loss 3.3872 LearningRate 0.0001 Epoch: 30 Global Step: 630480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:09,302-Speed 6307.37 samples/sec Loss 3.3538 LearningRate 0.0001 Epoch: 30 
Global Step: 630490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:12,557-Speed 6292.40 samples/sec Loss 3.4161 LearningRate 0.0001 Epoch: 30 Global Step: 630500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:15,808-Speed 6302.27 samples/sec Loss 3.3519 LearningRate 0.0001 Epoch: 30 Global Step: 630510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:19,052-Speed 6314.68 samples/sec Loss 3.3195 LearningRate 0.0001 Epoch: 30 Global Step: 630520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:22,301-Speed 6304.93 samples/sec Loss 3.3654 LearningRate 0.0001 Epoch: 30 Global Step: 630530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:25,533-Speed 6337.65 samples/sec Loss 3.3628 LearningRate 0.0001 Epoch: 30 Global Step: 630540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:28,779-Speed 6310.73 samples/sec Loss 3.3339 LearningRate 0.0001 Epoch: 30 Global Step: 630550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:32,022-Speed 6316.97 samples/sec Loss 3.3883 LearningRate 0.0001 Epoch: 30 Global Step: 630560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:35,266-Speed 6314.99 samples/sec Loss 3.3203 LearningRate 0.0001 Epoch: 30 Global Step: 630570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:38,516-Speed 6302.98 samples/sec Loss 3.3787 LearningRate 0.0001 Epoch: 30 Global Step: 630580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:05:41,747-Speed 6338.89 samples/sec Loss 3.3817 LearningRate 0.0001 Epoch: 30 Global Step: 630590 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:05:44,991-Speed 6314.96 samples/sec Loss 3.3012 LearningRate 0.0001 Epoch: 30 Global Step: 630600 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:05:48,238-Speed 6308.81 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 30 Global Step: 630610 Fp16 Grad Scale: 4096 
Required: 18 hours Training: 2022-04-03 01:05:51,491-Speed 6296.86 samples/sec Loss 3.3110 LearningRate 0.0001 Epoch: 30 Global Step: 630620 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:05:54,739-Speed 6306.19 samples/sec Loss 3.3890 LearningRate 0.0001 Epoch: 30 Global Step: 630630 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:05:57,985-Speed 6312.05 samples/sec Loss 3.3823 LearningRate 0.0001 Epoch: 30 Global Step: 630640 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:06:01,232-Speed 6307.43 samples/sec Loss 3.4378 LearningRate 0.0001 Epoch: 30 Global Step: 630650 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:06:04,479-Speed 6309.65 samples/sec Loss 3.3413 LearningRate 0.0001 Epoch: 30 Global Step: 630660 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:06:07,729-Speed 6303.29 samples/sec Loss 3.3203 LearningRate 0.0001 Epoch: 30 Global Step: 630670 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:06:10,976-Speed 6307.53 samples/sec Loss 3.3464 LearningRate 0.0001 Epoch: 30 Global Step: 630680 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:06:14,237-Speed 6282.95 samples/sec Loss 3.2906 LearningRate 0.0001 Epoch: 30 Global Step: 630690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:17,492-Speed 6293.35 samples/sec Loss 3.3025 LearningRate 0.0001 Epoch: 30 Global Step: 630700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:20,740-Speed 6306.71 samples/sec Loss 3.2993 LearningRate 0.0001 Epoch: 30 Global Step: 630710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:23,986-Speed 6310.76 samples/sec Loss 3.3746 LearningRate 0.0001 Epoch: 30 Global Step: 630720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:27,232-Speed 6311.36 samples/sec Loss 3.3917 LearningRate 0.0001 Epoch: 30 Global Step: 630730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:06:30,491-Speed 6285.54 samples/sec Loss 3.3632 LearningRate 0.0001 Epoch: 30 Global Step: 630740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:33,737-Speed 6310.79 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 30 Global Step: 630750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:37,026-Speed 6227.98 samples/sec Loss 3.3663 LearningRate 0.0001 Epoch: 30 Global Step: 630760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:40,277-Speed 6302.14 samples/sec Loss 3.3818 LearningRate 0.0001 Epoch: 30 Global Step: 630770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:43,529-Speed 6297.48 samples/sec Loss 3.3925 LearningRate 0.0001 Epoch: 30 Global Step: 630780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:46,806-Speed 6252.24 samples/sec Loss 3.3370 LearningRate 0.0001 Epoch: 30 Global Step: 630790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:50,095-Speed 6227.26 samples/sec Loss 3.3257 LearningRate 0.0001 Epoch: 30 Global Step: 630800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:53,345-Speed 6303.88 samples/sec Loss 3.3050 LearningRate 0.0001 Epoch: 30 Global Step: 630810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:56,591-Speed 6310.12 samples/sec Loss 3.3005 LearningRate 0.0001 Epoch: 30 Global Step: 630820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:06:59,836-Speed 6312.43 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 630830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:03,079-Speed 6316.20 samples/sec Loss 3.3329 LearningRate 0.0001 Epoch: 30 Global Step: 630840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:06,323-Speed 6315.30 samples/sec Loss 3.3497 LearningRate 0.0001 Epoch: 30 Global Step: 630850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:09,565-Speed 6318.49 samples/sec Loss 
3.3905 LearningRate 0.0001 Epoch: 30 Global Step: 630860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:12,811-Speed 6311.31 samples/sec Loss 3.3967 LearningRate 0.0001 Epoch: 30 Global Step: 630870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:16,053-Speed 6318.40 samples/sec Loss 3.2981 LearningRate 0.0001 Epoch: 30 Global Step: 630880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:19,281-Speed 6344.65 samples/sec Loss 3.3718 LearningRate 0.0001 Epoch: 30 Global Step: 630890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:22,529-Speed 6306.41 samples/sec Loss 3.3249 LearningRate 0.0001 Epoch: 30 Global Step: 630900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:25,777-Speed 6306.71 samples/sec Loss 3.3666 LearningRate 0.0001 Epoch: 30 Global Step: 630910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:29,026-Speed 6306.67 samples/sec Loss 3.4081 LearningRate 0.0001 Epoch: 30 Global Step: 630920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:32,269-Speed 6315.50 samples/sec Loss 3.3135 LearningRate 0.0001 Epoch: 30 Global Step: 630930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:35,519-Speed 6304.43 samples/sec Loss 3.4261 LearningRate 0.0001 Epoch: 30 Global Step: 630940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:38,764-Speed 6312.35 samples/sec Loss 3.3555 LearningRate 0.0001 Epoch: 30 Global Step: 630950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:42,011-Speed 6308.92 samples/sec Loss 3.3533 LearningRate 0.0001 Epoch: 30 Global Step: 630960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:45,252-Speed 6319.73 samples/sec Loss 3.3763 LearningRate 0.0001 Epoch: 30 Global Step: 630970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:48,498-Speed 6311.91 samples/sec Loss 3.3876 LearningRate 0.0001 Epoch: 30 Global 
Step: 630980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:51,727-Speed 6343.88 samples/sec Loss 3.3353 LearningRate 0.0001 Epoch: 30 Global Step: 630990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:54,974-Speed 6307.42 samples/sec Loss 3.2983 LearningRate 0.0001 Epoch: 30 Global Step: 631000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:07:58,218-Speed 6314.36 samples/sec Loss 3.3100 LearningRate 0.0001 Epoch: 30 Global Step: 631010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:01,465-Speed 6310.20 samples/sec Loss 3.3405 LearningRate 0.0001 Epoch: 30 Global Step: 631020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:04,710-Speed 6311.62 samples/sec Loss 3.3580 LearningRate 0.0001 Epoch: 30 Global Step: 631030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:07,957-Speed 6309.53 samples/sec Loss 3.3438 LearningRate 0.0001 Epoch: 30 Global Step: 631040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:11,198-Speed 6321.18 samples/sec Loss 3.3625 LearningRate 0.0001 Epoch: 30 Global Step: 631050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:14,452-Speed 6294.69 samples/sec Loss 3.3551 LearningRate 0.0001 Epoch: 30 Global Step: 631060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:17,700-Speed 6305.48 samples/sec Loss 3.3447 LearningRate 0.0001 Epoch: 30 Global Step: 631070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:20,949-Speed 6305.40 samples/sec Loss 3.3229 LearningRate 0.0001 Epoch: 30 Global Step: 631080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:24,187-Speed 6326.73 samples/sec Loss 3.3782 LearningRate 0.0001 Epoch: 30 Global Step: 631090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:27,434-Speed 6309.39 samples/sec Loss 3.3099 LearningRate 0.0001 Epoch: 30 Global Step: 631100 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:08:30,681-Speed 6309.33 samples/sec Loss 3.4218 LearningRate 0.0001 Epoch: 30 Global Step: 631110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:33,923-Speed 6317.50 samples/sec Loss 3.3454 LearningRate 0.0001 Epoch: 30 Global Step: 631120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:37,170-Speed 6309.64 samples/sec Loss 3.3646 LearningRate 0.0001 Epoch: 30 Global Step: 631130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:40,422-Speed 6298.75 samples/sec Loss 3.3501 LearningRate 0.0001 Epoch: 30 Global Step: 631140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:43,682-Speed 6282.63 samples/sec Loss 3.3190 LearningRate 0.0001 Epoch: 30 Global Step: 631150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:46,933-Speed 6302.56 samples/sec Loss 3.3390 LearningRate 0.0001 Epoch: 30 Global Step: 631160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:50,177-Speed 6315.07 samples/sec Loss 3.3521 LearningRate 0.0001 Epoch: 30 Global Step: 631170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:53,422-Speed 6312.95 samples/sec Loss 3.3407 LearningRate 0.0001 Epoch: 30 Global Step: 631180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:56,651-Speed 6342.97 samples/sec Loss 3.3585 LearningRate 0.0001 Epoch: 30 Global Step: 631190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:08:59,897-Speed 6310.14 samples/sec Loss 3.3526 LearningRate 0.0001 Epoch: 30 Global Step: 631200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:03,142-Speed 6312.31 samples/sec Loss 3.3349 LearningRate 0.0001 Epoch: 30 Global Step: 631210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:06,390-Speed 6308.68 samples/sec Loss 3.3272 LearningRate 0.0001 Epoch: 30 Global Step: 631220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:09:09,636-Speed 6310.39 samples/sec Loss 3.3217 LearningRate 0.0001 Epoch: 30 Global Step: 631230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:12,900-Speed 6275.77 samples/sec Loss 3.3318 LearningRate 0.0001 Epoch: 30 Global Step: 631240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:16,147-Speed 6307.33 samples/sec Loss 3.3721 LearningRate 0.0001 Epoch: 30 Global Step: 631250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:19,394-Speed 6310.23 samples/sec Loss 3.3799 LearningRate 0.0001 Epoch: 30 Global Step: 631260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:22,661-Speed 6269.62 samples/sec Loss 3.3299 LearningRate 0.0001 Epoch: 30 Global Step: 631270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:26,014-Speed 6109.74 samples/sec Loss 3.2959 LearningRate 0.0001 Epoch: 30 Global Step: 631280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:29,269-Speed 6293.48 samples/sec Loss 3.3410 LearningRate 0.0001 Epoch: 30 Global Step: 631290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:09:32,502-Speed 6335.26 samples/sec Loss 3.3812 LearningRate 0.0001 Epoch: 30 Global Step: 631300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:35,750-Speed 6307.95 samples/sec Loss 3.3439 LearningRate 0.0001 Epoch: 30 Global Step: 631310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:38,995-Speed 6313.51 samples/sec Loss 3.3706 LearningRate 0.0001 Epoch: 30 Global Step: 631320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:42,243-Speed 6307.01 samples/sec Loss 3.3660 LearningRate 0.0001 Epoch: 30 Global Step: 631330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:45,486-Speed 6316.29 samples/sec Loss 3.3038 LearningRate 0.0001 Epoch: 30 Global Step: 631340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:48,728-Speed 6318.43 samples/sec 
Loss 3.3376 LearningRate 0.0001 Epoch: 30 Global Step: 631350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:51,972-Speed 6313.77 samples/sec Loss 3.2978 LearningRate 0.0001 Epoch: 30 Global Step: 631360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:55,218-Speed 6312.04 samples/sec Loss 3.3605 LearningRate 0.0001 Epoch: 30 Global Step: 631370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:09:58,464-Speed 6309.49 samples/sec Loss 3.3314 LearningRate 0.0001 Epoch: 30 Global Step: 631380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:01,723-Speed 6287.56 samples/sec Loss 3.3599 LearningRate 0.0001 Epoch: 30 Global Step: 631390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:04,953-Speed 6340.62 samples/sec Loss 3.2876 LearningRate 0.0001 Epoch: 30 Global Step: 631400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:08,199-Speed 6310.36 samples/sec Loss 3.4077 LearningRate 0.0001 Epoch: 30 Global Step: 631410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:11,447-Speed 6308.55 samples/sec Loss 3.3238 LearningRate 0.0001 Epoch: 30 Global Step: 631420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:14,693-Speed 6309.49 samples/sec Loss 3.3321 LearningRate 0.0001 Epoch: 30 Global Step: 631430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:17,937-Speed 6314.00 samples/sec Loss 3.3925 LearningRate 0.0001 Epoch: 30 Global Step: 631440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:21,184-Speed 6308.80 samples/sec Loss 3.3921 LearningRate 0.0001 Epoch: 30 Global Step: 631450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:24,431-Speed 6308.67 samples/sec Loss 3.3255 LearningRate 0.0001 Epoch: 30 Global Step: 631460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:27,679-Speed 6307.97 samples/sec Loss 3.3574 LearningRate 0.0001 Epoch: 30 
Global Step: 631470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:30,948-Speed 6265.32 samples/sec Loss 3.3589 LearningRate 0.0001 Epoch: 30 Global Step: 631480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:34,194-Speed 6311.37 samples/sec Loss 3.3005 LearningRate 0.0001 Epoch: 30 Global Step: 631490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:37,427-Speed 6337.14 samples/sec Loss 3.3381 LearningRate 0.0001 Epoch: 30 Global Step: 631500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:40,672-Speed 6311.18 samples/sec Loss 3.3286 LearningRate 0.0001 Epoch: 30 Global Step: 631510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:43,918-Speed 6310.75 samples/sec Loss 3.3567 LearningRate 0.0001 Epoch: 30 Global Step: 631520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:47,163-Speed 6313.76 samples/sec Loss 3.3432 LearningRate 0.0001 Epoch: 30 Global Step: 631530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:50,408-Speed 6313.52 samples/sec Loss 3.3653 LearningRate 0.0001 Epoch: 30 Global Step: 631540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:53,652-Speed 6315.67 samples/sec Loss 3.2963 LearningRate 0.0001 Epoch: 30 Global Step: 631550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:10:56,898-Speed 6310.02 samples/sec Loss 3.3550 LearningRate 0.0001 Epoch: 30 Global Step: 631560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:00,145-Speed 6308.19 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 30 Global Step: 631570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:03,391-Speed 6311.64 samples/sec Loss 3.3374 LearningRate 0.0001 Epoch: 30 Global Step: 631580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:06,638-Speed 6307.98 samples/sec Loss 3.3556 LearningRate 0.0001 Epoch: 30 Global Step: 631590 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:11:09,886-Speed 6307.47 samples/sec Loss 3.4056 LearningRate 0.0001 Epoch: 30 Global Step: 631600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:11:13,116-Speed 6341.89 samples/sec Loss 3.2617 LearningRate 0.0001 Epoch: 30 Global Step: 631610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:16,362-Speed 6311.21 samples/sec Loss 3.2875 LearningRate 0.0001 Epoch: 30 Global Step: 631620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:19,605-Speed 6317.88 samples/sec Loss 3.3989 LearningRate 0.0001 Epoch: 30 Global Step: 631630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:22,848-Speed 6315.40 samples/sec Loss 3.3663 LearningRate 0.0001 Epoch: 30 Global Step: 631640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:26,088-Speed 6321.95 samples/sec Loss 3.3334 LearningRate 0.0001 Epoch: 30 Global Step: 631650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:29,336-Speed 6307.97 samples/sec Loss 3.3545 LearningRate 0.0001 Epoch: 30 Global Step: 631660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:32,582-Speed 6311.11 samples/sec Loss 3.3687 LearningRate 0.0001 Epoch: 30 Global Step: 631670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:35,826-Speed 6312.71 samples/sec Loss 3.3462 LearningRate 0.0001 Epoch: 30 Global Step: 631680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:39,071-Speed 6313.16 samples/sec Loss 3.3844 LearningRate 0.0001 Epoch: 30 Global Step: 631690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:42,346-Speed 6254.68 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 30 Global Step: 631700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:45,576-Speed 6342.20 samples/sec Loss 3.3678 LearningRate 0.0001 Epoch: 30 Global Step: 631710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:11:48,822-Speed 6311.64 samples/sec Loss 3.3852 LearningRate 0.0001 Epoch: 30 Global Step: 631720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:52,078-Speed 6290.75 samples/sec Loss 3.3607 LearningRate 0.0001 Epoch: 30 Global Step: 631730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:55,324-Speed 6310.22 samples/sec Loss 3.3309 LearningRate 0.0001 Epoch: 30 Global Step: 631740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:11:58,569-Speed 6312.82 samples/sec Loss 3.4060 LearningRate 0.0001 Epoch: 30 Global Step: 631750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:01,820-Speed 6301.03 samples/sec Loss 3.3919 LearningRate 0.0001 Epoch: 30 Global Step: 631760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:05,065-Speed 6313.70 samples/sec Loss 3.3191 LearningRate 0.0001 Epoch: 30 Global Step: 631770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:08,314-Speed 6305.32 samples/sec Loss 3.2921 LearningRate 0.0001 Epoch: 30 Global Step: 631780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:11,559-Speed 6311.68 samples/sec Loss 3.3168 LearningRate 0.0001 Epoch: 30 Global Step: 631790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:14,807-Speed 6307.49 samples/sec Loss 3.2735 LearningRate 0.0001 Epoch: 30 Global Step: 631800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:18,039-Speed 6337.11 samples/sec Loss 3.3760 LearningRate 0.0001 Epoch: 30 Global Step: 631810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:21,285-Speed 6310.97 samples/sec Loss 3.3399 LearningRate 0.0001 Epoch: 30 Global Step: 631820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:24,534-Speed 6305.12 samples/sec Loss 3.2883 LearningRate 0.0001 Epoch: 30 Global Step: 631830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:27,781-Speed 6309.49 samples/sec Loss 
3.3869 LearningRate 0.0001 Epoch: 30 Global Step: 631840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:31,022-Speed 6320.28 samples/sec Loss 3.3690 LearningRate 0.0001 Epoch: 30 Global Step: 631850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:34,268-Speed 6310.24 samples/sec Loss 3.3508 LearningRate 0.0001 Epoch: 30 Global Step: 631860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:37,517-Speed 6306.86 samples/sec Loss 3.3185 LearningRate 0.0001 Epoch: 30 Global Step: 631870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:40,764-Speed 6308.01 samples/sec Loss 3.3677 LearningRate 0.0001 Epoch: 30 Global Step: 631880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:44,008-Speed 6313.65 samples/sec Loss 3.3544 LearningRate 0.0001 Epoch: 30 Global Step: 631890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:47,251-Speed 6317.78 samples/sec Loss 3.3995 LearningRate 0.0001 Epoch: 30 Global Step: 631900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:50,496-Speed 6311.16 samples/sec Loss 3.3252 LearningRate 0.0001 Epoch: 30 Global Step: 631910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:53,743-Speed 6309.30 samples/sec Loss 3.3022 LearningRate 0.0001 Epoch: 30 Global Step: 631920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:12:56,991-Speed 6306.94 samples/sec Loss 3.3495 LearningRate 0.0001 Epoch: 30 Global Step: 631930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:00,237-Speed 6311.70 samples/sec Loss 3.3312 LearningRate 0.0001 Epoch: 30 Global Step: 631940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:03,480-Speed 6316.68 samples/sec Loss 3.3245 LearningRate 0.0001 Epoch: 30 Global Step: 631950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:06,726-Speed 6310.82 samples/sec Loss 3.3057 LearningRate 0.0001 Epoch: 30 Global 
Step: 631960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:09,970-Speed 6313.06 samples/sec Loss 3.3803 LearningRate 0.0001 Epoch: 30 Global Step: 631970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:13,221-Speed 6312.10 samples/sec Loss 3.2968 LearningRate 0.0001 Epoch: 30 Global Step: 631980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:16,465-Speed 6315.20 samples/sec Loss 3.3434 LearningRate 0.0001 Epoch: 30 Global Step: 631990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:19,711-Speed 6312.14 samples/sec Loss 3.3606 LearningRate 0.0001 Epoch: 30 Global Step: 632000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:22,944-Speed 6334.76 samples/sec Loss 3.3274 LearningRate 0.0001 Epoch: 30 Global Step: 632010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:26,189-Speed 6311.88 samples/sec Loss 3.3120 LearningRate 0.0001 Epoch: 30 Global Step: 632020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:29,434-Speed 6313.08 samples/sec Loss 3.3904 LearningRate 0.0001 Epoch: 30 Global Step: 632030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:32,682-Speed 6308.00 samples/sec Loss 3.3856 LearningRate 0.0001 Epoch: 30 Global Step: 632040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:35,928-Speed 6310.75 samples/sec Loss 3.3709 LearningRate 0.0001 Epoch: 30 Global Step: 632050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:39,173-Speed 6315.13 samples/sec Loss 3.3014 LearningRate 0.0001 Epoch: 30 Global Step: 632060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:42,416-Speed 6316.65 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 30 Global Step: 632070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:45,661-Speed 6312.62 samples/sec Loss 3.4115 LearningRate 0.0001 Epoch: 30 Global Step: 632080 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:13:48,904-Speed 6314.87 samples/sec Loss 3.3966 LearningRate 0.0001 Epoch: 30 Global Step: 632090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:52,154-Speed 6303.78 samples/sec Loss 3.3858 LearningRate 0.0001 Epoch: 30 Global Step: 632100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:55,385-Speed 6340.27 samples/sec Loss 3.3633 LearningRate 0.0001 Epoch: 30 Global Step: 632110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:13:58,626-Speed 6319.66 samples/sec Loss 3.3749 LearningRate 0.0001 Epoch: 30 Global Step: 632120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:01,874-Speed 6310.85 samples/sec Loss 3.3393 LearningRate 0.0001 Epoch: 30 Global Step: 632130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:05,116-Speed 6318.56 samples/sec Loss 3.3410 LearningRate 0.0001 Epoch: 30 Global Step: 632140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:08,365-Speed 6304.42 samples/sec Loss 3.2945 LearningRate 0.0001 Epoch: 30 Global Step: 632150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:11,608-Speed 6317.18 samples/sec Loss 3.3719 LearningRate 0.0001 Epoch: 30 Global Step: 632160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:14,850-Speed 6317.71 samples/sec Loss 3.3381 LearningRate 0.0001 Epoch: 30 Global Step: 632170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:18,107-Speed 6288.54 samples/sec Loss 3.3109 LearningRate 0.0001 Epoch: 30 Global Step: 632180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:21,354-Speed 6310.70 samples/sec Loss 3.3105 LearningRate 0.0001 Epoch: 30 Global Step: 632190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:24,595-Speed 6318.77 samples/sec Loss 3.3331 LearningRate 0.0001 Epoch: 30 Global Step: 632200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:14:27,840-Speed 6312.81 samples/sec Loss 3.3046 LearningRate 0.0001 Epoch: 30 Global Step: 632210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:14:31,073-Speed 6337.69 samples/sec Loss 3.2982 LearningRate 0.0001 Epoch: 30 Global Step: 632220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:34,322-Speed 6304.62 samples/sec Loss 3.3122 LearningRate 0.0001 Epoch: 30 Global Step: 632230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:37,567-Speed 6311.42 samples/sec Loss 3.3594 LearningRate 0.0001 Epoch: 30 Global Step: 632240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:40,812-Speed 6312.30 samples/sec Loss 3.3217 LearningRate 0.0001 Epoch: 30 Global Step: 632250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:44,061-Speed 6306.09 samples/sec Loss 3.3595 LearningRate 0.0001 Epoch: 30 Global Step: 632260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:47,313-Speed 6299.21 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 30 Global Step: 632270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:50,557-Speed 6314.22 samples/sec Loss 3.2836 LearningRate 0.0001 Epoch: 30 Global Step: 632280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:53,803-Speed 6310.81 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 30 Global Step: 632290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:14:57,050-Speed 6309.91 samples/sec Loss 3.3241 LearningRate 0.0001 Epoch: 30 Global Step: 632300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:00,293-Speed 6317.46 samples/sec Loss 3.3911 LearningRate 0.0001 Epoch: 30 Global Step: 632310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:03,524-Speed 6338.32 samples/sec Loss 3.3601 LearningRate 0.0001 Epoch: 30 Global Step: 632320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:06,768-Speed 6315.78 samples/sec 
Loss 3.3411 LearningRate 0.0001 Epoch: 30 Global Step: 632330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:10,013-Speed 6311.23 samples/sec Loss 3.3558 LearningRate 0.0001 Epoch: 30 Global Step: 632340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:13,257-Speed 6314.65 samples/sec Loss 3.3819 LearningRate 0.0001 Epoch: 30 Global Step: 632350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:16,499-Speed 6318.66 samples/sec Loss 3.3205 LearningRate 0.0001 Epoch: 30 Global Step: 632360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:19,747-Speed 6306.66 samples/sec Loss 3.2927 LearningRate 0.0001 Epoch: 30 Global Step: 632370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:22,997-Speed 6303.35 samples/sec Loss 3.3448 LearningRate 0.0001 Epoch: 30 Global Step: 632380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:26,242-Speed 6311.97 samples/sec Loss 3.3054 LearningRate 0.0001 Epoch: 30 Global Step: 632390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:29,488-Speed 6312.12 samples/sec Loss 3.3821 LearningRate 0.0001 Epoch: 30 Global Step: 632400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:32,735-Speed 6308.08 samples/sec Loss 3.3337 LearningRate 0.0001 Epoch: 30 Global Step: 632410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:35,970-Speed 6331.49 samples/sec Loss 3.3734 LearningRate 0.0001 Epoch: 30 Global Step: 632420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:39,214-Speed 6315.03 samples/sec Loss 3.3928 LearningRate 0.0001 Epoch: 30 Global Step: 632430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:42,461-Speed 6309.79 samples/sec Loss 3.3872 LearningRate 0.0001 Epoch: 30 Global Step: 632440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:45,702-Speed 6320.79 samples/sec Loss 3.2986 LearningRate 0.0001 Epoch: 30 
Global Step: 632450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:48,950-Speed 6306.30 samples/sec Loss 3.3245 LearningRate 0.0001 Epoch: 30 Global Step: 632460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:52,196-Speed 6309.96 samples/sec Loss 3.3166 LearningRate 0.0001 Epoch: 30 Global Step: 632470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:55,448-Speed 6300.27 samples/sec Loss 3.3408 LearningRate 0.0001 Epoch: 30 Global Step: 632480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:15:58,693-Speed 6312.26 samples/sec Loss 3.2946 LearningRate 0.0001 Epoch: 30 Global Step: 632490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:01,949-Speed 6291.02 samples/sec Loss 3.4126 LearningRate 0.0001 Epoch: 30 Global Step: 632500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:05,195-Speed 6311.17 samples/sec Loss 3.3435 LearningRate 0.0001 Epoch: 30 Global Step: 632510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:08,426-Speed 6341.27 samples/sec Loss 3.3500 LearningRate 0.0001 Epoch: 30 Global Step: 632520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:11,670-Speed 6313.44 samples/sec Loss 3.3437 LearningRate 0.0001 Epoch: 30 Global Step: 632530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:14,916-Speed 6311.20 samples/sec Loss 3.3704 LearningRate 0.0001 Epoch: 30 Global Step: 632540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:18,174-Speed 6287.93 samples/sec Loss 3.2728 LearningRate 0.0001 Epoch: 30 Global Step: 632550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:21,422-Speed 6306.08 samples/sec Loss 3.3876 LearningRate 0.0001 Epoch: 30 Global Step: 632560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:24,668-Speed 6310.74 samples/sec Loss 3.2509 LearningRate 0.0001 Epoch: 30 Global Step: 632570 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:16:27,913-Speed 6312.74 samples/sec Loss 3.3278 LearningRate 0.0001 Epoch: 30 Global Step: 632580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:31,162-Speed 6305.24 samples/sec Loss 3.3526 LearningRate 0.0001 Epoch: 30 Global Step: 632590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:34,413-Speed 6300.05 samples/sec Loss 3.3859 LearningRate 0.0001 Epoch: 30 Global Step: 632600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:37,661-Speed 6308.20 samples/sec Loss 3.2955 LearningRate 0.0001 Epoch: 30 Global Step: 632610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:40,891-Speed 6341.34 samples/sec Loss 3.3787 LearningRate 0.0001 Epoch: 30 Global Step: 632620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:44,137-Speed 6310.45 samples/sec Loss 3.3416 LearningRate 0.0001 Epoch: 30 Global Step: 632630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:47,384-Speed 6309.53 samples/sec Loss 3.4175 LearningRate 0.0001 Epoch: 30 Global Step: 632640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:50,631-Speed 6307.70 samples/sec Loss 3.3030 LearningRate 0.0001 Epoch: 30 Global Step: 632650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:53,880-Speed 6304.57 samples/sec Loss 3.3316 LearningRate 0.0001 Epoch: 30 Global Step: 632660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:16:57,124-Speed 6314.70 samples/sec Loss 3.3485 LearningRate 0.0001 Epoch: 30 Global Step: 632670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:00,373-Speed 6306.62 samples/sec Loss 3.3375 LearningRate 0.0001 Epoch: 30 Global Step: 632680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:03,623-Speed 6301.85 samples/sec Loss 3.3030 LearningRate 0.0001 Epoch: 30 Global Step: 632690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:17:06,876-Speed 6298.11 samples/sec Loss 3.3261 LearningRate 0.0001 Epoch: 30 Global Step: 632700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:10,132-Speed 6291.32 samples/sec Loss 3.2983 LearningRate 0.0001 Epoch: 30 Global Step: 632710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:13,382-Speed 6303.03 samples/sec Loss 3.3169 LearningRate 0.0001 Epoch: 30 Global Step: 632720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:16,626-Speed 6314.52 samples/sec Loss 3.2975 LearningRate 0.0001 Epoch: 30 Global Step: 632730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:19,869-Speed 6316.04 samples/sec Loss 3.3134 LearningRate 0.0001 Epoch: 30 Global Step: 632740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:23,118-Speed 6305.52 samples/sec Loss 3.3501 LearningRate 0.0001 Epoch: 30 Global Step: 632750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:26,358-Speed 6321.97 samples/sec Loss 3.3443 LearningRate 0.0001 Epoch: 30 Global Step: 632760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:29,604-Speed 6310.48 samples/sec Loss 3.2910 LearningRate 0.0001 Epoch: 30 Global Step: 632770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:32,849-Speed 6314.40 samples/sec Loss 3.3550 LearningRate 0.0001 Epoch: 30 Global Step: 632780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:36,110-Speed 6281.34 samples/sec Loss 3.2595 LearningRate 0.0001 Epoch: 30 Global Step: 632790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:39,352-Speed 6318.63 samples/sec Loss 3.3306 LearningRate 0.0001 Epoch: 30 Global Step: 632800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:42,597-Speed 6312.26 samples/sec Loss 3.3426 LearningRate 0.0001 Epoch: 30 Global Step: 632810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:45,826-Speed 6343.88 samples/sec Loss 
3.3354 LearningRate 0.0001 Epoch: 30 Global Step: 632820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:49,069-Speed 6315.60 samples/sec Loss 3.3279 LearningRate 0.0001 Epoch: 30 Global Step: 632830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:52,321-Speed 6299.89 samples/sec Loss 3.3543 LearningRate 0.0001 Epoch: 30 Global Step: 632840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:55,568-Speed 6308.54 samples/sec Loss 3.3621 LearningRate 0.0001 Epoch: 30 Global Step: 632850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:17:58,812-Speed 6314.62 samples/sec Loss 3.2932 LearningRate 0.0001 Epoch: 30 Global Step: 632860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:02,057-Speed 6312.24 samples/sec Loss 3.3126 LearningRate 0.0001 Epoch: 30 Global Step: 632870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:05,392-Speed 6141.89 samples/sec Loss 3.2708 LearningRate 0.0001 Epoch: 30 Global Step: 632880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:08,647-Speed 6293.90 samples/sec Loss 3.3340 LearningRate 0.0001 Epoch: 30 Global Step: 632890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:11,891-Speed 6314.11 samples/sec Loss 3.3262 LearningRate 0.0001 Epoch: 30 Global Step: 632900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:15,133-Speed 6318.20 samples/sec Loss 3.4006 LearningRate 0.0001 Epoch: 30 Global Step: 632910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:18,380-Speed 6309.48 samples/sec Loss 3.3276 LearningRate 0.0001 Epoch: 30 Global Step: 632920 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:18:21,608-Speed 6347.26 samples/sec Loss 3.3488 LearningRate 0.0001 Epoch: 30 Global Step: 632930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:24,862-Speed 6295.86 samples/sec Loss 3.3093 LearningRate 0.0001 Epoch: 30 
Global Step: 632940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:28,106-Speed 6314.54 samples/sec Loss 3.4100 LearningRate 0.0001 Epoch: 30 Global Step: 632950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:31,350-Speed 6313.63 samples/sec Loss 3.3417 LearningRate 0.0001 Epoch: 30 Global Step: 632960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:34,601-Speed 6301.17 samples/sec Loss 3.3785 LearningRate 0.0001 Epoch: 30 Global Step: 632970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:37,845-Speed 6315.59 samples/sec Loss 3.3844 LearningRate 0.0001 Epoch: 30 Global Step: 632980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:41,088-Speed 6315.05 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 632990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:44,337-Speed 6305.14 samples/sec Loss 3.3108 LearningRate 0.0001 Epoch: 30 Global Step: 633000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:47,587-Speed 6303.91 samples/sec Loss 3.3596 LearningRate 0.0001 Epoch: 30 Global Step: 633010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:50,831-Speed 6313.35 samples/sec Loss 3.3210 LearningRate 0.0001 Epoch: 30 Global Step: 633020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:54,059-Speed 6345.74 samples/sec Loss 3.3923 LearningRate 0.0001 Epoch: 30 Global Step: 633030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:18:57,308-Speed 6306.33 samples/sec Loss 3.3880 LearningRate 0.0001 Epoch: 30 Global Step: 633040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:00,558-Speed 6303.03 samples/sec Loss 3.3420 LearningRate 0.0001 Epoch: 30 Global Step: 633050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:03,801-Speed 6316.15 samples/sec Loss 3.3991 LearningRate 0.0001 Epoch: 30 Global Step: 633060 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:19:07,049-Speed 6307.43 samples/sec Loss 3.3310 LearningRate 0.0001 Epoch: 30 Global Step: 633070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:10,293-Speed 6313.27 samples/sec Loss 3.3770 LearningRate 0.0001 Epoch: 30 Global Step: 633080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:13,544-Speed 6302.19 samples/sec Loss 3.3867 LearningRate 0.0001 Epoch: 30 Global Step: 633090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:16,787-Speed 6315.68 samples/sec Loss 3.3827 LearningRate 0.0001 Epoch: 30 Global Step: 633100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:20,034-Speed 6308.96 samples/sec Loss 3.3789 LearningRate 0.0001 Epoch: 30 Global Step: 633110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:23,276-Speed 6318.62 samples/sec Loss 3.4035 LearningRate 0.0001 Epoch: 30 Global Step: 633120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:26,511-Speed 6331.23 samples/sec Loss 3.3116 LearningRate 0.0001 Epoch: 30 Global Step: 633130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:29,760-Speed 6304.59 samples/sec Loss 3.3735 LearningRate 0.0001 Epoch: 30 Global Step: 633140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:33,010-Speed 6304.24 samples/sec Loss 3.2796 LearningRate 0.0001 Epoch: 30 Global Step: 633150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:36,258-Speed 6307.44 samples/sec Loss 3.3296 LearningRate 0.0001 Epoch: 30 Global Step: 633160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:39,505-Speed 6309.96 samples/sec Loss 3.3549 LearningRate 0.0001 Epoch: 30 Global Step: 633170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:42,747-Speed 6317.58 samples/sec Loss 3.2803 LearningRate 0.0001 Epoch: 30 Global Step: 633180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:19:45,994-Speed 6308.22 samples/sec Loss 3.2849 LearningRate 0.0001 Epoch: 30 Global Step: 633190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:49,239-Speed 6313.89 samples/sec Loss 3.3407 LearningRate 0.0001 Epoch: 30 Global Step: 633200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:52,481-Speed 6318.69 samples/sec Loss 3.3361 LearningRate 0.0001 Epoch: 30 Global Step: 633210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:55,722-Speed 6319.10 samples/sec Loss 3.3872 LearningRate 0.0001 Epoch: 30 Global Step: 633220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:19:58,953-Speed 6341.29 samples/sec Loss 3.3198 LearningRate 0.0001 Epoch: 30 Global Step: 633230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:02,196-Speed 6316.24 samples/sec Loss 3.3956 LearningRate 0.0001 Epoch: 30 Global Step: 633240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:05,441-Speed 6311.27 samples/sec Loss 3.3556 LearningRate 0.0001 Epoch: 30 Global Step: 633250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:08,683-Speed 6319.06 samples/sec Loss 3.3180 LearningRate 0.0001 Epoch: 30 Global Step: 633260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:11,928-Speed 6313.50 samples/sec Loss 3.3160 LearningRate 0.0001 Epoch: 30 Global Step: 633270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:15,178-Speed 6302.50 samples/sec Loss 3.3337 LearningRate 0.0001 Epoch: 30 Global Step: 633280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:18,422-Speed 6313.88 samples/sec Loss 3.3225 LearningRate 0.0001 Epoch: 30 Global Step: 633290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:21,672-Speed 6303.60 samples/sec Loss 3.3271 LearningRate 0.0001 Epoch: 30 Global Step: 633300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:24,916-Speed 6315.33 samples/sec Loss 
3.3396 LearningRate 0.0001 Epoch: 30 Global Step: 633310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:28,162-Speed 6310.81 samples/sec Loss 3.3581 LearningRate 0.0001 Epoch: 30 Global Step: 633320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:31,406-Speed 6313.09 samples/sec Loss 3.3322 LearningRate 0.0001 Epoch: 30 Global Step: 633330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:20:34,633-Speed 6347.40 samples/sec Loss 3.3562 LearningRate 0.0001 Epoch: 30 Global Step: 633340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:37,878-Speed 6312.61 samples/sec Loss 3.4026 LearningRate 0.0001 Epoch: 30 Global Step: 633350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:41,126-Speed 6306.88 samples/sec Loss 3.3079 LearningRate 0.0001 Epoch: 30 Global Step: 633360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:44,372-Speed 6311.78 samples/sec Loss 3.2955 LearningRate 0.0001 Epoch: 30 Global Step: 633370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:47,619-Speed 6310.25 samples/sec Loss 3.3144 LearningRate 0.0001 Epoch: 30 Global Step: 633380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:50,862-Speed 6315.28 samples/sec Loss 3.3373 LearningRate 0.0001 Epoch: 30 Global Step: 633390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:54,111-Speed 6305.13 samples/sec Loss 3.3415 LearningRate 0.0001 Epoch: 30 Global Step: 633400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:20:57,354-Speed 6317.47 samples/sec Loss 3.3911 LearningRate 0.0001 Epoch: 30 Global Step: 633410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:00,598-Speed 6313.56 samples/sec Loss 3.3252 LearningRate 0.0001 Epoch: 30 Global Step: 633420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:03,852-Speed 6295.92 samples/sec Loss 3.3420 LearningRate 0.0001 Epoch: 30 
Global Step: 633430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:07,083-Speed 6339.05 samples/sec Loss 3.3742 LearningRate 0.0001 Epoch: 30 Global Step: 633440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:10,327-Speed 6314.43 samples/sec Loss 3.3413 LearningRate 0.0001 Epoch: 30 Global Step: 633450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:13,572-Speed 6313.34 samples/sec Loss 3.3257 LearningRate 0.0001 Epoch: 30 Global Step: 633460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:16,873-Speed 6205.24 samples/sec Loss 3.4162 LearningRate 0.0001 Epoch: 30 Global Step: 633470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:20,124-Speed 6300.65 samples/sec Loss 3.3346 LearningRate 0.0001 Epoch: 30 Global Step: 633480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:23,371-Speed 6309.73 samples/sec Loss 3.3309 LearningRate 0.0001 Epoch: 30 Global Step: 633490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:26,618-Speed 6307.31 samples/sec Loss 3.3049 LearningRate 0.0001 Epoch: 30 Global Step: 633500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:29,866-Speed 6310.84 samples/sec Loss 3.3306 LearningRate 0.0001 Epoch: 30 Global Step: 633510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:33,112-Speed 6311.37 samples/sec Loss 3.3918 LearningRate 0.0001 Epoch: 30 Global Step: 633520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:36,360-Speed 6306.17 samples/sec Loss 3.3169 LearningRate 0.0001 Epoch: 30 Global Step: 633530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:39,605-Speed 6312.21 samples/sec Loss 3.3425 LearningRate 0.0001 Epoch: 30 Global Step: 633540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:21:42,842-Speed 6331.27 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 633550 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:21:46,087-Speed 6312.82 samples/sec Loss 3.3180 LearningRate 0.0001 Epoch: 30 Global Step: 633560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:49,331-Speed 6313.49 samples/sec Loss 3.3661 LearningRate 0.0001 Epoch: 30 Global Step: 633570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:52,583-Speed 6299.33 samples/sec Loss 3.3318 LearningRate 0.0001 Epoch: 30 Global Step: 633580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:55,827-Speed 6314.98 samples/sec Loss 3.3704 LearningRate 0.0001 Epoch: 30 Global Step: 633590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:21:59,073-Speed 6310.63 samples/sec Loss 3.3288 LearningRate 0.0001 Epoch: 30 Global Step: 633600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:02,318-Speed 6313.20 samples/sec Loss 3.3300 LearningRate 0.0001 Epoch: 30 Global Step: 633610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:05,562-Speed 6314.80 samples/sec Loss 3.3578 LearningRate 0.0001 Epoch: 30 Global Step: 633620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:08,807-Speed 6312.06 samples/sec Loss 3.3136 LearningRate 0.0001 Epoch: 30 Global Step: 633630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:12,052-Speed 6313.25 samples/sec Loss 3.3339 LearningRate 0.0001 Epoch: 30 Global Step: 633640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:15,284-Speed 6337.02 samples/sec Loss 3.3455 LearningRate 0.0001 Epoch: 30 Global Step: 633650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:18,530-Speed 6312.40 samples/sec Loss 3.2951 LearningRate 0.0001 Epoch: 30 Global Step: 633660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:21,776-Speed 6309.33 samples/sec Loss 3.3128 LearningRate 0.0001 Epoch: 30 Global Step: 633670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:22:25,029-Speed 6298.34 samples/sec Loss 3.3686 LearningRate 0.0001 Epoch: 30 Global Step: 633680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:28,279-Speed 6301.88 samples/sec Loss 3.3038 LearningRate 0.0001 Epoch: 30 Global Step: 633690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:31,521-Speed 6318.21 samples/sec Loss 3.3376 LearningRate 0.0001 Epoch: 30 Global Step: 633700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:34,767-Speed 6310.60 samples/sec Loss 3.3755 LearningRate 0.0001 Epoch: 30 Global Step: 633710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:38,008-Speed 6321.82 samples/sec Loss 3.2719 LearningRate 0.0001 Epoch: 30 Global Step: 633720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:41,251-Speed 6315.31 samples/sec Loss 3.2766 LearningRate 0.0001 Epoch: 30 Global Step: 633730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:44,497-Speed 6310.21 samples/sec Loss 3.3921 LearningRate 0.0001 Epoch: 30 Global Step: 633740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:47,726-Speed 6344.34 samples/sec Loss 3.3641 LearningRate 0.0001 Epoch: 30 Global Step: 633750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:50,974-Speed 6307.12 samples/sec Loss 3.3366 LearningRate 0.0001 Epoch: 30 Global Step: 633760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:54,218-Speed 6314.46 samples/sec Loss 3.2839 LearningRate 0.0001 Epoch: 30 Global Step: 633770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:22:57,464-Speed 6310.92 samples/sec Loss 3.3006 LearningRate 0.0001 Epoch: 30 Global Step: 633780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:00,708-Speed 6315.81 samples/sec Loss 3.3284 LearningRate 0.0001 Epoch: 30 Global Step: 633790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:03,950-Speed 6317.69 samples/sec Loss 
3.3782 LearningRate 0.0001 Epoch: 30 Global Step: 633800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:07,194-Speed 6313.56 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 633810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:10,449-Speed 6294.02 samples/sec Loss 3.3464 LearningRate 0.0001 Epoch: 30 Global Step: 633820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:13,694-Speed 6313.88 samples/sec Loss 3.3357 LearningRate 0.0001 Epoch: 30 Global Step: 633830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:16,937-Speed 6316.64 samples/sec Loss 3.2870 LearningRate 0.0001 Epoch: 30 Global Step: 633840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:20,182-Speed 6312.67 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 30 Global Step: 633850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:23:23,411-Speed 6344.56 samples/sec Loss 3.4117 LearningRate 0.0001 Epoch: 30 Global Step: 633860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:26,654-Speed 6315.22 samples/sec Loss 3.3027 LearningRate 0.0001 Epoch: 30 Global Step: 633870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:29,906-Speed 6299.50 samples/sec Loss 3.3094 LearningRate 0.0001 Epoch: 30 Global Step: 633880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:33,154-Speed 6307.37 samples/sec Loss 3.3235 LearningRate 0.0001 Epoch: 30 Global Step: 633890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:36,400-Speed 6310.79 samples/sec Loss 3.3336 LearningRate 0.0001 Epoch: 30 Global Step: 633900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:39,644-Speed 6313.31 samples/sec Loss 3.3027 LearningRate 0.0001 Epoch: 30 Global Step: 633910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:42,890-Speed 6311.20 samples/sec Loss 3.2764 LearningRate 0.0001 Epoch: 30 
Global Step: 633920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:46,139-Speed 6305.07 samples/sec Loss 3.3352 LearningRate 0.0001 Epoch: 30 Global Step: 633930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:49,387-Speed 6307.31 samples/sec Loss 3.3508 LearningRate 0.0001 Epoch: 30 Global Step: 633940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:52,629-Speed 6318.94 samples/sec Loss 3.3569 LearningRate 0.0001 Epoch: 30 Global Step: 633950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:55,867-Speed 6325.42 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 30 Global Step: 633960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:23:59,145-Speed 6249.20 samples/sec Loss 3.2988 LearningRate 0.0001 Epoch: 30 Global Step: 633970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:02,410-Speed 6273.95 samples/sec Loss 3.3081 LearningRate 0.0001 Epoch: 30 Global Step: 633980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:05,691-Speed 6244.17 samples/sec Loss 3.3483 LearningRate 0.0001 Epoch: 30 Global Step: 633990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:08,935-Speed 6313.55 samples/sec Loss 3.3443 LearningRate 0.0001 Epoch: 30 Global Step: 634000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:12,176-Speed 6321.19 samples/sec Loss 3.3219 LearningRate 0.0001 Epoch: 30 Global Step: 634010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:15,425-Speed 6303.82 samples/sec Loss 3.3942 LearningRate 0.0001 Epoch: 30 Global Step: 634020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:18,670-Speed 6314.05 samples/sec Loss 3.3297 LearningRate 0.0001 Epoch: 30 Global Step: 634030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:21,917-Speed 6308.82 samples/sec Loss 3.2743 LearningRate 0.0001 Epoch: 30 Global Step: 634040 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:24:25,161-Speed 6314.74 samples/sec Loss 3.3498 LearningRate 0.0001 Epoch: 30 Global Step: 634050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:28,390-Speed 6343.67 samples/sec Loss 3.3690 LearningRate 0.0001 Epoch: 30 Global Step: 634060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:31,646-Speed 6292.69 samples/sec Loss 3.3321 LearningRate 0.0001 Epoch: 30 Global Step: 634070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:34,892-Speed 6310.72 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 30 Global Step: 634080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:38,135-Speed 6314.76 samples/sec Loss 3.3376 LearningRate 0.0001 Epoch: 30 Global Step: 634090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:41,380-Speed 6314.61 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 30 Global Step: 634100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:44,621-Speed 6319.52 samples/sec Loss 3.3455 LearningRate 0.0001 Epoch: 30 Global Step: 634110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:47,868-Speed 6308.21 samples/sec Loss 3.3497 LearningRate 0.0001 Epoch: 30 Global Step: 634120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:51,115-Speed 6308.92 samples/sec Loss 3.2858 LearningRate 0.0001 Epoch: 30 Global Step: 634130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:54,362-Speed 6309.88 samples/sec Loss 3.3244 LearningRate 0.0001 Epoch: 30 Global Step: 634140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:24:57,608-Speed 6310.80 samples/sec Loss 3.3627 LearningRate 0.0001 Epoch: 30 Global Step: 634150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:00,837-Speed 6342.70 samples/sec Loss 3.3661 LearningRate 0.0001 Epoch: 30 Global Step: 634160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:25:04,084-Speed 6308.74 samples/sec Loss 3.2534 LearningRate 0.0001 Epoch: 30 Global Step: 634170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:07,331-Speed 6309.30 samples/sec Loss 3.3315 LearningRate 0.0001 Epoch: 30 Global Step: 634180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:10,581-Speed 6302.49 samples/sec Loss 3.3126 LearningRate 0.0001 Epoch: 30 Global Step: 634190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:13,825-Speed 6314.18 samples/sec Loss 3.2498 LearningRate 0.0001 Epoch: 30 Global Step: 634200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:17,072-Speed 6310.37 samples/sec Loss 3.3167 LearningRate 0.0001 Epoch: 30 Global Step: 634210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:20,318-Speed 6309.36 samples/sec Loss 3.3134 LearningRate 0.0001 Epoch: 30 Global Step: 634220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:23,559-Speed 6321.31 samples/sec Loss 3.3346 LearningRate 0.0001 Epoch: 30 Global Step: 634230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:26,805-Speed 6311.24 samples/sec Loss 3.3501 LearningRate 0.0001 Epoch: 30 Global Step: 634240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:30,048-Speed 6315.30 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 30 Global Step: 634250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:33,289-Speed 6321.34 samples/sec Loss 3.2663 LearningRate 0.0001 Epoch: 30 Global Step: 634260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:25:36,520-Speed 6338.98 samples/sec Loss 3.2970 LearningRate 0.0001 Epoch: 30 Global Step: 634270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:39,766-Speed 6312.83 samples/sec Loss 3.3442 LearningRate 0.0001 Epoch: 30 Global Step: 634280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:43,014-Speed 6305.75 samples/sec 
Loss 3.3304 LearningRate 0.0001 Epoch: 30 Global Step: 634290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:46,257-Speed 6316.74 samples/sec Loss 3.3234 LearningRate 0.0001 Epoch: 30 Global Step: 634300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:49,502-Speed 6312.97 samples/sec Loss 3.3399 LearningRate 0.0001 Epoch: 30 Global Step: 634310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:52,746-Speed 6315.79 samples/sec Loss 3.3167 LearningRate 0.0001 Epoch: 30 Global Step: 634320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:55,993-Speed 6308.31 samples/sec Loss 3.3377 LearningRate 0.0001 Epoch: 30 Global Step: 634330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:25:59,238-Speed 6311.75 samples/sec Loss 3.2519 LearningRate 0.0001 Epoch: 30 Global Step: 634340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:02,483-Speed 6312.61 samples/sec Loss 3.3506 LearningRate 0.0001 Epoch: 30 Global Step: 634350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:05,731-Speed 6306.39 samples/sec Loss 3.3000 LearningRate 0.0001 Epoch: 30 Global Step: 634360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:08,959-Speed 6346.00 samples/sec Loss 3.3438 LearningRate 0.0001 Epoch: 30 Global Step: 634370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:12,208-Speed 6305.51 samples/sec Loss 3.3193 LearningRate 0.0001 Epoch: 30 Global Step: 634380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:15,454-Speed 6311.42 samples/sec Loss 3.3103 LearningRate 0.0001 Epoch: 30 Global Step: 634390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:18,699-Speed 6312.29 samples/sec Loss 3.3449 LearningRate 0.0001 Epoch: 30 Global Step: 634400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:21,945-Speed 6311.03 samples/sec Loss 3.3094 LearningRate 0.0001 Epoch: 30 
Global Step: 634410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:25,193-Speed 6306.37 samples/sec Loss 3.3042 LearningRate 0.0001 Epoch: 30 Global Step: 634420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:28,438-Speed 6311.87 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 30 Global Step: 634430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:31,687-Speed 6305.87 samples/sec Loss 3.2904 LearningRate 0.0001 Epoch: 30 Global Step: 634440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:34,931-Speed 6314.20 samples/sec Loss 3.3287 LearningRate 0.0001 Epoch: 30 Global Step: 634450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:38,175-Speed 6313.68 samples/sec Loss 3.2834 LearningRate 0.0001 Epoch: 30 Global Step: 634460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:41,405-Speed 6344.31 samples/sec Loss 3.3098 LearningRate 0.0001 Epoch: 30 Global Step: 634470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:44,648-Speed 6314.46 samples/sec Loss 3.3388 LearningRate 0.0001 Epoch: 30 Global Step: 634480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:47,896-Speed 6308.32 samples/sec Loss 3.2383 LearningRate 0.0001 Epoch: 30 Global Step: 634490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:51,140-Speed 6314.11 samples/sec Loss 3.3377 LearningRate 0.0001 Epoch: 30 Global Step: 634500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:54,387-Speed 6310.11 samples/sec Loss 3.2708 LearningRate 0.0001 Epoch: 30 Global Step: 634510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:26:57,636-Speed 6304.50 samples/sec Loss 3.3429 LearningRate 0.0001 Epoch: 30 Global Step: 634520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:00,884-Speed 6307.87 samples/sec Loss 3.4013 LearningRate 0.0001 Epoch: 30 Global Step: 634530 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:27:04,132-Speed 6305.07 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 634540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:07,380-Speed 6307.50 samples/sec Loss 3.3217 LearningRate 0.0001 Epoch: 30 Global Step: 634550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:10,626-Speed 6310.32 samples/sec Loss 3.3622 LearningRate 0.0001 Epoch: 30 Global Step: 634560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:13,855-Speed 6344.11 samples/sec Loss 3.3134 LearningRate 0.0001 Epoch: 30 Global Step: 634570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:17,097-Speed 6317.91 samples/sec Loss 3.3514 LearningRate 0.0001 Epoch: 30 Global Step: 634580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:20,346-Speed 6305.34 samples/sec Loss 3.3221 LearningRate 0.0001 Epoch: 30 Global Step: 634590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:23,593-Speed 6308.92 samples/sec Loss 3.3132 LearningRate 0.0001 Epoch: 30 Global Step: 634600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:26,839-Speed 6311.70 samples/sec Loss 3.3257 LearningRate 0.0001 Epoch: 30 Global Step: 634610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:30,083-Speed 6314.46 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 30 Global Step: 634620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:33,336-Speed 6296.73 samples/sec Loss 3.2748 LearningRate 0.0001 Epoch: 30 Global Step: 634630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:36,584-Speed 6306.13 samples/sec Loss 3.3133 LearningRate 0.0001 Epoch: 30 Global Step: 634640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:39,827-Speed 6316.65 samples/sec Loss 3.2869 LearningRate 0.0001 Epoch: 30 Global Step: 634650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:27:43,072-Speed 6312.22 samples/sec Loss 3.3338 LearningRate 0.0001 Epoch: 30 Global Step: 634660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:46,302-Speed 6341.90 samples/sec Loss 3.3060 LearningRate 0.0001 Epoch: 30 Global Step: 634670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:49,548-Speed 6311.48 samples/sec Loss 3.3044 LearningRate 0.0001 Epoch: 30 Global Step: 634680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:52,792-Speed 6315.20 samples/sec Loss 3.2849 LearningRate 0.0001 Epoch: 30 Global Step: 634690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:56,034-Speed 6316.91 samples/sec Loss 3.2817 LearningRate 0.0001 Epoch: 30 Global Step: 634700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:27:59,280-Speed 6311.16 samples/sec Loss 3.2895 LearningRate 0.0001 Epoch: 30 Global Step: 634710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:02,527-Speed 6309.93 samples/sec Loss 3.3375 LearningRate 0.0001 Epoch: 30 Global Step: 634720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:05,772-Speed 6312.81 samples/sec Loss 3.3359 LearningRate 0.0001 Epoch: 30 Global Step: 634730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:09,016-Speed 6315.38 samples/sec Loss 3.3545 LearningRate 0.0001 Epoch: 30 Global Step: 634740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:12,262-Speed 6311.44 samples/sec Loss 3.3162 LearningRate 0.0001 Epoch: 30 Global Step: 634750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:15,504-Speed 6317.48 samples/sec Loss 3.3274 LearningRate 0.0001 Epoch: 30 Global Step: 634760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:18,753-Speed 6304.22 samples/sec Loss 3.3063 LearningRate 0.0001 Epoch: 30 Global Step: 634770 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:28:21,983-Speed 6342.96 samples/sec 
Loss 3.3816 LearningRate 0.0001 Epoch: 30 Global Step: 634780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:25,227-Speed 6315.31 samples/sec Loss 3.3177 LearningRate 0.0001 Epoch: 30 Global Step: 634790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:28,472-Speed 6311.13 samples/sec Loss 3.3302 LearningRate 0.0001 Epoch: 30 Global Step: 634800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:31,716-Speed 6314.96 samples/sec Loss 3.3590 LearningRate 0.0001 Epoch: 30 Global Step: 634810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:34,963-Speed 6307.96 samples/sec Loss 3.3809 LearningRate 0.0001 Epoch: 30 Global Step: 634820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:38,209-Speed 6311.16 samples/sec Loss 3.3006 LearningRate 0.0001 Epoch: 30 Global Step: 634830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:41,456-Speed 6310.05 samples/sec Loss 3.3289 LearningRate 0.0001 Epoch: 30 Global Step: 634840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:44,703-Speed 6306.92 samples/sec Loss 3.3599 LearningRate 0.0001 Epoch: 30 Global Step: 634850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:47,953-Speed 6304.14 samples/sec Loss 3.3298 LearningRate 0.0001 Epoch: 30 Global Step: 634860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:51,200-Speed 6308.17 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 30 Global Step: 634870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:54,430-Speed 6341.90 samples/sec Loss 3.3170 LearningRate 0.0001 Epoch: 30 Global Step: 634880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:28:57,674-Speed 6315.52 samples/sec Loss 3.3413 LearningRate 0.0001 Epoch: 30 Global Step: 634890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:00,921-Speed 6308.50 samples/sec Loss 3.3280 LearningRate 0.0001 Epoch: 30 
Global Step: 634900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:04,169-Speed 6306.85 samples/sec Loss 3.3036 LearningRate 0.0001 Epoch: 30 Global Step: 634910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:07,417-Speed 6306.67 samples/sec Loss 3.3139 LearningRate 0.0001 Epoch: 30 Global Step: 634920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:10,670-Speed 6298.10 samples/sec Loss 3.3420 LearningRate 0.0001 Epoch: 30 Global Step: 634930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:13,916-Speed 6310.62 samples/sec Loss 3.3720 LearningRate 0.0001 Epoch: 30 Global Step: 634940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:17,160-Speed 6314.20 samples/sec Loss 3.3200 LearningRate 0.0001 Epoch: 30 Global Step: 634950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:20,408-Speed 6306.69 samples/sec Loss 3.3678 LearningRate 0.0001 Epoch: 30 Global Step: 634960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:23,653-Speed 6313.53 samples/sec Loss 3.2962 LearningRate 0.0001 Epoch: 30 Global Step: 634970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:26,881-Speed 6346.34 samples/sec Loss 3.3222 LearningRate 0.0001 Epoch: 30 Global Step: 634980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:30,140-Speed 6284.77 samples/sec Loss 3.3509 LearningRate 0.0001 Epoch: 30 Global Step: 634990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:33,381-Speed 6319.55 samples/sec Loss 3.3582 LearningRate 0.0001 Epoch: 30 Global Step: 635000 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:36,626-Speed 6312.42 samples/sec Loss 3.3290 LearningRate 0.0001 Epoch: 30 Global Step: 635010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:39,871-Speed 6312.87 samples/sec Loss 3.3512 LearningRate 0.0001 Epoch: 30 Global Step: 635020 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:29:43,113-Speed 6318.81 samples/sec Loss 3.3553 LearningRate 0.0001 Epoch: 30 Global Step: 635030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:46,363-Speed 6304.27 samples/sec Loss 3.3028 LearningRate 0.0001 Epoch: 30 Global Step: 635040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:49,612-Speed 6304.82 samples/sec Loss 3.3400 LearningRate 0.0001 Epoch: 30 Global Step: 635050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:52,856-Speed 6314.65 samples/sec Loss 3.3134 LearningRate 0.0001 Epoch: 30 Global Step: 635060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:56,101-Speed 6312.26 samples/sec Loss 3.3476 LearningRate 0.0001 Epoch: 30 Global Step: 635070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:29:59,349-Speed 6305.98 samples/sec Loss 3.3708 LearningRate 0.0001 Epoch: 30 Global Step: 635080 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:30:02,582-Speed 6336.68 samples/sec Loss 3.3718 LearningRate 0.0001 Epoch: 30 Global Step: 635090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:05,828-Speed 6310.06 samples/sec Loss 3.3410 LearningRate 0.0001 Epoch: 30 Global Step: 635100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:09,078-Speed 6302.51 samples/sec Loss 3.3141 LearningRate 0.0001 Epoch: 30 Global Step: 635110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:12,322-Speed 6314.81 samples/sec Loss 3.2959 LearningRate 0.0001 Epoch: 30 Global Step: 635120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:15,575-Speed 6298.27 samples/sec Loss 3.3369 LearningRate 0.0001 Epoch: 30 Global Step: 635130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:18,826-Speed 6299.41 samples/sec Loss 3.3120 LearningRate 0.0001 Epoch: 30 Global Step: 635140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:30:22,072-Speed 6312.05 samples/sec Loss 3.3499 LearningRate 0.0001 Epoch: 30 Global Step: 635150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:25,317-Speed 6312.96 samples/sec Loss 3.3175 LearningRate 0.0001 Epoch: 30 Global Step: 635160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:28,566-Speed 6307.99 samples/sec Loss 3.3893 LearningRate 0.0001 Epoch: 30 Global Step: 635170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:31,810-Speed 6313.63 samples/sec Loss 3.2617 LearningRate 0.0001 Epoch: 30 Global Step: 635180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:35,042-Speed 6337.23 samples/sec Loss 3.3079 LearningRate 0.0001 Epoch: 30 Global Step: 635190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:38,291-Speed 6305.88 samples/sec Loss 3.2505 LearningRate 0.0001 Epoch: 30 Global Step: 635200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:41,573-Speed 6241.21 samples/sec Loss 3.3815 LearningRate 0.0001 Epoch: 30 Global Step: 635210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:44,831-Speed 6287.66 samples/sec Loss 3.2962 LearningRate 0.0001 Epoch: 30 Global Step: 635220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:48,084-Speed 6296.61 samples/sec Loss 3.3045 LearningRate 0.0001 Epoch: 30 Global Step: 635230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:51,339-Speed 6294.20 samples/sec Loss 3.3978 LearningRate 0.0001 Epoch: 30 Global Step: 635240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:54,634-Speed 6216.36 samples/sec Loss 3.3226 LearningRate 0.0001 Epoch: 30 Global Step: 635250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:30:57,877-Speed 6316.93 samples/sec Loss 3.3655 LearningRate 0.0001 Epoch: 30 Global Step: 635260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:01,121-Speed 6314.46 samples/sec Loss 
3.2797 LearningRate 0.0001 Epoch: 30 Global Step: 635270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:04,374-Speed 6297.31 samples/sec Loss 3.3247 LearningRate 0.0001 Epoch: 30 Global Step: 635280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:07,603-Speed 6342.72 samples/sec Loss 3.2665 LearningRate 0.0001 Epoch: 30 Global Step: 635290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:10,856-Speed 6297.24 samples/sec Loss 3.2278 LearningRate 0.0001 Epoch: 30 Global Step: 635300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:14,106-Speed 6302.15 samples/sec Loss 3.4020 LearningRate 0.0001 Epoch: 30 Global Step: 635310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:17,354-Speed 6307.38 samples/sec Loss 3.3126 LearningRate 0.0001 Epoch: 30 Global Step: 635320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:20,603-Speed 6305.18 samples/sec Loss 3.3559 LearningRate 0.0001 Epoch: 30 Global Step: 635330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:23,850-Speed 6308.30 samples/sec Loss 3.3250 LearningRate 0.0001 Epoch: 30 Global Step: 635340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:27,093-Speed 6316.85 samples/sec Loss 3.3154 LearningRate 0.0001 Epoch: 30 Global Step: 635350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:30,338-Speed 6313.23 samples/sec Loss 3.3377 LearningRate 0.0001 Epoch: 30 Global Step: 635360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:33,583-Speed 6313.74 samples/sec Loss 3.3274 LearningRate 0.0001 Epoch: 30 Global Step: 635370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:36,827-Speed 6313.72 samples/sec Loss 3.3508 LearningRate 0.0001 Epoch: 30 Global Step: 635380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:40,073-Speed 6311.61 samples/sec Loss 3.3426 LearningRate 0.0001 Epoch: 30 Global 
Step: 635390 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:31:43,307-Speed 6334.97 samples/sec Loss 3.3140 LearningRate 0.0001 Epoch: 30 Global Step: 635400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:46,552-Speed 6312.05 samples/sec Loss 3.3126 LearningRate 0.0001 Epoch: 30 Global Step: 635410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:49,796-Speed 6315.01 samples/sec Loss 3.3527 LearningRate 0.0001 Epoch: 30 Global Step: 635420 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:53,039-Speed 6316.53 samples/sec Loss 3.3338 LearningRate 0.0001 Epoch: 30 Global Step: 635430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:56,282-Speed 6315.40 samples/sec Loss 3.2910 LearningRate 0.0001 Epoch: 30 Global Step: 635440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:31:59,528-Speed 6311.43 samples/sec Loss 3.3055 LearningRate 0.0001 Epoch: 30 Global Step: 635450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:02,785-Speed 6290.18 samples/sec Loss 3.3655 LearningRate 0.0001 Epoch: 30 Global Step: 635460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:06,026-Speed 6319.80 samples/sec Loss 3.3047 LearningRate 0.0001 Epoch: 30 Global Step: 635470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:09,265-Speed 6323.87 samples/sec Loss 3.2958 LearningRate 0.0001 Epoch: 30 Global Step: 635480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:12,511-Speed 6310.71 samples/sec Loss 3.3022 LearningRate 0.0001 Epoch: 30 Global Step: 635490 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:15,740-Speed 6344.95 samples/sec Loss 3.3278 LearningRate 0.0001 Epoch: 30 Global Step: 635500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:18,985-Speed 6311.33 samples/sec Loss 3.2715 LearningRate 0.0001 Epoch: 30 Global Step: 635510 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:32:22,231-Speed 6311.71 samples/sec Loss 3.3502 LearningRate 0.0001 Epoch: 30 Global Step: 635520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:25,474-Speed 6315.88 samples/sec Loss 3.3432 LearningRate 0.0001 Epoch: 30 Global Step: 635530 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:28,721-Speed 6309.04 samples/sec Loss 3.3593 LearningRate 0.0001 Epoch: 30 Global Step: 635540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:31,967-Speed 6311.69 samples/sec Loss 3.3217 LearningRate 0.0001 Epoch: 30 Global Step: 635550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:35,213-Speed 6309.35 samples/sec Loss 3.3007 LearningRate 0.0001 Epoch: 30 Global Step: 635560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:38,469-Speed 6291.65 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 30 Global Step: 635570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:41,718-Speed 6307.02 samples/sec Loss 3.3020 LearningRate 0.0001 Epoch: 30 Global Step: 635580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:44,973-Speed 6292.30 samples/sec Loss 3.2887 LearningRate 0.0001 Epoch: 30 Global Step: 635590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:48,205-Speed 6337.65 samples/sec Loss 3.3864 LearningRate 0.0001 Epoch: 30 Global Step: 635600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:51,450-Speed 6313.34 samples/sec Loss 3.3499 LearningRate 0.0001 Epoch: 30 Global Step: 635610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:54,696-Speed 6310.33 samples/sec Loss 3.3459 LearningRate 0.0001 Epoch: 30 Global Step: 635620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:32:57,945-Speed 6305.90 samples/sec Loss 3.3422 LearningRate 0.0001 Epoch: 30 Global Step: 635630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:33:01,198-Speed 6296.22 samples/sec Loss 3.3195 LearningRate 0.0001 Epoch: 30 Global Step: 635640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:04,443-Speed 6314.23 samples/sec Loss 3.3360 LearningRate 0.0001 Epoch: 30 Global Step: 635650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:07,686-Speed 6314.81 samples/sec Loss 3.3598 LearningRate 0.0001 Epoch: 30 Global Step: 635660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:10,929-Speed 6317.89 samples/sec Loss 3.3028 LearningRate 0.0001 Epoch: 30 Global Step: 635670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:14,173-Speed 6313.37 samples/sec Loss 3.2877 LearningRate 0.0001 Epoch: 30 Global Step: 635680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:17,420-Speed 6310.35 samples/sec Loss 3.2515 LearningRate 0.0001 Epoch: 30 Global Step: 635690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:20,649-Speed 6343.54 samples/sec Loss 3.3070 LearningRate 0.0001 Epoch: 30 Global Step: 635700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:23,894-Speed 6311.25 samples/sec Loss 3.3419 LearningRate 0.0001 Epoch: 30 Global Step: 635710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:27,144-Speed 6307.66 samples/sec Loss 3.2862 LearningRate 0.0001 Epoch: 30 Global Step: 635720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:30,401-Speed 6287.98 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 30 Global Step: 635730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:33,641-Speed 6322.96 samples/sec Loss 3.3751 LearningRate 0.0001 Epoch: 30 Global Step: 635740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:36,889-Speed 6307.29 samples/sec Loss 3.3128 LearningRate 0.0001 Epoch: 30 Global Step: 635750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:40,137-Speed 6306.23 samples/sec Loss 
3.3230 LearningRate 0.0001 Epoch: 30 Global Step: 635760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:43,385-Speed 6306.33 samples/sec Loss 3.3767 LearningRate 0.0001 Epoch: 30 Global Step: 635770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:46,631-Speed 6312.02 samples/sec Loss 3.3216 LearningRate 0.0001 Epoch: 30 Global Step: 635780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:49,876-Speed 6312.35 samples/sec Loss 3.3340 LearningRate 0.0001 Epoch: 30 Global Step: 635790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:53,105-Speed 6343.85 samples/sec Loss 3.3570 LearningRate 0.0001 Epoch: 30 Global Step: 635800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:56,350-Speed 6313.68 samples/sec Loss 3.3031 LearningRate 0.0001 Epoch: 30 Global Step: 635810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:33:59,595-Speed 6313.20 samples/sec Loss 3.2960 LearningRate 0.0001 Epoch: 30 Global Step: 635820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:02,839-Speed 6314.04 samples/sec Loss 3.3609 LearningRate 0.0001 Epoch: 30 Global Step: 635830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:06,084-Speed 6311.76 samples/sec Loss 3.2953 LearningRate 0.0001 Epoch: 30 Global Step: 635840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:09,328-Speed 6314.94 samples/sec Loss 3.2962 LearningRate 0.0001 Epoch: 30 Global Step: 635850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:12,573-Speed 6313.23 samples/sec Loss 3.3288 LearningRate 0.0001 Epoch: 30 Global Step: 635860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:15,827-Speed 6295.37 samples/sec Loss 3.2673 LearningRate 0.0001 Epoch: 30 Global Step: 635870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:19,076-Speed 6304.83 samples/sec Loss 3.3487 LearningRate 0.0001 Epoch: 30 Global 
Step: 635880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:22,318-Speed 6317.83 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 30 Global Step: 635890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:25,564-Speed 6309.85 samples/sec Loss 3.3566 LearningRate 0.0001 Epoch: 30 Global Step: 635900 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:34:28,803-Speed 6325.34 samples/sec Loss 3.3100 LearningRate 0.0001 Epoch: 30 Global Step: 635910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:32,045-Speed 6318.45 samples/sec Loss 3.3324 LearningRate 0.0001 Epoch: 30 Global Step: 635920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:35,292-Speed 6307.92 samples/sec Loss 3.2994 LearningRate 0.0001 Epoch: 30 Global Step: 635930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:38,542-Speed 6303.48 samples/sec Loss 3.3352 LearningRate 0.0001 Epoch: 30 Global Step: 635940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:41,798-Speed 6290.82 samples/sec Loss 3.3314 LearningRate 0.0001 Epoch: 30 Global Step: 635950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:45,077-Speed 6247.75 samples/sec Loss 3.3275 LearningRate 0.0001 Epoch: 30 Global Step: 635960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:48,323-Speed 6312.58 samples/sec Loss 3.3150 LearningRate 0.0001 Epoch: 30 Global Step: 635970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:51,571-Speed 6305.99 samples/sec Loss 3.3202 LearningRate 0.0001 Epoch: 30 Global Step: 635980 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:54,820-Speed 6305.00 samples/sec Loss 3.3000 LearningRate 0.0001 Epoch: 30 Global Step: 635990 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:34:58,068-Speed 6306.18 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 30 Global Step: 636000 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:35:01,301-Speed 6335.73 samples/sec Loss 3.3270 LearningRate 0.0001 Epoch: 30 Global Step: 636010 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:04,549-Speed 6308.29 samples/sec Loss 3.3350 LearningRate 0.0001 Epoch: 30 Global Step: 636020 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:07,794-Speed 6313.37 samples/sec Loss 3.3472 LearningRate 0.0001 Epoch: 30 Global Step: 636030 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:11,041-Speed 6308.96 samples/sec Loss 3.2550 LearningRate 0.0001 Epoch: 30 Global Step: 636040 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:14,284-Speed 6316.92 samples/sec Loss 3.3172 LearningRate 0.0001 Epoch: 30 Global Step: 636050 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:17,526-Speed 6317.14 samples/sec Loss 3.3210 LearningRate 0.0001 Epoch: 30 Global Step: 636060 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:20,772-Speed 6310.44 samples/sec Loss 3.3450 LearningRate 0.0001 Epoch: 30 Global Step: 636070 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:24,017-Speed 6313.24 samples/sec Loss 3.3262 LearningRate 0.0001 Epoch: 30 Global Step: 636080 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:27,264-Speed 6309.08 samples/sec Loss 3.3751 LearningRate 0.0001 Epoch: 30 Global Step: 636090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:30,509-Speed 6312.86 samples/sec Loss 3.3170 LearningRate 0.0001 Epoch: 30 Global Step: 636100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:33,753-Speed 6314.30 samples/sec Loss 3.3404 LearningRate 0.0001 Epoch: 30 Global Step: 636110 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:35:36,983-Speed 6341.79 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 30 Global Step: 636120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:35:40,227-Speed 6315.14 samples/sec Loss 3.2865 LearningRate 0.0001 Epoch: 30 Global Step: 636130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:43,474-Speed 6308.10 samples/sec Loss 3.3174 LearningRate 0.0001 Epoch: 30 Global Step: 636140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:46,723-Speed 6305.96 samples/sec Loss 3.2765 LearningRate 0.0001 Epoch: 30 Global Step: 636150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:49,971-Speed 6306.40 samples/sec Loss 3.3294 LearningRate 0.0001 Epoch: 30 Global Step: 636160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:53,258-Speed 6230.55 samples/sec Loss 3.2847 LearningRate 0.0001 Epoch: 30 Global Step: 636170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:56,525-Speed 6270.52 samples/sec Loss 3.3651 LearningRate 0.0001 Epoch: 30 Global Step: 636180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:35:59,776-Speed 6301.23 samples/sec Loss 3.3239 LearningRate 0.0001 Epoch: 30 Global Step: 636190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:03,023-Speed 6309.11 samples/sec Loss 3.2836 LearningRate 0.0001 Epoch: 30 Global Step: 636200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:06,270-Speed 6309.61 samples/sec Loss 3.3231 LearningRate 0.0001 Epoch: 30 Global Step: 636210 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:09,498-Speed 6345.08 samples/sec Loss 3.3255 LearningRate 0.0001 Epoch: 30 Global Step: 636220 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:12,740-Speed 6318.65 samples/sec Loss 3.3790 LearningRate 0.0001 Epoch: 30 Global Step: 636230 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:15,989-Speed 6305.93 samples/sec Loss 3.2811 LearningRate 0.0001 Epoch: 30 Global Step: 636240 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:19,232-Speed 6316.72 samples/sec Loss 
3.3129 LearningRate 0.0001 Epoch: 30 Global Step: 636250 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:22,478-Speed 6310.00 samples/sec Loss 3.3464 LearningRate 0.0001 Epoch: 30 Global Step: 636260 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:25,832-Speed 6106.77 samples/sec Loss 3.3199 LearningRate 0.0001 Epoch: 30 Global Step: 636270 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:29,081-Speed 6305.82 samples/sec Loss 3.2930 LearningRate 0.0001 Epoch: 30 Global Step: 636280 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:32,326-Speed 6312.34 samples/sec Loss 3.3238 LearningRate 0.0001 Epoch: 30 Global Step: 636290 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:35,568-Speed 6318.45 samples/sec Loss 3.2632 LearningRate 0.0001 Epoch: 30 Global Step: 636300 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:38,817-Speed 6306.16 samples/sec Loss 3.3555 LearningRate 0.0001 Epoch: 30 Global Step: 636310 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:42,049-Speed 6336.82 samples/sec Loss 3.3873 LearningRate 0.0001 Epoch: 30 Global Step: 636320 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:45,288-Speed 6324.10 samples/sec Loss 3.2863 LearningRate 0.0001 Epoch: 30 Global Step: 636330 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:48,536-Speed 6306.88 samples/sec Loss 3.3488 LearningRate 0.0001 Epoch: 30 Global Step: 636340 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:51,785-Speed 6305.98 samples/sec Loss 3.3630 LearningRate 0.0001 Epoch: 30 Global Step: 636350 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:55,042-Speed 6288.67 samples/sec Loss 3.3132 LearningRate 0.0001 Epoch: 30 Global Step: 636360 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:36:58,293-Speed 6301.00 samples/sec Loss 3.3258 LearningRate 0.0001 Epoch: 30 Global 
Step: 636370 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:01,540-Speed 6309.10 samples/sec Loss 3.2834 LearningRate 0.0001 Epoch: 30 Global Step: 636380 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:04,793-Speed 6297.11 samples/sec Loss 3.2689 LearningRate 0.0001 Epoch: 30 Global Step: 636390 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:08,035-Speed 6317.56 samples/sec Loss 3.2873 LearningRate 0.0001 Epoch: 30 Global Step: 636400 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:11,283-Speed 6308.42 samples/sec Loss 3.3098 LearningRate 0.0001 Epoch: 30 Global Step: 636410 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:14,533-Speed 6302.30 samples/sec Loss 3.2657 LearningRate 0.0001 Epoch: 30 Global Step: 636420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:37:17,771-Speed 6325.27 samples/sec Loss 3.2990 LearningRate 0.0001 Epoch: 30 Global Step: 636430 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:21,015-Speed 6314.48 samples/sec Loss 3.3214 LearningRate 0.0001 Epoch: 30 Global Step: 636440 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:24,262-Speed 6310.31 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 30 Global Step: 636450 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:27,507-Speed 6313.08 samples/sec Loss 3.3573 LearningRate 0.0001 Epoch: 30 Global Step: 636460 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:30,753-Speed 6310.89 samples/sec Loss 3.3141 LearningRate 0.0001 Epoch: 30 Global Step: 636470 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:34,002-Speed 6305.25 samples/sec Loss 3.3036 LearningRate 0.0001 Epoch: 30 Global Step: 636480 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:37,247-Speed 6312.89 samples/sec Loss 3.2920 LearningRate 0.0001 Epoch: 30 Global Step: 636490 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:37:40,490-Speed 6316.34 samples/sec Loss 3.3452 LearningRate 0.0001 Epoch: 30 Global Step: 636500 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:43,738-Speed 6306.21 samples/sec Loss 3.2741 LearningRate 0.0001 Epoch: 30 Global Step: 636510 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:46,986-Speed 6306.50 samples/sec Loss 3.3345 LearningRate 0.0001 Epoch: 30 Global Step: 636520 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:50,234-Speed 6306.22 samples/sec Loss 3.3014 LearningRate 0.0001 Epoch: 30 Global Step: 636530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-03 01:37:53,466-Speed 6338.37 samples/sec Loss 3.2844 LearningRate 0.0001 Epoch: 30 Global Step: 636540 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:56,709-Speed 6316.43 samples/sec Loss 3.2793 LearningRate 0.0001 Epoch: 30 Global Step: 636550 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:37:59,952-Speed 6316.55 samples/sec Loss 3.3221 LearningRate 0.0001 Epoch: 30 Global Step: 636560 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:03,197-Speed 6312.96 samples/sec Loss 3.3113 LearningRate 0.0001 Epoch: 30 Global Step: 636570 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:06,446-Speed 6304.52 samples/sec Loss 3.3193 LearningRate 0.0001 Epoch: 30 Global Step: 636580 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:09,691-Speed 6312.63 samples/sec Loss 3.3327 LearningRate 0.0001 Epoch: 30 Global Step: 636590 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:12,938-Speed 6309.20 samples/sec Loss 3.2942 LearningRate 0.0001 Epoch: 30 Global Step: 636600 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:16,186-Speed 6307.30 samples/sec Loss 3.2595 LearningRate 0.0001 Epoch: 30 Global Step: 636610 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:38:19,439-Speed 6296.99 samples/sec Loss 3.3308 LearningRate 0.0001 Epoch: 30 Global Step: 636620 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:22,693-Speed 6295.22 samples/sec Loss 3.3294 LearningRate 0.0001 Epoch: 30 Global Step: 636630 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:25,921-Speed 6345.31 samples/sec Loss 3.2563 LearningRate 0.0001 Epoch: 30 Global Step: 636640 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:29,170-Speed 6305.97 samples/sec Loss 3.3634 LearningRate 0.0001 Epoch: 30 Global Step: 636650 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:32,411-Speed 6318.66 samples/sec Loss 3.3622 LearningRate 0.0001 Epoch: 30 Global Step: 636660 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:35,658-Speed 6309.52 samples/sec Loss 3.2915 LearningRate 0.0001 Epoch: 30 Global Step: 636670 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:38,902-Speed 6315.87 samples/sec Loss 3.3911 LearningRate 0.0001 Epoch: 30 Global Step: 636680 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:42,145-Speed 6315.19 samples/sec Loss 3.3596 LearningRate 0.0001 Epoch: 30 Global Step: 636690 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:45,386-Speed 6320.59 samples/sec Loss 3.3593 LearningRate 0.0001 Epoch: 30 Global Step: 636700 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:48,634-Speed 6307.60 samples/sec Loss 3.3138 LearningRate 0.0001 Epoch: 30 Global Step: 636710 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:51,877-Speed 6315.88 samples/sec Loss 3.2602 LearningRate 0.0001 Epoch: 30 Global Step: 636720 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:55,123-Speed 6310.46 samples/sec Loss 3.2422 LearningRate 0.0001 Epoch: 30 Global Step: 636730 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:38:58,356-Speed 6337.40 samples/sec Loss 
3.3319 LearningRate 0.0001 Epoch: 30 Global Step: 636740 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:01,602-Speed 6311.30 samples/sec Loss 3.3373 LearningRate 0.0001 Epoch: 30 Global Step: 636750 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:04,848-Speed 6310.76 samples/sec Loss 3.3285 LearningRate 0.0001 Epoch: 30 Global Step: 636760 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:08,090-Speed 6317.37 samples/sec Loss 3.3149 LearningRate 0.0001 Epoch: 30 Global Step: 636770 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:11,337-Speed 6308.94 samples/sec Loss 3.2590 LearningRate 0.0001 Epoch: 30 Global Step: 636780 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:14,585-Speed 6307.02 samples/sec Loss 3.2872 LearningRate 0.0001 Epoch: 30 Global Step: 636790 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:17,828-Speed 6316.59 samples/sec Loss 3.2629 LearningRate 0.0001 Epoch: 30 Global Step: 636800 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:21,075-Speed 6307.80 samples/sec Loss 3.2340 LearningRate 0.0001 Epoch: 30 Global Step: 636810 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:24,323-Speed 6306.60 samples/sec Loss 3.2990 LearningRate 0.0001 Epoch: 30 Global Step: 636820 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:27,570-Speed 6310.70 samples/sec Loss 3.3310 LearningRate 0.0001 Epoch: 30 Global Step: 636830 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:30,802-Speed 6337.71 samples/sec Loss 3.2937 LearningRate 0.0001 Epoch: 30 Global Step: 636840 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:34,057-Speed 6293.70 samples/sec Loss 3.3456 LearningRate 0.0001 Epoch: 30 Global Step: 636850 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:37,303-Speed 6309.70 samples/sec Loss 3.3249 LearningRate 0.0001 Epoch: 30 Global 
Step: 636860 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:40,546-Speed 6317.21 samples/sec Loss 3.3467 LearningRate 0.0001 Epoch: 30 Global Step: 636870 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:43,787-Speed 6318.75 samples/sec Loss 3.3048 LearningRate 0.0001 Epoch: 30 Global Step: 636880 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:47,038-Speed 6301.51 samples/sec Loss 3.3157 LearningRate 0.0001 Epoch: 30 Global Step: 636890 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:50,285-Speed 6309.30 samples/sec Loss 3.2880 LearningRate 0.0001 Epoch: 30 Global Step: 636900 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:53,551-Speed 6273.04 samples/sec Loss 3.2956 LearningRate 0.0001 Epoch: 30 Global Step: 636910 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:39:56,791-Speed 6322.08 samples/sec Loss 3.2666 LearningRate 0.0001 Epoch: 30 Global Step: 636920 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:00,036-Speed 6312.90 samples/sec Loss 3.3018 LearningRate 0.0001 Epoch: 30 Global Step: 636930 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:03,269-Speed 6335.22 samples/sec Loss 3.3344 LearningRate 0.0001 Epoch: 30 Global Step: 636940 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:06,525-Speed 6291.58 samples/sec Loss 3.3268 LearningRate 0.0001 Epoch: 30 Global Step: 636950 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:09,770-Speed 6312.69 samples/sec Loss 3.3001 LearningRate 0.0001 Epoch: 30 Global Step: 636960 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:13,014-Speed 6314.13 samples/sec Loss 3.3059 LearningRate 0.0001 Epoch: 30 Global Step: 636970 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:16,270-Speed 6292.09 samples/sec Loss 3.3322 LearningRate 0.0001 Epoch: 30 Global Step: 636980 Fp16 Grad Scale: 8192 
Required: 18 hours Training: 2022-04-03 01:40:19,504-Speed 6335.25 samples/sec Loss 3.2798 LearningRate 0.0001 Epoch: 30 Global Step: 636990 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:22,747-Speed 6316.37 samples/sec Loss 3.3533 LearningRate 0.0001 Epoch: 30 Global Step: 637000 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:26,041-Speed 6218.40 samples/sec Loss 3.3796 LearningRate 0.0001 Epoch: 30 Global Step: 637010 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:29,304-Speed 6277.83 samples/sec Loss 3.2540 LearningRate 0.0001 Epoch: 30 Global Step: 637020 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:32,551-Speed 6308.07 samples/sec Loss 3.2658 LearningRate 0.0001 Epoch: 30 Global Step: 637030 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:35,817-Speed 6272.53 samples/sec Loss 3.3037 LearningRate 0.0001 Epoch: 30 Global Step: 637040 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:39,103-Speed 6232.90 samples/sec Loss 3.3028 LearningRate 0.0001 Epoch: 30 Global Step: 637050 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:42,419-Speed 6177.52 samples/sec Loss 3.3719 LearningRate 0.0001 Epoch: 30 Global Step: 637060 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:45,667-Speed 6307.00 samples/sec Loss 3.3423 LearningRate 0.0001 Epoch: 30 Global Step: 637070 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:48,910-Speed 6316.91 samples/sec Loss 3.3374 LearningRate 0.0001 Epoch: 30 Global Step: 637080 Fp16 Grad Scale: 4096 Required: 18 hours Training: 2022-04-03 01:40:52,158-Speed 6306.59 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 30 Global Step: 637090 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:40:55,404-Speed 6310.31 samples/sec Loss 3.2586 LearningRate 0.0001 Epoch: 30 Global Step: 637100 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 
01:40:58,663-Speed 6286.67 samples/sec Loss 3.3029 LearningRate 0.0001 Epoch: 30 Global Step: 637110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:41:01,908-Speed 6312.67 samples/sec Loss 3.2883 LearningRate 0.0001 Epoch: 30 Global Step: 637120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-04-03 01:41:05,158-Speed 6302.00 samples/sec Loss 3.3652 LearningRate 0.0001 Epoch: 30 Global Step: 637130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:08,408-Speed 6303.24 samples/sec Loss 3.3325 LearningRate 0.0001 Epoch: 30 Global Step: 637140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:11,654-Speed 6312.08 samples/sec Loss 3.3289 LearningRate 0.0001 Epoch: 30 Global Step: 637150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:14,902-Speed 6306.84 samples/sec Loss 3.2756 LearningRate 0.0001 Epoch: 30 Global Step: 637160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:18,144-Speed 6317.46 samples/sec Loss 3.3535 LearningRate 0.0001 Epoch: 30 Global Step: 637170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:21,391-Speed 6308.56 samples/sec Loss 3.3145 LearningRate 0.0001 Epoch: 30 Global Step: 637180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:24,622-Speed 6340.19 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 30 Global Step: 637190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:27,868-Speed 6312.14 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 30 Global Step: 637200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:31,110-Speed 6317.12 samples/sec Loss 3.2544 LearningRate 0.0001 Epoch: 30 Global Step: 637210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:34,376-Speed 6273.84 samples/sec Loss 3.2926 LearningRate 0.0001 Epoch: 30 Global Step: 637220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:37,620-Speed 6313.33 samples/sec Loss 
3.3305 LearningRate 0.0001 Epoch: 30 Global Step: 637230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:40,873-Speed 6297.58 samples/sec Loss 3.3275 LearningRate 0.0001 Epoch: 30 Global Step: 637240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:44,128-Speed 6293.12 samples/sec Loss 3.3007 LearningRate 0.0001 Epoch: 30 Global Step: 637250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:47,375-Speed 6308.96 samples/sec Loss 3.3133 LearningRate 0.0001 Epoch: 30 Global Step: 637260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:50,622-Speed 6308.28 samples/sec Loss 3.3680 LearningRate 0.0001 Epoch: 30 Global Step: 637270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:53,869-Speed 6308.31 samples/sec Loss 3.2929 LearningRate 0.0001 Epoch: 30 Global Step: 637280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:41:57,098-Speed 6344.85 samples/sec Loss 3.2697 LearningRate 0.0001 Epoch: 30 Global Step: 637290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:00,350-Speed 6298.02 samples/sec Loss 3.3787 LearningRate 0.0001 Epoch: 30 Global Step: 637300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:03,596-Speed 6312.37 samples/sec Loss 3.3461 LearningRate 0.0001 Epoch: 30 Global Step: 637310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:06,839-Speed 6315.52 samples/sec Loss 3.2948 LearningRate 0.0001 Epoch: 30 Global Step: 637320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:10,081-Speed 6320.20 samples/sec Loss 3.3040 LearningRate 0.0001 Epoch: 30 Global Step: 637330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:13,323-Speed 6317.41 samples/sec Loss 3.3662 LearningRate 0.0001 Epoch: 30 Global Step: 637340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:16,570-Speed 6308.20 samples/sec Loss 3.3216 LearningRate 0.0001 Epoch: 30 Global 
Step: 637350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:19,820-Speed 6304.42 samples/sec Loss 3.3099 LearningRate 0.0001 Epoch: 30 Global Step: 637360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:23,075-Speed 6293.79 samples/sec Loss 3.2548 LearningRate 0.0001 Epoch: 30 Global Step: 637370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:26,319-Speed 6315.23 samples/sec Loss 3.3106 LearningRate 0.0001 Epoch: 30 Global Step: 637380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:29,569-Speed 6302.58 samples/sec Loss 3.3381 LearningRate 0.0001 Epoch: 30 Global Step: 637390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 01:42:32,799-Speed 6342.11 samples/sec Loss 3.3702 LearningRate 0.0001 Epoch: 30 Global Step: 637400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:36,079-Speed 6244.19 samples/sec Loss 3.3461 LearningRate 0.0001 Epoch: 30 Global Step: 637410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:39,326-Speed 6309.68 samples/sec Loss 3.3034 LearningRate 0.0001 Epoch: 30 Global Step: 637420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:42,573-Speed 6308.69 samples/sec Loss 3.2894 LearningRate 0.0001 Epoch: 30 Global Step: 637430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:45,818-Speed 6312.96 samples/sec Loss 3.2780 LearningRate 0.0001 Epoch: 30 Global Step: 637440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:49,167-Speed 6115.61 samples/sec Loss 3.3172 LearningRate 0.0001 Epoch: 30 Global Step: 637450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:52,434-Speed 6269.98 samples/sec Loss 3.2826 LearningRate 0.0001 Epoch: 30 Global Step: 637460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:42:55,678-Speed 6314.50 samples/sec Loss 3.3034 LearningRate 0.0001 Epoch: 30 Global Step: 637470 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:42:58,920-Speed 6319.13 samples/sec Loss 3.3472 LearningRate 0.0001 Epoch: 30 Global Step: 637480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:02,166-Speed 6310.69 samples/sec Loss 3.3155 LearningRate 0.0001 Epoch: 30 Global Step: 637490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:05,399-Speed 6336.69 samples/sec Loss 3.3242 LearningRate 0.0001 Epoch: 30 Global Step: 637500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:08,645-Speed 6309.26 samples/sec Loss 3.2756 LearningRate 0.0001 Epoch: 30 Global Step: 637510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:11,893-Speed 6308.85 samples/sec Loss 3.2822 LearningRate 0.0001 Epoch: 30 Global Step: 637520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:15,135-Speed 6317.05 samples/sec Loss 3.3450 LearningRate 0.0001 Epoch: 30 Global Step: 637530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:18,384-Speed 6305.60 samples/sec Loss 3.3113 LearningRate 0.0001 Epoch: 30 Global Step: 637540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:21,625-Speed 6318.87 samples/sec Loss 3.3302 LearningRate 0.0001 Epoch: 30 Global Step: 637550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:24,875-Speed 6304.67 samples/sec Loss 3.2755 LearningRate 0.0001 Epoch: 30 Global Step: 637560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:28,120-Speed 6312.75 samples/sec Loss 3.3817 LearningRate 0.0001 Epoch: 30 Global Step: 637570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:31,366-Speed 6310.14 samples/sec Loss 3.3185 LearningRate 0.0001 Epoch: 30 Global Step: 637580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:34,614-Speed 6307.59 samples/sec Loss 3.2907 LearningRate 0.0001 Epoch: 30 Global Step: 637590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:43:37,847-Speed 6335.82 samples/sec Loss 3.3337 LearningRate 0.0001 Epoch: 30 Global Step: 637600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:41,095-Speed 6306.84 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 30 Global Step: 637610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:44,337-Speed 6318.24 samples/sec Loss 3.2238 LearningRate 0.0001 Epoch: 30 Global Step: 637620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:47,584-Speed 6310.08 samples/sec Loss 3.2849 LearningRate 0.0001 Epoch: 30 Global Step: 637630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:50,829-Speed 6311.97 samples/sec Loss 3.3104 LearningRate 0.0001 Epoch: 30 Global Step: 637640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:54,078-Speed 6304.20 samples/sec Loss 3.3014 LearningRate 0.0001 Epoch: 30 Global Step: 637650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:43:57,323-Speed 6312.66 samples/sec Loss 3.2594 LearningRate 0.0001 Epoch: 30 Global Step: 637660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:00,581-Speed 6288.77 samples/sec Loss 3.3202 LearningRate 0.0001 Epoch: 30 Global Step: 637670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:03,829-Speed 6307.70 samples/sec Loss 3.3647 LearningRate 0.0001 Epoch: 30 Global Step: 637680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:07,076-Speed 6308.96 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 30 Global Step: 637690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:10,309-Speed 6334.53 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 637700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:13,558-Speed 6305.82 samples/sec Loss 3.3114 LearningRate 0.0001 Epoch: 30 Global Step: 637710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:16,803-Speed 6313.02 samples/sec Loss 
3.2441 LearningRate 0.0001 Epoch: 30 Global Step: 637720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:20,085-Speed 6240.71 samples/sec Loss 3.3219 LearningRate 0.0001 Epoch: 30 Global Step: 637730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:23,412-Speed 6156.57 samples/sec Loss 3.3394 LearningRate 0.0001 Epoch: 30 Global Step: 637740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:26,659-Speed 6310.24 samples/sec Loss 3.2920 LearningRate 0.0001 Epoch: 30 Global Step: 637750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:29,911-Speed 6297.54 samples/sec Loss 3.3252 LearningRate 0.0001 Epoch: 30 Global Step: 637760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:44:33,141-Speed 6343.71 samples/sec Loss 3.2849 LearningRate 0.0001 Epoch: 30 Global Step: 637770 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:36,389-Speed 6304.96 samples/sec Loss 3.3362 LearningRate 0.0001 Epoch: 30 Global Step: 637780 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:39,634-Speed 6314.68 samples/sec Loss 3.3361 LearningRate 0.0001 Epoch: 30 Global Step: 637790 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:42,878-Speed 6314.27 samples/sec Loss 3.2616 LearningRate 0.0001 Epoch: 30 Global Step: 637800 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:46,126-Speed 6306.46 samples/sec Loss 3.4245 LearningRate 0.0001 Epoch: 30 Global Step: 637810 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:49,373-Speed 6309.62 samples/sec Loss 3.2576 LearningRate 0.0001 Epoch: 30 Global Step: 637820 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:52,619-Speed 6311.01 samples/sec Loss 3.3138 LearningRate 0.0001 Epoch: 30 Global Step: 637830 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:55,860-Speed 6319.61 samples/sec Loss 3.3370 LearningRate 0.0001 Epoch: 30 Global 
Step: 637840 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:44:59,119-Speed 6285.63 samples/sec Loss 3.2690 LearningRate 0.0001 Epoch: 30 Global Step: 637850 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:45:02,375-Speed 6290.52 samples/sec Loss 3.2208 LearningRate 0.0001 Epoch: 30 Global Step: 637860 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:45:05,633-Speed 6288.95 samples/sec Loss 3.2541 LearningRate 0.0001 Epoch: 30 Global Step: 637870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:08,880-Speed 6307.31 samples/sec Loss 3.2768 LearningRate 0.0001 Epoch: 30 Global Step: 637880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:12,128-Speed 6308.16 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 637890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:15,372-Speed 6314.86 samples/sec Loss 3.2348 LearningRate 0.0001 Epoch: 30 Global Step: 637900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:18,620-Speed 6305.99 samples/sec Loss 3.2916 LearningRate 0.0001 Epoch: 30 Global Step: 637910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:21,863-Speed 6315.37 samples/sec Loss 3.2887 LearningRate 0.0001 Epoch: 30 Global Step: 637920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:25,108-Speed 6313.13 samples/sec Loss 3.3203 LearningRate 0.0001 Epoch: 30 Global Step: 637930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:28,357-Speed 6305.94 samples/sec Loss 3.2598 LearningRate 0.0001 Epoch: 30 Global Step: 637940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:31,601-Speed 6313.83 samples/sec Loss 3.2500 LearningRate 0.0001 Epoch: 30 Global Step: 637950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:34,844-Speed 6316.76 samples/sec Loss 3.3199 LearningRate 0.0001 Epoch: 30 Global Step: 637960 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:45:38,078-Speed 6334.35 samples/sec Loss 3.2845 LearningRate 0.0001 Epoch: 30 Global Step: 637970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:41,326-Speed 6306.38 samples/sec Loss 3.3226 LearningRate 0.0001 Epoch: 30 Global Step: 637980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:44,571-Speed 6313.64 samples/sec Loss 3.3107 LearningRate 0.0001 Epoch: 30 Global Step: 637990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:47,816-Speed 6311.25 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 30 Global Step: 638000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:51,062-Speed 6311.76 samples/sec Loss 3.2897 LearningRate 0.0001 Epoch: 30 Global Step: 638010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:54,313-Speed 6300.44 samples/sec Loss 3.2763 LearningRate 0.0001 Epoch: 30 Global Step: 638020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:45:57,558-Speed 6311.90 samples/sec Loss 3.3182 LearningRate 0.0001 Epoch: 30 Global Step: 638030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:00,803-Speed 6312.66 samples/sec Loss 3.3761 LearningRate 0.0001 Epoch: 30 Global Step: 638040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:04,049-Speed 6311.52 samples/sec Loss 3.2767 LearningRate 0.0001 Epoch: 30 Global Step: 638050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:07,294-Speed 6313.73 samples/sec Loss 3.2781 LearningRate 0.0001 Epoch: 30 Global Step: 638060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:10,525-Speed 6339.61 samples/sec Loss 3.3036 LearningRate 0.0001 Epoch: 30 Global Step: 638070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:13,770-Speed 6313.55 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 30 Global Step: 638080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:46:17,015-Speed 6311.01 samples/sec Loss 3.3021 LearningRate 0.0001 Epoch: 30 Global Step: 638090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:20,262-Speed 6309.43 samples/sec Loss 3.3389 LearningRate 0.0001 Epoch: 30 Global Step: 638100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:23,506-Speed 6313.93 samples/sec Loss 3.3173 LearningRate 0.0001 Epoch: 30 Global Step: 638110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:26,752-Speed 6310.49 samples/sec Loss 3.3261 LearningRate 0.0001 Epoch: 30 Global Step: 638120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:29,997-Speed 6312.97 samples/sec Loss 3.3201 LearningRate 0.0001 Epoch: 30 Global Step: 638130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:33,241-Speed 6315.72 samples/sec Loss 3.2392 LearningRate 0.0001 Epoch: 30 Global Step: 638140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:36,485-Speed 6314.86 samples/sec Loss 3.2875 LearningRate 0.0001 Epoch: 30 Global Step: 638150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:39,729-Speed 6313.78 samples/sec Loss 3.2842 LearningRate 0.0001 Epoch: 30 Global Step: 638160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:42,960-Speed 6340.94 samples/sec Loss 3.3193 LearningRate 0.0001 Epoch: 30 Global Step: 638170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:46,209-Speed 6303.00 samples/sec Loss 3.3159 LearningRate 0.0001 Epoch: 30 Global Step: 638180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:49,459-Speed 6304.59 samples/sec Loss 3.3421 LearningRate 0.0001 Epoch: 30 Global Step: 638190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:52,700-Speed 6320.02 samples/sec Loss 3.3293 LearningRate 0.0001 Epoch: 30 Global Step: 638200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:55,946-Speed 6310.15 samples/sec Loss 
3.3551 LearningRate 0.0001 Epoch: 30 Global Step: 638210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:46:59,191-Speed 6312.04 samples/sec Loss 3.3306 LearningRate 0.0001 Epoch: 30 Global Step: 638220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:02,437-Speed 6311.57 samples/sec Loss 3.2777 LearningRate 0.0001 Epoch: 30 Global Step: 638230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:05,685-Speed 6307.07 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 30 Global Step: 638240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:08,930-Speed 6311.86 samples/sec Loss 3.3629 LearningRate 0.0001 Epoch: 30 Global Step: 638250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:12,175-Speed 6312.24 samples/sec Loss 3.3602 LearningRate 0.0001 Epoch: 30 Global Step: 638260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:15,423-Speed 6307.48 samples/sec Loss 3.3312 LearningRate 0.0001 Epoch: 30 Global Step: 638270 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 01:47:18,658-Speed 6333.51 samples/sec Loss 3.3355 LearningRate 0.0001 Epoch: 30 Global Step: 638280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:21,902-Speed 6314.15 samples/sec Loss 3.3304 LearningRate 0.0001 Epoch: 30 Global Step: 638290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:25,148-Speed 6311.02 samples/sec Loss 3.2952 LearningRate 0.0001 Epoch: 30 Global Step: 638300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:28,408-Speed 6284.41 samples/sec Loss 3.2964 LearningRate 0.0001 Epoch: 30 Global Step: 638310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:31,656-Speed 6305.43 samples/sec Loss 3.3302 LearningRate 0.0001 Epoch: 30 Global Step: 638320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:34,901-Speed 6313.98 samples/sec Loss 3.2761 LearningRate 0.0001 Epoch: 30 
Global Step: 638330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:38,148-Speed 6308.18 samples/sec Loss 3.2815 LearningRate 0.0001 Epoch: 30 Global Step: 638340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:41,393-Speed 6313.38 samples/sec Loss 3.2928 LearningRate 0.0001 Epoch: 30 Global Step: 638350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:44,635-Speed 6318.12 samples/sec Loss 3.3142 LearningRate 0.0001 Epoch: 30 Global Step: 638360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:47,879-Speed 6314.57 samples/sec Loss 3.3106 LearningRate 0.0001 Epoch: 30 Global Step: 638370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:51,109-Speed 6340.52 samples/sec Loss 3.3507 LearningRate 0.0001 Epoch: 30 Global Step: 638380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:54,370-Speed 6282.20 samples/sec Loss 3.3554 LearningRate 0.0001 Epoch: 30 Global Step: 638390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:47:57,617-Speed 6308.60 samples/sec Loss 3.3029 LearningRate 0.0001 Epoch: 30 Global Step: 638400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:00,864-Speed 6309.80 samples/sec Loss 3.3507 LearningRate 0.0001 Epoch: 30 Global Step: 638410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:04,109-Speed 6312.73 samples/sec Loss 3.3225 LearningRate 0.0001 Epoch: 30 Global Step: 638420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:07,353-Speed 6314.23 samples/sec Loss 3.2703 LearningRate 0.0001 Epoch: 30 Global Step: 638430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:10,596-Speed 6315.49 samples/sec Loss 3.2676 LearningRate 0.0001 Epoch: 30 Global Step: 638440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:13,854-Speed 6287.30 samples/sec Loss 3.2894 LearningRate 0.0001 Epoch: 30 Global Step: 638450 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:48:17,120-Speed 6273.34 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 30 Global Step: 638460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:20,374-Speed 6294.94 samples/sec Loss 3.3071 LearningRate 0.0001 Epoch: 30 Global Step: 638470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:23,614-Speed 6323.12 samples/sec Loss 3.3301 LearningRate 0.0001 Epoch: 30 Global Step: 638480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:26,969-Speed 6105.69 samples/sec Loss 3.2361 LearningRate 0.0001 Epoch: 30 Global Step: 638490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:30,277-Speed 6192.91 samples/sec Loss 3.3181 LearningRate 0.0001 Epoch: 30 Global Step: 638500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:33,530-Speed 6295.80 samples/sec Loss 3.3260 LearningRate 0.0001 Epoch: 30 Global Step: 638510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:36,785-Speed 6294.85 samples/sec Loss 3.3281 LearningRate 0.0001 Epoch: 30 Global Step: 638520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:40,034-Speed 6305.31 samples/sec Loss 3.3242 LearningRate 0.0001 Epoch: 30 Global Step: 638530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:43,274-Speed 6322.06 samples/sec Loss 3.2864 LearningRate 0.0001 Epoch: 30 Global Step: 638540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:46,529-Speed 6293.73 samples/sec Loss 3.2969 LearningRate 0.0001 Epoch: 30 Global Step: 638550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:49,780-Speed 6300.17 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 30 Global Step: 638560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:53,022-Speed 6318.96 samples/sec Loss 3.3075 LearningRate 0.0001 Epoch: 30 Global Step: 638570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:48:56,253-Speed 6339.97 samples/sec Loss 3.2954 LearningRate 0.0001 Epoch: 30 Global Step: 638580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:48:59,500-Speed 6308.62 samples/sec Loss 3.3067 LearningRate 0.0001 Epoch: 30 Global Step: 638590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:02,763-Speed 6276.95 samples/sec Loss 3.3004 LearningRate 0.0001 Epoch: 30 Global Step: 638600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:06,005-Speed 6318.18 samples/sec Loss 3.2624 LearningRate 0.0001 Epoch: 30 Global Step: 638610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:09,253-Speed 6308.42 samples/sec Loss 3.2519 LearningRate 0.0001 Epoch: 30 Global Step: 638620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:12,503-Speed 6302.58 samples/sec Loss 3.3215 LearningRate 0.0001 Epoch: 30 Global Step: 638630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:15,749-Speed 6310.48 samples/sec Loss 3.2237 LearningRate 0.0001 Epoch: 30 Global Step: 638640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:18,991-Speed 6318.26 samples/sec Loss 3.3602 LearningRate 0.0001 Epoch: 30 Global Step: 638650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:22,234-Speed 6316.43 samples/sec Loss 3.3142 LearningRate 0.0001 Epoch: 30 Global Step: 638660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:25,483-Speed 6304.90 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 30 Global Step: 638670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:28,717-Speed 6335.26 samples/sec Loss 3.4253 LearningRate 0.0001 Epoch: 30 Global Step: 638680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:31,962-Speed 6311.57 samples/sec Loss 3.2468 LearningRate 0.0001 Epoch: 30 Global Step: 638690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:35,204-Speed 6318.09 samples/sec Loss 
3.3273 LearningRate 0.0001 Epoch: 30 Global Step: 638700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:38,447-Speed 6317.68 samples/sec Loss 3.3138 LearningRate 0.0001 Epoch: 30 Global Step: 638710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:49:41,679-Speed 6338.61 samples/sec Loss 3.3457 LearningRate 0.0001 Epoch: 30 Global Step: 638720 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:49:44,923-Speed 6314.86 samples/sec Loss 3.3278 LearningRate 0.0001 Epoch: 30 Global Step: 638730 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:49:48,201-Speed 6249.42 samples/sec Loss 3.3404 LearningRate 0.0001 Epoch: 30 Global Step: 638740 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:49:51,508-Speed 6194.11 samples/sec Loss 3.3222 LearningRate 0.0001 Epoch: 30 Global Step: 638750 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:49:54,756-Speed 6306.46 samples/sec Loss 3.2317 LearningRate 0.0001 Epoch: 30 Global Step: 638760 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:49:58,008-Speed 6299.67 samples/sec Loss 3.2310 LearningRate 0.0001 Epoch: 30 Global Step: 638770 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:01,253-Speed 6312.05 samples/sec Loss 3.2432 LearningRate 0.0001 Epoch: 30 Global Step: 638780 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:04,508-Speed 6292.56 samples/sec Loss 3.3341 LearningRate 0.0001 Epoch: 30 Global Step: 638790 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:07,750-Speed 6317.40 samples/sec Loss 3.3309 LearningRate 0.0001 Epoch: 30 Global Step: 638800 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:10,999-Speed 6306.85 samples/sec Loss 3.2693 LearningRate 0.0001 Epoch: 30 Global Step: 638810 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:14,243-Speed 6313.12 samples/sec Loss 3.3463 LearningRate 0.0001 Epoch: 30 Global 
Step: 638820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:17,499-Speed 6291.51 samples/sec Loss 3.3367 LearningRate 0.0001 Epoch: 30 Global Step: 638830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:20,756-Speed 6289.72 samples/sec Loss 3.3098 LearningRate 0.0001 Epoch: 30 Global Step: 638840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:24,013-Speed 6289.09 samples/sec Loss 3.3140 LearningRate 0.0001 Epoch: 30 Global Step: 638850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:27,267-Speed 6294.96 samples/sec Loss 3.2875 LearningRate 0.0001 Epoch: 30 Global Step: 638860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:30,521-Speed 6296.20 samples/sec Loss 3.3557 LearningRate 0.0001 Epoch: 30 Global Step: 638870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:33,769-Speed 6306.75 samples/sec Loss 3.3148 LearningRate 0.0001 Epoch: 30 Global Step: 638880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:50:36,997-Speed 6346.01 samples/sec Loss 3.3053 LearningRate 0.0001 Epoch: 30 Global Step: 638890 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:40,243-Speed 6311.08 samples/sec Loss 3.2887 LearningRate 0.0001 Epoch: 30 Global Step: 638900 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:43,485-Speed 6317.96 samples/sec Loss 3.2464 LearningRate 0.0001 Epoch: 30 Global Step: 638910 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:46,730-Speed 6311.49 samples/sec Loss 3.2947 LearningRate 0.0001 Epoch: 30 Global Step: 638920 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:49,981-Speed 6301.33 samples/sec Loss 3.2874 LearningRate 0.0001 Epoch: 30 Global Step: 638930 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:53,227-Speed 6311.76 samples/sec Loss 3.2991 LearningRate 0.0001 Epoch: 30 Global Step: 638940 Fp16 Grad Scale: 4096 
Required: 17 hours Training: 2022-04-03 01:50:56,476-Speed 6305.25 samples/sec Loss 3.3025 LearningRate 0.0001 Epoch: 30 Global Step: 638950 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:50:59,726-Speed 6303.28 samples/sec Loss 3.2615 LearningRate 0.0001 Epoch: 30 Global Step: 638960 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:51:02,971-Speed 6312.30 samples/sec Loss 3.3050 LearningRate 0.0001 Epoch: 30 Global Step: 638970 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:51:06,213-Speed 6319.25 samples/sec Loss 3.2628 LearningRate 0.0001 Epoch: 30 Global Step: 638980 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:51:09,457-Speed 6314.77 samples/sec Loss 3.3237 LearningRate 0.0001 Epoch: 30 Global Step: 638990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:12,704-Speed 6309.30 samples/sec Loss 3.3556 LearningRate 0.0001 Epoch: 30 Global Step: 639000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:15,949-Speed 6311.04 samples/sec Loss 3.2940 LearningRate 0.0001 Epoch: 30 Global Step: 639010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:19,193-Speed 6315.65 samples/sec Loss 3.2947 LearningRate 0.0001 Epoch: 30 Global Step: 639020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:22,440-Speed 6307.52 samples/sec Loss 3.3269 LearningRate 0.0001 Epoch: 30 Global Step: 639030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:25,688-Speed 6308.42 samples/sec Loss 3.2604 LearningRate 0.0001 Epoch: 30 Global Step: 639040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:28,934-Speed 6309.84 samples/sec Loss 3.3282 LearningRate 0.0001 Epoch: 30 Global Step: 639050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:32,185-Speed 6301.48 samples/sec Loss 3.2956 LearningRate 0.0001 Epoch: 30 Global Step: 639060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:51:35,427-Speed 6317.34 samples/sec Loss 3.2629 LearningRate 0.0001 Epoch: 30 Global Step: 639070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:38,675-Speed 6308.05 samples/sec Loss 3.2884 LearningRate 0.0001 Epoch: 30 Global Step: 639080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:41,918-Speed 6315.20 samples/sec Loss 3.3527 LearningRate 0.0001 Epoch: 30 Global Step: 639090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:45,159-Speed 6322.39 samples/sec Loss 3.2731 LearningRate 0.0001 Epoch: 30 Global Step: 639100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:48,405-Speed 6311.10 samples/sec Loss 3.3241 LearningRate 0.0001 Epoch: 30 Global Step: 639110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:51,648-Speed 6316.62 samples/sec Loss 3.3076 LearningRate 0.0001 Epoch: 30 Global Step: 639120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:54,894-Speed 6309.62 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 30 Global Step: 639130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:51:58,139-Speed 6313.86 samples/sec Loss 3.2338 LearningRate 0.0001 Epoch: 30 Global Step: 639140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:01,390-Speed 6301.23 samples/sec Loss 3.3110 LearningRate 0.0001 Epoch: 30 Global Step: 639150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:04,639-Speed 6305.62 samples/sec Loss 3.2546 LearningRate 0.0001 Epoch: 30 Global Step: 639160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:07,901-Speed 6279.05 samples/sec Loss 3.2808 LearningRate 0.0001 Epoch: 30 Global Step: 639170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:11,147-Speed 6311.55 samples/sec Loss 3.2777 LearningRate 0.0001 Epoch: 30 Global Step: 639180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:14,375-Speed 6346.45 samples/sec Loss 
3.3107 LearningRate 0.0001 Epoch: 30 Global Step: 639190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:17,621-Speed 6311.44 samples/sec Loss 3.3156 LearningRate 0.0001 Epoch: 30 Global Step: 639200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:20,863-Speed 6317.08 samples/sec Loss 3.3085 LearningRate 0.0001 Epoch: 30 Global Step: 639210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:24,106-Speed 6316.25 samples/sec Loss 3.2786 LearningRate 0.0001 Epoch: 30 Global Step: 639220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:27,361-Speed 6293.48 samples/sec Loss 3.2381 LearningRate 0.0001 Epoch: 30 Global Step: 639230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:30,610-Speed 6305.65 samples/sec Loss 3.2756 LearningRate 0.0001 Epoch: 30 Global Step: 639240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:33,862-Speed 6297.89 samples/sec Loss 3.3051 LearningRate 0.0001 Epoch: 30 Global Step: 639250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:37,116-Speed 6296.60 samples/sec Loss 3.3178 LearningRate 0.0001 Epoch: 30 Global Step: 639260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:40,358-Speed 6317.86 samples/sec Loss 3.3141 LearningRate 0.0001 Epoch: 30 Global Step: 639270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:43,606-Speed 6307.04 samples/sec Loss 3.3478 LearningRate 0.0001 Epoch: 30 Global Step: 639280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:46,838-Speed 6337.21 samples/sec Loss 3.2362 LearningRate 0.0001 Epoch: 30 Global Step: 639290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:50,087-Speed 6305.35 samples/sec Loss 3.3057 LearningRate 0.0001 Epoch: 30 Global Step: 639300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:53,328-Speed 6320.82 samples/sec Loss 3.2881 LearningRate 0.0001 Epoch: 30 Global 
Step: 639310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:56,578-Speed 6302.86 samples/sec Loss 3.2991 LearningRate 0.0001 Epoch: 30 Global Step: 639320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:52:59,825-Speed 6308.78 samples/sec Loss 3.2654 LearningRate 0.0001 Epoch: 30 Global Step: 639330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:03,071-Speed 6311.27 samples/sec Loss 3.2505 LearningRate 0.0001 Epoch: 30 Global Step: 639340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:06,317-Speed 6310.48 samples/sec Loss 3.2553 LearningRate 0.0001 Epoch: 30 Global Step: 639350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:09,560-Speed 6315.94 samples/sec Loss 3.2589 LearningRate 0.0001 Epoch: 30 Global Step: 639360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:12,806-Speed 6310.70 samples/sec Loss 3.3242 LearningRate 0.0001 Epoch: 30 Global Step: 639370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:16,063-Speed 6290.75 samples/sec Loss 3.3508 LearningRate 0.0001 Epoch: 30 Global Step: 639380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:19,306-Speed 6317.10 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 30 Global Step: 639390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 01:53:22,550-Speed 6315.20 samples/sec Loss 3.3047 LearningRate 0.0001 Epoch: 30 Global Step: 639400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:25,798-Speed 6306.91 samples/sec Loss 3.2541 LearningRate 0.0001 Epoch: 30 Global Step: 639410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:29,043-Speed 6311.54 samples/sec Loss 3.2977 LearningRate 0.0001 Epoch: 30 Global Step: 639420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:32,290-Speed 6308.98 samples/sec Loss 3.2821 LearningRate 0.0001 Epoch: 30 Global Step: 639430 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:53:35,536-Speed 6311.38 samples/sec Loss 3.3743 LearningRate 0.0001 Epoch: 30 Global Step: 639440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:38,785-Speed 6303.86 samples/sec Loss 3.2979 LearningRate 0.0001 Epoch: 30 Global Step: 639450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:42,033-Speed 6307.23 samples/sec Loss 3.3105 LearningRate 0.0001 Epoch: 30 Global Step: 639460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:45,282-Speed 6306.02 samples/sec Loss 3.2570 LearningRate 0.0001 Epoch: 30 Global Step: 639470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:48,534-Speed 6298.95 samples/sec Loss 3.2836 LearningRate 0.0001 Epoch: 30 Global Step: 639480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:51,782-Speed 6306.68 samples/sec Loss 3.2380 LearningRate 0.0001 Epoch: 30 Global Step: 639490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:55,014-Speed 6337.44 samples/sec Loss 3.3345 LearningRate 0.0001 Epoch: 30 Global Step: 639500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:53:58,258-Speed 6314.85 samples/sec Loss 3.2388 LearningRate 0.0001 Epoch: 30 Global Step: 639510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:01,509-Speed 6300.83 samples/sec Loss 3.2826 LearningRate 0.0001 Epoch: 30 Global Step: 639520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:04,759-Speed 6301.49 samples/sec Loss 3.3268 LearningRate 0.0001 Epoch: 30 Global Step: 639530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:08,003-Speed 6314.78 samples/sec Loss 3.2801 LearningRate 0.0001 Epoch: 30 Global Step: 639540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:11,251-Speed 6307.02 samples/sec Loss 3.2992 LearningRate 0.0001 Epoch: 30 Global Step: 639550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:54:14,500-Speed 6304.46 samples/sec Loss 3.2822 LearningRate 0.0001 Epoch: 30 Global Step: 639560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:17,743-Speed 6318.05 samples/sec Loss 3.2563 LearningRate 0.0001 Epoch: 30 Global Step: 639570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:20,987-Speed 6313.27 samples/sec Loss 3.2720 LearningRate 0.0001 Epoch: 30 Global Step: 639580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:24,241-Speed 6296.70 samples/sec Loss 3.3002 LearningRate 0.0001 Epoch: 30 Global Step: 639590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:27,470-Speed 6342.81 samples/sec Loss 3.3064 LearningRate 0.0001 Epoch: 30 Global Step: 639600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:30,718-Speed 6307.74 samples/sec Loss 3.2787 LearningRate 0.0001 Epoch: 30 Global Step: 639610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:33,964-Speed 6311.96 samples/sec Loss 3.2900 LearningRate 0.0001 Epoch: 30 Global Step: 639620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:37,211-Speed 6308.98 samples/sec Loss 3.3216 LearningRate 0.0001 Epoch: 30 Global Step: 639630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:40,459-Speed 6306.62 samples/sec Loss 3.2745 LearningRate 0.0001 Epoch: 30 Global Step: 639640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:43,705-Speed 6310.73 samples/sec Loss 3.2905 LearningRate 0.0001 Epoch: 30 Global Step: 639650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:46,951-Speed 6310.90 samples/sec Loss 3.2996 LearningRate 0.0001 Epoch: 30 Global Step: 639660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:50,202-Speed 6301.83 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 30 Global Step: 639670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:53,450-Speed 6306.13 samples/sec Loss 
3.3473 LearningRate 0.0001 Epoch: 30 Global Step: 639680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:56,696-Speed 6310.48 samples/sec Loss 3.2947 LearningRate 0.0001 Epoch: 30 Global Step: 639690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:54:59,941-Speed 6313.27 samples/sec Loss 3.3085 LearningRate 0.0001 Epoch: 30 Global Step: 639700 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 01:55:03,174-Speed 6336.39 samples/sec Loss 3.3454 LearningRate 0.0001 Epoch: 30 Global Step: 639710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:06,419-Speed 6312.14 samples/sec Loss 3.3103 LearningRate 0.0001 Epoch: 30 Global Step: 639720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:09,667-Speed 6306.08 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 30 Global Step: 639730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:12,917-Speed 6302.65 samples/sec Loss 3.2772 LearningRate 0.0001 Epoch: 30 Global Step: 639740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:16,164-Speed 6309.83 samples/sec Loss 3.2473 LearningRate 0.0001 Epoch: 30 Global Step: 639750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:19,421-Speed 6289.66 samples/sec Loss 3.3052 LearningRate 0.0001 Epoch: 30 Global Step: 639760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:22,675-Speed 6295.12 samples/sec Loss 3.2614 LearningRate 0.0001 Epoch: 30 Global Step: 639770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:25,921-Speed 6310.72 samples/sec Loss 3.2706 LearningRate 0.0001 Epoch: 30 Global Step: 639780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:29,161-Speed 6321.50 samples/sec Loss 3.2678 LearningRate 0.0001 Epoch: 30 Global Step: 639790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:32,409-Speed 6306.84 samples/sec Loss 3.2496 LearningRate 0.0001 Epoch: 30 
Global Step: 639800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:35,643-Speed 6335.17 samples/sec Loss 3.2482 LearningRate 0.0001 Epoch: 30 Global Step: 639810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:38,887-Speed 6314.35 samples/sec Loss 3.2775 LearningRate 0.0001 Epoch: 30 Global Step: 639820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:42,142-Speed 6292.85 samples/sec Loss 3.2716 LearningRate 0.0001 Epoch: 30 Global Step: 639830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:45,386-Speed 6314.96 samples/sec Loss 3.2964 LearningRate 0.0001 Epoch: 30 Global Step: 639840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:48,631-Speed 6312.78 samples/sec Loss 3.3307 LearningRate 0.0001 Epoch: 30 Global Step: 639850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:51,878-Speed 6309.70 samples/sec Loss 3.2320 LearningRate 0.0001 Epoch: 30 Global Step: 639860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:55,123-Speed 6313.02 samples/sec Loss 3.3015 LearningRate 0.0001 Epoch: 30 Global Step: 639870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:55:58,372-Speed 6305.19 samples/sec Loss 3.2949 LearningRate 0.0001 Epoch: 30 Global Step: 639880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:01,620-Speed 6306.94 samples/sec Loss 3.3573 LearningRate 0.0001 Epoch: 30 Global Step: 639890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:04,867-Speed 6308.29 samples/sec Loss 3.2639 LearningRate 0.0001 Epoch: 30 Global Step: 639900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:08,099-Speed 6338.82 samples/sec Loss 3.2701 LearningRate 0.0001 Epoch: 30 Global Step: 639910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:11,344-Speed 6311.13 samples/sec Loss 3.2970 LearningRate 0.0001 Epoch: 30 Global Step: 639920 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:56:14,597-Speed 6296.87 samples/sec Loss 3.2393 LearningRate 0.0001 Epoch: 30 Global Step: 639930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:17,846-Speed 6305.53 samples/sec Loss 3.2800 LearningRate 0.0001 Epoch: 30 Global Step: 639940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:21,094-Speed 6306.23 samples/sec Loss 3.2861 LearningRate 0.0001 Epoch: 30 Global Step: 639950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:24,348-Speed 6296.88 samples/sec Loss 3.2934 LearningRate 0.0001 Epoch: 30 Global Step: 639960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:27,593-Speed 6311.10 samples/sec Loss 3.3232 LearningRate 0.0001 Epoch: 30 Global Step: 639970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:30,844-Speed 6301.83 samples/sec Loss 3.2864 LearningRate 0.0001 Epoch: 30 Global Step: 639980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:34,087-Speed 6315.74 samples/sec Loss 3.3123 LearningRate 0.0001 Epoch: 30 Global Step: 639990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:37,333-Speed 6310.36 samples/sec Loss 3.3391 LearningRate 0.0001 Epoch: 30 Global Step: 640000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:40,568-Speed 6333.71 samples/sec Loss 3.2689 LearningRate 0.0001 Epoch: 30 Global Step: 640010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:43,812-Speed 6314.19 samples/sec Loss 3.2968 LearningRate 0.0001 Epoch: 30 Global Step: 640020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:47,065-Speed 6296.85 samples/sec Loss 3.3167 LearningRate 0.0001 Epoch: 30 Global Step: 640030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:50,311-Speed 6310.53 samples/sec Loss 3.2806 LearningRate 0.0001 Epoch: 30 Global Step: 640040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:56:53,558-Speed 6308.05 samples/sec Loss 3.2415 LearningRate 0.0001 Epoch: 30 Global Step: 640050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:56:56,804-Speed 6310.44 samples/sec Loss 3.2408 LearningRate 0.0001 Epoch: 30 Global Step: 640060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:00,055-Speed 6302.40 samples/sec Loss 3.2496 LearningRate 0.0001 Epoch: 30 Global Step: 640070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:03,305-Speed 6303.30 samples/sec Loss 3.3219 LearningRate 0.0001 Epoch: 30 Global Step: 640080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:06,549-Speed 6314.00 samples/sec Loss 3.2575 LearningRate 0.0001 Epoch: 30 Global Step: 640090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:09,799-Speed 6303.90 samples/sec Loss 3.2847 LearningRate 0.0001 Epoch: 30 Global Step: 640100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:13,043-Speed 6314.10 samples/sec Loss 3.2670 LearningRate 0.0001 Epoch: 30 Global Step: 640110 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 01:57:16,277-Speed 6334.70 samples/sec Loss 3.2368 LearningRate 0.0001 Epoch: 30 Global Step: 640120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:19,522-Speed 6312.36 samples/sec Loss 3.2938 LearningRate 0.0001 Epoch: 30 Global Step: 640130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:22,770-Speed 6306.66 samples/sec Loss 3.2629 LearningRate 0.0001 Epoch: 30 Global Step: 640140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:26,017-Speed 6308.26 samples/sec Loss 3.2803 LearningRate 0.0001 Epoch: 30 Global Step: 640150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:29,270-Speed 6298.56 samples/sec Loss 3.2758 LearningRate 0.0001 Epoch: 30 Global Step: 640160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:32,513-Speed 6314.80 samples/sec 
Loss 3.2927 LearningRate 0.0001 Epoch: 30 Global Step: 640170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:35,799-Speed 6234.87 samples/sec Loss 3.3661 LearningRate 0.0001 Epoch: 30 Global Step: 640180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:39,181-Speed 6056.74 samples/sec Loss 3.3364 LearningRate 0.0001 Epoch: 30 Global Step: 640190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:42,518-Speed 6139.77 samples/sec Loss 3.3083 LearningRate 0.0001 Epoch: 30 Global Step: 640200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:45,763-Speed 6312.32 samples/sec Loss 3.2869 LearningRate 0.0001 Epoch: 30 Global Step: 640210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:48,998-Speed 6332.42 samples/sec Loss 3.3231 LearningRate 0.0001 Epoch: 30 Global Step: 640220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:52,245-Speed 6308.57 samples/sec Loss 3.2720 LearningRate 0.0001 Epoch: 30 Global Step: 640230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:55,490-Speed 6312.81 samples/sec Loss 3.2665 LearningRate 0.0001 Epoch: 30 Global Step: 640240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:57:58,737-Speed 6308.04 samples/sec Loss 3.2829 LearningRate 0.0001 Epoch: 30 Global Step: 640250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:02,050-Speed 6183.32 samples/sec Loss 3.3128 LearningRate 0.0001 Epoch: 30 Global Step: 640260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:05,331-Speed 6243.03 samples/sec Loss 3.3109 LearningRate 0.0001 Epoch: 30 Global Step: 640270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:08,576-Speed 6312.48 samples/sec Loss 3.2694 LearningRate 0.0001 Epoch: 30 Global Step: 640280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:11,825-Speed 6305.67 samples/sec Loss 3.2576 LearningRate 0.0001 Epoch: 30 
Global Step: 640290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:15,068-Speed 6317.70 samples/sec Loss 3.2668 LearningRate 0.0001 Epoch: 30 Global Step: 640300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:18,312-Speed 6314.04 samples/sec Loss 3.2339 LearningRate 0.0001 Epoch: 30 Global Step: 640310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:21,543-Speed 6341.54 samples/sec Loss 3.3034 LearningRate 0.0001 Epoch: 30 Global Step: 640320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:24,789-Speed 6310.15 samples/sec Loss 3.2534 LearningRate 0.0001 Epoch: 30 Global Step: 640330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:28,036-Speed 6308.32 samples/sec Loss 3.2804 LearningRate 0.0001 Epoch: 30 Global Step: 640340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:31,283-Speed 6308.30 samples/sec Loss 3.2802 LearningRate 0.0001 Epoch: 30 Global Step: 640350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:34,531-Speed 6307.88 samples/sec Loss 3.2715 LearningRate 0.0001 Epoch: 30 Global Step: 640360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:37,776-Speed 6312.15 samples/sec Loss 3.3035 LearningRate 0.0001 Epoch: 30 Global Step: 640370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:41,023-Speed 6307.74 samples/sec Loss 3.2846 LearningRate 0.0001 Epoch: 30 Global Step: 640380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:44,276-Speed 6298.36 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 30 Global Step: 640390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:47,524-Speed 6305.79 samples/sec Loss 3.3195 LearningRate 0.0001 Epoch: 30 Global Step: 640400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:50,785-Speed 6283.15 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 30 Global Step: 640410 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 01:58:54,028-Speed 6316.37 samples/sec Loss 3.3140 LearningRate 0.0001 Epoch: 30 Global Step: 640420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:58:57,277-Speed 6303.76 samples/sec Loss 3.2422 LearningRate 0.0001 Epoch: 30 Global Step: 640430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:00,531-Speed 6294.58 samples/sec Loss 3.3048 LearningRate 0.0001 Epoch: 30 Global Step: 640440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:03,775-Speed 6315.17 samples/sec Loss 3.3053 LearningRate 0.0001 Epoch: 30 Global Step: 640450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:07,018-Speed 6315.97 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 30 Global Step: 640460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:10,270-Speed 6298.97 samples/sec Loss 3.2739 LearningRate 0.0001 Epoch: 30 Global Step: 640470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:13,516-Speed 6311.55 samples/sec Loss 3.2912 LearningRate 0.0001 Epoch: 30 Global Step: 640480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:16,759-Speed 6316.40 samples/sec Loss 3.2979 LearningRate 0.0001 Epoch: 30 Global Step: 640490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:20,005-Speed 6310.78 samples/sec Loss 3.2570 LearningRate 0.0001 Epoch: 30 Global Step: 640500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:23,250-Speed 6313.02 samples/sec Loss 3.2798 LearningRate 0.0001 Epoch: 30 Global Step: 640510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:26,479-Speed 6345.25 samples/sec Loss 3.3240 LearningRate 0.0001 Epoch: 30 Global Step: 640520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:29,723-Speed 6312.86 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 30 Global Step: 640530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
01:59:32,969-Speed 6312.03 samples/sec Loss 3.3249 LearningRate 0.0001 Epoch: 30 Global Step: 640540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:36,213-Speed 6313.99 samples/sec Loss 3.2709 LearningRate 0.0001 Epoch: 30 Global Step: 640550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:39,456-Speed 6315.88 samples/sec Loss 3.2456 LearningRate 0.0001 Epoch: 30 Global Step: 640560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:42,697-Speed 6320.28 samples/sec Loss 3.2856 LearningRate 0.0001 Epoch: 30 Global Step: 640570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:45,945-Speed 6308.37 samples/sec Loss 3.2647 LearningRate 0.0001 Epoch: 30 Global Step: 640580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 01:59:49,175-Speed 6341.45 samples/sec Loss 3.2485 LearningRate 0.0001 Epoch: 30 Global Step: 640590 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:59:52,427-Speed 6298.84 samples/sec Loss 3.3312 LearningRate 0.0001 Epoch: 30 Global Step: 640600 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:59:55,672-Speed 6312.00 samples/sec Loss 3.2734 LearningRate 0.0001 Epoch: 30 Global Step: 640610 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 01:59:58,918-Speed 6310.41 samples/sec Loss 3.2579 LearningRate 0.0001 Epoch: 30 Global Step: 640620 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:02,176-Speed 6289.01 samples/sec Loss 3.3491 LearningRate 0.0001 Epoch: 30 Global Step: 640630 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:05,421-Speed 6311.58 samples/sec Loss 3.2644 LearningRate 0.0001 Epoch: 30 Global Step: 640640 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:08,670-Speed 6304.29 samples/sec Loss 3.2481 LearningRate 0.0001 Epoch: 30 Global Step: 640650 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:11,917-Speed 6308.87 samples/sec Loss 
3.3144 LearningRate 0.0001 Epoch: 30 Global Step: 640660 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:15,163-Speed 6311.64 samples/sec Loss 3.2913 LearningRate 0.0001 Epoch: 30 Global Step: 640670 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:18,431-Speed 6267.23 samples/sec Loss 3.3113 LearningRate 0.0001 Epoch: 30 Global Step: 640680 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:00:21,767-Speed 6141.29 samples/sec Loss 3.2714 LearningRate 0.0001 Epoch: 30 Global Step: 640690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:25,011-Speed 6314.43 samples/sec Loss 3.2436 LearningRate 0.0001 Epoch: 30 Global Step: 640700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:28,256-Speed 6311.88 samples/sec Loss 3.2990 LearningRate 0.0001 Epoch: 30 Global Step: 640710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:31,505-Speed 6305.98 samples/sec Loss 3.3665 LearningRate 0.0001 Epoch: 30 Global Step: 640720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:34,749-Speed 6315.13 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 30 Global Step: 640730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:37,994-Speed 6312.90 samples/sec Loss 3.2782 LearningRate 0.0001 Epoch: 30 Global Step: 640740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:41,236-Speed 6319.05 samples/sec Loss 3.3247 LearningRate 0.0001 Epoch: 30 Global Step: 640750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:44,481-Speed 6312.35 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 30 Global Step: 640760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:47,727-Speed 6310.39 samples/sec Loss 3.2741 LearningRate 0.0001 Epoch: 30 Global Step: 640770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:50,973-Speed 6313.71 samples/sec Loss 3.2959 LearningRate 0.0001 Epoch: 30 Global 
Step: 640780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:54,201-Speed 6344.35 samples/sec Loss 3.2905 LearningRate 0.0001 Epoch: 30 Global Step: 640790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:00:57,446-Speed 6313.63 samples/sec Loss 3.2753 LearningRate 0.0001 Epoch: 30 Global Step: 640800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:00,686-Speed 6320.76 samples/sec Loss 3.3009 LearningRate 0.0001 Epoch: 30 Global Step: 640810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:03,934-Speed 6308.04 samples/sec Loss 3.2909 LearningRate 0.0001 Epoch: 30 Global Step: 640820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:07,179-Speed 6311.99 samples/sec Loss 3.2906 LearningRate 0.0001 Epoch: 30 Global Step: 640830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:10,422-Speed 6316.54 samples/sec Loss 3.2852 LearningRate 0.0001 Epoch: 30 Global Step: 640840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:13,674-Speed 6299.37 samples/sec Loss 3.2985 LearningRate 0.0001 Epoch: 30 Global Step: 640850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:16,921-Speed 6309.53 samples/sec Loss 3.2731 LearningRate 0.0001 Epoch: 30 Global Step: 640860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:20,164-Speed 6316.12 samples/sec Loss 3.2293 LearningRate 0.0001 Epoch: 30 Global Step: 640870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:23,415-Speed 6300.63 samples/sec Loss 3.3018 LearningRate 0.0001 Epoch: 30 Global Step: 640880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:26,646-Speed 6341.12 samples/sec Loss 3.2973 LearningRate 0.0001 Epoch: 30 Global Step: 640890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:29,888-Speed 6317.64 samples/sec Loss 3.2141 LearningRate 0.0001 Epoch: 30 Global Step: 640900 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:01:33,133-Speed 6312.74 samples/sec Loss 3.2266 LearningRate 0.0001 Epoch: 30 Global Step: 640910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:36,379-Speed 6309.77 samples/sec Loss 3.2261 LearningRate 0.0001 Epoch: 30 Global Step: 640920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:39,627-Speed 6307.63 samples/sec Loss 3.3483 LearningRate 0.0001 Epoch: 30 Global Step: 640930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:42,876-Speed 6305.96 samples/sec Loss 3.3127 LearningRate 0.0001 Epoch: 30 Global Step: 640940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:46,125-Speed 6305.69 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 30 Global Step: 640950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:49,370-Speed 6310.81 samples/sec Loss 3.3728 LearningRate 0.0001 Epoch: 30 Global Step: 640960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:52,614-Speed 6315.49 samples/sec Loss 3.3311 LearningRate 0.0001 Epoch: 30 Global Step: 640970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:55,865-Speed 6301.39 samples/sec Loss 3.3122 LearningRate 0.0001 Epoch: 30 Global Step: 640980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:01:59,102-Speed 6328.81 samples/sec Loss 3.2771 LearningRate 0.0001 Epoch: 30 Global Step: 640990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:02,349-Speed 6307.53 samples/sec Loss 3.2905 LearningRate 0.0001 Epoch: 30 Global Step: 641000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:05,600-Speed 6301.55 samples/sec Loss 3.3125 LearningRate 0.0001 Epoch: 30 Global Step: 641010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:08,852-Speed 6298.81 samples/sec Loss 3.2377 LearningRate 0.0001 Epoch: 30 Global Step: 641020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:02:12,102-Speed 6303.59 samples/sec Loss 3.2818 LearningRate 0.0001 Epoch: 30 Global Step: 641030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:15,347-Speed 6311.89 samples/sec Loss 3.2571 LearningRate 0.0001 Epoch: 30 Global Step: 641040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:18,595-Speed 6306.75 samples/sec Loss 3.3268 LearningRate 0.0001 Epoch: 30 Global Step: 641050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:21,843-Speed 6307.48 samples/sec Loss 3.2633 LearningRate 0.0001 Epoch: 30 Global Step: 641060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:25,091-Speed 6306.57 samples/sec Loss 3.2939 LearningRate 0.0001 Epoch: 30 Global Step: 641070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:28,342-Speed 6300.45 samples/sec Loss 3.3107 LearningRate 0.0001 Epoch: 30 Global Step: 641080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:31,580-Speed 6327.48 samples/sec Loss 3.2708 LearningRate 0.0001 Epoch: 30 Global Step: 641090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:34,823-Speed 6315.27 samples/sec Loss 3.2747 LearningRate 0.0001 Epoch: 30 Global Step: 641100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:38,093-Speed 6264.60 samples/sec Loss 3.2985 LearningRate 0.0001 Epoch: 30 Global Step: 641110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:41,338-Speed 6312.83 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 30 Global Step: 641120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:44,585-Speed 6309.42 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 30 Global Step: 641130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:47,829-Speed 6313.42 samples/sec Loss 3.2177 LearningRate 0.0001 Epoch: 30 Global Step: 641140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:51,101-Speed 6261.48 samples/sec Loss 
3.2592 LearningRate 0.0001 Epoch: 30 Global Step: 641150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:54,346-Speed 6313.54 samples/sec Loss 3.2468 LearningRate 0.0001 Epoch: 30 Global Step: 641160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:02:57,593-Speed 6308.51 samples/sec Loss 3.3560 LearningRate 0.0001 Epoch: 30 Global Step: 641170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:00,837-Speed 6315.12 samples/sec Loss 3.3217 LearningRate 0.0001 Epoch: 30 Global Step: 641180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:04,079-Speed 6319.07 samples/sec Loss 3.2654 LearningRate 0.0001 Epoch: 30 Global Step: 641190 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:03:07,313-Speed 6333.58 samples/sec Loss 3.2497 LearningRate 0.0001 Epoch: 30 Global Step: 641200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:10,559-Speed 6310.89 samples/sec Loss 3.2859 LearningRate 0.0001 Epoch: 30 Global Step: 641210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:13,802-Speed 6315.09 samples/sec Loss 3.3019 LearningRate 0.0001 Epoch: 30 Global Step: 641220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:17,049-Speed 6308.70 samples/sec Loss 3.2999 LearningRate 0.0001 Epoch: 30 Global Step: 641230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:20,300-Speed 6302.37 samples/sec Loss 3.2885 LearningRate 0.0001 Epoch: 30 Global Step: 641240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:23,545-Speed 6311.58 samples/sec Loss 3.3049 LearningRate 0.0001 Epoch: 30 Global Step: 641250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:26,791-Speed 6311.68 samples/sec Loss 3.2988 LearningRate 0.0001 Epoch: 30 Global Step: 641260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:30,034-Speed 6317.09 samples/sec Loss 3.2838 LearningRate 0.0001 Epoch: 30 
Global Step: 641270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:33,283-Speed 6304.27 samples/sec Loss 3.3066 LearningRate 0.0001 Epoch: 30 Global Step: 641280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:36,527-Speed 6314.37 samples/sec Loss 3.3319 LearningRate 0.0001 Epoch: 30 Global Step: 641290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:39,760-Speed 6337.25 samples/sec Loss 3.3084 LearningRate 0.0001 Epoch: 30 Global Step: 641300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:43,006-Speed 6310.02 samples/sec Loss 3.3008 LearningRate 0.0001 Epoch: 30 Global Step: 641310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:46,253-Speed 6308.61 samples/sec Loss 3.2621 LearningRate 0.0001 Epoch: 30 Global Step: 641320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:49,504-Speed 6301.01 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 30 Global Step: 641330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:52,750-Speed 6310.80 samples/sec Loss 3.2665 LearningRate 0.0001 Epoch: 30 Global Step: 641340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:55,995-Speed 6312.95 samples/sec Loss 3.2653 LearningRate 0.0001 Epoch: 30 Global Step: 641350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:03:59,239-Speed 6314.23 samples/sec Loss 3.3138 LearningRate 0.0001 Epoch: 30 Global Step: 641360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:02,489-Speed 6302.20 samples/sec Loss 3.2435 LearningRate 0.0001 Epoch: 30 Global Step: 641370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:05,736-Speed 6310.76 samples/sec Loss 3.3325 LearningRate 0.0001 Epoch: 30 Global Step: 641380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:08,983-Speed 6307.39 samples/sec Loss 3.2915 LearningRate 0.0001 Epoch: 30 Global Step: 641390 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:04:12,257-Speed 6257.06 samples/sec Loss 3.2813 LearningRate 0.0001 Epoch: 30 Global Step: 641400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:15,535-Speed 6248.67 samples/sec Loss 3.3001 LearningRate 0.0001 Epoch: 30 Global Step: 641410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:18,782-Speed 6309.67 samples/sec Loss 3.3238 LearningRate 0.0001 Epoch: 30 Global Step: 641420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:22,029-Speed 6308.11 samples/sec Loss 3.2870 LearningRate 0.0001 Epoch: 30 Global Step: 641430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:25,283-Speed 6295.69 samples/sec Loss 3.2268 LearningRate 0.0001 Epoch: 30 Global Step: 641440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:28,529-Speed 6311.62 samples/sec Loss 3.2780 LearningRate 0.0001 Epoch: 30 Global Step: 641450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:31,773-Speed 6312.94 samples/sec Loss 3.2640 LearningRate 0.0001 Epoch: 30 Global Step: 641460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:35,021-Speed 6306.79 samples/sec Loss 3.3547 LearningRate 0.0001 Epoch: 30 Global Step: 641470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:38,264-Speed 6316.48 samples/sec Loss 3.2969 LearningRate 0.0001 Epoch: 30 Global Step: 641480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:41,510-Speed 6312.58 samples/sec Loss 3.3026 LearningRate 0.0001 Epoch: 30 Global Step: 641490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:44,754-Speed 6314.06 samples/sec Loss 3.2839 LearningRate 0.0001 Epoch: 30 Global Step: 641500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:04:47,983-Speed 6343.64 samples/sec Loss 3.3178 LearningRate 0.0001 Epoch: 30 Global Step: 641510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:04:51,230-Speed 6310.96 samples/sec Loss 3.3068 LearningRate 0.0001 Epoch: 30 Global Step: 641520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:54,474-Speed 6316.21 samples/sec Loss 3.3093 LearningRate 0.0001 Epoch: 30 Global Step: 641530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:04:57,728-Speed 6293.93 samples/sec Loss 3.2783 LearningRate 0.0001 Epoch: 30 Global Step: 641540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:00,977-Speed 6306.16 samples/sec Loss 3.3251 LearningRate 0.0001 Epoch: 30 Global Step: 641550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:04,225-Speed 6306.29 samples/sec Loss 3.3026 LearningRate 0.0001 Epoch: 30 Global Step: 641560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:07,470-Speed 6311.42 samples/sec Loss 3.2940 LearningRate 0.0001 Epoch: 30 Global Step: 641570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:10,714-Speed 6315.49 samples/sec Loss 3.2744 LearningRate 0.0001 Epoch: 30 Global Step: 641580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:13,958-Speed 6314.38 samples/sec Loss 3.2690 LearningRate 0.0001 Epoch: 30 Global Step: 641590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:17,204-Speed 6311.27 samples/sec Loss 3.2526 LearningRate 0.0001 Epoch: 30 Global Step: 641600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:20,432-Speed 6345.37 samples/sec Loss 3.2376 LearningRate 0.0001 Epoch: 30 Global Step: 641610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:23,677-Speed 6313.85 samples/sec Loss 3.2966 LearningRate 0.0001 Epoch: 30 Global Step: 641620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:26,926-Speed 6305.61 samples/sec Loss 3.2806 LearningRate 0.0001 Epoch: 30 Global Step: 641630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:30,190-Speed 6274.75 samples/sec Loss 
3.2995 LearningRate 0.0001 Epoch: 30 Global Step: 641640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:33,432-Speed 6318.46 samples/sec Loss 3.2937 LearningRate 0.0001 Epoch: 30 Global Step: 641650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:36,676-Speed 6315.06 samples/sec Loss 3.2956 LearningRate 0.0001 Epoch: 30 Global Step: 641660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:39,923-Speed 6309.11 samples/sec Loss 3.2795 LearningRate 0.0001 Epoch: 30 Global Step: 641670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:43,164-Speed 6320.26 samples/sec Loss 3.2762 LearningRate 0.0001 Epoch: 30 Global Step: 641680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:46,409-Speed 6311.48 samples/sec Loss 3.2414 LearningRate 0.0001 Epoch: 30 Global Step: 641690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:49,652-Speed 6317.01 samples/sec Loss 3.3081 LearningRate 0.0001 Epoch: 30 Global Step: 641700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:52,884-Speed 6339.07 samples/sec Loss 3.2665 LearningRate 0.0001 Epoch: 30 Global Step: 641710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:56,133-Speed 6304.06 samples/sec Loss 3.2833 LearningRate 0.0001 Epoch: 30 Global Step: 641720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:05:59,377-Speed 6314.92 samples/sec Loss 3.2515 LearningRate 0.0001 Epoch: 30 Global Step: 641730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:02,624-Speed 6309.53 samples/sec Loss 3.2901 LearningRate 0.0001 Epoch: 30 Global Step: 641740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:05,872-Speed 6306.20 samples/sec Loss 3.2215 LearningRate 0.0001 Epoch: 30 Global Step: 641750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:09,121-Speed 6304.80 samples/sec Loss 3.3044 LearningRate 0.0001 Epoch: 30 Global 
Step: 641760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:12,371-Speed 6303.38 samples/sec Loss 3.3239 LearningRate 0.0001 Epoch: 30 Global Step: 641770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:15,616-Speed 6311.79 samples/sec Loss 3.2831 LearningRate 0.0001 Epoch: 30 Global Step: 641780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:18,862-Speed 6311.24 samples/sec Loss 3.2819 LearningRate 0.0001 Epoch: 30 Global Step: 641790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:22,109-Speed 6308.00 samples/sec Loss 3.2314 LearningRate 0.0001 Epoch: 30 Global Step: 641800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:25,348-Speed 6324.76 samples/sec Loss 3.3065 LearningRate 0.0001 Epoch: 30 Global Step: 641810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:28,597-Speed 6305.27 samples/sec Loss 3.2665 LearningRate 0.0001 Epoch: 30 Global Step: 641820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:31,844-Speed 6309.84 samples/sec Loss 3.2454 LearningRate 0.0001 Epoch: 30 Global Step: 641830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:35,090-Speed 6308.87 samples/sec Loss 3.3470 LearningRate 0.0001 Epoch: 30 Global Step: 641840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:38,344-Speed 6296.32 samples/sec Loss 3.3220 LearningRate 0.0001 Epoch: 30 Global Step: 641850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:41,594-Speed 6303.78 samples/sec Loss 3.2559 LearningRate 0.0001 Epoch: 30 Global Step: 641860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:44,834-Speed 6321.21 samples/sec Loss 3.3041 LearningRate 0.0001 Epoch: 30 Global Step: 641870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:48,083-Speed 6305.97 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 30 Global Step: 641880 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:06:51,338-Speed 6293.52 samples/sec Loss 3.3037 LearningRate 0.0001 Epoch: 30 Global Step: 641890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:54,587-Speed 6303.02 samples/sec Loss 3.2901 LearningRate 0.0001 Epoch: 30 Global Step: 641900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:06:57,822-Speed 6333.79 samples/sec Loss 3.3085 LearningRate 0.0001 Epoch: 30 Global Step: 641910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:01,065-Speed 6315.38 samples/sec Loss 3.2633 LearningRate 0.0001 Epoch: 30 Global Step: 641920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:04,313-Speed 6307.46 samples/sec Loss 3.3293 LearningRate 0.0001 Epoch: 30 Global Step: 641930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:07,556-Speed 6316.60 samples/sec Loss 3.3045 LearningRate 0.0001 Epoch: 30 Global Step: 641940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:10,802-Speed 6310.91 samples/sec Loss 3.3165 LearningRate 0.0001 Epoch: 30 Global Step: 641950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:14,049-Speed 6308.07 samples/sec Loss 3.2725 LearningRate 0.0001 Epoch: 30 Global Step: 641960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:17,295-Speed 6310.07 samples/sec Loss 3.2406 LearningRate 0.0001 Epoch: 30 Global Step: 641970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:20,552-Speed 6290.65 samples/sec Loss 3.2453 LearningRate 0.0001 Epoch: 30 Global Step: 641980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:23,798-Speed 6309.87 samples/sec Loss 3.2343 LearningRate 0.0001 Epoch: 30 Global Step: 641990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:27,043-Speed 6312.48 samples/sec Loss 3.2529 LearningRate 0.0001 Epoch: 30 Global Step: 642000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:07:30,285-Speed 6319.47 samples/sec Loss 3.3037 LearningRate 0.0001 Epoch: 30 Global Step: 642010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:33,526-Speed 6320.31 samples/sec Loss 3.3063 LearningRate 0.0001 Epoch: 30 Global Step: 642020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:36,773-Speed 6308.93 samples/sec Loss 3.3077 LearningRate 0.0001 Epoch: 30 Global Step: 642030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:40,022-Speed 6305.34 samples/sec Loss 3.3202 LearningRate 0.0001 Epoch: 30 Global Step: 642040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:43,268-Speed 6310.85 samples/sec Loss 3.2778 LearningRate 0.0001 Epoch: 30 Global Step: 642050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:46,513-Speed 6313.08 samples/sec Loss 3.2764 LearningRate 0.0001 Epoch: 30 Global Step: 642060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:49,761-Speed 6306.94 samples/sec Loss 3.2858 LearningRate 0.0001 Epoch: 30 Global Step: 642070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:53,007-Speed 6311.39 samples/sec Loss 3.2813 LearningRate 0.0001 Epoch: 30 Global Step: 642080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:56,250-Speed 6316.90 samples/sec Loss 3.2212 LearningRate 0.0001 Epoch: 30 Global Step: 642090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:07:59,496-Speed 6309.24 samples/sec Loss 3.2602 LearningRate 0.0001 Epoch: 30 Global Step: 642100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:02,728-Speed 6337.82 samples/sec Loss 3.2555 LearningRate 0.0001 Epoch: 30 Global Step: 642110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:05,975-Speed 6309.84 samples/sec Loss 3.2384 LearningRate 0.0001 Epoch: 30 Global Step: 642120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:09,220-Speed 6311.23 samples/sec Loss 
3.2479 LearningRate 0.0001 Epoch: 30 Global Step: 642130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:12,467-Speed 6308.81 samples/sec Loss 3.2645 LearningRate 0.0001 Epoch: 30 Global Step: 642140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:15,742-Speed 6256.34 samples/sec Loss 3.2951 LearningRate 0.0001 Epoch: 30 Global Step: 642150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:19,011-Speed 6264.35 samples/sec Loss 3.2572 LearningRate 0.0001 Epoch: 30 Global Step: 642160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:22,335-Speed 6164.27 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 30 Global Step: 642170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:25,644-Speed 6189.08 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 30 Global Step: 642180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:28,892-Speed 6307.56 samples/sec Loss 3.2765 LearningRate 0.0001 Epoch: 30 Global Step: 642190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:32,139-Speed 6309.78 samples/sec Loss 3.2842 LearningRate 0.0001 Epoch: 30 Global Step: 642200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:35,380-Speed 6319.33 samples/sec Loss 3.3391 LearningRate 0.0001 Epoch: 30 Global Step: 642210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:08:38,616-Speed 6329.46 samples/sec Loss 3.3414 LearningRate 0.0001 Epoch: 30 Global Step: 642220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:41,860-Speed 6316.34 samples/sec Loss 3.3064 LearningRate 0.0001 Epoch: 30 Global Step: 642230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:45,104-Speed 6313.29 samples/sec Loss 3.2719 LearningRate 0.0001 Epoch: 30 Global Step: 642240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:48,350-Speed 6311.62 samples/sec Loss 3.2656 LearningRate 0.0001 Epoch: 30 
Global Step: 642250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:51,593-Speed 6317.32 samples/sec Loss 3.3455 LearningRate 0.0001 Epoch: 30 Global Step: 642260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:54,836-Speed 6316.56 samples/sec Loss 3.2341 LearningRate 0.0001 Epoch: 30 Global Step: 642270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:08:58,083-Speed 6309.03 samples/sec Loss 3.2594 LearningRate 0.0001 Epoch: 30 Global Step: 642280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:01,326-Speed 6316.90 samples/sec Loss 3.2205 LearningRate 0.0001 Epoch: 30 Global Step: 642290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:04,570-Speed 6313.97 samples/sec Loss 3.2245 LearningRate 0.0001 Epoch: 30 Global Step: 642300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:07,812-Speed 6317.30 samples/sec Loss 3.2944 LearningRate 0.0001 Epoch: 30 Global Step: 642310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:11,063-Speed 6301.46 samples/sec Loss 3.2424 LearningRate 0.0001 Epoch: 30 Global Step: 642320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:14,309-Speed 6310.29 samples/sec Loss 3.3371 LearningRate 0.0001 Epoch: 30 Global Step: 642330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:17,555-Speed 6310.37 samples/sec Loss 3.3229 LearningRate 0.0001 Epoch: 30 Global Step: 642340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:20,803-Speed 6308.44 samples/sec Loss 3.2293 LearningRate 0.0001 Epoch: 30 Global Step: 642350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:24,053-Speed 6302.30 samples/sec Loss 3.2731 LearningRate 0.0001 Epoch: 30 Global Step: 642360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:27,299-Speed 6311.36 samples/sec Loss 3.2566 LearningRate 0.0001 Epoch: 30 Global Step: 642370 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:09:30,562-Speed 6277.99 samples/sec Loss 3.3297 LearningRate 0.0001 Epoch: 30 Global Step: 642380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:33,809-Speed 6307.64 samples/sec Loss 3.2998 LearningRate 0.0001 Epoch: 30 Global Step: 642390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:37,054-Speed 6313.38 samples/sec Loss 3.3092 LearningRate 0.0001 Epoch: 30 Global Step: 642400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:40,300-Speed 6311.13 samples/sec Loss 3.2805 LearningRate 0.0001 Epoch: 30 Global Step: 642410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:43,545-Speed 6311.06 samples/sec Loss 3.2986 LearningRate 0.0001 Epoch: 30 Global Step: 642420 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:09:46,776-Speed 6340.10 samples/sec Loss 3.3199 LearningRate 0.0001 Epoch: 30 Global Step: 642430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:50,046-Speed 6265.84 samples/sec Loss 3.2138 LearningRate 0.0001 Epoch: 30 Global Step: 642440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:53,293-Speed 6307.07 samples/sec Loss 3.2895 LearningRate 0.0001 Epoch: 30 Global Step: 642450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:56,538-Speed 6313.07 samples/sec Loss 3.2871 LearningRate 0.0001 Epoch: 30 Global Step: 642460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:09:59,789-Speed 6302.03 samples/sec Loss 3.2916 LearningRate 0.0001 Epoch: 30 Global Step: 642470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:03,098-Speed 6190.58 samples/sec Loss 3.3272 LearningRate 0.0001 Epoch: 30 Global Step: 642480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:06,346-Speed 6306.16 samples/sec Loss 3.2966 LearningRate 0.0001 Epoch: 30 Global Step: 642490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:10:09,593-Speed 6309.70 samples/sec Loss 3.2442 LearningRate 0.0001 Epoch: 30 Global Step: 642500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:12,839-Speed 6310.29 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 30 Global Step: 642510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:16,084-Speed 6312.53 samples/sec Loss 3.2663 LearningRate 0.0001 Epoch: 30 Global Step: 642520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:19,316-Speed 6338.33 samples/sec Loss 3.3404 LearningRate 0.0001 Epoch: 30 Global Step: 642530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:22,567-Speed 6301.70 samples/sec Loss 3.3216 LearningRate 0.0001 Epoch: 30 Global Step: 642540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:25,824-Speed 6289.10 samples/sec Loss 3.2604 LearningRate 0.0001 Epoch: 30 Global Step: 642550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:29,074-Speed 6303.73 samples/sec Loss 3.2302 LearningRate 0.0001 Epoch: 30 Global Step: 642560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:32,328-Speed 6295.34 samples/sec Loss 3.2751 LearningRate 0.0001 Epoch: 30 Global Step: 642570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:35,574-Speed 6310.40 samples/sec Loss 3.2669 LearningRate 0.0001 Epoch: 30 Global Step: 642580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:38,819-Speed 6311.09 samples/sec Loss 3.2772 LearningRate 0.0001 Epoch: 30 Global Step: 642590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:42,061-Speed 6319.24 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 30 Global Step: 642600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:45,307-Speed 6311.61 samples/sec Loss 3.3248 LearningRate 0.0001 Epoch: 30 Global Step: 642610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:48,551-Speed 6312.91 samples/sec Loss 
3.2501 LearningRate 0.0001 Epoch: 30 Global Step: 642620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:51,785-Speed 6335.88 samples/sec Loss 3.3434 LearningRate 0.0001 Epoch: 30 Global Step: 642630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:55,043-Speed 6286.49 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 30 Global Step: 642640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:10:58,288-Speed 6313.19 samples/sec Loss 3.2841 LearningRate 0.0001 Epoch: 30 Global Step: 642650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:01,537-Speed 6304.92 samples/sec Loss 3.2757 LearningRate 0.0001 Epoch: 30 Global Step: 642660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:04,781-Speed 6314.71 samples/sec Loss 3.2869 LearningRate 0.0001 Epoch: 30 Global Step: 642670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:08,030-Speed 6304.75 samples/sec Loss 3.2335 LearningRate 0.0001 Epoch: 30 Global Step: 642680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:11,271-Speed 6320.87 samples/sec Loss 3.2488 LearningRate 0.0001 Epoch: 30 Global Step: 642690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:14,517-Speed 6310.27 samples/sec Loss 3.2515 LearningRate 0.0001 Epoch: 30 Global Step: 642700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:17,760-Speed 6316.14 samples/sec Loss 3.2732 LearningRate 0.0001 Epoch: 30 Global Step: 642710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:21,006-Speed 6310.92 samples/sec Loss 3.2796 LearningRate 0.0001 Epoch: 30 Global Step: 642720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:24,238-Speed 6339.26 samples/sec Loss 3.3042 LearningRate 0.0001 Epoch: 30 Global Step: 642730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:27,485-Speed 6309.07 samples/sec Loss 3.3058 LearningRate 0.0001 Epoch: 30 Global 
Step: 642740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:30,730-Speed 6311.36 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 30 Global Step: 642750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:33,975-Speed 6313.77 samples/sec Loss 3.2943 LearningRate 0.0001 Epoch: 30 Global Step: 642760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:37,222-Speed 6308.74 samples/sec Loss 3.2816 LearningRate 0.0001 Epoch: 30 Global Step: 642770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:40,472-Speed 6302.40 samples/sec Loss 3.2794 LearningRate 0.0001 Epoch: 30 Global Step: 642780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:43,716-Speed 6313.64 samples/sec Loss 3.3196 LearningRate 0.0001 Epoch: 30 Global Step: 642790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:46,960-Speed 6314.82 samples/sec Loss 3.3188 LearningRate 0.0001 Epoch: 30 Global Step: 642800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:50,205-Speed 6313.88 samples/sec Loss 3.2827 LearningRate 0.0001 Epoch: 30 Global Step: 642810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:53,454-Speed 6303.86 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 30 Global Step: 642820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:56,684-Speed 6343.15 samples/sec Loss 3.3051 LearningRate 0.0001 Epoch: 30 Global Step: 642830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:11:59,936-Speed 6299.02 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 30 Global Step: 642840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:03,181-Speed 6312.55 samples/sec Loss 3.2818 LearningRate 0.0001 Epoch: 30 Global Step: 642850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:06,429-Speed 6306.47 samples/sec Loss 3.2556 LearningRate 0.0001 Epoch: 30 Global Step: 642860 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:12:09,679-Speed 6301.79 samples/sec Loss 3.3122 LearningRate 0.0001 Epoch: 30 Global Step: 642870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:12,930-Speed 6302.23 samples/sec Loss 3.2728 LearningRate 0.0001 Epoch: 30 Global Step: 642880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:16,176-Speed 6309.45 samples/sec Loss 3.2815 LearningRate 0.0001 Epoch: 30 Global Step: 642890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:19,434-Speed 6289.10 samples/sec Loss 3.2564 LearningRate 0.0001 Epoch: 30 Global Step: 642900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:22,682-Speed 6306.29 samples/sec Loss 3.2638 LearningRate 0.0001 Epoch: 30 Global Step: 642910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:12:25,928-Speed 6310.94 samples/sec Loss 3.2560 LearningRate 0.0001 Epoch: 30 Global Step: 642920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:26,004-Speed 340.91 samples/sec Loss 3.2575 LearningRate 0.0001 Epoch: 31 Global Step: 642930 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:13:29,273-Speed 6266.99 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 31 Global Step: 642940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:32,509-Speed 6329.18 samples/sec Loss 3.3433 LearningRate 0.0001 Epoch: 31 Global Step: 642950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:35,750-Speed 6320.13 samples/sec Loss 3.2550 LearningRate 0.0001 Epoch: 31 Global Step: 642960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:38,981-Speed 6341.15 samples/sec Loss 3.2972 LearningRate 0.0001 Epoch: 31 Global Step: 642970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:42,226-Speed 6311.88 samples/sec Loss 3.2760 LearningRate 0.0001 Epoch: 31 Global Step: 642980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:13:45,465-Speed 6323.81 samples/sec Loss 3.2652 LearningRate 0.0001 Epoch: 31 Global Step: 642990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:48,709-Speed 6316.18 samples/sec Loss 3.2885 LearningRate 0.0001 Epoch: 31 Global Step: 643000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:51,948-Speed 6324.42 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 31 Global Step: 643010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:55,186-Speed 6325.38 samples/sec Loss 3.2914 LearningRate 0.0001 Epoch: 31 Global Step: 643020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:13:58,430-Speed 6314.90 samples/sec Loss 3.2977 LearningRate 0.0001 Epoch: 31 Global Step: 643030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:01,671-Speed 6320.67 samples/sec Loss 3.2788 LearningRate 0.0001 Epoch: 31 Global Step: 643040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:04,912-Speed 6319.49 samples/sec Loss 3.2471 LearningRate 0.0001 Epoch: 31 Global Step: 643050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:08,158-Speed 6310.50 samples/sec Loss 3.2872 LearningRate 0.0001 Epoch: 31 Global Step: 643060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:11,409-Speed 6301.04 samples/sec Loss 3.2656 LearningRate 0.0001 Epoch: 31 Global Step: 643070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:14,651-Speed 6317.95 samples/sec Loss 3.2577 LearningRate 0.0001 Epoch: 31 Global Step: 643080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:17,903-Speed 6301.11 samples/sec Loss 3.2703 LearningRate 0.0001 Epoch: 31 Global Step: 643090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:21,148-Speed 6313.28 samples/sec Loss 3.2593 LearningRate 0.0001 Epoch: 31 Global Step: 643100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:24,391-Speed 6315.80 samples/sec Loss 
3.3435 LearningRate 0.0001 Epoch: 31 Global Step: 643110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:27,643-Speed 6299.99 samples/sec Loss 3.2398 LearningRate 0.0001 Epoch: 31 Global Step: 643120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:30,889-Speed 6309.73 samples/sec Loss 3.2703 LearningRate 0.0001 Epoch: 31 Global Step: 643130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:34,122-Speed 6337.04 samples/sec Loss 3.2848 LearningRate 0.0001 Epoch: 31 Global Step: 643140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:37,366-Speed 6315.34 samples/sec Loss 3.3142 LearningRate 0.0001 Epoch: 31 Global Step: 643150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:40,614-Speed 6307.48 samples/sec Loss 3.2981 LearningRate 0.0001 Epoch: 31 Global Step: 643160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:43,863-Speed 6304.16 samples/sec Loss 3.2973 LearningRate 0.0001 Epoch: 31 Global Step: 643170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:47,112-Speed 6305.58 samples/sec Loss 3.2781 LearningRate 0.0001 Epoch: 31 Global Step: 643180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:50,361-Speed 6304.47 samples/sec Loss 3.2083 LearningRate 0.0001 Epoch: 31 Global Step: 643190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:53,611-Speed 6302.33 samples/sec Loss 3.2873 LearningRate 0.0001 Epoch: 31 Global Step: 643200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:14:56,867-Speed 6292.87 samples/sec Loss 3.3027 LearningRate 0.0001 Epoch: 31 Global Step: 643210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:00,119-Speed 6298.84 samples/sec Loss 3.2374 LearningRate 0.0001 Epoch: 31 Global Step: 643220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:03,371-Speed 6297.64 samples/sec Loss 3.2163 LearningRate 0.0001 Epoch: 31 Global 
Step: 643230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:06,609-Speed 6327.11 samples/sec Loss 3.2543 LearningRate 0.0001 Epoch: 31 Global Step: 643240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:09,861-Speed 6298.37 samples/sec Loss 3.2668 LearningRate 0.0001 Epoch: 31 Global Step: 643250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:13,128-Speed 6270.17 samples/sec Loss 3.2554 LearningRate 0.0001 Epoch: 31 Global Step: 643260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:16,377-Speed 6304.70 samples/sec Loss 3.2736 LearningRate 0.0001 Epoch: 31 Global Step: 643270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:19,629-Speed 6299.21 samples/sec Loss 3.2285 LearningRate 0.0001 Epoch: 31 Global Step: 643280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:22,893-Speed 6275.72 samples/sec Loss 3.3075 LearningRate 0.0001 Epoch: 31 Global Step: 643290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:26,139-Speed 6311.91 samples/sec Loss 3.2522 LearningRate 0.0001 Epoch: 31 Global Step: 643300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:29,387-Speed 6305.95 samples/sec Loss 3.2271 LearningRate 0.0001 Epoch: 31 Global Step: 643310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:32,631-Speed 6313.97 samples/sec Loss 3.1866 LearningRate 0.0001 Epoch: 31 Global Step: 643320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:35,878-Speed 6309.96 samples/sec Loss 3.2872 LearningRate 0.0001 Epoch: 31 Global Step: 643330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:39,111-Speed 6335.79 samples/sec Loss 3.2709 LearningRate 0.0001 Epoch: 31 Global Step: 643340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:42,356-Speed 6313.18 samples/sec Loss 3.2449 LearningRate 0.0001 Epoch: 31 Global Step: 643350 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:15:45,601-Speed 6314.04 samples/sec Loss 3.2318 LearningRate 0.0001 Epoch: 31 Global Step: 643360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:48,846-Speed 6312.76 samples/sec Loss 3.2837 LearningRate 0.0001 Epoch: 31 Global Step: 643370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:52,091-Speed 6312.80 samples/sec Loss 3.2836 LearningRate 0.0001 Epoch: 31 Global Step: 643380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:55,336-Speed 6313.07 samples/sec Loss 3.2611 LearningRate 0.0001 Epoch: 31 Global Step: 643390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:15:58,579-Speed 6315.99 samples/sec Loss 3.2634 LearningRate 0.0001 Epoch: 31 Global Step: 643400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:01,829-Speed 6302.99 samples/sec Loss 3.2841 LearningRate 0.0001 Epoch: 31 Global Step: 643410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:05,073-Speed 6313.64 samples/sec Loss 3.2760 LearningRate 0.0001 Epoch: 31 Global Step: 643420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:08,320-Speed 6309.71 samples/sec Loss 3.2796 LearningRate 0.0001 Epoch: 31 Global Step: 643430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:11,549-Speed 6343.42 samples/sec Loss 3.2406 LearningRate 0.0001 Epoch: 31 Global Step: 643440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:14,795-Speed 6310.54 samples/sec Loss 3.2764 LearningRate 0.0001 Epoch: 31 Global Step: 643450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:18,041-Speed 6310.61 samples/sec Loss 3.2102 LearningRate 0.0001 Epoch: 31 Global Step: 643460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:21,288-Speed 6309.56 samples/sec Loss 3.2575 LearningRate 0.0001 Epoch: 31 Global Step: 643470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:16:24,542-Speed 6295.60 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 31 Global Step: 643480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:27,786-Speed 6313.18 samples/sec Loss 3.2490 LearningRate 0.0001 Epoch: 31 Global Step: 643490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:31,034-Speed 6307.33 samples/sec Loss 3.2765 LearningRate 0.0001 Epoch: 31 Global Step: 643500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:34,276-Speed 6319.38 samples/sec Loss 3.2679 LearningRate 0.0001 Epoch: 31 Global Step: 643510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:37,524-Speed 6305.51 samples/sec Loss 3.2316 LearningRate 0.0001 Epoch: 31 Global Step: 643520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:40,767-Speed 6317.35 samples/sec Loss 3.2913 LearningRate 0.0001 Epoch: 31 Global Step: 643530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:44,012-Speed 6312.27 samples/sec Loss 3.2322 LearningRate 0.0001 Epoch: 31 Global Step: 643540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:16:47,243-Speed 6341.42 samples/sec Loss 3.2307 LearningRate 0.0001 Epoch: 31 Global Step: 643550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:50,491-Speed 6306.71 samples/sec Loss 3.3033 LearningRate 0.0001 Epoch: 31 Global Step: 643560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:53,736-Speed 6311.12 samples/sec Loss 3.2746 LearningRate 0.0001 Epoch: 31 Global Step: 643570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:16:56,985-Speed 6306.93 samples/sec Loss 3.2992 LearningRate 0.0001 Epoch: 31 Global Step: 643580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:00,234-Speed 6304.33 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 31 Global Step: 643590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:03,495-Speed 6282.38 samples/sec 
Loss 3.2819 LearningRate 0.0001 Epoch: 31 Global Step: 643600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:06,745-Speed 6303.08 samples/sec Loss 3.2588 LearningRate 0.0001 Epoch: 31 Global Step: 643610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:09,996-Speed 6301.47 samples/sec Loss 3.2738 LearningRate 0.0001 Epoch: 31 Global Step: 643620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:13,238-Speed 6316.75 samples/sec Loss 3.2828 LearningRate 0.0001 Epoch: 31 Global Step: 643630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:16,487-Speed 6305.93 samples/sec Loss 3.2815 LearningRate 0.0001 Epoch: 31 Global Step: 643640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:19,722-Speed 6332.71 samples/sec Loss 3.2806 LearningRate 0.0001 Epoch: 31 Global Step: 643650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:22,965-Speed 6316.17 samples/sec Loss 3.2529 LearningRate 0.0001 Epoch: 31 Global Step: 643660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:26,213-Speed 6306.00 samples/sec Loss 3.2517 LearningRate 0.0001 Epoch: 31 Global Step: 643670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:29,467-Speed 6295.73 samples/sec Loss 3.2635 LearningRate 0.0001 Epoch: 31 Global Step: 643680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:32,712-Speed 6312.31 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 31 Global Step: 643690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:35,959-Speed 6308.84 samples/sec Loss 3.3080 LearningRate 0.0001 Epoch: 31 Global Step: 643700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:39,206-Speed 6309.95 samples/sec Loss 3.2955 LearningRate 0.0001 Epoch: 31 Global Step: 643710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:42,453-Speed 6309.01 samples/sec Loss 3.2763 LearningRate 0.0001 Epoch: 31 
Global Step: 643720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:45,699-Speed 6309.25 samples/sec Loss 3.2631 LearningRate 0.0001 Epoch: 31 Global Step: 643730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:48,945-Speed 6311.33 samples/sec Loss 3.3551 LearningRate 0.0001 Epoch: 31 Global Step: 643740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:52,174-Speed 6344.40 samples/sec Loss 3.2993 LearningRate 0.0001 Epoch: 31 Global Step: 643750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:55,418-Speed 6315.02 samples/sec Loss 3.2877 LearningRate 0.0001 Epoch: 31 Global Step: 643760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:17:58,666-Speed 6306.11 samples/sec Loss 3.3071 LearningRate 0.0001 Epoch: 31 Global Step: 643770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:01,921-Speed 6293.19 samples/sec Loss 3.2544 LearningRate 0.0001 Epoch: 31 Global Step: 643780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:05,168-Speed 6308.37 samples/sec Loss 3.1836 LearningRate 0.0001 Epoch: 31 Global Step: 643790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:08,413-Speed 6313.60 samples/sec Loss 3.2453 LearningRate 0.0001 Epoch: 31 Global Step: 643800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:11,684-Speed 6263.60 samples/sec Loss 3.2622 LearningRate 0.0001 Epoch: 31 Global Step: 643810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:14,948-Speed 6275.65 samples/sec Loss 3.1907 LearningRate 0.0001 Epoch: 31 Global Step: 643820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:18,198-Speed 6303.06 samples/sec Loss 3.2365 LearningRate 0.0001 Epoch: 31 Global Step: 643830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:21,446-Speed 6307.33 samples/sec Loss 3.1878 LearningRate 0.0001 Epoch: 31 Global Step: 643840 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:18:24,679-Speed 6335.59 samples/sec Loss 3.2323 LearningRate 0.0001 Epoch: 31 Global Step: 643850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:27,924-Speed 6311.95 samples/sec Loss 3.2688 LearningRate 0.0001 Epoch: 31 Global Step: 643860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:31,170-Speed 6312.33 samples/sec Loss 3.3045 LearningRate 0.0001 Epoch: 31 Global Step: 643870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:34,423-Speed 6295.65 samples/sec Loss 3.2531 LearningRate 0.0001 Epoch: 31 Global Step: 643880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:37,669-Speed 6311.22 samples/sec Loss 3.3368 LearningRate 0.0001 Epoch: 31 Global Step: 643890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:40,919-Speed 6304.96 samples/sec Loss 3.2078 LearningRate 0.0001 Epoch: 31 Global Step: 643900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:44,171-Speed 6298.41 samples/sec Loss 3.3020 LearningRate 0.0001 Epoch: 31 Global Step: 643910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:47,420-Speed 6305.63 samples/sec Loss 3.2532 LearningRate 0.0001 Epoch: 31 Global Step: 643920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:50,675-Speed 6293.97 samples/sec Loss 3.3114 LearningRate 0.0001 Epoch: 31 Global Step: 643930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:53,918-Speed 6316.13 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 31 Global Step: 643940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:18:57,146-Speed 6345.73 samples/sec Loss 3.2262 LearningRate 0.0001 Epoch: 31 Global Step: 643950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:00,395-Speed 6304.98 samples/sec Loss 3.2972 LearningRate 0.0001 Epoch: 31 Global Step: 643960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:19:03,638-Speed 6316.90 samples/sec Loss 3.2222 LearningRate 0.0001 Epoch: 31 Global Step: 643970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:06,886-Speed 6306.85 samples/sec Loss 3.2760 LearningRate 0.0001 Epoch: 31 Global Step: 643980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:10,129-Speed 6315.73 samples/sec Loss 3.2412 LearningRate 0.0001 Epoch: 31 Global Step: 643990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:13,375-Speed 6310.92 samples/sec Loss 3.2763 LearningRate 0.0001 Epoch: 31 Global Step: 644000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:16,620-Speed 6312.04 samples/sec Loss 3.2402 LearningRate 0.0001 Epoch: 31 Global Step: 644010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:19,867-Speed 6309.59 samples/sec Loss 3.2495 LearningRate 0.0001 Epoch: 31 Global Step: 644020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:23,121-Speed 6296.18 samples/sec Loss 3.2824 LearningRate 0.0001 Epoch: 31 Global Step: 644030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:26,370-Speed 6304.80 samples/sec Loss 3.2762 LearningRate 0.0001 Epoch: 31 Global Step: 644040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:29,604-Speed 6334.68 samples/sec Loss 3.3119 LearningRate 0.0001 Epoch: 31 Global Step: 644050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:32,851-Speed 6307.91 samples/sec Loss 3.2718 LearningRate 0.0001 Epoch: 31 Global Step: 644060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:36,093-Speed 6321.06 samples/sec Loss 3.2899 LearningRate 0.0001 Epoch: 31 Global Step: 644070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:39,337-Speed 6314.06 samples/sec Loss 3.2235 LearningRate 0.0001 Epoch: 31 Global Step: 644080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:42,589-Speed 6299.00 samples/sec Loss 
3.2622 LearningRate 0.0001 Epoch: 31 Global Step: 644090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:45,838-Speed 6305.77 samples/sec Loss 3.3343 LearningRate 0.0001 Epoch: 31 Global Step: 644100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:49,085-Speed 6307.35 samples/sec Loss 3.2548 LearningRate 0.0001 Epoch: 31 Global Step: 644110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:52,337-Speed 6299.54 samples/sec Loss 3.2677 LearningRate 0.0001 Epoch: 31 Global Step: 644120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:19:55,568-Speed 6339.29 samples/sec Loss 3.2531 LearningRate 0.0001 Epoch: 31 Global Step: 644130 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:19:58,812-Speed 6315.36 samples/sec Loss 3.2890 LearningRate 0.0001 Epoch: 31 Global Step: 644140 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:02,082-Speed 6265.06 samples/sec Loss 3.2654 LearningRate 0.0001 Epoch: 31 Global Step: 644150 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:05,409-Speed 6156.63 samples/sec Loss 3.3147 LearningRate 0.0001 Epoch: 31 Global Step: 644160 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:08,656-Speed 6307.37 samples/sec Loss 3.2616 LearningRate 0.0001 Epoch: 31 Global Step: 644170 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:11,907-Speed 6302.63 samples/sec Loss 3.2157 LearningRate 0.0001 Epoch: 31 Global Step: 644180 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:15,149-Speed 6316.95 samples/sec Loss 3.2315 LearningRate 0.0001 Epoch: 31 Global Step: 644190 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:18,393-Speed 6316.18 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 31 Global Step: 644200 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:21,635-Speed 6317.78 samples/sec Loss 3.2748 LearningRate 0.0001 Epoch: 31 Global 
Step: 644210 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:24,884-Speed 6305.03 samples/sec Loss 3.2730 LearningRate 0.0001 Epoch: 31 Global Step: 644220 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:28,134-Speed 6302.89 samples/sec Loss 3.2839 LearningRate 0.0001 Epoch: 31 Global Step: 644230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:20:31,381-Speed 6309.40 samples/sec Loss 3.1712 LearningRate 0.0001 Epoch: 31 Global Step: 644240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:20:34,627-Speed 6309.96 samples/sec Loss 3.2346 LearningRate 0.0001 Epoch: 31 Global Step: 644250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:20:37,858-Speed 6341.81 samples/sec Loss 3.2219 LearningRate 0.0001 Epoch: 31 Global Step: 644260 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:41,143-Speed 6235.50 samples/sec Loss 3.2644 LearningRate 0.0001 Epoch: 31 Global Step: 644270 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:44,402-Speed 6285.31 samples/sec Loss 3.3296 LearningRate 0.0001 Epoch: 31 Global Step: 644280 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:47,654-Speed 6300.15 samples/sec Loss 3.2871 LearningRate 0.0001 Epoch: 31 Global Step: 644290 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:50,922-Speed 6267.55 samples/sec Loss 3.2843 LearningRate 0.0001 Epoch: 31 Global Step: 644300 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:54,221-Speed 6209.15 samples/sec Loss 3.2241 LearningRate 0.0001 Epoch: 31 Global Step: 644310 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:20:57,466-Speed 6313.26 samples/sec Loss 3.2876 LearningRate 0.0001 Epoch: 31 Global Step: 644320 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:21:00,716-Speed 6302.22 samples/sec Loss 3.2499 LearningRate 0.0001 Epoch: 31 Global Step: 644330 Fp16 Grad Scale: 4096 
Required: 17 hours Training: 2022-04-03 02:21:03,963-Speed 6308.58 samples/sec Loss 3.3030 LearningRate 0.0001 Epoch: 31 Global Step: 644340 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:21:07,216-Speed 6298.63 samples/sec Loss 3.2786 LearningRate 0.0001 Epoch: 31 Global Step: 644350 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:21:10,462-Speed 6310.31 samples/sec Loss 3.2026 LearningRate 0.0001 Epoch: 31 Global Step: 644360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:13,710-Speed 6306.37 samples/sec Loss 3.2400 LearningRate 0.0001 Epoch: 31 Global Step: 644370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:16,954-Speed 6314.29 samples/sec Loss 3.2689 LearningRate 0.0001 Epoch: 31 Global Step: 644380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:20,202-Speed 6307.56 samples/sec Loss 3.3112 LearningRate 0.0001 Epoch: 31 Global Step: 644390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:23,454-Speed 6299.57 samples/sec Loss 3.2336 LearningRate 0.0001 Epoch: 31 Global Step: 644400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:26,702-Speed 6306.76 samples/sec Loss 3.2923 LearningRate 0.0001 Epoch: 31 Global Step: 644410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:29,946-Speed 6313.27 samples/sec Loss 3.2641 LearningRate 0.0001 Epoch: 31 Global Step: 644420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:33,209-Speed 6277.97 samples/sec Loss 3.3041 LearningRate 0.0001 Epoch: 31 Global Step: 644430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:36,459-Speed 6304.43 samples/sec Loss 3.2782 LearningRate 0.0001 Epoch: 31 Global Step: 644440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:39,710-Speed 6299.61 samples/sec Loss 3.3123 LearningRate 0.0001 Epoch: 31 Global Step: 644450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:21:42,951-Speed 6320.11 samples/sec Loss 3.2621 LearningRate 0.0001 Epoch: 31 Global Step: 644460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:46,205-Speed 6296.13 samples/sec Loss 3.2474 LearningRate 0.0001 Epoch: 31 Global Step: 644470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:49,457-Speed 6300.69 samples/sec Loss 3.3048 LearningRate 0.0001 Epoch: 31 Global Step: 644480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:52,706-Speed 6304.58 samples/sec Loss 3.3170 LearningRate 0.0001 Epoch: 31 Global Step: 644490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:55,952-Speed 6310.00 samples/sec Loss 3.2125 LearningRate 0.0001 Epoch: 31 Global Step: 644500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:21:59,200-Speed 6307.81 samples/sec Loss 3.2314 LearningRate 0.0001 Epoch: 31 Global Step: 644510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:02,451-Speed 6301.79 samples/sec Loss 3.2007 LearningRate 0.0001 Epoch: 31 Global Step: 644520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:05,694-Speed 6315.35 samples/sec Loss 3.2583 LearningRate 0.0001 Epoch: 31 Global Step: 644530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:08,940-Speed 6309.90 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 31 Global Step: 644540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:12,184-Speed 6314.71 samples/sec Loss 3.2516 LearningRate 0.0001 Epoch: 31 Global Step: 644550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:15,433-Speed 6304.89 samples/sec Loss 3.2414 LearningRate 0.0001 Epoch: 31 Global Step: 644560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:22:18,676-Speed 6317.45 samples/sec Loss 3.2849 LearningRate 0.0001 Epoch: 31 Global Step: 644570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:21,920-Speed 6314.19 samples/sec 
Loss 3.2347 LearningRate 0.0001 Epoch: 31 Global Step: 644580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:25,169-Speed 6305.47 samples/sec Loss 3.2811 LearningRate 0.0001 Epoch: 31 Global Step: 644590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:28,412-Speed 6315.88 samples/sec Loss 3.2883 LearningRate 0.0001 Epoch: 31 Global Step: 644600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:31,660-Speed 6307.22 samples/sec Loss 3.2319 LearningRate 0.0001 Epoch: 31 Global Step: 644610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:34,908-Speed 6306.21 samples/sec Loss 3.2813 LearningRate 0.0001 Epoch: 31 Global Step: 644620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:38,154-Speed 6310.35 samples/sec Loss 3.2142 LearningRate 0.0001 Epoch: 31 Global Step: 644630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:41,404-Speed 6304.66 samples/sec Loss 3.3046 LearningRate 0.0001 Epoch: 31 Global Step: 644640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:44,653-Speed 6303.58 samples/sec Loss 3.2431 LearningRate 0.0001 Epoch: 31 Global Step: 644650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:47,899-Speed 6310.05 samples/sec Loss 3.2973 LearningRate 0.0001 Epoch: 31 Global Step: 644660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:51,133-Speed 6334.69 samples/sec Loss 3.2366 LearningRate 0.0001 Epoch: 31 Global Step: 644670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:54,389-Speed 6292.16 samples/sec Loss 3.2572 LearningRate 0.0001 Epoch: 31 Global Step: 644680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:22:57,640-Speed 6299.95 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 31 Global Step: 644690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:00,886-Speed 6311.93 samples/sec Loss 3.2663 LearningRate 0.0001 Epoch: 31 
Global Step: 644700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:04,136-Speed 6302.60 samples/sec Loss 3.2577 LearningRate 0.0001 Epoch: 31 Global Step: 644710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:07,383-Speed 6309.84 samples/sec Loss 3.2575 LearningRate 0.0001 Epoch: 31 Global Step: 644720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:10,626-Speed 6316.13 samples/sec Loss 3.3271 LearningRate 0.0001 Epoch: 31 Global Step: 644730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:13,873-Speed 6308.20 samples/sec Loss 3.2347 LearningRate 0.0001 Epoch: 31 Global Step: 644740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:17,126-Speed 6296.50 samples/sec Loss 3.2170 LearningRate 0.0001 Epoch: 31 Global Step: 644750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:20,376-Speed 6303.74 samples/sec Loss 3.3448 LearningRate 0.0001 Epoch: 31 Global Step: 644760 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:23,609-Speed 6335.11 samples/sec Loss 3.2847 LearningRate 0.0001 Epoch: 31 Global Step: 644770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:26,855-Speed 6312.47 samples/sec Loss 3.2621 LearningRate 0.0001 Epoch: 31 Global Step: 644780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:30,101-Speed 6309.73 samples/sec Loss 3.2358 LearningRate 0.0001 Epoch: 31 Global Step: 644790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:33,345-Speed 6314.59 samples/sec Loss 3.2323 LearningRate 0.0001 Epoch: 31 Global Step: 644800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:36,591-Speed 6311.68 samples/sec Loss 3.3001 LearningRate 0.0001 Epoch: 31 Global Step: 644810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:39,837-Speed 6310.30 samples/sec Loss 3.2007 LearningRate 0.0001 Epoch: 31 Global Step: 644820 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:23:43,087-Speed 6302.43 samples/sec Loss 3.2737 LearningRate 0.0001 Epoch: 31 Global Step: 644830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:46,343-Speed 6290.92 samples/sec Loss 3.2777 LearningRate 0.0001 Epoch: 31 Global Step: 644840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:49,597-Speed 6295.04 samples/sec Loss 3.2160 LearningRate 0.0001 Epoch: 31 Global Step: 644850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:52,844-Speed 6309.88 samples/sec Loss 3.2552 LearningRate 0.0001 Epoch: 31 Global Step: 644860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:56,076-Speed 6338.05 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 31 Global Step: 644870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:23:59,322-Speed 6310.12 samples/sec Loss 3.2430 LearningRate 0.0001 Epoch: 31 Global Step: 644880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:02,572-Speed 6303.45 samples/sec Loss 3.2369 LearningRate 0.0001 Epoch: 31 Global Step: 644890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:05,824-Speed 6299.57 samples/sec Loss 3.3014 LearningRate 0.0001 Epoch: 31 Global Step: 644900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:09,071-Speed 6307.87 samples/sec Loss 3.2679 LearningRate 0.0001 Epoch: 31 Global Step: 644910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:12,316-Speed 6313.42 samples/sec Loss 3.2729 LearningRate 0.0001 Epoch: 31 Global Step: 644920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:15,568-Speed 6298.89 samples/sec Loss 3.2904 LearningRate 0.0001 Epoch: 31 Global Step: 644930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:18,818-Speed 6303.13 samples/sec Loss 3.2326 LearningRate 0.0001 Epoch: 31 Global Step: 644940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:24:22,072-Speed 6295.69 samples/sec Loss 3.3078 LearningRate 0.0001 Epoch: 31 Global Step: 644950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:25,325-Speed 6296.66 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 31 Global Step: 644960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:28,593-Speed 6268.10 samples/sec Loss 3.1999 LearningRate 0.0001 Epoch: 31 Global Step: 644970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:31,841-Speed 6307.15 samples/sec Loss 3.2484 LearningRate 0.0001 Epoch: 31 Global Step: 644980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:35,087-Speed 6311.59 samples/sec Loss 3.2164 LearningRate 0.0001 Epoch: 31 Global Step: 644990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:38,331-Speed 6313.69 samples/sec Loss 3.3272 LearningRate 0.0001 Epoch: 31 Global Step: 645000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:41,576-Speed 6312.60 samples/sec Loss 3.2905 LearningRate 0.0001 Epoch: 31 Global Step: 645010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:44,838-Speed 6279.68 samples/sec Loss 3.3150 LearningRate 0.0001 Epoch: 31 Global Step: 645020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:48,098-Speed 6285.80 samples/sec Loss 3.3130 LearningRate 0.0001 Epoch: 31 Global Step: 645030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:51,343-Speed 6311.07 samples/sec Loss 3.2783 LearningRate 0.0001 Epoch: 31 Global Step: 645040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:54,590-Speed 6308.64 samples/sec Loss 3.2366 LearningRate 0.0001 Epoch: 31 Global Step: 645050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:24:57,839-Speed 6306.95 samples/sec Loss 3.2825 LearningRate 0.0001 Epoch: 31 Global Step: 645060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:01,183-Speed 6124.35 samples/sec Loss 
3.2624 LearningRate 0.0001 Epoch: 31 Global Step: 645070 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:25:04,426-Speed 6316.75 samples/sec Loss 3.2923 LearningRate 0.0001 Epoch: 31 Global Step: 645080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:07,673-Speed 6309.86 samples/sec Loss 3.2834 LearningRate 0.0001 Epoch: 31 Global Step: 645090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:10,920-Speed 6309.09 samples/sec Loss 3.2962 LearningRate 0.0001 Epoch: 31 Global Step: 645100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:14,166-Speed 6309.52 samples/sec Loss 3.2943 LearningRate 0.0001 Epoch: 31 Global Step: 645110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:17,416-Speed 6303.26 samples/sec Loss 3.2624 LearningRate 0.0001 Epoch: 31 Global Step: 645120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:20,669-Speed 6298.41 samples/sec Loss 3.2113 LearningRate 0.0001 Epoch: 31 Global Step: 645130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:23,914-Speed 6312.16 samples/sec Loss 3.2413 LearningRate 0.0001 Epoch: 31 Global Step: 645140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:27,162-Speed 6307.20 samples/sec Loss 3.2372 LearningRate 0.0001 Epoch: 31 Global Step: 645150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:30,410-Speed 6306.58 samples/sec Loss 3.2402 LearningRate 0.0001 Epoch: 31 Global Step: 645160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:33,658-Speed 6305.97 samples/sec Loss 3.2707 LearningRate 0.0001 Epoch: 31 Global Step: 645170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:36,889-Speed 6341.46 samples/sec Loss 3.2787 LearningRate 0.0001 Epoch: 31 Global Step: 645180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:40,136-Speed 6308.33 samples/sec Loss 3.2579 LearningRate 0.0001 Epoch: 31 
Global Step: 645190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:43,379-Speed 6317.19 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 31 Global Step: 645200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:46,646-Speed 6270.40 samples/sec Loss 3.2464 LearningRate 0.0001 Epoch: 31 Global Step: 645210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:49,890-Speed 6313.86 samples/sec Loss 3.2965 LearningRate 0.0001 Epoch: 31 Global Step: 645220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:53,145-Speed 6294.51 samples/sec Loss 3.2719 LearningRate 0.0001 Epoch: 31 Global Step: 645230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:56,396-Speed 6300.65 samples/sec Loss 3.2152 LearningRate 0.0001 Epoch: 31 Global Step: 645240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:25:59,642-Speed 6309.31 samples/sec Loss 3.2957 LearningRate 0.0001 Epoch: 31 Global Step: 645250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:02,887-Speed 6313.16 samples/sec Loss 3.2688 LearningRate 0.0001 Epoch: 31 Global Step: 645260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:06,129-Speed 6318.54 samples/sec Loss 3.2097 LearningRate 0.0001 Epoch: 31 Global Step: 645270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:09,368-Speed 6323.61 samples/sec Loss 3.2659 LearningRate 0.0001 Epoch: 31 Global Step: 645280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:12,612-Speed 6316.34 samples/sec Loss 3.2490 LearningRate 0.0001 Epoch: 31 Global Step: 645290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:15,874-Speed 6279.83 samples/sec Loss 3.2160 LearningRate 0.0001 Epoch: 31 Global Step: 645300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:19,120-Speed 6310.04 samples/sec Loss 3.2143 LearningRate 0.0001 Epoch: 31 Global Step: 645310 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:26:22,366-Speed 6310.71 samples/sec Loss 3.2190 LearningRate 0.0001 Epoch: 31 Global Step: 645320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:25,611-Speed 6313.02 samples/sec Loss 3.2070 LearningRate 0.0001 Epoch: 31 Global Step: 645330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:28,859-Speed 6306.86 samples/sec Loss 3.2088 LearningRate 0.0001 Epoch: 31 Global Step: 645340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:32,111-Speed 6298.95 samples/sec Loss 3.2648 LearningRate 0.0001 Epoch: 31 Global Step: 645350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:35,360-Speed 6305.01 samples/sec Loss 3.2426 LearningRate 0.0001 Epoch: 31 Global Step: 645360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:38,601-Speed 6320.24 samples/sec Loss 3.2298 LearningRate 0.0001 Epoch: 31 Global Step: 645370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:41,835-Speed 6334.91 samples/sec Loss 3.2991 LearningRate 0.0001 Epoch: 31 Global Step: 645380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:45,086-Speed 6300.82 samples/sec Loss 3.3093 LearningRate 0.0001 Epoch: 31 Global Step: 645390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:48,333-Speed 6312.89 samples/sec Loss 3.2279 LearningRate 0.0001 Epoch: 31 Global Step: 645400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:51,581-Speed 6307.59 samples/sec Loss 3.2511 LearningRate 0.0001 Epoch: 31 Global Step: 645410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:54,825-Speed 6314.47 samples/sec Loss 3.2329 LearningRate 0.0001 Epoch: 31 Global Step: 645420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:26:58,071-Speed 6309.94 samples/sec Loss 3.2605 LearningRate 0.0001 Epoch: 31 Global Step: 645430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:27:01,320-Speed 6305.07 samples/sec Loss 3.2844 LearningRate 0.0001 Epoch: 31 Global Step: 645440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:04,574-Speed 6294.56 samples/sec Loss 3.2821 LearningRate 0.0001 Epoch: 31 Global Step: 645450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:07,823-Speed 6305.80 samples/sec Loss 3.2103 LearningRate 0.0001 Epoch: 31 Global Step: 645460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:11,070-Speed 6307.91 samples/sec Loss 3.2445 LearningRate 0.0001 Epoch: 31 Global Step: 645470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:14,302-Speed 6338.72 samples/sec Loss 3.2598 LearningRate 0.0001 Epoch: 31 Global Step: 645480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:17,548-Speed 6311.51 samples/sec Loss 3.2236 LearningRate 0.0001 Epoch: 31 Global Step: 645490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:20,798-Speed 6301.66 samples/sec Loss 3.2427 LearningRate 0.0001 Epoch: 31 Global Step: 645500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:24,046-Speed 6306.70 samples/sec Loss 3.2276 LearningRate 0.0001 Epoch: 31 Global Step: 645510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:27,294-Speed 6308.71 samples/sec Loss 3.2535 LearningRate 0.0001 Epoch: 31 Global Step: 645520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:30,541-Speed 6308.16 samples/sec Loss 3.2239 LearningRate 0.0001 Epoch: 31 Global Step: 645530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:33,789-Speed 6306.57 samples/sec Loss 3.3308 LearningRate 0.0001 Epoch: 31 Global Step: 645540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:37,041-Speed 6299.13 samples/sec Loss 3.2510 LearningRate 0.0001 Epoch: 31 Global Step: 645550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:40,291-Speed 6302.39 samples/sec Loss 
3.2445 LearningRate 0.0001 Epoch: 31 Global Step: 645560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:43,537-Speed 6311.75 samples/sec Loss 3.2511 LearningRate 0.0001 Epoch: 31 Global Step: 645570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:46,789-Speed 6300.06 samples/sec Loss 3.3171 LearningRate 0.0001 Epoch: 31 Global Step: 645580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:27:50,024-Speed 6331.71 samples/sec Loss 3.2537 LearningRate 0.0001 Epoch: 31 Global Step: 645590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:53,274-Speed 6302.92 samples/sec Loss 3.2677 LearningRate 0.0001 Epoch: 31 Global Step: 645600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:56,517-Speed 6315.87 samples/sec Loss 3.2150 LearningRate 0.0001 Epoch: 31 Global Step: 645610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:27:59,773-Speed 6292.68 samples/sec Loss 3.2500 LearningRate 0.0001 Epoch: 31 Global Step: 645620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:03,024-Speed 6300.61 samples/sec Loss 3.2229 LearningRate 0.0001 Epoch: 31 Global Step: 645630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:06,271-Speed 6307.89 samples/sec Loss 3.2589 LearningRate 0.0001 Epoch: 31 Global Step: 645640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:09,521-Speed 6304.41 samples/sec Loss 3.2409 LearningRate 0.0001 Epoch: 31 Global Step: 645650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:12,765-Speed 6313.20 samples/sec Loss 3.2443 LearningRate 0.0001 Epoch: 31 Global Step: 645660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:16,012-Speed 6309.22 samples/sec Loss 3.2963 LearningRate 0.0001 Epoch: 31 Global Step: 645670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:19,262-Speed 6302.29 samples/sec Loss 3.1976 LearningRate 0.0001 Epoch: 31 
Global Step: 645680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:22,502-Speed 6324.18 samples/sec Loss 3.2552 LearningRate 0.0001 Epoch: 31 Global Step: 645690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:25,754-Speed 6297.76 samples/sec Loss 3.3024 LearningRate 0.0001 Epoch: 31 Global Step: 645700 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:28:28,986-Speed 6338.44 samples/sec Loss 3.2527 LearningRate 0.0001 Epoch: 31 Global Step: 645710 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:32,236-Speed 6302.72 samples/sec Loss 3.2594 LearningRate 0.0001 Epoch: 31 Global Step: 645720 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:35,481-Speed 6313.25 samples/sec Loss 3.2549 LearningRate 0.0001 Epoch: 31 Global Step: 645730 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:38,723-Speed 6318.65 samples/sec Loss 3.2368 LearningRate 0.0001 Epoch: 31 Global Step: 645740 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:41,969-Speed 6310.46 samples/sec Loss 3.2976 LearningRate 0.0001 Epoch: 31 Global Step: 645750 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:45,212-Speed 6317.20 samples/sec Loss 3.2903 LearningRate 0.0001 Epoch: 31 Global Step: 645760 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:48,456-Speed 6314.55 samples/sec Loss 3.2666 LearningRate 0.0001 Epoch: 31 Global Step: 645770 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:51,701-Speed 6311.06 samples/sec Loss 3.2638 LearningRate 0.0001 Epoch: 31 Global Step: 645780 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:54,950-Speed 6305.24 samples/sec Loss 3.2847 LearningRate 0.0001 Epoch: 31 Global Step: 645790 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:28:58,192-Speed 6319.27 samples/sec Loss 3.3008 LearningRate 0.0001 Epoch: 31 Global Step: 645800 Fp16 Grad Scale: 4096 
Required: 17 hours Training: 2022-04-03 02:29:01,440-Speed 6307.19 samples/sec Loss 3.2311 LearningRate 0.0001 Epoch: 31 Global Step: 645810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:04,691-Speed 6301.30 samples/sec Loss 3.2123 LearningRate 0.0001 Epoch: 31 Global Step: 645820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:07,936-Speed 6313.34 samples/sec Loss 3.2498 LearningRate 0.0001 Epoch: 31 Global Step: 645830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:11,182-Speed 6311.13 samples/sec Loss 3.2729 LearningRate 0.0001 Epoch: 31 Global Step: 645840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:14,429-Speed 6308.32 samples/sec Loss 3.3185 LearningRate 0.0001 Epoch: 31 Global Step: 645850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:17,674-Speed 6313.10 samples/sec Loss 3.2444 LearningRate 0.0001 Epoch: 31 Global Step: 645860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:20,918-Speed 6314.42 samples/sec Loss 3.1905 LearningRate 0.0001 Epoch: 31 Global Step: 645870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:24,167-Speed 6304.91 samples/sec Loss 3.1944 LearningRate 0.0001 Epoch: 31 Global Step: 645880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:27,412-Speed 6312.78 samples/sec Loss 3.3326 LearningRate 0.0001 Epoch: 31 Global Step: 645890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:30,656-Speed 6313.34 samples/sec Loss 3.2263 LearningRate 0.0001 Epoch: 31 Global Step: 645900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:33,891-Speed 6332.87 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 31 Global Step: 645910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:37,134-Speed 6316.42 samples/sec Loss 3.2232 LearningRate 0.0001 Epoch: 31 Global Step: 645920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:29:40,385-Speed 6301.10 samples/sec Loss 3.2713 LearningRate 0.0001 Epoch: 31 Global Step: 645930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:43,634-Speed 6304.90 samples/sec Loss 3.2332 LearningRate 0.0001 Epoch: 31 Global Step: 645940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:46,881-Speed 6308.60 samples/sec Loss 3.2698 LearningRate 0.0001 Epoch: 31 Global Step: 645950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:50,127-Speed 6310.84 samples/sec Loss 3.3034 LearningRate 0.0001 Epoch: 31 Global Step: 645960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:53,373-Speed 6311.62 samples/sec Loss 3.2372 LearningRate 0.0001 Epoch: 31 Global Step: 645970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:56,619-Speed 6310.46 samples/sec Loss 3.2621 LearningRate 0.0001 Epoch: 31 Global Step: 645980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:29:59,862-Speed 6316.19 samples/sec Loss 3.2539 LearningRate 0.0001 Epoch: 31 Global Step: 645990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:03,112-Speed 6302.69 samples/sec Loss 3.2834 LearningRate 0.0001 Epoch: 31 Global Step: 646000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:06,345-Speed 6335.49 samples/sec Loss 3.1830 LearningRate 0.0001 Epoch: 31 Global Step: 646010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:09,594-Speed 6304.63 samples/sec Loss 3.2774 LearningRate 0.0001 Epoch: 31 Global Step: 646020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:12,840-Speed 6312.19 samples/sec Loss 3.2838 LearningRate 0.0001 Epoch: 31 Global Step: 646030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:16,083-Speed 6316.87 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 31 Global Step: 646040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:19,332-Speed 6306.17 samples/sec Loss 
3.2960 LearningRate 0.0001 Epoch: 31 Global Step: 646050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:22,577-Speed 6312.20 samples/sec Loss 3.2400 LearningRate 0.0001 Epoch: 31 Global Step: 646060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:25,824-Speed 6308.25 samples/sec Loss 3.2360 LearningRate 0.0001 Epoch: 31 Global Step: 646070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:29,074-Speed 6304.04 samples/sec Loss 3.3092 LearningRate 0.0001 Epoch: 31 Global Step: 646080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:32,322-Speed 6306.09 samples/sec Loss 3.2317 LearningRate 0.0001 Epoch: 31 Global Step: 646090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:35,568-Speed 6311.00 samples/sec Loss 3.2120 LearningRate 0.0001 Epoch: 31 Global Step: 646100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:38,813-Speed 6312.55 samples/sec Loss 3.2461 LearningRate 0.0001 Epoch: 31 Global Step: 646110 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:30:42,045-Speed 6337.32 samples/sec Loss 3.2292 LearningRate 0.0001 Epoch: 31 Global Step: 646120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:45,296-Speed 6301.69 samples/sec Loss 3.2359 LearningRate 0.0001 Epoch: 31 Global Step: 646130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:48,548-Speed 6299.27 samples/sec Loss 3.2399 LearningRate 0.0001 Epoch: 31 Global Step: 646140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:51,800-Speed 6299.28 samples/sec Loss 3.2083 LearningRate 0.0001 Epoch: 31 Global Step: 646150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:55,044-Speed 6314.30 samples/sec Loss 3.2814 LearningRate 0.0001 Epoch: 31 Global Step: 646160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:30:58,290-Speed 6309.91 samples/sec Loss 3.2720 LearningRate 0.0001 Epoch: 31 
Global Step: 646170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:01,540-Speed 6303.59 samples/sec Loss 3.2536 LearningRate 0.0001 Epoch: 31 Global Step: 646180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:04,793-Speed 6296.98 samples/sec Loss 3.2848 LearningRate 0.0001 Epoch: 31 Global Step: 646190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:08,040-Speed 6307.64 samples/sec Loss 3.2495 LearningRate 0.0001 Epoch: 31 Global Step: 646200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:11,287-Speed 6309.27 samples/sec Loss 3.2767 LearningRate 0.0001 Epoch: 31 Global Step: 646210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:14,520-Speed 6335.78 samples/sec Loss 3.2361 LearningRate 0.0001 Epoch: 31 Global Step: 646220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:17,765-Speed 6313.87 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 31 Global Step: 646230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:21,012-Speed 6308.58 samples/sec Loss 3.2950 LearningRate 0.0001 Epoch: 31 Global Step: 646240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:24,260-Speed 6306.15 samples/sec Loss 3.2527 LearningRate 0.0001 Epoch: 31 Global Step: 646250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:27,507-Speed 6309.65 samples/sec Loss 3.2649 LearningRate 0.0001 Epoch: 31 Global Step: 646260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:30,753-Speed 6311.10 samples/sec Loss 3.2658 LearningRate 0.0001 Epoch: 31 Global Step: 646270 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:34,002-Speed 6305.66 samples/sec Loss 3.2350 LearningRate 0.0001 Epoch: 31 Global Step: 646280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:37,246-Speed 6314.85 samples/sec Loss 3.1982 LearningRate 0.0001 Epoch: 31 Global Step: 646290 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:31:40,486-Speed 6321.33 samples/sec Loss 3.2923 LearningRate 0.0001 Epoch: 31 Global Step: 646300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:43,733-Speed 6309.97 samples/sec Loss 3.2041 LearningRate 0.0001 Epoch: 31 Global Step: 646310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:46,965-Speed 6337.97 samples/sec Loss 3.2333 LearningRate 0.0001 Epoch: 31 Global Step: 646320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:50,212-Speed 6308.12 samples/sec Loss 3.1773 LearningRate 0.0001 Epoch: 31 Global Step: 646330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:53,464-Speed 6298.29 samples/sec Loss 3.2572 LearningRate 0.0001 Epoch: 31 Global Step: 646340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:56,706-Speed 6318.45 samples/sec Loss 3.2473 LearningRate 0.0001 Epoch: 31 Global Step: 646350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:31:59,955-Speed 6304.76 samples/sec Loss 3.1900 LearningRate 0.0001 Epoch: 31 Global Step: 646360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:03,209-Speed 6296.05 samples/sec Loss 3.1790 LearningRate 0.0001 Epoch: 31 Global Step: 646370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:06,458-Speed 6305.03 samples/sec Loss 3.2383 LearningRate 0.0001 Epoch: 31 Global Step: 646380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:09,713-Speed 6292.46 samples/sec Loss 3.2519 LearningRate 0.0001 Epoch: 31 Global Step: 646390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:12,961-Speed 6306.79 samples/sec Loss 3.2724 LearningRate 0.0001 Epoch: 31 Global Step: 646400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:16,215-Speed 6294.87 samples/sec Loss 3.2449 LearningRate 0.0001 Epoch: 31 Global Step: 646410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:32:19,466-Speed 6301.34 samples/sec Loss 3.2034 LearningRate 0.0001 Epoch: 31 Global Step: 646420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:22,709-Speed 6316.90 samples/sec Loss 3.2623 LearningRate 0.0001 Epoch: 31 Global Step: 646430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:25,960-Speed 6301.95 samples/sec Loss 3.2697 LearningRate 0.0001 Epoch: 31 Global Step: 646440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:29,204-Speed 6313.25 samples/sec Loss 3.2097 LearningRate 0.0001 Epoch: 31 Global Step: 646450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:32,452-Speed 6308.08 samples/sec Loss 3.1695 LearningRate 0.0001 Epoch: 31 Global Step: 646460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:35,701-Speed 6305.11 samples/sec Loss 3.2969 LearningRate 0.0001 Epoch: 31 Global Step: 646470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:38,948-Speed 6308.47 samples/sec Loss 3.2514 LearningRate 0.0001 Epoch: 31 Global Step: 646480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:42,194-Speed 6310.30 samples/sec Loss 3.2137 LearningRate 0.0001 Epoch: 31 Global Step: 646490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:45,439-Speed 6313.51 samples/sec Loss 3.2760 LearningRate 0.0001 Epoch: 31 Global Step: 646500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:48,685-Speed 6311.09 samples/sec Loss 3.2204 LearningRate 0.0001 Epoch: 31 Global Step: 646510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:51,919-Speed 6334.65 samples/sec Loss 3.2282 LearningRate 0.0001 Epoch: 31 Global Step: 646520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:55,167-Speed 6305.61 samples/sec Loss 3.2819 LearningRate 0.0001 Epoch: 31 Global Step: 646530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:32:58,416-Speed 6305.59 samples/sec Loss 
3.2094 LearningRate 0.0001 Epoch: 31 Global Step: 646540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:01,660-Speed 6314.27 samples/sec Loss 3.2454 LearningRate 0.0001 Epoch: 31 Global Step: 646550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:04,910-Speed 6303.02 samples/sec Loss 3.2316 LearningRate 0.0001 Epoch: 31 Global Step: 646560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:08,154-Speed 6315.19 samples/sec Loss 3.2230 LearningRate 0.0001 Epoch: 31 Global Step: 646570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:11,401-Speed 6307.03 samples/sec Loss 3.2035 LearningRate 0.0001 Epoch: 31 Global Step: 646580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:14,651-Speed 6303.94 samples/sec Loss 3.2600 LearningRate 0.0001 Epoch: 31 Global Step: 646590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:17,896-Speed 6312.77 samples/sec Loss 3.2569 LearningRate 0.0001 Epoch: 31 Global Step: 646600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:21,145-Speed 6305.57 samples/sec Loss 3.2890 LearningRate 0.0001 Epoch: 31 Global Step: 646610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:24,391-Speed 6309.95 samples/sec Loss 3.3148 LearningRate 0.0001 Epoch: 31 Global Step: 646620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:33:27,623-Speed 6338.25 samples/sec Loss 3.2685 LearningRate 0.0001 Epoch: 31 Global Step: 646630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:30,865-Speed 6317.15 samples/sec Loss 3.3156 LearningRate 0.0001 Epoch: 31 Global Step: 646640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:34,117-Speed 6299.31 samples/sec Loss 3.2675 LearningRate 0.0001 Epoch: 31 Global Step: 646650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:37,362-Speed 6314.21 samples/sec Loss 3.2534 LearningRate 0.0001 Epoch: 31 
Global Step: 646660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:40,611-Speed 6303.28 samples/sec Loss 3.2227 LearningRate 0.0001 Epoch: 31 Global Step: 646670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:43,857-Speed 6312.56 samples/sec Loss 3.2046 LearningRate 0.0001 Epoch: 31 Global Step: 646680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:33:47,084-Speed 6346.06 samples/sec Loss 3.1881 LearningRate 0.0001 Epoch: 31 Global Step: 646690 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:33:50,333-Speed 6305.69 samples/sec Loss 3.2758 LearningRate 0.0001 Epoch: 31 Global Step: 646700 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:33:53,580-Speed 6310.40 samples/sec Loss 3.2140 LearningRate 0.0001 Epoch: 31 Global Step: 646710 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:33:56,825-Speed 6312.47 samples/sec Loss 3.2863 LearningRate 0.0001 Epoch: 31 Global Step: 646720 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:00,080-Speed 6293.16 samples/sec Loss 3.2306 LearningRate 0.0001 Epoch: 31 Global Step: 646730 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:03,332-Speed 6299.01 samples/sec Loss 3.2435 LearningRate 0.0001 Epoch: 31 Global Step: 646740 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:06,575-Speed 6315.72 samples/sec Loss 3.2485 LearningRate 0.0001 Epoch: 31 Global Step: 646750 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:09,818-Speed 6316.87 samples/sec Loss 3.2842 LearningRate 0.0001 Epoch: 31 Global Step: 646760 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:13,109-Speed 6224.47 samples/sec Loss 3.2666 LearningRate 0.0001 Epoch: 31 Global Step: 646770 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-03 02:34:16,382-Speed 6259.84 samples/sec Loss 3.2447 LearningRate 0.0001 Epoch: 31 Global Step: 646780 Fp16 Grad Scale: 4096 
Required: 17 hours Training: 2022-04-03 02:34:19,624-Speed 6317.05 samples/sec Loss 3.2199 LearningRate 0.0001 Epoch: 31 Global Step: 646790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:22,873-Speed 6306.34 samples/sec Loss 3.2566 LearningRate 0.0001 Epoch: 31 Global Step: 646800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:26,115-Speed 6317.23 samples/sec Loss 3.2394 LearningRate 0.0001 Epoch: 31 Global Step: 646810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:29,365-Speed 6304.06 samples/sec Loss 3.2341 LearningRate 0.0001 Epoch: 31 Global Step: 646820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:32,610-Speed 6313.67 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 31 Global Step: 646830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:35,856-Speed 6310.79 samples/sec Loss 3.2593 LearningRate 0.0001 Epoch: 31 Global Step: 646840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:39,099-Speed 6316.75 samples/sec Loss 3.2682 LearningRate 0.0001 Epoch: 31 Global Step: 646850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:42,345-Speed 6310.44 samples/sec Loss 3.2194 LearningRate 0.0001 Epoch: 31 Global Step: 646860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:45,634-Speed 6228.22 samples/sec Loss 3.2581 LearningRate 0.0001 Epoch: 31 Global Step: 646870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:48,910-Speed 6253.41 samples/sec Loss 3.2762 LearningRate 0.0001 Epoch: 31 Global Step: 646880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:52,141-Speed 6339.64 samples/sec Loss 3.3090 LearningRate 0.0001 Epoch: 31 Global Step: 646890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:34:55,384-Speed 6315.68 samples/sec Loss 3.2158 LearningRate 0.0001 Epoch: 31 Global Step: 646900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:34:58,628-Speed 6314.31 samples/sec Loss 3.2451 LearningRate 0.0001 Epoch: 31 Global Step: 646910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:01,877-Speed 6306.90 samples/sec Loss 3.2427 LearningRate 0.0001 Epoch: 31 Global Step: 646920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:05,122-Speed 6311.36 samples/sec Loss 3.2522 LearningRate 0.0001 Epoch: 31 Global Step: 646930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:08,370-Speed 6307.61 samples/sec Loss 3.2431 LearningRate 0.0001 Epoch: 31 Global Step: 646940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:11,620-Speed 6303.53 samples/sec Loss 3.2717 LearningRate 0.0001 Epoch: 31 Global Step: 646950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:14,866-Speed 6309.42 samples/sec Loss 3.2001 LearningRate 0.0001 Epoch: 31 Global Step: 646960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:18,116-Speed 6304.61 samples/sec Loss 3.2813 LearningRate 0.0001 Epoch: 31 Global Step: 646970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:21,362-Speed 6309.22 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 31 Global Step: 646980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:24,610-Speed 6306.45 samples/sec Loss 3.2524 LearningRate 0.0001 Epoch: 31 Global Step: 646990 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:35:27,852-Speed 6318.68 samples/sec Loss 3.2446 LearningRate 0.0001 Epoch: 31 Global Step: 647000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:31,101-Speed 6305.00 samples/sec Loss 3.2472 LearningRate 0.0001 Epoch: 31 Global Step: 647010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:34,346-Speed 6313.21 samples/sec Loss 3.2980 LearningRate 0.0001 Epoch: 31 Global Step: 647020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:37,594-Speed 6306.99 samples/sec 
Loss 3.2153 LearningRate 0.0001 Epoch: 31 Global Step: 647030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:40,842-Speed 6307.09 samples/sec Loss 3.2761 LearningRate 0.0001 Epoch: 31 Global Step: 647040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:44,094-Speed 6299.02 samples/sec Loss 3.2863 LearningRate 0.0001 Epoch: 31 Global Step: 647050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:47,344-Speed 6303.09 samples/sec Loss 3.2825 LearningRate 0.0001 Epoch: 31 Global Step: 647060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:50,592-Speed 6305.65 samples/sec Loss 3.2588 LearningRate 0.0001 Epoch: 31 Global Step: 647070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:53,837-Speed 6313.87 samples/sec Loss 3.2336 LearningRate 0.0001 Epoch: 31 Global Step: 647080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:35:57,089-Speed 6298.38 samples/sec Loss 3.1694 LearningRate 0.0001 Epoch: 31 Global Step: 647090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:00,323-Speed 6334.11 samples/sec Loss 3.3053 LearningRate 0.0001 Epoch: 31 Global Step: 647100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:03,572-Speed 6304.60 samples/sec Loss 3.2482 LearningRate 0.0001 Epoch: 31 Global Step: 647110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:06,821-Speed 6306.33 samples/sec Loss 3.2621 LearningRate 0.0001 Epoch: 31 Global Step: 647120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:10,066-Speed 6313.01 samples/sec Loss 3.2110 LearningRate 0.0001 Epoch: 31 Global Step: 647130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:13,317-Speed 6302.41 samples/sec Loss 3.1402 LearningRate 0.0001 Epoch: 31 Global Step: 647140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:16,573-Speed 6290.61 samples/sec Loss 3.2304 LearningRate 0.0001 Epoch: 31 
Global Step: 647150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:19,823-Speed 6304.21 samples/sec Loss 3.1923 LearningRate 0.0001 Epoch: 31 Global Step: 647160 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:23,068-Speed 6313.09 samples/sec Loss 3.2398 LearningRate 0.0001 Epoch: 31 Global Step: 647170 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:26,316-Speed 6305.20 samples/sec Loss 3.2313 LearningRate 0.0001 Epoch: 31 Global Step: 647180 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:29,564-Speed 6308.19 samples/sec Loss 3.2870 LearningRate 0.0001 Epoch: 31 Global Step: 647190 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:32,799-Speed 6331.43 samples/sec Loss 3.2090 LearningRate 0.0001 Epoch: 31 Global Step: 647200 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:36,045-Speed 6311.39 samples/sec Loss 3.2916 LearningRate 0.0001 Epoch: 31 Global Step: 647210 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:39,287-Speed 6318.64 samples/sec Loss 3.2580 LearningRate 0.0001 Epoch: 31 Global Step: 647220 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:42,567-Speed 6245.48 samples/sec Loss 3.2297 LearningRate 0.0001 Epoch: 31 Global Step: 647230 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:45,816-Speed 6304.15 samples/sec Loss 3.2049 LearningRate 0.0001 Epoch: 31 Global Step: 647240 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:49,068-Speed 6299.86 samples/sec Loss 3.2866 LearningRate 0.0001 Epoch: 31 Global Step: 647250 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:52,315-Speed 6308.92 samples/sec Loss 3.2380 LearningRate 0.0001 Epoch: 31 Global Step: 647260 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:36:55,560-Speed 6312.55 samples/sec Loss 3.2899 LearningRate 0.0001 Epoch: 31 Global Step: 647270 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:36:58,805-Speed 6311.20 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 31 Global Step: 647280 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:02,051-Speed 6310.92 samples/sec Loss 3.2167 LearningRate 0.0001 Epoch: 31 Global Step: 647290 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:05,285-Speed 6335.07 samples/sec Loss 3.2632 LearningRate 0.0001 Epoch: 31 Global Step: 647300 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:08,535-Speed 6303.24 samples/sec Loss 3.2350 LearningRate 0.0001 Epoch: 31 Global Step: 647310 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:11,783-Speed 6305.99 samples/sec Loss 3.2390 LearningRate 0.0001 Epoch: 31 Global Step: 647320 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:15,029-Speed 6309.85 samples/sec Loss 3.1991 LearningRate 0.0001 Epoch: 31 Global Step: 647330 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:18,279-Speed 6303.60 samples/sec Loss 3.2808 LearningRate 0.0001 Epoch: 31 Global Step: 647340 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:21,527-Speed 6307.77 samples/sec Loss 3.2873 LearningRate 0.0001 Epoch: 31 Global Step: 647350 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:24,774-Speed 6308.49 samples/sec Loss 3.2807 LearningRate 0.0001 Epoch: 31 Global Step: 647360 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:28,022-Speed 6307.61 samples/sec Loss 3.2181 LearningRate 0.0001 Epoch: 31 Global Step: 647370 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:31,269-Speed 6308.34 samples/sec Loss 3.2306 LearningRate 0.0001 Epoch: 31 Global Step: 647380 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:34,520-Speed 6301.33 samples/sec Loss 3.1780 LearningRate 0.0001 Epoch: 31 Global Step: 647390 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:37:37,754-Speed 6335.73 samples/sec Loss 3.2396 LearningRate 0.0001 Epoch: 31 Global Step: 647400 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:40,999-Speed 6312.10 samples/sec Loss 3.2318 LearningRate 0.0001 Epoch: 31 Global Step: 647410 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:44,241-Speed 6317.47 samples/sec Loss 3.2479 LearningRate 0.0001 Epoch: 31 Global Step: 647420 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:47,491-Speed 6302.88 samples/sec Loss 3.2155 LearningRate 0.0001 Epoch: 31 Global Step: 647430 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:50,740-Speed 6306.08 samples/sec Loss 3.2081 LearningRate 0.0001 Epoch: 31 Global Step: 647440 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:53,989-Speed 6305.14 samples/sec Loss 3.1872 LearningRate 0.0001 Epoch: 31 Global Step: 647450 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:37:57,235-Speed 6310.68 samples/sec Loss 3.2317 LearningRate 0.0001 Epoch: 31 Global Step: 647460 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:00,482-Speed 6308.58 samples/sec Loss 3.3022 LearningRate 0.0001 Epoch: 31 Global Step: 647470 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:03,733-Speed 6300.52 samples/sec Loss 3.2668 LearningRate 0.0001 Epoch: 31 Global Step: 647480 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:06,982-Speed 6305.51 samples/sec Loss 3.2018 LearningRate 0.0001 Epoch: 31 Global Step: 647490 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:10,216-Speed 6332.38 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 31 Global Step: 647500 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:13,460-Speed 6316.39 samples/sec Loss 3.2129 LearningRate 0.0001 Epoch: 31 Global Step: 647510 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:16,707-Speed 6307.86 samples/sec Loss 
3.2705 LearningRate 0.0001 Epoch: 31 Global Step: 647520 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:19,954-Speed 6308.22 samples/sec Loss 3.1916 LearningRate 0.0001 Epoch: 31 Global Step: 647530 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:23,198-Speed 6315.51 samples/sec Loss 3.3146 LearningRate 0.0001 Epoch: 31 Global Step: 647540 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:26,447-Speed 6305.05 samples/sec Loss 3.1254 LearningRate 0.0001 Epoch: 31 Global Step: 647550 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:29,702-Speed 6292.40 samples/sec Loss 3.3096 LearningRate 0.0001 Epoch: 31 Global Step: 647560 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:32,955-Speed 6299.00 samples/sec Loss 3.2372 LearningRate 0.0001 Epoch: 31 Global Step: 647570 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:36,200-Speed 6312.21 samples/sec Loss 3.2148 LearningRate 0.0001 Epoch: 31 Global Step: 647580 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:39,457-Speed 6289.89 samples/sec Loss 3.2525 LearningRate 0.0001 Epoch: 31 Global Step: 647590 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:42,690-Speed 6337.34 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 647600 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:45,937-Speed 6306.93 samples/sec Loss 3.2594 LearningRate 0.0001 Epoch: 31 Global Step: 647610 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:49,185-Speed 6307.54 samples/sec Loss 3.2329 LearningRate 0.0001 Epoch: 31 Global Step: 647620 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:52,429-Speed 6314.77 samples/sec Loss 3.2768 LearningRate 0.0001 Epoch: 31 Global Step: 647630 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:55,678-Speed 6305.02 samples/sec Loss 3.2452 LearningRate 0.0001 Epoch: 31 Global 
Step: 647640 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:38:58,926-Speed 6306.13 samples/sec Loss 3.2360 LearningRate 0.0001 Epoch: 31 Global Step: 647650 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:02,177-Speed 6301.72 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 31 Global Step: 647660 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:05,449-Speed 6260.84 samples/sec Loss 3.2695 LearningRate 0.0001 Epoch: 31 Global Step: 647670 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:08,722-Speed 6258.74 samples/sec Loss 3.1990 LearningRate 0.0001 Epoch: 31 Global Step: 647680 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:11,971-Speed 6305.23 samples/sec Loss 3.2602 LearningRate 0.0001 Epoch: 31 Global Step: 647690 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:15,214-Speed 6315.34 samples/sec Loss 3.2049 LearningRate 0.0001 Epoch: 31 Global Step: 647700 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-03 02:39:18,450-Speed 6331.24 samples/sec Loss 3.2464 LearningRate 0.0001 Epoch: 31 Global Step: 647710 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:21,702-Speed 6298.81 samples/sec Loss 3.2466 LearningRate 0.0001 Epoch: 31 Global Step: 647720 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:24,949-Speed 6308.36 samples/sec Loss 3.2751 LearningRate 0.0001 Epoch: 31 Global Step: 647730 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:28,197-Speed 6306.73 samples/sec Loss 3.2449 LearningRate 0.0001 Epoch: 31 Global Step: 647740 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:31,443-Speed 6310.68 samples/sec Loss 3.2699 LearningRate 0.0001 Epoch: 31 Global Step: 647750 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:34,693-Speed 6303.81 samples/sec Loss 3.2095 LearningRate 0.0001 Epoch: 31 Global Step: 647760 Fp16 Grad Scale: 8192 
Required: 17 hours Training: 2022-04-03 02:39:37,947-Speed 6295.77 samples/sec Loss 3.2436 LearningRate 0.0001 Epoch: 31 Global Step: 647770 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:41,202-Speed 6293.15 samples/sec Loss 3.2280 LearningRate 0.0001 Epoch: 31 Global Step: 647780 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:44,451-Speed 6304.71 samples/sec Loss 3.2716 LearningRate 0.0001 Epoch: 31 Global Step: 647790 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:47,709-Speed 6288.72 samples/sec Loss 3.2705 LearningRate 0.0001 Epoch: 31 Global Step: 647800 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:50,937-Speed 6344.60 samples/sec Loss 3.2313 LearningRate 0.0001 Epoch: 31 Global Step: 647810 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:54,183-Speed 6311.62 samples/sec Loss 3.1980 LearningRate 0.0001 Epoch: 31 Global Step: 647820 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:39:57,431-Speed 6306.54 samples/sec Loss 3.2305 LearningRate 0.0001 Epoch: 31 Global Step: 647830 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:00,676-Speed 6313.57 samples/sec Loss 3.2629 LearningRate 0.0001 Epoch: 31 Global Step: 647840 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:03,930-Speed 6295.01 samples/sec Loss 3.2948 LearningRate 0.0001 Epoch: 31 Global Step: 647850 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:07,173-Speed 6316.35 samples/sec Loss 3.2234 LearningRate 0.0001 Epoch: 31 Global Step: 647860 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:10,420-Speed 6308.88 samples/sec Loss 3.2659 LearningRate 0.0001 Epoch: 31 Global Step: 647870 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:13,665-Speed 6311.86 samples/sec Loss 3.2279 LearningRate 0.0001 Epoch: 31 Global Step: 647880 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 
02:40:16,911-Speed 6311.15 samples/sec Loss 3.2622 LearningRate 0.0001 Epoch: 31 Global Step: 647890 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:20,194-Speed 6239.78 samples/sec Loss 3.2539 LearningRate 0.0001 Epoch: 31 Global Step: 647900 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:23,429-Speed 6332.19 samples/sec Loss 3.2618 LearningRate 0.0001 Epoch: 31 Global Step: 647910 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:26,669-Speed 6321.22 samples/sec Loss 3.2967 LearningRate 0.0001 Epoch: 31 Global Step: 647920 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:29,916-Speed 6309.97 samples/sec Loss 3.1730 LearningRate 0.0001 Epoch: 31 Global Step: 647930 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:33,163-Speed 6308.96 samples/sec Loss 3.1913 LearningRate 0.0001 Epoch: 31 Global Step: 647940 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:36,411-Speed 6305.43 samples/sec Loss 3.2565 LearningRate 0.0001 Epoch: 31 Global Step: 647950 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:39,654-Speed 6317.59 samples/sec Loss 3.2741 LearningRate 0.0001 Epoch: 31 Global Step: 647960 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:42,900-Speed 6309.62 samples/sec Loss 3.2582 LearningRate 0.0001 Epoch: 31 Global Step: 647970 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:46,146-Speed 6311.56 samples/sec Loss 3.2132 LearningRate 0.0001 Epoch: 31 Global Step: 647980 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:49,392-Speed 6311.05 samples/sec Loss 3.2891 LearningRate 0.0001 Epoch: 31 Global Step: 647990 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:52,641-Speed 6303.95 samples/sec Loss 3.2102 LearningRate 0.0001 Epoch: 31 Global Step: 648000 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:55,879-Speed 6326.24 samples/sec Loss 
3.2649 LearningRate 0.0001 Epoch: 31 Global Step: 648010 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:40:59,125-Speed 6312.42 samples/sec Loss 3.1262 LearningRate 0.0001 Epoch: 31 Global Step: 648020 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:02,373-Speed 6306.86 samples/sec Loss 3.2061 LearningRate 0.0001 Epoch: 31 Global Step: 648030 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:05,622-Speed 6303.87 samples/sec Loss 3.2389 LearningRate 0.0001 Epoch: 31 Global Step: 648040 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:08,868-Speed 6313.26 samples/sec Loss 3.2160 LearningRate 0.0001 Epoch: 31 Global Step: 648050 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:12,111-Speed 6315.85 samples/sec Loss 3.2492 LearningRate 0.0001 Epoch: 31 Global Step: 648060 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:15,357-Speed 6310.78 samples/sec Loss 3.2724 LearningRate 0.0001 Epoch: 31 Global Step: 648070 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:18,604-Speed 6309.21 samples/sec Loss 3.2534 LearningRate 0.0001 Epoch: 31 Global Step: 648080 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:21,850-Speed 6308.79 samples/sec Loss 3.2635 LearningRate 0.0001 Epoch: 31 Global Step: 648090 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:25,096-Speed 6310.91 samples/sec Loss 3.2925 LearningRate 0.0001 Epoch: 31 Global Step: 648100 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:28,326-Speed 6342.34 samples/sec Loss 3.2483 LearningRate 0.0001 Epoch: 31 Global Step: 648110 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:31,573-Speed 6309.06 samples/sec Loss 3.2353 LearningRate 0.0001 Epoch: 31 Global Step: 648120 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:34,815-Speed 6318.41 samples/sec Loss 3.3120 LearningRate 0.0001 Epoch: 31 Global 
Step: 648130 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:38,055-Speed 6321.37 samples/sec Loss 3.2457 LearningRate 0.0001 Epoch: 31 Global Step: 648140 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:41,304-Speed 6305.70 samples/sec Loss 3.2019 LearningRate 0.0001 Epoch: 31 Global Step: 648150 Fp16 Grad Scale: 8192 Required: 17 hours Training: 2022-04-03 02:41:44,548-Speed 6315.19 samples/sec Loss 3.2438 LearningRate 0.0001 Epoch: 31 Global Step: 648160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:41:47,797-Speed 6304.44 samples/sec Loss 3.2888 LearningRate 0.0001 Epoch: 31 Global Step: 648170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:41:51,045-Speed 6307.33 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 31 Global Step: 648180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:41:54,294-Speed 6304.00 samples/sec Loss 3.2510 LearningRate 0.0001 Epoch: 31 Global Step: 648190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:41:57,540-Speed 6310.33 samples/sec Loss 3.1972 LearningRate 0.0001 Epoch: 31 Global Step: 648200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:00,771-Speed 6340.09 samples/sec Loss 3.2427 LearningRate 0.0001 Epoch: 31 Global Step: 648210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:04,022-Speed 6302.56 samples/sec Loss 3.2484 LearningRate 0.0001 Epoch: 31 Global Step: 648220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:07,277-Speed 6291.79 samples/sec Loss 3.2611 LearningRate 0.0001 Epoch: 31 Global Step: 648230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:10,524-Speed 6308.60 samples/sec Loss 3.2687 LearningRate 0.0001 Epoch: 31 Global Step: 648240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:13,770-Speed 6312.71 samples/sec Loss 3.2880 LearningRate 0.0001 Epoch: 31 Global Step: 648250 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 02:42:17,016-Speed 6309.94 samples/sec Loss 3.2116 LearningRate 0.0001 Epoch: 31 Global Step: 648260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:20,265-Speed 6305.16 samples/sec Loss 3.2465 LearningRate 0.0001 Epoch: 31 Global Step: 648270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:23,508-Speed 6316.93 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 31 Global Step: 648280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:26,756-Speed 6305.93 samples/sec Loss 3.2149 LearningRate 0.0001 Epoch: 31 Global Step: 648290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:30,005-Speed 6306.05 samples/sec Loss 3.1743 LearningRate 0.0001 Epoch: 31 Global Step: 648300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:33,238-Speed 6336.86 samples/sec Loss 3.2561 LearningRate 0.0001 Epoch: 31 Global Step: 648310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:36,498-Speed 6282.92 samples/sec Loss 3.2673 LearningRate 0.0001 Epoch: 31 Global Step: 648320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:39,860-Speed 6092.12 samples/sec Loss 3.2042 LearningRate 0.0001 Epoch: 31 Global Step: 648330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:43,106-Speed 6312.01 samples/sec Loss 3.2633 LearningRate 0.0001 Epoch: 31 Global Step: 648340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:46,354-Speed 6306.11 samples/sec Loss 3.2450 LearningRate 0.0001 Epoch: 31 Global Step: 648350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:49,605-Speed 6300.94 samples/sec Loss 3.2108 LearningRate 0.0001 Epoch: 31 Global Step: 648360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:52,849-Speed 6313.89 samples/sec Loss 3.2255 LearningRate 0.0001 Epoch: 31 Global Step: 648370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:42:56,101-Speed 6298.56 samples/sec Loss 3.2043 LearningRate 0.0001 Epoch: 31 Global Step: 648380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:42:59,351-Speed 6303.62 samples/sec Loss 3.1977 LearningRate 0.0001 Epoch: 31 Global Step: 648390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:02,597-Speed 6310.27 samples/sec Loss 3.2187 LearningRate 0.0001 Epoch: 31 Global Step: 648400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:05,828-Speed 6340.75 samples/sec Loss 3.2993 LearningRate 0.0001 Epoch: 31 Global Step: 648410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:09,076-Speed 6306.71 samples/sec Loss 3.2339 LearningRate 0.0001 Epoch: 31 Global Step: 648420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:12,315-Speed 6323.83 samples/sec Loss 3.2242 LearningRate 0.0001 Epoch: 31 Global Step: 648430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:15,560-Speed 6313.22 samples/sec Loss 3.2695 LearningRate 0.0001 Epoch: 31 Global Step: 648440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:18,806-Speed 6311.52 samples/sec Loss 3.1853 LearningRate 0.0001 Epoch: 31 Global Step: 648450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:22,048-Speed 6317.33 samples/sec Loss 3.2387 LearningRate 0.0001 Epoch: 31 Global Step: 648460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:25,297-Speed 6304.80 samples/sec Loss 3.2688 LearningRate 0.0001 Epoch: 31 Global Step: 648470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:28,546-Speed 6305.74 samples/sec Loss 3.2794 LearningRate 0.0001 Epoch: 31 Global Step: 648480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:31,804-Speed 6288.43 samples/sec Loss 3.2189 LearningRate 0.0001 Epoch: 31 Global Step: 648490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:35,050-Speed 6310.60 samples/sec Loss 
3.1993 LearningRate 0.0001 Epoch: 31 Global Step: 648500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:38,378-Speed 6155.67 samples/sec Loss 3.2517 LearningRate 0.0001 Epoch: 31 Global Step: 648510 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 02:43:41,618-Speed 6320.60 samples/sec Loss 3.2300 LearningRate 0.0001 Epoch: 31 Global Step: 648520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:44,863-Speed 6313.04 samples/sec Loss 3.2365 LearningRate 0.0001 Epoch: 31 Global Step: 648530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:48,110-Speed 6308.68 samples/sec Loss 3.2024 LearningRate 0.0001 Epoch: 31 Global Step: 648540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:51,354-Speed 6314.41 samples/sec Loss 3.2701 LearningRate 0.0001 Epoch: 31 Global Step: 648550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:54,600-Speed 6312.28 samples/sec Loss 3.2541 LearningRate 0.0001 Epoch: 31 Global Step: 648560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:43:57,848-Speed 6305.71 samples/sec Loss 3.2334 LearningRate 0.0001 Epoch: 31 Global Step: 648570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:01,101-Speed 6297.55 samples/sec Loss 3.2712 LearningRate 0.0001 Epoch: 31 Global Step: 648580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:04,346-Speed 6311.71 samples/sec Loss 3.2409 LearningRate 0.0001 Epoch: 31 Global Step: 648590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:07,590-Speed 6315.33 samples/sec Loss 3.1857 LearningRate 0.0001 Epoch: 31 Global Step: 648600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:10,843-Speed 6296.84 samples/sec Loss 3.2295 LearningRate 0.0001 Epoch: 31 Global Step: 648610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:14,072-Speed 6346.17 samples/sec Loss 3.2430 LearningRate 0.0001 Epoch: 31 
Global Step: 648620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:17,322-Speed 6303.97 samples/sec Loss 3.2457 LearningRate 0.0001 Epoch: 31 Global Step: 648630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:20,573-Speed 6301.25 samples/sec Loss 3.2250 LearningRate 0.0001 Epoch: 31 Global Step: 648640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:23,820-Speed 6308.01 samples/sec Loss 3.2136 LearningRate 0.0001 Epoch: 31 Global Step: 648650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:27,063-Speed 6316.09 samples/sec Loss 3.2073 LearningRate 0.0001 Epoch: 31 Global Step: 648660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:30,311-Speed 6306.43 samples/sec Loss 3.2251 LearningRate 0.0001 Epoch: 31 Global Step: 648670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:33,559-Speed 6308.35 samples/sec Loss 3.2154 LearningRate 0.0001 Epoch: 31 Global Step: 648680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:36,806-Speed 6307.99 samples/sec Loss 3.2451 LearningRate 0.0001 Epoch: 31 Global Step: 648690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:40,051-Speed 6312.96 samples/sec Loss 3.2261 LearningRate 0.0001 Epoch: 31 Global Step: 648700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:43,295-Speed 6314.64 samples/sec Loss 3.2220 LearningRate 0.0001 Epoch: 31 Global Step: 648710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:46,528-Speed 6337.09 samples/sec Loss 3.1990 LearningRate 0.0001 Epoch: 31 Global Step: 648720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:44:49,760-Speed 6336.93 samples/sec Loss 3.2389 LearningRate 0.0001 Epoch: 31 Global Step: 648730 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:44:53,006-Speed 6311.01 samples/sec Loss 3.1903 LearningRate 0.0001 Epoch: 31 Global Step: 648740 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 02:44:56,250-Speed 6315.64 samples/sec Loss 3.2362 LearningRate 0.0001 Epoch: 31 Global Step: 648750 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:44:59,494-Speed 6314.59 samples/sec Loss 3.2706 LearningRate 0.0001 Epoch: 31 Global Step: 648760 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:02,736-Speed 6316.59 samples/sec Loss 3.2443 LearningRate 0.0001 Epoch: 31 Global Step: 648770 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:05,982-Speed 6312.52 samples/sec Loss 3.2466 LearningRate 0.0001 Epoch: 31 Global Step: 648780 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:09,229-Speed 6307.39 samples/sec Loss 3.1549 LearningRate 0.0001 Epoch: 31 Global Step: 648790 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:12,474-Speed 6313.04 samples/sec Loss 3.2369 LearningRate 0.0001 Epoch: 31 Global Step: 648800 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:15,718-Speed 6315.85 samples/sec Loss 3.2497 LearningRate 0.0001 Epoch: 31 Global Step: 648810 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:18,965-Speed 6308.75 samples/sec Loss 3.2079 LearningRate 0.0001 Epoch: 31 Global Step: 648820 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:45:22,219-Speed 6294.06 samples/sec Loss 3.2652 LearningRate 0.0001 Epoch: 31 Global Step: 648830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:25,474-Speed 6293.50 samples/sec Loss 3.2856 LearningRate 0.0001 Epoch: 31 Global Step: 648840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:28,732-Speed 6287.61 samples/sec Loss 3.2431 LearningRate 0.0001 Epoch: 31 Global Step: 648850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:31,997-Speed 6272.79 samples/sec Loss 3.2452 LearningRate 0.0001 Epoch: 31 Global Step: 648860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:45:35,242-Speed 6314.35 samples/sec Loss 3.2496 LearningRate 0.0001 Epoch: 31 Global Step: 648870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:38,487-Speed 6311.85 samples/sec Loss 3.2331 LearningRate 0.0001 Epoch: 31 Global Step: 648880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:41,736-Speed 6304.05 samples/sec Loss 3.2437 LearningRate 0.0001 Epoch: 31 Global Step: 648890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:44,981-Speed 6314.24 samples/sec Loss 3.2318 LearningRate 0.0001 Epoch: 31 Global Step: 648900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:48,227-Speed 6310.03 samples/sec Loss 3.2268 LearningRate 0.0001 Epoch: 31 Global Step: 648910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:51,474-Speed 6309.13 samples/sec Loss 3.2653 LearningRate 0.0001 Epoch: 31 Global Step: 648920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:54,705-Speed 6340.31 samples/sec Loss 3.2276 LearningRate 0.0001 Epoch: 31 Global Step: 648930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:45:57,951-Speed 6310.79 samples/sec Loss 3.2548 LearningRate 0.0001 Epoch: 31 Global Step: 648940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:01,197-Speed 6310.88 samples/sec Loss 3.1887 LearningRate 0.0001 Epoch: 31 Global Step: 648950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:04,443-Speed 6311.68 samples/sec Loss 3.2550 LearningRate 0.0001 Epoch: 31 Global Step: 648960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:07,689-Speed 6311.05 samples/sec Loss 3.2677 LearningRate 0.0001 Epoch: 31 Global Step: 648970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:10,935-Speed 6310.07 samples/sec Loss 3.2484 LearningRate 0.0001 Epoch: 31 Global Step: 648980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:14,178-Speed 6317.53 samples/sec Loss 
3.2801 LearningRate 0.0001 Epoch: 31 Global Step: 648990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:17,420-Speed 6317.11 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 31 Global Step: 649000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:20,668-Speed 6307.66 samples/sec Loss 3.2372 LearningRate 0.0001 Epoch: 31 Global Step: 649010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:23,912-Speed 6313.19 samples/sec Loss 3.2334 LearningRate 0.0001 Epoch: 31 Global Step: 649020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:27,144-Speed 6339.48 samples/sec Loss 3.1527 LearningRate 0.0001 Epoch: 31 Global Step: 649030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:30,392-Speed 6307.12 samples/sec Loss 3.2177 LearningRate 0.0001 Epoch: 31 Global Step: 649040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:33,635-Speed 6316.51 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 649050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:37,014-Speed 6062.02 samples/sec Loss 3.1863 LearningRate 0.0001 Epoch: 31 Global Step: 649060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:40,260-Speed 6309.32 samples/sec Loss 3.2765 LearningRate 0.0001 Epoch: 31 Global Step: 649070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:43,504-Speed 6315.82 samples/sec Loss 3.2380 LearningRate 0.0001 Epoch: 31 Global Step: 649080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:46,749-Speed 6313.15 samples/sec Loss 3.2236 LearningRate 0.0001 Epoch: 31 Global Step: 649090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:49,992-Speed 6315.38 samples/sec Loss 3.2729 LearningRate 0.0001 Epoch: 31 Global Step: 649100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:53,242-Speed 6303.97 samples/sec Loss 3.2323 LearningRate 0.0001 Epoch: 31 Global 
Step: 649110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:56,488-Speed 6309.01 samples/sec Loss 3.2973 LearningRate 0.0001 Epoch: 31 Global Step: 649120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:46:59,725-Speed 6329.01 samples/sec Loss 3.2493 LearningRate 0.0001 Epoch: 31 Global Step: 649130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:02,976-Speed 6301.18 samples/sec Loss 3.3048 LearningRate 0.0001 Epoch: 31 Global Step: 649140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:06,227-Speed 6301.36 samples/sec Loss 3.2192 LearningRate 0.0001 Epoch: 31 Global Step: 649150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:09,475-Speed 6307.59 samples/sec Loss 3.2195 LearningRate 0.0001 Epoch: 31 Global Step: 649160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:12,722-Speed 6308.78 samples/sec Loss 3.2004 LearningRate 0.0001 Epoch: 31 Global Step: 649170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:15,968-Speed 6310.64 samples/sec Loss 3.2393 LearningRate 0.0001 Epoch: 31 Global Step: 649180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:19,212-Speed 6313.84 samples/sec Loss 3.2272 LearningRate 0.0001 Epoch: 31 Global Step: 649190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:22,458-Speed 6310.93 samples/sec Loss 3.2421 LearningRate 0.0001 Epoch: 31 Global Step: 649200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:25,706-Speed 6307.72 samples/sec Loss 3.2230 LearningRate 0.0001 Epoch: 31 Global Step: 649210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:28,950-Speed 6313.71 samples/sec Loss 3.2554 LearningRate 0.0001 Epoch: 31 Global Step: 649220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:32,198-Speed 6307.70 samples/sec Loss 3.2460 LearningRate 0.0001 Epoch: 31 Global Step: 649230 Fp16 Grad Scale: 16384 
Required: 16 hours Training: 2022-04-03 02:47:35,429-Speed 6339.87 samples/sec Loss 3.2436 LearningRate 0.0001 Epoch: 31 Global Step: 649240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:38,674-Speed 6311.53 samples/sec Loss 3.2198 LearningRate 0.0001 Epoch: 31 Global Step: 649250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:41,924-Speed 6303.99 samples/sec Loss 3.2378 LearningRate 0.0001 Epoch: 31 Global Step: 649260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:45,170-Speed 6310.20 samples/sec Loss 3.2995 LearningRate 0.0001 Epoch: 31 Global Step: 649270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:48,418-Speed 6306.14 samples/sec Loss 3.1934 LearningRate 0.0001 Epoch: 31 Global Step: 649280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:51,666-Speed 6307.07 samples/sec Loss 3.2280 LearningRate 0.0001 Epoch: 31 Global Step: 649290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:54,910-Speed 6314.59 samples/sec Loss 3.2219 LearningRate 0.0001 Epoch: 31 Global Step: 649300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:47:58,155-Speed 6314.06 samples/sec Loss 3.2117 LearningRate 0.0001 Epoch: 31 Global Step: 649310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:01,401-Speed 6309.26 samples/sec Loss 3.2114 LearningRate 0.0001 Epoch: 31 Global Step: 649320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:04,647-Speed 6311.26 samples/sec Loss 3.1991 LearningRate 0.0001 Epoch: 31 Global Step: 649330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:07,877-Speed 6341.88 samples/sec Loss 3.2093 LearningRate 0.0001 Epoch: 31 Global Step: 649340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:11,132-Speed 6293.76 samples/sec Loss 3.1996 LearningRate 0.0001 Epoch: 31 Global Step: 649350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:48:14,376-Speed 6315.54 samples/sec Loss 3.2377 LearningRate 0.0001 Epoch: 31 Global Step: 649360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:17,622-Speed 6311.02 samples/sec Loss 3.2476 LearningRate 0.0001 Epoch: 31 Global Step: 649370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:20,870-Speed 6306.29 samples/sec Loss 3.2597 LearningRate 0.0001 Epoch: 31 Global Step: 649380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:24,116-Speed 6311.49 samples/sec Loss 3.2060 LearningRate 0.0001 Epoch: 31 Global Step: 649390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:27,361-Speed 6312.69 samples/sec Loss 3.2137 LearningRate 0.0001 Epoch: 31 Global Step: 649400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:30,606-Speed 6312.40 samples/sec Loss 3.2273 LearningRate 0.0001 Epoch: 31 Global Step: 649410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:33,852-Speed 6311.11 samples/sec Loss 3.2284 LearningRate 0.0001 Epoch: 31 Global Step: 649420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:37,097-Speed 6310.83 samples/sec Loss 3.2774 LearningRate 0.0001 Epoch: 31 Global Step: 649430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:40,331-Speed 6334.52 samples/sec Loss 3.2210 LearningRate 0.0001 Epoch: 31 Global Step: 649440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:43,579-Speed 6307.32 samples/sec Loss 3.1899 LearningRate 0.0001 Epoch: 31 Global Step: 649450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:46,822-Speed 6317.07 samples/sec Loss 3.2012 LearningRate 0.0001 Epoch: 31 Global Step: 649460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:50,065-Speed 6315.51 samples/sec Loss 3.2604 LearningRate 0.0001 Epoch: 31 Global Step: 649470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:53,310-Speed 6311.74 samples/sec Loss 
3.2113 LearningRate 0.0001 Epoch: 31 Global Step: 649480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:56,557-Speed 6309.79 samples/sec Loss 3.2185 LearningRate 0.0001 Epoch: 31 Global Step: 649490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:48:59,805-Speed 6305.92 samples/sec Loss 3.2109 LearningRate 0.0001 Epoch: 31 Global Step: 649500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:03,049-Speed 6314.83 samples/sec Loss 3.2913 LearningRate 0.0001 Epoch: 31 Global Step: 649510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:06,290-Speed 6321.15 samples/sec Loss 3.1868 LearningRate 0.0001 Epoch: 31 Global Step: 649520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:09,533-Speed 6316.89 samples/sec Loss 3.2299 LearningRate 0.0001 Epoch: 31 Global Step: 649530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:12,762-Speed 6343.69 samples/sec Loss 3.3079 LearningRate 0.0001 Epoch: 31 Global Step: 649540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:16,006-Speed 6314.11 samples/sec Loss 3.1992 LearningRate 0.0001 Epoch: 31 Global Step: 649550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:19,250-Speed 6315.42 samples/sec Loss 3.2229 LearningRate 0.0001 Epoch: 31 Global Step: 649560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:22,499-Speed 6304.32 samples/sec Loss 3.3006 LearningRate 0.0001 Epoch: 31 Global Step: 649570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:25,748-Speed 6304.77 samples/sec Loss 3.2026 LearningRate 0.0001 Epoch: 31 Global Step: 649580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:28,998-Speed 6303.78 samples/sec Loss 3.1912 LearningRate 0.0001 Epoch: 31 Global Step: 649590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:32,238-Speed 6321.37 samples/sec Loss 3.2106 LearningRate 0.0001 Epoch: 31 Global 
Step: 649600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:35,484-Speed 6312.19 samples/sec Loss 3.2332 LearningRate 0.0001 Epoch: 31 Global Step: 649610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:38,729-Speed 6312.72 samples/sec Loss 3.2215 LearningRate 0.0001 Epoch: 31 Global Step: 649620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:41,972-Speed 6315.18 samples/sec Loss 3.2576 LearningRate 0.0001 Epoch: 31 Global Step: 649630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:45,206-Speed 6334.21 samples/sec Loss 3.2387 LearningRate 0.0001 Epoch: 31 Global Step: 649640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:48,458-Speed 6299.54 samples/sec Loss 3.2357 LearningRate 0.0001 Epoch: 31 Global Step: 649650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:51,711-Speed 6296.96 samples/sec Loss 3.2722 LearningRate 0.0001 Epoch: 31 Global Step: 649660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:54,956-Speed 6313.20 samples/sec Loss 3.2520 LearningRate 0.0001 Epoch: 31 Global Step: 649670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:49:58,202-Speed 6309.44 samples/sec Loss 3.1875 LearningRate 0.0001 Epoch: 31 Global Step: 649680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:01,450-Speed 6307.07 samples/sec Loss 3.2128 LearningRate 0.0001 Epoch: 31 Global Step: 649690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:04,691-Speed 6321.27 samples/sec Loss 3.1801 LearningRate 0.0001 Epoch: 31 Global Step: 649700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:07,937-Speed 6311.10 samples/sec Loss 3.1962 LearningRate 0.0001 Epoch: 31 Global Step: 649710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:11,184-Speed 6308.10 samples/sec Loss 3.1748 LearningRate 0.0001 Epoch: 31 Global Step: 649720 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 02:50:14,429-Speed 6311.85 samples/sec Loss 3.2743 LearningRate 0.0001 Epoch: 31 Global Step: 649730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:17,659-Speed 6341.94 samples/sec Loss 3.2124 LearningRate 0.0001 Epoch: 31 Global Step: 649740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:20,902-Speed 6316.83 samples/sec Loss 3.2440 LearningRate 0.0001 Epoch: 31 Global Step: 649750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:24,146-Speed 6314.04 samples/sec Loss 3.2349 LearningRate 0.0001 Epoch: 31 Global Step: 649760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:27,391-Speed 6313.31 samples/sec Loss 3.2803 LearningRate 0.0001 Epoch: 31 Global Step: 649770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:30,640-Speed 6305.76 samples/sec Loss 3.3007 LearningRate 0.0001 Epoch: 31 Global Step: 649780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:50:33,875-Speed 6333.05 samples/sec Loss 3.2013 LearningRate 0.0001 Epoch: 31 Global Step: 649790 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:37,120-Speed 6313.07 samples/sec Loss 3.2167 LearningRate 0.0001 Epoch: 31 Global Step: 649800 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:40,367-Speed 6307.37 samples/sec Loss 3.2388 LearningRate 0.0001 Epoch: 31 Global Step: 649810 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:43,619-Speed 6300.26 samples/sec Loss 3.2547 LearningRate 0.0001 Epoch: 31 Global Step: 649820 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:46,866-Speed 6307.37 samples/sec Loss 3.2439 LearningRate 0.0001 Epoch: 31 Global Step: 649830 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:50,116-Speed 6303.09 samples/sec Loss 3.2470 LearningRate 0.0001 Epoch: 31 Global Step: 649840 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 
02:50:53,361-Speed 6313.67 samples/sec Loss 3.2168 LearningRate 0.0001 Epoch: 31 Global Step: 649850 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:56,604-Speed 6315.78 samples/sec Loss 3.1958 LearningRate 0.0001 Epoch: 31 Global Step: 649860 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:50:59,851-Speed 6308.69 samples/sec Loss 3.1605 LearningRate 0.0001 Epoch: 31 Global Step: 649870 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:51:03,098-Speed 6309.45 samples/sec Loss 3.2464 LearningRate 0.0001 Epoch: 31 Global Step: 649880 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:51:06,341-Speed 6316.93 samples/sec Loss 3.2402 LearningRate 0.0001 Epoch: 31 Global Step: 649890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:09,590-Speed 6303.76 samples/sec Loss 3.2423 LearningRate 0.0001 Epoch: 31 Global Step: 649900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:12,835-Speed 6312.52 samples/sec Loss 3.1958 LearningRate 0.0001 Epoch: 31 Global Step: 649910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:16,079-Speed 6315.47 samples/sec Loss 3.2011 LearningRate 0.0001 Epoch: 31 Global Step: 649920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:19,326-Speed 6307.87 samples/sec Loss 3.2361 LearningRate 0.0001 Epoch: 31 Global Step: 649930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:22,575-Speed 6305.72 samples/sec Loss 3.2691 LearningRate 0.0001 Epoch: 31 Global Step: 649940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:25,825-Speed 6301.50 samples/sec Loss 3.3089 LearningRate 0.0001 Epoch: 31 Global Step: 649950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:29,071-Speed 6312.28 samples/sec Loss 3.3161 LearningRate 0.0001 Epoch: 31 Global Step: 649960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:32,316-Speed 6311.85 samples/sec Loss 
3.2087 LearningRate 0.0001 Epoch: 31 Global Step: 649970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:35,560-Speed 6315.18 samples/sec Loss 3.2476 LearningRate 0.0001 Epoch: 31 Global Step: 649980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:38,795-Speed 6332.11 samples/sec Loss 3.2711 LearningRate 0.0001 Epoch: 31 Global Step: 649990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:42,041-Speed 6311.42 samples/sec Loss 3.1589 LearningRate 0.0001 Epoch: 31 Global Step: 650000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:45,287-Speed 6311.24 samples/sec Loss 3.2250 LearningRate 0.0001 Epoch: 31 Global Step: 650010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:48,534-Speed 6309.07 samples/sec Loss 3.2850 LearningRate 0.0001 Epoch: 31 Global Step: 650020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:51,783-Speed 6305.21 samples/sec Loss 3.2046 LearningRate 0.0001 Epoch: 31 Global Step: 650030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:55,030-Speed 6308.73 samples/sec Loss 3.2055 LearningRate 0.0001 Epoch: 31 Global Step: 650040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:51:58,283-Speed 6298.10 samples/sec Loss 3.2168 LearningRate 0.0001 Epoch: 31 Global Step: 650050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:01,534-Speed 6300.91 samples/sec Loss 3.2396 LearningRate 0.0001 Epoch: 31 Global Step: 650060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:04,781-Speed 6308.50 samples/sec Loss 3.1972 LearningRate 0.0001 Epoch: 31 Global Step: 650070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:08,026-Speed 6312.12 samples/sec Loss 3.1975 LearningRate 0.0001 Epoch: 31 Global Step: 650080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:11,258-Speed 6338.89 samples/sec Loss 3.2146 LearningRate 0.0001 Epoch: 31 Global 
Step: 650090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:14,505-Speed 6308.73 samples/sec Loss 3.2493 LearningRate 0.0001 Epoch: 31 Global Step: 650100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:17,736-Speed 6338.86 samples/sec Loss 3.3072 LearningRate 0.0001 Epoch: 31 Global Step: 650110 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:20,981-Speed 6313.57 samples/sec Loss 3.2666 LearningRate 0.0001 Epoch: 31 Global Step: 650120 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:24,271-Speed 6225.52 samples/sec Loss 3.2214 LearningRate 0.0001 Epoch: 31 Global Step: 650130 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:27,518-Speed 6308.58 samples/sec Loss 3.1584 LearningRate 0.0001 Epoch: 31 Global Step: 650140 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:30,761-Speed 6317.87 samples/sec Loss 3.2390 LearningRate 0.0001 Epoch: 31 Global Step: 650150 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:34,005-Speed 6313.56 samples/sec Loss 3.2586 LearningRate 0.0001 Epoch: 31 Global Step: 650160 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:37,248-Speed 6316.66 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 31 Global Step: 650170 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:40,495-Speed 6309.96 samples/sec Loss 3.2503 LearningRate 0.0001 Epoch: 31 Global Step: 650180 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:43,739-Speed 6313.59 samples/sec Loss 3.2787 LearningRate 0.0001 Epoch: 31 Global Step: 650190 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:47,026-Speed 6232.08 samples/sec Loss 3.2031 LearningRate 0.0001 Epoch: 31 Global Step: 650200 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:52:50,276-Speed 6302.32 samples/sec Loss 3.1964 LearningRate 0.0001 Epoch: 31 Global Step: 650210 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 02:52:53,522-Speed 6311.58 samples/sec Loss 3.1891 LearningRate 0.0001 Epoch: 31 Global Step: 650220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:52:56,766-Speed 6313.84 samples/sec Loss 3.1741 LearningRate 0.0001 Epoch: 31 Global Step: 650230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:00,012-Speed 6311.31 samples/sec Loss 3.2111 LearningRate 0.0001 Epoch: 31 Global Step: 650240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:03,256-Speed 6314.78 samples/sec Loss 3.2043 LearningRate 0.0001 Epoch: 31 Global Step: 650250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:06,505-Speed 6307.59 samples/sec Loss 3.2543 LearningRate 0.0001 Epoch: 31 Global Step: 650260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:09,769-Speed 6275.39 samples/sec Loss 3.2099 LearningRate 0.0001 Epoch: 31 Global Step: 650270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:13,117-Speed 6119.79 samples/sec Loss 3.2145 LearningRate 0.0001 Epoch: 31 Global Step: 650280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:16,396-Speed 6246.20 samples/sec Loss 3.2174 LearningRate 0.0001 Epoch: 31 Global Step: 650290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:19,638-Speed 6318.99 samples/sec Loss 3.2253 LearningRate 0.0001 Epoch: 31 Global Step: 650300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:22,869-Speed 6340.24 samples/sec Loss 3.2115 LearningRate 0.0001 Epoch: 31 Global Step: 650310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:26,117-Speed 6306.83 samples/sec Loss 3.2457 LearningRate 0.0001 Epoch: 31 Global Step: 650320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:29,361-Speed 6314.65 samples/sec Loss 3.1268 LearningRate 0.0001 Epoch: 31 Global Step: 650330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:53:32,609-Speed 6307.31 samples/sec Loss 3.2599 LearningRate 0.0001 Epoch: 31 Global Step: 650340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:35,859-Speed 6302.60 samples/sec Loss 3.2740 LearningRate 0.0001 Epoch: 31 Global Step: 650350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:39,104-Speed 6311.85 samples/sec Loss 3.2238 LearningRate 0.0001 Epoch: 31 Global Step: 650360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:42,352-Speed 6306.95 samples/sec Loss 3.2482 LearningRate 0.0001 Epoch: 31 Global Step: 650370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:45,597-Speed 6313.58 samples/sec Loss 3.2286 LearningRate 0.0001 Epoch: 31 Global Step: 650380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:48,841-Speed 6314.17 samples/sec Loss 3.2337 LearningRate 0.0001 Epoch: 31 Global Step: 650390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:52,084-Speed 6316.25 samples/sec Loss 3.2113 LearningRate 0.0001 Epoch: 31 Global Step: 650400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:55,314-Speed 6343.00 samples/sec Loss 3.2525 LearningRate 0.0001 Epoch: 31 Global Step: 650410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:53:58,563-Speed 6304.24 samples/sec Loss 3.2475 LearningRate 0.0001 Epoch: 31 Global Step: 650420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:01,815-Speed 6298.52 samples/sec Loss 3.2227 LearningRate 0.0001 Epoch: 31 Global Step: 650430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:05,065-Speed 6304.22 samples/sec Loss 3.2391 LearningRate 0.0001 Epoch: 31 Global Step: 650440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:08,311-Speed 6309.71 samples/sec Loss 3.1631 LearningRate 0.0001 Epoch: 31 Global Step: 650450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:11,559-Speed 6308.50 samples/sec Loss 
3.1717 LearningRate 0.0001 Epoch: 31 Global Step: 650460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:14,813-Speed 6293.97 samples/sec Loss 3.1955 LearningRate 0.0001 Epoch: 31 Global Step: 650470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:18,062-Speed 6305.49 samples/sec Loss 3.2592 LearningRate 0.0001 Epoch: 31 Global Step: 650480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:21,308-Speed 6312.13 samples/sec Loss 3.2397 LearningRate 0.0001 Epoch: 31 Global Step: 650490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:24,553-Speed 6312.05 samples/sec Loss 3.1793 LearningRate 0.0001 Epoch: 31 Global Step: 650500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:27,783-Speed 6341.01 samples/sec Loss 3.2171 LearningRate 0.0001 Epoch: 31 Global Step: 650510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:31,031-Speed 6307.09 samples/sec Loss 3.1862 LearningRate 0.0001 Epoch: 31 Global Step: 650520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:34,276-Speed 6314.12 samples/sec Loss 3.2413 LearningRate 0.0001 Epoch: 31 Global Step: 650530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:37,525-Speed 6303.40 samples/sec Loss 3.2079 LearningRate 0.0001 Epoch: 31 Global Step: 650540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:40,774-Speed 6306.45 samples/sec Loss 3.2143 LearningRate 0.0001 Epoch: 31 Global Step: 650550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:44,017-Speed 6316.44 samples/sec Loss 3.2435 LearningRate 0.0001 Epoch: 31 Global Step: 650560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:47,260-Speed 6315.69 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 650570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:50,512-Speed 6299.82 samples/sec Loss 3.2281 LearningRate 0.0001 Epoch: 31 Global 
Step: 650580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:53,760-Speed 6306.83 samples/sec Loss 3.2542 LearningRate 0.0001 Epoch: 31 Global Step: 650590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:54:57,033-Speed 6256.96 samples/sec Loss 3.2792 LearningRate 0.0001 Epoch: 31 Global Step: 650600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:00,269-Speed 6330.31 samples/sec Loss 3.2045 LearningRate 0.0001 Epoch: 31 Global Step: 650610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:03,516-Speed 6309.70 samples/sec Loss 3.2304 LearningRate 0.0001 Epoch: 31 Global Step: 650620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:06,762-Speed 6310.98 samples/sec Loss 3.2383 LearningRate 0.0001 Epoch: 31 Global Step: 650630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:10,010-Speed 6307.18 samples/sec Loss 3.1416 LearningRate 0.0001 Epoch: 31 Global Step: 650640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:13,256-Speed 6309.82 samples/sec Loss 3.2133 LearningRate 0.0001 Epoch: 31 Global Step: 650650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:16,499-Speed 6317.58 samples/sec Loss 3.2310 LearningRate 0.0001 Epoch: 31 Global Step: 650660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:19,748-Speed 6303.92 samples/sec Loss 3.2094 LearningRate 0.0001 Epoch: 31 Global Step: 650670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:23,005-Speed 6289.95 samples/sec Loss 3.2062 LearningRate 0.0001 Epoch: 31 Global Step: 650680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:26,250-Speed 6313.23 samples/sec Loss 3.1959 LearningRate 0.0001 Epoch: 31 Global Step: 650690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:29,498-Speed 6305.61 samples/sec Loss 3.2707 LearningRate 0.0001 Epoch: 31 Global Step: 650700 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 02:55:32,730-Speed 6338.30 samples/sec Loss 3.2227 LearningRate 0.0001 Epoch: 31 Global Step: 650710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:35,978-Speed 6307.53 samples/sec Loss 3.2291 LearningRate 0.0001 Epoch: 31 Global Step: 650720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:39,234-Speed 6291.08 samples/sec Loss 3.2401 LearningRate 0.0001 Epoch: 31 Global Step: 650730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:42,480-Speed 6310.77 samples/sec Loss 3.2133 LearningRate 0.0001 Epoch: 31 Global Step: 650740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:45,734-Speed 6294.65 samples/sec Loss 3.1931 LearningRate 0.0001 Epoch: 31 Global Step: 650750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:48,981-Speed 6309.41 samples/sec Loss 3.2190 LearningRate 0.0001 Epoch: 31 Global Step: 650760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:52,229-Speed 6306.23 samples/sec Loss 3.1752 LearningRate 0.0001 Epoch: 31 Global Step: 650770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:55,474-Speed 6313.60 samples/sec Loss 3.2529 LearningRate 0.0001 Epoch: 31 Global Step: 650780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:55:58,727-Speed 6296.36 samples/sec Loss 3.2051 LearningRate 0.0001 Epoch: 31 Global Step: 650790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:01,976-Speed 6305.64 samples/sec Loss 3.2482 LearningRate 0.0001 Epoch: 31 Global Step: 650800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:05,205-Speed 6342.90 samples/sec Loss 3.2280 LearningRate 0.0001 Epoch: 31 Global Step: 650810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:08,450-Speed 6312.48 samples/sec Loss 3.2170 LearningRate 0.0001 Epoch: 31 Global Step: 650820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:56:11,694-Speed 6314.53 samples/sec Loss 3.1887 LearningRate 0.0001 Epoch: 31 Global Step: 650830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:14,941-Speed 6309.94 samples/sec Loss 3.2185 LearningRate 0.0001 Epoch: 31 Global Step: 650840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:18,170-Speed 6343.95 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 31 Global Step: 650850 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:21,416-Speed 6310.32 samples/sec Loss 3.2056 LearningRate 0.0001 Epoch: 31 Global Step: 650860 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:24,662-Speed 6310.70 samples/sec Loss 3.2417 LearningRate 0.0001 Epoch: 31 Global Step: 650870 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:27,911-Speed 6305.13 samples/sec Loss 3.2041 LearningRate 0.0001 Epoch: 31 Global Step: 650880 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:31,156-Speed 6313.07 samples/sec Loss 3.2810 LearningRate 0.0001 Epoch: 31 Global Step: 650890 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:34,399-Speed 6316.41 samples/sec Loss 3.2780 LearningRate 0.0001 Epoch: 31 Global Step: 650900 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:37,649-Speed 6301.68 samples/sec Loss 3.2353 LearningRate 0.0001 Epoch: 31 Global Step: 650910 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:40,899-Speed 6304.27 samples/sec Loss 3.2745 LearningRate 0.0001 Epoch: 31 Global Step: 650920 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:44,147-Speed 6307.07 samples/sec Loss 3.2439 LearningRate 0.0001 Epoch: 31 Global Step: 650930 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:47,397-Speed 6303.13 samples/sec Loss 3.2419 LearningRate 0.0001 Epoch: 31 Global Step: 650940 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:56:50,641-Speed 6315.15 samples/sec Loss 
3.1681 LearningRate 0.0001 Epoch: 31 Global Step: 650950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:53,889-Speed 6306.43 samples/sec Loss 3.2513 LearningRate 0.0001 Epoch: 31 Global Step: 650960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:56:57,133-Speed 6313.54 samples/sec Loss 3.1930 LearningRate 0.0001 Epoch: 31 Global Step: 650970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:00,378-Speed 6313.66 samples/sec Loss 3.2156 LearningRate 0.0001 Epoch: 31 Global Step: 650980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:03,632-Speed 6295.79 samples/sec Loss 3.1866 LearningRate 0.0001 Epoch: 31 Global Step: 650990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:06,879-Speed 6308.03 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 31 Global Step: 651000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:10,148-Speed 6266.56 samples/sec Loss 3.2170 LearningRate 0.0001 Epoch: 31 Global Step: 651010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:13,395-Speed 6308.94 samples/sec Loss 3.2282 LearningRate 0.0001 Epoch: 31 Global Step: 651020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:16,643-Speed 6307.33 samples/sec Loss 3.3044 LearningRate 0.0001 Epoch: 31 Global Step: 651030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:19,889-Speed 6310.03 samples/sec Loss 3.3125 LearningRate 0.0001 Epoch: 31 Global Step: 651040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:23,120-Speed 6339.32 samples/sec Loss 3.2214 LearningRate 0.0001 Epoch: 31 Global Step: 651050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:26,367-Speed 6308.65 samples/sec Loss 3.1858 LearningRate 0.0001 Epoch: 31 Global Step: 651060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:29,614-Speed 6309.54 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 31 Global 
Step: 651070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:32,860-Speed 6310.67 samples/sec Loss 3.2625 LearningRate 0.0001 Epoch: 31 Global Step: 651080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:36,108-Speed 6306.29 samples/sec Loss 3.2492 LearningRate 0.0001 Epoch: 31 Global Step: 651090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:39,352-Speed 6314.77 samples/sec Loss 3.2292 LearningRate 0.0001 Epoch: 31 Global Step: 651100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:42,598-Speed 6309.82 samples/sec Loss 3.2197 LearningRate 0.0001 Epoch: 31 Global Step: 651110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:45,843-Speed 6312.85 samples/sec Loss 3.2369 LearningRate 0.0001 Epoch: 31 Global Step: 651120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:49,091-Speed 6308.38 samples/sec Loss 3.2794 LearningRate 0.0001 Epoch: 31 Global Step: 651130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:52,345-Speed 6295.17 samples/sec Loss 3.2118 LearningRate 0.0001 Epoch: 31 Global Step: 651140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:55,578-Speed 6334.82 samples/sec Loss 3.2378 LearningRate 0.0001 Epoch: 31 Global Step: 651150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:57:58,808-Speed 6342.53 samples/sec Loss 3.2573 LearningRate 0.0001 Epoch: 31 Global Step: 651160 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:02,051-Speed 6317.41 samples/sec Loss 3.1479 LearningRate 0.0001 Epoch: 31 Global Step: 651170 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:05,298-Speed 6308.02 samples/sec Loss 3.2274 LearningRate 0.0001 Epoch: 31 Global Step: 651180 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:08,542-Speed 6315.58 samples/sec Loss 3.1660 LearningRate 0.0001 Epoch: 31 Global Step: 651190 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 02:58:11,788-Speed 6310.23 samples/sec Loss 3.2509 LearningRate 0.0001 Epoch: 31 Global Step: 651200 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:15,033-Speed 6312.53 samples/sec Loss 3.2299 LearningRate 0.0001 Epoch: 31 Global Step: 651210 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:18,276-Speed 6316.19 samples/sec Loss 3.2229 LearningRate 0.0001 Epoch: 31 Global Step: 651220 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:21,526-Speed 6302.86 samples/sec Loss 3.2069 LearningRate 0.0001 Epoch: 31 Global Step: 651230 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:24,769-Speed 6316.96 samples/sec Loss 3.1493 LearningRate 0.0001 Epoch: 31 Global Step: 651240 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:28,014-Speed 6311.94 samples/sec Loss 3.1351 LearningRate 0.0001 Epoch: 31 Global Step: 651250 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:58:31,261-Speed 6310.39 samples/sec Loss 3.1740 LearningRate 0.0001 Epoch: 31 Global Step: 651260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:34,503-Speed 6317.40 samples/sec Loss 3.1911 LearningRate 0.0001 Epoch: 31 Global Step: 651270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:37,749-Speed 6310.09 samples/sec Loss 3.2263 LearningRate 0.0001 Epoch: 31 Global Step: 651280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:40,992-Speed 6317.21 samples/sec Loss 3.2030 LearningRate 0.0001 Epoch: 31 Global Step: 651290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:44,238-Speed 6310.52 samples/sec Loss 3.1728 LearningRate 0.0001 Epoch: 31 Global Step: 651300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:47,481-Speed 6317.11 samples/sec Loss 3.2386 LearningRate 0.0001 Epoch: 31 Global Step: 651310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
02:58:50,726-Speed 6312.92 samples/sec Loss 3.2050 LearningRate 0.0001 Epoch: 31 Global Step: 651320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:53,970-Speed 6314.34 samples/sec Loss 3.1569 LearningRate 0.0001 Epoch: 31 Global Step: 651330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:58:57,214-Speed 6314.84 samples/sec Loss 3.2561 LearningRate 0.0001 Epoch: 31 Global Step: 651340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:00,462-Speed 6307.26 samples/sec Loss 3.2186 LearningRate 0.0001 Epoch: 31 Global Step: 651350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:03,699-Speed 6328.18 samples/sec Loss 3.2671 LearningRate 0.0001 Epoch: 31 Global Step: 651360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:06,950-Speed 6300.46 samples/sec Loss 3.2067 LearningRate 0.0001 Epoch: 31 Global Step: 651370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:10,201-Speed 6301.33 samples/sec Loss 3.1847 LearningRate 0.0001 Epoch: 31 Global Step: 651380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:13,452-Speed 6300.68 samples/sec Loss 3.2161 LearningRate 0.0001 Epoch: 31 Global Step: 651390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:16,696-Speed 6316.34 samples/sec Loss 3.1888 LearningRate 0.0001 Epoch: 31 Global Step: 651400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:19,940-Speed 6313.30 samples/sec Loss 3.1850 LearningRate 0.0001 Epoch: 31 Global Step: 651410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:23,189-Speed 6306.00 samples/sec Loss 3.2593 LearningRate 0.0001 Epoch: 31 Global Step: 651420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:26,435-Speed 6308.86 samples/sec Loss 3.2475 LearningRate 0.0001 Epoch: 31 Global Step: 651430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:29,680-Speed 6314.11 samples/sec Loss 
3.1961 LearningRate 0.0001 Epoch: 31 Global Step: 651440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:32,927-Speed 6307.37 samples/sec Loss 3.2250 LearningRate 0.0001 Epoch: 31 Global Step: 651450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:36,158-Speed 6339.96 samples/sec Loss 3.2266 LearningRate 0.0001 Epoch: 31 Global Step: 651460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:39,405-Speed 6308.60 samples/sec Loss 3.2584 LearningRate 0.0001 Epoch: 31 Global Step: 651470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:42,648-Speed 6317.44 samples/sec Loss 3.2218 LearningRate 0.0001 Epoch: 31 Global Step: 651480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:45,892-Speed 6314.88 samples/sec Loss 3.2103 LearningRate 0.0001 Epoch: 31 Global Step: 651490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:49,137-Speed 6312.32 samples/sec Loss 3.2683 LearningRate 0.0001 Epoch: 31 Global Step: 651500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 02:59:52,372-Speed 6331.71 samples/sec Loss 3.2227 LearningRate 0.0001 Epoch: 31 Global Step: 651510 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:59:55,617-Speed 6312.66 samples/sec Loss 3.2025 LearningRate 0.0001 Epoch: 31 Global Step: 651520 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 02:59:58,860-Speed 6316.89 samples/sec Loss 3.2159 LearningRate 0.0001 Epoch: 31 Global Step: 651530 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:02,105-Speed 6315.06 samples/sec Loss 3.1664 LearningRate 0.0001 Epoch: 31 Global Step: 651540 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:05,350-Speed 6313.37 samples/sec Loss 3.2270 LearningRate 0.0001 Epoch: 31 Global Step: 651550 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:08,595-Speed 6312.32 samples/sec Loss 3.1791 LearningRate 0.0001 Epoch: 31 Global 
Step: 651560 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:11,836-Speed 6320.58 samples/sec Loss 3.1997 LearningRate 0.0001 Epoch: 31 Global Step: 651570 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:15,082-Speed 6310.94 samples/sec Loss 3.1689 LearningRate 0.0001 Epoch: 31 Global Step: 651580 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:18,328-Speed 6310.77 samples/sec Loss 3.2284 LearningRate 0.0001 Epoch: 31 Global Step: 651590 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:21,575-Speed 6309.35 samples/sec Loss 3.2214 LearningRate 0.0001 Epoch: 31 Global Step: 651600 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:00:24,833-Speed 6286.53 samples/sec Loss 3.2758 LearningRate 0.0001 Epoch: 31 Global Step: 651610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:28,076-Speed 6316.65 samples/sec Loss 3.1940 LearningRate 0.0001 Epoch: 31 Global Step: 651620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:31,327-Speed 6302.47 samples/sec Loss 3.1946 LearningRate 0.0001 Epoch: 31 Global Step: 651630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:34,576-Speed 6305.91 samples/sec Loss 3.2086 LearningRate 0.0001 Epoch: 31 Global Step: 651640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:37,822-Speed 6309.66 samples/sec Loss 3.2317 LearningRate 0.0001 Epoch: 31 Global Step: 651650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:41,067-Speed 6313.68 samples/sec Loss 3.1967 LearningRate 0.0001 Epoch: 31 Global Step: 651660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:44,312-Speed 6312.58 samples/sec Loss 3.2139 LearningRate 0.0001 Epoch: 31 Global Step: 651670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:47,568-Speed 6292.04 samples/sec Loss 3.2855 LearningRate 0.0001 Epoch: 31 Global Step: 651680 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:00:50,814-Speed 6310.42 samples/sec Loss 3.1790 LearningRate 0.0001 Epoch: 31 Global Step: 651690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:54,060-Speed 6310.07 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 651700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:00:57,293-Speed 6336.46 samples/sec Loss 3.2199 LearningRate 0.0001 Epoch: 31 Global Step: 651710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:00,538-Speed 6311.41 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 31 Global Step: 651720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:03,794-Speed 6292.59 samples/sec Loss 3.2049 LearningRate 0.0001 Epoch: 31 Global Step: 651730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:07,038-Speed 6313.11 samples/sec Loss 3.1839 LearningRate 0.0001 Epoch: 31 Global Step: 651740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:10,287-Speed 6305.02 samples/sec Loss 3.2239 LearningRate 0.0001 Epoch: 31 Global Step: 651750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:13,533-Speed 6311.23 samples/sec Loss 3.1953 LearningRate 0.0001 Epoch: 31 Global Step: 651760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:16,776-Speed 6317.95 samples/sec Loss 3.2409 LearningRate 0.0001 Epoch: 31 Global Step: 651770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:20,021-Speed 6312.14 samples/sec Loss 3.2380 LearningRate 0.0001 Epoch: 31 Global Step: 651780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:23,266-Speed 6312.01 samples/sec Loss 3.1612 LearningRate 0.0001 Epoch: 31 Global Step: 651790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:26,511-Speed 6313.87 samples/sec Loss 3.1868 LearningRate 0.0001 Epoch: 31 Global Step: 651800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:01:29,744-Speed 6334.72 samples/sec Loss 3.1921 LearningRate 0.0001 Epoch: 31 Global Step: 651810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:32,992-Speed 6307.00 samples/sec Loss 3.2359 LearningRate 0.0001 Epoch: 31 Global Step: 651820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:36,243-Speed 6302.83 samples/sec Loss 3.1590 LearningRate 0.0001 Epoch: 31 Global Step: 651830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:39,492-Speed 6304.83 samples/sec Loss 3.2314 LearningRate 0.0001 Epoch: 31 Global Step: 651840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:42,740-Speed 6307.01 samples/sec Loss 3.2396 LearningRate 0.0001 Epoch: 31 Global Step: 651850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:45,983-Speed 6317.01 samples/sec Loss 3.2126 LearningRate 0.0001 Epoch: 31 Global Step: 651860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:49,233-Speed 6301.57 samples/sec Loss 3.2190 LearningRate 0.0001 Epoch: 31 Global Step: 651870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:52,482-Speed 6309.94 samples/sec Loss 3.2436 LearningRate 0.0001 Epoch: 31 Global Step: 651880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:55,732-Speed 6301.65 samples/sec Loss 3.2254 LearningRate 0.0001 Epoch: 31 Global Step: 651890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:01:58,985-Speed 6298.11 samples/sec Loss 3.2321 LearningRate 0.0001 Epoch: 31 Global Step: 651900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:02,250-Speed 6272.97 samples/sec Loss 3.2350 LearningRate 0.0001 Epoch: 31 Global Step: 651910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:05,530-Speed 6244.54 samples/sec Loss 3.2668 LearningRate 0.0001 Epoch: 31 Global Step: 651920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:08,791-Speed 6282.63 samples/sec Loss 
3.2293 LearningRate 0.0001 Epoch: 31 Global Step: 651930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:12,034-Speed 6316.68 samples/sec Loss 3.2793 LearningRate 0.0001 Epoch: 31 Global Step: 651940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:15,277-Speed 6315.39 samples/sec Loss 3.2001 LearningRate 0.0001 Epoch: 31 Global Step: 651950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:18,520-Speed 6317.13 samples/sec Loss 3.1761 LearningRate 0.0001 Epoch: 31 Global Step: 651960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:21,766-Speed 6312.25 samples/sec Loss 3.2800 LearningRate 0.0001 Epoch: 31 Global Step: 651970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:25,017-Speed 6300.95 samples/sec Loss 3.2004 LearningRate 0.0001 Epoch: 31 Global Step: 651980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:28,265-Speed 6306.51 samples/sec Loss 3.1651 LearningRate 0.0001 Epoch: 31 Global Step: 651990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:31,516-Speed 6299.73 samples/sec Loss 3.1631 LearningRate 0.0001 Epoch: 31 Global Step: 652000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:34,751-Speed 6332.57 samples/sec Loss 3.2200 LearningRate 0.0001 Epoch: 31 Global Step: 652010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:38,001-Speed 6304.13 samples/sec Loss 3.1702 LearningRate 0.0001 Epoch: 31 Global Step: 652020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:41,248-Speed 6307.59 samples/sec Loss 3.1992 LearningRate 0.0001 Epoch: 31 Global Step: 652030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:02:44,479-Speed 6340.46 samples/sec Loss 3.1692 LearningRate 0.0001 Epoch: 31 Global Step: 652040 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:02:47,723-Speed 6315.32 samples/sec Loss 3.2046 LearningRate 0.0001 Epoch: 31 Global 
Step: 652050 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:02:50,967-Speed 6314.76 samples/sec Loss 3.2491 LearningRate 0.0001 Epoch: 31 Global Step: 652060 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:02:54,221-Speed 6296.06 samples/sec Loss 3.2184 LearningRate 0.0001 Epoch: 31 Global Step: 652070 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:02:57,475-Speed 6295.80 samples/sec Loss 3.2033 LearningRate 0.0001 Epoch: 31 Global Step: 652080 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:00,721-Speed 6310.26 samples/sec Loss 3.2040 LearningRate 0.0001 Epoch: 31 Global Step: 652090 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:03,966-Speed 6311.31 samples/sec Loss 3.2287 LearningRate 0.0001 Epoch: 31 Global Step: 652100 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:07,212-Speed 6310.72 samples/sec Loss 3.2194 LearningRate 0.0001 Epoch: 31 Global Step: 652110 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:10,458-Speed 6310.82 samples/sec Loss 3.2419 LearningRate 0.0001 Epoch: 31 Global Step: 652120 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:13,701-Speed 6316.07 samples/sec Loss 3.2177 LearningRate 0.0001 Epoch: 31 Global Step: 652130 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:03:16,947-Speed 6311.99 samples/sec Loss 3.1706 LearningRate 0.0001 Epoch: 31 Global Step: 652140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:20,199-Speed 6299.01 samples/sec Loss 3.1741 LearningRate 0.0001 Epoch: 31 Global Step: 652150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:23,457-Speed 6287.34 samples/sec Loss 3.2009 LearningRate 0.0001 Epoch: 31 Global Step: 652160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:26,703-Speed 6310.93 samples/sec Loss 3.1339 LearningRate 0.0001 Epoch: 31 Global Step: 652170 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:03:29,946-Speed 6316.61 samples/sec Loss 3.2164 LearningRate 0.0001 Epoch: 31 Global Step: 652180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:33,191-Speed 6312.52 samples/sec Loss 3.2102 LearningRate 0.0001 Epoch: 31 Global Step: 652190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:36,439-Speed 6307.33 samples/sec Loss 3.2101 LearningRate 0.0001 Epoch: 31 Global Step: 652200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:39,687-Speed 6307.07 samples/sec Loss 3.2103 LearningRate 0.0001 Epoch: 31 Global Step: 652210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:42,932-Speed 6312.42 samples/sec Loss 3.1968 LearningRate 0.0001 Epoch: 31 Global Step: 652220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:46,174-Speed 6317.95 samples/sec Loss 3.1757 LearningRate 0.0001 Epoch: 31 Global Step: 652230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:49,403-Speed 6343.37 samples/sec Loss 3.1698 LearningRate 0.0001 Epoch: 31 Global Step: 652240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:52,647-Speed 6317.82 samples/sec Loss 3.2057 LearningRate 0.0001 Epoch: 31 Global Step: 652250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:55,896-Speed 6305.45 samples/sec Loss 3.2219 LearningRate 0.0001 Epoch: 31 Global Step: 652260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:03:59,147-Speed 6299.90 samples/sec Loss 3.2354 LearningRate 0.0001 Epoch: 31 Global Step: 652270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:02,400-Speed 6298.64 samples/sec Loss 3.2130 LearningRate 0.0001 Epoch: 31 Global Step: 652280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:05,658-Speed 6287.04 samples/sec Loss 3.2505 LearningRate 0.0001 Epoch: 31 Global Step: 652290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:04:08,910-Speed 6299.18 samples/sec Loss 3.2141 LearningRate 0.0001 Epoch: 31 Global Step: 652300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:12,159-Speed 6305.23 samples/sec Loss 3.2036 LearningRate 0.0001 Epoch: 31 Global Step: 652310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:15,400-Speed 6320.23 samples/sec Loss 3.1948 LearningRate 0.0001 Epoch: 31 Global Step: 652320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:18,645-Speed 6313.40 samples/sec Loss 3.2273 LearningRate 0.0001 Epoch: 31 Global Step: 652330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:21,876-Speed 6338.50 samples/sec Loss 3.1644 LearningRate 0.0001 Epoch: 31 Global Step: 652340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:25,129-Speed 6298.23 samples/sec Loss 3.2412 LearningRate 0.0001 Epoch: 31 Global Step: 652350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:28,374-Speed 6312.88 samples/sec Loss 3.2442 LearningRate 0.0001 Epoch: 31 Global Step: 652360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:31,618-Speed 6312.80 samples/sec Loss 3.2216 LearningRate 0.0001 Epoch: 31 Global Step: 652370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:34,867-Speed 6305.71 samples/sec Loss 3.2107 LearningRate 0.0001 Epoch: 31 Global Step: 652380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:38,110-Speed 6316.84 samples/sec Loss 3.2235 LearningRate 0.0001 Epoch: 31 Global Step: 652390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:41,408-Speed 6211.43 samples/sec Loss 3.1931 LearningRate 0.0001 Epoch: 31 Global Step: 652400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:44,659-Speed 6299.37 samples/sec Loss 3.1727 LearningRate 0.0001 Epoch: 31 Global Step: 652410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:47,905-Speed 6311.52 samples/sec Loss 
3.2264 LearningRate 0.0001 Epoch: 31 Global Step: 652420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:51,151-Speed 6311.79 samples/sec Loss 3.1557 LearningRate 0.0001 Epoch: 31 Global Step: 652430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:54,380-Speed 6342.76 samples/sec Loss 3.2318 LearningRate 0.0001 Epoch: 31 Global Step: 652440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:04:57,626-Speed 6311.58 samples/sec Loss 3.2388 LearningRate 0.0001 Epoch: 31 Global Step: 652450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:00,871-Speed 6311.57 samples/sec Loss 3.2266 LearningRate 0.0001 Epoch: 31 Global Step: 652460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:04,116-Speed 6312.05 samples/sec Loss 3.2011 LearningRate 0.0001 Epoch: 31 Global Step: 652470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:07,361-Speed 6312.85 samples/sec Loss 3.2148 LearningRate 0.0001 Epoch: 31 Global Step: 652480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:10,605-Speed 6316.30 samples/sec Loss 3.1872 LearningRate 0.0001 Epoch: 31 Global Step: 652490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:13,862-Speed 6288.17 samples/sec Loss 3.2143 LearningRate 0.0001 Epoch: 31 Global Step: 652500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:17,105-Speed 6317.68 samples/sec Loss 3.2568 LearningRate 0.0001 Epoch: 31 Global Step: 652510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:20,362-Speed 6288.50 samples/sec Loss 3.2573 LearningRate 0.0001 Epoch: 31 Global Step: 652520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:23,609-Speed 6309.87 samples/sec Loss 3.1964 LearningRate 0.0001 Epoch: 31 Global Step: 652530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:26,855-Speed 6311.04 samples/sec Loss 3.1873 LearningRate 0.0001 Epoch: 31 Global 
Step: 652540 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:05:30,083-Speed 6344.45 samples/sec Loss 3.2054 LearningRate 0.0001 Epoch: 31 Global Step: 652550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:33,342-Speed 6286.19 samples/sec Loss 3.1986 LearningRate 0.0001 Epoch: 31 Global Step: 652560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:36,586-Speed 6315.77 samples/sec Loss 3.1581 LearningRate 0.0001 Epoch: 31 Global Step: 652570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:39,831-Speed 6312.17 samples/sec Loss 3.1812 LearningRate 0.0001 Epoch: 31 Global Step: 652580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:43,077-Speed 6309.47 samples/sec Loss 3.1897 LearningRate 0.0001 Epoch: 31 Global Step: 652590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:46,327-Speed 6304.02 samples/sec Loss 3.2336 LearningRate 0.0001 Epoch: 31 Global Step: 652600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:49,572-Speed 6312.03 samples/sec Loss 3.1895 LearningRate 0.0001 Epoch: 31 Global Step: 652610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:52,817-Speed 6312.99 samples/sec Loss 3.2502 LearningRate 0.0001 Epoch: 31 Global Step: 652620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:56,069-Speed 6298.82 samples/sec Loss 3.2214 LearningRate 0.0001 Epoch: 31 Global Step: 652630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:05:59,317-Speed 6305.79 samples/sec Loss 3.2225 LearningRate 0.0001 Epoch: 31 Global Step: 652640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:02,554-Speed 6328.43 samples/sec Loss 3.2246 LearningRate 0.0001 Epoch: 31 Global Step: 652650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:05,799-Speed 6313.20 samples/sec Loss 3.2475 LearningRate 0.0001 Epoch: 31 Global Step: 652660 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:06:09,046-Speed 6308.46 samples/sec Loss 3.2183 LearningRate 0.0001 Epoch: 31 Global Step: 652670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:12,291-Speed 6313.34 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 652680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:15,537-Speed 6311.61 samples/sec Loss 3.2425 LearningRate 0.0001 Epoch: 31 Global Step: 652690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:18,783-Speed 6309.61 samples/sec Loss 3.2180 LearningRate 0.0001 Epoch: 31 Global Step: 652700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:22,078-Speed 6217.31 samples/sec Loss 3.1631 LearningRate 0.0001 Epoch: 31 Global Step: 652710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:25,397-Speed 6171.93 samples/sec Loss 3.1743 LearningRate 0.0001 Epoch: 31 Global Step: 652720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:28,646-Speed 6305.63 samples/sec Loss 3.1646 LearningRate 0.0001 Epoch: 31 Global Step: 652730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:31,902-Speed 6291.38 samples/sec Loss 3.2340 LearningRate 0.0001 Epoch: 31 Global Step: 652740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:35,131-Speed 6343.47 samples/sec Loss 3.2716 LearningRate 0.0001 Epoch: 31 Global Step: 652750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:38,378-Speed 6309.79 samples/sec Loss 3.2099 LearningRate 0.0001 Epoch: 31 Global Step: 652760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:41,636-Speed 6286.90 samples/sec Loss 3.1740 LearningRate 0.0001 Epoch: 31 Global Step: 652770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:44,880-Speed 6313.49 samples/sec Loss 3.2544 LearningRate 0.0001 Epoch: 31 Global Step: 652780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:06:48,131-Speed 6302.22 samples/sec Loss 3.2056 LearningRate 0.0001 Epoch: 31 Global Step: 652790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:51,375-Speed 6314.99 samples/sec Loss 3.1857 LearningRate 0.0001 Epoch: 31 Global Step: 652800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:54,619-Speed 6315.12 samples/sec Loss 3.1546 LearningRate 0.0001 Epoch: 31 Global Step: 652810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:06:57,860-Speed 6319.05 samples/sec Loss 3.2030 LearningRate 0.0001 Epoch: 31 Global Step: 652820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:01,109-Speed 6304.72 samples/sec Loss 3.2244 LearningRate 0.0001 Epoch: 31 Global Step: 652830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:04,359-Speed 6303.32 samples/sec Loss 3.2892 LearningRate 0.0001 Epoch: 31 Global Step: 652840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:07,591-Speed 6337.88 samples/sec Loss 3.2263 LearningRate 0.0001 Epoch: 31 Global Step: 652850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:10,847-Speed 6292.23 samples/sec Loss 3.2459 LearningRate 0.0001 Epoch: 31 Global Step: 652860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:14,091-Speed 6313.67 samples/sec Loss 3.2145 LearningRate 0.0001 Epoch: 31 Global Step: 652870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:17,341-Speed 6303.35 samples/sec Loss 3.2293 LearningRate 0.0001 Epoch: 31 Global Step: 652880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:20,598-Speed 6289.06 samples/sec Loss 3.1986 LearningRate 0.0001 Epoch: 31 Global Step: 652890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:23,851-Speed 6297.15 samples/sec Loss 3.1550 LearningRate 0.0001 Epoch: 31 Global Step: 652900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:27,098-Speed 6309.77 samples/sec Loss 
3.1522 LearningRate 0.0001 Epoch: 31 Global Step: 652910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:30,403-Speed 6196.62 samples/sec Loss 3.2487 LearningRate 0.0001 Epoch: 31 Global Step: 652920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:33,649-Speed 6310.77 samples/sec Loss 3.2262 LearningRate 0.0001 Epoch: 31 Global Step: 652930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:36,902-Speed 6299.43 samples/sec Loss 3.1503 LearningRate 0.0001 Epoch: 31 Global Step: 652940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:40,143-Speed 6319.63 samples/sec Loss 3.2431 LearningRate 0.0001 Epoch: 31 Global Step: 652950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:43,390-Speed 6308.23 samples/sec Loss 3.2136 LearningRate 0.0001 Epoch: 31 Global Step: 652960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:46,634-Speed 6315.00 samples/sec Loss 3.1999 LearningRate 0.0001 Epoch: 31 Global Step: 652970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:49,881-Speed 6308.70 samples/sec Loss 3.2277 LearningRate 0.0001 Epoch: 31 Global Step: 652980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:53,135-Speed 6296.61 samples/sec Loss 3.1496 LearningRate 0.0001 Epoch: 31 Global Step: 652990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:56,383-Speed 6305.31 samples/sec Loss 3.1998 LearningRate 0.0001 Epoch: 31 Global Step: 653000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:07:59,621-Speed 6327.00 samples/sec Loss 3.2063 LearningRate 0.0001 Epoch: 31 Global Step: 653010 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:02,865-Speed 6314.67 samples/sec Loss 3.1937 LearningRate 0.0001 Epoch: 31 Global Step: 653020 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:06,115-Speed 6301.82 samples/sec Loss 3.2474 LearningRate 0.0001 Epoch: 31 Global 
Step: 653030 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:09,359-Speed 6315.16 samples/sec Loss 3.2446 LearningRate 0.0001 Epoch: 31 Global Step: 653040 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:12,601-Speed 6318.06 samples/sec Loss 3.1975 LearningRate 0.0001 Epoch: 31 Global Step: 653050 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:15,847-Speed 6311.32 samples/sec Loss 3.1978 LearningRate 0.0001 Epoch: 31 Global Step: 653060 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:19,092-Speed 6313.15 samples/sec Loss 3.2064 LearningRate 0.0001 Epoch: 31 Global Step: 653070 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:22,349-Speed 6289.98 samples/sec Loss 3.2090 LearningRate 0.0001 Epoch: 31 Global Step: 653080 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:25,595-Speed 6308.87 samples/sec Loss 3.1895 LearningRate 0.0001 Epoch: 31 Global Step: 653090 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:28,844-Speed 6306.18 samples/sec Loss 3.1832 LearningRate 0.0001 Epoch: 31 Global Step: 653100 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:08:32,089-Speed 6311.10 samples/sec Loss 3.2430 LearningRate 0.0001 Epoch: 31 Global Step: 653110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:35,334-Speed 6312.69 samples/sec Loss 3.2437 LearningRate 0.0001 Epoch: 31 Global Step: 653120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:38,585-Speed 6301.66 samples/sec Loss 3.2569 LearningRate 0.0001 Epoch: 31 Global Step: 653130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:41,830-Speed 6311.74 samples/sec Loss 3.2252 LearningRate 0.0001 Epoch: 31 Global Step: 653140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:45,081-Speed 6302.21 samples/sec Loss 3.1827 LearningRate 0.0001 Epoch: 31 Global Step: 653150 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:08:48,330-Speed 6304.83 samples/sec Loss 3.1878 LearningRate 0.0001 Epoch: 31 Global Step: 653160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:51,576-Speed 6311.44 samples/sec Loss 3.1836 LearningRate 0.0001 Epoch: 31 Global Step: 653170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:54,818-Speed 6318.09 samples/sec Loss 3.2003 LearningRate 0.0001 Epoch: 31 Global Step: 653180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:08:58,066-Speed 6307.85 samples/sec Loss 3.2276 LearningRate 0.0001 Epoch: 31 Global Step: 653190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:01,311-Speed 6311.95 samples/sec Loss 3.2112 LearningRate 0.0001 Epoch: 31 Global Step: 653200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:04,546-Speed 6331.08 samples/sec Loss 3.1757 LearningRate 0.0001 Epoch: 31 Global Step: 653210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:07,792-Speed 6311.60 samples/sec Loss 3.1890 LearningRate 0.0001 Epoch: 31 Global Step: 653220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:11,037-Speed 6312.77 samples/sec Loss 3.1885 LearningRate 0.0001 Epoch: 31 Global Step: 653230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:14,377-Speed 6132.37 samples/sec Loss 3.1675 LearningRate 0.0001 Epoch: 31 Global Step: 653240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:17,625-Speed 6307.59 samples/sec Loss 3.2174 LearningRate 0.0001 Epoch: 31 Global Step: 653250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:20,870-Speed 6313.40 samples/sec Loss 3.2115 LearningRate 0.0001 Epoch: 31 Global Step: 653260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:24,115-Speed 6311.42 samples/sec Loss 3.2128 LearningRate 0.0001 Epoch: 31 Global Step: 653270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:09:27,363-Speed 6308.59 samples/sec Loss 3.2071 LearningRate 0.0001 Epoch: 31 Global Step: 653280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:30,613-Speed 6301.37 samples/sec Loss 3.1834 LearningRate 0.0001 Epoch: 31 Global Step: 653290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:33,864-Speed 6300.35 samples/sec Loss 3.2050 LearningRate 0.0001 Epoch: 31 Global Step: 653300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:37,102-Speed 6327.88 samples/sec Loss 3.2808 LearningRate 0.0001 Epoch: 31 Global Step: 653310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:40,348-Speed 6310.99 samples/sec Loss 3.1596 LearningRate 0.0001 Epoch: 31 Global Step: 653320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:43,595-Speed 6307.09 samples/sec Loss 3.1718 LearningRate 0.0001 Epoch: 31 Global Step: 653330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:46,849-Speed 6296.75 samples/sec Loss 3.1713 LearningRate 0.0001 Epoch: 31 Global Step: 653340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:50,107-Speed 6285.72 samples/sec Loss 3.2139 LearningRate 0.0001 Epoch: 31 Global Step: 653350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:53,352-Speed 6314.01 samples/sec Loss 3.2327 LearningRate 0.0001 Epoch: 31 Global Step: 653360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:56,596-Speed 6313.15 samples/sec Loss 3.2173 LearningRate 0.0001 Epoch: 31 Global Step: 653370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:09:59,841-Speed 6314.31 samples/sec Loss 3.1923 LearningRate 0.0001 Epoch: 31 Global Step: 653380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:03,085-Speed 6314.38 samples/sec Loss 3.2483 LearningRate 0.0001 Epoch: 31 Global Step: 653390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:06,330-Speed 6312.08 samples/sec Loss 
3.2076 LearningRate 0.0001 Epoch: 31 Global Step: 653400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:09,560-Speed 6342.16 samples/sec Loss 3.2199 LearningRate 0.0001 Epoch: 31 Global Step: 653410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:12,810-Speed 6311.43 samples/sec Loss 3.1998 LearningRate 0.0001 Epoch: 31 Global Step: 653420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:16,056-Speed 6311.73 samples/sec Loss 3.2004 LearningRate 0.0001 Epoch: 31 Global Step: 653430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:19,306-Speed 6302.89 samples/sec Loss 3.2152 LearningRate 0.0001 Epoch: 31 Global Step: 653440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:22,555-Speed 6305.07 samples/sec Loss 3.2205 LearningRate 0.0001 Epoch: 31 Global Step: 653450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:25,803-Speed 6306.72 samples/sec Loss 3.1787 LearningRate 0.0001 Epoch: 31 Global Step: 653460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:29,047-Speed 6313.34 samples/sec Loss 3.2334 LearningRate 0.0001 Epoch: 31 Global Step: 653470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:32,292-Speed 6312.51 samples/sec Loss 3.2260 LearningRate 0.0001 Epoch: 31 Global Step: 653480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:35,542-Speed 6304.21 samples/sec Loss 3.2536 LearningRate 0.0001 Epoch: 31 Global Step: 653490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:10:38,781-Speed 6322.59 samples/sec Loss 3.1886 LearningRate 0.0001 Epoch: 31 Global Step: 653500 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:42,032-Speed 6302.76 samples/sec Loss 3.1660 LearningRate 0.0001 Epoch: 31 Global Step: 653510 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:45,276-Speed 6314.68 samples/sec Loss 3.2015 LearningRate 0.0001 Epoch: 31 Global 
Step: 653520 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:48,525-Speed 6304.52 samples/sec Loss 3.2754 LearningRate 0.0001 Epoch: 31 Global Step: 653530 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:51,781-Speed 6290.44 samples/sec Loss 3.1212 LearningRate 0.0001 Epoch: 31 Global Step: 653540 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:55,026-Speed 6312.08 samples/sec Loss 3.2373 LearningRate 0.0001 Epoch: 31 Global Step: 653550 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:10:58,269-Speed 6317.58 samples/sec Loss 3.1499 LearningRate 0.0001 Epoch: 31 Global Step: 653560 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:01,519-Speed 6302.58 samples/sec Loss 3.1577 LearningRate 0.0001 Epoch: 31 Global Step: 653570 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:04,770-Speed 6300.30 samples/sec Loss 3.2130 LearningRate 0.0001 Epoch: 31 Global Step: 653580 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:08,018-Speed 6306.80 samples/sec Loss 3.1782 LearningRate 0.0001 Epoch: 31 Global Step: 653590 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:11,269-Speed 6302.74 samples/sec Loss 3.1846 LearningRate 0.0001 Epoch: 31 Global Step: 653600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:11:14,497-Speed 6346.23 samples/sec Loss 3.2123 LearningRate 0.0001 Epoch: 31 Global Step: 653610 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:17,743-Speed 6311.23 samples/sec Loss 3.2144 LearningRate 0.0001 Epoch: 31 Global Step: 653620 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:20,993-Speed 6302.32 samples/sec Loss 3.2059 LearningRate 0.0001 Epoch: 31 Global Step: 653630 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:24,235-Speed 6318.28 samples/sec Loss 3.1908 LearningRate 0.0001 Epoch: 31 Global Step: 653640 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 03:11:27,480-Speed 6314.16 samples/sec Loss 3.2188 LearningRate 0.0001 Epoch: 31 Global Step: 653650 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:30,724-Speed 6313.90 samples/sec Loss 3.2309 LearningRate 0.0001 Epoch: 31 Global Step: 653660 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:33,964-Speed 6322.61 samples/sec Loss 3.1877 LearningRate 0.0001 Epoch: 31 Global Step: 653670 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:37,212-Speed 6306.88 samples/sec Loss 3.2141 LearningRate 0.0001 Epoch: 31 Global Step: 653680 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:40,453-Speed 6319.09 samples/sec Loss 3.2193 LearningRate 0.0001 Epoch: 31 Global Step: 653690 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:43,699-Speed 6311.38 samples/sec Loss 3.2093 LearningRate 0.0001 Epoch: 31 Global Step: 653700 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:11:46,946-Speed 6309.02 samples/sec Loss 3.1463 LearningRate 0.0001 Epoch: 31 Global Step: 653710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:11:50,191-Speed 6312.13 samples/sec Loss 3.2009 LearningRate 0.0001 Epoch: 31 Global Step: 653720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:11:53,443-Speed 6299.75 samples/sec Loss 3.2145 LearningRate 0.0001 Epoch: 31 Global Step: 653730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:11:56,687-Speed 6314.14 samples/sec Loss 3.2381 LearningRate 0.0001 Epoch: 31 Global Step: 653740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:11:59,932-Speed 6313.15 samples/sec Loss 3.2359 LearningRate 0.0001 Epoch: 31 Global Step: 653750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:03,181-Speed 6304.63 samples/sec Loss 3.1485 LearningRate 0.0001 Epoch: 31 Global Step: 653760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:12:06,427-Speed 6309.73 samples/sec Loss 3.2534 LearningRate 0.0001 Epoch: 31 Global Step: 653770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:09,672-Speed 6313.36 samples/sec Loss 3.2115 LearningRate 0.0001 Epoch: 31 Global Step: 653780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:13,485-Speed 5371.37 samples/sec Loss 3.1786 LearningRate 0.0001 Epoch: 31 Global Step: 653790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:16,732-Speed 6309.54 samples/sec Loss 3.2133 LearningRate 0.0001 Epoch: 31 Global Step: 653800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:19,976-Speed 6314.10 samples/sec Loss 3.2330 LearningRate 0.0001 Epoch: 31 Global Step: 653810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:23,220-Speed 6315.83 samples/sec Loss 3.2248 LearningRate 0.0001 Epoch: 31 Global Step: 653820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:26,466-Speed 6310.05 samples/sec Loss 3.2004 LearningRate 0.0001 Epoch: 31 Global Step: 653830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:29,715-Speed 6306.62 samples/sec Loss 3.2656 LearningRate 0.0001 Epoch: 31 Global Step: 653840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:32,960-Speed 6312.03 samples/sec Loss 3.1990 LearningRate 0.0001 Epoch: 31 Global Step: 653850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:36,209-Speed 6304.33 samples/sec Loss 3.2105 LearningRate 0.0001 Epoch: 31 Global Step: 653860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:39,456-Speed 6308.84 samples/sec Loss 3.1481 LearningRate 0.0001 Epoch: 31 Global Step: 653870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:42,704-Speed 6308.10 samples/sec Loss 3.2407 LearningRate 0.0001 Epoch: 31 Global Step: 653880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:45,951-Speed 6307.09 samples/sec Loss 
3.2596 LearningRate 0.0001 Epoch: 31 Global Step: 653890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:49,195-Speed 6314.78 samples/sec Loss 3.2710 LearningRate 0.0001 Epoch: 31 Global Step: 653900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:52,428-Speed 6336.30 samples/sec Loss 3.1926 LearningRate 0.0001 Epoch: 31 Global Step: 653910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:55,673-Speed 6311.84 samples/sec Loss 3.2328 LearningRate 0.0001 Epoch: 31 Global Step: 653920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:12:58,919-Speed 6311.38 samples/sec Loss 3.2221 LearningRate 0.0001 Epoch: 31 Global Step: 653930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:02,172-Speed 6296.81 samples/sec Loss 3.2187 LearningRate 0.0001 Epoch: 31 Global Step: 653940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:05,434-Speed 6280.57 samples/sec Loss 3.2098 LearningRate 0.0001 Epoch: 31 Global Step: 653950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:08,686-Speed 6298.92 samples/sec Loss 3.2265 LearningRate 0.0001 Epoch: 31 Global Step: 653960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:11,946-Speed 6283.60 samples/sec Loss 3.2151 LearningRate 0.0001 Epoch: 31 Global Step: 653970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:15,189-Speed 6316.87 samples/sec Loss 3.1542 LearningRate 0.0001 Epoch: 31 Global Step: 653980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:18,432-Speed 6316.62 samples/sec Loss 3.1421 LearningRate 0.0001 Epoch: 31 Global Step: 653990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:21,674-Speed 6317.86 samples/sec Loss 3.1638 LearningRate 0.0001 Epoch: 31 Global Step: 654000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:24,927-Speed 6297.12 samples/sec Loss 3.2241 LearningRate 0.0001 Epoch: 31 Global 
Step: 654010 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:13:28,163-Speed 6331.40 samples/sec Loss 3.1409 LearningRate 0.0001 Epoch: 31 Global Step: 654020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:31,408-Speed 6311.37 samples/sec Loss 3.1421 LearningRate 0.0001 Epoch: 31 Global Step: 654030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:34,656-Speed 6308.29 samples/sec Loss 3.2171 LearningRate 0.0001 Epoch: 31 Global Step: 654040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:37,904-Speed 6305.44 samples/sec Loss 3.2043 LearningRate 0.0001 Epoch: 31 Global Step: 654050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:41,156-Speed 6299.98 samples/sec Loss 3.2068 LearningRate 0.0001 Epoch: 31 Global Step: 654060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:44,405-Speed 6305.94 samples/sec Loss 3.1793 LearningRate 0.0001 Epoch: 31 Global Step: 654070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:47,651-Speed 6310.79 samples/sec Loss 3.2139 LearningRate 0.0001 Epoch: 31 Global Step: 654080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:50,899-Speed 6305.41 samples/sec Loss 3.1837 LearningRate 0.0001 Epoch: 31 Global Step: 654090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:54,154-Speed 6297.80 samples/sec Loss 3.2326 LearningRate 0.0001 Epoch: 31 Global Step: 654100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:13:57,389-Speed 6330.92 samples/sec Loss 3.1580 LearningRate 0.0001 Epoch: 31 Global Step: 654110 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:00,641-Speed 6299.54 samples/sec Loss 3.1614 LearningRate 0.0001 Epoch: 31 Global Step: 654120 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:03,893-Speed 6299.46 samples/sec Loss 3.1753 LearningRate 0.0001 Epoch: 31 Global Step: 654130 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 03:14:07,137-Speed 6313.73 samples/sec Loss 3.2347 LearningRate 0.0001 Epoch: 31 Global Step: 654140 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:10,388-Speed 6301.18 samples/sec Loss 3.2054 LearningRate 0.0001 Epoch: 31 Global Step: 654150 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:13,638-Speed 6302.71 samples/sec Loss 3.1885 LearningRate 0.0001 Epoch: 31 Global Step: 654160 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:16,884-Speed 6311.45 samples/sec Loss 3.1975 LearningRate 0.0001 Epoch: 31 Global Step: 654170 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:20,128-Speed 6313.87 samples/sec Loss 3.1608 LearningRate 0.0001 Epoch: 31 Global Step: 654180 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:23,376-Speed 6307.99 samples/sec Loss 3.1996 LearningRate 0.0001 Epoch: 31 Global Step: 654190 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:26,625-Speed 6303.86 samples/sec Loss 3.2769 LearningRate 0.0001 Epoch: 31 Global Step: 654200 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:14:29,877-Speed 6298.91 samples/sec Loss 3.1802 LearningRate 0.0001 Epoch: 31 Global Step: 654210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:33,126-Speed 6305.39 samples/sec Loss 3.2174 LearningRate 0.0001 Epoch: 31 Global Step: 654220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:36,385-Speed 6284.52 samples/sec Loss 3.2552 LearningRate 0.0001 Epoch: 31 Global Step: 654230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:39,638-Speed 6298.11 samples/sec Loss 3.2218 LearningRate 0.0001 Epoch: 31 Global Step: 654240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:42,883-Speed 6313.02 samples/sec Loss 3.2190 LearningRate 0.0001 Epoch: 31 Global Step: 654250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:14:46,131-Speed 6305.59 samples/sec Loss 3.2356 LearningRate 0.0001 Epoch: 31 Global Step: 654260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:49,375-Speed 6315.15 samples/sec Loss 3.1795 LearningRate 0.0001 Epoch: 31 Global Step: 654270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:52,638-Speed 6278.11 samples/sec Loss 3.1715 LearningRate 0.0001 Epoch: 31 Global Step: 654280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:55,888-Speed 6304.14 samples/sec Loss 3.1830 LearningRate 0.0001 Epoch: 31 Global Step: 654290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:14:59,146-Speed 6287.69 samples/sec Loss 3.2366 LearningRate 0.0001 Epoch: 31 Global Step: 654300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:02,379-Speed 6335.80 samples/sec Loss 3.1957 LearningRate 0.0001 Epoch: 31 Global Step: 654310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:05,625-Speed 6310.09 samples/sec Loss 3.1896 LearningRate 0.0001 Epoch: 31 Global Step: 654320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:08,875-Speed 6303.34 samples/sec Loss 3.2830 LearningRate 0.0001 Epoch: 31 Global Step: 654330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:12,120-Speed 6312.80 samples/sec Loss 3.1976 LearningRate 0.0001 Epoch: 31 Global Step: 654340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:15,365-Speed 6312.89 samples/sec Loss 3.1773 LearningRate 0.0001 Epoch: 31 Global Step: 654350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:18,610-Speed 6311.05 samples/sec Loss 3.1949 LearningRate 0.0001 Epoch: 31 Global Step: 654360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:21,856-Speed 6311.05 samples/sec Loss 3.2007 LearningRate 0.0001 Epoch: 31 Global Step: 654370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:25,108-Speed 6299.94 samples/sec Loss 
3.1710 LearningRate 0.0001 Epoch: 31 Global Step: 654380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:28,353-Speed 6312.74 samples/sec Loss 3.1817 LearningRate 0.0001 Epoch: 31 Global Step: 654390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:31,601-Speed 6305.80 samples/sec Loss 3.1941 LearningRate 0.0001 Epoch: 31 Global Step: 654400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:34,835-Speed 6334.13 samples/sec Loss 3.1593 LearningRate 0.0001 Epoch: 31 Global Step: 654410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:38,078-Speed 6317.41 samples/sec Loss 3.2298 LearningRate 0.0001 Epoch: 31 Global Step: 654420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:41,324-Speed 6310.21 samples/sec Loss 3.2328 LearningRate 0.0001 Epoch: 31 Global Step: 654430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:44,566-Speed 6318.55 samples/sec Loss 3.1691 LearningRate 0.0001 Epoch: 31 Global Step: 654440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:47,812-Speed 6310.27 samples/sec Loss 3.1862 LearningRate 0.0001 Epoch: 31 Global Step: 654450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:51,056-Speed 6315.21 samples/sec Loss 3.1942 LearningRate 0.0001 Epoch: 31 Global Step: 654460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:54,306-Speed 6302.50 samples/sec Loss 3.2492 LearningRate 0.0001 Epoch: 31 Global Step: 654470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:15:57,560-Speed 6295.28 samples/sec Loss 3.2148 LearningRate 0.0001 Epoch: 31 Global Step: 654480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:00,809-Speed 6305.62 samples/sec Loss 3.2023 LearningRate 0.0001 Epoch: 31 Global Step: 654490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:04,059-Speed 6303.25 samples/sec Loss 3.1729 LearningRate 0.0001 Epoch: 31 Global 
Step: 654500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:07,290-Speed 6340.99 samples/sec Loss 3.2223 LearningRate 0.0001 Epoch: 31 Global Step: 654510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:10,536-Speed 6309.18 samples/sec Loss 3.2664 LearningRate 0.0001 Epoch: 31 Global Step: 654520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:13,802-Speed 6272.66 samples/sec Loss 3.1579 LearningRate 0.0001 Epoch: 31 Global Step: 654530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:17,054-Speed 6299.50 samples/sec Loss 3.2078 LearningRate 0.0001 Epoch: 31 Global Step: 654540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:20,304-Speed 6302.97 samples/sec Loss 3.2009 LearningRate 0.0001 Epoch: 31 Global Step: 654550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:23,547-Speed 6316.56 samples/sec Loss 3.1987 LearningRate 0.0001 Epoch: 31 Global Step: 654560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:26,790-Speed 6315.66 samples/sec Loss 3.1618 LearningRate 0.0001 Epoch: 31 Global Step: 654570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:30,039-Speed 6305.81 samples/sec Loss 3.1727 LearningRate 0.0001 Epoch: 31 Global Step: 654580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:33,285-Speed 6310.75 samples/sec Loss 3.1887 LearningRate 0.0001 Epoch: 31 Global Step: 654590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:36,533-Speed 6306.13 samples/sec Loss 3.1678 LearningRate 0.0001 Epoch: 31 Global Step: 654600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:39,761-Speed 6346.11 samples/sec Loss 3.1926 LearningRate 0.0001 Epoch: 31 Global Step: 654610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:43,006-Speed 6311.76 samples/sec Loss 3.1798 LearningRate 0.0001 Epoch: 31 Global Step: 654620 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:16:46,253-Speed 6309.15 samples/sec Loss 3.1541 LearningRate 0.0001 Epoch: 31 Global Step: 654630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:49,503-Speed 6304.23 samples/sec Loss 3.1492 LearningRate 0.0001 Epoch: 31 Global Step: 654640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:52,747-Speed 6314.23 samples/sec Loss 3.2109 LearningRate 0.0001 Epoch: 31 Global Step: 654650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:16:55,987-Speed 6321.26 samples/sec Loss 3.2406 LearningRate 0.0001 Epoch: 31 Global Step: 654660 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:16:59,233-Speed 6311.13 samples/sec Loss 3.1468 LearningRate 0.0001 Epoch: 31 Global Step: 654670 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:02,497-Speed 6276.85 samples/sec Loss 3.1615 LearningRate 0.0001 Epoch: 31 Global Step: 654680 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:05,744-Speed 6308.48 samples/sec Loss 3.1647 LearningRate 0.0001 Epoch: 31 Global Step: 654690 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:08,993-Speed 6306.12 samples/sec Loss 3.1966 LearningRate 0.0001 Epoch: 31 Global Step: 654700 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:12,236-Speed 6315.10 samples/sec Loss 3.1801 LearningRate 0.0001 Epoch: 31 Global Step: 654710 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:15,481-Speed 6313.77 samples/sec Loss 3.2002 LearningRate 0.0001 Epoch: 31 Global Step: 654720 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:18,729-Speed 6307.72 samples/sec Loss 3.2388 LearningRate 0.0001 Epoch: 31 Global Step: 654730 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:21,974-Speed 6311.22 samples/sec Loss 3.2158 LearningRate 0.0001 Epoch: 31 Global Step: 654740 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 
03:17:25,221-Speed 6309.04 samples/sec Loss 3.1258 LearningRate 0.0001 Epoch: 31 Global Step: 654750 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:28,468-Speed 6310.58 samples/sec Loss 3.1850 LearningRate 0.0001 Epoch: 31 Global Step: 654760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:17:31,705-Speed 6326.96 samples/sec Loss 3.2758 LearningRate 0.0001 Epoch: 31 Global Step: 654770 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:34,951-Speed 6311.86 samples/sec Loss 3.1621 LearningRate 0.0001 Epoch: 31 Global Step: 654780 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:38,203-Speed 6298.56 samples/sec Loss 3.1284 LearningRate 0.0001 Epoch: 31 Global Step: 654790 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:41,462-Speed 6285.10 samples/sec Loss 3.1869 LearningRate 0.0001 Epoch: 31 Global Step: 654800 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:44,707-Speed 6312.08 samples/sec Loss 3.2806 LearningRate 0.0001 Epoch: 31 Global Step: 654810 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:47,953-Speed 6311.65 samples/sec Loss 3.1940 LearningRate 0.0001 Epoch: 31 Global Step: 654820 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:51,203-Speed 6303.32 samples/sec Loss 3.2016 LearningRate 0.0001 Epoch: 31 Global Step: 654830 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:54,448-Speed 6311.76 samples/sec Loss 3.1779 LearningRate 0.0001 Epoch: 31 Global Step: 654840 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:17:57,693-Speed 6312.14 samples/sec Loss 3.2475 LearningRate 0.0001 Epoch: 31 Global Step: 654850 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:18:00,939-Speed 6312.11 samples/sec Loss 3.1706 LearningRate 0.0001 Epoch: 31 Global Step: 654860 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:18:04,194-Speed 6293.07 samples/sec Loss 
3.2033 LearningRate 0.0001 Epoch: 31 Global Step: 654870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:07,444-Speed 6302.76 samples/sec Loss 3.1642 LearningRate 0.0001 Epoch: 31 Global Step: 654880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:10,693-Speed 6304.68 samples/sec Loss 3.1812 LearningRate 0.0001 Epoch: 31 Global Step: 654890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:13,940-Speed 6307.57 samples/sec Loss 3.2410 LearningRate 0.0001 Epoch: 31 Global Step: 654900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:17,189-Speed 6305.09 samples/sec Loss 3.1739 LearningRate 0.0001 Epoch: 31 Global Step: 654910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:20,435-Speed 6311.04 samples/sec Loss 3.1761 LearningRate 0.0001 Epoch: 31 Global Step: 654920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:23,678-Speed 6316.48 samples/sec Loss 3.2397 LearningRate 0.0001 Epoch: 31 Global Step: 654930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:26,929-Speed 6302.25 samples/sec Loss 3.1431 LearningRate 0.0001 Epoch: 31 Global Step: 654940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:30,175-Speed 6309.13 samples/sec Loss 3.1942 LearningRate 0.0001 Epoch: 31 Global Step: 654950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:33,422-Speed 6309.72 samples/sec Loss 3.1578 LearningRate 0.0001 Epoch: 31 Global Step: 654960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:36,654-Speed 6339.94 samples/sec Loss 3.2056 LearningRate 0.0001 Epoch: 31 Global Step: 654970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:39,899-Speed 6311.36 samples/sec Loss 3.1740 LearningRate 0.0001 Epoch: 31 Global Step: 654980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:43,146-Speed 6308.66 samples/sec Loss 3.2068 LearningRate 0.0001 Epoch: 31 Global 
Step: 654990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:46,389-Speed 6317.76 samples/sec Loss 3.1675 LearningRate 0.0001 Epoch: 31 Global Step: 655000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:49,638-Speed 6304.79 samples/sec Loss 3.2593 LearningRate 0.0001 Epoch: 31 Global Step: 655010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:52,885-Speed 6307.39 samples/sec Loss 3.1387 LearningRate 0.0001 Epoch: 31 Global Step: 655020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:56,133-Speed 6308.21 samples/sec Loss 3.2434 LearningRate 0.0001 Epoch: 31 Global Step: 655030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:18:59,379-Speed 6309.85 samples/sec Loss 3.1986 LearningRate 0.0001 Epoch: 31 Global Step: 655040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:02,622-Speed 6317.66 samples/sec Loss 3.1967 LearningRate 0.0001 Epoch: 31 Global Step: 655050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:05,878-Speed 6291.10 samples/sec Loss 3.1846 LearningRate 0.0001 Epoch: 31 Global Step: 655060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:09,129-Speed 6299.52 samples/sec Loss 3.1739 LearningRate 0.0001 Epoch: 31 Global Step: 655070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:12,379-Speed 6303.77 samples/sec Loss 3.1781 LearningRate 0.0001 Epoch: 31 Global Step: 655080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:15,624-Speed 6312.78 samples/sec Loss 3.1734 LearningRate 0.0001 Epoch: 31 Global Step: 655090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:18,883-Speed 6285.38 samples/sec Loss 3.2426 LearningRate 0.0001 Epoch: 31 Global Step: 655100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:22,138-Speed 6293.92 samples/sec Loss 3.2271 LearningRate 0.0001 Epoch: 31 Global Step: 655110 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:19:25,383-Speed 6311.13 samples/sec Loss 3.1711 LearningRate 0.0001 Epoch: 31 Global Step: 655120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:28,661-Speed 6249.30 samples/sec Loss 3.1632 LearningRate 0.0001 Epoch: 31 Global Step: 655130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:31,911-Speed 6302.98 samples/sec Loss 3.1901 LearningRate 0.0001 Epoch: 31 Global Step: 655140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:35,167-Speed 6291.80 samples/sec Loss 3.1917 LearningRate 0.0001 Epoch: 31 Global Step: 655150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:38,424-Speed 6289.19 samples/sec Loss 3.1701 LearningRate 0.0001 Epoch: 31 Global Step: 655160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:41,715-Speed 6224.00 samples/sec Loss 3.1685 LearningRate 0.0001 Epoch: 31 Global Step: 655170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:19:44,944-Speed 6344.19 samples/sec Loss 3.1337 LearningRate 0.0001 Epoch: 31 Global Step: 655180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:48,189-Speed 6314.72 samples/sec Loss 3.2194 LearningRate 0.0001 Epoch: 31 Global Step: 655190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:51,439-Speed 6302.05 samples/sec Loss 3.1927 LearningRate 0.0001 Epoch: 31 Global Step: 655200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:54,686-Speed 6308.20 samples/sec Loss 3.1973 LearningRate 0.0001 Epoch: 31 Global Step: 655210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:19:57,931-Speed 6314.70 samples/sec Loss 3.1573 LearningRate 0.0001 Epoch: 31 Global Step: 655220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:01,204-Speed 6257.05 samples/sec Loss 3.1982 LearningRate 0.0001 Epoch: 31 Global Step: 655230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:20:04,453-Speed 6305.15 samples/sec Loss 3.1984 LearningRate 0.0001 Epoch: 31 Global Step: 655240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:07,700-Speed 6308.77 samples/sec Loss 3.2292 LearningRate 0.0001 Epoch: 31 Global Step: 655250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:10,956-Speed 6292.02 samples/sec Loss 3.1634 LearningRate 0.0001 Epoch: 31 Global Step: 655260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:14,212-Speed 6291.13 samples/sec Loss 3.1803 LearningRate 0.0001 Epoch: 31 Global Step: 655270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:17,446-Speed 6333.47 samples/sec Loss 3.2383 LearningRate 0.0001 Epoch: 31 Global Step: 655280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:20,704-Speed 6288.64 samples/sec Loss 3.1660 LearningRate 0.0001 Epoch: 31 Global Step: 655290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:23,946-Speed 6318.44 samples/sec Loss 3.1611 LearningRate 0.0001 Epoch: 31 Global Step: 655300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:27,197-Speed 6301.10 samples/sec Loss 3.1890 LearningRate 0.0001 Epoch: 31 Global Step: 655310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:30,442-Speed 6311.71 samples/sec Loss 3.1321 LearningRate 0.0001 Epoch: 31 Global Step: 655320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:33,687-Speed 6312.69 samples/sec Loss 3.1339 LearningRate 0.0001 Epoch: 31 Global Step: 655330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:36,932-Speed 6313.83 samples/sec Loss 3.2435 LearningRate 0.0001 Epoch: 31 Global Step: 655340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:40,178-Speed 6310.45 samples/sec Loss 3.1910 LearningRate 0.0001 Epoch: 31 Global Step: 655350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:43,421-Speed 6316.59 samples/sec Loss 
3.2183 LearningRate 0.0001 Epoch: 31 Global Step: 655360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:46,679-Speed 6286.81 samples/sec Loss 3.1783 LearningRate 0.0001 Epoch: 31 Global Step: 655370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:49,907-Speed 6346.60 samples/sec Loss 3.1353 LearningRate 0.0001 Epoch: 31 Global Step: 655380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:53,153-Speed 6310.44 samples/sec Loss 3.2127 LearningRate 0.0001 Epoch: 31 Global Step: 655390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:56,400-Speed 6308.71 samples/sec Loss 3.1904 LearningRate 0.0001 Epoch: 31 Global Step: 655400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:20:59,651-Speed 6301.19 samples/sec Loss 3.2210 LearningRate 0.0001 Epoch: 31 Global Step: 655410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:02,903-Speed 6298.12 samples/sec Loss 3.2078 LearningRate 0.0001 Epoch: 31 Global Step: 655420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:06,156-Speed 6297.74 samples/sec Loss 3.2081 LearningRate 0.0001 Epoch: 31 Global Step: 655430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:09,400-Speed 6316.46 samples/sec Loss 3.2042 LearningRate 0.0001 Epoch: 31 Global Step: 655440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:12,643-Speed 6316.37 samples/sec Loss 3.2086 LearningRate 0.0001 Epoch: 31 Global Step: 655450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:15,887-Speed 6314.35 samples/sec Loss 3.2013 LearningRate 0.0001 Epoch: 31 Global Step: 655460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:19,139-Speed 6299.06 samples/sec Loss 3.1392 LearningRate 0.0001 Epoch: 31 Global Step: 655470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:22,369-Speed 6340.63 samples/sec Loss 3.1762 LearningRate 0.0001 Epoch: 31 Global 
Step: 655480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:25,626-Speed 6290.05 samples/sec Loss 3.1915 LearningRate 0.0001 Epoch: 31 Global Step: 655490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:28,876-Speed 6303.82 samples/sec Loss 3.1784 LearningRate 0.0001 Epoch: 31 Global Step: 655500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:32,128-Speed 6298.95 samples/sec Loss 3.1976 LearningRate 0.0001 Epoch: 31 Global Step: 655510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:35,371-Speed 6316.34 samples/sec Loss 3.2300 LearningRate 0.0001 Epoch: 31 Global Step: 655520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:38,616-Speed 6311.72 samples/sec Loss 3.1654 LearningRate 0.0001 Epoch: 31 Global Step: 655530 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:41,861-Speed 6312.26 samples/sec Loss 3.1943 LearningRate 0.0001 Epoch: 31 Global Step: 655540 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:45,106-Speed 6313.06 samples/sec Loss 3.1863 LearningRate 0.0001 Epoch: 31 Global Step: 655550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:48,353-Speed 6308.29 samples/sec Loss 3.2329 LearningRate 0.0001 Epoch: 31 Global Step: 655560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:51,608-Speed 6294.47 samples/sec Loss 3.1719 LearningRate 0.0001 Epoch: 31 Global Step: 655570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:54,838-Speed 6341.66 samples/sec Loss 3.2672 LearningRate 0.0001 Epoch: 31 Global Step: 655580 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:21:58,082-Speed 6314.51 samples/sec Loss 3.1498 LearningRate 0.0001 Epoch: 31 Global Step: 655590 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:01,331-Speed 6304.42 samples/sec Loss 3.2711 LearningRate 0.0001 Epoch: 31 Global Step: 655600 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:22:04,575-Speed 6314.53 samples/sec Loss 3.2108 LearningRate 0.0001 Epoch: 31 Global Step: 655610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:07,827-Speed 6300.55 samples/sec Loss 3.2292 LearningRate 0.0001 Epoch: 31 Global Step: 655620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:11,071-Speed 6313.94 samples/sec Loss 3.1680 LearningRate 0.0001 Epoch: 31 Global Step: 655630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:14,318-Speed 6308.04 samples/sec Loss 3.2065 LearningRate 0.0001 Epoch: 31 Global Step: 655640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:17,568-Speed 6303.95 samples/sec Loss 3.2387 LearningRate 0.0001 Epoch: 31 Global Step: 655650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:20,814-Speed 6311.69 samples/sec Loss 3.2462 LearningRate 0.0001 Epoch: 31 Global Step: 655660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:24,061-Speed 6308.37 samples/sec Loss 3.2194 LearningRate 0.0001 Epoch: 31 Global Step: 655670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:27,309-Speed 6306.78 samples/sec Loss 3.1363 LearningRate 0.0001 Epoch: 31 Global Step: 655680 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:22:30,547-Speed 6325.26 samples/sec Loss 3.1592 LearningRate 0.0001 Epoch: 31 Global Step: 655690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:33,801-Speed 6296.69 samples/sec Loss 3.1807 LearningRate 0.0001 Epoch: 31 Global Step: 655700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:37,047-Speed 6309.74 samples/sec Loss 3.2005 LearningRate 0.0001 Epoch: 31 Global Step: 655710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:40,291-Speed 6314.84 samples/sec Loss 3.2403 LearningRate 0.0001 Epoch: 31 Global Step: 655720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:22:43,536-Speed 6312.36 samples/sec Loss 3.2186 LearningRate 0.0001 Epoch: 31 Global Step: 655730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:46,780-Speed 6314.48 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 31 Global Step: 655740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:50,028-Speed 6306.60 samples/sec Loss 3.1841 LearningRate 0.0001 Epoch: 31 Global Step: 655750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:53,271-Speed 6317.89 samples/sec Loss 3.2387 LearningRate 0.0001 Epoch: 31 Global Step: 655760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:56,529-Speed 6287.45 samples/sec Loss 3.2097 LearningRate 0.0001 Epoch: 31 Global Step: 655770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:22:59,781-Speed 6298.43 samples/sec Loss 3.1875 LearningRate 0.0001 Epoch: 31 Global Step: 655780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:03,013-Speed 6338.44 samples/sec Loss 3.2036 LearningRate 0.0001 Epoch: 31 Global Step: 655790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:06,260-Speed 6309.32 samples/sec Loss 3.1669 LearningRate 0.0001 Epoch: 31 Global Step: 655800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:09,513-Speed 6296.44 samples/sec Loss 3.1908 LearningRate 0.0001 Epoch: 31 Global Step: 655810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:12,817-Speed 6199.13 samples/sec Loss 3.1502 LearningRate 0.0001 Epoch: 31 Global Step: 655820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:16,065-Speed 6306.78 samples/sec Loss 3.2239 LearningRate 0.0001 Epoch: 31 Global Step: 655830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:19,309-Speed 6315.09 samples/sec Loss 3.1678 LearningRate 0.0001 Epoch: 31 Global Step: 655840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:22,567-Speed 6286.68 samples/sec Loss 
3.1380 LearningRate 0.0001 Epoch: 31 Global Step: 655850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:25,817-Speed 6303.11 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 31 Global Step: 655860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:29,074-Speed 6290.42 samples/sec Loss 3.2021 LearningRate 0.0001 Epoch: 31 Global Step: 655870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:32,327-Speed 6297.00 samples/sec Loss 3.1675 LearningRate 0.0001 Epoch: 31 Global Step: 655880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:35,563-Speed 6330.28 samples/sec Loss 3.1763 LearningRate 0.0001 Epoch: 31 Global Step: 655890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:38,808-Speed 6312.52 samples/sec Loss 3.2039 LearningRate 0.0001 Epoch: 31 Global Step: 655900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:42,055-Speed 6308.74 samples/sec Loss 3.2231 LearningRate 0.0001 Epoch: 31 Global Step: 655910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:45,310-Speed 6293.82 samples/sec Loss 3.2483 LearningRate 0.0001 Epoch: 31 Global Step: 655920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:48,560-Speed 6303.10 samples/sec Loss 3.1673 LearningRate 0.0001 Epoch: 31 Global Step: 655930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:51,808-Speed 6307.47 samples/sec Loss 3.2196 LearningRate 0.0001 Epoch: 31 Global Step: 655940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:55,055-Speed 6308.68 samples/sec Loss 3.2542 LearningRate 0.0001 Epoch: 31 Global Step: 655950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:23:58,302-Speed 6308.39 samples/sec Loss 3.2064 LearningRate 0.0001 Epoch: 31 Global Step: 655960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:01,550-Speed 6305.53 samples/sec Loss 3.1323 LearningRate 0.0001 Epoch: 31 Global 
Step: 655970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:04,797-Speed 6309.44 samples/sec Loss 3.1579 LearningRate 0.0001 Epoch: 31 Global Step: 655980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:08,035-Speed 6326.17 samples/sec Loss 3.1914 LearningRate 0.0001 Epoch: 31 Global Step: 655990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:11,289-Speed 6295.23 samples/sec Loss 3.2092 LearningRate 0.0001 Epoch: 31 Global Step: 656000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:14,534-Speed 6312.22 samples/sec Loss 3.1700 LearningRate 0.0001 Epoch: 31 Global Step: 656010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:17,782-Speed 6308.26 samples/sec Loss 3.1874 LearningRate 0.0001 Epoch: 31 Global Step: 656020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:21,030-Speed 6305.84 samples/sec Loss 3.1672 LearningRate 0.0001 Epoch: 31 Global Step: 656030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:24,289-Speed 6285.90 samples/sec Loss 3.1704 LearningRate 0.0001 Epoch: 31 Global Step: 656040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:27,538-Speed 6304.58 samples/sec Loss 3.1699 LearningRate 0.0001 Epoch: 31 Global Step: 656050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:30,783-Speed 6311.89 samples/sec Loss 3.2137 LearningRate 0.0001 Epoch: 31 Global Step: 656060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:34,033-Speed 6304.54 samples/sec Loss 3.1655 LearningRate 0.0001 Epoch: 31 Global Step: 656070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:37,281-Speed 6306.54 samples/sec Loss 3.2046 LearningRate 0.0001 Epoch: 31 Global Step: 656080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:40,516-Speed 6332.56 samples/sec Loss 3.1937 LearningRate 0.0001 Epoch: 31 Global Step: 656090 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:24:43,767-Speed 6301.38 samples/sec Loss 3.1884 LearningRate 0.0001 Epoch: 31 Global Step: 656100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:47,015-Speed 6306.79 samples/sec Loss 3.1876 LearningRate 0.0001 Epoch: 31 Global Step: 656110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:50,267-Speed 6298.49 samples/sec Loss 3.2105 LearningRate 0.0001 Epoch: 31 Global Step: 656120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:53,515-Speed 6308.18 samples/sec Loss 3.1673 LearningRate 0.0001 Epoch: 31 Global Step: 656130 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:24:56,762-Speed 6308.30 samples/sec Loss 3.2054 LearningRate 0.0001 Epoch: 31 Global Step: 656140 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:00,006-Speed 6313.70 samples/sec Loss 3.1731 LearningRate 0.0001 Epoch: 31 Global Step: 656150 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:03,252-Speed 6310.17 samples/sec Loss 3.2224 LearningRate 0.0001 Epoch: 31 Global Step: 656160 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:06,502-Speed 6303.48 samples/sec Loss 3.1914 LearningRate 0.0001 Epoch: 31 Global Step: 656170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:09,751-Speed 6305.80 samples/sec Loss 3.1351 LearningRate 0.0001 Epoch: 31 Global Step: 656180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:12,983-Speed 6336.66 samples/sec Loss 3.2409 LearningRate 0.0001 Epoch: 31 Global Step: 656190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:16,235-Speed 6299.17 samples/sec Loss 3.1829 LearningRate 0.0001 Epoch: 31 Global Step: 656200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:19,486-Speed 6300.76 samples/sec Loss 3.1466 LearningRate 0.0001 Epoch: 31 Global Step: 656210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:25:22,738-Speed 6299.75 samples/sec Loss 3.2009 LearningRate 0.0001 Epoch: 31 Global Step: 656220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:25,983-Speed 6312.12 samples/sec Loss 3.1988 LearningRate 0.0001 Epoch: 31 Global Step: 656230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:29,232-Speed 6305.41 samples/sec Loss 3.2496 LearningRate 0.0001 Epoch: 31 Global Step: 656240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:32,482-Speed 6302.21 samples/sec Loss 3.2106 LearningRate 0.0001 Epoch: 31 Global Step: 656250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:35,730-Speed 6306.78 samples/sec Loss 3.1601 LearningRate 0.0001 Epoch: 31 Global Step: 656260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:38,976-Speed 6310.75 samples/sec Loss 3.1934 LearningRate 0.0001 Epoch: 31 Global Step: 656270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:42,221-Speed 6312.73 samples/sec Loss 3.1661 LearningRate 0.0001 Epoch: 31 Global Step: 656280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:45,453-Speed 6338.33 samples/sec Loss 3.2355 LearningRate 0.0001 Epoch: 31 Global Step: 656290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:48,696-Speed 6317.49 samples/sec Loss 3.2475 LearningRate 0.0001 Epoch: 31 Global Step: 656300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:51,945-Speed 6305.46 samples/sec Loss 3.1904 LearningRate 0.0001 Epoch: 31 Global Step: 656310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:55,197-Speed 6300.00 samples/sec Loss 3.1927 LearningRate 0.0001 Epoch: 31 Global Step: 656320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:25:58,441-Speed 6313.13 samples/sec Loss 3.2147 LearningRate 0.0001 Epoch: 31 Global Step: 656330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:01,688-Speed 6308.61 samples/sec Loss 
3.1107 LearningRate 0.0001 Epoch: 31 Global Step: 656340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:04,935-Speed 6310.18 samples/sec Loss 3.1611 LearningRate 0.0001 Epoch: 31 Global Step: 656350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:08,183-Speed 6305.92 samples/sec Loss 3.2217 LearningRate 0.0001 Epoch: 31 Global Step: 656360 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:11,430-Speed 6308.90 samples/sec Loss 3.1396 LearningRate 0.0001 Epoch: 31 Global Step: 656370 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:14,684-Speed 6295.79 samples/sec Loss 3.1378 LearningRate 0.0001 Epoch: 31 Global Step: 656380 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:17,916-Speed 6338.53 samples/sec Loss 3.2138 LearningRate 0.0001 Epoch: 31 Global Step: 656390 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:21,167-Speed 6300.91 samples/sec Loss 3.1817 LearningRate 0.0001 Epoch: 31 Global Step: 656400 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:24,413-Speed 6309.47 samples/sec Loss 3.1890 LearningRate 0.0001 Epoch: 31 Global Step: 656410 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:27,659-Speed 6311.27 samples/sec Loss 3.2407 LearningRate 0.0001 Epoch: 31 Global Step: 656420 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:30,902-Speed 6316.47 samples/sec Loss 3.1897 LearningRate 0.0001 Epoch: 31 Global Step: 656430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:34,150-Speed 6307.89 samples/sec Loss 3.0837 LearningRate 0.0001 Epoch: 31 Global Step: 656440 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:37,392-Speed 6316.60 samples/sec Loss 3.2470 LearningRate 0.0001 Epoch: 31 Global Step: 656450 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:40,642-Speed 6305.17 samples/sec Loss 3.1873 LearningRate 0.0001 Epoch: 31 Global 
Step: 656460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:43,887-Speed 6311.68 samples/sec Loss 3.2341 LearningRate 0.0001 Epoch: 31 Global Step: 656470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:47,145-Speed 6286.64 samples/sec Loss 3.2047 LearningRate 0.0001 Epoch: 31 Global Step: 656480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:26:50,393-Speed 6306.59 samples/sec Loss 3.1830 LearningRate 0.0001 Epoch: 31 Global Step: 656490 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:26:53,612-Speed 6363.65 samples/sec Loss 3.1672 LearningRate 0.0001 Epoch: 31 Global Step: 656500 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:26:56,853-Speed 6321.06 samples/sec Loss 3.1590 LearningRate 0.0001 Epoch: 31 Global Step: 656510 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:00,102-Speed 6305.57 samples/sec Loss 3.1506 LearningRate 0.0001 Epoch: 31 Global Step: 656520 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:03,348-Speed 6310.41 samples/sec Loss 3.2254 LearningRate 0.0001 Epoch: 31 Global Step: 656530 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:06,593-Speed 6314.09 samples/sec Loss 3.1996 LearningRate 0.0001 Epoch: 31 Global Step: 656540 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:09,838-Speed 6311.64 samples/sec Loss 3.1922 LearningRate 0.0001 Epoch: 31 Global Step: 656550 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:13,087-Speed 6305.33 samples/sec Loss 3.2092 LearningRate 0.0001 Epoch: 31 Global Step: 656560 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:16,335-Speed 6307.29 samples/sec Loss 3.1551 LearningRate 0.0001 Epoch: 31 Global Step: 656570 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:19,587-Speed 6298.91 samples/sec Loss 3.2553 LearningRate 0.0001 Epoch: 31 Global Step: 656580 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 03:27:22,833-Speed 6309.75 samples/sec Loss 3.2223 LearningRate 0.0001 Epoch: 31 Global Step: 656590 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:27:26,084-Speed 6300.80 samples/sec Loss 3.1600 LearningRate 0.0001 Epoch: 31 Global Step: 656600 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:29,328-Speed 6314.94 samples/sec Loss 3.1340 LearningRate 0.0001 Epoch: 31 Global Step: 656610 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:32,580-Speed 6298.59 samples/sec Loss 3.2239 LearningRate 0.0001 Epoch: 31 Global Step: 656620 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:35,826-Speed 6310.43 samples/sec Loss 3.1865 LearningRate 0.0001 Epoch: 31 Global Step: 656630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:39,078-Speed 6299.69 samples/sec Loss 3.1898 LearningRate 0.0001 Epoch: 31 Global Step: 656640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:42,324-Speed 6311.40 samples/sec Loss 3.2337 LearningRate 0.0001 Epoch: 31 Global Step: 656650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:45,569-Speed 6313.06 samples/sec Loss 3.1730 LearningRate 0.0001 Epoch: 31 Global Step: 656660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:48,824-Speed 6291.88 samples/sec Loss 3.1539 LearningRate 0.0001 Epoch: 31 Global Step: 656670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:52,074-Speed 6304.58 samples/sec Loss 3.2275 LearningRate 0.0001 Epoch: 31 Global Step: 656680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:55,316-Speed 6316.51 samples/sec Loss 3.2191 LearningRate 0.0001 Epoch: 31 Global Step: 656690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:27:58,551-Speed 6332.37 samples/sec Loss 3.1822 LearningRate 0.0001 Epoch: 31 Global Step: 656700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:28:01,831-Speed 6245.86 samples/sec Loss 3.1854 LearningRate 0.0001 Epoch: 31 Global Step: 656710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:05,078-Speed 6308.46 samples/sec Loss 3.1664 LearningRate 0.0001 Epoch: 31 Global Step: 656720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:08,324-Speed 6309.98 samples/sec Loss 3.1788 LearningRate 0.0001 Epoch: 31 Global Step: 656730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:11,568-Speed 6316.29 samples/sec Loss 3.2179 LearningRate 0.0001 Epoch: 31 Global Step: 656740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:14,816-Speed 6307.25 samples/sec Loss 3.1516 LearningRate 0.0001 Epoch: 31 Global Step: 656750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:18,063-Speed 6308.40 samples/sec Loss 3.2368 LearningRate 0.0001 Epoch: 31 Global Step: 656760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:21,315-Speed 6298.62 samples/sec Loss 3.1825 LearningRate 0.0001 Epoch: 31 Global Step: 656770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:24,565-Speed 6304.42 samples/sec Loss 3.1760 LearningRate 0.0001 Epoch: 31 Global Step: 656780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:27,812-Speed 6307.42 samples/sec Loss 3.2568 LearningRate 0.0001 Epoch: 31 Global Step: 656790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:31,045-Speed 6336.36 samples/sec Loss 3.2170 LearningRate 0.0001 Epoch: 31 Global Step: 656800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:34,294-Speed 6306.27 samples/sec Loss 3.1626 LearningRate 0.0001 Epoch: 31 Global Step: 656810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:37,543-Speed 6303.96 samples/sec Loss 3.1808 LearningRate 0.0001 Epoch: 31 Global Step: 656820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:40,792-Speed 6305.12 samples/sec Loss 
3.1959 LearningRate 0.0001 Epoch: 31 Global Step: 656830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:44,040-Speed 6306.37 samples/sec Loss 3.1703 LearningRate 0.0001 Epoch: 31 Global Step: 656840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:47,284-Speed 6313.94 samples/sec Loss 3.1153 LearningRate 0.0001 Epoch: 31 Global Step: 656850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:50,535-Speed 6300.77 samples/sec Loss 3.1855 LearningRate 0.0001 Epoch: 31 Global Step: 656860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:53,791-Speed 6291.47 samples/sec Loss 3.2112 LearningRate 0.0001 Epoch: 31 Global Step: 656870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:28:57,036-Speed 6313.43 samples/sec Loss 3.1759 LearningRate 0.0001 Epoch: 31 Global Step: 656880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:00,288-Speed 6298.57 samples/sec Loss 3.1745 LearningRate 0.0001 Epoch: 31 Global Step: 656890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:03,515-Speed 6347.27 samples/sec Loss 3.1905 LearningRate 0.0001 Epoch: 31 Global Step: 656900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:06,760-Speed 6313.23 samples/sec Loss 3.1613 LearningRate 0.0001 Epoch: 31 Global Step: 656910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:10,004-Speed 6315.87 samples/sec Loss 3.1324 LearningRate 0.0001 Epoch: 31 Global Step: 656920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:13,260-Speed 6290.17 samples/sec Loss 3.1841 LearningRate 0.0001 Epoch: 31 Global Step: 656930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:16,507-Speed 6309.63 samples/sec Loss 3.1545 LearningRate 0.0001 Epoch: 31 Global Step: 656940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:19,757-Speed 6301.98 samples/sec Loss 3.1600 LearningRate 0.0001 Epoch: 31 Global 
Step: 656950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:23,009-Speed 6300.15 samples/sec Loss 3.2303 LearningRate 0.0001 Epoch: 31 Global Step: 656960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:26,257-Speed 6306.79 samples/sec Loss 3.1894 LearningRate 0.0001 Epoch: 31 Global Step: 656970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:29,499-Speed 6317.84 samples/sec Loss 3.1772 LearningRate 0.0001 Epoch: 31 Global Step: 656980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:32,744-Speed 6313.22 samples/sec Loss 3.1481 LearningRate 0.0001 Epoch: 31 Global Step: 656990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:35,979-Speed 6332.11 samples/sec Loss 3.1931 LearningRate 0.0001 Epoch: 31 Global Step: 657000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:39,231-Speed 6300.74 samples/sec Loss 3.1906 LearningRate 0.0001 Epoch: 31 Global Step: 657010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:42,474-Speed 6316.42 samples/sec Loss 3.1929 LearningRate 0.0001 Epoch: 31 Global Step: 657020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:45,721-Speed 6307.56 samples/sec Loss 3.1605 LearningRate 0.0001 Epoch: 31 Global Step: 657030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:48,968-Speed 6307.71 samples/sec Loss 3.2138 LearningRate 0.0001 Epoch: 31 Global Step: 657040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:52,220-Speed 6300.16 samples/sec Loss 3.2178 LearningRate 0.0001 Epoch: 31 Global Step: 657050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:55,461-Speed 6320.05 samples/sec Loss 3.1781 LearningRate 0.0001 Epoch: 31 Global Step: 657060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:29:58,711-Speed 6302.82 samples/sec Loss 3.2111 LearningRate 0.0001 Epoch: 31 Global Step: 657070 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:30:01,946-Speed 6331.48 samples/sec Loss 3.1833 LearningRate 0.0001 Epoch: 31 Global Step: 657080 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:05,194-Speed 6307.45 samples/sec Loss 3.1887 LearningRate 0.0001 Epoch: 31 Global Step: 657090 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:08,440-Speed 6309.98 samples/sec Loss 3.2196 LearningRate 0.0001 Epoch: 31 Global Step: 657100 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:11,684-Speed 6315.20 samples/sec Loss 3.2025 LearningRate 0.0001 Epoch: 31 Global Step: 657110 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:14,942-Speed 6287.89 samples/sec Loss 3.2076 LearningRate 0.0001 Epoch: 31 Global Step: 657120 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:18,190-Speed 6306.83 samples/sec Loss 3.1989 LearningRate 0.0001 Epoch: 31 Global Step: 657130 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:21,440-Speed 6301.96 samples/sec Loss 3.1486 LearningRate 0.0001 Epoch: 31 Global Step: 657140 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:24,688-Speed 6308.16 samples/sec Loss 3.1747 LearningRate 0.0001 Epoch: 31 Global Step: 657150 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:27,932-Speed 6313.56 samples/sec Loss 3.1637 LearningRate 0.0001 Epoch: 31 Global Step: 657160 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:31,174-Speed 6317.61 samples/sec Loss 3.0965 LearningRate 0.0001 Epoch: 31 Global Step: 657170 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:34,419-Speed 6313.36 samples/sec Loss 3.1675 LearningRate 0.0001 Epoch: 31 Global Step: 657180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:30:37,665-Speed 6312.73 samples/sec Loss 3.1808 LearningRate 0.0001 Epoch: 31 Global Step: 657190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:30:40,924-Speed 6284.38 samples/sec Loss 3.1704 LearningRate 0.0001 Epoch: 31 Global Step: 657200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:30:44,189-Speed 6275.30 samples/sec Loss 3.1408 LearningRate 0.0001 Epoch: 31 Global Step: 657210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:30:47,435-Speed 6309.62 samples/sec Loss 3.1774 LearningRate 0.0001 Epoch: 31 Global Step: 657220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:30:50,666-Speed 6340.76 samples/sec Loss 3.1749 LearningRate 0.0001 Epoch: 31 Global Step: 657230 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:53,909-Speed 6315.75 samples/sec Loss 3.1956 LearningRate 0.0001 Epoch: 31 Global Step: 657240 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:30:57,159-Speed 6302.71 samples/sec Loss 3.1507 LearningRate 0.0001 Epoch: 31 Global Step: 657250 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:00,407-Speed 6306.43 samples/sec Loss 3.1944 LearningRate 0.0001 Epoch: 31 Global Step: 657260 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:03,656-Speed 6305.07 samples/sec Loss 3.2069 LearningRate 0.0001 Epoch: 31 Global Step: 657270 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:06,901-Speed 6312.68 samples/sec Loss 3.1506 LearningRate 0.0001 Epoch: 31 Global Step: 657280 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:10,151-Speed 6303.85 samples/sec Loss 3.1876 LearningRate 0.0001 Epoch: 31 Global Step: 657290 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:13,395-Speed 6313.71 samples/sec Loss 3.1920 LearningRate 0.0001 Epoch: 31 Global Step: 657300 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:16,642-Speed 6309.77 samples/sec Loss 3.1860 LearningRate 0.0001 Epoch: 31 Global Step: 657310 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:19,886-Speed 6313.59 samples/sec Loss 
3.1706 LearningRate 0.0001 Epoch: 31 Global Step: 657320 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:23,134-Speed 6308.24 samples/sec Loss 3.1432 LearningRate 0.0001 Epoch: 31 Global Step: 657330 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:31:26,385-Speed 6300.58 samples/sec Loss 3.1692 LearningRate 0.0001 Epoch: 31 Global Step: 657340 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:31:29,633-Speed 6306.47 samples/sec Loss 3.1690 LearningRate 0.0001 Epoch: 31 Global Step: 657350 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:31:32,866-Speed 6335.36 samples/sec Loss 3.2016 LearningRate 0.0001 Epoch: 31 Global Step: 657360 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:36,115-Speed 6304.93 samples/sec Loss 3.2027 LearningRate 0.0001 Epoch: 31 Global Step: 657370 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:39,356-Speed 6321.67 samples/sec Loss 3.1922 LearningRate 0.0001 Epoch: 31 Global Step: 657380 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:42,606-Speed 6303.44 samples/sec Loss 3.1614 LearningRate 0.0001 Epoch: 31 Global Step: 657390 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:45,855-Speed 6303.79 samples/sec Loss 3.1986 LearningRate 0.0001 Epoch: 31 Global Step: 657400 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:49,102-Speed 6309.56 samples/sec Loss 3.1939 LearningRate 0.0001 Epoch: 31 Global Step: 657410 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:52,357-Speed 6294.04 samples/sec Loss 3.1914 LearningRate 0.0001 Epoch: 31 Global Step: 657420 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:55,602-Speed 6311.90 samples/sec Loss 3.2259 LearningRate 0.0001 Epoch: 31 Global Step: 657430 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:31:58,848-Speed 6315.24 samples/sec Loss 3.1677 LearningRate 0.0001 Epoch: 31 Global 
Step: 657440 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:02,095-Speed 6308.46 samples/sec Loss 3.2518 LearningRate 0.0001 Epoch: 31 Global Step: 657450 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:05,342-Speed 6309.01 samples/sec Loss 3.1629 LearningRate 0.0001 Epoch: 31 Global Step: 657460 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:08,589-Speed 6309.72 samples/sec Loss 3.1142 LearningRate 0.0001 Epoch: 31 Global Step: 657470 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:11,836-Speed 6307.82 samples/sec Loss 3.1496 LearningRate 0.0001 Epoch: 31 Global Step: 657480 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:15,087-Speed 6302.06 samples/sec Loss 3.1891 LearningRate 0.0001 Epoch: 31 Global Step: 657490 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:18,333-Speed 6309.66 samples/sec Loss 3.1765 LearningRate 0.0001 Epoch: 31 Global Step: 657500 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:21,577-Speed 6316.11 samples/sec Loss 3.1542 LearningRate 0.0001 Epoch: 31 Global Step: 657510 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:24,823-Speed 6310.38 samples/sec Loss 3.1844 LearningRate 0.0001 Epoch: 31 Global Step: 657520 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:32:28,053-Speed 6341.26 samples/sec Loss 3.1116 LearningRate 0.0001 Epoch: 31 Global Step: 657530 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:31,298-Speed 6313.08 samples/sec Loss 3.2098 LearningRate 0.0001 Epoch: 31 Global Step: 657540 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:34,543-Speed 6313.38 samples/sec Loss 3.1681 LearningRate 0.0001 Epoch: 31 Global Step: 657550 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:37,786-Speed 6315.21 samples/sec Loss 3.1835 LearningRate 0.0001 Epoch: 31 Global Step: 657560 Fp16 Grad Scale: 4096 
Required: 16 hours Training: 2022-04-03 03:32:41,033-Speed 6309.51 samples/sec Loss 3.1186 LearningRate 0.0001 Epoch: 31 Global Step: 657570 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:44,277-Speed 6313.50 samples/sec Loss 3.1359 LearningRate 0.0001 Epoch: 31 Global Step: 657580 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:47,519-Speed 6319.83 samples/sec Loss 3.1607 LearningRate 0.0001 Epoch: 31 Global Step: 657590 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:50,762-Speed 6315.24 samples/sec Loss 3.2097 LearningRate 0.0001 Epoch: 31 Global Step: 657600 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:54,011-Speed 6304.99 samples/sec Loss 3.1849 LearningRate 0.0001 Epoch: 31 Global Step: 657610 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:32:57,255-Speed 6315.68 samples/sec Loss 3.1783 LearningRate 0.0001 Epoch: 31 Global Step: 657620 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:33:00,503-Speed 6307.48 samples/sec Loss 3.1752 LearningRate 0.0001 Epoch: 31 Global Step: 657630 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:03,750-Speed 6309.39 samples/sec Loss 3.2293 LearningRate 0.0001 Epoch: 31 Global Step: 657640 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:06,994-Speed 6314.08 samples/sec Loss 3.1977 LearningRate 0.0001 Epoch: 31 Global Step: 657650 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:10,241-Speed 6310.50 samples/sec Loss 3.2283 LearningRate 0.0001 Epoch: 31 Global Step: 657660 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:13,485-Speed 6314.23 samples/sec Loss 3.2377 LearningRate 0.0001 Epoch: 31 Global Step: 657670 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:16,730-Speed 6312.62 samples/sec Loss 3.1758 LearningRate 0.0001 Epoch: 31 Global Step: 657680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:33:19,977-Speed 6308.07 samples/sec Loss 3.1512 LearningRate 0.0001 Epoch: 31 Global Step: 657690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:23,232-Speed 6294.37 samples/sec Loss 3.1645 LearningRate 0.0001 Epoch: 31 Global Step: 657700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:26,478-Speed 6310.16 samples/sec Loss 3.1392 LearningRate 0.0001 Epoch: 31 Global Step: 657710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:29,725-Speed 6308.40 samples/sec Loss 3.2541 LearningRate 0.0001 Epoch: 31 Global Step: 657720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:32,959-Speed 6333.23 samples/sec Loss 3.1733 LearningRate 0.0001 Epoch: 31 Global Step: 657730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:36,200-Speed 6321.99 samples/sec Loss 3.2125 LearningRate 0.0001 Epoch: 31 Global Step: 657740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:39,449-Speed 6304.45 samples/sec Loss 3.2088 LearningRate 0.0001 Epoch: 31 Global Step: 657750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:42,695-Speed 6309.64 samples/sec Loss 3.1724 LearningRate 0.0001 Epoch: 31 Global Step: 657760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:45,943-Speed 6308.43 samples/sec Loss 3.1895 LearningRate 0.0001 Epoch: 31 Global Step: 657770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:49,189-Speed 6310.67 samples/sec Loss 3.1583 LearningRate 0.0001 Epoch: 31 Global Step: 657780 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:52,437-Speed 6306.31 samples/sec Loss 3.1520 LearningRate 0.0001 Epoch: 31 Global Step: 657790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:55,682-Speed 6312.77 samples/sec Loss 3.1800 LearningRate 0.0001 Epoch: 31 Global Step: 657800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:33:58,928-Speed 6310.34 samples/sec Loss 
3.1746 LearningRate 0.0001 Epoch: 31 Global Step: 657810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:02,183-Speed 6293.05 samples/sec Loss 3.2356 LearningRate 0.0001 Epoch: 31 Global Step: 657820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:05,471-Speed 6229.45 samples/sec Loss 3.1923 LearningRate 0.0001 Epoch: 31 Global Step: 657830 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:34:08,702-Speed 6340.88 samples/sec Loss 3.2112 LearningRate 0.0001 Epoch: 31 Global Step: 657840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:11,950-Speed 6306.57 samples/sec Loss 3.1334 LearningRate 0.0001 Epoch: 31 Global Step: 657850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:15,198-Speed 6306.88 samples/sec Loss 3.1626 LearningRate 0.0001 Epoch: 31 Global Step: 657860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:18,450-Speed 6300.15 samples/sec Loss 3.2242 LearningRate 0.0001 Epoch: 31 Global Step: 657870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:21,699-Speed 6305.64 samples/sec Loss 3.2189 LearningRate 0.0001 Epoch: 31 Global Step: 657880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:24,949-Speed 6302.88 samples/sec Loss 3.1914 LearningRate 0.0001 Epoch: 31 Global Step: 657890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:28,195-Speed 6310.76 samples/sec Loss 3.1537 LearningRate 0.0001 Epoch: 31 Global Step: 657900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:31,445-Speed 6301.81 samples/sec Loss 3.1567 LearningRate 0.0001 Epoch: 31 Global Step: 657910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:34,705-Speed 6284.83 samples/sec Loss 3.1780 LearningRate 0.0001 Epoch: 31 Global Step: 657920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:37,957-Speed 6299.15 samples/sec Loss 3.1848 LearningRate 0.0001 Epoch: 31 
Global Step: 657930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:41,199-Speed 6318.58 samples/sec Loss 3.1743 LearningRate 0.0001 Epoch: 31 Global Step: 657940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:44,444-Speed 6312.33 samples/sec Loss 3.1680 LearningRate 0.0001 Epoch: 31 Global Step: 657950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:47,689-Speed 6312.00 samples/sec Loss 3.2416 LearningRate 0.0001 Epoch: 31 Global Step: 657960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:50,932-Speed 6316.63 samples/sec Loss 3.1748 LearningRate 0.0001 Epoch: 31 Global Step: 657970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:54,176-Speed 6314.74 samples/sec Loss 3.1905 LearningRate 0.0001 Epoch: 31 Global Step: 657980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:34:57,423-Speed 6308.65 samples/sec Loss 3.2206 LearningRate 0.0001 Epoch: 31 Global Step: 657990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:00,678-Speed 6293.29 samples/sec Loss 3.1833 LearningRate 0.0001 Epoch: 31 Global Step: 658000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:03,924-Speed 6311.06 samples/sec Loss 3.1740 LearningRate 0.0001 Epoch: 31 Global Step: 658010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:07,169-Speed 6311.41 samples/sec Loss 3.1741 LearningRate 0.0001 Epoch: 31 Global Step: 658020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:10,414-Speed 6314.28 samples/sec Loss 3.1785 LearningRate 0.0001 Epoch: 31 Global Step: 658030 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:13,646-Speed 6337.47 samples/sec Loss 3.2126 LearningRate 0.0001 Epoch: 31 Global Step: 658040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:16,889-Speed 6315.93 samples/sec Loss 3.2425 LearningRate 0.0001 Epoch: 31 Global Step: 658050 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:35:20,132-Speed 6317.58 samples/sec Loss 3.1290 LearningRate 0.0001 Epoch: 31 Global Step: 658060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:35:23,377-Speed 6311.81 samples/sec Loss 3.1151 LearningRate 0.0001 Epoch: 31 Global Step: 658070 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:26,624-Speed 6307.71 samples/sec Loss 3.1246 LearningRate 0.0001 Epoch: 31 Global Step: 658080 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:29,875-Speed 6302.62 samples/sec Loss 3.1717 LearningRate 0.0001 Epoch: 31 Global Step: 658090 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:33,123-Speed 6306.88 samples/sec Loss 3.1465 LearningRate 0.0001 Epoch: 31 Global Step: 658100 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:36,370-Speed 6308.82 samples/sec Loss 3.1716 LearningRate 0.0001 Epoch: 31 Global Step: 658110 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:39,616-Speed 6311.75 samples/sec Loss 3.2300 LearningRate 0.0001 Epoch: 31 Global Step: 658120 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:42,862-Speed 6311.37 samples/sec Loss 3.1992 LearningRate 0.0001 Epoch: 31 Global Step: 658130 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:46,106-Speed 6313.34 samples/sec Loss 3.1059 LearningRate 0.0001 Epoch: 31 Global Step: 658140 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:49,350-Speed 6314.28 samples/sec Loss 3.1624 LearningRate 0.0001 Epoch: 31 Global Step: 658150 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:52,596-Speed 6310.47 samples/sec Loss 3.1984 LearningRate 0.0001 Epoch: 31 Global Step: 658160 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:35:55,840-Speed 6315.78 samples/sec Loss 3.1516 LearningRate 0.0001 Epoch: 31 Global Step: 658170 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 
03:35:59,088-Speed 6306.11 samples/sec Loss 3.1703 LearningRate 0.0001 Epoch: 31 Global Step: 658180 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:02,336-Speed 6307.11 samples/sec Loss 3.1804 LearningRate 0.0001 Epoch: 31 Global Step: 658190 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:05,581-Speed 6311.51 samples/sec Loss 3.2232 LearningRate 0.0001 Epoch: 31 Global Step: 658200 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:08,828-Speed 6310.41 samples/sec Loss 3.1643 LearningRate 0.0001 Epoch: 31 Global Step: 658210 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:12,076-Speed 6306.32 samples/sec Loss 3.1753 LearningRate 0.0001 Epoch: 31 Global Step: 658220 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:15,320-Speed 6313.99 samples/sec Loss 3.1976 LearningRate 0.0001 Epoch: 31 Global Step: 658230 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:18,569-Speed 6305.87 samples/sec Loss 3.1635 LearningRate 0.0001 Epoch: 31 Global Step: 658240 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:21,815-Speed 6311.02 samples/sec Loss 3.1486 LearningRate 0.0001 Epoch: 31 Global Step: 658250 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:25,063-Speed 6306.62 samples/sec Loss 3.1683 LearningRate 0.0001 Epoch: 31 Global Step: 658260 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:28,293-Speed 6340.78 samples/sec Loss 3.1498 LearningRate 0.0001 Epoch: 31 Global Step: 658270 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:31,539-Speed 6310.64 samples/sec Loss 3.1542 LearningRate 0.0001 Epoch: 31 Global Step: 658280 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:34,787-Speed 6306.88 samples/sec Loss 3.1452 LearningRate 0.0001 Epoch: 31 Global Step: 658290 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:38,035-Speed 6306.82 samples/sec Loss 
3.2182 LearningRate 0.0001 Epoch: 31 Global Step: 658300 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:41,287-Speed 6299.91 samples/sec Loss 3.2373 LearningRate 0.0001 Epoch: 31 Global Step: 658310 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:44,530-Speed 6315.74 samples/sec Loss 3.2459 LearningRate 0.0001 Epoch: 31 Global Step: 658320 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:36:47,760-Speed 6343.38 samples/sec Loss 3.1766 LearningRate 0.0001 Epoch: 31 Global Step: 658330 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:36:51,008-Speed 6307.43 samples/sec Loss 3.1935 LearningRate 0.0001 Epoch: 31 Global Step: 658340 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:36:54,250-Speed 6317.93 samples/sec Loss 3.1607 LearningRate 0.0001 Epoch: 31 Global Step: 658350 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:36:57,499-Speed 6305.29 samples/sec Loss 3.1822 LearningRate 0.0001 Epoch: 31 Global Step: 658360 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:00,752-Speed 6296.70 samples/sec Loss 3.2123 LearningRate 0.0001 Epoch: 31 Global Step: 658370 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:04,005-Speed 6296.65 samples/sec Loss 3.1353 LearningRate 0.0001 Epoch: 31 Global Step: 658380 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:07,255-Speed 6304.06 samples/sec Loss 3.1544 LearningRate 0.0001 Epoch: 31 Global Step: 658390 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:10,506-Speed 6299.79 samples/sec Loss 3.2133 LearningRate 0.0001 Epoch: 31 Global Step: 658400 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:13,757-Speed 6302.30 samples/sec Loss 3.2066 LearningRate 0.0001 Epoch: 31 Global Step: 658410 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:17,048-Speed 6223.74 samples/sec Loss 3.1975 LearningRate 0.0001 Epoch: 31 Global 
Step: 658420 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:20,314-Speed 6272.45 samples/sec Loss 3.1408 LearningRate 0.0001 Epoch: 31 Global Step: 658430 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:37:23,548-Speed 6332.86 samples/sec Loss 3.1805 LearningRate 0.0001 Epoch: 31 Global Step: 658440 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:26,801-Speed 6297.95 samples/sec Loss 3.1727 LearningRate 0.0001 Epoch: 31 Global Step: 658450 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:30,045-Speed 6314.91 samples/sec Loss 3.1810 LearningRate 0.0001 Epoch: 31 Global Step: 658460 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:33,291-Speed 6310.87 samples/sec Loss 3.1444 LearningRate 0.0001 Epoch: 31 Global Step: 658470 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:36,534-Speed 6315.48 samples/sec Loss 3.2375 LearningRate 0.0001 Epoch: 31 Global Step: 658480 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:39,793-Speed 6287.37 samples/sec Loss 3.1676 LearningRate 0.0001 Epoch: 31 Global Step: 658490 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:43,036-Speed 6315.25 samples/sec Loss 3.1804 LearningRate 0.0001 Epoch: 31 Global Step: 658500 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:46,295-Speed 6287.11 samples/sec Loss 3.1918 LearningRate 0.0001 Epoch: 31 Global Step: 658510 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:49,540-Speed 6311.22 samples/sec Loss 3.1603 LearningRate 0.0001 Epoch: 31 Global Step: 658520 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:52,782-Speed 6318.96 samples/sec Loss 3.1522 LearningRate 0.0001 Epoch: 31 Global Step: 658530 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:37:56,029-Speed 6309.09 samples/sec Loss 3.1897 LearningRate 0.0001 Epoch: 31 Global Step: 658540 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:37:59,281-Speed 6299.06 samples/sec Loss 3.1976 LearningRate 0.0001 Epoch: 31 Global Step: 658550 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:02,531-Speed 6304.39 samples/sec Loss 3.1571 LearningRate 0.0001 Epoch: 31 Global Step: 658560 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:05,777-Speed 6309.37 samples/sec Loss 3.1084 LearningRate 0.0001 Epoch: 31 Global Step: 658570 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:09,008-Speed 6339.78 samples/sec Loss 3.1713 LearningRate 0.0001 Epoch: 31 Global Step: 658580 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:12,257-Speed 6305.01 samples/sec Loss 3.1583 LearningRate 0.0001 Epoch: 31 Global Step: 658590 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:15,503-Speed 6312.13 samples/sec Loss 3.1304 LearningRate 0.0001 Epoch: 31 Global Step: 658600 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:18,751-Speed 6305.85 samples/sec Loss 3.1613 LearningRate 0.0001 Epoch: 31 Global Step: 658610 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:21,994-Speed 6315.87 samples/sec Loss 3.1030 LearningRate 0.0001 Epoch: 31 Global Step: 658620 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:25,241-Speed 6310.18 samples/sec Loss 3.1740 LearningRate 0.0001 Epoch: 31 Global Step: 658630 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:28,485-Speed 6313.74 samples/sec Loss 3.1587 LearningRate 0.0001 Epoch: 31 Global Step: 658640 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:31,730-Speed 6312.70 samples/sec Loss 3.1298 LearningRate 0.0001 Epoch: 31 Global Step: 658650 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:34,972-Speed 6318.60 samples/sec Loss 3.2002 LearningRate 0.0001 Epoch: 31 Global Step: 658660 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 
03:38:38,219-Speed 6308.80 samples/sec Loss 3.1153 LearningRate 0.0001 Epoch: 31 Global Step: 658670 Fp16 Grad Scale: 4096 Required: 16 hours Training: 2022-04-03 03:38:41,465-Speed 6310.64 samples/sec Loss 3.2037 LearningRate 0.0001 Epoch: 31 Global Step: 658680 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:44,713-Speed 6307.84 samples/sec Loss 3.1607 LearningRate 0.0001 Epoch: 31 Global Step: 658690 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:47,963-Speed 6302.47 samples/sec Loss 3.1995 LearningRate 0.0001 Epoch: 31 Global Step: 658700 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:51,206-Speed 6315.85 samples/sec Loss 3.1587 LearningRate 0.0001 Epoch: 31 Global Step: 658710 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:54,453-Speed 6309.73 samples/sec Loss 3.1749 LearningRate 0.0001 Epoch: 31 Global Step: 658720 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:38:57,700-Speed 6307.81 samples/sec Loss 3.0968 LearningRate 0.0001 Epoch: 31 Global Step: 658730 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:00,949-Speed 6304.98 samples/sec Loss 3.1665 LearningRate 0.0001 Epoch: 31 Global Step: 658740 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:04,196-Speed 6308.36 samples/sec Loss 3.1661 LearningRate 0.0001 Epoch: 31 Global Step: 658750 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:07,457-Speed 6283.18 samples/sec Loss 3.1678 LearningRate 0.0001 Epoch: 31 Global Step: 658760 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:10,703-Speed 6310.56 samples/sec Loss 3.1373 LearningRate 0.0001 Epoch: 31 Global Step: 658770 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:13,951-Speed 6307.59 samples/sec Loss 3.1921 LearningRate 0.0001 Epoch: 31 Global Step: 658780 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-03 03:39:17,275-Speed 6161.57 samples/sec 
Loss 3.1916 LearningRate 0.0001 Epoch: 31 Global Step: 658790 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:20,521-Speed 6311.89 samples/sec Loss 3.1856 LearningRate 0.0001 Epoch: 31 Global Step: 658800 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:23,764-Speed 6316.30 samples/sec Loss 3.1505 LearningRate 0.0001 Epoch: 31 Global Step: 658810 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:27,012-Speed 6307.49 samples/sec Loss 3.1863 LearningRate 0.0001 Epoch: 31 Global Step: 658820 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:30,256-Speed 6313.38 samples/sec Loss 3.2097 LearningRate 0.0001 Epoch: 31 Global Step: 658830 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:33,506-Speed 6304.47 samples/sec Loss 3.1758 LearningRate 0.0001 Epoch: 31 Global Step: 658840 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:36,756-Speed 6302.43 samples/sec Loss 3.2088 LearningRate 0.0001 Epoch: 31 Global Step: 658850 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:40,007-Speed 6300.32 samples/sec Loss 3.1452 LearningRate 0.0001 Epoch: 31 Global Step: 658860 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:43,254-Speed 6308.01 samples/sec Loss 3.1878 LearningRate 0.0001 Epoch: 31 Global Step: 658870 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:46,505-Speed 6302.68 samples/sec Loss 3.2009 LearningRate 0.0001 Epoch: 31 Global Step: 658880 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:49,738-Speed 6335.10 samples/sec Loss 3.2424 LearningRate 0.0001 Epoch: 31 Global Step: 658890 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:52,982-Speed 6314.08 samples/sec Loss 3.1862 LearningRate 0.0001 Epoch: 31 Global Step: 658900 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:56,228-Speed 6311.64 samples/sec Loss 3.1800 LearningRate 0.0001 Epoch: 31 
Global Step: 658910 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:39:59,475-Speed 6307.98 samples/sec Loss 3.1330 LearningRate 0.0001 Epoch: 31 Global Step: 658920 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:02,726-Speed 6300.99 samples/sec Loss 3.1921 LearningRate 0.0001 Epoch: 31 Global Step: 658930 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:05,972-Speed 6311.99 samples/sec Loss 3.1741 LearningRate 0.0001 Epoch: 31 Global Step: 658940 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:09,218-Speed 6310.20 samples/sec Loss 3.2112 LearningRate 0.0001 Epoch: 31 Global Step: 658950 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:12,464-Speed 6310.90 samples/sec Loss 3.2456 LearningRate 0.0001 Epoch: 31 Global Step: 658960 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:15,710-Speed 6310.41 samples/sec Loss 3.1548 LearningRate 0.0001 Epoch: 31 Global Step: 658970 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:18,954-Speed 6314.91 samples/sec Loss 3.1658 LearningRate 0.0001 Epoch: 31 Global Step: 658980 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:22,190-Speed 6330.28 samples/sec Loss 3.1552 LearningRate 0.0001 Epoch: 31 Global Step: 658990 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:25,440-Speed 6303.50 samples/sec Loss 3.1120 LearningRate 0.0001 Epoch: 31 Global Step: 659000 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:28,690-Speed 6303.42 samples/sec Loss 3.1639 LearningRate 0.0001 Epoch: 31 Global Step: 659010 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:31,941-Speed 6300.33 samples/sec Loss 3.1504 LearningRate 0.0001 Epoch: 31 Global Step: 659020 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:35,197-Speed 6291.52 samples/sec Loss 3.1852 LearningRate 0.0001 Epoch: 31 Global Step: 659030 Fp16 Grad Scale: 8192 
Required: 16 hours Training: 2022-04-03 03:40:38,458-Speed 6280.72 samples/sec Loss 3.2203 LearningRate 0.0001 Epoch: 31 Global Step: 659040 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:41,707-Speed 6304.84 samples/sec Loss 3.1461 LearningRate 0.0001 Epoch: 31 Global Step: 659050 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:44,954-Speed 6309.39 samples/sec Loss 3.1720 LearningRate 0.0001 Epoch: 31 Global Step: 659060 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:48,203-Speed 6304.73 samples/sec Loss 3.1915 LearningRate 0.0001 Epoch: 31 Global Step: 659070 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:51,458-Speed 6294.12 samples/sec Loss 3.1653 LearningRate 0.0001 Epoch: 31 Global Step: 659080 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:54,691-Speed 6335.30 samples/sec Loss 3.2036 LearningRate 0.0001 Epoch: 31 Global Step: 659090 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:40:57,938-Speed 6308.45 samples/sec Loss 3.1754 LearningRate 0.0001 Epoch: 31 Global Step: 659100 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:41:01,187-Speed 6305.94 samples/sec Loss 3.2284 LearningRate 0.0001 Epoch: 31 Global Step: 659110 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:41:04,433-Speed 6310.52 samples/sec Loss 3.1753 LearningRate 0.0001 Epoch: 31 Global Step: 659120 Fp16 Grad Scale: 8192 Required: 16 hours Training: 2022-04-03 03:41:07,684-Speed 6300.70 samples/sec Loss 3.2052 LearningRate 0.0001 Epoch: 31 Global Step: 659130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:10,929-Speed 6312.79 samples/sec Loss 3.1857 LearningRate 0.0001 Epoch: 31 Global Step: 659140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:14,172-Speed 6316.25 samples/sec Loss 3.2255 LearningRate 0.0001 Epoch: 31 Global Step: 659150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:41:17,425-Speed 6297.23 samples/sec Loss 3.1548 LearningRate 0.0001 Epoch: 31 Global Step: 659160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:20,675-Speed 6303.43 samples/sec Loss 3.1778 LearningRate 0.0001 Epoch: 31 Global Step: 659170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:23,921-Speed 6310.20 samples/sec Loss 3.2143 LearningRate 0.0001 Epoch: 31 Global Step: 659180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:27,157-Speed 6330.02 samples/sec Loss 3.1764 LearningRate 0.0001 Epoch: 31 Global Step: 659190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:30,400-Speed 6315.73 samples/sec Loss 3.1321 LearningRate 0.0001 Epoch: 31 Global Step: 659200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:33,653-Speed 6299.53 samples/sec Loss 3.1645 LearningRate 0.0001 Epoch: 31 Global Step: 659210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:36,908-Speed 6292.15 samples/sec Loss 3.1881 LearningRate 0.0001 Epoch: 31 Global Step: 659220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:40,155-Speed 6309.22 samples/sec Loss 3.1643 LearningRate 0.0001 Epoch: 31 Global Step: 659230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:41:43,390-Speed 6332.63 samples/sec Loss 3.1677 LearningRate 0.0001 Epoch: 31 Global Step: 659240 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:41:46,632-Speed 6318.21 samples/sec Loss 3.1800 LearningRate 0.0001 Epoch: 31 Global Step: 659250 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:41:49,876-Speed 6314.29 samples/sec Loss 3.1556 LearningRate 0.0001 Epoch: 31 Global Step: 659260 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:41:53,119-Speed 6316.05 samples/sec Loss 3.1592 LearningRate 0.0001 Epoch: 31 Global Step: 659270 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:41:56,369-Speed 6303.91 samples/sec Loss 
3.2238 LearningRate 0.0001 Epoch: 31 Global Step: 659280 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:41:59,620-Speed 6301.03 samples/sec Loss 3.1958 LearningRate 0.0001 Epoch: 31 Global Step: 659290 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:42:02,871-Speed 6301.30 samples/sec Loss 3.1596 LearningRate 0.0001 Epoch: 31 Global Step: 659300 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:42:06,120-Speed 6304.17 samples/sec Loss 3.1890 LearningRate 0.0001 Epoch: 31 Global Step: 659310 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:42:09,365-Speed 6312.17 samples/sec Loss 3.2076 LearningRate 0.0001 Epoch: 31 Global Step: 659320 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:42:12,613-Speed 6307.16 samples/sec Loss 3.1592 LearningRate 0.0001 Epoch: 31 Global Step: 659330 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:42:15,859-Speed 6310.63 samples/sec Loss 3.1656 LearningRate 0.0001 Epoch: 31 Global Step: 659340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:19,105-Speed 6311.09 samples/sec Loss 3.2083 LearningRate 0.0001 Epoch: 31 Global Step: 659350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:22,364-Speed 6285.86 samples/sec Loss 3.1793 LearningRate 0.0001 Epoch: 31 Global Step: 659360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:25,610-Speed 6310.16 samples/sec Loss 3.0937 LearningRate 0.0001 Epoch: 31 Global Step: 659370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:28,857-Speed 6309.06 samples/sec Loss 3.1852 LearningRate 0.0001 Epoch: 31 Global Step: 659380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:32,103-Speed 6309.92 samples/sec Loss 3.2073 LearningRate 0.0001 Epoch: 31 Global Step: 659390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:35,358-Speed 6293.60 samples/sec Loss 3.1914 LearningRate 0.0001 Epoch: 31 Global 
Step: 659400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:38,607-Speed 6305.30 samples/sec Loss 3.1309 LearningRate 0.0001 Epoch: 31 Global Step: 659410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:41,856-Speed 6303.71 samples/sec Loss 3.2154 LearningRate 0.0001 Epoch: 31 Global Step: 659420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:45,108-Speed 6299.52 samples/sec Loss 3.1936 LearningRate 0.0001 Epoch: 31 Global Step: 659430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:48,340-Speed 6339.62 samples/sec Loss 3.1586 LearningRate 0.0001 Epoch: 31 Global Step: 659440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:51,591-Speed 6301.26 samples/sec Loss 3.1069 LearningRate 0.0001 Epoch: 31 Global Step: 659450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:54,844-Speed 6295.96 samples/sec Loss 3.1388 LearningRate 0.0001 Epoch: 31 Global Step: 659460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:42:58,088-Speed 6314.38 samples/sec Loss 3.1812 LearningRate 0.0001 Epoch: 31 Global Step: 659470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:01,321-Speed 6335.75 samples/sec Loss 3.1008 LearningRate 0.0001 Epoch: 31 Global Step: 659480 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:04,568-Speed 6310.93 samples/sec Loss 3.2062 LearningRate 0.0001 Epoch: 31 Global Step: 659490 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:07,814-Speed 6309.24 samples/sec Loss 3.2040 LearningRate 0.0001 Epoch: 31 Global Step: 659500 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:11,059-Speed 6312.90 samples/sec Loss 3.2387 LearningRate 0.0001 Epoch: 31 Global Step: 659510 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:14,317-Speed 6288.04 samples/sec Loss 3.2177 LearningRate 0.0001 Epoch: 31 Global Step: 659520 Fp16 Grad Scale: 4096 
Required: 15 hours Training: 2022-04-03 03:43:17,562-Speed 6311.97 samples/sec Loss 3.1229 LearningRate 0.0001 Epoch: 31 Global Step: 659530 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:20,812-Speed 6303.75 samples/sec Loss 3.1532 LearningRate 0.0001 Epoch: 31 Global Step: 659540 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:24,060-Speed 6305.33 samples/sec Loss 3.1789 LearningRate 0.0001 Epoch: 31 Global Step: 659550 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:27,304-Speed 6314.39 samples/sec Loss 3.1216 LearningRate 0.0001 Epoch: 31 Global Step: 659560 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:30,550-Speed 6311.48 samples/sec Loss 3.1376 LearningRate 0.0001 Epoch: 31 Global Step: 659570 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:43:33,796-Speed 6310.32 samples/sec Loss 3.1167 LearningRate 0.0001 Epoch: 31 Global Step: 659580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:37,040-Speed 6315.18 samples/sec Loss 3.1331 LearningRate 0.0001 Epoch: 31 Global Step: 659590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:40,285-Speed 6311.36 samples/sec Loss 3.1700 LearningRate 0.0001 Epoch: 31 Global Step: 659600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:43,532-Speed 6310.25 samples/sec Loss 3.1355 LearningRate 0.0001 Epoch: 31 Global Step: 659610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:46,778-Speed 6308.93 samples/sec Loss 3.1849 LearningRate 0.0001 Epoch: 31 Global Step: 659620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:50,025-Speed 6310.23 samples/sec Loss 3.1883 LearningRate 0.0001 Epoch: 31 Global Step: 659630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:53,281-Speed 6289.99 samples/sec Loss 3.1812 LearningRate 0.0001 Epoch: 31 Global Step: 659640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:43:56,534-Speed 6299.23 samples/sec Loss 3.2295 LearningRate 0.0001 Epoch: 31 Global Step: 659650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:43:59,785-Speed 6301.22 samples/sec Loss 3.1486 LearningRate 0.0001 Epoch: 31 Global Step: 659660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:03,031-Speed 6310.14 samples/sec Loss 3.1587 LearningRate 0.0001 Epoch: 31 Global Step: 659670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:06,262-Speed 6339.36 samples/sec Loss 3.1694 LearningRate 0.0001 Epoch: 31 Global Step: 659680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:09,510-Speed 6307.53 samples/sec Loss 3.2344 LearningRate 0.0001 Epoch: 31 Global Step: 659690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:12,769-Speed 6286.40 samples/sec Loss 3.1189 LearningRate 0.0001 Epoch: 31 Global Step: 659700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:16,021-Speed 6298.21 samples/sec Loss 3.1916 LearningRate 0.0001 Epoch: 31 Global Step: 659710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:19,280-Speed 6286.78 samples/sec Loss 3.1711 LearningRate 0.0001 Epoch: 31 Global Step: 659720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:22,529-Speed 6304.88 samples/sec Loss 3.1223 LearningRate 0.0001 Epoch: 31 Global Step: 659730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:25,791-Speed 6279.38 samples/sec Loss 3.1314 LearningRate 0.0001 Epoch: 31 Global Step: 659740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:29,035-Speed 6313.87 samples/sec Loss 3.2406 LearningRate 0.0001 Epoch: 31 Global Step: 659750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:32,280-Speed 6312.61 samples/sec Loss 3.0959 LearningRate 0.0001 Epoch: 31 Global Step: 659760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:35,538-Speed 6288.83 samples/sec Loss 
3.1496 LearningRate 0.0001 Epoch: 31 Global Step: 659770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:38,765-Speed 6346.22 samples/sec Loss 3.1542 LearningRate 0.0001 Epoch: 31 Global Step: 659780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:42,017-Speed 6300.29 samples/sec Loss 3.1260 LearningRate 0.0001 Epoch: 31 Global Step: 659790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:45,264-Speed 6309.22 samples/sec Loss 3.1621 LearningRate 0.0001 Epoch: 31 Global Step: 659800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:48,515-Speed 6299.89 samples/sec Loss 3.1189 LearningRate 0.0001 Epoch: 31 Global Step: 659810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:51,762-Speed 6309.56 samples/sec Loss 3.1826 LearningRate 0.0001 Epoch: 31 Global Step: 659820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:55,006-Speed 6313.45 samples/sec Loss 3.2136 LearningRate 0.0001 Epoch: 31 Global Step: 659830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:44:58,263-Speed 6288.98 samples/sec Loss 3.1747 LearningRate 0.0001 Epoch: 31 Global Step: 659840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:01,511-Speed 6308.60 samples/sec Loss 3.1670 LearningRate 0.0001 Epoch: 31 Global Step: 659850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:04,767-Speed 6290.67 samples/sec Loss 3.2060 LearningRate 0.0001 Epoch: 31 Global Step: 659860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:08,017-Speed 6302.36 samples/sec Loss 3.1180 LearningRate 0.0001 Epoch: 31 Global Step: 659870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:11,249-Speed 6337.88 samples/sec Loss 3.1648 LearningRate 0.0001 Epoch: 31 Global Step: 659880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:14,494-Speed 6313.64 samples/sec Loss 3.1141 LearningRate 0.0001 Epoch: 31 Global 
Step: 659890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:17,741-Speed 6309.69 samples/sec Loss 3.1356 LearningRate 0.0001 Epoch: 31 Global Step: 659900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:20,993-Speed 6299.13 samples/sec Loss 3.1908 LearningRate 0.0001 Epoch: 31 Global Step: 659910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:24,239-Speed 6309.34 samples/sec Loss 3.1977 LearningRate 0.0001 Epoch: 31 Global Step: 659920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:27,488-Speed 6304.84 samples/sec Loss 3.1688 LearningRate 0.0001 Epoch: 31 Global Step: 659930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:30,739-Speed 6302.71 samples/sec Loss 3.1767 LearningRate 0.0001 Epoch: 31 Global Step: 659940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:33,996-Speed 6288.52 samples/sec Loss 3.1920 LearningRate 0.0001 Epoch: 31 Global Step: 659950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:37,252-Speed 6291.27 samples/sec Loss 3.1399 LearningRate 0.0001 Epoch: 31 Global Step: 659960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:40,502-Speed 6301.79 samples/sec Loss 3.1543 LearningRate 0.0001 Epoch: 31 Global Step: 659970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:43,737-Speed 6333.37 samples/sec Loss 3.2100 LearningRate 0.0001 Epoch: 31 Global Step: 659980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:46,983-Speed 6310.11 samples/sec Loss 3.1096 LearningRate 0.0001 Epoch: 31 Global Step: 659990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:50,229-Speed 6311.98 samples/sec Loss 3.1002 LearningRate 0.0001 Epoch: 31 Global Step: 660000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:53,477-Speed 6305.16 samples/sec Loss 3.1916 LearningRate 0.0001 Epoch: 31 Global Step: 660010 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 03:45:56,730-Speed 6297.44 samples/sec Loss 3.1689 LearningRate 0.0001 Epoch: 31 Global Step: 660020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:45:59,977-Speed 6309.71 samples/sec Loss 3.1578 LearningRate 0.0001 Epoch: 31 Global Step: 660030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:03,226-Speed 6304.73 samples/sec Loss 3.1695 LearningRate 0.0001 Epoch: 31 Global Step: 660040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:06,469-Speed 6317.01 samples/sec Loss 3.1646 LearningRate 0.0001 Epoch: 31 Global Step: 660050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:09,720-Speed 6300.30 samples/sec Loss 3.1661 LearningRate 0.0001 Epoch: 31 Global Step: 660060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:12,964-Speed 6314.14 samples/sec Loss 3.1662 LearningRate 0.0001 Epoch: 31 Global Step: 660070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:16,205-Speed 6321.44 samples/sec Loss 3.1480 LearningRate 0.0001 Epoch: 31 Global Step: 660080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:19,458-Speed 6296.36 samples/sec Loss 3.1946 LearningRate 0.0001 Epoch: 31 Global Step: 660090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:22,703-Speed 6311.79 samples/sec Loss 3.1319 LearningRate 0.0001 Epoch: 31 Global Step: 660100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:25,954-Speed 6305.08 samples/sec Loss 3.1993 LearningRate 0.0001 Epoch: 31 Global Step: 660110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:29,197-Speed 6316.70 samples/sec Loss 3.1831 LearningRate 0.0001 Epoch: 31 Global Step: 660120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:32,442-Speed 6312.80 samples/sec Loss 3.1893 LearningRate 0.0001 Epoch: 31 Global Step: 660130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:46:35,689-Speed 6309.34 samples/sec Loss 3.1796 LearningRate 0.0001 Epoch: 31 Global Step: 660140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:38,946-Speed 6289.30 samples/sec Loss 3.1397 LearningRate 0.0001 Epoch: 31 Global Step: 660150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:42,195-Speed 6304.58 samples/sec Loss 3.1722 LearningRate 0.0001 Epoch: 31 Global Step: 660160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:45,440-Speed 6313.47 samples/sec Loss 3.1852 LearningRate 0.0001 Epoch: 31 Global Step: 660170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:48,669-Speed 6342.33 samples/sec Loss 3.2166 LearningRate 0.0001 Epoch: 31 Global Step: 660180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:51,911-Speed 6319.16 samples/sec Loss 3.1282 LearningRate 0.0001 Epoch: 31 Global Step: 660190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:55,154-Speed 6315.91 samples/sec Loss 3.1626 LearningRate 0.0001 Epoch: 31 Global Step: 660200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:46:58,404-Speed 6303.91 samples/sec Loss 3.1762 LearningRate 0.0001 Epoch: 31 Global Step: 660210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:01,650-Speed 6311.12 samples/sec Loss 3.1292 LearningRate 0.0001 Epoch: 31 Global Step: 660220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:04,909-Speed 6285.88 samples/sec Loss 3.1928 LearningRate 0.0001 Epoch: 31 Global Step: 660230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:08,157-Speed 6305.87 samples/sec Loss 3.1978 LearningRate 0.0001 Epoch: 31 Global Step: 660240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:11,416-Speed 6285.54 samples/sec Loss 3.1084 LearningRate 0.0001 Epoch: 31 Global Step: 660250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:14,670-Speed 6294.31 samples/sec Loss 
3.1768 LearningRate 0.0001 Epoch: 31 Global Step: 660260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:17,935-Speed 6275.28 samples/sec Loss 3.1751 LearningRate 0.0001 Epoch: 31 Global Step: 660270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:21,181-Speed 6309.18 samples/sec Loss 3.1250 LearningRate 0.0001 Epoch: 31 Global Step: 660280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 03:47:24,411-Speed 6343.74 samples/sec Loss 3.1684 LearningRate 0.0001 Epoch: 31 Global Step: 660290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:27,655-Speed 6313.74 samples/sec Loss 3.1810 LearningRate 0.0001 Epoch: 31 Global Step: 660300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:30,900-Speed 6313.43 samples/sec Loss 3.1725 LearningRate 0.0001 Epoch: 31 Global Step: 660310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:34,149-Speed 6304.92 samples/sec Loss 3.1412 LearningRate 0.0001 Epoch: 31 Global Step: 660320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:37,412-Speed 6278.32 samples/sec Loss 3.1338 LearningRate 0.0001 Epoch: 31 Global Step: 660330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:40,661-Speed 6305.25 samples/sec Loss 3.1295 LearningRate 0.0001 Epoch: 31 Global Step: 660340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:43,909-Speed 6306.70 samples/sec Loss 3.1567 LearningRate 0.0001 Epoch: 31 Global Step: 660350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:47,178-Speed 6266.61 samples/sec Loss 3.1430 LearningRate 0.0001 Epoch: 31 Global Step: 660360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:50,434-Speed 6291.02 samples/sec Loss 3.1913 LearningRate 0.0001 Epoch: 31 Global Step: 660370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:53,686-Speed 6299.18 samples/sec Loss 3.1735 LearningRate 0.0001 Epoch: 31 
Global Step: 660380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:47:56,919-Speed 6335.58 samples/sec Loss 3.0987 LearningRate 0.0001 Epoch: 31 Global Step: 660390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:00,171-Speed 6299.86 samples/sec Loss 3.1877 LearningRate 0.0001 Epoch: 31 Global Step: 660400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:03,424-Speed 6296.78 samples/sec Loss 3.1411 LearningRate 0.0001 Epoch: 31 Global Step: 660410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:06,676-Speed 6300.13 samples/sec Loss 3.1272 LearningRate 0.0001 Epoch: 31 Global Step: 660420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:09,919-Speed 6315.21 samples/sec Loss 3.1358 LearningRate 0.0001 Epoch: 31 Global Step: 660430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:13,171-Speed 6298.54 samples/sec Loss 3.1252 LearningRate 0.0001 Epoch: 31 Global Step: 660440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:16,417-Speed 6312.31 samples/sec Loss 3.1306 LearningRate 0.0001 Epoch: 31 Global Step: 660450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:19,668-Speed 6300.81 samples/sec Loss 3.1918 LearningRate 0.0001 Epoch: 31 Global Step: 660460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:22,921-Speed 6296.53 samples/sec Loss 3.1680 LearningRate 0.0001 Epoch: 31 Global Step: 660470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:26,179-Speed 6287.60 samples/sec Loss 3.1420 LearningRate 0.0001 Epoch: 31 Global Step: 660480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:48:29,419-Speed 6323.02 samples/sec Loss 3.1544 LearningRate 0.0001 Epoch: 31 Global Step: 660490 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:32,673-Speed 6295.77 samples/sec Loss 3.1244 LearningRate 0.0001 Epoch: 31 Global Step: 660500 Fp16 Grad Scale: 4096 
Required: 15 hours Training: 2022-04-03 03:48:35,921-Speed 6305.80 samples/sec Loss 3.1600 LearningRate 0.0001 Epoch: 31 Global Step: 660510 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:39,173-Speed 6299.65 samples/sec Loss 3.1750 LearningRate 0.0001 Epoch: 31 Global Step: 660520 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:42,418-Speed 6312.58 samples/sec Loss 3.2008 LearningRate 0.0001 Epoch: 31 Global Step: 660530 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:45,665-Speed 6308.05 samples/sec Loss 3.1458 LearningRate 0.0001 Epoch: 31 Global Step: 660540 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:48,911-Speed 6311.29 samples/sec Loss 3.1844 LearningRate 0.0001 Epoch: 31 Global Step: 660550 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:52,160-Speed 6304.91 samples/sec Loss 3.1791 LearningRate 0.0001 Epoch: 31 Global Step: 660560 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:55,407-Speed 6308.57 samples/sec Loss 3.2056 LearningRate 0.0001 Epoch: 31 Global Step: 660570 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:48:58,657-Speed 6303.47 samples/sec Loss 3.1599 LearningRate 0.0001 Epoch: 31 Global Step: 660580 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:01,917-Speed 6283.85 samples/sec Loss 3.1145 LearningRate 0.0001 Epoch: 31 Global Step: 660590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:05,165-Speed 6306.95 samples/sec Loss 3.0706 LearningRate 0.0001 Epoch: 31 Global Step: 660600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:08,409-Speed 6313.99 samples/sec Loss 3.0921 LearningRate 0.0001 Epoch: 31 Global Step: 660610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:11,665-Speed 6292.09 samples/sec Loss 3.1756 LearningRate 0.0001 Epoch: 31 Global Step: 660620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:49:14,909-Speed 6316.35 samples/sec Loss 3.0943 LearningRate 0.0001 Epoch: 31 Global Step: 660630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:18,154-Speed 6312.31 samples/sec Loss 3.1053 LearningRate 0.0001 Epoch: 31 Global Step: 660640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:21,401-Speed 6307.46 samples/sec Loss 3.1592 LearningRate 0.0001 Epoch: 31 Global Step: 660650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:24,647-Speed 6310.74 samples/sec Loss 3.2455 LearningRate 0.0001 Epoch: 31 Global Step: 660660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:27,898-Speed 6301.71 samples/sec Loss 3.1710 LearningRate 0.0001 Epoch: 31 Global Step: 660670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:31,152-Speed 6296.31 samples/sec Loss 3.1704 LearningRate 0.0001 Epoch: 31 Global Step: 660680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:49:34,392-Speed 6322.04 samples/sec Loss 3.2095 LearningRate 0.0001 Epoch: 31 Global Step: 660690 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:37,648-Speed 6291.51 samples/sec Loss 3.1390 LearningRate 0.0001 Epoch: 31 Global Step: 660700 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:40,898-Speed 6301.62 samples/sec Loss 3.0929 LearningRate 0.0001 Epoch: 31 Global Step: 660710 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:44,148-Speed 6304.03 samples/sec Loss 3.1817 LearningRate 0.0001 Epoch: 31 Global Step: 660720 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:47,392-Speed 6313.98 samples/sec Loss 3.1719 LearningRate 0.0001 Epoch: 31 Global Step: 660730 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:50,636-Speed 6315.03 samples/sec Loss 3.2121 LearningRate 0.0001 Epoch: 31 Global Step: 660740 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:53,881-Speed 6313.27 samples/sec Loss 
3.1049 LearningRate 0.0001 Epoch: 31 Global Step: 660750 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:49:57,123-Speed 6317.82 samples/sec Loss 3.1574 LearningRate 0.0001 Epoch: 31 Global Step: 660760 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:50:00,375-Speed 6299.07 samples/sec Loss 3.2160 LearningRate 0.0001 Epoch: 31 Global Step: 660770 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:50:03,622-Speed 6308.60 samples/sec Loss 3.2038 LearningRate 0.0001 Epoch: 31 Global Step: 660780 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:50:06,869-Speed 6308.99 samples/sec Loss 3.1717 LearningRate 0.0001 Epoch: 31 Global Step: 660790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:10,115-Speed 6310.82 samples/sec Loss 3.0914 LearningRate 0.0001 Epoch: 31 Global Step: 660800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:13,365-Speed 6303.31 samples/sec Loss 3.1739 LearningRate 0.0001 Epoch: 31 Global Step: 660810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:16,609-Speed 6314.28 samples/sec Loss 3.1500 LearningRate 0.0001 Epoch: 31 Global Step: 660820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:19,859-Speed 6302.78 samples/sec Loss 3.0883 LearningRate 0.0001 Epoch: 31 Global Step: 660830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:23,111-Speed 6300.26 samples/sec Loss 3.0934 LearningRate 0.0001 Epoch: 31 Global Step: 660840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:26,358-Speed 6307.55 samples/sec Loss 3.1304 LearningRate 0.0001 Epoch: 31 Global Step: 660850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:29,601-Speed 6316.92 samples/sec Loss 3.1016 LearningRate 0.0001 Epoch: 31 Global Step: 660860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:32,851-Speed 6303.86 samples/sec Loss 3.2192 LearningRate 0.0001 Epoch: 31 Global 
Step: 660870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:36,100-Speed 6304.40 samples/sec Loss 3.2156 LearningRate 0.0001 Epoch: 31 Global Step: 660880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:39,332-Speed 6338.18 samples/sec Loss 3.1668 LearningRate 0.0001 Epoch: 31 Global Step: 660890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:42,580-Speed 6306.62 samples/sec Loss 3.1219 LearningRate 0.0001 Epoch: 31 Global Step: 660900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:45,823-Speed 6316.85 samples/sec Loss 3.1336 LearningRate 0.0001 Epoch: 31 Global Step: 660910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:49,071-Speed 6307.24 samples/sec Loss 3.1951 LearningRate 0.0001 Epoch: 31 Global Step: 660920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:52,328-Speed 6288.47 samples/sec Loss 3.1410 LearningRate 0.0001 Epoch: 31 Global Step: 660930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:55,578-Speed 6302.26 samples/sec Loss 3.0934 LearningRate 0.0001 Epoch: 31 Global Step: 660940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:50:58,828-Speed 6303.68 samples/sec Loss 3.1270 LearningRate 0.0001 Epoch: 31 Global Step: 660950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:02,079-Speed 6301.59 samples/sec Loss 3.1452 LearningRate 0.0001 Epoch: 31 Global Step: 660960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:05,323-Speed 6313.72 samples/sec Loss 3.1566 LearningRate 0.0001 Epoch: 31 Global Step: 660970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:08,569-Speed 6310.97 samples/sec Loss 3.1373 LearningRate 0.0001 Epoch: 31 Global Step: 660980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:11,801-Speed 6337.19 samples/sec Loss 3.1797 LearningRate 0.0001 Epoch: 31 Global Step: 660990 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 03:51:15,048-Speed 6310.91 samples/sec Loss 3.1648 LearningRate 0.0001 Epoch: 31 Global Step: 661000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:18,295-Speed 6307.44 samples/sec Loss 3.1322 LearningRate 0.0001 Epoch: 31 Global Step: 661010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:21,545-Speed 6302.93 samples/sec Loss 3.2064 LearningRate 0.0001 Epoch: 31 Global Step: 661020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:24,794-Speed 6304.95 samples/sec Loss 3.2149 LearningRate 0.0001 Epoch: 31 Global Step: 661030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:28,039-Speed 6313.07 samples/sec Loss 3.1163 LearningRate 0.0001 Epoch: 31 Global Step: 661040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:31,287-Speed 6307.64 samples/sec Loss 3.1651 LearningRate 0.0001 Epoch: 31 Global Step: 661050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:34,530-Speed 6316.59 samples/sec Loss 3.1549 LearningRate 0.0001 Epoch: 31 Global Step: 661060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:37,779-Speed 6304.78 samples/sec Loss 3.1706 LearningRate 0.0001 Epoch: 31 Global Step: 661070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:41,028-Speed 6304.50 samples/sec Loss 3.1511 LearningRate 0.0001 Epoch: 31 Global Step: 661080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:44,259-Speed 6339.78 samples/sec Loss 3.1621 LearningRate 0.0001 Epoch: 31 Global Step: 661090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:47,506-Speed 6310.33 samples/sec Loss 3.1670 LearningRate 0.0001 Epoch: 31 Global Step: 661100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:50,753-Speed 6307.42 samples/sec Loss 3.1644 LearningRate 0.0001 Epoch: 31 Global Step: 661110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:51:54,001-Speed 6306.93 samples/sec Loss 3.1193 LearningRate 0.0001 Epoch: 31 Global Step: 661120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:51:57,247-Speed 6310.09 samples/sec Loss 3.1246 LearningRate 0.0001 Epoch: 31 Global Step: 661130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:00,499-Speed 6299.51 samples/sec Loss 3.1333 LearningRate 0.0001 Epoch: 31 Global Step: 661140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:03,752-Speed 6297.08 samples/sec Loss 3.1452 LearningRate 0.0001 Epoch: 31 Global Step: 661150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:07,007-Speed 6293.77 samples/sec Loss 3.1334 LearningRate 0.0001 Epoch: 31 Global Step: 661160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:10,256-Speed 6304.58 samples/sec Loss 3.1506 LearningRate 0.0001 Epoch: 31 Global Step: 661170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:13,500-Speed 6315.72 samples/sec Loss 3.1453 LearningRate 0.0001 Epoch: 31 Global Step: 661180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:16,731-Speed 6338.65 samples/sec Loss 3.2070 LearningRate 0.0001 Epoch: 31 Global Step: 661190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:19,985-Speed 6296.78 samples/sec Loss 3.1597 LearningRate 0.0001 Epoch: 31 Global Step: 661200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:23,242-Speed 6289.34 samples/sec Loss 3.1275 LearningRate 0.0001 Epoch: 31 Global Step: 661210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:26,502-Speed 6281.73 samples/sec Loss 3.1677 LearningRate 0.0001 Epoch: 31 Global Step: 661220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:29,750-Speed 6307.92 samples/sec Loss 3.1652 LearningRate 0.0001 Epoch: 31 Global Step: 661230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:32,997-Speed 6308.83 samples/sec Loss 
3.1522 LearningRate 0.0001 Epoch: 31 Global Step: 661240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:36,250-Speed 6296.85 samples/sec Loss 3.1559 LearningRate 0.0001 Epoch: 31 Global Step: 661250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:39,497-Speed 6310.95 samples/sec Loss 3.1994 LearningRate 0.0001 Epoch: 31 Global Step: 661260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:42,744-Speed 6307.71 samples/sec Loss 3.1984 LearningRate 0.0001 Epoch: 31 Global Step: 661270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:45,993-Speed 6305.15 samples/sec Loss 3.1746 LearningRate 0.0001 Epoch: 31 Global Step: 661280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:49,240-Speed 6308.79 samples/sec Loss 3.1841 LearningRate 0.0001 Epoch: 31 Global Step: 661290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 03:52:52,470-Speed 6341.37 samples/sec Loss 3.1835 LearningRate 0.0001 Epoch: 31 Global Step: 661300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:55,723-Speed 6298.27 samples/sec Loss 3.1999 LearningRate 0.0001 Epoch: 31 Global Step: 661310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:52:58,968-Speed 6311.68 samples/sec Loss 3.1916 LearningRate 0.0001 Epoch: 31 Global Step: 661320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:02,225-Speed 6289.49 samples/sec Loss 3.1062 LearningRate 0.0001 Epoch: 31 Global Step: 661330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:05,473-Speed 6307.50 samples/sec Loss 3.1560 LearningRate 0.0001 Epoch: 31 Global Step: 661340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:08,725-Speed 6298.18 samples/sec Loss 3.1688 LearningRate 0.0001 Epoch: 31 Global Step: 661350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:11,971-Speed 6311.92 samples/sec Loss 3.1324 LearningRate 0.0001 Epoch: 31 
Global Step: 661360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:15,220-Speed 6304.65 samples/sec Loss 3.1458 LearningRate 0.0001 Epoch: 31 Global Step: 661370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:18,471-Speed 6299.17 samples/sec Loss 3.1479 LearningRate 0.0001 Epoch: 31 Global Step: 661380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:21,719-Speed 6307.25 samples/sec Loss 3.1620 LearningRate 0.0001 Epoch: 31 Global Step: 661390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:24,961-Speed 6319.28 samples/sec Loss 3.1385 LearningRate 0.0001 Epoch: 31 Global Step: 661400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:28,211-Speed 6303.14 samples/sec Loss 3.1618 LearningRate 0.0001 Epoch: 31 Global Step: 661410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:31,455-Speed 6315.09 samples/sec Loss 3.1518 LearningRate 0.0001 Epoch: 31 Global Step: 661420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:34,701-Speed 6309.72 samples/sec Loss 3.1462 LearningRate 0.0001 Epoch: 31 Global Step: 661430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:37,946-Speed 6312.85 samples/sec Loss 3.1693 LearningRate 0.0001 Epoch: 31 Global Step: 661440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:41,195-Speed 6305.92 samples/sec Loss 3.1351 LearningRate 0.0001 Epoch: 31 Global Step: 661450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:44,448-Speed 6297.57 samples/sec Loss 3.1557 LearningRate 0.0001 Epoch: 31 Global Step: 661460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:47,693-Speed 6312.40 samples/sec Loss 3.1187 LearningRate 0.0001 Epoch: 31 Global Step: 661470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:50,944-Speed 6302.02 samples/sec Loss 3.1786 LearningRate 0.0001 Epoch: 31 Global Step: 661480 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 03:53:54,189-Speed 6310.72 samples/sec Loss 3.1619 LearningRate 0.0001 Epoch: 31 Global Step: 661490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:53:57,420-Speed 6341.64 samples/sec Loss 3.1352 LearningRate 0.0001 Epoch: 31 Global Step: 661500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:00,665-Speed 6312.24 samples/sec Loss 3.1660 LearningRate 0.0001 Epoch: 31 Global Step: 661510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:03,919-Speed 6294.09 samples/sec Loss 3.2141 LearningRate 0.0001 Epoch: 31 Global Step: 661520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:07,166-Speed 6310.39 samples/sec Loss 3.1342 LearningRate 0.0001 Epoch: 31 Global Step: 661530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:10,408-Speed 6316.98 samples/sec Loss 3.1527 LearningRate 0.0001 Epoch: 31 Global Step: 661540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:13,655-Speed 6309.08 samples/sec Loss 3.1929 LearningRate 0.0001 Epoch: 31 Global Step: 661550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:16,902-Speed 6308.04 samples/sec Loss 3.1500 LearningRate 0.0001 Epoch: 31 Global Step: 661560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:20,151-Speed 6305.14 samples/sec Loss 3.1667 LearningRate 0.0001 Epoch: 31 Global Step: 661570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:23,410-Speed 6285.57 samples/sec Loss 3.0918 LearningRate 0.0001 Epoch: 31 Global Step: 661580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:26,672-Speed 6281.13 samples/sec Loss 3.1211 LearningRate 0.0001 Epoch: 31 Global Step: 661590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:29,905-Speed 6336.07 samples/sec Loss 3.1408 LearningRate 0.0001 Epoch: 31 Global Step: 661600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:54:33,152-Speed 6307.24 samples/sec Loss 3.1821 LearningRate 0.0001 Epoch: 31 Global Step: 661610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:36,409-Speed 6289.17 samples/sec Loss 3.2042 LearningRate 0.0001 Epoch: 31 Global Step: 661620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:39,730-Speed 6168.63 samples/sec Loss 3.1275 LearningRate 0.0001 Epoch: 31 Global Step: 661630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:43,026-Speed 6214.99 samples/sec Loss 3.1658 LearningRate 0.0001 Epoch: 31 Global Step: 661640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:46,272-Speed 6310.43 samples/sec Loss 3.1348 LearningRate 0.0001 Epoch: 31 Global Step: 661650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:49,517-Speed 6312.16 samples/sec Loss 3.1422 LearningRate 0.0001 Epoch: 31 Global Step: 661660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:52,760-Speed 6317.13 samples/sec Loss 3.1808 LearningRate 0.0001 Epoch: 31 Global Step: 661670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:56,006-Speed 6311.96 samples/sec Loss 3.2093 LearningRate 0.0001 Epoch: 31 Global Step: 661680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:54:59,256-Speed 6303.01 samples/sec Loss 3.1387 LearningRate 0.0001 Epoch: 31 Global Step: 661690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:02,518-Speed 6279.51 samples/sec Loss 3.1855 LearningRate 0.0001 Epoch: 31 Global Step: 661700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:05,765-Speed 6309.46 samples/sec Loss 3.1102 LearningRate 0.0001 Epoch: 31 Global Step: 661710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:09,011-Speed 6309.55 samples/sec Loss 3.1361 LearningRate 0.0001 Epoch: 31 Global Step: 661720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:12,254-Speed 6317.57 samples/sec Loss 
3.1344 LearningRate 0.0001 Epoch: 31 Global Step: 661730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:15,504-Speed 6302.79 samples/sec Loss 3.1231 LearningRate 0.0001 Epoch: 31 Global Step: 661740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:18,754-Speed 6303.52 samples/sec Loss 3.1766 LearningRate 0.0001 Epoch: 31 Global Step: 661750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:21,999-Speed 6311.57 samples/sec Loss 3.1685 LearningRate 0.0001 Epoch: 31 Global Step: 661760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:25,243-Speed 6314.83 samples/sec Loss 3.1371 LearningRate 0.0001 Epoch: 31 Global Step: 661770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:28,490-Speed 6309.36 samples/sec Loss 3.1981 LearningRate 0.0001 Epoch: 31 Global Step: 661780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:31,739-Speed 6304.92 samples/sec Loss 3.1533 LearningRate 0.0001 Epoch: 31 Global Step: 661790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:34,973-Speed 6334.16 samples/sec Loss 3.1887 LearningRate 0.0001 Epoch: 31 Global Step: 661800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:38,218-Speed 6312.02 samples/sec Loss 3.1677 LearningRate 0.0001 Epoch: 31 Global Step: 661810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:41,471-Speed 6296.56 samples/sec Loss 3.1479 LearningRate 0.0001 Epoch: 31 Global Step: 661820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:44,717-Speed 6310.60 samples/sec Loss 3.1495 LearningRate 0.0001 Epoch: 31 Global Step: 661830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:47,971-Speed 6296.62 samples/sec Loss 3.1823 LearningRate 0.0001 Epoch: 31 Global Step: 661840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:51,237-Speed 6271.16 samples/sec Loss 3.0743 LearningRate 0.0001 Epoch: 31 Global 
Step: 661850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:54,491-Speed 6294.62 samples/sec Loss 3.2314 LearningRate 0.0001 Epoch: 31 Global Step: 661860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:55:57,738-Speed 6308.68 samples/sec Loss 3.1479 LearningRate 0.0001 Epoch: 31 Global Step: 661870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:00,987-Speed 6305.61 samples/sec Loss 3.1360 LearningRate 0.0001 Epoch: 31 Global Step: 661880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:04,237-Speed 6303.08 samples/sec Loss 3.1465 LearningRate 0.0001 Epoch: 31 Global Step: 661890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:07,471-Speed 6335.26 samples/sec Loss 3.1391 LearningRate 0.0001 Epoch: 31 Global Step: 661900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:10,717-Speed 6309.45 samples/sec Loss 3.1430 LearningRate 0.0001 Epoch: 31 Global Step: 661910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:13,967-Speed 6303.04 samples/sec Loss 3.0988 LearningRate 0.0001 Epoch: 31 Global Step: 661920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:17,216-Speed 6305.77 samples/sec Loss 3.1653 LearningRate 0.0001 Epoch: 31 Global Step: 661930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:20,462-Speed 6311.57 samples/sec Loss 3.1383 LearningRate 0.0001 Epoch: 31 Global Step: 661940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:23,712-Speed 6303.13 samples/sec Loss 3.1764 LearningRate 0.0001 Epoch: 31 Global Step: 661950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:26,965-Speed 6296.48 samples/sec Loss 3.1775 LearningRate 0.0001 Epoch: 31 Global Step: 661960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:30,214-Speed 6305.28 samples/sec Loss 3.0976 LearningRate 0.0001 Epoch: 31 Global Step: 661970 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 03:56:33,465-Speed 6300.41 samples/sec Loss 3.1442 LearningRate 0.0001 Epoch: 31 Global Step: 661980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:36,724-Speed 6285.48 samples/sec Loss 3.2036 LearningRate 0.0001 Epoch: 31 Global Step: 661990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:39,956-Speed 6338.59 samples/sec Loss 3.1140 LearningRate 0.0001 Epoch: 31 Global Step: 662000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:43,207-Speed 6300.04 samples/sec Loss 3.1247 LearningRate 0.0001 Epoch: 31 Global Step: 662010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:46,455-Speed 6307.16 samples/sec Loss 3.1801 LearningRate 0.0001 Epoch: 31 Global Step: 662020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:49,704-Speed 6304.15 samples/sec Loss 3.1300 LearningRate 0.0001 Epoch: 31 Global Step: 662030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:52,960-Speed 6292.66 samples/sec Loss 3.1791 LearningRate 0.0001 Epoch: 31 Global Step: 662040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:56:56,191-Speed 6340.23 samples/sec Loss 3.1685 LearningRate 0.0001 Epoch: 31 Global Step: 662050 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:56:59,437-Speed 6310.46 samples/sec Loss 3.1232 LearningRate 0.0001 Epoch: 31 Global Step: 662060 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:02,682-Speed 6311.71 samples/sec Loss 3.1471 LearningRate 0.0001 Epoch: 31 Global Step: 662070 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:05,938-Speed 6292.07 samples/sec Loss 3.1730 LearningRate 0.0001 Epoch: 31 Global Step: 662080 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:09,189-Speed 6300.39 samples/sec Loss 3.1044 LearningRate 0.0001 Epoch: 31 Global Step: 662090 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 
03:57:12,436-Speed 6309.23 samples/sec Loss 3.2174 LearningRate 0.0001 Epoch: 31 Global Step: 662100 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:15,687-Speed 6300.79 samples/sec Loss 3.1332 LearningRate 0.0001 Epoch: 31 Global Step: 662110 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:18,933-Speed 6309.68 samples/sec Loss 3.1610 LearningRate 0.0001 Epoch: 31 Global Step: 662120 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:22,181-Speed 6308.79 samples/sec Loss 3.1560 LearningRate 0.0001 Epoch: 31 Global Step: 662130 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:25,478-Speed 6212.21 samples/sec Loss 3.0844 LearningRate 0.0001 Epoch: 31 Global Step: 662140 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 03:57:28,727-Speed 6305.64 samples/sec Loss 3.1692 LearningRate 0.0001 Epoch: 31 Global Step: 662150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:31,974-Speed 6308.32 samples/sec Loss 3.0910 LearningRate 0.0001 Epoch: 31 Global Step: 662160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:35,226-Speed 6300.26 samples/sec Loss 3.1922 LearningRate 0.0001 Epoch: 31 Global Step: 662170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:38,472-Speed 6309.94 samples/sec Loss 3.2678 LearningRate 0.0001 Epoch: 31 Global Step: 662180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:41,715-Speed 6318.00 samples/sec Loss 3.1460 LearningRate 0.0001 Epoch: 31 Global Step: 662190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:44,962-Speed 6307.39 samples/sec Loss 3.2032 LearningRate 0.0001 Epoch: 31 Global Step: 662200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:48,210-Speed 6307.93 samples/sec Loss 3.1536 LearningRate 0.0001 Epoch: 31 Global Step: 662210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:51,457-Speed 6308.18 samples/sec Loss 
3.1956 LearningRate 0.0001 Epoch: 31 Global Step: 662220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:54,706-Speed 6305.35 samples/sec Loss 3.1524 LearningRate 0.0001 Epoch: 31 Global Step: 662230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:57:57,963-Speed 6288.23 samples/sec Loss 3.1468 LearningRate 0.0001 Epoch: 31 Global Step: 662240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:01,203-Speed 6322.39 samples/sec Loss 3.1931 LearningRate 0.0001 Epoch: 31 Global Step: 662250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:04,457-Speed 6296.25 samples/sec Loss 3.1004 LearningRate 0.0001 Epoch: 31 Global Step: 662260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:07,706-Speed 6304.15 samples/sec Loss 3.1892 LearningRate 0.0001 Epoch: 31 Global Step: 662270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:10,954-Speed 6306.21 samples/sec Loss 3.1152 LearningRate 0.0001 Epoch: 31 Global Step: 662280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:14,203-Speed 6305.35 samples/sec Loss 3.1009 LearningRate 0.0001 Epoch: 31 Global Step: 662290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:17,452-Speed 6305.47 samples/sec Loss 3.1904 LearningRate 0.0001 Epoch: 31 Global Step: 662300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:20,699-Speed 6309.13 samples/sec Loss 3.1352 LearningRate 0.0001 Epoch: 31 Global Step: 662310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:23,960-Speed 6280.77 samples/sec Loss 3.1300 LearningRate 0.0001 Epoch: 31 Global Step: 662320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:27,207-Speed 6308.54 samples/sec Loss 3.1481 LearningRate 0.0001 Epoch: 31 Global Step: 662330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:30,450-Speed 6316.08 samples/sec Loss 3.1538 LearningRate 0.0001 Epoch: 31 Global 
Step: 662340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:33,678-Speed 6345.72 samples/sec Loss 3.1473 LearningRate 0.0001 Epoch: 31 Global Step: 662350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:36,928-Speed 6304.08 samples/sec Loss 3.1863 LearningRate 0.0001 Epoch: 31 Global Step: 662360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:40,172-Speed 6315.62 samples/sec Loss 3.1201 LearningRate 0.0001 Epoch: 31 Global Step: 662370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:43,418-Speed 6309.50 samples/sec Loss 3.1269 LearningRate 0.0001 Epoch: 31 Global Step: 662380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:46,668-Speed 6304.67 samples/sec Loss 3.1659 LearningRate 0.0001 Epoch: 31 Global Step: 662390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:49,914-Speed 6309.70 samples/sec Loss 3.1463 LearningRate 0.0001 Epoch: 31 Global Step: 662400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:53,163-Speed 6305.83 samples/sec Loss 3.1555 LearningRate 0.0001 Epoch: 31 Global Step: 662410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:56,407-Speed 6314.07 samples/sec Loss 3.0985 LearningRate 0.0001 Epoch: 31 Global Step: 662420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:58:59,655-Speed 6306.60 samples/sec Loss 3.1755 LearningRate 0.0001 Epoch: 31 Global Step: 662430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:02,934-Speed 6248.05 samples/sec Loss 3.1340 LearningRate 0.0001 Epoch: 31 Global Step: 662440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:06,247-Speed 6182.17 samples/sec Loss 3.1663 LearningRate 0.0001 Epoch: 31 Global Step: 662450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:09,493-Speed 6312.01 samples/sec Loss 3.1226 LearningRate 0.0001 Epoch: 31 Global Step: 662460 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 03:59:12,744-Speed 6299.33 samples/sec Loss 3.1598 LearningRate 0.0001 Epoch: 31 Global Step: 662470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:15,992-Speed 6308.15 samples/sec Loss 3.1239 LearningRate 0.0001 Epoch: 31 Global Step: 662480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:19,240-Speed 6305.34 samples/sec Loss 3.1161 LearningRate 0.0001 Epoch: 31 Global Step: 662490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:22,488-Speed 6306.80 samples/sec Loss 3.1337 LearningRate 0.0001 Epoch: 31 Global Step: 662500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:25,737-Speed 6305.90 samples/sec Loss 3.1853 LearningRate 0.0001 Epoch: 31 Global Step: 662510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:28,984-Speed 6308.69 samples/sec Loss 3.1219 LearningRate 0.0001 Epoch: 31 Global Step: 662520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:32,232-Speed 6307.42 samples/sec Loss 3.1749 LearningRate 0.0001 Epoch: 31 Global Step: 662530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:35,476-Speed 6313.65 samples/sec Loss 3.1648 LearningRate 0.0001 Epoch: 31 Global Step: 662540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:38,706-Speed 6341.54 samples/sec Loss 3.1554 LearningRate 0.0001 Epoch: 31 Global Step: 662550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:41,958-Speed 6298.79 samples/sec Loss 3.1444 LearningRate 0.0001 Epoch: 31 Global Step: 662560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:45,208-Speed 6303.41 samples/sec Loss 3.1772 LearningRate 0.0001 Epoch: 31 Global Step: 662570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:48,459-Speed 6301.58 samples/sec Loss 3.1623 LearningRate 0.0001 Epoch: 31 Global Step: 662580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
03:59:51,714-Speed 6293.19 samples/sec Loss 3.1494 LearningRate 0.0001 Epoch: 31 Global Step: 662590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:54,963-Speed 6306.48 samples/sec Loss 3.1481 LearningRate 0.0001 Epoch: 31 Global Step: 662600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 03:59:58,208-Speed 6310.89 samples/sec Loss 3.2294 LearningRate 0.0001 Epoch: 31 Global Step: 662610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:01,459-Speed 6302.85 samples/sec Loss 3.1523 LearningRate 0.0000 Epoch: 31 Global Step: 662620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:04,714-Speed 6292.50 samples/sec Loss 3.1518 LearningRate 0.0000 Epoch: 31 Global Step: 662630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:07,960-Speed 6311.82 samples/sec Loss 3.1200 LearningRate 0.0000 Epoch: 31 Global Step: 662640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:11,204-Speed 6314.03 samples/sec Loss 3.1600 LearningRate 0.0000 Epoch: 31 Global Step: 662650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:00:14,443-Speed 6323.13 samples/sec Loss 3.1569 LearningRate 0.0000 Epoch: 31 Global Step: 662660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:17,689-Speed 6312.25 samples/sec Loss 3.1660 LearningRate 0.0000 Epoch: 31 Global Step: 662670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:20,938-Speed 6303.63 samples/sec Loss 3.0768 LearningRate 0.0000 Epoch: 31 Global Step: 662680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:24,184-Speed 6311.30 samples/sec Loss 3.1385 LearningRate 0.0000 Epoch: 31 Global Step: 662690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:27,429-Speed 6312.91 samples/sec Loss 3.1577 LearningRate 0.0000 Epoch: 31 Global Step: 662700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:30,678-Speed 6305.38 samples/sec 
Loss 3.1521 LearningRate 0.0000 Epoch: 31 Global Step: 662710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:33,925-Speed 6307.54 samples/sec Loss 3.1819 LearningRate 0.0000 Epoch: 31 Global Step: 662720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:37,170-Speed 6313.59 samples/sec Loss 3.1763 LearningRate 0.0000 Epoch: 31 Global Step: 662730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:40,416-Speed 6309.92 samples/sec Loss 3.2166 LearningRate 0.0000 Epoch: 31 Global Step: 662740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:43,659-Speed 6317.48 samples/sec Loss 3.1159 LearningRate 0.0000 Epoch: 31 Global Step: 662750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:46,896-Speed 6328.24 samples/sec Loss 3.1048 LearningRate 0.0000 Epoch: 31 Global Step: 662760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:50,143-Speed 6308.14 samples/sec Loss 3.1663 LearningRate 0.0000 Epoch: 31 Global Step: 662770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:53,395-Speed 6299.76 samples/sec Loss 3.1375 LearningRate 0.0000 Epoch: 31 Global Step: 662780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:56,642-Speed 6307.78 samples/sec Loss 3.1456 LearningRate 0.0000 Epoch: 31 Global Step: 662790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:00:59,899-Speed 6289.96 samples/sec Loss 3.1423 LearningRate 0.0000 Epoch: 31 Global Step: 662800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:03,151-Speed 6300.25 samples/sec Loss 3.0800 LearningRate 0.0000 Epoch: 31 Global Step: 662810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:06,401-Speed 6302.83 samples/sec Loss 3.1528 LearningRate 0.0000 Epoch: 31 Global Step: 662820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:09,653-Speed 6299.02 samples/sec Loss 3.1002 LearningRate 0.0000 Epoch: 31 
Global Step: 662830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:12,896-Speed 6315.25 samples/sec Loss 3.1747 LearningRate 0.0000 Epoch: 31 Global Step: 662840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:16,143-Speed 6308.97 samples/sec Loss 3.1696 LearningRate 0.0000 Epoch: 31 Global Step: 662850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:19,374-Speed 6340.88 samples/sec Loss 3.1858 LearningRate 0.0000 Epoch: 31 Global Step: 662860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:22,619-Speed 6312.98 samples/sec Loss 3.1453 LearningRate 0.0000 Epoch: 31 Global Step: 662870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:25,864-Speed 6311.41 samples/sec Loss 3.1391 LearningRate 0.0000 Epoch: 31 Global Step: 662880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:29,122-Speed 6288.36 samples/sec Loss 3.1828 LearningRate 0.0000 Epoch: 31 Global Step: 662890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:32,375-Speed 6297.00 samples/sec Loss 3.2036 LearningRate 0.0000 Epoch: 31 Global Step: 662900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:35,625-Speed 6302.33 samples/sec Loss 3.1586 LearningRate 0.0000 Epoch: 31 Global Step: 662910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:38,875-Speed 6303.35 samples/sec Loss 3.1101 LearningRate 0.0000 Epoch: 31 Global Step: 662920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:42,131-Speed 6291.82 samples/sec Loss 3.1223 LearningRate 0.0000 Epoch: 31 Global Step: 662930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:45,379-Speed 6306.82 samples/sec Loss 3.1437 LearningRate 0.0000 Epoch: 31 Global Step: 662940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:48,625-Speed 6310.59 samples/sec Loss 3.1603 LearningRate 0.0000 Epoch: 31 Global Step: 662950 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:01:51,858-Speed 6334.78 samples/sec Loss 3.1530 LearningRate 0.0000 Epoch: 31 Global Step: 662960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:55,107-Speed 6305.72 samples/sec Loss 3.1020 LearningRate 0.0000 Epoch: 31 Global Step: 662970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:01:58,354-Speed 6308.84 samples/sec Loss 3.1317 LearningRate 0.0000 Epoch: 31 Global Step: 662980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:01,603-Speed 6304.78 samples/sec Loss 3.1457 LearningRate 0.0000 Epoch: 31 Global Step: 662990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:04,859-Speed 6291.40 samples/sec Loss 3.0923 LearningRate 0.0000 Epoch: 31 Global Step: 663000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:08,109-Speed 6302.82 samples/sec Loss 3.1175 LearningRate 0.0000 Epoch: 31 Global Step: 663010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:11,354-Speed 6313.96 samples/sec Loss 3.1580 LearningRate 0.0000 Epoch: 31 Global Step: 663020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:14,602-Speed 6306.35 samples/sec Loss 3.1456 LearningRate 0.0000 Epoch: 31 Global Step: 663030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:17,850-Speed 6306.44 samples/sec Loss 3.1539 LearningRate 0.0000 Epoch: 31 Global Step: 663040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:21,099-Speed 6306.61 samples/sec Loss 3.2098 LearningRate 0.0000 Epoch: 31 Global Step: 663050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:24,331-Speed 6336.58 samples/sec Loss 3.0963 LearningRate 0.0000 Epoch: 31 Global Step: 663060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:27,584-Speed 6297.69 samples/sec Loss 3.1838 LearningRate 0.0000 Epoch: 31 Global Step: 663070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:02:30,831-Speed 6309.06 samples/sec Loss 3.1261 LearningRate 0.0000 Epoch: 31 Global Step: 663080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:34,086-Speed 6294.05 samples/sec Loss 3.1821 LearningRate 0.0000 Epoch: 31 Global Step: 663090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:37,341-Speed 6292.52 samples/sec Loss 3.1622 LearningRate 0.0000 Epoch: 31 Global Step: 663100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:40,588-Speed 6308.12 samples/sec Loss 3.2029 LearningRate 0.0000 Epoch: 31 Global Step: 663110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:43,836-Speed 6307.41 samples/sec Loss 3.1247 LearningRate 0.0000 Epoch: 31 Global Step: 663120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:47,081-Speed 6312.52 samples/sec Loss 3.1452 LearningRate 0.0000 Epoch: 31 Global Step: 663130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:50,331-Speed 6303.91 samples/sec Loss 3.1715 LearningRate 0.0000 Epoch: 31 Global Step: 663140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:53,574-Speed 6315.90 samples/sec Loss 3.2120 LearningRate 0.0000 Epoch: 31 Global Step: 663150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:02:56,809-Speed 6332.03 samples/sec Loss 3.1843 LearningRate 0.0000 Epoch: 31 Global Step: 663160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:00,056-Speed 6308.55 samples/sec Loss 3.1709 LearningRate 0.0000 Epoch: 31 Global Step: 663170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:03,305-Speed 6305.80 samples/sec Loss 3.1225 LearningRate 0.0000 Epoch: 31 Global Step: 663180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:06,566-Speed 6279.92 samples/sec Loss 3.1490 LearningRate 0.0000 Epoch: 31 Global Step: 663190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:09,826-Speed 6284.76 samples/sec Loss 
3.0868 LearningRate 0.0000 Epoch: 31 Global Step: 663200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:13,075-Speed 6303.82 samples/sec Loss 3.1259 LearningRate 0.0000 Epoch: 31 Global Step: 663210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:16,323-Speed 6306.99 samples/sec Loss 3.0957 LearningRate 0.0000 Epoch: 31 Global Step: 663220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:19,571-Speed 6306.62 samples/sec Loss 3.2011 LearningRate 0.0000 Epoch: 31 Global Step: 663230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:22,821-Speed 6304.79 samples/sec Loss 3.1199 LearningRate 0.0000 Epoch: 31 Global Step: 663240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:26,067-Speed 6311.31 samples/sec Loss 3.1623 LearningRate 0.0000 Epoch: 31 Global Step: 663250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:29,303-Speed 6329.77 samples/sec Loss 3.1453 LearningRate 0.0000 Epoch: 31 Global Step: 663260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:32,552-Speed 6305.79 samples/sec Loss 3.1500 LearningRate 0.0000 Epoch: 31 Global Step: 663270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:35,803-Speed 6300.78 samples/sec Loss 3.1403 LearningRate 0.0000 Epoch: 31 Global Step: 663280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:39,055-Speed 6298.00 samples/sec Loss 3.1812 LearningRate 0.0000 Epoch: 31 Global Step: 663290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:42,301-Speed 6311.27 samples/sec Loss 3.1760 LearningRate 0.0000 Epoch: 31 Global Step: 663300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:45,546-Speed 6311.72 samples/sec Loss 3.1601 LearningRate 0.0000 Epoch: 31 Global Step: 663310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:48,790-Speed 6314.31 samples/sec Loss 3.1971 LearningRate 0.0000 Epoch: 31 Global 
Step: 663320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:52,036-Speed 6312.55 samples/sec Loss 3.1183 LearningRate 0.0000 Epoch: 31 Global Step: 663330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:55,282-Speed 6309.47 samples/sec Loss 3.1878 LearningRate 0.0000 Epoch: 31 Global Step: 663340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:03:58,530-Speed 6306.52 samples/sec Loss 3.1471 LearningRate 0.0000 Epoch: 31 Global Step: 663350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:01,771-Speed 6321.40 samples/sec Loss 3.1564 LearningRate 0.0000 Epoch: 31 Global Step: 663360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:05,023-Speed 6299.45 samples/sec Loss 3.1679 LearningRate 0.0000 Epoch: 31 Global Step: 663370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:08,272-Speed 6304.62 samples/sec Loss 3.1455 LearningRate 0.0000 Epoch: 31 Global Step: 663380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:11,521-Speed 6304.13 samples/sec Loss 3.1336 LearningRate 0.0000 Epoch: 31 Global Step: 663390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:14,769-Speed 6306.30 samples/sec Loss 3.1572 LearningRate 0.0000 Epoch: 31 Global Step: 663400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:18,014-Speed 6313.87 samples/sec Loss 3.1216 LearningRate 0.0000 Epoch: 31 Global Step: 663410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:21,260-Speed 6310.44 samples/sec Loss 3.1288 LearningRate 0.0000 Epoch: 31 Global Step: 663420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:24,511-Speed 6300.38 samples/sec Loss 3.1289 LearningRate 0.0000 Epoch: 31 Global Step: 663430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:27,761-Speed 6302.94 samples/sec Loss 3.2188 LearningRate 0.0000 Epoch: 31 Global Step: 663440 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:04:31,007-Speed 6311.36 samples/sec Loss 3.0640 LearningRate 0.0000 Epoch: 31 Global Step: 663450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:04:34,250-Speed 6316.91 samples/sec Loss 3.1478 LearningRate 0.0000 Epoch: 31 Global Step: 663460 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:04:37,468-Speed 6366.34 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 31 Global Step: 663470 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:40,715-Speed 6307.66 samples/sec Loss 3.1573 LearningRate 0.0000 Epoch: 31 Global Step: 663480 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:43,962-Speed 6310.38 samples/sec Loss 3.2313 LearningRate 0.0000 Epoch: 31 Global Step: 663490 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:47,208-Speed 6309.18 samples/sec Loss 3.1276 LearningRate 0.0000 Epoch: 31 Global Step: 663500 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:50,452-Speed 6315.62 samples/sec Loss 3.1266 LearningRate 0.0000 Epoch: 31 Global Step: 663510 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:53,704-Speed 6298.12 samples/sec Loss 3.1346 LearningRate 0.0000 Epoch: 31 Global Step: 663520 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:04:56,949-Speed 6313.91 samples/sec Loss 3.1865 LearningRate 0.0000 Epoch: 31 Global Step: 663530 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:05:00,195-Speed 6309.67 samples/sec Loss 3.1668 LearningRate 0.0000 Epoch: 31 Global Step: 663540 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:05:03,518-Speed 6164.23 samples/sec Loss 3.1945 LearningRate 0.0000 Epoch: 31 Global Step: 663550 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:05:06,786-Speed 6268.79 samples/sec Loss 3.1618 LearningRate 0.0000 Epoch: 31 Global Step: 663560 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 
04:05:10,040-Speed 6294.88 samples/sec Loss 3.1331 LearningRate 0.0000 Epoch: 31 Global Step: 663570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:13,290-Speed 6302.75 samples/sec Loss 3.1502 LearningRate 0.0000 Epoch: 31 Global Step: 663580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:16,541-Speed 6301.70 samples/sec Loss 3.1585 LearningRate 0.0000 Epoch: 31 Global Step: 663590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:19,798-Speed 6289.37 samples/sec Loss 3.1477 LearningRate 0.0000 Epoch: 31 Global Step: 663600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:23,058-Speed 6283.37 samples/sec Loss 3.0938 LearningRate 0.0000 Epoch: 31 Global Step: 663610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:26,318-Speed 6283.88 samples/sec Loss 3.1683 LearningRate 0.0000 Epoch: 31 Global Step: 663620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:29,564-Speed 6310.99 samples/sec Loss 3.1220 LearningRate 0.0000 Epoch: 31 Global Step: 663630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:32,814-Speed 6303.05 samples/sec Loss 3.1529 LearningRate 0.0000 Epoch: 31 Global Step: 663640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:36,064-Speed 6302.01 samples/sec Loss 3.1595 LearningRate 0.0000 Epoch: 31 Global Step: 663650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:05:39,313-Speed 6305.20 samples/sec Loss 3.1255 LearningRate 0.0000 Epoch: 31 Global Step: 663660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:39,765-Speed 338.78 samples/sec Loss 3.1700 LearningRate 0.0000 Epoch: 32 Global Step: 663670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:43,002-Speed 6328.00 samples/sec Loss 3.1339 LearningRate 0.0000 Epoch: 32 Global Step: 663680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:46,253-Speed 6301.13 samples/sec Loss 
3.1362 LearningRate 0.0000 Epoch: 32 Global Step: 663690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:49,490-Speed 6328.97 samples/sec Loss 3.2266 LearningRate 0.0000 Epoch: 32 Global Step: 663700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:52,725-Speed 6332.61 samples/sec Loss 3.2008 LearningRate 0.0000 Epoch: 32 Global Step: 663710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:55,960-Speed 6333.51 samples/sec Loss 3.2084 LearningRate 0.0000 Epoch: 32 Global Step: 663720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:06:59,199-Speed 6323.61 samples/sec Loss 3.1612 LearningRate 0.0000 Epoch: 32 Global Step: 663730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:02,437-Speed 6326.17 samples/sec Loss 3.1470 LearningRate 0.0000 Epoch: 32 Global Step: 663740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:05,680-Speed 6317.37 samples/sec Loss 3.1064 LearningRate 0.0000 Epoch: 32 Global Step: 663750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:08,917-Speed 6327.36 samples/sec Loss 3.1517 LearningRate 0.0000 Epoch: 32 Global Step: 663760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:12,139-Speed 6357.04 samples/sec Loss 3.1360 LearningRate 0.0000 Epoch: 32 Global Step: 663770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:15,377-Speed 6327.05 samples/sec Loss 3.0671 LearningRate 0.0000 Epoch: 32 Global Step: 663780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:18,613-Speed 6329.56 samples/sec Loss 3.1161 LearningRate 0.0000 Epoch: 32 Global Step: 663790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:21,854-Speed 6320.64 samples/sec Loss 3.1248 LearningRate 0.0000 Epoch: 32 Global Step: 663800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:25,095-Speed 6320.04 samples/sec Loss 3.2166 LearningRate 0.0000 Epoch: 32 Global 
Step: 663810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:28,334-Speed 6324.48 samples/sec Loss 3.1285 LearningRate 0.0000 Epoch: 32 Global Step: 663820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:31,573-Speed 6324.76 samples/sec Loss 3.1568 LearningRate 0.0000 Epoch: 32 Global Step: 663830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:34,838-Speed 6274.49 samples/sec Loss 3.1832 LearningRate 0.0000 Epoch: 32 Global Step: 663840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:38,078-Speed 6321.91 samples/sec Loss 3.1170 LearningRate 0.0000 Epoch: 32 Global Step: 663850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:41,319-Speed 6321.66 samples/sec Loss 3.1700 LearningRate 0.0000 Epoch: 32 Global Step: 663860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:44,561-Speed 6317.05 samples/sec Loss 3.0766 LearningRate 0.0000 Epoch: 32 Global Step: 663870 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:07:47,788-Speed 6348.81 samples/sec Loss 3.1424 LearningRate 0.0000 Epoch: 32 Global Step: 663880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:51,029-Speed 6319.10 samples/sec Loss 3.1338 LearningRate 0.0000 Epoch: 32 Global Step: 663890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:54,271-Speed 6319.80 samples/sec Loss 3.1404 LearningRate 0.0000 Epoch: 32 Global Step: 663900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:07:57,520-Speed 6305.95 samples/sec Loss 3.1542 LearningRate 0.0000 Epoch: 32 Global Step: 663910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:00,758-Speed 6326.25 samples/sec Loss 3.1384 LearningRate 0.0000 Epoch: 32 Global Step: 663920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:04,005-Speed 6308.81 samples/sec Loss 3.1715 LearningRate 0.0000 Epoch: 32 Global Step: 663930 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:08:07,246-Speed 6320.15 samples/sec Loss 3.1658 LearningRate 0.0000 Epoch: 32 Global Step: 663940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:10,488-Speed 6319.70 samples/sec Loss 3.1499 LearningRate 0.0000 Epoch: 32 Global Step: 663950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:13,736-Speed 6305.63 samples/sec Loss 3.1003 LearningRate 0.0000 Epoch: 32 Global Step: 663960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:16,981-Speed 6313.89 samples/sec Loss 3.1326 LearningRate 0.0000 Epoch: 32 Global Step: 663970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:20,205-Speed 6353.75 samples/sec Loss 3.0796 LearningRate 0.0000 Epoch: 32 Global Step: 663980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:23,443-Speed 6326.10 samples/sec Loss 3.1110 LearningRate 0.0000 Epoch: 32 Global Step: 663990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:26,685-Speed 6317.61 samples/sec Loss 3.0954 LearningRate 0.0000 Epoch: 32 Global Step: 664000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:29,923-Speed 6327.60 samples/sec Loss 3.0745 LearningRate 0.0000 Epoch: 32 Global Step: 664010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:33,161-Speed 6325.91 samples/sec Loss 3.1109 LearningRate 0.0000 Epoch: 32 Global Step: 664020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:36,398-Speed 6328.07 samples/sec Loss 3.2051 LearningRate 0.0000 Epoch: 32 Global Step: 664030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:39,637-Speed 6325.26 samples/sec Loss 3.1820 LearningRate 0.0000 Epoch: 32 Global Step: 664040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:42,873-Speed 6329.74 samples/sec Loss 3.1447 LearningRate 0.0000 Epoch: 32 Global Step: 664050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:08:46,121-Speed 6308.39 samples/sec Loss 3.0816 LearningRate 0.0000 Epoch: 32 Global Step: 664060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:08:49,346-Speed 6351.32 samples/sec Loss 3.1365 LearningRate 0.0000 Epoch: 32 Global Step: 664070 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:08:52,580-Speed 6334.37 samples/sec Loss 3.1106 LearningRate 0.0000 Epoch: 32 Global Step: 664080 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:08:55,818-Speed 6325.98 samples/sec Loss 3.1205 LearningRate 0.0000 Epoch: 32 Global Step: 664090 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:08:59,053-Speed 6332.08 samples/sec Loss 3.1092 LearningRate 0.0000 Epoch: 32 Global Step: 664100 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:02,289-Speed 6330.71 samples/sec Loss 3.1346 LearningRate 0.0000 Epoch: 32 Global Step: 664110 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:05,527-Speed 6325.46 samples/sec Loss 3.1353 LearningRate 0.0000 Epoch: 32 Global Step: 664120 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:08,784-Speed 6291.48 samples/sec Loss 3.1194 LearningRate 0.0000 Epoch: 32 Global Step: 664130 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:12,028-Speed 6314.41 samples/sec Loss 3.1363 LearningRate 0.0000 Epoch: 32 Global Step: 664140 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:15,269-Speed 6321.29 samples/sec Loss 3.1399 LearningRate 0.0000 Epoch: 32 Global Step: 664150 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:18,506-Speed 6328.22 samples/sec Loss 3.1621 LearningRate 0.0000 Epoch: 32 Global Step: 664160 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:09:21,741-Speed 6331.75 samples/sec Loss 3.0989 LearningRate 0.0000 Epoch: 32 Global Step: 664170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:24,978-Speed 6328.41 samples/sec Loss 
3.1055 LearningRate 0.0000 Epoch: 32 Global Step: 664180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:28,214-Speed 6329.63 samples/sec Loss 3.1736 LearningRate 0.0000 Epoch: 32 Global Step: 664190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:31,452-Speed 6325.84 samples/sec Loss 3.1330 LearningRate 0.0000 Epoch: 32 Global Step: 664200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:34,693-Speed 6321.47 samples/sec Loss 3.1792 LearningRate 0.0000 Epoch: 32 Global Step: 664210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:37,936-Speed 6316.56 samples/sec Loss 3.0981 LearningRate 0.0000 Epoch: 32 Global Step: 664220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:41,178-Speed 6318.53 samples/sec Loss 3.1444 LearningRate 0.0000 Epoch: 32 Global Step: 664230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:44,418-Speed 6322.44 samples/sec Loss 3.0821 LearningRate 0.0000 Epoch: 32 Global Step: 664240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:47,654-Speed 6329.10 samples/sec Loss 3.1581 LearningRate 0.0000 Epoch: 32 Global Step: 664250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:50,892-Speed 6326.61 samples/sec Loss 3.0930 LearningRate 0.0000 Epoch: 32 Global Step: 664260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:54,115-Speed 6356.58 samples/sec Loss 3.1258 LearningRate 0.0000 Epoch: 32 Global Step: 664270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:09:57,354-Speed 6323.91 samples/sec Loss 3.1487 LearningRate 0.0000 Epoch: 32 Global Step: 664280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:00,604-Speed 6303.67 samples/sec Loss 3.1479 LearningRate 0.0000 Epoch: 32 Global Step: 664290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:03,847-Speed 6315.02 samples/sec Loss 3.1571 LearningRate 0.0000 Epoch: 32 Global 
Step: 664300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:07,088-Speed 6322.22 samples/sec Loss 3.0996 LearningRate 0.0000 Epoch: 32 Global Step: 664310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:10,326-Speed 6325.92 samples/sec Loss 3.2123 LearningRate 0.0000 Epoch: 32 Global Step: 664320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:13,570-Speed 6313.77 samples/sec Loss 3.1753 LearningRate 0.0000 Epoch: 32 Global Step: 664330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:16,812-Speed 6320.28 samples/sec Loss 3.1162 LearningRate 0.0000 Epoch: 32 Global Step: 664340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:20,050-Speed 6325.85 samples/sec Loss 3.1596 LearningRate 0.0000 Epoch: 32 Global Step: 664350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:23,291-Speed 6321.55 samples/sec Loss 3.0931 LearningRate 0.0000 Epoch: 32 Global Step: 664360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:26,516-Speed 6350.74 samples/sec Loss 3.1183 LearningRate 0.0000 Epoch: 32 Global Step: 664370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:29,759-Speed 6317.04 samples/sec Loss 3.1356 LearningRate 0.0000 Epoch: 32 Global Step: 664380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:32,996-Speed 6327.94 samples/sec Loss 3.1380 LearningRate 0.0000 Epoch: 32 Global Step: 664390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:36,242-Speed 6310.78 samples/sec Loss 3.0949 LearningRate 0.0000 Epoch: 32 Global Step: 664400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:39,481-Speed 6324.95 samples/sec Loss 3.1844 LearningRate 0.0000 Epoch: 32 Global Step: 664410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:42,719-Speed 6325.40 samples/sec Loss 3.1399 LearningRate 0.0000 Epoch: 32 Global Step: 664420 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:10:45,962-Speed 6319.57 samples/sec Loss 3.0891 LearningRate 0.0000 Epoch: 32 Global Step: 664430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:49,198-Speed 6330.00 samples/sec Loss 3.1960 LearningRate 0.0000 Epoch: 32 Global Step: 664440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:52,436-Speed 6326.75 samples/sec Loss 3.1763 LearningRate 0.0000 Epoch: 32 Global Step: 664450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:55,675-Speed 6324.51 samples/sec Loss 3.0915 LearningRate 0.0000 Epoch: 32 Global Step: 664460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:10:58,899-Speed 6353.80 samples/sec Loss 3.1566 LearningRate 0.0000 Epoch: 32 Global Step: 664470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:02,141-Speed 6318.68 samples/sec Loss 3.2153 LearningRate 0.0000 Epoch: 32 Global Step: 664480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:05,401-Speed 6282.62 samples/sec Loss 3.1425 LearningRate 0.0000 Epoch: 32 Global Step: 664490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:08,646-Speed 6313.20 samples/sec Loss 3.1454 LearningRate 0.0000 Epoch: 32 Global Step: 664500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:11,888-Speed 6318.29 samples/sec Loss 3.1290 LearningRate 0.0000 Epoch: 32 Global Step: 664510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:15,126-Speed 6326.70 samples/sec Loss 3.1219 LearningRate 0.0000 Epoch: 32 Global Step: 664520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:18,366-Speed 6322.97 samples/sec Loss 3.1498 LearningRate 0.0000 Epoch: 32 Global Step: 664530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:21,613-Speed 6308.21 samples/sec Loss 3.1106 LearningRate 0.0000 Epoch: 32 Global Step: 664540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:11:24,853-Speed 6322.91 samples/sec Loss 3.1346 LearningRate 0.0000 Epoch: 32 Global Step: 664550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:28,101-Speed 6306.34 samples/sec Loss 3.1883 LearningRate 0.0000 Epoch: 32 Global Step: 664560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:31,351-Speed 6303.91 samples/sec Loss 3.1406 LearningRate 0.0000 Epoch: 32 Global Step: 664570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:11:34,586-Speed 6332.34 samples/sec Loss 3.1215 LearningRate 0.0000 Epoch: 32 Global Step: 664580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:37,829-Speed 6316.26 samples/sec Loss 3.1244 LearningRate 0.0000 Epoch: 32 Global Step: 664590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:41,069-Speed 6322.96 samples/sec Loss 3.1887 LearningRate 0.0000 Epoch: 32 Global Step: 664600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:44,309-Speed 6322.22 samples/sec Loss 3.1061 LearningRate 0.0000 Epoch: 32 Global Step: 664610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:47,551-Speed 6319.40 samples/sec Loss 3.1110 LearningRate 0.0000 Epoch: 32 Global Step: 664620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:50,799-Speed 6305.81 samples/sec Loss 3.1205 LearningRate 0.0000 Epoch: 32 Global Step: 664630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:54,050-Speed 6301.42 samples/sec Loss 3.1690 LearningRate 0.0000 Epoch: 32 Global Step: 664640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:11:57,290-Speed 6321.79 samples/sec Loss 3.1566 LearningRate 0.0000 Epoch: 32 Global Step: 664650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:00,529-Speed 6323.58 samples/sec Loss 3.1206 LearningRate 0.0000 Epoch: 32 Global Step: 664660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:03,767-Speed 6326.86 samples/sec 
Loss 3.1148 LearningRate 0.0000 Epoch: 32 Global Step: 664670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:06,996-Speed 6344.57 samples/sec Loss 3.1202 LearningRate 0.0000 Epoch: 32 Global Step: 664680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:10,238-Speed 6317.84 samples/sec Loss 3.1288 LearningRate 0.0000 Epoch: 32 Global Step: 664690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:13,490-Speed 6300.59 samples/sec Loss 3.1656 LearningRate 0.0000 Epoch: 32 Global Step: 664700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:16,736-Speed 6309.51 samples/sec Loss 3.1417 LearningRate 0.0000 Epoch: 32 Global Step: 664710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:19,979-Speed 6316.03 samples/sec Loss 3.0997 LearningRate 0.0000 Epoch: 32 Global Step: 664720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:23,223-Speed 6314.58 samples/sec Loss 3.1439 LearningRate 0.0000 Epoch: 32 Global Step: 664730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:26,469-Speed 6311.29 samples/sec Loss 3.1229 LearningRate 0.0000 Epoch: 32 Global Step: 664740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:29,711-Speed 6318.17 samples/sec Loss 3.1026 LearningRate 0.0000 Epoch: 32 Global Step: 664750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:32,952-Speed 6320.91 samples/sec Loss 3.1032 LearningRate 0.0000 Epoch: 32 Global Step: 664760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:36,191-Speed 6324.65 samples/sec Loss 3.1061 LearningRate 0.0000 Epoch: 32 Global Step: 664770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:39,418-Speed 6349.00 samples/sec Loss 3.1443 LearningRate 0.0000 Epoch: 32 Global Step: 664780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:42,660-Speed 6317.40 samples/sec Loss 3.1216 LearningRate 0.0000 Epoch: 32 
Global Step: 664790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:45,901-Speed 6322.28 samples/sec Loss 3.1921 LearningRate 0.0000 Epoch: 32 Global Step: 664800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:49,139-Speed 6326.75 samples/sec Loss 3.1465 LearningRate 0.0000 Epoch: 32 Global Step: 664810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:52,381-Speed 6316.92 samples/sec Loss 3.1970 LearningRate 0.0000 Epoch: 32 Global Step: 664820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:55,624-Speed 6317.80 samples/sec Loss 3.1224 LearningRate 0.0000 Epoch: 32 Global Step: 664830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:12:58,864-Speed 6321.52 samples/sec Loss 3.1324 LearningRate 0.0000 Epoch: 32 Global Step: 664840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:02,110-Speed 6310.76 samples/sec Loss 3.2044 LearningRate 0.0000 Epoch: 32 Global Step: 664850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:05,351-Speed 6320.90 samples/sec Loss 3.1337 LearningRate 0.0000 Epoch: 32 Global Step: 664860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:08,596-Speed 6312.77 samples/sec Loss 3.1050 LearningRate 0.0000 Epoch: 32 Global Step: 664870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:11,846-Speed 6301.72 samples/sec Loss 3.1149 LearningRate 0.0000 Epoch: 32 Global Step: 664880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:15,117-Speed 6264.18 samples/sec Loss 3.1721 LearningRate 0.0000 Epoch: 32 Global Step: 664890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:18,360-Speed 6316.90 samples/sec Loss 3.1284 LearningRate 0.0000 Epoch: 32 Global Step: 664900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:13:21,600-Speed 6321.21 samples/sec Loss 3.1375 LearningRate 0.0000 Epoch: 32 Global Step: 664910 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:13:24,837-Speed 6329.03 samples/sec Loss 3.1318 LearningRate 0.0000 Epoch: 32 Global Step: 664920 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:28,105-Speed 6267.73 samples/sec Loss 3.1547 LearningRate 0.0000 Epoch: 32 Global Step: 664930 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:31,348-Speed 6316.00 samples/sec Loss 3.1088 LearningRate 0.0000 Epoch: 32 Global Step: 664940 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:34,612-Speed 6275.95 samples/sec Loss 3.1104 LearningRate 0.0000 Epoch: 32 Global Step: 664950 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:37,851-Speed 6325.83 samples/sec Loss 3.1287 LearningRate 0.0000 Epoch: 32 Global Step: 664960 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:41,091-Speed 6321.15 samples/sec Loss 3.0841 LearningRate 0.0000 Epoch: 32 Global Step: 664970 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:44,334-Speed 6317.50 samples/sec Loss 3.1276 LearningRate 0.0000 Epoch: 32 Global Step: 664980 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:47,578-Speed 6313.54 samples/sec Loss 3.1243 LearningRate 0.0000 Epoch: 32 Global Step: 664990 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:50,815-Speed 6329.83 samples/sec Loss 3.1415 LearningRate 0.0000 Epoch: 32 Global Step: 665000 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:54,050-Speed 6331.92 samples/sec Loss 3.1296 LearningRate 0.0000 Epoch: 32 Global Step: 665010 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:13:57,296-Speed 6310.18 samples/sec Loss 3.1531 LearningRate 0.0000 Epoch: 32 Global Step: 665020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:00,539-Speed 6316.99 samples/sec Loss 3.1471 LearningRate 0.0000 Epoch: 32 Global Step: 665030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:14:03,786-Speed 6312.17 samples/sec Loss 3.1723 LearningRate 0.0000 Epoch: 32 Global Step: 665040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:07,031-Speed 6312.79 samples/sec Loss 3.1056 LearningRate 0.0000 Epoch: 32 Global Step: 665050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:10,275-Speed 6314.21 samples/sec Loss 3.0406 LearningRate 0.0000 Epoch: 32 Global Step: 665060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:13,518-Speed 6317.10 samples/sec Loss 3.0950 LearningRate 0.0000 Epoch: 32 Global Step: 665070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:16,762-Speed 6315.56 samples/sec Loss 3.1041 LearningRate 0.0000 Epoch: 32 Global Step: 665080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:20,006-Speed 6315.17 samples/sec Loss 3.0892 LearningRate 0.0000 Epoch: 32 Global Step: 665090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:23,248-Speed 6317.45 samples/sec Loss 3.1124 LearningRate 0.0000 Epoch: 32 Global Step: 665100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:26,490-Speed 6318.77 samples/sec Loss 3.1333 LearningRate 0.0000 Epoch: 32 Global Step: 665110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:29,737-Speed 6307.96 samples/sec Loss 3.1024 LearningRate 0.0000 Epoch: 32 Global Step: 665120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:14:32,967-Speed 6344.04 samples/sec Loss 3.1357 LearningRate 0.0000 Epoch: 32 Global Step: 665130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:36,209-Speed 6317.37 samples/sec Loss 3.2028 LearningRate 0.0000 Epoch: 32 Global Step: 665140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:39,453-Speed 6314.52 samples/sec Loss 3.0719 LearningRate 0.0000 Epoch: 32 Global Step: 665150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:42,695-Speed 6318.78 samples/sec 
Loss 3.1607 LearningRate 0.0000 Epoch: 32 Global Step: 665160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:45,937-Speed 6319.07 samples/sec Loss 3.0784 LearningRate 0.0000 Epoch: 32 Global Step: 665170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:49,182-Speed 6313.58 samples/sec Loss 3.1595 LearningRate 0.0000 Epoch: 32 Global Step: 665180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:52,437-Speed 6291.73 samples/sec Loss 3.0918 LearningRate 0.0000 Epoch: 32 Global Step: 665190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:55,685-Speed 6308.47 samples/sec Loss 3.1030 LearningRate 0.0000 Epoch: 32 Global Step: 665200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:14:58,930-Speed 6312.60 samples/sec Loss 3.1462 LearningRate 0.0000 Epoch: 32 Global Step: 665210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:02,178-Speed 6306.35 samples/sec Loss 3.1697 LearningRate 0.0000 Epoch: 32 Global Step: 665220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:05,407-Speed 6344.19 samples/sec Loss 3.1452 LearningRate 0.0000 Epoch: 32 Global Step: 665230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:08,655-Speed 6306.23 samples/sec Loss 3.1277 LearningRate 0.0000 Epoch: 32 Global Step: 665240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:11,895-Speed 6322.70 samples/sec Loss 3.1230 LearningRate 0.0000 Epoch: 32 Global Step: 665250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:15,150-Speed 6294.23 samples/sec Loss 3.1383 LearningRate 0.0000 Epoch: 32 Global Step: 665260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:18,389-Speed 6323.67 samples/sec Loss 3.0952 LearningRate 0.0000 Epoch: 32 Global Step: 665270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:21,630-Speed 6320.85 samples/sec Loss 3.1369 LearningRate 0.0000 Epoch: 32 
Global Step: 665280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:24,914-Speed 6237.92 samples/sec Loss 3.1409 LearningRate 0.0000 Epoch: 32 Global Step: 665290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:28,171-Speed 6288.62 samples/sec Loss 3.1196 LearningRate 0.0000 Epoch: 32 Global Step: 665300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:31,417-Speed 6311.37 samples/sec Loss 3.0698 LearningRate 0.0000 Epoch: 32 Global Step: 665310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:34,659-Speed 6318.37 samples/sec Loss 3.0923 LearningRate 0.0000 Epoch: 32 Global Step: 665320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:37,888-Speed 6343.96 samples/sec Loss 3.0790 LearningRate 0.0000 Epoch: 32 Global Step: 665330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:41,133-Speed 6312.62 samples/sec Loss 3.0621 LearningRate 0.0000 Epoch: 32 Global Step: 665340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:44,372-Speed 6324.06 samples/sec Loss 3.1262 LearningRate 0.0000 Epoch: 32 Global Step: 665350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:47,617-Speed 6312.05 samples/sec Loss 3.1371 LearningRate 0.0000 Epoch: 32 Global Step: 665360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:50,869-Speed 6299.70 samples/sec Loss 3.1727 LearningRate 0.0000 Epoch: 32 Global Step: 665370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:54,120-Speed 6301.95 samples/sec Loss 3.1582 LearningRate 0.0000 Epoch: 32 Global Step: 665380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:15:57,363-Speed 6315.65 samples/sec Loss 3.2562 LearningRate 0.0000 Epoch: 32 Global Step: 665390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:00,606-Speed 6316.51 samples/sec Loss 3.1464 LearningRate 0.0000 Epoch: 32 Global Step: 665400 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:16:03,851-Speed 6312.65 samples/sec Loss 3.0994 LearningRate 0.0000 Epoch: 32 Global Step: 665410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:07,091-Speed 6322.66 samples/sec Loss 3.1799 LearningRate 0.0000 Epoch: 32 Global Step: 665420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:10,327-Speed 6331.52 samples/sec Loss 3.1353 LearningRate 0.0000 Epoch: 32 Global Step: 665430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:13,569-Speed 6318.57 samples/sec Loss 3.1012 LearningRate 0.0000 Epoch: 32 Global Step: 665440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:16,813-Speed 6314.92 samples/sec Loss 3.0912 LearningRate 0.0000 Epoch: 32 Global Step: 665450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:20,057-Speed 6314.62 samples/sec Loss 3.1390 LearningRate 0.0000 Epoch: 32 Global Step: 665460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:23,304-Speed 6307.97 samples/sec Loss 3.1996 LearningRate 0.0000 Epoch: 32 Global Step: 665470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:26,551-Speed 6309.07 samples/sec Loss 3.1320 LearningRate 0.0000 Epoch: 32 Global Step: 665480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:29,790-Speed 6324.70 samples/sec Loss 3.1178 LearningRate 0.0000 Epoch: 32 Global Step: 665490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:33,039-Speed 6303.61 samples/sec Loss 3.1122 LearningRate 0.0000 Epoch: 32 Global Step: 665500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:36,288-Speed 6304.31 samples/sec Loss 3.1160 LearningRate 0.0000 Epoch: 32 Global Step: 665510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:39,535-Speed 6309.38 samples/sec Loss 3.2112 LearningRate 0.0000 Epoch: 32 Global Step: 665520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:16:42,767-Speed 6338.78 samples/sec Loss 3.1267 LearningRate 0.0000 Epoch: 32 Global Step: 665530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:46,008-Speed 6319.17 samples/sec Loss 3.1399 LearningRate 0.0000 Epoch: 32 Global Step: 665540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:49,290-Speed 6242.12 samples/sec Loss 3.0893 LearningRate 0.0000 Epoch: 32 Global Step: 665550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:52,651-Speed 6095.50 samples/sec Loss 3.1615 LearningRate 0.0000 Epoch: 32 Global Step: 665560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:55,946-Speed 6216.13 samples/sec Loss 3.1816 LearningRate 0.0000 Epoch: 32 Global Step: 665570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:16:59,199-Speed 6296.23 samples/sec Loss 3.1078 LearningRate 0.0000 Epoch: 32 Global Step: 665580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:02,444-Speed 6313.34 samples/sec Loss 3.0945 LearningRate 0.0000 Epoch: 32 Global Step: 665590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:05,691-Speed 6309.25 samples/sec Loss 3.1914 LearningRate 0.0000 Epoch: 32 Global Step: 665600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:08,937-Speed 6310.46 samples/sec Loss 3.0887 LearningRate 0.0000 Epoch: 32 Global Step: 665610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:12,192-Speed 6294.97 samples/sec Loss 3.1071 LearningRate 0.0000 Epoch: 32 Global Step: 665620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:15,423-Speed 6340.14 samples/sec Loss 3.0496 LearningRate 0.0000 Epoch: 32 Global Step: 665630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:18,662-Speed 6323.61 samples/sec Loss 3.1396 LearningRate 0.0000 Epoch: 32 Global Step: 665640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:21,906-Speed 6315.60 samples/sec Loss 
3.1051 LearningRate 0.0000 Epoch: 32 Global Step: 665650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:25,151-Speed 6312.10 samples/sec Loss 3.1418 LearningRate 0.0000 Epoch: 32 Global Step: 665660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:28,394-Speed 6315.76 samples/sec Loss 3.0887 LearningRate 0.0000 Epoch: 32 Global Step: 665670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:31,636-Speed 6318.80 samples/sec Loss 3.0936 LearningRate 0.0000 Epoch: 32 Global Step: 665680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:34,884-Speed 6306.60 samples/sec Loss 3.1547 LearningRate 0.0000 Epoch: 32 Global Step: 665690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:38,126-Speed 6319.75 samples/sec Loss 3.1242 LearningRate 0.0000 Epoch: 32 Global Step: 665700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:41,370-Speed 6312.96 samples/sec Loss 3.1348 LearningRate 0.0000 Epoch: 32 Global Step: 665710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:44,616-Speed 6311.77 samples/sec Loss 3.1587 LearningRate 0.0000 Epoch: 32 Global Step: 665720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:17:47,845-Speed 6343.73 samples/sec Loss 3.1637 LearningRate 0.0000 Epoch: 32 Global Step: 665730 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:17:51,089-Speed 6315.70 samples/sec Loss 3.1256 LearningRate 0.0000 Epoch: 32 Global Step: 665740 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:17:54,329-Speed 6321.69 samples/sec Loss 3.0910 LearningRate 0.0000 Epoch: 32 Global Step: 665750 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:17:57,570-Speed 6319.15 samples/sec Loss 3.1209 LearningRate 0.0000 Epoch: 32 Global Step: 665760 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:00,819-Speed 6305.40 samples/sec Loss 3.1361 LearningRate 0.0000 Epoch: 32 Global 
Step: 665770 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:04,069-Speed 6302.65 samples/sec Loss 3.1781 LearningRate 0.0000 Epoch: 32 Global Step: 665780 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:07,322-Speed 6298.39 samples/sec Loss 3.1985 LearningRate 0.0000 Epoch: 32 Global Step: 665790 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:10,573-Speed 6300.09 samples/sec Loss 3.1502 LearningRate 0.0000 Epoch: 32 Global Step: 665800 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:13,823-Speed 6304.15 samples/sec Loss 3.1706 LearningRate 0.0000 Epoch: 32 Global Step: 665810 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:17,066-Speed 6315.46 samples/sec Loss 3.1468 LearningRate 0.0000 Epoch: 32 Global Step: 665820 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:20,315-Speed 6305.43 samples/sec Loss 3.1395 LearningRate 0.0000 Epoch: 32 Global Step: 665830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:23,560-Speed 6312.74 samples/sec Loss 3.1235 LearningRate 0.0000 Epoch: 32 Global Step: 665840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:26,806-Speed 6311.32 samples/sec Loss 3.1387 LearningRate 0.0000 Epoch: 32 Global Step: 665850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:30,059-Speed 6297.58 samples/sec Loss 3.0925 LearningRate 0.0000 Epoch: 32 Global Step: 665860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:33,302-Speed 6316.38 samples/sec Loss 3.1160 LearningRate 0.0000 Epoch: 32 Global Step: 665870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:36,549-Speed 6308.07 samples/sec Loss 3.0907 LearningRate 0.0000 Epoch: 32 Global Step: 665880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:39,792-Speed 6315.95 samples/sec Loss 3.1003 LearningRate 0.0000 Epoch: 32 Global Step: 665890 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:18:43,044-Speed 6299.53 samples/sec Loss 3.1603 LearningRate 0.0000 Epoch: 32 Global Step: 665900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:18:46,272-Speed 6347.34 samples/sec Loss 3.1248 LearningRate 0.0000 Epoch: 32 Global Step: 665910 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:49,514-Speed 6317.42 samples/sec Loss 3.1107 LearningRate 0.0000 Epoch: 32 Global Step: 665920 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:52,766-Speed 6299.77 samples/sec Loss 3.1375 LearningRate 0.0000 Epoch: 32 Global Step: 665930 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:56,014-Speed 6307.11 samples/sec Loss 3.1295 LearningRate 0.0000 Epoch: 32 Global Step: 665940 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:18:59,270-Speed 6289.57 samples/sec Loss 3.1367 LearningRate 0.0000 Epoch: 32 Global Step: 665950 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:02,515-Speed 6314.24 samples/sec Loss 3.1231 LearningRate 0.0000 Epoch: 32 Global Step: 665960 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:05,805-Speed 6226.05 samples/sec Loss 3.0766 LearningRate 0.0000 Epoch: 32 Global Step: 665970 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:09,048-Speed 6316.34 samples/sec Loss 3.1322 LearningRate 0.0000 Epoch: 32 Global Step: 665980 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:12,297-Speed 6304.26 samples/sec Loss 3.1390 LearningRate 0.0000 Epoch: 32 Global Step: 665990 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:15,541-Speed 6315.83 samples/sec Loss 3.0284 LearningRate 0.0000 Epoch: 32 Global Step: 666000 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:18,785-Speed 6313.15 samples/sec Loss 3.1554 LearningRate 0.0000 Epoch: 32 Global Step: 666010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:19:22,027-Speed 6318.70 samples/sec Loss 3.0852 LearningRate 0.0000 Epoch: 32 Global Step: 666020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:19:25,276-Speed 6305.83 samples/sec Loss 3.1190 LearningRate 0.0000 Epoch: 32 Global Step: 666030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:19:28,521-Speed 6312.55 samples/sec Loss 3.1486 LearningRate 0.0000 Epoch: 32 Global Step: 666040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:19:31,766-Speed 6312.87 samples/sec Loss 3.1835 LearningRate 0.0000 Epoch: 32 Global Step: 666050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:19:34,996-Speed 6343.27 samples/sec Loss 3.1629 LearningRate 0.0000 Epoch: 32 Global Step: 666060 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:38,238-Speed 6317.21 samples/sec Loss 3.1739 LearningRate 0.0000 Epoch: 32 Global Step: 666070 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:41,482-Speed 6315.38 samples/sec Loss 3.1659 LearningRate 0.0000 Epoch: 32 Global Step: 666080 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:44,728-Speed 6311.67 samples/sec Loss 3.1668 LearningRate 0.0000 Epoch: 32 Global Step: 666090 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:47,969-Speed 6318.84 samples/sec Loss 3.1372 LearningRate 0.0000 Epoch: 32 Global Step: 666100 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:51,215-Speed 6311.53 samples/sec Loss 3.0933 LearningRate 0.0000 Epoch: 32 Global Step: 666110 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:54,457-Speed 6317.75 samples/sec Loss 3.1258 LearningRate 0.0000 Epoch: 32 Global Step: 666120 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:19:57,701-Speed 6314.79 samples/sec Loss 3.0712 LearningRate 0.0000 Epoch: 32 Global Step: 666130 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:00,949-Speed 6308.11 samples/sec Loss 
3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 666140 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:04,192-Speed 6314.96 samples/sec Loss 3.1014 LearningRate 0.0000 Epoch: 32 Global Step: 666150 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:07,437-Speed 6313.03 samples/sec Loss 3.1435 LearningRate 0.0000 Epoch: 32 Global Step: 666160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:10,681-Speed 6315.61 samples/sec Loss 3.1422 LearningRate 0.0000 Epoch: 32 Global Step: 666170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:13,937-Speed 6291.23 samples/sec Loss 3.1351 LearningRate 0.0000 Epoch: 32 Global Step: 666180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:17,180-Speed 6316.88 samples/sec Loss 3.0847 LearningRate 0.0000 Epoch: 32 Global Step: 666190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:20,428-Speed 6308.75 samples/sec Loss 3.1300 LearningRate 0.0000 Epoch: 32 Global Step: 666200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:23,684-Speed 6292.23 samples/sec Loss 3.1017 LearningRate 0.0000 Epoch: 32 Global Step: 666210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:26,935-Speed 6299.69 samples/sec Loss 3.1204 LearningRate 0.0000 Epoch: 32 Global Step: 666220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:20:30,168-Speed 6337.07 samples/sec Loss 3.1495 LearningRate 0.0000 Epoch: 32 Global Step: 666230 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:33,412-Speed 6314.51 samples/sec Loss 3.1384 LearningRate 0.0000 Epoch: 32 Global Step: 666240 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:36,658-Speed 6310.98 samples/sec Loss 3.1047 LearningRate 0.0000 Epoch: 32 Global Step: 666250 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:39,907-Speed 6304.58 samples/sec Loss 3.1218 LearningRate 0.0000 Epoch: 32 Global 
Step: 666260 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:43,154-Speed 6309.38 samples/sec Loss 3.1372 LearningRate 0.0000 Epoch: 32 Global Step: 666270 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:46,402-Speed 6308.05 samples/sec Loss 3.1221 LearningRate 0.0000 Epoch: 32 Global Step: 666280 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:49,647-Speed 6312.36 samples/sec Loss 3.0888 LearningRate 0.0000 Epoch: 32 Global Step: 666290 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:52,888-Speed 6319.22 samples/sec Loss 3.1087 LearningRate 0.0000 Epoch: 32 Global Step: 666300 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:56,138-Speed 6303.99 samples/sec Loss 3.0737 LearningRate 0.0000 Epoch: 32 Global Step: 666310 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:20:59,387-Speed 6305.21 samples/sec Loss 3.1895 LearningRate 0.0000 Epoch: 32 Global Step: 666320 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:02,633-Speed 6310.39 samples/sec Loss 3.1372 LearningRate 0.0000 Epoch: 32 Global Step: 666330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:05,883-Speed 6302.48 samples/sec Loss 3.1343 LearningRate 0.0000 Epoch: 32 Global Step: 666340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:09,134-Speed 6301.46 samples/sec Loss 3.1301 LearningRate 0.0000 Epoch: 32 Global Step: 666350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:12,386-Speed 6298.10 samples/sec Loss 3.1056 LearningRate 0.0000 Epoch: 32 Global Step: 666360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:15,639-Speed 6298.60 samples/sec Loss 3.1440 LearningRate 0.0000 Epoch: 32 Global Step: 666370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:18,886-Speed 6308.24 samples/sec Loss 3.0765 LearningRate 0.0000 Epoch: 32 Global Step: 666380 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:21:22,121-Speed 6331.44 samples/sec Loss 3.1173 LearningRate 0.0000 Epoch: 32 Global Step: 666390 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:25,379-Speed 6288.31 samples/sec Loss 3.1558 LearningRate 0.0000 Epoch: 32 Global Step: 666400 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:28,632-Speed 6295.98 samples/sec Loss 3.1651 LearningRate 0.0000 Epoch: 32 Global Step: 666410 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:31,885-Speed 6296.81 samples/sec Loss 3.1801 LearningRate 0.0000 Epoch: 32 Global Step: 666420 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:35,129-Speed 6314.82 samples/sec Loss 3.0810 LearningRate 0.0000 Epoch: 32 Global Step: 666430 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:38,378-Speed 6305.30 samples/sec Loss 3.1260 LearningRate 0.0000 Epoch: 32 Global Step: 666440 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:41,624-Speed 6310.20 samples/sec Loss 3.0910 LearningRate 0.0000 Epoch: 32 Global Step: 666450 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:44,879-Speed 6294.20 samples/sec Loss 3.1238 LearningRate 0.0000 Epoch: 32 Global Step: 666460 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:48,126-Speed 6308.24 samples/sec Loss 3.0296 LearningRate 0.0000 Epoch: 32 Global Step: 666470 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:51,375-Speed 6305.54 samples/sec Loss 3.1311 LearningRate 0.0000 Epoch: 32 Global Step: 666480 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:21:54,621-Speed 6310.77 samples/sec Loss 3.1163 LearningRate 0.0000 Epoch: 32 Global Step: 666490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:21:57,867-Speed 6310.95 samples/sec Loss 3.1189 LearningRate 0.0000 Epoch: 32 Global Step: 666500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:22:01,113-Speed 6311.03 samples/sec Loss 3.0867 LearningRate 0.0000 Epoch: 32 Global Step: 666510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:04,346-Speed 6336.60 samples/sec Loss 3.1312 LearningRate 0.0000 Epoch: 32 Global Step: 666520 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:07,604-Speed 6286.56 samples/sec Loss 3.0672 LearningRate 0.0000 Epoch: 32 Global Step: 666530 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:10,854-Speed 6303.38 samples/sec Loss 3.1690 LearningRate 0.0000 Epoch: 32 Global Step: 666540 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:14,109-Speed 6292.97 samples/sec Loss 3.1131 LearningRate 0.0000 Epoch: 32 Global Step: 666550 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:17,354-Speed 6312.81 samples/sec Loss 3.0978 LearningRate 0.0000 Epoch: 32 Global Step: 666560 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:20,612-Speed 6288.14 samples/sec Loss 3.1557 LearningRate 0.0000 Epoch: 32 Global Step: 666570 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:23,856-Speed 6313.07 samples/sec Loss 3.1312 LearningRate 0.0000 Epoch: 32 Global Step: 666580 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:27,104-Speed 6306.95 samples/sec Loss 3.1365 LearningRate 0.0000 Epoch: 32 Global Step: 666590 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:30,355-Speed 6302.04 samples/sec Loss 3.1180 LearningRate 0.0000 Epoch: 32 Global Step: 666600 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:33,610-Speed 6292.53 samples/sec Loss 3.0849 LearningRate 0.0000 Epoch: 32 Global Step: 666610 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:22:36,862-Speed 6298.90 samples/sec Loss 3.1872 LearningRate 0.0000 Epoch: 32 Global Step: 666620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:40,109-Speed 6309.42 samples/sec Loss 
3.0947 LearningRate 0.0000 Epoch: 32 Global Step: 666630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:43,354-Speed 6312.94 samples/sec Loss 3.1576 LearningRate 0.0000 Epoch: 32 Global Step: 666640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:46,599-Speed 6311.06 samples/sec Loss 3.1089 LearningRate 0.0000 Epoch: 32 Global Step: 666650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:49,845-Speed 6311.22 samples/sec Loss 3.1780 LearningRate 0.0000 Epoch: 32 Global Step: 666660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:53,091-Speed 6311.49 samples/sec Loss 3.0935 LearningRate 0.0000 Epoch: 32 Global Step: 666670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:56,342-Speed 6301.94 samples/sec Loss 3.1675 LearningRate 0.0000 Epoch: 32 Global Step: 666680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:22:59,599-Speed 6288.63 samples/sec Loss 3.1467 LearningRate 0.0000 Epoch: 32 Global Step: 666690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:02,840-Speed 6320.85 samples/sec Loss 3.1920 LearningRate 0.0000 Epoch: 32 Global Step: 666700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:06,090-Speed 6302.95 samples/sec Loss 3.2082 LearningRate 0.0000 Epoch: 32 Global Step: 666710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:09,320-Speed 6343.23 samples/sec Loss 3.1660 LearningRate 0.0000 Epoch: 32 Global Step: 666720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:12,570-Speed 6301.46 samples/sec Loss 3.1475 LearningRate 0.0000 Epoch: 32 Global Step: 666730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:15,824-Speed 6295.67 samples/sec Loss 3.1446 LearningRate 0.0000 Epoch: 32 Global Step: 666740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:19,077-Speed 6296.59 samples/sec Loss 3.1238 LearningRate 0.0000 Epoch: 32 Global 
Step: 666750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:22,328-Speed 6301.42 samples/sec Loss 3.1260 LearningRate 0.0000 Epoch: 32 Global Step: 666760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:25,580-Speed 6299.02 samples/sec Loss 3.1026 LearningRate 0.0000 Epoch: 32 Global Step: 666770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:28,828-Speed 6307.35 samples/sec Loss 3.1420 LearningRate 0.0000 Epoch: 32 Global Step: 666780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:32,071-Speed 6316.29 samples/sec Loss 3.1103 LearningRate 0.0000 Epoch: 32 Global Step: 666790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:35,316-Speed 6311.79 samples/sec Loss 3.0662 LearningRate 0.0000 Epoch: 32 Global Step: 666800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:38,563-Speed 6310.10 samples/sec Loss 3.1949 LearningRate 0.0000 Epoch: 32 Global Step: 666810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:41,819-Speed 6290.83 samples/sec Loss 3.1102 LearningRate 0.0000 Epoch: 32 Global Step: 666820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:23:45,053-Speed 6334.84 samples/sec Loss 3.1459 LearningRate 0.0000 Epoch: 32 Global Step: 666830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:48,298-Speed 6312.00 samples/sec Loss 3.1348 LearningRate 0.0000 Epoch: 32 Global Step: 666840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:51,550-Speed 6299.59 samples/sec Loss 3.1040 LearningRate 0.0000 Epoch: 32 Global Step: 666850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:54,799-Speed 6305.02 samples/sec Loss 3.1397 LearningRate 0.0000 Epoch: 32 Global Step: 666860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:23:58,050-Speed 6300.98 samples/sec Loss 3.1544 LearningRate 0.0000 Epoch: 32 Global Step: 666870 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:24:01,299-Speed 6303.55 samples/sec Loss 3.1430 LearningRate 0.0000 Epoch: 32 Global Step: 666880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:04,541-Speed 6319.09 samples/sec Loss 3.1522 LearningRate 0.0000 Epoch: 32 Global Step: 666890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:07,789-Speed 6306.39 samples/sec Loss 3.1421 LearningRate 0.0000 Epoch: 32 Global Step: 666900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:11,041-Speed 6299.95 samples/sec Loss 3.0687 LearningRate 0.0000 Epoch: 32 Global Step: 666910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:14,283-Speed 6318.56 samples/sec Loss 3.1540 LearningRate 0.0000 Epoch: 32 Global Step: 666920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:17,516-Speed 6335.88 samples/sec Loss 3.1252 LearningRate 0.0000 Epoch: 32 Global Step: 666930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:20,765-Speed 6306.15 samples/sec Loss 3.0898 LearningRate 0.0000 Epoch: 32 Global Step: 666940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:24,018-Speed 6296.37 samples/sec Loss 3.0951 LearningRate 0.0000 Epoch: 32 Global Step: 666950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:27,261-Speed 6316.95 samples/sec Loss 3.1003 LearningRate 0.0000 Epoch: 32 Global Step: 666960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:30,517-Speed 6291.11 samples/sec Loss 3.1779 LearningRate 0.0000 Epoch: 32 Global Step: 666970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:33,765-Speed 6306.47 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 666980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:24:37,011-Speed 6310.86 samples/sec Loss 3.1115 LearningRate 0.0000 Epoch: 32 Global Step: 666990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:24:40,263-Speed 6300.44 samples/sec Loss 3.1010 LearningRate 0.0000 Epoch: 32 Global Step: 667000 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:43,521-Speed 6286.41 samples/sec Loss 3.1234 LearningRate 0.0000 Epoch: 32 Global Step: 667010 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:46,767-Speed 6310.42 samples/sec Loss 3.1157 LearningRate 0.0000 Epoch: 32 Global Step: 667020 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:50,023-Speed 6292.84 samples/sec Loss 3.1136 LearningRate 0.0000 Epoch: 32 Global Step: 667030 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:53,265-Speed 6318.55 samples/sec Loss 3.0943 LearningRate 0.0000 Epoch: 32 Global Step: 667040 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:56,508-Speed 6315.21 samples/sec Loss 3.1085 LearningRate 0.0000 Epoch: 32 Global Step: 667050 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:24:59,753-Speed 6312.81 samples/sec Loss 3.0943 LearningRate 0.0000 Epoch: 32 Global Step: 667060 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:03,006-Speed 6296.55 samples/sec Loss 3.0987 LearningRate 0.0000 Epoch: 32 Global Step: 667070 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:06,260-Speed 6296.41 samples/sec Loss 3.1052 LearningRate 0.0000 Epoch: 32 Global Step: 667080 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:09,509-Speed 6305.08 samples/sec Loss 3.0693 LearningRate 0.0000 Epoch: 32 Global Step: 667090 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:12,760-Speed 6300.41 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 32 Global Step: 667100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:16,013-Speed 6297.68 samples/sec Loss 3.1548 LearningRate 0.0000 Epoch: 32 Global Step: 667110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:19,276-Speed 6279.15 samples/sec Loss 
3.1333 LearningRate 0.0000 Epoch: 32 Global Step: 667120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:22,527-Speed 6301.73 samples/sec Loss 3.1203 LearningRate 0.0000 Epoch: 32 Global Step: 667130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:25,780-Speed 6297.59 samples/sec Loss 3.1213 LearningRate 0.0000 Epoch: 32 Global Step: 667140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:29,026-Speed 6310.74 samples/sec Loss 3.1498 LearningRate 0.0000 Epoch: 32 Global Step: 667150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:25:32,270-Speed 6313.96 samples/sec Loss 3.1382 LearningRate 0.0000 Epoch: 32 Global Step: 667160 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:35,518-Speed 6306.42 samples/sec Loss 3.1131 LearningRate 0.0000 Epoch: 32 Global Step: 667170 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:38,782-Speed 6277.65 samples/sec Loss 3.1050 LearningRate 0.0000 Epoch: 32 Global Step: 667180 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:42,024-Speed 6317.01 samples/sec Loss 3.0876 LearningRate 0.0000 Epoch: 32 Global Step: 667190 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:45,271-Speed 6309.10 samples/sec Loss 3.1103 LearningRate 0.0000 Epoch: 32 Global Step: 667200 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:48,520-Speed 6305.13 samples/sec Loss 3.0985 LearningRate 0.0000 Epoch: 32 Global Step: 667210 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:51,775-Speed 6292.59 samples/sec Loss 3.0837 LearningRate 0.0000 Epoch: 32 Global Step: 667220 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:55,026-Speed 6304.63 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 32 Global Step: 667230 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:25:58,275-Speed 6303.87 samples/sec Loss 3.0700 LearningRate 0.0000 Epoch: 32 Global 
Step: 667240 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:26:01,524-Speed 6303.98 samples/sec Loss 3.0902 LearningRate 0.0000 Epoch: 32 Global Step: 667250 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:26:04,775-Speed 6301.26 samples/sec Loss 3.1508 LearningRate 0.0000 Epoch: 32 Global Step: 667260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:08,023-Speed 6308.50 samples/sec Loss 3.1128 LearningRate 0.0000 Epoch: 32 Global Step: 667270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:11,273-Speed 6303.73 samples/sec Loss 3.1424 LearningRate 0.0000 Epoch: 32 Global Step: 667280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:14,527-Speed 6294.60 samples/sec Loss 3.1077 LearningRate 0.0000 Epoch: 32 Global Step: 667290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:17,776-Speed 6304.17 samples/sec Loss 3.1303 LearningRate 0.0000 Epoch: 32 Global Step: 667300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:21,024-Speed 6308.11 samples/sec Loss 3.1538 LearningRate 0.0000 Epoch: 32 Global Step: 667310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:24,274-Speed 6303.25 samples/sec Loss 3.0687 LearningRate 0.0000 Epoch: 32 Global Step: 667320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:27,533-Speed 6285.27 samples/sec Loss 3.1009 LearningRate 0.0000 Epoch: 32 Global Step: 667330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:30,788-Speed 6292.82 samples/sec Loss 3.0993 LearningRate 0.0000 Epoch: 32 Global Step: 667340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:34,035-Speed 6309.33 samples/sec Loss 3.1507 LearningRate 0.0000 Epoch: 32 Global Step: 667350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:37,270-Speed 6332.60 samples/sec Loss 3.1552 LearningRate 0.0000 Epoch: 32 Global Step: 667360 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:26:40,527-Speed 6288.70 samples/sec Loss 3.0705 LearningRate 0.0000 Epoch: 32 Global Step: 667370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:43,775-Speed 6306.54 samples/sec Loss 3.0978 LearningRate 0.0000 Epoch: 32 Global Step: 667380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:47,025-Speed 6302.61 samples/sec Loss 3.0804 LearningRate 0.0000 Epoch: 32 Global Step: 667390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:50,272-Speed 6309.86 samples/sec Loss 3.1498 LearningRate 0.0000 Epoch: 32 Global Step: 667400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:53,520-Speed 6306.24 samples/sec Loss 3.1003 LearningRate 0.0000 Epoch: 32 Global Step: 667410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:26:56,761-Speed 6320.78 samples/sec Loss 3.1571 LearningRate 0.0000 Epoch: 32 Global Step: 667420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:00,010-Speed 6304.23 samples/sec Loss 3.1348 LearningRate 0.0000 Epoch: 32 Global Step: 667430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:03,265-Speed 6293.93 samples/sec Loss 3.1128 LearningRate 0.0000 Epoch: 32 Global Step: 667440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:06,514-Speed 6303.95 samples/sec Loss 3.1334 LearningRate 0.0000 Epoch: 32 Global Step: 667450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:09,752-Speed 6327.09 samples/sec Loss 3.1349 LearningRate 0.0000 Epoch: 32 Global Step: 667460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:13,001-Speed 6305.57 samples/sec Loss 3.0812 LearningRate 0.0000 Epoch: 32 Global Step: 667470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:16,247-Speed 6310.00 samples/sec Loss 3.1264 LearningRate 0.0000 Epoch: 32 Global Step: 667480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:27:19,488-Speed 6319.79 samples/sec Loss 3.1183 LearningRate 0.0000 Epoch: 32 Global Step: 667490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:22,734-Speed 6310.65 samples/sec Loss 3.1095 LearningRate 0.0000 Epoch: 32 Global Step: 667500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:25,982-Speed 6307.29 samples/sec Loss 3.0635 LearningRate 0.0000 Epoch: 32 Global Step: 667510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:29,236-Speed 6295.33 samples/sec Loss 3.1133 LearningRate 0.0000 Epoch: 32 Global Step: 667520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:32,485-Speed 6305.61 samples/sec Loss 3.1025 LearningRate 0.0000 Epoch: 32 Global Step: 667530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:35,731-Speed 6310.03 samples/sec Loss 3.1354 LearningRate 0.0000 Epoch: 32 Global Step: 667540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:38,993-Speed 6281.58 samples/sec Loss 3.1303 LearningRate 0.0000 Epoch: 32 Global Step: 667550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:42,228-Speed 6330.56 samples/sec Loss 3.1153 LearningRate 0.0000 Epoch: 32 Global Step: 667560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:45,481-Speed 6298.29 samples/sec Loss 3.1501 LearningRate 0.0000 Epoch: 32 Global Step: 667570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:48,736-Speed 6292.62 samples/sec Loss 3.1092 LearningRate 0.0000 Epoch: 32 Global Step: 667580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:51,983-Speed 6309.15 samples/sec Loss 3.1060 LearningRate 0.0000 Epoch: 32 Global Step: 667590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:55,226-Speed 6315.57 samples/sec Loss 3.0814 LearningRate 0.0000 Epoch: 32 Global Step: 667600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:27:58,472-Speed 6310.62 samples/sec Loss 
3.1238 LearningRate 0.0000 Epoch: 32 Global Step: 667610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:01,720-Speed 6306.61 samples/sec Loss 3.1147 LearningRate 0.0000 Epoch: 32 Global Step: 667620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:04,967-Speed 6308.54 samples/sec Loss 3.0920 LearningRate 0.0000 Epoch: 32 Global Step: 667630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:08,218-Speed 6305.84 samples/sec Loss 3.1780 LearningRate 0.0000 Epoch: 32 Global Step: 667640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:11,462-Speed 6313.57 samples/sec Loss 3.1896 LearningRate 0.0000 Epoch: 32 Global Step: 667650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:14,709-Speed 6308.89 samples/sec Loss 3.1012 LearningRate 0.0000 Epoch: 32 Global Step: 667660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:28:17,945-Speed 6330.36 samples/sec Loss 3.0595 LearningRate 0.0000 Epoch: 32 Global Step: 667670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:21,200-Speed 6292.23 samples/sec Loss 3.1139 LearningRate 0.0000 Epoch: 32 Global Step: 667680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:24,448-Speed 6307.15 samples/sec Loss 3.1019 LearningRate 0.0000 Epoch: 32 Global Step: 667690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:27,694-Speed 6311.03 samples/sec Loss 3.0833 LearningRate 0.0000 Epoch: 32 Global Step: 667700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:30,936-Speed 6317.77 samples/sec Loss 3.0813 LearningRate 0.0000 Epoch: 32 Global Step: 667710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:34,187-Speed 6301.56 samples/sec Loss 3.1388 LearningRate 0.0000 Epoch: 32 Global Step: 667720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:37,445-Speed 6288.46 samples/sec Loss 3.1600 LearningRate 0.0000 Epoch: 32 
Global Step: 667730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:40,699-Speed 6295.17 samples/sec Loss 3.1404 LearningRate 0.0000 Epoch: 32 Global Step: 667740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:43,954-Speed 6293.35 samples/sec Loss 3.1152 LearningRate 0.0000 Epoch: 32 Global Step: 667750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:47,209-Speed 6293.05 samples/sec Loss 3.0985 LearningRate 0.0000 Epoch: 32 Global Step: 667760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:50,441-Speed 6339.41 samples/sec Loss 3.1760 LearningRate 0.0000 Epoch: 32 Global Step: 667770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:53,683-Speed 6317.46 samples/sec Loss 3.1542 LearningRate 0.0000 Epoch: 32 Global Step: 667780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:28:56,926-Speed 6315.97 samples/sec Loss 3.1361 LearningRate 0.0000 Epoch: 32 Global Step: 667790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:00,168-Speed 6318.67 samples/sec Loss 3.1257 LearningRate 0.0000 Epoch: 32 Global Step: 667800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:03,416-Speed 6307.88 samples/sec Loss 3.1286 LearningRate 0.0000 Epoch: 32 Global Step: 667810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:06,662-Speed 6310.07 samples/sec Loss 3.1602 LearningRate 0.0000 Epoch: 32 Global Step: 667820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:09,905-Speed 6316.42 samples/sec Loss 3.0862 LearningRate 0.0000 Epoch: 32 Global Step: 667830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:13,173-Speed 6270.80 samples/sec Loss 3.1520 LearningRate 0.0000 Epoch: 32 Global Step: 667840 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:16,428-Speed 6292.90 samples/sec Loss 3.1091 LearningRate 0.0000 Epoch: 32 Global Step: 667850 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:29:19,678-Speed 6303.16 samples/sec Loss 3.0842 LearningRate 0.0000 Epoch: 32 Global Step: 667860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:22,912-Speed 6333.48 samples/sec Loss 3.0529 LearningRate 0.0000 Epoch: 32 Global Step: 667870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:26,160-Speed 6308.19 samples/sec Loss 3.1260 LearningRate 0.0000 Epoch: 32 Global Step: 667880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:29,406-Speed 6309.45 samples/sec Loss 3.1468 LearningRate 0.0000 Epoch: 32 Global Step: 667890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:32,652-Speed 6311.77 samples/sec Loss 3.1193 LearningRate 0.0000 Epoch: 32 Global Step: 667900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:35,898-Speed 6309.30 samples/sec Loss 3.0809 LearningRate 0.0000 Epoch: 32 Global Step: 667910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:39,143-Speed 6314.09 samples/sec Loss 3.1101 LearningRate 0.0000 Epoch: 32 Global Step: 667920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:42,389-Speed 6310.01 samples/sec Loss 3.1428 LearningRate 0.0000 Epoch: 32 Global Step: 667930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:45,636-Speed 6308.54 samples/sec Loss 3.0553 LearningRate 0.0000 Epoch: 32 Global Step: 667940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:48,879-Speed 6316.95 samples/sec Loss 3.0695 LearningRate 0.0000 Epoch: 32 Global Step: 667950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:52,127-Speed 6308.00 samples/sec Loss 3.1198 LearningRate 0.0000 Epoch: 32 Global Step: 667960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:29:55,366-Speed 6324.31 samples/sec Loss 3.1106 LearningRate 0.0000 Epoch: 32 Global Step: 667970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:29:58,620-Speed 6294.02 samples/sec Loss 3.1096 LearningRate 0.0000 Epoch: 32 Global Step: 667980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:01,868-Speed 6308.60 samples/sec Loss 3.0960 LearningRate 0.0000 Epoch: 32 Global Step: 667990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:05,133-Speed 6272.83 samples/sec Loss 3.1291 LearningRate 0.0000 Epoch: 32 Global Step: 668000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:08,397-Speed 6275.47 samples/sec Loss 3.0989 LearningRate 0.0000 Epoch: 32 Global Step: 668010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:11,642-Speed 6312.50 samples/sec Loss 3.1104 LearningRate 0.0000 Epoch: 32 Global Step: 668020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:14,890-Speed 6307.10 samples/sec Loss 3.1089 LearningRate 0.0000 Epoch: 32 Global Step: 668030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:18,139-Speed 6305.38 samples/sec Loss 3.0911 LearningRate 0.0000 Epoch: 32 Global Step: 668040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:21,387-Speed 6306.19 samples/sec Loss 3.1601 LearningRate 0.0000 Epoch: 32 Global Step: 668050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:24,636-Speed 6306.66 samples/sec Loss 3.1145 LearningRate 0.0000 Epoch: 32 Global Step: 668060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:27,866-Speed 6340.21 samples/sec Loss 3.1247 LearningRate 0.0000 Epoch: 32 Global Step: 668070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:31,117-Speed 6301.10 samples/sec Loss 3.0681 LearningRate 0.0000 Epoch: 32 Global Step: 668080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:34,363-Speed 6310.64 samples/sec Loss 3.0776 LearningRate 0.0000 Epoch: 32 Global Step: 668090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:37,609-Speed 6312.09 samples/sec Loss 
3.0928 LearningRate 0.0000 Epoch: 32 Global Step: 668100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:40,859-Speed 6303.02 samples/sec Loss 3.1322 LearningRate 0.0000 Epoch: 32 Global Step: 668110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:44,105-Speed 6310.54 samples/sec Loss 3.0998 LearningRate 0.0000 Epoch: 32 Global Step: 668120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:47,349-Speed 6313.19 samples/sec Loss 3.1390 LearningRate 0.0000 Epoch: 32 Global Step: 668130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:50,593-Speed 6314.43 samples/sec Loss 3.1178 LearningRate 0.0000 Epoch: 32 Global Step: 668140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:53,840-Speed 6309.61 samples/sec Loss 3.1288 LearningRate 0.0000 Epoch: 32 Global Step: 668150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:30:57,086-Speed 6312.29 samples/sec Loss 3.1437 LearningRate 0.0000 Epoch: 32 Global Step: 668160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:00,325-Speed 6324.23 samples/sec Loss 3.1213 LearningRate 0.0000 Epoch: 32 Global Step: 668170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:03,569-Speed 6314.46 samples/sec Loss 3.1743 LearningRate 0.0000 Epoch: 32 Global Step: 668180 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:06,814-Speed 6311.97 samples/sec Loss 3.1230 LearningRate 0.0000 Epoch: 32 Global Step: 668190 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:10,064-Speed 6302.74 samples/sec Loss 3.0899 LearningRate 0.0000 Epoch: 32 Global Step: 668200 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:13,324-Speed 6285.28 samples/sec Loss 3.1343 LearningRate 0.0000 Epoch: 32 Global Step: 668210 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:16,577-Speed 6296.10 samples/sec Loss 3.1139 LearningRate 0.0000 Epoch: 32 Global 
Step: 668220 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:19,822-Speed 6311.95 samples/sec Loss 3.1513 LearningRate 0.0000 Epoch: 32 Global Step: 668230 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:23,065-Speed 6316.49 samples/sec Loss 3.0506 LearningRate 0.0000 Epoch: 32 Global Step: 668240 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:26,311-Speed 6311.65 samples/sec Loss 3.0966 LearningRate 0.0000 Epoch: 32 Global Step: 668250 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:29,555-Speed 6314.76 samples/sec Loss 3.0466 LearningRate 0.0000 Epoch: 32 Global Step: 668260 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:32,805-Speed 6302.37 samples/sec Loss 3.0179 LearningRate 0.0000 Epoch: 32 Global Step: 668270 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:31:36,051-Speed 6310.71 samples/sec Loss 3.1233 LearningRate 0.0000 Epoch: 32 Global Step: 668280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:39,312-Speed 6281.42 samples/sec Loss 3.1359 LearningRate 0.0000 Epoch: 32 Global Step: 668290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:42,559-Speed 6308.26 samples/sec Loss 3.1596 LearningRate 0.0000 Epoch: 32 Global Step: 668300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:45,809-Speed 6304.74 samples/sec Loss 3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 668310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:49,058-Speed 6303.02 samples/sec Loss 3.1192 LearningRate 0.0000 Epoch: 32 Global Step: 668320 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:52,303-Speed 6313.29 samples/sec Loss 3.0870 LearningRate 0.0000 Epoch: 32 Global Step: 668330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:31:55,551-Speed 6307.71 samples/sec Loss 3.1130 LearningRate 0.0000 Epoch: 32 Global Step: 668340 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:31:58,795-Speed 6313.92 samples/sec Loss 3.1410 LearningRate 0.0000 Epoch: 32 Global Step: 668350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:02,039-Speed 6315.26 samples/sec Loss 3.1426 LearningRate 0.0000 Epoch: 32 Global Step: 668360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:05,285-Speed 6310.84 samples/sec Loss 3.1060 LearningRate 0.0000 Epoch: 32 Global Step: 668370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:08,514-Speed 6343.71 samples/sec Loss 3.0619 LearningRate 0.0000 Epoch: 32 Global Step: 668380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:11,761-Speed 6309.61 samples/sec Loss 3.1074 LearningRate 0.0000 Epoch: 32 Global Step: 668390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:15,010-Speed 6305.59 samples/sec Loss 3.0761 LearningRate 0.0000 Epoch: 32 Global Step: 668400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:18,264-Speed 6293.30 samples/sec Loss 3.1164 LearningRate 0.0000 Epoch: 32 Global Step: 668410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:21,514-Speed 6303.92 samples/sec Loss 3.1383 LearningRate 0.0000 Epoch: 32 Global Step: 668420 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:24,761-Speed 6309.69 samples/sec Loss 3.1037 LearningRate 0.0000 Epoch: 32 Global Step: 668430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:28,010-Speed 6304.17 samples/sec Loss 3.1661 LearningRate 0.0000 Epoch: 32 Global Step: 668440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:31,258-Speed 6306.86 samples/sec Loss 3.1380 LearningRate 0.0000 Epoch: 32 Global Step: 668450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:34,506-Speed 6306.91 samples/sec Loss 3.0840 LearningRate 0.0000 Epoch: 32 Global Step: 668460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:32:37,752-Speed 6309.56 samples/sec Loss 3.0491 LearningRate 0.0000 Epoch: 32 Global Step: 668470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:40,983-Speed 6340.84 samples/sec Loss 3.0973 LearningRate 0.0000 Epoch: 32 Global Step: 668480 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:44,232-Speed 6304.85 samples/sec Loss 3.1089 LearningRate 0.0000 Epoch: 32 Global Step: 668490 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:47,484-Speed 6298.71 samples/sec Loss 3.1153 LearningRate 0.0000 Epoch: 32 Global Step: 668500 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:50,735-Speed 6301.11 samples/sec Loss 3.1403 LearningRate 0.0000 Epoch: 32 Global Step: 668510 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:53,978-Speed 6315.85 samples/sec Loss 3.0645 LearningRate 0.0000 Epoch: 32 Global Step: 668520 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:32:57,227-Speed 6305.75 samples/sec Loss 3.1243 LearningRate 0.0000 Epoch: 32 Global Step: 668530 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:00,480-Speed 6296.34 samples/sec Loss 3.1794 LearningRate 0.0000 Epoch: 32 Global Step: 668540 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:03,739-Speed 6286.41 samples/sec Loss 3.1233 LearningRate 0.0000 Epoch: 32 Global Step: 668550 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:06,994-Speed 6294.13 samples/sec Loss 3.1041 LearningRate 0.0000 Epoch: 32 Global Step: 668560 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:10,242-Speed 6305.74 samples/sec Loss 3.1044 LearningRate 0.0000 Epoch: 32 Global Step: 668570 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:13,479-Speed 6327.79 samples/sec Loss 3.1634 LearningRate 0.0000 Epoch: 32 Global Step: 668580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:16,730-Speed 6300.92 samples/sec Loss 
3.1638 LearningRate 0.0000 Epoch: 32 Global Step: 668590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:19,980-Speed 6304.37 samples/sec Loss 3.0633 LearningRate 0.0000 Epoch: 32 Global Step: 668600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:23,229-Speed 6304.68 samples/sec Loss 3.1869 LearningRate 0.0000 Epoch: 32 Global Step: 668610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:26,483-Speed 6294.99 samples/sec Loss 3.1264 LearningRate 0.0000 Epoch: 32 Global Step: 668620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:29,728-Speed 6312.31 samples/sec Loss 3.1269 LearningRate 0.0000 Epoch: 32 Global Step: 668630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:32,975-Speed 6310.32 samples/sec Loss 3.1213 LearningRate 0.0000 Epoch: 32 Global Step: 668640 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:36,223-Speed 6306.45 samples/sec Loss 3.1207 LearningRate 0.0000 Epoch: 32 Global Step: 668650 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:39,470-Speed 6308.13 samples/sec Loss 3.1462 LearningRate 0.0000 Epoch: 32 Global Step: 668660 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:42,713-Speed 6316.36 samples/sec Loss 3.1251 LearningRate 0.0000 Epoch: 32 Global Step: 668670 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:45,949-Speed 6330.89 samples/sec Loss 3.0911 LearningRate 0.0000 Epoch: 32 Global Step: 668680 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:49,204-Speed 6291.86 samples/sec Loss 3.1051 LearningRate 0.0000 Epoch: 32 Global Step: 668690 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:52,454-Speed 6304.59 samples/sec Loss 3.1122 LearningRate 0.0000 Epoch: 32 Global Step: 668700 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:55,706-Speed 6298.96 samples/sec Loss 3.0920 LearningRate 0.0000 Epoch: 32 Global 
Step: 668710 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:33:58,959-Speed 6296.07 samples/sec Loss 3.1674 LearningRate 0.0000 Epoch: 32 Global Step: 668720 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:02,208-Speed 6306.10 samples/sec Loss 3.0434 LearningRate 0.0000 Epoch: 32 Global Step: 668730 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:05,453-Speed 6311.75 samples/sec Loss 3.1202 LearningRate 0.0000 Epoch: 32 Global Step: 668740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:08,699-Speed 6310.12 samples/sec Loss 3.0850 LearningRate 0.0000 Epoch: 32 Global Step: 668750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:11,948-Speed 6305.80 samples/sec Loss 3.1171 LearningRate 0.0000 Epoch: 32 Global Step: 668760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:15,193-Speed 6313.15 samples/sec Loss 3.0949 LearningRate 0.0000 Epoch: 32 Global Step: 668770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:18,423-Speed 6341.30 samples/sec Loss 3.1678 LearningRate 0.0000 Epoch: 32 Global Step: 668780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:21,673-Speed 6302.24 samples/sec Loss 3.1026 LearningRate 0.0000 Epoch: 32 Global Step: 668790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:24,919-Speed 6312.26 samples/sec Loss 3.0551 LearningRate 0.0000 Epoch: 32 Global Step: 668800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:28,171-Speed 6300.11 samples/sec Loss 3.1525 LearningRate 0.0000 Epoch: 32 Global Step: 668810 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:34:31,405-Speed 6334.00 samples/sec Loss 3.1353 LearningRate 0.0000 Epoch: 32 Global Step: 668820 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:34,660-Speed 6292.09 samples/sec Loss 3.1482 LearningRate 0.0000 Epoch: 32 Global Step: 668830 Fp16 Grad Scale: 4096 
Required: 15 hours Training: 2022-04-03 04:34:37,908-Speed 6307.51 samples/sec Loss 3.1089 LearningRate 0.0000 Epoch: 32 Global Step: 668840 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:41,155-Speed 6309.87 samples/sec Loss 3.1483 LearningRate 0.0000 Epoch: 32 Global Step: 668850 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:44,396-Speed 6319.12 samples/sec Loss 3.1079 LearningRate 0.0000 Epoch: 32 Global Step: 668860 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:47,650-Speed 6295.61 samples/sec Loss 3.0889 LearningRate 0.0000 Epoch: 32 Global Step: 668870 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:50,898-Speed 6307.47 samples/sec Loss 3.0434 LearningRate 0.0000 Epoch: 32 Global Step: 668880 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:54,148-Speed 6301.44 samples/sec Loss 3.1159 LearningRate 0.0000 Epoch: 32 Global Step: 668890 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:34:57,398-Speed 6304.56 samples/sec Loss 3.0788 LearningRate 0.0000 Epoch: 32 Global Step: 668900 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:35:00,696-Speed 6210.33 samples/sec Loss 3.0994 LearningRate 0.0000 Epoch: 32 Global Step: 668910 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:35:03,946-Speed 6301.93 samples/sec Loss 3.0975 LearningRate 0.0000 Epoch: 32 Global Step: 668920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:07,193-Speed 6309.73 samples/sec Loss 3.0770 LearningRate 0.0000 Epoch: 32 Global Step: 668930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:10,450-Speed 6288.36 samples/sec Loss 3.1593 LearningRate 0.0000 Epoch: 32 Global Step: 668940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:13,708-Speed 6288.84 samples/sec Loss 3.1642 LearningRate 0.0000 Epoch: 32 Global Step: 668950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:35:16,955-Speed 6308.03 samples/sec Loss 3.1188 LearningRate 0.0000 Epoch: 32 Global Step: 668960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:20,202-Speed 6308.82 samples/sec Loss 3.0860 LearningRate 0.0000 Epoch: 32 Global Step: 668970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:23,449-Speed 6308.21 samples/sec Loss 3.1556 LearningRate 0.0000 Epoch: 32 Global Step: 668980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:26,694-Speed 6312.80 samples/sec Loss 3.0621 LearningRate 0.0000 Epoch: 32 Global Step: 668990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:29,939-Speed 6312.72 samples/sec Loss 3.1069 LearningRate 0.0000 Epoch: 32 Global Step: 669000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:33,187-Speed 6308.26 samples/sec Loss 3.1161 LearningRate 0.0000 Epoch: 32 Global Step: 669010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:36,421-Speed 6334.12 samples/sec Loss 3.0743 LearningRate 0.0000 Epoch: 32 Global Step: 669020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:39,669-Speed 6307.81 samples/sec Loss 3.1239 LearningRate 0.0000 Epoch: 32 Global Step: 669030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:42,913-Speed 6313.55 samples/sec Loss 3.0607 LearningRate 0.0000 Epoch: 32 Global Step: 669040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:46,159-Speed 6311.53 samples/sec Loss 3.0850 LearningRate 0.0000 Epoch: 32 Global Step: 669050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:49,419-Speed 6283.84 samples/sec Loss 3.1039 LearningRate 0.0000 Epoch: 32 Global Step: 669060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:52,675-Speed 6289.40 samples/sec Loss 3.1026 LearningRate 0.0000 Epoch: 32 Global Step: 669070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:55,918-Speed 6316.74 samples/sec Loss 
3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 669080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:35:59,163-Speed 6313.81 samples/sec Loss 3.1618 LearningRate 0.0000 Epoch: 32 Global Step: 669090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:02,413-Speed 6303.13 samples/sec Loss 3.0887 LearningRate 0.0000 Epoch: 32 Global Step: 669100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:05,667-Speed 6294.02 samples/sec Loss 3.1247 LearningRate 0.0000 Epoch: 32 Global Step: 669110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:08,900-Speed 6336.27 samples/sec Loss 3.1239 LearningRate 0.0000 Epoch: 32 Global Step: 669120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:12,146-Speed 6310.58 samples/sec Loss 3.1491 LearningRate 0.0000 Epoch: 32 Global Step: 669130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:15,398-Speed 6299.66 samples/sec Loss 3.1817 LearningRate 0.0000 Epoch: 32 Global Step: 669140 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:18,640-Speed 6318.37 samples/sec Loss 3.1204 LearningRate 0.0000 Epoch: 32 Global Step: 669150 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:21,890-Speed 6302.54 samples/sec Loss 3.1129 LearningRate 0.0000 Epoch: 32 Global Step: 669160 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:25,138-Speed 6307.28 samples/sec Loss 3.1457 LearningRate 0.0000 Epoch: 32 Global Step: 669170 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:28,406-Speed 6268.71 samples/sec Loss 3.1375 LearningRate 0.0000 Epoch: 32 Global Step: 669180 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:31,657-Speed 6300.71 samples/sec Loss 3.1380 LearningRate 0.0000 Epoch: 32 Global Step: 669190 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:34,903-Speed 6310.37 samples/sec Loss 3.0886 LearningRate 0.0000 Epoch: 32 Global 
Step: 669200 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:38,157-Speed 6296.64 samples/sec Loss 3.1556 LearningRate 0.0000 Epoch: 32 Global Step: 669210 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:41,390-Speed 6336.79 samples/sec Loss 3.1173 LearningRate 0.0000 Epoch: 32 Global Step: 669220 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:44,633-Speed 6316.57 samples/sec Loss 3.0681 LearningRate 0.0000 Epoch: 32 Global Step: 669230 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:47,881-Speed 6305.47 samples/sec Loss 3.0889 LearningRate 0.0000 Epoch: 32 Global Step: 669240 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:51,130-Speed 6305.40 samples/sec Loss 3.0801 LearningRate 0.0000 Epoch: 32 Global Step: 669250 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:54,384-Speed 6294.36 samples/sec Loss 3.1118 LearningRate 0.0000 Epoch: 32 Global Step: 669260 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:36:57,636-Speed 6299.31 samples/sec Loss 3.0741 LearningRate 0.0000 Epoch: 32 Global Step: 669270 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:00,887-Speed 6300.67 samples/sec Loss 3.1286 LearningRate 0.0000 Epoch: 32 Global Step: 669280 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:04,133-Speed 6311.44 samples/sec Loss 3.1195 LearningRate 0.0000 Epoch: 32 Global Step: 669290 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:07,378-Speed 6312.15 samples/sec Loss 3.0728 LearningRate 0.0000 Epoch: 32 Global Step: 669300 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:10,627-Speed 6305.30 samples/sec Loss 3.1635 LearningRate 0.0000 Epoch: 32 Global Step: 669310 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:13,861-Speed 6334.83 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 32 Global Step: 669320 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:37:17,108-Speed 6307.23 samples/sec Loss 3.1134 LearningRate 0.0000 Epoch: 32 Global Step: 669330 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:20,356-Speed 6307.29 samples/sec Loss 3.0690 LearningRate 0.0000 Epoch: 32 Global Step: 669340 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:23,613-Speed 6289.75 samples/sec Loss 3.1476 LearningRate 0.0000 Epoch: 32 Global Step: 669350 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:26,867-Speed 6294.22 samples/sec Loss 3.1204 LearningRate 0.0000 Epoch: 32 Global Step: 669360 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:30,113-Speed 6311.81 samples/sec Loss 3.1372 LearningRate 0.0000 Epoch: 32 Global Step: 669370 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:33,368-Speed 6293.50 samples/sec Loss 3.0739 LearningRate 0.0000 Epoch: 32 Global Step: 669380 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:36,620-Speed 6297.83 samples/sec Loss 3.1460 LearningRate 0.0000 Epoch: 32 Global Step: 669390 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:39,880-Speed 6285.63 samples/sec Loss 3.0358 LearningRate 0.0000 Epoch: 32 Global Step: 669400 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:43,125-Speed 6312.06 samples/sec Loss 3.0608 LearningRate 0.0000 Epoch: 32 Global Step: 669410 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:46,374-Speed 6305.67 samples/sec Loss 3.1182 LearningRate 0.0000 Epoch: 32 Global Step: 669420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:37:49,603-Speed 6344.23 samples/sec Loss 3.1251 LearningRate 0.0000 Epoch: 32 Global Step: 669430 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:52,848-Speed 6312.11 samples/sec Loss 3.0656 LearningRate 0.0000 Epoch: 32 Global Step: 669440 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:37:56,095-Speed 6308.84 samples/sec Loss 3.1273 LearningRate 0.0000 Epoch: 32 Global Step: 669450 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:37:59,346-Speed 6302.61 samples/sec Loss 3.0970 LearningRate 0.0000 Epoch: 32 Global Step: 669460 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:02,590-Speed 6314.21 samples/sec Loss 3.1314 LearningRate 0.0000 Epoch: 32 Global Step: 669470 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:05,825-Speed 6332.40 samples/sec Loss 3.1515 LearningRate 0.0000 Epoch: 32 Global Step: 669480 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:09,069-Speed 6313.43 samples/sec Loss 3.0721 LearningRate 0.0000 Epoch: 32 Global Step: 669490 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:12,324-Speed 6293.33 samples/sec Loss 3.0926 LearningRate 0.0000 Epoch: 32 Global Step: 669500 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:15,571-Speed 6309.95 samples/sec Loss 3.0857 LearningRate 0.0000 Epoch: 32 Global Step: 669510 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:18,822-Speed 6299.74 samples/sec Loss 3.1090 LearningRate 0.0000 Epoch: 32 Global Step: 669520 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:22,080-Speed 6287.96 samples/sec Loss 3.0763 LearningRate 0.0000 Epoch: 32 Global Step: 669530 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:25,332-Speed 6299.04 samples/sec Loss 3.0860 LearningRate 0.0000 Epoch: 32 Global Step: 669540 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:28,577-Speed 6313.62 samples/sec Loss 3.0286 LearningRate 0.0000 Epoch: 32 Global Step: 669550 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:31,823-Speed 6309.84 samples/sec Loss 3.1275 LearningRate 0.0000 Epoch: 32 Global Step: 669560 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:35,068-Speed 6312.22 samples/sec Loss 
3.0771 LearningRate 0.0000 Epoch: 32 Global Step: 669570 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:38:38,313-Speed 6313.89 samples/sec Loss 3.1161 LearningRate 0.0000 Epoch: 32 Global Step: 669580 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:41,561-Speed 6307.06 samples/sec Loss 3.1384 LearningRate 0.0000 Epoch: 32 Global Step: 669590 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:44,808-Speed 6309.06 samples/sec Loss 3.1239 LearningRate 0.0000 Epoch: 32 Global Step: 669600 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:48,061-Speed 6298.16 samples/sec Loss 3.1303 LearningRate 0.0000 Epoch: 32 Global Step: 669610 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:51,308-Speed 6309.75 samples/sec Loss 3.0874 LearningRate 0.0000 Epoch: 32 Global Step: 669620 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:54,619-Speed 6186.33 samples/sec Loss 3.1068 LearningRate 0.0000 Epoch: 32 Global Step: 669630 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:38:57,857-Speed 6327.26 samples/sec Loss 3.0992 LearningRate 0.0000 Epoch: 32 Global Step: 669640 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:01,114-Speed 6289.30 samples/sec Loss 3.1182 LearningRate 0.0000 Epoch: 32 Global Step: 669650 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:04,362-Speed 6305.34 samples/sec Loss 3.1155 LearningRate 0.0000 Epoch: 32 Global Step: 669660 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:07,610-Speed 6308.36 samples/sec Loss 3.0841 LearningRate 0.0000 Epoch: 32 Global Step: 669670 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:10,857-Speed 6308.91 samples/sec Loss 3.1205 LearningRate 0.0000 Epoch: 32 Global Step: 669680 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:14,109-Speed 6298.12 samples/sec Loss 3.1194 LearningRate 0.0000 Epoch: 32 Global 
Step: 669690 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:17,363-Speed 6295.09 samples/sec Loss 3.1298 LearningRate 0.0000 Epoch: 32 Global Step: 669700 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:20,611-Speed 6306.72 samples/sec Loss 3.1104 LearningRate 0.0000 Epoch: 32 Global Step: 669710 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:23,854-Speed 6316.42 samples/sec Loss 3.1218 LearningRate 0.0000 Epoch: 32 Global Step: 669720 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:27,097-Speed 6316.81 samples/sec Loss 3.1173 LearningRate 0.0000 Epoch: 32 Global Step: 669730 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-03 04:39:30,352-Speed 6294.18 samples/sec Loss 3.1320 LearningRate 0.0000 Epoch: 32 Global Step: 669740 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:33,597-Speed 6312.20 samples/sec Loss 3.1142 LearningRate 0.0000 Epoch: 32 Global Step: 669750 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:36,851-Speed 6295.26 samples/sec Loss 3.1050 LearningRate 0.0000 Epoch: 32 Global Step: 669760 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:40,099-Speed 6308.01 samples/sec Loss 3.1161 LearningRate 0.0000 Epoch: 32 Global Step: 669770 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:43,349-Speed 6301.12 samples/sec Loss 3.0874 LearningRate 0.0000 Epoch: 32 Global Step: 669780 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:46,596-Speed 6309.98 samples/sec Loss 3.0969 LearningRate 0.0000 Epoch: 32 Global Step: 669790 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:49,849-Speed 6297.14 samples/sec Loss 3.0899 LearningRate 0.0000 Epoch: 32 Global Step: 669800 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:53,098-Speed 6304.25 samples/sec Loss 3.0954 LearningRate 0.0000 Epoch: 32 Global Step: 669810 Fp16 Grad Scale: 8192 
Required: 15 hours Training: 2022-04-03 04:39:56,346-Speed 6308.10 samples/sec Loss 3.1440 LearningRate 0.0000 Epoch: 32 Global Step: 669820 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:39:59,590-Speed 6315.66 samples/sec Loss 3.1106 LearningRate 0.0000 Epoch: 32 Global Step: 669830 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:02,844-Speed 6295.50 samples/sec Loss 3.1271 LearningRate 0.0000 Epoch: 32 Global Step: 669840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-03 04:40:06,072-Speed 6344.73 samples/sec Loss 3.1176 LearningRate 0.0000 Epoch: 32 Global Step: 669850 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:09,322-Speed 6303.73 samples/sec Loss 3.0786 LearningRate 0.0000 Epoch: 32 Global Step: 669860 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:12,563-Speed 6320.50 samples/sec Loss 3.1047 LearningRate 0.0000 Epoch: 32 Global Step: 669870 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:15,810-Speed 6307.66 samples/sec Loss 3.1212 LearningRate 0.0000 Epoch: 32 Global Step: 669880 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:19,053-Speed 6317.17 samples/sec Loss 3.1725 LearningRate 0.0000 Epoch: 32 Global Step: 669890 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:22,298-Speed 6312.50 samples/sec Loss 3.1531 LearningRate 0.0000 Epoch: 32 Global Step: 669900 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:25,545-Speed 6308.62 samples/sec Loss 3.0761 LearningRate 0.0000 Epoch: 32 Global Step: 669910 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:28,789-Speed 6314.35 samples/sec Loss 3.0920 LearningRate 0.0000 Epoch: 32 Global Step: 669920 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:32,034-Speed 6313.58 samples/sec Loss 3.1179 LearningRate 0.0000 Epoch: 32 Global Step: 669930 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 
04:40:35,283-Speed 6304.02 samples/sec Loss 3.0638 LearningRate 0.0000 Epoch: 32 Global Step: 669940 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:38,516-Speed 6336.10 samples/sec Loss 3.0771 LearningRate 0.0000 Epoch: 32 Global Step: 669950 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:41,768-Speed 6298.80 samples/sec Loss 3.0999 LearningRate 0.0000 Epoch: 32 Global Step: 669960 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:45,008-Speed 6322.55 samples/sec Loss 3.1485 LearningRate 0.0000 Epoch: 32 Global Step: 669970 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:48,269-Speed 6282.68 samples/sec Loss 3.1748 LearningRate 0.0000 Epoch: 32 Global Step: 669980 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:51,532-Speed 6277.77 samples/sec Loss 3.1064 LearningRate 0.0000 Epoch: 32 Global Step: 669990 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:54,784-Speed 6299.19 samples/sec Loss 3.0893 LearningRate 0.0000 Epoch: 32 Global Step: 670000 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:40:58,040-Speed 6290.60 samples/sec Loss 3.0689 LearningRate 0.0000 Epoch: 32 Global Step: 670010 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:01,293-Speed 6297.65 samples/sec Loss 3.1087 LearningRate 0.0000 Epoch: 32 Global Step: 670020 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:04,540-Speed 6309.19 samples/sec Loss 3.0561 LearningRate 0.0000 Epoch: 32 Global Step: 670030 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:07,785-Speed 6312.00 samples/sec Loss 3.1031 LearningRate 0.0000 Epoch: 32 Global Step: 670040 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:11,027-Speed 6330.19 samples/sec Loss 3.0845 LearningRate 0.0000 Epoch: 32 Global Step: 670050 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:14,272-Speed 6311.89 samples/sec Loss 
3.1021 LearningRate 0.0000 Epoch: 32 Global Step: 670060 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:17,517-Speed 6312.07 samples/sec Loss 3.0625 LearningRate 0.0000 Epoch: 32 Global Step: 670070 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:20,762-Speed 6312.49 samples/sec Loss 3.0670 LearningRate 0.0000 Epoch: 32 Global Step: 670080 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:24,007-Speed 6312.87 samples/sec Loss 3.1602 LearningRate 0.0000 Epoch: 32 Global Step: 670090 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:27,258-Speed 6302.45 samples/sec Loss 3.0430 LearningRate 0.0000 Epoch: 32 Global Step: 670100 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:30,512-Speed 6294.06 samples/sec Loss 3.1100 LearningRate 0.0000 Epoch: 32 Global Step: 670110 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:33,762-Speed 6302.45 samples/sec Loss 3.0986 LearningRate 0.0000 Epoch: 32 Global Step: 670120 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:37,017-Speed 6293.16 samples/sec Loss 3.1132 LearningRate 0.0000 Epoch: 32 Global Step: 670130 Fp16 Grad Scale: 8192 Required: 15 hours Training: 2022-04-03 04:41:40,267-Speed 6305.02 samples/sec Loss 3.0813 LearningRate 0.0000 Epoch: 32 Global Step: 670140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:43,500-Speed 6335.70 samples/sec Loss 3.0356 LearningRate 0.0000 Epoch: 32 Global Step: 670150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:46,743-Speed 6316.29 samples/sec Loss 3.0771 LearningRate 0.0000 Epoch: 32 Global Step: 670160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:49,994-Speed 6301.59 samples/sec Loss 3.0681 LearningRate 0.0000 Epoch: 32 Global Step: 670170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:53,242-Speed 6305.68 samples/sec Loss 3.0736 LearningRate 0.0000 Epoch: 32 Global 
Step: 670180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:56,492-Speed 6303.74 samples/sec Loss 3.0845 LearningRate 0.0000 Epoch: 32 Global Step: 670190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:41:59,743-Speed 6301.24 samples/sec Loss 3.0918 LearningRate 0.0000 Epoch: 32 Global Step: 670200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:02,996-Speed 6297.37 samples/sec Loss 3.0912 LearningRate 0.0000 Epoch: 32 Global Step: 670210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:06,255-Speed 6285.56 samples/sec Loss 3.0789 LearningRate 0.0000 Epoch: 32 Global Step: 670220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:09,503-Speed 6306.52 samples/sec Loss 3.0934 LearningRate 0.0000 Epoch: 32 Global Step: 670230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:12,753-Speed 6302.48 samples/sec Loss 3.0607 LearningRate 0.0000 Epoch: 32 Global Step: 670240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:15,993-Speed 6324.56 samples/sec Loss 3.1356 LearningRate 0.0000 Epoch: 32 Global Step: 670250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:19,242-Speed 6304.34 samples/sec Loss 3.1254 LearningRate 0.0000 Epoch: 32 Global Step: 670260 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:22,493-Speed 6300.39 samples/sec Loss 3.0747 LearningRate 0.0000 Epoch: 32 Global Step: 670270 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:25,742-Speed 6306.32 samples/sec Loss 3.0845 LearningRate 0.0000 Epoch: 32 Global Step: 670280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:28,989-Speed 6308.43 samples/sec Loss 3.0927 LearningRate 0.0000 Epoch: 32 Global Step: 670290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:32,239-Speed 6302.69 samples/sec Loss 3.1202 LearningRate 0.0000 Epoch: 32 Global Step: 670300 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:42:35,484-Speed 6312.85 samples/sec Loss 3.0481 LearningRate 0.0000 Epoch: 32 Global Step: 670310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:38,733-Speed 6305.05 samples/sec Loss 3.1203 LearningRate 0.0000 Epoch: 32 Global Step: 670320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:41,984-Speed 6300.89 samples/sec Loss 3.1430 LearningRate 0.0000 Epoch: 32 Global Step: 670330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:45,246-Speed 6279.22 samples/sec Loss 3.0847 LearningRate 0.0000 Epoch: 32 Global Step: 670340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:48,482-Speed 6331.38 samples/sec Loss 3.1697 LearningRate 0.0000 Epoch: 32 Global Step: 670350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:51,731-Speed 6304.51 samples/sec Loss 3.0990 LearningRate 0.0000 Epoch: 32 Global Step: 670360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:54,981-Speed 6302.21 samples/sec Loss 3.1217 LearningRate 0.0000 Epoch: 32 Global Step: 670370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:42:58,229-Speed 6307.27 samples/sec Loss 3.0796 LearningRate 0.0000 Epoch: 32 Global Step: 670380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:01,480-Speed 6301.36 samples/sec Loss 3.0968 LearningRate 0.0000 Epoch: 32 Global Step: 670390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:04,730-Speed 6302.28 samples/sec Loss 3.1364 LearningRate 0.0000 Epoch: 32 Global Step: 670400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:07,982-Speed 6299.68 samples/sec Loss 3.1004 LearningRate 0.0000 Epoch: 32 Global Step: 670410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:11,240-Speed 6287.34 samples/sec Loss 3.0617 LearningRate 0.0000 Epoch: 32 Global Step: 670420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
04:43:14,489-Speed 6305.87 samples/sec Loss 3.0809 LearningRate 0.0000 Epoch: 32 Global Step: 670430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:17,742-Speed 6296.89 samples/sec Loss 3.1271 LearningRate 0.0000 Epoch: 32 Global Step: 670440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:20,976-Speed 6333.95 samples/sec Loss 3.0889 LearningRate 0.0000 Epoch: 32 Global Step: 670450 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:24,222-Speed 6309.43 samples/sec Loss 3.0742 LearningRate 0.0000 Epoch: 32 Global Step: 670460 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:27,470-Speed 6307.20 samples/sec Loss 3.0493 LearningRate 0.0000 Epoch: 32 Global Step: 670470 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:30,716-Speed 6311.39 samples/sec Loss 3.0667 LearningRate 0.0000 Epoch: 32 Global Step: 670480 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:33,965-Speed 6304.60 samples/sec Loss 3.1280 LearningRate 0.0000 Epoch: 32 Global Step: 670490 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:37,213-Speed 6307.54 samples/sec Loss 3.0993 LearningRate 0.0000 Epoch: 32 Global Step: 670500 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:40,480-Speed 6270.48 samples/sec Loss 3.0724 LearningRate 0.0000 Epoch: 32 Global Step: 670510 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:43,725-Speed 6313.41 samples/sec Loss 3.0553 LearningRate 0.0000 Epoch: 32 Global Step: 670520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:46,972-Speed 6307.16 samples/sec Loss 3.1199 LearningRate 0.0000 Epoch: 32 Global Step: 670530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:50,231-Speed 6286.23 samples/sec Loss 3.1068 LearningRate 0.0000 Epoch: 32 Global Step: 670540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:43:53,480-Speed 6305.82 samples/sec Loss 
3.0893 LearningRate 0.0000 Epoch: 32 Global Step: 670550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:56,729-Speed 6305.03 samples/sec Loss 3.0689 LearningRate 0.0000 Epoch: 32 Global Step: 670560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:43:59,978-Speed 6303.67 samples/sec Loss 3.0929 LearningRate 0.0000 Epoch: 32 Global Step: 670570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:03,226-Speed 6306.76 samples/sec Loss 3.1171 LearningRate 0.0000 Epoch: 32 Global Step: 670580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:06,477-Speed 6302.66 samples/sec Loss 3.0709 LearningRate 0.0000 Epoch: 32 Global Step: 670590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:09,723-Speed 6310.01 samples/sec Loss 3.1049 LearningRate 0.0000 Epoch: 32 Global Step: 670600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:12,972-Speed 6304.44 samples/sec Loss 3.1075 LearningRate 0.0000 Epoch: 32 Global Step: 670610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:16,230-Speed 6288.03 samples/sec Loss 3.0593 LearningRate 0.0000 Epoch: 32 Global Step: 670620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:19,487-Speed 6288.94 samples/sec Loss 3.1100 LearningRate 0.0000 Epoch: 32 Global Step: 670630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:22,734-Speed 6308.98 samples/sec Loss 3.0895 LearningRate 0.0000 Epoch: 32 Global Step: 670640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:25,974-Speed 6322.68 samples/sec Loss 3.1361 LearningRate 0.0000 Epoch: 32 Global Step: 670650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:29,226-Speed 6299.29 samples/sec Loss 3.1489 LearningRate 0.0000 Epoch: 32 Global Step: 670660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:32,479-Speed 6297.47 samples/sec Loss 3.0564 LearningRate 0.0000 Epoch: 32 Global 
Step: 670670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:35,726-Speed 6308.24 samples/sec Loss 3.1034 LearningRate 0.0000 Epoch: 32 Global Step: 670680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:38,971-Speed 6314.82 samples/sec Loss 3.1127 LearningRate 0.0000 Epoch: 32 Global Step: 670690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:42,221-Speed 6301.23 samples/sec Loss 3.0939 LearningRate 0.0000 Epoch: 32 Global Step: 670700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:45,475-Speed 6295.53 samples/sec Loss 3.0967 LearningRate 0.0000 Epoch: 32 Global Step: 670710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:48,722-Speed 6309.23 samples/sec Loss 3.1357 LearningRate 0.0000 Epoch: 32 Global Step: 670720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:51,971-Speed 6306.02 samples/sec Loss 3.1084 LearningRate 0.0000 Epoch: 32 Global Step: 670730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:55,219-Speed 6305.18 samples/sec Loss 3.1014 LearningRate 0.0000 Epoch: 32 Global Step: 670740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:44:58,450-Speed 6340.97 samples/sec Loss 3.0666 LearningRate 0.0000 Epoch: 32 Global Step: 670750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:01,700-Speed 6302.42 samples/sec Loss 3.0940 LearningRate 0.0000 Epoch: 32 Global Step: 670760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:04,954-Speed 6296.17 samples/sec Loss 3.0355 LearningRate 0.0000 Epoch: 32 Global Step: 670770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:08,212-Speed 6286.37 samples/sec Loss 3.1202 LearningRate 0.0000 Epoch: 32 Global Step: 670780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:11,462-Speed 6303.46 samples/sec Loss 3.0621 LearningRate 0.0000 Epoch: 32 Global Step: 670790 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:45:14,709-Speed 6307.78 samples/sec Loss 3.0873 LearningRate 0.0000 Epoch: 32 Global Step: 670800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:17,956-Speed 6308.39 samples/sec Loss 3.0764 LearningRate 0.0000 Epoch: 32 Global Step: 670810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:21,208-Speed 6300.74 samples/sec Loss 3.0762 LearningRate 0.0000 Epoch: 32 Global Step: 670820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:45:24,448-Speed 6321.60 samples/sec Loss 3.1066 LearningRate 0.0000 Epoch: 32 Global Step: 670830 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:27,695-Speed 6309.56 samples/sec Loss 3.1506 LearningRate 0.0000 Epoch: 32 Global Step: 670840 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:30,951-Speed 6290.59 samples/sec Loss 3.0697 LearningRate 0.0000 Epoch: 32 Global Step: 670850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:34,195-Speed 6315.35 samples/sec Loss 3.1571 LearningRate 0.0000 Epoch: 32 Global Step: 670860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:37,441-Speed 6310.68 samples/sec Loss 3.0971 LearningRate 0.0000 Epoch: 32 Global Step: 670870 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:40,687-Speed 6310.23 samples/sec Loss 3.1412 LearningRate 0.0000 Epoch: 32 Global Step: 670880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:43,934-Speed 6308.13 samples/sec Loss 3.0782 LearningRate 0.0000 Epoch: 32 Global Step: 670890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:47,184-Speed 6304.38 samples/sec Loss 3.0880 LearningRate 0.0000 Epoch: 32 Global Step: 670900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:50,432-Speed 6307.53 samples/sec Loss 3.1339 LearningRate 0.0000 Epoch: 32 Global Step: 670910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
04:45:53,695-Speed 6277.44 samples/sec Loss 3.1342 LearningRate 0.0000 Epoch: 32 Global Step: 670920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:45:56,938-Speed 6316.76 samples/sec Loss 3.0806 LearningRate 0.0000 Epoch: 32 Global Step: 670930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:00,201-Speed 6278.46 samples/sec Loss 3.1086 LearningRate 0.0000 Epoch: 32 Global Step: 670940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:03,445-Speed 6314.73 samples/sec Loss 3.1014 LearningRate 0.0000 Epoch: 32 Global Step: 670950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:06,690-Speed 6312.10 samples/sec Loss 3.1926 LearningRate 0.0000 Epoch: 32 Global Step: 670960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:09,935-Speed 6313.21 samples/sec Loss 3.0916 LearningRate 0.0000 Epoch: 32 Global Step: 670970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:13,197-Speed 6277.97 samples/sec Loss 3.1529 LearningRate 0.0000 Epoch: 32 Global Step: 670980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:16,449-Speed 6299.14 samples/sec Loss 3.1197 LearningRate 0.0000 Epoch: 32 Global Step: 670990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:19,707-Speed 6288.17 samples/sec Loss 3.1384 LearningRate 0.0000 Epoch: 32 Global Step: 671000 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:22,947-Speed 6323.06 samples/sec Loss 3.1073 LearningRate 0.0000 Epoch: 32 Global Step: 671010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:26,196-Speed 6304.93 samples/sec Loss 3.1329 LearningRate 0.0000 Epoch: 32 Global Step: 671020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:29,443-Speed 6307.68 samples/sec Loss 3.0924 LearningRate 0.0000 Epoch: 32 Global Step: 671030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:32,690-Speed 6308.26 samples/sec Loss 
3.0940 LearningRate 0.0000 Epoch: 32 Global Step: 671040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:35,933-Speed 6317.61 samples/sec Loss 3.0708 LearningRate 0.0000 Epoch: 32 Global Step: 671050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:39,181-Speed 6305.87 samples/sec Loss 3.0479 LearningRate 0.0000 Epoch: 32 Global Step: 671060 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:42,432-Speed 6302.36 samples/sec Loss 3.1031 LearningRate 0.0000 Epoch: 32 Global Step: 671070 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:45,681-Speed 6303.58 samples/sec Loss 3.0829 LearningRate 0.0000 Epoch: 32 Global Step: 671080 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:48,926-Speed 6312.68 samples/sec Loss 3.0897 LearningRate 0.0000 Epoch: 32 Global Step: 671090 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:52,187-Speed 6283.77 samples/sec Loss 3.1486 LearningRate 0.0000 Epoch: 32 Global Step: 671100 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:46:55,440-Speed 6295.47 samples/sec Loss 3.0965 LearningRate 0.0000 Epoch: 32 Global Step: 671110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:46:58,685-Speed 6313.47 samples/sec Loss 3.0652 LearningRate 0.0000 Epoch: 32 Global Step: 671120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:01,933-Speed 6307.30 samples/sec Loss 3.0765 LearningRate 0.0000 Epoch: 32 Global Step: 671130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:05,181-Speed 6306.61 samples/sec Loss 3.0427 LearningRate 0.0000 Epoch: 32 Global Step: 671140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:08,426-Speed 6313.25 samples/sec Loss 3.0968 LearningRate 0.0000 Epoch: 32 Global Step: 671150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:11,669-Speed 6315.12 samples/sec Loss 3.0880 LearningRate 0.0000 Epoch: 32 Global 
Step: 671160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:14,927-Speed 6287.76 samples/sec Loss 3.1569 LearningRate 0.0000 Epoch: 32 Global Step: 671170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:18,176-Speed 6305.98 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 32 Global Step: 671180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:21,434-Speed 6286.64 samples/sec Loss 3.1171 LearningRate 0.0000 Epoch: 32 Global Step: 671190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:24,681-Speed 6308.11 samples/sec Loss 3.0685 LearningRate 0.0000 Epoch: 32 Global Step: 671200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:27,918-Speed 6329.63 samples/sec Loss 2.9968 LearningRate 0.0000 Epoch: 32 Global Step: 671210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:31,166-Speed 6308.36 samples/sec Loss 3.0837 LearningRate 0.0000 Epoch: 32 Global Step: 671220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:34,416-Speed 6302.41 samples/sec Loss 3.1012 LearningRate 0.0000 Epoch: 32 Global Step: 671230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:37,670-Speed 6294.84 samples/sec Loss 3.0756 LearningRate 0.0000 Epoch: 32 Global Step: 671240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:40,916-Speed 6311.21 samples/sec Loss 3.1050 LearningRate 0.0000 Epoch: 32 Global Step: 671250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:47:44,162-Speed 6310.74 samples/sec Loss 3.0667 LearningRate 0.0000 Epoch: 32 Global Step: 671260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:47:47,408-Speed 6309.22 samples/sec Loss 3.1030 LearningRate 0.0000 Epoch: 32 Global Step: 671270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:47:50,659-Speed 6301.57 samples/sec Loss 3.0614 LearningRate 0.0000 Epoch: 32 Global Step: 671280 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 04:47:53,907-Speed 6307.43 samples/sec Loss 3.0629 LearningRate 0.0000 Epoch: 32 Global Step: 671290 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:47:57,152-Speed 6311.80 samples/sec Loss 3.0676 LearningRate 0.0000 Epoch: 32 Global Step: 671300 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:00,415-Speed 6280.19 samples/sec Loss 3.1478 LearningRate 0.0000 Epoch: 32 Global Step: 671310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:03,667-Speed 6298.62 samples/sec Loss 3.0989 LearningRate 0.0000 Epoch: 32 Global Step: 671320 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:06,916-Speed 6304.34 samples/sec Loss 3.1581 LearningRate 0.0000 Epoch: 32 Global Step: 671330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:10,163-Speed 6308.97 samples/sec Loss 3.1043 LearningRate 0.0000 Epoch: 32 Global Step: 671340 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:13,410-Speed 6308.85 samples/sec Loss 3.1213 LearningRate 0.0000 Epoch: 32 Global Step: 671350 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:48:16,655-Speed 6312.96 samples/sec Loss 3.0758 LearningRate 0.0000 Epoch: 32 Global Step: 671360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:19,901-Speed 6311.81 samples/sec Loss 3.1110 LearningRate 0.0000 Epoch: 32 Global Step: 671370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:23,150-Speed 6303.81 samples/sec Loss 3.1116 LearningRate 0.0000 Epoch: 32 Global Step: 671380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:26,410-Speed 6284.12 samples/sec Loss 3.0374 LearningRate 0.0000 Epoch: 32 Global Step: 671390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:29,667-Speed 6289.06 samples/sec Loss 3.0898 LearningRate 0.0000 Epoch: 32 Global Step: 671400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
04:48:32,919-Speed 6298.03 samples/sec Loss 3.0905 LearningRate 0.0000 Epoch: 32 Global Step: 671410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:36,171-Speed 6299.98 samples/sec Loss 3.0498 LearningRate 0.0000 Epoch: 32 Global Step: 671420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:39,425-Speed 6294.84 samples/sec Loss 3.0568 LearningRate 0.0000 Epoch: 32 Global Step: 671430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:42,675-Speed 6302.78 samples/sec Loss 3.0968 LearningRate 0.0000 Epoch: 32 Global Step: 671440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:45,929-Speed 6296.00 samples/sec Loss 3.1049 LearningRate 0.0000 Epoch: 32 Global Step: 671450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:49,180-Speed 6302.10 samples/sec Loss 3.0717 LearningRate 0.0000 Epoch: 32 Global Step: 671460 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-03 04:48:52,416-Speed 6328.39 samples/sec Loss 3.0946 LearningRate 0.0000 Epoch: 32 Global Step: 671470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:55,661-Speed 6313.58 samples/sec Loss 3.0996 LearningRate 0.0000 Epoch: 32 Global Step: 671480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:48:58,911-Speed 6301.81 samples/sec Loss 3.1600 LearningRate 0.0000 Epoch: 32 Global Step: 671490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:02,163-Speed 6304.10 samples/sec Loss 3.0541 LearningRate 0.0000 Epoch: 32 Global Step: 671500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:05,408-Speed 6311.93 samples/sec Loss 3.0790 LearningRate 0.0000 Epoch: 32 Global Step: 671510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:08,659-Speed 6301.24 samples/sec Loss 3.0943 LearningRate 0.0000 Epoch: 32 Global Step: 671520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:11,926-Speed 6270.52 samples/sec 
Loss 3.0984 LearningRate 0.0000 Epoch: 32 Global Step: 671530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:15,176-Speed 6302.56 samples/sec Loss 3.1135 LearningRate 0.0000 Epoch: 32 Global Step: 671540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:18,429-Speed 6296.92 samples/sec Loss 3.1513 LearningRate 0.0000 Epoch: 32 Global Step: 671550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:21,680-Speed 6301.41 samples/sec Loss 3.1412 LearningRate 0.0000 Epoch: 32 Global Step: 671560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:24,915-Speed 6332.71 samples/sec Loss 3.0593 LearningRate 0.0000 Epoch: 32 Global Step: 671570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:28,167-Speed 6298.68 samples/sec Loss 3.0939 LearningRate 0.0000 Epoch: 32 Global Step: 671580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:31,413-Speed 6310.00 samples/sec Loss 3.0923 LearningRate 0.0000 Epoch: 32 Global Step: 671590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:34,663-Speed 6304.26 samples/sec Loss 3.0837 LearningRate 0.0000 Epoch: 32 Global Step: 671600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:37,924-Speed 6280.19 samples/sec Loss 3.0860 LearningRate 0.0000 Epoch: 32 Global Step: 671610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:41,171-Speed 6310.10 samples/sec Loss 3.0871 LearningRate 0.0000 Epoch: 32 Global Step: 671620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:44,415-Speed 6314.12 samples/sec Loss 3.0940 LearningRate 0.0000 Epoch: 32 Global Step: 671630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:47,662-Speed 6308.18 samples/sec Loss 3.0739 LearningRate 0.0000 Epoch: 32 Global Step: 671640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:50,916-Speed 6295.88 samples/sec Loss 3.1251 LearningRate 0.0000 Epoch: 32 
Global Step: 671650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:54,165-Speed 6304.00 samples/sec Loss 3.1088 LearningRate 0.0000 Epoch: 32 Global Step: 671660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:49:57,404-Speed 6325.17 samples/sec Loss 3.0957 LearningRate 0.0000 Epoch: 32 Global Step: 671670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:00,651-Speed 6308.73 samples/sec Loss 3.1335 LearningRate 0.0000 Epoch: 32 Global Step: 671680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:03,904-Speed 6296.04 samples/sec Loss 3.0735 LearningRate 0.0000 Epoch: 32 Global Step: 671690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:07,155-Speed 6301.31 samples/sec Loss 3.0331 LearningRate 0.0000 Epoch: 32 Global Step: 671700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:10,403-Speed 6307.42 samples/sec Loss 3.0949 LearningRate 0.0000 Epoch: 32 Global Step: 671710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:13,649-Speed 6311.44 samples/sec Loss 3.1512 LearningRate 0.0000 Epoch: 32 Global Step: 671720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:16,898-Speed 6303.43 samples/sec Loss 3.1004 LearningRate 0.0000 Epoch: 32 Global Step: 671730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:20,145-Speed 6310.24 samples/sec Loss 3.1397 LearningRate 0.0000 Epoch: 32 Global Step: 671740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:23,392-Speed 6309.19 samples/sec Loss 3.1745 LearningRate 0.0000 Epoch: 32 Global Step: 671750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:26,648-Speed 6291.85 samples/sec Loss 3.0983 LearningRate 0.0000 Epoch: 32 Global Step: 671760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:29,884-Speed 6330.49 samples/sec Loss 3.1012 LearningRate 0.0000 Epoch: 32 Global Step: 671770 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:50:33,138-Speed 6293.95 samples/sec Loss 3.1293 LearningRate 0.0000 Epoch: 32 Global Step: 671780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:36,387-Speed 6305.97 samples/sec Loss 3.1207 LearningRate 0.0000 Epoch: 32 Global Step: 671790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:39,637-Speed 6303.16 samples/sec Loss 3.1245 LearningRate 0.0000 Epoch: 32 Global Step: 671800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:42,883-Speed 6310.61 samples/sec Loss 3.1173 LearningRate 0.0000 Epoch: 32 Global Step: 671810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:46,129-Speed 6309.90 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 32 Global Step: 671820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:49,379-Speed 6303.73 samples/sec Loss 3.0782 LearningRate 0.0000 Epoch: 32 Global Step: 671830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:52,624-Speed 6311.64 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 671840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:55,868-Speed 6313.99 samples/sec Loss 3.0826 LearningRate 0.0000 Epoch: 32 Global Step: 671850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:50:59,113-Speed 6312.96 samples/sec Loss 3.0598 LearningRate 0.0000 Epoch: 32 Global Step: 671860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:02,345-Speed 6339.72 samples/sec Loss 3.1302 LearningRate 0.0000 Epoch: 32 Global Step: 671870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:05,590-Speed 6311.33 samples/sec Loss 3.0730 LearningRate 0.0000 Epoch: 32 Global Step: 671880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:08,836-Speed 6310.59 samples/sec Loss 3.0918 LearningRate 0.0000 Epoch: 32 Global Step: 671890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
04:51:12,082-Speed 6310.86 samples/sec Loss 3.1397 LearningRate 0.0000 Epoch: 32 Global Step: 671900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:15,330-Speed 6307.50 samples/sec Loss 3.0930 LearningRate 0.0000 Epoch: 32 Global Step: 671910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:18,577-Speed 6308.79 samples/sec Loss 3.0324 LearningRate 0.0000 Epoch: 32 Global Step: 671920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:21,832-Speed 6292.53 samples/sec Loss 3.1080 LearningRate 0.0000 Epoch: 32 Global Step: 671930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:25,083-Speed 6302.01 samples/sec Loss 3.0865 LearningRate 0.0000 Epoch: 32 Global Step: 671940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:28,350-Speed 6269.38 samples/sec Loss 3.0952 LearningRate 0.0000 Epoch: 32 Global Step: 671950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:31,601-Speed 6302.49 samples/sec Loss 3.0745 LearningRate 0.0000 Epoch: 32 Global Step: 671960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:34,839-Speed 6325.44 samples/sec Loss 3.1128 LearningRate 0.0000 Epoch: 32 Global Step: 671970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:38,083-Speed 6314.71 samples/sec Loss 3.1010 LearningRate 0.0000 Epoch: 32 Global Step: 671980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:41,335-Speed 6300.24 samples/sec Loss 3.0900 LearningRate 0.0000 Epoch: 32 Global Step: 671990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:44,578-Speed 6314.97 samples/sec Loss 3.1132 LearningRate 0.0000 Epoch: 32 Global Step: 672000 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:47,823-Speed 6312.64 samples/sec Loss 3.0170 LearningRate 0.0000 Epoch: 32 Global Step: 672010 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:51,072-Speed 6305.67 samples/sec Loss 
3.1445 LearningRate 0.0000 Epoch: 32 Global Step: 672020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:54,325-Speed 6296.75 samples/sec Loss 3.1039 LearningRate 0.0000 Epoch: 32 Global Step: 672030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:51:57,571-Speed 6310.78 samples/sec Loss 3.0198 LearningRate 0.0000 Epoch: 32 Global Step: 672040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:00,825-Speed 6295.46 samples/sec Loss 3.0178 LearningRate 0.0000 Epoch: 32 Global Step: 672050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:04,075-Speed 6301.79 samples/sec Loss 3.1102 LearningRate 0.0000 Epoch: 32 Global Step: 672060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:07,308-Speed 6336.84 samples/sec Loss 3.0229 LearningRate 0.0000 Epoch: 32 Global Step: 672070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:10,554-Speed 6310.93 samples/sec Loss 3.0441 LearningRate 0.0000 Epoch: 32 Global Step: 672080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:13,803-Speed 6305.66 samples/sec Loss 3.1526 LearningRate 0.0000 Epoch: 32 Global Step: 672090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:17,048-Speed 6311.66 samples/sec Loss 3.1245 LearningRate 0.0000 Epoch: 32 Global Step: 672100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:20,296-Speed 6306.26 samples/sec Loss 3.0996 LearningRate 0.0000 Epoch: 32 Global Step: 672110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:23,542-Speed 6311.08 samples/sec Loss 3.0637 LearningRate 0.0000 Epoch: 32 Global Step: 672120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:26,789-Speed 6308.73 samples/sec Loss 3.0716 LearningRate 0.0000 Epoch: 32 Global Step: 672130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:30,035-Speed 6310.35 samples/sec Loss 3.0546 LearningRate 0.0000 Epoch: 32 Global 
Step: 672140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:33,283-Speed 6306.81 samples/sec Loss 3.1275 LearningRate 0.0000 Epoch: 32 Global Step: 672150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:36,534-Speed 6301.20 samples/sec Loss 3.0672 LearningRate 0.0000 Epoch: 32 Global Step: 672160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:39,774-Speed 6322.48 samples/sec Loss 3.1055 LearningRate 0.0000 Epoch: 32 Global Step: 672170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:43,020-Speed 6312.88 samples/sec Loss 3.1560 LearningRate 0.0000 Epoch: 32 Global Step: 672180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:46,270-Speed 6302.58 samples/sec Loss 3.0899 LearningRate 0.0000 Epoch: 32 Global Step: 672190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:49,516-Speed 6311.44 samples/sec Loss 3.0188 LearningRate 0.0000 Epoch: 32 Global Step: 672200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:52,759-Speed 6315.35 samples/sec Loss 3.1456 LearningRate 0.0000 Epoch: 32 Global Step: 672210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:56,008-Speed 6305.61 samples/sec Loss 3.0953 LearningRate 0.0000 Epoch: 32 Global Step: 672220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:52:59,251-Speed 6315.13 samples/sec Loss 3.0705 LearningRate 0.0000 Epoch: 32 Global Step: 672230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:02,505-Speed 6296.05 samples/sec Loss 3.1324 LearningRate 0.0000 Epoch: 32 Global Step: 672240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:05,757-Speed 6298.92 samples/sec Loss 3.1518 LearningRate 0.0000 Epoch: 32 Global Step: 672250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:09,004-Speed 6308.30 samples/sec Loss 3.0628 LearningRate 0.0000 Epoch: 32 Global Step: 672260 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:53:12,253-Speed 6304.62 samples/sec Loss 3.1285 LearningRate 0.0000 Epoch: 32 Global Step: 672270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-03 04:53:15,482-Speed 6344.13 samples/sec Loss 3.0044 LearningRate 0.0000 Epoch: 32 Global Step: 672280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:18,737-Speed 6294.43 samples/sec Loss 3.0598 LearningRate 0.0000 Epoch: 32 Global Step: 672290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:21,982-Speed 6311.09 samples/sec Loss 3.0773 LearningRate 0.0000 Epoch: 32 Global Step: 672300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:25,234-Speed 6299.75 samples/sec Loss 3.0548 LearningRate 0.0000 Epoch: 32 Global Step: 672310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:28,512-Speed 6249.23 samples/sec Loss 3.0766 LearningRate 0.0000 Epoch: 32 Global Step: 672320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:53:31,744-Speed 6338.60 samples/sec Loss 3.0838 LearningRate 0.0000 Epoch: 32 Global Step: 672330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:34,999-Speed 6293.34 samples/sec Loss 3.0670 LearningRate 0.0000 Epoch: 32 Global Step: 672340 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:38,280-Speed 6242.25 samples/sec Loss 3.1031 LearningRate 0.0000 Epoch: 32 Global Step: 672350 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:41,528-Speed 6307.43 samples/sec Loss 3.0745 LearningRate 0.0000 Epoch: 32 Global Step: 672360 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:44,779-Speed 6300.95 samples/sec Loss 3.1063 LearningRate 0.0000 Epoch: 32 Global Step: 672370 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:48,024-Speed 6314.15 samples/sec Loss 3.0675 LearningRate 0.0000 Epoch: 32 Global Step: 672380 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
04:53:51,271-Speed 6308.73 samples/sec Loss 3.0862 LearningRate 0.0000 Epoch: 32 Global Step: 672390 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:54,515-Speed 6313.77 samples/sec Loss 3.1216 LearningRate 0.0000 Epoch: 32 Global Step: 672400 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:53:57,761-Speed 6309.95 samples/sec Loss 3.1092 LearningRate 0.0000 Epoch: 32 Global Step: 672410 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:01,015-Speed 6296.63 samples/sec Loss 3.0480 LearningRate 0.0000 Epoch: 32 Global Step: 672420 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:04,269-Speed 6294.19 samples/sec Loss 3.1546 LearningRate 0.0000 Epoch: 32 Global Step: 672430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:07,515-Speed 6311.28 samples/sec Loss 3.0774 LearningRate 0.0000 Epoch: 32 Global Step: 672440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:10,762-Speed 6309.08 samples/sec Loss 3.0736 LearningRate 0.0000 Epoch: 32 Global Step: 672450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:14,011-Speed 6305.28 samples/sec Loss 3.0557 LearningRate 0.0000 Epoch: 32 Global Step: 672460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:17,254-Speed 6315.71 samples/sec Loss 3.1304 LearningRate 0.0000 Epoch: 32 Global Step: 672470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:20,500-Speed 6310.18 samples/sec Loss 3.0382 LearningRate 0.0000 Epoch: 32 Global Step: 672480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:23,752-Speed 6299.70 samples/sec Loss 3.0964 LearningRate 0.0000 Epoch: 32 Global Step: 672490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:54:26,982-Speed 6342.32 samples/sec Loss 3.0823 LearningRate 0.0000 Epoch: 32 Global Step: 672500 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:30,236-Speed 6293.57 samples/sec Loss 
3.0813 LearningRate 0.0000 Epoch: 32 Global Step: 672510 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:33,483-Speed 6309.40 samples/sec Loss 3.1148 LearningRate 0.0000 Epoch: 32 Global Step: 672520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:36,734-Speed 6300.86 samples/sec Loss 3.0485 LearningRate 0.0000 Epoch: 32 Global Step: 672530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:39,984-Speed 6302.91 samples/sec Loss 3.0932 LearningRate 0.0000 Epoch: 32 Global Step: 672540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:43,238-Speed 6295.85 samples/sec Loss 3.0767 LearningRate 0.0000 Epoch: 32 Global Step: 672550 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:46,488-Speed 6302.84 samples/sec Loss 3.0575 LearningRate 0.0000 Epoch: 32 Global Step: 672560 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:49,739-Speed 6301.75 samples/sec Loss 3.1034 LearningRate 0.0000 Epoch: 32 Global Step: 672570 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:52,988-Speed 6304.88 samples/sec Loss 3.0198 LearningRate 0.0000 Epoch: 32 Global Step: 672580 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:56,233-Speed 6312.50 samples/sec Loss 3.0701 LearningRate 0.0000 Epoch: 32 Global Step: 672590 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:54:59,485-Speed 6298.76 samples/sec Loss 3.1041 LearningRate 0.0000 Epoch: 32 Global Step: 672600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:02,732-Speed 6309.44 samples/sec Loss 3.0883 LearningRate 0.0000 Epoch: 32 Global Step: 672610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:05,984-Speed 6298.91 samples/sec Loss 3.0449 LearningRate 0.0000 Epoch: 32 Global Step: 672620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:09,230-Speed 6310.51 samples/sec Loss 3.0888 LearningRate 0.0000 Epoch: 32 Global 
Step: 672630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:12,476-Speed 6310.36 samples/sec Loss 3.1145 LearningRate 0.0000 Epoch: 32 Global Step: 672640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:15,723-Speed 6309.86 samples/sec Loss 3.1087 LearningRate 0.0000 Epoch: 32 Global Step: 672650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:18,972-Speed 6305.55 samples/sec Loss 3.0364 LearningRate 0.0000 Epoch: 32 Global Step: 672660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:22,217-Speed 6311.54 samples/sec Loss 3.0684 LearningRate 0.0000 Epoch: 32 Global Step: 672670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:25,467-Speed 6302.01 samples/sec Loss 3.1315 LearningRate 0.0000 Epoch: 32 Global Step: 672680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:28,720-Speed 6297.83 samples/sec Loss 3.0882 LearningRate 0.0000 Epoch: 32 Global Step: 672690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:31,961-Speed 6320.01 samples/sec Loss 3.0352 LearningRate 0.0000 Epoch: 32 Global Step: 672700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:35,207-Speed 6310.93 samples/sec Loss 3.1496 LearningRate 0.0000 Epoch: 32 Global Step: 672710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:38,455-Speed 6307.61 samples/sec Loss 3.1151 LearningRate 0.0000 Epoch: 32 Global Step: 672720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:41,701-Speed 6310.36 samples/sec Loss 3.0914 LearningRate 0.0000 Epoch: 32 Global Step: 672730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:44,949-Speed 6306.35 samples/sec Loss 3.0899 LearningRate 0.0000 Epoch: 32 Global Step: 672740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:48,198-Speed 6304.77 samples/sec Loss 3.0939 LearningRate 0.0000 Epoch: 32 Global Step: 672750 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:55:51,441-Speed 6317.69 samples/sec Loss 3.0952 LearningRate 0.0000 Epoch: 32 Global Step: 672760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:54,692-Speed 6299.45 samples/sec Loss 3.0322 LearningRate 0.0000 Epoch: 32 Global Step: 672770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:55:57,940-Speed 6306.74 samples/sec Loss 3.1374 LearningRate 0.0000 Epoch: 32 Global Step: 672780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:01,192-Speed 6299.94 samples/sec Loss 3.0900 LearningRate 0.0000 Epoch: 32 Global Step: 672790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:04,426-Speed 6334.72 samples/sec Loss 3.0550 LearningRate 0.0000 Epoch: 32 Global Step: 672800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:07,672-Speed 6310.61 samples/sec Loss 3.0467 LearningRate 0.0000 Epoch: 32 Global Step: 672810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:10,929-Speed 6289.44 samples/sec Loss 3.0549 LearningRate 0.0000 Epoch: 32 Global Step: 672820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:14,180-Speed 6301.16 samples/sec Loss 3.1476 LearningRate 0.0000 Epoch: 32 Global Step: 672830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:17,431-Speed 6300.77 samples/sec Loss 3.1278 LearningRate 0.0000 Epoch: 32 Global Step: 672840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:20,681-Speed 6304.11 samples/sec Loss 3.0559 LearningRate 0.0000 Epoch: 32 Global Step: 672850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:23,929-Speed 6308.15 samples/sec Loss 3.0833 LearningRate 0.0000 Epoch: 32 Global Step: 672860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:27,178-Speed 6303.92 samples/sec Loss 3.0492 LearningRate 0.0000 Epoch: 32 Global Step: 672870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
04:56:30,428-Speed 6302.23 samples/sec Loss 3.0150 LearningRate 0.0000 Epoch: 32 Global Step: 672880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:33,679-Speed 6301.00 samples/sec Loss 3.1294 LearningRate 0.0000 Epoch: 32 Global Step: 672890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:36,912-Speed 6337.57 samples/sec Loss 3.0447 LearningRate 0.0000 Epoch: 32 Global Step: 672900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:40,164-Speed 6298.10 samples/sec Loss 3.0551 LearningRate 0.0000 Epoch: 32 Global Step: 672910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:43,406-Speed 6317.84 samples/sec Loss 3.0616 LearningRate 0.0000 Epoch: 32 Global Step: 672920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:46,656-Speed 6304.92 samples/sec Loss 3.1283 LearningRate 0.0000 Epoch: 32 Global Step: 672930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:49,901-Speed 6312.26 samples/sec Loss 3.0702 LearningRate 0.0000 Epoch: 32 Global Step: 672940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:56:53,135-Speed 6333.56 samples/sec Loss 3.0647 LearningRate 0.0000 Epoch: 32 Global Step: 672950 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:56:56,386-Speed 6301.60 samples/sec Loss 3.1214 LearningRate 0.0000 Epoch: 32 Global Step: 672960 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:56:59,635-Speed 6304.54 samples/sec Loss 3.1221 LearningRate 0.0000 Epoch: 32 Global Step: 672970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:02,878-Speed 6315.70 samples/sec Loss 3.1176 LearningRate 0.0000 Epoch: 32 Global Step: 672980 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:06,129-Speed 6300.86 samples/sec Loss 3.0872 LearningRate 0.0000 Epoch: 32 Global Step: 672990 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:09,372-Speed 6318.00 samples/sec Loss 
3.0429 LearningRate 0.0000 Epoch: 32 Global Step: 673000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:12,628-Speed 6290.90 samples/sec Loss 3.1185 LearningRate 0.0000 Epoch: 32 Global Step: 673010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:15,875-Speed 6308.78 samples/sec Loss 3.1169 LearningRate 0.0000 Epoch: 32 Global Step: 673020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:19,123-Speed 6307.23 samples/sec Loss 3.0173 LearningRate 0.0000 Epoch: 32 Global Step: 673030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:22,370-Speed 6308.63 samples/sec Loss 3.1051 LearningRate 0.0000 Epoch: 32 Global Step: 673040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:25,634-Speed 6276.29 samples/sec Loss 3.1069 LearningRate 0.0000 Epoch: 32 Global Step: 673050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:57:28,902-Speed 6268.40 samples/sec Loss 3.0985 LearningRate 0.0000 Epoch: 32 Global Step: 673060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:57:32,147-Speed 6312.69 samples/sec Loss 3.0439 LearningRate 0.0000 Epoch: 32 Global Step: 673070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:57:35,400-Speed 6297.24 samples/sec Loss 3.0902 LearningRate 0.0000 Epoch: 32 Global Step: 673080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:57:38,631-Speed 6340.07 samples/sec Loss 3.0486 LearningRate 0.0000 Epoch: 32 Global Step: 673090 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:41,876-Speed 6312.38 samples/sec Loss 3.0429 LearningRate 0.0000 Epoch: 32 Global Step: 673100 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:45,124-Speed 6305.92 samples/sec Loss 3.1330 LearningRate 0.0000 Epoch: 32 Global Step: 673110 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:48,366-Speed 6319.16 samples/sec Loss 3.1133 LearningRate 0.0000 Epoch: 32 Global 
Step: 673120 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:51,615-Speed 6304.69 samples/sec Loss 3.0557 LearningRate 0.0000 Epoch: 32 Global Step: 673130 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:54,863-Speed 6307.53 samples/sec Loss 3.0840 LearningRate 0.0000 Epoch: 32 Global Step: 673140 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:57:58,105-Speed 6317.99 samples/sec Loss 3.0787 LearningRate 0.0000 Epoch: 32 Global Step: 673150 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:58:01,363-Speed 6288.12 samples/sec Loss 3.0828 LearningRate 0.0000 Epoch: 32 Global Step: 673160 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:58:04,614-Speed 6299.24 samples/sec Loss 3.0253 LearningRate 0.0000 Epoch: 32 Global Step: 673170 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:58:07,863-Speed 6305.27 samples/sec Loss 3.0759 LearningRate 0.0000 Epoch: 32 Global Step: 673180 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 04:58:11,110-Speed 6309.37 samples/sec Loss 3.0640 LearningRate 0.0000 Epoch: 32 Global Step: 673190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:14,359-Speed 6304.53 samples/sec Loss 3.0620 LearningRate 0.0000 Epoch: 32 Global Step: 673200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:17,612-Speed 6297.30 samples/sec Loss 3.1326 LearningRate 0.0000 Epoch: 32 Global Step: 673210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:20,862-Speed 6302.87 samples/sec Loss 3.1023 LearningRate 0.0000 Epoch: 32 Global Step: 673220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:24,120-Speed 6288.38 samples/sec Loss 3.1198 LearningRate 0.0000 Epoch: 32 Global Step: 673230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:27,377-Speed 6288.31 samples/sec Loss 3.0929 LearningRate 0.0000 Epoch: 32 Global Step: 673240 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 04:58:30,626-Speed 6305.95 samples/sec Loss 3.0851 LearningRate 0.0000 Epoch: 32 Global Step: 673250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:33,882-Speed 6290.78 samples/sec Loss 3.0936 LearningRate 0.0000 Epoch: 32 Global Step: 673260 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:37,152-Speed 6264.75 samples/sec Loss 3.0569 LearningRate 0.0000 Epoch: 32 Global Step: 673270 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:40,399-Speed 6308.75 samples/sec Loss 3.0763 LearningRate 0.0000 Epoch: 32 Global Step: 673280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:43,643-Speed 6315.63 samples/sec Loss 3.0846 LearningRate 0.0000 Epoch: 32 Global Step: 673290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:46,934-Speed 6224.32 samples/sec Loss 3.1136 LearningRate 0.0000 Epoch: 32 Global Step: 673300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:50,216-Speed 6241.60 samples/sec Loss 3.0894 LearningRate 0.0000 Epoch: 32 Global Step: 673310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:53,460-Speed 6314.24 samples/sec Loss 3.0978 LearningRate 0.0000 Epoch: 32 Global Step: 673320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:56,709-Speed 6305.18 samples/sec Loss 3.0607 LearningRate 0.0000 Epoch: 32 Global Step: 673330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:58:59,959-Speed 6302.62 samples/sec Loss 3.1433 LearningRate 0.0000 Epoch: 32 Global Step: 673340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:03,211-Speed 6298.38 samples/sec Loss 3.0451 LearningRate 0.0000 Epoch: 32 Global Step: 673350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:06,457-Speed 6311.20 samples/sec Loss 3.0412 LearningRate 0.0000 Epoch: 32 Global Step: 673360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
04:59:09,705-Speed 6307.68 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 32 Global Step: 673370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:12,954-Speed 6303.52 samples/sec Loss 3.0762 LearningRate 0.0000 Epoch: 32 Global Step: 673380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:16,185-Speed 6340.22 samples/sec Loss 3.1024 LearningRate 0.0000 Epoch: 32 Global Step: 673390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:19,441-Speed 6291.96 samples/sec Loss 3.0838 LearningRate 0.0000 Epoch: 32 Global Step: 673400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:22,689-Speed 6305.73 samples/sec Loss 3.0263 LearningRate 0.0000 Epoch: 32 Global Step: 673410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:25,938-Speed 6306.50 samples/sec Loss 3.0827 LearningRate 0.0000 Epoch: 32 Global Step: 673420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:29,189-Speed 6300.32 samples/sec Loss 3.1380 LearningRate 0.0000 Epoch: 32 Global Step: 673430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:32,435-Speed 6310.04 samples/sec Loss 3.0932 LearningRate 0.0000 Epoch: 32 Global Step: 673440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:35,685-Speed 6302.63 samples/sec Loss 3.0897 LearningRate 0.0000 Epoch: 32 Global Step: 673450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:38,943-Speed 6288.00 samples/sec Loss 3.0352 LearningRate 0.0000 Epoch: 32 Global Step: 673460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:42,192-Speed 6303.85 samples/sec Loss 3.0410 LearningRate 0.0000 Epoch: 32 Global Step: 673470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:45,437-Speed 6313.17 samples/sec Loss 3.0656 LearningRate 0.0000 Epoch: 32 Global Step: 673480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:48,668-Speed 6339.96 samples/sec Loss 
3.1085 LearningRate 0.0000 Epoch: 32 Global Step: 673490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:51,918-Speed 6304.83 samples/sec Loss 3.0580 LearningRate 0.0000 Epoch: 32 Global Step: 673500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:55,161-Speed 6315.66 samples/sec Loss 3.1054 LearningRate 0.0000 Epoch: 32 Global Step: 673510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 04:59:58,406-Speed 6313.02 samples/sec Loss 3.1273 LearningRate 0.0000 Epoch: 32 Global Step: 673520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:01,655-Speed 6305.98 samples/sec Loss 3.0741 LearningRate 0.0000 Epoch: 32 Global Step: 673530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:04,901-Speed 6310.09 samples/sec Loss 3.0980 LearningRate 0.0000 Epoch: 32 Global Step: 673540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:08,148-Speed 6309.14 samples/sec Loss 3.0579 LearningRate 0.0000 Epoch: 32 Global Step: 673550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:11,397-Speed 6304.61 samples/sec Loss 3.0737 LearningRate 0.0000 Epoch: 32 Global Step: 673560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:14,645-Speed 6305.39 samples/sec Loss 3.0593 LearningRate 0.0000 Epoch: 32 Global Step: 673570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:17,896-Speed 6302.50 samples/sec Loss 3.1062 LearningRate 0.0000 Epoch: 32 Global Step: 673580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:21,134-Speed 6326.52 samples/sec Loss 3.0549 LearningRate 0.0000 Epoch: 32 Global Step: 673590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:24,381-Speed 6307.60 samples/sec Loss 3.0867 LearningRate 0.0000 Epoch: 32 Global Step: 673600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:27,630-Speed 6305.88 samples/sec Loss 3.1053 LearningRate 0.0000 Epoch: 32 Global 
Step: 673610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:00:30,866-Speed 6330.43 samples/sec Loss 3.1295 LearningRate 0.0000 Epoch: 32 Global Step: 673620 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:34,115-Speed 6303.08 samples/sec Loss 3.1169 LearningRate 0.0000 Epoch: 32 Global Step: 673630 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:37,365-Speed 6304.89 samples/sec Loss 3.0973 LearningRate 0.0000 Epoch: 32 Global Step: 673640 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:40,611-Speed 6309.94 samples/sec Loss 3.0712 LearningRate 0.0000 Epoch: 32 Global Step: 673650 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:43,858-Speed 6309.29 samples/sec Loss 3.0733 LearningRate 0.0000 Epoch: 32 Global Step: 673660 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:47,108-Speed 6303.09 samples/sec Loss 3.0681 LearningRate 0.0000 Epoch: 32 Global Step: 673670 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:50,356-Speed 6305.46 samples/sec Loss 3.0809 LearningRate 0.0000 Epoch: 32 Global Step: 673680 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:53,608-Speed 6299.83 samples/sec Loss 3.0594 LearningRate 0.0000 Epoch: 32 Global Step: 673690 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:00:56,852-Speed 6314.41 samples/sec Loss 3.0724 LearningRate 0.0000 Epoch: 32 Global Step: 673700 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:00,104-Speed 6299.82 samples/sec Loss 3.0977 LearningRate 0.0000 Epoch: 32 Global Step: 673710 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:03,336-Speed 6336.23 samples/sec Loss 3.0631 LearningRate 0.0000 Epoch: 32 Global Step: 673720 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:06,585-Speed 6306.70 samples/sec Loss 3.0818 LearningRate 0.0000 Epoch: 32 Global Step: 673730 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:01:09,840-Speed 6293.56 samples/sec Loss 3.0483 LearningRate 0.0000 Epoch: 32 Global Step: 673740 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:13,087-Speed 6308.34 samples/sec Loss 3.0857 LearningRate 0.0000 Epoch: 32 Global Step: 673750 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:16,334-Speed 6308.68 samples/sec Loss 3.0745 LearningRate 0.0000 Epoch: 32 Global Step: 673760 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:19,580-Speed 6310.87 samples/sec Loss 3.0709 LearningRate 0.0000 Epoch: 32 Global Step: 673770 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:22,829-Speed 6304.67 samples/sec Loss 3.0806 LearningRate 0.0000 Epoch: 32 Global Step: 673780 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:26,076-Speed 6309.84 samples/sec Loss 3.0592 LearningRate 0.0000 Epoch: 32 Global Step: 673790 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:29,321-Speed 6312.38 samples/sec Loss 3.0783 LearningRate 0.0000 Epoch: 32 Global Step: 673800 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:32,568-Speed 6307.60 samples/sec Loss 3.0404 LearningRate 0.0000 Epoch: 32 Global Step: 673810 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:35,815-Speed 6309.49 samples/sec Loss 3.0340 LearningRate 0.0000 Epoch: 32 Global Step: 673820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:01:39,065-Speed 6303.14 samples/sec Loss 3.1125 LearningRate 0.0000 Epoch: 32 Global Step: 673830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:01:42,312-Speed 6308.05 samples/sec Loss 3.1294 LearningRate 0.0000 Epoch: 32 Global Step: 673840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:01:45,545-Speed 6336.03 samples/sec Loss 3.1354 LearningRate 0.0000 Epoch: 32 Global Step: 673850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:01:48,796-Speed 6301.46 samples/sec Loss 3.0822 LearningRate 0.0000 Epoch: 32 Global Step: 673860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:52,046-Speed 6303.27 samples/sec Loss 3.0890 LearningRate 0.0000 Epoch: 32 Global Step: 673870 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:55,299-Speed 6297.97 samples/sec Loss 3.0294 LearningRate 0.0000 Epoch: 32 Global Step: 673880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:01:58,552-Speed 6296.40 samples/sec Loss 3.0428 LearningRate 0.0000 Epoch: 32 Global Step: 673890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:01,806-Speed 6295.25 samples/sec Loss 3.0837 LearningRate 0.0000 Epoch: 32 Global Step: 673900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:05,064-Speed 6285.95 samples/sec Loss 3.0730 LearningRate 0.0000 Epoch: 32 Global Step: 673910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:08,317-Speed 6298.71 samples/sec Loss 3.1251 LearningRate 0.0000 Epoch: 32 Global Step: 673920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:11,569-Speed 6299.12 samples/sec Loss 3.0691 LearningRate 0.0000 Epoch: 32 Global Step: 673930 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:14,811-Speed 6316.85 samples/sec Loss 3.0148 LearningRate 0.0000 Epoch: 32 Global Step: 673940 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:02:18,059-Speed 6307.48 samples/sec Loss 3.1090 LearningRate 0.0000 Epoch: 32 Global Step: 673950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:21,315-Speed 6291.28 samples/sec Loss 3.0849 LearningRate 0.0000 Epoch: 32 Global Step: 673960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:24,559-Speed 6316.83 samples/sec Loss 3.1282 LearningRate 0.0000 Epoch: 32 Global Step: 673970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:27,807-Speed 6305.30 samples/sec Loss 
3.0332 LearningRate 0.0000 Epoch: 32 Global Step: 673980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:31,054-Speed 6309.22 samples/sec Loss 3.0496 LearningRate 0.0000 Epoch: 32 Global Step: 673990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:34,300-Speed 6311.34 samples/sec Loss 3.1670 LearningRate 0.0000 Epoch: 32 Global Step: 674000 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:37,550-Speed 6303.46 samples/sec Loss 3.0785 LearningRate 0.0000 Epoch: 32 Global Step: 674010 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:40,798-Speed 6305.70 samples/sec Loss 3.0823 LearningRate 0.0000 Epoch: 32 Global Step: 674020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:44,057-Speed 6286.69 samples/sec Loss 3.0901 LearningRate 0.0000 Epoch: 32 Global Step: 674030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:47,320-Speed 6277.09 samples/sec Loss 3.0587 LearningRate 0.0000 Epoch: 32 Global Step: 674040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:50,561-Speed 6321.37 samples/sec Loss 3.0792 LearningRate 0.0000 Epoch: 32 Global Step: 674050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:53,812-Speed 6299.80 samples/sec Loss 3.0407 LearningRate 0.0000 Epoch: 32 Global Step: 674060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:02:57,065-Speed 6297.94 samples/sec Loss 3.1238 LearningRate 0.0000 Epoch: 32 Global Step: 674070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:00,329-Speed 6274.90 samples/sec Loss 3.0826 LearningRate 0.0000 Epoch: 32 Global Step: 674080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:03,583-Speed 6295.59 samples/sec Loss 3.0819 LearningRate 0.0000 Epoch: 32 Global Step: 674090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:06,832-Speed 6304.43 samples/sec Loss 3.0493 LearningRate 0.0000 Epoch: 32 Global 
Step: 674100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:10,079-Speed 6308.44 samples/sec Loss 3.1034 LearningRate 0.0000 Epoch: 32 Global Step: 674110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:13,325-Speed 6311.23 samples/sec Loss 3.1310 LearningRate 0.0000 Epoch: 32 Global Step: 674120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:16,575-Speed 6303.43 samples/sec Loss 3.0623 LearningRate 0.0000 Epoch: 32 Global Step: 674130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:19,826-Speed 6300.22 samples/sec Loss 3.0333 LearningRate 0.0000 Epoch: 32 Global Step: 674140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:23,081-Speed 6292.78 samples/sec Loss 3.0713 LearningRate 0.0000 Epoch: 32 Global Step: 674150 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-03 05:03:26,316-Speed 6332.91 samples/sec Loss 3.1161 LearningRate 0.0000 Epoch: 32 Global Step: 674160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:29,562-Speed 6310.19 samples/sec Loss 3.0525 LearningRate 0.0000 Epoch: 32 Global Step: 674170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:32,822-Speed 6284.14 samples/sec Loss 3.0879 LearningRate 0.0000 Epoch: 32 Global Step: 674180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:36,074-Speed 6300.80 samples/sec Loss 3.1020 LearningRate 0.0000 Epoch: 32 Global Step: 674190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:39,318-Speed 6314.60 samples/sec Loss 3.0767 LearningRate 0.0000 Epoch: 32 Global Step: 674200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:42,565-Speed 6307.52 samples/sec Loss 3.0568 LearningRate 0.0000 Epoch: 32 Global Step: 674210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:45,820-Speed 6294.59 samples/sec Loss 3.0725 LearningRate 0.0000 Epoch: 32 Global Step: 674220 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:03:49,066-Speed 6309.42 samples/sec Loss 3.0306 LearningRate 0.0000 Epoch: 32 Global Step: 674230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:03:52,304-Speed 6327.03 samples/sec Loss 3.1116 LearningRate 0.0000 Epoch: 32 Global Step: 674240 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:03:55,555-Speed 6301.14 samples/sec Loss 3.0045 LearningRate 0.0000 Epoch: 32 Global Step: 674250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:03:58,800-Speed 6312.12 samples/sec Loss 3.0447 LearningRate 0.0000 Epoch: 32 Global Step: 674260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:02,047-Speed 6310.25 samples/sec Loss 3.0458 LearningRate 0.0000 Epoch: 32 Global Step: 674270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:05,288-Speed 6319.62 samples/sec Loss 3.0672 LearningRate 0.0000 Epoch: 32 Global Step: 674280 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:08,540-Speed 6298.64 samples/sec Loss 3.0811 LearningRate 0.0000 Epoch: 32 Global Step: 674290 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:11,790-Speed 6303.49 samples/sec Loss 3.0560 LearningRate 0.0000 Epoch: 32 Global Step: 674300 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:15,035-Speed 6311.84 samples/sec Loss 3.0576 LearningRate 0.0000 Epoch: 32 Global Step: 674310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:18,288-Speed 6296.59 samples/sec Loss 3.0906 LearningRate 0.0000 Epoch: 32 Global Step: 674320 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:21,540-Speed 6299.89 samples/sec Loss 3.1109 LearningRate 0.0000 Epoch: 32 Global Step: 674330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:04:24,783-Speed 6316.47 samples/sec Loss 3.0860 LearningRate 0.0000 Epoch: 32 Global Step: 674340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:04:28,035-Speed 6299.06 samples/sec Loss 3.0435 LearningRate 0.0000 Epoch: 32 Global Step: 674350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:31,280-Speed 6312.32 samples/sec Loss 3.0901 LearningRate 0.0000 Epoch: 32 Global Step: 674360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:34,532-Speed 6300.55 samples/sec Loss 3.0662 LearningRate 0.0000 Epoch: 32 Global Step: 674370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:37,799-Speed 6270.03 samples/sec Loss 3.0415 LearningRate 0.0000 Epoch: 32 Global Step: 674380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:41,053-Speed 6293.66 samples/sec Loss 3.0551 LearningRate 0.0000 Epoch: 32 Global Step: 674390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:44,298-Speed 6312.58 samples/sec Loss 3.0641 LearningRate 0.0000 Epoch: 32 Global Step: 674400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:47,546-Speed 6308.18 samples/sec Loss 3.0671 LearningRate 0.0000 Epoch: 32 Global Step: 674410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:50,794-Speed 6306.07 samples/sec Loss 3.0361 LearningRate 0.0000 Epoch: 32 Global Step: 674420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:54,041-Speed 6309.49 samples/sec Loss 3.0519 LearningRate 0.0000 Epoch: 32 Global Step: 674430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:04:57,278-Speed 6327.63 samples/sec Loss 3.0729 LearningRate 0.0000 Epoch: 32 Global Step: 674440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:00,520-Speed 6319.30 samples/sec Loss 3.0616 LearningRate 0.0000 Epoch: 32 Global Step: 674450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:03,771-Speed 6301.32 samples/sec Loss 3.1092 LearningRate 0.0000 Epoch: 32 Global Step: 674460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:07,027-Speed 6291.85 samples/sec Loss 
3.1001 LearningRate 0.0000 Epoch: 32 Global Step: 674470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:10,281-Speed 6293.94 samples/sec Loss 3.0652 LearningRate 0.0000 Epoch: 32 Global Step: 674480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:13,531-Speed 6303.82 samples/sec Loss 3.0836 LearningRate 0.0000 Epoch: 32 Global Step: 674490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:16,784-Speed 6295.89 samples/sec Loss 3.1156 LearningRate 0.0000 Epoch: 32 Global Step: 674500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:20,040-Speed 6291.19 samples/sec Loss 3.0723 LearningRate 0.0000 Epoch: 32 Global Step: 674510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:23,292-Speed 6300.26 samples/sec Loss 3.0538 LearningRate 0.0000 Epoch: 32 Global Step: 674520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:26,550-Speed 6287.87 samples/sec Loss 3.0767 LearningRate 0.0000 Epoch: 32 Global Step: 674530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:29,787-Speed 6327.31 samples/sec Loss 3.0128 LearningRate 0.0000 Epoch: 32 Global Step: 674540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:33,036-Speed 6304.92 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 674550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:36,285-Speed 6305.20 samples/sec Loss 3.0815 LearningRate 0.0000 Epoch: 32 Global Step: 674560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:39,531-Speed 6309.51 samples/sec Loss 3.0685 LearningRate 0.0000 Epoch: 32 Global Step: 674570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:42,781-Speed 6304.38 samples/sec Loss 3.0890 LearningRate 0.0000 Epoch: 32 Global Step: 674580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:46,035-Speed 6295.32 samples/sec Loss 3.0919 LearningRate 0.0000 Epoch: 32 Global 
Step: 674590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:49,283-Speed 6306.61 samples/sec Loss 3.1038 LearningRate 0.0000 Epoch: 32 Global Step: 674600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:52,528-Speed 6312.24 samples/sec Loss 3.0534 LearningRate 0.0000 Epoch: 32 Global Step: 674610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:55,781-Speed 6296.55 samples/sec Loss 3.0248 LearningRate 0.0000 Epoch: 32 Global Step: 674620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:05:59,029-Speed 6308.58 samples/sec Loss 3.0874 LearningRate 0.0000 Epoch: 32 Global Step: 674630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:02,261-Speed 6336.98 samples/sec Loss 3.0460 LearningRate 0.0000 Epoch: 32 Global Step: 674640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:05,523-Speed 6279.97 samples/sec Loss 3.1224 LearningRate 0.0000 Epoch: 32 Global Step: 674650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:08,773-Speed 6303.17 samples/sec Loss 3.0658 LearningRate 0.0000 Epoch: 32 Global Step: 674660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:12,035-Speed 6280.88 samples/sec Loss 3.0786 LearningRate 0.0000 Epoch: 32 Global Step: 674670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:15,288-Speed 6296.26 samples/sec Loss 3.0610 LearningRate 0.0000 Epoch: 32 Global Step: 674680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:18,535-Speed 6308.18 samples/sec Loss 3.0943 LearningRate 0.0000 Epoch: 32 Global Step: 674690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:21,785-Speed 6303.38 samples/sec Loss 3.1408 LearningRate 0.0000 Epoch: 32 Global Step: 674700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:25,035-Speed 6302.66 samples/sec Loss 3.0775 LearningRate 0.0000 Epoch: 32 Global Step: 674710 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:06:28,284-Speed 6304.97 samples/sec Loss 3.0887 LearningRate 0.0000 Epoch: 32 Global Step: 674720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:31,542-Speed 6287.80 samples/sec Loss 3.0904 LearningRate 0.0000 Epoch: 32 Global Step: 674730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:34,781-Speed 6325.39 samples/sec Loss 3.0378 LearningRate 0.0000 Epoch: 32 Global Step: 674740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:38,028-Speed 6307.36 samples/sec Loss 3.0372 LearningRate 0.0000 Epoch: 32 Global Step: 674750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:41,279-Speed 6301.00 samples/sec Loss 3.1328 LearningRate 0.0000 Epoch: 32 Global Step: 674760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:44,537-Speed 6288.70 samples/sec Loss 3.0992 LearningRate 0.0000 Epoch: 32 Global Step: 674770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:47,780-Speed 6315.68 samples/sec Loss 3.0708 LearningRate 0.0000 Epoch: 32 Global Step: 674780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:51,038-Speed 6288.25 samples/sec Loss 3.0567 LearningRate 0.0000 Epoch: 32 Global Step: 674790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:54,287-Speed 6305.14 samples/sec Loss 3.0910 LearningRate 0.0000 Epoch: 32 Global Step: 674800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:06:57,531-Speed 6313.45 samples/sec Loss 3.0524 LearningRate 0.0000 Epoch: 32 Global Step: 674810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:00,769-Speed 6325.83 samples/sec Loss 3.1133 LearningRate 0.0000 Epoch: 32 Global Step: 674820 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:04,023-Speed 6296.55 samples/sec Loss 3.1257 LearningRate 0.0000 Epoch: 32 Global Step: 674830 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:07:07,273-Speed 6302.92 samples/sec Loss 3.1024 LearningRate 0.0000 Epoch: 32 Global Step: 674840 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:10,518-Speed 6310.93 samples/sec Loss 3.0571 LearningRate 0.0000 Epoch: 32 Global Step: 674850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:13,761-Speed 6317.53 samples/sec Loss 3.0810 LearningRate 0.0000 Epoch: 32 Global Step: 674860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:17,008-Speed 6309.05 samples/sec Loss 3.0592 LearningRate 0.0000 Epoch: 32 Global Step: 674870 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:20,257-Speed 6305.06 samples/sec Loss 3.0616 LearningRate 0.0000 Epoch: 32 Global Step: 674880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:23,506-Speed 6304.75 samples/sec Loss 3.0760 LearningRate 0.0000 Epoch: 32 Global Step: 674890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:26,756-Speed 6304.03 samples/sec Loss 3.0628 LearningRate 0.0000 Epoch: 32 Global Step: 674900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:30,000-Speed 6315.50 samples/sec Loss 3.0772 LearningRate 0.0000 Epoch: 32 Global Step: 674910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:07:33,249-Speed 6303.16 samples/sec Loss 3.0767 LearningRate 0.0000 Epoch: 32 Global Step: 674920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:36,496-Speed 6309.05 samples/sec Loss 3.0859 LearningRate 0.0000 Epoch: 32 Global Step: 674930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:39,740-Speed 6314.48 samples/sec Loss 3.0463 LearningRate 0.0000 Epoch: 32 Global Step: 674940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:42,989-Speed 6304.90 samples/sec Loss 3.1124 LearningRate 0.0000 Epoch: 32 Global Step: 674950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:46,240-Speed 6300.97 samples/sec Loss 
3.0855 LearningRate 0.0000 Epoch: 32 Global Step: 674960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:49,485-Speed 6312.85 samples/sec Loss 3.1619 LearningRate 0.0000 Epoch: 32 Global Step: 674970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:52,734-Speed 6305.40 samples/sec Loss 3.1016 LearningRate 0.0000 Epoch: 32 Global Step: 674980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:55,983-Speed 6304.54 samples/sec Loss 3.0783 LearningRate 0.0000 Epoch: 32 Global Step: 674990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:07:59,232-Speed 6305.22 samples/sec Loss 3.0709 LearningRate 0.0000 Epoch: 32 Global Step: 675000 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:02,497-Speed 6273.48 samples/sec Loss 3.0429 LearningRate 0.0000 Epoch: 32 Global Step: 675010 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:05,728-Speed 6339.28 samples/sec Loss 3.0423 LearningRate 0.0000 Epoch: 32 Global Step: 675020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:08,977-Speed 6306.25 samples/sec Loss 3.0478 LearningRate 0.0000 Epoch: 32 Global Step: 675030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:12,230-Speed 6295.60 samples/sec Loss 3.0268 LearningRate 0.0000 Epoch: 32 Global Step: 675040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:15,478-Speed 6307.38 samples/sec Loss 3.0255 LearningRate 0.0000 Epoch: 32 Global Step: 675050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:18,733-Speed 6293.64 samples/sec Loss 3.0934 LearningRate 0.0000 Epoch: 32 Global Step: 675060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:21,981-Speed 6306.29 samples/sec Loss 3.0392 LearningRate 0.0000 Epoch: 32 Global Step: 675070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:25,226-Speed 6313.85 samples/sec Loss 3.1130 LearningRate 0.0000 Epoch: 32 Global 
Step: 675080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:28,477-Speed 6300.92 samples/sec Loss 2.9956 LearningRate 0.0000 Epoch: 32 Global Step: 675090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:31,720-Speed 6317.44 samples/sec Loss 3.0457 LearningRate 0.0000 Epoch: 32 Global Step: 675100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:34,973-Speed 6298.76 samples/sec Loss 3.0722 LearningRate 0.0000 Epoch: 32 Global Step: 675110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:38,205-Speed 6336.50 samples/sec Loss 3.1032 LearningRate 0.0000 Epoch: 32 Global Step: 675120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:41,453-Speed 6306.52 samples/sec Loss 3.0444 LearningRate 0.0000 Epoch: 32 Global Step: 675130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:44,701-Speed 6308.90 samples/sec Loss 3.0609 LearningRate 0.0000 Epoch: 32 Global Step: 675140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:47,954-Speed 6297.22 samples/sec Loss 3.0449 LearningRate 0.0000 Epoch: 32 Global Step: 675150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:51,198-Speed 6314.18 samples/sec Loss 3.0641 LearningRate 0.0000 Epoch: 32 Global Step: 675160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:54,439-Speed 6320.38 samples/sec Loss 3.0838 LearningRate 0.0000 Epoch: 32 Global Step: 675170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:08:57,689-Speed 6302.02 samples/sec Loss 3.0824 LearningRate 0.0000 Epoch: 32 Global Step: 675180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:00,935-Speed 6311.35 samples/sec Loss 3.0282 LearningRate 0.0000 Epoch: 32 Global Step: 675190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:04,183-Speed 6307.05 samples/sec Loss 3.0867 LearningRate 0.0000 Epoch: 32 Global Step: 675200 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:09:07,430-Speed 6308.46 samples/sec Loss 3.0645 LearningRate 0.0000 Epoch: 32 Global Step: 675210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:10,663-Speed 6336.18 samples/sec Loss 3.0877 LearningRate 0.0000 Epoch: 32 Global Step: 675220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:13,908-Speed 6312.26 samples/sec Loss 3.0457 LearningRate 0.0000 Epoch: 32 Global Step: 675230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:17,156-Speed 6306.63 samples/sec Loss 3.0868 LearningRate 0.0000 Epoch: 32 Global Step: 675240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:20,392-Speed 6330.66 samples/sec Loss 3.0452 LearningRate 0.0000 Epoch: 32 Global Step: 675250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:23,652-Speed 6283.20 samples/sec Loss 3.1118 LearningRate 0.0000 Epoch: 32 Global Step: 675260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:26,899-Speed 6308.83 samples/sec Loss 3.1036 LearningRate 0.0000 Epoch: 32 Global Step: 675270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:30,154-Speed 6294.22 samples/sec Loss 3.0991 LearningRate 0.0000 Epoch: 32 Global Step: 675280 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:33,404-Speed 6302.94 samples/sec Loss 3.0951 LearningRate 0.0000 Epoch: 32 Global Step: 675290 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:36,654-Speed 6302.94 samples/sec Loss 3.1270 LearningRate 0.0000 Epoch: 32 Global Step: 675300 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:39,895-Speed 6319.23 samples/sec Loss 3.0185 LearningRate 0.0000 Epoch: 32 Global Step: 675310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:43,149-Speed 6296.48 samples/sec Loss 3.1313 LearningRate 0.0000 Epoch: 32 Global Step: 675320 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:09:46,398-Speed 6306.44 samples/sec Loss 3.0905 LearningRate 0.0000 Epoch: 32 Global Step: 675330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:49,647-Speed 6303.71 samples/sec Loss 3.0350 LearningRate 0.0000 Epoch: 32 Global Step: 675340 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:09:52,897-Speed 6303.17 samples/sec Loss 3.0692 LearningRate 0.0000 Epoch: 32 Global Step: 675350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:56,141-Speed 6314.27 samples/sec Loss 3.0177 LearningRate 0.0000 Epoch: 32 Global Step: 675360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:09:59,382-Speed 6321.49 samples/sec Loss 3.0403 LearningRate 0.0000 Epoch: 32 Global Step: 675370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:02,632-Speed 6303.57 samples/sec Loss 3.0589 LearningRate 0.0000 Epoch: 32 Global Step: 675380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:05,880-Speed 6306.43 samples/sec Loss 3.0864 LearningRate 0.0000 Epoch: 32 Global Step: 675390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:09,128-Speed 6306.27 samples/sec Loss 3.0779 LearningRate 0.0000 Epoch: 32 Global Step: 675400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:12,381-Speed 6297.65 samples/sec Loss 3.0695 LearningRate 0.0000 Epoch: 32 Global Step: 675410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:15,626-Speed 6312.12 samples/sec Loss 3.0787 LearningRate 0.0000 Epoch: 32 Global Step: 675420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:18,875-Speed 6304.15 samples/sec Loss 3.0544 LearningRate 0.0000 Epoch: 32 Global Step: 675430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:22,125-Speed 6302.32 samples/sec Loss 3.0747 LearningRate 0.0000 Epoch: 32 Global Step: 675440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:25,361-Speed 6332.18 samples/sec Loss 
3.0689 LearningRate 0.0000 Epoch: 32 Global Step: 675450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:10:28,613-Speed 6297.32 samples/sec Loss 3.0558 LearningRate 0.0000 Epoch: 32 Global Step: 675460 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:31,868-Speed 6294.14 samples/sec Loss 3.0574 LearningRate 0.0000 Epoch: 32 Global Step: 675470 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:35,111-Speed 6315.77 samples/sec Loss 3.0548 LearningRate 0.0000 Epoch: 32 Global Step: 675480 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:38,359-Speed 6308.22 samples/sec Loss 3.0256 LearningRate 0.0000 Epoch: 32 Global Step: 675490 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:41,613-Speed 6294.76 samples/sec Loss 3.1187 LearningRate 0.0000 Epoch: 32 Global Step: 675500 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:44,871-Speed 6287.65 samples/sec Loss 3.0650 LearningRate 0.0000 Epoch: 32 Global Step: 675510 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:48,119-Speed 6306.14 samples/sec Loss 3.0727 LearningRate 0.0000 Epoch: 32 Global Step: 675520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:51,370-Speed 6302.13 samples/sec Loss 3.0950 LearningRate 0.0000 Epoch: 32 Global Step: 675530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:54,618-Speed 6306.56 samples/sec Loss 3.0758 LearningRate 0.0000 Epoch: 32 Global Step: 675540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:10:57,877-Speed 6284.48 samples/sec Loss 3.0417 LearningRate 0.0000 Epoch: 32 Global Step: 675550 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:01,209-Speed 6150.01 samples/sec Loss 3.0561 LearningRate 0.0000 Epoch: 32 Global Step: 675560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:04,500-Speed 6223.11 samples/sec Loss 3.0668 LearningRate 0.0000 Epoch: 32 Global 
Step: 675570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:07,744-Speed 6315.29 samples/sec Loss 3.0085 LearningRate 0.0000 Epoch: 32 Global Step: 675580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:10,988-Speed 6314.25 samples/sec Loss 2.9786 LearningRate 0.0000 Epoch: 32 Global Step: 675590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:14,234-Speed 6310.50 samples/sec Loss 3.0825 LearningRate 0.0000 Epoch: 32 Global Step: 675600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:17,484-Speed 6303.62 samples/sec Loss 3.1164 LearningRate 0.0000 Epoch: 32 Global Step: 675610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:20,740-Speed 6290.19 samples/sec Loss 3.0822 LearningRate 0.0000 Epoch: 32 Global Step: 675620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:23,989-Speed 6305.31 samples/sec Loss 3.0699 LearningRate 0.0000 Epoch: 32 Global Step: 675630 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:27,240-Speed 6300.12 samples/sec Loss 3.0961 LearningRate 0.0000 Epoch: 32 Global Step: 675640 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:30,490-Speed 6304.07 samples/sec Loss 3.0757 LearningRate 0.0000 Epoch: 32 Global Step: 675650 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:33,744-Speed 6295.77 samples/sec Loss 3.0748 LearningRate 0.0000 Epoch: 32 Global Step: 675660 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:37,004-Speed 6283.45 samples/sec Loss 3.0440 LearningRate 0.0000 Epoch: 32 Global Step: 675670 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:40,255-Speed 6300.97 samples/sec Loss 3.0551 LearningRate 0.0000 Epoch: 32 Global Step: 675680 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:43,503-Speed 6305.47 samples/sec Loss 3.0065 LearningRate 0.0000 Epoch: 32 Global Step: 675690 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:11:46,754-Speed 6301.02 samples/sec Loss 3.0548 LearningRate 0.0000 Epoch: 32 Global Step: 675700 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:50,001-Speed 6309.02 samples/sec Loss 3.0549 LearningRate 0.0000 Epoch: 32 Global Step: 675710 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:53,246-Speed 6312.61 samples/sec Loss 3.0837 LearningRate 0.0000 Epoch: 32 Global Step: 675720 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:11:56,494-Speed 6307.83 samples/sec Loss 2.9841 LearningRate 0.0000 Epoch: 32 Global Step: 675730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:11:59,729-Speed 6331.76 samples/sec Loss 3.0955 LearningRate 0.0000 Epoch: 32 Global Step: 675740 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:02,979-Speed 6303.21 samples/sec Loss 3.0520 LearningRate 0.0000 Epoch: 32 Global Step: 675750 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:06,226-Speed 6309.43 samples/sec Loss 3.0655 LearningRate 0.0000 Epoch: 32 Global Step: 675760 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:09,471-Speed 6311.83 samples/sec Loss 3.0154 LearningRate 0.0000 Epoch: 32 Global Step: 675770 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:12,718-Speed 6308.59 samples/sec Loss 3.1177 LearningRate 0.0000 Epoch: 32 Global Step: 675780 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:15,992-Speed 6257.71 samples/sec Loss 3.0937 LearningRate 0.0000 Epoch: 32 Global Step: 675790 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:19,244-Speed 6299.50 samples/sec Loss 3.0441 LearningRate 0.0000 Epoch: 32 Global Step: 675800 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:22,488-Speed 6313.36 samples/sec Loss 3.0875 LearningRate 0.0000 Epoch: 32 Global Step: 675810 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:12:25,734-Speed 6311.42 samples/sec Loss 3.0103 LearningRate 0.0000 Epoch: 32 Global Step: 675820 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:28,985-Speed 6300.85 samples/sec Loss 3.0684 LearningRate 0.0000 Epoch: 32 Global Step: 675830 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:12:32,238-Speed 6297.17 samples/sec Loss 3.0605 LearningRate 0.0000 Epoch: 32 Global Step: 675840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:35,486-Speed 6307.53 samples/sec Loss 3.0520 LearningRate 0.0000 Epoch: 32 Global Step: 675850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:38,737-Speed 6300.31 samples/sec Loss 3.0406 LearningRate 0.0000 Epoch: 32 Global Step: 675860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:41,980-Speed 6317.29 samples/sec Loss 3.0292 LearningRate 0.0000 Epoch: 32 Global Step: 675870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:45,227-Speed 6307.07 samples/sec Loss 3.0586 LearningRate 0.0000 Epoch: 32 Global Step: 675880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:48,474-Speed 6310.25 samples/sec Loss 3.0389 LearningRate 0.0000 Epoch: 32 Global Step: 675890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:51,732-Speed 6286.22 samples/sec Loss 3.0796 LearningRate 0.0000 Epoch: 32 Global Step: 675900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:54,976-Speed 6314.98 samples/sec Loss 3.0710 LearningRate 0.0000 Epoch: 32 Global Step: 675910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:12:58,222-Speed 6311.14 samples/sec Loss 3.0909 LearningRate 0.0000 Epoch: 32 Global Step: 675920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:01,481-Speed 6285.99 samples/sec Loss 3.1080 LearningRate 0.0000 Epoch: 32 Global Step: 675930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:04,712-Speed 6339.43 samples/sec Loss 
3.1170 LearningRate 0.0000 Epoch: 32 Global Step: 675940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:07,958-Speed 6309.81 samples/sec Loss 3.0430 LearningRate 0.0000 Epoch: 32 Global Step: 675950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:11,205-Speed 6308.83 samples/sec Loss 3.0186 LearningRate 0.0000 Epoch: 32 Global Step: 675960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:14,450-Speed 6313.98 samples/sec Loss 3.0464 LearningRate 0.0000 Epoch: 32 Global Step: 675970 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:17,697-Speed 6309.69 samples/sec Loss 3.0614 LearningRate 0.0000 Epoch: 32 Global Step: 675980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:20,953-Speed 6290.70 samples/sec Loss 3.0286 LearningRate 0.0000 Epoch: 32 Global Step: 675990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:24,189-Speed 6330.50 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 32 Global Step: 676000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:27,435-Speed 6309.75 samples/sec Loss 3.0413 LearningRate 0.0000 Epoch: 32 Global Step: 676010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:30,687-Speed 6300.65 samples/sec Loss 3.0247 LearningRate 0.0000 Epoch: 32 Global Step: 676020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:33,932-Speed 6311.19 samples/sec Loss 3.0607 LearningRate 0.0000 Epoch: 32 Global Step: 676030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:37,187-Speed 6294.02 samples/sec Loss 3.1033 LearningRate 0.0000 Epoch: 32 Global Step: 676040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:40,446-Speed 6286.24 samples/sec Loss 3.0619 LearningRate 0.0000 Epoch: 32 Global Step: 676050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:43,695-Speed 6303.85 samples/sec Loss 3.0636 LearningRate 0.0000 Epoch: 32 Global 
Step: 676060 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:46,947-Speed 6299.91 samples/sec Loss 3.0405 LearningRate 0.0000 Epoch: 32 Global Step: 676070 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:50,196-Speed 6303.50 samples/sec Loss 3.0337 LearningRate 0.0000 Epoch: 32 Global Step: 676080 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:53,443-Speed 6309.27 samples/sec Loss 3.0911 LearningRate 0.0000 Epoch: 32 Global Step: 676090 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:13:56,691-Speed 6307.33 samples/sec Loss 3.0823 LearningRate 0.0000 Epoch: 32 Global Step: 676100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:13:59,944-Speed 6297.81 samples/sec Loss 3.0379 LearningRate 0.0000 Epoch: 32 Global Step: 676110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:03,203-Speed 6284.73 samples/sec Loss 3.0970 LearningRate 0.0000 Epoch: 32 Global Step: 676120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:06,455-Speed 6298.88 samples/sec Loss 3.0386 LearningRate 0.0000 Epoch: 32 Global Step: 676130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:09,702-Speed 6308.25 samples/sec Loss 3.0594 LearningRate 0.0000 Epoch: 32 Global Step: 676140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:12,964-Speed 6279.76 samples/sec Loss 3.0178 LearningRate 0.0000 Epoch: 32 Global Step: 676150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:16,217-Speed 6297.20 samples/sec Loss 3.0469 LearningRate 0.0000 Epoch: 32 Global Step: 676160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:19,462-Speed 6313.15 samples/sec Loss 3.0556 LearningRate 0.0000 Epoch: 32 Global Step: 676170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:22,706-Speed 6313.31 samples/sec Loss 3.0879 LearningRate 0.0000 Epoch: 32 Global Step: 676180 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:14:25,959-Speed 6298.17 samples/sec Loss 3.0793 LearningRate 0.0000 Epoch: 32 Global Step: 676190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:29,196-Speed 6329.59 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 676200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:32,442-Speed 6309.44 samples/sec Loss 3.0703 LearningRate 0.0000 Epoch: 32 Global Step: 676210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:35,702-Speed 6284.75 samples/sec Loss 3.0177 LearningRate 0.0000 Epoch: 32 Global Step: 676220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:38,951-Speed 6304.42 samples/sec Loss 3.0900 LearningRate 0.0000 Epoch: 32 Global Step: 676230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:42,199-Speed 6307.86 samples/sec Loss 3.0497 LearningRate 0.0000 Epoch: 32 Global Step: 676240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:45,457-Speed 6286.04 samples/sec Loss 3.0535 LearningRate 0.0000 Epoch: 32 Global Step: 676250 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:48,702-Speed 6312.97 samples/sec Loss 3.0378 LearningRate 0.0000 Epoch: 32 Global Step: 676260 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:51,954-Speed 6299.00 samples/sec Loss 3.0490 LearningRate 0.0000 Epoch: 32 Global Step: 676270 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:55,205-Speed 6300.43 samples/sec Loss 3.1063 LearningRate 0.0000 Epoch: 32 Global Step: 676280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:14:58,457-Speed 6299.73 samples/sec Loss 3.1190 LearningRate 0.0000 Epoch: 32 Global Step: 676290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:01,690-Speed 6336.35 samples/sec Loss 3.1228 LearningRate 0.0000 Epoch: 32 Global Step: 676300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:15:04,937-Speed 6309.42 samples/sec Loss 3.1559 LearningRate 0.0000 Epoch: 32 Global Step: 676310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:08,183-Speed 6310.26 samples/sec Loss 3.1338 LearningRate 0.0000 Epoch: 32 Global Step: 676320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:11,428-Speed 6311.98 samples/sec Loss 3.0714 LearningRate 0.0000 Epoch: 32 Global Step: 676330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:14,690-Speed 6279.71 samples/sec Loss 3.0802 LearningRate 0.0000 Epoch: 32 Global Step: 676340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:17,939-Speed 6304.86 samples/sec Loss 3.0865 LearningRate 0.0000 Epoch: 32 Global Step: 676350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:21,188-Speed 6304.97 samples/sec Loss 3.0220 LearningRate 0.0000 Epoch: 32 Global Step: 676360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:24,442-Speed 6296.22 samples/sec Loss 3.1103 LearningRate 0.0000 Epoch: 32 Global Step: 676370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:27,688-Speed 6310.28 samples/sec Loss 3.0416 LearningRate 0.0000 Epoch: 32 Global Step: 676380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:30,942-Speed 6295.76 samples/sec Loss 3.1005 LearningRate 0.0000 Epoch: 32 Global Step: 676390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:34,194-Speed 6297.30 samples/sec Loss 3.0430 LearningRate 0.0000 Epoch: 32 Global Step: 676400 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-03 05:15:37,433-Speed 6326.37 samples/sec Loss 3.1192 LearningRate 0.0000 Epoch: 32 Global Step: 676410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:40,682-Speed 6304.54 samples/sec Loss 3.1002 LearningRate 0.0000 Epoch: 32 Global Step: 676420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:43,929-Speed 6308.53 samples/sec 
Loss 3.0792 LearningRate 0.0000 Epoch: 32 Global Step: 676430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:47,174-Speed 6313.11 samples/sec Loss 3.0923 LearningRate 0.0000 Epoch: 32 Global Step: 676440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:50,416-Speed 6318.46 samples/sec Loss 3.0096 LearningRate 0.0000 Epoch: 32 Global Step: 676450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:53,663-Speed 6309.25 samples/sec Loss 3.0606 LearningRate 0.0000 Epoch: 32 Global Step: 676460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:15:56,909-Speed 6309.94 samples/sec Loss 3.0436 LearningRate 0.0000 Epoch: 32 Global Step: 676470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:00,180-Speed 6263.41 samples/sec Loss 3.1098 LearningRate 0.0000 Epoch: 32 Global Step: 676480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:03,431-Speed 6301.32 samples/sec Loss 3.0734 LearningRate 0.0000 Epoch: 32 Global Step: 676490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:06,685-Speed 6293.95 samples/sec Loss 3.0969 LearningRate 0.0000 Epoch: 32 Global Step: 676500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:09,919-Speed 6335.13 samples/sec Loss 3.0380 LearningRate 0.0000 Epoch: 32 Global Step: 676510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:13,148-Speed 6342.90 samples/sec Loss 3.0948 LearningRate 0.0000 Epoch: 32 Global Step: 676520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:16,402-Speed 6295.03 samples/sec Loss 3.0746 LearningRate 0.0000 Epoch: 32 Global Step: 676530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:19,648-Speed 6310.64 samples/sec Loss 3.0696 LearningRate 0.0000 Epoch: 32 Global Step: 676540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:22,898-Speed 6303.97 samples/sec Loss 3.0775 LearningRate 0.0000 Epoch: 32 
Global Step: 676550 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:26,145-Speed 6308.58 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 676560 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:29,398-Speed 6297.71 samples/sec Loss 3.0435 LearningRate 0.0000 Epoch: 32 Global Step: 676570 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:32,644-Speed 6310.09 samples/sec Loss 3.0200 LearningRate 0.0000 Epoch: 32 Global Step: 676580 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:35,888-Speed 6314.15 samples/sec Loss 3.0428 LearningRate 0.0000 Epoch: 32 Global Step: 676590 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:39,134-Speed 6311.36 samples/sec Loss 3.0806 LearningRate 0.0000 Epoch: 32 Global Step: 676600 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:42,376-Speed 6318.98 samples/sec Loss 3.0989 LearningRate 0.0000 Epoch: 32 Global Step: 676610 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:16:45,620-Speed 6313.63 samples/sec Loss 3.0675 LearningRate 0.0000 Epoch: 32 Global Step: 676620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:48,865-Speed 6311.83 samples/sec Loss 3.0297 LearningRate 0.0000 Epoch: 32 Global Step: 676630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:52,111-Speed 6311.69 samples/sec Loss 3.0581 LearningRate 0.0000 Epoch: 32 Global Step: 676640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:55,356-Speed 6312.65 samples/sec Loss 3.0431 LearningRate 0.0000 Epoch: 32 Global Step: 676650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:16:58,607-Speed 6302.13 samples/sec Loss 3.0892 LearningRate 0.0000 Epoch: 32 Global Step: 676660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:01,860-Speed 6297.64 samples/sec Loss 3.0594 LearningRate 0.0000 Epoch: 32 Global Step: 676670 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:17:05,109-Speed 6304.10 samples/sec Loss 3.0962 LearningRate 0.0000 Epoch: 32 Global Step: 676680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:08,356-Speed 6309.16 samples/sec Loss 3.0674 LearningRate 0.0000 Epoch: 32 Global Step: 676690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:11,609-Speed 6296.83 samples/sec Loss 3.0854 LearningRate 0.0000 Epoch: 32 Global Step: 676700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:14,870-Speed 6282.44 samples/sec Loss 3.0830 LearningRate 0.0000 Epoch: 32 Global Step: 676710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:18,110-Speed 6321.92 samples/sec Loss 3.0217 LearningRate 0.0000 Epoch: 32 Global Step: 676720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:21,362-Speed 6299.79 samples/sec Loss 2.9853 LearningRate 0.0000 Epoch: 32 Global Step: 676730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:24,620-Speed 6286.88 samples/sec Loss 3.0486 LearningRate 0.0000 Epoch: 32 Global Step: 676740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:27,868-Speed 6306.16 samples/sec Loss 3.0114 LearningRate 0.0000 Epoch: 32 Global Step: 676750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:31,120-Speed 6298.62 samples/sec Loss 3.0577 LearningRate 0.0000 Epoch: 32 Global Step: 676760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:34,371-Speed 6302.21 samples/sec Loss 3.0937 LearningRate 0.0000 Epoch: 32 Global Step: 676770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:37,616-Speed 6312.07 samples/sec Loss 3.0757 LearningRate 0.0000 Epoch: 32 Global Step: 676780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:40,864-Speed 6307.76 samples/sec Loss 3.0409 LearningRate 0.0000 Epoch: 32 Global Step: 676790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:17:44,114-Speed 6302.83 samples/sec Loss 3.0462 LearningRate 0.0000 Epoch: 32 Global Step: 676800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:17:47,353-Speed 6323.47 samples/sec Loss 3.1015 LearningRate 0.0000 Epoch: 32 Global Step: 676810 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:17:50,637-Speed 6238.77 samples/sec Loss 3.1045 LearningRate 0.0000 Epoch: 32 Global Step: 676820 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:17:53,884-Speed 6308.40 samples/sec Loss 3.0737 LearningRate 0.0000 Epoch: 32 Global Step: 676830 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:17:57,132-Speed 6306.15 samples/sec Loss 3.0155 LearningRate 0.0000 Epoch: 32 Global Step: 676840 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:00,381-Speed 6304.63 samples/sec Loss 3.1079 LearningRate 0.0000 Epoch: 32 Global Step: 676850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:03,626-Speed 6312.25 samples/sec Loss 3.0566 LearningRate 0.0000 Epoch: 32 Global Step: 676860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:06,874-Speed 6307.33 samples/sec Loss 3.0866 LearningRate 0.0000 Epoch: 32 Global Step: 676870 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:10,119-Speed 6314.69 samples/sec Loss 3.0399 LearningRate 0.0000 Epoch: 32 Global Step: 676880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:13,371-Speed 6297.61 samples/sec Loss 3.0630 LearningRate 0.0000 Epoch: 32 Global Step: 676890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:16,615-Speed 6315.04 samples/sec Loss 3.0621 LearningRate 0.0000 Epoch: 32 Global Step: 676900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:19,856-Speed 6320.88 samples/sec Loss 3.0705 LearningRate 0.0000 Epoch: 32 Global Step: 676910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:23,106-Speed 6303.23 samples/sec Loss 
3.1156 LearningRate 0.0000 Epoch: 32 Global Step: 676920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:26,357-Speed 6300.55 samples/sec Loss 3.0592 LearningRate 0.0000 Epoch: 32 Global Step: 676930 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:29,606-Speed 6305.75 samples/sec Loss 3.0733 LearningRate 0.0000 Epoch: 32 Global Step: 676940 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:32,857-Speed 6301.67 samples/sec Loss 3.0419 LearningRate 0.0000 Epoch: 32 Global Step: 676950 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:36,106-Speed 6303.97 samples/sec Loss 3.0821 LearningRate 0.0000 Epoch: 32 Global Step: 676960 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:39,355-Speed 6304.73 samples/sec Loss 3.0622 LearningRate 0.0000 Epoch: 32 Global Step: 676970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:42,601-Speed 6311.77 samples/sec Loss 3.0748 LearningRate 0.0000 Epoch: 32 Global Step: 676980 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:45,847-Speed 6310.54 samples/sec Loss 3.0883 LearningRate 0.0000 Epoch: 32 Global Step: 676990 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:49,092-Speed 6312.28 samples/sec Loss 3.0730 LearningRate 0.0000 Epoch: 32 Global Step: 677000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:18:52,343-Speed 6299.92 samples/sec Loss 3.0190 LearningRate 0.0000 Epoch: 32 Global Step: 677010 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:18:55,586-Speed 6316.98 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 32 Global Step: 677020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:18:58,832-Speed 6311.07 samples/sec Loss 3.1076 LearningRate 0.0000 Epoch: 32 Global Step: 677030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:19:02,065-Speed 6335.53 samples/sec Loss 3.0197 LearningRate 0.0000 Epoch: 32 Global 
Step: 677040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:05,341-Speed 6252.83 samples/sec Loss 3.0778 LearningRate 0.0000 Epoch: 32 Global Step: 677050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:08,596-Speed 6294.71 samples/sec Loss 3.0598 LearningRate 0.0000 Epoch: 32 Global Step: 677060 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:11,857-Speed 6281.44 samples/sec Loss 3.0357 LearningRate 0.0000 Epoch: 32 Global Step: 677070 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:15,100-Speed 6316.49 samples/sec Loss 3.0476 LearningRate 0.0000 Epoch: 32 Global Step: 677080 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:18,349-Speed 6304.94 samples/sec Loss 3.0880 LearningRate 0.0000 Epoch: 32 Global Step: 677090 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:21,598-Speed 6303.48 samples/sec Loss 3.0550 LearningRate 0.0000 Epoch: 32 Global Step: 677100 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:24,852-Speed 6297.44 samples/sec Loss 3.1000 LearningRate 0.0000 Epoch: 32 Global Step: 677110 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:28,097-Speed 6313.02 samples/sec Loss 3.1005 LearningRate 0.0000 Epoch: 32 Global Step: 677120 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:31,348-Speed 6300.40 samples/sec Loss 3.0717 LearningRate 0.0000 Epoch: 32 Global Step: 677130 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:34,593-Speed 6312.48 samples/sec Loss 3.0037 LearningRate 0.0000 Epoch: 32 Global Step: 677140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:19:37,839-Speed 6310.43 samples/sec Loss 3.0550 LearningRate 0.0000 Epoch: 32 Global Step: 677150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:19:41,078-Speed 6324.88 samples/sec Loss 3.0361 LearningRate 0.0000 Epoch: 32 Global Step: 677160 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:19:44,342-Speed 6276.30 samples/sec Loss 3.0354 LearningRate 0.0000 Epoch: 32 Global Step: 677170 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:47,686-Speed 6124.86 samples/sec Loss 3.0516 LearningRate 0.0000 Epoch: 32 Global Step: 677180 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:50,933-Speed 6309.10 samples/sec Loss 3.0558 LearningRate 0.0000 Epoch: 32 Global Step: 677190 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:54,181-Speed 6306.61 samples/sec Loss 3.0587 LearningRate 0.0000 Epoch: 32 Global Step: 677200 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:19:57,428-Speed 6309.93 samples/sec Loss 3.0289 LearningRate 0.0000 Epoch: 32 Global Step: 677210 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:00,671-Speed 6315.99 samples/sec Loss 3.0541 LearningRate 0.0000 Epoch: 32 Global Step: 677220 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:03,925-Speed 6295.64 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 32 Global Step: 677230 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:07,173-Speed 6306.11 samples/sec Loss 3.1000 LearningRate 0.0000 Epoch: 32 Global Step: 677240 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:10,420-Speed 6310.28 samples/sec Loss 3.0660 LearningRate 0.0000 Epoch: 32 Global Step: 677250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:13,672-Speed 6297.62 samples/sec Loss 2.9943 LearningRate 0.0000 Epoch: 32 Global Step: 677260 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:20:16,923-Speed 6301.99 samples/sec Loss 3.0505 LearningRate 0.0000 Epoch: 32 Global Step: 677270 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:20:20,175-Speed 6298.77 samples/sec Loss 3.0451 LearningRate 0.0000 Epoch: 32 Global Step: 677280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:20:23,431-Speed 6291.48 samples/sec Loss 3.0626 LearningRate 0.0000 Epoch: 32 Global Step: 677290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:20:26,683-Speed 6298.20 samples/sec Loss 3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 677300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:20:29,915-Speed 6339.09 samples/sec Loss 3.1389 LearningRate 0.0000 Epoch: 32 Global Step: 677310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:33,166-Speed 6299.33 samples/sec Loss 3.0345 LearningRate 0.0000 Epoch: 32 Global Step: 677320 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:36,411-Speed 6313.96 samples/sec Loss 3.0215 LearningRate 0.0000 Epoch: 32 Global Step: 677330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:39,660-Speed 6305.86 samples/sec Loss 3.0425 LearningRate 0.0000 Epoch: 32 Global Step: 677340 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:42,912-Speed 6299.29 samples/sec Loss 3.0586 LearningRate 0.0000 Epoch: 32 Global Step: 677350 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:46,165-Speed 6296.21 samples/sec Loss 3.0359 LearningRate 0.0000 Epoch: 32 Global Step: 677360 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:49,416-Speed 6301.38 samples/sec Loss 3.0817 LearningRate 0.0000 Epoch: 32 Global Step: 677370 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:52,661-Speed 6312.65 samples/sec Loss 3.0294 LearningRate 0.0000 Epoch: 32 Global Step: 677380 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:55,910-Speed 6304.74 samples/sec Loss 3.0660 LearningRate 0.0000 Epoch: 32 Global Step: 677390 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:20:59,163-Speed 6297.08 samples/sec Loss 3.0877 LearningRate 0.0000 Epoch: 32 Global Step: 677400 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:21:02,410-Speed 6309.01 samples/sec Loss 
3.0472 LearningRate 0.0000 Epoch: 32 Global Step: 677410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:05,658-Speed 6307.40 samples/sec Loss 3.0709 LearningRate 0.0000 Epoch: 32 Global Step: 677420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:08,911-Speed 6296.03 samples/sec Loss 3.0992 LearningRate 0.0000 Epoch: 32 Global Step: 677430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:12,161-Speed 6303.67 samples/sec Loss 3.1593 LearningRate 0.0000 Epoch: 32 Global Step: 677440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:15,414-Speed 6296.37 samples/sec Loss 3.0809 LearningRate 0.0000 Epoch: 32 Global Step: 677450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:18,664-Speed 6302.59 samples/sec Loss 3.0911 LearningRate 0.0000 Epoch: 32 Global Step: 677460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:21,916-Speed 6298.85 samples/sec Loss 3.0520 LearningRate 0.0000 Epoch: 32 Global Step: 677470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:25,168-Speed 6299.74 samples/sec Loss 3.0308 LearningRate 0.0000 Epoch: 32 Global Step: 677480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:28,417-Speed 6304.00 samples/sec Loss 3.0335 LearningRate 0.0000 Epoch: 32 Global Step: 677490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:31,672-Speed 6293.66 samples/sec Loss 3.0370 LearningRate 0.0000 Epoch: 32 Global Step: 677500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:34,909-Speed 6329.59 samples/sec Loss 3.0409 LearningRate 0.0000 Epoch: 32 Global Step: 677510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:38,156-Speed 6308.80 samples/sec Loss 3.0680 LearningRate 0.0000 Epoch: 32 Global Step: 677520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:41,407-Speed 6300.32 samples/sec Loss 3.0598 LearningRate 0.0000 Epoch: 32 Global 
Step: 677530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:44,660-Speed 6296.25 samples/sec Loss 3.0569 LearningRate 0.0000 Epoch: 32 Global Step: 677540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:47,910-Speed 6304.31 samples/sec Loss 3.0604 LearningRate 0.0000 Epoch: 32 Global Step: 677550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:51,158-Speed 6305.33 samples/sec Loss 3.0153 LearningRate 0.0000 Epoch: 32 Global Step: 677560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:54,407-Speed 6306.79 samples/sec Loss 3.0268 LearningRate 0.0000 Epoch: 32 Global Step: 677570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:21:57,655-Speed 6307.31 samples/sec Loss 3.0522 LearningRate 0.0000 Epoch: 32 Global Step: 677580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:22:00,905-Speed 6302.12 samples/sec Loss 3.0123 LearningRate 0.0000 Epoch: 32 Global Step: 677590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:22:04,142-Speed 6328.41 samples/sec Loss 3.0484 LearningRate 0.0000 Epoch: 32 Global Step: 677600 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:07,393-Speed 6300.85 samples/sec Loss 3.0705 LearningRate 0.0000 Epoch: 32 Global Step: 677610 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:10,641-Speed 6307.37 samples/sec Loss 3.0203 LearningRate 0.0000 Epoch: 32 Global Step: 677620 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:13,894-Speed 6297.02 samples/sec Loss 3.0799 LearningRate 0.0000 Epoch: 32 Global Step: 677630 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:17,145-Speed 6300.31 samples/sec Loss 3.0442 LearningRate 0.0000 Epoch: 32 Global Step: 677640 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:20,389-Speed 6315.37 samples/sec Loss 3.0427 LearningRate 0.0000 Epoch: 32 Global Step: 677650 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:22:23,639-Speed 6302.61 samples/sec Loss 3.1042 LearningRate 0.0000 Epoch: 32 Global Step: 677660 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:26,882-Speed 6316.73 samples/sec Loss 2.9816 LearningRate 0.0000 Epoch: 32 Global Step: 677670 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:30,131-Speed 6304.68 samples/sec Loss 3.0681 LearningRate 0.0000 Epoch: 32 Global Step: 677680 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:33,375-Speed 6313.81 samples/sec Loss 3.0456 LearningRate 0.0000 Epoch: 32 Global Step: 677690 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:36,622-Speed 6309.88 samples/sec Loss 3.0820 LearningRate 0.0000 Epoch: 32 Global Step: 677700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:22:39,873-Speed 6299.98 samples/sec Loss 3.0506 LearningRate 0.0000 Epoch: 32 Global Step: 677710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:22:43,119-Speed 6311.24 samples/sec Loss 3.0176 LearningRate 0.0000 Epoch: 32 Global Step: 677720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:22:46,349-Speed 6341.02 samples/sec Loss 3.0744 LearningRate 0.0000 Epoch: 32 Global Step: 677730 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:49,598-Speed 6305.96 samples/sec Loss 3.1088 LearningRate 0.0000 Epoch: 32 Global Step: 677740 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:52,853-Speed 6292.96 samples/sec Loss 3.1081 LearningRate 0.0000 Epoch: 32 Global Step: 677750 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:56,104-Speed 6300.99 samples/sec Loss 3.0699 LearningRate 0.0000 Epoch: 32 Global Step: 677760 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:22:59,355-Speed 6301.24 samples/sec Loss 3.0364 LearningRate 0.0000 Epoch: 32 Global Step: 677770 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:23:02,602-Speed 6307.65 samples/sec Loss 3.0461 LearningRate 0.0000 Epoch: 32 Global Step: 677780 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:05,848-Speed 6311.01 samples/sec Loss 3.0543 LearningRate 0.0000 Epoch: 32 Global Step: 677790 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:09,103-Speed 6294.50 samples/sec Loss 3.0541 LearningRate 0.0000 Epoch: 32 Global Step: 677800 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:12,348-Speed 6312.21 samples/sec Loss 3.0578 LearningRate 0.0000 Epoch: 32 Global Step: 677810 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:15,592-Speed 6315.84 samples/sec Loss 3.0992 LearningRate 0.0000 Epoch: 32 Global Step: 677820 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:18,834-Speed 6318.36 samples/sec Loss 3.1226 LearningRate 0.0000 Epoch: 32 Global Step: 677830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:23:22,078-Speed 6313.90 samples/sec Loss 3.0570 LearningRate 0.0000 Epoch: 32 Global Step: 677840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:23:25,310-Speed 6337.36 samples/sec Loss 3.0599 LearningRate 0.0000 Epoch: 32 Global Step: 677850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:28,567-Speed 6290.24 samples/sec Loss 3.0475 LearningRate 0.0000 Epoch: 32 Global Step: 677860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:31,816-Speed 6305.28 samples/sec Loss 3.1058 LearningRate 0.0000 Epoch: 32 Global Step: 677870 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:35,067-Speed 6300.40 samples/sec Loss 3.0827 LearningRate 0.0000 Epoch: 32 Global Step: 677880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:38,313-Speed 6311.43 samples/sec Loss 3.0631 LearningRate 0.0000 Epoch: 32 Global Step: 677890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:41,555-Speed 6317.38 samples/sec Loss 
3.0937 LearningRate 0.0000 Epoch: 32 Global Step: 677900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:44,804-Speed 6305.40 samples/sec Loss 3.0168 LearningRate 0.0000 Epoch: 32 Global Step: 677910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:48,055-Speed 6301.21 samples/sec Loss 3.1037 LearningRate 0.0000 Epoch: 32 Global Step: 677920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:51,303-Speed 6307.29 samples/sec Loss 3.0160 LearningRate 0.0000 Epoch: 32 Global Step: 677930 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:54,546-Speed 6315.68 samples/sec Loss 3.0909 LearningRate 0.0000 Epoch: 32 Global Step: 677940 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:23:57,792-Speed 6309.94 samples/sec Loss 3.0465 LearningRate 0.0000 Epoch: 32 Global Step: 677950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:01,024-Speed 6338.60 samples/sec Loss 3.0159 LearningRate 0.0000 Epoch: 32 Global Step: 677960 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:04,271-Speed 6308.38 samples/sec Loss 3.0860 LearningRate 0.0000 Epoch: 32 Global Step: 677970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:07,521-Speed 6303.48 samples/sec Loss 3.0187 LearningRate 0.0000 Epoch: 32 Global Step: 677980 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:10,768-Speed 6309.49 samples/sec Loss 3.0253 LearningRate 0.0000 Epoch: 32 Global Step: 677990 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:14,019-Speed 6300.73 samples/sec Loss 3.0385 LearningRate 0.0000 Epoch: 32 Global Step: 678000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:17,269-Speed 6302.97 samples/sec Loss 3.0142 LearningRate 0.0000 Epoch: 32 Global Step: 678010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:20,514-Speed 6311.68 samples/sec Loss 3.1187 LearningRate 0.0000 Epoch: 32 Global 
Step: 678020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:23,771-Speed 6289.93 samples/sec Loss 3.0490 LearningRate 0.0000 Epoch: 32 Global Step: 678030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:27,025-Speed 6296.95 samples/sec Loss 3.0745 LearningRate 0.0000 Epoch: 32 Global Step: 678040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:30,275-Speed 6301.93 samples/sec Loss 3.0163 LearningRate 0.0000 Epoch: 32 Global Step: 678050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:33,524-Speed 6304.64 samples/sec Loss 3.0321 LearningRate 0.0000 Epoch: 32 Global Step: 678060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:36,772-Speed 6307.89 samples/sec Loss 3.0161 LearningRate 0.0000 Epoch: 32 Global Step: 678070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:40,020-Speed 6307.24 samples/sec Loss 2.9914 LearningRate 0.0000 Epoch: 32 Global Step: 678080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:43,270-Speed 6301.40 samples/sec Loss 3.0467 LearningRate 0.0000 Epoch: 32 Global Step: 678090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:46,518-Speed 6307.21 samples/sec Loss 3.0157 LearningRate 0.0000 Epoch: 32 Global Step: 678100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:24:49,749-Speed 6339.25 samples/sec Loss 3.0214 LearningRate 0.0000 Epoch: 32 Global Step: 678110 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:53,000-Speed 6301.90 samples/sec Loss 3.1343 LearningRate 0.0000 Epoch: 32 Global Step: 678120 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:56,240-Speed 6321.91 samples/sec Loss 3.0459 LearningRate 0.0000 Epoch: 32 Global Step: 678130 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:24:59,485-Speed 6313.45 samples/sec Loss 3.0645 LearningRate 0.0000 Epoch: 32 Global Step: 678140 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:25:02,734-Speed 6304.52 samples/sec Loss 3.1063 LearningRate 0.0000 Epoch: 32 Global Step: 678150 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:05,990-Speed 6291.56 samples/sec Loss 3.0389 LearningRate 0.0000 Epoch: 32 Global Step: 678160 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:09,243-Speed 6297.54 samples/sec Loss 3.0336 LearningRate 0.0000 Epoch: 32 Global Step: 678170 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:12,493-Speed 6302.42 samples/sec Loss 3.0462 LearningRate 0.0000 Epoch: 32 Global Step: 678180 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:15,739-Speed 6310.57 samples/sec Loss 3.0376 LearningRate 0.0000 Epoch: 32 Global Step: 678190 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:18,987-Speed 6306.54 samples/sec Loss 3.0021 LearningRate 0.0000 Epoch: 32 Global Step: 678200 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:22,240-Speed 6297.08 samples/sec Loss 3.0663 LearningRate 0.0000 Epoch: 32 Global Step: 678210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:25:25,471-Speed 6340.83 samples/sec Loss 3.0367 LearningRate 0.0000 Epoch: 32 Global Step: 678220 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:28,718-Speed 6308.18 samples/sec Loss 3.0446 LearningRate 0.0000 Epoch: 32 Global Step: 678230 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:31,967-Speed 6305.14 samples/sec Loss 3.0493 LearningRate 0.0000 Epoch: 32 Global Step: 678240 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:35,216-Speed 6305.54 samples/sec Loss 3.0217 LearningRate 0.0000 Epoch: 32 Global Step: 678250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:38,489-Speed 6258.39 samples/sec Loss 3.0050 LearningRate 0.0000 Epoch: 32 Global Step: 678260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:25:41,738-Speed 6305.37 samples/sec Loss 3.0533 LearningRate 0.0000 Epoch: 32 Global Step: 678270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:44,986-Speed 6307.02 samples/sec Loss 3.1058 LearningRate 0.0000 Epoch: 32 Global Step: 678280 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:48,231-Speed 6312.88 samples/sec Loss 3.0253 LearningRate 0.0000 Epoch: 32 Global Step: 678290 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:51,474-Speed 6315.22 samples/sec Loss 3.0098 LearningRate 0.0000 Epoch: 32 Global Step: 678300 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:54,724-Speed 6304.51 samples/sec Loss 3.0257 LearningRate 0.0000 Epoch: 32 Global Step: 678310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:25:57,978-Speed 6294.94 samples/sec Loss 2.9850 LearningRate 0.0000 Epoch: 32 Global Step: 678320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:01,224-Speed 6310.50 samples/sec Loss 3.0684 LearningRate 0.0000 Epoch: 32 Global Step: 678330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:04,479-Speed 6293.24 samples/sec Loss 3.0378 LearningRate 0.0000 Epoch: 32 Global Step: 678340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:07,724-Speed 6311.61 samples/sec Loss 3.0969 LearningRate 0.0000 Epoch: 32 Global Step: 678350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:10,986-Speed 6280.01 samples/sec Loss 3.0497 LearningRate 0.0000 Epoch: 32 Global Step: 678360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:14,230-Speed 6314.69 samples/sec Loss 3.0617 LearningRate 0.0000 Epoch: 32 Global Step: 678370 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:17,480-Speed 6302.90 samples/sec Loss 2.9895 LearningRate 0.0000 Epoch: 32 Global Step: 678380 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:20,731-Speed 6302.37 samples/sec Loss 
3.0619 LearningRate 0.0000 Epoch: 32 Global Step: 678390 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:24,034-Speed 6201.69 samples/sec Loss 3.1613 LearningRate 0.0000 Epoch: 32 Global Step: 678400 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:27,289-Speed 6292.32 samples/sec Loss 3.0736 LearningRate 0.0000 Epoch: 32 Global Step: 678410 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:30,536-Speed 6309.33 samples/sec Loss 3.0095 LearningRate 0.0000 Epoch: 32 Global Step: 678420 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:33,787-Speed 6300.02 samples/sec Loss 3.0870 LearningRate 0.0000 Epoch: 32 Global Step: 678430 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:37,045-Speed 6287.31 samples/sec Loss 3.0573 LearningRate 0.0000 Epoch: 32 Global Step: 678440 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:40,302-Speed 6290.30 samples/sec Loss 3.0442 LearningRate 0.0000 Epoch: 32 Global Step: 678450 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:43,550-Speed 6307.77 samples/sec Loss 3.0540 LearningRate 0.0000 Epoch: 32 Global Step: 678460 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:26:46,799-Speed 6305.41 samples/sec Loss 3.0652 LearningRate 0.0000 Epoch: 32 Global Step: 678470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:50,047-Speed 6307.08 samples/sec Loss 3.1040 LearningRate 0.0000 Epoch: 32 Global Step: 678480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:53,293-Speed 6309.86 samples/sec Loss 3.0519 LearningRate 0.0000 Epoch: 32 Global Step: 678490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:56,552-Speed 6287.70 samples/sec Loss 3.0215 LearningRate 0.0000 Epoch: 32 Global Step: 678500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:26:59,798-Speed 6309.50 samples/sec Loss 3.1288 LearningRate 0.0000 Epoch: 32 Global 
Step: 678510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:03,038-Speed 6322.06 samples/sec Loss 3.0852 LearningRate 0.0000 Epoch: 32 Global Step: 678520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:06,360-Speed 6167.00 samples/sec Loss 2.9999 LearningRate 0.0000 Epoch: 32 Global Step: 678530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:09,611-Speed 6300.96 samples/sec Loss 3.0593 LearningRate 0.0000 Epoch: 32 Global Step: 678540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:12,869-Speed 6288.58 samples/sec Loss 3.0917 LearningRate 0.0000 Epoch: 32 Global Step: 678550 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:16,119-Speed 6302.25 samples/sec Loss 3.0302 LearningRate 0.0000 Epoch: 32 Global Step: 678560 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:19,365-Speed 6309.36 samples/sec Loss 3.0991 LearningRate 0.0000 Epoch: 32 Global Step: 678570 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:22,612-Speed 6308.55 samples/sec Loss 3.0408 LearningRate 0.0000 Epoch: 32 Global Step: 678580 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:25,869-Speed 6289.78 samples/sec Loss 3.0463 LearningRate 0.0000 Epoch: 32 Global Step: 678590 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:29,113-Speed 6315.18 samples/sec Loss 3.0668 LearningRate 0.0000 Epoch: 32 Global Step: 678600 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:32,355-Speed 6318.57 samples/sec Loss 3.0086 LearningRate 0.0000 Epoch: 32 Global Step: 678610 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:27:35,599-Speed 6313.98 samples/sec Loss 3.0978 LearningRate 0.0000 Epoch: 32 Global Step: 678620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:38,892-Speed 6221.03 samples/sec Loss 2.9998 LearningRate 0.0000 Epoch: 32 Global Step: 678630 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:27:42,151-Speed 6295.65 samples/sec Loss 3.1076 LearningRate 0.0000 Epoch: 32 Global Step: 678640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:45,394-Speed 6316.18 samples/sec Loss 3.0086 LearningRate 0.0000 Epoch: 32 Global Step: 678650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:48,642-Speed 6306.59 samples/sec Loss 3.0876 LearningRate 0.0000 Epoch: 32 Global Step: 678660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:51,890-Speed 6307.29 samples/sec Loss 3.0244 LearningRate 0.0000 Epoch: 32 Global Step: 678670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:55,144-Speed 6294.70 samples/sec Loss 3.0630 LearningRate 0.0000 Epoch: 32 Global Step: 678680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:27:58,400-Speed 6291.95 samples/sec Loss 3.0877 LearningRate 0.0000 Epoch: 32 Global Step: 678690 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:01,646-Speed 6310.65 samples/sec Loss 3.0853 LearningRate 0.0000 Epoch: 32 Global Step: 678700 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:04,896-Speed 6304.51 samples/sec Loss 3.0331 LearningRate 0.0000 Epoch: 32 Global Step: 678710 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:08,148-Speed 6298.14 samples/sec Loss 3.0602 LearningRate 0.0000 Epoch: 32 Global Step: 678720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-03 05:28:11,394-Speed 6311.06 samples/sec Loss 3.1038 LearningRate 0.0000 Epoch: 32 Global Step: 678730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:14,651-Speed 6290.15 samples/sec Loss 3.0119 LearningRate 0.0000 Epoch: 32 Global Step: 678740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:17,906-Speed 6292.70 samples/sec Loss 3.0718 LearningRate 0.0000 Epoch: 32 Global Step: 678750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:28:21,154-Speed 6307.24 samples/sec Loss 3.0964 LearningRate 0.0000 Epoch: 32 Global Step: 678760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:28:24,390-Speed 6332.81 samples/sec Loss 3.0361 LearningRate 0.0000 Epoch: 32 Global Step: 678770 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:27,638-Speed 6306.18 samples/sec Loss 3.0448 LearningRate 0.0000 Epoch: 32 Global Step: 678780 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:30,898-Speed 6284.82 samples/sec Loss 3.0526 LearningRate 0.0000 Epoch: 32 Global Step: 678790 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:34,145-Speed 6309.31 samples/sec Loss 3.0765 LearningRate 0.0000 Epoch: 32 Global Step: 678800 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:37,391-Speed 6309.98 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 32 Global Step: 678810 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:40,638-Speed 6309.12 samples/sec Loss 3.0070 LearningRate 0.0000 Epoch: 32 Global Step: 678820 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:43,887-Speed 6304.38 samples/sec Loss 3.0791 LearningRate 0.0000 Epoch: 32 Global Step: 678830 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:47,132-Speed 6312.74 samples/sec Loss 3.0146 LearningRate 0.0000 Epoch: 32 Global Step: 678840 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:50,377-Speed 6312.39 samples/sec Loss 3.0600 LearningRate 0.0000 Epoch: 32 Global Step: 678850 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:53,627-Speed 6302.69 samples/sec Loss 3.1170 LearningRate 0.0000 Epoch: 32 Global Step: 678860 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:28:56,890-Speed 6277.78 samples/sec Loss 3.0778 LearningRate 0.0000 Epoch: 32 Global Step: 678870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:00,135-Speed 6312.77 samples/sec Loss 
2.9861 LearningRate 0.0000 Epoch: 32 Global Step: 678880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:03,381-Speed 6311.64 samples/sec Loss 3.0840 LearningRate 0.0000 Epoch: 32 Global Step: 678890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:06,630-Speed 6305.22 samples/sec Loss 3.1096 LearningRate 0.0000 Epoch: 32 Global Step: 678900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:09,880-Speed 6302.40 samples/sec Loss 3.0225 LearningRate 0.0000 Epoch: 32 Global Step: 678910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:13,121-Speed 6321.77 samples/sec Loss 3.0539 LearningRate 0.0000 Epoch: 32 Global Step: 678920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:16,369-Speed 6305.94 samples/sec Loss 3.0483 LearningRate 0.0000 Epoch: 32 Global Step: 678930 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:19,626-Speed 6288.71 samples/sec Loss 3.0172 LearningRate 0.0000 Epoch: 32 Global Step: 678940 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:22,879-Speed 6297.67 samples/sec Loss 3.0235 LearningRate 0.0000 Epoch: 32 Global Step: 678950 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:26,131-Speed 6299.15 samples/sec Loss 3.0521 LearningRate 0.0000 Epoch: 32 Global Step: 678960 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:29,374-Speed 6316.94 samples/sec Loss 3.0133 LearningRate 0.0000 Epoch: 32 Global Step: 678970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:32,624-Speed 6303.58 samples/sec Loss 3.0156 LearningRate 0.0000 Epoch: 32 Global Step: 678980 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:35,869-Speed 6311.54 samples/sec Loss 3.0266 LearningRate 0.0000 Epoch: 32 Global Step: 678990 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:39,120-Speed 6301.28 samples/sec Loss 3.0702 LearningRate 0.0000 Epoch: 32 Global 
Step: 679000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:42,377-Speed 6289.41 samples/sec Loss 3.0138 LearningRate 0.0000 Epoch: 32 Global Step: 679010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:29:45,627-Speed 6303.39 samples/sec Loss 3.0434 LearningRate 0.0000 Epoch: 32 Global Step: 679020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:48,876-Speed 6304.50 samples/sec Loss 3.0751 LearningRate 0.0000 Epoch: 32 Global Step: 679030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:52,122-Speed 6310.24 samples/sec Loss 3.0098 LearningRate 0.0000 Epoch: 32 Global Step: 679040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:55,368-Speed 6311.57 samples/sec Loss 3.0231 LearningRate 0.0000 Epoch: 32 Global Step: 679050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:29:58,618-Speed 6302.10 samples/sec Loss 3.0346 LearningRate 0.0000 Epoch: 32 Global Step: 679060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:01,863-Speed 6313.62 samples/sec Loss 3.1146 LearningRate 0.0000 Epoch: 32 Global Step: 679070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:05,112-Speed 6303.66 samples/sec Loss 3.0876 LearningRate 0.0000 Epoch: 32 Global Step: 679080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:08,363-Speed 6301.03 samples/sec Loss 3.0987 LearningRate 0.0000 Epoch: 32 Global Step: 679090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:11,611-Speed 6307.78 samples/sec Loss 3.0380 LearningRate 0.0000 Epoch: 32 Global Step: 679100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:14,866-Speed 6293.71 samples/sec Loss 3.1058 LearningRate 0.0000 Epoch: 32 Global Step: 679110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:18,098-Speed 6337.06 samples/sec Loss 3.0334 LearningRate 0.0000 Epoch: 32 Global Step: 679120 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:30:21,346-Speed 6307.36 samples/sec Loss 3.0736 LearningRate 0.0000 Epoch: 32 Global Step: 679130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:24,596-Speed 6304.23 samples/sec Loss 3.0331 LearningRate 0.0000 Epoch: 32 Global Step: 679140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:27,855-Speed 6285.45 samples/sec Loss 3.0113 LearningRate 0.0000 Epoch: 32 Global Step: 679150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:31,099-Speed 6313.55 samples/sec Loss 3.0496 LearningRate 0.0000 Epoch: 32 Global Step: 679160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:34,369-Speed 6265.75 samples/sec Loss 3.0258 LearningRate 0.0000 Epoch: 32 Global Step: 679170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:37,620-Speed 6299.90 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 32 Global Step: 679180 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:40,877-Speed 6290.39 samples/sec Loss 3.0317 LearningRate 0.0000 Epoch: 32 Global Step: 679190 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:44,126-Speed 6303.45 samples/sec Loss 3.0823 LearningRate 0.0000 Epoch: 32 Global Step: 679200 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:47,373-Speed 6308.78 samples/sec Loss 3.0558 LearningRate 0.0000 Epoch: 32 Global Step: 679210 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:50,615-Speed 6319.10 samples/sec Loss 3.0343 LearningRate 0.0000 Epoch: 32 Global Step: 679220 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:53,872-Speed 6289.64 samples/sec Loss 2.9588 LearningRate 0.0000 Epoch: 32 Global Step: 679230 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:30:57,121-Speed 6303.93 samples/sec Loss 3.1270 LearningRate 0.0000 Epoch: 32 Global Step: 679240 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:31:00,354-Speed 6337.73 samples/sec Loss 3.0257 LearningRate 0.0000 Epoch: 32 Global Step: 679250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:03,598-Speed 6313.94 samples/sec Loss 3.0627 LearningRate 0.0000 Epoch: 32 Global Step: 679260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:06,848-Speed 6303.07 samples/sec Loss 3.0311 LearningRate 0.0000 Epoch: 32 Global Step: 679270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:10,098-Speed 6302.38 samples/sec Loss 3.0657 LearningRate 0.0000 Epoch: 32 Global Step: 679280 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:13,350-Speed 6299.02 samples/sec Loss 3.0706 LearningRate 0.0000 Epoch: 32 Global Step: 679290 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:16,602-Speed 6299.99 samples/sec Loss 3.0627 LearningRate 0.0000 Epoch: 32 Global Step: 679300 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:19,845-Speed 6315.63 samples/sec Loss 3.0465 LearningRate 0.0000 Epoch: 32 Global Step: 679310 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:23,095-Speed 6302.27 samples/sec Loss 3.0191 LearningRate 0.0000 Epoch: 32 Global Step: 679320 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:26,347-Speed 6299.05 samples/sec Loss 3.0819 LearningRate 0.0000 Epoch: 32 Global Step: 679330 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:29,595-Speed 6306.81 samples/sec Loss 2.9833 LearningRate 0.0000 Epoch: 32 Global Step: 679340 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:32,844-Speed 6306.42 samples/sec Loss 3.0185 LearningRate 0.0000 Epoch: 32 Global Step: 679350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:36,094-Speed 6304.23 samples/sec Loss 3.0631 LearningRate 0.0000 Epoch: 32 Global Step: 679360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:39,348-Speed 6294.65 samples/sec Loss 
3.0406 LearningRate 0.0000 Epoch: 32 Global Step: 679370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:42,594-Speed 6311.30 samples/sec Loss 2.9953 LearningRate 0.0000 Epoch: 32 Global Step: 679380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:45,840-Speed 6309.19 samples/sec Loss 3.0742 LearningRate 0.0000 Epoch: 32 Global Step: 679390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:49,089-Speed 6305.34 samples/sec Loss 3.0369 LearningRate 0.0000 Epoch: 32 Global Step: 679400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:31:52,324-Speed 6331.81 samples/sec Loss 3.0333 LearningRate 0.0000 Epoch: 32 Global Step: 679410 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:55,569-Speed 6313.63 samples/sec Loss 3.0338 LearningRate 0.0000 Epoch: 32 Global Step: 679420 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:31:58,820-Speed 6300.83 samples/sec Loss 3.0410 LearningRate 0.0000 Epoch: 32 Global Step: 679430 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:02,070-Speed 6301.37 samples/sec Loss 3.0010 LearningRate 0.0000 Epoch: 32 Global Step: 679440 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:05,321-Speed 6301.43 samples/sec Loss 3.0405 LearningRate 0.0000 Epoch: 32 Global Step: 679450 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:08,569-Speed 6307.42 samples/sec Loss 3.1078 LearningRate 0.0000 Epoch: 32 Global Step: 679460 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:11,819-Speed 6302.64 samples/sec Loss 3.0527 LearningRate 0.0000 Epoch: 32 Global Step: 679470 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:15,064-Speed 6313.02 samples/sec Loss 3.0500 LearningRate 0.0000 Epoch: 32 Global Step: 679480 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:18,313-Speed 6304.65 samples/sec Loss 3.0539 LearningRate 0.0000 Epoch: 32 Global 
Step: 679490 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:21,560-Speed 6309.44 samples/sec Loss 3.0323 LearningRate 0.0000 Epoch: 32 Global Step: 679500 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:32:24,809-Speed 6304.60 samples/sec Loss 3.0268 LearningRate 0.0000 Epoch: 32 Global Step: 679510 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:28,069-Speed 6282.70 samples/sec Loss 3.0663 LearningRate 0.0000 Epoch: 32 Global Step: 679520 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:31,319-Speed 6304.12 samples/sec Loss 3.0729 LearningRate 0.0000 Epoch: 32 Global Step: 679530 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:34,571-Speed 6298.75 samples/sec Loss 3.0299 LearningRate 0.0000 Epoch: 32 Global Step: 679540 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:37,823-Speed 6298.74 samples/sec Loss 3.0515 LearningRate 0.0000 Epoch: 32 Global Step: 679550 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:41,067-Speed 6314.46 samples/sec Loss 2.9887 LearningRate 0.0000 Epoch: 32 Global Step: 679560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:44,317-Speed 6302.99 samples/sec Loss 3.0449 LearningRate 0.0000 Epoch: 32 Global Step: 679570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:47,566-Speed 6305.94 samples/sec Loss 3.0729 LearningRate 0.0000 Epoch: 32 Global Step: 679580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:50,819-Speed 6297.65 samples/sec Loss 3.0359 LearningRate 0.0000 Epoch: 32 Global Step: 679590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:54,068-Speed 6303.99 samples/sec Loss 3.0560 LearningRate 0.0000 Epoch: 32 Global Step: 679600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:32:57,307-Speed 6325.77 samples/sec Loss 3.0682 LearningRate 0.0000 Epoch: 32 Global Step: 679610 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:33:00,550-Speed 6314.59 samples/sec Loss 3.0273 LearningRate 0.0000 Epoch: 32 Global Step: 679620 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:03,804-Speed 6296.68 samples/sec Loss 3.0700 LearningRate 0.0000 Epoch: 32 Global Step: 679630 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:07,065-Speed 6281.11 samples/sec Loss 3.0212 LearningRate 0.0000 Epoch: 32 Global Step: 679640 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:10,314-Speed 6305.33 samples/sec Loss 3.0587 LearningRate 0.0000 Epoch: 32 Global Step: 679650 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:13,563-Speed 6304.32 samples/sec Loss 3.0219 LearningRate 0.0000 Epoch: 32 Global Step: 679660 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:16,814-Speed 6300.41 samples/sec Loss 3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 679670 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:20,072-Speed 6288.01 samples/sec Loss 3.0094 LearningRate 0.0000 Epoch: 32 Global Step: 679680 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:23,318-Speed 6311.30 samples/sec Loss 3.0128 LearningRate 0.0000 Epoch: 32 Global Step: 679690 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:26,565-Speed 6308.97 samples/sec Loss 3.0469 LearningRate 0.0000 Epoch: 32 Global Step: 679700 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:29,809-Speed 6314.57 samples/sec Loss 3.0475 LearningRate 0.0000 Epoch: 32 Global Step: 679710 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:33:33,060-Speed 6299.85 samples/sec Loss 3.0560 LearningRate 0.0000 Epoch: 32 Global Step: 679720 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:36,312-Speed 6300.49 samples/sec Loss 3.0582 LearningRate 0.0000 Epoch: 32 Global Step: 679730 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 
05:33:39,562-Speed 6301.37 samples/sec Loss 3.0156 LearningRate 0.0000 Epoch: 32 Global Step: 679740 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:42,815-Speed 6298.49 samples/sec Loss 3.0527 LearningRate 0.0000 Epoch: 32 Global Step: 679750 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:46,064-Speed 6304.78 samples/sec Loss 3.0473 LearningRate 0.0000 Epoch: 32 Global Step: 679760 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:49,310-Speed 6309.13 samples/sec Loss 3.0178 LearningRate 0.0000 Epoch: 32 Global Step: 679770 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:52,553-Speed 6316.98 samples/sec Loss 3.0741 LearningRate 0.0000 Epoch: 32 Global Step: 679780 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:55,805-Speed 6299.86 samples/sec Loss 3.0586 LearningRate 0.0000 Epoch: 32 Global Step: 679790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:33:59,057-Speed 6299.12 samples/sec Loss 3.0465 LearningRate 0.0000 Epoch: 32 Global Step: 679800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:02,315-Speed 6287.72 samples/sec Loss 3.0425 LearningRate 0.0000 Epoch: 32 Global Step: 679810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:05,554-Speed 6324.39 samples/sec Loss 3.1027 LearningRate 0.0000 Epoch: 32 Global Step: 679820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:08,806-Speed 6299.15 samples/sec Loss 3.0225 LearningRate 0.0000 Epoch: 32 Global Step: 679830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:12,060-Speed 6296.75 samples/sec Loss 3.0663 LearningRate 0.0000 Epoch: 32 Global Step: 679840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:15,318-Speed 6285.37 samples/sec Loss 3.0425 LearningRate 0.0000 Epoch: 32 Global Step: 679850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:18,566-Speed 6308.03 samples/sec Loss 
3.0763 LearningRate 0.0000 Epoch: 32 Global Step: 679860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:21,816-Speed 6303.12 samples/sec Loss 2.9963 LearningRate 0.0000 Epoch: 32 Global Step: 679870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:34:25,050-Speed 6333.28 samples/sec Loss 3.0002 LearningRate 0.0000 Epoch: 32 Global Step: 679880 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:28,299-Speed 6304.20 samples/sec Loss 3.0683 LearningRate 0.0000 Epoch: 32 Global Step: 679890 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:31,553-Speed 6296.47 samples/sec Loss 3.0559 LearningRate 0.0000 Epoch: 32 Global Step: 679900 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:34,799-Speed 6310.89 samples/sec Loss 3.1157 LearningRate 0.0000 Epoch: 32 Global Step: 679910 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:38,045-Speed 6309.22 samples/sec Loss 3.0528 LearningRate 0.0000 Epoch: 32 Global Step: 679920 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:41,301-Speed 6291.94 samples/sec Loss 3.0763 LearningRate 0.0000 Epoch: 32 Global Step: 679930 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:44,551-Speed 6304.31 samples/sec Loss 3.0808 LearningRate 0.0000 Epoch: 32 Global Step: 679940 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:47,796-Speed 6311.89 samples/sec Loss 3.0451 LearningRate 0.0000 Epoch: 32 Global Step: 679950 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:51,048-Speed 6299.51 samples/sec Loss 3.0129 LearningRate 0.0000 Epoch: 32 Global Step: 679960 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:54,301-Speed 6297.25 samples/sec Loss 3.0076 LearningRate 0.0000 Epoch: 32 Global Step: 679970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:34:57,544-Speed 6315.89 samples/sec Loss 3.0598 LearningRate 0.0000 Epoch: 32 Global 
Step: 679980 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:00,793-Speed 6305.57 samples/sec Loss 3.0856 LearningRate 0.0000 Epoch: 32 Global Step: 679990 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:04,034-Speed 6319.60 samples/sec Loss 3.0518 LearningRate 0.0000 Epoch: 32 Global Step: 680000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:07,294-Speed 6284.23 samples/sec Loss 3.0853 LearningRate 0.0000 Epoch: 32 Global Step: 680010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:10,541-Speed 6307.75 samples/sec Loss 3.0468 LearningRate 0.0000 Epoch: 32 Global Step: 680020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:13,788-Speed 6310.36 samples/sec Loss 3.0497 LearningRate 0.0000 Epoch: 32 Global Step: 680030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:17,032-Speed 6315.53 samples/sec Loss 3.0149 LearningRate 0.0000 Epoch: 32 Global Step: 680040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:20,280-Speed 6305.51 samples/sec Loss 3.0013 LearningRate 0.0000 Epoch: 32 Global Step: 680050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:23,523-Speed 6317.16 samples/sec Loss 3.0794 LearningRate 0.0000 Epoch: 32 Global Step: 680060 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:26,771-Speed 6306.18 samples/sec Loss 2.9881 LearningRate 0.0000 Epoch: 32 Global Step: 680070 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:30,018-Speed 6309.74 samples/sec Loss 3.0432 LearningRate 0.0000 Epoch: 32 Global Step: 680080 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:33,269-Speed 6299.98 samples/sec Loss 3.0834 LearningRate 0.0000 Epoch: 32 Global Step: 680090 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:35:36,521-Speed 6298.87 samples/sec Loss 3.0242 LearningRate 0.0000 Epoch: 32 Global Step: 680100 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:35:39,772-Speed 6301.23 samples/sec Loss 2.9945 LearningRate 0.0000 Epoch: 32 Global Step: 680110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:43,018-Speed 6310.97 samples/sec Loss 3.0297 LearningRate 0.0000 Epoch: 32 Global Step: 680120 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:46,262-Speed 6313.81 samples/sec Loss 3.0332 LearningRate 0.0000 Epoch: 32 Global Step: 680130 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:49,508-Speed 6312.38 samples/sec Loss 3.0304 LearningRate 0.0000 Epoch: 32 Global Step: 680140 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:52,755-Speed 6307.85 samples/sec Loss 3.0158 LearningRate 0.0000 Epoch: 32 Global Step: 680150 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:56,004-Speed 6305.58 samples/sec Loss 3.0839 LearningRate 0.0000 Epoch: 32 Global Step: 680160 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:35:59,252-Speed 6306.22 samples/sec Loss 3.0790 LearningRate 0.0000 Epoch: 32 Global Step: 680170 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:02,495-Speed 6316.30 samples/sec Loss 3.0302 LearningRate 0.0000 Epoch: 32 Global Step: 680180 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:05,739-Speed 6314.32 samples/sec Loss 3.1015 LearningRate 0.0000 Epoch: 32 Global Step: 680190 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:08,986-Speed 6308.37 samples/sec Loss 3.0090 LearningRate 0.0000 Epoch: 32 Global Step: 680200 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:12,236-Speed 6303.20 samples/sec Loss 3.0051 LearningRate 0.0000 Epoch: 32 Global Step: 680210 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:15,485-Speed 6305.42 samples/sec Loss 3.0017 LearningRate 0.0000 Epoch: 32 Global Step: 680220 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:36:18,733-Speed 6306.82 samples/sec Loss 3.0756 LearningRate 0.0000 Epoch: 32 Global Step: 680230 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:21,982-Speed 6304.85 samples/sec Loss 3.0491 LearningRate 0.0000 Epoch: 32 Global Step: 680240 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:25,227-Speed 6313.15 samples/sec Loss 3.0403 LearningRate 0.0000 Epoch: 32 Global Step: 680250 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:28,475-Speed 6306.00 samples/sec Loss 3.0279 LearningRate 0.0000 Epoch: 32 Global Step: 680260 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:31,724-Speed 6305.67 samples/sec Loss 3.0532 LearningRate 0.0000 Epoch: 32 Global Step: 680270 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:36:34,970-Speed 6310.25 samples/sec Loss 3.0739 LearningRate 0.0000 Epoch: 32 Global Step: 680280 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:38,223-Speed 6298.39 samples/sec Loss 3.0372 LearningRate 0.0000 Epoch: 32 Global Step: 680290 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:41,470-Speed 6308.08 samples/sec Loss 3.0567 LearningRate 0.0000 Epoch: 32 Global Step: 680300 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:44,714-Speed 6315.07 samples/sec Loss 3.0407 LearningRate 0.0000 Epoch: 32 Global Step: 680310 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:47,968-Speed 6294.54 samples/sec Loss 3.1026 LearningRate 0.0000 Epoch: 32 Global Step: 680320 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:51,221-Speed 6297.20 samples/sec Loss 3.0690 LearningRate 0.0000 Epoch: 32 Global Step: 680330 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:54,469-Speed 6306.15 samples/sec Loss 3.0287 LearningRate 0.0000 Epoch: 32 Global Step: 680340 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:36:57,722-Speed 6297.37 samples/sec Loss 
3.0711 LearningRate 0.0000 Epoch: 32 Global Step: 680350 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:00,982-Speed 6285.15 samples/sec Loss 3.0248 LearningRate 0.0000 Epoch: 32 Global Step: 680360 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:04,225-Speed 6315.75 samples/sec Loss 3.0467 LearningRate 0.0000 Epoch: 32 Global Step: 680370 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:07,458-Speed 6336.03 samples/sec Loss 3.0929 LearningRate 0.0000 Epoch: 32 Global Step: 680380 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:10,701-Speed 6316.45 samples/sec Loss 3.0587 LearningRate 0.0000 Epoch: 32 Global Step: 680390 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:13,953-Speed 6298.99 samples/sec Loss 3.0123 LearningRate 0.0000 Epoch: 32 Global Step: 680400 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:17,206-Speed 6297.41 samples/sec Loss 2.9980 LearningRate 0.0000 Epoch: 32 Global Step: 680410 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:20,458-Speed 6299.55 samples/sec Loss 3.0311 LearningRate 0.0000 Epoch: 32 Global Step: 680420 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:23,708-Speed 6302.36 samples/sec Loss 2.9932 LearningRate 0.0000 Epoch: 32 Global Step: 680430 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:26,959-Speed 6300.57 samples/sec Loss 3.0657 LearningRate 0.0000 Epoch: 32 Global Step: 680440 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:30,220-Speed 6281.55 samples/sec Loss 2.9831 LearningRate 0.0000 Epoch: 32 Global Step: 680450 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:33,470-Speed 6302.39 samples/sec Loss 3.0811 LearningRate 0.0000 Epoch: 32 Global Step: 680460 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:36,714-Speed 6314.76 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 32 Global 
Step: 680470 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:39,947-Speed 6336.35 samples/sec Loss 3.0088 LearningRate 0.0000 Epoch: 32 Global Step: 680480 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:43,194-Speed 6309.88 samples/sec Loss 3.0440 LearningRate 0.0000 Epoch: 32 Global Step: 680490 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:46,439-Speed 6311.86 samples/sec Loss 2.9905 LearningRate 0.0000 Epoch: 32 Global Step: 680500 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:37:49,673-Speed 6335.65 samples/sec Loss 3.0826 LearningRate 0.0000 Epoch: 32 Global Step: 680510 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:37:52,919-Speed 6309.93 samples/sec Loss 2.9690 LearningRate 0.0000 Epoch: 32 Global Step: 680520 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:37:56,173-Speed 6295.30 samples/sec Loss 3.0130 LearningRate 0.0000 Epoch: 32 Global Step: 680530 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:37:59,417-Speed 6314.30 samples/sec Loss 3.0891 LearningRate 0.0000 Epoch: 32 Global Step: 680540 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:02,674-Speed 6289.46 samples/sec Loss 3.0723 LearningRate 0.0000 Epoch: 32 Global Step: 680550 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:05,925-Speed 6300.42 samples/sec Loss 3.0092 LearningRate 0.0000 Epoch: 32 Global Step: 680560 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:09,177-Speed 6298.95 samples/sec Loss 3.0565 LearningRate 0.0000 Epoch: 32 Global Step: 680570 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:12,425-Speed 6308.27 samples/sec Loss 3.0386 LearningRate 0.0000 Epoch: 32 Global Step: 680580 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:15,673-Speed 6305.71 samples/sec Loss 3.0396 LearningRate 0.0000 Epoch: 32 Global Step: 680590 Fp16 Grad Scale: 4096 
Required: 14 hours Training: 2022-04-03 05:38:18,923-Speed 6303.71 samples/sec Loss 3.0331 LearningRate 0.0000 Epoch: 32 Global Step: 680600 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:22,170-Speed 6308.49 samples/sec Loss 3.0140 LearningRate 0.0000 Epoch: 32 Global Step: 680610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:25,426-Speed 6291.60 samples/sec Loss 3.0636 LearningRate 0.0000 Epoch: 32 Global Step: 680620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:28,675-Speed 6304.89 samples/sec Loss 2.9958 LearningRate 0.0000 Epoch: 32 Global Step: 680630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:31,931-Speed 6290.62 samples/sec Loss 3.0487 LearningRate 0.0000 Epoch: 32 Global Step: 680640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:35,190-Speed 6285.89 samples/sec Loss 2.9949 LearningRate 0.0000 Epoch: 32 Global Step: 680650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:38,451-Speed 6282.25 samples/sec Loss 2.9894 LearningRate 0.0000 Epoch: 32 Global Step: 680660 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:41,701-Speed 6301.98 samples/sec Loss 3.0242 LearningRate 0.0000 Epoch: 32 Global Step: 680670 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:44,947-Speed 6310.86 samples/sec Loss 3.0205 LearningRate 0.0000 Epoch: 32 Global Step: 680680 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:38:48,194-Speed 6309.84 samples/sec Loss 3.0802 LearningRate 0.0000 Epoch: 32 Global Step: 680690 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:51,442-Speed 6307.08 samples/sec Loss 3.0383 LearningRate 0.0000 Epoch: 32 Global Step: 680700 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:38:54,697-Speed 6292.31 samples/sec Loss 3.0483 LearningRate 0.0000 Epoch: 32 Global Step: 680710 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 
05:38:57,950-Speed 6298.98 samples/sec Loss 3.0644 LearningRate 0.0000 Epoch: 32 Global Step: 680720 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:01,196-Speed 6309.30 samples/sec Loss 3.0228 LearningRate 0.0000 Epoch: 32 Global Step: 680730 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:04,445-Speed 6305.62 samples/sec Loss 3.0570 LearningRate 0.0000 Epoch: 32 Global Step: 680740 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:07,696-Speed 6300.33 samples/sec Loss 3.0443 LearningRate 0.0000 Epoch: 32 Global Step: 680750 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:10,942-Speed 6312.07 samples/sec Loss 2.9845 LearningRate 0.0000 Epoch: 32 Global Step: 680760 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:14,193-Speed 6300.88 samples/sec Loss 3.0120 LearningRate 0.0000 Epoch: 32 Global Step: 680770 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:17,442-Speed 6304.49 samples/sec Loss 3.0233 LearningRate 0.0000 Epoch: 32 Global Step: 680780 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:39:20,692-Speed 6302.82 samples/sec Loss 3.0385 LearningRate 0.0000 Epoch: 32 Global Step: 680790 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:23,941-Speed 6304.28 samples/sec Loss 3.0780 LearningRate 0.0000 Epoch: 32 Global Step: 680800 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:27,192-Speed 6301.12 samples/sec Loss 3.1068 LearningRate 0.0000 Epoch: 32 Global Step: 680810 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:30,434-Speed 6318.05 samples/sec Loss 3.0593 LearningRate 0.0000 Epoch: 32 Global Step: 680820 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:33,683-Speed 6306.31 samples/sec Loss 3.0406 LearningRate 0.0000 Epoch: 32 Global Step: 680830 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:36,933-Speed 6302.97 samples/sec Loss 
3.0085 LearningRate 0.0000 Epoch: 32 Global Step: 680840 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:40,182-Speed 6304.56 samples/sec Loss 3.0142 LearningRate 0.0000 Epoch: 32 Global Step: 680850 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:43,428-Speed 6311.05 samples/sec Loss 3.0198 LearningRate 0.0000 Epoch: 32 Global Step: 680860 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:46,674-Speed 6309.10 samples/sec Loss 3.0641 LearningRate 0.0000 Epoch: 32 Global Step: 680870 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:49,925-Speed 6301.42 samples/sec Loss 3.0669 LearningRate 0.0000 Epoch: 32 Global Step: 680880 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:53,167-Speed 6318.93 samples/sec Loss 3.0648 LearningRate 0.0000 Epoch: 32 Global Step: 680890 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:56,416-Speed 6304.08 samples/sec Loss 3.1116 LearningRate 0.0000 Epoch: 32 Global Step: 680900 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:39:59,669-Speed 6297.77 samples/sec Loss 3.0199 LearningRate 0.0000 Epoch: 32 Global Step: 680910 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:02,914-Speed 6313.89 samples/sec Loss 2.9906 LearningRate 0.0000 Epoch: 32 Global Step: 680920 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:06,166-Speed 6298.52 samples/sec Loss 3.0076 LearningRate 0.0000 Epoch: 32 Global Step: 680930 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:09,417-Speed 6301.09 samples/sec Loss 3.0003 LearningRate 0.0000 Epoch: 32 Global Step: 680940 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:12,674-Speed 6289.21 samples/sec Loss 3.0288 LearningRate 0.0000 Epoch: 32 Global Step: 680950 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:15,921-Speed 6308.72 samples/sec Loss 3.0563 LearningRate 0.0000 Epoch: 32 Global 
Step: 680960 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:19,163-Speed 6318.98 samples/sec Loss 3.0579 LearningRate 0.0000 Epoch: 32 Global Step: 680970 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:22,411-Speed 6307.20 samples/sec Loss 3.0275 LearningRate 0.0000 Epoch: 32 Global Step: 680980 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:25,670-Speed 6285.66 samples/sec Loss 3.0229 LearningRate 0.0000 Epoch: 32 Global Step: 680990 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:28,917-Speed 6307.44 samples/sec Loss 3.0630 LearningRate 0.0000 Epoch: 32 Global Step: 681000 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:32,166-Speed 6305.18 samples/sec Loss 3.0747 LearningRate 0.0000 Epoch: 32 Global Step: 681010 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:35,412-Speed 6310.23 samples/sec Loss 3.0072 LearningRate 0.0000 Epoch: 32 Global Step: 681020 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:38,663-Speed 6302.26 samples/sec Loss 3.0038 LearningRate 0.0000 Epoch: 32 Global Step: 681030 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:41,909-Speed 6310.95 samples/sec Loss 3.0527 LearningRate 0.0000 Epoch: 32 Global Step: 681040 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:45,162-Speed 6295.22 samples/sec Loss 3.0834 LearningRate 0.0000 Epoch: 32 Global Step: 681050 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:48,412-Speed 6304.83 samples/sec Loss 3.0335 LearningRate 0.0000 Epoch: 32 Global Step: 681060 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-03 05:40:51,659-Speed 6307.56 samples/sec Loss 3.0378 LearningRate 0.0000 Epoch: 32 Global Step: 681070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:40:54,909-Speed 6303.98 samples/sec Loss 3.0328 LearningRate 0.0000 Epoch: 32 Global Step: 681080 Fp16 Grad Scale: 8192 
Required: 14 hours Training: 2022-04-03 05:40:58,157-Speed 6305.81 samples/sec Loss 3.0711 LearningRate 0.0000 Epoch: 32 Global Step: 681090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:41:01,403-Speed 6311.19 samples/sec Loss 3.0294 LearningRate 0.0000 Epoch: 32 Global Step: 681100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:41:04,652-Speed 6305.05 samples/sec Loss 3.0784 LearningRate 0.0000 Epoch: 32 Global Step: 681110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-03 05:41:07,900-Speed 6306.90 samples/sec Loss 3.0583 LearningRate 0.0000 Epoch: 32 Global Step: 681120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:11,150-Speed 6301.94 samples/sec Loss 3.0515 LearningRate 0.0000 Epoch: 32 Global Step: 681130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:14,394-Speed 6313.87 samples/sec Loss 3.0469 LearningRate 0.0000 Epoch: 32 Global Step: 681140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:17,635-Speed 6321.51 samples/sec Loss 3.0464 LearningRate 0.0000 Epoch: 32 Global Step: 681150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:20,884-Speed 6306.25 samples/sec Loss 2.9905 LearningRate 0.0000 Epoch: 32 Global Step: 681160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:24,130-Speed 6310.42 samples/sec Loss 2.9792 LearningRate 0.0000 Epoch: 32 Global Step: 681170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:27,376-Speed 6309.90 samples/sec Loss 3.0740 LearningRate 0.0000 Epoch: 32 Global Step: 681180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:30,634-Speed 6288.19 samples/sec Loss 3.0673 LearningRate 0.0000 Epoch: 32 Global Step: 681190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:33,892-Speed 6287.45 samples/sec Loss 3.0905 LearningRate 0.0000 Epoch: 32 Global Step: 681200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
05:41:37,134-Speed 6319.29 samples/sec Loss 2.9633 LearningRate 0.0000 Epoch: 32 Global Step: 681210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:40,386-Speed 6298.13 samples/sec Loss 3.0293 LearningRate 0.0000 Epoch: 32 Global Step: 681220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:43,639-Speed 6296.15 samples/sec Loss 3.0057 LearningRate 0.0000 Epoch: 32 Global Step: 681230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:46,884-Speed 6312.82 samples/sec Loss 3.0064 LearningRate 0.0000 Epoch: 32 Global Step: 681240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:41:50,139-Speed 6293.08 samples/sec Loss 3.0735 LearningRate 0.0000 Epoch: 32 Global Step: 681250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:53,386-Speed 6310.53 samples/sec Loss 3.0182 LearningRate 0.0000 Epoch: 32 Global Step: 681260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:56,633-Speed 6308.70 samples/sec Loss 3.0128 LearningRate 0.0000 Epoch: 32 Global Step: 681270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:41:59,886-Speed 6296.98 samples/sec Loss 3.0408 LearningRate 0.0000 Epoch: 32 Global Step: 681280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:03,143-Speed 6289.46 samples/sec Loss 2.9878 LearningRate 0.0000 Epoch: 32 Global Step: 681290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:06,396-Speed 6297.37 samples/sec Loss 3.0563 LearningRate 0.0000 Epoch: 32 Global Step: 681300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:09,644-Speed 6306.54 samples/sec Loss 3.0258 LearningRate 0.0000 Epoch: 32 Global Step: 681310 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:12,894-Speed 6301.81 samples/sec Loss 3.0257 LearningRate 0.0000 Epoch: 32 Global Step: 681320 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:16,141-Speed 6308.59 samples/sec Loss 
3.0415 LearningRate 0.0000 Epoch: 32 Global Step: 681330 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:19,392-Speed 6300.82 samples/sec Loss 3.0263 LearningRate 0.0000 Epoch: 32 Global Step: 681340 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:22,645-Speed 6297.76 samples/sec Loss 3.0346 LearningRate 0.0000 Epoch: 32 Global Step: 681350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-03 05:42:25,880-Speed 6333.22 samples/sec Loss 2.9577 LearningRate 0.0000 Epoch: 32 Global Step: 681360 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:29,123-Speed 6316.19 samples/sec Loss 3.0726 LearningRate 0.0000 Epoch: 32 Global Step: 681370 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:32,366-Speed 6315.71 samples/sec Loss 3.0448 LearningRate 0.0000 Epoch: 32 Global Step: 681380 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:35,613-Speed 6309.89 samples/sec Loss 3.0376 LearningRate 0.0000 Epoch: 32 Global Step: 681390 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:38,856-Speed 6317.21 samples/sec Loss 3.0184 LearningRate 0.0000 Epoch: 32 Global Step: 681400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:42,100-Speed 6314.31 samples/sec Loss 2.9918 LearningRate 0.0000 Epoch: 32 Global Step: 681410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:45,342-Speed 6317.23 samples/sec Loss 3.0839 LearningRate 0.0000 Epoch: 32 Global Step: 681420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:48,601-Speed 6285.92 samples/sec Loss 3.0259 LearningRate 0.0000 Epoch: 32 Global Step: 681430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:51,852-Speed 6300.84 samples/sec Loss 3.0106 LearningRate 0.0000 Epoch: 32 Global Step: 681440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:55,097-Speed 6312.63 samples/sec Loss 3.0636 LearningRate 0.0000 Epoch: 32 
Global Step: 681450 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:42:58,335-Speed 6326.85 samples/sec Loss 3.0239 LearningRate 0.0000 Epoch: 32 Global Step: 681460 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:43:01,586-Speed 6301.81 samples/sec Loss 3.0199 LearningRate 0.0000 Epoch: 32 Global Step: 681470 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:43:04,817-Speed 6339.14 samples/sec Loss 2.9881 LearningRate 0.0000 Epoch: 32 Global Step: 681480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:08,072-Speed 6293.75 samples/sec Loss 2.9939 LearningRate 0.0000 Epoch: 32 Global Step: 681490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:11,323-Speed 6300.27 samples/sec Loss 3.0239 LearningRate 0.0000 Epoch: 32 Global Step: 681500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:14,572-Speed 6304.93 samples/sec Loss 3.0252 LearningRate 0.0000 Epoch: 32 Global Step: 681510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:17,820-Speed 6306.72 samples/sec Loss 3.0489 LearningRate 0.0000 Epoch: 32 Global Step: 681520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:21,070-Speed 6303.10 samples/sec Loss 2.9469 LearningRate 0.0000 Epoch: 32 Global Step: 681530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:24,319-Speed 6304.21 samples/sec Loss 3.0044 LearningRate 0.0000 Epoch: 32 Global Step: 681540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:27,565-Speed 6311.29 samples/sec Loss 3.0160 LearningRate 0.0000 Epoch: 32 Global Step: 681550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:30,820-Speed 6293.51 samples/sec Loss 3.0748 LearningRate 0.0000 Epoch: 32 Global Step: 681560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:34,068-Speed 6306.01 samples/sec Loss 3.0175 LearningRate 0.0000 Epoch: 32 Global Step: 681570 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 05:43:37,315-Speed 6309.40 samples/sec Loss 3.0166 LearningRate 0.0000 Epoch: 32 Global Step: 681580 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:43:40,567-Speed 6298.88 samples/sec Loss 3.0117 LearningRate 0.0000 Epoch: 32 Global Step: 681590 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:43:43,814-Speed 6309.67 samples/sec Loss 3.0932 LearningRate 0.0000 Epoch: 32 Global Step: 681600 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:43:47,071-Speed 6289.59 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 32 Global Step: 681610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:50,332-Speed 6280.82 samples/sec Loss 3.0078 LearningRate 0.0000 Epoch: 32 Global Step: 681620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:53,578-Speed 6310.97 samples/sec Loss 3.0209 LearningRate 0.0000 Epoch: 32 Global Step: 681630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:43:56,832-Speed 6295.10 samples/sec Loss 3.0584 LearningRate 0.0000 Epoch: 32 Global Step: 681640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:00,089-Speed 6290.68 samples/sec Loss 3.0500 LearningRate 0.0000 Epoch: 32 Global Step: 681650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:03,340-Speed 6300.65 samples/sec Loss 3.0428 LearningRate 0.0000 Epoch: 32 Global Step: 681660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:06,592-Speed 6299.26 samples/sec Loss 3.0382 LearningRate 0.0000 Epoch: 32 Global Step: 681670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:09,837-Speed 6312.54 samples/sec Loss 3.0246 LearningRate 0.0000 Epoch: 32 Global Step: 681680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:13,090-Speed 6296.37 samples/sec Loss 3.0606 LearningRate 0.0000 Epoch: 32 Global Step: 681690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
05:44:16,338-Speed 6306.84 samples/sec Loss 3.0244 LearningRate 0.0000 Epoch: 32 Global Step: 681700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:19,588-Speed 6303.34 samples/sec Loss 3.0356 LearningRate 0.0000 Epoch: 32 Global Step: 681710 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:22,837-Speed 6304.02 samples/sec Loss 2.9806 LearningRate 0.0000 Epoch: 32 Global Step: 681720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:26,087-Speed 6304.35 samples/sec Loss 2.9881 LearningRate 0.0000 Epoch: 32 Global Step: 681730 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:29,331-Speed 6313.34 samples/sec Loss 2.9677 LearningRate 0.0000 Epoch: 32 Global Step: 681740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:32,585-Speed 6294.91 samples/sec Loss 3.0763 LearningRate 0.0000 Epoch: 32 Global Step: 681750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:35,836-Speed 6302.04 samples/sec Loss 3.0173 LearningRate 0.0000 Epoch: 32 Global Step: 681760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:44:39,071-Speed 6332.50 samples/sec Loss 3.0480 LearningRate 0.0000 Epoch: 32 Global Step: 681770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:42,318-Speed 6309.00 samples/sec Loss 3.0786 LearningRate 0.0000 Epoch: 32 Global Step: 681780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:45,568-Speed 6301.15 samples/sec Loss 2.9766 LearningRate 0.0000 Epoch: 32 Global Step: 681790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:48,821-Speed 6298.09 samples/sec Loss 3.0201 LearningRate 0.0000 Epoch: 32 Global Step: 681800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:52,066-Speed 6313.13 samples/sec Loss 3.0442 LearningRate 0.0000 Epoch: 32 Global Step: 681810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:55,321-Speed 6294.89 samples/sec Loss 
3.0481 LearningRate 0.0000 Epoch: 32 Global Step: 681820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:44:58,573-Speed 6299.12 samples/sec Loss 3.0319 LearningRate 0.0000 Epoch: 32 Global Step: 681830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:45:01,823-Speed 6302.18 samples/sec Loss 3.0396 LearningRate 0.0000 Epoch: 32 Global Step: 681840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:45:05,078-Speed 6292.18 samples/sec Loss 3.0402 LearningRate 0.0000 Epoch: 32 Global Step: 681850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:45:08,330-Speed 6299.32 samples/sec Loss 2.9935 LearningRate 0.0000 Epoch: 32 Global Step: 681860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:45:11,580-Speed 6303.22 samples/sec Loss 2.9883 LearningRate 0.0000 Epoch: 32 Global Step: 681870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:14,836-Speed 6291.06 samples/sec Loss 3.0556 LearningRate 0.0000 Epoch: 32 Global Step: 681880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:18,085-Speed 6306.15 samples/sec Loss 3.0346 LearningRate 0.0000 Epoch: 32 Global Step: 681890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:21,332-Speed 6307.93 samples/sec Loss 3.0482 LearningRate 0.0000 Epoch: 32 Global Step: 681900 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:24,580-Speed 6306.90 samples/sec Loss 3.0448 LearningRate 0.0000 Epoch: 32 Global Step: 681910 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:27,837-Speed 6288.85 samples/sec Loss 2.9814 LearningRate 0.0000 Epoch: 32 Global Step: 681920 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:31,082-Speed 6313.89 samples/sec Loss 3.0469 LearningRate 0.0000 Epoch: 32 Global Step: 681930 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:34,334-Speed 6298.10 samples/sec Loss 3.0334 LearningRate 0.0000 Epoch: 32 Global 
Step: 681940 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:37,606-Speed 6260.65 samples/sec Loss 3.0253 LearningRate 0.0000 Epoch: 32 Global Step: 681950 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:40,887-Speed 6243.65 samples/sec Loss 3.0779 LearningRate 0.0000 Epoch: 32 Global Step: 681960 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:44,127-Speed 6322.95 samples/sec Loss 3.0253 LearningRate 0.0000 Epoch: 32 Global Step: 681970 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:47,372-Speed 6312.44 samples/sec Loss 3.0523 LearningRate 0.0000 Epoch: 32 Global Step: 681980 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:50,616-Speed 6314.04 samples/sec Loss 3.0313 LearningRate 0.0000 Epoch: 32 Global Step: 681990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:53,863-Speed 6310.14 samples/sec Loss 3.0190 LearningRate 0.0000 Epoch: 32 Global Step: 682000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:45:57,110-Speed 6308.27 samples/sec Loss 3.0193 LearningRate 0.0000 Epoch: 32 Global Step: 682010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:00,362-Speed 6299.24 samples/sec Loss 3.0443 LearningRate 0.0000 Epoch: 32 Global Step: 682020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:03,614-Speed 6298.77 samples/sec Loss 3.0444 LearningRate 0.0000 Epoch: 32 Global Step: 682030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:06,867-Speed 6296.99 samples/sec Loss 3.0172 LearningRate 0.0000 Epoch: 32 Global Step: 682040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:10,116-Speed 6305.78 samples/sec Loss 3.0778 LearningRate 0.0000 Epoch: 32 Global Step: 682050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:13,366-Speed 6303.20 samples/sec Loss 3.0672 LearningRate 0.0000 Epoch: 32 Global Step: 682060 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 05:46:16,602-Speed 6330.22 samples/sec Loss 3.0806 LearningRate 0.0000 Epoch: 32 Global Step: 682070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:19,850-Speed 6307.03 samples/sec Loss 3.0233 LearningRate 0.0000 Epoch: 32 Global Step: 682080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:23,107-Speed 6288.26 samples/sec Loss 3.0559 LearningRate 0.0000 Epoch: 32 Global Step: 682090 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:26,354-Speed 6308.63 samples/sec Loss 3.0543 LearningRate 0.0000 Epoch: 32 Global Step: 682100 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:29,606-Speed 6299.02 samples/sec Loss 2.9994 LearningRate 0.0000 Epoch: 32 Global Step: 682110 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:32,868-Speed 6281.12 samples/sec Loss 3.0884 LearningRate 0.0000 Epoch: 32 Global Step: 682120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:36,118-Speed 6304.53 samples/sec Loss 2.9932 LearningRate 0.0000 Epoch: 32 Global Step: 682130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:39,368-Speed 6304.42 samples/sec Loss 3.0109 LearningRate 0.0000 Epoch: 32 Global Step: 682140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:42,617-Speed 6303.09 samples/sec Loss 3.0467 LearningRate 0.0000 Epoch: 32 Global Step: 682150 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:45,877-Speed 6285.44 samples/sec Loss 3.0099 LearningRate 0.0000 Epoch: 32 Global Step: 682160 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:49,111-Speed 6333.75 samples/sec Loss 2.9939 LearningRate 0.0000 Epoch: 32 Global Step: 682170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:52,367-Speed 6290.15 samples/sec Loss 3.0169 LearningRate 0.0000 Epoch: 32 Global Step: 682180 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
05:46:55,618-Speed 6301.73 samples/sec Loss 2.9348 LearningRate 0.0000 Epoch: 32 Global Step: 682190 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:46:58,874-Speed 6291.30 samples/sec Loss 3.0042 LearningRate 0.0000 Epoch: 32 Global Step: 682200 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:02,121-Speed 6308.24 samples/sec Loss 3.0357 LearningRate 0.0000 Epoch: 32 Global Step: 682210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:05,366-Speed 6312.23 samples/sec Loss 3.0390 LearningRate 0.0000 Epoch: 32 Global Step: 682220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:08,615-Speed 6305.14 samples/sec Loss 3.0012 LearningRate 0.0000 Epoch: 32 Global Step: 682230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:11,866-Speed 6301.75 samples/sec Loss 3.0251 LearningRate 0.0000 Epoch: 32 Global Step: 682240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:15,121-Speed 6292.91 samples/sec Loss 3.0326 LearningRate 0.0000 Epoch: 32 Global Step: 682250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:18,374-Speed 6297.64 samples/sec Loss 3.0302 LearningRate 0.0000 Epoch: 32 Global Step: 682260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:47:21,623-Speed 6305.40 samples/sec Loss 3.0703 LearningRate 0.0000 Epoch: 32 Global Step: 682270 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-03 05:47:24,848-Speed 6351.68 samples/sec Loss 3.0140 LearningRate 0.0000 Epoch: 32 Global Step: 682280 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:28,099-Speed 6301.08 samples/sec Loss 3.0268 LearningRate 0.0000 Epoch: 32 Global Step: 682290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:31,357-Speed 6286.54 samples/sec Loss 3.0219 LearningRate 0.0000 Epoch: 32 Global Step: 682300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:34,614-Speed 6290.39 samples/sec 
Loss 2.9592 LearningRate 0.0000 Epoch: 32 Global Step: 682310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:37,866-Speed 6299.29 samples/sec Loss 3.0355 LearningRate 0.0000 Epoch: 32 Global Step: 682320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:41,118-Speed 6298.36 samples/sec Loss 3.0600 LearningRate 0.0000 Epoch: 32 Global Step: 682330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:44,368-Speed 6302.81 samples/sec Loss 3.0512 LearningRate 0.0000 Epoch: 32 Global Step: 682340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:47,620-Speed 6299.04 samples/sec Loss 3.0419 LearningRate 0.0000 Epoch: 32 Global Step: 682350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:50,872-Speed 6299.91 samples/sec Loss 3.0485 LearningRate 0.0000 Epoch: 32 Global Step: 682360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:54,145-Speed 6258.45 samples/sec Loss 3.0006 LearningRate 0.0000 Epoch: 32 Global Step: 682370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:47:57,390-Speed 6312.71 samples/sec Loss 3.0226 LearningRate 0.0000 Epoch: 32 Global Step: 682380 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:00,644-Speed 6294.99 samples/sec Loss 3.0004 LearningRate 0.0000 Epoch: 32 Global Step: 682390 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:03,897-Speed 6296.72 samples/sec Loss 3.0347 LearningRate 0.0000 Epoch: 32 Global Step: 682400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:07,137-Speed 6323.20 samples/sec Loss 3.0364 LearningRate 0.0000 Epoch: 32 Global Step: 682410 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:10,390-Speed 6296.13 samples/sec Loss 3.0126 LearningRate 0.0000 Epoch: 32 Global Step: 682420 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:13,642-Speed 6299.96 samples/sec Loss 3.0133 LearningRate 0.0000 Epoch: 32 
Global Step: 682430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:16,899-Speed 6288.52 samples/sec Loss 3.0088 LearningRate 0.0000 Epoch: 32 Global Step: 682440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:20,148-Speed 6306.08 samples/sec Loss 3.0496 LearningRate 0.0000 Epoch: 32 Global Step: 682450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:23,397-Speed 6303.64 samples/sec Loss 3.0257 LearningRate 0.0000 Epoch: 32 Global Step: 682460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:26,651-Speed 6295.79 samples/sec Loss 3.0395 LearningRate 0.0000 Epoch: 32 Global Step: 682470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:29,902-Speed 6300.52 samples/sec Loss 2.9898 LearningRate 0.0000 Epoch: 32 Global Step: 682480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:33,156-Speed 6295.84 samples/sec Loss 3.0381 LearningRate 0.0000 Epoch: 32 Global Step: 682490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:36,416-Speed 6284.72 samples/sec Loss 3.0809 LearningRate 0.0000 Epoch: 32 Global Step: 682500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:48:39,666-Speed 6303.22 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 32 Global Step: 682510 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:42,910-Speed 6314.40 samples/sec Loss 2.9867 LearningRate 0.0000 Epoch: 32 Global Step: 682520 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:46,156-Speed 6310.95 samples/sec Loss 3.0174 LearningRate 0.0000 Epoch: 32 Global Step: 682530 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:49,406-Speed 6302.37 samples/sec Loss 3.0747 LearningRate 0.0000 Epoch: 32 Global Step: 682540 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:52,666-Speed 6286.81 samples/sec Loss 3.0340 LearningRate 0.0000 Epoch: 32 Global Step: 682550 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 05:48:55,919-Speed 6297.15 samples/sec Loss 3.0171 LearningRate 0.0000 Epoch: 32 Global Step: 682560 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:48:59,151-Speed 6337.12 samples/sec Loss 3.0611 LearningRate 0.0000 Epoch: 32 Global Step: 682570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:02,411-Speed 6283.43 samples/sec Loss 3.0441 LearningRate 0.0000 Epoch: 32 Global Step: 682580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:05,666-Speed 6293.01 samples/sec Loss 3.0352 LearningRate 0.0000 Epoch: 32 Global Step: 682590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:08,920-Speed 6295.59 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 32 Global Step: 682600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:12,176-Speed 6291.69 samples/sec Loss 3.0011 LearningRate 0.0000 Epoch: 32 Global Step: 682610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:15,428-Speed 6297.57 samples/sec Loss 3.0160 LearningRate 0.0000 Epoch: 32 Global Step: 682620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:18,675-Speed 6309.54 samples/sec Loss 3.0587 LearningRate 0.0000 Epoch: 32 Global Step: 682630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:21,922-Speed 6308.53 samples/sec Loss 3.0773 LearningRate 0.0000 Epoch: 32 Global Step: 682640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:25,179-Speed 6288.95 samples/sec Loss 3.0098 LearningRate 0.0000 Epoch: 32 Global Step: 682650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:28,423-Speed 6315.32 samples/sec Loss 2.9844 LearningRate 0.0000 Epoch: 32 Global Step: 682660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:49:31,668-Speed 6313.14 samples/sec Loss 2.9864 LearningRate 0.0000 Epoch: 32 Global Step: 682670 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
05:49:34,915-Speed 6308.28 samples/sec Loss 3.0532 LearningRate 0.0000 Epoch: 32 Global Step: 682680 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:38,165-Speed 6302.83 samples/sec Loss 2.9961 LearningRate 0.0000 Epoch: 32 Global Step: 682690 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:41,420-Speed 6294.37 samples/sec Loss 3.0298 LearningRate 0.0000 Epoch: 32 Global Step: 682700 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:44,665-Speed 6311.29 samples/sec Loss 3.0544 LearningRate 0.0000 Epoch: 32 Global Step: 682710 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:47,916-Speed 6302.87 samples/sec Loss 3.0162 LearningRate 0.0000 Epoch: 32 Global Step: 682720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:51,163-Speed 6308.28 samples/sec Loss 3.0451 LearningRate 0.0000 Epoch: 32 Global Step: 682730 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:54,415-Speed 6299.11 samples/sec Loss 3.0178 LearningRate 0.0000 Epoch: 32 Global Step: 682740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:49:57,668-Speed 6297.92 samples/sec Loss 3.0575 LearningRate 0.0000 Epoch: 32 Global Step: 682750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:00,916-Speed 6307.12 samples/sec Loss 3.0621 LearningRate 0.0000 Epoch: 32 Global Step: 682760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:04,159-Speed 6316.28 samples/sec Loss 3.0604 LearningRate 0.0000 Epoch: 32 Global Step: 682770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:07,407-Speed 6306.26 samples/sec Loss 2.9755 LearningRate 0.0000 Epoch: 32 Global Step: 682780 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:10,657-Speed 6302.56 samples/sec Loss 3.0475 LearningRate 0.0000 Epoch: 32 Global Step: 682790 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:13,887-Speed 6341.84 samples/sec Loss 
3.0011 LearningRate 0.0000 Epoch: 32 Global Step: 682800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:17,136-Speed 6304.85 samples/sec Loss 2.9935 LearningRate 0.0000 Epoch: 32 Global Step: 682810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:20,390-Speed 6296.11 samples/sec Loss 2.9897 LearningRate 0.0000 Epoch: 32 Global Step: 682820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:23,640-Speed 6303.24 samples/sec Loss 3.0355 LearningRate 0.0000 Epoch: 32 Global Step: 682830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:26,888-Speed 6304.84 samples/sec Loss 2.9692 LearningRate 0.0000 Epoch: 32 Global Step: 682840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:30,140-Speed 6299.44 samples/sec Loss 3.0503 LearningRate 0.0000 Epoch: 32 Global Step: 682850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:33,384-Speed 6314.81 samples/sec Loss 3.0863 LearningRate 0.0000 Epoch: 32 Global Step: 682860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:36,638-Speed 6295.80 samples/sec Loss 3.0876 LearningRate 0.0000 Epoch: 32 Global Step: 682870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:39,887-Speed 6304.61 samples/sec Loss 2.9825 LearningRate 0.0000 Epoch: 32 Global Step: 682880 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:43,145-Speed 6288.40 samples/sec Loss 3.0507 LearningRate 0.0000 Epoch: 32 Global Step: 682890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:50:46,391-Speed 6310.37 samples/sec Loss 3.0901 LearningRate 0.0000 Epoch: 32 Global Step: 682900 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:49,640-Speed 6304.24 samples/sec Loss 3.0004 LearningRate 0.0000 Epoch: 32 Global Step: 682910 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:52,894-Speed 6296.09 samples/sec Loss 3.0638 LearningRate 0.0000 Epoch: 32 Global 
Step: 682920 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:56,170-Speed 6251.74 samples/sec Loss 3.0273 LearningRate 0.0000 Epoch: 32 Global Step: 682930 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:50:59,532-Speed 6095.03 samples/sec Loss 3.0640 LearningRate 0.0000 Epoch: 32 Global Step: 682940 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:02,836-Speed 6198.57 samples/sec Loss 3.0864 LearningRate 0.0000 Epoch: 32 Global Step: 682950 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:06,073-Speed 6329.21 samples/sec Loss 3.0120 LearningRate 0.0000 Epoch: 32 Global Step: 682960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:09,316-Speed 6316.59 samples/sec Loss 3.0567 LearningRate 0.0000 Epoch: 32 Global Step: 682970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:12,567-Speed 6301.24 samples/sec Loss 3.0260 LearningRate 0.0000 Epoch: 32 Global Step: 682980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:15,815-Speed 6307.28 samples/sec Loss 3.0373 LearningRate 0.0000 Epoch: 32 Global Step: 682990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:19,056-Speed 6318.75 samples/sec Loss 3.0382 LearningRate 0.0000 Epoch: 32 Global Step: 683000 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:22,305-Speed 6306.06 samples/sec Loss 3.0511 LearningRate 0.0000 Epoch: 32 Global Step: 683010 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:25,550-Speed 6313.07 samples/sec Loss 3.0584 LearningRate 0.0000 Epoch: 32 Global Step: 683020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:28,798-Speed 6307.36 samples/sec Loss 3.0269 LearningRate 0.0000 Epoch: 32 Global Step: 683030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:32,044-Speed 6309.95 samples/sec Loss 2.9534 LearningRate 0.0000 Epoch: 32 Global Step: 683040 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 05:51:35,296-Speed 6298.53 samples/sec Loss 3.1156 LearningRate 0.0000 Epoch: 32 Global Step: 683050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:51:38,550-Speed 6294.89 samples/sec Loss 3.0974 LearningRate 0.0000 Epoch: 32 Global Step: 683060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:41,801-Speed 6301.85 samples/sec Loss 2.9848 LearningRate 0.0000 Epoch: 32 Global Step: 683070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:45,050-Speed 6305.31 samples/sec Loss 3.0129 LearningRate 0.0000 Epoch: 32 Global Step: 683080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:48,304-Speed 6295.13 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 32 Global Step: 683090 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:51,552-Speed 6306.19 samples/sec Loss 3.0447 LearningRate 0.0000 Epoch: 32 Global Step: 683100 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:54,804-Speed 6298.67 samples/sec Loss 2.9809 LearningRate 0.0000 Epoch: 32 Global Step: 683110 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:51:58,052-Speed 6308.25 samples/sec Loss 2.9972 LearningRate 0.0000 Epoch: 32 Global Step: 683120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:01,296-Speed 6313.41 samples/sec Loss 3.0014 LearningRate 0.0000 Epoch: 32 Global Step: 683130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:04,545-Speed 6305.79 samples/sec Loss 2.9254 LearningRate 0.0000 Epoch: 32 Global Step: 683140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:07,789-Speed 6314.12 samples/sec Loss 3.0229 LearningRate 0.0000 Epoch: 32 Global Step: 683150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:11,042-Speed 6296.91 samples/sec Loss 3.0335 LearningRate 0.0000 Epoch: 32 Global Step: 683160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
05:52:14,289-Speed 6309.69 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 32 Global Step: 683170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:17,538-Speed 6304.63 samples/sec Loss 3.0154 LearningRate 0.0000 Epoch: 32 Global Step: 683180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:20,786-Speed 6306.44 samples/sec Loss 2.9972 LearningRate 0.0000 Epoch: 32 Global Step: 683190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:24,037-Speed 6301.35 samples/sec Loss 3.0655 LearningRate 0.0000 Epoch: 32 Global Step: 683200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:27,282-Speed 6312.83 samples/sec Loss 2.9800 LearningRate 0.0000 Epoch: 32 Global Step: 683210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:30,521-Speed 6324.20 samples/sec Loss 3.0421 LearningRate 0.0000 Epoch: 32 Global Step: 683220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:33,773-Speed 6299.39 samples/sec Loss 2.9803 LearningRate 0.0000 Epoch: 32 Global Step: 683230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:37,020-Speed 6308.33 samples/sec Loss 3.0404 LearningRate 0.0000 Epoch: 32 Global Step: 683240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:40,268-Speed 6307.74 samples/sec Loss 3.0414 LearningRate 0.0000 Epoch: 32 Global Step: 683250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:43,517-Speed 6303.99 samples/sec Loss 2.9823 LearningRate 0.0000 Epoch: 32 Global Step: 683260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:46,768-Speed 6302.31 samples/sec Loss 2.9868 LearningRate 0.0000 Epoch: 32 Global Step: 683270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:50,018-Speed 6301.70 samples/sec Loss 3.0146 LearningRate 0.0000 Epoch: 32 Global Step: 683280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:52:53,251-Speed 6336.20 samples/sec Loss 
3.1256 LearningRate 0.0000 Epoch: 32 Global Step: 683290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:56,499-Speed 6306.12 samples/sec Loss 2.9486 LearningRate 0.0000 Epoch: 32 Global Step: 683300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:52:59,749-Speed 6304.20 samples/sec Loss 3.0523 LearningRate 0.0000 Epoch: 32 Global Step: 683310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:03,002-Speed 6297.83 samples/sec Loss 3.0216 LearningRate 0.0000 Epoch: 32 Global Step: 683320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:06,247-Speed 6311.68 samples/sec Loss 2.9805 LearningRate 0.0000 Epoch: 32 Global Step: 683330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:09,490-Speed 6316.43 samples/sec Loss 3.0379 LearningRate 0.0000 Epoch: 32 Global Step: 683340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:12,738-Speed 6306.86 samples/sec Loss 3.0564 LearningRate 0.0000 Epoch: 32 Global Step: 683350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:15,990-Speed 6299.65 samples/sec Loss 3.0336 LearningRate 0.0000 Epoch: 32 Global Step: 683360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:19,242-Speed 6297.89 samples/sec Loss 3.0780 LearningRate 0.0000 Epoch: 32 Global Step: 683370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:22,489-Speed 6310.20 samples/sec Loss 3.0783 LearningRate 0.0000 Epoch: 32 Global Step: 683380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:53:25,735-Speed 6308.95 samples/sec Loss 3.0075 LearningRate 0.0000 Epoch: 32 Global Step: 683390 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:28,987-Speed 6300.04 samples/sec Loss 2.9941 LearningRate 0.0000 Epoch: 32 Global Step: 683400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:32,234-Speed 6308.93 samples/sec Loss 2.9928 LearningRate 0.0000 Epoch: 32 Global 
Step: 683410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:35,482-Speed 6307.04 samples/sec Loss 3.0446 LearningRate 0.0000 Epoch: 32 Global Step: 683420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:38,731-Speed 6305.26 samples/sec Loss 3.0594 LearningRate 0.0000 Epoch: 32 Global Step: 683430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:41,978-Speed 6308.88 samples/sec Loss 2.9458 LearningRate 0.0000 Epoch: 32 Global Step: 683440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:45,228-Speed 6303.66 samples/sec Loss 3.0460 LearningRate 0.0000 Epoch: 32 Global Step: 683450 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:48,479-Speed 6300.89 samples/sec Loss 2.9920 LearningRate 0.0000 Epoch: 32 Global Step: 683460 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:51,732-Speed 6300.05 samples/sec Loss 3.0821 LearningRate 0.0000 Epoch: 32 Global Step: 683470 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:54,981-Speed 6304.18 samples/sec Loss 3.0100 LearningRate 0.0000 Epoch: 32 Global Step: 683480 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:53:58,232-Speed 6301.91 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 32 Global Step: 683490 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-03 05:54:01,464-Speed 6336.60 samples/sec Loss 3.0784 LearningRate 0.0000 Epoch: 32 Global Step: 683500 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:54:04,722-Speed 6287.60 samples/sec Loss 3.0299 LearningRate 0.0000 Epoch: 32 Global Step: 683510 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:54:07,970-Speed 6307.01 samples/sec Loss 3.0346 LearningRate 0.0000 Epoch: 32 Global Step: 683520 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:54:11,276-Speed 6197.06 samples/sec Loss 2.9980 LearningRate 0.0000 Epoch: 32 Global Step: 683530 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 05:54:14,567-Speed 6223.01 samples/sec Loss 3.0143 LearningRate 0.0000 Epoch: 32 Global Step: 683540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:17,819-Speed 6299.60 samples/sec Loss 3.0651 LearningRate 0.0000 Epoch: 32 Global Step: 683550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:21,065-Speed 6309.64 samples/sec Loss 3.0309 LearningRate 0.0000 Epoch: 32 Global Step: 683560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:24,312-Speed 6313.19 samples/sec Loss 3.0023 LearningRate 0.0000 Epoch: 32 Global Step: 683570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:27,571-Speed 6286.07 samples/sec Loss 2.9761 LearningRate 0.0000 Epoch: 32 Global Step: 683580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:30,823-Speed 6298.90 samples/sec Loss 3.0550 LearningRate 0.0000 Epoch: 32 Global Step: 683590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:34,071-Speed 6307.09 samples/sec Loss 3.0012 LearningRate 0.0000 Epoch: 32 Global Step: 683600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:37,330-Speed 6283.90 samples/sec Loss 3.0937 LearningRate 0.0000 Epoch: 32 Global Step: 683610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:40,575-Speed 6313.10 samples/sec Loss 3.0052 LearningRate 0.0000 Epoch: 32 Global Step: 683620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:43,826-Speed 6301.54 samples/sec Loss 2.9652 LearningRate 0.0000 Epoch: 32 Global Step: 683630 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:54:47,062-Speed 6332.14 samples/sec Loss 3.0288 LearningRate 0.0000 Epoch: 32 Global Step: 683640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:50,306-Speed 6313.72 samples/sec Loss 3.0315 LearningRate 0.0000 Epoch: 32 Global Step: 683650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
05:54:53,552-Speed 6310.54 samples/sec Loss 2.9947 LearningRate 0.0000 Epoch: 32 Global Step: 683660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:54:56,796-Speed 6315.71 samples/sec Loss 3.0806 LearningRate 0.0000 Epoch: 32 Global Step: 683670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:00,042-Speed 6310.84 samples/sec Loss 3.0441 LearningRate 0.0000 Epoch: 32 Global Step: 683680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:03,291-Speed 6304.06 samples/sec Loss 3.0134 LearningRate 0.0000 Epoch: 32 Global Step: 683690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:06,542-Speed 6301.76 samples/sec Loss 3.0172 LearningRate 0.0000 Epoch: 32 Global Step: 683700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:09,793-Speed 6301.04 samples/sec Loss 2.9660 LearningRate 0.0000 Epoch: 32 Global Step: 683710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:13,042-Speed 6303.59 samples/sec Loss 2.9987 LearningRate 0.0000 Epoch: 32 Global Step: 683720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:16,292-Speed 6303.23 samples/sec Loss 3.0136 LearningRate 0.0000 Epoch: 32 Global Step: 683730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:55:19,539-Speed 6308.21 samples/sec Loss 3.0089 LearningRate 0.0000 Epoch: 32 Global Step: 683740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:22,789-Speed 6304.53 samples/sec Loss 3.0342 LearningRate 0.0000 Epoch: 32 Global Step: 683750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:26,044-Speed 6292.46 samples/sec Loss 3.0141 LearningRate 0.0000 Epoch: 32 Global Step: 683760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:29,302-Speed 6286.94 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 32 Global Step: 683770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:32,558-Speed 6292.67 samples/sec Loss 
3.0776 LearningRate 0.0000 Epoch: 32 Global Step: 683780 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:35,809-Speed 6300.48 samples/sec Loss 3.0419 LearningRate 0.0000 Epoch: 32 Global Step: 683790 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:39,066-Speed 6288.71 samples/sec Loss 3.0639 LearningRate 0.0000 Epoch: 32 Global Step: 683800 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:42,317-Speed 6302.16 samples/sec Loss 3.0888 LearningRate 0.0000 Epoch: 32 Global Step: 683810 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:45,573-Speed 6290.72 samples/sec Loss 3.0239 LearningRate 0.0000 Epoch: 32 Global Step: 683820 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:48,827-Speed 6295.01 samples/sec Loss 3.0662 LearningRate 0.0000 Epoch: 32 Global Step: 683830 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:52,058-Speed 6339.83 samples/sec Loss 2.9798 LearningRate 0.0000 Epoch: 32 Global Step: 683840 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:55,307-Speed 6305.52 samples/sec Loss 3.0551 LearningRate 0.0000 Epoch: 32 Global Step: 683850 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:55:58,568-Speed 6281.80 samples/sec Loss 3.0266 LearningRate 0.0000 Epoch: 32 Global Step: 683860 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:01,814-Speed 6311.94 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 32 Global Step: 683870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:05,060-Speed 6310.61 samples/sec Loss 3.0094 LearningRate 0.0000 Epoch: 32 Global Step: 683880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:08,314-Speed 6294.78 samples/sec Loss 3.0531 LearningRate 0.0000 Epoch: 32 Global Step: 683890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:11,573-Speed 6284.69 samples/sec Loss 3.0718 LearningRate 0.0000 Epoch: 32 Global 
Step: 683900 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:14,823-Speed 6302.49 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 32 Global Step: 683910 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:56:18,064-Speed 6320.43 samples/sec Loss 3.0273 LearningRate 0.0000 Epoch: 32 Global Step: 683920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:21,317-Speed 6297.16 samples/sec Loss 3.0172 LearningRate 0.0000 Epoch: 32 Global Step: 683930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:24,561-Speed 6315.06 samples/sec Loss 2.9783 LearningRate 0.0000 Epoch: 32 Global Step: 683940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:27,815-Speed 6294.92 samples/sec Loss 2.9939 LearningRate 0.0000 Epoch: 32 Global Step: 683950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:31,069-Speed 6295.58 samples/sec Loss 3.0225 LearningRate 0.0000 Epoch: 32 Global Step: 683960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:34,316-Speed 6308.52 samples/sec Loss 2.9955 LearningRate 0.0000 Epoch: 32 Global Step: 683970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:37,561-Speed 6312.59 samples/sec Loss 3.0108 LearningRate 0.0000 Epoch: 32 Global Step: 683980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:40,808-Speed 6308.51 samples/sec Loss 3.0486 LearningRate 0.0000 Epoch: 32 Global Step: 683990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:44,067-Speed 6286.00 samples/sec Loss 3.0465 LearningRate 0.0000 Epoch: 32 Global Step: 684000 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:47,335-Speed 6268.91 samples/sec Loss 2.9829 LearningRate 0.0000 Epoch: 32 Global Step: 684010 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:50,574-Speed 6324.04 samples/sec Loss 3.0483 LearningRate 0.0000 Epoch: 32 Global Step: 684020 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 05:56:53,828-Speed 6295.55 samples/sec Loss 3.0267 LearningRate 0.0000 Epoch: 32 Global Step: 684030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:56:57,076-Speed 6307.94 samples/sec Loss 3.0543 LearningRate 0.0000 Epoch: 32 Global Step: 684040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:00,327-Speed 6300.24 samples/sec Loss 3.0299 LearningRate 0.0000 Epoch: 32 Global Step: 684050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:03,581-Speed 6295.33 samples/sec Loss 3.0025 LearningRate 0.0000 Epoch: 32 Global Step: 684060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:06,876-Speed 6217.33 samples/sec Loss 3.0071 LearningRate 0.0000 Epoch: 32 Global Step: 684070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:10,120-Speed 6314.07 samples/sec Loss 2.9872 LearningRate 0.0000 Epoch: 32 Global Step: 684080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:13,363-Speed 6316.39 samples/sec Loss 3.0523 LearningRate 0.0000 Epoch: 32 Global Step: 684090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:16,607-Speed 6314.98 samples/sec Loss 3.0409 LearningRate 0.0000 Epoch: 32 Global Step: 684100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:19,858-Speed 6301.61 samples/sec Loss 2.9778 LearningRate 0.0000 Epoch: 32 Global Step: 684110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:23,121-Speed 6277.38 samples/sec Loss 2.9818 LearningRate 0.0000 Epoch: 32 Global Step: 684120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:57:26,363-Speed 6317.58 samples/sec Loss 3.0188 LearningRate 0.0000 Epoch: 32 Global Step: 684130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:57:29,612-Speed 6306.62 samples/sec Loss 3.0108 LearningRate 0.0000 Epoch: 32 Global Step: 684140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
05:57:32,850-Speed 6325.50 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 32 Global Step: 684150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:36,112-Speed 6279.86 samples/sec Loss 3.0395 LearningRate 0.0000 Epoch: 32 Global Step: 684160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:39,413-Speed 6205.37 samples/sec Loss 3.0897 LearningRate 0.0000 Epoch: 32 Global Step: 684170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:42,672-Speed 6285.16 samples/sec Loss 3.0024 LearningRate 0.0000 Epoch: 32 Global Step: 684180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:45,935-Speed 6278.09 samples/sec Loss 3.0243 LearningRate 0.0000 Epoch: 32 Global Step: 684190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:49,198-Speed 6278.64 samples/sec Loss 3.0390 LearningRate 0.0000 Epoch: 32 Global Step: 684200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:52,448-Speed 6302.29 samples/sec Loss 3.0370 LearningRate 0.0000 Epoch: 32 Global Step: 684210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:55,696-Speed 6306.52 samples/sec Loss 3.0202 LearningRate 0.0000 Epoch: 32 Global Step: 684220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:57:58,946-Speed 6302.72 samples/sec Loss 3.0320 LearningRate 0.0000 Epoch: 32 Global Step: 684230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:02,200-Speed 6294.99 samples/sec Loss 2.9641 LearningRate 0.0000 Epoch: 32 Global Step: 684240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:05,447-Speed 6311.53 samples/sec Loss 3.0467 LearningRate 0.0000 Epoch: 32 Global Step: 684250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:58:08,699-Speed 6300.25 samples/sec Loss 3.0351 LearningRate 0.0000 Epoch: 32 Global Step: 684260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:58:11,941-Speed 6316.76 samples/sec Loss 
2.9923 LearningRate 0.0000 Epoch: 32 Global Step: 684270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:58:15,173-Speed 6339.55 samples/sec Loss 3.0155 LearningRate 0.0000 Epoch: 32 Global Step: 684280 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:18,421-Speed 6306.70 samples/sec Loss 3.0426 LearningRate 0.0000 Epoch: 32 Global Step: 684290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:21,668-Speed 6308.86 samples/sec Loss 3.0143 LearningRate 0.0000 Epoch: 32 Global Step: 684300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:24,920-Speed 6299.77 samples/sec Loss 3.0368 LearningRate 0.0000 Epoch: 32 Global Step: 684310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:28,174-Speed 6295.21 samples/sec Loss 3.0309 LearningRate 0.0000 Epoch: 32 Global Step: 684320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:31,423-Speed 6305.48 samples/sec Loss 3.0504 LearningRate 0.0000 Epoch: 32 Global Step: 684330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:34,681-Speed 6287.84 samples/sec Loss 3.0576 LearningRate 0.0000 Epoch: 32 Global Step: 684340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:37,930-Speed 6304.15 samples/sec Loss 2.9990 LearningRate 0.0000 Epoch: 32 Global Step: 684350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:41,206-Speed 6253.30 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 32 Global Step: 684360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:44,460-Speed 6294.84 samples/sec Loss 3.0342 LearningRate 0.0000 Epoch: 32 Global Step: 684370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 05:58:47,715-Speed 6293.35 samples/sec Loss 3.0260 LearningRate 0.0000 Epoch: 32 Global Step: 684380 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:58:51,047-Speed 6147.87 samples/sec Loss 3.0463 LearningRate 0.0000 Epoch: 32 Global 
Step: 684390 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:58:54,356-Speed 6189.53 samples/sec Loss 3.0451 LearningRate 0.0000 Epoch: 32 Global Step: 684400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:59:54,347-Speed 341.39 samples/sec Loss 3.1056 LearningRate 0.0000 Epoch: 33 Global Step: 684410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 05:59:57,579-Speed 6338.10 samples/sec Loss 3.0123 LearningRate 0.0000 Epoch: 33 Global Step: 684420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:00,820-Speed 6320.14 samples/sec Loss 3.0364 LearningRate 0.0000 Epoch: 33 Global Step: 684430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:04,059-Speed 6324.50 samples/sec Loss 3.0245 LearningRate 0.0000 Epoch: 33 Global Step: 684440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:07,295-Speed 6330.54 samples/sec Loss 3.0628 LearningRate 0.0000 Epoch: 33 Global Step: 684450 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:10,530-Speed 6331.77 samples/sec Loss 3.0135 LearningRate 0.0000 Epoch: 33 Global Step: 684460 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:13,769-Speed 6325.46 samples/sec Loss 2.9877 LearningRate 0.0000 Epoch: 33 Global Step: 684470 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:16,993-Speed 6353.84 samples/sec Loss 3.0251 LearningRate 0.0000 Epoch: 33 Global Step: 684480 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:20,231-Speed 6325.96 samples/sec Loss 3.0069 LearningRate 0.0000 Epoch: 33 Global Step: 684490 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:23,480-Speed 6304.77 samples/sec Loss 2.9901 LearningRate 0.0000 Epoch: 33 Global Step: 684500 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:26,722-Speed 6316.73 samples/sec Loss 3.0325 LearningRate 0.0000 Epoch: 33 Global Step: 684510 Fp16 Grad Scale: 8192 Required: 
13 hours Training: 2022-04-03 06:00:29,963-Speed 6322.17 samples/sec Loss 3.0145 LearningRate 0.0000 Epoch: 33 Global Step: 684520 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:33,201-Speed 6325.89 samples/sec Loss 3.0366 LearningRate 0.0000 Epoch: 33 Global Step: 684530 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:36,442-Speed 6320.47 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 33 Global Step: 684540 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:39,684-Speed 6319.05 samples/sec Loss 3.0877 LearningRate 0.0000 Epoch: 33 Global Step: 684550 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:42,929-Speed 6312.49 samples/sec Loss 3.0259 LearningRate 0.0000 Epoch: 33 Global Step: 684560 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:46,173-Speed 6314.25 samples/sec Loss 2.9905 LearningRate 0.0000 Epoch: 33 Global Step: 684570 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:49,397-Speed 6353.40 samples/sec Loss 3.0287 LearningRate 0.0000 Epoch: 33 Global Step: 684580 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:52,643-Speed 6310.65 samples/sec Loss 3.0145 LearningRate 0.0000 Epoch: 33 Global Step: 684590 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:55,884-Speed 6321.57 samples/sec Loss 2.9920 LearningRate 0.0000 Epoch: 33 Global Step: 684600 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:00:59,136-Speed 6299.26 samples/sec Loss 3.0089 LearningRate 0.0000 Epoch: 33 Global Step: 684610 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:01:02,387-Speed 6301.14 samples/sec Loss 3.0777 LearningRate 0.0000 Epoch: 33 Global Step: 684620 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:01:05,638-Speed 6300.56 samples/sec Loss 3.0148 LearningRate 0.0000 Epoch: 33 Global Step: 684630 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:01:08,896-Speed 6286.88 samples/sec Loss 2.9796 LearningRate 0.0000 Epoch: 33 Global Step: 684640 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:01:12,145-Speed 6305.15 samples/sec Loss 3.0579 LearningRate 0.0000 Epoch: 33 Global Step: 684650 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:01:15,385-Speed 6322.79 samples/sec Loss 3.0312 LearningRate 0.0000 Epoch: 33 Global Step: 684660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:18,640-Speed 6294.16 samples/sec Loss 2.9825 LearningRate 0.0000 Epoch: 33 Global Step: 684670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:21,887-Speed 6307.98 samples/sec Loss 3.0089 LearningRate 0.0000 Epoch: 33 Global Step: 684680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:25,140-Speed 6295.89 samples/sec Loss 2.9848 LearningRate 0.0000 Epoch: 33 Global Step: 684690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:28,389-Speed 6306.76 samples/sec Loss 2.9788 LearningRate 0.0000 Epoch: 33 Global Step: 684700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:31,634-Speed 6311.51 samples/sec Loss 3.0353 LearningRate 0.0000 Epoch: 33 Global Step: 684710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:34,878-Speed 6315.34 samples/sec Loss 3.0028 LearningRate 0.0000 Epoch: 33 Global Step: 684720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:38,121-Speed 6315.64 samples/sec Loss 2.9937 LearningRate 0.0000 Epoch: 33 Global Step: 684730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:41,365-Speed 6316.07 samples/sec Loss 3.0384 LearningRate 0.0000 Epoch: 33 Global Step: 684740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:44,612-Speed 6309.03 samples/sec Loss 2.9646 LearningRate 0.0000 Epoch: 33 Global Step: 684750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:47,855-Speed 6316.93 samples/sec Loss 
2.9939 LearningRate 0.0000 Epoch: 33 Global Step: 684760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:01:51,083-Speed 6345.36 samples/sec Loss 3.0387 LearningRate 0.0000 Epoch: 33 Global Step: 684770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:54,327-Speed 6313.59 samples/sec Loss 3.0395 LearningRate 0.0000 Epoch: 33 Global Step: 684780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:01:57,568-Speed 6321.98 samples/sec Loss 3.0014 LearningRate 0.0000 Epoch: 33 Global Step: 684790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:00,813-Speed 6311.31 samples/sec Loss 2.9925 LearningRate 0.0000 Epoch: 33 Global Step: 684800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:04,061-Speed 6308.36 samples/sec Loss 2.9531 LearningRate 0.0000 Epoch: 33 Global Step: 684810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:07,307-Speed 6309.98 samples/sec Loss 3.0303 LearningRate 0.0000 Epoch: 33 Global Step: 684820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:10,569-Speed 6279.54 samples/sec Loss 3.0119 LearningRate 0.0000 Epoch: 33 Global Step: 684830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:13,820-Speed 6301.12 samples/sec Loss 2.9911 LearningRate 0.0000 Epoch: 33 Global Step: 684840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:17,067-Speed 6308.69 samples/sec Loss 3.0533 LearningRate 0.0000 Epoch: 33 Global Step: 684850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:20,310-Speed 6316.98 samples/sec Loss 3.0089 LearningRate 0.0000 Epoch: 33 Global Step: 684860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:23,568-Speed 6286.09 samples/sec Loss 3.0608 LearningRate 0.0000 Epoch: 33 Global Step: 684870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:02:26,823-Speed 6293.11 samples/sec Loss 3.0291 LearningRate 0.0000 Epoch: 33 Global 
Step: 684880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:02:30,063-Speed 6324.07 samples/sec Loss 2.9964 LearningRate 0.0000 Epoch: 33 Global Step: 684890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:02:33,294-Speed 6339.06 samples/sec Loss 3.0330 LearningRate 0.0000 Epoch: 33 Global Step: 684900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:36,542-Speed 6307.97 samples/sec Loss 3.0457 LearningRate 0.0000 Epoch: 33 Global Step: 684910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:39,798-Speed 6290.91 samples/sec Loss 3.0102 LearningRate 0.0000 Epoch: 33 Global Step: 684920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:43,049-Speed 6301.32 samples/sec Loss 3.0119 LearningRate 0.0000 Epoch: 33 Global Step: 684930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:46,292-Speed 6316.25 samples/sec Loss 3.0138 LearningRate 0.0000 Epoch: 33 Global Step: 684940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:49,536-Speed 6315.12 samples/sec Loss 3.0537 LearningRate 0.0000 Epoch: 33 Global Step: 684950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:52,776-Speed 6323.14 samples/sec Loss 3.0337 LearningRate 0.0000 Epoch: 33 Global Step: 684960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:56,018-Speed 6317.11 samples/sec Loss 3.0374 LearningRate 0.0000 Epoch: 33 Global Step: 684970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:02:59,257-Speed 6328.21 samples/sec Loss 3.0505 LearningRate 0.0000 Epoch: 33 Global Step: 684980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:02,511-Speed 6294.23 samples/sec Loss 2.9560 LearningRate 0.0000 Epoch: 33 Global Step: 684990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:05,764-Speed 6297.36 samples/sec Loss 3.0644 LearningRate 0.0000 Epoch: 33 Global Step: 685000 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:03:09,012-Speed 6307.10 samples/sec Loss 2.9161 LearningRate 0.0000 Epoch: 33 Global Step: 685010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:03:12,240-Speed 6346.06 samples/sec Loss 3.0164 LearningRate 0.0000 Epoch: 33 Global Step: 685020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:15,482-Speed 6317.06 samples/sec Loss 3.0750 LearningRate 0.0000 Epoch: 33 Global Step: 685030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:18,727-Speed 6313.02 samples/sec Loss 2.9444 LearningRate 0.0000 Epoch: 33 Global Step: 685040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:21,972-Speed 6313.72 samples/sec Loss 3.0285 LearningRate 0.0000 Epoch: 33 Global Step: 685050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:25,217-Speed 6312.10 samples/sec Loss 3.0115 LearningRate 0.0000 Epoch: 33 Global Step: 685060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:28,462-Speed 6313.66 samples/sec Loss 3.0353 LearningRate 0.0000 Epoch: 33 Global Step: 685070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:31,699-Speed 6326.95 samples/sec Loss 2.9668 LearningRate 0.0000 Epoch: 33 Global Step: 685080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:34,944-Speed 6313.02 samples/sec Loss 3.0270 LearningRate 0.0000 Epoch: 33 Global Step: 685090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:38,199-Speed 6292.51 samples/sec Loss 3.0134 LearningRate 0.0000 Epoch: 33 Global Step: 685100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:41,446-Speed 6309.00 samples/sec Loss 2.9746 LearningRate 0.0000 Epoch: 33 Global Step: 685110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:03:44,692-Speed 6310.51 samples/sec Loss 2.9819 LearningRate 0.0000 Epoch: 33 Global Step: 685120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:03:47,943-Speed 6302.48 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 33 Global Step: 685130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:03:51,196-Speed 6296.16 samples/sec Loss 3.0022 LearningRate 0.0000 Epoch: 33 Global Step: 685140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:03:54,452-Speed 6292.03 samples/sec Loss 3.0107 LearningRate 0.0000 Epoch: 33 Global Step: 685150 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:03:57,705-Speed 6296.21 samples/sec Loss 2.9934 LearningRate 0.0000 Epoch: 33 Global Step: 685160 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:00,958-Speed 6297.08 samples/sec Loss 2.9614 LearningRate 0.0000 Epoch: 33 Global Step: 685170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:04,210-Speed 6300.15 samples/sec Loss 3.0190 LearningRate 0.0000 Epoch: 33 Global Step: 685180 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:07,454-Speed 6314.62 samples/sec Loss 3.0531 LearningRate 0.0000 Epoch: 33 Global Step: 685190 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:10,708-Speed 6296.89 samples/sec Loss 3.0516 LearningRate 0.0000 Epoch: 33 Global Step: 685200 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:13,963-Speed 6293.11 samples/sec Loss 2.9967 LearningRate 0.0000 Epoch: 33 Global Step: 685210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:17,193-Speed 6341.02 samples/sec Loss 2.9609 LearningRate 0.0000 Epoch: 33 Global Step: 685220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:20,440-Speed 6308.32 samples/sec Loss 3.0170 LearningRate 0.0000 Epoch: 33 Global Step: 685230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:23,683-Speed 6317.24 samples/sec Loss 2.9976 LearningRate 0.0000 Epoch: 33 Global Step: 685240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:26,926-Speed 6317.25 samples/sec Loss 
3.0009 LearningRate 0.0000 Epoch: 33 Global Step: 685250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:30,171-Speed 6311.33 samples/sec Loss 3.0581 LearningRate 0.0000 Epoch: 33 Global Step: 685260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:33,418-Speed 6309.45 samples/sec Loss 3.0334 LearningRate 0.0000 Epoch: 33 Global Step: 685270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:36,658-Speed 6321.85 samples/sec Loss 3.0223 LearningRate 0.0000 Epoch: 33 Global Step: 685280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:39,908-Speed 6303.83 samples/sec Loss 3.0411 LearningRate 0.0000 Epoch: 33 Global Step: 685290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:43,153-Speed 6312.78 samples/sec Loss 3.0597 LearningRate 0.0000 Epoch: 33 Global Step: 685300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:04:46,380-Speed 6348.43 samples/sec Loss 3.0454 LearningRate 0.0000 Epoch: 33 Global Step: 685310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:04:49,625-Speed 6311.76 samples/sec Loss 3.0133 LearningRate 0.0000 Epoch: 33 Global Step: 685320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:04:52,866-Speed 6319.96 samples/sec Loss 3.0081 LearningRate 0.0000 Epoch: 33 Global Step: 685330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:04:56,109-Speed 6316.55 samples/sec Loss 2.9967 LearningRate 0.0000 Epoch: 33 Global Step: 685340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:04:59,354-Speed 6315.87 samples/sec Loss 3.0069 LearningRate 0.0000 Epoch: 33 Global Step: 685350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:02,597-Speed 6317.12 samples/sec Loss 3.0599 LearningRate 0.0000 Epoch: 33 Global Step: 685360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:05,839-Speed 6317.49 samples/sec Loss 2.9774 LearningRate 0.0000 Epoch: 33 Global 
Step: 685370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:09,093-Speed 6296.30 samples/sec Loss 3.0493 LearningRate 0.0000 Epoch: 33 Global Step: 685380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:12,356-Speed 6278.95 samples/sec Loss 3.0040 LearningRate 0.0000 Epoch: 33 Global Step: 685390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:15,698-Speed 6130.13 samples/sec Loss 3.0232 LearningRate 0.0000 Epoch: 33 Global Step: 685400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:19,029-Speed 6148.64 samples/sec Loss 3.0067 LearningRate 0.0000 Epoch: 33 Global Step: 685410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:05:22,263-Speed 6334.91 samples/sec Loss 3.0150 LearningRate 0.0000 Epoch: 33 Global Step: 685420 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:25,510-Speed 6309.24 samples/sec Loss 2.9825 LearningRate 0.0000 Epoch: 33 Global Step: 685430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:28,752-Speed 6317.62 samples/sec Loss 3.0628 LearningRate 0.0000 Epoch: 33 Global Step: 685440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:32,007-Speed 6292.90 samples/sec Loss 3.0166 LearningRate 0.0000 Epoch: 33 Global Step: 685450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:35,256-Speed 6304.76 samples/sec Loss 3.0570 LearningRate 0.0000 Epoch: 33 Global Step: 685460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:38,503-Speed 6309.00 samples/sec Loss 3.0050 LearningRate 0.0000 Epoch: 33 Global Step: 685470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:41,759-Speed 6291.91 samples/sec Loss 2.9916 LearningRate 0.0000 Epoch: 33 Global Step: 685480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:45,003-Speed 6314.26 samples/sec Loss 2.9951 LearningRate 0.0000 Epoch: 33 Global Step: 685490 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:05:48,249-Speed 6310.03 samples/sec Loss 3.0051 LearningRate 0.0000 Epoch: 33 Global Step: 685500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:51,494-Speed 6314.05 samples/sec Loss 3.0375 LearningRate 0.0000 Epoch: 33 Global Step: 685510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:05:54,740-Speed 6309.60 samples/sec Loss 2.9525 LearningRate 0.0000 Epoch: 33 Global Step: 685520 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:05:57,982-Speed 6318.97 samples/sec Loss 2.9971 LearningRate 0.0000 Epoch: 33 Global Step: 685530 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:06:01,237-Speed 6293.47 samples/sec Loss 2.9928 LearningRate 0.0000 Epoch: 33 Global Step: 685540 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:06:04,484-Speed 6308.58 samples/sec Loss 3.0152 LearningRate 0.0000 Epoch: 33 Global Step: 685550 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:06:07,715-Speed 6338.87 samples/sec Loss 2.9784 LearningRate 0.0000 Epoch: 33 Global Step: 685560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:10,965-Speed 6303.66 samples/sec Loss 3.0457 LearningRate 0.0000 Epoch: 33 Global Step: 685570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:14,210-Speed 6312.98 samples/sec Loss 2.9310 LearningRate 0.0000 Epoch: 33 Global Step: 685580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:17,455-Speed 6313.61 samples/sec Loss 2.9810 LearningRate 0.0000 Epoch: 33 Global Step: 685590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:20,703-Speed 6307.43 samples/sec Loss 3.0388 LearningRate 0.0000 Epoch: 33 Global Step: 685600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:23,945-Speed 6318.91 samples/sec Loss 2.9750 LearningRate 0.0000 Epoch: 33 Global Step: 685610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:06:27,191-Speed 6309.67 samples/sec Loss 3.0757 LearningRate 0.0000 Epoch: 33 Global Step: 685620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:30,437-Speed 6311.59 samples/sec Loss 3.0140 LearningRate 0.0000 Epoch: 33 Global Step: 685630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:33,686-Speed 6303.70 samples/sec Loss 2.9752 LearningRate 0.0000 Epoch: 33 Global Step: 685640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:36,934-Speed 6307.05 samples/sec Loss 3.0258 LearningRate 0.0000 Epoch: 33 Global Step: 685650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:40,178-Speed 6314.40 samples/sec Loss 3.0446 LearningRate 0.0000 Epoch: 33 Global Step: 685660 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:06:43,409-Speed 6340.20 samples/sec Loss 3.0447 LearningRate 0.0000 Epoch: 33 Global Step: 685670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:46,653-Speed 6314.48 samples/sec Loss 3.0127 LearningRate 0.0000 Epoch: 33 Global Step: 685680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:49,906-Speed 6298.59 samples/sec Loss 2.9761 LearningRate 0.0000 Epoch: 33 Global Step: 685690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:53,157-Speed 6299.26 samples/sec Loss 2.9868 LearningRate 0.0000 Epoch: 33 Global Step: 685700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:56,408-Speed 6302.70 samples/sec Loss 3.0082 LearningRate 0.0000 Epoch: 33 Global Step: 685710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:06:59,651-Speed 6315.17 samples/sec Loss 2.9695 LearningRate 0.0000 Epoch: 33 Global Step: 685720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:02,895-Speed 6315.91 samples/sec Loss 2.9716 LearningRate 0.0000 Epoch: 33 Global Step: 685730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:06,181-Speed 6233.42 samples/sec Loss 
3.0319 LearningRate 0.0000 Epoch: 33 Global Step: 685740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:09,437-Speed 6290.80 samples/sec Loss 3.0073 LearningRate 0.0000 Epoch: 33 Global Step: 685750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:12,699-Speed 6279.76 samples/sec Loss 3.0684 LearningRate 0.0000 Epoch: 33 Global Step: 685760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:15,945-Speed 6310.23 samples/sec Loss 2.9886 LearningRate 0.0000 Epoch: 33 Global Step: 685770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:07:19,179-Speed 6334.96 samples/sec Loss 3.0299 LearningRate 0.0000 Epoch: 33 Global Step: 685780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:22,483-Speed 6198.85 samples/sec Loss 2.9923 LearningRate 0.0000 Epoch: 33 Global Step: 685790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:25,737-Speed 6297.17 samples/sec Loss 3.0464 LearningRate 0.0000 Epoch: 33 Global Step: 685800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:28,985-Speed 6306.56 samples/sec Loss 3.0432 LearningRate 0.0000 Epoch: 33 Global Step: 685810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:32,228-Speed 6316.04 samples/sec Loss 2.9722 LearningRate 0.0000 Epoch: 33 Global Step: 685820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:35,473-Speed 6312.60 samples/sec Loss 2.9800 LearningRate 0.0000 Epoch: 33 Global Step: 685830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:38,726-Speed 6299.07 samples/sec Loss 3.0174 LearningRate 0.0000 Epoch: 33 Global Step: 685840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:41,966-Speed 6321.20 samples/sec Loss 2.9315 LearningRate 0.0000 Epoch: 33 Global Step: 685850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:45,209-Speed 6317.39 samples/sec Loss 3.0308 LearningRate 0.0000 Epoch: 33 Global 
Step: 685860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:48,460-Speed 6300.51 samples/sec Loss 2.9679 LearningRate 0.0000 Epoch: 33 Global Step: 685870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:07:51,708-Speed 6307.14 samples/sec Loss 2.9958 LearningRate 0.0000 Epoch: 33 Global Step: 685880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:07:54,961-Speed 6295.68 samples/sec Loss 3.0552 LearningRate 0.0000 Epoch: 33 Global Step: 685890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:07:58,196-Speed 6333.19 samples/sec Loss 3.0145 LearningRate 0.0000 Epoch: 33 Global Step: 685900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:01,445-Speed 6305.74 samples/sec Loss 3.0083 LearningRate 0.0000 Epoch: 33 Global Step: 685910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:04,689-Speed 6313.68 samples/sec Loss 2.9992 LearningRate 0.0000 Epoch: 33 Global Step: 685920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:07,947-Speed 6286.58 samples/sec Loss 2.9653 LearningRate 0.0000 Epoch: 33 Global Step: 685930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:11,190-Speed 6317.74 samples/sec Loss 3.0032 LearningRate 0.0000 Epoch: 33 Global Step: 685940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:14,434-Speed 6314.70 samples/sec Loss 2.9958 LearningRate 0.0000 Epoch: 33 Global Step: 685950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:17,686-Speed 6298.25 samples/sec Loss 3.0323 LearningRate 0.0000 Epoch: 33 Global Step: 685960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:20,942-Speed 6291.56 samples/sec Loss 3.0419 LearningRate 0.0000 Epoch: 33 Global Step: 685970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:24,190-Speed 6307.05 samples/sec Loss 2.9779 LearningRate 0.0000 Epoch: 33 Global Step: 685980 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:08:27,433-Speed 6316.53 samples/sec Loss 2.9905 LearningRate 0.0000 Epoch: 33 Global Step: 685990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:30,681-Speed 6307.74 samples/sec Loss 3.0058 LearningRate 0.0000 Epoch: 33 Global Step: 686000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:33,935-Speed 6294.46 samples/sec Loss 2.9782 LearningRate 0.0000 Epoch: 33 Global Step: 686010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:37,175-Speed 6322.24 samples/sec Loss 3.0231 LearningRate 0.0000 Epoch: 33 Global Step: 686020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:40,418-Speed 6316.82 samples/sec Loss 2.9789 LearningRate 0.0000 Epoch: 33 Global Step: 686030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:43,672-Speed 6294.89 samples/sec Loss 3.0141 LearningRate 0.0000 Epoch: 33 Global Step: 686040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:46,916-Speed 6315.25 samples/sec Loss 3.0332 LearningRate 0.0000 Epoch: 33 Global Step: 686050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:50,161-Speed 6313.37 samples/sec Loss 3.0079 LearningRate 0.0000 Epoch: 33 Global Step: 686060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:08:53,394-Speed 6334.94 samples/sec Loss 2.9513 LearningRate 0.0000 Epoch: 33 Global Step: 686070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:56,645-Speed 6301.87 samples/sec Loss 3.0170 LearningRate 0.0000 Epoch: 33 Global Step: 686080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:08:59,889-Speed 6313.74 samples/sec Loss 2.9821 LearningRate 0.0000 Epoch: 33 Global Step: 686090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:03,133-Speed 6315.29 samples/sec Loss 2.9875 LearningRate 0.0000 Epoch: 33 Global Step: 686100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:09:06,379-Speed 6311.27 samples/sec Loss 3.0440 LearningRate 0.0000 Epoch: 33 Global Step: 686110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:09,622-Speed 6315.61 samples/sec Loss 2.9929 LearningRate 0.0000 Epoch: 33 Global Step: 686120 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:12,864-Speed 6317.87 samples/sec Loss 2.9798 LearningRate 0.0000 Epoch: 33 Global Step: 686130 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:16,107-Speed 6317.39 samples/sec Loss 2.9479 LearningRate 0.0000 Epoch: 33 Global Step: 686140 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:19,353-Speed 6310.61 samples/sec Loss 2.9920 LearningRate 0.0000 Epoch: 33 Global Step: 686150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:22,610-Speed 6289.49 samples/sec Loss 2.9944 LearningRate 0.0000 Epoch: 33 Global Step: 686160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:25,889-Speed 6247.43 samples/sec Loss 3.0439 LearningRate 0.0000 Epoch: 33 Global Step: 686170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:09:29,123-Speed 6334.17 samples/sec Loss 3.0536 LearningRate 0.0000 Epoch: 33 Global Step: 686180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:32,367-Speed 6314.02 samples/sec Loss 3.0202 LearningRate 0.0000 Epoch: 33 Global Step: 686190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:35,611-Speed 6315.60 samples/sec Loss 2.9770 LearningRate 0.0000 Epoch: 33 Global Step: 686200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:38,862-Speed 6301.73 samples/sec Loss 3.0395 LearningRate 0.0000 Epoch: 33 Global Step: 686210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:42,110-Speed 6305.29 samples/sec Loss 3.0612 LearningRate 0.0000 Epoch: 33 Global Step: 686220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:45,366-Speed 6293.00 samples/sec Loss 
2.9820 LearningRate 0.0000 Epoch: 33 Global Step: 686230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:48,618-Speed 6298.58 samples/sec Loss 2.9865 LearningRate 0.0000 Epoch: 33 Global Step: 686240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:51,867-Speed 6305.21 samples/sec Loss 3.0105 LearningRate 0.0000 Epoch: 33 Global Step: 686250 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:55,124-Speed 6290.16 samples/sec Loss 2.9981 LearningRate 0.0000 Epoch: 33 Global Step: 686260 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:09:58,372-Speed 6305.73 samples/sec Loss 3.0560 LearningRate 0.0000 Epoch: 33 Global Step: 686270 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:10:01,621-Speed 6304.84 samples/sec Loss 3.0575 LearningRate 0.0000 Epoch: 33 Global Step: 686280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:04,870-Speed 6305.39 samples/sec Loss 3.0168 LearningRate 0.0000 Epoch: 33 Global Step: 686290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:08,121-Speed 6304.14 samples/sec Loss 3.0062 LearningRate 0.0000 Epoch: 33 Global Step: 686300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:11,372-Speed 6300.32 samples/sec Loss 2.9929 LearningRate 0.0000 Epoch: 33 Global Step: 686310 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:14,618-Speed 6310.31 samples/sec Loss 3.0229 LearningRate 0.0000 Epoch: 33 Global Step: 686320 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:17,875-Speed 6289.88 samples/sec Loss 3.0032 LearningRate 0.0000 Epoch: 33 Global Step: 686330 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:21,122-Speed 6307.47 samples/sec Loss 3.0023 LearningRate 0.0000 Epoch: 33 Global Step: 686340 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:24,367-Speed 6313.09 samples/sec Loss 2.9621 LearningRate 0.0000 Epoch: 33 Global 
Step: 686350 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:27,614-Speed 6308.57 samples/sec Loss 2.9892 LearningRate 0.0000 Epoch: 33 Global Step: 686360 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:30,863-Speed 6305.92 samples/sec Loss 2.9828 LearningRate 0.0000 Epoch: 33 Global Step: 686370 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:34,097-Speed 6333.27 samples/sec Loss 3.0282 LearningRate 0.0000 Epoch: 33 Global Step: 686380 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:37,353-Speed 6290.67 samples/sec Loss 3.0069 LearningRate 0.0000 Epoch: 33 Global Step: 686390 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:40,606-Speed 6298.42 samples/sec Loss 3.0085 LearningRate 0.0000 Epoch: 33 Global Step: 686400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:43,853-Speed 6308.56 samples/sec Loss 2.9638 LearningRate 0.0000 Epoch: 33 Global Step: 686410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:47,106-Speed 6295.97 samples/sec Loss 3.0632 LearningRate 0.0000 Epoch: 33 Global Step: 686420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:10:50,343-Speed 6328.37 samples/sec Loss 3.0294 LearningRate 0.0000 Epoch: 33 Global Step: 686430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:10:53,591-Speed 6308.41 samples/sec Loss 2.9423 LearningRate 0.0000 Epoch: 33 Global Step: 686440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:10:56,839-Speed 6307.76 samples/sec Loss 3.0176 LearningRate 0.0000 Epoch: 33 Global Step: 686450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:00,081-Speed 6317.66 samples/sec Loss 3.0343 LearningRate 0.0000 Epoch: 33 Global Step: 686460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:03,324-Speed 6316.91 samples/sec Loss 3.0771 LearningRate 0.0000 Epoch: 33 Global Step: 686470 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:11:06,570-Speed 6309.43 samples/sec Loss 2.9923 LearningRate 0.0000 Epoch: 33 Global Step: 686480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:09,825-Speed 6294.40 samples/sec Loss 2.9486 LearningRate 0.0000 Epoch: 33 Global Step: 686490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:13,072-Speed 6308.23 samples/sec Loss 3.0400 LearningRate 0.0000 Epoch: 33 Global Step: 686500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:16,314-Speed 6318.90 samples/sec Loss 3.0084 LearningRate 0.0000 Epoch: 33 Global Step: 686510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:19,562-Speed 6307.19 samples/sec Loss 2.9648 LearningRate 0.0000 Epoch: 33 Global Step: 686520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:22,814-Speed 6298.19 samples/sec Loss 3.0506 LearningRate 0.0000 Epoch: 33 Global Step: 686530 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:11:26,060-Speed 6310.37 samples/sec Loss 2.9684 LearningRate 0.0000 Epoch: 33 Global Step: 686540 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:11:29,309-Speed 6306.44 samples/sec Loss 3.0416 LearningRate 0.0000 Epoch: 33 Global Step: 686550 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:11:32,556-Speed 6307.13 samples/sec Loss 2.9477 LearningRate 0.0000 Epoch: 33 Global Step: 686560 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:11:35,805-Speed 6304.94 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 33 Global Step: 686570 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:11:39,041-Speed 6330.28 samples/sec Loss 2.9813 LearningRate 0.0000 Epoch: 33 Global Step: 686580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:42,293-Speed 6299.15 samples/sec Loss 2.9798 LearningRate 0.0000 Epoch: 33 Global Step: 686590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:11:45,546-Speed 6297.81 samples/sec Loss 2.9801 LearningRate 0.0000 Epoch: 33 Global Step: 686600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:48,791-Speed 6312.83 samples/sec Loss 3.0106 LearningRate 0.0000 Epoch: 33 Global Step: 686610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:52,101-Speed 6188.27 samples/sec Loss 3.0027 LearningRate 0.0000 Epoch: 33 Global Step: 686620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:55,355-Speed 6295.30 samples/sec Loss 3.0492 LearningRate 0.0000 Epoch: 33 Global Step: 686630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:11:58,606-Speed 6300.78 samples/sec Loss 2.9998 LearningRate 0.0000 Epoch: 33 Global Step: 686640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:01,857-Speed 6300.91 samples/sec Loss 3.0567 LearningRate 0.0000 Epoch: 33 Global Step: 686650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:05,113-Speed 6291.64 samples/sec Loss 2.9967 LearningRate 0.0000 Epoch: 33 Global Step: 686660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:08,368-Speed 6293.85 samples/sec Loss 3.0379 LearningRate 0.0000 Epoch: 33 Global Step: 686670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:11,614-Speed 6310.88 samples/sec Loss 2.9880 LearningRate 0.0000 Epoch: 33 Global Step: 686680 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:14,864-Speed 6303.04 samples/sec Loss 3.0000 LearningRate 0.0000 Epoch: 33 Global Step: 686690 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:18,105-Speed 6319.98 samples/sec Loss 2.9427 LearningRate 0.0000 Epoch: 33 Global Step: 686700 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:21,358-Speed 6297.32 samples/sec Loss 3.0165 LearningRate 0.0000 Epoch: 33 Global Step: 686710 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:24,605-Speed 6308.43 samples/sec Loss 
2.9741 LearningRate 0.0000 Epoch: 33 Global Step: 686720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:27,853-Speed 6308.09 samples/sec Loss 3.0564 LearningRate 0.0000 Epoch: 33 Global Step: 686730 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:31,101-Speed 6305.18 samples/sec Loss 3.0457 LearningRate 0.0000 Epoch: 33 Global Step: 686740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:34,353-Speed 6300.09 samples/sec Loss 2.9847 LearningRate 0.0000 Epoch: 33 Global Step: 686750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:37,607-Speed 6293.93 samples/sec Loss 3.0297 LearningRate 0.0000 Epoch: 33 Global Step: 686760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:12:40,850-Speed 6317.62 samples/sec Loss 3.0119 LearningRate 0.0000 Epoch: 33 Global Step: 686770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:44,100-Speed 6301.65 samples/sec Loss 2.9391 LearningRate 0.0000 Epoch: 33 Global Step: 686780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:47,351-Speed 6301.12 samples/sec Loss 2.9791 LearningRate 0.0000 Epoch: 33 Global Step: 686790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:50,603-Speed 6300.22 samples/sec Loss 2.9954 LearningRate 0.0000 Epoch: 33 Global Step: 686800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:53,861-Speed 6286.89 samples/sec Loss 3.0188 LearningRate 0.0000 Epoch: 33 Global Step: 686810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:12:57,106-Speed 6313.43 samples/sec Loss 2.9651 LearningRate 0.0000 Epoch: 33 Global Step: 686820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:00,366-Speed 6284.03 samples/sec Loss 2.9847 LearningRate 0.0000 Epoch: 33 Global Step: 686830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:03,604-Speed 6324.60 samples/sec Loss 2.9748 LearningRate 0.0000 Epoch: 33 Global 
Step: 686840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:06,847-Speed 6316.61 samples/sec Loss 3.0325 LearningRate 0.0000 Epoch: 33 Global Step: 686850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:10,092-Speed 6313.63 samples/sec Loss 2.9527 LearningRate 0.0000 Epoch: 33 Global Step: 686860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:13,339-Speed 6308.41 samples/sec Loss 3.0352 LearningRate 0.0000 Epoch: 33 Global Step: 686870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:13:16,585-Speed 6310.53 samples/sec Loss 2.9678 LearningRate 0.0000 Epoch: 33 Global Step: 686880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:13:19,834-Speed 6306.97 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 33 Global Step: 686890 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:13:23,067-Speed 6335.31 samples/sec Loss 2.9812 LearningRate 0.0000 Epoch: 33 Global Step: 686900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:26,314-Speed 6308.24 samples/sec Loss 2.9710 LearningRate 0.0000 Epoch: 33 Global Step: 686910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:29,569-Speed 6294.64 samples/sec Loss 3.0527 LearningRate 0.0000 Epoch: 33 Global Step: 686920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:32,821-Speed 6297.45 samples/sec Loss 3.0029 LearningRate 0.0000 Epoch: 33 Global Step: 686930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:36,075-Speed 6296.07 samples/sec Loss 3.0287 LearningRate 0.0000 Epoch: 33 Global Step: 686940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:39,362-Speed 6231.92 samples/sec Loss 3.0341 LearningRate 0.0000 Epoch: 33 Global Step: 686950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:42,612-Speed 6303.17 samples/sec Loss 2.9834 LearningRate 0.0000 Epoch: 33 Global Step: 686960 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:13:45,855-Speed 6316.26 samples/sec Loss 2.9322 LearningRate 0.0000 Epoch: 33 Global Step: 686970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:49,099-Speed 6313.51 samples/sec Loss 3.0393 LearningRate 0.0000 Epoch: 33 Global Step: 686980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:52,342-Speed 6318.13 samples/sec Loss 3.0541 LearningRate 0.0000 Epoch: 33 Global Step: 686990 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:13:55,584-Speed 6318.65 samples/sec Loss 2.9809 LearningRate 0.0000 Epoch: 33 Global Step: 687000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:13:58,831-Speed 6307.34 samples/sec Loss 3.0347 LearningRate 0.0000 Epoch: 33 Global Step: 687010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:02,083-Speed 6299.33 samples/sec Loss 2.9812 LearningRate 0.0000 Epoch: 33 Global Step: 687020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:05,330-Speed 6309.29 samples/sec Loss 3.0188 LearningRate 0.0000 Epoch: 33 Global Step: 687030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:08,576-Speed 6309.71 samples/sec Loss 3.0300 LearningRate 0.0000 Epoch: 33 Global Step: 687040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:11,825-Speed 6305.54 samples/sec Loss 3.0112 LearningRate 0.0000 Epoch: 33 Global Step: 687050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:15,069-Speed 6314.46 samples/sec Loss 3.0087 LearningRate 0.0000 Epoch: 33 Global Step: 687060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:18,322-Speed 6297.40 samples/sec Loss 3.0113 LearningRate 0.0000 Epoch: 33 Global Step: 687070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:14:21,570-Speed 6307.84 samples/sec Loss 3.0039 LearningRate 0.0000 Epoch: 33 Global Step: 687080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:14:24,808-Speed 6325.77 samples/sec Loss 3.0038 LearningRate 0.0000 Epoch: 33 Global Step: 687090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:28,061-Speed 6297.24 samples/sec Loss 2.9954 LearningRate 0.0000 Epoch: 33 Global Step: 687100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:31,312-Speed 6302.32 samples/sec Loss 3.0076 LearningRate 0.0000 Epoch: 33 Global Step: 687110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:34,556-Speed 6314.08 samples/sec Loss 3.0055 LearningRate 0.0000 Epoch: 33 Global Step: 687120 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:37,805-Speed 6304.00 samples/sec Loss 3.0215 LearningRate 0.0000 Epoch: 33 Global Step: 687130 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:41,056-Speed 6300.96 samples/sec Loss 3.0033 LearningRate 0.0000 Epoch: 33 Global Step: 687140 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:44,301-Speed 6312.16 samples/sec Loss 2.9711 LearningRate 0.0000 Epoch: 33 Global Step: 687150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:47,547-Speed 6312.12 samples/sec Loss 2.9933 LearningRate 0.0000 Epoch: 33 Global Step: 687160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:50,793-Speed 6310.35 samples/sec Loss 2.9906 LearningRate 0.0000 Epoch: 33 Global Step: 687170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:54,049-Speed 6290.39 samples/sec Loss 2.9974 LearningRate 0.0000 Epoch: 33 Global Step: 687180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:14:57,301-Speed 6300.00 samples/sec Loss 2.9644 LearningRate 0.0000 Epoch: 33 Global Step: 687190 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:15:00,535-Speed 6334.24 samples/sec Loss 2.9844 LearningRate 0.0000 Epoch: 33 Global Step: 687200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:03,792-Speed 6288.46 samples/sec Loss 
3.0127 LearningRate 0.0000 Epoch: 33 Global Step: 687210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:07,048-Speed 6292.21 samples/sec Loss 3.0433 LearningRate 0.0000 Epoch: 33 Global Step: 687220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:10,295-Speed 6308.07 samples/sec Loss 2.9880 LearningRate 0.0000 Epoch: 33 Global Step: 687230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:13,544-Speed 6304.92 samples/sec Loss 2.9527 LearningRate 0.0000 Epoch: 33 Global Step: 687240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:16,789-Speed 6313.26 samples/sec Loss 3.0140 LearningRate 0.0000 Epoch: 33 Global Step: 687250 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:20,041-Speed 6298.35 samples/sec Loss 3.0512 LearningRate 0.0000 Epoch: 33 Global Step: 687260 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:23,292-Speed 6302.01 samples/sec Loss 3.0045 LearningRate 0.0000 Epoch: 33 Global Step: 687270 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:26,541-Speed 6305.44 samples/sec Loss 3.0338 LearningRate 0.0000 Epoch: 33 Global Step: 687280 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:29,788-Speed 6307.62 samples/sec Loss 2.9407 LearningRate 0.0000 Epoch: 33 Global Step: 687290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:33,052-Speed 6276.34 samples/sec Loss 3.0006 LearningRate 0.0000 Epoch: 33 Global Step: 687300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:15:36,308-Speed 6291.59 samples/sec Loss 2.9783 LearningRate 0.0000 Epoch: 33 Global Step: 687310 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:15:39,556-Speed 6306.46 samples/sec Loss 2.9877 LearningRate 0.0000 Epoch: 33 Global Step: 687320 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:15:42,806-Speed 6303.56 samples/sec Loss 2.9892 LearningRate 0.0000 Epoch: 33 Global 
Step: 687330 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:15:46,040-Speed 6333.54 samples/sec Loss 3.0486 LearningRate 0.0000 Epoch: 33 Global Step: 687340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:49,293-Speed 6296.85 samples/sec Loss 2.9956 LearningRate 0.0000 Epoch: 33 Global Step: 687350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:52,551-Speed 6287.25 samples/sec Loss 3.0021 LearningRate 0.0000 Epoch: 33 Global Step: 687360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:55,799-Speed 6307.17 samples/sec Loss 2.9672 LearningRate 0.0000 Epoch: 33 Global Step: 687370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:15:59,052-Speed 6297.03 samples/sec Loss 3.0043 LearningRate 0.0000 Epoch: 33 Global Step: 687380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:02,315-Speed 6279.08 samples/sec Loss 3.0489 LearningRate 0.0000 Epoch: 33 Global Step: 687390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:05,581-Speed 6272.37 samples/sec Loss 3.0354 LearningRate 0.0000 Epoch: 33 Global Step: 687400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:08,836-Speed 6293.40 samples/sec Loss 2.9534 LearningRate 0.0000 Epoch: 33 Global Step: 687410 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:12,098-Speed 6278.03 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 687420 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:15,352-Speed 6295.80 samples/sec Loss 3.0166 LearningRate 0.0000 Epoch: 33 Global Step: 687430 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:18,601-Speed 6304.81 samples/sec Loss 3.0155 LearningRate 0.0000 Epoch: 33 Global Step: 687440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:16:21,852-Speed 6301.14 samples/sec Loss 3.0515 LearningRate 0.0000 Epoch: 33 Global Step: 687450 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:16:25,100-Speed 6306.90 samples/sec Loss 3.0064 LearningRate 0.0000 Epoch: 33 Global Step: 687460 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:16:28,348-Speed 6306.20 samples/sec Loss 2.9644 LearningRate 0.0000 Epoch: 33 Global Step: 687470 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:16:31,607-Speed 6286.47 samples/sec Loss 2.9872 LearningRate 0.0000 Epoch: 33 Global Step: 687480 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:16:34,854-Speed 6307.94 samples/sec Loss 2.9726 LearningRate 0.0000 Epoch: 33 Global Step: 687490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:38,126-Speed 6261.28 samples/sec Loss 3.0741 LearningRate 0.0000 Epoch: 33 Global Step: 687500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:41,375-Speed 6304.41 samples/sec Loss 3.0425 LearningRate 0.0000 Epoch: 33 Global Step: 687510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:44,626-Speed 6301.12 samples/sec Loss 2.9889 LearningRate 0.0000 Epoch: 33 Global Step: 687520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:47,882-Speed 6291.54 samples/sec Loss 3.0448 LearningRate 0.0000 Epoch: 33 Global Step: 687530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:51,138-Speed 6292.38 samples/sec Loss 3.0442 LearningRate 0.0000 Epoch: 33 Global Step: 687540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:54,393-Speed 6291.85 samples/sec Loss 3.0331 LearningRate 0.0000 Epoch: 33 Global Step: 687550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:16:57,640-Speed 6309.62 samples/sec Loss 2.9688 LearningRate 0.0000 Epoch: 33 Global Step: 687560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:17:00,882-Speed 6318.81 samples/sec Loss 3.0054 LearningRate 0.0000 Epoch: 33 Global Step: 687570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:17:04,132-Speed 6303.05 samples/sec Loss 3.0127 LearningRate 0.0000 Epoch: 33 Global Step: 687580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:17:07,388-Speed 6290.69 samples/sec Loss 2.9869 LearningRate 0.0000 Epoch: 33 Global Step: 687590 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:10,633-Speed 6312.38 samples/sec Loss 3.0237 LearningRate 0.0000 Epoch: 33 Global Step: 687600 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:13,876-Speed 6316.98 samples/sec Loss 3.0224 LearningRate 0.0000 Epoch: 33 Global Step: 687610 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:17,138-Speed 6279.64 samples/sec Loss 3.0038 LearningRate 0.0000 Epoch: 33 Global Step: 687620 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:20,386-Speed 6306.43 samples/sec Loss 2.9875 LearningRate 0.0000 Epoch: 33 Global Step: 687630 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:23,673-Speed 6232.84 samples/sec Loss 3.0221 LearningRate 0.0000 Epoch: 33 Global Step: 687640 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:26,921-Speed 6305.97 samples/sec Loss 2.9843 LearningRate 0.0000 Epoch: 33 Global Step: 687650 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:30,163-Speed 6319.50 samples/sec Loss 3.0435 LearningRate 0.0000 Epoch: 33 Global Step: 687660 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:33,411-Speed 6306.73 samples/sec Loss 3.0086 LearningRate 0.0000 Epoch: 33 Global Step: 687670 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:36,721-Speed 6188.67 samples/sec Loss 3.0180 LearningRate 0.0000 Epoch: 33 Global Step: 687680 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:39,957-Speed 6329.96 samples/sec Loss 2.9560 LearningRate 0.0000 Epoch: 33 Global Step: 687690 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:43,206-Speed 6304.93 samples/sec Loss 
3.0234 LearningRate 0.0000 Epoch: 33 Global Step: 687700 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:46,453-Speed 6309.09 samples/sec Loss 2.9863 LearningRate 0.0000 Epoch: 33 Global Step: 687710 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:49,704-Speed 6303.03 samples/sec Loss 2.9402 LearningRate 0.0000 Epoch: 33 Global Step: 687720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:52,951-Speed 6308.10 samples/sec Loss 3.0066 LearningRate 0.0000 Epoch: 33 Global Step: 687730 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:56,198-Speed 6309.07 samples/sec Loss 2.9470 LearningRate 0.0000 Epoch: 33 Global Step: 687740 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:17:59,452-Speed 6295.26 samples/sec Loss 3.0006 LearningRate 0.0000 Epoch: 33 Global Step: 687750 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:18:02,698-Speed 6311.67 samples/sec Loss 3.0162 LearningRate 0.0000 Epoch: 33 Global Step: 687760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:18:05,934-Speed 6329.29 samples/sec Loss 3.0166 LearningRate 0.0000 Epoch: 33 Global Step: 687770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:09,190-Speed 6292.58 samples/sec Loss 2.9924 LearningRate 0.0000 Epoch: 33 Global Step: 687780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:12,446-Speed 6290.48 samples/sec Loss 3.0087 LearningRate 0.0000 Epoch: 33 Global Step: 687790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:15,696-Speed 6303.35 samples/sec Loss 2.9948 LearningRate 0.0000 Epoch: 33 Global Step: 687800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:18,946-Speed 6302.53 samples/sec Loss 2.9713 LearningRate 0.0000 Epoch: 33 Global Step: 687810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:22,203-Speed 6288.68 samples/sec Loss 2.9986 LearningRate 0.0000 Epoch: 33 Global 
Step: 687820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:25,458-Speed 6292.58 samples/sec Loss 2.9661 LearningRate 0.0000 Epoch: 33 Global Step: 687830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:28,709-Speed 6301.33 samples/sec Loss 2.9980 LearningRate 0.0000 Epoch: 33 Global Step: 687840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:31,958-Speed 6306.24 samples/sec Loss 2.9510 LearningRate 0.0000 Epoch: 33 Global Step: 687850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:35,202-Speed 6314.41 samples/sec Loss 2.9359 LearningRate 0.0000 Epoch: 33 Global Step: 687860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:38,434-Speed 6337.67 samples/sec Loss 3.0169 LearningRate 0.0000 Epoch: 33 Global Step: 687870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:41,677-Speed 6317.23 samples/sec Loss 2.9836 LearningRate 0.0000 Epoch: 33 Global Step: 687880 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:44,925-Speed 6305.17 samples/sec Loss 2.9553 LearningRate 0.0000 Epoch: 33 Global Step: 687890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:48,177-Speed 6299.17 samples/sec Loss 2.9367 LearningRate 0.0000 Epoch: 33 Global Step: 687900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:51,426-Speed 6306.29 samples/sec Loss 3.0079 LearningRate 0.0000 Epoch: 33 Global Step: 687910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:54,673-Speed 6308.37 samples/sec Loss 2.9994 LearningRate 0.0000 Epoch: 33 Global Step: 687920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:18:57,926-Speed 6296.37 samples/sec Loss 3.0568 LearningRate 0.0000 Epoch: 33 Global Step: 687930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:19:01,184-Speed 6288.73 samples/sec Loss 3.0536 LearningRate 0.0000 Epoch: 33 Global Step: 687940 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:19:04,439-Speed 6293.32 samples/sec Loss 2.9471 LearningRate 0.0000 Epoch: 33 Global Step: 687950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:19:07,693-Speed 6295.91 samples/sec Loss 2.9708 LearningRate 0.0000 Epoch: 33 Global Step: 687960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:19:10,951-Speed 6286.56 samples/sec Loss 2.9740 LearningRate 0.0000 Epoch: 33 Global Step: 687970 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:14,196-Speed 6313.25 samples/sec Loss 2.9541 LearningRate 0.0000 Epoch: 33 Global Step: 687980 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:17,447-Speed 6299.97 samples/sec Loss 3.0220 LearningRate 0.0000 Epoch: 33 Global Step: 687990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:20,704-Speed 6290.07 samples/sec Loss 3.0039 LearningRate 0.0000 Epoch: 33 Global Step: 688000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:23,966-Speed 6279.37 samples/sec Loss 3.0041 LearningRate 0.0000 Epoch: 33 Global Step: 688010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:27,224-Speed 6287.05 samples/sec Loss 3.0380 LearningRate 0.0000 Epoch: 33 Global Step: 688020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:30,476-Speed 6298.68 samples/sec Loss 2.9853 LearningRate 0.0000 Epoch: 33 Global Step: 688030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:33,726-Speed 6302.70 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 33 Global Step: 688040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:36,976-Speed 6304.90 samples/sec Loss 2.9337 LearningRate 0.0000 Epoch: 33 Global Step: 688050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:40,229-Speed 6296.00 samples/sec Loss 3.0123 LearningRate 0.0000 Epoch: 33 Global Step: 688060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:19:43,479-Speed 6303.38 samples/sec Loss 3.0231 LearningRate 0.0000 Epoch: 33 Global Step: 688070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:46,731-Speed 6297.75 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 33 Global Step: 688080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:49,984-Speed 6297.59 samples/sec Loss 2.9794 LearningRate 0.0000 Epoch: 33 Global Step: 688090 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:53,234-Speed 6304.11 samples/sec Loss 2.9612 LearningRate 0.0000 Epoch: 33 Global Step: 688100 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:56,484-Speed 6302.43 samples/sec Loss 3.0561 LearningRate 0.0000 Epoch: 33 Global Step: 688110 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:19:59,734-Speed 6302.60 samples/sec Loss 3.0038 LearningRate 0.0000 Epoch: 33 Global Step: 688120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:02,991-Speed 6290.79 samples/sec Loss 3.0180 LearningRate 0.0000 Epoch: 33 Global Step: 688130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:06,239-Speed 6306.59 samples/sec Loss 2.9999 LearningRate 0.0000 Epoch: 33 Global Step: 688140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:09,489-Speed 6303.28 samples/sec Loss 2.9927 LearningRate 0.0000 Epoch: 33 Global Step: 688150 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:12,739-Speed 6302.59 samples/sec Loss 3.0184 LearningRate 0.0000 Epoch: 33 Global Step: 688160 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:15,972-Speed 6336.53 samples/sec Loss 3.0037 LearningRate 0.0000 Epoch: 33 Global Step: 688170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:19,216-Speed 6314.49 samples/sec Loss 2.9551 LearningRate 0.0000 Epoch: 33 Global Step: 688180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:22,464-Speed 6307.25 samples/sec Loss 
3.0168 LearningRate 0.0000 Epoch: 33 Global Step: 688190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:25,711-Speed 6307.75 samples/sec Loss 2.9831 LearningRate 0.0000 Epoch: 33 Global Step: 688200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:28,960-Speed 6305.27 samples/sec Loss 3.0151 LearningRate 0.0000 Epoch: 33 Global Step: 688210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:32,217-Speed 6289.37 samples/sec Loss 2.9842 LearningRate 0.0000 Epoch: 33 Global Step: 688220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:35,468-Speed 6301.55 samples/sec Loss 2.9849 LearningRate 0.0000 Epoch: 33 Global Step: 688230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:38,716-Speed 6306.89 samples/sec Loss 3.0112 LearningRate 0.0000 Epoch: 33 Global Step: 688240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:42,019-Speed 6201.16 samples/sec Loss 2.9716 LearningRate 0.0000 Epoch: 33 Global Step: 688250 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:45,269-Speed 6303.53 samples/sec Loss 2.9584 LearningRate 0.0000 Epoch: 33 Global Step: 688260 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:20:48,512-Speed 6315.14 samples/sec Loss 2.9569 LearningRate 0.0000 Epoch: 33 Global Step: 688270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:51,773-Speed 6281.73 samples/sec Loss 2.9805 LearningRate 0.0000 Epoch: 33 Global Step: 688280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:55,023-Speed 6304.17 samples/sec Loss 3.0720 LearningRate 0.0000 Epoch: 33 Global Step: 688290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:20:58,272-Speed 6303.36 samples/sec Loss 3.0259 LearningRate 0.0000 Epoch: 33 Global Step: 688300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:21:01,502-Speed 6343.91 samples/sec Loss 2.9865 LearningRate 0.0000 Epoch: 33 Global 
Step: 688310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:04,746-Speed 6313.36 samples/sec Loss 2.9976 LearningRate 0.0000 Epoch: 33 Global Step: 688320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:07,993-Speed 6309.86 samples/sec Loss 3.0139 LearningRate 0.0000 Epoch: 33 Global Step: 688330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:11,238-Speed 6313.78 samples/sec Loss 2.9686 LearningRate 0.0000 Epoch: 33 Global Step: 688340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:14,487-Speed 6304.31 samples/sec Loss 3.0552 LearningRate 0.0000 Epoch: 33 Global Step: 688350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:17,736-Speed 6304.00 samples/sec Loss 2.9960 LearningRate 0.0000 Epoch: 33 Global Step: 688360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:20,996-Speed 6284.52 samples/sec Loss 3.0021 LearningRate 0.0000 Epoch: 33 Global Step: 688370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:24,252-Speed 6291.21 samples/sec Loss 2.9670 LearningRate 0.0000 Epoch: 33 Global Step: 688380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:27,501-Speed 6304.25 samples/sec Loss 3.0393 LearningRate 0.0000 Epoch: 33 Global Step: 688390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:30,750-Speed 6306.16 samples/sec Loss 2.9635 LearningRate 0.0000 Epoch: 33 Global Step: 688400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:33,992-Speed 6318.36 samples/sec Loss 2.9810 LearningRate 0.0000 Epoch: 33 Global Step: 688410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:21:37,245-Speed 6297.15 samples/sec Loss 3.0094 LearningRate 0.0000 Epoch: 33 Global Step: 688420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:21:40,494-Speed 6305.71 samples/sec Loss 2.9918 LearningRate 0.0000 Epoch: 33 Global Step: 688430 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:21:43,746-Speed 6299.25 samples/sec Loss 3.0001 LearningRate 0.0000 Epoch: 33 Global Step: 688440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:21:46,984-Speed 6324.99 samples/sec Loss 2.9699 LearningRate 0.0000 Epoch: 33 Global Step: 688450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:50,235-Speed 6300.98 samples/sec Loss 2.9575 LearningRate 0.0000 Epoch: 33 Global Step: 688460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:53,495-Speed 6283.45 samples/sec Loss 2.9544 LearningRate 0.0000 Epoch: 33 Global Step: 688470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:56,746-Speed 6301.68 samples/sec Loss 3.0161 LearningRate 0.0000 Epoch: 33 Global Step: 688480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:21:59,995-Speed 6305.38 samples/sec Loss 3.0045 LearningRate 0.0000 Epoch: 33 Global Step: 688490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:03,246-Speed 6300.37 samples/sec Loss 2.9669 LearningRate 0.0000 Epoch: 33 Global Step: 688500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:06,508-Speed 6280.35 samples/sec Loss 3.0504 LearningRate 0.0000 Epoch: 33 Global Step: 688510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:09,756-Speed 6305.42 samples/sec Loss 2.9247 LearningRate 0.0000 Epoch: 33 Global Step: 688520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:13,002-Speed 6312.55 samples/sec Loss 2.9770 LearningRate 0.0000 Epoch: 33 Global Step: 688530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:16,247-Speed 6313.13 samples/sec Loss 3.0192 LearningRate 0.0000 Epoch: 33 Global Step: 688540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:19,498-Speed 6300.73 samples/sec Loss 3.0318 LearningRate 0.0000 Epoch: 33 Global Step: 688550 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:22:22,756-Speed 6286.56 samples/sec Loss 3.0418 LearningRate 0.0000 Epoch: 33 Global Step: 688560 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:26,012-Speed 6292.40 samples/sec Loss 2.9822 LearningRate 0.0000 Epoch: 33 Global Step: 688570 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:29,269-Speed 6288.90 samples/sec Loss 3.0055 LearningRate 0.0000 Epoch: 33 Global Step: 688580 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:32,528-Speed 6286.32 samples/sec Loss 2.9488 LearningRate 0.0000 Epoch: 33 Global Step: 688590 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:35,793-Speed 6276.73 samples/sec Loss 2.9697 LearningRate 0.0000 Epoch: 33 Global Step: 688600 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:39,042-Speed 6304.22 samples/sec Loss 2.9860 LearningRate 0.0000 Epoch: 33 Global Step: 688610 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:42,292-Speed 6302.72 samples/sec Loss 2.9461 LearningRate 0.0000 Epoch: 33 Global Step: 688620 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:45,535-Speed 6316.67 samples/sec Loss 2.9756 LearningRate 0.0000 Epoch: 33 Global Step: 688630 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:48,783-Speed 6306.45 samples/sec Loss 2.9745 LearningRate 0.0000 Epoch: 33 Global Step: 688640 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:52,022-Speed 6325.71 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 688650 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:22:55,255-Speed 6336.62 samples/sec Loss 3.0172 LearningRate 0.0000 Epoch: 33 Global Step: 688660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:22:58,503-Speed 6304.95 samples/sec Loss 3.0053 LearningRate 0.0000 Epoch: 33 Global Step: 688670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:01,757-Speed 6295.28 samples/sec Loss 
3.0159 LearningRate 0.0000 Epoch: 33 Global Step: 688680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:05,003-Speed 6310.56 samples/sec Loss 3.0028 LearningRate 0.0000 Epoch: 33 Global Step: 688690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:08,260-Speed 6289.64 samples/sec Loss 2.9595 LearningRate 0.0000 Epoch: 33 Global Step: 688700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:11,518-Speed 6288.63 samples/sec Loss 2.9963 LearningRate 0.0000 Epoch: 33 Global Step: 688710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:14,769-Speed 6299.66 samples/sec Loss 3.0264 LearningRate 0.0000 Epoch: 33 Global Step: 688720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:18,014-Speed 6313.75 samples/sec Loss 2.9527 LearningRate 0.0000 Epoch: 33 Global Step: 688730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:21,258-Speed 6313.76 samples/sec Loss 2.9869 LearningRate 0.0000 Epoch: 33 Global Step: 688740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:24,509-Speed 6302.95 samples/sec Loss 2.9746 LearningRate 0.0000 Epoch: 33 Global Step: 688750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:27,763-Speed 6294.72 samples/sec Loss 3.0189 LearningRate 0.0000 Epoch: 33 Global Step: 688760 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:23:31,011-Speed 6307.45 samples/sec Loss 3.0044 LearningRate 0.0000 Epoch: 33 Global Step: 688770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:23:34,241-Speed 6341.37 samples/sec Loss 3.0131 LearningRate 0.0000 Epoch: 33 Global Step: 688780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:37,491-Speed 6303.61 samples/sec Loss 3.0490 LearningRate 0.0000 Epoch: 33 Global Step: 688790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:40,740-Speed 6304.33 samples/sec Loss 2.9992 LearningRate 0.0000 Epoch: 33 Global 
Step: 688800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:44,003-Speed 6277.55 samples/sec Loss 3.0192 LearningRate 0.0000 Epoch: 33 Global Step: 688810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:47,272-Speed 6266.47 samples/sec Loss 3.0293 LearningRate 0.0000 Epoch: 33 Global Step: 688820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:50,525-Speed 6298.16 samples/sec Loss 2.9777 LearningRate 0.0000 Epoch: 33 Global Step: 688830 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:53,770-Speed 6310.73 samples/sec Loss 3.0135 LearningRate 0.0000 Epoch: 33 Global Step: 688840 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:23:57,017-Speed 6310.08 samples/sec Loss 2.9926 LearningRate 0.0000 Epoch: 33 Global Step: 688850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:00,269-Speed 6298.91 samples/sec Loss 2.9961 LearningRate 0.0000 Epoch: 33 Global Step: 688860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:03,531-Speed 6278.63 samples/sec Loss 2.9456 LearningRate 0.0000 Epoch: 33 Global Step: 688870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:06,785-Speed 6295.41 samples/sec Loss 3.0104 LearningRate 0.0000 Epoch: 33 Global Step: 688880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:24:10,020-Speed 6332.51 samples/sec Loss 2.9732 LearningRate 0.0000 Epoch: 33 Global Step: 688890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:13,272-Speed 6300.19 samples/sec Loss 3.0262 LearningRate 0.0000 Epoch: 33 Global Step: 688900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:16,518-Speed 6309.60 samples/sec Loss 3.0210 LearningRate 0.0000 Epoch: 33 Global Step: 688910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:19,771-Speed 6297.32 samples/sec Loss 2.9863 LearningRate 0.0000 Epoch: 33 Global Step: 688920 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:24:23,033-Speed 6279.75 samples/sec Loss 2.9853 LearningRate 0.0000 Epoch: 33 Global Step: 688930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:26,284-Speed 6300.88 samples/sec Loss 2.9910 LearningRate 0.0000 Epoch: 33 Global Step: 688940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:29,533-Speed 6305.61 samples/sec Loss 2.9409 LearningRate 0.0000 Epoch: 33 Global Step: 688950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:32,791-Speed 6286.70 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 33 Global Step: 688960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:36,042-Speed 6303.08 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 33 Global Step: 688970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:39,295-Speed 6295.89 samples/sec Loss 2.9751 LearningRate 0.0000 Epoch: 33 Global Step: 688980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:42,554-Speed 6285.83 samples/sec Loss 2.9744 LearningRate 0.0000 Epoch: 33 Global Step: 688990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:24:45,825-Speed 6261.65 samples/sec Loss 2.9857 LearningRate 0.0000 Epoch: 33 Global Step: 689000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:24:49,068-Speed 6317.26 samples/sec Loss 3.0075 LearningRate 0.0000 Epoch: 33 Global Step: 689010 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:52,320-Speed 6299.83 samples/sec Loss 2.9753 LearningRate 0.0000 Epoch: 33 Global Step: 689020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:55,568-Speed 6305.98 samples/sec Loss 2.9850 LearningRate 0.0000 Epoch: 33 Global Step: 689030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:24:58,814-Speed 6311.56 samples/sec Loss 3.0242 LearningRate 0.0000 Epoch: 33 Global Step: 689040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:25:02,074-Speed 6283.20 samples/sec Loss 2.9878 LearningRate 0.0000 Epoch: 33 Global Step: 689050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:05,320-Speed 6310.13 samples/sec Loss 2.9533 LearningRate 0.0000 Epoch: 33 Global Step: 689060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:08,569-Speed 6304.23 samples/sec Loss 2.9558 LearningRate 0.0000 Epoch: 33 Global Step: 689070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:11,823-Speed 6295.66 samples/sec Loss 2.9674 LearningRate 0.0000 Epoch: 33 Global Step: 689080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:15,072-Speed 6305.22 samples/sec Loss 2.9593 LearningRate 0.0000 Epoch: 33 Global Step: 689090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:18,323-Speed 6300.68 samples/sec Loss 2.9455 LearningRate 0.0000 Epoch: 33 Global Step: 689100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:21,574-Speed 6300.73 samples/sec Loss 3.0727 LearningRate 0.0000 Epoch: 33 Global Step: 689110 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:25:24,830-Speed 6291.02 samples/sec Loss 2.9415 LearningRate 0.0000 Epoch: 33 Global Step: 689120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:25:28,087-Speed 6290.35 samples/sec Loss 3.0437 LearningRate 0.0000 Epoch: 33 Global Step: 689130 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:25:31,337-Speed 6301.91 samples/sec Loss 2.9574 LearningRate 0.0000 Epoch: 33 Global Step: 689140 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:25:34,578-Speed 6320.10 samples/sec Loss 2.9765 LearningRate 0.0000 Epoch: 33 Global Step: 689150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:37,829-Speed 6303.20 samples/sec Loss 3.0090 LearningRate 0.0000 Epoch: 33 Global Step: 689160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:41,081-Speed 6299.05 samples/sec Loss 
3.0255 LearningRate 0.0000 Epoch: 33 Global Step: 689170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:44,334-Speed 6297.59 samples/sec Loss 2.9819 LearningRate 0.0000 Epoch: 33 Global Step: 689180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:47,584-Speed 6301.82 samples/sec Loss 2.9694 LearningRate 0.0000 Epoch: 33 Global Step: 689190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:50,840-Speed 6292.51 samples/sec Loss 3.0309 LearningRate 0.0000 Epoch: 33 Global Step: 689200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:54,085-Speed 6311.73 samples/sec Loss 2.9889 LearningRate 0.0000 Epoch: 33 Global Step: 689210 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:25:57,331-Speed 6310.68 samples/sec Loss 3.0159 LearningRate 0.0000 Epoch: 33 Global Step: 689220 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:00,577-Speed 6310.99 samples/sec Loss 3.0190 LearningRate 0.0000 Epoch: 33 Global Step: 689230 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:03,825-Speed 6306.27 samples/sec Loss 2.9878 LearningRate 0.0000 Epoch: 33 Global Step: 689240 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:07,072-Speed 6309.74 samples/sec Loss 2.9755 LearningRate 0.0000 Epoch: 33 Global Step: 689250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:10,350-Speed 6248.44 samples/sec Loss 2.9747 LearningRate 0.0000 Epoch: 33 Global Step: 689260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:13,601-Speed 6302.17 samples/sec Loss 2.9574 LearningRate 0.0000 Epoch: 33 Global Step: 689270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:16,847-Speed 6308.88 samples/sec Loss 3.0366 LearningRate 0.0000 Epoch: 33 Global Step: 689280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:20,098-Speed 6301.71 samples/sec Loss 3.0096 LearningRate 0.0000 Epoch: 33 Global 
Step: 689290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:23,329-Speed 6339.38 samples/sec Loss 2.9629 LearningRate 0.0000 Epoch: 33 Global Step: 689300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:26,579-Speed 6303.26 samples/sec Loss 3.0004 LearningRate 0.0000 Epoch: 33 Global Step: 689310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:29,826-Speed 6308.85 samples/sec Loss 2.9879 LearningRate 0.0000 Epoch: 33 Global Step: 689320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:33,093-Speed 6270.70 samples/sec Loss 3.0386 LearningRate 0.0000 Epoch: 33 Global Step: 689330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:36,336-Speed 6316.24 samples/sec Loss 3.0018 LearningRate 0.0000 Epoch: 33 Global Step: 689340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:39,583-Speed 6307.70 samples/sec Loss 3.0140 LearningRate 0.0000 Epoch: 33 Global Step: 689350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:42,829-Speed 6312.31 samples/sec Loss 3.0030 LearningRate 0.0000 Epoch: 33 Global Step: 689360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:46,077-Speed 6307.31 samples/sec Loss 3.0038 LearningRate 0.0000 Epoch: 33 Global Step: 689370 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:49,319-Speed 6317.48 samples/sec Loss 3.0378 LearningRate 0.0000 Epoch: 33 Global Step: 689380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:52,576-Speed 6290.47 samples/sec Loss 3.0080 LearningRate 0.0000 Epoch: 33 Global Step: 689390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:26:55,865-Speed 6228.02 samples/sec Loss 3.0024 LearningRate 0.0000 Epoch: 33 Global Step: 689400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:26:59,114-Speed 6304.65 samples/sec Loss 2.9568 LearningRate 0.0000 Epoch: 33 Global Step: 689410 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:27:02,360-Speed 6310.02 samples/sec Loss 2.9764 LearningRate 0.0000 Epoch: 33 Global Step: 689420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:27:05,606-Speed 6312.73 samples/sec Loss 2.9593 LearningRate 0.0000 Epoch: 33 Global Step: 689430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:27:08,846-Speed 6321.19 samples/sec Loss 3.0175 LearningRate 0.0000 Epoch: 33 Global Step: 689440 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:12,118-Speed 6260.40 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 689450 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:15,368-Speed 6304.09 samples/sec Loss 2.9591 LearningRate 0.0000 Epoch: 33 Global Step: 689460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:18,617-Speed 6304.52 samples/sec Loss 2.9603 LearningRate 0.0000 Epoch: 33 Global Step: 689470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:21,862-Speed 6311.52 samples/sec Loss 3.0020 LearningRate 0.0000 Epoch: 33 Global Step: 689480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:25,113-Speed 6301.31 samples/sec Loss 2.9820 LearningRate 0.0000 Epoch: 33 Global Step: 689490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:28,363-Speed 6303.32 samples/sec Loss 2.9685 LearningRate 0.0000 Epoch: 33 Global Step: 689500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:31,613-Speed 6303.00 samples/sec Loss 3.0233 LearningRate 0.0000 Epoch: 33 Global Step: 689510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:34,858-Speed 6311.67 samples/sec Loss 2.9417 LearningRate 0.0000 Epoch: 33 Global Step: 689520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:38,104-Speed 6310.44 samples/sec Loss 2.9842 LearningRate 0.0000 Epoch: 33 Global Step: 689530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:27:41,345-Speed 6321.03 samples/sec Loss 2.9962 LearningRate 0.0000 Epoch: 33 Global Step: 689540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:44,596-Speed 6301.57 samples/sec Loss 2.9771 LearningRate 0.0000 Epoch: 33 Global Step: 689550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:47,856-Speed 6282.90 samples/sec Loss 2.9613 LearningRate 0.0000 Epoch: 33 Global Step: 689560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:51,111-Speed 6294.02 samples/sec Loss 3.0005 LearningRate 0.0000 Epoch: 33 Global Step: 689570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:54,361-Speed 6301.38 samples/sec Loss 2.9865 LearningRate 0.0000 Epoch: 33 Global Step: 689580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:27:57,616-Speed 6293.82 samples/sec Loss 3.0008 LearningRate 0.0000 Epoch: 33 Global Step: 689590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:00,864-Speed 6308.78 samples/sec Loss 3.0228 LearningRate 0.0000 Epoch: 33 Global Step: 689600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:04,107-Speed 6315.55 samples/sec Loss 2.9655 LearningRate 0.0000 Epoch: 33 Global Step: 689610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:07,358-Speed 6300.72 samples/sec Loss 2.9996 LearningRate 0.0000 Epoch: 33 Global Step: 689620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:10,609-Speed 6301.38 samples/sec Loss 2.9596 LearningRate 0.0000 Epoch: 33 Global Step: 689630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:13,877-Speed 6269.06 samples/sec Loss 2.9483 LearningRate 0.0000 Epoch: 33 Global Step: 689640 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:17,134-Speed 6289.82 samples/sec Loss 3.0346 LearningRate 0.0000 Epoch: 33 Global Step: 689650 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:20,380-Speed 6308.96 samples/sec Loss 
2.9953 LearningRate 0.0000 Epoch: 33 Global Step: 689660 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:23,636-Speed 6292.54 samples/sec Loss 3.0217 LearningRate 0.0000 Epoch: 33 Global Step: 689670 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:26,886-Speed 6302.11 samples/sec Loss 2.9761 LearningRate 0.0000 Epoch: 33 Global Step: 689680 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:30,137-Speed 6302.03 samples/sec Loss 3.0024 LearningRate 0.0000 Epoch: 33 Global Step: 689690 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:33,386-Speed 6303.16 samples/sec Loss 2.9639 LearningRate 0.0000 Epoch: 33 Global Step: 689700 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:28:36,627-Speed 6322.19 samples/sec Loss 2.9607 LearningRate 0.0000 Epoch: 33 Global Step: 689710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:39,880-Speed 6296.39 samples/sec Loss 3.0167 LearningRate 0.0000 Epoch: 33 Global Step: 689720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:43,125-Speed 6311.87 samples/sec Loss 3.0010 LearningRate 0.0000 Epoch: 33 Global Step: 689730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:46,376-Speed 6301.86 samples/sec Loss 3.0406 LearningRate 0.0000 Epoch: 33 Global Step: 689740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:49,628-Speed 6298.03 samples/sec Loss 2.9763 LearningRate 0.0000 Epoch: 33 Global Step: 689750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:52,886-Speed 6287.00 samples/sec Loss 2.9626 LearningRate 0.0000 Epoch: 33 Global Step: 689760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:56,141-Speed 6295.06 samples/sec Loss 2.9952 LearningRate 0.0000 Epoch: 33 Global Step: 689770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:28:59,394-Speed 6295.49 samples/sec Loss 2.9898 LearningRate 0.0000 Epoch: 33 Global 
Step: 689780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:02,653-Speed 6285.66 samples/sec Loss 2.9264 LearningRate 0.0000 Epoch: 33 Global Step: 689790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:05,898-Speed 6313.28 samples/sec Loss 2.9843 LearningRate 0.0000 Epoch: 33 Global Step: 689800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:09,143-Speed 6312.72 samples/sec Loss 2.9571 LearningRate 0.0000 Epoch: 33 Global Step: 689810 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:29:12,395-Speed 6300.68 samples/sec Loss 2.9474 LearningRate 0.0000 Epoch: 33 Global Step: 689820 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:29:15,643-Speed 6306.60 samples/sec Loss 2.9924 LearningRate 0.0000 Epoch: 33 Global Step: 689830 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:29:18,891-Speed 6305.99 samples/sec Loss 2.9917 LearningRate 0.0000 Epoch: 33 Global Step: 689840 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:29:22,127-Speed 6330.07 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 33 Global Step: 689850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:25,376-Speed 6305.90 samples/sec Loss 3.0011 LearningRate 0.0000 Epoch: 33 Global Step: 689860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:28,623-Speed 6308.85 samples/sec Loss 2.9709 LearningRate 0.0000 Epoch: 33 Global Step: 689870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:31,874-Speed 6299.50 samples/sec Loss 2.9641 LearningRate 0.0000 Epoch: 33 Global Step: 689880 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:35,127-Speed 6298.84 samples/sec Loss 3.0154 LearningRate 0.0000 Epoch: 33 Global Step: 689890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:38,428-Speed 6204.30 samples/sec Loss 3.0464 LearningRate 0.0000 Epoch: 33 Global Step: 689900 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:29:41,681-Speed 6297.60 samples/sec Loss 2.9488 LearningRate 0.0000 Epoch: 33 Global Step: 689910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:44,924-Speed 6315.96 samples/sec Loss 2.9583 LearningRate 0.0000 Epoch: 33 Global Step: 689920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:48,182-Speed 6288.68 samples/sec Loss 2.9654 LearningRate 0.0000 Epoch: 33 Global Step: 689930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:51,433-Speed 6301.41 samples/sec Loss 2.9733 LearningRate 0.0000 Epoch: 33 Global Step: 689940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:29:54,677-Speed 6314.19 samples/sec Loss 2.9812 LearningRate 0.0000 Epoch: 33 Global Step: 689950 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:29:57,923-Speed 6309.90 samples/sec Loss 3.0224 LearningRate 0.0000 Epoch: 33 Global Step: 689960 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:01,179-Speed 6290.94 samples/sec Loss 2.9683 LearningRate 0.0000 Epoch: 33 Global Step: 689970 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:04,437-Speed 6288.54 samples/sec Loss 3.0102 LearningRate 0.0000 Epoch: 33 Global Step: 689980 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:07,690-Speed 6297.13 samples/sec Loss 2.9889 LearningRate 0.0000 Epoch: 33 Global Step: 689990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:10,941-Speed 6299.64 samples/sec Loss 2.9657 LearningRate 0.0000 Epoch: 33 Global Step: 690000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:14,201-Speed 6284.25 samples/sec Loss 2.9843 LearningRate 0.0000 Epoch: 33 Global Step: 690010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:17,456-Speed 6293.80 samples/sec Loss 2.9929 LearningRate 0.0000 Epoch: 33 Global Step: 690020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:30:20,708-Speed 6299.24 samples/sec Loss 2.9306 LearningRate 0.0000 Epoch: 33 Global Step: 690030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:23,960-Speed 6300.19 samples/sec Loss 2.9852 LearningRate 0.0000 Epoch: 33 Global Step: 690040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:27,191-Speed 6338.01 samples/sec Loss 2.9841 LearningRate 0.0000 Epoch: 33 Global Step: 690050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:30,437-Speed 6311.68 samples/sec Loss 2.9876 LearningRate 0.0000 Epoch: 33 Global Step: 690060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:30:33,678-Speed 6321.23 samples/sec Loss 3.0385 LearningRate 0.0000 Epoch: 33 Global Step: 690070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:36,923-Speed 6311.51 samples/sec Loss 2.9235 LearningRate 0.0000 Epoch: 33 Global Step: 690080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:40,166-Speed 6315.92 samples/sec Loss 2.9851 LearningRate 0.0000 Epoch: 33 Global Step: 690090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:43,409-Speed 6317.46 samples/sec Loss 3.0209 LearningRate 0.0000 Epoch: 33 Global Step: 690100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:46,654-Speed 6311.81 samples/sec Loss 2.9954 LearningRate 0.0000 Epoch: 33 Global Step: 690110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:49,902-Speed 6308.32 samples/sec Loss 2.9710 LearningRate 0.0000 Epoch: 33 Global Step: 690120 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:53,151-Speed 6303.83 samples/sec Loss 3.0256 LearningRate 0.0000 Epoch: 33 Global Step: 690130 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:56,407-Speed 6291.36 samples/sec Loss 3.0037 LearningRate 0.0000 Epoch: 33 Global Step: 690140 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:30:59,661-Speed 6295.08 samples/sec Loss 
2.9389 LearningRate 0.0000 Epoch: 33 Global Step: 690150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:02,918-Speed 6290.59 samples/sec Loss 2.9477 LearningRate 0.0000 Epoch: 33 Global Step: 690160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:06,168-Speed 6301.77 samples/sec Loss 2.9739 LearningRate 0.0000 Epoch: 33 Global Step: 690170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:09,408-Speed 6323.20 samples/sec Loss 2.9170 LearningRate 0.0000 Epoch: 33 Global Step: 690180 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:12,654-Speed 6310.99 samples/sec Loss 2.9751 LearningRate 0.0000 Epoch: 33 Global Step: 690190 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:15,897-Speed 6314.71 samples/sec Loss 3.0466 LearningRate 0.0000 Epoch: 33 Global Step: 690200 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:19,152-Speed 6294.45 samples/sec Loss 2.9672 LearningRate 0.0000 Epoch: 33 Global Step: 690210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:22,400-Speed 6307.50 samples/sec Loss 2.9997 LearningRate 0.0000 Epoch: 33 Global Step: 690220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:25,653-Speed 6296.86 samples/sec Loss 2.9785 LearningRate 0.0000 Epoch: 33 Global Step: 690230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:28,900-Speed 6308.82 samples/sec Loss 2.9858 LearningRate 0.0000 Epoch: 33 Global Step: 690240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:31:32,134-Speed 6335.00 samples/sec Loss 3.0261 LearningRate 0.0000 Epoch: 33 Global Step: 690250 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:35,394-Speed 6282.67 samples/sec Loss 2.9559 LearningRate 0.0000 Epoch: 33 Global Step: 690260 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:38,650-Speed 6292.57 samples/sec Loss 3.0191 LearningRate 0.0000 Epoch: 33 Global 
Step: 690270 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:41,897-Speed 6307.86 samples/sec Loss 2.9610 LearningRate 0.0000 Epoch: 33 Global Step: 690280 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:45,150-Speed 6297.01 samples/sec Loss 3.0146 LearningRate 0.0000 Epoch: 33 Global Step: 690290 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:48,398-Speed 6307.34 samples/sec Loss 3.0082 LearningRate 0.0000 Epoch: 33 Global Step: 690300 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:51,649-Speed 6300.81 samples/sec Loss 3.0151 LearningRate 0.0000 Epoch: 33 Global Step: 690310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:54,901-Speed 6298.74 samples/sec Loss 2.9959 LearningRate 0.0000 Epoch: 33 Global Step: 690320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:31:58,152-Speed 6301.23 samples/sec Loss 2.9710 LearningRate 0.0000 Epoch: 33 Global Step: 690330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:01,405-Speed 6297.24 samples/sec Loss 3.0047 LearningRate 0.0000 Epoch: 33 Global Step: 690340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:04,660-Speed 6293.20 samples/sec Loss 2.9998 LearningRate 0.0000 Epoch: 33 Global Step: 690350 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:07,904-Speed 6313.61 samples/sec Loss 3.0041 LearningRate 0.0000 Epoch: 33 Global Step: 690360 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:11,153-Speed 6305.16 samples/sec Loss 2.9229 LearningRate 0.0000 Epoch: 33 Global Step: 690370 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:14,412-Speed 6285.99 samples/sec Loss 2.9998 LearningRate 0.0000 Epoch: 33 Global Step: 690380 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:17,684-Speed 6260.77 samples/sec Loss 3.0174 LearningRate 0.0000 Epoch: 33 Global Step: 690390 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:32:20,937-Speed 6297.06 samples/sec Loss 2.9488 LearningRate 0.0000 Epoch: 33 Global Step: 690400 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:24,185-Speed 6305.98 samples/sec Loss 2.9805 LearningRate 0.0000 Epoch: 33 Global Step: 690410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:27,472-Speed 6233.19 samples/sec Loss 2.9516 LearningRate 0.0000 Epoch: 33 Global Step: 690420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:30,726-Speed 6294.75 samples/sec Loss 2.9260 LearningRate 0.0000 Epoch: 33 Global Step: 690430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:33,969-Speed 6316.89 samples/sec Loss 2.9842 LearningRate 0.0000 Epoch: 33 Global Step: 690440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:37,203-Speed 6335.00 samples/sec Loss 3.0044 LearningRate 0.0000 Epoch: 33 Global Step: 690450 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:32:40,433-Speed 6341.53 samples/sec Loss 3.0406 LearningRate 0.0000 Epoch: 33 Global Step: 690460 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:43,682-Speed 6304.79 samples/sec Loss 2.9625 LearningRate 0.0000 Epoch: 33 Global Step: 690470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:46,933-Speed 6301.97 samples/sec Loss 2.9858 LearningRate 0.0000 Epoch: 33 Global Step: 690480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:50,182-Speed 6303.68 samples/sec Loss 2.9398 LearningRate 0.0000 Epoch: 33 Global Step: 690490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:53,430-Speed 6307.12 samples/sec Loss 3.0403 LearningRate 0.0000 Epoch: 33 Global Step: 690500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:32:56,687-Speed 6288.56 samples/sec Loss 3.0112 LearningRate 0.0000 Epoch: 33 Global Step: 690510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:32:59,947-Speed 6285.18 samples/sec Loss 3.0205 LearningRate 0.0000 Epoch: 33 Global Step: 690520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:03,195-Speed 6305.82 samples/sec Loss 2.9861 LearningRate 0.0000 Epoch: 33 Global Step: 690530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:06,445-Speed 6302.76 samples/sec Loss 3.0128 LearningRate 0.0000 Epoch: 33 Global Step: 690540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:09,710-Speed 6274.43 samples/sec Loss 2.9347 LearningRate 0.0000 Epoch: 33 Global Step: 690550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:12,946-Speed 6330.75 samples/sec Loss 2.9859 LearningRate 0.0000 Epoch: 33 Global Step: 690560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:16,193-Speed 6307.92 samples/sec Loss 2.9912 LearningRate 0.0000 Epoch: 33 Global Step: 690570 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:19,435-Speed 6319.20 samples/sec Loss 2.9728 LearningRate 0.0000 Epoch: 33 Global Step: 690580 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:22,711-Speed 6251.97 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 33 Global Step: 690590 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:25,968-Speed 6289.78 samples/sec Loss 3.0233 LearningRate 0.0000 Epoch: 33 Global Step: 690600 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:29,218-Speed 6302.28 samples/sec Loss 2.9932 LearningRate 0.0000 Epoch: 33 Global Step: 690610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:32,474-Speed 6291.07 samples/sec Loss 3.0193 LearningRate 0.0000 Epoch: 33 Global Step: 690620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:35,721-Speed 6309.57 samples/sec Loss 2.9245 LearningRate 0.0000 Epoch: 33 Global Step: 690630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:38,980-Speed 6286.09 samples/sec Loss 
2.9672 LearningRate 0.0000 Epoch: 33 Global Step: 690640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:42,231-Speed 6301.44 samples/sec Loss 2.9421 LearningRate 0.0000 Epoch: 33 Global Step: 690650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:45,474-Speed 6316.05 samples/sec Loss 2.9726 LearningRate 0.0000 Epoch: 33 Global Step: 690660 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:33:48,719-Speed 6313.68 samples/sec Loss 2.9691 LearningRate 0.0000 Epoch: 33 Global Step: 690670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:51,971-Speed 6299.03 samples/sec Loss 2.9495 LearningRate 0.0000 Epoch: 33 Global Step: 690680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:55,224-Speed 6296.96 samples/sec Loss 2.9965 LearningRate 0.0000 Epoch: 33 Global Step: 690690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:33:58,469-Speed 6311.81 samples/sec Loss 3.0029 LearningRate 0.0000 Epoch: 33 Global Step: 690700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:01,720-Speed 6301.99 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global Step: 690710 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:04,976-Speed 6291.35 samples/sec Loss 3.0419 LearningRate 0.0000 Epoch: 33 Global Step: 690720 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:08,230-Speed 6295.09 samples/sec Loss 2.9747 LearningRate 0.0000 Epoch: 33 Global Step: 690730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:11,475-Speed 6311.99 samples/sec Loss 3.0442 LearningRate 0.0000 Epoch: 33 Global Step: 690740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:14,723-Speed 6306.65 samples/sec Loss 2.9950 LearningRate 0.0000 Epoch: 33 Global Step: 690750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:17,997-Speed 6257.93 samples/sec Loss 2.9094 LearningRate 0.0000 Epoch: 33 Global 
Step: 690760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:21,245-Speed 6305.25 samples/sec Loss 2.9868 LearningRate 0.0000 Epoch: 33 Global Step: 690770 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:24,498-Speed 6297.45 samples/sec Loss 3.0151 LearningRate 0.0000 Epoch: 33 Global Step: 690780 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:27,752-Speed 6295.82 samples/sec Loss 2.9418 LearningRate 0.0000 Epoch: 33 Global Step: 690790 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:31,009-Speed 6288.42 samples/sec Loss 2.9882 LearningRate 0.0000 Epoch: 33 Global Step: 690800 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:34,260-Speed 6301.69 samples/sec Loss 2.9475 LearningRate 0.0000 Epoch: 33 Global Step: 690810 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:37,508-Speed 6307.17 samples/sec Loss 2.9515 LearningRate 0.0000 Epoch: 33 Global Step: 690820 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:40,762-Speed 6295.17 samples/sec Loss 2.9919 LearningRate 0.0000 Epoch: 33 Global Step: 690830 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:44,011-Speed 6304.18 samples/sec Loss 2.9417 LearningRate 0.0000 Epoch: 33 Global Step: 690840 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:34:47,245-Speed 6333.95 samples/sec Loss 2.9907 LearningRate 0.0000 Epoch: 33 Global Step: 690850 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:50,507-Speed 6281.79 samples/sec Loss 2.9321 LearningRate 0.0000 Epoch: 33 Global Step: 690860 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:53,755-Speed 6306.87 samples/sec Loss 3.0219 LearningRate 0.0000 Epoch: 33 Global Step: 690870 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:34:57,007-Speed 6297.71 samples/sec Loss 2.9714 LearningRate 0.0000 Epoch: 33 Global Step: 690880 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:35:00,261-Speed 6296.97 samples/sec Loss 2.9721 LearningRate 0.0000 Epoch: 33 Global Step: 690890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:03,515-Speed 6295.19 samples/sec Loss 2.9899 LearningRate 0.0000 Epoch: 33 Global Step: 690900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:06,772-Speed 6287.95 samples/sec Loss 2.9567 LearningRate 0.0000 Epoch: 33 Global Step: 690910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:10,019-Speed 6308.42 samples/sec Loss 2.9549 LearningRate 0.0000 Epoch: 33 Global Step: 690920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:13,266-Speed 6309.42 samples/sec Loss 3.0096 LearningRate 0.0000 Epoch: 33 Global Step: 690930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:16,517-Speed 6301.26 samples/sec Loss 3.0289 LearningRate 0.0000 Epoch: 33 Global Step: 690940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:35:19,771-Speed 6295.14 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 690950 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:23,021-Speed 6302.45 samples/sec Loss 2.9393 LearningRate 0.0000 Epoch: 33 Global Step: 690960 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:26,268-Speed 6308.86 samples/sec Loss 2.9980 LearningRate 0.0000 Epoch: 33 Global Step: 690970 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:29,526-Speed 6287.90 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 33 Global Step: 690980 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:32,768-Speed 6317.97 samples/sec Loss 2.9030 LearningRate 0.0000 Epoch: 33 Global Step: 690990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:36,019-Speed 6301.29 samples/sec Loss 2.9956 LearningRate 0.0000 Epoch: 33 Global Step: 691000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 
06:35:39,269-Speed 6302.83 samples/sec Loss 2.9963 LearningRate 0.0000 Epoch: 33 Global Step: 691010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:42,515-Speed 6310.33 samples/sec Loss 2.9947 LearningRate 0.0000 Epoch: 33 Global Step: 691020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:45,770-Speed 6294.85 samples/sec Loss 2.9477 LearningRate 0.0000 Epoch: 33 Global Step: 691030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:49,020-Speed 6302.31 samples/sec Loss 2.9415 LearningRate 0.0000 Epoch: 33 Global Step: 691040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:52,252-Speed 6337.04 samples/sec Loss 3.0363 LearningRate 0.0000 Epoch: 33 Global Step: 691050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:55,511-Speed 6286.45 samples/sec Loss 2.9865 LearningRate 0.0000 Epoch: 33 Global Step: 691060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:35:58,770-Speed 6285.92 samples/sec Loss 3.0165 LearningRate 0.0000 Epoch: 33 Global Step: 691070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:02,021-Speed 6300.75 samples/sec Loss 2.9866 LearningRate 0.0000 Epoch: 33 Global Step: 691080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:05,276-Speed 6293.99 samples/sec Loss 2.9609 LearningRate 0.0000 Epoch: 33 Global Step: 691090 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:08,524-Speed 6306.86 samples/sec Loss 2.9774 LearningRate 0.0000 Epoch: 33 Global Step: 691100 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:11,759-Speed 6332.99 samples/sec Loss 3.0574 LearningRate 0.0000 Epoch: 33 Global Step: 691110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:15,003-Speed 6314.16 samples/sec Loss 2.9634 LearningRate 0.0000 Epoch: 33 Global Step: 691120 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:18,255-Speed 6299.58 samples/sec Loss 
2.9364 LearningRate 0.0000 Epoch: 33 Global Step: 691130 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:21,496-Speed 6318.46 samples/sec Loss 2.9853 LearningRate 0.0000 Epoch: 33 Global Step: 691140 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:24,746-Speed 6302.98 samples/sec Loss 2.9571 LearningRate 0.0000 Epoch: 33 Global Step: 691150 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:27,990-Speed 6314.64 samples/sec Loss 2.9659 LearningRate 0.0000 Epoch: 33 Global Step: 691160 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:31,239-Speed 6304.90 samples/sec Loss 2.9695 LearningRate 0.0000 Epoch: 33 Global Step: 691170 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:34,484-Speed 6313.65 samples/sec Loss 2.9597 LearningRate 0.0000 Epoch: 33 Global Step: 691180 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:37,731-Speed 6307.73 samples/sec Loss 2.9923 LearningRate 0.0000 Epoch: 33 Global Step: 691190 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:40,979-Speed 6306.62 samples/sec Loss 2.9547 LearningRate 0.0000 Epoch: 33 Global Step: 691200 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:36:44,228-Speed 6305.42 samples/sec Loss 2.9585 LearningRate 0.0000 Epoch: 33 Global Step: 691210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:47,477-Speed 6306.16 samples/sec Loss 3.0216 LearningRate 0.0000 Epoch: 33 Global Step: 691220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:50,726-Speed 6303.33 samples/sec Loss 2.9711 LearningRate 0.0000 Epoch: 33 Global Step: 691230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:53,975-Speed 6305.70 samples/sec Loss 2.9981 LearningRate 0.0000 Epoch: 33 Global Step: 691240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:36:57,221-Speed 6310.56 samples/sec Loss 3.0075 LearningRate 0.0000 Epoch: 33 Global 
Step: 691250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:00,473-Speed 6299.51 samples/sec Loss 2.9746 LearningRate 0.0000 Epoch: 33 Global Step: 691260 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:03,727-Speed 6295.86 samples/sec Loss 2.9869 LearningRate 0.0000 Epoch: 33 Global Step: 691270 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:06,983-Speed 6290.42 samples/sec Loss 2.9858 LearningRate 0.0000 Epoch: 33 Global Step: 691280 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:10,233-Speed 6304.19 samples/sec Loss 2.9830 LearningRate 0.0000 Epoch: 33 Global Step: 691290 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:13,486-Speed 6295.72 samples/sec Loss 2.9792 LearningRate 0.0000 Epoch: 33 Global Step: 691300 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:16,727-Speed 6320.97 samples/sec Loss 2.9936 LearningRate 0.0000 Epoch: 33 Global Step: 691310 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:19,980-Speed 6296.33 samples/sec Loss 2.9702 LearningRate 0.0000 Epoch: 33 Global Step: 691320 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:23,226-Speed 6311.61 samples/sec Loss 2.9484 LearningRate 0.0000 Epoch: 33 Global Step: 691330 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:26,471-Speed 6312.28 samples/sec Loss 2.9820 LearningRate 0.0000 Epoch: 33 Global Step: 691340 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:29,727-Speed 6292.38 samples/sec Loss 2.9917 LearningRate 0.0000 Epoch: 33 Global Step: 691350 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:32,980-Speed 6296.70 samples/sec Loss 2.9805 LearningRate 0.0000 Epoch: 33 Global Step: 691360 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:36,229-Speed 6305.12 samples/sec Loss 2.9250 LearningRate 0.0000 Epoch: 33 Global Step: 691370 Fp16 Grad Scale: 4096 
Required: 13 hours Training: 2022-04-03 06:37:39,477-Speed 6306.80 samples/sec Loss 2.9988 LearningRate 0.0000 Epoch: 33 Global Step: 691380 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:42,723-Speed 6310.20 samples/sec Loss 3.0028 LearningRate 0.0000 Epoch: 33 Global Step: 691390 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:45,971-Speed 6305.95 samples/sec Loss 2.9640 LearningRate 0.0000 Epoch: 33 Global Step: 691400 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:37:49,224-Speed 6297.08 samples/sec Loss 2.9552 LearningRate 0.0000 Epoch: 33 Global Step: 691410 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:52,473-Speed 6305.28 samples/sec Loss 2.9576 LearningRate 0.0000 Epoch: 33 Global Step: 691420 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:55,720-Speed 6308.97 samples/sec Loss 2.9854 LearningRate 0.0000 Epoch: 33 Global Step: 691430 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:37:58,972-Speed 6298.35 samples/sec Loss 2.9536 LearningRate 0.0000 Epoch: 33 Global Step: 691440 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:02,318-Speed 6122.22 samples/sec Loss 2.9789 LearningRate 0.0000 Epoch: 33 Global Step: 691450 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:05,623-Speed 6199.13 samples/sec Loss 2.9400 LearningRate 0.0000 Epoch: 33 Global Step: 691460 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:08,859-Speed 6330.93 samples/sec Loss 3.0242 LearningRate 0.0000 Epoch: 33 Global Step: 691470 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:12,105-Speed 6310.49 samples/sec Loss 3.0276 LearningRate 0.0000 Epoch: 33 Global Step: 691480 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:15,358-Speed 6296.09 samples/sec Loss 2.9926 LearningRate 0.0000 Epoch: 33 Global Step: 691490 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:38:18,613-Speed 6295.07 samples/sec Loss 2.9860 LearningRate 0.0000 Epoch: 33 Global Step: 691500 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:21,868-Speed 6291.49 samples/sec Loss 3.0063 LearningRate 0.0000 Epoch: 33 Global Step: 691510 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:25,124-Speed 6292.82 samples/sec Loss 2.9633 LearningRate 0.0000 Epoch: 33 Global Step: 691520 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:28,379-Speed 6292.31 samples/sec Loss 3.0013 LearningRate 0.0000 Epoch: 33 Global Step: 691530 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:31,637-Speed 6287.82 samples/sec Loss 3.0074 LearningRate 0.0000 Epoch: 33 Global Step: 691540 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:34,902-Speed 6272.97 samples/sec Loss 2.9325 LearningRate 0.0000 Epoch: 33 Global Step: 691550 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:38,153-Speed 6302.92 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 33 Global Step: 691560 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:41,398-Speed 6312.29 samples/sec Loss 3.0228 LearningRate 0.0000 Epoch: 33 Global Step: 691570 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:44,643-Speed 6312.28 samples/sec Loss 2.9594 LearningRate 0.0000 Epoch: 33 Global Step: 691580 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:47,887-Speed 6314.50 samples/sec Loss 3.0018 LearningRate 0.0000 Epoch: 33 Global Step: 691590 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:51,147-Speed 6283.69 samples/sec Loss 2.9511 LearningRate 0.0000 Epoch: 33 Global Step: 691600 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:38:54,388-Speed 6321.63 samples/sec Loss 2.9627 LearningRate 0.0000 Epoch: 33 Global Step: 691610 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:38:57,633-Speed 6311.41 samples/sec Loss 
3.0112 LearningRate 0.0000 Epoch: 33 Global Step: 691620 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:00,880-Speed 6308.65 samples/sec Loss 2.9577 LearningRate 0.0000 Epoch: 33 Global Step: 691630 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:04,131-Speed 6300.67 samples/sec Loss 2.9648 LearningRate 0.0000 Epoch: 33 Global Step: 691640 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:07,381-Speed 6304.42 samples/sec Loss 3.0013 LearningRate 0.0000 Epoch: 33 Global Step: 691650 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:10,632-Speed 6301.55 samples/sec Loss 2.9333 LearningRate 0.0000 Epoch: 33 Global Step: 691660 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:13,885-Speed 6295.76 samples/sec Loss 2.9650 LearningRate 0.0000 Epoch: 33 Global Step: 691670 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:17,135-Speed 6304.20 samples/sec Loss 2.9930 LearningRate 0.0000 Epoch: 33 Global Step: 691680 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:20,383-Speed 6307.29 samples/sec Loss 2.9979 LearningRate 0.0000 Epoch: 33 Global Step: 691690 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:23,634-Speed 6302.53 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 691700 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:26,887-Speed 6297.91 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 691710 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:39:30,140-Speed 6295.85 samples/sec Loss 2.9698 LearningRate 0.0000 Epoch: 33 Global Step: 691720 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:39:33,377-Speed 6329.15 samples/sec Loss 2.9806 LearningRate 0.0000 Epoch: 33 Global Step: 691730 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:36,623-Speed 6310.81 samples/sec Loss 3.0006 LearningRate 0.0000 Epoch: 33 Global 
Step: 691740 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:39,877-Speed 6295.47 samples/sec Loss 2.9538 LearningRate 0.0000 Epoch: 33 Global Step: 691750 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:43,125-Speed 6305.63 samples/sec Loss 2.9641 LearningRate 0.0000 Epoch: 33 Global Step: 691760 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:46,376-Speed 6302.38 samples/sec Loss 2.9886 LearningRate 0.0000 Epoch: 33 Global Step: 691770 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:49,621-Speed 6310.91 samples/sec Loss 2.9677 LearningRate 0.0000 Epoch: 33 Global Step: 691780 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:52,871-Speed 6303.09 samples/sec Loss 2.9849 LearningRate 0.0000 Epoch: 33 Global Step: 691790 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:56,120-Speed 6305.24 samples/sec Loss 2.9842 LearningRate 0.0000 Epoch: 33 Global Step: 691800 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:39:59,372-Speed 6300.08 samples/sec Loss 3.0715 LearningRate 0.0000 Epoch: 33 Global Step: 691810 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:02,632-Speed 6283.32 samples/sec Loss 2.9533 LearningRate 0.0000 Epoch: 33 Global Step: 691820 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:05,883-Speed 6301.04 samples/sec Loss 2.9708 LearningRate 0.0000 Epoch: 33 Global Step: 691830 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:40:09,142-Speed 6283.81 samples/sec Loss 2.9466 LearningRate 0.0000 Epoch: 33 Global Step: 691840 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:40:12,397-Speed 6293.82 samples/sec Loss 2.9129 LearningRate 0.0000 Epoch: 33 Global Step: 691850 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:40:15,648-Speed 6301.31 samples/sec Loss 2.9608 LearningRate 0.0000 Epoch: 33 Global Step: 691860 Fp16 Grad Scale: 8192 
Required: 13 hours Training: 2022-04-03 06:40:18,898-Speed 6304.16 samples/sec Loss 2.9616 LearningRate 0.0000 Epoch: 33 Global Step: 691870 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:40:22,143-Speed 6312.26 samples/sec Loss 2.9957 LearningRate 0.0000 Epoch: 33 Global Step: 691880 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:40:25,378-Speed 6332.80 samples/sec Loss 2.9615 LearningRate 0.0000 Epoch: 33 Global Step: 691890 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:28,630-Speed 6298.56 samples/sec Loss 2.9897 LearningRate 0.0000 Epoch: 33 Global Step: 691900 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:31,885-Speed 6293.37 samples/sec Loss 2.9255 LearningRate 0.0000 Epoch: 33 Global Step: 691910 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:35,143-Speed 6286.98 samples/sec Loss 2.9914 LearningRate 0.0000 Epoch: 33 Global Step: 691920 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:38,393-Speed 6303.15 samples/sec Loss 2.9424 LearningRate 0.0000 Epoch: 33 Global Step: 691930 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:41,637-Speed 6314.71 samples/sec Loss 2.9495 LearningRate 0.0000 Epoch: 33 Global Step: 691940 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:44,886-Speed 6305.48 samples/sec Loss 2.9734 LearningRate 0.0000 Epoch: 33 Global Step: 691950 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:48,141-Speed 6293.13 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 33 Global Step: 691960 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:51,412-Speed 6261.66 samples/sec Loss 2.9568 LearningRate 0.0000 Epoch: 33 Global Step: 691970 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:40:54,668-Speed 6291.00 samples/sec Loss 2.9615 LearningRate 0.0000 Epoch: 33 Global Step: 691980 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 
06:40:57,916-Speed 6306.81 samples/sec Loss 2.9538 LearningRate 0.0000 Epoch: 33 Global Step: 691990 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:41:01,164-Speed 6308.05 samples/sec Loss 2.9410 LearningRate 0.0000 Epoch: 33 Global Step: 692000 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:41:04,414-Speed 6301.42 samples/sec Loss 2.9444 LearningRate 0.0000 Epoch: 33 Global Step: 692010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:41:07,647-Speed 6336.80 samples/sec Loss 2.9863 LearningRate 0.0000 Epoch: 33 Global Step: 692020 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:10,896-Speed 6303.98 samples/sec Loss 3.0369 LearningRate 0.0000 Epoch: 33 Global Step: 692030 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:14,147-Speed 6300.74 samples/sec Loss 3.0118 LearningRate 0.0000 Epoch: 33 Global Step: 692040 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:17,398-Speed 6301.88 samples/sec Loss 2.9402 LearningRate 0.0000 Epoch: 33 Global Step: 692050 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:20,646-Speed 6307.64 samples/sec Loss 2.9848 LearningRate 0.0000 Epoch: 33 Global Step: 692060 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:23,893-Speed 6309.55 samples/sec Loss 2.9774 LearningRate 0.0000 Epoch: 33 Global Step: 692070 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:27,149-Speed 6290.80 samples/sec Loss 2.9671 LearningRate 0.0000 Epoch: 33 Global Step: 692080 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:30,397-Speed 6306.36 samples/sec Loss 2.9664 LearningRate 0.0000 Epoch: 33 Global Step: 692090 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:33,649-Speed 6298.57 samples/sec Loss 2.9223 LearningRate 0.0000 Epoch: 33 Global Step: 692100 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:36,908-Speed 6286.83 samples/sec Loss 
3.0159 LearningRate 0.0000 Epoch: 33 Global Step: 692110 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-03 06:41:40,163-Speed 6292.00 samples/sec Loss 2.9467 LearningRate 0.0000 Epoch: 33 Global Step: 692120 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-03 06:41:43,413-Speed 6303.98 samples/sec Loss 2.9524 LearningRate 0.0000 Epoch: 33 Global Step: 692130 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:41:46,665-Speed 6299.68 samples/sec Loss 2.9929 LearningRate 0.0000 Epoch: 33 Global Step: 692140 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:41:49,912-Speed 6307.79 samples/sec Loss 2.9857 LearningRate 0.0000 Epoch: 33 Global Step: 692150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:41:53,196-Speed 6238.29 samples/sec Loss 2.9450 LearningRate 0.0000 Epoch: 33 Global Step: 692160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:41:56,446-Speed 6301.91 samples/sec Loss 2.9469 LearningRate 0.0000 Epoch: 33 Global Step: 692170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:41:59,691-Speed 6313.04 samples/sec Loss 2.9538 LearningRate 0.0000 Epoch: 33 Global Step: 692180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:02,944-Speed 6297.40 samples/sec Loss 2.9317 LearningRate 0.0000 Epoch: 33 Global Step: 692190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:06,191-Speed 6308.83 samples/sec Loss 2.9193 LearningRate 0.0000 Epoch: 33 Global Step: 692200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:09,440-Speed 6304.64 samples/sec Loss 2.9924 LearningRate 0.0000 Epoch: 33 Global Step: 692210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:12,689-Speed 6303.91 samples/sec Loss 3.0126 LearningRate 0.0000 Epoch: 33 Global Step: 692220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:15,942-Speed 6298.81 samples/sec Loss 3.0003 LearningRate 0.0000 Epoch: 33 Global 
Step: 692230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:19,195-Speed 6296.32 samples/sec Loss 2.9443 LearningRate 0.0000 Epoch: 33 Global Step: 692240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:22,448-Speed 6296.25 samples/sec Loss 2.9406 LearningRate 0.0000 Epoch: 33 Global Step: 692250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:42:25,705-Speed 6289.76 samples/sec Loss 2.9781 LearningRate 0.0000 Epoch: 33 Global Step: 692260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:42:28,964-Speed 6287.55 samples/sec Loss 2.9805 LearningRate 0.0000 Epoch: 33 Global Step: 692270 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:42:32,213-Speed 6304.03 samples/sec Loss 2.9846 LearningRate 0.0000 Epoch: 33 Global Step: 692280 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:42:35,451-Speed 6327.14 samples/sec Loss 3.0052 LearningRate 0.0000 Epoch: 33 Global Step: 692290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:38,707-Speed 6291.18 samples/sec Loss 2.9861 LearningRate 0.0000 Epoch: 33 Global Step: 692300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:41,963-Speed 6291.41 samples/sec Loss 2.9552 LearningRate 0.0000 Epoch: 33 Global Step: 692310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:45,216-Speed 6297.73 samples/sec Loss 2.9577 LearningRate 0.0000 Epoch: 33 Global Step: 692320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:48,469-Speed 6296.70 samples/sec Loss 2.9884 LearningRate 0.0000 Epoch: 33 Global Step: 692330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:51,723-Speed 6295.07 samples/sec Loss 2.9696 LearningRate 0.0000 Epoch: 33 Global Step: 692340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:42:54,973-Speed 6302.80 samples/sec Loss 3.0083 LearningRate 0.0000 Epoch: 33 Global Step: 692350 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:42:58,233-Speed 6283.50 samples/sec Loss 2.9340 LearningRate 0.0000 Epoch: 33 Global Step: 692360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:01,487-Speed 6294.86 samples/sec Loss 2.9488 LearningRate 0.0000 Epoch: 33 Global Step: 692370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:04,740-Speed 6297.11 samples/sec Loss 3.0241 LearningRate 0.0000 Epoch: 33 Global Step: 692380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:07,993-Speed 6297.78 samples/sec Loss 2.9708 LearningRate 0.0000 Epoch: 33 Global Step: 692390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:43:11,249-Speed 6290.16 samples/sec Loss 2.9648 LearningRate 0.0000 Epoch: 33 Global Step: 692400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:43:14,507-Speed 6287.83 samples/sec Loss 3.0066 LearningRate 0.0000 Epoch: 33 Global Step: 692410 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:43:17,760-Speed 6296.90 samples/sec Loss 2.9893 LearningRate 0.0000 Epoch: 33 Global Step: 692420 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:43:20,998-Speed 6326.15 samples/sec Loss 2.9626 LearningRate 0.0000 Epoch: 33 Global Step: 692430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:24,247-Speed 6305.65 samples/sec Loss 2.9855 LearningRate 0.0000 Epoch: 33 Global Step: 692440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:27,503-Speed 6291.08 samples/sec Loss 2.9229 LearningRate 0.0000 Epoch: 33 Global Step: 692450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:30,752-Speed 6304.11 samples/sec Loss 2.9673 LearningRate 0.0000 Epoch: 33 Global Step: 692460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:34,004-Speed 6300.74 samples/sec Loss 3.0188 LearningRate 0.0000 Epoch: 33 Global Step: 692470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
06:43:37,255-Speed 6300.42 samples/sec Loss 3.0513 LearningRate 0.0000 Epoch: 33 Global Step: 692480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:40,500-Speed 6314.54 samples/sec Loss 2.9939 LearningRate 0.0000 Epoch: 33 Global Step: 692490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:43,752-Speed 6297.28 samples/sec Loss 2.9461 LearningRate 0.0000 Epoch: 33 Global Step: 692500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:47,003-Speed 6301.43 samples/sec Loss 2.9377 LearningRate 0.0000 Epoch: 33 Global Step: 692510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:50,249-Speed 6310.92 samples/sec Loss 2.9388 LearningRate 0.0000 Epoch: 33 Global Step: 692520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:53,504-Speed 6294.84 samples/sec Loss 3.0161 LearningRate 0.0000 Epoch: 33 Global Step: 692530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:43:56,745-Speed 6319.44 samples/sec Loss 3.0763 LearningRate 0.0000 Epoch: 33 Global Step: 692540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:43:59,998-Speed 6296.45 samples/sec Loss 2.9831 LearningRate 0.0000 Epoch: 33 Global Step: 692550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:03,279-Speed 6244.42 samples/sec Loss 2.9808 LearningRate 0.0000 Epoch: 33 Global Step: 692560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:06,534-Speed 6291.96 samples/sec Loss 3.0792 LearningRate 0.0000 Epoch: 33 Global Step: 692570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:09,781-Speed 6309.59 samples/sec Loss 2.9181 LearningRate 0.0000 Epoch: 33 Global Step: 692580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:13,033-Speed 6298.47 samples/sec Loss 3.0503 LearningRate 0.0000 Epoch: 33 Global Step: 692590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:16,295-Speed 6278.85 samples/sec Loss 
3.0020 LearningRate 0.0000 Epoch: 33 Global Step: 692600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:19,558-Speed 6278.06 samples/sec Loss 2.9360 LearningRate 0.0000 Epoch: 33 Global Step: 692610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:22,810-Speed 6300.26 samples/sec Loss 3.0079 LearningRate 0.0000 Epoch: 33 Global Step: 692620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:26,066-Speed 6291.77 samples/sec Loss 2.9559 LearningRate 0.0000 Epoch: 33 Global Step: 692630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:29,313-Speed 6307.39 samples/sec Loss 3.0045 LearningRate 0.0000 Epoch: 33 Global Step: 692640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:32,568-Speed 6294.48 samples/sec Loss 2.8882 LearningRate 0.0000 Epoch: 33 Global Step: 692650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:35,813-Speed 6311.70 samples/sec Loss 2.9810 LearningRate 0.0000 Epoch: 33 Global Step: 692660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:39,066-Speed 6296.63 samples/sec Loss 2.9969 LearningRate 0.0000 Epoch: 33 Global Step: 692670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:42,320-Speed 6296.02 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 33 Global Step: 692680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:45,570-Speed 6302.93 samples/sec Loss 2.9624 LearningRate 0.0000 Epoch: 33 Global Step: 692690 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:48,825-Speed 6293.95 samples/sec Loss 2.9812 LearningRate 0.0000 Epoch: 33 Global Step: 692700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:44:52,058-Speed 6335.57 samples/sec Loss 2.9676 LearningRate 0.0000 Epoch: 33 Global Step: 692710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:55,307-Speed 6304.76 samples/sec Loss 2.9033 LearningRate 0.0000 Epoch: 33 Global 
Step: 692720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:44:58,551-Speed 6315.06 samples/sec Loss 2.9771 LearningRate 0.0000 Epoch: 33 Global Step: 692730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:01,810-Speed 6284.80 samples/sec Loss 3.0053 LearningRate 0.0000 Epoch: 33 Global Step: 692740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:05,066-Speed 6291.36 samples/sec Loss 2.9352 LearningRate 0.0000 Epoch: 33 Global Step: 692750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:08,322-Speed 6291.49 samples/sec Loss 2.9779 LearningRate 0.0000 Epoch: 33 Global Step: 692760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:11,579-Speed 6289.88 samples/sec Loss 2.9638 LearningRate 0.0000 Epoch: 33 Global Step: 692770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:14,831-Speed 6299.59 samples/sec Loss 2.9565 LearningRate 0.0000 Epoch: 33 Global Step: 692780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:18,078-Speed 6308.74 samples/sec Loss 2.9763 LearningRate 0.0000 Epoch: 33 Global Step: 692790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:21,325-Speed 6308.01 samples/sec Loss 2.9303 LearningRate 0.0000 Epoch: 33 Global Step: 692800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:24,556-Speed 6339.10 samples/sec Loss 2.9312 LearningRate 0.0000 Epoch: 33 Global Step: 692810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:27,822-Speed 6271.83 samples/sec Loss 2.9301 LearningRate 0.0000 Epoch: 33 Global Step: 692820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:31,077-Speed 6295.27 samples/sec Loss 2.9055 LearningRate 0.0000 Epoch: 33 Global Step: 692830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:34,329-Speed 6297.81 samples/sec Loss 2.9633 LearningRate 0.0000 Epoch: 33 Global Step: 692840 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:45:37,581-Speed 6298.52 samples/sec Loss 3.0212 LearningRate 0.0000 Epoch: 33 Global Step: 692850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:40,834-Speed 6297.63 samples/sec Loss 2.9890 LearningRate 0.0000 Epoch: 33 Global Step: 692860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:44,086-Speed 6299.68 samples/sec Loss 2.9514 LearningRate 0.0000 Epoch: 33 Global Step: 692870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:47,337-Speed 6300.10 samples/sec Loss 2.9670 LearningRate 0.0000 Epoch: 33 Global Step: 692880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:50,591-Speed 6294.73 samples/sec Loss 2.9483 LearningRate 0.0000 Epoch: 33 Global Step: 692890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:53,838-Speed 6309.02 samples/sec Loss 2.9709 LearningRate 0.0000 Epoch: 33 Global Step: 692900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:45:57,090-Speed 6299.79 samples/sec Loss 2.9421 LearningRate 0.0000 Epoch: 33 Global Step: 692910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:46:00,338-Speed 6307.44 samples/sec Loss 2.9957 LearningRate 0.0000 Epoch: 33 Global Step: 692920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:46:03,572-Speed 6334.63 samples/sec Loss 2.9820 LearningRate 0.0000 Epoch: 33 Global Step: 692930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:06,838-Speed 6272.37 samples/sec Loss 2.9687 LearningRate 0.0000 Epoch: 33 Global Step: 692940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:10,093-Speed 6293.52 samples/sec Loss 2.9431 LearningRate 0.0000 Epoch: 33 Global Step: 692950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:13,347-Speed 6294.07 samples/sec Loss 2.9871 LearningRate 0.0000 Epoch: 33 Global Step: 692960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
06:46:16,594-Speed 6310.02 samples/sec Loss 2.9494 LearningRate 0.0000 Epoch: 33 Global Step: 692970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:19,849-Speed 6292.52 samples/sec Loss 2.9606 LearningRate 0.0000 Epoch: 33 Global Step: 692980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:23,116-Speed 6271.04 samples/sec Loss 2.9944 LearningRate 0.0000 Epoch: 33 Global Step: 692990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:26,369-Speed 6295.65 samples/sec Loss 2.9492 LearningRate 0.0000 Epoch: 33 Global Step: 693000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:29,620-Speed 6302.46 samples/sec Loss 2.9389 LearningRate 0.0000 Epoch: 33 Global Step: 693010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:32,870-Speed 6302.40 samples/sec Loss 2.9566 LearningRate 0.0000 Epoch: 33 Global Step: 693020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:36,117-Speed 6307.88 samples/sec Loss 3.0302 LearningRate 0.0000 Epoch: 33 Global Step: 693030 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:46:39,421-Speed 6199.68 samples/sec Loss 3.0212 LearningRate 0.0000 Epoch: 33 Global Step: 693040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:42,687-Speed 6272.83 samples/sec Loss 2.9205 LearningRate 0.0000 Epoch: 33 Global Step: 693050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:45,941-Speed 6294.34 samples/sec Loss 3.0073 LearningRate 0.0000 Epoch: 33 Global Step: 693060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:49,199-Speed 6288.33 samples/sec Loss 2.9292 LearningRate 0.0000 Epoch: 33 Global Step: 693070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:52,456-Speed 6288.57 samples/sec Loss 2.9666 LearningRate 0.0000 Epoch: 33 Global Step: 693080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:55,715-Speed 6285.81 samples/sec Loss 
2.9537 LearningRate 0.0000 Epoch: 33 Global Step: 693090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:46:58,959-Speed 6315.91 samples/sec Loss 2.9911 LearningRate 0.0000 Epoch: 33 Global Step: 693100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:02,215-Speed 6290.57 samples/sec Loss 2.9371 LearningRate 0.0000 Epoch: 33 Global Step: 693110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:05,470-Speed 6294.05 samples/sec Loss 2.9977 LearningRate 0.0000 Epoch: 33 Global Step: 693120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:08,730-Speed 6283.20 samples/sec Loss 3.0041 LearningRate 0.0000 Epoch: 33 Global Step: 693130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:11,982-Speed 6299.07 samples/sec Loss 3.0052 LearningRate 0.0000 Epoch: 33 Global Step: 693140 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:47:15,221-Speed 6325.05 samples/sec Loss 3.0279 LearningRate 0.0000 Epoch: 33 Global Step: 693150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:18,486-Speed 6273.96 samples/sec Loss 2.9247 LearningRate 0.0000 Epoch: 33 Global Step: 693160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:21,737-Speed 6299.92 samples/sec Loss 2.9807 LearningRate 0.0000 Epoch: 33 Global Step: 693170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:24,990-Speed 6298.20 samples/sec Loss 2.9979 LearningRate 0.0000 Epoch: 33 Global Step: 693180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:28,245-Speed 6293.20 samples/sec Loss 2.9987 LearningRate 0.0000 Epoch: 33 Global Step: 693190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:31,494-Speed 6304.48 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 33 Global Step: 693200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:34,748-Speed 6294.97 samples/sec Loss 2.9770 LearningRate 0.0000 Epoch: 33 Global 
Step: 693210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:38,001-Speed 6297.75 samples/sec Loss 2.9335 LearningRate 0.0000 Epoch: 33 Global Step: 693220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:41,257-Speed 6291.01 samples/sec Loss 2.8978 LearningRate 0.0000 Epoch: 33 Global Step: 693230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:44,511-Speed 6295.91 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 693240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:47:47,758-Speed 6308.21 samples/sec Loss 3.0030 LearningRate 0.0000 Epoch: 33 Global Step: 693250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:47:51,010-Speed 6298.45 samples/sec Loss 2.9838 LearningRate 0.0000 Epoch: 33 Global Step: 693260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:47:54,267-Speed 6290.16 samples/sec Loss 2.9726 LearningRate 0.0000 Epoch: 33 Global Step: 693270 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:47:57,520-Speed 6296.83 samples/sec Loss 2.9534 LearningRate 0.0000 Epoch: 33 Global Step: 693280 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:00,774-Speed 6296.41 samples/sec Loss 2.9475 LearningRate 0.0000 Epoch: 33 Global Step: 693290 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:04,023-Speed 6303.64 samples/sec Loss 2.9507 LearningRate 0.0000 Epoch: 33 Global Step: 693300 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:07,274-Speed 6300.51 samples/sec Loss 2.9331 LearningRate 0.0000 Epoch: 33 Global Step: 693310 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:10,526-Speed 6299.20 samples/sec Loss 2.9866 LearningRate 0.0000 Epoch: 33 Global Step: 693320 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:13,776-Speed 6303.74 samples/sec Loss 2.9379 LearningRate 0.0000 Epoch: 33 Global Step: 693330 Fp16 Grad Scale: 8192 
Required: 12 hours Training: 2022-04-03 06:48:17,026-Speed 6303.13 samples/sec Loss 2.9832 LearningRate 0.0000 Epoch: 33 Global Step: 693340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:20,268-Speed 6317.50 samples/sec Loss 2.9399 LearningRate 0.0000 Epoch: 33 Global Step: 693350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:23,531-Speed 6278.70 samples/sec Loss 2.9638 LearningRate 0.0000 Epoch: 33 Global Step: 693360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:26,779-Speed 6306.67 samples/sec Loss 2.9547 LearningRate 0.0000 Epoch: 33 Global Step: 693370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:30,030-Speed 6302.04 samples/sec Loss 2.9823 LearningRate 0.0000 Epoch: 33 Global Step: 693380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:33,279-Speed 6303.79 samples/sec Loss 2.9325 LearningRate 0.0000 Epoch: 33 Global Step: 693390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:36,532-Speed 6297.22 samples/sec Loss 2.9964 LearningRate 0.0000 Epoch: 33 Global Step: 693400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:48:39,776-Speed 6314.66 samples/sec Loss 3.0207 LearningRate 0.0000 Epoch: 33 Global Step: 693410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:48:43,030-Speed 6296.56 samples/sec Loss 2.9032 LearningRate 0.0000 Epoch: 33 Global Step: 693420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:48:46,280-Speed 6302.87 samples/sec Loss 2.9223 LearningRate 0.0000 Epoch: 33 Global Step: 693430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:48:49,530-Speed 6302.10 samples/sec Loss 3.0574 LearningRate 0.0000 Epoch: 33 Global Step: 693440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:48:52,777-Speed 6308.19 samples/sec Loss 2.9787 LearningRate 0.0000 Epoch: 33 Global Step: 693450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
06:48:56,028-Speed 6300.83 samples/sec Loss 3.0166 LearningRate 0.0000 Epoch: 33 Global Step: 693460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:48:59,283-Speed 6295.06 samples/sec Loss 2.9722 LearningRate 0.0000 Epoch: 33 Global Step: 693470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:02,531-Speed 6305.90 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 33 Global Step: 693480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:05,779-Speed 6307.20 samples/sec Loss 2.9948 LearningRate 0.0000 Epoch: 33 Global Step: 693490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:09,031-Speed 6298.71 samples/sec Loss 2.9986 LearningRate 0.0000 Epoch: 33 Global Step: 693500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:12,281-Speed 6303.78 samples/sec Loss 2.9703 LearningRate 0.0000 Epoch: 33 Global Step: 693510 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:49:15,518-Speed 6326.77 samples/sec Loss 2.9296 LearningRate 0.0000 Epoch: 33 Global Step: 693520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:18,771-Speed 6297.31 samples/sec Loss 2.9624 LearningRate 0.0000 Epoch: 33 Global Step: 693530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:22,025-Speed 6295.95 samples/sec Loss 3.0015 LearningRate 0.0000 Epoch: 33 Global Step: 693540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:25,275-Speed 6303.15 samples/sec Loss 2.9465 LearningRate 0.0000 Epoch: 33 Global Step: 693550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:28,530-Speed 6293.94 samples/sec Loss 2.9788 LearningRate 0.0000 Epoch: 33 Global Step: 693560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:31,778-Speed 6307.24 samples/sec Loss 2.9825 LearningRate 0.0000 Epoch: 33 Global Step: 693570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:35,027-Speed 6303.84 samples/sec Loss 
2.9887 LearningRate 0.0000 Epoch: 33 Global Step: 693580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:38,300-Speed 6259.46 samples/sec Loss 2.9567 LearningRate 0.0000 Epoch: 33 Global Step: 693590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:41,620-Speed 6169.97 samples/sec Loss 2.9381 LearningRate 0.0000 Epoch: 33 Global Step: 693600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:44,871-Speed 6299.56 samples/sec Loss 2.9150 LearningRate 0.0000 Epoch: 33 Global Step: 693610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:49:48,128-Speed 6290.35 samples/sec Loss 2.9483 LearningRate 0.0000 Epoch: 33 Global Step: 693620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:49:51,381-Speed 6296.66 samples/sec Loss 2.8739 LearningRate 0.0000 Epoch: 33 Global Step: 693630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:49:54,635-Speed 6294.65 samples/sec Loss 2.9820 LearningRate 0.0000 Epoch: 33 Global Step: 693640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:49:57,878-Speed 6316.56 samples/sec Loss 2.9629 LearningRate 0.0000 Epoch: 33 Global Step: 693650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:01,172-Speed 6220.45 samples/sec Loss 2.9645 LearningRate 0.0000 Epoch: 33 Global Step: 693660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:04,420-Speed 6305.21 samples/sec Loss 2.9630 LearningRate 0.0000 Epoch: 33 Global Step: 693670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:07,673-Speed 6298.39 samples/sec Loss 2.9806 LearningRate 0.0000 Epoch: 33 Global Step: 693680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:10,921-Speed 6305.47 samples/sec Loss 2.9455 LearningRate 0.0000 Epoch: 33 Global Step: 693690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:14,169-Speed 6307.60 samples/sec Loss 2.9663 LearningRate 0.0000 Epoch: 33 Global 
Step: 693700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:17,422-Speed 6296.96 samples/sec Loss 2.9724 LearningRate 0.0000 Epoch: 33 Global Step: 693710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:20,673-Speed 6300.00 samples/sec Loss 2.9659 LearningRate 0.0000 Epoch: 33 Global Step: 693720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:23,929-Speed 6291.69 samples/sec Loss 2.9629 LearningRate 0.0000 Epoch: 33 Global Step: 693730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:27,182-Speed 6297.79 samples/sec Loss 2.9605 LearningRate 0.0000 Epoch: 33 Global Step: 693740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:30,433-Speed 6301.33 samples/sec Loss 2.9601 LearningRate 0.0000 Epoch: 33 Global Step: 693750 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:50:33,680-Speed 6308.74 samples/sec Loss 2.9331 LearningRate 0.0000 Epoch: 33 Global Step: 693760 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:50:36,926-Speed 6309.68 samples/sec Loss 2.9496 LearningRate 0.0000 Epoch: 33 Global Step: 693770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:50:40,174-Speed 6306.09 samples/sec Loss 2.9581 LearningRate 0.0000 Epoch: 33 Global Step: 693780 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:50:43,406-Speed 6339.38 samples/sec Loss 2.9976 LearningRate 0.0000 Epoch: 33 Global Step: 693790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:46,651-Speed 6312.83 samples/sec Loss 2.9787 LearningRate 0.0000 Epoch: 33 Global Step: 693800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:49,899-Speed 6308.24 samples/sec Loss 2.9414 LearningRate 0.0000 Epoch: 33 Global Step: 693810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:53,155-Speed 6289.88 samples/sec Loss 2.9645 LearningRate 0.0000 Epoch: 33 Global Step: 693820 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:50:56,404-Speed 6304.69 samples/sec Loss 2.9634 LearningRate 0.0000 Epoch: 33 Global Step: 693830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:50:59,653-Speed 6306.16 samples/sec Loss 3.0200 LearningRate 0.0000 Epoch: 33 Global Step: 693840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:02,903-Speed 6303.25 samples/sec Loss 2.9445 LearningRate 0.0000 Epoch: 33 Global Step: 693850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:06,153-Speed 6301.43 samples/sec Loss 2.9884 LearningRate 0.0000 Epoch: 33 Global Step: 693860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:09,410-Speed 6289.05 samples/sec Loss 2.9285 LearningRate 0.0000 Epoch: 33 Global Step: 693870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:12,664-Speed 6296.78 samples/sec Loss 2.8892 LearningRate 0.0000 Epoch: 33 Global Step: 693880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:15,912-Speed 6306.13 samples/sec Loss 2.9186 LearningRate 0.0000 Epoch: 33 Global Step: 693890 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:19,170-Speed 6286.49 samples/sec Loss 2.9730 LearningRate 0.0000 Epoch: 33 Global Step: 693900 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:22,418-Speed 6307.34 samples/sec Loss 3.0186 LearningRate 0.0000 Epoch: 33 Global Step: 693910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:25,668-Speed 6303.77 samples/sec Loss 2.9365 LearningRate 0.0000 Epoch: 33 Global Step: 693920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:28,919-Speed 6301.30 samples/sec Loss 2.9060 LearningRate 0.0000 Epoch: 33 Global Step: 693930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:32,165-Speed 6310.35 samples/sec Loss 2.9698 LearningRate 0.0000 Epoch: 33 Global Step: 693940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 
06:51:35,406-Speed 6319.27 samples/sec Loss 2.9570 LearningRate 0.0000 Epoch: 33 Global Step: 693950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:51:38,656-Speed 6303.54 samples/sec Loss 2.9850 LearningRate 0.0000 Epoch: 33 Global Step: 693960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:41,907-Speed 6300.31 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 693970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:45,155-Speed 6307.59 samples/sec Loss 2.9286 LearningRate 0.0000 Epoch: 33 Global Step: 693980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:48,401-Speed 6310.25 samples/sec Loss 2.9689 LearningRate 0.0000 Epoch: 33 Global Step: 693990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:51,659-Speed 6288.47 samples/sec Loss 2.9473 LearningRate 0.0000 Epoch: 33 Global Step: 694000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:54,911-Speed 6299.85 samples/sec Loss 2.9730 LearningRate 0.0000 Epoch: 33 Global Step: 694010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:51:58,161-Speed 6302.26 samples/sec Loss 2.9911 LearningRate 0.0000 Epoch: 33 Global Step: 694020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:01,414-Speed 6296.75 samples/sec Loss 2.9557 LearningRate 0.0000 Epoch: 33 Global Step: 694030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:04,676-Speed 6280.88 samples/sec Loss 2.9489 LearningRate 0.0000 Epoch: 33 Global Step: 694040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:07,923-Speed 6308.66 samples/sec Loss 2.9745 LearningRate 0.0000 Epoch: 33 Global Step: 694050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:11,173-Speed 6302.59 samples/sec Loss 2.9569 LearningRate 0.0000 Epoch: 33 Global Step: 694060 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:52:14,441-Speed 6268.50 samples/sec Loss 
2.9603 LearningRate 0.0000 Epoch: 33 Global Step: 694070 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:52:17,692-Speed 6300.56 samples/sec Loss 2.9627 LearningRate 0.0000 Epoch: 33 Global Step: 694080 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:52:20,933-Speed 6320.52 samples/sec Loss 2.9584 LearningRate 0.0000 Epoch: 33 Global Step: 694090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:24,189-Speed 6290.95 samples/sec Loss 2.9663 LearningRate 0.0000 Epoch: 33 Global Step: 694100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:27,449-Speed 6283.82 samples/sec Loss 2.9689 LearningRate 0.0000 Epoch: 33 Global Step: 694110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:30,694-Speed 6312.45 samples/sec Loss 3.0228 LearningRate 0.0000 Epoch: 33 Global Step: 694120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:33,936-Speed 6318.92 samples/sec Loss 2.9391 LearningRate 0.0000 Epoch: 33 Global Step: 694130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:37,187-Speed 6301.67 samples/sec Loss 2.9696 LearningRate 0.0000 Epoch: 33 Global Step: 694140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:40,433-Speed 6309.31 samples/sec Loss 2.9670 LearningRate 0.0000 Epoch: 33 Global Step: 694150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:43,709-Speed 6253.04 samples/sec Loss 2.9764 LearningRate 0.0000 Epoch: 33 Global Step: 694160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:46,956-Speed 6310.21 samples/sec Loss 2.9770 LearningRate 0.0000 Epoch: 33 Global Step: 694170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:50,203-Speed 6306.96 samples/sec Loss 2.8998 LearningRate 0.0000 Epoch: 33 Global Step: 694180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:52:53,455-Speed 6299.84 samples/sec Loss 2.9099 LearningRate 0.0000 Epoch: 33 Global 
Step: 694190 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:52:56,702-Speed 6308.62 samples/sec Loss 2.9771 LearningRate 0.0000 Epoch: 33 Global Step: 694200 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:52:59,952-Speed 6304.94 samples/sec Loss 2.9358 LearningRate 0.0000 Epoch: 33 Global Step: 694210 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:53:03,208-Speed 6291.38 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 33 Global Step: 694220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:53:06,449-Speed 6320.48 samples/sec Loss 2.9374 LearningRate 0.0000 Epoch: 33 Global Step: 694230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:53:09,698-Speed 6303.99 samples/sec Loss 2.9852 LearningRate 0.0000 Epoch: 33 Global Step: 694240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:12,946-Speed 6306.56 samples/sec Loss 2.9125 LearningRate 0.0000 Epoch: 33 Global Step: 694250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:16,192-Speed 6311.92 samples/sec Loss 2.9582 LearningRate 0.0000 Epoch: 33 Global Step: 694260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:19,445-Speed 6297.51 samples/sec Loss 2.9574 LearningRate 0.0000 Epoch: 33 Global Step: 694270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:22,693-Speed 6305.72 samples/sec Loss 2.9917 LearningRate 0.0000 Epoch: 33 Global Step: 694280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:25,944-Speed 6301.77 samples/sec Loss 2.9757 LearningRate 0.0000 Epoch: 33 Global Step: 694290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:29,191-Speed 6308.05 samples/sec Loss 2.9452 LearningRate 0.0000 Epoch: 33 Global Step: 694300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:32,437-Speed 6310.04 samples/sec Loss 2.9955 LearningRate 0.0000 Epoch: 33 Global Step: 694310 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:53:35,684-Speed 6310.46 samples/sec Loss 3.0102 LearningRate 0.0000 Epoch: 33 Global Step: 694320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:38,940-Speed 6290.59 samples/sec Loss 3.0043 LearningRate 0.0000 Epoch: 33 Global Step: 694330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:42,179-Speed 6324.91 samples/sec Loss 3.0101 LearningRate 0.0000 Epoch: 33 Global Step: 694340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:45,422-Speed 6315.77 samples/sec Loss 3.0194 LearningRate 0.0000 Epoch: 33 Global Step: 694350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:48,667-Speed 6314.21 samples/sec Loss 2.9462 LearningRate 0.0000 Epoch: 33 Global Step: 694360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:51,912-Speed 6310.90 samples/sec Loss 2.9199 LearningRate 0.0000 Epoch: 33 Global Step: 694370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:55,156-Speed 6315.05 samples/sec Loss 2.9429 LearningRate 0.0000 Epoch: 33 Global Step: 694380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:53:58,413-Speed 6290.94 samples/sec Loss 2.9921 LearningRate 0.0000 Epoch: 33 Global Step: 694390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:01,671-Speed 6286.90 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 33 Global Step: 694400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:04,925-Speed 6295.87 samples/sec Loss 2.9314 LearningRate 0.0000 Epoch: 33 Global Step: 694410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:08,172-Speed 6309.36 samples/sec Loss 2.9126 LearningRate 0.0000 Epoch: 33 Global Step: 694420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:11,425-Speed 6296.98 samples/sec Loss 2.9336 LearningRate 0.0000 Epoch: 33 Global Step: 694430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
06:54:14,676-Speed 6300.29 samples/sec Loss 2.9678 LearningRate 0.0000 Epoch: 33 Global Step: 694440 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:17,927-Speed 6301.95 samples/sec Loss 2.9066 LearningRate 0.0000 Epoch: 33 Global Step: 694450 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:21,177-Speed 6303.22 samples/sec Loss 2.9633 LearningRate 0.0000 Epoch: 33 Global Step: 694460 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:24,429-Speed 6298.89 samples/sec Loss 2.9603 LearningRate 0.0000 Epoch: 33 Global Step: 694470 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:27,682-Speed 6296.28 samples/sec Loss 2.9891 LearningRate 0.0000 Epoch: 33 Global Step: 694480 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:30,929-Speed 6309.46 samples/sec Loss 2.9542 LearningRate 0.0000 Epoch: 33 Global Step: 694490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:34,174-Speed 6313.11 samples/sec Loss 2.9174 LearningRate 0.0000 Epoch: 33 Global Step: 694500 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:54:37,409-Speed 6330.83 samples/sec Loss 3.0262 LearningRate 0.0000 Epoch: 33 Global Step: 694510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:40,660-Speed 6300.76 samples/sec Loss 2.9560 LearningRate 0.0000 Epoch: 33 Global Step: 694520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:43,920-Speed 6284.33 samples/sec Loss 2.9753 LearningRate 0.0000 Epoch: 33 Global Step: 694530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:47,167-Speed 6309.72 samples/sec Loss 2.9713 LearningRate 0.0000 Epoch: 33 Global Step: 694540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:50,425-Speed 6285.94 samples/sec Loss 2.9283 LearningRate 0.0000 Epoch: 33 Global Step: 694550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:53,683-Speed 6287.70 samples/sec Loss 
2.9215 LearningRate 0.0000 Epoch: 33 Global Step: 694560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:54:56,933-Speed 6303.55 samples/sec Loss 2.9499 LearningRate 0.0000 Epoch: 33 Global Step: 694570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:00,190-Speed 6287.93 samples/sec Loss 2.9647 LearningRate 0.0000 Epoch: 33 Global Step: 694580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:03,436-Speed 6312.33 samples/sec Loss 2.9517 LearningRate 0.0000 Epoch: 33 Global Step: 694590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:06,683-Speed 6309.92 samples/sec Loss 3.0126 LearningRate 0.0000 Epoch: 33 Global Step: 694600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:09,923-Speed 6320.94 samples/sec Loss 2.9624 LearningRate 0.0000 Epoch: 33 Global Step: 694610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:55:13,161-Speed 6327.11 samples/sec Loss 2.9585 LearningRate 0.0000 Epoch: 33 Global Step: 694620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:16,405-Speed 6313.95 samples/sec Loss 2.9452 LearningRate 0.0000 Epoch: 33 Global Step: 694630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:19,665-Speed 6283.98 samples/sec Loss 2.9681 LearningRate 0.0000 Epoch: 33 Global Step: 694640 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:22,918-Speed 6297.63 samples/sec Loss 2.9329 LearningRate 0.0000 Epoch: 33 Global Step: 694650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:26,167-Speed 6304.55 samples/sec Loss 2.9578 LearningRate 0.0000 Epoch: 33 Global Step: 694660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:29,424-Speed 6289.32 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 33 Global Step: 694670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:32,673-Speed 6305.21 samples/sec Loss 2.9726 LearningRate 0.0000 Epoch: 33 Global 
Step: 694680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:35,925-Speed 6299.13 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 33 Global Step: 694690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:39,190-Speed 6273.56 samples/sec Loss 2.8992 LearningRate 0.0000 Epoch: 33 Global Step: 694700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:42,439-Speed 6305.57 samples/sec Loss 3.0046 LearningRate 0.0000 Epoch: 33 Global Step: 694710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:55:45,686-Speed 6307.74 samples/sec Loss 2.9547 LearningRate 0.0000 Epoch: 33 Global Step: 694720 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:55:48,940-Speed 6295.59 samples/sec Loss 2.9409 LearningRate 0.0000 Epoch: 33 Global Step: 694730 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:55:52,194-Speed 6295.56 samples/sec Loss 2.9656 LearningRate 0.0000 Epoch: 33 Global Step: 694740 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:55:55,460-Speed 6271.97 samples/sec Loss 2.9562 LearningRate 0.0000 Epoch: 33 Global Step: 694750 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:55:58,722-Speed 6279.14 samples/sec Loss 2.9520 LearningRate 0.0000 Epoch: 33 Global Step: 694760 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:56:01,983-Speed 6282.58 samples/sec Loss 2.9822 LearningRate 0.0000 Epoch: 33 Global Step: 694770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:56:05,239-Speed 6295.63 samples/sec Loss 2.9861 LearningRate 0.0000 Epoch: 33 Global Step: 694780 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:56:08,479-Speed 6322.36 samples/sec Loss 2.9558 LearningRate 0.0000 Epoch: 33 Global Step: 694790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:11,734-Speed 6292.54 samples/sec Loss 2.9333 LearningRate 0.0000 Epoch: 33 Global Step: 694800 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:56:14,989-Speed 6293.73 samples/sec Loss 3.0244 LearningRate 0.0000 Epoch: 33 Global Step: 694810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:18,239-Speed 6303.29 samples/sec Loss 3.0015 LearningRate 0.0000 Epoch: 33 Global Step: 694820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:21,486-Speed 6308.57 samples/sec Loss 2.9671 LearningRate 0.0000 Epoch: 33 Global Step: 694830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:24,736-Speed 6302.43 samples/sec Loss 3.0195 LearningRate 0.0000 Epoch: 33 Global Step: 694840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:27,987-Speed 6301.37 samples/sec Loss 2.9145 LearningRate 0.0000 Epoch: 33 Global Step: 694850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:31,231-Speed 6314.19 samples/sec Loss 2.9769 LearningRate 0.0000 Epoch: 33 Global Step: 694860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:34,483-Speed 6300.42 samples/sec Loss 2.9779 LearningRate 0.0000 Epoch: 33 Global Step: 694870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:37,733-Speed 6302.12 samples/sec Loss 2.9614 LearningRate 0.0000 Epoch: 33 Global Step: 694880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:40,965-Speed 6337.92 samples/sec Loss 2.9768 LearningRate 0.0000 Epoch: 33 Global Step: 694890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:44,208-Speed 6316.39 samples/sec Loss 2.9329 LearningRate 0.0000 Epoch: 33 Global Step: 694900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:47,453-Speed 6312.91 samples/sec Loss 2.9789 LearningRate 0.0000 Epoch: 33 Global Step: 694910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:50,700-Speed 6307.90 samples/sec Loss 2.9223 LearningRate 0.0000 Epoch: 33 Global Step: 694920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
06:56:53,944-Speed 6315.49 samples/sec Loss 2.9469 LearningRate 0.0000 Epoch: 33 Global Step: 694930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:56:57,190-Speed 6310.09 samples/sec Loss 2.8863 LearningRate 0.0000 Epoch: 33 Global Step: 694940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:00,448-Speed 6288.86 samples/sec Loss 2.9590 LearningRate 0.0000 Epoch: 33 Global Step: 694950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:03,704-Speed 6290.40 samples/sec Loss 2.9530 LearningRate 0.0000 Epoch: 33 Global Step: 694960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:06,956-Speed 6299.42 samples/sec Loss 2.9349 LearningRate 0.0000 Epoch: 33 Global Step: 694970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:10,202-Speed 6309.55 samples/sec Loss 2.9741 LearningRate 0.0000 Epoch: 33 Global Step: 694980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:13,456-Speed 6295.08 samples/sec Loss 2.9243 LearningRate 0.0000 Epoch: 33 Global Step: 694990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:57:16,708-Speed 6300.00 samples/sec Loss 2.9808 LearningRate 0.0000 Epoch: 33 Global Step: 695000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:57:19,958-Speed 6303.07 samples/sec Loss 2.9715 LearningRate 0.0000 Epoch: 33 Global Step: 695010 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:57:23,207-Speed 6305.86 samples/sec Loss 2.9497 LearningRate 0.0000 Epoch: 33 Global Step: 695020 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:57:26,439-Speed 6337.00 samples/sec Loss 2.9288 LearningRate 0.0000 Epoch: 33 Global Step: 695030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:29,689-Speed 6302.77 samples/sec Loss 2.9890 LearningRate 0.0000 Epoch: 33 Global Step: 695040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:32,936-Speed 6309.46 samples/sec Loss 
2.9020 LearningRate 0.0000 Epoch: 33 Global Step: 695050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:36,188-Speed 6299.72 samples/sec Loss 2.9631 LearningRate 0.0000 Epoch: 33 Global Step: 695060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:39,432-Speed 6313.64 samples/sec Loss 2.9938 LearningRate 0.0000 Epoch: 33 Global Step: 695070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:42,683-Speed 6301.62 samples/sec Loss 2.9429 LearningRate 0.0000 Epoch: 33 Global Step: 695080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:45,932-Speed 6304.11 samples/sec Loss 2.9758 LearningRate 0.0000 Epoch: 33 Global Step: 695090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:49,176-Speed 6314.64 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 33 Global Step: 695100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:52,426-Speed 6303.79 samples/sec Loss 2.9089 LearningRate 0.0000 Epoch: 33 Global Step: 695110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:55,678-Speed 6297.42 samples/sec Loss 2.9259 LearningRate 0.0000 Epoch: 33 Global Step: 695120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:57:58,915-Speed 6329.80 samples/sec Loss 2.9357 LearningRate 0.0000 Epoch: 33 Global Step: 695130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:02,160-Speed 6312.89 samples/sec Loss 2.8861 LearningRate 0.0000 Epoch: 33 Global Step: 695140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:05,410-Speed 6301.84 samples/sec Loss 3.0039 LearningRate 0.0000 Epoch: 33 Global Step: 695150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:08,659-Speed 6304.99 samples/sec Loss 2.9585 LearningRate 0.0000 Epoch: 33 Global Step: 695160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:11,912-Speed 6296.87 samples/sec Loss 2.8811 LearningRate 0.0000 Epoch: 33 Global 
Step: 695170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:15,182-Speed 6265.38 samples/sec Loss 2.9174 LearningRate 0.0000 Epoch: 33 Global Step: 695180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:18,432-Speed 6301.94 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global Step: 695190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:21,677-Speed 6313.98 samples/sec Loss 2.9760 LearningRate 0.0000 Epoch: 33 Global Step: 695200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:24,933-Speed 6291.34 samples/sec Loss 2.9680 LearningRate 0.0000 Epoch: 33 Global Step: 695210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:28,185-Speed 6298.37 samples/sec Loss 3.0122 LearningRate 0.0000 Epoch: 33 Global Step: 695220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:31,442-Speed 6289.26 samples/sec Loss 2.9648 LearningRate 0.0000 Epoch: 33 Global Step: 695230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:58:34,699-Speed 6290.47 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 695240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:58:37,952-Speed 6297.37 samples/sec Loss 2.9519 LearningRate 0.0000 Epoch: 33 Global Step: 695250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:58:41,202-Speed 6303.33 samples/sec Loss 2.9844 LearningRate 0.0000 Epoch: 33 Global Step: 695260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:58:44,431-Speed 6343.21 samples/sec Loss 3.0151 LearningRate 0.0000 Epoch: 33 Global Step: 695270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:47,675-Speed 6314.41 samples/sec Loss 2.9973 LearningRate 0.0000 Epoch: 33 Global Step: 695280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:50,918-Speed 6316.59 samples/sec Loss 2.9456 LearningRate 0.0000 Epoch: 33 Global Step: 695290 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 06:58:54,172-Speed 6296.01 samples/sec Loss 2.9515 LearningRate 0.0000 Epoch: 33 Global Step: 695300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:58:57,422-Speed 6302.62 samples/sec Loss 2.9703 LearningRate 0.0000 Epoch: 33 Global Step: 695310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:00,672-Speed 6302.74 samples/sec Loss 2.9402 LearningRate 0.0000 Epoch: 33 Global Step: 695320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:03,920-Speed 6307.30 samples/sec Loss 2.9863 LearningRate 0.0000 Epoch: 33 Global Step: 695330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:07,162-Speed 6318.85 samples/sec Loss 2.9734 LearningRate 0.0000 Epoch: 33 Global Step: 695340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:10,412-Speed 6302.00 samples/sec Loss 2.9279 LearningRate 0.0000 Epoch: 33 Global Step: 695350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:13,661-Speed 6304.71 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 695360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:16,904-Speed 6315.70 samples/sec Loss 2.9576 LearningRate 0.0000 Epoch: 33 Global Step: 695370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:59:20,152-Speed 6307.48 samples/sec Loss 2.9688 LearningRate 0.0000 Epoch: 33 Global Step: 695380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:59:23,405-Speed 6297.94 samples/sec Loss 2.9508 LearningRate 0.0000 Epoch: 33 Global Step: 695390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:59:26,651-Speed 6309.23 samples/sec Loss 2.9513 LearningRate 0.0000 Epoch: 33 Global Step: 695400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 06:59:29,904-Speed 6298.10 samples/sec Loss 2.9810 LearningRate 0.0000 Epoch: 33 Global Step: 695410 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 
06:59:33,147-Speed 6315.56 samples/sec Loss 2.9751 LearningRate 0.0000 Epoch: 33 Global Step: 695420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:36,400-Speed 6298.36 samples/sec Loss 2.9762 LearningRate 0.0000 Epoch: 33 Global Step: 695430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:39,654-Speed 6295.90 samples/sec Loss 2.9106 LearningRate 0.0000 Epoch: 33 Global Step: 695440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:42,910-Speed 6290.44 samples/sec Loss 2.9567 LearningRate 0.0000 Epoch: 33 Global Step: 695450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:46,160-Speed 6303.95 samples/sec Loss 2.9724 LearningRate 0.0000 Epoch: 33 Global Step: 695460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:49,407-Speed 6308.75 samples/sec Loss 2.9362 LearningRate 0.0000 Epoch: 33 Global Step: 695470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:52,651-Speed 6313.98 samples/sec Loss 2.9087 LearningRate 0.0000 Epoch: 33 Global Step: 695480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:55,902-Speed 6301.16 samples/sec Loss 2.9732 LearningRate 0.0000 Epoch: 33 Global Step: 695490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 06:59:59,155-Speed 6298.59 samples/sec Loss 2.9927 LearningRate 0.0000 Epoch: 33 Global Step: 695500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:02,402-Speed 6306.78 samples/sec Loss 2.9475 LearningRate 0.0000 Epoch: 33 Global Step: 695510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:05,649-Speed 6309.01 samples/sec Loss 2.9675 LearningRate 0.0000 Epoch: 33 Global Step: 695520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:08,900-Speed 6302.47 samples/sec Loss 2.9493 LearningRate 0.0000 Epoch: 33 Global Step: 695530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:12,135-Speed 6331.31 samples/sec Loss 
2.9269 LearningRate 0.0000 Epoch: 33 Global Step: 695540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:15,393-Speed 6286.63 samples/sec Loss 2.9426 LearningRate 0.0000 Epoch: 33 Global Step: 695550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:18,648-Speed 6293.20 samples/sec Loss 2.9324 LearningRate 0.0000 Epoch: 33 Global Step: 695560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:21,895-Speed 6309.33 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 695570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:25,152-Speed 6290.09 samples/sec Loss 2.9584 LearningRate 0.0000 Epoch: 33 Global Step: 695580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:28,401-Speed 6304.22 samples/sec Loss 2.8985 LearningRate 0.0000 Epoch: 33 Global Step: 695590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:31,651-Speed 6302.93 samples/sec Loss 3.0316 LearningRate 0.0000 Epoch: 33 Global Step: 695600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:34,909-Speed 6288.49 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 33 Global Step: 695610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:38,156-Speed 6307.28 samples/sec Loss 2.9638 LearningRate 0.0000 Epoch: 33 Global Step: 695620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:41,396-Speed 6323.55 samples/sec Loss 2.9547 LearningRate 0.0000 Epoch: 33 Global Step: 695630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:00:44,641-Speed 6312.56 samples/sec Loss 3.0036 LearningRate 0.0000 Epoch: 33 Global Step: 695640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:47,884-Speed 6317.09 samples/sec Loss 2.9179 LearningRate 0.0000 Epoch: 33 Global Step: 695650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:51,133-Speed 6305.10 samples/sec Loss 2.9606 LearningRate 0.0000 Epoch: 33 Global 
Step: 695660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:54,389-Speed 6290.57 samples/sec Loss 2.9915 LearningRate 0.0000 Epoch: 33 Global Step: 695670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:00:57,637-Speed 6306.60 samples/sec Loss 2.9291 LearningRate 0.0000 Epoch: 33 Global Step: 695680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:00,935-Speed 6211.51 samples/sec Loss 2.9631 LearningRate 0.0000 Epoch: 33 Global Step: 695690 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:04,189-Speed 6295.92 samples/sec Loss 2.9329 LearningRate 0.0000 Epoch: 33 Global Step: 695700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:07,441-Speed 6298.36 samples/sec Loss 2.9790 LearningRate 0.0000 Epoch: 33 Global Step: 695710 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:10,678-Speed 6328.28 samples/sec Loss 2.9253 LearningRate 0.0000 Epoch: 33 Global Step: 695720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:13,928-Speed 6303.04 samples/sec Loss 2.9645 LearningRate 0.0000 Epoch: 33 Global Step: 695730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:17,182-Speed 6294.67 samples/sec Loss 2.9632 LearningRate 0.0000 Epoch: 33 Global Step: 695740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:20,431-Speed 6305.04 samples/sec Loss 2.9914 LearningRate 0.0000 Epoch: 33 Global Step: 695750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:23,684-Speed 6298.62 samples/sec Loss 2.9498 LearningRate 0.0000 Epoch: 33 Global Step: 695760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:26,933-Speed 6303.96 samples/sec Loss 3.0380 LearningRate 0.0000 Epoch: 33 Global Step: 695770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:30,187-Speed 6294.16 samples/sec Loss 2.9481 LearningRate 0.0000 Epoch: 33 Global Step: 695780 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:01:33,444-Speed 6289.94 samples/sec Loss 2.9097 LearningRate 0.0000 Epoch: 33 Global Step: 695790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:36,688-Speed 6314.43 samples/sec Loss 2.9747 LearningRate 0.0000 Epoch: 33 Global Step: 695800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:39,944-Speed 6292.70 samples/sec Loss 2.9276 LearningRate 0.0000 Epoch: 33 Global Step: 695810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:43,200-Speed 6291.38 samples/sec Loss 2.9875 LearningRate 0.0000 Epoch: 33 Global Step: 695820 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:46,445-Speed 6311.83 samples/sec Loss 2.9417 LearningRate 0.0000 Epoch: 33 Global Step: 695830 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:49,706-Speed 6280.58 samples/sec Loss 2.9592 LearningRate 0.0000 Epoch: 33 Global Step: 695840 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:52,961-Speed 6294.64 samples/sec Loss 3.0010 LearningRate 0.0000 Epoch: 33 Global Step: 695850 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:01:56,196-Speed 6331.99 samples/sec Loss 3.0040 LearningRate 0.0000 Epoch: 33 Global Step: 695860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:01:59,448-Speed 6299.87 samples/sec Loss 2.9548 LearningRate 0.0000 Epoch: 33 Global Step: 695870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:02,704-Speed 6291.28 samples/sec Loss 2.9137 LearningRate 0.0000 Epoch: 33 Global Step: 695880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:05,960-Speed 6290.62 samples/sec Loss 2.9519 LearningRate 0.0000 Epoch: 33 Global Step: 695890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:09,210-Speed 6303.39 samples/sec Loss 2.9222 LearningRate 0.0000 Epoch: 33 Global Step: 695900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:02:12,457-Speed 6308.94 samples/sec Loss 2.9577 LearningRate 0.0000 Epoch: 33 Global Step: 695910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:15,703-Speed 6310.48 samples/sec Loss 2.9727 LearningRate 0.0000 Epoch: 33 Global Step: 695920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:18,956-Speed 6297.57 samples/sec Loss 2.9341 LearningRate 0.0000 Epoch: 33 Global Step: 695930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:22,212-Speed 6290.19 samples/sec Loss 2.9454 LearningRate 0.0000 Epoch: 33 Global Step: 695940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:25,461-Speed 6305.56 samples/sec Loss 2.9236 LearningRate 0.0000 Epoch: 33 Global Step: 695950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:28,776-Speed 6178.61 samples/sec Loss 2.9408 LearningRate 0.0000 Epoch: 33 Global Step: 695960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:02:32,033-Speed 6289.74 samples/sec Loss 2.9060 LearningRate 0.0000 Epoch: 33 Global Step: 695970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:02:35,267-Speed 6334.52 samples/sec Loss 2.9211 LearningRate 0.0000 Epoch: 33 Global Step: 695980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:38,525-Speed 6287.85 samples/sec Loss 2.9629 LearningRate 0.0000 Epoch: 33 Global Step: 695990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:41,774-Speed 6304.80 samples/sec Loss 2.9199 LearningRate 0.0000 Epoch: 33 Global Step: 696000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:45,022-Speed 6307.05 samples/sec Loss 2.9264 LearningRate 0.0000 Epoch: 33 Global Step: 696010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:48,274-Speed 6299.47 samples/sec Loss 2.9687 LearningRate 0.0000 Epoch: 33 Global Step: 696020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:51,519-Speed 6311.51 samples/sec Loss 
2.9459 LearningRate 0.0000 Epoch: 33 Global Step: 696030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:54,779-Speed 6284.79 samples/sec Loss 2.9363 LearningRate 0.0000 Epoch: 33 Global Step: 696040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:02:58,029-Speed 6301.45 samples/sec Loss 2.9192 LearningRate 0.0000 Epoch: 33 Global Step: 696050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:01,280-Speed 6301.44 samples/sec Loss 2.9660 LearningRate 0.0000 Epoch: 33 Global Step: 696060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:04,531-Speed 6300.47 samples/sec Loss 2.9846 LearningRate 0.0000 Epoch: 33 Global Step: 696070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:07,787-Speed 6292.05 samples/sec Loss 2.9769 LearningRate 0.0000 Epoch: 33 Global Step: 696080 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:11,037-Speed 6302.99 samples/sec Loss 2.9636 LearningRate 0.0000 Epoch: 33 Global Step: 696090 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:14,286-Speed 6305.05 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 33 Global Step: 696100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:17,541-Speed 6293.01 samples/sec Loss 2.9494 LearningRate 0.0000 Epoch: 33 Global Step: 696110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:20,781-Speed 6323.04 samples/sec Loss 2.9484 LearningRate 0.0000 Epoch: 33 Global Step: 696120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:24,027-Speed 6311.07 samples/sec Loss 2.9513 LearningRate 0.0000 Epoch: 33 Global Step: 696130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:27,275-Speed 6307.50 samples/sec Loss 2.9041 LearningRate 0.0000 Epoch: 33 Global Step: 696140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:30,529-Speed 6295.14 samples/sec Loss 2.9345 LearningRate 0.0000 Epoch: 33 Global 
Step: 696150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:33,777-Speed 6306.13 samples/sec Loss 2.9106 LearningRate 0.0000 Epoch: 33 Global Step: 696160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:37,026-Speed 6305.76 samples/sec Loss 2.9492 LearningRate 0.0000 Epoch: 33 Global Step: 696170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:40,282-Speed 6291.38 samples/sec Loss 2.9479 LearningRate 0.0000 Epoch: 33 Global Step: 696180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:43,530-Speed 6306.89 samples/sec Loss 2.9296 LearningRate 0.0000 Epoch: 33 Global Step: 696190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:46,777-Speed 6308.27 samples/sec Loss 2.9869 LearningRate 0.0000 Epoch: 33 Global Step: 696200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:50,033-Speed 6290.21 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 33 Global Step: 696210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:03:53,302-Speed 6267.26 samples/sec Loss 2.9537 LearningRate 0.0000 Epoch: 33 Global Step: 696220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:56,563-Speed 6282.50 samples/sec Loss 2.9691 LearningRate 0.0000 Epoch: 33 Global Step: 696230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:03:59,808-Speed 6312.20 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global Step: 696240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:04:03,042-Speed 6332.65 samples/sec Loss 2.8624 LearningRate 0.0000 Epoch: 33 Global Step: 696250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:06,296-Speed 6296.97 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 696260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:09,544-Speed 6306.78 samples/sec Loss 2.8826 LearningRate 0.0000 Epoch: 33 Global Step: 696270 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:04:12,796-Speed 6299.35 samples/sec Loss 2.9388 LearningRate 0.0000 Epoch: 33 Global Step: 696280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:16,048-Speed 6299.12 samples/sec Loss 2.9515 LearningRate 0.0000 Epoch: 33 Global Step: 696290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:19,290-Speed 6317.50 samples/sec Loss 2.9499 LearningRate 0.0000 Epoch: 33 Global Step: 696300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:22,535-Speed 6312.74 samples/sec Loss 3.0109 LearningRate 0.0000 Epoch: 33 Global Step: 696310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:25,781-Speed 6312.08 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 696320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:29,024-Speed 6316.30 samples/sec Loss 2.9345 LearningRate 0.0000 Epoch: 33 Global Step: 696330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:32,269-Speed 6312.54 samples/sec Loss 2.9252 LearningRate 0.0000 Epoch: 33 Global Step: 696340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:35,518-Speed 6304.81 samples/sec Loss 2.9744 LearningRate 0.0000 Epoch: 33 Global Step: 696350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:04:38,772-Speed 6296.57 samples/sec Loss 2.9468 LearningRate 0.0000 Epoch: 33 Global Step: 696360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:04:42,029-Speed 6288.28 samples/sec Loss 2.9367 LearningRate 0.0000 Epoch: 33 Global Step: 696370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:04:45,273-Speed 6315.23 samples/sec Loss 2.9331 LearningRate 0.0000 Epoch: 33 Global Step: 696380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:04:48,508-Speed 6331.47 samples/sec Loss 2.9129 LearningRate 0.0000 Epoch: 33 Global Step: 696390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:04:51,753-Speed 6313.69 samples/sec Loss 2.9623 LearningRate 0.0000 Epoch: 33 Global Step: 696400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:55,006-Speed 6296.76 samples/sec Loss 2.9365 LearningRate 0.0000 Epoch: 33 Global Step: 696410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:04:58,257-Speed 6300.54 samples/sec Loss 2.9436 LearningRate 0.0000 Epoch: 33 Global Step: 696420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:01,523-Speed 6272.87 samples/sec Loss 2.9826 LearningRate 0.0000 Epoch: 33 Global Step: 696430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:04,779-Speed 6290.68 samples/sec Loss 2.9957 LearningRate 0.0000 Epoch: 33 Global Step: 696440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:08,032-Speed 6297.18 samples/sec Loss 2.9948 LearningRate 0.0000 Epoch: 33 Global Step: 696450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:11,283-Speed 6302.49 samples/sec Loss 2.9663 LearningRate 0.0000 Epoch: 33 Global Step: 696460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:14,532-Speed 6304.64 samples/sec Loss 2.9637 LearningRate 0.0000 Epoch: 33 Global Step: 696470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:17,786-Speed 6295.69 samples/sec Loss 2.9637 LearningRate 0.0000 Epoch: 33 Global Step: 696480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:21,038-Speed 6298.38 samples/sec Loss 2.9645 LearningRate 0.0000 Epoch: 33 Global Step: 696490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:05:24,299-Speed 6282.79 samples/sec Loss 2.9376 LearningRate 0.0000 Epoch: 33 Global Step: 696500 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:05:27,600-Speed 6205.17 samples/sec Loss 2.9596 LearningRate 0.0000 Epoch: 33 Global Step: 696510 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:05:30,857-Speed 6290.93 samples/sec Loss 
3.0036 LearningRate 0.0000 Epoch: 33 Global Step: 696520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:05:34,098-Speed 6319.17 samples/sec Loss 2.9300 LearningRate 0.0000 Epoch: 33 Global Step: 696530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:37,351-Speed 6298.65 samples/sec Loss 2.9121 LearningRate 0.0000 Epoch: 33 Global Step: 696540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:40,606-Speed 6292.16 samples/sec Loss 2.9326 LearningRate 0.0000 Epoch: 33 Global Step: 696550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:43,857-Speed 6302.20 samples/sec Loss 2.9311 LearningRate 0.0000 Epoch: 33 Global Step: 696560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:47,112-Speed 6293.18 samples/sec Loss 2.9129 LearningRate 0.0000 Epoch: 33 Global Step: 696570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:50,370-Speed 6287.17 samples/sec Loss 2.9047 LearningRate 0.0000 Epoch: 33 Global Step: 696580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:53,626-Speed 6291.51 samples/sec Loss 2.8983 LearningRate 0.0000 Epoch: 33 Global Step: 696590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:05:56,884-Speed 6286.27 samples/sec Loss 2.9040 LearningRate 0.0000 Epoch: 33 Global Step: 696600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:00,141-Speed 6290.60 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 33 Global Step: 696610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:03,389-Speed 6305.09 samples/sec Loss 2.9708 LearningRate 0.0000 Epoch: 33 Global Step: 696620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:06,640-Speed 6302.12 samples/sec Loss 2.9052 LearningRate 0.0000 Epoch: 33 Global Step: 696630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:09,886-Speed 6309.77 samples/sec Loss 2.8642 LearningRate 0.0000 Epoch: 33 Global 
Step: 696640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:13,135-Speed 6305.87 samples/sec Loss 2.9443 LearningRate 0.0000 Epoch: 33 Global Step: 696650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:16,385-Speed 6303.85 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 33 Global Step: 696660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:19,639-Speed 6294.28 samples/sec Loss 2.9196 LearningRate 0.0000 Epoch: 33 Global Step: 696670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:22,904-Speed 6274.59 samples/sec Loss 2.9296 LearningRate 0.0000 Epoch: 33 Global Step: 696680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:26,156-Speed 6299.06 samples/sec Loss 2.9129 LearningRate 0.0000 Epoch: 33 Global Step: 696690 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:29,498-Speed 6129.37 samples/sec Loss 2.8868 LearningRate 0.0000 Epoch: 33 Global Step: 696700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:32,766-Speed 6268.20 samples/sec Loss 2.9764 LearningRate 0.0000 Epoch: 33 Global Step: 696710 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:36,017-Speed 6301.59 samples/sec Loss 2.9219 LearningRate 0.0000 Epoch: 33 Global Step: 696720 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:06:39,234-Speed 6366.63 samples/sec Loss 2.9822 LearningRate 0.0000 Epoch: 33 Global Step: 696730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:42,482-Speed 6309.53 samples/sec Loss 2.8970 LearningRate 0.0000 Epoch: 33 Global Step: 696740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:45,731-Speed 6304.24 samples/sec Loss 2.9336 LearningRate 0.0000 Epoch: 33 Global Step: 696750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:48,989-Speed 6287.48 samples/sec Loss 2.9222 LearningRate 0.0000 Epoch: 33 Global Step: 696760 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:06:52,250-Speed 6281.46 samples/sec Loss 2.9133 LearningRate 0.0000 Epoch: 33 Global Step: 696770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:55,502-Speed 6298.22 samples/sec Loss 2.9109 LearningRate 0.0000 Epoch: 33 Global Step: 696780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:06:58,755-Speed 6297.76 samples/sec Loss 3.0095 LearningRate 0.0000 Epoch: 33 Global Step: 696790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:02,004-Speed 6305.39 samples/sec Loss 2.9265 LearningRate 0.0000 Epoch: 33 Global Step: 696800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:05,249-Speed 6313.22 samples/sec Loss 2.9180 LearningRate 0.0000 Epoch: 33 Global Step: 696810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:08,499-Speed 6302.62 samples/sec Loss 2.9730 LearningRate 0.0000 Epoch: 33 Global Step: 696820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:11,750-Speed 6300.96 samples/sec Loss 2.9201 LearningRate 0.0000 Epoch: 33 Global Step: 696830 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:07:14,985-Speed 6331.91 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 33 Global Step: 696840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:18,242-Speed 6288.97 samples/sec Loss 2.9329 LearningRate 0.0000 Epoch: 33 Global Step: 696850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:21,498-Speed 6291.42 samples/sec Loss 2.9469 LearningRate 0.0000 Epoch: 33 Global Step: 696860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:24,747-Speed 6306.11 samples/sec Loss 2.9549 LearningRate 0.0000 Epoch: 33 Global Step: 696870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:27,989-Speed 6318.13 samples/sec Loss 2.9070 LearningRate 0.0000 Epoch: 33 Global Step: 696880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:07:31,239-Speed 6303.32 samples/sec Loss 2.9537 LearningRate 0.0000 Epoch: 33 Global Step: 696890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:34,483-Speed 6315.36 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 33 Global Step: 696900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:37,733-Speed 6302.70 samples/sec Loss 2.9449 LearningRate 0.0000 Epoch: 33 Global Step: 696910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:40,985-Speed 6297.47 samples/sec Loss 2.9273 LearningRate 0.0000 Epoch: 33 Global Step: 696920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:44,235-Speed 6302.87 samples/sec Loss 2.9611 LearningRate 0.0000 Epoch: 33 Global Step: 696930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:07:47,485-Speed 6304.41 samples/sec Loss 2.8749 LearningRate 0.0000 Epoch: 33 Global Step: 696940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:07:50,736-Speed 6302.35 samples/sec Loss 2.9084 LearningRate 0.0000 Epoch: 33 Global Step: 696950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:07:53,984-Speed 6305.11 samples/sec Loss 2.9879 LearningRate 0.0000 Epoch: 33 Global Step: 696960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:07:57,218-Speed 6334.58 samples/sec Loss 2.9534 LearningRate 0.0000 Epoch: 33 Global Step: 696970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:00,470-Speed 6300.12 samples/sec Loss 2.9819 LearningRate 0.0000 Epoch: 33 Global Step: 696980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:03,717-Speed 6307.29 samples/sec Loss 2.9552 LearningRate 0.0000 Epoch: 33 Global Step: 696990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:06,966-Speed 6306.55 samples/sec Loss 2.9502 LearningRate 0.0000 Epoch: 33 Global Step: 697000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:10,212-Speed 6309.54 samples/sec Loss 
2.9659 LearningRate 0.0000 Epoch: 33 Global Step: 697010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:13,462-Speed 6302.77 samples/sec Loss 2.9297 LearningRate 0.0000 Epoch: 33 Global Step: 697020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:16,706-Speed 6315.16 samples/sec Loss 2.9493 LearningRate 0.0000 Epoch: 33 Global Step: 697030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:19,952-Speed 6311.19 samples/sec Loss 2.9000 LearningRate 0.0000 Epoch: 33 Global Step: 697040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:23,198-Speed 6310.68 samples/sec Loss 2.9831 LearningRate 0.0000 Epoch: 33 Global Step: 697050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:26,444-Speed 6310.41 samples/sec Loss 2.9326 LearningRate 0.0000 Epoch: 33 Global Step: 697060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:29,699-Speed 6293.75 samples/sec Loss 2.9620 LearningRate 0.0000 Epoch: 33 Global Step: 697070 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:08:32,940-Speed 6319.94 samples/sec Loss 2.9841 LearningRate 0.0000 Epoch: 33 Global Step: 697080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:36,195-Speed 6293.50 samples/sec Loss 2.8940 LearningRate 0.0000 Epoch: 33 Global Step: 697090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:39,452-Speed 6289.24 samples/sec Loss 2.9096 LearningRate 0.0000 Epoch: 33 Global Step: 697100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:42,705-Speed 6297.18 samples/sec Loss 3.0013 LearningRate 0.0000 Epoch: 33 Global Step: 697110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:45,952-Speed 6308.78 samples/sec Loss 2.9116 LearningRate 0.0000 Epoch: 33 Global Step: 697120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:49,195-Speed 6316.01 samples/sec Loss 2.9396 LearningRate 0.0000 Epoch: 33 Global 
Step: 697130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:52,446-Speed 6301.64 samples/sec Loss 2.9487 LearningRate 0.0000 Epoch: 33 Global Step: 697140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:55,697-Speed 6301.78 samples/sec Loss 2.9575 LearningRate 0.0000 Epoch: 33 Global Step: 697150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:08:58,942-Speed 6313.77 samples/sec Loss 2.9898 LearningRate 0.0000 Epoch: 33 Global Step: 697160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:02,195-Speed 6297.03 samples/sec Loss 2.9810 LearningRate 0.0000 Epoch: 33 Global Step: 697170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:05,443-Speed 6306.75 samples/sec Loss 2.9449 LearningRate 0.0000 Epoch: 33 Global Step: 697180 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:08,697-Speed 6294.98 samples/sec Loss 2.9779 LearningRate 0.0000 Epoch: 33 Global Step: 697190 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:11,932-Speed 6331.22 samples/sec Loss 2.9418 LearningRate 0.0000 Epoch: 33 Global Step: 697200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:15,186-Speed 6295.55 samples/sec Loss 2.9778 LearningRate 0.0000 Epoch: 33 Global Step: 697210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:18,439-Speed 6299.03 samples/sec Loss 2.9098 LearningRate 0.0000 Epoch: 33 Global Step: 697220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:21,693-Speed 6294.50 samples/sec Loss 2.9711 LearningRate 0.0000 Epoch: 33 Global Step: 697230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:24,943-Speed 6302.86 samples/sec Loss 2.9615 LearningRate 0.0000 Epoch: 33 Global Step: 697240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:28,194-Speed 6302.24 samples/sec Loss 2.9450 LearningRate 0.0000 Epoch: 33 Global Step: 697250 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:09:31,436-Speed 6318.53 samples/sec Loss 2.9433 LearningRate 0.0000 Epoch: 33 Global Step: 697260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:34,687-Speed 6300.49 samples/sec Loss 2.8915 LearningRate 0.0000 Epoch: 33 Global Step: 697270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:37,935-Speed 6306.23 samples/sec Loss 2.9673 LearningRate 0.0000 Epoch: 33 Global Step: 697280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:41,184-Speed 6305.95 samples/sec Loss 2.9642 LearningRate 0.0000 Epoch: 33 Global Step: 697290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:09:44,431-Speed 6308.25 samples/sec Loss 2.9360 LearningRate 0.0000 Epoch: 33 Global Step: 697300 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:47,689-Speed 6287.80 samples/sec Loss 3.0063 LearningRate 0.0000 Epoch: 33 Global Step: 697310 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:50,941-Speed 6298.50 samples/sec Loss 2.9106 LearningRate 0.0000 Epoch: 33 Global Step: 697320 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:54,195-Speed 6296.15 samples/sec Loss 2.9433 LearningRate 0.0000 Epoch: 33 Global Step: 697330 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:09:57,448-Speed 6296.22 samples/sec Loss 2.9521 LearningRate 0.0000 Epoch: 33 Global Step: 697340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:10:00,698-Speed 6303.79 samples/sec Loss 2.9439 LearningRate 0.0000 Epoch: 33 Global Step: 697350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:10:03,938-Speed 6322.58 samples/sec Loss 2.9458 LearningRate 0.0000 Epoch: 33 Global Step: 697360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:07,204-Speed 6271.56 samples/sec Loss 2.9697 LearningRate 0.0000 Epoch: 33 Global Step: 697370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:10:10,462-Speed 6288.17 samples/sec Loss 2.9288 LearningRate 0.0000 Epoch: 33 Global Step: 697380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:13,708-Speed 6310.58 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 33 Global Step: 697390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:16,955-Speed 6308.79 samples/sec Loss 2.9351 LearningRate 0.0000 Epoch: 33 Global Step: 697400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:20,207-Speed 6299.24 samples/sec Loss 2.9246 LearningRate 0.0000 Epoch: 33 Global Step: 697410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:23,460-Speed 6296.84 samples/sec Loss 3.0072 LearningRate 0.0000 Epoch: 33 Global Step: 697420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:26,712-Speed 6298.68 samples/sec Loss 2.9439 LearningRate 0.0000 Epoch: 33 Global Step: 697430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:29,966-Speed 6296.65 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 697440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:33,216-Speed 6302.27 samples/sec Loss 2.9460 LearningRate 0.0000 Epoch: 33 Global Step: 697450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:36,469-Speed 6296.70 samples/sec Loss 2.9239 LearningRate 0.0000 Epoch: 33 Global Step: 697460 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:10:39,714-Speed 6312.56 samples/sec Loss 2.9572 LearningRate 0.0000 Epoch: 33 Global Step: 697470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:42,975-Speed 6282.97 samples/sec Loss 2.9627 LearningRate 0.0000 Epoch: 33 Global Step: 697480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:46,229-Speed 6294.43 samples/sec Loss 2.8954 LearningRate 0.0000 Epoch: 33 Global Step: 697490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:49,477-Speed 6306.29 samples/sec Loss 
2.9550 LearningRate 0.0000 Epoch: 33 Global Step: 697500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:52,724-Speed 6308.89 samples/sec Loss 2.9144 LearningRate 0.0000 Epoch: 33 Global Step: 697510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:55,972-Speed 6306.49 samples/sec Loss 2.9302 LearningRate 0.0000 Epoch: 33 Global Step: 697520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:10:59,223-Speed 6302.06 samples/sec Loss 2.9558 LearningRate 0.0000 Epoch: 33 Global Step: 697530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:02,475-Speed 6298.49 samples/sec Loss 2.9256 LearningRate 0.0000 Epoch: 33 Global Step: 697540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:05,729-Speed 6293.99 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 33 Global Step: 697550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:08,990-Speed 6283.83 samples/sec Loss 2.9809 LearningRate 0.0000 Epoch: 33 Global Step: 697560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:12,238-Speed 6306.79 samples/sec Loss 2.8528 LearningRate 0.0000 Epoch: 33 Global Step: 697570 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:15,486-Speed 6305.91 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 33 Global Step: 697580 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:18,734-Speed 6306.79 samples/sec Loss 2.9236 LearningRate 0.0000 Epoch: 33 Global Step: 697590 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:21,980-Speed 6311.17 samples/sec Loss 2.9444 LearningRate 0.0000 Epoch: 33 Global Step: 697600 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:25,227-Speed 6308.56 samples/sec Loss 2.9323 LearningRate 0.0000 Epoch: 33 Global Step: 697610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:28,476-Speed 6305.19 samples/sec Loss 2.9612 LearningRate 0.0000 Epoch: 33 Global 
Step: 697620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:11:31,709-Speed 6336.90 samples/sec Loss 2.9508 LearningRate 0.0000 Epoch: 33 Global Step: 697630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:34,970-Speed 6281.05 samples/sec Loss 2.9175 LearningRate 0.0000 Epoch: 33 Global Step: 697640 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:38,215-Speed 6312.16 samples/sec Loss 2.9499 LearningRate 0.0000 Epoch: 33 Global Step: 697650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:41,460-Speed 6313.28 samples/sec Loss 2.9024 LearningRate 0.0000 Epoch: 33 Global Step: 697660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:44,718-Speed 6287.75 samples/sec Loss 2.9664 LearningRate 0.0000 Epoch: 33 Global Step: 697670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:47,967-Speed 6304.97 samples/sec Loss 2.9988 LearningRate 0.0000 Epoch: 33 Global Step: 697680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:51,215-Speed 6305.82 samples/sec Loss 2.9062 LearningRate 0.0000 Epoch: 33 Global Step: 697690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:54,467-Speed 6298.48 samples/sec Loss 2.9426 LearningRate 0.0000 Epoch: 33 Global Step: 697700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:11:57,713-Speed 6311.80 samples/sec Loss 3.0019 LearningRate 0.0000 Epoch: 33 Global Step: 697710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:00,962-Speed 6304.92 samples/sec Loss 2.9788 LearningRate 0.0000 Epoch: 33 Global Step: 697720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:04,223-Speed 6281.14 samples/sec Loss 2.9793 LearningRate 0.0000 Epoch: 33 Global Step: 697730 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:12:07,465-Speed 6319.10 samples/sec Loss 2.9006 LearningRate 0.0000 Epoch: 33 Global Step: 697740 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:12:10,708-Speed 6315.36 samples/sec Loss 2.9396 LearningRate 0.0000 Epoch: 33 Global Step: 697750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:13,954-Speed 6311.87 samples/sec Loss 2.9438 LearningRate 0.0000 Epoch: 33 Global Step: 697760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:17,203-Speed 6305.36 samples/sec Loss 2.9915 LearningRate 0.0000 Epoch: 33 Global Step: 697770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:20,458-Speed 6292.73 samples/sec Loss 2.9319 LearningRate 0.0000 Epoch: 33 Global Step: 697780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:23,723-Speed 6275.54 samples/sec Loss 2.9283 LearningRate 0.0000 Epoch: 33 Global Step: 697790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:26,982-Speed 6284.59 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 33 Global Step: 697800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:30,236-Speed 6295.08 samples/sec Loss 2.9349 LearningRate 0.0000 Epoch: 33 Global Step: 697810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:33,482-Speed 6310.36 samples/sec Loss 2.9736 LearningRate 0.0000 Epoch: 33 Global Step: 697820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:36,733-Speed 6301.07 samples/sec Loss 2.9396 LearningRate 0.0000 Epoch: 33 Global Step: 697830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:39,984-Speed 6301.13 samples/sec Loss 2.9366 LearningRate 0.0000 Epoch: 33 Global Step: 697840 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:12:43,217-Speed 6335.72 samples/sec Loss 2.9152 LearningRate 0.0000 Epoch: 33 Global Step: 697850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:46,471-Speed 6296.20 samples/sec Loss 2.9468 LearningRate 0.0000 Epoch: 33 Global Step: 697860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:12:49,720-Speed 6304.05 samples/sec Loss 2.9329 LearningRate 0.0000 Epoch: 33 Global Step: 697870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:52,973-Speed 6297.80 samples/sec Loss 2.9588 LearningRate 0.0000 Epoch: 33 Global Step: 697880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:56,225-Speed 6299.24 samples/sec Loss 2.9492 LearningRate 0.0000 Epoch: 33 Global Step: 697890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:12:59,472-Speed 6307.18 samples/sec Loss 2.9413 LearningRate 0.0000 Epoch: 33 Global Step: 697900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:02,726-Speed 6296.30 samples/sec Loss 2.9450 LearningRate 0.0000 Epoch: 33 Global Step: 697910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:05,969-Speed 6315.51 samples/sec Loss 2.9460 LearningRate 0.0000 Epoch: 33 Global Step: 697920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:09,221-Speed 6300.10 samples/sec Loss 2.9311 LearningRate 0.0000 Epoch: 33 Global Step: 697930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:12,475-Speed 6294.38 samples/sec Loss 2.9955 LearningRate 0.0000 Epoch: 33 Global Step: 697940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:15,722-Speed 6309.74 samples/sec Loss 2.9098 LearningRate 0.0000 Epoch: 33 Global Step: 697950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:18,968-Speed 6310.11 samples/sec Loss 2.9250 LearningRate 0.0000 Epoch: 33 Global Step: 697960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:22,219-Speed 6300.12 samples/sec Loss 2.9373 LearningRate 0.0000 Epoch: 33 Global Step: 697970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:25,469-Speed 6304.52 samples/sec Loss 2.9041 LearningRate 0.0000 Epoch: 33 Global Step: 697980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:28,718-Speed 6305.46 samples/sec Loss 
2.9350 LearningRate 0.0000 Epoch: 33 Global Step: 697990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:31,967-Speed 6304.32 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 33 Global Step: 698000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:13:35,208-Speed 6320.66 samples/sec Loss 2.9738 LearningRate 0.0000 Epoch: 33 Global Step: 698010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:38,464-Speed 6290.05 samples/sec Loss 2.9938 LearningRate 0.0000 Epoch: 33 Global Step: 698020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:41,717-Speed 6297.85 samples/sec Loss 2.9464 LearningRate 0.0000 Epoch: 33 Global Step: 698030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:44,964-Speed 6309.06 samples/sec Loss 2.9399 LearningRate 0.0000 Epoch: 33 Global Step: 698040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:48,214-Speed 6301.76 samples/sec Loss 3.0068 LearningRate 0.0000 Epoch: 33 Global Step: 698050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:51,466-Speed 6299.40 samples/sec Loss 2.9307 LearningRate 0.0000 Epoch: 33 Global Step: 698060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:54,716-Speed 6304.34 samples/sec Loss 2.9140 LearningRate 0.0000 Epoch: 33 Global Step: 698070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:13:57,961-Speed 6311.33 samples/sec Loss 2.9105 LearningRate 0.0000 Epoch: 33 Global Step: 698080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:01,204-Speed 6317.41 samples/sec Loss 2.9337 LearningRate 0.0000 Epoch: 33 Global Step: 698090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:04,460-Speed 6291.78 samples/sec Loss 2.9290 LearningRate 0.0000 Epoch: 33 Global Step: 698100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:07,715-Speed 6291.37 samples/sec Loss 2.9069 LearningRate 0.0000 Epoch: 33 Global 
Step: 698110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:14:10,960-Speed 6313.16 samples/sec Loss 2.9179 LearningRate 0.0000 Epoch: 33 Global Step: 698120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:14,209-Speed 6305.08 samples/sec Loss 2.9371 LearningRate 0.0000 Epoch: 33 Global Step: 698130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:17,459-Speed 6302.89 samples/sec Loss 2.9890 LearningRate 0.0000 Epoch: 33 Global Step: 698140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:20,710-Speed 6301.01 samples/sec Loss 2.9798 LearningRate 0.0000 Epoch: 33 Global Step: 698150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:23,957-Speed 6309.05 samples/sec Loss 3.0014 LearningRate 0.0000 Epoch: 33 Global Step: 698160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:27,205-Speed 6307.91 samples/sec Loss 2.9691 LearningRate 0.0000 Epoch: 33 Global Step: 698170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:30,465-Speed 6282.99 samples/sec Loss 2.9550 LearningRate 0.0000 Epoch: 33 Global Step: 698180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:33,721-Speed 6290.43 samples/sec Loss 2.9578 LearningRate 0.0000 Epoch: 33 Global Step: 698190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:36,975-Speed 6296.63 samples/sec Loss 2.8845 LearningRate 0.0000 Epoch: 33 Global Step: 698200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:40,231-Speed 6291.44 samples/sec Loss 2.9323 LearningRate 0.0000 Epoch: 33 Global Step: 698210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:14:43,483-Speed 6299.07 samples/sec Loss 2.9385 LearningRate 0.0000 Epoch: 33 Global Step: 698220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:14:46,736-Speed 6295.97 samples/sec Loss 2.9682 LearningRate 0.0000 Epoch: 33 Global Step: 698230 Fp16 Grad Scale: 8192 
Required: 12 hours Training: 2022-04-03 07:14:49,983-Speed 6309.50 samples/sec Loss 2.9354 LearningRate 0.0000 Epoch: 33 Global Step: 698240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:14:53,244-Speed 6281.70 samples/sec Loss 2.9303 LearningRate 0.0000 Epoch: 33 Global Step: 698250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:14:56,493-Speed 6304.34 samples/sec Loss 2.8852 LearningRate 0.0000 Epoch: 33 Global Step: 698260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:14:59,731-Speed 6327.12 samples/sec Loss 2.9443 LearningRate 0.0000 Epoch: 33 Global Step: 698270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:02,989-Speed 6286.66 samples/sec Loss 2.9256 LearningRate 0.0000 Epoch: 33 Global Step: 698280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:06,245-Speed 6292.52 samples/sec Loss 2.8993 LearningRate 0.0000 Epoch: 33 Global Step: 698290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:09,494-Speed 6304.43 samples/sec Loss 2.9691 LearningRate 0.0000 Epoch: 33 Global Step: 698300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:12,748-Speed 6294.70 samples/sec Loss 2.8770 LearningRate 0.0000 Epoch: 33 Global Step: 698310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:16,041-Speed 6221.32 samples/sec Loss 2.9228 LearningRate 0.0000 Epoch: 33 Global Step: 698320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:19,350-Speed 6189.90 samples/sec Loss 2.8994 LearningRate 0.0000 Epoch: 33 Global Step: 698330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:22,609-Speed 6286.23 samples/sec Loss 2.8921 LearningRate 0.0000 Epoch: 33 Global Step: 698340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:25,864-Speed 6292.62 samples/sec Loss 2.9686 LearningRate 0.0000 Epoch: 33 Global Step: 698350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:15:29,116-Speed 6299.26 samples/sec Loss 2.9851 LearningRate 0.0000 Epoch: 33 Global Step: 698360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:32,372-Speed 6290.80 samples/sec Loss 2.9489 LearningRate 0.0000 Epoch: 33 Global Step: 698370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:15:35,625-Speed 6298.36 samples/sec Loss 2.9208 LearningRate 0.0000 Epoch: 33 Global Step: 698380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:15:38,877-Speed 6298.10 samples/sec Loss 2.9923 LearningRate 0.0000 Epoch: 33 Global Step: 698390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:15:42,135-Speed 6287.60 samples/sec Loss 2.9518 LearningRate 0.0000 Epoch: 33 Global Step: 698400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:15:45,380-Speed 6314.16 samples/sec Loss 2.9490 LearningRate 0.0000 Epoch: 33 Global Step: 698410 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:15:48,617-Speed 6328.53 samples/sec Loss 2.9779 LearningRate 0.0000 Epoch: 33 Global Step: 698420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:51,865-Speed 6306.83 samples/sec Loss 2.9672 LearningRate 0.0000 Epoch: 33 Global Step: 698430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:55,124-Speed 6285.68 samples/sec Loss 2.8156 LearningRate 0.0000 Epoch: 33 Global Step: 698440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:15:58,377-Speed 6297.05 samples/sec Loss 3.0211 LearningRate 0.0000 Epoch: 33 Global Step: 698450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:01,629-Speed 6297.80 samples/sec Loss 2.9208 LearningRate 0.0000 Epoch: 33 Global Step: 698460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:04,886-Speed 6290.21 samples/sec Loss 2.9087 LearningRate 0.0000 Epoch: 33 Global Step: 698470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:08,142-Speed 6291.41 samples/sec Loss 
2.9363 LearningRate 0.0000 Epoch: 33 Global Step: 698480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:11,403-Speed 6280.91 samples/sec Loss 2.9395 LearningRate 0.0000 Epoch: 33 Global Step: 698490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:14,655-Speed 6299.61 samples/sec Loss 2.9404 LearningRate 0.0000 Epoch: 33 Global Step: 698500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:17,910-Speed 6292.60 samples/sec Loss 2.9518 LearningRate 0.0000 Epoch: 33 Global Step: 698510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:21,156-Speed 6311.54 samples/sec Loss 2.9284 LearningRate 0.0000 Epoch: 33 Global Step: 698520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:24,414-Speed 6286.71 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 698530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:27,669-Speed 6292.56 samples/sec Loss 2.8865 LearningRate 0.0000 Epoch: 33 Global Step: 698540 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:30,921-Speed 6299.54 samples/sec Loss 2.9917 LearningRate 0.0000 Epoch: 33 Global Step: 698550 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:34,172-Speed 6300.17 samples/sec Loss 2.9908 LearningRate 0.0000 Epoch: 33 Global Step: 698560 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:37,417-Speed 6313.10 samples/sec Loss 2.9605 LearningRate 0.0000 Epoch: 33 Global Step: 698570 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:40,666-Speed 6304.98 samples/sec Loss 2.9340 LearningRate 0.0000 Epoch: 33 Global Step: 698580 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:43,922-Speed 6292.12 samples/sec Loss 2.9201 LearningRate 0.0000 Epoch: 33 Global Step: 698590 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:16:47,151-Speed 6342.95 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global 
Step: 698600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:50,405-Speed 6295.40 samples/sec Loss 2.9352 LearningRate 0.0000 Epoch: 33 Global Step: 698610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:53,662-Speed 6290.19 samples/sec Loss 2.9093 LearningRate 0.0000 Epoch: 33 Global Step: 698620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:16:56,912-Speed 6302.59 samples/sec Loss 2.9325 LearningRate 0.0000 Epoch: 33 Global Step: 698630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:00,162-Speed 6303.30 samples/sec Loss 2.9064 LearningRate 0.0000 Epoch: 33 Global Step: 698640 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:03,414-Speed 6300.32 samples/sec Loss 2.9486 LearningRate 0.0000 Epoch: 33 Global Step: 698650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:06,671-Speed 6288.18 samples/sec Loss 2.9183 LearningRate 0.0000 Epoch: 33 Global Step: 698660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:09,930-Speed 6286.43 samples/sec Loss 2.9308 LearningRate 0.0000 Epoch: 33 Global Step: 698670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:13,184-Speed 6293.76 samples/sec Loss 2.9358 LearningRate 0.0000 Epoch: 33 Global Step: 698680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:16,437-Speed 6296.96 samples/sec Loss 2.8698 LearningRate 0.0000 Epoch: 33 Global Step: 698690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:19,694-Speed 6291.45 samples/sec Loss 2.8840 LearningRate 0.0000 Epoch: 33 Global Step: 698700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:17:22,952-Speed 6287.07 samples/sec Loss 2.9320 LearningRate 0.0000 Epoch: 33 Global Step: 698710 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:17:26,210-Speed 6287.18 samples/sec Loss 2.9389 LearningRate 0.0000 Epoch: 33 Global Step: 698720 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:17:29,467-Speed 6289.09 samples/sec Loss 2.9189 LearningRate 0.0000 Epoch: 33 Global Step: 698730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:32,715-Speed 6307.28 samples/sec Loss 2.9506 LearningRate 0.0000 Epoch: 33 Global Step: 698740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:35,958-Speed 6315.66 samples/sec Loss 2.9210 LearningRate 0.0000 Epoch: 33 Global Step: 698750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:39,210-Speed 6298.92 samples/sec Loss 2.9558 LearningRate 0.0000 Epoch: 33 Global Step: 698760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:42,467-Speed 6289.79 samples/sec Loss 2.9231 LearningRate 0.0000 Epoch: 33 Global Step: 698770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:45,716-Speed 6303.60 samples/sec Loss 2.9302 LearningRate 0.0000 Epoch: 33 Global Step: 698780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:48,969-Speed 6297.99 samples/sec Loss 2.8552 LearningRate 0.0000 Epoch: 33 Global Step: 698790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:52,218-Speed 6305.26 samples/sec Loss 2.9294 LearningRate 0.0000 Epoch: 33 Global Step: 698800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:55,482-Speed 6275.03 samples/sec Loss 2.9137 LearningRate 0.0000 Epoch: 33 Global Step: 698810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:17:58,716-Speed 6335.19 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 33 Global Step: 698820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:01,965-Speed 6306.07 samples/sec Loss 2.9500 LearningRate 0.0000 Epoch: 33 Global Step: 698830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:05,211-Speed 6309.94 samples/sec Loss 2.9417 LearningRate 0.0000 Epoch: 33 Global Step: 698840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:18:08,461-Speed 6302.94 samples/sec Loss 2.9911 LearningRate 0.0000 Epoch: 33 Global Step: 698850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:11,710-Speed 6305.35 samples/sec Loss 2.9284 LearningRate 0.0000 Epoch: 33 Global Step: 698860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:14,961-Speed 6301.53 samples/sec Loss 2.9107 LearningRate 0.0000 Epoch: 33 Global Step: 698870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:18,213-Speed 6298.23 samples/sec Loss 2.9516 LearningRate 0.0000 Epoch: 33 Global Step: 698880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:21,464-Speed 6300.36 samples/sec Loss 2.9231 LearningRate 0.0000 Epoch: 33 Global Step: 698890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:24,712-Speed 6306.65 samples/sec Loss 2.9636 LearningRate 0.0000 Epoch: 33 Global Step: 698900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:27,959-Speed 6310.00 samples/sec Loss 2.9426 LearningRate 0.0000 Epoch: 33 Global Step: 698910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:18:31,218-Speed 6285.48 samples/sec Loss 2.9255 LearningRate 0.0000 Epoch: 33 Global Step: 698920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:34,472-Speed 6295.39 samples/sec Loss 2.9135 LearningRate 0.0000 Epoch: 33 Global Step: 698930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:37,727-Speed 6292.05 samples/sec Loss 2.9577 LearningRate 0.0000 Epoch: 33 Global Step: 698940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:40,986-Speed 6285.33 samples/sec Loss 2.9284 LearningRate 0.0000 Epoch: 33 Global Step: 698950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:44,240-Speed 6295.39 samples/sec Loss 2.9240 LearningRate 0.0000 Epoch: 33 Global Step: 698960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:47,487-Speed 6308.55 samples/sec Loss 
2.8878 LearningRate 0.0000 Epoch: 33 Global Step: 698970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:50,740-Speed 6296.87 samples/sec Loss 2.9043 LearningRate 0.0000 Epoch: 33 Global Step: 698980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:53,994-Speed 6295.46 samples/sec Loss 2.9354 LearningRate 0.0000 Epoch: 33 Global Step: 698990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:18:57,252-Speed 6288.48 samples/sec Loss 2.9314 LearningRate 0.0000 Epoch: 33 Global Step: 699000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:19:00,490-Speed 6325.48 samples/sec Loss 2.9472 LearningRate 0.0000 Epoch: 33 Global Step: 699010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:03,739-Speed 6305.48 samples/sec Loss 2.9186 LearningRate 0.0000 Epoch: 33 Global Step: 699020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:06,991-Speed 6299.81 samples/sec Loss 2.9536 LearningRate 0.0000 Epoch: 33 Global Step: 699030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:10,244-Speed 6296.57 samples/sec Loss 2.9286 LearningRate 0.0000 Epoch: 33 Global Step: 699040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:13,492-Speed 6306.23 samples/sec Loss 2.9379 LearningRate 0.0000 Epoch: 33 Global Step: 699050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:16,742-Speed 6304.50 samples/sec Loss 2.9974 LearningRate 0.0000 Epoch: 33 Global Step: 699060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:20,002-Speed 6283.13 samples/sec Loss 2.8923 LearningRate 0.0000 Epoch: 33 Global Step: 699070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:23,261-Speed 6284.45 samples/sec Loss 2.9638 LearningRate 0.0000 Epoch: 33 Global Step: 699080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:26,530-Speed 6267.62 samples/sec Loss 2.9416 LearningRate 0.0000 Epoch: 33 Global 
Step: 699090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:29,780-Speed 6303.29 samples/sec Loss 2.9338 LearningRate 0.0000 Epoch: 33 Global Step: 699100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:33,023-Speed 6315.40 samples/sec Loss 2.9358 LearningRate 0.0000 Epoch: 33 Global Step: 699110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:36,281-Speed 6288.18 samples/sec Loss 2.9355 LearningRate 0.0000 Epoch: 33 Global Step: 699120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:39,532-Speed 6300.59 samples/sec Loss 2.9509 LearningRate 0.0000 Epoch: 33 Global Step: 699130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:42,786-Speed 6295.04 samples/sec Loss 2.9473 LearningRate 0.0000 Epoch: 33 Global Step: 699140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:46,032-Speed 6310.53 samples/sec Loss 2.9300 LearningRate 0.0000 Epoch: 33 Global Step: 699150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:49,282-Speed 6301.98 samples/sec Loss 2.9754 LearningRate 0.0000 Epoch: 33 Global Step: 699160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:52,540-Speed 6288.46 samples/sec Loss 2.9542 LearningRate 0.0000 Epoch: 33 Global Step: 699170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:55,788-Speed 6306.61 samples/sec Loss 2.8835 LearningRate 0.0000 Epoch: 33 Global Step: 699180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:19:59,043-Speed 6293.81 samples/sec Loss 2.9109 LearningRate 0.0000 Epoch: 33 Global Step: 699190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:02,292-Speed 6305.35 samples/sec Loss 2.9288 LearningRate 0.0000 Epoch: 33 Global Step: 699200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:05,539-Speed 6307.40 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 33 Global Step: 699210 Fp16 Grad Scale: 8192 
Required: 12 hours Training: 2022-04-03 07:20:08,788-Speed 6304.66 samples/sec Loss 2.9802 LearningRate 0.0000 Epoch: 33 Global Step: 699220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:20:12,023-Speed 6332.36 samples/sec Loss 2.9668 LearningRate 0.0000 Epoch: 33 Global Step: 699230 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:15,275-Speed 6300.72 samples/sec Loss 2.9400 LearningRate 0.0000 Epoch: 33 Global Step: 699240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:18,533-Speed 6287.22 samples/sec Loss 3.0022 LearningRate 0.0000 Epoch: 33 Global Step: 699250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:21,782-Speed 6304.48 samples/sec Loss 2.9222 LearningRate 0.0000 Epoch: 33 Global Step: 699260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:25,033-Speed 6301.91 samples/sec Loss 2.9092 LearningRate 0.0000 Epoch: 33 Global Step: 699270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:28,282-Speed 6304.28 samples/sec Loss 2.9040 LearningRate 0.0000 Epoch: 33 Global Step: 699280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:31,529-Speed 6309.65 samples/sec Loss 2.9072 LearningRate 0.0000 Epoch: 33 Global Step: 699290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:34,780-Speed 6299.80 samples/sec Loss 2.8927 LearningRate 0.0000 Epoch: 33 Global Step: 699300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:38,039-Speed 6287.04 samples/sec Loss 2.9386 LearningRate 0.0000 Epoch: 33 Global Step: 699310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:41,287-Speed 6306.27 samples/sec Loss 2.9772 LearningRate 0.0000 Epoch: 33 Global Step: 699320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:20:44,543-Speed 6291.47 samples/sec Loss 2.8588 LearningRate 0.0000 Epoch: 33 Global Step: 699330 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 
07:20:47,792-Speed 6303.52 samples/sec Loss 2.9681 LearningRate 0.0000 Epoch: 33 Global Step: 699340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:20:51,040-Speed 6307.38 samples/sec Loss 2.9089 LearningRate 0.0000 Epoch: 33 Global Step: 699350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:20:54,293-Speed 6297.83 samples/sec Loss 2.9228 LearningRate 0.0000 Epoch: 33 Global Step: 699360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:20:57,546-Speed 6295.72 samples/sec Loss 2.9401 LearningRate 0.0000 Epoch: 33 Global Step: 699370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:21:00,805-Speed 6285.90 samples/sec Loss 2.9470 LearningRate 0.0000 Epoch: 33 Global Step: 699380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:21:04,040-Speed 6333.67 samples/sec Loss 2.9596 LearningRate 0.0000 Epoch: 33 Global Step: 699390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:07,288-Speed 6306.13 samples/sec Loss 2.9842 LearningRate 0.0000 Epoch: 33 Global Step: 699400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:10,540-Speed 6299.66 samples/sec Loss 2.9892 LearningRate 0.0000 Epoch: 33 Global Step: 699410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:13,800-Speed 6282.20 samples/sec Loss 2.9531 LearningRate 0.0000 Epoch: 33 Global Step: 699420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:17,051-Speed 6301.87 samples/sec Loss 2.8946 LearningRate 0.0000 Epoch: 33 Global Step: 699430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:20,302-Speed 6300.33 samples/sec Loss 2.9439 LearningRate 0.0000 Epoch: 33 Global Step: 699440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:23,571-Speed 6266.26 samples/sec Loss 2.9260 LearningRate 0.0000 Epoch: 33 Global Step: 699450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:26,824-Speed 6298.34 samples/sec Loss 
2.9407 LearningRate 0.0000 Epoch: 33 Global Step: 699460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:30,077-Speed 6296.68 samples/sec Loss 2.9975 LearningRate 0.0000 Epoch: 33 Global Step: 699470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:33,335-Speed 6289.09 samples/sec Loss 2.9140 LearningRate 0.0000 Epoch: 33 Global Step: 699480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:36,595-Speed 6283.44 samples/sec Loss 2.9517 LearningRate 0.0000 Epoch: 33 Global Step: 699490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:21:39,837-Speed 6318.31 samples/sec Loss 2.8709 LearningRate 0.0000 Epoch: 33 Global Step: 699500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:43,091-Speed 6293.63 samples/sec Loss 2.9410 LearningRate 0.0000 Epoch: 33 Global Step: 699510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:46,336-Speed 6313.24 samples/sec Loss 2.9759 LearningRate 0.0000 Epoch: 33 Global Step: 699520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:49,586-Speed 6302.65 samples/sec Loss 2.9243 LearningRate 0.0000 Epoch: 33 Global Step: 699530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:52,839-Speed 6298.19 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 699540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:56,098-Speed 6284.75 samples/sec Loss 2.9575 LearningRate 0.0000 Epoch: 33 Global Step: 699550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:21:59,348-Speed 6302.75 samples/sec Loss 2.9518 LearningRate 0.0000 Epoch: 33 Global Step: 699560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:02,602-Speed 6295.53 samples/sec Loss 2.9141 LearningRate 0.0000 Epoch: 33 Global Step: 699570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:05,859-Speed 6289.56 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global 
Step: 699580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:09,113-Speed 6295.47 samples/sec Loss 2.9098 LearningRate 0.0000 Epoch: 33 Global Step: 699590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:12,368-Speed 6293.58 samples/sec Loss 2.8758 LearningRate 0.0000 Epoch: 33 Global Step: 699600 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:15,620-Speed 6298.34 samples/sec Loss 2.9402 LearningRate 0.0000 Epoch: 33 Global Step: 699610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:18,876-Speed 6290.61 samples/sec Loss 2.9761 LearningRate 0.0000 Epoch: 33 Global Step: 699620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:22,132-Speed 6291.67 samples/sec Loss 2.8980 LearningRate 0.0000 Epoch: 33 Global Step: 699630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:25,385-Speed 6297.41 samples/sec Loss 2.9693 LearningRate 0.0000 Epoch: 33 Global Step: 699640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:28,647-Speed 6279.47 samples/sec Loss 2.9313 LearningRate 0.0000 Epoch: 33 Global Step: 699650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:22:31,887-Speed 6325.71 samples/sec Loss 2.9837 LearningRate 0.0000 Epoch: 33 Global Step: 699660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:35,136-Speed 6305.29 samples/sec Loss 2.9872 LearningRate 0.0000 Epoch: 33 Global Step: 699670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:38,397-Speed 6281.79 samples/sec Loss 2.9419 LearningRate 0.0000 Epoch: 33 Global Step: 699680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:41,654-Speed 6289.10 samples/sec Loss 2.9145 LearningRate 0.0000 Epoch: 33 Global Step: 699690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:44,904-Speed 6303.40 samples/sec Loss 2.9010 LearningRate 0.0000 Epoch: 33 Global Step: 699700 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:22:48,153-Speed 6303.86 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 33 Global Step: 699710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:51,410-Speed 6289.21 samples/sec Loss 2.8871 LearningRate 0.0000 Epoch: 33 Global Step: 699720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:54,666-Speed 6291.54 samples/sec Loss 2.9090 LearningRate 0.0000 Epoch: 33 Global Step: 699730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:22:57,922-Speed 6290.85 samples/sec Loss 2.9543 LearningRate 0.0000 Epoch: 33 Global Step: 699740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:01,183-Speed 6281.75 samples/sec Loss 2.9318 LearningRate 0.0000 Epoch: 33 Global Step: 699750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:04,437-Speed 6296.50 samples/sec Loss 2.9501 LearningRate 0.0000 Epoch: 33 Global Step: 699760 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:07,688-Speed 6299.50 samples/sec Loss 2.9315 LearningRate 0.0000 Epoch: 33 Global Step: 699770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:10,940-Speed 6299.61 samples/sec Loss 2.8876 LearningRate 0.0000 Epoch: 33 Global Step: 699780 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:14,193-Speed 6297.40 samples/sec Loss 2.8772 LearningRate 0.0000 Epoch: 33 Global Step: 699790 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:17,432-Speed 6324.21 samples/sec Loss 2.9437 LearningRate 0.0000 Epoch: 33 Global Step: 699800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:20,682-Speed 6302.18 samples/sec Loss 2.9181 LearningRate 0.0000 Epoch: 33 Global Step: 699810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:23,932-Speed 6303.19 samples/sec Loss 2.9351 LearningRate 0.0000 Epoch: 33 Global Step: 699820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:23:27,177-Speed 6312.16 samples/sec Loss 2.9033 LearningRate 0.0000 Epoch: 33 Global Step: 699830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:30,437-Speed 6284.82 samples/sec Loss 2.9158 LearningRate 0.0000 Epoch: 33 Global Step: 699840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:33,688-Speed 6301.73 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 33 Global Step: 699850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:36,945-Speed 6287.31 samples/sec Loss 2.9508 LearningRate 0.0000 Epoch: 33 Global Step: 699860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:40,195-Speed 6305.32 samples/sec Loss 2.9548 LearningRate 0.0000 Epoch: 33 Global Step: 699870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:43,448-Speed 6295.55 samples/sec Loss 2.9426 LearningRate 0.0000 Epoch: 33 Global Step: 699880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:46,739-Speed 6225.06 samples/sec Loss 2.9689 LearningRate 0.0000 Epoch: 33 Global Step: 699890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:23:49,997-Speed 6287.84 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 33 Global Step: 699900 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:53,255-Speed 6287.58 samples/sec Loss 2.9554 LearningRate 0.0000 Epoch: 33 Global Step: 699910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:56,509-Speed 6295.38 samples/sec Loss 2.9364 LearningRate 0.0000 Epoch: 33 Global Step: 699920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:23:59,760-Speed 6301.38 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 33 Global Step: 699930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:03,011-Speed 6300.19 samples/sec Loss 2.8965 LearningRate 0.0000 Epoch: 33 Global Step: 699940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:06,247-Speed 6330.96 samples/sec Loss 
2.9124 LearningRate 0.0000 Epoch: 33 Global Step: 699950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:09,502-Speed 6293.37 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 33 Global Step: 699960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:12,751-Speed 6304.23 samples/sec Loss 2.9792 LearningRate 0.0000 Epoch: 33 Global Step: 699970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:16,009-Speed 6287.90 samples/sec Loss 2.9161 LearningRate 0.0000 Epoch: 33 Global Step: 699980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:19,263-Speed 6294.54 samples/sec Loss 2.8869 LearningRate 0.0000 Epoch: 33 Global Step: 699990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:22,511-Speed 6306.68 samples/sec Loss 2.8950 LearningRate 0.0000 Epoch: 33 Global Step: 700000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:25,766-Speed 6292.66 samples/sec Loss 2.9275 LearningRate 0.0000 Epoch: 33 Global Step: 700010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:29,013-Speed 6309.96 samples/sec Loss 2.9080 LearningRate 0.0000 Epoch: 33 Global Step: 700020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:32,269-Speed 6293.85 samples/sec Loss 2.9521 LearningRate 0.0000 Epoch: 33 Global Step: 700030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:35,529-Speed 6283.25 samples/sec Loss 2.9285 LearningRate 0.0000 Epoch: 33 Global Step: 700040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:24:38,787-Speed 6287.23 samples/sec Loss 2.9568 LearningRate 0.0000 Epoch: 33 Global Step: 700050 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:42,035-Speed 6307.88 samples/sec Loss 2.9250 LearningRate 0.0000 Epoch: 33 Global Step: 700060 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:45,297-Speed 6280.40 samples/sec Loss 2.9164 LearningRate 0.0000 Epoch: 33 Global 
Step: 700070 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:48,555-Speed 6285.90 samples/sec Loss 2.8763 LearningRate 0.0000 Epoch: 33 Global Step: 700080 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:51,807-Speed 6299.19 samples/sec Loss 2.9391 LearningRate 0.0000 Epoch: 33 Global Step: 700090 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:55,055-Speed 6308.62 samples/sec Loss 2.9233 LearningRate 0.0000 Epoch: 33 Global Step: 700100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:24:58,302-Speed 6308.04 samples/sec Loss 2.8978 LearningRate 0.0000 Epoch: 33 Global Step: 700110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:01,556-Speed 6296.23 samples/sec Loss 2.8715 LearningRate 0.0000 Epoch: 33 Global Step: 700120 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:04,810-Speed 6294.28 samples/sec Loss 2.9409 LearningRate 0.0000 Epoch: 33 Global Step: 700130 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:08,060-Speed 6304.19 samples/sec Loss 2.9042 LearningRate 0.0000 Epoch: 33 Global Step: 700140 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:11,303-Speed 6315.92 samples/sec Loss 2.9392 LearningRate 0.0000 Epoch: 33 Global Step: 700150 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:14,561-Speed 6287.73 samples/sec Loss 2.9586 LearningRate 0.0000 Epoch: 33 Global Step: 700160 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:17,813-Speed 6299.65 samples/sec Loss 2.9012 LearningRate 0.0000 Epoch: 33 Global Step: 700170 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:21,068-Speed 6293.18 samples/sec Loss 2.9751 LearningRate 0.0000 Epoch: 33 Global Step: 700180 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:24,320-Speed 6298.90 samples/sec Loss 2.9352 LearningRate 0.0000 Epoch: 33 Global Step: 700190 Fp16 Grad Scale: 8192 
Required: 12 hours Training: 2022-04-03 07:25:27,571-Speed 6300.91 samples/sec Loss 2.9628 LearningRate 0.0000 Epoch: 33 Global Step: 700200 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:30,833-Speed 6278.54 samples/sec Loss 2.9340 LearningRate 0.0000 Epoch: 33 Global Step: 700210 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:34,090-Speed 6289.23 samples/sec Loss 2.9345 LearningRate 0.0000 Epoch: 33 Global Step: 700220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:37,341-Speed 6302.26 samples/sec Loss 2.9988 LearningRate 0.0000 Epoch: 33 Global Step: 700230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:25:40,579-Speed 6325.54 samples/sec Loss 2.9037 LearningRate 0.0000 Epoch: 33 Global Step: 700240 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:25:43,835-Speed 6291.32 samples/sec Loss 2.9643 LearningRate 0.0000 Epoch: 33 Global Step: 700250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:25:47,094-Speed 6285.19 samples/sec Loss 2.9594 LearningRate 0.0000 Epoch: 33 Global Step: 700260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:25:50,362-Speed 6267.51 samples/sec Loss 2.9211 LearningRate 0.0000 Epoch: 33 Global Step: 700270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:25:53,624-Speed 6279.66 samples/sec Loss 2.9047 LearningRate 0.0000 Epoch: 33 Global Step: 700280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:25:56,866-Speed 6320.31 samples/sec Loss 2.9152 LearningRate 0.0000 Epoch: 33 Global Step: 700290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:00,116-Speed 6303.02 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 33 Global Step: 700300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:03,372-Speed 6292.28 samples/sec Loss 2.9063 LearningRate 0.0000 Epoch: 33 Global Step: 700310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:26:06,617-Speed 6311.67 samples/sec Loss 2.9302 LearningRate 0.0000 Epoch: 33 Global Step: 700320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:09,933-Speed 6177.71 samples/sec Loss 2.8966 LearningRate 0.0000 Epoch: 33 Global Step: 700330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:13,248-Speed 6178.60 samples/sec Loss 2.8985 LearningRate 0.0000 Epoch: 33 Global Step: 700340 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:26:16,502-Speed 6294.90 samples/sec Loss 3.0001 LearningRate 0.0000 Epoch: 33 Global Step: 700350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:26:19,754-Speed 6299.26 samples/sec Loss 2.9495 LearningRate 0.0000 Epoch: 33 Global Step: 700360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:26:23,015-Speed 6282.66 samples/sec Loss 2.9174 LearningRate 0.0000 Epoch: 33 Global Step: 700370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:26:26,248-Speed 6335.44 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 700380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:29,508-Speed 6283.75 samples/sec Loss 2.9340 LearningRate 0.0000 Epoch: 33 Global Step: 700390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:32,758-Speed 6302.82 samples/sec Loss 2.9153 LearningRate 0.0000 Epoch: 33 Global Step: 700400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:36,009-Speed 6300.92 samples/sec Loss 2.9199 LearningRate 0.0000 Epoch: 33 Global Step: 700410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:39,259-Speed 6303.32 samples/sec Loss 2.8933 LearningRate 0.0000 Epoch: 33 Global Step: 700420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:42,509-Speed 6301.97 samples/sec Loss 2.9302 LearningRate 0.0000 Epoch: 33 Global Step: 700430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:45,769-Speed 6284.06 samples/sec Loss 
2.8591 LearningRate 0.0000 Epoch: 33 Global Step: 700440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:49,018-Speed 6304.69 samples/sec Loss 2.9372 LearningRate 0.0000 Epoch: 33 Global Step: 700450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:52,266-Speed 6306.52 samples/sec Loss 2.9146 LearningRate 0.0000 Epoch: 33 Global Step: 700460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:55,522-Speed 6292.55 samples/sec Loss 2.9482 LearningRate 0.0000 Epoch: 33 Global Step: 700470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:26:58,766-Speed 6313.06 samples/sec Loss 2.9904 LearningRate 0.0000 Epoch: 33 Global Step: 700480 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:02,020-Speed 6295.61 samples/sec Loss 2.8619 LearningRate 0.0000 Epoch: 33 Global Step: 700490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:05,305-Speed 6239.71 samples/sec Loss 2.9775 LearningRate 0.0000 Epoch: 33 Global Step: 700500 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:08,558-Speed 6295.92 samples/sec Loss 2.9358 LearningRate 0.0000 Epoch: 33 Global Step: 700510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:11,810-Speed 6300.27 samples/sec Loss 2.9360 LearningRate 0.0000 Epoch: 33 Global Step: 700520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:15,060-Speed 6303.08 samples/sec Loss 2.8923 LearningRate 0.0000 Epoch: 33 Global Step: 700530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:18,308-Speed 6306.59 samples/sec Loss 2.8828 LearningRate 0.0000 Epoch: 33 Global Step: 700540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:21,560-Speed 6299.49 samples/sec Loss 2.9293 LearningRate 0.0000 Epoch: 33 Global Step: 700550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:24,805-Speed 6312.55 samples/sec Loss 2.9592 LearningRate 0.0000 Epoch: 33 Global 
Step: 700560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:28,053-Speed 6306.42 samples/sec Loss 2.9194 LearningRate 0.0000 Epoch: 33 Global Step: 700570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:31,298-Speed 6313.70 samples/sec Loss 2.9422 LearningRate 0.0000 Epoch: 33 Global Step: 700580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:34,549-Speed 6300.48 samples/sec Loss 2.9852 LearningRate 0.0000 Epoch: 33 Global Step: 700590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:37,802-Speed 6296.96 samples/sec Loss 2.9405 LearningRate 0.0000 Epoch: 33 Global Step: 700600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:27:41,054-Speed 6299.24 samples/sec Loss 2.9316 LearningRate 0.0000 Epoch: 33 Global Step: 700610 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:44,311-Speed 6289.59 samples/sec Loss 2.9636 LearningRate 0.0000 Epoch: 33 Global Step: 700620 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:47,562-Speed 6301.32 samples/sec Loss 2.9692 LearningRate 0.0000 Epoch: 33 Global Step: 700630 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:50,828-Speed 6271.03 samples/sec Loss 2.8809 LearningRate 0.0000 Epoch: 33 Global Step: 700640 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:54,092-Speed 6276.65 samples/sec Loss 2.8292 LearningRate 0.0000 Epoch: 33 Global Step: 700650 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:27:57,342-Speed 6302.13 samples/sec Loss 2.9139 LearningRate 0.0000 Epoch: 33 Global Step: 700660 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:28:00,581-Speed 6324.10 samples/sec Loss 2.9041 LearningRate 0.0000 Epoch: 33 Global Step: 700670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:03,844-Speed 6279.52 samples/sec Loss 2.8925 LearningRate 0.0000 Epoch: 33 Global Step: 700680 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:28:07,102-Speed 6285.83 samples/sec Loss 2.9789 LearningRate 0.0000 Epoch: 33 Global Step: 700690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:10,347-Speed 6314.46 samples/sec Loss 2.9238 LearningRate 0.0000 Epoch: 33 Global Step: 700700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:13,596-Speed 6303.29 samples/sec Loss 2.9568 LearningRate 0.0000 Epoch: 33 Global Step: 700710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:16,852-Speed 6292.48 samples/sec Loss 2.9327 LearningRate 0.0000 Epoch: 33 Global Step: 700720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:20,107-Speed 6292.42 samples/sec Loss 2.9524 LearningRate 0.0000 Epoch: 33 Global Step: 700730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:23,363-Speed 6293.50 samples/sec Loss 2.9418 LearningRate 0.0000 Epoch: 33 Global Step: 700740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:26,607-Speed 6313.11 samples/sec Loss 2.9497 LearningRate 0.0000 Epoch: 33 Global Step: 700750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:29,852-Speed 6314.04 samples/sec Loss 2.9483 LearningRate 0.0000 Epoch: 33 Global Step: 700760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:33,096-Speed 6314.70 samples/sec Loss 2.9576 LearningRate 0.0000 Epoch: 33 Global Step: 700770 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:28:36,347-Speed 6300.95 samples/sec Loss 2.8784 LearningRate 0.0000 Epoch: 33 Global Step: 700780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:39,600-Speed 6296.10 samples/sec Loss 2.9667 LearningRate 0.0000 Epoch: 33 Global Step: 700790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:42,857-Speed 6290.53 samples/sec Loss 2.9725 LearningRate 0.0000 Epoch: 33 Global Step: 700800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:28:46,108-Speed 6299.34 samples/sec Loss 2.9138 LearningRate 0.0000 Epoch: 33 Global Step: 700810 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:49,368-Speed 6284.58 samples/sec Loss 2.9390 LearningRate 0.0000 Epoch: 33 Global Step: 700820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:52,636-Speed 6267.55 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 700830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:55,896-Speed 6283.98 samples/sec Loss 2.9198 LearningRate 0.0000 Epoch: 33 Global Step: 700840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:28:59,148-Speed 6299.05 samples/sec Loss 2.9654 LearningRate 0.0000 Epoch: 33 Global Step: 700850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:02,399-Speed 6304.19 samples/sec Loss 2.9322 LearningRate 0.0000 Epoch: 33 Global Step: 700860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:05,655-Speed 6291.15 samples/sec Loss 2.9559 LearningRate 0.0000 Epoch: 33 Global Step: 700870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:08,907-Speed 6298.54 samples/sec Loss 2.9087 LearningRate 0.0000 Epoch: 33 Global Step: 700880 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:12,156-Speed 6306.49 samples/sec Loss 2.9341 LearningRate 0.0000 Epoch: 33 Global Step: 700890 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:15,409-Speed 6296.45 samples/sec Loss 2.9287 LearningRate 0.0000 Epoch: 33 Global Step: 700900 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:18,661-Speed 6298.37 samples/sec Loss 2.9242 LearningRate 0.0000 Epoch: 33 Global Step: 700910 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:21,908-Speed 6309.78 samples/sec Loss 2.9393 LearningRate 0.0000 Epoch: 33 Global Step: 700920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:25,199-Speed 6224.00 samples/sec Loss 
2.9759 LearningRate 0.0000 Epoch: 33 Global Step: 700930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:28,467-Speed 6268.16 samples/sec Loss 2.9056 LearningRate 0.0000 Epoch: 33 Global Step: 700940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:31,722-Speed 6293.40 samples/sec Loss 2.9579 LearningRate 0.0000 Epoch: 33 Global Step: 700950 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:34,976-Speed 6295.22 samples/sec Loss 2.9369 LearningRate 0.0000 Epoch: 33 Global Step: 700960 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:38,229-Speed 6298.14 samples/sec Loss 2.9020 LearningRate 0.0000 Epoch: 33 Global Step: 700970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:41,472-Speed 6316.61 samples/sec Loss 2.9120 LearningRate 0.0000 Epoch: 33 Global Step: 700980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:44,723-Speed 6300.18 samples/sec Loss 2.8901 LearningRate 0.0000 Epoch: 33 Global Step: 700990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:29:47,953-Speed 6341.34 samples/sec Loss 2.9060 LearningRate 0.0000 Epoch: 33 Global Step: 701000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:51,210-Speed 6291.37 samples/sec Loss 2.8894 LearningRate 0.0000 Epoch: 33 Global Step: 701010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:54,461-Speed 6299.45 samples/sec Loss 2.8955 LearningRate 0.0000 Epoch: 33 Global Step: 701020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:29:57,715-Speed 6295.92 samples/sec Loss 2.9701 LearningRate 0.0000 Epoch: 33 Global Step: 701030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:00,970-Speed 6293.52 samples/sec Loss 2.9183 LearningRate 0.0000 Epoch: 33 Global Step: 701040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:04,230-Speed 6282.28 samples/sec Loss 2.9367 LearningRate 0.0000 Epoch: 33 Global 
Step: 701050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:07,484-Speed 6296.16 samples/sec Loss 2.9075 LearningRate 0.0000 Epoch: 33 Global Step: 701060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:10,729-Speed 6312.60 samples/sec Loss 2.8926 LearningRate 0.0000 Epoch: 33 Global Step: 701070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:13,977-Speed 6305.95 samples/sec Loss 2.9326 LearningRate 0.0000 Epoch: 33 Global Step: 701080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:17,226-Speed 6305.36 samples/sec Loss 2.8949 LearningRate 0.0000 Epoch: 33 Global Step: 701090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:20,480-Speed 6295.17 samples/sec Loss 2.9439 LearningRate 0.0000 Epoch: 33 Global Step: 701100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:30:23,736-Speed 6291.95 samples/sec Loss 2.9097 LearningRate 0.0000 Epoch: 33 Global Step: 701110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:30:26,991-Speed 6292.97 samples/sec Loss 2.9405 LearningRate 0.0000 Epoch: 33 Global Step: 701120 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:30:30,225-Speed 6334.04 samples/sec Loss 2.9795 LearningRate 0.0000 Epoch: 33 Global Step: 701130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:33,479-Speed 6295.66 samples/sec Loss 2.8977 LearningRate 0.0000 Epoch: 33 Global Step: 701140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:36,745-Speed 6273.77 samples/sec Loss 2.8974 LearningRate 0.0000 Epoch: 33 Global Step: 701150 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:39,993-Speed 6306.48 samples/sec Loss 2.8995 LearningRate 0.0000 Epoch: 33 Global Step: 701160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:43,239-Speed 6310.42 samples/sec Loss 2.9386 LearningRate 0.0000 Epoch: 33 Global Step: 701170 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:30:46,483-Speed 6313.75 samples/sec Loss 2.8832 LearningRate 0.0000 Epoch: 33 Global Step: 701180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:49,743-Speed 6285.21 samples/sec Loss 2.9430 LearningRate 0.0000 Epoch: 33 Global Step: 701190 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:53,011-Speed 6268.07 samples/sec Loss 2.9103 LearningRate 0.0000 Epoch: 33 Global Step: 701200 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:56,264-Speed 6295.95 samples/sec Loss 2.9344 LearningRate 0.0000 Epoch: 33 Global Step: 701210 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:30:59,518-Speed 6295.84 samples/sec Loss 2.9343 LearningRate 0.0000 Epoch: 33 Global Step: 701220 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:02,761-Speed 6316.60 samples/sec Loss 2.9141 LearningRate 0.0000 Epoch: 33 Global Step: 701230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:06,011-Speed 6302.20 samples/sec Loss 2.8680 LearningRate 0.0000 Epoch: 33 Global Step: 701240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:09,251-Speed 6322.54 samples/sec Loss 2.8918 LearningRate 0.0000 Epoch: 33 Global Step: 701250 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:12,502-Speed 6301.01 samples/sec Loss 2.9395 LearningRate 0.0000 Epoch: 33 Global Step: 701260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:15,755-Speed 6296.44 samples/sec Loss 2.9420 LearningRate 0.0000 Epoch: 33 Global Step: 701270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:19,003-Speed 6307.74 samples/sec Loss 2.9460 LearningRate 0.0000 Epoch: 33 Global Step: 701280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:22,252-Speed 6304.45 samples/sec Loss 2.9389 LearningRate 0.0000 Epoch: 33 Global Step: 701290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:31:25,504-Speed 6298.72 samples/sec Loss 2.9347 LearningRate 0.0000 Epoch: 33 Global Step: 701300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:28,759-Speed 6294.66 samples/sec Loss 2.9628 LearningRate 0.0000 Epoch: 33 Global Step: 701310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:32,010-Speed 6301.46 samples/sec Loss 2.9300 LearningRate 0.0000 Epoch: 33 Global Step: 701320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:35,253-Speed 6315.46 samples/sec Loss 2.9125 LearningRate 0.0000 Epoch: 33 Global Step: 701330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:38,519-Speed 6273.34 samples/sec Loss 2.8494 LearningRate 0.0000 Epoch: 33 Global Step: 701340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:31:41,779-Speed 6284.04 samples/sec Loss 2.8943 LearningRate 0.0000 Epoch: 33 Global Step: 701350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:45,028-Speed 6303.74 samples/sec Loss 2.9480 LearningRate 0.0000 Epoch: 33 Global Step: 701360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:48,286-Speed 6287.99 samples/sec Loss 2.9298 LearningRate 0.0000 Epoch: 33 Global Step: 701370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:51,537-Speed 6301.19 samples/sec Loss 2.9209 LearningRate 0.0000 Epoch: 33 Global Step: 701380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:54,808-Speed 6261.98 samples/sec Loss 2.8888 LearningRate 0.0000 Epoch: 33 Global Step: 701390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:31:58,058-Speed 6302.65 samples/sec Loss 2.8825 LearningRate 0.0000 Epoch: 33 Global Step: 701400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:01,297-Speed 6324.63 samples/sec Loss 2.9260 LearningRate 0.0000 Epoch: 33 Global Step: 701410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:04,581-Speed 6238.91 samples/sec Loss 
2.8695 LearningRate 0.0000 Epoch: 33 Global Step: 701420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:07,833-Speed 6299.05 samples/sec Loss 2.9634 LearningRate 0.0000 Epoch: 33 Global Step: 701430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:11,084-Speed 6300.22 samples/sec Loss 2.8991 LearningRate 0.0000 Epoch: 33 Global Step: 701440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:14,337-Speed 6298.27 samples/sec Loss 2.8935 LearningRate 0.0000 Epoch: 33 Global Step: 701450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:17,587-Speed 6300.95 samples/sec Loss 2.8763 LearningRate 0.0000 Epoch: 33 Global Step: 701460 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:20,841-Speed 6296.14 samples/sec Loss 2.9208 LearningRate 0.0000 Epoch: 33 Global Step: 701470 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:24,099-Speed 6288.07 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 33 Global Step: 701480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:27,347-Speed 6306.18 samples/sec Loss 2.9503 LearningRate 0.0000 Epoch: 33 Global Step: 701490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:30,594-Speed 6309.59 samples/sec Loss 2.8496 LearningRate 0.0000 Epoch: 33 Global Step: 701500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:33,849-Speed 6292.87 samples/sec Loss 2.9835 LearningRate 0.0000 Epoch: 33 Global Step: 701510 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:37,103-Speed 6295.29 samples/sec Loss 2.9398 LearningRate 0.0000 Epoch: 33 Global Step: 701520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:40,350-Speed 6308.77 samples/sec Loss 2.9293 LearningRate 0.0000 Epoch: 33 Global Step: 701530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:43,602-Speed 6299.78 samples/sec Loss 2.9487 LearningRate 0.0000 Epoch: 33 Global 
Step: 701540 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:46,849-Speed 6308.53 samples/sec Loss 2.9082 LearningRate 0.0000 Epoch: 33 Global Step: 701550 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:50,099-Speed 6301.91 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 33 Global Step: 701560 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:32:53,361-Speed 6281.33 samples/sec Loss 2.9709 LearningRate 0.0000 Epoch: 33 Global Step: 701570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:56,610-Speed 6304.29 samples/sec Loss 2.9136 LearningRate 0.0000 Epoch: 33 Global Step: 701580 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:32:59,857-Speed 6308.45 samples/sec Loss 2.9165 LearningRate 0.0000 Epoch: 33 Global Step: 701590 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:03,107-Speed 6302.74 samples/sec Loss 2.9372 LearningRate 0.0000 Epoch: 33 Global Step: 701600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:06,361-Speed 6295.62 samples/sec Loss 2.9176 LearningRate 0.0000 Epoch: 33 Global Step: 701610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:09,609-Speed 6306.48 samples/sec Loss 2.8880 LearningRate 0.0000 Epoch: 33 Global Step: 701620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:12,859-Speed 6302.80 samples/sec Loss 2.9369 LearningRate 0.0000 Epoch: 33 Global Step: 701630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:16,109-Speed 6303.69 samples/sec Loss 2.9120 LearningRate 0.0000 Epoch: 33 Global Step: 701640 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:19,367-Speed 6288.30 samples/sec Loss 2.9029 LearningRate 0.0000 Epoch: 33 Global Step: 701650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:22,627-Speed 6282.08 samples/sec Loss 2.9104 LearningRate 0.0000 Epoch: 33 Global Step: 701660 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:33:25,882-Speed 6294.14 samples/sec Loss 2.9702 LearningRate 0.0000 Epoch: 33 Global Step: 701670 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:33:29,135-Speed 6297.17 samples/sec Loss 2.9042 LearningRate 0.0000 Epoch: 33 Global Step: 701680 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:33:32,374-Speed 6323.24 samples/sec Loss 2.9630 LearningRate 0.0000 Epoch: 33 Global Step: 701690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:35,635-Speed 6281.51 samples/sec Loss 2.9166 LearningRate 0.0000 Epoch: 33 Global Step: 701700 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:38,898-Speed 6277.96 samples/sec Loss 2.9194 LearningRate 0.0000 Epoch: 33 Global Step: 701710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:42,152-Speed 6295.75 samples/sec Loss 2.9472 LearningRate 0.0000 Epoch: 33 Global Step: 701720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:45,400-Speed 6306.30 samples/sec Loss 2.8840 LearningRate 0.0000 Epoch: 33 Global Step: 701730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:48,650-Speed 6304.57 samples/sec Loss 2.9167 LearningRate 0.0000 Epoch: 33 Global Step: 701740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:51,907-Speed 6290.49 samples/sec Loss 2.9384 LearningRate 0.0000 Epoch: 33 Global Step: 701750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:55,164-Speed 6288.29 samples/sec Loss 2.8627 LearningRate 0.0000 Epoch: 33 Global Step: 701760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:33:58,416-Speed 6298.99 samples/sec Loss 2.9080 LearningRate 0.0000 Epoch: 33 Global Step: 701770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:01,675-Speed 6286.05 samples/sec Loss 2.9529 LearningRate 0.0000 Epoch: 33 Global Step: 701780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:34:04,920-Speed 6312.05 samples/sec Loss 2.9020 LearningRate 0.0000 Epoch: 33 Global Step: 701790 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:08,173-Speed 6297.96 samples/sec Loss 2.8933 LearningRate 0.0000 Epoch: 33 Global Step: 701800 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:11,420-Speed 6309.34 samples/sec Loss 2.9104 LearningRate 0.0000 Epoch: 33 Global Step: 701810 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:14,662-Speed 6318.48 samples/sec Loss 2.9437 LearningRate 0.0000 Epoch: 33 Global Step: 701820 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:17,912-Speed 6303.28 samples/sec Loss 2.9030 LearningRate 0.0000 Epoch: 33 Global Step: 701830 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:21,164-Speed 6298.65 samples/sec Loss 2.9054 LearningRate 0.0000 Epoch: 33 Global Step: 701840 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:24,427-Speed 6276.84 samples/sec Loss 2.9498 LearningRate 0.0000 Epoch: 33 Global Step: 701850 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:27,688-Speed 6283.02 samples/sec Loss 2.8657 LearningRate 0.0000 Epoch: 33 Global Step: 701860 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:30,944-Speed 6291.47 samples/sec Loss 2.9582 LearningRate 0.0000 Epoch: 33 Global Step: 701870 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:34,191-Speed 6306.81 samples/sec Loss 2.8879 LearningRate 0.0000 Epoch: 33 Global Step: 701880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:37,454-Speed 6279.18 samples/sec Loss 2.9269 LearningRate 0.0000 Epoch: 33 Global Step: 701890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:40,707-Speed 6295.58 samples/sec Loss 2.8982 LearningRate 0.0000 Epoch: 33 Global Step: 701900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:43,961-Speed 6296.82 samples/sec Loss 
2.9277 LearningRate 0.0000 Epoch: 33 Global Step: 701910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:34:47,210-Speed 6303.65 samples/sec Loss 2.9552 LearningRate 0.0000 Epoch: 33 Global Step: 701920 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:50,466-Speed 6291.42 samples/sec Loss 2.8522 LearningRate 0.0000 Epoch: 33 Global Step: 701930 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:53,723-Speed 6289.93 samples/sec Loss 2.8950 LearningRate 0.0000 Epoch: 33 Global Step: 701940 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:34:56,958-Speed 6332.70 samples/sec Loss 2.9091 LearningRate 0.0000 Epoch: 33 Global Step: 701950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:00,200-Speed 6318.07 samples/sec Loss 2.9277 LearningRate 0.0000 Epoch: 33 Global Step: 701960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:03,453-Speed 6297.74 samples/sec Loss 2.9323 LearningRate 0.0000 Epoch: 33 Global Step: 701970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:06,699-Speed 6310.67 samples/sec Loss 2.9295 LearningRate 0.0000 Epoch: 33 Global Step: 701980 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:09,965-Speed 6272.22 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 33 Global Step: 701990 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:13,220-Speed 6293.11 samples/sec Loss 2.8619 LearningRate 0.0000 Epoch: 33 Global Step: 702000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:16,473-Speed 6296.36 samples/sec Loss 2.9295 LearningRate 0.0000 Epoch: 33 Global Step: 702010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:19,715-Speed 6318.94 samples/sec Loss 2.9236 LearningRate 0.0000 Epoch: 33 Global Step: 702020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:22,966-Speed 6301.62 samples/sec Loss 2.8908 LearningRate 0.0000 Epoch: 33 Global 
Step: 702030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:26,219-Speed 6297.63 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 33 Global Step: 702040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:29,473-Speed 6293.98 samples/sec Loss 2.9220 LearningRate 0.0000 Epoch: 33 Global Step: 702050 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:35:32,731-Speed 6288.50 samples/sec Loss 2.9616 LearningRate 0.0000 Epoch: 33 Global Step: 702060 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:35:35,980-Speed 6304.08 samples/sec Loss 2.8358 LearningRate 0.0000 Epoch: 33 Global Step: 702070 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:35:39,239-Speed 6285.61 samples/sec Loss 2.8756 LearningRate 0.0000 Epoch: 33 Global Step: 702080 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:35:42,488-Speed 6305.74 samples/sec Loss 2.9366 LearningRate 0.0000 Epoch: 33 Global Step: 702090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:45,740-Speed 6298.98 samples/sec Loss 2.8489 LearningRate 0.0000 Epoch: 33 Global Step: 702100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:48,994-Speed 6295.17 samples/sec Loss 2.8913 LearningRate 0.0000 Epoch: 33 Global Step: 702110 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:52,247-Speed 6296.64 samples/sec Loss 2.8715 LearningRate 0.0000 Epoch: 33 Global Step: 702120 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:55,503-Speed 6290.57 samples/sec Loss 2.8979 LearningRate 0.0000 Epoch: 33 Global Step: 702130 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:35:58,757-Speed 6296.96 samples/sec Loss 2.9525 LearningRate 0.0000 Epoch: 33 Global Step: 702140 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:02,005-Speed 6306.18 samples/sec Loss 2.9154 LearningRate 0.0000 Epoch: 33 Global Step: 702150 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:36:05,255-Speed 6303.65 samples/sec Loss 2.8884 LearningRate 0.0000 Epoch: 33 Global Step: 702160 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:08,500-Speed 6312.56 samples/sec Loss 2.9287 LearningRate 0.0000 Epoch: 33 Global Step: 702170 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:11,752-Speed 6298.53 samples/sec Loss 2.9298 LearningRate 0.0000 Epoch: 33 Global Step: 702180 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:15,008-Speed 6291.43 samples/sec Loss 2.9252 LearningRate 0.0000 Epoch: 33 Global Step: 702190 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:18,262-Speed 6295.78 samples/sec Loss 2.8801 LearningRate 0.0000 Epoch: 33 Global Step: 702200 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:21,518-Speed 6290.68 samples/sec Loss 2.8765 LearningRate 0.0000 Epoch: 33 Global Step: 702210 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:24,775-Speed 6290.41 samples/sec Loss 2.9235 LearningRate 0.0000 Epoch: 33 Global Step: 702220 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:28,031-Speed 6290.11 samples/sec Loss 3.0027 LearningRate 0.0000 Epoch: 33 Global Step: 702230 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:31,277-Speed 6310.26 samples/sec Loss 2.9037 LearningRate 0.0000 Epoch: 33 Global Step: 702240 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:34,523-Speed 6312.16 samples/sec Loss 2.9130 LearningRate 0.0000 Epoch: 33 Global Step: 702250 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:36:37,762-Speed 6323.37 samples/sec Loss 2.8922 LearningRate 0.0000 Epoch: 33 Global Step: 702260 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:41,021-Speed 6284.66 samples/sec Loss 2.8602 LearningRate 0.0000 Epoch: 33 Global Step: 702270 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:36:44,276-Speed 6294.50 samples/sec Loss 2.8817 LearningRate 0.0000 Epoch: 33 Global Step: 702280 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:47,523-Speed 6307.62 samples/sec Loss 2.8917 LearningRate 0.0000 Epoch: 33 Global Step: 702290 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:50,775-Speed 6300.18 samples/sec Loss 2.9372 LearningRate 0.0000 Epoch: 33 Global Step: 702300 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:54,028-Speed 6296.17 samples/sec Loss 2.9182 LearningRate 0.0000 Epoch: 33 Global Step: 702310 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:36:57,276-Speed 6307.09 samples/sec Loss 2.8999 LearningRate 0.0000 Epoch: 33 Global Step: 702320 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:00,544-Speed 6269.22 samples/sec Loss 2.9007 LearningRate 0.0000 Epoch: 33 Global Step: 702330 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:03,797-Speed 6296.91 samples/sec Loss 2.9341 LearningRate 0.0000 Epoch: 33 Global Step: 702340 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:07,052-Speed 6293.23 samples/sec Loss 2.9584 LearningRate 0.0000 Epoch: 33 Global Step: 702350 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:10,288-Speed 6329.91 samples/sec Loss 2.8916 LearningRate 0.0000 Epoch: 33 Global Step: 702360 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:13,544-Speed 6293.04 samples/sec Loss 2.9370 LearningRate 0.0000 Epoch: 33 Global Step: 702370 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:16,800-Speed 6290.64 samples/sec Loss 2.9506 LearningRate 0.0000 Epoch: 33 Global Step: 702380 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:20,051-Speed 6300.21 samples/sec Loss 2.9177 LearningRate 0.0000 Epoch: 33 Global Step: 702390 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:23,302-Speed 6301.84 samples/sec Loss 
2.9275 LearningRate 0.0000 Epoch: 33 Global Step: 702400 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:26,551-Speed 6304.53 samples/sec Loss 2.9169 LearningRate 0.0000 Epoch: 33 Global Step: 702410 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:29,805-Speed 6295.16 samples/sec Loss 2.8836 LearningRate 0.0000 Epoch: 33 Global Step: 702420 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:33,058-Speed 6297.42 samples/sec Loss 2.9144 LearningRate 0.0000 Epoch: 33 Global Step: 702430 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:36,311-Speed 6296.64 samples/sec Loss 2.9291 LearningRate 0.0000 Epoch: 33 Global Step: 702440 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:39,571-Speed 6283.49 samples/sec Loss 2.9145 LearningRate 0.0000 Epoch: 33 Global Step: 702450 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:42,827-Speed 6292.87 samples/sec Loss 2.8983 LearningRate 0.0000 Epoch: 33 Global Step: 702460 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:37:46,085-Speed 6285.96 samples/sec Loss 2.9293 LearningRate 0.0000 Epoch: 33 Global Step: 702470 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:37:49,320-Speed 6332.19 samples/sec Loss 2.8446 LearningRate 0.0000 Epoch: 33 Global Step: 702480 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:52,570-Speed 6303.63 samples/sec Loss 2.9350 LearningRate 0.0000 Epoch: 33 Global Step: 702490 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:55,823-Speed 6296.06 samples/sec Loss 2.8700 LearningRate 0.0000 Epoch: 33 Global Step: 702500 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:37:59,076-Speed 6298.43 samples/sec Loss 2.9516 LearningRate 0.0000 Epoch: 33 Global Step: 702510 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:02,330-Speed 6295.20 samples/sec Loss 2.9539 LearningRate 0.0000 Epoch: 33 Global 
Step: 702520 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:05,590-Speed 6281.95 samples/sec Loss 2.8838 LearningRate 0.0000 Epoch: 33 Global Step: 702530 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:08,848-Speed 6287.62 samples/sec Loss 2.8770 LearningRate 0.0000 Epoch: 33 Global Step: 702540 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:12,103-Speed 6293.34 samples/sec Loss 2.8804 LearningRate 0.0000 Epoch: 33 Global Step: 702550 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:15,356-Speed 6298.41 samples/sec Loss 2.9363 LearningRate 0.0000 Epoch: 33 Global Step: 702560 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:18,611-Speed 6293.19 samples/sec Loss 2.9246 LearningRate 0.0000 Epoch: 33 Global Step: 702570 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:21,855-Speed 6314.34 samples/sec Loss 2.9295 LearningRate 0.0000 Epoch: 33 Global Step: 702580 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:38:25,101-Speed 6311.09 samples/sec Loss 2.8799 LearningRate 0.0000 Epoch: 33 Global Step: 702590 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:38:28,340-Speed 6324.35 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 33 Global Step: 702600 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:31,585-Speed 6312.38 samples/sec Loss 2.8857 LearningRate 0.0000 Epoch: 33 Global Step: 702610 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:34,835-Speed 6303.06 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 33 Global Step: 702620 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:38,081-Speed 6311.44 samples/sec Loss 2.8919 LearningRate 0.0000 Epoch: 33 Global Step: 702630 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:41,333-Speed 6297.56 samples/sec Loss 2.9084 LearningRate 0.0000 Epoch: 33 Global Step: 702640 Fp16 Grad Scale: 4096 
Required: 12 hours Training: 2022-04-03 07:38:44,579-Speed 6312.43 samples/sec Loss 2.9264 LearningRate 0.0000 Epoch: 33 Global Step: 702650 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:47,831-Speed 6298.58 samples/sec Loss 2.9789 LearningRate 0.0000 Epoch: 33 Global Step: 702660 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:51,084-Speed 6297.35 samples/sec Loss 2.9301 LearningRate 0.0000 Epoch: 33 Global Step: 702670 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:54,326-Speed 6317.55 samples/sec Loss 2.9256 LearningRate 0.0000 Epoch: 33 Global Step: 702680 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:38:57,574-Speed 6307.31 samples/sec Loss 2.8892 LearningRate 0.0000 Epoch: 33 Global Step: 702690 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:00,833-Speed 6286.19 samples/sec Loss 2.8747 LearningRate 0.0000 Epoch: 33 Global Step: 702700 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:04,075-Speed 6318.81 samples/sec Loss 2.9086 LearningRate 0.0000 Epoch: 33 Global Step: 702710 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:07,323-Speed 6305.56 samples/sec Loss 2.9125 LearningRate 0.0000 Epoch: 33 Global Step: 702720 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:10,574-Speed 6300.64 samples/sec Loss 2.9202 LearningRate 0.0000 Epoch: 33 Global Step: 702730 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:13,832-Speed 6287.87 samples/sec Loss 2.8630 LearningRate 0.0000 Epoch: 33 Global Step: 702740 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:17,089-Speed 6290.06 samples/sec Loss 2.9493 LearningRate 0.0000 Epoch: 33 Global Step: 702750 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:20,337-Speed 6307.23 samples/sec Loss 2.8873 LearningRate 0.0000 Epoch: 33 Global Step: 702760 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 
07:39:23,585-Speed 6306.74 samples/sec Loss 2.9204 LearningRate 0.0000 Epoch: 33 Global Step: 702770 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:26,840-Speed 6293.04 samples/sec Loss 2.9508 LearningRate 0.0000 Epoch: 33 Global Step: 702780 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:30,084-Speed 6314.38 samples/sec Loss 2.9269 LearningRate 0.0000 Epoch: 33 Global Step: 702790 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:33,332-Speed 6307.85 samples/sec Loss 2.9221 LearningRate 0.0000 Epoch: 33 Global Step: 702800 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:39:36,582-Speed 6302.66 samples/sec Loss 2.9066 LearningRate 0.0000 Epoch: 33 Global Step: 702810 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:39,832-Speed 6302.05 samples/sec Loss 2.9170 LearningRate 0.0000 Epoch: 33 Global Step: 702820 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:43,084-Speed 6300.63 samples/sec Loss 2.9077 LearningRate 0.0000 Epoch: 33 Global Step: 702830 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:46,327-Speed 6316.04 samples/sec Loss 2.9303 LearningRate 0.0000 Epoch: 33 Global Step: 702840 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:49,585-Speed 6288.49 samples/sec Loss 2.8760 LearningRate 0.0000 Epoch: 33 Global Step: 702850 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:52,854-Speed 6264.51 samples/sec Loss 2.9154 LearningRate 0.0000 Epoch: 33 Global Step: 702860 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:56,112-Speed 6287.34 samples/sec Loss 2.9934 LearningRate 0.0000 Epoch: 33 Global Step: 702870 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:39:59,349-Speed 6328.33 samples/sec Loss 2.8989 LearningRate 0.0000 Epoch: 33 Global Step: 702880 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:02,596-Speed 6310.03 samples/sec Loss 
2.9084 LearningRate 0.0000 Epoch: 33 Global Step: 702890 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:05,848-Speed 6298.94 samples/sec Loss 2.8788 LearningRate 0.0000 Epoch: 33 Global Step: 702900 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:09,107-Speed 6284.89 samples/sec Loss 2.9343 LearningRate 0.0000 Epoch: 33 Global Step: 702910 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:12,354-Speed 6309.01 samples/sec Loss 2.9181 LearningRate 0.0000 Epoch: 33 Global Step: 702920 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:15,610-Speed 6290.66 samples/sec Loss 2.9384 LearningRate 0.0000 Epoch: 33 Global Step: 702930 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:18,860-Speed 6304.16 samples/sec Loss 2.9739 LearningRate 0.0000 Epoch: 33 Global Step: 702940 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:22,110-Speed 6301.44 samples/sec Loss 2.8743 LearningRate 0.0000 Epoch: 33 Global Step: 702950 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:25,361-Speed 6301.65 samples/sec Loss 2.8563 LearningRate 0.0000 Epoch: 33 Global Step: 702960 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:28,617-Speed 6291.09 samples/sec Loss 2.9479 LearningRate 0.0000 Epoch: 33 Global Step: 702970 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:31,877-Speed 6284.96 samples/sec Loss 2.9533 LearningRate 0.0000 Epoch: 33 Global Step: 702980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:40:35,134-Speed 6289.89 samples/sec Loss 2.8599 LearningRate 0.0000 Epoch: 33 Global Step: 702990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:40:38,376-Speed 6318.18 samples/sec Loss 2.9014 LearningRate 0.0000 Epoch: 33 Global Step: 703000 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:41,629-Speed 6296.31 samples/sec Loss 2.8780 LearningRate 0.0000 Epoch: 33 Global 
Step: 703010 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:44,888-Speed 6285.44 samples/sec Loss 2.8991 LearningRate 0.0000 Epoch: 33 Global Step: 703020 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:48,146-Speed 6287.78 samples/sec Loss 2.9053 LearningRate 0.0000 Epoch: 33 Global Step: 703030 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:51,403-Speed 6290.05 samples/sec Loss 2.9182 LearningRate 0.0000 Epoch: 33 Global Step: 703040 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:54,651-Speed 6305.45 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 33 Global Step: 703050 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:40:57,899-Speed 6307.72 samples/sec Loss 2.8884 LearningRate 0.0000 Epoch: 33 Global Step: 703060 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:41:01,150-Speed 6300.64 samples/sec Loss 2.8972 LearningRate 0.0000 Epoch: 33 Global Step: 703070 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:41:04,406-Speed 6290.64 samples/sec Loss 2.8523 LearningRate 0.0000 Epoch: 33 Global Step: 703080 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:41:07,660-Speed 6295.81 samples/sec Loss 2.9523 LearningRate 0.0000 Epoch: 33 Global Step: 703090 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-03 07:41:10,911-Speed 6300.66 samples/sec Loss 2.9438 LearningRate 0.0000 Epoch: 33 Global Step: 703100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-03 07:41:14,151-Speed 6322.87 samples/sec Loss 2.9451 LearningRate 0.0000 Epoch: 33 Global Step: 703110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:17,405-Speed 6296.28 samples/sec Loss 2.9255 LearningRate 0.0000 Epoch: 33 Global Step: 703120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:20,656-Speed 6300.24 samples/sec Loss 2.8793 LearningRate 0.0000 Epoch: 33 Global Step: 703130 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:41:23,906-Speed 6302.08 samples/sec Loss 2.9088 LearningRate 0.0000 Epoch: 33 Global Step: 703140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:27,158-Speed 6300.59 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 33 Global Step: 703150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:30,407-Speed 6303.20 samples/sec Loss 2.9793 LearningRate 0.0000 Epoch: 33 Global Step: 703160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:33,662-Speed 6294.07 samples/sec Loss 2.8947 LearningRate 0.0000 Epoch: 33 Global Step: 703170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:36,915-Speed 6297.28 samples/sec Loss 2.9127 LearningRate 0.0000 Epoch: 33 Global Step: 703180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:40,165-Speed 6302.87 samples/sec Loss 2.8993 LearningRate 0.0000 Epoch: 33 Global Step: 703190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:43,416-Speed 6302.51 samples/sec Loss 2.9035 LearningRate 0.0000 Epoch: 33 Global Step: 703200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:46,667-Speed 6299.21 samples/sec Loss 2.9159 LearningRate 0.0000 Epoch: 33 Global Step: 703210 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:41:49,955-Speed 6231.45 samples/sec Loss 2.9384 LearningRate 0.0000 Epoch: 33 Global Step: 703220 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:41:53,193-Speed 6326.58 samples/sec Loss 2.8741 LearningRate 0.0000 Epoch: 33 Global Step: 703230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:56,447-Speed 6294.35 samples/sec Loss 2.8752 LearningRate 0.0000 Epoch: 33 Global Step: 703240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:41:59,696-Speed 6306.02 samples/sec Loss 2.8863 LearningRate 0.0000 Epoch: 33 Global Step: 703250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:42:02,953-Speed 6288.65 samples/sec Loss 2.9586 LearningRate 0.0000 Epoch: 33 Global Step: 703260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:06,212-Speed 6284.62 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 33 Global Step: 703270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:09,459-Speed 6309.81 samples/sec Loss 2.8682 LearningRate 0.0000 Epoch: 33 Global Step: 703280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:12,721-Speed 6279.35 samples/sec Loss 2.9083 LearningRate 0.0000 Epoch: 33 Global Step: 703290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:15,974-Speed 6297.19 samples/sec Loss 2.8846 LearningRate 0.0000 Epoch: 33 Global Step: 703300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:19,218-Speed 6315.62 samples/sec Loss 2.8896 LearningRate 0.0000 Epoch: 33 Global Step: 703310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:22,472-Speed 6294.54 samples/sec Loss 2.8967 LearningRate 0.0000 Epoch: 33 Global Step: 703320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:25,718-Speed 6309.65 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 33 Global Step: 703330 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:28,968-Speed 6304.06 samples/sec Loss 2.8894 LearningRate 0.0000 Epoch: 33 Global Step: 703340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:32,226-Speed 6287.02 samples/sec Loss 2.9379 LearningRate 0.0000 Epoch: 33 Global Step: 703350 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:35,478-Speed 6298.56 samples/sec Loss 2.9147 LearningRate 0.0000 Epoch: 33 Global Step: 703360 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:38,736-Speed 6287.10 samples/sec Loss 2.9200 LearningRate 0.0000 Epoch: 33 Global Step: 703370 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:41,994-Speed 6289.01 samples/sec Loss 
2.9405 LearningRate 0.0000 Epoch: 33 Global Step: 703380 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:42:45,244-Speed 6302.43 samples/sec Loss 2.9274 LearningRate 0.0000 Epoch: 33 Global Step: 703390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:48,523-Speed 6247.98 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 33 Global Step: 703400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:51,782-Speed 6284.80 samples/sec Loss 2.8874 LearningRate 0.0000 Epoch: 33 Global Step: 703410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:55,038-Speed 6291.25 samples/sec Loss 2.9370 LearningRate 0.0000 Epoch: 33 Global Step: 703420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:42:58,306-Speed 6268.56 samples/sec Loss 2.8609 LearningRate 0.0000 Epoch: 33 Global Step: 703430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:01,686-Speed 6061.10 samples/sec Loss 2.9490 LearningRate 0.0000 Epoch: 33 Global Step: 703440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:04,938-Speed 6298.84 samples/sec Loss 2.9281 LearningRate 0.0000 Epoch: 33 Global Step: 703450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:08,192-Speed 6296.32 samples/sec Loss 2.9026 LearningRate 0.0000 Epoch: 33 Global Step: 703460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:11,534-Speed 6128.63 samples/sec Loss 2.9106 LearningRate 0.0000 Epoch: 33 Global Step: 703470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:14,794-Speed 6283.35 samples/sec Loss 2.8782 LearningRate 0.0000 Epoch: 33 Global Step: 703480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:18,037-Speed 6317.00 samples/sec Loss 2.9602 LearningRate 0.0000 Epoch: 33 Global Step: 703490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:21,292-Speed 6292.82 samples/sec Loss 2.9459 LearningRate 0.0000 Epoch: 33 Global 
Step: 703500 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:24,540-Speed 6306.08 samples/sec Loss 2.9077 LearningRate 0.0000 Epoch: 33 Global Step: 703510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:27,787-Speed 6309.83 samples/sec Loss 2.8853 LearningRate 0.0000 Epoch: 33 Global Step: 703520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:31,038-Speed 6300.20 samples/sec Loss 2.9065 LearningRate 0.0000 Epoch: 33 Global Step: 703530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:34,296-Speed 6288.26 samples/sec Loss 2.9312 LearningRate 0.0000 Epoch: 33 Global Step: 703540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:37,543-Speed 6308.61 samples/sec Loss 2.9300 LearningRate 0.0000 Epoch: 33 Global Step: 703550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:40,795-Speed 6298.28 samples/sec Loss 2.9858 LearningRate 0.0000 Epoch: 33 Global Step: 703560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:44,050-Speed 6294.76 samples/sec Loss 2.9103 LearningRate 0.0000 Epoch: 33 Global Step: 703570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:47,304-Speed 6295.04 samples/sec Loss 2.8819 LearningRate 0.0000 Epoch: 33 Global Step: 703580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:50,551-Speed 6307.60 samples/sec Loss 2.9526 LearningRate 0.0000 Epoch: 33 Global Step: 703590 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:43:53,780-Speed 6344.66 samples/sec Loss 2.8831 LearningRate 0.0000 Epoch: 33 Global Step: 703600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:43:57,028-Speed 6307.43 samples/sec Loss 2.8853 LearningRate 0.0000 Epoch: 33 Global Step: 703610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:00,281-Speed 6296.98 samples/sec Loss 2.8881 LearningRate 0.0000 Epoch: 33 Global Step: 703620 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:44:03,532-Speed 6301.08 samples/sec Loss 2.9277 LearningRate 0.0000 Epoch: 33 Global Step: 703630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:06,785-Speed 6298.06 samples/sec Loss 2.9526 LearningRate 0.0000 Epoch: 33 Global Step: 703640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:10,034-Speed 6304.98 samples/sec Loss 2.9080 LearningRate 0.0000 Epoch: 33 Global Step: 703650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:13,289-Speed 6292.34 samples/sec Loss 2.9478 LearningRate 0.0000 Epoch: 33 Global Step: 703660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:16,538-Speed 6303.85 samples/sec Loss 2.8788 LearningRate 0.0000 Epoch: 33 Global Step: 703670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:19,784-Speed 6311.94 samples/sec Loss 2.8994 LearningRate 0.0000 Epoch: 33 Global Step: 703680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:23,030-Speed 6309.79 samples/sec Loss 2.8637 LearningRate 0.0000 Epoch: 33 Global Step: 703690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:26,283-Speed 6298.78 samples/sec Loss 2.9466 LearningRate 0.0000 Epoch: 33 Global Step: 703700 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:44:29,536-Speed 6295.60 samples/sec Loss 2.9651 LearningRate 0.0000 Epoch: 33 Global Step: 703710 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:44:32,783-Speed 6308.40 samples/sec Loss 2.9455 LearningRate 0.0000 Epoch: 33 Global Step: 703720 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:44:36,023-Speed 6323.61 samples/sec Loss 2.9750 LearningRate 0.0000 Epoch: 33 Global Step: 703730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:39,279-Speed 6291.35 samples/sec Loss 2.9217 LearningRate 0.0000 Epoch: 33 Global Step: 703740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:44:42,540-Speed 6280.79 samples/sec Loss 2.9062 LearningRate 0.0000 Epoch: 33 Global Step: 703750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:45,802-Speed 6279.58 samples/sec Loss 2.9298 LearningRate 0.0000 Epoch: 33 Global Step: 703760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:49,055-Speed 6297.23 samples/sec Loss 2.9087 LearningRate 0.0000 Epoch: 33 Global Step: 703770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:52,317-Speed 6280.73 samples/sec Loss 2.8879 LearningRate 0.0000 Epoch: 33 Global Step: 703780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:55,571-Speed 6294.71 samples/sec Loss 2.9235 LearningRate 0.0000 Epoch: 33 Global Step: 703790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:44:58,836-Speed 6274.02 samples/sec Loss 2.8373 LearningRate 0.0000 Epoch: 33 Global Step: 703800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:02,091-Speed 6293.75 samples/sec Loss 2.9217 LearningRate 0.0000 Epoch: 33 Global Step: 703810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:05,337-Speed 6310.56 samples/sec Loss 2.9238 LearningRate 0.0000 Epoch: 33 Global Step: 703820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:08,592-Speed 6293.41 samples/sec Loss 2.8866 LearningRate 0.0000 Epoch: 33 Global Step: 703830 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:45:11,832-Speed 6323.14 samples/sec Loss 2.8884 LearningRate 0.0000 Epoch: 33 Global Step: 703840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:15,090-Speed 6287.40 samples/sec Loss 2.9734 LearningRate 0.0000 Epoch: 33 Global Step: 703850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:18,344-Speed 6294.19 samples/sec Loss 2.9377 LearningRate 0.0000 Epoch: 33 Global Step: 703860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:21,594-Speed 6304.06 samples/sec Loss 
2.9239 LearningRate 0.0000 Epoch: 33 Global Step: 703870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:24,842-Speed 6307.15 samples/sec Loss 2.8473 LearningRate 0.0000 Epoch: 33 Global Step: 703880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:28,087-Speed 6312.38 samples/sec Loss 2.8698 LearningRate 0.0000 Epoch: 33 Global Step: 703890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:31,337-Speed 6302.31 samples/sec Loss 2.9168 LearningRate 0.0000 Epoch: 33 Global Step: 703900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:34,587-Speed 6303.17 samples/sec Loss 2.8792 LearningRate 0.0000 Epoch: 33 Global Step: 703910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:37,836-Speed 6305.62 samples/sec Loss 2.9324 LearningRate 0.0000 Epoch: 33 Global Step: 703920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:41,096-Speed 6281.66 samples/sec Loss 2.9162 LearningRate 0.0000 Epoch: 33 Global Step: 703930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:44,347-Speed 6301.18 samples/sec Loss 2.8985 LearningRate 0.0000 Epoch: 33 Global Step: 703940 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:45:47,594-Speed 6309.03 samples/sec Loss 2.9272 LearningRate 0.0000 Epoch: 33 Global Step: 703950 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:45:50,849-Speed 6292.86 samples/sec Loss 2.8557 LearningRate 0.0000 Epoch: 33 Global Step: 703960 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:45:54,092-Speed 6317.82 samples/sec Loss 2.8863 LearningRate 0.0000 Epoch: 33 Global Step: 703970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:45:57,338-Speed 6310.27 samples/sec Loss 2.9043 LearningRate 0.0000 Epoch: 33 Global Step: 703980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:00,581-Speed 6315.93 samples/sec Loss 2.8842 LearningRate 0.0000 Epoch: 33 Global 
Step: 703990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:03,832-Speed 6301.19 samples/sec Loss 2.8195 LearningRate 0.0000 Epoch: 33 Global Step: 704000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:07,085-Speed 6296.95 samples/sec Loss 2.8785 LearningRate 0.0000 Epoch: 33 Global Step: 704010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:10,341-Speed 6290.85 samples/sec Loss 2.9031 LearningRate 0.0000 Epoch: 33 Global Step: 704020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:13,592-Speed 6300.93 samples/sec Loss 2.9185 LearningRate 0.0000 Epoch: 33 Global Step: 704030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:16,850-Speed 6287.30 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 33 Global Step: 704040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:20,106-Speed 6292.84 samples/sec Loss 2.8725 LearningRate 0.0000 Epoch: 33 Global Step: 704050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:23,362-Speed 6291.91 samples/sec Loss 2.8628 LearningRate 0.0000 Epoch: 33 Global Step: 704060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:26,618-Speed 6290.59 samples/sec Loss 2.9683 LearningRate 0.0000 Epoch: 33 Global Step: 704070 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:46:29,858-Speed 6322.15 samples/sec Loss 2.9367 LearningRate 0.0000 Epoch: 33 Global Step: 704080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:33,103-Speed 6312.79 samples/sec Loss 2.9795 LearningRate 0.0000 Epoch: 33 Global Step: 704090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:36,354-Speed 6301.13 samples/sec Loss 2.9299 LearningRate 0.0000 Epoch: 33 Global Step: 704100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:39,597-Speed 6316.00 samples/sec Loss 2.8872 LearningRate 0.0000 Epoch: 33 Global Step: 704110 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:46:42,849-Speed 6300.59 samples/sec Loss 2.8927 LearningRate 0.0000 Epoch: 33 Global Step: 704120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:46,091-Speed 6316.68 samples/sec Loss 2.9262 LearningRate 0.0000 Epoch: 33 Global Step: 704130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:49,351-Speed 6284.55 samples/sec Loss 2.9212 LearningRate 0.0000 Epoch: 33 Global Step: 704140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:52,607-Speed 6292.02 samples/sec Loss 2.8729 LearningRate 0.0000 Epoch: 33 Global Step: 704150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:55,854-Speed 6307.86 samples/sec Loss 2.9318 LearningRate 0.0000 Epoch: 33 Global Step: 704160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:46:59,097-Speed 6315.93 samples/sec Loss 2.9144 LearningRate 0.0000 Epoch: 33 Global Step: 704170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:02,345-Speed 6308.11 samples/sec Loss 2.9313 LearningRate 0.0000 Epoch: 33 Global Step: 704180 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:47:05,601-Speed 6291.20 samples/sec Loss 2.9272 LearningRate 0.0000 Epoch: 33 Global Step: 704190 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:47:08,835-Speed 6332.96 samples/sec Loss 2.9342 LearningRate 0.0000 Epoch: 33 Global Step: 704200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:12,087-Speed 6299.24 samples/sec Loss 2.9047 LearningRate 0.0000 Epoch: 33 Global Step: 704210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:15,336-Speed 6305.75 samples/sec Loss 2.9009 LearningRate 0.0000 Epoch: 33 Global Step: 704220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:18,587-Speed 6300.58 samples/sec Loss 2.9212 LearningRate 0.0000 Epoch: 33 Global Step: 704230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:47:21,832-Speed 6312.19 samples/sec Loss 2.9326 LearningRate 0.0000 Epoch: 33 Global Step: 704240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:25,080-Speed 6307.88 samples/sec Loss 2.9240 LearningRate 0.0000 Epoch: 33 Global Step: 704250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:28,341-Speed 6282.58 samples/sec Loss 2.8589 LearningRate 0.0000 Epoch: 33 Global Step: 704260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:31,584-Speed 6314.82 samples/sec Loss 2.8952 LearningRate 0.0000 Epoch: 33 Global Step: 704270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:34,847-Speed 6278.42 samples/sec Loss 2.9300 LearningRate 0.0000 Epoch: 33 Global Step: 704280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:38,097-Speed 6303.50 samples/sec Loss 2.9171 LearningRate 0.0000 Epoch: 33 Global Step: 704290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:41,369-Speed 6261.22 samples/sec Loss 2.9702 LearningRate 0.0000 Epoch: 33 Global Step: 704300 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:47:44,620-Speed 6300.39 samples/sec Loss 2.9063 LearningRate 0.0000 Epoch: 33 Global Step: 704310 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:47:47,851-Speed 6339.70 samples/sec Loss 2.9123 LearningRate 0.0000 Epoch: 33 Global Step: 704320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:51,098-Speed 6308.47 samples/sec Loss 2.8692 LearningRate 0.0000 Epoch: 33 Global Step: 704330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:54,348-Speed 6303.80 samples/sec Loss 2.8790 LearningRate 0.0000 Epoch: 33 Global Step: 704340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:47:57,609-Speed 6282.21 samples/sec Loss 2.9453 LearningRate 0.0000 Epoch: 33 Global Step: 704350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:00,871-Speed 6279.26 samples/sec Loss 
2.9528 LearningRate 0.0000 Epoch: 33 Global Step: 704360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:04,133-Speed 6279.20 samples/sec Loss 2.9287 LearningRate 0.0000 Epoch: 33 Global Step: 704370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:07,393-Speed 6284.43 samples/sec Loss 2.8997 LearningRate 0.0000 Epoch: 33 Global Step: 704380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:10,647-Speed 6294.78 samples/sec Loss 2.9120 LearningRate 0.0000 Epoch: 33 Global Step: 704390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:13,900-Speed 6297.00 samples/sec Loss 2.9214 LearningRate 0.0000 Epoch: 33 Global Step: 704400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:17,153-Speed 6295.77 samples/sec Loss 2.9258 LearningRate 0.0000 Epoch: 33 Global Step: 704410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:20,401-Speed 6306.45 samples/sec Loss 2.8319 LearningRate 0.0000 Epoch: 33 Global Step: 704420 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:48:23,638-Speed 6329.20 samples/sec Loss 2.9079 LearningRate 0.0000 Epoch: 33 Global Step: 704430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:26,895-Speed 6288.60 samples/sec Loss 2.8796 LearningRate 0.0000 Epoch: 33 Global Step: 704440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:30,144-Speed 6305.12 samples/sec Loss 2.8640 LearningRate 0.0000 Epoch: 33 Global Step: 704450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:33,398-Speed 6296.37 samples/sec Loss 2.8912 LearningRate 0.0000 Epoch: 33 Global Step: 704460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:36,649-Speed 6300.09 samples/sec Loss 2.9171 LearningRate 0.0000 Epoch: 33 Global Step: 704470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:39,901-Speed 6300.41 samples/sec Loss 2.9395 LearningRate 0.0000 Epoch: 33 Global 
Step: 704480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:43,151-Speed 6301.85 samples/sec Loss 2.9090 LearningRate 0.0000 Epoch: 33 Global Step: 704490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:46,407-Speed 6290.98 samples/sec Loss 2.9043 LearningRate 0.0000 Epoch: 33 Global Step: 704500 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:49,673-Speed 6272.32 samples/sec Loss 2.8828 LearningRate 0.0000 Epoch: 33 Global Step: 704510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:52,929-Speed 6291.83 samples/sec Loss 2.9232 LearningRate 0.0000 Epoch: 33 Global Step: 704520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:48:56,178-Speed 6304.03 samples/sec Loss 2.8807 LearningRate 0.0000 Epoch: 33 Global Step: 704530 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:48:59,417-Speed 6325.61 samples/sec Loss 2.9162 LearningRate 0.0000 Epoch: 33 Global Step: 704540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:02,668-Speed 6299.87 samples/sec Loss 2.8690 LearningRate 0.0000 Epoch: 33 Global Step: 704550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:05,919-Speed 6301.73 samples/sec Loss 2.8839 LearningRate 0.0000 Epoch: 33 Global Step: 704560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:09,167-Speed 6306.05 samples/sec Loss 2.8816 LearningRate 0.0000 Epoch: 33 Global Step: 704570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:12,419-Speed 6299.41 samples/sec Loss 2.9421 LearningRate 0.0000 Epoch: 33 Global Step: 704580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:15,674-Speed 6294.39 samples/sec Loss 2.8847 LearningRate 0.0000 Epoch: 33 Global Step: 704590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:18,926-Speed 6298.64 samples/sec Loss 2.9519 LearningRate 0.0000 Epoch: 33 Global Step: 704600 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:49:22,178-Speed 6298.52 samples/sec Loss 2.8602 LearningRate 0.0000 Epoch: 33 Global Step: 704610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:25,428-Speed 6303.36 samples/sec Loss 2.8683 LearningRate 0.0000 Epoch: 33 Global Step: 704620 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:28,688-Speed 6282.76 samples/sec Loss 2.9117 LearningRate 0.0000 Epoch: 33 Global Step: 704630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:31,939-Speed 6302.28 samples/sec Loss 2.9140 LearningRate 0.0000 Epoch: 33 Global Step: 704640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:49:35,200-Speed 6281.13 samples/sec Loss 2.9096 LearningRate 0.0000 Epoch: 33 Global Step: 704650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:49:38,448-Speed 6306.34 samples/sec Loss 2.9149 LearningRate 0.0000 Epoch: 33 Global Step: 704660 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:49:41,686-Speed 6327.32 samples/sec Loss 2.8893 LearningRate 0.0000 Epoch: 33 Global Step: 704670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:44,937-Speed 6301.07 samples/sec Loss 2.9194 LearningRate 0.0000 Epoch: 33 Global Step: 704680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:48,194-Speed 6288.59 samples/sec Loss 2.9092 LearningRate 0.0000 Epoch: 33 Global Step: 704690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:51,453-Speed 6286.67 samples/sec Loss 2.8957 LearningRate 0.0000 Epoch: 33 Global Step: 704700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:54,700-Speed 6308.74 samples/sec Loss 2.8957 LearningRate 0.0000 Epoch: 33 Global Step: 704710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:49:57,953-Speed 6297.69 samples/sec Loss 2.9624 LearningRate 0.0000 Epoch: 33 Global Step: 704720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:50:01,201-Speed 6306.30 samples/sec Loss 2.8849 LearningRate 0.0000 Epoch: 33 Global Step: 704730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:04,467-Speed 6271.42 samples/sec Loss 2.8894 LearningRate 0.0000 Epoch: 33 Global Step: 704740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:07,722-Speed 6292.69 samples/sec Loss 2.9610 LearningRate 0.0000 Epoch: 33 Global Step: 704750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:10,973-Speed 6300.90 samples/sec Loss 2.8947 LearningRate 0.0000 Epoch: 33 Global Step: 704760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:14,212-Speed 6325.64 samples/sec Loss 2.8863 LearningRate 0.0000 Epoch: 33 Global Step: 704770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:17,463-Speed 6299.97 samples/sec Loss 2.9115 LearningRate 0.0000 Epoch: 33 Global Step: 704780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:20,726-Speed 6278.69 samples/sec Loss 2.9117 LearningRate 0.0000 Epoch: 33 Global Step: 704790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:23,981-Speed 6292.62 samples/sec Loss 2.8731 LearningRate 0.0000 Epoch: 33 Global Step: 704800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:27,236-Speed 6293.62 samples/sec Loss 2.9647 LearningRate 0.0000 Epoch: 33 Global Step: 704810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:30,495-Speed 6285.39 samples/sec Loss 2.9128 LearningRate 0.0000 Epoch: 33 Global Step: 704820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:33,740-Speed 6312.83 samples/sec Loss 2.9111 LearningRate 0.0000 Epoch: 33 Global Step: 704830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:36,995-Speed 6292.54 samples/sec Loss 2.8679 LearningRate 0.0000 Epoch: 33 Global Step: 704840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:40,252-Speed 6289.88 samples/sec Loss 
2.9306 LearningRate 0.0000 Epoch: 33 Global Step: 704850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:43,508-Speed 6291.62 samples/sec Loss 2.9294 LearningRate 0.0000 Epoch: 33 Global Step: 704860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:46,756-Speed 6306.05 samples/sec Loss 2.9342 LearningRate 0.0000 Epoch: 33 Global Step: 704870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:50,005-Speed 6306.03 samples/sec Loss 2.8939 LearningRate 0.0000 Epoch: 33 Global Step: 704880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:53,254-Speed 6304.36 samples/sec Loss 2.9554 LearningRate 0.0000 Epoch: 33 Global Step: 704890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:56,511-Speed 6289.19 samples/sec Loss 2.9226 LearningRate 0.0000 Epoch: 33 Global Step: 704900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:50:59,764-Speed 6297.76 samples/sec Loss 2.9108 LearningRate 0.0000 Epoch: 33 Global Step: 704910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:03,018-Speed 6295.52 samples/sec Loss 2.9232 LearningRate 0.0000 Epoch: 33 Global Step: 704920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:06,262-Speed 6313.99 samples/sec Loss 2.8587 LearningRate 0.0000 Epoch: 33 Global Step: 704930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:09,515-Speed 6297.80 samples/sec Loss 2.8856 LearningRate 0.0000 Epoch: 33 Global Step: 704940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:12,771-Speed 6291.59 samples/sec Loss 2.8886 LearningRate 0.0000 Epoch: 33 Global Step: 704950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:16,024-Speed 6297.62 samples/sec Loss 2.9304 LearningRate 0.0000 Epoch: 33 Global Step: 704960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:19,258-Speed 6334.09 samples/sec Loss 2.9027 LearningRate 0.0000 Epoch: 33 Global 
Step: 704970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:22,515-Speed 6287.76 samples/sec Loss 2.9294 LearningRate 0.0000 Epoch: 33 Global Step: 704980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:25,771-Speed 6293.24 samples/sec Loss 2.9343 LearningRate 0.0000 Epoch: 33 Global Step: 704990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:29,017-Speed 6309.21 samples/sec Loss 2.9399 LearningRate 0.0000 Epoch: 33 Global Step: 705000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:32,269-Speed 6298.83 samples/sec Loss 2.8475 LearningRate 0.0000 Epoch: 33 Global Step: 705010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:35,522-Speed 6297.05 samples/sec Loss 2.8999 LearningRate 0.0000 Epoch: 33 Global Step: 705020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:38,773-Speed 6300.87 samples/sec Loss 2.8808 LearningRate 0.0000 Epoch: 33 Global Step: 705030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:42,025-Speed 6300.23 samples/sec Loss 2.8843 LearningRate 0.0000 Epoch: 33 Global Step: 705040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:45,274-Speed 6303.90 samples/sec Loss 2.8700 LearningRate 0.0000 Epoch: 33 Global Step: 705050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:48,534-Speed 6284.75 samples/sec Loss 2.9337 LearningRate 0.0000 Epoch: 33 Global Step: 705060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:51:51,787-Speed 6295.50 samples/sec Loss 2.8737 LearningRate 0.0000 Epoch: 33 Global Step: 705070 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:51:55,039-Speed 6299.54 samples/sec Loss 2.8404 LearningRate 0.0000 Epoch: 33 Global Step: 705080 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:51:58,273-Speed 6335.38 samples/sec Loss 2.9247 LearningRate 0.0000 Epoch: 33 Global Step: 705090 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:52:01,531-Speed 6286.55 samples/sec Loss 2.9681 LearningRate 0.0000 Epoch: 33 Global Step: 705100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:52:04,784-Speed 6297.88 samples/sec Loss 2.9271 LearningRate 0.0000 Epoch: 33 Global Step: 705110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:52:08,032-Speed 6306.34 samples/sec Loss 2.9007 LearningRate 0.0000 Epoch: 33 Global Step: 705120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:52:11,325-Speed 6222.13 samples/sec Loss 2.9391 LearningRate 0.0000 Epoch: 33 Global Step: 705130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:52:14,621-Speed 6215.17 samples/sec Loss 2.9438 LearningRate 0.0000 Epoch: 33 Global Step: 705140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:14,495-Speed 342.05 samples/sec Loss 2.9380 LearningRate 0.0000 Epoch: 34 Global Step: 705150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:17,737-Speed 6319.68 samples/sec Loss 2.9097 LearningRate 0.0000 Epoch: 34 Global Step: 705160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:20,987-Speed 6302.26 samples/sec Loss 2.9102 LearningRate 0.0000 Epoch: 34 Global Step: 705170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:24,257-Speed 6264.46 samples/sec Loss 2.9515 LearningRate 0.0000 Epoch: 34 Global Step: 705180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:27,498-Speed 6319.50 samples/sec Loss 2.9525 LearningRate 0.0000 Epoch: 34 Global Step: 705190 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:53:30,738-Speed 6323.45 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 34 Global Step: 705200 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:53:33,978-Speed 6322.41 samples/sec Loss 2.9371 LearningRate 0.0000 Epoch: 34 Global Step: 705210 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 
07:53:37,228-Speed 6301.79 samples/sec Loss 2.8568 LearningRate 0.0000 Epoch: 34 Global Step: 705220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:40,490-Speed 6279.49 samples/sec Loss 2.9339 LearningRate 0.0000 Epoch: 34 Global Step: 705230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:43,741-Speed 6301.95 samples/sec Loss 2.9404 LearningRate 0.0000 Epoch: 34 Global Step: 705240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:46,992-Speed 6301.53 samples/sec Loss 2.8528 LearningRate 0.0000 Epoch: 34 Global Step: 705250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:50,245-Speed 6296.60 samples/sec Loss 2.9071 LearningRate 0.0000 Epoch: 34 Global Step: 705260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:53,486-Speed 6320.12 samples/sec Loss 2.9124 LearningRate 0.0000 Epoch: 34 Global Step: 705270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:56,739-Speed 6296.59 samples/sec Loss 2.9137 LearningRate 0.0000 Epoch: 34 Global Step: 705280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:53:59,985-Speed 6311.89 samples/sec Loss 2.9277 LearningRate 0.0000 Epoch: 34 Global Step: 705290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:03,237-Speed 6298.59 samples/sec Loss 2.8645 LearningRate 0.0000 Epoch: 34 Global Step: 705300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:06,480-Speed 6316.47 samples/sec Loss 2.8839 LearningRate 0.0000 Epoch: 34 Global Step: 705310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:09,722-Speed 6319.38 samples/sec Loss 2.8610 LearningRate 0.0000 Epoch: 34 Global Step: 705320 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:54:12,964-Speed 6317.35 samples/sec Loss 2.9105 LearningRate 0.0000 Epoch: 34 Global Step: 705330 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:54:16,220-Speed 6291.48 samples/sec Loss 
2.8983 LearningRate 0.0000 Epoch: 34 Global Step: 705340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:54:19,463-Speed 6317.03 samples/sec Loss 2.8336 LearningRate 0.0000 Epoch: 34 Global Step: 705350 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:54:22,708-Speed 6312.15 samples/sec Loss 2.9351 LearningRate 0.0000 Epoch: 34 Global Step: 705360 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:54:25,944-Speed 6331.40 samples/sec Loss 2.9383 LearningRate 0.0000 Epoch: 34 Global Step: 705370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:29,202-Speed 6286.46 samples/sec Loss 2.8288 LearningRate 0.0000 Epoch: 34 Global Step: 705380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:32,448-Speed 6311.50 samples/sec Loss 2.8685 LearningRate 0.0000 Epoch: 34 Global Step: 705390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:35,691-Speed 6317.14 samples/sec Loss 2.8959 LearningRate 0.0000 Epoch: 34 Global Step: 705400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:38,934-Speed 6314.91 samples/sec Loss 2.9396 LearningRate 0.0000 Epoch: 34 Global Step: 705410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:42,181-Speed 6309.25 samples/sec Loss 2.9064 LearningRate 0.0000 Epoch: 34 Global Step: 705420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:45,429-Speed 6306.97 samples/sec Loss 2.9214 LearningRate 0.0000 Epoch: 34 Global Step: 705430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:48,672-Speed 6316.84 samples/sec Loss 2.8863 LearningRate 0.0000 Epoch: 34 Global Step: 705440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:51,919-Speed 6308.66 samples/sec Loss 2.8155 LearningRate 0.0000 Epoch: 34 Global Step: 705450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:55,163-Speed 6313.86 samples/sec Loss 2.9197 LearningRate 0.0000 Epoch: 34 Global 
Step: 705460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:54:58,410-Speed 6309.08 samples/sec Loss 2.8972 LearningRate 0.0000 Epoch: 34 Global Step: 705470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:01,659-Speed 6305.21 samples/sec Loss 2.8759 LearningRate 0.0000 Epoch: 34 Global Step: 705480 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:04,891-Speed 6337.92 samples/sec Loss 2.8386 LearningRate 0.0000 Epoch: 34 Global Step: 705490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:08,142-Speed 6301.36 samples/sec Loss 2.8849 LearningRate 0.0000 Epoch: 34 Global Step: 705500 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:11,389-Speed 6307.92 samples/sec Loss 2.8858 LearningRate 0.0000 Epoch: 34 Global Step: 705510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:14,639-Speed 6303.13 samples/sec Loss 2.9309 LearningRate 0.0000 Epoch: 34 Global Step: 705520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:17,879-Speed 6322.20 samples/sec Loss 2.8939 LearningRate 0.0000 Epoch: 34 Global Step: 705530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:21,132-Speed 6297.37 samples/sec Loss 2.8825 LearningRate 0.0000 Epoch: 34 Global Step: 705540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:24,383-Speed 6302.12 samples/sec Loss 2.9067 LearningRate 0.0000 Epoch: 34 Global Step: 705550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:27,642-Speed 6285.97 samples/sec Loss 2.8508 LearningRate 0.0000 Epoch: 34 Global Step: 705560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:30,894-Speed 6299.63 samples/sec Loss 2.9482 LearningRate 0.0000 Epoch: 34 Global Step: 705570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:55:34,154-Speed 6283.01 samples/sec Loss 2.9206 LearningRate 0.0000 Epoch: 34 Global Step: 705580 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:55:37,409-Speed 6292.99 samples/sec Loss 2.9192 LearningRate 0.0000 Epoch: 34 Global Step: 705590 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:40,664-Speed 6293.54 samples/sec Loss 2.8702 LearningRate 0.0000 Epoch: 34 Global Step: 705600 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:43,915-Speed 6301.12 samples/sec Loss 2.8602 LearningRate 0.0000 Epoch: 34 Global Step: 705610 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:47,166-Speed 6300.95 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 705620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:50,412-Speed 6310.10 samples/sec Loss 2.9017 LearningRate 0.0000 Epoch: 34 Global Step: 705630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:53,656-Speed 6314.98 samples/sec Loss 2.8960 LearningRate 0.0000 Epoch: 34 Global Step: 705640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:55:56,885-Speed 6342.88 samples/sec Loss 2.8459 LearningRate 0.0000 Epoch: 34 Global Step: 705650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:00,131-Speed 6312.23 samples/sec Loss 2.9441 LearningRate 0.0000 Epoch: 34 Global Step: 705660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:03,385-Speed 6295.24 samples/sec Loss 2.9137 LearningRate 0.0000 Epoch: 34 Global Step: 705670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:06,635-Speed 6301.15 samples/sec Loss 2.8504 LearningRate 0.0000 Epoch: 34 Global Step: 705680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:09,877-Speed 6318.67 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 34 Global Step: 705690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:13,120-Speed 6316.33 samples/sec Loss 2.8355 LearningRate 0.0000 Epoch: 34 Global Step: 705700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:56:16,363-Speed 6316.44 samples/sec Loss 2.9271 LearningRate 0.0000 Epoch: 34 Global Step: 705710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:19,620-Speed 6289.92 samples/sec Loss 2.9298 LearningRate 0.0000 Epoch: 34 Global Step: 705720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:22,867-Speed 6308.67 samples/sec Loss 2.9124 LearningRate 0.0000 Epoch: 34 Global Step: 705730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:26,113-Speed 6310.77 samples/sec Loss 2.9028 LearningRate 0.0000 Epoch: 34 Global Step: 705740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:29,346-Speed 6335.96 samples/sec Loss 2.9330 LearningRate 0.0000 Epoch: 34 Global Step: 705750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:32,601-Speed 6294.81 samples/sec Loss 2.9512 LearningRate 0.0000 Epoch: 34 Global Step: 705760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:35,874-Speed 6258.53 samples/sec Loss 2.9389 LearningRate 0.0000 Epoch: 34 Global Step: 705770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:39,125-Speed 6300.67 samples/sec Loss 2.8947 LearningRate 0.0000 Epoch: 34 Global Step: 705780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:42,377-Speed 6299.19 samples/sec Loss 2.9085 LearningRate 0.0000 Epoch: 34 Global Step: 705790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:45,626-Speed 6305.80 samples/sec Loss 2.8710 LearningRate 0.0000 Epoch: 34 Global Step: 705800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:48,877-Speed 6300.36 samples/sec Loss 2.8734 LearningRate 0.0000 Epoch: 34 Global Step: 705810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:52,117-Speed 6321.17 samples/sec Loss 2.9457 LearningRate 0.0000 Epoch: 34 Global Step: 705820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:55,362-Speed 6314.42 samples/sec Loss 
2.9031 LearningRate 0.0000 Epoch: 34 Global Step: 705830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:56:58,605-Speed 6314.98 samples/sec Loss 2.8850 LearningRate 0.0000 Epoch: 34 Global Step: 705840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:01,850-Speed 6314.35 samples/sec Loss 2.9639 LearningRate 0.0000 Epoch: 34 Global Step: 705850 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:05,100-Speed 6303.12 samples/sec Loss 2.8727 LearningRate 0.0000 Epoch: 34 Global Step: 705860 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:08,344-Speed 6312.97 samples/sec Loss 2.8843 LearningRate 0.0000 Epoch: 34 Global Step: 705870 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:11,572-Speed 6346.86 samples/sec Loss 2.8631 LearningRate 0.0000 Epoch: 34 Global Step: 705880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:14,818-Speed 6309.41 samples/sec Loss 2.8870 LearningRate 0.0000 Epoch: 34 Global Step: 705890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:18,064-Speed 6311.87 samples/sec Loss 2.8784 LearningRate 0.0000 Epoch: 34 Global Step: 705900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:21,307-Speed 6314.96 samples/sec Loss 2.8994 LearningRate 0.0000 Epoch: 34 Global Step: 705910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:24,565-Speed 6288.06 samples/sec Loss 2.9085 LearningRate 0.0000 Epoch: 34 Global Step: 705920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:27,815-Speed 6303.22 samples/sec Loss 2.8945 LearningRate 0.0000 Epoch: 34 Global Step: 705930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:31,062-Speed 6309.43 samples/sec Loss 2.9152 LearningRate 0.0000 Epoch: 34 Global Step: 705940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:34,314-Speed 6299.22 samples/sec Loss 2.8995 LearningRate 0.0000 Epoch: 34 Global 
Step: 705950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:37,562-Speed 6306.08 samples/sec Loss 2.9720 LearningRate 0.0000 Epoch: 34 Global Step: 705960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:40,807-Speed 6313.04 samples/sec Loss 2.8805 LearningRate 0.0000 Epoch: 34 Global Step: 705970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:57:44,059-Speed 6299.04 samples/sec Loss 2.9374 LearningRate 0.0000 Epoch: 34 Global Step: 705980 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:47,317-Speed 6288.44 samples/sec Loss 2.8757 LearningRate 0.0000 Epoch: 34 Global Step: 705990 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:50,571-Speed 6296.22 samples/sec Loss 2.9333 LearningRate 0.0000 Epoch: 34 Global Step: 706000 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:53,828-Speed 6289.16 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 34 Global Step: 706010 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:57:57,080-Speed 6298.63 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 706020 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:58:00,325-Speed 6312.19 samples/sec Loss 2.9522 LearningRate 0.0000 Epoch: 34 Global Step: 706030 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:58:03,561-Speed 6330.70 samples/sec Loss 2.9283 LearningRate 0.0000 Epoch: 34 Global Step: 706040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:06,807-Speed 6310.96 samples/sec Loss 2.8787 LearningRate 0.0000 Epoch: 34 Global Step: 706050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:10,050-Speed 6315.80 samples/sec Loss 2.8806 LearningRate 0.0000 Epoch: 34 Global Step: 706060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:13,297-Speed 6308.93 samples/sec Loss 2.8100 LearningRate 0.0000 Epoch: 34 Global Step: 706070 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 07:58:16,549-Speed 6299.77 samples/sec Loss 2.9347 LearningRate 0.0000 Epoch: 34 Global Step: 706080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:19,794-Speed 6312.33 samples/sec Loss 2.8886 LearningRate 0.0000 Epoch: 34 Global Step: 706090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:23,040-Speed 6309.78 samples/sec Loss 2.8935 LearningRate 0.0000 Epoch: 34 Global Step: 706100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:26,282-Speed 6318.87 samples/sec Loss 2.9153 LearningRate 0.0000 Epoch: 34 Global Step: 706110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:29,525-Speed 6315.53 samples/sec Loss 2.9132 LearningRate 0.0000 Epoch: 34 Global Step: 706120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:32,781-Speed 6292.68 samples/sec Loss 2.8990 LearningRate 0.0000 Epoch: 34 Global Step: 706130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:36,034-Speed 6297.09 samples/sec Loss 2.9125 LearningRate 0.0000 Epoch: 34 Global Step: 706140 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:58:39,271-Speed 6327.73 samples/sec Loss 2.8990 LearningRate 0.0000 Epoch: 34 Global Step: 706150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:42,519-Speed 6306.72 samples/sec Loss 2.8929 LearningRate 0.0000 Epoch: 34 Global Step: 706160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:45,763-Speed 6315.81 samples/sec Loss 2.8860 LearningRate 0.0000 Epoch: 34 Global Step: 706170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:49,012-Speed 6303.93 samples/sec Loss 2.9075 LearningRate 0.0000 Epoch: 34 Global Step: 706180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:52,270-Speed 6286.76 samples/sec Loss 2.8545 LearningRate 0.0000 Epoch: 34 Global Step: 706190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
07:58:55,524-Speed 6296.66 samples/sec Loss 2.8934 LearningRate 0.0000 Epoch: 34 Global Step: 706200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:58:58,779-Speed 6293.16 samples/sec Loss 2.8956 LearningRate 0.0000 Epoch: 34 Global Step: 706210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:02,031-Speed 6298.77 samples/sec Loss 2.8646 LearningRate 0.0000 Epoch: 34 Global Step: 706220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:05,278-Speed 6310.26 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 34 Global Step: 706230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:08,521-Speed 6316.28 samples/sec Loss 2.8577 LearningRate 0.0000 Epoch: 34 Global Step: 706240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:11,772-Speed 6300.76 samples/sec Loss 2.9321 LearningRate 0.0000 Epoch: 34 Global Step: 706250 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:59:15,025-Speed 6297.91 samples/sec Loss 2.9088 LearningRate 0.0000 Epoch: 34 Global Step: 706260 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:59:18,270-Speed 6311.37 samples/sec Loss 2.9219 LearningRate 0.0000 Epoch: 34 Global Step: 706270 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:59:21,521-Speed 6300.80 samples/sec Loss 2.9305 LearningRate 0.0000 Epoch: 34 Global Step: 706280 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:59:24,779-Speed 6287.33 samples/sec Loss 2.8764 LearningRate 0.0000 Epoch: 34 Global Step: 706290 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 07:59:28,019-Speed 6323.12 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 34 Global Step: 706300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:31,264-Speed 6313.26 samples/sec Loss 2.9134 LearningRate 0.0000 Epoch: 34 Global Step: 706310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:34,506-Speed 6317.46 samples/sec Loss 
2.8856 LearningRate 0.0000 Epoch: 34 Global Step: 706320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:37,754-Speed 6306.95 samples/sec Loss 2.8544 LearningRate 0.0000 Epoch: 34 Global Step: 706330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:41,004-Speed 6302.60 samples/sec Loss 2.8819 LearningRate 0.0000 Epoch: 34 Global Step: 706340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:44,252-Speed 6308.14 samples/sec Loss 2.9248 LearningRate 0.0000 Epoch: 34 Global Step: 706350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:47,503-Speed 6299.42 samples/sec Loss 2.8768 LearningRate 0.0000 Epoch: 34 Global Step: 706360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:50,760-Speed 6289.78 samples/sec Loss 2.8956 LearningRate 0.0000 Epoch: 34 Global Step: 706370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:54,017-Speed 6288.91 samples/sec Loss 2.8884 LearningRate 0.0000 Epoch: 34 Global Step: 706380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 07:59:57,271-Speed 6295.63 samples/sec Loss 2.9319 LearningRate 0.0000 Epoch: 34 Global Step: 706390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:00,499-Speed 6345.91 samples/sec Loss 2.8637 LearningRate 0.0000 Epoch: 34 Global Step: 706400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:03,758-Speed 6285.89 samples/sec Loss 2.9335 LearningRate 0.0000 Epoch: 34 Global Step: 706410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:07,012-Speed 6295.06 samples/sec Loss 2.9410 LearningRate 0.0000 Epoch: 34 Global Step: 706420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:10,256-Speed 6315.46 samples/sec Loss 2.8589 LearningRate 0.0000 Epoch: 34 Global Step: 706430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:13,508-Speed 6299.43 samples/sec Loss 2.8957 LearningRate 0.0000 Epoch: 34 Global 
Step: 706440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:16,766-Speed 6286.51 samples/sec Loss 2.9263 LearningRate 0.0000 Epoch: 34 Global Step: 706450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:20,019-Speed 6298.11 samples/sec Loss 2.9303 LearningRate 0.0000 Epoch: 34 Global Step: 706460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:23,274-Speed 6293.60 samples/sec Loss 2.9000 LearningRate 0.0000 Epoch: 34 Global Step: 706470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:26,523-Speed 6304.17 samples/sec Loss 2.8476 LearningRate 0.0000 Epoch: 34 Global Step: 706480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:29,777-Speed 6296.42 samples/sec Loss 2.8866 LearningRate 0.0000 Epoch: 34 Global Step: 706490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:33,028-Speed 6300.79 samples/sec Loss 2.8429 LearningRate 0.0000 Epoch: 34 Global Step: 706500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:00:36,259-Speed 6339.33 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 706510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:39,503-Speed 6313.93 samples/sec Loss 2.8799 LearningRate 0.0000 Epoch: 34 Global Step: 706520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:42,754-Speed 6301.28 samples/sec Loss 2.9551 LearningRate 0.0000 Epoch: 34 Global Step: 706530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:46,002-Speed 6307.77 samples/sec Loss 2.8620 LearningRate 0.0000 Epoch: 34 Global Step: 706540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:49,252-Speed 6301.57 samples/sec Loss 2.8955 LearningRate 0.0000 Epoch: 34 Global Step: 706550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:52,504-Speed 6299.75 samples/sec Loss 2.8870 LearningRate 0.0000 Epoch: 34 Global Step: 706560 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:00:55,747-Speed 6317.15 samples/sec Loss 2.8312 LearningRate 0.0000 Epoch: 34 Global Step: 706570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:00:58,995-Speed 6307.04 samples/sec Loss 2.8250 LearningRate 0.0000 Epoch: 34 Global Step: 706580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:02,247-Speed 6298.30 samples/sec Loss 2.9021 LearningRate 0.0000 Epoch: 34 Global Step: 706590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:05,489-Speed 6318.95 samples/sec Loss 2.8934 LearningRate 0.0000 Epoch: 34 Global Step: 706600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:08,738-Speed 6304.87 samples/sec Loss 2.8257 LearningRate 0.0000 Epoch: 34 Global Step: 706610 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:11,978-Speed 6322.40 samples/sec Loss 2.8857 LearningRate 0.0000 Epoch: 34 Global Step: 706620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:15,225-Speed 6307.35 samples/sec Loss 2.9088 LearningRate 0.0000 Epoch: 34 Global Step: 706630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:18,477-Speed 6298.94 samples/sec Loss 2.9054 LearningRate 0.0000 Epoch: 34 Global Step: 706640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:21,723-Speed 6311.75 samples/sec Loss 2.8519 LearningRate 0.0000 Epoch: 34 Global Step: 706650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:24,982-Speed 6285.35 samples/sec Loss 2.9420 LearningRate 0.0000 Epoch: 34 Global Step: 706660 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:01:28,220-Speed 6327.54 samples/sec Loss 2.8856 LearningRate 0.0000 Epoch: 34 Global Step: 706670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:31,473-Speed 6296.23 samples/sec Loss 2.8895 LearningRate 0.0000 Epoch: 34 Global Step: 706680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:01:34,729-Speed 6292.44 samples/sec Loss 2.8919 LearningRate 0.0000 Epoch: 34 Global Step: 706690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:37,983-Speed 6295.36 samples/sec Loss 2.8884 LearningRate 0.0000 Epoch: 34 Global Step: 706700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:41,230-Speed 6308.58 samples/sec Loss 2.8765 LearningRate 0.0000 Epoch: 34 Global Step: 706710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:44,475-Speed 6312.94 samples/sec Loss 2.8673 LearningRate 0.0000 Epoch: 34 Global Step: 706720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:47,723-Speed 6305.34 samples/sec Loss 2.9733 LearningRate 0.0000 Epoch: 34 Global Step: 706730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:50,971-Speed 6307.66 samples/sec Loss 2.8373 LearningRate 0.0000 Epoch: 34 Global Step: 706740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:54,221-Speed 6303.65 samples/sec Loss 2.9112 LearningRate 0.0000 Epoch: 34 Global Step: 706750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:01:57,465-Speed 6314.00 samples/sec Loss 2.9130 LearningRate 0.0000 Epoch: 34 Global Step: 706760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:00,719-Speed 6295.48 samples/sec Loss 2.9793 LearningRate 0.0000 Epoch: 34 Global Step: 706770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:02:03,967-Speed 6306.24 samples/sec Loss 2.8405 LearningRate 0.0000 Epoch: 34 Global Step: 706780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:07,215-Speed 6307.38 samples/sec Loss 2.8801 LearningRate 0.0000 Epoch: 34 Global Step: 706790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:10,465-Speed 6302.77 samples/sec Loss 2.8425 LearningRate 0.0000 Epoch: 34 Global Step: 706800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:13,710-Speed 6312.78 samples/sec Loss 
2.9138 LearningRate 0.0000 Epoch: 34 Global Step: 706810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:16,965-Speed 6291.55 samples/sec Loss 2.8516 LearningRate 0.0000 Epoch: 34 Global Step: 706820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:20,215-Speed 6302.80 samples/sec Loss 2.8750 LearningRate 0.0000 Epoch: 34 Global Step: 706830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:23,466-Speed 6301.84 samples/sec Loss 2.8735 LearningRate 0.0000 Epoch: 34 Global Step: 706840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:26,713-Speed 6308.57 samples/sec Loss 2.8935 LearningRate 0.0000 Epoch: 34 Global Step: 706850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:29,956-Speed 6316.46 samples/sec Loss 2.9354 LearningRate 0.0000 Epoch: 34 Global Step: 706860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:33,212-Speed 6291.40 samples/sec Loss 2.9004 LearningRate 0.0000 Epoch: 34 Global Step: 706870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:36,469-Speed 6291.46 samples/sec Loss 2.8877 LearningRate 0.0000 Epoch: 34 Global Step: 706880 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:02:39,716-Speed 6307.75 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 706890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:43,022-Speed 6197.60 samples/sec Loss 2.8790 LearningRate 0.0000 Epoch: 34 Global Step: 706900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:46,275-Speed 6295.69 samples/sec Loss 2.8635 LearningRate 0.0000 Epoch: 34 Global Step: 706910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:49,531-Speed 6290.93 samples/sec Loss 2.8771 LearningRate 0.0000 Epoch: 34 Global Step: 706920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:52,781-Speed 6302.72 samples/sec Loss 2.9123 LearningRate 0.0000 Epoch: 34 Global 
Step: 706930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:56,028-Speed 6310.22 samples/sec Loss 2.9429 LearningRate 0.0000 Epoch: 34 Global Step: 706940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:02:59,285-Speed 6288.11 samples/sec Loss 2.8995 LearningRate 0.0000 Epoch: 34 Global Step: 706950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:02,533-Speed 6307.25 samples/sec Loss 2.9036 LearningRate 0.0000 Epoch: 34 Global Step: 706960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:05,784-Speed 6305.35 samples/sec Loss 2.8737 LearningRate 0.0000 Epoch: 34 Global Step: 706970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:09,046-Speed 6278.70 samples/sec Loss 2.8591 LearningRate 0.0000 Epoch: 34 Global Step: 706980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:12,297-Speed 6302.94 samples/sec Loss 2.8815 LearningRate 0.0000 Epoch: 34 Global Step: 706990 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:03:15,588-Speed 6223.53 samples/sec Loss 2.9024 LearningRate 0.0000 Epoch: 34 Global Step: 707000 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:03:18,818-Speed 6342.06 samples/sec Loss 2.9220 LearningRate 0.0000 Epoch: 34 Global Step: 707010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:22,066-Speed 6306.87 samples/sec Loss 2.9234 LearningRate 0.0000 Epoch: 34 Global Step: 707020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:25,319-Speed 6297.90 samples/sec Loss 2.8794 LearningRate 0.0000 Epoch: 34 Global Step: 707030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:28,569-Speed 6302.68 samples/sec Loss 2.9045 LearningRate 0.0000 Epoch: 34 Global Step: 707040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:31,819-Speed 6302.62 samples/sec Loss 2.8814 LearningRate 0.0000 Epoch: 34 Global Step: 707050 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:03:35,123-Speed 6199.63 samples/sec Loss 2.8814 LearningRate 0.0000 Epoch: 34 Global Step: 707060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:38,380-Speed 6288.14 samples/sec Loss 2.9125 LearningRate 0.0000 Epoch: 34 Global Step: 707070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:41,630-Speed 6304.52 samples/sec Loss 2.9025 LearningRate 0.0000 Epoch: 34 Global Step: 707080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:44,893-Speed 6277.27 samples/sec Loss 2.8092 LearningRate 0.0000 Epoch: 34 Global Step: 707090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:48,146-Speed 6296.93 samples/sec Loss 2.9053 LearningRate 0.0000 Epoch: 34 Global Step: 707100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:03:51,403-Speed 6291.04 samples/sec Loss 2.8964 LearningRate 0.0000 Epoch: 34 Global Step: 707110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:03:54,655-Speed 6298.57 samples/sec Loss 2.8774 LearningRate 0.0000 Epoch: 34 Global Step: 707120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:03:57,911-Speed 6290.86 samples/sec Loss 2.8913 LearningRate 0.0000 Epoch: 34 Global Step: 707130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:04:01,150-Speed 6325.44 samples/sec Loss 2.8230 LearningRate 0.0000 Epoch: 34 Global Step: 707140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:04,392-Speed 6317.26 samples/sec Loss 2.8910 LearningRate 0.0000 Epoch: 34 Global Step: 707150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:07,641-Speed 6305.51 samples/sec Loss 2.9114 LearningRate 0.0000 Epoch: 34 Global Step: 707160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:10,888-Speed 6309.47 samples/sec Loss 2.8996 LearningRate 0.0000 Epoch: 34 Global Step: 707170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:04:14,142-Speed 6294.41 samples/sec Loss 2.8801 LearningRate 0.0000 Epoch: 34 Global Step: 707180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:17,389-Speed 6307.69 samples/sec Loss 2.9203 LearningRate 0.0000 Epoch: 34 Global Step: 707190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:20,653-Speed 6277.49 samples/sec Loss 2.8814 LearningRate 0.0000 Epoch: 34 Global Step: 707200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:23,903-Speed 6302.00 samples/sec Loss 2.9042 LearningRate 0.0000 Epoch: 34 Global Step: 707210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:27,205-Speed 6202.99 samples/sec Loss 2.9310 LearningRate 0.0000 Epoch: 34 Global Step: 707220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:30,471-Speed 6272.83 samples/sec Loss 2.9177 LearningRate 0.0000 Epoch: 34 Global Step: 707230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:33,711-Speed 6322.91 samples/sec Loss 2.8790 LearningRate 0.0000 Epoch: 34 Global Step: 707240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:36,962-Speed 6299.69 samples/sec Loss 2.8725 LearningRate 0.0000 Epoch: 34 Global Step: 707250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:40,214-Speed 6299.56 samples/sec Loss 2.9899 LearningRate 0.0000 Epoch: 34 Global Step: 707260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:43,466-Speed 6300.20 samples/sec Loss 2.8903 LearningRate 0.0000 Epoch: 34 Global Step: 707270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:46,715-Speed 6304.00 samples/sec Loss 2.8753 LearningRate 0.0000 Epoch: 34 Global Step: 707280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:49,965-Speed 6302.62 samples/sec Loss 2.8547 LearningRate 0.0000 Epoch: 34 Global Step: 707290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:53,219-Speed 6294.91 samples/sec Loss 
2.9059 LearningRate 0.0000 Epoch: 34 Global Step: 707300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:56,477-Speed 6287.52 samples/sec Loss 2.8752 LearningRate 0.0000 Epoch: 34 Global Step: 707310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:04:59,735-Speed 6288.49 samples/sec Loss 2.8990 LearningRate 0.0000 Epoch: 34 Global Step: 707320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:02,988-Speed 6297.28 samples/sec Loss 2.9291 LearningRate 0.0000 Epoch: 34 Global Step: 707330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:06,238-Speed 6303.19 samples/sec Loss 2.8716 LearningRate 0.0000 Epoch: 34 Global Step: 707340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:05:09,473-Speed 6332.95 samples/sec Loss 2.8284 LearningRate 0.0000 Epoch: 34 Global Step: 707350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:12,723-Speed 6303.55 samples/sec Loss 2.8551 LearningRate 0.0000 Epoch: 34 Global Step: 707360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:15,971-Speed 6305.98 samples/sec Loss 2.8767 LearningRate 0.0000 Epoch: 34 Global Step: 707370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:19,222-Speed 6299.79 samples/sec Loss 2.8464 LearningRate 0.0000 Epoch: 34 Global Step: 707380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:22,472-Speed 6303.84 samples/sec Loss 2.8636 LearningRate 0.0000 Epoch: 34 Global Step: 707390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:25,725-Speed 6298.20 samples/sec Loss 2.8709 LearningRate 0.0000 Epoch: 34 Global Step: 707400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:28,978-Speed 6295.65 samples/sec Loss 2.8941 LearningRate 0.0000 Epoch: 34 Global Step: 707410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:32,232-Speed 6296.50 samples/sec Loss 2.9076 LearningRate 0.0000 Epoch: 34 Global 
Step: 707420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:35,484-Speed 6299.19 samples/sec Loss 2.8612 LearningRate 0.0000 Epoch: 34 Global Step: 707430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:38,728-Speed 6312.87 samples/sec Loss 2.9062 LearningRate 0.0000 Epoch: 34 Global Step: 707440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:41,983-Speed 6293.91 samples/sec Loss 2.9029 LearningRate 0.0000 Epoch: 34 Global Step: 707450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:05:45,231-Speed 6307.87 samples/sec Loss 2.8580 LearningRate 0.0000 Epoch: 34 Global Step: 707460 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:05:48,470-Speed 6323.11 samples/sec Loss 2.9219 LearningRate 0.0000 Epoch: 34 Global Step: 707470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:51,727-Speed 6289.02 samples/sec Loss 2.9011 LearningRate 0.0000 Epoch: 34 Global Step: 707480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:54,985-Speed 6287.45 samples/sec Loss 2.9313 LearningRate 0.0000 Epoch: 34 Global Step: 707490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:05:58,242-Speed 6290.70 samples/sec Loss 2.9044 LearningRate 0.0000 Epoch: 34 Global Step: 707500 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:01,495-Speed 6296.25 samples/sec Loss 2.8829 LearningRate 0.0000 Epoch: 34 Global Step: 707510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:04,751-Speed 6291.42 samples/sec Loss 2.8153 LearningRate 0.0000 Epoch: 34 Global Step: 707520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:08,013-Speed 6280.34 samples/sec Loss 2.8989 LearningRate 0.0000 Epoch: 34 Global Step: 707530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:11,267-Speed 6295.86 samples/sec Loss 2.9131 LearningRate 0.0000 Epoch: 34 Global Step: 707540 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:06:14,517-Speed 6301.92 samples/sec Loss 2.8467 LearningRate 0.0000 Epoch: 34 Global Step: 707550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:17,775-Speed 6288.89 samples/sec Loss 2.9182 LearningRate 0.0000 Epoch: 34 Global Step: 707560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:21,029-Speed 6293.85 samples/sec Loss 2.9513 LearningRate 0.0000 Epoch: 34 Global Step: 707570 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:24,278-Speed 6305.79 samples/sec Loss 2.9323 LearningRate 0.0000 Epoch: 34 Global Step: 707580 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:27,529-Speed 6300.40 samples/sec Loss 2.8648 LearningRate 0.0000 Epoch: 34 Global Step: 707590 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:30,779-Speed 6303.92 samples/sec Loss 2.8776 LearningRate 0.0000 Epoch: 34 Global Step: 707600 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:34,039-Speed 6282.94 samples/sec Loss 2.8714 LearningRate 0.0000 Epoch: 34 Global Step: 707610 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:37,294-Speed 6294.08 samples/sec Loss 2.9031 LearningRate 0.0000 Epoch: 34 Global Step: 707620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:40,542-Speed 6307.33 samples/sec Loss 2.8828 LearningRate 0.0000 Epoch: 34 Global Step: 707630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:06:43,779-Speed 6326.45 samples/sec Loss 2.9216 LearningRate 0.0000 Epoch: 34 Global Step: 707640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:47,037-Speed 6288.21 samples/sec Loss 2.9048 LearningRate 0.0000 Epoch: 34 Global Step: 707650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:50,287-Speed 6302.94 samples/sec Loss 2.8130 LearningRate 0.0000 Epoch: 34 Global Step: 707660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:06:53,539-Speed 6299.28 samples/sec Loss 2.9103 LearningRate 0.0000 Epoch: 34 Global Step: 707670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:06:56,786-Speed 6308.31 samples/sec Loss 2.9033 LearningRate 0.0000 Epoch: 34 Global Step: 707680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:00,051-Speed 6273.90 samples/sec Loss 2.8853 LearningRate 0.0000 Epoch: 34 Global Step: 707690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:03,309-Speed 6287.32 samples/sec Loss 2.9064 LearningRate 0.0000 Epoch: 34 Global Step: 707700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:06,559-Speed 6302.95 samples/sec Loss 2.8567 LearningRate 0.0000 Epoch: 34 Global Step: 707710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:09,816-Speed 6289.87 samples/sec Loss 2.8418 LearningRate 0.0000 Epoch: 34 Global Step: 707720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:13,073-Speed 6289.25 samples/sec Loss 2.8602 LearningRate 0.0000 Epoch: 34 Global Step: 707730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:16,330-Speed 6289.26 samples/sec Loss 2.8687 LearningRate 0.0000 Epoch: 34 Global Step: 707740 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:07:19,581-Speed 6300.91 samples/sec Loss 2.8976 LearningRate 0.0000 Epoch: 34 Global Step: 707750 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:07:22,833-Speed 6298.89 samples/sec Loss 2.9019 LearningRate 0.0000 Epoch: 34 Global Step: 707760 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:07:26,069-Speed 6330.33 samples/sec Loss 2.8805 LearningRate 0.0000 Epoch: 34 Global Step: 707770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:29,323-Speed 6295.92 samples/sec Loss 2.8900 LearningRate 0.0000 Epoch: 34 Global Step: 707780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:32,575-Speed 6298.95 samples/sec Loss 
2.8814 LearningRate 0.0000 Epoch: 34 Global Step: 707790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:35,827-Speed 6299.19 samples/sec Loss 2.8846 LearningRate 0.0000 Epoch: 34 Global Step: 707800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:39,078-Speed 6302.12 samples/sec Loss 2.8860 LearningRate 0.0000 Epoch: 34 Global Step: 707810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:42,328-Speed 6303.29 samples/sec Loss 2.8306 LearningRate 0.0000 Epoch: 34 Global Step: 707820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:45,577-Speed 6303.20 samples/sec Loss 2.9535 LearningRate 0.0000 Epoch: 34 Global Step: 707830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:48,829-Speed 6300.18 samples/sec Loss 2.9078 LearningRate 0.0000 Epoch: 34 Global Step: 707840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:52,081-Speed 6299.34 samples/sec Loss 2.8625 LearningRate 0.0000 Epoch: 34 Global Step: 707850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:55,336-Speed 6291.80 samples/sec Loss 2.9359 LearningRate 0.0000 Epoch: 34 Global Step: 707860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:07:58,590-Speed 6294.94 samples/sec Loss 2.8530 LearningRate 0.0000 Epoch: 34 Global Step: 707870 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:08:01,829-Speed 6324.90 samples/sec Loss 2.8239 LearningRate 0.0000 Epoch: 34 Global Step: 707880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:05,086-Speed 6290.78 samples/sec Loss 2.8580 LearningRate 0.0000 Epoch: 34 Global Step: 707890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:08,335-Speed 6304.44 samples/sec Loss 2.9228 LearningRate 0.0000 Epoch: 34 Global Step: 707900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:11,583-Speed 6307.17 samples/sec Loss 2.8788 LearningRate 0.0000 Epoch: 34 Global 
Step: 707910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:14,834-Speed 6299.68 samples/sec Loss 2.8818 LearningRate 0.0000 Epoch: 34 Global Step: 707920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:18,090-Speed 6291.69 samples/sec Loss 2.8739 LearningRate 0.0000 Epoch: 34 Global Step: 707930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:21,338-Speed 6306.07 samples/sec Loss 2.9241 LearningRate 0.0000 Epoch: 34 Global Step: 707940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:24,591-Speed 6297.89 samples/sec Loss 2.9035 LearningRate 0.0000 Epoch: 34 Global Step: 707950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:27,838-Speed 6307.63 samples/sec Loss 2.8685 LearningRate 0.0000 Epoch: 34 Global Step: 707960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:31,086-Speed 6307.32 samples/sec Loss 2.8998 LearningRate 0.0000 Epoch: 34 Global Step: 707970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:34,336-Speed 6303.68 samples/sec Loss 2.8453 LearningRate 0.0000 Epoch: 34 Global Step: 707980 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:08:37,574-Speed 6325.48 samples/sec Loss 2.8807 LearningRate 0.0000 Epoch: 34 Global Step: 707990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:40,827-Speed 6298.12 samples/sec Loss 2.9464 LearningRate 0.0000 Epoch: 34 Global Step: 708000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:44,078-Speed 6301.91 samples/sec Loss 2.8744 LearningRate 0.0000 Epoch: 34 Global Step: 708010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:47,319-Speed 6320.57 samples/sec Loss 2.8903 LearningRate 0.0000 Epoch: 34 Global Step: 708020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:50,576-Speed 6289.64 samples/sec Loss 2.9328 LearningRate 0.0000 Epoch: 34 Global Step: 708030 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:08:53,836-Speed 6283.43 samples/sec Loss 2.8981 LearningRate 0.0000 Epoch: 34 Global Step: 708040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:08:57,088-Speed 6298.31 samples/sec Loss 2.8967 LearningRate 0.0000 Epoch: 34 Global Step: 708050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:00,347-Speed 6286.59 samples/sec Loss 2.8396 LearningRate 0.0000 Epoch: 34 Global Step: 708060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:03,594-Speed 6308.42 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 708070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:06,843-Speed 6304.16 samples/sec Loss 2.8635 LearningRate 0.0000 Epoch: 34 Global Step: 708080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:10,099-Speed 6291.32 samples/sec Loss 2.8888 LearningRate 0.0000 Epoch: 34 Global Step: 708090 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:09:13,349-Speed 6303.10 samples/sec Loss 2.9435 LearningRate 0.0000 Epoch: 34 Global Step: 708100 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:09:16,605-Speed 6291.59 samples/sec Loss 2.9016 LearningRate 0.0000 Epoch: 34 Global Step: 708110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:09:19,856-Speed 6299.77 samples/sec Loss 2.9135 LearningRate 0.0000 Epoch: 34 Global Step: 708120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:09:23,093-Speed 6328.44 samples/sec Loss 2.8638 LearningRate 0.0000 Epoch: 34 Global Step: 708130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:26,341-Speed 6307.32 samples/sec Loss 2.9086 LearningRate 0.0000 Epoch: 34 Global Step: 708140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:29,597-Speed 6290.95 samples/sec Loss 2.8598 LearningRate 0.0000 Epoch: 34 Global Step: 708150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:09:32,854-Speed 6288.79 samples/sec Loss 2.8720 LearningRate 0.0000 Epoch: 34 Global Step: 708160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:36,113-Speed 6286.31 samples/sec Loss 2.9336 LearningRate 0.0000 Epoch: 34 Global Step: 708170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:39,368-Speed 6292.81 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 34 Global Step: 708180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:42,630-Speed 6282.20 samples/sec Loss 2.8769 LearningRate 0.0000 Epoch: 34 Global Step: 708190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:45,885-Speed 6293.03 samples/sec Loss 2.8610 LearningRate 0.0000 Epoch: 34 Global Step: 708200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:49,140-Speed 6294.38 samples/sec Loss 2.8936 LearningRate 0.0000 Epoch: 34 Global Step: 708210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:52,384-Speed 6312.62 samples/sec Loss 2.8426 LearningRate 0.0000 Epoch: 34 Global Step: 708220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:09:55,648-Speed 6278.13 samples/sec Loss 2.8629 LearningRate 0.0000 Epoch: 34 Global Step: 708230 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:09:58,930-Speed 6241.12 samples/sec Loss 2.9025 LearningRate 0.0000 Epoch: 34 Global Step: 708240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:02,224-Speed 6219.00 samples/sec Loss 2.9048 LearningRate 0.0000 Epoch: 34 Global Step: 708250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:05,475-Speed 6301.57 samples/sec Loss 2.9053 LearningRate 0.0000 Epoch: 34 Global Step: 708260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:08,736-Speed 6280.26 samples/sec Loss 2.9097 LearningRate 0.0000 Epoch: 34 Global Step: 708270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:11,989-Speed 6297.00 samples/sec Loss 
2.8783 LearningRate 0.0000 Epoch: 34 Global Step: 708280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:15,248-Speed 6286.81 samples/sec Loss 2.8872 LearningRate 0.0000 Epoch: 34 Global Step: 708290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:18,498-Speed 6302.43 samples/sec Loss 2.8939 LearningRate 0.0000 Epoch: 34 Global Step: 708300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:21,746-Speed 6305.99 samples/sec Loss 2.9298 LearningRate 0.0000 Epoch: 34 Global Step: 708310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:24,995-Speed 6305.26 samples/sec Loss 2.8699 LearningRate 0.0000 Epoch: 34 Global Step: 708320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:28,246-Speed 6301.14 samples/sec Loss 2.8870 LearningRate 0.0000 Epoch: 34 Global Step: 708330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:31,489-Speed 6315.96 samples/sec Loss 2.9027 LearningRate 0.0000 Epoch: 34 Global Step: 708340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:10:34,743-Speed 6295.59 samples/sec Loss 2.8991 LearningRate 0.0000 Epoch: 34 Global Step: 708350 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:10:38,000-Speed 6288.82 samples/sec Loss 2.8617 LearningRate 0.0000 Epoch: 34 Global Step: 708360 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:10:41,256-Speed 6291.90 samples/sec Loss 2.9019 LearningRate 0.0000 Epoch: 34 Global Step: 708370 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:10:44,507-Speed 6301.48 samples/sec Loss 2.9054 LearningRate 0.0000 Epoch: 34 Global Step: 708380 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:10:47,742-Speed 6332.89 samples/sec Loss 2.8837 LearningRate 0.0000 Epoch: 34 Global Step: 708390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:50,992-Speed 6301.86 samples/sec Loss 2.8449 LearningRate 0.0000 Epoch: 34 Global 
Step: 708400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:54,247-Speed 6293.57 samples/sec Loss 2.9483 LearningRate 0.0000 Epoch: 34 Global Step: 708410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:10:57,500-Speed 6296.85 samples/sec Loss 2.8755 LearningRate 0.0000 Epoch: 34 Global Step: 708420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:00,751-Speed 6301.31 samples/sec Loss 2.8789 LearningRate 0.0000 Epoch: 34 Global Step: 708430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:04,009-Speed 6286.81 samples/sec Loss 2.8227 LearningRate 0.0000 Epoch: 34 Global Step: 708440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:07,253-Speed 6314.29 samples/sec Loss 2.8586 LearningRate 0.0000 Epoch: 34 Global Step: 708450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:10,511-Speed 6289.51 samples/sec Loss 2.8310 LearningRate 0.0000 Epoch: 34 Global Step: 708460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:13,767-Speed 6289.96 samples/sec Loss 2.8986 LearningRate 0.0000 Epoch: 34 Global Step: 708470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:17,016-Speed 6305.97 samples/sec Loss 2.7966 LearningRate 0.0000 Epoch: 34 Global Step: 708480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:20,265-Speed 6305.82 samples/sec Loss 2.8714 LearningRate 0.0000 Epoch: 34 Global Step: 708490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:11:23,513-Speed 6306.07 samples/sec Loss 2.9202 LearningRate 0.0000 Epoch: 34 Global Step: 708500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:11:26,767-Speed 6294.48 samples/sec Loss 2.8938 LearningRate 0.0000 Epoch: 34 Global Step: 708510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:11:30,027-Speed 6284.24 samples/sec Loss 2.9041 LearningRate 0.0000 Epoch: 34 Global Step: 708520 Fp16 Grad Scale: 8192 
Required: 11 hours Training: 2022-04-03 08:11:33,278-Speed 6300.50 samples/sec Loss 2.8708 LearningRate 0.0000 Epoch: 34 Global Step: 708530 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:11:36,522-Speed 6315.55 samples/sec Loss 2.8608 LearningRate 0.0000 Epoch: 34 Global Step: 708540 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:11:39,751-Speed 6342.01 samples/sec Loss 2.8855 LearningRate 0.0000 Epoch: 34 Global Step: 708550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:43,010-Speed 6286.30 samples/sec Loss 2.9000 LearningRate 0.0000 Epoch: 34 Global Step: 708560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:46,262-Speed 6299.21 samples/sec Loss 2.8826 LearningRate 0.0000 Epoch: 34 Global Step: 708570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:49,511-Speed 6305.92 samples/sec Loss 2.8437 LearningRate 0.0000 Epoch: 34 Global Step: 708580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:52,769-Speed 6286.83 samples/sec Loss 2.8463 LearningRate 0.0000 Epoch: 34 Global Step: 708590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:56,013-Speed 6313.83 samples/sec Loss 2.8638 LearningRate 0.0000 Epoch: 34 Global Step: 708600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:11:59,268-Speed 6292.90 samples/sec Loss 2.8564 LearningRate 0.0000 Epoch: 34 Global Step: 708610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:02,517-Speed 6305.03 samples/sec Loss 2.8461 LearningRate 0.0000 Epoch: 34 Global Step: 708620 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:05,773-Speed 6291.87 samples/sec Loss 2.8753 LearningRate 0.0000 Epoch: 34 Global Step: 708630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:09,025-Speed 6299.79 samples/sec Loss 2.8783 LearningRate 0.0000 Epoch: 34 Global Step: 708640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:12:12,273-Speed 6306.25 samples/sec Loss 2.8465 LearningRate 0.0000 Epoch: 34 Global Step: 708650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:12:15,513-Speed 6322.75 samples/sec Loss 2.8494 LearningRate 0.0000 Epoch: 34 Global Step: 708660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:18,771-Speed 6286.91 samples/sec Loss 2.8693 LearningRate 0.0000 Epoch: 34 Global Step: 708670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:22,025-Speed 6295.24 samples/sec Loss 2.8666 LearningRate 0.0000 Epoch: 34 Global Step: 708680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:25,277-Speed 6300.85 samples/sec Loss 2.8530 LearningRate 0.0000 Epoch: 34 Global Step: 708690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:28,527-Speed 6302.88 samples/sec Loss 2.9067 LearningRate 0.0000 Epoch: 34 Global Step: 708700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:31,782-Speed 6292.90 samples/sec Loss 2.8658 LearningRate 0.0000 Epoch: 34 Global Step: 708710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:35,031-Speed 6304.31 samples/sec Loss 2.8864 LearningRate 0.0000 Epoch: 34 Global Step: 708720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:38,282-Speed 6300.90 samples/sec Loss 2.8574 LearningRate 0.0000 Epoch: 34 Global Step: 708730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:41,535-Speed 6298.21 samples/sec Loss 2.8507 LearningRate 0.0000 Epoch: 34 Global Step: 708740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:44,787-Speed 6297.51 samples/sec Loss 2.9085 LearningRate 0.0000 Epoch: 34 Global Step: 708750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:48,025-Speed 6328.11 samples/sec Loss 2.9144 LearningRate 0.0000 Epoch: 34 Global Step: 708760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:51,281-Speed 6291.42 samples/sec Loss 
2.9031 LearningRate 0.0000 Epoch: 34 Global Step: 708770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:54,534-Speed 6296.34 samples/sec Loss 2.8615 LearningRate 0.0000 Epoch: 34 Global Step: 708780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:12:57,790-Speed 6291.30 samples/sec Loss 2.8962 LearningRate 0.0000 Epoch: 34 Global Step: 708790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:01,049-Speed 6284.79 samples/sec Loss 2.8781 LearningRate 0.0000 Epoch: 34 Global Step: 708800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:04,306-Speed 6290.05 samples/sec Loss 2.8444 LearningRate 0.0000 Epoch: 34 Global Step: 708810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:07,558-Speed 6297.60 samples/sec Loss 2.9252 LearningRate 0.0000 Epoch: 34 Global Step: 708820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:10,816-Speed 6289.58 samples/sec Loss 2.8551 LearningRate 0.0000 Epoch: 34 Global Step: 708830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:14,067-Speed 6299.43 samples/sec Loss 2.8531 LearningRate 0.0000 Epoch: 34 Global Step: 708840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:17,319-Speed 6300.53 samples/sec Loss 2.8865 LearningRate 0.0000 Epoch: 34 Global Step: 708850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:20,564-Speed 6312.82 samples/sec Loss 2.8338 LearningRate 0.0000 Epoch: 34 Global Step: 708860 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:13:23,850-Speed 6233.93 samples/sec Loss 2.8848 LearningRate 0.0000 Epoch: 34 Global Step: 708870 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:13:27,087-Speed 6328.16 samples/sec Loss 2.8707 LearningRate 0.0000 Epoch: 34 Global Step: 708880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:30,331-Speed 6314.39 samples/sec Loss 2.8409 LearningRate 0.0000 Epoch: 34 Global 
Step: 708890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:33,581-Speed 6302.15 samples/sec Loss 2.8482 LearningRate 0.0000 Epoch: 34 Global Step: 708900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:36,829-Speed 6307.68 samples/sec Loss 2.8390 LearningRate 0.0000 Epoch: 34 Global Step: 708910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:40,085-Speed 6290.84 samples/sec Loss 2.8070 LearningRate 0.0000 Epoch: 34 Global Step: 708920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:43,344-Speed 6286.60 samples/sec Loss 2.8620 LearningRate 0.0000 Epoch: 34 Global Step: 708930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:46,589-Speed 6312.27 samples/sec Loss 2.8920 LearningRate 0.0000 Epoch: 34 Global Step: 708940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:49,840-Speed 6301.56 samples/sec Loss 2.8917 LearningRate 0.0000 Epoch: 34 Global Step: 708950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:53,090-Speed 6301.83 samples/sec Loss 2.9356 LearningRate 0.0000 Epoch: 34 Global Step: 708960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:56,335-Speed 6312.72 samples/sec Loss 2.8701 LearningRate 0.0000 Epoch: 34 Global Step: 708970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:13:59,574-Speed 6324.85 samples/sec Loss 2.8974 LearningRate 0.0000 Epoch: 34 Global Step: 708980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:02,829-Speed 6292.34 samples/sec Loss 2.8850 LearningRate 0.0000 Epoch: 34 Global Step: 708990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:06,081-Speed 6299.35 samples/sec Loss 2.8981 LearningRate 0.0000 Epoch: 34 Global Step: 709000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:09,335-Speed 6295.19 samples/sec Loss 2.8432 LearningRate 0.0000 Epoch: 34 Global Step: 709010 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:14:12,585-Speed 6303.27 samples/sec Loss 2.8973 LearningRate 0.0000 Epoch: 34 Global Step: 709020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:15,838-Speed 6298.41 samples/sec Loss 2.8882 LearningRate 0.0000 Epoch: 34 Global Step: 709030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:19,097-Speed 6284.88 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 34 Global Step: 709040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:22,348-Speed 6300.50 samples/sec Loss 2.9101 LearningRate 0.0000 Epoch: 34 Global Step: 709050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:25,600-Speed 6298.29 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 34 Global Step: 709060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:28,850-Speed 6302.81 samples/sec Loss 2.8555 LearningRate 0.0000 Epoch: 34 Global Step: 709070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:32,103-Speed 6297.20 samples/sec Loss 2.8647 LearningRate 0.0000 Epoch: 34 Global Step: 709080 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:14:35,341-Speed 6326.83 samples/sec Loss 2.8144 LearningRate 0.0000 Epoch: 34 Global Step: 709090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:38,591-Speed 6303.61 samples/sec Loss 2.8463 LearningRate 0.0000 Epoch: 34 Global Step: 709100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:41,850-Speed 6285.34 samples/sec Loss 2.9160 LearningRate 0.0000 Epoch: 34 Global Step: 709110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:45,103-Speed 6297.05 samples/sec Loss 2.8814 LearningRate 0.0000 Epoch: 34 Global Step: 709120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:48,359-Speed 6291.72 samples/sec Loss 2.9491 LearningRate 0.0000 Epoch: 34 Global Step: 709130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:14:51,611-Speed 6300.01 samples/sec Loss 2.8411 LearningRate 0.0000 Epoch: 34 Global Step: 709140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:54,872-Speed 6282.14 samples/sec Loss 2.9173 LearningRate 0.0000 Epoch: 34 Global Step: 709150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:14:58,124-Speed 6297.67 samples/sec Loss 2.8761 LearningRate 0.0000 Epoch: 34 Global Step: 709160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:01,385-Speed 6281.73 samples/sec Loss 2.9179 LearningRate 0.0000 Epoch: 34 Global Step: 709170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:04,642-Speed 6289.90 samples/sec Loss 2.8363 LearningRate 0.0000 Epoch: 34 Global Step: 709180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:07,894-Speed 6299.77 samples/sec Loss 2.8609 LearningRate 0.0000 Epoch: 34 Global Step: 709190 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:15:11,146-Speed 6299.05 samples/sec Loss 2.8714 LearningRate 0.0000 Epoch: 34 Global Step: 709200 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:15:14,400-Speed 6295.31 samples/sec Loss 2.8771 LearningRate 0.0000 Epoch: 34 Global Step: 709210 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:15:17,659-Speed 6284.54 samples/sec Loss 2.8860 LearningRate 0.0000 Epoch: 34 Global Step: 709220 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:15:20,900-Speed 6319.78 samples/sec Loss 2.8747 LearningRate 0.0000 Epoch: 34 Global Step: 709230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:24,154-Speed 6296.30 samples/sec Loss 2.8918 LearningRate 0.0000 Epoch: 34 Global Step: 709240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:27,407-Speed 6296.82 samples/sec Loss 2.8704 LearningRate 0.0000 Epoch: 34 Global Step: 709250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:30,669-Speed 6279.22 samples/sec Loss 
2.8901 LearningRate 0.0000 Epoch: 34 Global Step: 709260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:33,928-Speed 6285.94 samples/sec Loss 2.8677 LearningRate 0.0000 Epoch: 34 Global Step: 709270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:37,186-Speed 6287.53 samples/sec Loss 2.8806 LearningRate 0.0000 Epoch: 34 Global Step: 709280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:40,447-Speed 6281.78 samples/sec Loss 2.8683 LearningRate 0.0000 Epoch: 34 Global Step: 709290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:43,696-Speed 6303.95 samples/sec Loss 2.9056 LearningRate 0.0000 Epoch: 34 Global Step: 709300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:46,945-Speed 6306.80 samples/sec Loss 2.8753 LearningRate 0.0000 Epoch: 34 Global Step: 709310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:50,196-Speed 6299.60 samples/sec Loss 2.8821 LearningRate 0.0000 Epoch: 34 Global Step: 709320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:53,450-Speed 6296.83 samples/sec Loss 2.8444 LearningRate 0.0000 Epoch: 34 Global Step: 709330 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:15:56,687-Speed 6328.00 samples/sec Loss 2.8593 LearningRate 0.0000 Epoch: 34 Global Step: 709340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:15:59,946-Speed 6285.92 samples/sec Loss 2.8541 LearningRate 0.0000 Epoch: 34 Global Step: 709350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:03,202-Speed 6290.79 samples/sec Loss 2.8534 LearningRate 0.0000 Epoch: 34 Global Step: 709360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:06,486-Speed 6237.88 samples/sec Loss 2.8571 LearningRate 0.0000 Epoch: 34 Global Step: 709370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:09,743-Speed 6288.91 samples/sec Loss 2.8570 LearningRate 0.0000 Epoch: 34 Global 
Step: 709380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:13,000-Speed 6290.40 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 34 Global Step: 709390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:16,303-Speed 6201.67 samples/sec Loss 2.8832 LearningRate 0.0000 Epoch: 34 Global Step: 709400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:19,552-Speed 6303.88 samples/sec Loss 2.8677 LearningRate 0.0000 Epoch: 34 Global Step: 709410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:22,804-Speed 6300.20 samples/sec Loss 2.8575 LearningRate 0.0000 Epoch: 34 Global Step: 709420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:26,050-Speed 6309.67 samples/sec Loss 2.8609 LearningRate 0.0000 Epoch: 34 Global Step: 709430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:29,320-Speed 6263.93 samples/sec Loss 2.8411 LearningRate 0.0000 Epoch: 34 Global Step: 709440 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:16:32,572-Speed 6299.86 samples/sec Loss 2.8427 LearningRate 0.0000 Epoch: 34 Global Step: 709450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:16:35,816-Speed 6314.26 samples/sec Loss 2.8762 LearningRate 0.0000 Epoch: 34 Global Step: 709460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:39,071-Speed 6294.21 samples/sec Loss 2.9181 LearningRate 0.0000 Epoch: 34 Global Step: 709470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:42,327-Speed 6289.62 samples/sec Loss 2.9022 LearningRate 0.0000 Epoch: 34 Global Step: 709480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:45,602-Speed 6255.70 samples/sec Loss 2.8470 LearningRate 0.0000 Epoch: 34 Global Step: 709490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:48,859-Speed 6290.31 samples/sec Loss 2.8811 LearningRate 0.0000 Epoch: 34 Global Step: 709500 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:16:52,113-Speed 6295.50 samples/sec Loss 2.8843 LearningRate 0.0000 Epoch: 34 Global Step: 709510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:55,362-Speed 6302.84 samples/sec Loss 2.8945 LearningRate 0.0000 Epoch: 34 Global Step: 709520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:16:58,615-Speed 6298.04 samples/sec Loss 2.9015 LearningRate 0.0000 Epoch: 34 Global Step: 709530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:01,874-Speed 6284.80 samples/sec Loss 2.8529 LearningRate 0.0000 Epoch: 34 Global Step: 709540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:05,131-Speed 6289.74 samples/sec Loss 2.8765 LearningRate 0.0000 Epoch: 34 Global Step: 709550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:08,386-Speed 6294.00 samples/sec Loss 2.8990 LearningRate 0.0000 Epoch: 34 Global Step: 709560 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:17:11,643-Speed 6289.43 samples/sec Loss 2.9068 LearningRate 0.0000 Epoch: 34 Global Step: 709570 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:17:14,882-Speed 6324.26 samples/sec Loss 2.8881 LearningRate 0.0000 Epoch: 34 Global Step: 709580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:18,129-Speed 6308.54 samples/sec Loss 2.8736 LearningRate 0.0000 Epoch: 34 Global Step: 709590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:21,386-Speed 6289.48 samples/sec Loss 2.8400 LearningRate 0.0000 Epoch: 34 Global Step: 709600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:24,641-Speed 6295.82 samples/sec Loss 2.9140 LearningRate 0.0000 Epoch: 34 Global Step: 709610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:27,886-Speed 6312.85 samples/sec Loss 2.8710 LearningRate 0.0000 Epoch: 34 Global Step: 709620 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:17:31,139-Speed 6296.53 samples/sec Loss 2.9128 LearningRate 0.0000 Epoch: 34 Global Step: 709630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:34,391-Speed 6299.29 samples/sec Loss 2.8386 LearningRate 0.0000 Epoch: 34 Global Step: 709640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:37,638-Speed 6308.14 samples/sec Loss 2.8531 LearningRate 0.0000 Epoch: 34 Global Step: 709650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:40,893-Speed 6294.68 samples/sec Loss 2.8610 LearningRate 0.0000 Epoch: 34 Global Step: 709660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:44,147-Speed 6293.62 samples/sec Loss 2.8789 LearningRate 0.0000 Epoch: 34 Global Step: 709670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:17:47,391-Speed 6314.68 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 709680 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:17:50,647-Speed 6292.53 samples/sec Loss 2.8707 LearningRate 0.0000 Epoch: 34 Global Step: 709690 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:17:53,939-Speed 6223.15 samples/sec Loss 2.9297 LearningRate 0.0000 Epoch: 34 Global Step: 709700 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:17:57,173-Speed 6332.24 samples/sec Loss 2.9024 LearningRate 0.0000 Epoch: 34 Global Step: 709710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:00,435-Speed 6281.14 samples/sec Loss 2.8463 LearningRate 0.0000 Epoch: 34 Global Step: 709720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:03,684-Speed 6303.12 samples/sec Loss 2.8211 LearningRate 0.0000 Epoch: 34 Global Step: 709730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:06,933-Speed 6305.70 samples/sec Loss 2.8691 LearningRate 0.0000 Epoch: 34 Global Step: 709740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:10,191-Speed 6288.56 samples/sec Loss 
2.8524 LearningRate 0.0000 Epoch: 34 Global Step: 709750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:13,445-Speed 6294.84 samples/sec Loss 2.8907 LearningRate 0.0000 Epoch: 34 Global Step: 709760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:16,694-Speed 6305.01 samples/sec Loss 2.8915 LearningRate 0.0000 Epoch: 34 Global Step: 709770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:19,942-Speed 6305.67 samples/sec Loss 2.9117 LearningRate 0.0000 Epoch: 34 Global Step: 709780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:23,194-Speed 6301.27 samples/sec Loss 2.8931 LearningRate 0.0000 Epoch: 34 Global Step: 709790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:26,455-Speed 6281.07 samples/sec Loss 2.8642 LearningRate 0.0000 Epoch: 34 Global Step: 709800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:29,711-Speed 6291.86 samples/sec Loss 2.9027 LearningRate 0.0000 Epoch: 34 Global Step: 709810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:18:32,971-Speed 6282.31 samples/sec Loss 2.8412 LearningRate 0.0000 Epoch: 34 Global Step: 709820 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:18:36,227-Speed 6292.97 samples/sec Loss 2.8525 LearningRate 0.0000 Epoch: 34 Global Step: 709830 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:18:39,459-Speed 6337.43 samples/sec Loss 2.8990 LearningRate 0.0000 Epoch: 34 Global Step: 709840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:42,711-Speed 6299.44 samples/sec Loss 2.8727 LearningRate 0.0000 Epoch: 34 Global Step: 709850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:45,957-Speed 6310.97 samples/sec Loss 2.9292 LearningRate 0.0000 Epoch: 34 Global Step: 709860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:49,314-Speed 6101.67 samples/sec Loss 2.8046 LearningRate 0.0000 Epoch: 34 Global 
Step: 709870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:52,626-Speed 6184.64 samples/sec Loss 2.9142 LearningRate 0.0000 Epoch: 34 Global Step: 709880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:55,883-Speed 6290.20 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 34 Global Step: 709890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:18:59,136-Speed 6295.77 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 34 Global Step: 709900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:02,389-Speed 6297.02 samples/sec Loss 2.8186 LearningRate 0.0000 Epoch: 34 Global Step: 709910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:05,642-Speed 6297.87 samples/sec Loss 2.8846 LearningRate 0.0000 Epoch: 34 Global Step: 709920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:08,894-Speed 6298.74 samples/sec Loss 2.8401 LearningRate 0.0000 Epoch: 34 Global Step: 709930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:12,153-Speed 6284.91 samples/sec Loss 2.9207 LearningRate 0.0000 Epoch: 34 Global Step: 709940 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:19:15,425-Speed 6261.44 samples/sec Loss 2.8859 LearningRate 0.0000 Epoch: 34 Global Step: 709950 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:19:18,678-Speed 6298.18 samples/sec Loss 2.9000 LearningRate 0.0000 Epoch: 34 Global Step: 709960 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:19:21,928-Speed 6302.45 samples/sec Loss 2.8715 LearningRate 0.0000 Epoch: 34 Global Step: 709970 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:19:25,187-Speed 6284.43 samples/sec Loss 2.8678 LearningRate 0.0000 Epoch: 34 Global Step: 709980 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:19:28,429-Speed 6319.76 samples/sec Loss 2.9193 LearningRate 0.0000 Epoch: 34 Global Step: 709990 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:19:31,678-Speed 6304.55 samples/sec Loss 2.8879 LearningRate 0.0000 Epoch: 34 Global Step: 710000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:34,941-Speed 6277.89 samples/sec Loss 2.8439 LearningRate 0.0000 Epoch: 34 Global Step: 710010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:38,197-Speed 6292.94 samples/sec Loss 2.8767 LearningRate 0.0000 Epoch: 34 Global Step: 710020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:41,447-Speed 6302.42 samples/sec Loss 2.8722 LearningRate 0.0000 Epoch: 34 Global Step: 710030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:44,701-Speed 6295.21 samples/sec Loss 2.8886 LearningRate 0.0000 Epoch: 34 Global Step: 710040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:47,956-Speed 6292.15 samples/sec Loss 2.8548 LearningRate 0.0000 Epoch: 34 Global Step: 710050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:51,215-Speed 6286.34 samples/sec Loss 2.9048 LearningRate 0.0000 Epoch: 34 Global Step: 710060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:54,476-Speed 6282.84 samples/sec Loss 2.8464 LearningRate 0.0000 Epoch: 34 Global Step: 710070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:19:57,726-Speed 6303.42 samples/sec Loss 2.8766 LearningRate 0.0000 Epoch: 34 Global Step: 710080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:00,979-Speed 6296.99 samples/sec Loss 2.9190 LearningRate 0.0000 Epoch: 34 Global Step: 710090 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:04,228-Speed 6305.72 samples/sec Loss 2.9052 LearningRate 0.0000 Epoch: 34 Global Step: 710100 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:07,482-Speed 6294.80 samples/sec Loss 2.9009 LearningRate 0.0000 Epoch: 34 Global Step: 710110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 
08:20:10,732-Speed 6301.81 samples/sec Loss 2.8274 LearningRate 0.0000 Epoch: 34 Global Step: 710120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:13,988-Speed 6292.12 samples/sec Loss 2.8950 LearningRate 0.0000 Epoch: 34 Global Step: 710130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:17,231-Speed 6317.12 samples/sec Loss 2.8715 LearningRate 0.0000 Epoch: 34 Global Step: 710140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:20,485-Speed 6294.26 samples/sec Loss 2.8502 LearningRate 0.0000 Epoch: 34 Global Step: 710150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:23,736-Speed 6301.86 samples/sec Loss 2.8473 LearningRate 0.0000 Epoch: 34 Global Step: 710160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:26,987-Speed 6299.79 samples/sec Loss 2.8705 LearningRate 0.0000 Epoch: 34 Global Step: 710170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:30,249-Speed 6279.72 samples/sec Loss 2.8450 LearningRate 0.0000 Epoch: 34 Global Step: 710180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:33,502-Speed 6298.69 samples/sec Loss 2.9051 LearningRate 0.0000 Epoch: 34 Global Step: 710190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:36,764-Speed 6279.23 samples/sec Loss 2.9051 LearningRate 0.0000 Epoch: 34 Global Step: 710200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:40,024-Speed 6284.00 samples/sec Loss 2.8374 LearningRate 0.0000 Epoch: 34 Global Step: 710210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:43,315-Speed 6223.46 samples/sec Loss 2.9022 LearningRate 0.0000 Epoch: 34 Global Step: 710220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:46,573-Speed 6288.32 samples/sec Loss 2.8738 LearningRate 0.0000 Epoch: 34 Global Step: 710230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:49,825-Speed 6300.48 samples/sec Loss 
2.8402 LearningRate 0.0000 Epoch: 34 Global Step: 710240 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:53,154-Speed 6153.67 samples/sec Loss 2.8831 LearningRate 0.0000 Epoch: 34 Global Step: 710250 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:20:56,414-Speed 6282.98 samples/sec Loss 2.8407 LearningRate 0.0000 Epoch: 34 Global Step: 710260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:20:59,664-Speed 6302.52 samples/sec Loss 2.8979 LearningRate 0.0000 Epoch: 34 Global Step: 710270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:02,918-Speed 6295.76 samples/sec Loss 2.8706 LearningRate 0.0000 Epoch: 34 Global Step: 710280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:06,173-Speed 6293.28 samples/sec Loss 2.8513 LearningRate 0.0000 Epoch: 34 Global Step: 710290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:09,437-Speed 6275.74 samples/sec Loss 2.8680 LearningRate 0.0000 Epoch: 34 Global Step: 710300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:12,689-Speed 6298.90 samples/sec Loss 2.8261 LearningRate 0.0000 Epoch: 34 Global Step: 710310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:15,940-Speed 6300.50 samples/sec Loss 2.9047 LearningRate 0.0000 Epoch: 34 Global Step: 710320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:19,191-Speed 6302.15 samples/sec Loss 2.8592 LearningRate 0.0000 Epoch: 34 Global Step: 710330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:22,442-Speed 6299.47 samples/sec Loss 2.8113 LearningRate 0.0000 Epoch: 34 Global Step: 710340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:25,705-Speed 6278.21 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 710350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:28,944-Speed 6325.76 samples/sec Loss 2.8411 LearningRate 0.0000 Epoch: 34 Global 
Step: 710360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:32,226-Speed 6239.87 samples/sec Loss 2.9105 LearningRate 0.0000 Epoch: 34 Global Step: 710370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:35,484-Speed 6288.77 samples/sec Loss 2.8488 LearningRate 0.0000 Epoch: 34 Global Step: 710380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:38,729-Speed 6312.17 samples/sec Loss 2.9084 LearningRate 0.0000 Epoch: 34 Global Step: 710390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:41,987-Speed 6287.41 samples/sec Loss 2.8555 LearningRate 0.0000 Epoch: 34 Global Step: 710400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:45,237-Speed 6301.54 samples/sec Loss 2.9312 LearningRate 0.0000 Epoch: 34 Global Step: 710410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:48,483-Speed 6311.93 samples/sec Loss 2.9122 LearningRate 0.0000 Epoch: 34 Global Step: 710420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:51,733-Speed 6302.24 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 34 Global Step: 710430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:54,990-Speed 6289.22 samples/sec Loss 2.8922 LearningRate 0.0000 Epoch: 34 Global Step: 710440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:21:58,247-Speed 6290.03 samples/sec Loss 2.9156 LearningRate 0.0000 Epoch: 34 Global Step: 710450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:01,493-Speed 6310.61 samples/sec Loss 2.8324 LearningRate 0.0000 Epoch: 34 Global Step: 710460 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:04,752-Speed 6287.27 samples/sec Loss 2.9047 LearningRate 0.0000 Epoch: 34 Global Step: 710470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:08,000-Speed 6305.37 samples/sec Loss 2.8867 LearningRate 0.0000 Epoch: 34 Global Step: 710480 Fp16 Grad Scale: 8192 
Required: 11 hours Training: 2022-04-03 08:22:11,259-Speed 6287.30 samples/sec Loss 2.8540 LearningRate 0.0000 Epoch: 34 Global Step: 710490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:14,510-Speed 6300.60 samples/sec Loss 2.8659 LearningRate 0.0000 Epoch: 34 Global Step: 710500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:17,761-Speed 6302.82 samples/sec Loss 2.8716 LearningRate 0.0000 Epoch: 34 Global Step: 710510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:20,993-Speed 6338.62 samples/sec Loss 2.8173 LearningRate 0.0000 Epoch: 34 Global Step: 710520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:24,250-Speed 6288.87 samples/sec Loss 2.8520 LearningRate 0.0000 Epoch: 34 Global Step: 710530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:27,503-Speed 6298.59 samples/sec Loss 2.8692 LearningRate 0.0000 Epoch: 34 Global Step: 710540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:30,759-Speed 6290.74 samples/sec Loss 2.8734 LearningRate 0.0000 Epoch: 34 Global Step: 710550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:34,016-Speed 6289.17 samples/sec Loss 2.8456 LearningRate 0.0000 Epoch: 34 Global Step: 710560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:37,277-Speed 6281.33 samples/sec Loss 2.8663 LearningRate 0.0000 Epoch: 34 Global Step: 710570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:40,535-Speed 6288.12 samples/sec Loss 2.8802 LearningRate 0.0000 Epoch: 34 Global Step: 710580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:43,783-Speed 6306.52 samples/sec Loss 2.8298 LearningRate 0.0000 Epoch: 34 Global Step: 710590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:47,041-Speed 6287.83 samples/sec Loss 2.9109 LearningRate 0.0000 Epoch: 34 Global Step: 710600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:22:50,301-Speed 6282.72 samples/sec Loss 2.8174 LearningRate 0.0000 Epoch: 34 Global Step: 710610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:22:53,560-Speed 6286.19 samples/sec Loss 2.8841 LearningRate 0.0000 Epoch: 34 Global Step: 710620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:22:56,804-Speed 6318.26 samples/sec Loss 2.8484 LearningRate 0.0000 Epoch: 34 Global Step: 710630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:00,059-Speed 6292.97 samples/sec Loss 2.8152 LearningRate 0.0000 Epoch: 34 Global Step: 710640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:03,316-Speed 6289.83 samples/sec Loss 2.9018 LearningRate 0.0000 Epoch: 34 Global Step: 710650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:06,572-Speed 6289.77 samples/sec Loss 2.8890 LearningRate 0.0000 Epoch: 34 Global Step: 710660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:09,821-Speed 6305.24 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 710670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:13,076-Speed 6293.37 samples/sec Loss 2.9545 LearningRate 0.0000 Epoch: 34 Global Step: 710680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:16,332-Speed 6291.68 samples/sec Loss 2.9024 LearningRate 0.0000 Epoch: 34 Global Step: 710690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:19,586-Speed 6294.94 samples/sec Loss 2.9061 LearningRate 0.0000 Epoch: 34 Global Step: 710700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:22,841-Speed 6294.84 samples/sec Loss 2.9064 LearningRate 0.0000 Epoch: 34 Global Step: 710710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:26,091-Speed 6301.56 samples/sec Loss 2.8307 LearningRate 0.0000 Epoch: 34 Global Step: 710720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:29,336-Speed 6314.30 samples/sec Loss 
2.8592 LearningRate 0.0000 Epoch: 34 Global Step: 710730 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:23:32,568-Speed 6337.63 samples/sec Loss 2.9217 LearningRate 0.0000 Epoch: 34 Global Step: 710740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:35,814-Speed 6310.17 samples/sec Loss 2.8221 LearningRate 0.0000 Epoch: 34 Global Step: 710750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:39,073-Speed 6286.17 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 34 Global Step: 710760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:42,337-Speed 6274.59 samples/sec Loss 2.8986 LearningRate 0.0000 Epoch: 34 Global Step: 710770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:45,585-Speed 6306.66 samples/sec Loss 2.8714 LearningRate 0.0000 Epoch: 34 Global Step: 710780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:48,841-Speed 6291.33 samples/sec Loss 2.9358 LearningRate 0.0000 Epoch: 34 Global Step: 710790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:52,093-Speed 6299.16 samples/sec Loss 2.8523 LearningRate 0.0000 Epoch: 34 Global Step: 710800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:55,376-Speed 6239.87 samples/sec Loss 2.8912 LearningRate 0.0000 Epoch: 34 Global Step: 710810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:23:58,637-Speed 6281.18 samples/sec Loss 2.8771 LearningRate 0.0000 Epoch: 34 Global Step: 710820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:01,901-Speed 6277.22 samples/sec Loss 2.9247 LearningRate 0.0000 Epoch: 34 Global Step: 710830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:05,134-Speed 6334.91 samples/sec Loss 2.8641 LearningRate 0.0000 Epoch: 34 Global Step: 710840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:08,386-Speed 6300.44 samples/sec Loss 2.8731 LearningRate 0.0000 Epoch: 34 Global 
Step: 710850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:11,639-Speed 6296.69 samples/sec Loss 2.8568 LearningRate 0.0000 Epoch: 34 Global Step: 710860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:14,889-Speed 6301.79 samples/sec Loss 2.8656 LearningRate 0.0000 Epoch: 34 Global Step: 710870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:18,137-Speed 6307.06 samples/sec Loss 2.8622 LearningRate 0.0000 Epoch: 34 Global Step: 710880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:21,398-Speed 6283.64 samples/sec Loss 2.8236 LearningRate 0.0000 Epoch: 34 Global Step: 710890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:24,649-Speed 6300.22 samples/sec Loss 2.8731 LearningRate 0.0000 Epoch: 34 Global Step: 710900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:27,899-Speed 6303.55 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 34 Global Step: 710910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:31,148-Speed 6304.49 samples/sec Loss 2.8431 LearningRate 0.0000 Epoch: 34 Global Step: 710920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:34,395-Speed 6309.12 samples/sec Loss 2.8897 LearningRate 0.0000 Epoch: 34 Global Step: 710930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:37,645-Speed 6303.05 samples/sec Loss 2.8338 LearningRate 0.0000 Epoch: 34 Global Step: 710940 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:24:40,877-Speed 6336.68 samples/sec Loss 2.8399 LearningRate 0.0000 Epoch: 34 Global Step: 710950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:44,126-Speed 6304.84 samples/sec Loss 2.8897 LearningRate 0.0000 Epoch: 34 Global Step: 710960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:47,383-Speed 6290.91 samples/sec Loss 2.9155 LearningRate 0.0000 Epoch: 34 Global Step: 710970 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:24:50,643-Speed 6282.83 samples/sec Loss 2.8347 LearningRate 0.0000 Epoch: 34 Global Step: 710980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:53,895-Speed 6298.62 samples/sec Loss 2.8945 LearningRate 0.0000 Epoch: 34 Global Step: 710990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:24:57,146-Speed 6302.66 samples/sec Loss 2.8594 LearningRate 0.0000 Epoch: 34 Global Step: 711000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:00,403-Speed 6287.80 samples/sec Loss 2.9196 LearningRate 0.0000 Epoch: 34 Global Step: 711010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:03,654-Speed 6301.24 samples/sec Loss 2.8606 LearningRate 0.0000 Epoch: 34 Global Step: 711020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:06,909-Speed 6293.23 samples/sec Loss 2.9146 LearningRate 0.0000 Epoch: 34 Global Step: 711030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:10,158-Speed 6305.81 samples/sec Loss 2.8522 LearningRate 0.0000 Epoch: 34 Global Step: 711040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:13,410-Speed 6297.85 samples/sec Loss 2.8594 LearningRate 0.0000 Epoch: 34 Global Step: 711050 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:25:16,658-Speed 6307.29 samples/sec Loss 2.8636 LearningRate 0.0000 Epoch: 34 Global Step: 711060 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:25:19,896-Speed 6327.29 samples/sec Loss 2.8769 LearningRate 0.0000 Epoch: 34 Global Step: 711070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:23,150-Speed 6294.69 samples/sec Loss 2.8577 LearningRate 0.0000 Epoch: 34 Global Step: 711080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:26,396-Speed 6312.43 samples/sec Loss 2.8672 LearningRate 0.0000 Epoch: 34 Global Step: 711090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:25:29,641-Speed 6313.51 samples/sec Loss 2.8451 LearningRate 0.0000 Epoch: 34 Global Step: 711100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:32,897-Speed 6291.72 samples/sec Loss 2.8983 LearningRate 0.0000 Epoch: 34 Global Step: 711110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:36,143-Speed 6310.76 samples/sec Loss 2.9408 LearningRate 0.0000 Epoch: 34 Global Step: 711120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:39,392-Speed 6305.84 samples/sec Loss 2.8686 LearningRate 0.0000 Epoch: 34 Global Step: 711130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:42,637-Speed 6313.26 samples/sec Loss 2.8888 LearningRate 0.0000 Epoch: 34 Global Step: 711140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:45,894-Speed 6288.45 samples/sec Loss 2.8837 LearningRate 0.0000 Epoch: 34 Global Step: 711150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:49,157-Speed 6278.13 samples/sec Loss 2.8790 LearningRate 0.0000 Epoch: 34 Global Step: 711160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:52,464-Speed 6193.56 samples/sec Loss 2.9164 LearningRate 0.0000 Epoch: 34 Global Step: 711170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:55,730-Speed 6273.32 samples/sec Loss 2.8736 LearningRate 0.0000 Epoch: 34 Global Step: 711180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:25:58,986-Speed 6290.46 samples/sec Loss 2.8343 LearningRate 0.0000 Epoch: 34 Global Step: 711190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:02,247-Speed 6280.86 samples/sec Loss 2.8888 LearningRate 0.0000 Epoch: 34 Global Step: 711200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:05,497-Speed 6303.76 samples/sec Loss 2.8503 LearningRate 0.0000 Epoch: 34 Global Step: 711210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:08,755-Speed 6287.77 samples/sec Loss 
2.8394 LearningRate 0.0000 Epoch: 34 Global Step: 711220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:12,012-Speed 6289.64 samples/sec Loss 2.8224 LearningRate 0.0000 Epoch: 34 Global Step: 711230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:15,268-Speed 6290.93 samples/sec Loss 2.9317 LearningRate 0.0000 Epoch: 34 Global Step: 711240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:18,521-Speed 6297.52 samples/sec Loss 2.8664 LearningRate 0.0000 Epoch: 34 Global Step: 711250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:21,767-Speed 6310.64 samples/sec Loss 2.8562 LearningRate 0.0000 Epoch: 34 Global Step: 711260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:25,016-Speed 6303.03 samples/sec Loss 2.8731 LearningRate 0.0000 Epoch: 34 Global Step: 711270 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:26:28,255-Speed 6325.96 samples/sec Loss 2.8997 LearningRate 0.0000 Epoch: 34 Global Step: 711280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:31,509-Speed 6295.31 samples/sec Loss 2.7924 LearningRate 0.0000 Epoch: 34 Global Step: 711290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:34,762-Speed 6296.10 samples/sec Loss 2.8549 LearningRate 0.0000 Epoch: 34 Global Step: 711300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:38,024-Speed 6279.62 samples/sec Loss 2.7779 LearningRate 0.0000 Epoch: 34 Global Step: 711310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:41,287-Speed 6278.74 samples/sec Loss 2.8870 LearningRate 0.0000 Epoch: 34 Global Step: 711320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:44,538-Speed 6301.90 samples/sec Loss 2.9519 LearningRate 0.0000 Epoch: 34 Global Step: 711330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:47,795-Speed 6288.33 samples/sec Loss 2.9239 LearningRate 0.0000 Epoch: 34 Global 
Step: 711340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:51,045-Speed 6303.27 samples/sec Loss 2.8851 LearningRate 0.0000 Epoch: 34 Global Step: 711350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:54,293-Speed 6306.29 samples/sec Loss 2.8504 LearningRate 0.0000 Epoch: 34 Global Step: 711360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:26:57,542-Speed 6305.39 samples/sec Loss 2.8326 LearningRate 0.0000 Epoch: 34 Global Step: 711370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:00,799-Speed 6290.28 samples/sec Loss 2.8897 LearningRate 0.0000 Epoch: 34 Global Step: 711380 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:27:04,057-Speed 6285.71 samples/sec Loss 2.8789 LearningRate 0.0000 Epoch: 34 Global Step: 711390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:07,313-Speed 6292.05 samples/sec Loss 2.9082 LearningRate 0.0000 Epoch: 34 Global Step: 711400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:10,567-Speed 6295.32 samples/sec Loss 2.8531 LearningRate 0.0000 Epoch: 34 Global Step: 711410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:13,819-Speed 6300.19 samples/sec Loss 2.8487 LearningRate 0.0000 Epoch: 34 Global Step: 711420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:17,072-Speed 6296.09 samples/sec Loss 2.7994 LearningRate 0.0000 Epoch: 34 Global Step: 711430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:20,333-Speed 6280.82 samples/sec Loss 2.8886 LearningRate 0.0000 Epoch: 34 Global Step: 711440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:23,593-Speed 6283.64 samples/sec Loss 2.8379 LearningRate 0.0000 Epoch: 34 Global Step: 711450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:26,849-Speed 6291.31 samples/sec Loss 2.9194 LearningRate 0.0000 Epoch: 34 Global Step: 711460 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:27:30,100-Speed 6302.24 samples/sec Loss 2.8541 LearningRate 0.0000 Epoch: 34 Global Step: 711470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:33,352-Speed 6297.61 samples/sec Loss 2.8918 LearningRate 0.0000 Epoch: 34 Global Step: 711480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:36,614-Speed 6281.01 samples/sec Loss 2.8994 LearningRate 0.0000 Epoch: 34 Global Step: 711490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:27:39,880-Speed 6272.20 samples/sec Loss 2.8913 LearningRate 0.0000 Epoch: 34 Global Step: 711500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:27:43,141-Speed 6281.81 samples/sec Loss 2.8811 LearningRate 0.0000 Epoch: 34 Global Step: 711510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:27:46,376-Speed 6330.69 samples/sec Loss 2.8213 LearningRate 0.0000 Epoch: 34 Global Step: 711520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:49,634-Speed 6288.81 samples/sec Loss 2.7932 LearningRate 0.0000 Epoch: 34 Global Step: 711530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:52,888-Speed 6293.85 samples/sec Loss 2.8629 LearningRate 0.0000 Epoch: 34 Global Step: 711540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:56,141-Speed 6297.97 samples/sec Loss 2.8566 LearningRate 0.0000 Epoch: 34 Global Step: 711550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:27:59,389-Speed 6306.75 samples/sec Loss 2.8899 LearningRate 0.0000 Epoch: 34 Global Step: 711560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:02,635-Speed 6312.02 samples/sec Loss 2.8615 LearningRate 0.0000 Epoch: 34 Global Step: 711570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:05,888-Speed 6296.94 samples/sec Loss 2.8319 LearningRate 0.0000 Epoch: 34 Global Step: 711580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:28:09,137-Speed 6303.48 samples/sec Loss 2.8104 LearningRate 0.0000 Epoch: 34 Global Step: 711590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:12,382-Speed 6312.65 samples/sec Loss 2.8281 LearningRate 0.0000 Epoch: 34 Global Step: 711600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:15,628-Speed 6310.91 samples/sec Loss 2.8571 LearningRate 0.0000 Epoch: 34 Global Step: 711610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:18,876-Speed 6307.09 samples/sec Loss 2.7762 LearningRate 0.0000 Epoch: 34 Global Step: 711620 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:28:22,118-Speed 6319.69 samples/sec Loss 2.9166 LearningRate 0.0000 Epoch: 34 Global Step: 711630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:25,389-Speed 6260.96 samples/sec Loss 2.9150 LearningRate 0.0000 Epoch: 34 Global Step: 711640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:28,640-Speed 6301.35 samples/sec Loss 2.8760 LearningRate 0.0000 Epoch: 34 Global Step: 711650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:31,889-Speed 6304.34 samples/sec Loss 2.8692 LearningRate 0.0000 Epoch: 34 Global Step: 711660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:35,138-Speed 6305.59 samples/sec Loss 2.8526 LearningRate 0.0000 Epoch: 34 Global Step: 711670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:38,389-Speed 6300.92 samples/sec Loss 2.8744 LearningRate 0.0000 Epoch: 34 Global Step: 711680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:41,640-Speed 6300.81 samples/sec Loss 2.8889 LearningRate 0.0000 Epoch: 34 Global Step: 711690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:44,895-Speed 6293.17 samples/sec Loss 2.8403 LearningRate 0.0000 Epoch: 34 Global Step: 711700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:48,153-Speed 6288.59 samples/sec Loss 
2.8630 LearningRate 0.0000 Epoch: 34 Global Step: 711710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:51,406-Speed 6297.00 samples/sec Loss 2.8730 LearningRate 0.0000 Epoch: 34 Global Step: 711720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:28:54,659-Speed 6295.41 samples/sec Loss 2.8859 LearningRate 0.0000 Epoch: 34 Global Step: 711730 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:28:57,895-Speed 6330.79 samples/sec Loss 2.8152 LearningRate 0.0000 Epoch: 34 Global Step: 711740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:01,150-Speed 6293.10 samples/sec Loss 2.8056 LearningRate 0.0000 Epoch: 34 Global Step: 711750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:04,401-Speed 6301.03 samples/sec Loss 2.8918 LearningRate 0.0000 Epoch: 34 Global Step: 711760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:07,653-Speed 6300.02 samples/sec Loss 2.8808 LearningRate 0.0000 Epoch: 34 Global Step: 711770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:10,902-Speed 6305.56 samples/sec Loss 2.9021 LearningRate 0.0000 Epoch: 34 Global Step: 711780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:14,150-Speed 6307.38 samples/sec Loss 2.8353 LearningRate 0.0000 Epoch: 34 Global Step: 711790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:17,401-Speed 6301.68 samples/sec Loss 2.9432 LearningRate 0.0000 Epoch: 34 Global Step: 711800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:20,649-Speed 6304.75 samples/sec Loss 2.8405 LearningRate 0.0000 Epoch: 34 Global Step: 711810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:23,905-Speed 6292.49 samples/sec Loss 2.8948 LearningRate 0.0000 Epoch: 34 Global Step: 711820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:27,161-Speed 6291.10 samples/sec Loss 2.8566 LearningRate 0.0000 Epoch: 34 Global 
Step: 711830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:30,419-Speed 6287.46 samples/sec Loss 2.8462 LearningRate 0.0000 Epoch: 34 Global Step: 711840 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:29:33,681-Speed 6279.41 samples/sec Loss 2.8243 LearningRate 0.0000 Epoch: 34 Global Step: 711850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:36,958-Speed 6250.93 samples/sec Loss 2.8930 LearningRate 0.0000 Epoch: 34 Global Step: 711860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:40,214-Speed 6291.59 samples/sec Loss 2.8912 LearningRate 0.0000 Epoch: 34 Global Step: 711870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:43,464-Speed 6303.43 samples/sec Loss 2.8756 LearningRate 0.0000 Epoch: 34 Global Step: 711880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:46,710-Speed 6310.56 samples/sec Loss 2.8343 LearningRate 0.0000 Epoch: 34 Global Step: 711890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:49,968-Speed 6286.45 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 34 Global Step: 711900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:53,225-Speed 6290.05 samples/sec Loss 2.8405 LearningRate 0.0000 Epoch: 34 Global Step: 711910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:56,481-Speed 6291.69 samples/sec Loss 2.8181 LearningRate 0.0000 Epoch: 34 Global Step: 711920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:29:59,743-Speed 6279.93 samples/sec Loss 2.8554 LearningRate 0.0000 Epoch: 34 Global Step: 711930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:03,003-Speed 6282.03 samples/sec Loss 2.9084 LearningRate 0.0000 Epoch: 34 Global Step: 711940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:06,258-Speed 6293.36 samples/sec Loss 2.8281 LearningRate 0.0000 Epoch: 34 Global Step: 711950 Fp16 Grad Scale: 8192 
Required: 11 hours Training: 2022-04-03 08:30:09,507-Speed 6304.73 samples/sec Loss 2.8796 LearningRate 0.0000 Epoch: 34 Global Step: 711960 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:30:12,757-Speed 6303.22 samples/sec Loss 2.8427 LearningRate 0.0000 Epoch: 34 Global Step: 711970 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:30:15,991-Speed 6335.35 samples/sec Loss 2.9032 LearningRate 0.0000 Epoch: 34 Global Step: 711980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:19,240-Speed 6304.59 samples/sec Loss 2.8601 LearningRate 0.0000 Epoch: 34 Global Step: 711990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:22,491-Speed 6302.29 samples/sec Loss 2.8348 LearningRate 0.0000 Epoch: 34 Global Step: 712000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:25,743-Speed 6298.26 samples/sec Loss 2.8732 LearningRate 0.0000 Epoch: 34 Global Step: 712010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:28,996-Speed 6297.90 samples/sec Loss 2.8555 LearningRate 0.0000 Epoch: 34 Global Step: 712020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:32,246-Speed 6303.11 samples/sec Loss 2.8527 LearningRate 0.0000 Epoch: 34 Global Step: 712030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:35,503-Speed 6287.86 samples/sec Loss 2.8997 LearningRate 0.0000 Epoch: 34 Global Step: 712040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:38,752-Speed 6305.49 samples/sec Loss 2.8773 LearningRate 0.0000 Epoch: 34 Global Step: 712050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:41,999-Speed 6308.64 samples/sec Loss 2.8514 LearningRate 0.0000 Epoch: 34 Global Step: 712060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:30:45,258-Speed 6286.17 samples/sec Loss 2.8720 LearningRate 0.0000 Epoch: 34 Global Step: 712070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:30:48,515-Speed 6288.71 samples/sec Loss 2.8431 LearningRate 0.0000 Epoch: 34 Global Step: 712080 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:30:51,773-Speed 6288.57 samples/sec Loss 2.8382 LearningRate 0.0000 Epoch: 34 Global Step: 712090 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:30:55,025-Speed 6299.20 samples/sec Loss 2.8566 LearningRate 0.0000 Epoch: 34 Global Step: 712100 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:30:58,278-Speed 6295.81 samples/sec Loss 2.8352 LearningRate 0.0000 Epoch: 34 Global Step: 712110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:31:01,515-Speed 6329.78 samples/sec Loss 2.8765 LearningRate 0.0000 Epoch: 34 Global Step: 712120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:04,785-Speed 6264.03 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 712130 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:08,045-Speed 6282.26 samples/sec Loss 2.9026 LearningRate 0.0000 Epoch: 34 Global Step: 712140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:11,307-Speed 6281.04 samples/sec Loss 2.8791 LearningRate 0.0000 Epoch: 34 Global Step: 712150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:14,569-Speed 6279.97 samples/sec Loss 2.8739 LearningRate 0.0000 Epoch: 34 Global Step: 712160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:17,832-Speed 6277.24 samples/sec Loss 2.8583 LearningRate 0.0000 Epoch: 34 Global Step: 712170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:21,091-Speed 6284.61 samples/sec Loss 2.8243 LearningRate 0.0000 Epoch: 34 Global Step: 712180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:24,349-Speed 6287.72 samples/sec Loss 2.8590 LearningRate 0.0000 Epoch: 34 Global Step: 712190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:27,606-Speed 6289.94 samples/sec Loss 
2.8519 LearningRate 0.0000 Epoch: 34 Global Step: 712200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:30,857-Speed 6301.06 samples/sec Loss 2.8395 LearningRate 0.0000 Epoch: 34 Global Step: 712210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:34,112-Speed 6293.22 samples/sec Loss 2.8367 LearningRate 0.0000 Epoch: 34 Global Step: 712220 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:31:37,361-Speed 6305.66 samples/sec Loss 2.8900 LearningRate 0.0000 Epoch: 34 Global Step: 712230 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:31:40,608-Speed 6308.80 samples/sec Loss 2.8904 LearningRate 0.0000 Epoch: 34 Global Step: 712240 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:43,853-Speed 6312.78 samples/sec Loss 2.7914 LearningRate 0.0000 Epoch: 34 Global Step: 712250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:47,107-Speed 6295.37 samples/sec Loss 2.8623 LearningRate 0.0000 Epoch: 34 Global Step: 712260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:50,359-Speed 6299.48 samples/sec Loss 2.8921 LearningRate 0.0000 Epoch: 34 Global Step: 712270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:53,633-Speed 6255.56 samples/sec Loss 2.8857 LearningRate 0.0000 Epoch: 34 Global Step: 712280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:31:56,900-Speed 6270.98 samples/sec Loss 2.8407 LearningRate 0.0000 Epoch: 34 Global Step: 712290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:00,174-Speed 6256.61 samples/sec Loss 2.8480 LearningRate 0.0000 Epoch: 34 Global Step: 712300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:03,428-Speed 6295.13 samples/sec Loss 2.8534 LearningRate 0.0000 Epoch: 34 Global Step: 712310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:06,675-Speed 6307.98 samples/sec Loss 2.8737 LearningRate 0.0000 Epoch: 34 Global 
Step: 712320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:09,926-Speed 6301.64 samples/sec Loss 2.9162 LearningRate 0.0000 Epoch: 34 Global Step: 712330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:13,189-Speed 6277.18 samples/sec Loss 2.8710 LearningRate 0.0000 Epoch: 34 Global Step: 712340 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:32:16,430-Speed 6321.48 samples/sec Loss 2.8333 LearningRate 0.0000 Epoch: 34 Global Step: 712350 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:19,683-Speed 6295.69 samples/sec Loss 2.8407 LearningRate 0.0000 Epoch: 34 Global Step: 712360 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:22,932-Speed 6304.57 samples/sec Loss 2.8756 LearningRate 0.0000 Epoch: 34 Global Step: 712370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:26,182-Speed 6304.33 samples/sec Loss 2.8271 LearningRate 0.0000 Epoch: 34 Global Step: 712380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:29,445-Speed 6278.06 samples/sec Loss 2.9311 LearningRate 0.0000 Epoch: 34 Global Step: 712390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:32,694-Speed 6303.54 samples/sec Loss 2.8744 LearningRate 0.0000 Epoch: 34 Global Step: 712400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:35,943-Speed 6305.10 samples/sec Loss 2.8674 LearningRate 0.0000 Epoch: 34 Global Step: 712410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:39,200-Speed 6289.32 samples/sec Loss 2.9123 LearningRate 0.0000 Epoch: 34 Global Step: 712420 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:42,456-Speed 6290.95 samples/sec Loss 2.8963 LearningRate 0.0000 Epoch: 34 Global Step: 712430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:45,709-Speed 6297.88 samples/sec Loss 2.8346 LearningRate 0.0000 Epoch: 34 Global Step: 712440 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:32:48,955-Speed 6311.11 samples/sec Loss 2.7785 LearningRate 0.0000 Epoch: 34 Global Step: 712450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:32:52,189-Speed 6334.40 samples/sec Loss 2.8425 LearningRate 0.0000 Epoch: 34 Global Step: 712460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:55,439-Speed 6302.97 samples/sec Loss 2.8608 LearningRate 0.0000 Epoch: 34 Global Step: 712470 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:32:58,698-Speed 6285.67 samples/sec Loss 2.8409 LearningRate 0.0000 Epoch: 34 Global Step: 712480 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:01,949-Speed 6301.64 samples/sec Loss 2.8619 LearningRate 0.0000 Epoch: 34 Global Step: 712490 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:05,205-Speed 6290.60 samples/sec Loss 2.8595 LearningRate 0.0000 Epoch: 34 Global Step: 712500 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:08,468-Speed 6278.11 samples/sec Loss 2.8614 LearningRate 0.0000 Epoch: 34 Global Step: 712510 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:11,727-Speed 6284.75 samples/sec Loss 2.8608 LearningRate 0.0000 Epoch: 34 Global Step: 712520 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:14,983-Speed 6292.91 samples/sec Loss 2.8735 LearningRate 0.0000 Epoch: 34 Global Step: 712530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:18,239-Speed 6290.48 samples/sec Loss 2.8980 LearningRate 0.0000 Epoch: 34 Global Step: 712540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:21,497-Speed 6288.20 samples/sec Loss 2.8638 LearningRate 0.0000 Epoch: 34 Global Step: 712550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:24,757-Speed 6282.89 samples/sec Loss 2.8708 LearningRate 0.0000 Epoch: 34 Global Step: 712560 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 
08:33:28,007-Speed 6302.72 samples/sec Loss 2.9077 LearningRate 0.0000 Epoch: 34 Global Step: 712570 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:33:31,240-Speed 6335.89 samples/sec Loss 2.8909 LearningRate 0.0000 Epoch: 34 Global Step: 712580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:34,495-Speed 6293.22 samples/sec Loss 2.8339 LearningRate 0.0000 Epoch: 34 Global Step: 712590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:37,755-Speed 6287.17 samples/sec Loss 2.8736 LearningRate 0.0000 Epoch: 34 Global Step: 712600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:41,012-Speed 6288.18 samples/sec Loss 2.8833 LearningRate 0.0000 Epoch: 34 Global Step: 712610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:44,265-Speed 6298.53 samples/sec Loss 2.8483 LearningRate 0.0000 Epoch: 34 Global Step: 712620 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:47,522-Speed 6290.06 samples/sec Loss 2.8204 LearningRate 0.0000 Epoch: 34 Global Step: 712630 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:50,770-Speed 6305.90 samples/sec Loss 2.8040 LearningRate 0.0000 Epoch: 34 Global Step: 712640 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:54,018-Speed 6306.21 samples/sec Loss 2.8705 LearningRate 0.0000 Epoch: 34 Global Step: 712650 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:33:57,272-Speed 6296.53 samples/sec Loss 2.8688 LearningRate 0.0000 Epoch: 34 Global Step: 712660 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:00,528-Speed 6290.45 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 34 Global Step: 712670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:03,805-Speed 6251.88 samples/sec Loss 2.8847 LearningRate 0.0000 Epoch: 34 Global Step: 712680 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:34:07,059-Speed 6295.93 samples/sec Loss 
2.8878 LearningRate 0.0000 Epoch: 34 Global Step: 712690 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:34:10,305-Speed 6310.38 samples/sec Loss 2.8231 LearningRate 0.0000 Epoch: 34 Global Step: 712700 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:34:13,542-Speed 6328.68 samples/sec Loss 2.8182 LearningRate 0.0000 Epoch: 34 Global Step: 712710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:16,798-Speed 6290.21 samples/sec Loss 2.8137 LearningRate 0.0000 Epoch: 34 Global Step: 712720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:20,047-Speed 6306.51 samples/sec Loss 2.8738 LearningRate 0.0000 Epoch: 34 Global Step: 712730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:23,308-Speed 6279.89 samples/sec Loss 2.8832 LearningRate 0.0000 Epoch: 34 Global Step: 712740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:26,565-Speed 6289.32 samples/sec Loss 2.8376 LearningRate 0.0000 Epoch: 34 Global Step: 712750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:29,810-Speed 6312.52 samples/sec Loss 2.8173 LearningRate 0.0000 Epoch: 34 Global Step: 712760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:33,065-Speed 6294.44 samples/sec Loss 2.8480 LearningRate 0.0000 Epoch: 34 Global Step: 712770 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:36,308-Speed 6317.15 samples/sec Loss 2.8229 LearningRate 0.0000 Epoch: 34 Global Step: 712780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:39,573-Speed 6272.42 samples/sec Loss 2.8591 LearningRate 0.0000 Epoch: 34 Global Step: 712790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:42,839-Speed 6272.18 samples/sec Loss 2.8384 LearningRate 0.0000 Epoch: 34 Global Step: 712800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:46,089-Speed 6304.79 samples/sec Loss 2.8587 LearningRate 0.0000 Epoch: 34 Global 
Step: 712810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:34:49,324-Speed 6331.20 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 34 Global Step: 712820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:52,589-Speed 6273.65 samples/sec Loss 2.8563 LearningRate 0.0000 Epoch: 34 Global Step: 712830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:55,843-Speed 6295.30 samples/sec Loss 2.8233 LearningRate 0.0000 Epoch: 34 Global Step: 712840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:34:59,088-Speed 6312.29 samples/sec Loss 2.9160 LearningRate 0.0000 Epoch: 34 Global Step: 712850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:02,338-Speed 6303.76 samples/sec Loss 2.8582 LearningRate 0.0000 Epoch: 34 Global Step: 712860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:05,620-Speed 6241.74 samples/sec Loss 2.8830 LearningRate 0.0000 Epoch: 34 Global Step: 712870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:08,874-Speed 6294.84 samples/sec Loss 2.8645 LearningRate 0.0000 Epoch: 34 Global Step: 712880 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:12,125-Speed 6301.04 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 712890 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:15,384-Speed 6285.92 samples/sec Loss 2.8372 LearningRate 0.0000 Epoch: 34 Global Step: 712900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:18,636-Speed 6298.70 samples/sec Loss 2.8306 LearningRate 0.0000 Epoch: 34 Global Step: 712910 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:21,878-Speed 6318.81 samples/sec Loss 2.8081 LearningRate 0.0000 Epoch: 34 Global Step: 712920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:25,133-Speed 6293.58 samples/sec Loss 2.8490 LearningRate 0.0000 Epoch: 34 Global Step: 712930 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:35:28,395-Speed 6279.25 samples/sec Loss 2.8524 LearningRate 0.0000 Epoch: 34 Global Step: 712940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:31,641-Speed 6311.67 samples/sec Loss 2.9227 LearningRate 0.0000 Epoch: 34 Global Step: 712950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:34,899-Speed 6286.96 samples/sec Loss 2.8260 LearningRate 0.0000 Epoch: 34 Global Step: 712960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:38,159-Speed 6283.87 samples/sec Loss 2.8696 LearningRate 0.0000 Epoch: 34 Global Step: 712970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:41,409-Speed 6303.36 samples/sec Loss 2.8001 LearningRate 0.0000 Epoch: 34 Global Step: 712980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:44,664-Speed 6293.60 samples/sec Loss 2.8591 LearningRate 0.0000 Epoch: 34 Global Step: 712990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:47,914-Speed 6302.70 samples/sec Loss 2.8240 LearningRate 0.0000 Epoch: 34 Global Step: 713000 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:51,163-Speed 6304.30 samples/sec Loss 2.8887 LearningRate 0.0000 Epoch: 34 Global Step: 713010 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:35:54,410-Speed 6308.05 samples/sec Loss 2.8804 LearningRate 0.0000 Epoch: 34 Global Step: 713020 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:35:57,649-Speed 6324.24 samples/sec Loss 2.8702 LearningRate 0.0000 Epoch: 34 Global Step: 713030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:00,920-Speed 6262.25 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 713040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:04,176-Speed 6293.35 samples/sec Loss 2.8612 LearningRate 0.0000 Epoch: 34 Global Step: 713050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:36:07,429-Speed 6295.32 samples/sec Loss 2.8545 LearningRate 0.0000 Epoch: 34 Global Step: 713060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:10,682-Speed 6297.92 samples/sec Loss 2.8649 LearningRate 0.0000 Epoch: 34 Global Step: 713070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:13,944-Speed 6282.63 samples/sec Loss 2.9610 LearningRate 0.0000 Epoch: 34 Global Step: 713080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:17,203-Speed 6285.03 samples/sec Loss 2.8560 LearningRate 0.0000 Epoch: 34 Global Step: 713090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:20,455-Speed 6300.08 samples/sec Loss 2.8756 LearningRate 0.0000 Epoch: 34 Global Step: 713100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:23,711-Speed 6291.20 samples/sec Loss 2.8546 LearningRate 0.0000 Epoch: 34 Global Step: 713110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:26,963-Speed 6298.88 samples/sec Loss 2.9180 LearningRate 0.0000 Epoch: 34 Global Step: 713120 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:30,214-Speed 6299.67 samples/sec Loss 2.8515 LearningRate 0.0000 Epoch: 34 Global Step: 713130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:36:33,452-Speed 6328.42 samples/sec Loss 2.8780 LearningRate 0.0000 Epoch: 34 Global Step: 713140 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:36,756-Speed 6200.11 samples/sec Loss 2.8762 LearningRate 0.0000 Epoch: 34 Global Step: 713150 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:40,068-Speed 6183.24 samples/sec Loss 2.8749 LearningRate 0.0000 Epoch: 34 Global Step: 713160 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:43,317-Speed 6305.03 samples/sec Loss 2.9356 LearningRate 0.0000 Epoch: 34 Global Step: 713170 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:46,578-Speed 6282.98 samples/sec Loss 
2.8886 LearningRate 0.0000 Epoch: 34 Global Step: 713180 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:49,825-Speed 6309.36 samples/sec Loss 2.8084 LearningRate 0.0000 Epoch: 34 Global Step: 713190 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:53,082-Speed 6288.27 samples/sec Loss 2.8440 LearningRate 0.0000 Epoch: 34 Global Step: 713200 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:56,336-Speed 6295.40 samples/sec Loss 2.8654 LearningRate 0.0000 Epoch: 34 Global Step: 713210 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:36:59,584-Speed 6306.14 samples/sec Loss 2.8441 LearningRate 0.0000 Epoch: 34 Global Step: 713220 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:02,833-Speed 6304.93 samples/sec Loss 2.8991 LearningRate 0.0000 Epoch: 34 Global Step: 713230 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:06,086-Speed 6297.07 samples/sec Loss 2.8802 LearningRate 0.0000 Epoch: 34 Global Step: 713240 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:37:09,324-Speed 6325.96 samples/sec Loss 2.8445 LearningRate 0.0000 Epoch: 34 Global Step: 713250 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:12,576-Speed 6299.70 samples/sec Loss 2.8372 LearningRate 0.0000 Epoch: 34 Global Step: 713260 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:15,822-Speed 6311.61 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 713270 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:19,075-Speed 6295.95 samples/sec Loss 2.8642 LearningRate 0.0000 Epoch: 34 Global Step: 713280 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:22,335-Speed 6283.29 samples/sec Loss 2.8305 LearningRate 0.0000 Epoch: 34 Global Step: 713290 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:25,590-Speed 6293.66 samples/sec Loss 2.8687 LearningRate 0.0000 Epoch: 34 Global 
Step: 713300 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:28,849-Speed 6285.43 samples/sec Loss 2.9436 LearningRate 0.0000 Epoch: 34 Global Step: 713310 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:32,113-Speed 6275.47 samples/sec Loss 2.8339 LearningRate 0.0000 Epoch: 34 Global Step: 713320 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:35,370-Speed 6289.74 samples/sec Loss 2.8730 LearningRate 0.0000 Epoch: 34 Global Step: 713330 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:38,631-Speed 6281.24 samples/sec Loss 2.8760 LearningRate 0.0000 Epoch: 34 Global Step: 713340 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:41,893-Speed 6280.25 samples/sec Loss 2.9133 LearningRate 0.0000 Epoch: 34 Global Step: 713350 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:37:45,144-Speed 6300.79 samples/sec Loss 2.8407 LearningRate 0.0000 Epoch: 34 Global Step: 713360 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:37:48,382-Speed 6327.74 samples/sec Loss 2.7863 LearningRate 0.0000 Epoch: 34 Global Step: 713370 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:51,662-Speed 6246.06 samples/sec Loss 2.8437 LearningRate 0.0000 Epoch: 34 Global Step: 713380 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:54,918-Speed 6290.08 samples/sec Loss 2.8536 LearningRate 0.0000 Epoch: 34 Global Step: 713390 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:37:58,177-Speed 6285.77 samples/sec Loss 2.8581 LearningRate 0.0000 Epoch: 34 Global Step: 713400 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:01,426-Speed 6303.92 samples/sec Loss 2.8556 LearningRate 0.0000 Epoch: 34 Global Step: 713410 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:04,679-Speed 6298.21 samples/sec Loss 2.8897 LearningRate 0.0000 Epoch: 34 Global Step: 713420 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:38:07,930-Speed 6300.37 samples/sec Loss 2.8668 LearningRate 0.0000 Epoch: 34 Global Step: 713430 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:11,186-Speed 6291.95 samples/sec Loss 2.8512 LearningRate 0.0000 Epoch: 34 Global Step: 713440 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:14,441-Speed 6292.73 samples/sec Loss 2.8755 LearningRate 0.0000 Epoch: 34 Global Step: 713450 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:17,700-Speed 6287.17 samples/sec Loss 2.8292 LearningRate 0.0000 Epoch: 34 Global Step: 713460 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:20,963-Speed 6277.13 samples/sec Loss 2.8767 LearningRate 0.0000 Epoch: 34 Global Step: 713470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:24,214-Speed 6301.94 samples/sec Loss 2.8528 LearningRate 0.0000 Epoch: 34 Global Step: 713480 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:27,471-Speed 6287.45 samples/sec Loss 2.8474 LearningRate 0.0000 Epoch: 34 Global Step: 713490 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:30,743-Speed 6260.68 samples/sec Loss 2.8160 LearningRate 0.0000 Epoch: 34 Global Step: 713500 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:33,993-Speed 6304.46 samples/sec Loss 2.8343 LearningRate 0.0000 Epoch: 34 Global Step: 713510 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:37,241-Speed 6305.62 samples/sec Loss 2.8249 LearningRate 0.0000 Epoch: 34 Global Step: 713520 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:38:40,481-Speed 6322.35 samples/sec Loss 2.8874 LearningRate 0.0000 Epoch: 34 Global Step: 713530 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:43,737-Speed 6291.41 samples/sec Loss 2.8850 LearningRate 0.0000 Epoch: 34 Global Step: 713540 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:38:46,996-Speed 6286.58 samples/sec Loss 2.8690 LearningRate 0.0000 Epoch: 34 Global Step: 713550 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:50,242-Speed 6309.90 samples/sec Loss 2.7954 LearningRate 0.0000 Epoch: 34 Global Step: 713560 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:53,492-Speed 6304.06 samples/sec Loss 2.8828 LearningRate 0.0000 Epoch: 34 Global Step: 713570 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:38:56,747-Speed 6291.46 samples/sec Loss 2.8724 LearningRate 0.0000 Epoch: 34 Global Step: 713580 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:00,005-Speed 6287.50 samples/sec Loss 2.8324 LearningRate 0.0000 Epoch: 34 Global Step: 713590 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:03,263-Speed 6287.40 samples/sec Loss 2.8125 LearningRate 0.0000 Epoch: 34 Global Step: 713600 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:06,523-Speed 6283.51 samples/sec Loss 2.8495 LearningRate 0.0000 Epoch: 34 Global Step: 713610 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:09,770-Speed 6309.91 samples/sec Loss 2.8540 LearningRate 0.0000 Epoch: 34 Global Step: 713620 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:13,023-Speed 6297.75 samples/sec Loss 2.8510 LearningRate 0.0000 Epoch: 34 Global Step: 713630 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:39:16,271-Speed 6306.99 samples/sec Loss 2.8486 LearningRate 0.0000 Epoch: 34 Global Step: 713640 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:39:19,524-Speed 6297.12 samples/sec Loss 2.8481 LearningRate 0.0000 Epoch: 34 Global Step: 713650 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:39:22,780-Speed 6292.52 samples/sec Loss 2.8513 LearningRate 0.0000 Epoch: 34 Global Step: 713660 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:39:26,014-Speed 6333.56 samples/sec Loss 
2.8443 LearningRate 0.0000 Epoch: 34 Global Step: 713670 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:29,287-Speed 6257.63 samples/sec Loss 2.8493 LearningRate 0.0000 Epoch: 34 Global Step: 713680 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:32,545-Speed 6287.31 samples/sec Loss 2.8589 LearningRate 0.0000 Epoch: 34 Global Step: 713690 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:35,804-Speed 6286.18 samples/sec Loss 2.8875 LearningRate 0.0000 Epoch: 34 Global Step: 713700 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:39,059-Speed 6293.28 samples/sec Loss 2.8891 LearningRate 0.0000 Epoch: 34 Global Step: 713710 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:42,314-Speed 6292.59 samples/sec Loss 2.9030 LearningRate 0.0000 Epoch: 34 Global Step: 713720 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:45,570-Speed 6293.09 samples/sec Loss 2.8655 LearningRate 0.0000 Epoch: 34 Global Step: 713730 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:48,836-Speed 6270.66 samples/sec Loss 2.9028 LearningRate 0.0000 Epoch: 34 Global Step: 713740 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:52,098-Speed 6280.94 samples/sec Loss 2.8496 LearningRate 0.0000 Epoch: 34 Global Step: 713750 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:55,350-Speed 6299.16 samples/sec Loss 2.8810 LearningRate 0.0000 Epoch: 34 Global Step: 713760 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:39:58,614-Speed 6275.85 samples/sec Loss 2.8314 LearningRate 0.0000 Epoch: 34 Global Step: 713770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:40:01,878-Speed 6275.73 samples/sec Loss 2.9093 LearningRate 0.0000 Epoch: 34 Global Step: 713780 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:05,138-Speed 6283.27 samples/sec Loss 2.7846 LearningRate 0.0000 Epoch: 34 Global 
Step: 713790 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:08,385-Speed 6308.60 samples/sec Loss 2.7867 LearningRate 0.0000 Epoch: 34 Global Step: 713800 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:11,632-Speed 6308.28 samples/sec Loss 2.8548 LearningRate 0.0000 Epoch: 34 Global Step: 713810 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:14,879-Speed 6308.45 samples/sec Loss 2.8273 LearningRate 0.0000 Epoch: 34 Global Step: 713820 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:18,132-Speed 6297.97 samples/sec Loss 2.8421 LearningRate 0.0000 Epoch: 34 Global Step: 713830 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:21,380-Speed 6306.71 samples/sec Loss 2.8371 LearningRate 0.0000 Epoch: 34 Global Step: 713840 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:24,630-Speed 6303.78 samples/sec Loss 2.9025 LearningRate 0.0000 Epoch: 34 Global Step: 713850 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:27,886-Speed 6291.56 samples/sec Loss 2.8553 LearningRate 0.0000 Epoch: 34 Global Step: 713860 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:31,139-Speed 6297.34 samples/sec Loss 2.7815 LearningRate 0.0000 Epoch: 34 Global Step: 713870 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:34,388-Speed 6305.27 samples/sec Loss 2.8492 LearningRate 0.0000 Epoch: 34 Global Step: 713880 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:40:37,639-Speed 6301.38 samples/sec Loss 2.8431 LearningRate 0.0000 Epoch: 34 Global Step: 713890 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:40:40,876-Speed 6326.50 samples/sec Loss 2.8592 LearningRate 0.0000 Epoch: 34 Global Step: 713900 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:44,135-Speed 6286.64 samples/sec Loss 2.8361 LearningRate 0.0000 Epoch: 34 Global Step: 713910 Fp16 Grad Scale: 4096 
Required: 11 hours Training: 2022-04-03 08:40:47,388-Speed 6296.63 samples/sec Loss 2.8659 LearningRate 0.0000 Epoch: 34 Global Step: 713920 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:50,646-Speed 6288.96 samples/sec Loss 2.8918 LearningRate 0.0000 Epoch: 34 Global Step: 713930 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:53,900-Speed 6293.25 samples/sec Loss 2.8255 LearningRate 0.0000 Epoch: 34 Global Step: 713940 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:40:57,154-Speed 6295.24 samples/sec Loss 2.8403 LearningRate 0.0000 Epoch: 34 Global Step: 713950 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:00,401-Speed 6308.58 samples/sec Loss 2.8573 LearningRate 0.0000 Epoch: 34 Global Step: 713960 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:03,658-Speed 6290.78 samples/sec Loss 2.8618 LearningRate 0.0000 Epoch: 34 Global Step: 713970 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:06,910-Speed 6299.11 samples/sec Loss 2.8805 LearningRate 0.0000 Epoch: 34 Global Step: 713980 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:10,158-Speed 6306.01 samples/sec Loss 2.8450 LearningRate 0.0000 Epoch: 34 Global Step: 713990 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:13,423-Speed 6274.19 samples/sec Loss 2.8910 LearningRate 0.0000 Epoch: 34 Global Step: 714000 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:41:16,682-Speed 6285.31 samples/sec Loss 2.8672 LearningRate 0.0000 Epoch: 34 Global Step: 714010 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:41:19,922-Speed 6323.54 samples/sec Loss 2.8646 LearningRate 0.0000 Epoch: 34 Global Step: 714020 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:23,173-Speed 6299.71 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 34 Global Step: 714030 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 
08:41:26,425-Speed 6300.13 samples/sec Loss 2.8394 LearningRate 0.0000 Epoch: 34 Global Step: 714040 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:29,674-Speed 6304.70 samples/sec Loss 2.8865 LearningRate 0.0000 Epoch: 34 Global Step: 714050 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:32,932-Speed 6287.47 samples/sec Loss 2.8462 LearningRate 0.0000 Epoch: 34 Global Step: 714060 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:36,185-Speed 6297.07 samples/sec Loss 2.8604 LearningRate 0.0000 Epoch: 34 Global Step: 714070 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:39,443-Speed 6288.04 samples/sec Loss 2.8482 LearningRate 0.0000 Epoch: 34 Global Step: 714080 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:42,701-Speed 6288.00 samples/sec Loss 2.8061 LearningRate 0.0000 Epoch: 34 Global Step: 714090 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:45,958-Speed 6287.95 samples/sec Loss 2.8632 LearningRate 0.0000 Epoch: 34 Global Step: 714100 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:49,212-Speed 6296.55 samples/sec Loss 2.8714 LearningRate 0.0000 Epoch: 34 Global Step: 714110 Fp16 Grad Scale: 4096 Required: 11 hours Training: 2022-04-03 08:41:52,471-Speed 6284.35 samples/sec Loss 2.8488 LearningRate 0.0000 Epoch: 34 Global Step: 714120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-03 08:41:55,714-Speed 6317.04 samples/sec Loss 2.8587 LearningRate 0.0000 Epoch: 34 Global Step: 714130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:41:58,969-Speed 6294.09 samples/sec Loss 2.8764 LearningRate 0.0000 Epoch: 34 Global Step: 714140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:02,225-Speed 6290.15 samples/sec Loss 2.8493 LearningRate 0.0000 Epoch: 34 Global Step: 714150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:05,481-Speed 6292.66 samples/sec Loss 
2.8548 LearningRate 0.0000 Epoch: 34 Global Step: 714160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:08,731-Speed 6302.53 samples/sec Loss 2.8622 LearningRate 0.0000 Epoch: 34 Global Step: 714170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:11,980-Speed 6303.54 samples/sec Loss 2.8297 LearningRate 0.0000 Epoch: 34 Global Step: 714180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:15,227-Speed 6308.60 samples/sec Loss 2.8514 LearningRate 0.0000 Epoch: 34 Global Step: 714190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:18,482-Speed 6294.82 samples/sec Loss 2.8639 LearningRate 0.0000 Epoch: 34 Global Step: 714200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:21,733-Speed 6300.75 samples/sec Loss 2.8416 LearningRate 0.0000 Epoch: 34 Global Step: 714210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:24,988-Speed 6293.85 samples/sec Loss 2.8491 LearningRate 0.0000 Epoch: 34 Global Step: 714220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:28,239-Speed 6300.15 samples/sec Loss 2.8700 LearningRate 0.0000 Epoch: 34 Global Step: 714230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:31,500-Speed 6281.88 samples/sec Loss 2.8867 LearningRate 0.0000 Epoch: 34 Global Step: 714240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:34,757-Speed 6289.00 samples/sec Loss 2.8541 LearningRate 0.0000 Epoch: 34 Global Step: 714250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:38,005-Speed 6306.52 samples/sec Loss 2.8321 LearningRate 0.0000 Epoch: 34 Global Step: 714260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:41,257-Speed 6299.69 samples/sec Loss 2.8301 LearningRate 0.0000 Epoch: 34 Global Step: 714270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:44,508-Speed 6300.21 samples/sec Loss 2.8693 LearningRate 0.0000 Epoch: 34 Global 
Step: 714280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:47,767-Speed 6287.55 samples/sec Loss 2.7690 LearningRate 0.0000 Epoch: 34 Global Step: 714290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:51,016-Speed 6304.46 samples/sec Loss 2.8130 LearningRate 0.0000 Epoch: 34 Global Step: 714300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:54,264-Speed 6306.81 samples/sec Loss 2.9459 LearningRate 0.0000 Epoch: 34 Global Step: 714310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:42:57,524-Speed 6283.27 samples/sec Loss 2.8554 LearningRate 0.0000 Epoch: 34 Global Step: 714320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:00,777-Speed 6296.59 samples/sec Loss 2.8533 LearningRate 0.0000 Epoch: 34 Global Step: 714330 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:04,037-Speed 6283.96 samples/sec Loss 2.8449 LearningRate 0.0000 Epoch: 34 Global Step: 714340 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:07,321-Speed 6238.60 samples/sec Loss 2.8446 LearningRate 0.0000 Epoch: 34 Global Step: 714350 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:10,573-Speed 6297.86 samples/sec Loss 2.8925 LearningRate 0.0000 Epoch: 34 Global Step: 714360 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:13,810-Speed 6329.20 samples/sec Loss 2.8214 LearningRate 0.0000 Epoch: 34 Global Step: 714370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:17,059-Speed 6303.30 samples/sec Loss 2.8841 LearningRate 0.0000 Epoch: 34 Global Step: 714380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:20,310-Speed 6302.68 samples/sec Loss 2.8454 LearningRate 0.0000 Epoch: 34 Global Step: 714390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:23,567-Speed 6287.74 samples/sec Loss 2.8423 LearningRate 0.0000 Epoch: 34 Global Step: 714400 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:43:26,828-Speed 6282.29 samples/sec Loss 2.8911 LearningRate 0.0000 Epoch: 34 Global Step: 714410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:30,086-Speed 6287.38 samples/sec Loss 2.8654 LearningRate 0.0000 Epoch: 34 Global Step: 714420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:33,340-Speed 6295.25 samples/sec Loss 2.8107 LearningRate 0.0000 Epoch: 34 Global Step: 714430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:36,594-Speed 6294.31 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 34 Global Step: 714440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:39,861-Speed 6271.98 samples/sec Loss 2.8426 LearningRate 0.0000 Epoch: 34 Global Step: 714450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:43,109-Speed 6305.53 samples/sec Loss 2.8693 LearningRate 0.0000 Epoch: 34 Global Step: 714460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:46,359-Speed 6302.79 samples/sec Loss 2.8698 LearningRate 0.0000 Epoch: 34 Global Step: 714470 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:49,613-Speed 6296.45 samples/sec Loss 2.8452 LearningRate 0.0000 Epoch: 34 Global Step: 714480 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:43:52,851-Speed 6326.48 samples/sec Loss 2.8449 LearningRate 0.0000 Epoch: 34 Global Step: 714490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:56,104-Speed 6296.72 samples/sec Loss 2.8629 LearningRate 0.0000 Epoch: 34 Global Step: 714500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:43:59,361-Speed 6290.78 samples/sec Loss 2.8533 LearningRate 0.0000 Epoch: 34 Global Step: 714510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:02,629-Speed 6267.15 samples/sec Loss 2.8562 LearningRate 0.0000 Epoch: 34 Global Step: 714520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
08:44:05,890-Speed 6281.64 samples/sec Loss 2.8482 LearningRate 0.0000 Epoch: 34 Global Step: 714530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:09,145-Speed 6294.47 samples/sec Loss 2.8470 LearningRate 0.0000 Epoch: 34 Global Step: 714540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:12,399-Speed 6294.95 samples/sec Loss 2.8875 LearningRate 0.0000 Epoch: 34 Global Step: 714550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:15,646-Speed 6307.30 samples/sec Loss 2.8002 LearningRate 0.0000 Epoch: 34 Global Step: 714560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:18,900-Speed 6296.59 samples/sec Loss 2.8163 LearningRate 0.0000 Epoch: 34 Global Step: 714570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:22,153-Speed 6296.43 samples/sec Loss 2.8590 LearningRate 0.0000 Epoch: 34 Global Step: 714580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:25,422-Speed 6266.13 samples/sec Loss 2.8310 LearningRate 0.0000 Epoch: 34 Global Step: 714590 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:44:28,672-Speed 6302.35 samples/sec Loss 2.8965 LearningRate 0.0000 Epoch: 34 Global Step: 714600 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:44:31,909-Speed 6328.33 samples/sec Loss 2.9076 LearningRate 0.0000 Epoch: 34 Global Step: 714610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:35,166-Speed 6289.87 samples/sec Loss 2.8742 LearningRate 0.0000 Epoch: 34 Global Step: 714620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:38,413-Speed 6310.00 samples/sec Loss 2.7982 LearningRate 0.0000 Epoch: 34 Global Step: 714630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:41,660-Speed 6308.02 samples/sec Loss 2.8249 LearningRate 0.0000 Epoch: 34 Global Step: 714640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:44,911-Speed 6300.56 samples/sec Loss 
2.8307 LearningRate 0.0000 Epoch: 34 Global Step: 714650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:48,155-Speed 6314.61 samples/sec Loss 2.9051 LearningRate 0.0000 Epoch: 34 Global Step: 714660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:51,411-Speed 6291.71 samples/sec Loss 2.9054 LearningRate 0.0000 Epoch: 34 Global Step: 714670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:54,669-Speed 6287.57 samples/sec Loss 2.8585 LearningRate 0.0000 Epoch: 34 Global Step: 714680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:44:57,925-Speed 6290.86 samples/sec Loss 2.8182 LearningRate 0.0000 Epoch: 34 Global Step: 714690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:01,179-Speed 6295.82 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 34 Global Step: 714700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:04,428-Speed 6303.72 samples/sec Loss 2.7913 LearningRate 0.0000 Epoch: 34 Global Step: 714710 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:45:07,673-Speed 6314.36 samples/sec Loss 2.9081 LearningRate 0.0000 Epoch: 34 Global Step: 714720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:10,933-Speed 6282.32 samples/sec Loss 2.9214 LearningRate 0.0000 Epoch: 34 Global Step: 714730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:14,185-Speed 6300.56 samples/sec Loss 2.8211 LearningRate 0.0000 Epoch: 34 Global Step: 714740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:17,438-Speed 6295.78 samples/sec Loss 2.8925 LearningRate 0.0000 Epoch: 34 Global Step: 714750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:20,692-Speed 6296.84 samples/sec Loss 2.8800 LearningRate 0.0000 Epoch: 34 Global Step: 714760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:23,950-Speed 6286.44 samples/sec Loss 2.8391 LearningRate 0.0000 Epoch: 34 Global 
Step: 714770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:27,209-Speed 6286.97 samples/sec Loss 2.8235 LearningRate 0.0000 Epoch: 34 Global Step: 714780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:30,463-Speed 6294.73 samples/sec Loss 2.8666 LearningRate 0.0000 Epoch: 34 Global Step: 714790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:33,718-Speed 6292.16 samples/sec Loss 2.8586 LearningRate 0.0000 Epoch: 34 Global Step: 714800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:36,959-Speed 6319.98 samples/sec Loss 2.8200 LearningRate 0.0000 Epoch: 34 Global Step: 714810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:40,214-Speed 6294.17 samples/sec Loss 2.8661 LearningRate 0.0000 Epoch: 34 Global Step: 714820 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:45:43,472-Speed 6286.99 samples/sec Loss 2.8473 LearningRate 0.0000 Epoch: 34 Global Step: 714830 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:45:46,722-Speed 6303.41 samples/sec Loss 2.8554 LearningRate 0.0000 Epoch: 34 Global Step: 714840 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:45:49,955-Speed 6335.65 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 34 Global Step: 714850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:53,209-Speed 6294.67 samples/sec Loss 2.8608 LearningRate 0.0000 Epoch: 34 Global Step: 714860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:56,469-Speed 6285.35 samples/sec Loss 2.8513 LearningRate 0.0000 Epoch: 34 Global Step: 714870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:45:59,721-Speed 6299.11 samples/sec Loss 2.7900 LearningRate 0.0000 Epoch: 34 Global Step: 714880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:02,975-Speed 6293.96 samples/sec Loss 2.8854 LearningRate 0.0000 Epoch: 34 Global Step: 714890 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:46:06,234-Speed 6286.12 samples/sec Loss 2.8790 LearningRate 0.0000 Epoch: 34 Global Step: 714900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:09,490-Speed 6292.13 samples/sec Loss 2.8480 LearningRate 0.0000 Epoch: 34 Global Step: 714910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:12,748-Speed 6285.99 samples/sec Loss 2.8246 LearningRate 0.0000 Epoch: 34 Global Step: 714920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:16,012-Speed 6277.05 samples/sec Loss 2.8671 LearningRate 0.0000 Epoch: 34 Global Step: 714930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:19,271-Speed 6286.24 samples/sec Loss 2.8403 LearningRate 0.0000 Epoch: 34 Global Step: 714940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:22,529-Speed 6287.97 samples/sec Loss 2.8103 LearningRate 0.0000 Epoch: 34 Global Step: 714950 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:46:25,785-Speed 6291.93 samples/sec Loss 2.8787 LearningRate 0.0000 Epoch: 34 Global Step: 714960 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:29,043-Speed 6285.90 samples/sec Loss 2.8511 LearningRate 0.0000 Epoch: 34 Global Step: 714970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:32,290-Speed 6309.75 samples/sec Loss 2.8145 LearningRate 0.0000 Epoch: 34 Global Step: 714980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:35,536-Speed 6310.42 samples/sec Loss 2.8113 LearningRate 0.0000 Epoch: 34 Global Step: 714990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:38,787-Speed 6301.56 samples/sec Loss 2.8582 LearningRate 0.0000 Epoch: 34 Global Step: 715000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:42,048-Speed 6280.40 samples/sec Loss 2.8921 LearningRate 0.0000 Epoch: 34 Global Step: 715010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
08:46:45,301-Speed 6298.45 samples/sec Loss 2.8644 LearningRate 0.0000 Epoch: 34 Global Step: 715020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:48,559-Speed 6286.23 samples/sec Loss 2.8831 LearningRate 0.0000 Epoch: 34 Global Step: 715030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:51,815-Speed 6292.63 samples/sec Loss 2.8168 LearningRate 0.0000 Epoch: 34 Global Step: 715040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:55,079-Speed 6274.51 samples/sec Loss 2.8836 LearningRate 0.0000 Epoch: 34 Global Step: 715050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:46:58,338-Speed 6287.20 samples/sec Loss 2.8550 LearningRate 0.0000 Epoch: 34 Global Step: 715060 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:47:01,598-Speed 6283.16 samples/sec Loss 2.8349 LearningRate 0.0000 Epoch: 34 Global Step: 715070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:47:04,844-Speed 6310.04 samples/sec Loss 2.8662 LearningRate 0.0000 Epoch: 34 Global Step: 715080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:08,095-Speed 6301.52 samples/sec Loss 2.8594 LearningRate 0.0000 Epoch: 34 Global Step: 715090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:11,351-Speed 6290.87 samples/sec Loss 2.8591 LearningRate 0.0000 Epoch: 34 Global Step: 715100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:14,627-Speed 6252.01 samples/sec Loss 2.8111 LearningRate 0.0000 Epoch: 34 Global Step: 715110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:17,963-Speed 6140.30 samples/sec Loss 2.8287 LearningRate 0.0000 Epoch: 34 Global Step: 715120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:21,218-Speed 6295.07 samples/sec Loss 2.8246 LearningRate 0.0000 Epoch: 34 Global Step: 715130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:24,469-Speed 6299.90 samples/sec Loss 
2.8042 LearningRate 0.0000 Epoch: 34 Global Step: 715140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:27,725-Speed 6291.70 samples/sec Loss 2.8364 LearningRate 0.0000 Epoch: 34 Global Step: 715150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:30,980-Speed 6292.14 samples/sec Loss 2.8386 LearningRate 0.0000 Epoch: 34 Global Step: 715160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:34,241-Speed 6283.59 samples/sec Loss 2.7953 LearningRate 0.0000 Epoch: 34 Global Step: 715170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:37,489-Speed 6305.96 samples/sec Loss 2.8336 LearningRate 0.0000 Epoch: 34 Global Step: 715180 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:47:40,738-Speed 6306.14 samples/sec Loss 2.8497 LearningRate 0.0000 Epoch: 34 Global Step: 715190 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:47:43,978-Speed 6321.51 samples/sec Loss 2.8518 LearningRate 0.0000 Epoch: 34 Global Step: 715200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:47,231-Speed 6298.13 samples/sec Loss 2.8229 LearningRate 0.0000 Epoch: 34 Global Step: 715210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:50,482-Speed 6300.99 samples/sec Loss 2.8565 LearningRate 0.0000 Epoch: 34 Global Step: 715220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:53,734-Speed 6299.07 samples/sec Loss 2.8198 LearningRate 0.0000 Epoch: 34 Global Step: 715230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:47:56,998-Speed 6275.42 samples/sec Loss 2.8691 LearningRate 0.0000 Epoch: 34 Global Step: 715240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:00,253-Speed 6294.34 samples/sec Loss 2.8641 LearningRate 0.0000 Epoch: 34 Global Step: 715250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:03,512-Speed 6285.06 samples/sec Loss 2.8603 LearningRate 0.0000 Epoch: 34 Global 
Step: 715260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:06,774-Speed 6279.12 samples/sec Loss 2.8843 LearningRate 0.0000 Epoch: 34 Global Step: 715270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:10,028-Speed 6296.25 samples/sec Loss 2.8034 LearningRate 0.0000 Epoch: 34 Global Step: 715280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:13,290-Speed 6278.25 samples/sec Loss 2.8594 LearningRate 0.0000 Epoch: 34 Global Step: 715290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:16,533-Speed 6317.37 samples/sec Loss 2.8972 LearningRate 0.0000 Epoch: 34 Global Step: 715300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:19,785-Speed 6299.65 samples/sec Loss 2.8585 LearningRate 0.0000 Epoch: 34 Global Step: 715310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:23,041-Speed 6289.82 samples/sec Loss 2.8441 LearningRate 0.0000 Epoch: 34 Global Step: 715320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:26,298-Speed 6290.81 samples/sec Loss 2.8502 LearningRate 0.0000 Epoch: 34 Global Step: 715330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:29,546-Speed 6305.43 samples/sec Loss 2.9129 LearningRate 0.0000 Epoch: 34 Global Step: 715340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:32,793-Speed 6309.09 samples/sec Loss 2.8636 LearningRate 0.0000 Epoch: 34 Global Step: 715350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:36,046-Speed 6296.15 samples/sec Loss 2.7526 LearningRate 0.0000 Epoch: 34 Global Step: 715360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:39,299-Speed 6297.79 samples/sec Loss 2.8688 LearningRate 0.0000 Epoch: 34 Global Step: 715370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:42,551-Speed 6299.34 samples/sec Loss 2.8774 LearningRate 0.0000 Epoch: 34 Global Step: 715380 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:48:45,799-Speed 6306.09 samples/sec Loss 2.8221 LearningRate 0.0000 Epoch: 34 Global Step: 715390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:49,038-Speed 6325.53 samples/sec Loss 2.8286 LearningRate 0.0000 Epoch: 34 Global Step: 715400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:52,288-Speed 6303.84 samples/sec Loss 2.8311 LearningRate 0.0000 Epoch: 34 Global Step: 715410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:55,546-Speed 6286.82 samples/sec Loss 2.8487 LearningRate 0.0000 Epoch: 34 Global Step: 715420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:48:58,801-Speed 6293.58 samples/sec Loss 2.8147 LearningRate 0.0000 Epoch: 34 Global Step: 715430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:02,057-Speed 6290.73 samples/sec Loss 2.8331 LearningRate 0.0000 Epoch: 34 Global Step: 715440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:05,306-Speed 6305.63 samples/sec Loss 2.8479 LearningRate 0.0000 Epoch: 34 Global Step: 715450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:08,563-Speed 6289.80 samples/sec Loss 2.8741 LearningRate 0.0000 Epoch: 34 Global Step: 715460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:11,814-Speed 6300.96 samples/sec Loss 2.8410 LearningRate 0.0000 Epoch: 34 Global Step: 715470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:15,066-Speed 6299.32 samples/sec Loss 2.8683 LearningRate 0.0000 Epoch: 34 Global Step: 715480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:18,326-Speed 6282.37 samples/sec Loss 2.8501 LearningRate 0.0000 Epoch: 34 Global Step: 715490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:21,580-Speed 6294.87 samples/sec Loss 2.8709 LearningRate 0.0000 Epoch: 34 Global Step: 715500 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 
08:49:24,838-Speed 6287.58 samples/sec Loss 2.8341 LearningRate 0.0000 Epoch: 34 Global Step: 715510 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:49:28,085-Speed 6308.58 samples/sec Loss 2.8664 LearningRate 0.0000 Epoch: 34 Global Step: 715520 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:49:31,323-Speed 6326.53 samples/sec Loss 2.9010 LearningRate 0.0000 Epoch: 34 Global Step: 715530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:34,579-Speed 6291.39 samples/sec Loss 2.8630 LearningRate 0.0000 Epoch: 34 Global Step: 715540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:37,835-Speed 6292.50 samples/sec Loss 2.7859 LearningRate 0.0000 Epoch: 34 Global Step: 715550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:41,084-Speed 6303.60 samples/sec Loss 2.8259 LearningRate 0.0000 Epoch: 34 Global Step: 715560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:44,339-Speed 6293.81 samples/sec Loss 2.8370 LearningRate 0.0000 Epoch: 34 Global Step: 715570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:47,588-Speed 6305.10 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 715580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:50,835-Speed 6308.08 samples/sec Loss 2.7835 LearningRate 0.0000 Epoch: 34 Global Step: 715590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:54,088-Speed 6297.18 samples/sec Loss 2.8414 LearningRate 0.0000 Epoch: 34 Global Step: 715600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:49:57,344-Speed 6290.44 samples/sec Loss 2.8273 LearningRate 0.0000 Epoch: 34 Global Step: 715610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:00,610-Speed 6273.71 samples/sec Loss 2.8494 LearningRate 0.0000 Epoch: 34 Global Step: 715620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:03,849-Speed 6325.73 samples/sec Loss 
2.7695 LearningRate 0.0000 Epoch: 34 Global Step: 715630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:07,101-Speed 6297.19 samples/sec Loss 2.8327 LearningRate 0.0000 Epoch: 34 Global Step: 715640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:10,354-Speed 6298.56 samples/sec Loss 2.8826 LearningRate 0.0000 Epoch: 34 Global Step: 715650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:13,602-Speed 6307.20 samples/sec Loss 2.8239 LearningRate 0.0000 Epoch: 34 Global Step: 715660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:16,857-Speed 6292.19 samples/sec Loss 2.7909 LearningRate 0.0000 Epoch: 34 Global Step: 715670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:20,120-Speed 6277.03 samples/sec Loss 2.8317 LearningRate 0.0000 Epoch: 34 Global Step: 715680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:23,369-Speed 6304.68 samples/sec Loss 2.8327 LearningRate 0.0000 Epoch: 34 Global Step: 715690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:26,617-Speed 6308.50 samples/sec Loss 2.8838 LearningRate 0.0000 Epoch: 34 Global Step: 715700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:29,864-Speed 6308.26 samples/sec Loss 2.8635 LearningRate 0.0000 Epoch: 34 Global Step: 715710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:33,119-Speed 6292.97 samples/sec Loss 2.8437 LearningRate 0.0000 Epoch: 34 Global Step: 715720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:36,374-Speed 6293.35 samples/sec Loss 2.8500 LearningRate 0.0000 Epoch: 34 Global Step: 715730 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:50:39,625-Speed 6301.49 samples/sec Loss 2.8502 LearningRate 0.0000 Epoch: 34 Global Step: 715740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:50:42,871-Speed 6309.79 samples/sec Loss 2.8347 LearningRate 0.0000 Epoch: 34 Global 
Step: 715750 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:50:46,121-Speed 6302.63 samples/sec Loss 2.8428 LearningRate 0.0000 Epoch: 34 Global Step: 715760 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:50:49,378-Speed 6289.94 samples/sec Loss 2.8564 LearningRate 0.0000 Epoch: 34 Global Step: 715770 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:50:52,615-Speed 6327.28 samples/sec Loss 2.8312 LearningRate 0.0000 Epoch: 34 Global Step: 715780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:55,868-Speed 6297.92 samples/sec Loss 2.7929 LearningRate 0.0000 Epoch: 34 Global Step: 715790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:50:59,129-Speed 6282.01 samples/sec Loss 2.8387 LearningRate 0.0000 Epoch: 34 Global Step: 715800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:02,380-Speed 6300.18 samples/sec Loss 2.8688 LearningRate 0.0000 Epoch: 34 Global Step: 715810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:05,636-Speed 6290.65 samples/sec Loss 2.7876 LearningRate 0.0000 Epoch: 34 Global Step: 715820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:08,889-Speed 6298.78 samples/sec Loss 2.8906 LearningRate 0.0000 Epoch: 34 Global Step: 715830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:12,151-Speed 6279.54 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 34 Global Step: 715840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:15,403-Speed 6298.91 samples/sec Loss 2.8474 LearningRate 0.0000 Epoch: 34 Global Step: 715850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:18,663-Speed 6284.61 samples/sec Loss 2.8502 LearningRate 0.0000 Epoch: 34 Global Step: 715860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:21,928-Speed 6274.32 samples/sec Loss 2.8234 LearningRate 0.0000 Epoch: 34 Global Step: 715870 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:51:25,188-Speed 6283.46 samples/sec Loss 2.9214 LearningRate 0.0000 Epoch: 34 Global Step: 715880 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:51:28,447-Speed 6285.85 samples/sec Loss 2.9026 LearningRate 0.0000 Epoch: 34 Global Step: 715890 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:51:31,708-Speed 6280.26 samples/sec Loss 2.8364 LearningRate 0.0000 Epoch: 34 Global Step: 715900 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:51:34,949-Speed 6321.66 samples/sec Loss 2.8574 LearningRate 0.0000 Epoch: 34 Global Step: 715910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:38,203-Speed 6295.43 samples/sec Loss 2.8633 LearningRate 0.0000 Epoch: 34 Global Step: 715920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:41,463-Speed 6283.38 samples/sec Loss 2.7973 LearningRate 0.0000 Epoch: 34 Global Step: 715930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:44,715-Speed 6299.17 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 34 Global Step: 715940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:47,974-Speed 6285.27 samples/sec Loss 2.9018 LearningRate 0.0000 Epoch: 34 Global Step: 715950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:51,227-Speed 6295.75 samples/sec Loss 2.8516 LearningRate 0.0000 Epoch: 34 Global Step: 715960 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:54,477-Speed 6303.84 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 34 Global Step: 715970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:51:57,732-Speed 6293.58 samples/sec Loss 2.8316 LearningRate 0.0000 Epoch: 34 Global Step: 715980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:00,981-Speed 6305.58 samples/sec Loss 2.8958 LearningRate 0.0000 Epoch: 34 Global Step: 715990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
08:52:04,237-Speed 6289.73 samples/sec Loss 2.8432 LearningRate 0.0000 Epoch: 34 Global Step: 716000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:07,475-Speed 6326.11 samples/sec Loss 2.8165 LearningRate 0.0000 Epoch: 34 Global Step: 716010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:10,726-Speed 6302.03 samples/sec Loss 2.8799 LearningRate 0.0000 Epoch: 34 Global Step: 716020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:13,977-Speed 6300.73 samples/sec Loss 2.8138 LearningRate 0.0000 Epoch: 34 Global Step: 716030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:17,226-Speed 6304.21 samples/sec Loss 2.8479 LearningRate 0.0000 Epoch: 34 Global Step: 716040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:20,487-Speed 6281.57 samples/sec Loss 2.8780 LearningRate 0.0000 Epoch: 34 Global Step: 716050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:23,745-Speed 6288.07 samples/sec Loss 2.8516 LearningRate 0.0000 Epoch: 34 Global Step: 716060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:26,997-Speed 6300.89 samples/sec Loss 2.8665 LearningRate 0.0000 Epoch: 34 Global Step: 716070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:30,256-Speed 6285.14 samples/sec Loss 2.8069 LearningRate 0.0000 Epoch: 34 Global Step: 716080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:33,513-Speed 6289.00 samples/sec Loss 2.8267 LearningRate 0.0000 Epoch: 34 Global Step: 716090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:36,774-Speed 6282.26 samples/sec Loss 2.8534 LearningRate 0.0000 Epoch: 34 Global Step: 716100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:40,021-Speed 6307.72 samples/sec Loss 2.8902 LearningRate 0.0000 Epoch: 34 Global Step: 716110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:43,272-Speed 6301.07 samples/sec Loss 
2.9031 LearningRate 0.0000 Epoch: 34 Global Step: 716120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:46,524-Speed 6300.36 samples/sec Loss 2.8112 LearningRate 0.0000 Epoch: 34 Global Step: 716130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:49,778-Speed 6293.98 samples/sec Loss 2.8333 LearningRate 0.0000 Epoch: 34 Global Step: 716140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:53,035-Speed 6290.01 samples/sec Loss 2.8424 LearningRate 0.0000 Epoch: 34 Global Step: 716150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:56,297-Speed 6278.96 samples/sec Loss 2.8912 LearningRate 0.0000 Epoch: 34 Global Step: 716160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:52:59,550-Speed 6298.46 samples/sec Loss 2.8334 LearningRate 0.0000 Epoch: 34 Global Step: 716170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:02,805-Speed 6292.11 samples/sec Loss 2.8810 LearningRate 0.0000 Epoch: 34 Global Step: 716180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:06,054-Speed 6305.23 samples/sec Loss 2.8300 LearningRate 0.0000 Epoch: 34 Global Step: 716190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:09,299-Speed 6312.55 samples/sec Loss 2.7665 LearningRate 0.0000 Epoch: 34 Global Step: 716200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:12,553-Speed 6294.25 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 716210 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:53:15,805-Speed 6299.86 samples/sec Loss 2.8694 LearningRate 0.0000 Epoch: 34 Global Step: 716220 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:53:19,043-Speed 6326.47 samples/sec Loss 2.7803 LearningRate 0.0000 Epoch: 34 Global Step: 716230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:22,297-Speed 6295.18 samples/sec Loss 2.8143 LearningRate 0.0000 Epoch: 34 Global 
Step: 716240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:25,546-Speed 6305.38 samples/sec Loss 2.8307 LearningRate 0.0000 Epoch: 34 Global Step: 716250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:28,807-Speed 6281.30 samples/sec Loss 2.8615 LearningRate 0.0000 Epoch: 34 Global Step: 716260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:32,098-Speed 6225.94 samples/sec Loss 2.8769 LearningRate 0.0000 Epoch: 34 Global Step: 716270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:35,356-Speed 6287.77 samples/sec Loss 2.8214 LearningRate 0.0000 Epoch: 34 Global Step: 716280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:38,606-Speed 6302.24 samples/sec Loss 2.8596 LearningRate 0.0000 Epoch: 34 Global Step: 716290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:41,851-Speed 6311.99 samples/sec Loss 2.8633 LearningRate 0.0000 Epoch: 34 Global Step: 716300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:45,104-Speed 6298.35 samples/sec Loss 2.8279 LearningRate 0.0000 Epoch: 34 Global Step: 716310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:48,358-Speed 6295.15 samples/sec Loss 2.8400 LearningRate 0.0000 Epoch: 34 Global Step: 716320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:51,607-Speed 6304.60 samples/sec Loss 2.8560 LearningRate 0.0000 Epoch: 34 Global Step: 716330 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:53:54,840-Speed 6336.46 samples/sec Loss 2.8196 LearningRate 0.0000 Epoch: 34 Global Step: 716340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:53:58,099-Speed 6283.94 samples/sec Loss 2.8298 LearningRate 0.0000 Epoch: 34 Global Step: 716350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:01,348-Speed 6306.90 samples/sec Loss 2.8573 LearningRate 0.0000 Epoch: 34 Global Step: 716360 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:54:04,611-Speed 6276.80 samples/sec Loss 2.8453 LearningRate 0.0000 Epoch: 34 Global Step: 716370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:07,866-Speed 6293.61 samples/sec Loss 2.8570 LearningRate 0.0000 Epoch: 34 Global Step: 716380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:11,116-Speed 6301.93 samples/sec Loss 2.8306 LearningRate 0.0000 Epoch: 34 Global Step: 716390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:14,371-Speed 6294.53 samples/sec Loss 2.8854 LearningRate 0.0000 Epoch: 34 Global Step: 716400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:17,625-Speed 6294.14 samples/sec Loss 2.8413 LearningRate 0.0000 Epoch: 34 Global Step: 716410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:20,878-Speed 6296.79 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 716420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:24,134-Speed 6292.50 samples/sec Loss 2.8767 LearningRate 0.0000 Epoch: 34 Global Step: 716430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:27,385-Speed 6300.67 samples/sec Loss 2.8006 LearningRate 0.0000 Epoch: 34 Global Step: 716440 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:54:30,628-Speed 6316.71 samples/sec Loss 2.8660 LearningRate 0.0000 Epoch: 34 Global Step: 716450 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:54:33,865-Speed 6327.60 samples/sec Loss 2.8086 LearningRate 0.0000 Epoch: 34 Global Step: 716460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:37,123-Speed 6287.67 samples/sec Loss 2.8532 LearningRate 0.0000 Epoch: 34 Global Step: 716470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:40,383-Speed 6284.58 samples/sec Loss 2.8478 LearningRate 0.0000 Epoch: 34 Global Step: 716480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
08:54:43,631-Speed 6306.01 samples/sec Loss 2.8426 LearningRate 0.0000 Epoch: 34 Global Step: 716490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:46,883-Speed 6299.26 samples/sec Loss 2.8678 LearningRate 0.0000 Epoch: 34 Global Step: 716500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:50,141-Speed 6288.18 samples/sec Loss 2.8755 LearningRate 0.0000 Epoch: 34 Global Step: 716510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:53,393-Speed 6299.25 samples/sec Loss 2.8226 LearningRate 0.0000 Epoch: 34 Global Step: 716520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:56,636-Speed 6316.74 samples/sec Loss 2.8000 LearningRate 0.0000 Epoch: 34 Global Step: 716530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:54:59,895-Speed 6285.44 samples/sec Loss 2.8357 LearningRate 0.0000 Epoch: 34 Global Step: 716540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:03,177-Speed 6241.39 samples/sec Loss 2.8834 LearningRate 0.0000 Epoch: 34 Global Step: 716550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:06,432-Speed 6292.30 samples/sec Loss 2.8579 LearningRate 0.0000 Epoch: 34 Global Step: 716560 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:55:09,677-Speed 6311.85 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 716570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:12,930-Speed 6297.32 samples/sec Loss 2.8388 LearningRate 0.0000 Epoch: 34 Global Step: 716580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:16,182-Speed 6300.07 samples/sec Loss 2.8403 LearningRate 0.0000 Epoch: 34 Global Step: 716590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:19,437-Speed 6293.78 samples/sec Loss 2.8544 LearningRate 0.0000 Epoch: 34 Global Step: 716600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:22,692-Speed 6292.77 samples/sec Loss 
2.8197 LearningRate 0.0000 Epoch: 34 Global Step: 716610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:25,960-Speed 6268.88 samples/sec Loss 2.8349 LearningRate 0.0000 Epoch: 34 Global Step: 716620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:29,215-Speed 6291.55 samples/sec Loss 2.8759 LearningRate 0.0000 Epoch: 34 Global Step: 716630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:32,476-Speed 6283.26 samples/sec Loss 2.8576 LearningRate 0.0000 Epoch: 34 Global Step: 716640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:35,727-Speed 6299.69 samples/sec Loss 2.9021 LearningRate 0.0000 Epoch: 34 Global Step: 716650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:38,983-Speed 6291.87 samples/sec Loss 2.8625 LearningRate 0.0000 Epoch: 34 Global Step: 716660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:42,234-Speed 6301.10 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 34 Global Step: 716670 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:55:45,472-Speed 6325.81 samples/sec Loss 2.8300 LearningRate 0.0000 Epoch: 34 Global Step: 716680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:48,726-Speed 6294.96 samples/sec Loss 2.8537 LearningRate 0.0000 Epoch: 34 Global Step: 716690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:51,982-Speed 6292.25 samples/sec Loss 2.8096 LearningRate 0.0000 Epoch: 34 Global Step: 716700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:55,236-Speed 6296.55 samples/sec Loss 2.8293 LearningRate 0.0000 Epoch: 34 Global Step: 716710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:55:58,493-Speed 6288.30 samples/sec Loss 2.8270 LearningRate 0.0000 Epoch: 34 Global Step: 716720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:01,751-Speed 6288.94 samples/sec Loss 2.8260 LearningRate 0.0000 Epoch: 34 Global 
Step: 716730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:05,010-Speed 6285.12 samples/sec Loss 2.8406 LearningRate 0.0000 Epoch: 34 Global Step: 716740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:08,266-Speed 6290.96 samples/sec Loss 2.7939 LearningRate 0.0000 Epoch: 34 Global Step: 716750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:11,520-Speed 6294.13 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 34 Global Step: 716760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:14,778-Speed 6289.03 samples/sec Loss 2.8361 LearningRate 0.0000 Epoch: 34 Global Step: 716770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:18,034-Speed 6290.73 samples/sec Loss 2.8262 LearningRate 0.0000 Epoch: 34 Global Step: 716780 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:21,296-Speed 6280.30 samples/sec Loss 2.7976 LearningRate 0.0000 Epoch: 34 Global Step: 716790 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:24,551-Speed 6292.93 samples/sec Loss 2.8308 LearningRate 0.0000 Epoch: 34 Global Step: 716800 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:27,803-Speed 6299.47 samples/sec Loss 2.8622 LearningRate 0.0000 Epoch: 34 Global Step: 716810 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:31,057-Speed 6294.22 samples/sec Loss 2.8152 LearningRate 0.0000 Epoch: 34 Global Step: 716820 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:34,309-Speed 6299.12 samples/sec Loss 2.8554 LearningRate 0.0000 Epoch: 34 Global Step: 716830 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:37,569-Speed 6284.71 samples/sec Loss 2.8802 LearningRate 0.0000 Epoch: 34 Global Step: 716840 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:56:40,806-Speed 6326.33 samples/sec Loss 2.8278 LearningRate 0.0000 Epoch: 34 Global Step: 716850 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:56:44,064-Speed 6288.53 samples/sec Loss 2.8293 LearningRate 0.0000 Epoch: 34 Global Step: 716860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:47,316-Speed 6298.38 samples/sec Loss 2.8907 LearningRate 0.0000 Epoch: 34 Global Step: 716870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:50,587-Speed 6261.90 samples/sec Loss 2.8391 LearningRate 0.0000 Epoch: 34 Global Step: 716880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:53,840-Speed 6297.07 samples/sec Loss 2.8435 LearningRate 0.0000 Epoch: 34 Global Step: 716890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:56:57,086-Speed 6310.63 samples/sec Loss 2.8030 LearningRate 0.0000 Epoch: 34 Global Step: 716900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:00,343-Speed 6291.27 samples/sec Loss 2.8042 LearningRate 0.0000 Epoch: 34 Global Step: 716910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:03,601-Speed 6285.67 samples/sec Loss 2.8041 LearningRate 0.0000 Epoch: 34 Global Step: 716920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:06,863-Speed 6280.73 samples/sec Loss 2.8809 LearningRate 0.0000 Epoch: 34 Global Step: 716930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:10,114-Speed 6301.01 samples/sec Loss 2.8423 LearningRate 0.0000 Epoch: 34 Global Step: 716940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:13,368-Speed 6296.56 samples/sec Loss 2.8145 LearningRate 0.0000 Epoch: 34 Global Step: 716950 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:57:16,624-Speed 6290.01 samples/sec Loss 2.8285 LearningRate 0.0000 Epoch: 34 Global Step: 716960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:57:19,883-Speed 6286.81 samples/sec Loss 2.8019 LearningRate 0.0000 Epoch: 34 Global Step: 716970 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 
08:57:23,142-Speed 6285.00 samples/sec Loss 2.8271 LearningRate 0.0000 Epoch: 34 Global Step: 716980 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:57:26,394-Speed 6299.52 samples/sec Loss 2.8642 LearningRate 0.0000 Epoch: 34 Global Step: 716990 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:57:29,648-Speed 6294.33 samples/sec Loss 2.8478 LearningRate 0.0000 Epoch: 34 Global Step: 717000 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:57:32,881-Speed 6335.55 samples/sec Loss 2.8064 LearningRate 0.0000 Epoch: 34 Global Step: 717010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:36,128-Speed 6310.17 samples/sec Loss 2.8433 LearningRate 0.0000 Epoch: 34 Global Step: 717020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:39,384-Speed 6291.50 samples/sec Loss 2.8601 LearningRate 0.0000 Epoch: 34 Global Step: 717030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:42,636-Speed 6297.55 samples/sec Loss 2.8656 LearningRate 0.0000 Epoch: 34 Global Step: 717040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:45,898-Speed 6280.97 samples/sec Loss 2.8193 LearningRate 0.0000 Epoch: 34 Global Step: 717050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:49,157-Speed 6285.47 samples/sec Loss 2.8252 LearningRate 0.0000 Epoch: 34 Global Step: 717060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:52,409-Speed 6298.09 samples/sec Loss 2.8288 LearningRate 0.0000 Epoch: 34 Global Step: 717070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:55,671-Speed 6279.12 samples/sec Loss 2.8736 LearningRate 0.0000 Epoch: 34 Global Step: 717080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:57:58,922-Speed 6302.54 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 717090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:02,183-Speed 6282.09 samples/sec Loss 
2.8468 LearningRate 0.0000 Epoch: 34 Global Step: 717100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:05,436-Speed 6295.32 samples/sec Loss 2.7925 LearningRate 0.0000 Epoch: 34 Global Step: 717110 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:08,689-Speed 6298.11 samples/sec Loss 2.8636 LearningRate 0.0000 Epoch: 34 Global Step: 717120 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:11,947-Speed 6287.05 samples/sec Loss 2.8589 LearningRate 0.0000 Epoch: 34 Global Step: 717130 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:15,204-Speed 6289.82 samples/sec Loss 2.8319 LearningRate 0.0000 Epoch: 34 Global Step: 717140 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:18,499-Speed 6216.07 samples/sec Loss 2.8425 LearningRate 0.0000 Epoch: 34 Global Step: 717150 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:21,733-Speed 6335.03 samples/sec Loss 2.8016 LearningRate 0.0000 Epoch: 34 Global Step: 717160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:24,987-Speed 6294.82 samples/sec Loss 2.7798 LearningRate 0.0000 Epoch: 34 Global Step: 717170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:28,244-Speed 6290.07 samples/sec Loss 2.8371 LearningRate 0.0000 Epoch: 34 Global Step: 717180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:31,498-Speed 6294.85 samples/sec Loss 2.7789 LearningRate 0.0000 Epoch: 34 Global Step: 717190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:34,749-Speed 6301.15 samples/sec Loss 2.8419 LearningRate 0.0000 Epoch: 34 Global Step: 717200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:37,998-Speed 6305.11 samples/sec Loss 2.8309 LearningRate 0.0000 Epoch: 34 Global Step: 717210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:41,253-Speed 6294.45 samples/sec Loss 2.8525 LearningRate 0.0000 Epoch: 34 Global 
Step: 717220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:44,510-Speed 6289.13 samples/sec Loss 2.8362 LearningRate 0.0000 Epoch: 34 Global Step: 717230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:47,756-Speed 6309.66 samples/sec Loss 2.8057 LearningRate 0.0000 Epoch: 34 Global Step: 717240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:51,019-Speed 6278.79 samples/sec Loss 2.8781 LearningRate 0.0000 Epoch: 34 Global Step: 717250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:58:54,271-Speed 6298.04 samples/sec Loss 2.8482 LearningRate 0.0000 Epoch: 34 Global Step: 717260 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:58:57,509-Speed 6326.41 samples/sec Loss 2.8389 LearningRate 0.0000 Epoch: 34 Global Step: 717270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:00,759-Speed 6302.85 samples/sec Loss 2.8527 LearningRate 0.0000 Epoch: 34 Global Step: 717280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:04,046-Speed 6232.65 samples/sec Loss 2.8611 LearningRate 0.0000 Epoch: 34 Global Step: 717290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:07,331-Speed 6234.87 samples/sec Loss 2.8026 LearningRate 0.0000 Epoch: 34 Global Step: 717300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:10,581-Speed 6302.82 samples/sec Loss 2.8541 LearningRate 0.0000 Epoch: 34 Global Step: 717310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:13,835-Speed 6294.95 samples/sec Loss 2.8531 LearningRate 0.0000 Epoch: 34 Global Step: 717320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:17,086-Speed 6301.75 samples/sec Loss 2.8361 LearningRate 0.0000 Epoch: 34 Global Step: 717330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:20,336-Speed 6303.56 samples/sec Loss 2.9009 LearningRate 0.0000 Epoch: 34 Global Step: 717340 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 08:59:23,589-Speed 6295.90 samples/sec Loss 2.8459 LearningRate 0.0000 Epoch: 34 Global Step: 717350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:26,847-Speed 6287.63 samples/sec Loss 2.8121 LearningRate 0.0000 Epoch: 34 Global Step: 717360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:30,104-Speed 6290.04 samples/sec Loss 2.8368 LearningRate 0.0000 Epoch: 34 Global Step: 717370 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 08:59:33,344-Speed 6322.95 samples/sec Loss 2.7811 LearningRate 0.0000 Epoch: 34 Global Step: 717380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:36,598-Speed 6295.29 samples/sec Loss 2.8410 LearningRate 0.0000 Epoch: 34 Global Step: 717390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:39,856-Speed 6287.03 samples/sec Loss 2.8013 LearningRate 0.0000 Epoch: 34 Global Step: 717400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:43,111-Speed 6294.02 samples/sec Loss 2.8457 LearningRate 0.0000 Epoch: 34 Global Step: 717410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:46,424-Speed 6182.32 samples/sec Loss 2.8083 LearningRate 0.0000 Epoch: 34 Global Step: 717420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:49,795-Speed 6076.66 samples/sec Loss 2.8204 LearningRate 0.0000 Epoch: 34 Global Step: 717430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:53,144-Speed 6117.48 samples/sec Loss 2.8190 LearningRate 0.0000 Epoch: 34 Global Step: 717440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:56,386-Speed 6318.30 samples/sec Loss 2.7529 LearningRate 0.0000 Epoch: 34 Global Step: 717450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 08:59:59,632-Speed 6310.44 samples/sec Loss 2.7644 LearningRate 0.0000 Epoch: 34 Global Step: 717460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:00:02,887-Speed 6293.08 samples/sec Loss 2.8210 LearningRate 0.0000 Epoch: 34 Global Step: 717470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:06,140-Speed 6296.64 samples/sec Loss 2.8747 LearningRate 0.0000 Epoch: 34 Global Step: 717480 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:00:09,390-Speed 6303.59 samples/sec Loss 2.8038 LearningRate 0.0000 Epoch: 34 Global Step: 717490 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:00:12,625-Speed 6331.56 samples/sec Loss 2.8738 LearningRate 0.0000 Epoch: 34 Global Step: 717500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:15,880-Speed 6293.77 samples/sec Loss 2.8650 LearningRate 0.0000 Epoch: 34 Global Step: 717510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:19,137-Speed 6288.50 samples/sec Loss 2.8421 LearningRate 0.0000 Epoch: 34 Global Step: 717520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:22,383-Speed 6311.87 samples/sec Loss 2.8123 LearningRate 0.0000 Epoch: 34 Global Step: 717530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:25,635-Speed 6299.01 samples/sec Loss 2.8159 LearningRate 0.0000 Epoch: 34 Global Step: 717540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:28,883-Speed 6306.81 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 717550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:32,138-Speed 6292.38 samples/sec Loss 2.8189 LearningRate 0.0000 Epoch: 34 Global Step: 717560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:35,500-Speed 6093.32 samples/sec Loss 2.7979 LearningRate 0.0000 Epoch: 34 Global Step: 717570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:38,863-Speed 6091.29 samples/sec Loss 2.8312 LearningRate 0.0000 Epoch: 34 Global Step: 717580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:42,122-Speed 6285.61 samples/sec Loss 
2.8585 LearningRate 0.0000 Epoch: 34 Global Step: 717590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:45,365-Speed 6316.52 samples/sec Loss 2.7869 LearningRate 0.0000 Epoch: 34 Global Step: 717600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:48,632-Speed 6269.30 samples/sec Loss 2.7576 LearningRate 0.0000 Epoch: 34 Global Step: 717610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:51,878-Speed 6312.47 samples/sec Loss 2.7977 LearningRate 0.0000 Epoch: 34 Global Step: 717620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:55,134-Speed 6290.57 samples/sec Loss 2.7983 LearningRate 0.0000 Epoch: 34 Global Step: 717630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:00:58,387-Speed 6297.50 samples/sec Loss 2.8433 LearningRate 0.0000 Epoch: 34 Global Step: 717640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:01,643-Speed 6290.82 samples/sec Loss 2.8165 LearningRate 0.0000 Epoch: 34 Global Step: 717650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:04,901-Speed 6288.22 samples/sec Loss 2.8555 LearningRate 0.0000 Epoch: 34 Global Step: 717660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:08,181-Speed 6244.41 samples/sec Loss 2.8375 LearningRate 0.0000 Epoch: 34 Global Step: 717670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:11,442-Speed 6282.06 samples/sec Loss 2.8330 LearningRate 0.0000 Epoch: 34 Global Step: 717680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:14,688-Speed 6310.44 samples/sec Loss 2.8401 LearningRate 0.0000 Epoch: 34 Global Step: 717690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:17,944-Speed 6290.69 samples/sec Loss 2.8060 LearningRate 0.0000 Epoch: 34 Global Step: 717700 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:01:21,194-Speed 6303.51 samples/sec Loss 2.8576 LearningRate 0.0000 Epoch: 34 Global 
Step: 717710 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:01:24,443-Speed 6304.34 samples/sec Loss 2.8505 LearningRate 0.0000 Epoch: 34 Global Step: 717720 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:01:27,698-Speed 6294.49 samples/sec Loss 2.9023 LearningRate 0.0000 Epoch: 34 Global Step: 717730 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:01:30,952-Speed 6294.87 samples/sec Loss 2.8785 LearningRate 0.0000 Epoch: 34 Global Step: 717740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:01:34,191-Speed 6324.10 samples/sec Loss 2.8704 LearningRate 0.0000 Epoch: 34 Global Step: 717750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:37,448-Speed 6289.64 samples/sec Loss 2.8621 LearningRate 0.0000 Epoch: 34 Global Step: 717760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:40,716-Speed 6268.05 samples/sec Loss 2.8007 LearningRate 0.0000 Epoch: 34 Global Step: 717770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:43,971-Speed 6293.30 samples/sec Loss 2.8691 LearningRate 0.0000 Epoch: 34 Global Step: 717780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:47,225-Speed 6295.96 samples/sec Loss 2.8227 LearningRate 0.0000 Epoch: 34 Global Step: 717790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:50,482-Speed 6287.82 samples/sec Loss 2.7935 LearningRate 0.0000 Epoch: 34 Global Step: 717800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:53,740-Speed 6288.57 samples/sec Loss 2.8238 LearningRate 0.0000 Epoch: 34 Global Step: 717810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:01:56,993-Speed 6296.37 samples/sec Loss 2.8489 LearningRate 0.0000 Epoch: 34 Global Step: 717820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:00,243-Speed 6304.74 samples/sec Loss 2.8471 LearningRate 0.0000 Epoch: 34 Global Step: 717830 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:02:03,501-Speed 6287.45 samples/sec Loss 2.8357 LearningRate 0.0000 Epoch: 34 Global Step: 717840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:06,754-Speed 6296.09 samples/sec Loss 2.8747 LearningRate 0.0000 Epoch: 34 Global Step: 717850 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:02:10,002-Speed 6306.73 samples/sec Loss 2.8257 LearningRate 0.0000 Epoch: 34 Global Step: 717860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:13,261-Speed 6286.68 samples/sec Loss 2.8611 LearningRate 0.0000 Epoch: 34 Global Step: 717870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:16,522-Speed 6280.43 samples/sec Loss 2.8678 LearningRate 0.0000 Epoch: 34 Global Step: 717880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:19,772-Speed 6303.66 samples/sec Loss 2.8737 LearningRate 0.0000 Epoch: 34 Global Step: 717890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:23,033-Speed 6281.30 samples/sec Loss 2.8252 LearningRate 0.0000 Epoch: 34 Global Step: 717900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:26,286-Speed 6297.13 samples/sec Loss 2.8503 LearningRate 0.0000 Epoch: 34 Global Step: 717910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:29,538-Speed 6298.08 samples/sec Loss 2.8767 LearningRate 0.0000 Epoch: 34 Global Step: 717920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:32,786-Speed 6307.76 samples/sec Loss 2.8547 LearningRate 0.0000 Epoch: 34 Global Step: 717930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:36,033-Speed 6309.45 samples/sec Loss 2.8080 LearningRate 0.0000 Epoch: 34 Global Step: 717940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:39,287-Speed 6294.74 samples/sec Loss 2.7981 LearningRate 0.0000 Epoch: 34 Global Step: 717950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:02:42,542-Speed 6292.50 samples/sec Loss 2.8579 LearningRate 0.0000 Epoch: 34 Global Step: 717960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:02:45,777-Speed 6333.16 samples/sec Loss 2.8292 LearningRate 0.0000 Epoch: 34 Global Step: 717970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:49,025-Speed 6306.58 samples/sec Loss 2.8426 LearningRate 0.0000 Epoch: 34 Global Step: 717980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:52,277-Speed 6299.37 samples/sec Loss 2.8489 LearningRate 0.0000 Epoch: 34 Global Step: 717990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:55,521-Speed 6313.37 samples/sec Loss 2.8078 LearningRate 0.0000 Epoch: 34 Global Step: 718000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:02:58,784-Speed 6278.61 samples/sec Loss 2.8435 LearningRate 0.0000 Epoch: 34 Global Step: 718010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:02,046-Speed 6279.04 samples/sec Loss 2.8066 LearningRate 0.0000 Epoch: 34 Global Step: 718020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:05,302-Speed 6291.11 samples/sec Loss 2.8466 LearningRate 0.0000 Epoch: 34 Global Step: 718030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:08,555-Speed 6297.52 samples/sec Loss 2.8052 LearningRate 0.0000 Epoch: 34 Global Step: 718040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:11,810-Speed 6293.52 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 718050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:15,065-Speed 6294.76 samples/sec Loss 2.8540 LearningRate 0.0000 Epoch: 34 Global Step: 718060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:18,314-Speed 6305.25 samples/sec Loss 2.8379 LearningRate 0.0000 Epoch: 34 Global Step: 718070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:03:21,557-Speed 6316.71 samples/sec Loss 
2.8146 LearningRate 0.0000 Epoch: 34 Global Step: 718080 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:03:24,789-Speed 6336.87 samples/sec Loss 2.9016 LearningRate 0.0000 Epoch: 34 Global Step: 718090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:28,040-Speed 6302.30 samples/sec Loss 2.8320 LearningRate 0.0000 Epoch: 34 Global Step: 718100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:31,298-Speed 6287.71 samples/sec Loss 2.8331 LearningRate 0.0000 Epoch: 34 Global Step: 718110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:34,562-Speed 6274.70 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 718120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:37,817-Speed 6294.16 samples/sec Loss 2.8874 LearningRate 0.0000 Epoch: 34 Global Step: 718130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:41,067-Speed 6303.35 samples/sec Loss 2.8046 LearningRate 0.0000 Epoch: 34 Global Step: 718140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:44,320-Speed 6296.17 samples/sec Loss 2.8273 LearningRate 0.0000 Epoch: 34 Global Step: 718150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:47,575-Speed 6292.40 samples/sec Loss 2.8307 LearningRate 0.0000 Epoch: 34 Global Step: 718160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:50,836-Speed 6283.43 samples/sec Loss 2.8338 LearningRate 0.0000 Epoch: 34 Global Step: 718170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:54,092-Speed 6291.06 samples/sec Loss 2.8417 LearningRate 0.0000 Epoch: 34 Global Step: 718180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:03:57,341-Speed 6304.41 samples/sec Loss 2.8641 LearningRate 0.0000 Epoch: 34 Global Step: 718190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:00,595-Speed 6295.56 samples/sec Loss 2.7832 LearningRate 0.0000 Epoch: 34 Global 
Step: 718200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:03,854-Speed 6284.30 samples/sec Loss 2.8089 LearningRate 0.0000 Epoch: 34 Global Step: 718210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:07,121-Speed 6270.88 samples/sec Loss 2.7978 LearningRate 0.0000 Epoch: 34 Global Step: 718220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:10,377-Speed 6291.47 samples/sec Loss 2.7970 LearningRate 0.0000 Epoch: 34 Global Step: 718230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:13,621-Speed 6314.61 samples/sec Loss 2.8326 LearningRate 0.0000 Epoch: 34 Global Step: 718240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:16,874-Speed 6297.00 samples/sec Loss 2.8427 LearningRate 0.0000 Epoch: 34 Global Step: 718250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:20,129-Speed 6293.13 samples/sec Loss 2.8425 LearningRate 0.0000 Epoch: 34 Global Step: 718260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:23,385-Speed 6292.43 samples/sec Loss 2.8281 LearningRate 0.0000 Epoch: 34 Global Step: 718270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:26,636-Speed 6299.51 samples/sec Loss 2.8453 LearningRate 0.0000 Epoch: 34 Global Step: 718280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:29,893-Speed 6290.77 samples/sec Loss 2.8784 LearningRate 0.0000 Epoch: 34 Global Step: 718290 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:04:33,132-Speed 6324.68 samples/sec Loss 2.8776 LearningRate 0.0000 Epoch: 34 Global Step: 718300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:36,391-Speed 6283.98 samples/sec Loss 2.8143 LearningRate 0.0000 Epoch: 34 Global Step: 718310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:39,647-Speed 6292.29 samples/sec Loss 2.8437 LearningRate 0.0000 Epoch: 34 Global Step: 718320 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:04:42,889-Speed 6317.92 samples/sec Loss 2.7800 LearningRate 0.0000 Epoch: 34 Global Step: 718330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:46,146-Speed 6290.74 samples/sec Loss 2.8205 LearningRate 0.0000 Epoch: 34 Global Step: 718340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:49,397-Speed 6299.86 samples/sec Loss 2.8943 LearningRate 0.0000 Epoch: 34 Global Step: 718350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:52,651-Speed 6296.01 samples/sec Loss 2.8259 LearningRate 0.0000 Epoch: 34 Global Step: 718360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:55,899-Speed 6305.41 samples/sec Loss 2.8075 LearningRate 0.0000 Epoch: 34 Global Step: 718370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:04:59,166-Speed 6269.65 samples/sec Loss 2.8584 LearningRate 0.0000 Epoch: 34 Global Step: 718380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:02,424-Speed 6288.67 samples/sec Loss 2.8640 LearningRate 0.0000 Epoch: 34 Global Step: 718390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:05,682-Speed 6286.97 samples/sec Loss 2.8141 LearningRate 0.0000 Epoch: 34 Global Step: 718400 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:05:08,919-Speed 6327.98 samples/sec Loss 2.8361 LearningRate 0.0000 Epoch: 34 Global Step: 718410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:12,172-Speed 6298.68 samples/sec Loss 2.8000 LearningRate 0.0000 Epoch: 34 Global Step: 718420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:15,429-Speed 6288.32 samples/sec Loss 2.8825 LearningRate 0.0000 Epoch: 34 Global Step: 718430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:18,685-Speed 6290.49 samples/sec Loss 2.8207 LearningRate 0.0000 Epoch: 34 Global Step: 718440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:05:21,939-Speed 6297.00 samples/sec Loss 2.8202 LearningRate 0.0000 Epoch: 34 Global Step: 718450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:25,200-Speed 6281.41 samples/sec Loss 2.8433 LearningRate 0.0000 Epoch: 34 Global Step: 718460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:28,456-Speed 6290.61 samples/sec Loss 2.8465 LearningRate 0.0000 Epoch: 34 Global Step: 718470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:31,711-Speed 6294.21 samples/sec Loss 2.8131 LearningRate 0.0000 Epoch: 34 Global Step: 718480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:34,962-Speed 6299.16 samples/sec Loss 2.8723 LearningRate 0.0000 Epoch: 34 Global Step: 718490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:38,222-Speed 6284.82 samples/sec Loss 2.8043 LearningRate 0.0000 Epoch: 34 Global Step: 718500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:41,487-Speed 6274.39 samples/sec Loss 2.8431 LearningRate 0.0000 Epoch: 34 Global Step: 718510 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:05:44,728-Speed 6320.37 samples/sec Loss 2.8054 LearningRate 0.0000 Epoch: 34 Global Step: 718520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:47,986-Speed 6287.27 samples/sec Loss 2.8377 LearningRate 0.0000 Epoch: 34 Global Step: 718530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:51,248-Speed 6280.60 samples/sec Loss 2.7949 LearningRate 0.0000 Epoch: 34 Global Step: 718540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:54,502-Speed 6294.61 samples/sec Loss 2.8668 LearningRate 0.0000 Epoch: 34 Global Step: 718550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:05:57,749-Speed 6309.56 samples/sec Loss 2.7892 LearningRate 0.0000 Epoch: 34 Global Step: 718560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:01,015-Speed 6271.69 samples/sec Loss 
2.8447 LearningRate 0.0000 Epoch: 34 Global Step: 718570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:04,275-Speed 6284.20 samples/sec Loss 2.8313 LearningRate 0.0000 Epoch: 34 Global Step: 718580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:07,531-Speed 6291.40 samples/sec Loss 2.8183 LearningRate 0.0000 Epoch: 34 Global Step: 718590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:10,788-Speed 6289.36 samples/sec Loss 2.8473 LearningRate 0.0000 Epoch: 34 Global Step: 718600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:14,044-Speed 6291.54 samples/sec Loss 2.8025 LearningRate 0.0000 Epoch: 34 Global Step: 718610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:17,317-Speed 6258.53 samples/sec Loss 2.8589 LearningRate 0.0000 Epoch: 34 Global Step: 718620 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:06:20,590-Speed 6258.17 samples/sec Loss 2.8155 LearningRate 0.0000 Epoch: 34 Global Step: 718630 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:06:23,837-Speed 6307.28 samples/sec Loss 2.8549 LearningRate 0.0000 Epoch: 34 Global Step: 718640 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:06:27,094-Speed 6289.60 samples/sec Loss 2.8442 LearningRate 0.0000 Epoch: 34 Global Step: 718650 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:06:30,350-Speed 6291.39 samples/sec Loss 2.8280 LearningRate 0.0000 Epoch: 34 Global Step: 718660 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:06:33,591-Speed 6319.83 samples/sec Loss 2.8168 LearningRate 0.0000 Epoch: 34 Global Step: 718670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:36,846-Speed 6294.32 samples/sec Loss 2.8383 LearningRate 0.0000 Epoch: 34 Global Step: 718680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:40,101-Speed 6292.69 samples/sec Loss 2.8516 LearningRate 0.0000 Epoch: 34 Global 
Step: 718690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:43,350-Speed 6305.00 samples/sec Loss 2.8273 LearningRate 0.0000 Epoch: 34 Global Step: 718700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:46,607-Speed 6289.33 samples/sec Loss 2.7905 LearningRate 0.0000 Epoch: 34 Global Step: 718710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:49,867-Speed 6285.00 samples/sec Loss 2.8018 LearningRate 0.0000 Epoch: 34 Global Step: 718720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:53,128-Speed 6281.79 samples/sec Loss 2.8523 LearningRate 0.0000 Epoch: 34 Global Step: 718730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:56,377-Speed 6304.08 samples/sec Loss 2.8531 LearningRate 0.0000 Epoch: 34 Global Step: 718740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:06:59,646-Speed 6267.06 samples/sec Loss 2.8408 LearningRate 0.0000 Epoch: 34 Global Step: 718750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:02,899-Speed 6297.58 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 718760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:06,162-Speed 6276.74 samples/sec Loss 2.7987 LearningRate 0.0000 Epoch: 34 Global Step: 718770 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:07:09,397-Speed 6332.54 samples/sec Loss 2.8549 LearningRate 0.0000 Epoch: 34 Global Step: 718780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:12,650-Speed 6297.74 samples/sec Loss 2.7957 LearningRate 0.0000 Epoch: 34 Global Step: 718790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:15,918-Speed 6268.79 samples/sec Loss 2.8699 LearningRate 0.0000 Epoch: 34 Global Step: 718800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:19,161-Speed 6315.63 samples/sec Loss 2.8546 LearningRate 0.0000 Epoch: 34 Global Step: 718810 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:07:22,413-Speed 6298.67 samples/sec Loss 2.8295 LearningRate 0.0000 Epoch: 34 Global Step: 718820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:25,667-Speed 6296.39 samples/sec Loss 2.8527 LearningRate 0.0000 Epoch: 34 Global Step: 718830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:28,920-Speed 6295.58 samples/sec Loss 2.7935 LearningRate 0.0000 Epoch: 34 Global Step: 718840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:32,179-Speed 6285.43 samples/sec Loss 2.8134 LearningRate 0.0000 Epoch: 34 Global Step: 718850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:35,436-Speed 6289.38 samples/sec Loss 2.8214 LearningRate 0.0000 Epoch: 34 Global Step: 718860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:38,693-Speed 6290.68 samples/sec Loss 2.7730 LearningRate 0.0000 Epoch: 34 Global Step: 718870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:41,952-Speed 6284.48 samples/sec Loss 2.8254 LearningRate 0.0000 Epoch: 34 Global Step: 718880 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:07:45,202-Speed 6303.97 samples/sec Loss 2.8458 LearningRate 0.0000 Epoch: 34 Global Step: 718890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:48,461-Speed 6285.65 samples/sec Loss 2.8462 LearningRate 0.0000 Epoch: 34 Global Step: 718900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:51,722-Speed 6281.59 samples/sec Loss 2.7972 LearningRate 0.0000 Epoch: 34 Global Step: 718910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:54,980-Speed 6286.83 samples/sec Loss 2.8301 LearningRate 0.0000 Epoch: 34 Global Step: 718920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:07:58,232-Speed 6299.56 samples/sec Loss 2.8184 LearningRate 0.0000 Epoch: 34 Global Step: 718930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:08:01,482-Speed 6301.90 samples/sec Loss 2.8050 LearningRate 0.0000 Epoch: 34 Global Step: 718940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:04,740-Speed 6287.25 samples/sec Loss 2.8369 LearningRate 0.0000 Epoch: 34 Global Step: 718950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:07,995-Speed 6294.04 samples/sec Loss 2.7813 LearningRate 0.0000 Epoch: 34 Global Step: 718960 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:11,261-Speed 6271.64 samples/sec Loss 2.8645 LearningRate 0.0000 Epoch: 34 Global Step: 718970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:14,518-Speed 6289.62 samples/sec Loss 2.8374 LearningRate 0.0000 Epoch: 34 Global Step: 718980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:17,784-Speed 6273.87 samples/sec Loss 2.8911 LearningRate 0.0000 Epoch: 34 Global Step: 718990 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:08:21,038-Speed 6295.36 samples/sec Loss 2.8547 LearningRate 0.0000 Epoch: 34 Global Step: 719000 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:08:24,296-Speed 6285.72 samples/sec Loss 2.8390 LearningRate 0.0000 Epoch: 34 Global Step: 719010 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:08:27,539-Speed 6317.62 samples/sec Loss 2.7384 LearningRate 0.0000 Epoch: 34 Global Step: 719020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:30,789-Speed 6302.53 samples/sec Loss 2.8276 LearningRate 0.0000 Epoch: 34 Global Step: 719030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:34,046-Speed 6290.32 samples/sec Loss 2.8491 LearningRate 0.0000 Epoch: 34 Global Step: 719040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:37,301-Speed 6291.54 samples/sec Loss 2.8220 LearningRate 0.0000 Epoch: 34 Global Step: 719050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:40,560-Speed 6285.51 samples/sec Loss 
2.8614 LearningRate 0.0000 Epoch: 34 Global Step: 719060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:43,815-Speed 6294.55 samples/sec Loss 2.8276 LearningRate 0.0000 Epoch: 34 Global Step: 719070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:47,076-Speed 6281.36 samples/sec Loss 2.8549 LearningRate 0.0000 Epoch: 34 Global Step: 719080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:50,330-Speed 6295.35 samples/sec Loss 2.8452 LearningRate 0.0000 Epoch: 34 Global Step: 719090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:53,581-Speed 6300.56 samples/sec Loss 2.7898 LearningRate 0.0000 Epoch: 34 Global Step: 719100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:08:56,828-Speed 6308.11 samples/sec Loss 2.7770 LearningRate 0.0000 Epoch: 34 Global Step: 719110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:00,084-Speed 6292.22 samples/sec Loss 2.7949 LearningRate 0.0000 Epoch: 34 Global Step: 719120 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:09:03,323-Speed 6323.83 samples/sec Loss 2.7959 LearningRate 0.0000 Epoch: 34 Global Step: 719130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:06,574-Speed 6300.83 samples/sec Loss 2.8661 LearningRate 0.0000 Epoch: 34 Global Step: 719140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:09,824-Speed 6303.71 samples/sec Loss 2.8288 LearningRate 0.0000 Epoch: 34 Global Step: 719150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:13,079-Speed 6294.08 samples/sec Loss 2.8184 LearningRate 0.0000 Epoch: 34 Global Step: 719160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:16,407-Speed 6154.68 samples/sec Loss 2.8642 LearningRate 0.0000 Epoch: 34 Global Step: 719170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:19,670-Speed 6277.09 samples/sec Loss 2.8681 LearningRate 0.0000 Epoch: 34 Global 
Step: 719180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:22,923-Speed 6297.64 samples/sec Loss 2.8283 LearningRate 0.0000 Epoch: 34 Global Step: 719190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:26,174-Speed 6302.29 samples/sec Loss 2.8623 LearningRate 0.0000 Epoch: 34 Global Step: 719200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:29,449-Speed 6253.30 samples/sec Loss 2.8343 LearningRate 0.0000 Epoch: 34 Global Step: 719210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:32,705-Speed 6293.05 samples/sec Loss 2.8255 LearningRate 0.0000 Epoch: 34 Global Step: 719220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:35,947-Speed 6318.84 samples/sec Loss 2.8123 LearningRate 0.0000 Epoch: 34 Global Step: 719230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:39,202-Speed 6291.38 samples/sec Loss 2.8303 LearningRate 0.0000 Epoch: 34 Global Step: 719240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:42,468-Speed 6273.67 samples/sec Loss 2.8543 LearningRate 0.0000 Epoch: 34 Global Step: 719250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:45,726-Speed 6286.53 samples/sec Loss 2.8370 LearningRate 0.0000 Epoch: 34 Global Step: 719260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:48,977-Speed 6300.91 samples/sec Loss 2.7683 LearningRate 0.0000 Epoch: 34 Global Step: 719270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:52,232-Speed 6293.70 samples/sec Loss 2.8266 LearningRate 0.0000 Epoch: 34 Global Step: 719280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:55,489-Speed 6288.72 samples/sec Loss 2.8133 LearningRate 0.0000 Epoch: 34 Global Step: 719290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:09:58,752-Speed 6278.79 samples/sec Loss 2.8423 LearningRate 0.0000 Epoch: 34 Global Step: 719300 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:10:02,008-Speed 6291.12 samples/sec Loss 2.8450 LearningRate 0.0000 Epoch: 34 Global Step: 719310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:05,268-Speed 6283.71 samples/sec Loss 2.8282 LearningRate 0.0000 Epoch: 34 Global Step: 719320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:08,528-Speed 6282.73 samples/sec Loss 2.8619 LearningRate 0.0000 Epoch: 34 Global Step: 719330 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:10:11,787-Speed 6285.93 samples/sec Loss 2.8444 LearningRate 0.0000 Epoch: 34 Global Step: 719340 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:10:15,026-Speed 6324.02 samples/sec Loss 2.7969 LearningRate 0.0000 Epoch: 34 Global Step: 719350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:18,279-Speed 6298.30 samples/sec Loss 2.8105 LearningRate 0.0000 Epoch: 34 Global Step: 719360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:21,536-Speed 6288.69 samples/sec Loss 2.8064 LearningRate 0.0000 Epoch: 34 Global Step: 719370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:24,811-Speed 6254.57 samples/sec Loss 2.8396 LearningRate 0.0000 Epoch: 34 Global Step: 719380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:28,074-Speed 6278.39 samples/sec Loss 2.7943 LearningRate 0.0000 Epoch: 34 Global Step: 719390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:31,361-Speed 6230.68 samples/sec Loss 2.8553 LearningRate 0.0000 Epoch: 34 Global Step: 719400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:34,615-Speed 6296.92 samples/sec Loss 2.8774 LearningRate 0.0000 Epoch: 34 Global Step: 719410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:37,867-Speed 6297.89 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 719420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:10:41,133-Speed 6273.13 samples/sec Loss 2.8530 LearningRate 0.0000 Epoch: 34 Global Step: 719430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:44,383-Speed 6302.69 samples/sec Loss 2.8087 LearningRate 0.0000 Epoch: 34 Global Step: 719440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:47,619-Speed 6331.58 samples/sec Loss 2.7983 LearningRate 0.0000 Epoch: 34 Global Step: 719450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:50,916-Speed 6212.05 samples/sec Loss 2.8577 LearningRate 0.0000 Epoch: 34 Global Step: 719460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:54,194-Speed 6249.04 samples/sec Loss 2.8182 LearningRate 0.0000 Epoch: 34 Global Step: 719470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:10:57,454-Speed 6282.93 samples/sec Loss 2.7754 LearningRate 0.0000 Epoch: 34 Global Step: 719480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:00,749-Speed 6218.10 samples/sec Loss 2.8155 LearningRate 0.0000 Epoch: 34 Global Step: 719490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:04,005-Speed 6290.31 samples/sec Loss 2.7779 LearningRate 0.0000 Epoch: 34 Global Step: 719500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:07,253-Speed 6307.25 samples/sec Loss 2.8170 LearningRate 0.0000 Epoch: 34 Global Step: 719510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:10,502-Speed 6304.38 samples/sec Loss 2.8505 LearningRate 0.0000 Epoch: 34 Global Step: 719520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:13,756-Speed 6296.32 samples/sec Loss 2.8004 LearningRate 0.0000 Epoch: 34 Global Step: 719530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:17,014-Speed 6286.20 samples/sec Loss 2.7778 LearningRate 0.0000 Epoch: 34 Global Step: 719540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:20,263-Speed 6305.61 samples/sec Loss 
2.8129 LearningRate 0.0000 Epoch: 34 Global Step: 719550 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:11:23,516-Speed 6297.48 samples/sec Loss 2.8223 LearningRate 0.0000 Epoch: 34 Global Step: 719560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:26,776-Speed 6283.44 samples/sec Loss 2.8523 LearningRate 0.0000 Epoch: 34 Global Step: 719570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:30,033-Speed 6288.08 samples/sec Loss 2.8139 LearningRate 0.0000 Epoch: 34 Global Step: 719580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:33,286-Speed 6297.72 samples/sec Loss 2.7440 LearningRate 0.0000 Epoch: 34 Global Step: 719590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:36,538-Speed 6299.52 samples/sec Loss 2.8611 LearningRate 0.0000 Epoch: 34 Global Step: 719600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:39,796-Speed 6286.89 samples/sec Loss 2.8022 LearningRate 0.0000 Epoch: 34 Global Step: 719610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:43,055-Speed 6286.96 samples/sec Loss 2.8113 LearningRate 0.0000 Epoch: 34 Global Step: 719620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:46,312-Speed 6288.53 samples/sec Loss 2.8663 LearningRate 0.0000 Epoch: 34 Global Step: 719630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:49,604-Speed 6224.30 samples/sec Loss 2.8421 LearningRate 0.0000 Epoch: 34 Global Step: 719640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:52,912-Speed 6191.08 samples/sec Loss 2.9082 LearningRate 0.0000 Epoch: 34 Global Step: 719650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:56,156-Speed 6316.03 samples/sec Loss 2.8353 LearningRate 0.0000 Epoch: 34 Global Step: 719660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:11:59,408-Speed 6298.98 samples/sec Loss 2.8396 LearningRate 0.0000 Epoch: 34 Global 
Step: 719670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:02,663-Speed 6293.72 samples/sec Loss 2.8039 LearningRate 0.0000 Epoch: 34 Global Step: 719680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:05,918-Speed 6292.75 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 34 Global Step: 719690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:09,170-Speed 6298.60 samples/sec Loss 2.8270 LearningRate 0.0000 Epoch: 34 Global Step: 719700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:12,424-Speed 6295.92 samples/sec Loss 2.8084 LearningRate 0.0000 Epoch: 34 Global Step: 719710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:15,677-Speed 6296.11 samples/sec Loss 2.8119 LearningRate 0.0000 Epoch: 34 Global Step: 719720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:18,925-Speed 6307.17 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 34 Global Step: 719730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:22,172-Speed 6309.46 samples/sec Loss 2.7491 LearningRate 0.0000 Epoch: 34 Global Step: 719740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:25,425-Speed 6297.03 samples/sec Loss 2.8700 LearningRate 0.0000 Epoch: 34 Global Step: 719750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:28,677-Speed 6297.76 samples/sec Loss 2.8392 LearningRate 0.0000 Epoch: 34 Global Step: 719760 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:12:31,923-Speed 6313.44 samples/sec Loss 2.7374 LearningRate 0.0000 Epoch: 34 Global Step: 719770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:35,184-Speed 6282.88 samples/sec Loss 2.8029 LearningRate 0.0000 Epoch: 34 Global Step: 719780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:38,449-Speed 6273.88 samples/sec Loss 2.8154 LearningRate 0.0000 Epoch: 34 Global Step: 719790 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:12:41,708-Speed 6285.32 samples/sec Loss 2.7987 LearningRate 0.0000 Epoch: 34 Global Step: 719800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:44,961-Speed 6297.58 samples/sec Loss 2.8545 LearningRate 0.0000 Epoch: 34 Global Step: 719810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:48,207-Speed 6310.27 samples/sec Loss 2.8666 LearningRate 0.0000 Epoch: 34 Global Step: 719820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:51,458-Speed 6300.27 samples/sec Loss 2.8438 LearningRate 0.0000 Epoch: 34 Global Step: 719830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:54,716-Speed 6287.61 samples/sec Loss 2.8451 LearningRate 0.0000 Epoch: 34 Global Step: 719840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:12:57,976-Speed 6284.84 samples/sec Loss 2.8350 LearningRate 0.0000 Epoch: 34 Global Step: 719850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:01,232-Speed 6290.31 samples/sec Loss 2.8075 LearningRate 0.0000 Epoch: 34 Global Step: 719860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:04,480-Speed 6307.60 samples/sec Loss 2.8582 LearningRate 0.0000 Epoch: 34 Global Step: 719870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:07,732-Speed 6298.16 samples/sec Loss 2.8380 LearningRate 0.0000 Epoch: 34 Global Step: 719880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:10,983-Speed 6302.91 samples/sec Loss 2.8181 LearningRate 0.0000 Epoch: 34 Global Step: 719890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:14,241-Speed 6287.57 samples/sec Loss 2.8296 LearningRate 0.0000 Epoch: 34 Global Step: 719900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:17,499-Speed 6287.26 samples/sec Loss 2.8224 LearningRate 0.0000 Epoch: 34 Global Step: 719910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:13:20,752-Speed 6296.15 samples/sec Loss 2.8142 LearningRate 0.0000 Epoch: 34 Global Step: 719920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:24,006-Speed 6295.47 samples/sec Loss 2.8266 LearningRate 0.0000 Epoch: 34 Global Step: 719930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:27,260-Speed 6294.09 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 34 Global Step: 719940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:13:30,499-Speed 6325.56 samples/sec Loss 2.8176 LearningRate 0.0000 Epoch: 34 Global Step: 719950 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:33,746-Speed 6309.10 samples/sec Loss 2.8331 LearningRate 0.0000 Epoch: 34 Global Step: 719960 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:37,003-Speed 6289.27 samples/sec Loss 2.8513 LearningRate 0.0000 Epoch: 34 Global Step: 719970 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:40,261-Speed 6286.05 samples/sec Loss 2.8524 LearningRate 0.0000 Epoch: 34 Global Step: 719980 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:43,519-Speed 6287.72 samples/sec Loss 2.8256 LearningRate 0.0000 Epoch: 34 Global Step: 719990 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:46,785-Speed 6273.20 samples/sec Loss 2.8411 LearningRate 0.0000 Epoch: 34 Global Step: 720000 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:50,039-Speed 6295.53 samples/sec Loss 2.8450 LearningRate 0.0000 Epoch: 34 Global Step: 720010 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:53,294-Speed 6292.81 samples/sec Loss 2.8540 LearningRate 0.0000 Epoch: 34 Global Step: 720020 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:56,553-Speed 6284.10 samples/sec Loss 2.8396 LearningRate 0.0000 Epoch: 34 Global Step: 720030 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:13:59,804-Speed 6301.55 samples/sec Loss 
2.8855 LearningRate 0.0000 Epoch: 34 Global Step: 720040 Fp16 Grad Scale: 2048 Required: 10 hours Training: 2022-04-03 09:14:03,078-Speed 6256.71 samples/sec Loss 2.8396 LearningRate 0.0000 Epoch: 34 Global Step: 720050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:06,368-Speed 6225.65 samples/sec Loss 2.7773 LearningRate 0.0000 Epoch: 34 Global Step: 720060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:09,634-Speed 6273.67 samples/sec Loss 2.8194 LearningRate 0.0000 Epoch: 34 Global Step: 720070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:12,881-Speed 6308.86 samples/sec Loss 2.8645 LearningRate 0.0000 Epoch: 34 Global Step: 720080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:16,133-Speed 6299.20 samples/sec Loss 2.7908 LearningRate 0.0000 Epoch: 34 Global Step: 720090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:19,388-Speed 6293.01 samples/sec Loss 2.8569 LearningRate 0.0000 Epoch: 34 Global Step: 720100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:22,696-Speed 6194.04 samples/sec Loss 2.8497 LearningRate 0.0000 Epoch: 34 Global Step: 720110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:25,953-Speed 6288.99 samples/sec Loss 2.8256 LearningRate 0.0000 Epoch: 34 Global Step: 720120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:29,213-Speed 6282.25 samples/sec Loss 2.8421 LearningRate 0.0000 Epoch: 34 Global Step: 720130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:32,468-Speed 6294.75 samples/sec Loss 2.8200 LearningRate 0.0000 Epoch: 34 Global Step: 720140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:35,721-Speed 6295.69 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 34 Global Step: 720150 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:14:38,976-Speed 6294.77 samples/sec Loss 2.8100 LearningRate 0.0000 Epoch: 34 Global 
Step: 720160 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:14:42,239-Speed 6277.31 samples/sec Loss 2.9095 LearningRate 0.0000 Epoch: 34 Global Step: 720170 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:14:45,480-Speed 6320.46 samples/sec Loss 2.7909 LearningRate 0.0000 Epoch: 34 Global Step: 720180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:48,739-Speed 6284.36 samples/sec Loss 2.8711 LearningRate 0.0000 Epoch: 34 Global Step: 720190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:52,017-Speed 6249.71 samples/sec Loss 2.8590 LearningRate 0.0000 Epoch: 34 Global Step: 720200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:55,276-Speed 6285.02 samples/sec Loss 2.8766 LearningRate 0.0000 Epoch: 34 Global Step: 720210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:14:58,531-Speed 6293.77 samples/sec Loss 2.7982 LearningRate 0.0000 Epoch: 34 Global Step: 720220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:01,859-Speed 6155.69 samples/sec Loss 2.8132 LearningRate 0.0000 Epoch: 34 Global Step: 720230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:05,148-Speed 6228.00 samples/sec Loss 2.8509 LearningRate 0.0000 Epoch: 34 Global Step: 720240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:08,401-Speed 6296.93 samples/sec Loss 2.8106 LearningRate 0.0000 Epoch: 34 Global Step: 720250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:11,646-Speed 6311.55 samples/sec Loss 2.8905 LearningRate 0.0000 Epoch: 34 Global Step: 720260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:14,899-Speed 6297.63 samples/sec Loss 2.8553 LearningRate 0.0000 Epoch: 34 Global Step: 720270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:18,147-Speed 6307.00 samples/sec Loss 2.8624 LearningRate 0.0000 Epoch: 34 Global Step: 720280 Fp16 Grad Scale: 8192 
Required: 10 hours Training: 2022-04-03 09:15:21,391-Speed 6315.08 samples/sec Loss 2.8025 LearningRate 0.0000 Epoch: 34 Global Step: 720290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:24,642-Speed 6302.64 samples/sec Loss 2.7882 LearningRate 0.0000 Epoch: 34 Global Step: 720300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:27,895-Speed 6297.20 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 34 Global Step: 720310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:31,143-Speed 6305.85 samples/sec Loss 2.7722 LearningRate 0.0000 Epoch: 34 Global Step: 720320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:34,397-Speed 6294.86 samples/sec Loss 2.7806 LearningRate 0.0000 Epoch: 34 Global Step: 720330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:37,658-Speed 6281.95 samples/sec Loss 2.8483 LearningRate 0.0000 Epoch: 34 Global Step: 720340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:40,918-Speed 6282.99 samples/sec Loss 2.8468 LearningRate 0.0000 Epoch: 34 Global Step: 720350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:44,173-Speed 6293.29 samples/sec Loss 2.8490 LearningRate 0.0000 Epoch: 34 Global Step: 720360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:47,432-Speed 6285.66 samples/sec Loss 2.8473 LearningRate 0.0000 Epoch: 34 Global Step: 720370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:50,677-Speed 6313.64 samples/sec Loss 2.8016 LearningRate 0.0000 Epoch: 34 Global Step: 720380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:15:53,931-Speed 6295.20 samples/sec Loss 2.8538 LearningRate 0.0000 Epoch: 34 Global Step: 720390 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:15:57,172-Speed 6319.57 samples/sec Loss 2.8116 LearningRate 0.0000 Epoch: 34 Global Step: 720400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:16:00,421-Speed 6305.60 samples/sec Loss 2.8231 LearningRate 0.0000 Epoch: 34 Global Step: 720410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:03,674-Speed 6296.63 samples/sec Loss 2.8319 LearningRate 0.0000 Epoch: 34 Global Step: 720420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:06,927-Speed 6297.67 samples/sec Loss 2.8043 LearningRate 0.0000 Epoch: 34 Global Step: 720430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:10,183-Speed 6290.55 samples/sec Loss 2.8136 LearningRate 0.0000 Epoch: 34 Global Step: 720440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:13,429-Speed 6310.35 samples/sec Loss 2.7750 LearningRate 0.0000 Epoch: 34 Global Step: 720450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:16,725-Speed 6216.34 samples/sec Loss 2.8593 LearningRate 0.0000 Epoch: 34 Global Step: 720460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:19,978-Speed 6295.81 samples/sec Loss 2.7866 LearningRate 0.0000 Epoch: 34 Global Step: 720470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:23,228-Speed 6303.08 samples/sec Loss 2.8304 LearningRate 0.0000 Epoch: 34 Global Step: 720480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:26,558-Speed 6152.20 samples/sec Loss 2.8013 LearningRate 0.0000 Epoch: 34 Global Step: 720490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:29,914-Speed 6103.14 samples/sec Loss 2.8249 LearningRate 0.0000 Epoch: 34 Global Step: 720500 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:16:33,159-Speed 6313.77 samples/sec Loss 2.7970 LearningRate 0.0000 Epoch: 34 Global Step: 720510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:36,420-Speed 6280.56 samples/sec Loss 2.8447 LearningRate 0.0000 Epoch: 34 Global Step: 720520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:39,674-Speed 6297.35 samples/sec Loss 
2.8121 LearningRate 0.0000 Epoch: 34 Global Step: 720530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:42,937-Speed 6276.58 samples/sec Loss 2.7760 LearningRate 0.0000 Epoch: 34 Global Step: 720540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:46,194-Speed 6290.31 samples/sec Loss 2.8706 LearningRate 0.0000 Epoch: 34 Global Step: 720550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:49,448-Speed 6293.99 samples/sec Loss 2.8603 LearningRate 0.0000 Epoch: 34 Global Step: 720560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:52,703-Speed 6293.78 samples/sec Loss 2.8164 LearningRate 0.0000 Epoch: 34 Global Step: 720570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:55,963-Speed 6283.53 samples/sec Loss 2.8180 LearningRate 0.0000 Epoch: 34 Global Step: 720580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:16:59,220-Speed 6289.27 samples/sec Loss 2.8290 LearningRate 0.0000 Epoch: 34 Global Step: 720590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:02,479-Speed 6285.49 samples/sec Loss 2.8280 LearningRate 0.0000 Epoch: 34 Global Step: 720600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:05,742-Speed 6278.03 samples/sec Loss 2.7998 LearningRate 0.0000 Epoch: 34 Global Step: 720610 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:17:08,986-Speed 6314.61 samples/sec Loss 2.7799 LearningRate 0.0000 Epoch: 34 Global Step: 720620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:12,248-Speed 6280.80 samples/sec Loss 2.7818 LearningRate 0.0000 Epoch: 34 Global Step: 720630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:15,502-Speed 6294.80 samples/sec Loss 2.8538 LearningRate 0.0000 Epoch: 34 Global Step: 720640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:18,756-Speed 6294.73 samples/sec Loss 2.8271 LearningRate 0.0000 Epoch: 34 Global 
Step: 720650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:22,016-Speed 6284.02 samples/sec Loss 2.8594 LearningRate 0.0000 Epoch: 34 Global Step: 720660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:25,266-Speed 6302.88 samples/sec Loss 2.8189 LearningRate 0.0000 Epoch: 34 Global Step: 720670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:28,525-Speed 6286.37 samples/sec Loss 2.7599 LearningRate 0.0000 Epoch: 34 Global Step: 720680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:31,778-Speed 6295.88 samples/sec Loss 2.8113 LearningRate 0.0000 Epoch: 34 Global Step: 720690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:35,034-Speed 6290.75 samples/sec Loss 2.7906 LearningRate 0.0000 Epoch: 34 Global Step: 720700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:38,297-Speed 6278.93 samples/sec Loss 2.8152 LearningRate 0.0000 Epoch: 34 Global Step: 720710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:41,563-Speed 6271.23 samples/sec Loss 2.8317 LearningRate 0.0000 Epoch: 34 Global Step: 720720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:44,819-Speed 6291.52 samples/sec Loss 2.7902 LearningRate 0.0000 Epoch: 34 Global Step: 720730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:48,074-Speed 6294.28 samples/sec Loss 2.8439 LearningRate 0.0000 Epoch: 34 Global Step: 720740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:51,335-Speed 6282.14 samples/sec Loss 2.8511 LearningRate 0.0000 Epoch: 34 Global Step: 720750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:54,591-Speed 6294.45 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 34 Global Step: 720760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:17:57,844-Speed 6297.18 samples/sec Loss 2.7935 LearningRate 0.0000 Epoch: 34 Global Step: 720770 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:18:01,109-Speed 6273.47 samples/sec Loss 2.8360 LearningRate 0.0000 Epoch: 34 Global Step: 720780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:04,368-Speed 6285.10 samples/sec Loss 2.8136 LearningRate 0.0000 Epoch: 34 Global Step: 720790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:07,637-Speed 6268.01 samples/sec Loss 2.7692 LearningRate 0.0000 Epoch: 34 Global Step: 720800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:10,896-Speed 6284.27 samples/sec Loss 2.8147 LearningRate 0.0000 Epoch: 34 Global Step: 720810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:14,157-Speed 6281.81 samples/sec Loss 2.8262 LearningRate 0.0000 Epoch: 34 Global Step: 720820 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:18:17,420-Speed 6278.78 samples/sec Loss 2.7964 LearningRate 0.0000 Epoch: 34 Global Step: 720830 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:18:20,677-Speed 6288.77 samples/sec Loss 2.8150 LearningRate 0.0000 Epoch: 34 Global Step: 720840 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:18:23,931-Speed 6294.22 samples/sec Loss 2.7980 LearningRate 0.0000 Epoch: 34 Global Step: 720850 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:18:27,184-Speed 6297.31 samples/sec Loss 2.7914 LearningRate 0.0000 Epoch: 34 Global Step: 720860 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:18:30,418-Speed 6334.35 samples/sec Loss 2.8264 LearningRate 0.0000 Epoch: 34 Global Step: 720870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:33,669-Speed 6300.28 samples/sec Loss 2.7793 LearningRate 0.0000 Epoch: 34 Global Step: 720880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:36,920-Speed 6302.36 samples/sec Loss 2.8598 LearningRate 0.0000 Epoch: 34 Global Step: 720890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:18:40,182-Speed 6279.17 samples/sec Loss 2.8883 LearningRate 0.0000 Epoch: 34 Global Step: 720900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:43,435-Speed 6296.84 samples/sec Loss 2.7946 LearningRate 0.0000 Epoch: 34 Global Step: 720910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:46,693-Speed 6287.28 samples/sec Loss 2.7700 LearningRate 0.0000 Epoch: 34 Global Step: 720920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:49,952-Speed 6285.22 samples/sec Loss 2.7966 LearningRate 0.0000 Epoch: 34 Global Step: 720930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:53,208-Speed 6293.29 samples/sec Loss 2.7865 LearningRate 0.0000 Epoch: 34 Global Step: 720940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:56,456-Speed 6306.43 samples/sec Loss 2.7810 LearningRate 0.0000 Epoch: 34 Global Step: 720950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:18:59,717-Speed 6281.36 samples/sec Loss 2.8110 LearningRate 0.0000 Epoch: 34 Global Step: 720960 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:02,974-Speed 6290.22 samples/sec Loss 2.8348 LearningRate 0.0000 Epoch: 34 Global Step: 720970 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:19:06,229-Speed 6293.41 samples/sec Loss 2.7747 LearningRate 0.0000 Epoch: 34 Global Step: 720980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:09,488-Speed 6285.38 samples/sec Loss 2.8877 LearningRate 0.0000 Epoch: 34 Global Step: 720990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:12,739-Speed 6300.48 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 34 Global Step: 721000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:15,996-Speed 6290.54 samples/sec Loss 2.8388 LearningRate 0.0000 Epoch: 34 Global Step: 721010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:19,256-Speed 6283.44 samples/sec Loss 
2.8379 LearningRate 0.0000 Epoch: 34 Global Step: 721020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:22,511-Speed 6292.64 samples/sec Loss 2.8255 LearningRate 0.0000 Epoch: 34 Global Step: 721030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:25,767-Speed 6290.51 samples/sec Loss 2.8012 LearningRate 0.0000 Epoch: 34 Global Step: 721040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:29,020-Speed 6299.14 samples/sec Loss 2.8264 LearningRate 0.0000 Epoch: 34 Global Step: 721050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:32,281-Speed 6280.29 samples/sec Loss 2.8052 LearningRate 0.0000 Epoch: 34 Global Step: 721060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:35,536-Speed 6293.19 samples/sec Loss 2.8389 LearningRate 0.0000 Epoch: 34 Global Step: 721070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:38,771-Speed 6331.66 samples/sec Loss 2.7938 LearningRate 0.0000 Epoch: 34 Global Step: 721080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:42,027-Speed 6291.34 samples/sec Loss 2.8117 LearningRate 0.0000 Epoch: 34 Global Step: 721090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:45,284-Speed 6290.99 samples/sec Loss 2.8701 LearningRate 0.0000 Epoch: 34 Global Step: 721100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:48,604-Speed 6169.54 samples/sec Loss 2.8350 LearningRate 0.0000 Epoch: 34 Global Step: 721110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:51,893-Speed 6227.87 samples/sec Loss 2.8172 LearningRate 0.0000 Epoch: 34 Global Step: 721120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:55,149-Speed 6290.58 samples/sec Loss 2.7846 LearningRate 0.0000 Epoch: 34 Global Step: 721130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:19:58,401-Speed 6299.75 samples/sec Loss 2.8617 LearningRate 0.0000 Epoch: 34 Global 
Step: 721140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:01,653-Speed 6298.84 samples/sec Loss 2.8208 LearningRate 0.0000 Epoch: 34 Global Step: 721150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:04,906-Speed 6297.01 samples/sec Loss 2.8249 LearningRate 0.0000 Epoch: 34 Global Step: 721160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:08,169-Speed 6278.76 samples/sec Loss 2.7987 LearningRate 0.0000 Epoch: 34 Global Step: 721170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:11,424-Speed 6293.33 samples/sec Loss 2.8485 LearningRate 0.0000 Epoch: 34 Global Step: 721180 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:20:14,688-Speed 6276.46 samples/sec Loss 2.7492 LearningRate 0.0000 Epoch: 34 Global Step: 721190 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:20:17,938-Speed 6301.92 samples/sec Loss 2.8231 LearningRate 0.0000 Epoch: 34 Global Step: 721200 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:20:21,200-Speed 6280.81 samples/sec Loss 2.8368 LearningRate 0.0000 Epoch: 34 Global Step: 721210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:24,459-Speed 6285.24 samples/sec Loss 2.8647 LearningRate 0.0000 Epoch: 34 Global Step: 721220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:27,717-Speed 6287.87 samples/sec Loss 2.8477 LearningRate 0.0000 Epoch: 34 Global Step: 721230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:31,012-Speed 6216.98 samples/sec Loss 2.8154 LearningRate 0.0000 Epoch: 34 Global Step: 721240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:34,264-Speed 6299.08 samples/sec Loss 2.8533 LearningRate 0.0000 Epoch: 34 Global Step: 721250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:37,515-Speed 6299.66 samples/sec Loss 2.8249 LearningRate 0.0000 Epoch: 34 Global Step: 721260 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:20:40,771-Speed 6292.14 samples/sec Loss 2.8174 LearningRate 0.0000 Epoch: 34 Global Step: 721270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:44,025-Speed 6295.51 samples/sec Loss 2.8226 LearningRate 0.0000 Epoch: 34 Global Step: 721280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:47,284-Speed 6285.95 samples/sec Loss 2.8542 LearningRate 0.0000 Epoch: 34 Global Step: 721290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:50,537-Speed 6295.18 samples/sec Loss 2.8235 LearningRate 0.0000 Epoch: 34 Global Step: 721300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:53,772-Speed 6333.52 samples/sec Loss 2.8368 LearningRate 0.0000 Epoch: 34 Global Step: 721310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:20:57,033-Speed 6280.42 samples/sec Loss 2.8001 LearningRate 0.0000 Epoch: 34 Global Step: 721320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:00,317-Speed 6237.99 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 34 Global Step: 721330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:03,625-Speed 6193.32 samples/sec Loss 2.7769 LearningRate 0.0000 Epoch: 34 Global Step: 721340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:06,883-Speed 6287.07 samples/sec Loss 2.8083 LearningRate 0.0000 Epoch: 34 Global Step: 721350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:10,141-Speed 6287.59 samples/sec Loss 2.8080 LearningRate 0.0000 Epoch: 34 Global Step: 721360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:13,401-Speed 6282.94 samples/sec Loss 2.8054 LearningRate 0.0000 Epoch: 34 Global Step: 721370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:16,650-Speed 6304.69 samples/sec Loss 2.8852 LearningRate 0.0000 Epoch: 34 Global Step: 721380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:21:19,899-Speed 6304.78 samples/sec Loss 2.7567 LearningRate 0.0000 Epoch: 34 Global Step: 721390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:23,156-Speed 6289.58 samples/sec Loss 2.8098 LearningRate 0.0000 Epoch: 34 Global Step: 721400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:26,410-Speed 6296.93 samples/sec Loss 2.7876 LearningRate 0.0000 Epoch: 34 Global Step: 721410 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:21:29,659-Speed 6304.83 samples/sec Loss 2.8064 LearningRate 0.0000 Epoch: 34 Global Step: 721420 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:21:32,912-Speed 6298.12 samples/sec Loss 2.8314 LearningRate 0.0000 Epoch: 34 Global Step: 721430 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:21:36,172-Speed 6283.40 samples/sec Loss 2.7975 LearningRate 0.0000 Epoch: 34 Global Step: 721440 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:21:39,413-Speed 6319.11 samples/sec Loss 2.8940 LearningRate 0.0000 Epoch: 34 Global Step: 721450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:42,665-Speed 6300.41 samples/sec Loss 2.7749 LearningRate 0.0000 Epoch: 34 Global Step: 721460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:45,922-Speed 6289.26 samples/sec Loss 2.8376 LearningRate 0.0000 Epoch: 34 Global Step: 721470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:49,244-Speed 6165.81 samples/sec Loss 2.7623 LearningRate 0.0000 Epoch: 34 Global Step: 721480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:52,521-Speed 6251.77 samples/sec Loss 2.8191 LearningRate 0.0000 Epoch: 34 Global Step: 721490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:55,775-Speed 6294.73 samples/sec Loss 2.8251 LearningRate 0.0000 Epoch: 34 Global Step: 721500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:21:59,126-Speed 6113.39 samples/sec Loss 
2.8751 LearningRate 0.0000 Epoch: 34 Global Step: 721510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:02,382-Speed 6289.57 samples/sec Loss 2.7947 LearningRate 0.0000 Epoch: 34 Global Step: 721520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:05,655-Speed 6259.89 samples/sec Loss 2.8798 LearningRate 0.0000 Epoch: 34 Global Step: 721530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:08,913-Speed 6287.61 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 721540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:12,172-Speed 6284.89 samples/sec Loss 2.7917 LearningRate 0.0000 Epoch: 34 Global Step: 721550 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:22:15,431-Speed 6284.84 samples/sec Loss 2.8056 LearningRate 0.0000 Epoch: 34 Global Step: 721560 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:22:18,685-Speed 6295.67 samples/sec Loss 2.8296 LearningRate 0.0000 Epoch: 34 Global Step: 721570 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:22:21,929-Speed 6315.08 samples/sec Loss 2.8007 LearningRate 0.0000 Epoch: 34 Global Step: 721580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:25,179-Speed 6302.97 samples/sec Loss 2.8150 LearningRate 0.0000 Epoch: 34 Global Step: 721590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:28,464-Speed 6237.53 samples/sec Loss 2.7916 LearningRate 0.0000 Epoch: 34 Global Step: 721600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:31,755-Speed 6225.55 samples/sec Loss 2.8744 LearningRate 0.0000 Epoch: 34 Global Step: 721610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:35,013-Speed 6287.17 samples/sec Loss 2.8007 LearningRate 0.0000 Epoch: 34 Global Step: 721620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:38,262-Speed 6304.35 samples/sec Loss 2.8325 LearningRate 0.0000 Epoch: 34 Global 
Step: 721630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:41,515-Speed 6297.75 samples/sec Loss 2.8436 LearningRate 0.0000 Epoch: 34 Global Step: 721640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:44,768-Speed 6297.90 samples/sec Loss 2.8523 LearningRate 0.0000 Epoch: 34 Global Step: 721650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:48,026-Speed 6286.38 samples/sec Loss 2.8384 LearningRate 0.0000 Epoch: 34 Global Step: 721660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:51,287-Speed 6282.48 samples/sec Loss 2.8147 LearningRate 0.0000 Epoch: 34 Global Step: 721670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:54,536-Speed 6305.63 samples/sec Loss 2.8608 LearningRate 0.0000 Epoch: 34 Global Step: 721680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:22:57,798-Speed 6279.20 samples/sec Loss 2.7606 LearningRate 0.0000 Epoch: 34 Global Step: 721690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:01,054-Speed 6290.96 samples/sec Loss 2.8132 LearningRate 0.0000 Epoch: 34 Global Step: 721700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:04,314-Speed 6282.62 samples/sec Loss 2.7911 LearningRate 0.0000 Epoch: 34 Global Step: 721710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:07,558-Speed 6315.65 samples/sec Loss 2.8232 LearningRate 0.0000 Epoch: 34 Global Step: 721720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:10,805-Speed 6308.57 samples/sec Loss 2.8445 LearningRate 0.0000 Epoch: 34 Global Step: 721730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:14,062-Speed 6289.36 samples/sec Loss 2.7849 LearningRate 0.0000 Epoch: 34 Global Step: 721740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:17,327-Speed 6273.30 samples/sec Loss 2.8374 LearningRate 0.0000 Epoch: 34 Global Step: 721750 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:23:20,580-Speed 6297.23 samples/sec Loss 2.7866 LearningRate 0.0000 Epoch: 34 Global Step: 721760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:23,838-Speed 6288.69 samples/sec Loss 2.8220 LearningRate 0.0000 Epoch: 34 Global Step: 721770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:27,092-Speed 6295.64 samples/sec Loss 2.7907 LearningRate 0.0000 Epoch: 34 Global Step: 721780 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:23:30,326-Speed 6334.36 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 34 Global Step: 721790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:33,604-Speed 6248.66 samples/sec Loss 2.8069 LearningRate 0.0000 Epoch: 34 Global Step: 721800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:36,861-Speed 6289.41 samples/sec Loss 2.7549 LearningRate 0.0000 Epoch: 34 Global Step: 721810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:40,144-Speed 6238.75 samples/sec Loss 2.8056 LearningRate 0.0000 Epoch: 34 Global Step: 721820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:43,403-Speed 6286.51 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 34 Global Step: 721830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:46,658-Speed 6292.67 samples/sec Loss 2.8231 LearningRate 0.0000 Epoch: 34 Global Step: 721840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:49,918-Speed 6282.66 samples/sec Loss 2.7603 LearningRate 0.0000 Epoch: 34 Global Step: 721850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:53,168-Speed 6304.47 samples/sec Loss 2.8300 LearningRate 0.0000 Epoch: 34 Global Step: 721860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:23:56,424-Speed 6291.03 samples/sec Loss 2.8156 LearningRate 0.0000 Epoch: 34 Global Step: 721870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:23:59,678-Speed 6296.54 samples/sec Loss 2.7900 LearningRate 0.0000 Epoch: 34 Global Step: 721880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:02,923-Speed 6311.24 samples/sec Loss 2.8284 LearningRate 0.0000 Epoch: 34 Global Step: 721890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:06,183-Speed 6283.77 samples/sec Loss 2.7965 LearningRate 0.0000 Epoch: 34 Global Step: 721900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:09,443-Speed 6284.24 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 34 Global Step: 721910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:12,691-Speed 6306.15 samples/sec Loss 2.7811 LearningRate 0.0000 Epoch: 34 Global Step: 721920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:15,942-Speed 6301.01 samples/sec Loss 2.8716 LearningRate 0.0000 Epoch: 34 Global Step: 721930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:19,197-Speed 6293.34 samples/sec Loss 2.8420 LearningRate 0.0000 Epoch: 34 Global Step: 721940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:22,448-Speed 6302.00 samples/sec Loss 2.8314 LearningRate 0.0000 Epoch: 34 Global Step: 721950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:25,700-Speed 6297.91 samples/sec Loss 2.8321 LearningRate 0.0000 Epoch: 34 Global Step: 721960 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:28,955-Speed 6293.78 samples/sec Loss 2.8333 LearningRate 0.0000 Epoch: 34 Global Step: 721970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:32,202-Speed 6308.56 samples/sec Loss 2.8223 LearningRate 0.0000 Epoch: 34 Global Step: 721980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:35,453-Speed 6301.62 samples/sec Loss 2.8185 LearningRate 0.0000 Epoch: 34 Global Step: 721990 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:24:38,691-Speed 6324.94 samples/sec Loss 
2.8384 LearningRate 0.0000 Epoch: 34 Global Step: 722000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:41,946-Speed 6293.73 samples/sec Loss 2.7800 LearningRate 0.0000 Epoch: 34 Global Step: 722010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:45,198-Speed 6300.09 samples/sec Loss 2.7656 LearningRate 0.0000 Epoch: 34 Global Step: 722020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:48,450-Speed 6297.23 samples/sec Loss 2.7978 LearningRate 0.0000 Epoch: 34 Global Step: 722030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:51,710-Speed 6285.47 samples/sec Loss 2.8107 LearningRate 0.0000 Epoch: 34 Global Step: 722040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:54,969-Speed 6284.57 samples/sec Loss 2.8030 LearningRate 0.0000 Epoch: 34 Global Step: 722050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:24:58,230-Speed 6281.81 samples/sec Loss 2.8290 LearningRate 0.0000 Epoch: 34 Global Step: 722060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:01,487-Speed 6288.61 samples/sec Loss 2.7852 LearningRate 0.0000 Epoch: 34 Global Step: 722070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:04,843-Speed 6106.17 samples/sec Loss 2.7854 LearningRate 0.0000 Epoch: 34 Global Step: 722080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:08,120-Speed 6250.60 samples/sec Loss 2.8208 LearningRate 0.0000 Epoch: 34 Global Step: 722090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:11,383-Speed 6279.47 samples/sec Loss 2.8538 LearningRate 0.0000 Epoch: 34 Global Step: 722100 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:25:14,619-Speed 6329.55 samples/sec Loss 2.7432 LearningRate 0.0000 Epoch: 34 Global Step: 722110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:17,874-Speed 6292.11 samples/sec Loss 2.8590 LearningRate 0.0000 Epoch: 34 Global 
Step: 722120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:21,134-Speed 6284.67 samples/sec Loss 2.8021 LearningRate 0.0000 Epoch: 34 Global Step: 722130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:24,386-Speed 6298.21 samples/sec Loss 2.7962 LearningRate 0.0000 Epoch: 34 Global Step: 722140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:27,646-Speed 6284.94 samples/sec Loss 2.8384 LearningRate 0.0000 Epoch: 34 Global Step: 722150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:30,901-Speed 6291.90 samples/sec Loss 2.7871 LearningRate 0.0000 Epoch: 34 Global Step: 722160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:34,159-Speed 6287.78 samples/sec Loss 2.8047 LearningRate 0.0000 Epoch: 34 Global Step: 722170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:37,411-Speed 6300.34 samples/sec Loss 2.8366 LearningRate 0.0000 Epoch: 34 Global Step: 722180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:40,665-Speed 6294.25 samples/sec Loss 2.7977 LearningRate 0.0000 Epoch: 34 Global Step: 722190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:43,918-Speed 6296.42 samples/sec Loss 2.8718 LearningRate 0.0000 Epoch: 34 Global Step: 722200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:47,174-Speed 6292.63 samples/sec Loss 2.8387 LearningRate 0.0000 Epoch: 34 Global Step: 722210 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:25:50,415-Speed 6320.58 samples/sec Loss 2.8364 LearningRate 0.0000 Epoch: 34 Global Step: 722220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:53,670-Speed 6293.53 samples/sec Loss 2.7713 LearningRate 0.0000 Epoch: 34 Global Step: 722230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:25:56,930-Speed 6283.24 samples/sec Loss 2.8203 LearningRate 0.0000 Epoch: 34 Global Step: 722240 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:26:00,183-Speed 6296.52 samples/sec Loss 2.8251 LearningRate 0.0000 Epoch: 34 Global Step: 722250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:03,435-Speed 6299.16 samples/sec Loss 2.7796 LearningRate 0.0000 Epoch: 34 Global Step: 722260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:06,680-Speed 6312.56 samples/sec Loss 2.8413 LearningRate 0.0000 Epoch: 34 Global Step: 722270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:09,933-Speed 6296.92 samples/sec Loss 2.8689 LearningRate 0.0000 Epoch: 34 Global Step: 722280 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:13,183-Speed 6303.55 samples/sec Loss 2.8254 LearningRate 0.0000 Epoch: 34 Global Step: 722290 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:16,445-Speed 6279.95 samples/sec Loss 2.8139 LearningRate 0.0000 Epoch: 34 Global Step: 722300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:19,702-Speed 6288.59 samples/sec Loss 2.8226 LearningRate 0.0000 Epoch: 34 Global Step: 722310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:22,940-Speed 6326.57 samples/sec Loss 2.8687 LearningRate 0.0000 Epoch: 34 Global Step: 722320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:26,195-Speed 6294.20 samples/sec Loss 2.8279 LearningRate 0.0000 Epoch: 34 Global Step: 722330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:29,453-Speed 6286.75 samples/sec Loss 2.8265 LearningRate 0.0000 Epoch: 34 Global Step: 722340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:32,714-Speed 6281.76 samples/sec Loss 2.7747 LearningRate 0.0000 Epoch: 34 Global Step: 722350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:35,974-Speed 6283.70 samples/sec Loss 2.8777 LearningRate 0.0000 Epoch: 34 Global Step: 722360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:26:39,225-Speed 6301.17 samples/sec Loss 2.7913 LearningRate 0.0000 Epoch: 34 Global Step: 722370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:42,480-Speed 6294.45 samples/sec Loss 2.8277 LearningRate 0.0000 Epoch: 34 Global Step: 722380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:45,734-Speed 6295.16 samples/sec Loss 2.7624 LearningRate 0.0000 Epoch: 34 Global Step: 722390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:48,990-Speed 6289.53 samples/sec Loss 2.8268 LearningRate 0.0000 Epoch: 34 Global Step: 722400 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:52,251-Speed 6282.37 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 34 Global Step: 722410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:26:55,515-Speed 6275.86 samples/sec Loss 2.8204 LearningRate 0.0000 Epoch: 34 Global Step: 722420 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:26:58,760-Speed 6312.52 samples/sec Loss 2.8542 LearningRate 0.0000 Epoch: 34 Global Step: 722430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:02,012-Speed 6299.16 samples/sec Loss 2.8117 LearningRate 0.0000 Epoch: 34 Global Step: 722440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:05,278-Speed 6271.47 samples/sec Loss 2.7828 LearningRate 0.0000 Epoch: 34 Global Step: 722450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:08,543-Speed 6275.34 samples/sec Loss 2.8199 LearningRate 0.0000 Epoch: 34 Global Step: 722460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:11,800-Speed 6289.13 samples/sec Loss 2.8233 LearningRate 0.0000 Epoch: 34 Global Step: 722470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:15,109-Speed 6190.60 samples/sec Loss 2.8146 LearningRate 0.0000 Epoch: 34 Global Step: 722480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:18,402-Speed 6219.86 samples/sec Loss 
2.7723 LearningRate 0.0000 Epoch: 34 Global Step: 722490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:21,661-Speed 6286.16 samples/sec Loss 2.8145 LearningRate 0.0000 Epoch: 34 Global Step: 722500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:24,919-Speed 6287.53 samples/sec Loss 2.8690 LearningRate 0.0000 Epoch: 34 Global Step: 722510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:28,176-Speed 6289.42 samples/sec Loss 2.7271 LearningRate 0.0000 Epoch: 34 Global Step: 722520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:31,413-Speed 6327.27 samples/sec Loss 2.8236 LearningRate 0.0000 Epoch: 34 Global Step: 722530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:34,674-Speed 6282.41 samples/sec Loss 2.8072 LearningRate 0.0000 Epoch: 34 Global Step: 722540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:37,934-Speed 6283.44 samples/sec Loss 2.8127 LearningRate 0.0000 Epoch: 34 Global Step: 722550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:41,192-Speed 6288.03 samples/sec Loss 2.8204 LearningRate 0.0000 Epoch: 34 Global Step: 722560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:44,447-Speed 6292.97 samples/sec Loss 2.8001 LearningRate 0.0000 Epoch: 34 Global Step: 722570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:47,702-Speed 6294.84 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 34 Global Step: 722580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:50,960-Speed 6285.87 samples/sec Loss 2.7702 LearningRate 0.0000 Epoch: 34 Global Step: 722590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:54,215-Speed 6294.60 samples/sec Loss 2.8326 LearningRate 0.0000 Epoch: 34 Global Step: 722600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:27:57,463-Speed 6305.87 samples/sec Loss 2.8227 LearningRate 0.0000 Epoch: 34 Global 
Step: 722610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:00,721-Speed 6287.38 samples/sec Loss 2.8170 LearningRate 0.0000 Epoch: 34 Global Step: 722620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:03,953-Speed 6338.76 samples/sec Loss 2.7659 LearningRate 0.0000 Epoch: 34 Global Step: 722630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:07,206-Speed 6296.00 samples/sec Loss 2.7665 LearningRate 0.0000 Epoch: 34 Global Step: 722640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:10,458-Speed 6299.59 samples/sec Loss 2.8063 LearningRate 0.0000 Epoch: 34 Global Step: 722650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:13,714-Speed 6291.40 samples/sec Loss 2.8177 LearningRate 0.0000 Epoch: 34 Global Step: 722660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:16,969-Speed 6293.80 samples/sec Loss 2.8356 LearningRate 0.0000 Epoch: 34 Global Step: 722670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:20,224-Speed 6293.06 samples/sec Loss 2.7906 LearningRate 0.0000 Epoch: 34 Global Step: 722680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:23,484-Speed 6283.42 samples/sec Loss 2.8319 LearningRate 0.0000 Epoch: 34 Global Step: 722690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:26,744-Speed 6282.66 samples/sec Loss 2.7554 LearningRate 0.0000 Epoch: 34 Global Step: 722700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:30,006-Speed 6280.21 samples/sec Loss 2.8122 LearningRate 0.0000 Epoch: 34 Global Step: 722710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:33,265-Speed 6285.87 samples/sec Loss 2.7704 LearningRate 0.0000 Epoch: 34 Global Step: 722720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:36,510-Speed 6312.49 samples/sec Loss 2.8119 LearningRate 0.0000 Epoch: 34 Global Step: 722730 Fp16 Grad Scale: 8192 
Required: 10 hours Training: 2022-04-03 09:28:39,761-Speed 6300.98 samples/sec Loss 2.7878 LearningRate 0.0000 Epoch: 34 Global Step: 722740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:28:43,008-Speed 6308.16 samples/sec Loss 2.8244 LearningRate 0.0000 Epoch: 34 Global Step: 722750 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:28:46,242-Speed 6334.45 samples/sec Loss 2.7823 LearningRate 0.0000 Epoch: 34 Global Step: 722760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:49,494-Speed 6299.54 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 34 Global Step: 722770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:52,744-Speed 6304.16 samples/sec Loss 2.7669 LearningRate 0.0000 Epoch: 34 Global Step: 722780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:55,995-Speed 6300.22 samples/sec Loss 2.8015 LearningRate 0.0000 Epoch: 34 Global Step: 722790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:28:59,244-Speed 6305.28 samples/sec Loss 2.7640 LearningRate 0.0000 Epoch: 34 Global Step: 722800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:02,506-Speed 6280.15 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 34 Global Step: 722810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:05,765-Speed 6285.55 samples/sec Loss 2.8359 LearningRate 0.0000 Epoch: 34 Global Step: 722820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:09,024-Speed 6287.90 samples/sec Loss 2.7816 LearningRate 0.0000 Epoch: 34 Global Step: 722830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:12,294-Speed 6263.03 samples/sec Loss 2.7767 LearningRate 0.0000 Epoch: 34 Global Step: 722840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:15,555-Speed 6282.36 samples/sec Loss 2.7658 LearningRate 0.0000 Epoch: 34 Global Step: 722850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:29:18,795-Speed 6321.81 samples/sec Loss 2.8092 LearningRate 0.0000 Epoch: 34 Global Step: 722860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:22,048-Speed 6297.34 samples/sec Loss 2.7819 LearningRate 0.0000 Epoch: 34 Global Step: 722870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:25,300-Speed 6299.21 samples/sec Loss 2.8071 LearningRate 0.0000 Epoch: 34 Global Step: 722880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:28,571-Speed 6262.47 samples/sec Loss 2.8285 LearningRate 0.0000 Epoch: 34 Global Step: 722890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:31,831-Speed 6283.15 samples/sec Loss 2.7625 LearningRate 0.0000 Epoch: 34 Global Step: 722900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:35,088-Speed 6289.78 samples/sec Loss 2.8056 LearningRate 0.0000 Epoch: 34 Global Step: 722910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:38,344-Speed 6291.58 samples/sec Loss 2.8244 LearningRate 0.0000 Epoch: 34 Global Step: 722920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:41,599-Speed 6293.87 samples/sec Loss 2.8277 LearningRate 0.0000 Epoch: 34 Global Step: 722930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:44,851-Speed 6298.79 samples/sec Loss 2.8491 LearningRate 0.0000 Epoch: 34 Global Step: 722940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:48,113-Speed 6279.62 samples/sec Loss 2.8035 LearningRate 0.0000 Epoch: 34 Global Step: 722950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:29:51,369-Speed 6291.47 samples/sec Loss 2.7959 LearningRate 0.0000 Epoch: 34 Global Step: 722960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:29:54,615-Speed 6311.19 samples/sec Loss 2.7885 LearningRate 0.0000 Epoch: 34 Global Step: 722970 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:29:57,854-Speed 6323.48 samples/sec Loss 
2.8538 LearningRate 0.0000 Epoch: 34 Global Step: 722980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:01,108-Speed 6294.06 samples/sec Loss 2.8300 LearningRate 0.0000 Epoch: 34 Global Step: 722990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:04,371-Speed 6279.30 samples/sec Loss 2.8427 LearningRate 0.0000 Epoch: 34 Global Step: 723000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:07,632-Speed 6282.58 samples/sec Loss 2.7861 LearningRate 0.0000 Epoch: 34 Global Step: 723010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:10,896-Speed 6276.83 samples/sec Loss 2.7979 LearningRate 0.0000 Epoch: 34 Global Step: 723020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:14,155-Speed 6285.02 samples/sec Loss 2.8147 LearningRate 0.0000 Epoch: 34 Global Step: 723030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:17,405-Speed 6302.95 samples/sec Loss 2.7815 LearningRate 0.0000 Epoch: 34 Global Step: 723040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:20,653-Speed 6306.26 samples/sec Loss 2.8337 LearningRate 0.0000 Epoch: 34 Global Step: 723050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:23,910-Speed 6289.90 samples/sec Loss 2.8068 LearningRate 0.0000 Epoch: 34 Global Step: 723060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:27,166-Speed 6290.04 samples/sec Loss 2.8335 LearningRate 0.0000 Epoch: 34 Global Step: 723070 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:30,408-Speed 6318.19 samples/sec Loss 2.8197 LearningRate 0.0000 Epoch: 34 Global Step: 723080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:33,673-Speed 6275.71 samples/sec Loss 2.7282 LearningRate 0.0000 Epoch: 34 Global Step: 723090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:36,927-Speed 6293.62 samples/sec Loss 2.8047 LearningRate 0.0000 Epoch: 34 Global 
Step: 723100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:40,179-Speed 6299.99 samples/sec Loss 2.7460 LearningRate 0.0000 Epoch: 34 Global Step: 723110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:43,436-Speed 6289.55 samples/sec Loss 2.8230 LearningRate 0.0000 Epoch: 34 Global Step: 723120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:46,688-Speed 6297.73 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 723130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:49,950-Speed 6280.07 samples/sec Loss 2.8275 LearningRate 0.0000 Epoch: 34 Global Step: 723140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:53,197-Speed 6309.07 samples/sec Loss 2.7946 LearningRate 0.0000 Epoch: 34 Global Step: 723150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:56,444-Speed 6308.36 samples/sec Loss 2.8088 LearningRate 0.0000 Epoch: 34 Global Step: 723160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:30:59,698-Speed 6295.68 samples/sec Loss 2.8261 LearningRate 0.0000 Epoch: 34 Global Step: 723170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:02,936-Speed 6326.42 samples/sec Loss 2.8184 LearningRate 0.0000 Epoch: 34 Global Step: 723180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:06,186-Speed 6303.10 samples/sec Loss 2.8060 LearningRate 0.0000 Epoch: 34 Global Step: 723190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:09,447-Speed 6282.95 samples/sec Loss 2.8119 LearningRate 0.0000 Epoch: 34 Global Step: 723200 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:12,716-Speed 6265.96 samples/sec Loss 2.8080 LearningRate 0.0000 Epoch: 34 Global Step: 723210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:15,971-Speed 6293.33 samples/sec Loss 2.7985 LearningRate 0.0000 Epoch: 34 Global Step: 723220 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:31:19,229-Speed 6288.41 samples/sec Loss 2.7495 LearningRate 0.0000 Epoch: 34 Global Step: 723230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:22,498-Speed 6266.33 samples/sec Loss 2.7917 LearningRate 0.0000 Epoch: 34 Global Step: 723240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:25,754-Speed 6290.11 samples/sec Loss 2.7685 LearningRate 0.0000 Epoch: 34 Global Step: 723250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:29,008-Speed 6295.56 samples/sec Loss 2.7952 LearningRate 0.0000 Epoch: 34 Global Step: 723260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:32,255-Speed 6309.01 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 34 Global Step: 723270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:35,520-Speed 6274.63 samples/sec Loss 2.8506 LearningRate 0.0000 Epoch: 34 Global Step: 723280 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:31:38,780-Speed 6283.28 samples/sec Loss 2.7881 LearningRate 0.0000 Epoch: 34 Global Step: 723290 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:31:42,023-Speed 6318.14 samples/sec Loss 2.7835 LearningRate 0.0000 Epoch: 34 Global Step: 723300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:45,278-Speed 6294.61 samples/sec Loss 2.8135 LearningRate 0.0000 Epoch: 34 Global Step: 723310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:48,543-Speed 6273.30 samples/sec Loss 2.8137 LearningRate 0.0000 Epoch: 34 Global Step: 723320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:51,808-Speed 6272.96 samples/sec Loss 2.8347 LearningRate 0.0000 Epoch: 34 Global Step: 723330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:31:55,068-Speed 6285.14 samples/sec Loss 2.7707 LearningRate 0.0000 Epoch: 34 Global Step: 723340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:31:58,322-Speed 6293.86 samples/sec Loss 2.7907 LearningRate 0.0000 Epoch: 34 Global Step: 723350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:01,580-Speed 6287.68 samples/sec Loss 2.8464 LearningRate 0.0000 Epoch: 34 Global Step: 723360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:04,837-Speed 6290.41 samples/sec Loss 2.7891 LearningRate 0.0000 Epoch: 34 Global Step: 723370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:08,083-Speed 6311.25 samples/sec Loss 2.8239 LearningRate 0.0000 Epoch: 34 Global Step: 723380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:11,333-Speed 6300.98 samples/sec Loss 2.8457 LearningRate 0.0000 Epoch: 34 Global Step: 723390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:14,590-Speed 6291.03 samples/sec Loss 2.7745 LearningRate 0.0000 Epoch: 34 Global Step: 723400 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:32:17,858-Speed 6267.20 samples/sec Loss 2.7901 LearningRate 0.0000 Epoch: 34 Global Step: 723410 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:32:21,119-Speed 6281.77 samples/sec Loss 2.8286 LearningRate 0.0000 Epoch: 34 Global Step: 723420 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:32:24,360-Speed 6319.68 samples/sec Loss 2.8889 LearningRate 0.0000 Epoch: 34 Global Step: 723430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:27,621-Speed 6283.89 samples/sec Loss 2.8034 LearningRate 0.0000 Epoch: 34 Global Step: 723440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:30,882-Speed 6281.44 samples/sec Loss 2.7929 LearningRate 0.0000 Epoch: 34 Global Step: 723450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:34,143-Speed 6280.49 samples/sec Loss 2.8263 LearningRate 0.0000 Epoch: 34 Global Step: 723460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:37,407-Speed 6276.04 samples/sec Loss 
2.8061 LearningRate 0.0000 Epoch: 34 Global Step: 723470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:40,666-Speed 6287.48 samples/sec Loss 2.7940 LearningRate 0.0000 Epoch: 34 Global Step: 723480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:43,924-Speed 6286.75 samples/sec Loss 2.8096 LearningRate 0.0000 Epoch: 34 Global Step: 723490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:47,176-Speed 6299.27 samples/sec Loss 2.7755 LearningRate 0.0000 Epoch: 34 Global Step: 723500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:50,433-Speed 6289.58 samples/sec Loss 2.8019 LearningRate 0.0000 Epoch: 34 Global Step: 723510 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:53,687-Speed 6295.46 samples/sec Loss 2.7751 LearningRate 0.0000 Epoch: 34 Global Step: 723520 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:32:56,926-Speed 6322.87 samples/sec Loss 2.8226 LearningRate 0.0000 Epoch: 34 Global Step: 723530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:00,172-Speed 6310.49 samples/sec Loss 2.7802 LearningRate 0.0000 Epoch: 34 Global Step: 723540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:03,426-Speed 6296.40 samples/sec Loss 2.7820 LearningRate 0.0000 Epoch: 34 Global Step: 723550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:06,681-Speed 6292.17 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 723560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:09,940-Speed 6285.71 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 34 Global Step: 723570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:13,195-Speed 6294.19 samples/sec Loss 2.8726 LearningRate 0.0000 Epoch: 34 Global Step: 723580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:16,455-Speed 6282.97 samples/sec Loss 2.8131 LearningRate 0.0000 Epoch: 34 Global 
Step: 723590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:19,711-Speed 6292.72 samples/sec Loss 2.8108 LearningRate 0.0000 Epoch: 34 Global Step: 723600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:22,974-Speed 6276.49 samples/sec Loss 2.8447 LearningRate 0.0000 Epoch: 34 Global Step: 723610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:26,230-Speed 6292.24 samples/sec Loss 2.8797 LearningRate 0.0000 Epoch: 34 Global Step: 723620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:29,509-Speed 6246.46 samples/sec Loss 2.8238 LearningRate 0.0000 Epoch: 34 Global Step: 723630 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:33:32,744-Speed 6332.77 samples/sec Loss 2.8693 LearningRate 0.0000 Epoch: 34 Global Step: 723640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:36,004-Speed 6282.78 samples/sec Loss 2.8324 LearningRate 0.0000 Epoch: 34 Global Step: 723650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:39,272-Speed 6268.90 samples/sec Loss 2.7299 LearningRate 0.0000 Epoch: 34 Global Step: 723660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:42,532-Speed 6283.52 samples/sec Loss 2.8212 LearningRate 0.0000 Epoch: 34 Global Step: 723670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:45,787-Speed 6294.80 samples/sec Loss 2.7980 LearningRate 0.0000 Epoch: 34 Global Step: 723680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:49,040-Speed 6295.75 samples/sec Loss 2.8178 LearningRate 0.0000 Epoch: 34 Global Step: 723690 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:52,294-Speed 6294.95 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 34 Global Step: 723700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:33:55,549-Speed 6294.59 samples/sec Loss 2.8068 LearningRate 0.0000 Epoch: 34 Global Step: 723710 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:33:58,804-Speed 6292.39 samples/sec Loss 2.8478 LearningRate 0.0000 Epoch: 34 Global Step: 723720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:02,065-Speed 6281.34 samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 34 Global Step: 723730 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:05,328-Speed 6278.52 samples/sec Loss 2.8151 LearningRate 0.0000 Epoch: 34 Global Step: 723740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:34:08,570-Speed 6318.05 samples/sec Loss 2.8034 LearningRate 0.0000 Epoch: 34 Global Step: 723750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:11,830-Speed 6284.89 samples/sec Loss 2.7761 LearningRate 0.0000 Epoch: 34 Global Step: 723760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:15,121-Speed 6224.72 samples/sec Loss 2.7680 LearningRate 0.0000 Epoch: 34 Global Step: 723770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:18,371-Speed 6301.05 samples/sec Loss 2.7923 LearningRate 0.0000 Epoch: 34 Global Step: 723780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:21,628-Speed 6290.33 samples/sec Loss 2.8218 LearningRate 0.0000 Epoch: 34 Global Step: 723790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:24,882-Speed 6294.24 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 34 Global Step: 723800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:28,143-Speed 6282.93 samples/sec Loss 2.7808 LearningRate 0.0000 Epoch: 34 Global Step: 723810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:31,393-Speed 6302.71 samples/sec Loss 2.8075 LearningRate 0.0000 Epoch: 34 Global Step: 723820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:34,653-Speed 6282.30 samples/sec Loss 2.8419 LearningRate 0.0000 Epoch: 34 Global Step: 723830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:34:37,912-Speed 6286.94 samples/sec Loss 2.7847 LearningRate 0.0000 Epoch: 34 Global Step: 723840 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:41,164-Speed 6297.65 samples/sec Loss 2.8034 LearningRate 0.0000 Epoch: 34 Global Step: 723850 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:34:44,403-Speed 6324.58 samples/sec Loss 2.7886 LearningRate 0.0000 Epoch: 34 Global Step: 723860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:47,667-Speed 6276.62 samples/sec Loss 2.8211 LearningRate 0.0000 Epoch: 34 Global Step: 723870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:50,954-Speed 6232.60 samples/sec Loss 2.7898 LearningRate 0.0000 Epoch: 34 Global Step: 723880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:54,215-Speed 6281.77 samples/sec Loss 2.7806 LearningRate 0.0000 Epoch: 34 Global Step: 723890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:34:57,472-Speed 6288.41 samples/sec Loss 2.8254 LearningRate 0.0000 Epoch: 34 Global Step: 723900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:00,732-Speed 6285.46 samples/sec Loss 2.7914 LearningRate 0.0000 Epoch: 34 Global Step: 723910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:03,986-Speed 6293.83 samples/sec Loss 2.8612 LearningRate 0.0000 Epoch: 34 Global Step: 723920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:07,236-Speed 6303.90 samples/sec Loss 2.8015 LearningRate 0.0000 Epoch: 34 Global Step: 723930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:10,493-Speed 6289.92 samples/sec Loss 2.8063 LearningRate 0.0000 Epoch: 34 Global Step: 723940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:13,740-Speed 6307.23 samples/sec Loss 2.8030 LearningRate 0.0000 Epoch: 34 Global Step: 723950 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:17,014-Speed 6258.34 samples/sec Loss 
2.8041 LearningRate 0.0000 Epoch: 34 Global Step: 723960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:35:20,259-Speed 6310.79 samples/sec Loss 2.8294 LearningRate 0.0000 Epoch: 34 Global Step: 723970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:23,513-Speed 6295.75 samples/sec Loss 2.7757 LearningRate 0.0000 Epoch: 34 Global Step: 723980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:26,799-Speed 6233.13 samples/sec Loss 2.7872 LearningRate 0.0000 Epoch: 34 Global Step: 723990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:30,052-Speed 6298.97 samples/sec Loss 2.7593 LearningRate 0.0000 Epoch: 34 Global Step: 724000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:33,310-Speed 6286.71 samples/sec Loss 2.8344 LearningRate 0.0000 Epoch: 34 Global Step: 724010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:36,569-Speed 6284.76 samples/sec Loss 2.8220 LearningRate 0.0000 Epoch: 34 Global Step: 724020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:39,826-Speed 6289.35 samples/sec Loss 2.7650 LearningRate 0.0000 Epoch: 34 Global Step: 724030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:43,093-Speed 6270.00 samples/sec Loss 2.7368 LearningRate 0.0000 Epoch: 34 Global Step: 724040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:46,357-Speed 6276.36 samples/sec Loss 2.8374 LearningRate 0.0000 Epoch: 34 Global Step: 724050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:49,618-Speed 6282.02 samples/sec Loss 2.8402 LearningRate 0.0000 Epoch: 34 Global Step: 724060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:52,874-Speed 6291.25 samples/sec Loss 2.7516 LearningRate 0.0000 Epoch: 34 Global Step: 724070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:35:56,111-Speed 6328.70 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 34 Global 
Step: 724080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:35:59,373-Speed 6278.86 samples/sec Loss 2.8446 LearningRate 0.0000 Epoch: 34 Global Step: 724090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:02,634-Speed 6282.03 samples/sec Loss 2.8140 LearningRate 0.0000 Epoch: 34 Global Step: 724100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:05,888-Speed 6296.87 samples/sec Loss 2.8236 LearningRate 0.0000 Epoch: 34 Global Step: 724110 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:09,137-Speed 6304.87 samples/sec Loss 2.7946 LearningRate 0.0000 Epoch: 34 Global Step: 724120 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:12,394-Speed 6288.17 samples/sec Loss 2.7942 LearningRate 0.0000 Epoch: 34 Global Step: 724130 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:15,652-Speed 6289.32 samples/sec Loss 2.7982 LearningRate 0.0000 Epoch: 34 Global Step: 724140 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:18,908-Speed 6290.87 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 34 Global Step: 724150 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:22,161-Speed 6296.60 samples/sec Loss 2.7664 LearningRate 0.0000 Epoch: 34 Global Step: 724160 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:25,412-Speed 6301.28 samples/sec Loss 2.7619 LearningRate 0.0000 Epoch: 34 Global Step: 724170 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:28,654-Speed 6319.93 samples/sec Loss 2.8083 LearningRate 0.0000 Epoch: 34 Global Step: 724180 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:31,908-Speed 6296.63 samples/sec Loss 2.8605 LearningRate 0.0000 Epoch: 34 Global Step: 724190 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:35,165-Speed 6288.20 samples/sec Loss 2.8128 LearningRate 0.0000 Epoch: 34 Global Step: 724200 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:36:38,422-Speed 6290.40 samples/sec Loss 2.8569 LearningRate 0.0000 Epoch: 34 Global Step: 724210 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:41,680-Speed 6287.17 samples/sec Loss 2.8130 LearningRate 0.0000 Epoch: 34 Global Step: 724220 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:44,937-Speed 6288.93 samples/sec Loss 2.7612 LearningRate 0.0000 Epoch: 34 Global Step: 724230 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:48,194-Speed 6289.94 samples/sec Loss 2.8190 LearningRate 0.0000 Epoch: 34 Global Step: 724240 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:51,454-Speed 6283.63 samples/sec Loss 2.8035 LearningRate 0.0000 Epoch: 34 Global Step: 724250 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:54,714-Speed 6282.29 samples/sec Loss 2.7914 LearningRate 0.0000 Epoch: 34 Global Step: 724260 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:36:57,977-Speed 6279.46 samples/sec Loss 2.7954 LearningRate 0.0000 Epoch: 34 Global Step: 724270 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:01,235-Speed 6287.39 samples/sec Loss 2.8559 LearningRate 0.0000 Epoch: 34 Global Step: 724280 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:37:04,486-Speed 6300.20 samples/sec Loss 2.8090 LearningRate 0.0000 Epoch: 34 Global Step: 724290 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:37:07,719-Speed 6336.59 samples/sec Loss 2.8332 LearningRate 0.0000 Epoch: 34 Global Step: 724300 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:10,986-Speed 6269.41 samples/sec Loss 2.8197 LearningRate 0.0000 Epoch: 34 Global Step: 724310 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:14,239-Speed 6297.55 samples/sec Loss 2.7982 LearningRate 0.0000 Epoch: 34 Global Step: 724320 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:37:17,491-Speed 6300.14 samples/sec Loss 2.8155 LearningRate 0.0000 Epoch: 34 Global Step: 724330 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:20,738-Speed 6307.97 samples/sec Loss 2.8120 LearningRate 0.0000 Epoch: 34 Global Step: 724340 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:23,993-Speed 6293.97 samples/sec Loss 2.7524 LearningRate 0.0000 Epoch: 34 Global Step: 724350 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:27,259-Speed 6270.91 samples/sec Loss 2.7887 LearningRate 0.0000 Epoch: 34 Global Step: 724360 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:30,514-Speed 6294.06 samples/sec Loss 2.8229 LearningRate 0.0000 Epoch: 34 Global Step: 724370 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:33,773-Speed 6284.98 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 34 Global Step: 724380 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:37,029-Speed 6292.31 samples/sec Loss 2.8779 LearningRate 0.0000 Epoch: 34 Global Step: 724390 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:40,286-Speed 6288.67 samples/sec Loss 2.7642 LearningRate 0.0000 Epoch: 34 Global Step: 724400 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:37:43,529-Speed 6317.15 samples/sec Loss 2.7779 LearningRate 0.0000 Epoch: 34 Global Step: 724410 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:46,788-Speed 6285.17 samples/sec Loss 2.7792 LearningRate 0.0000 Epoch: 34 Global Step: 724420 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:50,052-Speed 6276.65 samples/sec Loss 2.8019 LearningRate 0.0000 Epoch: 34 Global Step: 724430 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:53,305-Speed 6296.81 samples/sec Loss 2.8410 LearningRate 0.0000 Epoch: 34 Global Step: 724440 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:56,558-Speed 6296.85 samples/sec Loss 
2.8599 LearningRate 0.0000 Epoch: 34 Global Step: 724450 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:37:59,816-Speed 6288.25 samples/sec Loss 2.8357 LearningRate 0.0000 Epoch: 34 Global Step: 724460 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:03,059-Speed 6315.16 samples/sec Loss 2.8084 LearningRate 0.0000 Epoch: 34 Global Step: 724470 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:06,315-Speed 6291.08 samples/sec Loss 2.7963 LearningRate 0.0000 Epoch: 34 Global Step: 724480 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:09,574-Speed 6285.31 samples/sec Loss 2.7521 LearningRate 0.0000 Epoch: 34 Global Step: 724490 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:12,826-Speed 6299.62 samples/sec Loss 2.7549 LearningRate 0.0000 Epoch: 34 Global Step: 724500 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:16,077-Speed 6301.40 samples/sec Loss 2.8661 LearningRate 0.0000 Epoch: 34 Global Step: 724510 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:38:19,333-Speed 6291.45 samples/sec Loss 2.7967 LearningRate 0.0000 Epoch: 34 Global Step: 724520 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:38:22,572-Speed 6324.09 samples/sec Loss 2.8303 LearningRate 0.0000 Epoch: 34 Global Step: 724530 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:25,831-Speed 6285.41 samples/sec Loss 2.8008 LearningRate 0.0000 Epoch: 34 Global Step: 724540 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:29,088-Speed 6289.81 samples/sec Loss 2.7872 LearningRate 0.0000 Epoch: 34 Global Step: 724550 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:32,342-Speed 6295.40 samples/sec Loss 2.7746 LearningRate 0.0000 Epoch: 34 Global Step: 724560 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:35,597-Speed 6293.28 samples/sec Loss 2.8223 LearningRate 0.0000 Epoch: 34 Global 
Step: 724570 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:38,859-Speed 6280.04 samples/sec Loss 2.7675 LearningRate 0.0000 Epoch: 34 Global Step: 724580 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:42,118-Speed 6285.09 samples/sec Loss 2.7939 LearningRate 0.0000 Epoch: 34 Global Step: 724590 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:45,366-Speed 6307.23 samples/sec Loss 2.8142 LearningRate 0.0000 Epoch: 34 Global Step: 724600 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:48,616-Speed 6302.62 samples/sec Loss 2.8189 LearningRate 0.0000 Epoch: 34 Global Step: 724610 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:51,877-Speed 6282.42 samples/sec Loss 2.7795 LearningRate 0.0000 Epoch: 34 Global Step: 724620 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:55,125-Speed 6305.74 samples/sec Loss 2.7968 LearningRate 0.0000 Epoch: 34 Global Step: 724630 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:38:58,383-Speed 6287.69 samples/sec Loss 2.8003 LearningRate 0.0000 Epoch: 34 Global Step: 724640 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:01,651-Speed 6267.56 samples/sec Loss 2.7817 LearningRate 0.0000 Epoch: 34 Global Step: 724650 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:04,898-Speed 6309.08 samples/sec Loss 2.8145 LearningRate 0.0000 Epoch: 34 Global Step: 724660 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:08,145-Speed 6308.21 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 34 Global Step: 724670 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:11,402-Speed 6291.23 samples/sec Loss 2.8232 LearningRate 0.0000 Epoch: 34 Global Step: 724680 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:14,649-Speed 6306.86 samples/sec Loss 2.8221 LearningRate 0.0000 Epoch: 34 Global Step: 724690 Fp16 Grad Scale: 4096 
Required: 10 hours Training: 2022-04-03 09:39:17,903-Speed 6295.90 samples/sec Loss 2.8507 LearningRate 0.0000 Epoch: 34 Global Step: 724700 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:21,157-Speed 6295.36 samples/sec Loss 2.7926 LearningRate 0.0000 Epoch: 34 Global Step: 724710 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:24,410-Speed 6297.82 samples/sec Loss 2.8011 LearningRate 0.0000 Epoch: 34 Global Step: 724720 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:27,662-Speed 6298.27 samples/sec Loss 2.7498 LearningRate 0.0000 Epoch: 34 Global Step: 724730 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:39:30,895-Speed 6335.66 samples/sec Loss 2.8324 LearningRate 0.0000 Epoch: 34 Global Step: 724740 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:34,165-Speed 6264.87 samples/sec Loss 2.7417 LearningRate 0.0000 Epoch: 34 Global Step: 724750 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:37,414-Speed 6305.14 samples/sec Loss 2.8304 LearningRate 0.0000 Epoch: 34 Global Step: 724760 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:40,674-Speed 6283.20 samples/sec Loss 2.7984 LearningRate 0.0000 Epoch: 34 Global Step: 724770 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:43,932-Speed 6288.18 samples/sec Loss 2.8199 LearningRate 0.0000 Epoch: 34 Global Step: 724780 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:47,192-Speed 6284.25 samples/sec Loss 2.8234 LearningRate 0.0000 Epoch: 34 Global Step: 724790 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:50,446-Speed 6294.87 samples/sec Loss 2.8060 LearningRate 0.0000 Epoch: 34 Global Step: 724800 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:39:53,705-Speed 6286.47 samples/sec Loss 2.9085 LearningRate 0.0000 Epoch: 34 Global Step: 724810 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 
09:39:56,953-Speed 6305.15 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 34 Global Step: 724820 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:00,212-Speed 6286.45 samples/sec Loss 2.7889 LearningRate 0.0000 Epoch: 34 Global Step: 724830 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:03,473-Speed 6281.66 samples/sec Loss 2.8958 LearningRate 0.0000 Epoch: 34 Global Step: 724840 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:40:06,708-Speed 6332.24 samples/sec Loss 2.8600 LearningRate 0.0000 Epoch: 34 Global Step: 724850 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:09,961-Speed 6296.23 samples/sec Loss 2.8057 LearningRate 0.0000 Epoch: 34 Global Step: 724860 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:13,217-Speed 6290.96 samples/sec Loss 2.7779 LearningRate 0.0000 Epoch: 34 Global Step: 724870 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:16,472-Speed 6294.54 samples/sec Loss 2.7656 LearningRate 0.0000 Epoch: 34 Global Step: 724880 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:19,724-Speed 6299.15 samples/sec Loss 2.7944 LearningRate 0.0000 Epoch: 34 Global Step: 724890 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:22,977-Speed 6296.97 samples/sec Loss 2.7840 LearningRate 0.0000 Epoch: 34 Global Step: 724900 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:26,233-Speed 6290.81 samples/sec Loss 2.8031 LearningRate 0.0000 Epoch: 34 Global Step: 724910 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:29,497-Speed 6276.75 samples/sec Loss 2.7803 LearningRate 0.0000 Epoch: 34 Global Step: 724920 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:32,756-Speed 6283.92 samples/sec Loss 2.8345 LearningRate 0.0000 Epoch: 34 Global Step: 724930 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:36,007-Speed 6301.98 samples/sec Loss 
2.8298 LearningRate 0.0000 Epoch: 34 Global Step: 724940 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:39,263-Speed 6291.75 samples/sec Loss 2.8482 LearningRate 0.0000 Epoch: 34 Global Step: 724950 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:40:42,509-Speed 6310.32 samples/sec Loss 2.8257 LearningRate 0.0000 Epoch: 34 Global Step: 724960 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:40:45,750-Speed 6319.93 samples/sec Loss 2.7577 LearningRate 0.0000 Epoch: 34 Global Step: 724970 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:49,007-Speed 6289.03 samples/sec Loss 2.8086 LearningRate 0.0000 Epoch: 34 Global Step: 724980 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:52,259-Speed 6299.45 samples/sec Loss 2.8030 LearningRate 0.0000 Epoch: 34 Global Step: 724990 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:55,513-Speed 6295.35 samples/sec Loss 2.7997 LearningRate 0.0000 Epoch: 34 Global Step: 725000 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:40:58,772-Speed 6285.19 samples/sec Loss 2.8425 LearningRate 0.0000 Epoch: 34 Global Step: 725010 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:02,033-Speed 6283.54 samples/sec Loss 2.8188 LearningRate 0.0000 Epoch: 34 Global Step: 725020 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:05,289-Speed 6291.18 samples/sec Loss 2.8470 LearningRate 0.0000 Epoch: 34 Global Step: 725030 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:08,546-Speed 6289.18 samples/sec Loss 2.7654 LearningRate 0.0000 Epoch: 34 Global Step: 725040 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:11,801-Speed 6292.19 samples/sec Loss 2.8186 LearningRate 0.0000 Epoch: 34 Global Step: 725050 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:15,050-Speed 6305.76 samples/sec Loss 2.7578 LearningRate 0.0000 Epoch: 34 Global 
Step: 725060 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:18,308-Speed 6287.07 samples/sec Loss 2.8486 LearningRate 0.0000 Epoch: 34 Global Step: 725070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-03 09:41:21,543-Speed 6333.51 samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 34 Global Step: 725080 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:24,798-Speed 6291.82 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 34 Global Step: 725090 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:28,059-Speed 6281.44 samples/sec Loss 2.8244 LearningRate 0.0000 Epoch: 34 Global Step: 725100 Fp16 Grad Scale: 4096 Required: 10 hours Training: 2022-04-03 09:41:31,316-Speed 6289.39 samples/sec Loss 2.7410 LearningRate 0.0000 Epoch: 34 Global Step: 725110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:34,576-Speed 6284.49 samples/sec Loss 2.7482 LearningRate 0.0000 Epoch: 34 Global Step: 725120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:37,831-Speed 6292.48 samples/sec Loss 2.7988 LearningRate 0.0000 Epoch: 34 Global Step: 725130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:41,079-Speed 6307.02 samples/sec Loss 2.7986 LearningRate 0.0000 Epoch: 34 Global Step: 725140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:44,326-Speed 6309.58 samples/sec Loss 2.8923 LearningRate 0.0000 Epoch: 34 Global Step: 725150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:47,580-Speed 6295.35 samples/sec Loss 2.8217 LearningRate 0.0000 Epoch: 34 Global Step: 725160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:50,831-Speed 6301.18 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 34 Global Step: 725170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:41:54,081-Speed 6302.28 samples/sec Loss 2.7892 LearningRate 0.0000 Epoch: 34 Global Step: 725180 Fp16 Grad Scale: 8192 Required: 9 
hours Training: 2022-04-03 09:41:57,317-Speed 6329.94 samples/sec Loss 2.8161 LearningRate 0.0000 Epoch: 34 Global Step: 725190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:00,582-Speed 6274.01 samples/sec Loss 2.7645 LearningRate 0.0000 Epoch: 34 Global Step: 725200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:03,840-Speed 6288.42 samples/sec Loss 2.7979 LearningRate 0.0000 Epoch: 34 Global Step: 725210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:07,090-Speed 6301.52 samples/sec Loss 2.7627 LearningRate 0.0000 Epoch: 34 Global Step: 725220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:10,352-Speed 6280.47 samples/sec Loss 2.7785 LearningRate 0.0000 Epoch: 34 Global Step: 725230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:13,609-Speed 6290.37 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 34 Global Step: 725240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:16,875-Speed 6272.75 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 34 Global Step: 725250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:20,136-Speed 6280.48 samples/sec Loss 2.7054 LearningRate 0.0000 Epoch: 34 Global Step: 725260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:23,396-Speed 6284.64 samples/sec Loss 2.8094 LearningRate 0.0000 Epoch: 34 Global Step: 725270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:26,645-Speed 6305.37 samples/sec Loss 2.8306 LearningRate 0.0000 Epoch: 34 Global Step: 725280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:29,903-Speed 6286.05 samples/sec Loss 2.7694 LearningRate 0.0000 Epoch: 34 Global Step: 725290 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:42:33,163-Speed 6283.44 samples/sec Loss 2.8004 LearningRate 0.0000 Epoch: 34 Global Step: 725300 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:42:36,404-Speed 6321.44 
samples/sec Loss 2.7918 LearningRate 0.0000 Epoch: 34 Global Step: 725310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:39,672-Speed 6267.65 samples/sec Loss 2.8281 LearningRate 0.0000 Epoch: 34 Global Step: 725320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:42,919-Speed 6308.75 samples/sec Loss 2.8285 LearningRate 0.0000 Epoch: 34 Global Step: 725330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:46,174-Speed 6294.48 samples/sec Loss 2.7764 LearningRate 0.0000 Epoch: 34 Global Step: 725340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:49,425-Speed 6300.13 samples/sec Loss 2.7572 LearningRate 0.0000 Epoch: 34 Global Step: 725350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:52,677-Speed 6299.93 samples/sec Loss 2.8134 LearningRate 0.0000 Epoch: 34 Global Step: 725360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:55,936-Speed 6283.99 samples/sec Loss 2.7881 LearningRate 0.0000 Epoch: 34 Global Step: 725370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:42:59,192-Speed 6292.79 samples/sec Loss 2.8181 LearningRate 0.0000 Epoch: 34 Global Step: 725380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:02,448-Speed 6289.53 samples/sec Loss 2.8206 LearningRate 0.0000 Epoch: 34 Global Step: 725390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:05,702-Speed 6295.44 samples/sec Loss 2.8058 LearningRate 0.0000 Epoch: 34 Global Step: 725400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:08,956-Speed 6296.15 samples/sec Loss 2.7524 LearningRate 0.0000 Epoch: 34 Global Step: 725410 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:43:12,189-Speed 6336.18 samples/sec Loss 2.7734 LearningRate 0.0000 Epoch: 34 Global Step: 725420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:15,445-Speed 6291.14 samples/sec Loss 2.7689 LearningRate 0.0000 Epoch: 34 
Global Step: 725430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:18,698-Speed 6297.15 samples/sec Loss 2.8089 LearningRate 0.0000 Epoch: 34 Global Step: 725440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:21,956-Speed 6288.33 samples/sec Loss 2.8122 LearningRate 0.0000 Epoch: 34 Global Step: 725450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:25,212-Speed 6291.06 samples/sec Loss 2.8152 LearningRate 0.0000 Epoch: 34 Global Step: 725460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:28,577-Speed 6087.08 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 34 Global Step: 725470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:31,882-Speed 6198.42 samples/sec Loss 2.8132 LearningRate 0.0000 Epoch: 34 Global Step: 725480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:35,133-Speed 6300.96 samples/sec Loss 2.7813 LearningRate 0.0000 Epoch: 34 Global Step: 725490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:38,384-Speed 6300.75 samples/sec Loss 2.7571 LearningRate 0.0000 Epoch: 34 Global Step: 725500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:41,639-Speed 6293.24 samples/sec Loss 2.8367 LearningRate 0.0000 Epoch: 34 Global Step: 725510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:44,885-Speed 6311.10 samples/sec Loss 2.7947 LearningRate 0.0000 Epoch: 34 Global Step: 725520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:48,154-Speed 6267.21 samples/sec Loss 2.7904 LearningRate 0.0000 Epoch: 34 Global Step: 725530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:51,409-Speed 6292.29 samples/sec Loss 2.7807 LearningRate 0.0000 Epoch: 34 Global Step: 725540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:43:54,666-Speed 6290.52 samples/sec Loss 2.7949 LearningRate 0.0000 Epoch: 34 Global Step: 725550 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:43:57,916-Speed 6302.21 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 34 Global Step: 725560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:01,168-Speed 6297.89 samples/sec Loss 2.7746 LearningRate 0.0000 Epoch: 34 Global Step: 725570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:04,424-Speed 6291.49 samples/sec Loss 2.7889 LearningRate 0.0000 Epoch: 34 Global Step: 725580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:07,680-Speed 6291.66 samples/sec Loss 2.8306 LearningRate 0.0000 Epoch: 34 Global Step: 725590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:10,938-Speed 6286.70 samples/sec Loss 2.7572 LearningRate 0.0000 Epoch: 34 Global Step: 725600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:14,196-Speed 6289.11 samples/sec Loss 2.8020 LearningRate 0.0000 Epoch: 34 Global Step: 725610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:17,451-Speed 6291.75 samples/sec Loss 2.8245 LearningRate 0.0000 Epoch: 34 Global Step: 725620 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:44:20,693-Speed 6320.34 samples/sec Loss 2.8064 LearningRate 0.0000 Epoch: 34 Global Step: 725630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:23,945-Speed 6298.02 samples/sec Loss 2.7950 LearningRate 0.0000 Epoch: 34 Global Step: 725640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:27,197-Speed 6298.48 samples/sec Loss 2.7618 LearningRate 0.0000 Epoch: 34 Global Step: 725650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:30,448-Speed 6301.83 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 34 Global Step: 725660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:33,701-Speed 6297.24 samples/sec Loss 2.8108 LearningRate 0.0000 Epoch: 34 Global Step: 725670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:36,951-Speed 6302.58 
samples/sec Loss 2.7963 LearningRate 0.0000 Epoch: 34 Global Step: 725680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:40,210-Speed 6285.81 samples/sec Loss 2.8461 LearningRate 0.0000 Epoch: 34 Global Step: 725690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:43,473-Speed 6278.32 samples/sec Loss 2.8334 LearningRate 0.0000 Epoch: 34 Global Step: 725700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:46,723-Speed 6302.33 samples/sec Loss 2.8873 LearningRate 0.0000 Epoch: 34 Global Step: 725710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:49,984-Speed 6282.39 samples/sec Loss 2.8246 LearningRate 0.0000 Epoch: 34 Global Step: 725720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:53,241-Speed 6288.93 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 34 Global Step: 725730 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:44:56,484-Speed 6317.42 samples/sec Loss 2.8225 LearningRate 0.0000 Epoch: 34 Global Step: 725740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:44:59,747-Speed 6277.69 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 34 Global Step: 725750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:02,998-Speed 6300.75 samples/sec Loss 2.8276 LearningRate 0.0000 Epoch: 34 Global Step: 725760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:06,251-Speed 6296.94 samples/sec Loss 2.7828 LearningRate 0.0000 Epoch: 34 Global Step: 725770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:09,507-Speed 6291.37 samples/sec Loss 2.7643 LearningRate 0.0000 Epoch: 34 Global Step: 725780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:12,768-Speed 6281.79 samples/sec Loss 2.7925 LearningRate 0.0000 Epoch: 34 Global Step: 725790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:16,032-Speed 6275.00 samples/sec Loss 2.8375 LearningRate 0.0000 Epoch: 34 
Global Step: 725800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:19,280-Speed 6307.51 samples/sec Loss 2.8004 LearningRate 0.0000 Epoch: 34 Global Step: 725810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:22,551-Speed 6263.58 samples/sec Loss 2.8054 LearningRate 0.0000 Epoch: 34 Global Step: 725820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:25,807-Speed 6290.89 samples/sec Loss 2.7957 LearningRate 0.0000 Epoch: 34 Global Step: 725830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:29,045-Speed 6326.21 samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 34 Global Step: 725840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:32,295-Speed 6302.25 samples/sec Loss 2.8031 LearningRate 0.0000 Epoch: 34 Global Step: 725850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:35,551-Speed 6291.22 samples/sec Loss 2.7945 LearningRate 0.0000 Epoch: 34 Global Step: 725860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:38,800-Speed 6305.13 samples/sec Loss 2.8235 LearningRate 0.0000 Epoch: 34 Global Step: 725870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:45:42,049-Speed 6304.71 samples/sec Loss 2.8612 LearningRate 0.0000 Epoch: 34 Global Step: 725880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:46:42,091-Speed 341.10 samples/sec Loss 2.7467 LearningRate 0.0000 Epoch: 35 Global Step: 725890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:46:45,337-Speed 6311.20 samples/sec Loss 2.7647 LearningRate 0.0000 Epoch: 35 Global Step: 725900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:46:48,590-Speed 6296.86 samples/sec Loss 2.8151 LearningRate 0.0000 Epoch: 35 Global Step: 725910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:46:51,844-Speed 6296.42 samples/sec Loss 2.8219 LearningRate 0.0000 Epoch: 35 Global Step: 725920 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:46:55,084-Speed 6322.12 samples/sec Loss 2.8086 LearningRate 0.0000 Epoch: 35 Global Step: 725930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:46:58,327-Speed 6316.53 samples/sec Loss 2.8071 LearningRate 0.0000 Epoch: 35 Global Step: 725940 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:47:01,569-Speed 6317.53 samples/sec Loss 2.8039 LearningRate 0.0000 Epoch: 35 Global Step: 725950 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:47:04,808-Speed 6324.30 samples/sec Loss 2.8722 LearningRate 0.0000 Epoch: 35 Global Step: 725960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:08,055-Speed 6309.71 samples/sec Loss 2.8013 LearningRate 0.0000 Epoch: 35 Global Step: 725970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:11,310-Speed 6292.36 samples/sec Loss 2.8370 LearningRate 0.0000 Epoch: 35 Global Step: 725980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:14,566-Speed 6292.25 samples/sec Loss 2.7784 LearningRate 0.0000 Epoch: 35 Global Step: 725990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:17,816-Speed 6302.90 samples/sec Loss 2.7875 LearningRate 0.0000 Epoch: 35 Global Step: 726000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:21,069-Speed 6296.76 samples/sec Loss 2.7828 LearningRate 0.0000 Epoch: 35 Global Step: 726010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:24,332-Speed 6277.85 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 35 Global Step: 726020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:27,637-Speed 6198.22 samples/sec Loss 2.8148 LearningRate 0.0000 Epoch: 35 Global Step: 726030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:30,892-Speed 6292.30 samples/sec Loss 2.8270 LearningRate 0.0000 Epoch: 35 Global Step: 726040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:34,150-Speed 6288.18 
samples/sec Loss 2.8049 LearningRate 0.0000 Epoch: 35 Global Step: 726050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:37,405-Speed 6292.24 samples/sec Loss 2.7401 LearningRate 0.0000 Epoch: 35 Global Step: 726060 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:47:40,667-Speed 6281.25 samples/sec Loss 2.8093 LearningRate 0.0000 Epoch: 35 Global Step: 726070 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:47:43,908-Speed 6319.32 samples/sec Loss 2.7905 LearningRate 0.0000 Epoch: 35 Global Step: 726080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:47,160-Speed 6298.94 samples/sec Loss 2.8013 LearningRate 0.0000 Epoch: 35 Global Step: 726090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:50,418-Speed 6288.71 samples/sec Loss 2.7848 LearningRate 0.0000 Epoch: 35 Global Step: 726100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:53,662-Speed 6313.27 samples/sec Loss 2.7453 LearningRate 0.0000 Epoch: 35 Global Step: 726110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:47:56,905-Speed 6318.23 samples/sec Loss 2.8327 LearningRate 0.0000 Epoch: 35 Global Step: 726120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:00,145-Speed 6321.88 samples/sec Loss 2.7810 LearningRate 0.0000 Epoch: 35 Global Step: 726130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:03,397-Speed 6299.32 samples/sec Loss 2.8094 LearningRate 0.0000 Epoch: 35 Global Step: 726140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:06,656-Speed 6284.82 samples/sec Loss 2.7816 LearningRate 0.0000 Epoch: 35 Global Step: 726150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:09,902-Speed 6311.78 samples/sec Loss 2.7845 LearningRate 0.0000 Epoch: 35 Global Step: 726160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:13,152-Speed 6301.68 samples/sec Loss 2.8467 LearningRate 0.0000 Epoch: 35 
Global Step: 726170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:16,399-Speed 6310.00 samples/sec Loss 2.7550 LearningRate 0.0000 Epoch: 35 Global Step: 726180 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:48:19,637-Speed 6326.60 samples/sec Loss 2.8264 LearningRate 0.0000 Epoch: 35 Global Step: 726190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:22,888-Speed 6300.60 samples/sec Loss 2.8417 LearningRate 0.0000 Epoch: 35 Global Step: 726200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:26,136-Speed 6306.71 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 35 Global Step: 726210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:29,379-Speed 6317.08 samples/sec Loss 2.7890 LearningRate 0.0000 Epoch: 35 Global Step: 726220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:32,627-Speed 6306.55 samples/sec Loss 2.7797 LearningRate 0.0000 Epoch: 35 Global Step: 726230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:35,871-Speed 6313.98 samples/sec Loss 2.7322 LearningRate 0.0000 Epoch: 35 Global Step: 726240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:39,114-Speed 6315.81 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 726250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:42,368-Speed 6296.11 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 35 Global Step: 726260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:45,614-Speed 6310.54 samples/sec Loss 2.7458 LearningRate 0.0000 Epoch: 35 Global Step: 726270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:48,865-Speed 6301.24 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 35 Global Step: 726280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:52,097-Speed 6337.48 samples/sec Loss 2.8058 LearningRate 0.0000 Epoch: 35 Global Step: 726290 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:48:55,350-Speed 6297.69 samples/sec Loss 2.7827 LearningRate 0.0000 Epoch: 35 Global Step: 726300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:48:58,598-Speed 6306.51 samples/sec Loss 2.8269 LearningRate 0.0000 Epoch: 35 Global Step: 726310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:01,847-Speed 6304.64 samples/sec Loss 2.7948 LearningRate 0.0000 Epoch: 35 Global Step: 726320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:05,103-Speed 6291.56 samples/sec Loss 2.7703 LearningRate 0.0000 Epoch: 35 Global Step: 726330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:08,349-Speed 6311.57 samples/sec Loss 2.7647 LearningRate 0.0000 Epoch: 35 Global Step: 726340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:11,600-Speed 6300.16 samples/sec Loss 2.8375 LearningRate 0.0000 Epoch: 35 Global Step: 726350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:14,856-Speed 6291.40 samples/sec Loss 2.8000 LearningRate 0.0000 Epoch: 35 Global Step: 726360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:18,104-Speed 6307.09 samples/sec Loss 2.7466 LearningRate 0.0000 Epoch: 35 Global Step: 726370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:21,349-Speed 6312.83 samples/sec Loss 2.7974 LearningRate 0.0000 Epoch: 35 Global Step: 726380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:24,605-Speed 6292.49 samples/sec Loss 2.8002 LearningRate 0.0000 Epoch: 35 Global Step: 726390 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:49:27,854-Speed 6304.38 samples/sec Loss 2.7251 LearningRate 0.0000 Epoch: 35 Global Step: 726400 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:49:31,087-Speed 6336.42 samples/sec Loss 2.7629 LearningRate 0.0000 Epoch: 35 Global Step: 726410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:34,336-Speed 6304.05 
samples/sec Loss 2.7882 LearningRate 0.0000 Epoch: 35 Global Step: 726420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:37,589-Speed 6296.38 samples/sec Loss 2.8000 LearningRate 0.0000 Epoch: 35 Global Step: 726430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:40,836-Speed 6309.51 samples/sec Loss 2.8021 LearningRate 0.0000 Epoch: 35 Global Step: 726440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:44,084-Speed 6307.31 samples/sec Loss 2.7844 LearningRate 0.0000 Epoch: 35 Global Step: 726450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:47,331-Speed 6309.19 samples/sec Loss 2.8163 LearningRate 0.0000 Epoch: 35 Global Step: 726460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:50,578-Speed 6307.27 samples/sec Loss 2.7510 LearningRate 0.0000 Epoch: 35 Global Step: 726470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:53,827-Speed 6304.84 samples/sec Loss 2.7719 LearningRate 0.0000 Epoch: 35 Global Step: 726480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:49:57,078-Speed 6301.92 samples/sec Loss 2.8003 LearningRate 0.0000 Epoch: 35 Global Step: 726490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:00,329-Speed 6303.50 samples/sec Loss 2.8335 LearningRate 0.0000 Epoch: 35 Global Step: 726500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:03,576-Speed 6308.67 samples/sec Loss 2.8294 LearningRate 0.0000 Epoch: 35 Global Step: 726510 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:50:06,832-Speed 6291.79 samples/sec Loss 2.7887 LearningRate 0.0000 Epoch: 35 Global Step: 726520 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:50:10,066-Speed 6334.84 samples/sec Loss 2.7722 LearningRate 0.0000 Epoch: 35 Global Step: 726530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:13,314-Speed 6307.54 samples/sec Loss 2.7886 LearningRate 0.0000 Epoch: 35 
Global Step: 726540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:16,575-Speed 6280.36 samples/sec Loss 2.7941 LearningRate 0.0000 Epoch: 35 Global Step: 726550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:19,824-Speed 6305.66 samples/sec Loss 2.7923 LearningRate 0.0000 Epoch: 35 Global Step: 726560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:23,076-Speed 6299.88 samples/sec Loss 2.7773 LearningRate 0.0000 Epoch: 35 Global Step: 726570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:26,318-Speed 6316.77 samples/sec Loss 2.7763 LearningRate 0.0000 Epoch: 35 Global Step: 726580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:29,568-Speed 6305.32 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 35 Global Step: 726590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:32,807-Speed 6324.54 samples/sec Loss 2.7939 LearningRate 0.0000 Epoch: 35 Global Step: 726600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:36,061-Speed 6294.93 samples/sec Loss 2.8313 LearningRate 0.0000 Epoch: 35 Global Step: 726610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:39,314-Speed 6297.08 samples/sec Loss 2.8274 LearningRate 0.0000 Epoch: 35 Global Step: 726620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:42,576-Speed 6278.57 samples/sec Loss 2.7950 LearningRate 0.0000 Epoch: 35 Global Step: 726630 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:50:45,808-Speed 6338.57 samples/sec Loss 2.7635 LearningRate 0.0000 Epoch: 35 Global Step: 726640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:49,060-Speed 6299.86 samples/sec Loss 2.7984 LearningRate 0.0000 Epoch: 35 Global Step: 726650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:52,310-Speed 6301.68 samples/sec Loss 2.7874 LearningRate 0.0000 Epoch: 35 Global Step: 726660 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:50:55,564-Speed 6295.49 samples/sec Loss 2.7493 LearningRate 0.0000 Epoch: 35 Global Step: 726670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:50:58,819-Speed 6293.23 samples/sec Loss 2.8363 LearningRate 0.0000 Epoch: 35 Global Step: 726680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:02,067-Speed 6306.45 samples/sec Loss 2.7859 LearningRate 0.0000 Epoch: 35 Global Step: 726690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:05,320-Speed 6298.17 samples/sec Loss 2.8201 LearningRate 0.0000 Epoch: 35 Global Step: 726700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:08,574-Speed 6294.50 samples/sec Loss 2.8259 LearningRate 0.0000 Epoch: 35 Global Step: 726710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:11,823-Speed 6304.26 samples/sec Loss 2.8205 LearningRate 0.0000 Epoch: 35 Global Step: 726720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:15,073-Speed 6303.99 samples/sec Loss 2.7872 LearningRate 0.0000 Epoch: 35 Global Step: 726730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:18,313-Speed 6322.49 samples/sec Loss 2.8180 LearningRate 0.0000 Epoch: 35 Global Step: 726740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:21,567-Speed 6295.93 samples/sec Loss 2.8020 LearningRate 0.0000 Epoch: 35 Global Step: 726750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:24,826-Speed 6285.63 samples/sec Loss 2.8100 LearningRate 0.0000 Epoch: 35 Global Step: 726760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:28,080-Speed 6294.28 samples/sec Loss 2.7918 LearningRate 0.0000 Epoch: 35 Global Step: 726770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:31,326-Speed 6313.18 samples/sec Loss 2.7693 LearningRate 0.0000 Epoch: 35 Global Step: 726780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:34,581-Speed 6292.62 
samples/sec Loss 2.7836 LearningRate 0.0000 Epoch: 35 Global Step: 726790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:37,835-Speed 6294.61 samples/sec Loss 2.7764 LearningRate 0.0000 Epoch: 35 Global Step: 726800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:41,096-Speed 6283.41 samples/sec Loss 2.8116 LearningRate 0.0000 Epoch: 35 Global Step: 726810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:44,348-Speed 6299.12 samples/sec Loss 2.7750 LearningRate 0.0000 Epoch: 35 Global Step: 726820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:47,593-Speed 6311.21 samples/sec Loss 2.8193 LearningRate 0.0000 Epoch: 35 Global Step: 726830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:50,838-Speed 6313.22 samples/sec Loss 2.8133 LearningRate 0.0000 Epoch: 35 Global Step: 726840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:54,089-Speed 6301.34 samples/sec Loss 2.8009 LearningRate 0.0000 Epoch: 35 Global Step: 726850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:51:57,335-Speed 6309.88 samples/sec Loss 2.7517 LearningRate 0.0000 Epoch: 35 Global Step: 726860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:00,579-Speed 6314.35 samples/sec Loss 2.7595 LearningRate 0.0000 Epoch: 35 Global Step: 726870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:03,838-Speed 6287.09 samples/sec Loss 2.8350 LearningRate 0.0000 Epoch: 35 Global Step: 726880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:07,088-Speed 6302.25 samples/sec Loss 2.7533 LearningRate 0.0000 Epoch: 35 Global Step: 726890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:10,337-Speed 6305.13 samples/sec Loss 2.7738 LearningRate 0.0000 Epoch: 35 Global Step: 726900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:13,590-Speed 6296.65 samples/sec Loss 2.7727 LearningRate 0.0000 Epoch: 35 
Global Step: 726910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:16,845-Speed 6293.38 samples/sec Loss 2.7855 LearningRate 0.0000 Epoch: 35 Global Step: 726920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:20,103-Speed 6286.52 samples/sec Loss 2.8050 LearningRate 0.0000 Epoch: 35 Global Step: 726930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:23,358-Speed 6293.83 samples/sec Loss 2.8049 LearningRate 0.0000 Epoch: 35 Global Step: 726940 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:52:26,594-Speed 6329.84 samples/sec Loss 2.7999 LearningRate 0.0000 Epoch: 35 Global Step: 726950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:29,841-Speed 6310.97 samples/sec Loss 2.7452 LearningRate 0.0000 Epoch: 35 Global Step: 726960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:33,088-Speed 6309.00 samples/sec Loss 2.7703 LearningRate 0.0000 Epoch: 35 Global Step: 726970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:36,343-Speed 6292.31 samples/sec Loss 2.7659 LearningRate 0.0000 Epoch: 35 Global Step: 726980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:39,588-Speed 6311.78 samples/sec Loss 2.8317 LearningRate 0.0000 Epoch: 35 Global Step: 726990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:42,841-Speed 6298.94 samples/sec Loss 2.8014 LearningRate 0.0000 Epoch: 35 Global Step: 727000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:46,093-Speed 6298.71 samples/sec Loss 2.7339 LearningRate 0.0000 Epoch: 35 Global Step: 727010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:49,343-Speed 6302.83 samples/sec Loss 2.7787 LearningRate 0.0000 Epoch: 35 Global Step: 727020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:52,596-Speed 6297.17 samples/sec Loss 2.7875 LearningRate 0.0000 Epoch: 35 Global Step: 727030 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:52:55,845-Speed 6304.68 samples/sec Loss 2.8257 LearningRate 0.0000 Epoch: 35 Global Step: 727040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:52:59,079-Speed 6333.26 samples/sec Loss 2.7833 LearningRate 0.0000 Epoch: 35 Global Step: 727050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:02,327-Speed 6307.02 samples/sec Loss 2.7971 LearningRate 0.0000 Epoch: 35 Global Step: 727060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:05,581-Speed 6295.68 samples/sec Loss 2.8225 LearningRate 0.0000 Epoch: 35 Global Step: 727070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:08,835-Speed 6295.45 samples/sec Loss 2.8051 LearningRate 0.0000 Epoch: 35 Global Step: 727080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:12,090-Speed 6292.37 samples/sec Loss 2.7917 LearningRate 0.0000 Epoch: 35 Global Step: 727090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:15,344-Speed 6296.41 samples/sec Loss 2.7607 LearningRate 0.0000 Epoch: 35 Global Step: 727100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:18,593-Speed 6304.29 samples/sec Loss 2.8096 LearningRate 0.0000 Epoch: 35 Global Step: 727110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:21,851-Speed 6288.28 samples/sec Loss 2.8224 LearningRate 0.0000 Epoch: 35 Global Step: 727120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:25,105-Speed 6295.00 samples/sec Loss 2.7843 LearningRate 0.0000 Epoch: 35 Global Step: 727130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:28,354-Speed 6303.81 samples/sec Loss 2.8030 LearningRate 0.0000 Epoch: 35 Global Step: 727140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:31,588-Speed 6334.00 samples/sec Loss 2.7589 LearningRate 0.0000 Epoch: 35 Global Step: 727150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:34,839-Speed 6301.69 
samples/sec Loss 2.7545 LearningRate 0.0000 Epoch: 35 Global Step: 727160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:38,098-Speed 6285.99 samples/sec Loss 2.7722 LearningRate 0.0000 Epoch: 35 Global Step: 727170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:41,363-Speed 6274.08 samples/sec Loss 2.8033 LearningRate 0.0000 Epoch: 35 Global Step: 727180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:44,611-Speed 6307.39 samples/sec Loss 2.8376 LearningRate 0.0000 Epoch: 35 Global Step: 727190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:47,858-Speed 6308.68 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 35 Global Step: 727200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:51,114-Speed 6291.40 samples/sec Loss 2.8213 LearningRate 0.0000 Epoch: 35 Global Step: 727210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:54,360-Speed 6310.49 samples/sec Loss 2.7893 LearningRate 0.0000 Epoch: 35 Global Step: 727220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:53:57,642-Speed 6242.38 samples/sec Loss 2.8207 LearningRate 0.0000 Epoch: 35 Global Step: 727230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:00,890-Speed 6307.33 samples/sec Loss 2.7954 LearningRate 0.0000 Epoch: 35 Global Step: 727240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:04,144-Speed 6293.65 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 35 Global Step: 727250 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:54:07,374-Speed 6342.40 samples/sec Loss 2.7456 LearningRate 0.0000 Epoch: 35 Global Step: 727260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:10,621-Speed 6309.68 samples/sec Loss 2.7584 LearningRate 0.0000 Epoch: 35 Global Step: 727270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:13,870-Speed 6303.44 samples/sec Loss 2.7835 LearningRate 0.0000 Epoch: 35 
Global Step: 727280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:17,124-Speed 6295.30 samples/sec Loss 2.7467 LearningRate 0.0000 Epoch: 35 Global Step: 727290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:20,378-Speed 6296.09 samples/sec Loss 2.7982 LearningRate 0.0000 Epoch: 35 Global Step: 727300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:23,631-Speed 6296.35 samples/sec Loss 2.7947 LearningRate 0.0000 Epoch: 35 Global Step: 727310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:26,889-Speed 6287.10 samples/sec Loss 2.7789 LearningRate 0.0000 Epoch: 35 Global Step: 727320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:30,144-Speed 6293.58 samples/sec Loss 2.7722 LearningRate 0.0000 Epoch: 35 Global Step: 727330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:33,403-Speed 6284.87 samples/sec Loss 2.8034 LearningRate 0.0000 Epoch: 35 Global Step: 727340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:36,662-Speed 6286.42 samples/sec Loss 2.7879 LearningRate 0.0000 Epoch: 35 Global Step: 727350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:39,906-Speed 6314.61 samples/sec Loss 2.7414 LearningRate 0.0000 Epoch: 35 Global Step: 727360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:43,163-Speed 6289.29 samples/sec Loss 2.7766 LearningRate 0.0000 Epoch: 35 Global Step: 727370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:46,423-Speed 6284.05 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 727380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:49,681-Speed 6286.59 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 727390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:52,939-Speed 6289.81 samples/sec Loss 2.8119 LearningRate 0.0000 Epoch: 35 Global Step: 727400 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:54:56,202-Speed 6277.33 samples/sec Loss 2.7773 LearningRate 0.0000 Epoch: 35 Global Step: 727410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:54:59,459-Speed 6289.24 samples/sec Loss 2.8346 LearningRate 0.0000 Epoch: 35 Global Step: 727420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:02,719-Speed 6283.40 samples/sec Loss 2.7478 LearningRate 0.0000 Epoch: 35 Global Step: 727430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:05,978-Speed 6285.20 samples/sec Loss 2.7439 LearningRate 0.0000 Epoch: 35 Global Step: 727440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:09,232-Speed 6296.25 samples/sec Loss 2.7886 LearningRate 0.0000 Epoch: 35 Global Step: 727450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:12,490-Speed 6289.37 samples/sec Loss 2.7967 LearningRate 0.0000 Epoch: 35 Global Step: 727460 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:55:15,722-Speed 6339.34 samples/sec Loss 2.7888 LearningRate 0.0000 Epoch: 35 Global Step: 727470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:18,969-Speed 6306.74 samples/sec Loss 2.7699 LearningRate 0.0000 Epoch: 35 Global Step: 727480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:22,219-Speed 6303.56 samples/sec Loss 2.7625 LearningRate 0.0000 Epoch: 35 Global Step: 727490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:25,477-Speed 6286.99 samples/sec Loss 2.8198 LearningRate 0.0000 Epoch: 35 Global Step: 727500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:28,731-Speed 6295.76 samples/sec Loss 2.7879 LearningRate 0.0000 Epoch: 35 Global Step: 727510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:31,986-Speed 6294.61 samples/sec Loss 2.7983 LearningRate 0.0000 Epoch: 35 Global Step: 727520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:35,230-Speed 6313.95 
samples/sec Loss 2.8150 LearningRate 0.0000 Epoch: 35 Global Step: 727530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:38,487-Speed 6288.82 samples/sec Loss 2.7543 LearningRate 0.0000 Epoch: 35 Global Step: 727540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:41,740-Speed 6296.79 samples/sec Loss 2.7913 LearningRate 0.0000 Epoch: 35 Global Step: 727550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:44,984-Speed 6314.40 samples/sec Loss 2.7652 LearningRate 0.0000 Epoch: 35 Global Step: 727560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:48,239-Speed 6294.52 samples/sec Loss 2.7592 LearningRate 0.0000 Epoch: 35 Global Step: 727570 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:55:51,479-Speed 6322.68 samples/sec Loss 2.7859 LearningRate 0.0000 Epoch: 35 Global Step: 727580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:54,733-Speed 6294.79 samples/sec Loss 2.8026 LearningRate 0.0000 Epoch: 35 Global Step: 727590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:55:57,983-Speed 6302.69 samples/sec Loss 2.8446 LearningRate 0.0000 Epoch: 35 Global Step: 727600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:01,246-Speed 6279.19 samples/sec Loss 2.8264 LearningRate 0.0000 Epoch: 35 Global Step: 727610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:04,499-Speed 6297.58 samples/sec Loss 2.7944 LearningRate 0.0000 Epoch: 35 Global Step: 727620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:07,750-Speed 6299.17 samples/sec Loss 2.7369 LearningRate 0.0000 Epoch: 35 Global Step: 727630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:10,999-Speed 6306.26 samples/sec Loss 2.7596 LearningRate 0.0000 Epoch: 35 Global Step: 727640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:14,241-Speed 6317.53 samples/sec Loss 2.7892 LearningRate 0.0000 Epoch: 35 
Global Step: 727650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:17,496-Speed 6294.13 samples/sec Loss 2.7776 LearningRate 0.0000 Epoch: 35 Global Step: 727660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:20,755-Speed 6285.59 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 727670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:23,999-Speed 6314.90 samples/sec Loss 2.8112 LearningRate 0.0000 Epoch: 35 Global Step: 727680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:27,251-Speed 6299.41 samples/sec Loss 2.7868 LearningRate 0.0000 Epoch: 35 Global Step: 727690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:30,507-Speed 6289.49 samples/sec Loss 2.8071 LearningRate 0.0000 Epoch: 35 Global Step: 727700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:33,759-Speed 6298.95 samples/sec Loss 2.7993 LearningRate 0.0000 Epoch: 35 Global Step: 727710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:37,021-Speed 6280.65 samples/sec Loss 2.7410 LearningRate 0.0000 Epoch: 35 Global Step: 727720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:40,281-Speed 6284.31 samples/sec Loss 2.7677 LearningRate 0.0000 Epoch: 35 Global Step: 727730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:43,530-Speed 6304.42 samples/sec Loss 2.7971 LearningRate 0.0000 Epoch: 35 Global Step: 727740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:46,780-Speed 6301.96 samples/sec Loss 2.7453 LearningRate 0.0000 Epoch: 35 Global Step: 727750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:50,034-Speed 6296.20 samples/sec Loss 2.8295 LearningRate 0.0000 Epoch: 35 Global Step: 727760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:56:53,282-Speed 6307.20 samples/sec Loss 2.8157 LearningRate 0.0000 Epoch: 35 Global Step: 727770 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:56:56,541-Speed 6283.80 samples/sec Loss 2.8507 LearningRate 0.0000 Epoch: 35 Global Step: 727780 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:56:59,780-Speed 6324.73 samples/sec Loss 2.7907 LearningRate 0.0000 Epoch: 35 Global Step: 727790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:03,149-Speed 6080.62 samples/sec Loss 2.7776 LearningRate 0.0000 Epoch: 35 Global Step: 727800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:06,426-Speed 6251.06 samples/sec Loss 2.8116 LearningRate 0.0000 Epoch: 35 Global Step: 727810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:09,684-Speed 6287.61 samples/sec Loss 2.7865 LearningRate 0.0000 Epoch: 35 Global Step: 727820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:12,941-Speed 6290.67 samples/sec Loss 2.7523 LearningRate 0.0000 Epoch: 35 Global Step: 727830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:16,222-Speed 6244.06 samples/sec Loss 2.8059 LearningRate 0.0000 Epoch: 35 Global Step: 727840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:19,504-Speed 6240.65 samples/sec Loss 2.7392 LearningRate 0.0000 Epoch: 35 Global Step: 727850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:22,761-Speed 6289.44 samples/sec Loss 2.7617 LearningRate 0.0000 Epoch: 35 Global Step: 727860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:26,030-Speed 6265.69 samples/sec Loss 2.8179 LearningRate 0.0000 Epoch: 35 Global Step: 727870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:29,285-Speed 6296.10 samples/sec Loss 2.8102 LearningRate 0.0000 Epoch: 35 Global Step: 727880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:32,519-Speed 6335.22 samples/sec Loss 2.7697 LearningRate 0.0000 Epoch: 35 Global Step: 727890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:35,777-Speed 6286.66 
samples/sec Loss 2.7401 LearningRate 0.0000 Epoch: 35 Global Step: 727900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:39,033-Speed 6291.42 samples/sec Loss 2.8348 LearningRate 0.0000 Epoch: 35 Global Step: 727910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:42,287-Speed 6295.36 samples/sec Loss 2.8035 LearningRate 0.0000 Epoch: 35 Global Step: 727920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:45,545-Speed 6287.43 samples/sec Loss 2.7523 LearningRate 0.0000 Epoch: 35 Global Step: 727930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:48,801-Speed 6290.86 samples/sec Loss 2.7649 LearningRate 0.0000 Epoch: 35 Global Step: 727940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:52,062-Speed 6282.51 samples/sec Loss 2.7484 LearningRate 0.0000 Epoch: 35 Global Step: 727950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:55,320-Speed 6287.74 samples/sec Loss 2.8023 LearningRate 0.0000 Epoch: 35 Global Step: 727960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:57:58,580-Speed 6283.50 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 35 Global Step: 727970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:01,832-Speed 6299.57 samples/sec Loss 2.7711 LearningRate 0.0000 Epoch: 35 Global Step: 727980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:05,084-Speed 6299.27 samples/sec Loss 2.7945 LearningRate 0.0000 Epoch: 35 Global Step: 727990 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:58:08,323-Speed 6323.49 samples/sec Loss 2.8106 LearningRate 0.0000 Epoch: 35 Global Step: 728000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:11,584-Speed 6281.39 samples/sec Loss 2.8263 LearningRate 0.0000 Epoch: 35 Global Step: 728010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:14,848-Speed 6276.96 samples/sec Loss 2.8578 LearningRate 0.0000 Epoch: 35 
Global Step: 728020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:18,103-Speed 6292.51 samples/sec Loss 2.7797 LearningRate 0.0000 Epoch: 35 Global Step: 728030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:21,357-Speed 6295.21 samples/sec Loss 2.7851 LearningRate 0.0000 Epoch: 35 Global Step: 728040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:24,605-Speed 6307.66 samples/sec Loss 2.7487 LearningRate 0.0000 Epoch: 35 Global Step: 728050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:27,849-Speed 6314.67 samples/sec Loss 2.7354 LearningRate 0.0000 Epoch: 35 Global Step: 728060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:31,109-Speed 6282.83 samples/sec Loss 2.7626 LearningRate 0.0000 Epoch: 35 Global Step: 728070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:34,366-Speed 6289.72 samples/sec Loss 2.7750 LearningRate 0.0000 Epoch: 35 Global Step: 728080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:37,617-Speed 6300.75 samples/sec Loss 2.7930 LearningRate 0.0000 Epoch: 35 Global Step: 728090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:40,866-Speed 6305.77 samples/sec Loss 2.7215 LearningRate 0.0000 Epoch: 35 Global Step: 728100 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:58:44,105-Speed 6324.18 samples/sec Loss 2.8068 LearningRate 0.0000 Epoch: 35 Global Step: 728110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:47,364-Speed 6285.44 samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 35 Global Step: 728120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:50,622-Speed 6286.54 samples/sec Loss 2.8078 LearningRate 0.0000 Epoch: 35 Global Step: 728130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:58:53,883-Speed 6282.59 samples/sec Loss 2.7357 LearningRate 0.0000 Epoch: 35 Global Step: 728140 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 09:58:57,136-Speed 6295.80 samples/sec Loss 2.7610 LearningRate 0.0000 Epoch: 35 Global Step: 728150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:00,394-Speed 6287.71 samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 35 Global Step: 728160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:03,657-Speed 6278.88 samples/sec Loss 2.7966 LearningRate 0.0000 Epoch: 35 Global Step: 728170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:06,910-Speed 6298.85 samples/sec Loss 2.7451 LearningRate 0.0000 Epoch: 35 Global Step: 728180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:10,166-Speed 6290.04 samples/sec Loss 2.8354 LearningRate 0.0000 Epoch: 35 Global Step: 728190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:13,422-Speed 6290.73 samples/sec Loss 2.8153 LearningRate 0.0000 Epoch: 35 Global Step: 728200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:16,651-Speed 6344.66 samples/sec Loss 2.8115 LearningRate 0.0000 Epoch: 35 Global Step: 728210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:19,906-Speed 6294.90 samples/sec Loss 2.8258 LearningRate 0.0000 Epoch: 35 Global Step: 728220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:23,168-Speed 6279.51 samples/sec Loss 2.7952 LearningRate 0.0000 Epoch: 35 Global Step: 728230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:26,427-Speed 6285.59 samples/sec Loss 2.8125 LearningRate 0.0000 Epoch: 35 Global Step: 728240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:29,677-Speed 6301.49 samples/sec Loss 2.8108 LearningRate 0.0000 Epoch: 35 Global Step: 728250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:32,929-Speed 6300.26 samples/sec Loss 2.7968 LearningRate 0.0000 Epoch: 35 Global Step: 728260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:36,183-Speed 6295.65 
samples/sec Loss 2.8156 LearningRate 0.0000 Epoch: 35 Global Step: 728270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:39,433-Speed 6302.08 samples/sec Loss 2.7654 LearningRate 0.0000 Epoch: 35 Global Step: 728280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:42,684-Speed 6301.64 samples/sec Loss 2.7768 LearningRate 0.0000 Epoch: 35 Global Step: 728290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:45,940-Speed 6290.69 samples/sec Loss 2.7492 LearningRate 0.0000 Epoch: 35 Global Step: 728300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:49,200-Speed 6283.40 samples/sec Loss 2.7808 LearningRate 0.0000 Epoch: 35 Global Step: 728310 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 09:59:52,442-Speed 6319.09 samples/sec Loss 2.7924 LearningRate 0.0000 Epoch: 35 Global Step: 728320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:55,696-Speed 6294.52 samples/sec Loss 2.8291 LearningRate 0.0000 Epoch: 35 Global Step: 728330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 09:59:58,956-Speed 6284.36 samples/sec Loss 2.7970 LearningRate 0.0000 Epoch: 35 Global Step: 728340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:02,216-Speed 6282.19 samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 35 Global Step: 728350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:05,475-Speed 6287.14 samples/sec Loss 2.7761 LearningRate 0.0000 Epoch: 35 Global Step: 728360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:08,731-Speed 6291.11 samples/sec Loss 2.7829 LearningRate 0.0000 Epoch: 35 Global Step: 728370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:11,989-Speed 6286.08 samples/sec Loss 2.7924 LearningRate 0.0000 Epoch: 35 Global Step: 728380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:15,247-Speed 6288.69 samples/sec Loss 2.7335 LearningRate 0.0000 Epoch: 35 
Global Step: 728390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:18,493-Speed 6309.23 samples/sec Loss 2.7469 LearningRate 0.0000 Epoch: 35 Global Step: 728400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:21,741-Speed 6307.05 samples/sec Loss 2.7496 LearningRate 0.0000 Epoch: 35 Global Step: 728410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:24,983-Speed 6318.86 samples/sec Loss 2.7956 LearningRate 0.0000 Epoch: 35 Global Step: 728420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:28,239-Speed 6291.59 samples/sec Loss 2.7539 LearningRate 0.0000 Epoch: 35 Global Step: 728430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:31,493-Speed 6296.40 samples/sec Loss 2.7843 LearningRate 0.0000 Epoch: 35 Global Step: 728440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:34,739-Speed 6311.34 samples/sec Loss 2.7364 LearningRate 0.0000 Epoch: 35 Global Step: 728450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:38,005-Speed 6275.12 samples/sec Loss 2.7621 LearningRate 0.0000 Epoch: 35 Global Step: 728460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:41,260-Speed 6293.28 samples/sec Loss 2.8067 LearningRate 0.0000 Epoch: 35 Global Step: 728470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:44,518-Speed 6287.19 samples/sec Loss 2.8041 LearningRate 0.0000 Epoch: 35 Global Step: 728480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:47,774-Speed 6292.03 samples/sec Loss 2.7970 LearningRate 0.0000 Epoch: 35 Global Step: 728490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:51,034-Speed 6282.42 samples/sec Loss 2.8272 LearningRate 0.0000 Epoch: 35 Global Step: 728500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:00:54,291-Speed 6289.21 samples/sec Loss 2.8067 LearningRate 0.0000 Epoch: 35 Global Step: 728510 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:00:57,539-Speed 6307.63 samples/sec Loss 2.7386 LearningRate 0.0000 Epoch: 35 Global Step: 728520 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:01:00,796-Speed 6288.47 samples/sec Loss 2.7940 LearningRate 0.0000 Epoch: 35 Global Step: 728530 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:01:04,061-Speed 6274.52 samples/sec Loss 2.7584 LearningRate 0.0000 Epoch: 35 Global Step: 728540 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:01:07,313-Speed 6299.06 samples/sec Loss 2.7480 LearningRate 0.0000 Epoch: 35 Global Step: 728550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:10,571-Speed 6287.06 samples/sec Loss 2.7700 LearningRate 0.0000 Epoch: 35 Global Step: 728560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:13,830-Speed 6286.53 samples/sec Loss 2.7390 LearningRate 0.0000 Epoch: 35 Global Step: 728570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:17,090-Speed 6284.05 samples/sec Loss 2.8015 LearningRate 0.0000 Epoch: 35 Global Step: 728580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:20,345-Speed 6292.56 samples/sec Loss 2.8015 LearningRate 0.0000 Epoch: 35 Global Step: 728590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:23,603-Speed 6286.61 samples/sec Loss 2.7430 LearningRate 0.0000 Epoch: 35 Global Step: 728600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:26,857-Speed 6295.68 samples/sec Loss 2.7535 LearningRate 0.0000 Epoch: 35 Global Step: 728610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:30,116-Speed 6285.96 samples/sec Loss 2.7540 LearningRate 0.0000 Epoch: 35 Global Step: 728620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:33,373-Speed 6290.00 samples/sec Loss 2.7585 LearningRate 0.0000 Epoch: 35 Global Step: 728630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:36,627-Speed 6293.85 
samples/sec Loss 2.7258 LearningRate 0.0000 Epoch: 35 Global Step: 728640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:39,867-Speed 6323.80 samples/sec Loss 2.7825 LearningRate 0.0000 Epoch: 35 Global Step: 728650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:43,121-Speed 6295.04 samples/sec Loss 2.7358 LearningRate 0.0000 Epoch: 35 Global Step: 728660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:46,382-Speed 6282.01 samples/sec Loss 2.7886 LearningRate 0.0000 Epoch: 35 Global Step: 728670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:49,641-Speed 6285.10 samples/sec Loss 2.7969 LearningRate 0.0000 Epoch: 35 Global Step: 728680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:52,898-Speed 6290.92 samples/sec Loss 2.7262 LearningRate 0.0000 Epoch: 35 Global Step: 728690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:56,152-Speed 6293.79 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 35 Global Step: 728700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:01:59,402-Speed 6302.52 samples/sec Loss 2.7792 LearningRate 0.0000 Epoch: 35 Global Step: 728710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:02,659-Speed 6290.43 samples/sec Loss 2.8025 LearningRate 0.0000 Epoch: 35 Global Step: 728720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:05,908-Speed 6305.34 samples/sec Loss 2.7873 LearningRate 0.0000 Epoch: 35 Global Step: 728730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:09,157-Speed 6304.79 samples/sec Loss 2.7693 LearningRate 0.0000 Epoch: 35 Global Step: 728740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:12,418-Speed 6281.73 samples/sec Loss 2.8025 LearningRate 0.0000 Epoch: 35 Global Step: 728750 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:02:15,660-Speed 6318.52 samples/sec Loss 2.7346 LearningRate 0.0000 Epoch: 35 
Global Step: 728760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:18,917-Speed 6288.99 samples/sec Loss 2.7743 LearningRate 0.0000 Epoch: 35 Global Step: 728770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:22,176-Speed 6291.55 samples/sec Loss 2.7907 LearningRate 0.0000 Epoch: 35 Global Step: 728780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:25,435-Speed 6284.81 samples/sec Loss 2.7743 LearningRate 0.0000 Epoch: 35 Global Step: 728790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:28,691-Speed 6292.09 samples/sec Loss 2.7917 LearningRate 0.0000 Epoch: 35 Global Step: 728800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:31,948-Speed 6289.49 samples/sec Loss 2.7679 LearningRate 0.0000 Epoch: 35 Global Step: 728810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:35,205-Speed 6289.35 samples/sec Loss 2.7892 LearningRate 0.0000 Epoch: 35 Global Step: 728820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:38,466-Speed 6281.39 samples/sec Loss 2.8122 LearningRate 0.0000 Epoch: 35 Global Step: 728830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:41,717-Speed 6300.06 samples/sec Loss 2.7818 LearningRate 0.0000 Epoch: 35 Global Step: 728840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:44,976-Speed 6286.81 samples/sec Loss 2.7900 LearningRate 0.0000 Epoch: 35 Global Step: 728850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:48,245-Speed 6265.36 samples/sec Loss 2.8119 LearningRate 0.0000 Epoch: 35 Global Step: 728860 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:02:51,531-Speed 6234.79 samples/sec Loss 2.8072 LearningRate 0.0000 Epoch: 35 Global Step: 728870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:02:54,784-Speed 6297.29 samples/sec Loss 2.7664 LearningRate 0.0000 Epoch: 35 Global Step: 728880 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:02:58,029-Speed 6312.07 samples/sec Loss 2.7824 LearningRate 0.0000 Epoch: 35 Global Step: 728890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:01,283-Speed 6297.15 samples/sec Loss 2.7966 LearningRate 0.0000 Epoch: 35 Global Step: 728900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:04,540-Speed 6288.01 samples/sec Loss 2.8197 LearningRate 0.0000 Epoch: 35 Global Step: 728910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:07,800-Speed 6285.24 samples/sec Loss 2.8124 LearningRate 0.0000 Epoch: 35 Global Step: 728920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:11,054-Speed 6293.36 samples/sec Loss 2.7651 LearningRate 0.0000 Epoch: 35 Global Step: 728930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:14,302-Speed 6307.45 samples/sec Loss 2.7304 LearningRate 0.0000 Epoch: 35 Global Step: 728940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:17,569-Speed 6270.45 samples/sec Loss 2.7592 LearningRate 0.0000 Epoch: 35 Global Step: 728950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:20,825-Speed 6290.94 samples/sec Loss 2.8141 LearningRate 0.0000 Epoch: 35 Global Step: 728960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:24,084-Speed 6285.86 samples/sec Loss 2.7918 LearningRate 0.0000 Epoch: 35 Global Step: 728970 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:03:27,336-Speed 6299.19 samples/sec Loss 2.8061 LearningRate 0.0000 Epoch: 35 Global Step: 728980 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:03:30,581-Speed 6311.34 samples/sec Loss 2.8239 LearningRate 0.0000 Epoch: 35 Global Step: 728990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:33,841-Speed 6284.69 samples/sec Loss 2.7361 LearningRate 0.0000 Epoch: 35 Global Step: 729000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:37,095-Speed 6294.07 
samples/sec Loss 2.7505 LearningRate 0.0000 Epoch: 35 Global Step: 729010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:40,352-Speed 6291.09 samples/sec Loss 2.8086 LearningRate 0.0000 Epoch: 35 Global Step: 729020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:43,611-Speed 6283.83 samples/sec Loss 2.7472 LearningRate 0.0000 Epoch: 35 Global Step: 729030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:46,918-Speed 6195.12 samples/sec Loss 2.7602 LearningRate 0.0000 Epoch: 35 Global Step: 729040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:50,179-Speed 6281.53 samples/sec Loss 2.7872 LearningRate 0.0000 Epoch: 35 Global Step: 729050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:53,428-Speed 6304.75 samples/sec Loss 2.7725 LearningRate 0.0000 Epoch: 35 Global Step: 729060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:56,679-Speed 6300.38 samples/sec Loss 2.8076 LearningRate 0.0000 Epoch: 35 Global Step: 729070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:03:59,927-Speed 6306.39 samples/sec Loss 2.7961 LearningRate 0.0000 Epoch: 35 Global Step: 729080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:03,172-Speed 6318.11 samples/sec Loss 2.8102 LearningRate 0.0000 Epoch: 35 Global Step: 729090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:06,430-Speed 6286.66 samples/sec Loss 2.8092 LearningRate 0.0000 Epoch: 35 Global Step: 729100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:09,685-Speed 6293.98 samples/sec Loss 2.8204 LearningRate 0.0000 Epoch: 35 Global Step: 729110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:12,949-Speed 6275.80 samples/sec Loss 2.7782 LearningRate 0.0000 Epoch: 35 Global Step: 729120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:16,204-Speed 6292.34 samples/sec Loss 2.7615 LearningRate 0.0000 Epoch: 35 
Global Step: 729130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:19,459-Speed 6294.15 samples/sec Loss 2.8120 LearningRate 0.0000 Epoch: 35 Global Step: 729140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:22,709-Speed 6303.76 samples/sec Loss 2.7811 LearningRate 0.0000 Epoch: 35 Global Step: 729150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:25,961-Speed 6297.99 samples/sec Loss 2.7682 LearningRate 0.0000 Epoch: 35 Global Step: 729160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:29,212-Speed 6300.20 samples/sec Loss 2.7893 LearningRate 0.0000 Epoch: 35 Global Step: 729170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:32,466-Speed 6296.03 samples/sec Loss 2.7964 LearningRate 0.0000 Epoch: 35 Global Step: 729180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:35,725-Speed 6286.41 samples/sec Loss 2.8039 LearningRate 0.0000 Epoch: 35 Global Step: 729190 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:04:38,979-Speed 6295.13 samples/sec Loss 2.7570 LearningRate 0.0000 Epoch: 35 Global Step: 729200 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:04:42,231-Speed 6298.17 samples/sec Loss 2.7681 LearningRate 0.0000 Epoch: 35 Global Step: 729210 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:04:45,492-Speed 6282.82 samples/sec Loss 2.7634 LearningRate 0.0000 Epoch: 35 Global Step: 729220 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:04:48,725-Speed 6334.85 samples/sec Loss 2.7480 LearningRate 0.0000 Epoch: 35 Global Step: 729230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:51,982-Speed 6290.13 samples/sec Loss 2.7648 LearningRate 0.0000 Epoch: 35 Global Step: 729240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:04:55,238-Speed 6291.63 samples/sec Loss 2.7489 LearningRate 0.0000 Epoch: 35 Global Step: 729250 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:04:58,492-Speed 6295.26 samples/sec Loss 2.7969 LearningRate 0.0000 Epoch: 35 Global Step: 729260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:01,752-Speed 6282.77 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 35 Global Step: 729270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:05,018-Speed 6272.43 samples/sec Loss 2.7746 LearningRate 0.0000 Epoch: 35 Global Step: 729280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:08,278-Speed 6283.47 samples/sec Loss 2.8073 LearningRate 0.0000 Epoch: 35 Global Step: 729290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:11,537-Speed 6285.29 samples/sec Loss 2.7346 LearningRate 0.0000 Epoch: 35 Global Step: 729300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:14,809-Speed 6261.87 samples/sec Loss 2.7441 LearningRate 0.0000 Epoch: 35 Global Step: 729310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:18,059-Speed 6303.55 samples/sec Loss 2.7771 LearningRate 0.0000 Epoch: 35 Global Step: 729320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:21,308-Speed 6305.33 samples/sec Loss 2.8248 LearningRate 0.0000 Epoch: 35 Global Step: 729330 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:05:24,552-Speed 6313.71 samples/sec Loss 2.7963 LearningRate 0.0000 Epoch: 35 Global Step: 729340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:27,834-Speed 6240.58 samples/sec Loss 2.6992 LearningRate 0.0000 Epoch: 35 Global Step: 729350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:31,135-Speed 6205.75 samples/sec Loss 2.7858 LearningRate 0.0000 Epoch: 35 Global Step: 729360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:34,378-Speed 6317.21 samples/sec Loss 2.7670 LearningRate 0.0000 Epoch: 35 Global Step: 729370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:37,631-Speed 6296.96 
samples/sec Loss 2.7612 LearningRate 0.0000 Epoch: 35 Global Step: 729380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:40,920-Speed 6229.22 samples/sec Loss 2.7220 LearningRate 0.0000 Epoch: 35 Global Step: 729390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:44,171-Speed 6300.39 samples/sec Loss 2.7857 LearningRate 0.0000 Epoch: 35 Global Step: 729400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:47,426-Speed 6292.98 samples/sec Loss 2.7503 LearningRate 0.0000 Epoch: 35 Global Step: 729410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:50,682-Speed 6292.38 samples/sec Loss 2.8286 LearningRate 0.0000 Epoch: 35 Global Step: 729420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:53,936-Speed 6293.26 samples/sec Loss 2.7502 LearningRate 0.0000 Epoch: 35 Global Step: 729430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:05:57,179-Speed 6316.60 samples/sec Loss 2.8082 LearningRate 0.0000 Epoch: 35 Global Step: 729440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:00,437-Speed 6287.30 samples/sec Loss 2.7455 LearningRate 0.0000 Epoch: 35 Global Step: 729450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:03,701-Speed 6276.69 samples/sec Loss 2.7959 LearningRate 0.0000 Epoch: 35 Global Step: 729460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:06,960-Speed 6285.13 samples/sec Loss 2.7567 LearningRate 0.0000 Epoch: 35 Global Step: 729470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:10,219-Speed 6285.42 samples/sec Loss 2.8038 LearningRate 0.0000 Epoch: 35 Global Step: 729480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:13,474-Speed 6294.43 samples/sec Loss 2.7866 LearningRate 0.0000 Epoch: 35 Global Step: 729490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:16,725-Speed 6300.94 samples/sec Loss 2.7436 LearningRate 0.0000 Epoch: 35 
Global Step: 729500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:19,979-Speed 6294.80 samples/sec Loss 2.7267 LearningRate 0.0000 Epoch: 35 Global Step: 729510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:23,230-Speed 6302.00 samples/sec Loss 2.7416 LearningRate 0.0000 Epoch: 35 Global Step: 729520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:26,481-Speed 6301.06 samples/sec Loss 2.7509 LearningRate 0.0000 Epoch: 35 Global Step: 729530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:29,724-Speed 6315.79 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 35 Global Step: 729540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:32,981-Speed 6288.76 samples/sec Loss 2.8174 LearningRate 0.0000 Epoch: 35 Global Step: 729550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:36,242-Speed 6282.58 samples/sec Loss 2.8071 LearningRate 0.0000 Epoch: 35 Global Step: 729560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:39,497-Speed 6293.83 samples/sec Loss 2.8282 LearningRate 0.0000 Epoch: 35 Global Step: 729570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:42,753-Speed 6290.38 samples/sec Loss 2.7972 LearningRate 0.0000 Epoch: 35 Global Step: 729580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:46,008-Speed 6293.83 samples/sec Loss 2.7818 LearningRate 0.0000 Epoch: 35 Global Step: 729590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:49,268-Speed 6284.43 samples/sec Loss 2.7863 LearningRate 0.0000 Epoch: 35 Global Step: 729600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:52,528-Speed 6283.30 samples/sec Loss 2.7630 LearningRate 0.0000 Epoch: 35 Global Step: 729610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:06:55,787-Speed 6285.47 samples/sec Loss 2.7354 LearningRate 0.0000 Epoch: 35 Global Step: 729620 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:06:59,040-Speed 6296.42 samples/sec Loss 2.7732 LearningRate 0.0000 Epoch: 35 Global Step: 729630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:02,299-Speed 6285.22 samples/sec Loss 2.8123 LearningRate 0.0000 Epoch: 35 Global Step: 729640 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:07:05,554-Speed 6294.51 samples/sec Loss 2.7697 LearningRate 0.0000 Epoch: 35 Global Step: 729650 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:07:08,794-Speed 6320.76 samples/sec Loss 2.7582 LearningRate 0.0000 Epoch: 35 Global Step: 729660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:12,048-Speed 6296.27 samples/sec Loss 2.7944 LearningRate 0.0000 Epoch: 35 Global Step: 729670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:15,294-Speed 6310.75 samples/sec Loss 2.7348 LearningRate 0.0000 Epoch: 35 Global Step: 729680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:18,577-Speed 6239.54 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 729690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:21,831-Speed 6296.10 samples/sec Loss 2.7935 LearningRate 0.0000 Epoch: 35 Global Step: 729700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:25,087-Speed 6289.44 samples/sec Loss 2.7792 LearningRate 0.0000 Epoch: 35 Global Step: 729710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:28,344-Speed 6291.08 samples/sec Loss 2.8017 LearningRate 0.0000 Epoch: 35 Global Step: 729720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:31,614-Speed 6264.20 samples/sec Loss 2.7426 LearningRate 0.0000 Epoch: 35 Global Step: 729730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:34,877-Speed 6277.19 samples/sec Loss 2.7537 LearningRate 0.0000 Epoch: 35 Global Step: 729740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:38,137-Speed 6284.34 
samples/sec Loss 2.7794 LearningRate 0.0000 Epoch: 35 Global Step: 729750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:41,428-Speed 6224.24 samples/sec Loss 2.7812 LearningRate 0.0000 Epoch: 35 Global Step: 729760 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:07:44,695-Speed 6271.29 samples/sec Loss 2.8033 LearningRate 0.0000 Epoch: 35 Global Step: 729770 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:07:47,939-Speed 6313.82 samples/sec Loss 2.7507 LearningRate 0.0000 Epoch: 35 Global Step: 729780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:51,191-Speed 6299.60 samples/sec Loss 2.8252 LearningRate 0.0000 Epoch: 35 Global Step: 729790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:54,445-Speed 6294.24 samples/sec Loss 2.7784 LearningRate 0.0000 Epoch: 35 Global Step: 729800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:07:57,703-Speed 6287.71 samples/sec Loss 2.7493 LearningRate 0.0000 Epoch: 35 Global Step: 729810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:00,959-Speed 6291.99 samples/sec Loss 2.7564 LearningRate 0.0000 Epoch: 35 Global Step: 729820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:04,206-Speed 6308.67 samples/sec Loss 2.7124 LearningRate 0.0000 Epoch: 35 Global Step: 729830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:07,465-Speed 6285.43 samples/sec Loss 2.7530 LearningRate 0.0000 Epoch: 35 Global Step: 729840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:10,715-Speed 6301.50 samples/sec Loss 2.7769 LearningRate 0.0000 Epoch: 35 Global Step: 729850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:13,977-Speed 6280.73 samples/sec Loss 2.7552 LearningRate 0.0000 Epoch: 35 Global Step: 729860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:17,233-Speed 6291.60 samples/sec Loss 2.7621 LearningRate 0.0000 Epoch: 35 
Global Step: 729870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:20,491-Speed 6287.20 samples/sec Loss 2.7624 LearningRate 0.0000 Epoch: 35 Global Step: 729880 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:08:23,751-Speed 6284.51 samples/sec Loss 2.7627 LearningRate 0.0000 Epoch: 35 Global Step: 729890 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:08:26,994-Speed 6316.23 samples/sec Loss 2.8155 LearningRate 0.0000 Epoch: 35 Global Step: 729900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:30,249-Speed 6291.48 samples/sec Loss 2.8180 LearningRate 0.0000 Epoch: 35 Global Step: 729910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:33,505-Speed 6291.38 samples/sec Loss 2.7681 LearningRate 0.0000 Epoch: 35 Global Step: 729920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:36,784-Speed 6248.64 samples/sec Loss 2.8123 LearningRate 0.0000 Epoch: 35 Global Step: 729930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:40,041-Speed 6289.93 samples/sec Loss 2.7295 LearningRate 0.0000 Epoch: 35 Global Step: 729940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:43,301-Speed 6283.53 samples/sec Loss 2.8112 LearningRate 0.0000 Epoch: 35 Global Step: 729950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:46,559-Speed 6287.47 samples/sec Loss 2.8044 LearningRate 0.0000 Epoch: 35 Global Step: 729960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:49,819-Speed 6283.14 samples/sec Loss 2.7727 LearningRate 0.0000 Epoch: 35 Global Step: 729970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:53,074-Speed 6294.34 samples/sec Loss 2.7939 LearningRate 0.0000 Epoch: 35 Global Step: 729980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:08:56,333-Speed 6284.17 samples/sec Loss 2.7561 LearningRate 0.0000 Epoch: 35 Global Step: 729990 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:08:59,566-Speed 6336.98 samples/sec Loss 2.7559 LearningRate 0.0000 Epoch: 35 Global Step: 730000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:02,823-Speed 6289.19 samples/sec Loss 2.7498 LearningRate 0.0000 Epoch: 35 Global Step: 730010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:06,073-Speed 6303.41 samples/sec Loss 2.8327 LearningRate 0.0000 Epoch: 35 Global Step: 730020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:09,364-Speed 6223.62 samples/sec Loss 2.7639 LearningRate 0.0000 Epoch: 35 Global Step: 730030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:12,624-Speed 6284.56 samples/sec Loss 2.7882 LearningRate 0.0000 Epoch: 35 Global Step: 730040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:15,876-Speed 6297.81 samples/sec Loss 2.7708 LearningRate 0.0000 Epoch: 35 Global Step: 730050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:19,132-Speed 6292.07 samples/sec Loss 2.7964 LearningRate 0.0000 Epoch: 35 Global Step: 730060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:22,401-Speed 6265.68 samples/sec Loss 2.8117 LearningRate 0.0000 Epoch: 35 Global Step: 730070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:25,654-Speed 6298.09 samples/sec Loss 2.7736 LearningRate 0.0000 Epoch: 35 Global Step: 730080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:28,904-Speed 6301.80 samples/sec Loss 2.7942 LearningRate 0.0000 Epoch: 35 Global Step: 730090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:32,158-Speed 6295.05 samples/sec Loss 2.7573 LearningRate 0.0000 Epoch: 35 Global Step: 730100 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:09:35,398-Speed 6322.83 samples/sec Loss 2.7546 LearningRate 0.0000 Epoch: 35 Global Step: 730110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:38,647-Speed 6305.46 
samples/sec Loss 2.7401 LearningRate 0.0000 Epoch: 35 Global Step: 730120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:41,902-Speed 6293.01 samples/sec Loss 2.8132 LearningRate 0.0000 Epoch: 35 Global Step: 730130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:45,165-Speed 6277.95 samples/sec Loss 2.7845 LearningRate 0.0000 Epoch: 35 Global Step: 730140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:48,422-Speed 6289.45 samples/sec Loss 2.7599 LearningRate 0.0000 Epoch: 35 Global Step: 730150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:51,683-Speed 6283.21 samples/sec Loss 2.8198 LearningRate 0.0000 Epoch: 35 Global Step: 730160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:54,933-Speed 6301.22 samples/sec Loss 2.7286 LearningRate 0.0000 Epoch: 35 Global Step: 730170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:09:58,191-Speed 6287.40 samples/sec Loss 2.8047 LearningRate 0.0000 Epoch: 35 Global Step: 730180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:01,447-Speed 6293.08 samples/sec Loss 2.7306 LearningRate 0.0000 Epoch: 35 Global Step: 730190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:04,701-Speed 6293.82 samples/sec Loss 2.7585 LearningRate 0.0000 Epoch: 35 Global Step: 730200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:07,949-Speed 6306.73 samples/sec Loss 2.7376 LearningRate 0.0000 Epoch: 35 Global Step: 730210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:11,202-Speed 6297.80 samples/sec Loss 2.8415 LearningRate 0.0000 Epoch: 35 Global Step: 730220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:14,455-Speed 6297.48 samples/sec Loss 2.7828 LearningRate 0.0000 Epoch: 35 Global Step: 730230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:17,711-Speed 6289.78 samples/sec Loss 2.7580 LearningRate 0.0000 Epoch: 35 
Global Step: 730240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:20,965-Speed 6295.44 samples/sec Loss 2.7682 LearningRate 0.0000 Epoch: 35 Global Step: 730250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:24,220-Speed 6294.87 samples/sec Loss 2.7803 LearningRate 0.0000 Epoch: 35 Global Step: 730260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:27,474-Speed 6293.95 samples/sec Loss 2.7399 LearningRate 0.0000 Epoch: 35 Global Step: 730270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:30,730-Speed 6291.12 samples/sec Loss 2.7987 LearningRate 0.0000 Epoch: 35 Global Step: 730280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:33,984-Speed 6295.09 samples/sec Loss 2.8047 LearningRate 0.0000 Epoch: 35 Global Step: 730290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:37,238-Speed 6295.69 samples/sec Loss 2.7758 LearningRate 0.0000 Epoch: 35 Global Step: 730300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:40,491-Speed 6297.08 samples/sec Loss 2.7797 LearningRate 0.0000 Epoch: 35 Global Step: 730310 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:10:43,740-Speed 6304.11 samples/sec Loss 2.7602 LearningRate 0.0000 Epoch: 35 Global Step: 730320 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:10:46,973-Speed 6337.54 samples/sec Loss 2.8423 LearningRate 0.0000 Epoch: 35 Global Step: 730330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:50,226-Speed 6296.54 samples/sec Loss 2.7704 LearningRate 0.0000 Epoch: 35 Global Step: 730340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:53,487-Speed 6283.12 samples/sec Loss 2.7152 LearningRate 0.0000 Epoch: 35 Global Step: 730350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:10:56,742-Speed 6293.53 samples/sec Loss 2.7887 LearningRate 0.0000 Epoch: 35 Global Step: 730360 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:11:00,025-Speed 6239.47 samples/sec Loss 2.7760 LearningRate 0.0000 Epoch: 35 Global Step: 730370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:03,399-Speed 6070.72 samples/sec Loss 2.7245 LearningRate 0.0000 Epoch: 35 Global Step: 730380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:06,727-Speed 6156.17 samples/sec Loss 2.7572 LearningRate 0.0000 Epoch: 35 Global Step: 730390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:09,976-Speed 6304.91 samples/sec Loss 2.7488 LearningRate 0.0000 Epoch: 35 Global Step: 730400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:13,230-Speed 6294.46 samples/sec Loss 2.7647 LearningRate 0.0000 Epoch: 35 Global Step: 730410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:16,486-Speed 6290.49 samples/sec Loss 2.7236 LearningRate 0.0000 Epoch: 35 Global Step: 730420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:19,732-Speed 6312.19 samples/sec Loss 2.7920 LearningRate 0.0000 Epoch: 35 Global Step: 730430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:22,988-Speed 6290.87 samples/sec Loss 2.7729 LearningRate 0.0000 Epoch: 35 Global Step: 730440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:26,249-Speed 6282.24 samples/sec Loss 2.8043 LearningRate 0.0000 Epoch: 35 Global Step: 730450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:29,506-Speed 6289.19 samples/sec Loss 2.8145 LearningRate 0.0000 Epoch: 35 Global Step: 730460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:32,766-Speed 6283.87 samples/sec Loss 2.7396 LearningRate 0.0000 Epoch: 35 Global Step: 730470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:36,024-Speed 6287.28 samples/sec Loss 2.7664 LearningRate 0.0000 Epoch: 35 Global Step: 730480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:39,282-Speed 6285.88 
samples/sec Loss 2.7465 LearningRate 0.0000 Epoch: 35 Global Step: 730490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:42,546-Speed 6276.39 samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 35 Global Step: 730500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:45,797-Speed 6302.14 samples/sec Loss 2.7706 LearningRate 0.0000 Epoch: 35 Global Step: 730510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:49,057-Speed 6283.21 samples/sec Loss 2.7613 LearningRate 0.0000 Epoch: 35 Global Step: 730520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:52,297-Speed 6322.49 samples/sec Loss 2.8137 LearningRate 0.0000 Epoch: 35 Global Step: 730530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:55,553-Speed 6290.68 samples/sec Loss 2.7253 LearningRate 0.0000 Epoch: 35 Global Step: 730540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:11:58,813-Speed 6283.34 samples/sec Loss 2.7442 LearningRate 0.0000 Epoch: 35 Global Step: 730550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:02,071-Speed 6288.27 samples/sec Loss 2.7937 LearningRate 0.0000 Epoch: 35 Global Step: 730560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:05,321-Speed 6302.15 samples/sec Loss 2.7760 LearningRate 0.0000 Epoch: 35 Global Step: 730570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:08,573-Speed 6299.72 samples/sec Loss 2.7931 LearningRate 0.0000 Epoch: 35 Global Step: 730580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:11,875-Speed 6204.96 samples/sec Loss 2.7433 LearningRate 0.0000 Epoch: 35 Global Step: 730590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:15,180-Speed 6197.60 samples/sec Loss 2.7547 LearningRate 0.0000 Epoch: 35 Global Step: 730600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:18,438-Speed 6287.69 samples/sec Loss 2.7752 LearningRate 0.0000 Epoch: 35 
Global Step: 730610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:21,692-Speed 6294.60 samples/sec Loss 2.7591 LearningRate 0.0000 Epoch: 35 Global Step: 730620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:24,946-Speed 6295.65 samples/sec Loss 2.8352 LearningRate 0.0000 Epoch: 35 Global Step: 730630 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:12:28,191-Speed 6312.64 samples/sec Loss 2.7542 LearningRate 0.0000 Epoch: 35 Global Step: 730640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:31,450-Speed 6285.69 samples/sec Loss 2.7314 LearningRate 0.0000 Epoch: 35 Global Step: 730650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:34,706-Speed 6290.52 samples/sec Loss 2.7542 LearningRate 0.0000 Epoch: 35 Global Step: 730660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:37,962-Speed 6291.75 samples/sec Loss 2.8042 LearningRate 0.0000 Epoch: 35 Global Step: 730670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:41,219-Speed 6289.09 samples/sec Loss 2.8461 LearningRate 0.0000 Epoch: 35 Global Step: 730680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:44,472-Speed 6296.78 samples/sec Loss 2.7923 LearningRate 0.0000 Epoch: 35 Global Step: 730690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:47,742-Speed 6264.34 samples/sec Loss 2.7346 LearningRate 0.0000 Epoch: 35 Global Step: 730700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:51,001-Speed 6286.29 samples/sec Loss 2.7424 LearningRate 0.0000 Epoch: 35 Global Step: 730710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:54,256-Speed 6292.96 samples/sec Loss 2.7385 LearningRate 0.0000 Epoch: 35 Global Step: 730720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:12:57,511-Speed 6293.41 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 730730 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:13:00,751-Speed 6323.27 samples/sec Loss 2.7777 LearningRate 0.0000 Epoch: 35 Global Step: 730740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:04,008-Speed 6287.66 samples/sec Loss 2.8086 LearningRate 0.0000 Epoch: 35 Global Step: 730750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:07,265-Speed 6291.05 samples/sec Loss 2.7660 LearningRate 0.0000 Epoch: 35 Global Step: 730760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:10,519-Speed 6293.83 samples/sec Loss 2.7460 LearningRate 0.0000 Epoch: 35 Global Step: 730770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:13,778-Speed 6285.89 samples/sec Loss 2.8004 LearningRate 0.0000 Epoch: 35 Global Step: 730780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:17,039-Speed 6282.80 samples/sec Loss 2.7913 LearningRate 0.0000 Epoch: 35 Global Step: 730790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:20,295-Speed 6290.34 samples/sec Loss 2.7758 LearningRate 0.0000 Epoch: 35 Global Step: 730800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:23,559-Speed 6276.61 samples/sec Loss 2.7281 LearningRate 0.0000 Epoch: 35 Global Step: 730810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:26,808-Speed 6306.03 samples/sec Loss 2.7934 LearningRate 0.0000 Epoch: 35 Global Step: 730820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:30,069-Speed 6281.22 samples/sec Loss 2.7128 LearningRate 0.0000 Epoch: 35 Global Step: 730830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:33,326-Speed 6288.78 samples/sec Loss 2.7631 LearningRate 0.0000 Epoch: 35 Global Step: 730840 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:13:36,565-Speed 6324.77 samples/sec Loss 2.7185 LearningRate 0.0000 Epoch: 35 Global Step: 730850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:39,825-Speed 6282.53 
samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 35 Global Step: 730860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:43,088-Speed 6279.01 samples/sec Loss 2.7779 LearningRate 0.0000 Epoch: 35 Global Step: 730870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:46,343-Speed 6292.29 samples/sec Loss 2.7813 LearningRate 0.0000 Epoch: 35 Global Step: 730880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:49,598-Speed 6293.80 samples/sec Loss 2.7554 LearningRate 0.0000 Epoch: 35 Global Step: 730890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:52,859-Speed 6282.76 samples/sec Loss 2.7776 LearningRate 0.0000 Epoch: 35 Global Step: 730900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:56,113-Speed 6294.63 samples/sec Loss 2.8088 LearningRate 0.0000 Epoch: 35 Global Step: 730910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:13:59,371-Speed 6286.92 samples/sec Loss 2.8048 LearningRate 0.0000 Epoch: 35 Global Step: 730920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:02,628-Speed 6289.30 samples/sec Loss 2.8007 LearningRate 0.0000 Epoch: 35 Global Step: 730930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:05,884-Speed 6290.74 samples/sec Loss 2.7481 LearningRate 0.0000 Epoch: 35 Global Step: 730940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:09,142-Speed 6289.11 samples/sec Loss 2.7706 LearningRate 0.0000 Epoch: 35 Global Step: 730950 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:14:12,454-Speed 6184.15 samples/sec Loss 2.7823 LearningRate 0.0000 Epoch: 35 Global Step: 730960 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:14:15,693-Speed 6324.72 samples/sec Loss 2.7323 LearningRate 0.0000 Epoch: 35 Global Step: 730970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:18,943-Speed 6303.11 samples/sec Loss 2.7583 LearningRate 0.0000 Epoch: 35 
Global Step: 730980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:22,201-Speed 6287.58 samples/sec Loss 2.7971 LearningRate 0.0000 Epoch: 35 Global Step: 730990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:25,456-Speed 6294.17 samples/sec Loss 2.8072 LearningRate 0.0000 Epoch: 35 Global Step: 731000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:28,707-Speed 6301.02 samples/sec Loss 2.7554 LearningRate 0.0000 Epoch: 35 Global Step: 731010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:31,965-Speed 6286.40 samples/sec Loss 2.7725 LearningRate 0.0000 Epoch: 35 Global Step: 731020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:35,228-Speed 6277.62 samples/sec Loss 2.7903 LearningRate 0.0000 Epoch: 35 Global Step: 731030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:38,482-Speed 6295.11 samples/sec Loss 2.7636 LearningRate 0.0000 Epoch: 35 Global Step: 731040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:41,734-Speed 6300.96 samples/sec Loss 2.7561 LearningRate 0.0000 Epoch: 35 Global Step: 731050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:44,983-Speed 6304.18 samples/sec Loss 2.7980 LearningRate 0.0000 Epoch: 35 Global Step: 731060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:48,223-Speed 6321.41 samples/sec Loss 2.7722 LearningRate 0.0000 Epoch: 35 Global Step: 731070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:51,475-Speed 6299.54 samples/sec Loss 2.7281 LearningRate 0.0000 Epoch: 35 Global Step: 731080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:54,725-Speed 6302.21 samples/sec Loss 2.7745 LearningRate 0.0000 Epoch: 35 Global Step: 731090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:14:57,978-Speed 6298.08 samples/sec Loss 2.7415 LearningRate 0.0000 Epoch: 35 Global Step: 731100 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:15:01,238-Speed 6282.95 samples/sec Loss 2.7653 LearningRate 0.0000 Epoch: 35 Global Step: 731110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:04,498-Speed 6283.65 samples/sec Loss 2.7868 LearningRate 0.0000 Epoch: 35 Global Step: 731120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:07,750-Speed 6298.48 samples/sec Loss 2.8235 LearningRate 0.0000 Epoch: 35 Global Step: 731130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:11,015-Speed 6275.34 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 35 Global Step: 731140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:14,282-Speed 6268.62 samples/sec Loss 2.8186 LearningRate 0.0000 Epoch: 35 Global Step: 731150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:17,546-Speed 6276.37 samples/sec Loss 2.7523 LearningRate 0.0000 Epoch: 35 Global Step: 731160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:20,788-Speed 6318.23 samples/sec Loss 2.8036 LearningRate 0.0000 Epoch: 35 Global Step: 731170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:24,038-Speed 6303.20 samples/sec Loss 2.7967 LearningRate 0.0000 Epoch: 35 Global Step: 731180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:27,287-Speed 6305.04 samples/sec Loss 2.8102 LearningRate 0.0000 Epoch: 35 Global Step: 731190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:30,542-Speed 6293.38 samples/sec Loss 2.7509 LearningRate 0.0000 Epoch: 35 Global Step: 731200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:33,825-Speed 6239.57 samples/sec Loss 2.8037 LearningRate 0.0000 Epoch: 35 Global Step: 731210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:37,079-Speed 6296.99 samples/sec Loss 2.7904 LearningRate 0.0000 Epoch: 35 Global Step: 731220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:40,341-Speed 6277.87 
samples/sec Loss 2.7419 LearningRate 0.0000 Epoch: 35 Global Step: 731230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:43,595-Speed 6297.18 samples/sec Loss 2.7183 LearningRate 0.0000 Epoch: 35 Global Step: 731240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:46,843-Speed 6305.11 samples/sec Loss 2.7556 LearningRate 0.0000 Epoch: 35 Global Step: 731250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:50,100-Speed 6290.28 samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 35 Global Step: 731260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:53,342-Speed 6317.45 samples/sec Loss 2.7705 LearningRate 0.0000 Epoch: 35 Global Step: 731270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:56,605-Speed 6279.24 samples/sec Loss 2.7626 LearningRate 0.0000 Epoch: 35 Global Step: 731280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:15:59,863-Speed 6286.64 samples/sec Loss 2.7715 LearningRate 0.0000 Epoch: 35 Global Step: 731290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:03,124-Speed 6282.51 samples/sec Loss 2.7654 LearningRate 0.0000 Epoch: 35 Global Step: 731300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:06,383-Speed 6284.48 samples/sec Loss 2.7804 LearningRate 0.0000 Epoch: 35 Global Step: 731310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:09,636-Speed 6296.99 samples/sec Loss 2.7673 LearningRate 0.0000 Epoch: 35 Global Step: 731320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:12,897-Speed 6282.08 samples/sec Loss 2.7348 LearningRate 0.0000 Epoch: 35 Global Step: 731330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:16,155-Speed 6286.98 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 35 Global Step: 731340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:19,409-Speed 6294.89 samples/sec Loss 2.7625 LearningRate 0.0000 Epoch: 35 
Global Step: 731350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:22,662-Speed 6297.42 samples/sec Loss 2.7927 LearningRate 0.0000 Epoch: 35 Global Step: 731360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:25,923-Speed 6281.44 samples/sec Loss 2.7331 LearningRate 0.0000 Epoch: 35 Global Step: 731370 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:16:29,161-Speed 6327.13 samples/sec Loss 2.7580 LearningRate 0.0000 Epoch: 35 Global Step: 731380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:32,422-Speed 6281.02 samples/sec Loss 2.8036 LearningRate 0.0000 Epoch: 35 Global Step: 731390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:35,673-Speed 6300.81 samples/sec Loss 2.7460 LearningRate 0.0000 Epoch: 35 Global Step: 731400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:38,923-Speed 6302.94 samples/sec Loss 2.7954 LearningRate 0.0000 Epoch: 35 Global Step: 731410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:42,186-Speed 6279.35 samples/sec Loss 2.8037 LearningRate 0.0000 Epoch: 35 Global Step: 731420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:45,436-Speed 6302.48 samples/sec Loss 2.7700 LearningRate 0.0000 Epoch: 35 Global Step: 731430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:48,690-Speed 6294.78 samples/sec Loss 2.7412 LearningRate 0.0000 Epoch: 35 Global Step: 731440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:51,938-Speed 6308.12 samples/sec Loss 2.7124 LearningRate 0.0000 Epoch: 35 Global Step: 731450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:55,189-Speed 6300.51 samples/sec Loss 2.7742 LearningRate 0.0000 Epoch: 35 Global Step: 731460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:16:58,440-Speed 6301.11 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 35 Global Step: 731470 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:17:01,681-Speed 6320.76 samples/sec Loss 2.7254 LearningRate 0.0000 Epoch: 35 Global Step: 731480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:04,944-Speed 6277.32 samples/sec Loss 2.8116 LearningRate 0.0000 Epoch: 35 Global Step: 731490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:08,201-Speed 6289.63 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 35 Global Step: 731500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:11,460-Speed 6284.67 samples/sec Loss 2.7521 LearningRate 0.0000 Epoch: 35 Global Step: 731510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:14,720-Speed 6283.50 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 731520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:17,979-Speed 6286.08 samples/sec Loss 2.7645 LearningRate 0.0000 Epoch: 35 Global Step: 731530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:21,256-Speed 6250.22 samples/sec Loss 2.7502 LearningRate 0.0000 Epoch: 35 Global Step: 731540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:24,531-Speed 6255.82 samples/sec Loss 2.8011 LearningRate 0.0000 Epoch: 35 Global Step: 731550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:27,819-Speed 6229.89 samples/sec Loss 2.7724 LearningRate 0.0000 Epoch: 35 Global Step: 731560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:31,109-Speed 6225.99 samples/sec Loss 2.7944 LearningRate 0.0000 Epoch: 35 Global Step: 731570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:34,363-Speed 6296.03 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 35 Global Step: 731580 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:17:37,595-Speed 6337.26 samples/sec Loss 2.7580 LearningRate 0.0000 Epoch: 35 Global Step: 731590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:40,857-Speed 6280.05 
samples/sec Loss 2.7666 LearningRate 0.0000 Epoch: 35 Global Step: 731600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:44,110-Speed 6296.26 samples/sec Loss 2.7533 LearningRate 0.0000 Epoch: 35 Global Step: 731610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:47,371-Speed 6283.14 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 35 Global Step: 731620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:50,629-Speed 6285.93 samples/sec Loss 2.8129 LearningRate 0.0000 Epoch: 35 Global Step: 731630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:53,888-Speed 6287.87 samples/sec Loss 2.7517 LearningRate 0.0000 Epoch: 35 Global Step: 731640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:17:57,147-Speed 6286.15 samples/sec Loss 2.7617 LearningRate 0.0000 Epoch: 35 Global Step: 731650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:00,399-Speed 6298.09 samples/sec Loss 2.7419 LearningRate 0.0000 Epoch: 35 Global Step: 731660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:03,655-Speed 6291.19 samples/sec Loss 2.7558 LearningRate 0.0000 Epoch: 35 Global Step: 731670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:06,909-Speed 6295.14 samples/sec Loss 2.7636 LearningRate 0.0000 Epoch: 35 Global Step: 731680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:10,165-Speed 6290.84 samples/sec Loss 2.7751 LearningRate 0.0000 Epoch: 35 Global Step: 731690 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:18:13,427-Speed 6279.88 samples/sec Loss 2.7531 LearningRate 0.0000 Epoch: 35 Global Step: 731700 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:18:16,682-Speed 6293.08 samples/sec Loss 2.7298 LearningRate 0.0000 Epoch: 35 Global Step: 731710 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:18:19,932-Speed 6304.38 samples/sec Loss 2.7941 LearningRate 0.0000 Epoch: 35 
Global Step: 731720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:23,189-Speed 6288.66 samples/sec Loss 2.7895 LearningRate 0.0000 Epoch: 35 Global Step: 731730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:26,438-Speed 6304.70 samples/sec Loss 2.7912 LearningRate 0.0000 Epoch: 35 Global Step: 731740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:29,688-Speed 6303.22 samples/sec Loss 2.7819 LearningRate 0.0000 Epoch: 35 Global Step: 731750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:32,940-Speed 6299.68 samples/sec Loss 2.7679 LearningRate 0.0000 Epoch: 35 Global Step: 731760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:36,191-Speed 6300.95 samples/sec Loss 2.8006 LearningRate 0.0000 Epoch: 35 Global Step: 731770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:39,448-Speed 6289.12 samples/sec Loss 2.8187 LearningRate 0.0000 Epoch: 35 Global Step: 731780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:42,705-Speed 6288.55 samples/sec Loss 2.7149 LearningRate 0.0000 Epoch: 35 Global Step: 731790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:45,958-Speed 6296.77 samples/sec Loss 2.7427 LearningRate 0.0000 Epoch: 35 Global Step: 731800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:49,212-Speed 6296.20 samples/sec Loss 2.7557 LearningRate 0.0000 Epoch: 35 Global Step: 731810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:52,450-Speed 6325.35 samples/sec Loss 2.7926 LearningRate 0.0000 Epoch: 35 Global Step: 731820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:55,707-Speed 6289.42 samples/sec Loss 2.7930 LearningRate 0.0000 Epoch: 35 Global Step: 731830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:18:58,964-Speed 6290.76 samples/sec Loss 2.8032 LearningRate 0.0000 Epoch: 35 Global Step: 731840 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:19:02,231-Speed 6270.09 samples/sec Loss 2.7822 LearningRate 0.0000 Epoch: 35 Global Step: 731850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:05,546-Speed 6180.05 samples/sec Loss 2.7471 LearningRate 0.0000 Epoch: 35 Global Step: 731860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:08,805-Speed 6285.95 samples/sec Loss 2.7615 LearningRate 0.0000 Epoch: 35 Global Step: 731870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:12,065-Speed 6282.25 samples/sec Loss 2.7816 LearningRate 0.0000 Epoch: 35 Global Step: 731880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:15,315-Speed 6302.73 samples/sec Loss 2.7037 LearningRate 0.0000 Epoch: 35 Global Step: 731890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:18,571-Speed 6291.67 samples/sec Loss 2.8050 LearningRate 0.0000 Epoch: 35 Global Step: 731900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:21,837-Speed 6272.57 samples/sec Loss 2.8165 LearningRate 0.0000 Epoch: 35 Global Step: 731910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:25,092-Speed 6292.77 samples/sec Loss 2.7974 LearningRate 0.0000 Epoch: 35 Global Step: 731920 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:19:28,346-Speed 6295.03 samples/sec Loss 2.7495 LearningRate 0.0000 Epoch: 35 Global Step: 731930 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:19:31,585-Speed 6325.60 samples/sec Loss 2.7831 LearningRate 0.0000 Epoch: 35 Global Step: 731940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:34,837-Speed 6298.11 samples/sec Loss 2.7297 LearningRate 0.0000 Epoch: 35 Global Step: 731950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:38,096-Speed 6285.29 samples/sec Loss 2.7852 LearningRate 0.0000 Epoch: 35 Global Step: 731960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:41,352-Speed 6292.19 
samples/sec Loss 2.7521 LearningRate 0.0000 Epoch: 35 Global Step: 731970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:44,614-Speed 6278.36 samples/sec Loss 2.7872 LearningRate 0.0000 Epoch: 35 Global Step: 731980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:47,868-Speed 6295.90 samples/sec Loss 2.7554 LearningRate 0.0000 Epoch: 35 Global Step: 731990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:51,123-Speed 6293.68 samples/sec Loss 2.7705 LearningRate 0.0000 Epoch: 35 Global Step: 732000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:54,373-Speed 6303.25 samples/sec Loss 2.8141 LearningRate 0.0000 Epoch: 35 Global Step: 732010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:19:57,634-Speed 6281.25 samples/sec Loss 2.7446 LearningRate 0.0000 Epoch: 35 Global Step: 732020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:00,886-Speed 6298.43 samples/sec Loss 2.7439 LearningRate 0.0000 Epoch: 35 Global Step: 732030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:04,129-Speed 6316.65 samples/sec Loss 2.7561 LearningRate 0.0000 Epoch: 35 Global Step: 732040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:07,389-Speed 6284.76 samples/sec Loss 2.7897 LearningRate 0.0000 Epoch: 35 Global Step: 732050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:10,641-Speed 6298.01 samples/sec Loss 2.7657 LearningRate 0.0000 Epoch: 35 Global Step: 732060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:13,892-Speed 6301.23 samples/sec Loss 2.8178 LearningRate 0.0000 Epoch: 35 Global Step: 732070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:17,155-Speed 6278.74 samples/sec Loss 2.7654 LearningRate 0.0000 Epoch: 35 Global Step: 732080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:20,416-Speed 6282.28 samples/sec Loss 2.8026 LearningRate 0.0000 Epoch: 35 
Global Step: 732090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:23,679-Speed 6277.54 samples/sec Loss 2.7731 LearningRate 0.0000 Epoch: 35 Global Step: 732100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:26,928-Speed 6305.27 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 35 Global Step: 732110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:30,186-Speed 6289.81 samples/sec Loss 2.7614 LearningRate 0.0000 Epoch: 35 Global Step: 732120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:33,455-Speed 6266.41 samples/sec Loss 2.7824 LearningRate 0.0000 Epoch: 35 Global Step: 732130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:36,713-Speed 6288.10 samples/sec Loss 2.7542 LearningRate 0.0000 Epoch: 35 Global Step: 732140 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:20:39,954-Speed 6320.21 samples/sec Loss 2.7861 LearningRate 0.0000 Epoch: 35 Global Step: 732150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:43,211-Speed 6288.40 samples/sec Loss 2.7197 LearningRate 0.0000 Epoch: 35 Global Step: 732160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:46,470-Speed 6285.66 samples/sec Loss 2.7774 LearningRate 0.0000 Epoch: 35 Global Step: 732170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:49,733-Speed 6278.41 samples/sec Loss 2.8051 LearningRate 0.0000 Epoch: 35 Global Step: 732180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:52,983-Speed 6303.02 samples/sec Loss 2.7560 LearningRate 0.0000 Epoch: 35 Global Step: 732190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:56,243-Speed 6282.79 samples/sec Loss 2.7306 LearningRate 0.0000 Epoch: 35 Global Step: 732200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:20:59,505-Speed 6280.55 samples/sec Loss 2.7305 LearningRate 0.0000 Epoch: 35 Global Step: 732210 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:21:02,763-Speed 6286.32 samples/sec Loss 2.7926 LearningRate 0.0000 Epoch: 35 Global Step: 732220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:06,023-Speed 6285.16 samples/sec Loss 2.7624 LearningRate 0.0000 Epoch: 35 Global Step: 732230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:09,282-Speed 6284.24 samples/sec Loss 2.8093 LearningRate 0.0000 Epoch: 35 Global Step: 732240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:12,521-Speed 6324.78 samples/sec Loss 2.7134 LearningRate 0.0000 Epoch: 35 Global Step: 732250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:15,783-Speed 6279.63 samples/sec Loss 2.7515 LearningRate 0.0000 Epoch: 35 Global Step: 732260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:19,039-Speed 6292.23 samples/sec Loss 2.7269 LearningRate 0.0000 Epoch: 35 Global Step: 732270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:22,306-Speed 6270.03 samples/sec Loss 2.7789 LearningRate 0.0000 Epoch: 35 Global Step: 732280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:25,567-Speed 6281.43 samples/sec Loss 2.7712 LearningRate 0.0000 Epoch: 35 Global Step: 732290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:28,814-Speed 6308.73 samples/sec Loss 2.7836 LearningRate 0.0000 Epoch: 35 Global Step: 732300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:32,071-Speed 6291.25 samples/sec Loss 2.7620 LearningRate 0.0000 Epoch: 35 Global Step: 732310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:35,330-Speed 6285.68 samples/sec Loss 2.7280 LearningRate 0.0000 Epoch: 35 Global Step: 732320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:38,591-Speed 6281.51 samples/sec Loss 2.7553 LearningRate 0.0000 Epoch: 35 Global Step: 732330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:41,847-Speed 6290.98 
samples/sec Loss 2.7725 LearningRate 0.0000 Epoch: 35 Global Step: 732340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:45,086-Speed 6323.84 samples/sec Loss 2.7627 LearningRate 0.0000 Epoch: 35 Global Step: 732350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:48,354-Speed 6269.81 samples/sec Loss 2.7841 LearningRate 0.0000 Epoch: 35 Global Step: 732360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:51,606-Speed 6298.08 samples/sec Loss 2.7491 LearningRate 0.0000 Epoch: 35 Global Step: 732370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:54,861-Speed 6293.41 samples/sec Loss 2.7469 LearningRate 0.0000 Epoch: 35 Global Step: 732380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:21:58,112-Speed 6300.15 samples/sec Loss 2.7411 LearningRate 0.0000 Epoch: 35 Global Step: 732390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:01,375-Speed 6278.05 samples/sec Loss 2.7366 LearningRate 0.0000 Epoch: 35 Global Step: 732400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:04,636-Speed 6282.07 samples/sec Loss 2.8215 LearningRate 0.0000 Epoch: 35 Global Step: 732410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:07,885-Speed 6305.38 samples/sec Loss 2.7461 LearningRate 0.0000 Epoch: 35 Global Step: 732420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:11,151-Speed 6271.11 samples/sec Loss 2.7258 LearningRate 0.0000 Epoch: 35 Global Step: 732430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:14,408-Speed 6289.11 samples/sec Loss 2.8039 LearningRate 0.0000 Epoch: 35 Global Step: 732440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:17,677-Speed 6266.27 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 732450 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:22:20,931-Speed 6295.13 samples/sec Loss 2.7792 LearningRate 0.0000 Epoch: 35 
Global Step: 732460 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:22:24,176-Speed 6313.94 samples/sec Loss 2.7829 LearningRate 0.0000 Epoch: 35 Global Step: 732470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:27,433-Speed 6287.86 samples/sec Loss 2.7109 LearningRate 0.0000 Epoch: 35 Global Step: 732480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:30,681-Speed 6308.82 samples/sec Loss 2.8486 LearningRate 0.0000 Epoch: 35 Global Step: 732490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:33,949-Speed 6267.10 samples/sec Loss 2.7250 LearningRate 0.0000 Epoch: 35 Global Step: 732500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:37,208-Speed 6287.22 samples/sec Loss 2.7538 LearningRate 0.0000 Epoch: 35 Global Step: 732510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:40,459-Speed 6299.56 samples/sec Loss 2.8005 LearningRate 0.0000 Epoch: 35 Global Step: 732520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:43,710-Speed 6301.28 samples/sec Loss 2.7153 LearningRate 0.0000 Epoch: 35 Global Step: 732530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:46,959-Speed 6306.17 samples/sec Loss 2.6970 LearningRate 0.0000 Epoch: 35 Global Step: 732540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:50,227-Speed 6268.26 samples/sec Loss 2.8040 LearningRate 0.0000 Epoch: 35 Global Step: 732550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:53,486-Speed 6285.62 samples/sec Loss 2.7816 LearningRate 0.0000 Epoch: 35 Global Step: 732560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:22:56,739-Speed 6296.59 samples/sec Loss 2.7648 LearningRate 0.0000 Epoch: 35 Global Step: 732570 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:23:00,011-Speed 6261.10 samples/sec Loss 2.7719 LearningRate 0.0000 Epoch: 35 Global Step: 732580 Fp16 Grad Scale: 8192 Required: 9 
hours Training: 2022-04-03 10:23:03,254-Speed 6315.98 samples/sec Loss 2.7602 LearningRate 0.0000 Epoch: 35 Global Step: 732590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:06,509-Speed 6293.72 samples/sec Loss 2.7425 LearningRate 0.0000 Epoch: 35 Global Step: 732600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:09,758-Speed 6304.74 samples/sec Loss 2.8014 LearningRate 0.0000 Epoch: 35 Global Step: 732610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:13,017-Speed 6285.94 samples/sec Loss 2.7300 LearningRate 0.0000 Epoch: 35 Global Step: 732620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:16,275-Speed 6285.98 samples/sec Loss 2.7388 LearningRate 0.0000 Epoch: 35 Global Step: 732630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:19,535-Speed 6284.52 samples/sec Loss 2.7208 LearningRate 0.0000 Epoch: 35 Global Step: 732640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:22,862-Speed 6157.80 samples/sec Loss 2.7386 LearningRate 0.0000 Epoch: 35 Global Step: 732650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:26,115-Speed 6296.49 samples/sec Loss 2.7876 LearningRate 0.0000 Epoch: 35 Global Step: 732660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:29,368-Speed 6296.27 samples/sec Loss 2.7581 LearningRate 0.0000 Epoch: 35 Global Step: 732670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:32,628-Speed 6283.86 samples/sec Loss 2.7379 LearningRate 0.0000 Epoch: 35 Global Step: 732680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:35,867-Speed 6324.11 samples/sec Loss 2.7315 LearningRate 0.0000 Epoch: 35 Global Step: 732690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:39,121-Speed 6295.05 samples/sec Loss 2.7767 LearningRate 0.0000 Epoch: 35 Global Step: 732700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:42,374-Speed 6298.09 
samples/sec Loss 2.7771 LearningRate 0.0000 Epoch: 35 Global Step: 732710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:45,630-Speed 6291.83 samples/sec Loss 2.7625 LearningRate 0.0000 Epoch: 35 Global Step: 732720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:48,887-Speed 6288.83 samples/sec Loss 2.7190 LearningRate 0.0000 Epoch: 35 Global Step: 732730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:52,145-Speed 6288.38 samples/sec Loss 2.7643 LearningRate 0.0000 Epoch: 35 Global Step: 732740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:55,399-Speed 6294.83 samples/sec Loss 2.8173 LearningRate 0.0000 Epoch: 35 Global Step: 732750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:23:58,649-Speed 6304.10 samples/sec Loss 2.7499 LearningRate 0.0000 Epoch: 35 Global Step: 732760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:01,911-Speed 6279.76 samples/sec Loss 2.7822 LearningRate 0.0000 Epoch: 35 Global Step: 732770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:05,158-Speed 6308.62 samples/sec Loss 2.7184 LearningRate 0.0000 Epoch: 35 Global Step: 732780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:08,417-Speed 6285.40 samples/sec Loss 2.7897 LearningRate 0.0000 Epoch: 35 Global Step: 732790 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:24:11,667-Speed 6302.06 samples/sec Loss 2.7689 LearningRate 0.0000 Epoch: 35 Global Step: 732800 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:24:14,903-Speed 6331.57 samples/sec Loss 2.7396 LearningRate 0.0000 Epoch: 35 Global Step: 732810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:18,162-Speed 6284.35 samples/sec Loss 2.7787 LearningRate 0.0000 Epoch: 35 Global Step: 732820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:21,424-Speed 6280.16 samples/sec Loss 2.7713 LearningRate 0.0000 Epoch: 35 
Global Step: 732830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:24,683-Speed 6284.81 samples/sec Loss 2.7941 LearningRate 0.0000 Epoch: 35 Global Step: 732840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:27,937-Speed 6296.60 samples/sec Loss 2.7604 LearningRate 0.0000 Epoch: 35 Global Step: 732850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:31,191-Speed 6294.37 samples/sec Loss 2.7197 LearningRate 0.0000 Epoch: 35 Global Step: 732860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:34,454-Speed 6278.76 samples/sec Loss 2.7800 LearningRate 0.0000 Epoch: 35 Global Step: 732870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:37,709-Speed 6291.54 samples/sec Loss 2.7356 LearningRate 0.0000 Epoch: 35 Global Step: 732880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:40,968-Speed 6285.18 samples/sec Loss 2.7710 LearningRate 0.0000 Epoch: 35 Global Step: 732890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:44,217-Speed 6305.01 samples/sec Loss 2.7598 LearningRate 0.0000 Epoch: 35 Global Step: 732900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:47,460-Speed 6317.24 samples/sec Loss 2.7505 LearningRate 0.0000 Epoch: 35 Global Step: 732910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:50,718-Speed 6287.89 samples/sec Loss 2.7530 LearningRate 0.0000 Epoch: 35 Global Step: 732920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:53,971-Speed 6296.47 samples/sec Loss 2.7376 LearningRate 0.0000 Epoch: 35 Global Step: 732930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:24:57,229-Speed 6286.43 samples/sec Loss 2.7580 LearningRate 0.0000 Epoch: 35 Global Step: 732940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:00,487-Speed 6288.21 samples/sec Loss 2.7278 LearningRate 0.0000 Epoch: 35 Global Step: 732950 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:25:03,744-Speed 6290.97 samples/sec Loss 2.7699 LearningRate 0.0000 Epoch: 35 Global Step: 732960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:07,000-Speed 6290.72 samples/sec Loss 2.7854 LearningRate 0.0000 Epoch: 35 Global Step: 732970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:10,254-Speed 6295.85 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 35 Global Step: 732980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:13,511-Speed 6289.84 samples/sec Loss 2.7526 LearningRate 0.0000 Epoch: 35 Global Step: 732990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:16,764-Speed 6295.83 samples/sec Loss 2.7822 LearningRate 0.0000 Epoch: 35 Global Step: 733000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:20,017-Speed 6298.62 samples/sec Loss 2.7757 LearningRate 0.0000 Epoch: 35 Global Step: 733010 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:25:23,261-Speed 6313.36 samples/sec Loss 2.7557 LearningRate 0.0000 Epoch: 35 Global Step: 733020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:26,519-Speed 6286.76 samples/sec Loss 2.7860 LearningRate 0.0000 Epoch: 35 Global Step: 733030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:29,777-Speed 6287.97 samples/sec Loss 2.7743 LearningRate 0.0000 Epoch: 35 Global Step: 733040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:33,037-Speed 6284.65 samples/sec Loss 2.7191 LearningRate 0.0000 Epoch: 35 Global Step: 733050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:36,289-Speed 6298.85 samples/sec Loss 2.7548 LearningRate 0.0000 Epoch: 35 Global Step: 733060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:39,543-Speed 6294.73 samples/sec Loss 2.7684 LearningRate 0.0000 Epoch: 35 Global Step: 733070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:42,806-Speed 6277.39 
samples/sec Loss 2.8207 LearningRate 0.0000 Epoch: 35 Global Step: 733080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:46,065-Speed 6286.22 samples/sec Loss 2.7728 LearningRate 0.0000 Epoch: 35 Global Step: 733090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:49,336-Speed 6261.71 samples/sec Loss 2.8047 LearningRate 0.0000 Epoch: 35 Global Step: 733100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:52,597-Speed 6281.40 samples/sec Loss 2.7956 LearningRate 0.0000 Epoch: 35 Global Step: 733110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:55,837-Speed 6322.74 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 733120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:25:59,130-Speed 6221.68 samples/sec Loss 2.7566 LearningRate 0.0000 Epoch: 35 Global Step: 733130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:02,393-Speed 6277.10 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 35 Global Step: 733140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:05,642-Speed 6306.31 samples/sec Loss 2.7810 LearningRate 0.0000 Epoch: 35 Global Step: 733150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:08,891-Speed 6305.86 samples/sec Loss 2.7283 LearningRate 0.0000 Epoch: 35 Global Step: 733160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:12,145-Speed 6294.40 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 35 Global Step: 733170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:15,403-Speed 6287.38 samples/sec Loss 2.7783 LearningRate 0.0000 Epoch: 35 Global Step: 733180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:18,653-Speed 6302.84 samples/sec Loss 2.7947 LearningRate 0.0000 Epoch: 35 Global Step: 733190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:21,911-Speed 6287.26 samples/sec Loss 2.7502 LearningRate 0.0000 Epoch: 35 
Global Step: 733200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:25,175-Speed 6276.73 samples/sec Loss 2.7391 LearningRate 0.0000 Epoch: 35 Global Step: 733210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:28,420-Speed 6313.17 samples/sec Loss 2.7344 LearningRate 0.0000 Epoch: 35 Global Step: 733220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:31,671-Speed 6301.03 samples/sec Loss 2.7421 LearningRate 0.0000 Epoch: 35 Global Step: 733230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:34,931-Speed 6283.26 samples/sec Loss 2.8097 LearningRate 0.0000 Epoch: 35 Global Step: 733240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:38,196-Speed 6274.57 samples/sec Loss 2.7888 LearningRate 0.0000 Epoch: 35 Global Step: 733250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:41,450-Speed 6295.30 samples/sec Loss 2.7863 LearningRate 0.0000 Epoch: 35 Global Step: 733260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:44,707-Speed 6288.68 samples/sec Loss 2.7796 LearningRate 0.0000 Epoch: 35 Global Step: 733270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:47,966-Speed 6284.61 samples/sec Loss 2.7631 LearningRate 0.0000 Epoch: 35 Global Step: 733280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:51,212-Speed 6312.03 samples/sec Loss 2.7612 LearningRate 0.0000 Epoch: 35 Global Step: 733290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:54,471-Speed 6284.24 samples/sec Loss 2.7875 LearningRate 0.0000 Epoch: 35 Global Step: 733300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:26:57,724-Speed 6298.67 samples/sec Loss 2.7325 LearningRate 0.0000 Epoch: 35 Global Step: 733310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:00,976-Speed 6298.33 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 733320 Fp16 Grad Scale: 8192 Required: 9 
hours Training: 2022-04-03 10:27:04,220-Speed 6314.52 samples/sec Loss 2.7585 LearningRate 0.0000 Epoch: 35 Global Step: 733330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:07,478-Speed 6287.73 samples/sec Loss 2.7548 LearningRate 0.0000 Epoch: 35 Global Step: 733340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:10,735-Speed 6289.15 samples/sec Loss 2.7361 LearningRate 0.0000 Epoch: 35 Global Step: 733350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:13,992-Speed 6289.96 samples/sec Loss 2.7593 LearningRate 0.0000 Epoch: 35 Global Step: 733360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:17,242-Speed 6303.09 samples/sec Loss 2.7598 LearningRate 0.0000 Epoch: 35 Global Step: 733370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:20,491-Speed 6305.47 samples/sec Loss 2.7645 LearningRate 0.0000 Epoch: 35 Global Step: 733380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:23,742-Speed 6300.55 samples/sec Loss 2.7273 LearningRate 0.0000 Epoch: 35 Global Step: 733390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:27,006-Speed 6275.34 samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 35 Global Step: 733400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:30,272-Speed 6272.30 samples/sec Loss 2.7300 LearningRate 0.0000 Epoch: 35 Global Step: 733410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:33,527-Speed 6294.20 samples/sec Loss 2.7489 LearningRate 0.0000 Epoch: 35 Global Step: 733420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:36,780-Speed 6296.27 samples/sec Loss 2.7574 LearningRate 0.0000 Epoch: 35 Global Step: 733430 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:27:40,125-Speed 6124.17 samples/sec Loss 2.7529 LearningRate 0.0000 Epoch: 35 Global Step: 733440 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:27:43,403-Speed 6249.73 
samples/sec Loss 2.7696 LearningRate 0.0000 Epoch: 35 Global Step: 733450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:46,724-Speed 6167.43 samples/sec Loss 2.7381 LearningRate 0.0000 Epoch: 35 Global Step: 733460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:50,105-Speed 6058.83 samples/sec Loss 2.7742 LearningRate 0.0000 Epoch: 35 Global Step: 733470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:53,362-Speed 6288.77 samples/sec Loss 2.7239 LearningRate 0.0000 Epoch: 35 Global Step: 733480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:56,615-Speed 6297.77 samples/sec Loss 2.7462 LearningRate 0.0000 Epoch: 35 Global Step: 733490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:27:59,880-Speed 6274.77 samples/sec Loss 2.7834 LearningRate 0.0000 Epoch: 35 Global Step: 733500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:03,139-Speed 6285.79 samples/sec Loss 2.7532 LearningRate 0.0000 Epoch: 35 Global Step: 733510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:06,394-Speed 6293.47 samples/sec Loss 2.7555 LearningRate 0.0000 Epoch: 35 Global Step: 733520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:09,653-Speed 6284.34 samples/sec Loss 2.7697 LearningRate 0.0000 Epoch: 35 Global Step: 733530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:12,912-Speed 6286.45 samples/sec Loss 2.7630 LearningRate 0.0000 Epoch: 35 Global Step: 733540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:16,173-Speed 6281.93 samples/sec Loss 2.7442 LearningRate 0.0000 Epoch: 35 Global Step: 733550 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:28:19,426-Speed 6296.91 samples/sec Loss 2.7445 LearningRate 0.0000 Epoch: 35 Global Step: 733560 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:28:22,661-Speed 6332.76 samples/sec Loss 2.7737 LearningRate 0.0000 Epoch: 35 
Global Step: 733570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:25,911-Speed 6302.55 samples/sec Loss 2.7256 LearningRate 0.0000 Epoch: 35 Global Step: 733580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:29,165-Speed 6295.18 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 35 Global Step: 733590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:32,417-Speed 6298.78 samples/sec Loss 2.7614 LearningRate 0.0000 Epoch: 35 Global Step: 733600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:35,667-Speed 6304.43 samples/sec Loss 2.7586 LearningRate 0.0000 Epoch: 35 Global Step: 733610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:38,928-Speed 6280.80 samples/sec Loss 2.7732 LearningRate 0.0000 Epoch: 35 Global Step: 733620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:42,180-Speed 6300.55 samples/sec Loss 2.7263 LearningRate 0.0000 Epoch: 35 Global Step: 733630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:45,437-Speed 6287.30 samples/sec Loss 2.7903 LearningRate 0.0000 Epoch: 35 Global Step: 733640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:48,693-Speed 6292.46 samples/sec Loss 2.8102 LearningRate 0.0000 Epoch: 35 Global Step: 733650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:51,953-Speed 6284.01 samples/sec Loss 2.7326 LearningRate 0.0000 Epoch: 35 Global Step: 733660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:28:55,218-Speed 6273.47 samples/sec Loss 2.7393 LearningRate 0.0000 Epoch: 35 Global Step: 733670 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:28:58,457-Speed 6324.33 samples/sec Loss 2.8116 LearningRate 0.0000 Epoch: 35 Global Step: 733680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:01,777-Speed 6169.97 samples/sec Loss 2.8323 LearningRate 0.0000 Epoch: 35 Global Step: 733690 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:29:05,140-Speed 6091.88 samples/sec Loss 2.7951 LearningRate 0.0000 Epoch: 35 Global Step: 733700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:08,434-Speed 6216.89 samples/sec Loss 2.7123 LearningRate 0.0000 Epoch: 35 Global Step: 733710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:11,796-Speed 6094.33 samples/sec Loss 2.7890 LearningRate 0.0000 Epoch: 35 Global Step: 733720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:15,048-Speed 6297.69 samples/sec Loss 2.7472 LearningRate 0.0000 Epoch: 35 Global Step: 733730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:18,305-Speed 6290.72 samples/sec Loss 2.8032 LearningRate 0.0000 Epoch: 35 Global Step: 733740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:21,551-Speed 6310.46 samples/sec Loss 2.7558 LearningRate 0.0000 Epoch: 35 Global Step: 733750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:24,801-Speed 6302.88 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 733760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:28,061-Speed 6283.16 samples/sec Loss 2.7698 LearningRate 0.0000 Epoch: 35 Global Step: 733770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:31,298-Speed 6329.08 samples/sec Loss 2.7661 LearningRate 0.0000 Epoch: 35 Global Step: 733780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:34,559-Speed 6281.74 samples/sec Loss 2.7000 LearningRate 0.0000 Epoch: 35 Global Step: 733790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:37,831-Speed 6261.13 samples/sec Loss 2.8225 LearningRate 0.0000 Epoch: 35 Global Step: 733800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:41,138-Speed 6193.22 samples/sec Loss 2.7408 LearningRate 0.0000 Epoch: 35 Global Step: 733810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:44,386-Speed 6306.96 
samples/sec Loss 2.7446 LearningRate 0.0000 Epoch: 35 Global Step: 733820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:47,637-Speed 6300.93 samples/sec Loss 2.7053 LearningRate 0.0000 Epoch: 35 Global Step: 733830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:50,892-Speed 6294.46 samples/sec Loss 2.7418 LearningRate 0.0000 Epoch: 35 Global Step: 733840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:54,140-Speed 6305.75 samples/sec Loss 2.7170 LearningRate 0.0000 Epoch: 35 Global Step: 733850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:29:57,387-Speed 6308.77 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 733860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:00,644-Speed 6290.53 samples/sec Loss 2.7558 LearningRate 0.0000 Epoch: 35 Global Step: 733870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:03,904-Speed 6282.88 samples/sec Loss 2.7920 LearningRate 0.0000 Epoch: 35 Global Step: 733880 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:30:07,146-Speed 6317.80 samples/sec Loss 2.7376 LearningRate 0.0000 Epoch: 35 Global Step: 733890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:10,403-Speed 6290.80 samples/sec Loss 2.7753 LearningRate 0.0000 Epoch: 35 Global Step: 733900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:13,666-Speed 6276.82 samples/sec Loss 2.7119 LearningRate 0.0000 Epoch: 35 Global Step: 733910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:16,923-Speed 6289.99 samples/sec Loss 2.7732 LearningRate 0.0000 Epoch: 35 Global Step: 733920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:20,188-Speed 6274.16 samples/sec Loss 2.7995 LearningRate 0.0000 Epoch: 35 Global Step: 733930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:23,449-Speed 6281.55 samples/sec Loss 2.7847 LearningRate 0.0000 Epoch: 35 
Global Step: 733940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:26,709-Speed 6283.58 samples/sec Loss 2.7674 LearningRate 0.0000 Epoch: 35 Global Step: 733950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:29,962-Speed 6297.21 samples/sec Loss 2.6953 LearningRate 0.0000 Epoch: 35 Global Step: 733960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:33,216-Speed 6293.80 samples/sec Loss 2.7915 LearningRate 0.0000 Epoch: 35 Global Step: 733970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:36,482-Speed 6272.01 samples/sec Loss 2.7549 LearningRate 0.0000 Epoch: 35 Global Step: 733980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:39,797-Speed 6179.43 samples/sec Loss 2.8077 LearningRate 0.0000 Epoch: 35 Global Step: 733990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:43,055-Speed 6289.43 samples/sec Loss 2.8154 LearningRate 0.0000 Epoch: 35 Global Step: 734000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:46,314-Speed 6285.74 samples/sec Loss 2.7634 LearningRate 0.0000 Epoch: 35 Global Step: 734010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:49,570-Speed 6289.95 samples/sec Loss 2.7832 LearningRate 0.0000 Epoch: 35 Global Step: 734020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:52,834-Speed 6277.08 samples/sec Loss 2.7675 LearningRate 0.0000 Epoch: 35 Global Step: 734030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:56,086-Speed 6297.78 samples/sec Loss 2.7746 LearningRate 0.0000 Epoch: 35 Global Step: 734040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:30:59,341-Speed 6293.29 samples/sec Loss 2.7562 LearningRate 0.0000 Epoch: 35 Global Step: 734050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:02,592-Speed 6301.73 samples/sec Loss 2.8279 LearningRate 0.0000 Epoch: 35 Global Step: 734060 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:31:05,843-Speed 6301.20 samples/sec Loss 2.7612 LearningRate 0.0000 Epoch: 35 Global Step: 734070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:09,095-Speed 6298.76 samples/sec Loss 2.7349 LearningRate 0.0000 Epoch: 35 Global Step: 734080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:12,355-Speed 6284.73 samples/sec Loss 2.7643 LearningRate 0.0000 Epoch: 35 Global Step: 734090 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:31:15,607-Speed 6298.55 samples/sec Loss 2.7401 LearningRate 0.0000 Epoch: 35 Global Step: 734100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:18,865-Speed 6286.05 samples/sec Loss 2.7636 LearningRate 0.0000 Epoch: 35 Global Step: 734110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:22,119-Speed 6295.95 samples/sec Loss 2.7784 LearningRate 0.0000 Epoch: 35 Global Step: 734120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:25,379-Speed 6284.13 samples/sec Loss 2.7593 LearningRate 0.0000 Epoch: 35 Global Step: 734130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:28,641-Speed 6279.30 samples/sec Loss 2.7697 LearningRate 0.0000 Epoch: 35 Global Step: 734140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:31,899-Speed 6287.16 samples/sec Loss 2.7466 LearningRate 0.0000 Epoch: 35 Global Step: 734150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:35,153-Speed 6294.92 samples/sec Loss 2.7300 LearningRate 0.0000 Epoch: 35 Global Step: 734160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:38,406-Speed 6297.56 samples/sec Loss 2.7448 LearningRate 0.0000 Epoch: 35 Global Step: 734170 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:41,662-Speed 6291.34 samples/sec Loss 2.7900 LearningRate 0.0000 Epoch: 35 Global Step: 734180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:44,917-Speed 6293.02 
samples/sec Loss 2.7515 LearningRate 0.0000 Epoch: 35 Global Step: 734190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:48,171-Speed 6295.15 samples/sec Loss 2.6878 LearningRate 0.0000 Epoch: 35 Global Step: 734200 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:31:51,407-Speed 6330.52 samples/sec Loss 2.7455 LearningRate 0.0000 Epoch: 35 Global Step: 734210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:54,665-Speed 6287.72 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 35 Global Step: 734220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:31:57,924-Speed 6286.83 samples/sec Loss 2.8055 LearningRate 0.0000 Epoch: 35 Global Step: 734230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:01,177-Speed 6296.87 samples/sec Loss 2.7928 LearningRate 0.0000 Epoch: 35 Global Step: 734240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:04,436-Speed 6285.44 samples/sec Loss 2.7488 LearningRate 0.0000 Epoch: 35 Global Step: 734250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:07,691-Speed 6292.33 samples/sec Loss 2.7670 LearningRate 0.0000 Epoch: 35 Global Step: 734260 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:10,942-Speed 6301.85 samples/sec Loss 2.7457 LearningRate 0.0000 Epoch: 35 Global Step: 734270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:14,190-Speed 6306.07 samples/sec Loss 2.7192 LearningRate 0.0000 Epoch: 35 Global Step: 734280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:17,447-Speed 6289.20 samples/sec Loss 2.6986 LearningRate 0.0000 Epoch: 35 Global Step: 734290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:20,708-Speed 6283.15 samples/sec Loss 2.7774 LearningRate 0.0000 Epoch: 35 Global Step: 734300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:23,955-Speed 6308.55 samples/sec Loss 2.7881 LearningRate 0.0000 Epoch: 35 
Global Step: 734310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:27,213-Speed 6287.53 samples/sec Loss 2.7771 LearningRate 0.0000 Epoch: 35 Global Step: 734320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:30,473-Speed 6282.47 samples/sec Loss 2.7728 LearningRate 0.0000 Epoch: 35 Global Step: 734330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:33,731-Speed 6289.07 samples/sec Loss 2.7403 LearningRate 0.0000 Epoch: 35 Global Step: 734340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:36,993-Speed 6277.82 samples/sec Loss 2.7261 LearningRate 0.0000 Epoch: 35 Global Step: 734350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:40,249-Speed 6293.27 samples/sec Loss 2.7468 LearningRate 0.0000 Epoch: 35 Global Step: 734360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:43,502-Speed 6296.51 samples/sec Loss 2.7640 LearningRate 0.0000 Epoch: 35 Global Step: 734370 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:46,746-Speed 6313.26 samples/sec Loss 2.7357 LearningRate 0.0000 Epoch: 35 Global Step: 734380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:50,001-Speed 6293.03 samples/sec Loss 2.7787 LearningRate 0.0000 Epoch: 35 Global Step: 734390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:53,264-Speed 6279.66 samples/sec Loss 2.7533 LearningRate 0.0000 Epoch: 35 Global Step: 734400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:32:56,515-Speed 6300.50 samples/sec Loss 2.6990 LearningRate 0.0000 Epoch: 35 Global Step: 734410 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:32:59,756-Speed 6320.24 samples/sec Loss 2.7749 LearningRate 0.0000 Epoch: 35 Global Step: 734420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:03,007-Speed 6301.07 samples/sec Loss 2.7402 LearningRate 0.0000 Epoch: 35 Global Step: 734430 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:33:06,256-Speed 6306.47 samples/sec Loss 2.8090 LearningRate 0.0000 Epoch: 35 Global Step: 734440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:09,517-Speed 6280.43 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 734450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:12,786-Speed 6267.19 samples/sec Loss 2.7589 LearningRate 0.0000 Epoch: 35 Global Step: 734460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:16,042-Speed 6292.14 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 35 Global Step: 734470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:19,296-Speed 6294.17 samples/sec Loss 2.7092 LearningRate 0.0000 Epoch: 35 Global Step: 734480 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:22,550-Speed 6294.58 samples/sec Loss 2.7436 LearningRate 0.0000 Epoch: 35 Global Step: 734490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:25,802-Speed 6300.12 samples/sec Loss 2.7371 LearningRate 0.0000 Epoch: 35 Global Step: 734500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:29,057-Speed 6292.92 samples/sec Loss 2.7444 LearningRate 0.0000 Epoch: 35 Global Step: 734510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:32,302-Speed 6311.40 samples/sec Loss 2.7821 LearningRate 0.0000 Epoch: 35 Global Step: 734520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:35,562-Speed 6285.28 samples/sec Loss 2.7373 LearningRate 0.0000 Epoch: 35 Global Step: 734530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:38,810-Speed 6305.85 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 35 Global Step: 734540 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:42,067-Speed 6290.10 samples/sec Loss 2.7789 LearningRate 0.0000 Epoch: 35 Global Step: 734550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:45,324-Speed 6288.87 
samples/sec Loss 2.7717 LearningRate 0.0000 Epoch: 35 Global Step: 734560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:48,582-Speed 6287.30 samples/sec Loss 2.7840 LearningRate 0.0000 Epoch: 35 Global Step: 734570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:51,831-Speed 6304.49 samples/sec Loss 2.7620 LearningRate 0.0000 Epoch: 35 Global Step: 734580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:55,091-Speed 6283.54 samples/sec Loss 2.7926 LearningRate 0.0000 Epoch: 35 Global Step: 734590 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:33:58,344-Speed 6297.75 samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 35 Global Step: 734600 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:01,599-Speed 6292.44 samples/sec Loss 2.7738 LearningRate 0.0000 Epoch: 35 Global Step: 734610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:04,855-Speed 6293.08 samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 35 Global Step: 734620 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:34:08,089-Speed 6333.28 samples/sec Loss 2.7566 LearningRate 0.0000 Epoch: 35 Global Step: 734630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:11,342-Speed 6298.96 samples/sec Loss 2.7920 LearningRate 0.0000 Epoch: 35 Global Step: 734640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:14,598-Speed 6290.08 samples/sec Loss 2.7753 LearningRate 0.0000 Epoch: 35 Global Step: 734650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:17,855-Speed 6289.17 samples/sec Loss 2.7458 LearningRate 0.0000 Epoch: 35 Global Step: 734660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:21,116-Speed 6281.84 samples/sec Loss 2.7974 LearningRate 0.0000 Epoch: 35 Global Step: 734670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:24,372-Speed 6291.58 samples/sec Loss 2.7605 LearningRate 0.0000 Epoch: 35 
Global Step: 734680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:27,631-Speed 6285.76 samples/sec Loss 2.7878 LearningRate 0.0000 Epoch: 35 Global Step: 734690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:30,887-Speed 6291.90 samples/sec Loss 2.7574 LearningRate 0.0000 Epoch: 35 Global Step: 734700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:34,145-Speed 6285.96 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 734710 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:37,396-Speed 6301.48 samples/sec Loss 2.7671 LearningRate 0.0000 Epoch: 35 Global Step: 734720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:40,648-Speed 6299.42 samples/sec Loss 2.7943 LearningRate 0.0000 Epoch: 35 Global Step: 734730 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:34:43,892-Speed 6314.19 samples/sec Loss 2.7616 LearningRate 0.0000 Epoch: 35 Global Step: 734740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:47,154-Speed 6279.41 samples/sec Loss 2.7275 LearningRate 0.0000 Epoch: 35 Global Step: 734750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:50,420-Speed 6277.79 samples/sec Loss 2.7184 LearningRate 0.0000 Epoch: 35 Global Step: 734760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:53,672-Speed 6297.25 samples/sec Loss 2.6983 LearningRate 0.0000 Epoch: 35 Global Step: 734770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:34:56,930-Speed 6287.59 samples/sec Loss 2.7480 LearningRate 0.0000 Epoch: 35 Global Step: 734780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:00,192-Speed 6280.63 samples/sec Loss 2.7893 LearningRate 0.0000 Epoch: 35 Global Step: 734790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:03,448-Speed 6290.57 samples/sec Loss 2.7440 LearningRate 0.0000 Epoch: 35 Global Step: 734800 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:35:06,713-Speed 6275.19 samples/sec Loss 2.7317 LearningRate 0.0000 Epoch: 35 Global Step: 734810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:09,968-Speed 6292.05 samples/sec Loss 2.7472 LearningRate 0.0000 Epoch: 35 Global Step: 734820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:13,228-Speed 6284.38 samples/sec Loss 2.7381 LearningRate 0.0000 Epoch: 35 Global Step: 734830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:16,543-Speed 6179.97 samples/sec Loss 2.7797 LearningRate 0.0000 Epoch: 35 Global Step: 734840 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:35:19,785-Speed 6317.76 samples/sec Loss 2.7304 LearningRate 0.0000 Epoch: 35 Global Step: 734850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:23,041-Speed 6292.89 samples/sec Loss 2.7815 LearningRate 0.0000 Epoch: 35 Global Step: 734860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:26,297-Speed 6289.61 samples/sec Loss 2.7729 LearningRate 0.0000 Epoch: 35 Global Step: 734870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:29,557-Speed 6283.50 samples/sec Loss 2.7635 LearningRate 0.0000 Epoch: 35 Global Step: 734880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:32,816-Speed 6286.97 samples/sec Loss 2.7460 LearningRate 0.0000 Epoch: 35 Global Step: 734890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:36,072-Speed 6291.40 samples/sec Loss 2.7965 LearningRate 0.0000 Epoch: 35 Global Step: 734900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:39,333-Speed 6281.73 samples/sec Loss 2.7798 LearningRate 0.0000 Epoch: 35 Global Step: 734910 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:42,588-Speed 6292.72 samples/sec Loss 2.7223 LearningRate 0.0000 Epoch: 35 Global Step: 734920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:45,843-Speed 6292.97 
samples/sec Loss 2.7546 LearningRate 0.0000 Epoch: 35 Global Step: 734930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:49,106-Speed 6276.98 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 734940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:52,346-Speed 6323.35 samples/sec Loss 2.7846 LearningRate 0.0000 Epoch: 35 Global Step: 734950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:55,603-Speed 6289.03 samples/sec Loss 2.7344 LearningRate 0.0000 Epoch: 35 Global Step: 734960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:35:58,861-Speed 6288.14 samples/sec Loss 2.7682 LearningRate 0.0000 Epoch: 35 Global Step: 734970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:02,127-Speed 6271.01 samples/sec Loss 2.7650 LearningRate 0.0000 Epoch: 35 Global Step: 734980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:05,389-Speed 6280.32 samples/sec Loss 2.7806 LearningRate 0.0000 Epoch: 35 Global Step: 734990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:08,645-Speed 6291.10 samples/sec Loss 2.7899 LearningRate 0.0000 Epoch: 35 Global Step: 735000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:11,899-Speed 6294.97 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 35 Global Step: 735010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:15,170-Speed 6263.93 samples/sec Loss 2.7393 LearningRate 0.0000 Epoch: 35 Global Step: 735020 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:18,437-Speed 6270.28 samples/sec Loss 2.7502 LearningRate 0.0000 Epoch: 35 Global Step: 735030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:21,699-Speed 6279.86 samples/sec Loss 2.7361 LearningRate 0.0000 Epoch: 35 Global Step: 735040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:24,973-Speed 6257.44 samples/sec Loss 2.7195 LearningRate 0.0000 Epoch: 35 
Global Step: 735050 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:36:28,212-Speed 6324.25 samples/sec Loss 2.7179 LearningRate 0.0000 Epoch: 35 Global Step: 735060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:31,469-Speed 6288.89 samples/sec Loss 2.7626 LearningRate 0.0000 Epoch: 35 Global Step: 735070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:34,726-Speed 6290.06 samples/sec Loss 2.7371 LearningRate 0.0000 Epoch: 35 Global Step: 735080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:37,990-Speed 6275.51 samples/sec Loss 2.7928 LearningRate 0.0000 Epoch: 35 Global Step: 735090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:41,251-Speed 6281.67 samples/sec Loss 2.7579 LearningRate 0.0000 Epoch: 35 Global Step: 735100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:44,507-Speed 6290.22 samples/sec Loss 2.7896 LearningRate 0.0000 Epoch: 35 Global Step: 735110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:47,767-Speed 6283.55 samples/sec Loss 2.7522 LearningRate 0.0000 Epoch: 35 Global Step: 735120 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:51,031-Speed 6276.42 samples/sec Loss 2.7664 LearningRate 0.0000 Epoch: 35 Global Step: 735130 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:54,283-Speed 6299.73 samples/sec Loss 2.7342 LearningRate 0.0000 Epoch: 35 Global Step: 735140 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:36:57,531-Speed 6306.09 samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 35 Global Step: 735150 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:00,773-Speed 6318.04 samples/sec Loss 2.7954 LearningRate 0.0000 Epoch: 35 Global Step: 735160 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:04,027-Speed 6295.41 samples/sec Loss 2.7373 LearningRate 0.0000 Epoch: 35 Global Step: 735170 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:37:07,279-Speed 6299.94 samples/sec Loss 2.7416 LearningRate 0.0000 Epoch: 35 Global Step: 735180 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:10,540-Speed 6281.59 samples/sec Loss 2.7366 LearningRate 0.0000 Epoch: 35 Global Step: 735190 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:13,796-Speed 6291.45 samples/sec Loss 2.7593 LearningRate 0.0000 Epoch: 35 Global Step: 735200 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:17,057-Speed 6281.41 samples/sec Loss 2.7606 LearningRate 0.0000 Epoch: 35 Global Step: 735210 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:20,314-Speed 6289.69 samples/sec Loss 2.7395 LearningRate 0.0000 Epoch: 35 Global Step: 735220 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:23,566-Speed 6299.17 samples/sec Loss 2.7787 LearningRate 0.0000 Epoch: 35 Global Step: 735230 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:26,820-Speed 6295.29 samples/sec Loss 2.7078 LearningRate 0.0000 Epoch: 35 Global Step: 735240 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:30,073-Speed 6297.92 samples/sec Loss 2.7518 LearningRate 0.0000 Epoch: 35 Global Step: 735250 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:33,328-Speed 6292.04 samples/sec Loss 2.7514 LearningRate 0.0000 Epoch: 35 Global Step: 735260 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:37:36,566-Speed 6326.86 samples/sec Loss 2.7172 LearningRate 0.0000 Epoch: 35 Global Step: 735270 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:39,814-Speed 6307.44 samples/sec Loss 2.7340 LearningRate 0.0000 Epoch: 35 Global Step: 735280 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:43,060-Speed 6310.45 samples/sec Loss 2.6869 LearningRate 0.0000 Epoch: 35 Global Step: 735290 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:46,320-Speed 6282.71 
samples/sec Loss 2.7274 LearningRate 0.0000 Epoch: 35 Global Step: 735300 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:49,601-Speed 6244.25 samples/sec Loss 2.7268 LearningRate 0.0000 Epoch: 35 Global Step: 735310 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:52,857-Speed 6291.86 samples/sec Loss 2.7594 LearningRate 0.0000 Epoch: 35 Global Step: 735320 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:56,109-Speed 6298.70 samples/sec Loss 2.7393 LearningRate 0.0000 Epoch: 35 Global Step: 735330 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:37:59,365-Speed 6292.07 samples/sec Loss 2.7605 LearningRate 0.0000 Epoch: 35 Global Step: 735340 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:02,629-Speed 6274.59 samples/sec Loss 2.7228 LearningRate 0.0000 Epoch: 35 Global Step: 735350 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:05,887-Speed 6288.68 samples/sec Loss 2.7027 LearningRate 0.0000 Epoch: 35 Global Step: 735360 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:09,147-Speed 6283.03 samples/sec Loss 2.7948 LearningRate 0.0000 Epoch: 35 Global Step: 735370 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:38:12,398-Speed 6300.77 samples/sec Loss 2.7295 LearningRate 0.0000 Epoch: 35 Global Step: 735380 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:15,657-Speed 6284.71 samples/sec Loss 2.6909 LearningRate 0.0000 Epoch: 35 Global Step: 735390 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:18,911-Speed 6294.84 samples/sec Loss 2.7269 LearningRate 0.0000 Epoch: 35 Global Step: 735400 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:22,164-Speed 6299.21 samples/sec Loss 2.6917 LearningRate 0.0000 Epoch: 35 Global Step: 735410 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:25,415-Speed 6301.05 samples/sec Loss 2.7290 LearningRate 0.0000 Epoch: 35 
Global Step: 735420 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:28,755-Speed 6133.42 samples/sec Loss 2.7927 LearningRate 0.0000 Epoch: 35 Global Step: 735430 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:32,007-Speed 6297.96 samples/sec Loss 2.7562 LearningRate 0.0000 Epoch: 35 Global Step: 735440 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:35,264-Speed 6290.60 samples/sec Loss 2.7638 LearningRate 0.0000 Epoch: 35 Global Step: 735450 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:38,523-Speed 6285.40 samples/sec Loss 2.8031 LearningRate 0.0000 Epoch: 35 Global Step: 735460 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:41,772-Speed 6305.28 samples/sec Loss 2.7368 LearningRate 0.0000 Epoch: 35 Global Step: 735470 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:45,031-Speed 6285.37 samples/sec Loss 2.7446 LearningRate 0.0000 Epoch: 35 Global Step: 735480 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:38:48,276-Speed 6311.64 samples/sec Loss 2.7570 LearningRate 0.0000 Epoch: 35 Global Step: 735490 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:51,531-Speed 6293.16 samples/sec Loss 2.7753 LearningRate 0.0000 Epoch: 35 Global Step: 735500 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:54,791-Speed 6284.94 samples/sec Loss 2.7306 LearningRate 0.0000 Epoch: 35 Global Step: 735510 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:38:58,054-Speed 6277.95 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 35 Global Step: 735520 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:01,315-Speed 6281.75 samples/sec Loss 2.7522 LearningRate 0.0000 Epoch: 35 Global Step: 735530 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:04,580-Speed 6273.85 samples/sec Loss 2.7663 LearningRate 0.0000 Epoch: 35 Global Step: 735540 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:39:07,839-Speed 6284.92 samples/sec Loss 2.7774 LearningRate 0.0000 Epoch: 35 Global Step: 735550 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:11,101-Speed 6280.56 samples/sec Loss 2.7413 LearningRate 0.0000 Epoch: 35 Global Step: 735560 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:14,358-Speed 6288.67 samples/sec Loss 2.7365 LearningRate 0.0000 Epoch: 35 Global Step: 735570 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:17,609-Speed 6301.68 samples/sec Loss 2.7691 LearningRate 0.0000 Epoch: 35 Global Step: 735580 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:20,865-Speed 6290.01 samples/sec Loss 2.7452 LearningRate 0.0000 Epoch: 35 Global Step: 735590 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:39:24,119-Speed 6295.79 samples/sec Loss 2.7287 LearningRate 0.0000 Epoch: 35 Global Step: 735600 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:39:27,364-Speed 6313.26 samples/sec Loss 2.7245 LearningRate 0.0000 Epoch: 35 Global Step: 735610 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:30,649-Speed 6236.56 samples/sec Loss 2.7294 LearningRate 0.0000 Epoch: 35 Global Step: 735620 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:33,908-Speed 6284.89 samples/sec Loss 2.7788 LearningRate 0.0000 Epoch: 35 Global Step: 735630 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:37,172-Speed 6276.86 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 35 Global Step: 735640 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:40,421-Speed 6304.56 samples/sec Loss 2.7352 LearningRate 0.0000 Epoch: 35 Global Step: 735650 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:43,680-Speed 6285.37 samples/sec Loss 2.7788 LearningRate 0.0000 Epoch: 35 Global Step: 735660 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:46,936-Speed 6290.72 
samples/sec Loss 2.7758 LearningRate 0.0000 Epoch: 35 Global Step: 735670 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:50,198-Speed 6280.74 samples/sec Loss 2.7824 LearningRate 0.0000 Epoch: 35 Global Step: 735680 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:53,450-Speed 6298.33 samples/sec Loss 2.7499 LearningRate 0.0000 Epoch: 35 Global Step: 735690 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:56,707-Speed 6290.60 samples/sec Loss 2.7481 LearningRate 0.0000 Epoch: 35 Global Step: 735700 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:39:59,964-Speed 6288.57 samples/sec Loss 2.6981 LearningRate 0.0000 Epoch: 35 Global Step: 735710 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:40:03,213-Speed 6305.27 samples/sec Loss 2.7703 LearningRate 0.0000 Epoch: 35 Global Step: 735720 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:06,540-Speed 6157.32 samples/sec Loss 2.7807 LearningRate 0.0000 Epoch: 35 Global Step: 735730 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:09,795-Speed 6291.72 samples/sec Loss 2.6510 LearningRate 0.0000 Epoch: 35 Global Step: 735740 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:13,057-Speed 6280.59 samples/sec Loss 2.7756 LearningRate 0.0000 Epoch: 35 Global Step: 735750 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:16,313-Speed 6291.74 samples/sec Loss 2.7293 LearningRate 0.0000 Epoch: 35 Global Step: 735760 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:19,567-Speed 6295.59 samples/sec Loss 2.7690 LearningRate 0.0000 Epoch: 35 Global Step: 735770 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:22,827-Speed 6282.61 samples/sec Loss 2.7340 LearningRate 0.0000 Epoch: 35 Global Step: 735780 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:26,084-Speed 6290.51 samples/sec Loss 2.7483 LearningRate 0.0000 Epoch: 35 
Global Step: 735790 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:29,342-Speed 6285.60 samples/sec Loss 2.7599 LearningRate 0.0000 Epoch: 35 Global Step: 735800 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:32,597-Speed 6293.61 samples/sec Loss 2.7854 LearningRate 0.0000 Epoch: 35 Global Step: 735810 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:35,835-Speed 6328.29 samples/sec Loss 2.7674 LearningRate 0.0000 Epoch: 35 Global Step: 735820 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:39,084-Speed 6303.61 samples/sec Loss 2.7528 LearningRate 0.0000 Epoch: 35 Global Step: 735830 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:42,343-Speed 6286.48 samples/sec Loss 2.7334 LearningRate 0.0000 Epoch: 35 Global Step: 735840 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:45,602-Speed 6284.81 samples/sec Loss 2.7149 LearningRate 0.0000 Epoch: 35 Global Step: 735850 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:48,856-Speed 6294.91 samples/sec Loss 2.8090 LearningRate 0.0000 Epoch: 35 Global Step: 735860 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:52,116-Speed 6283.59 samples/sec Loss 2.7023 LearningRate 0.0000 Epoch: 35 Global Step: 735870 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:55,374-Speed 6288.95 samples/sec Loss 2.7707 LearningRate 0.0000 Epoch: 35 Global Step: 735880 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:40:58,634-Speed 6283.25 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 35 Global Step: 735890 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:01,890-Speed 6291.56 samples/sec Loss 2.7319 LearningRate 0.0000 Epoch: 35 Global Step: 735900 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:05,143-Speed 6297.92 samples/sec Loss 2.7295 LearningRate 0.0000 Epoch: 35 Global Step: 735910 Fp16 Grad Scale: 4096 Required: 9 
hours Training: 2022-04-03 10:41:08,384-Speed 6318.40 samples/sec Loss 2.7351 LearningRate 0.0000 Epoch: 35 Global Step: 735920 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:11,671-Speed 6233.48 samples/sec Loss 2.7310 LearningRate 0.0000 Epoch: 35 Global Step: 735930 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:14,955-Speed 6237.86 samples/sec Loss 2.7388 LearningRate 0.0000 Epoch: 35 Global Step: 735940 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:18,233-Speed 6247.49 samples/sec Loss 2.7394 LearningRate 0.0000 Epoch: 35 Global Step: 735950 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:21,485-Speed 6299.23 samples/sec Loss 2.7348 LearningRate 0.0000 Epoch: 35 Global Step: 735960 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:24,744-Speed 6287.05 samples/sec Loss 2.7769 LearningRate 0.0000 Epoch: 35 Global Step: 735970 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:28,070-Speed 6157.58 samples/sec Loss 2.7598 LearningRate 0.0000 Epoch: 35 Global Step: 735980 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:31,364-Speed 6219.87 samples/sec Loss 2.7990 LearningRate 0.0000 Epoch: 35 Global Step: 735990 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:34,624-Speed 6282.20 samples/sec Loss 2.7592 LearningRate 0.0000 Epoch: 35 Global Step: 736000 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:37,881-Speed 6290.21 samples/sec Loss 2.7560 LearningRate 0.0000 Epoch: 35 Global Step: 736010 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:41,144-Speed 6278.00 samples/sec Loss 2.7686 LearningRate 0.0000 Epoch: 35 Global Step: 736020 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-03 10:41:44,379-Speed 6333.15 samples/sec Loss 2.7473 LearningRate 0.0000 Epoch: 35 Global Step: 736030 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:47,640-Speed 6281.83 
samples/sec Loss 2.7637 LearningRate 0.0000 Epoch: 35 Global Step: 736040 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:50,900-Speed 6282.34 samples/sec Loss 2.7286 LearningRate 0.0000 Epoch: 35 Global Step: 736050 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:54,152-Speed 6298.75 samples/sec Loss 2.7400 LearningRate 0.0000 Epoch: 35 Global Step: 736060 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:41:57,414-Speed 6280.36 samples/sec Loss 2.7119 LearningRate 0.0000 Epoch: 35 Global Step: 736070 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:42:00,668-Speed 6295.25 samples/sec Loss 2.7419 LearningRate 0.0000 Epoch: 35 Global Step: 736080 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:42:03,970-Speed 6204.97 samples/sec Loss 2.7222 LearningRate 0.0000 Epoch: 35 Global Step: 736090 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:42:07,229-Speed 6285.49 samples/sec Loss 2.7407 LearningRate 0.0000 Epoch: 35 Global Step: 736100 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:42:10,484-Speed 6292.54 samples/sec Loss 2.6949 LearningRate 0.0000 Epoch: 35 Global Step: 736110 Fp16 Grad Scale: 4096 Required: 9 hours Training: 2022-04-03 10:42:13,742-Speed 6287.35 samples/sec Loss 2.7270 LearningRate 0.0000 Epoch: 35 Global Step: 736120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:16,988-Speed 6311.05 samples/sec Loss 2.7406 LearningRate 0.0000 Epoch: 35 Global Step: 736130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:20,249-Speed 6281.23 samples/sec Loss 2.7801 LearningRate 0.0000 Epoch: 35 Global Step: 736140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:23,505-Speed 6290.70 samples/sec Loss 2.7652 LearningRate 0.0000 Epoch: 35 Global Step: 736150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:26,762-Speed 6289.05 samples/sec Loss 2.7634 LearningRate 0.0000 Epoch: 35 
Global Step: 736160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:30,015-Speed 6298.06 samples/sec Loss 2.7832 LearningRate 0.0000 Epoch: 35 Global Step: 736170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:33,266-Speed 6301.10 samples/sec Loss 2.7674 LearningRate 0.0000 Epoch: 35 Global Step: 736180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:36,525-Speed 6285.17 samples/sec Loss 2.7871 LearningRate 0.0000 Epoch: 35 Global Step: 736190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:39,786-Speed 6281.83 samples/sec Loss 2.7527 LearningRate 0.0000 Epoch: 35 Global Step: 736200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:43,041-Speed 6293.58 samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 35 Global Step: 736210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:46,298-Speed 6289.80 samples/sec Loss 2.7679 LearningRate 0.0000 Epoch: 35 Global Step: 736220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:49,550-Speed 6299.25 samples/sec Loss 2.7462 LearningRate 0.0000 Epoch: 35 Global Step: 736230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:52,815-Speed 6273.88 samples/sec Loss 2.7231 LearningRate 0.0000 Epoch: 35 Global Step: 736240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:56,075-Speed 6284.13 samples/sec Loss 2.7509 LearningRate 0.0000 Epoch: 35 Global Step: 736250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:42:59,333-Speed 6286.86 samples/sec Loss 2.6820 LearningRate 0.0000 Epoch: 35 Global Step: 736260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:02,589-Speed 6291.60 samples/sec Loss 2.7515 LearningRate 0.0000 Epoch: 35 Global Step: 736270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:05,840-Speed 6302.31 samples/sec Loss 2.7120 LearningRate 0.0000 Epoch: 35 Global Step: 736280 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:43:09,101-Speed 6281.60 samples/sec Loss 2.7263 LearningRate 0.0000 Epoch: 35 Global Step: 736290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:12,362-Speed 6281.13 samples/sec Loss 2.7749 LearningRate 0.0000 Epoch: 35 Global Step: 736300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:15,621-Speed 6285.35 samples/sec Loss 2.7347 LearningRate 0.0000 Epoch: 35 Global Step: 736310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:18,871-Speed 6302.99 samples/sec Loss 2.7550 LearningRate 0.0000 Epoch: 35 Global Step: 736320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:22,117-Speed 6309.84 samples/sec Loss 2.7251 LearningRate 0.0000 Epoch: 35 Global Step: 736330 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:43:25,360-Speed 6317.31 samples/sec Loss 2.8013 LearningRate 0.0000 Epoch: 35 Global Step: 736340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:28,618-Speed 6287.02 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 35 Global Step: 736350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:31,875-Speed 6289.95 samples/sec Loss 2.7290 LearningRate 0.0000 Epoch: 35 Global Step: 736360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:35,132-Speed 6288.62 samples/sec Loss 2.6934 LearningRate 0.0000 Epoch: 35 Global Step: 736370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:38,393-Speed 6282.37 samples/sec Loss 2.7625 LearningRate 0.0000 Epoch: 35 Global Step: 736380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:41,647-Speed 6295.91 samples/sec Loss 2.7422 LearningRate 0.0000 Epoch: 35 Global Step: 736390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:44,907-Speed 6284.41 samples/sec Loss 2.7581 LearningRate 0.0000 Epoch: 35 Global Step: 736400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:48,163-Speed 6290.65 
samples/sec Loss 2.7392 LearningRate 0.0000 Epoch: 35 Global Step: 736410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:51,419-Speed 6292.13 samples/sec Loss 2.7661 LearningRate 0.0000 Epoch: 35 Global Step: 736420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:54,675-Speed 6291.31 samples/sec Loss 2.7241 LearningRate 0.0000 Epoch: 35 Global Step: 736430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:43:57,922-Speed 6308.82 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 736440 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:44:01,162-Speed 6326.67 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 736450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:04,425-Speed 6277.36 samples/sec Loss 2.7269 LearningRate 0.0000 Epoch: 35 Global Step: 736460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:07,682-Speed 6289.05 samples/sec Loss 2.6958 LearningRate 0.0000 Epoch: 35 Global Step: 736470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:10,934-Speed 6299.63 samples/sec Loss 2.8162 LearningRate 0.0000 Epoch: 35 Global Step: 736480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:14,182-Speed 6307.05 samples/sec Loss 2.7314 LearningRate 0.0000 Epoch: 35 Global Step: 736490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:17,437-Speed 6293.79 samples/sec Loss 2.7409 LearningRate 0.0000 Epoch: 35 Global Step: 736500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:20,690-Speed 6296.17 samples/sec Loss 2.7451 LearningRate 0.0000 Epoch: 35 Global Step: 736510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:23,941-Speed 6300.65 samples/sec Loss 2.7082 LearningRate 0.0000 Epoch: 35 Global Step: 736520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:27,196-Speed 6294.08 samples/sec Loss 2.7829 LearningRate 0.0000 Epoch: 35 
Global Step: 736530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:30,449-Speed 6297.10 samples/sec Loss 2.7322 LearningRate 0.0000 Epoch: 35 Global Step: 736540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:33,701-Speed 6299.79 samples/sec Loss 2.7469 LearningRate 0.0000 Epoch: 35 Global Step: 736550 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:44:36,942-Speed 6319.74 samples/sec Loss 2.7833 LearningRate 0.0000 Epoch: 35 Global Step: 736560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:40,199-Speed 6288.86 samples/sec Loss 2.7359 LearningRate 0.0000 Epoch: 35 Global Step: 736570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:43,458-Speed 6285.57 samples/sec Loss 2.7314 LearningRate 0.0000 Epoch: 35 Global Step: 736580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:46,705-Speed 6308.30 samples/sec Loss 2.7386 LearningRate 0.0000 Epoch: 35 Global Step: 736590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:49,968-Speed 6278.93 samples/sec Loss 2.7765 LearningRate 0.0000 Epoch: 35 Global Step: 736600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:53,222-Speed 6293.88 samples/sec Loss 2.7243 LearningRate 0.0000 Epoch: 35 Global Step: 736610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:56,479-Speed 6289.15 samples/sec Loss 2.7639 LearningRate 0.0000 Epoch: 35 Global Step: 736620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:44:59,743-Speed 6277.70 samples/sec Loss 2.7755 LearningRate 0.0000 Epoch: 35 Global Step: 736630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:03,018-Speed 6254.38 samples/sec Loss 2.7591 LearningRate 0.0000 Epoch: 35 Global Step: 736640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:06,365-Speed 6119.82 samples/sec Loss 2.6273 LearningRate 0.0000 Epoch: 35 Global Step: 736650 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:45:09,613-Speed 6308.43 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 736660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:12,870-Speed 6288.02 samples/sec Loss 2.7194 LearningRate 0.0000 Epoch: 35 Global Step: 736670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:16,135-Speed 6275.20 samples/sec Loss 2.7534 LearningRate 0.0000 Epoch: 35 Global Step: 736680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:19,395-Speed 6283.10 samples/sec Loss 2.7678 LearningRate 0.0000 Epoch: 35 Global Step: 736690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:22,651-Speed 6290.70 samples/sec Loss 2.7098 LearningRate 0.0000 Epoch: 35 Global Step: 736700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:25,910-Speed 6285.77 samples/sec Loss 2.7690 LearningRate 0.0000 Epoch: 35 Global Step: 736710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:29,167-Speed 6289.08 samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 35 Global Step: 736720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:32,424-Speed 6290.92 samples/sec Loss 2.7611 LearningRate 0.0000 Epoch: 35 Global Step: 736730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:35,687-Speed 6277.13 samples/sec Loss 2.6507 LearningRate 0.0000 Epoch: 35 Global Step: 736740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:38,938-Speed 6301.38 samples/sec Loss 2.7469 LearningRate 0.0000 Epoch: 35 Global Step: 736750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:42,184-Speed 6310.23 samples/sec Loss 2.7793 LearningRate 0.0000 Epoch: 35 Global Step: 736760 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:45:45,427-Speed 6317.11 samples/sec Loss 2.7761 LearningRate 0.0000 Epoch: 35 Global Step: 736770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:48,693-Speed 6271.08 
samples/sec Loss 2.8062 LearningRate 0.0000 Epoch: 35 Global Step: 736780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:51,953-Speed 6282.71 samples/sec Loss 2.7345 LearningRate 0.0000 Epoch: 35 Global Step: 736790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:55,210-Speed 6291.44 samples/sec Loss 2.7533 LearningRate 0.0000 Epoch: 35 Global Step: 736800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:45:58,469-Speed 6284.40 samples/sec Loss 2.7408 LearningRate 0.0000 Epoch: 35 Global Step: 736810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:01,725-Speed 6291.72 samples/sec Loss 2.7364 LearningRate 0.0000 Epoch: 35 Global Step: 736820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:04,986-Speed 6281.80 samples/sec Loss 2.7344 LearningRate 0.0000 Epoch: 35 Global Step: 736830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:08,245-Speed 6284.27 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 736840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:11,508-Speed 6277.50 samples/sec Loss 2.7186 LearningRate 0.0000 Epoch: 35 Global Step: 736850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:14,762-Speed 6297.63 samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 35 Global Step: 736860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:17,995-Speed 6334.67 samples/sec Loss 2.7831 LearningRate 0.0000 Epoch: 35 Global Step: 736870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:21,256-Speed 6281.72 samples/sec Loss 2.7829 LearningRate 0.0000 Epoch: 35 Global Step: 736880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:24,510-Speed 6296.26 samples/sec Loss 2.7472 LearningRate 0.0000 Epoch: 35 Global Step: 736890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:27,761-Speed 6300.13 samples/sec Loss 2.7772 LearningRate 0.0000 Epoch: 35 
Global Step: 736900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:31,025-Speed 6276.38 samples/sec Loss 2.7424 LearningRate 0.0000 Epoch: 35 Global Step: 736910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:34,281-Speed 6290.83 samples/sec Loss 2.7661 LearningRate 0.0000 Epoch: 35 Global Step: 736920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:37,531-Speed 6302.57 samples/sec Loss 2.6909 LearningRate 0.0000 Epoch: 35 Global Step: 736930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:40,788-Speed 6289.35 samples/sec Loss 2.7354 LearningRate 0.0000 Epoch: 35 Global Step: 736940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:44,043-Speed 6292.96 samples/sec Loss 2.7495 LearningRate 0.0000 Epoch: 35 Global Step: 736950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:47,300-Speed 6292.86 samples/sec Loss 2.7542 LearningRate 0.0000 Epoch: 35 Global Step: 736960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:50,569-Speed 6267.13 samples/sec Loss 2.7599 LearningRate 0.0000 Epoch: 35 Global Step: 736970 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:46:53,813-Speed 6313.28 samples/sec Loss 2.7485 LearningRate 0.0000 Epoch: 35 Global Step: 736980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:46:57,064-Speed 6300.94 samples/sec Loss 2.7079 LearningRate 0.0000 Epoch: 35 Global Step: 736990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:00,319-Speed 6294.51 samples/sec Loss 2.7015 LearningRate 0.0000 Epoch: 35 Global Step: 737000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:03,581-Speed 6279.61 samples/sec Loss 2.7678 LearningRate 0.0000 Epoch: 35 Global Step: 737010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:06,851-Speed 6264.54 samples/sec Loss 2.6964 LearningRate 0.0000 Epoch: 35 Global Step: 737020 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:47:10,108-Speed 6288.10 samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 35 Global Step: 737030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:13,375-Speed 6270.65 samples/sec Loss 2.8027 LearningRate 0.0000 Epoch: 35 Global Step: 737040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:16,638-Speed 6276.90 samples/sec Loss 2.7632 LearningRate 0.0000 Epoch: 35 Global Step: 737050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:19,893-Speed 6295.34 samples/sec Loss 2.7412 LearningRate 0.0000 Epoch: 35 Global Step: 737060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:23,154-Speed 6280.81 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 35 Global Step: 737070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:26,398-Speed 6315.12 samples/sec Loss 2.7528 LearningRate 0.0000 Epoch: 35 Global Step: 737080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:29,653-Speed 6293.62 samples/sec Loss 2.7611 LearningRate 0.0000 Epoch: 35 Global Step: 737090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:32,909-Speed 6290.22 samples/sec Loss 2.7614 LearningRate 0.0000 Epoch: 35 Global Step: 737100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:36,164-Speed 6294.47 samples/sec Loss 2.7547 LearningRate 0.0000 Epoch: 35 Global Step: 737110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:39,420-Speed 6290.99 samples/sec Loss 2.7635 LearningRate 0.0000 Epoch: 35 Global Step: 737120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:42,668-Speed 6306.49 samples/sec Loss 2.7320 LearningRate 0.0000 Epoch: 35 Global Step: 737130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:45,932-Speed 6277.20 samples/sec Loss 2.7576 LearningRate 0.0000 Epoch: 35 Global Step: 737140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:49,189-Speed 6288.08 
samples/sec Loss 2.7580 LearningRate 0.0000 Epoch: 35 Global Step: 737150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:52,441-Speed 6298.92 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 35 Global Step: 737160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:55,693-Speed 6299.96 samples/sec Loss 2.7059 LearningRate 0.0000 Epoch: 35 Global Step: 737170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:47:58,932-Speed 6323.42 samples/sec Loss 2.7573 LearningRate 0.0000 Epoch: 35 Global Step: 737180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:02,186-Speed 6294.88 samples/sec Loss 2.7465 LearningRate 0.0000 Epoch: 35 Global Step: 737190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:05,442-Speed 6291.31 samples/sec Loss 2.7416 LearningRate 0.0000 Epoch: 35 Global Step: 737200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:08,696-Speed 6295.25 samples/sec Loss 2.7269 LearningRate 0.0000 Epoch: 35 Global Step: 737210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:11,947-Speed 6302.55 samples/sec Loss 2.7618 LearningRate 0.0000 Epoch: 35 Global Step: 737220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:15,201-Speed 6294.42 samples/sec Loss 2.7417 LearningRate 0.0000 Epoch: 35 Global Step: 737230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:18,456-Speed 6293.31 samples/sec Loss 2.8109 LearningRate 0.0000 Epoch: 35 Global Step: 737240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:21,710-Speed 6294.16 samples/sec Loss 2.7705 LearningRate 0.0000 Epoch: 35 Global Step: 737250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:24,969-Speed 6285.58 samples/sec Loss 2.7912 LearningRate 0.0000 Epoch: 35 Global Step: 737260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:28,225-Speed 6292.48 samples/sec Loss 2.7459 LearningRate 0.0000 Epoch: 35 
Global Step: 737270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:31,472-Speed 6308.41 samples/sec Loss 2.7676 LearningRate 0.0000 Epoch: 35 Global Step: 737280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:34,729-Speed 6289.23 samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 35 Global Step: 737290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:37,980-Speed 6300.78 samples/sec Loss 2.7174 LearningRate 0.0000 Epoch: 35 Global Step: 737300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:41,241-Speed 6282.06 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 737310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:44,491-Speed 6304.47 samples/sec Loss 2.7547 LearningRate 0.0000 Epoch: 35 Global Step: 737320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:47,744-Speed 6296.29 samples/sec Loss 2.7080 LearningRate 0.0000 Epoch: 35 Global Step: 737330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:51,005-Speed 6281.02 samples/sec Loss 2.7541 LearningRate 0.0000 Epoch: 35 Global Step: 737340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:54,266-Speed 6280.96 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 737350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:48:57,532-Speed 6273.62 samples/sec Loss 2.7719 LearningRate 0.0000 Epoch: 35 Global Step: 737360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:00,784-Speed 6297.59 samples/sec Loss 2.7147 LearningRate 0.0000 Epoch: 35 Global Step: 737370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:04,043-Speed 6286.14 samples/sec Loss 2.7928 LearningRate 0.0000 Epoch: 35 Global Step: 737380 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:49:07,289-Speed 6310.47 samples/sec Loss 2.7717 LearningRate 0.0000 Epoch: 35 Global Step: 737390 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:49:10,543-Speed 6295.96 samples/sec Loss 2.7194 LearningRate 0.0000 Epoch: 35 Global Step: 737400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:13,797-Speed 6295.18 samples/sec Loss 2.7052 LearningRate 0.0000 Epoch: 35 Global Step: 737410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:17,052-Speed 6293.60 samples/sec Loss 2.7399 LearningRate 0.0000 Epoch: 35 Global Step: 737420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:20,315-Speed 6277.30 samples/sec Loss 2.7411 LearningRate 0.0000 Epoch: 35 Global Step: 737430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:23,573-Speed 6288.07 samples/sec Loss 2.7902 LearningRate 0.0000 Epoch: 35 Global Step: 737440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:26,836-Speed 6277.53 samples/sec Loss 2.6944 LearningRate 0.0000 Epoch: 35 Global Step: 737450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:30,096-Speed 6283.94 samples/sec Loss 2.7862 LearningRate 0.0000 Epoch: 35 Global Step: 737460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:33,357-Speed 6279.97 samples/sec Loss 2.7371 LearningRate 0.0000 Epoch: 35 Global Step: 737470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:36,612-Speed 6294.20 samples/sec Loss 2.7539 LearningRate 0.0000 Epoch: 35 Global Step: 737480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:39,862-Speed 6303.90 samples/sec Loss 2.7341 LearningRate 0.0000 Epoch: 35 Global Step: 737490 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:49:43,103-Speed 6321.30 samples/sec Loss 2.6994 LearningRate 0.0000 Epoch: 35 Global Step: 737500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:46,357-Speed 6294.67 samples/sec Loss 2.6550 LearningRate 0.0000 Epoch: 35 Global Step: 737510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:49,628-Speed 6261.91 
samples/sec Loss 2.7501 LearningRate 0.0000 Epoch: 35 Global Step: 737520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:52,889-Speed 6281.62 samples/sec Loss 2.7287 LearningRate 0.0000 Epoch: 35 Global Step: 737530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:56,150-Speed 6282.38 samples/sec Loss 2.6713 LearningRate 0.0000 Epoch: 35 Global Step: 737540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:49:59,406-Speed 6290.08 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 35 Global Step: 737550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:02,666-Speed 6284.03 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 35 Global Step: 737560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:05,927-Speed 6282.71 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 737570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:09,181-Speed 6294.17 samples/sec Loss 2.7261 LearningRate 0.0000 Epoch: 35 Global Step: 737580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:12,501-Speed 6169.79 samples/sec Loss 2.6970 LearningRate 0.0000 Epoch: 35 Global Step: 737590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:15,819-Speed 6173.38 samples/sec Loss 2.7314 LearningRate 0.0000 Epoch: 35 Global Step: 737600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:19,069-Speed 6304.40 samples/sec Loss 2.7236 LearningRate 0.0000 Epoch: 35 Global Step: 737610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:22,327-Speed 6285.69 samples/sec Loss 2.7119 LearningRate 0.0000 Epoch: 35 Global Step: 737620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:25,580-Speed 6297.17 samples/sec Loss 2.7293 LearningRate 0.0000 Epoch: 35 Global Step: 737630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:28,834-Speed 6295.52 samples/sec Loss 2.7022 LearningRate 0.0000 Epoch: 35 
Global Step: 737640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:32,091-Speed 6289.09 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 35 Global Step: 737650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:35,343-Speed 6299.92 samples/sec Loss 2.7133 LearningRate 0.0000 Epoch: 35 Global Step: 737660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:38,603-Speed 6282.55 samples/sec Loss 2.7391 LearningRate 0.0000 Epoch: 35 Global Step: 737670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:41,863-Speed 6285.30 samples/sec Loss 2.7262 LearningRate 0.0000 Epoch: 35 Global Step: 737680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:45,117-Speed 6294.41 samples/sec Loss 2.7880 LearningRate 0.0000 Epoch: 35 Global Step: 737690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:48,374-Speed 6289.76 samples/sec Loss 2.7516 LearningRate 0.0000 Epoch: 35 Global Step: 737700 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:50:51,614-Speed 6322.60 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 737710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:54,870-Speed 6292.03 samples/sec Loss 2.7638 LearningRate 0.0000 Epoch: 35 Global Step: 737720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:50:58,130-Speed 6283.55 samples/sec Loss 2.7385 LearningRate 0.0000 Epoch: 35 Global Step: 737730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:01,381-Speed 6300.43 samples/sec Loss 2.7179 LearningRate 0.0000 Epoch: 35 Global Step: 737740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:04,645-Speed 6275.22 samples/sec Loss 2.7541 LearningRate 0.0000 Epoch: 35 Global Step: 737750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:07,898-Speed 6297.92 samples/sec Loss 2.7388 LearningRate 0.0000 Epoch: 35 Global Step: 737760 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:51:11,148-Speed 6303.22 samples/sec Loss 2.7648 LearningRate 0.0000 Epoch: 35 Global Step: 737770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:14,408-Speed 6283.07 samples/sec Loss 2.7478 LearningRate 0.0000 Epoch: 35 Global Step: 737780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:17,662-Speed 6295.86 samples/sec Loss 2.7293 LearningRate 0.0000 Epoch: 35 Global Step: 737790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:20,915-Speed 6297.34 samples/sec Loss 2.7536 LearningRate 0.0000 Epoch: 35 Global Step: 737800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:24,161-Speed 6310.25 samples/sec Loss 2.6887 LearningRate 0.0000 Epoch: 35 Global Step: 737810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:27,424-Speed 6276.60 samples/sec Loss 2.7814 LearningRate 0.0000 Epoch: 35 Global Step: 737820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:30,683-Speed 6285.91 samples/sec Loss 2.7720 LearningRate 0.0000 Epoch: 35 Global Step: 737830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:33,939-Speed 6291.62 samples/sec Loss 2.7453 LearningRate 0.0000 Epoch: 35 Global Step: 737840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:37,275-Speed 6140.75 samples/sec Loss 2.7072 LearningRate 0.0000 Epoch: 35 Global Step: 737850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:40,539-Speed 6276.44 samples/sec Loss 2.7371 LearningRate 0.0000 Epoch: 35 Global Step: 737860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:43,788-Speed 6303.88 samples/sec Loss 2.7065 LearningRate 0.0000 Epoch: 35 Global Step: 737870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:47,039-Speed 6300.91 samples/sec Loss 2.7894 LearningRate 0.0000 Epoch: 35 Global Step: 737880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:50,293-Speed 6296.08 
samples/sec Loss 2.7232 LearningRate 0.0000 Epoch: 35 Global Step: 737890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:53,557-Speed 6275.01 samples/sec Loss 2.6885 LearningRate 0.0000 Epoch: 35 Global Step: 737900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:51:56,799-Speed 6320.11 samples/sec Loss 2.7443 LearningRate 0.0000 Epoch: 35 Global Step: 737910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:00,049-Speed 6302.54 samples/sec Loss 2.7695 LearningRate 0.0000 Epoch: 35 Global Step: 737920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:03,310-Speed 6281.60 samples/sec Loss 2.7675 LearningRate 0.0000 Epoch: 35 Global Step: 737930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:06,565-Speed 6293.61 samples/sec Loss 2.7272 LearningRate 0.0000 Epoch: 35 Global Step: 737940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:09,814-Speed 6305.05 samples/sec Loss 2.7531 LearningRate 0.0000 Epoch: 35 Global Step: 737950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:13,068-Speed 6294.77 samples/sec Loss 2.7108 LearningRate 0.0000 Epoch: 35 Global Step: 737960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:16,323-Speed 6293.32 samples/sec Loss 2.7535 LearningRate 0.0000 Epoch: 35 Global Step: 737970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:19,576-Speed 6295.99 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 35 Global Step: 737980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:22,841-Speed 6274.51 samples/sec Loss 2.7291 LearningRate 0.0000 Epoch: 35 Global Step: 737990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:26,088-Speed 6308.76 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 35 Global Step: 738000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:29,347-Speed 6284.87 samples/sec Loss 2.7170 LearningRate 0.0000 Epoch: 35 
Global Step: 738010 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:52:32,589-Speed 6318.49 samples/sec Loss 2.6676 LearningRate 0.0000 Epoch: 35 Global Step: 738020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:35,841-Speed 6299.98 samples/sec Loss 2.7363 LearningRate 0.0000 Epoch: 35 Global Step: 738030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:39,101-Speed 6282.68 samples/sec Loss 2.7155 LearningRate 0.0000 Epoch: 35 Global Step: 738040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:42,360-Speed 6285.61 samples/sec Loss 2.7376 LearningRate 0.0000 Epoch: 35 Global Step: 738050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:45,617-Speed 6290.71 samples/sec Loss 2.7419 LearningRate 0.0000 Epoch: 35 Global Step: 738060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:48,923-Speed 6196.55 samples/sec Loss 2.7541 LearningRate 0.0000 Epoch: 35 Global Step: 738070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:52,207-Speed 6236.12 samples/sec Loss 2.7799 LearningRate 0.0000 Epoch: 35 Global Step: 738080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:55,465-Speed 6287.18 samples/sec Loss 2.7592 LearningRate 0.0000 Epoch: 35 Global Step: 738090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:52:58,725-Speed 6284.80 samples/sec Loss 2.7511 LearningRate 0.0000 Epoch: 35 Global Step: 738100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:01,991-Speed 6270.96 samples/sec Loss 2.7266 LearningRate 0.0000 Epoch: 35 Global Step: 738110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:05,256-Speed 6275.23 samples/sec Loss 2.7462 LearningRate 0.0000 Epoch: 35 Global Step: 738120 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:53:08,500-Speed 6315.04 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 35 Global Step: 738130 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:53:11,749-Speed 6304.16 samples/sec Loss 2.7195 LearningRate 0.0000 Epoch: 35 Global Step: 738140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:15,006-Speed 6290.52 samples/sec Loss 2.7098 LearningRate 0.0000 Epoch: 35 Global Step: 738150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:18,267-Speed 6281.13 samples/sec Loss 2.7689 LearningRate 0.0000 Epoch: 35 Global Step: 738160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:21,523-Speed 6290.77 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 35 Global Step: 738170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:24,786-Speed 6279.00 samples/sec Loss 2.7504 LearningRate 0.0000 Epoch: 35 Global Step: 738180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:28,043-Speed 6288.98 samples/sec Loss 2.7493 LearningRate 0.0000 Epoch: 35 Global Step: 738190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:31,299-Speed 6290.31 samples/sec Loss 2.6892 LearningRate 0.0000 Epoch: 35 Global Step: 738200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:34,553-Speed 6295.94 samples/sec Loss 2.7387 LearningRate 0.0000 Epoch: 35 Global Step: 738210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:37,806-Speed 6296.76 samples/sec Loss 2.7469 LearningRate 0.0000 Epoch: 35 Global Step: 738220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:41,060-Speed 6294.99 samples/sec Loss 2.6933 LearningRate 0.0000 Epoch: 35 Global Step: 738230 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:53:44,298-Speed 6326.32 samples/sec Loss 2.7634 LearningRate 0.0000 Epoch: 35 Global Step: 738240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:47,556-Speed 6288.67 samples/sec Loss 2.7574 LearningRate 0.0000 Epoch: 35 Global Step: 738250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:50,813-Speed 6289.02 
samples/sec Loss 2.7048 LearningRate 0.0000 Epoch: 35 Global Step: 738260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:54,075-Speed 6279.16 samples/sec Loss 2.7313 LearningRate 0.0000 Epoch: 35 Global Step: 738270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:53:57,335-Speed 6283.81 samples/sec Loss 2.7037 LearningRate 0.0000 Epoch: 35 Global Step: 738280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:00,590-Speed 6292.89 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 35 Global Step: 738290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:03,846-Speed 6291.41 samples/sec Loss 2.7328 LearningRate 0.0000 Epoch: 35 Global Step: 738300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:07,101-Speed 6294.07 samples/sec Loss 2.7309 LearningRate 0.0000 Epoch: 35 Global Step: 738310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:10,360-Speed 6284.00 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 35 Global Step: 738320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:13,620-Speed 6285.28 samples/sec Loss 2.7735 LearningRate 0.0000 Epoch: 35 Global Step: 738330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:16,860-Speed 6323.38 samples/sec Loss 2.7082 LearningRate 0.0000 Epoch: 35 Global Step: 738340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:20,118-Speed 6287.08 samples/sec Loss 2.7040 LearningRate 0.0000 Epoch: 35 Global Step: 738350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:23,372-Speed 6294.39 samples/sec Loss 2.7749 LearningRate 0.0000 Epoch: 35 Global Step: 738360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:26,628-Speed 6291.05 samples/sec Loss 2.7316 LearningRate 0.0000 Epoch: 35 Global Step: 738370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:29,893-Speed 6274.81 samples/sec Loss 2.7410 LearningRate 0.0000 Epoch: 35 
Global Step: 738380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:33,154-Speed 6282.29 samples/sec Loss 2.7418 LearningRate 0.0000 Epoch: 35 Global Step: 738390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:36,408-Speed 6294.98 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 35 Global Step: 738400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:39,668-Speed 6283.51 samples/sec Loss 2.7783 LearningRate 0.0000 Epoch: 35 Global Step: 738410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:42,924-Speed 6290.32 samples/sec Loss 2.6577 LearningRate 0.0000 Epoch: 35 Global Step: 738420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:46,173-Speed 6304.43 samples/sec Loss 2.6982 LearningRate 0.0000 Epoch: 35 Global Step: 738430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:49,417-Speed 6315.14 samples/sec Loss 2.7134 LearningRate 0.0000 Epoch: 35 Global Step: 738440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:52,673-Speed 6292.63 samples/sec Loss 2.7521 LearningRate 0.0000 Epoch: 35 Global Step: 738450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:55,936-Speed 6276.93 samples/sec Loss 2.6994 LearningRate 0.0000 Epoch: 35 Global Step: 738460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:54:59,193-Speed 6289.06 samples/sec Loss 2.7484 LearningRate 0.0000 Epoch: 35 Global Step: 738470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:02,451-Speed 6286.99 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 738480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:05,711-Speed 6284.16 samples/sec Loss 2.7742 LearningRate 0.0000 Epoch: 35 Global Step: 738490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:08,972-Speed 6281.85 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 35 Global Step: 738500 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:55:12,240-Speed 6267.67 samples/sec Loss 2.7178 LearningRate 0.0000 Epoch: 35 Global Step: 738510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:15,498-Speed 6288.56 samples/sec Loss 2.7897 LearningRate 0.0000 Epoch: 35 Global Step: 738520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:18,758-Speed 6284.61 samples/sec Loss 2.7474 LearningRate 0.0000 Epoch: 35 Global Step: 738530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:22,016-Speed 6286.48 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 35 Global Step: 738540 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:55:25,254-Speed 6327.12 samples/sec Loss 2.7366 LearningRate 0.0000 Epoch: 35 Global Step: 738550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:28,512-Speed 6286.70 samples/sec Loss 2.7014 LearningRate 0.0000 Epoch: 35 Global Step: 738560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:31,768-Speed 6292.14 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 35 Global Step: 738570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:35,023-Speed 6292.68 samples/sec Loss 2.7074 LearningRate 0.0000 Epoch: 35 Global Step: 738580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:38,285-Speed 6278.76 samples/sec Loss 2.7591 LearningRate 0.0000 Epoch: 35 Global Step: 738590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:41,550-Speed 6275.83 samples/sec Loss 2.7364 LearningRate 0.0000 Epoch: 35 Global Step: 738600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:44,807-Speed 6288.77 samples/sec Loss 2.7171 LearningRate 0.0000 Epoch: 35 Global Step: 738610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:48,068-Speed 6280.53 samples/sec Loss 2.7456 LearningRate 0.0000 Epoch: 35 Global Step: 738620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:51,326-Speed 6288.05 
samples/sec Loss 2.7724 LearningRate 0.0000 Epoch: 35 Global Step: 738630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:54,584-Speed 6287.37 samples/sec Loss 2.7180 LearningRate 0.0000 Epoch: 35 Global Step: 738640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:55:57,825-Speed 6320.02 samples/sec Loss 2.7465 LearningRate 0.0000 Epoch: 35 Global Step: 738650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:01,090-Speed 6273.65 samples/sec Loss 2.7210 LearningRate 0.0000 Epoch: 35 Global Step: 738660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:04,354-Speed 6277.40 samples/sec Loss 2.7675 LearningRate 0.0000 Epoch: 35 Global Step: 738670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:07,614-Speed 6283.19 samples/sec Loss 2.7648 LearningRate 0.0000 Epoch: 35 Global Step: 738680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:10,862-Speed 6306.11 samples/sec Loss 2.7482 LearningRate 0.0000 Epoch: 35 Global Step: 738690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:14,124-Speed 6279.58 samples/sec Loss 2.7232 LearningRate 0.0000 Epoch: 35 Global Step: 738700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:17,384-Speed 6284.86 samples/sec Loss 2.7225 LearningRate 0.0000 Epoch: 35 Global Step: 738710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:20,638-Speed 6295.57 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 738720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:23,898-Speed 6284.26 samples/sec Loss 2.7543 LearningRate 0.0000 Epoch: 35 Global Step: 738730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:27,166-Speed 6267.96 samples/sec Loss 2.7654 LearningRate 0.0000 Epoch: 35 Global Step: 738740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:30,402-Speed 6329.30 samples/sec Loss 2.7077 LearningRate 0.0000 Epoch: 35 
Global Step: 738750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:33,658-Speed 6292.26 samples/sec Loss 2.7731 LearningRate 0.0000 Epoch: 35 Global Step: 738760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:36,938-Speed 6244.55 samples/sec Loss 2.7914 LearningRate 0.0000 Epoch: 35 Global Step: 738770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:40,195-Speed 6288.77 samples/sec Loss 2.7358 LearningRate 0.0000 Epoch: 35 Global Step: 738780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:43,453-Speed 6287.91 samples/sec Loss 2.7292 LearningRate 0.0000 Epoch: 35 Global Step: 738790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:46,716-Speed 6278.88 samples/sec Loss 2.7027 LearningRate 0.0000 Epoch: 35 Global Step: 738800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:50,059-Speed 6127.71 samples/sec Loss 2.7339 LearningRate 0.0000 Epoch: 35 Global Step: 738810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:53,354-Speed 6217.17 samples/sec Loss 2.7379 LearningRate 0.0000 Epoch: 35 Global Step: 738820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:56,611-Speed 6288.26 samples/sec Loss 2.7075 LearningRate 0.0000 Epoch: 35 Global Step: 738830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:56:59,872-Speed 6281.52 samples/sec Loss 2.7601 LearningRate 0.0000 Epoch: 35 Global Step: 738840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:03,128-Speed 6290.73 samples/sec Loss 2.7701 LearningRate 0.0000 Epoch: 35 Global Step: 738850 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:57:06,385-Speed 6291.09 samples/sec Loss 2.7794 LearningRate 0.0000 Epoch: 35 Global Step: 738860 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:57:09,624-Speed 6323.08 samples/sec Loss 2.6753 LearningRate 0.0000 Epoch: 35 Global Step: 738870 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:57:12,885-Speed 6281.08 samples/sec Loss 2.7440 LearningRate 0.0000 Epoch: 35 Global Step: 738880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:16,147-Speed 6280.54 samples/sec Loss 2.7153 LearningRate 0.0000 Epoch: 35 Global Step: 738890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:19,400-Speed 6296.50 samples/sec Loss 2.7398 LearningRate 0.0000 Epoch: 35 Global Step: 738900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:22,654-Speed 6295.28 samples/sec Loss 2.7211 LearningRate 0.0000 Epoch: 35 Global Step: 738910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:25,916-Speed 6279.93 samples/sec Loss 2.7515 LearningRate 0.0000 Epoch: 35 Global Step: 738920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:29,171-Speed 6294.89 samples/sec Loss 2.7330 LearningRate 0.0000 Epoch: 35 Global Step: 738930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:32,431-Speed 6283.39 samples/sec Loss 2.7583 LearningRate 0.0000 Epoch: 35 Global Step: 738940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:35,695-Speed 6276.63 samples/sec Loss 2.7391 LearningRate 0.0000 Epoch: 35 Global Step: 738950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:38,950-Speed 6293.35 samples/sec Loss 2.7581 LearningRate 0.0000 Epoch: 35 Global Step: 738960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:57:42,201-Speed 6299.81 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 738970 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:57:45,425-Speed 6353.85 samples/sec Loss 2.6876 LearningRate 0.0000 Epoch: 35 Global Step: 738980 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:57:48,686-Speed 6282.35 samples/sec Loss 2.7090 LearningRate 0.0000 Epoch: 35 Global Step: 738990 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:57:51,938-Speed 6299.47 
samples/sec Loss 2.7659 LearningRate 0.0000 Epoch: 35 Global Step: 739000 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:57:55,193-Speed 6292.61 samples/sec Loss 2.7723 LearningRate 0.0000 Epoch: 35 Global Step: 739010 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:57:58,449-Speed 6291.01 samples/sec Loss 2.7491 LearningRate 0.0000 Epoch: 35 Global Step: 739020 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:01,709-Speed 6283.66 samples/sec Loss 2.7169 LearningRate 0.0000 Epoch: 35 Global Step: 739030 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:04,963-Speed 6295.01 samples/sec Loss 2.7183 LearningRate 0.0000 Epoch: 35 Global Step: 739040 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:08,216-Speed 6297.01 samples/sec Loss 2.7783 LearningRate 0.0000 Epoch: 35 Global Step: 739050 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:11,483-Speed 6273.43 samples/sec Loss 2.7387 LearningRate 0.0000 Epoch: 35 Global Step: 739060 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:14,746-Speed 6278.23 samples/sec Loss 2.7951 LearningRate 0.0000 Epoch: 35 Global Step: 739070 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 10:58:18,006-Speed 6283.80 samples/sec Loss 2.7321 LearningRate 0.0000 Epoch: 35 Global Step: 739080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:21,267-Speed 6282.43 samples/sec Loss 2.7638 LearningRate 0.0000 Epoch: 35 Global Step: 739090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:24,526-Speed 6283.97 samples/sec Loss 2.7984 LearningRate 0.0000 Epoch: 35 Global Step: 739100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:27,786-Speed 6283.96 samples/sec Loss 2.7014 LearningRate 0.0000 Epoch: 35 Global Step: 739110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:31,043-Speed 6290.29 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 35 
Global Step: 739120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:34,299-Speed 6290.70 samples/sec Loss 2.7163 LearningRate 0.0000 Epoch: 35 Global Step: 739130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:37,556-Speed 6289.68 samples/sec Loss 2.7213 LearningRate 0.0000 Epoch: 35 Global Step: 739140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:40,817-Speed 6282.54 samples/sec Loss 2.7219 LearningRate 0.0000 Epoch: 35 Global Step: 739150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:44,078-Speed 6281.08 samples/sec Loss 2.7432 LearningRate 0.0000 Epoch: 35 Global Step: 739160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:47,333-Speed 6293.61 samples/sec Loss 2.7482 LearningRate 0.0000 Epoch: 35 Global Step: 739170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:50,573-Speed 6322.84 samples/sec Loss 2.7297 LearningRate 0.0000 Epoch: 35 Global Step: 739180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:53,829-Speed 6291.13 samples/sec Loss 2.7439 LearningRate 0.0000 Epoch: 35 Global Step: 739190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:58:57,084-Speed 6293.20 samples/sec Loss 2.7436 LearningRate 0.0000 Epoch: 35 Global Step: 739200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:00,338-Speed 6295.23 samples/sec Loss 2.7700 LearningRate 0.0000 Epoch: 35 Global Step: 739210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:03,595-Speed 6288.34 samples/sec Loss 2.8075 LearningRate 0.0000 Epoch: 35 Global Step: 739220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:06,858-Speed 6278.38 samples/sec Loss 2.7253 LearningRate 0.0000 Epoch: 35 Global Step: 739230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:10,111-Speed 6297.61 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 739240 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 10:59:13,369-Speed 6286.77 samples/sec Loss 2.7670 LearningRate 0.0000 Epoch: 35 Global Step: 739250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:16,624-Speed 6293.96 samples/sec Loss 2.7305 LearningRate 0.0000 Epoch: 35 Global Step: 739260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:19,880-Speed 6291.06 samples/sec Loss 2.7193 LearningRate 0.0000 Epoch: 35 Global Step: 739270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:23,135-Speed 6292.23 samples/sec Loss 2.7148 LearningRate 0.0000 Epoch: 35 Global Step: 739280 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 10:59:26,381-Speed 6311.81 samples/sec Loss 2.7844 LearningRate 0.0000 Epoch: 35 Global Step: 739290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:29,634-Speed 6297.01 samples/sec Loss 2.7189 LearningRate 0.0000 Epoch: 35 Global Step: 739300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:32,884-Speed 6303.29 samples/sec Loss 2.7220 LearningRate 0.0000 Epoch: 35 Global Step: 739310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:36,146-Speed 6278.27 samples/sec Loss 2.7748 LearningRate 0.0000 Epoch: 35 Global Step: 739320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:39,408-Speed 6279.38 samples/sec Loss 2.7700 LearningRate 0.0000 Epoch: 35 Global Step: 739330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:42,670-Speed 6281.73 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 35 Global Step: 739340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:45,924-Speed 6294.59 samples/sec Loss 2.7037 LearningRate 0.0000 Epoch: 35 Global Step: 739350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:49,187-Speed 6278.61 samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 35 Global Step: 739360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:52,442-Speed 6292.64 
samples/sec Loss 2.7922 LearningRate 0.0000 Epoch: 35 Global Step: 739370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:55,704-Speed 6280.12 samples/sec Loss 2.7462 LearningRate 0.0000 Epoch: 35 Global Step: 739380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 10:59:58,966-Speed 6280.33 samples/sec Loss 2.7150 LearningRate 0.0000 Epoch: 35 Global Step: 739390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:00:02,234-Speed 6268.23 samples/sec Loss 2.7310 LearningRate 0.0000 Epoch: 35 Global Step: 739400 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:00:05,493-Speed 6285.60 samples/sec Loss 2.7476 LearningRate 0.0000 Epoch: 35 Global Step: 739410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:08,762-Speed 6265.05 samples/sec Loss 2.7433 LearningRate 0.0000 Epoch: 35 Global Step: 739420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:12,016-Speed 6296.50 samples/sec Loss 2.7004 LearningRate 0.0000 Epoch: 35 Global Step: 739430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:15,269-Speed 6296.95 samples/sec Loss 2.7527 LearningRate 0.0000 Epoch: 35 Global Step: 739440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:18,529-Speed 6283.58 samples/sec Loss 2.7031 LearningRate 0.0000 Epoch: 35 Global Step: 739450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:21,792-Speed 6278.51 samples/sec Loss 2.7086 LearningRate 0.0000 Epoch: 35 Global Step: 739460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:25,043-Speed 6299.86 samples/sec Loss 2.7270 LearningRate 0.0000 Epoch: 35 Global Step: 739470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:28,303-Speed 6286.36 samples/sec Loss 2.7272 LearningRate 0.0000 Epoch: 35 Global Step: 739480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:31,554-Speed 6300.92 samples/sec Loss 2.7140 LearningRate 0.0000 Epoch: 35 
Global Step: 739490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:34,811-Speed 6289.74 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 739500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:38,055-Speed 6313.31 samples/sec Loss 2.7288 LearningRate 0.0000 Epoch: 35 Global Step: 739510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:41,311-Speed 6292.55 samples/sec Loss 2.7386 LearningRate 0.0000 Epoch: 35 Global Step: 739520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:44,565-Speed 6294.86 samples/sec Loss 2.8135 LearningRate 0.0000 Epoch: 35 Global Step: 739530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:47,821-Speed 6290.28 samples/sec Loss 2.7579 LearningRate 0.0000 Epoch: 35 Global Step: 739540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:51,084-Speed 6278.85 samples/sec Loss 2.7511 LearningRate 0.0000 Epoch: 35 Global Step: 739550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:54,344-Speed 6283.64 samples/sec Loss 2.7526 LearningRate 0.0000 Epoch: 35 Global Step: 739560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:00:57,601-Speed 6289.72 samples/sec Loss 2.7577 LearningRate 0.0000 Epoch: 35 Global Step: 739570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:00,857-Speed 6294.92 samples/sec Loss 2.7040 LearningRate 0.0000 Epoch: 35 Global Step: 739580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:04,120-Speed 6277.65 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 739590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:07,381-Speed 6282.71 samples/sec Loss 2.7049 LearningRate 0.0000 Epoch: 35 Global Step: 739600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:10,635-Speed 6294.53 samples/sec Loss 2.7191 LearningRate 0.0000 Epoch: 35 Global Step: 739610 Fp16 Grad Scale: 8192 Required: 8 
hours Training: 2022-04-03 11:01:13,879-Speed 6314.19 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 35 Global Step: 739620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:17,140-Speed 6281.92 samples/sec Loss 2.7157 LearningRate 0.0000 Epoch: 35 Global Step: 739630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:20,395-Speed 6292.77 samples/sec Loss 2.7412 LearningRate 0.0000 Epoch: 35 Global Step: 739640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:23,653-Speed 6287.67 samples/sec Loss 2.7296 LearningRate 0.0000 Epoch: 35 Global Step: 739650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:26,916-Speed 6279.27 samples/sec Loss 2.6553 LearningRate 0.0000 Epoch: 35 Global Step: 739660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:30,177-Speed 6281.59 samples/sec Loss 2.7460 LearningRate 0.0000 Epoch: 35 Global Step: 739670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:33,434-Speed 6289.02 samples/sec Loss 2.7210 LearningRate 0.0000 Epoch: 35 Global Step: 739680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:36,690-Speed 6290.61 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 35 Global Step: 739690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:39,946-Speed 6290.65 samples/sec Loss 2.6936 LearningRate 0.0000 Epoch: 35 Global Step: 739700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:43,199-Speed 6297.91 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 739710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:46,443-Speed 6313.83 samples/sec Loss 2.7351 LearningRate 0.0000 Epoch: 35 Global Step: 739720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:49,701-Speed 6288.65 samples/sec Loss 2.7191 LearningRate 0.0000 Epoch: 35 Global Step: 739730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:52,951-Speed 6302.96 
samples/sec Loss 2.7185 LearningRate 0.0000 Epoch: 35 Global Step: 739740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:56,206-Speed 6293.00 samples/sec Loss 2.7915 LearningRate 0.0000 Epoch: 35 Global Step: 739750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:01:59,464-Speed 6286.83 samples/sec Loss 2.7685 LearningRate 0.0000 Epoch: 35 Global Step: 739760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:02,727-Speed 6278.96 samples/sec Loss 2.7310 LearningRate 0.0000 Epoch: 35 Global Step: 739770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:05,984-Speed 6288.14 samples/sec Loss 2.7194 LearningRate 0.0000 Epoch: 35 Global Step: 739780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:09,242-Speed 6288.10 samples/sec Loss 2.7634 LearningRate 0.0000 Epoch: 35 Global Step: 739790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:12,501-Speed 6285.49 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 35 Global Step: 739800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:15,758-Speed 6289.44 samples/sec Loss 2.7516 LearningRate 0.0000 Epoch: 35 Global Step: 739810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:18,993-Speed 6332.03 samples/sec Loss 2.6561 LearningRate 0.0000 Epoch: 35 Global Step: 739820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:22,267-Speed 6258.07 samples/sec Loss 2.6931 LearningRate 0.0000 Epoch: 35 Global Step: 739830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:25,523-Speed 6290.53 samples/sec Loss 2.7022 LearningRate 0.0000 Epoch: 35 Global Step: 739840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:28,779-Speed 6291.43 samples/sec Loss 2.7322 LearningRate 0.0000 Epoch: 35 Global Step: 739850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:32,036-Speed 6290.50 samples/sec Loss 2.7152 LearningRate 0.0000 Epoch: 35 
Global Step: 739860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:35,294-Speed 6285.81 samples/sec Loss 2.7616 LearningRate 0.0000 Epoch: 35 Global Step: 739870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:38,559-Speed 6273.48 samples/sec Loss 2.7252 LearningRate 0.0000 Epoch: 35 Global Step: 739880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:41,827-Speed 6269.69 samples/sec Loss 2.7556 LearningRate 0.0000 Epoch: 35 Global Step: 739890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:45,087-Speed 6283.37 samples/sec Loss 2.7437 LearningRate 0.0000 Epoch: 35 Global Step: 739900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:48,348-Speed 6282.16 samples/sec Loss 2.7136 LearningRate 0.0000 Epoch: 35 Global Step: 739910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:51,594-Speed 6309.76 samples/sec Loss 2.7635 LearningRate 0.0000 Epoch: 35 Global Step: 739920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:54,849-Speed 6292.82 samples/sec Loss 2.7502 LearningRate 0.0000 Epoch: 35 Global Step: 739930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:02:58,104-Speed 6294.98 samples/sec Loss 2.7143 LearningRate 0.0000 Epoch: 35 Global Step: 739940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:01,363-Speed 6284.40 samples/sec Loss 2.7704 LearningRate 0.0000 Epoch: 35 Global Step: 739950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:04,621-Speed 6287.60 samples/sec Loss 2.7976 LearningRate 0.0000 Epoch: 35 Global Step: 739960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:07,880-Speed 6285.31 samples/sec Loss 2.7824 LearningRate 0.0000 Epoch: 35 Global Step: 739970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:11,138-Speed 6287.01 samples/sec Loss 2.7729 LearningRate 0.0000 Epoch: 35 Global Step: 739980 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:03:14,393-Speed 6294.30 samples/sec Loss 2.7207 LearningRate 0.0000 Epoch: 35 Global Step: 739990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:17,654-Speed 6282.18 samples/sec Loss 2.7119 LearningRate 0.0000 Epoch: 35 Global Step: 740000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:20,912-Speed 6288.05 samples/sec Loss 2.7234 LearningRate 0.0000 Epoch: 35 Global Step: 740010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:24,144-Speed 6336.77 samples/sec Loss 2.7497 LearningRate 0.0000 Epoch: 35 Global Step: 740020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:27,410-Speed 6272.44 samples/sec Loss 2.7301 LearningRate 0.0000 Epoch: 35 Global Step: 740030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:30,668-Speed 6287.38 samples/sec Loss 2.7028 LearningRate 0.0000 Epoch: 35 Global Step: 740040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:33,922-Speed 6295.02 samples/sec Loss 2.7833 LearningRate 0.0000 Epoch: 35 Global Step: 740050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:37,184-Speed 6279.30 samples/sec Loss 2.7280 LearningRate 0.0000 Epoch: 35 Global Step: 740060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:40,439-Speed 6293.31 samples/sec Loss 2.7025 LearningRate 0.0000 Epoch: 35 Global Step: 740070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:43,697-Speed 6288.05 samples/sec Loss 2.7058 LearningRate 0.0000 Epoch: 35 Global Step: 740080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:46,956-Speed 6285.95 samples/sec Loss 2.7315 LearningRate 0.0000 Epoch: 35 Global Step: 740090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:50,214-Speed 6287.67 samples/sec Loss 2.7126 LearningRate 0.0000 Epoch: 35 Global Step: 740100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:53,472-Speed 6287.68 
samples/sec Loss 2.7355 LearningRate 0.0000 Epoch: 35 Global Step: 740110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:03:56,726-Speed 6294.28 samples/sec Loss 2.7353 LearningRate 0.0000 Epoch: 35 Global Step: 740120 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:03:59,965-Speed 6324.56 samples/sec Loss 2.7604 LearningRate 0.0000 Epoch: 35 Global Step: 740130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:03,234-Speed 6266.57 samples/sec Loss 2.7484 LearningRate 0.0000 Epoch: 35 Global Step: 740140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:06,495-Speed 6282.07 samples/sec Loss 2.7107 LearningRate 0.0000 Epoch: 35 Global Step: 740150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:09,747-Speed 6297.42 samples/sec Loss 2.7825 LearningRate 0.0000 Epoch: 35 Global Step: 740160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:13,009-Speed 6281.39 samples/sec Loss 2.7258 LearningRate 0.0000 Epoch: 35 Global Step: 740170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:16,274-Speed 6272.36 samples/sec Loss 2.7059 LearningRate 0.0000 Epoch: 35 Global Step: 740180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:19,534-Speed 6283.61 samples/sec Loss 2.7289 LearningRate 0.0000 Epoch: 35 Global Step: 740190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:22,791-Speed 6291.07 samples/sec Loss 2.7059 LearningRate 0.0000 Epoch: 35 Global Step: 740200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:26,044-Speed 6296.19 samples/sec Loss 2.7394 LearningRate 0.0000 Epoch: 35 Global Step: 740210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:29,304-Speed 6283.84 samples/sec Loss 2.7391 LearningRate 0.0000 Epoch: 35 Global Step: 740220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:32,560-Speed 6290.89 samples/sec Loss 2.7108 LearningRate 0.0000 Epoch: 35 
Global Step: 740230 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:04:35,799-Speed 6324.70 samples/sec Loss 2.6928 LearningRate 0.0000 Epoch: 35 Global Step: 740240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:39,052-Speed 6297.90 samples/sec Loss 2.7573 LearningRate 0.0000 Epoch: 35 Global Step: 740250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:42,310-Speed 6287.00 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 740260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:45,563-Speed 6297.70 samples/sec Loss 2.7747 LearningRate 0.0000 Epoch: 35 Global Step: 740270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:48,820-Speed 6288.72 samples/sec Loss 2.7701 LearningRate 0.0000 Epoch: 35 Global Step: 740280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:52,086-Speed 6272.13 samples/sec Loss 2.7412 LearningRate 0.0000 Epoch: 35 Global Step: 740290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:55,344-Speed 6287.95 samples/sec Loss 2.7481 LearningRate 0.0000 Epoch: 35 Global Step: 740300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:04:58,608-Speed 6276.58 samples/sec Loss 2.7651 LearningRate 0.0000 Epoch: 35 Global Step: 740310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:01,864-Speed 6290.47 samples/sec Loss 2.7893 LearningRate 0.0000 Epoch: 35 Global Step: 740320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:05,119-Speed 6293.49 samples/sec Loss 2.7146 LearningRate 0.0000 Epoch: 35 Global Step: 740330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:08,363-Speed 6314.13 samples/sec Loss 2.7143 LearningRate 0.0000 Epoch: 35 Global Step: 740340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:11,615-Speed 6299.29 samples/sec Loss 2.7400 LearningRate 0.0000 Epoch: 35 Global Step: 740350 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:05:14,872-Speed 6290.22 samples/sec Loss 2.7297 LearningRate 0.0000 Epoch: 35 Global Step: 740360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:18,135-Speed 6277.03 samples/sec Loss 2.7327 LearningRate 0.0000 Epoch: 35 Global Step: 740370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:21,393-Speed 6287.58 samples/sec Loss 2.7134 LearningRate 0.0000 Epoch: 35 Global Step: 740380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:24,653-Speed 6283.80 samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 35 Global Step: 740390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:27,907-Speed 6294.61 samples/sec Loss 2.7568 LearningRate 0.0000 Epoch: 35 Global Step: 740400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:31,172-Speed 6274.44 samples/sec Loss 2.7064 LearningRate 0.0000 Epoch: 35 Global Step: 740410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:34,429-Speed 6289.63 samples/sec Loss 2.7300 LearningRate 0.0000 Epoch: 35 Global Step: 740420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:37,692-Speed 6278.58 samples/sec Loss 2.7545 LearningRate 0.0000 Epoch: 35 Global Step: 740430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:40,934-Speed 6317.60 samples/sec Loss 2.7042 LearningRate 0.0000 Epoch: 35 Global Step: 740440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:44,193-Speed 6287.23 samples/sec Loss 2.7317 LearningRate 0.0000 Epoch: 35 Global Step: 740450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:47,446-Speed 6297.42 samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 35 Global Step: 740460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:50,698-Speed 6298.75 samples/sec Loss 2.6793 LearningRate 0.0000 Epoch: 35 Global Step: 740470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:53,946-Speed 6305.62 
samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 35 Global Step: 740480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:05:57,201-Speed 6293.64 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 35 Global Step: 740490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:00,463-Speed 6279.85 samples/sec Loss 2.7587 LearningRate 0.0000 Epoch: 35 Global Step: 740500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:03,817-Speed 6108.43 samples/sec Loss 2.7215 LearningRate 0.0000 Epoch: 35 Global Step: 740510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:07,133-Speed 6177.16 samples/sec Loss 2.7670 LearningRate 0.0000 Epoch: 35 Global Step: 740520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:10,387-Speed 6295.32 samples/sec Loss 2.7713 LearningRate 0.0000 Epoch: 35 Global Step: 740530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:13,622-Speed 6330.65 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 35 Global Step: 740540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:16,875-Speed 6297.78 samples/sec Loss 2.6814 LearningRate 0.0000 Epoch: 35 Global Step: 740550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:20,129-Speed 6295.05 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 35 Global Step: 740560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:23,387-Speed 6286.37 samples/sec Loss 2.8021 LearningRate 0.0000 Epoch: 35 Global Step: 740570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:26,642-Speed 6293.23 samples/sec Loss 2.7388 LearningRate 0.0000 Epoch: 35 Global Step: 740580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:29,888-Speed 6312.47 samples/sec Loss 2.7402 LearningRate 0.0000 Epoch: 35 Global Step: 740590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:33,146-Speed 6285.60 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 35 
Global Step: 740600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:36,398-Speed 6300.11 samples/sec Loss 2.7047 LearningRate 0.0000 Epoch: 35 Global Step: 740610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:39,653-Speed 6292.74 samples/sec Loss 2.7141 LearningRate 0.0000 Epoch: 35 Global Step: 740620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:42,912-Speed 6286.47 samples/sec Loss 2.6843 LearningRate 0.0000 Epoch: 35 Global Step: 740630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:46,172-Speed 6283.09 samples/sec Loss 2.6862 LearningRate 0.0000 Epoch: 35 Global Step: 740640 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:06:49,428-Speed 6290.93 samples/sec Loss 2.7715 LearningRate 0.0000 Epoch: 35 Global Step: 740650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:52,694-Speed 6273.20 samples/sec Loss 2.7818 LearningRate 0.0000 Epoch: 35 Global Step: 740660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:55,954-Speed 6283.53 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 35 Global Step: 740670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:06:59,214-Speed 6282.64 samples/sec Loss 2.6970 LearningRate 0.0000 Epoch: 35 Global Step: 740680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:02,469-Speed 6293.30 samples/sec Loss 2.7066 LearningRate 0.0000 Epoch: 35 Global Step: 740690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:05,724-Speed 6294.59 samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 35 Global Step: 740700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:08,982-Speed 6286.54 samples/sec Loss 2.7356 LearningRate 0.0000 Epoch: 35 Global Step: 740710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:12,232-Speed 6302.52 samples/sec Loss 2.7078 LearningRate 0.0000 Epoch: 35 Global Step: 740720 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:07:15,484-Speed 6300.19 samples/sec Loss 2.7151 LearningRate 0.0000 Epoch: 35 Global Step: 740730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:18,741-Speed 6290.28 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 35 Global Step: 740740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:21,984-Speed 6314.78 samples/sec Loss 2.7764 LearningRate 0.0000 Epoch: 35 Global Step: 740750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:25,241-Speed 6290.33 samples/sec Loss 2.7504 LearningRate 0.0000 Epoch: 35 Global Step: 740760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:28,502-Speed 6281.18 samples/sec Loss 2.7252 LearningRate 0.0000 Epoch: 35 Global Step: 740770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:31,765-Speed 6277.11 samples/sec Loss 2.7190 LearningRate 0.0000 Epoch: 35 Global Step: 740780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:35,018-Speed 6297.15 samples/sec Loss 2.7086 LearningRate 0.0000 Epoch: 35 Global Step: 740790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:38,278-Speed 6284.67 samples/sec Loss 2.6654 LearningRate 0.0000 Epoch: 35 Global Step: 740800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:41,537-Speed 6284.34 samples/sec Loss 2.7008 LearningRate 0.0000 Epoch: 35 Global Step: 740810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:44,787-Speed 6302.97 samples/sec Loss 2.7138 LearningRate 0.0000 Epoch: 35 Global Step: 740820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:48,041-Speed 6296.59 samples/sec Loss 2.7214 LearningRate 0.0000 Epoch: 35 Global Step: 740830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:51,301-Speed 6283.32 samples/sec Loss 2.7224 LearningRate 0.0000 Epoch: 35 Global Step: 740840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:07:54,560-Speed 6285.11 
samples/sec Loss 2.7609 LearningRate 0.0000 Epoch: 35 Global Step: 740850 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:07:57,833-Speed 6258.95 samples/sec Loss 2.7620 LearningRate 0.0000 Epoch: 35 Global Step: 740860 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:08:01,075-Speed 6318.80 samples/sec Loss 2.7506 LearningRate 0.0000 Epoch: 35 Global Step: 740870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:04,332-Speed 6289.88 samples/sec Loss 2.7349 LearningRate 0.0000 Epoch: 35 Global Step: 740880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:07,583-Speed 6300.62 samples/sec Loss 2.6808 LearningRate 0.0000 Epoch: 35 Global Step: 740890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:10,839-Speed 6292.65 samples/sec Loss 2.7683 LearningRate 0.0000 Epoch: 35 Global Step: 740900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:14,090-Speed 6301.13 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 35 Global Step: 740910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:17,347-Speed 6289.37 samples/sec Loss 2.7492 LearningRate 0.0000 Epoch: 35 Global Step: 740920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:20,604-Speed 6288.89 samples/sec Loss 2.7595 LearningRate 0.0000 Epoch: 35 Global Step: 740930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:23,862-Speed 6286.59 samples/sec Loss 2.7863 LearningRate 0.0000 Epoch: 35 Global Step: 740940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:27,124-Speed 6279.73 samples/sec Loss 2.7433 LearningRate 0.0000 Epoch: 35 Global Step: 740950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:30,384-Speed 6283.52 samples/sec Loss 2.7276 LearningRate 0.0000 Epoch: 35 Global Step: 740960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:33,640-Speed 6292.95 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 35 
Global Step: 740970 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:08:36,871-Speed 6338.97 samples/sec Loss 2.7309 LearningRate 0.0000 Epoch: 35 Global Step: 740980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:40,124-Speed 6296.08 samples/sec Loss 2.7543 LearningRate 0.0000 Epoch: 35 Global Step: 740990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:43,379-Speed 6293.18 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 741000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:46,643-Speed 6276.34 samples/sec Loss 2.7407 LearningRate 0.0000 Epoch: 35 Global Step: 741010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:49,905-Speed 6281.04 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 741020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:53,161-Speed 6290.23 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 741030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:56,412-Speed 6300.31 samples/sec Loss 2.6881 LearningRate 0.0000 Epoch: 35 Global Step: 741040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:08:59,667-Speed 6293.36 samples/sec Loss 2.7610 LearningRate 0.0000 Epoch: 35 Global Step: 741050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:02,928-Speed 6282.31 samples/sec Loss 2.7396 LearningRate 0.0000 Epoch: 35 Global Step: 741060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:06,181-Speed 6298.36 samples/sec Loss 2.7955 LearningRate 0.0000 Epoch: 35 Global Step: 741070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:09,426-Speed 6313.45 samples/sec Loss 2.7764 LearningRate 0.0000 Epoch: 35 Global Step: 741080 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:09:12,683-Speed 6288.22 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 741090 Fp16 Grad Scale: 8192 Required: 8 
hours Training: 2022-04-03 11:09:15,930-Speed 6309.45 samples/sec Loss 2.7059 LearningRate 0.0000 Epoch: 35 Global Step: 741100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:19,192-Speed 6279.33 samples/sec Loss 2.7372 LearningRate 0.0000 Epoch: 35 Global Step: 741110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:22,447-Speed 6293.18 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 741120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:25,717-Speed 6264.00 samples/sec Loss 2.6787 LearningRate 0.0000 Epoch: 35 Global Step: 741130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:28,975-Speed 6288.05 samples/sec Loss 2.7384 LearningRate 0.0000 Epoch: 35 Global Step: 741140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:32,229-Speed 6296.20 samples/sec Loss 2.7252 LearningRate 0.0000 Epoch: 35 Global Step: 741150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:35,491-Speed 6279.66 samples/sec Loss 2.6695 LearningRate 0.0000 Epoch: 35 Global Step: 741160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:38,754-Speed 6276.40 samples/sec Loss 2.7442 LearningRate 0.0000 Epoch: 35 Global Step: 741170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:42,008-Speed 6295.12 samples/sec Loss 2.7041 LearningRate 0.0000 Epoch: 35 Global Step: 741180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:45,256-Speed 6307.13 samples/sec Loss 2.7189 LearningRate 0.0000 Epoch: 35 Global Step: 741190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:48,509-Speed 6297.55 samples/sec Loss 2.7156 LearningRate 0.0000 Epoch: 35 Global Step: 741200 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:09:51,752-Speed 6316.69 samples/sec Loss 2.7247 LearningRate 0.0000 Epoch: 35 Global Step: 741210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:55,003-Speed 6299.86 
samples/sec Loss 2.6889 LearningRate 0.0000 Epoch: 35 Global Step: 741220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:09:58,257-Speed 6296.81 samples/sec Loss 2.7511 LearningRate 0.0000 Epoch: 35 Global Step: 741230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:01,512-Speed 6293.08 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 35 Global Step: 741240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:04,768-Speed 6291.01 samples/sec Loss 2.7195 LearningRate 0.0000 Epoch: 35 Global Step: 741250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:08,022-Speed 6294.75 samples/sec Loss 2.6984 LearningRate 0.0000 Epoch: 35 Global Step: 741260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:11,271-Speed 6304.33 samples/sec Loss 2.7256 LearningRate 0.0000 Epoch: 35 Global Step: 741270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:14,527-Speed 6291.53 samples/sec Loss 2.7478 LearningRate 0.0000 Epoch: 35 Global Step: 741280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:17,782-Speed 6294.37 samples/sec Loss 2.7433 LearningRate 0.0000 Epoch: 35 Global Step: 741290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:21,040-Speed 6288.07 samples/sec Loss 2.6940 LearningRate 0.0000 Epoch: 35 Global Step: 741300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:24,284-Speed 6313.57 samples/sec Loss 2.7072 LearningRate 0.0000 Epoch: 35 Global Step: 741310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:27,547-Speed 6278.70 samples/sec Loss 2.6956 LearningRate 0.0000 Epoch: 35 Global Step: 741320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:30,803-Speed 6290.52 samples/sec Loss 2.6785 LearningRate 0.0000 Epoch: 35 Global Step: 741330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:34,057-Speed 6296.59 samples/sec Loss 2.7626 LearningRate 0.0000 Epoch: 35 
Global Step: 741340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:37,314-Speed 6288.59 samples/sec Loss 2.7121 LearningRate 0.0000 Epoch: 35 Global Step: 741350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:40,576-Speed 6279.09 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 741360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:43,839-Speed 6277.63 samples/sec Loss 2.7392 LearningRate 0.0000 Epoch: 35 Global Step: 741370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:47,101-Speed 6281.47 samples/sec Loss 2.7521 LearningRate 0.0000 Epoch: 35 Global Step: 741380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:50,360-Speed 6284.60 samples/sec Loss 2.8112 LearningRate 0.0000 Epoch: 35 Global Step: 741390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:53,616-Speed 6292.01 samples/sec Loss 2.7613 LearningRate 0.0000 Epoch: 35 Global Step: 741400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:10:56,860-Speed 6313.87 samples/sec Loss 2.7356 LearningRate 0.0000 Epoch: 35 Global Step: 741410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:00,123-Speed 6277.65 samples/sec Loss 2.7190 LearningRate 0.0000 Epoch: 35 Global Step: 741420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:03,377-Speed 6295.58 samples/sec Loss 2.6683 LearningRate 0.0000 Epoch: 35 Global Step: 741430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:06,637-Speed 6282.65 samples/sec Loss 2.7212 LearningRate 0.0000 Epoch: 35 Global Step: 741440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:09,891-Speed 6296.34 samples/sec Loss 2.7228 LearningRate 0.0000 Epoch: 35 Global Step: 741450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:13,153-Speed 6279.29 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 741460 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:11:16,401-Speed 6306.39 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 35 Global Step: 741470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:19,661-Speed 6284.64 samples/sec Loss 2.7366 LearningRate 0.0000 Epoch: 35 Global Step: 741480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:22,913-Speed 6298.42 samples/sec Loss 2.7411 LearningRate 0.0000 Epoch: 35 Global Step: 741490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:26,167-Speed 6294.71 samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 35 Global Step: 741500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:29,457-Speed 6226.65 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 35 Global Step: 741510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:32,722-Speed 6275.42 samples/sec Loss 2.7205 LearningRate 0.0000 Epoch: 35 Global Step: 741520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:35,980-Speed 6287.69 samples/sec Loss 2.7589 LearningRate 0.0000 Epoch: 35 Global Step: 741530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:39,232-Speed 6299.20 samples/sec Loss 2.7120 LearningRate 0.0000 Epoch: 35 Global Step: 741540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:42,481-Speed 6304.05 samples/sec Loss 2.7535 LearningRate 0.0000 Epoch: 35 Global Step: 741550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:45,739-Speed 6287.45 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 35 Global Step: 741560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:49,001-Speed 6279.65 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 35 Global Step: 741570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:52,271-Speed 6264.41 samples/sec Loss 2.7173 LearningRate 0.0000 Epoch: 35 Global Step: 741580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:55,530-Speed 6284.39 
samples/sec Loss 2.6900 LearningRate 0.0000 Epoch: 35 Global Step: 741590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:11:58,785-Speed 6294.24 samples/sec Loss 2.6142 LearningRate 0.0000 Epoch: 35 Global Step: 741600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:02,045-Speed 6283.24 samples/sec Loss 2.7257 LearningRate 0.0000 Epoch: 35 Global Step: 741610 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:12:05,288-Speed 6316.42 samples/sec Loss 2.6608 LearningRate 0.0000 Epoch: 35 Global Step: 741620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:08,544-Speed 6291.58 samples/sec Loss 2.7708 LearningRate 0.0000 Epoch: 35 Global Step: 741630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:11,796-Speed 6298.60 samples/sec Loss 2.6775 LearningRate 0.0000 Epoch: 35 Global Step: 741640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:15,046-Speed 6302.95 samples/sec Loss 2.7012 LearningRate 0.0000 Epoch: 35 Global Step: 741650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:18,305-Speed 6285.89 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 35 Global Step: 741660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:21,561-Speed 6291.06 samples/sec Loss 2.7278 LearningRate 0.0000 Epoch: 35 Global Step: 741670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:24,818-Speed 6290.01 samples/sec Loss 2.7102 LearningRate 0.0000 Epoch: 35 Global Step: 741680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:28,081-Speed 6277.63 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 35 Global Step: 741690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:31,340-Speed 6286.50 samples/sec Loss 2.7160 LearningRate 0.0000 Epoch: 35 Global Step: 741700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:34,598-Speed 6286.76 samples/sec Loss 2.7156 LearningRate 0.0000 Epoch: 35 
Global Step: 741710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:37,840-Speed 6318.17 samples/sec Loss 2.7380 LearningRate 0.0000 Epoch: 35 Global Step: 741720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:41,095-Speed 6293.64 samples/sec Loss 2.7605 LearningRate 0.0000 Epoch: 35 Global Step: 741730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:44,352-Speed 6290.75 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 35 Global Step: 741740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:47,606-Speed 6293.85 samples/sec Loss 2.7043 LearningRate 0.0000 Epoch: 35 Global Step: 741750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:50,866-Speed 6283.88 samples/sec Loss 2.7568 LearningRate 0.0000 Epoch: 35 Global Step: 741760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:54,120-Speed 6294.53 samples/sec Loss 2.7415 LearningRate 0.0000 Epoch: 35 Global Step: 741770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:12:57,380-Speed 6284.36 samples/sec Loss 2.7470 LearningRate 0.0000 Epoch: 35 Global Step: 741780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:00,633-Speed 6297.05 samples/sec Loss 2.6799 LearningRate 0.0000 Epoch: 35 Global Step: 741790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:03,884-Speed 6300.41 samples/sec Loss 2.7476 LearningRate 0.0000 Epoch: 35 Global Step: 741800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:07,139-Speed 6293.87 samples/sec Loss 2.7419 LearningRate 0.0000 Epoch: 35 Global Step: 741810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:10,379-Speed 6322.13 samples/sec Loss 2.6678 LearningRate 0.0000 Epoch: 35 Global Step: 741820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:13,638-Speed 6285.23 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 35 Global Step: 741830 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:13:16,897-Speed 6285.93 samples/sec Loss 2.7083 LearningRate 0.0000 Epoch: 35 Global Step: 741840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:20,164-Speed 6270.04 samples/sec Loss 2.7102 LearningRate 0.0000 Epoch: 35 Global Step: 741850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:23,433-Speed 6267.01 samples/sec Loss 2.6809 LearningRate 0.0000 Epoch: 35 Global Step: 741860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:26,692-Speed 6285.73 samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 35 Global Step: 741870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:29,958-Speed 6272.08 samples/sec Loss 2.7856 LearningRate 0.0000 Epoch: 35 Global Step: 741880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:33,220-Speed 6278.07 samples/sec Loss 2.7571 LearningRate 0.0000 Epoch: 35 Global Step: 741890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:36,480-Speed 6284.46 samples/sec Loss 2.7407 LearningRate 0.0000 Epoch: 35 Global Step: 741900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:39,741-Speed 6282.27 samples/sec Loss 2.7128 LearningRate 0.0000 Epoch: 35 Global Step: 741910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:43,018-Speed 6250.59 samples/sec Loss 2.7772 LearningRate 0.0000 Epoch: 35 Global Step: 741920 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:13:46,271-Speed 6297.26 samples/sec Loss 2.6562 LearningRate 0.0000 Epoch: 35 Global Step: 741930 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:13:49,510-Speed 6324.75 samples/sec Loss 2.6691 LearningRate 0.0000 Epoch: 35 Global Step: 741940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:52,764-Speed 6295.54 samples/sec Loss 2.7662 LearningRate 0.0000 Epoch: 35 Global Step: 741950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:56,016-Speed 6299.53 
samples/sec Loss 2.6519 LearningRate 0.0000 Epoch: 35 Global Step: 741960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:13:59,272-Speed 6290.24 samples/sec Loss 2.7559 LearningRate 0.0000 Epoch: 35 Global Step: 741970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:02,535-Speed 6278.03 samples/sec Loss 2.7759 LearningRate 0.0000 Epoch: 35 Global Step: 741980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:05,792-Speed 6289.74 samples/sec Loss 2.7347 LearningRate 0.0000 Epoch: 35 Global Step: 741990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:09,048-Speed 6291.56 samples/sec Loss 2.7197 LearningRate 0.0000 Epoch: 35 Global Step: 742000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:12,305-Speed 6289.57 samples/sec Loss 2.7189 LearningRate 0.0000 Epoch: 35 Global Step: 742010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:15,557-Speed 6298.25 samples/sec Loss 2.7644 LearningRate 0.0000 Epoch: 35 Global Step: 742020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:18,810-Speed 6297.58 samples/sec Loss 2.7837 LearningRate 0.0000 Epoch: 35 Global Step: 742030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:22,062-Speed 6299.42 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 742040 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:14:25,307-Speed 6312.82 samples/sec Loss 2.7486 LearningRate 0.0000 Epoch: 35 Global Step: 742050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:28,568-Speed 6281.13 samples/sec Loss 2.7101 LearningRate 0.0000 Epoch: 35 Global Step: 742060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:31,823-Speed 6293.81 samples/sec Loss 2.7572 LearningRate 0.0000 Epoch: 35 Global Step: 742070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:35,080-Speed 6288.43 samples/sec Loss 2.7643 LearningRate 0.0000 Epoch: 35 
Global Step: 742080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:38,333-Speed 6297.17 samples/sec Loss 2.6783 LearningRate 0.0000 Epoch: 35 Global Step: 742090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:41,593-Speed 6283.65 samples/sec Loss 2.7443 LearningRate 0.0000 Epoch: 35 Global Step: 742100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:44,852-Speed 6286.48 samples/sec Loss 2.7437 LearningRate 0.0000 Epoch: 35 Global Step: 742110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:48,103-Speed 6300.44 samples/sec Loss 2.7030 LearningRate 0.0000 Epoch: 35 Global Step: 742120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:51,354-Speed 6302.23 samples/sec Loss 2.6878 LearningRate 0.0000 Epoch: 35 Global Step: 742130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:54,608-Speed 6294.19 samples/sec Loss 2.7340 LearningRate 0.0000 Epoch: 35 Global Step: 742140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:14:57,852-Speed 6315.86 samples/sec Loss 2.7473 LearningRate 0.0000 Epoch: 35 Global Step: 742150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:01,112-Speed 6282.20 samples/sec Loss 2.7463 LearningRate 0.0000 Epoch: 35 Global Step: 742160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:04,430-Speed 6175.06 samples/sec Loss 2.7577 LearningRate 0.0000 Epoch: 35 Global Step: 742170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:07,685-Speed 6293.72 samples/sec Loss 2.7009 LearningRate 0.0000 Epoch: 35 Global Step: 742180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:10,938-Speed 6296.24 samples/sec Loss 2.7018 LearningRate 0.0000 Epoch: 35 Global Step: 742190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:14,207-Speed 6266.03 samples/sec Loss 2.7431 LearningRate 0.0000 Epoch: 35 Global Step: 742200 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:15:17,461-Speed 6295.11 samples/sec Loss 2.6628 LearningRate 0.0000 Epoch: 35 Global Step: 742210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:20,718-Speed 6290.05 samples/sec Loss 2.7389 LearningRate 0.0000 Epoch: 35 Global Step: 742220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:23,992-Speed 6255.76 samples/sec Loss 2.7032 LearningRate 0.0000 Epoch: 35 Global Step: 742230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:27,254-Speed 6280.06 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 742240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:30,499-Speed 6312.29 samples/sec Loss 2.7467 LearningRate 0.0000 Epoch: 35 Global Step: 742250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:33,760-Speed 6283.07 samples/sec Loss 2.7452 LearningRate 0.0000 Epoch: 35 Global Step: 742260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:37,024-Speed 6274.89 samples/sec Loss 2.6949 LearningRate 0.0000 Epoch: 35 Global Step: 742270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:40,287-Speed 6278.64 samples/sec Loss 2.7028 LearningRate 0.0000 Epoch: 35 Global Step: 742280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:43,546-Speed 6284.49 samples/sec Loss 2.7241 LearningRate 0.0000 Epoch: 35 Global Step: 742290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:46,813-Speed 6270.34 samples/sec Loss 2.6796 LearningRate 0.0000 Epoch: 35 Global Step: 742300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:50,072-Speed 6285.51 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 35 Global Step: 742310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:53,334-Speed 6279.06 samples/sec Loss 2.7243 LearningRate 0.0000 Epoch: 35 Global Step: 742320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:56,584-Speed 6304.38 
samples/sec Loss 2.6776 LearningRate 0.0000 Epoch: 35 Global Step: 742330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:15:59,840-Speed 6292.16 samples/sec Loss 2.7482 LearningRate 0.0000 Epoch: 35 Global Step: 742340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:03,101-Speed 6280.44 samples/sec Loss 2.7668 LearningRate 0.0000 Epoch: 35 Global Step: 742350 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:16:06,340-Speed 6324.81 samples/sec Loss 2.6689 LearningRate 0.0000 Epoch: 35 Global Step: 742360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:09,589-Speed 6304.56 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 742370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:12,844-Speed 6293.89 samples/sec Loss 2.7618 LearningRate 0.0000 Epoch: 35 Global Step: 742380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:16,165-Speed 6169.98 samples/sec Loss 2.7407 LearningRate 0.0000 Epoch: 35 Global Step: 742390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:19,418-Speed 6296.26 samples/sec Loss 2.7310 LearningRate 0.0000 Epoch: 35 Global Step: 742400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:22,675-Speed 6289.14 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 35 Global Step: 742410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:25,931-Speed 6292.13 samples/sec Loss 2.7385 LearningRate 0.0000 Epoch: 35 Global Step: 742420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:29,181-Speed 6302.35 samples/sec Loss 2.7214 LearningRate 0.0000 Epoch: 35 Global Step: 742430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:32,429-Speed 6306.91 samples/sec Loss 2.6748 LearningRate 0.0000 Epoch: 35 Global Step: 742440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:35,686-Speed 6289.47 samples/sec Loss 2.7673 LearningRate 0.0000 Epoch: 35 
Global Step: 742450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:38,931-Speed 6313.84 samples/sec Loss 2.6726 LearningRate 0.0000 Epoch: 35 Global Step: 742460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:42,191-Speed 6282.68 samples/sec Loss 2.6924 LearningRate 0.0000 Epoch: 35 Global Step: 742470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:45,453-Speed 6278.99 samples/sec Loss 2.6593 LearningRate 0.0000 Epoch: 35 Global Step: 742480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:48,715-Speed 6281.06 samples/sec Loss 2.7598 LearningRate 0.0000 Epoch: 35 Global Step: 742490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:51,970-Speed 6292.80 samples/sec Loss 2.7287 LearningRate 0.0000 Epoch: 35 Global Step: 742500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:55,226-Speed 6290.55 samples/sec Loss 2.7365 LearningRate 0.0000 Epoch: 35 Global Step: 742510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:16:58,486-Speed 6283.30 samples/sec Loss 2.7139 LearningRate 0.0000 Epoch: 35 Global Step: 742520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:01,744-Speed 6288.43 samples/sec Loss 2.7368 LearningRate 0.0000 Epoch: 35 Global Step: 742530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:04,999-Speed 6293.16 samples/sec Loss 2.7752 LearningRate 0.0000 Epoch: 35 Global Step: 742540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:08,253-Speed 6295.68 samples/sec Loss 2.6740 LearningRate 0.0000 Epoch: 35 Global Step: 742550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:11,514-Speed 6282.05 samples/sec Loss 2.7669 LearningRate 0.0000 Epoch: 35 Global Step: 742560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:14,771-Speed 6290.27 samples/sec Loss 2.6832 LearningRate 0.0000 Epoch: 35 Global Step: 742570 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:17:18,026-Speed 6293.31 samples/sec Loss 2.6936 LearningRate 0.0000 Epoch: 35 Global Step: 742580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:21,280-Speed 6295.20 samples/sec Loss 2.7275 LearningRate 0.0000 Epoch: 35 Global Step: 742590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:24,535-Speed 6292.41 samples/sec Loss 2.6593 LearningRate 0.0000 Epoch: 35 Global Step: 742600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:27,796-Speed 6280.62 samples/sec Loss 2.6881 LearningRate 0.0000 Epoch: 35 Global Step: 742610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:31,048-Speed 6300.03 samples/sec Loss 2.7532 LearningRate 0.0000 Epoch: 35 Global Step: 742620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:34,299-Speed 6300.61 samples/sec Loss 2.7241 LearningRate 0.0000 Epoch: 35 Global Step: 742630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:37,552-Speed 6298.28 samples/sec Loss 2.6733 LearningRate 0.0000 Epoch: 35 Global Step: 742640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:40,816-Speed 6274.41 samples/sec Loss 2.6974 LearningRate 0.0000 Epoch: 35 Global Step: 742650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:44,072-Speed 6294.00 samples/sec Loss 2.7219 LearningRate 0.0000 Epoch: 35 Global Step: 742660 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:17:47,312-Speed 6321.91 samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 35 Global Step: 742670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:50,570-Speed 6287.97 samples/sec Loss 2.7338 LearningRate 0.0000 Epoch: 35 Global Step: 742680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:53,823-Speed 6296.37 samples/sec Loss 2.6934 LearningRate 0.0000 Epoch: 35 Global Step: 742690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:17:57,086-Speed 6279.70 
samples/sec Loss 2.7296 LearningRate 0.0000 Epoch: 35 Global Step: 742700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:00,342-Speed 6289.33 samples/sec Loss 2.7204 LearningRate 0.0000 Epoch: 35 Global Step: 742710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:03,597-Speed 6294.73 samples/sec Loss 2.6781 LearningRate 0.0000 Epoch: 35 Global Step: 742720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:06,879-Speed 6240.78 samples/sec Loss 2.7509 LearningRate 0.0000 Epoch: 35 Global Step: 742730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:10,136-Speed 6289.22 samples/sec Loss 2.7577 LearningRate 0.0000 Epoch: 35 Global Step: 742740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:13,395-Speed 6285.96 samples/sec Loss 2.7180 LearningRate 0.0000 Epoch: 35 Global Step: 742750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:16,766-Speed 6077.25 samples/sec Loss 2.7122 LearningRate 0.0000 Epoch: 35 Global Step: 742760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:20,050-Speed 6238.47 samples/sec Loss 2.7518 LearningRate 0.0000 Epoch: 35 Global Step: 742770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:23,311-Speed 6280.58 samples/sec Loss 2.7304 LearningRate 0.0000 Epoch: 35 Global Step: 742780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:26,573-Speed 6280.86 samples/sec Loss 2.7646 LearningRate 0.0000 Epoch: 35 Global Step: 742790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:29,930-Speed 6102.04 samples/sec Loss 2.7639 LearningRate 0.0000 Epoch: 35 Global Step: 742800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:33,188-Speed 6285.88 samples/sec Loss 2.6894 LearningRate 0.0000 Epoch: 35 Global Step: 742810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:36,446-Speed 6287.97 samples/sec Loss 2.6834 LearningRate 0.0000 Epoch: 35 
Global Step: 742820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:39,693-Speed 6308.44 samples/sec Loss 2.7524 LearningRate 0.0000 Epoch: 35 Global Step: 742830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:42,942-Speed 6305.52 samples/sec Loss 2.7121 LearningRate 0.0000 Epoch: 35 Global Step: 742840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:46,204-Speed 6279.40 samples/sec Loss 2.7282 LearningRate 0.0000 Epoch: 35 Global Step: 742850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:49,460-Speed 6292.31 samples/sec Loss 2.7285 LearningRate 0.0000 Epoch: 35 Global Step: 742860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:52,698-Speed 6325.67 samples/sec Loss 2.6932 LearningRate 0.0000 Epoch: 35 Global Step: 742870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:55,951-Speed 6297.63 samples/sec Loss 2.7270 LearningRate 0.0000 Epoch: 35 Global Step: 742880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:18:59,204-Speed 6297.03 samples/sec Loss 2.7069 LearningRate 0.0000 Epoch: 35 Global Step: 742890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:02,456-Speed 6298.44 samples/sec Loss 2.6797 LearningRate 0.0000 Epoch: 35 Global Step: 742900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:05,710-Speed 6294.70 samples/sec Loss 2.7020 LearningRate 0.0000 Epoch: 35 Global Step: 742910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:08,973-Speed 6278.94 samples/sec Loss 2.7025 LearningRate 0.0000 Epoch: 35 Global Step: 742920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:12,219-Speed 6310.75 samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 35 Global Step: 742930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:15,472-Speed 6296.80 samples/sec Loss 2.6916 LearningRate 0.0000 Epoch: 35 Global Step: 742940 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:19:18,735-Speed 6278.27 samples/sec Loss 2.7346 LearningRate 0.0000 Epoch: 35 Global Step: 742950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:21,987-Speed 6298.97 samples/sec Loss 2.7551 LearningRate 0.0000 Epoch: 35 Global Step: 742960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:25,241-Speed 6296.99 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 35 Global Step: 742970 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:19:28,482-Speed 6321.51 samples/sec Loss 2.7127 LearningRate 0.0000 Epoch: 35 Global Step: 742980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:31,737-Speed 6292.60 samples/sec Loss 2.7154 LearningRate 0.0000 Epoch: 35 Global Step: 742990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:34,993-Speed 6290.49 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 35 Global Step: 743000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:38,246-Speed 6297.75 samples/sec Loss 2.6843 LearningRate 0.0000 Epoch: 35 Global Step: 743010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:41,504-Speed 6287.00 samples/sec Loss 2.7161 LearningRate 0.0000 Epoch: 35 Global Step: 743020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:44,763-Speed 6285.46 samples/sec Loss 2.7000 LearningRate 0.0000 Epoch: 35 Global Step: 743030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:48,029-Speed 6272.71 samples/sec Loss 2.7569 LearningRate 0.0000 Epoch: 35 Global Step: 743040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:51,285-Speed 6291.45 samples/sec Loss 2.7588 LearningRate 0.0000 Epoch: 35 Global Step: 743050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:54,539-Speed 6294.19 samples/sec Loss 2.7247 LearningRate 0.0000 Epoch: 35 Global Step: 743060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:19:57,796-Speed 6290.76 
samples/sec Loss 2.7546 LearningRate 0.0000 Epoch: 35 Global Step: 743070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:01,052-Speed 6291.91 samples/sec Loss 2.7182 LearningRate 0.0000 Epoch: 35 Global Step: 743080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:04,314-Speed 6278.53 samples/sec Loss 2.7454 LearningRate 0.0000 Epoch: 35 Global Step: 743090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:07,579-Speed 6274.68 samples/sec Loss 2.7539 LearningRate 0.0000 Epoch: 35 Global Step: 743100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:10,836-Speed 6289.22 samples/sec Loss 2.6695 LearningRate 0.0000 Epoch: 35 Global Step: 743110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:14,093-Speed 6288.41 samples/sec Loss 2.7122 LearningRate 0.0000 Epoch: 35 Global Step: 743120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:17,355-Speed 6280.73 samples/sec Loss 2.7617 LearningRate 0.0000 Epoch: 35 Global Step: 743130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:20,618-Speed 6277.87 samples/sec Loss 2.7444 LearningRate 0.0000 Epoch: 35 Global Step: 743140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:23,873-Speed 6292.37 samples/sec Loss 2.7728 LearningRate 0.0000 Epoch: 35 Global Step: 743150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:27,127-Speed 6295.73 samples/sec Loss 2.6969 LearningRate 0.0000 Epoch: 35 Global Step: 743160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:30,386-Speed 6286.02 samples/sec Loss 2.7480 LearningRate 0.0000 Epoch: 35 Global Step: 743170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:33,641-Speed 6293.31 samples/sec Loss 2.7465 LearningRate 0.0000 Epoch: 35 Global Step: 743180 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:20:36,884-Speed 6316.64 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 35 
Global Step: 743190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:40,149-Speed 6275.14 samples/sec Loss 2.7483 LearningRate 0.0000 Epoch: 35 Global Step: 743200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:43,405-Speed 6290.25 samples/sec Loss 2.6423 LearningRate 0.0000 Epoch: 35 Global Step: 743210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:46,664-Speed 6286.44 samples/sec Loss 2.7057 LearningRate 0.0000 Epoch: 35 Global Step: 743220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:49,910-Speed 6310.81 samples/sec Loss 2.7067 LearningRate 0.0000 Epoch: 35 Global Step: 743230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:53,183-Speed 6258.93 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 35 Global Step: 743240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:56,442-Speed 6286.08 samples/sec Loss 2.7062 LearningRate 0.0000 Epoch: 35 Global Step: 743250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:20:59,686-Speed 6312.80 samples/sec Loss 2.7323 LearningRate 0.0000 Epoch: 35 Global Step: 743260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:02,943-Speed 6289.68 samples/sec Loss 2.6844 LearningRate 0.0000 Epoch: 35 Global Step: 743270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:06,207-Speed 6275.45 samples/sec Loss 2.7398 LearningRate 0.0000 Epoch: 35 Global Step: 743280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:09,450-Speed 6317.39 samples/sec Loss 2.6891 LearningRate 0.0000 Epoch: 35 Global Step: 743290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:12,704-Speed 6295.81 samples/sec Loss 2.7230 LearningRate 0.0000 Epoch: 35 Global Step: 743300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:15,973-Speed 6265.70 samples/sec Loss 2.7070 LearningRate 0.0000 Epoch: 35 Global Step: 743310 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:21:19,235-Speed 6279.24 samples/sec Loss 2.7321 LearningRate 0.0000 Epoch: 35 Global Step: 743320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:22,497-Speed 6279.76 samples/sec Loss 2.6973 LearningRate 0.0000 Epoch: 35 Global Step: 743330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:25,758-Speed 6282.66 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 743340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:29,014-Speed 6291.12 samples/sec Loss 2.6701 LearningRate 0.0000 Epoch: 35 Global Step: 743350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:32,267-Speed 6297.39 samples/sec Loss 2.6708 LearningRate 0.0000 Epoch: 35 Global Step: 743360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:35,524-Speed 6287.37 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 35 Global Step: 743370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:38,779-Speed 6294.35 samples/sec Loss 2.7203 LearningRate 0.0000 Epoch: 35 Global Step: 743380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:42,040-Speed 6282.57 samples/sec Loss 2.7212 LearningRate 0.0000 Epoch: 35 Global Step: 743390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:21:45,281-Speed 6320.90 samples/sec Loss 2.7154 LearningRate 0.0000 Epoch: 35 Global Step: 743400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:48,544-Speed 6277.25 samples/sec Loss 2.7357 LearningRate 0.0000 Epoch: 35 Global Step: 743410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:51,794-Speed 6303.06 samples/sec Loss 2.7064 LearningRate 0.0000 Epoch: 35 Global Step: 743420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:55,053-Speed 6286.50 samples/sec Loss 2.6743 LearningRate 0.0000 Epoch: 35 Global Step: 743430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:21:58,305-Speed 6297.61 
samples/sec Loss 2.7129 LearningRate 0.0000 Epoch: 35 Global Step: 743440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:01,558-Speed 6297.97 samples/sec Loss 2.7098 LearningRate 0.0000 Epoch: 35 Global Step: 743450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:04,821-Speed 6277.48 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 35 Global Step: 743460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:08,074-Speed 6297.58 samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 35 Global Step: 743470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:11,336-Speed 6280.11 samples/sec Loss 2.7038 LearningRate 0.0000 Epoch: 35 Global Step: 743480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:14,596-Speed 6283.73 samples/sec Loss 2.6625 LearningRate 0.0000 Epoch: 35 Global Step: 743490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:17,840-Speed 6314.41 samples/sec Loss 2.6856 LearningRate 0.0000 Epoch: 35 Global Step: 743500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:21,094-Speed 6293.31 samples/sec Loss 2.6936 LearningRate 0.0000 Epoch: 35 Global Step: 743510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:24,352-Speed 6289.05 samples/sec Loss 2.7044 LearningRate 0.0000 Epoch: 35 Global Step: 743520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:27,608-Speed 6290.69 samples/sec Loss 2.7533 LearningRate 0.0000 Epoch: 35 Global Step: 743530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:30,866-Speed 6287.32 samples/sec Loss 2.7434 LearningRate 0.0000 Epoch: 35 Global Step: 743540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:34,122-Speed 6290.62 samples/sec Loss 2.6688 LearningRate 0.0000 Epoch: 35 Global Step: 743550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:37,379-Speed 6291.02 samples/sec Loss 2.7280 LearningRate 0.0000 Epoch: 35 
Global Step: 743560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:40,634-Speed 6291.19 samples/sec Loss 2.6978 LearningRate 0.0000 Epoch: 35 Global Step: 743570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:43,895-Speed 6282.04 samples/sec Loss 2.6948 LearningRate 0.0000 Epoch: 35 Global Step: 743580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:47,155-Speed 6284.90 samples/sec Loss 2.6957 LearningRate 0.0000 Epoch: 35 Global Step: 743590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:50,414-Speed 6285.85 samples/sec Loss 2.7155 LearningRate 0.0000 Epoch: 35 Global Step: 743600 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:22:53,652-Speed 6326.39 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 35 Global Step: 743610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:22:56,929-Speed 6249.77 samples/sec Loss 2.6904 LearningRate 0.0000 Epoch: 35 Global Step: 743620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:00,294-Speed 6089.04 samples/sec Loss 2.7232 LearningRate 0.0000 Epoch: 35 Global Step: 743630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:03,550-Speed 6291.81 samples/sec Loss 2.7279 LearningRate 0.0000 Epoch: 35 Global Step: 743640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:06,806-Speed 6290.53 samples/sec Loss 2.7057 LearningRate 0.0000 Epoch: 35 Global Step: 743650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:10,068-Speed 6278.98 samples/sec Loss 2.7317 LearningRate 0.0000 Epoch: 35 Global Step: 743660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:13,335-Speed 6269.93 samples/sec Loss 2.6921 LearningRate 0.0000 Epoch: 35 Global Step: 743670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:16,592-Speed 6290.78 samples/sec Loss 2.7002 LearningRate 0.0000 Epoch: 35 Global Step: 743680 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:23:19,844-Speed 6297.70 samples/sec Loss 2.6883 LearningRate 0.0000 Epoch: 35 Global Step: 743690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:23,161-Speed 6176.99 samples/sec Loss 2.7089 LearningRate 0.0000 Epoch: 35 Global Step: 743700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:26,504-Speed 6127.45 samples/sec Loss 2.7682 LearningRate 0.0000 Epoch: 35 Global Step: 743710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:29,768-Speed 6275.30 samples/sec Loss 2.7479 LearningRate 0.0000 Epoch: 35 Global Step: 743720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:33,030-Speed 6279.97 samples/sec Loss 2.7023 LearningRate 0.0000 Epoch: 35 Global Step: 743730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:36,279-Speed 6304.88 samples/sec Loss 2.7003 LearningRate 0.0000 Epoch: 35 Global Step: 743740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:39,539-Speed 6282.78 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 35 Global Step: 743750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:42,792-Speed 6298.40 samples/sec Loss 2.6954 LearningRate 0.0000 Epoch: 35 Global Step: 743760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:46,055-Speed 6276.32 samples/sec Loss 2.6814 LearningRate 0.0000 Epoch: 35 Global Step: 743770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:49,322-Speed 6271.07 samples/sec Loss 2.7873 LearningRate 0.0000 Epoch: 35 Global Step: 743780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:52,576-Speed 6295.81 samples/sec Loss 2.7662 LearningRate 0.0000 Epoch: 35 Global Step: 743790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:55,828-Speed 6298.47 samples/sec Loss 2.7136 LearningRate 0.0000 Epoch: 35 Global Step: 743800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:23:59,069-Speed 6320.91 
samples/sec Loss 2.7579 LearningRate 0.0000 Epoch: 35 Global Step: 743810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:02,327-Speed 6287.13 samples/sec Loss 2.6569 LearningRate 0.0000 Epoch: 35 Global Step: 743820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:05,576-Speed 6305.37 samples/sec Loss 2.6743 LearningRate 0.0000 Epoch: 35 Global Step: 743830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:08,830-Speed 6295.96 samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 35 Global Step: 743840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:12,089-Speed 6285.32 samples/sec Loss 2.6961 LearningRate 0.0000 Epoch: 35 Global Step: 743850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:15,344-Speed 6293.44 samples/sec Loss 2.7550 LearningRate 0.0000 Epoch: 35 Global Step: 743860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:18,592-Speed 6306.92 samples/sec Loss 2.7403 LearningRate 0.0000 Epoch: 35 Global Step: 743870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:21,840-Speed 6306.71 samples/sec Loss 2.6974 LearningRate 0.0000 Epoch: 35 Global Step: 743880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:25,112-Speed 6259.43 samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 35 Global Step: 743890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:28,364-Speed 6299.55 samples/sec Loss 2.7716 LearningRate 0.0000 Epoch: 35 Global Step: 743900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:31,619-Speed 6293.55 samples/sec Loss 2.7435 LearningRate 0.0000 Epoch: 35 Global Step: 743910 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:24:34,881-Speed 6279.27 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 35 Global Step: 743920 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:24:38,122-Speed 6321.51 samples/sec Loss 2.6807 LearningRate 0.0000 Epoch: 35 
Global Step: 743930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:41,374-Speed 6298.42 samples/sec Loss 2.7071 LearningRate 0.0000 Epoch: 35 Global Step: 743940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:44,633-Speed 6285.43 samples/sec Loss 2.7450 LearningRate 0.0000 Epoch: 35 Global Step: 743950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:47,892-Speed 6285.99 samples/sec Loss 2.6790 LearningRate 0.0000 Epoch: 35 Global Step: 743960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:51,151-Speed 6284.39 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 35 Global Step: 743970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:54,409-Speed 6287.91 samples/sec Loss 2.7082 LearningRate 0.0000 Epoch: 35 Global Step: 743980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:24:57,671-Speed 6279.65 samples/sec Loss 2.7166 LearningRate 0.0000 Epoch: 35 Global Step: 743990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:00,931-Speed 6282.96 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 35 Global Step: 744000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:04,182-Speed 6302.10 samples/sec Loss 2.7036 LearningRate 0.0000 Epoch: 35 Global Step: 744010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:07,435-Speed 6297.49 samples/sec Loss 2.6851 LearningRate 0.0000 Epoch: 35 Global Step: 744020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:10,679-Speed 6313.58 samples/sec Loss 2.6866 LearningRate 0.0000 Epoch: 35 Global Step: 744030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:13,934-Speed 6294.38 samples/sec Loss 2.7377 LearningRate 0.0000 Epoch: 35 Global Step: 744040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:17,191-Speed 6288.37 samples/sec Loss 2.7403 LearningRate 0.0000 Epoch: 35 Global Step: 744050 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:25:20,444-Speed 6297.67 samples/sec Loss 2.7525 LearningRate 0.0000 Epoch: 35 Global Step: 744060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:23,709-Speed 6274.99 samples/sec Loss 2.6563 LearningRate 0.0000 Epoch: 35 Global Step: 744070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:26,964-Speed 6292.58 samples/sec Loss 2.6524 LearningRate 0.0000 Epoch: 35 Global Step: 744080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:30,233-Speed 6267.24 samples/sec Loss 2.7054 LearningRate 0.0000 Epoch: 35 Global Step: 744090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:33,491-Speed 6286.67 samples/sec Loss 2.7179 LearningRate 0.0000 Epoch: 35 Global Step: 744100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:36,752-Speed 6282.56 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 35 Global Step: 744110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:40,010-Speed 6286.98 samples/sec Loss 2.6749 LearningRate 0.0000 Epoch: 35 Global Step: 744120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:43,267-Speed 6288.58 samples/sec Loss 2.7091 LearningRate 0.0000 Epoch: 35 Global Step: 744130 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:25:46,503-Speed 6330.88 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 35 Global Step: 744140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:49,763-Speed 6284.03 samples/sec Loss 2.6848 LearningRate 0.0000 Epoch: 35 Global Step: 744150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:53,016-Speed 6295.92 samples/sec Loss 2.6908 LearningRate 0.0000 Epoch: 35 Global Step: 744160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:56,272-Speed 6291.17 samples/sec Loss 2.6853 LearningRate 0.0000 Epoch: 35 Global Step: 744170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:25:59,522-Speed 6303.26 
samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 35 Global Step: 744180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:02,767-Speed 6312.42 samples/sec Loss 2.7150 LearningRate 0.0000 Epoch: 35 Global Step: 744190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:06,024-Speed 6289.12 samples/sec Loss 2.7463 LearningRate 0.0000 Epoch: 35 Global Step: 744200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:09,278-Speed 6296.82 samples/sec Loss 2.7288 LearningRate 0.0000 Epoch: 35 Global Step: 744210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:12,532-Speed 6293.24 samples/sec Loss 2.7325 LearningRate 0.0000 Epoch: 35 Global Step: 744220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:15,792-Speed 6283.44 samples/sec Loss 2.6722 LearningRate 0.0000 Epoch: 35 Global Step: 744230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:19,032-Speed 6323.17 samples/sec Loss 2.6985 LearningRate 0.0000 Epoch: 35 Global Step: 744240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:22,292-Speed 6284.71 samples/sec Loss 2.7440 LearningRate 0.0000 Epoch: 35 Global Step: 744250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:25,548-Speed 6290.13 samples/sec Loss 2.7156 LearningRate 0.0000 Epoch: 35 Global Step: 744260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:28,801-Speed 6297.57 samples/sec Loss 2.7284 LearningRate 0.0000 Epoch: 35 Global Step: 744270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:32,061-Speed 6285.23 samples/sec Loss 2.7073 LearningRate 0.0000 Epoch: 35 Global Step: 744280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:35,320-Speed 6284.18 samples/sec Loss 2.6940 LearningRate 0.0000 Epoch: 35 Global Step: 744290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:38,571-Speed 6302.03 samples/sec Loss 2.7333 LearningRate 0.0000 Epoch: 35 
Global Step: 744300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:41,832-Speed 6280.32 samples/sec Loss 2.7403 LearningRate 0.0000 Epoch: 35 Global Step: 744310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:45,082-Speed 6302.99 samples/sec Loss 2.6871 LearningRate 0.0000 Epoch: 35 Global Step: 744320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:48,334-Speed 6299.95 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 35 Global Step: 744330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:51,568-Speed 6334.21 samples/sec Loss 2.7491 LearningRate 0.0000 Epoch: 35 Global Step: 744340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:54,817-Speed 6303.78 samples/sec Loss 2.7091 LearningRate 0.0000 Epoch: 35 Global Step: 744350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:26:58,080-Speed 6277.90 samples/sec Loss 2.6757 LearningRate 0.0000 Epoch: 35 Global Step: 744360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:01,336-Speed 6292.12 samples/sec Loss 2.6787 LearningRate 0.0000 Epoch: 35 Global Step: 744370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:04,597-Speed 6281.41 samples/sec Loss 2.7003 LearningRate 0.0000 Epoch: 35 Global Step: 744380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:07,848-Speed 6301.99 samples/sec Loss 2.7679 LearningRate 0.0000 Epoch: 35 Global Step: 744390 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:11,093-Speed 6312.27 samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 35 Global Step: 744400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:14,350-Speed 6288.25 samples/sec Loss 2.7492 LearningRate 0.0000 Epoch: 35 Global Step: 744410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:17,602-Speed 6298.47 samples/sec Loss 2.6718 LearningRate 0.0000 Epoch: 35 Global Step: 744420 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:27:20,858-Speed 6292.85 samples/sec Loss 2.7145 LearningRate 0.0000 Epoch: 35 Global Step: 744430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:24,116-Speed 6287.66 samples/sec Loss 2.7028 LearningRate 0.0000 Epoch: 35 Global Step: 744440 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:27:27,368-Speed 6298.77 samples/sec Loss 2.6995 LearningRate 0.0000 Epoch: 35 Global Step: 744450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:30,631-Speed 6281.62 samples/sec Loss 2.7046 LearningRate 0.0000 Epoch: 35 Global Step: 744460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:33,889-Speed 6287.89 samples/sec Loss 2.7526 LearningRate 0.0000 Epoch: 35 Global Step: 744470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:37,149-Speed 6283.91 samples/sec Loss 2.7825 LearningRate 0.0000 Epoch: 35 Global Step: 744480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:40,402-Speed 6297.23 samples/sec Loss 2.6972 LearningRate 0.0000 Epoch: 35 Global Step: 744490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:43,657-Speed 6292.25 samples/sec Loss 2.6659 LearningRate 0.0000 Epoch: 35 Global Step: 744500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:46,916-Speed 6287.10 samples/sec Loss 2.7852 LearningRate 0.0000 Epoch: 35 Global Step: 744510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:50,169-Speed 6295.71 samples/sec Loss 2.7428 LearningRate 0.0000 Epoch: 35 Global Step: 744520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:53,427-Speed 6289.08 samples/sec Loss 2.7510 LearningRate 0.0000 Epoch: 35 Global Step: 744530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:56,682-Speed 6292.44 samples/sec Loss 2.7193 LearningRate 0.0000 Epoch: 35 Global Step: 744540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:27:59,927-Speed 6313.19 
samples/sec Loss 2.7266 LearningRate 0.0000 Epoch: 35 Global Step: 744550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:03,183-Speed 6289.98 samples/sec Loss 2.6831 LearningRate 0.0000 Epoch: 35 Global Step: 744560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:06,434-Speed 6301.62 samples/sec Loss 2.7341 LearningRate 0.0000 Epoch: 35 Global Step: 744570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:09,688-Speed 6294.99 samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 35 Global Step: 744580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:12,951-Speed 6278.38 samples/sec Loss 2.7387 LearningRate 0.0000 Epoch: 35 Global Step: 744590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:16,212-Speed 6280.39 samples/sec Loss 2.7364 LearningRate 0.0000 Epoch: 35 Global Step: 744600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:19,467-Speed 6294.19 samples/sec Loss 2.7323 LearningRate 0.0000 Epoch: 35 Global Step: 744610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:22,734-Speed 6269.79 samples/sec Loss 2.6828 LearningRate 0.0000 Epoch: 35 Global Step: 744620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:25,993-Speed 6285.12 samples/sec Loss 2.6960 LearningRate 0.0000 Epoch: 35 Global Step: 744630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:29,246-Speed 6298.37 samples/sec Loss 2.7674 LearningRate 0.0000 Epoch: 35 Global Step: 744640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:32,509-Speed 6276.71 samples/sec Loss 2.7553 LearningRate 0.0000 Epoch: 35 Global Step: 744650 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:28:35,756-Speed 6307.76 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 35 Global Step: 744660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:39,026-Speed 6265.15 samples/sec Loss 2.7006 LearningRate 0.0000 Epoch: 35 
Global Step: 744670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:42,286-Speed 6284.07 samples/sec Loss 2.6741 LearningRate 0.0000 Epoch: 35 Global Step: 744680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:45,536-Speed 6303.88 samples/sec Loss 2.6699 LearningRate 0.0000 Epoch: 35 Global Step: 744690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:48,799-Speed 6276.79 samples/sec Loss 2.7554 LearningRate 0.0000 Epoch: 35 Global Step: 744700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:52,059-Speed 6285.10 samples/sec Loss 2.6833 LearningRate 0.0000 Epoch: 35 Global Step: 744710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:55,314-Speed 6292.47 samples/sec Loss 2.6886 LearningRate 0.0000 Epoch: 35 Global Step: 744720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:28:58,571-Speed 6289.85 samples/sec Loss 2.6486 LearningRate 0.0000 Epoch: 35 Global Step: 744730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:01,834-Speed 6277.37 samples/sec Loss 2.6974 LearningRate 0.0000 Epoch: 35 Global Step: 744740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:05,089-Speed 6293.75 samples/sec Loss 2.7417 LearningRate 0.0000 Epoch: 35 Global Step: 744750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:08,326-Speed 6327.79 samples/sec Loss 2.6990 LearningRate 0.0000 Epoch: 35 Global Step: 744760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:11,586-Speed 6282.63 samples/sec Loss 2.7285 LearningRate 0.0000 Epoch: 35 Global Step: 744770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:14,885-Speed 6210.63 samples/sec Loss 2.7270 LearningRate 0.0000 Epoch: 35 Global Step: 744780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:18,146-Speed 6281.93 samples/sec Loss 2.7015 LearningRate 0.0000 Epoch: 35 Global Step: 744790 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:29:21,398-Speed 6298.11 samples/sec Loss 2.7198 LearningRate 0.0000 Epoch: 35 Global Step: 744800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:24,660-Speed 6280.25 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 35 Global Step: 744810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:27,916-Speed 6291.55 samples/sec Loss 2.6919 LearningRate 0.0000 Epoch: 35 Global Step: 744820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:31,173-Speed 6289.09 samples/sec Loss 2.7195 LearningRate 0.0000 Epoch: 35 Global Step: 744830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:34,436-Speed 6277.57 samples/sec Loss 2.7202 LearningRate 0.0000 Epoch: 35 Global Step: 744840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:37,686-Speed 6302.58 samples/sec Loss 2.6972 LearningRate 0.0000 Epoch: 35 Global Step: 744850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:40,937-Speed 6301.58 samples/sec Loss 2.6836 LearningRate 0.0000 Epoch: 35 Global Step: 744860 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:29:44,180-Speed 6316.11 samples/sec Loss 2.6615 LearningRate 0.0000 Epoch: 35 Global Step: 744870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:47,446-Speed 6271.63 samples/sec Loss 2.7069 LearningRate 0.0000 Epoch: 35 Global Step: 744880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:50,710-Speed 6276.62 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 35 Global Step: 744890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:53,973-Speed 6278.30 samples/sec Loss 2.7129 LearningRate 0.0000 Epoch: 35 Global Step: 744900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:29:57,241-Speed 6267.95 samples/sec Loss 2.7020 LearningRate 0.0000 Epoch: 35 Global Step: 744910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:00,502-Speed 6281.80 
samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 35 Global Step: 744920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:03,759-Speed 6290.64 samples/sec Loss 2.7414 LearningRate 0.0000 Epoch: 35 Global Step: 744930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:07,014-Speed 6292.92 samples/sec Loss 2.7128 LearningRate 0.0000 Epoch: 35 Global Step: 744940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:10,274-Speed 6284.26 samples/sec Loss 2.6639 LearningRate 0.0000 Epoch: 35 Global Step: 744950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:13,528-Speed 6293.52 samples/sec Loss 2.7059 LearningRate 0.0000 Epoch: 35 Global Step: 744960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:16,781-Speed 6297.65 samples/sec Loss 2.6963 LearningRate 0.0000 Epoch: 35 Global Step: 744970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:20,041-Speed 6283.37 samples/sec Loss 2.7514 LearningRate 0.0000 Epoch: 35 Global Step: 744980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:23,297-Speed 6292.06 samples/sec Loss 2.7184 LearningRate 0.0000 Epoch: 35 Global Step: 744990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:26,552-Speed 6292.28 samples/sec Loss 2.6826 LearningRate 0.0000 Epoch: 35 Global Step: 745000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:29,816-Speed 6276.97 samples/sec Loss 2.6284 LearningRate 0.0000 Epoch: 35 Global Step: 745010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:33,069-Speed 6296.70 samples/sec Loss 2.7009 LearningRate 0.0000 Epoch: 35 Global Step: 745020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:36,330-Speed 6281.64 samples/sec Loss 2.7108 LearningRate 0.0000 Epoch: 35 Global Step: 745030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:39,635-Speed 6197.97 samples/sec Loss 2.6928 LearningRate 0.0000 Epoch: 35 
Global Step: 745040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:42,886-Speed 6301.75 samples/sec Loss 2.7187 LearningRate 0.0000 Epoch: 35 Global Step: 745050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:46,145-Speed 6285.86 samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 35 Global Step: 745060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:49,401-Speed 6290.33 samples/sec Loss 2.7376 LearningRate 0.0000 Epoch: 35 Global Step: 745070 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:30:52,636-Speed 6331.29 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 35 Global Step: 745080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:55,895-Speed 6286.22 samples/sec Loss 2.6599 LearningRate 0.0000 Epoch: 35 Global Step: 745090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:30:59,142-Speed 6309.40 samples/sec Loss 2.7931 LearningRate 0.0000 Epoch: 35 Global Step: 745100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:02,405-Speed 6277.57 samples/sec Loss 2.7338 LearningRate 0.0000 Epoch: 35 Global Step: 745110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:05,666-Speed 6281.62 samples/sec Loss 2.7302 LearningRate 0.0000 Epoch: 35 Global Step: 745120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:08,921-Speed 6293.99 samples/sec Loss 2.7076 LearningRate 0.0000 Epoch: 35 Global Step: 745130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:12,177-Speed 6290.84 samples/sec Loss 2.6885 LearningRate 0.0000 Epoch: 35 Global Step: 745140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:15,436-Speed 6285.26 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 35 Global Step: 745150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:18,695-Speed 6286.31 samples/sec Loss 2.7181 LearningRate 0.0000 Epoch: 35 Global Step: 745160 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:31:21,955-Speed 6283.86 samples/sec Loss 2.7041 LearningRate 0.0000 Epoch: 35 Global Step: 745170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:25,200-Speed 6311.82 samples/sec Loss 2.7234 LearningRate 0.0000 Epoch: 35 Global Step: 745180 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:28,462-Speed 6280.19 samples/sec Loss 2.7581 LearningRate 0.0000 Epoch: 35 Global Step: 745190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:31,723-Speed 6282.26 samples/sec Loss 2.8107 LearningRate 0.0000 Epoch: 35 Global Step: 745200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:35,012-Speed 6228.09 samples/sec Loss 2.6681 LearningRate 0.0000 Epoch: 35 Global Step: 745210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:38,261-Speed 6304.04 samples/sec Loss 2.7051 LearningRate 0.0000 Epoch: 35 Global Step: 745220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:41,539-Speed 6249.80 samples/sec Loss 2.6934 LearningRate 0.0000 Epoch: 35 Global Step: 745230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:44,802-Speed 6278.02 samples/sec Loss 2.6930 LearningRate 0.0000 Epoch: 35 Global Step: 745240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:48,062-Speed 6283.04 samples/sec Loss 2.6960 LearningRate 0.0000 Epoch: 35 Global Step: 745250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:51,328-Speed 6271.41 samples/sec Loss 2.7165 LearningRate 0.0000 Epoch: 35 Global Step: 745260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:54,584-Speed 6291.00 samples/sec Loss 2.7287 LearningRate 0.0000 Epoch: 35 Global Step: 745270 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:31:57,827-Speed 6316.51 samples/sec Loss 2.7389 LearningRate 0.0000 Epoch: 35 Global Step: 745280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:01,084-Speed 6290.78 
samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 35 Global Step: 745290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:04,342-Speed 6285.97 samples/sec Loss 2.7003 LearningRate 0.0000 Epoch: 35 Global Step: 745300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:07,600-Speed 6287.83 samples/sec Loss 2.6695 LearningRate 0.0000 Epoch: 35 Global Step: 745310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:10,859-Speed 6285.90 samples/sec Loss 2.6893 LearningRate 0.0000 Epoch: 35 Global Step: 745320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:14,118-Speed 6286.70 samples/sec Loss 2.7365 LearningRate 0.0000 Epoch: 35 Global Step: 745330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:17,381-Speed 6278.59 samples/sec Loss 2.7404 LearningRate 0.0000 Epoch: 35 Global Step: 745340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:20,641-Speed 6282.56 samples/sec Loss 2.6965 LearningRate 0.0000 Epoch: 35 Global Step: 745350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:23,897-Speed 6292.58 samples/sec Loss 2.7052 LearningRate 0.0000 Epoch: 35 Global Step: 745360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:27,224-Speed 6156.36 samples/sec Loss 2.7172 LearningRate 0.0000 Epoch: 35 Global Step: 745370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:30,544-Speed 6170.30 samples/sec Loss 2.6806 LearningRate 0.0000 Epoch: 35 Global Step: 745380 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:32:33,796-Speed 6299.48 samples/sec Loss 2.6416 LearningRate 0.0000 Epoch: 35 Global Step: 745390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:32:37,039-Speed 6314.55 samples/sec Loss 2.7399 LearningRate 0.0000 Epoch: 35 Global Step: 745400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:40,294-Speed 6294.91 samples/sec Loss 2.7055 LearningRate 0.0000 Epoch: 35 
Global Step: 745410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:43,560-Speed 6272.12 samples/sec Loss 2.6526 LearningRate 0.0000 Epoch: 35 Global Step: 745420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:46,817-Speed 6288.41 samples/sec Loss 2.7191 LearningRate 0.0000 Epoch: 35 Global Step: 745430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:50,077-Speed 6283.88 samples/sec Loss 2.7064 LearningRate 0.0000 Epoch: 35 Global Step: 745440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:32:53,319-Speed 6317.47 samples/sec Loss 2.7214 LearningRate 0.0000 Epoch: 35 Global Step: 745450 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:32:56,579-Speed 6284.18 samples/sec Loss 2.7238 LearningRate 0.0000 Epoch: 35 Global Step: 745460 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:32:59,840-Speed 6281.80 samples/sec Loss 2.7160 LearningRate 0.0000 Epoch: 35 Global Step: 745470 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:03,094-Speed 6296.03 samples/sec Loss 2.6987 LearningRate 0.0000 Epoch: 35 Global Step: 745480 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:06,352-Speed 6286.57 samples/sec Loss 2.6712 LearningRate 0.0000 Epoch: 35 Global Step: 745490 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:09,620-Speed 6267.77 samples/sec Loss 2.7247 LearningRate 0.0000 Epoch: 35 Global Step: 745500 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:12,881-Speed 6281.22 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 35 Global Step: 745510 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:16,141-Speed 6284.50 samples/sec Loss 2.6976 LearningRate 0.0000 Epoch: 35 Global Step: 745520 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:19,404-Speed 6279.23 samples/sec Loss 2.7249 LearningRate 0.0000 Epoch: 35 Global Step: 745530 Fp16 Grad Scale: 2048 Required: 8 
hours Training: 2022-04-03 11:33:22,675-Speed 6262.02 samples/sec Loss 2.7400 LearningRate 0.0000 Epoch: 35 Global Step: 745540 Fp16 Grad Scale: 2048 Required: 8 hours Training: 2022-04-03 11:33:25,930-Speed 6293.22 samples/sec Loss 2.6998 LearningRate 0.0000 Epoch: 35 Global Step: 745550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:29,187-Speed 6289.87 samples/sec Loss 2.7245 LearningRate 0.0000 Epoch: 35 Global Step: 745560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:32,446-Speed 6285.93 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 35 Global Step: 745570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:35,698-Speed 6298.04 samples/sec Loss 2.6682 LearningRate 0.0000 Epoch: 35 Global Step: 745580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:38,952-Speed 6295.27 samples/sec Loss 2.6693 LearningRate 0.0000 Epoch: 35 Global Step: 745590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:42,254-Speed 6203.31 samples/sec Loss 2.6786 LearningRate 0.0000 Epoch: 35 Global Step: 745600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:45,510-Speed 6290.92 samples/sec Loss 2.7053 LearningRate 0.0000 Epoch: 35 Global Step: 745610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:48,768-Speed 6289.05 samples/sec Loss 2.7438 LearningRate 0.0000 Epoch: 35 Global Step: 745620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:52,020-Speed 6298.17 samples/sec Loss 2.7121 LearningRate 0.0000 Epoch: 35 Global Step: 745630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:55,280-Speed 6283.61 samples/sec Loss 2.6968 LearningRate 0.0000 Epoch: 35 Global Step: 745640 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:33:58,552-Speed 6260.11 samples/sec Loss 2.7986 LearningRate 0.0000 Epoch: 35 Global Step: 745650 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:34:01,807-Speed 6293.81 
samples/sec Loss 2.7106 LearningRate 0.0000 Epoch: 35 Global Step: 745660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:05,061-Speed 6295.38 samples/sec Loss 2.7277 LearningRate 0.0000 Epoch: 35 Global Step: 745670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:08,314-Speed 6296.09 samples/sec Loss 2.6903 LearningRate 0.0000 Epoch: 35 Global Step: 745680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:11,578-Speed 6276.58 samples/sec Loss 2.7145 LearningRate 0.0000 Epoch: 35 Global Step: 745690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:14,832-Speed 6294.75 samples/sec Loss 2.6549 LearningRate 0.0000 Epoch: 35 Global Step: 745700 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:18,091-Speed 6286.86 samples/sec Loss 2.7114 LearningRate 0.0000 Epoch: 35 Global Step: 745710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:21,350-Speed 6284.95 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 35 Global Step: 745720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:24,611-Speed 6281.94 samples/sec Loss 2.7292 LearningRate 0.0000 Epoch: 35 Global Step: 745730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:27,879-Speed 6267.36 samples/sec Loss 2.7233 LearningRate 0.0000 Epoch: 35 Global Step: 745740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:31,130-Speed 6301.63 samples/sec Loss 2.6883 LearningRate 0.0000 Epoch: 35 Global Step: 745750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:34,390-Speed 6284.79 samples/sec Loss 2.7555 LearningRate 0.0000 Epoch: 35 Global Step: 745760 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:34:37,634-Speed 6314.78 samples/sec Loss 2.7195 LearningRate 0.0000 Epoch: 35 Global Step: 745770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:40,886-Speed 6297.77 samples/sec Loss 2.7223 LearningRate 0.0000 Epoch: 35 
Global Step: 745780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:44,144-Speed 6287.75 samples/sec Loss 2.7422 LearningRate 0.0000 Epoch: 35 Global Step: 745790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:47,403-Speed 6285.84 samples/sec Loss 2.7046 LearningRate 0.0000 Epoch: 35 Global Step: 745800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:50,656-Speed 6296.11 samples/sec Loss 2.7265 LearningRate 0.0000 Epoch: 35 Global Step: 745810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:53,911-Speed 6293.14 samples/sec Loss 2.6715 LearningRate 0.0000 Epoch: 35 Global Step: 745820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:34:57,168-Speed 6290.27 samples/sec Loss 2.7155 LearningRate 0.0000 Epoch: 35 Global Step: 745830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:00,480-Speed 6185.58 samples/sec Loss 2.7524 LearningRate 0.0000 Epoch: 35 Global Step: 745840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:03,743-Speed 6276.64 samples/sec Loss 2.6961 LearningRate 0.0000 Epoch: 35 Global Step: 745850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:07,036-Speed 6220.02 samples/sec Loss 2.6826 LearningRate 0.0000 Epoch: 35 Global Step: 745860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:10,289-Speed 6297.67 samples/sec Loss 2.7348 LearningRate 0.0000 Epoch: 35 Global Step: 745870 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:35:13,533-Speed 6314.29 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 35 Global Step: 745880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:16,822-Speed 6228.40 samples/sec Loss 2.7509 LearningRate 0.0000 Epoch: 35 Global Step: 745890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:20,081-Speed 6286.56 samples/sec Loss 2.7101 LearningRate 0.0000 Epoch: 35 Global Step: 745900 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:35:23,331-Speed 6301.67 samples/sec Loss 2.7136 LearningRate 0.0000 Epoch: 35 Global Step: 745910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:26,585-Speed 6295.20 samples/sec Loss 2.7304 LearningRate 0.0000 Epoch: 35 Global Step: 745920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:29,850-Speed 6275.48 samples/sec Loss 2.6728 LearningRate 0.0000 Epoch: 35 Global Step: 745930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:33,100-Speed 6302.81 samples/sec Loss 2.6955 LearningRate 0.0000 Epoch: 35 Global Step: 745940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:36,363-Speed 6278.50 samples/sec Loss 2.6872 LearningRate 0.0000 Epoch: 35 Global Step: 745950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:39,614-Speed 6299.65 samples/sec Loss 2.6878 LearningRate 0.0000 Epoch: 35 Global Step: 745960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:42,875-Speed 6282.79 samples/sec Loss 2.7325 LearningRate 0.0000 Epoch: 35 Global Step: 745970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:46,117-Speed 6318.50 samples/sec Loss 2.7115 LearningRate 0.0000 Epoch: 35 Global Step: 745980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:49,373-Speed 6291.84 samples/sec Loss 2.6843 LearningRate 0.0000 Epoch: 35 Global Step: 745990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:52,631-Speed 6286.34 samples/sec Loss 2.7026 LearningRate 0.0000 Epoch: 35 Global Step: 746000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:55,891-Speed 6283.46 samples/sec Loss 2.6979 LearningRate 0.0000 Epoch: 35 Global Step: 746010 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:35:59,140-Speed 6306.20 samples/sec Loss 2.7142 LearningRate 0.0000 Epoch: 35 Global Step: 746020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:02,397-Speed 6288.16 
samples/sec Loss 2.7312 LearningRate 0.0000 Epoch: 35 Global Step: 746030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:05,649-Speed 6299.19 samples/sec Loss 2.7082 LearningRate 0.0000 Epoch: 35 Global Step: 746040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:08,909-Speed 6284.22 samples/sec Loss 2.7194 LearningRate 0.0000 Epoch: 35 Global Step: 746050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:12,180-Speed 6261.84 samples/sec Loss 2.7241 LearningRate 0.0000 Epoch: 35 Global Step: 746060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:15,444-Speed 6276.19 samples/sec Loss 2.6956 LearningRate 0.0000 Epoch: 35 Global Step: 746070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:18,688-Speed 6315.18 samples/sec Loss 2.7162 LearningRate 0.0000 Epoch: 35 Global Step: 746080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:21,950-Speed 6278.98 samples/sec Loss 2.7176 LearningRate 0.0000 Epoch: 35 Global Step: 746090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:25,212-Speed 6280.00 samples/sec Loss 2.7263 LearningRate 0.0000 Epoch: 35 Global Step: 746100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:28,491-Speed 6246.75 samples/sec Loss 2.6844 LearningRate 0.0000 Epoch: 35 Global Step: 746110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:31,747-Speed 6291.29 samples/sec Loss 2.7196 LearningRate 0.0000 Epoch: 35 Global Step: 746120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:35,008-Speed 6282.00 samples/sec Loss 2.6512 LearningRate 0.0000 Epoch: 35 Global Step: 746130 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:38,265-Speed 6289.15 samples/sec Loss 2.6765 LearningRate 0.0000 Epoch: 35 Global Step: 746140 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:41,516-Speed 6300.25 samples/sec Loss 2.7291 LearningRate 0.0000 Epoch: 35 
Global Step: 746150 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:44,771-Speed 6293.94 samples/sec Loss 2.7260 LearningRate 0.0000 Epoch: 35 Global Step: 746160 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:48,034-Speed 6279.60 samples/sec Loss 2.7307 LearningRate 0.0000 Epoch: 35 Global Step: 746170 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:51,288-Speed 6294.87 samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 35 Global Step: 746180 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:36:54,538-Speed 6302.55 samples/sec Loss 2.6966 LearningRate 0.0000 Epoch: 35 Global Step: 746190 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:36:57,797-Speed 6286.56 samples/sec Loss 2.7010 LearningRate 0.0000 Epoch: 35 Global Step: 746200 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:01,058-Speed 6280.91 samples/sec Loss 2.6739 LearningRate 0.0000 Epoch: 35 Global Step: 746210 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:04,319-Speed 6282.63 samples/sec Loss 2.6778 LearningRate 0.0000 Epoch: 35 Global Step: 746220 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:07,577-Speed 6287.52 samples/sec Loss 2.7161 LearningRate 0.0000 Epoch: 35 Global Step: 746230 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:10,830-Speed 6295.88 samples/sec Loss 2.6861 LearningRate 0.0000 Epoch: 35 Global Step: 746240 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:14,086-Speed 6291.11 samples/sec Loss 2.7187 LearningRate 0.0000 Epoch: 35 Global Step: 746250 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:17,348-Speed 6279.58 samples/sec Loss 2.6834 LearningRate 0.0000 Epoch: 35 Global Step: 746260 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:20,609-Speed 6283.14 samples/sec Loss 2.7231 LearningRate 0.0000 Epoch: 35 Global Step: 746270 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:37:23,865-Speed 6290.98 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 35 Global Step: 746280 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:27,110-Speed 6312.29 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 35 Global Step: 746290 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:30,399-Speed 6227.88 samples/sec Loss 2.7798 LearningRate 0.0000 Epoch: 35 Global Step: 746300 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:33,659-Speed 6282.81 samples/sec Loss 2.7428 LearningRate 0.0000 Epoch: 35 Global Step: 746310 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:36,920-Speed 6283.74 samples/sec Loss 2.7203 LearningRate 0.0000 Epoch: 35 Global Step: 746320 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:40,181-Speed 6280.84 samples/sec Loss 2.7448 LearningRate 0.0000 Epoch: 35 Global Step: 746330 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:43,444-Speed 6277.99 samples/sec Loss 2.6679 LearningRate 0.0000 Epoch: 35 Global Step: 746340 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:46,704-Speed 6283.52 samples/sec Loss 2.7107 LearningRate 0.0000 Epoch: 35 Global Step: 746350 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:49,962-Speed 6286.30 samples/sec Loss 2.7194 LearningRate 0.0000 Epoch: 35 Global Step: 746360 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:53,227-Speed 6274.52 samples/sec Loss 2.6939 LearningRate 0.0000 Epoch: 35 Global Step: 746370 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:56,476-Speed 6304.17 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 35 Global Step: 746380 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:37:59,790-Speed 6181.80 samples/sec Loss 2.6886 LearningRate 0.0000 Epoch: 35 Global Step: 746390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:38:03,151-Speed 6096.01 
samples/sec Loss 2.7291 LearningRate 0.0000 Epoch: 35 Global Step: 746400 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:06,406-Speed 6293.95 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 35 Global Step: 746410 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:09,658-Speed 6297.37 samples/sec Loss 2.6967 LearningRate 0.0000 Epoch: 35 Global Step: 746420 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:12,907-Speed 6304.76 samples/sec Loss 2.7077 LearningRate 0.0000 Epoch: 35 Global Step: 746430 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:16,158-Speed 6301.12 samples/sec Loss 2.6669 LearningRate 0.0000 Epoch: 35 Global Step: 746440 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:19,411-Speed 6298.76 samples/sec Loss 2.7441 LearningRate 0.0000 Epoch: 35 Global Step: 746450 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:22,674-Speed 6278.82 samples/sec Loss 2.7426 LearningRate 0.0000 Epoch: 35 Global Step: 746460 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:25,931-Speed 6289.78 samples/sec Loss 2.6597 LearningRate 0.0000 Epoch: 35 Global Step: 746470 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:29,190-Speed 6286.33 samples/sec Loss 2.7572 LearningRate 0.0000 Epoch: 35 Global Step: 746480 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:32,448-Speed 6286.01 samples/sec Loss 2.7299 LearningRate 0.0000 Epoch: 35 Global Step: 746490 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:35,689-Speed 6321.20 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 35 Global Step: 746500 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:38,950-Speed 6280.72 samples/sec Loss 2.6901 LearningRate 0.0000 Epoch: 35 Global Step: 746510 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:42,203-Speed 6298.52 samples/sec Loss 2.6900 LearningRate 0.0000 Epoch: 35 
Global Step: 746520 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:45,457-Speed 6293.87 samples/sec Loss 2.7463 LearningRate 0.0000 Epoch: 35 Global Step: 746530 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:48,715-Speed 6288.59 samples/sec Loss 2.6945 LearningRate 0.0000 Epoch: 35 Global Step: 746540 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:51,976-Speed 6284.15 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 35 Global Step: 746550 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:55,232-Speed 6291.63 samples/sec Loss 2.7323 LearningRate 0.0000 Epoch: 35 Global Step: 746560 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:38:58,491-Speed 6285.71 samples/sec Loss 2.7021 LearningRate 0.0000 Epoch: 35 Global Step: 746570 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:39:01,751-Speed 6284.32 samples/sec Loss 2.7409 LearningRate 0.0000 Epoch: 35 Global Step: 746580 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:39:05,040-Speed 6227.68 samples/sec Loss 2.7369 LearningRate 0.0000 Epoch: 35 Global Step: 746590 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:39:08,291-Speed 6300.43 samples/sec Loss 2.7659 LearningRate 0.0000 Epoch: 35 Global Step: 746600 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:39:11,610-Speed 6173.58 samples/sec Loss 2.6752 LearningRate 0.0000 Epoch: 35 Global Step: 746610 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:39:14,954-Speed 6125.57 samples/sec Loss 2.6856 LearningRate 0.0000 Epoch: 35 Global Step: 746620 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:14,531-Speed 343.76 samples/sec Loss 2.7423 LearningRate 0.0000 Epoch: 36 Global Step: 746630 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:17,777-Speed 6311.66 samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 36 Global Step: 746640 Fp16 Grad Scale: 4096 Required: 8 
hours Training: 2022-04-03 11:40:21,011-Speed 6333.08 samples/sec Loss 2.7343 LearningRate 0.0000 Epoch: 36 Global Step: 746650 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:24,257-Speed 6312.10 samples/sec Loss 2.7351 LearningRate 0.0000 Epoch: 36 Global Step: 746660 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:27,530-Speed 6257.44 samples/sec Loss 2.6947 LearningRate 0.0000 Epoch: 36 Global Step: 746670 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:30,773-Speed 6316.63 samples/sec Loss 2.7166 LearningRate 0.0000 Epoch: 36 Global Step: 746680 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:34,020-Speed 6309.98 samples/sec Loss 2.7199 LearningRate 0.0000 Epoch: 36 Global Step: 746690 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:37,277-Speed 6289.14 samples/sec Loss 2.7531 LearningRate 0.0000 Epoch: 36 Global Step: 746700 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-03 11:40:40,514-Speed 6326.83 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 36 Global Step: 746710 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:43,765-Speed 6301.21 samples/sec Loss 2.7255 LearningRate 0.0000 Epoch: 36 Global Step: 746720 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:47,016-Speed 6302.52 samples/sec Loss 2.7449 LearningRate 0.0000 Epoch: 36 Global Step: 746730 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:50,255-Speed 6324.33 samples/sec Loss 2.7036 LearningRate 0.0000 Epoch: 36 Global Step: 746740 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:53,497-Speed 6318.21 samples/sec Loss 2.7120 LearningRate 0.0000 Epoch: 36 Global Step: 746750 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:56,751-Speed 6295.23 samples/sec Loss 2.7030 LearningRate 0.0000 Epoch: 36 Global Step: 746760 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:40:59,991-Speed 6321.89 
samples/sec Loss 2.7115 LearningRate 0.0000 Epoch: 36 Global Step: 746770 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:03,235-Speed 6314.91 samples/sec Loss 2.7356 LearningRate 0.0000 Epoch: 36 Global Step: 746780 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:06,485-Speed 6303.90 samples/sec Loss 2.7041 LearningRate 0.0000 Epoch: 36 Global Step: 746790 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:09,735-Speed 6303.10 samples/sec Loss 2.6971 LearningRate 0.0000 Epoch: 36 Global Step: 746800 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:12,964-Speed 6342.89 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 36 Global Step: 746810 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:16,213-Speed 6305.81 samples/sec Loss 2.7199 LearningRate 0.0000 Epoch: 36 Global Step: 746820 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:19,472-Speed 6285.62 samples/sec Loss 2.6902 LearningRate 0.0000 Epoch: 36 Global Step: 746830 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:22,728-Speed 6290.80 samples/sec Loss 2.7164 LearningRate 0.0000 Epoch: 36 Global Step: 746840 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:25,983-Speed 6293.62 samples/sec Loss 2.6603 LearningRate 0.0000 Epoch: 36 Global Step: 746850 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:29,234-Speed 6301.59 samples/sec Loss 2.6687 LearningRate 0.0000 Epoch: 36 Global Step: 746860 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:32,481-Speed 6308.21 samples/sec Loss 2.6993 LearningRate 0.0000 Epoch: 36 Global Step: 746870 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:35,735-Speed 6294.42 samples/sec Loss 2.6634 LearningRate 0.0000 Epoch: 36 Global Step: 746880 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:39,031-Speed 6215.65 samples/sec Loss 2.7574 LearningRate 0.0000 Epoch: 36 
Global Step: 746890 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:42,291-Speed 6284.85 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 36 Global Step: 746900 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:45,528-Speed 6326.94 samples/sec Loss 2.6841 LearningRate 0.0000 Epoch: 36 Global Step: 746910 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:48,781-Speed 6297.67 samples/sec Loss 2.6709 LearningRate 0.0000 Epoch: 36 Global Step: 746920 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:52,033-Speed 6298.89 samples/sec Loss 2.7024 LearningRate 0.0000 Epoch: 36 Global Step: 746930 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:55,284-Speed 6300.41 samples/sec Loss 2.6857 LearningRate 0.0000 Epoch: 36 Global Step: 746940 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:41:58,530-Speed 6311.41 samples/sec Loss 2.7084 LearningRate 0.0000 Epoch: 36 Global Step: 746950 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:01,771-Speed 6320.87 samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 36 Global Step: 746960 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:05,015-Speed 6312.97 samples/sec Loss 2.6408 LearningRate 0.0000 Epoch: 36 Global Step: 746970 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:08,252-Speed 6328.27 samples/sec Loss 2.6690 LearningRate 0.0000 Epoch: 36 Global Step: 746980 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:11,497-Speed 6314.30 samples/sec Loss 2.6724 LearningRate 0.0000 Epoch: 36 Global Step: 746990 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:14,790-Speed 6219.91 samples/sec Loss 2.7116 LearningRate 0.0000 Epoch: 36 Global Step: 747000 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:18,035-Speed 6313.51 samples/sec Loss 2.7285 LearningRate 0.0000 Epoch: 36 Global Step: 747010 Fp16 Grad Scale: 8192 Required: 8 
hours Training: 2022-04-03 11:42:21,267-Speed 6338.79 samples/sec Loss 2.7156 LearningRate 0.0000 Epoch: 36 Global Step: 747020 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:24,514-Speed 6308.51 samples/sec Loss 2.6743 LearningRate 0.0000 Epoch: 36 Global Step: 747030 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:27,762-Speed 6306.53 samples/sec Loss 2.6665 LearningRate 0.0000 Epoch: 36 Global Step: 747040 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:31,004-Speed 6318.16 samples/sec Loss 2.7247 LearningRate 0.0000 Epoch: 36 Global Step: 747050 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:34,249-Speed 6313.05 samples/sec Loss 2.6965 LearningRate 0.0000 Epoch: 36 Global Step: 747060 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:37,491-Speed 6317.41 samples/sec Loss 2.7322 LearningRate 0.0000 Epoch: 36 Global Step: 747070 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:40,736-Speed 6313.65 samples/sec Loss 2.7061 LearningRate 0.0000 Epoch: 36 Global Step: 747080 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:43,979-Speed 6316.47 samples/sec Loss 2.7030 LearningRate 0.0000 Epoch: 36 Global Step: 747090 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:47,222-Speed 6316.49 samples/sec Loss 2.6706 LearningRate 0.0000 Epoch: 36 Global Step: 747100 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:50,469-Speed 6307.81 samples/sec Loss 2.7347 LearningRate 0.0000 Epoch: 36 Global Step: 747110 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:53,705-Speed 6330.12 samples/sec Loss 2.6557 LearningRate 0.0000 Epoch: 36 Global Step: 747120 Fp16 Grad Scale: 4096 Required: 8 hours Training: 2022-04-03 11:42:56,950-Speed 6313.98 samples/sec Loss 2.7111 LearningRate 0.0000 Epoch: 36 Global Step: 747130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:00,192-Speed 6317.75 
samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 36 Global Step: 747140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:03,438-Speed 6310.16 samples/sec Loss 2.6559 LearningRate 0.0000 Epoch: 36 Global Step: 747150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:06,683-Speed 6312.36 samples/sec Loss 2.6870 LearningRate 0.0000 Epoch: 36 Global Step: 747160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:09,931-Speed 6307.62 samples/sec Loss 2.7517 LearningRate 0.0000 Epoch: 36 Global Step: 747170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:13,181-Speed 6303.87 samples/sec Loss 2.7019 LearningRate 0.0000 Epoch: 36 Global Step: 747180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:16,433-Speed 6299.00 samples/sec Loss 2.6836 LearningRate 0.0000 Epoch: 36 Global Step: 747190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:19,688-Speed 6294.43 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 36 Global Step: 747200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:22,938-Speed 6302.80 samples/sec Loss 2.6951 LearningRate 0.0000 Epoch: 36 Global Step: 747210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:26,174-Speed 6329.34 samples/sec Loss 2.7095 LearningRate 0.0000 Epoch: 36 Global Step: 747220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:29,421-Speed 6307.88 samples/sec Loss 2.6717 LearningRate 0.0000 Epoch: 36 Global Step: 747230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:32,677-Speed 6293.49 samples/sec Loss 2.6167 LearningRate 0.0000 Epoch: 36 Global Step: 747240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:35,928-Speed 6299.39 samples/sec Loss 2.7481 LearningRate 0.0000 Epoch: 36 Global Step: 747250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:39,185-Speed 6289.46 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 
Global Step: 747260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:42,431-Speed 6310.66 samples/sec Loss 2.7112 LearningRate 0.0000 Epoch: 36 Global Step: 747270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:45,680-Speed 6305.97 samples/sec Loss 2.6893 LearningRate 0.0000 Epoch: 36 Global Step: 747280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:48,999-Speed 6170.92 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 36 Global Step: 747290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:52,349-Speed 6115.60 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 36 Global Step: 747300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:55,592-Speed 6316.84 samples/sec Loss 2.7406 LearningRate 0.0000 Epoch: 36 Global Step: 747310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:43:58,839-Speed 6307.88 samples/sec Loss 2.6228 LearningRate 0.0000 Epoch: 36 Global Step: 747320 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:44:02,073-Speed 6333.86 samples/sec Loss 2.7005 LearningRate 0.0000 Epoch: 36 Global Step: 747330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:05,321-Speed 6306.47 samples/sec Loss 2.7135 LearningRate 0.0000 Epoch: 36 Global Step: 747340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:08,571-Speed 6303.14 samples/sec Loss 2.7315 LearningRate 0.0000 Epoch: 36 Global Step: 747350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:11,818-Speed 6309.26 samples/sec Loss 2.7421 LearningRate 0.0000 Epoch: 36 Global Step: 747360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:15,067-Speed 6304.06 samples/sec Loss 2.6941 LearningRate 0.0000 Epoch: 36 Global Step: 747370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:18,308-Speed 6321.70 samples/sec Loss 2.7501 LearningRate 0.0000 Epoch: 36 Global Step: 747380 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:44:21,563-Speed 6292.91 samples/sec Loss 2.6844 LearningRate 0.0000 Epoch: 36 Global Step: 747390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:24,823-Speed 6283.68 samples/sec Loss 2.6665 LearningRate 0.0000 Epoch: 36 Global Step: 747400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:28,065-Speed 6318.48 samples/sec Loss 2.7210 LearningRate 0.0000 Epoch: 36 Global Step: 747410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:31,323-Speed 6291.50 samples/sec Loss 2.6801 LearningRate 0.0000 Epoch: 36 Global Step: 747420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:34,552-Speed 6343.27 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 747430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:37,804-Speed 6299.48 samples/sec Loss 2.7067 LearningRate 0.0000 Epoch: 36 Global Step: 747440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:41,052-Speed 6306.00 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 36 Global Step: 747450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:44,305-Speed 6296.93 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 36 Global Step: 747460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:47,552-Speed 6309.87 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 36 Global Step: 747470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:50,797-Speed 6311.97 samples/sec Loss 2.6889 LearningRate 0.0000 Epoch: 36 Global Step: 747480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:54,079-Speed 6242.26 samples/sec Loss 2.7020 LearningRate 0.0000 Epoch: 36 Global Step: 747490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:44:57,326-Speed 6308.81 samples/sec Loss 2.7069 LearningRate 0.0000 Epoch: 36 Global Step: 747500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:00,570-Speed 6314.62 
samples/sec Loss 2.6666 LearningRate 0.0000 Epoch: 36 Global Step: 747510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:03,811-Speed 6320.66 samples/sec Loss 2.6884 LearningRate 0.0000 Epoch: 36 Global Step: 747520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:07,048-Speed 6326.65 samples/sec Loss 2.6806 LearningRate 0.0000 Epoch: 36 Global Step: 747530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:10,294-Speed 6310.73 samples/sec Loss 2.6905 LearningRate 0.0000 Epoch: 36 Global Step: 747540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:13,537-Speed 6316.38 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 36 Global Step: 747550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:16,784-Speed 6309.93 samples/sec Loss 2.7332 LearningRate 0.0000 Epoch: 36 Global Step: 747560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:20,036-Speed 6298.22 samples/sec Loss 2.7704 LearningRate 0.0000 Epoch: 36 Global Step: 747570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:23,329-Speed 6221.70 samples/sec Loss 2.6965 LearningRate 0.0000 Epoch: 36 Global Step: 747580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:26,593-Speed 6275.01 samples/sec Loss 2.6887 LearningRate 0.0000 Epoch: 36 Global Step: 747590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:29,842-Speed 6304.75 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 36 Global Step: 747600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:33,089-Speed 6309.67 samples/sec Loss 2.6547 LearningRate 0.0000 Epoch: 36 Global Step: 747610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:36,337-Speed 6307.58 samples/sec Loss 2.7261 LearningRate 0.0000 Epoch: 36 Global Step: 747620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:39,583-Speed 6309.53 samples/sec Loss 2.6224 LearningRate 0.0000 Epoch: 36 
Global Step: 747630 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:45:42,812-Speed 6344.32 samples/sec Loss 2.6590 LearningRate 0.0000 Epoch: 36 Global Step: 747640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:46,056-Speed 6315.88 samples/sec Loss 2.7391 LearningRate 0.0000 Epoch: 36 Global Step: 747650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:49,305-Speed 6303.82 samples/sec Loss 2.7099 LearningRate 0.0000 Epoch: 36 Global Step: 747660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:52,554-Speed 6305.47 samples/sec Loss 2.7183 LearningRate 0.0000 Epoch: 36 Global Step: 747670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:55,800-Speed 6310.96 samples/sec Loss 2.7100 LearningRate 0.0000 Epoch: 36 Global Step: 747680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:45:59,048-Speed 6305.77 samples/sec Loss 2.7344 LearningRate 0.0000 Epoch: 36 Global Step: 747690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:02,300-Speed 6300.18 samples/sec Loss 2.7458 LearningRate 0.0000 Epoch: 36 Global Step: 747700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:05,551-Speed 6300.13 samples/sec Loss 2.6636 LearningRate 0.0000 Epoch: 36 Global Step: 747710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:08,797-Speed 6311.03 samples/sec Loss 2.7120 LearningRate 0.0000 Epoch: 36 Global Step: 747720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:12,102-Speed 6198.48 samples/sec Loss 2.7308 LearningRate 0.0000 Epoch: 36 Global Step: 747730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:15,353-Speed 6301.49 samples/sec Loss 2.7357 LearningRate 0.0000 Epoch: 36 Global Step: 747740 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:46:18,587-Speed 6333.39 samples/sec Loss 2.6504 LearningRate 0.0000 Epoch: 36 Global Step: 747750 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:46:21,839-Speed 6299.52 samples/sec Loss 2.6284 LearningRate 0.0000 Epoch: 36 Global Step: 747760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:25,094-Speed 6297.23 samples/sec Loss 2.6984 LearningRate 0.0000 Epoch: 36 Global Step: 747770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:28,344-Speed 6302.77 samples/sec Loss 2.6942 LearningRate 0.0000 Epoch: 36 Global Step: 747780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:31,594-Speed 6301.09 samples/sec Loss 2.7198 LearningRate 0.0000 Epoch: 36 Global Step: 747790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:34,846-Speed 6300.58 samples/sec Loss 2.7094 LearningRate 0.0000 Epoch: 36 Global Step: 747800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:38,098-Speed 6298.63 samples/sec Loss 2.7279 LearningRate 0.0000 Epoch: 36 Global Step: 747810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:41,357-Speed 6285.49 samples/sec Loss 2.7424 LearningRate 0.0000 Epoch: 36 Global Step: 747820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:44,606-Speed 6305.13 samples/sec Loss 2.6886 LearningRate 0.0000 Epoch: 36 Global Step: 747830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:47,858-Speed 6298.71 samples/sec Loss 2.7518 LearningRate 0.0000 Epoch: 36 Global Step: 747840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:51,093-Speed 6334.12 samples/sec Loss 2.7566 LearningRate 0.0000 Epoch: 36 Global Step: 747850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:54,337-Speed 6313.52 samples/sec Loss 2.6386 LearningRate 0.0000 Epoch: 36 Global Step: 747860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:46:57,583-Speed 6311.79 samples/sec Loss 2.6869 LearningRate 0.0000 Epoch: 36 Global Step: 747870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:00,825-Speed 6316.93 
samples/sec Loss 2.7116 LearningRate 0.0000 Epoch: 36 Global Step: 747880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:04,068-Speed 6317.29 samples/sec Loss 2.6946 LearningRate 0.0000 Epoch: 36 Global Step: 747890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:07,317-Speed 6304.76 samples/sec Loss 2.7113 LearningRate 0.0000 Epoch: 36 Global Step: 747900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:10,565-Speed 6307.48 samples/sec Loss 2.6969 LearningRate 0.0000 Epoch: 36 Global Step: 747910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:13,815-Speed 6301.28 samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 36 Global Step: 747920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:17,063-Speed 6307.44 samples/sec Loss 2.7130 LearningRate 0.0000 Epoch: 36 Global Step: 747930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:20,315-Speed 6299.23 samples/sec Loss 2.6992 LearningRate 0.0000 Epoch: 36 Global Step: 747940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:23,567-Speed 6299.24 samples/sec Loss 2.6786 LearningRate 0.0000 Epoch: 36 Global Step: 747950 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:47:26,800-Speed 6336.55 samples/sec Loss 2.6965 LearningRate 0.0000 Epoch: 36 Global Step: 747960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:30,045-Speed 6312.40 samples/sec Loss 2.6825 LearningRate 0.0000 Epoch: 36 Global Step: 747970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:33,296-Speed 6301.39 samples/sec Loss 2.6940 LearningRate 0.0000 Epoch: 36 Global Step: 747980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:36,552-Speed 6291.02 samples/sec Loss 2.7151 LearningRate 0.0000 Epoch: 36 Global Step: 747990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:39,810-Speed 6288.01 samples/sec Loss 2.6570 LearningRate 0.0000 Epoch: 36 
Global Step: 748000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:43,056-Speed 6311.41 samples/sec Loss 2.7091 LearningRate 0.0000 Epoch: 36 Global Step: 748010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:46,306-Speed 6302.90 samples/sec Loss 2.8247 LearningRate 0.0000 Epoch: 36 Global Step: 748020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:49,559-Speed 6296.06 samples/sec Loss 2.6770 LearningRate 0.0000 Epoch: 36 Global Step: 748030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:52,807-Speed 6307.17 samples/sec Loss 2.6946 LearningRate 0.0000 Epoch: 36 Global Step: 748040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:56,058-Speed 6301.21 samples/sec Loss 2.6867 LearningRate 0.0000 Epoch: 36 Global Step: 748050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:47:59,290-Speed 6338.42 samples/sec Loss 2.7197 LearningRate 0.0000 Epoch: 36 Global Step: 748060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:02,541-Speed 6300.80 samples/sec Loss 2.7363 LearningRate 0.0000 Epoch: 36 Global Step: 748070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:05,785-Speed 6313.73 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 748080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:09,043-Speed 6287.40 samples/sec Loss 2.6306 LearningRate 0.0000 Epoch: 36 Global Step: 748090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:12,291-Speed 6307.93 samples/sec Loss 2.6881 LearningRate 0.0000 Epoch: 36 Global Step: 748100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:15,539-Speed 6306.58 samples/sec Loss 2.7092 LearningRate 0.0000 Epoch: 36 Global Step: 748110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:18,820-Speed 6242.50 samples/sec Loss 2.7079 LearningRate 0.0000 Epoch: 36 Global Step: 748120 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:48:22,075-Speed 6294.12 samples/sec Loss 2.7201 LearningRate 0.0000 Epoch: 36 Global Step: 748130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:25,329-Speed 6294.72 samples/sec Loss 2.6860 LearningRate 0.0000 Epoch: 36 Global Step: 748140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:28,583-Speed 6295.19 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 36 Global Step: 748150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:31,821-Speed 6326.11 samples/sec Loss 2.6937 LearningRate 0.0000 Epoch: 36 Global Step: 748160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:35,073-Speed 6299.94 samples/sec Loss 2.7077 LearningRate 0.0000 Epoch: 36 Global Step: 748170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:38,332-Speed 6286.08 samples/sec Loss 2.6985 LearningRate 0.0000 Epoch: 36 Global Step: 748180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:41,589-Speed 6289.33 samples/sec Loss 2.7101 LearningRate 0.0000 Epoch: 36 Global Step: 748190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:44,840-Speed 6301.27 samples/sec Loss 2.6574 LearningRate 0.0000 Epoch: 36 Global Step: 748200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:48,093-Speed 6297.57 samples/sec Loss 2.6760 LearningRate 0.0000 Epoch: 36 Global Step: 748210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:51,351-Speed 6286.99 samples/sec Loss 2.7284 LearningRate 0.0000 Epoch: 36 Global Step: 748220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:54,630-Speed 6247.51 samples/sec Loss 2.6941 LearningRate 0.0000 Epoch: 36 Global Step: 748230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:48:57,879-Speed 6304.45 samples/sec Loss 2.7299 LearningRate 0.0000 Epoch: 36 Global Step: 748240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:01,131-Speed 6299.57 
samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 36 Global Step: 748250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:04,377-Speed 6311.67 samples/sec Loss 2.6834 LearningRate 0.0000 Epoch: 36 Global Step: 748260 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:49:07,613-Speed 6329.96 samples/sec Loss 2.6706 LearningRate 0.0000 Epoch: 36 Global Step: 748270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:10,876-Speed 6277.50 samples/sec Loss 2.6667 LearningRate 0.0000 Epoch: 36 Global Step: 748280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:14,128-Speed 6297.90 samples/sec Loss 2.7214 LearningRate 0.0000 Epoch: 36 Global Step: 748290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:17,380-Speed 6298.99 samples/sec Loss 2.6633 LearningRate 0.0000 Epoch: 36 Global Step: 748300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:20,636-Speed 6292.01 samples/sec Loss 2.7019 LearningRate 0.0000 Epoch: 36 Global Step: 748310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:23,891-Speed 6293.53 samples/sec Loss 2.6498 LearningRate 0.0000 Epoch: 36 Global Step: 748320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:27,145-Speed 6295.30 samples/sec Loss 2.7185 LearningRate 0.0000 Epoch: 36 Global Step: 748330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:30,396-Speed 6301.74 samples/sec Loss 2.7069 LearningRate 0.0000 Epoch: 36 Global Step: 748340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:33,653-Speed 6288.72 samples/sec Loss 2.7141 LearningRate 0.0000 Epoch: 36 Global Step: 748350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:36,903-Speed 6303.10 samples/sec Loss 2.7206 LearningRate 0.0000 Epoch: 36 Global Step: 748360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:40,155-Speed 6297.92 samples/sec Loss 2.6683 LearningRate 0.0000 Epoch: 36 
Global Step: 748370 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:49:43,399-Speed 6315.05 samples/sec Loss 2.6356 LearningRate 0.0000 Epoch: 36 Global Step: 748380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:46,657-Speed 6287.98 samples/sec Loss 2.6582 LearningRate 0.0000 Epoch: 36 Global Step: 748390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:49,908-Speed 6301.04 samples/sec Loss 2.7353 LearningRate 0.0000 Epoch: 36 Global Step: 748400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:53,167-Speed 6288.64 samples/sec Loss 2.7147 LearningRate 0.0000 Epoch: 36 Global Step: 748410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:56,421-Speed 6294.68 samples/sec Loss 2.7067 LearningRate 0.0000 Epoch: 36 Global Step: 748420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:49:59,731-Speed 6190.15 samples/sec Loss 2.7138 LearningRate 0.0000 Epoch: 36 Global Step: 748430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:03,013-Speed 6239.80 samples/sec Loss 2.6852 LearningRate 0.0000 Epoch: 36 Global Step: 748440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:06,266-Speed 6297.75 samples/sec Loss 2.7222 LearningRate 0.0000 Epoch: 36 Global Step: 748450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:09,515-Speed 6303.68 samples/sec Loss 2.7167 LearningRate 0.0000 Epoch: 36 Global Step: 748460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:12,765-Speed 6304.76 samples/sec Loss 2.6620 LearningRate 0.0000 Epoch: 36 Global Step: 748470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:16,005-Speed 6321.15 samples/sec Loss 2.6764 LearningRate 0.0000 Epoch: 36 Global Step: 748480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:19,257-Speed 6298.78 samples/sec Loss 2.6746 LearningRate 0.0000 Epoch: 36 Global Step: 748490 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:50:22,511-Speed 6296.63 samples/sec Loss 2.7066 LearningRate 0.0000 Epoch: 36 Global Step: 748500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:25,766-Speed 6292.99 samples/sec Loss 2.6388 LearningRate 0.0000 Epoch: 36 Global Step: 748510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:29,028-Speed 6280.13 samples/sec Loss 2.6980 LearningRate 0.0000 Epoch: 36 Global Step: 748520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:32,286-Speed 6286.58 samples/sec Loss 2.7085 LearningRate 0.0000 Epoch: 36 Global Step: 748530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:35,534-Speed 6306.80 samples/sec Loss 2.6650 LearningRate 0.0000 Epoch: 36 Global Step: 748540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:38,790-Speed 6290.79 samples/sec Loss 2.6797 LearningRate 0.0000 Epoch: 36 Global Step: 748550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:42,043-Speed 6298.38 samples/sec Loss 2.6594 LearningRate 0.0000 Epoch: 36 Global Step: 748560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:45,298-Speed 6292.06 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 36 Global Step: 748570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:48,552-Speed 6295.75 samples/sec Loss 2.6524 LearningRate 0.0000 Epoch: 36 Global Step: 748580 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:50:51,791-Speed 6326.06 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 748590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:55,047-Speed 6289.91 samples/sec Loss 2.6664 LearningRate 0.0000 Epoch: 36 Global Step: 748600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:50:58,305-Speed 6287.95 samples/sec Loss 2.6750 LearningRate 0.0000 Epoch: 36 Global Step: 748610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:01,566-Speed 6281.39 
samples/sec Loss 2.7218 LearningRate 0.0000 Epoch: 36 Global Step: 748620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:04,824-Speed 6288.93 samples/sec Loss 2.6876 LearningRate 0.0000 Epoch: 36 Global Step: 748630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:08,085-Speed 6279.81 samples/sec Loss 2.7179 LearningRate 0.0000 Epoch: 36 Global Step: 748640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:11,341-Speed 6295.50 samples/sec Loss 2.7316 LearningRate 0.0000 Epoch: 36 Global Step: 748650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:14,592-Speed 6301.19 samples/sec Loss 2.7289 LearningRate 0.0000 Epoch: 36 Global Step: 748660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:17,847-Speed 6291.73 samples/sec Loss 2.7299 LearningRate 0.0000 Epoch: 36 Global Step: 748670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:21,106-Speed 6284.96 samples/sec Loss 2.6912 LearningRate 0.0000 Epoch: 36 Global Step: 748680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:24,344-Speed 6326.56 samples/sec Loss 2.6674 LearningRate 0.0000 Epoch: 36 Global Step: 748690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:27,594-Speed 6303.16 samples/sec Loss 2.7193 LearningRate 0.0000 Epoch: 36 Global Step: 748700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:30,842-Speed 6307.58 samples/sec Loss 2.7051 LearningRate 0.0000 Epoch: 36 Global Step: 748710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:34,091-Speed 6304.67 samples/sec Loss 2.6960 LearningRate 0.0000 Epoch: 36 Global Step: 748720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:37,349-Speed 6287.44 samples/sec Loss 2.6952 LearningRate 0.0000 Epoch: 36 Global Step: 748730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:40,604-Speed 6293.52 samples/sec Loss 2.6644 LearningRate 0.0000 Epoch: 36 
Global Step: 748740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:43,862-Speed 6286.59 samples/sec Loss 2.7359 LearningRate 0.0000 Epoch: 36 Global Step: 748750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:47,111-Speed 6306.44 samples/sec Loss 2.7475 LearningRate 0.0000 Epoch: 36 Global Step: 748760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:50,359-Speed 6305.39 samples/sec Loss 2.6578 LearningRate 0.0000 Epoch: 36 Global Step: 748770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:53,623-Speed 6277.69 samples/sec Loss 2.7063 LearningRate 0.0000 Epoch: 36 Global Step: 748780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:51:56,867-Speed 6313.41 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 Global Step: 748790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:00,123-Speed 6293.32 samples/sec Loss 2.6910 LearningRate 0.0000 Epoch: 36 Global Step: 748800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:03,396-Speed 6258.55 samples/sec Loss 2.7137 LearningRate 0.0000 Epoch: 36 Global Step: 748810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:06,667-Speed 6262.44 samples/sec Loss 2.6703 LearningRate 0.0000 Epoch: 36 Global Step: 748820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:09,914-Speed 6307.92 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 36 Global Step: 748830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:13,167-Speed 6297.77 samples/sec Loss 2.6539 LearningRate 0.0000 Epoch: 36 Global Step: 748840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:16,420-Speed 6296.35 samples/sec Loss 2.6989 LearningRate 0.0000 Epoch: 36 Global Step: 748850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:19,676-Speed 6290.83 samples/sec Loss 2.6323 LearningRate 0.0000 Epoch: 36 Global Step: 748860 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:52:22,931-Speed 6294.91 samples/sec Loss 2.7080 LearningRate 0.0000 Epoch: 36 Global Step: 748870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:26,187-Speed 6291.46 samples/sec Loss 2.7602 LearningRate 0.0000 Epoch: 36 Global Step: 748880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:29,442-Speed 6293.50 samples/sec Loss 2.6959 LearningRate 0.0000 Epoch: 36 Global Step: 748890 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:52:32,679-Speed 6328.45 samples/sec Loss 2.6852 LearningRate 0.0000 Epoch: 36 Global Step: 748900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:35,932-Speed 6296.75 samples/sec Loss 2.6952 LearningRate 0.0000 Epoch: 36 Global Step: 748910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:39,185-Speed 6297.29 samples/sec Loss 2.6928 LearningRate 0.0000 Epoch: 36 Global Step: 748920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:42,441-Speed 6291.52 samples/sec Loss 2.7217 LearningRate 0.0000 Epoch: 36 Global Step: 748930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:45,698-Speed 6287.63 samples/sec Loss 2.6873 LearningRate 0.0000 Epoch: 36 Global Step: 748940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:48,945-Speed 6309.56 samples/sec Loss 2.6841 LearningRate 0.0000 Epoch: 36 Global Step: 748950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:52,197-Speed 6298.85 samples/sec Loss 2.7053 LearningRate 0.0000 Epoch: 36 Global Step: 748960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:55,451-Speed 6296.07 samples/sec Loss 2.6265 LearningRate 0.0000 Epoch: 36 Global Step: 748970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:52:58,716-Speed 6275.56 samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 36 Global Step: 748980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:01,980-Speed 6275.36 
samples/sec Loss 2.7315 LearningRate 0.0000 Epoch: 36 Global Step: 748990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:05,231-Speed 6300.40 samples/sec Loss 2.6194 LearningRate 0.0000 Epoch: 36 Global Step: 749000 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:53:08,472-Speed 6321.40 samples/sec Loss 2.6932 LearningRate 0.0000 Epoch: 36 Global Step: 749010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:11,724-Speed 6299.39 samples/sec Loss 2.6772 LearningRate 0.0000 Epoch: 36 Global Step: 749020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:14,989-Speed 6273.65 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 36 Global Step: 749030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:18,240-Speed 6301.41 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 36 Global Step: 749040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:21,488-Speed 6305.79 samples/sec Loss 2.7081 LearningRate 0.0000 Epoch: 36 Global Step: 749050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:24,757-Speed 6266.26 samples/sec Loss 2.6866 LearningRate 0.0000 Epoch: 36 Global Step: 749060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:28,014-Speed 6290.16 samples/sec Loss 2.6574 LearningRate 0.0000 Epoch: 36 Global Step: 749070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:31,267-Speed 6298.15 samples/sec Loss 2.7286 LearningRate 0.0000 Epoch: 36 Global Step: 749080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:34,515-Speed 6306.21 samples/sec Loss 2.6696 LearningRate 0.0000 Epoch: 36 Global Step: 749090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:37,779-Speed 6274.86 samples/sec Loss 2.6855 LearningRate 0.0000 Epoch: 36 Global Step: 749100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:41,018-Speed 6325.83 samples/sec Loss 2.6861 LearningRate 0.0000 Epoch: 36 
Global Step: 749110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:44,267-Speed 6304.34 samples/sec Loss 2.6300 LearningRate 0.0000 Epoch: 36 Global Step: 749120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:47,525-Speed 6287.06 samples/sec Loss 2.7122 LearningRate 0.0000 Epoch: 36 Global Step: 749130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:50,781-Speed 6291.77 samples/sec Loss 2.6554 LearningRate 0.0000 Epoch: 36 Global Step: 749140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:54,041-Speed 6283.87 samples/sec Loss 2.7113 LearningRate 0.0000 Epoch: 36 Global Step: 749150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:53:57,296-Speed 6293.97 samples/sec Loss 2.7220 LearningRate 0.0000 Epoch: 36 Global Step: 749160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:00,544-Speed 6306.33 samples/sec Loss 2.6873 LearningRate 0.0000 Epoch: 36 Global Step: 749170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:03,794-Speed 6304.45 samples/sec Loss 2.6530 LearningRate 0.0000 Epoch: 36 Global Step: 749180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:07,040-Speed 6309.44 samples/sec Loss 2.6914 LearningRate 0.0000 Epoch: 36 Global Step: 749190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:10,292-Speed 6299.28 samples/sec Loss 2.7152 LearningRate 0.0000 Epoch: 36 Global Step: 749200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:13,520-Speed 6346.55 samples/sec Loss 2.7505 LearningRate 0.0000 Epoch: 36 Global Step: 749210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:16,771-Speed 6300.33 samples/sec Loss 2.6921 LearningRate 0.0000 Epoch: 36 Global Step: 749220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:20,030-Speed 6286.12 samples/sec Loss 2.6993 LearningRate 0.0000 Epoch: 36 Global Step: 749230 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:54:23,284-Speed 6294.57 samples/sec Loss 2.6946 LearningRate 0.0000 Epoch: 36 Global Step: 749240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:26,539-Speed 6294.37 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 36 Global Step: 749250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:29,805-Speed 6271.00 samples/sec Loss 2.6925 LearningRate 0.0000 Epoch: 36 Global Step: 749260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:33,062-Speed 6290.58 samples/sec Loss 2.6577 LearningRate 0.0000 Epoch: 36 Global Step: 749270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:36,314-Speed 6297.77 samples/sec Loss 2.6822 LearningRate 0.0000 Epoch: 36 Global Step: 749280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:39,573-Speed 6286.35 samples/sec Loss 2.6959 LearningRate 0.0000 Epoch: 36 Global Step: 749290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:42,839-Speed 6272.43 samples/sec Loss 2.7121 LearningRate 0.0000 Epoch: 36 Global Step: 749300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:46,076-Speed 6327.04 samples/sec Loss 2.6977 LearningRate 0.0000 Epoch: 36 Global Step: 749310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:49,331-Speed 6294.36 samples/sec Loss 2.7015 LearningRate 0.0000 Epoch: 36 Global Step: 749320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:52,581-Speed 6302.99 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 36 Global Step: 749330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:55,840-Speed 6285.67 samples/sec Loss 2.6969 LearningRate 0.0000 Epoch: 36 Global Step: 749340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:54:59,096-Speed 6290.29 samples/sec Loss 2.6590 LearningRate 0.0000 Epoch: 36 Global Step: 749350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:02,349-Speed 6297.09 
samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 36 Global Step: 749360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:05,611-Speed 6281.75 samples/sec Loss 2.6818 LearningRate 0.0000 Epoch: 36 Global Step: 749370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:08,868-Speed 6288.18 samples/sec Loss 2.7146 LearningRate 0.0000 Epoch: 36 Global Step: 749380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:12,123-Speed 6294.62 samples/sec Loss 2.6757 LearningRate 0.0000 Epoch: 36 Global Step: 749390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:15,380-Speed 6289.83 samples/sec Loss 2.6949 LearningRate 0.0000 Epoch: 36 Global Step: 749400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:18,640-Speed 6282.67 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 36 Global Step: 749410 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:55:21,883-Speed 6317.03 samples/sec Loss 2.6988 LearningRate 0.0000 Epoch: 36 Global Step: 749420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:25,143-Speed 6284.05 samples/sec Loss 2.6952 LearningRate 0.0000 Epoch: 36 Global Step: 749430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:28,397-Speed 6294.44 samples/sec Loss 2.7093 LearningRate 0.0000 Epoch: 36 Global Step: 749440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:31,654-Speed 6289.52 samples/sec Loss 2.7546 LearningRate 0.0000 Epoch: 36 Global Step: 749450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:34,912-Speed 6288.60 samples/sec Loss 2.7010 LearningRate 0.0000 Epoch: 36 Global Step: 749460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:38,182-Speed 6263.11 samples/sec Loss 2.7185 LearningRate 0.0000 Epoch: 36 Global Step: 749470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:41,435-Speed 6298.18 samples/sec Loss 2.6754 LearningRate 0.0000 Epoch: 36 
Global Step: 749480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:44,692-Speed 6288.89 samples/sec Loss 2.6909 LearningRate 0.0000 Epoch: 36 Global Step: 749490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:47,950-Speed 6286.98 samples/sec Loss 2.7161 LearningRate 0.0000 Epoch: 36 Global Step: 749500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:51,206-Speed 6291.00 samples/sec Loss 2.6369 LearningRate 0.0000 Epoch: 36 Global Step: 749510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:54,443-Speed 6328.40 samples/sec Loss 2.7494 LearningRate 0.0000 Epoch: 36 Global Step: 749520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:55:57,704-Speed 6282.39 samples/sec Loss 2.6816 LearningRate 0.0000 Epoch: 36 Global Step: 749530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:00,959-Speed 6293.27 samples/sec Loss 2.6907 LearningRate 0.0000 Epoch: 36 Global Step: 749540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:04,212-Speed 6296.31 samples/sec Loss 2.6912 LearningRate 0.0000 Epoch: 36 Global Step: 749550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:07,466-Speed 6296.37 samples/sec Loss 2.7134 LearningRate 0.0000 Epoch: 36 Global Step: 749560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:10,715-Speed 6303.59 samples/sec Loss 2.6756 LearningRate 0.0000 Epoch: 36 Global Step: 749570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:13,973-Speed 6287.61 samples/sec Loss 2.7213 LearningRate 0.0000 Epoch: 36 Global Step: 749580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:17,235-Speed 6281.40 samples/sec Loss 2.6790 LearningRate 0.0000 Epoch: 36 Global Step: 749590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:20,491-Speed 6291.31 samples/sec Loss 2.7440 LearningRate 0.0000 Epoch: 36 Global Step: 749600 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:56:23,744-Speed 6297.78 samples/sec Loss 2.6806 LearningRate 0.0000 Epoch: 36 Global Step: 749610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:26,996-Speed 6298.48 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 36 Global Step: 749620 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:56:30,233-Speed 6327.53 samples/sec Loss 2.6672 LearningRate 0.0000 Epoch: 36 Global Step: 749630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:33,490-Speed 6290.71 samples/sec Loss 2.6708 LearningRate 0.0000 Epoch: 36 Global Step: 749640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:36,737-Speed 6307.47 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 36 Global Step: 749650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:39,991-Speed 6295.53 samples/sec Loss 2.6805 LearningRate 0.0000 Epoch: 36 Global Step: 749660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:43,243-Speed 6298.45 samples/sec Loss 2.7263 LearningRate 0.0000 Epoch: 36 Global Step: 749670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:46,494-Speed 6303.15 samples/sec Loss 2.6737 LearningRate 0.0000 Epoch: 36 Global Step: 749680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:49,748-Speed 6293.93 samples/sec Loss 2.6664 LearningRate 0.0000 Epoch: 36 Global Step: 749690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:53,010-Speed 6279.88 samples/sec Loss 2.7054 LearningRate 0.0000 Epoch: 36 Global Step: 749700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:56,260-Speed 6302.54 samples/sec Loss 2.7076 LearningRate 0.0000 Epoch: 36 Global Step: 749710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:56:59,518-Speed 6289.10 samples/sec Loss 2.7192 LearningRate 0.0000 Epoch: 36 Global Step: 749720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:02,761-Speed 6314.49 
samples/sec Loss 2.6922 LearningRate 0.0000 Epoch: 36 Global Step: 749730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:06,021-Speed 6284.51 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 36 Global Step: 749740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:09,277-Speed 6292.24 samples/sec Loss 2.6514 LearningRate 0.0000 Epoch: 36 Global Step: 749750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:12,529-Speed 6297.66 samples/sec Loss 2.7709 LearningRate 0.0000 Epoch: 36 Global Step: 749760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:15,785-Speed 6291.90 samples/sec Loss 2.7040 LearningRate 0.0000 Epoch: 36 Global Step: 749770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:19,040-Speed 6293.34 samples/sec Loss 2.6806 LearningRate 0.0000 Epoch: 36 Global Step: 749780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:22,291-Speed 6302.43 samples/sec Loss 2.6908 LearningRate 0.0000 Epoch: 36 Global Step: 749790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:25,554-Speed 6277.80 samples/sec Loss 2.7323 LearningRate 0.0000 Epoch: 36 Global Step: 749800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:28,810-Speed 6291.26 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 749810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:32,066-Speed 6291.84 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 36 Global Step: 749820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:35,306-Speed 6322.61 samples/sec Loss 2.6613 LearningRate 0.0000 Epoch: 36 Global Step: 749830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:38,568-Speed 6280.02 samples/sec Loss 2.6691 LearningRate 0.0000 Epoch: 36 Global Step: 749840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:41,824-Speed 6290.34 samples/sec Loss 2.6959 LearningRate 0.0000 Epoch: 36 
Global Step: 749850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:45,068-Speed 6314.70 samples/sec Loss 2.6852 LearningRate 0.0000 Epoch: 36 Global Step: 749860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:48,342-Speed 6257.16 samples/sec Loss 2.7123 LearningRate 0.0000 Epoch: 36 Global Step: 749870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:51,598-Speed 6290.73 samples/sec Loss 2.7256 LearningRate 0.0000 Epoch: 36 Global Step: 749880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:54,853-Speed 6293.32 samples/sec Loss 2.7165 LearningRate 0.0000 Epoch: 36 Global Step: 749890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:57:58,111-Speed 6287.55 samples/sec Loss 2.7093 LearningRate 0.0000 Epoch: 36 Global Step: 749900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:01,367-Speed 6291.03 samples/sec Loss 2.6610 LearningRate 0.0000 Epoch: 36 Global Step: 749910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:04,620-Speed 6297.43 samples/sec Loss 2.6955 LearningRate 0.0000 Epoch: 36 Global Step: 749920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:07,854-Speed 6334.99 samples/sec Loss 2.6867 LearningRate 0.0000 Epoch: 36 Global Step: 749930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:11,115-Speed 6280.56 samples/sec Loss 2.6686 LearningRate 0.0000 Epoch: 36 Global Step: 749940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:14,379-Speed 6277.15 samples/sec Loss 2.6825 LearningRate 0.0000 Epoch: 36 Global Step: 749950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:17,633-Speed 6295.50 samples/sec Loss 2.7213 LearningRate 0.0000 Epoch: 36 Global Step: 749960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:20,886-Speed 6297.15 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 36 Global Step: 749970 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 11:58:24,147-Speed 6281.20 samples/sec Loss 2.6811 LearningRate 0.0000 Epoch: 36 Global Step: 749980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:27,402-Speed 6293.94 samples/sec Loss 2.6565 LearningRate 0.0000 Epoch: 36 Global Step: 749990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:30,661-Speed 6285.31 samples/sec Loss 2.7057 LearningRate 0.0000 Epoch: 36 Global Step: 750000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:33,919-Speed 6287.98 samples/sec Loss 2.7155 LearningRate 0.0000 Epoch: 36 Global Step: 750010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:37,175-Speed 6291.64 samples/sec Loss 2.6755 LearningRate 0.0000 Epoch: 36 Global Step: 750020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:40,435-Speed 6282.84 samples/sec Loss 2.6686 LearningRate 0.0000 Epoch: 36 Global Step: 750030 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:58:43,673-Speed 6326.00 samples/sec Loss 2.6754 LearningRate 0.0000 Epoch: 36 Global Step: 750040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:46,925-Speed 6300.18 samples/sec Loss 2.7183 LearningRate 0.0000 Epoch: 36 Global Step: 750050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:50,174-Speed 6304.95 samples/sec Loss 2.6483 LearningRate 0.0000 Epoch: 36 Global Step: 750060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:53,424-Speed 6303.19 samples/sec Loss 2.6836 LearningRate 0.0000 Epoch: 36 Global Step: 750070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:56,688-Speed 6275.72 samples/sec Loss 2.7093 LearningRate 0.0000 Epoch: 36 Global Step: 750080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:58:59,940-Speed 6298.72 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 36 Global Step: 750090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:03,197-Speed 6289.26 
samples/sec Loss 2.7084 LearningRate 0.0000 Epoch: 36 Global Step: 750100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:06,456-Speed 6285.18 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 36 Global Step: 750110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:09,707-Speed 6302.04 samples/sec Loss 2.6986 LearningRate 0.0000 Epoch: 36 Global Step: 750120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:12,967-Speed 6283.34 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 36 Global Step: 750130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:16,211-Speed 6314.38 samples/sec Loss 2.6827 LearningRate 0.0000 Epoch: 36 Global Step: 750140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:19,464-Speed 6296.71 samples/sec Loss 2.6611 LearningRate 0.0000 Epoch: 36 Global Step: 750150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:22,719-Speed 6294.14 samples/sec Loss 2.7075 LearningRate 0.0000 Epoch: 36 Global Step: 750160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:25,973-Speed 6295.50 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 36 Global Step: 750170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:29,225-Speed 6298.46 samples/sec Loss 2.7477 LearningRate 0.0000 Epoch: 36 Global Step: 750180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:32,483-Speed 6287.24 samples/sec Loss 2.7301 LearningRate 0.0000 Epoch: 36 Global Step: 750190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:35,739-Speed 6292.26 samples/sec Loss 2.7003 LearningRate 0.0000 Epoch: 36 Global Step: 750200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:38,997-Speed 6288.28 samples/sec Loss 2.6739 LearningRate 0.0000 Epoch: 36 Global Step: 750210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:42,242-Speed 6312.23 samples/sec Loss 2.6893 LearningRate 0.0000 Epoch: 36 
Global Step: 750220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:45,487-Speed 6311.33 samples/sec Loss 2.6905 LearningRate 0.0000 Epoch: 36 Global Step: 750230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:48,741-Speed 6295.65 samples/sec Loss 2.7118 LearningRate 0.0000 Epoch: 36 Global Step: 750240 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 11:59:51,975-Speed 6335.46 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 750250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:55,226-Speed 6299.76 samples/sec Loss 2.6918 LearningRate 0.0000 Epoch: 36 Global Step: 750260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 11:59:58,487-Speed 6281.48 samples/sec Loss 2.7015 LearningRate 0.0000 Epoch: 36 Global Step: 750270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:01,747-Speed 6284.40 samples/sec Loss 2.6665 LearningRate 0.0000 Epoch: 36 Global Step: 750280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:05,001-Speed 6296.07 samples/sec Loss 2.6627 LearningRate 0.0000 Epoch: 36 Global Step: 750290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:08,253-Speed 6299.19 samples/sec Loss 2.6922 LearningRate 0.0000 Epoch: 36 Global Step: 750300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:11,508-Speed 6292.57 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 36 Global Step: 750310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:14,771-Speed 6278.22 samples/sec Loss 2.6901 LearningRate 0.0000 Epoch: 36 Global Step: 750320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:18,022-Speed 6301.23 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 750330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:21,275-Speed 6296.06 samples/sec Loss 2.6732 LearningRate 0.0000 Epoch: 36 Global Step: 750340 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:00:24,511-Speed 6330.58 samples/sec Loss 2.6674 LearningRate 0.0000 Epoch: 36 Global Step: 750350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:27,766-Speed 6292.73 samples/sec Loss 2.7094 LearningRate 0.0000 Epoch: 36 Global Step: 750360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:31,017-Speed 6301.81 samples/sec Loss 2.6231 LearningRate 0.0000 Epoch: 36 Global Step: 750370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:34,289-Speed 6260.93 samples/sec Loss 2.6764 LearningRate 0.0000 Epoch: 36 Global Step: 750380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:37,547-Speed 6288.06 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 36 Global Step: 750390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:40,794-Speed 6307.32 samples/sec Loss 2.6686 LearningRate 0.0000 Epoch: 36 Global Step: 750400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:44,043-Speed 6305.25 samples/sec Loss 2.6788 LearningRate 0.0000 Epoch: 36 Global Step: 750410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:47,296-Speed 6298.17 samples/sec Loss 2.6717 LearningRate 0.0000 Epoch: 36 Global Step: 750420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:50,551-Speed 6293.40 samples/sec Loss 2.7248 LearningRate 0.0000 Epoch: 36 Global Step: 750430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:53,806-Speed 6292.14 samples/sec Loss 2.6958 LearningRate 0.0000 Epoch: 36 Global Step: 750440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:00:57,061-Speed 6293.72 samples/sec Loss 2.6658 LearningRate 0.0000 Epoch: 36 Global Step: 750450 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:01:00,305-Speed 6314.65 samples/sec Loss 2.7564 LearningRate 0.0000 Epoch: 36 Global Step: 750460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:03,559-Speed 6295.75 
samples/sec Loss 2.6582 LearningRate 0.0000 Epoch: 36 Global Step: 750470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:06,812-Speed 6296.37 samples/sec Loss 2.7004 LearningRate 0.0000 Epoch: 36 Global Step: 750480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:10,062-Speed 6302.24 samples/sec Loss 2.7350 LearningRate 0.0000 Epoch: 36 Global Step: 750490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:13,315-Speed 6297.93 samples/sec Loss 2.6789 LearningRate 0.0000 Epoch: 36 Global Step: 750500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:16,564-Speed 6304.39 samples/sec Loss 2.6560 LearningRate 0.0000 Epoch: 36 Global Step: 750510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:19,820-Speed 6292.78 samples/sec Loss 2.6855 LearningRate 0.0000 Epoch: 36 Global Step: 750520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:23,072-Speed 6298.07 samples/sec Loss 2.6684 LearningRate 0.0000 Epoch: 36 Global Step: 750530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:26,334-Speed 6280.25 samples/sec Loss 2.7199 LearningRate 0.0000 Epoch: 36 Global Step: 750540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:29,594-Speed 6282.77 samples/sec Loss 2.6794 LearningRate 0.0000 Epoch: 36 Global Step: 750550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:32,840-Speed 6310.38 samples/sec Loss 2.6809 LearningRate 0.0000 Epoch: 36 Global Step: 750560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:36,098-Speed 6288.16 samples/sec Loss 2.6803 LearningRate 0.0000 Epoch: 36 Global Step: 750570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:39,361-Speed 6279.17 samples/sec Loss 2.7000 LearningRate 0.0000 Epoch: 36 Global Step: 750580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:42,684-Speed 6163.46 samples/sec Loss 2.7237 LearningRate 0.0000 Epoch: 36 
Global Step: 750590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:45,942-Speed 6288.10 samples/sec Loss 2.6709 LearningRate 0.0000 Epoch: 36 Global Step: 750600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:49,197-Speed 6292.92 samples/sec Loss 2.7027 LearningRate 0.0000 Epoch: 36 Global Step: 750610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:52,454-Speed 6289.72 samples/sec Loss 2.6571 LearningRate 0.0000 Epoch: 36 Global Step: 750620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:55,714-Speed 6282.78 samples/sec Loss 2.7004 LearningRate 0.0000 Epoch: 36 Global Step: 750630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:01:58,963-Speed 6304.87 samples/sec Loss 2.6865 LearningRate 0.0000 Epoch: 36 Global Step: 750640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:02,221-Speed 6287.23 samples/sec Loss 2.6948 LearningRate 0.0000 Epoch: 36 Global Step: 750650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:05,466-Speed 6313.25 samples/sec Loss 2.7188 LearningRate 0.0000 Epoch: 36 Global Step: 750660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:08,720-Speed 6296.03 samples/sec Loss 2.6637 LearningRate 0.0000 Epoch: 36 Global Step: 750670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:11,968-Speed 6306.32 samples/sec Loss 2.6573 LearningRate 0.0000 Epoch: 36 Global Step: 750680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:15,228-Speed 6283.54 samples/sec Loss 2.7193 LearningRate 0.0000 Epoch: 36 Global Step: 750690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:18,521-Speed 6221.06 samples/sec Loss 2.6940 LearningRate 0.0000 Epoch: 36 Global Step: 750700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:21,777-Speed 6289.89 samples/sec Loss 2.6639 LearningRate 0.0000 Epoch: 36 Global Step: 750710 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:02:25,030-Speed 6298.42 samples/sec Loss 2.6818 LearningRate 0.0000 Epoch: 36 Global Step: 750720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:28,286-Speed 6291.42 samples/sec Loss 2.6597 LearningRate 0.0000 Epoch: 36 Global Step: 750730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:31,544-Speed 6286.44 samples/sec Loss 2.6806 LearningRate 0.0000 Epoch: 36 Global Step: 750740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:34,805-Speed 6282.44 samples/sec Loss 2.6951 LearningRate 0.0000 Epoch: 36 Global Step: 750750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:38,064-Speed 6285.84 samples/sec Loss 2.7048 LearningRate 0.0000 Epoch: 36 Global Step: 750760 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:02:41,312-Speed 6306.36 samples/sec Loss 2.7160 LearningRate 0.0000 Epoch: 36 Global Step: 750770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:44,578-Speed 6272.88 samples/sec Loss 2.6693 LearningRate 0.0000 Epoch: 36 Global Step: 750780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:47,829-Speed 6300.29 samples/sec Loss 2.6466 LearningRate 0.0000 Epoch: 36 Global Step: 750790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:51,083-Speed 6295.71 samples/sec Loss 2.6900 LearningRate 0.0000 Epoch: 36 Global Step: 750800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:54,359-Speed 6252.18 samples/sec Loss 2.7368 LearningRate 0.0000 Epoch: 36 Global Step: 750810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:02:57,616-Speed 6290.88 samples/sec Loss 2.7253 LearningRate 0.0000 Epoch: 36 Global Step: 750820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:00,876-Speed 6283.44 samples/sec Loss 2.6499 LearningRate 0.0000 Epoch: 36 Global Step: 750830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:04,137-Speed 6281.80 
samples/sec Loss 2.6358 LearningRate 0.0000 Epoch: 36 Global Step: 750840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:07,392-Speed 6292.03 samples/sec Loss 2.6887 LearningRate 0.0000 Epoch: 36 Global Step: 750850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:10,648-Speed 6291.21 samples/sec Loss 2.6777 LearningRate 0.0000 Epoch: 36 Global Step: 750860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:13,889-Speed 6320.20 samples/sec Loss 2.7066 LearningRate 0.0000 Epoch: 36 Global Step: 750870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:17,150-Speed 6283.41 samples/sec Loss 2.6414 LearningRate 0.0000 Epoch: 36 Global Step: 750880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:20,409-Speed 6284.22 samples/sec Loss 2.6497 LearningRate 0.0000 Epoch: 36 Global Step: 750890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:23,672-Speed 6277.68 samples/sec Loss 2.6534 LearningRate 0.0000 Epoch: 36 Global Step: 750900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:26,929-Speed 6290.07 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 36 Global Step: 750910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:30,195-Speed 6271.42 samples/sec Loss 2.6960 LearningRate 0.0000 Epoch: 36 Global Step: 750920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:33,458-Speed 6277.83 samples/sec Loss 2.6783 LearningRate 0.0000 Epoch: 36 Global Step: 750930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:36,717-Speed 6286.25 samples/sec Loss 2.7025 LearningRate 0.0000 Epoch: 36 Global Step: 750940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:39,970-Speed 6296.63 samples/sec Loss 2.7551 LearningRate 0.0000 Epoch: 36 Global Step: 750950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:43,230-Speed 6284.33 samples/sec Loss 2.6743 LearningRate 0.0000 Epoch: 36 
Global Step: 750960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:46,487-Speed 6288.99 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 750970 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:03:49,728-Speed 6321.44 samples/sec Loss 2.6581 LearningRate 0.0000 Epoch: 36 Global Step: 750980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:52,982-Speed 6294.72 samples/sec Loss 2.7048 LearningRate 0.0000 Epoch: 36 Global Step: 750990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:56,239-Speed 6290.16 samples/sec Loss 2.7287 LearningRate 0.0000 Epoch: 36 Global Step: 751000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:03:59,493-Speed 6295.66 samples/sec Loss 2.7145 LearningRate 0.0000 Epoch: 36 Global Step: 751010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:02,750-Speed 6288.79 samples/sec Loss 2.7075 LearningRate 0.0000 Epoch: 36 Global Step: 751020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:06,009-Speed 6285.42 samples/sec Loss 2.7027 LearningRate 0.0000 Epoch: 36 Global Step: 751030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:09,266-Speed 6288.87 samples/sec Loss 2.6772 LearningRate 0.0000 Epoch: 36 Global Step: 751040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:12,516-Speed 6303.33 samples/sec Loss 2.6575 LearningRate 0.0000 Epoch: 36 Global Step: 751050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:15,775-Speed 6286.38 samples/sec Loss 2.6947 LearningRate 0.0000 Epoch: 36 Global Step: 751060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:19,028-Speed 6295.67 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 36 Global Step: 751070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:22,276-Speed 6308.11 samples/sec Loss 2.6740 LearningRate 0.0000 Epoch: 36 Global Step: 751080 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:04:25,533-Speed 6288.82 samples/sec Loss 2.6656 LearningRate 0.0000 Epoch: 36 Global Step: 751090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:28,787-Speed 6294.51 samples/sec Loss 2.6755 LearningRate 0.0000 Epoch: 36 Global Step: 751100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:32,038-Speed 6300.94 samples/sec Loss 2.7476 LearningRate 0.0000 Epoch: 36 Global Step: 751110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:35,301-Speed 6278.07 samples/sec Loss 2.6808 LearningRate 0.0000 Epoch: 36 Global Step: 751120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:38,558-Speed 6289.44 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 751130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:41,815-Speed 6290.20 samples/sec Loss 2.6994 LearningRate 0.0000 Epoch: 36 Global Step: 751140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:45,064-Speed 6304.88 samples/sec Loss 2.6902 LearningRate 0.0000 Epoch: 36 Global Step: 751150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:48,319-Speed 6292.53 samples/sec Loss 2.6983 LearningRate 0.0000 Epoch: 36 Global Step: 751160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:51,575-Speed 6291.06 samples/sec Loss 2.7213 LearningRate 0.0000 Epoch: 36 Global Step: 751170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:04:54,827-Speed 6300.32 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 36 Global Step: 751180 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:04:58,067-Speed 6322.49 samples/sec Loss 2.6652 LearningRate 0.0000 Epoch: 36 Global Step: 751190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:01,325-Speed 6287.91 samples/sec Loss 2.6927 LearningRate 0.0000 Epoch: 36 Global Step: 751200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:04,583-Speed 6286.92 
samples/sec Loss 2.6605 LearningRate 0.0000 Epoch: 36 Global Step: 751210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:07,845-Speed 6280.39 samples/sec Loss 2.6244 LearningRate 0.0000 Epoch: 36 Global Step: 751220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:11,105-Speed 6283.66 samples/sec Loss 2.6869 LearningRate 0.0000 Epoch: 36 Global Step: 751230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:14,361-Speed 6291.63 samples/sec Loss 2.6600 LearningRate 0.0000 Epoch: 36 Global Step: 751240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:17,608-Speed 6307.64 samples/sec Loss 2.6713 LearningRate 0.0000 Epoch: 36 Global Step: 751250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:20,862-Speed 6296.43 samples/sec Loss 2.6503 LearningRate 0.0000 Epoch: 36 Global Step: 751260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:24,123-Speed 6280.67 samples/sec Loss 2.6837 LearningRate 0.0000 Epoch: 36 Global Step: 751270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:27,414-Speed 6225.02 samples/sec Loss 2.6788 LearningRate 0.0000 Epoch: 36 Global Step: 751280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:30,669-Speed 6291.95 samples/sec Loss 2.6422 LearningRate 0.0000 Epoch: 36 Global Step: 751290 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:05:33,913-Speed 6316.11 samples/sec Loss 2.6763 LearningRate 0.0000 Epoch: 36 Global Step: 751300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:37,189-Speed 6251.86 samples/sec Loss 2.7039 LearningRate 0.0000 Epoch: 36 Global Step: 751310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:40,444-Speed 6293.92 samples/sec Loss 2.7522 LearningRate 0.0000 Epoch: 36 Global Step: 751320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:43,704-Speed 6283.92 samples/sec Loss 2.6864 LearningRate 0.0000 Epoch: 36 
Global Step: 751330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:46,966-Speed 6279.93 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 36 Global Step: 751340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:50,224-Speed 6287.43 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 36 Global Step: 751350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:53,480-Speed 6291.44 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 36 Global Step: 751360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:56,739-Speed 6284.32 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 36 Global Step: 751370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:05:59,997-Speed 6287.43 samples/sec Loss 2.7033 LearningRate 0.0000 Epoch: 36 Global Step: 751380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:03,251-Speed 6295.13 samples/sec Loss 2.6865 LearningRate 0.0000 Epoch: 36 Global Step: 751390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:06,493-Speed 6319.05 samples/sec Loss 2.7068 LearningRate 0.0000 Epoch: 36 Global Step: 751400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:09,755-Speed 6280.85 samples/sec Loss 2.7072 LearningRate 0.0000 Epoch: 36 Global Step: 751410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:13,015-Speed 6284.27 samples/sec Loss 2.6652 LearningRate 0.0000 Epoch: 36 Global Step: 751420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:16,275-Speed 6282.36 samples/sec Loss 2.6877 LearningRate 0.0000 Epoch: 36 Global Step: 751430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:19,528-Speed 6297.92 samples/sec Loss 2.6765 LearningRate 0.0000 Epoch: 36 Global Step: 751440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:22,792-Speed 6276.36 samples/sec Loss 2.6694 LearningRate 0.0000 Epoch: 36 Global Step: 751450 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:06:26,047-Speed 6291.93 samples/sec Loss 2.6985 LearningRate 0.0000 Epoch: 36 Global Step: 751460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:29,300-Speed 6296.74 samples/sec Loss 2.6921 LearningRate 0.0000 Epoch: 36 Global Step: 751470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:32,559-Speed 6285.52 samples/sec Loss 2.6417 LearningRate 0.0000 Epoch: 36 Global Step: 751480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:35,813-Speed 6297.15 samples/sec Loss 2.6802 LearningRate 0.0000 Epoch: 36 Global Step: 751490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:39,054-Speed 6318.42 samples/sec Loss 2.7255 LearningRate 0.0000 Epoch: 36 Global Step: 751500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:42,310-Speed 6292.60 samples/sec Loss 2.6386 LearningRate 0.0000 Epoch: 36 Global Step: 751510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:45,567-Speed 6289.06 samples/sec Loss 2.7382 LearningRate 0.0000 Epoch: 36 Global Step: 751520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:48,823-Speed 6292.08 samples/sec Loss 2.6937 LearningRate 0.0000 Epoch: 36 Global Step: 751530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:52,077-Speed 6295.09 samples/sec Loss 2.6759 LearningRate 0.0000 Epoch: 36 Global Step: 751540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:55,336-Speed 6284.30 samples/sec Loss 2.7192 LearningRate 0.0000 Epoch: 36 Global Step: 751550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:06:58,596-Speed 6284.37 samples/sec Loss 2.6930 LearningRate 0.0000 Epoch: 36 Global Step: 751560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:01,850-Speed 6295.47 samples/sec Loss 2.7085 LearningRate 0.0000 Epoch: 36 Global Step: 751570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:05,159-Speed 6189.28 
samples/sec Loss 2.7107 LearningRate 0.0000 Epoch: 36 Global Step: 751580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:08,418-Speed 6285.73 samples/sec Loss 2.7262 LearningRate 0.0000 Epoch: 36 Global Step: 751590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:11,661-Speed 6317.76 samples/sec Loss 2.6357 LearningRate 0.0000 Epoch: 36 Global Step: 751600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:14,938-Speed 6251.98 samples/sec Loss 2.6807 LearningRate 0.0000 Epoch: 36 Global Step: 751610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:18,204-Speed 6272.22 samples/sec Loss 2.6802 LearningRate 0.0000 Epoch: 36 Global Step: 751620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:21,490-Speed 6232.36 samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 36 Global Step: 751630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:24,791-Speed 6206.30 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 36 Global Step: 751640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:28,047-Speed 6291.48 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 Global Step: 751650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:31,307-Speed 6284.43 samples/sec Loss 2.6874 LearningRate 0.0000 Epoch: 36 Global Step: 751660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:34,564-Speed 6289.14 samples/sec Loss 2.6793 LearningRate 0.0000 Epoch: 36 Global Step: 751670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:37,826-Speed 6279.70 samples/sec Loss 2.7037 LearningRate 0.0000 Epoch: 36 Global Step: 751680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:41,082-Speed 6291.85 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 36 Global Step: 751690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:44,324-Speed 6317.59 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 36 
Global Step: 751700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:47,585-Speed 6281.93 samples/sec Loss 2.6531 LearningRate 0.0000 Epoch: 36 Global Step: 751710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:50,850-Speed 6273.03 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 36 Global Step: 751720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:54,118-Speed 6269.78 samples/sec Loss 2.6549 LearningRate 0.0000 Epoch: 36 Global Step: 751730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:07:57,381-Speed 6276.26 samples/sec Loss 2.6910 LearningRate 0.0000 Epoch: 36 Global Step: 751740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:00,647-Speed 6272.23 samples/sec Loss 2.6456 LearningRate 0.0000 Epoch: 36 Global Step: 751750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:03,909-Speed 6280.64 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 751760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:07,169-Speed 6283.56 samples/sec Loss 2.6926 LearningRate 0.0000 Epoch: 36 Global Step: 751770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:10,425-Speed 6290.74 samples/sec Loss 2.7096 LearningRate 0.0000 Epoch: 36 Global Step: 751780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:13,682-Speed 6291.02 samples/sec Loss 2.7427 LearningRate 0.0000 Epoch: 36 Global Step: 751790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:16,924-Speed 6317.41 samples/sec Loss 2.6643 LearningRate 0.0000 Epoch: 36 Global Step: 751800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:20,184-Speed 6284.39 samples/sec Loss 2.6536 LearningRate 0.0000 Epoch: 36 Global Step: 751810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:23,441-Speed 6289.58 samples/sec Loss 2.7346 LearningRate 0.0000 Epoch: 36 Global Step: 751820 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:08:26,696-Speed 6292.59 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 36 Global Step: 751830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:29,950-Speed 6295.69 samples/sec Loss 2.6945 LearningRate 0.0000 Epoch: 36 Global Step: 751840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:33,202-Speed 6299.75 samples/sec Loss 2.6703 LearningRate 0.0000 Epoch: 36 Global Step: 751850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:36,456-Speed 6295.02 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 36 Global Step: 751860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:39,711-Speed 6292.97 samples/sec Loss 2.6672 LearningRate 0.0000 Epoch: 36 Global Step: 751870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:42,965-Speed 6294.24 samples/sec Loss 2.6862 LearningRate 0.0000 Epoch: 36 Global Step: 751880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:46,222-Speed 6289.16 samples/sec Loss 2.6671 LearningRate 0.0000 Epoch: 36 Global Step: 751890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:08:49,477-Speed 6293.34 samples/sec Loss 2.7042 LearningRate 0.0000 Epoch: 36 Global Step: 751900 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:08:52,733-Speed 6291.35 samples/sec Loss 2.7283 LearningRate 0.0000 Epoch: 36 Global Step: 751910 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:08:55,991-Speed 6287.58 samples/sec Loss 2.6658 LearningRate 0.0000 Epoch: 36 Global Step: 751920 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:08:59,236-Speed 6313.31 samples/sec Loss 2.7008 LearningRate 0.0000 Epoch: 36 Global Step: 751930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:02,496-Speed 6283.78 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 751940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:05,757-Speed 6280.69 
samples/sec Loss 2.6247 LearningRate 0.0000 Epoch: 36 Global Step: 751950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:09,024-Speed 6270.21 samples/sec Loss 2.6869 LearningRate 0.0000 Epoch: 36 Global Step: 751960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:12,275-Speed 6301.96 samples/sec Loss 2.6964 LearningRate 0.0000 Epoch: 36 Global Step: 751970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:15,533-Speed 6286.89 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 751980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:18,789-Speed 6291.91 samples/sec Loss 2.6563 LearningRate 0.0000 Epoch: 36 Global Step: 751990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:22,047-Speed 6286.82 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 36 Global Step: 752000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:25,305-Speed 6289.67 samples/sec Loss 2.6647 LearningRate 0.0000 Epoch: 36 Global Step: 752010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:28,566-Speed 6280.26 samples/sec Loss 2.7135 LearningRate 0.0000 Epoch: 36 Global Step: 752020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:31,826-Speed 6284.69 samples/sec Loss 2.7024 LearningRate 0.0000 Epoch: 36 Global Step: 752030 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:09:35,073-Speed 6308.58 samples/sec Loss 2.6386 LearningRate 0.0000 Epoch: 36 Global Step: 752040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:38,327-Speed 6294.41 samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 36 Global Step: 752050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:41,585-Speed 6288.21 samples/sec Loss 2.6959 LearningRate 0.0000 Epoch: 36 Global Step: 752060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:44,831-Speed 6309.39 samples/sec Loss 2.5906 LearningRate 0.0000 Epoch: 36 
Global Step: 752070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:48,099-Speed 6269.34 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 752080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:51,357-Speed 6286.44 samples/sec Loss 2.6329 LearningRate 0.0000 Epoch: 36 Global Step: 752090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:54,615-Speed 6287.56 samples/sec Loss 2.7060 LearningRate 0.0000 Epoch: 36 Global Step: 752100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:09:57,874-Speed 6286.15 samples/sec Loss 2.6506 LearningRate 0.0000 Epoch: 36 Global Step: 752110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:01,134-Speed 6282.94 samples/sec Loss 2.7006 LearningRate 0.0000 Epoch: 36 Global Step: 752120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:04,401-Speed 6269.50 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 36 Global Step: 752130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:07,644-Speed 6317.02 samples/sec Loss 2.6965 LearningRate 0.0000 Epoch: 36 Global Step: 752140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:10,905-Speed 6282.32 samples/sec Loss 2.7272 LearningRate 0.0000 Epoch: 36 Global Step: 752150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:14,166-Speed 6282.28 samples/sec Loss 2.7115 LearningRate 0.0000 Epoch: 36 Global Step: 752160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:17,425-Speed 6285.30 samples/sec Loss 2.7132 LearningRate 0.0000 Epoch: 36 Global Step: 752170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:20,684-Speed 6285.03 samples/sec Loss 2.6444 LearningRate 0.0000 Epoch: 36 Global Step: 752180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:23,942-Speed 6287.86 samples/sec Loss 2.6731 LearningRate 0.0000 Epoch: 36 Global Step: 752190 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:10:27,202-Speed 6283.53 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 36 Global Step: 752200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:30,460-Speed 6287.06 samples/sec Loss 2.6809 LearningRate 0.0000 Epoch: 36 Global Step: 752210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:33,715-Speed 6294.19 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 752220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:37,019-Speed 6199.58 samples/sec Loss 2.6630 LearningRate 0.0000 Epoch: 36 Global Step: 752230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:40,258-Speed 6323.50 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 36 Global Step: 752240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:43,514-Speed 6292.16 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 36 Global Step: 752250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:46,769-Speed 6292.64 samples/sec Loss 2.6624 LearningRate 0.0000 Epoch: 36 Global Step: 752260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:50,036-Speed 6270.43 samples/sec Loss 2.6938 LearningRate 0.0000 Epoch: 36 Global Step: 752270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:53,303-Speed 6271.08 samples/sec Loss 2.6949 LearningRate 0.0000 Epoch: 36 Global Step: 752280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:56,561-Speed 6286.53 samples/sec Loss 2.6722 LearningRate 0.0000 Epoch: 36 Global Step: 752290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:10:59,828-Speed 6270.92 samples/sec Loss 2.7049 LearningRate 0.0000 Epoch: 36 Global Step: 752300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:03,089-Speed 6282.19 samples/sec Loss 2.6832 LearningRate 0.0000 Epoch: 36 Global Step: 752310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:06,350-Speed 6280.42 
samples/sec Loss 2.6999 LearningRate 0.0000 Epoch: 36 Global Step: 752320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:09,615-Speed 6274.30 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 36 Global Step: 752330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:12,856-Speed 6320.08 samples/sec Loss 2.7113 LearningRate 0.0000 Epoch: 36 Global Step: 752340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:16,109-Speed 6298.40 samples/sec Loss 2.7608 LearningRate 0.0000 Epoch: 36 Global Step: 752350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:19,371-Speed 6278.26 samples/sec Loss 2.6737 LearningRate 0.0000 Epoch: 36 Global Step: 752360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:22,637-Speed 6272.31 samples/sec Loss 2.7239 LearningRate 0.0000 Epoch: 36 Global Step: 752370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:25,897-Speed 6283.68 samples/sec Loss 2.6424 LearningRate 0.0000 Epoch: 36 Global Step: 752380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:29,156-Speed 6286.55 samples/sec Loss 2.7072 LearningRate 0.0000 Epoch: 36 Global Step: 752390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:32,409-Speed 6297.23 samples/sec Loss 2.6801 LearningRate 0.0000 Epoch: 36 Global Step: 752400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:35,683-Speed 6257.01 samples/sec Loss 2.6669 LearningRate 0.0000 Epoch: 36 Global Step: 752410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:38,952-Speed 6266.52 samples/sec Loss 2.7645 LearningRate 0.0000 Epoch: 36 Global Step: 752420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:42,213-Speed 6282.23 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 36 Global Step: 752430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:45,468-Speed 6292.36 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 36 
Global Step: 752440 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:11:48,710-Speed 6318.73 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 36 Global Step: 752450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:51,967-Speed 6288.74 samples/sec Loss 2.7420 LearningRate 0.0000 Epoch: 36 Global Step: 752460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:55,224-Speed 6289.50 samples/sec Loss 2.6804 LearningRate 0.0000 Epoch: 36 Global Step: 752470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:11:58,483-Speed 6286.54 samples/sec Loss 2.6220 LearningRate 0.0000 Epoch: 36 Global Step: 752480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:01,751-Speed 6266.97 samples/sec Loss 2.7178 LearningRate 0.0000 Epoch: 36 Global Step: 752490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:05,015-Speed 6276.90 samples/sec Loss 2.6583 LearningRate 0.0000 Epoch: 36 Global Step: 752500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:08,269-Speed 6295.38 samples/sec Loss 2.6361 LearningRate 0.0000 Epoch: 36 Global Step: 752510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:11,521-Speed 6298.68 samples/sec Loss 2.6825 LearningRate 0.0000 Epoch: 36 Global Step: 752520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:14,787-Speed 6271.86 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 36 Global Step: 752530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:18,041-Speed 6295.49 samples/sec Loss 2.6779 LearningRate 0.0000 Epoch: 36 Global Step: 752540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:21,284-Speed 6315.72 samples/sec Loss 2.6406 LearningRate 0.0000 Epoch: 36 Global Step: 752550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:24,541-Speed 6290.87 samples/sec Loss 2.7093 LearningRate 0.0000 Epoch: 36 Global Step: 752560 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:12:27,805-Speed 6275.71 samples/sec Loss 2.6584 LearningRate 0.0000 Epoch: 36 Global Step: 752570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:31,055-Speed 6301.58 samples/sec Loss 2.6841 LearningRate 0.0000 Epoch: 36 Global Step: 752580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:34,323-Speed 6268.43 samples/sec Loss 2.6882 LearningRate 0.0000 Epoch: 36 Global Step: 752590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:37,578-Speed 6294.27 samples/sec Loss 2.6542 LearningRate 0.0000 Epoch: 36 Global Step: 752600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:40,833-Speed 6292.97 samples/sec Loss 2.7257 LearningRate 0.0000 Epoch: 36 Global Step: 752610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:44,093-Speed 6284.91 samples/sec Loss 2.7029 LearningRate 0.0000 Epoch: 36 Global Step: 752620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:47,357-Speed 6275.35 samples/sec Loss 2.6668 LearningRate 0.0000 Epoch: 36 Global Step: 752630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:50,625-Speed 6268.77 samples/sec Loss 2.6848 LearningRate 0.0000 Epoch: 36 Global Step: 752640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:12:53,877-Speed 6299.81 samples/sec Loss 2.6856 LearningRate 0.0000 Epoch: 36 Global Step: 752650 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:12:57,121-Speed 6313.39 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 36 Global Step: 752660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:00,381-Speed 6283.44 samples/sec Loss 2.7567 LearningRate 0.0000 Epoch: 36 Global Step: 752670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:03,640-Speed 6287.05 samples/sec Loss 2.6983 LearningRate 0.0000 Epoch: 36 Global Step: 752680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:06,901-Speed 6280.53 
samples/sec Loss 2.6526 LearningRate 0.0000 Epoch: 36 Global Step: 752690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:10,161-Speed 6284.05 samples/sec Loss 2.6926 LearningRate 0.0000 Epoch: 36 Global Step: 752700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:13,419-Speed 6287.05 samples/sec Loss 2.7018 LearningRate 0.0000 Epoch: 36 Global Step: 752710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:16,680-Speed 6282.48 samples/sec Loss 2.7172 LearningRate 0.0000 Epoch: 36 Global Step: 752720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:19,943-Speed 6277.64 samples/sec Loss 2.7250 LearningRate 0.0000 Epoch: 36 Global Step: 752730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:23,203-Speed 6283.80 samples/sec Loss 2.6844 LearningRate 0.0000 Epoch: 36 Global Step: 752740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:26,462-Speed 6285.53 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 36 Global Step: 752750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:29,713-Speed 6300.62 samples/sec Loss 2.6921 LearningRate 0.0000 Epoch: 36 Global Step: 752760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:32,973-Speed 6283.74 samples/sec Loss 2.6924 LearningRate 0.0000 Epoch: 36 Global Step: 752770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:36,246-Speed 6258.34 samples/sec Loss 2.6445 LearningRate 0.0000 Epoch: 36 Global Step: 752780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:39,498-Speed 6299.46 samples/sec Loss 2.6508 LearningRate 0.0000 Epoch: 36 Global Step: 752790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:42,757-Speed 6285.59 samples/sec Loss 2.6835 LearningRate 0.0000 Epoch: 36 Global Step: 752800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:46,015-Speed 6288.21 samples/sec Loss 2.6750 LearningRate 0.0000 Epoch: 36 
Global Step: 752810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:49,269-Speed 6296.14 samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 752820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:52,531-Speed 6279.43 samples/sec Loss 2.6968 LearningRate 0.0000 Epoch: 36 Global Step: 752830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:55,798-Speed 6269.28 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 36 Global Step: 752840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:13:59,055-Speed 6289.95 samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 36 Global Step: 752850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:02,298-Speed 6316.26 samples/sec Loss 2.7283 LearningRate 0.0000 Epoch: 36 Global Step: 752860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:05,552-Speed 6296.08 samples/sec Loss 2.7079 LearningRate 0.0000 Epoch: 36 Global Step: 752870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:08,808-Speed 6290.02 samples/sec Loss 2.6786 LearningRate 0.0000 Epoch: 36 Global Step: 752880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:12,065-Speed 6289.89 samples/sec Loss 2.6461 LearningRate 0.0000 Epoch: 36 Global Step: 752890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:15,320-Speed 6294.33 samples/sec Loss 2.6561 LearningRate 0.0000 Epoch: 36 Global Step: 752900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:18,580-Speed 6282.54 samples/sec Loss 2.6808 LearningRate 0.0000 Epoch: 36 Global Step: 752910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:21,836-Speed 6291.44 samples/sec Loss 2.7405 LearningRate 0.0000 Epoch: 36 Global Step: 752920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:25,093-Speed 6290.55 samples/sec Loss 2.6906 LearningRate 0.0000 Epoch: 36 Global Step: 752930 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:14:28,348-Speed 6292.15 samples/sec Loss 2.6605 LearningRate 0.0000 Epoch: 36 Global Step: 752940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:31,608-Speed 6283.13 samples/sec Loss 2.7106 LearningRate 0.0000 Epoch: 36 Global Step: 752950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:34,848-Speed 6322.05 samples/sec Loss 2.6665 LearningRate 0.0000 Epoch: 36 Global Step: 752960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:38,112-Speed 6276.23 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 Global Step: 752970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:41,369-Speed 6289.85 samples/sec Loss 2.7209 LearningRate 0.0000 Epoch: 36 Global Step: 752980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:44,625-Speed 6291.73 samples/sec Loss 2.7153 LearningRate 0.0000 Epoch: 36 Global Step: 752990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:47,880-Speed 6293.79 samples/sec Loss 2.7334 LearningRate 0.0000 Epoch: 36 Global Step: 753000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:51,138-Speed 6286.76 samples/sec Loss 2.6818 LearningRate 0.0000 Epoch: 36 Global Step: 753010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:54,397-Speed 6286.36 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 36 Global Step: 753020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:14:57,654-Speed 6290.01 samples/sec Loss 2.6558 LearningRate 0.0000 Epoch: 36 Global Step: 753030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:00,908-Speed 6294.03 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 753040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:04,164-Speed 6292.18 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 36 Global Step: 753050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:07,425-Speed 6281.08 
samples/sec Loss 2.6420 LearningRate 0.0000 Epoch: 36 Global Step: 753060 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:15:10,667-Speed 6318.15 samples/sec Loss 2.6783 LearningRate 0.0000 Epoch: 36 Global Step: 753070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:13,930-Speed 6278.55 samples/sec Loss 2.5990 LearningRate 0.0000 Epoch: 36 Global Step: 753080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:17,184-Speed 6295.65 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 36 Global Step: 753090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:20,448-Speed 6275.71 samples/sec Loss 2.6660 LearningRate 0.0000 Epoch: 36 Global Step: 753100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:23,706-Speed 6285.96 samples/sec Loss 2.6261 LearningRate 0.0000 Epoch: 36 Global Step: 753110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:26,961-Speed 6293.78 samples/sec Loss 2.6270 LearningRate 0.0000 Epoch: 36 Global Step: 753120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:30,223-Speed 6279.67 samples/sec Loss 2.6372 LearningRate 0.0000 Epoch: 36 Global Step: 753130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:33,477-Speed 6295.04 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 Global Step: 753140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:36,730-Speed 6296.93 samples/sec Loss 2.6987 LearningRate 0.0000 Epoch: 36 Global Step: 753150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:39,985-Speed 6294.80 samples/sec Loss 2.6964 LearningRate 0.0000 Epoch: 36 Global Step: 753160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:43,221-Speed 6330.40 samples/sec Loss 2.7090 LearningRate 0.0000 Epoch: 36 Global Step: 753170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:46,477-Speed 6290.00 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 
Global Step: 753180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:49,737-Speed 6284.47 samples/sec Loss 2.6375 LearningRate 0.0000 Epoch: 36 Global Step: 753190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:53,001-Speed 6275.73 samples/sec Loss 2.6347 LearningRate 0.0000 Epoch: 36 Global Step: 753200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:56,254-Speed 6298.35 samples/sec Loss 2.7295 LearningRate 0.0000 Epoch: 36 Global Step: 753210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:15:59,511-Speed 6287.98 samples/sec Loss 2.6916 LearningRate 0.0000 Epoch: 36 Global Step: 753220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:02,773-Speed 6280.20 samples/sec Loss 2.6568 LearningRate 0.0000 Epoch: 36 Global Step: 753230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:06,024-Speed 6300.62 samples/sec Loss 2.6961 LearningRate 0.0000 Epoch: 36 Global Step: 753240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:09,284-Speed 6283.56 samples/sec Loss 2.7142 LearningRate 0.0000 Epoch: 36 Global Step: 753250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:12,541-Speed 6290.61 samples/sec Loss 2.6849 LearningRate 0.0000 Epoch: 36 Global Step: 753260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:15,785-Speed 6313.26 samples/sec Loss 2.6741 LearningRate 0.0000 Epoch: 36 Global Step: 753270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:19,046-Speed 6282.51 samples/sec Loss 2.7016 LearningRate 0.0000 Epoch: 36 Global Step: 753280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:22,310-Speed 6276.47 samples/sec Loss 2.7052 LearningRate 0.0000 Epoch: 36 Global Step: 753290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:25,619-Speed 6189.22 samples/sec Loss 2.6887 LearningRate 0.0000 Epoch: 36 Global Step: 753300 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:16:28,880-Speed 6282.78 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 36 Global Step: 753310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:32,136-Speed 6290.76 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 36 Global Step: 753320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:35,390-Speed 6295.61 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 753330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:38,640-Speed 6301.82 samples/sec Loss 2.6400 LearningRate 0.0000 Epoch: 36 Global Step: 753340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:41,942-Speed 6204.82 samples/sec Loss 2.6539 LearningRate 0.0000 Epoch: 36 Global Step: 753350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:45,201-Speed 6285.95 samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 36 Global Step: 753360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:48,455-Speed 6293.32 samples/sec Loss 2.6639 LearningRate 0.0000 Epoch: 36 Global Step: 753370 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:16:51,697-Speed 6320.14 samples/sec Loss 2.6498 LearningRate 0.0000 Epoch: 36 Global Step: 753380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:54,958-Speed 6281.56 samples/sec Loss 2.6573 LearningRate 0.0000 Epoch: 36 Global Step: 753390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:16:58,218-Speed 6284.40 samples/sec Loss 2.7156 LearningRate 0.0000 Epoch: 36 Global Step: 753400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:01,483-Speed 6276.10 samples/sec Loss 2.7006 LearningRate 0.0000 Epoch: 36 Global Step: 753410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:04,739-Speed 6291.78 samples/sec Loss 2.6495 LearningRate 0.0000 Epoch: 36 Global Step: 753420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:07,996-Speed 6288.62 
samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 36 Global Step: 753430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:11,255-Speed 6285.01 samples/sec Loss 2.6458 LearningRate 0.0000 Epoch: 36 Global Step: 753440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:14,511-Speed 6291.79 samples/sec Loss 2.6990 LearningRate 0.0000 Epoch: 36 Global Step: 753450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:17,770-Speed 6285.74 samples/sec Loss 2.6699 LearningRate 0.0000 Epoch: 36 Global Step: 753460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:21,027-Speed 6290.03 samples/sec Loss 2.6304 LearningRate 0.0000 Epoch: 36 Global Step: 753470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:24,282-Speed 6293.30 samples/sec Loss 2.6636 LearningRate 0.0000 Epoch: 36 Global Step: 753480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:27,547-Speed 6273.11 samples/sec Loss 2.7098 LearningRate 0.0000 Epoch: 36 Global Step: 753490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:30,809-Speed 6283.34 samples/sec Loss 2.7062 LearningRate 0.0000 Epoch: 36 Global Step: 753500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:34,072-Speed 6278.09 samples/sec Loss 2.7247 LearningRate 0.0000 Epoch: 36 Global Step: 753510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:37,340-Speed 6267.86 samples/sec Loss 2.6489 LearningRate 0.0000 Epoch: 36 Global Step: 753520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:40,604-Speed 6275.74 samples/sec Loss 2.6416 LearningRate 0.0000 Epoch: 36 Global Step: 753530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:43,861-Speed 6289.05 samples/sec Loss 2.6440 LearningRate 0.0000 Epoch: 36 Global Step: 753540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:47,119-Speed 6287.35 samples/sec Loss 2.6648 LearningRate 0.0000 Epoch: 36 
Global Step: 753550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:50,382-Speed 6277.85 samples/sec Loss 2.6633 LearningRate 0.0000 Epoch: 36 Global Step: 753560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:53,637-Speed 6294.09 samples/sec Loss 2.6060 LearningRate 0.0000 Epoch: 36 Global Step: 753570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:17:56,879-Speed 6318.78 samples/sec Loss 2.6504 LearningRate 0.0000 Epoch: 36 Global Step: 753580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:00,135-Speed 6291.71 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 36 Global Step: 753590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:03,393-Speed 6287.53 samples/sec Loss 2.7163 LearningRate 0.0000 Epoch: 36 Global Step: 753600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:06,653-Speed 6284.79 samples/sec Loss 2.6519 LearningRate 0.0000 Epoch: 36 Global Step: 753610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:09,913-Speed 6282.17 samples/sec Loss 2.6917 LearningRate 0.0000 Epoch: 36 Global Step: 753620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:13,166-Speed 6297.30 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 36 Global Step: 753630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:16,428-Speed 6280.43 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 36 Global Step: 753640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:19,679-Speed 6300.31 samples/sec Loss 2.6736 LearningRate 0.0000 Epoch: 36 Global Step: 753650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:22,939-Speed 6284.68 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 36 Global Step: 753660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:26,195-Speed 6290.75 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 36 Global Step: 753670 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:18:29,438-Speed 6315.60 samples/sec Loss 2.7220 LearningRate 0.0000 Epoch: 36 Global Step: 753680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:32,696-Speed 6289.45 samples/sec Loss 2.6785 LearningRate 0.0000 Epoch: 36 Global Step: 753690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:35,958-Speed 6278.41 samples/sec Loss 2.7204 LearningRate 0.0000 Epoch: 36 Global Step: 753700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:39,222-Speed 6275.91 samples/sec Loss 2.6491 LearningRate 0.0000 Epoch: 36 Global Step: 753710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:42,477-Speed 6294.12 samples/sec Loss 2.6938 LearningRate 0.0000 Epoch: 36 Global Step: 753720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:45,736-Speed 6285.39 samples/sec Loss 2.7177 LearningRate 0.0000 Epoch: 36 Global Step: 753730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:49,028-Speed 6221.21 samples/sec Loss 2.6739 LearningRate 0.0000 Epoch: 36 Global Step: 753740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:52,289-Speed 6282.96 samples/sec Loss 2.6206 LearningRate 0.0000 Epoch: 36 Global Step: 753750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:55,550-Speed 6280.73 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 36 Global Step: 753760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:18:58,811-Speed 6282.71 samples/sec Loss 2.6788 LearningRate 0.0000 Epoch: 36 Global Step: 753770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:02,071-Speed 6282.61 samples/sec Loss 2.7274 LearningRate 0.0000 Epoch: 36 Global Step: 753780 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:19:05,313-Speed 6320.30 samples/sec Loss 2.6514 LearningRate 0.0000 Epoch: 36 Global Step: 753790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:08,567-Speed 6294.95 
samples/sec Loss 2.6914 LearningRate 0.0000 Epoch: 36 Global Step: 753800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:11,825-Speed 6286.30 samples/sec Loss 2.6840 LearningRate 0.0000 Epoch: 36 Global Step: 753810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:15,077-Speed 6299.62 samples/sec Loss 2.6992 LearningRate 0.0000 Epoch: 36 Global Step: 753820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:18,342-Speed 6274.59 samples/sec Loss 2.7272 LearningRate 0.0000 Epoch: 36 Global Step: 753830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:21,604-Speed 6280.44 samples/sec Loss 2.6957 LearningRate 0.0000 Epoch: 36 Global Step: 753840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:24,927-Speed 6163.35 samples/sec Loss 2.6768 LearningRate 0.0000 Epoch: 36 Global Step: 753850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:28,189-Speed 6280.13 samples/sec Loss 2.6738 LearningRate 0.0000 Epoch: 36 Global Step: 753860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:31,442-Speed 6297.11 samples/sec Loss 2.6762 LearningRate 0.0000 Epoch: 36 Global Step: 753870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:34,695-Speed 6298.06 samples/sec Loss 2.6702 LearningRate 0.0000 Epoch: 36 Global Step: 753880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:37,939-Speed 6314.48 samples/sec Loss 2.6708 LearningRate 0.0000 Epoch: 36 Global Step: 753890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:41,206-Speed 6268.69 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 36 Global Step: 753900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:44,471-Speed 6277.76 samples/sec Loss 2.6524 LearningRate 0.0000 Epoch: 36 Global Step: 753910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:47,718-Speed 6308.70 samples/sec Loss 2.6388 LearningRate 0.0000 Epoch: 36 
Global Step: 753920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:50,974-Speed 6291.71 samples/sec Loss 2.6945 LearningRate 0.0000 Epoch: 36 Global Step: 753930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:54,231-Speed 6290.01 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 753940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:19:57,527-Speed 6213.81 samples/sec Loss 2.6710 LearningRate 0.0000 Epoch: 36 Global Step: 753950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:00,808-Speed 6244.94 samples/sec Loss 2.6929 LearningRate 0.0000 Epoch: 36 Global Step: 753960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:04,070-Speed 6278.79 samples/sec Loss 2.6992 LearningRate 0.0000 Epoch: 36 Global Step: 753970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:07,328-Speed 6287.94 samples/sec Loss 2.6987 LearningRate 0.0000 Epoch: 36 Global Step: 753980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:10,582-Speed 6294.99 samples/sec Loss 2.6584 LearningRate 0.0000 Epoch: 36 Global Step: 753990 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:20:13,823-Speed 6321.37 samples/sec Loss 2.6884 LearningRate 0.0000 Epoch: 36 Global Step: 754000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:17,079-Speed 6290.77 samples/sec Loss 2.6934 LearningRate 0.0000 Epoch: 36 Global Step: 754010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:20,338-Speed 6285.91 samples/sec Loss 2.6465 LearningRate 0.0000 Epoch: 36 Global Step: 754020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:23,604-Speed 6271.76 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 36 Global Step: 754030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:26,866-Speed 6279.14 samples/sec Loss 2.6682 LearningRate 0.0000 Epoch: 36 Global Step: 754040 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:20:30,233-Speed 6085.04 samples/sec Loss 2.6543 LearningRate 0.0000 Epoch: 36 Global Step: 754050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:33,495-Speed 6279.20 samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 754060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:36,758-Speed 6278.20 samples/sec Loss 2.6771 LearningRate 0.0000 Epoch: 36 Global Step: 754070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:40,018-Speed 6282.59 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 36 Global Step: 754080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:43,273-Speed 6293.04 samples/sec Loss 2.6836 LearningRate 0.0000 Epoch: 36 Global Step: 754090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:46,517-Speed 6315.55 samples/sec Loss 2.7111 LearningRate 0.0000 Epoch: 36 Global Step: 754100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:49,770-Speed 6296.83 samples/sec Loss 2.6340 LearningRate 0.0000 Epoch: 36 Global Step: 754110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:53,028-Speed 6288.01 samples/sec Loss 2.6720 LearningRate 0.0000 Epoch: 36 Global Step: 754120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:56,285-Speed 6288.67 samples/sec Loss 2.6714 LearningRate 0.0000 Epoch: 36 Global Step: 754130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:20:59,540-Speed 6293.23 samples/sec Loss 2.7621 LearningRate 0.0000 Epoch: 36 Global Step: 754140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:02,803-Speed 6278.88 samples/sec Loss 2.6751 LearningRate 0.0000 Epoch: 36 Global Step: 754150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:06,058-Speed 6293.13 samples/sec Loss 2.6522 LearningRate 0.0000 Epoch: 36 Global Step: 754160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:09,310-Speed 6298.56 
samples/sec Loss 2.6853 LearningRate 0.0000 Epoch: 36 Global Step: 754170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:12,568-Speed 6287.88 samples/sec Loss 2.6339 LearningRate 0.0000 Epoch: 36 Global Step: 754180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:15,825-Speed 6288.98 samples/sec Loss 2.6556 LearningRate 0.0000 Epoch: 36 Global Step: 754190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:19,072-Speed 6310.32 samples/sec Loss 2.6450 LearningRate 0.0000 Epoch: 36 Global Step: 754200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:22,338-Speed 6271.71 samples/sec Loss 2.6106 LearningRate 0.0000 Epoch: 36 Global Step: 754210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:25,597-Speed 6285.05 samples/sec Loss 2.7213 LearningRate 0.0000 Epoch: 36 Global Step: 754220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:28,854-Speed 6289.00 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 36 Global Step: 754230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:32,129-Speed 6255.65 samples/sec Loss 2.7106 LearningRate 0.0000 Epoch: 36 Global Step: 754240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:35,377-Speed 6307.28 samples/sec Loss 2.6471 LearningRate 0.0000 Epoch: 36 Global Step: 754250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:38,639-Speed 6279.11 samples/sec Loss 2.7353 LearningRate 0.0000 Epoch: 36 Global Step: 754260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:41,901-Speed 6279.63 samples/sec Loss 2.6972 LearningRate 0.0000 Epoch: 36 Global Step: 754270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:45,164-Speed 6278.06 samples/sec Loss 2.6691 LearningRate 0.0000 Epoch: 36 Global Step: 754280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:48,427-Speed 6278.09 samples/sec Loss 2.6950 LearningRate 0.0000 Epoch: 36 
Global Step: 754290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:51,690-Speed 6277.04 samples/sec Loss 2.7035 LearningRate 0.0000 Epoch: 36 Global Step: 754300 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:21:54,931-Speed 6321.77 samples/sec Loss 2.6436 LearningRate 0.0000 Epoch: 36 Global Step: 754310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:21:58,184-Speed 6296.19 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 36 Global Step: 754320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:01,442-Speed 6288.03 samples/sec Loss 2.6424 LearningRate 0.0000 Epoch: 36 Global Step: 754330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:04,699-Speed 6288.63 samples/sec Loss 2.6832 LearningRate 0.0000 Epoch: 36 Global Step: 754340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:07,956-Speed 6289.89 samples/sec Loss 2.6220 LearningRate 0.0000 Epoch: 36 Global Step: 754350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:11,215-Speed 6285.19 samples/sec Loss 2.6371 LearningRate 0.0000 Epoch: 36 Global Step: 754360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:14,470-Speed 6293.08 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 36 Global Step: 754370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:17,730-Speed 6283.84 samples/sec Loss 2.6583 LearningRate 0.0000 Epoch: 36 Global Step: 754380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:20,997-Speed 6270.41 samples/sec Loss 2.6865 LearningRate 0.0000 Epoch: 36 Global Step: 754390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:24,261-Speed 6276.59 samples/sec Loss 2.6635 LearningRate 0.0000 Epoch: 36 Global Step: 754400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:27,515-Speed 6294.57 samples/sec Loss 2.6639 LearningRate 0.0000 Epoch: 36 Global Step: 754410 Fp16 Grad Scale: 8192 Required: 7 
hours Training: 2022-04-03 12:22:30,758-Speed 6318.55 samples/sec Loss 2.7229 LearningRate 0.0000 Epoch: 36 Global Step: 754420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:34,020-Speed 6278.72 samples/sec Loss 2.6755 LearningRate 0.0000 Epoch: 36 Global Step: 754430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:37,285-Speed 6273.81 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 36 Global Step: 754440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:40,534-Speed 6305.18 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 36 Global Step: 754450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:43,795-Speed 6282.42 samples/sec Loss 2.6331 LearningRate 0.0000 Epoch: 36 Global Step: 754460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:47,049-Speed 6293.76 samples/sec Loss 2.6963 LearningRate 0.0000 Epoch: 36 Global Step: 754470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:50,307-Speed 6288.54 samples/sec Loss 2.6595 LearningRate 0.0000 Epoch: 36 Global Step: 754480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:53,566-Speed 6284.51 samples/sec Loss 2.6553 LearningRate 0.0000 Epoch: 36 Global Step: 754490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:22:56,825-Speed 6285.59 samples/sec Loss 2.7050 LearningRate 0.0000 Epoch: 36 Global Step: 754500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:00,082-Speed 6290.20 samples/sec Loss 2.6589 LearningRate 0.0000 Epoch: 36 Global Step: 754510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:03,314-Speed 6337.35 samples/sec Loss 2.6948 LearningRate 0.0000 Epoch: 36 Global Step: 754520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:06,587-Speed 6259.95 samples/sec Loss 2.6984 LearningRate 0.0000 Epoch: 36 Global Step: 754530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:09,843-Speed 6290.15 
samples/sec Loss 2.6966 LearningRate 0.0000 Epoch: 36 Global Step: 754540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:13,105-Speed 6284.12 samples/sec Loss 2.6610 LearningRate 0.0000 Epoch: 36 Global Step: 754550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:16,367-Speed 6278.39 samples/sec Loss 2.6696 LearningRate 0.0000 Epoch: 36 Global Step: 754560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:19,629-Speed 6280.84 samples/sec Loss 2.6958 LearningRate 0.0000 Epoch: 36 Global Step: 754570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:22,885-Speed 6289.70 samples/sec Loss 2.6601 LearningRate 0.0000 Epoch: 36 Global Step: 754580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:26,146-Speed 6283.63 samples/sec Loss 2.6492 LearningRate 0.0000 Epoch: 36 Global Step: 754590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:29,396-Speed 6302.54 samples/sec Loss 2.6505 LearningRate 0.0000 Epoch: 36 Global Step: 754600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:32,651-Speed 6293.27 samples/sec Loss 2.6599 LearningRate 0.0000 Epoch: 36 Global Step: 754610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:35,887-Speed 6331.68 samples/sec Loss 2.6924 LearningRate 0.0000 Epoch: 36 Global Step: 754620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:39,144-Speed 6288.71 samples/sec Loss 2.6952 LearningRate 0.0000 Epoch: 36 Global Step: 754630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:42,399-Speed 6292.14 samples/sec Loss 2.6582 LearningRate 0.0000 Epoch: 36 Global Step: 754640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:45,659-Speed 6284.29 samples/sec Loss 2.7280 LearningRate 0.0000 Epoch: 36 Global Step: 754650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:48,922-Speed 6277.22 samples/sec Loss 2.6678 LearningRate 0.0000 Epoch: 36 
Global Step: 754660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:52,178-Speed 6291.12 samples/sec Loss 2.6945 LearningRate 0.0000 Epoch: 36 Global Step: 754670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:55,434-Speed 6292.93 samples/sec Loss 2.6855 LearningRate 0.0000 Epoch: 36 Global Step: 754680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:23:58,688-Speed 6294.82 samples/sec Loss 2.7005 LearningRate 0.0000 Epoch: 36 Global Step: 754690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:01,943-Speed 6292.77 samples/sec Loss 2.6617 LearningRate 0.0000 Epoch: 36 Global Step: 754700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:05,201-Speed 6287.89 samples/sec Loss 2.7030 LearningRate 0.0000 Epoch: 36 Global Step: 754710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:08,452-Speed 6299.72 samples/sec Loss 2.7007 LearningRate 0.0000 Epoch: 36 Global Step: 754720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:11,800-Speed 6118.94 samples/sec Loss 2.7133 LearningRate 0.0000 Epoch: 36 Global Step: 754730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:15,082-Speed 6242.71 samples/sec Loss 2.6954 LearningRate 0.0000 Epoch: 36 Global Step: 754740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:18,346-Speed 6275.76 samples/sec Loss 2.6559 LearningRate 0.0000 Epoch: 36 Global Step: 754750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:21,605-Speed 6284.58 samples/sec Loss 2.6567 LearningRate 0.0000 Epoch: 36 Global Step: 754760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:24,877-Speed 6260.32 samples/sec Loss 2.6992 LearningRate 0.0000 Epoch: 36 Global Step: 754770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:28,145-Speed 6267.76 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 36 Global Step: 754780 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:24:31,403-Speed 6288.06 samples/sec Loss 2.6833 LearningRate 0.0000 Epoch: 36 Global Step: 754790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:34,666-Speed 6278.67 samples/sec Loss 2.7011 LearningRate 0.0000 Epoch: 36 Global Step: 754800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:37,920-Speed 6296.08 samples/sec Loss 2.6352 LearningRate 0.0000 Epoch: 36 Global Step: 754810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:41,181-Speed 6282.17 samples/sec Loss 2.7043 LearningRate 0.0000 Epoch: 36 Global Step: 754820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:24:44,424-Speed 6314.78 samples/sec Loss 2.6811 LearningRate 0.0000 Epoch: 36 Global Step: 754830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:47,687-Speed 6278.74 samples/sec Loss 2.7003 LearningRate 0.0000 Epoch: 36 Global Step: 754840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:50,945-Speed 6287.39 samples/sec Loss 2.7171 LearningRate 0.0000 Epoch: 36 Global Step: 754850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:54,202-Speed 6288.48 samples/sec Loss 2.6551 LearningRate 0.0000 Epoch: 36 Global Step: 754860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:24:57,459-Speed 6289.35 samples/sec Loss 2.6547 LearningRate 0.0000 Epoch: 36 Global Step: 754870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:00,711-Speed 6300.25 samples/sec Loss 2.7091 LearningRate 0.0000 Epoch: 36 Global Step: 754880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:03,971-Speed 6283.36 samples/sec Loss 2.6577 LearningRate 0.0000 Epoch: 36 Global Step: 754890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:07,234-Speed 6277.35 samples/sec Loss 2.6589 LearningRate 0.0000 Epoch: 36 Global Step: 754900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:10,496-Speed 6280.09 
samples/sec Loss 2.6928 LearningRate 0.0000 Epoch: 36 Global Step: 754910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:13,762-Speed 6271.23 samples/sec Loss 2.6901 LearningRate 0.0000 Epoch: 36 Global Step: 754920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:17,013-Speed 6301.25 samples/sec Loss 2.6724 LearningRate 0.0000 Epoch: 36 Global Step: 754930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:20,272-Speed 6285.63 samples/sec Loss 2.6453 LearningRate 0.0000 Epoch: 36 Global Step: 754940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:23,530-Speed 6287.39 samples/sec Loss 2.6852 LearningRate 0.0000 Epoch: 36 Global Step: 754950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:26,789-Speed 6289.43 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 36 Global Step: 754960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:30,050-Speed 6281.37 samples/sec Loss 2.6435 LearningRate 0.0000 Epoch: 36 Global Step: 754970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:33,309-Speed 6285.95 samples/sec Loss 2.6576 LearningRate 0.0000 Epoch: 36 Global Step: 754980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:36,569-Speed 6287.17 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 36 Global Step: 754990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:39,825-Speed 6291.57 samples/sec Loss 2.6794 LearningRate 0.0000 Epoch: 36 Global Step: 755000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:43,087-Speed 6281.26 samples/sec Loss 2.6922 LearningRate 0.0000 Epoch: 36 Global Step: 755010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:46,343-Speed 6290.01 samples/sec Loss 2.6905 LearningRate 0.0000 Epoch: 36 Global Step: 755020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:49,582-Speed 6324.38 samples/sec Loss 2.6663 LearningRate 0.0000 Epoch: 36 
Global Step: 755030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:52,845-Speed 6277.32 samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 36 Global Step: 755040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:56,102-Speed 6289.82 samples/sec Loss 2.7024 LearningRate 0.0000 Epoch: 36 Global Step: 755050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:25:59,358-Speed 6292.00 samples/sec Loss 2.6101 LearningRate 0.0000 Epoch: 36 Global Step: 755060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:02,619-Speed 6281.85 samples/sec Loss 2.6285 LearningRate 0.0000 Epoch: 36 Global Step: 755070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:05,879-Speed 6282.65 samples/sec Loss 2.6385 LearningRate 0.0000 Epoch: 36 Global Step: 755080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:09,139-Speed 6285.18 samples/sec Loss 2.6989 LearningRate 0.0000 Epoch: 36 Global Step: 755090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:12,391-Speed 6298.41 samples/sec Loss 2.6694 LearningRate 0.0000 Epoch: 36 Global Step: 755100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:15,649-Speed 6288.03 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 755110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:18,908-Speed 6285.68 samples/sec Loss 2.6927 LearningRate 0.0000 Epoch: 36 Global Step: 755120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:22,145-Speed 6327.03 samples/sec Loss 2.6819 LearningRate 0.0000 Epoch: 36 Global Step: 755130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:25,400-Speed 6294.30 samples/sec Loss 2.6317 LearningRate 0.0000 Epoch: 36 Global Step: 755140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:28,675-Speed 6253.82 samples/sec Loss 2.6488 LearningRate 0.0000 Epoch: 36 Global Step: 755150 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:26:31,935-Speed 6282.94 samples/sec Loss 2.7226 LearningRate 0.0000 Epoch: 36 Global Step: 755160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:35,195-Speed 6284.33 samples/sec Loss 2.6482 LearningRate 0.0000 Epoch: 36 Global Step: 755170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:38,454-Speed 6285.80 samples/sec Loss 2.6707 LearningRate 0.0000 Epoch: 36 Global Step: 755180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:41,718-Speed 6276.37 samples/sec Loss 2.6692 LearningRate 0.0000 Epoch: 36 Global Step: 755190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:44,979-Speed 6282.68 samples/sec Loss 2.7204 LearningRate 0.0000 Epoch: 36 Global Step: 755200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:48,239-Speed 6282.65 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 36 Global Step: 755210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:51,496-Speed 6290.65 samples/sec Loss 2.6454 LearningRate 0.0000 Epoch: 36 Global Step: 755220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:54,739-Speed 6316.33 samples/sec Loss 2.6537 LearningRate 0.0000 Epoch: 36 Global Step: 755230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:26:57,995-Speed 6292.19 samples/sec Loss 2.6979 LearningRate 0.0000 Epoch: 36 Global Step: 755240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:01,262-Speed 6270.04 samples/sec Loss 2.7012 LearningRate 0.0000 Epoch: 36 Global Step: 755250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:04,518-Speed 6290.26 samples/sec Loss 2.6570 LearningRate 0.0000 Epoch: 36 Global Step: 755260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:07,775-Speed 6290.01 samples/sec Loss 2.6939 LearningRate 0.0000 Epoch: 36 Global Step: 755270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:11,043-Speed 6269.26 
samples/sec Loss 2.6182 LearningRate 0.0000 Epoch: 36 Global Step: 755280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:14,298-Speed 6291.52 samples/sec Loss 2.6665 LearningRate 0.0000 Epoch: 36 Global Step: 755290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:17,559-Speed 6283.35 samples/sec Loss 2.6431 LearningRate 0.0000 Epoch: 36 Global Step: 755300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:20,819-Speed 6282.88 samples/sec Loss 2.6918 LearningRate 0.0000 Epoch: 36 Global Step: 755310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:24,075-Speed 6291.71 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 36 Global Step: 755320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:27,323-Speed 6305.64 samples/sec Loss 2.6350 LearningRate 0.0000 Epoch: 36 Global Step: 755330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:30,589-Speed 6272.35 samples/sec Loss 2.6903 LearningRate 0.0000 Epoch: 36 Global Step: 755340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:33,847-Speed 6288.08 samples/sec Loss 2.7040 LearningRate 0.0000 Epoch: 36 Global Step: 755350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:37,109-Speed 6280.28 samples/sec Loss 2.6972 LearningRate 0.0000 Epoch: 36 Global Step: 755360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:40,370-Speed 6281.51 samples/sec Loss 2.6230 LearningRate 0.0000 Epoch: 36 Global Step: 755370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:43,629-Speed 6286.16 samples/sec Loss 2.6802 LearningRate 0.0000 Epoch: 36 Global Step: 755380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:46,906-Speed 6250.51 samples/sec Loss 2.6942 LearningRate 0.0000 Epoch: 36 Global Step: 755390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:50,169-Speed 6278.92 samples/sec Loss 2.6714 LearningRate 0.0000 Epoch: 36 
Global Step: 755400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:53,439-Speed 6264.39 samples/sec Loss 2.6449 LearningRate 0.0000 Epoch: 36 Global Step: 755410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:56,696-Speed 6288.26 samples/sec Loss 2.6948 LearningRate 0.0000 Epoch: 36 Global Step: 755420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:27:59,958-Speed 6279.57 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 36 Global Step: 755430 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:28:03,197-Speed 6324.21 samples/sec Loss 2.6843 LearningRate 0.0000 Epoch: 36 Global Step: 755440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:06,450-Speed 6298.21 samples/sec Loss 2.6340 LearningRate 0.0000 Epoch: 36 Global Step: 755450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:09,709-Speed 6284.64 samples/sec Loss 2.6491 LearningRate 0.0000 Epoch: 36 Global Step: 755460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:12,966-Speed 6289.84 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 Global Step: 755470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:16,230-Speed 6275.37 samples/sec Loss 2.6416 LearningRate 0.0000 Epoch: 36 Global Step: 755480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:19,489-Speed 6285.19 samples/sec Loss 2.6720 LearningRate 0.0000 Epoch: 36 Global Step: 755490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:22,752-Speed 6278.14 samples/sec Loss 2.6680 LearningRate 0.0000 Epoch: 36 Global Step: 755500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:26,010-Speed 6287.16 samples/sec Loss 2.6880 LearningRate 0.0000 Epoch: 36 Global Step: 755510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:29,271-Speed 6282.10 samples/sec Loss 2.6944 LearningRate 0.0000 Epoch: 36 Global Step: 755520 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:28:32,530-Speed 6285.99 samples/sec Loss 2.6361 LearningRate 0.0000 Epoch: 36 Global Step: 755530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:35,806-Speed 6254.02 samples/sec Loss 2.6489 LearningRate 0.0000 Epoch: 36 Global Step: 755540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:39,061-Speed 6291.81 samples/sec Loss 2.6923 LearningRate 0.0000 Epoch: 36 Global Step: 755550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:42,319-Speed 6288.44 samples/sec Loss 2.7079 LearningRate 0.0000 Epoch: 36 Global Step: 755560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:45,579-Speed 6283.43 samples/sec Loss 2.6492 LearningRate 0.0000 Epoch: 36 Global Step: 755570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:48,844-Speed 6273.53 samples/sec Loss 2.6956 LearningRate 0.0000 Epoch: 36 Global Step: 755580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:52,096-Speed 6300.75 samples/sec Loss 2.6354 LearningRate 0.0000 Epoch: 36 Global Step: 755590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:55,355-Speed 6284.03 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 36 Global Step: 755600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:28:58,612-Speed 6291.30 samples/sec Loss 2.6164 LearningRate 0.0000 Epoch: 36 Global Step: 755610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:01,869-Speed 6287.77 samples/sec Loss 2.6851 LearningRate 0.0000 Epoch: 36 Global Step: 755620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:05,129-Speed 6284.18 samples/sec Loss 2.6723 LearningRate 0.0000 Epoch: 36 Global Step: 755630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:08,381-Speed 6298.60 samples/sec Loss 2.6464 LearningRate 0.0000 Epoch: 36 Global Step: 755640 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:29:11,638-Speed 6289.95 
samples/sec Loss 2.6851 LearningRate 0.0000 Epoch: 36 Global Step: 755650 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:29:14,880-Speed 6319.37 samples/sec Loss 2.7211 LearningRate 0.0000 Epoch: 36 Global Step: 755660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:18,142-Speed 6279.50 samples/sec Loss 2.6303 LearningRate 0.0000 Epoch: 36 Global Step: 755670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:21,397-Speed 6293.03 samples/sec Loss 2.6552 LearningRate 0.0000 Epoch: 36 Global Step: 755680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:24,655-Speed 6286.86 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 755690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:27,908-Speed 6298.09 samples/sec Loss 2.6414 LearningRate 0.0000 Epoch: 36 Global Step: 755700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:31,169-Speed 6280.06 samples/sec Loss 2.6392 LearningRate 0.0000 Epoch: 36 Global Step: 755710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:34,431-Speed 6281.68 samples/sec Loss 2.6951 LearningRate 0.0000 Epoch: 36 Global Step: 755720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:37,701-Speed 6263.79 samples/sec Loss 2.6636 LearningRate 0.0000 Epoch: 36 Global Step: 755730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:40,957-Speed 6291.27 samples/sec Loss 2.6645 LearningRate 0.0000 Epoch: 36 Global Step: 755740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:44,222-Speed 6273.91 samples/sec Loss 2.6895 LearningRate 0.0000 Epoch: 36 Global Step: 755750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:47,466-Speed 6313.54 samples/sec Loss 2.6786 LearningRate 0.0000 Epoch: 36 Global Step: 755760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:50,729-Speed 6278.83 samples/sec Loss 2.6266 LearningRate 0.0000 Epoch: 36 
Global Step: 755770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:53,983-Speed 6295.72 samples/sec Loss 2.6922 LearningRate 0.0000 Epoch: 36 Global Step: 755780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:29:57,245-Speed 6278.13 samples/sec Loss 2.6997 LearningRate 0.0000 Epoch: 36 Global Step: 755790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:00,498-Speed 6298.58 samples/sec Loss 2.6829 LearningRate 0.0000 Epoch: 36 Global Step: 755800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:03,760-Speed 6279.14 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 755810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:07,021-Speed 6282.57 samples/sec Loss 2.6530 LearningRate 0.0000 Epoch: 36 Global Step: 755820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:10,275-Speed 6295.33 samples/sec Loss 2.6510 LearningRate 0.0000 Epoch: 36 Global Step: 755830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:13,537-Speed 6279.74 samples/sec Loss 2.6940 LearningRate 0.0000 Epoch: 36 Global Step: 755840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:16,795-Speed 6290.46 samples/sec Loss 2.7012 LearningRate 0.0000 Epoch: 36 Global Step: 755850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:20,035-Speed 6321.61 samples/sec Loss 2.6610 LearningRate 0.0000 Epoch: 36 Global Step: 755860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:23,298-Speed 6279.37 samples/sec Loss 2.7068 LearningRate 0.0000 Epoch: 36 Global Step: 755870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:26,552-Speed 6294.05 samples/sec Loss 2.6615 LearningRate 0.0000 Epoch: 36 Global Step: 755880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:29,800-Speed 6306.30 samples/sec Loss 2.6724 LearningRate 0.0000 Epoch: 36 Global Step: 755890 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:30:33,052-Speed 6298.61 samples/sec Loss 2.6347 LearningRate 0.0000 Epoch: 36 Global Step: 755900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:36,315-Speed 6278.93 samples/sec Loss 2.6614 LearningRate 0.0000 Epoch: 36 Global Step: 755910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:39,574-Speed 6285.10 samples/sec Loss 2.6812 LearningRate 0.0000 Epoch: 36 Global Step: 755920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:42,836-Speed 6279.47 samples/sec Loss 2.6384 LearningRate 0.0000 Epoch: 36 Global Step: 755930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:46,101-Speed 6274.88 samples/sec Loss 2.6734 LearningRate 0.0000 Epoch: 36 Global Step: 755940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:49,364-Speed 6277.49 samples/sec Loss 2.6518 LearningRate 0.0000 Epoch: 36 Global Step: 755950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:52,619-Speed 6294.49 samples/sec Loss 2.6756 LearningRate 0.0000 Epoch: 36 Global Step: 755960 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:30:55,860-Speed 6318.48 samples/sec Loss 2.7236 LearningRate 0.0000 Epoch: 36 Global Step: 755970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:30:59,115-Speed 6293.60 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 36 Global Step: 755980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:02,369-Speed 6295.42 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 36 Global Step: 755990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:05,630-Speed 6282.25 samples/sec Loss 2.6960 LearningRate 0.0000 Epoch: 36 Global Step: 756000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:08,885-Speed 6293.56 samples/sec Loss 2.6502 LearningRate 0.0000 Epoch: 36 Global Step: 756010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:12,145-Speed 6284.14 
samples/sec Loss 2.7165 LearningRate 0.0000 Epoch: 36 Global Step: 756020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:15,403-Speed 6288.09 samples/sec Loss 2.6694 LearningRate 0.0000 Epoch: 36 Global Step: 756030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:18,668-Speed 6273.47 samples/sec Loss 2.6649 LearningRate 0.0000 Epoch: 36 Global Step: 756040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:21,927-Speed 6285.14 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 36 Global Step: 756050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:25,182-Speed 6293.68 samples/sec Loss 2.6594 LearningRate 0.0000 Epoch: 36 Global Step: 756060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:28,450-Speed 6267.30 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 36 Global Step: 756070 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:31:31,694-Speed 6315.57 samples/sec Loss 2.7013 LearningRate 0.0000 Epoch: 36 Global Step: 756080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:34,954-Speed 6282.90 samples/sec Loss 2.6570 LearningRate 0.0000 Epoch: 36 Global Step: 756090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:38,209-Speed 6293.53 samples/sec Loss 2.6748 LearningRate 0.0000 Epoch: 36 Global Step: 756100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:41,466-Speed 6289.69 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 756110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:44,722-Speed 6291.06 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 36 Global Step: 756120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:47,983-Speed 6281.12 samples/sec Loss 2.7021 LearningRate 0.0000 Epoch: 36 Global Step: 756130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:51,240-Speed 6289.25 samples/sec Loss 2.6660 LearningRate 0.0000 Epoch: 36 
Global Step: 756140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:54,496-Speed 6291.89 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 756150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:31:57,757-Speed 6281.60 samples/sec Loss 2.6520 LearningRate 0.0000 Epoch: 36 Global Step: 756160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:01,011-Speed 6294.67 samples/sec Loss 2.6590 LearningRate 0.0000 Epoch: 36 Global Step: 756170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:04,255-Speed 6315.18 samples/sec Loss 2.6425 LearningRate 0.0000 Epoch: 36 Global Step: 756180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:07,517-Speed 6280.32 samples/sec Loss 2.6540 LearningRate 0.0000 Epoch: 36 Global Step: 756190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:10,776-Speed 6285.60 samples/sec Loss 2.6753 LearningRate 0.0000 Epoch: 36 Global Step: 756200 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:14,029-Speed 6297.81 samples/sec Loss 2.6387 LearningRate 0.0000 Epoch: 36 Global Step: 756210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:17,282-Speed 6296.63 samples/sec Loss 2.6894 LearningRate 0.0000 Epoch: 36 Global Step: 756220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:20,541-Speed 6285.54 samples/sec Loss 2.6793 LearningRate 0.0000 Epoch: 36 Global Step: 756230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:23,799-Speed 6287.94 samples/sec Loss 2.6834 LearningRate 0.0000 Epoch: 36 Global Step: 756240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:27,056-Speed 6290.21 samples/sec Loss 2.6598 LearningRate 0.0000 Epoch: 36 Global Step: 756250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:30,320-Speed 6275.00 samples/sec Loss 2.6411 LearningRate 0.0000 Epoch: 36 Global Step: 756260 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:32:33,574-Speed 6294.63 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 36 Global Step: 756270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:36,825-Speed 6302.06 samples/sec Loss 2.6841 LearningRate 0.0000 Epoch: 36 Global Step: 756280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:40,101-Speed 6253.24 samples/sec Loss 2.6601 LearningRate 0.0000 Epoch: 36 Global Step: 756290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:43,362-Speed 6281.03 samples/sec Loss 2.6979 LearningRate 0.0000 Epoch: 36 Global Step: 756300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:46,623-Speed 6281.34 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 36 Global Step: 756310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:49,891-Speed 6269.48 samples/sec Loss 2.6461 LearningRate 0.0000 Epoch: 36 Global Step: 756320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:53,152-Speed 6281.87 samples/sec Loss 2.6664 LearningRate 0.0000 Epoch: 36 Global Step: 756330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:56,416-Speed 6275.25 samples/sec Loss 2.7047 LearningRate 0.0000 Epoch: 36 Global Step: 756340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:32:59,676-Speed 6282.61 samples/sec Loss 2.6270 LearningRate 0.0000 Epoch: 36 Global Step: 756350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:02,937-Speed 6281.48 samples/sec Loss 2.6849 LearningRate 0.0000 Epoch: 36 Global Step: 756360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:06,194-Speed 6290.11 samples/sec Loss 2.6589 LearningRate 0.0000 Epoch: 36 Global Step: 756370 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:09,439-Speed 6313.59 samples/sec Loss 2.6650 LearningRate 0.0000 Epoch: 36 Global Step: 756380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:12,697-Speed 6285.93 
samples/sec Loss 2.6550 LearningRate 0.0000 Epoch: 36 Global Step: 756390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:15,960-Speed 6279.43 samples/sec Loss 2.6748 LearningRate 0.0000 Epoch: 36 Global Step: 756400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:19,214-Speed 6294.26 samples/sec Loss 2.6478 LearningRate 0.0000 Epoch: 36 Global Step: 756410 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:22,467-Speed 6298.17 samples/sec Loss 2.6705 LearningRate 0.0000 Epoch: 36 Global Step: 756420 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:25,721-Speed 6294.58 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 36 Global Step: 756430 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:28,999-Speed 6250.29 samples/sec Loss 2.6647 LearningRate 0.0000 Epoch: 36 Global Step: 756440 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:32,249-Speed 6301.35 samples/sec Loss 2.6504 LearningRate 0.0000 Epoch: 36 Global Step: 756450 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:35,518-Speed 6267.15 samples/sec Loss 2.6616 LearningRate 0.0000 Epoch: 36 Global Step: 756460 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:38,779-Speed 6282.21 samples/sec Loss 2.6414 LearningRate 0.0000 Epoch: 36 Global Step: 756470 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:42,024-Speed 6312.84 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 36 Global Step: 756480 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:45,283-Speed 6285.16 samples/sec Loss 2.6384 LearningRate 0.0000 Epoch: 36 Global Step: 756490 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:48,543-Speed 6283.40 samples/sec Loss 2.6250 LearningRate 0.0000 Epoch: 36 Global Step: 756500 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:51,803-Speed 6283.49 samples/sec Loss 2.6474 LearningRate 0.0000 Epoch: 36 
Global Step: 756510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:55,063-Speed 6284.86 samples/sec Loss 2.6750 LearningRate 0.0000 Epoch: 36 Global Step: 756520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:33:58,320-Speed 6288.02 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 36 Global Step: 756530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:01,581-Speed 6281.75 samples/sec Loss 2.6496 LearningRate 0.0000 Epoch: 36 Global Step: 756540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:04,839-Speed 6287.68 samples/sec Loss 2.6474 LearningRate 0.0000 Epoch: 36 Global Step: 756550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:08,094-Speed 6292.93 samples/sec Loss 2.7021 LearningRate 0.0000 Epoch: 36 Global Step: 756560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:11,349-Speed 6294.22 samples/sec Loss 2.6682 LearningRate 0.0000 Epoch: 36 Global Step: 756570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:14,605-Speed 6290.57 samples/sec Loss 2.6724 LearningRate 0.0000 Epoch: 36 Global Step: 756580 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:34:17,851-Speed 6310.29 samples/sec Loss 2.6951 LearningRate 0.0000 Epoch: 36 Global Step: 756590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:21,107-Speed 6291.88 samples/sec Loss 2.7016 LearningRate 0.0000 Epoch: 36 Global Step: 756600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:24,359-Speed 6300.87 samples/sec Loss 2.6874 LearningRate 0.0000 Epoch: 36 Global Step: 756610 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:27,626-Speed 6269.13 samples/sec Loss 2.6118 LearningRate 0.0000 Epoch: 36 Global Step: 756620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:30,885-Speed 6285.91 samples/sec Loss 2.7028 LearningRate 0.0000 Epoch: 36 Global Step: 756630 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:34:34,144-Speed 6285.65 samples/sec Loss 2.6585 LearningRate 0.0000 Epoch: 36 Global Step: 756640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:37,434-Speed 6226.95 samples/sec Loss 2.6908 LearningRate 0.0000 Epoch: 36 Global Step: 756650 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:40,692-Speed 6286.03 samples/sec Loss 2.6857 LearningRate 0.0000 Epoch: 36 Global Step: 756660 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:43,947-Speed 6293.50 samples/sec Loss 2.6134 LearningRate 0.0000 Epoch: 36 Global Step: 756670 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:47,205-Speed 6287.52 samples/sec Loss 2.6849 LearningRate 0.0000 Epoch: 36 Global Step: 756680 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:50,445-Speed 6321.85 samples/sec Loss 2.6423 LearningRate 0.0000 Epoch: 36 Global Step: 756690 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:53,755-Speed 6190.62 samples/sec Loss 2.6646 LearningRate 0.0000 Epoch: 36 Global Step: 756700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:34:57,007-Speed 6298.89 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 36 Global Step: 756710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:00,269-Speed 6279.00 samples/sec Loss 2.7104 LearningRate 0.0000 Epoch: 36 Global Step: 756720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:03,523-Speed 6293.75 samples/sec Loss 2.6856 LearningRate 0.0000 Epoch: 36 Global Step: 756730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:06,775-Speed 6300.48 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 36 Global Step: 756740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:10,028-Speed 6296.75 samples/sec Loss 2.6829 LearningRate 0.0000 Epoch: 36 Global Step: 756750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:13,290-Speed 6280.20 
samples/sec Loss 2.6708 LearningRate 0.0000 Epoch: 36 Global Step: 756760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:16,549-Speed 6286.25 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 36 Global Step: 756770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:19,804-Speed 6292.11 samples/sec Loss 2.6594 LearningRate 0.0000 Epoch: 36 Global Step: 756780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:23,051-Speed 6309.07 samples/sec Loss 2.6364 LearningRate 0.0000 Epoch: 36 Global Step: 756790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:26,320-Speed 6266.51 samples/sec Loss 2.6856 LearningRate 0.0000 Epoch: 36 Global Step: 756800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:29,575-Speed 6293.28 samples/sec Loss 2.6458 LearningRate 0.0000 Epoch: 36 Global Step: 756810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:32,830-Speed 6294.29 samples/sec Loss 2.7107 LearningRate 0.0000 Epoch: 36 Global Step: 756820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:36,083-Speed 6296.98 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 36 Global Step: 756830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:39,341-Speed 6286.89 samples/sec Loss 2.6732 LearningRate 0.0000 Epoch: 36 Global Step: 756840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:42,587-Speed 6311.50 samples/sec Loss 2.6750 LearningRate 0.0000 Epoch: 36 Global Step: 756850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:45,841-Speed 6294.26 samples/sec Loss 2.6562 LearningRate 0.0000 Epoch: 36 Global Step: 756860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:49,105-Speed 6276.77 samples/sec Loss 2.6922 LearningRate 0.0000 Epoch: 36 Global Step: 756870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:52,360-Speed 6293.47 samples/sec Loss 2.6816 LearningRate 0.0000 Epoch: 36 
Global Step: 756880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:55,606-Speed 6309.53 samples/sec Loss 2.6682 LearningRate 0.0000 Epoch: 36 Global Step: 756890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:35:58,872-Speed 6272.04 samples/sec Loss 2.6355 LearningRate 0.0000 Epoch: 36 Global Step: 756900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:02,139-Speed 6270.58 samples/sec Loss 2.6282 LearningRate 0.0000 Epoch: 36 Global Step: 756910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:05,406-Speed 6269.46 samples/sec Loss 2.6677 LearningRate 0.0000 Epoch: 36 Global Step: 756920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:08,665-Speed 6287.18 samples/sec Loss 2.6621 LearningRate 0.0000 Epoch: 36 Global Step: 756930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:11,916-Speed 6299.90 samples/sec Loss 2.6131 LearningRate 0.0000 Epoch: 36 Global Step: 756940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:15,174-Speed 6286.97 samples/sec Loss 2.6944 LearningRate 0.0000 Epoch: 36 Global Step: 756950 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:18,435-Speed 6282.30 samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 36 Global Step: 756960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:21,692-Speed 6290.26 samples/sec Loss 2.6798 LearningRate 0.0000 Epoch: 36 Global Step: 756970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:24,943-Speed 6299.78 samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 36 Global Step: 756980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:28,199-Speed 6292.98 samples/sec Loss 2.6536 LearningRate 0.0000 Epoch: 36 Global Step: 756990 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:36:31,436-Speed 6328.14 samples/sec Loss 2.7017 LearningRate 0.0000 Epoch: 36 Global Step: 757000 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:36:34,683-Speed 6307.83 samples/sec Loss 2.6895 LearningRate 0.0000 Epoch: 36 Global Step: 757010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:37,930-Speed 6309.15 samples/sec Loss 2.6976 LearningRate 0.0000 Epoch: 36 Global Step: 757020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:41,191-Speed 6282.31 samples/sec Loss 2.6358 LearningRate 0.0000 Epoch: 36 Global Step: 757030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:44,446-Speed 6292.91 samples/sec Loss 2.6393 LearningRate 0.0000 Epoch: 36 Global Step: 757040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:47,705-Speed 6285.01 samples/sec Loss 2.7267 LearningRate 0.0000 Epoch: 36 Global Step: 757050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:50,966-Speed 6281.87 samples/sec Loss 2.6275 LearningRate 0.0000 Epoch: 36 Global Step: 757060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:54,226-Speed 6283.71 samples/sec Loss 2.6518 LearningRate 0.0000 Epoch: 36 Global Step: 757070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:36:57,486-Speed 6284.75 samples/sec Loss 2.6555 LearningRate 0.0000 Epoch: 36 Global Step: 757080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:00,736-Speed 6301.65 samples/sec Loss 2.6801 LearningRate 0.0000 Epoch: 36 Global Step: 757090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:03,994-Speed 6288.23 samples/sec Loss 2.6976 LearningRate 0.0000 Epoch: 36 Global Step: 757100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:07,256-Speed 6279.36 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 36 Global Step: 757110 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:10,515-Speed 6285.41 samples/sec Loss 2.6518 LearningRate 0.0000 Epoch: 36 Global Step: 757120 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:13,775-Speed 6284.06 
samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 36 Global Step: 757130 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:17,027-Speed 6298.49 samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 36 Global Step: 757140 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:20,281-Speed 6295.83 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 36 Global Step: 757150 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:23,544-Speed 6278.00 samples/sec Loss 2.6666 LearningRate 0.0000 Epoch: 36 Global Step: 757160 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:26,802-Speed 6286.73 samples/sec Loss 2.7179 LearningRate 0.0000 Epoch: 36 Global Step: 757170 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:30,061-Speed 6287.30 samples/sec Loss 2.6660 LearningRate 0.0000 Epoch: 36 Global Step: 757180 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:33,322-Speed 6281.00 samples/sec Loss 2.6268 LearningRate 0.0000 Epoch: 36 Global Step: 757190 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:36,574-Speed 6299.14 samples/sec Loss 2.6139 LearningRate 0.0000 Epoch: 36 Global Step: 757200 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:37:39,815-Speed 6321.13 samples/sec Loss 2.6793 LearningRate 0.0000 Epoch: 36 Global Step: 757210 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:43,069-Speed 6293.92 samples/sec Loss 2.6920 LearningRate 0.0000 Epoch: 36 Global Step: 757220 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:46,325-Speed 6292.37 samples/sec Loss 2.6555 LearningRate 0.0000 Epoch: 36 Global Step: 757230 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:49,589-Speed 6275.79 samples/sec Loss 2.6689 LearningRate 0.0000 Epoch: 36 Global Step: 757240 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:52,844-Speed 6292.47 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 
Global Step: 757250 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:56,098-Speed 6295.02 samples/sec Loss 2.6391 LearningRate 0.0000 Epoch: 36 Global Step: 757260 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:37:59,357-Speed 6287.35 samples/sec Loss 2.7117 LearningRate 0.0000 Epoch: 36 Global Step: 757270 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:02,617-Speed 6283.13 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 757280 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:05,876-Speed 6284.54 samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 757290 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:09,140-Speed 6275.83 samples/sec Loss 2.6262 LearningRate 0.0000 Epoch: 36 Global Step: 757300 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:12,382-Speed 6319.35 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 36 Global Step: 757310 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:15,645-Speed 6277.72 samples/sec Loss 2.6584 LearningRate 0.0000 Epoch: 36 Global Step: 757320 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:18,899-Speed 6295.07 samples/sec Loss 2.6982 LearningRate 0.0000 Epoch: 36 Global Step: 757330 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:22,158-Speed 6284.99 samples/sec Loss 2.6500 LearningRate 0.0000 Epoch: 36 Global Step: 757340 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:25,417-Speed 6286.03 samples/sec Loss 2.6254 LearningRate 0.0000 Epoch: 36 Global Step: 757350 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:28,675-Speed 6288.61 samples/sec Loss 2.6250 LearningRate 0.0000 Epoch: 36 Global Step: 757360 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:31,936-Speed 6280.58 samples/sec Loss 2.6388 LearningRate 0.0000 Epoch: 36 Global Step: 757370 Fp16 Grad Scale: 4096 Required: 7 
hours Training: 2022-04-03 12:38:35,191-Speed 6292.82 samples/sec Loss 2.6242 LearningRate 0.0000 Epoch: 36 Global Step: 757380 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:38,452-Speed 6282.18 samples/sec Loss 2.6312 LearningRate 0.0000 Epoch: 36 Global Step: 757390 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:41,706-Speed 6295.67 samples/sec Loss 2.6496 LearningRate 0.0000 Epoch: 36 Global Step: 757400 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:38:44,931-Speed 6351.28 samples/sec Loss 2.6614 LearningRate 0.0000 Epoch: 36 Global Step: 757410 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:38:48,202-Speed 6263.63 samples/sec Loss 2.6857 LearningRate 0.0000 Epoch: 36 Global Step: 757420 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:38:51,457-Speed 6292.46 samples/sec Loss 2.6768 LearningRate 0.0000 Epoch: 36 Global Step: 757430 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:38:54,715-Speed 6288.44 samples/sec Loss 2.6868 LearningRate 0.0000 Epoch: 36 Global Step: 757440 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:38:57,968-Speed 6296.68 samples/sec Loss 2.6762 LearningRate 0.0000 Epoch: 36 Global Step: 757450 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:01,224-Speed 6291.93 samples/sec Loss 2.6010 LearningRate 0.0000 Epoch: 36 Global Step: 757460 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:04,500-Speed 6253.21 samples/sec Loss 2.7034 LearningRate 0.0000 Epoch: 36 Global Step: 757470 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:07,757-Speed 6289.43 samples/sec Loss 2.6165 LearningRate 0.0000 Epoch: 36 Global Step: 757480 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:11,013-Speed 6290.73 samples/sec Loss 2.6567 LearningRate 0.0000 Epoch: 36 Global Step: 757490 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:14,271-Speed 6286.30 
samples/sec Loss 2.6758 LearningRate 0.0000 Epoch: 36 Global Step: 757500 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:39:17,532-Speed 6282.79 samples/sec Loss 2.7010 LearningRate 0.0000 Epoch: 36 Global Step: 757510 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:20,784-Speed 6298.44 samples/sec Loss 2.6678 LearningRate 0.0000 Epoch: 36 Global Step: 757520 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:24,038-Speed 6296.31 samples/sec Loss 2.6746 LearningRate 0.0000 Epoch: 36 Global Step: 757530 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:27,330-Speed 6222.28 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 36 Global Step: 757540 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:30,622-Speed 6221.71 samples/sec Loss 2.6271 LearningRate 0.0000 Epoch: 36 Global Step: 757550 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:33,884-Speed 6280.59 samples/sec Loss 2.6855 LearningRate 0.0000 Epoch: 36 Global Step: 757560 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:37,138-Speed 6294.53 samples/sec Loss 2.6832 LearningRate 0.0000 Epoch: 36 Global Step: 757570 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:40,398-Speed 6284.20 samples/sec Loss 2.6870 LearningRate 0.0000 Epoch: 36 Global Step: 757580 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:43,657-Speed 6285.52 samples/sec Loss 2.7162 LearningRate 0.0000 Epoch: 36 Global Step: 757590 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:46,919-Speed 6280.06 samples/sec Loss 2.6908 LearningRate 0.0000 Epoch: 36 Global Step: 757600 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:50,180-Speed 6281.37 samples/sec Loss 2.6170 LearningRate 0.0000 Epoch: 36 Global Step: 757610 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:39:53,426-Speed 6310.58 samples/sec Loss 2.6681 LearningRate 0.0000 Epoch: 36 
Global Step: 757620 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:56,686-Speed 6284.34 samples/sec Loss 2.5931 LearningRate 0.0000 Epoch: 36 Global Step: 757630 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:39:59,946-Speed 6284.01 samples/sec Loss 2.7115 LearningRate 0.0000 Epoch: 36 Global Step: 757640 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:03,188-Speed 6318.85 samples/sec Loss 2.6490 LearningRate 0.0000 Epoch: 36 Global Step: 757650 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:06,445-Speed 6289.42 samples/sec Loss 2.7038 LearningRate 0.0000 Epoch: 36 Global Step: 757660 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:09,704-Speed 6283.90 samples/sec Loss 2.6103 LearningRate 0.0000 Epoch: 36 Global Step: 757670 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:12,968-Speed 6275.72 samples/sec Loss 2.6839 LearningRate 0.0000 Epoch: 36 Global Step: 757680 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:16,229-Speed 6281.56 samples/sec Loss 2.7225 LearningRate 0.0000 Epoch: 36 Global Step: 757690 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:19,485-Speed 6291.76 samples/sec Loss 2.6775 LearningRate 0.0000 Epoch: 36 Global Step: 757700 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:22,749-Speed 6276.00 samples/sec Loss 2.6370 LearningRate 0.0000 Epoch: 36 Global Step: 757710 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:26,006-Speed 6289.25 samples/sec Loss 2.6810 LearningRate 0.0000 Epoch: 36 Global Step: 757720 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:29,293-Speed 6232.80 samples/sec Loss 2.6561 LearningRate 0.0000 Epoch: 36 Global Step: 757730 Fp16 Grad Scale: 2048 Required: 7 hours Training: 2022-04-03 12:40:32,573-Speed 6245.09 samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 36 Global Step: 757740 Fp16 Grad Scale: 2048 Required: 7 
hours Training: 2022-04-03 12:40:35,836-Speed 6278.13 samples/sec Loss 2.6929 LearningRate 0.0000 Epoch: 36 Global Step: 757750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:39,091-Speed 6292.12 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 36 Global Step: 757760 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:42,346-Speed 6293.26 samples/sec Loss 2.6464 LearningRate 0.0000 Epoch: 36 Global Step: 757770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:45,604-Speed 6288.73 samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 757780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:48,865-Speed 6281.33 samples/sec Loss 2.6193 LearningRate 0.0000 Epoch: 36 Global Step: 757790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:52,127-Speed 6280.41 samples/sec Loss 2.7220 LearningRate 0.0000 Epoch: 36 Global Step: 757800 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:55,388-Speed 6281.49 samples/sec Loss 2.6874 LearningRate 0.0000 Epoch: 36 Global Step: 757810 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:40:58,644-Speed 6291.64 samples/sec Loss 2.6779 LearningRate 0.0000 Epoch: 36 Global Step: 757820 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:01,908-Speed 6275.85 samples/sec Loss 2.6998 LearningRate 0.0000 Epoch: 36 Global Step: 757830 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:05,165-Speed 6290.10 samples/sec Loss 2.6537 LearningRate 0.0000 Epoch: 36 Global Step: 757840 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:08,408-Speed 6315.77 samples/sec Loss 2.6619 LearningRate 0.0000 Epoch: 36 Global Step: 757850 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:11,667-Speed 6285.47 samples/sec Loss 2.6828 LearningRate 0.0000 Epoch: 36 Global Step: 757860 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:14,928-Speed 6281.69 
samples/sec Loss 2.6869 LearningRate 0.0000 Epoch: 36 Global Step: 757870 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:18,188-Speed 6282.82 samples/sec Loss 2.6548 LearningRate 0.0000 Epoch: 36 Global Step: 757880 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:21,445-Speed 6290.80 samples/sec Loss 2.6721 LearningRate 0.0000 Epoch: 36 Global Step: 757890 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:24,696-Speed 6300.65 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 36 Global Step: 757900 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:27,951-Speed 6292.44 samples/sec Loss 2.6389 LearningRate 0.0000 Epoch: 36 Global Step: 757910 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:31,212-Speed 6283.35 samples/sec Loss 2.6496 LearningRate 0.0000 Epoch: 36 Global Step: 757920 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:34,471-Speed 6284.24 samples/sec Loss 2.6912 LearningRate 0.0000 Epoch: 36 Global Step: 757930 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:37,733-Speed 6280.05 samples/sec Loss 2.6623 LearningRate 0.0000 Epoch: 36 Global Step: 757940 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:40,991-Speed 6287.66 samples/sec Loss 2.6728 LearningRate 0.0000 Epoch: 36 Global Step: 757950 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-03 12:41:44,230-Speed 6324.65 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 36 Global Step: 757960 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:47,499-Speed 6266.39 samples/sec Loss 2.6745 LearningRate 0.0000 Epoch: 36 Global Step: 757970 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:50,767-Speed 6266.84 samples/sec Loss 2.6685 LearningRate 0.0000 Epoch: 36 Global Step: 757980 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:54,025-Speed 6287.91 samples/sec Loss 2.6402 LearningRate 0.0000 Epoch: 36 
Global Step: 757990 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:41:57,285-Speed 6284.57 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 36 Global Step: 758000 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:00,538-Speed 6297.42 samples/sec Loss 2.6994 LearningRate 0.0000 Epoch: 36 Global Step: 758010 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:03,800-Speed 6279.79 samples/sec Loss 2.6684 LearningRate 0.0000 Epoch: 36 Global Step: 758020 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:07,054-Speed 6295.67 samples/sec Loss 2.6151 LearningRate 0.0000 Epoch: 36 Global Step: 758030 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:10,309-Speed 6292.99 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 36 Global Step: 758040 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:13,563-Speed 6294.61 samples/sec Loss 2.6788 LearningRate 0.0000 Epoch: 36 Global Step: 758050 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:16,805-Speed 6318.10 samples/sec Loss 2.6874 LearningRate 0.0000 Epoch: 36 Global Step: 758060 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:20,062-Speed 6289.78 samples/sec Loss 2.6348 LearningRate 0.0000 Epoch: 36 Global Step: 758070 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:23,323-Speed 6282.09 samples/sec Loss 2.6867 LearningRate 0.0000 Epoch: 36 Global Step: 758080 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:26,577-Speed 6295.66 samples/sec Loss 2.6547 LearningRate 0.0000 Epoch: 36 Global Step: 758090 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:29,832-Speed 6292.44 samples/sec Loss 2.6617 LearningRate 0.0000 Epoch: 36 Global Step: 758100 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-03 12:42:33,095-Speed 6277.69 samples/sec Loss 2.6867 LearningRate 0.0000 Epoch: 36 Global Step: 758110 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:42:36,353-Speed 6288.01 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 36 Global Step: 758120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:39,602-Speed 6304.91 samples/sec Loss 2.6912 LearningRate 0.0000 Epoch: 36 Global Step: 758130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:42,860-Speed 6288.18 samples/sec Loss 2.6915 LearningRate 0.0000 Epoch: 36 Global Step: 758140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:46,117-Speed 6287.97 samples/sec Loss 2.6849 LearningRate 0.0000 Epoch: 36 Global Step: 758150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:49,359-Speed 6318.86 samples/sec Loss 2.6353 LearningRate 0.0000 Epoch: 36 Global Step: 758160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:52,607-Speed 6307.33 samples/sec Loss 2.6250 LearningRate 0.0000 Epoch: 36 Global Step: 758170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:55,862-Speed 6292.76 samples/sec Loss 2.6985 LearningRate 0.0000 Epoch: 36 Global Step: 758180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:42:59,124-Speed 6279.01 samples/sec Loss 2.6762 LearningRate 0.0000 Epoch: 36 Global Step: 758190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:02,386-Speed 6283.49 samples/sec Loss 2.6343 LearningRate 0.0000 Epoch: 36 Global Step: 758200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:05,645-Speed 6286.98 samples/sec Loss 2.6456 LearningRate 0.0000 Epoch: 36 Global Step: 758210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:08,906-Speed 6280.88 samples/sec Loss 2.6774 LearningRate 0.0000 Epoch: 36 Global Step: 758220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:12,164-Speed 6288.56 samples/sec Loss 2.6913 LearningRate 0.0000 Epoch: 36 Global Step: 758230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:15,420-Speed 6290.67 
samples/sec Loss 2.6435 LearningRate 0.0000 Epoch: 36 Global Step: 758240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:18,681-Speed 6280.79 samples/sec Loss 2.6633 LearningRate 0.0000 Epoch: 36 Global Step: 758250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:21,918-Speed 6329.65 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 36 Global Step: 758260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:25,169-Speed 6300.14 samples/sec Loss 2.6438 LearningRate 0.0000 Epoch: 36 Global Step: 758270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:28,427-Speed 6288.28 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 36 Global Step: 758280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:31,684-Speed 6288.12 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 36 Global Step: 758290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:34,941-Speed 6290.24 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 36 Global Step: 758300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:38,298-Speed 6101.74 samples/sec Loss 2.6270 LearningRate 0.0000 Epoch: 36 Global Step: 758310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:41,677-Speed 6061.86 samples/sec Loss 2.6797 LearningRate 0.0000 Epoch: 36 Global Step: 758320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:44,945-Speed 6267.69 samples/sec Loss 2.6703 LearningRate 0.0000 Epoch: 36 Global Step: 758330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:48,207-Speed 6280.06 samples/sec Loss 2.6193 LearningRate 0.0000 Epoch: 36 Global Step: 758340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:51,468-Speed 6283.11 samples/sec Loss 2.6284 LearningRate 0.0000 Epoch: 36 Global Step: 758350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:43:54,726-Speed 6287.32 samples/sec Loss 2.6379 LearningRate 0.0000 Epoch: 36 
Global Step: 758360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:43:57,964-Speed 6325.11 samples/sec Loss 2.6219 LearningRate 0.0000 Epoch: 36 Global Step: 758370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:01,254-Speed 6225.82 samples/sec Loss 2.6041 LearningRate 0.0000 Epoch: 36 Global Step: 758380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:04,522-Speed 6271.44 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 36 Global Step: 758390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:07,796-Speed 6255.70 samples/sec Loss 2.7020 LearningRate 0.0000 Epoch: 36 Global Step: 758400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:11,053-Speed 6290.87 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 36 Global Step: 758410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:14,314-Speed 6280.84 samples/sec Loss 2.6490 LearningRate 0.0000 Epoch: 36 Global Step: 758420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:17,568-Speed 6294.92 samples/sec Loss 2.6127 LearningRate 0.0000 Epoch: 36 Global Step: 758430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:20,818-Speed 6303.87 samples/sec Loss 2.6815 LearningRate 0.0000 Epoch: 36 Global Step: 758440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:24,067-Speed 6305.19 samples/sec Loss 2.6252 LearningRate 0.0000 Epoch: 36 Global Step: 758450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:27,323-Speed 6290.39 samples/sec Loss 2.6540 LearningRate 0.0000 Epoch: 36 Global Step: 758460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:30,566-Speed 6317.21 samples/sec Loss 2.6812 LearningRate 0.0000 Epoch: 36 Global Step: 758470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:33,822-Speed 6290.75 samples/sec Loss 2.6639 LearningRate 0.0000 Epoch: 36 Global Step: 758480 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:44:37,082-Speed 6284.07 samples/sec Loss 2.6391 LearningRate 0.0000 Epoch: 36 Global Step: 758490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:40,338-Speed 6290.76 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 36 Global Step: 758500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:43,596-Speed 6287.37 samples/sec Loss 2.6112 LearningRate 0.0000 Epoch: 36 Global Step: 758510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:46,855-Speed 6286.32 samples/sec Loss 2.6472 LearningRate 0.0000 Epoch: 36 Global Step: 758520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:50,120-Speed 6274.02 samples/sec Loss 2.6517 LearningRate 0.0000 Epoch: 36 Global Step: 758530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:53,382-Speed 6279.11 samples/sec Loss 2.6946 LearningRate 0.0000 Epoch: 36 Global Step: 758540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:56,632-Speed 6303.48 samples/sec Loss 2.6954 LearningRate 0.0000 Epoch: 36 Global Step: 758550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:44:59,889-Speed 6289.41 samples/sec Loss 2.7002 LearningRate 0.0000 Epoch: 36 Global Step: 758560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:03,126-Speed 6328.13 samples/sec Loss 2.6529 LearningRate 0.0000 Epoch: 36 Global Step: 758570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:06,390-Speed 6276.43 samples/sec Loss 2.6719 LearningRate 0.0000 Epoch: 36 Global Step: 758580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:09,651-Speed 6280.73 samples/sec Loss 2.6644 LearningRate 0.0000 Epoch: 36 Global Step: 758590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:12,908-Speed 6290.67 samples/sec Loss 2.6706 LearningRate 0.0000 Epoch: 36 Global Step: 758600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:16,167-Speed 6286.26 
samples/sec Loss 2.6600 LearningRate 0.0000 Epoch: 36 Global Step: 758610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:19,424-Speed 6288.03 samples/sec Loss 2.6362 LearningRate 0.0000 Epoch: 36 Global Step: 758620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:22,685-Speed 6283.42 samples/sec Loss 2.6233 LearningRate 0.0000 Epoch: 36 Global Step: 758630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:25,941-Speed 6290.77 samples/sec Loss 2.6891 LearningRate 0.0000 Epoch: 36 Global Step: 758640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:29,197-Speed 6291.20 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 36 Global Step: 758650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:32,466-Speed 6266.26 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 36 Global Step: 758660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:35,722-Speed 6290.92 samples/sec Loss 2.6477 LearningRate 0.0000 Epoch: 36 Global Step: 758670 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:45:38,987-Speed 6277.43 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 Global Step: 758680 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:45:42,233-Speed 6311.74 samples/sec Loss 2.6239 LearningRate 0.0000 Epoch: 36 Global Step: 758690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:45,493-Speed 6282.24 samples/sec Loss 2.6920 LearningRate 0.0000 Epoch: 36 Global Step: 758700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:48,755-Speed 6280.49 samples/sec Loss 2.6292 LearningRate 0.0000 Epoch: 36 Global Step: 758710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:52,009-Speed 6296.43 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 Global Step: 758720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:55,268-Speed 6285.68 samples/sec Loss 2.6569 LearningRate 0.0000 Epoch: 36 
Global Step: 758730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:45:58,522-Speed 6295.46 samples/sec Loss 2.7095 LearningRate 0.0000 Epoch: 36 Global Step: 758740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:01,786-Speed 6275.39 samples/sec Loss 2.6260 LearningRate 0.0000 Epoch: 36 Global Step: 758750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:05,035-Speed 6305.35 samples/sec Loss 2.6593 LearningRate 0.0000 Epoch: 36 Global Step: 758760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:08,293-Speed 6287.79 samples/sec Loss 2.6654 LearningRate 0.0000 Epoch: 36 Global Step: 758770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:11,550-Speed 6287.80 samples/sec Loss 2.6217 LearningRate 0.0000 Epoch: 36 Global Step: 758780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:14,790-Speed 6323.20 samples/sec Loss 2.6686 LearningRate 0.0000 Epoch: 36 Global Step: 758790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:18,043-Speed 6297.75 samples/sec Loss 2.6755 LearningRate 0.0000 Epoch: 36 Global Step: 758800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:21,301-Speed 6286.59 samples/sec Loss 2.6783 LearningRate 0.0000 Epoch: 36 Global Step: 758810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:24,561-Speed 6285.18 samples/sec Loss 2.6595 LearningRate 0.0000 Epoch: 36 Global Step: 758820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:27,815-Speed 6294.24 samples/sec Loss 2.6292 LearningRate 0.0000 Epoch: 36 Global Step: 758830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:31,078-Speed 6279.18 samples/sec Loss 2.6036 LearningRate 0.0000 Epoch: 36 Global Step: 758840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:34,333-Speed 6291.71 samples/sec Loss 2.6687 LearningRate 0.0000 Epoch: 36 Global Step: 758850 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:46:37,595-Speed 6280.95 samples/sec Loss 2.6645 LearningRate 0.0000 Epoch: 36 Global Step: 758860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:40,852-Speed 6288.29 samples/sec Loss 2.6290 LearningRate 0.0000 Epoch: 36 Global Step: 758870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:44,106-Speed 6296.47 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 758880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:47,346-Speed 6322.51 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 36 Global Step: 758890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:50,610-Speed 6274.12 samples/sec Loss 2.6473 LearningRate 0.0000 Epoch: 36 Global Step: 758900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:53,878-Speed 6269.71 samples/sec Loss 2.6044 LearningRate 0.0000 Epoch: 36 Global Step: 758910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:46:57,135-Speed 6287.96 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 758920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:00,389-Speed 6295.10 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 36 Global Step: 758930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:03,643-Speed 6299.33 samples/sec Loss 2.6899 LearningRate 0.0000 Epoch: 36 Global Step: 758940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:06,904-Speed 6281.08 samples/sec Loss 2.6784 LearningRate 0.0000 Epoch: 36 Global Step: 758950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:10,166-Speed 6280.58 samples/sec Loss 2.6417 LearningRate 0.0000 Epoch: 36 Global Step: 758960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:13,425-Speed 6285.67 samples/sec Loss 2.6914 LearningRate 0.0000 Epoch: 36 Global Step: 758970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:16,677-Speed 6298.04 
samples/sec Loss 2.6653 LearningRate 0.0000 Epoch: 36 Global Step: 758980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:19,930-Speed 6297.68 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 36 Global Step: 758990 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:47:23,167-Speed 6328.16 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 36 Global Step: 759000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:26,424-Speed 6289.94 samples/sec Loss 2.6687 LearningRate 0.0000 Epoch: 36 Global Step: 759010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:29,680-Speed 6292.02 samples/sec Loss 2.6480 LearningRate 0.0000 Epoch: 36 Global Step: 759020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:32,939-Speed 6285.02 samples/sec Loss 2.6384 LearningRate 0.0000 Epoch: 36 Global Step: 759030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:36,219-Speed 6245.29 samples/sec Loss 2.6797 LearningRate 0.0000 Epoch: 36 Global Step: 759040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:39,478-Speed 6286.18 samples/sec Loss 2.6637 LearningRate 0.0000 Epoch: 36 Global Step: 759050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:42,739-Speed 6282.32 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 36 Global Step: 759060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:46,003-Speed 6274.29 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 36 Global Step: 759070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:49,264-Speed 6282.44 samples/sec Loss 2.6814 LearningRate 0.0000 Epoch: 36 Global Step: 759080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:52,524-Speed 6284.74 samples/sec Loss 2.6617 LearningRate 0.0000 Epoch: 36 Global Step: 759090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:55,768-Speed 6313.37 samples/sec Loss 2.6267 LearningRate 0.0000 Epoch: 36 
Global Step: 759100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:47:59,027-Speed 6285.40 samples/sec Loss 2.6924 LearningRate 0.0000 Epoch: 36 Global Step: 759110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:02,285-Speed 6287.67 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 36 Global Step: 759120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:05,540-Speed 6293.88 samples/sec Loss 2.5920 LearningRate 0.0000 Epoch: 36 Global Step: 759130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:08,805-Speed 6273.04 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 36 Global Step: 759140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:12,079-Speed 6257.35 samples/sec Loss 2.6578 LearningRate 0.0000 Epoch: 36 Global Step: 759150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:15,335-Speed 6291.62 samples/sec Loss 2.6660 LearningRate 0.0000 Epoch: 36 Global Step: 759160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:18,587-Speed 6299.60 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 36 Global Step: 759170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:21,844-Speed 6287.92 samples/sec Loss 2.6538 LearningRate 0.0000 Epoch: 36 Global Step: 759180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:25,114-Speed 6265.16 samples/sec Loss 2.6431 LearningRate 0.0000 Epoch: 36 Global Step: 759190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:28,360-Speed 6311.11 samples/sec Loss 2.6931 LearningRate 0.0000 Epoch: 36 Global Step: 759200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:31,623-Speed 6276.56 samples/sec Loss 2.6403 LearningRate 0.0000 Epoch: 36 Global Step: 759210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:34,896-Speed 6260.47 samples/sec Loss 2.7075 LearningRate 0.0000 Epoch: 36 Global Step: 759220 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:48:38,172-Speed 6253.04 samples/sec Loss 2.6935 LearningRate 0.0000 Epoch: 36 Global Step: 759230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:41,436-Speed 6275.40 samples/sec Loss 2.6632 LearningRate 0.0000 Epoch: 36 Global Step: 759240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:44,694-Speed 6288.73 samples/sec Loss 2.6262 LearningRate 0.0000 Epoch: 36 Global Step: 759250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:47,952-Speed 6286.16 samples/sec Loss 2.6670 LearningRate 0.0000 Epoch: 36 Global Step: 759260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:51,211-Speed 6286.13 samples/sec Loss 2.6549 LearningRate 0.0000 Epoch: 36 Global Step: 759270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:54,469-Speed 6288.22 samples/sec Loss 2.7133 LearningRate 0.0000 Epoch: 36 Global Step: 759280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:48:57,728-Speed 6284.31 samples/sec Loss 2.7016 LearningRate 0.0000 Epoch: 36 Global Step: 759290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:00,976-Speed 6307.41 samples/sec Loss 2.6460 LearningRate 0.0000 Epoch: 36 Global Step: 759300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:04,234-Speed 6286.99 samples/sec Loss 2.7081 LearningRate 0.0000 Epoch: 36 Global Step: 759310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:07,494-Speed 6284.68 samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 36 Global Step: 759320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:10,743-Speed 6303.84 samples/sec Loss 2.6776 LearningRate 0.0000 Epoch: 36 Global Step: 759330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:13,998-Speed 6294.43 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 36 Global Step: 759340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:17,250-Speed 6298.52 
samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 36 Global Step: 759350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:20,513-Speed 6278.60 samples/sec Loss 2.6919 LearningRate 0.0000 Epoch: 36 Global Step: 759360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:23,769-Speed 6289.79 samples/sec Loss 2.6475 LearningRate 0.0000 Epoch: 36 Global Step: 759370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:27,026-Speed 6293.27 samples/sec Loss 2.6400 LearningRate 0.0000 Epoch: 36 Global Step: 759380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:30,280-Speed 6294.67 samples/sec Loss 2.6508 LearningRate 0.0000 Epoch: 36 Global Step: 759390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:33,519-Speed 6324.44 samples/sec Loss 2.7074 LearningRate 0.0000 Epoch: 36 Global Step: 759400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:36,790-Speed 6263.74 samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 759410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:40,055-Speed 6274.40 samples/sec Loss 2.6200 LearningRate 0.0000 Epoch: 36 Global Step: 759420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:43,320-Speed 6278.26 samples/sec Loss 2.7325 LearningRate 0.0000 Epoch: 36 Global Step: 759430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:46,584-Speed 6276.36 samples/sec Loss 2.6413 LearningRate 0.0000 Epoch: 36 Global Step: 759440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:49,856-Speed 6260.46 samples/sec Loss 2.6744 LearningRate 0.0000 Epoch: 36 Global Step: 759450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:53,131-Speed 6254.27 samples/sec Loss 2.6553 LearningRate 0.0000 Epoch: 36 Global Step: 759460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:56,388-Speed 6289.25 samples/sec Loss 2.6602 LearningRate 0.0000 Epoch: 36 
Global Step: 759470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:49:59,648-Speed 6283.98 samples/sec Loss 2.6696 LearningRate 0.0000 Epoch: 36 Global Step: 759480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:02,905-Speed 6288.41 samples/sec Loss 2.6752 LearningRate 0.0000 Epoch: 36 Global Step: 759490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:06,172-Speed 6271.22 samples/sec Loss 2.6108 LearningRate 0.0000 Epoch: 36 Global Step: 759500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:50:09,445-Speed 6257.89 samples/sec Loss 2.6418 LearningRate 0.0000 Epoch: 36 Global Step: 759510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:50:12,690-Speed 6313.21 samples/sec Loss 2.6857 LearningRate 0.0000 Epoch: 36 Global Step: 759520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:15,954-Speed 6276.44 samples/sec Loss 2.7249 LearningRate 0.0000 Epoch: 36 Global Step: 759530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:19,267-Speed 6182.52 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 759540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:22,529-Speed 6279.66 samples/sec Loss 2.6713 LearningRate 0.0000 Epoch: 36 Global Step: 759550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:25,787-Speed 6287.98 samples/sec Loss 2.6751 LearningRate 0.0000 Epoch: 36 Global Step: 759560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:29,040-Speed 6295.82 samples/sec Loss 2.6325 LearningRate 0.0000 Epoch: 36 Global Step: 759570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:32,292-Speed 6299.02 samples/sec Loss 2.6040 LearningRate 0.0000 Epoch: 36 Global Step: 759580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:35,550-Speed 6288.18 samples/sec Loss 2.6678 LearningRate 0.0000 Epoch: 36 Global Step: 759590 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:50:38,807-Speed 6290.14 samples/sec Loss 2.6345 LearningRate 0.0000 Epoch: 36 Global Step: 759600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:42,067-Speed 6282.48 samples/sec Loss 2.6762 LearningRate 0.0000 Epoch: 36 Global Step: 759610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:45,306-Speed 6325.89 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 Global Step: 759620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:48,597-Speed 6224.96 samples/sec Loss 2.6844 LearningRate 0.0000 Epoch: 36 Global Step: 759630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:51,871-Speed 6255.95 samples/sec Loss 2.6978 LearningRate 0.0000 Epoch: 36 Global Step: 759640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:55,192-Speed 6167.25 samples/sec Loss 2.6188 LearningRate 0.0000 Epoch: 36 Global Step: 759650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:50:58,445-Speed 6297.36 samples/sec Loss 2.6492 LearningRate 0.0000 Epoch: 36 Global Step: 759660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:01,703-Speed 6288.80 samples/sec Loss 2.6430 LearningRate 0.0000 Epoch: 36 Global Step: 759670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:04,966-Speed 6276.26 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 36 Global Step: 759680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:08,225-Speed 6286.12 samples/sec Loss 2.5999 LearningRate 0.0000 Epoch: 36 Global Step: 759690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:11,483-Speed 6286.94 samples/sec Loss 2.6446 LearningRate 0.0000 Epoch: 36 Global Step: 759700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:14,737-Speed 6295.49 samples/sec Loss 2.6193 LearningRate 0.0000 Epoch: 36 Global Step: 759710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:17,979-Speed 6319.51 
samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 36 Global Step: 759720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:21,233-Speed 6295.21 samples/sec Loss 2.6515 LearningRate 0.0000 Epoch: 36 Global Step: 759730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:24,486-Speed 6296.57 samples/sec Loss 2.6731 LearningRate 0.0000 Epoch: 36 Global Step: 759740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:27,754-Speed 6268.27 samples/sec Loss 2.6490 LearningRate 0.0000 Epoch: 36 Global Step: 759750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:31,023-Speed 6266.36 samples/sec Loss 2.6524 LearningRate 0.0000 Epoch: 36 Global Step: 759760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:34,283-Speed 6282.26 samples/sec Loss 2.6496 LearningRate 0.0000 Epoch: 36 Global Step: 759770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:37,538-Speed 6293.33 samples/sec Loss 2.6522 LearningRate 0.0000 Epoch: 36 Global Step: 759780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:40,796-Speed 6287.34 samples/sec Loss 2.6365 LearningRate 0.0000 Epoch: 36 Global Step: 759790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:44,056-Speed 6285.45 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 36 Global Step: 759800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:47,313-Speed 6288.87 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 36 Global Step: 759810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:50,550-Speed 6329.56 samples/sec Loss 2.6678 LearningRate 0.0000 Epoch: 36 Global Step: 759820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:53,795-Speed 6311.40 samples/sec Loss 2.7088 LearningRate 0.0000 Epoch: 36 Global Step: 759830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:51:57,047-Speed 6299.71 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 
Global Step: 759840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:00,301-Speed 6294.46 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 759850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:03,557-Speed 6292.26 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 759860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:06,819-Speed 6280.08 samples/sec Loss 2.6797 LearningRate 0.0000 Epoch: 36 Global Step: 759870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:10,077-Speed 6287.23 samples/sec Loss 2.6345 LearningRate 0.0000 Epoch: 36 Global Step: 759880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:13,333-Speed 6291.29 samples/sec Loss 2.6880 LearningRate 0.0000 Epoch: 36 Global Step: 759890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:16,597-Speed 6275.32 samples/sec Loss 2.6558 LearningRate 0.0000 Epoch: 36 Global Step: 759900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:19,861-Speed 6276.61 samples/sec Loss 2.6279 LearningRate 0.0000 Epoch: 36 Global Step: 759910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:23,100-Speed 6324.57 samples/sec Loss 2.5916 LearningRate 0.0000 Epoch: 36 Global Step: 759920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:26,368-Speed 6267.06 samples/sec Loss 2.6621 LearningRate 0.0000 Epoch: 36 Global Step: 759930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:29,619-Speed 6302.05 samples/sec Loss 2.6907 LearningRate 0.0000 Epoch: 36 Global Step: 759940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:32,880-Speed 6281.93 samples/sec Loss 2.6676 LearningRate 0.0000 Epoch: 36 Global Step: 759950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:36,140-Speed 6282.62 samples/sec Loss 2.6894 LearningRate 0.0000 Epoch: 36 Global Step: 759960 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:52:39,393-Speed 6297.99 samples/sec Loss 2.6858 LearningRate 0.0000 Epoch: 36 Global Step: 759970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:42,648-Speed 6292.02 samples/sec Loss 2.6531 LearningRate 0.0000 Epoch: 36 Global Step: 759980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:45,906-Speed 6288.18 samples/sec Loss 2.6070 LearningRate 0.0000 Epoch: 36 Global Step: 759990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:49,157-Speed 6302.29 samples/sec Loss 2.6387 LearningRate 0.0000 Epoch: 36 Global Step: 760000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:52,415-Speed 6286.43 samples/sec Loss 2.6895 LearningRate 0.0000 Epoch: 36 Global Step: 760010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:55,653-Speed 6327.09 samples/sec Loss 2.6125 LearningRate 0.0000 Epoch: 36 Global Step: 760020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:52:58,908-Speed 6292.18 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 36 Global Step: 760030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:02,164-Speed 6292.11 samples/sec Loss 2.6346 LearningRate 0.0000 Epoch: 36 Global Step: 760040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:05,428-Speed 6274.96 samples/sec Loss 2.6701 LearningRate 0.0000 Epoch: 36 Global Step: 760050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:08,689-Speed 6282.91 samples/sec Loss 2.6674 LearningRate 0.0000 Epoch: 36 Global Step: 760060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:11,947-Speed 6286.45 samples/sec Loss 2.6615 LearningRate 0.0000 Epoch: 36 Global Step: 760070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:15,207-Speed 6284.39 samples/sec Loss 2.6444 LearningRate 0.0000 Epoch: 36 Global Step: 760080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:18,464-Speed 6289.38 
samples/sec Loss 2.6635 LearningRate 0.0000 Epoch: 36 Global Step: 760090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:21,726-Speed 6280.25 samples/sec Loss 2.6531 LearningRate 0.0000 Epoch: 36 Global Step: 760100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:24,986-Speed 6282.35 samples/sec Loss 2.6613 LearningRate 0.0000 Epoch: 36 Global Step: 760110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:28,224-Speed 6328.10 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 36 Global Step: 760120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:31,485-Speed 6279.74 samples/sec Loss 2.6663 LearningRate 0.0000 Epoch: 36 Global Step: 760130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:34,743-Speed 6288.43 samples/sec Loss 2.6963 LearningRate 0.0000 Epoch: 36 Global Step: 760140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:38,006-Speed 6276.97 samples/sec Loss 2.6793 LearningRate 0.0000 Epoch: 36 Global Step: 760150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:41,265-Speed 6285.72 samples/sec Loss 2.6851 LearningRate 0.0000 Epoch: 36 Global Step: 760160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:44,524-Speed 6286.19 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 760170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:47,784-Speed 6283.86 samples/sec Loss 2.6178 LearningRate 0.0000 Epoch: 36 Global Step: 760180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:51,041-Speed 6289.61 samples/sec Loss 2.6394 LearningRate 0.0000 Epoch: 36 Global Step: 760190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:54,295-Speed 6293.56 samples/sec Loss 2.6635 LearningRate 0.0000 Epoch: 36 Global Step: 760200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:53:57,551-Speed 6292.91 samples/sec Loss 2.6766 LearningRate 0.0000 Epoch: 36 
Global Step: 760210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:00,801-Speed 6303.29 samples/sec Loss 2.6528 LearningRate 0.0000 Epoch: 36 Global Step: 760220 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:54:04,043-Speed 6319.25 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 36 Global Step: 760230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:07,298-Speed 6292.38 samples/sec Loss 2.6412 LearningRate 0.0000 Epoch: 36 Global Step: 760240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:10,557-Speed 6286.74 samples/sec Loss 2.6597 LearningRate 0.0000 Epoch: 36 Global Step: 760250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:13,813-Speed 6291.14 samples/sec Loss 2.6556 LearningRate 0.0000 Epoch: 36 Global Step: 760260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:17,083-Speed 6263.12 samples/sec Loss 2.6937 LearningRate 0.0000 Epoch: 36 Global Step: 760270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:20,348-Speed 6274.49 samples/sec Loss 2.6345 LearningRate 0.0000 Epoch: 36 Global Step: 760280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:23,603-Speed 6292.84 samples/sec Loss 2.7205 LearningRate 0.0000 Epoch: 36 Global Step: 760290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:26,865-Speed 6281.37 samples/sec Loss 2.6350 LearningRate 0.0000 Epoch: 36 Global Step: 760300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:30,126-Speed 6281.25 samples/sec Loss 2.6121 LearningRate 0.0000 Epoch: 36 Global Step: 760310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:33,392-Speed 6270.36 samples/sec Loss 2.6201 LearningRate 0.0000 Epoch: 36 Global Step: 760320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:36,637-Speed 6313.79 samples/sec Loss 2.6192 LearningRate 0.0000 Epoch: 36 Global Step: 760330 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:54:39,902-Speed 6273.02 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 36 Global Step: 760340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:43,163-Speed 6282.16 samples/sec Loss 2.6527 LearningRate 0.0000 Epoch: 36 Global Step: 760350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:46,417-Speed 6294.91 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 36 Global Step: 760360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:49,683-Speed 6272.31 samples/sec Loss 2.6831 LearningRate 0.0000 Epoch: 36 Global Step: 760370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:52,954-Speed 6263.26 samples/sec Loss 2.7006 LearningRate 0.0000 Epoch: 36 Global Step: 760380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:56,212-Speed 6287.36 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 760390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:54:59,465-Speed 6296.96 samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 36 Global Step: 760400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:02,722-Speed 6289.84 samples/sec Loss 2.6298 LearningRate 0.0000 Epoch: 36 Global Step: 760410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:05,981-Speed 6285.27 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 36 Global Step: 760420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:09,223-Speed 6318.87 samples/sec Loss 2.6328 LearningRate 0.0000 Epoch: 36 Global Step: 760430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:12,543-Speed 6170.72 samples/sec Loss 2.6788 LearningRate 0.0000 Epoch: 36 Global Step: 760440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:15,800-Speed 6288.35 samples/sec Loss 2.6979 LearningRate 0.0000 Epoch: 36 Global Step: 760450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:19,051-Speed 6301.94 
samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 36 Global Step: 760460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:22,310-Speed 6285.66 samples/sec Loss 2.6702 LearningRate 0.0000 Epoch: 36 Global Step: 760470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:25,569-Speed 6283.88 samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 36 Global Step: 760480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:28,825-Speed 6291.72 samples/sec Loss 2.6666 LearningRate 0.0000 Epoch: 36 Global Step: 760490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:32,086-Speed 6282.79 samples/sec Loss 2.6826 LearningRate 0.0000 Epoch: 36 Global Step: 760500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:35,346-Speed 6283.06 samples/sec Loss 2.7224 LearningRate 0.0000 Epoch: 36 Global Step: 760510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:38,603-Speed 6289.64 samples/sec Loss 2.6476 LearningRate 0.0000 Epoch: 36 Global Step: 760520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:41,849-Speed 6310.69 samples/sec Loss 2.6195 LearningRate 0.0000 Epoch: 36 Global Step: 760530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:45,108-Speed 6284.62 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 36 Global Step: 760540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:48,366-Speed 6288.56 samples/sec Loss 2.6478 LearningRate 0.0000 Epoch: 36 Global Step: 760550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:51,622-Speed 6290.39 samples/sec Loss 2.6427 LearningRate 0.0000 Epoch: 36 Global Step: 760560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:54,883-Speed 6281.66 samples/sec Loss 2.6111 LearningRate 0.0000 Epoch: 36 Global Step: 760570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:55:58,138-Speed 6293.48 samples/sec Loss 2.6650 LearningRate 0.0000 Epoch: 36 
Global Step: 760580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:01,400-Speed 6280.56 samples/sec Loss 2.6171 LearningRate 0.0000 Epoch: 36 Global Step: 760590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:04,655-Speed 6293.45 samples/sec Loss 2.6333 LearningRate 0.0000 Epoch: 36 Global Step: 760600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:07,912-Speed 6287.58 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 36 Global Step: 760610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:11,170-Speed 6288.06 samples/sec Loss 2.6314 LearningRate 0.0000 Epoch: 36 Global Step: 760620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:14,430-Speed 6284.98 samples/sec Loss 2.6887 LearningRate 0.0000 Epoch: 36 Global Step: 760630 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:56:17,665-Speed 6332.58 samples/sec Loss 2.6235 LearningRate 0.0000 Epoch: 36 Global Step: 760640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:20,920-Speed 6293.59 samples/sec Loss 2.6782 LearningRate 0.0000 Epoch: 36 Global Step: 760650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:24,186-Speed 6271.93 samples/sec Loss 2.7113 LearningRate 0.0000 Epoch: 36 Global Step: 760660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:27,447-Speed 6282.11 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 36 Global Step: 760670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:30,701-Speed 6294.10 samples/sec Loss 2.6136 LearningRate 0.0000 Epoch: 36 Global Step: 760680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:33,965-Speed 6276.08 samples/sec Loss 2.6238 LearningRate 0.0000 Epoch: 36 Global Step: 760690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:37,223-Speed 6287.34 samples/sec Loss 2.6845 LearningRate 0.0000 Epoch: 36 Global Step: 760700 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:56:40,488-Speed 6274.49 samples/sec Loss 2.6811 LearningRate 0.0000 Epoch: 36 Global Step: 760710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:43,743-Speed 6292.94 samples/sec Loss 2.6686 LearningRate 0.0000 Epoch: 36 Global Step: 760720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:47,009-Speed 6272.49 samples/sec Loss 2.6209 LearningRate 0.0000 Epoch: 36 Global Step: 760730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:50,248-Speed 6324.32 samples/sec Loss 2.6507 LearningRate 0.0000 Epoch: 36 Global Step: 760740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:53,503-Speed 6292.23 samples/sec Loss 2.6553 LearningRate 0.0000 Epoch: 36 Global Step: 760750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:56:56,761-Speed 6288.10 samples/sec Loss 2.6138 LearningRate 0.0000 Epoch: 36 Global Step: 760760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:00,026-Speed 6273.73 samples/sec Loss 2.6586 LearningRate 0.0000 Epoch: 36 Global Step: 760770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:03,290-Speed 6275.75 samples/sec Loss 2.6795 LearningRate 0.0000 Epoch: 36 Global Step: 760780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:06,550-Speed 6283.48 samples/sec Loss 2.6409 LearningRate 0.0000 Epoch: 36 Global Step: 760790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:09,800-Speed 6302.83 samples/sec Loss 2.6505 LearningRate 0.0000 Epoch: 36 Global Step: 760800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:13,056-Speed 6293.16 samples/sec Loss 2.6853 LearningRate 0.0000 Epoch: 36 Global Step: 760810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:16,312-Speed 6291.43 samples/sec Loss 2.7001 LearningRate 0.0000 Epoch: 36 Global Step: 760820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:19,569-Speed 6288.86 
samples/sec Loss 2.6160 LearningRate 0.0000 Epoch: 36 Global Step: 760830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:22,822-Speed 6297.05 samples/sec Loss 2.6269 LearningRate 0.0000 Epoch: 36 Global Step: 760840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:26,083-Speed 6281.78 samples/sec Loss 2.6893 LearningRate 0.0000 Epoch: 36 Global Step: 760850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:29,340-Speed 6290.95 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 36 Global Step: 760860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:32,600-Speed 6281.65 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 36 Global Step: 760870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:35,859-Speed 6285.43 samples/sec Loss 2.6581 LearningRate 0.0000 Epoch: 36 Global Step: 760880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:39,117-Speed 6288.90 samples/sec Loss 2.6658 LearningRate 0.0000 Epoch: 36 Global Step: 760890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:42,371-Speed 6294.12 samples/sec Loss 2.6079 LearningRate 0.0000 Epoch: 36 Global Step: 760900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:45,625-Speed 6295.88 samples/sec Loss 2.6773 LearningRate 0.0000 Epoch: 36 Global Step: 760910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:48,872-Speed 6308.50 samples/sec Loss 2.6453 LearningRate 0.0000 Epoch: 36 Global Step: 760920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:52,135-Speed 6278.16 samples/sec Loss 2.6304 LearningRate 0.0000 Epoch: 36 Global Step: 760930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:55,373-Speed 6325.66 samples/sec Loss 2.6576 LearningRate 0.0000 Epoch: 36 Global Step: 760940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:57:58,635-Speed 6279.62 samples/sec Loss 2.6095 LearningRate 0.0000 Epoch: 36 
Global Step: 760950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:01,894-Speed 6284.93 samples/sec Loss 2.6486 LearningRate 0.0000 Epoch: 36 Global Step: 760960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:05,155-Speed 6281.93 samples/sec Loss 2.6478 LearningRate 0.0000 Epoch: 36 Global Step: 760970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:08,418-Speed 6278.66 samples/sec Loss 2.6627 LearningRate 0.0000 Epoch: 36 Global Step: 760980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:11,677-Speed 6284.93 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 36 Global Step: 760990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:14,939-Speed 6281.08 samples/sec Loss 2.6750 LearningRate 0.0000 Epoch: 36 Global Step: 761000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:18,202-Speed 6277.29 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 36 Global Step: 761010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:21,466-Speed 6277.12 samples/sec Loss 2.6860 LearningRate 0.0000 Epoch: 36 Global Step: 761020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:24,730-Speed 6275.00 samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 761030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:27,984-Speed 6296.38 samples/sec Loss 2.6685 LearningRate 0.0000 Epoch: 36 Global Step: 761040 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 12:58:31,228-Speed 6312.84 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 36 Global Step: 761050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:34,489-Speed 6282.05 samples/sec Loss 2.6830 LearningRate 0.0000 Epoch: 36 Global Step: 761060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:37,748-Speed 6286.21 samples/sec Loss 2.6464 LearningRate 0.0000 Epoch: 36 Global Step: 761070 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 12:58:41,005-Speed 6289.18 samples/sec Loss 2.6590 LearningRate 0.0000 Epoch: 36 Global Step: 761080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:44,258-Speed 6297.64 samples/sec Loss 2.6672 LearningRate 0.0000 Epoch: 36 Global Step: 761090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:47,510-Speed 6299.13 samples/sec Loss 2.6567 LearningRate 0.0000 Epoch: 36 Global Step: 761100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:50,766-Speed 6291.57 samples/sec Loss 2.6473 LearningRate 0.0000 Epoch: 36 Global Step: 761110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:54,028-Speed 6279.56 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 Global Step: 761120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:58:57,275-Speed 6308.96 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 Global Step: 761130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:00,530-Speed 6293.26 samples/sec Loss 2.6508 LearningRate 0.0000 Epoch: 36 Global Step: 761140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:03,770-Speed 6322.42 samples/sec Loss 2.6340 LearningRate 0.0000 Epoch: 36 Global Step: 761150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:07,023-Speed 6295.79 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 761160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:10,287-Speed 6276.84 samples/sec Loss 2.6255 LearningRate 0.0000 Epoch: 36 Global Step: 761170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:13,552-Speed 6273.56 samples/sec Loss 2.6284 LearningRate 0.0000 Epoch: 36 Global Step: 761180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:16,830-Speed 6250.09 samples/sec Loss 2.6517 LearningRate 0.0000 Epoch: 36 Global Step: 761190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:20,091-Speed 6282.15 
samples/sec Loss 2.6379 LearningRate 0.0000 Epoch: 36 Global Step: 761200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:23,348-Speed 6287.98 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 761210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:26,607-Speed 6286.63 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 36 Global Step: 761220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:29,863-Speed 6290.31 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 36 Global Step: 761230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:33,116-Speed 6296.78 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 36 Global Step: 761240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:36,358-Speed 6319.31 samples/sec Loss 2.6637 LearningRate 0.0000 Epoch: 36 Global Step: 761250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:39,613-Speed 6293.43 samples/sec Loss 2.6186 LearningRate 0.0000 Epoch: 36 Global Step: 761260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:42,871-Speed 6287.84 samples/sec Loss 2.6612 LearningRate 0.0000 Epoch: 36 Global Step: 761270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:46,119-Speed 6306.17 samples/sec Loss 2.6551 LearningRate 0.0000 Epoch: 36 Global Step: 761280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:49,378-Speed 6286.53 samples/sec Loss 2.6442 LearningRate 0.0000 Epoch: 36 Global Step: 761290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:52,632-Speed 6295.16 samples/sec Loss 2.6398 LearningRate 0.0000 Epoch: 36 Global Step: 761300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:55,882-Speed 6302.73 samples/sec Loss 2.5932 LearningRate 0.0000 Epoch: 36 Global Step: 761310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 12:59:59,146-Speed 6276.02 samples/sec Loss 2.7134 LearningRate 0.0000 Epoch: 36 
Global Step: 761320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:02,407-Speed 6281.34 samples/sec Loss 2.6754 LearningRate 0.0000 Epoch: 36 Global Step: 761330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:05,674-Speed 6270.26 samples/sec Loss 2.6031 LearningRate 0.0000 Epoch: 36 Global Step: 761340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:08,914-Speed 6321.10 samples/sec Loss 2.6285 LearningRate 0.0000 Epoch: 36 Global Step: 761350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:12,173-Speed 6285.24 samples/sec Loss 2.6498 LearningRate 0.0000 Epoch: 36 Global Step: 761360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:15,430-Speed 6289.41 samples/sec Loss 2.6495 LearningRate 0.0000 Epoch: 36 Global Step: 761370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:18,691-Speed 6281.89 samples/sec Loss 2.6499 LearningRate 0.0000 Epoch: 36 Global Step: 761380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:21,950-Speed 6287.05 samples/sec Loss 2.6052 LearningRate 0.0000 Epoch: 36 Global Step: 761390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:25,209-Speed 6286.19 samples/sec Loss 2.6586 LearningRate 0.0000 Epoch: 36 Global Step: 761400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:28,470-Speed 6280.79 samples/sec Loss 2.6832 LearningRate 0.0000 Epoch: 36 Global Step: 761410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:31,722-Speed 6300.02 samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 36 Global Step: 761420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:34,981-Speed 6285.75 samples/sec Loss 2.6215 LearningRate 0.0000 Epoch: 36 Global Step: 761430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:38,243-Speed 6279.49 samples/sec Loss 2.6608 LearningRate 0.0000 Epoch: 36 Global Step: 761440 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:00:41,484-Speed 6321.13 samples/sec Loss 2.6846 LearningRate 0.0000 Epoch: 36 Global Step: 761450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:44,748-Speed 6274.24 samples/sec Loss 2.6813 LearningRate 0.0000 Epoch: 36 Global Step: 761460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:48,014-Speed 6272.32 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 36 Global Step: 761470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:51,271-Speed 6289.82 samples/sec Loss 2.6337 LearningRate 0.0000 Epoch: 36 Global Step: 761480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:54,529-Speed 6286.75 samples/sec Loss 2.6730 LearningRate 0.0000 Epoch: 36 Global Step: 761490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:00:57,790-Speed 6282.68 samples/sec Loss 2.6463 LearningRate 0.0000 Epoch: 36 Global Step: 761500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:01,049-Speed 6285.08 samples/sec Loss 2.6307 LearningRate 0.0000 Epoch: 36 Global Step: 761510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:04,313-Speed 6276.08 samples/sec Loss 2.6740 LearningRate 0.0000 Epoch: 36 Global Step: 761520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:07,568-Speed 6292.39 samples/sec Loss 2.6543 LearningRate 0.0000 Epoch: 36 Global Step: 761530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:10,824-Speed 6292.84 samples/sec Loss 2.6267 LearningRate 0.0000 Epoch: 36 Global Step: 761540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:14,076-Speed 6297.57 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 36 Global Step: 761550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:01:17,324-Speed 6306.89 samples/sec Loss 2.6500 LearningRate 0.0000 Epoch: 36 Global Step: 761560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:20,577-Speed 6298.28 
samples/sec Loss 2.6688 LearningRate 0.0000 Epoch: 36 Global Step: 761570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:23,838-Speed 6282.58 samples/sec Loss 2.6897 LearningRate 0.0000 Epoch: 36 Global Step: 761580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:27,099-Speed 6280.27 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 761590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:30,366-Speed 6271.85 samples/sec Loss 2.6295 LearningRate 0.0000 Epoch: 36 Global Step: 761600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:33,624-Speed 6287.40 samples/sec Loss 2.6769 LearningRate 0.0000 Epoch: 36 Global Step: 761610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:36,875-Speed 6300.92 samples/sec Loss 2.6723 LearningRate 0.0000 Epoch: 36 Global Step: 761620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:40,139-Speed 6275.35 samples/sec Loss 2.6751 LearningRate 0.0000 Epoch: 36 Global Step: 761630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:43,399-Speed 6283.98 samples/sec Loss 2.6440 LearningRate 0.0000 Epoch: 36 Global Step: 761640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:46,656-Speed 6289.39 samples/sec Loss 2.6479 LearningRate 0.0000 Epoch: 36 Global Step: 761650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:49,899-Speed 6316.03 samples/sec Loss 2.6951 LearningRate 0.0000 Epoch: 36 Global Step: 761660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:53,154-Speed 6293.95 samples/sec Loss 2.6452 LearningRate 0.0000 Epoch: 36 Global Step: 761670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:56,413-Speed 6285.12 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 761680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:01:59,668-Speed 6293.13 samples/sec Loss 2.6981 LearningRate 0.0000 Epoch: 36 
Global Step: 761690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:02,919-Speed 6302.42 samples/sec Loss 2.6246 LearningRate 0.0000 Epoch: 36 Global Step: 761700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:06,179-Speed 6283.34 samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 761710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:09,430-Speed 6300.64 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 36 Global Step: 761720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:12,690-Speed 6282.85 samples/sec Loss 2.6468 LearningRate 0.0000 Epoch: 36 Global Step: 761730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:15,952-Speed 6279.91 samples/sec Loss 2.6282 LearningRate 0.0000 Epoch: 36 Global Step: 761740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:19,214-Speed 6279.47 samples/sec Loss 2.6359 LearningRate 0.0000 Epoch: 36 Global Step: 761750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:22,461-Speed 6308.45 samples/sec Loss 2.6165 LearningRate 0.0000 Epoch: 36 Global Step: 761760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:25,719-Speed 6289.12 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 36 Global Step: 761770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:28,978-Speed 6284.51 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 36 Global Step: 761780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:32,236-Speed 6288.27 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 36 Global Step: 761790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:35,501-Speed 6274.86 samples/sec Loss 2.6681 LearningRate 0.0000 Epoch: 36 Global Step: 761800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:38,763-Speed 6278.43 samples/sec Loss 2.6352 LearningRate 0.0000 Epoch: 36 Global Step: 761810 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:02:42,023-Speed 6283.98 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 761820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:45,278-Speed 6293.50 samples/sec Loss 2.6800 LearningRate 0.0000 Epoch: 36 Global Step: 761830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:48,541-Speed 6278.84 samples/sec Loss 2.6721 LearningRate 0.0000 Epoch: 36 Global Step: 761840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:51,801-Speed 6283.17 samples/sec Loss 2.6712 LearningRate 0.0000 Epoch: 36 Global Step: 761850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:02:55,059-Speed 6287.56 samples/sec Loss 2.6757 LearningRate 0.0000 Epoch: 36 Global Step: 761860 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:02:58,300-Speed 6320.03 samples/sec Loss 2.6463 LearningRate 0.0000 Epoch: 36 Global Step: 761870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:01,561-Speed 6281.23 samples/sec Loss 2.6619 LearningRate 0.0000 Epoch: 36 Global Step: 761880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:04,816-Speed 6294.35 samples/sec Loss 2.6974 LearningRate 0.0000 Epoch: 36 Global Step: 761890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:08,072-Speed 6289.89 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 36 Global Step: 761900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:11,324-Speed 6298.59 samples/sec Loss 2.6154 LearningRate 0.0000 Epoch: 36 Global Step: 761910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:14,588-Speed 6275.95 samples/sec Loss 2.6867 LearningRate 0.0000 Epoch: 36 Global Step: 761920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:17,838-Speed 6303.15 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 36 Global Step: 761930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:21,094-Speed 6291.32 
samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 36 Global Step: 761940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:24,379-Speed 6236.69 samples/sec Loss 2.6078 LearningRate 0.0000 Epoch: 36 Global Step: 761950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:27,634-Speed 6293.13 samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 36 Global Step: 761960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:30,891-Speed 6290.69 samples/sec Loss 2.6238 LearningRate 0.0000 Epoch: 36 Global Step: 761970 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:03:34,153-Speed 6279.75 samples/sec Loss 2.6367 LearningRate 0.0000 Epoch: 36 Global Step: 761980 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:03:37,390-Speed 6327.53 samples/sec Loss 2.6095 LearningRate 0.0000 Epoch: 36 Global Step: 761990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:40,647-Speed 6289.20 samples/sec Loss 2.6871 LearningRate 0.0000 Epoch: 36 Global Step: 762000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:43,904-Speed 6289.42 samples/sec Loss 2.6913 LearningRate 0.0000 Epoch: 36 Global Step: 762010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:47,163-Speed 6285.76 samples/sec Loss 2.6526 LearningRate 0.0000 Epoch: 36 Global Step: 762020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:50,417-Speed 6295.44 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 36 Global Step: 762030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:53,672-Speed 6293.07 samples/sec Loss 2.6683 LearningRate 0.0000 Epoch: 36 Global Step: 762040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:03:56,941-Speed 6266.18 samples/sec Loss 2.6280 LearningRate 0.0000 Epoch: 36 Global Step: 762050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:00,198-Speed 6289.10 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 36 
Global Step: 762060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:03,466-Speed 6269.27 samples/sec Loss 2.6499 LearningRate 0.0000 Epoch: 36 Global Step: 762070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:06,722-Speed 6291.42 samples/sec Loss 2.6315 LearningRate 0.0000 Epoch: 36 Global Step: 762080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:09,964-Speed 6317.30 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 36 Global Step: 762090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:13,221-Speed 6290.56 samples/sec Loss 2.6507 LearningRate 0.0000 Epoch: 36 Global Step: 762100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:16,488-Speed 6268.51 samples/sec Loss 2.6743 LearningRate 0.0000 Epoch: 36 Global Step: 762110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:19,745-Speed 6290.54 samples/sec Loss 2.6646 LearningRate 0.0000 Epoch: 36 Global Step: 762120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:22,997-Speed 6298.02 samples/sec Loss 2.6325 LearningRate 0.0000 Epoch: 36 Global Step: 762130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:26,254-Speed 6289.01 samples/sec Loss 2.6467 LearningRate 0.0000 Epoch: 36 Global Step: 762140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:29,510-Speed 6292.90 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 36 Global Step: 762150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:32,763-Speed 6297.94 samples/sec Loss 2.6262 LearningRate 0.0000 Epoch: 36 Global Step: 762160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:36,014-Speed 6300.96 samples/sec Loss 2.6563 LearningRate 0.0000 Epoch: 36 Global Step: 762170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:39,266-Speed 6298.75 samples/sec Loss 2.6822 LearningRate 0.0000 Epoch: 36 Global Step: 762180 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:04:42,523-Speed 6289.52 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 36 Global Step: 762190 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:04:45,760-Speed 6327.56 samples/sec Loss 2.6565 LearningRate 0.0000 Epoch: 36 Global Step: 762200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:49,021-Speed 6281.85 samples/sec Loss 2.6155 LearningRate 0.0000 Epoch: 36 Global Step: 762210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:52,278-Speed 6289.45 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 36 Global Step: 762220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:55,546-Speed 6269.35 samples/sec Loss 2.6664 LearningRate 0.0000 Epoch: 36 Global Step: 762230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:04:58,804-Speed 6286.55 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 36 Global Step: 762240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:02,062-Speed 6287.94 samples/sec Loss 2.6517 LearningRate 0.0000 Epoch: 36 Global Step: 762250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:05,318-Speed 6291.01 samples/sec Loss 2.7012 LearningRate 0.0000 Epoch: 36 Global Step: 762260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:08,574-Speed 6290.62 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 36 Global Step: 762270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:11,839-Speed 6274.61 samples/sec Loss 2.6865 LearningRate 0.0000 Epoch: 36 Global Step: 762280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:15,106-Speed 6281.84 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 36 Global Step: 762290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:18,347-Speed 6320.49 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 36 Global Step: 762300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:21,607-Speed 6283.67 
samples/sec Loss 2.6256 LearningRate 0.0000 Epoch: 36 Global Step: 762310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:24,867-Speed 6282.22 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 36 Global Step: 762320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:28,134-Speed 6270.97 samples/sec Loss 2.6469 LearningRate 0.0000 Epoch: 36 Global Step: 762330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:31,397-Speed 6277.32 samples/sec Loss 2.6123 LearningRate 0.0000 Epoch: 36 Global Step: 762340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:34,657-Speed 6284.07 samples/sec Loss 2.5805 LearningRate 0.0000 Epoch: 36 Global Step: 762350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:37,926-Speed 6267.57 samples/sec Loss 2.6094 LearningRate 0.0000 Epoch: 36 Global Step: 762360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:41,177-Speed 6301.35 samples/sec Loss 2.6067 LearningRate 0.0000 Epoch: 36 Global Step: 762370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:44,429-Speed 6298.16 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 36 Global Step: 762380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:47,687-Speed 6287.53 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 36 Global Step: 762390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:50,939-Speed 6299.80 samples/sec Loss 2.6315 LearningRate 0.0000 Epoch: 36 Global Step: 762400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:05:54,186-Speed 6308.13 samples/sec Loss 2.6551 LearningRate 0.0000 Epoch: 36 Global Step: 762410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:05:57,438-Speed 6298.34 samples/sec Loss 2.6634 LearningRate 0.0000 Epoch: 36 Global Step: 762420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:00,719-Speed 6244.84 samples/sec Loss 2.6738 LearningRate 0.0000 Epoch: 36 
Global Step: 762430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:03,975-Speed 6291.01 samples/sec Loss 2.6641 LearningRate 0.0000 Epoch: 36 Global Step: 762440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:07,234-Speed 6285.15 samples/sec Loss 2.6357 LearningRate 0.0000 Epoch: 36 Global Step: 762450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:10,490-Speed 6291.69 samples/sec Loss 2.6325 LearningRate 0.0000 Epoch: 36 Global Step: 762460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:13,746-Speed 6290.62 samples/sec Loss 2.6493 LearningRate 0.0000 Epoch: 36 Global Step: 762470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:17,017-Speed 6262.51 samples/sec Loss 2.6372 LearningRate 0.0000 Epoch: 36 Global Step: 762480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:20,271-Speed 6295.11 samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 762490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:23,533-Speed 6280.58 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 36 Global Step: 762500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:26,777-Speed 6314.64 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 36 Global Step: 762510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:30,032-Speed 6292.99 samples/sec Loss 2.6854 LearningRate 0.0000 Epoch: 36 Global Step: 762520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:33,287-Speed 6293.39 samples/sec Loss 2.6007 LearningRate 0.0000 Epoch: 36 Global Step: 762530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:36,542-Speed 6293.60 samples/sec Loss 2.6504 LearningRate 0.0000 Epoch: 36 Global Step: 762540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:39,804-Speed 6279.34 samples/sec Loss 2.6615 LearningRate 0.0000 Epoch: 36 Global Step: 762550 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:06:43,063-Speed 6285.31 samples/sec Loss 2.6337 LearningRate 0.0000 Epoch: 36 Global Step: 762560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:46,323-Speed 6284.80 samples/sec Loss 2.6700 LearningRate 0.0000 Epoch: 36 Global Step: 762570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:49,578-Speed 6292.38 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 762580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:52,833-Speed 6294.00 samples/sec Loss 2.5757 LearningRate 0.0000 Epoch: 36 Global Step: 762590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:56,091-Speed 6287.27 samples/sec Loss 2.6482 LearningRate 0.0000 Epoch: 36 Global Step: 762600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:06:59,346-Speed 6293.51 samples/sec Loss 2.6873 LearningRate 0.0000 Epoch: 36 Global Step: 762610 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:07:02,596-Speed 6303.33 samples/sec Loss 2.6403 LearningRate 0.0000 Epoch: 36 Global Step: 762620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:05,892-Speed 6215.27 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 36 Global Step: 762630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:09,154-Speed 6279.27 samples/sec Loss 2.6448 LearningRate 0.0000 Epoch: 36 Global Step: 762640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:12,425-Speed 6261.29 samples/sec Loss 2.6609 LearningRate 0.0000 Epoch: 36 Global Step: 762650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:15,687-Speed 6279.98 samples/sec Loss 2.6259 LearningRate 0.0000 Epoch: 36 Global Step: 762660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:18,958-Speed 6262.07 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 36 Global Step: 762670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:22,221-Speed 6278.39 
samples/sec Loss 2.6789 LearningRate 0.0000 Epoch: 36 Global Step: 762680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:25,501-Speed 6245.65 samples/sec Loss 2.6790 LearningRate 0.0000 Epoch: 36 Global Step: 762690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:28,755-Speed 6295.70 samples/sec Loss 2.6110 LearningRate 0.0000 Epoch: 36 Global Step: 762700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:32,020-Speed 6273.17 samples/sec Loss 2.6718 LearningRate 0.0000 Epoch: 36 Global Step: 762710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:35,261-Speed 6321.34 samples/sec Loss 2.6746 LearningRate 0.0000 Epoch: 36 Global Step: 762720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:38,516-Speed 6293.06 samples/sec Loss 2.6461 LearningRate 0.0000 Epoch: 36 Global Step: 762730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:41,770-Speed 6296.30 samples/sec Loss 2.6382 LearningRate 0.0000 Epoch: 36 Global Step: 762740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:45,029-Speed 6284.86 samples/sec Loss 2.6621 LearningRate 0.0000 Epoch: 36 Global Step: 762750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:48,289-Speed 6283.29 samples/sec Loss 2.6746 LearningRate 0.0000 Epoch: 36 Global Step: 762760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:51,545-Speed 6291.60 samples/sec Loss 2.6620 LearningRate 0.0000 Epoch: 36 Global Step: 762770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:54,800-Speed 6294.34 samples/sec Loss 2.6622 LearningRate 0.0000 Epoch: 36 Global Step: 762780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:07:58,059-Speed 6285.73 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 36 Global Step: 762790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:01,313-Speed 6294.71 samples/sec Loss 2.6020 LearningRate 0.0000 Epoch: 36 
Global Step: 762800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:04,571-Speed 6287.54 samples/sec Loss 2.6760 LearningRate 0.0000 Epoch: 36 Global Step: 762810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:07,809-Speed 6326.11 samples/sec Loss 2.6543 LearningRate 0.0000 Epoch: 36 Global Step: 762820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:11,054-Speed 6311.38 samples/sec Loss 2.6525 LearningRate 0.0000 Epoch: 36 Global Step: 762830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:14,314-Speed 6284.92 samples/sec Loss 2.6144 LearningRate 0.0000 Epoch: 36 Global Step: 762840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:17,568-Speed 6294.53 samples/sec Loss 2.6638 LearningRate 0.0000 Epoch: 36 Global Step: 762850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:20,821-Speed 6298.14 samples/sec Loss 2.6607 LearningRate 0.0000 Epoch: 36 Global Step: 762860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:24,083-Speed 6279.63 samples/sec Loss 2.6368 LearningRate 0.0000 Epoch: 36 Global Step: 762870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:27,337-Speed 6294.43 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 36 Global Step: 762880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:30,592-Speed 6293.18 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 36 Global Step: 762890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:33,850-Speed 6287.14 samples/sec Loss 2.6943 LearningRate 0.0000 Epoch: 36 Global Step: 762900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:37,107-Speed 6290.34 samples/sec Loss 2.6796 LearningRate 0.0000 Epoch: 36 Global Step: 762910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:40,354-Speed 6308.00 samples/sec Loss 2.6349 LearningRate 0.0000 Epoch: 36 Global Step: 762920 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:08:43,614-Speed 6285.00 samples/sec Loss 2.6480 LearningRate 0.0000 Epoch: 36 Global Step: 762930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:46,872-Speed 6287.38 samples/sec Loss 2.6130 LearningRate 0.0000 Epoch: 36 Global Step: 762940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:50,127-Speed 6293.24 samples/sec Loss 2.6888 LearningRate 0.0000 Epoch: 36 Global Step: 762950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:53,383-Speed 6291.70 samples/sec Loss 2.6713 LearningRate 0.0000 Epoch: 36 Global Step: 762960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:56,641-Speed 6287.38 samples/sec Loss 2.6432 LearningRate 0.0000 Epoch: 36 Global Step: 762970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:08:59,890-Speed 6305.33 samples/sec Loss 2.6486 LearningRate 0.0000 Epoch: 36 Global Step: 762980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:03,155-Speed 6274.69 samples/sec Loss 2.6577 LearningRate 0.0000 Epoch: 36 Global Step: 762990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:06,410-Speed 6291.75 samples/sec Loss 2.6126 LearningRate 0.0000 Epoch: 36 Global Step: 763000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:09,670-Speed 6284.58 samples/sec Loss 2.6010 LearningRate 0.0000 Epoch: 36 Global Step: 763010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:12,912-Speed 6317.90 samples/sec Loss 2.6243 LearningRate 0.0000 Epoch: 36 Global Step: 763020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:16,172-Speed 6283.81 samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 763030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:19,427-Speed 6294.11 samples/sec Loss 2.6824 LearningRate 0.0000 Epoch: 36 Global Step: 763040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:22,681-Speed 6293.71 
samples/sec Loss 2.6556 LearningRate 0.0000 Epoch: 36 Global Step: 763050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:25,935-Speed 6295.91 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 36 Global Step: 763060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:29,203-Speed 6269.06 samples/sec Loss 2.6552 LearningRate 0.0000 Epoch: 36 Global Step: 763070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:32,465-Speed 6278.06 samples/sec Loss 2.6668 LearningRate 0.0000 Epoch: 36 Global Step: 763080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:35,724-Speed 6285.96 samples/sec Loss 2.6450 LearningRate 0.0000 Epoch: 36 Global Step: 763090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:38,983-Speed 6285.99 samples/sec Loss 2.6209 LearningRate 0.0000 Epoch: 36 Global Step: 763100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:42,253-Speed 6265.25 samples/sec Loss 2.6120 LearningRate 0.0000 Epoch: 36 Global Step: 763110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:45,519-Speed 6270.48 samples/sec Loss 2.6571 LearningRate 0.0000 Epoch: 36 Global Step: 763120 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:09:48,755-Speed 6329.69 samples/sec Loss 2.6149 LearningRate 0.0000 Epoch: 36 Global Step: 763130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:52,025-Speed 6265.47 samples/sec Loss 2.6680 LearningRate 0.0000 Epoch: 36 Global Step: 763140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:55,283-Speed 6287.22 samples/sec Loss 2.6799 LearningRate 0.0000 Epoch: 36 Global Step: 763150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:09:58,540-Speed 6291.26 samples/sec Loss 2.6353 LearningRate 0.0000 Epoch: 36 Global Step: 763160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:01,794-Speed 6294.61 samples/sec Loss 2.6349 LearningRate 0.0000 Epoch: 36 
Global Step: 763170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:05,045-Speed 6300.08 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 36 Global Step: 763180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:08,299-Speed 6294.96 samples/sec Loss 2.6988 LearningRate 0.0000 Epoch: 36 Global Step: 763190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:11,556-Speed 6290.96 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 36 Global Step: 763200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:14,816-Speed 6282.93 samples/sec Loss 2.6378 LearningRate 0.0000 Epoch: 36 Global Step: 763210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:18,069-Speed 6296.94 samples/sec Loss 2.6138 LearningRate 0.0000 Epoch: 36 Global Step: 763220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:21,330-Speed 6282.27 samples/sec Loss 2.6689 LearningRate 0.0000 Epoch: 36 Global Step: 763230 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:10:24,582-Speed 6297.86 samples/sec Loss 2.6512 LearningRate 0.0000 Epoch: 36 Global Step: 763240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:27,843-Speed 6281.70 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 763250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:31,096-Speed 6296.68 samples/sec Loss 2.6451 LearningRate 0.0000 Epoch: 36 Global Step: 763260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:34,349-Speed 6298.44 samples/sec Loss 2.6705 LearningRate 0.0000 Epoch: 36 Global Step: 763270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:37,605-Speed 6290.75 samples/sec Loss 2.6420 LearningRate 0.0000 Epoch: 36 Global Step: 763280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:40,866-Speed 6281.86 samples/sec Loss 2.6244 LearningRate 0.0000 Epoch: 36 Global Step: 763290 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:10:44,126-Speed 6283.99 samples/sec Loss 2.6324 LearningRate 0.0000 Epoch: 36 Global Step: 763300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:47,383-Speed 6290.14 samples/sec Loss 2.6261 LearningRate 0.0000 Epoch: 36 Global Step: 763310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:50,636-Speed 6295.17 samples/sec Loss 2.6593 LearningRate 0.0000 Epoch: 36 Global Step: 763320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:53,896-Speed 6285.59 samples/sec Loss 2.6503 LearningRate 0.0000 Epoch: 36 Global Step: 763330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:10:57,132-Speed 6328.95 samples/sec Loss 2.6139 LearningRate 0.0000 Epoch: 36 Global Step: 763340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:00,388-Speed 6292.47 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 36 Global Step: 763350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:03,637-Speed 6304.20 samples/sec Loss 2.6004 LearningRate 0.0000 Epoch: 36 Global Step: 763360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:06,888-Speed 6302.40 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 36 Global Step: 763370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:10,149-Speed 6281.30 samples/sec Loss 2.6847 LearningRate 0.0000 Epoch: 36 Global Step: 763380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:13,409-Speed 6283.06 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 36 Global Step: 763390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:16,671-Speed 6279.24 samples/sec Loss 2.6836 LearningRate 0.0000 Epoch: 36 Global Step: 763400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:19,927-Speed 6292.18 samples/sec Loss 2.6849 LearningRate 0.0000 Epoch: 36 Global Step: 763410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:23,182-Speed 6293.88 
samples/sec Loss 2.6736 LearningRate 0.0000 Epoch: 36 Global Step: 763420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:26,444-Speed 6279.59 samples/sec Loss 2.6637 LearningRate 0.0000 Epoch: 36 Global Step: 763430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:29,687-Speed 6315.60 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 36 Global Step: 763440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:32,945-Speed 6287.55 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 36 Global Step: 763450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:36,201-Speed 6291.88 samples/sec Loss 2.6589 LearningRate 0.0000 Epoch: 36 Global Step: 763460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:39,460-Speed 6285.93 samples/sec Loss 2.6167 LearningRate 0.0000 Epoch: 36 Global Step: 763470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:42,716-Speed 6290.47 samples/sec Loss 2.6221 LearningRate 0.0000 Epoch: 36 Global Step: 763480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:45,973-Speed 6290.41 samples/sec Loss 2.6574 LearningRate 0.0000 Epoch: 36 Global Step: 763490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:49,231-Speed 6286.56 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 36 Global Step: 763500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:52,490-Speed 6286.30 samples/sec Loss 2.6067 LearningRate 0.0000 Epoch: 36 Global Step: 763510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:55,750-Speed 6282.87 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 36 Global Step: 763520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:11:59,021-Speed 6263.04 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 36 Global Step: 763530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:02,269-Speed 6306.76 samples/sec Loss 2.6209 LearningRate 0.0000 Epoch: 36 
Global Step: 763540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:05,523-Speed 6295.21 samples/sec Loss 2.6479 LearningRate 0.0000 Epoch: 36 Global Step: 763550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:08,796-Speed 6258.63 samples/sec Loss 2.6039 LearningRate 0.0000 Epoch: 36 Global Step: 763560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:12,054-Speed 6288.01 samples/sec Loss 2.6384 LearningRate 0.0000 Epoch: 36 Global Step: 763570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:15,311-Speed 6288.67 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 36 Global Step: 763580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:18,566-Speed 6293.51 samples/sec Loss 2.7044 LearningRate 0.0000 Epoch: 36 Global Step: 763590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:21,816-Speed 6303.57 samples/sec Loss 2.6452 LearningRate 0.0000 Epoch: 36 Global Step: 763600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:25,072-Speed 6292.29 samples/sec Loss 2.6618 LearningRate 0.0000 Epoch: 36 Global Step: 763610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:28,323-Speed 6299.28 samples/sec Loss 2.6315 LearningRate 0.0000 Epoch: 36 Global Step: 763620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:31,579-Speed 6290.95 samples/sec Loss 2.6207 LearningRate 0.0000 Epoch: 36 Global Step: 763630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:34,817-Speed 6327.78 samples/sec Loss 2.6661 LearningRate 0.0000 Epoch: 36 Global Step: 763640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:38,069-Speed 6298.72 samples/sec Loss 2.6506 LearningRate 0.0000 Epoch: 36 Global Step: 763650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:41,328-Speed 6286.33 samples/sec Loss 2.6028 LearningRate 0.0000 Epoch: 36 Global Step: 763660 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:12:44,587-Speed 6284.01 samples/sec Loss 2.6285 LearningRate 0.0000 Epoch: 36 Global Step: 763670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:47,836-Speed 6305.87 samples/sec Loss 2.6902 LearningRate 0.0000 Epoch: 36 Global Step: 763680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:51,094-Speed 6286.94 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 36 Global Step: 763690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:54,350-Speed 6292.51 samples/sec Loss 2.6552 LearningRate 0.0000 Epoch: 36 Global Step: 763700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:12:57,599-Speed 6302.94 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 36 Global Step: 763710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:00,858-Speed 6286.69 samples/sec Loss 2.6466 LearningRate 0.0000 Epoch: 36 Global Step: 763720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:04,116-Speed 6286.51 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 36 Global Step: 763730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:07,380-Speed 6277.20 samples/sec Loss 2.6450 LearningRate 0.0000 Epoch: 36 Global Step: 763740 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:13:10,669-Speed 6228.62 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 36 Global Step: 763750 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:13:13,910-Speed 6320.77 samples/sec Loss 2.6554 LearningRate 0.0000 Epoch: 36 Global Step: 763760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:17,178-Speed 6266.54 samples/sec Loss 2.6736 LearningRate 0.0000 Epoch: 36 Global Step: 763770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:20,438-Speed 6285.70 samples/sec Loss 2.6990 LearningRate 0.0000 Epoch: 36 Global Step: 763780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:23,695-Speed 6288.82 
samples/sec Loss 2.6131 LearningRate 0.0000 Epoch: 36 Global Step: 763790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:26,952-Speed 6289.90 samples/sec Loss 2.6535 LearningRate 0.0000 Epoch: 36 Global Step: 763800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:30,210-Speed 6287.46 samples/sec Loss 2.6340 LearningRate 0.0000 Epoch: 36 Global Step: 763810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:33,464-Speed 6294.71 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 36 Global Step: 763820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:36,717-Speed 6297.09 samples/sec Loss 2.6616 LearningRate 0.0000 Epoch: 36 Global Step: 763830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:39,973-Speed 6292.18 samples/sec Loss 2.6845 LearningRate 0.0000 Epoch: 36 Global Step: 763840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:43,225-Speed 6298.68 samples/sec Loss 2.6374 LearningRate 0.0000 Epoch: 36 Global Step: 763850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:46,483-Speed 6286.59 samples/sec Loss 2.6279 LearningRate 0.0000 Epoch: 36 Global Step: 763860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:49,735-Speed 6298.92 samples/sec Loss 2.5962 LearningRate 0.0000 Epoch: 36 Global Step: 763870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:52,998-Speed 6277.15 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 36 Global Step: 763880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:56,264-Speed 6273.53 samples/sec Loss 2.6614 LearningRate 0.0000 Epoch: 36 Global Step: 763890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:13:59,518-Speed 6294.38 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 36 Global Step: 763900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:02,777-Speed 6286.62 samples/sec Loss 2.6359 LearningRate 0.0000 Epoch: 36 
Global Step: 763910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:06,046-Speed 6265.93 samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 36 Global Step: 763920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:09,300-Speed 6295.61 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 763930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:12,554-Speed 6295.69 samples/sec Loss 2.6626 LearningRate 0.0000 Epoch: 36 Global Step: 763940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:15,810-Speed 6291.48 samples/sec Loss 2.6349 LearningRate 0.0000 Epoch: 36 Global Step: 763950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:19,050-Speed 6321.87 samples/sec Loss 2.6838 LearningRate 0.0000 Epoch: 36 Global Step: 763960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:22,305-Speed 6293.88 samples/sec Loss 2.5821 LearningRate 0.0000 Epoch: 36 Global Step: 763970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:25,559-Speed 6294.71 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 36 Global Step: 763980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:28,813-Speed 6295.68 samples/sec Loss 2.6266 LearningRate 0.0000 Epoch: 36 Global Step: 763990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:32,072-Speed 6285.20 samples/sec Loss 2.6948 LearningRate 0.0000 Epoch: 36 Global Step: 764000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:35,332-Speed 6283.43 samples/sec Loss 2.6348 LearningRate 0.0000 Epoch: 36 Global Step: 764010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:38,586-Speed 6296.10 samples/sec Loss 2.6357 LearningRate 0.0000 Epoch: 36 Global Step: 764020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:41,846-Speed 6283.37 samples/sec Loss 2.6949 LearningRate 0.0000 Epoch: 36 Global Step: 764030 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:14:45,100-Speed 6294.64 samples/sec Loss 2.5991 LearningRate 0.0000 Epoch: 36 Global Step: 764040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:48,361-Speed 6281.43 samples/sec Loss 2.5828 LearningRate 0.0000 Epoch: 36 Global Step: 764050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:14:51,627-Speed 6272.70 samples/sec Loss 2.6084 LearningRate 0.0000 Epoch: 36 Global Step: 764060 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:14:54,886-Speed 6286.57 samples/sec Loss 2.6780 LearningRate 0.0000 Epoch: 36 Global Step: 764070 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:14:58,129-Speed 6315.75 samples/sec Loss 2.5702 LearningRate 0.0000 Epoch: 36 Global Step: 764080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:01,389-Speed 6283.13 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 36 Global Step: 764090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:04,650-Speed 6282.52 samples/sec Loss 2.6540 LearningRate 0.0000 Epoch: 36 Global Step: 764100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:07,908-Speed 6286.88 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 36 Global Step: 764110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:11,165-Speed 6289.70 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 36 Global Step: 764120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:14,423-Speed 6288.12 samples/sec Loss 2.6798 LearningRate 0.0000 Epoch: 36 Global Step: 764130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:17,678-Speed 6291.46 samples/sec Loss 2.6207 LearningRate 0.0000 Epoch: 36 Global Step: 764140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:20,945-Speed 6271.12 samples/sec Loss 2.6607 LearningRate 0.0000 Epoch: 36 Global Step: 764150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:24,208-Speed 6277.88 
samples/sec Loss 2.6573 LearningRate 0.0000 Epoch: 36 Global Step: 764160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:27,473-Speed 6274.52 samples/sec Loss 2.6913 LearningRate 0.0000 Epoch: 36 Global Step: 764170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:30,720-Speed 6307.98 samples/sec Loss 2.6193 LearningRate 0.0000 Epoch: 36 Global Step: 764180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:33,997-Speed 6252.64 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 36 Global Step: 764190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:37,252-Speed 6292.25 samples/sec Loss 2.6091 LearningRate 0.0000 Epoch: 36 Global Step: 764200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:40,501-Speed 6304.45 samples/sec Loss 2.6133 LearningRate 0.0000 Epoch: 36 Global Step: 764210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:43,767-Speed 6273.15 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 36 Global Step: 764220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:47,033-Speed 6270.72 samples/sec Loss 2.6080 LearningRate 0.0000 Epoch: 36 Global Step: 764230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:50,291-Speed 6288.37 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 36 Global Step: 764240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:53,559-Speed 6268.68 samples/sec Loss 2.6006 LearningRate 0.0000 Epoch: 36 Global Step: 764250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:15:56,813-Speed 6294.62 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 36 Global Step: 764260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:00,079-Speed 6271.99 samples/sec Loss 2.5996 LearningRate 0.0000 Epoch: 36 Global Step: 764270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:03,580-Speed 5849.95 samples/sec Loss 2.6423 LearningRate 0.0000 Epoch: 36 
Global Step: 764280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:06,842-Speed 6281.02 samples/sec Loss 2.6009 LearningRate 0.0000 Epoch: 36 Global Step: 764290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:10,100-Speed 6287.39 samples/sec Loss 2.6230 LearningRate 0.0000 Epoch: 36 Global Step: 764300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:13,354-Speed 6295.27 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 36 Global Step: 764310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:16,618-Speed 6275.36 samples/sec Loss 2.6344 LearningRate 0.0000 Epoch: 36 Global Step: 764320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:19,879-Speed 6281.43 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 36 Global Step: 764330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:23,143-Speed 6276.96 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 36 Global Step: 764340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:26,400-Speed 6289.63 samples/sec Loss 2.6183 LearningRate 0.0000 Epoch: 36 Global Step: 764350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:29,654-Speed 6295.90 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 36 Global Step: 764360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:32,916-Speed 6279.51 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 36 Global Step: 764370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:36,184-Speed 6267.68 samples/sec Loss 2.6434 LearningRate 0.0000 Epoch: 36 Global Step: 764380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:16:39,437-Speed 6297.71 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 36 Global Step: 764390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:16:42,676-Speed 6323.23 samples/sec Loss 2.6012 LearningRate 0.0000 Epoch: 36 Global Step: 764400 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:16:45,933-Speed 6291.60 samples/sec Loss 2.6451 LearningRate 0.0000 Epoch: 36 Global Step: 764410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:49,193-Speed 6283.23 samples/sec Loss 2.6462 LearningRate 0.0000 Epoch: 36 Global Step: 764420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:52,463-Speed 6262.97 samples/sec Loss 2.6805 LearningRate 0.0000 Epoch: 36 Global Step: 764430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:55,732-Speed 6267.75 samples/sec Loss 2.6663 LearningRate 0.0000 Epoch: 36 Global Step: 764440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:16:58,990-Speed 6286.54 samples/sec Loss 2.6341 LearningRate 0.0000 Epoch: 36 Global Step: 764450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:02,246-Speed 6290.55 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 36 Global Step: 764460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:05,504-Speed 6287.60 samples/sec Loss 2.6408 LearningRate 0.0000 Epoch: 36 Global Step: 764470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:08,762-Speed 6287.97 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 36 Global Step: 764480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:12,020-Speed 6288.16 samples/sec Loss 2.6785 LearningRate 0.0000 Epoch: 36 Global Step: 764490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:15,256-Speed 6329.04 samples/sec Loss 2.6074 LearningRate 0.0000 Epoch: 36 Global Step: 764500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:18,510-Speed 6296.36 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 36 Global Step: 764510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:21,765-Speed 6292.66 samples/sec Loss 2.6603 LearningRate 0.0000 Epoch: 36 Global Step: 764520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:25,027-Speed 6279.25 
samples/sec Loss 2.6621 LearningRate 0.0000 Epoch: 36 Global Step: 764530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:28,285-Speed 6286.85 samples/sec Loss 2.6768 LearningRate 0.0000 Epoch: 36 Global Step: 764540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:31,551-Speed 6272.80 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 36 Global Step: 764550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:34,812-Speed 6281.95 samples/sec Loss 2.6482 LearningRate 0.0000 Epoch: 36 Global Step: 764560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:38,093-Speed 6244.80 samples/sec Loss 2.6270 LearningRate 0.0000 Epoch: 36 Global Step: 764570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:41,426-Speed 6144.91 samples/sec Loss 2.6614 LearningRate 0.0000 Epoch: 36 Global Step: 764580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:44,687-Speed 6282.37 samples/sec Loss 2.5974 LearningRate 0.0000 Epoch: 36 Global Step: 764590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:47,929-Speed 6318.90 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 36 Global Step: 764600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:51,185-Speed 6290.22 samples/sec Loss 2.6630 LearningRate 0.0000 Epoch: 36 Global Step: 764610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:54,444-Speed 6285.46 samples/sec Loss 2.6609 LearningRate 0.0000 Epoch: 36 Global Step: 764620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:17:57,705-Speed 6282.56 samples/sec Loss 2.6239 LearningRate 0.0000 Epoch: 36 Global Step: 764630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:00,965-Speed 6282.51 samples/sec Loss 2.6185 LearningRate 0.0000 Epoch: 36 Global Step: 764640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:04,225-Speed 6284.82 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 36 
Global Step: 764650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:07,484-Speed 6284.73 samples/sec Loss 2.6503 LearningRate 0.0000 Epoch: 36 Global Step: 764660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:10,739-Speed 6292.61 samples/sec Loss 2.6463 LearningRate 0.0000 Epoch: 36 Global Step: 764670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:14,000-Speed 6282.77 samples/sec Loss 2.6278 LearningRate 0.0000 Epoch: 36 Global Step: 764680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:17,254-Speed 6294.81 samples/sec Loss 2.6536 LearningRate 0.0000 Epoch: 36 Global Step: 764690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:20,491-Speed 6328.06 samples/sec Loss 2.6282 LearningRate 0.0000 Epoch: 36 Global Step: 764700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:23,741-Speed 6302.66 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 36 Global Step: 764710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:26,993-Speed 6299.89 samples/sec Loss 2.6823 LearningRate 0.0000 Epoch: 36 Global Step: 764720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:30,248-Speed 6293.51 samples/sec Loss 2.6971 LearningRate 0.0000 Epoch: 36 Global Step: 764730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:33,514-Speed 6271.97 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 36 Global Step: 764740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:36,770-Speed 6290.26 samples/sec Loss 2.6212 LearningRate 0.0000 Epoch: 36 Global Step: 764750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:40,030-Speed 6284.15 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 36 Global Step: 764760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:43,295-Speed 6274.52 samples/sec Loss 2.6480 LearningRate 0.0000 Epoch: 36 Global Step: 764770 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:18:46,556-Speed 6282.01 samples/sec Loss 2.6123 LearningRate 0.0000 Epoch: 36 Global Step: 764780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:49,807-Speed 6301.57 samples/sec Loss 2.6437 LearningRate 0.0000 Epoch: 36 Global Step: 764790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:53,048-Speed 6321.15 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 36 Global Step: 764800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:56,318-Speed 6263.59 samples/sec Loss 2.6139 LearningRate 0.0000 Epoch: 36 Global Step: 764810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:18:59,578-Speed 6282.67 samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 36 Global Step: 764820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:02,837-Speed 6285.75 samples/sec Loss 2.6723 LearningRate 0.0000 Epoch: 36 Global Step: 764830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:06,094-Speed 6290.71 samples/sec Loss 2.6069 LearningRate 0.0000 Epoch: 36 Global Step: 764840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:09,358-Speed 6274.34 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 36 Global Step: 764850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:12,623-Speed 6275.66 samples/sec Loss 2.6273 LearningRate 0.0000 Epoch: 36 Global Step: 764860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:15,880-Speed 6287.54 samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 764870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:19,146-Speed 6273.51 samples/sec Loss 2.6577 LearningRate 0.0000 Epoch: 36 Global Step: 764880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:22,408-Speed 6278.52 samples/sec Loss 2.6224 LearningRate 0.0000 Epoch: 36 Global Step: 764890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:25,666-Speed 6288.09 
samples/sec Loss 2.6495 LearningRate 0.0000 Epoch: 36 Global Step: 764900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:28,929-Speed 6277.21 samples/sec Loss 2.6642 LearningRate 0.0000 Epoch: 36 Global Step: 764910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:32,190-Speed 6282.35 samples/sec Loss 2.5992 LearningRate 0.0000 Epoch: 36 Global Step: 764920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:35,481-Speed 6224.40 samples/sec Loss 2.6164 LearningRate 0.0000 Epoch: 36 Global Step: 764930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:38,739-Speed 6287.80 samples/sec Loss 2.6292 LearningRate 0.0000 Epoch: 36 Global Step: 764940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:41,995-Speed 6290.61 samples/sec Loss 2.7065 LearningRate 0.0000 Epoch: 36 Global Step: 764950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:45,265-Speed 6264.07 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 Global Step: 764960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:48,530-Speed 6274.34 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 36 Global Step: 764970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:51,784-Speed 6296.81 samples/sec Loss 2.6842 LearningRate 0.0000 Epoch: 36 Global Step: 764980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:55,045-Speed 6280.71 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 36 Global Step: 764990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:19:58,305-Speed 6284.67 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 Global Step: 765000 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:20:01,547-Speed 6318.73 samples/sec Loss 2.6943 LearningRate 0.0000 Epoch: 36 Global Step: 765010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:04,808-Speed 6281.16 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 36 
Global Step: 765020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:08,079-Speed 6261.34 samples/sec Loss 2.6818 LearningRate 0.0000 Epoch: 36 Global Step: 765030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:11,341-Speed 6282.42 samples/sec Loss 2.6092 LearningRate 0.0000 Epoch: 36 Global Step: 765040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:14,625-Speed 6236.61 samples/sec Loss 2.6110 LearningRate 0.0000 Epoch: 36 Global Step: 765050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:17,996-Speed 6077.71 samples/sec Loss 2.6321 LearningRate 0.0000 Epoch: 36 Global Step: 765060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:21,260-Speed 6274.71 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 36 Global Step: 765070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:24,510-Speed 6302.80 samples/sec Loss 2.6416 LearningRate 0.0000 Epoch: 36 Global Step: 765080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:27,809-Speed 6214.58 samples/sec Loss 2.6349 LearningRate 0.0000 Epoch: 36 Global Step: 765090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:31,063-Speed 6293.78 samples/sec Loss 2.6898 LearningRate 0.0000 Epoch: 36 Global Step: 765100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:34,311-Speed 6306.90 samples/sec Loss 2.6906 LearningRate 0.0000 Epoch: 36 Global Step: 765110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:37,574-Speed 6278.76 samples/sec Loss 2.6581 LearningRate 0.0000 Epoch: 36 Global Step: 765120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:40,832-Speed 6287.31 samples/sec Loss 2.6772 LearningRate 0.0000 Epoch: 36 Global Step: 765130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:44,085-Speed 6297.09 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 36 Global Step: 765140 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:20:47,362-Speed 6251.28 samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 36 Global Step: 765150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:50,689-Speed 6157.69 samples/sec Loss 2.6223 LearningRate 0.0000 Epoch: 36 Global Step: 765160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:53,942-Speed 6296.76 samples/sec Loss 2.6054 LearningRate 0.0000 Epoch: 36 Global Step: 765170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:20:57,198-Speed 6291.84 samples/sec Loss 2.5953 LearningRate 0.0000 Epoch: 36 Global Step: 765180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:00,469-Speed 6262.74 samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 36 Global Step: 765190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:03,729-Speed 6284.30 samples/sec Loss 2.6337 LearningRate 0.0000 Epoch: 36 Global Step: 765200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:06,969-Speed 6320.84 samples/sec Loss 2.6118 LearningRate 0.0000 Epoch: 36 Global Step: 765210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:10,231-Speed 6279.70 samples/sec Loss 2.6815 LearningRate 0.0000 Epoch: 36 Global Step: 765220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:13,484-Speed 6298.49 samples/sec Loss 2.5622 LearningRate 0.0000 Epoch: 36 Global Step: 765230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:16,749-Speed 6273.76 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 36 Global Step: 765240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:20,008-Speed 6284.32 samples/sec Loss 2.6367 LearningRate 0.0000 Epoch: 36 Global Step: 765250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:23,266-Speed 6288.73 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 36 Global Step: 765260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:26,521-Speed 6292.28 
samples/sec Loss 2.6255 LearningRate 0.0000 Epoch: 36 Global Step: 765270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:29,780-Speed 6284.94 samples/sec Loss 2.6297 LearningRate 0.0000 Epoch: 36 Global Step: 765280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:33,043-Speed 6278.46 samples/sec Loss 2.6239 LearningRate 0.0000 Epoch: 36 Global Step: 765290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:36,307-Speed 6276.39 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 36 Global Step: 765300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:39,542-Speed 6332.49 samples/sec Loss 2.6861 LearningRate 0.0000 Epoch: 36 Global Step: 765310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:42,806-Speed 6274.93 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 36 Global Step: 765320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:46,066-Speed 6283.14 samples/sec Loss 2.6563 LearningRate 0.0000 Epoch: 36 Global Step: 765330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:49,330-Speed 6276.84 samples/sec Loss 2.6414 LearningRate 0.0000 Epoch: 36 Global Step: 765340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:52,587-Speed 6288.95 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 36 Global Step: 765350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:55,843-Speed 6291.63 samples/sec Loss 2.6427 LearningRate 0.0000 Epoch: 36 Global Step: 765360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:21:59,108-Speed 6275.44 samples/sec Loss 2.5934 LearningRate 0.0000 Epoch: 36 Global Step: 765370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:02,377-Speed 6265.48 samples/sec Loss 2.6179 LearningRate 0.0000 Epoch: 36 Global Step: 765380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:05,634-Speed 6290.05 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 36 
Global Step: 765390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:08,887-Speed 6295.66 samples/sec Loss 2.6035 LearningRate 0.0000 Epoch: 36 Global Step: 765400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:12,122-Speed 6332.79 samples/sec Loss 2.6684 LearningRate 0.0000 Epoch: 36 Global Step: 765410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:15,378-Speed 6292.16 samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 36 Global Step: 765420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:18,631-Speed 6297.35 samples/sec Loss 2.6094 LearningRate 0.0000 Epoch: 36 Global Step: 765430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:21,887-Speed 6290.99 samples/sec Loss 2.6357 LearningRate 0.0000 Epoch: 36 Global Step: 765440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:25,141-Speed 6295.37 samples/sec Loss 2.6444 LearningRate 0.0000 Epoch: 36 Global Step: 765450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:28,397-Speed 6292.15 samples/sec Loss 2.6460 LearningRate 0.0000 Epoch: 36 Global Step: 765460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:31,666-Speed 6266.06 samples/sec Loss 2.6374 LearningRate 0.0000 Epoch: 36 Global Step: 765470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:34,926-Speed 6282.88 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 36 Global Step: 765480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:38,178-Speed 6298.36 samples/sec Loss 2.6730 LearningRate 0.0000 Epoch: 36 Global Step: 765490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:41,459-Speed 6244.75 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 36 Global Step: 765500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:44,698-Speed 6323.95 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 36 Global Step: 765510 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:22:47,953-Speed 6291.94 samples/sec Loss 2.6463 LearningRate 0.0000 Epoch: 36 Global Step: 765520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:51,210-Speed 6290.95 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 36 Global Step: 765530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:54,467-Speed 6288.19 samples/sec Loss 2.6996 LearningRate 0.0000 Epoch: 36 Global Step: 765540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:22:57,727-Speed 6284.43 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 36 Global Step: 765550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:00,990-Speed 6279.04 samples/sec Loss 2.6314 LearningRate 0.0000 Epoch: 36 Global Step: 765560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:04,246-Speed 6291.27 samples/sec Loss 2.6015 LearningRate 0.0000 Epoch: 36 Global Step: 765570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:07,485-Speed 6323.45 samples/sec Loss 2.6236 LearningRate 0.0000 Epoch: 36 Global Step: 765580 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:10,745-Speed 6283.90 samples/sec Loss 2.6134 LearningRate 0.0000 Epoch: 36 Global Step: 765590 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:14,004-Speed 6284.70 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 36 Global Step: 765600 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:17,260-Speed 6291.15 samples/sec Loss 2.6820 LearningRate 0.0000 Epoch: 36 Global Step: 765610 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:20,522-Speed 6281.63 samples/sec Loss 2.6301 LearningRate 0.0000 Epoch: 36 Global Step: 765620 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:23,780-Speed 6286.46 samples/sec Loss 2.6803 LearningRate 0.0000 Epoch: 36 Global Step: 765630 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:27,036-Speed 6291.24 
samples/sec Loss 2.6297 LearningRate 0.0000 Epoch: 36 Global Step: 765640 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:30,313-Speed 6251.28 samples/sec Loss 2.6379 LearningRate 0.0000 Epoch: 36 Global Step: 765650 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:33,586-Speed 6259.15 samples/sec Loss 2.6491 LearningRate 0.0000 Epoch: 36 Global Step: 765660 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:36,854-Speed 6266.68 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 36 Global Step: 765670 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:23:40,120-Speed 6274.06 samples/sec Loss 2.6552 LearningRate 0.0000 Epoch: 36 Global Step: 765680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:43,380-Speed 6282.51 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 36 Global Step: 765690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:46,638-Speed 6287.94 samples/sec Loss 2.5585 LearningRate 0.0000 Epoch: 36 Global Step: 765700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:49,895-Speed 6288.98 samples/sec Loss 2.6252 LearningRate 0.0000 Epoch: 36 Global Step: 765710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:53,148-Speed 6297.93 samples/sec Loss 2.6871 LearningRate 0.0000 Epoch: 36 Global Step: 765720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:56,405-Speed 6287.72 samples/sec Loss 2.5906 LearningRate 0.0000 Epoch: 36 Global Step: 765730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:23:59,666-Speed 6283.26 samples/sec Loss 2.6551 LearningRate 0.0000 Epoch: 36 Global Step: 765740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:02,927-Speed 6281.15 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 36 Global Step: 765750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:06,182-Speed 6294.62 samples/sec Loss 2.6312 LearningRate 0.0000 Epoch: 36 
Global Step: 765760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:09,438-Speed 6291.00 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 36 Global Step: 765770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:12,686-Speed 6310.21 samples/sec Loss 2.6628 LearningRate 0.0000 Epoch: 36 Global Step: 765780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:15,936-Speed 6302.73 samples/sec Loss 2.6195 LearningRate 0.0000 Epoch: 36 Global Step: 765790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:19,191-Speed 6293.04 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 36 Global Step: 765800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:22,453-Speed 6278.80 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 36 Global Step: 765810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:25,713-Speed 6283.62 samples/sec Loss 2.6459 LearningRate 0.0000 Epoch: 36 Global Step: 765820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:28,978-Speed 6273.90 samples/sec Loss 2.6376 LearningRate 0.0000 Epoch: 36 Global Step: 765830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:32,238-Speed 6285.10 samples/sec Loss 2.6380 LearningRate 0.0000 Epoch: 36 Global Step: 765840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:35,493-Speed 6292.20 samples/sec Loss 2.6601 LearningRate 0.0000 Epoch: 36 Global Step: 765850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:38,752-Speed 6285.57 samples/sec Loss 2.6582 LearningRate 0.0000 Epoch: 36 Global Step: 765860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:42,005-Speed 6297.54 samples/sec Loss 2.6541 LearningRate 0.0000 Epoch: 36 Global Step: 765870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:45,242-Speed 6327.73 samples/sec Loss 2.6380 LearningRate 0.0000 Epoch: 36 Global Step: 765880 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:24:48,500-Speed 6286.26 samples/sec Loss 2.6209 LearningRate 0.0000 Epoch: 36 Global Step: 765890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:51,756-Speed 6292.69 samples/sec Loss 2.6126 LearningRate 0.0000 Epoch: 36 Global Step: 765900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:55,010-Speed 6294.58 samples/sec Loss 2.6556 LearningRate 0.0000 Epoch: 36 Global Step: 765910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:24:58,276-Speed 6273.69 samples/sec Loss 2.6364 LearningRate 0.0000 Epoch: 36 Global Step: 765920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:01,546-Speed 6263.69 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 36 Global Step: 765930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:04,800-Speed 6294.33 samples/sec Loss 2.6111 LearningRate 0.0000 Epoch: 36 Global Step: 765940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:08,055-Speed 6294.50 samples/sec Loss 2.6129 LearningRate 0.0000 Epoch: 36 Global Step: 765950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:11,308-Speed 6297.57 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 36 Global Step: 765960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:14,581-Speed 6257.79 samples/sec Loss 2.6758 LearningRate 0.0000 Epoch: 36 Global Step: 765970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:17,840-Speed 6286.74 samples/sec Loss 2.6426 LearningRate 0.0000 Epoch: 36 Global Step: 765980 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:25:21,110-Speed 6264.83 samples/sec Loss 2.6462 LearningRate 0.0000 Epoch: 36 Global Step: 765990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:24,365-Speed 6291.30 samples/sec Loss 2.6457 LearningRate 0.0000 Epoch: 36 Global Step: 766000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:27,626-Speed 6283.86 
samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 36 Global Step: 766010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:30,887-Speed 6281.53 samples/sec Loss 2.7021 LearningRate 0.0000 Epoch: 36 Global Step: 766020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:34,145-Speed 6285.90 samples/sec Loss 2.6196 LearningRate 0.0000 Epoch: 36 Global Step: 766030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:37,406-Speed 6282.66 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 36 Global Step: 766040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:40,673-Speed 6270.68 samples/sec Loss 2.6201 LearningRate 0.0000 Epoch: 36 Global Step: 766050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:43,931-Speed 6287.34 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 36 Global Step: 766060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:47,188-Speed 6287.88 samples/sec Loss 2.6092 LearningRate 0.0000 Epoch: 36 Global Step: 766070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:50,457-Speed 6267.00 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 36 Global Step: 766080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:53,700-Speed 6317.26 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 36 Global Step: 766090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:25:56,955-Speed 6293.69 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 36 Global Step: 766100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:00,215-Speed 6282.37 samples/sec Loss 2.6340 LearningRate 0.0000 Epoch: 36 Global Step: 766110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:03,470-Speed 6293.57 samples/sec Loss 2.6247 LearningRate 0.0000 Epoch: 36 Global Step: 766120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:06,735-Speed 6274.07 samples/sec Loss 2.6298 LearningRate 0.0000 Epoch: 36 
Global Step: 766130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:09,991-Speed 6291.24 samples/sec Loss 2.5962 LearningRate 0.0000 Epoch: 36 Global Step: 766140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:13,250-Speed 6285.72 samples/sec Loss 2.6025 LearningRate 0.0000 Epoch: 36 Global Step: 766150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:16,503-Speed 6297.68 samples/sec Loss 2.6728 LearningRate 0.0000 Epoch: 36 Global Step: 766160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:19,763-Speed 6283.82 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 36 Global Step: 766170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:23,022-Speed 6285.34 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 36 Global Step: 766180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:26,260-Speed 6326.14 samples/sec Loss 2.6010 LearningRate 0.0000 Epoch: 36 Global Step: 766190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:29,514-Speed 6295.20 samples/sec Loss 2.6138 LearningRate 0.0000 Epoch: 36 Global Step: 766200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:32,765-Speed 6300.74 samples/sec Loss 2.6376 LearningRate 0.0000 Epoch: 36 Global Step: 766210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:36,034-Speed 6266.16 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 36 Global Step: 766220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:39,292-Speed 6287.90 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 36 Global Step: 766230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:42,551-Speed 6286.37 samples/sec Loss 2.6545 LearningRate 0.0000 Epoch: 36 Global Step: 766240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:45,807-Speed 6291.09 samples/sec Loss 2.6588 LearningRate 0.0000 Epoch: 36 Global Step: 766250 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:26:49,071-Speed 6275.94 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 36 Global Step: 766260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:52,331-Speed 6281.93 samples/sec Loss 2.6692 LearningRate 0.0000 Epoch: 36 Global Step: 766270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:55,601-Speed 6266.56 samples/sec Loss 2.6636 LearningRate 0.0000 Epoch: 36 Global Step: 766280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:26:58,841-Speed 6322.94 samples/sec Loss 2.6192 LearningRate 0.0000 Epoch: 36 Global Step: 766290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:02,104-Speed 6277.61 samples/sec Loss 2.5763 LearningRate 0.0000 Epoch: 36 Global Step: 766300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:05,373-Speed 6266.45 samples/sec Loss 2.6342 LearningRate 0.0000 Epoch: 36 Global Step: 766310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:08,626-Speed 6296.74 samples/sec Loss 2.7046 LearningRate 0.0000 Epoch: 36 Global Step: 766320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:11,889-Speed 6278.11 samples/sec Loss 2.6322 LearningRate 0.0000 Epoch: 36 Global Step: 766330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:15,157-Speed 6268.01 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 36 Global Step: 766340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:18,418-Speed 6281.31 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 36 Global Step: 766350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:21,683-Speed 6276.32 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 36 Global Step: 766360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:24,941-Speed 6286.90 samples/sec Loss 2.6343 LearningRate 0.0000 Epoch: 36 Global Step: 766370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:28,197-Speed 6290.31 
samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 36 Global Step: 766380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:31,455-Speed 6288.49 samples/sec Loss 2.6751 LearningRate 0.0000 Epoch: 36 Global Step: 766390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:27:34,693-Speed 6326.14 samples/sec Loss 2.6093 LearningRate 0.0000 Epoch: 36 Global Step: 766400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:37,958-Speed 6276.93 samples/sec Loss 2.6790 LearningRate 0.0000 Epoch: 36 Global Step: 766410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:41,219-Speed 6281.29 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 36 Global Step: 766420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:44,482-Speed 6277.32 samples/sec Loss 2.6648 LearningRate 0.0000 Epoch: 36 Global Step: 766430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:47,738-Speed 6292.81 samples/sec Loss 2.6048 LearningRate 0.0000 Epoch: 36 Global Step: 766440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:50,997-Speed 6285.97 samples/sec Loss 2.6644 LearningRate 0.0000 Epoch: 36 Global Step: 766450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:54,256-Speed 6284.44 samples/sec Loss 2.6396 LearningRate 0.0000 Epoch: 36 Global Step: 766460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:27:57,516-Speed 6283.34 samples/sec Loss 2.5960 LearningRate 0.0000 Epoch: 36 Global Step: 766470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:00,772-Speed 6292.07 samples/sec Loss 2.6911 LearningRate 0.0000 Epoch: 36 Global Step: 766480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:04,029-Speed 6289.69 samples/sec Loss 2.6238 LearningRate 0.0000 Epoch: 36 Global Step: 766490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:07,268-Speed 6323.61 samples/sec Loss 2.5925 LearningRate 0.0000 Epoch: 36 
Global Step: 766500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:10,529-Speed 6282.05 samples/sec Loss 2.5861 LearningRate 0.0000 Epoch: 36 Global Step: 766510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:13,787-Speed 6286.59 samples/sec Loss 2.6176 LearningRate 0.0000 Epoch: 36 Global Step: 766520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:17,042-Speed 6293.55 samples/sec Loss 2.6257 LearningRate 0.0000 Epoch: 36 Global Step: 766530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:20,298-Speed 6292.45 samples/sec Loss 2.6617 LearningRate 0.0000 Epoch: 36 Global Step: 766540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:23,573-Speed 6254.92 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 36 Global Step: 766550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:26,832-Speed 6285.14 samples/sec Loss 2.6490 LearningRate 0.0000 Epoch: 36 Global Step: 766560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:30,090-Speed 6286.77 samples/sec Loss 2.6518 LearningRate 0.0000 Epoch: 36 Global Step: 766570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:33,350-Speed 6283.76 samples/sec Loss 2.6338 LearningRate 0.0000 Epoch: 36 Global Step: 766580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:36,608-Speed 6288.48 samples/sec Loss 2.6341 LearningRate 0.0000 Epoch: 36 Global Step: 766590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:39,928-Speed 6169.23 samples/sec Loss 2.5690 LearningRate 0.0000 Epoch: 36 Global Step: 766600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:43,187-Speed 6286.42 samples/sec Loss 2.6627 LearningRate 0.0000 Epoch: 36 Global Step: 766610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:46,449-Speed 6278.55 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 36 Global Step: 766620 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:28:49,710-Speed 6282.49 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 36 Global Step: 766630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:52,961-Speed 6300.94 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 36 Global Step: 766640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:56,224-Speed 6278.12 samples/sec Loss 2.6790 LearningRate 0.0000 Epoch: 36 Global Step: 766650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:28:59,479-Speed 6292.39 samples/sec Loss 2.6175 LearningRate 0.0000 Epoch: 36 Global Step: 766660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:02,740-Speed 6282.54 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 36 Global Step: 766670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:06,007-Speed 6269.79 samples/sec Loss 2.6035 LearningRate 0.0000 Epoch: 36 Global Step: 766680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:09,264-Speed 6290.91 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 36 Global Step: 766690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:12,520-Speed 6289.26 samples/sec Loss 2.6559 LearningRate 0.0000 Epoch: 36 Global Step: 766700 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:29:15,767-Speed 6309.73 samples/sec Loss 2.6539 LearningRate 0.0000 Epoch: 36 Global Step: 766710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:19,025-Speed 6287.02 samples/sec Loss 2.6300 LearningRate 0.0000 Epoch: 36 Global Step: 766720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:22,273-Speed 6307.71 samples/sec Loss 2.5941 LearningRate 0.0000 Epoch: 36 Global Step: 766730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:25,527-Speed 6294.89 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 36 Global Step: 766740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:28,782-Speed 6293.81 
samples/sec Loss 2.6441 LearningRate 0.0000 Epoch: 36 Global Step: 766750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:32,045-Speed 6278.33 samples/sec Loss 2.6382 LearningRate 0.0000 Epoch: 36 Global Step: 766760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:35,304-Speed 6285.88 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 36 Global Step: 766770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:38,557-Speed 6297.42 samples/sec Loss 2.6258 LearningRate 0.0000 Epoch: 36 Global Step: 766780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:41,814-Speed 6288.41 samples/sec Loss 2.6659 LearningRate 0.0000 Epoch: 36 Global Step: 766790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:45,076-Speed 6279.33 samples/sec Loss 2.6364 LearningRate 0.0000 Epoch: 36 Global Step: 766800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:48,320-Speed 6315.53 samples/sec Loss 2.6254 LearningRate 0.0000 Epoch: 36 Global Step: 766810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:51,579-Speed 6285.09 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 36 Global Step: 766820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:54,837-Speed 6288.04 samples/sec Loss 2.6413 LearningRate 0.0000 Epoch: 36 Global Step: 766830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:29:58,098-Speed 6281.43 samples/sec Loss 2.6221 LearningRate 0.0000 Epoch: 36 Global Step: 766840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:01,361-Speed 6277.78 samples/sec Loss 2.6078 LearningRate 0.0000 Epoch: 36 Global Step: 766850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:04,625-Speed 6275.94 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 36 Global Step: 766860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:07,897-Speed 6261.05 samples/sec Loss 2.6549 LearningRate 0.0000 Epoch: 36 
Global Step: 766870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:11,161-Speed 6275.00 samples/sec Loss 2.6111 LearningRate 0.0000 Epoch: 36 Global Step: 766880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:14,418-Speed 6290.17 samples/sec Loss 2.6493 LearningRate 0.0000 Epoch: 36 Global Step: 766890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:17,673-Speed 6292.60 samples/sec Loss 2.6275 LearningRate 0.0000 Epoch: 36 Global Step: 766900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:20,916-Speed 6316.75 samples/sec Loss 2.6786 LearningRate 0.0000 Epoch: 36 Global Step: 766910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:24,181-Speed 6273.57 samples/sec Loss 2.6403 LearningRate 0.0000 Epoch: 36 Global Step: 766920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:27,443-Speed 6280.52 samples/sec Loss 2.6057 LearningRate 0.0000 Epoch: 36 Global Step: 766930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:30,703-Speed 6284.63 samples/sec Loss 2.6344 LearningRate 0.0000 Epoch: 36 Global Step: 766940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:33,962-Speed 6286.35 samples/sec Loss 2.6283 LearningRate 0.0000 Epoch: 36 Global Step: 766950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:37,225-Speed 6276.42 samples/sec Loss 2.6602 LearningRate 0.0000 Epoch: 36 Global Step: 766960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:40,490-Speed 6275.57 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 36 Global Step: 766970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:43,748-Speed 6286.75 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 36 Global Step: 766980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:46,998-Speed 6302.84 samples/sec Loss 2.6207 LearningRate 0.0000 Epoch: 36 Global Step: 766990 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:30:50,252-Speed 6295.13 samples/sec Loss 2.6279 LearningRate 0.0000 Epoch: 36 Global Step: 767000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:30:53,512-Speed 6282.63 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 36 Global Step: 767010 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:30:56,753-Speed 6320.41 samples/sec Loss 2.5902 LearningRate 0.0000 Epoch: 36 Global Step: 767020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:00,013-Speed 6284.95 samples/sec Loss 2.6746 LearningRate 0.0000 Epoch: 36 Global Step: 767030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:03,267-Speed 6294.24 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 36 Global Step: 767040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:06,523-Speed 6292.01 samples/sec Loss 2.6757 LearningRate 0.0000 Epoch: 36 Global Step: 767050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:09,784-Speed 6282.22 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 36 Global Step: 767060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:13,044-Speed 6283.18 samples/sec Loss 2.6149 LearningRate 0.0000 Epoch: 36 Global Step: 767070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:16,305-Speed 6280.99 samples/sec Loss 2.6310 LearningRate 0.0000 Epoch: 36 Global Step: 767080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:19,581-Speed 6252.63 samples/sec Loss 2.6637 LearningRate 0.0000 Epoch: 36 Global Step: 767090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:22,839-Speed 6286.84 samples/sec Loss 2.6427 LearningRate 0.0000 Epoch: 36 Global Step: 767100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:26,101-Speed 6280.84 samples/sec Loss 2.6218 LearningRate 0.0000 Epoch: 36 Global Step: 767110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:29,344-Speed 6317.31 
samples/sec Loss 2.6433 LearningRate 0.0000 Epoch: 36 Global Step: 767120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:32,598-Speed 6294.82 samples/sec Loss 2.6150 LearningRate 0.0000 Epoch: 36 Global Step: 767130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:31:35,849-Speed 6300.69 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 36 Global Step: 767140 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:39,106-Speed 6290.71 samples/sec Loss 2.6509 LearningRate 0.0000 Epoch: 36 Global Step: 767150 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:42,366-Speed 6283.02 samples/sec Loss 2.6508 LearningRate 0.0000 Epoch: 36 Global Step: 767160 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:45,621-Speed 6292.53 samples/sec Loss 2.6199 LearningRate 0.0000 Epoch: 36 Global Step: 767170 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:48,893-Speed 6261.52 samples/sec Loss 2.6163 LearningRate 0.0000 Epoch: 36 Global Step: 767180 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:52,155-Speed 6280.09 samples/sec Loss 2.6413 LearningRate 0.0000 Epoch: 36 Global Step: 767190 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:55,412-Speed 6289.47 samples/sec Loss 2.6434 LearningRate 0.0000 Epoch: 36 Global Step: 767200 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:31:58,671-Speed 6285.01 samples/sec Loss 2.6712 LearningRate 0.0000 Epoch: 36 Global Step: 767210 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:32:01,947-Speed 6254.02 samples/sec Loss 2.6607 LearningRate 0.0000 Epoch: 36 Global Step: 767220 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:32:05,207-Speed 6282.67 samples/sec Loss 2.6528 LearningRate 0.0000 Epoch: 36 Global Step: 767230 Fp16 Grad Scale: 2048 Required: 6 hours Training: 2022-04-03 13:32:08,466-Speed 6285.26 samples/sec Loss 2.6258 LearningRate 0.0000 Epoch: 36 
Global Step: 767240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:11,734-Speed 6269.40 samples/sec Loss 2.6385 LearningRate 0.0000 Epoch: 36 Global Step: 767250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:14,992-Speed 6287.00 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 36 Global Step: 767260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:18,246-Speed 6294.25 samples/sec Loss 2.6830 LearningRate 0.0000 Epoch: 36 Global Step: 767270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:21,542-Speed 6216.06 samples/sec Loss 2.6503 LearningRate 0.0000 Epoch: 36 Global Step: 767280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:24,802-Speed 6282.13 samples/sec Loss 2.6220 LearningRate 0.0000 Epoch: 36 Global Step: 767290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:28,063-Speed 6282.04 samples/sec Loss 2.6540 LearningRate 0.0000 Epoch: 36 Global Step: 767300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:31,322-Speed 6285.18 samples/sec Loss 2.6380 LearningRate 0.0000 Epoch: 36 Global Step: 767310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:34,578-Speed 6291.42 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 36 Global Step: 767320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:37,833-Speed 6293.01 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 36 Global Step: 767330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:41,069-Speed 6330.37 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 36 Global Step: 767340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:44,328-Speed 6285.44 samples/sec Loss 2.6730 LearningRate 0.0000 Epoch: 36 Global Step: 767350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:32:47,579-Speed 6301.64 samples/sec Loss 2.6768 LearningRate 0.0000 Epoch: 36 Global Step: 767360 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:33:47,029-Speed 344.50 samples/sec Loss 2.6731 LearningRate 0.0000 Epoch: 37 Global Step: 767370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:33:50,275-Speed 6312.74 samples/sec Loss 2.6393 LearningRate 0.0000 Epoch: 37 Global Step: 767380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:33:53,532-Speed 6288.24 samples/sec Loss 2.6074 LearningRate 0.0000 Epoch: 37 Global Step: 767390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:33:56,779-Speed 6310.15 samples/sec Loss 2.6897 LearningRate 0.0000 Epoch: 37 Global Step: 767400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:00,040-Speed 6280.09 samples/sec Loss 2.6461 LearningRate 0.0000 Epoch: 37 Global Step: 767410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:03,290-Speed 6303.38 samples/sec Loss 2.6265 LearningRate 0.0000 Epoch: 37 Global Step: 767420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:06,538-Speed 6308.04 samples/sec Loss 2.6384 LearningRate 0.0000 Epoch: 37 Global Step: 767430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:09,790-Speed 6297.71 samples/sec Loss 2.6361 LearningRate 0.0000 Epoch: 37 Global Step: 767440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:34:13,038-Speed 6308.17 samples/sec Loss 2.6692 LearningRate 0.0000 Epoch: 37 Global Step: 767450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:16,283-Speed 6310.81 samples/sec Loss 2.6218 LearningRate 0.0000 Epoch: 37 Global Step: 767460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:19,533-Speed 6303.36 samples/sec Loss 2.6322 LearningRate 0.0000 Epoch: 37 Global Step: 767470 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:22,784-Speed 6301.60 samples/sec Loss 2.6165 LearningRate 0.0000 Epoch: 37 Global Step: 767480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:26,033-Speed 6305.39 
samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 37 Global Step: 767490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:29,314-Speed 6242.93 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 37 Global Step: 767500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:32,557-Speed 6316.96 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 767510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:35,813-Speed 6290.94 samples/sec Loss 2.6195 LearningRate 0.0000 Epoch: 37 Global Step: 767520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:39,067-Speed 6295.18 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 767530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:42,376-Speed 6190.04 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 37 Global Step: 767540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:45,624-Speed 6308.17 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 37 Global Step: 767550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:48,873-Speed 6305.22 samples/sec Loss 2.6469 LearningRate 0.0000 Epoch: 37 Global Step: 767560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:52,125-Speed 6298.86 samples/sec Loss 2.6389 LearningRate 0.0000 Epoch: 37 Global Step: 767570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:55,382-Speed 6289.48 samples/sec Loss 2.6571 LearningRate 0.0000 Epoch: 37 Global Step: 767580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:34:58,637-Speed 6292.11 samples/sec Loss 2.6326 LearningRate 0.0000 Epoch: 37 Global Step: 767590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:01,889-Speed 6299.96 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 37 Global Step: 767600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:05,149-Speed 6283.53 samples/sec Loss 2.6084 LearningRate 0.0000 Epoch: 37 
Global Step: 767610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:08,401-Speed 6298.01 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 37 Global Step: 767620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:11,665-Speed 6276.18 samples/sec Loss 2.6191 LearningRate 0.0000 Epoch: 37 Global Step: 767630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:14,929-Speed 6276.71 samples/sec Loss 2.6326 LearningRate 0.0000 Epoch: 37 Global Step: 767640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:18,171-Speed 6318.29 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 37 Global Step: 767650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:21,418-Speed 6308.19 samples/sec Loss 2.6156 LearningRate 0.0000 Epoch: 37 Global Step: 767660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:24,675-Speed 6290.23 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 37 Global Step: 767670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:27,931-Speed 6291.13 samples/sec Loss 2.6716 LearningRate 0.0000 Epoch: 37 Global Step: 767680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:31,185-Speed 6294.86 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 767690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:34,435-Speed 6303.53 samples/sec Loss 2.5928 LearningRate 0.0000 Epoch: 37 Global Step: 767700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:37,686-Speed 6299.65 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 767710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:40,935-Speed 6306.32 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 37 Global Step: 767720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:44,184-Speed 6304.89 samples/sec Loss 2.6480 LearningRate 0.0000 Epoch: 37 Global Step: 767730 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:35:47,435-Speed 6300.99 samples/sec Loss 2.6212 LearningRate 0.0000 Epoch: 37 Global Step: 767740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:50,666-Speed 6338.50 samples/sec Loss 2.6393 LearningRate 0.0000 Epoch: 37 Global Step: 767750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:53,916-Speed 6304.34 samples/sec Loss 2.6582 LearningRate 0.0000 Epoch: 37 Global Step: 767760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:35:57,160-Speed 6315.31 samples/sec Loss 2.5755 LearningRate 0.0000 Epoch: 37 Global Step: 767770 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:00,412-Speed 6298.17 samples/sec Loss 2.6962 LearningRate 0.0000 Epoch: 37 Global Step: 767780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:03,668-Speed 6292.02 samples/sec Loss 2.6208 LearningRate 0.0000 Epoch: 37 Global Step: 767790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:06,920-Speed 6299.41 samples/sec Loss 2.6447 LearningRate 0.0000 Epoch: 37 Global Step: 767800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:10,170-Speed 6301.77 samples/sec Loss 2.6578 LearningRate 0.0000 Epoch: 37 Global Step: 767810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:13,417-Speed 6309.60 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 Global Step: 767820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:16,661-Speed 6314.47 samples/sec Loss 2.5760 LearningRate 0.0000 Epoch: 37 Global Step: 767830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:19,911-Speed 6302.40 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 767840 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:23,161-Speed 6302.62 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 37 Global Step: 767850 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:36:26,411-Speed 6303.71 
samples/sec Loss 2.6166 LearningRate 0.0000 Epoch: 37 Global Step: 767860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:29,658-Speed 6308.81 samples/sec Loss 2.6497 LearningRate 0.0000 Epoch: 37 Global Step: 767870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:32,908-Speed 6302.66 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 37 Global Step: 767880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:36,174-Speed 6271.22 samples/sec Loss 2.6646 LearningRate 0.0000 Epoch: 37 Global Step: 767890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:39,426-Speed 6300.68 samples/sec Loss 2.5960 LearningRate 0.0000 Epoch: 37 Global Step: 767900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:42,711-Speed 6235.39 samples/sec Loss 2.7010 LearningRate 0.0000 Epoch: 37 Global Step: 767910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:45,991-Speed 6244.56 samples/sec Loss 2.6286 LearningRate 0.0000 Epoch: 37 Global Step: 767920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:49,267-Speed 6253.27 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 37 Global Step: 767930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:52,519-Speed 6300.06 samples/sec Loss 2.6429 LearningRate 0.0000 Epoch: 37 Global Step: 767940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:55,771-Speed 6298.93 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 767950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:36:59,007-Speed 6329.66 samples/sec Loss 2.5922 LearningRate 0.0000 Epoch: 37 Global Step: 767960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:02,259-Speed 6299.60 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 37 Global Step: 767970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:05,515-Speed 6291.70 samples/sec Loss 2.5841 LearningRate 0.0000 Epoch: 37 
Global Step: 767980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:08,770-Speed 6292.80 samples/sec Loss 2.6533 LearningRate 0.0000 Epoch: 37 Global Step: 767990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:12,022-Speed 6299.70 samples/sec Loss 2.6598 LearningRate 0.0000 Epoch: 37 Global Step: 768000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:15,273-Speed 6300.01 samples/sec Loss 2.6094 LearningRate 0.0000 Epoch: 37 Global Step: 768010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:18,522-Speed 6305.34 samples/sec Loss 2.6344 LearningRate 0.0000 Epoch: 37 Global Step: 768020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:21,772-Speed 6303.09 samples/sec Loss 2.6290 LearningRate 0.0000 Epoch: 37 Global Step: 768030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:25,024-Speed 6298.92 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 37 Global Step: 768040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:28,274-Speed 6303.06 samples/sec Loss 2.6495 LearningRate 0.0000 Epoch: 37 Global Step: 768050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:31,514-Speed 6321.67 samples/sec Loss 2.6273 LearningRate 0.0000 Epoch: 37 Global Step: 768060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:34,838-Speed 6163.64 samples/sec Loss 2.6565 LearningRate 0.0000 Epoch: 37 Global Step: 768070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:38,092-Speed 6294.48 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 37 Global Step: 768080 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:41,359-Speed 6270.60 samples/sec Loss 2.6271 LearningRate 0.0000 Epoch: 37 Global Step: 768090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:44,611-Speed 6299.70 samples/sec Loss 2.6283 LearningRate 0.0000 Epoch: 37 Global Step: 768100 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:37:47,857-Speed 6309.78 samples/sec Loss 2.6499 LearningRate 0.0000 Epoch: 37 Global Step: 768110 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:51,103-Speed 6309.96 samples/sec Loss 2.5996 LearningRate 0.0000 Epoch: 37 Global Step: 768120 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:54,361-Speed 6288.47 samples/sec Loss 2.6308 LearningRate 0.0000 Epoch: 37 Global Step: 768130 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:37:57,611-Speed 6303.21 samples/sec Loss 2.6274 LearningRate 0.0000 Epoch: 37 Global Step: 768140 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:00,859-Speed 6306.76 samples/sec Loss 2.5919 LearningRate 0.0000 Epoch: 37 Global Step: 768150 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:04,099-Speed 6322.81 samples/sec Loss 2.6326 LearningRate 0.0000 Epoch: 37 Global Step: 768160 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:07,355-Speed 6291.21 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 37 Global Step: 768170 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:10,650-Speed 6216.99 samples/sec Loss 2.6800 LearningRate 0.0000 Epoch: 37 Global Step: 768180 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:14,004-Speed 6108.53 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 768190 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:17,253-Speed 6304.14 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 37 Global Step: 768200 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:20,507-Speed 6294.69 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 37 Global Step: 768210 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:23,760-Speed 6298.12 samples/sec Loss 2.6668 LearningRate 0.0000 Epoch: 37 Global Step: 768220 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:27,007-Speed 6308.01 
samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 37 Global Step: 768230 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:30,251-Speed 6315.14 samples/sec Loss 2.6412 LearningRate 0.0000 Epoch: 37 Global Step: 768240 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:33,507-Speed 6290.33 samples/sec Loss 2.6232 LearningRate 0.0000 Epoch: 37 Global Step: 768250 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:36,743-Speed 6331.32 samples/sec Loss 2.5978 LearningRate 0.0000 Epoch: 37 Global Step: 768260 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:40,051-Speed 6191.57 samples/sec Loss 2.6509 LearningRate 0.0000 Epoch: 37 Global Step: 768270 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:43,309-Speed 6292.73 samples/sec Loss 2.6341 LearningRate 0.0000 Epoch: 37 Global Step: 768280 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:46,551-Speed 6317.83 samples/sec Loss 2.6079 LearningRate 0.0000 Epoch: 37 Global Step: 768290 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:49,802-Speed 6302.14 samples/sec Loss 2.6413 LearningRate 0.0000 Epoch: 37 Global Step: 768300 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:53,053-Speed 6299.86 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 37 Global Step: 768310 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:56,312-Speed 6286.65 samples/sec Loss 2.6203 LearningRate 0.0000 Epoch: 37 Global Step: 768320 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:38:59,568-Speed 6290.62 samples/sec Loss 2.6375 LearningRate 0.0000 Epoch: 37 Global Step: 768330 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:02,820-Speed 6298.73 samples/sec Loss 2.6076 LearningRate 0.0000 Epoch: 37 Global Step: 768340 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:06,073-Speed 6298.29 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 37 
Global Step: 768350 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:09,324-Speed 6301.39 samples/sec Loss 2.6335 LearningRate 0.0000 Epoch: 37 Global Step: 768360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:39:12,566-Speed 6318.04 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 37 Global Step: 768370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:15,815-Speed 6305.41 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 768380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:19,072-Speed 6290.60 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 768390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:22,332-Speed 6283.00 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 37 Global Step: 768400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:25,584-Speed 6298.87 samples/sec Loss 2.6047 LearningRate 0.0000 Epoch: 37 Global Step: 768410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:28,843-Speed 6284.62 samples/sec Loss 2.6283 LearningRate 0.0000 Epoch: 37 Global Step: 768420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:32,093-Speed 6304.33 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 37 Global Step: 768430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:35,357-Speed 6274.69 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 37 Global Step: 768440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:38,611-Speed 6296.06 samples/sec Loss 2.6335 LearningRate 0.0000 Epoch: 37 Global Step: 768450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:41,868-Speed 6290.04 samples/sec Loss 2.6283 LearningRate 0.0000 Epoch: 37 Global Step: 768460 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:45,107-Speed 6322.45 samples/sec Loss 2.6165 LearningRate 0.0000 Epoch: 37 Global Step: 768470 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:39:48,355-Speed 6307.88 samples/sec Loss 2.6101 LearningRate 0.0000 Epoch: 37 Global Step: 768480 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:51,608-Speed 6296.70 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 768490 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:54,861-Speed 6297.74 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 37 Global Step: 768500 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:39:58,112-Speed 6300.43 samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 37 Global Step: 768510 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:01,366-Speed 6295.19 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 37 Global Step: 768520 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:04,623-Speed 6289.43 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 37 Global Step: 768530 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:07,871-Speed 6306.42 samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 37 Global Step: 768540 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:11,129-Speed 6288.91 samples/sec Loss 2.6312 LearningRate 0.0000 Epoch: 37 Global Step: 768550 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:14,384-Speed 6293.25 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 37 Global Step: 768560 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:17,619-Speed 6331.42 samples/sec Loss 2.6409 LearningRate 0.0000 Epoch: 37 Global Step: 768570 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:20,870-Speed 6301.54 samples/sec Loss 2.6600 LearningRate 0.0000 Epoch: 37 Global Step: 768580 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:24,144-Speed 6257.62 samples/sec Loss 2.5675 LearningRate 0.0000 Epoch: 37 Global Step: 768590 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:27,399-Speed 6293.23 
samples/sec Loss 2.6990 LearningRate 0.0000 Epoch: 37 Global Step: 768600 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:30,659-Speed 6284.60 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 37 Global Step: 768610 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:33,902-Speed 6314.93 samples/sec Loss 2.6410 LearningRate 0.0000 Epoch: 37 Global Step: 768620 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:37,164-Speed 6280.83 samples/sec Loss 2.6602 LearningRate 0.0000 Epoch: 37 Global Step: 768630 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:40,425-Speed 6280.77 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 37 Global Step: 768640 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:43,681-Speed 6290.93 samples/sec Loss 2.6120 LearningRate 0.0000 Epoch: 37 Global Step: 768650 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:46,945-Speed 6276.52 samples/sec Loss 2.6476 LearningRate 0.0000 Epoch: 37 Global Step: 768660 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:50,187-Speed 6318.44 samples/sec Loss 2.6344 LearningRate 0.0000 Epoch: 37 Global Step: 768670 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:53,454-Speed 6269.52 samples/sec Loss 2.6198 LearningRate 0.0000 Epoch: 37 Global Step: 768680 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:56,730-Speed 6254.04 samples/sec Loss 2.6549 LearningRate 0.0000 Epoch: 37 Global Step: 768690 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:40:59,989-Speed 6285.18 samples/sec Loss 2.5550 LearningRate 0.0000 Epoch: 37 Global Step: 768700 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:03,253-Speed 6275.78 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 37 Global Step: 768710 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:06,505-Speed 6299.53 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 37 
Global Step: 768720 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:09,761-Speed 6290.54 samples/sec Loss 2.6361 LearningRate 0.0000 Epoch: 37 Global Step: 768730 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:13,016-Speed 6294.19 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 37 Global Step: 768740 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:16,280-Speed 6275.80 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 37 Global Step: 768750 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:19,539-Speed 6285.15 samples/sec Loss 2.6739 LearningRate 0.0000 Epoch: 37 Global Step: 768760 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:22,799-Speed 6284.60 samples/sec Loss 2.5940 LearningRate 0.0000 Epoch: 37 Global Step: 768770 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:41:26,042-Speed 6317.19 samples/sec Loss 2.6013 LearningRate 0.0000 Epoch: 37 Global Step: 768780 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:29,303-Speed 6281.82 samples/sec Loss 2.6493 LearningRate 0.0000 Epoch: 37 Global Step: 768790 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:32,555-Speed 6298.86 samples/sec Loss 2.6367 LearningRate 0.0000 Epoch: 37 Global Step: 768800 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:35,821-Speed 6271.03 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 37 Global Step: 768810 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:39,075-Speed 6296.08 samples/sec Loss 2.6388 LearningRate 0.0000 Epoch: 37 Global Step: 768820 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:42,334-Speed 6284.76 samples/sec Loss 2.6286 LearningRate 0.0000 Epoch: 37 Global Step: 768830 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:45,588-Speed 6294.65 samples/sec Loss 2.6386 LearningRate 0.0000 Epoch: 37 Global Step: 768840 Fp16 Grad Scale: 4096 Required: 6 
hours Training: 2022-04-03 13:41:48,850-Speed 6280.83 samples/sec Loss 2.5859 LearningRate 0.0000 Epoch: 37 Global Step: 768850 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:52,115-Speed 6274.63 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 37 Global Step: 768860 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:55,371-Speed 6290.03 samples/sec Loss 2.5904 LearningRate 0.0000 Epoch: 37 Global Step: 768870 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:41:58,609-Speed 6326.38 samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 37 Global Step: 768880 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:01,858-Speed 6305.30 samples/sec Loss 2.6104 LearningRate 0.0000 Epoch: 37 Global Step: 768890 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:05,122-Speed 6274.74 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 37 Global Step: 768900 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:08,381-Speed 6285.60 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 768910 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:11,654-Speed 6259.42 samples/sec Loss 2.6013 LearningRate 0.0000 Epoch: 37 Global Step: 768920 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:14,912-Speed 6287.11 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 37 Global Step: 768930 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:18,174-Speed 6280.06 samples/sec Loss 2.6568 LearningRate 0.0000 Epoch: 37 Global Step: 768940 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:21,429-Speed 6292.34 samples/sec Loss 2.6480 LearningRate 0.0000 Epoch: 37 Global Step: 768950 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:24,704-Speed 6255.50 samples/sec Loss 2.6599 LearningRate 0.0000 Epoch: 37 Global Step: 768960 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:27,976-Speed 6261.12 
samples/sec Loss 2.6231 LearningRate 0.0000 Epoch: 37 Global Step: 768970 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:31,220-Speed 6316.16 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 Global Step: 768980 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:34,477-Speed 6287.87 samples/sec Loss 2.5664 LearningRate 0.0000 Epoch: 37 Global Step: 768990 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:37,752-Speed 6256.05 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 37 Global Step: 769000 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:41,004-Speed 6297.30 samples/sec Loss 2.6212 LearningRate 0.0000 Epoch: 37 Global Step: 769010 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:44,257-Speed 6298.10 samples/sec Loss 2.6369 LearningRate 0.0000 Epoch: 37 Global Step: 769020 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:47,517-Speed 6284.38 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 769030 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:50,779-Speed 6279.16 samples/sec Loss 2.6272 LearningRate 0.0000 Epoch: 37 Global Step: 769040 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:54,053-Speed 6257.15 samples/sec Loss 2.6464 LearningRate 0.0000 Epoch: 37 Global Step: 769050 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:42:57,305-Speed 6297.72 samples/sec Loss 2.7025 LearningRate 0.0000 Epoch: 37 Global Step: 769060 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:43:00,656-Speed 6113.38 samples/sec Loss 2.6760 LearningRate 0.0000 Epoch: 37 Global Step: 769070 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:43:03,917-Speed 6282.25 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 37 Global Step: 769080 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-03 13:43:07,157-Speed 6322.79 samples/sec Loss 2.5933 LearningRate 0.0000 Epoch: 37 
Global Step: 769090 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:43:10,414-Speed 6288.21 samples/sec Loss 2.6047 LearningRate 0.0000 Epoch: 37 Global Step: 769100 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-03 13:43:13,667-Speed 6297.56 samples/sec Loss 2.6544 LearningRate 0.0000 Epoch: 37 Global Step: 769110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:16,918-Speed 6301.18 samples/sec Loss 2.5928 LearningRate 0.0000 Epoch: 37 Global Step: 769120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:20,176-Speed 6286.41 samples/sec Loss 2.6314 LearningRate 0.0000 Epoch: 37 Global Step: 769130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:23,462-Speed 6234.71 samples/sec Loss 2.6174 LearningRate 0.0000 Epoch: 37 Global Step: 769140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:26,719-Speed 6289.94 samples/sec Loss 2.6158 LearningRate 0.0000 Epoch: 37 Global Step: 769150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:29,979-Speed 6283.48 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 37 Global Step: 769160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:33,238-Speed 6287.15 samples/sec Loss 2.6714 LearningRate 0.0000 Epoch: 37 Global Step: 769170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:36,495-Speed 6289.02 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 37 Global Step: 769180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:39,745-Speed 6301.21 samples/sec Loss 2.6327 LearningRate 0.0000 Epoch: 37 Global Step: 769190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:43,012-Speed 6271.80 samples/sec Loss 2.5970 LearningRate 0.0000 Epoch: 37 Global Step: 769200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:46,270-Speed 6287.22 samples/sec Loss 2.5681 LearningRate 0.0000 Epoch: 37 Global Step: 769210 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:43:49,520-Speed 6301.81 samples/sec Loss 2.6572 LearningRate 0.0000 Epoch: 37 Global Step: 769220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:52,777-Speed 6290.74 samples/sec Loss 2.6589 LearningRate 0.0000 Epoch: 37 Global Step: 769230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:56,033-Speed 6290.58 samples/sec Loss 2.6164 LearningRate 0.0000 Epoch: 37 Global Step: 769240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:43:59,288-Speed 6294.49 samples/sec Loss 2.6230 LearningRate 0.0000 Epoch: 37 Global Step: 769250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:02,559-Speed 6261.25 samples/sec Loss 2.6297 LearningRate 0.0000 Epoch: 37 Global Step: 769260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:05,822-Speed 6277.88 samples/sec Loss 2.6492 LearningRate 0.0000 Epoch: 37 Global Step: 769270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:09,082-Speed 6283.35 samples/sec Loss 2.6211 LearningRate 0.0000 Epoch: 37 Global Step: 769280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:12,324-Speed 6318.41 samples/sec Loss 2.6124 LearningRate 0.0000 Epoch: 37 Global Step: 769290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:15,584-Speed 6284.73 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 37 Global Step: 769300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:18,849-Speed 6273.58 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 769310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:22,107-Speed 6286.83 samples/sec Loss 2.6561 LearningRate 0.0000 Epoch: 37 Global Step: 769320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:25,367-Speed 6283.64 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 Global Step: 769330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:28,625-Speed 6288.17 
samples/sec Loss 2.6123 LearningRate 0.0000 Epoch: 37 Global Step: 769340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:31,884-Speed 6285.26 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 37 Global Step: 769350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:35,145-Speed 6281.75 samples/sec Loss 2.6801 LearningRate 0.0000 Epoch: 37 Global Step: 769360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:38,402-Speed 6290.13 samples/sec Loss 2.5942 LearningRate 0.0000 Epoch: 37 Global Step: 769370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:41,661-Speed 6285.33 samples/sec Loss 2.6426 LearningRate 0.0000 Epoch: 37 Global Step: 769380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:44,920-Speed 6286.14 samples/sec Loss 2.6409 LearningRate 0.0000 Epoch: 37 Global Step: 769390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:44:48,167-Speed 6309.06 samples/sec Loss 2.6155 LearningRate 0.0000 Epoch: 37 Global Step: 769400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:51,429-Speed 6277.83 samples/sec Loss 2.6278 LearningRate 0.0000 Epoch: 37 Global Step: 769410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:54,695-Speed 6272.90 samples/sec Loss 2.6317 LearningRate 0.0000 Epoch: 37 Global Step: 769420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:44:57,955-Speed 6283.80 samples/sec Loss 2.6347 LearningRate 0.0000 Epoch: 37 Global Step: 769430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:01,212-Speed 6290.52 samples/sec Loss 2.6032 LearningRate 0.0000 Epoch: 37 Global Step: 769440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:04,474-Speed 6278.66 samples/sec Loss 2.6667 LearningRate 0.0000 Epoch: 37 Global Step: 769450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:07,730-Speed 6290.70 samples/sec Loss 2.5835 LearningRate 0.0000 Epoch: 37 
Global Step: 769460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:10,987-Speed 6289.23 samples/sec Loss 2.6108 LearningRate 0.0000 Epoch: 37 Global Step: 769470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:14,260-Speed 6259.11 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 37 Global Step: 769480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:17,523-Speed 6277.98 samples/sec Loss 2.6151 LearningRate 0.0000 Epoch: 37 Global Step: 769490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:20,765-Speed 6317.59 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 37 Global Step: 769500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:24,019-Speed 6295.65 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 37 Global Step: 769510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:27,283-Speed 6276.49 samples/sec Loss 2.6233 LearningRate 0.0000 Epoch: 37 Global Step: 769520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:30,547-Speed 6275.99 samples/sec Loss 2.6575 LearningRate 0.0000 Epoch: 37 Global Step: 769530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:33,809-Speed 6279.71 samples/sec Loss 2.6134 LearningRate 0.0000 Epoch: 37 Global Step: 769540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:37,066-Speed 6289.26 samples/sec Loss 2.5945 LearningRate 0.0000 Epoch: 37 Global Step: 769550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:40,332-Speed 6272.99 samples/sec Loss 2.6532 LearningRate 0.0000 Epoch: 37 Global Step: 769560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:43,599-Speed 6271.18 samples/sec Loss 2.6032 LearningRate 0.0000 Epoch: 37 Global Step: 769570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:46,855-Speed 6290.09 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 37 Global Step: 769580 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:45:50,111-Speed 6291.63 samples/sec Loss 2.6241 LearningRate 0.0000 Epoch: 37 Global Step: 769590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:53,376-Speed 6274.83 samples/sec Loss 2.6388 LearningRate 0.0000 Epoch: 37 Global Step: 769600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:45:56,621-Speed 6311.59 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 37 Global Step: 769610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:45:59,884-Speed 6278.81 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 769620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:03,146-Speed 6279.47 samples/sec Loss 2.5964 LearningRate 0.0000 Epoch: 37 Global Step: 769630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:06,421-Speed 6254.84 samples/sec Loss 2.6429 LearningRate 0.0000 Epoch: 37 Global Step: 769640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:09,682-Speed 6280.79 samples/sec Loss 2.5811 LearningRate 0.0000 Epoch: 37 Global Step: 769650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:12,945-Speed 6278.19 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 769660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:16,204-Speed 6286.33 samples/sec Loss 2.6503 LearningRate 0.0000 Epoch: 37 Global Step: 769670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:19,460-Speed 6290.03 samples/sec Loss 2.6333 LearningRate 0.0000 Epoch: 37 Global Step: 769680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:22,726-Speed 6272.27 samples/sec Loss 2.5876 LearningRate 0.0000 Epoch: 37 Global Step: 769690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:25,989-Speed 6277.04 samples/sec Loss 2.6478 LearningRate 0.0000 Epoch: 37 Global Step: 769700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:29,239-Speed 6304.38 
samples/sec Loss 2.6267 LearningRate 0.0000 Epoch: 37 Global Step: 769710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:32,501-Speed 6278.43 samples/sec Loss 2.6668 LearningRate 0.0000 Epoch: 37 Global Step: 769720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:35,769-Speed 6269.74 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 37 Global Step: 769730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:39,025-Speed 6290.85 samples/sec Loss 2.6052 LearningRate 0.0000 Epoch: 37 Global Step: 769740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:42,284-Speed 6286.01 samples/sec Loss 2.6587 LearningRate 0.0000 Epoch: 37 Global Step: 769750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:45,538-Speed 6295.20 samples/sec Loss 2.6523 LearningRate 0.0000 Epoch: 37 Global Step: 769760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:48,793-Speed 6292.36 samples/sec Loss 2.5833 LearningRate 0.0000 Epoch: 37 Global Step: 769770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:52,047-Speed 6295.81 samples/sec Loss 2.5820 LearningRate 0.0000 Epoch: 37 Global Step: 769780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:55,304-Speed 6289.84 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 37 Global Step: 769790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:46:58,571-Speed 6269.85 samples/sec Loss 2.6421 LearningRate 0.0000 Epoch: 37 Global Step: 769800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:01,822-Speed 6301.89 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 769810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:05,083-Speed 6281.56 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 769820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:08,356-Speed 6258.42 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 37 
Global Step: 769830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:11,624-Speed 6268.42 samples/sec Loss 2.6257 LearningRate 0.0000 Epoch: 37 Global Step: 769840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:14,884-Speed 6282.48 samples/sec Loss 2.6170 LearningRate 0.0000 Epoch: 37 Global Step: 769850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:18,146-Speed 6280.04 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 37 Global Step: 769860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:21,410-Speed 6275.70 samples/sec Loss 2.6134 LearningRate 0.0000 Epoch: 37 Global Step: 769870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:24,671-Speed 6281.53 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 37 Global Step: 769880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:27,931-Speed 6283.99 samples/sec Loss 2.6466 LearningRate 0.0000 Epoch: 37 Global Step: 769890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:31,192-Speed 6282.59 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 769900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:34,434-Speed 6318.10 samples/sec Loss 2.5808 LearningRate 0.0000 Epoch: 37 Global Step: 769910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:37,696-Speed 6280.45 samples/sec Loss 2.6201 LearningRate 0.0000 Epoch: 37 Global Step: 769920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:40,952-Speed 6291.14 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 769930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:44,212-Speed 6283.10 samples/sec Loss 2.6083 LearningRate 0.0000 Epoch: 37 Global Step: 769940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:47,473-Speed 6282.82 samples/sec Loss 2.6512 LearningRate 0.0000 Epoch: 37 Global Step: 769950 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:47:50,735-Speed 6279.85 samples/sec Loss 2.6316 LearningRate 0.0000 Epoch: 37 Global Step: 769960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:53,996-Speed 6281.27 samples/sec Loss 2.6389 LearningRate 0.0000 Epoch: 37 Global Step: 769970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:47:57,261-Speed 6273.19 samples/sec Loss 2.6714 LearningRate 0.0000 Epoch: 37 Global Step: 769980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:00,520-Speed 6285.99 samples/sec Loss 2.6745 LearningRate 0.0000 Epoch: 37 Global Step: 769990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:03,785-Speed 6275.06 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 37 Global Step: 770000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:07,045-Speed 6282.35 samples/sec Loss 2.6443 LearningRate 0.0000 Epoch: 37 Global Step: 770010 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:48:10,294-Speed 6306.12 samples/sec Loss 2.6196 LearningRate 0.0000 Epoch: 37 Global Step: 770020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:13,550-Speed 6291.10 samples/sec Loss 2.6071 LearningRate 0.0000 Epoch: 37 Global Step: 770030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:16,808-Speed 6286.58 samples/sec Loss 2.6036 LearningRate 0.0000 Epoch: 37 Global Step: 770040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:20,066-Speed 6288.06 samples/sec Loss 2.5961 LearningRate 0.0000 Epoch: 37 Global Step: 770050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:23,337-Speed 6263.03 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 37 Global Step: 770060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:26,596-Speed 6284.41 samples/sec Loss 2.6266 LearningRate 0.0000 Epoch: 37 Global Step: 770070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:29,866-Speed 6264.33 
samples/sec Loss 2.5841 LearningRate 0.0000 Epoch: 37 Global Step: 770080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:33,126-Speed 6284.56 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 37 Global Step: 770090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:36,388-Speed 6280.97 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 37 Global Step: 770100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:39,648-Speed 6282.88 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 37 Global Step: 770110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:42,892-Speed 6314.31 samples/sec Loss 2.6070 LearningRate 0.0000 Epoch: 37 Global Step: 770120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:46,156-Speed 6276.31 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 37 Global Step: 770130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:49,413-Speed 6290.55 samples/sec Loss 2.6493 LearningRate 0.0000 Epoch: 37 Global Step: 770140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:52,676-Speed 6276.68 samples/sec Loss 2.6063 LearningRate 0.0000 Epoch: 37 Global Step: 770150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:55,941-Speed 6275.20 samples/sec Loss 2.6254 LearningRate 0.0000 Epoch: 37 Global Step: 770160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:48:59,205-Speed 6276.59 samples/sec Loss 2.6328 LearningRate 0.0000 Epoch: 37 Global Step: 770170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:02,466-Speed 6280.97 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 37 Global Step: 770180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:05,723-Speed 6289.15 samples/sec Loss 2.5830 LearningRate 0.0000 Epoch: 37 Global Step: 770190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:08,986-Speed 6279.35 samples/sec Loss 2.6258 LearningRate 0.0000 Epoch: 37 
Global Step: 770200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:12,244-Speed 6286.20 samples/sec Loss 2.6146 LearningRate 0.0000 Epoch: 37 Global Step: 770210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:15,501-Speed 6288.83 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 37 Global Step: 770220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:18,760-Speed 6286.02 samples/sec Loss 2.6524 LearningRate 0.0000 Epoch: 37 Global Step: 770230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:22,016-Speed 6292.26 samples/sec Loss 2.6015 LearningRate 0.0000 Epoch: 37 Global Step: 770240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:25,280-Speed 6274.78 samples/sec Loss 2.6224 LearningRate 0.0000 Epoch: 37 Global Step: 770250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:28,539-Speed 6285.25 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 37 Global Step: 770260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:31,795-Speed 6291.39 samples/sec Loss 2.6034 LearningRate 0.0000 Epoch: 37 Global Step: 770270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:35,060-Speed 6275.12 samples/sec Loss 2.6282 LearningRate 0.0000 Epoch: 37 Global Step: 770280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:38,325-Speed 6274.46 samples/sec Loss 2.6118 LearningRate 0.0000 Epoch: 37 Global Step: 770290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:41,583-Speed 6287.05 samples/sec Loss 2.6098 LearningRate 0.0000 Epoch: 37 Global Step: 770300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:44,843-Speed 6283.69 samples/sec Loss 2.5781 LearningRate 0.0000 Epoch: 37 Global Step: 770310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:48,098-Speed 6293.64 samples/sec Loss 2.6698 LearningRate 0.0000 Epoch: 37 Global Step: 770320 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:49:51,364-Speed 6270.14 samples/sec Loss 2.6935 LearningRate 0.0000 Epoch: 37 Global Step: 770330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:54,624-Speed 6284.94 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 37 Global Step: 770340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:49:57,888-Speed 6276.31 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 37 Global Step: 770350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:01,153-Speed 6273.35 samples/sec Loss 2.6211 LearningRate 0.0000 Epoch: 37 Global Step: 770360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:04,417-Speed 6277.32 samples/sec Loss 2.6093 LearningRate 0.0000 Epoch: 37 Global Step: 770370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:07,674-Speed 6289.19 samples/sec Loss 2.6139 LearningRate 0.0000 Epoch: 37 Global Step: 770380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:10,932-Speed 6286.11 samples/sec Loss 2.6088 LearningRate 0.0000 Epoch: 37 Global Step: 770390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:14,200-Speed 6268.32 samples/sec Loss 2.5740 LearningRate 0.0000 Epoch: 37 Global Step: 770400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:17,462-Speed 6280.40 samples/sec Loss 2.6054 LearningRate 0.0000 Epoch: 37 Global Step: 770410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:20,727-Speed 6274.55 samples/sec Loss 2.6270 LearningRate 0.0000 Epoch: 37 Global Step: 770420 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:50:23,978-Speed 6300.46 samples/sec Loss 2.6070 LearningRate 0.0000 Epoch: 37 Global Step: 770430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:27,250-Speed 6259.78 samples/sec Loss 2.6721 LearningRate 0.0000 Epoch: 37 Global Step: 770440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:30,503-Speed 6301.79 
samples/sec Loss 2.6284 LearningRate 0.0000 Epoch: 37 Global Step: 770450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:33,765-Speed 6277.89 samples/sec Loss 2.6534 LearningRate 0.0000 Epoch: 37 Global Step: 770460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:37,026-Speed 6287.10 samples/sec Loss 2.6205 LearningRate 0.0000 Epoch: 37 Global Step: 770470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:40,284-Speed 6287.25 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 37 Global Step: 770480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:43,543-Speed 6284.99 samples/sec Loss 2.6257 LearningRate 0.0000 Epoch: 37 Global Step: 770490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:46,809-Speed 6273.49 samples/sec Loss 2.6783 LearningRate 0.0000 Epoch: 37 Global Step: 770500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:50,088-Speed 6247.12 samples/sec Loss 2.6477 LearningRate 0.0000 Epoch: 37 Global Step: 770510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:53,364-Speed 6252.88 samples/sec Loss 2.6044 LearningRate 0.0000 Epoch: 37 Global Step: 770520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:56,600-Speed 6329.11 samples/sec Loss 2.6283 LearningRate 0.0000 Epoch: 37 Global Step: 770530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:50:59,853-Speed 6297.46 samples/sec Loss 2.5869 LearningRate 0.0000 Epoch: 37 Global Step: 770540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:03,126-Speed 6258.15 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 37 Global Step: 770550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:06,388-Speed 6282.18 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 37 Global Step: 770560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:09,650-Speed 6278.82 samples/sec Loss 2.5711 LearningRate 0.0000 Epoch: 37 
Global Step: 770570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:12,905-Speed 6293.89 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 37 Global Step: 770580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:16,164-Speed 6284.36 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 37 Global Step: 770590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:19,433-Speed 6267.49 samples/sec Loss 2.6099 LearningRate 0.0000 Epoch: 37 Global Step: 770600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:22,693-Speed 6283.48 samples/sec Loss 2.6088 LearningRate 0.0000 Epoch: 37 Global Step: 770610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:25,951-Speed 6285.96 samples/sec Loss 2.6090 LearningRate 0.0000 Epoch: 37 Global Step: 770620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:29,215-Speed 6276.74 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 770630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:32,481-Speed 6273.00 samples/sec Loss 2.6791 LearningRate 0.0000 Epoch: 37 Global Step: 770640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:35,751-Speed 6263.92 samples/sec Loss 2.5830 LearningRate 0.0000 Epoch: 37 Global Step: 770650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:39,006-Speed 6292.68 samples/sec Loss 2.5878 LearningRate 0.0000 Epoch: 37 Global Step: 770660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:42,262-Speed 6292.07 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 770670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:45,525-Speed 6277.47 samples/sec Loss 2.6261 LearningRate 0.0000 Epoch: 37 Global Step: 770680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:48,789-Speed 6275.15 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 37 Global Step: 770690 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:51:52,061-Speed 6261.88 samples/sec Loss 2.6189 LearningRate 0.0000 Epoch: 37 Global Step: 770700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:55,328-Speed 6268.36 samples/sec Loss 2.5909 LearningRate 0.0000 Epoch: 37 Global Step: 770710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:51:58,602-Speed 6258.00 samples/sec Loss 2.6065 LearningRate 0.0000 Epoch: 37 Global Step: 770720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:01,857-Speed 6292.91 samples/sec Loss 2.6256 LearningRate 0.0000 Epoch: 37 Global Step: 770730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:05,122-Speed 6275.52 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 37 Global Step: 770740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:08,388-Speed 6270.26 samples/sec Loss 2.5791 LearningRate 0.0000 Epoch: 37 Global Step: 770750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:11,644-Speed 6292.42 samples/sec Loss 2.6331 LearningRate 0.0000 Epoch: 37 Global Step: 770760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:14,900-Speed 6290.78 samples/sec Loss 2.6209 LearningRate 0.0000 Epoch: 37 Global Step: 770770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:18,164-Speed 6277.72 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 37 Global Step: 770780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:21,429-Speed 6272.89 samples/sec Loss 2.6069 LearningRate 0.0000 Epoch: 37 Global Step: 770790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:24,704-Speed 6254.55 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 37 Global Step: 770800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:27,973-Speed 6267.01 samples/sec Loss 2.6398 LearningRate 0.0000 Epoch: 37 Global Step: 770810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:31,242-Speed 6266.66 
samples/sec Loss 2.6059 LearningRate 0.0000 Epoch: 37 Global Step: 770820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:34,499-Speed 6289.30 samples/sec Loss 2.6078 LearningRate 0.0000 Epoch: 37 Global Step: 770830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:52:37,744-Speed 6311.78 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 Global Step: 770840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:41,001-Speed 6289.87 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 37 Global Step: 770850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:44,260-Speed 6286.14 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 37 Global Step: 770860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:47,524-Speed 6275.86 samples/sec Loss 2.5921 LearningRate 0.0000 Epoch: 37 Global Step: 770870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:50,794-Speed 6263.86 samples/sec Loss 2.6558 LearningRate 0.0000 Epoch: 37 Global Step: 770880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:54,055-Speed 6280.31 samples/sec Loss 2.6111 LearningRate 0.0000 Epoch: 37 Global Step: 770890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:52:57,318-Speed 6278.37 samples/sec Loss 2.6535 LearningRate 0.0000 Epoch: 37 Global Step: 770900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:00,574-Speed 6291.95 samples/sec Loss 2.6129 LearningRate 0.0000 Epoch: 37 Global Step: 770910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:03,835-Speed 6282.19 samples/sec Loss 2.6671 LearningRate 0.0000 Epoch: 37 Global Step: 770920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:07,091-Speed 6291.24 samples/sec Loss 2.5019 LearningRate 0.0000 Epoch: 37 Global Step: 770930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:10,331-Speed 6321.34 samples/sec Loss 2.6457 LearningRate 0.0000 Epoch: 37 
Global Step: 770940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:13,591-Speed 6283.48 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 37 Global Step: 770950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:16,845-Speed 6295.89 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 770960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:20,109-Speed 6275.61 samples/sec Loss 2.6739 LearningRate 0.0000 Epoch: 37 Global Step: 770970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:23,402-Speed 6222.18 samples/sec Loss 2.5926 LearningRate 0.0000 Epoch: 37 Global Step: 770980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:26,659-Speed 6289.31 samples/sec Loss 2.6366 LearningRate 0.0000 Epoch: 37 Global Step: 770990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:29,920-Speed 6281.45 samples/sec Loss 2.6610 LearningRate 0.0000 Epoch: 37 Global Step: 771000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:33,183-Speed 6277.15 samples/sec Loss 2.6319 LearningRate 0.0000 Epoch: 37 Global Step: 771010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:36,444-Speed 6281.50 samples/sec Loss 2.5969 LearningRate 0.0000 Epoch: 37 Global Step: 771020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:39,714-Speed 6265.76 samples/sec Loss 2.5937 LearningRate 0.0000 Epoch: 37 Global Step: 771030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:42,953-Speed 6322.82 samples/sec Loss 2.5714 LearningRate 0.0000 Epoch: 37 Global Step: 771040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:46,223-Speed 6264.26 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 37 Global Step: 771050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:49,491-Speed 6269.07 samples/sec Loss 2.6412 LearningRate 0.0000 Epoch: 37 Global Step: 771060 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:53:52,743-Speed 6298.28 samples/sec Loss 2.6113 LearningRate 0.0000 Epoch: 37 Global Step: 771070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:55,997-Speed 6295.04 samples/sec Loss 2.5991 LearningRate 0.0000 Epoch: 37 Global Step: 771080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:53:59,255-Speed 6289.38 samples/sec Loss 2.6191 LearningRate 0.0000 Epoch: 37 Global Step: 771090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:02,518-Speed 6277.45 samples/sec Loss 2.6197 LearningRate 0.0000 Epoch: 37 Global Step: 771100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:05,788-Speed 6263.01 samples/sec Loss 2.6240 LearningRate 0.0000 Epoch: 37 Global Step: 771110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:09,043-Speed 6294.58 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 771120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:12,305-Speed 6278.36 samples/sec Loss 2.6317 LearningRate 0.0000 Epoch: 37 Global Step: 771130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:15,567-Speed 6280.49 samples/sec Loss 2.5985 LearningRate 0.0000 Epoch: 37 Global Step: 771140 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:54:18,808-Speed 6320.43 samples/sec Loss 2.6023 LearningRate 0.0000 Epoch: 37 Global Step: 771150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:22,067-Speed 6284.96 samples/sec Loss 2.6307 LearningRate 0.0000 Epoch: 37 Global Step: 771160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:25,325-Speed 6289.37 samples/sec Loss 2.6491 LearningRate 0.0000 Epoch: 37 Global Step: 771170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:28,580-Speed 6292.10 samples/sec Loss 2.6004 LearningRate 0.0000 Epoch: 37 Global Step: 771180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:31,841-Speed 6282.35 
samples/sec Loss 2.6199 LearningRate 0.0000 Epoch: 37 Global Step: 771190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:35,090-Speed 6305.15 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 37 Global Step: 771200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:38,351-Speed 6282.29 samples/sec Loss 2.5615 LearningRate 0.0000 Epoch: 37 Global Step: 771210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:41,608-Speed 6288.75 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 37 Global Step: 771220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:44,867-Speed 6285.85 samples/sec Loss 2.6520 LearningRate 0.0000 Epoch: 37 Global Step: 771230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:48,123-Speed 6290.60 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 37 Global Step: 771240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:51,363-Speed 6322.20 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 37 Global Step: 771250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:54,622-Speed 6286.64 samples/sec Loss 2.6331 LearningRate 0.0000 Epoch: 37 Global Step: 771260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:54:57,900-Speed 6248.37 samples/sec Loss 2.6065 LearningRate 0.0000 Epoch: 37 Global Step: 771270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:01,158-Speed 6287.54 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 771280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:04,432-Speed 6257.44 samples/sec Loss 2.6313 LearningRate 0.0000 Epoch: 37 Global Step: 771290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:07,696-Speed 6275.76 samples/sec Loss 2.5606 LearningRate 0.0000 Epoch: 37 Global Step: 771300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:10,955-Speed 6285.84 samples/sec Loss 2.6260 LearningRate 0.0000 Epoch: 37 
Global Step: 771310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:14,213-Speed 6285.94 samples/sec Loss 2.6553 LearningRate 0.0000 Epoch: 37 Global Step: 771320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:17,505-Speed 6223.47 samples/sec Loss 2.5763 LearningRate 0.0000 Epoch: 37 Global Step: 771330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:20,764-Speed 6285.03 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 37 Global Step: 771340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:24,004-Speed 6322.38 samples/sec Loss 2.5727 LearningRate 0.0000 Epoch: 37 Global Step: 771350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:27,257-Speed 6296.40 samples/sec Loss 2.6303 LearningRate 0.0000 Epoch: 37 Global Step: 771360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:30,514-Speed 6290.55 samples/sec Loss 2.6309 LearningRate 0.0000 Epoch: 37 Global Step: 771370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:33,774-Speed 6283.98 samples/sec Loss 2.5981 LearningRate 0.0000 Epoch: 37 Global Step: 771380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:37,036-Speed 6280.21 samples/sec Loss 2.6040 LearningRate 0.0000 Epoch: 37 Global Step: 771390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:40,320-Speed 6238.32 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 37 Global Step: 771400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:43,584-Speed 6274.18 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 37 Global Step: 771410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:46,848-Speed 6277.23 samples/sec Loss 2.5814 LearningRate 0.0000 Epoch: 37 Global Step: 771420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:50,104-Speed 6290.94 samples/sec Loss 2.6387 LearningRate 0.0000 Epoch: 37 Global Step: 771430 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:55:53,361-Speed 6288.46 samples/sec Loss 2.6354 LearningRate 0.0000 Epoch: 37 Global Step: 771440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:56,605-Speed 6315.77 samples/sec Loss 2.5958 LearningRate 0.0000 Epoch: 37 Global Step: 771450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:55:59,864-Speed 6285.79 samples/sec Loss 2.6387 LearningRate 0.0000 Epoch: 37 Global Step: 771460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:03,132-Speed 6267.80 samples/sec Loss 2.6361 LearningRate 0.0000 Epoch: 37 Global Step: 771470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:06,393-Speed 6281.43 samples/sec Loss 2.6747 LearningRate 0.0000 Epoch: 37 Global Step: 771480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:09,649-Speed 6290.43 samples/sec Loss 2.6076 LearningRate 0.0000 Epoch: 37 Global Step: 771490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:12,906-Speed 6290.04 samples/sec Loss 2.5875 LearningRate 0.0000 Epoch: 37 Global Step: 771500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:16,220-Speed 6182.18 samples/sec Loss 2.6403 LearningRate 0.0000 Epoch: 37 Global Step: 771510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:19,473-Speed 6296.46 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 771520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:22,735-Speed 6278.86 samples/sec Loss 2.5878 LearningRate 0.0000 Epoch: 37 Global Step: 771530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:25,992-Speed 6289.33 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 37 Global Step: 771540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:29,237-Speed 6312.68 samples/sec Loss 2.6429 LearningRate 0.0000 Epoch: 37 Global Step: 771550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:32,503-Speed 6273.32 
samples/sec Loss 2.5558 LearningRate 0.0000 Epoch: 37 Global Step: 771560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:35,768-Speed 6274.50 samples/sec Loss 2.5974 LearningRate 0.0000 Epoch: 37 Global Step: 771570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:39,037-Speed 6265.23 samples/sec Loss 2.5998 LearningRate 0.0000 Epoch: 37 Global Step: 771580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:42,300-Speed 6279.74 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 37 Global Step: 771590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:45,573-Speed 6258.44 samples/sec Loss 2.5975 LearningRate 0.0000 Epoch: 37 Global Step: 771600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:48,837-Speed 6274.65 samples/sec Loss 2.6761 LearningRate 0.0000 Epoch: 37 Global Step: 771610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:52,102-Speed 6273.62 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 37 Global Step: 771620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:55,370-Speed 6269.07 samples/sec Loss 2.6210 LearningRate 0.0000 Epoch: 37 Global Step: 771630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:56:58,626-Speed 6290.55 samples/sec Loss 2.6502 LearningRate 0.0000 Epoch: 37 Global Step: 771640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:01,894-Speed 6268.02 samples/sec Loss 2.5970 LearningRate 0.0000 Epoch: 37 Global Step: 771650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:57:05,140-Speed 6312.72 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 37 Global Step: 771660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:08,392-Speed 6297.55 samples/sec Loss 2.6054 LearningRate 0.0000 Epoch: 37 Global Step: 771670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:11,650-Speed 6288.14 samples/sec Loss 2.6641 LearningRate 0.0000 Epoch: 37 
Global Step: 771680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:14,910-Speed 6283.18 samples/sec Loss 2.6041 LearningRate 0.0000 Epoch: 37 Global Step: 771690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:18,182-Speed 6260.92 samples/sec Loss 2.5520 LearningRate 0.0000 Epoch: 37 Global Step: 771700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:21,440-Speed 6286.85 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 37 Global Step: 771710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:24,700-Speed 6283.86 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 37 Global Step: 771720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:27,957-Speed 6289.35 samples/sec Loss 2.6338 LearningRate 0.0000 Epoch: 37 Global Step: 771730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:31,209-Speed 6299.32 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 37 Global Step: 771740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:34,472-Speed 6277.04 samples/sec Loss 2.6862 LearningRate 0.0000 Epoch: 37 Global Step: 771750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:37,716-Speed 6314.68 samples/sec Loss 2.6184 LearningRate 0.0000 Epoch: 37 Global Step: 771760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:40,989-Speed 6258.96 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 771770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:44,248-Speed 6285.62 samples/sec Loss 2.6201 LearningRate 0.0000 Epoch: 37 Global Step: 771780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:47,505-Speed 6289.21 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 37 Global Step: 771790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:50,762-Speed 6289.24 samples/sec Loss 2.6154 LearningRate 0.0000 Epoch: 37 Global Step: 771800 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 13:57:54,022-Speed 6285.30 samples/sec Loss 2.6346 LearningRate 0.0000 Epoch: 37 Global Step: 771810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:57:57,284-Speed 6279.23 samples/sec Loss 2.6130 LearningRate 0.0000 Epoch: 37 Global Step: 771820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:00,554-Speed 6264.22 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 771830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:03,814-Speed 6284.13 samples/sec Loss 2.5763 LearningRate 0.0000 Epoch: 37 Global Step: 771840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:07,074-Speed 6282.41 samples/sec Loss 2.6458 LearningRate 0.0000 Epoch: 37 Global Step: 771850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:10,349-Speed 6256.15 samples/sec Loss 2.6780 LearningRate 0.0000 Epoch: 37 Global Step: 771860 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 13:58:13,588-Speed 6324.33 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 37 Global Step: 771870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:16,850-Speed 6279.82 samples/sec Loss 2.6315 LearningRate 0.0000 Epoch: 37 Global Step: 771880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:20,104-Speed 6294.76 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 771890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:23,362-Speed 6288.15 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 37 Global Step: 771900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:26,626-Speed 6275.45 samples/sec Loss 2.6018 LearningRate 0.0000 Epoch: 37 Global Step: 771910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:29,882-Speed 6290.13 samples/sec Loss 2.5804 LearningRate 0.0000 Epoch: 37 Global Step: 771920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:33,135-Speed 6298.28 
samples/sec Loss 2.5744 LearningRate 0.0000 Epoch: 37 Global Step: 771930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:36,394-Speed 6284.34 samples/sec Loss 2.5778 LearningRate 0.0000 Epoch: 37 Global Step: 771940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:39,656-Speed 6280.98 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 37 Global Step: 771950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:42,921-Speed 6272.96 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 37 Global Step: 771960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:46,169-Speed 6308.25 samples/sec Loss 2.5963 LearningRate 0.0000 Epoch: 37 Global Step: 771970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:49,449-Speed 6243.63 samples/sec Loss 2.6145 LearningRate 0.0000 Epoch: 37 Global Step: 771980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:52,709-Speed 6283.45 samples/sec Loss 2.5616 LearningRate 0.0000 Epoch: 37 Global Step: 771990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:55,967-Speed 6289.12 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 37 Global Step: 772000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:58:59,229-Speed 6280.34 samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 37 Global Step: 772010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:02,485-Speed 6290.52 samples/sec Loss 2.6244 LearningRate 0.0000 Epoch: 37 Global Step: 772020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:05,737-Speed 6299.84 samples/sec Loss 2.6189 LearningRate 0.0000 Epoch: 37 Global Step: 772030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:08,994-Speed 6288.66 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 37 Global Step: 772040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:12,272-Speed 6250.51 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 37 
Global Step: 772050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:15,575-Speed 6201.61 samples/sec Loss 2.6110 LearningRate 0.0000 Epoch: 37 Global Step: 772060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:18,813-Speed 6325.11 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 37 Global Step: 772070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:22,070-Speed 6288.79 samples/sec Loss 2.6490 LearningRate 0.0000 Epoch: 37 Global Step: 772080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:25,328-Speed 6288.64 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 37 Global Step: 772090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:28,587-Speed 6285.87 samples/sec Loss 2.6216 LearningRate 0.0000 Epoch: 37 Global Step: 772100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:31,845-Speed 6285.79 samples/sec Loss 2.6356 LearningRate 0.0000 Epoch: 37 Global Step: 772110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:35,104-Speed 6286.92 samples/sec Loss 2.6565 LearningRate 0.0000 Epoch: 37 Global Step: 772120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:38,361-Speed 6288.33 samples/sec Loss 2.6207 LearningRate 0.0000 Epoch: 37 Global Step: 772130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:41,612-Speed 6302.48 samples/sec Loss 2.6195 LearningRate 0.0000 Epoch: 37 Global Step: 772140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:44,873-Speed 6281.45 samples/sec Loss 2.6134 LearningRate 0.0000 Epoch: 37 Global Step: 772150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:48,122-Speed 6304.99 samples/sec Loss 2.6420 LearningRate 0.0000 Epoch: 37 Global Step: 772160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:51,378-Speed 6291.26 samples/sec Loss 2.6230 LearningRate 0.0000 Epoch: 37 Global Step: 772170 Fp16 Grad Scale: 8192 Required: 5 
hours Training: 2022-04-03 13:59:54,621-Speed 6314.97 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 37 Global Step: 772180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 13:59:57,885-Speed 6275.87 samples/sec Loss 2.6452 LearningRate 0.0000 Epoch: 37 Global Step: 772190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:01,135-Speed 6304.06 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 37 Global Step: 772200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:04,392-Speed 6290.15 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 772210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:07,653-Speed 6281.79 samples/sec Loss 2.6585 LearningRate 0.0000 Epoch: 37 Global Step: 772220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:10,915-Speed 6279.90 samples/sec Loss 2.6013 LearningRate 0.0000 Epoch: 37 Global Step: 772230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:14,177-Speed 6279.87 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 772240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:17,435-Speed 6286.29 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 37 Global Step: 772250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:20,689-Speed 6295.25 samples/sec Loss 2.5914 LearningRate 0.0000 Epoch: 37 Global Step: 772260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:23,952-Speed 6279.56 samples/sec Loss 2.6094 LearningRate 0.0000 Epoch: 37 Global Step: 772270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:27,194-Speed 6316.83 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 37 Global Step: 772280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:30,450-Speed 6292.28 samples/sec Loss 2.6309 LearningRate 0.0000 Epoch: 37 Global Step: 772290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:33,704-Speed 6296.00 
samples/sec Loss 2.6536 LearningRate 0.0000 Epoch: 37 Global Step: 772300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:36,960-Speed 6291.11 samples/sec Loss 2.6286 LearningRate 0.0000 Epoch: 37 Global Step: 772310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:40,223-Speed 6277.92 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 37 Global Step: 772320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:43,484-Speed 6280.74 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 37 Global Step: 772330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:46,745-Speed 6282.63 samples/sec Loss 2.6188 LearningRate 0.0000 Epoch: 37 Global Step: 772340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:50,008-Speed 6276.55 samples/sec Loss 2.6202 LearningRate 0.0000 Epoch: 37 Global Step: 772350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:53,270-Speed 6280.99 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 37 Global Step: 772360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:56,527-Speed 6289.29 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 Global Step: 772370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:00:59,775-Speed 6305.94 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 772380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:01:03,044-Speed 6267.74 samples/sec Loss 2.6745 LearningRate 0.0000 Epoch: 37 Global Step: 772390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:01:06,300-Speed 6291.03 samples/sec Loss 2.6347 LearningRate 0.0000 Epoch: 37 Global Step: 772400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:09,549-Speed 6304.63 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 37 Global Step: 772410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:12,812-Speed 6278.73 samples/sec Loss 2.6137 LearningRate 0.0000 Epoch: 37 
Global Step: 772420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:16,066-Speed 6295.59 samples/sec Loss 2.6156 LearningRate 0.0000 Epoch: 37 Global Step: 772430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:19,322-Speed 6290.35 samples/sec Loss 2.5869 LearningRate 0.0000 Epoch: 37 Global Step: 772440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:22,578-Speed 6290.59 samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 37 Global Step: 772450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:25,829-Speed 6301.98 samples/sec Loss 2.6497 LearningRate 0.0000 Epoch: 37 Global Step: 772460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:29,089-Speed 6282.78 samples/sec Loss 2.6012 LearningRate 0.0000 Epoch: 37 Global Step: 772470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:32,352-Speed 6279.28 samples/sec Loss 2.6712 LearningRate 0.0000 Epoch: 37 Global Step: 772480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:35,617-Speed 6274.37 samples/sec Loss 2.6187 LearningRate 0.0000 Epoch: 37 Global Step: 772490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:38,866-Speed 6303.63 samples/sec Loss 2.5863 LearningRate 0.0000 Epoch: 37 Global Step: 772500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:42,116-Speed 6304.37 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 37 Global Step: 772510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:45,379-Speed 6276.65 samples/sec Loss 2.6030 LearningRate 0.0000 Epoch: 37 Global Step: 772520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:48,632-Speed 6297.05 samples/sec Loss 2.6315 LearningRate 0.0000 Epoch: 37 Global Step: 772530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:51,882-Speed 6303.47 samples/sec Loss 2.6761 LearningRate 0.0000 Epoch: 37 Global Step: 772540 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:01:55,142-Speed 6284.28 samples/sec Loss 2.6018 LearningRate 0.0000 Epoch: 37 Global Step: 772550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:01:58,407-Speed 6272.78 samples/sec Loss 2.6208 LearningRate 0.0000 Epoch: 37 Global Step: 772560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:01,680-Speed 6257.79 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 37 Global Step: 772570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:04,938-Speed 6288.58 samples/sec Loss 2.6104 LearningRate 0.0000 Epoch: 37 Global Step: 772580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:08,192-Speed 6296.11 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 37 Global Step: 772590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:11,432-Speed 6321.92 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 37 Global Step: 772600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:14,686-Speed 6294.90 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 Global Step: 772610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:17,944-Speed 6288.56 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 37 Global Step: 772620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:21,207-Speed 6276.53 samples/sec Loss 2.6322 LearningRate 0.0000 Epoch: 37 Global Step: 772630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:24,467-Speed 6288.57 samples/sec Loss 2.6174 LearningRate 0.0000 Epoch: 37 Global Step: 772640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:27,718-Speed 6299.79 samples/sec Loss 2.6066 LearningRate 0.0000 Epoch: 37 Global Step: 772650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:30,967-Speed 6304.87 samples/sec Loss 2.5985 LearningRate 0.0000 Epoch: 37 Global Step: 772660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:34,220-Speed 6296.84 
samples/sec Loss 2.5399 LearningRate 0.0000 Epoch: 37 Global Step: 772670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:37,478-Speed 6287.54 samples/sec Loss 2.6392 LearningRate 0.0000 Epoch: 37 Global Step: 772680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:40,738-Speed 6284.11 samples/sec Loss 2.6502 LearningRate 0.0000 Epoch: 37 Global Step: 772690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:43,978-Speed 6322.43 samples/sec Loss 2.5964 LearningRate 0.0000 Epoch: 37 Global Step: 772700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:47,236-Speed 6288.13 samples/sec Loss 2.6202 LearningRate 0.0000 Epoch: 37 Global Step: 772710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:50,492-Speed 6289.84 samples/sec Loss 2.6228 LearningRate 0.0000 Epoch: 37 Global Step: 772720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:53,743-Speed 6301.34 samples/sec Loss 2.5886 LearningRate 0.0000 Epoch: 37 Global Step: 772730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:02:56,992-Speed 6305.69 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 37 Global Step: 772740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:00,267-Speed 6254.49 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 37 Global Step: 772750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:03,532-Speed 6273.02 samples/sec Loss 2.6449 LearningRate 0.0000 Epoch: 37 Global Step: 772760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:06,786-Speed 6296.01 samples/sec Loss 2.6202 LearningRate 0.0000 Epoch: 37 Global Step: 772770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:10,052-Speed 6271.04 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 37 Global Step: 772780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:13,311-Speed 6286.35 samples/sec Loss 2.6265 LearningRate 0.0000 Epoch: 37 
Global Step: 772790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:16,571-Speed 6285.26 samples/sec Loss 2.6159 LearningRate 0.0000 Epoch: 37 Global Step: 772800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:19,829-Speed 6286.60 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 37 Global Step: 772810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:23,087-Speed 6287.32 samples/sec Loss 2.6274 LearningRate 0.0000 Epoch: 37 Global Step: 772820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:26,354-Speed 6269.26 samples/sec Loss 2.6174 LearningRate 0.0000 Epoch: 37 Global Step: 772830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:29,612-Speed 6288.37 samples/sec Loss 2.5915 LearningRate 0.0000 Epoch: 37 Global Step: 772840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:32,872-Speed 6282.65 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 37 Global Step: 772850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:36,125-Speed 6297.71 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 772860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:39,384-Speed 6285.98 samples/sec Loss 2.5989 LearningRate 0.0000 Epoch: 37 Global Step: 772870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:42,646-Speed 6279.17 samples/sec Loss 2.6252 LearningRate 0.0000 Epoch: 37 Global Step: 772880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:45,903-Speed 6289.77 samples/sec Loss 2.6371 LearningRate 0.0000 Epoch: 37 Global Step: 772890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:49,163-Speed 6283.91 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 37 Global Step: 772900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:03:52,410-Speed 6307.77 samples/sec Loss 2.5867 LearningRate 0.0000 Epoch: 37 Global Step: 772910 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:03:55,668-Speed 6289.12 samples/sec Loss 2.6452 LearningRate 0.0000 Epoch: 37 Global Step: 772920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:03:58,925-Speed 6288.76 samples/sec Loss 2.5892 LearningRate 0.0000 Epoch: 37 Global Step: 772930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:02,182-Speed 6288.85 samples/sec Loss 2.5993 LearningRate 0.0000 Epoch: 37 Global Step: 772940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:05,441-Speed 6286.57 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 37 Global Step: 772950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:08,703-Speed 6279.71 samples/sec Loss 2.6597 LearningRate 0.0000 Epoch: 37 Global Step: 772960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:11,961-Speed 6286.83 samples/sec Loss 2.6229 LearningRate 0.0000 Epoch: 37 Global Step: 772970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:15,220-Speed 6285.60 samples/sec Loss 2.5897 LearningRate 0.0000 Epoch: 37 Global Step: 772980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:18,481-Speed 6281.89 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 Global Step: 772990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:21,737-Speed 6291.68 samples/sec Loss 2.6260 LearningRate 0.0000 Epoch: 37 Global Step: 773000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:24,976-Speed 6325.21 samples/sec Loss 2.5957 LearningRate 0.0000 Epoch: 37 Global Step: 773010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:28,232-Speed 6291.17 samples/sec Loss 2.6170 LearningRate 0.0000 Epoch: 37 Global Step: 773020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:31,486-Speed 6293.97 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 37 Global Step: 773030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:34,747-Speed 6282.51 
samples/sec Loss 2.5957 LearningRate 0.0000 Epoch: 37 Global Step: 773040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:38,013-Speed 6272.30 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 773050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:41,284-Speed 6261.82 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 37 Global Step: 773060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:44,544-Speed 6283.93 samples/sec Loss 2.5963 LearningRate 0.0000 Epoch: 37 Global Step: 773070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:47,796-Speed 6298.25 samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 37 Global Step: 773080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:51,065-Speed 6266.83 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 37 Global Step: 773090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:54,319-Speed 6294.82 samples/sec Loss 2.5975 LearningRate 0.0000 Epoch: 37 Global Step: 773100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:04:57,559-Speed 6323.34 samples/sec Loss 2.6125 LearningRate 0.0000 Epoch: 37 Global Step: 773110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:00,815-Speed 6290.58 samples/sec Loss 2.6065 LearningRate 0.0000 Epoch: 37 Global Step: 773120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:04,073-Speed 6287.56 samples/sec Loss 2.6655 LearningRate 0.0000 Epoch: 37 Global Step: 773130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:07,327-Speed 6296.13 samples/sec Loss 2.5810 LearningRate 0.0000 Epoch: 37 Global Step: 773140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:10,584-Speed 6288.40 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 37 Global Step: 773150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:13,842-Speed 6288.04 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 37 
Global Step: 773160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:17,096-Speed 6295.08 samples/sec Loss 2.5714 LearningRate 0.0000 Epoch: 37 Global Step: 773170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:20,356-Speed 6283.37 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 37 Global Step: 773180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:23,618-Speed 6280.89 samples/sec Loss 2.6148 LearningRate 0.0000 Epoch: 37 Global Step: 773190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:26,880-Speed 6279.34 samples/sec Loss 2.6333 LearningRate 0.0000 Epoch: 37 Global Step: 773200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:30,121-Speed 6319.34 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 Global Step: 773210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:33,373-Speed 6300.33 samples/sec Loss 2.5946 LearningRate 0.0000 Epoch: 37 Global Step: 773220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:36,628-Speed 6292.67 samples/sec Loss 2.6497 LearningRate 0.0000 Epoch: 37 Global Step: 773230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:39,885-Speed 6289.69 samples/sec Loss 2.6335 LearningRate 0.0000 Epoch: 37 Global Step: 773240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:43,138-Speed 6297.92 samples/sec Loss 2.5728 LearningRate 0.0000 Epoch: 37 Global Step: 773250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:46,391-Speed 6297.30 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 37 Global Step: 773260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:49,662-Speed 6261.11 samples/sec Loss 2.5844 LearningRate 0.0000 Epoch: 37 Global Step: 773270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:52,930-Speed 6269.72 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 37 Global Step: 773280 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:05:56,201-Speed 6262.32 samples/sec Loss 2.7052 LearningRate 0.0000 Epoch: 37 Global Step: 773290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:05:59,455-Speed 6294.30 samples/sec Loss 2.6205 LearningRate 0.0000 Epoch: 37 Global Step: 773300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:02,716-Speed 6280.61 samples/sec Loss 2.6251 LearningRate 0.0000 Epoch: 37 Global Step: 773310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:06:05,967-Speed 6302.14 samples/sec Loss 2.6610 LearningRate 0.0000 Epoch: 37 Global Step: 773320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:09,224-Speed 6290.10 samples/sec Loss 2.6079 LearningRate 0.0000 Epoch: 37 Global Step: 773330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:12,472-Speed 6306.06 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 Global Step: 773340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:15,730-Speed 6289.17 samples/sec Loss 2.5954 LearningRate 0.0000 Epoch: 37 Global Step: 773350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:18,987-Speed 6289.62 samples/sec Loss 2.6028 LearningRate 0.0000 Epoch: 37 Global Step: 773360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:22,243-Speed 6291.88 samples/sec Loss 2.6256 LearningRate 0.0000 Epoch: 37 Global Step: 773370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:25,502-Speed 6285.24 samples/sec Loss 2.6501 LearningRate 0.0000 Epoch: 37 Global Step: 773380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:28,781-Speed 6247.33 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 773390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:32,033-Speed 6299.41 samples/sec Loss 2.6320 LearningRate 0.0000 Epoch: 37 Global Step: 773400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:35,303-Speed 6264.68 
samples/sec Loss 2.5616 LearningRate 0.0000 Epoch: 37 Global Step: 773410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:38,557-Speed 6295.03 samples/sec Loss 2.6072 LearningRate 0.0000 Epoch: 37 Global Step: 773420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:06:41,796-Speed 6325.24 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 37 Global Step: 773430 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:06:45,050-Speed 6294.75 samples/sec Loss 2.6104 LearningRate 0.0000 Epoch: 37 Global Step: 773440 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:06:48,308-Speed 6288.04 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 37 Global Step: 773450 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:06:51,588-Speed 6245.41 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 37 Global Step: 773460 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:06:54,849-Speed 6280.87 samples/sec Loss 2.5788 LearningRate 0.0000 Epoch: 37 Global Step: 773470 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:06:58,111-Speed 6278.90 samples/sec Loss 2.6387 LearningRate 0.0000 Epoch: 37 Global Step: 773480 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:07:01,359-Speed 6307.78 samples/sec Loss 2.6150 LearningRate 0.0000 Epoch: 37 Global Step: 773490 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:07:04,620-Speed 6281.45 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 37 Global Step: 773500 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:07:07,886-Speed 6272.43 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 37 Global Step: 773510 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:07:11,148-Speed 6280.16 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 37 Global Step: 773520 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:07:14,406-Speed 6286.44 samples/sec Loss 2.5928 LearningRate 0.0000 Epoch: 37 
Global Step: 773530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:17,663-Speed 6290.07 samples/sec Loss 2.5704 LearningRate 0.0000 Epoch: 37 Global Step: 773540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:20,924-Speed 6281.66 samples/sec Loss 2.5972 LearningRate 0.0000 Epoch: 37 Global Step: 773550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:24,182-Speed 6286.69 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 773560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:27,444-Speed 6280.05 samples/sec Loss 2.5993 LearningRate 0.0000 Epoch: 37 Global Step: 773570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:30,706-Speed 6280.53 samples/sec Loss 2.6456 LearningRate 0.0000 Epoch: 37 Global Step: 773580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:33,965-Speed 6285.44 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 37 Global Step: 773590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:37,228-Speed 6279.00 samples/sec Loss 2.5914 LearningRate 0.0000 Epoch: 37 Global Step: 773600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:40,484-Speed 6290.62 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 37 Global Step: 773610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:43,738-Speed 6294.67 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 37 Global Step: 773620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:46,983-Speed 6312.73 samples/sec Loss 2.6734 LearningRate 0.0000 Epoch: 37 Global Step: 773630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:50,243-Speed 6284.53 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 37 Global Step: 773640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:07:53,504-Speed 6282.24 samples/sec Loss 2.7077 LearningRate 0.0000 Epoch: 37 Global Step: 773650 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:07:56,762-Speed 6285.92 samples/sec Loss 2.6084 LearningRate 0.0000 Epoch: 37 Global Step: 773660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:00,025-Speed 6278.38 samples/sec Loss 2.5741 LearningRate 0.0000 Epoch: 37 Global Step: 773670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:03,286-Speed 6280.87 samples/sec Loss 2.5843 LearningRate 0.0000 Epoch: 37 Global Step: 773680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:06,544-Speed 6288.21 samples/sec Loss 2.6215 LearningRate 0.0000 Epoch: 37 Global Step: 773690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:09,801-Speed 6290.48 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 37 Global Step: 773700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:13,061-Speed 6282.46 samples/sec Loss 2.6166 LearningRate 0.0000 Epoch: 37 Global Step: 773710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:16,320-Speed 6286.46 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 37 Global Step: 773720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:19,562-Speed 6319.20 samples/sec Loss 2.6218 LearningRate 0.0000 Epoch: 37 Global Step: 773730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:22,832-Speed 6263.14 samples/sec Loss 2.6196 LearningRate 0.0000 Epoch: 37 Global Step: 773740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:26,091-Speed 6285.64 samples/sec Loss 2.5939 LearningRate 0.0000 Epoch: 37 Global Step: 773750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:29,349-Speed 6288.23 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 37 Global Step: 773760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:32,609-Speed 6282.70 samples/sec Loss 2.6261 LearningRate 0.0000 Epoch: 37 Global Step: 773770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:35,870-Speed 6282.27 
samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 37 Global Step: 773780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:39,129-Speed 6286.06 samples/sec Loss 2.6124 LearningRate 0.0000 Epoch: 37 Global Step: 773790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:42,391-Speed 6280.45 samples/sec Loss 2.6246 LearningRate 0.0000 Epoch: 37 Global Step: 773800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:45,639-Speed 6305.36 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 37 Global Step: 773810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:48,901-Speed 6280.88 samples/sec Loss 2.6276 LearningRate 0.0000 Epoch: 37 Global Step: 773820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:52,153-Speed 6298.77 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 773830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:08:55,405-Speed 6299.51 samples/sec Loss 2.5908 LearningRate 0.0000 Epoch: 37 Global Step: 773840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:08:58,664-Speed 6284.79 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 37 Global Step: 773850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:01,917-Speed 6296.57 samples/sec Loss 2.6269 LearningRate 0.0000 Epoch: 37 Global Step: 773860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:05,180-Speed 6277.55 samples/sec Loss 2.6057 LearningRate 0.0000 Epoch: 37 Global Step: 773870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:08,429-Speed 6305.40 samples/sec Loss 2.6002 LearningRate 0.0000 Epoch: 37 Global Step: 773880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:11,690-Speed 6281.69 samples/sec Loss 2.6183 LearningRate 0.0000 Epoch: 37 Global Step: 773890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:14,940-Speed 6302.71 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 
Global Step: 773900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:18,199-Speed 6285.24 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 37 Global Step: 773910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:21,462-Speed 6279.53 samples/sec Loss 2.5833 LearningRate 0.0000 Epoch: 37 Global Step: 773920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:24,716-Speed 6294.40 samples/sec Loss 2.6108 LearningRate 0.0000 Epoch: 37 Global Step: 773930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:27,956-Speed 6321.81 samples/sec Loss 2.6292 LearningRate 0.0000 Epoch: 37 Global Step: 773940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:31,215-Speed 6286.15 samples/sec Loss 2.6093 LearningRate 0.0000 Epoch: 37 Global Step: 773950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:34,476-Speed 6280.99 samples/sec Loss 2.6100 LearningRate 0.0000 Epoch: 37 Global Step: 773960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:37,736-Speed 6284.89 samples/sec Loss 2.5852 LearningRate 0.0000 Epoch: 37 Global Step: 773970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:40,993-Speed 6289.99 samples/sec Loss 2.6154 LearningRate 0.0000 Epoch: 37 Global Step: 773980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:44,251-Speed 6286.45 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 37 Global Step: 773990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:47,508-Speed 6289.95 samples/sec Loss 2.6049 LearningRate 0.0000 Epoch: 37 Global Step: 774000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:50,762-Speed 6294.71 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 774010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:09:54,020-Speed 6288.60 samples/sec Loss 2.5982 LearningRate 0.0000 Epoch: 37 Global Step: 774020 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:09:57,280-Speed 6284.15 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 37 Global Step: 774030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:00,537-Speed 6289.32 samples/sec Loss 2.6278 LearningRate 0.0000 Epoch: 37 Global Step: 774040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:03,793-Speed 6289.72 samples/sec Loss 2.5984 LearningRate 0.0000 Epoch: 37 Global Step: 774050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:07,055-Speed 6281.01 samples/sec Loss 2.6059 LearningRate 0.0000 Epoch: 37 Global Step: 774060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:10,314-Speed 6285.86 samples/sec Loss 2.6489 LearningRate 0.0000 Epoch: 37 Global Step: 774070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:13,584-Speed 6263.56 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 774080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:16,838-Speed 6294.53 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 774090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:20,092-Speed 6294.83 samples/sec Loss 2.6009 LearningRate 0.0000 Epoch: 37 Global Step: 774100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:23,354-Speed 6280.70 samples/sec Loss 2.6002 LearningRate 0.0000 Epoch: 37 Global Step: 774110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:26,612-Speed 6288.23 samples/sec Loss 2.6248 LearningRate 0.0000 Epoch: 37 Global Step: 774120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:29,864-Speed 6298.33 samples/sec Loss 2.5740 LearningRate 0.0000 Epoch: 37 Global Step: 774130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:33,110-Speed 6309.57 samples/sec Loss 2.5334 LearningRate 0.0000 Epoch: 37 Global Step: 774140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:36,368-Speed 6289.17 
samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 774150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:39,622-Speed 6294.95 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 774160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:42,873-Speed 6300.88 samples/sec Loss 2.6222 LearningRate 0.0000 Epoch: 37 Global Step: 774170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:46,129-Speed 6292.12 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 37 Global Step: 774180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:49,390-Speed 6281.61 samples/sec Loss 2.5988 LearningRate 0.0000 Epoch: 37 Global Step: 774190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:52,644-Speed 6294.14 samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 37 Global Step: 774200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:55,904-Speed 6285.16 samples/sec Loss 2.5875 LearningRate 0.0000 Epoch: 37 Global Step: 774210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:10:59,172-Speed 6267.80 samples/sec Loss 2.5993 LearningRate 0.0000 Epoch: 37 Global Step: 774220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:02,434-Speed 6279.70 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 37 Global Step: 774230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:05,708-Speed 6256.38 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 37 Global Step: 774240 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:11:08,950-Speed 6319.15 samples/sec Loss 2.6464 LearningRate 0.0000 Epoch: 37 Global Step: 774250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:12,212-Speed 6278.62 samples/sec Loss 2.6481 LearningRate 0.0000 Epoch: 37 Global Step: 774260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:15,463-Speed 6302.05 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 37 
Global Step: 774270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:18,717-Speed 6294.33 samples/sec Loss 2.6089 LearningRate 0.0000 Epoch: 37 Global Step: 774280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:21,984-Speed 6270.86 samples/sec Loss 2.5861 LearningRate 0.0000 Epoch: 37 Global Step: 774290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:25,239-Speed 6293.64 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 774300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:28,497-Speed 6285.94 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 37 Global Step: 774310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:31,748-Speed 6301.81 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 37 Global Step: 774320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:35,003-Speed 6293.29 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 37 Global Step: 774330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:38,265-Speed 6279.31 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 37 Global Step: 774340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:41,508-Speed 6320.49 samples/sec Loss 2.5888 LearningRate 0.0000 Epoch: 37 Global Step: 774350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:44,763-Speed 6293.70 samples/sec Loss 2.6040 LearningRate 0.0000 Epoch: 37 Global Step: 774360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:48,021-Speed 6287.02 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 37 Global Step: 774370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:51,299-Speed 6249.38 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 37 Global Step: 774380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:11:54,553-Speed 6295.89 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 37 Global Step: 774390 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:11:57,814-Speed 6282.99 samples/sec Loss 2.5868 LearningRate 0.0000 Epoch: 37 Global Step: 774400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:01,080-Speed 6270.55 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 37 Global Step: 774410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:04,339-Speed 6286.95 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 774420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:07,594-Speed 6291.87 samples/sec Loss 2.5850 LearningRate 0.0000 Epoch: 37 Global Step: 774430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:10,855-Speed 6282.46 samples/sec Loss 2.6119 LearningRate 0.0000 Epoch: 37 Global Step: 774440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:14,132-Speed 6250.01 samples/sec Loss 2.6182 LearningRate 0.0000 Epoch: 37 Global Step: 774450 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:12:17,370-Speed 6326.28 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 37 Global Step: 774460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:20,623-Speed 6298.23 samples/sec Loss 2.5996 LearningRate 0.0000 Epoch: 37 Global Step: 774470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:23,879-Speed 6290.17 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 37 Global Step: 774480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:27,140-Speed 6282.68 samples/sec Loss 2.6498 LearningRate 0.0000 Epoch: 37 Global Step: 774490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:30,389-Speed 6304.56 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 774500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:33,644-Speed 6292.84 samples/sec Loss 2.6424 LearningRate 0.0000 Epoch: 37 Global Step: 774510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:36,898-Speed 6295.29 
samples/sec Loss 2.5970 LearningRate 0.0000 Epoch: 37 Global Step: 774520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:40,152-Speed 6296.51 samples/sec Loss 2.6748 LearningRate 0.0000 Epoch: 37 Global Step: 774530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:43,408-Speed 6290.49 samples/sec Loss 2.5858 LearningRate 0.0000 Epoch: 37 Global Step: 774540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:46,664-Speed 6291.25 samples/sec Loss 2.5920 LearningRate 0.0000 Epoch: 37 Global Step: 774550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:49,898-Speed 6333.59 samples/sec Loss 2.5873 LearningRate 0.0000 Epoch: 37 Global Step: 774560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:53,170-Speed 6261.15 samples/sec Loss 2.6376 LearningRate 0.0000 Epoch: 37 Global Step: 774570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:56,431-Speed 6282.08 samples/sec Loss 2.6137 LearningRate 0.0000 Epoch: 37 Global Step: 774580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:12:59,683-Speed 6299.40 samples/sec Loss 2.6223 LearningRate 0.0000 Epoch: 37 Global Step: 774590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:02,941-Speed 6286.70 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 37 Global Step: 774600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:06,202-Speed 6281.67 samples/sec Loss 2.6002 LearningRate 0.0000 Epoch: 37 Global Step: 774610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:09,463-Speed 6283.37 samples/sec Loss 2.6066 LearningRate 0.0000 Epoch: 37 Global Step: 774620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:12,723-Speed 6282.28 samples/sec Loss 2.6189 LearningRate 0.0000 Epoch: 37 Global Step: 774630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:15,977-Speed 6295.05 samples/sec Loss 2.6187 LearningRate 0.0000 Epoch: 37 
Global Step: 774640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:19,236-Speed 6285.23 samples/sec Loss 2.6069 LearningRate 0.0000 Epoch: 37 Global Step: 774650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:22,475-Speed 6325.37 samples/sec Loss 2.6231 LearningRate 0.0000 Epoch: 37 Global Step: 774660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:25,728-Speed 6297.01 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 774670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:28,982-Speed 6295.68 samples/sec Loss 2.5990 LearningRate 0.0000 Epoch: 37 Global Step: 774680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:32,237-Speed 6293.69 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 37 Global Step: 774690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:35,496-Speed 6285.59 samples/sec Loss 2.5758 LearningRate 0.0000 Epoch: 37 Global Step: 774700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:38,823-Speed 6155.83 samples/sec Loss 2.6028 LearningRate 0.0000 Epoch: 37 Global Step: 774710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:42,091-Speed 6267.79 samples/sec Loss 2.6101 LearningRate 0.0000 Epoch: 37 Global Step: 774720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:45,359-Speed 6268.62 samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 37 Global Step: 774730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:48,622-Speed 6278.30 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 37 Global Step: 774740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:51,877-Speed 6292.95 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 37 Global Step: 774750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:13:55,128-Speed 6301.24 samples/sec Loss 2.5868 LearningRate 0.0000 Epoch: 37 Global Step: 774760 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:13:58,386-Speed 6287.82 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 37 Global Step: 774770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:01,633-Speed 6309.46 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 37 Global Step: 774780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:04,905-Speed 6260.52 samples/sec Loss 2.6620 LearningRate 0.0000 Epoch: 37 Global Step: 774790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:08,161-Speed 6290.46 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 37 Global Step: 774800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:11,430-Speed 6267.33 samples/sec Loss 2.6561 LearningRate 0.0000 Epoch: 37 Global Step: 774810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:14,687-Speed 6288.63 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 37 Global Step: 774820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:17,938-Speed 6301.20 samples/sec Loss 2.6370 LearningRate 0.0000 Epoch: 37 Global Step: 774830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:21,192-Speed 6296.79 samples/sec Loss 2.6666 LearningRate 0.0000 Epoch: 37 Global Step: 774840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:24,458-Speed 6272.08 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 774850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:27,700-Speed 6318.33 samples/sec Loss 2.6202 LearningRate 0.0000 Epoch: 37 Global Step: 774860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:30,958-Speed 6286.84 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 37 Global Step: 774870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:34,219-Speed 6280.75 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 37 Global Step: 774880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:37,474-Speed 6293.32 
samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 37 Global Step: 774890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:40,735-Speed 6281.68 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 37 Global Step: 774900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:43,989-Speed 6295.52 samples/sec Loss 2.6539 LearningRate 0.0000 Epoch: 37 Global Step: 774910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:47,251-Speed 6280.12 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 774920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:50,506-Speed 6294.04 samples/sec Loss 2.6255 LearningRate 0.0000 Epoch: 37 Global Step: 774930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:53,764-Speed 6287.24 samples/sec Loss 2.6414 LearningRate 0.0000 Epoch: 37 Global Step: 774940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:14:57,016-Speed 6299.72 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 37 Global Step: 774950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:00,256-Speed 6321.70 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 37 Global Step: 774960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:03,515-Speed 6285.28 samples/sec Loss 2.5747 LearningRate 0.0000 Epoch: 37 Global Step: 774970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:06,777-Speed 6280.31 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 37 Global Step: 774980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:10,031-Speed 6294.46 samples/sec Loss 2.6117 LearningRate 0.0000 Epoch: 37 Global Step: 774990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:13,288-Speed 6291.01 samples/sec Loss 2.5802 LearningRate 0.0000 Epoch: 37 Global Step: 775000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:16,537-Speed 6303.41 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 
Global Step: 775010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:19,794-Speed 6290.62 samples/sec Loss 2.6751 LearningRate 0.0000 Epoch: 37 Global Step: 775020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:23,053-Speed 6285.15 samples/sec Loss 2.5655 LearningRate 0.0000 Epoch: 37 Global Step: 775030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:26,314-Speed 6281.89 samples/sec Loss 2.5638 LearningRate 0.0000 Epoch: 37 Global Step: 775040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:29,572-Speed 6286.89 samples/sec Loss 2.5898 LearningRate 0.0000 Epoch: 37 Global Step: 775050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:32,834-Speed 6280.51 samples/sec Loss 2.5983 LearningRate 0.0000 Epoch: 37 Global Step: 775060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:15:36,074-Speed 6321.72 samples/sec Loss 2.6438 LearningRate 0.0000 Epoch: 37 Global Step: 775070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:39,337-Speed 6278.17 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 37 Global Step: 775080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:42,596-Speed 6286.35 samples/sec Loss 2.6152 LearningRate 0.0000 Epoch: 37 Global Step: 775090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:45,846-Speed 6302.74 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 37 Global Step: 775100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:49,095-Speed 6304.57 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 37 Global Step: 775110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:52,357-Speed 6279.63 samples/sec Loss 2.5791 LearningRate 0.0000 Epoch: 37 Global Step: 775120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:15:55,612-Speed 6292.49 samples/sec Loss 2.5811 LearningRate 0.0000 Epoch: 37 Global Step: 775130 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:15:58,870-Speed 6288.31 samples/sec Loss 2.6468 LearningRate 0.0000 Epoch: 37 Global Step: 775140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:02,129-Speed 6284.11 samples/sec Loss 2.6286 LearningRate 0.0000 Epoch: 37 Global Step: 775150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:05,407-Speed 6249.86 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 37 Global Step: 775160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:08,652-Speed 6311.96 samples/sec Loss 2.5771 LearningRate 0.0000 Epoch: 37 Global Step: 775170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:11,904-Speed 6299.62 samples/sec Loss 2.6404 LearningRate 0.0000 Epoch: 37 Global Step: 775180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:15,174-Speed 6264.88 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 37 Global Step: 775190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:18,429-Speed 6293.42 samples/sec Loss 2.6219 LearningRate 0.0000 Epoch: 37 Global Step: 775200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:21,685-Speed 6290.63 samples/sec Loss 2.6229 LearningRate 0.0000 Epoch: 37 Global Step: 775210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:24,942-Speed 6289.31 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 37 Global Step: 775220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:28,195-Speed 6297.98 samples/sec Loss 2.6178 LearningRate 0.0000 Epoch: 37 Global Step: 775230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:31,466-Speed 6262.83 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 37 Global Step: 775240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:34,719-Speed 6297.63 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 37 Global Step: 775250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:37,977-Speed 6287.66 
samples/sec Loss 2.6247 LearningRate 0.0000 Epoch: 37 Global Step: 775260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:41,241-Speed 6274.48 samples/sec Loss 2.5890 LearningRate 0.0000 Epoch: 37 Global Step: 775270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:16:44,478-Speed 6328.18 samples/sec Loss 2.6002 LearningRate 0.0000 Epoch: 37 Global Step: 775280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:47,734-Speed 6290.93 samples/sec Loss 2.6592 LearningRate 0.0000 Epoch: 37 Global Step: 775290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:50,995-Speed 6282.17 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 37 Global Step: 775300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:54,255-Speed 6283.31 samples/sec Loss 2.6188 LearningRate 0.0000 Epoch: 37 Global Step: 775310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:16:57,513-Speed 6288.75 samples/sec Loss 2.6238 LearningRate 0.0000 Epoch: 37 Global Step: 775320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:00,775-Speed 6279.65 samples/sec Loss 2.5618 LearningRate 0.0000 Epoch: 37 Global Step: 775330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:04,035-Speed 6283.35 samples/sec Loss 2.6435 LearningRate 0.0000 Epoch: 37 Global Step: 775340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:07,292-Speed 6289.99 samples/sec Loss 2.6261 LearningRate 0.0000 Epoch: 37 Global Step: 775350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:10,555-Speed 6276.04 samples/sec Loss 2.6242 LearningRate 0.0000 Epoch: 37 Global Step: 775360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:13,810-Speed 6293.49 samples/sec Loss 2.6006 LearningRate 0.0000 Epoch: 37 Global Step: 775370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:17,052-Speed 6319.27 samples/sec Loss 2.5718 LearningRate 0.0000 Epoch: 37 
Global Step: 775380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:20,314-Speed 6279.95 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 37 Global Step: 775390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:23,571-Speed 6290.12 samples/sec Loss 2.5951 LearningRate 0.0000 Epoch: 37 Global Step: 775400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:26,829-Speed 6287.63 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 37 Global Step: 775410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:30,089-Speed 6282.93 samples/sec Loss 2.6344 LearningRate 0.0000 Epoch: 37 Global Step: 775420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:33,342-Speed 6297.68 samples/sec Loss 2.6271 LearningRate 0.0000 Epoch: 37 Global Step: 775430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:36,598-Speed 6291.98 samples/sec Loss 2.6369 LearningRate 0.0000 Epoch: 37 Global Step: 775440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:39,852-Speed 6294.38 samples/sec Loss 2.5916 LearningRate 0.0000 Epoch: 37 Global Step: 775450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:43,104-Speed 6298.52 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 37 Global Step: 775460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:46,362-Speed 6288.36 samples/sec Loss 2.6046 LearningRate 0.0000 Epoch: 37 Global Step: 775470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:49,603-Speed 6320.36 samples/sec Loss 2.5601 LearningRate 0.0000 Epoch: 37 Global Step: 775480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:52,867-Speed 6275.87 samples/sec Loss 2.6475 LearningRate 0.0000 Epoch: 37 Global Step: 775490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:17:56,135-Speed 6267.26 samples/sec Loss 2.6273 LearningRate 0.0000 Epoch: 37 Global Step: 775500 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:17:59,394-Speed 6285.41 samples/sec Loss 2.6381 LearningRate 0.0000 Epoch: 37 Global Step: 775510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:02,661-Speed 6270.01 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 37 Global Step: 775520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:05,917-Speed 6292.26 samples/sec Loss 2.6016 LearningRate 0.0000 Epoch: 37 Global Step: 775530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:09,181-Speed 6276.74 samples/sec Loss 2.6244 LearningRate 0.0000 Epoch: 37 Global Step: 775540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:12,441-Speed 6283.18 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 37 Global Step: 775550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:15,699-Speed 6287.51 samples/sec Loss 2.6342 LearningRate 0.0000 Epoch: 37 Global Step: 775560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:18,947-Speed 6305.36 samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 37 Global Step: 775570 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:22,197-Speed 6304.52 samples/sec Loss 2.6713 LearningRate 0.0000 Epoch: 37 Global Step: 775580 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:25,460-Speed 6276.93 samples/sec Loss 2.6264 LearningRate 0.0000 Epoch: 37 Global Step: 775590 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:28,717-Speed 6289.39 samples/sec Loss 2.5898 LearningRate 0.0000 Epoch: 37 Global Step: 775600 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:31,972-Speed 6294.69 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 37 Global Step: 775610 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:35,232-Speed 6282.91 samples/sec Loss 2.5897 LearningRate 0.0000 Epoch: 37 Global Step: 775620 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:38,489-Speed 6290.63 
samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 775630 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:41,742-Speed 6297.15 samples/sec Loss 2.6522 LearningRate 0.0000 Epoch: 37 Global Step: 775640 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:44,996-Speed 6293.42 samples/sec Loss 2.5866 LearningRate 0.0000 Epoch: 37 Global Step: 775650 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:48,254-Speed 6289.03 samples/sec Loss 2.5991 LearningRate 0.0000 Epoch: 37 Global Step: 775660 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:18:51,529-Speed 6253.48 samples/sec Loss 2.5913 LearningRate 0.0000 Epoch: 37 Global Step: 775670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:54,783-Speed 6296.25 samples/sec Loss 2.5507 LearningRate 0.0000 Epoch: 37 Global Step: 775680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:18:58,047-Speed 6275.09 samples/sec Loss 2.6350 LearningRate 0.0000 Epoch: 37 Global Step: 775690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:01,305-Speed 6288.14 samples/sec Loss 2.5846 LearningRate 0.0000 Epoch: 37 Global Step: 775700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:04,558-Speed 6296.30 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 37 Global Step: 775710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:07,819-Speed 6282.84 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 37 Global Step: 775720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:11,074-Speed 6292.45 samples/sec Loss 2.6341 LearningRate 0.0000 Epoch: 37 Global Step: 775730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:14,336-Speed 6280.76 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 775740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:17,599-Speed 6277.26 samples/sec Loss 2.6494 LearningRate 0.0000 Epoch: 37 
Global Step: 775750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:20,853-Speed 6294.97 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 775760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:24,095-Speed 6319.02 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 37 Global Step: 775770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:27,357-Speed 6279.35 samples/sec Loss 2.6397 LearningRate 0.0000 Epoch: 37 Global Step: 775780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:30,612-Speed 6293.97 samples/sec Loss 2.5985 LearningRate 0.0000 Epoch: 37 Global Step: 775790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:33,866-Speed 6294.95 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 37 Global Step: 775800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:37,157-Speed 6225.90 samples/sec Loss 2.6337 LearningRate 0.0000 Epoch: 37 Global Step: 775810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:40,434-Speed 6250.48 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 37 Global Step: 775820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:43,681-Speed 6308.22 samples/sec Loss 2.6385 LearningRate 0.0000 Epoch: 37 Global Step: 775830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:46,937-Speed 6292.07 samples/sec Loss 2.6044 LearningRate 0.0000 Epoch: 37 Global Step: 775840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:50,194-Speed 6287.81 samples/sec Loss 2.6160 LearningRate 0.0000 Epoch: 37 Global Step: 775850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:53,454-Speed 6283.92 samples/sec Loss 2.6052 LearningRate 0.0000 Epoch: 37 Global Step: 775860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:19:56,713-Speed 6284.94 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 37 Global Step: 775870 Fp16 Grad Scale: 8192 Required: 5 
hours Training: 2022-04-03 14:19:59,961-Speed 6307.29 samples/sec Loss 2.5822 LearningRate 0.0000 Epoch: 37 Global Step: 775880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:03,219-Speed 6288.26 samples/sec Loss 2.6237 LearningRate 0.0000 Epoch: 37 Global Step: 775890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:06,474-Speed 6292.29 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 37 Global Step: 775900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:09,733-Speed 6286.62 samples/sec Loss 2.5941 LearningRate 0.0000 Epoch: 37 Global Step: 775910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:12,990-Speed 6288.82 samples/sec Loss 2.6631 LearningRate 0.0000 Epoch: 37 Global Step: 775920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:16,243-Speed 6297.01 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 37 Global Step: 775930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:19,497-Speed 6295.01 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 37 Global Step: 775940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:22,774-Speed 6251.64 samples/sec Loss 2.5688 LearningRate 0.0000 Epoch: 37 Global Step: 775950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:26,034-Speed 6283.37 samples/sec Loss 2.6178 LearningRate 0.0000 Epoch: 37 Global Step: 775960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:29,286-Speed 6298.61 samples/sec Loss 2.6116 LearningRate 0.0000 Epoch: 37 Global Step: 775970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:32,532-Speed 6312.18 samples/sec Loss 2.6317 LearningRate 0.0000 Epoch: 37 Global Step: 775980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:35,785-Speed 6296.83 samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 37 Global Step: 775990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:39,034-Speed 6305.05 
samples/sec Loss 2.6228 LearningRate 0.0000 Epoch: 37 Global Step: 776000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:42,292-Speed 6288.06 samples/sec Loss 2.6121 LearningRate 0.0000 Epoch: 37 Global Step: 776010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:45,544-Speed 6299.36 samples/sec Loss 2.5386 LearningRate 0.0000 Epoch: 37 Global Step: 776020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:48,805-Speed 6280.88 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 776030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:52,063-Speed 6287.16 samples/sec Loss 2.5939 LearningRate 0.0000 Epoch: 37 Global Step: 776040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:55,325-Speed 6279.87 samples/sec Loss 2.6470 LearningRate 0.0000 Epoch: 37 Global Step: 776050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:20:58,576-Speed 6301.01 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 37 Global Step: 776060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:01,835-Speed 6286.01 samples/sec Loss 2.6188 LearningRate 0.0000 Epoch: 37 Global Step: 776070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:05,096-Speed 6281.92 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 37 Global Step: 776080 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:21:08,328-Speed 6337.90 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 37 Global Step: 776090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:11,585-Speed 6288.29 samples/sec Loss 2.6157 LearningRate 0.0000 Epoch: 37 Global Step: 776100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:14,848-Speed 6279.42 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 37 Global Step: 776110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:18,147-Speed 6208.90 samples/sec Loss 2.5452 LearningRate 0.0000 Epoch: 37 
Global Step: 776120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:21,410-Speed 6278.61 samples/sec Loss 2.6221 LearningRate 0.0000 Epoch: 37 Global Step: 776130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:24,669-Speed 6284.69 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 Global Step: 776140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:27,921-Speed 6298.92 samples/sec Loss 2.5939 LearningRate 0.0000 Epoch: 37 Global Step: 776150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:31,178-Speed 6290.30 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 37 Global Step: 776160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:34,431-Speed 6295.59 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 37 Global Step: 776170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:37,685-Speed 6296.40 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 37 Global Step: 776180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:40,932-Speed 6309.34 samples/sec Loss 2.6290 LearningRate 0.0000 Epoch: 37 Global Step: 776190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:44,194-Speed 6279.84 samples/sec Loss 2.5312 LearningRate 0.0000 Epoch: 37 Global Step: 776200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:47,446-Speed 6299.12 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 37 Global Step: 776210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:50,701-Speed 6293.27 samples/sec Loss 2.5870 LearningRate 0.0000 Epoch: 37 Global Step: 776220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:53,966-Speed 6274.34 samples/sec Loss 2.6390 LearningRate 0.0000 Epoch: 37 Global Step: 776230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:21:57,225-Speed 6284.34 samples/sec Loss 2.6042 LearningRate 0.0000 Epoch: 37 Global Step: 776240 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:22:00,483-Speed 6288.26 samples/sec Loss 2.5677 LearningRate 0.0000 Epoch: 37 Global Step: 776250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:03,739-Speed 6292.13 samples/sec Loss 2.6174 LearningRate 0.0000 Epoch: 37 Global Step: 776260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:06,995-Speed 6290.89 samples/sec Loss 2.6184 LearningRate 0.0000 Epoch: 37 Global Step: 776270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:10,256-Speed 6281.63 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 37 Global Step: 776280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:13,500-Speed 6314.72 samples/sec Loss 2.6078 LearningRate 0.0000 Epoch: 37 Global Step: 776290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:16,769-Speed 6264.68 samples/sec Loss 2.5820 LearningRate 0.0000 Epoch: 37 Global Step: 776300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:20,024-Speed 6293.99 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 37 Global Step: 776310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:23,285-Speed 6282.50 samples/sec Loss 2.5825 LearningRate 0.0000 Epoch: 37 Global Step: 776320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:26,540-Speed 6293.31 samples/sec Loss 2.5972 LearningRate 0.0000 Epoch: 37 Global Step: 776330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:29,808-Speed 6266.83 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 776340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:33,063-Speed 6293.98 samples/sec Loss 2.5873 LearningRate 0.0000 Epoch: 37 Global Step: 776350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:36,318-Speed 6292.72 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 Global Step: 776360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:39,572-Speed 6296.40 
samples/sec Loss 2.5994 LearningRate 0.0000 Epoch: 37 Global Step: 776370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:42,826-Speed 6295.90 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 37 Global Step: 776380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:46,064-Speed 6324.92 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 37 Global Step: 776390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:49,319-Speed 6293.89 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 37 Global Step: 776400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:52,577-Speed 6288.77 samples/sec Loss 2.6034 LearningRate 0.0000 Epoch: 37 Global Step: 776410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:55,830-Speed 6296.72 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 37 Global Step: 776420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:22:59,078-Speed 6305.45 samples/sec Loss 2.6355 LearningRate 0.0000 Epoch: 37 Global Step: 776430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:02,340-Speed 6279.79 samples/sec Loss 2.5611 LearningRate 0.0000 Epoch: 37 Global Step: 776440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:05,601-Speed 6282.66 samples/sec Loss 2.6026 LearningRate 0.0000 Epoch: 37 Global Step: 776450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:08,862-Speed 6280.78 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 37 Global Step: 776460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:12,119-Speed 6289.54 samples/sec Loss 2.6152 LearningRate 0.0000 Epoch: 37 Global Step: 776470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:15,441-Speed 6166.86 samples/sec Loss 2.5969 LearningRate 0.0000 Epoch: 37 Global Step: 776480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:18,688-Speed 6309.12 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 37 
Global Step: 776490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:21,943-Speed 6292.83 samples/sec Loss 2.6004 LearningRate 0.0000 Epoch: 37 Global Step: 776500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:25,205-Speed 6279.58 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 37 Global Step: 776510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:28,464-Speed 6286.02 samples/sec Loss 2.6098 LearningRate 0.0000 Epoch: 37 Global Step: 776520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:31,722-Speed 6287.33 samples/sec Loss 2.5788 LearningRate 0.0000 Epoch: 37 Global Step: 776530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:34,977-Speed 6292.59 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 37 Global Step: 776540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:38,240-Speed 6277.41 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 37 Global Step: 776550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:41,501-Speed 6283.12 samples/sec Loss 2.6244 LearningRate 0.0000 Epoch: 37 Global Step: 776560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:44,765-Speed 6275.22 samples/sec Loss 2.6518 LearningRate 0.0000 Epoch: 37 Global Step: 776570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:48,019-Speed 6295.36 samples/sec Loss 2.6108 LearningRate 0.0000 Epoch: 37 Global Step: 776580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:51,291-Speed 6261.09 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 37 Global Step: 776590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:23:54,532-Speed 6320.28 samples/sec Loss 2.6035 LearningRate 0.0000 Epoch: 37 Global Step: 776600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:23:57,791-Speed 6284.86 samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 37 Global Step: 776610 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:24:01,053-Speed 6280.43 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 37 Global Step: 776620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:04,308-Speed 6293.39 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 37 Global Step: 776630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:07,562-Speed 6295.67 samples/sec Loss 2.5854 LearningRate 0.0000 Epoch: 37 Global Step: 776640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:10,813-Speed 6299.96 samples/sec Loss 2.6322 LearningRate 0.0000 Epoch: 37 Global Step: 776650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:14,073-Speed 6284.77 samples/sec Loss 2.6146 LearningRate 0.0000 Epoch: 37 Global Step: 776660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:17,323-Speed 6301.86 samples/sec Loss 2.5844 LearningRate 0.0000 Epoch: 37 Global Step: 776670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:20,580-Speed 6289.52 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 37 Global Step: 776680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:23,839-Speed 6286.56 samples/sec Loss 2.5898 LearningRate 0.0000 Epoch: 37 Global Step: 776690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:27,076-Speed 6327.13 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 Global Step: 776700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:30,340-Speed 6275.45 samples/sec Loss 2.5975 LearningRate 0.0000 Epoch: 37 Global Step: 776710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:33,607-Speed 6270.94 samples/sec Loss 2.6082 LearningRate 0.0000 Epoch: 37 Global Step: 776720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:36,895-Speed 6230.61 samples/sec Loss 2.5782 LearningRate 0.0000 Epoch: 37 Global Step: 776730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:40,165-Speed 6262.71 
samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 37 Global Step: 776740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:43,423-Speed 6289.16 samples/sec Loss 2.6175 LearningRate 0.0000 Epoch: 37 Global Step: 776750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:46,682-Speed 6284.48 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 37 Global Step: 776760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:49,944-Speed 6280.43 samples/sec Loss 2.6025 LearningRate 0.0000 Epoch: 37 Global Step: 776770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:53,198-Speed 6296.48 samples/sec Loss 2.6026 LearningRate 0.0000 Epoch: 37 Global Step: 776780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:56,456-Speed 6287.52 samples/sec Loss 2.5988 LearningRate 0.0000 Epoch: 37 Global Step: 776790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:24:59,700-Speed 6313.24 samples/sec Loss 2.6311 LearningRate 0.0000 Epoch: 37 Global Step: 776800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:02,963-Speed 6279.15 samples/sec Loss 2.6039 LearningRate 0.0000 Epoch: 37 Global Step: 776810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:06,221-Speed 6286.47 samples/sec Loss 2.5495 LearningRate 0.0000 Epoch: 37 Global Step: 776820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:09,473-Speed 6300.10 samples/sec Loss 2.5724 LearningRate 0.0000 Epoch: 37 Global Step: 776830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:12,728-Speed 6292.43 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 37 Global Step: 776840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:15,993-Speed 6274.37 samples/sec Loss 2.5702 LearningRate 0.0000 Epoch: 37 Global Step: 776850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:19,255-Speed 6279.94 samples/sec Loss 2.6014 LearningRate 0.0000 Epoch: 37 
Global Step: 776860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:22,520-Speed 6274.06 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 37 Global Step: 776870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:25,775-Speed 6293.04 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 776880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:25:29,020-Speed 6313.13 samples/sec Loss 2.5816 LearningRate 0.0000 Epoch: 37 Global Step: 776890 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:32,274-Speed 6297.81 samples/sec Loss 2.6110 LearningRate 0.0000 Epoch: 37 Global Step: 776900 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:35,534-Speed 6284.17 samples/sec Loss 2.6011 LearningRate 0.0000 Epoch: 37 Global Step: 776910 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:38,803-Speed 6266.41 samples/sec Loss 2.5863 LearningRate 0.0000 Epoch: 37 Global Step: 776920 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:42,087-Speed 6238.42 samples/sec Loss 2.5779 LearningRate 0.0000 Epoch: 37 Global Step: 776930 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:45,341-Speed 6294.93 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 37 Global Step: 776940 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:48,611-Speed 6264.28 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 37 Global Step: 776950 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:51,867-Speed 6290.44 samples/sec Loss 2.6182 LearningRate 0.0000 Epoch: 37 Global Step: 776960 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:55,117-Speed 6304.68 samples/sec Loss 2.5758 LearningRate 0.0000 Epoch: 37 Global Step: 776970 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:25:58,371-Speed 6294.94 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 37 Global Step: 776980 Fp16 Grad Scale: 2048 Required: 5 
hours Training: 2022-04-03 14:26:01,629-Speed 6288.05 samples/sec Loss 2.6452 LearningRate 0.0000 Epoch: 37 Global Step: 776990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:04,893-Speed 6275.00 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 37 Global Step: 777000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:08,145-Speed 6298.82 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 777010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:11,416-Speed 6265.42 samples/sec Loss 2.5843 LearningRate 0.0000 Epoch: 37 Global Step: 777020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:14,678-Speed 6279.73 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 37 Global Step: 777030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:17,937-Speed 6286.48 samples/sec Loss 2.6207 LearningRate 0.0000 Epoch: 37 Global Step: 777040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:21,191-Speed 6294.95 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 37 Global Step: 777050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:24,452-Speed 6281.63 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 777060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:27,709-Speed 6289.64 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 37 Global Step: 777070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:30,973-Speed 6274.36 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 37 Global Step: 777080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:34,220-Speed 6308.85 samples/sec Loss 2.6218 LearningRate 0.0000 Epoch: 37 Global Step: 777090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:37,488-Speed 6269.13 samples/sec Loss 2.5652 LearningRate 0.0000 Epoch: 37 Global Step: 777100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:40,759-Speed 6262.76 
samples/sec Loss 2.5766 LearningRate 0.0000 Epoch: 37 Global Step: 777110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:44,022-Speed 6278.20 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 777120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:47,271-Speed 6303.43 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 777130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:50,529-Speed 6288.77 samples/sec Loss 2.5969 LearningRate 0.0000 Epoch: 37 Global Step: 777140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:53,786-Speed 6289.48 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 37 Global Step: 777150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:26:57,043-Speed 6287.72 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 37 Global Step: 777160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:00,306-Speed 6279.54 samples/sec Loss 2.6204 LearningRate 0.0000 Epoch: 37 Global Step: 777170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:03,575-Speed 6265.53 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 37 Global Step: 777180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:06,812-Speed 6328.90 samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 37 Global Step: 777190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:10,070-Speed 6287.64 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 37 Global Step: 777200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:13,321-Speed 6300.52 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 37 Global Step: 777210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:16,587-Speed 6273.35 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 37 Global Step: 777220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:19,843-Speed 6289.76 samples/sec Loss 2.6345 LearningRate 0.0000 Epoch: 37 
Global Step: 777230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:23,117-Speed 6258.42 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 37 Global Step: 777240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:26,369-Speed 6298.93 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 37 Global Step: 777250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:29,628-Speed 6284.08 samples/sec Loss 2.6369 LearningRate 0.0000 Epoch: 37 Global Step: 777260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:32,889-Speed 6282.29 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 37 Global Step: 777270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:36,149-Speed 6287.05 samples/sec Loss 2.6066 LearningRate 0.0000 Epoch: 37 Global Step: 777280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:39,410-Speed 6282.40 samples/sec Loss 2.6863 LearningRate 0.0000 Epoch: 37 Global Step: 777290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:27:42,643-Speed 6334.47 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 37 Global Step: 777300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:45,893-Speed 6303.23 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 37 Global Step: 777310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:49,142-Speed 6305.39 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 37 Global Step: 777320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:52,398-Speed 6292.17 samples/sec Loss 2.6415 LearningRate 0.0000 Epoch: 37 Global Step: 777330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:55,654-Speed 6290.21 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 37 Global Step: 777340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:27:58,911-Speed 6288.82 samples/sec Loss 2.6090 LearningRate 0.0000 Epoch: 37 Global Step: 777350 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:28:02,187-Speed 6254.35 samples/sec Loss 2.6080 LearningRate 0.0000 Epoch: 37 Global Step: 777360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:05,448-Speed 6281.30 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 37 Global Step: 777370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:08,707-Speed 6285.61 samples/sec Loss 2.5977 LearningRate 0.0000 Epoch: 37 Global Step: 777380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:11,966-Speed 6285.47 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 777390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:15,208-Speed 6320.15 samples/sec Loss 2.6009 LearningRate 0.0000 Epoch: 37 Global Step: 777400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:18,465-Speed 6289.46 samples/sec Loss 2.6023 LearningRate 0.0000 Epoch: 37 Global Step: 777410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:21,727-Speed 6279.03 samples/sec Loss 2.6254 LearningRate 0.0000 Epoch: 37 Global Step: 777420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:24,987-Speed 6284.42 samples/sec Loss 2.5761 LearningRate 0.0000 Epoch: 37 Global Step: 777430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:28,248-Speed 6280.49 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 37 Global Step: 777440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:31,503-Speed 6292.92 samples/sec Loss 2.5778 LearningRate 0.0000 Epoch: 37 Global Step: 777450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:34,756-Speed 6297.77 samples/sec Loss 2.5744 LearningRate 0.0000 Epoch: 37 Global Step: 777460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:38,009-Speed 6297.36 samples/sec Loss 2.5848 LearningRate 0.0000 Epoch: 37 Global Step: 777470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:41,264-Speed 6292.52 
samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 777480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:44,522-Speed 6287.64 samples/sec Loss 2.6059 LearningRate 0.0000 Epoch: 37 Global Step: 777490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:47,751-Speed 6343.75 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 777500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:51,010-Speed 6286.10 samples/sec Loss 2.5904 LearningRate 0.0000 Epoch: 37 Global Step: 777510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:54,261-Speed 6300.92 samples/sec Loss 2.6048 LearningRate 0.0000 Epoch: 37 Global Step: 777520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:28:57,516-Speed 6293.64 samples/sec Loss 2.6229 LearningRate 0.0000 Epoch: 37 Global Step: 777530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:00,774-Speed 6286.08 samples/sec Loss 2.6272 LearningRate 0.0000 Epoch: 37 Global Step: 777540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:04,039-Speed 6273.66 samples/sec Loss 2.5737 LearningRate 0.0000 Epoch: 37 Global Step: 777550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:07,294-Speed 6294.91 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 777560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:10,574-Speed 6244.82 samples/sec Loss 2.6370 LearningRate 0.0000 Epoch: 37 Global Step: 777570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:13,844-Speed 6265.73 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 37 Global Step: 777580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:17,099-Speed 6292.93 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 37 Global Step: 777590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:20,348-Speed 6305.01 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 
Global Step: 777600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:23,613-Speed 6273.40 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 37 Global Step: 777610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:26,874-Speed 6282.35 samples/sec Loss 2.6145 LearningRate 0.0000 Epoch: 37 Global Step: 777620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:30,130-Speed 6290.86 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 37 Global Step: 777630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:33,388-Speed 6287.30 samples/sec Loss 2.6602 LearningRate 0.0000 Epoch: 37 Global Step: 777640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:36,640-Speed 6300.14 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 37 Global Step: 777650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:39,894-Speed 6294.29 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 37 Global Step: 777660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:43,145-Speed 6301.51 samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 37 Global Step: 777670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:46,404-Speed 6284.48 samples/sec Loss 2.6010 LearningRate 0.0000 Epoch: 37 Global Step: 777680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:49,663-Speed 6287.15 samples/sec Loss 2.5677 LearningRate 0.0000 Epoch: 37 Global Step: 777690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:52,928-Speed 6273.82 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 37 Global Step: 777700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:29:56,178-Speed 6301.36 samples/sec Loss 2.5817 LearningRate 0.0000 Epoch: 37 Global Step: 777710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:29:59,439-Speed 6282.65 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 37 Global Step: 777720 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:30:02,709-Speed 6263.60 samples/sec Loss 2.6023 LearningRate 0.0000 Epoch: 37 Global Step: 777730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:05,964-Speed 6292.77 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 777740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:09,232-Speed 6270.20 samples/sec Loss 2.6398 LearningRate 0.0000 Epoch: 37 Global Step: 777750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:12,487-Speed 6292.44 samples/sec Loss 2.6670 LearningRate 0.0000 Epoch: 37 Global Step: 777760 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:15,747-Speed 6282.69 samples/sec Loss 2.5972 LearningRate 0.0000 Epoch: 37 Global Step: 777770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:19,002-Speed 6294.69 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 37 Global Step: 777780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:22,253-Speed 6300.68 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 37 Global Step: 777790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:25,516-Speed 6282.16 samples/sec Loss 2.5611 LearningRate 0.0000 Epoch: 37 Global Step: 777800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:28,781-Speed 6273.41 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 37 Global Step: 777810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:32,049-Speed 6268.01 samples/sec Loss 2.6559 LearningRate 0.0000 Epoch: 37 Global Step: 777820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:35,311-Speed 6279.51 samples/sec Loss 2.6205 LearningRate 0.0000 Epoch: 37 Global Step: 777830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:38,568-Speed 6291.15 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 37 Global Step: 777840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:41,826-Speed 6287.28 
samples/sec Loss 2.6640 LearningRate 0.0000 Epoch: 37 Global Step: 777850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:45,079-Speed 6296.46 samples/sec Loss 2.5446 LearningRate 0.0000 Epoch: 37 Global Step: 777860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:48,333-Speed 6295.32 samples/sec Loss 2.6128 LearningRate 0.0000 Epoch: 37 Global Step: 777870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:51,595-Speed 6279.91 samples/sec Loss 2.5768 LearningRate 0.0000 Epoch: 37 Global Step: 777880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:54,849-Speed 6295.94 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 37 Global Step: 777890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:30:58,106-Speed 6288.60 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 37 Global Step: 777900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:01,347-Speed 6319.75 samples/sec Loss 2.6291 LearningRate 0.0000 Epoch: 37 Global Step: 777910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:04,613-Speed 6272.68 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 37 Global Step: 777920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:07,876-Speed 6278.45 samples/sec Loss 2.5940 LearningRate 0.0000 Epoch: 37 Global Step: 777930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:11,135-Speed 6284.45 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 Global Step: 777940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:14,398-Speed 6278.80 samples/sec Loss 2.5600 LearningRate 0.0000 Epoch: 37 Global Step: 777950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:17,652-Speed 6294.41 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 37 Global Step: 777960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:20,907-Speed 6293.09 samples/sec Loss 2.5675 LearningRate 0.0000 Epoch: 37 
Global Step: 777970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:24,159-Speed 6300.31 samples/sec Loss 2.5893 LearningRate 0.0000 Epoch: 37 Global Step: 777980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:27,406-Speed 6307.99 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 37 Global Step: 777990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:30,666-Speed 6284.02 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 37 Global Step: 778000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:33,917-Speed 6300.91 samples/sec Loss 2.6197 LearningRate 0.0000 Epoch: 37 Global Step: 778010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:37,180-Speed 6277.89 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 37 Global Step: 778020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:40,436-Speed 6292.29 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 778030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:43,697-Speed 6279.74 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 37 Global Step: 778040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:46,953-Speed 6292.45 samples/sec Loss 2.5846 LearningRate 0.0000 Epoch: 37 Global Step: 778050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:50,214-Speed 6282.20 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 37 Global Step: 778060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:53,478-Speed 6275.12 samples/sec Loss 2.6352 LearningRate 0.0000 Epoch: 37 Global Step: 778070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:31:56,747-Speed 6266.69 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 Global Step: 778080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:00,009-Speed 6279.91 samples/sec Loss 2.6170 LearningRate 0.0000 Epoch: 37 Global Step: 778090 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:32:03,273-Speed 6276.42 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 778100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:06,535-Speed 6279.62 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 37 Global Step: 778110 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:32:09,780-Speed 6312.44 samples/sec Loss 2.6320 LearningRate 0.0000 Epoch: 37 Global Step: 778120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:13,044-Speed 6274.95 samples/sec Loss 2.5787 LearningRate 0.0000 Epoch: 37 Global Step: 778130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:16,330-Speed 6233.67 samples/sec Loss 2.6046 LearningRate 0.0000 Epoch: 37 Global Step: 778140 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:19,591-Speed 6283.00 samples/sec Loss 2.5803 LearningRate 0.0000 Epoch: 37 Global Step: 778150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:22,856-Speed 6273.61 samples/sec Loss 2.5604 LearningRate 0.0000 Epoch: 37 Global Step: 778160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:26,113-Speed 6289.39 samples/sec Loss 2.5771 LearningRate 0.0000 Epoch: 37 Global Step: 778170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:29,370-Speed 6290.59 samples/sec Loss 2.5741 LearningRate 0.0000 Epoch: 37 Global Step: 778180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:32,628-Speed 6287.59 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 37 Global Step: 778190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:35,884-Speed 6291.01 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 37 Global Step: 778200 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:39,144-Speed 6283.66 samples/sec Loss 2.5933 LearningRate 0.0000 Epoch: 37 Global Step: 778210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:42,384-Speed 6321.84 
samples/sec Loss 2.6166 LearningRate 0.0000 Epoch: 37 Global Step: 778220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:45,647-Speed 6278.61 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 778230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:48,908-Speed 6281.43 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 37 Global Step: 778240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:52,173-Speed 6273.10 samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 37 Global Step: 778250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:55,425-Speed 6299.99 samples/sec Loss 2.5737 LearningRate 0.0000 Epoch: 37 Global Step: 778260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:32:58,680-Speed 6292.87 samples/sec Loss 2.6233 LearningRate 0.0000 Epoch: 37 Global Step: 778270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:01,934-Speed 6294.52 samples/sec Loss 2.6225 LearningRate 0.0000 Epoch: 37 Global Step: 778280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:05,194-Speed 6285.32 samples/sec Loss 2.6212 LearningRate 0.0000 Epoch: 37 Global Step: 778290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:08,449-Speed 6291.71 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 37 Global Step: 778300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:11,733-Speed 6237.78 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 37 Global Step: 778310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:14,976-Speed 6316.13 samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 37 Global Step: 778320 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:18,234-Speed 6288.48 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 37 Global Step: 778330 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:21,485-Speed 6300.52 samples/sec Loss 2.6253 LearningRate 0.0000 Epoch: 37 
Global Step: 778340 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:24,749-Speed 6275.25 samples/sec Loss 2.6054 LearningRate 0.0000 Epoch: 37 Global Step: 778350 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:28,026-Speed 6252.64 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 778360 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:31,280-Speed 6294.44 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 37 Global Step: 778370 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:34,541-Speed 6282.33 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 778380 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:37,801-Speed 6283.70 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 37 Global Step: 778390 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:41,059-Speed 6288.30 samples/sec Loss 2.6268 LearningRate 0.0000 Epoch: 37 Global Step: 778400 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:44,319-Speed 6282.16 samples/sec Loss 2.6156 LearningRate 0.0000 Epoch: 37 Global Step: 778410 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:33:47,574-Speed 6293.59 samples/sec Loss 2.5811 LearningRate 0.0000 Epoch: 37 Global Step: 778420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:50,838-Speed 6276.81 samples/sec Loss 2.5877 LearningRate 0.0000 Epoch: 37 Global Step: 778430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:54,112-Speed 6255.35 samples/sec Loss 2.6696 LearningRate 0.0000 Epoch: 37 Global Step: 778440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:33:57,370-Speed 6287.79 samples/sec Loss 2.6056 LearningRate 0.0000 Epoch: 37 Global Step: 778450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:00,622-Speed 6299.25 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 37 Global Step: 778460 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:34:03,880-Speed 6289.11 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 37 Global Step: 778470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:07,136-Speed 6290.95 samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 37 Global Step: 778480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:10,391-Speed 6292.18 samples/sec Loss 2.6224 LearningRate 0.0000 Epoch: 37 Global Step: 778490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:13,662-Speed 6262.61 samples/sec Loss 2.6203 LearningRate 0.0000 Epoch: 37 Global Step: 778500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:16,935-Speed 6258.63 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 37 Global Step: 778510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:20,182-Speed 6309.29 samples/sec Loss 2.6235 LearningRate 0.0000 Epoch: 37 Global Step: 778520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:23,437-Speed 6292.86 samples/sec Loss 2.6183 LearningRate 0.0000 Epoch: 37 Global Step: 778530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:26,694-Speed 6289.79 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 37 Global Step: 778540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:29,943-Speed 6304.17 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 37 Global Step: 778550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:33,197-Speed 6295.92 samples/sec Loss 2.5965 LearningRate 0.0000 Epoch: 37 Global Step: 778560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:36,461-Speed 6275.84 samples/sec Loss 2.6099 LearningRate 0.0000 Epoch: 37 Global Step: 778570 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:39,737-Speed 6253.70 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 778580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:43,006-Speed 6266.93 
samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 37 Global Step: 778590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:46,269-Speed 6277.27 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 37 Global Step: 778600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:49,528-Speed 6286.29 samples/sec Loss 2.5993 LearningRate 0.0000 Epoch: 37 Global Step: 778610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:52,769-Speed 6319.76 samples/sec Loss 2.6513 LearningRate 0.0000 Epoch: 37 Global Step: 778620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:56,027-Speed 6288.05 samples/sec Loss 2.5924 LearningRate 0.0000 Epoch: 37 Global Step: 778630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:34:59,303-Speed 6253.08 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 37 Global Step: 778640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:02,555-Speed 6298.92 samples/sec Loss 2.5844 LearningRate 0.0000 Epoch: 37 Global Step: 778650 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:05,806-Speed 6300.35 samples/sec Loss 2.6383 LearningRate 0.0000 Epoch: 37 Global Step: 778660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:09,064-Speed 6287.92 samples/sec Loss 2.5640 LearningRate 0.0000 Epoch: 37 Global Step: 778670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:12,318-Speed 6294.00 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 37 Global Step: 778680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:15,582-Speed 6276.99 samples/sec Loss 2.6198 LearningRate 0.0000 Epoch: 37 Global Step: 778690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:18,842-Speed 6283.84 samples/sec Loss 2.5978 LearningRate 0.0000 Epoch: 37 Global Step: 778700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:22,117-Speed 6253.22 samples/sec Loss 2.5981 LearningRate 0.0000 Epoch: 37 
Global Step: 778710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:25,384-Speed 6270.84 samples/sec Loss 2.6221 LearningRate 0.0000 Epoch: 37 Global Step: 778720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:28,657-Speed 6259.51 samples/sec Loss 2.5705 LearningRate 0.0000 Epoch: 37 Global Step: 778730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:35:31,898-Speed 6318.66 samples/sec Loss 2.5496 LearningRate 0.0000 Epoch: 37 Global Step: 778740 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:35,158-Speed 6284.82 samples/sec Loss 2.6085 LearningRate 0.0000 Epoch: 37 Global Step: 778750 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:38,417-Speed 6286.07 samples/sec Loss 2.5828 LearningRate 0.0000 Epoch: 37 Global Step: 778760 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:41,682-Speed 6273.64 samples/sec Loss 2.5919 LearningRate 0.0000 Epoch: 37 Global Step: 778770 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:44,935-Speed 6296.85 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 37 Global Step: 778780 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:48,192-Speed 6290.66 samples/sec Loss 2.5839 LearningRate 0.0000 Epoch: 37 Global Step: 778790 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:51,452-Speed 6282.22 samples/sec Loss 2.6377 LearningRate 0.0000 Epoch: 37 Global Step: 778800 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:54,705-Speed 6297.22 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 778810 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:35:57,961-Speed 6291.74 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 37 Global Step: 778820 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:36:01,223-Speed 6280.44 samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 37 Global Step: 778830 Fp16 Grad Scale: 2048 Required: 5 
hours Training: 2022-04-03 14:36:04,485-Speed 6279.65 samples/sec Loss 2.6019 LearningRate 0.0000 Epoch: 37 Global Step: 778840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:07,736-Speed 6301.51 samples/sec Loss 2.6450 LearningRate 0.0000 Epoch: 37 Global Step: 778850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:10,991-Speed 6292.59 samples/sec Loss 2.6010 LearningRate 0.0000 Epoch: 37 Global Step: 778860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:14,244-Speed 6296.98 samples/sec Loss 2.6046 LearningRate 0.0000 Epoch: 37 Global Step: 778870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:17,496-Speed 6298.24 samples/sec Loss 2.6378 LearningRate 0.0000 Epoch: 37 Global Step: 778880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:20,750-Speed 6295.68 samples/sec Loss 2.5621 LearningRate 0.0000 Epoch: 37 Global Step: 778890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:24,016-Speed 6271.99 samples/sec Loss 2.5528 LearningRate 0.0000 Epoch: 37 Global Step: 778900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:27,283-Speed 6270.65 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 37 Global Step: 778910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:30,552-Speed 6265.96 samples/sec Loss 2.5723 LearningRate 0.0000 Epoch: 37 Global Step: 778920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:33,826-Speed 6257.12 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 37 Global Step: 778930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:37,066-Speed 6321.60 samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 37 Global Step: 778940 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:40,320-Speed 6295.56 samples/sec Loss 2.5593 LearningRate 0.0000 Epoch: 37 Global Step: 778950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:43,598-Speed 6249.89 
samples/sec Loss 2.6121 LearningRate 0.0000 Epoch: 37 Global Step: 778960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:46,854-Speed 6291.63 samples/sec Loss 2.6123 LearningRate 0.0000 Epoch: 37 Global Step: 778970 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:50,116-Speed 6279.92 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 37 Global Step: 778980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:53,372-Speed 6291.80 samples/sec Loss 2.5664 LearningRate 0.0000 Epoch: 37 Global Step: 778990 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:56,629-Speed 6288.61 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 37 Global Step: 779000 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:36:59,883-Speed 6295.82 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 37 Global Step: 779010 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:03,143-Speed 6283.30 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 37 Global Step: 779020 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:06,401-Speed 6287.78 samples/sec Loss 2.5704 LearningRate 0.0000 Epoch: 37 Global Step: 779030 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:09,649-Speed 6305.58 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 37 Global Step: 779040 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:12,908-Speed 6284.70 samples/sec Loss 2.6285 LearningRate 0.0000 Epoch: 37 Global Step: 779050 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:16,171-Speed 6278.71 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 37 Global Step: 779060 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:19,431-Speed 6284.65 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 37 Global Step: 779070 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:22,689-Speed 6286.21 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 37 
Global Step: 779080 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:25,942-Speed 6296.79 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 37 Global Step: 779090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:29,202-Speed 6283.59 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 37 Global Step: 779100 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:32,455-Speed 6297.75 samples/sec Loss 2.5994 LearningRate 0.0000 Epoch: 37 Global Step: 779110 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:35,706-Speed 6301.16 samples/sec Loss 2.6025 LearningRate 0.0000 Epoch: 37 Global Step: 779120 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:38,971-Speed 6275.07 samples/sec Loss 2.5606 LearningRate 0.0000 Epoch: 37 Global Step: 779130 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:42,226-Speed 6292.80 samples/sec Loss 2.6538 LearningRate 0.0000 Epoch: 37 Global Step: 779140 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:37:45,474-Speed 6306.75 samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 37 Global Step: 779150 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:48,729-Speed 6293.43 samples/sec Loss 2.5830 LearningRate 0.0000 Epoch: 37 Global Step: 779160 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:51,987-Speed 6288.51 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 37 Global Step: 779170 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:55,243-Speed 6290.13 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 779180 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:37:58,501-Speed 6288.63 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 37 Global Step: 779190 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:01,772-Speed 6261.98 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 37 Global Step: 779200 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:38:05,035-Speed 6277.83 samples/sec Loss 2.5837 LearningRate 0.0000 Epoch: 37 Global Step: 779210 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:08,288-Speed 6298.05 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 37 Global Step: 779220 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:11,544-Speed 6290.90 samples/sec Loss 2.6135 LearningRate 0.0000 Epoch: 37 Global Step: 779230 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:14,804-Speed 6283.32 samples/sec Loss 2.5897 LearningRate 0.0000 Epoch: 37 Global Step: 779240 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:18,046-Speed 6318.75 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 37 Global Step: 779250 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:21,299-Speed 6296.57 samples/sec Loss 2.6083 LearningRate 0.0000 Epoch: 37 Global Step: 779260 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:24,557-Speed 6288.54 samples/sec Loss 2.6103 LearningRate 0.0000 Epoch: 37 Global Step: 779270 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:27,810-Speed 6296.92 samples/sec Loss 2.6502 LearningRate 0.0000 Epoch: 37 Global Step: 779280 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:31,068-Speed 6286.69 samples/sec Loss 2.6294 LearningRate 0.0000 Epoch: 37 Global Step: 779290 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:34,328-Speed 6283.71 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 37 Global Step: 779300 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:37,593-Speed 6273.37 samples/sec Loss 2.5600 LearningRate 0.0000 Epoch: 37 Global Step: 779310 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:40,845-Speed 6300.00 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 37 Global Step: 779320 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:44,100-Speed 6292.67 
samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 37 Global Step: 779330 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:47,398-Speed 6210.83 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 37 Global Step: 779340 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:50,635-Speed 6327.79 samples/sec Loss 2.5852 LearningRate 0.0000 Epoch: 37 Global Step: 779350 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:53,887-Speed 6298.87 samples/sec Loss 2.5999 LearningRate 0.0000 Epoch: 37 Global Step: 779360 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:38:57,144-Speed 6291.24 samples/sec Loss 2.5912 LearningRate 0.0000 Epoch: 37 Global Step: 779370 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:00,402-Speed 6287.79 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 37 Global Step: 779380 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:03,660-Speed 6287.40 samples/sec Loss 2.6288 LearningRate 0.0000 Epoch: 37 Global Step: 779390 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:06,915-Speed 6292.38 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 37 Global Step: 779400 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:10,173-Speed 6287.63 samples/sec Loss 2.6018 LearningRate 0.0000 Epoch: 37 Global Step: 779410 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:13,431-Speed 6288.49 samples/sec Loss 2.5914 LearningRate 0.0000 Epoch: 37 Global Step: 779420 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:16,684-Speed 6295.86 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 37 Global Step: 779430 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:19,942-Speed 6288.12 samples/sec Loss 2.6025 LearningRate 0.0000 Epoch: 37 Global Step: 779440 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:23,181-Speed 6324.13 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 
Global Step: 779450 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:26,442-Speed 6282.48 samples/sec Loss 2.5802 LearningRate 0.0000 Epoch: 37 Global Step: 779460 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:29,701-Speed 6285.48 samples/sec Loss 2.5951 LearningRate 0.0000 Epoch: 37 Global Step: 779470 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:32,956-Speed 6293.22 samples/sec Loss 2.6227 LearningRate 0.0000 Epoch: 37 Global Step: 779480 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:36,208-Speed 6298.28 samples/sec Loss 2.6126 LearningRate 0.0000 Epoch: 37 Global Step: 779490 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:39,482-Speed 6257.83 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 37 Global Step: 779500 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:42,753-Speed 6261.87 samples/sec Loss 2.6083 LearningRate 0.0000 Epoch: 37 Global Step: 779510 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:46,012-Speed 6285.88 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 37 Global Step: 779520 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:49,265-Speed 6296.37 samples/sec Loss 2.5780 LearningRate 0.0000 Epoch: 37 Global Step: 779530 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:52,525-Speed 6283.30 samples/sec Loss 2.5992 LearningRate 0.0000 Epoch: 37 Global Step: 779540 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:55,770-Speed 6313.63 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 37 Global Step: 779550 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:39:59,030-Speed 6282.39 samples/sec Loss 2.5941 LearningRate 0.0000 Epoch: 37 Global Step: 779560 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:02,290-Speed 6284.27 samples/sec Loss 2.5844 LearningRate 0.0000 Epoch: 37 Global Step: 779570 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:40:05,546-Speed 6291.82 samples/sec Loss 2.6120 LearningRate 0.0000 Epoch: 37 Global Step: 779580 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:08,805-Speed 6285.02 samples/sec Loss 2.6045 LearningRate 0.0000 Epoch: 37 Global Step: 779590 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:12,066-Speed 6283.61 samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 37 Global Step: 779600 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:15,325-Speed 6285.13 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 37 Global Step: 779610 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:18,582-Speed 6289.39 samples/sec Loss 2.6127 LearningRate 0.0000 Epoch: 37 Global Step: 779620 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:21,839-Speed 6289.13 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 37 Global Step: 779630 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:25,101-Speed 6279.68 samples/sec Loss 2.6037 LearningRate 0.0000 Epoch: 37 Global Step: 779640 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:28,367-Speed 6272.34 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 37 Global Step: 779650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:40:31,601-Speed 6332.85 samples/sec Loss 2.5638 LearningRate 0.0000 Epoch: 37 Global Step: 779660 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:34,857-Speed 6292.10 samples/sec Loss 2.5709 LearningRate 0.0000 Epoch: 37 Global Step: 779670 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:38,108-Speed 6300.78 samples/sec Loss 2.6413 LearningRate 0.0000 Epoch: 37 Global Step: 779680 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:41,366-Speed 6287.28 samples/sec Loss 2.5726 LearningRate 0.0000 Epoch: 37 Global Step: 779690 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:44,619-Speed 6296.69 
samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 37 Global Step: 779700 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:47,882-Speed 6277.70 samples/sec Loss 2.6089 LearningRate 0.0000 Epoch: 37 Global Step: 779710 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:51,200-Speed 6173.97 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 37 Global Step: 779720 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:54,455-Speed 6294.01 samples/sec Loss 2.5329 LearningRate 0.0000 Epoch: 37 Global Step: 779730 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:40:57,711-Speed 6289.89 samples/sec Loss 2.6213 LearningRate 0.0000 Epoch: 37 Global Step: 779740 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:00,967-Speed 6292.33 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 37 Global Step: 779750 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:04,227-Speed 6282.74 samples/sec Loss 2.5965 LearningRate 0.0000 Epoch: 37 Global Step: 779760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:41:07,471-Speed 6315.93 samples/sec Loss 2.5952 LearningRate 0.0000 Epoch: 37 Global Step: 779770 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:10,727-Speed 6292.05 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 37 Global Step: 779780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:13,986-Speed 6286.24 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 37 Global Step: 779790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:17,244-Speed 6286.36 samples/sec Loss 2.6328 LearningRate 0.0000 Epoch: 37 Global Step: 779800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:20,493-Speed 6304.75 samples/sec Loss 2.5685 LearningRate 0.0000 Epoch: 37 Global Step: 779810 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:23,745-Speed 6299.59 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 37 
Global Step: 779820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:27,006-Speed 6280.76 samples/sec Loss 2.6075 LearningRate 0.0000 Epoch: 37 Global Step: 779830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:30,270-Speed 6276.54 samples/sec Loss 2.5519 LearningRate 0.0000 Epoch: 37 Global Step: 779840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:33,534-Speed 6276.61 samples/sec Loss 2.5425 LearningRate 0.0000 Epoch: 37 Global Step: 779850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:36,797-Speed 6276.83 samples/sec Loss 2.5757 LearningRate 0.0000 Epoch: 37 Global Step: 779860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:40,044-Speed 6308.76 samples/sec Loss 2.5901 LearningRate 0.0000 Epoch: 37 Global Step: 779870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:43,304-Speed 6284.81 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 779880 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:46,562-Speed 6286.15 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 37 Global Step: 779890 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:49,818-Speed 6291.34 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 37 Global Step: 779900 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:53,092-Speed 6260.59 samples/sec Loss 2.5800 LearningRate 0.0000 Epoch: 37 Global Step: 779910 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:56,351-Speed 6284.65 samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 37 Global Step: 779920 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:41:59,608-Speed 6289.64 samples/sec Loss 2.6086 LearningRate 0.0000 Epoch: 37 Global Step: 779930 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:42:02,866-Speed 6287.27 samples/sec Loss 2.6019 LearningRate 0.0000 Epoch: 37 Global Step: 779940 Fp16 Grad Scale: 4096 Required: 5 
hours Training: 2022-04-03 14:42:06,126-Speed 6285.17 samples/sec Loss 2.5993 LearningRate 0.0000 Epoch: 37 Global Step: 779950 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:42:09,375-Speed 6303.47 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 37 Global Step: 779960 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:42:12,645-Speed 6264.26 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 37 Global Step: 779970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-03 14:42:15,892-Speed 6310.01 samples/sec Loss 2.6330 LearningRate 0.0000 Epoch: 37 Global Step: 779980 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:42:19,133-Speed 6319.49 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 37 Global Step: 779990 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:22,389-Speed 6290.83 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 37 Global Step: 780000 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:25,660-Speed 6263.36 samples/sec Loss 2.5848 LearningRate 0.0000 Epoch: 37 Global Step: 780010 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:28,925-Speed 6274.33 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 37 Global Step: 780020 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:32,189-Speed 6276.77 samples/sec Loss 2.6588 LearningRate 0.0000 Epoch: 37 Global Step: 780030 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:35,444-Speed 6292.89 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 37 Global Step: 780040 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:38,731-Speed 6231.84 samples/sec Loss 2.6007 LearningRate 0.0000 Epoch: 37 Global Step: 780050 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:41,995-Speed 6276.89 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 780060 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:45,250-Speed 6292.67 
samples/sec Loss 2.5703 LearningRate 0.0000 Epoch: 37 Global Step: 780070 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:48,503-Speed 6296.57 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 37 Global Step: 780080 Fp16 Grad Scale: 2048 Required: 5 hours Training: 2022-04-03 14:42:51,769-Speed 6272.57 samples/sec Loss 2.6578 LearningRate 0.0000 Epoch: 37 Global Step: 780090 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-03 14:42:55,024-Speed 6293.51 samples/sec Loss 2.5953 LearningRate 0.0000 Epoch: 37 Global Step: 780100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:42:58,279-Speed 6293.25 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 37 Global Step: 780110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:01,533-Speed 6294.53 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 37 Global Step: 780120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:04,806-Speed 6258.62 samples/sec Loss 2.6314 LearningRate 0.0000 Epoch: 37 Global Step: 780130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:08,060-Speed 6295.53 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 37 Global Step: 780140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:11,315-Speed 6292.67 samples/sec Loss 2.6255 LearningRate 0.0000 Epoch: 37 Global Step: 780150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:14,584-Speed 6265.73 samples/sec Loss 2.6297 LearningRate 0.0000 Epoch: 37 Global Step: 780160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:17,841-Speed 6290.60 samples/sec Loss 2.5976 LearningRate 0.0000 Epoch: 37 Global Step: 780170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:21,101-Speed 6282.43 samples/sec Loss 2.5643 LearningRate 0.0000 Epoch: 37 Global Step: 780180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:24,342-Speed 6320.81 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 37 
Global Step: 780190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:27,605-Speed 6278.99 samples/sec Loss 2.6042 LearningRate 0.0000 Epoch: 37 Global Step: 780200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:30,862-Speed 6289.38 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 37 Global Step: 780210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:34,127-Speed 6274.61 samples/sec Loss 2.5759 LearningRate 0.0000 Epoch: 37 Global Step: 780220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:37,379-Speed 6298.01 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 37 Global Step: 780230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:40,647-Speed 6268.05 samples/sec Loss 2.6571 LearningRate 0.0000 Epoch: 37 Global Step: 780240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:43,900-Speed 6297.00 samples/sec Loss 2.5687 LearningRate 0.0000 Epoch: 37 Global Step: 780250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:47,162-Speed 6280.42 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 780260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:50,417-Speed 6292.50 samples/sec Loss 2.6126 LearningRate 0.0000 Epoch: 37 Global Step: 780270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:53,672-Speed 6293.85 samples/sec Loss 2.5768 LearningRate 0.0000 Epoch: 37 Global Step: 780280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:43:56,912-Speed 6322.32 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 37 Global Step: 780290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:00,167-Speed 6293.73 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 37 Global Step: 780300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:03,427-Speed 6283.87 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 37 Global Step: 780310 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:44:06,683-Speed 6291.23 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 37 Global Step: 780320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:09,938-Speed 6292.45 samples/sec Loss 2.6147 LearningRate 0.0000 Epoch: 37 Global Step: 780330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:13,191-Speed 6297.54 samples/sec Loss 2.5953 LearningRate 0.0000 Epoch: 37 Global Step: 780340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:16,454-Speed 6277.82 samples/sec Loss 2.6482 LearningRate 0.0000 Epoch: 37 Global Step: 780350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:19,709-Speed 6293.46 samples/sec Loss 2.5926 LearningRate 0.0000 Epoch: 37 Global Step: 780360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:22,971-Speed 6279.59 samples/sec Loss 2.6013 LearningRate 0.0000 Epoch: 37 Global Step: 780370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:26,230-Speed 6285.68 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 37 Global Step: 780380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:29,482-Speed 6299.67 samples/sec Loss 2.5888 LearningRate 0.0000 Epoch: 37 Global Step: 780390 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:44:32,744-Speed 6279.49 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 37 Global Step: 780400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:35,999-Speed 6294.08 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 37 Global Step: 780410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:39,262-Speed 6277.75 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 37 Global Step: 780420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:42,519-Speed 6288.96 samples/sec Loss 2.6007 LearningRate 0.0000 Epoch: 37 Global Step: 780430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:45,780-Speed 6281.33 
samples/sec Loss 2.5672 LearningRate 0.0000 Epoch: 37 Global Step: 780440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:49,042-Speed 6279.57 samples/sec Loss 2.6238 LearningRate 0.0000 Epoch: 37 Global Step: 780450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:52,308-Speed 6272.82 samples/sec Loss 2.5309 LearningRate 0.0000 Epoch: 37 Global Step: 780460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:55,570-Speed 6279.20 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 37 Global Step: 780470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:44:58,833-Speed 6278.33 samples/sec Loss 2.5859 LearningRate 0.0000 Epoch: 37 Global Step: 780480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:02,092-Speed 6286.21 samples/sec Loss 2.5977 LearningRate 0.0000 Epoch: 37 Global Step: 780490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:05,338-Speed 6309.40 samples/sec Loss 2.6279 LearningRate 0.0000 Epoch: 37 Global Step: 780500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:08,601-Speed 6279.02 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 37 Global Step: 780510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:11,861-Speed 6284.19 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 37 Global Step: 780520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:15,119-Speed 6287.43 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 37 Global Step: 780530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:18,356-Speed 6326.94 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 Global Step: 780540 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:21,609-Speed 6296.81 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 780550 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:24,882-Speed 6258.42 samples/sec Loss 2.6432 LearningRate 0.0000 Epoch: 37 
Global Step: 780560 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:28,141-Speed 6285.89 samples/sec Loss 2.6470 LearningRate 0.0000 Epoch: 37 Global Step: 780570 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:31,390-Speed 6305.68 samples/sec Loss 2.5845 LearningRate 0.0000 Epoch: 37 Global Step: 780580 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:34,643-Speed 6296.50 samples/sec Loss 2.5855 LearningRate 0.0000 Epoch: 37 Global Step: 780590 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:37,902-Speed 6285.90 samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 37 Global Step: 780600 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:41,156-Speed 6295.65 samples/sec Loss 2.6289 LearningRate 0.0000 Epoch: 37 Global Step: 780610 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:44,409-Speed 6297.78 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 780620 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:47,660-Speed 6300.65 samples/sec Loss 2.5999 LearningRate 0.0000 Epoch: 37 Global Step: 780630 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:45:50,916-Speed 6291.46 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 37 Global Step: 780640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:54,173-Speed 6289.42 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 780650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:45:57,433-Speed 6283.81 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 780660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:00,696-Speed 6278.00 samples/sec Loss 2.5966 LearningRate 0.0000 Epoch: 37 Global Step: 780670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:03,966-Speed 6263.55 samples/sec Loss 2.6095 LearningRate 0.0000 Epoch: 37 Global Step: 780680 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:46:07,242-Speed 6253.37 samples/sec Loss 2.6326 LearningRate 0.0000 Epoch: 37 Global Step: 780690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:10,513-Speed 6263.38 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 37 Global Step: 780700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:13,770-Speed 6288.57 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 37 Global Step: 780710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:17,027-Speed 6288.94 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 37 Global Step: 780720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:20,283-Speed 6292.21 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 37 Global Step: 780730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:23,540-Speed 6288.04 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 37 Global Step: 780740 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:46:26,785-Speed 6313.95 samples/sec Loss 2.5856 LearningRate 0.0000 Epoch: 37 Global Step: 780750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:30,038-Speed 6296.90 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 37 Global Step: 780760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:33,290-Speed 6298.33 samples/sec Loss 2.6075 LearningRate 0.0000 Epoch: 37 Global Step: 780770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:36,548-Speed 6288.67 samples/sec Loss 2.6176 LearningRate 0.0000 Epoch: 37 Global Step: 780780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:39,801-Speed 6296.33 samples/sec Loss 2.5858 LearningRate 0.0000 Epoch: 37 Global Step: 780790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:43,061-Speed 6283.25 samples/sec Loss 2.5669 LearningRate 0.0000 Epoch: 37 Global Step: 780800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:46,322-Speed 6283.56 
samples/sec Loss 2.6199 LearningRate 0.0000 Epoch: 37 Global Step: 780810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:49,583-Speed 6281.88 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 37 Global Step: 780820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:52,844-Speed 6280.94 samples/sec Loss 2.6084 LearningRate 0.0000 Epoch: 37 Global Step: 780830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:56,111-Speed 6270.86 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 37 Global Step: 780840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:46:59,362-Speed 6301.92 samples/sec Loss 2.6252 LearningRate 0.0000 Epoch: 37 Global Step: 780850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:02,633-Speed 6262.77 samples/sec Loss 2.5292 LearningRate 0.0000 Epoch: 37 Global Step: 780860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:05,889-Speed 6290.25 samples/sec Loss 2.6226 LearningRate 0.0000 Epoch: 37 Global Step: 780870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:09,146-Speed 6291.03 samples/sec Loss 2.5694 LearningRate 0.0000 Epoch: 37 Global Step: 780880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:12,406-Speed 6281.51 samples/sec Loss 2.5813 LearningRate 0.0000 Epoch: 37 Global Step: 780890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:15,670-Speed 6276.16 samples/sec Loss 2.5500 LearningRate 0.0000 Epoch: 37 Global Step: 780900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:18,934-Speed 6275.81 samples/sec Loss 2.6047 LearningRate 0.0000 Epoch: 37 Global Step: 780910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:22,189-Speed 6293.38 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 37 Global Step: 780920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:25,453-Speed 6276.60 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 37 
Global Step: 780930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:28,723-Speed 6264.81 samples/sec Loss 2.6346 LearningRate 0.0000 Epoch: 37 Global Step: 780940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:31,968-Speed 6312.35 samples/sec Loss 2.6061 LearningRate 0.0000 Epoch: 37 Global Step: 780950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:35,230-Speed 6278.76 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 37 Global Step: 780960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:38,490-Speed 6286.62 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 780970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:41,742-Speed 6298.70 samples/sec Loss 2.6127 LearningRate 0.0000 Epoch: 37 Global Step: 780980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:45,003-Speed 6283.08 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 37 Global Step: 780990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:48,262-Speed 6284.14 samples/sec Loss 2.5690 LearningRate 0.0000 Epoch: 37 Global Step: 781000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:51,542-Speed 6246.99 samples/sec Loss 2.6249 LearningRate 0.0000 Epoch: 37 Global Step: 781010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:54,801-Speed 6285.50 samples/sec Loss 2.6236 LearningRate 0.0000 Epoch: 37 Global Step: 781020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:47:58,065-Speed 6275.52 samples/sec Loss 2.6179 LearningRate 0.0000 Epoch: 37 Global Step: 781030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:01,329-Speed 6275.06 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 37 Global Step: 781040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:04,585-Speed 6291.29 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 37 Global Step: 781050 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:48:07,844-Speed 6286.33 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 37 Global Step: 781060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:11,105-Speed 6282.01 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 37 Global Step: 781070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:14,362-Speed 6288.09 samples/sec Loss 2.5343 LearningRate 0.0000 Epoch: 37 Global Step: 781080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:17,633-Speed 6263.26 samples/sec Loss 2.6208 LearningRate 0.0000 Epoch: 37 Global Step: 781090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:20,892-Speed 6285.88 samples/sec Loss 2.6352 LearningRate 0.0000 Epoch: 37 Global Step: 781100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:24,143-Speed 6300.58 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 37 Global Step: 781110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:27,405-Speed 6279.39 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 37 Global Step: 781120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:30,666-Speed 6282.76 samples/sec Loss 2.6175 LearningRate 0.0000 Epoch: 37 Global Step: 781130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:33,921-Speed 6291.67 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 37 Global Step: 781140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:37,189-Speed 6267.92 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 37 Global Step: 781150 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:48:40,429-Speed 6323.88 samples/sec Loss 2.5739 LearningRate 0.0000 Epoch: 37 Global Step: 781160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:43,688-Speed 6285.56 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 37 Global Step: 781170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:46,958-Speed 6264.92 
samples/sec Loss 2.6203 LearningRate 0.0000 Epoch: 37 Global Step: 781180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:50,214-Speed 6290.17 samples/sec Loss 2.5771 LearningRate 0.0000 Epoch: 37 Global Step: 781190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:53,481-Speed 6271.07 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 37 Global Step: 781200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:56,740-Speed 6286.73 samples/sec Loss 2.6149 LearningRate 0.0000 Epoch: 37 Global Step: 781210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:48:59,995-Speed 6292.70 samples/sec Loss 2.6373 LearningRate 0.0000 Epoch: 37 Global Step: 781220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:03,245-Speed 6301.86 samples/sec Loss 2.5941 LearningRate 0.0000 Epoch: 37 Global Step: 781230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:06,506-Speed 6282.04 samples/sec Loss 2.6220 LearningRate 0.0000 Epoch: 37 Global Step: 781240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:09,760-Speed 6295.19 samples/sec Loss 2.5981 LearningRate 0.0000 Epoch: 37 Global Step: 781250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:13,006-Speed 6311.25 samples/sec Loss 2.5870 LearningRate 0.0000 Epoch: 37 Global Step: 781260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:16,255-Speed 6305.42 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 37 Global Step: 781270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:19,506-Speed 6301.49 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 37 Global Step: 781280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:22,759-Speed 6296.11 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 37 Global Step: 781290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:26,016-Speed 6288.81 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 37 
Global Step: 781300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:29,276-Speed 6283.77 samples/sec Loss 2.5886 LearningRate 0.0000 Epoch: 37 Global Step: 781310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:32,527-Speed 6300.66 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 37 Global Step: 781320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:35,784-Speed 6290.53 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 37 Global Step: 781330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:39,039-Speed 6292.43 samples/sec Loss 2.5914 LearningRate 0.0000 Epoch: 37 Global Step: 781340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:42,303-Speed 6275.74 samples/sec Loss 2.5787 LearningRate 0.0000 Epoch: 37 Global Step: 781350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:45,549-Speed 6312.50 samples/sec Loss 2.6380 LearningRate 0.0000 Epoch: 37 Global Step: 781360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:48,808-Speed 6284.43 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 781370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:52,066-Speed 6288.19 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 37 Global Step: 781380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:55,315-Speed 6304.46 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 37 Global Step: 781390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:49:58,559-Speed 6314.20 samples/sec Loss 2.5756 LearningRate 0.0000 Epoch: 37 Global Step: 781400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:01,820-Speed 6282.39 samples/sec Loss 2.5612 LearningRate 0.0000 Epoch: 37 Global Step: 781410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:05,073-Speed 6298.04 samples/sec Loss 2.6065 LearningRate 0.0000 Epoch: 37 Global Step: 781420 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:50:08,334-Speed 6280.56 samples/sec Loss 2.5995 LearningRate 0.0000 Epoch: 37 Global Step: 781430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:11,590-Speed 6292.40 samples/sec Loss 2.6000 LearningRate 0.0000 Epoch: 37 Global Step: 781440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:14,848-Speed 6287.47 samples/sec Loss 2.6256 LearningRate 0.0000 Epoch: 37 Global Step: 781450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:18,111-Speed 6277.63 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 37 Global Step: 781460 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:50:21,352-Speed 6320.48 samples/sec Loss 2.5750 LearningRate 0.0000 Epoch: 37 Global Step: 781470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:24,608-Speed 6290.22 samples/sec Loss 2.6392 LearningRate 0.0000 Epoch: 37 Global Step: 781480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:27,872-Speed 6276.56 samples/sec Loss 2.6473 LearningRate 0.0000 Epoch: 37 Global Step: 781490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:31,128-Speed 6291.78 samples/sec Loss 2.5768 LearningRate 0.0000 Epoch: 37 Global Step: 781500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:34,398-Speed 6264.42 samples/sec Loss 2.6581 LearningRate 0.0000 Epoch: 37 Global Step: 781510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:37,652-Speed 6295.42 samples/sec Loss 2.6155 LearningRate 0.0000 Epoch: 37 Global Step: 781520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:40,913-Speed 6280.94 samples/sec Loss 2.6029 LearningRate 0.0000 Epoch: 37 Global Step: 781530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:44,170-Speed 6289.64 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 37 Global Step: 781540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:47,438-Speed 6268.47 
samples/sec Loss 2.6109 LearningRate 0.0000 Epoch: 37 Global Step: 781550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:50,701-Speed 6277.30 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 37 Global Step: 781560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:53,937-Speed 6329.51 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 37 Global Step: 781570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:50:57,198-Speed 6282.66 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 37 Global Step: 781580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:00,456-Speed 6287.78 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 37 Global Step: 781590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:03,708-Speed 6299.09 samples/sec Loss 2.6246 LearningRate 0.0000 Epoch: 37 Global Step: 781600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:06,969-Speed 6282.18 samples/sec Loss 2.6076 LearningRate 0.0000 Epoch: 37 Global Step: 781610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:10,228-Speed 6285.05 samples/sec Loss 2.5954 LearningRate 0.0000 Epoch: 37 Global Step: 781620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:13,483-Speed 6293.11 samples/sec Loss 2.4947 LearningRate 0.0000 Epoch: 37 Global Step: 781630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:16,739-Speed 6292.22 samples/sec Loss 2.5312 LearningRate 0.0000 Epoch: 37 Global Step: 781640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:20,007-Speed 6267.08 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 37 Global Step: 781650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:23,265-Speed 6287.64 samples/sec Loss 2.5866 LearningRate 0.0000 Epoch: 37 Global Step: 781660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:26,513-Speed 6307.52 samples/sec Loss 2.5284 LearningRate 0.0000 Epoch: 37 
Global Step: 781670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:29,763-Speed 6302.98 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 37 Global Step: 781680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:33,014-Speed 6299.64 samples/sec Loss 2.6048 LearningRate 0.0000 Epoch: 37 Global Step: 781690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:36,282-Speed 6268.35 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 37 Global Step: 781700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:39,545-Speed 6278.21 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 37 Global Step: 781710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:42,809-Speed 6275.48 samples/sec Loss 2.6242 LearningRate 0.0000 Epoch: 37 Global Step: 781720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:46,064-Speed 6294.25 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 37 Global Step: 781730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:49,317-Speed 6296.15 samples/sec Loss 2.5971 LearningRate 0.0000 Epoch: 37 Global Step: 781740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:52,571-Speed 6295.79 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 37 Global Step: 781750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:55,820-Speed 6304.81 samples/sec Loss 2.5723 LearningRate 0.0000 Epoch: 37 Global Step: 781760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:51:59,072-Speed 6300.18 samples/sec Loss 2.5966 LearningRate 0.0000 Epoch: 37 Global Step: 781770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:02,323-Speed 6300.66 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 37 Global Step: 781780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:05,599-Speed 6251.68 samples/sec Loss 2.6317 LearningRate 0.0000 Epoch: 37 Global Step: 781790 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:52:08,855-Speed 6293.19 samples/sec Loss 2.6047 LearningRate 0.0000 Epoch: 37 Global Step: 781800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:12,110-Speed 6292.01 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 37 Global Step: 781810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:15,372-Speed 6282.13 samples/sec Loss 2.6268 LearningRate 0.0000 Epoch: 37 Global Step: 781820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:18,627-Speed 6291.88 samples/sec Loss 2.5942 LearningRate 0.0000 Epoch: 37 Global Step: 781830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:21,881-Speed 6295.80 samples/sec Loss 2.5497 LearningRate 0.0000 Epoch: 37 Global Step: 781840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:25,138-Speed 6289.24 samples/sec Loss 2.5765 LearningRate 0.0000 Epoch: 37 Global Step: 781850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:28,391-Speed 6297.11 samples/sec Loss 2.5852 LearningRate 0.0000 Epoch: 37 Global Step: 781860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:31,651-Speed 6283.83 samples/sec Loss 2.6485 LearningRate 0.0000 Epoch: 37 Global Step: 781870 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:52:34,891-Speed 6322.15 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 37 Global Step: 781880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:38,147-Speed 6290.27 samples/sec Loss 2.6293 LearningRate 0.0000 Epoch: 37 Global Step: 781890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:41,403-Speed 6291.86 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 37 Global Step: 781900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:44,661-Speed 6288.34 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 37 Global Step: 781910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:47,912-Speed 6300.70 
samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 37 Global Step: 781920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:51,171-Speed 6286.24 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 37 Global Step: 781930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:54,432-Speed 6280.97 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 37 Global Step: 781940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:52:57,686-Speed 6295.04 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 37 Global Step: 781950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:00,942-Speed 6292.02 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 37 Global Step: 781960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:04,230-Speed 6229.03 samples/sec Loss 2.5487 LearningRate 0.0000 Epoch: 37 Global Step: 781970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:07,536-Speed 6196.85 samples/sec Loss 2.6031 LearningRate 0.0000 Epoch: 37 Global Step: 781980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:10,800-Speed 6274.99 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 37 Global Step: 781990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:14,056-Speed 6292.25 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 37 Global Step: 782000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:17,317-Speed 6281.47 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 37 Global Step: 782010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:20,579-Speed 6280.38 samples/sec Loss 2.5967 LearningRate 0.0000 Epoch: 37 Global Step: 782020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:23,837-Speed 6288.09 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 37 Global Step: 782030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:27,085-Speed 6306.09 samples/sec Loss 2.5664 LearningRate 0.0000 Epoch: 37 
Global Step: 782040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:30,339-Speed 6295.46 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 37 Global Step: 782050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:33,596-Speed 6290.60 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 37 Global Step: 782060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:36,863-Speed 6268.92 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 37 Global Step: 782070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:40,106-Speed 6317.17 samples/sec Loss 2.6303 LearningRate 0.0000 Epoch: 37 Global Step: 782080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:43,362-Speed 6290.73 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 37 Global Step: 782090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:46,625-Speed 6279.01 samples/sec Loss 2.6178 LearningRate 0.0000 Epoch: 37 Global Step: 782100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:49,879-Speed 6295.32 samples/sec Loss 2.5685 LearningRate 0.0000 Epoch: 37 Global Step: 782110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:53,146-Speed 6269.09 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 37 Global Step: 782120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:56,403-Speed 6288.70 samples/sec Loss 2.5967 LearningRate 0.0000 Epoch: 37 Global Step: 782130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:53:59,666-Speed 6277.72 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 37 Global Step: 782140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:02,928-Speed 6280.23 samples/sec Loss 2.6041 LearningRate 0.0000 Epoch: 37 Global Step: 782150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:06,183-Speed 6294.27 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 37 Global Step: 782160 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:54:09,440-Speed 6288.20 samples/sec Loss 2.5491 LearningRate 0.0000 Epoch: 37 Global Step: 782170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:12,693-Speed 6297.82 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 37 Global Step: 782180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:54:15,938-Speed 6313.51 samples/sec Loss 2.6063 LearningRate 0.0000 Epoch: 37 Global Step: 782190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:19,194-Speed 6290.70 samples/sec Loss 2.5854 LearningRate 0.0000 Epoch: 37 Global Step: 782200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:22,446-Speed 6300.58 samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 37 Global Step: 782210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:54:25,780-Speed 6143.73 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 37 Global Step: 782220 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:29,059-Speed 6246.83 samples/sec Loss 2.5628 LearningRate 0.0000 Epoch: 37 Global Step: 782230 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:32,318-Speed 6285.92 samples/sec Loss 2.5888 LearningRate 0.0000 Epoch: 37 Global Step: 782240 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:35,578-Speed 6283.73 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 37 Global Step: 782250 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:38,845-Speed 6270.37 samples/sec Loss 2.6205 LearningRate 0.0000 Epoch: 37 Global Step: 782260 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:42,105-Speed 6282.56 samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 37 Global Step: 782270 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:45,358-Speed 6297.70 samples/sec Loss 2.5588 LearningRate 0.0000 Epoch: 37 Global Step: 782280 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:48,617-Speed 6285.33 
samples/sec Loss 2.5972 LearningRate 0.0000 Epoch: 37 Global Step: 782290 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:51,882-Speed 6273.95 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 37 Global Step: 782300 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:55,135-Speed 6296.61 samples/sec Loss 2.6353 LearningRate 0.0000 Epoch: 37 Global Step: 782310 Fp16 Grad Scale: 2048 Required: 4 hours Training: 2022-04-03 14:54:58,389-Speed 6295.59 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 37 Global Step: 782320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:01,649-Speed 6283.43 samples/sec Loss 2.5932 LearningRate 0.0000 Epoch: 37 Global Step: 782330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:04,902-Speed 6298.13 samples/sec Loss 2.6263 LearningRate 0.0000 Epoch: 37 Global Step: 782340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:08,163-Speed 6280.66 samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 37 Global Step: 782350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:11,419-Speed 6292.57 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 37 Global Step: 782360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:14,674-Speed 6291.40 samples/sec Loss 2.6182 LearningRate 0.0000 Epoch: 37 Global Step: 782370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:17,919-Speed 6312.63 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 37 Global Step: 782380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:21,168-Speed 6306.66 samples/sec Loss 2.5718 LearningRate 0.0000 Epoch: 37 Global Step: 782390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:24,423-Speed 6291.94 samples/sec Loss 2.6038 LearningRate 0.0000 Epoch: 37 Global Step: 782400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:27,677-Speed 6295.61 samples/sec Loss 2.5853 LearningRate 0.0000 Epoch: 37 
Global Step: 782410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:30,920-Speed 6317.16 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 37 Global Step: 782420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:34,184-Speed 6275.40 samples/sec Loss 2.6014 LearningRate 0.0000 Epoch: 37 Global Step: 782430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:37,447-Speed 6278.16 samples/sec Loss 2.6275 LearningRate 0.0000 Epoch: 37 Global Step: 782440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:40,715-Speed 6269.30 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 37 Global Step: 782450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:43,967-Speed 6299.11 samples/sec Loss 2.5824 LearningRate 0.0000 Epoch: 37 Global Step: 782460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:47,217-Speed 6302.41 samples/sec Loss 2.5830 LearningRate 0.0000 Epoch: 37 Global Step: 782470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:50,483-Speed 6272.71 samples/sec Loss 2.5864 LearningRate 0.0000 Epoch: 37 Global Step: 782480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:53,736-Speed 6297.42 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 37 Global Step: 782490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:55:57,001-Speed 6275.12 samples/sec Loss 2.5711 LearningRate 0.0000 Epoch: 37 Global Step: 782500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:00,255-Speed 6294.93 samples/sec Loss 2.5006 LearningRate 0.0000 Epoch: 37 Global Step: 782510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:03,491-Speed 6330.16 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 37 Global Step: 782520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:06,758-Speed 6268.20 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 37 Global Step: 782530 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:56:10,015-Speed 6290.99 samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 37 Global Step: 782540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:13,269-Speed 6295.14 samples/sec Loss 2.5433 LearningRate 0.0000 Epoch: 37 Global Step: 782550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:16,521-Speed 6299.43 samples/sec Loss 2.6131 LearningRate 0.0000 Epoch: 37 Global Step: 782560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:19,782-Speed 6281.72 samples/sec Loss 2.6138 LearningRate 0.0000 Epoch: 37 Global Step: 782570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:23,100-Speed 6173.82 samples/sec Loss 2.6298 LearningRate 0.0000 Epoch: 37 Global Step: 782580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:26,471-Speed 6076.80 samples/sec Loss 2.5856 LearningRate 0.0000 Epoch: 37 Global Step: 782590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:29,734-Speed 6275.97 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 37 Global Step: 782600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:32,996-Speed 6282.09 samples/sec Loss 2.6140 LearningRate 0.0000 Epoch: 37 Global Step: 782610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:36,242-Speed 6310.39 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 37 Global Step: 782620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:39,504-Speed 6280.30 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 Global Step: 782630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:42,769-Speed 6273.04 samples/sec Loss 2.6070 LearningRate 0.0000 Epoch: 37 Global Step: 782640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:46,025-Speed 6292.25 samples/sec Loss 2.5484 LearningRate 0.0000 Epoch: 37 Global Step: 782650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:49,283-Speed 6287.32 
samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 37 Global Step: 782660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:52,545-Speed 6278.61 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 37 Global Step: 782670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:55,802-Speed 6291.14 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 37 Global Step: 782680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:56:59,058-Speed 6290.75 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 37 Global Step: 782690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:02,325-Speed 6268.85 samples/sec Loss 2.6105 LearningRate 0.0000 Epoch: 37 Global Step: 782700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:05,586-Speed 6282.21 samples/sec Loss 2.5914 LearningRate 0.0000 Epoch: 37 Global Step: 782710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:08,836-Speed 6303.95 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 37 Global Step: 782720 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:57:12,071-Speed 6331.20 samples/sec Loss 2.6023 LearningRate 0.0000 Epoch: 37 Global Step: 782730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:15,341-Speed 6263.69 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 37 Global Step: 782740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:18,602-Speed 6282.23 samples/sec Loss 2.6279 LearningRate 0.0000 Epoch: 37 Global Step: 782750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:21,866-Speed 6276.62 samples/sec Loss 2.5656 LearningRate 0.0000 Epoch: 37 Global Step: 782760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:25,122-Speed 6290.12 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 37 Global Step: 782770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:28,378-Speed 6291.78 samples/sec Loss 2.5586 LearningRate 0.0000 Epoch: 37 
Global Step: 782780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:31,633-Speed 6292.87 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 37 Global Step: 782790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:34,894-Speed 6282.37 samples/sec Loss 2.5881 LearningRate 0.0000 Epoch: 37 Global Step: 782800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:38,153-Speed 6285.34 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 37 Global Step: 782810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:41,415-Speed 6281.27 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 37 Global Step: 782820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:44,655-Speed 6321.95 samples/sec Loss 2.5904 LearningRate 0.0000 Epoch: 37 Global Step: 782830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:47,916-Speed 6280.76 samples/sec Loss 2.6045 LearningRate 0.0000 Epoch: 37 Global Step: 782840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:51,181-Speed 6275.07 samples/sec Loss 2.6106 LearningRate 0.0000 Epoch: 37 Global Step: 782850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:54,436-Speed 6293.94 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 37 Global Step: 782860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:57:57,687-Speed 6299.74 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 37 Global Step: 782870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:00,948-Speed 6282.59 samples/sec Loss 2.5966 LearningRate 0.0000 Epoch: 37 Global Step: 782880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:04,210-Speed 6278.50 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 37 Global Step: 782890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:07,473-Speed 6278.97 samples/sec Loss 2.6037 LearningRate 0.0000 Epoch: 37 Global Step: 782900 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 14:58:10,725-Speed 6298.60 samples/sec Loss 2.6061 LearningRate 0.0000 Epoch: 37 Global Step: 782910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:13,991-Speed 6271.84 samples/sec Loss 2.6448 LearningRate 0.0000 Epoch: 37 Global Step: 782920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:17,249-Speed 6287.60 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 37 Global Step: 782930 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 14:58:20,501-Speed 6299.64 samples/sec Loss 2.5713 LearningRate 0.0000 Epoch: 37 Global Step: 782940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:23,763-Speed 6278.71 samples/sec Loss 2.5959 LearningRate 0.0000 Epoch: 37 Global Step: 782950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:27,025-Speed 6279.64 samples/sec Loss 2.6018 LearningRate 0.0000 Epoch: 37 Global Step: 782960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:30,285-Speed 6283.99 samples/sec Loss 2.6113 LearningRate 0.0000 Epoch: 37 Global Step: 782970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:33,546-Speed 6282.34 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 37 Global Step: 782980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:36,801-Speed 6292.81 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 782990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:40,057-Speed 6291.41 samples/sec Loss 2.5898 LearningRate 0.0000 Epoch: 37 Global Step: 783000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:43,314-Speed 6290.12 samples/sec Loss 2.6177 LearningRate 0.0000 Epoch: 37 Global Step: 783010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:46,570-Speed 6291.32 samples/sec Loss 2.6159 LearningRate 0.0000 Epoch: 37 Global Step: 783020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:49,826-Speed 6290.48 
samples/sec Loss 2.6148 LearningRate 0.0000 Epoch: 37 Global Step: 783030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:53,068-Speed 6320.24 samples/sec Loss 2.5779 LearningRate 0.0000 Epoch: 37 Global Step: 783040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:56,323-Speed 6291.87 samples/sec Loss 2.5367 LearningRate 0.0000 Epoch: 37 Global Step: 783050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:58:59,578-Speed 6293.70 samples/sec Loss 2.5568 LearningRate 0.0000 Epoch: 37 Global Step: 783060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:02,837-Speed 6286.04 samples/sec Loss 2.5807 LearningRate 0.0000 Epoch: 37 Global Step: 783070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:06,091-Speed 6294.95 samples/sec Loss 2.5967 LearningRate 0.0000 Epoch: 37 Global Step: 783080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:09,346-Speed 6292.66 samples/sec Loss 2.5998 LearningRate 0.0000 Epoch: 37 Global Step: 783090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:12,610-Speed 6275.47 samples/sec Loss 2.6127 LearningRate 0.0000 Epoch: 37 Global Step: 783100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:15,865-Speed 6295.29 samples/sec Loss 2.5764 LearningRate 0.0000 Epoch: 37 Global Step: 783110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:19,121-Speed 6289.80 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 37 Global Step: 783120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:22,381-Speed 6283.98 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 37 Global Step: 783130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:25,619-Speed 6326.50 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 37 Global Step: 783140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:28,877-Speed 6287.98 samples/sec Loss 2.5383 LearningRate 0.0000 Epoch: 37 
Global Step: 783150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:32,132-Speed 6292.23 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 37 Global Step: 783160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:35,391-Speed 6285.63 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 37 Global Step: 783170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:38,655-Speed 6275.55 samples/sec Loss 2.5515 LearningRate 0.0000 Epoch: 37 Global Step: 783180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:41,912-Speed 6289.14 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 37 Global Step: 783190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:45,166-Speed 6296.41 samples/sec Loss 2.6085 LearningRate 0.0000 Epoch: 37 Global Step: 783200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:48,426-Speed 6282.71 samples/sec Loss 2.5109 LearningRate 0.0000 Epoch: 37 Global Step: 783210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:51,691-Speed 6276.14 samples/sec Loss 2.5869 LearningRate 0.0000 Epoch: 37 Global Step: 783220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:54,944-Speed 6297.10 samples/sec Loss 2.6093 LearningRate 0.0000 Epoch: 37 Global Step: 783230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 14:59:58,180-Speed 6329.16 samples/sec Loss 2.5839 LearningRate 0.0000 Epoch: 37 Global Step: 783240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:01,439-Speed 6285.39 samples/sec Loss 2.5913 LearningRate 0.0000 Epoch: 37 Global Step: 783250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:04,695-Speed 6292.81 samples/sec Loss 2.5515 LearningRate 0.0000 Epoch: 37 Global Step: 783260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:07,955-Speed 6283.63 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 37 Global Step: 783270 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:00:11,212-Speed 6289.44 samples/sec Loss 2.5961 LearningRate 0.0000 Epoch: 37 Global Step: 783280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:14,541-Speed 6152.84 samples/sec Loss 2.5760 LearningRate 0.0000 Epoch: 37 Global Step: 783290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:17,811-Speed 6263.30 samples/sec Loss 2.5782 LearningRate 0.0000 Epoch: 37 Global Step: 783300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:21,070-Speed 6285.89 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 783310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:24,328-Speed 6288.06 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 37 Global Step: 783320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:27,585-Speed 6289.24 samples/sec Loss 2.6171 LearningRate 0.0000 Epoch: 37 Global Step: 783330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:30,828-Speed 6315.82 samples/sec Loss 2.5841 LearningRate 0.0000 Epoch: 37 Global Step: 783340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:34,085-Speed 6289.29 samples/sec Loss 2.6115 LearningRate 0.0000 Epoch: 37 Global Step: 783350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:37,340-Speed 6293.29 samples/sec Loss 2.6326 LearningRate 0.0000 Epoch: 37 Global Step: 783360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:40,600-Speed 6283.88 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 37 Global Step: 783370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:43,848-Speed 6306.46 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 37 Global Step: 783380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:47,098-Speed 6302.80 samples/sec Loss 2.5819 LearningRate 0.0000 Epoch: 37 Global Step: 783390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:50,347-Speed 6305.87 
samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 37 Global Step: 783400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:53,599-Speed 6298.03 samples/sec Loss 2.6012 LearningRate 0.0000 Epoch: 37 Global Step: 783410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:00:56,857-Speed 6288.74 samples/sec Loss 2.5528 LearningRate 0.0000 Epoch: 37 Global Step: 783420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:00,113-Speed 6291.46 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 37 Global Step: 783430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:03,374-Speed 6282.33 samples/sec Loss 2.5918 LearningRate 0.0000 Epoch: 37 Global Step: 783440 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:01:06,614-Speed 6322.88 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 37 Global Step: 783450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:09,870-Speed 6291.07 samples/sec Loss 2.5481 LearningRate 0.0000 Epoch: 37 Global Step: 783460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:13,160-Speed 6225.12 samples/sec Loss 2.5892 LearningRate 0.0000 Epoch: 37 Global Step: 783470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:16,516-Speed 6104.95 samples/sec Loss 2.6029 LearningRate 0.0000 Epoch: 37 Global Step: 783480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:19,794-Speed 6247.92 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 37 Global Step: 783490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:23,050-Speed 6291.47 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 37 Global Step: 783500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:26,315-Speed 6274.75 samples/sec Loss 2.6371 LearningRate 0.0000 Epoch: 37 Global Step: 783510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:29,586-Speed 6262.78 samples/sec Loss 2.6008 LearningRate 0.0000 Epoch: 37 
Global Step: 783520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:32,837-Speed 6300.69 samples/sec Loss 2.5861 LearningRate 0.0000 Epoch: 37 Global Step: 783530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:36,097-Speed 6284.41 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 37 Global Step: 783540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:39,339-Speed 6318.11 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 37 Global Step: 783550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:42,598-Speed 6284.40 samples/sec Loss 2.5875 LearningRate 0.0000 Epoch: 37 Global Step: 783560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:45,850-Speed 6298.71 samples/sec Loss 2.6192 LearningRate 0.0000 Epoch: 37 Global Step: 783570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:49,106-Speed 6291.46 samples/sec Loss 2.5681 LearningRate 0.0000 Epoch: 37 Global Step: 783580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:52,372-Speed 6272.69 samples/sec Loss 2.6235 LearningRate 0.0000 Epoch: 37 Global Step: 783590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:55,636-Speed 6275.67 samples/sec Loss 2.6036 LearningRate 0.0000 Epoch: 37 Global Step: 783600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:01:58,894-Speed 6286.94 samples/sec Loss 2.5720 LearningRate 0.0000 Epoch: 37 Global Step: 783610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:02,156-Speed 6281.06 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 37 Global Step: 783620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:05,417-Speed 6281.81 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 37 Global Step: 783630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:08,673-Speed 6291.70 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 37 Global Step: 783640 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:02:11,933-Speed 6283.18 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 37 Global Step: 783650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:02:15,180-Speed 6309.03 samples/sec Loss 2.5921 LearningRate 0.0000 Epoch: 37 Global Step: 783660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:18,438-Speed 6287.50 samples/sec Loss 2.6122 LearningRate 0.0000 Epoch: 37 Global Step: 783670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:21,694-Speed 6292.14 samples/sec Loss 2.5846 LearningRate 0.0000 Epoch: 37 Global Step: 783680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:24,952-Speed 6285.62 samples/sec Loss 2.6082 LearningRate 0.0000 Epoch: 37 Global Step: 783690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:28,260-Speed 6193.75 samples/sec Loss 2.6028 LearningRate 0.0000 Epoch: 37 Global Step: 783700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:31,632-Speed 6074.90 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 37 Global Step: 783710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:34,920-Speed 6230.00 samples/sec Loss 2.5643 LearningRate 0.0000 Epoch: 37 Global Step: 783720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:38,284-Speed 6088.96 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 37 Global Step: 783730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:41,547-Speed 6278.23 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 783740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:44,813-Speed 6271.14 samples/sec Loss 2.5674 LearningRate 0.0000 Epoch: 37 Global Step: 783750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:48,053-Speed 6321.53 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 37 Global Step: 783760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:51,307-Speed 6296.83 
samples/sec Loss 2.6220 LearningRate 0.0000 Epoch: 37 Global Step: 783770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:54,627-Speed 6169.97 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 37 Global Step: 783780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:02:57,894-Speed 6268.60 samples/sec Loss 2.5981 LearningRate 0.0000 Epoch: 37 Global Step: 783790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:01,152-Speed 6289.39 samples/sec Loss 2.6172 LearningRate 0.0000 Epoch: 37 Global Step: 783800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:04,406-Speed 6293.59 samples/sec Loss 2.5743 LearningRate 0.0000 Epoch: 37 Global Step: 783810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:07,656-Speed 6303.85 samples/sec Loss 2.6226 LearningRate 0.0000 Epoch: 37 Global Step: 783820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:10,918-Speed 6279.60 samples/sec Loss 2.6253 LearningRate 0.0000 Epoch: 37 Global Step: 783830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:14,171-Speed 6297.65 samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 37 Global Step: 783840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:17,431-Speed 6283.33 samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 37 Global Step: 783850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:20,679-Speed 6307.93 samples/sec Loss 2.5837 LearningRate 0.0000 Epoch: 37 Global Step: 783860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:23,948-Speed 6266.03 samples/sec Loss 2.5519 LearningRate 0.0000 Epoch: 37 Global Step: 783870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:27,208-Speed 6282.28 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 37 Global Step: 783880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:30,462-Speed 6296.16 samples/sec Loss 2.6074 LearningRate 0.0000 Epoch: 37 
Global Step: 783890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:33,727-Speed 6273.12 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 37 Global Step: 783900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:36,987-Speed 6284.61 samples/sec Loss 2.5664 LearningRate 0.0000 Epoch: 37 Global Step: 783910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:40,241-Speed 6295.96 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 37 Global Step: 783920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:43,502-Speed 6280.69 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 37 Global Step: 783930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:46,761-Speed 6285.10 samples/sec Loss 2.6082 LearningRate 0.0000 Epoch: 37 Global Step: 783940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:50,032-Speed 6262.55 samples/sec Loss 2.5942 LearningRate 0.0000 Epoch: 37 Global Step: 783950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:53,274-Speed 6318.33 samples/sec Loss 2.5983 LearningRate 0.0000 Epoch: 37 Global Step: 783960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:56,525-Speed 6301.47 samples/sec Loss 2.6018 LearningRate 0.0000 Epoch: 37 Global Step: 783970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:03:59,784-Speed 6284.87 samples/sec Loss 2.5986 LearningRate 0.0000 Epoch: 37 Global Step: 783980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:03,037-Speed 6296.86 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 37 Global Step: 783990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:06,293-Speed 6291.94 samples/sec Loss 2.6234 LearningRate 0.0000 Epoch: 37 Global Step: 784000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:09,547-Speed 6296.15 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 37 Global Step: 784010 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:04:12,810-Speed 6277.71 samples/sec Loss 2.6037 LearningRate 0.0000 Epoch: 37 Global Step: 784020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:16,069-Speed 6285.46 samples/sec Loss 2.5808 LearningRate 0.0000 Epoch: 37 Global Step: 784030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:19,325-Speed 6291.89 samples/sec Loss 2.5726 LearningRate 0.0000 Epoch: 37 Global Step: 784040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:22,573-Speed 6306.26 samples/sec Loss 2.5971 LearningRate 0.0000 Epoch: 37 Global Step: 784050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:25,839-Speed 6272.39 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 37 Global Step: 784060 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:04:29,079-Speed 6323.08 samples/sec Loss 2.5583 LearningRate 0.0000 Epoch: 37 Global Step: 784070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:32,331-Speed 6297.47 samples/sec Loss 2.6334 LearningRate 0.0000 Epoch: 37 Global Step: 784080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:35,581-Speed 6304.25 samples/sec Loss 2.5322 LearningRate 0.0000 Epoch: 37 Global Step: 784090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:38,839-Speed 6286.77 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 37 Global Step: 784100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:42,095-Speed 6292.47 samples/sec Loss 2.5758 LearningRate 0.0000 Epoch: 37 Global Step: 784110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:45,352-Speed 6289.09 samples/sec Loss 2.6098 LearningRate 0.0000 Epoch: 37 Global Step: 784120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:48,613-Speed 6281.06 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 37 Global Step: 784130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:51,870-Speed 6289.94 
samples/sec Loss 2.6072 LearningRate 0.0000 Epoch: 37 Global Step: 784140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:55,121-Speed 6300.80 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 37 Global Step: 784150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:04:58,383-Speed 6278.48 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 37 Global Step: 784160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:01,628-Speed 6314.20 samples/sec Loss 2.6232 LearningRate 0.0000 Epoch: 37 Global Step: 784170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:04,878-Speed 6302.27 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 37 Global Step: 784180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:08,131-Speed 6296.26 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 37 Global Step: 784190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:11,390-Speed 6286.65 samples/sec Loss 2.5942 LearningRate 0.0000 Epoch: 37 Global Step: 784200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:14,651-Speed 6281.55 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 37 Global Step: 784210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:17,909-Speed 6286.81 samples/sec Loss 2.5633 LearningRate 0.0000 Epoch: 37 Global Step: 784220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:21,173-Speed 6277.15 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 37 Global Step: 784230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:24,430-Speed 6289.40 samples/sec Loss 2.5489 LearningRate 0.0000 Epoch: 37 Global Step: 784240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:27,682-Speed 6299.31 samples/sec Loss 2.5719 LearningRate 0.0000 Epoch: 37 Global Step: 784250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:30,942-Speed 6284.07 samples/sec Loss 2.5731 LearningRate 0.0000 Epoch: 37 
Global Step: 784260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:34,202-Speed 6283.27 samples/sec Loss 2.5906 LearningRate 0.0000 Epoch: 37 Global Step: 784270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:05:37,449-Speed 6308.95 samples/sec Loss 2.5784 LearningRate 0.0000 Epoch: 37 Global Step: 784280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:40,713-Speed 6275.18 samples/sec Loss 2.6705 LearningRate 0.0000 Epoch: 37 Global Step: 784290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:43,972-Speed 6285.50 samples/sec Loss 2.5867 LearningRate 0.0000 Epoch: 37 Global Step: 784300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:47,243-Speed 6262.16 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 37 Global Step: 784310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:50,500-Speed 6289.75 samples/sec Loss 2.6064 LearningRate 0.0000 Epoch: 37 Global Step: 784320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:53,746-Speed 6309.91 samples/sec Loss 2.5814 LearningRate 0.0000 Epoch: 37 Global Step: 784330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:05:57,003-Speed 6290.49 samples/sec Loss 2.6069 LearningRate 0.0000 Epoch: 37 Global Step: 784340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:00,259-Speed 6291.08 samples/sec Loss 2.5673 LearningRate 0.0000 Epoch: 37 Global Step: 784350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:03,574-Speed 6179.49 samples/sec Loss 2.5897 LearningRate 0.0000 Epoch: 37 Global Step: 784360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:06,838-Speed 6275.76 samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 37 Global Step: 784370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:10,077-Speed 6324.54 samples/sec Loss 2.5882 LearningRate 0.0000 Epoch: 37 Global Step: 784380 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:06:13,336-Speed 6284.98 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 37 Global Step: 784390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:16,708-Speed 6074.83 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 37 Global Step: 784400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:20,000-Speed 6223.02 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 37 Global Step: 784410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:23,254-Speed 6295.36 samples/sec Loss 2.5933 LearningRate 0.0000 Epoch: 37 Global Step: 784420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:26,506-Speed 6298.47 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 37 Global Step: 784430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:29,764-Speed 6288.80 samples/sec Loss 2.5877 LearningRate 0.0000 Epoch: 37 Global Step: 784440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:33,018-Speed 6296.23 samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 37 Global Step: 784450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:36,282-Speed 6274.12 samples/sec Loss 2.5561 LearningRate 0.0000 Epoch: 37 Global Step: 784460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:39,548-Speed 6272.41 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 37 Global Step: 784470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:42,792-Speed 6315.27 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 37 Global Step: 784480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:46,043-Speed 6300.19 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 37 Global Step: 784490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:49,320-Speed 6251.92 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 37 Global Step: 784500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:52,576-Speed 6291.66 
samples/sec Loss 2.5737 LearningRate 0.0000 Epoch: 37 Global Step: 784510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:55,833-Speed 6289.31 samples/sec Loss 2.6542 LearningRate 0.0000 Epoch: 37 Global Step: 784520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:06:59,093-Speed 6282.61 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 37 Global Step: 784530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:02,353-Speed 6282.91 samples/sec Loss 2.6194 LearningRate 0.0000 Epoch: 37 Global Step: 784540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:05,614-Speed 6283.66 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 37 Global Step: 784550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:08,871-Speed 6287.84 samples/sec Loss 2.5917 LearningRate 0.0000 Epoch: 37 Global Step: 784560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:12,132-Speed 6282.12 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 37 Global Step: 784570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:15,368-Speed 6330.27 samples/sec Loss 2.6189 LearningRate 0.0000 Epoch: 37 Global Step: 784580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:18,627-Speed 6285.32 samples/sec Loss 2.5424 LearningRate 0.0000 Epoch: 37 Global Step: 784590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:21,894-Speed 6269.65 samples/sec Loss 2.6499 LearningRate 0.0000 Epoch: 37 Global Step: 784600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:25,149-Speed 6295.17 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 37 Global Step: 784610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:28,410-Speed 6282.19 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 37 Global Step: 784620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:31,659-Speed 6304.91 samples/sec Loss 2.5949 LearningRate 0.0000 Epoch: 37 
Global Step: 784630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:34,914-Speed 6293.19 samples/sec Loss 2.5523 LearningRate 0.0000 Epoch: 37 Global Step: 784640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:38,173-Speed 6284.37 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 37 Global Step: 784650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:41,428-Speed 6294.18 samples/sec Loss 2.5918 LearningRate 0.0000 Epoch: 37 Global Step: 784660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:44,686-Speed 6286.72 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 37 Global Step: 784670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:47,930-Speed 6315.70 samples/sec Loss 2.6037 LearningRate 0.0000 Epoch: 37 Global Step: 784680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:51,190-Speed 6282.72 samples/sec Loss 2.6026 LearningRate 0.0000 Epoch: 37 Global Step: 784690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:54,444-Speed 6294.92 samples/sec Loss 2.5783 LearningRate 0.0000 Epoch: 37 Global Step: 784700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:07:57,701-Speed 6289.76 samples/sec Loss 2.5669 LearningRate 0.0000 Epoch: 37 Global Step: 784710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:00,959-Speed 6287.53 samples/sec Loss 2.5519 LearningRate 0.0000 Epoch: 37 Global Step: 784720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:04,212-Speed 6297.54 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 37 Global Step: 784730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:07,477-Speed 6273.24 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 37 Global Step: 784740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:10,739-Speed 6279.29 samples/sec Loss 2.6146 LearningRate 0.0000 Epoch: 37 Global Step: 784750 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:08:13,997-Speed 6288.67 samples/sec Loss 2.6128 LearningRate 0.0000 Epoch: 37 Global Step: 784760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:17,256-Speed 6284.68 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 37 Global Step: 784770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:20,505-Speed 6306.24 samples/sec Loss 2.5470 LearningRate 0.0000 Epoch: 37 Global Step: 784780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:23,763-Speed 6285.54 samples/sec Loss 2.5539 LearningRate 0.0000 Epoch: 37 Global Step: 784790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:27,022-Speed 6286.78 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 37 Global Step: 784800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:30,278-Speed 6291.65 samples/sec Loss 2.5550 LearningRate 0.0000 Epoch: 37 Global Step: 784810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:33,542-Speed 6274.68 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 37 Global Step: 784820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:36,808-Speed 6273.31 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 37 Global Step: 784830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:40,066-Speed 6286.63 samples/sec Loss 2.5892 LearningRate 0.0000 Epoch: 37 Global Step: 784840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:43,320-Speed 6296.30 samples/sec Loss 2.5483 LearningRate 0.0000 Epoch: 37 Global Step: 784850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:46,574-Speed 6294.01 samples/sec Loss 2.5295 LearningRate 0.0000 Epoch: 37 Global Step: 784860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:49,833-Speed 6287.60 samples/sec Loss 2.5808 LearningRate 0.0000 Epoch: 37 Global Step: 784870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:53,088-Speed 6292.79 
samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 37 Global Step: 784880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:56,341-Speed 6296.32 samples/sec Loss 2.5839 LearningRate 0.0000 Epoch: 37 Global Step: 784890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:08:59,601-Speed 6282.77 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 37 Global Step: 784900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:02,870-Speed 6268.01 samples/sec Loss 2.5663 LearningRate 0.0000 Epoch: 37 Global Step: 784910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:06,123-Speed 6295.91 samples/sec Loss 2.5692 LearningRate 0.0000 Epoch: 37 Global Step: 784920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:09,374-Speed 6301.32 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 37 Global Step: 784930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:12,632-Speed 6287.24 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 37 Global Step: 784940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:15,888-Speed 6292.58 samples/sec Loss 2.5702 LearningRate 0.0000 Epoch: 37 Global Step: 784950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:19,143-Speed 6291.75 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 37 Global Step: 784960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:22,403-Speed 6285.09 samples/sec Loss 2.6160 LearningRate 0.0000 Epoch: 37 Global Step: 784970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:25,643-Speed 6321.42 samples/sec Loss 2.6148 LearningRate 0.0000 Epoch: 37 Global Step: 784980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:28,898-Speed 6293.15 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 37 Global Step: 784990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:32,150-Speed 6298.80 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 37 
Global Step: 785000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:35,400-Speed 6303.61 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 37 Global Step: 785010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:38,649-Speed 6304.22 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 37 Global Step: 785020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:41,898-Speed 6304.58 samples/sec Loss 2.5298 LearningRate 0.0000 Epoch: 37 Global Step: 785030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:45,157-Speed 6286.97 samples/sec Loss 2.5265 LearningRate 0.0000 Epoch: 37 Global Step: 785040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:48,409-Speed 6299.30 samples/sec Loss 2.5220 LearningRate 0.0000 Epoch: 37 Global Step: 785050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:51,676-Speed 6270.66 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 37 Global Step: 785060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:54,942-Speed 6271.09 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 37 Global Step: 785070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:09:58,194-Speed 6300.07 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 37 Global Step: 785080 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:10:01,442-Speed 6307.45 samples/sec Loss 2.6069 LearningRate 0.0000 Epoch: 37 Global Step: 785090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:04,703-Speed 6281.62 samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 37 Global Step: 785100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:07,975-Speed 6259.52 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 37 Global Step: 785110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:11,256-Speed 6243.16 samples/sec Loss 2.5902 LearningRate 0.0000 Epoch: 37 Global Step: 785120 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:10:14,521-Speed 6273.64 samples/sec Loss 2.6299 LearningRate 0.0000 Epoch: 37 Global Step: 785130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:17,784-Speed 6279.29 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 37 Global Step: 785140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:21,045-Speed 6281.36 samples/sec Loss 2.5797 LearningRate 0.0000 Epoch: 37 Global Step: 785150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:24,299-Speed 6294.63 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 37 Global Step: 785160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:27,556-Speed 6289.78 samples/sec Loss 2.5983 LearningRate 0.0000 Epoch: 37 Global Step: 785170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:30,807-Speed 6299.51 samples/sec Loss 2.5781 LearningRate 0.0000 Epoch: 37 Global Step: 785180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:34,064-Speed 6290.95 samples/sec Loss 2.6085 LearningRate 0.0000 Epoch: 37 Global Step: 785190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:37,315-Speed 6299.09 samples/sec Loss 2.6020 LearningRate 0.0000 Epoch: 37 Global Step: 785200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:40,577-Speed 6281.16 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 37 Global Step: 785210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:43,832-Speed 6292.78 samples/sec Loss 2.5837 LearningRate 0.0000 Epoch: 37 Global Step: 785220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:47,081-Speed 6304.75 samples/sec Loss 2.5912 LearningRate 0.0000 Epoch: 37 Global Step: 785230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:50,337-Speed 6290.82 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 37 Global Step: 785240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:53,592-Speed 6293.85 
samples/sec Loss 2.5497 LearningRate 0.0000 Epoch: 37 Global Step: 785250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:10:56,848-Speed 6291.66 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 37 Global Step: 785260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:00,106-Speed 6287.85 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 37 Global Step: 785270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:03,364-Speed 6288.67 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 37 Global Step: 785280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:06,604-Speed 6320.80 samples/sec Loss 2.5822 LearningRate 0.0000 Epoch: 37 Global Step: 785290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:09,860-Speed 6291.44 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 37 Global Step: 785300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:13,123-Speed 6278.55 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 37 Global Step: 785310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:16,383-Speed 6283.15 samples/sec Loss 2.5884 LearningRate 0.0000 Epoch: 37 Global Step: 785320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:19,644-Speed 6282.20 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 37 Global Step: 785330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:22,919-Speed 6254.95 samples/sec Loss 2.6024 LearningRate 0.0000 Epoch: 37 Global Step: 785340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:26,177-Speed 6287.54 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 37 Global Step: 785350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:29,434-Speed 6288.24 samples/sec Loss 2.5779 LearningRate 0.0000 Epoch: 37 Global Step: 785360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:32,697-Speed 6278.31 samples/sec Loss 2.5747 LearningRate 0.0000 Epoch: 37 
Global Step: 785370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:35,972-Speed 6255.23 samples/sec Loss 2.6090 LearningRate 0.0000 Epoch: 37 Global Step: 785380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:39,221-Speed 6303.86 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 37 Global Step: 785390 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:11:42,470-Speed 6305.22 samples/sec Loss 2.5469 LearningRate 0.0000 Epoch: 37 Global Step: 785400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:45,734-Speed 6276.44 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 37 Global Step: 785410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:48,999-Speed 6275.06 samples/sec Loss 2.5758 LearningRate 0.0000 Epoch: 37 Global Step: 785420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:52,257-Speed 6286.15 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 37 Global Step: 785430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:55,513-Speed 6291.19 samples/sec Loss 2.6396 LearningRate 0.0000 Epoch: 37 Global Step: 785440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:11:58,771-Speed 6289.34 samples/sec Loss 2.5777 LearningRate 0.0000 Epoch: 37 Global Step: 785450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:02,032-Speed 6281.35 samples/sec Loss 2.6002 LearningRate 0.0000 Epoch: 37 Global Step: 785460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:05,298-Speed 6273.08 samples/sec Loss 2.5976 LearningRate 0.0000 Epoch: 37 Global Step: 785470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:08,554-Speed 6291.65 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 37 Global Step: 785480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:11,806-Speed 6298.54 samples/sec Loss 2.5098 LearningRate 0.0000 Epoch: 37 Global Step: 785490 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:12:15,053-Speed 6307.78 samples/sec Loss 2.5610 LearningRate 0.0000 Epoch: 37 Global Step: 785500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:18,310-Speed 6289.36 samples/sec Loss 2.6031 LearningRate 0.0000 Epoch: 37 Global Step: 785510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:21,573-Speed 6277.51 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 Global Step: 785520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:24,823-Speed 6302.58 samples/sec Loss 2.6146 LearningRate 0.0000 Epoch: 37 Global Step: 785530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:28,076-Speed 6298.61 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 37 Global Step: 785540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:31,342-Speed 6272.12 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 37 Global Step: 785550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:34,600-Speed 6286.77 samples/sec Loss 2.6378 LearningRate 0.0000 Epoch: 37 Global Step: 785560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:37,858-Speed 6288.19 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 37 Global Step: 785570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:41,115-Speed 6289.18 samples/sec Loss 2.6014 LearningRate 0.0000 Epoch: 37 Global Step: 785580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:44,376-Speed 6281.24 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 37 Global Step: 785590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:47,619-Speed 6316.94 samples/sec Loss 2.5641 LearningRate 0.0000 Epoch: 37 Global Step: 785600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:50,874-Speed 6292.23 samples/sec Loss 2.5956 LearningRate 0.0000 Epoch: 37 Global Step: 785610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:54,128-Speed 6294.54 
samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 37 Global Step: 785620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:12:57,384-Speed 6292.97 samples/sec Loss 2.5797 LearningRate 0.0000 Epoch: 37 Global Step: 785630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:00,639-Speed 6291.58 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 37 Global Step: 785640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:03,890-Speed 6301.27 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 37 Global Step: 785650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:07,149-Speed 6285.73 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 37 Global Step: 785660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:10,405-Speed 6292.96 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 37 Global Step: 785670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:13,659-Speed 6295.65 samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 37 Global Step: 785680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:16,915-Speed 6290.91 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 37 Global Step: 785690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:20,157-Speed 6317.66 samples/sec Loss 2.6211 LearningRate 0.0000 Epoch: 37 Global Step: 785700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:23,413-Speed 6290.80 samples/sec Loss 2.5821 LearningRate 0.0000 Epoch: 37 Global Step: 785710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:26,674-Speed 6282.63 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 37 Global Step: 785720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:29,932-Speed 6287.24 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 37 Global Step: 785730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:33,182-Speed 6303.60 samples/sec Loss 2.5966 LearningRate 0.0000 Epoch: 37 
Global Step: 785740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:36,437-Speed 6292.55 samples/sec Loss 2.5837 LearningRate 0.0000 Epoch: 37 Global Step: 785750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:39,696-Speed 6285.29 samples/sec Loss 2.5936 LearningRate 0.0000 Epoch: 37 Global Step: 785760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:42,952-Speed 6292.54 samples/sec Loss 2.6020 LearningRate 0.0000 Epoch: 37 Global Step: 785770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:46,212-Speed 6281.75 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 37 Global Step: 785780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:49,470-Speed 6288.34 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 37 Global Step: 785790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:52,734-Speed 6277.28 samples/sec Loss 2.6196 LearningRate 0.0000 Epoch: 37 Global Step: 785800 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:13:55,978-Speed 6315.13 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 37 Global Step: 785810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:13:59,229-Speed 6300.47 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 37 Global Step: 785820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:02,484-Speed 6293.71 samples/sec Loss 2.5808 LearningRate 0.0000 Epoch: 37 Global Step: 785830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:05,733-Speed 6305.48 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 37 Global Step: 785840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:09,003-Speed 6264.14 samples/sec Loss 2.5779 LearningRate 0.0000 Epoch: 37 Global Step: 785850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:12,261-Speed 6288.68 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 37 Global Step: 785860 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:14:15,517-Speed 6290.47 samples/sec Loss 2.6065 LearningRate 0.0000 Epoch: 37 Global Step: 785870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:18,768-Speed 6300.57 samples/sec Loss 2.5383 LearningRate 0.0000 Epoch: 37 Global Step: 785880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:22,019-Speed 6301.27 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 37 Global Step: 785890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:25,279-Speed 6283.60 samples/sec Loss 2.5877 LearningRate 0.0000 Epoch: 37 Global Step: 785900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:28,525-Speed 6312.54 samples/sec Loss 2.6085 LearningRate 0.0000 Epoch: 37 Global Step: 785910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:31,782-Speed 6289.30 samples/sec Loss 2.6152 LearningRate 0.0000 Epoch: 37 Global Step: 785920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:35,030-Speed 6305.09 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 37 Global Step: 785930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:38,287-Speed 6289.00 samples/sec Loss 2.5591 LearningRate 0.0000 Epoch: 37 Global Step: 785940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:41,548-Speed 6283.07 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 37 Global Step: 785950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:44,811-Speed 6277.70 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 37 Global Step: 785960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:48,075-Speed 6274.97 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 37 Global Step: 785970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:51,332-Speed 6290.45 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 37 Global Step: 785980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:54,581-Speed 6304.43 
samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 37 Global Step: 785990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:14:57,832-Speed 6302.25 samples/sec Loss 2.5610 LearningRate 0.0000 Epoch: 37 Global Step: 786000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:01,078-Speed 6309.55 samples/sec Loss 2.5469 LearningRate 0.0000 Epoch: 37 Global Step: 786010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:04,333-Speed 6293.47 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 37 Global Step: 786020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:07,585-Speed 6297.73 samples/sec Loss 2.5918 LearningRate 0.0000 Epoch: 37 Global Step: 786030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:10,837-Speed 6299.74 samples/sec Loss 2.5961 LearningRate 0.0000 Epoch: 37 Global Step: 786040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:14,085-Speed 6307.44 samples/sec Loss 2.4982 LearningRate 0.0000 Epoch: 37 Global Step: 786050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:17,343-Speed 6287.15 samples/sec Loss 2.6395 LearningRate 0.0000 Epoch: 37 Global Step: 786060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:20,597-Speed 6294.91 samples/sec Loss 2.5766 LearningRate 0.0000 Epoch: 37 Global Step: 786070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:23,888-Speed 6225.18 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 37 Global Step: 786080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:27,161-Speed 6259.73 samples/sec Loss 2.6382 LearningRate 0.0000 Epoch: 37 Global Step: 786090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:30,424-Speed 6278.12 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 37 Global Step: 786100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:33,687-Speed 6276.80 samples/sec Loss 2.5820 LearningRate 0.0000 Epoch: 37 
Global Step: 786110 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:15:36,924-Speed 6328.66 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 37 Global Step: 786120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:40,184-Speed 6282.36 samples/sec Loss 2.5956 LearningRate 0.0000 Epoch: 37 Global Step: 786130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:43,443-Speed 6285.92 samples/sec Loss 2.6019 LearningRate 0.0000 Epoch: 37 Global Step: 786140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:46,694-Speed 6301.97 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 37 Global Step: 786150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:49,947-Speed 6296.66 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 37 Global Step: 786160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:53,202-Speed 6294.04 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 37 Global Step: 786170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:56,458-Speed 6290.00 samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 37 Global Step: 786180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:15:59,722-Speed 6277.45 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 37 Global Step: 786190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:02,989-Speed 6268.64 samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 37 Global Step: 786200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:06,242-Speed 6296.65 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 37 Global Step: 786210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:09,485-Speed 6316.83 samples/sec Loss 2.5234 LearningRate 0.0000 Epoch: 37 Global Step: 786220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:12,736-Speed 6301.66 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 786230 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:16:15,993-Speed 6289.10 samples/sec Loss 2.6044 LearningRate 0.0000 Epoch: 37 Global Step: 786240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:19,255-Speed 6279.40 samples/sec Loss 2.6067 LearningRate 0.0000 Epoch: 37 Global Step: 786250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:22,528-Speed 6259.46 samples/sec Loss 2.6348 LearningRate 0.0000 Epoch: 37 Global Step: 786260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:25,784-Speed 6290.84 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 37 Global Step: 786270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:29,039-Speed 6294.96 samples/sec Loss 2.5618 LearningRate 0.0000 Epoch: 37 Global Step: 786280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:32,291-Speed 6298.77 samples/sec Loss 2.5815 LearningRate 0.0000 Epoch: 37 Global Step: 786290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:35,536-Speed 6312.75 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 37 Global Step: 786300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:38,785-Speed 6305.01 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 37 Global Step: 786310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:42,026-Speed 6320.78 samples/sec Loss 2.6355 LearningRate 0.0000 Epoch: 37 Global Step: 786320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:45,280-Speed 6293.68 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 37 Global Step: 786330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:48,536-Speed 6290.77 samples/sec Loss 2.5505 LearningRate 0.0000 Epoch: 37 Global Step: 786340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:51,826-Speed 6229.48 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 37 Global Step: 786350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:55,083-Speed 6288.50 
samples/sec Loss 2.5641 LearningRate 0.0000 Epoch: 37 Global Step: 786360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:16:58,342-Speed 6286.24 samples/sec Loss 2.6003 LearningRate 0.0000 Epoch: 37 Global Step: 786370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:01,598-Speed 6292.14 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 37 Global Step: 786380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:04,854-Speed 6291.33 samples/sec Loss 2.6006 LearningRate 0.0000 Epoch: 37 Global Step: 786390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:08,147-Speed 6220.14 samples/sec Loss 2.6160 LearningRate 0.0000 Epoch: 37 Global Step: 786400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:11,392-Speed 6313.07 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 37 Global Step: 786410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:14,627-Speed 6331.60 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 37 Global Step: 786420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:17,887-Speed 6284.13 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 37 Global Step: 786430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:21,136-Speed 6305.13 samples/sec Loss 2.5090 LearningRate 0.0000 Epoch: 37 Global Step: 786440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:24,395-Speed 6284.39 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 37 Global Step: 786450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:27,650-Speed 6293.60 samples/sec Loss 2.5868 LearningRate 0.0000 Epoch: 37 Global Step: 786460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:30,911-Speed 6282.52 samples/sec Loss 2.5561 LearningRate 0.0000 Epoch: 37 Global Step: 786470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:34,167-Speed 6291.76 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 37 
Global Step: 786480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:37,424-Speed 6289.51 samples/sec Loss 2.5683 LearningRate 0.0000 Epoch: 37 Global Step: 786490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:40,680-Speed 6291.77 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 37 Global Step: 786500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:43,934-Speed 6295.61 samples/sec Loss 2.5692 LearningRate 0.0000 Epoch: 37 Global Step: 786510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:47,197-Speed 6276.46 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 37 Global Step: 786520 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:17:50,444-Speed 6309.78 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 37 Global Step: 786530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:53,704-Speed 6282.93 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 37 Global Step: 786540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:17:56,970-Speed 6272.08 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 37 Global Step: 786550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:00,230-Speed 6284.16 samples/sec Loss 2.6186 LearningRate 0.0000 Epoch: 37 Global Step: 786560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:03,484-Speed 6294.61 samples/sec Loss 2.6195 LearningRate 0.0000 Epoch: 37 Global Step: 786570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:06,730-Speed 6310.02 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 37 Global Step: 786580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:09,998-Speed 6268.47 samples/sec Loss 2.5981 LearningRate 0.0000 Epoch: 37 Global Step: 786590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:13,264-Speed 6272.24 samples/sec Loss 2.5901 LearningRate 0.0000 Epoch: 37 Global Step: 786600 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:18:16,520-Speed 6291.42 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 37 Global Step: 786610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:19,783-Speed 6277.12 samples/sec Loss 2.5838 LearningRate 0.0000 Epoch: 37 Global Step: 786620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:23,031-Speed 6308.50 samples/sec Loss 2.6102 LearningRate 0.0000 Epoch: 37 Global Step: 786630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:26,289-Speed 6285.63 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 37 Global Step: 786640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:29,539-Speed 6303.94 samples/sec Loss 2.5853 LearningRate 0.0000 Epoch: 37 Global Step: 786650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:32,787-Speed 6307.54 samples/sec Loss 2.5669 LearningRate 0.0000 Epoch: 37 Global Step: 786660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:36,041-Speed 6295.69 samples/sec Loss 2.5778 LearningRate 0.0000 Epoch: 37 Global Step: 786670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:39,293-Speed 6298.32 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 37 Global Step: 786680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:42,547-Speed 6295.39 samples/sec Loss 2.5677 LearningRate 0.0000 Epoch: 37 Global Step: 786690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:45,812-Speed 6274.87 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 37 Global Step: 786700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:49,066-Speed 6294.53 samples/sec Loss 2.5442 LearningRate 0.0000 Epoch: 37 Global Step: 786710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:52,316-Speed 6302.65 samples/sec Loss 2.5202 LearningRate 0.0000 Epoch: 37 Global Step: 786720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:55,560-Speed 6314.42 
samples/sec Loss 2.6082 LearningRate 0.0000 Epoch: 37 Global Step: 786730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:18:58,814-Speed 6295.89 samples/sec Loss 2.5377 LearningRate 0.0000 Epoch: 37 Global Step: 786740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:02,069-Speed 6293.36 samples/sec Loss 2.5527 LearningRate 0.0000 Epoch: 37 Global Step: 786750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:05,322-Speed 6296.58 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 37 Global Step: 786760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:08,581-Speed 6284.91 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 37 Global Step: 786770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:11,840-Speed 6286.65 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 37 Global Step: 786780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:15,098-Speed 6287.04 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 37 Global Step: 786790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:18,347-Speed 6306.00 samples/sec Loss 2.5964 LearningRate 0.0000 Epoch: 37 Global Step: 786800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:21,599-Speed 6299.10 samples/sec Loss 2.5838 LearningRate 0.0000 Epoch: 37 Global Step: 786810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:24,847-Speed 6305.60 samples/sec Loss 2.5177 LearningRate 0.0000 Epoch: 37 Global Step: 786820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:28,100-Speed 6297.17 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 37 Global Step: 786830 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:19:31,343-Speed 6317.74 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 37 Global Step: 786840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:34,601-Speed 6285.92 samples/sec Loss 2.6235 LearningRate 0.0000 Epoch: 37 
Global Step: 786850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:37,852-Speed 6303.25 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 37 Global Step: 786860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:41,109-Speed 6289.50 samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 37 Global Step: 786870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:44,363-Speed 6293.99 samples/sec Loss 2.5890 LearningRate 0.0000 Epoch: 37 Global Step: 786880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:47,620-Speed 6290.54 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 37 Global Step: 786890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:50,893-Speed 6256.98 samples/sec Loss 2.6217 LearningRate 0.0000 Epoch: 37 Global Step: 786900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:54,156-Speed 6279.08 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 37 Global Step: 786910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:19:57,412-Speed 6291.86 samples/sec Loss 2.5912 LearningRate 0.0000 Epoch: 37 Global Step: 786920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:00,681-Speed 6265.94 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 37 Global Step: 786930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:03,923-Speed 6318.42 samples/sec Loss 2.6194 LearningRate 0.0000 Epoch: 37 Global Step: 786940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:07,181-Speed 6287.89 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 37 Global Step: 786950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:10,433-Speed 6298.92 samples/sec Loss 2.5887 LearningRate 0.0000 Epoch: 37 Global Step: 786960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:13,689-Speed 6290.46 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 37 Global Step: 786970 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:20:16,946-Speed 6290.42 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 37 Global Step: 786980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:20,197-Speed 6299.97 samples/sec Loss 2.6005 LearningRate 0.0000 Epoch: 37 Global Step: 786990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:23,456-Speed 6285.70 samples/sec Loss 2.6336 LearningRate 0.0000 Epoch: 37 Global Step: 787000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:26,710-Speed 6296.25 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 37 Global Step: 787010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:29,965-Speed 6293.20 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 37 Global Step: 787020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:33,235-Speed 6264.22 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 37 Global Step: 787030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:36,472-Speed 6327.72 samples/sec Loss 2.5490 LearningRate 0.0000 Epoch: 37 Global Step: 787040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:39,730-Speed 6288.40 samples/sec Loss 2.6059 LearningRate 0.0000 Epoch: 37 Global Step: 787050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:42,986-Speed 6291.93 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 37 Global Step: 787060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:46,242-Speed 6290.32 samples/sec Loss 2.6826 LearningRate 0.0000 Epoch: 37 Global Step: 787070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:49,498-Speed 6291.93 samples/sec Loss 2.5943 LearningRate 0.0000 Epoch: 37 Global Step: 787080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:52,759-Speed 6283.16 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 787090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:56,008-Speed 6303.31 
samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 37 Global Step: 787100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:20:59,258-Speed 6304.46 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 37 Global Step: 787110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:02,525-Speed 6268.60 samples/sec Loss 2.6042 LearningRate 0.0000 Epoch: 37 Global Step: 787120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:05,788-Speed 6277.57 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 37 Global Step: 787130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:09,047-Speed 6285.82 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 37 Global Step: 787140 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:21:12,294-Speed 6309.14 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 787150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:15,553-Speed 6285.47 samples/sec Loss 2.5886 LearningRate 0.0000 Epoch: 37 Global Step: 787160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:18,807-Speed 6295.18 samples/sec Loss 2.6040 LearningRate 0.0000 Epoch: 37 Global Step: 787170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:22,054-Speed 6308.76 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 37 Global Step: 787180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:25,318-Speed 6277.12 samples/sec Loss 2.5656 LearningRate 0.0000 Epoch: 37 Global Step: 787190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:28,580-Speed 6280.24 samples/sec Loss 2.6345 LearningRate 0.0000 Epoch: 37 Global Step: 787200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:31,841-Speed 6281.02 samples/sec Loss 2.5807 LearningRate 0.0000 Epoch: 37 Global Step: 787210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:35,101-Speed 6283.34 samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 37 
Global Step: 787220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:38,421-Speed 6169.99 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 37 Global Step: 787230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:41,741-Speed 6169.13 samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 37 Global Step: 787240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:44,981-Speed 6322.81 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 37 Global Step: 787250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:48,242-Speed 6281.89 samples/sec Loss 2.5986 LearningRate 0.0000 Epoch: 37 Global Step: 787260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:51,496-Speed 6296.89 samples/sec Loss 2.5729 LearningRate 0.0000 Epoch: 37 Global Step: 787270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:54,752-Speed 6290.74 samples/sec Loss 2.5583 LearningRate 0.0000 Epoch: 37 Global Step: 787280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:21:58,009-Speed 6288.98 samples/sec Loss 2.6140 LearningRate 0.0000 Epoch: 37 Global Step: 787290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:01,270-Speed 6282.77 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 37 Global Step: 787300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:04,526-Speed 6290.19 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 37 Global Step: 787310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:07,788-Speed 6279.92 samples/sec Loss 2.5780 LearningRate 0.0000 Epoch: 37 Global Step: 787320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:11,050-Speed 6280.08 samples/sec Loss 2.6112 LearningRate 0.0000 Epoch: 37 Global Step: 787330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:14,302-Speed 6297.84 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 37 Global Step: 787340 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:22:17,540-Speed 6326.04 samples/sec Loss 2.5120 LearningRate 0.0000 Epoch: 37 Global Step: 787350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:20,796-Speed 6292.67 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 37 Global Step: 787360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:24,044-Speed 6306.97 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 37 Global Step: 787370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:27,294-Speed 6301.26 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 787380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:30,553-Speed 6285.98 samples/sec Loss 2.5549 LearningRate 0.0000 Epoch: 37 Global Step: 787390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:33,831-Speed 6249.19 samples/sec Loss 2.5668 LearningRate 0.0000 Epoch: 37 Global Step: 787400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:37,100-Speed 6266.21 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 37 Global Step: 787410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:40,371-Speed 6263.16 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 37 Global Step: 787420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:43,623-Speed 6299.31 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 37 Global Step: 787430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:46,872-Speed 6303.70 samples/sec Loss 2.5987 LearningRate 0.0000 Epoch: 37 Global Step: 787440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:50,115-Speed 6316.84 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 37 Global Step: 787450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:53,374-Speed 6286.79 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 37 Global Step: 787460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:56,626-Speed 6298.56 
samples/sec Loss 2.5424 LearningRate 0.0000 Epoch: 37 Global Step: 787470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:22:59,887-Speed 6281.99 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 37 Global Step: 787480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:03,158-Speed 6263.88 samples/sec Loss 2.6269 LearningRate 0.0000 Epoch: 37 Global Step: 787490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:06,413-Speed 6292.55 samples/sec Loss 2.5699 LearningRate 0.0000 Epoch: 37 Global Step: 787500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:09,668-Speed 6293.37 samples/sec Loss 2.6116 LearningRate 0.0000 Epoch: 37 Global Step: 787510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:12,932-Speed 6274.57 samples/sec Loss 2.5869 LearningRate 0.0000 Epoch: 37 Global Step: 787520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:16,191-Speed 6285.51 samples/sec Loss 2.6859 LearningRate 0.0000 Epoch: 37 Global Step: 787530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:19,449-Speed 6288.62 samples/sec Loss 2.5639 LearningRate 0.0000 Epoch: 37 Global Step: 787540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:22,686-Speed 6327.56 samples/sec Loss 2.5843 LearningRate 0.0000 Epoch: 37 Global Step: 787550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:25,953-Speed 6270.19 samples/sec Loss 2.5352 LearningRate 0.0000 Epoch: 37 Global Step: 787560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:29,201-Speed 6306.26 samples/sec Loss 2.5490 LearningRate 0.0000 Epoch: 37 Global Step: 787570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:32,454-Speed 6296.98 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 37 Global Step: 787580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:35,712-Speed 6287.66 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 37 
Global Step: 787590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:38,978-Speed 6271.42 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 37 Global Step: 787600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:42,247-Speed 6267.21 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 37 Global Step: 787610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:45,499-Speed 6298.41 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 37 Global Step: 787620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:48,771-Speed 6261.55 samples/sec Loss 2.5801 LearningRate 0.0000 Epoch: 37 Global Step: 787630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:52,021-Speed 6302.84 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 37 Global Step: 787640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:23:55,277-Speed 6291.51 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 37 Global Step: 787650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:23:58,518-Speed 6319.78 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 37 Global Step: 787660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:01,775-Speed 6290.14 samples/sec Loss 2.6221 LearningRate 0.0000 Epoch: 37 Global Step: 787670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:05,046-Speed 6263.25 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 37 Global Step: 787680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:08,310-Speed 6275.76 samples/sec Loss 2.5803 LearningRate 0.0000 Epoch: 37 Global Step: 787690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:11,690-Speed 6060.66 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 37 Global Step: 787700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:14,982-Speed 6221.99 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 37 Global Step: 787710 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:24:18,240-Speed 6287.54 samples/sec Loss 2.5301 LearningRate 0.0000 Epoch: 37 Global Step: 787720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:21,497-Speed 6288.13 samples/sec Loss 2.5887 LearningRate 0.0000 Epoch: 37 Global Step: 787730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:24,753-Speed 6292.83 samples/sec Loss 2.6102 LearningRate 0.0000 Epoch: 37 Global Step: 787740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:28,016-Speed 6277.08 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 37 Global Step: 787750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:31,257-Speed 6321.79 samples/sec Loss 2.6108 LearningRate 0.0000 Epoch: 37 Global Step: 787760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:34,507-Speed 6302.88 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 37 Global Step: 787770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:37,767-Speed 6282.36 samples/sec Loss 2.6123 LearningRate 0.0000 Epoch: 37 Global Step: 787780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:41,024-Speed 6288.92 samples/sec Loss 2.6084 LearningRate 0.0000 Epoch: 37 Global Step: 787790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:44,281-Speed 6290.87 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 37 Global Step: 787800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:47,543-Speed 6278.73 samples/sec Loss 2.6308 LearningRate 0.0000 Epoch: 37 Global Step: 787810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:50,801-Speed 6288.37 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 37 Global Step: 787820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:54,062-Speed 6280.47 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 37 Global Step: 787830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:24:57,309-Speed 6308.59 
samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 787840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:00,563-Speed 6295.44 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 37 Global Step: 787850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:03,805-Speed 6319.49 samples/sec Loss 2.6023 LearningRate 0.0000 Epoch: 37 Global Step: 787860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:07,069-Speed 6275.66 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 37 Global Step: 787870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:10,324-Speed 6293.16 samples/sec Loss 2.6323 LearningRate 0.0000 Epoch: 37 Global Step: 787880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:13,598-Speed 6256.84 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 37 Global Step: 787890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:16,859-Speed 6282.82 samples/sec Loss 2.5937 LearningRate 0.0000 Epoch: 37 Global Step: 787900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:20,111-Speed 6298.31 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 37 Global Step: 787910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:23,370-Speed 6286.23 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 37 Global Step: 787920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:26,628-Speed 6286.65 samples/sec Loss 2.5822 LearningRate 0.0000 Epoch: 37 Global Step: 787930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:29,888-Speed 6283.95 samples/sec Loss 2.6097 LearningRate 0.0000 Epoch: 37 Global Step: 787940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:33,140-Speed 6298.68 samples/sec Loss 2.6621 LearningRate 0.0000 Epoch: 37 Global Step: 787950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:36,378-Speed 6326.82 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 37 
Global Step: 787960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:39,651-Speed 6258.30 samples/sec Loss 2.6158 LearningRate 0.0000 Epoch: 37 Global Step: 787970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:42,904-Speed 6297.63 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 37 Global Step: 787980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:46,249-Speed 6123.00 samples/sec Loss 2.5586 LearningRate 0.0000 Epoch: 37 Global Step: 787990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:49,498-Speed 6305.85 samples/sec Loss 2.5585 LearningRate 0.0000 Epoch: 37 Global Step: 788000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:52,762-Speed 6275.85 samples/sec Loss 2.5882 LearningRate 0.0000 Epoch: 37 Global Step: 788010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:56,017-Speed 6292.80 samples/sec Loss 2.5714 LearningRate 0.0000 Epoch: 37 Global Step: 788020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:25:59,274-Speed 6289.77 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 37 Global Step: 788030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:02,568-Speed 6217.77 samples/sec Loss 2.5364 LearningRate 0.0000 Epoch: 37 Global Step: 788040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:05,817-Speed 6305.93 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 37 Global Step: 788050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:09,060-Speed 6316.70 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 37 Global Step: 788060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:12,317-Speed 6288.75 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 37 Global Step: 788070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:15,569-Speed 6299.13 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 37 Global Step: 788080 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:26:18,833-Speed 6277.00 samples/sec Loss 2.6332 LearningRate 0.0000 Epoch: 37 Global Step: 788090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:26:22,087-Speed 6295.46 samples/sec Loss 2.6039 LearningRate 0.0000 Epoch: 37 Global Step: 788100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:21,478-Speed 344.83 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 788110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:24,737-Speed 6287.03 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 38 Global Step: 788120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:27,981-Speed 6314.70 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 38 Global Step: 788130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:31,220-Speed 6324.63 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 38 Global Step: 788140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:34,463-Speed 6315.72 samples/sec Loss 2.6363 LearningRate 0.0000 Epoch: 38 Global Step: 788150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:37,691-Speed 6345.87 samples/sec Loss 2.5246 LearningRate 0.0000 Epoch: 38 Global Step: 788160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:40,945-Speed 6295.22 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 788170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:44,191-Speed 6310.01 samples/sec Loss 2.6159 LearningRate 0.0000 Epoch: 38 Global Step: 788180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:47,436-Speed 6313.40 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 38 Global Step: 788190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:50,712-Speed 6252.11 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 38 Global Step: 788200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:53,966-Speed 6294.80 
samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 38 Global Step: 788210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:27:57,216-Speed 6303.02 samples/sec Loss 2.5632 LearningRate 0.0000 Epoch: 38 Global Step: 788220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:00,462-Speed 6311.54 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 38 Global Step: 788230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:03,716-Speed 6295.16 samples/sec Loss 2.6318 LearningRate 0.0000 Epoch: 38 Global Step: 788240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:06,962-Speed 6310.58 samples/sec Loss 2.5964 LearningRate 0.0000 Epoch: 38 Global Step: 788250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:10,200-Speed 6325.79 samples/sec Loss 2.5996 LearningRate 0.0000 Epoch: 38 Global Step: 788260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:13,447-Speed 6308.90 samples/sec Loss 2.5960 LearningRate 0.0000 Epoch: 38 Global Step: 788270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:16,694-Speed 6308.87 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 38 Global Step: 788280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:19,948-Speed 6295.60 samples/sec Loss 2.5638 LearningRate 0.0000 Epoch: 38 Global Step: 788290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:23,214-Speed 6272.44 samples/sec Loss 2.5579 LearningRate 0.0000 Epoch: 38 Global Step: 788300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:26,461-Speed 6309.70 samples/sec Loss 2.6041 LearningRate 0.0000 Epoch: 38 Global Step: 788310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:29,704-Speed 6315.96 samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 38 Global Step: 788320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:32,957-Speed 6296.65 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 
Global Step: 788330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:36,221-Speed 6276.58 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 38 Global Step: 788340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:39,476-Speed 6293.80 samples/sec Loss 2.5412 LearningRate 0.0000 Epoch: 38 Global Step: 788350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:42,719-Speed 6315.05 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 38 Global Step: 788360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:45,982-Speed 6279.04 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 38 Global Step: 788370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:49,236-Speed 6295.18 samples/sec Loss 2.5890 LearningRate 0.0000 Epoch: 38 Global Step: 788380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:52,496-Speed 6283.35 samples/sec Loss 2.6020 LearningRate 0.0000 Epoch: 38 Global Step: 788390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:55,759-Speed 6278.30 samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 38 Global Step: 788400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:28:59,019-Speed 6281.77 samples/sec Loss 2.5784 LearningRate 0.0000 Epoch: 38 Global Step: 788410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:02,306-Speed 6233.79 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 788420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:05,563-Speed 6288.37 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 38 Global Step: 788430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:08,825-Speed 6279.05 samples/sec Loss 2.5940 LearningRate 0.0000 Epoch: 38 Global Step: 788440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:12,093-Speed 6269.35 samples/sec Loss 2.5735 LearningRate 0.0000 Epoch: 38 Global Step: 788450 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:29:15,368-Speed 6254.68 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 38 Global Step: 788460 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:29:18,604-Speed 6330.16 samples/sec Loss 2.5731 LearningRate 0.0000 Epoch: 38 Global Step: 788470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:21,859-Speed 6294.08 samples/sec Loss 2.5813 LearningRate 0.0000 Epoch: 38 Global Step: 788480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:25,115-Speed 6290.96 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 38 Global Step: 788490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:28,377-Speed 6279.08 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 38 Global Step: 788500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:31,635-Speed 6288.68 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 38 Global Step: 788510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:34,893-Speed 6286.94 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 38 Global Step: 788520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:38,150-Speed 6289.87 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 38 Global Step: 788530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:41,395-Speed 6312.58 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 38 Global Step: 788540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:44,658-Speed 6278.30 samples/sec Loss 2.6120 LearningRate 0.0000 Epoch: 38 Global Step: 788550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:47,924-Speed 6270.25 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 38 Global Step: 788560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:51,155-Speed 6340.43 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 38 Global Step: 788570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:54,415-Speed 6284.30 
samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 38 Global Step: 788580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:29:57,666-Speed 6300.71 samples/sec Loss 2.5759 LearningRate 0.0000 Epoch: 38 Global Step: 788590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:00,924-Speed 6286.57 samples/sec Loss 2.5674 LearningRate 0.0000 Epoch: 38 Global Step: 788600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:04,227-Speed 6201.83 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 38 Global Step: 788610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:07,548-Speed 6169.68 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 38 Global Step: 788620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:10,928-Speed 6060.24 samples/sec Loss 2.5828 LearningRate 0.0000 Epoch: 38 Global Step: 788630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:14,177-Speed 6304.03 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 38 Global Step: 788640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:17,425-Speed 6306.57 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 38 Global Step: 788650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:20,672-Speed 6309.74 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 38 Global Step: 788660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:23,913-Speed 6320.83 samples/sec Loss 2.6094 LearningRate 0.0000 Epoch: 38 Global Step: 788670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:27,187-Speed 6255.82 samples/sec Loss 2.6117 LearningRate 0.0000 Epoch: 38 Global Step: 788680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:30,445-Speed 6288.57 samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 38 Global Step: 788690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:33,695-Speed 6302.79 samples/sec Loss 2.5936 LearningRate 0.0000 Epoch: 38 
Global Step: 788700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:36,948-Speed 6296.41 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 38 Global Step: 788710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:40,254-Speed 6196.88 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 38 Global Step: 788720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:43,506-Speed 6299.48 samples/sec Loss 2.5652 LearningRate 0.0000 Epoch: 38 Global Step: 788730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:46,758-Speed 6298.11 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 38 Global Step: 788740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:50,073-Speed 6179.77 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 788750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:53,328-Speed 6292.70 samples/sec Loss 2.5603 LearningRate 0.0000 Epoch: 38 Global Step: 788760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:56,575-Speed 6308.92 samples/sec Loss 2.5269 LearningRate 0.0000 Epoch: 38 Global Step: 788770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:30:59,833-Speed 6287.86 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 38 Global Step: 788780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:03,088-Speed 6294.41 samples/sec Loss 2.5848 LearningRate 0.0000 Epoch: 38 Global Step: 788790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:06,349-Speed 6280.05 samples/sec Loss 2.5924 LearningRate 0.0000 Epoch: 38 Global Step: 788800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:09,607-Speed 6288.22 samples/sec Loss 2.5700 LearningRate 0.0000 Epoch: 38 Global Step: 788810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:12,859-Speed 6298.54 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 788820 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:31:16,109-Speed 6303.94 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 38 Global Step: 788830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:19,358-Speed 6303.23 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 38 Global Step: 788840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:22,613-Speed 6294.57 samples/sec Loss 2.5628 LearningRate 0.0000 Epoch: 38 Global Step: 788850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:25,864-Speed 6300.76 samples/sec Loss 2.6037 LearningRate 0.0000 Epoch: 38 Global Step: 788860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:29,114-Speed 6302.95 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 38 Global Step: 788870 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:31:32,347-Speed 6336.17 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 38 Global Step: 788880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:35,603-Speed 6291.53 samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 38 Global Step: 788890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:38,858-Speed 6293.76 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 38 Global Step: 788900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:42,113-Speed 6294.20 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 38 Global Step: 788910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:45,367-Speed 6294.04 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 38 Global Step: 788920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:48,621-Speed 6294.79 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 38 Global Step: 788930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:51,882-Speed 6281.35 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 38 Global Step: 788940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:55,133-Speed 6301.57 
samples/sec Loss 2.4920 LearningRate 0.0000 Epoch: 38 Global Step: 788950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:31:58,394-Speed 6282.48 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 38 Global Step: 788960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:01,648-Speed 6295.31 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 38 Global Step: 788970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:04,884-Speed 6329.65 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 38 Global Step: 788980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:08,133-Speed 6304.84 samples/sec Loss 2.5739 LearningRate 0.0000 Epoch: 38 Global Step: 788990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:11,388-Speed 6293.18 samples/sec Loss 2.5889 LearningRate 0.0000 Epoch: 38 Global Step: 789000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:14,645-Speed 6289.40 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 789010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:17,898-Speed 6297.49 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 38 Global Step: 789020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:21,153-Speed 6292.38 samples/sec Loss 2.5833 LearningRate 0.0000 Epoch: 38 Global Step: 789030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:24,409-Speed 6291.74 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 38 Global Step: 789040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:27,669-Speed 6283.84 samples/sec Loss 2.5709 LearningRate 0.0000 Epoch: 38 Global Step: 789050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:30,930-Speed 6281.07 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 789060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:34,188-Speed 6288.93 samples/sec Loss 2.5291 LearningRate 0.0000 Epoch: 38 
Global Step: 789070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:37,444-Speed 6292.61 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 Global Step: 789080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:40,758-Speed 6180.80 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 789090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:44,006-Speed 6307.17 samples/sec Loss 2.5499 LearningRate 0.0000 Epoch: 38 Global Step: 789100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:47,262-Speed 6291.11 samples/sec Loss 2.5863 LearningRate 0.0000 Epoch: 38 Global Step: 789110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:50,511-Speed 6303.69 samples/sec Loss 2.5861 LearningRate 0.0000 Epoch: 38 Global Step: 789120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:53,763-Speed 6299.11 samples/sec Loss 2.5687 LearningRate 0.0000 Epoch: 38 Global Step: 789130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:32:57,010-Speed 6309.38 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 789140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:00,262-Speed 6299.85 samples/sec Loss 2.5863 LearningRate 0.0000 Epoch: 38 Global Step: 789150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:03,589-Speed 6156.04 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 38 Global Step: 789160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:06,919-Speed 6151.13 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 38 Global Step: 789170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:10,159-Speed 6324.25 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 38 Global Step: 789180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:13,411-Speed 6297.31 samples/sec Loss 2.6180 LearningRate 0.0000 Epoch: 38 Global Step: 789190 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:33:16,665-Speed 6295.68 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 789200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:19,922-Speed 6289.19 samples/sec Loss 2.5985 LearningRate 0.0000 Epoch: 38 Global Step: 789210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:23,177-Speed 6293.30 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 38 Global Step: 789220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:26,433-Speed 6290.63 samples/sec Loss 2.5425 LearningRate 0.0000 Epoch: 38 Global Step: 789230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:29,684-Speed 6302.26 samples/sec Loss 2.5615 LearningRate 0.0000 Epoch: 38 Global Step: 789240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:32,942-Speed 6287.17 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 38 Global Step: 789250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:36,202-Speed 6284.96 samples/sec Loss 2.5945 LearningRate 0.0000 Epoch: 38 Global Step: 789260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:39,469-Speed 6269.85 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 38 Global Step: 789270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:42,744-Speed 6255.18 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 38 Global Step: 789280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:45,995-Speed 6299.77 samples/sec Loss 2.5974 LearningRate 0.0000 Epoch: 38 Global Step: 789290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:49,253-Speed 6288.50 samples/sec Loss 2.6089 LearningRate 0.0000 Epoch: 38 Global Step: 789300 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:52,512-Speed 6285.94 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 38 Global Step: 789310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:55,763-Speed 6300.08 
samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 38 Global Step: 789320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:33:59,022-Speed 6284.99 samples/sec Loss 2.5771 LearningRate 0.0000 Epoch: 38 Global Step: 789330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:02,281-Speed 6286.81 samples/sec Loss 2.6170 LearningRate 0.0000 Epoch: 38 Global Step: 789340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:05,536-Speed 6293.58 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 38 Global Step: 789350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:08,790-Speed 6294.96 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 789360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:12,056-Speed 6271.43 samples/sec Loss 2.5633 LearningRate 0.0000 Epoch: 38 Global Step: 789370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:15,315-Speed 6285.90 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 38 Global Step: 789380 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:34:18,554-Speed 6323.95 samples/sec Loss 2.6001 LearningRate 0.0000 Epoch: 38 Global Step: 789390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:21,815-Speed 6280.91 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 38 Global Step: 789400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:25,074-Speed 6285.24 samples/sec Loss 2.5523 LearningRate 0.0000 Epoch: 38 Global Step: 789410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:28,334-Speed 6285.14 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 38 Global Step: 789420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:31,590-Speed 6291.04 samples/sec Loss 2.5958 LearningRate 0.0000 Epoch: 38 Global Step: 789430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:34,844-Speed 6293.98 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 38 
Global Step: 789440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:38,101-Speed 6290.07 samples/sec Loss 2.5616 LearningRate 0.0000 Epoch: 38 Global Step: 789450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:41,354-Speed 6297.53 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 38 Global Step: 789460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:44,611-Speed 6289.25 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 38 Global Step: 789470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:47,865-Speed 6296.71 samples/sec Loss 2.5922 LearningRate 0.0000 Epoch: 38 Global Step: 789480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:51,099-Speed 6333.59 samples/sec Loss 2.5961 LearningRate 0.0000 Epoch: 38 Global Step: 789490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:54,358-Speed 6285.25 samples/sec Loss 2.5586 LearningRate 0.0000 Epoch: 38 Global Step: 789500 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:34:57,612-Speed 6295.43 samples/sec Loss 2.5991 LearningRate 0.0000 Epoch: 38 Global Step: 789510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:00,862-Speed 6302.44 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 789520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:04,116-Speed 6296.54 samples/sec Loss 2.5721 LearningRate 0.0000 Epoch: 38 Global Step: 789530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:07,365-Speed 6303.04 samples/sec Loss 2.5906 LearningRate 0.0000 Epoch: 38 Global Step: 789540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:10,623-Speed 6289.02 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 38 Global Step: 789550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:13,886-Speed 6276.84 samples/sec Loss 2.5304 LearningRate 0.0000 Epoch: 38 Global Step: 789560 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:35:17,142-Speed 6291.32 samples/sec Loss 2.5335 LearningRate 0.0000 Epoch: 38 Global Step: 789570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:20,404-Speed 6280.26 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 38 Global Step: 789580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:23,643-Speed 6324.75 samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 38 Global Step: 789590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:26,898-Speed 6292.84 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 38 Global Step: 789600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:30,150-Speed 6298.81 samples/sec Loss 2.5280 LearningRate 0.0000 Epoch: 38 Global Step: 789610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:33,403-Speed 6297.57 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 789620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:36,661-Speed 6287.06 samples/sec Loss 2.5562 LearningRate 0.0000 Epoch: 38 Global Step: 789630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:39,922-Speed 6282.74 samples/sec Loss 2.5821 LearningRate 0.0000 Epoch: 38 Global Step: 789640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:43,174-Speed 6297.35 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 789650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:46,430-Speed 6293.07 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 38 Global Step: 789660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:49,689-Speed 6285.99 samples/sec Loss 2.5797 LearningRate 0.0000 Epoch: 38 Global Step: 789670 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:52,948-Speed 6285.91 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 38 Global Step: 789680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:56,187-Speed 6323.17 
samples/sec Loss 2.6161 LearningRate 0.0000 Epoch: 38 Global Step: 789690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:35:59,446-Speed 6285.07 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 38 Global Step: 789700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:02,704-Speed 6289.19 samples/sec Loss 2.5934 LearningRate 0.0000 Epoch: 38 Global Step: 789710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:05,962-Speed 6286.56 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 38 Global Step: 789720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:09,220-Speed 6288.34 samples/sec Loss 2.5962 LearningRate 0.0000 Epoch: 38 Global Step: 789730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:12,476-Speed 6290.55 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 38 Global Step: 789740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:15,735-Speed 6285.81 samples/sec Loss 2.6214 LearningRate 0.0000 Epoch: 38 Global Step: 789750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:18,991-Speed 6291.09 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 38 Global Step: 789760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:22,253-Speed 6279.16 samples/sec Loss 2.5690 LearningRate 0.0000 Epoch: 38 Global Step: 789770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:25,504-Speed 6301.44 samples/sec Loss 2.4982 LearningRate 0.0000 Epoch: 38 Global Step: 789780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:28,748-Speed 6315.40 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 38 Global Step: 789790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:32,002-Speed 6295.24 samples/sec Loss 2.5477 LearningRate 0.0000 Epoch: 38 Global Step: 789800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:35,257-Speed 6293.06 samples/sec Loss 2.5080 LearningRate 0.0000 Epoch: 38 
Global Step: 789810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:38,523-Speed 6271.09 samples/sec Loss 2.5984 LearningRate 0.0000 Epoch: 38 Global Step: 789820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:41,779-Speed 6291.05 samples/sec Loss 2.5922 LearningRate 0.0000 Epoch: 38 Global Step: 789830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:45,043-Speed 6276.18 samples/sec Loss 2.5091 LearningRate 0.0000 Epoch: 38 Global Step: 789840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:48,309-Speed 6271.49 samples/sec Loss 2.6219 LearningRate 0.0000 Epoch: 38 Global Step: 789850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:51,566-Speed 6290.60 samples/sec Loss 2.5097 LearningRate 0.0000 Epoch: 38 Global Step: 789860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:54,835-Speed 6265.94 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 38 Global Step: 789870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:36:58,116-Speed 6244.35 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 38 Global Step: 789880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:01,380-Speed 6275.69 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 38 Global Step: 789890 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:37:04,626-Speed 6310.44 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 38 Global Step: 789900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:07,885-Speed 6286.10 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 38 Global Step: 789910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:11,147-Speed 6278.98 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 38 Global Step: 789920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:14,404-Speed 6289.33 samples/sec Loss 2.5811 LearningRate 0.0000 Epoch: 38 Global Step: 789930 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:37:17,667-Speed 6278.82 samples/sec Loss 2.5712 LearningRate 0.0000 Epoch: 38 Global Step: 789940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:20,918-Speed 6300.93 samples/sec Loss 2.5329 LearningRate 0.0000 Epoch: 38 Global Step: 789950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:24,174-Speed 6291.31 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 789960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:27,431-Speed 6288.94 samples/sec Loss 2.5810 LearningRate 0.0000 Epoch: 38 Global Step: 789970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:30,682-Speed 6301.44 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 38 Global Step: 789980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:33,930-Speed 6306.27 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 38 Global Step: 789990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:37,170-Speed 6322.27 samples/sec Loss 2.5835 LearningRate 0.0000 Epoch: 38 Global Step: 790000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:40,420-Speed 6303.05 samples/sec Loss 2.5928 LearningRate 0.0000 Epoch: 38 Global Step: 790010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:43,697-Speed 6251.31 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 38 Global Step: 790020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:46,954-Speed 6289.37 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 38 Global Step: 790030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:50,207-Speed 6296.56 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 38 Global Step: 790040 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:53,464-Speed 6289.49 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 790050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:56,725-Speed 6281.52 
samples/sec Loss 2.5729 LearningRate 0.0000 Epoch: 38 Global Step: 790060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:37:59,978-Speed 6297.04 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 Global Step: 790070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:03,241-Speed 6279.57 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 38 Global Step: 790080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:06,501-Speed 6283.08 samples/sec Loss 2.5841 LearningRate 0.0000 Epoch: 38 Global Step: 790090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:09,738-Speed 6329.37 samples/sec Loss 2.5700 LearningRate 0.0000 Epoch: 38 Global Step: 790100 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:12,999-Speed 6281.17 samples/sec Loss 2.5821 LearningRate 0.0000 Epoch: 38 Global Step: 790110 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:16,257-Speed 6288.53 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 38 Global Step: 790120 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:19,521-Speed 6275.40 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 38 Global Step: 790130 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:22,777-Speed 6291.92 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 38 Global Step: 790140 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:26,039-Speed 6277.84 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 38 Global Step: 790150 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:29,298-Speed 6286.47 samples/sec Loss 2.5807 LearningRate 0.0000 Epoch: 38 Global Step: 790160 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:32,554-Speed 6291.62 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 38 Global Step: 790170 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:35,809-Speed 6291.91 samples/sec Loss 2.5471 LearningRate 0.0000 Epoch: 38 
Global Step: 790180 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:39,062-Speed 6297.37 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 790190 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:42,305-Speed 6318.17 samples/sec Loss 2.6032 LearningRate 0.0000 Epoch: 38 Global Step: 790200 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:45,558-Speed 6295.61 samples/sec Loss 2.5866 LearningRate 0.0000 Epoch: 38 Global Step: 790210 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:48,832-Speed 6258.24 samples/sec Loss 2.6067 LearningRate 0.0000 Epoch: 38 Global Step: 790220 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:52,164-Speed 6146.24 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 38 Global Step: 790230 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:55,420-Speed 6292.21 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 38 Global Step: 790240 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:38:58,681-Speed 6280.59 samples/sec Loss 2.6427 LearningRate 0.0000 Epoch: 38 Global Step: 790250 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:01,937-Speed 6292.22 samples/sec Loss 2.5661 LearningRate 0.0000 Epoch: 38 Global Step: 790260 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:05,220-Speed 6240.10 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 38 Global Step: 790270 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:08,478-Speed 6286.82 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 38 Global Step: 790280 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:11,744-Speed 6272.43 samples/sec Loss 2.5663 LearningRate 0.0000 Epoch: 38 Global Step: 790290 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:14,984-Speed 6323.20 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 38 Global Step: 790300 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:39:18,240-Speed 6292.52 samples/sec Loss 2.5464 LearningRate 0.0000 Epoch: 38 Global Step: 790310 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:21,492-Speed 6298.21 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 38 Global Step: 790320 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:24,747-Speed 6293.05 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 38 Global Step: 790330 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:28,012-Speed 6274.43 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 38 Global Step: 790340 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:31,268-Speed 6292.00 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 38 Global Step: 790350 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:34,526-Speed 6286.46 samples/sec Loss 2.5886 LearningRate 0.0000 Epoch: 38 Global Step: 790360 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:37,779-Speed 6296.57 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 38 Global Step: 790370 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:41,038-Speed 6284.85 samples/sec Loss 2.5892 LearningRate 0.0000 Epoch: 38 Global Step: 790380 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:44,296-Speed 6289.36 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 790390 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:47,534-Speed 6324.47 samples/sec Loss 2.5845 LearningRate 0.0000 Epoch: 38 Global Step: 790400 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:50,801-Speed 6270.76 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 38 Global Step: 790410 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:54,053-Speed 6299.23 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 38 Global Step: 790420 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:39:57,302-Speed 6304.37 
samples/sec Loss 2.5783 LearningRate 0.0000 Epoch: 38 Global Step: 790430 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:00,563-Speed 6282.65 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 790440 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:03,818-Speed 6293.02 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 38 Global Step: 790450 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:07,083-Speed 6274.45 samples/sec Loss 2.5972 LearningRate 0.0000 Epoch: 38 Global Step: 790460 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:10,334-Speed 6299.37 samples/sec Loss 2.5224 LearningRate 0.0000 Epoch: 38 Global Step: 790470 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:13,586-Speed 6300.52 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 38 Global Step: 790480 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:16,838-Speed 6299.19 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 38 Global Step: 790490 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:20,093-Speed 6293.36 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 38 Global Step: 790500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-03 15:40:23,333-Speed 6322.25 samples/sec Loss 2.5284 LearningRate 0.0000 Epoch: 38 Global Step: 790510 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:26,586-Speed 6296.81 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 38 Global Step: 790520 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:29,835-Speed 6305.18 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 38 Global Step: 790530 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:33,093-Speed 6288.19 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 38 Global Step: 790540 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:36,342-Speed 6303.47 samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 38 
Global Step: 790550 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:39,600-Speed 6288.48 samples/sec Loss 2.5252 LearningRate 0.0000 Epoch: 38 Global Step: 790560 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:42,855-Speed 6293.50 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 38 Global Step: 790570 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:46,116-Speed 6281.87 samples/sec Loss 2.6053 LearningRate 0.0000 Epoch: 38 Global Step: 790580 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:49,387-Speed 6261.95 samples/sec Loss 2.6030 LearningRate 0.0000 Epoch: 38 Global Step: 790590 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:52,736-Speed 6116.97 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 38 Global Step: 790600 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:56,016-Speed 6245.17 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 38 Global Step: 790610 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:40:59,261-Speed 6311.93 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 38 Global Step: 790620 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:02,550-Speed 6227.59 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 790630 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:05,903-Speed 6109.76 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 38 Global Step: 790640 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:09,156-Speed 6298.02 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 38 Global Step: 790650 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:12,421-Speed 6274.39 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 38 Global Step: 790660 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:15,676-Speed 6292.78 samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 38 Global Step: 790670 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:41:18,952-Speed 6252.60 samples/sec Loss 2.5828 LearningRate 0.0000 Epoch: 38 Global Step: 790680 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:22,204-Speed 6298.97 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 790690 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:25,490-Speed 6235.31 samples/sec Loss 2.5928 LearningRate 0.0000 Epoch: 38 Global Step: 790700 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:28,734-Speed 6314.19 samples/sec Loss 2.5133 LearningRate 0.0000 Epoch: 38 Global Step: 790710 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:31,991-Speed 6288.73 samples/sec Loss 2.5918 LearningRate 0.0000 Epoch: 38 Global Step: 790720 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:35,241-Speed 6303.80 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 38 Global Step: 790730 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:38,497-Speed 6291.49 samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 38 Global Step: 790740 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:41,742-Speed 6311.26 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 38 Global Step: 790750 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:44,995-Speed 6298.10 samples/sec Loss 2.5963 LearningRate 0.0000 Epoch: 38 Global Step: 790760 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:48,251-Speed 6291.07 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 38 Global Step: 790770 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:51,504-Speed 6296.20 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 38 Global Step: 790780 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:54,762-Speed 6288.98 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 38 Global Step: 790790 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:41:58,020-Speed 6286.07 
samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 790800 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:01,259-Speed 6324.03 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 38 Global Step: 790810 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:04,510-Speed 6301.63 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 38 Global Step: 790820 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:07,766-Speed 6290.49 samples/sec Loss 2.5615 LearningRate 0.0000 Epoch: 38 Global Step: 790830 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:11,024-Speed 6288.39 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 38 Global Step: 790840 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:14,280-Speed 6292.03 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 38 Global Step: 790850 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:17,534-Speed 6293.44 samples/sec Loss 2.5939 LearningRate 0.0000 Epoch: 38 Global Step: 790860 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:20,806-Speed 6261.98 samples/sec Loss 2.5921 LearningRate 0.0000 Epoch: 38 Global Step: 790870 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:24,066-Speed 6282.86 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 38 Global Step: 790880 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:27,360-Speed 6218.62 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 38 Global Step: 790890 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:30,636-Speed 6254.29 samples/sec Loss 2.6098 LearningRate 0.0000 Epoch: 38 Global Step: 790900 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:33,910-Speed 6257.71 samples/sec Loss 2.6234 LearningRate 0.0000 Epoch: 38 Global Step: 790910 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:37,186-Speed 6252.69 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 38 
Global Step: 790920 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:40,435-Speed 6303.96 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 38 Global Step: 790930 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:43,684-Speed 6306.20 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 38 Global Step: 790940 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:46,941-Speed 6288.70 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 38 Global Step: 790950 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:50,198-Speed 6288.59 samples/sec Loss 2.5581 LearningRate 0.0000 Epoch: 38 Global Step: 790960 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:53,529-Speed 6151.00 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 790970 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:42:56,791-Speed 6279.86 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 38 Global Step: 790980 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:00,041-Speed 6302.89 samples/sec Loss 2.5868 LearningRate 0.0000 Epoch: 38 Global Step: 790990 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:03,300-Speed 6284.16 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 38 Global Step: 791000 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:06,543-Speed 6316.68 samples/sec Loss 2.5362 LearningRate 0.0000 Epoch: 38 Global Step: 791010 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:09,798-Speed 6293.17 samples/sec Loss 2.5982 LearningRate 0.0000 Epoch: 38 Global Step: 791020 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:13,052-Speed 6295.69 samples/sec Loss 2.5463 LearningRate 0.0000 Epoch: 38 Global Step: 791030 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:16,309-Speed 6288.22 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 38 Global Step: 791040 Fp16 Grad Scale: 4096 Required: 4 
hours Training: 2022-04-03 15:43:19,567-Speed 6288.78 samples/sec Loss 2.5703 LearningRate 0.0000 Epoch: 38 Global Step: 791050 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:22,823-Speed 6290.55 samples/sec Loss 2.5482 LearningRate 0.0000 Epoch: 38 Global Step: 791060 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:26,102-Speed 6246.72 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 38 Global Step: 791070 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:29,358-Speed 6291.41 samples/sec Loss 2.5313 LearningRate 0.0000 Epoch: 38 Global Step: 791080 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:32,614-Speed 6291.06 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 38 Global Step: 791090 Fp16 Grad Scale: 4096 Required: 4 hours Training: 2022-04-03 15:43:35,871-Speed 6289.48 samples/sec Loss 2.5643 LearningRate 0.0000 Epoch: 38 Global Step: 791100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:39,104-Speed 6337.03 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 38 Global Step: 791110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:42,364-Speed 6285.02 samples/sec Loss 2.5719 LearningRate 0.0000 Epoch: 38 Global Step: 791120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:45,630-Speed 6272.10 samples/sec Loss 2.5810 LearningRate 0.0000 Epoch: 38 Global Step: 791130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:48,891-Speed 6280.41 samples/sec Loss 2.5603 LearningRate 0.0000 Epoch: 38 Global Step: 791140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:52,150-Speed 6285.95 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 38 Global Step: 791150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:55,407-Speed 6289.36 samples/sec Loss 2.5771 LearningRate 0.0000 Epoch: 38 Global Step: 791160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:43:58,665-Speed 6288.44 
samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 38 Global Step: 791170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:01,918-Speed 6297.22 samples/sec Loss 2.5843 LearningRate 0.0000 Epoch: 38 Global Step: 791180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:05,167-Speed 6303.32 samples/sec Loss 2.5637 LearningRate 0.0000 Epoch: 38 Global Step: 791190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:08,426-Speed 6286.04 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 38 Global Step: 791200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:11,684-Speed 6288.27 samples/sec Loss 2.5848 LearningRate 0.0000 Epoch: 38 Global Step: 791210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:44:14,919-Speed 6331.91 samples/sec Loss 2.5881 LearningRate 0.0000 Epoch: 38 Global Step: 791220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:18,176-Speed 6289.58 samples/sec Loss 2.6187 LearningRate 0.0000 Epoch: 38 Global Step: 791230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:21,437-Speed 6280.60 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 38 Global Step: 791240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:24,695-Speed 6288.29 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 38 Global Step: 791250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:27,973-Speed 6249.07 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 38 Global Step: 791260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:31,232-Speed 6284.42 samples/sec Loss 2.5486 LearningRate 0.0000 Epoch: 38 Global Step: 791270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:34,481-Speed 6305.54 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 38 Global Step: 791280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:37,734-Speed 6297.03 samples/sec Loss 2.5761 LearningRate 0.0000 Epoch: 38 
Global Step: 791290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:40,990-Speed 6290.44 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 38 Global Step: 791300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:44,263-Speed 6260.28 samples/sec Loss 2.5897 LearningRate 0.0000 Epoch: 38 Global Step: 791310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:47,506-Speed 6316.66 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 38 Global Step: 791320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:50,757-Speed 6301.04 samples/sec Loss 2.6174 LearningRate 0.0000 Epoch: 38 Global Step: 791330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:54,012-Speed 6292.86 samples/sec Loss 2.5593 LearningRate 0.0000 Epoch: 38 Global Step: 791340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:44:57,251-Speed 6325.10 samples/sec Loss 2.5960 LearningRate 0.0000 Epoch: 38 Global Step: 791350 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:00,511-Speed 6282.68 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 791360 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:03,777-Speed 6272.15 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 38 Global Step: 791370 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:07,028-Speed 6301.79 samples/sec Loss 2.6076 LearningRate 0.0000 Epoch: 38 Global Step: 791380 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:10,289-Speed 6282.33 samples/sec Loss 2.5637 LearningRate 0.0000 Epoch: 38 Global Step: 791390 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:13,544-Speed 6293.13 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 38 Global Step: 791400 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:16,794-Speed 6302.64 samples/sec Loss 2.5376 LearningRate 0.0000 Epoch: 38 Global Step: 791410 Fp16 Grad Scale: 2048 Required: 3 
hours Training: 2022-04-03 15:45:20,044-Speed 6302.43 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 791420 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:23,303-Speed 6285.26 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 38 Global Step: 791430 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:26,573-Speed 6264.18 samples/sec Loss 2.5352 LearningRate 0.0000 Epoch: 38 Global Step: 791440 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 15:45:29,824-Speed 6300.98 samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 38 Global Step: 791450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:33,094-Speed 6264.72 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 38 Global Step: 791460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:36,350-Speed 6291.00 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 38 Global Step: 791470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:39,619-Speed 6266.75 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 38 Global Step: 791480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:42,875-Speed 6292.40 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 38 Global Step: 791490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:46,132-Speed 6287.79 samples/sec Loss 2.5463 LearningRate 0.0000 Epoch: 38 Global Step: 791500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:49,397-Speed 6275.22 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 38 Global Step: 791510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:52,653-Speed 6292.33 samples/sec Loss 2.5344 LearningRate 0.0000 Epoch: 38 Global Step: 791520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:55,920-Speed 6269.99 samples/sec Loss 2.5111 LearningRate 0.0000 Epoch: 38 Global Step: 791530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:45:59,176-Speed 6290.67 
samples/sec Loss 2.5265 LearningRate 0.0000 Epoch: 38 Global Step: 791540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:02,419-Speed 6316.61 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 38 Global Step: 791550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:05,681-Speed 6280.23 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 38 Global Step: 791560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:08,941-Speed 6284.27 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 791570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:12,199-Speed 6285.83 samples/sec Loss 2.5591 LearningRate 0.0000 Epoch: 38 Global Step: 791580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:15,458-Speed 6286.38 samples/sec Loss 2.5202 LearningRate 0.0000 Epoch: 38 Global Step: 791590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:18,730-Speed 6260.76 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 38 Global Step: 791600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:21,984-Speed 6295.61 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 791610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:25,232-Speed 6305.10 samples/sec Loss 2.5910 LearningRate 0.0000 Epoch: 38 Global Step: 791620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:28,492-Speed 6283.83 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 38 Global Step: 791630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:31,746-Speed 6296.52 samples/sec Loss 2.4966 LearningRate 0.0000 Epoch: 38 Global Step: 791640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:34,981-Speed 6331.13 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 38 Global Step: 791650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:38,234-Speed 6298.33 samples/sec Loss 2.6357 LearningRate 0.0000 Epoch: 38 
Global Step: 791660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:41,485-Speed 6300.38 samples/sec Loss 2.6351 LearningRate 0.0000 Epoch: 38 Global Step: 791670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:44,738-Speed 6295.80 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 38 Global Step: 791680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:47,989-Speed 6301.72 samples/sec Loss 2.6008 LearningRate 0.0000 Epoch: 38 Global Step: 791690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:51,251-Speed 6279.55 samples/sec Loss 2.6139 LearningRate 0.0000 Epoch: 38 Global Step: 791700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:54,519-Speed 6269.70 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 38 Global Step: 791710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:46:57,785-Speed 6270.87 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 Global Step: 791720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:01,042-Speed 6291.08 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 38 Global Step: 791730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:04,297-Speed 6292.25 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 791740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:07,538-Speed 6320.79 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 38 Global Step: 791750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:10,799-Speed 6282.23 samples/sec Loss 2.5677 LearningRate 0.0000 Epoch: 38 Global Step: 791760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:14,054-Speed 6293.14 samples/sec Loss 2.5513 LearningRate 0.0000 Epoch: 38 Global Step: 791770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:17,310-Speed 6290.55 samples/sec Loss 2.5778 LearningRate 0.0000 Epoch: 38 Global Step: 791780 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:47:20,564-Speed 6294.39 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 38 Global Step: 791790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:23,830-Speed 6272.30 samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 38 Global Step: 791800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:27,081-Speed 6302.01 samples/sec Loss 2.5384 LearningRate 0.0000 Epoch: 38 Global Step: 791810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:30,331-Speed 6301.99 samples/sec Loss 2.5877 LearningRate 0.0000 Epoch: 38 Global Step: 791820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:33,602-Speed 6262.53 samples/sec Loss 2.5497 LearningRate 0.0000 Epoch: 38 Global Step: 791830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:36,855-Speed 6297.19 samples/sec Loss 2.6050 LearningRate 0.0000 Epoch: 38 Global Step: 791840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:40,111-Speed 6291.81 samples/sec Loss 2.5248 LearningRate 0.0000 Epoch: 38 Global Step: 791850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:47:43,362-Speed 6300.87 samples/sec Loss 2.5756 LearningRate 0.0000 Epoch: 38 Global Step: 791860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:46,657-Speed 6215.66 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 38 Global Step: 791870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:49,917-Speed 6285.44 samples/sec Loss 2.6168 LearningRate 0.0000 Epoch: 38 Global Step: 791880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:53,176-Speed 6284.63 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 38 Global Step: 791890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:56,438-Speed 6279.20 samples/sec Loss 2.5810 LearningRate 0.0000 Epoch: 38 Global Step: 791900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:47:59,703-Speed 6275.02 
samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 38 Global Step: 791910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:02,958-Speed 6292.30 samples/sec Loss 2.5195 LearningRate 0.0000 Epoch: 38 Global Step: 791920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:06,211-Speed 6297.84 samples/sec Loss 2.5021 LearningRate 0.0000 Epoch: 38 Global Step: 791930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:09,468-Speed 6289.49 samples/sec Loss 2.5402 LearningRate 0.0000 Epoch: 38 Global Step: 791940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:12,734-Speed 6272.34 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 38 Global Step: 791950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:15,976-Speed 6317.84 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 791960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:19,243-Speed 6271.11 samples/sec Loss 2.5962 LearningRate 0.0000 Epoch: 38 Global Step: 791970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:22,497-Speed 6295.80 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 38 Global Step: 791980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:25,763-Speed 6270.75 samples/sec Loss 2.5809 LearningRate 0.0000 Epoch: 38 Global Step: 791990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:29,020-Speed 6290.17 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 38 Global Step: 792000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:32,292-Speed 6260.06 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 38 Global Step: 792010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:35,546-Speed 6296.01 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 38 Global Step: 792020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:38,803-Speed 6289.61 samples/sec Loss 2.5886 LearningRate 0.0000 Epoch: 38 
Global Step: 792030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:42,056-Speed 6296.56 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 38 Global Step: 792040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:45,308-Speed 6297.71 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 38 Global Step: 792050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:48,548-Speed 6323.37 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 38 Global Step: 792060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:51,808-Speed 6283.44 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 38 Global Step: 792070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:55,067-Speed 6286.41 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 38 Global Step: 792080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:48:58,325-Speed 6286.20 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 38 Global Step: 792090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:01,580-Speed 6293.92 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 38 Global Step: 792100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:04,834-Speed 6294.89 samples/sec Loss 2.5490 LearningRate 0.0000 Epoch: 38 Global Step: 792110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:08,090-Speed 6293.00 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 38 Global Step: 792120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:11,344-Speed 6293.96 samples/sec Loss 2.6090 LearningRate 0.0000 Epoch: 38 Global Step: 792130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:14,600-Speed 6291.04 samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 38 Global Step: 792140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:17,862-Speed 6280.96 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 792150 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:49:21,103-Speed 6319.35 samples/sec Loss 2.5668 LearningRate 0.0000 Epoch: 38 Global Step: 792160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:24,357-Speed 6296.82 samples/sec Loss 2.4664 LearningRate 0.0000 Epoch: 38 Global Step: 792170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:27,622-Speed 6273.35 samples/sec Loss 2.5926 LearningRate 0.0000 Epoch: 38 Global Step: 792180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:30,880-Speed 6287.42 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 792190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:34,140-Speed 6283.88 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 38 Global Step: 792200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:37,397-Speed 6288.84 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 38 Global Step: 792210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:40,650-Speed 6296.86 samples/sec Loss 2.5241 LearningRate 0.0000 Epoch: 38 Global Step: 792220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:43,903-Speed 6298.39 samples/sec Loss 2.5711 LearningRate 0.0000 Epoch: 38 Global Step: 792230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:47,158-Speed 6293.31 samples/sec Loss 2.5551 LearningRate 0.0000 Epoch: 38 Global Step: 792240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:50,412-Speed 6293.51 samples/sec Loss 2.5980 LearningRate 0.0000 Epoch: 38 Global Step: 792250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:49:53,666-Speed 6295.77 samples/sec Loss 2.5463 LearningRate 0.0000 Epoch: 38 Global Step: 792260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:49:56,906-Speed 6322.49 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 38 Global Step: 792270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:00,174-Speed 6267.40 
samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 38 Global Step: 792280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:03,430-Speed 6291.23 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 38 Global Step: 792290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:06,697-Speed 6270.83 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 38 Global Step: 792300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:09,957-Speed 6283.79 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 38 Global Step: 792310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:13,208-Speed 6300.48 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 38 Global Step: 792320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:16,472-Speed 6278.06 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 38 Global Step: 792330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:19,723-Speed 6300.75 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 38 Global Step: 792340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:22,977-Speed 6294.44 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 792350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:26,227-Speed 6302.28 samples/sec Loss 2.5497 LearningRate 0.0000 Epoch: 38 Global Step: 792360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:29,460-Speed 6337.39 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 38 Global Step: 792370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:32,719-Speed 6285.37 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 792380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:35,965-Speed 6310.71 samples/sec Loss 2.6098 LearningRate 0.0000 Epoch: 38 Global Step: 792390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:39,221-Speed 6290.10 samples/sec Loss 2.6148 LearningRate 0.0000 Epoch: 38 
Global Step: 792400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:42,474-Speed 6297.77 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 38 Global Step: 792410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:45,728-Speed 6295.20 samples/sec Loss 2.6015 LearningRate 0.0000 Epoch: 38 Global Step: 792420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:48,985-Speed 6289.95 samples/sec Loss 2.5318 LearningRate 0.0000 Epoch: 38 Global Step: 792430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:52,237-Speed 6297.60 samples/sec Loss 2.5523 LearningRate 0.0000 Epoch: 38 Global Step: 792440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:55,498-Speed 6281.75 samples/sec Loss 2.5048 LearningRate 0.0000 Epoch: 38 Global Step: 792450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:50:58,751-Speed 6297.66 samples/sec Loss 2.5154 LearningRate 0.0000 Epoch: 38 Global Step: 792460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:02,013-Speed 6280.17 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 38 Global Step: 792470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:51:05,257-Speed 6315.03 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 38 Global Step: 792480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:08,517-Speed 6282.09 samples/sec Loss 2.5533 LearningRate 0.0000 Epoch: 38 Global Step: 792490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:11,774-Speed 6288.87 samples/sec Loss 2.5723 LearningRate 0.0000 Epoch: 38 Global Step: 792500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:15,032-Speed 6289.83 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 38 Global Step: 792510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:18,286-Speed 6294.53 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 38 Global Step: 792520 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:51:21,540-Speed 6295.86 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 38 Global Step: 792530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:24,809-Speed 6267.24 samples/sec Loss 2.5575 LearningRate 0.0000 Epoch: 38 Global Step: 792540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:28,077-Speed 6266.53 samples/sec Loss 2.5538 LearningRate 0.0000 Epoch: 38 Global Step: 792550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:31,330-Speed 6297.53 samples/sec Loss 2.6106 LearningRate 0.0000 Epoch: 38 Global Step: 792560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:34,584-Speed 6295.84 samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 38 Global Step: 792570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:37,821-Speed 6327.03 samples/sec Loss 2.5618 LearningRate 0.0000 Epoch: 38 Global Step: 792580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:41,076-Speed 6293.31 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 38 Global Step: 792590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:44,331-Speed 6293.79 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 38 Global Step: 792600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:47,589-Speed 6288.26 samples/sec Loss 2.4888 LearningRate 0.0000 Epoch: 38 Global Step: 792610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:50,847-Speed 6286.27 samples/sec Loss 2.5361 LearningRate 0.0000 Epoch: 38 Global Step: 792620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:54,115-Speed 6269.30 samples/sec Loss 2.5402 LearningRate 0.0000 Epoch: 38 Global Step: 792630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:51:57,373-Speed 6287.40 samples/sec Loss 2.5734 LearningRate 0.0000 Epoch: 38 Global Step: 792640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:00,638-Speed 6274.07 
samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 38 Global Step: 792650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:03,901-Speed 6276.81 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 38 Global Step: 792660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:07,154-Speed 6297.90 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 38 Global Step: 792670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:10,386-Speed 6338.22 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 38 Global Step: 792680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:13,646-Speed 6282.32 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 38 Global Step: 792690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:16,962-Speed 6178.27 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 38 Global Step: 792700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:20,219-Speed 6290.65 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 38 Global Step: 792710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:23,478-Speed 6286.09 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 38 Global Step: 792720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:26,732-Speed 6294.02 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 38 Global Step: 792730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:29,989-Speed 6289.68 samples/sec Loss 2.5805 LearningRate 0.0000 Epoch: 38 Global Step: 792740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:33,238-Speed 6305.51 samples/sec Loss 2.5507 LearningRate 0.0000 Epoch: 38 Global Step: 792750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:36,490-Speed 6297.79 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 38 Global Step: 792760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:39,749-Speed 6285.81 samples/sec Loss 2.5639 LearningRate 0.0000 Epoch: 38 
Global Step: 792770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:43,000-Speed 6300.46 samples/sec Loss 2.5492 LearningRate 0.0000 Epoch: 38 Global Step: 792780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:52:46,234-Speed 6336.05 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 38 Global Step: 792790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:49,490-Speed 6290.18 samples/sec Loss 2.5706 LearningRate 0.0000 Epoch: 38 Global Step: 792800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:52,744-Speed 6295.01 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 38 Global Step: 792810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:56,003-Speed 6285.72 samples/sec Loss 2.6151 LearningRate 0.0000 Epoch: 38 Global Step: 792820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:52:59,253-Speed 6303.58 samples/sec Loss 2.5635 LearningRate 0.0000 Epoch: 38 Global Step: 792830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:02,508-Speed 6293.21 samples/sec Loss 2.6219 LearningRate 0.0000 Epoch: 38 Global Step: 792840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:05,754-Speed 6309.81 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 38 Global Step: 792850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:09,011-Speed 6288.92 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 792860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:12,273-Speed 6281.09 samples/sec Loss 2.5479 LearningRate 0.0000 Epoch: 38 Global Step: 792870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:15,523-Speed 6303.18 samples/sec Loss 2.5505 LearningRate 0.0000 Epoch: 38 Global Step: 792880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:18,759-Speed 6328.57 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 792890 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:53:22,017-Speed 6289.76 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 38 Global Step: 792900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:25,273-Speed 6290.13 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 38 Global Step: 792910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:28,529-Speed 6292.19 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 38 Global Step: 792920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:31,789-Speed 6284.34 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 38 Global Step: 792930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:35,043-Speed 6295.30 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 38 Global Step: 792940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:38,294-Speed 6300.66 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 38 Global Step: 792950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:41,551-Speed 6289.72 samples/sec Loss 2.5454 LearningRate 0.0000 Epoch: 38 Global Step: 792960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:44,806-Speed 6291.76 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 38 Global Step: 792970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:48,060-Speed 6295.42 samples/sec Loss 2.6051 LearningRate 0.0000 Epoch: 38 Global Step: 792980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:51,305-Speed 6313.19 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 38 Global Step: 792990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:54,566-Speed 6280.99 samples/sec Loss 2.5316 LearningRate 0.0000 Epoch: 38 Global Step: 793000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:53:57,820-Speed 6296.19 samples/sec Loss 2.5932 LearningRate 0.0000 Epoch: 38 Global Step: 793010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:01,081-Speed 6280.20 
samples/sec Loss 2.5862 LearningRate 0.0000 Epoch: 38 Global Step: 793020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:04,335-Speed 6296.62 samples/sec Loss 2.5884 LearningRate 0.0000 Epoch: 38 Global Step: 793030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:07,597-Speed 6279.61 samples/sec Loss 2.5817 LearningRate 0.0000 Epoch: 38 Global Step: 793040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:10,848-Speed 6299.54 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 38 Global Step: 793050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:14,102-Speed 6295.00 samples/sec Loss 2.5656 LearningRate 0.0000 Epoch: 38 Global Step: 793060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:17,358-Speed 6292.02 samples/sec Loss 2.6149 LearningRate 0.0000 Epoch: 38 Global Step: 793070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:20,609-Speed 6300.81 samples/sec Loss 2.6295 LearningRate 0.0000 Epoch: 38 Global Step: 793080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:23,851-Speed 6319.54 samples/sec Loss 2.5887 LearningRate 0.0000 Epoch: 38 Global Step: 793090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:27,105-Speed 6295.14 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 38 Global Step: 793100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:30,357-Speed 6298.44 samples/sec Loss 2.5192 LearningRate 0.0000 Epoch: 38 Global Step: 793110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:33,613-Speed 6291.45 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 38 Global Step: 793120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:36,888-Speed 6256.84 samples/sec Loss 2.5581 LearningRate 0.0000 Epoch: 38 Global Step: 793130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:40,141-Speed 6296.13 samples/sec Loss 2.5663 LearningRate 0.0000 Epoch: 38 
Global Step: 793140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:43,387-Speed 6309.69 samples/sec Loss 2.5877 LearningRate 0.0000 Epoch: 38 Global Step: 793150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:46,651-Speed 6276.75 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 38 Global Step: 793160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:49,920-Speed 6267.34 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 793170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:53,175-Speed 6292.99 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 38 Global Step: 793180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:56,409-Speed 6334.56 samples/sec Loss 2.5109 LearningRate 0.0000 Epoch: 38 Global Step: 793190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:54:59,673-Speed 6276.01 samples/sec Loss 2.5146 LearningRate 0.0000 Epoch: 38 Global Step: 793200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:02,933-Speed 6282.06 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 38 Global Step: 793210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:06,191-Speed 6287.81 samples/sec Loss 2.5990 LearningRate 0.0000 Epoch: 38 Global Step: 793220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:09,447-Speed 6292.25 samples/sec Loss 2.5651 LearningRate 0.0000 Epoch: 38 Global Step: 793230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:12,697-Speed 6302.67 samples/sec Loss 2.5991 LearningRate 0.0000 Epoch: 38 Global Step: 793240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:15,950-Speed 6296.42 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 38 Global Step: 793250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:19,207-Speed 6289.03 samples/sec Loss 2.5529 LearningRate 0.0000 Epoch: 38 Global Step: 793260 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:55:22,458-Speed 6301.83 samples/sec Loss 2.5721 LearningRate 0.0000 Epoch: 38 Global Step: 793270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:25,716-Speed 6286.60 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 38 Global Step: 793280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:28,972-Speed 6291.02 samples/sec Loss 2.5655 LearningRate 0.0000 Epoch: 38 Global Step: 793290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:55:32,211-Speed 6325.50 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 38 Global Step: 793300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:35,466-Speed 6294.41 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 38 Global Step: 793310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:38,725-Speed 6285.12 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 38 Global Step: 793320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:41,990-Speed 6273.09 samples/sec Loss 2.5464 LearningRate 0.0000 Epoch: 38 Global Step: 793330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:45,243-Speed 6298.86 samples/sec Loss 2.5529 LearningRate 0.0000 Epoch: 38 Global Step: 793340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:48,497-Speed 6294.20 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 38 Global Step: 793350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:51,753-Speed 6290.95 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 38 Global Step: 793360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:55,082-Speed 6154.55 samples/sec Loss 2.5239 LearningRate 0.0000 Epoch: 38 Global Step: 793370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:55:58,334-Speed 6298.40 samples/sec Loss 2.5694 LearningRate 0.0000 Epoch: 38 Global Step: 793380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:01,595-Speed 6282.69 
samples/sec Loss 2.6173 LearningRate 0.0000 Epoch: 38 Global Step: 793390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:04,840-Speed 6312.26 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 38 Global Step: 793400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:08,098-Speed 6286.51 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 38 Global Step: 793410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:11,345-Speed 6308.75 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 38 Global Step: 793420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:14,599-Speed 6295.52 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 793430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:17,853-Speed 6295.94 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 38 Global Step: 793440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:21,103-Speed 6301.09 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 38 Global Step: 793450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:24,357-Speed 6295.06 samples/sec Loss 2.5899 LearningRate 0.0000 Epoch: 38 Global Step: 793460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:27,608-Speed 6301.39 samples/sec Loss 2.5108 LearningRate 0.0000 Epoch: 38 Global Step: 793470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:30,867-Speed 6286.09 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 38 Global Step: 793480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:34,120-Speed 6297.70 samples/sec Loss 2.5971 LearningRate 0.0000 Epoch: 38 Global Step: 793490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:37,357-Speed 6326.96 samples/sec Loss 2.5588 LearningRate 0.0000 Epoch: 38 Global Step: 793500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:40,613-Speed 6293.03 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 38 
Global Step: 793510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:43,872-Speed 6285.87 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 38 Global Step: 793520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:47,122-Speed 6302.09 samples/sec Loss 2.4967 LearningRate 0.0000 Epoch: 38 Global Step: 793530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:50,399-Speed 6252.49 samples/sec Loss 2.5334 LearningRate 0.0000 Epoch: 38 Global Step: 793540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:53,657-Speed 6287.24 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 38 Global Step: 793550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:56:56,909-Speed 6297.72 samples/sec Loss 2.5577 LearningRate 0.0000 Epoch: 38 Global Step: 793560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:00,165-Speed 6292.03 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 38 Global Step: 793570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:03,418-Speed 6296.69 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 38 Global Step: 793580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:06,678-Speed 6284.28 samples/sec Loss 2.5588 LearningRate 0.0000 Epoch: 38 Global Step: 793590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:09,914-Speed 6329.27 samples/sec Loss 2.5443 LearningRate 0.0000 Epoch: 38 Global Step: 793600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:13,169-Speed 6293.90 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 38 Global Step: 793610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:16,424-Speed 6293.08 samples/sec Loss 2.5177 LearningRate 0.0000 Epoch: 38 Global Step: 793620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:19,677-Speed 6297.65 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 38 Global Step: 793630 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:57:22,928-Speed 6300.05 samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 38 Global Step: 793640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:26,183-Speed 6294.85 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 38 Global Step: 793650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:29,437-Speed 6294.86 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 38 Global Step: 793660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:32,690-Speed 6295.99 samples/sec Loss 2.5814 LearningRate 0.0000 Epoch: 38 Global Step: 793670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:35,948-Speed 6287.99 samples/sec Loss 2.5934 LearningRate 0.0000 Epoch: 38 Global Step: 793680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:39,232-Speed 6237.46 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 38 Global Step: 793690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:42,489-Speed 6290.12 samples/sec Loss 2.5726 LearningRate 0.0000 Epoch: 38 Global Step: 793700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:57:45,730-Speed 6321.55 samples/sec Loss 2.5996 LearningRate 0.0000 Epoch: 38 Global Step: 793710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:48,983-Speed 6297.64 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 793720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:52,235-Speed 6297.35 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 793730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:55,498-Speed 6280.66 samples/sec Loss 2.6147 LearningRate 0.0000 Epoch: 38 Global Step: 793740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:57:58,748-Speed 6302.34 samples/sec Loss 2.5362 LearningRate 0.0000 Epoch: 38 Global Step: 793750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:02,006-Speed 6289.06 
samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 38 Global Step: 793760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:05,263-Speed 6287.77 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 38 Global Step: 793770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:08,520-Speed 6289.56 samples/sec Loss 2.4884 LearningRate 0.0000 Epoch: 38 Global Step: 793780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:11,780-Speed 6283.72 samples/sec Loss 2.5920 LearningRate 0.0000 Epoch: 38 Global Step: 793790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:15,042-Speed 6280.32 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 38 Global Step: 793800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:18,277-Speed 6330.90 samples/sec Loss 2.6132 LearningRate 0.0000 Epoch: 38 Global Step: 793810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:21,537-Speed 6283.62 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 38 Global Step: 793820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:24,788-Speed 6301.40 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 38 Global Step: 793830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:28,047-Speed 6285.96 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 793840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:31,299-Speed 6298.82 samples/sec Loss 2.5804 LearningRate 0.0000 Epoch: 38 Global Step: 793850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:34,556-Speed 6290.46 samples/sec Loss 2.5884 LearningRate 0.0000 Epoch: 38 Global Step: 793860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:37,813-Speed 6288.67 samples/sec Loss 2.4945 LearningRate 0.0000 Epoch: 38 Global Step: 793870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:41,068-Speed 6292.91 samples/sec Loss 2.5937 LearningRate 0.0000 Epoch: 38 
Global Step: 793880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:44,319-Speed 6301.06 samples/sec Loss 2.5471 LearningRate 0.0000 Epoch: 38 Global Step: 793890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:47,580-Speed 6281.42 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 38 Global Step: 793900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:50,827-Speed 6308.92 samples/sec Loss 2.5994 LearningRate 0.0000 Epoch: 38 Global Step: 793910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:54,085-Speed 6288.84 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 38 Global Step: 793920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:58:57,346-Speed 6281.39 samples/sec Loss 2.5335 LearningRate 0.0000 Epoch: 38 Global Step: 793930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:00,598-Speed 6299.00 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 38 Global Step: 793940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:03,859-Speed 6284.70 samples/sec Loss 2.5432 LearningRate 0.0000 Epoch: 38 Global Step: 793950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:07,117-Speed 6287.68 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 38 Global Step: 793960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:10,373-Speed 6290.77 samples/sec Loss 2.4976 LearningRate 0.0000 Epoch: 38 Global Step: 793970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:13,635-Speed 6280.91 samples/sec Loss 2.5768 LearningRate 0.0000 Epoch: 38 Global Step: 793980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:16,888-Speed 6295.85 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 38 Global Step: 793990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:20,142-Speed 6295.92 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 38 Global Step: 794000 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 15:59:23,440-Speed 6210.36 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 38 Global Step: 794010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:26,694-Speed 6298.99 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 794020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:29,957-Speed 6278.44 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 38 Global Step: 794030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:33,208-Speed 6300.22 samples/sec Loss 2.5492 LearningRate 0.0000 Epoch: 38 Global Step: 794040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:36,463-Speed 6292.57 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 38 Global Step: 794050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:39,717-Speed 6296.56 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 38 Global Step: 794060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:42,979-Speed 6278.37 samples/sec Loss 2.5663 LearningRate 0.0000 Epoch: 38 Global Step: 794070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:46,240-Speed 6282.58 samples/sec Loss 2.5852 LearningRate 0.0000 Epoch: 38 Global Step: 794080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:49,512-Speed 6259.76 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 38 Global Step: 794090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:52,768-Speed 6291.17 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 38 Global Step: 794100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 15:59:56,018-Speed 6304.33 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 38 Global Step: 794110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 15:59:59,263-Speed 6313.39 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 38 Global Step: 794120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:02,518-Speed 6292.65 
samples/sec Loss 2.5900 LearningRate 0.0000 Epoch: 38 Global Step: 794130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:05,773-Speed 6293.09 samples/sec Loss 2.5524 LearningRate 0.0000 Epoch: 38 Global Step: 794140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:09,029-Speed 6291.81 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 38 Global Step: 794150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:12,284-Speed 6292.76 samples/sec Loss 2.6041 LearningRate 0.0000 Epoch: 38 Global Step: 794160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:15,543-Speed 6286.54 samples/sec Loss 2.5784 LearningRate 0.0000 Epoch: 38 Global Step: 794170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:18,799-Speed 6291.53 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 38 Global Step: 794180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:22,054-Speed 6292.84 samples/sec Loss 2.5944 LearningRate 0.0000 Epoch: 38 Global Step: 794190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:25,308-Speed 6295.87 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 38 Global Step: 794200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:28,591-Speed 6239.71 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 38 Global Step: 794210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:31,842-Speed 6299.54 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 38 Global Step: 794220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:35,097-Speed 6293.62 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 38 Global Step: 794230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:38,408-Speed 6187.68 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 38 Global Step: 794240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:00:41,765-Speed 6101.66 samples/sec Loss 2.5539 LearningRate 0.0000 Epoch: 38 
Global Step: 794250 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:00:45,020-Speed 6293.12 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 38 Global Step: 794260 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:00:48,278-Speed 6287.48 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 38 Global Step: 794270 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:00:51,528-Speed 6301.71 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 38 Global Step: 794280 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:00:54,798-Speed 6265.83 samples/sec Loss 2.5911 LearningRate 0.0000 Epoch: 38 Global Step: 794290 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:00:58,053-Speed 6292.49 samples/sec Loss 2.5505 LearningRate 0.0000 Epoch: 38 Global Step: 794300 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:01:01,307-Speed 6295.17 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 38 Global Step: 794310 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:01:04,559-Speed 6299.90 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 794320 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:01:07,810-Speed 6301.70 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 794330 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:01:11,064-Speed 6295.12 samples/sec Loss 2.6269 LearningRate 0.0000 Epoch: 38 Global Step: 794340 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:01:14,322-Speed 6287.60 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 38 Global Step: 794350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:17,573-Speed 6299.86 samples/sec Loss 2.5853 LearningRate 0.0000 Epoch: 38 Global Step: 794360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:20,832-Speed 6286.23 samples/sec Loss 2.5945 LearningRate 0.0000 Epoch: 38 Global Step: 794370 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:01:24,091-Speed 6285.76 samples/sec Loss 2.5141 LearningRate 0.0000 Epoch: 38 Global Step: 794380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:27,355-Speed 6275.10 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 38 Global Step: 794390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:30,612-Speed 6290.72 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 794400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:33,867-Speed 6291.81 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 38 Global Step: 794410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:37,121-Speed 6296.12 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 38 Global Step: 794420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:40,370-Speed 6303.91 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 38 Global Step: 794430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:43,633-Speed 6277.68 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 Global Step: 794440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:46,884-Speed 6302.12 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 38 Global Step: 794450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:50,141-Speed 6288.42 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 794460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:53,395-Speed 6294.77 samples/sec Loss 2.5207 LearningRate 0.0000 Epoch: 38 Global Step: 794470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:56,654-Speed 6287.21 samples/sec Loss 2.5510 LearningRate 0.0000 Epoch: 38 Global Step: 794480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:01:59,913-Speed 6285.12 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 794490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:03,172-Speed 6285.62 
samples/sec Loss 2.5349 LearningRate 0.0000 Epoch: 38 Global Step: 794500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:06,436-Speed 6276.16 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 38 Global Step: 794510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:09,692-Speed 6291.47 samples/sec Loss 2.5377 LearningRate 0.0000 Epoch: 38 Global Step: 794520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:12,951-Speed 6284.67 samples/sec Loss 2.5377 LearningRate 0.0000 Epoch: 38 Global Step: 794530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:16,204-Speed 6297.32 samples/sec Loss 2.5589 LearningRate 0.0000 Epoch: 38 Global Step: 794540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:19,452-Speed 6307.40 samples/sec Loss 2.6035 LearningRate 0.0000 Epoch: 38 Global Step: 794550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:22,707-Speed 6292.61 samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 38 Global Step: 794560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:25,963-Speed 6291.99 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 38 Global Step: 794570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:29,220-Speed 6289.04 samples/sec Loss 2.5741 LearningRate 0.0000 Epoch: 38 Global Step: 794580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:32,484-Speed 6276.27 samples/sec Loss 2.6181 LearningRate 0.0000 Epoch: 38 Global Step: 794590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:35,742-Speed 6288.45 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 38 Global Step: 794600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:39,008-Speed 6271.12 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 38 Global Step: 794610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:42,258-Speed 6302.42 samples/sec Loss 2.5989 LearningRate 0.0000 Epoch: 38 
Global Step: 794620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:45,519-Speed 6282.76 samples/sec Loss 2.5919 LearningRate 0.0000 Epoch: 38 Global Step: 794630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:48,770-Speed 6300.49 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 38 Global Step: 794640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:52,007-Speed 6327.98 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 38 Global Step: 794650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:55,265-Speed 6287.01 samples/sec Loss 2.5139 LearningRate 0.0000 Epoch: 38 Global Step: 794660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:02:58,516-Speed 6301.73 samples/sec Loss 2.5429 LearningRate 0.0000 Epoch: 38 Global Step: 794670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:01,775-Speed 6285.01 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 38 Global Step: 794680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:05,028-Speed 6297.42 samples/sec Loss 2.5956 LearningRate 0.0000 Epoch: 38 Global Step: 794690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:08,288-Speed 6283.05 samples/sec Loss 2.5757 LearningRate 0.0000 Epoch: 38 Global Step: 794700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:11,547-Speed 6286.87 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 38 Global Step: 794710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:14,813-Speed 6272.36 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 38 Global Step: 794720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:18,075-Speed 6280.03 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 38 Global Step: 794730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:21,325-Speed 6302.00 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 38 Global Step: 794740 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:03:24,563-Speed 6327.40 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 38 Global Step: 794750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:27,817-Speed 6294.85 samples/sec Loss 2.5192 LearningRate 0.0000 Epoch: 38 Global Step: 794760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:31,064-Speed 6309.59 samples/sec Loss 2.4987 LearningRate 0.0000 Epoch: 38 Global Step: 794770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:34,319-Speed 6292.86 samples/sec Loss 2.5257 LearningRate 0.0000 Epoch: 38 Global Step: 794780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:37,578-Speed 6285.90 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 38 Global Step: 794790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:40,834-Speed 6290.89 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 38 Global Step: 794800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:44,082-Speed 6307.24 samples/sec Loss 2.5755 LearningRate 0.0000 Epoch: 38 Global Step: 794810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:47,348-Speed 6271.03 samples/sec Loss 2.4872 LearningRate 0.0000 Epoch: 38 Global Step: 794820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:50,608-Speed 6284.22 samples/sec Loss 2.6034 LearningRate 0.0000 Epoch: 38 Global Step: 794830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:53,859-Speed 6301.25 samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 38 Global Step: 794840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:03:57,099-Speed 6321.32 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 38 Global Step: 794850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:00,352-Speed 6296.46 samples/sec Loss 2.5639 LearningRate 0.0000 Epoch: 38 Global Step: 794860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:03,609-Speed 6289.67 
samples/sec Loss 2.5703 LearningRate 0.0000 Epoch: 38 Global Step: 794870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:06,868-Speed 6286.44 samples/sec Loss 2.5403 LearningRate 0.0000 Epoch: 38 Global Step: 794880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:10,123-Speed 6294.18 samples/sec Loss 2.5527 LearningRate 0.0000 Epoch: 38 Global Step: 794890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:13,387-Speed 6273.92 samples/sec Loss 2.5010 LearningRate 0.0000 Epoch: 38 Global Step: 794900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:16,645-Speed 6288.45 samples/sec Loss 2.5470 LearningRate 0.0000 Epoch: 38 Global Step: 794910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:19,902-Speed 6290.21 samples/sec Loss 2.5163 LearningRate 0.0000 Epoch: 38 Global Step: 794920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:23,153-Speed 6301.54 samples/sec Loss 2.6399 LearningRate 0.0000 Epoch: 38 Global Step: 794930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:26,400-Speed 6307.57 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 794940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:29,655-Speed 6293.50 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 794950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:32,911-Speed 6292.76 samples/sec Loss 2.5711 LearningRate 0.0000 Epoch: 38 Global Step: 794960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:36,170-Speed 6285.04 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 38 Global Step: 794970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:39,433-Speed 6277.86 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 794980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:42,698-Speed 6274.59 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 38 
Global Step: 794990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:45,944-Speed 6309.39 samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 38 Global Step: 795000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:49,199-Speed 6293.63 samples/sec Loss 2.5213 LearningRate 0.0000 Epoch: 38 Global Step: 795010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:52,457-Speed 6287.52 samples/sec Loss 2.5334 LearningRate 0.0000 Epoch: 38 Global Step: 795020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:55,719-Speed 6279.64 samples/sec Loss 2.5876 LearningRate 0.0000 Epoch: 38 Global Step: 795030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:04:58,973-Speed 6295.28 samples/sec Loss 2.5292 LearningRate 0.0000 Epoch: 38 Global Step: 795040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:02,227-Speed 6294.83 samples/sec Loss 2.5766 LearningRate 0.0000 Epoch: 38 Global Step: 795050 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:05:05,475-Speed 6308.38 samples/sec Loss 2.5023 LearningRate 0.0000 Epoch: 38 Global Step: 795060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:08,735-Speed 6282.70 samples/sec Loss 2.5973 LearningRate 0.0000 Epoch: 38 Global Step: 795070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:11,988-Speed 6296.41 samples/sec Loss 2.5425 LearningRate 0.0000 Epoch: 38 Global Step: 795080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:15,247-Speed 6285.72 samples/sec Loss 2.5661 LearningRate 0.0000 Epoch: 38 Global Step: 795090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:18,499-Speed 6300.10 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 38 Global Step: 795100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:21,747-Speed 6305.73 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 795110 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:05:25,005-Speed 6287.82 samples/sec Loss 2.5977 LearningRate 0.0000 Epoch: 38 Global Step: 795120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:28,259-Speed 6297.13 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 38 Global Step: 795130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:31,516-Speed 6289.40 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 38 Global Step: 795140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:34,765-Speed 6303.87 samples/sec Loss 2.5483 LearningRate 0.0000 Epoch: 38 Global Step: 795150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:38,013-Speed 6308.48 samples/sec Loss 2.5461 LearningRate 0.0000 Epoch: 38 Global Step: 795160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:41,261-Speed 6306.65 samples/sec Loss 2.5033 LearningRate 0.0000 Epoch: 38 Global Step: 795170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:44,511-Speed 6301.81 samples/sec Loss 2.5850 LearningRate 0.0000 Epoch: 38 Global Step: 795180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:47,767-Speed 6291.78 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 38 Global Step: 795190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:51,024-Speed 6288.51 samples/sec Loss 2.6003 LearningRate 0.0000 Epoch: 38 Global Step: 795200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:54,279-Speed 6293.06 samples/sec Loss 2.6026 LearningRate 0.0000 Epoch: 38 Global Step: 795210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:05:57,539-Speed 6284.85 samples/sec Loss 2.5970 LearningRate 0.0000 Epoch: 38 Global Step: 795220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:00,801-Speed 6279.84 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 795230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:04,062-Speed 6281.73 
samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 38 Global Step: 795240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:07,325-Speed 6278.34 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 38 Global Step: 795250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:10,567-Speed 6318.09 samples/sec Loss 2.5869 LearningRate 0.0000 Epoch: 38 Global Step: 795260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:13,831-Speed 6274.49 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 38 Global Step: 795270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:17,083-Speed 6299.42 samples/sec Loss 2.5709 LearningRate 0.0000 Epoch: 38 Global Step: 795280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:20,341-Speed 6287.34 samples/sec Loss 2.5610 LearningRate 0.0000 Epoch: 38 Global Step: 795290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:23,599-Speed 6288.76 samples/sec Loss 2.5551 LearningRate 0.0000 Epoch: 38 Global Step: 795300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:06:26,838-Speed 6323.85 samples/sec Loss 2.5165 LearningRate 0.0000 Epoch: 38 Global Step: 795310 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:30,097-Speed 6285.05 samples/sec Loss 2.5704 LearningRate 0.0000 Epoch: 38 Global Step: 795320 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:33,352-Speed 6294.73 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 38 Global Step: 795330 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:36,603-Speed 6301.47 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 38 Global Step: 795340 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:39,853-Speed 6302.59 samples/sec Loss 2.5119 LearningRate 0.0000 Epoch: 38 Global Step: 795350 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:43,113-Speed 6283.20 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 38 
Global Step: 795360 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:46,365-Speed 6298.08 samples/sec Loss 2.5927 LearningRate 0.0000 Epoch: 38 Global Step: 795370 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:49,615-Speed 6303.90 samples/sec Loss 2.5757 LearningRate 0.0000 Epoch: 38 Global Step: 795380 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:52,868-Speed 6298.06 samples/sec Loss 2.6194 LearningRate 0.0000 Epoch: 38 Global Step: 795390 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:56,126-Speed 6286.22 samples/sec Loss 2.6166 LearningRate 0.0000 Epoch: 38 Global Step: 795400 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:06:59,383-Speed 6288.90 samples/sec Loss 2.5585 LearningRate 0.0000 Epoch: 38 Global Step: 795410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:02,640-Speed 6290.42 samples/sec Loss 2.5156 LearningRate 0.0000 Epoch: 38 Global Step: 795420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:05,897-Speed 6289.50 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 38 Global Step: 795430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:09,151-Speed 6294.51 samples/sec Loss 2.5139 LearningRate 0.0000 Epoch: 38 Global Step: 795440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:12,406-Speed 6294.10 samples/sec Loss 2.5838 LearningRate 0.0000 Epoch: 38 Global Step: 795450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:15,663-Speed 6288.34 samples/sec Loss 2.5781 LearningRate 0.0000 Epoch: 38 Global Step: 795460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:18,931-Speed 6268.32 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 38 Global Step: 795470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:22,182-Speed 6300.83 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 38 Global Step: 795480 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:07:25,439-Speed 6290.38 samples/sec Loss 2.4852 LearningRate 0.0000 Epoch: 38 Global Step: 795490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:28,696-Speed 6288.88 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 38 Global Step: 795500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:31,938-Speed 6318.96 samples/sec Loss 2.5950 LearningRate 0.0000 Epoch: 38 Global Step: 795510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:35,206-Speed 6267.74 samples/sec Loss 2.5536 LearningRate 0.0000 Epoch: 38 Global Step: 795520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:38,476-Speed 6265.76 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 38 Global Step: 795530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:41,724-Speed 6306.68 samples/sec Loss 2.5522 LearningRate 0.0000 Epoch: 38 Global Step: 795540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:44,980-Speed 6290.26 samples/sec Loss 2.5215 LearningRate 0.0000 Epoch: 38 Global Step: 795550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:48,241-Speed 6282.53 samples/sec Loss 2.4953 LearningRate 0.0000 Epoch: 38 Global Step: 795560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:51,501-Speed 6284.22 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 38 Global Step: 795570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:54,760-Speed 6283.99 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 38 Global Step: 795580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:07:58,012-Speed 6299.37 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 38 Global Step: 795590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:01,273-Speed 6282.71 samples/sec Loss 2.5984 LearningRate 0.0000 Epoch: 38 Global Step: 795600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:04,528-Speed 6291.52 
samples/sec Loss 2.5902 LearningRate 0.0000 Epoch: 38 Global Step: 795610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:08:07,765-Speed 6328.54 samples/sec Loss 2.5683 LearningRate 0.0000 Epoch: 38 Global Step: 795620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:11,018-Speed 6297.70 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 795630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:14,270-Speed 6300.39 samples/sec Loss 2.5131 LearningRate 0.0000 Epoch: 38 Global Step: 795640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:17,522-Speed 6297.58 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 38 Global Step: 795650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:20,778-Speed 6291.69 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 38 Global Step: 795660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:24,044-Speed 6271.15 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 38 Global Step: 795670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:27,303-Speed 6286.78 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 38 Global Step: 795680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:30,555-Speed 6297.93 samples/sec Loss 2.5315 LearningRate 0.0000 Epoch: 38 Global Step: 795690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:33,818-Speed 6277.24 samples/sec Loss 2.5828 LearningRate 0.0000 Epoch: 38 Global Step: 795700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:37,079-Speed 6282.93 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 38 Global Step: 795710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:40,330-Speed 6301.85 samples/sec Loss 2.5250 LearningRate 0.0000 Epoch: 38 Global Step: 795720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:43,587-Speed 6288.90 samples/sec Loss 2.6369 LearningRate 0.0000 Epoch: 38 
Global Step: 795730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:46,851-Speed 6277.09 samples/sec Loss 2.5635 LearningRate 0.0000 Epoch: 38 Global Step: 795740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:50,107-Speed 6290.61 samples/sec Loss 2.5750 LearningRate 0.0000 Epoch: 38 Global Step: 795750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:53,358-Speed 6301.36 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 38 Global Step: 795760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:56,616-Speed 6287.87 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 38 Global Step: 795770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:08:59,874-Speed 6286.91 samples/sec Loss 2.6009 LearningRate 0.0000 Epoch: 38 Global Step: 795780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:03,128-Speed 6295.87 samples/sec Loss 2.5916 LearningRate 0.0000 Epoch: 38 Global Step: 795790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:06,382-Speed 6295.30 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 38 Global Step: 795800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:09,631-Speed 6304.89 samples/sec Loss 2.5258 LearningRate 0.0000 Epoch: 38 Global Step: 795810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:12,865-Speed 6333.75 samples/sec Loss 2.5589 LearningRate 0.0000 Epoch: 38 Global Step: 795820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:16,118-Speed 6296.57 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 38 Global Step: 795830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:19,369-Speed 6300.84 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 38 Global Step: 795840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:22,632-Speed 6277.25 samples/sec Loss 2.5510 LearningRate 0.0000 Epoch: 38 Global Step: 795850 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:09:25,882-Speed 6303.66 samples/sec Loss 2.5121 LearningRate 0.0000 Epoch: 38 Global Step: 795860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:29,150-Speed 6268.16 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 38 Global Step: 795870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:32,408-Speed 6288.00 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 38 Global Step: 795880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:35,679-Speed 6262.14 samples/sec Loss 2.4901 LearningRate 0.0000 Epoch: 38 Global Step: 795890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:38,929-Speed 6301.91 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 38 Global Step: 795900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:42,183-Speed 6295.00 samples/sec Loss 2.5734 LearningRate 0.0000 Epoch: 38 Global Step: 795910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:45,428-Speed 6315.01 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 38 Global Step: 795920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:48,689-Speed 6280.41 samples/sec Loss 2.6281 LearningRate 0.0000 Epoch: 38 Global Step: 795930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:51,948-Speed 6287.12 samples/sec Loss 2.5823 LearningRate 0.0000 Epoch: 38 Global Step: 795940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:55,197-Speed 6303.88 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 38 Global Step: 795950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:09:58,455-Speed 6287.76 samples/sec Loss 2.5796 LearningRate 0.0000 Epoch: 38 Global Step: 795960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:01,703-Speed 6306.69 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 795970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:04,957-Speed 6295.50 
samples/sec Loss 2.5681 LearningRate 0.0000 Epoch: 38 Global Step: 795980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:08,212-Speed 6292.86 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 795990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:11,462-Speed 6303.47 samples/sec Loss 2.5557 LearningRate 0.0000 Epoch: 38 Global Step: 796000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:14,711-Speed 6304.13 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 38 Global Step: 796010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:17,952-Speed 6321.86 samples/sec Loss 2.5651 LearningRate 0.0000 Epoch: 38 Global Step: 796020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:21,211-Speed 6284.77 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 38 Global Step: 796030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:24,470-Speed 6284.98 samples/sec Loss 2.5613 LearningRate 0.0000 Epoch: 38 Global Step: 796040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:27,727-Speed 6289.67 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 38 Global Step: 796050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:30,986-Speed 6285.17 samples/sec Loss 2.6151 LearningRate 0.0000 Epoch: 38 Global Step: 796060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:34,235-Speed 6304.53 samples/sec Loss 2.5550 LearningRate 0.0000 Epoch: 38 Global Step: 796070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:37,491-Speed 6291.28 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 38 Global Step: 796080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:40,738-Speed 6309.22 samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 38 Global Step: 796090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:43,992-Speed 6295.83 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 38 
Global Step: 796100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:47,247-Speed 6293.50 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 38 Global Step: 796110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:50,487-Speed 6321.84 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 38 Global Step: 796120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:53,739-Speed 6299.29 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 38 Global Step: 796130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:10:56,988-Speed 6305.54 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 38 Global Step: 796140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:00,244-Speed 6291.60 samples/sec Loss 2.5816 LearningRate 0.0000 Epoch: 38 Global Step: 796150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:03,505-Speed 6281.93 samples/sec Loss 2.5709 LearningRate 0.0000 Epoch: 38 Global Step: 796160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:06,760-Speed 6293.66 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 38 Global Step: 796170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:10,017-Speed 6289.62 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 38 Global Step: 796180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:13,273-Speed 6291.71 samples/sec Loss 2.5979 LearningRate 0.0000 Epoch: 38 Global Step: 796190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:16,530-Speed 6289.51 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 38 Global Step: 796200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:19,781-Speed 6299.71 samples/sec Loss 2.6016 LearningRate 0.0000 Epoch: 38 Global Step: 796210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:23,038-Speed 6289.65 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 38 Global Step: 796220 Fp16 Grad Scale: 8192 Required: 3 
hours Training: 2022-04-03 16:11:26,275-Speed 6327.43 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 38 Global Step: 796230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:29,538-Speed 6279.46 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 38 Global Step: 796240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:32,788-Speed 6302.57 samples/sec Loss 2.5530 LearningRate 0.0000 Epoch: 38 Global Step: 796250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:36,046-Speed 6287.16 samples/sec Loss 2.5166 LearningRate 0.0000 Epoch: 38 Global Step: 796260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:39,297-Speed 6301.39 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 38 Global Step: 796270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:42,553-Speed 6290.05 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 38 Global Step: 796280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:45,804-Speed 6301.91 samples/sec Loss 2.5120 LearningRate 0.0000 Epoch: 38 Global Step: 796290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:49,053-Speed 6303.50 samples/sec Loss 2.4979 LearningRate 0.0000 Epoch: 38 Global Step: 796300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:52,306-Speed 6297.64 samples/sec Loss 2.5861 LearningRate 0.0000 Epoch: 38 Global Step: 796310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:55,562-Speed 6292.31 samples/sec Loss 2.5532 LearningRate 0.0000 Epoch: 38 Global Step: 796320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:11:58,807-Speed 6311.76 samples/sec Loss 2.5723 LearningRate 0.0000 Epoch: 38 Global Step: 796330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:02,064-Speed 6289.94 samples/sec Loss 2.6287 LearningRate 0.0000 Epoch: 38 Global Step: 796340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:05,331-Speed 6270.33 
samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 38 Global Step: 796350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:08,590-Speed 6285.65 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 38 Global Step: 796360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:11,842-Speed 6300.01 samples/sec Loss 2.5073 LearningRate 0.0000 Epoch: 38 Global Step: 796370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:15,121-Speed 6247.31 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 38 Global Step: 796380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:18,379-Speed 6286.91 samples/sec Loss 2.5699 LearningRate 0.0000 Epoch: 38 Global Step: 796390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:21,636-Speed 6288.96 samples/sec Loss 2.5739 LearningRate 0.0000 Epoch: 38 Global Step: 796400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:24,894-Speed 6288.51 samples/sec Loss 2.5117 LearningRate 0.0000 Epoch: 38 Global Step: 796410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:28,154-Speed 6283.98 samples/sec Loss 2.6049 LearningRate 0.0000 Epoch: 38 Global Step: 796420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:31,394-Speed 6320.79 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 38 Global Step: 796430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:34,644-Speed 6302.43 samples/sec Loss 2.5634 LearningRate 0.0000 Epoch: 38 Global Step: 796440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:37,890-Speed 6312.74 samples/sec Loss 2.5219 LearningRate 0.0000 Epoch: 38 Global Step: 796450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:41,143-Speed 6295.66 samples/sec Loss 2.5794 LearningRate 0.0000 Epoch: 38 Global Step: 796460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:44,401-Speed 6288.42 samples/sec Loss 2.4767 LearningRate 0.0000 Epoch: 38 
Global Step: 796470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:47,647-Speed 6310.50 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 38 Global Step: 796480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:50,902-Speed 6293.19 samples/sec Loss 2.4992 LearningRate 0.0000 Epoch: 38 Global Step: 796490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:54,154-Speed 6298.09 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 38 Global Step: 796500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:12:57,406-Speed 6299.88 samples/sec Loss 2.5720 LearningRate 0.0000 Epoch: 38 Global Step: 796510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:00,663-Speed 6288.67 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 796520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:03,899-Speed 6330.66 samples/sec Loss 2.5761 LearningRate 0.0000 Epoch: 38 Global Step: 796530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:07,157-Speed 6287.58 samples/sec Loss 2.5457 LearningRate 0.0000 Epoch: 38 Global Step: 796540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:10,422-Speed 6274.53 samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 38 Global Step: 796550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:13,682-Speed 6284.67 samples/sec Loss 2.5348 LearningRate 0.0000 Epoch: 38 Global Step: 796560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:17,058-Speed 6066.53 samples/sec Loss 2.5797 LearningRate 0.0000 Epoch: 38 Global Step: 796570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:20,325-Speed 6270.04 samples/sec Loss 2.5732 LearningRate 0.0000 Epoch: 38 Global Step: 796580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:23,580-Speed 6293.66 samples/sec Loss 2.5219 LearningRate 0.0000 Epoch: 38 Global Step: 796590 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:13:26,831-Speed 6300.15 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 38 Global Step: 796600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:30,091-Speed 6284.74 samples/sec Loss 2.5621 LearningRate 0.0000 Epoch: 38 Global Step: 796610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:33,351-Speed 6282.42 samples/sec Loss 2.6106 LearningRate 0.0000 Epoch: 38 Global Step: 796620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:36,593-Speed 6319.17 samples/sec Loss 2.5102 LearningRate 0.0000 Epoch: 38 Global Step: 796630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:39,853-Speed 6283.23 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 38 Global Step: 796640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:43,110-Speed 6290.54 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 38 Global Step: 796650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:46,372-Speed 6279.19 samples/sec Loss 2.5817 LearningRate 0.0000 Epoch: 38 Global Step: 796660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:49,629-Speed 6289.42 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 38 Global Step: 796670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:52,881-Speed 6298.54 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 796680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:56,136-Speed 6294.53 samples/sec Loss 2.5853 LearningRate 0.0000 Epoch: 38 Global Step: 796690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:13:59,396-Speed 6282.04 samples/sec Loss 2.5491 LearningRate 0.0000 Epoch: 38 Global Step: 796700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:02,666-Speed 6264.73 samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 38 Global Step: 796710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:05,916-Speed 6303.85 
samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 38 Global Step: 796720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:09,173-Speed 6288.81 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 38 Global Step: 796730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:14:12,409-Speed 6330.27 samples/sec Loss 2.5483 LearningRate 0.0000 Epoch: 38 Global Step: 796740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:15,663-Speed 6296.83 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 38 Global Step: 796750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:18,921-Speed 6287.02 samples/sec Loss 2.5649 LearningRate 0.0000 Epoch: 38 Global Step: 796760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:22,182-Speed 6280.90 samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 38 Global Step: 796770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:25,431-Speed 6304.70 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 38 Global Step: 796780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:28,688-Speed 6288.82 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 38 Global Step: 796790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:31,945-Speed 6289.28 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 38 Global Step: 796800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:35,207-Speed 6280.48 samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 38 Global Step: 796810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:38,454-Speed 6309.63 samples/sec Loss 2.5740 LearningRate 0.0000 Epoch: 38 Global Step: 796820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:41,707-Speed 6296.85 samples/sec Loss 2.5926 LearningRate 0.0000 Epoch: 38 Global Step: 796830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:44,946-Speed 6324.92 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 38 
Global Step: 796840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:48,203-Speed 6288.35 samples/sec Loss 2.6036 LearningRate 0.0000 Epoch: 38 Global Step: 796850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:51,463-Speed 6282.79 samples/sec Loss 2.5740 LearningRate 0.0000 Epoch: 38 Global Step: 796860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:54,718-Speed 6294.05 samples/sec Loss 2.5757 LearningRate 0.0000 Epoch: 38 Global Step: 796870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:14:57,978-Speed 6283.09 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 38 Global Step: 796880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:01,234-Speed 6291.23 samples/sec Loss 2.5545 LearningRate 0.0000 Epoch: 38 Global Step: 796890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:04,494-Speed 6284.69 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 38 Global Step: 796900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:07,748-Speed 6294.51 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 38 Global Step: 796910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:11,007-Speed 6286.19 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 38 Global Step: 796920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:14,260-Speed 6297.19 samples/sec Loss 2.5443 LearningRate 0.0000 Epoch: 38 Global Step: 796930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:17,499-Speed 6325.33 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 38 Global Step: 796940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:20,749-Speed 6302.97 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 796950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:24,025-Speed 6251.28 samples/sec Loss 2.5851 LearningRate 0.0000 Epoch: 38 Global Step: 796960 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:15:27,276-Speed 6302.81 samples/sec Loss 2.5261 LearningRate 0.0000 Epoch: 38 Global Step: 796970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:30,538-Speed 6280.00 samples/sec Loss 2.6232 LearningRate 0.0000 Epoch: 38 Global Step: 796980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:33,789-Speed 6299.74 samples/sec Loss 2.5343 LearningRate 0.0000 Epoch: 38 Global Step: 796990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:37,051-Speed 6280.82 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 38 Global Step: 797000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:40,302-Speed 6299.38 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 38 Global Step: 797010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:43,556-Speed 6296.43 samples/sec Loss 2.4870 LearningRate 0.0000 Epoch: 38 Global Step: 797020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:46,802-Speed 6311.32 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 797030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:50,054-Speed 6300.95 samples/sec Loss 2.5140 LearningRate 0.0000 Epoch: 38 Global Step: 797040 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:15:53,292-Speed 6327.17 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 38 Global Step: 797050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:56,543-Speed 6299.97 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 797060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:15:59,813-Speed 6264.70 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 797070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:03,067-Speed 6295.85 samples/sec Loss 2.5893 LearningRate 0.0000 Epoch: 38 Global Step: 797080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:06,328-Speed 6281.52 
samples/sec Loss 2.5788 LearningRate 0.0000 Epoch: 38 Global Step: 797090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:09,573-Speed 6312.70 samples/sec Loss 2.5253 LearningRate 0.0000 Epoch: 38 Global Step: 797100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:12,831-Speed 6297.37 samples/sec Loss 2.5225 LearningRate 0.0000 Epoch: 38 Global Step: 797110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:16,093-Speed 6278.93 samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 38 Global Step: 797120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:19,346-Speed 6297.30 samples/sec Loss 2.6107 LearningRate 0.0000 Epoch: 38 Global Step: 797130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:22,605-Speed 6285.76 samples/sec Loss 2.5424 LearningRate 0.0000 Epoch: 38 Global Step: 797140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:25,843-Speed 6328.17 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 38 Global Step: 797150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:29,098-Speed 6291.70 samples/sec Loss 2.5787 LearningRate 0.0000 Epoch: 38 Global Step: 797160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:32,354-Speed 6291.34 samples/sec Loss 2.5974 LearningRate 0.0000 Epoch: 38 Global Step: 797170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:35,645-Speed 6224.59 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 38 Global Step: 797180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:38,931-Speed 6233.79 samples/sec Loss 2.5907 LearningRate 0.0000 Epoch: 38 Global Step: 797190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:42,181-Speed 6304.14 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 38 Global Step: 797200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:45,452-Speed 6262.50 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 
Global Step: 797210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:48,704-Speed 6297.53 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 797220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:51,958-Speed 6295.05 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 797230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:55,208-Speed 6303.77 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 38 Global Step: 797240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:16:58,450-Speed 6319.38 samples/sec Loss 2.5624 LearningRate 0.0000 Epoch: 38 Global Step: 797250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:01,706-Speed 6290.02 samples/sec Loss 2.5536 LearningRate 0.0000 Epoch: 38 Global Step: 797260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:04,966-Speed 6283.86 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 797270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:08,226-Speed 6284.58 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 38 Global Step: 797280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:11,482-Speed 6291.56 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 38 Global Step: 797290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:14,734-Speed 6297.44 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 38 Global Step: 797300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:17,996-Speed 6280.83 samples/sec Loss 2.5621 LearningRate 0.0000 Epoch: 38 Global Step: 797310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:21,309-Speed 6183.03 samples/sec Loss 2.5208 LearningRate 0.0000 Epoch: 38 Global Step: 797320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:24,592-Speed 6239.85 samples/sec Loss 2.5309 LearningRate 0.0000 Epoch: 38 Global Step: 797330 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:17:27,848-Speed 6292.73 samples/sec Loss 2.4906 LearningRate 0.0000 Epoch: 38 Global Step: 797340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:31,085-Speed 6328.01 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 38 Global Step: 797350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:34,334-Speed 6304.61 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 38 Global Step: 797360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:37,585-Speed 6302.78 samples/sec Loss 2.5313 LearningRate 0.0000 Epoch: 38 Global Step: 797370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:40,841-Speed 6290.66 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 38 Global Step: 797380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:44,097-Speed 6290.99 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 38 Global Step: 797390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:47,357-Speed 6283.94 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 38 Global Step: 797400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:50,606-Speed 6305.64 samples/sec Loss 2.5402 LearningRate 0.0000 Epoch: 38 Global Step: 797410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:53,858-Speed 6298.93 samples/sec Loss 2.5842 LearningRate 0.0000 Epoch: 38 Global Step: 797420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:17:57,117-Speed 6284.54 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 38 Global Step: 797430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:00,384-Speed 6269.81 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 38 Global Step: 797440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:03,622-Speed 6326.05 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 38 Global Step: 797450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:06,886-Speed 6278.92 
samples/sec Loss 2.5811 LearningRate 0.0000 Epoch: 38 Global Step: 797460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:10,140-Speed 6295.51 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 38 Global Step: 797470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:13,402-Speed 6280.39 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 38 Global Step: 797480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:16,656-Speed 6294.12 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 38 Global Step: 797490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:19,909-Speed 6297.59 samples/sec Loss 2.5600 LearningRate 0.0000 Epoch: 38 Global Step: 797500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:23,168-Speed 6285.12 samples/sec Loss 2.6258 LearningRate 0.0000 Epoch: 38 Global Step: 797510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:26,427-Speed 6285.35 samples/sec Loss 2.5612 LearningRate 0.0000 Epoch: 38 Global Step: 797520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:29,684-Speed 6289.69 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 38 Global Step: 797530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:32,936-Speed 6299.41 samples/sec Loss 2.5243 LearningRate 0.0000 Epoch: 38 Global Step: 797540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:36,170-Speed 6334.23 samples/sec Loss 2.4908 LearningRate 0.0000 Epoch: 38 Global Step: 797550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:39,425-Speed 6294.27 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 38 Global Step: 797560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:42,684-Speed 6285.85 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 38 Global Step: 797570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:45,939-Speed 6291.69 samples/sec Loss 2.5251 LearningRate 0.0000 Epoch: 38 
Global Step: 797580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:49,198-Speed 6287.67 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 797590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:52,455-Speed 6288.53 samples/sec Loss 2.5329 LearningRate 0.0000 Epoch: 38 Global Step: 797600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:55,712-Speed 6289.08 samples/sec Loss 2.5165 LearningRate 0.0000 Epoch: 38 Global Step: 797610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:18:58,968-Speed 6292.39 samples/sec Loss 2.4739 LearningRate 0.0000 Epoch: 38 Global Step: 797620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:02,222-Speed 6294.80 samples/sec Loss 2.5729 LearningRate 0.0000 Epoch: 38 Global Step: 797630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:05,490-Speed 6267.67 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 38 Global Step: 797640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:08,726-Speed 6330.09 samples/sec Loss 2.5806 LearningRate 0.0000 Epoch: 38 Global Step: 797650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:11,999-Speed 6259.24 samples/sec Loss 2.5908 LearningRate 0.0000 Epoch: 38 Global Step: 797660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:15,252-Speed 6297.25 samples/sec Loss 2.5701 LearningRate 0.0000 Epoch: 38 Global Step: 797670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:18,512-Speed 6282.86 samples/sec Loss 2.5161 LearningRate 0.0000 Epoch: 38 Global Step: 797680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:21,765-Speed 6296.51 samples/sec Loss 2.5687 LearningRate 0.0000 Epoch: 38 Global Step: 797690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:25,025-Speed 6284.14 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 38 Global Step: 797700 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:19:28,284-Speed 6286.76 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 38 Global Step: 797710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:31,536-Speed 6298.90 samples/sec Loss 2.6117 LearningRate 0.0000 Epoch: 38 Global Step: 797720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:34,788-Speed 6298.25 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 38 Global Step: 797730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:38,047-Speed 6286.35 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 38 Global Step: 797740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:41,284-Speed 6326.61 samples/sec Loss 2.4855 LearningRate 0.0000 Epoch: 38 Global Step: 797750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:44,537-Speed 6298.79 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 38 Global Step: 797760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:47,805-Speed 6267.68 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 797770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:51,076-Speed 6264.06 samples/sec Loss 2.5650 LearningRate 0.0000 Epoch: 38 Global Step: 797780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:54,327-Speed 6299.73 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 38 Global Step: 797790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:19:57,584-Speed 6289.65 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 797800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:00,854-Speed 6265.15 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 797810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:04,111-Speed 6287.40 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 38 Global Step: 797820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:07,364-Speed 6299.00 
samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 38 Global Step: 797830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:10,613-Speed 6304.02 samples/sec Loss 2.5015 LearningRate 0.0000 Epoch: 38 Global Step: 797840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:13,869-Speed 6291.40 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 797850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:20:17,118-Speed 6303.81 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 38 Global Step: 797860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:20,366-Speed 6307.16 samples/sec Loss 2.5487 LearningRate 0.0000 Epoch: 38 Global Step: 797870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:23,620-Speed 6295.39 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 38 Global Step: 797880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:26,873-Speed 6296.64 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 38 Global Step: 797890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:30,136-Speed 6280.15 samples/sec Loss 2.5846 LearningRate 0.0000 Epoch: 38 Global Step: 797900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:33,387-Speed 6300.84 samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 38 Global Step: 797910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:36,641-Speed 6294.88 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 38 Global Step: 797920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:39,898-Speed 6289.38 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 797930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:43,148-Speed 6303.56 samples/sec Loss 2.5594 LearningRate 0.0000 Epoch: 38 Global Step: 797940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:46,449-Speed 6206.42 samples/sec Loss 2.5802 LearningRate 0.0000 Epoch: 38 
Global Step: 797950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:49,693-Speed 6313.04 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 38 Global Step: 797960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:52,949-Speed 6291.50 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 38 Global Step: 797970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:56,201-Speed 6301.21 samples/sec Loss 2.6153 LearningRate 0.0000 Epoch: 38 Global Step: 797980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:20:59,462-Speed 6280.39 samples/sec Loss 2.5782 LearningRate 0.0000 Epoch: 38 Global Step: 797990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:02,714-Speed 6299.72 samples/sec Loss 2.5403 LearningRate 0.0000 Epoch: 38 Global Step: 798000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:05,966-Speed 6300.19 samples/sec Loss 2.5518 LearningRate 0.0000 Epoch: 38 Global Step: 798010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:09,223-Speed 6287.79 samples/sec Loss 2.4942 LearningRate 0.0000 Epoch: 38 Global Step: 798020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:12,478-Speed 6294.92 samples/sec Loss 2.6169 LearningRate 0.0000 Epoch: 38 Global Step: 798030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:15,734-Speed 6290.48 samples/sec Loss 2.6258 LearningRate 0.0000 Epoch: 38 Global Step: 798040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:18,995-Speed 6282.29 samples/sec Loss 2.4969 LearningRate 0.0000 Epoch: 38 Global Step: 798050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:22,231-Speed 6329.42 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 798060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:25,505-Speed 6256.54 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 38 Global Step: 798070 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:21:28,797-Speed 6223.41 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 38 Global Step: 798080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:32,051-Speed 6294.49 samples/sec Loss 2.5820 LearningRate 0.0000 Epoch: 38 Global Step: 798090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:35,307-Speed 6291.50 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 38 Global Step: 798100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:38,557-Speed 6302.51 samples/sec Loss 2.5453 LearningRate 0.0000 Epoch: 38 Global Step: 798110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:21:41,801-Speed 6314.34 samples/sec Loss 2.5460 LearningRate 0.0000 Epoch: 38 Global Step: 798120 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:21:45,060-Speed 6285.66 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 38 Global Step: 798130 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:21:48,328-Speed 6268.43 samples/sec Loss 2.5588 LearningRate 0.0000 Epoch: 38 Global Step: 798140 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:21:51,582-Speed 6295.15 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 38 Global Step: 798150 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:21:54,838-Speed 6291.25 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 38 Global Step: 798160 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:21:58,089-Speed 6302.00 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 38 Global Step: 798170 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:22:01,347-Speed 6286.56 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 38 Global Step: 798180 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:22:04,600-Speed 6297.20 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 38 Global Step: 798190 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:22:07,858-Speed 6287.14 
samples/sec Loss 2.5845 LearningRate 0.0000 Epoch: 38 Global Step: 798200 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:22:11,114-Speed 6291.95 samples/sec Loss 2.6055 LearningRate 0.0000 Epoch: 38 Global Step: 798210 Fp16 Grad Scale: 2048 Required: 3 hours Training: 2022-04-03 16:22:14,363-Speed 6305.57 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 38 Global Step: 798220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:17,618-Speed 6293.46 samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 38 Global Step: 798230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:20,880-Speed 6279.08 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 38 Global Step: 798240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:24,139-Speed 6286.68 samples/sec Loss 2.5999 LearningRate 0.0000 Epoch: 38 Global Step: 798250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:27,394-Speed 6294.00 samples/sec Loss 2.4907 LearningRate 0.0000 Epoch: 38 Global Step: 798260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:30,644-Speed 6301.90 samples/sec Loss 2.5882 LearningRate 0.0000 Epoch: 38 Global Step: 798270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:33,898-Speed 6295.90 samples/sec Loss 2.5067 LearningRate 0.0000 Epoch: 38 Global Step: 798280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:37,155-Speed 6289.81 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 38 Global Step: 798290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:40,411-Speed 6289.70 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 798300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:43,661-Speed 6302.96 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 38 Global Step: 798310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:46,898-Speed 6328.46 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 38 
Global Step: 798320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:50,154-Speed 6291.14 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 38 Global Step: 798330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:53,405-Speed 6301.51 samples/sec Loss 2.5432 LearningRate 0.0000 Epoch: 38 Global Step: 798340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:56,660-Speed 6292.49 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 38 Global Step: 798350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:22:59,917-Speed 6291.13 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 38 Global Step: 798360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:03,173-Speed 6290.32 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 798370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:06,434-Speed 6281.70 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 38 Global Step: 798380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:09,693-Speed 6285.22 samples/sec Loss 2.5694 LearningRate 0.0000 Epoch: 38 Global Step: 798390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:12,961-Speed 6268.56 samples/sec Loss 2.4866 LearningRate 0.0000 Epoch: 38 Global Step: 798400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:16,215-Speed 6294.28 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 38 Global Step: 798410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:19,457-Speed 6319.39 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 38 Global Step: 798420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:22,708-Speed 6301.40 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 38 Global Step: 798430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:25,960-Speed 6298.85 samples/sec Loss 2.5654 LearningRate 0.0000 Epoch: 38 Global Step: 798440 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:23:29,216-Speed 6291.10 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 38 Global Step: 798450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:32,473-Speed 6289.73 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 38 Global Step: 798460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:35,731-Speed 6288.64 samples/sec Loss 2.5078 LearningRate 0.0000 Epoch: 38 Global Step: 798470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:38,984-Speed 6296.58 samples/sec Loss 2.5163 LearningRate 0.0000 Epoch: 38 Global Step: 798480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:42,239-Speed 6292.89 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 38 Global Step: 798490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:45,493-Speed 6295.58 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 38 Global Step: 798500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:48,787-Speed 6218.31 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 38 Global Step: 798510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:52,038-Speed 6301.98 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 798520 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:23:55,303-Speed 6272.81 samples/sec Loss 2.5527 LearningRate 0.0000 Epoch: 38 Global Step: 798530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:23:58,561-Speed 6288.21 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 38 Global Step: 798540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:01,830-Speed 6266.50 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 38 Global Step: 798550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:05,082-Speed 6298.95 samples/sec Loss 2.5108 LearningRate 0.0000 Epoch: 38 Global Step: 798560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:08,355-Speed 6259.05 
samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 38 Global Step: 798570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:11,618-Speed 6277.21 samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 38 Global Step: 798580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:14,873-Speed 6293.24 samples/sec Loss 2.5309 LearningRate 0.0000 Epoch: 38 Global Step: 798590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:18,130-Speed 6288.50 samples/sec Loss 2.5940 LearningRate 0.0000 Epoch: 38 Global Step: 798600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:21,386-Speed 6292.62 samples/sec Loss 2.4938 LearningRate 0.0000 Epoch: 38 Global Step: 798610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:24,640-Speed 6294.86 samples/sec Loss 2.5464 LearningRate 0.0000 Epoch: 38 Global Step: 798620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:27,880-Speed 6322.73 samples/sec Loss 2.6312 LearningRate 0.0000 Epoch: 38 Global Step: 798630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:31,136-Speed 6290.68 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 38 Global Step: 798640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:34,389-Speed 6297.74 samples/sec Loss 2.5043 LearningRate 0.0000 Epoch: 38 Global Step: 798650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:37,638-Speed 6305.17 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 38 Global Step: 798660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:40,890-Speed 6298.83 samples/sec Loss 2.5539 LearningRate 0.0000 Epoch: 38 Global Step: 798670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:44,137-Speed 6308.30 samples/sec Loss 2.5166 LearningRate 0.0000 Epoch: 38 Global Step: 798680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:47,384-Speed 6309.04 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 38 
Global Step: 798690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:50,636-Speed 6300.09 samples/sec Loss 2.5510 LearningRate 0.0000 Epoch: 38 Global Step: 798700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:53,886-Speed 6302.82 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 38 Global Step: 798710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:24:57,141-Speed 6291.85 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 38 Global Step: 798720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:00,420-Speed 6248.27 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 798730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:03,675-Speed 6293.06 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 38 Global Step: 798740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:06,932-Speed 6288.61 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 38 Global Step: 798750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:10,202-Speed 6265.83 samples/sec Loss 2.5948 LearningRate 0.0000 Epoch: 38 Global Step: 798760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:13,466-Speed 6274.10 samples/sec Loss 2.4953 LearningRate 0.0000 Epoch: 38 Global Step: 798770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:16,719-Speed 6297.26 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 38 Global Step: 798780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:19,973-Speed 6295.06 samples/sec Loss 2.5673 LearningRate 0.0000 Epoch: 38 Global Step: 798790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:23,243-Speed 6265.70 samples/sec Loss 2.5775 LearningRate 0.0000 Epoch: 38 Global Step: 798800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:26,498-Speed 6293.28 samples/sec Loss 2.5227 LearningRate 0.0000 Epoch: 38 Global Step: 798810 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:25:29,747-Speed 6303.45 samples/sec Loss 2.5783 LearningRate 0.0000 Epoch: 38 Global Step: 798820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:32,988-Speed 6320.44 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 38 Global Step: 798830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:36,243-Speed 6294.34 samples/sec Loss 2.5505 LearningRate 0.0000 Epoch: 38 Global Step: 798840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:39,508-Speed 6272.97 samples/sec Loss 2.5470 LearningRate 0.0000 Epoch: 38 Global Step: 798850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:42,761-Speed 6298.17 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 38 Global Step: 798860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:46,019-Speed 6286.60 samples/sec Loss 2.5362 LearningRate 0.0000 Epoch: 38 Global Step: 798870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:49,275-Speed 6292.71 samples/sec Loss 2.5348 LearningRate 0.0000 Epoch: 38 Global Step: 798880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:52,532-Speed 6290.28 samples/sec Loss 2.5217 LearningRate 0.0000 Epoch: 38 Global Step: 798890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:55,786-Speed 6294.51 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 38 Global Step: 798900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:25:59,039-Speed 6298.01 samples/sec Loss 2.5833 LearningRate 0.0000 Epoch: 38 Global Step: 798910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:02,294-Speed 6293.62 samples/sec Loss 2.5963 LearningRate 0.0000 Epoch: 38 Global Step: 798920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:05,535-Speed 6320.89 samples/sec Loss 2.6394 LearningRate 0.0000 Epoch: 38 Global Step: 798930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:08,791-Speed 6289.94 
samples/sec Loss 2.5322 LearningRate 0.0000 Epoch: 38 Global Step: 798940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:12,044-Speed 6297.76 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 38 Global Step: 798950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:15,294-Speed 6302.24 samples/sec Loss 2.6009 LearningRate 0.0000 Epoch: 38 Global Step: 798960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:18,548-Speed 6294.70 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 38 Global Step: 798970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:21,809-Speed 6282.42 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 38 Global Step: 798980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:25,067-Speed 6286.47 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 38 Global Step: 798990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:28,328-Speed 6282.57 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 38 Global Step: 799000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:31,596-Speed 6269.09 samples/sec Loss 2.6033 LearningRate 0.0000 Epoch: 38 Global Step: 799010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:34,842-Speed 6309.45 samples/sec Loss 2.5793 LearningRate 0.0000 Epoch: 38 Global Step: 799020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:38,102-Speed 6284.43 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 38 Global Step: 799030 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:26:41,348-Speed 6311.06 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 38 Global Step: 799040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:44,600-Speed 6297.57 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 38 Global Step: 799050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:47,867-Speed 6278.96 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 38 
Global Step: 799060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:51,136-Speed 6265.63 samples/sec Loss 2.5712 LearningRate 0.0000 Epoch: 38 Global Step: 799070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:54,388-Speed 6300.38 samples/sec Loss 2.5545 LearningRate 0.0000 Epoch: 38 Global Step: 799080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:26:57,642-Speed 6295.15 samples/sec Loss 2.5955 LearningRate 0.0000 Epoch: 38 Global Step: 799090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:00,897-Speed 6292.65 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 38 Global Step: 799100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:04,167-Speed 6265.39 samples/sec Loss 2.5879 LearningRate 0.0000 Epoch: 38 Global Step: 799110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:07,428-Speed 6282.15 samples/sec Loss 2.5974 LearningRate 0.0000 Epoch: 38 Global Step: 799120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:10,687-Speed 6285.77 samples/sec Loss 2.5731 LearningRate 0.0000 Epoch: 38 Global Step: 799130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:13,922-Speed 6332.01 samples/sec Loss 2.5540 LearningRate 0.0000 Epoch: 38 Global Step: 799140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:17,172-Speed 6302.23 samples/sec Loss 2.5846 LearningRate 0.0000 Epoch: 38 Global Step: 799150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:20,432-Speed 6284.00 samples/sec Loss 2.5638 LearningRate 0.0000 Epoch: 38 Global Step: 799160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:23,699-Speed 6269.86 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 38 Global Step: 799170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:26,959-Speed 6284.08 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 799180 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:27:30,213-Speed 6294.29 samples/sec Loss 2.5223 LearningRate 0.0000 Epoch: 38 Global Step: 799190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:33,470-Speed 6289.44 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 38 Global Step: 799200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:36,730-Speed 6284.38 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 38 Global Step: 799210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:39,982-Speed 6299.24 samples/sec Loss 2.5386 LearningRate 0.0000 Epoch: 38 Global Step: 799220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:43,233-Speed 6300.25 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 38 Global Step: 799230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:46,467-Speed 6334.29 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 38 Global Step: 799240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:49,722-Speed 6293.57 samples/sec Loss 2.5108 LearningRate 0.0000 Epoch: 38 Global Step: 799250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:52,969-Speed 6308.30 samples/sec Loss 2.5616 LearningRate 0.0000 Epoch: 38 Global Step: 799260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:56,221-Speed 6297.85 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 38 Global Step: 799270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:27:59,473-Speed 6300.70 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 38 Global Step: 799280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:02,727-Speed 6293.82 samples/sec Loss 2.5758 LearningRate 0.0000 Epoch: 38 Global Step: 799290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:06,008-Speed 6245.20 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 38 Global Step: 799300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:09,256-Speed 6307.33 
samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 38 Global Step: 799310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:12,509-Speed 6296.88 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 38 Global Step: 799320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:15,761-Speed 6299.15 samples/sec Loss 2.5832 LearningRate 0.0000 Epoch: 38 Global Step: 799330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:19,012-Speed 6301.67 samples/sec Loss 2.5893 LearningRate 0.0000 Epoch: 38 Global Step: 799340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:28:22,248-Speed 6328.71 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 799350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:25,511-Speed 6278.30 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 38 Global Step: 799360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:28,773-Speed 6280.08 samples/sec Loss 2.4911 LearningRate 0.0000 Epoch: 38 Global Step: 799370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:32,039-Speed 6272.84 samples/sec Loss 2.5840 LearningRate 0.0000 Epoch: 38 Global Step: 799380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:35,295-Speed 6291.64 samples/sec Loss 2.5891 LearningRate 0.0000 Epoch: 38 Global Step: 799390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:38,564-Speed 6264.71 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 38 Global Step: 799400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:41,812-Speed 6308.19 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 38 Global Step: 799410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:45,080-Speed 6268.33 samples/sec Loss 2.6245 LearningRate 0.0000 Epoch: 38 Global Step: 799420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:48,328-Speed 6305.09 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 38 
Global Step: 799430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:51,578-Speed 6302.75 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 38 Global Step: 799440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:54,817-Speed 6326.19 samples/sec Loss 2.5938 LearningRate 0.0000 Epoch: 38 Global Step: 799450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:28:58,082-Speed 6273.33 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 38 Global Step: 799460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:01,334-Speed 6298.54 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 38 Global Step: 799470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:04,591-Speed 6289.00 samples/sec Loss 2.5530 LearningRate 0.0000 Epoch: 38 Global Step: 799480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:07,845-Speed 6295.02 samples/sec Loss 2.5492 LearningRate 0.0000 Epoch: 38 Global Step: 799490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:11,104-Speed 6285.83 samples/sec Loss 2.5579 LearningRate 0.0000 Epoch: 38 Global Step: 799500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:14,349-Speed 6312.59 samples/sec Loss 2.4896 LearningRate 0.0000 Epoch: 38 Global Step: 799510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:17,608-Speed 6284.99 samples/sec Loss 2.5530 LearningRate 0.0000 Epoch: 38 Global Step: 799520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:20,861-Speed 6298.50 samples/sec Loss 2.5292 LearningRate 0.0000 Epoch: 38 Global Step: 799530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:24,133-Speed 6261.38 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 38 Global Step: 799540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:27,370-Speed 6328.21 samples/sec Loss 2.5841 LearningRate 0.0000 Epoch: 38 Global Step: 799550 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:29:30,621-Speed 6300.34 samples/sec Loss 2.5011 LearningRate 0.0000 Epoch: 38 Global Step: 799560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:33,872-Speed 6301.90 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 38 Global Step: 799570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:37,121-Speed 6304.41 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 38 Global Step: 799580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:40,383-Speed 6280.68 samples/sec Loss 2.5168 LearningRate 0.0000 Epoch: 38 Global Step: 799590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:43,635-Speed 6298.93 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 38 Global Step: 799600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:46,888-Speed 6296.65 samples/sec Loss 2.5702 LearningRate 0.0000 Epoch: 38 Global Step: 799610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:50,146-Speed 6286.98 samples/sec Loss 2.5254 LearningRate 0.0000 Epoch: 38 Global Step: 799620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:53,405-Speed 6287.06 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 38 Global Step: 799630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:56,661-Speed 6290.67 samples/sec Loss 2.5138 LearningRate 0.0000 Epoch: 38 Global Step: 799640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:29:59,922-Speed 6281.15 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 799650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:03,181-Speed 6285.93 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 38 Global Step: 799660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:06,441-Speed 6283.90 samples/sec Loss 2.6070 LearningRate 0.0000 Epoch: 38 Global Step: 799670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:09,694-Speed 6297.19 
samples/sec Loss 2.5822 LearningRate 0.0000 Epoch: 38 Global Step: 799680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:12,946-Speed 6298.90 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 38 Global Step: 799690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:16,198-Speed 6297.66 samples/sec Loss 2.5125 LearningRate 0.0000 Epoch: 38 Global Step: 799700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:19,459-Speed 6283.20 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 Global Step: 799710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:22,711-Speed 6298.70 samples/sec Loss 2.5823 LearningRate 0.0000 Epoch: 38 Global Step: 799720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:25,983-Speed 6260.51 samples/sec Loss 2.5615 LearningRate 0.0000 Epoch: 38 Global Step: 799730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:29,245-Speed 6280.50 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 38 Global Step: 799740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:32,484-Speed 6324.83 samples/sec Loss 2.5915 LearningRate 0.0000 Epoch: 38 Global Step: 799750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:35,739-Speed 6293.82 samples/sec Loss 2.5291 LearningRate 0.0000 Epoch: 38 Global Step: 799760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:38,986-Speed 6307.34 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 38 Global Step: 799770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:42,242-Speed 6292.36 samples/sec Loss 2.5354 LearningRate 0.0000 Epoch: 38 Global Step: 799780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:45,493-Speed 6301.87 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 38 Global Step: 799790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:48,742-Speed 6303.15 samples/sec Loss 2.5198 LearningRate 0.0000 Epoch: 38 
Global Step: 799800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:51,998-Speed 6292.78 samples/sec Loss 2.5800 LearningRate 0.0000 Epoch: 38 Global Step: 799810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:55,257-Speed 6285.59 samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 38 Global Step: 799820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:30:58,516-Speed 6283.86 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 38 Global Step: 799830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:01,768-Speed 6300.01 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 38 Global Step: 799840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:05,003-Speed 6332.83 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 38 Global Step: 799850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:08,257-Speed 6294.80 samples/sec Loss 2.4907 LearningRate 0.0000 Epoch: 38 Global Step: 799860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:11,516-Speed 6284.66 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 38 Global Step: 799870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:14,769-Speed 6297.98 samples/sec Loss 2.5198 LearningRate 0.0000 Epoch: 38 Global Step: 799880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:18,023-Speed 6295.22 samples/sec Loss 2.6165 LearningRate 0.0000 Epoch: 38 Global Step: 799890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:21,275-Speed 6297.61 samples/sec Loss 2.5329 LearningRate 0.0000 Epoch: 38 Global Step: 799900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:24,526-Speed 6301.41 samples/sec Loss 2.4941 LearningRate 0.0000 Epoch: 38 Global Step: 799910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:27,778-Speed 6298.97 samples/sec Loss 2.5594 LearningRate 0.0000 Epoch: 38 Global Step: 799920 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:31:31,036-Speed 6287.10 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 38 Global Step: 799930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:34,300-Speed 6276.95 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 38 Global Step: 799940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:37,539-Speed 6325.19 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 38 Global Step: 799950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:40,787-Speed 6307.03 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 38 Global Step: 799960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:44,051-Speed 6275.62 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 38 Global Step: 799970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:47,303-Speed 6298.98 samples/sec Loss 2.5126 LearningRate 0.0000 Epoch: 38 Global Step: 799980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:50,551-Speed 6306.15 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 38 Global Step: 799990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:53,806-Speed 6294.31 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 38 Global Step: 800000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:31:57,055-Speed 6303.81 samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 38 Global Step: 800010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:00,311-Speed 6292.77 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 38 Global Step: 800020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:03,562-Speed 6300.61 samples/sec Loss 2.5051 LearningRate 0.0000 Epoch: 38 Global Step: 800030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:06,819-Speed 6288.83 samples/sec Loss 2.5212 LearningRate 0.0000 Epoch: 38 Global Step: 800040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:10,079-Speed 6285.01 
samples/sec Loss 2.4975 LearningRate 0.0000 Epoch: 38 Global Step: 800050 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:32:13,318-Speed 6323.29 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 38 Global Step: 800060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:16,571-Speed 6296.29 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 38 Global Step: 800070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:19,824-Speed 6297.04 samples/sec Loss 2.5103 LearningRate 0.0000 Epoch: 38 Global Step: 800080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:23,088-Speed 6277.69 samples/sec Loss 2.5692 LearningRate 0.0000 Epoch: 38 Global Step: 800090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:26,336-Speed 6305.38 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 38 Global Step: 800100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:29,593-Speed 6289.33 samples/sec Loss 2.5523 LearningRate 0.0000 Epoch: 38 Global Step: 800110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:32,847-Speed 6295.05 samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 38 Global Step: 800120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:36,097-Speed 6303.20 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 800130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:39,349-Speed 6299.73 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 38 Global Step: 800140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:42,602-Speed 6296.72 samples/sec Loss 2.5265 LearningRate 0.0000 Epoch: 38 Global Step: 800150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:45,841-Speed 6323.76 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 38 Global Step: 800160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:49,113-Speed 6261.03 samples/sec Loss 2.4987 LearningRate 0.0000 Epoch: 38 
Global Step: 800170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:52,425-Speed 6185.11 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 38 Global Step: 800180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:55,674-Speed 6306.48 samples/sec Loss 2.5720 LearningRate 0.0000 Epoch: 38 Global Step: 800190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:32:58,930-Speed 6291.35 samples/sec Loss 2.5275 LearningRate 0.0000 Epoch: 38 Global Step: 800200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:02,179-Speed 6303.97 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 38 Global Step: 800210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:05,431-Speed 6299.28 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 38 Global Step: 800220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:08,680-Speed 6304.06 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 38 Global Step: 800230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:11,941-Speed 6282.17 samples/sec Loss 2.5305 LearningRate 0.0000 Epoch: 38 Global Step: 800240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:15,194-Speed 6297.32 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 38 Global Step: 800250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:18,449-Speed 6293.80 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 38 Global Step: 800260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:21,705-Speed 6291.62 samples/sec Loss 2.4989 LearningRate 0.0000 Epoch: 38 Global Step: 800270 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:24,952-Speed 6307.36 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 38 Global Step: 800280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:28,212-Speed 6284.47 samples/sec Loss 2.6029 LearningRate 0.0000 Epoch: 38 Global Step: 800290 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:33:31,460-Speed 6307.36 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 38 Global Step: 800300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:34,717-Speed 6289.46 samples/sec Loss 2.5513 LearningRate 0.0000 Epoch: 38 Global Step: 800310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:37,972-Speed 6292.05 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 38 Global Step: 800320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:41,229-Speed 6290.01 samples/sec Loss 2.4974 LearningRate 0.0000 Epoch: 38 Global Step: 800330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:44,479-Speed 6302.78 samples/sec Loss 2.5130 LearningRate 0.0000 Epoch: 38 Global Step: 800340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:47,730-Speed 6301.14 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 38 Global Step: 800350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:50,970-Speed 6322.47 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 38 Global Step: 800360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:54,233-Speed 6277.89 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 38 Global Step: 800370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:33:57,488-Speed 6293.98 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 38 Global Step: 800380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:00,746-Speed 6286.77 samples/sec Loss 2.5606 LearningRate 0.0000 Epoch: 38 Global Step: 800390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:04,008-Speed 6279.83 samples/sec Loss 2.5025 LearningRate 0.0000 Epoch: 38 Global Step: 800400 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:07,266-Speed 6287.53 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 38 Global Step: 800410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:10,520-Speed 6296.78 
samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 38 Global Step: 800420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:13,774-Speed 6294.17 samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 38 Global Step: 800430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:17,030-Speed 6291.19 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 38 Global Step: 800440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:20,285-Speed 6293.62 samples/sec Loss 2.5790 LearningRate 0.0000 Epoch: 38 Global Step: 800450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:23,544-Speed 6285.59 samples/sec Loss 2.5612 LearningRate 0.0000 Epoch: 38 Global Step: 800460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:34:26,784-Speed 6321.34 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 38 Global Step: 800470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:30,036-Speed 6299.73 samples/sec Loss 2.5088 LearningRate 0.0000 Epoch: 38 Global Step: 800480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:33,285-Speed 6303.98 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 800490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:36,538-Speed 6297.55 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 38 Global Step: 800500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:39,801-Speed 6278.02 samples/sec Loss 2.5313 LearningRate 0.0000 Epoch: 38 Global Step: 800510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:43,067-Speed 6273.33 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 38 Global Step: 800520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:46,327-Speed 6282.35 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 38 Global Step: 800530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:49,587-Speed 6283.28 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 38 
Global Step: 800540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:52,852-Speed 6274.88 samples/sec Loss 2.4965 LearningRate 0.0000 Epoch: 38 Global Step: 800550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:56,155-Speed 6201.67 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 38 Global Step: 800560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:34:59,394-Speed 6323.33 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 38 Global Step: 800570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:02,640-Speed 6310.33 samples/sec Loss 2.5433 LearningRate 0.0000 Epoch: 38 Global Step: 800580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:05,898-Speed 6289.14 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 38 Global Step: 800590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:09,157-Speed 6285.04 samples/sec Loss 2.5009 LearningRate 0.0000 Epoch: 38 Global Step: 800600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:12,416-Speed 6286.36 samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 38 Global Step: 800610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:15,669-Speed 6296.60 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 38 Global Step: 800620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:18,919-Speed 6303.36 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 38 Global Step: 800630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:22,178-Speed 6286.26 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 800640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:25,427-Speed 6305.11 samples/sec Loss 2.5253 LearningRate 0.0000 Epoch: 38 Global Step: 800650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:28,685-Speed 6287.63 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 38 Global Step: 800660 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:35:31,934-Speed 6302.98 samples/sec Loss 2.5882 LearningRate 0.0000 Epoch: 38 Global Step: 800670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:35,202-Speed 6270.01 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 38 Global Step: 800680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:38,456-Speed 6295.01 samples/sec Loss 2.4972 LearningRate 0.0000 Epoch: 38 Global Step: 800690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:41,716-Speed 6283.25 samples/sec Loss 2.5376 LearningRate 0.0000 Epoch: 38 Global Step: 800700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:44,965-Speed 6305.01 samples/sec Loss 2.5730 LearningRate 0.0000 Epoch: 38 Global Step: 800710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:48,216-Speed 6300.14 samples/sec Loss 2.4489 LearningRate 0.0000 Epoch: 38 Global Step: 800720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:51,470-Speed 6295.23 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 38 Global Step: 800730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:54,732-Speed 6280.20 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 38 Global Step: 800740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:35:57,986-Speed 6295.95 samples/sec Loss 2.5613 LearningRate 0.0000 Epoch: 38 Global Step: 800750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:01,237-Speed 6300.40 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 800760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:04,482-Speed 6312.82 samples/sec Loss 2.5507 LearningRate 0.0000 Epoch: 38 Global Step: 800770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:07,744-Speed 6278.62 samples/sec Loss 2.5722 LearningRate 0.0000 Epoch: 38 Global Step: 800780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:11,001-Speed 6290.24 
samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 38 Global Step: 800790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:14,254-Speed 6296.53 samples/sec Loss 2.6082 LearningRate 0.0000 Epoch: 38 Global Step: 800800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:17,504-Speed 6304.26 samples/sec Loss 2.5901 LearningRate 0.0000 Epoch: 38 Global Step: 800810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:20,754-Speed 6303.58 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 38 Global Step: 800820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:24,000-Speed 6310.92 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 38 Global Step: 800830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:27,255-Speed 6292.31 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 38 Global Step: 800840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:30,515-Speed 6284.75 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 38 Global Step: 800850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:33,769-Speed 6294.09 samples/sec Loss 2.5706 LearningRate 0.0000 Epoch: 38 Global Step: 800860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:37,016-Speed 6308.91 samples/sec Loss 2.5499 LearningRate 0.0000 Epoch: 38 Global Step: 800870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:40,284-Speed 6270.16 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 38 Global Step: 800880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:43,534-Speed 6300.92 samples/sec Loss 2.5433 LearningRate 0.0000 Epoch: 38 Global Step: 800890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:46,788-Speed 6296.61 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 38 Global Step: 800900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:50,044-Speed 6291.77 samples/sec Loss 2.5656 LearningRate 0.0000 Epoch: 38 
Global Step: 800910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:53,301-Speed 6289.60 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 38 Global Step: 800920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:56,548-Speed 6306.84 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 800930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:36:59,826-Speed 6250.10 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 38 Global Step: 800940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:03,084-Speed 6287.10 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 38 Global Step: 800950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:06,351-Speed 6269.92 samples/sec Loss 2.5777 LearningRate 0.0000 Epoch: 38 Global Step: 800960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:09,595-Speed 6313.79 samples/sec Loss 2.5792 LearningRate 0.0000 Epoch: 38 Global Step: 800970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:12,858-Speed 6279.33 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 38 Global Step: 800980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:16,118-Speed 6283.74 samples/sec Loss 2.5333 LearningRate 0.0000 Epoch: 38 Global Step: 800990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:19,372-Speed 6293.56 samples/sec Loss 2.5081 LearningRate 0.0000 Epoch: 38 Global Step: 801000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:22,629-Speed 6290.16 samples/sec Loss 2.5036 LearningRate 0.0000 Epoch: 38 Global Step: 801010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:25,886-Speed 6289.33 samples/sec Loss 2.5317 LearningRate 0.0000 Epoch: 38 Global Step: 801020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:29,141-Speed 6292.83 samples/sec Loss 2.5321 LearningRate 0.0000 Epoch: 38 Global Step: 801030 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:37:32,403-Speed 6280.15 samples/sec Loss 2.5635 LearningRate 0.0000 Epoch: 38 Global Step: 801040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:35,663-Speed 6285.13 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 38 Global Step: 801050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:38,915-Speed 6298.21 samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 38 Global Step: 801060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:42,172-Speed 6290.48 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 38 Global Step: 801070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:45,425-Speed 6297.60 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 38 Global Step: 801080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:48,678-Speed 6295.56 samples/sec Loss 2.5562 LearningRate 0.0000 Epoch: 38 Global Step: 801090 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:51,934-Speed 6291.39 samples/sec Loss 2.5577 LearningRate 0.0000 Epoch: 38 Global Step: 801100 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:55,190-Speed 6292.53 samples/sec Loss 2.5672 LearningRate 0.0000 Epoch: 38 Global Step: 801110 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:37:58,449-Speed 6285.50 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 38 Global Step: 801120 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:01,694-Speed 6311.18 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 38 Global Step: 801130 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:04,952-Speed 6288.87 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 38 Global Step: 801140 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:08,198-Speed 6309.70 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 38 Global Step: 801150 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:11,457-Speed 6286.55 
samples/sec Loss 2.5705 LearningRate 0.0000 Epoch: 38 Global Step: 801160 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:14,697-Speed 6320.99 samples/sec Loss 2.5254 LearningRate 0.0000 Epoch: 38 Global Step: 801170 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:17,953-Speed 6291.78 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 38 Global Step: 801180 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:21,253-Speed 6207.85 samples/sec Loss 2.5540 LearningRate 0.0000 Epoch: 38 Global Step: 801190 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:24,566-Speed 6182.63 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 38 Global Step: 801200 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:27,817-Speed 6300.36 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 38 Global Step: 801210 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:31,105-Speed 6229.91 samples/sec Loss 2.5383 LearningRate 0.0000 Epoch: 38 Global Step: 801220 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:34,357-Speed 6298.83 samples/sec Loss 2.5380 LearningRate 0.0000 Epoch: 38 Global Step: 801230 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:37,617-Speed 6283.82 samples/sec Loss 2.5078 LearningRate 0.0000 Epoch: 38 Global Step: 801240 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:40,865-Speed 6307.90 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 38 Global Step: 801250 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:44,121-Speed 6291.58 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 38 Global Step: 801260 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:47,373-Speed 6299.02 samples/sec Loss 2.5491 LearningRate 0.0000 Epoch: 38 Global Step: 801270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:38:50,619-Speed 6314.07 samples/sec Loss 2.5384 LearningRate 0.0000 Epoch: 38 
Global Step: 801280 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:53,871-Speed 6298.51 samples/sec Loss 2.6095 LearningRate 0.0000 Epoch: 38 Global Step: 801290 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:38:57,124-Speed 6297.60 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 38 Global Step: 801300 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:00,378-Speed 6295.33 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 38 Global Step: 801310 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:03,631-Speed 6296.56 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 38 Global Step: 801320 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:06,893-Speed 6280.95 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 38 Global Step: 801330 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:10,147-Speed 6293.72 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 38 Global Step: 801340 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:13,408-Speed 6282.90 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 Global Step: 801350 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:16,666-Speed 6287.00 samples/sec Loss 2.5249 LearningRate 0.0000 Epoch: 38 Global Step: 801360 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:19,918-Speed 6298.99 samples/sec Loss 2.5350 LearningRate 0.0000 Epoch: 38 Global Step: 801370 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:23,158-Speed 6323.44 samples/sec Loss 2.5947 LearningRate 0.0000 Epoch: 38 Global Step: 801380 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:26,413-Speed 6293.02 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 38 Global Step: 801390 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:29,660-Speed 6308.07 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 38 Global Step: 801400 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:39:32,918-Speed 6287.06 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 38 Global Step: 801410 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:36,175-Speed 6289.54 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 38 Global Step: 801420 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:39,430-Speed 6293.94 samples/sec Loss 2.5305 LearningRate 0.0000 Epoch: 38 Global Step: 801430 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:42,716-Speed 6233.56 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 38 Global Step: 801440 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:45,973-Speed 6290.22 samples/sec Loss 2.5810 LearningRate 0.0000 Epoch: 38 Global Step: 801450 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:49,254-Speed 6243.56 samples/sec Loss 2.5720 LearningRate 0.0000 Epoch: 38 Global Step: 801460 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:52,545-Speed 6224.55 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 38 Global Step: 801470 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:55,785-Speed 6321.98 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 38 Global Step: 801480 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:39:59,040-Speed 6293.42 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 38 Global Step: 801490 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:02,295-Speed 6292.56 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 38 Global Step: 801500 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:05,551-Speed 6291.71 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 38 Global Step: 801510 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:08,808-Speed 6290.34 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 38 Global Step: 801520 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:12,065-Speed 6288.73 
samples/sec Loss 2.5453 LearningRate 0.0000 Epoch: 38 Global Step: 801530 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:15,314-Speed 6305.01 samples/sec Loss 2.4728 LearningRate 0.0000 Epoch: 38 Global Step: 801540 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:18,568-Speed 6296.20 samples/sec Loss 2.5366 LearningRate 0.0000 Epoch: 38 Global Step: 801550 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:21,829-Speed 6280.50 samples/sec Loss 2.4971 LearningRate 0.0000 Epoch: 38 Global Step: 801560 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:25,080-Speed 6300.73 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 38 Global Step: 801570 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:28,317-Speed 6327.99 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 38 Global Step: 801580 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:31,569-Speed 6299.46 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 38 Global Step: 801590 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:34,830-Speed 6282.89 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 38 Global Step: 801600 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:38,084-Speed 6293.62 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 38 Global Step: 801610 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:41,339-Speed 6294.62 samples/sec Loss 2.5432 LearningRate 0.0000 Epoch: 38 Global Step: 801620 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:44,598-Speed 6285.65 samples/sec Loss 2.4819 LearningRate 0.0000 Epoch: 38 Global Step: 801630 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:47,854-Speed 6289.62 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 38 Global Step: 801640 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:51,111-Speed 6290.32 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 38 
Global Step: 801650 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:54,375-Speed 6277.12 samples/sec Loss 2.5528 LearningRate 0.0000 Epoch: 38 Global Step: 801660 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:40:57,627-Speed 6298.51 samples/sec Loss 2.5923 LearningRate 0.0000 Epoch: 38 Global Step: 801670 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:00,866-Speed 6323.42 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 38 Global Step: 801680 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:04,121-Speed 6294.11 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 801690 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:07,375-Speed 6295.24 samples/sec Loss 2.5202 LearningRate 0.0000 Epoch: 38 Global Step: 801700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:10,629-Speed 6295.53 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 801710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:13,886-Speed 6290.37 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 38 Global Step: 801720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:17,140-Speed 6295.06 samples/sec Loss 2.5764 LearningRate 0.0000 Epoch: 38 Global Step: 801730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:20,394-Speed 6294.43 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 38 Global Step: 801740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:23,646-Speed 6300.26 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 38 Global Step: 801750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:26,898-Speed 6299.46 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 38 Global Step: 801760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:30,149-Speed 6300.53 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 38 Global Step: 801770 Fp16 Grad Scale: 4096 Required: 3 
hours Training: 2022-04-03 16:41:33,409-Speed 6283.85 samples/sec Loss 2.5337 LearningRate 0.0000 Epoch: 38 Global Step: 801780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-03 16:41:36,650-Speed 6319.23 samples/sec Loss 2.5878 LearningRate 0.0000 Epoch: 38 Global Step: 801790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:39,904-Speed 6296.54 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 38 Global Step: 801800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:43,159-Speed 6292.13 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 38 Global Step: 801810 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:46,414-Speed 6292.73 samples/sec Loss 2.5668 LearningRate 0.0000 Epoch: 38 Global Step: 801820 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:49,668-Speed 6296.43 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 38 Global Step: 801830 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:52,918-Speed 6303.04 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 38 Global Step: 801840 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:56,176-Speed 6287.37 samples/sec Loss 2.5975 LearningRate 0.0000 Epoch: 38 Global Step: 801850 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:41:59,429-Speed 6296.80 samples/sec Loss 2.5317 LearningRate 0.0000 Epoch: 38 Global Step: 801860 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:02,690-Speed 6281.24 samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 38 Global Step: 801870 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:05,947-Speed 6289.15 samples/sec Loss 2.5186 LearningRate 0.0000 Epoch: 38 Global Step: 801880 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:09,191-Speed 6315.80 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 38 Global Step: 801890 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:12,440-Speed 6304.69 
samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 38 Global Step: 801900 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:15,695-Speed 6293.71 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 38 Global Step: 801910 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:18,951-Speed 6291.42 samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 38 Global Step: 801920 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:22,207-Speed 6290.78 samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 38 Global Step: 801930 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:25,463-Speed 6292.42 samples/sec Loss 2.5336 LearningRate 0.0000 Epoch: 38 Global Step: 801940 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:28,709-Speed 6310.60 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 38 Global Step: 801950 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:31,971-Speed 6280.71 samples/sec Loss 2.4782 LearningRate 0.0000 Epoch: 38 Global Step: 801960 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:35,224-Speed 6295.82 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 38 Global Step: 801970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:38,476-Speed 6298.37 samples/sec Loss 2.5575 LearningRate 0.0000 Epoch: 38 Global Step: 801980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:41,719-Speed 6317.58 samples/sec Loss 2.4793 LearningRate 0.0000 Epoch: 38 Global Step: 801990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:44,975-Speed 6291.77 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 38 Global Step: 802000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:48,233-Speed 6286.55 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 38 Global Step: 802010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:51,489-Speed 6290.57 samples/sec Loss 2.5009 LearningRate 0.0000 Epoch: 38 
Global Step: 802020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:54,743-Speed 6296.21 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 38 Global Step: 802030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:42:57,994-Speed 6301.74 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 38 Global Step: 802040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:43:01,249-Speed 6293.35 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 Global Step: 802050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:43:04,502-Speed 6296.62 samples/sec Loss 2.5083 LearningRate 0.0000 Epoch: 38 Global Step: 802060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:43:07,754-Speed 6298.18 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 38 Global Step: 802070 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:43:11,014-Speed 6283.09 samples/sec Loss 2.5824 LearningRate 0.0000 Epoch: 38 Global Step: 802080 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-03 16:43:14,245-Speed 6340.60 samples/sec Loss 2.5640 LearningRate 0.0000 Epoch: 38 Global Step: 802090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:17,493-Speed 6306.61 samples/sec Loss 2.5920 LearningRate 0.0000 Epoch: 38 Global Step: 802100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:20,754-Speed 6282.23 samples/sec Loss 2.5762 LearningRate 0.0000 Epoch: 38 Global Step: 802110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:24,121-Speed 6083.62 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 38 Global Step: 802120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:27,379-Speed 6288.23 samples/sec Loss 2.5769 LearningRate 0.0000 Epoch: 38 Global Step: 802130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:30,630-Speed 6300.25 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 38 Global Step: 802140 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:43:33,891-Speed 6282.85 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 38 Global Step: 802150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:37,170-Speed 6246.40 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 38 Global Step: 802160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:40,436-Speed 6272.23 samples/sec Loss 2.5735 LearningRate 0.0000 Epoch: 38 Global Step: 802170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:43,685-Speed 6305.00 samples/sec Loss 2.4995 LearningRate 0.0000 Epoch: 38 Global Step: 802180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:46,927-Speed 6318.22 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 38 Global Step: 802190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:50,175-Speed 6307.17 samples/sec Loss 2.5319 LearningRate 0.0000 Epoch: 38 Global Step: 802200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:53,432-Speed 6290.23 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 38 Global Step: 802210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:56,684-Speed 6298.67 samples/sec Loss 2.5447 LearningRate 0.0000 Epoch: 38 Global Step: 802220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:43:59,942-Speed 6287.97 samples/sec Loss 2.6118 LearningRate 0.0000 Epoch: 38 Global Step: 802230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:03,199-Speed 6289.98 samples/sec Loss 2.5791 LearningRate 0.0000 Epoch: 38 Global Step: 802240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:06,457-Speed 6287.58 samples/sec Loss 2.5245 LearningRate 0.0000 Epoch: 38 Global Step: 802250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:09,709-Speed 6298.51 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 38 Global Step: 802260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:12,970-Speed 6281.86 
samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 38 Global Step: 802270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:16,224-Speed 6297.15 samples/sec Loss 2.6079 LearningRate 0.0000 Epoch: 38 Global Step: 802280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:19,465-Speed 6319.41 samples/sec Loss 2.5893 LearningRate 0.0000 Epoch: 38 Global Step: 802290 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:22,717-Speed 6299.25 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 38 Global Step: 802300 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:25,980-Speed 6279.27 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 38 Global Step: 802310 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:29,230-Speed 6302.31 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 38 Global Step: 802320 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:32,484-Speed 6296.80 samples/sec Loss 2.5245 LearningRate 0.0000 Epoch: 38 Global Step: 802330 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:35,738-Speed 6294.46 samples/sec Loss 2.5453 LearningRate 0.0000 Epoch: 38 Global Step: 802340 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:38,999-Speed 6281.82 samples/sec Loss 2.5867 LearningRate 0.0000 Epoch: 38 Global Step: 802350 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:42,248-Speed 6304.90 samples/sec Loss 2.6027 LearningRate 0.0000 Epoch: 38 Global Step: 802360 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:45,503-Speed 6293.00 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 38 Global Step: 802370 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:48,759-Speed 6292.31 samples/sec Loss 2.4912 LearningRate 0.0000 Epoch: 38 Global Step: 802380 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:44:52,022-Speed 6277.42 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 38 
Global Step: 802390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:55,271-Speed 6304.01 samples/sec Loss 2.5315 LearningRate 0.0000 Epoch: 38 Global Step: 802400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:44:58,525-Speed 6295.50 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 38 Global Step: 802410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:01,780-Speed 6294.04 samples/sec Loss 2.5321 LearningRate 0.0000 Epoch: 38 Global Step: 802420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:05,036-Speed 6291.32 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 38 Global Step: 802430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:08,291-Speed 6292.57 samples/sec Loss 2.5482 LearningRate 0.0000 Epoch: 38 Global Step: 802440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:11,544-Speed 6296.94 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 38 Global Step: 802450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:14,811-Speed 6269.96 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 38 Global Step: 802460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:18,067-Speed 6292.49 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 38 Global Step: 802470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:21,315-Speed 6306.82 samples/sec Loss 2.5326 LearningRate 0.0000 Epoch: 38 Global Step: 802480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:24,549-Speed 6333.41 samples/sec Loss 2.5550 LearningRate 0.0000 Epoch: 38 Global Step: 802490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:27,803-Speed 6294.73 samples/sec Loss 2.4666 LearningRate 0.0000 Epoch: 38 Global Step: 802500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:31,076-Speed 6260.67 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 38 Global Step: 802510 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:45:34,326-Speed 6301.88 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 38 Global Step: 802520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:37,575-Speed 6304.08 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 38 Global Step: 802530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:40,832-Speed 6291.61 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 38 Global Step: 802540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:44,090-Speed 6286.46 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 38 Global Step: 802550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:47,348-Speed 6287.81 samples/sec Loss 2.4871 LearningRate 0.0000 Epoch: 38 Global Step: 802560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:50,613-Speed 6274.67 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 38 Global Step: 802570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:53,873-Speed 6283.76 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 38 Global Step: 802580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:45:57,111-Speed 6324.74 samples/sec Loss 2.5451 LearningRate 0.0000 Epoch: 38 Global Step: 802590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:00,365-Speed 6296.45 samples/sec Loss 2.5759 LearningRate 0.0000 Epoch: 38 Global Step: 802600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:03,632-Speed 6270.76 samples/sec Loss 2.5878 LearningRate 0.0000 Epoch: 38 Global Step: 802610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:06,887-Speed 6291.70 samples/sec Loss 2.5976 LearningRate 0.0000 Epoch: 38 Global Step: 802620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:10,143-Speed 6292.00 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 38 Global Step: 802630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:13,400-Speed 6289.89 
samples/sec Loss 2.4899 LearningRate 0.0000 Epoch: 38 Global Step: 802640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:16,650-Speed 6301.69 samples/sec Loss 2.5700 LearningRate 0.0000 Epoch: 38 Global Step: 802650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:19,902-Speed 6300.25 samples/sec Loss 2.5038 LearningRate 0.0000 Epoch: 38 Global Step: 802660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:23,146-Speed 6312.75 samples/sec Loss 2.5074 LearningRate 0.0000 Epoch: 38 Global Step: 802670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:26,403-Speed 6289.65 samples/sec Loss 2.5190 LearningRate 0.0000 Epoch: 38 Global Step: 802680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:29,635-Speed 6338.33 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 38 Global Step: 802690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:32,886-Speed 6302.26 samples/sec Loss 2.4823 LearningRate 0.0000 Epoch: 38 Global Step: 802700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:36,138-Speed 6298.39 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 38 Global Step: 802710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:39,515-Speed 6064.90 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 38 Global Step: 802720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:42,823-Speed 6193.20 samples/sec Loss 2.5922 LearningRate 0.0000 Epoch: 38 Global Step: 802730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:46,082-Speed 6286.06 samples/sec Loss 2.5791 LearningRate 0.0000 Epoch: 38 Global Step: 802740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:49,335-Speed 6296.93 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 38 Global Step: 802750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:52,590-Speed 6293.90 samples/sec Loss 2.5759 LearningRate 0.0000 Epoch: 38 
Global Step: 802760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:55,841-Speed 6303.03 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 38 Global Step: 802770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:46:59,094-Speed 6296.21 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 38 Global Step: 802780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:02,346-Speed 6299.89 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 38 Global Step: 802790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:05,601-Speed 6292.65 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 38 Global Step: 802800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:08,863-Speed 6280.43 samples/sec Loss 2.5447 LearningRate 0.0000 Epoch: 38 Global Step: 802810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:12,114-Speed 6300.47 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 38 Global Step: 802820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:15,371-Speed 6290.45 samples/sec Loss 2.5043 LearningRate 0.0000 Epoch: 38 Global Step: 802830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:18,621-Speed 6303.40 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 38 Global Step: 802840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:21,877-Speed 6289.87 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 38 Global Step: 802850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:25,128-Speed 6301.80 samples/sec Loss 2.5061 LearningRate 0.0000 Epoch: 38 Global Step: 802860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:28,385-Speed 6288.86 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 38 Global Step: 802870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:31,647-Speed 6280.48 samples/sec Loss 2.5850 LearningRate 0.0000 Epoch: 38 Global Step: 802880 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:47:34,905-Speed 6286.79 samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 38 Global Step: 802890 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 16:47:38,146-Speed 6321.03 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 802900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:41,392-Speed 6311.03 samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 38 Global Step: 802910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:44,644-Speed 6297.81 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 38 Global Step: 802920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:47,900-Speed 6291.52 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 38 Global Step: 802930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:51,155-Speed 6294.18 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 38 Global Step: 802940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:54,409-Speed 6293.58 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 38 Global Step: 802950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:47:57,664-Speed 6295.15 samples/sec Loss 2.5734 LearningRate 0.0000 Epoch: 38 Global Step: 802960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:00,915-Speed 6300.39 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 38 Global Step: 802970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:04,161-Speed 6310.52 samples/sec Loss 2.5251 LearningRate 0.0000 Epoch: 38 Global Step: 802980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:07,417-Speed 6292.81 samples/sec Loss 2.5318 LearningRate 0.0000 Epoch: 38 Global Step: 802990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:10,654-Speed 6326.45 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 38 Global Step: 803000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:13,905-Speed 6301.20 
samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 38 Global Step: 803010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:17,164-Speed 6285.74 samples/sec Loss 2.6194 LearningRate 0.0000 Epoch: 38 Global Step: 803020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:20,418-Speed 6296.66 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 38 Global Step: 803030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:23,677-Speed 6284.22 samples/sec Loss 2.6022 LearningRate 0.0000 Epoch: 38 Global Step: 803040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:26,938-Speed 6283.16 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 803050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:30,191-Speed 6296.65 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 803060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:33,449-Speed 6288.13 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 38 Global Step: 803070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:36,695-Speed 6310.05 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 38 Global Step: 803080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:39,952-Speed 6289.60 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 38 Global Step: 803090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:43,198-Speed 6309.36 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 38 Global Step: 803100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:46,454-Speed 6291.84 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 803110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:49,721-Speed 6270.99 samples/sec Loss 2.5183 LearningRate 0.0000 Epoch: 38 Global Step: 803120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:52,970-Speed 6304.06 samples/sec Loss 2.5520 LearningRate 0.0000 Epoch: 38 
Global Step: 803130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:56,227-Speed 6289.84 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 38 Global Step: 803140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:48:59,479-Speed 6298.18 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 38 Global Step: 803150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:02,724-Speed 6313.96 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 38 Global Step: 803160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:05,984-Speed 6282.59 samples/sec Loss 2.5477 LearningRate 0.0000 Epoch: 38 Global Step: 803170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:09,236-Speed 6298.90 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 38 Global Step: 803180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:12,494-Speed 6288.05 samples/sec Loss 2.6028 LearningRate 0.0000 Epoch: 38 Global Step: 803190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:15,746-Speed 6299.25 samples/sec Loss 2.5179 LearningRate 0.0000 Epoch: 38 Global Step: 803200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 16:49:18,991-Speed 6313.92 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 38 Global Step: 803210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:22,255-Speed 6274.80 samples/sec Loss 2.5853 LearningRate 0.0000 Epoch: 38 Global Step: 803220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:25,504-Speed 6305.59 samples/sec Loss 2.5510 LearningRate 0.0000 Epoch: 38 Global Step: 803230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:28,765-Speed 6281.27 samples/sec Loss 2.5738 LearningRate 0.0000 Epoch: 38 Global Step: 803240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:49:32,001-Speed 6330.71 samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 38 Global Step: 803250 Fp16 Grad Scale: 2048 Required: 2 
hours Training: 2022-04-03 16:49:35,263-Speed 6279.45 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 38 Global Step: 803260 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:38,515-Speed 6299.73 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 38 Global Step: 803270 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:41,761-Speed 6309.52 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 38 Global Step: 803280 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:45,027-Speed 6272.39 samples/sec Loss 2.5604 LearningRate 0.0000 Epoch: 38 Global Step: 803290 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:48,278-Speed 6301.72 samples/sec Loss 2.5367 LearningRate 0.0000 Epoch: 38 Global Step: 803300 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:51,531-Speed 6297.34 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 38 Global Step: 803310 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:54,789-Speed 6286.29 samples/sec Loss 2.6113 LearningRate 0.0000 Epoch: 38 Global Step: 803320 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:49:58,049-Speed 6284.49 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 38 Global Step: 803330 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:50:01,314-Speed 6273.07 samples/sec Loss 2.5134 LearningRate 0.0000 Epoch: 38 Global Step: 803340 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:50:04,569-Speed 6292.77 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 38 Global Step: 803350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:07,828-Speed 6285.92 samples/sec Loss 2.5387 LearningRate 0.0000 Epoch: 38 Global Step: 803360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:11,105-Speed 6252.16 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 38 Global Step: 803370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:14,373-Speed 6266.67 
samples/sec Loss 2.5299 LearningRate 0.0000 Epoch: 38 Global Step: 803380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:17,628-Speed 6293.74 samples/sec Loss 2.5835 LearningRate 0.0000 Epoch: 38 Global Step: 803390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:20,890-Speed 6281.09 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 38 Global Step: 803400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:24,137-Speed 6309.29 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 38 Global Step: 803410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:27,398-Speed 6281.41 samples/sec Loss 2.4888 LearningRate 0.0000 Epoch: 38 Global Step: 803420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:30,660-Speed 6279.07 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 38 Global Step: 803430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:33,920-Speed 6284.90 samples/sec Loss 2.5446 LearningRate 0.0000 Epoch: 38 Global Step: 803440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:37,184-Speed 6278.39 samples/sec Loss 2.5583 LearningRate 0.0000 Epoch: 38 Global Step: 803450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 16:50:40,419-Speed 6330.89 samples/sec Loss 2.5138 LearningRate 0.0000 Epoch: 38 Global Step: 803460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:43,669-Speed 6304.15 samples/sec Loss 2.6096 LearningRate 0.0000 Epoch: 38 Global Step: 803470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:46,926-Speed 6288.61 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 38 Global Step: 803480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:50,182-Speed 6292.18 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 38 Global Step: 803490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:53,433-Speed 6301.05 samples/sec Loss 2.5635 LearningRate 0.0000 Epoch: 38 
Global Step: 803500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:56,685-Speed 6298.69 samples/sec Loss 2.5425 LearningRate 0.0000 Epoch: 38 Global Step: 803510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:50:59,940-Speed 6292.10 samples/sec Loss 2.5681 LearningRate 0.0000 Epoch: 38 Global Step: 803520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:03,195-Speed 6293.07 samples/sec Loss 2.6045 LearningRate 0.0000 Epoch: 38 Global Step: 803530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:06,452-Speed 6289.72 samples/sec Loss 2.6013 LearningRate 0.0000 Epoch: 38 Global Step: 803540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:09,708-Speed 6291.61 samples/sec Loss 2.5228 LearningRate 0.0000 Epoch: 38 Global Step: 803550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:12,953-Speed 6313.67 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 38 Global Step: 803560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:16,235-Speed 6240.81 samples/sec Loss 2.5484 LearningRate 0.0000 Epoch: 38 Global Step: 803570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:19,538-Speed 6200.46 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 38 Global Step: 803580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:22,804-Speed 6272.09 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 38 Global Step: 803590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:26,060-Speed 6292.04 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 38 Global Step: 803600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:29,314-Speed 6295.77 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 38 Global Step: 803610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:32,566-Speed 6298.89 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 38 Global Step: 803620 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:51:35,821-Speed 6293.10 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 38 Global Step: 803630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:39,074-Speed 6298.62 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 38 Global Step: 803640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:42,324-Speed 6302.85 samples/sec Loss 2.5158 LearningRate 0.0000 Epoch: 38 Global Step: 803650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:45,560-Speed 6329.18 samples/sec Loss 2.5675 LearningRate 0.0000 Epoch: 38 Global Step: 803660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:48,814-Speed 6294.62 samples/sec Loss 2.5064 LearningRate 0.0000 Epoch: 38 Global Step: 803670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:52,088-Speed 6257.57 samples/sec Loss 2.5187 LearningRate 0.0000 Epoch: 38 Global Step: 803680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:55,348-Speed 6283.77 samples/sec Loss 2.5081 LearningRate 0.0000 Epoch: 38 Global Step: 803690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:51:58,618-Speed 6264.71 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 803700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:01,879-Speed 6284.39 samples/sec Loss 2.5088 LearningRate 0.0000 Epoch: 38 Global Step: 803710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:05,136-Speed 6290.07 samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 38 Global Step: 803720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:08,372-Speed 6329.43 samples/sec Loss 2.5151 LearningRate 0.0000 Epoch: 38 Global Step: 803730 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:11,624-Speed 6299.18 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 38 Global Step: 803740 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:14,872-Speed 6306.65 
samples/sec Loss 2.4758 LearningRate 0.0000 Epoch: 38 Global Step: 803750 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:18,125-Speed 6296.02 samples/sec Loss 2.5712 LearningRate 0.0000 Epoch: 38 Global Step: 803760 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:21,377-Speed 6299.35 samples/sec Loss 2.5316 LearningRate 0.0000 Epoch: 38 Global Step: 803770 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:24,630-Speed 6297.68 samples/sec Loss 2.5813 LearningRate 0.0000 Epoch: 38 Global Step: 803780 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:27,882-Speed 6299.60 samples/sec Loss 2.5424 LearningRate 0.0000 Epoch: 38 Global Step: 803790 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:31,143-Speed 6280.76 samples/sec Loss 2.5179 LearningRate 0.0000 Epoch: 38 Global Step: 803800 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:34,396-Speed 6296.75 samples/sec Loss 2.5780 LearningRate 0.0000 Epoch: 38 Global Step: 803810 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:37,648-Speed 6300.54 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 38 Global Step: 803820 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 16:52:40,910-Speed 6278.33 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 38 Global Step: 803830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:44,164-Speed 6296.59 samples/sec Loss 2.4678 LearningRate 0.0000 Epoch: 38 Global Step: 803840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:47,425-Speed 6281.06 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 38 Global Step: 803850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:50,678-Speed 6297.82 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 38 Global Step: 803860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:53,935-Speed 6290.36 samples/sec Loss 2.5076 LearningRate 0.0000 Epoch: 38 
Global Step: 803870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:52:57,197-Speed 6278.03 samples/sec Loss 2.5020 LearningRate 0.0000 Epoch: 38 Global Step: 803880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:00,460-Speed 6278.75 samples/sec Loss 2.4933 LearningRate 0.0000 Epoch: 38 Global Step: 803890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:03,712-Speed 6299.76 samples/sec Loss 2.5551 LearningRate 0.0000 Epoch: 38 Global Step: 803900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:06,965-Speed 6297.90 samples/sec Loss 2.4916 LearningRate 0.0000 Epoch: 38 Global Step: 803910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:10,216-Speed 6301.44 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 38 Global Step: 803920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:13,473-Speed 6288.61 samples/sec Loss 2.5948 LearningRate 0.0000 Epoch: 38 Global Step: 803930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 16:53:16,715-Speed 6318.43 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 38 Global Step: 803940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:19,972-Speed 6289.27 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 38 Global Step: 803950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:23,229-Speed 6290.01 samples/sec Loss 2.4799 LearningRate 0.0000 Epoch: 38 Global Step: 803960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:26,480-Speed 6300.08 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 38 Global Step: 803970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:29,849-Speed 6080.22 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 38 Global Step: 803980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:33,105-Speed 6292.81 samples/sec Loss 2.5992 LearningRate 0.0000 Epoch: 38 Global Step: 803990 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:53:36,356-Speed 6299.92 samples/sec Loss 2.6259 LearningRate 0.0000 Epoch: 38 Global Step: 804000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:39,605-Speed 6304.39 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 38 Global Step: 804010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:42,857-Speed 6300.09 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 38 Global Step: 804020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:46,123-Speed 6271.10 samples/sec Loss 2.6360 LearningRate 0.0000 Epoch: 38 Global Step: 804030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:49,364-Speed 6322.12 samples/sec Loss 2.5486 LearningRate 0.0000 Epoch: 38 Global Step: 804040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:52,618-Speed 6295.43 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 38 Global Step: 804050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:55,873-Speed 6292.22 samples/sec Loss 2.5134 LearningRate 0.0000 Epoch: 38 Global Step: 804060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:53:59,149-Speed 6253.38 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 38 Global Step: 804070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:02,421-Speed 6261.27 samples/sec Loss 2.5824 LearningRate 0.0000 Epoch: 38 Global Step: 804080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:05,676-Speed 6292.63 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 38 Global Step: 804090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:08,932-Speed 6291.13 samples/sec Loss 2.5496 LearningRate 0.0000 Epoch: 38 Global Step: 804100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:12,188-Speed 6292.16 samples/sec Loss 2.4998 LearningRate 0.0000 Epoch: 38 Global Step: 804110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:15,447-Speed 6284.46 
samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 38 Global Step: 804120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:18,705-Speed 6287.31 samples/sec Loss 2.5482 LearningRate 0.0000 Epoch: 38 Global Step: 804130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:21,937-Speed 6339.40 samples/sec Loss 2.5289 LearningRate 0.0000 Epoch: 38 Global Step: 804140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:25,192-Speed 6292.28 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 38 Global Step: 804150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:28,443-Speed 6300.78 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 38 Global Step: 804160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:31,714-Speed 6263.35 samples/sec Loss 2.5767 LearningRate 0.0000 Epoch: 38 Global Step: 804170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:34,976-Speed 6279.59 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 38 Global Step: 804180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:38,223-Speed 6307.92 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 38 Global Step: 804190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:41,475-Speed 6299.65 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 38 Global Step: 804200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:44,731-Speed 6291.66 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 38 Global Step: 804210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:47,985-Speed 6295.04 samples/sec Loss 2.5339 LearningRate 0.0000 Epoch: 38 Global Step: 804220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:51,238-Speed 6297.37 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 804230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:54,480-Speed 6317.26 samples/sec Loss 2.5386 LearningRate 0.0000 Epoch: 38 
Global Step: 804240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:54:57,734-Speed 6295.98 samples/sec Loss 2.5509 LearningRate 0.0000 Epoch: 38 Global Step: 804250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:00,981-Speed 6309.40 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 38 Global Step: 804260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:04,238-Speed 6290.22 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 38 Global Step: 804270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:07,490-Speed 6299.03 samples/sec Loss 2.5735 LearningRate 0.0000 Epoch: 38 Global Step: 804280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:10,742-Speed 6298.86 samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 38 Global Step: 804290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:14,011-Speed 6265.84 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 38 Global Step: 804300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:17,277-Speed 6271.64 samples/sec Loss 2.5641 LearningRate 0.0000 Epoch: 38 Global Step: 804310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:20,518-Speed 6319.89 samples/sec Loss 2.5800 LearningRate 0.0000 Epoch: 38 Global Step: 804320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:23,770-Speed 6299.94 samples/sec Loss 2.5585 LearningRate 0.0000 Epoch: 38 Global Step: 804330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:27,008-Speed 6325.19 samples/sec Loss 2.6081 LearningRate 0.0000 Epoch: 38 Global Step: 804340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:30,268-Speed 6285.05 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 38 Global Step: 804350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:33,516-Speed 6306.79 samples/sec Loss 2.5102 LearningRate 0.0000 Epoch: 38 Global Step: 804360 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:55:36,765-Speed 6304.34 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 38 Global Step: 804370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:40,028-Speed 6278.18 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 38 Global Step: 804380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:43,282-Speed 6295.52 samples/sec Loss 2.4882 LearningRate 0.0000 Epoch: 38 Global Step: 804390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:46,542-Speed 6281.93 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 38 Global Step: 804400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:49,803-Speed 6283.36 samples/sec Loss 2.6073 LearningRate 0.0000 Epoch: 38 Global Step: 804410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:53,056-Speed 6300.75 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 38 Global Step: 804420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:56,306-Speed 6302.62 samples/sec Loss 2.5003 LearningRate 0.0000 Epoch: 38 Global Step: 804430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:55:59,550-Speed 6313.20 samples/sec Loss 2.5061 LearningRate 0.0000 Epoch: 38 Global Step: 804440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:02,806-Speed 6291.44 samples/sec Loss 2.5756 LearningRate 0.0000 Epoch: 38 Global Step: 804450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:06,060-Speed 6295.53 samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 38 Global Step: 804460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:09,312-Speed 6300.95 samples/sec Loss 2.4860 LearningRate 0.0000 Epoch: 38 Global Step: 804470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:12,583-Speed 6262.43 samples/sec Loss 2.5752 LearningRate 0.0000 Epoch: 38 Global Step: 804480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:15,837-Speed 6294.37 
samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 38 Global Step: 804490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:19,090-Speed 6296.17 samples/sec Loss 2.6012 LearningRate 0.0000 Epoch: 38 Global Step: 804500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:22,344-Speed 6296.90 samples/sec Loss 2.5611 LearningRate 0.0000 Epoch: 38 Global Step: 804510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:25,600-Speed 6290.04 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 38 Global Step: 804520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:28,875-Speed 6255.65 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 38 Global Step: 804530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:32,233-Speed 6101.17 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 38 Global Step: 804540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:35,503-Speed 6263.59 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 38 Global Step: 804550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:38,752-Speed 6308.00 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 38 Global Step: 804560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:42,008-Speed 6291.64 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 38 Global Step: 804570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:45,300-Speed 6223.02 samples/sec Loss 2.5250 LearningRate 0.0000 Epoch: 38 Global Step: 804580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:48,549-Speed 6304.01 samples/sec Loss 2.5245 LearningRate 0.0000 Epoch: 38 Global Step: 804590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:51,808-Speed 6285.16 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 38 Global Step: 804600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:55,061-Speed 6297.68 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 38 
Global Step: 804610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:56:58,322-Speed 6282.31 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 38 Global Step: 804620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:01,567-Speed 6310.96 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 38 Global Step: 804630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:04,810-Speed 6316.90 samples/sec Loss 2.5528 LearningRate 0.0000 Epoch: 38 Global Step: 804640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:08,064-Speed 6294.59 samples/sec Loss 2.5956 LearningRate 0.0000 Epoch: 38 Global Step: 804650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:11,315-Speed 6302.21 samples/sec Loss 2.5712 LearningRate 0.0000 Epoch: 38 Global Step: 804660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:14,570-Speed 6294.92 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 38 Global Step: 804670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:17,827-Speed 6289.60 samples/sec Loss 2.5829 LearningRate 0.0000 Epoch: 38 Global Step: 804680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:21,077-Speed 6301.69 samples/sec Loss 2.5284 LearningRate 0.0000 Epoch: 38 Global Step: 804690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:24,341-Speed 6276.87 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 38 Global Step: 804700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:27,588-Speed 6307.43 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 38 Global Step: 804710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:30,842-Speed 6295.66 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 38 Global Step: 804720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:34,103-Speed 6281.55 samples/sec Loss 2.5322 LearningRate 0.0000 Epoch: 38 Global Step: 804730 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:57:37,360-Speed 6289.37 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 804740 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 16:57:40,607-Speed 6308.31 samples/sec Loss 2.5103 LearningRate 0.0000 Epoch: 38 Global Step: 804750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:43,857-Speed 6303.30 samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 38 Global Step: 804760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:47,114-Speed 6290.30 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 38 Global Step: 804770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:50,372-Speed 6288.00 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 38 Global Step: 804780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:53,626-Speed 6293.45 samples/sec Loss 2.5661 LearningRate 0.0000 Epoch: 38 Global Step: 804790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:57:56,875-Speed 6304.83 samples/sec Loss 2.4892 LearningRate 0.0000 Epoch: 38 Global Step: 804800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:00,151-Speed 6252.82 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 38 Global Step: 804810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:03,409-Speed 6289.31 samples/sec Loss 2.5269 LearningRate 0.0000 Epoch: 38 Global Step: 804820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:06,668-Speed 6283.71 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 38 Global Step: 804830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:09,916-Speed 6307.61 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 38 Global Step: 804840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:13,157-Speed 6319.81 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 38 Global Step: 804850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:16,413-Speed 6291.81 
samples/sec Loss 2.5021 LearningRate 0.0000 Epoch: 38 Global Step: 804860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:19,671-Speed 6288.18 samples/sec Loss 2.4769 LearningRate 0.0000 Epoch: 38 Global Step: 804870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:22,930-Speed 6287.19 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 38 Global Step: 804880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:26,187-Speed 6288.10 samples/sec Loss 2.4992 LearningRate 0.0000 Epoch: 38 Global Step: 804890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:29,461-Speed 6257.96 samples/sec Loss 2.5350 LearningRate 0.0000 Epoch: 38 Global Step: 804900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:32,719-Speed 6286.01 samples/sec Loss 2.5336 LearningRate 0.0000 Epoch: 38 Global Step: 804910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:35,973-Speed 6296.33 samples/sec Loss 2.6167 LearningRate 0.0000 Epoch: 38 Global Step: 804920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:39,226-Speed 6296.43 samples/sec Loss 2.5057 LearningRate 0.0000 Epoch: 38 Global Step: 804930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:42,485-Speed 6285.93 samples/sec Loss 2.5248 LearningRate 0.0000 Epoch: 38 Global Step: 804940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:45,720-Speed 6331.69 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 38 Global Step: 804950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:48,979-Speed 6286.99 samples/sec Loss 2.5558 LearningRate 0.0000 Epoch: 38 Global Step: 804960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:52,230-Speed 6300.10 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 38 Global Step: 804970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:55,484-Speed 6295.08 samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 38 
Global Step: 804980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:58:58,743-Speed 6284.69 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 804990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:02,000-Speed 6289.25 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 38 Global Step: 805000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:05,253-Speed 6297.96 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 38 Global Step: 805010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:08,501-Speed 6306.72 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 805020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:11,750-Speed 6305.50 samples/sec Loss 2.5419 LearningRate 0.0000 Epoch: 38 Global Step: 805030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:14,998-Speed 6305.88 samples/sec Loss 2.5092 LearningRate 0.0000 Epoch: 38 Global Step: 805040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:18,240-Speed 6319.40 samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 38 Global Step: 805050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:21,497-Speed 6289.34 samples/sec Loss 2.5688 LearningRate 0.0000 Epoch: 38 Global Step: 805060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:24,752-Speed 6293.17 samples/sec Loss 2.4970 LearningRate 0.0000 Epoch: 38 Global Step: 805070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:28,002-Speed 6301.79 samples/sec Loss 2.5285 LearningRate 0.0000 Epoch: 38 Global Step: 805080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:31,253-Speed 6302.81 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 38 Global Step: 805090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:34,518-Speed 6274.48 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 38 Global Step: 805100 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 16:59:37,778-Speed 6283.61 samples/sec Loss 2.5024 LearningRate 0.0000 Epoch: 38 Global Step: 805110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:41,020-Speed 6317.16 samples/sec Loss 2.5121 LearningRate 0.0000 Epoch: 38 Global Step: 805120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:44,271-Speed 6302.44 samples/sec Loss 2.5027 LearningRate 0.0000 Epoch: 38 Global Step: 805130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:47,519-Speed 6305.61 samples/sec Loss 2.5593 LearningRate 0.0000 Epoch: 38 Global Step: 805140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:50,770-Speed 6301.15 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 38 Global Step: 805150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:54,033-Speed 6277.70 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 38 Global Step: 805160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 16:59:57,286-Speed 6297.57 samples/sec Loss 2.5783 LearningRate 0.0000 Epoch: 38 Global Step: 805170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:00,547-Speed 6282.31 samples/sec Loss 2.5706 LearningRate 0.0000 Epoch: 38 Global Step: 805180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:03,803-Speed 6291.26 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 38 Global Step: 805190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:07,053-Speed 6301.91 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 38 Global Step: 805200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:10,303-Speed 6302.13 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 38 Global Step: 805210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:13,564-Speed 6282.23 samples/sec Loss 2.5520 LearningRate 0.0000 Epoch: 38 Global Step: 805220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:16,816-Speed 6299.12 
samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 805230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:20,071-Speed 6293.07 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 38 Global Step: 805240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:23,314-Speed 6316.37 samples/sec Loss 2.5529 LearningRate 0.0000 Epoch: 38 Global Step: 805250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:26,586-Speed 6261.42 samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 38 Global Step: 805260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:29,842-Speed 6289.87 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 38 Global Step: 805270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:33,101-Speed 6287.64 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 38 Global Step: 805280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:36,359-Speed 6285.98 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 805290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:39,615-Speed 6290.75 samples/sec Loss 2.6087 LearningRate 0.0000 Epoch: 38 Global Step: 805300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:42,880-Speed 6276.81 samples/sec Loss 2.5507 LearningRate 0.0000 Epoch: 38 Global Step: 805310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:46,137-Speed 6287.86 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 38 Global Step: 805320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:49,388-Speed 6302.03 samples/sec Loss 2.5740 LearningRate 0.0000 Epoch: 38 Global Step: 805330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:52,643-Speed 6294.26 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 38 Global Step: 805340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:00:55,896-Speed 6296.13 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 38 
Global Step: 805350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:00:59,133-Speed 6327.27 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 38 Global Step: 805360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:02,388-Speed 6296.09 samples/sec Loss 2.5106 LearningRate 0.0000 Epoch: 38 Global Step: 805370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:05,638-Speed 6302.13 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 805380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:08,889-Speed 6299.64 samples/sec Loss 2.5005 LearningRate 0.0000 Epoch: 38 Global Step: 805390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:12,144-Speed 6294.23 samples/sec Loss 2.5429 LearningRate 0.0000 Epoch: 38 Global Step: 805400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:15,392-Speed 6306.43 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 38 Global Step: 805410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:18,643-Speed 6300.70 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 38 Global Step: 805420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:21,898-Speed 6294.42 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 38 Global Step: 805430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:25,151-Speed 6297.20 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 38 Global Step: 805440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:28,401-Speed 6303.00 samples/sec Loss 2.4965 LearningRate 0.0000 Epoch: 38 Global Step: 805450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:31,637-Speed 6329.05 samples/sec Loss 2.5591 LearningRate 0.0000 Epoch: 38 Global Step: 805460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:34,913-Speed 6253.24 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 38 Global Step: 805470 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:01:38,169-Speed 6291.21 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 38 Global Step: 805480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:41,527-Speed 6100.32 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 38 Global Step: 805490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:44,789-Speed 6279.05 samples/sec Loss 2.5855 LearningRate 0.0000 Epoch: 38 Global Step: 805500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:48,042-Speed 6299.40 samples/sec Loss 2.4960 LearningRate 0.0000 Epoch: 38 Global Step: 805510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:51,294-Speed 6297.99 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 38 Global Step: 805520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:54,546-Speed 6299.04 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 38 Global Step: 805530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:01:57,796-Speed 6304.66 samples/sec Loss 2.5490 LearningRate 0.0000 Epoch: 38 Global Step: 805540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:01,095-Speed 6209.19 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 38 Global Step: 805550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:04,335-Speed 6321.41 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 38 Global Step: 805560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:07,588-Speed 6296.47 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 38 Global Step: 805570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:10,848-Speed 6285.05 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 38 Global Step: 805580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:14,101-Speed 6296.42 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 38 Global Step: 805590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:17,352-Speed 6300.90 
samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 38 Global Step: 805600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:20,607-Speed 6292.79 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 38 Global Step: 805610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:23,862-Speed 6292.98 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 38 Global Step: 805620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:27,112-Speed 6303.84 samples/sec Loss 2.5908 LearningRate 0.0000 Epoch: 38 Global Step: 805630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:30,366-Speed 6294.61 samples/sec Loss 2.5200 LearningRate 0.0000 Epoch: 38 Global Step: 805640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:33,615-Speed 6305.31 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 38 Global Step: 805650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:36,868-Speed 6297.99 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 38 Global Step: 805660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:02:40,112-Speed 6314.08 samples/sec Loss 2.5177 LearningRate 0.0000 Epoch: 38 Global Step: 805670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:43,367-Speed 6293.78 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 38 Global Step: 805680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:46,621-Speed 6293.68 samples/sec Loss 2.5219 LearningRate 0.0000 Epoch: 38 Global Step: 805690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:49,951-Speed 6152.63 samples/sec Loss 2.5200 LearningRate 0.0000 Epoch: 38 Global Step: 805700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:53,204-Speed 6296.00 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 38 Global Step: 805710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:56,465-Speed 6282.97 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 
Global Step: 805720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:02:59,717-Speed 6299.88 samples/sec Loss 2.4918 LearningRate 0.0000 Epoch: 38 Global Step: 805730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:02,970-Speed 6297.00 samples/sec Loss 2.5058 LearningRate 0.0000 Epoch: 38 Global Step: 805740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:06,233-Speed 6277.32 samples/sec Loss 2.5606 LearningRate 0.0000 Epoch: 38 Global Step: 805750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:09,494-Speed 6280.97 samples/sec Loss 2.5594 LearningRate 0.0000 Epoch: 38 Global Step: 805760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:12,736-Speed 6319.77 samples/sec Loss 2.5364 LearningRate 0.0000 Epoch: 38 Global Step: 805770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:15,992-Speed 6290.36 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 38 Global Step: 805780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:19,255-Speed 6279.04 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 38 Global Step: 805790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:22,516-Speed 6281.48 samples/sec Loss 2.5157 LearningRate 0.0000 Epoch: 38 Global Step: 805800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:25,767-Speed 6300.94 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 38 Global Step: 805810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:29,037-Speed 6264.22 samples/sec Loss 2.5192 LearningRate 0.0000 Epoch: 38 Global Step: 805820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:32,293-Speed 6292.42 samples/sec Loss 2.5432 LearningRate 0.0000 Epoch: 38 Global Step: 805830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:35,546-Speed 6295.73 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 805840 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:03:38,804-Speed 6286.84 samples/sec Loss 2.5568 LearningRate 0.0000 Epoch: 38 Global Step: 805850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:42,062-Speed 6288.81 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 38 Global Step: 805860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:45,306-Speed 6313.91 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 38 Global Step: 805870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:48,563-Speed 6289.64 samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 38 Global Step: 805880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:51,813-Speed 6303.12 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 38 Global Step: 805890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:55,074-Speed 6282.42 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 38 Global Step: 805900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:03:58,337-Speed 6277.06 samples/sec Loss 2.5416 LearningRate 0.0000 Epoch: 38 Global Step: 805910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:01,589-Speed 6299.85 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 38 Global Step: 805920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:04,905-Speed 6177.04 samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 38 Global Step: 805930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:08,149-Speed 6315.29 samples/sec Loss 2.5110 LearningRate 0.0000 Epoch: 38 Global Step: 805940 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:11,415-Speed 6272.20 samples/sec Loss 2.4972 LearningRate 0.0000 Epoch: 38 Global Step: 805950 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:14,671-Speed 6292.12 samples/sec Loss 2.5141 LearningRate 0.0000 Epoch: 38 Global Step: 805960 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:17,934-Speed 6277.81 
samples/sec Loss 2.4930 LearningRate 0.0000 Epoch: 38 Global Step: 805970 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:21,188-Speed 6293.93 samples/sec Loss 2.5719 LearningRate 0.0000 Epoch: 38 Global Step: 805980 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:24,438-Speed 6307.00 samples/sec Loss 2.5416 LearningRate 0.0000 Epoch: 38 Global Step: 805990 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:27,693-Speed 6293.58 samples/sec Loss 2.5227 LearningRate 0.0000 Epoch: 38 Global Step: 806000 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:30,947-Speed 6296.54 samples/sec Loss 2.5549 LearningRate 0.0000 Epoch: 38 Global Step: 806010 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:34,202-Speed 6293.51 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 38 Global Step: 806020 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:37,458-Speed 6290.44 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 38 Global Step: 806030 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:04:40,713-Speed 6293.72 samples/sec Loss 2.4797 LearningRate 0.0000 Epoch: 38 Global Step: 806040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:43,968-Speed 6293.42 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 38 Global Step: 806050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:47,216-Speed 6306.65 samples/sec Loss 2.5248 LearningRate 0.0000 Epoch: 38 Global Step: 806060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:50,471-Speed 6292.23 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 806070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:53,724-Speed 6297.46 samples/sec Loss 2.5718 LearningRate 0.0000 Epoch: 38 Global Step: 806080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:04:56,976-Speed 6299.51 samples/sec Loss 2.5014 LearningRate 0.0000 Epoch: 38 
Global Step: 806090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:00,229-Speed 6297.33 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 806100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:03,490-Speed 6282.01 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 38 Global Step: 806110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:06,745-Speed 6292.31 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 38 Global Step: 806120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:09,993-Speed 6307.72 samples/sec Loss 2.4653 LearningRate 0.0000 Epoch: 38 Global Step: 806130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:13,235-Speed 6319.12 samples/sec Loss 2.5446 LearningRate 0.0000 Epoch: 38 Global Step: 806140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:16,482-Speed 6308.07 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 38 Global Step: 806150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:19,734-Speed 6298.98 samples/sec Loss 2.5667 LearningRate 0.0000 Epoch: 38 Global Step: 806160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:22,984-Speed 6303.14 samples/sec Loss 2.5309 LearningRate 0.0000 Epoch: 38 Global Step: 806170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:26,235-Speed 6302.10 samples/sec Loss 2.5370 LearningRate 0.0000 Epoch: 38 Global Step: 806180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:29,491-Speed 6292.05 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 38 Global Step: 806190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:32,750-Speed 6283.80 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 38 Global Step: 806200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:36,011-Speed 6282.47 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 38 Global Step: 806210 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:05:39,261-Speed 6303.22 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 38 Global Step: 806220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:42,510-Speed 6303.71 samples/sec Loss 2.5728 LearningRate 0.0000 Epoch: 38 Global Step: 806230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:45,751-Speed 6321.96 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 38 Global Step: 806240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:49,007-Speed 6290.20 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 38 Global Step: 806250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:52,252-Speed 6313.16 samples/sec Loss 2.6058 LearningRate 0.0000 Epoch: 38 Global Step: 806260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:55,521-Speed 6265.42 samples/sec Loss 2.5244 LearningRate 0.0000 Epoch: 38 Global Step: 806270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:05:58,785-Speed 6276.97 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 38 Global Step: 806280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:02,040-Speed 6292.87 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 38 Global Step: 806290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:05,284-Speed 6315.16 samples/sec Loss 2.4764 LearningRate 0.0000 Epoch: 38 Global Step: 806300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:08,553-Speed 6265.11 samples/sec Loss 2.5153 LearningRate 0.0000 Epoch: 38 Global Step: 806310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:11,802-Speed 6305.49 samples/sec Loss 2.5687 LearningRate 0.0000 Epoch: 38 Global Step: 806320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:15,057-Speed 6293.04 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 38 Global Step: 806330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:18,308-Speed 6300.24 
samples/sec Loss 2.5655 LearningRate 0.0000 Epoch: 38 Global Step: 806340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:06:21,548-Speed 6324.67 samples/sec Loss 2.5263 LearningRate 0.0000 Epoch: 38 Global Step: 806350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:24,806-Speed 6286.79 samples/sec Loss 2.4880 LearningRate 0.0000 Epoch: 38 Global Step: 806360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:28,061-Speed 6293.00 samples/sec Loss 2.5881 LearningRate 0.0000 Epoch: 38 Global Step: 806370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:31,318-Speed 6291.16 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 38 Global Step: 806380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:34,570-Speed 6297.59 samples/sec Loss 2.5210 LearningRate 0.0000 Epoch: 38 Global Step: 806390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:37,834-Speed 6276.15 samples/sec Loss 2.4815 LearningRate 0.0000 Epoch: 38 Global Step: 806400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:41,093-Speed 6285.71 samples/sec Loss 2.5111 LearningRate 0.0000 Epoch: 38 Global Step: 806410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:44,349-Speed 6291.00 samples/sec Loss 2.5283 LearningRate 0.0000 Epoch: 38 Global Step: 806420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:47,610-Speed 6282.33 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 38 Global Step: 806430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:50,862-Speed 6298.41 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 38 Global Step: 806440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:54,101-Speed 6324.95 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 806450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:06:57,358-Speed 6289.33 samples/sec Loss 2.5454 LearningRate 0.0000 Epoch: 38 
Global Step: 806460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:00,615-Speed 6288.77 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 38 Global Step: 806470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:03,866-Speed 6302.22 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 38 Global Step: 806480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:07,133-Speed 6269.67 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 38 Global Step: 806490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:10,386-Speed 6296.26 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 38 Global Step: 806500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:13,644-Speed 6286.89 samples/sec Loss 2.5666 LearningRate 0.0000 Epoch: 38 Global Step: 806510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:16,892-Speed 6308.41 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 38 Global Step: 806520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:20,144-Speed 6298.68 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 38 Global Step: 806530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:23,393-Speed 6304.61 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 806540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:26,643-Speed 6302.48 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 38 Global Step: 806550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:07:29,883-Speed 6324.62 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 38 Global Step: 806560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:33,137-Speed 6294.57 samples/sec Loss 2.5307 LearningRate 0.0000 Epoch: 38 Global Step: 806570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:36,403-Speed 6272.25 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 38 Global Step: 806580 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:07:39,656-Speed 6296.33 samples/sec Loss 2.5618 LearningRate 0.0000 Epoch: 38 Global Step: 806590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:42,915-Speed 6285.38 samples/sec Loss 2.5170 LearningRate 0.0000 Epoch: 38 Global Step: 806600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:46,181-Speed 6272.03 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 38 Global Step: 806610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:49,431-Speed 6303.76 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 38 Global Step: 806620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:52,688-Speed 6289.01 samples/sec Loss 2.5459 LearningRate 0.0000 Epoch: 38 Global Step: 806630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:55,939-Speed 6302.06 samples/sec Loss 2.5519 LearningRate 0.0000 Epoch: 38 Global Step: 806640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:07:59,193-Speed 6293.71 samples/sec Loss 2.5075 LearningRate 0.0000 Epoch: 38 Global Step: 806650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:02,439-Speed 6311.20 samples/sec Loss 2.5387 LearningRate 0.0000 Epoch: 38 Global Step: 806660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:05,700-Speed 6282.18 samples/sec Loss 2.5831 LearningRate 0.0000 Epoch: 38 Global Step: 806670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:08,952-Speed 6299.34 samples/sec Loss 2.5888 LearningRate 0.0000 Epoch: 38 Global Step: 806680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:12,199-Speed 6308.85 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 38 Global Step: 806690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:15,455-Speed 6289.56 samples/sec Loss 2.5766 LearningRate 0.0000 Epoch: 38 Global Step: 806700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:18,702-Speed 6310.27 
samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 38 Global Step: 806710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:21,954-Speed 6299.05 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 38 Global Step: 806720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:25,208-Speed 6293.81 samples/sec Loss 2.5784 LearningRate 0.0000 Epoch: 38 Global Step: 806730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:28,473-Speed 6275.36 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 38 Global Step: 806740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:31,720-Speed 6307.63 samples/sec Loss 2.5140 LearningRate 0.0000 Epoch: 38 Global Step: 806750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:34,959-Speed 6325.10 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 38 Global Step: 806760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:38,215-Speed 6291.12 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 38 Global Step: 806770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:41,481-Speed 6272.09 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 38 Global Step: 806780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:44,735-Speed 6295.52 samples/sec Loss 2.5547 LearningRate 0.0000 Epoch: 38 Global Step: 806790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:47,986-Speed 6301.65 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 38 Global Step: 806800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:51,237-Speed 6300.69 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 38 Global Step: 806810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:54,486-Speed 6305.82 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 38 Global Step: 806820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:08:57,743-Speed 6288.70 samples/sec Loss 2.5586 LearningRate 0.0000 Epoch: 38 
Global Step: 806830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:00,999-Speed 6294.76 samples/sec Loss 2.5579 LearningRate 0.0000 Epoch: 38 Global Step: 806840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:04,259-Speed 6283.58 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 38 Global Step: 806850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:07,491-Speed 6338.26 samples/sec Loss 2.5130 LearningRate 0.0000 Epoch: 38 Global Step: 806860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:10,750-Speed 6284.97 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 806870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:14,003-Speed 6297.77 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 38 Global Step: 806880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:17,239-Speed 6330.34 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 38 Global Step: 806890 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:20,495-Speed 6290.22 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 38 Global Step: 806900 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:23,762-Speed 6270.41 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 806910 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:27,017-Speed 6293.99 samples/sec Loss 2.5482 LearningRate 0.0000 Epoch: 38 Global Step: 806920 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:30,261-Speed 6314.19 samples/sec Loss 2.5387 LearningRate 0.0000 Epoch: 38 Global Step: 806930 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:33,512-Speed 6300.23 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 38 Global Step: 806940 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:36,775-Speed 6278.95 samples/sec Loss 2.5774 LearningRate 0.0000 Epoch: 38 Global Step: 806950 Fp16 Grad Scale: 2048 Required: 2 
hours Training: 2022-04-03 17:09:40,031-Speed 6290.61 samples/sec Loss 2.5513 LearningRate 0.0000 Epoch: 38 Global Step: 806960 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:43,285-Speed 6296.27 samples/sec Loss 2.5628 LearningRate 0.0000 Epoch: 38 Global Step: 806970 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:46,535-Speed 6302.23 samples/sec Loss 2.5808 LearningRate 0.0000 Epoch: 38 Global Step: 806980 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:09:49,793-Speed 6288.52 samples/sec Loss 2.4893 LearningRate 0.0000 Epoch: 38 Global Step: 806990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:53,097-Speed 6199.96 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 38 Global Step: 807000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:56,353-Speed 6291.78 samples/sec Loss 2.4951 LearningRate 0.0000 Epoch: 38 Global Step: 807010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:09:59,602-Speed 6304.58 samples/sec Loss 2.5166 LearningRate 0.0000 Epoch: 38 Global Step: 807020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:02,855-Speed 6296.10 samples/sec Loss 2.5149 LearningRate 0.0000 Epoch: 38 Global Step: 807030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:06,124-Speed 6268.08 samples/sec Loss 2.5166 LearningRate 0.0000 Epoch: 38 Global Step: 807040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:09,375-Speed 6299.75 samples/sec Loss 2.5270 LearningRate 0.0000 Epoch: 38 Global Step: 807050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:12,633-Speed 6288.62 samples/sec Loss 2.5048 LearningRate 0.0000 Epoch: 38 Global Step: 807060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:15,893-Speed 6282.20 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 38 Global Step: 807070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:19,145-Speed 6300.00 
samples/sec Loss 2.5816 LearningRate 0.0000 Epoch: 38 Global Step: 807080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:22,408-Speed 6277.78 samples/sec Loss 2.5532 LearningRate 0.0000 Epoch: 38 Global Step: 807090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:10:25,645-Speed 6326.74 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 38 Global Step: 807100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:28,906-Speed 6283.51 samples/sec Loss 2.5906 LearningRate 0.0000 Epoch: 38 Global Step: 807110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:32,164-Speed 6287.42 samples/sec Loss 2.5836 LearningRate 0.0000 Epoch: 38 Global Step: 807120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:35,413-Speed 6304.96 samples/sec Loss 2.5819 LearningRate 0.0000 Epoch: 38 Global Step: 807130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:38,672-Speed 6284.47 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 38 Global Step: 807140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:41,950-Speed 6249.63 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 38 Global Step: 807150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:45,199-Speed 6304.50 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 38 Global Step: 807160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:48,450-Speed 6301.18 samples/sec Loss 2.5317 LearningRate 0.0000 Epoch: 38 Global Step: 807170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:51,701-Speed 6301.23 samples/sec Loss 2.5181 LearningRate 0.0000 Epoch: 38 Global Step: 807180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:54,951-Speed 6302.79 samples/sec Loss 2.5085 LearningRate 0.0000 Epoch: 38 Global Step: 807190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:10:58,187-Speed 6330.65 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 38 
Global Step: 807200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:01,437-Speed 6303.51 samples/sec Loss 2.5557 LearningRate 0.0000 Epoch: 38 Global Step: 807210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:04,691-Speed 6295.10 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 38 Global Step: 807220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:07,940-Speed 6305.69 samples/sec Loss 2.5673 LearningRate 0.0000 Epoch: 38 Global Step: 807230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:11,193-Speed 6296.65 samples/sec Loss 2.5807 LearningRate 0.0000 Epoch: 38 Global Step: 807240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:14,443-Speed 6301.75 samples/sec Loss 2.5713 LearningRate 0.0000 Epoch: 38 Global Step: 807250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:17,702-Speed 6286.13 samples/sec Loss 2.5536 LearningRate 0.0000 Epoch: 38 Global Step: 807260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:20,952-Speed 6303.23 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 38 Global Step: 807270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:24,205-Speed 6296.79 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 38 Global Step: 807280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:27,449-Speed 6314.99 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 38 Global Step: 807290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:30,686-Speed 6328.23 samples/sec Loss 2.4675 LearningRate 0.0000 Epoch: 38 Global Step: 807300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:33,933-Speed 6308.18 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 38 Global Step: 807310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:37,184-Speed 6301.78 samples/sec Loss 2.5429 LearningRate 0.0000 Epoch: 38 Global Step: 807320 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:11:40,439-Speed 6292.65 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 38 Global Step: 807330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:43,694-Speed 6293.73 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 38 Global Step: 807340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:46,954-Speed 6283.22 samples/sec Loss 2.5384 LearningRate 0.0000 Epoch: 38 Global Step: 807350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:50,203-Speed 6304.56 samples/sec Loss 2.4946 LearningRate 0.0000 Epoch: 38 Global Step: 807360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:53,453-Speed 6303.56 samples/sec Loss 2.5021 LearningRate 0.0000 Epoch: 38 Global Step: 807370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:56,722-Speed 6266.22 samples/sec Loss 2.5622 LearningRate 0.0000 Epoch: 38 Global Step: 807380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:11:59,971-Speed 6304.17 samples/sec Loss 2.5001 LearningRate 0.0000 Epoch: 38 Global Step: 807390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:03,209-Speed 6327.31 samples/sec Loss 2.5370 LearningRate 0.0000 Epoch: 38 Global Step: 807400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:06,459-Speed 6303.37 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 38 Global Step: 807410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:09,732-Speed 6256.78 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 38 Global Step: 807420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:12,989-Speed 6290.82 samples/sec Loss 2.5255 LearningRate 0.0000 Epoch: 38 Global Step: 807430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:16,248-Speed 6285.44 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 38 Global Step: 807440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:19,500-Speed 6299.99 
samples/sec Loss 2.5754 LearningRate 0.0000 Epoch: 38 Global Step: 807450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:22,751-Speed 6300.88 samples/sec Loss 2.5786 LearningRate 0.0000 Epoch: 38 Global Step: 807460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:26,006-Speed 6293.88 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 38 Global Step: 807470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:29,255-Speed 6303.52 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 38 Global Step: 807480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:32,504-Speed 6304.89 samples/sec Loss 2.5088 LearningRate 0.0000 Epoch: 38 Global Step: 807490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:35,752-Speed 6308.54 samples/sec Loss 2.5551 LearningRate 0.0000 Epoch: 38 Global Step: 807500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:39,008-Speed 6291.22 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 38 Global Step: 807510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:42,262-Speed 6293.46 samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 38 Global Step: 807520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:45,522-Speed 6284.07 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 38 Global Step: 807530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:48,786-Speed 6277.22 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 38 Global Step: 807540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:52,057-Speed 6262.38 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 38 Global Step: 807550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:55,311-Speed 6293.95 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 38 Global Step: 807560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:12:58,561-Speed 6303.93 samples/sec Loss 2.5485 LearningRate 0.0000 Epoch: 38 
Global Step: 807570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:01,813-Speed 6297.80 samples/sec Loss 2.5720 LearningRate 0.0000 Epoch: 38 Global Step: 807580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:05,065-Speed 6300.36 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 807590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:08,320-Speed 6292.07 samples/sec Loss 2.5209 LearningRate 0.0000 Epoch: 38 Global Step: 807600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:13:11,554-Speed 6335.43 samples/sec Loss 2.5099 LearningRate 0.0000 Epoch: 38 Global Step: 807610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:14,809-Speed 6291.71 samples/sec Loss 2.5562 LearningRate 0.0000 Epoch: 38 Global Step: 807620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:18,063-Speed 6296.11 samples/sec Loss 2.5636 LearningRate 0.0000 Epoch: 38 Global Step: 807630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:21,314-Speed 6300.79 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 Global Step: 807640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:24,568-Speed 6294.96 samples/sec Loss 2.5284 LearningRate 0.0000 Epoch: 38 Global Step: 807650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:27,827-Speed 6287.17 samples/sec Loss 2.5076 LearningRate 0.0000 Epoch: 38 Global Step: 807660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:31,083-Speed 6291.47 samples/sec Loss 2.5318 LearningRate 0.0000 Epoch: 38 Global Step: 807670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:34,337-Speed 6294.94 samples/sec Loss 2.5958 LearningRate 0.0000 Epoch: 38 Global Step: 807680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:37,594-Speed 6288.62 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 38 Global Step: 807690 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:13:40,845-Speed 6302.04 samples/sec Loss 2.5697 LearningRate 0.0000 Epoch: 38 Global Step: 807700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:44,082-Speed 6328.02 samples/sec Loss 2.5080 LearningRate 0.0000 Epoch: 38 Global Step: 807710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:47,339-Speed 6288.82 samples/sec Loss 2.5483 LearningRate 0.0000 Epoch: 38 Global Step: 807720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:50,591-Speed 6298.81 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 38 Global Step: 807730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:53,860-Speed 6267.43 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 38 Global Step: 807740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:13:57,110-Speed 6302.07 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 38 Global Step: 807750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:00,362-Speed 6299.08 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 38 Global Step: 807760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:03,627-Speed 6273.36 samples/sec Loss 2.5129 LearningRate 0.0000 Epoch: 38 Global Step: 807770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:06,898-Speed 6264.12 samples/sec Loss 2.4935 LearningRate 0.0000 Epoch: 38 Global Step: 807780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:10,168-Speed 6263.82 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 38 Global Step: 807790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:13,428-Speed 6282.96 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 38 Global Step: 807800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:16,664-Speed 6331.31 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 38 Global Step: 807810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:19,922-Speed 6287.01 
samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 38 Global Step: 807820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:23,171-Speed 6303.87 samples/sec Loss 2.5092 LearningRate 0.0000 Epoch: 38 Global Step: 807830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:26,427-Speed 6292.18 samples/sec Loss 2.5051 LearningRate 0.0000 Epoch: 38 Global Step: 807840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:29,682-Speed 6294.31 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 38 Global Step: 807850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:32,936-Speed 6294.10 samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 38 Global Step: 807860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:36,221-Speed 6237.31 samples/sec Loss 2.5575 LearningRate 0.0000 Epoch: 38 Global Step: 807870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:39,593-Speed 6074.46 samples/sec Loss 2.4997 LearningRate 0.0000 Epoch: 38 Global Step: 807880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:42,927-Speed 6143.47 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 38 Global Step: 807890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:46,181-Speed 6295.71 samples/sec Loss 2.5260 LearningRate 0.0000 Epoch: 38 Global Step: 807900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:49,422-Speed 6320.69 samples/sec Loss 2.5377 LearningRate 0.0000 Epoch: 38 Global Step: 807910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:52,692-Speed 6264.39 samples/sec Loss 2.5412 LearningRate 0.0000 Epoch: 38 Global Step: 807920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:55,945-Speed 6296.77 samples/sec Loss 2.5978 LearningRate 0.0000 Epoch: 38 Global Step: 807930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:14:59,195-Speed 6302.35 samples/sec Loss 2.5424 LearningRate 0.0000 Epoch: 38 
Global Step: 807940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:02,463-Speed 6267.93 samples/sec Loss 2.5491 LearningRate 0.0000 Epoch: 38 Global Step: 807950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:05,718-Speed 6293.73 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 38 Global Step: 807960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:08,969-Speed 6302.01 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 38 Global Step: 807970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:12,222-Speed 6296.89 samples/sec Loss 2.5250 LearningRate 0.0000 Epoch: 38 Global Step: 807980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:15,478-Speed 6290.05 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 38 Global Step: 807990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:18,730-Speed 6299.94 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 38 Global Step: 808000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:21,964-Speed 6333.76 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 38 Global Step: 808010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:25,225-Speed 6282.19 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 38 Global Step: 808020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:28,480-Speed 6293.23 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 38 Global Step: 808030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:31,731-Speed 6300.19 samples/sec Loss 2.5139 LearningRate 0.0000 Epoch: 38 Global Step: 808040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:34,981-Speed 6303.30 samples/sec Loss 2.5442 LearningRate 0.0000 Epoch: 38 Global Step: 808050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:38,242-Speed 6281.78 samples/sec Loss 2.4975 LearningRate 0.0000 Epoch: 38 Global Step: 808060 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:15:41,493-Speed 6300.38 samples/sec Loss 2.4988 LearningRate 0.0000 Epoch: 38 Global Step: 808070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:44,746-Speed 6298.13 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 38 Global Step: 808080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:47,992-Speed 6310.70 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 38 Global Step: 808090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:51,245-Speed 6297.08 samples/sec Loss 2.5477 LearningRate 0.0000 Epoch: 38 Global Step: 808100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:15:54,501-Speed 6291.68 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 38 Global Step: 808110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:15:57,753-Speed 6299.85 samples/sec Loss 2.5366 LearningRate 0.0000 Epoch: 38 Global Step: 808120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:01,005-Speed 6298.73 samples/sec Loss 2.4983 LearningRate 0.0000 Epoch: 38 Global Step: 808130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:04,257-Speed 6300.01 samples/sec Loss 2.5481 LearningRate 0.0000 Epoch: 38 Global Step: 808140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:07,505-Speed 6304.93 samples/sec Loss 2.4931 LearningRate 0.0000 Epoch: 38 Global Step: 808150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:10,763-Speed 6287.59 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 38 Global Step: 808160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:14,011-Speed 6308.16 samples/sec Loss 2.5815 LearningRate 0.0000 Epoch: 38 Global Step: 808170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:17,265-Speed 6293.89 samples/sec Loss 2.4977 LearningRate 0.0000 Epoch: 38 Global Step: 808180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:20,514-Speed 6305.88 
samples/sec Loss 2.5138 LearningRate 0.0000 Epoch: 38 Global Step: 808190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:23,777-Speed 6278.07 samples/sec Loss 2.5244 LearningRate 0.0000 Epoch: 38 Global Step: 808200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:27,031-Speed 6293.41 samples/sec Loss 2.5304 LearningRate 0.0000 Epoch: 38 Global Step: 808210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:30,272-Speed 6321.70 samples/sec Loss 2.5383 LearningRate 0.0000 Epoch: 38 Global Step: 808220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:33,521-Speed 6304.84 samples/sec Loss 2.5083 LearningRate 0.0000 Epoch: 38 Global Step: 808230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:36,774-Speed 6297.83 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 38 Global Step: 808240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:40,032-Speed 6285.96 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 38 Global Step: 808250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:16:43,272-Speed 6323.72 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 38 Global Step: 808260 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:16:46,528-Speed 6291.40 samples/sec Loss 2.5453 LearningRate 0.0000 Epoch: 38 Global Step: 808270 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:16:49,781-Speed 6296.30 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 38 Global Step: 808280 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:16:53,132-Speed 6114.31 samples/sec Loss 2.5184 LearningRate 0.0000 Epoch: 38 Global Step: 808290 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:16:56,412-Speed 6243.59 samples/sec Loss 2.5577 LearningRate 0.0000 Epoch: 38 Global Step: 808300 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:16:59,665-Speed 6297.09 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 38 
Global Step: 808310 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:17:02,929-Speed 6277.41 samples/sec Loss 2.5387 LearningRate 0.0000 Epoch: 38 Global Step: 808320 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:17:06,186-Speed 6289.64 samples/sec Loss 2.4901 LearningRate 0.0000 Epoch: 38 Global Step: 808330 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:17:09,434-Speed 6307.18 samples/sec Loss 2.5676 LearningRate 0.0000 Epoch: 38 Global Step: 808340 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:17:12,687-Speed 6296.64 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 38 Global Step: 808350 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:17:15,941-Speed 6295.10 samples/sec Loss 2.5745 LearningRate 0.0000 Epoch: 38 Global Step: 808360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:19,220-Speed 6247.09 samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 38 Global Step: 808370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:22,472-Speed 6298.94 samples/sec Loss 2.5979 LearningRate 0.0000 Epoch: 38 Global Step: 808380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:25,722-Speed 6302.51 samples/sec Loss 2.5235 LearningRate 0.0000 Epoch: 38 Global Step: 808390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:28,970-Speed 6306.91 samples/sec Loss 2.4975 LearningRate 0.0000 Epoch: 38 Global Step: 808400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:32,224-Speed 6293.89 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 38 Global Step: 808410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:35,479-Speed 6294.25 samples/sec Loss 2.5651 LearningRate 0.0000 Epoch: 38 Global Step: 808420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:38,742-Speed 6278.46 samples/sec Loss 2.5046 LearningRate 0.0000 Epoch: 38 Global Step: 808430 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:17:42,008-Speed 6271.52 samples/sec Loss 2.5349 LearningRate 0.0000 Epoch: 38 Global Step: 808440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:45,275-Speed 6270.21 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 38 Global Step: 808450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:48,512-Speed 6328.12 samples/sec Loss 2.5441 LearningRate 0.0000 Epoch: 38 Global Step: 808460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:51,777-Speed 6272.73 samples/sec Loss 2.5329 LearningRate 0.0000 Epoch: 38 Global Step: 808470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:55,092-Speed 6179.13 samples/sec Loss 2.5213 LearningRate 0.0000 Epoch: 38 Global Step: 808480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:17:58,360-Speed 6270.01 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 38 Global Step: 808490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:01,619-Speed 6285.12 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 38 Global Step: 808500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:04,888-Speed 6267.62 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 38 Global Step: 808510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:08,146-Speed 6287.49 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 38 Global Step: 808520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:11,407-Speed 6280.22 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 38 Global Step: 808530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:14,660-Speed 6297.45 samples/sec Loss 2.5850 LearningRate 0.0000 Epoch: 38 Global Step: 808540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:17,925-Speed 6273.82 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 38 Global Step: 808550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:21,161-Speed 6330.02 
samples/sec Loss 2.4784 LearningRate 0.0000 Epoch: 38 Global Step: 808560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:24,416-Speed 6293.68 samples/sec Loss 2.5875 LearningRate 0.0000 Epoch: 38 Global Step: 808570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:27,666-Speed 6302.58 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 38 Global Step: 808580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:30,919-Speed 6298.45 samples/sec Loss 2.5186 LearningRate 0.0000 Epoch: 38 Global Step: 808590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:34,179-Speed 6283.16 samples/sec Loss 2.5801 LearningRate 0.0000 Epoch: 38 Global Step: 808600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:37,434-Speed 6291.85 samples/sec Loss 2.5736 LearningRate 0.0000 Epoch: 38 Global Step: 808610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:40,691-Speed 6290.15 samples/sec Loss 2.4932 LearningRate 0.0000 Epoch: 38 Global Step: 808620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:43,942-Speed 6301.11 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 38 Global Step: 808630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:47,195-Speed 6296.90 samples/sec Loss 2.5254 LearningRate 0.0000 Epoch: 38 Global Step: 808640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:50,449-Speed 6294.89 samples/sec Loss 2.5854 LearningRate 0.0000 Epoch: 38 Global Step: 808650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:18:53,705-Speed 6292.28 samples/sec Loss 2.5520 LearningRate 0.0000 Epoch: 38 Global Step: 808660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:18:56,942-Speed 6327.54 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 38 Global Step: 808670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:00,191-Speed 6304.79 samples/sec Loss 2.5780 LearningRate 0.0000 Epoch: 38 
Global Step: 808680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:03,439-Speed 6306.82 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 38 Global Step: 808690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:06,687-Speed 6307.50 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 38 Global Step: 808700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:09,964-Speed 6251.19 samples/sec Loss 2.5629 LearningRate 0.0000 Epoch: 38 Global Step: 808710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:13,288-Speed 6163.34 samples/sec Loss 2.5306 LearningRate 0.0000 Epoch: 38 Global Step: 808720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:16,543-Speed 6293.60 samples/sec Loss 2.4971 LearningRate 0.0000 Epoch: 38 Global Step: 808730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:19,792-Speed 6304.31 samples/sec Loss 2.5104 LearningRate 0.0000 Epoch: 38 Global Step: 808740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:23,040-Speed 6306.16 samples/sec Loss 2.5186 LearningRate 0.0000 Epoch: 38 Global Step: 808750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:26,302-Speed 6280.26 samples/sec Loss 2.5714 LearningRate 0.0000 Epoch: 38 Global Step: 808760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:19:29,517-Speed 6371.96 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 38 Global Step: 808770 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:32,776-Speed 6285.99 samples/sec Loss 2.5965 LearningRate 0.0000 Epoch: 38 Global Step: 808780 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:36,027-Speed 6299.45 samples/sec Loss 2.5291 LearningRate 0.0000 Epoch: 38 Global Step: 808790 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:39,281-Speed 6295.79 samples/sec Loss 2.4977 LearningRate 0.0000 Epoch: 38 Global Step: 808800 Fp16 Grad Scale: 2048 Required: 2 
hours Training: 2022-04-03 17:19:42,537-Speed 6290.60 samples/sec Loss 2.5451 LearningRate 0.0000 Epoch: 38 Global Step: 808810 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:45,796-Speed 6287.58 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 38 Global Step: 808820 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:49,080-Speed 6236.68 samples/sec Loss 2.5204 LearningRate 0.0000 Epoch: 38 Global Step: 808830 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:19:52,337-Speed 6288.60 samples/sec Loss 2.5731 LearningRate 0.0000 Epoch: 38 Global Step: 808840 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:20:52,199-Speed 342.13 samples/sec Loss 2.5834 LearningRate 0.0000 Epoch: 39 Global Step: 808850 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:20:55,450-Speed 6300.41 samples/sec Loss 2.5851 LearningRate 0.0000 Epoch: 39 Global Step: 808860 Fp16 Grad Scale: 2048 Required: 2 hours Training: 2022-04-03 17:20:58,694-Speed 6313.69 samples/sec Loss 2.4976 LearningRate 0.0000 Epoch: 39 Global Step: 808870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:01,938-Speed 6316.25 samples/sec Loss 2.5128 LearningRate 0.0000 Epoch: 39 Global Step: 808880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:05,183-Speed 6311.89 samples/sec Loss 2.5200 LearningRate 0.0000 Epoch: 39 Global Step: 808890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:08,441-Speed 6288.42 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 39 Global Step: 808900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:11,685-Speed 6314.19 samples/sec Loss 2.5616 LearningRate 0.0000 Epoch: 39 Global Step: 808910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:14,931-Speed 6311.26 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 39 Global Step: 808920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:18,175-Speed 6314.40 
samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 808930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:21,429-Speed 6295.08 samples/sec Loss 2.5537 LearningRate 0.0000 Epoch: 39 Global Step: 808940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:24,777-Speed 6119.29 samples/sec Loss 2.4820 LearningRate 0.0000 Epoch: 39 Global Step: 808950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:28,063-Speed 6233.85 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 39 Global Step: 808960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:31,350-Speed 6232.35 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 39 Global Step: 808970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:34,603-Speed 6296.51 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 39 Global Step: 808980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:37,854-Speed 6300.49 samples/sec Loss 2.5292 LearningRate 0.0000 Epoch: 39 Global Step: 808990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:41,101-Speed 6309.32 samples/sec Loss 2.6017 LearningRate 0.0000 Epoch: 39 Global Step: 809000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:44,358-Speed 6289.53 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 39 Global Step: 809010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:47,611-Speed 6296.66 samples/sec Loss 2.5076 LearningRate 0.0000 Epoch: 39 Global Step: 809020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:50,867-Speed 6291.78 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 39 Global Step: 809030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:54,118-Speed 6299.51 samples/sec Loss 2.5032 LearningRate 0.0000 Epoch: 39 Global Step: 809040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:21:57,370-Speed 6299.02 samples/sec Loss 2.4921 LearningRate 0.0000 Epoch: 39 
Global Step: 809050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:00,623-Speed 6298.46 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 39 Global Step: 809060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:03,859-Speed 6330.06 samples/sec Loss 2.5865 LearningRate 0.0000 Epoch: 39 Global Step: 809070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:07,110-Speed 6300.54 samples/sec Loss 2.5235 LearningRate 0.0000 Epoch: 39 Global Step: 809080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:10,362-Speed 6298.71 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 809090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:13,639-Speed 6251.23 samples/sec Loss 2.5849 LearningRate 0.0000 Epoch: 39 Global Step: 809100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:16,892-Speed 6297.92 samples/sec Loss 2.5301 LearningRate 0.0000 Epoch: 39 Global Step: 809110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:20,142-Speed 6302.49 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 39 Global Step: 809120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:23,390-Speed 6307.30 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 809130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:26,633-Speed 6316.05 samples/sec Loss 2.4845 LearningRate 0.0000 Epoch: 39 Global Step: 809140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:29,878-Speed 6313.56 samples/sec Loss 2.5149 LearningRate 0.0000 Epoch: 39 Global Step: 809150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:33,123-Speed 6313.30 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 39 Global Step: 809160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:36,370-Speed 6308.01 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 39 Global Step: 809170 Fp16 Grad Scale: 8192 Required: 2 
hours Training: 2022-04-03 17:22:39,613-Speed 6316.07 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 809180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:42,860-Speed 6308.53 samples/sec Loss 2.4814 LearningRate 0.0000 Epoch: 39 Global Step: 809190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:46,107-Speed 6309.58 samples/sec Loss 2.5484 LearningRate 0.0000 Epoch: 39 Global Step: 809200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:49,354-Speed 6308.10 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 39 Global Step: 809210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:52,602-Speed 6308.20 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 39 Global Step: 809220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:55,849-Speed 6307.00 samples/sec Loss 2.5275 LearningRate 0.0000 Epoch: 39 Global Step: 809230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:22:59,105-Speed 6293.17 samples/sec Loss 2.5469 LearningRate 0.0000 Epoch: 39 Global Step: 809240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:02,371-Speed 6271.59 samples/sec Loss 2.5008 LearningRate 0.0000 Epoch: 39 Global Step: 809250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:05,612-Speed 6319.79 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 39 Global Step: 809260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:08,857-Speed 6313.29 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 39 Global Step: 809270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:12,085-Speed 6344.86 samples/sec Loss 2.5845 LearningRate 0.0000 Epoch: 39 Global Step: 809280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:15,333-Speed 6307.24 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 39 Global Step: 809290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:18,581-Speed 6306.80 
samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 39 Global Step: 809300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:21,835-Speed 6295.94 samples/sec Loss 2.5047 LearningRate 0.0000 Epoch: 39 Global Step: 809310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:25,085-Speed 6302.81 samples/sec Loss 2.5646 LearningRate 0.0000 Epoch: 39 Global Step: 809320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:28,335-Speed 6304.17 samples/sec Loss 2.5030 LearningRate 0.0000 Epoch: 39 Global Step: 809330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:31,580-Speed 6312.73 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 809340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:34,829-Speed 6304.59 samples/sec Loss 2.5934 LearningRate 0.0000 Epoch: 39 Global Step: 809350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:38,079-Speed 6301.49 samples/sec Loss 2.5323 LearningRate 0.0000 Epoch: 39 Global Step: 809360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:41,325-Speed 6312.13 samples/sec Loss 2.5133 LearningRate 0.0000 Epoch: 39 Global Step: 809370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:44,560-Speed 6331.50 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 809380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:47,803-Speed 6316.53 samples/sec Loss 2.4982 LearningRate 0.0000 Epoch: 39 Global Step: 809390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:51,055-Speed 6299.70 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 39 Global Step: 809400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:54,312-Speed 6288.89 samples/sec Loss 2.5078 LearningRate 0.0000 Epoch: 39 Global Step: 809410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:23:57,563-Speed 6301.23 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 39 
Global Step: 809420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:00,815-Speed 6298.62 samples/sec Loss 2.5694 LearningRate 0.0000 Epoch: 39 Global Step: 809430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:04,080-Speed 6274.47 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 39 Global Step: 809440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:07,323-Speed 6316.00 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 809450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:10,572-Speed 6304.84 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 809460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:13,821-Speed 6306.18 samples/sec Loss 2.5034 LearningRate 0.0000 Epoch: 39 Global Step: 809470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:17,054-Speed 6336.10 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 39 Global Step: 809480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:20,306-Speed 6298.57 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 39 Global Step: 809490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:23,552-Speed 6311.01 samples/sec Loss 2.5161 LearningRate 0.0000 Epoch: 39 Global Step: 809500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:26,798-Speed 6309.57 samples/sec Loss 2.5558 LearningRate 0.0000 Epoch: 39 Global Step: 809510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:30,046-Speed 6307.37 samples/sec Loss 2.5690 LearningRate 0.0000 Epoch: 39 Global Step: 809520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:33,295-Speed 6306.07 samples/sec Loss 2.4898 LearningRate 0.0000 Epoch: 39 Global Step: 809530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:36,549-Speed 6295.03 samples/sec Loss 2.5691 LearningRate 0.0000 Epoch: 39 Global Step: 809540 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:24:39,799-Speed 6301.98 samples/sec Loss 2.5042 LearningRate 0.0000 Epoch: 39 Global Step: 809550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:43,049-Speed 6304.06 samples/sec Loss 2.5307 LearningRate 0.0000 Epoch: 39 Global Step: 809560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:46,300-Speed 6301.04 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 39 Global Step: 809570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:49,531-Speed 6340.39 samples/sec Loss 2.5163 LearningRate 0.0000 Epoch: 39 Global Step: 809580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:52,783-Speed 6298.31 samples/sec Loss 2.5549 LearningRate 0.0000 Epoch: 39 Global Step: 809590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:56,030-Speed 6310.05 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 39 Global Step: 809600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:24:59,295-Speed 6276.12 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 39 Global Step: 809610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:02,536-Speed 6320.62 samples/sec Loss 2.5188 LearningRate 0.0000 Epoch: 39 Global Step: 809620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:05,789-Speed 6297.84 samples/sec Loss 2.5178 LearningRate 0.0000 Epoch: 39 Global Step: 809630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:09,041-Speed 6298.61 samples/sec Loss 2.5721 LearningRate 0.0000 Epoch: 39 Global Step: 809640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:12,305-Speed 6276.05 samples/sec Loss 2.5441 LearningRate 0.0000 Epoch: 39 Global Step: 809650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:15,557-Speed 6298.12 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 809660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:18,801-Speed 6315.61 
samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 39 Global Step: 809670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:22,033-Speed 6338.24 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 809680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:25,337-Speed 6198.40 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 39 Global Step: 809690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:28,588-Speed 6302.05 samples/sec Loss 2.5446 LearningRate 0.0000 Epoch: 39 Global Step: 809700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:31,833-Speed 6311.97 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 809710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:35,095-Speed 6280.87 samples/sec Loss 2.4855 LearningRate 0.0000 Epoch: 39 Global Step: 809720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:38,341-Speed 6311.04 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 39 Global Step: 809730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:41,591-Speed 6302.44 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 39 Global Step: 809740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:44,845-Speed 6297.59 samples/sec Loss 2.5496 LearningRate 0.0000 Epoch: 39 Global Step: 809750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:48,102-Speed 6290.53 samples/sec Loss 2.5165 LearningRate 0.0000 Epoch: 39 Global Step: 809760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:51,350-Speed 6306.97 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 809770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:25:54,598-Speed 6306.34 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 39 Global Step: 809780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:25:57,830-Speed 6337.88 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 39 
Global Step: 809790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:01,074-Speed 6314.54 samples/sec Loss 2.4601 LearningRate 0.0000 Epoch: 39 Global Step: 809800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:04,322-Speed 6306.78 samples/sec Loss 2.4837 LearningRate 0.0000 Epoch: 39 Global Step: 809810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:07,566-Speed 6315.80 samples/sec Loss 2.5212 LearningRate 0.0000 Epoch: 39 Global Step: 809820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:10,812-Speed 6310.20 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 39 Global Step: 809830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:14,060-Speed 6306.99 samples/sec Loss 2.5932 LearningRate 0.0000 Epoch: 39 Global Step: 809840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:17,306-Speed 6311.20 samples/sec Loss 2.5106 LearningRate 0.0000 Epoch: 39 Global Step: 809850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:20,557-Speed 6300.19 samples/sec Loss 2.5772 LearningRate 0.0000 Epoch: 39 Global Step: 809860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:23,816-Speed 6285.11 samples/sec Loss 2.5250 LearningRate 0.0000 Epoch: 39 Global Step: 809870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:27,061-Speed 6312.26 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 809880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:30,298-Speed 6329.23 samples/sec Loss 2.5543 LearningRate 0.0000 Epoch: 39 Global Step: 809890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:33,548-Speed 6302.35 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 39 Global Step: 809900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:36,806-Speed 6287.05 samples/sec Loss 2.6066 LearningRate 0.0000 Epoch: 39 Global Step: 809910 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:26:40,054-Speed 6308.12 samples/sec Loss 2.5509 LearningRate 0.0000 Epoch: 39 Global Step: 809920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:43,325-Speed 6262.03 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 39 Global Step: 809930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:46,574-Speed 6304.47 samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 39 Global Step: 809940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:49,842-Speed 6269.89 samples/sec Loss 2.5354 LearningRate 0.0000 Epoch: 39 Global Step: 809950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:53,088-Speed 6310.99 samples/sec Loss 2.4786 LearningRate 0.0000 Epoch: 39 Global Step: 809960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:56,356-Speed 6267.23 samples/sec Loss 2.4983 LearningRate 0.0000 Epoch: 39 Global Step: 809970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:26:59,605-Speed 6304.95 samples/sec Loss 2.5739 LearningRate 0.0000 Epoch: 39 Global Step: 809980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:02,844-Speed 6324.42 samples/sec Loss 2.6043 LearningRate 0.0000 Epoch: 39 Global Step: 809990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:06,093-Speed 6304.59 samples/sec Loss 2.5523 LearningRate 0.0000 Epoch: 39 Global Step: 810000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:09,350-Speed 6290.76 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 39 Global Step: 810010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:12,596-Speed 6310.42 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 39 Global Step: 810020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:15,840-Speed 6314.33 samples/sec Loss 2.5136 LearningRate 0.0000 Epoch: 39 Global Step: 810030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:19,088-Speed 6307.08 
samples/sec Loss 2.4591 LearningRate 0.0000 Epoch: 39 Global Step: 810040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:22,341-Speed 6296.86 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 39 Global Step: 810050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:25,589-Speed 6306.62 samples/sec Loss 2.5058 LearningRate 0.0000 Epoch: 39 Global Step: 810060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:28,840-Speed 6300.98 samples/sec Loss 2.5459 LearningRate 0.0000 Epoch: 39 Global Step: 810070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:32,092-Speed 6298.28 samples/sec Loss 2.5854 LearningRate 0.0000 Epoch: 39 Global Step: 810080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:35,328-Speed 6330.92 samples/sec Loss 2.5070 LearningRate 0.0000 Epoch: 39 Global Step: 810090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:38,580-Speed 6298.24 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 39 Global Step: 810100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:41,830-Speed 6303.23 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 39 Global Step: 810110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:45,082-Speed 6299.55 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 39 Global Step: 810120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:48,330-Speed 6307.20 samples/sec Loss 2.5702 LearningRate 0.0000 Epoch: 39 Global Step: 810130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:51,579-Speed 6304.72 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 810140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:54,830-Speed 6302.26 samples/sec Loss 2.5100 LearningRate 0.0000 Epoch: 39 Global Step: 810150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:27:58,079-Speed 6303.75 samples/sec Loss 2.4907 LearningRate 0.0000 Epoch: 39 
Global Step: 810160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:01,331-Speed 6299.01 samples/sec Loss 2.5086 LearningRate 0.0000 Epoch: 39 Global Step: 810170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:04,583-Speed 6300.02 samples/sec Loss 2.5386 LearningRate 0.0000 Epoch: 39 Global Step: 810180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:07,818-Speed 6332.69 samples/sec Loss 2.5184 LearningRate 0.0000 Epoch: 39 Global Step: 810190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:11,068-Speed 6302.08 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 39 Global Step: 810200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:14,320-Speed 6298.84 samples/sec Loss 2.5481 LearningRate 0.0000 Epoch: 39 Global Step: 810210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:17,572-Speed 6298.75 samples/sec Loss 2.4698 LearningRate 0.0000 Epoch: 39 Global Step: 810220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:20,825-Speed 6298.68 samples/sec Loss 2.5692 LearningRate 0.0000 Epoch: 39 Global Step: 810230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:24,077-Speed 6297.41 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 Global Step: 810240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:27,330-Speed 6296.98 samples/sec Loss 2.5084 LearningRate 0.0000 Epoch: 39 Global Step: 810250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:30,584-Speed 6296.18 samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 39 Global Step: 810260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:33,831-Speed 6308.76 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 39 Global Step: 810270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:37,086-Speed 6292.64 samples/sec Loss 2.4986 LearningRate 0.0000 Epoch: 39 Global Step: 810280 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:28:40,342-Speed 6291.92 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 39 Global Step: 810290 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:28:43,579-Speed 6328.24 samples/sec Loss 2.5321 LearningRate 0.0000 Epoch: 39 Global Step: 810300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:46,826-Speed 6308.29 samples/sec Loss 2.5651 LearningRate 0.0000 Epoch: 39 Global Step: 810310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:50,076-Speed 6304.11 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 39 Global Step: 810320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:53,325-Speed 6303.51 samples/sec Loss 2.4813 LearningRate 0.0000 Epoch: 39 Global Step: 810330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:56,574-Speed 6306.24 samples/sec Loss 2.5975 LearningRate 0.0000 Epoch: 39 Global Step: 810340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:28:59,829-Speed 6293.21 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 39 Global Step: 810350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:03,128-Speed 6209.73 samples/sec Loss 2.5485 LearningRate 0.0000 Epoch: 39 Global Step: 810360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:06,379-Speed 6301.06 samples/sec Loss 2.4997 LearningRate 0.0000 Epoch: 39 Global Step: 810370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:09,644-Speed 6274.22 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 39 Global Step: 810380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:12,898-Speed 6293.89 samples/sec Loss 2.5568 LearningRate 0.0000 Epoch: 39 Global Step: 810390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:16,150-Speed 6299.78 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 39 Global Step: 810400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:19,398-Speed 6307.13 
samples/sec Loss 2.5209 LearningRate 0.0000 Epoch: 39 Global Step: 810410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:22,657-Speed 6286.14 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 39 Global Step: 810420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:25,910-Speed 6295.45 samples/sec Loss 2.4664 LearningRate 0.0000 Epoch: 39 Global Step: 810430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:29,164-Speed 6296.34 samples/sec Loss 2.5506 LearningRate 0.0000 Epoch: 39 Global Step: 810440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:32,411-Speed 6307.36 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 39 Global Step: 810450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:35,659-Speed 6307.17 samples/sec Loss 2.5524 LearningRate 0.0000 Epoch: 39 Global Step: 810460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:38,909-Speed 6303.64 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 39 Global Step: 810470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:42,161-Speed 6298.88 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 39 Global Step: 810480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:45,414-Speed 6296.58 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 39 Global Step: 810490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:48,662-Speed 6306.70 samples/sec Loss 2.4875 LearningRate 0.0000 Epoch: 39 Global Step: 810500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:29:51,914-Speed 6299.37 samples/sec Loss 2.5162 LearningRate 0.0000 Epoch: 39 Global Step: 810510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:55,162-Speed 6306.88 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 39 Global Step: 810520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:29:58,406-Speed 6314.36 samples/sec Loss 2.5640 LearningRate 0.0000 Epoch: 39 
Global Step: 810530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:01,654-Speed 6308.04 samples/sec Loss 2.4645 LearningRate 0.0000 Epoch: 39 Global Step: 810540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:04,907-Speed 6297.30 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 39 Global Step: 810550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:08,155-Speed 6307.06 samples/sec Loss 2.5124 LearningRate 0.0000 Epoch: 39 Global Step: 810560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:11,420-Speed 6273.94 samples/sec Loss 2.5753 LearningRate 0.0000 Epoch: 39 Global Step: 810570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:14,669-Speed 6305.20 samples/sec Loss 2.5526 LearningRate 0.0000 Epoch: 39 Global Step: 810580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:17,918-Speed 6304.87 samples/sec Loss 2.4881 LearningRate 0.0000 Epoch: 39 Global Step: 810590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:21,171-Speed 6297.29 samples/sec Loss 2.5533 LearningRate 0.0000 Epoch: 39 Global Step: 810600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:24,403-Speed 6338.78 samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 39 Global Step: 810610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:27,654-Speed 6299.70 samples/sec Loss 2.4648 LearningRate 0.0000 Epoch: 39 Global Step: 810620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:30,906-Speed 6298.26 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 39 Global Step: 810630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:34,165-Speed 6286.18 samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 39 Global Step: 810640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:37,419-Speed 6294.73 samples/sec Loss 2.5075 LearningRate 0.0000 Epoch: 39 Global Step: 810650 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:30:40,663-Speed 6315.24 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 39 Global Step: 810660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:43,918-Speed 6293.72 samples/sec Loss 2.5985 LearningRate 0.0000 Epoch: 39 Global Step: 810670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:47,166-Speed 6306.59 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 Global Step: 810680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:50,418-Speed 6300.04 samples/sec Loss 2.5122 LearningRate 0.0000 Epoch: 39 Global Step: 810690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:53,668-Speed 6302.34 samples/sec Loss 2.5525 LearningRate 0.0000 Epoch: 39 Global Step: 810700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:30:56,900-Speed 6337.84 samples/sec Loss 2.5120 LearningRate 0.0000 Epoch: 39 Global Step: 810710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:00,150-Speed 6304.01 samples/sec Loss 2.5739 LearningRate 0.0000 Epoch: 39 Global Step: 810720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:03,409-Speed 6284.36 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 39 Global Step: 810730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:06,663-Speed 6295.46 samples/sec Loss 2.4977 LearningRate 0.0000 Epoch: 39 Global Step: 810740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:09,911-Speed 6307.42 samples/sec Loss 2.5755 LearningRate 0.0000 Epoch: 39 Global Step: 810750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:13,163-Speed 6298.38 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 39 Global Step: 810760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:16,425-Speed 6281.63 samples/sec Loss 2.5446 LearningRate 0.0000 Epoch: 39 Global Step: 810770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:19,674-Speed 6303.34 
samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 39 Global Step: 810780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:22,926-Speed 6301.08 samples/sec Loss 2.5253 LearningRate 0.0000 Epoch: 39 Global Step: 810790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:26,185-Speed 6283.91 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 39 Global Step: 810800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:29,423-Speed 6326.88 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 39 Global Step: 810810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:32,678-Speed 6293.18 samples/sec Loss 2.5248 LearningRate 0.0000 Epoch: 39 Global Step: 810820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:35,980-Speed 6203.25 samples/sec Loss 2.5617 LearningRate 0.0000 Epoch: 39 Global Step: 810830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:39,231-Speed 6300.50 samples/sec Loss 2.5224 LearningRate 0.0000 Epoch: 39 Global Step: 810840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:42,478-Speed 6308.91 samples/sec Loss 2.5289 LearningRate 0.0000 Epoch: 39 Global Step: 810850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:45,728-Speed 6304.08 samples/sec Loss 2.5163 LearningRate 0.0000 Epoch: 39 Global Step: 810860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:48,981-Speed 6296.77 samples/sec Loss 2.5103 LearningRate 0.0000 Epoch: 39 Global Step: 810870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:52,239-Speed 6287.13 samples/sec Loss 2.5644 LearningRate 0.0000 Epoch: 39 Global Step: 810880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:55,495-Speed 6292.77 samples/sec Loss 2.5173 LearningRate 0.0000 Epoch: 39 Global Step: 810890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:31:58,748-Speed 6296.65 samples/sec Loss 2.5153 LearningRate 0.0000 Epoch: 39 
Global Step: 810900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:02,015-Speed 6269.92 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 39 Global Step: 810910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:05,268-Speed 6297.76 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 39 Global Step: 810920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:08,522-Speed 6295.08 samples/sec Loss 2.5533 LearningRate 0.0000 Epoch: 39 Global Step: 810930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:11,770-Speed 6305.56 samples/sec Loss 2.5668 LearningRate 0.0000 Epoch: 39 Global Step: 810940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:15,025-Speed 6294.41 samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 39 Global Step: 810950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:18,277-Speed 6299.58 samples/sec Loss 2.5383 LearningRate 0.0000 Epoch: 39 Global Step: 810960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:21,525-Speed 6306.84 samples/sec Loss 2.5931 LearningRate 0.0000 Epoch: 39 Global Step: 810970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:24,778-Speed 6297.00 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 810980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:28,034-Speed 6291.10 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 39 Global Step: 810990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:31,292-Speed 6286.86 samples/sec Loss 2.5289 LearningRate 0.0000 Epoch: 39 Global Step: 811000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:34,535-Speed 6317.85 samples/sec Loss 2.5256 LearningRate 0.0000 Epoch: 39 Global Step: 811010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:37,804-Speed 6265.17 samples/sec Loss 2.5053 LearningRate 0.0000 Epoch: 39 Global Step: 811020 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:32:41,050-Speed 6310.60 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 39 Global Step: 811030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:44,300-Speed 6304.19 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 39 Global Step: 811040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:47,559-Speed 6284.61 samples/sec Loss 2.5249 LearningRate 0.0000 Epoch: 39 Global Step: 811050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:50,822-Speed 6278.22 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 39 Global Step: 811060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:54,072-Speed 6302.06 samples/sec Loss 2.4808 LearningRate 0.0000 Epoch: 39 Global Step: 811070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:32:57,316-Speed 6315.85 samples/sec Loss 2.5113 LearningRate 0.0000 Epoch: 39 Global Step: 811080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:00,568-Speed 6298.35 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 39 Global Step: 811090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:03,819-Speed 6301.86 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 39 Global Step: 811100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:07,057-Speed 6325.04 samples/sec Loss 2.5209 LearningRate 0.0000 Epoch: 39 Global Step: 811110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:10,310-Speed 6297.52 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 39 Global Step: 811120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:13,556-Speed 6311.34 samples/sec Loss 2.5518 LearningRate 0.0000 Epoch: 39 Global Step: 811130 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:16,807-Speed 6299.43 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 39 Global Step: 811140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:20,067-Speed 6284.90 
samples/sec Loss 2.4898 LearningRate 0.0000 Epoch: 39 Global Step: 811150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:23,321-Speed 6295.80 samples/sec Loss 2.5518 LearningRate 0.0000 Epoch: 39 Global Step: 811160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:26,577-Speed 6292.02 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 39 Global Step: 811170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:29,865-Speed 6229.73 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 39 Global Step: 811180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:33,110-Speed 6312.11 samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 39 Global Step: 811190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:36,366-Speed 6291.78 samples/sec Loss 2.5082 LearningRate 0.0000 Epoch: 39 Global Step: 811200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:39,619-Speed 6297.64 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 39 Global Step: 811210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:33:42,855-Speed 6328.70 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 39 Global Step: 811220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:46,113-Speed 6288.69 samples/sec Loss 2.5223 LearningRate 0.0000 Epoch: 39 Global Step: 811230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:49,382-Speed 6266.02 samples/sec Loss 2.5114 LearningRate 0.0000 Epoch: 39 Global Step: 811240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:52,637-Speed 6293.86 samples/sec Loss 2.5452 LearningRate 0.0000 Epoch: 39 Global Step: 811250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:55,889-Speed 6298.27 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 39 Global Step: 811260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:33:59,137-Speed 6308.02 samples/sec Loss 2.5208 LearningRate 0.0000 Epoch: 39 
Global Step: 811270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:02,397-Speed 6283.93 samples/sec Loss 2.5666 LearningRate 0.0000 Epoch: 39 Global Step: 811280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:05,641-Speed 6313.48 samples/sec Loss 2.5100 LearningRate 0.0000 Epoch: 39 Global Step: 811290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:08,891-Speed 6304.03 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 39 Global Step: 811300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:12,141-Speed 6301.04 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 39 Global Step: 811310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:15,386-Speed 6314.52 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 39 Global Step: 811320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:18,634-Speed 6305.31 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 811330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:21,885-Speed 6301.22 samples/sec Loss 2.5920 LearningRate 0.0000 Epoch: 39 Global Step: 811340 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:25,146-Speed 6283.57 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 811350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:28,397-Speed 6299.38 samples/sec Loss 2.5181 LearningRate 0.0000 Epoch: 39 Global Step: 811360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:31,650-Speed 6298.49 samples/sec Loss 2.5077 LearningRate 0.0000 Epoch: 39 Global Step: 811370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:34,902-Speed 6299.61 samples/sec Loss 2.5270 LearningRate 0.0000 Epoch: 39 Global Step: 811380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:38,155-Speed 6296.35 samples/sec Loss 2.4968 LearningRate 0.0000 Epoch: 39 Global Step: 811390 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:34:41,409-Speed 6294.22 samples/sec Loss 2.5035 LearningRate 0.0000 Epoch: 39 Global Step: 811400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:44,652-Speed 6317.75 samples/sec Loss 2.4781 LearningRate 0.0000 Epoch: 39 Global Step: 811410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:47,902-Speed 6303.54 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 39 Global Step: 811420 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:34:51,139-Speed 6327.66 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 39 Global Step: 811430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:54,384-Speed 6313.08 samples/sec Loss 2.5161 LearningRate 0.0000 Epoch: 39 Global Step: 811440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:34:57,636-Speed 6298.15 samples/sec Loss 2.4912 LearningRate 0.0000 Epoch: 39 Global Step: 811450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:00,884-Speed 6306.32 samples/sec Loss 2.4912 LearningRate 0.0000 Epoch: 39 Global Step: 811460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:04,137-Speed 6297.43 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 39 Global Step: 811470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:07,386-Speed 6304.67 samples/sec Loss 2.5986 LearningRate 0.0000 Epoch: 39 Global Step: 811480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:10,638-Speed 6299.75 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 39 Global Step: 811490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:13,890-Speed 6302.12 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 811500 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:17,146-Speed 6289.59 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 39 Global Step: 811510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:20,395-Speed 6304.96 
samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 39 Global Step: 811520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:23,654-Speed 6285.60 samples/sec Loss 2.5761 LearningRate 0.0000 Epoch: 39 Global Step: 811530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:26,905-Speed 6301.43 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 39 Global Step: 811540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:30,161-Speed 6291.66 samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 39 Global Step: 811550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:33,420-Speed 6285.83 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 39 Global Step: 811560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:36,676-Speed 6291.45 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 39 Global Step: 811570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:39,928-Speed 6299.91 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 811580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:43,177-Speed 6304.33 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 39 Global Step: 811590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:46,440-Speed 6278.92 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 39 Global Step: 811600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:49,701-Speed 6280.05 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 811610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:52,953-Speed 6299.11 samples/sec Loss 2.5370 LearningRate 0.0000 Epoch: 39 Global Step: 811620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:56,184-Speed 6339.98 samples/sec Loss 2.5530 LearningRate 0.0000 Epoch: 39 Global Step: 811630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:35:59,442-Speed 6288.30 samples/sec Loss 2.4990 LearningRate 0.0000 Epoch: 39 
Global Step: 811640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:02,689-Speed 6309.29 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 811650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:05,937-Speed 6305.70 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 811660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:09,198-Speed 6282.69 samples/sec Loss 2.5170 LearningRate 0.0000 Epoch: 39 Global Step: 811670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:12,443-Speed 6311.32 samples/sec Loss 2.5071 LearningRate 0.0000 Epoch: 39 Global Step: 811680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:15,701-Speed 6291.46 samples/sec Loss 2.5007 LearningRate 0.0000 Epoch: 39 Global Step: 811690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:18,954-Speed 6296.40 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 39 Global Step: 811700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:22,206-Speed 6298.76 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 39 Global Step: 811710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:25,460-Speed 6294.58 samples/sec Loss 2.5333 LearningRate 0.0000 Epoch: 39 Global Step: 811720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:28,702-Speed 6319.39 samples/sec Loss 2.5612 LearningRate 0.0000 Epoch: 39 Global Step: 811730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:31,963-Speed 6282.77 samples/sec Loss 2.5847 LearningRate 0.0000 Epoch: 39 Global Step: 811740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:35,219-Speed 6291.20 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 39 Global Step: 811750 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:38,472-Speed 6297.34 samples/sec Loss 2.4689 LearningRate 0.0000 Epoch: 39 Global Step: 811760 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:36:41,734-Speed 6279.59 samples/sec Loss 2.5387 LearningRate 0.0000 Epoch: 39 Global Step: 811770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:44,984-Speed 6303.08 samples/sec Loss 2.5787 LearningRate 0.0000 Epoch: 39 Global Step: 811780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:48,234-Speed 6303.70 samples/sec Loss 2.5349 LearningRate 0.0000 Epoch: 39 Global Step: 811790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:51,482-Speed 6304.72 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 39 Global Step: 811800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:54,745-Speed 6280.01 samples/sec Loss 2.5784 LearningRate 0.0000 Epoch: 39 Global Step: 811810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:36:58,019-Speed 6255.91 samples/sec Loss 2.5343 LearningRate 0.0000 Epoch: 39 Global Step: 811820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:01,258-Speed 6324.47 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 811830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:04,515-Speed 6288.75 samples/sec Loss 2.5457 LearningRate 0.0000 Epoch: 39 Global Step: 811840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:07,774-Speed 6285.36 samples/sec Loss 2.5700 LearningRate 0.0000 Epoch: 39 Global Step: 811850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:11,032-Speed 6288.93 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 39 Global Step: 811860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:14,281-Speed 6303.43 samples/sec Loss 2.5252 LearningRate 0.0000 Epoch: 39 Global Step: 811870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:17,529-Speed 6307.11 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 39 Global Step: 811880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:20,783-Speed 6295.72 
samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 39 Global Step: 811890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:24,037-Speed 6295.47 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 39 Global Step: 811900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:27,289-Speed 6299.27 samples/sec Loss 2.5659 LearningRate 0.0000 Epoch: 39 Global Step: 811910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:30,539-Speed 6302.56 samples/sec Loss 2.5126 LearningRate 0.0000 Epoch: 39 Global Step: 811920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:33,777-Speed 6326.59 samples/sec Loss 2.5728 LearningRate 0.0000 Epoch: 39 Global Step: 811930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:37,028-Speed 6301.35 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 811940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:40,278-Speed 6302.88 samples/sec Loss 2.5919 LearningRate 0.0000 Epoch: 39 Global Step: 811950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:43,533-Speed 6294.45 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 39 Global Step: 811960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:46,781-Speed 6306.37 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 811970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:50,043-Speed 6280.20 samples/sec Loss 2.5307 LearningRate 0.0000 Epoch: 39 Global Step: 811980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:53,307-Speed 6276.07 samples/sec Loss 2.5350 LearningRate 0.0000 Epoch: 39 Global Step: 811990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:56,591-Speed 6237.78 samples/sec Loss 2.5038 LearningRate 0.0000 Epoch: 39 Global Step: 812000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:37:59,921-Speed 6151.56 samples/sec Loss 2.5799 LearningRate 0.0000 Epoch: 39 
Global Step: 812010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:03,176-Speed 6294.08 samples/sec Loss 2.5744 LearningRate 0.0000 Epoch: 39 Global Step: 812020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:06,439-Speed 6277.53 samples/sec Loss 2.5280 LearningRate 0.0000 Epoch: 39 Global Step: 812030 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:38:09,674-Speed 6331.88 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 39 Global Step: 812040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:12,975-Speed 6204.57 samples/sec Loss 2.5006 LearningRate 0.0000 Epoch: 39 Global Step: 812050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:16,232-Speed 6289.86 samples/sec Loss 2.5412 LearningRate 0.0000 Epoch: 39 Global Step: 812060 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:19,481-Speed 6305.13 samples/sec Loss 2.5485 LearningRate 0.0000 Epoch: 39 Global Step: 812070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:22,731-Speed 6301.93 samples/sec Loss 2.5213 LearningRate 0.0000 Epoch: 39 Global Step: 812080 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:25,981-Speed 6302.77 samples/sec Loss 2.5283 LearningRate 0.0000 Epoch: 39 Global Step: 812090 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:29,251-Speed 6265.06 samples/sec Loss 2.5159 LearningRate 0.0000 Epoch: 39 Global Step: 812100 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:32,512-Speed 6281.31 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 39 Global Step: 812110 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:35,767-Speed 6294.65 samples/sec Loss 2.5260 LearningRate 0.0000 Epoch: 39 Global Step: 812120 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:39,032-Speed 6273.04 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 39 Global Step: 812130 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:38:42,269-Speed 6329.63 samples/sec Loss 2.5733 LearningRate 0.0000 Epoch: 39 Global Step: 812140 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:45,527-Speed 6287.33 samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 39 Global Step: 812150 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:48,782-Speed 6292.30 samples/sec Loss 2.5101 LearningRate 0.0000 Epoch: 39 Global Step: 812160 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:52,034-Speed 6300.76 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 812170 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:55,286-Speed 6299.08 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 812180 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:38:58,539-Speed 6296.26 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 39 Global Step: 812190 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:01,792-Speed 6297.53 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 39 Global Step: 812200 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:05,064-Speed 6296.83 samples/sec Loss 2.4960 LearningRate 0.0000 Epoch: 39 Global Step: 812210 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:08,314-Speed 6303.11 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 39 Global Step: 812220 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:11,574-Speed 6283.31 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 39 Global Step: 812230 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:14,818-Speed 6315.15 samples/sec Loss 2.4861 LearningRate 0.0000 Epoch: 39 Global Step: 812240 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:18,069-Speed 6300.34 samples/sec Loss 2.5152 LearningRate 0.0000 Epoch: 39 Global Step: 812250 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:21,318-Speed 6304.82 
samples/sec Loss 2.4840 LearningRate 0.0000 Epoch: 39 Global Step: 812260 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:24,578-Speed 6283.45 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 39 Global Step: 812270 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:27,842-Speed 6275.48 samples/sec Loss 2.5077 LearningRate 0.0000 Epoch: 39 Global Step: 812280 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:31,086-Speed 6314.35 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 39 Global Step: 812290 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:34,336-Speed 6304.03 samples/sec Loss 2.5173 LearningRate 0.0000 Epoch: 39 Global Step: 812300 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:37,597-Speed 6281.55 samples/sec Loss 2.5572 LearningRate 0.0000 Epoch: 39 Global Step: 812310 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:40,846-Speed 6304.81 samples/sec Loss 2.5110 LearningRate 0.0000 Epoch: 39 Global Step: 812320 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:44,101-Speed 6294.10 samples/sec Loss 2.5681 LearningRate 0.0000 Epoch: 39 Global Step: 812330 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:47,356-Speed 6292.79 samples/sec Loss 2.5202 LearningRate 0.0000 Epoch: 39 Global Step: 812340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:39:50,597-Speed 6320.95 samples/sec Loss 2.5482 LearningRate 0.0000 Epoch: 39 Global Step: 812350 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:53,847-Speed 6302.99 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 39 Global Step: 812360 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:39:57,114-Speed 6271.22 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 Global Step: 812370 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:00,371-Speed 6288.44 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 39 
Global Step: 812380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:03,627-Speed 6291.02 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 812390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:06,882-Speed 6293.66 samples/sec Loss 2.4986 LearningRate 0.0000 Epoch: 39 Global Step: 812400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:10,137-Speed 6292.16 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 39 Global Step: 812410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:13,389-Speed 6299.38 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 812420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:16,641-Speed 6300.02 samples/sec Loss 2.5105 LearningRate 0.0000 Epoch: 39 Global Step: 812430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:19,896-Speed 6293.11 samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 39 Global Step: 812440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:23,128-Speed 6336.64 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 39 Global Step: 812450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:26,389-Speed 6284.27 samples/sec Loss 2.5662 LearningRate 0.0000 Epoch: 39 Global Step: 812460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:29,646-Speed 6289.48 samples/sec Loss 2.4979 LearningRate 0.0000 Epoch: 39 Global Step: 812470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:32,898-Speed 6298.63 samples/sec Loss 2.4939 LearningRate 0.0000 Epoch: 39 Global Step: 812480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:36,150-Speed 6298.79 samples/sec Loss 2.4847 LearningRate 0.0000 Epoch: 39 Global Step: 812490 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:39,416-Speed 6272.62 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 812500 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:40:42,666-Speed 6302.70 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 39 Global Step: 812510 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:45,915-Speed 6304.39 samples/sec Loss 2.5360 LearningRate 0.0000 Epoch: 39 Global Step: 812520 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:49,165-Speed 6303.42 samples/sec Loss 2.6077 LearningRate 0.0000 Epoch: 39 Global Step: 812530 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:52,419-Speed 6295.15 samples/sec Loss 2.4906 LearningRate 0.0000 Epoch: 39 Global Step: 812540 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:55,653-Speed 6333.78 samples/sec Loss 2.5298 LearningRate 0.0000 Epoch: 39 Global Step: 812550 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:40:58,909-Speed 6291.46 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 812560 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:02,156-Speed 6309.55 samples/sec Loss 2.5295 LearningRate 0.0000 Epoch: 39 Global Step: 812570 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:05,418-Speed 6280.61 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 39 Global Step: 812580 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:08,664-Speed 6308.97 samples/sec Loss 2.5323 LearningRate 0.0000 Epoch: 39 Global Step: 812590 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:11,910-Speed 6312.36 samples/sec Loss 2.5089 LearningRate 0.0000 Epoch: 39 Global Step: 812600 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:15,165-Speed 6292.88 samples/sec Loss 2.5686 LearningRate 0.0000 Epoch: 39 Global Step: 812610 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:18,423-Speed 6286.78 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 39 Global Step: 812620 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:21,671-Speed 6306.77 
samples/sec Loss 2.4836 LearningRate 0.0000 Epoch: 39 Global Step: 812630 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:24,919-Speed 6307.50 samples/sec Loss 2.5150 LearningRate 0.0000 Epoch: 39 Global Step: 812640 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:28,153-Speed 6333.72 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 812650 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:31,402-Speed 6305.05 samples/sec Loss 2.5281 LearningRate 0.0000 Epoch: 39 Global Step: 812660 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:34,652-Speed 6302.51 samples/sec Loss 2.5362 LearningRate 0.0000 Epoch: 39 Global Step: 812670 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:37,909-Speed 6289.10 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 39 Global Step: 812680 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:41,162-Speed 6297.22 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 39 Global Step: 812690 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:44,415-Speed 6296.24 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 39 Global Step: 812700 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:47,679-Speed 6277.63 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 39 Global Step: 812710 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:50,939-Speed 6282.95 samples/sec Loss 2.5048 LearningRate 0.0000 Epoch: 39 Global Step: 812720 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:54,208-Speed 6266.08 samples/sec Loss 2.5837 LearningRate 0.0000 Epoch: 39 Global Step: 812730 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:41:57,462-Speed 6296.51 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 39 Global Step: 812740 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:00,714-Speed 6299.66 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 39 
Global Step: 812750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:42:03,949-Speed 6331.29 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 812760 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:07,206-Speed 6289.93 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 39 Global Step: 812770 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:10,495-Speed 6227.18 samples/sec Loss 2.5087 LearningRate 0.0000 Epoch: 39 Global Step: 812780 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:13,747-Speed 6299.79 samples/sec Loss 2.4874 LearningRate 0.0000 Epoch: 39 Global Step: 812790 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:16,998-Speed 6301.83 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 39 Global Step: 812800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:20,252-Speed 6294.67 samples/sec Loss 2.5539 LearningRate 0.0000 Epoch: 39 Global Step: 812810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:23,499-Speed 6308.14 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 39 Global Step: 812820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:26,750-Speed 6301.61 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 39 Global Step: 812830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:29,999-Speed 6305.16 samples/sec Loss 2.5258 LearningRate 0.0000 Epoch: 39 Global Step: 812840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:33,247-Speed 6306.22 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 39 Global Step: 812850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:36,480-Speed 6336.16 samples/sec Loss 2.5315 LearningRate 0.0000 Epoch: 39 Global Step: 812860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:39,731-Speed 6300.49 samples/sec Loss 2.5052 LearningRate 0.0000 Epoch: 39 Global Step: 812870 Fp16 Grad Scale: 4096 Required: 2 
hours Training: 2022-04-03 17:42:42,982-Speed 6302.51 samples/sec Loss 2.5685 LearningRate 0.0000 Epoch: 39 Global Step: 812880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:46,231-Speed 6302.95 samples/sec Loss 2.5188 LearningRate 0.0000 Epoch: 39 Global Step: 812890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:49,481-Speed 6303.42 samples/sec Loss 2.5246 LearningRate 0.0000 Epoch: 39 Global Step: 812900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:52,733-Speed 6299.70 samples/sec Loss 2.5190 LearningRate 0.0000 Epoch: 39 Global Step: 812910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:55,975-Speed 6318.39 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 39 Global Step: 812920 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:42:59,225-Speed 6301.93 samples/sec Loss 2.5755 LearningRate 0.0000 Epoch: 39 Global Step: 812930 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:02,476-Speed 6302.40 samples/sec Loss 2.5161 LearningRate 0.0000 Epoch: 39 Global Step: 812940 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:05,731-Speed 6294.24 samples/sec Loss 2.5678 LearningRate 0.0000 Epoch: 39 Global Step: 812950 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:08,970-Speed 6323.97 samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 39 Global Step: 812960 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:12,221-Speed 6301.81 samples/sec Loss 2.5613 LearningRate 0.0000 Epoch: 39 Global Step: 812970 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:15,468-Speed 6308.50 samples/sec Loss 2.5417 LearningRate 0.0000 Epoch: 39 Global Step: 812980 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:18,726-Speed 6286.20 samples/sec Loss 2.5500 LearningRate 0.0000 Epoch: 39 Global Step: 812990 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:21,981-Speed 6293.40 
samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 39 Global Step: 813000 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:25,244-Speed 6278.14 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 39 Global Step: 813010 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:28,563-Speed 6171.34 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 39 Global Step: 813020 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:31,811-Speed 6308.35 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 39 Global Step: 813030 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:35,057-Speed 6310.52 samples/sec Loss 2.5627 LearningRate 0.0000 Epoch: 39 Global Step: 813040 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:38,316-Speed 6284.89 samples/sec Loss 2.5366 LearningRate 0.0000 Epoch: 39 Global Step: 813050 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:41,577-Speed 6280.91 samples/sec Loss 2.5171 LearningRate 0.0000 Epoch: 39 Global Step: 813060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-03 17:43:44,815-Speed 6326.08 samples/sec Loss 2.4792 LearningRate 0.0000 Epoch: 39 Global Step: 813070 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-03 17:43:48,080-Speed 6275.68 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 39 Global Step: 813080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:43:51,332-Speed 6298.49 samples/sec Loss 2.4999 LearningRate 0.0000 Epoch: 39 Global Step: 813090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:43:54,580-Speed 6307.17 samples/sec Loss 2.4899 LearningRate 0.0000 Epoch: 39 Global Step: 813100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:43:57,836-Speed 6289.40 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 39 Global Step: 813110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:01,086-Speed 6304.29 samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 39 
Global Step: 813120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:04,340-Speed 6296.01 samples/sec Loss 2.5263 LearningRate 0.0000 Epoch: 39 Global Step: 813130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:07,591-Speed 6300.07 samples/sec Loss 2.4539 LearningRate 0.0000 Epoch: 39 Global Step: 813140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:10,839-Speed 6307.83 samples/sec Loss 2.5640 LearningRate 0.0000 Epoch: 39 Global Step: 813150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:14,096-Speed 6289.18 samples/sec Loss 2.5280 LearningRate 0.0000 Epoch: 39 Global Step: 813160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:17,341-Speed 6312.75 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 813170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:20,592-Speed 6300.54 samples/sec Loss 2.5378 LearningRate 0.0000 Epoch: 39 Global Step: 813180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:23,846-Speed 6295.58 samples/sec Loss 2.5915 LearningRate 0.0000 Epoch: 39 Global Step: 813190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:27,095-Speed 6304.14 samples/sec Loss 2.5352 LearningRate 0.0000 Epoch: 39 Global Step: 813200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:30,346-Speed 6302.30 samples/sec Loss 2.4986 LearningRate 0.0000 Epoch: 39 Global Step: 813210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:33,595-Speed 6305.11 samples/sec Loss 2.5022 LearningRate 0.0000 Epoch: 39 Global Step: 813220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:36,843-Speed 6305.23 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 39 Global Step: 813230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:40,090-Speed 6309.87 samples/sec Loss 2.5490 LearningRate 0.0000 Epoch: 39 Global Step: 813240 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:44:43,339-Speed 6304.39 samples/sec Loss 2.5336 LearningRate 0.0000 Epoch: 39 Global Step: 813250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:46,587-Speed 6307.26 samples/sec Loss 2.5485 LearningRate 0.0000 Epoch: 39 Global Step: 813260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:49,868-Speed 6243.44 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 39 Global Step: 813270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:53,117-Speed 6303.88 samples/sec Loss 2.5481 LearningRate 0.0000 Epoch: 39 Global Step: 813280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:56,362-Speed 6312.58 samples/sec Loss 2.5225 LearningRate 0.0000 Epoch: 39 Global Step: 813290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:44:59,621-Speed 6285.50 samples/sec Loss 2.5008 LearningRate 0.0000 Epoch: 39 Global Step: 813300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:02,873-Speed 6300.38 samples/sec Loss 2.5075 LearningRate 0.0000 Epoch: 39 Global Step: 813310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:06,121-Speed 6306.91 samples/sec Loss 2.4993 LearningRate 0.0000 Epoch: 39 Global Step: 813320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:09,374-Speed 6296.38 samples/sec Loss 2.5263 LearningRate 0.0000 Epoch: 39 Global Step: 813330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:12,622-Speed 6307.83 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 39 Global Step: 813340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:15,874-Speed 6299.50 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 39 Global Step: 813350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:19,132-Speed 6289.00 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 39 Global Step: 813360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:22,372-Speed 6320.91 
samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 39 Global Step: 813370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:25,656-Speed 6238.85 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 39 Global Step: 813380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:28,923-Speed 6269.22 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 39 Global Step: 813390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:32,174-Speed 6301.37 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 39 Global Step: 813400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:35,424-Speed 6303.71 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 39 Global Step: 813410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:38,678-Speed 6294.34 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 39 Global Step: 813420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:41,942-Speed 6275.86 samples/sec Loss 2.4999 LearningRate 0.0000 Epoch: 39 Global Step: 813430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:45,190-Speed 6307.72 samples/sec Loss 2.5097 LearningRate 0.0000 Epoch: 39 Global Step: 813440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:48,456-Speed 6270.94 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 813450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:51,706-Speed 6303.44 samples/sec Loss 2.5261 LearningRate 0.0000 Epoch: 39 Global Step: 813460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:54,940-Speed 6333.63 samples/sec Loss 2.5108 LearningRate 0.0000 Epoch: 39 Global Step: 813470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:45:58,187-Speed 6308.99 samples/sec Loss 2.5639 LearningRate 0.0000 Epoch: 39 Global Step: 813480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:01,455-Speed 6269.35 samples/sec Loss 2.4978 LearningRate 0.0000 Epoch: 39 
Global Step: 813490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:04,706-Speed 6299.69 samples/sec Loss 2.5178 LearningRate 0.0000 Epoch: 39 Global Step: 813500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:07,959-Speed 6297.78 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 39 Global Step: 813510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:11,213-Speed 6295.36 samples/sec Loss 2.5305 LearningRate 0.0000 Epoch: 39 Global Step: 813520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:14,472-Speed 6284.34 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 813530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:17,728-Speed 6292.41 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 39 Global Step: 813540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:20,999-Speed 6262.59 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 39 Global Step: 813550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:24,245-Speed 6309.95 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 39 Global Step: 813560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:27,493-Speed 6306.61 samples/sec Loss 2.5225 LearningRate 0.0000 Epoch: 39 Global Step: 813570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:46:30,728-Speed 6334.55 samples/sec Loss 2.5101 LearningRate 0.0000 Epoch: 39 Global Step: 813580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:34,004-Speed 6253.16 samples/sec Loss 2.5689 LearningRate 0.0000 Epoch: 39 Global Step: 813590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:37,255-Speed 6299.46 samples/sec Loss 2.5171 LearningRate 0.0000 Epoch: 39 Global Step: 813600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:40,509-Speed 6296.94 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 39 Global Step: 813610 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:46:43,762-Speed 6296.72 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 39 Global Step: 813620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:47,013-Speed 6300.09 samples/sec Loss 2.5593 LearningRate 0.0000 Epoch: 39 Global Step: 813630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:50,264-Speed 6301.68 samples/sec Loss 2.5030 LearningRate 0.0000 Epoch: 39 Global Step: 813640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:53,514-Speed 6303.11 samples/sec Loss 2.5012 LearningRate 0.0000 Epoch: 39 Global Step: 813650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:46:56,760-Speed 6308.82 samples/sec Loss 2.5261 LearningRate 0.0000 Epoch: 39 Global Step: 813660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:00,009-Speed 6306.49 samples/sec Loss 2.5698 LearningRate 0.0000 Epoch: 39 Global Step: 813670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:03,245-Speed 6329.62 samples/sec Loss 2.5843 LearningRate 0.0000 Epoch: 39 Global Step: 813680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:06,499-Speed 6294.11 samples/sec Loss 2.5219 LearningRate 0.0000 Epoch: 39 Global Step: 813690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:09,746-Speed 6310.47 samples/sec Loss 2.4952 LearningRate 0.0000 Epoch: 39 Global Step: 813700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:12,999-Speed 6296.14 samples/sec Loss 2.5451 LearningRate 0.0000 Epoch: 39 Global Step: 813710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:16,254-Speed 6294.19 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 39 Global Step: 813720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:19,500-Speed 6309.54 samples/sec Loss 2.5509 LearningRate 0.0000 Epoch: 39 Global Step: 813730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:22,751-Speed 6300.74 
samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 39 Global Step: 813740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:26,004-Speed 6298.53 samples/sec Loss 2.5538 LearningRate 0.0000 Epoch: 39 Global Step: 813750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:29,252-Speed 6305.03 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 39 Global Step: 813760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:32,503-Speed 6301.71 samples/sec Loss 2.5097 LearningRate 0.0000 Epoch: 39 Global Step: 813770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:35,740-Speed 6329.25 samples/sec Loss 2.4832 LearningRate 0.0000 Epoch: 39 Global Step: 813780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:38,992-Speed 6298.36 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 39 Global Step: 813790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:42,255-Speed 6279.01 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 813800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:45,517-Speed 6278.90 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 39 Global Step: 813810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:48,768-Speed 6302.78 samples/sec Loss 2.4968 LearningRate 0.0000 Epoch: 39 Global Step: 813820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:52,016-Speed 6305.94 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 813830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:55,270-Speed 6295.87 samples/sec Loss 2.4977 LearningRate 0.0000 Epoch: 39 Global Step: 813840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:47:58,532-Speed 6278.44 samples/sec Loss 2.5168 LearningRate 0.0000 Epoch: 39 Global Step: 813850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:01,784-Speed 6300.05 samples/sec Loss 2.5524 LearningRate 0.0000 Epoch: 39 
Global Step: 813860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:05,040-Speed 6290.47 samples/sec Loss 2.5459 LearningRate 0.0000 Epoch: 39 Global Step: 813870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:08,277-Speed 6329.06 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 813880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:11,533-Speed 6290.74 samples/sec Loss 2.5135 LearningRate 0.0000 Epoch: 39 Global Step: 813890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:14,785-Speed 6298.94 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 813900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:18,039-Speed 6296.14 samples/sec Loss 2.6059 LearningRate 0.0000 Epoch: 39 Global Step: 813910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:21,305-Speed 6271.69 samples/sec Loss 2.4777 LearningRate 0.0000 Epoch: 39 Global Step: 813920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:24,559-Speed 6295.33 samples/sec Loss 2.5721 LearningRate 0.0000 Epoch: 39 Global Step: 813930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:27,818-Speed 6284.25 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 39 Global Step: 813940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:31,067-Speed 6306.29 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 39 Global Step: 813950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:34,327-Speed 6283.18 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 39 Global Step: 813960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:37,591-Speed 6276.12 samples/sec Loss 2.5304 LearningRate 0.0000 Epoch: 39 Global Step: 813970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:40,845-Speed 6295.76 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 813980 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:48:44,098-Speed 6296.47 samples/sec Loss 2.5304 LearningRate 0.0000 Epoch: 39 Global Step: 813990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:47,355-Speed 6290.00 samples/sec Loss 2.5467 LearningRate 0.0000 Epoch: 39 Global Step: 814000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:50,604-Speed 6304.55 samples/sec Loss 2.4773 LearningRate 0.0000 Epoch: 39 Global Step: 814010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:53,884-Speed 6249.76 samples/sec Loss 2.5871 LearningRate 0.0000 Epoch: 39 Global Step: 814020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:48:57,136-Speed 6297.88 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 39 Global Step: 814030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:00,391-Speed 6293.96 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 39 Global Step: 814040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:03,642-Speed 6301.17 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 814050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:06,897-Speed 6292.74 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 39 Global Step: 814060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:10,145-Speed 6306.12 samples/sec Loss 2.5716 LearningRate 0.0000 Epoch: 39 Global Step: 814070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:13,385-Speed 6324.96 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 814080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:16,636-Speed 6301.57 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 39 Global Step: 814090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:19,882-Speed 6308.95 samples/sec Loss 2.5709 LearningRate 0.0000 Epoch: 39 Global Step: 814100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:23,150-Speed 6268.65 
samples/sec Loss 2.5101 LearningRate 0.0000 Epoch: 39 Global Step: 814110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:26,400-Speed 6303.86 samples/sec Loss 2.5010 LearningRate 0.0000 Epoch: 39 Global Step: 814120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:29,703-Speed 6201.66 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 39 Global Step: 814130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:32,952-Speed 6305.09 samples/sec Loss 2.5348 LearningRate 0.0000 Epoch: 39 Global Step: 814140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:36,200-Speed 6307.29 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 39 Global Step: 814150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:39,455-Speed 6292.19 samples/sec Loss 2.5071 LearningRate 0.0000 Epoch: 39 Global Step: 814160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:42,708-Speed 6299.07 samples/sec Loss 2.4843 LearningRate 0.0000 Epoch: 39 Global Step: 814170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:45,964-Speed 6290.45 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 39 Global Step: 814180 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:49:49,204-Speed 6322.16 samples/sec Loss 2.5326 LearningRate 0.0000 Epoch: 39 Global Step: 814190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:52,465-Speed 6284.81 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 39 Global Step: 814200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:55,727-Speed 6278.93 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 39 Global Step: 814210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:49:58,996-Speed 6267.37 samples/sec Loss 2.4880 LearningRate 0.0000 Epoch: 39 Global Step: 814220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:02,381-Speed 6051.21 samples/sec Loss 2.5164 LearningRate 0.0000 Epoch: 39 
Global Step: 814230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:05,703-Speed 6165.91 samples/sec Loss 2.5022 LearningRate 0.0000 Epoch: 39 Global Step: 814240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:08,955-Speed 6297.78 samples/sec Loss 2.4824 LearningRate 0.0000 Epoch: 39 Global Step: 814250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:12,204-Speed 6305.33 samples/sec Loss 2.5367 LearningRate 0.0000 Epoch: 39 Global Step: 814260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:15,451-Speed 6309.52 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 39 Global Step: 814270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:18,717-Speed 6271.95 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 39 Global Step: 814280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:21,953-Speed 6332.94 samples/sec Loss 2.5118 LearningRate 0.0000 Epoch: 39 Global Step: 814290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:25,262-Speed 6191.53 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 814300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:28,549-Speed 6231.71 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 39 Global Step: 814310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:31,803-Speed 6294.95 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 39 Global Step: 814320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:35,061-Speed 6286.55 samples/sec Loss 2.5483 LearningRate 0.0000 Epoch: 39 Global Step: 814330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:38,314-Speed 6298.24 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 39 Global Step: 814340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:41,570-Speed 6290.67 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 39 Global Step: 814350 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:50:44,820-Speed 6302.65 samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 39 Global Step: 814360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:48,069-Speed 6305.32 samples/sec Loss 2.5402 LearningRate 0.0000 Epoch: 39 Global Step: 814370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:51,315-Speed 6311.29 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 39 Global Step: 814380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:54,553-Speed 6325.57 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 39 Global Step: 814390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:50:57,806-Speed 6298.08 samples/sec Loss 2.4825 LearningRate 0.0000 Epoch: 39 Global Step: 814400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:01,062-Speed 6290.67 samples/sec Loss 2.5650 LearningRate 0.0000 Epoch: 39 Global Step: 814410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:04,315-Speed 6298.51 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 814420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:07,565-Speed 6302.81 samples/sec Loss 2.5486 LearningRate 0.0000 Epoch: 39 Global Step: 814430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:10,812-Speed 6308.78 samples/sec Loss 2.4907 LearningRate 0.0000 Epoch: 39 Global Step: 814440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:14,061-Speed 6304.48 samples/sec Loss 2.5261 LearningRate 0.0000 Epoch: 39 Global Step: 814450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:17,310-Speed 6304.05 samples/sec Loss 2.5248 LearningRate 0.0000 Epoch: 39 Global Step: 814460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:20,563-Speed 6298.04 samples/sec Loss 2.4861 LearningRate 0.0000 Epoch: 39 Global Step: 814470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:23,811-Speed 6305.76 
samples/sec Loss 2.5442 LearningRate 0.0000 Epoch: 39 Global Step: 814480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:27,047-Speed 6331.34 samples/sec Loss 2.5484 LearningRate 0.0000 Epoch: 39 Global Step: 814490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:30,295-Speed 6307.06 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 39 Global Step: 814500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:33,548-Speed 6296.93 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 Global Step: 814510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:36,802-Speed 6293.85 samples/sec Loss 2.5684 LearningRate 0.0000 Epoch: 39 Global Step: 814520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:40,058-Speed 6292.25 samples/sec Loss 2.5508 LearningRate 0.0000 Epoch: 39 Global Step: 814530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:43,319-Speed 6282.04 samples/sec Loss 2.5452 LearningRate 0.0000 Epoch: 39 Global Step: 814540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:46,591-Speed 6259.59 samples/sec Loss 2.5554 LearningRate 0.0000 Epoch: 39 Global Step: 814550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:49,920-Speed 6153.64 samples/sec Loss 2.5224 LearningRate 0.0000 Epoch: 39 Global Step: 814560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:53,172-Speed 6300.63 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 39 Global Step: 814570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:56,426-Speed 6295.04 samples/sec Loss 2.5238 LearningRate 0.0000 Epoch: 39 Global Step: 814580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:51:59,683-Speed 6288.83 samples/sec Loss 2.5638 LearningRate 0.0000 Epoch: 39 Global Step: 814590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:52:02,937-Speed 6295.33 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 39 
Global Step: 814600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:06,196-Speed 6285.28 samples/sec Loss 2.5165 LearningRate 0.0000 Epoch: 39 Global Step: 814610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:09,444-Speed 6306.96 samples/sec Loss 2.5058 LearningRate 0.0000 Epoch: 39 Global Step: 814620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:12,725-Speed 6247.80 samples/sec Loss 2.5487 LearningRate 0.0000 Epoch: 39 Global Step: 814630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:15,978-Speed 6297.13 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 39 Global Step: 814640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:19,232-Speed 6295.23 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 39 Global Step: 814650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:22,507-Speed 6254.72 samples/sec Loss 2.5568 LearningRate 0.0000 Epoch: 39 Global Step: 814660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:25,757-Speed 6302.29 samples/sec Loss 2.5819 LearningRate 0.0000 Epoch: 39 Global Step: 814670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:29,011-Speed 6296.33 samples/sec Loss 2.5109 LearningRate 0.0000 Epoch: 39 Global Step: 814680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:32,263-Speed 6299.32 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 39 Global Step: 814690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:35,498-Speed 6331.81 samples/sec Loss 2.5634 LearningRate 0.0000 Epoch: 39 Global Step: 814700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:38,796-Speed 6211.11 samples/sec Loss 2.5105 LearningRate 0.0000 Epoch: 39 Global Step: 814710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:42,069-Speed 6257.55 samples/sec Loss 2.5350 LearningRate 0.0000 Epoch: 39 Global Step: 814720 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:52:45,333-Speed 6277.32 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 39 Global Step: 814730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:48,625-Speed 6221.81 samples/sec Loss 2.4956 LearningRate 0.0000 Epoch: 39 Global Step: 814740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:51,998-Speed 6073.92 samples/sec Loss 2.5234 LearningRate 0.0000 Epoch: 39 Global Step: 814750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:55,308-Speed 6188.50 samples/sec Loss 2.5489 LearningRate 0.0000 Epoch: 39 Global Step: 814760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:52:58,564-Speed 6290.75 samples/sec Loss 2.5241 LearningRate 0.0000 Epoch: 39 Global Step: 814770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:01,837-Speed 6259.33 samples/sec Loss 2.5622 LearningRate 0.0000 Epoch: 39 Global Step: 814780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:05,099-Speed 6279.40 samples/sec Loss 2.5211 LearningRate 0.0000 Epoch: 39 Global Step: 814790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:08,343-Speed 6314.85 samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 39 Global Step: 814800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:11,588-Speed 6312.83 samples/sec Loss 2.4772 LearningRate 0.0000 Epoch: 39 Global Step: 814810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:14,844-Speed 6292.24 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 39 Global Step: 814820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:18,099-Speed 6291.50 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 39 Global Step: 814830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:21,357-Speed 6288.91 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 39 Global Step: 814840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:24,606-Speed 6304.34 
samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 814850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:27,855-Speed 6304.58 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 39 Global Step: 814860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:31,100-Speed 6312.33 samples/sec Loss 2.5213 LearningRate 0.0000 Epoch: 39 Global Step: 814870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:34,354-Speed 6296.03 samples/sec Loss 2.5682 LearningRate 0.0000 Epoch: 39 Global Step: 814880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:37,605-Speed 6300.19 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 39 Global Step: 814890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:40,844-Speed 6324.83 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 39 Global Step: 814900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:44,088-Speed 6315.00 samples/sec Loss 2.5858 LearningRate 0.0000 Epoch: 39 Global Step: 814910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:47,338-Speed 6302.80 samples/sec Loss 2.5164 LearningRate 0.0000 Epoch: 39 Global Step: 814920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:50,594-Speed 6291.52 samples/sec Loss 2.5227 LearningRate 0.0000 Epoch: 39 Global Step: 814930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:53,853-Speed 6285.79 samples/sec Loss 2.5403 LearningRate 0.0000 Epoch: 39 Global Step: 814940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:53:57,114-Speed 6280.20 samples/sec Loss 2.5133 LearningRate 0.0000 Epoch: 39 Global Step: 814950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:00,366-Speed 6300.51 samples/sec Loss 2.5458 LearningRate 0.0000 Epoch: 39 Global Step: 814960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:03,620-Speed 6295.67 samples/sec Loss 2.4752 LearningRate 0.0000 Epoch: 39 
Global Step: 814970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:06,875-Speed 6292.36 samples/sec Loss 2.5477 LearningRate 0.0000 Epoch: 39 Global Step: 814980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:10,127-Speed 6300.14 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 39 Global Step: 814990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:13,376-Speed 6304.49 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 39 Global Step: 815000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:16,623-Speed 6307.75 samples/sec Loss 2.5583 LearningRate 0.0000 Epoch: 39 Global Step: 815010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:19,871-Speed 6306.93 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 39 Global Step: 815020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:23,124-Speed 6297.25 samples/sec Loss 2.5694 LearningRate 0.0000 Epoch: 39 Global Step: 815030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:26,390-Speed 6271.94 samples/sec Loss 2.5618 LearningRate 0.0000 Epoch: 39 Global Step: 815040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:29,645-Speed 6293.11 samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 39 Global Step: 815050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:32,894-Speed 6306.20 samples/sec Loss 2.5035 LearningRate 0.0000 Epoch: 39 Global Step: 815060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:36,151-Speed 6288.04 samples/sec Loss 2.5622 LearningRate 0.0000 Epoch: 39 Global Step: 815070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:39,404-Speed 6296.87 samples/sec Loss 2.5212 LearningRate 0.0000 Epoch: 39 Global Step: 815080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:42,653-Speed 6306.36 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 39 Global Step: 815090 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:54:45,911-Speed 6287.50 samples/sec Loss 2.5090 LearningRate 0.0000 Epoch: 39 Global Step: 815100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:49,251-Speed 6132.91 samples/sec Loss 2.5171 LearningRate 0.0000 Epoch: 39 Global Step: 815110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:52,498-Speed 6311.16 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 39 Global Step: 815120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:55,750-Speed 6297.80 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 39 Global Step: 815130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:54:58,994-Speed 6316.11 samples/sec Loss 2.5359 LearningRate 0.0000 Epoch: 39 Global Step: 815140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:02,252-Speed 6286.64 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 39 Global Step: 815150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:05,521-Speed 6266.30 samples/sec Loss 2.5055 LearningRate 0.0000 Epoch: 39 Global Step: 815160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:08,782-Speed 6282.65 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 39 Global Step: 815170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:12,041-Speed 6284.69 samples/sec Loss 2.5120 LearningRate 0.0000 Epoch: 39 Global Step: 815180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:15,296-Speed 6294.55 samples/sec Loss 2.5255 LearningRate 0.0000 Epoch: 39 Global Step: 815190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:18,533-Speed 6328.23 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 39 Global Step: 815200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:21,777-Speed 6313.72 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 39 Global Step: 815210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:25,033-Speed 6291.63 
samples/sec Loss 2.5313 LearningRate 0.0000 Epoch: 39 Global Step: 815220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:28,286-Speed 6297.08 samples/sec Loss 2.5860 LearningRate 0.0000 Epoch: 39 Global Step: 815230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:31,539-Speed 6297.43 samples/sec Loss 2.5376 LearningRate 0.0000 Epoch: 39 Global Step: 815240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:34,803-Speed 6276.41 samples/sec Loss 2.5352 LearningRate 0.0000 Epoch: 39 Global Step: 815250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:38,056-Speed 6296.92 samples/sec Loss 2.4911 LearningRate 0.0000 Epoch: 39 Global Step: 815260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:41,320-Speed 6276.17 samples/sec Loss 2.4825 LearningRate 0.0000 Epoch: 39 Global Step: 815270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:44,568-Speed 6306.58 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 39 Global Step: 815280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:47,822-Speed 6294.53 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 39 Global Step: 815290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:51,076-Speed 6295.75 samples/sec Loss 2.4924 LearningRate 0.0000 Epoch: 39 Global Step: 815300 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:55:54,326-Speed 6303.01 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 39 Global Step: 815310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:55:57,575-Speed 6304.92 samples/sec Loss 2.4892 LearningRate 0.0000 Epoch: 39 Global Step: 815320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:00,850-Speed 6254.14 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 39 Global Step: 815330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:04,098-Speed 6307.66 samples/sec Loss 2.5336 LearningRate 0.0000 Epoch: 39 
Global Step: 815340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:07,347-Speed 6303.10 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 39 Global Step: 815350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:10,596-Speed 6305.40 samples/sec Loss 2.5014 LearningRate 0.0000 Epoch: 39 Global Step: 815360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:13,859-Speed 6277.40 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 39 Global Step: 815370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:17,110-Speed 6302.47 samples/sec Loss 2.5851 LearningRate 0.0000 Epoch: 39 Global Step: 815380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:20,360-Speed 6302.26 samples/sec Loss 2.5533 LearningRate 0.0000 Epoch: 39 Global Step: 815390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:23,606-Speed 6311.39 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 815400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:26,848-Speed 6317.87 samples/sec Loss 2.4765 LearningRate 0.0000 Epoch: 39 Global Step: 815410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:30,098-Speed 6302.89 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 39 Global Step: 815420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:33,364-Speed 6273.33 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 39 Global Step: 815430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:36,636-Speed 6260.62 samples/sec Loss 2.5252 LearningRate 0.0000 Epoch: 39 Global Step: 815440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:39,892-Speed 6289.95 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 39 Global Step: 815450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:43,141-Speed 6305.81 samples/sec Loss 2.4705 LearningRate 0.0000 Epoch: 39 Global Step: 815460 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:56:46,392-Speed 6300.13 samples/sec Loss 2.5298 LearningRate 0.0000 Epoch: 39 Global Step: 815470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:49,644-Speed 6299.51 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 39 Global Step: 815480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:52,901-Speed 6289.43 samples/sec Loss 2.5171 LearningRate 0.0000 Epoch: 39 Global Step: 815490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:56,157-Speed 6290.66 samples/sec Loss 2.4645 LearningRate 0.0000 Epoch: 39 Global Step: 815500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:56:59,398-Speed 6320.57 samples/sec Loss 2.5018 LearningRate 0.0000 Epoch: 39 Global Step: 815510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:02,652-Speed 6295.27 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 39 Global Step: 815520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:05,907-Speed 6293.75 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 39 Global Step: 815530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:09,152-Speed 6311.65 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 39 Global Step: 815540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:12,421-Speed 6267.72 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 39 Global Step: 815550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:15,681-Speed 6283.35 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 39 Global Step: 815560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:18,935-Speed 6297.09 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 39 Global Step: 815570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:22,184-Speed 6305.27 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 39 Global Step: 815580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:25,437-Speed 6297.81 
samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 39 Global Step: 815590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:28,697-Speed 6283.85 samples/sec Loss 2.5306 LearningRate 0.0000 Epoch: 39 Global Step: 815600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:31,947-Speed 6303.24 samples/sec Loss 2.4880 LearningRate 0.0000 Epoch: 39 Global Step: 815610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:57:35,188-Speed 6320.55 samples/sec Loss 2.5389 LearningRate 0.0000 Epoch: 39 Global Step: 815620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:38,437-Speed 6304.05 samples/sec Loss 2.4796 LearningRate 0.0000 Epoch: 39 Global Step: 815630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:41,693-Speed 6290.69 samples/sec Loss 2.5442 LearningRate 0.0000 Epoch: 39 Global Step: 815640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:44,951-Speed 6287.16 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 39 Global Step: 815650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:48,223-Speed 6263.20 samples/sec Loss 2.5247 LearningRate 0.0000 Epoch: 39 Global Step: 815660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:51,475-Speed 6299.90 samples/sec Loss 2.5052 LearningRate 0.0000 Epoch: 39 Global Step: 815670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:54,723-Speed 6306.87 samples/sec Loss 2.5319 LearningRate 0.0000 Epoch: 39 Global Step: 815680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:57:57,980-Speed 6289.05 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 39 Global Step: 815690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:01,233-Speed 6296.93 samples/sec Loss 2.5580 LearningRate 0.0000 Epoch: 39 Global Step: 815700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:04,495-Speed 6280.45 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 
Global Step: 815710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:07,729-Speed 6333.54 samples/sec Loss 2.5273 LearningRate 0.0000 Epoch: 39 Global Step: 815720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:10,977-Speed 6306.53 samples/sec Loss 2.4996 LearningRate 0.0000 Epoch: 39 Global Step: 815730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:14,225-Speed 6306.46 samples/sec Loss 2.5591 LearningRate 0.0000 Epoch: 39 Global Step: 815740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:17,492-Speed 6270.82 samples/sec Loss 2.5703 LearningRate 0.0000 Epoch: 39 Global Step: 815750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:20,737-Speed 6311.82 samples/sec Loss 2.5319 LearningRate 0.0000 Epoch: 39 Global Step: 815760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:23,989-Speed 6300.13 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 39 Global Step: 815770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:27,317-Speed 6155.36 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 39 Global Step: 815780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:30,649-Speed 6148.18 samples/sec Loss 2.5032 LearningRate 0.0000 Epoch: 39 Global Step: 815790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:33,901-Speed 6299.03 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 815800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:37,160-Speed 6286.77 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 815810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:40,411-Speed 6300.14 samples/sec Loss 2.4839 LearningRate 0.0000 Epoch: 39 Global Step: 815820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:43,664-Speed 6296.87 samples/sec Loss 2.5217 LearningRate 0.0000 Epoch: 39 Global Step: 815830 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 17:58:46,923-Speed 6285.61 samples/sec Loss 2.5159 LearningRate 0.0000 Epoch: 39 Global Step: 815840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:50,176-Speed 6296.73 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 39 Global Step: 815850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:53,436-Speed 6283.82 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 815860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:56,683-Speed 6308.14 samples/sec Loss 2.5181 LearningRate 0.0000 Epoch: 39 Global Step: 815870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:58:59,931-Speed 6308.35 samples/sec Loss 2.5526 LearningRate 0.0000 Epoch: 39 Global Step: 815880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:03,210-Speed 6245.91 samples/sec Loss 2.5818 LearningRate 0.0000 Epoch: 39 Global Step: 815890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:06,463-Speed 6297.91 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 39 Global Step: 815900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:09,761-Speed 6211.41 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 39 Global Step: 815910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:13,003-Speed 6317.27 samples/sec Loss 2.5384 LearningRate 0.0000 Epoch: 39 Global Step: 815920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:16,302-Speed 6210.13 samples/sec Loss 2.5239 LearningRate 0.0000 Epoch: 39 Global Step: 815930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:19,606-Speed 6198.61 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 39 Global Step: 815940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:22,857-Speed 6302.37 samples/sec Loss 2.5252 LearningRate 0.0000 Epoch: 39 Global Step: 815950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:26,109-Speed 6298.78 
samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 815960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:29,396-Speed 6230.75 samples/sec Loss 2.5127 LearningRate 0.0000 Epoch: 39 Global Step: 815970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:32,656-Speed 6285.52 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 39 Global Step: 815980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:35,901-Speed 6313.22 samples/sec Loss 2.5606 LearningRate 0.0000 Epoch: 39 Global Step: 815990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:39,151-Speed 6303.02 samples/sec Loss 2.5151 LearningRate 0.0000 Epoch: 39 Global Step: 816000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:42,410-Speed 6283.95 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 39 Global Step: 816010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:45,666-Speed 6291.74 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 816020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 17:59:48,940-Speed 6257.09 samples/sec Loss 2.5966 LearningRate 0.0000 Epoch: 39 Global Step: 816030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:52,307-Speed 6083.46 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 39 Global Step: 816040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:55,630-Speed 6165.53 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 39 Global Step: 816050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 17:59:58,881-Speed 6299.70 samples/sec Loss 2.4946 LearningRate 0.0000 Epoch: 39 Global Step: 816060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:02,133-Speed 6299.68 samples/sec Loss 2.5102 LearningRate 0.0000 Epoch: 39 Global Step: 816070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:05,385-Speed 6298.58 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 
Global Step: 816080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:08,641-Speed 6291.41 samples/sec Loss 2.5750 LearningRate 0.0000 Epoch: 39 Global Step: 816090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:11,890-Speed 6305.05 samples/sec Loss 2.5114 LearningRate 0.0000 Epoch: 39 Global Step: 816100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:15,146-Speed 6292.11 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 39 Global Step: 816110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:18,395-Speed 6303.95 samples/sec Loss 2.5700 LearningRate 0.0000 Epoch: 39 Global Step: 816120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:21,638-Speed 6317.85 samples/sec Loss 2.4785 LearningRate 0.0000 Epoch: 39 Global Step: 816130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:24,887-Speed 6304.23 samples/sec Loss 2.5115 LearningRate 0.0000 Epoch: 39 Global Step: 816140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:28,136-Speed 6305.15 samples/sec Loss 2.5044 LearningRate 0.0000 Epoch: 39 Global Step: 816150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:31,403-Speed 6269.25 samples/sec Loss 2.5321 LearningRate 0.0000 Epoch: 39 Global Step: 816160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:34,657-Speed 6296.65 samples/sec Loss 2.4665 LearningRate 0.0000 Epoch: 39 Global Step: 816170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:37,905-Speed 6306.23 samples/sec Loss 2.5913 LearningRate 0.0000 Epoch: 39 Global Step: 816180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:41,155-Speed 6303.75 samples/sec Loss 2.5604 LearningRate 0.0000 Epoch: 39 Global Step: 816190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:44,404-Speed 6304.31 samples/sec Loss 2.4943 LearningRate 0.0000 Epoch: 39 Global Step: 816200 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:00:47,649-Speed 6314.17 samples/sec Loss 2.5480 LearningRate 0.0000 Epoch: 39 Global Step: 816210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:50,900-Speed 6300.09 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 816220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:54,135-Speed 6332.04 samples/sec Loss 2.4606 LearningRate 0.0000 Epoch: 39 Global Step: 816230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:00:57,392-Speed 6290.05 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 39 Global Step: 816240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:00,658-Speed 6271.79 samples/sec Loss 2.5099 LearningRate 0.0000 Epoch: 39 Global Step: 816250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:03,914-Speed 6291.94 samples/sec Loss 2.5204 LearningRate 0.0000 Epoch: 39 Global Step: 816260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:07,171-Speed 6288.78 samples/sec Loss 2.5454 LearningRate 0.0000 Epoch: 39 Global Step: 816270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:10,416-Speed 6312.26 samples/sec Loss 2.5295 LearningRate 0.0000 Epoch: 39 Global Step: 816280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:13,660-Speed 6313.81 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 39 Global Step: 816290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:16,915-Speed 6294.42 samples/sec Loss 2.5349 LearningRate 0.0000 Epoch: 39 Global Step: 816300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:20,168-Speed 6297.65 samples/sec Loss 2.5316 LearningRate 0.0000 Epoch: 39 Global Step: 816310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:23,422-Speed 6295.28 samples/sec Loss 2.4809 LearningRate 0.0000 Epoch: 39 Global Step: 816320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:26,658-Speed 6328.41 
samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 39 Global Step: 816330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:29,908-Speed 6303.98 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 39 Global Step: 816340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:33,165-Speed 6289.67 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 39 Global Step: 816350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:36,445-Speed 6244.17 samples/sec Loss 2.5680 LearningRate 0.0000 Epoch: 39 Global Step: 816360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:39,703-Speed 6288.23 samples/sec Loss 2.5300 LearningRate 0.0000 Epoch: 39 Global Step: 816370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:42,952-Speed 6304.38 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 39 Global Step: 816380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:46,215-Speed 6278.20 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 39 Global Step: 816390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:49,481-Speed 6275.72 samples/sec Loss 2.6021 LearningRate 0.0000 Epoch: 39 Global Step: 816400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:52,732-Speed 6300.02 samples/sec Loss 2.5744 LearningRate 0.0000 Epoch: 39 Global Step: 816410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:55,980-Speed 6307.57 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 816420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:01:59,220-Speed 6322.61 samples/sec Loss 2.4893 LearningRate 0.0000 Epoch: 39 Global Step: 816430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:02,472-Speed 6297.86 samples/sec Loss 2.5098 LearningRate 0.0000 Epoch: 39 Global Step: 816440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:05,722-Speed 6304.53 samples/sec Loss 2.5087 LearningRate 0.0000 Epoch: 39 
Global Step: 816450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:08,971-Speed 6303.25 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 39 Global Step: 816460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:12,224-Speed 6298.15 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 39 Global Step: 816470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:15,475-Speed 6300.48 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 816480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:18,719-Speed 6314.13 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 816490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:21,979-Speed 6284.83 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 39 Global Step: 816500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:25,236-Speed 6289.07 samples/sec Loss 2.5228 LearningRate 0.0000 Epoch: 39 Global Step: 816510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:28,494-Speed 6286.52 samples/sec Loss 2.5156 LearningRate 0.0000 Epoch: 39 Global Step: 816520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:31,729-Speed 6332.59 samples/sec Loss 2.5202 LearningRate 0.0000 Epoch: 39 Global Step: 816530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:34,985-Speed 6291.66 samples/sec Loss 2.5118 LearningRate 0.0000 Epoch: 39 Global Step: 816540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:38,240-Speed 6292.95 samples/sec Loss 2.5447 LearningRate 0.0000 Epoch: 39 Global Step: 816550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:41,493-Speed 6297.78 samples/sec Loss 2.5029 LearningRate 0.0000 Epoch: 39 Global Step: 816560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:44,744-Speed 6300.19 samples/sec Loss 2.5441 LearningRate 0.0000 Epoch: 39 Global Step: 816570 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:02:47,997-Speed 6298.68 samples/sec Loss 2.5079 LearningRate 0.0000 Epoch: 39 Global Step: 816580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:51,263-Speed 6271.18 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 39 Global Step: 816590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:54,523-Speed 6283.58 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 39 Global Step: 816600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:02:57,769-Speed 6312.65 samples/sec Loss 2.5598 LearningRate 0.0000 Epoch: 39 Global Step: 816610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:01,030-Speed 6280.69 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 39 Global Step: 816620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:04,271-Speed 6320.56 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 39 Global Step: 816630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:07,522-Speed 6300.82 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 816640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:10,765-Speed 6316.42 samples/sec Loss 2.5183 LearningRate 0.0000 Epoch: 39 Global Step: 816650 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:14,011-Speed 6310.07 samples/sec Loss 2.5137 LearningRate 0.0000 Epoch: 39 Global Step: 816660 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:17,279-Speed 6269.15 samples/sec Loss 2.5433 LearningRate 0.0000 Epoch: 39 Global Step: 816670 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:20,530-Speed 6301.57 samples/sec Loss 2.5114 LearningRate 0.0000 Epoch: 39 Global Step: 816680 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:23,783-Speed 6296.05 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 39 Global Step: 816690 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:27,036-Speed 6298.03 
samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 39 Global Step: 816700 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:30,289-Speed 6296.73 samples/sec Loss 2.5212 LearningRate 0.0000 Epoch: 39 Global Step: 816710 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:33,541-Speed 6298.68 samples/sec Loss 2.5281 LearningRate 0.0000 Epoch: 39 Global Step: 816720 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:36,790-Speed 6304.60 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 816730 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:40,041-Speed 6300.70 samples/sec Loss 2.5633 LearningRate 0.0000 Epoch: 39 Global Step: 816740 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:03:43,302-Speed 6282.81 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 39 Global Step: 816750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:46,551-Speed 6304.47 samples/sec Loss 2.5850 LearningRate 0.0000 Epoch: 39 Global Step: 816760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:49,856-Speed 6198.16 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 39 Global Step: 816770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:53,108-Speed 6298.59 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 39 Global Step: 816780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:56,366-Speed 6288.37 samples/sec Loss 2.5610 LearningRate 0.0000 Epoch: 39 Global Step: 816790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:03:59,622-Speed 6292.32 samples/sec Loss 2.5021 LearningRate 0.0000 Epoch: 39 Global Step: 816800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:02,883-Speed 6281.81 samples/sec Loss 2.5382 LearningRate 0.0000 Epoch: 39 Global Step: 816810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:06,142-Speed 6285.52 samples/sec Loss 2.5520 LearningRate 0.0000 Epoch: 39 
Global Step: 816820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:09,383-Speed 6320.50 samples/sec Loss 2.5102 LearningRate 0.0000 Epoch: 39 Global Step: 816830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:12,636-Speed 6296.54 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 39 Global Step: 816840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:15,873-Speed 6327.62 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 39 Global Step: 816850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:19,122-Speed 6305.04 samples/sec Loss 2.4914 LearningRate 0.0000 Epoch: 39 Global Step: 816860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:22,372-Speed 6303.16 samples/sec Loss 2.5363 LearningRate 0.0000 Epoch: 39 Global Step: 816870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:25,636-Speed 6276.34 samples/sec Loss 2.5461 LearningRate 0.0000 Epoch: 39 Global Step: 816880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:28,889-Speed 6296.30 samples/sec Loss 2.5441 LearningRate 0.0000 Epoch: 39 Global Step: 816890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:32,142-Speed 6298.10 samples/sec Loss 2.5476 LearningRate 0.0000 Epoch: 39 Global Step: 816900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:35,392-Speed 6301.58 samples/sec Loss 2.4940 LearningRate 0.0000 Epoch: 39 Global Step: 816910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:38,657-Speed 6274.09 samples/sec Loss 2.5770 LearningRate 0.0000 Epoch: 39 Global Step: 816920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:41,908-Speed 6302.22 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 39 Global Step: 816930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:45,159-Speed 6300.61 samples/sec Loss 2.5512 LearningRate 0.0000 Epoch: 39 Global Step: 816940 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:04:48,388-Speed 6342.83 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 39 Global Step: 816950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:51,636-Speed 6307.92 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 39 Global Step: 816960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:54,893-Speed 6288.86 samples/sec Loss 2.5239 LearningRate 0.0000 Epoch: 39 Global Step: 816970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:04:58,145-Speed 6299.90 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 39 Global Step: 816980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:01,405-Speed 6281.67 samples/sec Loss 2.5814 LearningRate 0.0000 Epoch: 39 Global Step: 816990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:04,659-Speed 6297.71 samples/sec Loss 2.5214 LearningRate 0.0000 Epoch: 39 Global Step: 817000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:07,909-Speed 6301.92 samples/sec Loss 2.5142 LearningRate 0.0000 Epoch: 39 Global Step: 817010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:11,163-Speed 6296.07 samples/sec Loss 2.5693 LearningRate 0.0000 Epoch: 39 Global Step: 817020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:14,418-Speed 6292.80 samples/sec Loss 2.5196 LearningRate 0.0000 Epoch: 39 Global Step: 817030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:17,676-Speed 6287.11 samples/sec Loss 2.5089 LearningRate 0.0000 Epoch: 39 Global Step: 817040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:20,913-Speed 6329.14 samples/sec Loss 2.5696 LearningRate 0.0000 Epoch: 39 Global Step: 817050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:24,166-Speed 6297.15 samples/sec Loss 2.5611 LearningRate 0.0000 Epoch: 39 Global Step: 817060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:27,422-Speed 6289.81 
samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 39 Global Step: 817070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:30,676-Speed 6295.26 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 39 Global Step: 817080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:33,933-Speed 6290.54 samples/sec Loss 2.5256 LearningRate 0.0000 Epoch: 39 Global Step: 817090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:37,183-Speed 6303.25 samples/sec Loss 2.5091 LearningRate 0.0000 Epoch: 39 Global Step: 817100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:40,443-Speed 6282.98 samples/sec Loss 2.4923 LearningRate 0.0000 Epoch: 39 Global Step: 817110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:43,695-Speed 6299.79 samples/sec Loss 2.5284 LearningRate 0.0000 Epoch: 39 Global Step: 817120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:46,947-Speed 6298.31 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 39 Global Step: 817130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:50,195-Speed 6305.86 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 39 Global Step: 817140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:53,437-Speed 6319.04 samples/sec Loss 2.5614 LearningRate 0.0000 Epoch: 39 Global Step: 817150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:56,689-Speed 6299.78 samples/sec Loss 2.5658 LearningRate 0.0000 Epoch: 39 Global Step: 817160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:05:59,934-Speed 6312.09 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 817170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:03,186-Speed 6299.46 samples/sec Loss 2.5040 LearningRate 0.0000 Epoch: 39 Global Step: 817180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:06,441-Speed 6293.24 samples/sec Loss 2.5043 LearningRate 0.0000 Epoch: 39 
Global Step: 817190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:09,697-Speed 6291.66 samples/sec Loss 2.5588 LearningRate 0.0000 Epoch: 39 Global Step: 817200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:12,946-Speed 6305.60 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 39 Global Step: 817210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:16,200-Speed 6294.40 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 817220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:19,454-Speed 6295.89 samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 39 Global Step: 817230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:22,700-Speed 6310.29 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 39 Global Step: 817240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:25,939-Speed 6325.14 samples/sec Loss 2.4892 LearningRate 0.0000 Epoch: 39 Global Step: 817250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:29,199-Speed 6283.15 samples/sec Loss 2.5256 LearningRate 0.0000 Epoch: 39 Global Step: 817260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:32,460-Speed 6281.67 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 39 Global Step: 817270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:35,720-Speed 6284.75 samples/sec Loss 2.5377 LearningRate 0.0000 Epoch: 39 Global Step: 817280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:38,976-Speed 6290.23 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 39 Global Step: 817290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:42,230-Speed 6295.11 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 39 Global Step: 817300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:45,480-Speed 6302.70 samples/sec Loss 2.4928 LearningRate 0.0000 Epoch: 39 Global Step: 817310 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:06:48,735-Speed 6294.40 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 39 Global Step: 817320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:51,986-Speed 6301.20 samples/sec Loss 2.5548 LearningRate 0.0000 Epoch: 39 Global Step: 817330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:55,241-Speed 6291.93 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 817340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:06:58,496-Speed 6294.10 samples/sec Loss 2.5500 LearningRate 0.0000 Epoch: 39 Global Step: 817350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:07:01,744-Speed 6306.86 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 817360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:05,048-Speed 6200.23 samples/sec Loss 2.5091 LearningRate 0.0000 Epoch: 39 Global Step: 817370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:08,301-Speed 6296.24 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 817380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:11,550-Speed 6306.10 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 39 Global Step: 817390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:14,804-Speed 6295.41 samples/sec Loss 2.5408 LearningRate 0.0000 Epoch: 39 Global Step: 817400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:18,052-Speed 6306.76 samples/sec Loss 2.5584 LearningRate 0.0000 Epoch: 39 Global Step: 817410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:21,303-Speed 6300.53 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 817420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:24,560-Speed 6289.51 samples/sec Loss 2.4968 LearningRate 0.0000 Epoch: 39 Global Step: 817430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:27,814-Speed 6296.02 
samples/sec Loss 2.4738 LearningRate 0.0000 Epoch: 39 Global Step: 817440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:31,064-Speed 6303.50 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 39 Global Step: 817450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:34,299-Speed 6331.89 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 39 Global Step: 817460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:37,546-Speed 6307.71 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 39 Global Step: 817470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:40,803-Speed 6290.96 samples/sec Loss 2.4877 LearningRate 0.0000 Epoch: 39 Global Step: 817480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:44,053-Speed 6301.12 samples/sec Loss 2.5326 LearningRate 0.0000 Epoch: 39 Global Step: 817490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:47,316-Speed 6277.52 samples/sec Loss 2.5384 LearningRate 0.0000 Epoch: 39 Global Step: 817500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:50,570-Speed 6296.19 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 39 Global Step: 817510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:53,824-Speed 6295.40 samples/sec Loss 2.5016 LearningRate 0.0000 Epoch: 39 Global Step: 817520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:07:57,073-Speed 6304.31 samples/sec Loss 2.5319 LearningRate 0.0000 Epoch: 39 Global Step: 817530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:00,324-Speed 6300.52 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 817540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:03,579-Speed 6293.88 samples/sec Loss 2.5604 LearningRate 0.0000 Epoch: 39 Global Step: 817550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:06,813-Speed 6334.62 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 39 
Global Step: 817560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:10,070-Speed 6288.97 samples/sec Loss 2.5527 LearningRate 0.0000 Epoch: 39 Global Step: 817570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:13,317-Speed 6308.03 samples/sec Loss 2.5108 LearningRate 0.0000 Epoch: 39 Global Step: 817580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:16,583-Speed 6273.24 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 39 Global Step: 817590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:19,837-Speed 6294.24 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 817600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:23,089-Speed 6300.19 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 39 Global Step: 817610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:26,347-Speed 6287.15 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 39 Global Step: 817620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:29,594-Speed 6308.40 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 39 Global Step: 817630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:32,849-Speed 6293.99 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 817640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:36,096-Speed 6309.44 samples/sec Loss 2.5235 LearningRate 0.0000 Epoch: 39 Global Step: 817650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:39,334-Speed 6326.71 samples/sec Loss 2.5250 LearningRate 0.0000 Epoch: 39 Global Step: 817660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:42,588-Speed 6294.10 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 39 Global Step: 817670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:45,840-Speed 6299.78 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 39 Global Step: 817680 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:08:49,094-Speed 6294.80 samples/sec Loss 2.5896 LearningRate 0.0000 Epoch: 39 Global Step: 817690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:52,353-Speed 6285.39 samples/sec Loss 2.5113 LearningRate 0.0000 Epoch: 39 Global Step: 817700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:55,619-Speed 6271.30 samples/sec Loss 2.5687 LearningRate 0.0000 Epoch: 39 Global Step: 817710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:08:58,885-Speed 6273.36 samples/sec Loss 2.5258 LearningRate 0.0000 Epoch: 39 Global Step: 817720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:02,139-Speed 6294.41 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 39 Global Step: 817730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:05,394-Speed 6292.87 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 39 Global Step: 817740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:08,647-Speed 6297.92 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 39 Global Step: 817750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:11,914-Speed 6269.69 samples/sec Loss 2.4784 LearningRate 0.0000 Epoch: 39 Global Step: 817760 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:09:15,156-Speed 6318.24 samples/sec Loss 2.5664 LearningRate 0.0000 Epoch: 39 Global Step: 817770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:18,407-Speed 6302.11 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 39 Global Step: 817780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:21,661-Speed 6294.90 samples/sec Loss 2.5227 LearningRate 0.0000 Epoch: 39 Global Step: 817790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:24,912-Speed 6301.53 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 39 Global Step: 817800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:28,165-Speed 6296.25 
samples/sec Loss 2.5082 LearningRate 0.0000 Epoch: 39 Global Step: 817810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:31,429-Speed 6277.63 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 39 Global Step: 817820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:34,684-Speed 6292.76 samples/sec Loss 2.5142 LearningRate 0.0000 Epoch: 39 Global Step: 817830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:37,938-Speed 6295.12 samples/sec Loss 2.5269 LearningRate 0.0000 Epoch: 39 Global Step: 817840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:41,186-Speed 6307.80 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 817850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:44,435-Speed 6303.83 samples/sec Loss 2.4915 LearningRate 0.0000 Epoch: 39 Global Step: 817860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:47,681-Speed 6311.55 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 39 Global Step: 817870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:50,932-Speed 6299.59 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 39 Global Step: 817880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:54,188-Speed 6291.20 samples/sec Loss 2.5634 LearningRate 0.0000 Epoch: 39 Global Step: 817890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:09:57,441-Speed 6297.48 samples/sec Loss 2.5532 LearningRate 0.0000 Epoch: 39 Global Step: 817900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:00,704-Speed 6278.59 samples/sec Loss 2.5215 LearningRate 0.0000 Epoch: 39 Global Step: 817910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:03,955-Speed 6299.84 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 39 Global Step: 817920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:07,207-Speed 6300.02 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 39 
Global Step: 817930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:10,462-Speed 6293.34 samples/sec Loss 2.5566 LearningRate 0.0000 Epoch: 39 Global Step: 817940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:13,726-Speed 6276.46 samples/sec Loss 2.5479 LearningRate 0.0000 Epoch: 39 Global Step: 817950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:16,985-Speed 6285.67 samples/sec Loss 2.4886 LearningRate 0.0000 Epoch: 39 Global Step: 817960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:20,219-Speed 6334.18 samples/sec Loss 2.5086 LearningRate 0.0000 Epoch: 39 Global Step: 817970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:23,474-Speed 6292.56 samples/sec Loss 2.5243 LearningRate 0.0000 Epoch: 39 Global Step: 817980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:26,726-Speed 6298.80 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 39 Global Step: 817990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:29,984-Speed 6287.52 samples/sec Loss 2.5425 LearningRate 0.0000 Epoch: 39 Global Step: 818000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:33,232-Speed 6306.38 samples/sec Loss 2.5158 LearningRate 0.0000 Epoch: 39 Global Step: 818010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:36,485-Speed 6297.45 samples/sec Loss 2.5839 LearningRate 0.0000 Epoch: 39 Global Step: 818020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:39,738-Speed 6297.39 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 39 Global Step: 818030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:42,990-Speed 6299.63 samples/sec Loss 2.5321 LearningRate 0.0000 Epoch: 39 Global Step: 818040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:46,267-Speed 6251.83 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 818050 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:10:49,515-Speed 6305.80 samples/sec Loss 2.5516 LearningRate 0.0000 Epoch: 39 Global Step: 818060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:52,762-Speed 6309.45 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 39 Global Step: 818070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:56,016-Speed 6294.70 samples/sec Loss 2.4886 LearningRate 0.0000 Epoch: 39 Global Step: 818080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:10:59,266-Speed 6302.02 samples/sec Loss 2.4977 LearningRate 0.0000 Epoch: 39 Global Step: 818090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:02,515-Speed 6305.73 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 39 Global Step: 818100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:05,761-Speed 6310.58 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 818110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:09,008-Speed 6308.34 samples/sec Loss 2.5564 LearningRate 0.0000 Epoch: 39 Global Step: 818120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:12,262-Speed 6295.86 samples/sec Loss 2.4896 LearningRate 0.0000 Epoch: 39 Global Step: 818130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:15,533-Speed 6261.60 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 39 Global Step: 818140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:18,785-Speed 6299.84 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 39 Global Step: 818150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:22,038-Speed 6296.66 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 39 Global Step: 818160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:25,289-Speed 6301.84 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 39 Global Step: 818170 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:11:28,601-Speed 6183.93 
samples/sec Loss 2.5531 LearningRate 0.0000 Epoch: 39 Global Step: 818180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:31,860-Speed 6286.23 samples/sec Loss 2.5345 LearningRate 0.0000 Epoch: 39 Global Step: 818190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:35,121-Speed 6282.01 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 818200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:38,376-Speed 6293.07 samples/sec Loss 2.5071 LearningRate 0.0000 Epoch: 39 Global Step: 818210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:41,630-Speed 6295.28 samples/sec Loss 2.5395 LearningRate 0.0000 Epoch: 39 Global Step: 818220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:44,882-Speed 6298.99 samples/sec Loss 2.5125 LearningRate 0.0000 Epoch: 39 Global Step: 818230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:48,149-Speed 6270.94 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 39 Global Step: 818240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:51,407-Speed 6287.77 samples/sec Loss 2.5156 LearningRate 0.0000 Epoch: 39 Global Step: 818250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:54,659-Speed 6298.18 samples/sec Loss 2.5364 LearningRate 0.0000 Epoch: 39 Global Step: 818260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:11:57,910-Speed 6301.44 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 39 Global Step: 818270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:01,143-Speed 6335.87 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 39 Global Step: 818280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:04,394-Speed 6301.43 samples/sec Loss 2.5061 LearningRate 0.0000 Epoch: 39 Global Step: 818290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:07,644-Speed 6302.03 samples/sec Loss 2.4440 LearningRate 0.0000 Epoch: 39 
Global Step: 818300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:10,900-Speed 6292.09 samples/sec Loss 2.4851 LearningRate 0.0000 Epoch: 39 Global Step: 818310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:14,161-Speed 6281.10 samples/sec Loss 2.5099 LearningRate 0.0000 Epoch: 39 Global Step: 818320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:17,415-Speed 6296.42 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 39 Global Step: 818330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:20,666-Speed 6299.95 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 39 Global Step: 818340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:23,922-Speed 6291.72 samples/sec Loss 2.5734 LearningRate 0.0000 Epoch: 39 Global Step: 818350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:27,171-Speed 6303.35 samples/sec Loss 2.4958 LearningRate 0.0000 Epoch: 39 Global Step: 818360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:30,423-Speed 6299.69 samples/sec Loss 2.5281 LearningRate 0.0000 Epoch: 39 Global Step: 818370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:33,663-Speed 6321.85 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 39 Global Step: 818380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:36,924-Speed 6283.15 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 39 Global Step: 818390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:40,175-Speed 6299.73 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 39 Global Step: 818400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:43,435-Speed 6285.09 samples/sec Loss 2.5154 LearningRate 0.0000 Epoch: 39 Global Step: 818410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:46,686-Speed 6300.33 samples/sec Loss 2.5020 LearningRate 0.0000 Epoch: 39 Global Step: 818420 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:12:49,935-Speed 6305.47 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 39 Global Step: 818430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:53,184-Speed 6304.62 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 39 Global Step: 818440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:56,426-Speed 6317.71 samples/sec Loss 2.5223 LearningRate 0.0000 Epoch: 39 Global Step: 818450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:12:59,678-Speed 6299.00 samples/sec Loss 2.5122 LearningRate 0.0000 Epoch: 39 Global Step: 818460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:02,929-Speed 6301.76 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 39 Global Step: 818470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:06,166-Speed 6329.30 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 818480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:09,411-Speed 6311.73 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 39 Global Step: 818490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:12,661-Speed 6304.12 samples/sec Loss 2.5291 LearningRate 0.0000 Epoch: 39 Global Step: 818500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:15,920-Speed 6285.14 samples/sec Loss 2.5214 LearningRate 0.0000 Epoch: 39 Global Step: 818510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:19,166-Speed 6310.66 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 39 Global Step: 818520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:22,421-Speed 6292.63 samples/sec Loss 2.4987 LearningRate 0.0000 Epoch: 39 Global Step: 818530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:25,665-Speed 6315.34 samples/sec Loss 2.4887 LearningRate 0.0000 Epoch: 39 Global Step: 818540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:28,919-Speed 6294.32 
samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 39 Global Step: 818550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:32,174-Speed 6292.92 samples/sec Loss 2.5376 LearningRate 0.0000 Epoch: 39 Global Step: 818560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:35,420-Speed 6311.93 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 39 Global Step: 818570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:38,669-Speed 6304.61 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 39 Global Step: 818580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:13:41,909-Speed 6322.67 samples/sec Loss 2.4986 LearningRate 0.0000 Epoch: 39 Global Step: 818590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:45,162-Speed 6296.00 samples/sec Loss 2.5097 LearningRate 0.0000 Epoch: 39 Global Step: 818600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:48,413-Speed 6302.01 samples/sec Loss 2.5024 LearningRate 0.0000 Epoch: 39 Global Step: 818610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:51,670-Speed 6288.71 samples/sec Loss 2.5812 LearningRate 0.0000 Epoch: 39 Global Step: 818620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:54,921-Speed 6302.31 samples/sec Loss 2.5499 LearningRate 0.0000 Epoch: 39 Global Step: 818630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:13:58,173-Speed 6298.18 samples/sec Loss 2.5895 LearningRate 0.0000 Epoch: 39 Global Step: 818640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:01,428-Speed 6292.41 samples/sec Loss 2.5434 LearningRate 0.0000 Epoch: 39 Global Step: 818650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:04,682-Speed 6297.01 samples/sec Loss 2.5592 LearningRate 0.0000 Epoch: 39 Global Step: 818660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:07,934-Speed 6298.81 samples/sec Loss 2.5077 LearningRate 0.0000 Epoch: 39 
Global Step: 818670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:11,183-Speed 6305.04 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 39 Global Step: 818680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:14,423-Speed 6322.90 samples/sec Loss 2.5183 LearningRate 0.0000 Epoch: 39 Global Step: 818690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:17,681-Speed 6287.43 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 39 Global Step: 818700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:20,957-Speed 6253.39 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 39 Global Step: 818710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:24,206-Speed 6305.08 samples/sec Loss 2.5069 LearningRate 0.0000 Epoch: 39 Global Step: 818720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:27,456-Speed 6302.55 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 818730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:30,701-Speed 6312.09 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 818740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:33,950-Speed 6305.34 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 39 Global Step: 818750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:37,201-Speed 6301.42 samples/sec Loss 2.5880 LearningRate 0.0000 Epoch: 39 Global Step: 818760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:40,448-Speed 6307.33 samples/sec Loss 2.5603 LearningRate 0.0000 Epoch: 39 Global Step: 818770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:43,693-Speed 6312.76 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 39 Global Step: 818780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:46,939-Speed 6311.87 samples/sec Loss 2.4961 LearningRate 0.0000 Epoch: 39 Global Step: 818790 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:14:50,198-Speed 6284.40 samples/sec Loss 2.5153 LearningRate 0.0000 Epoch: 39 Global Step: 818800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:53,442-Speed 6315.79 samples/sec Loss 2.5561 LearningRate 0.0000 Epoch: 39 Global Step: 818810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:56,692-Speed 6303.19 samples/sec Loss 2.4961 LearningRate 0.0000 Epoch: 39 Global Step: 818820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:14:59,941-Speed 6303.38 samples/sec Loss 2.4891 LearningRate 0.0000 Epoch: 39 Global Step: 818830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:03,190-Speed 6306.12 samples/sec Loss 2.5105 LearningRate 0.0000 Epoch: 39 Global Step: 818840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:06,439-Speed 6304.34 samples/sec Loss 2.5708 LearningRate 0.0000 Epoch: 39 Global Step: 818850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:09,690-Speed 6301.13 samples/sec Loss 2.4849 LearningRate 0.0000 Epoch: 39 Global Step: 818860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:12,952-Speed 6279.83 samples/sec Loss 2.5273 LearningRate 0.0000 Epoch: 39 Global Step: 818870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:16,313-Speed 6095.50 samples/sec Loss 2.5191 LearningRate 0.0000 Epoch: 39 Global Step: 818880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:19,551-Speed 6326.19 samples/sec Loss 2.5256 LearningRate 0.0000 Epoch: 39 Global Step: 818890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:22,797-Speed 6310.14 samples/sec Loss 2.5335 LearningRate 0.0000 Epoch: 39 Global Step: 818900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:26,054-Speed 6290.46 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 39 Global Step: 818910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:29,300-Speed 6310.10 
samples/sec Loss 2.5872 LearningRate 0.0000 Epoch: 39 Global Step: 818920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:32,550-Speed 6302.20 samples/sec Loss 2.4862 LearningRate 0.0000 Epoch: 39 Global Step: 818930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:35,811-Speed 6282.36 samples/sec Loss 2.5600 LearningRate 0.0000 Epoch: 39 Global Step: 818940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:39,065-Speed 6295.59 samples/sec Loss 2.5338 LearningRate 0.0000 Epoch: 39 Global Step: 818950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:42,313-Speed 6306.02 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 39 Global Step: 818960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:45,571-Speed 6287.90 samples/sec Loss 2.5179 LearningRate 0.0000 Epoch: 39 Global Step: 818970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:48,860-Speed 6228.11 samples/sec Loss 2.5471 LearningRate 0.0000 Epoch: 39 Global Step: 818980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:52,095-Speed 6332.02 samples/sec Loss 2.5117 LearningRate 0.0000 Epoch: 39 Global Step: 818990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:55,348-Speed 6297.37 samples/sec Loss 2.4900 LearningRate 0.0000 Epoch: 39 Global Step: 819000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:15:58,623-Speed 6254.22 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 39 Global Step: 819010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:01,877-Speed 6296.00 samples/sec Loss 2.4728 LearningRate 0.0000 Epoch: 39 Global Step: 819020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:05,142-Speed 6272.91 samples/sec Loss 2.5164 LearningRate 0.0000 Epoch: 39 Global Step: 819030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:08,392-Speed 6304.73 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 
Global Step: 819040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:11,683-Speed 6222.66 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 39 Global Step: 819050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:14,942-Speed 6285.74 samples/sec Loss 2.5111 LearningRate 0.0000 Epoch: 39 Global Step: 819060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:18,191-Speed 6305.82 samples/sec Loss 2.4946 LearningRate 0.0000 Epoch: 39 Global Step: 819070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:21,449-Speed 6288.59 samples/sec Loss 2.5316 LearningRate 0.0000 Epoch: 39 Global Step: 819080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:24,685-Speed 6330.19 samples/sec Loss 2.5213 LearningRate 0.0000 Epoch: 39 Global Step: 819090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:27,935-Speed 6303.41 samples/sec Loss 2.5088 LearningRate 0.0000 Epoch: 39 Global Step: 819100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:31,181-Speed 6310.29 samples/sec Loss 2.5511 LearningRate 0.0000 Epoch: 39 Global Step: 819110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:34,435-Speed 6295.67 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 39 Global Step: 819120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:37,699-Speed 6275.04 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 39 Global Step: 819130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:40,944-Speed 6312.27 samples/sec Loss 2.5443 LearningRate 0.0000 Epoch: 39 Global Step: 819140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:44,213-Speed 6266.46 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 39 Global Step: 819150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:47,472-Speed 6286.37 samples/sec Loss 2.5577 LearningRate 0.0000 Epoch: 39 Global Step: 819160 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:16:50,715-Speed 6315.86 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 39 Global Step: 819170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:53,966-Speed 6300.35 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 819180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:16:57,246-Speed 6246.30 samples/sec Loss 2.5059 LearningRate 0.0000 Epoch: 39 Global Step: 819190 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:17:00,487-Speed 6319.97 samples/sec Loss 2.5004 LearningRate 0.0000 Epoch: 39 Global Step: 819200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:03,738-Speed 6301.87 samples/sec Loss 2.5342 LearningRate 0.0000 Epoch: 39 Global Step: 819210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:06,986-Speed 6306.87 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 819220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:10,243-Speed 6288.24 samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 39 Global Step: 819230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:13,493-Speed 6303.81 samples/sec Loss 2.5581 LearningRate 0.0000 Epoch: 39 Global Step: 819240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:16,747-Speed 6295.64 samples/sec Loss 2.5293 LearningRate 0.0000 Epoch: 39 Global Step: 819250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:19,998-Speed 6301.08 samples/sec Loss 2.5214 LearningRate 0.0000 Epoch: 39 Global Step: 819260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:23,258-Speed 6283.88 samples/sec Loss 2.5061 LearningRate 0.0000 Epoch: 39 Global Step: 819270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:26,507-Speed 6305.25 samples/sec Loss 2.5007 LearningRate 0.0000 Epoch: 39 Global Step: 819280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:29,750-Speed 6315.73 
samples/sec Loss 2.5031 LearningRate 0.0000 Epoch: 39 Global Step: 819290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:32,997-Speed 6309.26 samples/sec Loss 2.5654 LearningRate 0.0000 Epoch: 39 Global Step: 819300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:36,245-Speed 6306.18 samples/sec Loss 2.5449 LearningRate 0.0000 Epoch: 39 Global Step: 819310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:39,496-Speed 6302.44 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 39 Global Step: 819320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:42,746-Speed 6301.24 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 39 Global Step: 819330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:45,995-Speed 6306.22 samples/sec Loss 2.5006 LearningRate 0.0000 Epoch: 39 Global Step: 819340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:49,246-Speed 6300.35 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 819350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:52,493-Speed 6307.86 samples/sec Loss 2.4659 LearningRate 0.0000 Epoch: 39 Global Step: 819360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:55,742-Speed 6306.40 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 39 Global Step: 819370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:17:58,996-Speed 6294.55 samples/sec Loss 2.5397 LearningRate 0.0000 Epoch: 39 Global Step: 819380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:02,260-Speed 6275.41 samples/sec Loss 2.4955 LearningRate 0.0000 Epoch: 39 Global Step: 819390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:05,511-Speed 6301.62 samples/sec Loss 2.5524 LearningRate 0.0000 Epoch: 39 Global Step: 819400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:08,764-Speed 6297.04 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 39 
Global Step: 819410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:12,010-Speed 6310.88 samples/sec Loss 2.5358 LearningRate 0.0000 Epoch: 39 Global Step: 819420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:15,261-Speed 6300.90 samples/sec Loss 2.4753 LearningRate 0.0000 Epoch: 39 Global Step: 819430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:18,509-Speed 6306.99 samples/sec Loss 2.4555 LearningRate 0.0000 Epoch: 39 Global Step: 819440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:21,759-Speed 6301.76 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 819450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:25,029-Speed 6265.56 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 819460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:28,289-Speed 6284.54 samples/sec Loss 2.5509 LearningRate 0.0000 Epoch: 39 Global Step: 819470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:31,546-Speed 6289.05 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 819480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:34,793-Speed 6308.23 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 39 Global Step: 819490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:38,035-Speed 6318.41 samples/sec Loss 2.5281 LearningRate 0.0000 Epoch: 39 Global Step: 819500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:41,280-Speed 6313.75 samples/sec Loss 2.5219 LearningRate 0.0000 Epoch: 39 Global Step: 819510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:44,536-Speed 6290.76 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 39 Global Step: 819520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:47,786-Speed 6302.41 samples/sec Loss 2.4903 LearningRate 0.0000 Epoch: 39 Global Step: 819530 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:18:51,035-Speed 6306.47 samples/sec Loss 2.4956 LearningRate 0.0000 Epoch: 39 Global Step: 819540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:54,285-Speed 6301.34 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 39 Global Step: 819550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:18:57,550-Speed 6274.30 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 39 Global Step: 819560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:00,796-Speed 6311.76 samples/sec Loss 2.5079 LearningRate 0.0000 Epoch: 39 Global Step: 819570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:04,049-Speed 6296.06 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 39 Global Step: 819580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:07,291-Speed 6318.91 samples/sec Loss 2.4913 LearningRate 0.0000 Epoch: 39 Global Step: 819590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:10,527-Speed 6330.35 samples/sec Loss 2.5785 LearningRate 0.0000 Epoch: 39 Global Step: 819600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:13,776-Speed 6305.36 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 819610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:17,026-Speed 6302.41 samples/sec Loss 2.5269 LearningRate 0.0000 Epoch: 39 Global Step: 819620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:20,279-Speed 6296.56 samples/sec Loss 2.5318 LearningRate 0.0000 Epoch: 39 Global Step: 819630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:23,551-Speed 6260.72 samples/sec Loss 2.5095 LearningRate 0.0000 Epoch: 39 Global Step: 819640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:26,800-Speed 6305.46 samples/sec Loss 2.5273 LearningRate 0.0000 Epoch: 39 Global Step: 819650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:30,045-Speed 6311.92 
samples/sec Loss 2.5167 LearningRate 0.0000 Epoch: 39 Global Step: 819660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:33,298-Speed 6296.50 samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 39 Global Step: 819670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:36,555-Speed 6289.66 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 39 Global Step: 819680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:39,809-Speed 6295.87 samples/sec Loss 2.5296 LearningRate 0.0000 Epoch: 39 Global Step: 819690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:43,058-Speed 6305.55 samples/sec Loss 2.5075 LearningRate 0.0000 Epoch: 39 Global Step: 819700 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:19:46,295-Speed 6328.54 samples/sec Loss 2.5146 LearningRate 0.0000 Epoch: 39 Global Step: 819710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:49,561-Speed 6271.36 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 39 Global Step: 819720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:52,817-Speed 6291.92 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 39 Global Step: 819730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:56,070-Speed 6297.49 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 819740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:19:59,319-Speed 6304.09 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 39 Global Step: 819750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:02,566-Speed 6308.93 samples/sec Loss 2.5380 LearningRate 0.0000 Epoch: 39 Global Step: 819760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:05,819-Speed 6296.34 samples/sec Loss 2.5320 LearningRate 0.0000 Epoch: 39 Global Step: 819770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:09,069-Speed 6303.61 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 
Global Step: 819780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:12,323-Speed 6295.67 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 819790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:15,576-Speed 6296.22 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 819800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:18,819-Speed 6318.10 samples/sec Loss 2.5704 LearningRate 0.0000 Epoch: 39 Global Step: 819810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:22,072-Speed 6295.14 samples/sec Loss 2.5161 LearningRate 0.0000 Epoch: 39 Global Step: 819820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:25,330-Speed 6288.77 samples/sec Loss 2.5857 LearningRate 0.0000 Epoch: 39 Global Step: 819830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:28,577-Speed 6307.52 samples/sec Loss 2.5045 LearningRate 0.0000 Epoch: 39 Global Step: 819840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:31,825-Speed 6306.78 samples/sec Loss 2.4961 LearningRate 0.0000 Epoch: 39 Global Step: 819850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:35,113-Speed 6231.33 samples/sec Loss 2.4814 LearningRate 0.0000 Epoch: 39 Global Step: 819860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:38,413-Speed 6206.81 samples/sec Loss 2.6114 LearningRate 0.0000 Epoch: 39 Global Step: 819870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:41,664-Speed 6302.37 samples/sec Loss 2.5113 LearningRate 0.0000 Epoch: 39 Global Step: 819880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:44,911-Speed 6308.06 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 39 Global Step: 819890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:48,159-Speed 6307.17 samples/sec Loss 2.5364 LearningRate 0.0000 Epoch: 39 Global Step: 819900 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:20:51,393-Speed 6334.05 samples/sec Loss 2.5724 LearningRate 0.0000 Epoch: 39 Global Step: 819910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:54,645-Speed 6298.79 samples/sec Loss 2.5208 LearningRate 0.0000 Epoch: 39 Global Step: 819920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:20:57,891-Speed 6311.91 samples/sec Loss 2.5061 LearningRate 0.0000 Epoch: 39 Global Step: 819930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:01,140-Speed 6303.97 samples/sec Loss 2.5632 LearningRate 0.0000 Epoch: 39 Global Step: 819940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:04,392-Speed 6298.81 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 39 Global Step: 819950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:07,638-Speed 6311.25 samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 39 Global Step: 819960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:10,889-Speed 6301.08 samples/sec Loss 2.5029 LearningRate 0.0000 Epoch: 39 Global Step: 819970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:14,138-Speed 6304.28 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 39 Global Step: 819980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:17,392-Speed 6294.68 samples/sec Loss 2.5761 LearningRate 0.0000 Epoch: 39 Global Step: 819990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:20,636-Speed 6315.75 samples/sec Loss 2.5595 LearningRate 0.0000 Epoch: 39 Global Step: 820000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:23,873-Speed 6327.93 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 39 Global Step: 820010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:27,125-Speed 6299.36 samples/sec Loss 2.5625 LearningRate 0.0000 Epoch: 39 Global Step: 820020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:30,382-Speed 6288.66 
samples/sec Loss 2.5223 LearningRate 0.0000 Epoch: 39 Global Step: 820030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:33,645-Speed 6277.23 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 820040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:36,902-Speed 6289.31 samples/sec Loss 2.5663 LearningRate 0.0000 Epoch: 39 Global Step: 820050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:40,154-Speed 6300.07 samples/sec Loss 2.5001 LearningRate 0.0000 Epoch: 39 Global Step: 820060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:43,402-Speed 6306.04 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 39 Global Step: 820070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:46,654-Speed 6298.46 samples/sec Loss 2.4932 LearningRate 0.0000 Epoch: 39 Global Step: 820080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:49,909-Speed 6295.85 samples/sec Loss 2.5177 LearningRate 0.0000 Epoch: 39 Global Step: 820090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:53,159-Speed 6302.46 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 39 Global Step: 820100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:56,394-Speed 6333.22 samples/sec Loss 2.5084 LearningRate 0.0000 Epoch: 39 Global Step: 820110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:21:59,642-Speed 6307.20 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 39 Global Step: 820120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:02,900-Speed 6286.50 samples/sec Loss 2.5204 LearningRate 0.0000 Epoch: 39 Global Step: 820130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:06,152-Speed 6300.09 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 39 Global Step: 820140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:09,405-Speed 6296.03 samples/sec Loss 2.5176 LearningRate 0.0000 Epoch: 39 
Global Step: 820150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:12,650-Speed 6312.90 samples/sec Loss 2.6014 LearningRate 0.0000 Epoch: 39 Global Step: 820160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:15,909-Speed 6286.05 samples/sec Loss 2.5100 LearningRate 0.0000 Epoch: 39 Global Step: 820170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:19,159-Speed 6302.20 samples/sec Loss 2.4735 LearningRate 0.0000 Epoch: 39 Global Step: 820180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:22,403-Speed 6315.14 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 820190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:25,651-Speed 6306.95 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 39 Global Step: 820200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:28,900-Speed 6304.34 samples/sec Loss 2.4801 LearningRate 0.0000 Epoch: 39 Global Step: 820210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:22:32,134-Speed 6334.46 samples/sec Loss 2.5728 LearningRate 0.0000 Epoch: 39 Global Step: 820220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:35,386-Speed 6299.39 samples/sec Loss 2.5611 LearningRate 0.0000 Epoch: 39 Global Step: 820230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:38,635-Speed 6304.68 samples/sec Loss 2.5085 LearningRate 0.0000 Epoch: 39 Global Step: 820240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:41,888-Speed 6296.25 samples/sec Loss 2.5173 LearningRate 0.0000 Epoch: 39 Global Step: 820250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:45,137-Speed 6304.55 samples/sec Loss 2.5883 LearningRate 0.0000 Epoch: 39 Global Step: 820260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:48,386-Speed 6305.26 samples/sec Loss 2.5256 LearningRate 0.0000 Epoch: 39 Global Step: 820270 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:22:51,634-Speed 6306.66 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 39 Global Step: 820280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:54,884-Speed 6303.47 samples/sec Loss 2.5567 LearningRate 0.0000 Epoch: 39 Global Step: 820290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:22:58,147-Speed 6279.34 samples/sec Loss 2.5436 LearningRate 0.0000 Epoch: 39 Global Step: 820300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:01,419-Speed 6260.35 samples/sec Loss 2.4960 LearningRate 0.0000 Epoch: 39 Global Step: 820310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:04,653-Speed 6333.57 samples/sec Loss 2.4736 LearningRate 0.0000 Epoch: 39 Global Step: 820320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:07,905-Speed 6299.62 samples/sec Loss 2.4737 LearningRate 0.0000 Epoch: 39 Global Step: 820330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:11,162-Speed 6289.12 samples/sec Loss 2.4800 LearningRate 0.0000 Epoch: 39 Global Step: 820340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:14,418-Speed 6291.88 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 39 Global Step: 820350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:17,669-Speed 6300.73 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 39 Global Step: 820360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:20,926-Speed 6288.70 samples/sec Loss 2.4964 LearningRate 0.0000 Epoch: 39 Global Step: 820370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:24,179-Speed 6297.42 samples/sec Loss 2.5561 LearningRate 0.0000 Epoch: 39 Global Step: 820380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:27,424-Speed 6312.90 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 39 Global Step: 820390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:30,673-Speed 6304.84 
samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 Global Step: 820400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:33,927-Speed 6295.33 samples/sec Loss 2.5929 LearningRate 0.0000 Epoch: 39 Global Step: 820410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:37,167-Speed 6322.19 samples/sec Loss 2.5380 LearningRate 0.0000 Epoch: 39 Global Step: 820420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:40,417-Speed 6302.89 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 820430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:43,668-Speed 6300.32 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 39 Global Step: 820440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:46,921-Speed 6296.81 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 39 Global Step: 820450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:50,174-Speed 6297.31 samples/sec Loss 2.5229 LearningRate 0.0000 Epoch: 39 Global Step: 820460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:53,434-Speed 6283.40 samples/sec Loss 2.5645 LearningRate 0.0000 Epoch: 39 Global Step: 820470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:56,686-Speed 6300.27 samples/sec Loss 2.4969 LearningRate 0.0000 Epoch: 39 Global Step: 820480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:23:59,933-Speed 6309.00 samples/sec Loss 2.5017 LearningRate 0.0000 Epoch: 39 Global Step: 820490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:03,186-Speed 6297.56 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 39 Global Step: 820500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:06,434-Speed 6306.55 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 39 Global Step: 820510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:09,678-Speed 6314.01 samples/sec Loss 2.4917 LearningRate 0.0000 Epoch: 39 
Global Step: 820520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:24:12,923-Speed 6312.32 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 39 Global Step: 820530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:16,199-Speed 6252.86 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 39 Global Step: 820540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:19,496-Speed 6213.90 samples/sec Loss 2.4851 LearningRate 0.0000 Epoch: 39 Global Step: 820550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:22,741-Speed 6311.83 samples/sec Loss 2.4959 LearningRate 0.0000 Epoch: 39 Global Step: 820560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:26,030-Speed 6228.97 samples/sec Loss 2.4802 LearningRate 0.0000 Epoch: 39 Global Step: 820570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:29,279-Speed 6305.75 samples/sec Loss 2.5020 LearningRate 0.0000 Epoch: 39 Global Step: 820580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:32,536-Speed 6289.14 samples/sec Loss 2.5804 LearningRate 0.0000 Epoch: 39 Global Step: 820590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:35,788-Speed 6299.44 samples/sec Loss 2.4825 LearningRate 0.0000 Epoch: 39 Global Step: 820600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:39,040-Speed 6297.20 samples/sec Loss 2.4808 LearningRate 0.0000 Epoch: 39 Global Step: 820610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:42,292-Speed 6299.98 samples/sec Loss 2.5317 LearningRate 0.0000 Epoch: 39 Global Step: 820620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:45,527-Speed 6332.42 samples/sec Loss 2.4888 LearningRate 0.0000 Epoch: 39 Global Step: 820630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:48,781-Speed 6293.96 samples/sec Loss 2.5003 LearningRate 0.0000 Epoch: 39 Global Step: 820640 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:24:52,031-Speed 6303.23 samples/sec Loss 2.5044 LearningRate 0.0000 Epoch: 39 Global Step: 820650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:55,281-Speed 6303.79 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 39 Global Step: 820660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:24:58,529-Speed 6306.37 samples/sec Loss 2.5601 LearningRate 0.0000 Epoch: 39 Global Step: 820670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:01,786-Speed 6289.30 samples/sec Loss 2.4971 LearningRate 0.0000 Epoch: 39 Global Step: 820680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:05,038-Speed 6300.36 samples/sec Loss 2.5072 LearningRate 0.0000 Epoch: 39 Global Step: 820690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:08,289-Speed 6300.34 samples/sec Loss 2.5337 LearningRate 0.0000 Epoch: 39 Global Step: 820700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:11,553-Speed 6276.46 samples/sec Loss 2.5452 LearningRate 0.0000 Epoch: 39 Global Step: 820710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:14,803-Speed 6302.74 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 39 Global Step: 820720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:18,045-Speed 6318.01 samples/sec Loss 2.5356 LearningRate 0.0000 Epoch: 39 Global Step: 820730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:21,298-Speed 6297.69 samples/sec Loss 2.5647 LearningRate 0.0000 Epoch: 39 Global Step: 820740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:24,554-Speed 6290.81 samples/sec Loss 2.4995 LearningRate 0.0000 Epoch: 39 Global Step: 820750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:27,813-Speed 6286.06 samples/sec Loss 2.5622 LearningRate 0.0000 Epoch: 39 Global Step: 820760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:31,064-Speed 6300.37 
samples/sec Loss 2.5257 LearningRate 0.0000 Epoch: 39 Global Step: 820770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:34,313-Speed 6305.63 samples/sec Loss 2.4559 LearningRate 0.0000 Epoch: 39 Global Step: 820780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:37,564-Speed 6300.81 samples/sec Loss 2.5612 LearningRate 0.0000 Epoch: 39 Global Step: 820790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:40,820-Speed 6290.59 samples/sec Loss 2.5033 LearningRate 0.0000 Epoch: 39 Global Step: 820800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:44,070-Speed 6304.41 samples/sec Loss 2.4953 LearningRate 0.0000 Epoch: 39 Global Step: 820810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:47,325-Speed 6292.99 samples/sec Loss 2.4401 LearningRate 0.0000 Epoch: 39 Global Step: 820820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:50,557-Speed 6337.62 samples/sec Loss 2.4897 LearningRate 0.0000 Epoch: 39 Global Step: 820830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:53,806-Speed 6304.45 samples/sec Loss 2.5030 LearningRate 0.0000 Epoch: 39 Global Step: 820840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:25:57,058-Speed 6299.91 samples/sec Loss 2.5091 LearningRate 0.0000 Epoch: 39 Global Step: 820850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:00,305-Speed 6308.52 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 39 Global Step: 820860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:03,572-Speed 6269.95 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 39 Global Step: 820870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:06,822-Speed 6303.70 samples/sec Loss 2.4965 LearningRate 0.0000 Epoch: 39 Global Step: 820880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:10,068-Speed 6310.15 samples/sec Loss 2.5086 LearningRate 0.0000 Epoch: 39 
Global Step: 820890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:13,315-Speed 6310.33 samples/sec Loss 2.5269 LearningRate 0.0000 Epoch: 39 Global Step: 820900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:16,577-Speed 6280.44 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 39 Global Step: 820910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:19,829-Speed 6300.37 samples/sec Loss 2.4798 LearningRate 0.0000 Epoch: 39 Global Step: 820920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:23,062-Speed 6334.99 samples/sec Loss 2.5333 LearningRate 0.0000 Epoch: 39 Global Step: 820930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:26,317-Speed 6294.53 samples/sec Loss 2.4903 LearningRate 0.0000 Epoch: 39 Global Step: 820940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:29,576-Speed 6284.53 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 39 Global Step: 820950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:32,830-Speed 6294.65 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 820960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:36,079-Speed 6306.18 samples/sec Loss 2.5816 LearningRate 0.0000 Epoch: 39 Global Step: 820970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:39,329-Speed 6303.14 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 39 Global Step: 820980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:42,576-Speed 6307.69 samples/sec Loss 2.5639 LearningRate 0.0000 Epoch: 39 Global Step: 820990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:45,832-Speed 6291.69 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 39 Global Step: 821000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:49,079-Speed 6308.02 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 39 Global Step: 821010 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:26:52,337-Speed 6288.60 samples/sec Loss 2.4980 LearningRate 0.0000 Epoch: 39 Global Step: 821020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:55,572-Speed 6331.65 samples/sec Loss 2.5260 LearningRate 0.0000 Epoch: 39 Global Step: 821030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:26:58,823-Speed 6300.65 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 39 Global Step: 821040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:02,074-Speed 6301.70 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 821050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:05,321-Speed 6308.11 samples/sec Loss 2.5128 LearningRate 0.0000 Epoch: 39 Global Step: 821060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:08,567-Speed 6312.40 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 39 Global Step: 821070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:11,810-Speed 6316.41 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 39 Global Step: 821080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:15,061-Speed 6300.89 samples/sec Loss 2.5123 LearningRate 0.0000 Epoch: 39 Global Step: 821090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:18,307-Speed 6311.20 samples/sec Loss 2.5082 LearningRate 0.0000 Epoch: 39 Global Step: 821100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:21,554-Speed 6308.53 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 39 Global Step: 821110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:24,804-Speed 6303.01 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 39 Global Step: 821120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:28,053-Speed 6303.46 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 39 Global Step: 821130 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:27:31,287-Speed 6334.87 
samples/sec Loss 2.5300 LearningRate 0.0000 Epoch: 39 Global Step: 821140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:34,534-Speed 6308.82 samples/sec Loss 2.5228 LearningRate 0.0000 Epoch: 39 Global Step: 821150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:37,784-Speed 6303.77 samples/sec Loss 2.5493 LearningRate 0.0000 Epoch: 39 Global Step: 821160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:41,034-Speed 6301.03 samples/sec Loss 2.4880 LearningRate 0.0000 Epoch: 39 Global Step: 821170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:44,281-Speed 6310.79 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 821180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:47,534-Speed 6295.32 samples/sec Loss 2.5215 LearningRate 0.0000 Epoch: 39 Global Step: 821190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:50,788-Speed 6295.41 samples/sec Loss 2.5675 LearningRate 0.0000 Epoch: 39 Global Step: 821200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:54,041-Speed 6297.96 samples/sec Loss 2.4736 LearningRate 0.0000 Epoch: 39 Global Step: 821210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:27:57,290-Speed 6305.18 samples/sec Loss 2.5621 LearningRate 0.0000 Epoch: 39 Global Step: 821220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:00,543-Speed 6296.62 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 39 Global Step: 821230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:03,784-Speed 6321.36 samples/sec Loss 2.5082 LearningRate 0.0000 Epoch: 39 Global Step: 821240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:07,034-Speed 6301.56 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 39 Global Step: 821250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:10,286-Speed 6299.61 samples/sec Loss 2.5272 LearningRate 0.0000 Epoch: 39 
Global Step: 821260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:13,532-Speed 6309.58 samples/sec Loss 2.4925 LearningRate 0.0000 Epoch: 39 Global Step: 821270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:16,783-Speed 6301.45 samples/sec Loss 2.4975 LearningRate 0.0000 Epoch: 39 Global Step: 821280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:20,035-Speed 6301.24 samples/sec Loss 2.5189 LearningRate 0.0000 Epoch: 39 Global Step: 821290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:23,299-Speed 6274.88 samples/sec Loss 2.5780 LearningRate 0.0000 Epoch: 39 Global Step: 821300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:26,553-Speed 6296.02 samples/sec Loss 2.5028 LearningRate 0.0000 Epoch: 39 Global Step: 821310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:29,809-Speed 6290.42 samples/sec Loss 2.5054 LearningRate 0.0000 Epoch: 39 Global Step: 821320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:33,070-Speed 6282.28 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 39 Global Step: 821330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:36,304-Speed 6333.23 samples/sec Loss 2.4769 LearningRate 0.0000 Epoch: 39 Global Step: 821340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:39,620-Speed 6178.37 samples/sec Loss 2.5129 LearningRate 0.0000 Epoch: 39 Global Step: 821350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:42,880-Speed 6283.24 samples/sec Loss 2.5012 LearningRate 0.0000 Epoch: 39 Global Step: 821360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:46,135-Speed 6292.81 samples/sec Loss 2.5026 LearningRate 0.0000 Epoch: 39 Global Step: 821370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:49,383-Speed 6307.19 samples/sec Loss 2.4879 LearningRate 0.0000 Epoch: 39 Global Step: 821380 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:28:52,634-Speed 6300.72 samples/sec Loss 2.5600 LearningRate 0.0000 Epoch: 39 Global Step: 821390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:55,882-Speed 6307.47 samples/sec Loss 2.5230 LearningRate 0.0000 Epoch: 39 Global Step: 821400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:28:59,130-Speed 6307.11 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 39 Global Step: 821410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:02,386-Speed 6294.04 samples/sec Loss 2.5609 LearningRate 0.0000 Epoch: 39 Global Step: 821420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:05,638-Speed 6299.56 samples/sec Loss 2.5465 LearningRate 0.0000 Epoch: 39 Global Step: 821430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:08,877-Speed 6323.15 samples/sec Loss 2.5967 LearningRate 0.0000 Epoch: 39 Global Step: 821440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:12,126-Speed 6304.47 samples/sec Loss 2.5098 LearningRate 0.0000 Epoch: 39 Global Step: 821450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:15,381-Speed 6294.13 samples/sec Loss 2.6003 LearningRate 0.0000 Epoch: 39 Global Step: 821460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:18,635-Speed 6294.34 samples/sec Loss 2.5120 LearningRate 0.0000 Epoch: 39 Global Step: 821470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:21,887-Speed 6300.06 samples/sec Loss 2.4919 LearningRate 0.0000 Epoch: 39 Global Step: 821480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:25,133-Speed 6311.69 samples/sec Loss 2.5776 LearningRate 0.0000 Epoch: 39 Global Step: 821490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:28,381-Speed 6307.15 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 39 Global Step: 821500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:31,632-Speed 6300.71 
samples/sec Loss 2.5080 LearningRate 0.0000 Epoch: 39 Global Step: 821510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:34,877-Speed 6312.70 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 39 Global Step: 821520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:38,145-Speed 6267.89 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 39 Global Step: 821530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:41,402-Speed 6289.86 samples/sec Loss 2.5447 LearningRate 0.0000 Epoch: 39 Global Step: 821540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:29:44,638-Speed 6329.46 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 39 Global Step: 821550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:47,891-Speed 6298.23 samples/sec Loss 2.4688 LearningRate 0.0000 Epoch: 39 Global Step: 821560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:51,140-Speed 6305.01 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 821570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:54,393-Speed 6296.62 samples/sec Loss 2.5422 LearningRate 0.0000 Epoch: 39 Global Step: 821580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:29:57,646-Speed 6296.21 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 39 Global Step: 821590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:00,908-Speed 6280.90 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 39 Global Step: 821600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:04,162-Speed 6294.25 samples/sec Loss 2.5119 LearningRate 0.0000 Epoch: 39 Global Step: 821610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:07,422-Speed 6283.88 samples/sec Loss 2.5607 LearningRate 0.0000 Epoch: 39 Global Step: 821620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:10,681-Speed 6286.50 samples/sec Loss 2.5417 LearningRate 0.0000 Epoch: 39 
Global Step: 821630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:13,929-Speed 6305.26 samples/sec Loss 2.5181 LearningRate 0.0000 Epoch: 39 Global Step: 821640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:17,167-Speed 6327.32 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 39 Global Step: 821650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:20,415-Speed 6307.24 samples/sec Loss 2.4761 LearningRate 0.0000 Epoch: 39 Global Step: 821660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:23,666-Speed 6300.01 samples/sec Loss 2.5129 LearningRate 0.0000 Epoch: 39 Global Step: 821670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:26,948-Speed 6241.31 samples/sec Loss 2.5183 LearningRate 0.0000 Epoch: 39 Global Step: 821680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:30,202-Speed 6298.05 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 821690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:33,454-Speed 6299.59 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 39 Global Step: 821700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:36,701-Speed 6308.84 samples/sec Loss 2.5276 LearningRate 0.0000 Epoch: 39 Global Step: 821710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:39,960-Speed 6285.54 samples/sec Loss 2.5545 LearningRate 0.0000 Epoch: 39 Global Step: 821720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:43,207-Speed 6308.61 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 821730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:46,454-Speed 6309.90 samples/sec Loss 2.5333 LearningRate 0.0000 Epoch: 39 Global Step: 821740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:49,692-Speed 6326.26 samples/sec Loss 2.5278 LearningRate 0.0000 Epoch: 39 Global Step: 821750 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:30:52,947-Speed 6292.35 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 39 Global Step: 821760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:56,202-Speed 6294.12 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 39 Global Step: 821770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:30:59,458-Speed 6290.37 samples/sec Loss 2.5930 LearningRate 0.0000 Epoch: 39 Global Step: 821780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:02,710-Speed 6301.74 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 39 Global Step: 821790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:05,962-Speed 6300.39 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 821800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:09,212-Speed 6303.90 samples/sec Loss 2.5101 LearningRate 0.0000 Epoch: 39 Global Step: 821810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:12,463-Speed 6299.67 samples/sec Loss 2.5136 LearningRate 0.0000 Epoch: 39 Global Step: 821820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:15,723-Speed 6284.10 samples/sec Loss 2.5341 LearningRate 0.0000 Epoch: 39 Global Step: 821830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:18,967-Speed 6313.48 samples/sec Loss 2.5050 LearningRate 0.0000 Epoch: 39 Global Step: 821840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:22,204-Speed 6329.69 samples/sec Loss 2.5529 LearningRate 0.0000 Epoch: 39 Global Step: 821850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:25,453-Speed 6304.71 samples/sec Loss 2.5285 LearningRate 0.0000 Epoch: 39 Global Step: 821860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:28,704-Speed 6300.31 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 39 Global Step: 821870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:31,956-Speed 6299.93 
samples/sec Loss 2.4940 LearningRate 0.0000 Epoch: 39 Global Step: 821880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:35,214-Speed 6286.33 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 39 Global Step: 821890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:38,465-Speed 6302.82 samples/sec Loss 2.5441 LearningRate 0.0000 Epoch: 39 Global Step: 821900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:41,710-Speed 6312.52 samples/sec Loss 2.5463 LearningRate 0.0000 Epoch: 39 Global Step: 821910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:44,962-Speed 6298.70 samples/sec Loss 2.5312 LearningRate 0.0000 Epoch: 39 Global Step: 821920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:48,212-Speed 6303.01 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 821930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:51,462-Speed 6302.15 samples/sec Loss 2.4938 LearningRate 0.0000 Epoch: 39 Global Step: 821940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:31:54,705-Speed 6316.15 samples/sec Loss 2.5025 LearningRate 0.0000 Epoch: 39 Global Step: 821950 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:31:57,946-Speed 6321.37 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 821960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:01,200-Speed 6296.29 samples/sec Loss 2.5241 LearningRate 0.0000 Epoch: 39 Global Step: 821970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:04,446-Speed 6309.63 samples/sec Loss 2.5918 LearningRate 0.0000 Epoch: 39 Global Step: 821980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:07,703-Speed 6289.87 samples/sec Loss 2.5200 LearningRate 0.0000 Epoch: 39 Global Step: 821990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:10,972-Speed 6266.24 samples/sec Loss 2.5866 LearningRate 0.0000 Epoch: 39 
Global Step: 822000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:14,223-Speed 6300.23 samples/sec Loss 2.5798 LearningRate 0.0000 Epoch: 39 Global Step: 822010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:17,480-Speed 6289.50 samples/sec Loss 2.5064 LearningRate 0.0000 Epoch: 39 Global Step: 822020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:20,734-Speed 6294.82 samples/sec Loss 2.5121 LearningRate 0.0000 Epoch: 39 Global Step: 822030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:23,986-Speed 6298.80 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 822040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:27,240-Speed 6296.29 samples/sec Loss 2.5214 LearningRate 0.0000 Epoch: 39 Global Step: 822050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:30,476-Speed 6330.75 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 822060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:33,725-Speed 6303.98 samples/sec Loss 2.5019 LearningRate 0.0000 Epoch: 39 Global Step: 822070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:36,976-Speed 6302.36 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 39 Global Step: 822080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:40,221-Speed 6312.01 samples/sec Loss 2.5062 LearningRate 0.0000 Epoch: 39 Global Step: 822090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:43,476-Speed 6293.85 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 39 Global Step: 822100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:46,729-Speed 6297.53 samples/sec Loss 2.5040 LearningRate 0.0000 Epoch: 39 Global Step: 822110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:49,975-Speed 6310.10 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 39 Global Step: 822120 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:32:53,227-Speed 6299.54 samples/sec Loss 2.5344 LearningRate 0.0000 Epoch: 39 Global Step: 822130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:56,480-Speed 6296.97 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 39 Global Step: 822140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:32:59,729-Speed 6304.36 samples/sec Loss 2.5350 LearningRate 0.0000 Epoch: 39 Global Step: 822150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:02,969-Speed 6322.68 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 39 Global Step: 822160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:06,224-Speed 6293.09 samples/sec Loss 2.5160 LearningRate 0.0000 Epoch: 39 Global Step: 822170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:09,479-Speed 6294.10 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 39 Global Step: 822180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:12,726-Speed 6308.67 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 39 Global Step: 822190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:15,974-Speed 6307.08 samples/sec Loss 2.5581 LearningRate 0.0000 Epoch: 39 Global Step: 822200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:19,233-Speed 6285.38 samples/sec Loss 2.5365 LearningRate 0.0000 Epoch: 39 Global Step: 822210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:22,479-Speed 6310.26 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 39 Global Step: 822220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:25,734-Speed 6292.28 samples/sec Loss 2.4978 LearningRate 0.0000 Epoch: 39 Global Step: 822230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:29,006-Speed 6261.34 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 39 Global Step: 822240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:32,259-Speed 6297.70 
samples/sec Loss 2.5540 LearningRate 0.0000 Epoch: 39 Global Step: 822250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:35,493-Speed 6333.07 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 39 Global Step: 822260 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:38,743-Speed 6304.02 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 39 Global Step: 822270 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:41,990-Speed 6307.52 samples/sec Loss 2.5641 LearningRate 0.0000 Epoch: 39 Global Step: 822280 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:45,242-Speed 6299.82 samples/sec Loss 2.5128 LearningRate 0.0000 Epoch: 39 Global Step: 822290 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:48,497-Speed 6292.98 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 39 Global Step: 822300 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:51,747-Speed 6303.95 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 39 Global Step: 822310 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:55,000-Speed 6296.77 samples/sec Loss 2.5316 LearningRate 0.0000 Epoch: 39 Global Step: 822320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:33:58,250-Speed 6302.52 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 39 Global Step: 822330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:01,501-Speed 6302.76 samples/sec Loss 2.5306 LearningRate 0.0000 Epoch: 39 Global Step: 822340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:04,758-Speed 6288.34 samples/sec Loss 2.5032 LearningRate 0.0000 Epoch: 39 Global Step: 822350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:07,988-Speed 6342.55 samples/sec Loss 2.4783 LearningRate 0.0000 Epoch: 39 Global Step: 822360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:11,251-Speed 6278.58 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 39 
Global Step: 822370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:14,498-Speed 6310.47 samples/sec Loss 2.5022 LearningRate 0.0000 Epoch: 39 Global Step: 822380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:17,752-Speed 6295.17 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 822390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:21,004-Speed 6298.15 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 822400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:24,251-Speed 6309.53 samples/sec Loss 2.5556 LearningRate 0.0000 Epoch: 39 Global Step: 822410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:27,515-Speed 6275.65 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 39 Global Step: 822420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:30,779-Speed 6274.95 samples/sec Loss 2.4808 LearningRate 0.0000 Epoch: 39 Global Step: 822430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:34,030-Speed 6301.19 samples/sec Loss 2.5125 LearningRate 0.0000 Epoch: 39 Global Step: 822440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:37,293-Speed 6278.92 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 822450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:40,541-Speed 6305.39 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 Global Step: 822460 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:34:43,782-Speed 6322.03 samples/sec Loss 2.5631 LearningRate 0.0000 Epoch: 39 Global Step: 822470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:47,042-Speed 6283.02 samples/sec Loss 2.5473 LearningRate 0.0000 Epoch: 39 Global Step: 822480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:50,290-Speed 6308.07 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 822490 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:34:53,539-Speed 6304.26 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 822500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:34:56,791-Speed 6299.19 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 822510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:00,043-Speed 6299.22 samples/sec Loss 2.4856 LearningRate 0.0000 Epoch: 39 Global Step: 822520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:03,292-Speed 6304.19 samples/sec Loss 2.5049 LearningRate 0.0000 Epoch: 39 Global Step: 822530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:06,540-Speed 6306.31 samples/sec Loss 2.5457 LearningRate 0.0000 Epoch: 39 Global Step: 822540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:09,784-Speed 6315.18 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 822550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:13,030-Speed 6310.39 samples/sec Loss 2.4676 LearningRate 0.0000 Epoch: 39 Global Step: 822560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:16,260-Speed 6341.68 samples/sec Loss 2.4834 LearningRate 0.0000 Epoch: 39 Global Step: 822570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:19,506-Speed 6312.62 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 39 Global Step: 822580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:22,756-Speed 6301.53 samples/sec Loss 2.4644 LearningRate 0.0000 Epoch: 39 Global Step: 822590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:26,004-Speed 6306.56 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 39 Global Step: 822600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:29,252-Speed 6307.97 samples/sec Loss 2.5027 LearningRate 0.0000 Epoch: 39 Global Step: 822610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:32,507-Speed 6293.14 
samples/sec Loss 2.5059 LearningRate 0.0000 Epoch: 39 Global Step: 822620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:35,759-Speed 6298.15 samples/sec Loss 2.5237 LearningRate 0.0000 Epoch: 39 Global Step: 822630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:39,029-Speed 6264.84 samples/sec Loss 2.5795 LearningRate 0.0000 Epoch: 39 Global Step: 822640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:42,275-Speed 6310.13 samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 39 Global Step: 822650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:45,536-Speed 6281.39 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 39 Global Step: 822660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:48,784-Speed 6306.92 samples/sec Loss 2.4848 LearningRate 0.0000 Epoch: 39 Global Step: 822670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:52,030-Speed 6311.88 samples/sec Loss 2.5583 LearningRate 0.0000 Epoch: 39 Global Step: 822680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:55,284-Speed 6295.34 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 39 Global Step: 822690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:35:58,534-Speed 6302.68 samples/sec Loss 2.5042 LearningRate 0.0000 Epoch: 39 Global Step: 822700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:01,788-Speed 6296.61 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 39 Global Step: 822710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:05,055-Speed 6268.51 samples/sec Loss 2.5060 LearningRate 0.0000 Epoch: 39 Global Step: 822720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:08,303-Speed 6306.50 samples/sec Loss 2.5764 LearningRate 0.0000 Epoch: 39 Global Step: 822730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:11,552-Speed 6306.04 samples/sec Loss 2.5344 LearningRate 0.0000 Epoch: 39 
Global Step: 822740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:14,803-Speed 6301.51 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 39 Global Step: 822750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:18,059-Speed 6291.50 samples/sec Loss 2.5405 LearningRate 0.0000 Epoch: 39 Global Step: 822760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:21,322-Speed 6276.38 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 39 Global Step: 822770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:36:24,570-Speed 6306.98 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 39 Global Step: 822780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:27,824-Speed 6295.68 samples/sec Loss 2.5009 LearningRate 0.0000 Epoch: 39 Global Step: 822790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:31,101-Speed 6251.91 samples/sec Loss 2.5128 LearningRate 0.0000 Epoch: 39 Global Step: 822800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:34,359-Speed 6288.18 samples/sec Loss 2.5083 LearningRate 0.0000 Epoch: 39 Global Step: 822810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:37,617-Speed 6287.40 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 39 Global Step: 822820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:40,869-Speed 6299.54 samples/sec Loss 2.5835 LearningRate 0.0000 Epoch: 39 Global Step: 822830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:44,119-Speed 6300.92 samples/sec Loss 2.5180 LearningRate 0.0000 Epoch: 39 Global Step: 822840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:47,371-Speed 6299.87 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 39 Global Step: 822850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:50,623-Speed 6299.57 samples/sec Loss 2.5660 LearningRate 0.0000 Epoch: 39 Global Step: 822860 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:36:53,878-Speed 6293.29 samples/sec Loss 2.4966 LearningRate 0.0000 Epoch: 39 Global Step: 822870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:36:57,120-Speed 6318.52 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 822880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:00,378-Speed 6288.35 samples/sec Loss 2.5277 LearningRate 0.0000 Epoch: 39 Global Step: 822890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:03,626-Speed 6306.99 samples/sec Loss 2.4791 LearningRate 0.0000 Epoch: 39 Global Step: 822900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:06,877-Speed 6299.51 samples/sec Loss 2.5110 LearningRate 0.0000 Epoch: 39 Global Step: 822910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:10,139-Speed 6281.30 samples/sec Loss 2.5599 LearningRate 0.0000 Epoch: 39 Global Step: 822920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:13,390-Speed 6299.89 samples/sec Loss 2.5370 LearningRate 0.0000 Epoch: 39 Global Step: 822930 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:16,653-Speed 6278.43 samples/sec Loss 2.5479 LearningRate 0.0000 Epoch: 39 Global Step: 822940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:19,905-Speed 6299.10 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 822950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:23,157-Speed 6299.87 samples/sec Loss 2.5319 LearningRate 0.0000 Epoch: 39 Global Step: 822960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:26,406-Speed 6303.52 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 39 Global Step: 822970 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:29,641-Speed 6331.65 samples/sec Loss 2.5430 LearningRate 0.0000 Epoch: 39 Global Step: 822980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:32,889-Speed 6307.63 
samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 822990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:36,146-Speed 6290.07 samples/sec Loss 2.5653 LearningRate 0.0000 Epoch: 39 Global Step: 823000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:39,399-Speed 6295.66 samples/sec Loss 2.4887 LearningRate 0.0000 Epoch: 39 Global Step: 823010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:42,648-Speed 6306.21 samples/sec Loss 2.4963 LearningRate 0.0000 Epoch: 39 Global Step: 823020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:45,897-Speed 6303.85 samples/sec Loss 2.5539 LearningRate 0.0000 Epoch: 39 Global Step: 823030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:49,146-Speed 6306.21 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 823040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:52,475-Speed 6152.11 samples/sec Loss 2.5634 LearningRate 0.0000 Epoch: 39 Global Step: 823050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:55,776-Speed 6206.24 samples/sec Loss 2.4993 LearningRate 0.0000 Epoch: 39 Global Step: 823060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:37:59,025-Speed 6303.84 samples/sec Loss 2.5137 LearningRate 0.0000 Epoch: 39 Global Step: 823070 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:02,256-Speed 6340.26 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 823080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:05,506-Speed 6303.66 samples/sec Loss 2.5205 LearningRate 0.0000 Epoch: 39 Global Step: 823090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:08,758-Speed 6299.13 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 823100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:12,012-Speed 6297.40 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 39 
Global Step: 823110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:15,258-Speed 6312.26 samples/sec Loss 2.5026 LearningRate 0.0000 Epoch: 39 Global Step: 823120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:18,509-Speed 6300.08 samples/sec Loss 2.5240 LearningRate 0.0000 Epoch: 39 Global Step: 823130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:21,755-Speed 6311.79 samples/sec Loss 2.5059 LearningRate 0.0000 Epoch: 39 Global Step: 823140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:25,018-Speed 6277.49 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 39 Global Step: 823150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:28,269-Speed 6300.06 samples/sec Loss 2.5367 LearningRate 0.0000 Epoch: 39 Global Step: 823160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:31,514-Speed 6313.68 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 823170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:34,748-Speed 6333.09 samples/sec Loss 2.5447 LearningRate 0.0000 Epoch: 39 Global Step: 823180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:38,000-Speed 6300.05 samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 39 Global Step: 823190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:41,255-Speed 6292.65 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 39 Global Step: 823200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:44,507-Speed 6298.58 samples/sec Loss 2.5637 LearningRate 0.0000 Epoch: 39 Global Step: 823210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:38:47,757-Speed 6303.47 samples/sec Loss 2.5591 LearningRate 0.0000 Epoch: 39 Global Step: 823220 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:38:51,005-Speed 6307.70 samples/sec Loss 2.5233 LearningRate 0.0000 Epoch: 39 Global Step: 823230 Fp16 Grad Scale: 2048 Required: 1 
hours Training: 2022-04-03 18:38:54,254-Speed 6304.23 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 39 Global Step: 823240 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:38:57,502-Speed 6306.78 samples/sec Loss 2.4707 LearningRate 0.0000 Epoch: 39 Global Step: 823250 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:00,760-Speed 6287.45 samples/sec Loss 2.5560 LearningRate 0.0000 Epoch: 39 Global Step: 823260 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:04,008-Speed 6306.88 samples/sec Loss 2.5312 LearningRate 0.0000 Epoch: 39 Global Step: 823270 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:07,272-Speed 6274.86 samples/sec Loss 2.5065 LearningRate 0.0000 Epoch: 39 Global Step: 823280 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:10,535-Speed 6278.18 samples/sec Loss 2.5573 LearningRate 0.0000 Epoch: 39 Global Step: 823290 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:13,787-Speed 6300.12 samples/sec Loss 2.5412 LearningRate 0.0000 Epoch: 39 Global Step: 823300 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:17,052-Speed 6273.97 samples/sec Loss 2.5011 LearningRate 0.0000 Epoch: 39 Global Step: 823310 Fp16 Grad Scale: 2048 Required: 1 hours Training: 2022-04-03 18:39:20,307-Speed 6293.98 samples/sec Loss 2.5200 LearningRate 0.0000 Epoch: 39 Global Step: 823320 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:23,551-Speed 6314.44 samples/sec Loss 2.5206 LearningRate 0.0000 Epoch: 39 Global Step: 823330 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:26,800-Speed 6305.19 samples/sec Loss 2.4616 LearningRate 0.0000 Epoch: 39 Global Step: 823340 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:30,065-Speed 6274.76 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 39 Global Step: 823350 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:33,312-Speed 6308.73 
samples/sec Loss 2.5098 LearningRate 0.0000 Epoch: 39 Global Step: 823360 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:36,563-Speed 6299.37 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 39 Global Step: 823370 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:39,816-Speed 6297.75 samples/sec Loss 2.5040 LearningRate 0.0000 Epoch: 39 Global Step: 823380 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:43,066-Speed 6302.29 samples/sec Loss 2.5706 LearningRate 0.0000 Epoch: 39 Global Step: 823390 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:46,322-Speed 6290.86 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 39 Global Step: 823400 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:49,581-Speed 6285.86 samples/sec Loss 2.5403 LearningRate 0.0000 Epoch: 39 Global Step: 823410 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:52,813-Speed 6339.10 samples/sec Loss 2.5636 LearningRate 0.0000 Epoch: 39 Global Step: 823420 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:56,075-Speed 6279.24 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 39 Global Step: 823430 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:39:59,325-Speed 6303.20 samples/sec Loss 2.5185 LearningRate 0.0000 Epoch: 39 Global Step: 823440 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:02,579-Speed 6299.51 samples/sec Loss 2.4708 LearningRate 0.0000 Epoch: 39 Global Step: 823450 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:05,830-Speed 6300.95 samples/sec Loss 2.5393 LearningRate 0.0000 Epoch: 39 Global Step: 823460 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:09,074-Speed 6313.05 samples/sec Loss 2.4940 LearningRate 0.0000 Epoch: 39 Global Step: 823470 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:12,330-Speed 6295.33 samples/sec Loss 2.4896 LearningRate 0.0000 Epoch: 39 
Global Step: 823480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:15,581-Speed 6300.04 samples/sec Loss 2.5051 LearningRate 0.0000 Epoch: 39 Global Step: 823490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:18,835-Speed 6295.86 samples/sec Loss 2.5204 LearningRate 0.0000 Epoch: 39 Global Step: 823500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:22,086-Speed 6302.44 samples/sec Loss 2.4988 LearningRate 0.0000 Epoch: 39 Global Step: 823510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:25,349-Speed 6276.87 samples/sec Loss 2.5076 LearningRate 0.0000 Epoch: 39 Global Step: 823520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:40:28,641-Speed 6221.90 samples/sec Loss 2.5351 LearningRate 0.0000 Epoch: 39 Global Step: 823530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:31,925-Speed 6237.51 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 39 Global Step: 823540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:35,183-Speed 6288.72 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 39 Global Step: 823550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:38,434-Speed 6301.06 samples/sec Loss 2.5280 LearningRate 0.0000 Epoch: 39 Global Step: 823560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:41,685-Speed 6301.05 samples/sec Loss 2.5675 LearningRate 0.0000 Epoch: 39 Global Step: 823570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:44,947-Speed 6279.22 samples/sec Loss 2.5215 LearningRate 0.0000 Epoch: 39 Global Step: 823580 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:48,207-Speed 6283.47 samples/sec Loss 2.5494 LearningRate 0.0000 Epoch: 39 Global Step: 823590 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:51,461-Speed 6294.78 samples/sec Loss 2.4779 LearningRate 0.0000 Epoch: 39 Global Step: 823600 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:40:54,719-Speed 6288.14 samples/sec Loss 2.5315 LearningRate 0.0000 Epoch: 39 Global Step: 823610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:40:57,970-Speed 6301.04 samples/sec Loss 2.5067 LearningRate 0.0000 Epoch: 39 Global Step: 823620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:01,213-Speed 6316.57 samples/sec Loss 2.4852 LearningRate 0.0000 Epoch: 39 Global Step: 823630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:04,468-Speed 6292.44 samples/sec Loss 2.5479 LearningRate 0.0000 Epoch: 39 Global Step: 823640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:07,719-Speed 6301.64 samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 39 Global Step: 823650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:10,970-Speed 6300.17 samples/sec Loss 2.5281 LearningRate 0.0000 Epoch: 39 Global Step: 823660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:14,224-Speed 6295.63 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 39 Global Step: 823670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:17,502-Speed 6249.02 samples/sec Loss 2.4914 LearningRate 0.0000 Epoch: 39 Global Step: 823680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:20,759-Speed 6289.99 samples/sec Loss 2.5485 LearningRate 0.0000 Epoch: 39 Global Step: 823690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:24,008-Speed 6305.46 samples/sec Loss 2.5173 LearningRate 0.0000 Epoch: 39 Global Step: 823700 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:27,256-Speed 6305.74 samples/sec Loss 2.5261 LearningRate 0.0000 Epoch: 39 Global Step: 823710 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:30,512-Speed 6296.16 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 39 Global Step: 823720 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:33,783-Speed 6262.08 
samples/sec Loss 2.4952 LearningRate 0.0000 Epoch: 39 Global Step: 823730 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:37,054-Speed 6261.51 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 39 Global Step: 823740 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:40,308-Speed 6295.95 samples/sec Loss 2.5749 LearningRate 0.0000 Epoch: 39 Global Step: 823750 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:43,558-Speed 6303.22 samples/sec Loss 2.5151 LearningRate 0.0000 Epoch: 39 Global Step: 823760 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:46,811-Speed 6295.30 samples/sec Loss 2.5130 LearningRate 0.0000 Epoch: 39 Global Step: 823770 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:50,076-Speed 6274.85 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 39 Global Step: 823780 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:53,327-Speed 6301.38 samples/sec Loss 2.5789 LearningRate 0.0000 Epoch: 39 Global Step: 823790 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:56,576-Speed 6305.19 samples/sec Loss 2.5239 LearningRate 0.0000 Epoch: 39 Global Step: 823800 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:41:59,849-Speed 6257.51 samples/sec Loss 2.5509 LearningRate 0.0000 Epoch: 39 Global Step: 823810 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:03,111-Speed 6279.81 samples/sec Loss 2.5291 LearningRate 0.0000 Epoch: 39 Global Step: 823820 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:06,355-Speed 6315.03 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 39 Global Step: 823830 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:09,601-Speed 6310.10 samples/sec Loss 2.5255 LearningRate 0.0000 Epoch: 39 Global Step: 823840 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:12,851-Speed 6302.38 samples/sec Loss 2.5215 LearningRate 0.0000 Epoch: 39 
Global Step: 823850 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:16,103-Speed 6299.36 samples/sec Loss 2.5041 LearningRate 0.0000 Epoch: 39 Global Step: 823860 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:19,354-Speed 6301.52 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 39 Global Step: 823870 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:22,600-Speed 6311.04 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 39 Global Step: 823880 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:25,849-Speed 6305.42 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 39 Global Step: 823890 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:29,098-Speed 6304.84 samples/sec Loss 2.5367 LearningRate 0.0000 Epoch: 39 Global Step: 823900 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:32,346-Speed 6306.09 samples/sec Loss 2.5082 LearningRate 0.0000 Epoch: 39 Global Step: 823910 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:35,592-Speed 6312.27 samples/sec Loss 2.5337 LearningRate 0.0000 Epoch: 39 Global Step: 823920 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:38,837-Speed 6311.99 samples/sec Loss 2.5533 LearningRate 0.0000 Epoch: 39 Global Step: 823930 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-03 18:42:42,070-Speed 6335.21 samples/sec Loss 2.5967 LearningRate 0.0000 Epoch: 39 Global Step: 823940 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:45,318-Speed 6307.49 samples/sec Loss 2.4753 LearningRate 0.0000 Epoch: 39 Global Step: 823950 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:48,584-Speed 6272.43 samples/sec Loss 2.5484 LearningRate 0.0000 Epoch: 39 Global Step: 823960 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:51,830-Speed 6309.83 samples/sec Loss 2.5414 LearningRate 0.0000 Epoch: 39 Global Step: 823970 Fp16 Grad Scale: 4096 Required: 1 
hours Training: 2022-04-03 18:42:55,077-Speed 6309.33 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 39 Global Step: 823980 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:42:58,324-Speed 6309.44 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 39 Global Step: 823990 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:01,575-Speed 6301.04 samples/sec Loss 2.5641 LearningRate 0.0000 Epoch: 39 Global Step: 824000 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:04,826-Speed 6299.25 samples/sec Loss 2.5487 LearningRate 0.0000 Epoch: 39 Global Step: 824010 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:08,069-Speed 6316.67 samples/sec Loss 2.4803 LearningRate 0.0000 Epoch: 39 Global Step: 824020 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:11,319-Speed 6302.99 samples/sec Loss 2.5324 LearningRate 0.0000 Epoch: 39 Global Step: 824030 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:14,550-Speed 6339.98 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 39 Global Step: 824040 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:17,799-Speed 6305.60 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 824050 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:21,049-Speed 6302.58 samples/sec Loss 2.5717 LearningRate 0.0000 Epoch: 39 Global Step: 824060 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-03 18:43:24,299-Speed 6303.52 samples/sec Loss 2.4983 LearningRate 0.0000 Epoch: 39 Global Step: 824070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:27,559-Speed 6284.03 samples/sec Loss 2.5246 LearningRate 0.0000 Epoch: 39 Global Step: 824080 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:30,803-Speed 6313.84 samples/sec Loss 2.5477 LearningRate 0.0000 Epoch: 39 Global Step: 824090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:34,052-Speed 6304.83 
samples/sec Loss 2.5642 LearningRate 0.0000 Epoch: 39 Global Step: 824100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:37,295-Speed 6317.02 samples/sec Loss 2.4892 LearningRate 0.0000 Epoch: 39 Global Step: 824110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:40,545-Speed 6304.11 samples/sec Loss 2.5530 LearningRate 0.0000 Epoch: 39 Global Step: 824120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:43,797-Speed 6299.23 samples/sec Loss 2.5031 LearningRate 0.0000 Epoch: 39 Global Step: 824130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:47,037-Speed 6322.16 samples/sec Loss 2.5706 LearningRate 0.0000 Epoch: 39 Global Step: 824140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:50,282-Speed 6311.77 samples/sec Loss 2.4910 LearningRate 0.0000 Epoch: 39 Global Step: 824150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:53,529-Speed 6309.78 samples/sec Loss 2.5468 LearningRate 0.0000 Epoch: 39 Global Step: 824160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:43:56,774-Speed 6313.00 samples/sec Loss 2.5579 LearningRate 0.0000 Epoch: 39 Global Step: 824170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:00,017-Speed 6318.50 samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 39 Global Step: 824180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:03,269-Speed 6298.78 samples/sec Loss 2.5634 LearningRate 0.0000 Epoch: 39 Global Step: 824190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:06,521-Speed 6299.79 samples/sec Loss 2.5315 LearningRate 0.0000 Epoch: 39 Global Step: 824200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:09,770-Speed 6304.22 samples/sec Loss 2.5110 LearningRate 0.0000 Epoch: 39 Global Step: 824210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:13,022-Speed 6298.09 samples/sec Loss 2.5540 LearningRate 0.0000 Epoch: 39 
Global Step: 824220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:16,275-Speed 6298.60 samples/sec Loss 2.5224 LearningRate 0.0000 Epoch: 39 Global Step: 824230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:19,531-Speed 6290.56 samples/sec Loss 2.5328 LearningRate 0.0000 Epoch: 39 Global Step: 824240 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 18:44:22,760-Speed 6344.19 samples/sec Loss 2.4879 LearningRate 0.0000 Epoch: 39 Global Step: 824250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:26,024-Speed 6276.08 samples/sec Loss 2.4989 LearningRate 0.0000 Epoch: 39 Global Step: 824260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:29,275-Speed 6301.40 samples/sec Loss 2.5222 LearningRate 0.0000 Epoch: 39 Global Step: 824270 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:32,576-Speed 6204.00 samples/sec Loss 2.5474 LearningRate 0.0000 Epoch: 39 Global Step: 824280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:35,858-Speed 6241.59 samples/sec Loss 2.4714 LearningRate 0.0000 Epoch: 39 Global Step: 824290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:39,108-Speed 6304.73 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 824300 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:42,359-Speed 6301.37 samples/sec Loss 2.5119 LearningRate 0.0000 Epoch: 39 Global Step: 824310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:45,606-Speed 6308.12 samples/sec Loss 2.5585 LearningRate 0.0000 Epoch: 39 Global Step: 824320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:48,874-Speed 6267.55 samples/sec Loss 2.5413 LearningRate 0.0000 Epoch: 39 Global Step: 824330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:52,126-Speed 6300.69 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 39 Global Step: 824340 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:44:55,363-Speed 6327.32 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 39 Global Step: 824350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:44:58,609-Speed 6310.68 samples/sec Loss 2.5331 LearningRate 0.0000 Epoch: 39 Global Step: 824360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:01,858-Speed 6304.68 samples/sec Loss 2.5528 LearningRate 0.0000 Epoch: 39 Global Step: 824370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:05,107-Speed 6304.18 samples/sec Loss 2.4850 LearningRate 0.0000 Epoch: 39 Global Step: 824380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:08,355-Speed 6307.70 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 39 Global Step: 824390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:11,610-Speed 6292.44 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 39 Global Step: 824400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:14,855-Speed 6313.11 samples/sec Loss 2.5056 LearningRate 0.0000 Epoch: 39 Global Step: 824410 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:18,103-Speed 6309.44 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 39 Global Step: 824420 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:21,359-Speed 6291.43 samples/sec Loss 2.5570 LearningRate 0.0000 Epoch: 39 Global Step: 824430 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:24,622-Speed 6277.31 samples/sec Loss 2.4625 LearningRate 0.0000 Epoch: 39 Global Step: 824440 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:27,866-Speed 6314.82 samples/sec Loss 2.4890 LearningRate 0.0000 Epoch: 39 Global Step: 824450 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:31,115-Speed 6305.23 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 39 Global Step: 824460 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:34,363-Speed 6307.00 
samples/sec Loss 2.4879 LearningRate 0.0000 Epoch: 39 Global Step: 824470 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:37,613-Speed 6302.72 samples/sec Loss 2.4948 LearningRate 0.0000 Epoch: 39 Global Step: 824480 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:40,869-Speed 6290.18 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 824490 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:44,124-Speed 6295.56 samples/sec Loss 2.5368 LearningRate 0.0000 Epoch: 39 Global Step: 824500 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:47,369-Speed 6311.41 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 39 Global Step: 824510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:50,619-Speed 6302.92 samples/sec Loss 2.5023 LearningRate 0.0000 Epoch: 39 Global Step: 824520 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:53,870-Speed 6301.06 samples/sec Loss 2.4924 LearningRate 0.0000 Epoch: 39 Global Step: 824530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:45:57,121-Speed 6301.94 samples/sec Loss 2.5003 LearningRate 0.0000 Epoch: 39 Global Step: 824540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:00,358-Speed 6327.62 samples/sec Loss 2.5442 LearningRate 0.0000 Epoch: 39 Global Step: 824550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:03,608-Speed 6302.59 samples/sec Loss 2.4958 LearningRate 0.0000 Epoch: 39 Global Step: 824560 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:06,864-Speed 6292.20 samples/sec Loss 2.5057 LearningRate 0.0000 Epoch: 39 Global Step: 824570 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:10,109-Speed 6311.47 samples/sec Loss 2.4972 LearningRate 0.0000 Epoch: 39 Global Step: 824580 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:13,361-Speed 6299.09 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 
Global Step: 824590 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:16,613-Speed 6300.27 samples/sec Loss 2.5462 LearningRate 0.0000 Epoch: 39 Global Step: 824600 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:19,863-Speed 6303.00 samples/sec Loss 2.5029 LearningRate 0.0000 Epoch: 39 Global Step: 824610 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:23,113-Speed 6302.65 samples/sec Loss 2.5561 LearningRate 0.0000 Epoch: 39 Global Step: 824620 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:26,361-Speed 6306.62 samples/sec Loss 2.5148 LearningRate 0.0000 Epoch: 39 Global Step: 824630 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:29,614-Speed 6296.52 samples/sec Loss 2.5279 LearningRate 0.0000 Epoch: 39 Global Step: 824640 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:32,852-Speed 6326.91 samples/sec Loss 2.5496 LearningRate 0.0000 Epoch: 39 Global Step: 824650 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:36,106-Speed 6295.62 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 39 Global Step: 824660 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:39,358-Speed 6297.99 samples/sec Loss 2.5131 LearningRate 0.0000 Epoch: 39 Global Step: 824670 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:42,613-Speed 6294.46 samples/sec Loss 2.4966 LearningRate 0.0000 Epoch: 39 Global Step: 824680 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:45,867-Speed 6293.57 samples/sec Loss 2.5298 LearningRate 0.0000 Epoch: 39 Global Step: 824690 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:49,121-Speed 6297.06 samples/sec Loss 2.5374 LearningRate 0.0000 Epoch: 39 Global Step: 824700 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:52,376-Speed 6292.86 samples/sec Loss 2.5346 LearningRate 0.0000 Epoch: 39 Global Step: 824710 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:46:55,628-Speed 6298.97 samples/sec Loss 2.4758 LearningRate 0.0000 Epoch: 39 Global Step: 824720 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:46:58,880-Speed 6298.89 samples/sec Loss 2.5267 LearningRate 0.0000 Epoch: 39 Global Step: 824730 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:02,134-Speed 6294.72 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 39 Global Step: 824740 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:05,378-Speed 6315.67 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 39 Global Step: 824750 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:08,643-Speed 6274.21 samples/sec Loss 2.5488 LearningRate 0.0000 Epoch: 39 Global Step: 824760 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:11,896-Speed 6297.71 samples/sec Loss 2.5362 LearningRate 0.0000 Epoch: 39 Global Step: 824770 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:15,143-Speed 6308.00 samples/sec Loss 2.5527 LearningRate 0.0000 Epoch: 39 Global Step: 824780 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:18,392-Speed 6305.66 samples/sec Loss 2.5371 LearningRate 0.0000 Epoch: 39 Global Step: 824790 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:21,639-Speed 6308.09 samples/sec Loss 2.5154 LearningRate 0.0000 Epoch: 39 Global Step: 824800 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:24,888-Speed 6305.43 samples/sec Loss 2.5369 LearningRate 0.0000 Epoch: 39 Global Step: 824810 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:28,144-Speed 6290.99 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 39 Global Step: 824820 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:31,398-Speed 6295.42 samples/sec Loss 2.4969 LearningRate 0.0000 Epoch: 39 Global Step: 824830 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:34,655-Speed 6289.48 
samples/sec Loss 2.5354 LearningRate 0.0000 Epoch: 39 Global Step: 824840 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:37,887-Speed 6337.80 samples/sec Loss 2.5388 LearningRate 0.0000 Epoch: 39 Global Step: 824850 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:41,140-Speed 6296.61 samples/sec Loss 2.5542 LearningRate 0.0000 Epoch: 39 Global Step: 824860 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:44,393-Speed 6297.34 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 39 Global Step: 824870 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:47,640-Speed 6308.17 samples/sec Loss 2.5568 LearningRate 0.0000 Epoch: 39 Global Step: 824880 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:50,892-Speed 6299.30 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 824890 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:54,144-Speed 6299.61 samples/sec Loss 2.4820 LearningRate 0.0000 Epoch: 39 Global Step: 824900 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:47:57,393-Speed 6306.09 samples/sec Loss 2.5372 LearningRate 0.0000 Epoch: 39 Global Step: 824910 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:00,638-Speed 6311.38 samples/sec Loss 2.4714 LearningRate 0.0000 Epoch: 39 Global Step: 824920 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:03,939-Speed 6207.32 samples/sec Loss 2.5354 LearningRate 0.0000 Epoch: 39 Global Step: 824930 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:07,249-Speed 6187.87 samples/sec Loss 2.5323 LearningRate 0.0000 Epoch: 39 Global Step: 824940 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:10,497-Speed 6306.07 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 39 Global Step: 824950 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 18:48:13,734-Speed 6329.52 samples/sec Loss 2.4933 LearningRate 0.0000 Epoch: 39 
Global Step: 824960 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:16,977-Speed 6315.99 samples/sec Loss 2.5253 LearningRate 0.0000 Epoch: 39 Global Step: 824970 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:20,227-Speed 6303.03 samples/sec Loss 2.4746 LearningRate 0.0000 Epoch: 39 Global Step: 824980 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:23,478-Speed 6300.52 samples/sec Loss 2.5353 LearningRate 0.0000 Epoch: 39 Global Step: 824990 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:26,731-Speed 6296.14 samples/sec Loss 2.5254 LearningRate 0.0000 Epoch: 39 Global Step: 825000 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:29,985-Speed 6295.75 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 39 Global Step: 825010 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:33,244-Speed 6285.40 samples/sec Loss 2.5479 LearningRate 0.0000 Epoch: 39 Global Step: 825020 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:36,503-Speed 6286.14 samples/sec Loss 2.5626 LearningRate 0.0000 Epoch: 39 Global Step: 825030 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:39,753-Speed 6302.54 samples/sec Loss 2.4915 LearningRate 0.0000 Epoch: 39 Global Step: 825040 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:43,008-Speed 6293.44 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 825050 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:46,247-Speed 6324.00 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 39 Global Step: 825060 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:49,502-Speed 6292.72 samples/sec Loss 2.5330 LearningRate 0.0000 Epoch: 39 Global Step: 825070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:52,757-Speed 6294.86 samples/sec Loss 2.5764 LearningRate 0.0000 Epoch: 39 Global Step: 825080 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:48:56,011-Speed 6294.27 samples/sec Loss 2.5751 LearningRate 0.0000 Epoch: 39 Global Step: 825090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:48:59,264-Speed 6298.77 samples/sec Loss 2.5124 LearningRate 0.0000 Epoch: 39 Global Step: 825100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:02,518-Speed 6295.31 samples/sec Loss 2.5343 LearningRate 0.0000 Epoch: 39 Global Step: 825110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:05,774-Speed 6291.57 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 39 Global Step: 825120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:09,080-Speed 6195.23 samples/sec Loss 2.5608 LearningRate 0.0000 Epoch: 39 Global Step: 825130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:12,333-Speed 6296.96 samples/sec Loss 2.5741 LearningRate 0.0000 Epoch: 39 Global Step: 825140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:15,582-Speed 6305.11 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 39 Global Step: 825150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:18,812-Speed 6341.11 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 825160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:22,069-Speed 6290.49 samples/sec Loss 2.4871 LearningRate 0.0000 Epoch: 39 Global Step: 825170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:25,319-Speed 6302.02 samples/sec Loss 2.5176 LearningRate 0.0000 Epoch: 39 Global Step: 825180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:28,565-Speed 6311.82 samples/sec Loss 2.4910 LearningRate 0.0000 Epoch: 39 Global Step: 825190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:31,815-Speed 6303.19 samples/sec Loss 2.5145 LearningRate 0.0000 Epoch: 39 Global Step: 825200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:35,132-Speed 6174.23 
samples/sec Loss 2.4931 LearningRate 0.0000 Epoch: 39 Global Step: 825210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:38,394-Speed 6280.94 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 39 Global Step: 825220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:41,677-Speed 6239.98 samples/sec Loss 2.4833 LearningRate 0.0000 Epoch: 39 Global Step: 825230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:44,923-Speed 6309.82 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 39 Global Step: 825240 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:48,173-Speed 6302.99 samples/sec Loss 2.5209 LearningRate 0.0000 Epoch: 39 Global Step: 825250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:51,407-Speed 6333.93 samples/sec Loss 2.5169 LearningRate 0.0000 Epoch: 39 Global Step: 825260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:54,652-Speed 6312.88 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 Global Step: 825270 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:49:57,900-Speed 6306.29 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 39 Global Step: 825280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:01,152-Speed 6300.04 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 39 Global Step: 825290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:04,404-Speed 6298.09 samples/sec Loss 2.5225 LearningRate 0.0000 Epoch: 39 Global Step: 825300 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:07,683-Speed 6247.35 samples/sec Loss 2.5391 LearningRate 0.0000 Epoch: 39 Global Step: 825310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:10,931-Speed 6307.49 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 39 Global Step: 825320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:14,179-Speed 6307.81 samples/sec Loss 2.5375 LearningRate 0.0000 Epoch: 39 
Global Step: 825330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:17,424-Speed 6311.95 samples/sec Loss 2.5210 LearningRate 0.0000 Epoch: 39 Global Step: 825340 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:20,672-Speed 6306.59 samples/sec Loss 2.5546 LearningRate 0.0000 Epoch: 39 Global Step: 825350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:23,912-Speed 6323.81 samples/sec Loss 2.5310 LearningRate 0.0000 Epoch: 39 Global Step: 825360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:27,163-Speed 6299.88 samples/sec Loss 2.4998 LearningRate 0.0000 Epoch: 39 Global Step: 825370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:30,415-Speed 6298.56 samples/sec Loss 2.5194 LearningRate 0.0000 Epoch: 39 Global Step: 825380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:33,669-Speed 6295.52 samples/sec Loss 2.5340 LearningRate 0.0000 Epoch: 39 Global Step: 825390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:36,909-Speed 6321.96 samples/sec Loss 2.5078 LearningRate 0.0000 Epoch: 39 Global Step: 825400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:40,160-Speed 6301.28 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 39 Global Step: 825410 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:43,411-Speed 6301.67 samples/sec Loss 2.5015 LearningRate 0.0000 Epoch: 39 Global Step: 825420 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:46,662-Speed 6300.48 samples/sec Loss 2.5326 LearningRate 0.0000 Epoch: 39 Global Step: 825430 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:49,921-Speed 6285.20 samples/sec Loss 2.4866 LearningRate 0.0000 Epoch: 39 Global Step: 825440 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:50:53,169-Speed 6307.92 samples/sec Loss 2.5187 LearningRate 0.0000 Epoch: 39 Global Step: 825450 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:50:56,418-Speed 6304.16 samples/sec Loss 2.4821 LearningRate 0.0000 Epoch: 39 Global Step: 825460 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 18:50:59,650-Speed 6337.93 samples/sec Loss 2.5299 LearningRate 0.0000 Epoch: 39 Global Step: 825470 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:02,913-Speed 6277.53 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 825480 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:06,174-Speed 6281.56 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 39 Global Step: 825490 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:09,424-Speed 6304.00 samples/sec Loss 2.5348 LearningRate 0.0000 Epoch: 39 Global Step: 825500 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:12,678-Speed 6295.01 samples/sec Loss 2.5802 LearningRate 0.0000 Epoch: 39 Global Step: 825510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:15,923-Speed 6312.81 samples/sec Loss 2.4840 LearningRate 0.0000 Epoch: 39 Global Step: 825520 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:19,172-Speed 6305.29 samples/sec Loss 2.5259 LearningRate 0.0000 Epoch: 39 Global Step: 825530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:22,421-Speed 6305.60 samples/sec Loss 2.5273 LearningRate 0.0000 Epoch: 39 Global Step: 825540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:25,686-Speed 6273.14 samples/sec Loss 2.5059 LearningRate 0.0000 Epoch: 39 Global Step: 825550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:28,932-Speed 6310.50 samples/sec Loss 2.5034 LearningRate 0.0000 Epoch: 39 Global Step: 825560 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:32,168-Speed 6331.60 samples/sec Loss 2.5500 LearningRate 0.0000 Epoch: 39 Global Step: 825570 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:35,412-Speed 6313.14 
samples/sec Loss 2.5501 LearningRate 0.0000 Epoch: 39 Global Step: 825580 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:38,660-Speed 6307.73 samples/sec Loss 2.5460 LearningRate 0.0000 Epoch: 39 Global Step: 825590 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:41,911-Speed 6300.63 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 39 Global Step: 825600 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:45,164-Speed 6297.74 samples/sec Loss 2.5819 LearningRate 0.0000 Epoch: 39 Global Step: 825610 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:48,415-Speed 6300.84 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 39 Global Step: 825620 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:51,667-Speed 6299.55 samples/sec Loss 2.5355 LearningRate 0.0000 Epoch: 39 Global Step: 825630 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:54,924-Speed 6287.76 samples/sec Loss 2.5445 LearningRate 0.0000 Epoch: 39 Global Step: 825640 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:51:58,173-Speed 6304.94 samples/sec Loss 2.5235 LearningRate 0.0000 Epoch: 39 Global Step: 825650 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:01,424-Speed 6301.62 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 825660 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:04,664-Speed 6322.39 samples/sec Loss 2.5026 LearningRate 0.0000 Epoch: 39 Global Step: 825670 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:07,916-Speed 6298.30 samples/sec Loss 2.5024 LearningRate 0.0000 Epoch: 39 Global Step: 825680 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:11,162-Speed 6311.26 samples/sec Loss 2.4773 LearningRate 0.0000 Epoch: 39 Global Step: 825690 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:14,410-Speed 6307.43 samples/sec Loss 2.5364 LearningRate 0.0000 Epoch: 39 
Global Step: 825700 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:17,657-Speed 6308.95 samples/sec Loss 2.5238 LearningRate 0.0000 Epoch: 39 Global Step: 825710 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:20,911-Speed 6294.72 samples/sec Loss 2.5170 LearningRate 0.0000 Epoch: 39 Global Step: 825720 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:24,161-Speed 6304.37 samples/sec Loss 2.5116 LearningRate 0.0000 Epoch: 39 Global Step: 825730 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:27,423-Speed 6278.57 samples/sec Loss 2.4883 LearningRate 0.0000 Epoch: 39 Global Step: 825740 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:30,686-Speed 6277.48 samples/sec Loss 2.5410 LearningRate 0.0000 Epoch: 39 Global Step: 825750 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:33,938-Speed 6300.89 samples/sec Loss 2.5047 LearningRate 0.0000 Epoch: 39 Global Step: 825760 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:52:37,176-Speed 6326.29 samples/sec Loss 2.5571 LearningRate 0.0000 Epoch: 39 Global Step: 825770 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:40,424-Speed 6305.79 samples/sec Loss 2.5503 LearningRate 0.0000 Epoch: 39 Global Step: 825780 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:43,674-Speed 6302.85 samples/sec Loss 2.5062 LearningRate 0.0000 Epoch: 39 Global Step: 825790 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:46,924-Speed 6302.56 samples/sec Loss 2.4898 LearningRate 0.0000 Epoch: 39 Global Step: 825800 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:50,172-Speed 6308.04 samples/sec Loss 2.4940 LearningRate 0.0000 Epoch: 39 Global Step: 825810 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:53,431-Speed 6284.83 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 39 Global Step: 825820 Fp16 Grad Scale: 2048 Required: 0 
hours Training: 2022-04-03 18:52:56,677-Speed 6310.05 samples/sec Loss 2.5271 LearningRate 0.0000 Epoch: 39 Global Step: 825830 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:52:59,928-Speed 6301.37 samples/sec Loss 2.4917 LearningRate 0.0000 Epoch: 39 Global Step: 825840 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:53:03,184-Speed 6292.66 samples/sec Loss 2.5175 LearningRate 0.0000 Epoch: 39 Global Step: 825850 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:53:06,436-Speed 6297.44 samples/sec Loss 2.5497 LearningRate 0.0000 Epoch: 39 Global Step: 825860 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:53:09,690-Speed 6295.11 samples/sec Loss 2.5545 LearningRate 0.0000 Epoch: 39 Global Step: 825870 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:12,939-Speed 6306.02 samples/sec Loss 2.5336 LearningRate 0.0000 Epoch: 39 Global Step: 825880 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:16,189-Speed 6301.62 samples/sec Loss 2.5692 LearningRate 0.0000 Epoch: 39 Global Step: 825890 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:19,438-Speed 6306.00 samples/sec Loss 2.4978 LearningRate 0.0000 Epoch: 39 Global Step: 825900 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:22,683-Speed 6313.27 samples/sec Loss 2.5252 LearningRate 0.0000 Epoch: 39 Global Step: 825910 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:25,937-Speed 6294.14 samples/sec Loss 2.5065 LearningRate 0.0000 Epoch: 39 Global Step: 825920 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:29,186-Speed 6306.43 samples/sec Loss 2.5903 LearningRate 0.0000 Epoch: 39 Global Step: 825930 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:32,444-Speed 6287.55 samples/sec Loss 2.5314 LearningRate 0.0000 Epoch: 39 Global Step: 825940 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:35,695-Speed 6300.79 
samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 39 Global Step: 825950 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:38,946-Speed 6301.93 samples/sec Loss 2.5290 LearningRate 0.0000 Epoch: 39 Global Step: 825960 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:42,190-Speed 6314.32 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 825970 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:45,449-Speed 6284.01 samples/sec Loss 2.5158 LearningRate 0.0000 Epoch: 39 Global Step: 825980 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:48,705-Speed 6291.21 samples/sec Loss 2.5283 LearningRate 0.0000 Epoch: 39 Global Step: 825990 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:51,958-Speed 6298.81 samples/sec Loss 2.5101 LearningRate 0.0000 Epoch: 39 Global Step: 826000 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:55,210-Speed 6298.91 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 39 Global Step: 826010 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:53:58,456-Speed 6310.19 samples/sec Loss 2.5859 LearningRate 0.0000 Epoch: 39 Global Step: 826020 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:01,708-Speed 6297.37 samples/sec Loss 2.5773 LearningRate 0.0000 Epoch: 39 Global Step: 826030 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:04,964-Speed 6292.97 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 39 Global Step: 826040 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:08,209-Speed 6312.27 samples/sec Loss 2.5236 LearningRate 0.0000 Epoch: 39 Global Step: 826050 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:11,454-Speed 6311.96 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 39 Global Step: 826060 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:14,693-Speed 6325.64 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 39 
Global Step: 826070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:17,936-Speed 6315.74 samples/sec Loss 2.5268 LearningRate 0.0000 Epoch: 39 Global Step: 826080 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:21,189-Speed 6296.65 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 39 Global Step: 826090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:24,438-Speed 6304.49 samples/sec Loss 2.5243 LearningRate 0.0000 Epoch: 39 Global Step: 826100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:27,691-Speed 6298.13 samples/sec Loss 2.4722 LearningRate 0.0000 Epoch: 39 Global Step: 826110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:30,942-Speed 6299.90 samples/sec Loss 2.5411 LearningRate 0.0000 Epoch: 39 Global Step: 826120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:34,201-Speed 6286.96 samples/sec Loss 2.4993 LearningRate 0.0000 Epoch: 39 Global Step: 826130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:37,463-Speed 6280.04 samples/sec Loss 2.5004 LearningRate 0.0000 Epoch: 39 Global Step: 826140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:40,713-Speed 6302.57 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 39 Global Step: 826150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:43,959-Speed 6310.60 samples/sec Loss 2.5322 LearningRate 0.0000 Epoch: 39 Global Step: 826160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:47,195-Speed 6330.96 samples/sec Loss 2.5297 LearningRate 0.0000 Epoch: 39 Global Step: 826170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:50,446-Speed 6300.36 samples/sec Loss 2.4857 LearningRate 0.0000 Epoch: 39 Global Step: 826180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:54:53,696-Speed 6303.97 samples/sec Loss 2.5515 LearningRate 0.0000 Epoch: 39 Global Step: 826190 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:54:56,944-Speed 6305.59 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 39 Global Step: 826200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:00,201-Speed 6290.82 samples/sec Loss 2.5670 LearningRate 0.0000 Epoch: 39 Global Step: 826210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:03,450-Speed 6304.32 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 39 Global Step: 826220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:06,697-Speed 6308.21 samples/sec Loss 2.5574 LearningRate 0.0000 Epoch: 39 Global Step: 826230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:09,950-Speed 6297.25 samples/sec Loss 2.4767 LearningRate 0.0000 Epoch: 39 Global Step: 826240 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:13,197-Speed 6308.56 samples/sec Loss 2.5569 LearningRate 0.0000 Epoch: 39 Global Step: 826250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:16,455-Speed 6288.00 samples/sec Loss 2.5552 LearningRate 0.0000 Epoch: 39 Global Step: 826260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:19,703-Speed 6305.83 samples/sec Loss 2.4639 LearningRate 0.0000 Epoch: 39 Global Step: 826270 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 18:55:22,944-Speed 6320.49 samples/sec Loss 2.5080 LearningRate 0.0000 Epoch: 39 Global Step: 826280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:26,191-Speed 6310.17 samples/sec Loss 2.5332 LearningRate 0.0000 Epoch: 39 Global Step: 826290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:29,440-Speed 6304.34 samples/sec Loss 2.4891 LearningRate 0.0000 Epoch: 39 Global Step: 826300 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:32,702-Speed 6278.90 samples/sec Loss 2.5427 LearningRate 0.0000 Epoch: 39 Global Step: 826310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:35,949-Speed 6310.51 
samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 826320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:39,218-Speed 6265.94 samples/sec Loss 2.5190 LearningRate 0.0000 Epoch: 39 Global Step: 826330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:42,464-Speed 6311.96 samples/sec Loss 2.5238 LearningRate 0.0000 Epoch: 39 Global Step: 826340 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:45,712-Speed 6305.76 samples/sec Loss 2.5130 LearningRate 0.0000 Epoch: 39 Global Step: 826350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:48,965-Speed 6297.62 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 39 Global Step: 826360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:52,213-Speed 6306.92 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 39 Global Step: 826370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:55,448-Speed 6331.53 samples/sec Loss 2.5198 LearningRate 0.0000 Epoch: 39 Global Step: 826380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:55:58,703-Speed 6293.12 samples/sec Loss 2.5083 LearningRate 0.0000 Epoch: 39 Global Step: 826390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:01,952-Speed 6304.56 samples/sec Loss 2.5093 LearningRate 0.0000 Epoch: 39 Global Step: 826400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:05,201-Speed 6305.73 samples/sec Loss 2.5582 LearningRate 0.0000 Epoch: 39 Global Step: 826410 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:08,454-Speed 6297.47 samples/sec Loss 2.5448 LearningRate 0.0000 Epoch: 39 Global Step: 826420 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:11,700-Speed 6309.74 samples/sec Loss 2.5134 LearningRate 0.0000 Epoch: 39 Global Step: 826430 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:14,950-Speed 6303.75 samples/sec Loss 2.5450 LearningRate 0.0000 Epoch: 39 
Global Step: 826440 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:18,205-Speed 6292.14 samples/sec Loss 2.5226 LearningRate 0.0000 Epoch: 39 Global Step: 826450 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:21,458-Speed 6298.39 samples/sec Loss 2.5656 LearningRate 0.0000 Epoch: 39 Global Step: 826460 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:24,711-Speed 6297.36 samples/sec Loss 2.5107 LearningRate 0.0000 Epoch: 39 Global Step: 826470 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:27,946-Speed 6332.35 samples/sec Loss 2.5006 LearningRate 0.0000 Epoch: 39 Global Step: 826480 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:31,192-Speed 6309.79 samples/sec Loss 2.4889 LearningRate 0.0000 Epoch: 39 Global Step: 826490 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:34,441-Speed 6305.47 samples/sec Loss 2.5594 LearningRate 0.0000 Epoch: 39 Global Step: 826500 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:37,695-Speed 6294.65 samples/sec Loss 2.4787 LearningRate 0.0000 Epoch: 39 Global Step: 826510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:40,940-Speed 6313.11 samples/sec Loss 2.5746 LearningRate 0.0000 Epoch: 39 Global Step: 826520 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:44,189-Speed 6304.96 samples/sec Loss 2.4806 LearningRate 0.0000 Epoch: 39 Global Step: 826530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:47,433-Speed 6315.91 samples/sec Loss 2.5347 LearningRate 0.0000 Epoch: 39 Global Step: 826540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:50,682-Speed 6304.50 samples/sec Loss 2.4500 LearningRate 0.0000 Epoch: 39 Global Step: 826550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:56:53,924-Speed 6318.24 samples/sec Loss 2.5514 LearningRate 0.0000 Epoch: 39 Global Step: 826560 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:56:57,174-Speed 6303.38 samples/sec Loss 2.4915 LearningRate 0.0000 Epoch: 39 Global Step: 826570 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:00,412-Speed 6326.40 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 39 Global Step: 826580 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:03,664-Speed 6297.43 samples/sec Loss 2.4838 LearningRate 0.0000 Epoch: 39 Global Step: 826590 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:06,914-Speed 6303.26 samples/sec Loss 2.4589 LearningRate 0.0000 Epoch: 39 Global Step: 826600 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:10,163-Speed 6305.56 samples/sec Loss 2.4912 LearningRate 0.0000 Epoch: 39 Global Step: 826610 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:13,409-Speed 6310.89 samples/sec Loss 2.5535 LearningRate 0.0000 Epoch: 39 Global Step: 826620 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:16,670-Speed 6280.95 samples/sec Loss 2.4877 LearningRate 0.0000 Epoch: 39 Global Step: 826630 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:19,935-Speed 6273.36 samples/sec Loss 2.4892 LearningRate 0.0000 Epoch: 39 Global Step: 826640 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:23,197-Speed 6280.71 samples/sec Loss 2.5318 LearningRate 0.0000 Epoch: 39 Global Step: 826650 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:26,449-Speed 6298.97 samples/sec Loss 2.5744 LearningRate 0.0000 Epoch: 39 Global Step: 826660 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:29,712-Speed 6277.87 samples/sec Loss 2.5597 LearningRate 0.0000 Epoch: 39 Global Step: 826670 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:32,945-Speed 6335.87 samples/sec Loss 2.5242 LearningRate 0.0000 Epoch: 39 Global Step: 826680 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:36,207-Speed 6279.60 
samples/sec Loss 2.4881 LearningRate 0.0000 Epoch: 39 Global Step: 826690 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:39,455-Speed 6312.48 samples/sec Loss 2.5517 LearningRate 0.0000 Epoch: 39 Global Step: 826700 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:42,705-Speed 6302.84 samples/sec Loss 2.5679 LearningRate 0.0000 Epoch: 39 Global Step: 826710 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:45,955-Speed 6304.08 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 826720 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:49,213-Speed 6287.76 samples/sec Loss 2.5275 LearningRate 0.0000 Epoch: 39 Global Step: 826730 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:52,465-Speed 6297.62 samples/sec Loss 2.5129 LearningRate 0.0000 Epoch: 39 Global Step: 826740 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:55,718-Speed 6298.58 samples/sec Loss 2.4716 LearningRate 0.0000 Epoch: 39 Global Step: 826750 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:57:58,954-Speed 6330.21 samples/sec Loss 2.5223 LearningRate 0.0000 Epoch: 39 Global Step: 826760 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:02,218-Speed 6275.75 samples/sec Loss 2.4904 LearningRate 0.0000 Epoch: 39 Global Step: 826770 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:05,467-Speed 6303.41 samples/sec Loss 2.4915 LearningRate 0.0000 Epoch: 39 Global Step: 826780 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:08,722-Speed 6294.91 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 39 Global Step: 826790 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:11,967-Speed 6311.83 samples/sec Loss 2.5019 LearningRate 0.0000 Epoch: 39 Global Step: 826800 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:15,221-Speed 6295.16 samples/sec Loss 2.5076 LearningRate 0.0000 Epoch: 39 
Global Step: 826810 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:18,481-Speed 6284.30 samples/sec Loss 2.5513 LearningRate 0.0000 Epoch: 39 Global Step: 826820 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:21,725-Speed 6313.48 samples/sec Loss 2.5013 LearningRate 0.0000 Epoch: 39 Global Step: 826830 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:24,975-Speed 6302.95 samples/sec Loss 2.4979 LearningRate 0.0000 Epoch: 39 Global Step: 826840 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:28,229-Speed 6295.33 samples/sec Loss 2.5390 LearningRate 0.0000 Epoch: 39 Global Step: 826850 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 18:58:31,482-Speed 6297.72 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 39 Global Step: 826860 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:34,731-Speed 6304.19 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 39 Global Step: 826870 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:37,979-Speed 6306.51 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 39 Global Step: 826880 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:41,228-Speed 6305.44 samples/sec Loss 2.5027 LearningRate 0.0000 Epoch: 39 Global Step: 826890 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:44,476-Speed 6307.75 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 39 Global Step: 826900 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:47,743-Speed 6269.31 samples/sec Loss 2.5418 LearningRate 0.0000 Epoch: 39 Global Step: 826910 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:50,994-Speed 6300.30 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 39 Global Step: 826920 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:58:54,257-Speed 6279.35 samples/sec Loss 2.5262 LearningRate 0.0000 Epoch: 39 Global Step: 826930 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 18:58:57,513-Speed 6291.01 samples/sec Loss 2.4905 LearningRate 0.0000 Epoch: 39 Global Step: 826940 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:00,761-Speed 6307.30 samples/sec Loss 2.5055 LearningRate 0.0000 Epoch: 39 Global Step: 826950 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:04,014-Speed 6296.81 samples/sec Loss 2.5238 LearningRate 0.0000 Epoch: 39 Global Step: 826960 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 18:59:07,245-Speed 6340.28 samples/sec Loss 2.4930 LearningRate 0.0000 Epoch: 39 Global Step: 826970 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:10,490-Speed 6311.78 samples/sec Loss 2.5392 LearningRate 0.0000 Epoch: 39 Global Step: 826980 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:13,738-Speed 6306.65 samples/sec Loss 2.5288 LearningRate 0.0000 Epoch: 39 Global Step: 826990 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:16,984-Speed 6311.81 samples/sec Loss 2.5089 LearningRate 0.0000 Epoch: 39 Global Step: 827000 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:20,234-Speed 6302.35 samples/sec Loss 2.5308 LearningRate 0.0000 Epoch: 39 Global Step: 827010 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:23,478-Speed 6314.10 samples/sec Loss 2.5245 LearningRate 0.0000 Epoch: 39 Global Step: 827020 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:26,735-Speed 6289.67 samples/sec Loss 2.5534 LearningRate 0.0000 Epoch: 39 Global Step: 827030 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:29,994-Speed 6285.16 samples/sec Loss 2.5603 LearningRate 0.0000 Epoch: 39 Global Step: 827040 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:33,254-Speed 6284.85 samples/sec Loss 2.5502 LearningRate 0.0000 Epoch: 39 Global Step: 827050 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:36,507-Speed 6296.56 
samples/sec Loss 2.5455 LearningRate 0.0000 Epoch: 39 Global Step: 827060 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:39,745-Speed 6326.45 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 39 Global Step: 827070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:42,991-Speed 6310.75 samples/sec Loss 2.5521 LearningRate 0.0000 Epoch: 39 Global Step: 827080 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:46,238-Speed 6309.40 samples/sec Loss 2.4738 LearningRate 0.0000 Epoch: 39 Global Step: 827090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:49,494-Speed 6290.98 samples/sec Loss 2.5439 LearningRate 0.0000 Epoch: 39 Global Step: 827100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:52,743-Speed 6305.02 samples/sec Loss 2.5007 LearningRate 0.0000 Epoch: 39 Global Step: 827110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:55,995-Speed 6299.71 samples/sec Loss 2.5287 LearningRate 0.0000 Epoch: 39 Global Step: 827120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 18:59:59,258-Speed 6277.88 samples/sec Loss 2.5295 LearningRate 0.0000 Epoch: 39 Global Step: 827130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:02,508-Speed 6304.03 samples/sec Loss 2.4571 LearningRate 0.0000 Epoch: 39 Global Step: 827140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:05,754-Speed 6310.88 samples/sec Loss 2.4919 LearningRate 0.0000 Epoch: 39 Global Step: 827150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:09,010-Speed 6290.41 samples/sec Loss 2.5379 LearningRate 0.0000 Epoch: 39 Global Step: 827160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:12,264-Speed 6295.10 samples/sec Loss 2.5451 LearningRate 0.0000 Epoch: 39 Global Step: 827170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:15,510-Speed 6310.84 samples/sec Loss 2.5009 LearningRate 0.0000 Epoch: 39 
Global Step: 827180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:18,755-Speed 6313.77 samples/sec Loss 2.5286 LearningRate 0.0000 Epoch: 39 Global Step: 827190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:22,002-Speed 6307.38 samples/sec Loss 2.4994 LearningRate 0.0000 Epoch: 39 Global Step: 827200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:25,249-Speed 6309.40 samples/sec Loss 2.5373 LearningRate 0.0000 Epoch: 39 Global Step: 827210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:28,492-Speed 6316.04 samples/sec Loss 2.5255 LearningRate 0.0000 Epoch: 39 Global Step: 827220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:31,740-Speed 6307.03 samples/sec Loss 2.5671 LearningRate 0.0000 Epoch: 39 Global Step: 827230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:34,990-Speed 6303.54 samples/sec Loss 2.5471 LearningRate 0.0000 Epoch: 39 Global Step: 827240 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:38,238-Speed 6309.85 samples/sec Loss 2.4755 LearningRate 0.0000 Epoch: 39 Global Step: 827250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:41,489-Speed 6301.59 samples/sec Loss 2.5400 LearningRate 0.0000 Epoch: 39 Global Step: 827260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:44,729-Speed 6323.07 samples/sec Loss 2.4938 LearningRate 0.0000 Epoch: 39 Global Step: 827270 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:47,975-Speed 6310.99 samples/sec Loss 2.5130 LearningRate 0.0000 Epoch: 39 Global Step: 827280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:51,227-Speed 6297.98 samples/sec Loss 2.5742 LearningRate 0.0000 Epoch: 39 Global Step: 827290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:00:54,472-Speed 6313.29 samples/sec Loss 2.5083 LearningRate 0.0000 Epoch: 39 Global Step: 827300 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:00:57,715-Speed 6317.24 samples/sec Loss 2.5444 LearningRate 0.0000 Epoch: 39 Global Step: 827310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:00,964-Speed 6303.00 samples/sec Loss 2.4960 LearningRate 0.0000 Epoch: 39 Global Step: 827320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:04,217-Speed 6298.14 samples/sec Loss 2.5610 LearningRate 0.0000 Epoch: 39 Global Step: 827330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:07,484-Speed 6271.23 samples/sec Loss 2.5559 LearningRate 0.0000 Epoch: 39 Global Step: 827340 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:10,733-Speed 6303.88 samples/sec Loss 2.4790 LearningRate 0.0000 Epoch: 39 Global Step: 827350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:13,984-Speed 6301.80 samples/sec Loss 2.5725 LearningRate 0.0000 Epoch: 39 Global Step: 827360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:17,222-Speed 6331.78 samples/sec Loss 2.4939 LearningRate 0.0000 Epoch: 39 Global Step: 827370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:20,476-Speed 6294.53 samples/sec Loss 2.4843 LearningRate 0.0000 Epoch: 39 Global Step: 827380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:23,728-Speed 6299.29 samples/sec Loss 2.5619 LearningRate 0.0000 Epoch: 39 Global Step: 827390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:26,976-Speed 6306.42 samples/sec Loss 2.5289 LearningRate 0.0000 Epoch: 39 Global Step: 827400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:01:30,207-Speed 6339.83 samples/sec Loss 2.5370 LearningRate 0.0000 Epoch: 39 Global Step: 827410 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:33,458-Speed 6301.33 samples/sec Loss 2.5112 LearningRate 0.0000 Epoch: 39 Global Step: 827420 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:36,717-Speed 6284.63 
samples/sec Loss 2.5187 LearningRate 0.0000 Epoch: 39 Global Step: 827430 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:39,968-Speed 6301.54 samples/sec Loss 2.5068 LearningRate 0.0000 Epoch: 39 Global Step: 827440 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:43,216-Speed 6307.79 samples/sec Loss 2.5437 LearningRate 0.0000 Epoch: 39 Global Step: 827450 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:46,473-Speed 6289.49 samples/sec Loss 2.5565 LearningRate 0.0000 Epoch: 39 Global Step: 827460 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:49,736-Speed 6277.23 samples/sec Loss 2.5283 LearningRate 0.0000 Epoch: 39 Global Step: 827470 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:52,990-Speed 6294.95 samples/sec Loss 2.4929 LearningRate 0.0000 Epoch: 39 Global Step: 827480 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:56,240-Speed 6303.47 samples/sec Loss 2.5416 LearningRate 0.0000 Epoch: 39 Global Step: 827490 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:01:59,486-Speed 6309.59 samples/sec Loss 2.5529 LearningRate 0.0000 Epoch: 39 Global Step: 827500 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:02:02,738-Speed 6299.42 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 827510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:06,000-Speed 6280.68 samples/sec Loss 2.4844 LearningRate 0.0000 Epoch: 39 Global Step: 827520 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:09,253-Speed 6296.36 samples/sec Loss 2.5885 LearningRate 0.0000 Epoch: 39 Global Step: 827530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:12,498-Speed 6313.04 samples/sec Loss 2.4973 LearningRate 0.0000 Epoch: 39 Global Step: 827540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:15,752-Speed 6295.02 samples/sec Loss 2.5327 LearningRate 0.0000 Epoch: 39 
Global Step: 827550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:18,998-Speed 6312.66 samples/sec Loss 2.4770 LearningRate 0.0000 Epoch: 39 Global Step: 827560 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:22,247-Speed 6304.21 samples/sec Loss 2.5357 LearningRate 0.0000 Epoch: 39 Global Step: 827570 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:25,501-Speed 6295.93 samples/sec Loss 2.5394 LearningRate 0.0000 Epoch: 39 Global Step: 827580 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:28,763-Speed 6278.43 samples/sec Loss 2.5623 LearningRate 0.0000 Epoch: 39 Global Step: 827590 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:32,022-Speed 6285.41 samples/sec Loss 2.5325 LearningRate 0.0000 Epoch: 39 Global Step: 827600 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:35,251-Speed 6343.75 samples/sec Loss 2.4994 LearningRate 0.0000 Epoch: 39 Global Step: 827610 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:38,500-Speed 6309.75 samples/sec Loss 2.5110 LearningRate 0.0000 Epoch: 39 Global Step: 827620 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:41,751-Speed 6300.82 samples/sec Loss 2.5605 LearningRate 0.0000 Epoch: 39 Global Step: 827630 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:45,002-Speed 6301.19 samples/sec Loss 2.5301 LearningRate 0.0000 Epoch: 39 Global Step: 827640 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:48,260-Speed 6287.72 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 39 Global Step: 827650 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:51,505-Speed 6312.50 samples/sec Loss 2.4969 LearningRate 0.0000 Epoch: 39 Global Step: 827660 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:02:54,761-Speed 6290.13 samples/sec Loss 2.5710 LearningRate 0.0000 Epoch: 39 Global Step: 827670 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:02:58,023-Speed 6280.35 samples/sec Loss 2.5253 LearningRate 0.0000 Epoch: 39 Global Step: 827680 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:01,287-Speed 6275.09 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 Global Step: 827690 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:04,538-Speed 6301.27 samples/sec Loss 2.5298 LearningRate 0.0000 Epoch: 39 Global Step: 827700 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:07,772-Speed 6333.90 samples/sec Loss 2.4769 LearningRate 0.0000 Epoch: 39 Global Step: 827710 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:11,055-Speed 6239.90 samples/sec Loss 2.4972 LearningRate 0.0000 Epoch: 39 Global Step: 827720 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:14,315-Speed 6284.66 samples/sec Loss 2.5666 LearningRate 0.0000 Epoch: 39 Global Step: 827730 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:17,545-Speed 6341.43 samples/sec Loss 2.5440 LearningRate 0.0000 Epoch: 39 Global Step: 827740 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:20,792-Speed 6308.15 samples/sec Loss 2.5555 LearningRate 0.0000 Epoch: 39 Global Step: 827750 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:24,050-Speed 6289.27 samples/sec Loss 2.5274 LearningRate 0.0000 Epoch: 39 Global Step: 827760 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:27,301-Speed 6300.88 samples/sec Loss 2.5199 LearningRate 0.0000 Epoch: 39 Global Step: 827770 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:30,549-Speed 6306.11 samples/sec Loss 2.5221 LearningRate 0.0000 Epoch: 39 Global Step: 827780 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:33,834-Speed 6236.62 samples/sec Loss 2.5428 LearningRate 0.0000 Epoch: 39 Global Step: 827790 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:37,087-Speed 6297.15 
samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 39 Global Step: 827800 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:40,332-Speed 6311.50 samples/sec Loss 2.4676 LearningRate 0.0000 Epoch: 39 Global Step: 827810 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:43,582-Speed 6304.49 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 39 Global Step: 827820 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:46,837-Speed 6291.88 samples/sec Loss 2.4747 LearningRate 0.0000 Epoch: 39 Global Step: 827830 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:03:50,082-Speed 6314.43 samples/sec Loss 2.5040 LearningRate 0.0000 Epoch: 39 Global Step: 827840 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:53,328-Speed 6309.03 samples/sec Loss 2.5576 LearningRate 0.0000 Epoch: 39 Global Step: 827850 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:56,582-Speed 6295.27 samples/sec Loss 2.4956 LearningRate 0.0000 Epoch: 39 Global Step: 827860 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:03:59,839-Speed 6289.00 samples/sec Loss 2.5431 LearningRate 0.0000 Epoch: 39 Global Step: 827870 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:03,088-Speed 6306.40 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 39 Global Step: 827880 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:06,335-Speed 6308.23 samples/sec Loss 2.5311 LearningRate 0.0000 Epoch: 39 Global Step: 827890 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:09,589-Speed 6295.61 samples/sec Loss 2.5707 LearningRate 0.0000 Epoch: 39 Global Step: 827900 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:12,837-Speed 6305.73 samples/sec Loss 2.5008 LearningRate 0.0000 Epoch: 39 Global Step: 827910 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:16,083-Speed 6311.51 samples/sec Loss 2.5404 LearningRate 0.0000 Epoch: 39 
Global Step: 827920 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:19,330-Speed 6309.37 samples/sec Loss 2.4956 LearningRate 0.0000 Epoch: 39 Global Step: 827930 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:22,571-Speed 6319.38 samples/sec Loss 2.5292 LearningRate 0.0000 Epoch: 39 Global Step: 827940 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:25,824-Speed 6297.91 samples/sec Loss 2.5118 LearningRate 0.0000 Epoch: 39 Global Step: 827950 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:29,077-Speed 6297.40 samples/sec Loss 2.5636 LearningRate 0.0000 Epoch: 39 Global Step: 827960 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:32,331-Speed 6294.95 samples/sec Loss 2.5097 LearningRate 0.0000 Epoch: 39 Global Step: 827970 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:35,580-Speed 6306.15 samples/sec Loss 2.4926 LearningRate 0.0000 Epoch: 39 Global Step: 827980 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:38,851-Speed 6262.36 samples/sec Loss 2.5590 LearningRate 0.0000 Epoch: 39 Global Step: 827990 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:42,101-Speed 6301.52 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 39 Global Step: 828000 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:45,345-Speed 6315.23 samples/sec Loss 2.4952 LearningRate 0.0000 Epoch: 39 Global Step: 828010 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:48,595-Speed 6302.45 samples/sec Loss 2.5112 LearningRate 0.0000 Epoch: 39 Global Step: 828020 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:51,850-Speed 6293.64 samples/sec Loss 2.5715 LearningRate 0.0000 Epoch: 39 Global Step: 828030 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:04:55,082-Speed 6339.09 samples/sec Loss 2.5164 LearningRate 0.0000 Epoch: 39 Global Step: 828040 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:04:58,331-Speed 6305.04 samples/sec Loss 2.5695 LearningRate 0.0000 Epoch: 39 Global Step: 828050 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:01,591-Speed 6282.61 samples/sec Loss 2.5142 LearningRate 0.0000 Epoch: 39 Global Step: 828060 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:04,839-Speed 6307.38 samples/sec Loss 2.5916 LearningRate 0.0000 Epoch: 39 Global Step: 828070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:08,088-Speed 6304.39 samples/sec Loss 2.5478 LearningRate 0.0000 Epoch: 39 Global Step: 828080 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:11,338-Speed 6302.54 samples/sec Loss 2.5246 LearningRate 0.0000 Epoch: 39 Global Step: 828090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:14,588-Speed 6302.58 samples/sec Loss 2.5551 LearningRate 0.0000 Epoch: 39 Global Step: 828100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:17,842-Speed 6296.93 samples/sec Loss 2.4931 LearningRate 0.0000 Epoch: 39 Global Step: 828110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:21,092-Speed 6302.77 samples/sec Loss 2.4995 LearningRate 0.0000 Epoch: 39 Global Step: 828120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:24,342-Speed 6302.50 samples/sec Loss 2.5549 LearningRate 0.0000 Epoch: 39 Global Step: 828130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:27,583-Speed 6319.71 samples/sec Loss 2.5453 LearningRate 0.0000 Epoch: 39 Global Step: 828140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:30,836-Speed 6296.84 samples/sec Loss 2.4934 LearningRate 0.0000 Epoch: 39 Global Step: 828150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:34,086-Speed 6302.99 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 39 Global Step: 828160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:37,335-Speed 6306.86 
samples/sec Loss 2.5141 LearningRate 0.0000 Epoch: 39 Global Step: 828170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:40,584-Speed 6304.52 samples/sec Loss 2.5587 LearningRate 0.0000 Epoch: 39 Global Step: 828180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:43,843-Speed 6285.93 samples/sec Loss 2.5193 LearningRate 0.0000 Epoch: 39 Global Step: 828190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:47,159-Speed 6177.73 samples/sec Loss 2.5466 LearningRate 0.0000 Epoch: 39 Global Step: 828200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:50,453-Speed 6220.20 samples/sec Loss 2.5081 LearningRate 0.0000 Epoch: 39 Global Step: 828210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:53,769-Speed 6177.60 samples/sec Loss 2.5174 LearningRate 0.0000 Epoch: 39 Global Step: 828220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:05:57,026-Speed 6289.53 samples/sec Loss 2.5179 LearningRate 0.0000 Epoch: 39 Global Step: 828230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:00,458-Speed 5968.34 samples/sec Loss 2.5409 LearningRate 0.0000 Epoch: 39 Global Step: 828240 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:04,081-Speed 5654.80 samples/sec Loss 2.4914 LearningRate 0.0000 Epoch: 39 Global Step: 828250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:07,733-Speed 5608.73 samples/sec Loss 2.5443 LearningRate 0.0000 Epoch: 39 Global Step: 828260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:12,516-Speed 4282.91 samples/sec Loss 2.5182 LearningRate 0.0000 Epoch: 39 Global Step: 828270 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:19,272-Speed 3033.99 samples/sec Loss 2.5039 LearningRate 0.0000 Epoch: 39 Global Step: 828280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:23,910-Speed 4424.85 samples/sec Loss 2.5541 LearningRate 0.0000 Epoch: 39 
Global Step: 828290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:29,187-Speed 3881.21 samples/sec Loss 2.4955 LearningRate 0.0000 Epoch: 39 Global Step: 828300 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:32,586-Speed 6026.20 samples/sec Loss 2.5504 LearningRate 0.0000 Epoch: 39 Global Step: 828310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:35,871-Speed 6235.91 samples/sec Loss 2.5027 LearningRate 0.0000 Epoch: 39 Global Step: 828320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:39,117-Speed 6311.08 samples/sec Loss 2.5052 LearningRate 0.0000 Epoch: 39 Global Step: 828330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:42,367-Speed 6303.28 samples/sec Loss 2.5381 LearningRate 0.0000 Epoch: 39 Global Step: 828340 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 19:06:45,601-Speed 6333.37 samples/sec Loss 2.4832 LearningRate 0.0000 Epoch: 39 Global Step: 828350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:48,847-Speed 6311.16 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 39 Global Step: 828360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:52,094-Speed 6308.46 samples/sec Loss 2.5553 LearningRate 0.0000 Epoch: 39 Global Step: 828370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:55,349-Speed 6295.04 samples/sec Loss 2.4927 LearningRate 0.0000 Epoch: 39 Global Step: 828380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:06:58,607-Speed 6286.92 samples/sec Loss 2.5218 LearningRate 0.0000 Epoch: 39 Global Step: 828390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:01,853-Speed 6309.68 samples/sec Loss 2.5238 LearningRate 0.0000 Epoch: 39 Global Step: 828400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:05,119-Speed 6273.72 samples/sec Loss 2.5602 LearningRate 0.0000 Epoch: 39 Global Step: 828410 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:07:08,370-Speed 6299.52 samples/sec Loss 2.4442 LearningRate 0.0000 Epoch: 39 Global Step: 828420 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:11,619-Speed 6305.23 samples/sec Loss 2.5630 LearningRate 0.0000 Epoch: 39 Global Step: 828430 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:14,877-Speed 6287.55 samples/sec Loss 2.5165 LearningRate 0.0000 Epoch: 39 Global Step: 828440 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:18,115-Speed 6325.82 samples/sec Loss 2.5463 LearningRate 0.0000 Epoch: 39 Global Step: 828450 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:21,360-Speed 6312.70 samples/sec Loss 2.5270 LearningRate 0.0000 Epoch: 39 Global Step: 828460 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:24,618-Speed 6288.00 samples/sec Loss 2.5827 LearningRate 0.0000 Epoch: 39 Global Step: 828470 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:27,876-Speed 6288.28 samples/sec Loss 2.5131 LearningRate 0.0000 Epoch: 39 Global Step: 828480 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:31,126-Speed 6301.71 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 39 Global Step: 828490 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:34,373-Speed 6308.92 samples/sec Loss 2.5423 LearningRate 0.0000 Epoch: 39 Global Step: 828500 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:37,626-Speed 6297.13 samples/sec Loss 2.5020 LearningRate 0.0000 Epoch: 39 Global Step: 828510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:40,874-Speed 6307.43 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 828520 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:44,123-Speed 6305.13 samples/sec Loss 2.5438 LearningRate 0.0000 Epoch: 39 Global Step: 828530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:47,374-Speed 6301.06 
samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 828540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:50,612-Speed 6325.52 samples/sec Loss 2.5825 LearningRate 0.0000 Epoch: 39 Global Step: 828550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:53,868-Speed 6292.38 samples/sec Loss 2.5596 LearningRate 0.0000 Epoch: 39 Global Step: 828560 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:07:57,144-Speed 6252.65 samples/sec Loss 2.5475 LearningRate 0.0000 Epoch: 39 Global Step: 828570 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:08:00,408-Speed 6276.03 samples/sec Loss 2.5232 LearningRate 0.0000 Epoch: 39 Global Step: 828580 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:08:03,677-Speed 6267.52 samples/sec Loss 2.5306 LearningRate 0.0000 Epoch: 39 Global Step: 828590 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:08:06,905-Speed 6343.65 samples/sec Loss 2.5275 LearningRate 0.0000 Epoch: 39 Global Step: 828600 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:10,160-Speed 6295.07 samples/sec Loss 2.5456 LearningRate 0.0000 Epoch: 39 Global Step: 828610 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:13,413-Speed 6296.76 samples/sec Loss 2.5071 LearningRate 0.0000 Epoch: 39 Global Step: 828620 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:16,690-Speed 6250.87 samples/sec Loss 2.4927 LearningRate 0.0000 Epoch: 39 Global Step: 828630 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:19,948-Speed 6288.22 samples/sec Loss 2.5141 LearningRate 0.0000 Epoch: 39 Global Step: 828640 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:23,202-Speed 6293.53 samples/sec Loss 2.4901 LearningRate 0.0000 Epoch: 39 Global Step: 828650 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:26,453-Speed 6301.76 samples/sec Loss 2.5486 LearningRate 0.0000 Epoch: 39 
Global Step: 828660 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:29,702-Speed 6303.55 samples/sec Loss 2.4927 LearningRate 0.0000 Epoch: 39 Global Step: 828670 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:32,957-Speed 6294.61 samples/sec Loss 2.5155 LearningRate 0.0000 Epoch: 39 Global Step: 828680 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:36,208-Speed 6300.00 samples/sec Loss 2.5435 LearningRate 0.0000 Epoch: 39 Global Step: 828690 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:39,460-Speed 6299.43 samples/sec Loss 2.5544 LearningRate 0.0000 Epoch: 39 Global Step: 828700 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:08:42,711-Speed 6300.52 samples/sec Loss 2.5140 LearningRate 0.0000 Epoch: 39 Global Step: 828710 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:45,956-Speed 6313.95 samples/sec Loss 2.5212 LearningRate 0.0000 Epoch: 39 Global Step: 828720 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:49,206-Speed 6301.39 samples/sec Loss 2.4889 LearningRate 0.0000 Epoch: 39 Global Step: 828730 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:52,458-Speed 6299.50 samples/sec Loss 2.5415 LearningRate 0.0000 Epoch: 39 Global Step: 828740 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:55,712-Speed 6295.76 samples/sec Loss 2.5260 LearningRate 0.0000 Epoch: 39 Global Step: 828750 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:08:58,961-Speed 6305.01 samples/sec Loss 2.5154 LearningRate 0.0000 Epoch: 39 Global Step: 828760 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:09:02,215-Speed 6296.45 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 39 Global Step: 828770 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:09:05,463-Speed 6306.40 samples/sec Loss 2.5017 LearningRate 0.0000 Epoch: 39 Global Step: 828780 Fp16 Grad Scale: 2048 Required: 0 
hours Training: 2022-04-03 19:09:08,713-Speed 6303.85 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 39 Global Step: 828790 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:09:11,963-Speed 6302.51 samples/sec Loss 2.4906 LearningRate 0.0000 Epoch: 39 Global Step: 828800 Fp16 Grad Scale: 2048 Required: 0 hours Training: 2022-04-03 19:09:15,276-Speed 6185.78 samples/sec Loss 2.5294 LearningRate 0.0000 Epoch: 39 Global Step: 828810 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:18,526-Speed 6301.62 samples/sec Loss 2.5620 LearningRate 0.0000 Epoch: 39 Global Step: 828820 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:21,780-Speed 6297.14 samples/sec Loss 2.5657 LearningRate 0.0000 Epoch: 39 Global Step: 828830 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:25,036-Speed 6290.76 samples/sec Loss 2.5406 LearningRate 0.0000 Epoch: 39 Global Step: 828840 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:28,289-Speed 6296.31 samples/sec Loss 2.5407 LearningRate 0.0000 Epoch: 39 Global Step: 828850 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:31,541-Speed 6299.90 samples/sec Loss 2.5203 LearningRate 0.0000 Epoch: 39 Global Step: 828860 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:34,799-Speed 6287.09 samples/sec Loss 2.5461 LearningRate 0.0000 Epoch: 39 Global Step: 828870 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:38,048-Speed 6304.94 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 39 Global Step: 828880 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:41,298-Speed 6303.14 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 39 Global Step: 828890 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:44,547-Speed 6303.93 samples/sec Loss 2.5266 LearningRate 0.0000 Epoch: 39 Global Step: 828900 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:47,811-Speed 6275.61 
samples/sec Loss 2.5027 LearningRate 0.0000 Epoch: 39 Global Step: 828910 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 19:09:51,047-Speed 6331.41 samples/sec Loss 2.5577 LearningRate 0.0000 Epoch: 39 Global Step: 828920 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:54,311-Speed 6275.95 samples/sec Loss 2.5665 LearningRate 0.0000 Epoch: 39 Global Step: 828930 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:09:57,559-Speed 6306.39 samples/sec Loss 2.5147 LearningRate 0.0000 Epoch: 39 Global Step: 828940 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:00,813-Speed 6295.31 samples/sec Loss 2.4730 LearningRate 0.0000 Epoch: 39 Global Step: 828950 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:04,069-Speed 6292.79 samples/sec Loss 2.5235 LearningRate 0.0000 Epoch: 39 Global Step: 828960 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:07,326-Speed 6288.80 samples/sec Loss 2.5491 LearningRate 0.0000 Epoch: 39 Global Step: 828970 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:10,579-Speed 6296.98 samples/sec Loss 2.5144 LearningRate 0.0000 Epoch: 39 Global Step: 828980 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:13,833-Speed 6296.33 samples/sec Loss 2.5421 LearningRate 0.0000 Epoch: 39 Global Step: 828990 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:17,087-Speed 6295.91 samples/sec Loss 2.5563 LearningRate 0.0000 Epoch: 39 Global Step: 829000 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:20,340-Speed 6296.40 samples/sec Loss 2.5103 LearningRate 0.0000 Epoch: 39 Global Step: 829010 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:23,577-Speed 6328.34 samples/sec Loss 2.4803 LearningRate 0.0000 Epoch: 39 Global Step: 829020 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:26,833-Speed 6292.09 samples/sec Loss 2.5309 LearningRate 0.0000 Epoch: 39 
Global Step: 829030 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:30,091-Speed 6287.47 samples/sec Loss 2.4986 LearningRate 0.0000 Epoch: 39 Global Step: 829040 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:33,342-Speed 6301.06 samples/sec Loss 2.5044 LearningRate 0.0000 Epoch: 39 Global Step: 829050 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:36,593-Speed 6300.38 samples/sec Loss 2.5498 LearningRate 0.0000 Epoch: 39 Global Step: 829060 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:39,874-Speed 6242.70 samples/sec Loss 2.5486 LearningRate 0.0000 Epoch: 39 Global Step: 829070 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:43,118-Speed 6315.41 samples/sec Loss 2.5197 LearningRate 0.0000 Epoch: 39 Global Step: 829080 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:46,372-Speed 6294.14 samples/sec Loss 2.4762 LearningRate 0.0000 Epoch: 39 Global Step: 829090 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:49,634-Speed 6281.28 samples/sec Loss 2.5143 LearningRate 0.0000 Epoch: 39 Global Step: 829100 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:52,891-Speed 6288.63 samples/sec Loss 2.4840 LearningRate 0.0000 Epoch: 39 Global Step: 829110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:56,126-Speed 6332.55 samples/sec Loss 2.5053 LearningRate 0.0000 Epoch: 39 Global Step: 829120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:10:59,381-Speed 6293.90 samples/sec Loss 2.5231 LearningRate 0.0000 Epoch: 39 Global Step: 829130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:02,647-Speed 6271.45 samples/sec Loss 2.5472 LearningRate 0.0000 Epoch: 39 Global Step: 829140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:05,895-Speed 6307.18 samples/sec Loss 2.5302 LearningRate 0.0000 Epoch: 39 Global Step: 829150 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:11:09,158-Speed 6278.84 samples/sec Loss 2.5578 LearningRate 0.0000 Epoch: 39 Global Step: 829160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:12,407-Speed 6304.16 samples/sec Loss 2.5034 LearningRate 0.0000 Epoch: 39 Global Step: 829170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:15,662-Speed 6294.38 samples/sec Loss 2.5420 LearningRate 0.0000 Epoch: 39 Global Step: 829180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:18,926-Speed 6276.46 samples/sec Loss 2.5264 LearningRate 0.0000 Epoch: 39 Global Step: 829190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:22,190-Speed 6275.12 samples/sec Loss 2.5457 LearningRate 0.0000 Epoch: 39 Global Step: 829200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:25,441-Speed 6302.21 samples/sec Loss 2.5894 LearningRate 0.0000 Epoch: 39 Global Step: 829210 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:28,682-Speed 6320.09 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 39 Global Step: 829220 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:31,929-Speed 6307.44 samples/sec Loss 2.5858 LearningRate 0.0000 Epoch: 39 Global Step: 829230 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:35,183-Speed 6296.01 samples/sec Loss 2.5769 LearningRate 0.0000 Epoch: 39 Global Step: 829240 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:38,450-Speed 6270.14 samples/sec Loss 2.5524 LearningRate 0.0000 Epoch: 39 Global Step: 829250 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:41,702-Speed 6297.96 samples/sec Loss 2.5401 LearningRate 0.0000 Epoch: 39 Global Step: 829260 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:44,949-Speed 6309.06 samples/sec Loss 2.5426 LearningRate 0.0000 Epoch: 39 Global Step: 829270 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:48,200-Speed 6301.68 
samples/sec Loss 2.4734 LearningRate 0.0000 Epoch: 39 Global Step: 829280 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:51,450-Speed 6303.59 samples/sec Loss 2.5748 LearningRate 0.0000 Epoch: 39 Global Step: 829290 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:54,698-Speed 6306.58 samples/sec Loss 2.5014 LearningRate 0.0000 Epoch: 39 Global Step: 829300 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:11:57,950-Speed 6298.43 samples/sec Loss 2.5536 LearningRate 0.0000 Epoch: 39 Global Step: 829310 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:01,184-Speed 6334.58 samples/sec Loss 2.5334 LearningRate 0.0000 Epoch: 39 Global Step: 829320 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:04,433-Speed 6304.92 samples/sec Loss 2.4843 LearningRate 0.0000 Epoch: 39 Global Step: 829330 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:07,703-Speed 6264.41 samples/sec Loss 2.4768 LearningRate 0.0000 Epoch: 39 Global Step: 829340 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:10,956-Speed 6297.21 samples/sec Loss 2.5303 LearningRate 0.0000 Epoch: 39 Global Step: 829350 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:14,212-Speed 6290.68 samples/sec Loss 2.5703 LearningRate 0.0000 Epoch: 39 Global Step: 829360 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:17,467-Speed 6293.64 samples/sec Loss 2.5198 LearningRate 0.0000 Epoch: 39 Global Step: 829370 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:20,749-Speed 6242.24 samples/sec Loss 2.5047 LearningRate 0.0000 Epoch: 39 Global Step: 829380 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:24,003-Speed 6295.42 samples/sec Loss 2.5510 LearningRate 0.0000 Epoch: 39 Global Step: 829390 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:27,253-Speed 6302.45 samples/sec Loss 2.5683 LearningRate 0.0000 Epoch: 39 
Global Step: 829400 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:30,506-Speed 6297.03 samples/sec Loss 2.5201 LearningRate 0.0000 Epoch: 39 Global Step: 829410 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:33,757-Speed 6302.09 samples/sec Loss 2.5396 LearningRate 0.0000 Epoch: 39 Global Step: 829420 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-03 19:12:36,992-Speed 6330.70 samples/sec Loss 2.5385 LearningRate 0.0000 Epoch: 39 Global Step: 829430 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:40,243-Speed 6301.57 samples/sec Loss 2.4925 LearningRate 0.0000 Epoch: 39 Global Step: 829440 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:43,494-Speed 6300.97 samples/sec Loss 2.5216 LearningRate 0.0000 Epoch: 39 Global Step: 829450 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:46,751-Speed 6289.26 samples/sec Loss 2.5398 LearningRate 0.0000 Epoch: 39 Global Step: 829460 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:50,006-Speed 6292.69 samples/sec Loss 2.5635 LearningRate 0.0000 Epoch: 39 Global Step: 829470 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:53,265-Speed 6285.87 samples/sec Loss 2.5668 LearningRate 0.0000 Epoch: 39 Global Step: 829480 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:56,521-Speed 6290.93 samples/sec Loss 2.5114 LearningRate 0.0000 Epoch: 39 Global Step: 829490 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:12:59,776-Speed 6294.20 samples/sec Loss 2.5040 LearningRate 0.0000 Epoch: 39 Global Step: 829500 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:13:03,031-Speed 6292.58 samples/sec Loss 2.5132 LearningRate 0.0000 Epoch: 39 Global Step: 829510 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:13:06,278-Speed 6310.12 samples/sec Loss 2.5029 LearningRate 0.0000 Epoch: 39 Global Step: 829520 Fp16 Grad Scale: 4096 Required: 0 
hours Training: 2022-04-03 19:13:09,517-Speed 6322.92 samples/sec Loss 2.5114 LearningRate 0.0000 Epoch: 39 Global Step: 829530 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:13:12,765-Speed 6306.31 samples/sec Loss 2.4690 LearningRate 0.0000 Epoch: 39 Global Step: 829540 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:13:16,016-Speed 6301.38 samples/sec Loss 2.5282 LearningRate 0.0000 Epoch: 39 Global Step: 829550 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-03 19:13:19,277-Speed 6281.42 samples/sec Loss 2.5471 LearningRate 0.0000 Epoch: 39 Global Step: 829560 Fp16 Grad Scale: 4096 Required: -0 hours Training: 2022-04-03 19:13:22,542-Speed 6273.90 samples/sec Loss 2.5925 LearningRate 0.0000 Epoch: 39 Global Step: 829570 Fp16 Grad Scale: 4096 Required: -0 hours Training: 2022-04-03 19:13:25,797-Speed 6294.49 samples/sec Loss 2.5826 LearningRate 0.0000 Epoch: 39 Global Step: 829580 Fp16 Grad Scale: 4096 Required: -0 hours